CN113902975B - Scene perception data enhancement method for SAR ship detection

Info

Publication number
CN113902975B
Authority
CN
China
Prior art keywords
adopting
vector
standard
multiplied
network
Prior art date
Legal status
Active
Application number
CN202111170725.1A
Other languages
Chinese (zh)
Other versions
CN113902975A (en)
Inventor
Zhang Xiaoling
Yang Zhenyu
Zhang Tianwen
Shi Jun
Wei Shunjun
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202111170725.1A
Publication of CN113902975A
Application granted
Publication of CN113902975B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a scene perception data enhancement method for SAR ship detection. First, the classical convolutional neural network VGG-11 is improved to make it better suited to SAR images; this network then classifies the images in the training set into inshore training samples and offshore training samples. Scene augmentation then yields inshore and offshore training samples balanced in number. A classical detection network is trained on the processed data set, performs the detection task, and the detection results are evaluated. The overall detection accuracy of a Faster R-CNN ship detection network using the method of the invention is 1.95% higher than that of the prior-art Faster R-CNN ship detection network, and the detection accuracy for inshore ships is improved by 6.61%, realizing improved detection of inshore ships in SAR images.

Description

Scene perception data enhancement method for SAR ship detection
Technical Field
The invention belongs to the technical field of Synthetic Aperture Radar (SAR) image interpretation, and relates to a scene perception data enhancement method for SAR ship detection.
Background
Synthetic Aperture Radar (SAR) is a high-resolution active microwave imaging radar with all-day, all-weather operating capability. Compared with optical sensors, the electromagnetic waves emitted by SAR can penetrate cloud, vegetation, and other complex environmental cover, and are unaffected by the illumination of the detection area, so SAR is widely used in civil and military fields. See the literature "Application research of synthetic aperture radar in ship target positioning and imaging technology [J]. Ship Science and Technology, 2019, 41(02): 152-154."
In recent years, ship detection in SAR images has become a research hotspot, since it enables convenient marine traffic management, ship oil-spill monitoring, disaster rescue at sea, and the like. Ships in SAR images are important and valuable targets; particularly in the national defense and military field, detecting them can effectively protect national maritime rights and interests and provides an effective means of resolving maritime disputes. In particular, SAR operation is unaffected by time of day and climatic conditions, making it especially suitable for changeable marine environments and thus compensating for the shortcomings of optical sensors. See the literature "Meng Fanchao, Bao Yong. Application of synthetic aperture radar in high resolution monitoring and mapping of ship targets [J]. Ship Science and Technology, 2018, 40(22): 157-159."
Numerous SAR image ship detection algorithms have been proposed so far, the most common and efficient being the various CFAR-based detection algorithms. CFAR uses a pre-established sea clutter model, retrieves the image with a sliding window, and decides whether a ship is present according to a detection threshold provided by the sea clutter model; common sea clutter models are based on the Gaussian distribution, the Rayleigh distribution, the K distribution, and so on. However, because the sea surface background is influenced by the surrounding environment and weather, the background clutter distribution model is difficult to fit to the real background clutter distribution, so CFAR is difficult to apply in more complex scenes. See the literature "Yang Xuezhi, Song Hui, Du Yang, Zhang Xi, Meng Junmin. Rice-CFAR based SAR image ship detection [J]. Journal of Hefei University of Technology (Natural Science Edition), 2015, 38(04): 463-467."
With the development of artificial intelligence, deep learning has been applied to the field of SAR image ship detection. Deep-learning-based methods mainly use deep convolutional neural networks to extract ship features automatically, fit the mathematical distribution of the data through training, and obtain the coordinates of ships in the SAR image by regression, achieving higher accuracy than the various CFAR-based detection algorithms. Object detectors from the computer vision field, such as Fast R-CNN, Faster R-CNN, YOLO, RetinaNet, and so on, have been successfully applied to SAR image ship detection. However, since inshore areas have strong backscattering, the detection accuracy for inshore ships is significantly lower than that for offshore ships.
Although CNN-based SAR ship detectors outperform traditional detection methods, the detection accuracy for inshore ships remains difficult to improve because of the imbalance of sample scenes. To balance the numbers of inshore and offshore samples, a Balanced Scene Learning Mechanism (BSLM) for inshore and offshore ship detection in SAR images has been proposed. The method is based on unsupervised learning and uses a generative adversarial network (GAN) to extract scene features from the SAR images; with these features, binary scene clustering (inshore/offshore) is performed by k-means; finally, the minority (inshore) samples are enhanced by copying, rotation transformation, or adding noise so as to balance the two scene classes, thereby eliminating the scene learning bias, obtaining a balanced learning representation capability, and improving the learning benefit and detection accuracy. See the literature "T. Zhang et al., Balance Scene Learning Mechanism for Offshore and Inshore Ship Detection in SAR Images, in IEEE Geoscience and Remote Sensing Letters, doi: 10.1109/LGRS.2020.3033988."
Therefore, to address the insufficient detection accuracy of traditional methods for inshore ships in SAR images, the invention proposes a scene perception data enhancement method for SAR ship detection.
Disclosure of Invention
The invention belongs to the technical field of Synthetic Aperture Radar (SAR) image interpretation, and discloses a scene perception data enhancement method for SAR ship detection. The method is based on deep learning theory and mainly comprises four parts: a scene classification convolutional neural network, scene classification, scene augmentation, and the classical detection network Faster R-CNN. The invention makes certain improvements to the classical convolutional neural network VGG-11 to make it better suited to SAR images; this network then classifies the images in the training set into inshore training samples and offshore training samples; scene augmentation then yields inshore and offshore training samples balanced in number; the classical detection network is trained on the processed data set, performs the detection task, and the detection results are evaluated. Finally, the overall detection accuracy of a Faster R-CNN ship detection network using the method is 1.95% higher than that of the prior-art Faster R-CNN ship detection network, and the detection accuracy for inshore ships is improved by 6.61%, realizing improved detection accuracy for inshore ships in SAR images.
For convenience in describing the present invention, the following terms are first defined:
definition 1: SSDD data set acquisition method
The SSDD data set is the SAR Ship Detection Dataset, the first publicly available SAR image ship detection data set. SSDD data come mainly from the RadarSat-2, TerraSAR-X, and Sentinel-1 sensors and contain data in the four polarizations HH, HV, VV, and VH. The observation scenes of the SSDD data set comprise open-sea and inshore areas, with 1160 images of 500×500 pixels and 2551 ships in total, an average of 2.20 ships per image; the ships differ in scale, distribution position, resolution, and so on, so the ship targets are diverse. The method for acquiring the SSDD data set is detailed in the literature "Li Jianwei, Qu Changwen, Peng Shujuan, Deng Bing. SAR image ship target detection based on convolutional neural networks [J]. Systems Engineering and Electronics, 2018, 40(09): 1953-1959."
Definition 2: classical convolutional neural networks
A classical convolutional neural network generally consists of an input layer, hidden layers, and an output layer. The input layer can process multidimensional data; in the computer vision field it is generally assumed to take three-dimensional input, i.e., two-dimensional pixels in the plane plus RGB channels. In image detection and recognition, the output layer typically uses a logistic function or a normalized exponential function to output classification labels and the corresponding box coordinate values. The hidden layers comprise convolutional layers, nonlinear activation functions, pooling layers, and fully connected layers: a convolutional layer abstracts high-dimensional features from small rectangular regions of the input features; a pooling layer shrinks the feature matrix and thereby reduces the parameters in the subsequent network; a fully connected layer, equivalent to a hidden layer in a traditional feedforward neural network, takes the abstracted high-dimensional features as input to carry out classification and detection tasks. Classical convolutional neural network methods are detailed in the literature "Hu Fuyuan, Li Linyan, Shang Xinru, Shen Junyu, Dai Yongliang. Overview of target detection algorithms based on convolutional neural networks [J]. Journal of Suzhou University of Science and Technology (Natural Science Edition), 2020, 37(02): 1-10+25."
Definition 3: standard fully connected layer method
The fully connected layer is a part of a convolutional neural network; its input and output sizes are fixed, and each node is connected to all nodes of the previous layer, integrating the features extracted by the preceding layers. The fully connected layer method is detailed in "Haoren Wang, Haotian Shi, Ke Lin, Chengjin Qin, Liqun Zhao, Yixiang Huang, Chengliang Liu. A high-precision arrhythmia classification method based on dual fully connected neural network [J]. Biomedical Signal Processing and Control, 2020, 58."
Definition 4: convolution kernel
A convolution kernel is a node that weights and sums the values within a small rectangular region of an input feature map or picture to produce an output. Each convolution kernel requires several manually specified parameters. One kind of parameter is the length and width of the node matrix processed by the kernel, which is also the size of the convolution kernel. The other kind is the depth of the unit node matrix obtained by the processing, which is also the depth of the convolution kernel. During the convolution operation, each kernel slides over the input data, the inner product of the whole kernel with the corresponding positions of the input data is computed, a nonlinear function then gives the final result, and the results of all positions form a two-dimensional feature map. Each convolution kernel generates one two-dimensional feature map, and the feature maps generated by several kernels are stacked into a three-dimensional feature map. The convolution kernel method is detailed in "Fan Lili, Zhao Hongwei, Zhao Haoyu, Hu Huangshui, Wang Zhen. Overview of object detection studies based on deep convolutional neural networks [J]. Optics and Precision Engineering, 2020, 28(05): 1152-1164."
Definition 5: traditional IoU intersection-over-union calculation method
The IoU score is a standard performance measure for object segmentation problems. Given a set of images, IoU measures the similarity between the predicted region and the ground-truth region of an object present in the images, and is defined by the formula

IoU(X) = I(X) / U(X),

where I(X) and U(X) denote the intersection and union of the predicted bounding box and the ground-truth bounding box, respectively. The traditional IoU calculation method is detailed in the literature "Rahman M A, Wang Y. Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation [M]// Advances in Visual Computing. Springer International Publishing, 2016: 234-244."
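For illustration, the following is a minimal Python sketch of this IoU computation for two axis-aligned boxes; the (x1, y1, x2, y2) box format and the function name iou are assumptions for illustration, not taken from the cited reference.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    # Intersection rectangle I(X)
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # Union U(X) = |A| + |B| - |I|
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.143
```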
Definition 6: standard ReLU function activation method
The standard ReLU function, in full the Rectified Linear Unit (ReLU), also called the rectified linear unit, is a commonly used activation function in artificial neural networks, typically the ramp function and its variants. Its expression is

f(x) = max(0, x).

The function is constant 0 on the negative half-axis and monotonically increasing on the positive half-axis, which increases sparsity in the neural network. The standard ReLU function activation method is detailed at "https://www.cnblogs.com/makefile/p/activation-function.html".
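As a small illustration of the expression above, the following Python snippet (using NumPy, an assumption of this sketch) shows that ReLU clamps negative inputs to 0 and passes positive inputs through:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu = np.maximum(0.0, x)       # f(x) = max(0, x)
print(relu)                     # [0.  0.  0.  1.5 3. ]
```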
Definition 7: standard batch normalization method
The standard batch normalization (Batch Normalization, BN) method unifies scattered data and serves to make it easier for the network to learn regularities in the data. BN is generally treated as a layer added before the activation function so that the range of the input x is reduced, which mitigates overfitting to a certain extent. Standard batch normalization methods are detailed at "https://www.cnblogs.com/tune-lee/p/11989612.html".
Definition 8: standard maximum pooling method
The standard max pooling (Max Pooling) method takes the maximum value within a local receptive field; it is mainly used to reduce model size, speed up computation, and improve the robustness of the extracted features. The standard max pooling method is detailed at "https://blog.csdn.net/weixin_43336281/article/details/102149468".
Definition 9: standard softmax method
The standard softmax method is the generalization of the logistic regression model to multi-class problems. Its expression is

S_i = e^{V_i} / Σ_j e^{V_j}, j = 1, ..., C,

where V_i is the output of the classifier's preceding stage for class i, i is the class index, C is the total number of classes, and S_i is the ratio of the exponential of the current element to the sum of the exponentials of all elements. The softmax output characterizes the values as relative probabilities between the different classes. Standard softmax methods are detailed at "https://blog.csdn.net/qq_32642107/article/details/97270994".
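For illustration, a minimal Python sketch of the softmax above; subtracting max(v) before exponentiation is a common numerical-stability step added here, not part of the formula itself:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))   # shift for numerical stability
    return e / np.sum(e)        # S_i = e^{V_i} / sum_j e^{V_j}

scores = np.array([2.0, 1.0, 0.1])   # outputs V_i for C = 3 classes
print(softmax(scores))               # relative probabilities, summing to 1
```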
Definition 10: standard VGG-11 network
The standard VGG-11 network is the VGG network with 11 weight layers. It is the feature-extraction part of the network; it combines different modules, comprising several convolutional layers and pooling layers, and can automatically extract useful feature information through training. See the literature "Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition [J]. Computer Science, 2014."
Definition 11: classical stochastic gradient descent algorithm
The classical stochastic gradient descent (SGD) algorithm is an optimization algorithm that minimizes the loss function built over the original model to find the optimal parameters. It computes the loss function and its gradient on each sample to update the parameters, so the computation is fast. The classical stochastic gradient descent algorithm is detailed at "https://blog.csdn.net/qq_38150441/article/details/80533891".
Definition 12: recall and precision calculation method
Recall R is the proportion of all positive samples that are predicted correctly, expressed as

R = TP / (TP + FN).

Precision P is the proportion of samples predicted as positive that are actually correct, expressed as

P = TP / (TP + FP),

where TP (true positive) is a positive sample correctly predicted as positive; FN (false negative) is a positive sample wrongly predicted as negative; FP (false positive) is a negative sample wrongly predicted as positive. The precision-recall curve P(R) is the function with R as the independent variable and P as the dependent variable. The above quantities are detailed in the literature "Li Hang. Statistical Learning Methods [M]. Beijing: Tsinghua University Press, 2012."
Definition 13: standard mAP index precision evaluation method
mAP refers to mean average precision (mean Average Precision). In the field of object detection, mAP is used to measure the accuracy of a detection model. Its calculation formula is

mAP = ∫₀¹ P(R) dR,

where P is the precision and R is the recall. The standard mAP index accuracy evaluation method is detailed at "https://www.cnblogs.com/zongfa/p/9783972.html".
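For illustration, a minimal Python sketch computing P, R, and an approximate AP from TP/FP/FN counts and a sampled P(R) curve; the counts and the trapezoidal approximation of the integral are assumptions of this sketch:

```python
import numpy as np

TP, FP, FN = 80, 10, 20
P = TP / (TP + FP)              # precision
R = TP / (TP + FN)              # recall
print(P, R)                     # 0.888..., 0.8

# AP = integral of P(R) dR, approximated over sampled (R, P) pairs
recall = np.array([0.0, 0.2, 0.4, 0.6, 0.8])
precision = np.array([1.0, 0.95, 0.9, 0.85, 0.8])
print(np.trapz(precision, recall))   # trapezoidal approximation of AP
```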
Definition 14: prior Art Faster R-CNN
The prior-art Faster R-CNN is a target detection network. The network consists of two modules: the first is a region proposal network for proposing positions where targets may appear, and the second is the Fast R-CNN network for classifying the targets and performing box regression. The method of building the prior-art Faster R-CNN network is detailed in "Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149."
Definition 15: classical data enhancement method
The classical data enhancement method is a method of generating new training samples: random perturbations are added to the original data while keeping the class labels unchanged, thereby producing more training samples. Data enhancement strengthens the generalization of the network and improves its various metrics. Common data enhancement operations include flipping, rotating, scaling, cropping, and so on. Classical data enhancement methods are detailed at "https://blog.csdn.net/u010801994/article/details/81914716".
Definition 16: standard forward propagation method
The standard forward propagation method is the most basic method in deep learning; it performs forward inference on the input according to the parameters and connections in the network, yielding the network output. The standard forward propagation method is detailed at "https://www.jianshu.com/p/f30c8daebebb".
Definition 17: standard non-maximum suppression method
The standard non-maximum suppression method is an algorithm used in the target detection field to remove redundant detection boxes. In the forward propagation results of a classical detection network, the same target often corresponds to multiple detection boxes, so an algorithm is needed to select the single best, highest-scoring detection box among the multiple boxes of the same target. Non-maximum suppression performs this local-maximum search by thresholding the overlap ratio. Standard non-maximum suppression methods are detailed at "https://www.cnblogs.com/makefile/p/nms.html".
Definition 18: standard image mirroring method
Standard image mirroring methods are divided into horizontal mirroring and vertical mirroring. Horizontal mirroring swaps the left and right halves of the image about its vertical central axis; vertical mirroring swaps the upper and lower halves about its horizontal central axis. The standard image mirroring method is detailed at "https://blog.csdn.net/qq_30708445/article/details/87881362".
Definition 19: standard data aggregation method
The standard data aggregation method merges data from different sources, including merging and renaming pictures and labels, before further data processing and analysis. The standard data aggregation method is detailed at "https://zhuanlan.zhihu.com/p/97074949".
The invention provides a scene perception data enhancement method for SAR ship detection, the whole flow is shown in figure 1, and the method comprises the following steps:
step 1, preparing a data set
The SSDD data set is obtained according to the method for acquiring the SSDD data set in definition 1; the images whose file-name numbers end in 1 or 9 are selected as the Test set, and the remaining images form the training set, denoted Train. The SAR images in the training set Train are labelled and divided into two classes, inshore scenes and offshore scenes, giving a new training set, denoted new_Train.
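For illustration, a minimal Python sketch of this split; the SSDD/JPEGImages directory layout and numeric .jpg file names are assumptions, not specified by the patent:

```python
from pathlib import Path

train, test = [], []
for img in sorted(Path("SSDD/JPEGImages").glob("*.jpg")):
    # Images whose file-name number ends in 1 or 9 go to the Test set
    (test if img.stem[-1] in ("1", "9") else train).append(img)

print(len(train), len(test))
```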
Step 2, establishing a scene classification network
An input layer is defined according to the classical convolutional neural network method in definition 2, denoted L1, which takes a 224×224×1 SAR image as input;
Taking the input layer L1 as input, a convolutional layer C1 is constructed according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3×3×64, stride 1;
The convolutional layer C1 is activated using the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C1_act;
The activated convolutional layer C1_act is batch-normalized using the standard batch normalization method in definition 7 to obtain a 224×224×64-dimensional vector, denoted L2;
Taking the 224×224×64-dimensional vector L2 as input, L2 is max-pooled with size 2×2 using the standard max pooling method in definition 8 to obtain a 112×112×64-dimensional vector, denoted L3;
Taking the 112×112×64-dimensional vector L3 as input, a convolutional layer C2 is constructed according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3×3×128, stride 1;
The convolutional layer C2 is activated using the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C2_act;
The activated convolutional layer C2_act is batch-normalized using the standard batch normalization method in definition 7 to obtain a 112×112×128-dimensional vector, denoted L4;
Taking the 112×112×128-dimensional vector L4 as input, L4 is max-pooled with size 2×2 using the standard max pooling method in definition 8 to obtain a 56×56×128-dimensional vector, denoted L5;
Taking the 56×56×128-dimensional vector L5 as input, a convolutional layer C3 is constructed according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3×3×256, stride 1;
The convolutional layer C3 is activated using the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C3_act;
The activated convolutional layer C3_act is batch-normalized using the standard batch normalization method in definition 7 to obtain a 56×56×256-dimensional vector, denoted L6;
Taking the 56×56×256-dimensional vector L6 as input, a convolutional layer C4 is constructed according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3×3×256, stride 1;
The convolutional layer C4 is activated using the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C4_act;
The activated convolutional layer C4_act is batch-normalized using the standard batch normalization method in definition 7 to obtain a 56×56×256-dimensional vector, denoted L7;
Taking the 56×56×256-dimensional vector L7 as input, L7 is max-pooled with size 2×2 using the standard max pooling method in definition 8 to obtain a 28×28×256-dimensional vector, denoted L8;
Taking the 28×28×256-dimensional vector L8 as input, a convolutional layer C5 is constructed according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3×3×512, stride 1;
The convolutional layer C5 is activated using the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C5_act;
The activated convolutional layer C5_act is batch-normalized using the standard batch normalization method in definition 7 to obtain a 28×28×512-dimensional vector, denoted L9;
Taking the 28×28×512-dimensional vector L9 as input, a convolutional layer C6 is constructed according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3×3×512, stride 1;
The convolutional layer C6 is activated using the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C6_act;
The activated convolutional layer C6_act is batch-normalized using the standard batch normalization method in definition 7 to obtain a 28×28×512-dimensional vector, denoted L10;
Taking the 28×28×512-dimensional vector L10 as input, L10 is max-pooled with size 2×2 using the standard max pooling method in definition 8 to obtain a 14×14×512-dimensional vector, denoted L11;
Taking the 14×14×512-dimensional vector L11 as input, a convolutional layer C7 is constructed according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3×3×512, stride 1;
The convolutional layer C7 is activated using the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C7_act;
The activated convolutional layer C7_act is batch-normalized using the standard batch normalization method in definition 7 to obtain a 14×14×512-dimensional vector, denoted L12;
Taking the 14×14×512-dimensional vector L12 as input, a convolutional layer C8 is constructed according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3×3×512, stride 1;
The convolutional layer C8 is activated using the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C8_act;
The activated convolutional layer C8_act is batch-normalized using the standard batch normalization method in definition 7 to obtain a 14×14×512-dimensional vector, denoted L13;
Taking the 14×14×512-dimensional vector L13 as input, L13 is max-pooled with size 2×2 using the standard max pooling method in definition 8 to obtain a 7×7×512-dimensional vector, denoted L14;
Taking the 7×7×512-dimensional vector L14 as input, a fully connected layer of size 1×4096 is constructed using the standard fully connected layer method in definition 3, denoted FC1;
Taking FC1 as input, a fully connected layer of size 1×4096 is constructed using the standard fully connected layer method in definition 3, denoted FC2;
Taking FC2 as input, a fully connected layer of size 1×N_class is constructed using the standard fully connected layer method in definition 3, where N_class is the number of scene categories; this layer is denoted FC-N_class;
At this point the scene classification network is fully constructed; it is denoted Modified-VGG_pre.
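For illustration, the following is a minimal PyTorch sketch of the Modified-VGG_pre network described above. Two details are inferred rather than stated in the steps: padding=1 for the 3×3 convolutions (required to preserve the stated feature-map sizes) and ReLU activations after FC1 and FC2 (standard VGG practice); the conv, ReLU, batch-norm ordering follows the order in which the steps are written.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # One conv step of the text: 3x3 conv (stride 1) -> ReLU -> batch norm
    return [nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(c_out)]

class ModifiedVGG(nn.Module):
    def __init__(self, n_class=2):                    # inshore / offshore
        super().__init__()
        self.features = nn.Sequential(
            *block(1, 64), nn.MaxPool2d(2),                       # C1; L2 -> L3
            *block(64, 128), nn.MaxPool2d(2),                     # C2; L4 -> L5
            *block(128, 256), *block(256, 256), nn.MaxPool2d(2),  # C3, C4 -> L8
            *block(256, 512), *block(512, 512), nn.MaxPool2d(2),  # C5, C6 -> L11
            *block(512, 512), *block(512, 512), nn.MaxPool2d(2),  # C7, C8 -> L14
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                  # 7x7x512 vector L14
            nn.Linear(7 * 7 * 512, 4096), nn.ReLU(inplace=True),  # FC1
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # FC2
            nn.Linear(4096, n_class),                             # FC-N_class
        )

    def forward(self, x):              # x: (B, 1, 224, 224) SAR image
        return self.classifier(self.features(x))

net = ModifiedVGG(n_class=2)
print(net(torch.zeros(1, 1, 224, 224)).shape)   # torch.Size([1, 2])
```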
Step 3, training scene classification network
Taking the new training set new_Train obtained in step 1 as input, the classical stochastic gradient descent algorithm in definition 11 is used to train and optimize the scene classification network Modified-VGG_pre built in step 2, giving the trained and optimized scene classification network, denoted Modified-VGG.
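For illustration, a minimal SGD training sketch for this step, reusing the ModifiedVGG class from the sketch above; the learning rate, momentum, epoch count, and the in-memory stand-in for new_Train are placeholders, not values from the patent:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the labelled scene training set new_Train:
# eight single-channel 224x224 images with inshore/offshore labels.
data = TensorDataset(torch.randn(8, 1, 224, 224), torch.randint(0, 2, (8,)))
scene_loader = DataLoader(data, batch_size=4, shuffle=True)

net = ModifiedVGG(n_class=2)                  # from the sketch above
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()             # softmax loss on FC-N_class

for epoch in range(20):                       # placeholder epoch count
    for images, labels in scene_loader:
        optimizer.zero_grad()
        loss = criterion(net(images), labels)
        loss.backward()                       # per-batch gradient (SGD)
        optimizer.step()
```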
Step 4, scene classification is carried out
The training set Train is taken as input and classified with the scene classification network Modified-VGG obtained in step 3; all pictures in Train are divided into two classes, the first class being inshore scenes, denoted Data1, and the second class being offshore scenes, denoted Data2.
Step 5, scene augmentation is carried out
According to the classification results Data1 and Data2 obtained in step 4, define the number of pictures in Data1 as M1 and the number of pictures in Data2 as M2.
If M1 < M2, the standard image mirroring method in definition 18 is used to randomly select M2 - M1 pictures in the first-class inshore scene set Data1 and mirror them, giving M2 - M1 mirrored pictures, denoted extra_Data1. The standard data aggregation method in definition 19 is then used to merge the M2 - M1 mirrored pictures extra_Data1 with the first-class inshore scene set Data1 to obtain a new inshore scene data set, denoted new_Data1. Define new_Data2 = Data2.
If M1 > M2, the standard image mirroring method in definition 18 is used to randomly select M1 - M2 pictures in the second-class offshore scene set Data2 and mirror them, giving M1 - M2 mirrored pictures, denoted extra_Data2. The standard data aggregation method in definition 19 is then used to merge the M1 - M2 mirrored pictures extra_Data2 with the second-class offshore scene set Data2 to obtain a new offshore scene data set, denoted new_Data2. Define new_Data1 = Data1.
A new data set new_Data = {new_Data1, new_Data2} is defined.
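For illustration, a minimal Python sketch of this balancing step, assuming the step-4 results are two directories, Data1/ (inshore) and Data2/ (offshore), of JPEG files; the directory names, the aug/ output folder, and the helper mirror_extra are assumptions of this sketch:

```python
import random
from pathlib import Path
from PIL import Image, ImageOps

def mirror_extra(paths, n, out_dir="aug"):
    """Randomly pick n images (n <= len(paths) assumed), horizontally
    mirror them per definition 18, and save them as extra_* copies."""
    out_dir = Path(out_dir)
    out_dir.mkdir(exist_ok=True)
    extra = []
    for p in random.sample(paths, n):
        dst = out_dir / f"extra_{p.name}"
        ImageOps.mirror(Image.open(p)).save(dst)   # horizontal mirror
        extra.append(dst)
    return extra

data1 = sorted(Path("Data1").glob("*.jpg"))    # inshore scenes, M1 images
data2 = sorted(Path("Data2").glob("*.jpg"))    # offshore scenes, M2 images

if len(data1) < len(data2):                    # M1 < M2: augment inshore
    new_data1 = data1 + mirror_extra(data1, len(data2) - len(data1))
    new_data2 = data2
else:                                          # M1 >= M2: augment offshore
    new_data1 = data1
    new_data2 = data2 + mirror_extra(data2, len(data1) - len(data2))

new_data = new_data1 + new_data2               # balanced new_Data set
```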
Step 6, performing experimental verification on a classical model
Step 6.1, data enhancement
Taking the new data set new_Data obtained in step 5 as input, the classical data enhancement method in definition 15 is used to enhance new_Data, giving the data-enhanced SAR image detection training set, denoted DetTrain.
Step 6.2, establishing a network
Establishing an untrained Faster R-CNN network by adopting a classical Faster R-CNN method in definition 14;
step 6.3 training network
Initialize the image batch size of the untrained network obtained in step 6.2, denoted batchsize;
Initialize the learning rate of the untrained network, denoted eta;
Initialize the weight decay rate and the momentum of the untrained network's training parameters, denoted DC and MM respectively;
Initialize the random parameters of the untrained Faster R-CNN network obtained in step 6.2, the initialized parameters being denoted W;
The untrained Faster R-CNN network is trained with the training set DetTrain obtained in step 6.1, using the classical stochastic gradient descent algorithm in definition 11, to obtain the loss value of the network, denoted loss.
When the loss value of the network falls below the ideal loss value, training is stopped and the new network parameters, denoted new_W, are obtained.
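For illustration, a minimal sketch of this training loop with the loss-threshold stopping rule; torchvision's fasterrcnn_resnet50_fpn is used only as a stand-in for the Faster R-CNN of definition 14, and eta, MM, DC, the ideal loss value, and the one-batch dummy data are all placeholders:

```python
import torch
import torchvision

eta, MM, DC = 0.005, 0.9, 5e-4                 # placeholder lr / momentum / decay
ideal_loss = 0.1                               # placeholder ideal loss value

# Stand-in detector (background + ship); not the patent's exact network.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=eta,
                            momentum=MM, weight_decay=DC)

# One-batch dummy stand-in for DetTrain: a list of image tensors and a
# list of target dicts with "boxes" and "labels", as torchvision expects.
images = [torch.rand(3, 512, 512)]
targets = [{"boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]),
            "labels": torch.tensor([1])}]
det_batches = [(images, targets)]

model.train()
stop = False
for epoch in range(100):                       # safety cap for this sketch
    if stop:
        break
    for imgs, tgts in det_batches:
        loss = sum(model(imgs, tgts).values())  # RPN + detection-head losses
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < ideal_loss:            # stop below the ideal loss
            stop = True
            break
```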
Step 6.4, evaluation of detection results
Taking the new network parameters new_W obtained in step 6.3 and the Test set obtained in step 1 as inputs, the standard forward propagation method in definition 16 is used to obtain the detection result of the Faster R-CNN based ship detection network, denoted Result.
Taking the detection result Result of the Faster R-CNN based ship detection network as input, the standard non-maximum suppression method in definition 17 is used to remove redundant boxes in the detection result and obtain the highest-scoring detection boxes, with the following specific steps:
(1) First, let the highest-scoring box in the detection result Result be denoted BS;
(2) Then, the IoU calculation method in definition 5 is used to compute the IoU between each remaining box in Result and BS; the boxes with IoU > 0.5 are discarded, and the boxes remaining in Result are denoted RB;
(3) The highest-scoring box BS is then selected from RB;
The IoU calculation and discarding process of step (2) is repeated until no box can be discarded; the boxes that finally remain constitute the final detection result, denoted RR.
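For illustration, a minimal Python sketch of steps (1) to (3), reusing the iou() function from the sketch under definition 5; boxes are (x1, y1, x2, y2, score) tuples, a format assumed for illustration:

```python
def nms(boxes, thresh=0.5):
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    keep = []
    while boxes:
        bs = boxes.pop(0)                      # (1) highest-scoring box BS
        keep.append(bs)
        # (2) discard remaining boxes with IoU(box, BS) > thresh; rest = RB
        boxes = [b for b in boxes if iou(b[:4], bs[:4]) <= thresh]
        # (3) the next loop iteration picks the new BS from RB
    return keep                                # final detection result RR
```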
Taking the detection result RR of the Faster R-CNN network obtained in the above steps as input, the recall and precision calculation method in definition 12 is used to compute the precision P, the recall R, and the precision-recall curve P(R) of the Faster R-CNN detection; the standard mAP index accuracy evaluation method in definition 13 is used to compute the mean average precision mAP of the Faster R-CNN network.
The innovation of the invention is that a scene classification model built with a convolutional neural network is used for data enhancement, improving the detection accuracy of inshore ships in SAR images. The method classifies the training set into inshore samples and offshore samples so as to balance their numbers, giving the ship detection model a better capability for detecting inshore ships: the overall detection accuracy of a Faster R-CNN ship detection network using the method is 1.95% higher than that of the prior-art Faster R-CNN ship detection network, and the detection accuracy for inshore ships is improved by 6.61%.
The advantage of the invention is that it improves the detection accuracy of inshore ships in SAR images, overcoming the insufficient inshore-ship detection accuracy of the prior art while also raising the overall detection accuracy to a certain extent.
Drawings
Fig. 1 is a schematic flow chart of a scene perception data enhancement method for SAR ship detection in the present invention.
Fig. 2 is a schematic diagram of a scene classification network structure of a scene perception data enhancement method for SAR ship detection in the present invention.
Fig. 3 shows the detection accuracy of the scene perception data enhancement method for SAR ship detection in the present invention.
Detailed Description
The specific embodiment of the invention is carried out on the SSDD data set in accordance with steps 1 to 6.4 exactly as set forth above in the Disclosure of Invention.

Claims (1)

1. A scene perception data enhancement method for SAR ship detection is characterized by comprising the following steps:
step 1, preparing a data set
Obtaining the SSDD data set using the method for acquiring the SSDD data set; selecting the images whose file-name numbers end in 1 or 9 as the Test set, the remaining images forming the training set, denoted Train; labelling the SAR images in the training set Train and dividing them into inshore scenes and offshore scenes to obtain a new training set, denoted new_Train;
Step 2, establishing a scene classification network
Defining an input layer by adopting a classical convolutional neural network method, marking as L1, and inputting SAR images with 224 multiplied by 1;
taking an input layer L1 as input, constructing a convolution layer C1 by adopting a classical convolution neural network method, and setting convolution kernel parameters: the size is set to 3×3×64, and the step size is set to 1;
activating the convolution layer C1 by adopting a standard ReLU function activation method to obtain an activated convolution layer C1 act
The activated convolution layer C1 is subjected to a standard batch normalization method act Carrying out batch normalization processing to obtain 224 multiplied by 64 dimensional vectors, and marking the vectors as L2;
taking 224 multiplied by 64 vector L2 as input, carrying out maximum pooling on L2 with the size of 2 multiplied by 2 by adopting a standard maximum pooling method to obtain 112 multiplied by 64 vector, and marking the vector as L3;
taking a 112 multiplied by 64 vector L3 as an input, constructing a convolution layer C2 according to a classical convolution neural network method, and setting convolution kernel parameters: the size is set to 3×3×128, and the step size is set to 1;
activating the convolution layer C2 by adopting a standard ReLU function activation method to obtain an activated convolution layer C2 act
The activated convolution layer C2 is subjected to a standard batch normalization method act Carrying out batch normalization processing to obtain 112×112×128-dimensional vectors, and marking the vectors as L4;
Taking a 112×112×128-dimensional vector L4 as an input, and carrying out maximum pooling on the L4 with a size of 2×2 by adopting a standard maximum pooling method to obtain a 56×56×128-dimensional vector, which is denoted as L5;
taking a vector L5 with 56 multiplied by 128 as an input, constructing a convolution layer C3 according to a classical convolution neural network method, and setting convolution kernel parameters: the size is set to 3×3×256, and the step size is set to 1;
activating the convolution layer C3 by adopting a standard ReLU function activation method to obtain an activated convolution layer C3 act
The activated convolution layer C3 is subjected to a standard batch normalization method act Carrying out batch normalization processing to obtain 56×56×256-dimensional vectors, and marking the vectors as L6;
taking the 56×56×256 vector L6 as input, constructing a convolution layer C4 by adopting a classical convolutional neural network method, and setting the convolution kernel parameters: the size is set to 3×3×256, and the step size is set to 1;
activating the convolution layer C4 by adopting a standard ReLU function activation method to obtain an activated convolution layer C4_act;
subjecting the activated convolution layer C4_act to standard batch normalization to obtain a 56×56×256-dimensional vector, denoted L7;
taking the 56×56×256 vector L7 as input, performing 2×2 maximum pooling on L7 by adopting a standard maximum pooling method to obtain a 28×28×256 vector, denoted L8;
taking the 28×28×256 vector L8 as input, constructing a convolution layer C5 by adopting a classical convolutional neural network method, and setting the convolution kernel parameters: the size is set to 3×3×512, and the step size is set to 1;
activating the convolution layer C5 by adopting a standard ReLU function activation method to obtain an activated convolution layer C5_act;
subjecting the activated convolution layer C5_act to standard batch normalization to obtain a 28×28×512-dimensional vector, denoted L9;
taking the 28×28×512-dimensional vector L9 as input, constructing a convolution layer C6 by adopting a classical convolutional neural network method, and setting the convolution kernel parameters: the size is set to 3×3×512, and the step size is set to 1;
activating the convolution layer C6 by adopting a standard ReLU function activation method to obtain an activated convolution layer C6_act;
subjecting the activated convolution layer C6_act to standard batch normalization to obtain a 28×28×512 vector, denoted L10;
taking the 28×28×512 vector L10 as input, performing 2×2 maximum pooling on L10 by adopting a standard maximum pooling method to obtain a 14×14×512 vector, denoted L11;
taking the 14×14×512 vector L11 as input, constructing a convolution layer C7 by adopting a classical convolutional neural network method, and setting the convolution kernel parameters: the size is set to 3×3×512, and the step size is set to 1;
activating the convolution layer C7 by adopting a standard ReLU function activation method to obtain an activated convolution layer C7_act;
subjecting the activated convolution layer C7_act to standard batch normalization to obtain a 14×14×512-dimensional vector, denoted L12;
taking the 14×14×512-dimensional vector L12 as input, constructing a convolution layer C8 by adopting a classical convolutional neural network method, and setting the convolution kernel parameters: the size is set to 3×3×512, and the step size is set to 1;
activating the convolution layer C8 by adopting a standard ReLU function activation method to obtain an activated convolution layer C8_act;
subjecting the activated convolution layer C8_act to standard batch normalization to obtain a 14×14×512-dimensional vector, denoted L13;
taking the 14×14×512 vector L13 as input, performing 2×2 maximum pooling on L13 by adopting a standard maximum pooling method to obtain a 7×7×512 vector, denoted L14;
taking the 7×7×512 vector L14 as input, constructing a fully connected layer of size 1×1×4096 by adopting a standard fully connected layer method, denoted FC1;
taking FC1 as input, constructing a fully connected layer of size 1×1×4096 by adopting a standard fully connected layer method, denoted FC2;
taking FC2 as input, constructing a fully connected layer of size 1×1×N_class by adopting a standard fully connected layer method, where N_class is the number of scene categories, denoted FC-N_class;
So far the scene classification network has been constructed; it is denoted Modified-VGG_pre;
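For reference, the construction of layers L1-L14 and FC1 through FC-N_class above corresponds to a VGG-11-style backbone adapted to single-channel 224×224 SAR input; a minimal PyTorch sketch under that reading follows (the layer grouping and the num_classes parameter are assumptions, not claim language).

```python
import torch
import torch.nn as nn

def conv_relu_bn(c_in, c_out):
    """One unit of the claim: 3x3 convolution (stride 1), ReLU activation, batch norm."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(c_out),
    )

class ModifiedVGG(nn.Module):
    def __init__(self, num_classes=2):                 # N_class scene categories
        super().__init__()
        self.features = nn.Sequential(
            conv_relu_bn(1, 64),    nn.MaxPool2d(2),   # L1-L3:   224 -> 112
            conv_relu_bn(64, 128),  nn.MaxPool2d(2),   # L4-L5:   112 -> 56
            conv_relu_bn(128, 256),
            conv_relu_bn(256, 256), nn.MaxPool2d(2),   # L6-L8:   56  -> 28
            conv_relu_bn(256, 512),
            conv_relu_bn(512, 512), nn.MaxPool2d(2),   # L9-L11:  28  -> 14
            conv_relu_bn(512, 512),
            conv_relu_bn(512, 512), nn.MaxPool2d(2),   # L12-L14: 14  -> 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(7 * 7 * 512, 4096), nn.ReLU(inplace=True),  # FC1
            nn.Linear(4096, 4096),        nn.ReLU(inplace=True),  # FC2
            nn.Linear(4096, num_classes),                         # FC-N_class
        )

    def forward(self, x):                              # x: (B, 1, 224, 224) SAR images
        return self.classifier(self.features(x))
```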
Step 3, training the scene classification network
Taking the new training set New_Train obtained in step 1 as input, training and optimizing the scene classification network Modified-VGG_pre established in step 2 by adopting a classical stochastic gradient descent algorithm, to obtain the trained and optimized scene classification network, denoted Modified-VGG;
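A sketch of this training step with classical stochastic gradient descent; the data loader, loss function, and hyper-parameter values are assumptions for illustration.

```python
import torch

# Assumed setup: model = ModifiedVGG(num_classes=2), and a DataLoader
# new_train_loader yielding (image, scene_label) pairs from New_Train.
def train_scene_classifier(model, new_train_loader, epochs=50, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in new_train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()            # stochastic gradient descent update
            optimizer.step()
    return model                       # the trained Modified-VGG
```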
Step 4, performing scene classification
Taking the training set Train as input, classifying all pictures in Train with the scene classification network Modified-VGG obtained in step 3 into two classes: the first class, inshore scenes, denoted Data1, and the second class, offshore scenes, denoted Data2;
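Partitioning Train with the trained classifier might look like the following sketch; the convention that class index 0 denotes inshore is an assumption.

```python
import torch

@torch.no_grad()
def split_by_scene(model, train_images):
    """Split Train into Data1 (inshore) and Data2 (offshore) with Modified-VGG."""
    model.eval()
    data1, data2 = [], []
    for img in train_images:                              # img: (1, 224, 224) tensor
        pred = model(img.unsqueeze(0)).argmax(dim=1).item()
        (data1 if pred == 0 else data2).append(img)       # assume class 0 = inshore
    return data1, data2
```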
Step 5, performing scene augmentation
According to the classification results Data1 and Data2 obtained in step 4, define the number of pictures in Data1 as M1 and the number of pictures in Data2 as M2;
if M1 < M2, randomly selecting M2-M1 pictures from the first-class inshore scene data Data1 and mirroring them by adopting a standard image mirroring method to obtain M2-M1 mirrored pictures, denoted Extra_Data1; then merging the M2-M1 mirrored pictures Extra_Data1 with the first-class inshore scene data Data1 by adopting a standard data set merging method to obtain a new inshore scene data set, denoted New_Data1; defining New_Data2 = Data2;
if M1 > M2, randomly selecting M1-M2 pictures from the second-class offshore scene data Data2 and mirroring them by adopting a standard image mirroring method to obtain M1-M2 mirrored pictures, denoted Extra_Data2; then merging the M1-M2 mirrored pictures Extra_Data2 with the second-class offshore scene data Data2 by adopting a standard data set merging method to obtain a new offshore scene data set, denoted New_Data2; defining New_Data1 = Data1;
defining a new data set New_Data = {New_Data1, New_Data2};
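The balancing rule of step 5 reduces to mirroring randomly chosen images of the smaller class until the two scene classes are equal in number; a sketch under that reading (helper names are assumptions, and the deficit is assumed not to exceed the smaller class size, since sampling is without replacement):

```python
import random
import torch

def balance_scenes(data1, data2):
    """Scene augmentation: equalize inshore (Data1) and offshore (Data2) counts
    by horizontally mirroring randomly selected images of the smaller class."""
    m1, m2 = len(data1), len(data2)
    if m1 < m2:
        extra = [torch.flip(img, dims=[-1])             # mirror operation -> Extra_Data1
                 for img in random.sample(data1, m2 - m1)]
        return data1 + extra, data2                     # New_Data1, New_Data2
    if m1 > m2:
        extra = [torch.flip(img, dims=[-1])             # mirror operation -> Extra_Data2
                 for img in random.sample(data2, m1 - m2)]
        return data1, data2 + extra
    return data1, data2

# New_Data = {New_Data1, New_Data2}
# new_data1, new_data2 = balance_scenes(data1, data2)
```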
Step 6, performing experimental verification on a classical model
Step 6.1, data enhancement
Taking the new data set New_Data obtained in step 5 as input, performing data enhancement on New_Data by adopting a classical data enhancement method to obtain the data-enhanced SAR image detection training set, denoted DetTrain;
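The claim leaves the classical data enhancement method unspecified; one plausible pipeline built with torchvision transforms is sketched below as an assumption, not the patented recipe. Note that for detection training the bounding boxes must be transformed consistently with the images.

```python
import torchvision.transforms as T

# One plausible "classical" augmentation pipeline for building DetTrain;
# the specific transforms and parameters are assumptions.
det_train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.RandomRotation(degrees=10),
    T.ToTensor(),
])
```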
Step 6.2, establishing the network
Establishing an untrained Faster R-CNN network by adopting the classical Faster R-CNN method;
Step 6.3, training the network
Initializing the image batch size of the untrained network obtained in step 6.2, denoted BatchSize;
initializing the learning rate of the untrained network, denoted η;
initializing the weight decay rate and the momentum of the untrained network's training parameters, denoted DC and MM respectively;
initializing the random parameters of the untrained Faster R-CNN network obtained in step 6.2, the initialized parameters being denoted W;
training the untrained Faster R-CNN network with the training set DetTrain obtained in step 6.1 by adopting a classical stochastic gradient descent algorithm, to obtain the loss value of the network, denoted loss;
when the loss value loss of the network is smaller than the ideal loss value, stopping training to obtain the new network parameters, denoted new_W;
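Steps 6.2-6.3 can be sketched with the torchvision Faster R-CNN reference implementation; BatchSize, η, DC, MM, and the ideal loss value below are illustrative assumptions.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Step 6.2: an untrained Faster R-CNN (torchvision >= 0.13 API assumed);
# num_classes=2 means background + ship.
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=2)

# Step 6.3: eta, DC, MM and the ideal loss are illustrative values.
eta, dc, mm, ideal_loss = 1e-3, 5e-4, 0.9, 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=eta, weight_decay=dc, momentum=mm)

def train_detector(model, det_train_loader, max_epochs=100):
    """det_train_loader yields (images, targets): lists of image tensors and
    dicts with 'boxes' and 'labels', as the torchvision detection API expects."""
    model.train()
    for _ in range(max_epochs):
        for images, targets in det_train_loader:
            loss_dict = model(images, targets)   # per-head losses in train mode
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss.item() < ideal_loss:             # stop once below the ideal loss
            break
    return model.state_dict()                    # new_W
```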
Step 6.4, evaluating the detection results
Taking the new network parameters new_W obtained in step 6.3 and the test set Test obtained in step 1 as inputs, obtaining the detection result of the Faster R-CNN-based ship detection network by adopting a standard forward propagation method, denoted Result;
taking the detection result Result obtained by the Faster R-CNN-based ship detection network as input, removing the redundant frames in the detection result by adopting a standard non-maximum suppression method to retain the highest-scoring, non-overlapping detection frames, the specific steps being as follows:
(1) first, denote the frame with the highest score in the detection result Result as BS;
(2) then, compute the intersection-over-union IoU between BS and each of the remaining frames in the detection result Result by adopting the traditional IoU calculation method; after discarding the frames with IoU greater than 0.5, denote the frames remaining in Result as RB;
(3) continue to select the frame with the highest score from RB as the new BS;
repeating the calculating-IoU-and-discarding steps of step (2) until no frame can be discarded; the frames that finally remain are the final detection result, denoted RR;
taking the detection result RR of the Faster R-CNN network obtained in the above steps as input, calculating the precision P, the recall R, and the precision-recall curve P(R) of the Faster R-CNN detections by adopting the recall and precision calculation method; and calculating the mean average precision mAP of the Faster R-CNN network by adopting the standard mAP index accuracy evaluation method.
CN202111170725.1A 2021-10-08 2021-10-08 Scene perception data enhancement method for SAR ship detection Active CN113902975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111170725.1A CN113902975B (en) 2021-10-08 2021-10-08 Scene perception data enhancement method for SAR ship detection


Publications (2)

Publication Number Publication Date
CN113902975A CN113902975A (en) 2022-01-07
CN113902975B true CN113902975B (en) 2023-05-05

Family

ID=79190453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111170725.1A Active CN113902975B (en) 2021-10-08 2021-10-08 Scene perception data enhancement method for SAR ship detection

Country Status (1)

Country Link
CN (1) CN113902975B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114970378B (en) * 2022-08-01 2022-10-25 青岛国数信息科技有限公司 Sea clutter sample library construction method based on GAN network


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN106897739A (en) * 2017-02-15 2017-06-27 国网江苏省电力公司电力科学研究院 A kind of grid equipment sorting technique based on convolutional neural networks
CN108491854A (en) * 2018-02-05 2018-09-04 西安电子科技大学 Remote sensing image object detection method based on SF-RCNN
CN109359661A (en) * 2018-07-11 2019-02-19 华东交通大学 A kind of Sentinel-1 radar image classification method based on convolutional neural networks
WO2020037960A1 (en) * 2018-08-21 2020-02-27 深圳大学 Sar target recognition method and apparatus, computer device, and storage medium
CN109800796A (en) * 2018-12-29 2019-05-24 上海交通大学 Ship target recognition methods based on transfer learning
CN111563473A (en) * 2020-05-18 2020-08-21 电子科技大学 Remote sensing ship identification method based on dense feature fusion and pixel level attention
CN112285712A (en) * 2020-10-15 2021-01-29 电子科技大学 Method for improving detection precision of ship on shore in SAR image
CN113469088A (en) * 2021-07-08 2021-10-01 西安电子科技大学 SAR image ship target detection method and system in passive interference scene

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jiang Kun et al. SAR Image Ship Detection Based On Deep Learning. 2020 International Conference on Computer Engineering and Intelligent Control (ICCEIC). 2021, 55-59. *
Zhou Li. Research on Ship Target Detection in Synthetic Aperture Radar Images Based on Deep Learning. China Master's Theses Full-text Database, Engineering Science and Technology II, 2021, No. 09, C036-86. *
Zhang Xiaoling et al. High-Speed and High-Accuracy SAR Ship Detection Based on Depthwise Separable Convolutional Neural Network. Journal of Radars, 2019, Vol. 8, No. 06, 841-851. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant