CN113902975A - Scene perception data enhancement method for SAR ship detection


Info

Publication number
CN113902975A
CN113902975A (application CN202111170725.1A; granted as CN113902975B)
Authority
CN
China
Legal status: Granted
Application number
CN202111170725.1A
Other languages
Chinese (zh)
Other versions
CN113902975B (en)
Inventors
张晓玲 (Zhang Xiaoling)
杨振宇 (Yang Zhenyu)
张天文 (Zhang Tianwen)
师君 (Shi Jun)
韦顺军 (Wei Shunjun)
Current Assignee: University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China
Priority: CN202111170725.1A
Publication of CN113902975A; application granted; publication of CN113902975B
Legal status: Active


Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2431: Pattern recognition; classification techniques, multiple classes
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/048: Neural networks; activation functions
    • G06N3/08: Neural networks; learning methods


Abstract

The invention discloses a scene perception data enhancement method for SAR ship detection. The method first improves the classical convolutional neural network VGG-11 to make it better suited to SAR images, then uses this network to classify the images in the training set into inshore training samples and offshore training samples; scene amplification is then used to balance the numbers of inshore and offshore training samples; finally, a classical detection network is trained with the processed data set, performs the detection task, and the detection results are evaluated. Compared with the prior-art Faster R-CNN ship detection network, the Faster R-CNN ship detection network using this method improves the overall detection accuracy by 1.95% and the inshore ship detection accuracy by 6.61%, realizing improved inshore ship detection in SAR images.

Description

Scene perception data enhancement method for SAR ship detection
Technical Field
The invention belongs to the technical field of Synthetic Aperture Radar (SAR) image interpretation, and relates to a scene perception data enhancement method for SAR ship detection.
Background
Synthetic Aperture Radar (SAR) is a high-resolution, active microwave imaging radar that works in all weather, day and night. Unlike an optical sensor, the electromagnetic waves it transmits can penetrate cloud, fog, vegetation and other complex occlusions and are unaffected by the illumination of the detection area, so SAR is widely used in both civilian and military fields. See the literature "Ou Shining. Application research of synthetic aperture radar in ship target positioning and imaging technology [J]. Ship Science and Technology, 2019, 41(02):152-".
In recent years, ship detection in SAR images has also become a research hotspot, because it enables convenient marine traffic management, ship oil-spill monitoring, ship disaster rescue, and the like. Ships in SAR images are important high-value targets; particularly in the field of national defense and the military, their detection helps protect national maritime rights and interests and offers an effective means of resolving maritime disputes. Moreover, SAR operation is unaffected by daylight and weather conditions, making it especially suitable for the unpredictable ocean environment and compensating for the shortcomings of optical sensors. See the literature "marfan, bau. Application of synthetic aperture radar in high-resolution monitoring and mapping of ship targets [J]. Ship Science and Technology, 2018, 40(22):157-".
Many SAR image ship detection algorithms have been proposed so far. The most common and effective ones are the various CFAR-based detection algorithms, which use a sea clutter model established in advance, search the image with a sliding window, and decide whether a ship is present according to a detection threshold provided by the sea clutter model; common sea clutter models are based on the Gaussian, Rayleigh and K distributions. However, because the sea-surface background is affected by the surrounding environment and weather, the background clutter distribution model is difficult to fit to the real clutter distribution, so CFAR is hard to apply in more complex scenes. See the document "Yang Zhi, Song Hui, Du Yang, Zhang Qing, Mengming. Rice-CFAR based SAR image ship detection [J]. Journal of Hefei University of Technology (Natural Science Edition), 2015, 38(04):463-467".
With the development of artificial intelligence, deep learning has been applied to SAR image ship detection. Deep-learning-based methods mainly use a deep convolutional neural network to automatically extract ship features, fit the mathematical distribution of the data through training, and obtain the coordinates of ships in the SAR image by regression at inference time; their accuracy is higher than that of the various CFAR-based detection algorithms. Several target detectors from the computer vision field, such as Fast R-CNN, YOLO and RetinaNet, have been successfully applied to SAR image ship detection. However, the detection accuracy for inshore ships is clearly lower than that for offshore ships, because inshore areas have stronger backscattering characteristics.
Although CNN-based SAR ship detectors have better detection performance than traditional methods, the detection accuracy for inshore ships remains hard to improve because of the imbalance of sample scenes. To balance the numbers of inshore and offshore samples, a Balanced Scene Learning Mechanism (BSLM) for inshore and offshore ship detection in SAR images was proposed. Based on unsupervised learning, the method uses a generative adversarial network (GAN) to extract scene features of SAR images; with these features, it performs binary scene clustering (inshore/offshore) by k-means; finally, it augments the inshore samples by copying, rotation transforms, or adding noise to balance them with the offshore samples, thereby eliminating the scene-learning bias, obtaining a balanced learning representation capability, and improving learning efficiency and detection accuracy. See the document "T. Zhang et al., 'Balance Scene Learning Mechanism for Offshore and Inshore Ship Detection in SAR Images,' IEEE Geoscience and Remote Sensing Letters, doi:10.1109/LGRS.2020.3033988".
Therefore, to solve the problem of insufficient inshore ship detection accuracy in traditional SAR ship detection, the invention proposes a scene perception data enhancement method for SAR ship detection.
Disclosure of Invention
The invention belongs to the technical field of Synthetic Aperture Radar (SAR) image interpretation and discloses a scene perception data enhancement method for SAR ship detection. The method is based on deep learning theory and mainly involves a convolutional neural network, scene amplification, and the classical detection network Faster R-CNN. It improves the classical convolutional neural network VGG-11 to make it better suited to SAR images, then uses this network to perform binary classification of the images in the training set, dividing them into inshore training samples and offshore training samples; scene amplification is then used to balance the numbers of inshore and offshore training samples; finally, the classical detection network is trained with the processed data set, performs the detection task, and the detection results are evaluated. The Faster R-CNN ship detection network using this method improves the overall detection accuracy by 1.95% and the inshore ship detection accuracy by 6.61% over the prior-art Faster R-CNN ship detection network, realizing improved inshore ship detection in SAR images.
For the convenience of describing the present invention, the following terms are first defined:
definition 1: SSDD data set acquisition method
The SSDD data set, whose full English name is SAR Ship Detection Dataset, is the first openly released data set for ship detection in SAR images. SSDD data come mainly from the RadarSat-2, TerraSAR-X and Sentinel-1 sensors and include the four polarization modes HH, HV, VV and VH. The observed scenes of the SSDD data set are mainly sea areas and inshore areas; there are 1160 images of about 500 × 500 pixels containing 2551 ships in total, 2.20 ships per image on average, and the ships differ in scale, distribution position and resolution, so the ship targets are diverse. The method for acquiring the SSDD data set is given in the document "Li Jianwei, Qu Changwen, Peng Shujuan, Deng Bing. Ship target detection in SAR images based on convolutional neural network [J]. Systems Engineering and Electronics, 2018, 40(09):1953-1959".
Definition 2: classical convolutional neural network
A classical convolutional neural network usually consists of an input layer, hidden layers and an output layer. The input layer can process multidimensional data; in the computer vision field it is generally assumed to receive three-dimensional input, i.e. the two-dimensional pixels of the plane plus the RGB channels. In image detection and recognition, the output layer outputs classification labels and the corresponding bounding-box coordinates, typically using a logistic function or a normalized exponential function. The hidden layers comprise convolutional layers, nonlinear activation functions, pooling layers and fully-connected layers: a convolutional layer abstracts high-dimensional features from small rectangular regions of the input features; the pooling layer shrinks the feature matrix, thereby reducing the number of parameters in the subsequent network; the fully-connected layer is equivalent to the hidden layer of a traditional feed-forward neural network and takes the previously abstracted high-dimensional features as input for the classification and detection tasks. The classical convolutional neural network method is described in the literature "Huvogen, Lilinyan, Shangxinluo, Shenmilitary, Dyyonghe. Overview of object detection algorithms based on convolutional neural networks [J]. Journal of Suzhou University of Science and Technology (Natural Science Edition), 2020, 37(02):1-10+25".
Definition 3: standard full connection layer method
The fully-connected layer is part of a convolutional neural network; its input and output sizes are fixed, and each node is connected to all nodes of the previous layer so as to integrate the extracted features. The fully-connected layer method is described in detail in "Haoren Wang, Haotian Shi, Ke Lin, Chengjin Qin, Liqun Zhao, Yixiang Huang, Chengliang Liu."
Definition 4: convolution kernel
A convolution kernel is a node that weights and then sums the values within a small rectangular region of the input feature map or picture and outputs the result. Each convolution kernel requires several manually specified parameters. One kind of parameter is the length and width of the node matrix processed by the kernel, which is also the size of the convolution kernel; the other kind is the depth of the unit node matrix obtained by the processing, which is also the depth of the convolution kernel. During the convolution operation, each kernel slides over the input data, the inner product of the whole kernel with the corresponding position of the input is computed and passed through a nonlinear function to obtain the final result, and the results of all positions form a two-dimensional feature map. Each convolution kernel generates one two-dimensional feature map, and the feature maps generated by multiple kernels are stacked into a three-dimensional feature map. The convolution kernel method is detailed in "Fan Li, Zhao Hongwei, Zhao Haoyu, Hu Huangshui, Wang Zheng. A survey of object detection based on deep convolutional neural networks [J]. Optics and Precision Engineering, 2020, 28(05):1152-1164".
Definition 5: conventional IoU intersection ratio method
The IoU score is a standard performance metric for the object-segmentation problem. Given a set of images, the IoU measure gives the similarity between the predicted region and the ground-truth region of the objects present in the images, and is defined by the formula

$$\mathrm{IoU} = \frac{I(x)}{U(x)}$$

where I(x) and U(x) represent the intersection and union of the "predicted bounding box" and the "ground-truth bounding box", respectively. The conventional IoU intersection-over-union method is described in the literature "Rahman M A, Wang Y. Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation [M]//Advances in Visual Computing. Springer International Publishing, 2016:234-244".
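For illustration only, the IoU of two axis-aligned boxes can be computed as in the following Python sketch; the (x1, y1, x2, y2) box format and the helper name iou are assumptions, not part of the patent.

```python
def iou(box_a, box_b):
    """IoU = I(x) / U(x) for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle I(x)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union U(x) = area_a + area_b - intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```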
Definition 6: standard ReLU function activation method
The standard ReLU function, the Rectified Linear Unit (also called the modified linear unit), is an activation function commonly used in artificial neural networks, generally referring to the nonlinear function represented by the ramp function and its variants. Its expression is

$$f(x) = \max(0, x)$$

The function is constant 0 on the negative half-axis and monotonically increasing and differentiable on the positive half-axis, which increases sparsity in the neural network. The standard ReLU function activation method is detailed at "https://www.cnblogs.com/makefile/p/activation-function.html".
Definition 7: standard batch normalization method
The standard Batch Normalization (BN) method unifies scattered data so that the network learns the regularities in the data more easily. BN is usually treated as a layer added before the activation function to narrow the variation range of the input x and reduce overfitting to some extent. Standard batch normalization methods are detailed at "https://www.cnblogs.com/shine-lee/p/11989612.html".
Definition 8: standard maximum pooling method
The standard Max Pooling method takes the point with the largest value in the local receptive field; its main roles are to reduce model size, speed up computation, and improve the robustness of the extracted features. Standard maximum pooling methods are detailed at "https://blog.csdn.net/weixin_43336281/article/details/102149468".
Definition 9: standard softmax method
The standard softmax method is the generalization of the logistic regression model to the multi-class problem. Its expression is

$$S_i = \frac{e^{V_i}}{\sum_{j=1}^{C} e^{V_j}}$$

where V_i is the output of the classifier's preceding stage for class i, i is the class index, C is the total number of classes, and S_i is the ratio of the exponential of the current element to the sum of the exponentials of all elements. The softmax output characterizes the relative probabilities of the different classes. The standard softmax method is detailed at "https://blog.csdn.net/qq_32642107/article/details/97270994".
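A minimal, numerically stable Python sketch of this formula follows; the function name and the example scores are hypothetical.

```python
import numpy as np

def softmax(v):
    """S_i = exp(V_i) / sum_j exp(V_j); subtracting max(v) avoids overflow."""
    e = np.exp(v - np.max(v))
    return e / e.sum()

# e.g. three class scores -> relative class probabilities summing to 1
print(softmax(np.array([2.0, 1.0, 0.1])))
```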
Definition 10: standard VGG-11 networks
The standard VGG-11 network is a VGG network with 11 weight layers. It is the feature-extraction part of a network, can combine different modules, contains several convolutional layers and pooling layers, and automatically extracts useful feature information through training. See the document "Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition [J]. Computer Science, 2014".
Definition 11: classical stochastic gradient descent algorithm
The classical Stochastic Gradient Descent (SGD) algorithm is an optimization algorithm that minimizes the loss function constructed from the model to find the optimal parameters. It computes the loss function and its gradient, and updates the parameters, for each data sample, so computation is fast. The classical stochastic gradient descent algorithm is detailed at "https://blog.csdn.net/qq_38150441/article/details/80533891".
Definition 12: recall ratio and accuracy calculation method
The recall R is the proportion of all positive samples that are correctly predicted, expressed as

$$R = \frac{TP}{TP + FN}$$

The precision P is the proportion of the results predicted as positive that are correct, expressed as

$$P = \frac{TP}{TP + FP}$$

where TP (true positive) denotes a positive sample predicted as positive by the model; FN (false negative) denotes a positive sample incorrectly predicted as negative; and FP (false positive) denotes a negative sample incorrectly predicted as positive. The precision-recall curve P(R) is a function with R as the independent variable and P as the dependent variable. The method for computing these quantities is given in the literature "Li Hang. Statistical Learning Methods [M]. Beijing: Tsinghua University Press, 2012".
Definition 13: standard mAP index precision evaluation method
The mAP is the mean Average Precision. In the field of target detection, mAP is used to measure the accuracy of a detection model. Its calculation formula is

$$\mathrm{mAP} = \int_0^1 P(R)\,dR$$

where P is the precision and R is the recall. The standard mAP index accuracy evaluation method is detailed at "https://www.cnblogs.com/zongfa/p/9783972.html".
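The patent specifies only the integral form; one common way to approximate it is the all-point interpolation used for PASCAL-VOC-style mAP, sketched below under that assumption (for a single "ship" class, AP and mAP coincide).

```python
import numpy as np

def average_precision(recalls, precisions):
    """Approximate mAP = integral_0^1 P(R) dR as the area under an
    interpolated precision-recall curve (all-point interpolation)."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # make precision non-increasing from right to left (interpolation step)
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum rectangle areas at the points where recall changes
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```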
Definition 14: prior art fast R-CNN
The prior-art Faster R-CNN is a target detection network. It consists of two modules: the first is a region proposal network that proposes locations where targets may appear, and the second is a Fast R-CNN network that classifies the targets and performs bounding-box regression. The method for building the prior-art Faster R-CNN network is described in detail in "Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6):1137-1149".
Definition 15: classical data enhancement method
The classical data enhancement method generates new training samples by adding random perturbations to the original data while keeping the class labels unchanged, thereby producing more training samples. Data enhancement strengthens the generalization of the network and improves its various metrics. Common data enhancement operations include flipping, rotation, scaling, cropping, and so on. The classical data enhancement method is detailed at "https://blog.csdn.net/u010801994/article/details/81914716".
Definition 16: standard forward propagation method
The standard forward propagation method is the most basic method in deep learning; it performs forward inference on the input according to the parameters and connections in the network to obtain the network output. Standard forward propagation methods are detailed at "https://www.jianshu.com/p/f30c8daebebebb".
Definition 17: standard non-maximum suppression method
The standard non-maximum suppression method is an algorithm used in target detection to remove redundant detection boxes. In the forward-propagation results of a classical detection network, the same target often corresponds to several detection boxes, so an algorithm is needed to select, among the boxes of one target, the box with the best quality and the highest score. Non-maximum suppression performs a local maximum search by thresholding the overlap ratio. Standard non-maximum suppression methods are detailed at "https://www.cnblogs.com/makefile/p/nms.html".
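A greedy NMS sketch matching this description is given below; it reuses the hypothetical iou helper from Definition 5, and the 0.5 threshold is the one used later in step 6.4.

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box (the "BS")
        keep.append(best)
        # discard every remaining box that overlaps BS too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```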
Definition 18: standard image mirroring method
The standard image mirroring method divides into horizontal mirroring and vertical mirroring. Horizontal mirroring exchanges the left and right halves of the image about its vertical central axis; vertical mirroring exchanges the upper and lower halves about its horizontal central axis. The standard image mirroring method is detailed at "https://blog.csdn.net/qq_30708445/article/details/87881362".
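For a grayscale SAR image stored as a NumPy array, both mirror operations reduce to slice reversals, as in this sketch; note that for detection data the bounding-box coordinates must be mirrored with the same transform (e.g. x' = W - x for horizontal mirroring). The helper names are assumptions.

```python
import numpy as np

def horizontal_mirror(img: np.ndarray) -> np.ndarray:
    """Swap the left and right halves about the vertical central axis."""
    return img[:, ::-1].copy()

def vertical_mirror(img: np.ndarray) -> np.ndarray:
    """Swap the top and bottom halves about the horizontal central axis."""
    return img[::-1, :].copy()
```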
Definition 19: standard data set merging method
The standard data set merging method combines different data sources, including merging and renaming the pictures and labels, after which further data processing and analysis are performed. Standard data set merging methods are detailed at "https://zhuanlan.".
The invention provides a scene perception data enhancement method for SAR ship detection; the whole process is shown in Fig. 1, and the method comprises the following steps:
step 1, preparing a data set
Obtain the SSDD data set according to the acquisition method in definition 1; select the images whose file-name indices end in 1 or 9 as the test set, denoted Test, and take the remaining images as the training set, denoted Train; label the SAR images in the training set Train as inshore scenes or offshore scenes to obtain a new training set, denoted new_Train.
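A possible sketch of this split in Python (the directory layout and function name are assumptions; SSDD images are indexed numerically, and those whose index ends in 1 or 9 go to the test set):

```python
import os
import shutil

def split_ssdd(image_dir, train_dir, test_dir):
    """Copy images whose numeric file-name stem ends in 1 or 9 to the test
    set and all remaining images to the training set."""
    for name in sorted(os.listdir(image_dir)):
        stem = os.path.splitext(name)[0]
        target = test_dir if stem[-1] in ("1", "9") else train_dir
        shutil.copy(os.path.join(image_dir, name), os.path.join(target, name))
```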
Step 2, establishing a scene classification network
According to the classical convolutional neural network method in definition 2, define an input layer, denoted L1, which takes SAR images of size 224 × 224 × 1 as input;
taking the input layer L1 as input, construct convolutional layer C1 according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3 × 3 × 64, stride 1;
activate convolutional layer C1 with the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C1_act;
apply the standard batch normalization method in definition 7 to the activated convolutional layer C1_act to obtain a 224 × 224 × 64-dimensional vector, denoted L2;
taking the 224 × 224 × 64-dimensional vector L2 as input, perform 2 × 2 maximum pooling on L2 with the standard maximum pooling method in definition 8 to obtain a 112 × 112 × 64-dimensional vector, denoted L3;
taking the 112 × 112 × 64-dimensional vector L3 as input, construct convolutional layer C2 according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3 × 3 × 128, stride 1;
activate convolutional layer C2 with the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C2_act;
apply the standard batch normalization method in definition 7 to the activated convolutional layer C2_act to obtain a 112 × 112 × 128-dimensional vector, denoted L4;
taking the 112 × 112 × 128-dimensional vector L4 as input, perform 2 × 2 maximum pooling on L4 with the standard maximum pooling method in definition 8 to obtain a 56 × 56 × 128-dimensional vector, denoted L5;
taking the 56 × 56 × 128-dimensional vector L5 as input, construct convolutional layer C3 according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3 × 3 × 256, stride 1;
activate convolutional layer C3 with the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C3_act;
apply the standard batch normalization method in definition 7 to the activated convolutional layer C3_act to obtain a 56 × 56 × 256-dimensional vector, denoted L6;
taking the 56 × 56 × 256-dimensional vector L6 as input, construct convolutional layer C4 according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3 × 3 × 256, stride 1;
activate convolutional layer C4 with the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C4_act;
apply the standard batch normalization method in definition 7 to the activated convolutional layer C4_act to obtain a 56 × 56 × 256-dimensional vector, denoted L7;
taking the 56 × 56 × 256-dimensional vector L7 as input, perform 2 × 2 maximum pooling on L7 with the standard maximum pooling method in definition 8 to obtain a 28 × 28 × 256-dimensional vector, denoted L8;
taking the 28 × 28 × 256-dimensional vector L8 as input, construct convolutional layer C5 according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3 × 3 × 512, stride 1;
activate convolutional layer C5 with the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C5_act;
apply the standard batch normalization method in definition 7 to the activated convolutional layer C5_act to obtain a 28 × 28 × 512-dimensional vector, denoted L9;
taking the 28 × 28 × 512-dimensional vector L9 as input, construct convolutional layer C6 according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3 × 3 × 512, stride 1;
activate convolutional layer C6 with the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C6_act;
apply the standard batch normalization method in definition 7 to the activated convolutional layer C6_act to obtain a 28 × 28 × 512-dimensional vector, denoted L10;
taking the 28 × 28 × 512-dimensional vector L10 as input, perform 2 × 2 maximum pooling on L10 with the standard maximum pooling method in definition 8 to obtain a 14 × 14 × 512-dimensional vector, denoted L11;
taking the 14 × 14 × 512-dimensional vector L11 as input, construct convolutional layer C7 according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3 × 3 × 512, stride 1;
activate convolutional layer C7 with the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C7_act;
apply the standard batch normalization method in definition 7 to the activated convolutional layer C7_act to obtain a 14 × 14 × 512-dimensional vector, denoted L12;
taking the 14 × 14 × 512-dimensional vector L12 as input, construct convolutional layer C8 according to the classical convolutional neural network method in definition 2, with convolution kernel parameters: size 3 × 3 × 512, stride 1;
activate convolutional layer C8 with the standard ReLU function activation method in definition 6 to obtain the activated convolutional layer C8_act;
apply the standard batch normalization method in definition 7 to the activated convolutional layer C8_act to obtain a 14 × 14 × 512-dimensional vector, denoted L13;
taking the 14 × 14 × 512-dimensional vector L13 as input, perform 2 × 2 maximum pooling on L13 with the standard maximum pooling method in definition 8 to obtain a 7 × 7 × 512-dimensional vector, denoted L14;
taking the 7 × 7 × 512-dimensional vector L14 as input, construct a fully-connected layer of size 1 × 1 × 4096 with the standard fully-connected layer method in definition 3, denoted FC1;
taking FC1 as input, construct a fully-connected layer of size 1 × 1 × 4096 with the standard fully-connected layer method in definition 3, denoted FC2;
taking FC2 as input, construct a fully-connected layer of size 1 × 1 × N_class with the standard fully-connected layer method in definition 3, where N_class is the number of scene categories; this layer is denoted FC-N_class.
At this point, the scene classification network is constructed and denoted Modified-VGG_pre.
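A compact PyTorch sketch of this Modified-VGG_pre architecture is given below for illustration. The patent specifies only kernel sizes, strides and feature-map dimensions; the padding of 1 (needed to reproduce the stated 224→112→56→28→14→7 sizes) and the ReLUs after FC1 and FC2 follow standard VGG practice and are assumptions.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # 3x3 conv (stride 1) -> ReLU -> BatchNorm, the order used in step 2
    return [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(out_ch)]

class ModifiedVGG(nn.Module):
    """VGG-11-style scene classifier for 224 x 224 x 1 SAR images."""
    def __init__(self, n_class=2):
        super().__init__()
        layers = []
        layers += conv_block(1, 64) + [nn.MaxPool2d(2)]     # C1, L2-L3
        layers += conv_block(64, 128) + [nn.MaxPool2d(2)]   # C2, L4-L5
        layers += conv_block(128, 256)                      # C3, L6
        layers += conv_block(256, 256) + [nn.MaxPool2d(2)]  # C4, L7-L8
        layers += conv_block(256, 512)                      # C5, L9
        layers += conv_block(512, 512) + [nn.MaxPool2d(2)]  # C6, L10-L11
        layers += conv_block(512, 512)                      # C7, L12
        layers += conv_block(512, 512) + [nn.MaxPool2d(2)]  # C8, L13-L14
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(7 * 7 * 512, 4096), nn.ReLU(inplace=True),  # FC1
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # FC2
            nn.Linear(4096, n_class),                             # FC-N_class
        )

    def forward(self, x):
        # raw class scores; softmax (definition 9) is applied by the loss
        return self.classifier(self.features(x))
```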
Step 3, training scene classification network
Taking the new_Train obtained in step 1 as input, train and optimize the scene classification network Modified-VGG_pre established in step 2 with the classical stochastic gradient descent algorithm in definition 11, obtaining the trained and optimized scene classification network, denoted Modified-VGG.
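A minimal training sketch under the same assumptions (the learning rate, momentum and the new_train_loader DataLoader are hypothetical; the patent does not give these values):

```python
import torch

model = ModifiedVGG(n_class=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()  # applies softmax internally

model.train()
for images, labels in new_train_loader:  # hypothetical loader over new_Train
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()   # stochastic gradient descent step (definition 11)
    optimizer.step()
```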
Step 4, carrying out scene classification
Taking the training set Train as input, classify all pictures in Train into two classes with the scene classification network Modified-VGG obtained in step 3: the first class is the inshore scene, denoted Data1, and the second class is the offshore scene, denoted Data2.
Step 5, carrying out scene amplification
Use the classification results Data1 and Data2 obtained in step 4, and define the number of pictures in Data1 as M1 and the number of pictures in Data2 as M2.
If M1 < M2, randomly select M2 - M1 pictures in the inshore scene set Data1 and mirror them with the standard image mirroring method in definition 18, obtaining M2 - M1 mirrored pictures, denoted extra_Data1. Then merge extra_Data1 with the inshore scene set Data1 using the standard data set merging method in definition 19 to obtain a new inshore scene data set, denoted new_Data1. Define new_Data2 = Data2.
If M1 > M2, randomly select M1 - M2 pictures in the offshore scene set Data2 and mirror them with the standard image mirroring method in definition 18, obtaining M1 - M2 mirrored pictures, denoted extra_Data2. Then merge extra_Data2 with the offshore scene set Data2 using the standard data set merging method in definition 19 to obtain a new offshore scene data set, denoted new_Data2. Define new_Data1 = Data1.
Define the new data set new_Data = {new_Data1, new_Data2}.
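A sketch of this balancing step, assuming the two classes are held as Python lists of image arrays and reusing the hypothetical horizontal_mirror helper from Definition 18 (as in the patent, the class gap is assumed not to exceed the size of the smaller class):

```python
import random

def scene_augment(data1, data2):
    """Balance inshore (data1) and offshore (data2) sample lists by mirroring
    randomly chosen pictures of the smaller class; returns (new_Data1, new_Data2)."""
    if len(data1) < len(data2):                      # M1 < M2
        picked = random.sample(data1, len(data2) - len(data1))
        return data1 + [horizontal_mirror(p) for p in picked], data2
    if len(data1) > len(data2):                      # M1 > M2
        picked = random.sample(data2, len(data1) - len(data2))
        return data1, data2 + [horizontal_mirror(p) for p in picked]
    return data1, data2                              # already balanced
```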
Step 6, carrying out experimental verification on a classical model
Step 6.1, data enhancement
Taking the new data set new_Data obtained in step 5 as input, apply the classical data enhancement method in definition 15 to new_Data to obtain the data-enhanced SAR image detection training set, denoted DetTrain.
Step 6.2, network establishment
Adopting a classical Faster R-CNN method in definition 14 to establish an untrained Faster R-CNN network;
step 6.3, training the network
Initializing the image batch processing size of the untrained network obtained in the step 6.2, and recording as Batchsize;
initializing the learning rate of an untrained network, and recording the learning rate as eta;
initializing the weight attenuation rate and momentum of untrained network training parameters, and recording the weight attenuation rate and momentum as DC and MM respectively;
initializing random parameters of the untrained Faster R-CNN network obtained in the step 6.2, and recording the initialized parameters as W;
Train the untrained Faster R-CNN network on the training set DetTrain from step 6.1 with the classical stochastic gradient descent algorithm in definition 11, obtaining the loss value of the network, denoted loss.
When the loss value of the network falls below the desired loss value, stop training to obtain the new network parameters, denoted new_W.
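One possible realization of this training step uses the Faster R-CNN implementation shipped with torchvision; the ResNet-50-FPN backbone, the det_train_loader and all hyperparameter values below are assumptions, since the patent names only Batchsize, η, DC, MM and W without fixing them.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

ETA, DC, MM = 0.005, 0.0005, 0.9        # learning rate, weight decay, momentum
detector = fasterrcnn_resnet50_fpn(num_classes=2)  # background + ship
optimizer = torch.optim.SGD(detector.parameters(), lr=ETA,
                            weight_decay=DC, momentum=MM)

detector.train()
for images, targets in det_train_loader:   # hypothetical loader over DetTrain
    loss_dict = detector(images, targets)  # per-component detection losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # stop once loss falls below the desired value to obtain new_W
```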
Step 6.4, evaluation of detection result
Taking the new network parameters new_W obtained in step 6.3 and the test set Test obtained in step 1 as input, run the Faster R-CNN-based ship detection network with the standard forward propagation method in definition 16 to obtain the detection result, denoted Result.
Taking the detection result Result of the Faster R-CNN-based ship detection network as input, remove the redundant boxes in the detection result with the standard non-maximum suppression method in definition 17 and keep the highest-scoring detection boxes; the specific steps are as follows:
(1) firstly, marking a box with the highest score in a detection Result as a BS;
(2) then calculate the IoU between each remaining box in the detection result Result and BS using the IoU intersection-over-union method in definition 5, discard the boxes with IoU > 0.5, and denote the boxes remaining in Result as RB;
(3) select the next highest-scoring box BS from RB;
(4) repeat the IoU calculation and discarding process of step (2) until no more boxes can be discarded; the boxes that finally remain are the final detection result, denoted RR.
Taking the detection result RR of the Faster R-CNN network obtained in the previous step as input, compute the precision P, the recall R, and the precision-recall curve P(R) of the Faster R-CNN detection using the recall and precision calculation method in definition 12; and compute the average precision mAP of the Faster R-CNN network using the standard mAP index accuracy evaluation method in definition 13.
The innovation of the invention is to construct a scene classification model with a convolutional neural network for data enhancement, thereby improving the inshore ship detection accuracy in SAR images. The method classifies the training set into inshore and offshore samples and balances their numbers, so that the ship detection model detects inshore ships better: compared with the prior-art Faster R-CNN ship detection network, the Faster R-CNN ship detection network using this method improves the overall detection accuracy by 1.95% and the inshore ship detection accuracy by 6.61%.
The advantage of the method is that it improves the inshore ship detection accuracy in SAR images, overcoming the insufficient inshore detection accuracy of the prior art, while also improving the overall detection accuracy to some extent.
Drawings
Fig. 1 is a schematic flow diagram of a scene awareness data enhancement method for SAR ship detection in the present invention.
Fig. 2 is a schematic diagram of a scene classification network structure of the scene awareness data enhancement method for SAR ship detection in the present invention.
Fig. 3 shows the detection accuracy of the scene awareness data enhancement method for SAR ship detection according to the present invention.
Detailed Description
The specific embodiment carries out steps 1 to 6.4 exactly as described in the disclosure above, from preparing the data set through evaluating the detection results; the steps are not repeated here.

Claims (1)

1. A scene perception data enhancement method for SAR ship detection is characterized by comprising the following steps:
step 1, preparing a data set
obtaining the SSDD data set by the SSDD acquisition method; selecting the images whose file-name indices end in 1 or 9 as the test set, denoted Test, and taking the remaining images as the training set, denoted Train; labeling the SAR images in the training set Train as inshore scenes or offshore scenes to obtain a new training set, denoted new_Train;
step 2, establishing a scene classification network
Defining an input layer by adopting a classical convolutional neural network method, recording the input layer as L1, and inputting an SAR image with the size of 224 multiplied by 1;
taking an input layer L1 as input, constructing a convolutional layer C1 by adopting a classical convolutional neural network method, and setting convolutional kernel parameters: the size is set to 3 × 3 × 64, and the step size is set to 1;
activating the convolutional layer C1 by adopting a standard ReLU function activation method to obtain an activated convolutional layer C1act
Activated convolutional layer C1 using standard batch normalization methodactCarrying out batch normalization processing to obtain a 224 multiplied by 64 dimensional vector which is marked as L2;
taking a vector L2 with dimensions of 224 multiplied by 64 as input, performing maximum pooling on L2 with the size of 2 multiplied by 2 by adopting a standard maximum pooling method to obtain a vector with dimensions of 112 multiplied by 64, and recording the vector as L3;
constructing a convolutional layer C2 by taking a vector L3 with dimensions of 112 multiplied by 64 as input according to a classical convolutional neural network method, and setting convolution kernel parameters: the size is set to 3 × 3 × 128, the step size is set to 1;
activating the convolutional layer C2 by adopting a standard ReLU function activation method to obtain an activated convolutional layer C2act
Activated convolutional layer C2 using standard batch normalization methodactCarrying out batch normalization processing to obtain a 112 multiplied by 128 dimensional vector which is marked as L4;
taking a 112 × 112 × 128-dimensional vector L4 as an input, performing maximum pooling on L4 by adopting a standard maximum pooling method, and obtaining a 56 × 56 × 128-dimensional vector which is marked as L5;
taking the 56 × 56 × 128-dimensional vector L5 as input, constructing a convolutional layer C3 by adopting a classical convolutional neural network method, and setting convolution kernel parameters: the size is set to 3 × 3 × 256, and the step size is set to 1;
activating the convolutional layer C3 by adopting a standard ReLU function activation method to obtain an activated convolutional layer C3_act;
carrying out batch normalization processing on the activated convolutional layer C3_act by adopting a standard batch normalization method to obtain a 56 × 56 × 256-dimensional vector which is marked as L6;
taking the 56 × 56 × 256-dimensional vector L6 as input, constructing a convolutional layer C4 by adopting a classical convolutional neural network method, and setting convolution kernel parameters: the size is set to 3 × 3 × 256, and the step size is set to 1;
activating the convolutional layer C4 by adopting a standard ReLU function activation method to obtain an activated convolutional layer C4_act;
carrying out batch normalization processing on the activated convolutional layer C4_act by adopting a standard batch normalization method to obtain a 56 × 56 × 256-dimensional vector which is marked as L7;
taking the 56 × 56 × 256-dimensional vector L7 as input, performing maximum pooling with the size of 2 × 2 on L7 by adopting a standard maximum pooling method to obtain a 28 × 28 × 256-dimensional vector which is marked as L8;
taking the 28 × 28 × 256-dimensional vector L8 as input, constructing a convolutional layer C5 by adopting a classical convolutional neural network method, and setting convolution kernel parameters: the size is set to 3 × 3 × 512, and the step size is set to 1;
activating the convolutional layer C5 by adopting a standard ReLU function activation method to obtain an activated convolutional layer C5_act;
carrying out batch normalization processing on the activated convolutional layer C5_act by adopting a standard batch normalization method to obtain a 28 × 28 × 512-dimensional vector which is marked as L9;
taking the 28 × 28 × 512-dimensional vector L9 as input, constructing a convolutional layer C6 by adopting a classical convolutional neural network method, and setting convolution kernel parameters: the size is set to 3 × 3 × 512, and the step size is set to 1;
activating the convolutional layer C6 by adopting a standard ReLU function activation method to obtain an activated convolutional layer C6_act;
carrying out batch normalization processing on the activated convolutional layer C6_act by adopting a standard batch normalization method to obtain a 28 × 28 × 512-dimensional vector which is marked as L10;
taking the 28 × 28 × 512-dimensional vector L10 as input, performing maximum pooling with the size of 2 × 2 on L10 by adopting a standard maximum pooling method to obtain a 14 × 14 × 512-dimensional vector which is marked as L11;
taking the 14 × 14 × 512-dimensional vector L11 as input, constructing a convolutional layer C7 by adopting a classical convolutional neural network method, and setting convolution kernel parameters: the size is set to 3 × 3 × 512, and the step size is set to 1;
activating the convolutional layer C7 by adopting a standard ReLU function activation method to obtain an activated convolutional layer C7_act;
carrying out batch normalization processing on the activated convolutional layer C7_act by adopting a standard batch normalization method to obtain a 14 × 14 × 512-dimensional vector which is marked as L12;
taking the 14 × 14 × 512-dimensional vector L12 as input, constructing a convolutional layer C8 by adopting a classical convolutional neural network method, and setting convolution kernel parameters: the size is set to 3 × 3 × 512, and the step size is set to 1;
activating the convolutional layer C8 by adopting a standard ReLU function activation method to obtain an activated convolutional layer C8_act;
carrying out batch normalization processing on the activated convolutional layer C8_act by adopting a standard batch normalization method to obtain a 14 × 14 × 512-dimensional vector which is marked as L13;
taking the 14 × 14 × 512-dimensional vector L13 as input, performing maximum pooling with the size of 2 × 2 on L13 by adopting a standard maximum pooling method to obtain a 7 × 7 × 512-dimensional vector which is marked as L14;
taking the 7 × 7 × 512-dimensional vector L14 as input, constructing a fully connected layer with the size of 1 × 1 × 4096 by adopting a standard fully connected layer method, which is marked as FC1;
taking FC1 as input, constructing a fully connected layer with the size of 1 × 1 × 4096 by adopting a standard fully connected layer method, which is marked as FC2;
taking FC2 as input, constructing a fully connected layer with the size of 1 × 1 × N_class by adopting a standard fully connected layer method, where N_class is the number of scene categories, which is marked as FC-N_class;
at this point, the construction of the scene classification network is completed, and the untrained network is marked as Modified-VGG_pre (an illustrative sketch of this layer stack follows);
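A compact PyTorch sketch of the layer stack L1 through FC-N_class, given as an illustration rather than the patented network itself; kernel sizes, channel widths, and pooling follow the claim text, while padding of 1 (so the stated spatial sizes hold) and the ReLUs between the fully connected layers are assumptions:

```python
import torch.nn as nn

def conv_bn_relu(c_in, c_out):
    """3x3 convolution, stride 1 (padding 1 assumed), ReLU, then batch
    normalization, matching the conv -> activation -> BN order above."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(c_out),
    )

class ModifiedVGG(nn.Module):
    def __init__(self, n_class=2):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_relu(1, 64),    nn.MaxPool2d(2),   # L1-L3: 224 -> 112
            conv_bn_relu(64, 128),  nn.MaxPool2d(2),   # L4-L5: 112 -> 56
            conv_bn_relu(128, 256), conv_bn_relu(256, 256),
            nn.MaxPool2d(2),                           # L6-L8: 56 -> 28
            conv_bn_relu(256, 512), conv_bn_relu(512, 512),
            nn.MaxPool2d(2),                           # L9-L11: 28 -> 14
            conv_bn_relu(512, 512), conv_bn_relu(512, 512),
            nn.MaxPool2d(2),                           # L12-L14: 14 -> 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(7 * 7 * 512, 4096), nn.ReLU(inplace=True),  # FC1
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # FC2
            nn.Linear(4096, n_class),                             # FC-N_class
        )

    def forward(self, x):           # x: (batch, 1, 224, 224)
        return self.classifier(self.features(x))
```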
Step 3, training scene classification network
Taking the new training set new_Train obtained in step 1 as input, training and optimizing the scene classification network Modified-VGG_pre established in step 2 by adopting the classical stochastic gradient descent algorithm to obtain a trained and optimized scene classification network, which is marked as Modified-VGG (a minimal training sketch follows);
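A minimal training sketch under common assumptions: cross-entropy loss over the two scene classes and a PyTorch DataLoader named loader yielding (image, label) batches; the epoch count, learning rate, and momentum are placeholders rather than values from the patent:

```python
import torch

def train_scene_classifier(model, loader, epochs=30, lr=1e-3):
    """Optimize Modified-VGG_pre with classical stochastic gradient descent."""
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for epoch in range(epochs):
        for images, labels in loader:    # labels: 0 = inshore, 1 = offshore
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model                         # the trained Modified-VGG
```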
step 4, carrying out scene classification
Taking the training set Train as input, classifying all pictures in Train into two types through the scene classification network Modified-VGG obtained in step 3, wherein the first type is the inshore scene, marked as Data1, and the second type is the offshore scene, marked as Data2;
step 5, carrying out scene amplification
according to the classification results Data1 and Data2 obtained in step 4, defining the number of pictures in Data1 as M1 and the number of pictures in Data2 as M2;
if M1 < M2, randomly selecting M2 - M1 pictures from the first-type inshore scene Data1 and carrying out the mirror image operation on them by adopting a standard image mirroring method to obtain M2 - M1 mirrored pictures, marked as extra_Data1; then merging the mirrored pictures extra_Data1 with the first-type inshore scene Data1 by adopting a standard data set combination method to obtain a new inshore scene data set, marked as new_Data1; and defining new_Data2 = Data2;
if M1 > M2, randomly selecting M1 - M2 pictures from the second-type offshore scene Data2 and carrying out the mirror image operation on them by adopting a standard image mirroring method to obtain M1 - M2 mirrored pictures, marked as extra_Data2; then merging the mirrored pictures extra_Data2 with the second-type offshore scene Data2 by adopting a standard data set combination method to obtain a new offshore scene data set, marked as new_Data2; and defining new_Data1 = Data1;
defining a new data set new_Data = {new_Data1, new_Data2} (a balancing sketch follows);
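The balancing rule of this step can be sketched as follows, with the mirror operation taken as a horizontal flip and the data sets represented as lists of NumPy arrays; both choices are assumptions for illustration:

```python
import random
import numpy as np

def balance_scenes(data1, data2, seed=0):
    """Equalize the inshore (data1) and offshore (data2) picture counts by
    mirroring randomly chosen pictures from the smaller class."""
    rng = random.Random(seed)
    small, large = (data1, data2) if len(data1) < len(data2) else (data2, data1)
    extra = [np.fliplr(img) for img in rng.sample(small, len(large) - len(small))]
    new_small = small + extra                # merge the mirrored pictures in
    if len(data1) < len(data2):
        return new_small, data2              # new_Data1, new_Data2 = Data2
    return data1, new_small                  # new_Data1 = Data1, new_Data2
```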
step 6, carrying out experimental verification on a classical model
Step 6.1, data enhancement
Taking the new data set new_Data obtained in step 5 as input, and performing data enhancement on new_Data by adopting a classical data enhancement method to obtain the data-enhanced SAR image detection training set, which is marked as DetTrain;
step 6.2, network establishment
Adopting a classic Faster R-CNN method to establish an untrained Faster R-CNN network;
step 6.3, training the network
Initializing the image batch size of the untrained network obtained in step 6.2, marked as BatchSize;
initializing the learning rate of the untrained network, marked as eta;
initializing the weight decay rate and the momentum of the untrained network's training parameters, marked as DC and MM respectively;
initializing the random parameters of the untrained Faster R-CNN network obtained in step 6.2, with the initialized parameters marked as W;
training the untrained Faster R-CNN network with the training set DetTrain of step 6.1 by adopting the classical stochastic gradient descent algorithm to obtain the loss value of the network, marked as loss;
when the loss value loss of the network is smaller than the ideal loss value, stopping training to obtain the new network parameters new_W (an initialization and stopping sketch follows);
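A hedged sketch of this initialization and loss-threshold stopping rule, with torchvision's reference Faster R-CNN standing in for the classic Faster R-CNN method of step 6.2; BatchSize, eta, DC, MM, and the ideal loss value below are placeholders, not values from the patent:

```python
import torch
import torchvision

# Placeholder hyperparameters standing in for BatchSize, eta, DC, and MM
BATCH_SIZE, ETA, DC, MM = 8, 5e-3, 5e-4, 0.9
IDEAL_LOSS = 0.1                   # stop once the loss falls below this value

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=ETA,
                            momentum=MM, weight_decay=DC)

def train_until_converged(loader):
    """Classical SGD training; returns the parameters new_W once the
    summed detection loss drops below IDEAL_LOSS."""
    model.train()
    while True:
        for images, targets in loader:   # DetTrain batches of size BATCH_SIZE
            losses = model(images, targets)        # dict of partial losses
            loss = sum(losses.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if loss.item() < IDEAL_LOSS:
                return model.state_dict()          # new network parameters
```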
step 6.4, evaluation of detection result
Taking the new network parameters new_W obtained in step 6.3 and the test set Test obtained in step 1 as input, performing forward propagation through the Faster R-CNN-based ship detection network by adopting a standard forward propagation method to obtain the detection result, marked as Result;
taking the detection result Result obtained by the Faster R-CNN-based ship detection network as input, removing the redundant boxes in the detection result Result by adopting a standard non-maximum suppression method to obtain the highest-scoring detection boxes, with the specific steps as follows:
(1) firstly, marking the box with the highest score in the detection result Result as BS;
(2) then, calculating the IoU of each remaining box in the detection result Result with BS by adopting the traditional IoU intersection-over-union calculation method, and, after discarding the boxes with IoU > 0.5, marking the boxes remaining in Result as RB;
(3) continuing to select the box BS with the highest score from RB;
repeating the IoU calculation and discarding process of step (2) until no box can be discarded, and marking the finally remaining boxes, which constitute the final detection result, as RR;
taking the detection result RR of the Faster R-CNN network obtained in the previous step as input, calculating the precision P, the recall R, and the precision-recall curve P(R) of the Faster R-CNN detection by adopting the recall and precision calculation method; and calculating the average precision mAP of the Faster R-CNN network by adopting the standard mAP index precision evaluation method.
CN202111170725.1A 2021-10-08 2021-10-08 Scene perception data enhancement method for SAR ship detection Active CN113902975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111170725.1A CN113902975B (en) 2021-10-08 2021-10-08 Scene perception data enhancement method for SAR ship detection

Publications (2)

Publication Number Publication Date
CN113902975A (en) 2022-01-07
CN113902975B (en) 2023-05-05

Family

ID=79190453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111170725.1A Active CN113902975B (en) 2021-10-08 2021-10-08 Scene perception data enhancement method for SAR ship detection

Country Status (1)

Country Link
CN (1) CN113902975B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114970378A (en) * 2022-08-01 2022-08-30 青岛国数信息科技有限公司 Sea clutter sample library construction method based on GAN network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650721A (en) * 2016-12-28 2017-05-10 吴晓军 Industrial character identification method based on convolution neural network
CN106897739A (en) * 2017-02-15 2017-06-27 国网江苏省电力公司电力科学研究院 A kind of grid equipment sorting technique based on convolutional neural networks
CN108491854A (en) * 2018-02-05 2018-09-04 西安电子科技大学 Remote sensing image object detection method based on SF-RCNN
CN109359661A (en) * 2018-07-11 2019-02-19 华东交通大学 A kind of Sentinel-1 radar image classification method based on convolutional neural networks
WO2020037960A1 (en) * 2018-08-21 2020-02-27 深圳大学 Sar target recognition method and apparatus, computer device, and storage medium
CN109800796A (en) * 2018-12-29 2019-05-24 上海交通大学 Ship target recognition methods based on transfer learning
CN111563473A (en) * 2020-05-18 2020-08-21 电子科技大学 Remote sensing ship identification method based on dense feature fusion and pixel level attention
CN112285712A (en) * 2020-10-15 2021-01-29 电子科技大学 Method for improving detection precision of ship on shore in SAR image
CN113469088A (en) * 2021-07-08 2021-10-01 西安电子科技大学 SAR image ship target detection method and system in passive interference scene

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIANG Kun et al.: "SAR Image Ship Detection Based on Deep Learning" *
ZHOU Li: "Research on Ship Target Detection in Synthetic Aperture Radar Images Based on Deep Learning" *
ZHANG Xiaoling et al.: "High-Speed and High-Accuracy SAR Ship Detection Based on Depthwise Separable Convolutional Neural Networks" *

Also Published As

Publication number Publication date
CN113902975B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
Cheng et al. FusionNet: Edge aware deep convolutional networks for semantic segmentation of remote sensing harbor images
Sharifzadeh et al. Ship classification in SAR images using a new hybrid CNN–MLP classifier
de Jong et al. Unsupervised change detection in satellite images using convolutional neural networks
CN108038445B (en) SAR automatic target identification method based on multi-view deep learning framework
Seydi et al. Oil spill detection based on multiscale multidimensional residual CNN for optical remote sensing imagery
Venugopal Automatic semantic segmentation with DeepLab dilated learning network for change detection in remote sensing images
Liu et al. Remote sensing image change detection based on information transmission and attention mechanism
CN112285712A (en) Method for improving detection precision of ship on shore in SAR image
Xiao et al. A review of remote sensing image spatiotemporal fusion: Challenges, applications and recent trends
Han et al. Research on multiple jellyfish classification and detection based on deep learning
Khesali et al. Semi automatic road extraction by fusion of high resolution optical and radar images
CN114973031A (en) Visible light-thermal infrared image target detection method under view angle of unmanned aerial vehicle
Bayramoğlu et al. Performance analysis of rule-based classification and deep learning method for automatic road extraction
Mathias et al. Deep Neural Network Driven Automated Underwater Object Detection.
Ucar et al. A novel ship classification network with cascade deep features for line-of-sight sea data
CN113902975A (en) Scene perception data enhancement method for SAR ship detection
Yu et al. Deep multi-feature learning for water body extraction from Landsat imagery
CN113378716A (en) Deep learning SAR image ship identification method based on self-supervision condition
Yao et al. LiDAR based navigable region detection for unmanned surface vehicles
Bi et al. Machine vision
Meng et al. A modified fully convolutional network for crack damage identification compared with conventional methods
Deepan et al. Comparative analysis of scene classification methods for remotely sensed images using various convolutional neural network
Dong et al. Fast infrared horizon detection algorithm based on gradient directional filtration
Chen et al. Land scene classification for remote sensing images with an improved capsule network
Liu et al. A novel deep transfer learning method for sar and optical fusion imagery semantic segmentation

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant