CN116206099A - Ship position detection method based on SAR image and storage medium - Google Patents


Info

Publication number
CN116206099A
Authority
CN
China
Prior art keywords
ship
feature
sar
image
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310501019.3A
Other languages
Chinese (zh)
Other versions
CN116206099B (en)
Inventor
赵良军
郑莉萍
叶扬
何中良
宁峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University of Science and Engineering
Original Assignee
Sichuan University of Science and Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University of Science and Engineering filed Critical Sichuan University of Science and Engineering
Priority to CN202310501019.3A priority Critical patent/CN116206099B/en
Publication of CN116206099A publication Critical patent/CN116206099A/en
Application granted granted Critical
Publication of CN116206099B publication Critical patent/CN116206099B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention discloses a ship position detection method based on SAR images and a storage medium, and relates to the technical field of ship detection. The method comprises the following steps: acquiring a SAR ship image; inputting the SAR ship image into a SAR image ship detection model, and generating a plurality of suggestion boxes on the SAR ship image; the plurality of suggestion boxes frame the ships in the SAR ship image to obtain the positions of the ships. The invention uses deformable convolution to enlarge the receptive field, and combines the EA-fusion strategy with a self-attention mechanism to provide the REAA backbone feature extraction network, which retains richer effective feature maps; the EAFPN network is designed in combination with the EA-fusion strategy, improving the extraction of medium and large ship target features. Second, the TRPN network is proposed by analyzing the fine granularity of the RPN network, improving the detection granularity of the model and the accuracy of the prediction boxes.

Description

Ship position detection method based on SAR image and storage medium
Technical Field
The invention relates to the technical field of ship detection, in particular to a ship position detection method based on SAR images and a storage medium.
Background
Ship detection methods based on SAR images can predict the position of each ship in a SAR image. Such detection plays a vital role in civil security and military fields and has attracted extensive research interest in recent years.
Ship detection in visible-light images is a different task: visible-light images can only capture ships in the daytime, whereas SAR ship imaging works all day, in all weather, and across multiple dimensions. Since only one object class (ship) is detected, existing methods concentrate more on fine-grained and network-structure design. For example, ship detection has been improved by constructing a quad-feature pyramid network (Quad-FPN), and ship detection under different SAR image resolutions has been realized by designing the HRSID model. These methods suffice to demonstrate that SAR image-based ship detection tasks can be achieved through different network frame designs.
Existing methods adopt small-target detection network structures with deep layers and can effectively detect ship elements in SAR images; however, interference from the environment and geographic position causes large changes in ship size and shape, so existing methods easily produce missed detections. Meanwhile, unlike target recognition in visible-light images, SAR image detection is single-object detection; existing methods focus on fine granularity and network structure design rather than extending network depth, since deeper networks generate a large amount of redundant computation.
Disclosure of Invention
The invention provides a ship position detection method based on SAR images and a storage medium, which solve the prior-art problems that target ships are easily missed and that an overly deep network generates a large amount of computation.
The invention provides a ship position detection method based on SAR images, which comprises the following steps:
acquiring SAR ship images;
inputting the SAR ship image into a SAR image ship detection model, and generating a plurality of suggestion boxes on the SAR ship image;
the plurality of suggestion boxes frame the ship in the SAR ship image to obtain the position of the ship;
inputting the SAR ship image into the SAR image ship detection model to obtain a prediction result, wherein the method comprises the following steps of:
carrying out multi-scale feature extraction on the SAR ship image based on the REAA backbone network to obtain a plurality of first feature images;
fusing the plurality of first feature maps based on the EAFPN network to obtain a predictable second feature map;
a plurality of suggestion boxes are generated on the second feature map based on the TRPN network, and the plurality of suggestion boxes are mapped to the SAR ship image.
Preferably, the multi-scale feature extraction is performed on the SAR ship image based on the REAA backbone network to obtain a plurality of first feature images, which comprises the following steps:
replacing the standard convolution of the four stage feature layers in the ResNet network with deformable convolution;
after convolution replacement, fusing the first-stage feature layer and the third-stage feature layer as well as the second-stage feature layer and the fourth-stage feature layer by combining an EA-fusion strategy with a self-Attention mechanism Attention;
and inputting the SAR ship image into the fused four-stage feature layers to obtain four first feature images.
Preferably, the EA-fusion strategy combines two feature vectors into a composite vector in an "add" parallel-connection manner, the composite vector being given by:

$z = a + ib$

where a and b denote the input features to be fused, z denotes the composite vector obtained after fusion, and i is the imaginary unit.
Preferably, the plurality of first feature maps are obtained by:

$F_k \in \mathbb{R}^{C(k) \times \frac{H}{D(k)} \times \frac{W}{D(k)}}$

where

C(k) = 256 × k
D(k) = 4 × k

k denotes the current stage number, k ∈ {1, 2, 3, 4}, $F_k$ denotes the first feature map of the k-th stage, $\mathbb{R}$ denotes the real number set, C(k) denotes the number of channels of the current feature map, H and W denote the length and width of the SAR ship image, and D(k) denotes the factor by which the current feature map is reduced in length and width relative to the SAR ship image.
Preferably, fusing the plurality of first feature maps based on the EAFPN network to obtain a predictable second feature map comprises the following steps:
sequentially performing downward feature fusion on the four first feature maps using the EA-fusion strategy;
performing a convolution operation on the four first feature maps after the downward feature fusion, unifying the four first feature maps to the same number of channels;
sequentially performing the up-sampling operation on the four first feature maps unified to the same number of channels, and performing upward feature fusion with the corresponding downward-fused features during the up-sampling operation;
convolving the four first feature maps after the upward feature fusion with a 3×3 standard convolution to generate the predictable second feature map.
Preferably, the up-sampling operation is performed sequentially on the four first feature maps unified to the same number of channels, and the corresponding input and output feature maps satisfy the following relations:

$H_{out} = H_{in} \times scale\_factor$

$W_{out} = W_{in} \times scale\_factor$

where $H_{in}$ and $H_{out}$ denote the heights of the input and output feature maps, $W_{in}$ and $W_{out}$ denote the widths of the input and output feature maps, and scale_factor specifies the size of the output feature map as a multiple of the input feature map.
Preferably, generating a plurality of suggestion boxes on the second feature map based on the TRPN network specifically includes the following steps:
inputting the second feature map into a first RPN network, and generating 9 first target boxes of different size ratios at each pixel point on the second feature map through a sliding window;
performing a standard convolution operation on the plurality of pixel points, and judging the 9 first target boxes at the plurality of pixel points;
if a first target box contains an object, performing regression prediction on the first target box, and calculating the total offset between the first target box and the object, to obtain a plurality of first prediction boxes;
calculating the scores of the plurality of first prediction boxes, to obtain a plurality of feature boxes with scores greater than 0.6;
inputting the second feature map into a second RPN network, and generating a plurality of second target boxes of different size ratios at each pixel point on the second feature map through a sliding window;
performing a dilated convolution operation on the plurality of pixel points, and judging the second target boxes at the plurality of pixel points;
if a second target box contains an object, performing regression prediction on the second target box, and calculating the total offset between the second target box and the object, to obtain a plurality of second prediction boxes;
calculating the scores of the plurality of second prediction boxes, to obtain a plurality of suggestion boxes with scores greater than 0.7.
Preferably, the total offset is calculated by:

$o = o_c + o_s$

where $o_c$ is the center-point offset, $o_s$ is the shape offset, and o is the total offset.
Preferably, the TRPN network is trained with a classification loss function and a regression loss function, the classification loss function being calculated by:

$L_{cls}(X, Y) = -\sum_{j=1}^{c} x_j \log(y_j)$

where $L_{cls}$ denotes the classification loss, X = {x} denotes the set of real values, Y = {y} denotes the set of predicted values, j is the current sample, c denotes the number of categories, $x_j$ denotes the true class of the current sample, and $y_j$ denotes the predicted class of the current sample;

the regression loss function is calculated by:

$L_{reg}(e) = \begin{cases} 0.5e^2, & |e| < 1 \\ |e| - 0.5, & |e| \ge 1 \end{cases}$

where $L_{reg}$ denotes the regression loss, and e denotes the difference between the predicted value and the true value.
A computer-readable storage medium storing computer instructions for causing the computer to perform the SAR image-based ship position detection method.
Compared with the prior art, the invention has the beneficial effects that:
the invention uses variability convolution to enlarge receptive field, and combines EA-fusion strategy and self-attention mechanism to provide REAA backbone feature extraction network, which retains more abundant effective feature map. And an EAFPN network is designed by combining an EA-fusion strategy, so that the extraction of the target characteristics of the middle and large ships is improved. Secondly, the TRPN network is provided by analyzing the fine granularity of the RPN network, so that the detection granularity of the model and the accuracy of a prediction frame are improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a SAR image based ship position detection method of the present invention;
FIG. 2 is a structure diagram of the REAA backbone network and the EAFPN network of the SAR image ship detection model of the present invention;
FIG. 3 is a TRPN network structure diagram of the SAR image ship detection model of the present invention;
FIG. 4 is a statistical chart of the number of vessels at different sizes in the dataset of the present invention;
FIG. 5 is a statistical chart of the number of vessels contained in each image in the dataset of the present invention;
FIG. 6 is a statistical chart of target-box width and height scales in the dataset of the present invention;
FIG. 7 is a statistical chart of target-box aspect ratios in the dataset of the present invention;
FIG. 8 is an original image of a SAR ship image in the dataset of the present invention;
FIG. 9 is a horizontal flip chart of SAR ship images in the dataset of the present invention;
FIG. 10 is a scaled deformation map of SAR ship images in the dataset of the present invention;
FIG. 11 is a Gaussian noise plot of SAR ship images in the dataset of the present invention;
FIG. 12 is a median filter diagram of SAR ship images in the dataset of the present invention;
FIG. 13 is a schematic diagram of mAP values of different methods on ships of different sizes;
FIG. 14 is a schematic diagram of AR values of different methods on ships of different sizes;
FIG. 15 is a schematic diagram of model performance under the CE Loss classification loss function;
FIG. 16 is a schematic diagram of model performance under the Focal Loss classification loss function;
FIG. 17 is a schematic diagram of model performance under the Balanced L1 loss regression loss function;
FIG. 18 is a schematic diagram of model performance under the L1 loss regression loss function;
FIG. 19 is a schematic diagram of model performance under the Smooth L1 loss regression loss function;
FIG. 20 is a schematic diagram of model performance under CE Loss combined with Smooth L1 Loss;
FIG. 21 is a schematic diagram of model performance under Focal Loss combined with Smooth L1 Loss;
FIG. 22 is a graph of manual labeling of predicted results at an intensive ship target;
FIG. 23 is a graph of the predicted results of the Cascade Rcnn model on dense vessel targets;
FIG. 24 is a graph of the predicted results of the Faster Rcnn model at dense vessel targets;
FIG. 25 is a graph of predicted results of the RDET Rcnn model on dense vessel targets;
FIG. 26 is a graph of manual labeling of predicted results for an open sea vessel target;
FIG. 27 is a graph of the predicted results of the Cascade Rcnn model on the open sea vessel target;
FIG. 28 is a graph of the predicted results of the Faster Rcnn model on the open sea vessel target;
FIG. 29 is a graph of predicted results of the RDET Rcnn model on an open sea vessel target;
FIG. 30 is a graph of manual labeling of predicted results at a large target vessel target;
FIG. 31 is a graph of the predicted results of the Cascade Rcnn model on a large target vessel target;
FIG. 32 is a graph of the predicted results of the Faster Rcnn model at the large target vessel target;
FIG. 33 is a graph of predicted results of the RDET Rcnn model on a large target vessel target;
FIG. 34 is a graph of manual signature of predicted results for offshore vessel targets;
FIG. 35 is a graph of the predicted results of the Cascade Rcnn model on offshore vessel targets;
FIG. 36 is a graph of the predicted results of the Faster Rcnn model on offshore vessel targets;
fig. 37 is a graph of the predicted results of the RDET Rcnn model on offshore vessel targets.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the invention provides a ship position detection method based on a SAR image, comprising the following steps:
the first step: and acquiring SAR ship images.
An SSDD dataset and a SAR-Ship dataset are acquired. The SSDD dataset contains 1160 pictures and 2456 ships in total, with an average of 2.12 ships per picture. The SAR-Ship dataset is derived from multi-source, multi-mode SAR images and contains 43,819 ship slices in total.
Referring to FIGS. 8-12, in order to avoid under-fitting of the model on large-target ship samples and missed detections caused by the unbalanced spatial distribution of ships in the images, the invention performs data enhancement on the SSDD dataset by scaling, flipping, rotating, and adding various kinds of noise, so as to balance the data samples.
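By way of illustration, the augmentation variants of FIGS. 8-12 can be sketched with OpenCV and NumPy; the specific parameter values (scale factors, noise sigma, filter kernel size) are assumptions, since the text does not state them:

```python
import cv2
import numpy as np

def augment(img: np.ndarray) -> dict:
    """SSDD-style augmentation variants as in FIGS. 8-12: horizontal flip,
    scaling deformation, rotation, Gaussian noise, median filtering.
    All parameter values below are assumed, not taken from the patent."""
    h, w = img.shape[:2]
    noisy = np.clip(img.astype(np.float32)
                    + np.random.normal(0.0, 15.0, img.shape).astype(np.float32),
                    0, 255).astype(np.uint8)
    return {
        "flip": cv2.flip(img, 1),                                # FIG. 9
        "scale": cv2.resize(img, (int(w * 0.8), int(h * 1.2))),  # FIG. 10
        "rotate": cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE),
        "noise": noisy,                                          # FIG. 11
        "median": cv2.medianBlur(img, 5),                        # FIG. 12
    }
```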
The second step: inputting the data-enhanced SAR ship image into the SAR image ship detection model to obtain a prediction result.
Referring to FIGS. 2 and 3, the invention provides a SAR image ship detection model, RDET Rcnn, comprising the REAA backbone network, which enlarges the receptive field and enhances early fusion of deep and shallow features; the EAFPN network, which enhances late fusion of deep and shallow features; and the TRPN network, a candidate-region generator with finer detection granularity.
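Schematically, the data flow through these three components can be sketched as follows; the module attribute names (reaa, eafpn, trpn) are hypothetical placeholders, not the authors' API:

```python
import torch

def detect_ships(sar_image: torch.Tensor, model) -> torch.Tensor:
    """Schematic RDET Rcnn forward pass under the assumed module names."""
    first_maps = model.reaa(sar_image)     # four first feature maps F_1..F_4
    second_map = model.eafpn(first_maps)   # predictable second feature map
    boxes = model.trpn(second_map)         # suggestion boxes (score > 0.7)
    return boxes                           # ship positions in image coordinates
```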
The bottom-level features have higher resolution and contain more location and detail information, but they pass through fewer convolutions and carry more noise. High-level features have stronger semantic information but low resolution and poorer perception of detail. Fusing features of different scales is an important means of improving detection performance. The invention proposes the EA-fusion early-fusion strategy, which combines two feature vectors into a composite vector in an "add" parallel-connection manner; for input features a and b, the output feature is:

$z = a + ib$ (1)

where i is the imaginary unit.
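A minimal sketch of this composite-vector combination, assuming the complex tensor is consumed downstream via its real and imaginary parts (the read-out is not specified in the text):

```python
import torch

def ea_fusion(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """EA-fusion 'add' parallel connection of eq. (1): combine two
    equally shaped feature maps into the composite vector z = a + ib."""
    assert a.shape == b.shape, "EA-fusion expects equally shaped inputs"
    return torch.complex(a, b)  # complex tensor carrying both features
```

Unlike channel concatenation, this form keeps the channel count unchanged while retaining both inputs in the real and imaginary parts.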
The invention uses ResNet as its key component. First, deformable convolution replaces the original standard convolution to enlarge the receptive field of the feature maps, thereby obtaining more feature information. Second, EA-fusion combined with the self-attention mechanism fuses the stage1 and stage3 feature layers and the stage2 and stage4 feature layers, realizing the REAA backbone network that enhances early fusion of deep and shallow features. Given an input image $I \in \mathbb{R}^{3 \times H \times W}$, the plurality of first feature maps are obtained by:

$F_k \in \mathbb{R}^{C(k) \times \frac{H}{D(k)} \times \frac{W}{D(k)}}$ (2)

where

C(k) = 256 × k (3)
D(k) = 4 × k (4)

k denotes the current stage number, k ∈ {1, 2, 3, 4}, $F_k$ denotes the first feature map of the k-th stage, $\mathbb{R}$ denotes the real number set, C(k) denotes the number of channels of the current feature map, H and W denote the length and width of the SAR ship image, and D(k) denotes the factor by which the current feature map is reduced in length and width relative to the SAR ship image.
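One way to realize the convolution replacement is sketched below, assuming torchvision's DeformConv2d with offsets predicted by a plain convolution (a common pairing, assumed here rather than stated by the patent); the stage-shape helper simply evaluates eqs. (2)-(4):

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformConvBlock(nn.Module):
    """Replace a 3x3 standard convolution in a ResNet stage with a
    deformable convolution to enlarge the receptive field."""
    def __init__(self, c_in: int, c_out: int, stride: int = 1):
        super().__init__()
        # 2 offsets (x, y) per sample of the 3x3 kernel -> 18 channels
        self.offset = nn.Conv2d(c_in, 2 * 3 * 3, kernel_size=3,
                                stride=stride, padding=1)
        self.dconv = DeformConv2d(c_in, c_out, kernel_size=3,
                                  stride=stride, padding=1)

    def forward(self, x):
        return self.dconv(x, self.offset(x))

def stage_shape(k: int, H: int, W: int) -> tuple:
    """Shape of the k-th first feature map per eqs. (2)-(4):
    (C(k), H/D(k), W/D(k)) with C(k) = 256k and D(k) = 4k."""
    return (256 * k, H // (4 * k), W // (4 * k))
```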
After the backbone network architecture design, the invention constructs the EAFPN feature fusion network layer based on FPN. FPN is a typical late-fusion strategy for deep features: it fuses the multi-scale feature information extracted by the backbone network, thereby improving target detection accuracy. In order to extract as much effective feature information as possible, the invention performs EA-fusion on the output feature layers of stage2, stage3, and stage4 in turn before the late-fusion operation, thereby realizing the EAFPN network that enhances late fusion of deep and shallow features.
EAFPN first restores the channel number of the four first feature maps output by REAA to 256 through a convolution operation, the feature map shape being expressed as $F'_k \in \mathbb{R}^{256 \times \frac{H}{D(k)} \times \frac{W}{D(k)}}$, k ∈ {1, 2, 3, 4}. Second, a 2× bilinear-interpolation up-sampling operation is performed, where the heights and widths of the feature maps before and after up-sampling satisfy the following relations:

$H_{out} = H_{in} \times scale\_factor$ (5)

$W_{out} = W_{in} \times scale\_factor$ (6)

where $H_{in}$ and $H_{out}$ denote the heights of the input and output feature maps, $W_{in}$ and $W_{out}$ denote the widths of the input and output feature maps, and scale_factor specifies the size of the output feature map as a multiple of the input feature map.
Finally, lateral connections are made in combination with the EA-fusion strategy: the processed REAA output results are fused with the feature layers obtained by the corresponding up-sampling, and a 3×3 standard convolution is used to eliminate the aliasing effect and generate the predictable second feature map.
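A compact sketch of one such top-down EAFPN step follows; taking the magnitude of the composite vector as the real-valued read-out of EA-fusion is an assumption, since the patent specifies only the fusion form:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EAFPNStep(nn.Module):
    """One EAFPN top-down step: 1x1 lateral conv to 256 channels, 2x
    bilinear up-sampling per eqs. (5)-(6), EA-fusion with the lateral
    feature, then a 3x3 conv to eliminate aliasing."""
    def __init__(self, c_lateral: int):
        super().__init__()
        self.lateral = nn.Conv2d(c_lateral, 256, kernel_size=1)
        self.smooth = nn.Conv2d(256, 256, kernel_size=3, padding=1)

    def forward(self, top: torch.Tensor, lateral_in: torch.Tensor):
        up = F.interpolate(top, scale_factor=2, mode="bilinear",
                           align_corners=False)   # eqs. (5)-(6)
        lat = self.lateral(lateral_in)            # back to 256 channels
        z = torch.complex(lat, up)                # EA-fusion z = a + ib
        fused = torch.abs(z)                      # assumed real read-out
        return self.smooth(fused)                 # 3x3 conv, anti-aliasing
```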
Referring again to FIG. 3, TRPN is used to screen out boxes where targets may exist. The RPN relies on a sliding window over the shared feature map to generate 9 anchors at each location and fine-tunes them by convolving the difference between the current anchor and the real box; but because the real boxes are not aligned with the anchors, large regression errors occur. To alleviate the alignment problem, the invention designs the TRPN network, which comprises a first RPN network and a second RPN network. The first RPN network is the T1 part, a standard RPN network. The second RPN network is a dilated RPN built on the T1 part, replacing the standard convolution of the standard RPN with dilated convolution. Generating suggestion boxes on the second feature map based on the TRPN network specifically includes the following steps:
S1: inputting the second feature map into the first RPN network, and generating 9 first target boxes of different size ratios at each pixel point on the second feature map through a sliding window;
S2: performing a standard convolution operation on the plurality of pixel points through the first detection head H1, and judging the 9 first target boxes at the plurality of pixel points;
S3: if a first target box contains an object, performing regression prediction on the first target box through the regression prediction branch B1, and calculating the total offset between the first target box and the object, to obtain a plurality of first prediction boxes;
S4: calculating the scores of the plurality of first prediction boxes through the classification prediction branch C1, to obtain a plurality of feature boxes with scores greater than 0.6;
S5: inputting the second feature map into the second RPN network, and generating a plurality of second target boxes of different size ratios at each pixel point on the second feature map through a sliding window;
S6: performing a dilated convolution operation on the plurality of pixel points using the second detection head H2, the dilation coefficient r being determined by the magnitude of the total offset o, and judging the second target boxes at the plurality of pixel points;
S7: if a second target box contains an object, performing regression prediction on the second target box through the regression prediction branch B2, and calculating the total offset between the second target box and the object, to obtain a plurality of second prediction boxes;
S8: calculating the scores of the plurality of second prediction boxes through the classification prediction branch C2, to obtain a plurality of suggestion boxes with scores greater than 0.7;
S9: mapping the plurality of suggestion boxes onto the SAR ship image; the plurality of suggestion boxes frame the ships in the SAR ship image, giving the positions of the ships.
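The two score gates in steps S4 and S8 amount to a simple threshold filter; a sketch (the tensor layout is assumed):

```python
import torch

def filter_by_score(boxes: torch.Tensor, scores: torch.Tensor,
                    threshold: float) -> torch.Tensor:
    """Keep prediction boxes whose objectness score exceeds the gate:
    0.6 after the first RPN stage (feature boxes), 0.7 after the
    second stage (suggestion boxes)."""
    keep = scores > threshold
    return boxes[keep]

# usage: feature_boxes = filter_by_score(b1, s1, 0.6)
#        suggestion_boxes = filter_by_score(b2, s2, 0.7)
```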
The total offset is calculated by:

$o = o_c + o_s$ (7)

where $o_c$ is the center-point offset, and $o_s$ is the shape offset, determined by the shape of the anchor and the convolution kernel size.
The loss calculation of the TRPN network consists of two parts: the classification loss $L_{cls}$ between the ship target predictions and the real targets, and the regression loss $L_{reg}$ of the ship target detection boxes. The CE_S loss function of the invention combines a cross-entropy classification loss with a Smooth L1 regression loss, as shown in the following formula:

$L_{CE\_S} = L_{cls} + L_{reg}$ (8)
the present invention uses a cross entropy loss calculation model for classification loss. The classification cross entropy loss function formula is shown as follows:
Figure SMS_48
(9)
in the method, in the process of the invention,
Figure SMS_49
representing a loss of classification,/->
Figure SMS_50
Representing a set of real values, +.>
Figure SMS_51
Representing a set of predicted values, j being the current sample, c representing the number of categories, +.>
Figure SMS_52
Representing the true class of the current sample, +.>
Figure SMS_53
Representing the predicted class of the current sample.
For the regression loss of the prediction boxes, the invention adopts the Smooth L1 loss function. Ship detection involves a single sample; defining e as the difference between the predicted value and the true value, the corresponding Smooth L1 loss is:

$L_{reg}(e) = \begin{cases} 0.5e^2, & |e| < 1 \\ |e| - 0.5, & |e| \ge 1 \end{cases}$ (10)
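A runnable sketch of the CE_S loss of eqs. (8)-(10), assuming equal weighting of the two terms (the patent states no weighting coefficient):

```python
import torch
import torch.nn.functional as F

def ce_s_loss(cls_logits: torch.Tensor, cls_target: torch.Tensor,
              box_pred: torch.Tensor, box_target: torch.Tensor) -> torch.Tensor:
    """CE_S = cross-entropy classification loss (eq. 9)
            + Smooth L1 regression loss (eq. 10)."""
    l_cls = F.cross_entropy(cls_logits, cls_target)
    # beta=1.0 reproduces the |e| < 1 branch point of eq. (10)
    l_reg = F.smooth_l1_loss(box_pred, box_target, beta=1.0)
    return l_cls + l_reg
```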
examples
The invention trains and tests the ship detection model on the SSDD dataset and the SAR-Ship dataset. The SSDD dataset consists of 1160 SAR ship images and the SAR-Ship dataset of 43,819 SAR ship images; each dataset is divided proportionally into a training set, a validation set, and a test set. The SSDD dataset serves as the training and validation dataset of the RDET Rcnn model, and SAR-Ship serves as the experimental dataset for quantitative analysis of the model. To keep the SSDD and SAR-Ship dataset formats identical, the image labels are uniformly set to the COCO data format.
By carrying out statistical analysis on each ship image in the SSDD data set, the following three characteristics are obtained:
(1) As shown in FIG. 4, most ships in the dataset are small-size targets and there are few large-size ship targets; that is, the ship size distribution is unbalanced, and the model under-fits when detecting large-size ship targets.
(2) As shown in FIG. 5, images containing only 1 ship are the most numerous, the remaining images contain 2 to 3 ships on average, and a small number of images contain a large number of ship elements; the ship spatial distribution is therefore uneven, and missed detections occur when detecting images that contain many ship targets.
(3) As shown in FIGS. 6 and 7, the ship target boxes are mostly small squares and elongated rectangles, and the anchor-box aspect ratios are concentrated around 2, so the initial anchor-ratio values in the ship detection model can be set to [0.5, 1.0, 2.0].
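Using this ratio prior, the 9 anchors per location can be generated as follows; the base size and scale set are assumptions chosen to mirror the standard RPN configuration:

```python
import numpy as np

def make_anchors(base: int = 8, scales=(8, 16, 32),
                 ratios=(0.5, 1.0, 2.0)) -> np.ndarray:
    """9 anchors = 3 scales x 3 aspect ratios (h/w), with the initial
    ratio prior [0.5, 1.0, 2.0] from the dataset statistics."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = base * s / np.sqrt(r)   # width shrinks as the ratio grows
            h = base * s * np.sqrt(r)   # so that h / w == r
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)            # (9, 4) boxes centered at the origin
```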
In order to evaluate the performance of the RDET Rcnn algorithm in SAR image ship detection, the invention adopts precision (P), recall (R), mean average precision (mAP), and average recall (AR) as evaluation indexes, built from TP (true positive), FP (false positive), and FN (false negative) counts. TP represents the number of targets predicted positive that are actually positive, i.e., a result counts as a true positive if and only if RDET Rcnn accurately detects and localizes a ship target. FP represents the number of targets predicted positive that are actually negative. FN represents the number of targets predicted negative that are actually positive.
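These index definitions reduce to the usual precision/recall computation; a sketch:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """P = TP / (TP + FP): fraction of predicted ships that are real.
    R = TP / (TP + FN): fraction of real ships that were found."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```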
The experiments adopt ResNet-50 as the base component. First, the network is initialized by training on the COCO-format dataset; second, the model is trained for 30 epochs with an SGD optimizer. The initial learning rate of the backbone network is set to 0.02 and the momentum to 0.9; the per-channel mean of the normalized dataset images is [0.1559097, 0.15591368, 0.15588938] and the variance is [0.10875329, 0.10876005, 0.10869534]. All experiments of the invention were performed on an NVIDIA GeForce RTX 3060 GPU.
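A sketch of the corresponding training setup; the model object is a stand-in, the batch size is not given in the text, and whether the stated per-channel statistics are variances or standard deviations is ambiguous in the source:

```python
import torch
import torchvision.transforms as T

model = torch.nn.Conv2d(3, 16, 3)  # stand-in for the RDET Rcnn network

# SGD with the stated hyper-parameters: initial lr 0.02, momentum 0.9;
# the model is trained for 30 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9)

# Per-channel statistics measured on the dataset. T.Normalize expects
# standard deviations; if the stated values are true variances, take
# their square roots first.
normalize = T.Normalize(mean=[0.1559097, 0.15591368, 0.15588938],
                        std=[0.10875329, 0.10876005, 0.10869534])
```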
Referring to Table 1, a quantitative comparison of RDET Rcnn with SAR image ship detection methods of each detection type is shown. Specifically, the invention makes a per-class comparison because the detection mechanism of each major class of target detection algorithms differs. In the anchor-free type, SAR ship detection algorithms using the FCOS and CornerNet models as reference models are common; among them, CP-FCOS, based on the FCOS framework, reconstructs the network layer and adds a Class Position (CP) module layer to optimize the features of the regression branches, effectively improving model performance while also increasing the number of model layers. In the single-stage detection type, network models built on the one-stage idea perform excellently on small-target detection tasks, but the overall accuracy AP_0.5:0.95 does not increase significantly. In the two-stage detection type, most studies improve on Faster Rcnn and Cascade Rcnn as reference models; CRTransSar, using the Swin Transformer as its basic framework, proposes a backbone network based on context-coupled representation learning, but its model size and parameter count are excessively large. The outstanding advantage of PVT-SAR is that it effectively improves small-target detection accuracy using a two-stage algorithm. Compared with these existing target detection methods, RDET Rcnn, as a two-stage SAR image ship detection model, still performs well, improving large-target detection while maintaining small-target detection efficiency.
Table 1 Comparison with various SAR image ship detection models
The REAA backbone network, with enhanced early fusion of deep and shallow features, can efficiently retain more information and capture long-range context information. The invention classifies ships into 3 types by size, from small to large: small-size targets, medium-size targets, and large-size targets.
As can be seen from FIGS. 13 and 14, as ship size increases, detection and positioning of the ship become more difficult, and the detection performance of REAA is better than that of other popular backbone networks. The reason is that relying on the limited receptive field of standard convolution gives weaker capability in capturing context information over longer distances. Conversely, REAA feature extraction is more efficient because deformable convolution increases the receptive field and the EA-fusion strategy enhances early fusion of deep and shallow features, yielding richer and longer-range contextual feature information.
EAFPN was compared to existing mainstream Neck networks and the performance of different Neck networks on SSDD dataset is shown in table 2.
Table 2 Comparative experiments with different Neck networks
The feature pyramid (FPN) network is mainly used to extract feature maps of different scales and provide them to the later network for the prediction task. As can be seen from Table 2, the EAFPN network of the invention has clear advantages in SAR image ship detection tasks. With the mainstream improved CARFPN and PAFPN, accuracy drops noticeably; this is because the dataset belongs to a single-color, simple target type, and too many convolution operations during late feature fusion lose more of the effective feature information. In contrast, EAFPN enriches the input feature information by combining the EA-fusion strategy, so it can extract more effective feature information.
In order to refine the detection granularity of the RDET Rcnn model, the TRPN proposal-region generation network provided by the invention generates higher-quality proposal boxes mainly by designing two RPNs in a master-slave relation. Likewise, a comparative experiment was performed between TRPN and existing mainstream RPN networks; the performance of different RPN networks on the SSDD dataset is shown in Table 3.
Table 3 Comparative experiments with different RPN networks
The region proposal network (RPN) is mainly used to screen out boxes where targets may exist. As can be seen from Table 3, both mainstream RPN-based improvements, CRPN and GARPN, perform well in this experiment. CRPN emphasizes anchor alignment rules, using adaptive convolution to fine-tune the anchors at each stage. GARPN focuses on judging whether the probability at a target point exceeds a threshold in order to adjust the anchor, and its large-size ship positioning boxes are more accurate than CRPN's. TRPN, designed by combining the advantages of these two RPNs, places two RPNs in a master-slave relation: the anchor alignment rule fine-tunes the positions of the prediction boxes, and ship prediction boxes above a higher threshold are sent to the second stage for fine-tuning again. The final TRPN performs outstandingly in SAR image ship detection tasks.
To verify the effectiveness of the CE_S loss function on the RDET Rcnn model, three sets of comparative experiments were performed. Referring to FIGS. 15 and 16, using Focal Loss as the classification loss has clear advantages, and model convergence is also fast. Referring to FIGS. 17 to 19, the advantages of the Smooth L1 loss are fully exploited in model training. Referring to FIGS. 20 and 21, where CE Loss and Focal Loss are each combined with the Smooth L1 loss, the total loss of Focal Loss combined with Smooth L1 loss is similar to that of CE Loss combined with Smooth L1 loss.
Referring to Table 4, although Focal Loss converges better and performs well in the sample classification loss calculation, accuracy drops significantly compared with training the model with CE Loss. In natural image detection, Focal Loss focuses on hard samples from the perspective of sample difficulty, alleviating sample imbalance while improving the overall performance of natural image detection models. However, this dataset has an obvious characteristic: the difference between ship targets and non-ship objects is small, and ship shapes can change under noise interference, so Focal Loss over-interprets ship samples during training and reduces the accuracy of the model. The invention therefore uses CE_S as the loss function of the RDET Rcnn model.
Table 4 loss function accuracy comparison
In practical applications, shooting height, environmental noise, and picture brightness are the most common sources of ship information change, so the invention selects SAR ship images of different scene complexity from images captured by different radar satellites. As shown in FIGS. 22 to 37, comparative experiments were performed between RDET Rcnn, Cascade Rcnn, and Faster Rcnn, and the prediction results of the RDET Rcnn model were analyzed. The results show that, as shown in FIGS. 22 to 25 and FIGS. 34 to 37, Cascade Rcnn is good at detecting large and medium-sized ship targets but is prone to missed detections on small-target ships. In FIGS. 22 to 33, Faster Rcnn is found to be more suitable for detecting small-target ships but prone to false detections of non-ship objects when facing both small targets and large-target ships. Compared with Cascade Rcnn and Faster Rcnn, the RDET Rcnn ship detection model proposed herein performs better in different complex scenes and has better detection capability when facing different sizes and offshore interference.
The method is used for detecting and positioning marine ships. As a ship-centric detection task, it may in practice relate to marine civil safety and ship monitoring in off-course areas. In practical applications, differences in SAR shooting distance and environment cause large changes in factors such as ship size and noise; RDET Rcnn therefore not only enhances the extraction of ship features but also refines the detection granularity, i.e., it improves the accuracy of large-target detection while maintaining a high small-target detection rate.
The invention provides a novel SAR image ship detection model, RDET Rcnn, which refines detection granularity and enhances deep feature fusion. The RDET Rcnn model is designed to detect and position ships in different SAR shooting scenes. The model is not affected by ship scale changes or strong noise interference and has high effectiveness. Extensive experiments verify the strong performance of the model.
The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium is stored with computer executable instructions which can execute the ship position detection method based on SAR images.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. The ship position detection method based on the SAR image is characterized by comprising the following steps of:
acquiring SAR ship images;
inputting the SAR ship image into a SAR image ship detection model, and generating a plurality of suggestion boxes on the SAR ship image;
the plurality of suggestion boxes frame the ship in the SAR ship image to obtain the position of the ship;
inputting the SAR ship image into the SAR image ship detection model, and generating a plurality of suggestion boxes on the SAR ship image, comprises the following steps:
carrying out multi-scale feature extraction on the SAR ship image based on the REAA backbone network to obtain a plurality of first feature images;
fusing the plurality of first feature maps based on the EAFPN network to obtain a predictable second feature map;
a plurality of suggestion boxes are generated on the second feature map based on the TRPN network, and the plurality of suggestion boxes are mapped to the SAR ship image.
2. The ship position detection method based on the SAR image as set forth in claim 1, wherein the multi-scale feature extraction is performed on the SAR ship image based on the REAA backbone network to obtain a plurality of first feature images, and the method comprises the following steps:
replacing the standard convolution of the four stage feature layers in the ResNet network with deformable convolution;
after convolution replacement, fusing the first-stage feature layer and the third-stage feature layer as well as the second-stage feature layer and the fourth-stage feature layer by combining an EA-fusion strategy with a self-Attention mechanism Attention;
and inputting the SAR ship image into the fused four-stage feature layers to obtain four first feature images.
3. The ship position detection method based on the SAR image as set forth in claim 2, wherein the EA-fusion strategy combines two feature vectors into a composite vector in an "add" parallel-connection manner, the composite vector being given by:

$z = a + ib$ ;

wherein a and b denote the input features to be fused, z denotes the composite vector obtained after fusion, and i is the imaginary unit.
4. The SAR image-based ship position detection method of claim 2, wherein the plurality of first feature maps are obtained by:

$F_k \in \mathbb{R}^{C(k) \times \frac{H}{D(k)} \times \frac{W}{D(k)}}$ ;

wherein

C(k) = 256 × k;
D(k) = 4 × k;

k denotes the current stage number, k ∈ {1, 2, 3, 4}, $F_k$ denotes the first feature map of the k-th stage, $\mathbb{R}$ denotes the real number set, C(k) denotes the number of channels of the current feature map, H and W denote the length and width of the SAR ship image, and D(k) denotes the factor by which the current feature map is reduced in length and width relative to the SAR ship image.
5. The SAR image-based ship position detection method according to claim 4, wherein fusing the plurality of first feature maps based on the EAFPN network to obtain a predictable second feature map comprises the steps of:
sequentially performing downward feature fusion on the four first feature maps using the EA-fusion strategy;
performing a convolution operation on the four first feature maps after the downward feature fusion, unifying the four first feature maps to the same number of channels;
sequentially performing the up-sampling operation on the four first feature maps unified to the same number of channels, and performing upward feature fusion with the corresponding downward-fused features during the up-sampling operation;
and convolving the four first feature maps after the upward feature fusion with a 3×3 standard convolution to generate the predictable second feature map.
6. The ship position detection method based on SAR image as set forth in claim 5, wherein the up-sampling operation is performed sequentially on the four first feature maps unified to the same number of channels, and the corresponding input and output feature maps satisfy:

$H_{out} = H_{in} \times scale\_factor$ ;

$W_{out} = W_{in} \times scale\_factor$ ;

wherein $H_{in}$ and $H_{out}$ denote the heights of the input and output feature maps, $W_{in}$ and $W_{out}$ denote the widths of the input and output feature maps, and scale_factor specifies the size of the output feature map as a multiple of the input feature map.
7. The SAR image-based ship position detection method according to claim 5, wherein the TRPN network generates a plurality of suggestion boxes on the second feature map, comprising the steps of:
inputting the second feature map into a first RPN network, and generating 9 first target boxes of different size ratios at each pixel point on the second feature map through a sliding window;
performing a standard convolution operation on the plurality of pixel points, and judging the 9 first target boxes at the plurality of pixel points;
if a first target box contains an object, performing regression prediction on the first target box, and calculating the total offset between the first target box and the object, to obtain a plurality of first prediction boxes;
calculating the scores of the plurality of first prediction boxes, to obtain a plurality of feature boxes with scores greater than 0.6;
inputting the second feature map into a second RPN network, and generating second target boxes of different size ratios at each pixel point on the second feature map through a sliding window;
performing a dilated convolution operation on the plurality of pixel points, and judging the second target boxes at the plurality of pixel points;
if a second target box contains an object, performing regression prediction on the second target box, and calculating the total offset between the second target box and the object, to obtain a plurality of second prediction boxes;
and calculating the scores of the plurality of second prediction boxes, to obtain a plurality of suggestion boxes with scores greater than 0.7.
8. The SAR image-based ship position detection method of claim 7, wherein the total offset is calculated by:
$o = o_c + o_s$ ;

wherein $o_c$ is the center-point offset, $o_s$ is the shape offset, and o is the total offset.
9. The SAR image-based ship position detection method of claim 7, wherein the TRPN network is trained by a classification loss function and a regression loss function, and the classification loss function is calculated by:
$L_{cls}(X, Y) = -\sum_{j=1}^{c} x_j \log(y_j)$ ;

wherein $L_{cls}$ represents the classification loss, X = {x} represents the set of real values, Y = {y} represents the set of predicted values, j is the current sample, c represents the number of categories, $x_j$ represents the true class of the current sample, and $y_j$ represents the predicted class of the current sample;
the regression loss function is calculated by:

$L_{reg}(e) = \begin{cases} 0.5e^2, & |e| < 1 \\ |e| - 0.5, & |e| \ge 1 \end{cases}$ ;

wherein $L_{reg}$ represents the regression loss, and e represents the difference between the predicted value and the true value.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing the computer to execute the SAR image-based ship position detection method according to any one of claims 1 to 9.
CN202310501019.3A 2023-05-06 2023-05-06 Ship position detection method based on SAR image and storage medium Active CN116206099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310501019.3A CN116206099B (en) 2023-05-06 2023-05-06 Ship position detection method based on SAR image and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310501019.3A CN116206099B (en) 2023-05-06 2023-05-06 Ship position detection method based on SAR image and storage medium

Publications (2)

Publication Number Publication Date
CN116206099A true CN116206099A (en) 2023-06-02
CN116206099B CN116206099B (en) 2023-08-15

Family

ID=86517737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310501019.3A Active CN116206099B (en) 2023-05-06 2023-05-06 Ship position detection method based on SAR image and storage medium

Country Status (1)

Country Link
CN (1) CN116206099B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533084A (en) * 2019-08-12 2019-12-03 长安大学 A kind of multiscale target detection method based on from attention mechanism
AU2020103715A4 (en) * 2020-11-27 2021-02-11 Beijing University Of Posts And Telecommunications Method of monocular depth estimation based on joint self-attention mechanism
CN112464883A (en) * 2020-12-11 2021-03-09 武汉工程大学 Automatic detection and identification method and system for ship target in natural scene
US20210390700A1 (en) * 2020-06-12 2021-12-16 Adobe Inc. Referring image segmentation
CN114049561A (en) * 2021-11-25 2022-02-15 江苏科技大学 Ship target detection model and method
CN114202696A (en) * 2021-12-15 2022-03-18 安徽大学 SAR target detection method and device based on context vision and storage medium
CN114299303A (en) * 2021-12-07 2022-04-08 集美大学 Ship target detection method, terminal device and storage medium
WO2022073452A1 (en) * 2020-10-07 2022-04-14 武汉大学 Hyperspectral remote sensing image classification method based on self-attention context network
CN114821155A (en) * 2022-03-29 2022-07-29 国家电网有限公司大数据中心 Multi-label classification method and system based on deformable NTS-NET neural network
CN114973116A (en) * 2022-01-21 2022-08-30 昆明理工大学 Method and system for detecting foreign matters embedded into airport runway at night by self-attention feature
CN115049923A (en) * 2022-05-30 2022-09-13 北京航空航天大学杭州创新研究院 SAR image ship target instance segmentation training method, system and device
CN115546555A (en) * 2022-10-18 2022-12-30 安徽大学 Lightweight SAR target detection method based on hybrid characterization learning enhancement

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LIPING ZHENG等: "SSE-Ship: A SAR Image Ship Detection Model with Expanded Detection Field of View and Enhanced Effective Feature Information", 《OPEN JOURNAL OF APPLIED SCIENCES》, vol. 13, no. 04, pages 562 - 578 *
WEI TAN等: "Multimodal medical image fusion algorithm in the era of big data", 《NEURAL COMPUTING AND APPLICATIONS (2020)》, pages 1 - 21 *
ZHANG Yang et al.: "SAR image ship detection technology based on improved Faster R-CNN", Radio Engineering, vol. 52, no. 12, pp. 2280-2287 *
ZHU Mingming et al.: "Airport detection method based on improved region convolutional neural network", Acta Optica Sinica, vol. 38, no. 07, pp. 330-335 *
MI Yaxin: "Magnetic flux leakage image recognition method for pipeline defects based on low-level feature fusion multi-kernel convolutional neural network", Contemporary Chemical Industry, vol. 52, no. 03, pp. 677-681 *
MA Haowei et al.: "Infrared image detection algorithm for ships in haze environments based on improved YOLOv5", Journal of Transport Information and Safety, vol. 41, no. 01, pp. 95-104 *

Also Published As

Publication number Publication date
CN116206099B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN111652321B (en) Marine ship detection method based on improved YOLOV3 algorithm
Chen et al. Learning slimming SAR ship object detector through network pruning and knowledge distillation
CN112766087A (en) Optical remote sensing image ship detection method based on knowledge distillation
CN110516605A (en) Any direction Ship Target Detection method based on cascade neural network
Wang et al. Ship detection based on deep learning
CN112418165B (en) Small-size target detection method and device based on improved cascade neural network
CN116863539A (en) Fall figure target detection method based on optimized YOLOv8s network structure
CN109190456A (en) Pedestrian detection method is overlooked based on the multiple features fusion of converging channels feature and gray level co-occurrence matrixes
CN115526935A (en) Pixel-level capture pose detection method and system based on global and local information
Cao et al. Detection of microalgae objects based on the Improved YOLOv3 model
CN115546555A (en) Lightweight SAR target detection method based on hybrid characterization learning enhancement
Dai et al. GCD-YOLOv5: An armored target recognition algorithm in complex environments based on array lidar
CN116206099B (en) Ship position detection method based on SAR image and storage medium
CN117315752A (en) Training method, device, equipment and medium for face emotion recognition network model
Fan et al. An improved yolov5 marine biological object detection algorithm
CN116403133A (en) Improved vehicle detection algorithm based on YOLO v7
Wang et al. YOLOV5s-Face face detection algorithm
Li et al. Target detection in color sonar image based on YOLOV5 network
CN115457423A (en) Flame smoke detection method combining efficient sampling enhancement
Zhang et al. Ship detection based on improved YOLO algorithm
CN114283336A (en) Anchor-frame-free remote sensing image small target detection method based on mixed attention
Ke et al. Scale-aware dimension-wise attention network for small ship instance segmentation in synthetic aperture radar images
Zhao et al. A novel fusion framework without pooling for noisy SAR image classification
CN116468928B (en) Thermal infrared small target detection method based on visual perception correlator
Hu et al. FSPN: Feature Selection Pyramid Network for Tiny Person Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant