CN116206099A - Ship position detection method based on SAR image and storage medium - Google Patents
- Publication number
- CN116206099A (application CN202310501019.3A)
- Authority
- CN
- China
- Prior art keywords
- ship
- feature
- sar
- image
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses a ship position detection method based on SAR images and a storage medium, and relates to the technical field of ship detection. The method comprises the following steps: acquiring a SAR ship image; inputting the SAR ship image into a SAR image ship detection model and generating a plurality of suggestion frames on the SAR ship image; and using the plurality of suggestion frames to frame-select the ships in the SAR ship image to obtain the ship positions. The invention uses deformable convolution to enlarge the receptive field and combines an EA-fusion strategy with a self-attention mechanism to propose the REAA backbone feature extraction network, which retains richer effective feature maps; an EAFPN network is designed in combination with the EA-fusion strategy, improving the extraction of medium and large ship target features. Secondly, a TRPN network is proposed by analyzing the fine granularity of the RPN network, improving the detection granularity of the model and the accuracy of the prediction frames.
Description
Technical Field
The invention relates to the technical field of ship detection, in particular to a ship position detection method based on SAR images and a storage medium.
Background
The ship detection method based on SAR images can predict the position of each ship in a SAR image. It plays a vital role in the civil security and military fields, and has attracted extensive research interest in recent years.
Visible-light ship detection tasks are different: visible-light images can only capture ships in the daytime, whereas SAR ship images can be acquired at all times, in all weather and in multiple dimensions. Since only one type of object (ships) is detected, existing methods focus more on fine-granularity and network-structure design. For example, ship detection has been improved by constructing a quad feature pyramid network (Quad-FPN), and ship detection under different SAR image resolutions has been realized by designing the HRSID model. These methods demonstrate that SAR-image-based ship detection can be achieved through different network frame designs.
Existing methods adopt small-target detection network structures with deeper network layers and can effectively detect ship elements in SAR images; however, interference from the environment and geographic position causes large variations in ship size and shape, so existing methods are prone to missed detections. Meanwhile, unlike visible-light target recognition, SAR image ship detection is single-object detection; existing methods therefore emphasize fine granularity and network-structure design rather than network depth extension, since deeper networks generate a large amount of redundant computation.
Disclosure of Invention
The invention provides a ship position detection method based on SAR images and a storage medium, which solve the problems in the prior art that target ships are easily missed and that overly deep network layers generate a large amount of computation.
The invention provides a ship position detection method based on SAR images, which comprises the following steps:
acquiring SAR ship images;
inputting the SAR ship image into an SAR image ship detection model, and generating a plurality of suggestion frames on the SAR ship image;
the plurality of suggestion frames carry out frame selection on the ship in the SAR ship image to obtain the position of the ship;
inputting the SAR ship image into the SAR image ship detection model to obtain a prediction result comprises the following steps:
carrying out multi-scale feature extraction on the SAR ship image based on the REAA backbone network to obtain a plurality of first feature images;
fusing the plurality of first feature maps based on the EAFPN network to obtain a predictable second feature map;
a plurality of suggestion boxes are generated on the second feature map based on the TRPN network, and the plurality of suggestion boxes are mapped to the SAR ship image.
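A minimal sketch of the three-step pipeline described above (REAA feature extraction, EAFPN fusion, TRPN proposal generation); every function body here is an illustrative stub, not the patent's implementation:

```python
# Hypothetical skeleton of the detection pipeline described above.
# Each stage is a stub returning placeholder data; the real model would
# run the REAA backbone, the EAFPN fusion network and the TRPN head.

def reaa_backbone(image):
    """Multi-scale feature extraction -> four 'first feature maps'."""
    return [f"feat_stage{k}" for k in (1, 2, 3, 4)]

def eafpn_fuse(first_feature_maps):
    """Fuse the four first feature maps into a predictable second feature map."""
    return "second_feature_map:" + "+".join(first_feature_maps)

def trpn_propose(second_feature_map):
    """Generate suggestion boxes (x1, y1, x2, y2) on the second feature map."""
    return [(10, 10, 50, 40), (60, 20, 90, 70)]

def detect_ships(image):
    feats = reaa_backbone(image)
    fused = eafpn_fuse(feats)
    return trpn_propose(fused)  # boxes mapped back onto the SAR ship image

print(detect_ships("sar_ship_image"))  # [(10, 10, 50, 40), (60, 20, 90, 70)]
```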
Preferably, the multi-scale feature extraction is performed on the SAR ship image based on the REAA backbone network to obtain a plurality of first feature images, which comprises the following steps:
replacing the common convolution of the four stage feature layers in the ResNet network with deformable convolution;
after convolution replacement, fusing the first-stage feature layer and the third-stage feature layer as well as the second-stage feature layer and the fourth-stage feature layer by combining an EA-fusion strategy with a self-Attention mechanism Attention;
and inputting the SAR ship image into the fused four-stage feature layers to obtain four first feature images.
Preferably, the EA-fusion strategy combines two feature vectors into a composite vector in an add parallel-connection manner, represented by the following formula:

z = a + b·i

where a and b represent the input features to be fused, z represents the composite vector obtained after fusion, and i is the imaginary unit.
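The add parallel-connection combination can be illustrated with a short sketch; the element-wise complex pairing shown here is an assumption based on the formula z = a + b·i, and the function name is hypothetical:

```python
# Sketch of the EA-fusion "add" combination: two real-valued feature
# vectors a and b are merged element-wise into one complex composite
# vector z = a + b*i. Names are illustrative, not from the patent.

def ea_fusion(a, b):
    """Combine two equal-length feature vectors into a complex composite vector."""
    assert len(a) == len(b), "features to fuse must share the same shape"
    return [complex(x, y) for x, y in zip(a, b)]

z = ea_fusion([1.0, 2.0], [3.0, 4.0])
print(z)  # [(1+3j), (2+4j)]
```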
Preferably, the plurality of first feature maps are obtained by:

F_k ∈ R^(C(k) × H/D(k) × W/D(k))

wherein:

C(k) = 256 × k
D(k) = 4 × k

where k represents the current stage number, k ∈ {1, 2, 3, 4}; F_k represents the first feature map of the k-th stage; R represents the set of real numbers; C(k) represents the number of channels of the current feature map; H and W represent the height and width of the input SAR ship image; and D(k) represents the factor by which the height and width are reduced in the current feature map.
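Under the stated definitions C(k) = 256 × k and D(k) = 4 × k, the stage-wise shapes can be computed directly; the helper name and the 512 × 512 input size below are illustrative only:

```python
# Hypothetical helper illustrating the stage-wise feature-map shapes:
# channels C(k) = 256*k and down-sampling factor D(k) = 4*k for k in 1..4.

def stage_shape(k, H, W):
    C = 256 * k  # channel count of the k-th first feature map
    D = 4 * k    # height/width reduction factor w.r.t. the input image
    return (C, H // D, W // D)

shapes = [stage_shape(k, 512, 512) for k in (1, 2, 3, 4)]
print(shapes)  # [(256, 128, 128), (512, 64, 64), (768, 42, 42), (1024, 32, 32)]
```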
Preferably, fusing the plurality of first feature maps based on the EAFPN network to obtain a predictable second feature map comprises the following steps:
sequentially carrying out downward feature fusion on the four first feature graphs by using an EA-fusion strategy;
carrying out convolution operation on the four first feature graphs after the downward feature fusion, and classifying the channel numbers of the four first feature graphs into the same channel;
sequentially performing up-sampling operation on the four first feature graphs classified into the same channel, and performing up-feature fusion on the four first feature graphs fused with the corresponding down-features in the up-sampling operation;
the four first feature maps after the upward feature fusion are convolved by using 3×3 standard convolution to generate a predictable second feature map.
Preferably, the up-sampling operation is performed sequentially on the four first feature maps classified into the same channel, and the heights and widths of the corresponding input and output feature maps satisfy the following relation:

H_out = H_in × scale_factor
W_out = W_in × scale_factor

where H_in and W_in represent the height and width of the input feature map, H_out and W_out represent the height and width of the output feature map, and scale_factor specifies the output feature map as a multiple of the input feature map.
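The height/width relation can be sketched in a few lines; the function name and sample sizes are illustrative:

```python
# Shape relation of the up-sampling step: the output feature map is
# scale_factor times the input in both height and width.

def upsampled_size(h_in, w_in, scale_factor=2):
    # H_out = H_in * scale_factor, W_out = W_in * scale_factor
    return h_in * scale_factor, w_in * scale_factor

print(upsampled_size(32, 32))  # (64, 64)
```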
Preferably, the TRPN-based network generates a plurality of suggestion boxes on the second feature map, specifically including the following steps:
inputting the second characteristic diagram to a first RPN network, and generating 9 first target frames with different size ratios at each pixel point on the second characteristic diagram through a sliding window;
performing standard convolution operation on the plurality of pixel points, and judging 9 first target frames on the plurality of pixel points;
if the first target frame is an object, carrying out regression prediction on the first target frame, and calculating the total offset between the first target frame and the object to obtain a plurality of first prediction frames;
calculating the scores of the first prediction frames to obtain a plurality of feature frames with scores greater than 0.6;
inputting the second feature map to a second RPN network, and generating, at each pixel point on the second feature map through a sliding window, a plurality of second target frames with different size ratios;
if the second target frame is an object, carrying out regression prediction on the second target frame, and calculating the total offset between the second target frame and the object to obtain a plurality of second prediction frames;
calculating the scores of the plurality of second prediction frames to obtain a plurality of suggestion frames with scores greater than 0.7.
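The two score-threshold screening steps (0.6 for the first RPN, 0.7 for the second) can be illustrated as follows; the box coordinates and scores are made-up sample data:

```python
# Sketch of the two-stage score screening in TRPN: boxes whose
# objectness score exceeds the stage threshold are kept.

def screen(boxes_scores, threshold):
    """Keep boxes whose objectness score exceeds the threshold."""
    return [box for box, s in boxes_scores if s > threshold]

preds = [((10, 10, 50, 40), 0.92), ((5, 5, 20, 20), 0.55), ((30, 30, 80, 60), 0.65)]
feature_boxes = screen(preds, 0.6)  # first RPN stage keeps scores > 0.6
proposals = screen(preds, 0.7)      # second RPN stage keeps scores > 0.7
print(len(feature_boxes), len(proposals))  # 2 1
```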
Preferably, the total offset is calculated by:

o = o_c + o_s

where o_c is the center point offset, o_s is the shape offset, and o is the total offset.
Preferably, the TRPN network is trained by a classification loss function and a regression loss function, the classification loss function being calculated by:

L_cls(Y, P) = −Σ_j Σ_{c'=1..c} y_{j,c'} · log(p_{j,c'})

where L_cls represents the classification loss, Y represents the set of real values, P represents the set of predicted values, j is the current sample, c represents the number of categories, y_{j,c'} represents the true class of the current sample, and p_{j,c'} represents the predicted class of the current sample;

the regression loss function is calculated by:

L_reg(e) = 0.5e², if |e| < 1; |e| − 0.5, otherwise

where L_reg represents the regression loss and e represents the difference between the predicted value and the true value.
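A small sketch of the two loss terms under the definitions above; these are generic cross-entropy and Smooth L1 implementations, not code from the patent:

```python
import math

def cross_entropy(y_true, p_pred):
    # y_true: one-hot class vector; p_pred: predicted class probabilities
    return -sum(y * math.log(p) for y, p in zip(y_true, p_pred) if y > 0)

def smooth_l1(e):
    # Smooth L1 on the prediction error e: quadratic near 0, linear beyond
    return 0.5 * e * e if abs(e) < 1.0 else abs(e) - 0.5

print(round(cross_entropy([0, 1], [0.2, 0.8]), 4))  # 0.2231
print(smooth_l1(0.5), smooth_l1(2.0))               # 0.125 1.5
```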
A computer-readable storage medium storing computer instructions for causing the computer to perform the SAR image-based ship position detection method.
Compared with the prior art, the invention has the beneficial effects that:
the invention uses variability convolution to enlarge receptive field, and combines EA-fusion strategy and self-attention mechanism to provide REAA backbone feature extraction network, which retains more abundant effective feature map. And an EAFPN network is designed by combining an EA-fusion strategy, so that the extraction of the target characteristics of the middle and large ships is improved. Secondly, the TRPN network is provided by analyzing the fine granularity of the RPN network, so that the detection granularity of the model and the accuracy of a prediction frame are improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a SAR image based ship position detection method of the present invention;
FIG. 2 is a structural diagram of the REAA backbone network and EAFPN network of the SAR image ship detection model of the present invention;
FIG. 3 is a TRPN network structure diagram of the SAR image ship detection model of the present invention;
FIG. 4 is a statistical chart of the number of vessels at different sizes in the dataset of the present invention;
FIG. 5 is a statistical chart of the number of vessels contained in each image in the dataset of the present invention;
FIG. 6 is a wide high scale statistical plot of a target box in a dataset of the present invention;
FIG. 7 is an aspect ratio statistics of a target box in a dataset of the present invention;
FIG. 8 is an original image of a SAR ship image in the dataset of the present invention;
FIG. 9 is a horizontal flip chart of SAR ship images in the dataset of the present invention;
FIG. 10 is a scaled deformation map of SAR ship images in the dataset of the present invention;
FIG. 11 is a Gaussian noise plot of SAR ship images in the dataset of the present invention;
FIG. 12 is a median filter diagram of SAR ship images in the dataset of the present invention;
FIG. 13 is a schematic diagram of mAP values in different sizes of vessels by different methods;
FIG. 14 is a schematic representation of AR values in different sized vessels for different methods;
FIG. 15 is a schematic diagram of model performance under the CE Loss classification loss function;
FIG. 16 is a schematic diagram of model performance under the Focal Loss classification loss function;
FIG. 17 is a graph of model performance under the Balanced L1 loss regression loss function;
FIG. 18 is a graph of model performance under the L1 loss regression loss function;
FIG. 19 is a model performance schematic under a Smooth L1 loss regression loss function;
FIG. 20 is a schematic diagram of model performance under CE Loss in combination with a Smooth L1 Loss function;
FIG. 21 is a schematic diagram of model performance under the Focal Loss in combination with the Smooth L1 Loss function;
FIG. 22 is a graph of manual labeling of predicted results at dense ship targets;
FIG. 23 is a graph of the predicted results of the Cascade Rcnn model on dense vessel targets;
FIG. 24 is a graph of the predicted results of the Faster Rcnn model at dense vessel targets;
FIG. 25 is a graph of predicted results of the RDET Rcnn model on dense vessel targets;
FIG. 26 is a graph of manual labeling of predicted results for an open sea vessel target;
FIG. 27 is a graph of the predicted results of the Cascade Rcnn model on the open sea vessel target;
FIG. 28 is a graph of the predicted results of the Faster Rcnn model on the open sea vessel target;
FIG. 29 is a graph of predicted results of the RDET Rcnn model on an open sea vessel target;
FIG. 30 is a graph of manual labeling of predicted results at a large target vessel target;
FIG. 31 is a graph of the predicted results of the Cascade Rcnn model on a large target vessel target;
FIG. 32 is a graph of the predicted results of the Faster Rcnn model at the large target vessel target;
FIG. 33 is a graph of predicted results of the RDET Rcnn model on a large target vessel target;
FIG. 34 is a graph of manual signature of predicted results for offshore vessel targets;
FIG. 35 is a graph of the predicted results of the Cascade Rcnn model on offshore vessel targets;
FIG. 36 is a graph of the predicted results of the Faster Rcnn model at offshore vessel targets;
fig. 37 is a graph of the predicted results of the RDET Rcnn model on offshore vessel targets.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the invention provides a ship position detection method based on a SAR image, comprising the following steps:
the first step: and acquiring SAR ship images.
The SSDD dataset and the SAR-clip dataset are acquired. The SSDD dataset contains 1160 pictures and 2456 ships in total, with an average of 2.12 ships per picture. The SAR-clip dataset is derived from multi-source, multi-mode SAR images and contains 43819 ship slices in total.
Referring to figs. 8-12, in order to avoid under-fitting of the model on large-target ship samples and missed detections caused by the unbalanced spatial distribution of ships in the images, the invention performs data enhancement on the SSDD dataset by scaling, flipping, rotating and adding various kinds of noise, so as to balance the data samples.
The second step: inputting the data-enhanced SAR ship image into the SAR image ship detection model to obtain a prediction result.
Referring to figs. 2 and 3, the present invention provides a SAR image ship detection model RDET Rcnn comprising: a REAA backbone network that enlarges the receptive field and enhances early fusion of deep and shallow features, an EAFPN network that enhances late fusion of deep and shallow features, and a TRPN candidate-region-generation network with higher fine granularity.
The bottom-level features have higher resolution and contain more location and detail information, but they have undergone fewer convolutions and carry more noise. The high-level features have stronger semantic information but low resolution and poorer perception of detail. Fusing features of different scales is an important means of improving detection performance. The invention provides the EA-fusion early-fusion strategy, which combines two feature vectors into a composite vector in an add parallel-connection manner; for input features a and b, the output feature is:

z = a + b·i

where i is the imaginary unit.
The invention uses ResNet as a key component. Firstly, deformable convolution is used to replace the original common convolution, enlarging the receptive field of the feature maps and thus obtaining more feature information. Secondly, the stage1 and stage3 feature layers and the stage2 and stage4 feature layers are fused by combining EA-fusion with the self-attention mechanism Attention, realizing the backbone network REAA that enhances early fusion of deep and shallow features. Given an input image, a plurality of first feature maps are obtained by:

F_k ∈ R^(C(k) × H/D(k) × W/D(k))    (2)
wherein:
C(k) = 256 × k    (3)
D(k) = 4 × k    (4)
where k represents the current stage number, k ∈ {1, 2, 3, 4}; F_k represents the first feature map of the k-th stage; R represents the set of real numbers; C(k) represents the number of channels of the current feature map; H and W represent the height and width of the input SAR ship image; and D(k) represents the factor by which the height and width are reduced in the current feature map.
After the backbone network design, the invention constructs the EAFPN feature fusion network layer based on FPN. FPN is a typical late-fusion strategy for deep features: the multi-scale feature information extracted by the backbone network is fused, improving the accuracy of target detection. In order to extract as much effective feature information as possible, the invention sequentially performs EA-fusion on the output feature layers of stage2, stage3 and stage4 before the late-fusion operation, realizing the EAFPN network that enhances late fusion of deep and shallow features.
EAFPN first returns the channel number of each of the 4 first feature maps output by REAA to 256 through a convolution operation, so that the feature map shapes can be expressed as 256 × H/D(k) × W/D(k), k ∈ {1, 2, 3, 4}.
Secondly, a 2× bilinear interpolation up-sampling operation is performed, where the heights and widths of the feature maps before and after the up-sampling operation satisfy the following relation:

H_out = H_in × scale_factor
W_out = W_in × scale_factor

where H_in and W_in represent the height and width of the input feature map, H_out and W_out represent the height and width of the output feature map, and scale_factor specifies the output feature map as a multiple of the input feature map.
Finally, lateral connections are made in combination with the EA-fusion strategy: the processed REAA output is fused with the feature layer obtained by the corresponding up-sampling, and a 3×3 standard convolution is used to eliminate the aliasing effect and generate a predictable second feature map.
Referring again to fig. 3, TRPN is used to screen out boxes where targets may be present. The RPN relies on a sliding window over the shared feature map to generate 9 anchors for each location and fine-tunes them by convolving the difference between the current anchor and the real box; but because the real box is not aligned with the anchor, a large amount of regression error occurs. To alleviate this alignment problem, the invention designs the TRPN network, comprising a first RPN network and a second RPN network. The first RPN network is the T1 part, a standard RPN network. The second RPN network is an expanded RPN network based on the T1 part, replacing the standard convolution in the standard RPN network with a dilated (expansion) convolution. Generating suggestion boxes on the second feature map based on the TRPN network specifically comprises the following steps:
s1: inputting the second characteristic diagram to a first RPN network, and generating 9 first target frames with different size ratios at each pixel point on the second characteristic diagram through a sliding window;
s2: carrying out standard convolution operation on a plurality of pixel points through a first detection head H1, and judging 9 kinds of first target frames on the plurality of pixel points;
s3: if the first target frame is an object, carrying out regression prediction on the first target frame through prediction regression B1, and calculating the total offset between the first target frame and the object through classification prediction C1 to obtain a plurality of first prediction frames;
s4: calculating the scores of the first prediction frames to obtain a plurality of feature frames with scores greater than 0.6;
S5: inputting the second characteristic diagram to a second RPN network, generating at each pixel point on the second characteristic diagram through a sliding windowA second target frame of different size proportions;
s6: performing expansion convolution operation on multiple pixels by using a second detection head H2, wherein the expansion coefficient r is determined by the magnitude of o, and the expansion coefficient r is determined by the magnitude of oJudging a second target frame;
s7: if the second target frame is an object, carrying out regression prediction on the second target frame through prediction regression B2, and calculating the total offset between the second target frame and the object through classification prediction C2 to obtain a plurality of second prediction frames;
s8: calculating the scores of the plurality of second prediction frames to obtain a plurality of suggestion frames with scores greater than 0.7;
S9: multiple suggestion boxesMapping to the SAR ship image map, generating a plurality of suggestion frames, and carrying out frame selection on the ship in the SAR ship image by the plurality of suggestion frames to obtain the position of the ship.
The total offset is calculated by:

o = o_c + o_s

where o_c is the center point offset and o_s is the shape offset, determined by the shape of the anchor and the convolution kernel size.
The loss calculation of the TRPN network consists of two parts: the classification loss L_cls between the predicted ship targets and the real targets, and the regression loss L_reg of the ship target detection frames. The CE_S loss function of the invention uses a cross-entropy classification loss and a Smooth L1 regression loss, as shown in the following formula:

L_CE_S = L_cls + L_reg
the present invention uses a cross entropy loss calculation model for classification loss. The classification cross entropy loss function formula is shown as follows:
in the method, in the process of the invention,representing a loss of classification,/->Representing a set of real values, +.>Representing a set of predicted values, j being the current sample, c representing the number of categories, +.>Representing the true class of the current sample, +.>Representing the predicted class of the current sample.
For the regression loss of the prediction frame, the invention adopts the Smooth L1 loss function. Ship detection is a single-sample task; with e defined as the difference between the predicted value and the true value, the corresponding Smooth L1 loss function is:

L_reg(e) = 0.5e², if |e| < 1; |e| − 0.5, otherwise
examples
The invention trains and tests the ship detection models on the SSDD dataset and the SAR-clip dataset. The SSDD dataset consists of 1160 SAR ship images and the SAR-clip dataset consists of 43819 SAR ship images; each dataset is divided into a training set, a validation set and a test set in a fixed ratio. The SSDD dataset serves as the training and validation dataset of the RDET Rcnn model, and SAR-clip serves as the experimental dataset for quantitative analysis of the model. To keep the SSDD and SAR-clip dataset formats the same, the image tags are uniformly set to the COCO data format.
By carrying out statistical analysis on each ship image in the SSDD data set, the following three characteristics are obtained:
(1) As shown in fig. 4, most vessels in the dataset are small-size targets and there are few large-size vessel targets; that is, the vessel size distribution is unbalanced, and an under-fitting phenomenon occurs when the model detects large-size vessel targets.
(2) As shown in fig. 5, images containing only 1 ship are the most numerous, the rest of the images contain 2 to 3 ships on average, and a small number of images contain a large number of ship elements; the spatial distribution of ships is therefore uneven, and missed detections occur when images containing many ship targets are detected.
(3) As shown in figs. 6 and 7, the ship target frames are mostly small squares and oblongs, and the aspect ratios of the anchor frames are concentrated around 2; the initial anchor-ratio value in the ship detection model can therefore be set to [0.5, 1.0, 2.0].
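The anchor ratios [0.5, 1.0, 2.0] can be illustrated by computing anchor widths and heights at a fixed area; the base area of 32 × 32 is an illustrative assumption, not a value from the patent:

```python
import math

def anchor_sizes(base_area, ratios=(0.5, 1.0, 2.0)):
    """Widths/heights of anchors with a fixed area and varying aspect ratios (w/h)."""
    out = []
    for r in ratios:
        h = math.sqrt(base_area / r)  # keep area constant: w*h = base_area
        w = r * h
        out.append((round(w, 1), round(h, 1)))
    return out

print(anchor_sizes(32 * 32))  # [(22.6, 45.3), (32.0, 32.0), (45.3, 22.6)]
```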
In order to evaluate the performance of the RDET Rcnn algorithm in SAR image ship detection, the invention adopts precision (P), recall (R), mean average precision (mAP) and average recall (AR) as evaluation indexes, built from TP (true positive), FP (false positive) and FN (false negative) counts. TP represents the number of targets predicted positive that are actually positive, i.e., a result is counted as a true positive if and only if RDET Rcnn accurately detects and locates a ship target. FP represents the number of targets predicted positive but actually negative. FN represents the number of targets predicted negative but actually positive.
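From the TP/FP/FN counts above, precision and recall follow directly; the counts below are made-up sample values:

```python
# Standard precision/recall computation from detection counts.

def precision_recall(tp, fp, fn):
    p = tp / (tp + fp)  # fraction of predicted positives that are correct
    r = tp / (tp + fn)  # fraction of actual positives that are found
    return p, r

p, r = precision_recall(tp=80, fp=20, fn=10)
print(round(p, 3), round(r, 3))  # 0.8 0.889
```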
The experiments adopt ResNet-50 as the basic component. The network is first initialized by training on the COCO-format dataset, and the model is then trained for 30 epochs with an SGD optimizer. The initial learning rate of the backbone network is set to 0.02 and the momentum to 0.9; the mean of the normalized dataset images is [0.1559097, 0.15591368, 0.15588938] and the variance is [0.10875329, 0.10876005, 0.10869534]. All experiments of the invention were performed on an NVIDIA GeForce RTX 3060 GPU.
Referring to table 1, a quantitative comparison of RDET Rcnn with SAR image ship detection methods of each detection type is shown. Specifically, the invention makes a classified comparison because the detection mechanism of each large class of target detection algorithms is different. In the anchor-free type, many SAR ship detection algorithms use the FCOS and CornerNet models as reference models; among them, CP-FCOS is based on the FCOS framework, reconstructing the network layers and adding a Class Position (CP) module layer to optimize the features of the regression branches in the network, which effectively improves model performance but also increases the number of model layers. In the single-stage detection type, network models built on the one-stage idea perform excellently on small-target detection tasks, but the overall accuracy AP_0.5:0.95 does not increase obviously. In the two-stage detection type, most studies make improvements using Faster Rcnn and Cascade Rcnn as reference models; crTranssar uses the Swin Transformer as a basic framework and proposes a backbone network based on context-coupled representation learning, but the model size and number of parameters are excessively large. The outstanding advantage of PVT-SAR is that it effectively improves the accuracy of small-target detection with a two-stage algorithm. Compared with the existing target detection methods, RDET Rcnn, as a two-stage SAR image ship detection model, still performs well and improves large-target detection while keeping small-target detection efficiency.
Table 1. Comparison with various SAR image ship detection models
The REAA backbone network, which enhances early fusion of deep and shallow features, can efficiently retain more information and capture long-range context information. According to ship size, the present invention classifies ships from small to large into 3 types: small-size, medium-size and large-size targets.
As can be seen from Figs. 13 and 14, detection and positioning become more difficult as ship size increases, and the detection performance of REAA is better than that of other popular backbone networks. The reason is that ordinary convolutions, operating with limited receptive fields, are weaker at capturing long-range context information. Conversely, REAA's feature extraction is more efficient because it uses deformable convolution to enlarge the receptive field and the EA-fusion strategy to enhance early fusion of deep and shallow features, yielding richer and wider-range contextual feature information.
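The receptive-field argument above can be illustrated with a toy sketch. This is our own illustration, not the patent's implementation: a standard 3x3 convolution reads a fixed grid around a pixel, while a deformable convolution adds learned per-tap offsets; the offsets below are made-up numbers, not learned values.

```python
def standard_taps(y, x):
    """Sampling positions of a standard 3x3 convolution centered at (y, x)."""
    return [(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def deformable_taps(y, x, offsets):
    """Deformable sampling: one learned (dy, dx) offset per tap of the grid."""
    base = standard_taps(y, x)
    return [(by + oy, bx + ox) for (by, bx), (oy, ox) in zip(base, offsets)]

# Push the corner taps outward by 2 pixels (illustrative values only).
offsets = [(-2.0, -2.0)] + [(0.0, 0.0)] * 7 + [(2.0, 2.0)]
taps = deformable_taps(5, 5, offsets)
# The sampled rows now span 2..8 instead of the fixed 4..6, i.e. a wider
# context window for the same 3x3 kernel size.
```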
EAFPN was compared with existing mainstream Neck networks; the performance of the different Neck networks on the SSDD dataset is shown in Table 2.
Table 2. Comparative experiments with different Neck networks
The feature pyramid network (FPN) mainly extracts feature maps at different scales and provides them to the subsequent network for the prediction task. As can be seen from Table 2, the EAFPN network of the invention has clear advantages in the SAR image ship detection task. With the mainstream improved CARFPN and PAFPN, accuracy drops noticeably; this is because the dataset belongs to a single-color, simple target type, and too many convolution operations during late feature fusion lose effective feature information. In contrast, EAFPN enriches the input feature information through the EA-fusion strategy, so it can extract more effective feature information.
To refine the detection granularity of the RDET-rcnn model, the TRPN proposal generation network provided by the invention generates higher-quality proposal boxes by designing two RPNs in a master-slave relationship. Similarly, a comparative experiment was performed between TRPN and existing mainstream RPN networks; the performance of the different RPN networks on the SSDD dataset is shown in Table 3.
Table 3. Comparative experiments with different RPN networks
The region proposal network (RPN) is mainly used to screen out boxes where targets may be present. As can be seen from Table 3, both mainstream RPN-based improvements, CRPN and GARPN, perform well in this experiment. CRPN emphasizes anchor alignment rules, using adaptive convolution to fine-tune the anchors at each stage. GARPN focuses on judging whether the probability of a target point exceeds a threshold before adjusting the anchor, and its positioning boxes for large ships are more accurate than CRPN's. TRPN, designed by combining the advantages of both, holds its two RPNs in a master-slave relationship: the anchor alignment rule fine-tunes the position of the prediction box, and ship prediction boxes above a higher threshold are sent to the second stage for further fine-tuning. The final TRPN performs most prominently in the SAR image ship detection task.
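The master-slave filtering just described can be sketched as a two-threshold cascade. This is our reading of the text, not the authors' code: the thresholds 0.6 and 0.7 come from claim 7 below, and the candidate data structure is ours.

```python
def trpn_filter(candidates):
    """Two-stage proposal filtering sketch.

    candidates: list of (name, first_stage_score, second_stage_score).
    Stage 1 keeps boxes scoring above 0.6; stage 2 keeps refined boxes
    scoring above the stricter 0.7 threshold.
    """
    stage1 = [c for c in candidates if c[1] > 0.6]   # "feature frames"
    return [c for c in stage1 if c[2] > 0.7]         # final suggestion frames

cands = [("a", 0.9, 0.8), ("b", 0.65, 0.5), ("c", 0.4, 0.9)]
kept = trpn_filter(cands)  # only "a" passes both thresholds
```

Note that "c" scores well in the second stage but never reaches it, since the master RPN already discarded it; the cascade only refines candidates the first stage trusts.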
To verify the effectiveness of the CE_S loss function on the RDET-rcnn model, three sets of comparative experiments were performed. Referring to Figs. 15 and 16, using Focal Loss as the classification loss has clear advantages, and model convergence is also fast. Referring to Figs. 17 to 19, the advantages of Smooth L1 Loss are fully exploited in model training. Referring to Figs. 20 and 21, CE Loss and Focal Loss are each combined with Smooth L1 Loss to compare the total loss; the total loss of Focal Loss combined with Smooth L1 Loss is similar to that of CE Loss combined with Smooth L1 Loss.
Referring to Table 4, although Focal Loss converges better and performs well in the sample classification loss calculation, accuracy is significantly reduced compared with training the model with CE Loss. In natural image detection, Focal Loss focuses on hard-to-classify samples from the perspective of sample difficulty, which alleviates sample imbalance and improves the overall performance of natural image detection models. However, this dataset has an obvious characteristic: the difference between ship targets and non-ship objects is small, and the ship shape can change under noise interference, so Focal Loss over-interprets the ship samples during training, reducing the accuracy of the model of the invention. The present invention therefore uses CE_S as the loss function of the RDET-rcnn model.
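The CE_S loss discussed above can be sketched as cross-entropy for classification plus Smooth L1 for box regression. The combination rule is our reading of the text, and the 1:1 weighting between the two terms is an assumption.

```python
import math

def cross_entropy(true_onehot, pred_probs, eps=1e-12):
    """Multi-class cross-entropy for one sample (one-hot true label)."""
    return -sum(t * math.log(p + eps) for t, p in zip(true_onehot, pred_probs))

def smooth_l1(pred, target):
    """Smooth L1: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def ce_s(true_onehot, pred_probs, box_pred, box_target):
    """CE_S sketch: classification loss plus regression loss (assumed 1:1)."""
    return cross_entropy(true_onehot, pred_probs) + smooth_l1(box_pred, box_target)

# One ship sample: correct class predicted with 0.8 confidence, box close
# to the target in all four coordinates.
loss = ce_s([0, 1], [0.2, 0.8], [0.1, 0.1, 0.9, 0.9], [0.0, 0.0, 1.0, 1.0])
```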
Table 4. Loss function accuracy comparison
In practical applications, shooting height, environmental noise and picture brightness are the most common sources of variation in ship imagery, so the invention selects SAR ship images of different scene complexity from images captured by different radar satellites. As shown in Figs. 22 to 37, a comparative experiment was performed among RDET-rcnn, Cascade R-CNN and Faster R-CNN, and the prediction results of the RDET-rcnn model were analyzed. As Figs. 22 to 25 and 34 to 37 show, Cascade R-CNN is good at detecting large and medium-sized ship targets but tends to miss small-target ships. Figs. 22 to 33 show that Faster R-CNN is better suited to detecting small-target ships but is prone to false detections on small non-ship objects and on large-target ships. Compared with Cascade R-CNN and Faster R-CNN, the RDET-rcnn ship detection model proposed herein performs better across different complex scenes and detects better across different ship sizes and under offshore interference.
The method is used for detecting and positioning marine ships. As a ship-centric detection task, it can in practice relate to maritime civil safety and ship monitoring in offshore areas. In practical application, because factors such as ship size and noise vary greatly with SAR shooting distance and environment, RDET-rcnn both enhances ship feature extraction and refines detection granularity, that is, it improves large-target detection accuracy while maintaining a high small-target detection rate.
The invention provides a novel SAR image ship detection model, RDET-rcnn, which refines detection granularity and enhances deep feature fusion. The RDET-rcnn model is designed to detect and position ships across different SAR shooting scenes. The model is robust to ship scale changes and strong noise interference, and extensive experiments verify its strong performance.
The embodiment of the invention also provides a computer-readable storage medium storing computer-executable instructions that can execute the above ship position detection method based on SAR images.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (10)
1. The ship position detection method based on the SAR image is characterized by comprising the following steps of:
acquiring SAR ship images;
inputting the SAR ship image into an SAR image ship detection model, and generating a plurality of suggestion frames on the SAR ship image;
the plurality of suggestion frames carry out frame selection on the ship in the SAR ship image to obtain the position of the ship;
inputting the SAR ship image into the SAR image ship detection model, and generating a plurality of suggestion frames on the SAR ship image, wherein the method comprises the following steps of:
carrying out multi-scale feature extraction on the SAR ship image based on the REAA backbone network to obtain a plurality of first feature maps;
fusing the plurality of first feature maps based on the EAFPN network to obtain a predictable second feature map;
a plurality of suggestion boxes are generated on the second feature map based on the TRPN network, and the plurality of suggestion boxes are mapped to the SAR ship image.
2. The ship position detection method based on the SAR image as set forth in claim 1, wherein the multi-scale feature extraction is performed on the SAR ship image based on the REAA backbone network to obtain a plurality of first feature maps, comprising the following steps:
replacing the ordinary convolution of the four stage feature layers in the ResNet network with deformable convolution;
after the convolution replacement, fusing the first-stage feature layer with the third-stage feature layer, and the second-stage feature layer with the fourth-stage feature layer, using the EA-fusion strategy combined with a self-attention mechanism;
and inputting the SAR ship image into the fused four-stage feature layers to obtain four first feature maps.
3. The ship position detection method based on the SAR image as set forth in claim 2, wherein the EA-fusion strategy combines two eigenvectors into a composite vector in an add parallel-connection manner, represented by the following formula:

z = a + ib

where a and b represent the input features to be fused, z represents the composite vector obtained after fusion, and i is the imaginary unit.
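The composite-vector idea of claim 3 can be illustrated element-wise. This is our toy reading of the formula, not the patent's implementation: two input feature values a and b are packed into one complex number so that both survive in a single "add parallel" representation; a real network would apply this per element of two feature maps.

```python
def ea_compose(a, b):
    """Pack two feature values into one composite value z = a + ib."""
    return complex(a, b)

z = ea_compose(0.3, 0.7)  # real part carries a, imaginary part carries b
```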
4. The SAR image-based ship position detection method of claim 2, wherein the plurality of first feature maps are obtained by:

F_k ∈ R^(C(k) × H/D(k) × W/D(k))

where

C(k)=256×k;
D(k)=4×k;

where k represents the current stage number, k ∈ {1,2,3,4}, F_k represents the first feature map of the k-th stage, R represents the set of real numbers, C(k) represents the number of channels of the current feature map, H and W represent the height and width of the SAR ship image, and D(k) represents the multiple by which the height and width of the current feature map are reduced relative to the SAR ship image.
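The shape rule of claim 4 can be sanity-checked with a short sketch; the helper name is ours and the 512x512 input size is just an example.

```python
def first_feature_shapes(img_h, img_w):
    """Shapes (channels, height, width) of the four first feature maps,
    following the rule C(k) = 256*k channels and a spatial reduction of
    D(k) = 4*k at stage k."""
    return [(256 * k, img_h // (4 * k), img_w // (4 * k)) for k in (1, 2, 3, 4)]

shapes = first_feature_shapes(512, 512)
# stage 1 -> (256, 128, 128), ..., stage 4 -> (1024, 32, 32)
```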
5. The SAR image-based ship position detection method according to claim 4, wherein fusing the plurality of first feature maps based on the EAFPN network to obtain the predictable second feature map comprises the following steps:
sequentially carrying out downward feature fusion on the four first feature maps using the EA-fusion strategy;
performing a convolution operation on the four downward-fused first feature maps to unify them to the same number of channels;
sequentially performing an up-sampling operation on the channel-unified first feature maps, fusing each with the corresponding downward-fused features during the up-sampling operation;
convolving the four upward-fused first feature maps with a 3×3 standard convolution to generate the predictable second feature map.
6. The ship position detection method based on the SAR image as set forth in claim 5, wherein the four first feature maps unified to the same channel are sequentially up-sampled, and the relationship between the corresponding input and output feature maps is as follows:

H_out = H_in × scale_factor;
W_out = W_in × scale_factor;

where H_in and W_in represent the height and width of the input feature map, H_out and W_out represent the height and width of the output feature map, and scale_factor specifies the size of the output feature map as a multiple of the input feature map.
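The up-sampling relation of claim 6, written out as a sketch (the helper name is ours):

```python
def upsample_size(h_in, w_in, scale_factor=2):
    """Output feature map size is the input size times scale_factor."""
    return h_in * scale_factor, w_in * scale_factor

h_out, w_out = upsample_size(32, 32)
```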
7. The SAR image-based ship position detection method according to claim 5, wherein the TRPN network generates a plurality of suggestion boxes on the second feature map, comprising the steps of:
inputting the second characteristic diagram to a first RPN network, and generating 9 first target frames with different size ratios at each pixel point on the second characteristic diagram through a sliding window;
performing standard convolution operation on the plurality of pixel points, and judging 9 first target frames on the plurality of pixel points;
if the first target frame is an object, carrying out regression prediction on the first target frame, and calculating the total offset between the first target frame and the object to obtain a plurality of first prediction frames;
calculating the scores of the first prediction frames to obtain a plurality of feature frames with scores greater than 0.6;
inputting the second feature map to a second RPN network, and generating second target frames with different size proportions at each pixel point on the second feature map through a sliding window;
performing a dilated convolution operation on the plurality of pixel points, and judging the second target frames on the plurality of pixel points;
if the second target frame is an object, carrying out regression prediction on the second target frame, and calculating the total offset between the second target frame and the object to obtain a plurality of second prediction frames;
and calculating the scores of the plurality of second prediction frames to obtain a plurality of suggestion frames with scores greater than 0.7.
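The 9 first target frames per pixel in claim 7 are conventionally built from 3 scales times 3 aspect ratios; the particular scale and ratio values below are assumptions for illustration, since the claim only fixes the count of 9.

```python
def anchors_at(cx, cy, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return 9 (x1, y1, x2, y2) anchor boxes centered at (cx, cy)."""
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * (r ** 0.5)   # width grows with sqrt of the ratio
            h = s / (r ** 0.5)   # height shrinks, keeping area about s*s
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

a = anchors_at(100, 100)  # 9 anchor boxes at one pixel location
```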
9. The SAR image-based ship position detection method of claim 7, wherein the TRPN network is trained by a classification loss function and a regression loss function, and the classification loss function is calculated by:

L_cls = −Σ_{j=1}^{c} x_j · log(y_j)

where L_cls represents the classification loss, X = {x} represents the set of true values, Y = {y} represents the set of predicted values, j is the current class index, c represents the number of classes, x_j represents the true class of the current sample, and y_j represents the predicted class of the current sample;

the regression loss function is the Smooth L1 loss, calculated by:

smooth_L1(x) = 0.5x², if |x| < 1; |x| − 0.5, otherwise

where x is the difference between the predicted and true box offsets.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing the computer to execute the SAR image-based ship position detection method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310501019.3A CN116206099B (en) | 2023-05-06 | 2023-05-06 | Ship position detection method based on SAR image and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116206099A true CN116206099A (en) | 2023-06-02 |
CN116206099B CN116206099B (en) | 2023-08-15 |
Family
ID=86517737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310501019.3A Active CN116206099B (en) | 2023-05-06 | 2023-05-06 | Ship position detection method based on SAR image and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116206099B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110533084A (en) * | 2019-08-12 | 2019-12-03 | Chang'an University | Multiscale target detection method based on self-attention mechanism |
AU2020103715A4 (en) * | 2020-11-27 | 2021-02-11 | Beijing University Of Posts And Telecommunications | Method of monocular depth estimation based on joint self-attention mechanism |
CN112464883A (en) * | 2020-12-11 | 2021-03-09 | Wuhan Institute of Technology | Automatic detection and identification method and system for ship target in natural scene |
US20210390700A1 (en) * | 2020-06-12 | 2021-12-16 | Adobe Inc. | Referring image segmentation |
CN114049561A (en) * | 2021-11-25 | 2022-02-15 | Jiangsu University of Science and Technology | Ship target detection model and method |
CN114202696A (en) * | 2021-12-15 | 2022-03-18 | Anhui University | SAR target detection method and device based on context vision and storage medium |
CN114299303A (en) * | 2021-12-07 | 2022-04-08 | Jimei University | Ship target detection method, terminal device and storage medium |
WO2022073452A1 (en) * | 2020-10-07 | 2022-04-14 | Wuhan University | Hyperspectral remote sensing image classification method based on self-attention context network |
CN114821155A (en) * | 2022-03-29 | 2022-07-29 | Big Data Center of State Grid Corporation of China | Multi-label classification method and system based on deformable NTS-NET neural network |
CN114973116A (en) * | 2022-01-21 | 2022-08-30 | Kunming University of Science and Technology | Method and system for detecting foreign matters embedded into airport runway at night by self-attention feature |
CN115049923A (en) * | 2022-05-30 | 2022-09-13 | Hangzhou Innovation Institute of Beihang University | SAR image ship target instance segmentation training method, system and device |
CN115546555A (en) * | 2022-10-18 | 2022-12-30 | Anhui University | Lightweight SAR target detection method based on hybrid characterization learning enhancement |
Non-Patent Citations (6)
Title |
---|
LIPING ZHENG et al.: "SSE-Ship: A SAR Image Ship Detection Model with Expanded Detection Field of View and Enhanced Effective Feature Information", Open Journal of Applied Sciences, vol. 13, no. 04, pp. 562-578
WEI TAN et al.: "Multimodal medical image fusion algorithm in the era of big data", Neural Computing and Applications (2020), pp. 1-21
ZHANG Yang et al.: "Ship detection technology in SAR images based on improved Faster R-CNN", Radio Engineering, vol. 52, no. 12, pp. 2280-2287
ZHU Mingming et al.: "Airport detection method based on improved region convolutional neural network", Acta Optica Sinica, vol. 38, no. 07, pp. 330-335
MI Yaxin: "Magnetic flux leakage image recognition method for pipeline defects based on low-level feature fusion multi-kernel convolutional neural network", Contemporary Chemical Industry, vol. 52, no. 03, pp. 677-681
MA Haowei et al.: "Infrared image detection algorithm for ships in haze environment based on improved YOLOv5", Journal of Transport Information and Safety, vol. 41, no. 01, pp. 95-104
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111652321B (en) | Marine ship detection method based on improved YOLOV3 algorithm | |
Chen et al. | Learning slimming SAR ship object detector through network pruning and knowledge distillation | |
CN112766087A (en) | Optical remote sensing image ship detection method based on knowledge distillation | |
CN110516605A (en) | Any direction Ship Target Detection method based on cascade neural network | |
Wang et al. | Ship detection based on deep learning | |
CN112418165B (en) | Small-size target detection method and device based on improved cascade neural network | |
CN116863539A (en) | Fall figure target detection method based on optimized YOLOv8s network structure | |
CN109190456A (en) | Pedestrian detection method is overlooked based on the multiple features fusion of converging channels feature and gray level co-occurrence matrixes | |
CN115526935A (en) | Pixel-level capture pose detection method and system based on global and local information | |
Cao et al. | Detection of microalgae objects based on the Improved YOLOv3 model | |
CN115546555A (en) | Lightweight SAR target detection method based on hybrid characterization learning enhancement | |
Dai et al. | GCD-YOLOv5: An armored target recognition algorithm in complex environments based on array lidar | |
CN116206099B (en) | Ship position detection method based on SAR image and storage medium | |
CN117315752A (en) | Training method, device, equipment and medium for face emotion recognition network model | |
Fan et al. | An improved yolov5 marine biological object detection algorithm | |
CN116403133A (en) | Improved vehicle detection algorithm based on YOLO v7 | |
Wang et al. | YOLOV5s-Face face detection algorithm | |
Li et al. | Target detection in color sonar image based on YOLOV5 network | |
CN115457423A (en) | Flame smoke detection method combining efficient sampling enhancement | |
Zhang et al. | Ship detection based on improved YOLO algorithm | |
CN114283336A (en) | Anchor-frame-free remote sensing image small target detection method based on mixed attention | |
Ke et al. | Scale-aware dimension-wise attention network for small ship instance segmentation in synthetic aperture radar images | |
Zhao et al. | A novel fusion framework without pooling for noisy SAR image classification | |
CN116468928B (en) | Thermal infrared small target detection method based on visual perception correlator | |
Hu et al. | FSPN: Feature Selection Pyramid Network for Tiny Person Detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||