CN116071664A - SAR image ship detection method based on improved CenterNet network - Google Patents

SAR image ship detection method based on improved CenterNet network

Info

Publication number
CN116071664A
Authority
CN
China
Prior art keywords
network
representing
image
size
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310011081.4A
Other languages
Chinese (zh)
Inventor
魏雪云
唐志勇
张贞凯
郑威
靳标
奚彩萍
尚尚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology
Priority to CN202310011081.4A
Publication of CN116071664A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Arrangements using neural networks

Abstract

The invention relates to the field of synthetic aperture radar image target detection, in particular to a SAR image ship target detection method based on an improved CenterNet network. The original CenterNet framework is combined with a feature pyramid structure, and an improved RepVGG network is used to extract features, which reduces the number of model parameters, shortens training time and optimizes model performance. An ECA attention mechanism increases the model's attention to target features in the feature information while reducing model complexity, ultimately improving detection accuracy and achieving a lightweight design. During feature fusion, CARAFE upsampling generates different sampling kernels at different positions, fully capturing the feature map information, avoiding omissions, and reducing false detections and missed detections.

Description

SAR image ship detection method based on improved CenterNet network
Technical Field
The invention relates to the field of synthetic aperture radar image target detection, in particular to an SAR image ship target detection method based on an improved CenterNet network.
Background
Synthetic aperture radar (Synthetic Aperture Radar, SAR) is an active microwave imaging sensor. SAR can penetrate cloud layers, is not affected by weather, and images continuously, enabling all-weather, all-day marine remote sensing monitoring. China has a vast sea area, so monitoring the sea with SAR and researching ship detection based on SAR images is of great significance for protecting national maritime safety and safeguarding national interests.
Ship detection with SAR images is difficult. The performance of SAR-image ship detection is mainly affected by the SAR system and the sea environment: because of speckle noise and scattering imaging, the pixel scattering intensity of the target area is unstable and target features are easily lost. Moreover, because of the imaging angle and distance, ship targets in far-sea areas appear smaller and may not be detected at all. In offshore areas, buildings and islands with similar scattering characteristics are mistaken for ship targets, leading to inaccurate detection. Traditional ship detection methods, such as the constant false alarm rate (Constant False Alarm Rate, CFAR) method, have a certain detection capability for large ship targets, but still perform poorly on open-sea ship targets and complex scenes, are time-consuming, and struggle to meet the basic requirements of ship detection. With the development of hardware and deep learning, however, learning-based methods have shown strong capability on the ship detection task.
Deep-learning-based target detection algorithms are divided, according to whether anchor boxes are used to generate candidate boxes, into anchor-based methods and anchor-free methods; SSD, Faster R-CNN, Mask R-CNN and the YOLO series are all anchor-based algorithms. They do improve greatly on traditional algorithms, but they all require a large number of anchor boxes: the ground-truth boxes must overlap with as many anchor boxes as possible, yet only a small portion can actually overlap, which leads to an imbalance of positive and negative samples, slows training, and requires many hyper-parameters, making tuning and generalization difficult. Anchor-free detection algorithms mainly locate targets with keypoints, which removes a large amount of anchor-box matching computation and speeds up detection. CenterNet is an anchor-free detection algorithm that completes target localization and classification using the feature information of the target object's center point; its detection speed and accuracy are good, but feature information is lost for small targets, so their detection accuracy is slightly lower.
The invention patent with application number 201910718858.4, entitled "a remote sensing target detection method based on boundary-constrained CenterNet", discloses a remote sensing target detection method that extracts feature information with a cascade of stacked convolution and sampling layers and uses the output of a boundary-constraint convolutional network to improve the accuracy of boundary-constraint completion. However, this type of approach has several problems. First, for the scenes selected in the SAR ship image dataset, the number of samples is too small to fully reflect the generality of the network's detection. Second, continuously stacking network depth does classify ships more accurately, but key feature information is lost, localization deviates, and training slows down. Third, the generated boundary-constraint prediction labels place little constraint on the prediction boxes of false targets and cannot completely eliminate false prediction boxes, which biases the overall loss and strongly affects the recall and speed of detection.
Disclosure of Invention
In order to solve the technical problems, the invention provides a SAR image ship target detection method based on an improved CenterNet network.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a SAR image ship detection method based on an improved CenterNet network comprises the following steps:
step 1, acquiring original images, carrying out data preprocessing and data enhancement on them to expand the data set, resizing the images of the expanded data set to the same resolution, and dividing them in a 7:2:1 ratio into the training set, test set and validation set of the experiment;
step 2, extracting image features by adopting a RepVGG network enhanced by an attention mechanism;
step 3, carrying out multi-scale feature fusion on the feature images of the extracted images;
step 4, predicting the fused feature map to obtain the heatmap, the width and height of the target and the center point coordinates;
and step 5, extracting the detection boxes from the heatmap to obtain a detection result.
The invention is further improved in that the specific data enhancement operation in step 1 comprises: randomly cropping the picture to between 0.8 and 1 times the original picture size with a cropping aspect ratio of 4:3; random sharpening enhancement using the USM sharpening algorithm; a brightness and contrast operation that sets the picture brightness to 1.2 and the contrast to 100; and a gamma correction algorithm with the gamma value set to 0.7.
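For illustration only, the data enhancement described above could be sketched roughly as follows with OpenCV and NumPy; the function names, the USM parameters (sigma, amount) and the reading of "contrast 100" as an additive shift are assumptions, not part of the claimed method:

import cv2
import numpy as np

def random_crop(img, min_scale=0.8, max_scale=1.0, aspect=(4, 3)):
    # Randomly crop a region of 0.8-1.0 times the original size with a 4:3 aspect ratio.
    h, w = img.shape[:2]
    scale = np.random.uniform(min_scale, max_scale)
    crop_w = int(w * scale)
    crop_h = min(int(crop_w * aspect[1] / aspect[0]), h)
    x0 = np.random.randint(0, w - crop_w + 1)
    y0 = np.random.randint(0, h - crop_h + 1)
    return img[y0:y0 + crop_h, x0:x0 + crop_w]

def usm_sharpen(img, sigma=3.0, amount=1.0):
    # Unsharp-mask (USM) sharpening: original + amount * (original - blurred).
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)
    return cv2.addWeighted(img, 1 + amount, blurred, -amount, 0)

def brightness_contrast(img, brightness=1.2, contrast=100):
    # Scale pixel values by 1.2 (brightness) and add 100 (contrast shift), clipped to 8 bits.
    return cv2.convertScaleAbs(img, alpha=brightness, beta=contrast)

def gamma_correction(img, gamma=0.7):
    # Gamma correction with gamma = 0.7 via a lookup table.
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(img, table)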
The invention is further improved, and the specific flow of the step 2 is as follows:
Flow 2.1: the RepVGG network adopted as the feature extraction backbone merges the 1×1 convolution branch and the identity mapping branch into the 3×3 convolution stack by structural re-parameterization, thereby reducing the number of model parameters. During training a BN layer, namely a normalization layer, is added after each convolution. Let W_3 denote the 3×3 convolution kernel with C_1 input channels and C_2 output channels, and μ_3, θ_3, α_3, β_3 the mean, variance, learned scale factor and bias of the BN layer that follows it; let W_1 denote the 1×1 convolution kernel, with μ_1, θ_1, α_1, β_1 the corresponding BN parameters; for the identity-mapping branch, which contains only a BN layer, the parameters are μ_0, θ_0, α_0, β_0. With O_1 denoting the input and O_2 the output, and assuming C_1 = C_2, H_1 = H_2, W_1 = W_2, the operation of the original convolution block during training is expressed as formula (1):

O_2 = BN((O_1 * W_3), μ_3, θ_3, α_3, β_3) + BN((O_1 * W_1), μ_1, θ_1, α_1, β_1) + BN(O_1, μ_0, θ_0, α_0, β_0)   (1)

During re-parameterization, to simplify the computation of the convolution kernel, the 1×1 convolution kernel is converted to the same 3×3 shape by zero-padding its edges; the identity mapping can be regarded as a linear operation whose kernel is the identity matrix, and by the same principle it is zero-padded to a 3×3 kernel. The branch paths are finally unified, by convolution and superposition, into a single 3×3 convolution. The operation after re-parameterization is expressed as formula (2):

O_2 = O_1 * W_i + b   (2)

where W_i is the fused 3×3 convolution kernel and b the fused bias;
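To make the re-parameterization of formulas (1) and (2) concrete, the following PyTorch sketch fuses the 3×3 branch, the 1×1 branch and the identity BN branch of one such block into a single 3×3 convolution; the block attribute names (conv3x3, conv1x1, bn3, bn1, bn0) are assumptions for illustration, not the authors' code:

import torch
import torch.nn.functional as F

def fuse_conv_bn(weight, bn):
    # Fold a BN layer (mean mu, variance theta, scale alpha, bias beta) into the preceding kernel.
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std                        # alpha / sqrt(theta + eps)
    fused_w = weight * scale.reshape(-1, 1, 1, 1)
    fused_b = bn.bias - bn.running_mean * scale    # beta - mu * alpha / sqrt(theta + eps)
    return fused_w, fused_b

def reparameterize(block):
    # Merge the 3x3, 1x1 and identity branches of a RepVGG-style block into one 3x3 kernel W_i and bias b.
    w3, b3 = fuse_conv_bn(block.conv3x3.weight, block.bn3)
    w1, b1 = fuse_conv_bn(block.conv1x1.weight, block.bn1)
    w1 = F.pad(w1, [1, 1, 1, 1])                   # zero-pad the 1x1 kernel to a 3x3 shape
    c = block.bn0.num_features
    w_id = torch.zeros(c, c, 3, 3)                 # identity mapping as a 3x3 kernel (identity matrix at the center)
    w_id[torch.arange(c), torch.arange(c), 1, 1] = 1.0
    w0, b0 = fuse_conv_bn(w_id, block.bn0)
    return w3 + w1 + w0, b3 + b1 + b0              # the W_i and b of formula (2)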
Flow 2.2: the ReLU activation function is replaced with the Mish function, which is smoother and lets information propagate deeper into the neural network, giving better generalization and accuracy. The original CenterNet network uses the ReLU (Rectified Linear Unit) activation function, g(x) = max(0, x), to introduce nonlinear factors and improve the expressive capacity of the model; it makes gradient descent and back-propagation more efficient than other activation functions and avoids gradient explosion and vanishing gradients, but neurons readily die during training, leaving a gradient of 0. The Mish function, y = x·tanh(ln(1 + exp(x))), is a self-regularized, non-monotonic activation function that is smoother than the original function: it does not saturate for arbitrarily large positive values, slightly allows negative values, and has no hard zero boundary like the original function, so better information penetrates the neural network and information features are learned faster;
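For reference, the Mish activation y = x·tanh(ln(1 + exp(x))) can be written directly in PyTorch as a minimal sketch (recent PyTorch versions also provide a built-in torch.nn.Mish):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    # Mish activation: x * tanh(softplus(x)) = x * tanh(ln(1 + exp(x))).
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))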
Flow 2.3: a global average pooling layer aggregates each channel of the feature map produced by the preceding convolution module, resizing it to complete the dimension reduction; a one-dimensional convolution applied to the reduced feature map keeps the depth of the feature mapping low, the output feature map is extracted, and the information interaction between each channel and its K neighbors improves the model performance of the backbone network;

Flow 2.4: with the aggregated feature obtained without dimensionality reduction denoted x ∈ R^{C'}, where C' is the channel dimension, the channel attention is obtained from formula (3):

ω = σ(C1D_k(x))   (3)

where C1D_k denotes a fast one-dimensional convolution with a kernel of size k, σ denotes the Sigmoid function, ω denotes the channel weights, and k determines the number of parameters of the module;

Flow 2.5: the value of k is determined adaptively, in proportion to the channel dimension C', by formula (4):

k = | log2(C')/γ + b/γ |_odd   (4)

where |·|_odd denotes the nearest odd number, C' is the channel dimension, and γ and b are set to 2 and 1, respectively, in this experiment.
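A minimal PyTorch sketch of the ECA module of flows 2.3-2.5 is given below (variable names are illustrative, not the authors' code): global average pooling, a one-dimensional convolution whose kernel size k follows formula (4), and a Sigmoid producing the channel weights ω:

import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    # Efficient Channel Attention: GAP -> 1-D convolution of size k -> Sigmoid -> channel reweighting.
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1                 # nearest odd number, as in formula (4)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                              # x: (N, C, H, W)
        y = self.pool(x)                               # aggregated features without dimensionality reduction
        y = self.conv(y.squeeze(-1).transpose(1, 2))   # fast 1-D convolution across channels
        y = self.sigmoid(y.transpose(1, 2).unsqueeze(-1))
        return x * y                                   # reweight each channel by omega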
The invention is further improved, and the specific flow of the step 3 is as follows:
the lightweight general up-sampling operation (CARAFE) is mainly divided into two parts of kernel prediction and feature recombination;
Flow 3.1: the channel-attention-enhanced feature map is channel-compressed from C×W×H to C_m×W×H to reduce the subsequent computation, where C_m is the number of channels after compression;

Flow 3.2: a content encoder with a convolution kernel of size k_en×k_en encodes the compressed feature map, yielding a feature map of size σ²k_up²×W×H (k_up being the size of the predicted upsampling kernel and σ the upsampling ratio), which is then unfolded in the channel dimension into a map of shape k_up²×σW×σH;

Flow 3.3: a softmax function normalizes each predicted upsampling kernel so that its weights sum to 1, and the input feature map is convolved with the predicted upsampling kernels to obtain the final upsampling result; the parameter count of the CARAFE upsampling process is given by formula (5):

C×C_m + C_m×k_en²×σ²k_up²   (5)
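The CARAFE upsampling of flows 3.1-3.3 could be sketched in PyTorch as below; the default values (C_m = 64, σ = 2, k_en = 3, k_up = 5) and the memory-heavy unfold-based reassembly are assumptions for illustration rather than the implementation used in the invention:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    # Content-aware reassembly: predict a k_up x k_up kernel per output position, then reassemble features.
    def __init__(self, channels, c_mid=64, sigma=2, k_en=3, k_up=5):
        super().__init__()
        self.sigma, self.k_up = sigma, k_up
        self.compress = nn.Conv2d(channels, c_mid, 1)               # flow 3.1: C -> C_m channel compression
        self.encoder = nn.Conv2d(c_mid, sigma ** 2 * k_up ** 2,     # flow 3.2: content encoding
                                 k_en, padding=k_en // 2)
        self.shuffle = nn.PixelShuffle(sigma)                       # spread kernels over the sigma*W x sigma*H grid

    def forward(self, x):                                           # x: (N, C, H, W)
        n, c, h, w = x.shape
        kernels = self.shuffle(self.encoder(self.compress(x)))      # (N, k_up^2, sigma*H, sigma*W)
        kernels = F.softmax(kernels, dim=1)                         # flow 3.3: kernel weights sum to 1
        patches = F.unfold(x, self.k_up, padding=self.k_up // 2)    # k_up x k_up neighborhood of every source pixel
        patches = patches.view(n, c * self.k_up ** 2, h, w)
        patches = F.interpolate(patches, scale_factor=self.sigma, mode="nearest")
        patches = patches.view(n, c, self.k_up ** 2, self.sigma * h, self.sigma * w)
        out = (patches * kernels.unsqueeze(1)).sum(dim=2)           # weighted reassembly of the neighborhood
        return out                                                  # (N, C, sigma*H, sigma*W)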
the invention is further improved, and the specific flow of the step 4 also comprises the following steps:
Flow 4.1: the feature map fused by the feature pyramid network is predicted to generate a keypoint heatmap Ŷ ∈ [0,1]^{(W/R)×(H/R)×C}, where W and H are the image width and height, R the downsampling factor and C the number of keypoint classes; Ŷ_xyc = 1 corresponds to the center point of a detected target, and each ground-truth center point p is mapped to the corresponding keypoint p̃ = ⌊p/R⌋. The downsampled keypoints are mapped onto the heatmap with a Gaussian kernel to obtain the point weight of each feature-map center point; if two Gaussians of the same keypoint or the same category c overlap, the element-wise maximum is taken. The objective function selected for training is the pixel-level logistic-regression focal loss L_k, expressed as formula (6):

L_k = -(1/N) Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                 if Y_xyc = 1
                     (1 - Y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),  otherwise }   (6)

where N denotes the number of image keypoints, α and β the hyper-parameters of the focal loss, x and y the heatmap coordinates and c the class channel, Y_xyc the value of the Gaussian-smoothed ground truth, and Ŷ_xyc the predicted heatmap value;
Flow 4.2: the offset value output by the feature extraction network is set as Ô ∈ R^{(W/R)×(H/R)×2}, where R denotes the tensor space, H the image height, W the image width and 2 the number of channels; the offset value output by the network is trained with an L1 loss, expressed as formula (7):

L_offset = (1/N) Σ_p | Ô_p̃ - (p/R - p̃) |   (7)

where L_offset denotes the target offset loss, N the number of image keypoints, Ô_p̃ the offset value output by the network at keypoint p̃, p the center point of the target frame, R the downsampling multiple, p̃ = ⌊p/R⌋ the center point of the target frame after downsampling, and (p/R - p̃) the deviation value;
Flow 4.3: when the specific coordinates located for the k-th target deviate from the predicted coordinates, the length and width of the target are trained with an L1 loss, expressed as formula (8):

L_size = (1/N) Σ_{k=1}^{N} | Ŝ_{pk} - s_k |   (8)

where L_size denotes the target size loss, N the number of image keypoints, Ŝ ∈ R^{(W/R)×(H/R)×2} the result output by the convolutional network (R denoting the tensor space, H the image height, W the image width and 2 the number of channels), and s_k the length and width of the k-th target frame;
Flow 4.4: according to the preset weights, the overall loss function is expressed as formula (9):

L_det = L_k + λ_size L_size + λ_offset L_offset   (9)

where L_k denotes the focal loss of the pixel-level logistic regression, L_size the target size loss, L_offset the target offset loss, λ_size the weight of L_size and λ_offset the weight of L_offset.
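As a non-authoritative illustration of formulas (6)-(9), the three loss terms could be computed as in the PyTorch sketch below; the hyper-parameters α = 2, β = 4 and the weights λ_size = 0.1, λ_offset = 1 are the usual CenterNet defaults and are assumptions here, not values stated by this patent:

import torch

def focal_loss(pred, gt, alpha=2, beta=4):
    # Pixel-level logistic-regression focal loss of formula (6); pred and gt have shape (N, C, H, W), pred in (0, 1).
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_loss = torch.log(pred.clamp(1e-6)) * (1 - pred) ** alpha * pos
    neg_loss = torch.log((1 - pred).clamp(1e-6)) * pred ** alpha * (1 - gt) ** beta * neg
    num_pos = pos.sum().clamp(min=1)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos

def masked_l1_loss(pred, target, mask):
    # L1 loss evaluated only where mask == 1 (the keypoint locations); used for both the offset loss (7) and the size loss (8).
    num = mask.sum().clamp(min=1)
    return (torch.abs(pred - target) * mask).sum() / num

def total_loss(heat_pred, heat_gt, off_pred, off_gt, size_pred, size_gt, mask,
               lambda_size=0.1, lambda_offset=1.0):
    # Overall loss of formula (9): L_det = L_k + lambda_size * L_size + lambda_offset * L_offset.
    l_k = focal_loss(heat_pred, heat_gt)
    l_size = masked_l1_loss(size_pred, size_gt, mask)
    l_offset = masked_l1_loss(off_pred, off_gt, mask)
    return l_k + lambda_size * l_size + lambda_offset * l_offset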
The invention is further improved in that the specific step of step 5 is to normalize the heatmap with a Sigmoid function, obtain the keypoint positions of the heatmap with 3×3 max pooling, and judge from the score whether each point is a ship target; the four corner coordinates of the target frame are then obtained from the corresponding length, width and center point coordinates to give the result.
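Step 5 (peak extraction and box decoding) might look like the following sketch; the downsampling factor of 4, the score threshold of 0.3 and the top-k limit are assumptions for illustration:

import torch
import torch.nn.functional as F

def decode(heatmap, size, offset, down_ratio=4, score_thr=0.3, topk=100):
    # Sigmoid-normalize the heatmap, keep local maxima with 3x3 max pooling, then decode boxes.
    # heatmap: (N, C, H, W); size and offset: (N, 2, H, W).
    heat = torch.sigmoid(heatmap)
    peaks = F.max_pool2d(heat, 3, stride=1, padding=1)
    heat = heat * (peaks == heat).float()                      # keypoints = local maxima of the heatmap
    n, c, h, w = heat.shape
    scores, idx = heat.view(n, -1).topk(topk)
    xs = idx % w
    ys = torch.div(idx, w, rounding_mode="floor") % h
    boxes = []
    for i in range(topk):
        if scores[0, i] < score_thr:                           # judge ship / background from the score
            break
        x, y = xs[0, i], ys[0, i]
        dx, dy = offset[0, :, y, x]                            # center-point offset, formula (7)
        bw, bh = size[0, :, y, x]                              # predicted width and height
        cx, cy = x + dx, y + dy
        boxes.append([float((cx - bw / 2) * down_ratio),       # four corner coordinates of the target box
                      float((cy - bh / 2) * down_ratio),
                      float((cx + bw / 2) * down_ratio),
                      float((cy + bh / 2) * down_ratio),
                      float(scores[0, i])])
    return boxes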
The invention has the beneficial effects that: data enhancement is used to expand the data set, which increases the diversity of the training samples, improves the robustness and generalization capability of the model, and reduces the model's dependence on particular attributes; the feature extraction network adopts the RepVGG network, whose architecture is simple yet powerful, and structural re-parameterization decouples the training-time structure from the inference-time structure, combining accuracy with speed and improving optimization capability; in the pyramid-network fusion of the feature maps extracted by the network, the efficient channel attention module avoids dimensionality reduction and provides appropriate cross-channel interaction, achieving high performance and efficient learning while markedly reducing complexity; finally, CARAFE upsampling generates different upsampling kernels at different positions, fully enriching the feature map information of the network, avoiding information loss when the resolution is increased, and reducing false detections and missed detections.
Drawings
Fig. 1 is a schematic flow chart of an embodiment of the present invention.
Fig. 2 is an overall structure diagram of an improved network according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a RepVGG network module according to an embodiment of the invention.
Fig. 4 is a schematic diagram of an ECA efficient channel attention mechanism according to an embodiment of the present invention.
Detailed Description
The present invention will be further described in detail with reference to the drawings and examples, which are only for the purpose of illustrating the invention and are not to be construed as limiting the scope of the invention.
Examples: as shown in fig. 1, a SAR image ship target detection method based on an improved CenterNet network comprises the following steps:
step 1, acquiring original images, carrying out data preprocessing and data enhancement on them to expand the data set, resizing the images of the expanded data set to the same resolution, and dividing them in a 7:2:1 ratio into the training set, test set and validation set of the experiment.
The specific implementation method is as follows: the original SAR ship image is randomly cropped to between 0.8 and 1 times its original size with a 4:3 aspect ratio; random sharpening enhancement uses the USM sharpening algorithm; the brightness and contrast operation sets the picture brightness to 1.2 and the contrast to 100; the gamma value of the gamma correction algorithm is set to 0.7.
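A possible way to obtain the 7:2:1 split of step 1 is sketched below; the in-memory list of image paths and the fixed random seed are assumptions:

import random

def split_dataset(image_paths, ratios=(0.7, 0.2, 0.1), seed=0):
    # Shuffle the expanded data set and split it 7:2:1 into training, test and validation sets.
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(len(paths) * ratios[0])
    n_test = int(len(paths) * ratios[1])
    train = paths[:n_train]
    test = paths[n_train:n_train + n_test]
    val = paths[n_train + n_test:]
    return train, test, val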
Step 2: the image features are extracted using a RepVGG network enhanced by the attention mechanism, as shown in FIG. 3.
The specific implementation method is as follows:
(1) The RepVGG network adopted as the feature extraction backbone merges the 1×1 convolution branch and the identity mapping branch into the 3×3 convolution stack by structural re-parameterization, thereby reducing the number of model parameters. During training a BN layer, namely a normalization layer, is added after each convolution. Let W_3 denote the 3×3 convolution kernel with C_1 input channels and C_2 output channels, and μ_3, θ_3, α_3, β_3 the mean, variance, learned scale factor and bias of the BN layer that follows it; let W_1 denote the 1×1 convolution kernel, with μ_1, θ_1, α_1, β_1 the corresponding BN parameters; for the identity-mapping branch, which contains only a BN layer, the parameters are μ_0, θ_0, α_0, β_0. With O_1 denoting the input and O_2 the output, and assuming C_1 = C_2, H_1 = H_2, W_1 = W_2, the operation of the original convolution block during training is expressed as formula (1):

O_2 = BN((O_1 * W_3), μ_3, θ_3, α_3, β_3) + BN((O_1 * W_1), μ_1, θ_1, α_1, β_1) + BN(O_1, μ_0, θ_0, α_0, β_0)   (1)

During re-parameterization, to simplify the computation of the convolution kernel, the 1×1 convolution kernel is converted to the same 3×3 shape by zero-padding its edges; the identity mapping can be regarded as a linear operation whose kernel is the identity matrix, and by the same principle it is zero-padded to a 3×3 kernel; the branch paths are finally unified, by convolution and superposition, into a single 3×3 convolution. The operation after re-parameterization is expressed as formula (2):

O_2 = O_1 * W_i + b   (2)

where W_i is the fused 3×3 convolution kernel and b the fused bias. As shown in the right column of fig. 2, the small block in the middle of the convolution kernel overlaps the most and its color is darker;
(2) The ReLU activation function is replaced with the Mish function, which is smoother and lets information propagate deeper into the neural network, giving better generalization and accuracy. The original CenterNet network uses the ReLU (Rectified Linear Unit) activation function, g(x) = max(0, x), to introduce nonlinear factors and improve the expressive capacity of the model; it makes gradient descent and back-propagation more efficient than other activation functions and avoids gradient explosion and vanishing gradients, but neurons readily die during training, leaving a gradient of 0. The Mish function, y = x·tanh(ln(1 + exp(x))), is a self-regularized, non-monotonic activation function that is smoother than the original function: it does not saturate for arbitrarily large positive values, slightly allows negative values, and has no hard zero boundary like the original function, so better information penetrates the neural network and information features are learned faster;
(3) As shown in fig. 4, in the efficient channel attention mechanism module, a global average pooling layer aggregates each channel of the feature map extracted by the preceding convolution module, resizing it to complete the dimension reduction; a one-dimensional convolution applied to the reduced feature map keeps the depth of the feature mapping low, the output feature map is extracted, and the information interaction between each channel and its K neighbors improves the model performance of the backbone network;

(4) With the aggregated feature obtained without dimensionality reduction denoted x ∈ R^{C'}, where C' is the channel dimension, the channel attention is obtained from formula (3):

ω = σ(C1D_k(x))   (3)

where C1D_k denotes a fast one-dimensional convolution with a kernel of size k, σ denotes the Sigmoid function, ω denotes the channel weights, and k determines the number of parameters of the module;

(5) The value of k is determined adaptively, in proportion to the channel dimension C', by formula (4):

k = | log2(C')/γ + b/γ |_odd   (4)

where |·|_odd denotes the nearest odd number, C' is the channel dimension, and γ and b are set to 2 and 1, respectively, in this experiment.
In this embodiment, as shown in fig. 2, the efficient channel attention mechanism (ECA) is embedded in the path along which the feature maps of the RepVGG network are passed to the feature pyramid network for multi-scale fusion, so that the feature maps obtain a clear performance gain while the effect of dimensionality reduction on learning channel attention is avoided. The pixel information of originally small targets is thus not lost as the length and width of the feature maps shrink. The structure of the feature pyramid network is therefore used to optimize CenterNet and improve accuracy.
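As a rough sketch of where ECA and CARAFE sit in the fusion path, one top-down step of the feature pyramid could be written as below, reusing the ECA and CARAFE classes sketched earlier; the channel counts and the exact wiring are assumptions for illustration and not the network defined by the invention:

import torch.nn as nn

class FPNFusion(nn.Module):
    # One top-down fusion step: ECA on the lateral feature, CARAFE upsampling of the deeper feature, then addition.
    def __init__(self, lateral_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.Conv2d(lateral_channels, out_channels, 1)   # 1x1 lateral projection
        self.eca = ECA(out_channels)                                  # channel attention on the lateral path
        self.upsample = CARAFE(out_channels)                          # content-aware 2x upsampling
        self.smooth = nn.Conv2d(out_channels, out_channels, 3, padding=1)

    def forward(self, deep_feature, shallow_feature):
        # deep_feature is assumed to already have out_channels channels at half the lateral resolution.
        lateral = self.eca(self.lateral(shallow_feature))
        fused = lateral + self.upsample(deep_feature)                 # multi-scale feature fusion
        return self.smooth(fused)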
Step 3: and carrying out multi-scale feature fusion on the feature map of the extracted image.
The specific implementation method is as follows: the lightweight general upsampling operator CARAFE is mainly divided into kernel prediction and feature recombination:

(1) The channel-attention-enhanced feature map is channel-compressed from C×W×H to C_m×W×H to reduce the subsequent computation, where C_m is the number of channels after compression;

(2) A content encoder with a convolution kernel of size k_en×k_en encodes the compressed feature map, yielding a feature map of size σ²k_up²×W×H (k_up being the size of the predicted upsampling kernel and σ the upsampling ratio), which is unfolded in the channel dimension into a map of shape k_up²×σW×σH;

(3) A softmax function normalizes each predicted upsampling kernel so that its weights sum to 1, and the input feature map is convolved with the predicted upsampling kernels to obtain the final upsampling result; the parameter count of the CARAFE upsampling process is given by formula (5):

C×C_m + C_m×k_en²×σ²k_up²   (5)
step 4: and predicting the fused characteristic diagram to obtain thermodynamic diagrams, the width and the height of the target and the coordinates of the center point.
The specific implementation method is as follows:
(1) The feature map fused by the feature pyramid network is predicted to generate a keypoint heatmap Ŷ ∈ [0,1]^{(W/R)×(H/R)×C}, where W and H are the image width and height, R the downsampling factor and C the number of keypoint classes; Ŷ_xyc = 1 corresponds to the center point of a detected target, and each ground-truth center point p is mapped to the corresponding keypoint p̃ = ⌊p/R⌋. The downsampled keypoints are mapped onto the heatmap with a Gaussian kernel to obtain the point weight of each feature-map center point; if two Gaussians of the same keypoint or the same category c overlap, the element-wise maximum is taken. The objective function selected for training is the pixel-level logistic-regression focal loss L_k, expressed as formula (6):

L_k = -(1/N) Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                 if Y_xyc = 1
                     (1 - Y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),  otherwise }   (6)

where N denotes the number of image keypoints; α and β denote the hyper-parameters of the focal loss; x and y denote the heatmap coordinates and c the class channel; Y_xyc denotes the value of the Gaussian-smoothed ground truth, and Ŷ_xyc the predicted heatmap value;
(2) The offset value output by the feature extraction network is set as Ô ∈ R^{(W/R)×(H/R)×2}, where R denotes the tensor space, H the image height, W the image width and 2 the number of channels; the offset value output by the network is trained with an L1 loss, expressed as formula (7):

L_offset = (1/N) Σ_p | Ô_p̃ - (p/R - p̃) |   (7)

where L_offset denotes the target offset loss; N the number of image keypoints; Ô_p̃ the offset value output by the network at keypoint p̃; p the center point of the target frame; R the downsampling multiple; p̃ = ⌊p/R⌋ the center point of the target frame after downsampling; and (p/R - p̃) the deviation value;
(3) When the specific coordinates located for the k-th target deviate from the predicted coordinates, the length and width of the target are trained with an L1 loss, expressed as formula (8):

L_size = (1/N) Σ_{k=1}^{N} | Ŝ_{pk} - s_k |   (8)

where L_size denotes the target size loss, N the number of image keypoints, Ŝ ∈ R^{(W/R)×(H/R)×2} the result output by the convolutional network (R denoting the tensor space, H the image height, W the image width and 2 the number of channels), and s_k the length and width of the k-th target frame;
(4) According to the preset weights, the overall loss function is expressed as formula (9):

L_det = L_k + λ_size L_size + λ_offset L_offset   (9)

where L_k denotes the focal loss of the pixel-level logistic regression, L_size the target size loss, L_offset the target offset loss, λ_size the weight of L_size and λ_offset the weight of L_offset.
Step 5: The detection boxes are extracted from the heatmap to obtain the detection result. The specific implementation method is as follows:
the heatmap is normalized with a Sigmoid function, the keypoint positions of the heatmap are obtained with 3×3 max pooling, and whether each point is a ship target is judged from its score; the four corner coordinates of the target frame are then obtained from the corresponding length, width and center point coordinates to give the detection result.
The effects of the present invention are further described in conjunction with experiments as follows:
the experiment was performed using an SSDD Ship Dataset and a SAR-clip-Dataset Ship Dataset. Both data sets were used to verify the generalization and accuracy of the improved algorithms herein. Where the SSDD dataset has 1160 images covering 2456 ship instances of different sizes. Each SAR image comprises 2 ships in average, and is distributed sparsely. The SAR-clip-Dataset Dataset has 43819 Ship segmentation maps. The information of satellite sensor sources, spatial resolution, polarization mode, image scene and the like in the two data sets is shown in table 1.
Table 1 Basic information of the ship data sets [table omitted]
Table 2 Detection accuracy on the SSDD dataset [table omitted]
Table 3 Detection accuracy on the SAR-Ship-Dataset [table omitted]
As can be seen from tables 2 and 3, validating the algorithm on the two data sets shows that the improved algorithm has excellent generalization capability and clearly improves the detection accuracy on SAR ship images, performing well in ship target detection.
The foregoing is merely exemplary embodiments of the present invention, and specific structures and features that are well known in the art are not described in detail herein. It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (8)

1. The SAR image ship detection method based on the improved CenterNet network is characterized by comprising the following steps of:
step 1, acquiring original images, carrying out data preprocessing and data enhancement on them to expand the data set, resizing the images of the expanded data set to the same resolution, and dividing them in a 7:2:1 ratio into the training set, test set and validation set of the experiment;
step 2, extracting image features by adopting a RepVGG network enhanced by an attention mechanism;
step 3, carrying out multi-scale feature fusion on the feature images of the extracted images;
step 4, predicting the fused feature map to obtain the heatmap, the width and height of the target and the center point coordinates;
and step 5, extracting the detection boxes from the heatmap to obtain a detection result.
2. The SAR image ship detection method based on the improved CenterNet network according to claim 1, wherein the specific data enhancement operation in step 1 comprises: randomly cropping the picture to between 0.8 and 1 times the original picture size with a cropping aspect ratio of 4:3; random sharpening enhancement using the USM sharpening algorithm; a brightness and contrast operation that sets the picture brightness to 1.2 and the contrast to 100; and a gamma correction algorithm with the gamma value set to 0.7.
3. The SAR image ship detection method based on the improved CenterNet network according to claim 2, wherein the specific flow of step 2 is as follows:
flow 2.1, the RepVGG network adopted as the feature extraction backbone merges the 1×1 convolution branch and the identity mapping branch into the 3×3 convolution stack by structural re-parameterization, thereby reducing the number of model parameters;
flow 2.2, the ReLU activation function is replaced with the Mish function, which is smoother and lets information propagate deeper into the neural network, giving better generalization and accuracy;
flow 2.3, a global average pooling layer aggregates each channel of the feature map produced by the preceding convolution module, resizing it to complete the dimension reduction; a one-dimensional convolution applied to the reduced feature map keeps the depth of the feature mapping low, the output feature map is extracted, and the information interaction between each channel and its K neighbors improves the model performance of the backbone network;
flow 2.4, with the aggregated feature obtained without dimensionality reduction denoted x ∈ R^{C'}, where C' is the channel dimension, the channel attention is given by:
ω = σ(C1D_k(x))
where C1D_k denotes a fast one-dimensional convolution with a kernel of size k, σ denotes the Sigmoid function, ω denotes the channel weights, and k determines the number of parameters of the module;
flow 2.5, the value of k is determined adaptively, in proportion to the channel dimension C', by the following formula:
k = | log2(C')/γ + b/γ |_odd
where |·|_odd denotes the nearest odd number, C' is the channel dimension, and γ and b are set to 2 and 1, respectively, in this experiment.
4. The SAR image ship detection method based on the improved CenterNet network according to claim 3, wherein in flow 2.1 of step 2, a BN layer, namely a normalization layer, is added after each convolution during training; W_3 denotes the 3×3 convolution kernel with C_1 input channels and C_2 output channels, and μ_3, θ_3, α_3, β_3 denote the mean, variance, learned scale factor and bias of the BN layer that follows it; W_1 denotes the 1×1 convolution kernel, with μ_1, θ_1, α_1, β_1 the corresponding BN parameters; for the identity-mapping branch, which contains only a BN layer, the parameters are μ_0, θ_0, α_0, β_0; O_1 denotes the input and O_2 the output; assuming C_1 = C_2, H_1 = H_2, W_1 = W_2, the operation of the original convolution block during training is expressed in numerical form by the following formula:
O_2 = BN((O_1 * W_3), μ_3, θ_3, α_3, β_3) + BN((O_1 * W_1), μ_1, θ_1, α_1, β_1) + BN(O_1, μ_0, θ_0, α_0, β_0)
in the re-parameterization process, to simplify the computation of the convolution kernel, the 1×1 convolution kernel is converted to the same 3×3 shape by zero-padding its edges; the identity mapping can be regarded as a linear operation whose kernel is the identity matrix, and by the same principle it is zero-padded to a 3×3 kernel; the branch paths are finally unified, by convolution and superposition, into a single 3×3 convolution, and the operation after re-parameterization is expressed by the following formula:
O_2 = O_1 * W_i + b
where W_i denotes the fused 3×3 convolution kernel and b the fused bias;
the specific content of flow 2.2 is as follows:
the original CenterNet network uses the ReLU (Rectified Linear Unit) activation function, g(x) = max(0, x), to introduce nonlinear factors and improve the expressive capacity of the model; it makes gradient descent and back-propagation more efficient than other activation functions and avoids gradient explosion and vanishing gradients, but neurons readily die during training, leaving a gradient of 0; the Mish function, y = x·tanh(ln(1 + exp(x))), is a self-regularized, non-monotonic activation function that is smoother than the original function: it does not saturate for arbitrarily large positive values, slightly allows negative values, and has no hard zero boundary like the original function, so better information penetrates the neural network and information features are learned faster.
5. The SAR image ship detection method based on the improved CenterNet network according to claim 4, wherein the specific flow of step 3 is as follows:
flow 3.1, the channel-attention-enhanced feature map is channel-compressed from C×W×H to C_m×W×H to reduce the subsequent computation, where C_m is the number of channels after compression;
flow 3.2, a content encoder with a convolution kernel of size k_en×k_en encodes the compressed feature map, yielding a feature map of size σ²k_up²×W×H, which is unfolded in the channel dimension into a map of shape k_up²×σW×σH, k_up being the size of the predicted upsampling kernel and σ the upsampling ratio;
and flow 3.3, a softmax function normalizes each predicted upsampling kernel so that its weights sum to 1, and the input feature map is convolved with the predicted upsampling kernels to obtain the final upsampling result; the parameter count of the CARAFE upsampling process is expressed as:
C×C_m + C_m×k_en²×σ²k_up²
6. the SAR image ship detection method based on the improved central net network according to claim 5, wherein the specific flow of step 4 comprises:
4.1, predicting the feature map fused by the feature pyramid network to generate a thermodynamic diagram of the key points;
4.2, setting a bias value output by the feature extraction network, and training the bias value output by the network by adopting L1;
4.3, when the kth target locates the deviation between the specific coordinates and the predicted coordinates, training the length and the width of the target by adopting L1 loss;
and 4.4, obtaining an integral loss function according to the preset weight.
7. The SAR image ship detection method based on the improved CenterNet network according to claim 6, wherein the specific content of flow 4.1 in step 4 is as follows:
the feature map fused by the feature pyramid network is predicted to generate a keypoint heatmap Ŷ ∈ [0,1]^{(W/R)×(H/R)×C}, where W and H are the image width and height, R the downsampling factor and C the number of keypoint classes; Ŷ_xyc = 1 corresponds to the center point of a detected target, and each ground-truth center point p is mapped to the corresponding keypoint p̃ = ⌊p/R⌋; the downsampled keypoints are mapped onto the heatmap with a Gaussian kernel to obtain the point weight of each feature-map center point; if two Gaussians of the same keypoint or the same category c overlap, the element-wise maximum is taken; the objective function selected for training is the pixel-level logistic-regression focal loss L_k, expressed by the following formula:
L_k = -(1/N) Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                 if Y_xyc = 1
                     (1 - Y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),  otherwise }
where N denotes the number of image keypoints, α and β the hyper-parameters of the focal loss, x and y the heatmap coordinates, c the class channel, Y_xyc the value of the Gaussian function, and Ŷ_xyc the predicted heatmap value;
the specific content of flow 4.2 is as follows:
the offset value output by the feature extraction network is set as Ô ∈ R^{(W/R)×(H/R)×2}, where R denotes the tensor space, H the image height, W the image width and 2 the number of channels; the offset value output by the network is trained with an L1 loss, expressed by the following formula:
L_offset = (1/N) Σ_p | Ô_p̃ - (p/R - p̃) |
where L_offset denotes the target offset loss, N the number of image keypoints, Ô_p̃ the offset value output by the network, p the center point of the target frame, R the downsampling multiple, p̃ = ⌊p/R⌋ the center point of the target frame after downsampling, and (p/R - p̃) the deviation value;
the specific content of flow 4.3 is as follows:
when the specific coordinates located for the k-th target deviate from the predicted coordinates, the length and width of the target are trained with an L1 loss, expressed by the following formula:
L_size = (1/N) Σ_{k=1}^{N} | Ŝ_{pk} - s_k |
where L_size denotes the target size loss, N the number of image keypoints, Ŝ ∈ R^{(W/R)×(H/R)×2} the result output by the convolutional network, R denoting the tensor space, H the image height and W the image width, and s_k the length and width of the k-th target frame;
the specific content of flow 4.4 is as follows:
according to the preset weights, the overall loss function is obtained by the following formula:
L_det = L_k + λ_size L_size + λ_offset L_offset
where L_k denotes the focal loss of the pixel-level logistic regression, L_size the target size loss, L_offset the target offset loss, λ_size the weight of L_size and λ_offset the weight of L_offset.
8. The SAR image ship detection method based on the improved CenterNet network according to claim 7, wherein the specific steps of step 5 are as follows: the heatmap is normalized with a Sigmoid function, the keypoint positions of the heatmap are obtained with 3×3 max pooling, whether each point is a ship target is judged from its score, and the four corner coordinates of the target frame are obtained from the corresponding length, width and center point coordinates to give the result.
CN202310011081.4A 2023-01-05 2023-01-05 SAR image ship detection method based on improved CenterNet network Pending CN116071664A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310011081.4A CN116071664A (en) 2023-01-05 2023-01-05 SAR image ship detection method based on improved CenterNet network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310011081.4A CN116071664A (en) 2023-01-05 2023-01-05 SAR image ship detection method based on improved CenterNet network

Publications (1)

Publication Number Publication Date
CN116071664A true CN116071664A (en) 2023-05-05

Family

ID=86174345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310011081.4A Pending CN116071664A (en) 2023-01-05 2023-01-05 SAR image ship detection method based on improved CenterNet network

Country Status (1)

Country Link
CN (1) CN116071664A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778176A (en) * 2023-06-30 2023-09-19 哈尔滨工程大学 SAR image ship trail detection method based on frequency domain attention
CN116778176B (en) * 2023-06-30 2024-02-09 哈尔滨工程大学 SAR image ship trail detection method based on frequency domain attention

Similar Documents

Publication Publication Date Title
CN114202696B (en) SAR target detection method and device based on context vision and storage medium
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN109636742B (en) Mode conversion method of SAR image and visible light image based on countermeasure generation network
CN112395987B (en) SAR image target detection method based on unsupervised domain adaptive CNN
Wang et al. Remote sensing landslide recognition based on convolutional neural network
CN111368769B (en) Ship multi-target detection method based on improved anchor point frame generation model
Huang et al. An intelligent ship image/video detection and classification method with improved regressive deep convolutional neural network
CN110555841B (en) SAR image change detection method based on self-attention image fusion and DEC
CN112347895A (en) Ship remote sensing target detection method based on boundary optimization neural network
CN110765912B (en) SAR image ship target detection method based on statistical constraint and Mask R-CNN
Chen et al. Geospatial transformer is what you need for aircraft detection in SAR Imagery
Shaoqing et al. The comparative study of three methods of remote sensing image change detection
CN117237740B (en) SAR image classification method based on CNN and Transformer
Chen et al. Change detection algorithm for multi-temporal remote sensing images based on adaptive parameter estimation
CN112348758A (en) Optical remote sensing image data enhancement method and target identification method
Sun et al. Image recognition technology in texture identification of marine sediment sonar image
Chai et al. Marine ship detection method for SAR image based on improved faster RCNN
CN116071664A (en) SAR image ship detection method based on improved CenterNet network
CN115965862A (en) SAR ship target detection method based on mask network fusion image characteristics
CN110069987B (en) Single-stage ship detection algorithm and device based on improved VGG network
Chen et al. Shape similarity intersection-over-union loss hybrid model for detection of synthetic aperture radar small ship objects in complex scenes
Wang et al. Automatic SAR Ship Detection Based on Multi-Feature Fusion Network in Spatial and Frequency Domain
CN113486819A (en) Ship target detection method based on YOLOv4 algorithm
Gui et al. A scale transfer convolution network for small ship detection in SAR images
CN116824413A (en) Aerial image target detection method based on multi-scale cavity convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination