CN116071664A - SAR image ship detection method based on improved CenterNet network - Google Patents

SAR image ship detection method based on improved CenterNet network

Info

Publication number
CN116071664A
Authority
CN
China
Prior art keywords
network
representing
image
size
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310011081.4A
Other languages
Chinese (zh)
Inventor
魏雪云
唐志勇
张贞凯
郑威
靳标
奚彩萍
尚尚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology
Priority to CN202310011081.4A
Publication of CN116071664A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements using classification, e.g. of video objects
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Arrangements using neural networks

Abstract

The invention relates to the field of synthetic aperture radar image target detection, in particular to a SAR image ship target detection method based on an improved CenterNet network. The original CenterNet framework is combined with a feature pyramid structure, and an improved RepVGG network is used to extract features, which reduces the number of model parameters, shortens training time and optimizes model performance. An ECA attention mechanism increases the model's attention to target features in the feature information while reducing model complexity, ultimately improving detection accuracy and achieving a lightweight design. During feature fusion, CARAFE upsampling generates different sampling kernels at different positions, fully capturing the feature map information, avoiding omissions, and reducing false detections and missed detections.

Description

SAR image ship detection method based on improved CenterNet network
Technical Field
The invention relates to the field of synthetic aperture radar image target detection, in particular to an SAR image ship target detection method based on an improved CenterNet network.
Background
Synthetic aperture radar (Synthetic Aperture Radar, SAR) is an active microwave imaging sensor. SAR can penetrate cloud layers, is not affected by weather, and images continuously, enabling all-weather, all-day marine remote sensing monitoring. China has a vast sea area, so monitoring the sea with SAR and researching ship detection based on SAR images is of great significance for protecting national maritime safety and safeguarding national interests.
Ship detection with SAR images is difficult. The performance of SAR-image ship detection is mainly affected by the SAR system and the sea environment: because of speckle noise and scattering imaging, the pixel scattering intensity of the target area is unstable and target features are easily lost. Moreover, because of the imaging angle and distance, ship targets in far-sea areas appear smaller and may not be detected at all. In offshore areas, buildings and islands with similar scattering characteristics are mistaken for ship targets, leading to inaccurate detection. Traditional ship detection methods, such as the constant false alarm rate (Constant False Alarm Rate, CFAR) method, have a certain detection capability for large ship targets, but still perform poorly on open-sea ship targets and complex scenes, are time-consuming, and struggle to meet the basic requirements of ship detection. With the development of hardware and deep learning, however, learning-based methods have shown strong capability on the ship detection task.
Deep-learning-based target detection algorithms are divided, according to whether anchor boxes are used to generate candidate boxes, into anchor-based methods and anchor-free methods; SSD, Faster R-CNN, Mask R-CNN and the YOLO series are all anchor-based algorithms. They do improve greatly on traditional algorithms, but they all require a large number of anchor boxes: the ground-truth boxes must overlap with as many anchor boxes as possible, yet only a small portion can actually overlap, which leads to an imbalance of positive and negative samples, slows training, and requires many hyper-parameters, making tuning and generalization difficult. Anchor-free detection algorithms mainly locate targets with keypoints, which removes a large amount of anchor-box matching computation and speeds up detection. CenterNet is an anchor-free detection algorithm that completes target localization and classification using the feature information of the target object's center point; its detection speed and accuracy are good, but feature information is lost for small targets, so their detection accuracy is slightly lower.
The invention patent with application number 201910718858.4, entitled "a remote sensing target detection method based on boundary-constrained CenterNet", discloses a remote sensing target detection method that extracts feature information with a cascade of stacked convolution and sampling layers and uses the output of a boundary-constraint convolutional network to improve the accuracy of boundary-constraint completion. However, this type of approach has several problems. First, for the scenes selected in the SAR ship image dataset, the number of samples is too small to fully reflect the generality of the network's detection. Second, continuously stacking network depth does classify ships more accurately, but key feature information is lost, localization deviates, and training slows down. Third, the generated boundary-constraint prediction labels place little constraint on the prediction boxes of false targets and cannot completely eliminate false prediction boxes, which biases the overall loss and strongly affects the recall and speed of detection.
Disclosure of Invention
In order to solve the technical problems, the invention provides a SAR image ship target detection method based on an improved CenterNet network.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a SAR image ship detection method based on an improved CenterNet network comprises the following steps:
step 1, acquiring original images, carrying out data preprocessing and data enhancement on them to expand the data set, resizing the images of the expanded data set to the same resolution, and dividing them in a 7:2:1 ratio into the training set, test set and validation set of the experiment;
step 2, extracting image features by adopting a RepVGG network enhanced by an attention mechanism;
step 3, carrying out multi-scale feature fusion on the feature images of the extracted images;
step 4, predicting the fused feature map to obtain the heatmap, the width and height of the target and the center point coordinates;
and step 5, extracting the detection boxes from the heatmap to obtain a detection result.
The invention is further improved in that the specific data enhancement operation in step 1 comprises: randomly cropping the picture to between 0.8 and 1 times the original picture size with a cropping aspect ratio of 4:3; random sharpening enhancement using the USM sharpening algorithm; a brightness and contrast operation that sets the picture brightness to 1.2 and the contrast to 100; and a gamma correction algorithm with the gamma value set to 0.7.
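For illustration only, the data enhancement described above could be sketched roughly as follows with OpenCV and NumPy; the function names, the USM parameters (sigma, amount) and the reading of "contrast 100" as an additive shift are assumptions, not part of the claimed method:

import cv2
import numpy as np

def random_crop(img, min_scale=0.8, max_scale=1.0, aspect=(4, 3)):
    # Randomly crop a region of 0.8-1.0 times the original size with a 4:3 aspect ratio.
    h, w = img.shape[:2]
    scale = np.random.uniform(min_scale, max_scale)
    crop_w = int(w * scale)
    crop_h = min(int(crop_w * aspect[1] / aspect[0]), h)
    x0 = np.random.randint(0, w - crop_w + 1)
    y0 = np.random.randint(0, h - crop_h + 1)
    return img[y0:y0 + crop_h, x0:x0 + crop_w]

def usm_sharpen(img, sigma=3.0, amount=1.0):
    # Unsharp-mask (USM) sharpening: original + amount * (original - blurred).
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)
    return cv2.addWeighted(img, 1 + amount, blurred, -amount, 0)

def brightness_contrast(img, brightness=1.2, contrast=100):
    # Scale pixel values by 1.2 (brightness) and add 100 (contrast shift), clipped to 8 bits.
    return cv2.convertScaleAbs(img, alpha=brightness, beta=contrast)

def gamma_correction(img, gamma=0.7):
    # Gamma correction with gamma = 0.7 via a lookup table.
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(img, table)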
The invention is further improved, and the specific flow of the step 2 is as follows:
Flow 2.1: the RepVGG network adopted as the feature extraction backbone merges the 1×1 convolution branch and the identity mapping branch into the 3×3 convolution stack by structural re-parameterization, thereby reducing the number of model parameters. During training a BN layer, namely a normalization layer, is added after each convolution. Let W_3 denote the 3×3 convolution kernel with C_1 input channels and C_2 output channels, and μ_3, θ_3, α_3, β_3 the mean, variance, learned scale factor and bias of the BN layer that follows it; let W_1 denote the 1×1 convolution kernel, with μ_1, θ_1, α_1, β_1 the corresponding BN parameters; for the identity-mapping branch, which contains only a BN layer, the parameters are μ_0, θ_0, α_0, β_0. With O_1 denoting the input and O_2 the output, and assuming C_1 = C_2, H_1 = H_2, W_1 = W_2, the operation of the original convolution block during training is expressed as formula (1):

O_2 = BN((O_1 * W_3), μ_3, θ_3, α_3, β_3) + BN((O_1 * W_1), μ_1, θ_1, α_1, β_1) + BN(O_1, μ_0, θ_0, α_0, β_0)   (1)

During re-parameterization, to simplify the computation of the convolution kernel, the 1×1 convolution kernel is converted to the same 3×3 shape by zero-padding its edges; the identity mapping can be regarded as a linear operation whose kernel is the identity matrix, and by the same principle it is zero-padded to a 3×3 kernel. The branch paths are finally unified, by convolution and superposition, into a single 3×3 convolution. The operation after re-parameterization is expressed as formula (2):

O_2 = O_1 * W_i + b   (2)

where W_i is the fused 3×3 convolution kernel and b the fused bias;
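To make the re-parameterization of formulas (1) and (2) concrete, the following PyTorch sketch fuses the 3×3 branch, the 1×1 branch and the identity BN branch of one such block into a single 3×3 convolution; the block attribute names (conv3x3, conv1x1, bn3, bn1, bn0) are assumptions for illustration, not the authors' code:

import torch
import torch.nn.functional as F

def fuse_conv_bn(weight, bn):
    # Fold a BN layer (mean mu, variance theta, scale alpha, bias beta) into the preceding kernel.
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std                        # alpha / sqrt(theta + eps)
    fused_w = weight * scale.reshape(-1, 1, 1, 1)
    fused_b = bn.bias - bn.running_mean * scale    # beta - mu * alpha / sqrt(theta + eps)
    return fused_w, fused_b

def reparameterize(block):
    # Merge the 3x3, 1x1 and identity branches of a RepVGG-style block into one 3x3 kernel W_i and bias b.
    w3, b3 = fuse_conv_bn(block.conv3x3.weight, block.bn3)
    w1, b1 = fuse_conv_bn(block.conv1x1.weight, block.bn1)
    w1 = F.pad(w1, [1, 1, 1, 1])                   # zero-pad the 1x1 kernel to a 3x3 shape
    c = block.bn0.num_features
    w_id = torch.zeros(c, c, 3, 3)                 # identity mapping as a 3x3 kernel (identity matrix at the center)
    w_id[torch.arange(c), torch.arange(c), 1, 1] = 1.0
    w0, b0 = fuse_conv_bn(w_id, block.bn0)
    return w3 + w1 + w0, b3 + b1 + b0              # the W_i and b of formula (2)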
Flow 2.2: the ReLU activation function is replaced with the Mish function, which is smoother and lets information propagate deeper into the neural network, giving better generalization and accuracy. The original CenterNet network uses the ReLU (Rectified Linear Unit) activation function, g(x) = max(0, x), to introduce nonlinear factors and improve the expressive capacity of the model; it makes gradient descent and back-propagation more efficient than other activation functions and avoids gradient explosion and vanishing gradients, but neurons readily die during training, leaving a gradient of 0. The Mish function, y = x·tanh(ln(1 + exp(x))), is a self-regularized, non-monotonic activation function that is smoother than the original function: it does not saturate for arbitrarily large positive values, slightly allows negative values, and has no hard zero boundary like the original function, so better information penetrates the neural network and information features are learned faster;
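For reference, the Mish activation y = x·tanh(ln(1 + exp(x))) can be written directly in PyTorch as a minimal sketch (recent PyTorch versions also provide a built-in torch.nn.Mish):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    # Mish activation: x * tanh(softplus(x)) = x * tanh(ln(1 + exp(x))).
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))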
Flow 2.3: a global average pooling layer aggregates each channel of the feature map produced by the preceding convolution module, resizing it to complete the dimension reduction; a one-dimensional convolution applied to the reduced feature map keeps the depth of the feature mapping low, the output feature map is extracted, and the information interaction between each channel and its K neighbors improves the model performance of the backbone network;

Flow 2.4: with the aggregated feature obtained without dimensionality reduction denoted x ∈ R^{C'}, where C' is the channel dimension, the channel attention is obtained from formula (3):

ω = σ(C1D_k(x))   (3)

where C1D_k denotes a fast one-dimensional convolution with a kernel of size k, σ denotes the Sigmoid function, ω denotes the channel weights, and k determines the number of parameters of the module;

Flow 2.5: the value of k is determined adaptively, in proportion to the channel dimension C', by formula (4):

k = | log2(C')/γ + b/γ |_odd   (4)

where |·|_odd denotes the nearest odd number, C' is the channel dimension, and γ and b are set to 2 and 1, respectively, in this experiment.
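A minimal PyTorch sketch of the ECA module of flows 2.3-2.5 is given below (variable names are illustrative, not the authors' code): global average pooling, a one-dimensional convolution whose kernel size k follows formula (4), and a Sigmoid producing the channel weights ω:

import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    # Efficient Channel Attention: GAP -> 1-D convolution of size k -> Sigmoid -> channel reweighting.
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1                 # nearest odd number, as in formula (4)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                              # x: (N, C, H, W)
        y = self.pool(x)                               # aggregated features without dimensionality reduction
        y = self.conv(y.squeeze(-1).transpose(1, 2))   # fast 1-D convolution across channels
        y = self.sigmoid(y.transpose(1, 2).unsqueeze(-1))
        return x * y                                   # reweight each channel by omega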
The invention is further improved, and the specific flow of the step 3 is as follows:
the lightweight general up-sampling operation (CARAFE) is mainly divided into two parts of kernel prediction and feature recombination;
Flow 3.1: the channel-attention-enhanced feature map is channel-compressed from C×W×H to C_m×W×H to reduce the subsequent computation, where C_m is the number of channels after compression;

Flow 3.2: a content encoder with a convolution kernel of size k_en×k_en encodes the compressed feature map, yielding a feature map of size σ²k_up²×W×H (k_up being the size of the predicted upsampling kernel and σ the upsampling ratio), which is then unfolded in the channel dimension into a map of shape k_up²×σW×σH;

Flow 3.3: a softmax function normalizes each predicted upsampling kernel so that its weights sum to 1, and the input feature map is convolved with the predicted upsampling kernels to obtain the final upsampling result; the parameter count of the CARAFE upsampling process is given by formula (5):

C×C_m + C_m×k_en²×σ²k_up²   (5)
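The CARAFE upsampling of flows 3.1-3.3 could be sketched in PyTorch as below; the default values (C_m = 64, σ = 2, k_en = 3, k_up = 5) and the memory-heavy unfold-based reassembly are assumptions for illustration rather than the implementation used in the invention:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    # Content-aware reassembly: predict a k_up x k_up kernel per output position, then reassemble features.
    def __init__(self, channels, c_mid=64, sigma=2, k_en=3, k_up=5):
        super().__init__()
        self.sigma, self.k_up = sigma, k_up
        self.compress = nn.Conv2d(channels, c_mid, 1)               # flow 3.1: C -> C_m channel compression
        self.encoder = nn.Conv2d(c_mid, sigma ** 2 * k_up ** 2,     # flow 3.2: content encoding
                                 k_en, padding=k_en // 2)
        self.shuffle = nn.PixelShuffle(sigma)                       # spread kernels over the sigma*W x sigma*H grid

    def forward(self, x):                                           # x: (N, C, H, W)
        n, c, h, w = x.shape
        kernels = self.shuffle(self.encoder(self.compress(x)))      # (N, k_up^2, sigma*H, sigma*W)
        kernels = F.softmax(kernels, dim=1)                         # flow 3.3: kernel weights sum to 1
        patches = F.unfold(x, self.k_up, padding=self.k_up // 2)    # k_up x k_up neighborhood of every source pixel
        patches = patches.view(n, c * self.k_up ** 2, h, w)
        patches = F.interpolate(patches, scale_factor=self.sigma, mode="nearest")
        patches = patches.view(n, c, self.k_up ** 2, self.sigma * h, self.sigma * w)
        out = (patches * kernels.unsqueeze(1)).sum(dim=2)           # weighted reassembly of the neighborhood
        return out                                                  # (N, C, sigma*H, sigma*W)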
the invention is further improved, and the specific flow of the step 4 also comprises the following steps:
Flow 4.1: the feature map fused by the feature pyramid network is predicted to generate a keypoint heatmap Ŷ ∈ [0,1]^{(W/R)×(H/R)×C}, where W and H are the image width and height, R the downsampling factor and C the number of keypoint classes; Ŷ_xyc = 1 corresponds to the center point of a detected target, and each ground-truth center point p is mapped to the corresponding keypoint p̃ = ⌊p/R⌋. The downsampled keypoints are mapped onto the heatmap with a Gaussian kernel to obtain the point weight of each feature-map center point; if two Gaussians of the same keypoint or the same category c overlap, the element-wise maximum is taken. The objective function selected for training is the pixel-level logistic-regression focal loss L_k, expressed as formula (6):

L_k = -(1/N) Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                 if Y_xyc = 1
                     (1 - Y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),  otherwise }   (6)

where N denotes the number of image keypoints, α and β the hyper-parameters of the focal loss, x and y the heatmap coordinates and c the class channel, Y_xyc the value of the Gaussian-smoothed ground truth, and Ŷ_xyc the predicted heatmap value;
Flow 4.2: the offset value output by the feature extraction network is set as Ô ∈ R^{(W/R)×(H/R)×2}, where R denotes the tensor space, H the image height, W the image width and 2 the number of channels; the offset value output by the network is trained with an L1 loss, expressed as formula (7):

L_offset = (1/N) Σ_p | Ô_p̃ - (p/R - p̃) |   (7)

where L_offset denotes the target offset loss, N the number of image keypoints, Ô_p̃ the offset value output by the network at keypoint p̃, p the center point of the target frame, R the downsampling multiple, p̃ = ⌊p/R⌋ the center point of the target frame after downsampling, and (p/R - p̃) the deviation value;
Flow 4.3: when the specific coordinates located for the k-th target deviate from the predicted coordinates, the length and width of the target are trained with an L1 loss, expressed as formula (8):

L_size = (1/N) Σ_{k=1}^{N} | Ŝ_{pk} - s_k |   (8)

where L_size denotes the target size loss, N the number of image keypoints, Ŝ ∈ R^{(W/R)×(H/R)×2} the result output by the convolutional network (R denoting the tensor space, H the image height, W the image width and 2 the number of channels), and s_k the length and width of the k-th target frame;
Flow 4.4: according to the preset weights, the overall loss function is expressed as formula (9):

L_det = L_k + λ_size L_size + λ_offset L_offset   (9)

where L_k denotes the focal loss of the pixel-level logistic regression, L_size the target size loss, L_offset the target offset loss, λ_size the weight of L_size and λ_offset the weight of L_offset.
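As a non-authoritative illustration of formulas (6)-(9), the three loss terms could be computed as in the PyTorch sketch below; the hyper-parameters α = 2, β = 4 and the weights λ_size = 0.1, λ_offset = 1 are the usual CenterNet defaults and are assumptions here, not values stated by this patent:

import torch

def focal_loss(pred, gt, alpha=2, beta=4):
    # Pixel-level logistic-regression focal loss of formula (6); pred and gt have shape (N, C, H, W), pred in (0, 1).
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    pos_loss = torch.log(pred.clamp(1e-6)) * (1 - pred) ** alpha * pos
    neg_loss = torch.log((1 - pred).clamp(1e-6)) * pred ** alpha * (1 - gt) ** beta * neg
    num_pos = pos.sum().clamp(min=1)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos

def masked_l1_loss(pred, target, mask):
    # L1 loss evaluated only where mask == 1 (the keypoint locations); used for both the offset loss (7) and the size loss (8).
    num = mask.sum().clamp(min=1)
    return (torch.abs(pred - target) * mask).sum() / num

def total_loss(heat_pred, heat_gt, off_pred, off_gt, size_pred, size_gt, mask,
               lambda_size=0.1, lambda_offset=1.0):
    # Overall loss of formula (9): L_det = L_k + lambda_size * L_size + lambda_offset * L_offset.
    l_k = focal_loss(heat_pred, heat_gt)
    l_size = masked_l1_loss(size_pred, size_gt, mask)
    l_offset = masked_l1_loss(off_pred, off_gt, mask)
    return l_k + lambda_size * l_size + lambda_offset * l_offset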
The invention is further improved in that the specific step of step 5 is to normalize the heatmap with a Sigmoid function, obtain the keypoint positions of the heatmap with 3×3 max pooling, and judge from the score whether each point is a ship target; the four corner coordinates of the target frame are then obtained from the corresponding length, width and center point coordinates to give the result.
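Step 5 (peak extraction and box decoding) might look like the following sketch; the downsampling factor of 4, the score threshold of 0.3 and the top-k limit are assumptions for illustration:

import torch
import torch.nn.functional as F

def decode(heatmap, size, offset, down_ratio=4, score_thr=0.3, topk=100):
    # Sigmoid-normalize the heatmap, keep local maxima with 3x3 max pooling, then decode boxes.
    # heatmap: (N, C, H, W); size and offset: (N, 2, H, W).
    heat = torch.sigmoid(heatmap)
    peaks = F.max_pool2d(heat, 3, stride=1, padding=1)
    heat = heat * (peaks == heat).float()                      # keypoints = local maxima of the heatmap
    n, c, h, w = heat.shape
    scores, idx = heat.view(n, -1).topk(topk)
    xs = idx % w
    ys = torch.div(idx, w, rounding_mode="floor") % h
    boxes = []
    for i in range(topk):
        if scores[0, i] < score_thr:                           # judge ship / background from the score
            break
        x, y = xs[0, i], ys[0, i]
        dx, dy = offset[0, :, y, x]                            # center-point offset, formula (7)
        bw, bh = size[0, :, y, x]                              # predicted width and height
        cx, cy = x + dx, y + dy
        boxes.append([float((cx - bw / 2) * down_ratio),       # four corner coordinates of the target box
                      float((cy - bh / 2) * down_ratio),
                      float((cx + bw / 2) * down_ratio),
                      float((cy + bh / 2) * down_ratio),
                      float(scores[0, i])])
    return boxes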
The invention has the beneficial effects that: data enhancement is used to expand the data set, which increases the diversity of the training samples, improves the robustness and generalization capability of the model, and reduces the model's dependence on particular attributes; the feature extraction network adopts the RepVGG network, whose architecture is simple yet powerful, and structural re-parameterization decouples the training-time structure from the inference-time structure, combining accuracy with speed and improving optimization capability; in the pyramid-network fusion of the feature maps extracted by the network, the efficient channel attention module avoids dimensionality reduction and provides appropriate cross-channel interaction, achieving high performance and efficient learning while markedly reducing complexity; finally, CARAFE upsampling generates different upsampling kernels at different positions, fully enriching the feature map information of the network, avoiding information loss when the resolution is increased, and reducing false detections and missed detections.
Drawings
Fig. 1 is a schematic flow chart of an embodiment of the present invention.
Fig. 2 is an overall structure diagram of an improved network according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a RepVGG network module according to an embodiment of the invention.
Fig. 4 is a schematic diagram of an ECA efficient channel attention mechanism according to an embodiment of the present invention.
Detailed Description
The present invention will be further described in detail with reference to the drawings and examples, which are only for the purpose of illustrating the invention and are not to be construed as limiting the scope of the invention.
Examples: as shown in fig. 1, a SAR image ship target detection method based on an improved CenterNet network comprises the following steps:
step 1, acquiring original images, carrying out data preprocessing and data enhancement on them to expand the data set, resizing the images of the expanded data set to the same resolution, and dividing them in a 7:2:1 ratio into the training set, test set and validation set of the experiment.
The specific implementation method is as follows: the original SAR ship image is randomly cropped to between 0.8 and 1 times its original size with a 4:3 aspect ratio; random sharpening enhancement uses the USM sharpening algorithm; the brightness and contrast operation sets the picture brightness to 1.2 and the contrast to 100; the gamma value of the gamma correction algorithm is set to 0.7.
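A possible way to obtain the 7:2:1 split of step 1 is sketched below; the in-memory list of image paths and the fixed random seed are assumptions:

import random

def split_dataset(image_paths, ratios=(0.7, 0.2, 0.1), seed=0):
    # Shuffle the expanded data set and split it 7:2:1 into training, test and validation sets.
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(len(paths) * ratios[0])
    n_test = int(len(paths) * ratios[1])
    train = paths[:n_train]
    test = paths[n_train:n_train + n_test]
    val = paths[n_train + n_test:]
    return train, test, val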
Step 2: the image features are extracted using a RepVGG network enhanced by the attention mechanism, as shown in FIG. 3.
The specific implementation method is as follows:
(1) The RepVGG network adopted as the feature extraction backbone merges the 1×1 convolution branch and the identity mapping branch into the 3×3 convolution stack by structural re-parameterization, thereby reducing the number of model parameters. During training a BN layer, namely a normalization layer, is added after each convolution. Let W_3 denote the 3×3 convolution kernel with C_1 input channels and C_2 output channels, and μ_3, θ_3, α_3, β_3 the mean, variance, learned scale factor and bias of the BN layer that follows it; let W_1 denote the 1×1 convolution kernel, with μ_1, θ_1, α_1, β_1 the corresponding BN parameters; for the identity-mapping branch, which contains only a BN layer, the parameters are μ_0, θ_0, α_0, β_0. With O_1 denoting the input and O_2 the output, and assuming C_1 = C_2, H_1 = H_2, W_1 = W_2, the operation of the original convolution block during training is expressed as formula (1):

O_2 = BN((O_1 * W_3), μ_3, θ_3, α_3, β_3) + BN((O_1 * W_1), μ_1, θ_1, α_1, β_1) + BN(O_1, μ_0, θ_0, α_0, β_0)   (1)

During re-parameterization, to simplify the computation of the convolution kernel, the 1×1 convolution kernel is converted to the same 3×3 shape by zero-padding its edges; the identity mapping can be regarded as a linear operation whose kernel is the identity matrix, and by the same principle it is zero-padded to a 3×3 kernel; the branch paths are finally unified, by convolution and superposition, into a single 3×3 convolution. The operation after re-parameterization is expressed as formula (2):

O_2 = O_1 * W_i + b   (2)

where W_i is the fused 3×3 convolution kernel and b the fused bias. As shown in the right column of fig. 2, the small block in the middle of the convolution kernel overlaps the most and its color is darker;
(2) The ReLU activation function is replaced with the Mish function, which is smoother and lets information propagate deeper into the neural network, giving better generalization and accuracy. The original CenterNet network uses the ReLU (Rectified Linear Unit) activation function, g(x) = max(0, x), to introduce nonlinear factors and improve the expressive capacity of the model; it makes gradient descent and back-propagation more efficient than other activation functions and avoids gradient explosion and vanishing gradients, but neurons readily die during training, leaving a gradient of 0. The Mish function, y = x·tanh(ln(1 + exp(x))), is a self-regularized, non-monotonic activation function that is smoother than the original function: it does not saturate for arbitrarily large positive values, slightly allows negative values, and has no hard zero boundary like the original function, so better information penetrates the neural network and information features are learned faster;
(3) As shown in fig. 4, in the efficient channel attention mechanism module, a global average pooling layer aggregates each channel of the feature map extracted by the preceding convolution module, resizing it to complete the dimension reduction; a one-dimensional convolution applied to the reduced feature map keeps the depth of the feature mapping low, the output feature map is extracted, and the information interaction between each channel and its K neighbors improves the model performance of the backbone network;

(4) With the aggregated feature obtained without dimensionality reduction denoted x ∈ R^{C'}, where C' is the channel dimension, the channel attention is obtained from formula (3):

ω = σ(C1D_k(x))   (3)

where C1D_k denotes a fast one-dimensional convolution with a kernel of size k, σ denotes the Sigmoid function, ω denotes the channel weights, and k determines the number of parameters of the module;

(5) The value of k is determined adaptively, in proportion to the channel dimension C', by formula (4):

k = | log2(C')/γ + b/γ |_odd   (4)

where |·|_odd denotes the nearest odd number, C' is the channel dimension, and γ and b are set to 2 and 1, respectively, in this experiment.
In this embodiment, as shown in fig. 2, the efficient channel attention mechanism (ECA) is embedded in the path along which the feature maps of the RepVGG network are passed to the feature pyramid network for multi-scale fusion, so that the feature maps obtain a clear performance gain while the effect of dimensionality reduction on learning channel attention is avoided. The pixel information of originally small targets is thus not lost as the length and width of the feature maps shrink. The structure of the feature pyramid network is therefore used to optimize CenterNet and improve accuracy.
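As a rough sketch of where ECA and CARAFE sit in the fusion path, one top-down step of the feature pyramid could be written as below, reusing the ECA and CARAFE classes sketched earlier; the channel counts and the exact wiring are assumptions for illustration and not the network defined by the invention:

import torch.nn as nn

class FPNFusion(nn.Module):
    # One top-down fusion step: ECA on the lateral feature, CARAFE upsampling of the deeper feature, then addition.
    def __init__(self, lateral_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.Conv2d(lateral_channels, out_channels, 1)   # 1x1 lateral projection
        self.eca = ECA(out_channels)                                  # channel attention on the lateral path
        self.upsample = CARAFE(out_channels)                          # content-aware 2x upsampling
        self.smooth = nn.Conv2d(out_channels, out_channels, 3, padding=1)

    def forward(self, deep_feature, shallow_feature):
        # deep_feature is assumed to already have out_channels channels at half the lateral resolution.
        lateral = self.eca(self.lateral(shallow_feature))
        fused = lateral + self.upsample(deep_feature)                 # multi-scale feature fusion
        return self.smooth(fused)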
Step 3: and carrying out multi-scale feature fusion on the feature map of the extracted image.
The specific implementation method is as follows: the lightweight general upsampling operator CARAFE is mainly divided into kernel prediction and feature recombination:

(1) The channel-attention-enhanced feature map is channel-compressed from C×W×H to C_m×W×H to reduce the subsequent computation, where C_m is the number of channels after compression;

(2) A content encoder with a convolution kernel of size k_en×k_en encodes the compressed feature map, yielding a feature map of size σ²k_up²×W×H (k_up being the size of the predicted upsampling kernel and σ the upsampling ratio), which is unfolded in the channel dimension into a map of shape k_up²×σW×σH;

(3) A softmax function normalizes each predicted upsampling kernel so that its weights sum to 1, and the input feature map is convolved with the predicted upsampling kernels to obtain the final upsampling result; the parameter count of the CARAFE upsampling process is given by formula (5):

C×C_m + C_m×k_en²×σ²k_up²   (5)
step 4: and predicting the fused characteristic diagram to obtain thermodynamic diagrams, the width and the height of the target and the coordinates of the center point.
The specific implementation method is as follows:
(1) The feature map fused by the feature pyramid network is predicted to generate a keypoint heatmap Ŷ ∈ [0,1]^{(W/R)×(H/R)×C}, where W and H are the image width and height, R the downsampling factor and C the number of keypoint classes; Ŷ_xyc = 1 corresponds to the center point of a detected target, and each ground-truth center point p is mapped to the corresponding keypoint p̃ = ⌊p/R⌋. The downsampled keypoints are mapped onto the heatmap with a Gaussian kernel to obtain the point weight of each feature-map center point; if two Gaussians of the same keypoint or the same category c overlap, the element-wise maximum is taken. The objective function selected for training is the pixel-level logistic-regression focal loss L_k, expressed as formula (6):

L_k = -(1/N) Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                 if Y_xyc = 1
                     (1 - Y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),  otherwise }   (6)

where N denotes the number of image keypoints; α and β denote the hyper-parameters of the focal loss; x and y denote the heatmap coordinates and c the class channel; Y_xyc denotes the value of the Gaussian-smoothed ground truth, and Ŷ_xyc the predicted heatmap value;
(2) The offset value output by the feature extraction network is set as Ô ∈ R^{(W/R)×(H/R)×2}, where R denotes the tensor space, H the image height, W the image width and 2 the number of channels; the offset value output by the network is trained with an L1 loss, expressed as formula (7):

L_offset = (1/N) Σ_p | Ô_p̃ - (p/R - p̃) |   (7)

where L_offset denotes the target offset loss; N the number of image keypoints; Ô_p̃ the offset value output by the network at keypoint p̃; p the center point of the target frame; R the downsampling multiple; p̃ = ⌊p/R⌋ the center point of the target frame after downsampling; and (p/R - p̃) the deviation value;
(3) When the specific coordinates located for the k-th target deviate from the predicted coordinates, the length and width of the target are trained with an L1 loss, expressed as formula (8):

L_size = (1/N) Σ_{k=1}^{N} | Ŝ_{pk} - s_k |   (8)

where L_size denotes the target size loss, N the number of image keypoints, Ŝ ∈ R^{(W/R)×(H/R)×2} the result output by the convolutional network (R denoting the tensor space, H the image height, W the image width and 2 the number of channels), and s_k the length and width of the k-th target frame;
(4) According to the preset weights, the overall loss function is expressed as formula (9):

L_det = L_k + λ_size L_size + λ_offset L_offset   (9)

where L_k denotes the focal loss of the pixel-level logistic regression, L_size the target size loss, L_offset the target offset loss, λ_size the weight of L_size and λ_offset the weight of L_offset.
Step 5: The detection boxes are extracted from the heatmap to obtain the detection result. The specific implementation method is as follows:
the heatmap is normalized with a Sigmoid function, the keypoint positions of the heatmap are obtained with 3×3 max pooling, and whether each point is a ship target is judged from its score; the four corner coordinates of the target frame are then obtained from the corresponding length, width and center point coordinates to give the detection result.
The effects of the present invention are further described in conjunction with experiments as follows:
the experiment was performed using an SSDD Ship Dataset and a SAR-clip-Dataset Ship Dataset. Both data sets were used to verify the generalization and accuracy of the improved algorithms herein. Where the SSDD dataset has 1160 images covering 2456 ship instances of different sizes. Each SAR image comprises 2 ships in average, and is distributed sparsely. The SAR-clip-Dataset Dataset has 43819 Ship segmentation maps. The information of satellite sensor sources, spatial resolution, polarization mode, image scene and the like in the two data sets is shown in table 1.
Table 1 Basic information of the ship data sets [table omitted]
Table 2 Detection accuracy on the SSDD dataset [table omitted]
Table 3 Detection accuracy on the SAR-Ship-Dataset [table omitted]
As can be seen from tables 2 and 3, validating the algorithm on the two data sets shows that the improved algorithm has excellent generalization capability and clearly improves the detection accuracy on SAR ship images, performing well in ship target detection.
The foregoing is merely exemplary embodiments of the present invention, and specific structures and features that are well known in the art are not described in detail herein. It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (8)

1. The SAR image ship detection method based on the improved CenterNet network is characterized by comprising the following steps of:
step 1, acquiring original images, carrying out data preprocessing and data enhancement on them to expand the data set, resizing the images of the expanded data set to the same resolution, and dividing them in a 7:2:1 ratio into the training set, test set and validation set of the experiment;
step 2, extracting image features by adopting a RepVGG network enhanced by an attention mechanism;
step 3, carrying out multi-scale feature fusion on the feature images of the extracted images;
step 4, predicting the fused feature map to obtain the heatmap, the width and height of the target and the center point coordinates;
and step 5, extracting the detection boxes from the heatmap to obtain a detection result.
2. The SAR image ship detection method based on the improved CenterNet network according to claim 1, wherein the specific data enhancement operation in step 1 comprises: randomly cropping the picture to between 0.8 and 1 times the original picture size with a cropping aspect ratio of 4:3; random sharpening enhancement using the USM sharpening algorithm; a brightness and contrast operation that sets the picture brightness to 1.2 and the contrast to 100; and a gamma correction algorithm with the gamma value set to 0.7.
3. The SAR image ship detection method based on the improved CenterNet network according to claim 2, wherein the specific flow of step 2 is as follows:
flow 2.1, the RepVGG network adopted as the feature extraction backbone merges the 1×1 convolution branch and the identity mapping branch into the 3×3 convolution stack by structural re-parameterization, thereby reducing the number of model parameters;
flow 2.2, the ReLU activation function is replaced with the Mish function, which is smoother and lets information propagate deeper into the neural network, giving better generalization and accuracy;
flow 2.3, a global average pooling layer aggregates each channel of the feature map produced by the preceding convolution module, resizing it to complete the dimension reduction; a one-dimensional convolution applied to the reduced feature map keeps the depth of the feature mapping low, the output feature map is extracted, and the information interaction between each channel and its K neighbors improves the model performance of the backbone network;
flow 2.4, with the aggregated feature obtained without dimensionality reduction denoted x ∈ R^{C'}, where C' is the channel dimension, the channel attention is given by:
ω = σ(C1D_k(x))
where C1D_k denotes a fast one-dimensional convolution with a kernel of size k, σ denotes the Sigmoid function, ω denotes the channel weights, and k determines the number of parameters of the module;
flow 2.5, the value of k is determined adaptively, in proportion to the channel dimension C', by the following formula:
k = | log2(C')/γ + b/γ |_odd
where |·|_odd denotes the nearest odd number, C' is the channel dimension, and γ and b are set to 2 and 1, respectively, in this experiment.
4. The SAR image ship detection method based on the improved CenterNet network according to claim 3, wherein in flow 2.1 of step 2, a BN layer, namely a normalization layer, is added after each convolution during training; W_3 denotes the 3×3 convolution kernel with C_1 input channels and C_2 output channels, and μ_3, θ_3, α_3, β_3 denote the mean, variance, learned scale factor and bias of the BN layer that follows it; W_1 denotes the 1×1 convolution kernel, with μ_1, θ_1, α_1, β_1 the corresponding BN parameters; for the identity-mapping branch, which contains only a BN layer, the parameters are μ_0, θ_0, α_0, β_0; O_1 denotes the input and O_2 the output; assuming C_1 = C_2, H_1 = H_2, W_1 = W_2, the operation of the original convolution block during training is expressed in numerical form by the following formula:
O_2 = BN((O_1 * W_3), μ_3, θ_3, α_3, β_3) + BN((O_1 * W_1), μ_1, θ_1, α_1, β_1) + BN(O_1, μ_0, θ_0, α_0, β_0)
in the re-parameterization process, to simplify the computation of the convolution kernel, the 1×1 convolution kernel is converted to the same 3×3 shape by zero-padding its edges; the identity mapping can be regarded as a linear operation whose kernel is the identity matrix, and by the same principle it is zero-padded to a 3×3 kernel; the branch paths are finally unified, by convolution and superposition, into a single 3×3 convolution, and the operation after re-parameterization is expressed by the following formula:
O_2 = O_1 * W_i + b
where W_i denotes the fused 3×3 convolution kernel and b the fused bias;
the specific content of flow 2.2 is as follows:
the original CenterNet network uses the ReLU (Rectified Linear Unit) activation function, g(x) = max(0, x), to introduce nonlinear factors and improve the expressive capacity of the model; it makes gradient descent and back-propagation more efficient than other activation functions and avoids gradient explosion and vanishing gradients, but neurons readily die during training, leaving a gradient of 0; the Mish function, y = x·tanh(ln(1 + exp(x))), is a self-regularized, non-monotonic activation function that is smoother than the original function: it does not saturate for arbitrarily large positive values, slightly allows negative values, and has no hard zero boundary like the original function, so better information penetrates the neural network and information features are learned faster.
5. The SAR image ship detection method based on the improved CenterNet network according to claim 4, wherein the specific flow of step 3 is as follows:
flow 3.1, the channel-attention-enhanced feature map is channel-compressed from C×W×H to C_m×W×H to reduce the subsequent computation, where C_m is the number of channels after compression;
flow 3.2, a content encoder with a convolution kernel of size k_en×k_en encodes the compressed feature map, yielding a feature map of size σ²k_up²×W×H, which is unfolded in the channel dimension into a map of shape k_up²×σW×σH, k_up being the size of the predicted upsampling kernel and σ the upsampling ratio;
and flow 3.3, a softmax function normalizes each predicted upsampling kernel so that its weights sum to 1, and the input feature map is convolved with the predicted upsampling kernels to obtain the final upsampling result; the parameter count of the CARAFE upsampling process is expressed as:
C×C_m + C_m×k_en²×σ²k_up²
6. the SAR image ship detection method based on the improved central net network according to claim 5, wherein the specific flow of step 4 comprises:
4.1, predicting the feature map fused by the feature pyramid network to generate a thermodynamic diagram of the key points;
4.2, setting a bias value output by the feature extraction network, and training the bias value output by the network by adopting L1;
4.3, when the kth target locates the deviation between the specific coordinates and the predicted coordinates, training the length and the width of the target by adopting L1 loss;
and 4.4, obtaining an integral loss function according to the preset weight.
7. The SAR image ship detection method based on the improved CenterNet network according to claim 6, wherein the specific content of flow 4.1 in step 4 is as follows:
the feature map fused by the feature pyramid network is predicted to generate a keypoint heatmap Ŷ ∈ [0,1]^{(W/R)×(H/R)×C}, where W and H are the image width and height, R the downsampling factor and C the number of keypoint classes; Ŷ_xyc = 1 corresponds to the center point of a detected target, and each ground-truth center point p is mapped to the corresponding keypoint p̃ = ⌊p/R⌋; the downsampled keypoints are mapped onto the heatmap with a Gaussian kernel to obtain the point weight of each feature-map center point; if two Gaussians of the same keypoint or the same category c overlap, the element-wise maximum is taken; the objective function selected for training is the pixel-level logistic-regression focal loss L_k, expressed by the following formula:
L_k = -(1/N) Σ_xyc { (1 - Ŷ_xyc)^α · log(Ŷ_xyc),                 if Y_xyc = 1
                     (1 - Y_xyc)^β · (Ŷ_xyc)^α · log(1 - Ŷ_xyc),  otherwise }
where N denotes the number of image keypoints, α and β the hyper-parameters of the focal loss, x and y the heatmap coordinates, c the class channel, Y_xyc the value of the Gaussian function, and Ŷ_xyc the predicted heatmap value;
the specific content of flow 4.2 is as follows:
the offset value output by the feature extraction network is set as Ô ∈ R^{(W/R)×(H/R)×2}, where R denotes the tensor space, H the image height, W the image width and 2 the number of channels; the offset value output by the network is trained with an L1 loss, expressed by the following formula:
L_offset = (1/N) Σ_p | Ô_p̃ - (p/R - p̃) |
where L_offset denotes the target offset loss, N the number of image keypoints, Ô_p̃ the offset value output by the network, p the center point of the target frame, R the downsampling multiple, p̃ = ⌊p/R⌋ the center point of the target frame after downsampling, and (p/R - p̃) the deviation value;
the specific content of flow 4.3 is as follows:
when the specific coordinates located for the k-th target deviate from the predicted coordinates, the length and width of the target are trained with an L1 loss, expressed by the following formula:
L_size = (1/N) Σ_{k=1}^{N} | Ŝ_{pk} - s_k |
where L_size denotes the target size loss, N the number of image keypoints, Ŝ ∈ R^{(W/R)×(H/R)×2} the result output by the convolutional network, R denoting the tensor space, H the image height and W the image width, and s_k the length and width of the k-th target frame;
the specific content of flow 4.4 is as follows:
according to the preset weights, the overall loss function is obtained by the following formula:
L_det = L_k + λ_size L_size + λ_offset L_offset
where L_k denotes the focal loss of the pixel-level logistic regression, L_size the target size loss, L_offset the target offset loss, λ_size the weight of L_size and λ_offset the weight of L_offset.
8. The SAR image ship detection method based on the improved CenterNet network according to claim 7, wherein the specific steps of step 5 are as follows: the heatmap is normalized with a Sigmoid function, the keypoint positions of the heatmap are obtained with 3×3 max pooling, whether each point is a ship target is judged from its score, and the four corner coordinates of the target frame are obtained from the corresponding length, width and center point coordinates to give the result.
CN202310011081.4A 2023-01-05 2023-01-05 SAR image ship detection method based on improved CenterNet network Pending CN116071664A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310011081.4A CN116071664A (en) 2023-01-05 2023-01-05 SAR image ship detection method based on improved CenterNet network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310011081.4A CN116071664A (en) 2023-01-05 2023-01-05 SAR image ship detection method based on improved CenterNet network

Publications (1)

Publication Number Publication Date
CN116071664A true CN116071664A (en) 2023-05-05

Family

ID=86174345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310011081.4A Pending CN116071664A (en) 2023-01-05 2023-01-05 SAR image ship detection method based on improved CenterNet network

Country Status (1)

Country Link
CN (1) CN116071664A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778176A (en) * 2023-06-30 2023-09-19 哈尔滨工程大学 SAR image ship trail detection method based on frequency domain attention
CN116778176B (en) * 2023-06-30 2024-02-09 哈尔滨工程大学 SAR image ship trail detection method based on frequency domain attention

Similar Documents

Publication Publication Date Title
CN114202696B (en) SAR target detection method and device based on context vision and storage medium
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN109636742B (en) Mode conversion method of SAR image and visible light image based on countermeasure generation network
CN112395987B (en) SAR image target detection method based on unsupervised domain adaptive CNN
Wang et al. Remote sensing landslide recognition based on convolutional neural network
CN111368769B (en) Ship multi-target detection method based on improved anchor point frame generation model
Huang et al. An intelligent ship image/video detection and classification method with improved regressive deep convolutional neural network
CN110555841B (en) SAR image change detection method based on self-attention image fusion and DEC
CN112347895A (en) Ship remote sensing target detection method based on boundary optimization neural network
CN110765912B (en) SAR image ship target detection method based on statistical constraint and Mask R-CNN
Chen et al. Geospatial transformer is what you need for aircraft detection in SAR Imagery
Shaoqing et al. The comparative study of three methods of remote sensing image change detection
CN117237740B (en) SAR image classification method based on CNN and Transformer
Chen et al. Change detection algorithm for multi-temporal remote sensing images based on adaptive parameter estimation
CN112348758A (en) Optical remote sensing image data enhancement method and target identification method
Sun et al. Image recognition technology in texture identification of marine sediment sonar image
Chai et al. Marine ship detection method for SAR image based on improved faster RCNN
CN116071664A (en) SAR image ship detection method based on improved CenterNet network
CN115965862A (en) SAR ship target detection method based on mask network fusion image characteristics
CN110069987B (en) Single-stage ship detection algorithm and device based on improved VGG network
Chen et al. Shape similarity intersection-over-union loss hybrid model for detection of synthetic aperture radar small ship objects in complex scenes
Wang et al. Automatic SAR Ship Detection Based on Multi-Feature Fusion Network in Spatial and Frequency Domain
CN113486819A (en) Ship target detection method based on YOLOv4 algorithm
Gui et al. A scale transfer convolution network for small ship detection in SAR images
CN116824413A (en) Aerial image target detection method based on multi-scale cavity convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination