CN114170512A - Remote sensing SAR target detection method based on combination of network pruning and parameter quantization

Remote sensing SAR target detection method based on combination of network pruning and parameter quantization

Info

Publication number
CN114170512A
Application number
CN202111488427.7A
Authority
CN (China)
Prior art keywords
network, training, feature extraction, parameter, pruning
Priority date
2021-12-08
Filing date
2021-12-08
Legal status
Pending
Other languages
Chinese (zh)
Inventors
雷杰, 王嘉轩, 杨埂, 谢卫莹, 李云松
Current Assignee
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University

Classifications

    • G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 — Neural networks; Architecture; Combinations of networks
    • G06N3/082 — Neural networks; Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections


Abstract

The invention discloses a remote sensing SAR target detection method based on the combination of network pruning and parameter quantization, which mainly addresses the high model complexity and low inference speed of existing remote sensing SAR target detection methods. The implementation scheme is as follows: obtain the pre-divided training set and test set from a public remote sensing SAR target detection data set and perform data expansion on the training set; apply data enhancement to the expanded training set; adjust an existing lightweight network and construct a reference model; train the reference model and compute its performance indexes; evaluate the importance of each feature extraction module in the adjusted network, cut off the unimportant modules, and then perform filter pruning; set a quantization search space and search for a mixed-precision quantization scheme for the pruned model to obtain the final model, which is used for SAR target detection. The method improves detection accuracy, saves training cost, and can be used in target recognition scenarios with limited computing resources and high real-time requirements.

Description

Remote sensing SAR target detection method based on combination of network pruning and parameter quantization
Technical Field
The invention belongs to the technical field of computer vision, and in particular relates to a remote sensing SAR target detection method that can be used in SAR image recognition scenarios with limited computing resources and high real-time requirements.
Background
Synthetic Aperture Radar (SAR) is currently one of the most important means of earth observation and is widely applied in civil fields such as marine rescue and maritime law enforcement, as well as military fields such as real-time maritime monitoring and detection. SAR image data are usually single-channel gray-scale images and are characterized by many small targets, inconspicuous target features, unbalanced target distribution, and high similarity between targets and background. In recent years, convolutional neural networks (CNN) have been widely used for SAR image target detection. Compared with traditional detection algorithms, which require tedious feature design and modeling, CNN-based methods show better performance thanks to their ability to learn parameters autonomously and extract features automatically.
Currently, CNN-based target detection algorithms fall mainly into two categories. The first is the two-stage target detection algorithms represented by the RCNN (Region-CNN) series, which first generate candidate regions likely to contain targets and then perform detection on them. The second is the single-stage target detection algorithms represented by SSD (Single Shot MultiBox Detector) and YOLOv3 (You Only Look Once). These algorithms do not generate candidate regions but perform target detection directly by regression; they have a small computational load and high inference speed and perform better in resource-limited environments, but their detection accuracy is generally inferior to that of two-stage methods.
The convolutional neural network CNN is the core of a target detection algorithm and is used to extract image features. In pursuit of higher accuracy, researchers often design complex networks with a large number of redundant parameters. Such network structures consume large amounts of computing and storage resources in both the training and inference phases, so in practical applications a trade-off between accuracy and resource consumption is required, and the main way to achieve it is to compress the designed CNN. Although compression causes some loss of accuracy, it can greatly reduce the redundant parameters of the model, increase inference speed, lower model complexity and resource consumption, and thus realize efficient SAR target detection.
The most common method of model compression is model pruning, which can be divided, from fine to coarse granularity, into weight pruning, channel/filter pruning and layer pruning. Specifically:
Weight pruning, also known as unstructured pruning, sparsifies the original weight matrix and reduces the computational load during inference by removing connections between neurons. This method reduces computation without modifying the network structure, but an actual speedup requires special hardware that implements sparse convolution.
Channel/filter pruning and layer pruning, also known as structured pruning, both alter the original network structure and therefore tend to speed up inference, but usually at the cost of some accuracy. Zhou et al. applied channel pruning based on Lasso regression and least squares to the two-stage Faster R-CNN model in "SAR image ship detection optimization algorithm based on channel pruning", Aerospace Shanghai (Chinese & English), Vol. 37, No. 4, 2020, cutting 56% of the model parameters; however, because the original network structure is complex, the inference speed is still not ideal.
Traditional pruning schemes mostly follow this route: 1) train a large, over-parameterized network to obtain a reference model; 2) prune the reference model according to some strategy; 3) fine-tune the pruned model to obtain the compressed CNN model. Lin et al., in "Mingbao Lin, Rongrong Ji, Yan Wang, Yichen Zhang, Baochang Zhang, Yonghong Tian, Ling Shao. HRank: Filter Pruning using High-Rank Feature Map. CVPR 2020", found that the average rank of the feature maps generated by a single filter is always the same, and therefore proposed a filter pruning method based on feature-map rank. Although this method does not require fine-tuning training, the compression ratio of the model cannot reach a high level because only filter pruning is performed.
Most existing pruning schemes borrow ideas from NAS (Neural Architecture Search) and knowledge distillation and find a better structured pruning scheme through complex iterative training. In the patent document "SAR ship target detection method based on network pruning and knowledge distillation" (application number CN202011308276.8, publication number CN112308019A) filed by the National University of Defense Technology of the Chinese People's Liberation Army, a single-stage YOLOv3 detector is used as the reference detection framework, and after channel pruning of the network, knowledge distillation guided by the interrelation between feature maps is used to recover performance. However, NAS-based methods require substantial GPU support and are generally suitable only for research use.
Another common model compression method is parameter quantization. Han et al. first used shared weights to quantize network parameters in Deep Compression, compressing the model by a factor of 2-3. Parameter quantization replaces the default 32-bit floating-point computation with low-bit fixed-point computation and is mostly used to save memory when deploying models on hardware. Although quantization can effectively reduce model size, parameter count and computational intensity, it often causes a large loss of accuracy; the challenge is even greater when fewer than 4 bits, or even only 1 bit, are used.
Traditional quantization methods generally store model parameters with a uniform fixed bit width, but because layers at different depths of a network have different feature extraction abilities, fixed-bit quantization cannot adapt to the characteristics of the network structure, so important information may be lost and an optimal compression ratio is hard to reach. Mixed-precision quantization, which uses different bit widths for parameters at different depths of the network, is now a commonly used alternative to fixed-bit quantization. Because network structures are complex and variable, finding a good mixed-precision quantization strategy usually relies on complex methods such as grid search or reinforcement learning, which consume large amounts of computing resources and time.
In summary, existing pruning and quantization methods have three shortcomings: 1) each compresses only one aspect of the model and does not combine the advantages of multiple compression methods; 2) the implementation is overly complex and requires large amounts of computing resources; 3) they are not designed specifically for the target detection task on remote sensing SAR images. These deficiencies make it difficult for network models to reach high compression ratios and prevent high-speed SAR target detection in resource-constrained environments.
Disclosure of Invention
The invention aims to provide, in view of the shortcomings of the prior art, a remote sensing SAR target detection method based on the combination of network pruning and parameter quantization, so as to reduce the consumption of computing and storage resources, increase the compression ratio of the network model, and improve the SAR target detection speed in resource-limited environments.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
(1) Obtain the pre-divided training set D_train and test set D_test from a public remote sensing SAR target detection data set, and perform data expansion on the training-set images using geometric transformations to obtain an expanded training set D_exp;
(2) In the expanded training set D_exp, count the proportions of images containing targets of different sizes and of images consisting largely of background, select the under-represented images as hard samples of the expanded training set, and apply offline data enhancement and online data enhancement to these samples in turn to obtain an enhanced training set D_aug;
(3) Constructing an SAR target detection reference model:
(3a) Adjust an existing lightweight network structure, i.e. change the original classification layer into a detection layer and modify some down-sampling layers in the deep part of the network into non-down-sampling layers, and take the adjusted lightweight network as the feature extraction backbone network N; the backbone network N comprises an input layer, hidden layers and a detection layer, the hidden layers consist of several feature extraction modules with the same structure, each feature extraction module contains an optional down-sampling layer and several non-down-sampling layers, a down-sampling layer is a convolution layer, a pooling layer or a rearrangement layer, and each non-down-sampling layer is a convolution layer, a BN layer, an activation layer or a connection layer;
(3b) Use the K-Means clustering algorithm to cluster the labelled target boxes of the expanded training set D_exp to obtain prior anchor boxes for this data set, use these anchor boxes as the anchor boxes of an existing YOLO-series single-stage detector to obtain an SAR target detector, and connect the detector with the backbone network N to form the reference model for SAR target detection;
(4) updating the network weight parameters of the reference model:
(4a) Take the YOLO loss function as the loss function of the reference model, and initialize the weight parameters of the backbone network N with a random number seed S;
(4b) Input the enhanced training set D_aug into the initialized reference model to start training, optimize the YOLO loss function with the momentum stochastic gradient descent (SGD) algorithm to update the network weight parameters, save the network weight parameters once every 10 epochs until the set maximum number of iterations is reached, and then stop training to obtain several sets of updated network weight parameters;
(5) evaluating the performance index of the reference model:
(5a) Update the reference model with each of the saved sets of network weight parameters, compute the F1 score of each updated reference model on the test set D_test, record the maximum F1 score as F1_0, and denote the corresponding network weight parameters by W_0;
(5b) Compute the parameter count P and the floating-point operations FLOPs of W_0, and record the results as P_0 and FLOPs_0 respectively;
(6) Perform coarse pruning on the backbone network N:
(6a) Set a module mask m for the n feature extraction modules in the backbone network N, and perform one-hot encoding of all the feature extraction modules of the network with the module mask m to obtain n mask subnets;
(6b) Let all n mask subnets share the weight parameters W_0, fine-tune each mask subnet, and compute its performance indexes: F1 score F1_i, parameter count P_i and floating-point operations FLOPs_i;
(6c) According to the performance indexes of each mask subnet, compute the importance index I of each feature extraction module in the backbone network N, and compare each with a set importance threshold I_thr; when I < I_thr, cut off the corresponding feature extraction module from the backbone network N to obtain the coarsely pruned backbone network N_m;
(7) Perform fine pruning on the coarsely pruned backbone network N_m:
(7a) Evaluate the importance of the filters in the convolutional layers of the coarsely pruned backbone network N_m based on the HRank method, divide them into important and unimportant filters, and cut off the unimportant filters of N_m accordingly to obtain the finely pruned network N_P;
(7b) Initialize the finely pruned network N_P with the same random number seed S, then train N_P with the same training method as in (4), and save the network weight parameters with the highest F1 score, recorded as W_P;
(8) Perform parameter quantization on the finely pruned network N_P:
(8a) Modify the feature extraction modules in the finely pruned network N_P into quantization modules;
(8b) Select quantization strategies with different precisions for the weight parameters and the activation outputs, and design a quantization search space according to the importance index I of the feature extraction modules;
(8c) Iteratively search for the best quantization scheme on the enhanced training set D_aug, and apply the best scheme found to the finely pruned network N_P to obtain the final network N_Q;
(8d) Use W_P as the pre-training parameters of the final network N_Q, and obtain the final network weight parameters W_Q by fine-tuning training;
(9) Replace the backbone network N of the reference model with the final network N_Q and update the network weight parameters with the final weight parameters W_Q to obtain the final SAR target detection model;
(10) Input the test set D_test into the final SAR target detection model to obtain accurate SAR target detection results.
Compared with the prior art, the invention has the following advantages:
1. The invention selects a fast single-stage target detection algorithm and further applies a compression scheme of module pruning, filter pruning and mixed-precision quantization; compared with existing compression methods that use a two-stage detection algorithm with channel pruning, and with methods that directly use a single-stage detection algorithm, the compression ratio of the model is greatly improved, so SAR target detection can be realized on more marginal hardware devices;
2. In view of the data characteristics of remote sensing SAR targets, the invention enhances the original training data with several data enhancement methods to improve data richness, and constructs a high-performance reference model by reducing the down-sampling rate of the backbone network, thereby improving the accuracy of SAR target recognition;
3. The invention performs coarse-grained module pruning by analysing module importance and fine-grained filter pruning based on the HRank method; compared with existing model compression methods that use reinforcement learning and knowledge distillation, the training cost is reduced;
4. The invention combines training from random initialization with pre-training plus fine-tuning, using different training methods for different models, which reduces training difficulty and improves the target detection accuracy of the models.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is an illustration of a sample remote sensing SAR image used in the present invention;
FIG. 3 is a sub-flow diagram of the present invention for implementing network pruning;
fig. 4 is a schematic diagram of designing a parameter quantization search space in the present invention.
Detailed Description
Embodiments of the present invention are further described below with reference to the accompanying drawings.
Referring to fig. 1, the invention provides a remote sensing SAR target detection method based on the combination of network pruning and parameter quantization, which comprises five stages: data preprocessing, building and training a reference model, computing performance indexes, network pruning, and parameter quantization.
The concrete implementation is as follows:
step 1: and (4) preprocessing data.
1.1) acquiring a public data set and carrying out data expansion:
Obtain the pre-divided training set D_train and test set D_test from a public remote sensing SAR target detection data set; a single image in the data set can be represented as X_{1×H×W}, where (1×H×W) denotes the single channel, height and width of the image respectively;
since public remote sensing SAR data sets usually contain little data, the acquired data set needs to be expanded: apply translation transformation and rotation transformation to the images of D_train respectively, and add both the translated images G'_{i',j'} and the rotated images G'_{W',Q'} to the training set D_train to obtain the expanded training set D_exp;
The translation transformation formula is as follows:
G'_{i',j'} = G_{i+x, j+y}
where G_{i,j} denotes the original image, (i, j) the coordinates before translation, (i', j') the coordinates after translation, and (x, y) the translation direction;
the rotational transformation formula is as follows:
G'_{W',Q'} = G_{W cos θ + Q sin θ, −W sin θ + Q cos θ}
where (W, Q) denotes the coordinates before rotation, (W', Q') the coordinates after rotation, and θ the rotation angle;
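As an illustration of this expansion step, the following is a minimal Python sketch of the translation and rotation transforms using OpenCV and NumPy; it is not part of the original disclosure, and the function names as well as the fixed shift and angle values are hypothetical choices (in a detection setting the box labels would need the same transforms).

```python
import cv2
import numpy as np

def translate(image, x, y):
    """Shift the image content by (x, y) pixels (cf. G'_{i',j'} = G_{i+x, j+y})."""
    h, w = image.shape[:2]
    m = np.float32([[1, 0, x], [0, 1, y]])
    return cv2.warpAffine(image, m, (w, h))

def rotate(image, theta_deg):
    """Rotate the image by theta degrees about its centre."""
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), theta_deg, 1.0)
    return cv2.warpAffine(image, m, (w, h))

def expand_training_set(d_train):
    """Return D_exp = D_train plus translated and rotated copies."""
    d_exp = list(d_train)
    for img in d_train:
        d_exp.append(translate(img, 10, 10))   # hypothetical shift
        d_exp.append(rotate(img, 15))          # hypothetical angle
    return d_exp
```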
1.2) Perform offline data enhancement and online data enhancement on the expanded training set D_exp:
1.2.1) According to the COCO evaluation metrics, compute the ground-truth box area of every target in the images of the expanded training set D_exp, and divide all targets into small, medium and large targets using 32² and 96² as dividing points; count the number of images containing targets of each size, and manually select images containing a large amount of background noise as hard-example images, as shown in fig. 2, where fig. 2(a) is an image containing small targets, fig. 2(b) an image containing medium targets, fig. 2(c) an image containing large targets, and fig. 2(d) an image containing a large amount of background noise; the white boxes are ground-truth target boxes added manually afterwards;
1.2.2) Offline data enhancement: scale the images containing small targets among the hard samples, and apply background noise reduction to the images that consist largely of background;
1.2.3) Online data enhancement: read the images of the offline-enhanced training set into memory, then apply dynamic random changes of random flipping, random expansion and random erasing to them in turn, so that the same image differs between different training epochs, obtaining the enhanced data set D_aug.
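A minimal sketch of this online stage, assuming NumPy images already in memory; the probabilities, expansion ratio and erase-patch size are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np

def random_flip(img, p=0.5):
    """Horizontally flip the image with probability p."""
    return img[:, ::-1] if np.random.rand() < p else img

def random_expand(img, max_ratio=1.5, fill=0):
    """Paste the image at a random position on a larger canvas."""
    h, w = img.shape[:2]
    ratio = np.random.uniform(1.0, max_ratio)
    nh, nw = int(h * ratio), int(w * ratio)
    canvas = np.full((nh, nw) + img.shape[2:], fill, dtype=img.dtype)
    top = np.random.randint(0, nh - h + 1)
    left = np.random.randint(0, nw - w + 1)
    canvas[top:top + h, left:left + w] = img
    return canvas

def random_erase(img, size=32):
    """Blank out a random square patch (random erasing)."""
    out = img.copy()
    h, w = out.shape[:2]
    top = np.random.randint(0, max(1, h - size))
    left = np.random.randint(0, max(1, w - size))
    out[top:top + size, left:left + size] = 0
    return out

def online_augment(img):
    # Applied anew every epoch, so the same image differs between epochs.
    return random_erase(random_expand(random_flip(img)))
```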
Step 2: Construct and train the reference model.
2.1) Select a lightweight network structure according to the available computing resources, and decide, according to requirements, whether to attach feature pyramid FPN and PAN modules to it;
the existing lightweight network structure comprises an input layer, hidden layers and a detection layer; the hidden layers generally consist of several feature extraction modules with the same structure, each containing an optional down-sampling layer and several non-down-sampling layers, where a down-sampling layer is a convolution layer, a pooling layer or a rearrangement layer, and each non-down-sampling layer is a convolution layer, a BN layer, an activation layer or a connection layer. In this example, because computing resources are limited, the feature pyramid FPN and PAN modules are not added;
2.2) adjusting the selected lightweight network structure:
usually the lightweight network structure contains 5 down-sampling layers with stride 2, which reduce the size of the network output feature map to 1/32 of the original image and cause the loss of small-target information; it therefore needs adjustment, i.e. the classification layer in the original structure is changed into a detection layer, and some down-sampling layers in the deep part of the network are modified into non-down-sampling layers. The modification depends on the structure of the down-sampling layer: if it is a convolution layer, change its stride to 1; if it is a pooling layer, replace it with an empty layer. After modification, the number of network modules that do not change the feature map scale increases, so the number of modules can be reduced appropriately;
the adjusted lightweight network is used as the feature extraction backbone network N; this enhances the network's ability to recognise small targets, and although the computational load increases compared with the original network, the target detection performance is greatly improved;
2.3) Use the K-Means clustering algorithm to cluster the labelled target boxes of the expanded training set D_exp to obtain several groups of prior anchor boxes for this data set, use them as the anchor boxes of an existing YOLO-series single-stage detector to obtain an SAR target detector, and connect the detector with the backbone network N to form the SAR target detection reference model;
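A sketch of prior-anchor generation by clustering the labelled box widths and heights with K-Means; using scikit-learn Euclidean K-Means here is an implementation assumption (YOLO-style pipelines often use a 1 − IoU distance instead), and the number of anchors is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchors(boxes_wh, num_anchors=9, seed=0):
    """boxes_wh: (M, 2) array of ground-truth box (width, height) from D_exp.
    Returns num_anchors prior anchor boxes sorted by area."""
    km = KMeans(n_clusters=num_anchors, random_state=seed, n_init=10).fit(boxes_wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```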
2.4) Derive the YOLOv2 loss function from the YOLO detector and take it as the loss function of the reference model:
the YOLO detector splits the input image into S² grid cells, and each grid cell is assigned A groups of prior anchor boxes obtained by clustering. Suppose the top-left corner coordinates of a grid cell are c_x and c_y and the width and height of a prior anchor box are p_w and p_h; for each prior anchor box the backbone network regresses 5 outputs, t_x, t_y, t_w, t_h and t_o, which correspond respectively to the x, y coordinates, width, height and confidence of the regression box;
the 5 outputs are converted as follows to obtain b_x, b_y, b_w, b_h and P(object), which correspond respectively to the x, y coordinates, width and height of the prediction box and the probability that it contains an object:

b_x = σ(t_x) + c_x

b_y = σ(t_y) + c_y

b_w = p_w · e^{t_w}

b_h = p_h · e^{t_h}

P(object) = σ(t_o) · IOU(box_pred, box_truth)

where σ(·) is the sigmoid function and IOU(box_pred, box_truth) is the overlap ratio between the regression box and the truth box;
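The conversion above can be written directly in code; the sketch below assumes per-anchor raw outputs t_x, t_y, t_w, t_h, t_o as NumPy scalars and is only an illustration of the decoding formulas, not code from the disclosure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_prediction(tx, ty, tw, th, to, cx, cy, pw, ph, iou_with_truth):
    """Decode raw regressor outputs into a predicted box and objectness."""
    bx = sigmoid(tx) + cx            # box centre x, in grid units
    by = sigmoid(ty) + cy            # box centre y
    bw = pw * np.exp(tw)             # box width scaled from the prior anchor
    bh = ph * np.exp(th)             # box height
    p_object = sigmoid(to) * iou_with_truth
    return bx, by, bw, bh, p_object
```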
the loss function of YOLOv2 is then built from these converted quantities. It sums, over all S² grid cells (index i) and the A prior anchor boxes of each cell (index j), a coordinate regression term between the prediction box variables b and the truth box variables g, a confidence term for anchors responsible for an object, a confidence suppression term for anchors not responsible for any object, and a classification term over the target categories. Here g denotes the coordinate variables of a truth box, b the coordinate variables of a prediction box, i the grid index (S² grid cells in total), j the prior anchor box index (A prior anchor boxes per grid cell), and 1^{obj}_{ij} and 1^{noobj}_{ij} are flag bits describing the overlap relation between a prior anchor box and the truth boxes, which decide which group of prior anchor boxes is used to predict the result: if the overlap ratio of the j-th prior anchor box in the i-th grid cell with some truth box is greater than the set overlap threshold, then 1^{obj}_{ij} = 1, otherwise 1^{obj}_{ij} = 0; if the overlap ratio of the j-th prior anchor box in the i-th grid cell with all truth boxes is less than the set overlap threshold, then 1^{noobj}_{ij} = 1, otherwise 1^{noobj}_{ij} = 0;
2.5) Initialize the weight parameters of the backbone network N with the random number seed S, input the enhanced training set D_aug into the initialized reference model to start training, and optimize the YOLO loss function with the momentum stochastic gradient descent (SGD) algorithm to update the network weight parameters:
2.5.1) Set the initial iteration number t = 0, the maximum iteration number t_max and the initial velocity v_0 = 0, and initialize the network weight parameters θ_0 with the random number seed S;
2.5.2) At each iteration, take a batch of training samples (X_t, Y_t)_n of size n from the enhanced training set D_aug, randomly select one training sample (x_t^(i), y_t^(i)) from the batch, and evaluate the YOLO loss function with this training sample and θ_t as its arguments to obtain the current loss L(θ_t), where i ∈ {1, 2, ..., n};
2.5.3) At the t-th iteration, update the network weight parameters θ with the following expressions to obtain the updated parameters θ_{t+1}:

θ_{t+1} = θ_t − v_t

v_t = α·v_{t−1} + η·∇_θ L(θ_t)

where v_t is the accumulated velocity at time t (at the first iteration v_t = v_0), θ_t is the model weight parameter at time t (at the first iteration θ_t = θ_0), α is the momentum with value 0.9, η is the learning rate, and ∇_θ L(θ_t) is the gradient of the current loss with respect to θ_t;
2.5.4) Repeat 2.5.2) and 2.5.3); one full pass over the enhanced training set D_aug is recorded as one epoch, and the network weight parameters are saved every 10 epochs, until the set maximum iteration number t_max is reached; then stop training, obtaining several sets of network weight parameters.
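A PyTorch-style sketch of the training loop in 2.5), writing the momentum update of 2.5.3) by hand. The dataset, model and loss objects are placeholders, the hyper-parameter values other than α = 0.9 are assumptions, and the update is applied over the whole mini-batch for simplicity rather than over a single randomly selected sample.

```python
import torch

def train_reference_model(model, yolo_loss, loader, lr=1e-3, alpha=0.9,
                          max_iters=50000, save_every_epochs=10):
    """Momentum SGD: v_t = alpha*v_{t-1} + lr*grad, theta_{t+1} = theta_t - v_t."""
    velocity = [torch.zeros_like(p) for p in model.parameters()]
    saved, t, epoch = [], 0, 0
    while t < max_iters:
        for images, targets in loader:                    # one full pass = one epoch
            loss = yolo_loss(model(images), targets)
            model.zero_grad()
            loss.backward()
            with torch.no_grad():
                for p, v in zip(model.parameters(), velocity):
                    v.mul_(alpha).add_(p.grad, alpha=lr)  # v_t = a*v_{t-1} + eta*grad
                    p.sub_(v)                             # theta_{t+1} = theta_t - v_t
            t += 1
            if t >= max_iters:
                break
        epoch += 1
        if epoch % save_every_epochs == 0:
            saved.append({k: w.clone() for k, w in model.state_dict().items()})
    return saved
```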
Step 3: Compute the performance indexes.
The performance indexes include the F1 score, the parameter count P and the floating-point operations FLOPs, and are computed as follows:
3.1) Update the reference model with each of the saved sets of network weight parameters, input the test set D_test into each updated reference model, and obtain from each model's target detection results the following variable values:
true positive TP: the number of targets that are correctly detected;
false positive FP: the number of targets that are detected incorrectly, i.e., false alarms;
false negative FN: the number of objects which are not identified, namely missing detection;
using these three variable values, compute the accuracy Pr, recall Re and F1 score of each updated reference model:

Pr = TP / (TP + FP)

Re = TP / (TP + FN)

F1 = 2 · Pr · Re / (Pr + Re)
record the maximum F1 score as F1_0 and denote the corresponding network weight parameters by W_0;
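These metrics follow directly from the detection counts; the sketch below is a transcription of the formulas, and the checkpoint-selection helper in the usage comment is hypothetical.

```python
def f1_score(tp, fp, fn):
    """Compute precision, recall and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical usage: pick the saved checkpoint whose weights maximise F1 on D_test.
# scores = [f1_score(*count_detections(model, w, d_test))[2] for w in saved_weights]
# w0 = saved_weights[max(range(len(scores)), key=scores.__getitem__)]
```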
3.2) Compute the parameter count P and floating-point operations FLOPs of W_0:
3.2.1) Compute the parameter count P_i of each convolutional layer in the backbone network N corresponding to W_0 using the following formula:

P_i = K² × C_in × C_out

where K is the convolution kernel size, C_in the number of input channels and C_out the number of output channels;
3.2.2) Sum the parameter counts P_i of all convolutional layers in the backbone network N corresponding to W_0 to obtain the total parameter count P_0 of W_0:

P_0 = Σ_{i=1}^{N} P_i

where N is the number of convolutional layers in the backbone network N;
3.2.3) Compute the floating-point operations FLOPs_i of each convolutional layer in the backbone network N corresponding to W_0 using the following formula:

FLOPs_i = 2K² × C_in × C_out × H_out × W_out

where K is the convolution kernel size, H_out the output feature map height and W_out the output feature map width;
3.2.4) Sum the floating-point operations FLOPs_i of all convolutional layers in the backbone network N corresponding to W_0 to obtain the total floating-point operations FLOPs_0 of W_0:

FLOPs_0 = Σ_{i=1}^{N} FLOPs_i
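A sketch that walks a PyTorch backbone and applies the two formulas above to every convolutional layer; the output sizes are taken from a dry run with a dummy input whose shape is an assumption, and grouped convolutions and bias terms are ignored for simplicity.

```python
import torch
import torch.nn as nn

def count_params_and_flops(backbone, input_shape=(1, 1, 512, 512)):
    """Sum P_i = K^2*Cin*Cout and FLOPs_i = 2*K^2*Cin*Cout*Hout*Wout over conv layers."""
    stats = {"params": 0, "flops": 0}
    hooks = []

    def hook(module, inputs, output):
        k = module.kernel_size[0]
        cin, cout = module.in_channels, module.out_channels
        hout, wout = output.shape[-2], output.shape[-1]
        stats["params"] += k * k * cin * cout
        stats["flops"] += 2 * k * k * cin * cout * hout * wout

    for m in backbone.modules():
        if isinstance(m, nn.Conv2d):
            hooks.append(m.register_forward_hook(hook))
    with torch.no_grad():
        backbone(torch.zeros(input_shape))   # dummy single-channel SAR-sized input (assumed)
    for h in hooks:
        h.remove()
    return stats["params"], stats["flops"]
```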
Step 4: Perform network pruning according to the performance indexes computed in step 3.
The goal of network pruning is to reduce the parameter count P and the floating-point operations FLOPs while ensuring that the F1 score drops only within a reasonable expectation;
referring to fig. 3, the implementation of the coarse-grained module pruning and the fine-grained filter pruning on the network in this step is as follows:
4.1) carrying out module pruning on the backbone network N:
4.1.1) Set a module mask m = {m_1, m_2, ..., m_i, ..., m_n}, where m_i corresponds to the i-th feature extraction module in the backbone network N, m_i ∈ {0, 1}, i = 1, 2, ..., n, and n denotes the number of feature extraction modules in the backbone network N;
4.1.2) Let m_j = 1 for j ≠ i and m_i = 0 to obtain a mask subnet in which the i-th feature extraction module is masked;
4.1.3) repeating 4.1.2) to obtain n mask subnets;
4.1.4) Let all n mask subnets share W_0, fine-tune each mask subnet, and compute its performance indexes: F1 score F1_i, parameter count P_i and floating-point operations FLOPs_i;
4.1.5) Compute the differences between the performance indexes of each mask subnet and the reference model performance indexes F1_0, P_0 and FLOPs_0, denoted ΔF1_i, ΔP_i and ΔFLOPs_i:

ΔF1_i = F1_0 − F1_i

ΔP_i = P_0 − P_i

ΔFLOPs_i = FLOPs_0 − FLOPs_i
4.1.6) Define the importance index I_i of the i-th feature extraction module in the backbone network N as a combination of ΔF1_i, ΔP_i and ΔFLOPs_i weighted by the constants α, β and γ, which are the influence factors of ΔF1_i, ΔP_i and ΔFLOPs_i respectively. I_i measures the importance of each module, namely the change in F1 balanced against each unit of saved parameters and floating-point operations; the larger I_i, the more important the module;
4.1.7) Compare each module importance index I with the set importance threshold I_thr; when I < I_thr, cut off the corresponding feature extraction module from the backbone network N to obtain the module-pruned backbone network N_m;
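The sketch below shows how the n one-hot mask subnets of 4.1.1)–4.1.3) and the importance comparison of 4.1.6)–4.1.7) fit together. Because the exact combination of ΔF1, ΔP and ΔFLOPs behind I_i is given only symbolically in the text, the ratio used here (weighted F1 change per unit of weighted parameter and FLOP reduction) is an assumed form, not the patented formula.

```python
def one_hot_masks(n):
    """Mask m with m_i = 0 and m_j = 1 (j != i): one subnet per feature extraction module."""
    return [[0 if j == i else 1 for j in range(n)] for i in range(n)]

def importance(delta_f1, delta_p, delta_flops, alpha=1.0, beta=1.0, gamma=1.0):
    """Assumed form of I_i: F1 change balanced against parameter and FLOP savings."""
    return (alpha * delta_f1) / (beta * delta_p + gamma * delta_flops + 1e-12)

def modules_to_prune(f1_0, p_0, flops_0, subnet_stats, i_thr):
    """subnet_stats[i] = (F1_i, P_i, FLOPs_i) of the subnet that masks module i."""
    prune = []
    for i, (f1_i, p_i, flops_i) in enumerate(subnet_stats):
        i_score = importance(f1_0 - f1_i, p_0 - p_i, flops_0 - flops_i)
        if i_score < i_thr:
            prune.append(i)          # module i is unimportant: cut it from N
    return prune
```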
4.2) Perform filter pruning on the module-pruned backbone network N_m based on the HRank method:
since the average rank of the feature maps generated by a single filter is always the same, the importance of a filter can be judged by the rank of its feature maps, and the unimportant filters can then be cut off to obtain the pruned network. The pruning procedure for each convolutional layer of N_m is as follows:
4.2.1) Take a few images from the training set D_aug, input them into the backbone network N_m, and compute the average rank R = {r_1, r_2, ..., r_i, ..., r_n} of the output feature maps of each convolutional layer, where r_i is the average rank of the feature maps output by the i-th filter w_i, i ∈ {1, 2, ..., n}, and n denotes the number of filters of that layer;
4.2.2) Sort R in descending order to obtain the sorted average ranks of the convolutional layer output feature maps R' = {r'_1, r'_2, ..., r'_n}, where r'_i denotes the i-th largest value in R;
4.2.3) Set the number of filters to keep, n_1, and the number of filters to prune, n_2, where n_1 + n_2 = n;
4.2.4) Take the filters corresponding to the first n_1 values of R' to obtain the important filter set S_w = {s_1, s_2, ..., s_j, ..., s_{n_1}}, and the filters corresponding to the last n_2 values of R' to obtain the unimportant filter set U_w = {u_1, u_2, ..., u_k, ..., u_{n_2}}, where s_j is the filter corresponding to the j-th value of R', j ∈ {1, 2, ..., n_1}, and u_k is the filter corresponding to the k-th remaining value, k ∈ {1, 2, ..., n_2};
4.2.5) Cut off the unimportant filter sets U_w of all convolutional layers to obtain the finely pruned network N_P; initialize N_P with the same random number seed S, then train N_P, compute its performance indexes, and save the network weight parameters with the highest F1 score, recorded as W_P.
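A sketch of the HRank-style selection in 4.2): feed a few images through, average the rank of each filter's output feature map over the batch, and keep the n_1 filters with the highest average rank. The use of torch.linalg.matrix_rank and the per-layer indexing are implementation assumptions.

```python
import torch

def average_feature_map_ranks(feature_maps):
    """feature_maps: (B, C, H, W) outputs of one conv layer for a few images.
    Returns r_i, the average rank of filter i's feature maps over the batch."""
    b, c, _, _ = feature_maps.shape
    ranks = torch.zeros(c)
    for i in range(c):
        for j in range(b):
            ranks[i] += torch.linalg.matrix_rank(feature_maps[j, i]).float()
    return ranks / b

def split_filters(ranks, n_keep):
    """Sort filters by descending average rank; keep the first n_keep, prune the rest."""
    order = torch.argsort(ranks, descending=True)
    return order[:n_keep].tolist(), order[n_keep:].tolist()   # (S_w, U_w) indices
```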
Step 5: Quantize the network parameters to obtain the final SAR target detection model.
After pruning, the network parameters are further compressed by parameter quantization:
5.1) Quantize the weight parameters in the full-precision network modules with a quantization function to obtain the quantized weight parameters w_q:

w_q = Quan_k(w_τ)

where Quan_k(·) is a k-bit quantization function, k is the number of quantization bits, w_τ is the original network weight parameter, and w_q is the quantized network weight parameter;
5.2) Quantize the activation outputs in the full-precision network modules with the quantization function to obtain the quantized activation outputs y_q, where x is the activation function input, y is the activation output, y_q is the quantized activation output, and α is a learnable parameter used in the quantization;
5.3) Set different quantization bit ranges for the weight parameters and the activation outputs of the pruned network N_P, i.e. set the weight quantization bit range to Q_w = {4, 5, ...} and the activation quantization bit range to Q_a = {3, 4, ..., 7};
5.4) Sort the importance indexes I = {I_1, I_2, ..., I_i, ..., I_n} of all feature extraction modules in ascending order to obtain the ascending-sorted importance indexes I_o, where I_i denotes the importance index of the i-th feature extraction module and the i-th element of I_o is the i-th smallest value in I, i ∈ {1, 2, ..., n};
referring to fig. 4, in this example I_o, Q_w and Q_a are each divided into 3 subsets, and the three groups of subsets are matched one-to-one in order to form the quantization search space; that is, the feature extraction module corresponding to each value in I_o performs its quantization search within the matched Q_w subset and Q_a subset;
5.5) In each quantization search, select one value from the Q_w subset and one value from the Q_a subset corresponding to each feature extraction module as the k values of the quantization functions for its weights and activation outputs respectively, to obtain a quantized network; then fine-tune it on the enhanced training set D_aug and compute its performance indexes;
5.6) Repeat 5.5) to search for the best quantization scheme, and apply the best quantization scheme found to the finely pruned network N_P to obtain the final network N_Q;
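A sketch of the search-space construction and a simple random search over it, following the description in 5.3)–5.6) and fig. 4; the upper bound of Q_w, the number of trials, the random-search strategy and the evaluate() callback are assumptions, not details from the disclosure.

```python
import random

def build_search_space(importance, q_w=(4, 5, 6, 7, 8), q_a=(3, 4, 5, 6, 7), groups=3):
    """Sort modules by ascending importance and map each group to a slice of Q_w / Q_a,
    so that less important modules may be quantized more aggressively (assumed pairing)."""
    order = sorted(range(len(importance)), key=lambda i: importance[i])
    space = {}
    for g in range(groups):
        members = order[g * len(order) // groups:(g + 1) * len(order) // groups]
        w_sub = q_w[g * len(q_w) // groups:(g + 1) * len(q_w) // groups]
        a_sub = q_a[g * len(q_a) // groups:(g + 1) * len(q_a) // groups]
        for m in members:
            space[m] = (w_sub, a_sub)
    return space

def search_quantization_scheme(space, evaluate, trials=20):
    """evaluate(scheme) -> F1 of the fine-tuned quantized network (placeholder callback)."""
    best_scheme, best_f1 = None, -1.0
    for _ in range(trials):
        scheme = {m: (random.choice(w), random.choice(a)) for m, (w, a) in space.items()}
        f1 = evaluate(scheme)
        if f1 > best_f1:
            best_scheme, best_f1 = scheme, f1
    return best_scheme
```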
5.7) Because the quantized final network N_Q is small, it is difficult to train from scratch with randomly initialized parameters; therefore the weight parameters W_P of the finely pruned network are used as the pre-training parameters of the final network N_Q, and the final network weight parameters W_Q are obtained by fine-tuning training;
5.8) Replace the backbone network N of the reference model with the final network N_Q and update the network weight parameters with the final weight parameters W_Q to obtain the final SAR target detection model.
Step 6: Input the test set D_test into the final SAR target detection model to obtain accurate SAR target detection results.
The foregoing description is only an example of the present invention and is not intended to limit the invention; it will be apparent to those skilled in the art that various changes and modifications in form and detail may be made without departing from the spirit and scope of the invention.

Claims (12)

1. A remote sensing SAR target detection method based on the combination of network pruning and parameter quantization, characterized by comprising the following steps:
(1) Obtain the pre-divided training set D_train and test set D_test from a public remote sensing SAR target detection data set, and perform data expansion on the training-set images using geometric transformations to obtain an expanded training set D_exp;
(2) In the expanded training set D_exp, count the proportions of images containing targets of different sizes and of images consisting largely of background, select the under-represented images as hard samples of the expanded training set, and apply offline data enhancement and online data enhancement to these samples in turn to obtain an enhanced training set D_aug;
(3) Construct the SAR target detection reference model:
(3a) Adjust an existing lightweight network structure, i.e. change the original classification layer into a detection layer and modify some down-sampling layers in the deep part of the network into non-down-sampling layers, and take the adjusted lightweight network as the feature extraction backbone network N; the backbone network N comprises an input layer, hidden layers and a detection layer, the hidden layers consist of several feature extraction modules with the same structure, each feature extraction module contains an optional down-sampling layer and several non-down-sampling layers, a down-sampling layer is a convolution layer, a pooling layer or a rearrangement layer, and each non-down-sampling layer is a convolution layer, a BN layer, an activation layer or a connection layer;
(3b) Use the K-Means clustering algorithm to cluster the labelled target boxes of the expanded training set D_exp to obtain prior anchor boxes for this data set, use these anchor boxes as the anchor boxes of an existing YOLO single-stage detector to obtain an SAR target detector, and connect the detector with the backbone network N to form the reference model for SAR target detection;
(4) Update the network weight parameters of the reference model:
(4a) Take the YOLO loss function as the loss function of the reference model, and initialize the weight parameters of the backbone network N with a random number seed S;
(4b) Input the enhanced training set D_aug into the initialized reference model to start training, optimize the YOLO loss function with the momentum stochastic gradient descent (SGD) algorithm to update the network weight parameters, save the network weight parameters once every 10 epochs until the set maximum number of iterations is reached, and then stop training to obtain several sets of updated network weight parameters;
(5) Evaluate the performance indexes of the reference model:
(5a) Update the reference model with each of the saved sets of network weight parameters, compute the F1 score of each updated reference model on the test set D_test, record the maximum F1 score as F1_0, and denote the corresponding network weight parameters by W_0;
(5b) Compute the parameter count P and the floating-point operations FLOPs of W_0, and record the results as P_0 and FLOPs_0 respectively;
(6) Perform coarse pruning on the backbone network N:
(6a) Set a module mask m for the n feature extraction modules in the backbone network N, and perform one-hot encoding of all the feature extraction modules of the network with the module mask m to obtain n mask subnets;
(6b) Let all n mask subnets share the weight parameters W_0, fine-tune each mask subnet, and compute its performance indexes: F1 score F1_i, parameter count P_i and floating-point operations FLOPs_i;
(6c) According to the performance indexes of each mask subnet, compute the importance index I of each feature extraction module in the backbone network N, and compare each with a set importance threshold I_thr; when I < I_thr, cut off the corresponding feature extraction module from the backbone network N to obtain the coarsely pruned backbone network N_m;
(7) Perform fine pruning on the coarsely pruned backbone network N_m:
(7a) Evaluate the importance of the filters in the convolutional layers of the coarsely pruned backbone network N_m based on the HRank method, divide them into important and unimportant filters, and cut off the unimportant filters of N_m accordingly to obtain the finely pruned network N_P;
(7b) Initialize the finely pruned network N_P with the same random number seed S, then train N_P with the same training method as in (4), and save the network weight parameters with the highest F1 score, recorded as W_P;
(8) Perform parameter quantization on the finely pruned network N_P:
(8a) Modify the feature extraction modules in the finely pruned network N_P into quantization modules;
(8b) Select quantization strategies with different precisions for the weight parameters and the activation outputs, and design a quantization search space according to the importance index I of the feature extraction modules;
(8c) Iteratively search for the best quantization scheme on the enhanced training set D_aug, and apply the best scheme found to the finely pruned network N_P to obtain the final network N_Q;
(8d) Use W_P as the pre-training parameters of the final network N_Q, and obtain the final network weight parameters W_Q by fine-tuning training;
(9) Replace the backbone network N of the reference model with the final network N_Q and update the network weight parameters with the final weight parameters W_Q to obtain the final SAR target detection model;
(10) Input the test set D_test into the final SAR target detection model to obtain accurate SAR target detection results.
2. The method of claim 1, wherein the images are data-expanded in (1) as follows:
1a) Apply translation transformation to the images of the obtained training set D_train to obtain the translated images G'_{i',j'}:

G'_{i',j'} = G_{i+x, j+y}

where G denotes the original image, (i, j) the coordinates before translation, (i', j') the coordinates after translation, and (x, y) the translation direction;
1b) Apply rotation transformation to the obtained training set images to obtain the rotated images G'_{W',Q'}:

G'_{W',Q'} = G_{W cos θ + Q sin θ, −W sin θ + Q cos θ}

where (W, Q) denotes the coordinates before rotation, (W', Q') the coordinates after rotation, and θ the rotation angle;
1c) Add both the translated images G'_{i',j'} and the rotated images G'_{W',Q'} to the training set D_train to obtain the expanded training set D_exp.
3. The method of claim 1, wherein the hard samples of the expanded training set D_exp in (2) are subjected to offline data enhancement and online data enhancement in turn, as follows:
2a) Select hard samples: in the images of the expanded training set D_exp, compute the ground-truth box area of all targets and divide all targets into small, medium and large targets using 32² and 96² as dividing points; count the number of images containing targets of each size, and manually select images containing a large amount of background noise as hard samples;
2b) Offline data enhancement: scale the images containing small targets among the hard samples, and apply background noise reduction to the images that consist largely of background, obtaining the training set after offline data enhancement;
2c) Online data enhancement: read the images of the offline-enhanced training set into memory, then apply dynamic random changes of random flipping, random expansion and random erasing to them in turn, so that the same image differs between different training epochs, obtaining the enhanced data set D_aug.
4. The method of claim 1, wherein the detection reference model loss function in (4a) is the YOLOv2 loss, which sums, over all S² grid cells (index i) and the A prior anchor boxes of each cell (index j), a coordinate regression term between the prediction box variables b and the truth box variables g, confidence terms controlled by the flag bits 1^{obj}_{ij} and 1^{noobj}_{ij} describing the overlap relation between a prior anchor box and the truth boxes, and a classification term over the target categories; here g is the coordinate variable of a truth box, b is the coordinate variable of a prediction box, i is the grid index (S² grid cells in total), j is the prior anchor box index (A prior anchor boxes per grid cell), σ(·) is the sigmoid function, and IOU(box_pred, box_truth) is the overlap ratio between the prior anchor box and the truth box; the prediction box is obtained from b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^{t_w}, b_h = p_h·e^{t_h}, where t_x, t_y, t_w, t_h and t_o are the 5 outputs of the model for each prior anchor box, corresponding respectively to the x, y coordinates, width, height and confidence of the regression box, c_x and c_y are the x, y coordinates of the top-left corner of the grid cell, and p_w and p_h are the width and height of the prior anchor box.
5. The method of claim 1, wherein in (4b) the momentum stochastic gradient descent (SGD) algorithm is used to optimize the YOLO loss function and update the network weight parameters, as follows:
4b1) Set the initial iteration number t = 0, the maximum iteration number t_max and the initial velocity v_0 = 0, and randomly initialize the network weight parameters θ_0;
4b2) At each iteration, randomly select one training sample (x_t^(i), y_t^(i)) from a batch of training samples (X_t, Y_t)_n of size n, and evaluate the YOLO loss function with this training sample and θ_t as its arguments to obtain the current loss L(θ_t), where i ∈ {1, 2, ..., n};
4b3) At the t-th iteration, update the network weight parameters θ with the following expressions to obtain the updated parameters θ_{t+1}:

θ_{t+1} = θ_t − v_t

v_t = α·v_{t−1} + η·∇_θ L(θ_t)

where v_t is the accumulated velocity at time t (at the first iteration v_t = v_0), θ_t is the model weight parameter at time t (at the first iteration θ_t = θ_0), α is the momentum with value 0.9, and η is the learning rate;
4b4) Repeat 4b2) and 4b3); when the set maximum iteration number t_max is reached, stop updating to obtain the final network weight parameters θ_{t_max}.
6. The method of claim 1, wherein in (5a) the F1 score of each updated reference model is computed on the test set D_test as follows:
5a1) Input the test set D_test into each updated reference model to obtain the target detection results of each updated reference model;
5a2) From the target detection results of each updated reference model, obtain the following variable values:
true positives TP: the number of targets that are correctly detected;
false positives FP: the number of targets that are detected incorrectly, i.e. false alarms;
false negatives FN: the number of targets that are not identified, i.e. missed detections;
5a3) Using these three variable values, compute the accuracy Pr and recall Re of each updated reference model:

Pr = TP / (TP + FP)

Re = TP / (TP + FN)

5a4) Using the computed accuracy Pr and recall Re, compute the F1 score of each updated reference model:

F1 = 2 · Pr · Re / (Pr + Re)
7. The method of claim 1, wherein in (5b) the parameter count P and floating-point operations FLOPs of the network weight parameters W_0 that maximize the F1 score are computed as follows:
5b1) Compute the parameter count P_i of each convolutional layer in the backbone network N corresponding to W_0:

P_i = K² × C_in × C_out

where K is the convolution kernel size, C_in the number of input channels and C_out the number of output channels;
5b2) Sum the parameter counts P_i of all convolutional layers in the backbone network N corresponding to W_0 to obtain the total parameter count P_0 of W_0:

P_0 = Σ_{i=1}^{N} P_i

where N is the number of convolutional layers in the backbone network N;
5b3) Compute the floating-point operations FLOPs_i of each convolutional layer in the backbone network N corresponding to W_0:

FLOPs_i = 2K² × C_in × C_out × H_out × W_out

where K is the convolution kernel size, H_out the output feature map height and W_out the output feature map width;
5b4) Sum the floating-point operations FLOPs_i of all convolutional layers in the backbone network N corresponding to W_0 to obtain the total floating-point operations FLOPs_0 of W_0:

FLOPs_0 = Σ_{i=1}^{N} FLOPs_i
8. The method of claim 1, wherein in (6b) all feature extraction modules of the backbone network N are one-hot encoded with the module mask as follows:
6b1) Set a module mask m = {m_1, m_2, ..., m_i, ..., m_n}, where m_i corresponds to the i-th feature extraction module in the backbone network N, m_i ∈ {0, 1}, i = 1, 2, ..., n, and n denotes the number of feature extraction modules in the backbone network N;
6b2) Let m_j = 1 for j ≠ i and m_i = 0 to obtain a mask subnet in which the i-th feature extraction module is masked;
6b3) Repeat 6b2) to obtain n mask subnets.
9. The method according to claim 1, wherein in (6c) the importance index I of each feature extraction module in the backbone network N is computed from the performance indexes of each mask subnet as follows:
6c1) Compute the differences between the performance indexes of each mask subnet and the performance indexes of the reference model:

ΔF1_i = F1_0 − F1_i

ΔP_i = P_0 − P_i

ΔFLOPs_i = FLOPs_0 − FLOPs_i

6c2) Define the importance index I_i of the i-th feature extraction module in the backbone network N as a combination of ΔF1_i, ΔP_i and ΔFLOPs_i weighted by the constants α, β and γ, which are the influence factors of ΔF1_i, ΔP_i and ΔFLOPs_i respectively.
10. The method according to claim 1, wherein in (7a) the importance of the filters in each convolutional layer of the coarsely pruned backbone network N_m is evaluated based on the HRank method, and the filters are divided into important filters and unimportant filters, as follows:
7a1) Select a few images from the training set D_aug and input them into the backbone network N_m, and calculate the average ranks R = {r_1, r_2, ..., r_i, ..., r_n} of the feature maps output by each convolutional layer of the network, where r_i is the average rank of the feature map output by the ith filter w_i, i ∈ {1, 2, ..., n}, and n denotes the number of filters in the layer;
7a2) Sort R in descending order to obtain the sorted average ranks of the convolutional layer output feature maps R^o = {r^o_1, r^o_2, ..., r^o_n}, where r^o_i denotes the ith largest value in R;
7a3) Set the number n_1 of filters to be retained and the number n_2 of filters to be pruned, where n_1 + n_2 = n;
7a4) From R^o, select the filters corresponding to the first n_1 values to obtain the important filter set W^I = {w^I_1, ..., w^I_j, ..., w^I_{n_1}}, and select the filters corresponding to the last n_2 values to obtain the unimportant filter set W^U = {w^U_1, ..., w^U_k, ..., w^U_{n_2}}, where w^I_j is the filter corresponding to the jth value in R^o, j ∈ {1, 2, ..., n_1}, and w^U_k is the filter corresponding to the kth value counted from the end of R^o, k ∈ {1, 2, ..., n_2}.
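A minimal PyTorch sketch of step 7a) follows; how the feature maps are collected (here, passed in as a single tensor) and the tensor shapes are assumptions for illustration.

```python
# Sketch of step 7a): estimate the average rank of each filter's output feature
# map over a few images, then split the filters by descending average rank.
import torch

def average_feature_map_ranks(feature_maps: torch.Tensor) -> torch.Tensor:
    """feature_maps: (num_images, num_filters, H, W) -> average rank per filter."""
    ranks = torch.linalg.matrix_rank(feature_maps.float())  # (num_images, num_filters)
    return ranks.float().mean(dim=0)                         # r_i for each filter

def split_filters(avg_ranks: torch.Tensor, n1: int):
    """Return (important, unimportant) filter index lists, ranked by average rank."""
    order = torch.argsort(avg_ranks, descending=True)
    return order[:n1].tolist(), order[n1:].tolist()

maps = torch.randn(8, 16, 32, 32)   # 8 images, 16 filters, 32x32 feature maps (hypothetical)
keep, prune = split_filters(average_feature_map_ranks(maps), n1=12)
```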
11. The method of claim 1, wherein in (8a) the full-precision network module is converted into a quantized network module as follows:
8b1) Quantize the weight parameters in the full-precision network module with a quantization function to obtain the quantized weight parameters w_q:
w_q = Quan_k(w_τ)
where Quan_k is the quantization function, expressed as:
Quan_k(x) = round((2^k - 1) × x) / (2^k - 1)
k is the number of quantization bits;
w_τ is the original network weight parameter and w_q is the quantized network weight parameter;
8b2) Quantize the activation output in the full-precision network module with the quantization function to obtain the quantized activation output:
y = clip(x, 0, α)
y_q = α × Quan_k(y / α)
where x is the activation function input, y is the clipped activation output, y_q is the quantized activation output, and α is a learnable parameter.
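A minimal PyTorch sketch of step 8b) is given below, assuming a standard uniform k-bit quantizer and a learnable clipping bound α for activations; the exact formulas and the min–max rescaling of weights are assumptions, not taken verbatim from the claim.

```python
# Sketch of step 8b): uniform k-bit quantization of weights and activations.
import torch

def quan_k(x: torch.Tensor, k: int) -> torch.Tensor:
    """Uniform k-bit quantizer for inputs in [0, 1]."""
    levels = 2 ** k - 1
    return torch.round(x * levels) / levels

def quantize_weights(w: torch.Tensor, k: int) -> torch.Tensor:
    # Assumption: rescale weights to [0, 1] before quantizing, then map back.
    w_min, w_max = w.min(), w.max()
    w_norm = (w - w_min) / (w_max - w_min + 1e-12)
    return quan_k(w_norm, k) * (w_max - w_min) + w_min

def quantize_activation(x: torch.Tensor, alpha: torch.Tensor, k: int) -> torch.Tensor:
    y = torch.minimum(torch.relu(x), alpha)   # clip activations to [0, alpha]
    return alpha * quan_k(y / alpha, k)

w_q = quantize_weights(torch.randn(64, 32, 3, 3), k=4)
a_q = quantize_activation(torch.randn(1, 32, 52, 52), alpha=torch.tensor(6.0), k=3)
```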
12. The method of claim 1, wherein in (8b) quantization strategies of different precisions are selected for the weight parameters and the activation outputs, and the quantization search space is designed according to the importance indices I of the feature extraction modules, as follows:
8a1) Set the quantization-bit value range of the weight parameters Q_w = {4, 5, ...} and the quantization-bit value range of the activation outputs Q_a = {3, 4, ..., 7};
8a2) Sort the importance indices I = {I_1, I_2, ..., I_i, ..., I_n} of all feature extraction modules in ascending order to obtain the sorted importance indices I^o = {I^o_1, I^o_2, ..., I^o_n}, where I_i denotes the importance index of the ith feature extraction module and I^o_i denotes the ith smallest value in I, i ∈ {1, 2, ..., n};
8a3) Divide I^o, Q_w and Q_a into x subsets respectively and make the three sets of subsets correspond to each other in order, forming the quantization search space; that is, for each value in I^o, a quantization search is performed in its corresponding Q_w subset and Q_a subset.
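A minimal sketch of step 8a3) follows, pairing bit-width candidate subsets with importance-sorted feature extraction modules. The subset split, the example importance values, and the weight-bit range used in the example are assumptions for illustration (the weight-bit range is only partially legible in the source).

```python
# Sketch of step 8a3): map each module, ordered by ascending importance, to one
# subset of weight-bit candidates and one subset of activation-bit candidates.
def build_search_space(importance, q_w, q_a, x):
    """Return {module_index: (weight-bit subset, activation-bit subset)}."""
    order = sorted(range(len(importance)), key=lambda i: importance[i])
    def split(seq, parts):
        n = len(seq)
        return [seq[p * n // parts:(p + 1) * n // parts] for p in range(parts)]
    w_subsets, a_subsets = split(q_w, x), split(q_a, x)
    space = {}
    for pos, module_idx in enumerate(order):
        group = min(pos * x // len(order), x - 1)   # less important -> fewer bits
        space[module_idx] = (w_subsets[group], a_subsets[group])
    return space

space = build_search_space(importance=[0.2, 0.9, 0.5, 0.1],
                           q_w=[4, 5, 6, 7, 8], q_a=[3, 4, 5, 6, 7], x=2)
print(space)
```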
CN202111488427.7A 2021-12-08 2021-12-08 Remote sensing SAR target detection method based on combination of network pruning and parameter quantification Pending CN114170512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111488427.7A CN114170512A (en) 2021-12-08 2021-12-08 Remote sensing SAR target detection method based on combination of network pruning and parameter quantification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111488427.7A CN114170512A (en) 2021-12-08 2021-12-08 Remote sensing SAR target detection method based on combination of network pruning and parameter quantification

Publications (1)

Publication Number Publication Date
CN114170512A true CN114170512A (en) 2022-03-11

Family

ID=80484098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111488427.7A Pending CN114170512A (en) 2021-12-08 2021-12-08 Remote sensing SAR target detection method based on combination of network pruning and parameter quantification

Country Status (1)

Country Link
CN (1) CN114170512A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114781640A (en) * 2022-06-16 2022-07-22 阿里巴巴达摩院(杭州)科技有限公司 Model deployment method, system, storage medium and electronic device
CN115439684A (en) * 2022-08-25 2022-12-06 艾迪恩(山东)科技有限公司 Household garbage classification method based on lightweight YOLOv5 and APP
CN115439684B (en) * 2022-08-25 2024-02-02 艾迪恩(山东)科技有限公司 Household garbage classification method and APP based on lightweight YOLOv5
CN116992944A (en) * 2023-09-27 2023-11-03 之江实验室 Image processing method and device based on leavable importance judging standard pruning
CN116992944B (en) * 2023-09-27 2023-12-19 之江实验室 Image processing method and device based on leavable importance judging standard pruning

Similar Documents

Publication Publication Date Title
CN110399850B (en) Continuous sign language recognition method based on deep neural network
CN114170512A (en) Remote sensing SAR target detection method based on combination of network pruning and parameter quantification
CN112101430B (en) Anchor frame generation method for image target detection processing and lightweight target detection method
CN113052211B (en) Pruning method based on characteristic rank and channel importance
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN110516095B (en) Semantic migration-based weak supervision deep hash social image retrieval method and system
CN108764138B (en) Plateau area cloud and snow classification method based on multidimensional and multi-granularity cascade forest
CN109871749B (en) Pedestrian re-identification method and device based on deep hash and computer system
CN115131760B (en) Lightweight vehicle tracking method based on improved feature matching strategy
CN115100709B (en) Feature separation image face recognition and age estimation method
CN104318271B (en) Image classification method based on adaptability coding and geometrical smooth convergence
CN116206185A (en) Lightweight small target detection method based on improved YOLOv7
CN116824585A (en) Aviation laser point cloud semantic segmentation method and device based on multistage context feature fusion network
CN115393690A (en) Light neural network air-to-ground observation multi-target identification method
CN114627424A (en) Gait recognition method and system based on visual angle transformation
CN114547365A (en) Image retrieval method and device
CN117217282A (en) Structured pruning method for deep pedestrian search model
CN117392406A (en) Low-bit-width mixed precision quantization method for single-stage real-time target detection model
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN116152644A (en) Long-tail object identification method based on artificial synthetic data and multi-source transfer learning
CN113327227B (en) MobileneetV 3-based wheat head rapid detection method
Liu et al. Target detection of hyperspectral image based on faster R-CNN with data set adjustment and parameter turning
Azzawi et al. Face recognition based on mixed between selected feature by multiwavelet and particle swarm optimization
Zheng et al. A real-time face detector based on an end-to-end CNN
CN112733925A (en) Method and system for constructing light image classification network based on FPCC-GAN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination