CN115327544A - Few-sample space target ISAR defocus compensation method based on self-supervised learning - Google Patents

Few-sample space target ISAR defocus compensation method based on self-supervised learning

Info

Publication number
CN115327544A
Authority
CN
China
Prior art keywords
module
fusion
sampling
complex
feature
Prior art date
Legal status
Granted
Application number
CN202211250172.5A
Other languages
Chinese (zh)
Other versions
CN115327544B (en)
Inventor
朱卫纲
李晨瑄
李永刚
朱霸坤
杨君
曲卫
邱磊
何永华
王鹏飞
Current Assignee
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Original Assignee
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority date
Filing date
Publication date
Application filed by Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority to CN202211250172.5A
Publication of CN115327544A
Application granted
Publication of CN115327544B
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00 Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88 Radar or analogous systems specially adapted for specific applications
    • G01S13/89 Radar or analogous systems specially adapted for specific applications for mapping or imaging
    • G01S13/90 Radar or analogous systems specially adapted for specific applications for mapping or imaging using synthetic aperture techniques, e.g. synthetic aperture radar [SAR] techniques
    • G01S13/904 SAR modes
    • G01S13/9064 Inverse SAR [ISAR]
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/41 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G01S7/417 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation involving the use of neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention discloses a few-sample space target ISAR defocus compensation method based on self-supervised learning, belonging to the technical fields of radar signal processing and radar target detection. First, radar echoes and ISAR radar image data are acquired without distinguishing defocused from non-defocused cases. The radar echoes are then separated by a clustering algorithm, and defocused echoes are distinguished from non-defocused echoes according to image entropy. First sample data are constructed according to the target imaging azimuth inclination angle of the ISAR radar image data, and the defocus compensation network is trained in a self-supervised manner. At compensation time, radar echoes of the satellite type to be compensated are acquired, second sample data are constructed, the pre-trained model is trained in a supervised manner to obtain a trained defocus compensation network, and defocus compensation is performed on the radar echoes of that satellite type. The method addresses the problem that the difficulty of acquiring space target radar echo data, the small data volume, and the high labeling cost hinder improvement of the defocus compensation effect.

Description

Few-sample space target ISAR defocus compensation method based on self-supervised learning
Technical Field
The invention belongs to the technical fields of radar signal processing and radar target detection, and particularly relates to a few-sample space target ISAR defocus compensation method based on self-supervised learning.
Background
Compared with optical detection equipment, inverse synthetic aperture radar (ISAR) is less affected by meteorological conditions, offers all-weather, high-precision, long-range detection, and plays an important role in fields such as space target detection and identification and battlefield situation reconnaissance.
Radar technicians typically use large-bandwidth signals to acquire high-resolution ISAR images; however, an excessively large target size or a long observation time may cause migration through resolution cells (MTRC) in the imaging result, defocusing the target image. The defocused target differs considerably in shape and structure from the real target, which greatly hinders improvement of ISAR space target identification accuracy.
The defocus compensation effect of traditional algorithms is limited by the experience and skill of operators: the defocused target radar echo is processed with manually designed compensation parameters and similar schemes, the robustness of such algorithms under different defocusing conditions is insufficient, and high reliability and timeliness are hard to guarantee.
Since their emergence in the field of artificial intelligence, deep learning algorithms have attracted wide attention from researchers in many fields owing to their strong data modeling and feature extraction capabilities. Deep learning algorithms analyze data with convolutional neural networks and can remedy shortcomings of traditional ISAR space target defocus compensation algorithms such as high complexity and poor robustness. However, most deep learning algorithms require a large number of labeled samples to train the network model and ensure high generalization capability. Because of confidentiality requirements on space target radar echo data and limits on the amount of data that can be acquired, large-scale network training is difficult to support with few labeled samples, which can lead to slow convergence, overfitting, and similar problems. In addition, radar echoes contain abundant information beneficial to defocus compensation and target identification, such as intra-class differences and inter-class similarity measures, but these characteristics have not been effectively exploited.
In summary, existing ISAR image defocus compensation methods have the following disadvantages:
(1) Traditional algorithms rely on the skill of technicians, setting thresholds from expert experience or selecting parameters through extensive experiments, so a good defocus compensation effect is difficult to obtain for different types of targets and the generalization capability is insufficient;
(2) To obtain a good defocus compensation effect, existing algorithms need a large number of labeled samples to train the model. Acquisition of space target radar echo data is restricted, so collecting a large amount of echo data is difficult; in addition, building a labeled data set consumes considerable labor and time;
(3) Transfer learning is the main existing solution to the few-sample problem: a model is trained on a large number of labeled samples of other types, the model weights are transferred to the few-sample task, and the model parameters are then fine-tuned with the small amount of labeled data available. However, because of the characteristic differences between space target ISAR data and the labeled data of other fields, models trained in other fields cannot simply be migrated to obtain a good effect;
(4) Target characteristics in radar complex echoes are not sufficiently mined, and modal fusion that fully exploits the data characteristics is difficult to realize; determining the fusion threshold usually requires multiple rounds of testing and manual tuning, and selecting the iteration threshold is time-consuming; in addition, few feature extraction networks have been designed for the space target defocus compensation task, which makes the task more difficult when few samples are labeled.
Disclosure of Invention
In view of this, the invention provides a few-sample space target ISAR defocus compensation method based on self-supervised learning, which addresses the problem that the difficulty of acquiring space target radar echo data, the small data volume, and the high labeling cost hinder improvement of the defocus compensation effect.
When the ISAR image is defocused by target migration and the identification accuracy is low, the method better extracts the target characteristics in the radar echo and realizes high-resolution space target ISAR defocus compensation.
To solve the above technical problems, the invention is implemented as follows.
A few-sample space target ISAR defocus compensation method based on self-supervised learning comprises the following steps:
Step one: acquire radar echoes and ISAR radar image data without distinguishing defocused from non-defocused cases;
Step two: for the radar echoes, divide them into two data clusters with a clustering algorithm; compute the image entropy of the two clusters, take the cluster with the larger image entropy as the defocused echo data, and take the cluster with the smaller image entropy as the non-defocused echo data;
for the ISAR radar image data, obtain the target imaging azimuth inclination angle with an image processing algorithm; construct first sample data from defocused echo data and non-defocused echo data whose target imaging azimuth inclination angles are consistent, the non-defocused echo data serving as pseudo labels;
Step three: perform first-stage self-supervised training of the constructed defocus compensation network with the first sample data to obtain a pre-trained model;
Step four: for the satellite type requiring defocus compensation, acquire radar echoes of that satellite type and assign defocused or non-defocused labels to form labeled second sample data; perform second-stage supervised training of the pre-trained model to obtain the trained defocus compensation network;
Step five: perform defocus compensation of the radar echoes of that satellite type with the trained defocus compensation network.
Preferably, the clustering algorithm is the K-means algorithm; the target imaging azimuth inclination angle is obtained with the image processing algorithm as follows: edge detection is performed with the Canny operator, and the azimuth inclination angle of the target image is then calculated with the Hough transform.
Preferably, the defocus compensation network comprises a real-number-domain convolution processing branch, a complex-number-domain convolution processing branch, and a fusion unit; the fusion unit comprises at least one cross-enhancement fusion module, a bimodal feature fusion module, at least one first weighted fusion module, and at least one second weighted fusion module.
The real-number-domain convolution processing branch processes the amplitude-phase characteristics of the radar echoes. The branch comprises N real-valued down-sampling modules SD_1 ~ SD_N, a first semantic feature extraction module, and N up-sampling modules SU_1 ~ SU_N. The outputs of a down-sampling module SD_n and an up-sampling module SU_m of different resolutions are fused in a first weighted fusion module, and the fusion result serves as part of the input of the next up-sampling module SU_(m+1); n ≠ m, n and m range from 1 to N, and N is a positive integer not smaller than 2.
The complex-number-domain convolution processing branch processes the radar echo signal in complex form. The branch comprises N complex-valued down-sampling modules FD_1 ~ FD_N, a second semantic feature extraction module, and N up-sampling modules FU_1 ~ FU_N. The outputs of a down-sampling module FD_n and an up-sampling module FU_m of different resolutions are fused in a second weighted fusion module, and the fusion result serves as part of the input of the next up-sampling module FU_(m+1).
The cross-enhancement fusion module fuses the feature maps obtained by the real-number-domain and complex-number-domain convolution processing branches in the up-sampling part and feeds the result back to the real-number-domain convolution processing branch.
The bimodal feature fusion module fuses the features extracted at the last stage of the two branches.
Preferably, there are P cross-enhancement fusion modules, P < N, corresponding to P consecutive up-sampling modules. For the up-sampling modules SU_p and FU_p of equal resolution in the two branches, one cross-enhancement fusion module fuses the feature maps output by SU_p and FU_p; the fused feature is combined with the output of the up-sampling module SU_p and then used as the input of the next up-sampling module SU_(p+1).
Preferably, the real-number-domain convolution processing branch comprises 6 down-sampling modules SD_1 ~ SD_6, 6 up-sampling modules SU_1 ~ SU_6, and 3 first weighted fusion modules SM_1 ~ SM_3.
The outputs of down-sampling module SD_3 and up-sampling module SU_5 are fused in the first weighted fusion module SM_1, and the fusion result serves as part of the input of up-sampling module SU_6;
The outputs of down-sampling module SD_4 and up-sampling module SU_4 are fused in the first weighted fusion module SM_2, and the fusion result serves as part of the input of up-sampling module SU_5;
The outputs of down-sampling module SD_5 and up-sampling module SU_3 are fused in the first weighted fusion module SM_3, and the fusion result serves as part of the input of up-sampling module SU_4.
The complex-number-domain convolution processing branch comprises 6 down-sampling modules FD_1 ~ FD_6, 6 up-sampling modules FU_1 ~ FU_6, and 3 second weighted fusion modules FM_1 ~ FM_3.
The outputs of down-sampling module FD_3 and up-sampling module FU_5 are fused in the second weighted fusion module FM_1, and the fusion result serves as part of the input of up-sampling module FU_6;
The outputs of down-sampling module FD_4 and up-sampling module FU_4 are fused in the second weighted fusion module FM_2, and the fusion result serves as part of the input of up-sampling module FU_5;
The outputs of down-sampling module FD_5 and up-sampling module FU_3 are fused in the second weighted fusion module FM_3, and the fusion result serves as part of the input of up-sampling module FU_4.
There are 4 cross-enhancement fusion modules, corresponding to the up-sampling modules SU_2 ~ SU_5 and FU_2 ~ FU_5.
Preferably, in the real-number-domain convolution processing branch:
the down-sampling performed by the down-sampling modules SD_1 ~ SD_N halves the scale of the input feature map and doubles the number of channels;
the first semantic feature extraction module extracts semantic features through a convolution with kernel size 3, a batch normalization operation, and an activation function;
the up-sampling modules SU_1 ~ SU_N use real-valued convolution up-sampling, increasing the resolution of the feature map layer by layer while reducing the number of channels.
Preferably, in the complex-number-domain convolution processing branch:
the radar echo signal in complex form first passes through a complex convolution with kernel size 3, stride 1, and padding 1, followed by a complex normalization layer and a complex PReLU activation function, and the resulting complex feature map is input to the down-sampling module FD_1;
the down-sampling performed by the down-sampling modules FD_1 ~ FD_N halves the scale of the input feature map and doubles the number of channels;
the second semantic feature extraction module extracts semantic features through a complex convolution with kernel size 3, a complex batch normalization operation, and a complex activation function;
the up-sampling modules FU_1 ~ FU_N use complex-domain convolution up-sampling, increasing the resolution of the feature map layer by layer while reducing the number of channels.
Preferably, the cross-enhancement fusion module takes as inputs the amplitude-phase feature F_K1 from the up-sampling module SU_p and the complex-domain signal feature F_P1 from the up-sampling module FU_p. In the cross-enhancement fusion module, the amplitude-phase feature F_K1 is first mapped to the complex domain for feature alignment and added pixel by pixel to the complex-domain signal feature F_P1; feature fusion and up-sampling are then performed with a complex convolution of kernel size 3 and a transposed convolution block, producing a feature F_P1' whose number of channels is one half that of the complex-domain signal feature F_P1. The amplitude-phase feature F_K1 is then up-sampled by a real-domain transposed convolution with kernel size 3 and added pixel by pixel to F_P1', and the fused multi-modal fusion feature F_U1 is output.
Preferably, the bimodal feature fusion module takes as inputs the amplitude-phase feature F_K2 output by the real-number-domain convolution processing branch and the complex-domain signal feature F_P2 output by the complex-number-domain convolution processing branch.
In the bimodal feature fusion module, the initialized weight factor w is first processed with a sigmoid function to obtain a scale factor λ that is updated with the gradient of the loss function during training; according to formula I, the features extracted by the two branches are added pixel by pixel using the scale factor λ, and the weighted fusion is completed and output through a convolution block with kernel size 1 and an activation function:
F_U2 = (1 - λ) * F_K2 + λ * real(F_P2) + λ * imag(F_P2)     (I)
where real() denotes taking the real part, imag() denotes taking the imaginary part, + denotes pixel-by-pixel addition, and F_U2 denotes the output of the bimodal feature fusion module.
Preferably, the first weighted fusion module and the second weighted fusion module have the same structure; the first weighted fusion module performs its feature operations with real-domain convolution, and the second weighted fusion module performs its feature operations with complex-domain convolution.
Let the feature map of the down-sampling stage be F_E and the feature map of the up-sampling stage be F_D. The first and second weighted fusion modules first extract features from F_E with the corresponding real- or complex-domain operations, adjust the extracted features by bilinear interpolation into a feature F_E' with the same resolution and number of channels as F_D, then perform weighted fusion with F_D and output the fusion result F_out.
Preferably, the weighted fusion realizes adaptive weighted fusion between the features according to formula II and formula III:
γ = sigmoid(v)     (II)
F_out = Up[(1 - γ) * F_E' + γ * F_D]     (III)
where v is the initialized weight factor, γ is the scale factor updated with the gradient of the loss function during network training, F_out is the weighted fusion result, Up denotes increasing the resolution of the feature map with a deconvolution while reducing the number of channels, and + denotes pixel-by-pixel addition.
Advantageous effects:
(1) Whereas the training effect of deep learning algorithms normally depends on a large number of labeled samples, the invention realizes high-resolution target feature extraction with few labeled samples and obtains a more accurate defocus compensation effect.
To obtain a better defocus compensation effect and recognition accuracy, existing algorithms need a large number of labeled samples to train the model. Because space target radar echo data can only be acquired at certain times and in certain regions, collecting a large amount of target data is difficult, and building a labeled data set consumes a great deal of labor and time.
Existing solutions to the few-sample problem mainly train models on large numbers of labeled samples from other fields, transfer the model weights to the few-sample task after training, and then fine-tune the parameters with the few labeled samples available.
The present method is based on the idea of self-supervised learning and divides model training into two stages. First, a self-supervised training data set is constructed from the feature differences between defocused and non-defocused data and fed into the constructed bimodal adaptive feature fusion network to obtain a pre-trained model; in the second training stage, the pre-trained model from the first stage is loaded and the defocus compensation model is optimized with a small number of labeled samples. Experimental results show that the few-sample space target ISAR defocus compensation method based on self-supervised learning does not need a large amount of labeled data and achieves a good experimental effect with few samples.
(2) Design of the self-supervised training data set of the invention
To realize space target defocus compensation with only a small number of labeled samples, after analyzing the characteristics of space target ISAR data, a clustering algorithm first divides the data into defocused and non-defocused echoes according to differences in their feature distributions. Because echo defocusing increases the entropy of the target image, the image entropy is computed; the defocused echoes with larger image entropy are used as the input of the defocus compensation network architecture, and the non-defocused echoes with smaller image entropy serve as the pseudo labels of the self-supervised data. Because space targets imaged at different azimuth angles differ greatly, feeding them directly into network training could increase the compensation error; an image processing algorithm therefore obtains the imaging inclination angles of the defocused and non-defocused target echoes, and limiting the angle range avoids the feature aliasing that an overly large difference in target imaging inclination angle would cause during construction of the self-supervised data set.
(3) Multi-modal feature fusion is realized by effectively exploiting the characteristics of data of different modalities, optimizing the defocus compensation result.
Traditional methods that complete refocusing and identification from the target ISAR image mainly describe surface-level target information: when features are extracted with a real-valued convolution network, only amplitude information is retained and phase information is ignored. In space target feature refocusing or space target detection and identification tasks, the richer the extracted target information and the more features it contains, the better the experimental effect. Experiments show that complex-valued information has better generalization properties and faster learning efficiency, and the imaginary part of the complex radar echo data also contains more target characteristic parameters; on the other hand, modeling with complex information alone easily loses high-resolution features, and the defocus compensation effect still needs improvement.
To fully exploit the high-resolution characteristics contained in real and complex data, the invention designs an amplitude branch and a complex branch for processing data of the different modalities. Considering that shallow features contain more noise, after semantic features are extracted by down-sampling, the deep and shallow features of the same modality are fused with the high-resolution features during up-sampling.
(4) To realize feature enhancement between data of different modalities and make the output feature map contain both clearer texture features and clearer contour features, the invention designs a cross-enhancement fusion module that performs enhancement fusion between data of different modalities; the high-resolution features of the complex feature extraction branch enhance the amplitude features, so the feature maps extracted in that branch retain more texture information, which helps alleviate the model degradation caused by deepening the network.
(5) The invention processes radar complex echo data and performs space target defocus compensation with good timeliness and strong generalization capability.
The iteration threshold of traditional defocus compensation methods must be set manually for different defocused targets through repeated experiments. When the signal-to-noise ratio of the image features is low, the computational load is large and the required timeliness is hard to meet. In addition, the experimental parameter settings of existing methods are often suited only to the defocused echoes of a single target, and the choice of the feature screening threshold may discard important information. The present method effectively combines the powerful feature extraction and processing capabilities of deep convolutional neural networks, computes the optimal compensation parameters through iterative operations and gradient updates adapted to the data characteristics, and obtains a better defocus compensation effect on target echoes with different degrees of defocus. Moreover, in actual use the trained weight parameters can be loaded and the defocused echoes input directly to complete compensation, without repeated manual tuning, giving better timeliness.
(6) Experimental results show that the few-sample space target ISAR defocus compensation method based on self-supervised learning effectively extracts high-resolution target features from the radar echo under defocused conditions, and offers a good defocus compensation effect, strong robustness, and high application value.
Drawings
Fig. 1 is a flow chart of the self-supervised training data set construction.
Fig. 2 is a block diagram of a defocus compensation network.
Fig. 3 is a flowchart of defocus compensation training.
Fig. 4 is a diagram of a basic unit structure of a cross enhanced fusion module.
Fig. 5 is a diagram showing a basic unit structure of a weighted fusion module.
FIG. 6 is a diagram of a bimodal feature fusion module base unit.
FIG. 7 is a flowchart of a defocus compensation test performed by the present invention.
FIG. 8 is a three-dimensional plot of spatial target radar echo data in a defocused condition.
Fig. 9 is a spatial target ISAR image before defocus compensation.
FIG. 10 is a diagram of an ISAR image of a spatial target after defocus compensation using the present invention.
FIG. 11 is a graph of defocus compensation results when the data set partition is not optimized with azimuthal partition.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The resolution of space target feature extraction has a large influence on the space target defocus compensation effect. ISAR defocusing changes the shape and structural features of the target and may cause false alarms and missed detections. Space target radar echo data are difficult to obtain, the data volume is small, and the labeling cost is high, all of which hinder improvement of the defocus compensation effect; the invention effectively exploits the feature information of unlabeled and labeled data and realizes space target ISAR defocus compensation with only a small amount of labeled data. Traditional algorithms depend heavily on expert experience or on parameters selected through extensive experiments and hardly obtain a good defocus compensation effect on different types of targets; the invention therefore designs a bimodal convolutional neural network suited to extracting high-resolution target features from radar complex echoes, overcoming the traditional reliance on manual parameter tuning. Determining the multi-modal data fusion threshold normally requires multiple rounds of testing and manual tuning; the invention designs multi-scale feature fusion weight factors that support error back-propagation and are iteratively and adaptively optimized together with the loss function, greatly reducing the complexity of manual tuning. The feature extraction capability of convolutional neural networks is strongly affected by the amount of labeled data, and existing algorithms easily fall into local optima with little sample data; the invention designs a deep-shallow feature weighted fusion module that makes full use of the self-supervised data while introducing only a small amount of computation, effectively exploits deep and shallow information, obtains a better ISAR space target defocus compensation effect, and helps improve space target identification accuracy under few-sample conditions.
The technical solutions in the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Example one
The present invention uses a convolutional neural network for defocus compensation, and the training of the network is divided into two stages, see fig. 3. The defocus compensation network training scheme and the defocus compensation scheme of the present invention are described in detail below with reference to fig. 3. The method specifically comprises the following steps:
the method comprises the following steps: and obtaining the ISAR radar echo and the ISAR radar image data of the space target.
In the step, the required radar echo and ISAR radar image data can be obtained in a simulation mode without distinguishing defocusing and non-defocusing conditions and satellite types, and some existing real radar echoes can be summarized.
In a preferred embodiment, the radar echo is obtained in a simulation mode. The method specifically comprises the following steps: setting radar parameters, utilizing FEKO electromagnetic simulation software to realize space target three-dimensional surface element analysis, and acquiring structural parameters of each point surface element; inputting the three-dimensional structure information of the space target into MATLAB to realize the analysis and processing of the radar complex echo signal; and acquiring the space target ISAR radar echo under defocusing and non-defocusing conditions according to the target motion parameters. The ISAR radar returns obtained here are not marked defocused/unfocused.
Step two: construct the self-supervised learning data.
Step S201: distinguish defocused from non-defocused echo data among the radar echoes.
In this step, as shown in fig. 1, defocused and non-defocused echoes have different feature distributions, so the radar echoes can be classified by computing inter-class distances and inter-class differences. Meanwhile, the feature distribution of defocused echoes is more dispersed, and this dispersion can be quantified by computing the image entropy of each data cluster. This step is therefore implemented as follows:
The radar echoes are divided into two data clusters with a clustering algorithm (for example, the K-means algorithm); the image entropy of the two clusters is then computed, the cluster with the smaller image entropy is taken as the non-defocused echo data, and the cluster with the larger image entropy is determined to be the defocused echo data.
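The following Python sketch illustrates this partitioning step under stated assumptions: K-means (here taken from scikit-learn) splits the echo amplitude images into two clusters, and the cluster with the larger mean image entropy is treated as the defocused set. Shannon entropy of the normalized amplitude histogram is used as one common choice of image entropy; the patent does not fix a specific formula, and names such as echo_images are illustrative.

import numpy as np
from sklearn.cluster import KMeans

def image_entropy(img, bins=256):
    # Shannon entropy of the normalized amplitude histogram (an assumed definition).
    hist, _ = np.histogram(np.abs(img), bins=bins)
    p = hist / (hist.sum() + 1e-12)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def split_defocused(echo_images):
    # echo_images: array of shape (num_echoes, H, W), amplitude images formed from the echoes.
    feats = np.abs(echo_images).reshape(len(echo_images), -1)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
    # The cluster with the larger mean image entropy is taken as the defocused echo data.
    mean_ent = [np.mean([image_entropy(im) for im in echo_images[labels == k]]) for k in (0, 1)]
    defocused_id = int(np.argmax(mean_ent))
    return echo_images[labels == defocused_id], echo_images[labels != defocused_id]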
Step S202: for the ISAR radar image data, obtain the target imaging azimuth inclination angle with an image processing algorithm; construct first sample data from defocused and non-defocused echo data whose target imaging azimuth inclination angles are consistent, with the non-defocused echo data serving as pseudo labels.
In this step, the imaging azimuth of the space target varies greatly over the observation arc. If all data were fed directly into the neural network for training without dividing them by imaging azimuth, the features extracted at different angles would be mixed by the swing of the target angle, and the target boundary would be lost after defocus compensation. To address this, the step calculates the imaging azimuth inclination angle of the space target with an image processing algorithm: for example, the Canny operator is used for edge detection, and the Hough transform is then used to obtain the azimuth inclination angle of the target.
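A hedged sketch of the azimuth inclination angle estimate is given below, using the OpenCV Canny operator and Hough line transform; the thresholds and the use of the strongest Hough peak are illustrative assumptions, not values given in the patent.

import cv2
import numpy as np

def azimuth_inclination(isar_image):
    # isar_image: 2-D amplitude image; returns the dominant line angle in degrees.
    img8 = cv2.normalize(isar_image, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    edges = cv2.Canny(img8, 50, 150)                   # Canny edge detection
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 60)  # Hough transform on the edge map
    if lines is None:
        return 0.0
    theta = lines[0][0][1]                             # angle of the strongest line, in radians
    return float(np.degrees(theta))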
Then, according to the azimuth inclination angle, defocused and non-defocused echo data with the same or similar azimuth angles are matched into pairs, forming sample data. If one defocused echo is similar in azimuth inclination angle to two non-defocused echoes, only the most similar defocused/non-defocused pair is kept, and the excluded non-defocused echo is then paired with another defocused echo. This operation roughly aligns the target radar echo data within the divided azimuth-angle regions and avoids the post-compensation boundary loss caused by an overly large difference in azimuth angle swing between defocused and non-defocused data.
The roughly aligned defocused and non-defocused echo data are then renamed in sequence (or left unrenamed) and normalized so that they can be fed into the neural network. The sample data set comprises the defocused and non-defocused echo data, with the non-defocused echo data serving as the pseudo labels for self-supervised training.
Through these steps, a space target defocus compensation self-supervised data set is constructed without any labeling; once constructed, it is split into a training sample set and a test sample set in an 8:2 ratio. The training sample set is used for network training, and the test samples are used to evaluate the training result.
Step three: perform first-stage self-supervised training of the defocus compensation network with the training sample set constructed in step two to obtain a pre-trained model.
Step four: for the satellite type requiring defocus compensation, acquire a small number of radar echoes of that type and assign defocused or non-defocused labels to form labeled second sample data. Perform second-stage supervised training of the pre-trained model obtained in step three to obtain the trained defocus compensation network.
In this step, simulation data may be used to obtain radar echoes of that satellite type in the defocused and non-defocused states. The second sample data are used for supervised learning and therefore require labels. Since the second stage only optimizes the pre-trained network, a large number of samples is not required, and a good effect can be obtained with a small number of samples.
In one example, 100 labeled defocused and non-defocused echo samples are input into the defocus compensation network to extract deep features of the complex echoes; during up-sampling, the cross-enhancement fusion module fuses features between the different modalities to obtain fused features with higher-resolution texture and contour information, further optimizing the defocus compensation model and yielding a defocus compensation result that helps improve space target identification accuracy.
At the end of this step, the trained model weights can be loaded, and the test sample is used to test the space target ISAR defocus compensation effect.
Step five: perform defocus compensation of the radar echoes of that satellite type with the trained defocus compensation network.
This flow ends here.
Example two
This embodiment provides a dual-modality defocus compensation network for use in the defocus compensation scheme of the present invention.
The amplitude and phase information in the radar complex echo contain characteristics beneficial to improving the space target defocus compensation effect. As shown in fig. 2, to obtain a better defocus compensation effect with little sample data, and based on experimental tests, the invention designs two branches: a real-number-domain convolution processing branch and a complex-number-domain convolution processing branch. A fusion unit is also designed, comprising at least one cross-enhancement fusion module (hereinafter the CEF module, CEF standing for Cross Enhancement Fusion), a bimodal feature fusion module, at least one first weighted fusion module (hereinafter the first MIXUP module), and at least one second weighted fusion module (hereinafter the second MIXUP module).
The real-number-domain convolution processing branch processes the amplitude and phase characteristics of the radar echoes. The branch uses conventional real-valued convolution to extract the target texture features from the radar echo. It comprises N real-valued down-sampling modules SD_1 ~ SD_N (encoding stage), a first semantic feature extraction module, and N up-sampling modules SU_1 ~ SU_N (decoding stage). The outputs of a down-sampling module SD_n and an up-sampling module SU_m of different resolutions are fused in a first MIXUP module, and the fusion result serves as part of the input of the next up-sampling module SU_(m+1); n ≠ m, n and m range from 1 to N, and N is a positive integer not smaller than 2.
The complex-number-domain convolution processing branch processes the radar echo signal in complex form. The branch uses complex convolution and related operations to mine the contour information of the target in the radar complex echo. It comprises N complex-valued down-sampling modules FD_1 ~ FD_N, a second semantic feature extraction module, and N up-sampling modules FU_1 ~ FU_N. The outputs of a down-sampling module FD_n and an up-sampling module FU_m of different resolutions are fused in a second MIXUP module, and the fusion result serves as part of the input of the next up-sampling module FU_(m+1).
The CEF module fuses the feature maps obtained by the real-number-domain and complex-number-domain convolution processing branches in the up-sampling part and feeds the result back to the real-number-domain convolution processing branch. Specifically, there are P CEF modules, P < N, corresponding to P consecutive up-sampling modules. For the up-sampling modules SU_p and FU_p of the same resolution in the two branches, one CEF module fuses the feature maps output by SU_p and FU_p; the fused feature is combined with the output of the up-sampling module SU_p and then used as the input of the next up-sampling module SU_(p+1).
The bimodal feature fusion module fuses the features extracted at the last stage of the two branches.
In the encoding (down-sampling) stage, multiple down-sampling operations expand the number of channels while reducing the image resolution; in the decoding (up-sampling) stage, same-domain feature fusion, cross-domain feature fusion, and similar processing realize multi-modal feature reuse. Finally, adaptive weighted fusion of the bimodal features yields the compensated features. The bimodal defocus compensation network of the invention processes the radar echoes with two modal branches, preserving both shallow features and deep semantic features while helping to improve the defocus compensation effect.
In the self-supervised training stage, after the sample data are input into the network and preprocessed, ISAR amplitude features containing more texture information and complex features containing more contour information are obtained. These features are fed into the two branches for processing, and same-modality feature enhancement and adaptive weighted fusion between different modalities are performed in the network sampling stages.
This embodiment also specifically provides an example of a dual-branch network architecture having 6 layers of up/down sampling, 4 CEF modules, and 3 pairs of MIXUP modules (each pair includes a first MIXUP module and a second MIXUP module), as shown in fig. 2.
(1) Target amplitude feature processing branch in radar echo: real number domain convolution processing branch
The real-number-domain convolution processing branch comprises 6 down-sampling modules SD_1 ~ SD_6, 6 up-sampling modules SU_1 ~ SU_6, and 3 first MIXUP modules SM_1 ~ SM_3.
The outputs of down-sampling module SD_3 and up-sampling module SU_5 are fused in the first MIXUP module SM_1, and the fusion result serves as part of the input of up-sampling module SU_6;
The outputs of down-sampling module SD_4 and up-sampling module SU_4 are fused in the first MIXUP module SM_2, and the fusion result serves as part of the input of up-sampling module SU_5;
The outputs of down-sampling module SD_5 and up-sampling module SU_3 are fused in the first MIXUP module SM_3, and the fusion result serves as part of the input of up-sampling module SU_4.
The branch coding stage has six downsampling operations in total, and the decoding stage has six upsampling operations in total, so that the input characteristic diagram and the output characteristic diagram have the same resolution.
The branch works as follows. There are six down-sampling layers in total, which reduce the resolution while expanding the number of channels. The amplitude-phase information extracted after two FFT operations on the radar echo forms the input feature; after each down-sampling step, the scale of the input feature map is halved and the number of channels is doubled. Semantic features are extracted from the features after the six real-valued down-sampling steps through a convolution with kernel size 3, a batch normalization operation, and an activation function; the up-sampling modules of the decoding stage then increase the resolution of the feature map layer by layer while reducing the number of channels.
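A minimal PyTorch sketch of these real-valued building blocks follows; the use of a strided convolution for down-sampling and a transposed convolution for up-sampling is an assumption, since the text only fixes the scale and channel behaviour.

import torch.nn as nn

class RealDown(nn.Module):
    # Halve the spatial size and double the channel count.
    def __init__(self, ch_in):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch_in, ch_in * 2, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(ch_in * 2),
            nn.PReLU(),
        )
    def forward(self, x):
        return self.block(x)

class RealUp(nn.Module):
    # Double the spatial size and halve the channel count.
    def __init__(self, ch_in):
        super().__init__()
        self.block = nn.Sequential(
            nn.ConvTranspose2d(ch_in, ch_in // 2, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.BatchNorm2d(ch_in // 2),
            nn.PReLU(),
        )
    def forward(self, x):
        return self.block(x)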
Still referring to fig. 2, the MIXUP modules operate as follows: after three down-sampling steps, the branch input features form a feature map with 32 channels; after the interpolation operation in the MIXUP module shown in fig. 5, it is adaptively weighted and fused with the 8-channel feature map of the decoding stage, preserving more high-resolution features, then added to the up-sampled 8-channel feature map of the decoding stage, and the module outputs a 4-channel feature. Similarly, the 64-channel feature map of the encoding stage and the 16-channel feature map of the decoding stage are fed together into a MIXUP module; after the MIXUP operation, the result is added to the up-sampled 16-channel feature map to give an 8-channel feature. The 128-channel feature map of the encoding stage and the 32-channel feature map of the decoding stage are fed together into a MIXUP module; after the MIXUP operation, the result is added to the up-sampled 32-channel feature map to give a 16-channel feature.
The CEF modules operate as follows: after six down-sampling and convolution steps on the input radar echo data, a feature with 256 channels is obtained. The up-sampling operation uses a transposed convolution with kernel size 3 and a PReLU activation function. After two up-sampling steps, the number of feature channels becomes 64. The 64-channel feature is up-sampled and added to the same-resolution feature map output by the CEF module shown in fig. 4 to give a 32-channel feature; similarly, the 32-channel feature is up-sampled and added to the same-resolution feature output by the CEF module to give a 16-channel feature; the 16-channel feature is up-sampled and added to the same-resolution feature map output by the CEF module to give an 8-channel feature; and the 8-channel feature is up-sampled and added to the same-resolution feature output by the CEF module to give a 4-channel feature. In this embodiment there are 4 CEF modules, corresponding to the up-sampling modules SU_2 ~ SU_5 and FU_2 ~ FU_5.
(2) Target complex feature processing branch in radar echo: complex field convolution processing branch
The complex-number-domain convolution processing branch comprises 6 down-sampling modules FD_1 ~ FD_6, 6 up-sampling modules FU_1 ~ FU_6, and 3 second MIXUP modules FM_1 ~ FM_3.
The outputs of down-sampling module FD_3 and up-sampling module FU_5 are fused in the second MIXUP module FM_1, and the fusion result serves as part of the input of up-sampling module FU_6;
The outputs of down-sampling module FD_4 and up-sampling module FU_4 are fused in the second MIXUP module FM_2, and the fusion result serves as part of the input of up-sampling module FU_5;
The outputs of down-sampling module FD_5 and up-sampling module FU_3 are fused in the second MIXUP module FM_3, and the fusion result serves as part of the input of up-sampling module FU_4.
The branch encoding stage therefore has six down-sampling operations in total and the decoding stage has six up-sampling operations in total, so the input and output feature maps have the same feature resolution. Unlike the amplitude feature processing branch, the convolution modules, normalization operations, activation functions, feature concatenation, feature fusion, and similar operations used by this branch all operate in the complex domain.
The branch works as follows. After preprocessing of the radar complex echo, the features first pass through a complex convolution with kernel size 3, stride 1, and padding 1, followed by a complex normalization layer and a complex PReLU activation function, producing complex features with 8 channels. The branch performs six down-sampling operations in total to reduce the resolution and expand the number of channels; after each down-sampling step, the resolution is halved and the number of channels is doubled. After the six complex down-sampling steps, semantic features are extracted through a complex convolution with kernel size 3, a batch normalization operation, and an activation function; the up-sampling modules then increase the resolution layer by layer while reducing the number of channels.
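The sketch below shows one common way to realize such a complex convolution block in PyTorch, with two real Conv2d layers carrying the real and imaginary parts; the patent does not prescribe this decomposition, and the per-part batch normalization and PReLU are simplified stand-ins for the complex normalization and complex PReLU named in the text.

import torch.nn as nn

class ComplexConv2d(nn.Module):
    # (a+ib)*(x+iy) = (ax - by) + i(ay + bx), implemented with two real convolutions.
    def __init__(self, ch_in, ch_out, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.conv_r = nn.Conv2d(ch_in, ch_out, kernel_size, stride, padding)
        self.conv_i = nn.Conv2d(ch_in, ch_out, kernel_size, stride, padding)
    def forward(self, x_r, x_i):
        return (self.conv_r(x_r) - self.conv_i(x_i),
                self.conv_r(x_i) + self.conv_i(x_r))

class ComplexConvBlock(nn.Module):
    # Complex convolution (k=3, s=1, p=1) + per-part BatchNorm + PReLU.
    def __init__(self, ch_in, ch_out):
        super().__init__()
        self.conv = ComplexConv2d(ch_in, ch_out)
        self.bn_r, self.bn_i = nn.BatchNorm2d(ch_out), nn.BatchNorm2d(ch_out)
        self.act_r, self.act_i = nn.PReLU(), nn.PReLU()
    def forward(self, x_r, x_i):
        r, i = self.conv(x_r, x_i)
        return self.act_r(self.bn_r(r)), self.act_i(self.bn_i(i))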
As shown in fig. 2, after three down-sampling steps the branch input features form a feature map with 64 channels; after the interpolation operation in the MIXUP module shown in fig. 5, it is adaptively weighted and fused with the 16-channel feature map of the decoding stage, preserving more high-resolution features, then added to the up-sampled 16-channel feature map of the decoding stage, and the module outputs an 8-channel feature. Similarly, the 128-channel feature map of the encoding stage and the 32-channel feature map of the decoding stage are fed together into a MIXUP module; after the MIXUP operation, the result is added to the up-sampled 32-channel feature map to give a 16-channel feature. The 256-channel feature map of the encoding stage and the 64-channel feature map of the decoding stage are fed together into a MIXUP module; after the MIXUP operation, the result is added to the up-sampled 64-channel feature map to give a 32-channel feature.
After six down-sampling steps and complex convolution module processing, complex-domain target features with 512 channels are obtained. The up-sampling operation uses a complex transposed convolution with kernel size 3, complex batch normalization, and a complex PReLU activation function. After two up-sampling steps, the number of feature channels becomes 128. Because the contour features contained in the complex echoes are clearer, the complex features with 128, 64, 32, and 16 channels are successively propagated through complex up-sampling to obtain higher-resolution features; at the same time, the feature extraction network architecture designed by the invention uses the CEF module to enhance the edges of the amplitude features at the same feature resolution, so the bimodal network realizes multi-modal feature fusion while effectively improving the defocus compensation accuracy.
After the complex-number-domain convolution processing branch extracts features through encoding and decoding, it outputs an 8-channel feature; together with the 4-channel feature output by the amplitude feature extraction branch, this is fed into the bimodal feature fusion module, which performs adaptive weighted fusion of the features of the different modalities and produces the defocus-compensated high-resolution feature map.
During training of the defocus compensation network, gradients and weights are updated with a mean square error loss; when the loss function has not decreased for 40 iterations, training stops and the self-supervised pre-trained model is obtained. The training hyper-parameters are: a total of 160 iterations, a batch size of 80, a learning rate of 0.005, and optimization with the Adam optimizer.
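A minimal training-loop sketch matching these hyper-parameters is shown below (MSE loss, Adam, learning rate 0.005, batches of 80, up to 160 epochs, stopping after 40 epochs without improvement). The model and the data loader yielding (defocused echo, pseudo label) pairs are assumed to be defined elsewhere; this is an illustration, not the reference implementation.

import torch
import torch.nn as nn

def pretrain(model, train_loader, device="cuda"):
    model = model.to(device)
    optim = torch.optim.Adam(model.parameters(), lr=0.005)
    loss_fn = nn.MSELoss()
    best, stalled = float("inf"), 0
    for epoch in range(160):                           # total number of iterations
        running = 0.0
        for defocused, pseudo_label in train_loader:   # batch size 80
            defocused, pseudo_label = defocused.to(device), pseudo_label.to(device)
            optim.zero_grad()
            loss = loss_fn(model(defocused), pseudo_label)
            loss.backward()
            optim.step()
            running += loss.item()
        if running < best:
            best, stalled = running, 0
        else:
            stalled += 1
            if stalled >= 40:                          # loss has not fallen for 40 iterations
                break
    return model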
Fig. 4 shows the structure of the CEF module. The module takes as inputs the amplitude-phase feature F_K1 of the radar echo and the complex-domain signal feature F_P1. In the CEF module, F_K1 is first mapped to the complex domain for feature alignment and added pixel by pixel to F_P1; feature fusion and up-sampling are then performed with a complex convolution of kernel size 3 and a transposed convolution block, producing a feature F_P1' whose number of channels is one half that of F_P1. In the second step, F_K1 is up-sampled by a real-domain transposed convolution with kernel size 3 and added pixel by pixel to the result F_P1' of the first operation, and the fused multi-modal feature F_U1 is output. Further, the multi-modal fusion feature F_U1 output by the CEF module is added pixel by pixel to the output of the up-sampling module SU_p and used as the input of the up-sampling module SU_(p+1).
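A hedged PyTorch sketch of the CEF idea follows: the amplitude feature F_K1 is aligned with the complex feature F_P1, fused and up-sampled into F_P1', and the up-sampled amplitude feature is added back to give F_U1. For brevity the complex-domain convolutions are approximated here by real convolutions over stacked real/imaginary channels, and the channel widths are assumptions; this is a simplification of the operators described above, not the patent's exact module.

import torch
import torch.nn as nn

class CEF(nn.Module):
    def __init__(self, ch_amp, ch_cplx):
        super().__init__()
        # map the amplitude feature onto the complex feature's channel width for alignment
        self.align = nn.Conv2d(ch_amp, 2 * ch_cplx, kernel_size=1)
        # fusion + up-sampling path (convolution with kernel 3, then a transposed convolution)
        self.fuse_up = nn.Sequential(
            nn.Conv2d(2 * ch_cplx, 2 * ch_cplx, kernel_size=3, padding=1),
            nn.PReLU(),
            nn.ConvTranspose2d(2 * ch_cplx, ch_cplx, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
        )
        # real-domain transposed convolution that up-samples the amplitude feature
        self.amp_up = nn.ConvTranspose2d(ch_amp, ch_cplx, kernel_size=3, stride=2,
                                         padding=1, output_padding=1)

    def forward(self, f_k1, f_p1):
        # f_k1: (B, ch_amp, H, W) amplitude feature; f_p1: (B, ch_cplx, H, W) complex tensor
        f_p1_ri = torch.cat([f_p1.real, f_p1.imag], dim=1)
        f_p1_prime = self.fuse_up(self.align(f_k1) + f_p1_ri)   # fused and up-sampled feature
        return self.amp_up(f_k1) + f_p1_prime                   # multi-modal fusion feature F_U1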
Fig. 5 shows a block diagram of the MIXUP module. During the down-sampling process of the radar complex echo data, shallow information with higher resolution is inevitably lost. Shallow layer features under defocusing or noise interference conditions have larger structural difference with the label truth value, and if shallow layer information greatly influenced by noise is directly connected with a feature map in an up-sampling process, more noise influence feature extraction effect may be introduced. In order to solve the problems, the invention provides a MIXUP module for weighted fusion of deep features and shallow features.
Let the feature map of the encoding stage (down-sampling stage) be F_E and the feature map of the decoding stage (up-sampling stage) be F_D. The MIXUP module first extracts features from F_E with a convolution in the corresponding real or complex domain, adjusts the result by bilinear interpolation into a feature F_E' with the same resolution and channel number as F_D, and then realizes adaptive weighted fusion between the features using formulas (1) and (2).
The initialized weight factor is set to v, and a scale factor γ is obtained through a sigmoid activation function. During gradient updates of the error function, the scale factor γ is updated toward producing features that are closer to the label data and more distinct. The fused deep and shallow features are then further fused and up-sampled with a transposed convolution. In the feature enhancement performed by the MIXUP module, the scale factor and the deep-shallow fusion feature are computed as shown in formulas (1) and (2).
γ = sigmoid(v)  (updated with the gradient)    (1)
F_out = Up[(1-γ)*F_E' + γ*F_D]    (2)
In the formulas, v is the initialized weight factor; γ is the scale factor, updated with the gradient of the loss function during network training; F_out is the weighted fusion result; Up denotes increasing the resolution of the feature map with a transposed convolution while reducing the number of channels; and + denotes pixel-by-pixel addition.
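A real-domain sketch of the MIXUP fusion of formulas (1) and (2) might look as follows; the channel adjustment of F_E is done here with a 3x3 convolution before the bilinear interpolation, which is one possible reading of the description, and all class and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixupFusion(nn.Module):
    """MIXUP weighted fusion, formulas (1)-(2), real-domain variant.
    f_e: encoder (down-sampling) feature, f_d: decoder (up-sampling) feature."""
    def __init__(self, enc_ch, dec_ch, out_ch):
        super().__init__()
        self.proj = nn.Conv2d(enc_ch, dec_ch, 3, padding=1)       # feature extraction on F_E
        self.v = nn.Parameter(torch.zeros(1))                     # initialized weight factor v
        self.up = nn.ConvTranspose2d(dec_ch, out_ch, 3, stride=2,
                                     padding=1, output_padding=1) # Up: raise resolution, reduce channels

    def forward(self, f_e, f_d):
        f_e = self.proj(f_e)
        f_e = F.interpolate(f_e, size=f_d.shape[-2:],
                            mode="bilinear", align_corners=False) # match resolution of F_D -> F_E'
        gamma = torch.sigmoid(self.v)                             # formula (1), updated by the gradient
        fused = (1 - gamma) * f_e + gamma * f_d                   # adaptive weighting of shallow/deep features
        return self.up(fused)                                     # formula (2)
```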
FIG. 6 shows a block diagram of the bimodal feature fusion module. The module takes as inputs the amplitude-phase feature F_K2 of the radar echo and the complex-domain signal feature F_P2. The bimodal feature fusion module first processes an initialized weight factor w with a sigmoid function to obtain a scale factor λ that can be updated with the gradient of the loss function during training; using the scale factor λ, the features extracted by the two branches are added pixel by pixel, and the weighted fusion is then completed by a convolution block with kernel size 1 and an activation function:
λ = sigmoid(w)    (3)
F_U2 = (1-λ)*F_K2 + λ*real(F_P2) + λ*imag(F_P2)    (4)
where real(·) denotes taking the real part, imag(·) denotes taking the imaginary part, + denotes pixel-by-pixel addition, and F_U2 represents the output of the bimodal feature fusion module.
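Formulas (3) and (4) can be sketched directly, assuming the two branch outputs have already been brought to the same resolution and channel count; the 1x1 convolution block and activation follow the weighted sum. The class name and the choice of PReLU as the final activation are assumptions for illustration.

```python
import torch
import torch.nn as nn

class BimodalFusion(nn.Module):
    """Bimodal feature fusion of formulas (3)-(4).
    f_k2: amplitude-phase feature; (f_p2r, f_p2i): real/imaginary parts of the
    complex-domain feature, all with the same resolution and channel count."""
    def __init__(self, ch, out_ch=1):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(1))                  # initialized weight factor w
        self.head = nn.Sequential(nn.Conv2d(ch, out_ch, kernel_size=1),
                                  nn.PReLU())                  # 1x1 convolution block + activation

    def forward(self, f_k2, f_p2r, f_p2i):
        lam = torch.sigmoid(self.w)                            # formula (3), learned with the loss gradient
        fused = (1 - lam) * f_k2 + lam * f_p2r + lam * f_p2i   # formula (4)
        return self.head(fused)
```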
In the model optimization stage, the total number of training iterations is 100, the batch size is 50, the learning rate is 0.005, and an Adam optimizer is used for the optimization.
The defocus compensation method of the present invention is shown in FIG. 7. FIG. 8 is a three-dimensional heat map of the simulated space target radar echo data, and FIG. 9 shows an ISAR image of a space target without defocus compensation. In the model test stage, the test set data are input into the bimodal network, the trained model weights are loaded, and the defocus compensation result is output. The space target ISAR image obtained by applying the defocus compensation of the present invention to the image in FIG. 9 is shown in FIG. 10. The training effect of the self-supervised defocus compensation without the angular-range-division improvement is shown in FIG. 11. The experimental results show that the self-supervised defocus compensation data set construction method provided by the invention improves the defocus compensation effect even when only a small number of labeled samples are available.
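A minimal test-stage sketch, assuming a trained model object and a batch of test echoes are already available; the weight file name and tensor layout are placeholders, not taken from the patent.

```python
import torch

def compensate(model: torch.nn.Module, test_echo: torch.Tensor,
               weight_path: str = "defocus_net.pth") -> torch.Tensor:
    """Load trained weights and run defocus compensation on a batch of test echoes."""
    model.load_state_dict(torch.load(weight_path, map_location="cpu"))
    model.eval()
    with torch.no_grad():
        return model(test_echo)
```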
By analyzing and extracting radar echo features, the method makes effective use of the strong data analysis and processing capability of deep learning, effectively fuses the clear contour features of the complex-valued echo with the strong texture features of the real-domain ISAR image, and achieves a good defocus compensation effect.
The above embodiments only describe the design principle of the present invention; the shapes and names of the components in the description may differ and are not limiting. Therefore, a person skilled in the art may modify or substitute the technical solutions described in the foregoing embodiments; such modifications and substitutions do not depart from the spirit and scope of the present invention.

Claims (10)

1. A few-sample space target ISAR defocus compensation method based on self-supervised learning is characterized by comprising the following steps:
step one, radar echo and ISAR radar image data which do not distinguish defocusing from non-defocusing are obtained;
step two, for the radar echoes, dividing the radar echoes into two data clusters through a clustering algorithm; calculating the image entropy of the two data clusters, determining the data cluster with the larger image entropy as defocused echo data, and determining the data cluster with the smaller image entropy as non-defocused echo data;
for the ISAR radar image data, obtaining a target imaging azimuth inclination angle by using an image processing algorithm; constructing first sample data from defocused echo data and non-defocused echo data having a consistent target imaging azimuth inclination angle, wherein the non-defocused echo data serve as a pseudo label;
step three, performing first-stage self-supervised training on the constructed defocus compensation network by using the first sample data to obtain a pre-training model;
step four, for the satellite type requiring defocus compensation, acquiring radar echoes of that satellite type and, with labeling available, assigning defocused or non-defocused labels to form second sample data; performing second-stage supervised training on the pre-training model to obtain the trained defocus compensation network;
and step five, performing defocus compensation on the radar echoes of that satellite type by using the trained defocus compensation network.
2. The defocus compensation method of claim 1, wherein the clustering algorithm is a K-means algorithm; and the target imaging azimuth inclination angle is obtained by the image processing algorithm as follows: edge detection is performed with a Canny operator, and the target imaging azimuth inclination angle is calculated by Hough transform.
3. The defocus compensation method of claim 1, wherein the defocus compensation network comprises a real-domain convolution processing branch, a complex-domain convolution processing branch, and a fusion unit; the fusion unit comprises at least 1 cross enhancement fusion module, a bimodal feature fusion module, at least 1 first weighted fusion module, and at least 1 second weighted fusion module;
the real-domain convolution processing branch processes the amplitude-phase features of the radar echo; this branch comprises N real-valued down-sampling modules SD_1~SD_N, a first semantic feature extraction module, and N up-sampling modules SU_1~SU_N; wherein the outputs of a down-sampling module SD_n and an up-sampling module SU_m of different resolutions are fused in a first weighted fusion module, and the fusion result serves as a partial input of the next-stage up-sampling module SU_(m+1); n ≠ m, n and m range from 1 to N, and N is a positive integer greater than or equal to 2;
the complex-domain convolution processing branch processes the radar echo signal in complex form; this branch comprises N complex-valued down-sampling modules FD_1~FD_N, a second semantic feature extraction module, and N up-sampling modules FU_1~FU_N; wherein the outputs of a down-sampling module FD_n and an up-sampling module FU_m of different resolutions are fused in a second weighted fusion module, and the fusion result serves as a partial input of the next-stage up-sampling module FU_(m+1);
the cross enhancement fusion module fuses the feature maps obtained in the up-sampling part by the real-domain convolution processing branch and the complex-domain convolution processing branch and feeds the fused feature maps back to the real-domain convolution processing branch;
the bimodal feature fusion module fuses the features extracted by the last stage of the two branches.
4. The defocus compensation method of claim 3, wherein the number of cross enhancement fusion modules is P, P &lt; N, corresponding to P consecutive up-sampling modules; for up-sampling modules SU_p and FU_p of the same resolution in the two branches, one cross enhancement fusion module fuses the feature maps output by SU_p and FU_p, and the fused feature is combined with the output of the up-sampling module SU_p to serve as the input of the next-stage up-sampling module SU_(p+1).
5. The defocus compensation method of claim 4, wherein the real-domain convolution processing branch comprises 6 down-sampling modules SD_1~SD_6, 6 up-sampling modules SU_1~SU_6, and 3 first weighted fusion modules SM_1~SM_3;
the outputs of the down-sampling module SD_3 and the up-sampling module SU_5 are fused in the first weighted fusion module SM_1, and the fusion result serves as a partial input of the up-sampling module SU_6;
the outputs of the down-sampling module SD_4 and the up-sampling module SU_4 are fused in the first weighted fusion module SM_2, and the fusion result serves as a partial input of the up-sampling module SU_5;
the outputs of the down-sampling module SD_5 and the up-sampling module SU_3 are fused in the first weighted fusion module SM_3, and the fusion result serves as a partial input of the up-sampling module SU_4;
the complex-domain convolution processing branch comprises 6 down-sampling modules FD_1~FD_6, 6 up-sampling modules FU_1~FU_6, and 3 second weighted fusion modules FM_1~FM_3;
the outputs of the down-sampling module FD_3 and the up-sampling module FU_5 are fused in the second weighted fusion module FM_1, and the fusion result serves as a partial input of the up-sampling module FU_6;
the outputs of the down-sampling module FD_4 and the up-sampling module FU_4 are fused in the second weighted fusion module FM_2, and the fusion result serves as a partial input of the up-sampling module FU_5;
the outputs of the down-sampling module FD_5 and the up-sampling module FU_3 are fused in the second weighted fusion module FM_3, and the fusion result serves as a partial input of the up-sampling module FU_4;
the number of cross enhancement fusion modules is 4, corresponding to the up-sampling modules SU_2~SU_5 and FU_2~FU_5.
6. The defocus compensation method of any one of claims 3 to 5, wherein, in the real-domain convolution processing branch:
the down-sampling modules SD_1~SD_N each halve the scale of the input feature map and double the number of channels;
the first semantic feature extraction module extracts semantic features with a convolution of kernel size 3, a batch normalization operation, and an activation function;
the up-sampling modules SU_1~SU_N perform up-sampling with real convolutions, increasing the resolution of the feature map layer by layer while reducing the number of channels;
in the complex-domain convolution processing branch:
for a radar echo signal in a complex form, firstly, a complex convolution with a convolution kernel of 3, a step length of 1 and an expansion number of 1 is carried out, then a complex normalization layer and a complex PRelu activation function are carried out, a complex characteristic diagram is obtained and is input into a down-sampling module FD 1
The down-sampling module FD 1 ~FD N The down-sampling processing of (1) changes the scale of the input feature map into one half of the scale before processing, and expands the number of channels into two times of the scale before processing;
the second semantic feature extraction module realizes complex convolution with kernel 3, complex batch normalization operation and complex activation function processing to extract semantic features;
the up-sampling module FU 1 ~FU N And (3) complex field convolution upsampling is adopted, so that the resolution of the feature map is increased layer by layer, and the number of channels is reduced.
7. The defocus compensation method of any one of claims 3 to 5, wherein the cross enhancement fusion module takes as inputs the amplitude-phase feature F_K1 of an up-sampling module SU_p and the complex-domain signal feature F_P1 of an up-sampling module FU_p; in the cross enhancement fusion module, the amplitude-phase feature F_K1 is first mapped to the complex domain for feature alignment and added pixel by pixel to the complex-domain signal feature F_P1, after which feature fusion and up-sampling are performed with a complex convolution of kernel size 3 and a transposed convolution block, outputting a feature F_P1' whose channel number is one half that of the complex-domain signal feature F_P1; the amplitude-phase feature F_K1 is then up-sampled with a real-domain transposed convolution of kernel size 3 and added pixel by pixel to the feature F_P1', outputting the fused multi-modal fusion feature F_U1.
8. The defocus compensation method of any one of claims 3 to 5, wherein the bimodal feature fusion module takes as inputs the amplitude-phase feature F_K2 output by the real-domain convolution processing branch and the complex-domain signal feature F_P2 output by the complex-domain convolution processing branch;
in the bimodal feature fusion module, an initialized weight factor w is first processed with a sigmoid function to obtain a scale factor λ that can be updated with the gradient of the loss function during training; according to formula I, the features respectively extracted by the two branches are added pixel by pixel using the scale factor λ, and the result is output after weighted fusion is completed by a convolution block with kernel size 1 and an activation function;
F_U2 = (1-λ)*F_K2 + λ*real(F_P2) + λ*imag(F_P2)    (Formula I)
where real(·) denotes taking the real part, imag(·) denotes taking the imaginary part, + denotes pixel-by-pixel addition, and F_U2 represents the output of the bimodal feature fusion module.
9. The defocus compensation method of any one of claims 3 to 5, wherein the first weighted fusion module and the second weighted fusion module have the same structure, the first weighted fusion module completing its feature operations with real-domain convolutions and the second weighted fusion module with complex-domain convolutions;
let the feature map of the down-sampling stage be F_E and the feature map of the up-sampling stage be F_D; the first weighted fusion module and the second weighted fusion module first extract features from the feature map F_E with the corresponding real- or complex-domain operation, adjust the extracted features by bilinear interpolation into a feature F_E' with the same resolution and channel number as the feature map F_D, then perform weighted fusion with F_D, and output the fusion result F_out.
10. The defocus compensation method of claim 9, wherein the weighted fusion is: adaptive weighted fusion between the features is realized using formula II and formula III;
γ = sigmoid(v)    (Formula II)
F_out = Up[(1-γ)*F_E' + γ*F_D]    (Formula III)
wherein v is the initialized weight factor; γ is the scale factor, which can be updated with the gradient of the loss function during network training; F_out is the weighted fusion result; Up denotes increasing the resolution of the feature map with a transposed convolution while reducing the number of channels; and + denotes pixel-by-pixel addition.