CN116008982A - Radar target identification method based on cross-scale feature aggregation network - Google Patents


Info

Publication number
CN116008982A
CN116008982A
Authority
CN
China
Prior art keywords
time, frequency, feature, radar, network
Prior art date
Legal status
Pending
Application number
CN202211607715.4A
Other languages
Chinese (zh)
Inventor
包敏
邹富
贾伯阳
郭亮
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202211607715.4A priority Critical patent/CN116008982A/en
Publication of CN116008982A publication Critical patent/CN116008982A/en
Pending legal-status Critical Current

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02A: Technologies for adaptation to climate change
    • Y02A 90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention discloses a radar target identification method based on a cross-scale feature aggregation network, which comprises the following steps: collecting the skin vibration echo generated by the respiration of a stationary target with a radar to obtain a radar echo signal; processing the radar echo signal with the SSST algorithm to generate a time-frequency image containing respiratory features; dividing the time-frequency image into different scales and aggregating them with a feature pyramid network, and extracting micro-Doppler feature maps with a CSFA network from the feature maps of each scale output by the feature pyramid; and inputting the aggregated micro-Doppler feature map into a SoftMax classifier to obtain the identification result of the stationary target. By processing the radar echo signals of stationary targets with the SSST algorithm, the invention suppresses clutter such as multipath interference and generates a high-resolution time-frequency image with respiratory features; at the same time, extracting the micro-Doppler feature maps from the time-frequency image through the cross-scale feature aggregation network improves both the identification accuracy of the target and the efficiency of radar target detection.

Description

Radar target identification method based on cross-scale feature aggregation network
Technical Field
The invention belongs to the technical field of radar signal processing, and particularly relates to a radar target identification method based on a cross-scale feature aggregation network.
Background
The ultra-wideband continuous-wave radar is an emerging sensing technology. Its transmitted electromagnetic signal has strong penetrating capability and high resolution, can reflect characteristics such as the energy distribution of the target's scattering points, and can be used to detect vital-sign signals.
To avoid the degradation of target identification accuracy caused by interference in the radar echo signal, such as multipath clutter introduced by the environment of a stationary target, it is important to improve the resolution of the target image. Currently, networks based on MSRA (multi-scale residual attention) are adopted in the related art to identify stationary targets; the method comprises three parts: radar signal processing, a multi-scale learning architecture, and a residual attention learning mechanism. However, this method identifies a stationary target from a two-dimensional pseudo-color image, suppresses clutter interference poorly, and struggles to generate a high-resolution image, so that many invalid features are extracted from the image and the accuracy of target identification is reduced.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a radar target identification method based on a cross-scale feature aggregation network. The technical problems to be solved by the invention are realized by the following technical scheme:
the invention provides a radar target identification method based on a trans-scale feature aggregation network, which comprises the following steps:
collecting skin vibration echo generated by static target respiration by using a radar to obtain a radar echo signal;
processing the radar echo signal using a synchrosqueezing S-transform (SSST) time-frequency analysis algorithm to generate a time-frequency image containing respiratory features;
dividing and aggregating the time-frequency images with different scales by using a feature pyramid network, and extracting micro Doppler feature images from feature images with different scales output by the feature pyramid network by using a CSFA network;
and after generating an aggregate micro-Doppler feature map based on the micro-Doppler feature map, inputting the aggregate micro-Doppler feature map to a SoftMax classifier to obtain an identification result of the radar target.
In one embodiment of the present invention, before the step of generating the aggregated micro-Doppler feature map based on the micro-Doppler feature maps, the method further comprises:
acquiring a pre-trained ResNet-18 model and loading the time-frequency image dataset model file;
the step of generating the aggregated micro-Doppler feature map based on the micro-Doppler feature maps and inputting it to the SoftMax classifier to obtain the identification result of the radar target comprises:
after generating the aggregated micro-Doppler feature map based on the micro-Doppler feature maps, inputting the aggregated micro-Doppler feature map into the SoftMax classifier, so that the SoftMax classifier compares the aggregated micro-Doppler feature map against the time-frequency image dataset model file to obtain the identification result of the radar target.
In one embodiment of the present invention, before the step of processing the radar echo signal using the synchrosqueezing S-transform time-frequency analysis algorithm to generate a time-frequency image containing respiratory features, the method further includes:
preprocessing the radar echo signal to obtain a radar echo signal containing respiratory characteristics;
and generating a two-dimensional range profile through time accumulation based on the radar echo signals containing respiratory features.
In one embodiment of the present invention, the step of processing the radar echo signal using the synchrosqueezing S-transform time-frequency analysis algorithm to generate a time-frequency image containing respiratory features includes:
processing the effective channel data in the two-dimensional range profile using the synchrosqueezing S-transform time-frequency analysis algorithm, and aggregating all the effective channel data into one time-frequency image.
In one embodiment of the invention, the radar echo signal is processed using the synchrosqueezing S-transform time-frequency analysis algorithm according to the following formula:

$$\mathrm{SSST}(f_c, b) = \frac{1}{\Delta f_c}\sum_{f_k:\,|f_c(f_k, b)-f_c|\le \Delta f_c/2} \mathrm{ST}(f_k, b)\,\Delta f_k$$

where $f_k$, $f_c$, and $\Delta f_c$ denote the discrete frequency of the S-transform, the center frequency of the squeezing interval, and the bandwidth of the squeezing interval, respectively; $b$ denotes the time-axis shift parameter; $\Delta f_k = f_k - f_{k-1}$ and $\Delta f_c = f_c - f_{c-1}$; $\mathrm{ST}(f_k, b)$ denotes the S-transform of the radar echo signal; and $f_c(f_k, b)$ denotes the instantaneous frequency of the radar echo signal.
In one embodiment of the invention, the CSFA network includes a channel attention model and a spatial attention model;
the step of dividing the time-frequency image into different scales and aggregating them using the feature pyramid network, and extracting micro-Doppler feature maps with the CSFA network from the feature maps of each scale output by the feature pyramid network, comprises:
dividing the time-frequency image into different scales using the feature pyramid network;
inputting the input feature map $F_i$ ($i = 2, 3, 4, 5$) of each scale into the channel attention model to obtain a one-dimensional channel attention feature map $M_{c1}$, then multiplying $M_{c1}$ with the input feature map $F_i$ to obtain the channel feature $F_{c1}$, and inputting the channel feature $F_{c1}$ into the spatial attention model to obtain a two-dimensional spatial attention feature map $M_{s1}$;
multiplying the channel feature $F_{c1}$ with the two-dimensional spatial attention feature map $M_{s1}$ to obtain the spatial feature $F_s$;
multiplying the input feature map $F_i$ with the spatial feature $F_s$ to determine the feedback variable $F_{FB}$, inputting the feedback variable $F_{FB}$ into the channel attention model to obtain a one-dimensional channel attention feature map $M_{c2}$, then multiplying $M_{c2}$ with the feedback variable $F_{FB}$ to obtain the channel feature $F_{c2}$;
inputting the channel feature $F_{c2}$ into the spatial attention model to obtain a two-dimensional spatial attention feature map $M_{s2}$;
multiplying the channel feature $F_{c2}$ with the two-dimensional spatial attention feature map $M_{s2}$ to obtain the micro-Doppler feature map $\tilde{F}_i$ of the input feature map $F_i$ at the current $i$-th-layer scale.
In one embodiment of the invention, the channel attention model comprises: a first max-pooling layer, a first average-pooling layer, and a multi-layer fully connected neural network;
the step of inputting the feature map $F_i$ of each scale of the feature pyramid network into the channel attention model to obtain the channel feature $F_{c1}$ comprises:
transmitting the input feature map $F_i$ to the first max-pooling layer and the first average-pooling layer, respectively, to obtain a first max-pooled feature map $F_{max}^i$ and a first average-pooled feature map $F_{avg}^i$;
feeding the first max-pooled feature map $F_{max}^i$ and the first average-pooled feature map $F_{avg}^i$ forward into the multi-layer fully connected neural network with ReLU activation, and adding the two output features of the network to obtain the one-dimensional channel attention feature map $M_{c1}$; multiplying $M_{c1}$ with the current input feature map $F_i$ yields the channel feature $F_{c1}$.
In one embodiment of the invention, the spatial attention model comprises: a second max-pooling layer, a second average-pooling layer, and a 5 × 5 convolution kernel;
the step of inputting the channel feature $F_{c1}$ into the spatial attention model to obtain the two-dimensional spatial attention feature map $M_{s1}$ comprises:
inputting the channel feature $F_{c1}$ to the second max-pooling layer and the second average-pooling layer, respectively, to obtain a two-dimensional max-pooled feature map $F_{max}^s$ and a two-dimensional average-pooled feature map $F_{avg}^s$;
merging the two feature maps into one matrix and reducing its dimension with a convolution layer whose kernel is 5 × 5 to generate the two-dimensional spatial attention feature map $M_{s1}$.
In one embodiment of the invention, the feedback variable $F_{FB} = F_i \odot F_s$, where $\odot$ denotes the dot-product (element-wise multiplication) operation.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a radar target identification method based on a trans-scale feature aggregation network, which processes radar echo signals of targets through a synchronous extrusion S-transformation time-frequency analysis algorithm, can inhibit clutters such as multipath interference and the like, and further generates a high-resolution time-frequency diagram with breathing features; meanwhile, the micro Doppler feature map in the time-frequency map is extracted through the trans-scale feature aggregation network, so that the identification accuracy of the target is improved, and the method can be used for detecting vital sign signals.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a flowchart of a method for radar target identification based on a cross-scale feature aggregation network according to an embodiment of the present invention;
FIG. 2 is a time-frequency image acquisition diagram provided by an embodiment of the present invention;
FIG. 3 is a comparison chart of different time-frequency images provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the overall architecture of a method for radar target recognition based on a cross-scale feature aggregation network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a cross-scale feature aggregation network provided by an embodiment of the present invention;
FIG. 6 is a comparison of micro Doppler feature maps of time-frequency images provided by an embodiment of the present invention;
FIG. 7a is a schematic diagram of an experimental scenario provided by an embodiment of the present invention;
FIG. 7b is a schematic diagram of another experimental scenario provided by an embodiment of the present invention;
FIG. 8 is a comparison chart of convergence curves of time-frequency analysis algorithms provided by an embodiment of the present invention;
FIG. 9 is a graph comparing accuracy curves of human-body recognition results of different training models provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
FIG. 1 is a flowchart of a radar target identification method based on a cross-scale feature aggregation network according to an embodiment of the present invention. As shown in FIG. 1, an embodiment of the present invention provides a radar target identification method based on a cross-scale feature aggregation network, including:
s1, acquiring skin micro-vibration echo caused by target respiration by using a radar to obtain a radar echo signal;
s2, processing radar echo signals by using a synchronous extrusion S-transformation time-frequency analysis algorithm, and generating a time-frequency image containing breathing characteristics;
s3, dividing and aggregating time-frequency images with different scales by using a feature pyramid network, and extracting micro Doppler feature images from feature images with various scales output by the feature pyramid network by using a CSFA (cross-scale feature aggregation, trans-scale feature aggregation) network;
s4, after the aggregate micro Doppler feature map is generated based on the micro Doppler feature map, the aggregate micro Doppler feature map is input into a softMax classifier, and an identification result of a radar detection target is obtained.
It should be understood that, because the heartbeat vibration of the target is weak, the features extracted from the heartbeat signal carry large errors that affect the accuracy of target identification; this embodiment therefore selects the respiratory signal caused by the rise and fall of the target's chest as the target radar echo signal. Specifically, the radar obtains the radar echo signal by collecting the skin micro-vibration echo caused by target respiration, and performs preprocessing operations on it such as local-oscillator mixing, Hamming-window filtering, inverse fast Fourier transform, moving target indication, and constant false-alarm rate detection. The radar echoes are accumulated over time to produce a two-dimensional range profile, which can show the change of the target position over time but cannot represent the Doppler feature information of the target and has relatively weak noise immunity. In processing radar echo signals, the STFT (short-time Fourier transform) algorithm is usually adopted to acquire micro-Doppler features; however, the resolution limits of this algorithm cause spectral leakage, and the time-frequency image suffers a blurring effect. In view of this, the embodiment of the present invention generates a high-resolution time-frequency image containing respiratory features using the SSST (synchrosqueezing S-transform) algorithm. The SSST is a continuous, invertible procedure for identifying and extracting oscillatory components such as time-varying frequency and amplitude from a uniformly sampled signal; it overcomes the time and frequency spreading of conventional time-frequency analysis methods and improves the spectral resolution, which benefits the subsequent extraction of a higher-resolution micro-Doppler feature map.
Meanwhile, compared with the time-frequency image generated by STFT, the time-frequency image generated by SSST has obviously improved frequency focusing property, and the time-frequency image of the respiratory feature has clear envelope and no frequency mutation item.
Optionally, using the SSST time-frequency analysis algorithm, the radar echo signal is processed according to the following formula:
$$\mathrm{SSST}(f_c, b) = \frac{1}{\Delta f_c}\sum_{f_k:\,|f_c(f_k, b)-f_c|\le \Delta f_c/2} \mathrm{ST}(f_k, b)\,\Delta f_k$$

where $f_k$, $f_c$, and $\Delta f_c$ denote the discrete frequency of the S-transform, the center frequency of the squeezing interval, and the bandwidth of the squeezing interval, respectively; $b$ denotes the time-axis shift parameter; $\Delta f_k = f_k - f_{k-1}$ and $\Delta f_c = f_c - f_{c-1}$; $\mathrm{ST}(f_k, b)$ denotes the S-transform of the radar echo signal; and $f_c(f_k, b)$ denotes the instantaneous frequency of the radar echo signal. This expression superimposes the spectrum within the frequency interval $[f_c - 0.5\Delta f_c,\, f_c + 0.5\Delta f_c]$ around the transform center frequency $f_c$ and places it at the center frequency $f_c$; resolution is enhanced by compressing the S-transform time spectrum of a frequency interval onto a single frequency point.
Specifically, the stepped-frequency waveform $s(t)$ transmitted by the radar may be written as follows:

$$s(t) = \sum_{n=0}^{N-1} \mathrm{rect}\!\left(\frac{t - nT}{T}\right) e^{j2\pi (f_L + n\Delta f)t}$$

where $N$ denotes the total number of frequency points of the stepped-frequency signal, $T$ the duration of each frequency point, $f_L$ the starting carrier frequency, and $\Delta f$ the frequency step; the function $\mathrm{rect}(\cdot)$ is defined as follows:

$$\mathrm{rect}(t) = \begin{cases} 1, & 0 \le t \le 1 \\ 0, & \text{otherwise} \end{cases}$$
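The stepped-frequency waveform above can be sketched numerically as follows (all parameter values are illustrative only):

```python
import numpy as np

def stepped_frequency_waveform(t, N, T, f_L, delta_f):
    """s(t) = sum_n rect((t - nT)/T) * exp(j*2*pi*(f_L + n*delta_f)*t)."""
    s = np.zeros_like(t, dtype=complex)
    for n in range(N):
        gate = ((t - n * T) >= 0) & ((t - n * T) < T)  # rect((t - nT)/T)
        s += gate * np.exp(2j * np.pi * (f_L + n * delta_f) * t)
    return s

t = np.arange(0.0, 3.0, 0.01)  # three frequency steps of duration T = 1 s
s = stepped_frequency_waveform(t, N=3, T=1.0, f_L=1.0, delta_f=0.5)
```

Because the rect gates partition the burst in time, the waveform has unit modulus over its whole support.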
The radar echo signal $r(t)$ reflected from the target may be expressed as follows:

$$r(t) = \sum_{i=1}^{K} \sigma_i\, s(t - \tau_i) + r_\omega(t) + r_n(t)$$

where $K$ denotes the number of scattering points of the target, $\sigma_i$ the scattering coefficient of the $i$-th scattering point, $\tau_i$ the round-trip time of the $i$-th scattering point, and $r_\omega(t)$ and $r_n(t)$ denote the echo reflected by the wall and the other environmental interference and noise, respectively.
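The echo model above is a superposition of delayed, scaled copies of the transmitted waveform. A sketch (the scattering amplitudes `sigmas` follow the $\sigma_i$ notation assumed above; the wall and noise terms are omitted):

```python
import numpy as np

def radar_echo(t, s, sigmas, taus):
    """r(t) = sum_i sigma_i * s(t - tau_i); wall echo and noise terms omitted."""
    r = np.zeros_like(t, dtype=complex)
    for sigma, tau in zip(sigmas, taus):
        r += sigma * s(t - tau)
    return r

# One scattering point with unit amplitude and zero delay reproduces s(t).
tone = lambda t: np.exp(2j * np.pi * 5.0 * t)
t = np.linspace(0.0, 1.0, 64, endpoint=False)
r = radar_echo(t, tone, sigmas=[1.0], taus=[0.0])
```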
First, the S-transform of the radar echo signal $r(t)$ may be expressed as follows:

$$\mathrm{ST}(f, b) = \int_{-\infty}^{\infty} r(t)\,\frac{|f|}{\sqrt{2\pi}}\, e^{-\frac{(b-t)^2 f^2}{2}}\, e^{-j2\pi f t}\, dt$$

where $t$ denotes time, $f$ frequency, and $b$ the time-axis shift parameter. Ideally the time-spectral energy of the signal is concentrated at $f = f_0$, but the actual time spectrum exhibits a spurious band near $f_0$; the instantaneous-frequency expression of the radar echo signal $r(t)$ is therefore:

$$f_c(f_k, b) = f_k + \frac{1}{2\pi}\,\Im\!\left\{\frac{1}{\mathrm{ST}(f_k, b)}\,\frac{\partial\, \mathrm{ST}(f_k, b)}{\partial b}\right\}$$
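The instantaneous-frequency estimate above can be sketched on a discrete ST grid with a finite-difference phase derivative. This is a sketch under the pure-tone assumption used in the derivation; the grid sizes are illustrative:

```python
import numpy as np

def instantaneous_frequency(ST, freqs, db):
    """f_c(f_k, b) = f_k + Im{ (d/db ST) / ST } / (2*pi), per frequency row."""
    dST = np.gradient(ST, db, axis=1)                 # finite-difference d/db
    safe = np.where(ST != 0, ST, 1)                   # avoid division by zero
    ratio = np.where(ST != 0, dST / safe, 0.0)
    return freqs[:, None] + np.imag(ratio) / (2 * np.pi)

# For a pure tone at f0, ST(f_k, b) ~ exp(j*2*pi*(f0 - f_k)*b) up to a real
# window factor, so every row's estimate should collapse onto f0.
f0, db = 5.0, 1e-3
freqs = np.arange(0.0, 11.0)
b = np.arange(0.0, 1.0, db)
ST = np.exp(2j * np.pi * (f0 - freqs[:, None]) * b[None, :])
f_inst = instantaneous_frequency(ST, freqs, db)
```

Every frequency row recovers an estimate close to 5 Hz, which is exactly what makes the squeezing step that follows concentrate the spectrum.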
Finally, the synchrosqueezing S-transform of the radar echo signal $r(t)$ can be expressed as follows:

$$\mathrm{SSST}(f_c, b) = \frac{1}{\Delta f_c}\sum_{f_k:\,|f_c(f_k, b)-f_c|\le \Delta f_c/2} \mathrm{ST}(f_k, b)\,\Delta f_k$$

From the $\mathrm{SSST}(f_c, b)$ calculated according to the above formula, the corresponding time-frequency image with the respiratory feature is obtained.
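The squeezing step above, summing the ST values whose instantaneous-frequency estimate falls within half a squeezing bandwidth of each center frequency, can be sketched as follows (uniform frequency grids are assumed, so the $\Delta f_k / \Delta f_c$ factors cancel and "within half a bandwidth" reduces to nearest-bin reassignment):

```python
import numpy as np

def synchrosqueeze(ST, f_inst, centers):
    """SSST(f_c, b): reassign each ST(f_k, b) to the nearest center frequency."""
    out = np.zeros((len(centers), ST.shape[1]), dtype=ST.dtype)
    for k in range(ST.shape[0]):
        for b in range(ST.shape[1]):
            c = np.argmin(np.abs(centers - f_inst[k, b]))  # squeezing bin
            out[c, b] += ST[k, b]
    return out

# Energy smeared over rows 4..6, whose IF estimates all point at 5 Hz,
# collapses onto the single 5 Hz row.
centers = np.linspace(0.0, 10.0, 11)
ST = np.zeros((11, 4))
ST[4:7, :] = 1.0
f_inst = np.full((11, 4), 5.0)
SSST = synchrosqueeze(ST, f_inst, centers)
```

Reassignment conserves the total energy per time column while sharpening it in frequency, which is the frequency-focusing effect described in the text.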
The step of processing the radar echo signal using the synchrosqueezing S-transform time-frequency analysis algorithm to generate a time-frequency image containing respiratory features is preceded by the following steps:
preprocessing the radar echo signal to obtain a radar echo signal containing respiratory characteristics;
and generating a two-dimensional range profile through time accumulation based on the radar echo signals containing respiratory features.
FIG. 2 is a time-frequency image acquisition diagram provided by an embodiment of the present invention. Specifically, as shown in FIG. 2, this embodiment obtains a radar echo signal containing respiratory features by performing preprocessing operations on the radar echo signal such as local-oscillator mixing, Hamming-window filtering, inverse fast Fourier transform, moving target indication, and constant false-alarm rate detection. The radar echo signals containing respiratory features are accumulated over time to generate a two-dimensional range profile, which can display the change of the target position over time but cannot represent the Doppler feature information of the target and has weak noise immunity. The data of the effective channels are selected from the two-dimensional range profile and processed with the SSST time-frequency analysis algorithm to obtain the time-frequency image of each channel; the data of all effective channels are then aggregated to generate the final time-frequency image.
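The window-then-inverse-FFT part of the preprocessing chain can be sketched as follows. The Hamming window and inverse FFT come from the text; the array layout and the point-scatterer test signal are illustrative assumptions:

```python
import numpy as np

def range_profile(echo_freq):
    """Rows: slow time (sweeps); cols: frequency bins -> range bins after IFFT."""
    window = np.hamming(echo_freq.shape[1])          # Hamming-window filtering
    return np.abs(np.fft.ifft(echo_freq * window, axis=1))

# A single stationary scatterer in range bin 3 produces a linear phase ramp
# across the frequency bins of every sweep.
n_sweeps, n_bins, bin_true = 8, 16, 3
k = np.arange(n_bins)
echo = np.tile(np.exp(-2j * np.pi * k * bin_true / n_bins), (n_sweeps, 1))
profile = range_profile(echo)
peak_bins = profile.argmax(axis=1)                   # should all sit at bin 3
```

Stacking the per-sweep profiles row by row yields the two-dimensional range profile described above.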
FIG. 3 is a comparison chart of different time-frequency images provided by an embodiment of the present invention, where the first row shows time-frequency images generated with the SSST time-frequency analysis algorithm and the second row shows time-frequency images generated with the STFT time-frequency analysis algorithm; the target corresponding to column (a) is a stationary person, column (b) a stationary pig, column (c) a stationary dog, and column (d) a stationary cat. Clearly, as shown in FIG. 3, the time-frequency images generated by SSST have sharper envelopes than those generated by STFT and markedly better frequency focusing; at the same time, the SSST time-frequency analysis algorithm alleviates the blurring effect of the time-frequency image, and each oscillatory component of the radar echo signal is well concentrated in the time-frequency image.
Although the targets can be distinguished according to different respiratory features in the time-frequency image, there is a problem of overlapping intervals between respiratory features of each target, resulting in difficulty in identifying different targets. Therefore, in order to make the respiratory features contained in the generated time-frequency image meet the requirement of target recognition, the embodiment of the invention needs to further extract the micro-doppler feature map in the time-frequency image through the neural network.
FIG. 4 is a schematic diagram of the overall architecture of a radar target identification method based on a cross-scale feature aggregation network according to an embodiment of the present invention, and FIG. 5 is a schematic structural diagram of the cross-scale feature aggregation network according to an embodiment of the present invention. Optionally, referring to FIGS. 4-5, the CSFA network includes a channel attention model and a spatial attention model;
in step S3, the steps of dividing the time-frequency image into different scales and aggregating them using the feature pyramid network, and extracting micro-Doppler feature maps with the CSFA network from the feature maps of each scale output by the feature pyramid network, include:
s301, dividing time-frequency images into different scales by utilizing a characteristic pyramid network;
s302, dividing the input feature map F of each scale i Inputting the channel attention model to obtain a one-dimensional channel attention feature map M c1 Thereafter, one-dimensional channel attention profile M c1 And input feature map F i Multiplying to obtain channel characteristic F c1 Channel characteristics F c1 Input into a spatial attention model to obtain a two-dimensional spatial attention characteristic diagram M s1
S303, characterizing the channel F c1 And two-dimensional spatial attention profile M s1 Multiplication to obtain a spatial feature F s
S304, according to the input feature diagram F i Spatial characteristics F s Multiplying to determine feedback variable F FB And will feed back variable F FB Inputting the model into a channel attention model to obtain a one-dimensional channel attention characteristic diagram M c2 Thereafter, one-dimensional channel attention profile M c2 And feedback variable F FB Multiplying to obtain channel characteristic F c2
S305, channel characteristic F c2 Input into a spatial attention model to obtain a two-dimensional spatial attention characteristic diagram M s2
S306, channel characteristic F c2 And two-dimensional spatial attention profile M s2 Multiplying to obtain an input feature map F of the current ith layer scale i Is a micro Doppler characteristic diagram
Figure BDA0003999295820000091
It should be understood that the breathing characteristics included in the time-frequency image are time-varying, and the problem of overlapping intervals in the time-frequency image is caused by the limitation of the distance resolution, so that the key to solve the problem of object identification is to extract the differential breathing characteristics of different objects from the time-frequency image.
In this embodiment, after the micro-Doppler feature maps are obtained, the micro-Doppler feature map $\tilde{F}_i$ of the input feature map $F_i$ at the current $i$-th-layer scale is further aggregated by upsampling to obtain the aggregated micro-Doppler feature map $\mathrm{Map}_{i-1}$. Further, the aggregated micro-Doppler feature map is input to the SoftMax classifier to obtain the radar target identification result.
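The top-down aggregation by upsampling can be sketched as follows. The text only specifies "upsampling"; nearest-neighbour 2x upsampling and elementwise addition are assumed here as the aggregation rule:

```python
import numpy as np

def aggregate_level(top_map, lateral_map):
    """Map_{i-1} sketch: upsample level-i features 2x, add level-(i-1) features."""
    up = top_map.repeat(2, axis=-2).repeat(2, axis=-1)  # nearest-neighbour 2x
    assert up.shape == lateral_map.shape                # scales must now match
    return up + lateral_map

# Toy pyramid: a (C=1, 2, 2) top-level map merged into a (1, 4, 4) lateral map.
top = np.ones((1, 2, 2))
lateral = np.full((1, 4, 4), 0.5)
merged = aggregate_level(top, lateral)
```

Applying this from P5 down to P2 yields the aggregated micro-Doppler feature map that feeds the classifier.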
The inventors found during their research that deep CNNs outperform shallow CNNs and can capture micro-Doppler features hierarchically from the time-frequency image to obtain good classification performance. However, deep CNNs suffer from accuracy saturation, which hinders the process of feature optimization. To solve this problem, this embodiment uses a deep residual learning network to extract the feature information of the image; at the same time, the convolution features of different layers provide complementary feature information, which helps improve the recognition performance for people and animals.
As shown in FIG. 4, ResNet-18 has good feature extraction capability on smaller image datasets with relatively small performance overhead, so in this embodiment the ResNet-18 model is used as the backbone to extract convolution features, and the feature pyramid network extracts micro-Doppler feature maps from the time-frequency image. Specifically, as shown in FIG. 5, the bottom-up pipeline is the forward pass of the network, and each extracted feature layer is the output of the last layer at the same stage. The time-frequency image is the input of the ResNet-18-based feature pyramid network, whose output is a 4-layer feature pyramid, i.e., P2, P3, P4, P5, obtained by applying strides {4, 8, 16, 32}. In the top-down pipeline, the feature maps output by the CSFA network are aggregated by upsampling the high-level semantic feature information. Among them, the high-level feature maps contain more information about the respiratory features, and the low-level feature maps contain more information about the respiratory feature details.
The feature pyramid network provides a multi-feature representation and improves image target identification performance. However, the high-level feature maps ignore information about the respiratory feature details, and the low-level feature maps ignore information about the respiratory features themselves. To solve this problem, this embodiment designs the CSFA architecture to extract micro-Doppler feature maps from the time-frequency image. The network mainly performs feature aggregation and feature enhancement on each micro-Doppler feature while attending to the depth features of interest and suppressing unnecessary features in both the channel and spatial responses, so that micro-Doppler feature maps are effectively extracted from the time-frequency image.
Illustratively, the CSFA network includes a channel attention model and a spatial attention model. The channel attention model comprises: a first max-pooling layer, a first average-pooling layer, and a multi-layer fully connected neural network; the spatial attention model comprises: a second max-pooling layer, a second average-pooling layer, and a 5 × 5 convolution kernel.
The step of inputting the feature map $F_i$ of each scale of the feature pyramid into the channel attention model to obtain the channel feature $F_{c1}$ comprises:
transmitting the input feature map $F_i$ to the first max-pooling layer and the first average-pooling layer, respectively, to obtain a first max-pooled feature map $F_{max}^i$ and a first average-pooled feature map $F_{avg}^i$;
feeding the first max-pooled feature map $F_{max}^i$ and the first average-pooled feature map $F_{avg}^i$ forward into the multi-layer fully connected neural network with ReLU activation, and adding the two output features of the network to obtain the one-dimensional channel attention feature map $M_{c1}$; multiplying $M_{c1}$ with the current input feature map $F_i$ yields the channel feature $F_{c1}$.
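A numpy sketch of the channel attention computation just described. The shared two-layer MLP weights `W1`, `W2` and the final sigmoid gate are assumptions in CBAM style; the text only states that the two MLP outputs are added:

```python
import numpy as np

def channel_attention(F, W1, W2):
    """M_c1 from max/avg pooled vectors through a shared MLP; returns F_c1."""
    f_max = F.max(axis=(1, 2))                   # (C,) first max pooling
    f_avg = F.mean(axis=(1, 2))                  # (C,) first average pooling
    relu = lambda x: np.maximum(x, 0.0)
    mlp = lambda v: W2 @ relu(W1 @ v)            # shared fully connected layers
    M_c = 1.0 / (1.0 + np.exp(-(mlp(f_max) + mlp(f_avg))))  # add, then gate
    return F * M_c[:, None, None]                # F_c1 = M_c1 * F_i

C = 2
F = np.ones((C, 3, 3))
F_c1 = channel_attention(F, W1=np.eye(C), W2=np.eye(C))
```

With identity MLP weights and an all-ones input, every channel gate equals sigmoid(1 + 1), so the whole map is scaled by the same factor.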
Further, the step of inputting the channel feature $F_{c1}$ into the spatial attention model to obtain the two-dimensional spatial attention feature map $M_{s1}$ comprises:
inputting the channel feature $F_{c1}$ to the second max-pooling layer and the second average-pooling layer, respectively, to obtain a two-dimensional max-pooled feature map $F_{max}^s$ and a two-dimensional average-pooled feature map $F_{avg}^s$;
merging the two feature maps into one matrix and reducing its dimension with a convolution layer whose kernel is 5 × 5 to generate the two-dimensional spatial attention feature map $M_{s1}$.
Specifically, the input feature map is $F_i \in \mathbb{R}^{C\times H\times W}$, where C, H, W respectively denote the number of channels, image height, and image width of F_i.

For the channel attention model, the input feature map F_i passes through the first max-pooling layer and the first average-pooling layer to give $F^c_{max}\in\mathbb{R}^{C\times 1\times 1}$ and $F^c_{avg}\in\mathbb{R}^{C\times 1\times 1}$. Both feature maps are fed forward through the multi-layer fully connected neural network with ReLU activation, and the two outputs of the network are added to obtain the C×1×1 one-dimensional channel attention map M_c1. Finally, M_c1 is multiplied with the current input feature map F_i to obtain the channel feature F_c1.

For the spatial attention model, the channel feature F_c1 is taken as input and passed through the second max-pooling layer and the second average-pooling layer to give the two-dimensional maps $F^s_{max}\in\mathbb{R}^{1\times H\times W}$ and $F^s_{avg}\in\mathbb{R}^{1\times H\times W}$. The two maps are concatenated into one matrix and reduced through a convolution layer with a 5×5 kernel to generate the H×W×1 two-dimensional spatial attention map M_s1. Finally, M_s1 is multiplied with F_c1 to obtain the spatial feature F_s. The process can be written as:

$$F_{c1} = M_{c1}(F_i)\otimes F_i$$
$$F_s = M_{s1}(F_{c1})\otimes F_{c1}$$

where $F_i \in \mathbb{R}^{C\times H\times W}$ is the input feature map and $\otimes$ denotes element-wise multiplication. The complete flow of the attention models can be written as:

$$M_{c1}(F) = \sigma\big(\mathrm{MLP}(\mathrm{MaxPool}(F)) + \mathrm{MLP}(\mathrm{AvgPool}(F))\big) = \sigma\big(W_1(W_0(F^c_{max})) + W_1(W_0(F^c_{avg}))\big)$$
$$M_{s1}(F) = \sigma\big(f^{5\times 5}([\mathrm{MaxPool}(F);\mathrm{AvgPool}(F)])\big) = \sigma\big(f^{5\times 5}([F^s_{max};F^s_{avg}])\big)$$

where $[\,\cdot\,;\,\cdot\,]$ is the concatenation operator, $\sigma$ denotes the Sigmoid function, $W_0\in\mathbb{R}^{(C/r)\times C}$ and $W_1\in\mathbb{R}^{C\times (C/r)}$ represent the weights of the MLP network (a ReLU activation follows $W_0$), r is the reduction (decay) ratio, and $f^{5\times 5}$ denotes a convolution operation with a 5×5 kernel.
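The spatial attention branch can be sketched in plain NumPy as follows; the naive convolution loop and the `spatial_attention` helper (and its random 5×5 kernel) are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(F_c1, kernel):
    """CBAM-style spatial attention on F_c1 of shape (C, H, W):
    max- and average-pool over the channel axis, stack the two H x W
    maps, convolve with a 5x5 kernel (zero padding), apply a sigmoid
    to get an H x W attention map, then reweight F_c1 with it."""
    pooled = np.stack([F_c1.max(axis=0), F_c1.mean(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    H, W = F_c1.shape[1:]
    out = np.zeros((H, W))
    for i in range(H):                 # naive "same" convolution
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    M_s = sigmoid(out)                 # (H, W) spatial attention map
    return M_s[None, :, :] * F_c1      # F_s = M_s1 (x) F_c1

rng = np.random.default_rng(1)
F_c1 = rng.random((8, 6, 6))
kernel = rng.standard_normal((2, 5, 5)) * 0.1   # 2 input maps, 5x5 window
F_s = spatial_attention(F_c1, kernel)
print(F_s.shape)   # (8, 6, 6)
```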
In the present embodiment, multiplying the spatial feature F_s with the current feature map F_i yields the feedback variable F_FB. Once F_FB is determined, it serves as the new input feature map to the channel attention and spatial attention models, and the final output F_out is the feature map processed by the models. In this way, the feature information of the current layer uses feedback connections to obtain richer micro-Doppler feature maps, making the new features more diverse. F_FB can be written as:

$$F_{FB} = F_i \odot F_s$$

where $\odot$ denotes the element-wise (dot) product.
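The feedback step can be sketched as a second pass of the same attention pair over F_FB = F_i ⊙ F_s; the `feedback_pass` helper and the toy scaling stand-in for the attention pair are purely illustrative:

```python
import numpy as np

def feedback_pass(F_i, attention):
    """One feedback iteration: run the (channel + spatial) attention
    pair, form the feedback variable as the element-wise product with
    the layer input, then run the attention pair once more."""
    F_s = attention(F_i)      # first channel+spatial attention pass
    F_FB = F_i * F_FB_factor(F_i, F_s)  # placeholder expanded below
    return attention(F_FB)    # F_out: second pass over F_FB

def F_FB_factor(F_i, F_s):
    # F_FB = F_i (.) F_s, so the "factor" is simply F_s itself
    return F_s

# toy attention stand-in: scale everything by 0.5
F_i = np.full((2, 3, 3), 2.0)
out = feedback_pass(F_i, lambda F: 0.5 * F)
print(out[0, 0, 0])   # 0.5 * (2 * (0.5 * 2)) = 1.0
```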
As a feature carrying bidirectional information flow, the feedback variable F_FB greatly strengthens the connections between features in both the spatial and channel dimensions, so that both high-level and low-level features are exploited for high-quality target identification. Optionally, before the step of inputting the aggregated micro-Doppler feature map to the SoftMax classifier to obtain the identification result of the target, the method further includes:
obtaining a pre-trained Resnet-18 model, and loading a time-frequency diagram data set model file;
in step S4, the step of inputting the aggregated micro doppler profile to a SoftMax classifier to obtain the identification result of the target includes:
and inputting the aggregated micro Doppler feature map into a SoftMax classifier, so that the SoftMax classifier compares the aggregated micro Doppler feature map with a time-frequency map data set model file, and the result after comparison is the identification result of the target.
In this embodiment, a pre-trained Resnet-18 model can be obtained before target identification and the time-frequency map dataset model file loaded; the diversity of the dataset is improved by adding 5-10 dB of Gaussian white noise to the radar echo. Further, the aggregated micro-Doppler feature map is input to the SoftMax classifier, which compares and evaluates it against the time-frequency map dataset and finally outputs the identification result of the target, thereby completing the discrimination between humans and animals.
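For illustration, a SoftMax layer turns the classifier's per-class scores into probabilities over the four target classes; the logit values and class ordering below are hypothetical, not taken from the patent:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a vector of class logits."""
    z = logits - logits.max()   # subtract max to avoid overflow
    e = np.exp(z)
    return e / e.sum()

# hypothetical 4-class output head: human, dog, cat, pig
logits = np.array([2.1, 0.3, -0.5, 0.8])
probs = softmax(logits)
classes = ["human", "dog", "cat", "pig"]
print(classes[int(np.argmax(probs))])   # human
```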
Figure 6 compares the micro-Doppler feature maps of time-frequency images generated by different modeling methods according to an embodiment of the present invention. As shown in fig. 6, introducing the CSFA network yields more effective micro-Doppler features from the time-frequency image, so that the breathing characteristics of different targets are fully reflected.
Further, this embodiment performed four sets of comparative experiments, targeting humans and three different animals. Specifically, in this experiment the radar operates in the 1.0-2.0 GHz frequency range, the dwell time at each frequency is 100 s, the pulse repetition interval (PRI) is 70 ms, and a transmit-receive array antenna transmits and receives signals through the wall.
Fig. 7a and 7b are schematic diagrams of the experimental scenario provided in an embodiment of the present invention; the brick wall is about 0.20 m thick. Referring to figs. 7a-7b, humans, dogs, cats, and pigs formed 4 groups of subjects, each group undergoing 10 experiments in which the human or animal subject was required to remain stationary. To ensure the subjects were at rest in each experiment, the detection time was set to 90 s. Notably, since the cat could not be guaranteed to stay still, it was placed in a plastic box during testing, thereby avoiding the interference a metal enclosure would cause to radar echo collection. In addition, the sensing range of the through-wall radar was set to 0-5 m according to the size of the experimental scene.
All models were trained on an NVIDIA GTX 1080 GPU with an Intel i5-8400 CPU, using the SGD optimizer with a learning rate of 0.01. The main evaluation index adopted in this embodiment is Average Precision (AP). The precision and recall parameters are defined as:

$$p = \frac{TP}{TP + FP}$$
$$r = \frac{TP}{TP + FN}$$

and the AP is defined as:

$$AP = \int_0^1 p(r)\,dr$$

where TP denotes true positives, FP false positives, FN false negatives, p precision, and r recall; p is a function of r, and the AP is the area under the precision-recall curve.
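A small sketch of the precision, recall, and AP definitions above, using rectangular integration over recall increments as an assumed approximation of the area under the precision-recall curve:

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(precisions, recalls):
    """AP as the area under the precision-recall curve,
    approximated by rectangles over recall increments."""
    order = np.argsort(recalls)
    p = np.asarray(precisions)[order]
    r = np.asarray(recalls)[order]
    dr = np.diff(np.concatenate(([0.0], r)))   # recall increments
    return float(np.sum(p * dr))

p, r = precision_recall(tp=8, fp=2, fn=2)
print(p, r)                                     # 0.8 0.8
ap = average_precision([1.0, 0.9, 0.8], [0.2, 0.5, 1.0])
print(ap)   # 1.0*0.2 + 0.9*0.3 + 0.8*0.5 = 0.87
```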
Fig. 8 compares the convergence curves of the time-frequency analysis algorithms according to an embodiment of the present invention. As shown in fig. 8, the dataset of the STFT algorithm converges after 60 iterations, whereas that of the SST algorithm converges after 20 iterations; the SST dataset therefore converges faster. The trained networks include Resnet-18, Resnet-18+CBAM, the MSRA network, and the CSFA network architecture, and the precision curves of the four recognition models are shown in fig. 9. Compared with the other networks, the network model of the invention achieves higher recognition precision, proving the effectiveness of the CSFA network. The training results are shown in Table 1.
TABLE 1
(Table 1 is reproduced as an image in the original publication; it lists the recognition accuracy of the four trained models for each target class.)
As shown in Table 1, the four methods rank as follows in ascending order of recognition performance across the different targets: Resnet-18, Resnet-18+CBAM, the MSRA network, and the through-wall-radar target recognition method provided by the invention. Among them, the MSRA network adds a multi-scale attention mechanism to a Resnet-18 backbone and thus improves recognition performance. The experimental results show that the CSFA network improves precision by 2.57% over the CBAM network, and compared with the MSRA network, the classification accuracy of the invention on humans, dogs, cats, and pigs improves by 1.23%, 0.37%, 2.02%, and 1.7%, respectively. Therefore, the CSFA network effectively extracts micro-Doppler features from the time-frequency map and improves the accuracy of time-frequency-map identification for different targets.
According to the above embodiments, the beneficial effects of the invention are as follows:
The invention provides a radar target identification method based on a cross-scale feature aggregation network. Radar echo signals of targets are processed by a synchrosqueezed S-transform time-frequency analysis algorithm, which suppresses clutter such as multipath interference and generates a high-resolution time-frequency map with breathing features; meanwhile, the micro-Doppler feature map in the time-frequency map is extracted through the cross-scale feature aggregation network, improving the identification accuracy of the target.

In the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality" means two or more, unless explicitly defined otherwise.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Further, one skilled in the art can join and combine the different embodiments or examples described in this specification.
Although the present application has been described herein in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the figures, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the "a" or "an" does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (9)

1. A radar target identification method based on a trans-scale feature aggregation network is characterized by comprising the following steps:
collecting skin vibration echo generated by static target respiration by using a radar to obtain a radar echo signal;
processing the radar echo signal by using a synchronous extrusion S-transformation time-frequency analysis algorithm to generate a time-frequency image containing respiratory characteristics;
dividing and aggregating the time-frequency images with different scales by using a characteristic pyramid network, and extracting micro Doppler characteristic images from characteristic images of all scales output by the characteristic pyramid network by using a cross-scale characteristic aggregation CSFA network;
and after generating an aggregate micro-Doppler feature map based on the micro-Doppler feature map, inputting the aggregate micro-Doppler feature map to a SoftMax classifier to obtain an identification result of the radar target.
2. The method of claim 1, further comprising, prior to the step of generating an aggregated micro-doppler profile based on the micro-doppler profile:
acquiring a pre-trained Resnet-18 model, and loading a time-frequency diagram data set model file;
after generating an aggregate micro-Doppler feature map based on the micro-Doppler feature map, inputting the aggregate micro-Doppler feature map to a SoftMax classifier to obtain an identification result of the radar target, wherein the method comprises the following steps of:
and after generating an aggregate micro-Doppler feature map based on the micro-Doppler feature map, inputting the aggregate micro-Doppler feature map into a softMax classifier, so that the softMax classifier compares the aggregate micro-Doppler feature map by utilizing the time-frequency map dataset model file to obtain an identification result of the radar target.
3. The method for radar target recognition based on a cross-scale feature aggregation network according to claim 2, wherein the step of processing the radar echo signal using a synchronous extrusion S-transform time-frequency analysis algorithm to generate a time-frequency image containing respiratory features is preceded by the step of:
preprocessing the radar echo signal to obtain a radar echo signal containing respiratory characteristics;
and generating a two-dimensional range profile through time accumulation based on the radar echo signals containing respiratory features.
4. A method of radar target recognition based on a cross-scale feature aggregation network as claimed in claim 3, wherein the step of processing the radar echo signal using a synchronous extrusion S-transform time-frequency analysis algorithm to generate a time-frequency image containing respiratory features comprises:
and processing the effective channel data in the two-dimensional range profile by using a synchronous extrusion S-transformation time-frequency analysis algorithm, and aggregating all the effective channel data into a time-frequency image.
5. The method for radar target recognition based on a cross-scale feature aggregation network according to claim 4, wherein the radar echo signal is processed by the synchrosqueezed S-transform time-frequency analysis algorithm according to the following formula:

$$SSST(f_c, b) = (\Delta f_c)^{-1} \sum_{f_k:\,|f_c(f_k,b)-f_c|\le \Delta f_c/2} ST(f_k, b)\,\Delta f_k$$

wherein f_k, f_c, and Δf_c respectively denote the discrete frequency of the S-transform, the center frequency of the squeezing interval, and the bandwidth of the squeezing interval; b denotes the time-axis shift parameter; Δf_k = f_k − f_{k−1}; Δf_c = f_c − f_{c−1}; ST(f_k, b) denotes the S-transform of the radar echo signal; and f_c(f_k, b) denotes the instantaneous frequency of the radar echo signal.
6. The method of claim 1, wherein the CSFA network comprises a channel attention model and a spatial attention model;
dividing and aggregating the time-frequency images of different scales by using a feature pyramid network, and extracting micro-Doppler feature maps from the feature maps of different scales output by the feature pyramid network by using the CSFA network, comprises the steps of:

dividing the time-frequency image into different scales by using the feature pyramid network;

inputting each divided scale feature map F_i (i = 2, 3, 4, 5) into the channel attention model to obtain the one-dimensional channel attention map M_c1, then multiplying M_c1 with the input feature map F_i to obtain the channel feature F_c1, and inputting F_c1 into the spatial attention model to obtain the two-dimensional spatial attention map M_s1;

multiplying the channel feature F_c1 with the two-dimensional spatial attention map M_s1 to obtain the spatial feature F_s;

multiplying the input feature map F_i with the spatial feature F_s to determine the feedback variable F_FB, and inputting F_FB into the channel attention model to obtain the one-dimensional channel attention map M_c2, then multiplying M_c2 with the feedback variable F_FB to obtain the channel feature F_c2;

inputting the channel feature F_c2 into the spatial attention model to obtain the two-dimensional spatial attention map M_s2;

multiplying the channel feature F_c2 with the two-dimensional spatial attention map M_s2 to obtain the micro-Doppler feature map F_i,out of the input feature map F_i of the current i-th layer scale.
7. The method of claim 6, wherein the channel attention model comprises: a first maximum pooling layer, a first average pooling layer and a multi-layer fully connected neural network;
inputting each scale feature map F_i of the feature pyramid network into the channel attention model to obtain the channel feature F_c1 comprises the steps of:

transmitting the input feature map F_i to the first max-pooling layer and the first average-pooling layer respectively, to obtain a first max-pooled feature map F_max^c and a first average-pooled feature map F_avg^c;

feeding F_max^c and F_avg^c forward through the multi-layer fully connected neural network with ReLU activation, and adding the two outputs of the network to obtain the one-dimensional channel attention map M_c1; multiplying M_c1 with the current input feature map F_i to obtain the channel feature F_c1.
8. The method of claim 7, wherein the spatial attention model comprises: a second max-pooling layer, a second average pooling layer, and a 5 x 5 convolution kernel;
inputting the channel feature F_c1 into the spatial attention model to obtain the two-dimensional spatial attention map M_s1 comprises the steps of:

inputting F_c1 to the second max-pooling layer and the second average-pooling layer respectively, to obtain a two-dimensional max-pooled feature map F_max^s and a two-dimensional average-pooled feature map F_avg^s;

concatenating the two spatial attention feature maps into one matrix, and generating the two-dimensional spatial attention map M_s1 after dimension reduction by a convolution layer with a 5×5 kernel.
9. The method for radar target identification based on a cross-scale feature aggregation network of claim 8, wherein the feedback variable F_FB = F_i ⊙ F_s, where ⊙ represents a dot product (element-wise multiplication) operation.
CN202211607715.4A 2022-12-14 2022-12-14 Radar target identification method based on trans-scale feature aggregation network Pending CN116008982A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211607715.4A CN116008982A (en) 2022-12-14 2022-12-14 Radar target identification method based on trans-scale feature aggregation network


Publications (1)

Publication Number Publication Date
CN116008982A true CN116008982A (en) 2023-04-25

Family

ID=86036428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211607715.4A Pending CN116008982A (en) 2022-12-14 2022-12-14 Radar target identification method based on trans-scale feature aggregation network

Country Status (1)

Country Link
CN (1) CN116008982A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117572379A (en) * 2024-01-17 2024-02-20 厦门中为科学仪器有限公司 Radar signal processing method based on CNN-CBAM shrinkage two-class network
CN117572379B (en) * 2024-01-17 2024-04-12 厦门中为科学仪器有限公司 Radar signal processing method based on CNN-CBAM shrinkage two-class network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination