CN113341383B - Anti-interference intelligent decision method for radar based on DQN algorithm - Google Patents

Anti-interference intelligent decision method for radar based on DQN algorithm

Info

Publication number
CN113341383B
CN113341383B (application number CN202110601114.1A)
Authority
CN
China
Prior art keywords
interference
radar
value
signal
active
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110601114.1A
Other languages
Chinese (zh)
Other versions
CN113341383A (en)
Inventor
张娟
段燕辉
张林让
丁彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110601114.1A priority Critical patent/CN113341383B/en
Publication of CN113341383A publication Critical patent/CN113341383A/en
Application granted granted Critical
Publication of CN113341383B publication Critical patent/CN113341383B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02 Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/36 Means for anti-jamming, e.g. ECCM, i.e. electronic counter-counter measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention discloses a radar anti-interference intelligent decision method based on the DQN algorithm. It mainly addresses the low selection efficiency, low accuracy, poor stability and inability to handle large-data-scale tasks of the Q-learning method, which stem from the complex processing steps and heavy computation of manually separating signal features to identify signals. The implementation steps are: (1) the jammer transmits an active interference signal; (2) the radar identifies the type of the active interference signal it receives; (3) two convolutional neural networks are constructed; (4) a radar state matrix is generated; (5) a loss function is constructed; (6) the estimated value network is trained; (7) the optimal anti-interference method is selected with the DQN algorithm. The method can effectively select the optimal anti-interference method for an active interference signal, has strong real-time performance, high accuracy and good stability, and can be used for intelligent selection of the optimal anti-interference method corresponding to an active interference signal.

Description

Anti-interference intelligent decision method for radar based on DQN algorithm
Technical Field
The invention belongs to the technical field of radar, and further relates to a radar anti-interference intelligent decision method based on the deep Q-network (DQN) algorithm within the technical field of radar anti-interference. The method uses the DQN algorithm to select the optimal anti-interference technique for a radar subjected to active interference, thereby achieving effective anti-interference.
Background
The perception capability of traditional radar with respect to the combat environment cannot meet actual requirements, and radar anti-interference mainly relies on manually pre-selected anti-interference methods, so complex interference scenarios cannot be handled quickly and the combat situation cannot be responded to rapidly. An operator judges the interference the radar is experiencing from the radar display combined with personal experience, and then chooses from prefabricated anti-interference schemes; the radar itself cannot complete such a set of anti-interference actions on its own. It therefore depends heavily on personnel and has a low degree of intelligence.
Hohai University, in the patent document "A radar anti-interference method and system based on Q-learning" (application number 201910811779.8, publication number CN110515045A), discloses a method for selecting the best radar anti-interference technique based on the Q-learning algorithm. First, the interference signals received by the radar are taken as the interference state set and different adaptive anti-interference methods as the action set; second, the state-action value function is taken as the evaluation function and a reinforcement learning model is established; finally, the optimal radar anti-interference technique is selected from the anti-interference technique set by the Q-learning algorithm. The drawback of this method is that updating the Q value with a simple value-iteration scheme causes the Q value to be overestimated, so the radar anti-interference effect is poor.
Xing Jiang, Gu Xin and Zhu Weigang, in their paper "Intelligent radar countermeasure based on Q-learning" (Systems Engineering and Electronics, 2018, 5, 1030-05), propose a method in which a multifunction radar uses the Q-learning algorithm to select the best anti-interference technique. First, the radar receiver receives the interference signal and the signal features are separated manually; then the processed interference signal is identified; finally, an interference-countering pattern is synthesized autonomously according to the current radar state and used as the optimal anti-interference method. The drawback of this method is that manually separating signal features to identify the signal makes the processing steps complex and the computation heavy, the selection of the optimal anti-interference method is slow, and large-data-scale tasks cannot be handled.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides a radar anti-interference intelligent decision method based on the DQN algorithm, which solves the problems of low selection efficiency, low accuracy, poor stability and inability to handle large-data-scale tasks that arise in the Q-learning method from the complex processing steps and heavy computation of manually separating signal features to identify signals.
The idea for achieving the purpose of the invention is as follows: an estimated value network and a target value network are constructed, the radar state matrix is input into the two convolutional neural networks so that they process the data autonomously, and the convergence matrices output by the two networks are obtained, which solves the prior-art problem that the Q-learning method cannot handle large-data-scale decision tasks. A greedy algorithm then selects the maximum convergence value in the convergence matrix, and the optimal anti-interference method corresponding to that maximum convergence value is selected from the anti-interference method library, further improving the accuracy and stability of selecting the optimal anti-interference method.
The technical scheme for realizing the aim of the invention comprises the following steps:
(1) The jammer transmits an active interference signal;
(2) The radar identifies the type of the active interference signal it receives;
(3) Two convolutional neural networks are constructed:
(3a) Constructing an estimated value network consisting of an input layer, a convolution layer, a pooling layer and an output layer, with the network weight parameter w initialized to 0.01 and the bias parameter initialized to 0;
(3b) Constructing a target value network consisting of an input layer, a convolution layer, a pooling layer and an output layer, with the network weight parameter w initialized to 0.01 and the bias parameter initialized to 0;
(4) Generating a radar state matrix:
(4a) Selecting an anti-interference method a corresponding to the identified active interference signal type from an anti-interference method library by using an epsilon-greedy method;
(4b) Obtaining an anti-interference effective value R from the normalized gain table according to the correspondence between the identified active interference type and the anti-interference method a;
(4c) Forming the radar state matrix S_t from the anti-interference method a and the anti-interference effective value R;
(5) The loss function is constructed as follows:
Loss = [R + γ·max q(S_{t+1}) - q(S_t)]^2
wherein Loss represents the loss function; γ represents the discount factor, whose value range is [0,1]; max represents the maximum-value operation; R represents the anti-interference effective value; q(S_t) represents the convergence matrix output by the estimated value network when the radar state matrix S_t is input; and q(S_{t+1}) represents the convergence matrix output by the target value network when the next radar state matrix S_{t+1} is input;
(6) Training the estimated value network:
Inputting the radar state matrix S_t into the estimated value network, and iteratively updating the network parameters by the gradient descent method until the loss function converges, to obtain a trained estimated value network;
(7) Selecting an optimal anti-interference method by using a DQN algorithm;
Inputting the radar state matrix S_t into the trained estimated value network and outputting a 1×6 convergence matrix, each column of which corresponds to the convergence value of one anti-interference method; selecting the maximum convergence value in the row from the convergence matrix, and then selecting from the anti-interference method library the optimal anti-interference method a_t corresponding to that maximum convergence value.
Compared with the prior art, the invention has the following advantages:
First, the radar state matrix is processed by the estimated value network and the target value network, which overcomes the limitations of manual data processing in the prior art and allows large-data-scale decision tasks to be handled; the radar anti-interference data are stored in a memory bank, so the optimal anti-interference method can be selected while training, improving the efficiency of selecting the optimal anti-interference method.
Second, the maximum convergence value is selected directly from the convergence matrix output by the estimated value network, which overcomes the low selection accuracy caused by the complex calculation steps and the Q-value over-estimation of the Q-learning algorithm; the calculation is simple and the correct optimal anti-interference method can be selected, greatly improving the accuracy and stability of selecting the optimal anti-interference method.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph of the selection results between the active interference signals and the anti-interference methods in the simulation experiment of the present invention;
FIG. 3 is a graph comparing normalized anti-interference effective values of the simulation experiment of the present invention.
Detailed Description
Embodiments and effects of the present invention are further described below with reference to the accompanying drawings.
The steps of the invention will be further described with reference to fig. 1.
Step 1, an active interference signal is emitted by an interference machine.
The active interference signal refers to any one of a sweep frequency interference signal, a noise frequency modulation interference signal, a dense false target interference signal, a speed dragging interference signal and a distance-speed combined dragging interference signal.
Step 2, the radar identifies the type of the active interference signal it receives.
And 3, constructing two convolutional neural networks.
An estimated value network consisting of an input layer, a convolution layer, a pooling layer and an output layer is built; the network weight parameter w is initialized to 0.01 and the bias parameter is initialized to 0.
A target value network consisting of an input layer, a convolution layer, a pooling layer and an output layer is built; the network weight parameter w is initialized to 0.01 and the bias parameter is initialized to 0.
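As an illustration of step 3, the following is a minimal sketch (PyTorch assumed) of one such convolutional network, instantiated twice as the estimated value network and the target value network; the kernel size, channel count and 1×2 input shape are assumptions made for illustration, since the patent only fixes the layer types and the initial values of the weights and biases.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Input layer -> convolution layer -> pooling layer -> output layer."""
    def __init__(self, n_actions=6):
        super().__init__()
        self.conv = nn.Conv1d(1, 8, kernel_size=2, padding=1)  # convolution layer (assumed size)
        self.pool = nn.AdaptiveMaxPool1d(1)                    # pooling layer
        self.out = nn.Linear(8, n_actions)                     # output: one value per anti-interference method
        for m in self.modules():                               # weights initialized to 0.01, biases to 0
            if isinstance(m, (nn.Conv1d, nn.Linear)):
                nn.init.constant_(m.weight, 0.01)
                nn.init.constant_(m.bias, 0.0)

    def forward(self, s):              # s: radar state matrix, shape (batch, 1, 2)
        x = self.pool(torch.relu(self.conv(s)))
        return self.out(x.flatten(1))  # (batch, 6) "convergence matrix"

eval_net, target_net = QNet(), QNet()  # estimated value network and target value network
```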
And 4, generating a radar state matrix.
And selecting an anti-interference method a corresponding to the identified active interference signal type from an anti-interference method library by using an epsilon-greedy method.
The epsilon-greedy method is characterized in that an anti-interference method is selected from an anti-interference method library according to the probability of epsilon, an anti-interference method corresponding to the identified active interference signal type is selected from the anti-interference method library according to the probability of 1-epsilon, and epsilon is a value selected from (0, 1).
The anti-interference method library comprises 6 anti-interference methods: frequency agility, sidelobe concealment, adaptive sidelobe cancellation, adaptive beamforming, anti-main lobe transmit waveform, and space-time adaptive.
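As an illustration of step 4a, the following is a minimal sketch of the epsilon-greedy selection over this 6-method library; the argument method_for_type, meaning the index of the method currently associated with the identified interference type, is a hypothetical helper introduced only for this example.

```python
import random

# The six methods of the anti-interference library listed above.
METHODS = ["frequency agility", "sidelobe concealment", "adaptive sidelobe cancellation",
           "adaptive beamforming", "anti-main lobe transmit waveform", "space-time adaptive"]

def epsilon_greedy(method_for_type, epsilon=0.9):
    """With probability epsilon pick any method from the library; otherwise pick the
    method corresponding to the identified active interference signal type."""
    if random.random() < epsilon:
        return random.randrange(len(METHODS))  # explore the whole library
    return method_for_type                     # exploit the method matching the identified type
```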
An anti-interference effective value R is obtained from the normalized gain table according to the correspondence between the identified active interference type and the anti-interference method a.
The normalized gain table consists of the gain values of multiple anti-interference methods against multiple active interference signals; each anti-interference effective value corresponds one-to-one to an anti-interference method and its active interference signal, and each effective value lies in [0,1]. In the normalized gain table the abscissa represents the active interference type and the ordinate represents the anti-interference method.
The anti-interference method a and the anti-interference effective value R form the radar state matrix S_t.
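As an illustration of steps 4b and 4c, the sketch below looks up the anti-interference effective value R in a normalized gain table and stacks it with the chosen method into the 1×2 radar state matrix S_t; the gain values used here are uniform placeholders, not the patent's table.

```python
import numpy as np

JAMMING_TYPES = ["sweep frequency", "noise frequency modulation", "dense false target",
                 "speed dragging", "distance-speed combined dragging"]
# Placeholder normalized gain table: rows are active interference types,
# columns are the six anti-interference methods; real entries lie in [0, 1].
GAIN_TABLE = np.full((len(JAMMING_TYPES), 6), 0.5, dtype=np.float32)

def radar_state(jamming_idx, method_idx):
    R = GAIN_TABLE[jamming_idx, method_idx]                      # anti-interference effective value R
    return np.array([[float(method_idx), R]], dtype=np.float32)  # S_t, a 1x2 radar state matrix
```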
Step 5, constructing a loss function as follows.
Loss = [R + γ·max q(S_{t+1}) - q(S_t)]^2
Wherein Loss represents the loss function; γ represents the discount factor, whose value range is [0,1]; max represents the maximum-value operation; R represents the anti-interference effective value; q(S_t) represents the convergence matrix output by the estimated value network when the radar state matrix S_t is input; and q(S_{t+1}) represents the convergence matrix output by the target value network when the next radar state matrix S_{t+1} is input.
The next radar state matrix S_{t+1} refers to the radar state matrix of size 1×2 formed by the optimal anti-interference method a_t and the anti-interference effective value R_t obtained from the normalized gain table.
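A minimal sketch of the loss in step 5, assuming the standard DQN form Loss = [R + γ·max q(S_{t+1}) - q(S_t)]^2, with q(S_t) taken from the estimated value network for the chosen method and max q(S_{t+1}) from the target value network; the batch shapes follow the QNet sketch above.

```python
import torch

def dqn_loss(eval_net, target_net, s_t, a_t, r_t, s_next, gamma=0.7):
    """s_t, s_next: radar states of shape (B, 1, 2); a_t: chosen method indices (B,);
    r_t: anti-interference effective values (B,)."""
    q_sa = eval_net(s_t).gather(1, a_t.long().view(-1, 1)).squeeze(1)  # q(S_t) for the chosen method
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values                  # max q(S_{t+1}) from target network
    return ((r_t + gamma * q_next - q_sa) ** 2).mean()                 # squared temporal-difference error
```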
And 6, training an estimated value network.
The radar state matrix S_t is input into the estimated value network, and the network parameters are iteratively updated by the gradient descent method until the loss function converges, yielding a trained estimated value network.
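A minimal sketch of the training in step 6: the estimated value network is updated by gradient descent on the loss above, and its weights are periodically copied into the target value network. The replay memory, batch size of 30 and update interval of 100 steps follow the simulation settings quoted later; the memory layout itself is an assumption.

```python
import random
import torch

def train(eval_net, target_net, memory, steps=1000, lr=0.6, gamma=0.7, batch=30, sync_every=100):
    """memory: list of (s_t, a_t, r_t, s_next) tuples of tensors shaped as in dqn_loss above."""
    opt = torch.optim.SGD(eval_net.parameters(), lr=lr)
    for step in range(steps):
        s_t, a_t, r_t, s_next = (torch.stack(x) for x in zip(*random.sample(memory, batch)))
        loss = dqn_loss(eval_net, target_net, s_t, a_t, r_t, s_next, gamma)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % sync_every == 0:  # refresh the target value network with the current weights
            target_net.load_state_dict(eval_net.state_dict())
```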
And 7, selecting the optimal anti-interference method by using the DQN algorithm.
The radar state matrix S_t is input into the trained estimated value network, which outputs a 1×6 convergence matrix; each column of the convergence matrix corresponds to the convergence value of one anti-interference method. The maximum convergence value in the row is selected from the convergence matrix, and the optimal anti-interference method a_t corresponding to that maximum convergence value is then selected from the anti-interference method library.
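A minimal sketch of the greedy choice in step 7: the current radar state is fed through the trained estimated value network, the largest of the six convergence values is taken, and the corresponding method is returned from the library defined above.

```python
import torch

def best_method(eval_net, s_t):
    """s_t: a 1x2 radar state matrix, e.g. produced by radar_state above."""
    with torch.no_grad():
        q = eval_net(torch.as_tensor(s_t).view(1, 1, 2))  # 1x6 convergence matrix
    return METHODS[int(q.argmax(dim=1))]                  # optimal anti-interference method a_t
```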
The effects of the present invention are further described below in connection with simulation experiments.
1. Simulation experiment conditions:
the hardware platform of the simulation experiment of the invention: CPU is Intel Core i7-7700, and RAM is 16GB.
The software platform of the simulation experiment of the invention: Windows 7 operating system and PyCharm 2019.
2. Simulation content and result analysis:
the simulation experiment of the invention adopts the method provided by the invention and the optimal anti-interference method corresponding to the selected active interference signal in the prior art (Q learning method), and generates a selection result diagram of the method, an anti-interference effective value diagram of the method and the Q learning method through simulation software Prchar 2019, wherein the results are shown in figures 2 and 3.
One prior art Q learning method employed in simulation experiments is disclosed in the following papers:
Xing Jiang et al. propose a method for selecting the best anti-interference technique using the Q-learning algorithm (abbreviated as the Q-learning method) in "Intelligent radar countermeasure based on Q-learning" (Systems Engineering and Electronics, 2018, 5, 1030-05).
In the simulation experiment, epsilon of the epsilon-greedy algorithm is set to 0.9, the learning rates of the estimated value network and the target value network are set to 0.6, the discount factor γ in the loss function is set to 0.7, the capacity of the memory bank storing training data is set to 300, the number M of training samples randomly selected each time is set to 30, the number of interval steps for updating the parameters of the target value network is set to 100, and the number of iterations is set to 100. The optimal anti-interference method is selected 100 times in each iteration, with 1667 selections each for sweep frequency interference, noise frequency modulation interference, dense false target interference and speed dragging interference, and 1666 selections for distance-speed combined dragging interference.
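For reference, the simulation settings quoted above can be collected into a single configuration; the values below are taken directly from the text.

```python
SIM_CONFIG = {
    "epsilon": 0.9,                   # epsilon-greedy parameter
    "learning_rate": 0.6,             # for both value networks
    "gamma": 0.7,                     # discount factor in the loss
    "memory_size": 300,               # capacity of the memory bank
    "batch_size": 30,                 # training samples M drawn per update
    "target_sync_steps": 100,         # interval steps for updating the target value network
    "iterations": 100,                # number of iterations
    "selections_per_iteration": 100,  # optimal-method selections per iteration
}
```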
The simulation of the invention comprises 5 active interference signals of sweep frequency interference, dense false target interference, speed dragging interference, noise frequency modulation interference and distance speed combined dragging interference.
The signal model of each interference is as follows.
The signal model of the sweep frequency interference is:
J_1(t) = U_j·cos(2π·f_j·t + φ)
wherein J_1(t) represents the sweep frequency interference signal, U_j its amplitude, cos the cosine function, and t the discrete time after sampling by the radar jammer; the center frequency f_j sweeps across the interference bandwidth Δf_j, and the phase φ is uniformly distributed on [0, 2π] and independent of f_j.
The signal model of the dense false target interference is:
J_2(t) = J(t)·[rect(t/τ) * Σ_{p=1..P} δ(t - p·T_s)]
wherein J_2(t) represents the dense false target interference signal, obtained by intermittent sampling and forwarding of the radar signal J(t); the sampled signal is a rectangular pulse train; · represents multiplication; rect(t/τ) represents a gate function with τ the pulse width of the sampling pulse train; * denotes convolution; Σ(·) represents summation; P represents the number of sampled pulses; δ(t) represents the impulse function; and T_s represents the sampling pulse repetition period.
The signal model of the speed dragging interference is:
J_3(t) = A_J·exp{j[(ω_c + ω_d + Δω)(t - 2R_t/c)]}
wherein J_3(t) represents the speed dragging interference signal, A_J the signal amplitude, exp the exponential operation with base e, ω_c the radar signal carrier frequency, ω_d the Doppler frequency of the radar echo signal, Δω the fixed Doppler frequency shift, R_t the distance between the real target and the radar, and c the speed of light.
The signal model of the noise frequency modulation interference is:
J_4(t) = U_j·cos[ω_j·t + 2π·K_FM·∫_0^t u(t')dt' + φ]
wherein J_4(t) represents the noise frequency modulation interference signal, U_j the signal amplitude, cos the cosine function, K_FM the constant frequency modulation slope, and ω_j the center frequency; the modulating noise u(t) is a generalized stationary random process with zero mean and variance σ_n^2, and the phase φ is uniformly distributed on [0, 2π] and independent of u(t).
The signal model of the distance-speed combined dragging interference is:
J_5(t) = U_j·exp{j·2π(f_0 + f_dj)(t - 2R_t/c - Δt_f)}
wherein J_5(t) represents the distance-speed combined dragging interference signal, U_j the signal amplitude, exp the exponential operation with base e, f_0 the carrier frequency of the signal, f_dj the dragging Doppler frequency, R_t the distance between the real target and the radar, Δt_f the dragging delay function, and c the speed of light.
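As an illustration of one of the listed waveforms, the sketch below generates a noise frequency modulation interference signal under the standard noise-FM model J_4(t) = U_j·cos[ω_j·t + 2π·K_FM·∫u(t')dt' + φ]; the sampling rate and all parameter values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def noise_fm_interference(fs=1e6, duration=1e-3, Uj=1.0, wj=2*np.pi*100e3,
                          K_FM=5e4, sigma_n=1.0, seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, duration, 1.0/fs)
    u = rng.normal(0.0, sigma_n, t.size)      # zero-mean stationary modulating noise u(t)
    phase = 2*np.pi*K_FM*np.cumsum(u)/fs      # 2*pi*K_FM times the running integral of u(t)
    phi = rng.uniform(0.0, 2*np.pi)           # initial phase uniform on [0, 2*pi]
    return Uj*np.cos(wj*t + phase + phi)      # J4(t)
```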
In the simulation experiment, after the active interference signal transmitted by the jammer is received, the radar identifies the type of the active interference signal, two convolutional neural networks are constructed, the radar state is generated, and the optimal anti-interference method corresponding to the active interference signal is obtained after processing by the DQN algorithm. This yields the selection result graph between the active interference signals and the optimal anti-interference methods shown in Fig. 2 and the comparison of normalized anti-interference effective values of the DQN algorithm and the Q-learning method shown in Fig. 3.
Fig. 2 shows the results of the optimal anti-interference methods corresponding to the 5 active interference signals in the simulation experiment; the abscissa in Fig. 2 is the number of anti-interference selections of each optimal anti-interference method, and the ordinate is the convergence value corresponding to each optimal anti-interference method. Fig. 2(a) is the convergence curve (dotted line marked "+") of anti-interference using the optimal anti-interference method corresponding to sweep frequency interference; Fig. 2(b) is the convergence curve (dotted line marked "x") for dense false target interference; Fig. 2(c) is the convergence curve (dotted line marked "-") for speed dragging interference; Fig. 2(d) is the convergence curve (dotted line marked "·") for noise suppression interference; and Fig. 2(e) is the convergence curve for distance-speed combined dragging interference. Fig. 3 is a graph of the normalized anti-interference effective values obtained with the method of the invention and the prior-art Q-learning method; the abscissa in Fig. 3 is the number of iterative selections, the DQN algorithm and the Q-learning method each being run for 100 selections in total, and the ordinate is the normalized anti-interference effective value. One curve in Fig. 3 represents the normalized anti-interference effective value of the DQN algorithm, and the other the normalized anti-interference effective value of the Q-learning method.
As can be seen from Fig. 2(a), when the interference type is sweep frequency interference, 1667 simulation selections of the best anti-interference method are performed; the method of the invention selects the frequency agility anti-interference method 1442 times, and the radar intelligent decision accuracy is 84.7%.
As can be seen from Fig. 2(b), when the interference type is dense false target interference, 1667 simulation selections of the best anti-interference method are performed; the method of the invention selects the sidelobe concealment anti-interference method 1406 times, and the radar intelligent decision accuracy is 84.3%.
As can be seen from Fig. 2(c), when the interference type is speed dragging interference, 1667 simulation selections of the best anti-interference method are performed; the method of the invention selects the space-time adaptive anti-interference method 1379 times, and the radar intelligent decision accuracy is 82.7%.
As can be seen from Fig. 2(d), when the interference type is noise suppression interference, 1667 simulation selections of the best anti-interference method are performed; the method of the invention selects the adaptive sidelobe cancellation anti-interference method 1215 times, and the radar intelligent decision accuracy is 72.8%.
As can be seen from Fig. 2(e), when the interference type is distance-speed combined dragging interference, 1666 simulation selections of the best anti-interference method are performed; the method of the invention selects the anti-main lobe transmit waveform anti-interference method 1258 times, and the radar intelligent decision accuracy is 75.5%.
In summary, the radar intelligent decision method based on the DQN algorithm achieves an average decision accuracy of 80%, which is a relatively high decision accuracy.
As can be seen from Fig. 3, the normalized anti-interference effective value of the DQN algorithm gradually increases and is far greater than that of the Q-learning method, indicating that the anti-interference effect of the method of the invention is better than that of the Q-learning method. The figure also shows that after about 40 iterations the normalized anti-interference effective value of the method of the invention becomes stable, whereas that of the Q-learning method does not, which demonstrates that the method of the invention has good stability and fast processing speed, can handle large-data-scale decision tasks, and improves the efficiency of selecting the optimal anti-interference method.

Claims (6)

1. An intelligent anti-interference radar decision-making method based on the DQN algorithm, characterized in that the current state of the radar is constructed from the interference signal already identified by the radar, and the optimal anti-interference method is selected by the DQN algorithm while the two convolutional neural networks are trained during decision-making, comprising the following steps:
(1) The jammer transmits an active interference signal;
(2) The radar identifies the type of the active interference signal it receives;
(3) Two convolutional neural networks are constructed:
(3a) Constructing an estimated value network consisting of an input layer, a convolution layer, a pooling layer and an output layer, with the network weight parameter w initialized to 0.01 and the bias parameter initialized to 0;
(3b) Constructing a target value network consisting of an input layer, a convolution layer, a pooling layer and an output layer, with the network weight parameter w initialized to 0.01 and the bias parameter initialized to 0;
(4) Generating a radar state matrix:
(4a) Selecting an anti-interference method a corresponding to the identified active interference signal type from an anti-interference method library by using an epsilon-greedy method;
(4b) Obtaining an anti-interference effective value R from the normalized gain table according to the correspondence between the identified active interference type and the anti-interference method a;
(4c) Forming the radar state matrix S_t from the anti-interference method a and the anti-interference effective value R;
(5) The loss function is constructed as follows:
Loss = [R + γ·max q(S_{t+1}) - q(S_t)]^2
wherein Loss represents the loss function; γ represents the discount factor, whose value range is [0,1]; max represents the maximum-value operation; R represents the anti-interference effective value; q(S_t) represents the convergence matrix output by the estimated value network when the radar state matrix S_t is input; and q(S_{t+1}) represents the convergence matrix output by the target value network when the next radar state matrix S_{t+1} is input;
(6) Training the estimated value network:
Inputting the radar state matrix S_t into the estimated value network, and iteratively updating the network parameters by the gradient descent method until the loss function converges, to obtain a trained estimated value network;
(7) Selecting an optimal anti-interference method by using a DQN algorithm;
Inputting the radar state matrix S_t into the trained estimated value network and outputting a 1×6 convergence matrix, each column of which corresponds to the convergence value of one anti-interference method; selecting the maximum convergence value in the row from the convergence matrix, and then selecting from the anti-interference method library the optimal anti-interference method a_t corresponding to that maximum convergence value.
2. The intelligent anti-interference radar decision method based on the DQN algorithm according to claim 1, wherein the active interference signal in the step (1) refers to any one of a frequency sweep interference signal, a noise frequency modulation interference signal, a dense false target interference signal, a speed trailing interference signal and a distance speed combined trailing interference signal.
3. The intelligent decision-making method for radar anti-interference based on DQN algorithm according to claim 1, wherein the epsilon-greedy method in step (4 a) refers to selecting an anti-interference method from the anti-interference method library with probability of epsilon, and selecting an anti-interference method corresponding to the identified active interference signal type from the anti-interference method library with probability of 1-epsilon, epsilon being a value selected between (0, 1).
4. The intelligent decision-making method for radar anti-interference based on DQN algorithm according to claim 1, wherein the anti-interference method library in step (4 a) includes 6 anti-interference methods of agility, sidelobe concealment, adaptive sidelobe cancellation, adaptive beamforming, anti-main lobe transmit waveform and space-time adaptation.
5. The intelligent decision-making method of the radar anti-interference based on the DQN algorithm as claimed in claim 1, wherein the normalized gain value table in the step (4 b) is composed of a plurality of anti-interference methods and a plurality of active interference signal values, each anti-interference effective value is a value of one-to-one correspondence between each anti-interference method and its active interference signal, the magnitude of each effective value is in [0,1], the abscissa in the normalized gain value table represents the active interference type, and the ordinate represents the anti-interference method.
6. The intelligent decision-making method for radar anti-interference based on the DQN algorithm according to claim 1, wherein the next radar state matrix S_{t+1} in step (5) refers to a radar state matrix of size 1×2 formed by the optimal anti-interference method a_t and the anti-interference effective value R_t obtained from the normalized gain table.
CN202110601114.1A 2021-05-31 2021-05-31 Anti-interference intelligent decision method for radar based on DQN algorithm Active CN113341383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110601114.1A CN113341383B (en) 2021-05-31 2021-05-31 Anti-interference intelligent decision method for radar based on DQN algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110601114.1A CN113341383B (en) 2021-05-31 2021-05-31 Anti-interference intelligent decision method for radar based on DQN algorithm

Publications (2)

Publication Number Publication Date
CN113341383A CN113341383A (en) 2021-09-03
CN113341383B true CN113341383B (en) 2023-06-30

Family

ID=77472819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110601114.1A Active CN113341383B (en) 2021-05-31 2021-05-31 Anti-interference intelligent decision method for radar based on DQN algorithm

Country Status (1)

Country Link
CN (1) CN113341383B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114509732B (en) * 2022-02-21 2023-05-09 四川大学 Deep reinforcement learning anti-interference method of frequency agile radar
CN114859304B (en) * 2022-04-27 2023-04-07 电子科技大学 Data interconnection method for resisting intelligent dragging interference
CN116338598B (en) * 2023-05-31 2023-08-29 西安电子科技大学 Anti-interference intelligent decision method for radar based on backtracking DQN
CN116559794B (en) * 2023-07-12 2023-09-29 西安电子科技大学 Radar anti-interference intelligent decision method for double-multi-domain complex neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2713635C1 (en) * 2019-05-27 2020-02-05 Федеральное государственное унитарное предприятие "Государственный научно-исследовательский институт авиационных систем" (ФГУП "ГосНИИАС") Method of tracking an aerial target in a radar station from a class of "aircraft with turbojet engine" under action of distance and speed withdrawing interference
CN111898211A (en) * 2020-08-07 2020-11-06 吉林大学 Intelligent vehicle speed decision method based on deep reinforcement learning and simulation method thereof

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2713635C1 (en) * 2019-05-27 2020-02-05 Федеральное государственное унитарное предприятие "Государственный научно-исследовательский институт авиационных систем" (ФГУП "ГосНИИАС") Method of tracking an aerial target in a radar station from a class of "aircraft with turbojet engine" under action of distance and speed withdrawing interference
CN111898211A (en) * 2020-08-07 2020-11-06 吉林大学 Intelligent vehicle speed decision method based on deep reinforcement learning and simulation method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
New anti-active-jamming intelligent technology for air intelligence radar; Tian Xiao; Aerospace Electronic Warfare (No. 03); full text *

Also Published As

Publication number Publication date
CN113341383A (en) 2021-09-03

Similar Documents

Publication Publication Date Title
CN113341383B (en) Anti-interference intelligent decision method for radar based on DQN algorithm
CN107450055B (en) High-speed maneuvering target detection method based on discrete linear frequency modulation Fourier transform
CN111090078B (en) Networking radar residence time optimal control method based on radio frequency stealth
CN106353732B (en) A kind of airborne radar isomery clutter suppression method based on cognition
CN115236607B (en) Radar anti-interference strategy optimization method based on double-layer Q learning
CN113406579B (en) Camouflage interference waveform generation method based on deep reinforcement learning
CN113625233B (en) Reinforced learning-based radar anti-interference intelligent decision method
CN115343680A (en) Radar anti-interference decision method based on deep reinforcement learning and combined frequency hopping and pulse width distribution
CN114509732B (en) Deep reinforcement learning anti-interference method of frequency agile radar
CN113376607B (en) Airborne distributed radar small sample space-time self-adaptive processing method
CN108594203B (en) Distributed radar multi-station detection sequence planning and designing method
CN108572353B (en) Pulse time sequence planning method for low-interception radar
CN115567353B (en) Interference multi-beam scheduling and interference power combined optimization method for radar networking system
CN113093124A (en) DQN algorithm-based real-time allocation method for radar interference resources
CN117709678A (en) Multi-machine collaborative radar search resource optimization method based on multi-agent reinforcement learning
CN109212494A (en) A kind of stealthy interference waveform design method of radio frequency for radar network system
CN117055000A (en) Multichannel radar target detection method based on signal-to-noise ratio weighted fusion
CN115508793A (en) Radar interference effect online evaluation intelligent method based on countermeasure analysis feature screening
CN114662392A (en) Radar air defense reconnaissance efficiency evaluation method based on deep learning
Stevens et al. Automated gain control through deep reinforcement learning for downstream radar object detection
Hjelmervik et al. Optimization of active sonar parameters in a measured environment
Benincasa Linear Regression and Artificial Neural Networks for Parameter-Modeling of a 4G Link for Unmanned Aerial Vehicles
CN116774165B (en) Multi-radar cooperative anti-interference signal waveform design method and device
CN117875000A (en) Networking radar node selection and power time collaborative optimization method under multi-target search tracking
Zhang et al. An Intelligent Strategy Decision Method for Collaborative Jamming Based On Hierarchical Multi-Agent Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant