CN113341383A

CN113341383A - Radar anti-interference intelligent decision method based on DQN algorithm

Info

Publication number: CN113341383A
Application number: CN202110601114.1A
Authority: CN
Inventors: 张娟; 段燕辉; 张林让; 丁彤
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-09-03
Anticipated expiration: 2041-05-31
Also published as: CN113341383B

Abstract

The invention discloses a radar anti-interference intelligent decision method based on the DQN algorithm. It mainly addresses the low selection efficiency, low accuracy, poor stability and inability to handle large-scale data tasks of the Q learning method, which separates signal features manually in order to identify signals and therefore involves complex processing steps and a large amount of computation. The implementation steps are as follows: (1) the jammer transmits an active interference signal; (2) the radar identifies the type of the active interference signal it receives; (3) two convolutional neural networks are constructed; (4) a radar state matrix is generated; (5) a loss function is constructed; (6) the estimated value network is trained; (7) the optimal anti-interference method is selected using the DQN algorithm. The method can effectively select the optimal anti-interference method for an active interference signal, offers strong real-time performance, high accuracy and good stability, and can be used to intelligently select the optimal anti-interference method corresponding to an active interference signal.

Description

Radar anti-interference intelligent decision method based on DQN algorithm

Technical Field

The invention belongs to the technical field of radar, and further relates to a radar anti-interference intelligent decision method based on the deep Q network (DQN) algorithm within the field of radar anti-interference. The invention uses the DQN algorithm to select the best anti-interference technique against a given radar active interference, thereby achieving effective anti-interference.

Background

The ability of traditional radar to sense the combat environment cannot meet practical requirements. At the same time, radar anti-interference relies mainly on manually preselected anti-interference methods, so the radar cannot cope quickly with complex interference scenarios or respond rapidly to battlefield conditions. The operator judges how the radar is being interfered with from the radar display and personal experience, and then chooses from prepared anti-interference schemes. The radar itself cannot carry out this sequence of anti-interference actions; it depends heavily on personnel, and its degree of intelligence is low.

The patent document "A radar anti-interference method and system based on Q-learning" (patent application No. 201910811779.8, application publication No. CN 110515045 A), filed by Hohai University, discloses a method for selecting the optimal radar anti-interference technology based on the Q-learning algorithm. First, the interference signals received by the radar are taken as the interference state set, and different adaptive anti-interference methods are taken as the action set; second, a reinforcement learning model is established with the state-action value function as the evaluation function; finally, the optimal radar anti-interference technology is selected from the set of anti-interference technologies using the Q-learning algorithm. The disadvantage of this method is that the Q value is updated by simple value iteration, which causes the Q value to be overestimated and degrades the radar's anti-interference effect. In addition, because the Q-learning algorithm selects the anti-interference method according to the overestimated Q value, selection accuracy and efficiency are low and stability is poor.

In the published paper "Intelligent radar countermeasure based on Q-learning" (Systems Engineering and Electronics, 2018, 5, 1030-05), Chenqiang, Jia Xin and Juwei propose a method by which a multifunctional radar selects the best anti-interference technology using the Q learning algorithm. First, the radar receiver receives the interference signal and the signal features are separated manually; then, the processed interference signal is identified; finally, an interference pattern is synthesized autonomously according to the current radar state to serve as the optimal anti-interference method. The disadvantage of this method is that signal features are separated manually to identify signals, so the processing steps are complex and the amount of computation is large; as a result, the optimal anti-interference method is selected slowly, and large-scale data tasks cannot be handled.

Disclosure of Invention

The aim of the invention is to provide, in view of the defects of the prior art, a radar anti-interference intelligent decision method based on the DQN algorithm. It addresses the low selection efficiency, low accuracy, poor stability and inability to handle large-scale data tasks that arise because the Q learning method separates signal features manually to identify signals, which makes the processing steps complex and the amount of computation large.

The idea for achieving this aim is as follows: an estimated value network and a target value network are constructed, and the radar state matrix is input into these two convolutional neural networks, which process the data autonomously and output convergence matrices. This overcomes the inability of the prior-art Q learning method to handle large-scale data decision tasks. A greedy algorithm then selects the maximum convergence value in the convergence matrix, and the optimal anti-interference method corresponding to that value is selected from the anti-interference method library, which further improves the accuracy and stability of selecting the optimal anti-interference method.

The technical scheme for realizing the aim of the invention comprises the following steps:

(1) the jammer transmits an active interference signal;

(2) the radar identifies the type of the active interference signal it receives;

(3) constructing two convolutional neural networks:

(3a) building an estimated value network consisting of an input layer, a convolution layer, a pooling layer and an output layer, wherein the network weight parameter w is initialized to 0.01 and the bias parameter is initialized to 0;

(3b) building a target value network consisting of an input layer, a convolution layer, a pooling layer and an output layer, wherein the network weight parameter w is initialized to 0.01 and the bias parameter is initialized to 0;

(4) generating a radar state matrix:

(4a) selecting an anti-interference method a corresponding to the identified active interference signal type from an anti-interference method library by using an epsilon-greedy method;

(4b) looking up the anti-interference effective value R in the normalized gain value table according to the identified active interference type and the anti-interference method a;

(4c) forming the anti-interference method a and the anti-interference effective value R into a radar state matrix S_t;

(5) constructing the loss function as follows:

wherein Loss represents the loss function; γ represents the discount factor, with value range [0, 1]; max denotes taking the maximum value; q(s_t) represents the convergence matrix output by the estimated value network when the radar state matrix S_t is input; and q(s_{t+1}) represents the convergence matrix output by the target value network when the next radar state matrix S_{t+1} is input;

(6) training the estimated value network:

the radar state matrix S_t is input into the estimated value network, and the network parameters are updated iteratively by gradient descent until the loss function converges, giving a trained estimated value network;

(7) selecting the optimal anti-interference method using the DQN algorithm:

the radar state matrix S_t is input into the trained estimated value network, which outputs a 1 × 6 convergence matrix in which each column is the convergence value of one anti-interference method; the maximum convergence value in the row is selected, and the optimal anti-interference method a_t corresponding to that value is selected from the anti-interference method library.

Compared with the prior art, the invention has the following advantages:

First, because the invention processes the radar state matrix with the estimated value network and the target value network, it overcomes the limitation of manual data processing in the prior art and can handle large-scale data decision tasks. Radar anti-interference data are stored in a memory bank, so the optimal anti-interference method can be selected while training proceeds, which improves the efficiency of selecting the optimal anti-interference method.

Second, the invention selects the maximum convergence value directly from the convergence matrix output by the estimated value network, which overcomes the complex calculation steps of the Q learning algorithm and the low selection accuracy caused by Q-value overestimation. The invention is therefore simple to compute and selects the optimal anti-interference method in a targeted manner, greatly improving the accuracy and stability of the selection.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a graph of the selection results between active interference signals and anti-interference methods in a simulation experiment of the present invention;

FIG. 3 is a comparison graph of normalized anti-interference effective values of simulation experiments of the present invention.

Detailed Description

Embodiments and effects of the present invention will be further described below with reference to the accompanying drawings.

The steps of carrying out the present invention are further described with reference to fig. 1.

Step 1, the jammer transmits an active interference signal.

The active interference signal is any one of a frequency-sweep interference signal, a noise frequency modulation interference signal, a dense false-target interference signal, a velocity pull-off interference signal and a range-velocity joint pull-off interference signal.

Step 2, the radar identifies the type of the active interference signal it receives.

Step 3, two convolutional neural networks are constructed.

An estimated value network is built, consisting of an input layer, a convolution layer, a pooling layer and an output layer; the network weight parameter w is initialized to 0.01 and the bias parameter is initialized to 0.

A target value network is built, consisting of an input layer, a convolution layer, a pooling layer and an output layer; the network weight parameter w is initialized to 0.01 and the bias parameter is initialized to 0.
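As a rough illustration of step 3, the sketch below builds two identically initialized parameter sets and runs a forward pass. Only the layer sequence (input, convolution, pooling, output), the initial values 0.01 and 0, the 1 × 2 input and the 1 × 6 output come from the text; the number of filters, the kernel width, the use of global max pooling and all function names are hypothetical choices made for the example.

```python
import copy

def build_network(n_filters=4, kernel_width=2, n_outputs=6):
    """Parameters for input -> convolution -> pooling -> output.

    Layer sizes are hypothetical; the initial values (weights 0.01,
    biases 0) are the ones stated in the text.
    """
    return {
        "conv_w": [[0.01] * kernel_width for _ in range(n_filters)],
        "conv_b": [0.0] * n_filters,
        "out_w": [[0.01] * n_filters for _ in range(n_outputs)],
        "out_b": [0.0] * n_outputs,
    }

def forward(net, state):
    """Map a flat radar state (length 2 here) to six convergence values."""
    width = len(net["conv_w"][0])
    positions = range(len(state) - width + 1)
    # 1-D convolution: slide each kernel over the state vector.
    feature_maps = [
        [sum(w * state[p + i] for i, w in enumerate(kernel)) + b for p in positions]
        for kernel, b in zip(net["conv_w"], net["conv_b"])
    ]
    # Pooling (assumed here to be a global max over positions).
    pooled = [max(feature_map) for feature_map in feature_maps]
    # Output layer: one value per anti-interference method.
    return [
        sum(w * v for w, v in zip(row, pooled)) + b
        for row, b in zip(net["out_w"], net["out_b"])
    ]

estimated_net = build_network()
target_net = copy.deepcopy(estimated_net)  # independent copy, same initial values
q_row = forward(estimated_net, [1.0, 0.9])
```

Keeping the target value network as a deep copy means that later updates to the estimated value network do not change it until its parameters are deliberately refreshed.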

Step 4, a radar state matrix is generated.

An anti-interference method a corresponding to the identified active interference signal type is selected from the anti-interference method library using the ε-greedy method.

In the ε-greedy method, an anti-interference method is selected from the anti-interference method library with probability ε, and the anti-interference method corresponding to the identified active interference signal type is selected from the library with probability 1-ε, where ε is a value in the interval (0, 1).
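The two-branch rule just described can be sketched as follows. The function name and arguments are hypothetical, and the sketch assumes that the probability-ε branch draws uniformly at random from the whole library, which is the usual reading of ε-greedy but is not spelled out in the text.

```python
import random

def epsilon_greedy(matched_method, library, epsilon, rng=random):
    """Choose an anti-interference method with the two-branch rule.

    Assumption: with probability epsilon the method is drawn uniformly
    at random from the library; otherwise the method matched to the
    identified interference type is returned.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in the open interval (0, 1)")
    if rng.random() < epsilon:
        return rng.choice(library)
    return matched_method

# Usage: six methods indexed 0..5, method 2 matched to the identified type.
library = list(range(6))
choice = epsilon_greedy(2, library, 0.5, random.Random(0))
```

Passing a seeded `random.Random` instance makes a run reproducible, which helps when comparing selection statistics across experiments.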

The anti-interference method library comprises 6 anti-interference methods: frequency agility, sidelobe blanking, adaptive sidelobe cancellation, adaptive beamforming, anti-mainlobe transmit waveform and space-time adaptive processing.

The anti-interference effective value R is looked up in the normalized gain value table according to the identified active interference type and the anti-interference method a.

The normalized gain value table consists of multiple anti-interference methods and multiple active interference signals. Each anti-interference effective value corresponds one-to-one to a pair of anti-interference method and active interference signal and lies in [0, 1]. The abscissa of the table represents the active interference type, and the ordinate represents the anti-interference method.

The anti-interference method a and the anti-interference effective value R are formed into a radar state matrix S_t.
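A minimal sketch of the table lookup and of forming S_t is given below. The gain values are invented purely for illustration and are not the values used in the patent; encoding the method a as its index in the library is likewise an assumption.

```python
# Columns (abscissa): active interference types named in the text.
INTERFERENCE_TYPES = [
    "frequency sweep",
    "noise frequency modulation",
    "dense false target",
    "velocity pull-off",
    "range-velocity joint pull-off",
]

# Rows (ordinate): the six methods of the anti-interference method library.
METHODS = [
    "frequency agility",
    "sidelobe blanking",
    "adaptive sidelobe cancellation",
    "adaptive beamforming",
    "anti-mainlobe transmit waveform",
    "space-time adaptive processing",
]

# Hypothetical normalized gains in [0, 1]; NOT the patent's table.
GAIN_TABLE = [
    [0.9, 0.6, 0.3, 0.2, 0.2],
    [0.3, 0.4, 0.9, 0.3, 0.3],
    [0.4, 0.9, 0.5, 0.3, 0.3],
    [0.5, 0.6, 0.5, 0.4, 0.4],
    [0.2, 0.3, 0.4, 0.5, 0.9],
    [0.3, 0.4, 0.4, 0.9, 0.5],
]

def effective_value(interference, method_index):
    """Look up the anti-interference effective value R."""
    column = INTERFERENCE_TYPES.index(interference)
    return GAIN_TABLE[method_index][column]

def build_state(interference, method_index):
    """Form the 1 x 2 radar state matrix S_t = [a, R]."""
    return [[float(method_index), effective_value(interference, method_index)]]

state = build_state("frequency sweep", 0)
```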

Step 5, the loss function is constructed as follows.

Wherein Loss represents the loss function; γ represents the discount factor, with value range [0, 1]; max denotes taking the maximum value; q(s_t) represents the convergence matrix output by the estimated value network when the radar state matrix S_t is input; and q(s_{t+1}) represents the convergence matrix output by the target value network when the next radar state matrix S_{t+1} is input.

The next radar state matrix S_{t+1} is a radar state matrix of size 1 × 2 formed from the optimal anti-interference method a_t and the anti-interference effective value R_t obtained from the normalized gain value table.
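The loss formula itself is not reproduced in this text. As a point of reference only, the standard DQN temporal-difference loss, written with the symbols defined for step 5 and with R taken to be the anti-interference effective value from step 4, has the form below. This is the textbook expression and is an assumption here, not a transcription of the patent's formula; q(s_t) is read at the entry for the selected method a.

```latex
\mathrm{Loss} = \Bigl( R + \gamma \, \max q(s_{t+1}) - q(s_t) \Bigr)^{2}
```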

Step 6, the estimated value network is trained.

The radar state matrix S_t is input into the estimated value network, and the network parameters are updated iteratively by gradient descent until the loss function converges, giving a trained estimated value network.
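The gradient-descent update in step 6 can be illustrated with a deliberately simplified stand-in. The sketch below replaces the convolutional network with a linear model q = W s + b so that the gradient can be written by hand, and it assumes the standard squared temporal-difference loss; the learning rate of 0.05 and all names are hypothetical, while γ = 0.7 and the initial values 0.01 and 0 follow the text.

```python
def init_params(n_actions=6, n_inputs=2):
    """Weights 0.01 and biases 0, as in the text (linear stand-in model)."""
    return ([[0.01] * n_inputs for _ in range(n_actions)], [0.0] * n_actions)

def q_values(params, state):
    """Linear stand-in for a value network: q = W s + b."""
    weights, biases = params
    return [sum(w * x for w, x in zip(row, state)) + b
            for row, b in zip(weights, biases)]

def td_step(eval_params, target_params, state, action, reward, next_state,
            gamma=0.7, lr=0.05):
    """One gradient-descent step on (R + gamma * max q_target - q_eval[a])^2.

    Updates eval_params in place and returns the loss before the update.
    """
    weights, biases = eval_params
    target = reward + gamma * max(q_values(target_params, next_state))
    error = target - q_values(eval_params, state)[action]
    # d(loss)/dw_i = -2 * error * x_i, so descending adds lr * 2 * error * x_i.
    for i, x in enumerate(state):
        weights[action][i] += lr * 2.0 * error * x
    biases[action] += lr * 2.0 * error
    return error ** 2

eval_params = init_params()
target_params = init_params()  # held fixed while the estimated network trains
first_loss = td_step(eval_params, target_params, [1.0, 0.9], 1, 0.9, [1.0, 0.9])
last_loss = first_loss
for _ in range(50):
    last_loss = td_step(eval_params, target_params, [1.0, 0.9], 1, 0.9, [1.0, 0.9])
```

Repeating the step on the same transition drives the loss toward zero, which is the convergence criterion the text uses to stop training.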

Step 7, the optimal anti-interference method is selected using the DQN algorithm.
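According to the technical scheme, the trained estimated value network outputs a 1 × 6 convergence matrix with one column per anti-interference method, and the method with the largest convergence value is chosen. A minimal sketch of that selection follows; the column order of the methods and the sample convergence values are hypothetical.

```python
# Assumed column order of the anti-interference method library.
METHOD_LIBRARY = [
    "frequency agility",
    "sidelobe blanking",
    "adaptive sidelobe cancellation",
    "adaptive beamforming",
    "anti-mainlobe transmit waveform",
    "space-time adaptive processing",
]

def select_best_method(convergence_row):
    """Return (index, name) of the column with the largest convergence value."""
    if len(convergence_row) != len(METHOD_LIBRARY):
        raise ValueError("expected one convergence value per method")
    best = max(range(len(convergence_row)), key=convergence_row.__getitem__)
    return best, METHOD_LIBRARY[best]

# Hypothetical single row of a 1 x 6 convergence matrix.
sample_row = [0.12, 0.55, 0.31, 0.08, 0.47, 0.20]
best_index, best_name = select_best_method(sample_row)
```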

The effect of the present invention will be further described with reference to simulation experiments.

1. Simulation experiment conditions are as follows:

The hardware platform of the simulation experiment: an Intel Core i7-7700 CPU and 16 GB of RAM.

The software platform of the simulation experiment: the Windows 7 operating system and PyCharm 2019.

2. Simulation content and result analysis:

In the simulation experiment, the method of the present invention and a prior-art method (the Q learning method) are each used to select the optimal anti-interference method corresponding to the active interference signal. The selection result graphs of the method of the invention, and the anti-interference effective value graphs of the method of the invention and of the Q learning method, are generated with PyCharm 2019; the results are shown in fig. 2 and fig. 3.

One prior art Q learning method employed in simulation experiments is set forth in the following paper:

Chenqiang et al., "Intelligent radar countermeasure based on Q-learning" (Systems Engineering and Electronics, 2018, 5, 1030-05), which proposes a method for selecting the best anti-interference technology using the Q learning algorithm (Q learning method for short).

In the simulation experiment, ε of the ε-greedy algorithm is set to 0.9, the learning rate of the estimated value network and the target value network is 0.6, the discount factor γ in the loss function is 0.7, the capacity of the memory storing training data is 300, the number M of training samples randomly drawn each time is 30, the target value network parameters are updated every 100 steps, and the number of iterations is 100, with the optimal anti-interference method selected 100 times in each iteration. The frequency-sweep, noise frequency modulation, dense false-target and velocity pull-off interference processes are each selected 1667 times, and the range-velocity joint pull-off interference process is selected 1666 times.
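The memory capacity, batch size and target update interval listed above correspond to the usual DQN replay memory and periodic target refresh. The sketch below shows one way to hold those settings; the class and function names are hypothetical, and uniform random sampling without replacement is an assumption.

```python
import random
from collections import deque

MEMORY_CAPACITY = 300          # capacity of the memory storing training data
BATCH_SIZE = 30                # number M of samples drawn each time
TARGET_UPDATE_INTERVAL = 100   # steps between target value network updates

class ReplayMemory:
    """Fixed-size memory; the oldest transition is dropped when full."""

    def __init__(self, capacity=MEMORY_CAPACITY):
        self._buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=BATCH_SIZE, rng=random):
        # Uniform sampling without replacement (assumed).
        size = min(batch_size, len(self._buffer))
        return rng.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)

def should_sync_target(step, interval=TARGET_UPDATE_INTERVAL):
    """True on the steps where the target network parameters are refreshed."""
    return step > 0 and step % interval == 0

memory = ReplayMemory()
for step in range(350):
    memory.store([float(step), 0.5], step % 6, 0.5, [float(step + 1), 0.5])
batch = memory.sample(rng=random.Random(0))
```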

The simulation of the invention includes 5 active interference signals: frequency-sweep interference, dense false-target interference, velocity pull-off interference, noise frequency modulation interference and range-velocity joint pull-off interference.

The signal model for each interference is as follows.

The signal model of the sweep frequency interference is as follows:

where J₁(t) represents the frequency-sweep interference signal, U_j is the amplitude of the frequency-sweep interference signal, cos denotes the cosine function, t represents discrete time after sampling by the radar jammer, f_j is the center frequency of the signal, Δf_j is the noise bandwidth, and the phase term is uniformly distributed on [0, 2π] and independent of f_j.

The signal model of the dense false-target interference is as follows:

where J₂(t) represents the dense false-target interference signal, which is an intermittently sampled retransmission of the radar signal; the sampling signal is a rectangular pulse train; · denotes multiplication; the sampling window is a gate function in which τ represents the pulse width of the sampling pulse train; Σ(·) denotes summation; P represents the number of sampled pulses; δ(t) represents the impulse function; and T_s represents the sampling pulse repetition period.
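The intermittent sampling described by these symbols can be sketched as a gate that is open for a width τ at the start of each of P periods of length T_s and closed otherwise. The sketch covers only the sampling multiplication; the retransmission delay and any amplitude scaling are not modelled, the windows are assumed to start at integer multiples of T_s, and the function names are hypothetical.

```python
def sampling_gate(t, tau, t_s, pulses):
    """1.0 inside any of the `pulses` windows [n*T_s, n*T_s + tau), else 0.0."""
    if t < 0:
        return 0.0
    n = int(t // t_s)
    return 1.0 if n < pulses and (t - n * t_s) < tau else 0.0

def intermittent_sample(radar_signal, times, tau, t_s, pulses):
    """Multiply the radar signal by the rectangular sampling pulse train."""
    return [sampling_gate(t, tau, t_s, pulses) * x
            for t, x in zip(times, radar_signal)]

# Usage with hypothetical values: tau = 2, T_s = 5, P = 3, unit-amplitude signal.
times = [float(k) for k in range(20)]
sampled = intermittent_sample([1.0] * 20, times, tau=2.0, t_s=5.0, pulses=3)
```

With these values the gate is open on [0, 2), [5, 7) and [10, 12), so six of the twenty unit samples pass through.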

The signal model of the velocity pull-off interference is as follows:

where J₃ represents the velocity pull-off interference signal, A_J represents the signal amplitude, exp denotes the exponential function with base e, ω_c is the carrier frequency of the radar signal, ω_d is the Doppler frequency of the radar echo signal, Δω represents the fixed Doppler frequency, R_t represents the distance between the real target and the radar, and c represents the speed of light.

The signal model of the noise frequency modulation interference is as follows:

where J₄ represents the noise frequency modulation interference signal, U_j is the signal amplitude, cos denotes the cosine function, K_FM is the frequency modulation slope (a constant), ω_j is the center frequency, the modulation noise u(t) is a zero-mean, wide-sense stationary random process with a specified variance, and the phase term is uniformly distributed on [0, 2π] and independent of u(t).
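For orientation, the sketch below generates samples of a noise frequency modulation signal using the common textbook form J(t) = U_j cos(ω_j t + 2π K_FM ∫ u(t') dt' + φ). That form is an assumption, since the patent's own formula is not reproduced in this text; Gaussian modulation noise, the rectangle-rule integration and all function and parameter names are also choices made for the example.

```python
import math
import random

def noise_fm_samples(n, dt, amplitude, omega_j, k_fm, sigma, rng):
    """Sample U_j * cos(omega_j*t + 2*pi*K_FM * integral(u) + phi).

    u(t) is drawn as zero-mean Gaussian noise with standard deviation
    sigma (assumed distribution); phi is uniform on [0, 2*pi].
    """
    phi = rng.uniform(0.0, 2.0 * math.pi)
    integral = 0.0  # running rectangle-rule integral of u(t)
    samples = []
    for k in range(n):
        t = k * dt
        phase = omega_j * t + 2.0 * math.pi * k_fm * integral + phi
        samples.append(amplitude * math.cos(phase))
        integral += rng.gauss(0.0, sigma) * dt
    return samples

# Usage with hypothetical parameters.
samples = noise_fm_samples(
    n=1000, dt=1e-6, amplitude=2.0, omega_j=2.0 * math.pi * 1e5,
    k_fm=1e4, sigma=1.0, rng=random.Random(0),
)
```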

The signal model of the range-velocity joint pull-off interference is as follows:

where J₅ represents the range-velocity joint pull-off interference signal, U_j is the signal amplitude, exp denotes the exponential function with base e, f₀ is the carrier frequency of the signal, f_dj is the pull-off Doppler frequency, R_t represents the distance between the real target and the radar, Δt_f is the pull-off delay function, and c represents the speed of light.

In the simulation experiment, after the radar receives the active interference signal transmitted by the jammer, it identifies the type of the signal; two convolutional neural networks are constructed, a radar state is generated, and the optimal anti-interference method for the active interference signal is obtained after the radar state is processed by the DQN algorithm. This yields the selection results between the active interference signals and the optimal anti-interference methods shown in fig. 2, and the comparison of the normalized anti-interference effective values of the DQN algorithm and the Q learning method shown in fig. 3.

Fig. 2 shows the optimal anti-interference methods selected for the 5 active interference signals in the simulation experiment. The abscissa in fig. 2 indicates the number of anti-interference selections for each optimal anti-interference method, and the ordinate indicates the corresponding convergence value. Fig. 2(a) is the convergence diagram for the optimal anti-interference method corresponding to frequency-sweep interference, and the dotted line marked "+" in fig. 2(a) is its convergence value curve. Fig. 2(b) is the convergence diagram for the optimal anti-interference method corresponding to dense false-target interference, and the dotted line marked "+" in fig. 2(b) is its convergence value curve. Fig. 2(c) is the convergence diagram for the optimal anti-interference method corresponding to velocity pull-off interference, and the dotted line marked "- -" in fig. 2(c) is its convergence value curve. Fig. 2(d) is the convergence diagram for the optimal anti-interference method corresponding to noise suppression interference, and the dotted line marked "·" in fig. 2(d) is its convergence value curve. Fig. 2(e) is the convergence diagram for the optimal anti-interference method corresponding to range-velocity joint pull-off interference, and the dotted line marked "·" in fig. 2(e) is its convergence value curve.

Fig. 3 shows the normalized anti-interference effective values obtained with the method of the present invention and with the prior-art Q learning method. The abscissa in fig. 3 indicates the iteration number, 100 iterations being performed in total, and the ordinate indicates the normalized anti-interference effective values of the DQN algorithm and the Q learning method. One curve in fig. 3 represents the normalized anti-interference effective value of the DQN algorithm, and the other represents that of the Q learning method.

As can be seen from fig. 2(a), when the interference type is frequency-sweep interference, the optimal anti-interference method is selected 1667 times in the simulation experiment; the method of the present invention selects the frequency-agility anti-interference method 1442 times, and the radar intelligent decision accuracy is 84.7%.

As can be seen from fig. 2(b), when the interference type is dense false-target interference, the optimal anti-interference method is selected 1667 times in the simulation experiment; the method of the present invention selects the sidelobe blanking anti-interference method 1406 times, and the radar intelligent decision accuracy is 84.3%.

As can be seen from fig. 2(c), when the interference type is velocity pull-off interference, the optimal anti-interference method is selected 1667 times in the simulation experiment; the method of the present invention selects the space-time adaptive processing anti-interference method 1379 times, and the radar intelligent decision accuracy is 82.7%.

As can be seen from fig. 2(d), when the interference type is noise suppression interference, the optimal anti-interference method is selected 1667 times in the simulation experiment; the method of the present invention selects the adaptive sidelobe cancellation anti-interference method 1215 times, and the radar intelligent decision accuracy is 72.8%.

As can be seen from fig. 2(e), when the interference type is range-velocity joint pull-off interference, the optimal anti-interference method is selected 1666 times in the simulation experiment; the method of the present invention selects the anti-mainlobe transmit waveform anti-interference method 1258 times, and the radar intelligent decision accuracy is 75.5%.

In summary, radar intelligent decision-making based on the DQN algorithm achieves an average decision accuracy of 80%, which is a high decision accuracy.
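The 80% figure is the mean of the five per-interference accuracies quoted above. The short check below, a sketch that uses only the counts and percentages stated in the text, recomputes each ratio and the mean. Four of the five ratios agree with the stated percentages to within rounding or truncation; the frequency-sweep ratio 1442/1667 works out to roughly 86.5% rather than the stated 84.7%, so it is the stated percentages that average to 80%.

```python
# (correct selections, total selections) as quoted for fig. 2(a) to 2(e).
counts = {
    "frequency sweep": (1442, 1667),
    "dense false target": (1406, 1667),
    "velocity pull-off": (1379, 1667),
    "noise suppression": (1215, 1667),
    "range-velocity joint pull-off": (1258, 1666),
}

# Accuracies in percent as stated in the text.
stated = {
    "frequency sweep": 84.7,
    "dense false target": 84.3,
    "velocity pull-off": 82.7,
    "noise suppression": 72.8,
    "range-velocity joint pull-off": 75.5,
}

recomputed = {name: 100.0 * hits / total for name, (hits, total) in counts.items()}
mean_stated = sum(stated.values()) / len(stated)
mean_recomputed = sum(recomputed.values()) / len(recomputed)

for name in counts:
    print(f"{name}: stated {stated[name]:.1f}%, recomputed {recomputed[name]:.1f}%")
print(f"mean of stated values: {mean_stated:.1f}%")
```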

As can be seen from fig. 3, the normalized anti-interference effective value of the DQN algorithm increases gradually and is much larger than that of the Q learning method, which indicates that the anti-interference effect of the method of the present invention is superior to that of the Q learning method. The figure also shows that the normalized anti-interference effective value of the method of the present invention stabilizes after about 40 iterations, which demonstrates that the method has good stability and fast processing speed, can handle large-scale data decision tasks, and improves the efficiency of selecting the optimal anti-interference method.

Claims

1. A radar anti-interference intelligent decision method based on the DQN algorithm, characterized in that the current state of the radar is constructed from the interference signal identified by the radar, and the optimal anti-interference method is selected by the DQN algorithm while two convolutional neural networks are trained and used for decision-making, the method comprising the following steps:

(1) the jammer transmits an active interference signal;

(2) the radar identifies the type of the active interference signal it receives;

(3) constructing two convolutional neural networks:

(4) generating a radar state matrix:

(5) constructing the loss function as follows:

(6) training the estimated value network:

the radar state matrix S_t is input into the estimated value network, and the network parameters are updated iteratively by gradient descent until the loss function converges, giving a trained estimated value network;

(7) selecting the optimal anti-interference method by using a DQN algorithm;

2. The DQN algorithm-based radar anti-interference intelligent decision method according to claim 1, wherein the active interference signal in step (1) is any one of a frequency-sweep interference signal, a noise frequency modulation interference signal, a dense false-target interference signal, a velocity pull-off interference signal and a range-velocity joint pull-off interference signal.

3. The DQN algorithm-based radar anti-interference intelligent decision method according to claim 1, wherein the ε-greedy method in step (4a) selects an anti-interference method from the anti-interference method library with probability ε, and selects the anti-interference method corresponding to the identified active interference signal type from the anti-interference method library with probability 1-ε, ε being a value in the interval (0, 1).

4. The DQN algorithm-based radar anti-interference intelligent decision method according to claim 1, wherein the anti-interference method library in step (4a) comprises 6 anti-interference methods: frequency agility, sidelobe blanking, adaptive sidelobe cancellation, adaptive beamforming, anti-mainlobe transmit waveform and space-time adaptive processing.

5. The DQN algorithm-based radar anti-interference intelligent decision method according to claim 1, wherein the normalized gain value table in step (4b) consists of multiple anti-interference methods and multiple active interference signals, each anti-interference effective value corresponds one-to-one to a pair of anti-interference method and active interference signal and lies in [0, 1], the abscissa of the normalized gain value table represents the active interference type, and the ordinate represents the anti-interference method.

6. The DQN algorithm-based radar anti-interference intelligent decision method according to claim 1, wherein the next radar state matrix S_{t+1} in step (5) is a radar state matrix of size 1 × 2 formed from the optimal anti-interference method a_t and the anti-interference effective value R_t obtained from the normalized gain value table.