CN113507342A - Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning - Google Patents

Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning

Info

Publication number
CN113507342A
CN113507342A (application CN202110930717.6A)
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
time slot
interference
node
Prior art date
Legal status
Granted
Application number
CN202110930717.6A
Other languages
Chinese (zh)
Other versions
CN113507342B (en)
Inventor
赵睿
刘浩然
Current Assignee
Huaqiao University
Original Assignee
Huaqiao University
Priority date
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN202110930717.6A priority Critical patent/CN113507342B/en
Publication of CN113507342A publication Critical patent/CN113507342A/en
Application granted granted Critical
Publication of CN113507342B publication Critical patent/CN113507342B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00Jamming of communication; Counter-measures
    • H04K3/20Countermeasures against jamming
    • H04K3/22Countermeasures against jamming including jamming detection and monitoring
    • H04K3/224Countermeasures against jamming including jamming detection and monitoring with countermeasures at transmission and/or reception of the jammed signal, e.g. stopping operation of transmitter or receiver, nulling or enhancing transmitted power in direction of or at frequency of jammer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00Jamming of communication; Counter-measures
    • H04K3/80Jamming or countermeasure characterized by its function
    • H04K3/84Jamming or countermeasure characterized by its function related to preventing electromagnetic interference in petrol station, hospital, plane or cinema
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00Jamming of communication; Counter-measures
    • H04K3/20Countermeasures against jamming
    • H04K3/28Countermeasures against jamming with jamming and anti-jamming mechanisms both included in a same device or system, e.g. wherein anti-jamming includes prevention of undesired self-jamming resulting from jamming
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00Jamming of communication; Counter-measures
    • H04K3/40Jamming having variable characteristics
    • H04K3/45Jamming having variable characteristics characterized by including monitoring of the target or target signal, e.g. in reactive jammers or follower jammers for example by means of an alternation of jamming phases and monitoring phases, called "look-through mode"
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K2203/00Jamming of communication; Countermeasures
    • H04K2203/10Jamming or countermeasure used for a particular application
    • H04K2203/22Jamming or countermeasure used for a particular application for communication related to vehicles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K2203/00Jamming of communication; Countermeasures
    • H04K2203/30Jamming or countermeasure characterized by the infrastructure components
    • H04K2203/34Jamming or countermeasure characterized by the infrastructure components involving multiple cooperating jammers
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention provides an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning, comprising the following steps. Step 1: build an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer. Step 2: take the observed state of the current time slot as the input of a target Q neural network, obtain the Q values of all actions from the network, select the action of the current time slot according to an ε-greedy strategy, calculate the bit error rate, signal to interference plus noise ratio, interruption rate and benefit for the current time slot, observe the state of the next time slot, store the experience e in an experience pool, and randomly extract a number of experiences from the experience pool to update the Q neural network parameter θ. Step 3: perform the operations of step 2 on the divided time slots in sequence, and update the parameters of the target Q network once every fixed interval of T time slots by setting θ⁻ = θ.
The method can reduce the bit error rate and interruption rate of the communication system, improve the anti-interference performance, and reduce the energy consumption of the source unmanned aerial vehicle.

Description

Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, in particular to an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning.
Background
Due to the broadcast nature of wireless communication, the link between a single drone and a ground station is easily jammed by hostile ground jammers. In addition, a single drone is usually far from the ground station when performing a task, so the path loss can be large. Under these two main factors, the direct link between a single drone and the ground station suffers severe interference, which degrades the quality of the received signal and causes information errors or loss.
Traditional drone anti-interference schemes mostly improve anti-interference performance by optimizing the transmit power. However, when the source drone is far from the ground station, the path loss is very large, and the desired anti-interference performance cannot be achieved by optimizing the source drone's transmit power alone. Moreover, traditional drone anti-interference schemes need to know the specific channel model and interference model in order to optimize the transmit power, which greatly limits their applicability in practical scenarios.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning, which can significantly improve the anti-interference performance of a communication system without knowing the adversary's interference model or the channel model.
The invention is realized in this way, an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning, comprising the following steps:
step 1, building an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer, wherein information is forwarded between the ground node and the source unmanned aerial vehicle through a relay unmanned aerial vehicle, and the jammers simultaneously transmit jamming signals to the ground node and the relay unmanned aerial vehicle nodes; the set of unmanned aerial vehicle nodes in the communication network is defined as U = {U_n}, 0 ≤ n ≤ N, where U_0 is the source unmanned aerial vehicle node and U_r is a relay unmanned aerial vehicle node, 1 ≤ r ≤ N;
step 2, taking the observed state of the current time slot as the input of a target Q neural network, obtaining the Q values of all actions after analysis by the target Q neural network, selecting the action of the current time slot according to an ε-greedy strategy, calculating the bit error rate, signal to interference plus noise ratio, interruption rate and benefit for the current time slot, observing the state of the next time slot, storing the experience e in an experience pool, and randomly extracting a number of historical experiences from the experience pool to update the Q neural network parameter θ with a stochastic gradient descent algorithm; the state of the current time slot comprises the transmission power, bit error rate and signal to interference plus noise ratio of the last time slot; the action comprises the transmission power of the current time slot and the relay unmanned aerial vehicle of the current time slot; and the experience e comprises the state of the current time slot, the action of the current time slot, the benefit of the current time slot and the state of the next time slot;
step 3, performing the operations of step 2 on the divided time slots in sequence, and updating the parameters θ⁻ of the target Q network once every fixed interval of T time slots by setting θ⁻ = θ.
Further, in the k-th time slot, U_0 selects a relay U_r and sends a message to U_r with power p^(k). After receiving the message, U_r calculates the signal to interference plus noise ratio ρ_{0-r}^(k) and bit error rate b_{0-r}^(k) of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, U_r relays the message to the ground node at a fixed relay power p_r. After the ground node receives the message, it calculates the signal to interference plus noise ratio ρ_{r-G}^(k) and bit error rate b_{r-G}^(k) of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, the signal to interference plus noise ratio ρ^(k) and bit error rate b^(k) of the whole communication process are obtained from the per-hop values:

[Equation images: ρ^(k) and b^(k) expressed in terms of ρ_{0-r}^(k), ρ_{r-G}^(k), b_{0-r}^(k) and b_{r-G}^(k).]
Further, the manner of judging whether the transmission is interrupted is specifically: comparing the signal to interference plus noise ratio with a threshold τ; if the signal to interference plus noise ratio is greater than the threshold τ, the communication process is judged not to be interrupted.
Further, the interruption rate is calculated according to the following formula:
Figure BDA0003210583940000029
Further, the intelligent jammer is denoted J1 and the fixed jammer J2. The interference power p_{J1}^(k) of intelligent jammer J1 has maximum J_max and is quantized into Y levels, p_{J1}^(k) ∈ B = {y·J_max/Y}, 1 ≤ y ≤ Y, where B is the interference power set. The benefit of the intelligent jammer after taking an action is calculated as:

u_J^(k) = I{O^(k) = 1} − C_j·p_{J1}^(k)

where C_j represents the weight of the intelligent jammer's energy consumption and I{·} is the interruption indication function, equal to 1 if the transmission is interrupted (O = 1) and 0 if it is not (O = 0). The intelligent jammer J1 observes the signal to interference plus noise ratio ρ^(k−1) of the last time slot and selects its interference power so as to raise the interruption rate and maximize its benefit, while the interference power p_{J2} of the fixed jammer J2 always remains a fixed value. Because the jammers transmit jamming signals in all directions, both the relay unmanned aerial vehicle U_r and the ground node are interfered; the intelligent jammer needs to select its interference power through reinforcement learning.
Further, the specific calculation formula of the signal to interference plus noise ratio is as follows:
Figure BDA0003210583940000035
where ρ ism-nRepresenting the signal-to-interference-and-noise ratio of the message received by the n node when the m node sends the message to the n node, p represents the transmission power of the m node, hm-nMultiple of dB value representing path loss from m node to n node, pJiRepresenting the power of the jammer, i taking 1 or 0, pJ1Power, p, representing smart jammer J1J2Represents the power, h, of a stationary jammer J2i-nMultiple of dB value, sigma, representing what jammer Ji node to n node path loss2Power as background noise;
h ism-nThe calculation formula of (a) is as follows:
Figure BDA0003210583940000036
wherein L ism-n(r) represents the path loss from the m node to the n node, and the specific calculation formula is as follows:
Figure BDA0003210583940000037
r is the Euclidean distance between two nodes of m and N, m is more than or equal to 0 and is not equal to N and is not more than N, c is the speed of light, f is the communication frequency, alphapRepresents the path loss index, alpha when m and n nodes are the source unmanned aerial vehicle node and the relay unmanned aerial vehicle nodep2.05, when m and n nodes are relay unmanned aerial vehicle and ground nodeAt a point of time, αp=2.32。
Further, the bit error rate is calculated as follows:
Figure BDA0003210583940000041
where ρ represents the signal to interference plus noise ratio.
Further, the calculation formula of the benefit is as follows:
u(k)=10-δb(k)-Cup(k)
where δ represents the weight of the bit error rate, CuRepresenting the weight of energy consumption.
Further, the updating formula of the Q neural network parameter θ is as follows:
Figure BDA0003210583940000042
where s, x, u, s ' respectively represent the state, action, benefit and next state in experience e, gamma represents the discount factor, x ' represents the action in the s ' state,
Figure BDA0003210583940000043
the Q value after selecting the operation x 'in the next state s' is shown, and α represents the learning rate.
The invention has the following advantages: a deep reinforcement learning algorithm simultaneously optimizes the transmission power of the source unmanned aerial vehicle and the selection of the relay unmanned aerial vehicle, which effectively reduces the bit error rate and interruption rate of the communication system, improves the anti-interference performance, and effectively reduces the energy consumption of the source unmanned aerial vehicle; no specific channel model or interference model needs to be known, so the method is well suited to practical application and convenient to popularize.
Drawings
The invention is further described below with reference to embodiments and the accompanying drawings.
Fig. 1 is an execution flow chart of the unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning.
Fig. 2 is a schematic diagram of a communication network system model according to an embodiment of the present invention.
Fig. 3 is a schematic diagram comparing the bit error rate of an embodiment of the present invention with that of conventional anti-interference methods.
Fig. 4 is a schematic diagram comparing the energy consumption of the source drone in an embodiment of the present invention with that under conventional anti-interference methods.
Fig. 5 is a schematic diagram comparing the interruption rate of an embodiment of the present invention with that of conventional anti-interference methods.
Detailed Description
The invention provides an unmanned aerial vehicle relay anti-interference scheme based on deep reinforcement learning. The anti-interference performance indicators to be optimized are the bit error rate of received messages, the interruption rate of the communication system and the energy consumption of the source unmanned aerial vehicle, and the scheme is a joint optimization scheme. First, several relay unmanned aerial vehicles are arranged between the ground station and the source unmanned aerial vehicle, and the path loss of each transmission is reduced by relaying and forwarding messages. Second, by applying the deep reinforcement learning algorithm DQN, the source unmanned aerial vehicle can obtain the optimal transmit power and relay unmanned aerial vehicle to improve the anti-interference performance. Finally, the simulation results show that the proposed scheme can significantly improve the anti-interference performance of the communication system without knowing the interference model or the channel model.
As shown in fig. 1, the method for resisting interference of the relay of the unmanned aerial vehicle based on deep reinforcement learning of the present invention includes:
Step 1, building an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer, wherein information is forwarded between the ground node and the source unmanned aerial vehicle through a relay unmanned aerial vehicle, and the jammers simultaneously transmit jamming signals to the ground node and the relay unmanned aerial vehicle nodes. The set of unmanned aerial vehicle nodes in the communication network is defined as U = {U_n}, 0 ≤ n ≤ N, where U_0 is the source unmanned aerial vehicle node and U_r is a relay unmanned aerial vehicle node, 1 ≤ r ≤ N, and the flight period of the source unmanned aerial vehicle is divided into a number of communication time slots. In the communication network, the unmanned aerial vehicle nodes hover at different heights and the jammers are located on the ground. The direct link between the source unmanned aerial vehicle node and the ground node is broken by the interference, and the relay unmanned aerial vehicle nodes can help the source unmanned aerial vehicle node relay messages to the ground node. The feedback channel, assumed not to be jammed, is used to transmit the bit error rate and signal to interference plus noise ratio back to the source unmanned aerial vehicle node U_0.
Step 2, taking the observed state of the current time slot as the input of a target Q neural network, obtaining the Q values of all actions after analysis by the target Q neural network, selecting the action of the current time slot according to an ε-greedy strategy, calculating the bit error rate, signal to interference plus noise ratio, interruption rate and benefit for the current time slot, observing the state of the next time slot, storing the experience e in an experience pool, and randomly extracting a number of historical experiences from the experience pool to update the Q neural network parameter θ with a stochastic gradient descent algorithm. The state of the current time slot comprises the transmission power, bit error rate and signal to interference plus noise ratio of the last time slot; the action comprises the transmission power of the current time slot and the relay unmanned aerial vehicle of the current time slot; and the experience e comprises the state of the current time slot, the action of the current time slot, the benefit of the current time slot and the state of the next time slot. The invention provides a DQN-based unmanned aerial vehicle relay anti-interference scheme, where DQN is a deep reinforcement learning algorithm that combines a neural network with Q learning. When the problem to be solved by Q learning is complex, the state space or action set may be large, and learning efficiency drops if states must be retrieved from a very large Q table every time. Taking the state as the input of a neural network and obtaining the Q values of all actions from the network avoids constructing a large Q table to store the Q values.
Step 3, the operation in the step 2 is executed to the divided time slots in sequence, and the parameters of the target Q network are updated once every fixed time slot T
Figure BDA0003210583940000061
Order to
Figure BDA0003210583940000062
For example, in the k-th time slot, the system state is defined as s(k)=[p(k-1),b(k-1)(k-1)]Containing the transmission power p of the last time slotk-1Bit error rate bk-1Signal to interference plus noise ratio ρk-1,U0The selection action is
Figure BDA0003210583940000063
A is the action set of the source node and comprises selectable transmission power pkAnd relay unmanned aerial vehicle Ur. At U0After the message is sent, the bit error rate and the signal interference noise ratio of the received message are respectively in UrAnd the ground node carries out calculation and then sends the result to the U through a feedback channel0And calculating the benefit u of the time slot(k). Next time slot, U0Observing the feedback results determines a new state and then continuing to select a new action based on this state. And so on.
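To make the state/action bookkeeping concrete, the following Python sketch (all names hypothetical; M = 5 power levels and three relays are taken from the embodiment below) enumerates the joint action set A and applies the ε-greedy rule to a vector of Q values produced by the network:

```python
import random
import numpy as np

# Joint action set A: every (transmit power, relay index) pair.
# M = 5 power levels and N_RELAYS = 3 follow the embodiment; both are assumptions here.
P_MAX, M = 100.0, 5        # mW
N_RELAYS = 3
ACTIONS = [(m * P_MAX / M, r) for m in range(1, M + 1) for r in range(N_RELAYS)]

def epsilon_greedy(q_values, epsilon):
    """Select a random action with probability epsilon, otherwise the greedy action.

    q_values: length-len(ACTIONS) vector output by the Q network for state s^(k).
    Returns an index into ACTIONS, i.e. the action x^(k) = [p^(k), U_r^(k)].
    """
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(q_values))
```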
Preferably, in the k-th time slot, U_0 selects a relay U_r and sends a message to U_r with power p^(k). After receiving the message, U_r calculates the signal to interference plus noise ratio ρ_{0-r}^(k) and bit error rate b_{0-r}^(k) of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, U_r relays the message to the ground node at a fixed relay power p_r. After the ground node receives the message, it calculates the signal to interference plus noise ratio ρ_{r-G}^(k) and bit error rate b_{r-G}^(k) of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, the signal to interference plus noise ratio ρ^(k) and bit error rate b^(k) of the whole communication process are obtained from the per-hop values:

[Equation images: ρ^(k) and b^(k) expressed in terms of ρ_{0-r}^(k), ρ_{r-G}^(k), b_{0-r}^(k) and b_{r-G}^(k).]

The cooperation mode of the relay unmanned aerial vehicle nodes can be set to decode-and-forward (DF). The transmission power p^(k) of U_0 has maximum P_max and is quantized into M levels, p^(k) ∈ {m·P_max/M}, 1 ≤ m ≤ M.
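A minimal per-slot sketch of the two-hop decode-and-forward exchange described above; `sinr` and `ber` are assumed helpers implementing the patent's formulas, and since the end-to-end combining equations appear only as images, the min-of-hops convention used here for ρ^(k) is an assumption, not the patent's stated formula:

```python
def one_slot(p_src, p_relay, sinr, ber, tau):
    """Simulate one time slot of the two-hop DF transmission.

    sinr(p, tx, rx): assumed helper returning the SINR of the tx->rx hop at power p.
    ber(rho): assumed helper returning the bit error rate at SINR rho.
    tau: outage threshold (linear scale).
    Returns (rho, b, outage) for the whole slot.
    """
    rho_0r = sinr(p_src, "U0", "Ur")      # source -> relay hop
    if rho_0r <= tau:                      # relay cannot decode: transmission interrupted
        return rho_0r, None, 1
    rho_rg = sinr(p_relay, "Ur", "G")      # relay -> ground hop at fixed power p_r
    if rho_rg <= tau:
        return rho_rg, None, 1
    rho = min(rho_0r, rho_rg)              # ASSUMPTION: end-to-end SINR taken as min of hops
    return rho, ber(rho), 0
```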
Preferably, the manner of determining whether the transmission is interrupted is specifically: comparing the signal to interference plus noise ratio with a threshold τ, and if the signal to interference plus noise ratio is greater than the threshold τ, judging that the communication process is not interrupted.
Preferably, the interruption rate is calculated as O^(k) = I{ρ^(k) ≤ τ}, i.e., O^(k) = 1 when the signal to interference plus noise ratio of slot k does not exceed the threshold τ, and O^(k) = 0 otherwise.
Preferably, the intelligent jammer is denoted J1 and the fixed jammer J2. The interference power p_{J1}^(k) of intelligent jammer J1 has maximum J_max and is quantized into Y levels, p_{J1}^(k) ∈ B = {y·J_max/Y}, 1 ≤ y ≤ Y, where B is the interference power set. The benefit of the intelligent jammer after taking an action is calculated as:

u_J^(k) = I{O^(k) = 1} − C_j·p_{J1}^(k)

where C_j represents the weight of the intelligent jammer's energy consumption and I{·} is the interruption indication function, equal to 1 if the transmission is interrupted (O = 1) and 0 if it is not (O = 0). The intelligent jammer J1 observes the signal to interference plus noise ratio ρ^(k−1) of the last time slot and selects its interference power so as to raise the interruption rate and maximize its benefit, while the interference power p_{J2} of the fixed jammer J2 always remains a fixed value. Because the jammers transmit jamming signals in all directions, both the relay unmanned aerial vehicle U_r and the ground node are interfered; the intelligent jammer needs to select its interference power through reinforcement learning.
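The patent specifies only that J1 observes ρ^(k−1) and learns its power by reinforcement learning; the tabular Q-learning agent below is one illustrative realization (the learner itself and all names are assumptions):

```python
import random
from collections import defaultdict

J_MAX, Y = 40.0, 4                            # mW and quantization, per the embodiment below
B = [y * J_MAX / Y for y in range(1, Y + 1)]  # interference power set B

class SmartJammer:
    """Hypothetical tabular Q-learning jammer keyed on the quantized previous-slot SINR."""
    def __init__(self, alpha=0.1, gamma=0.5, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * len(B))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, rho_level):
        if random.random() < self.eps:
            return random.randrange(len(B))
        qs = self.q[rho_level]
        return qs.index(max(qs))

    def learn(self, s, a, outage, c_j, s_next):
        # Jammer benefit: interruption indicator minus weighted energy cost.
        u = (1.0 if outage else 0.0) - c_j * B[a]
        target = u + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```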
Preferably, the interference of the jammers affects the communication quality between any two nodes in the communication network. The specific calculation formula of the signal to interference plus noise ratio is:

ρ_{m-n} = p·h_{m-n} / (Σ_i p_{Ji}·h_{Ji-n} + σ²)

where ρ_{m-n} represents the signal to interference plus noise ratio of the message received by node n when node m sends a message to node n, p represents the transmission power of node m, h_{m-n} is the linear multiple converted from the dB value of the path loss from node m to node n, p_{Ji} represents the power of jammer J_i with i = 1 or 2 (p_{J1} the power of intelligent jammer J1 and p_{J2} the power of fixed jammer J2), h_{Ji-n} is the linear multiple converted from the dB value of the path loss from jammer J_i to node n, and σ² is the power of the background noise. The coding and modulation mode of the transmitted messages can be QPSK.

h_{m-n} is calculated as h_{m-n} = 10^(−L_{m-n}(r)/10) (i.e., the dB value of the path loss L_{m-n}(r) is converted into the multiple h_{m-n}), where L_{m-n}(r) is calculated as:

L_{m-n}(r) = 10·α_p·log₁₀(4π·r·f/c)

where r is the Euclidean distance between nodes m and n, 0 ≤ m ≠ n ≤ N, c is the speed of light, f is the communication frequency, and α_p denotes the path loss exponent. Because the communication channel from U_0 to U_r is an air-to-air channel, its path loss can be described by free-space propagation and fades slowly; but for the air-to-ground channel from U_r to the ground node, fading dominates due to objects near the ground node and more severe path loss. Therefore α_p = 2.05 when nodes m and n are the source unmanned aerial vehicle node and a relay unmanned aerial vehicle node, and α_p = 2.32 when nodes m and n are a relay unmanned aerial vehicle node and the ground node.
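These formulas translate directly into code; the helper names below are hypothetical, and the explicit free-space form of L_{m-n}(r) follows the expression given above (itself reconstructed from an equation image, so it should be read as an assumption):

```python
import math

C_LIGHT = 3.0e8     # speed of light, m/s
FREQ = 2.4e9        # communication frequency f, Hz (embodiment value)

def path_loss_db(r, alpha_p):
    """L_{m-n}(r) in dB: free-space form with path loss exponent alpha_p."""
    return 10.0 * alpha_p * math.log10(4.0 * math.pi * r * FREQ / C_LIGHT)

def linear_factor(loss_db):
    """h_{m-n} = 10^(-L/10): the linear multiple of a path loss given in dB."""
    return 10.0 ** (-loss_db / 10.0)

def sinr(p_tx, h_tx_rx, jammer_terms, noise):
    """rho_{m-n} = p * h_{m-n} / (sum_i p_Ji * h_{Ji-n} + sigma^2).

    jammer_terms: iterable of (p_Ji, h_Ji_n) pairs for jammers J1 and J2.
    noise: background noise power sigma^2 (linear scale).
    """
    interference = sum(p * h for p, h in jammer_terms)
    return p_tx * h_tx_rx / (interference + noise)
```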
Preferably, the bit error rate b is calculated as a function of the signal to interference plus noise ratio ρ:

[Equation image: b as a function of ρ.]
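The BER curve itself is only available as an image; as an illustrative stand-in consistent with the QPSK modulation mentioned above, one may assume the Gray-coded QPSK relation b = Q(√ρ) with ρ the per-symbol SINR (an assumption, not the patent's published formula):

```python
import math

def q_function(x):
    """Gaussian Q-function: Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_from_sinr(rho):
    """ASSUMED stand-in for the patent's BER curve: Gray-coded QPSK,
    b = Q(sqrt(rho)) with rho the per-symbol SINR on a linear scale."""
    return q_function(math.sqrt(rho))
```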
Preferably, the calculation formula of the benefit is as follows:

u^(k) = 10 − δ·b^(k) − C_u·p^(k)

where δ represents the weight of the bit error rate and C_u represents the weight of energy consumption.
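The benefit is directly computable; the defaults below take the embodiment's weights (δ = 3000, C_u = 100), and the unit convention for p^(k) is an assumption:

```python
def benefit(b_k, p_k, delta=3000.0, c_u=100.0):
    """u^(k) = 10 - delta * b^(k) - C_u * p^(k).

    b_k: bit error rate of slot k; p_k: transmit power of slot k
    (the power unit/normalization paired with C_u is an assumption).
    """
    return 10.0 - delta * b_k - c_u * p_k
```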
Preferably, the update formula of the Q neural network parameter θ is the stochastic gradient step on the squared temporal-difference error:

θ ← θ + α·[u + γ·max_{x'} Q(s', x'; θ⁻) − Q(s, x; θ)]·∇_θ Q(s, x; θ)

where s, x, u and s' respectively represent the state, action, benefit and next state in experience e, γ represents the discount factor, x' represents an action in state s', Q(s', x'; θ⁻) is the Q value after selecting action x' in the next state s', and α represents the learning rate.
In each time slot, U_0 stores the experience e^(k) = {s^(k), x^(k), u^(k), s^(k+1)} into its experience pool, defined as R = {e^(i)}, 1 ≤ i ≤ k−1. The Q network and the target Q network have the same structure, and the initial network parameters are the same, θ⁻ = θ. The network parameter θ of the Q network is updated every time slot: when updating, U_0 randomly extracts a number of historical experiences from the experience pool and updates θ with a stochastic gradient descent algorithm, with the loss function:

L(θ) = E[(u + γ·max_{x'} Q(s', x'; θ⁻) − Q(s, x; θ))²]

from which the above update formula of θ is obtained. The network parameters of the target Q network are updated once every fixed interval of T time slots by directly setting θ⁻ = θ. The target Q network is used to calculate the target value, which reduces the correlation between the current Q value and the target Q value.
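The experience pool R admits a compact implementation; the capacity bound below is an assumption (the text keeps all past experiences e^(1) … e^(k−1)):

```python
import random
from collections import deque

class ExperiencePool:
    """Pool R of experiences e = (s, x, u, s_next); bounded capacity is an assumption."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)

    def store(self, s, x, u, s_next):
        self.pool.append((s, x, u, s_next))

    def sample(self, z):
        """Randomly extract Z historical experiences for the SGD update of theta."""
        return random.sample(self.pool, min(z, len(self.pool)))

    def __len__(self):
        return len(self.pool)
```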
The detailed algorithm set according to the technical scheme of the invention proceeds as follows:

1: Initialize γ, p^(0), b^(0), ρ^(0), θ, θ⁻, p_{J2}, ε, τ, C_u, C_j, δ, P_max, J_max
2: for k = 1, 2, 3, … do
3:   Observe the state s^(k) = [p^(k−1), b^(k−1), ρ^(k−1)]
4:   Obtain the Q values output by the neural network and select an action x^(k) = [p^(k), U_r^(k)] according to the ε-greedy strategy
5:   Calculate ρ^(k) and b^(k) respectively
6:   Calculate the interruption rate
7:   Calculate u^(k)
8:   Observe the next state s^(k+1) = [p^(k), b^(k), ρ^(k)]
9:   Store the experience e^(k) = {s^(k), x^(k), u^(k), s^(k+1)} in the experience pool R
10:  Randomly extract Z experiences from the experience pool
11:  Update the network parameter θ
12:  if k is an integer multiple of T then
13:    Set θ⁻ = θ
14:  end if
15:  Set s^(k) = s^(k+1), performing the state iteration
16: end for
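Tying the pieces together, a training-loop skeleton mirroring steps 1–16 might look as follows (reusing the hypothetical helpers sketched above; `env` and `to_tensors` are assumed stand-ins for the communication simulation and batch conversion):

```python
import torch

def train(q_net, target_net, optimizer, pool, env, num_slots, T, Z, eps):
    target_net.load_state_dict(q_net.state_dict())              # step 1: theta^- = theta
    s = env.initial_state()                                     # [p^(0), b^(0), rho^(0)]
    for k in range(1, num_slots + 1):                           # step 2
        q_vals = q_net(torch.as_tensor(s, dtype=torch.float32).unsqueeze(0))
        a = epsilon_greedy(q_vals.detach().numpy().ravel(), eps)  # steps 3-4
        u, s_next = env.step(ACTIONS[a])                        # steps 5-8: SINR, BER, outage, benefit
        pool.store(s, a, u, s_next)                             # step 9
        if len(pool) >= Z:
            dqn_update(q_net, target_net, optimizer,
                       to_tensors(pool.sample(Z)))              # steps 10-11
        if k % T == 0:
            target_net.load_state_dict(q_net.state_dict())      # steps 12-13: theta^- = theta
        s = s_next                                              # step 15
    return q_net
```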
For a more detailed description of the present invention, reference is now made to a specific embodiment:
As shown in fig. 2, this embodiment uses a three-dimensional coordinate system to represent the communication network. Three optional relay drone nodes U_r (namely U_1, U_2, U_3) are placed at (10,30,30), (20,20,20) and (30,10,10); one source drone node U_0 at (40,40,40); one ground node G at (0,0,0); and two jammers, the fixed jammer J2 at (110,0,0) and the intelligent jammer J1 at (0,110,0). In accordance with the common civil drone standard, the communication frequency f is set to 2.4 GHz. The transmission power p^(k) ranges over [30,100] mW, uniformly quantized into 5 levels. Considering the distances between the relay nodes, the ground node and the jammers, the relay power p_r is set to 60 mW. The interference power p_{J2} of the fixed jammer is set to 30 mW, and the interference power p_{J1}^(k) of the intelligent jammer ranges over [10,40] mW, uniformly quantized into 4 levels. The background noise power σ² is set to −100 dBm and the threshold τ to 10 dB. In the DQN algorithm, the learning rate α = 0.001, the discount factor γ = 0.5, δ = 3000, C_u = 100, and C_j = 60.
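For reference, the embodiment's settings gathered into a single configuration (values as stated in the text; the dictionary layout is merely illustrative):

```python
CONFIG = {
    "relay_positions": [(10, 30, 30), (20, 20, 20), (30, 10, 10)],  # U_1, U_2, U_3
    "source_position": (40, 40, 40),                                 # U_0
    "ground_position": (0, 0, 0),                                    # G
    "fixed_jammer_position": (110, 0, 0),                            # J2
    "smart_jammer_position": (0, 110, 0),                            # J1
    "frequency_hz": 2.4e9,
    "tx_power_mw": {"range": (30, 100), "levels": 5},
    "relay_power_mw": 60,
    "fixed_jammer_power_mw": 30,
    "smart_jammer_power_mw": {"range": (10, 40), "levels": 4},
    "noise_dbm": -100,
    "threshold_db": 10,
    "learning_rate": 0.001,
    "discount_gamma": 0.5,
    "delta": 3000,
    "C_u": 100,
    "C_j": 60,
}
```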
Based on the above algorithm, the simulation results are filtered with the least-squares smoothing filter function sgolayfilt in MATLAB, which reduces errors in the simulation results and smooths the curves. As shown in fig. 3, the bit error rate of the DQN-based drone relay anti-interference scheme of the invention is improved to a certain extent over the two conventional comparison schemes. The bit error rate of the Q-learning-based drone relay anti-interference scheme falls from an initial 7.2×10⁻⁴ and finally converges to 2.8×10⁻⁵ at about 4500 time slots, while the bit error rate of the DQN-based scheme falls from the same value and finally converges to 7.1×10⁻⁶ at around 2000 time slots. The proposed algorithm therefore converges faster and learns better. Fig. 4 plots the variation of the source drone's energy consumption: the DQN-based scheme is superior to the other two schemes in both convergence speed and final result. In the Q-learning-based scheme, the energy consumption decreases from an initial 65 mJ and finally converges to 36.2 mJ at about 4500 time slots; in the proposed scheme, the energy consumption decreases from the same value and converges to 30.4 mJ after only 2000 time slots, a better learning effect than the other schemes. As shown in fig. 5, the DQN-based scheme effectively reduces the interruption rate of the communication system, and to a larger degree than the other two conventional schemes. In the Q-learning-based scheme, the interruption rate decreases from 0.24 and finally converges to 0.08 at about 4500 time slots; in the proposed scheme, the interruption rate also decreases from 0.24 and finally converges to 0.005 at around 2000 time slots, a reduction of about 93.75% compared with the former.
Unlike the traditional drone power-optimization anti-interference scheme, in the joint optimization scheme provided by the invention the source unmanned aerial vehicle can reduce the path loss by selecting a relay unmanned aerial vehicle and can optimize its transmit power according to the position of the relay unmanned aerial vehicle. This scheme can significantly improve the anti-interference performance of the communication system. Meanwhile, the invention applies a deep reinforcement learning algorithm to the joint optimization scheme, yielding a DQN-based unmanned aerial vehicle relay anti-interference scheme, under which the source unmanned aerial vehicle can obtain the optimal communication strategy through continuous trial and error and accumulated experience, without knowing the specific channel model or interference model. The scheme is therefore better suited to application in actual scenarios and has a certain universality.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims (9)

1. An unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning is characterized in that: the method comprises the following steps:
step 1, building an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer, wherein information is forwarded between the ground node and the source unmanned aerial vehicle through a relay unmanned aerial vehicle, and the jammers simultaneously transmit jamming signals to the ground node and the relay unmanned aerial vehicle nodes; the set of unmanned aerial vehicle nodes in the communication network is defined as U = {U_n}, 0 ≤ n ≤ N, where U_0 is the source unmanned aerial vehicle node and U_r is a relay unmanned aerial vehicle node, 1 ≤ r ≤ N;
step 2, taking the observed state of the current time slot as the input of a target Q neural network, obtaining the Q values of all actions after analysis by the target Q neural network, selecting the action of the current time slot according to an ε-greedy strategy, calculating the bit error rate, signal to interference plus noise ratio, interruption rate and benefit for the current time slot, observing the state of the next time slot, storing the experience e in an experience pool, and randomly extracting a number of historical experiences from the experience pool to update the Q neural network parameter θ with a stochastic gradient descent algorithm; the state of the current time slot comprises the transmission power, bit error rate and signal to interference plus noise ratio of the last time slot; the action comprises the transmission power of the current time slot and the relay unmanned aerial vehicle of the current time slot; and the experience e comprises the state of the current time slot, the action of the current time slot, the benefit of the current time slot and the state of the next time slot;
step 3, performing the operations of step 2 on the divided time slots in sequence, and updating the parameters θ⁻ of the target Q network once every fixed interval of T time slots by setting θ⁻ = θ.
2. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 1, characterized in that:
in the k-th time slot, U_0 selects a relay U_r and sends a message to U_r with power p^(k); after receiving the message, U_r calculates the signal to interference plus noise ratio ρ_{0-r}^(k) and bit error rate b_{0-r}^(k) of the received message and judges whether the transmission is interrupted; if the transmission is not interrupted, U_r relays the message to the ground node at a fixed relay power p_r; after the ground node receives the message, it calculates the signal to interference plus noise ratio ρ_{r-G}^(k) and bit error rate b_{r-G}^(k) of the received message and judges whether the transmission is interrupted; if the transmission is not interrupted, the signal to interference plus noise ratio ρ^(k) and bit error rate b^(k) of the whole communication process are obtained from the per-hop values:

[Equation images: ρ^(k) and b^(k) expressed in terms of ρ_{0-r}^(k), ρ_{r-G}^(k), b_{0-r}^(k) and b_{r-G}^(k).]
3. the unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the method for judging whether transmission is interrupted specifically comprises the following steps: and comparing the signal interference noise ratio with a threshold tau, and if the signal interference noise ratio is greater than the threshold tau, judging that the communication process is not interrupted.
4. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 3, characterized in that: the interruption rate is calculated as O^(k) = I{ρ^(k) ≤ τ}.
5. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the intelligent jammer is denoted J1 and the fixed jammer J2; the interference power p_{J1}^(k) of intelligent jammer J1 has maximum J_max and is quantized into Y levels, p_{J1}^(k) ∈ B = {y·J_max/Y}, 1 ≤ y ≤ Y, where B is the interference power set; the benefit of the intelligent jammer after taking an action is calculated as:

u_J^(k) = I{O^(k) = 1} − C_j·p_{J1}^(k)

wherein C_j represents the weight of the intelligent jammer's energy consumption, and I{·} represents the interruption indication function, equal to 1 if the transmission is interrupted (O = 1) and 0 if it is not (O = 0);
the intelligent jammer J1 observes the signal to interference plus noise ratio ρ^(k−1) of the last time slot to select its interference power, while the interference power p_{J2} of the fixed jammer J2 always remains a fixed value.
6. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the specific calculation formula of the signal to interference plus noise ratio is:

ρ_{m-n} = p·h_{m-n} / (Σ_i p_{Ji}·h_{Ji-n} + σ²)

where ρ_{m-n} represents the signal to interference plus noise ratio of the message received by node n when node m sends a message to node n, p represents the transmission power of node m, h_{m-n} is the linear multiple converted from the dB value of the path loss from node m to node n, p_{Ji} represents the power of jammer J_i with i = 1 or 2, p_{J1} representing the power of intelligent jammer J1 and p_{J2} the power of fixed jammer J2, h_{Ji-n} is the linear multiple converted from the dB value of the path loss from jammer J_i to node n, and σ² is the power of the background noise;
h_{m-n} is calculated as h_{m-n} = 10^(−L_{m-n}(r)/10), where L_{m-n}(r) represents the path loss from node m to node n, calculated as:

L_{m-n}(r) = 10·α_p·log₁₀(4π·r·f/c)

where r is the Euclidean distance between nodes m and n, 0 ≤ m ≠ n ≤ N, c is the speed of light, f is the communication frequency, and α_p represents the path loss exponent: α_p = 2.05 when nodes m and n are the source unmanned aerial vehicle node and a relay unmanned aerial vehicle node, and α_p = 2.32 when nodes m and n are a relay unmanned aerial vehicle node and the ground node.
7. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the bit error rate b is calculated as a function of the signal to interference plus noise ratio ρ:

[Equation image: b as a function of ρ.]
8. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the calculation formula of the benefit is as follows:

u^(k) = 10 − δ·b^(k) − C_u·p^(k)

where δ represents the weight of the bit error rate and C_u represents the weight of energy consumption.
9. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the update formula of the Q neural network parameter θ is:

θ ← θ + α·[u + γ·max_{x'} Q(s', x'; θ⁻) − Q(s, x; θ)]·∇_θ Q(s, x; θ)

where s, x, u and s' respectively represent the state, action, benefit and next state in experience e, γ represents the discount factor, x' represents an action in state s', Q(s', x'; θ⁻) is the Q value after selecting action x' in the next state s', and α represents the learning rate.
CN202110930717.6A 2021-08-13 2021-08-13 Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning Active CN113507342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110930717.6A CN113507342B (en) 2021-08-13 2021-08-13 Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110930717.6A CN113507342B (en) 2021-08-13 2021-08-13 Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113507342A true CN113507342A (en) 2021-10-15
CN113507342B CN113507342B (en) 2023-06-02

Family

ID=78015555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110930717.6A Active CN113507342B (en) 2021-08-13 2021-08-13 Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113507342B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140206279A1 (en) * 2013-01-22 2014-07-24 Eden Rock Communications, Llc Method and system for intelligent jamming signal generation
CN109274456A (en) * 2018-09-10 2019-01-25 电子科技大学 A kind of imperfect information intelligence anti-interference method based on intensified learning
EP3854013A1 (en) * 2018-09-19 2021-07-28 Rheinmetall Air Defence AG Signal interference device and a method for operating a signal interference device for protecting unmanned aerial vehicles (uav), in particular drones
CN111917508A (en) * 2020-08-10 2020-11-10 中国人民解放军陆军工程大学 Anti-interference communication model based on multiple antennas and dynamic spatial spectrum anti-interference method
CN112564849A (en) * 2020-12-01 2021-03-26 国网辽宁省电力有限公司营口供电公司 Identification and trapping method for multi-model unmanned aerial vehicle

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHICHAO SHENG et al.: "UAV-Aided Two-Way Multi-User Relaying" *

Also Published As

Publication number Publication date
CN113507342B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN109729528B (en) D2D resource allocation method based on multi-agent deep reinforcement learning
CN109474980B (en) Wireless network resource allocation method based on deep reinforcement learning
CN108880662B (en) Wireless information and energy transmission optimization method based on unmanned aerial vehicle
CN110620611B (en) Cooperative spectrum sensing method based on GEO and LEO double-layer satellite network
CN110784882B (en) Energy acquisition D2D communication resource allocation method based on reinforcement learning
CN106680780A (en) Radar optimal waveform design method based on radio frequency stealth in frequency spectrum shared environment
CN111511038B (en) Distributed channel intelligent sensing and access method for wireless cooperative network
CN113596785B (en) D2D-NOMA communication system resource allocation method based on deep Q network
CN109861728B (en) Joint multi-relay selection and time slot resource allocation method for large-scale MIMO system
CN112583453A (en) Downlink NOMA power distribution method of multi-beam LEO satellite communication system
CN109195207B (en) Energy-collecting wireless relay network throughput maximization method based on deep reinforcement learning
CN109661034B (en) Antenna selection and resource allocation method in wireless energy supply communication network
CN101729164B (en) Wireless resource allocation method and cognitive radio user equipment
CN112040498B (en) Fixed point iteration-based wireless energy supply sensor network time allocation method
CN110139282B (en) Energy acquisition D2D communication resource allocation method based on neural network
CN115766089A (en) Energy acquisition cognitive Internet of things anti-interference optimal transmission method
CN113255218A (en) Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
CN110366225B (en) Wireless energy supply multi-hop communication system node selection method
CN113795050B (en) Sum Tree sampling-based deep double-Q network dynamic power control method
CN111741520A (en) Cognitive underwater acoustic communication system power distribution method based on particle swarm
CN113507342A (en) Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning
CN108449790B (en) Time and power distribution method of cognitive wireless network based on differential evolution algorithm
CN103957565B (en) Resource allocation methods based on target SINR in distributed wireless networks
CN115119174A (en) Unmanned aerial vehicle autonomous deployment method based on energy consumption optimization in irrigation area scene
Du et al. Joint time and power control of energy harvesting CRN based on PPO

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant