CN113507342A - Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning - Google Patents

Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning

Info

Publication number
CN113507342A
CN113507342A (application CN202110930717.6A)
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
time slot
interference
node
Prior art date
Legal status
Granted
Application number
CN202110930717.6A
Other languages
Chinese (zh)
Other versions
CN113507342B (en)
Inventor
赵睿
刘浩然
Current Assignee
Huaqiao University
Original Assignee
Huaqiao University
Priority date
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN202110930717.6A priority Critical patent/CN113507342B/en
Publication of CN113507342A publication Critical patent/CN113507342A/en
Application granted granted Critical
Publication of CN113507342B publication Critical patent/CN113507342B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00Jamming of communication; Counter-measures
    • H04K3/20Countermeasures against jamming
    • H04K3/22Countermeasures against jamming including jamming detection and monitoring
    • H04K3/224Countermeasures against jamming including jamming detection and monitoring with countermeasures at transmission and/or reception of the jammed signal, e.g. stopping operation of transmitter or receiver, nulling or enhancing transmitted power in direction of or at frequency of jammer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00Jamming of communication; Counter-measures
    • H04K3/80Jamming or countermeasure characterized by its function
    • H04K3/84Jamming or countermeasure characterized by its function related to preventing electromagnetic interference in petrol station, hospital, plane or cinema
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00Jamming of communication; Counter-measures
    • H04K3/20Countermeasures against jamming
    • H04K3/28Countermeasures against jamming with jamming and anti-jamming mechanisms both included in a same device or system, e.g. wherein anti-jamming includes prevention of undesired self-jamming resulting from jamming
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K3/00Jamming of communication; Counter-measures
    • H04K3/40Jamming having variable characteristics
    • H04K3/45Jamming having variable characteristics characterized by including monitoring of the target or target signal, e.g. in reactive jammers or follower jammers for example by means of an alternation of jamming phases and monitoring phases, called "look-through mode"
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K2203/00Jamming of communication; Countermeasures
    • H04K2203/10Jamming or countermeasure used for a particular application
    • H04K2203/22Jamming or countermeasure used for a particular application for communication related to vehicles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K2203/00Jamming of communication; Countermeasures
    • H04K2203/30Jamming or countermeasure characterized by the infrastructure components
    • H04K2203/34Jamming or countermeasure characterized by the infrastructure components involving multiple cooperating jammers
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention provides an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning, comprising the following steps. Step 1: build an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer. Step 2: take the observed state of the current time slot as the input of a target Q neural network, obtain the Q values of all actions from the network, select the action of the current time slot according to an ε-greedy strategy, calculate the bit error rate, signal to interference plus noise ratio, interruption rate and benefit for the current time slot, observe the state of the next time slot, store the experience e in an experience pool, and randomly extract a number of experiences from the experience pool to update the Q neural network parameter θ. Step 3: perform the operations of step 2 on the divided time slots in sequence, and update the parameters of the target Q network once every fixed interval of T time slots by setting θ⁻ = θ.
The method can reduce the bit error rate and interruption rate of the communication system, improve the anti-interference performance, and reduce the energy consumption of the source unmanned aerial vehicle.

Description

Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, in particular to an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning.
Background
Due to the broadcast nature of wireless communication, the link between a single drone and a ground station is easily jammed by hostile ground jammers. In addition, a single drone is usually far from the ground station when performing a task, so the path loss can be large. Under these two main factors, the direct link between a single drone and the ground station suffers severe interference, which degrades the quality of the received signal and causes information errors or loss.
Traditional drone anti-interference schemes mostly improve anti-interference performance by optimizing the transmit power. However, when the source drone is far from the ground station, the path loss is very large, and the desired anti-interference performance cannot be achieved by optimizing the source drone's transmit power alone. Moreover, traditional drone anti-interference schemes need to know the specific channel model and interference model in order to optimize the transmit power, which greatly limits their applicability in practical scenarios.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning, which can significantly improve the anti-interference performance of a communication system without knowing the adversary's interference model or the channel model.
The invention is realized in this way, an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning, comprising the following steps:
step 1, building an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer, wherein information is forwarded between the ground node and the source unmanned aerial vehicle through a relay unmanned aerial vehicle, and the jammers simultaneously transmit jamming signals to the ground node and the relay unmanned aerial vehicle nodes; the set of unmanned aerial vehicle nodes in the communication network is defined as U = {U_n}, 0 ≤ n ≤ N, where U_0 is the source unmanned aerial vehicle node and U_r is a relay unmanned aerial vehicle node, 1 ≤ r ≤ N;
step 2, taking the observed state of the current time slot as the input of a target Q neural network, obtaining the Q values of all actions after analysis by the target Q neural network, selecting the action of the current time slot according to an ε-greedy strategy, calculating the bit error rate, signal to interference plus noise ratio, interruption rate and benefit for the current time slot, observing the state of the next time slot, storing the experience e in an experience pool, and randomly extracting a number of historical experiences from the experience pool to update the Q neural network parameter θ with a stochastic gradient descent algorithm; the state of the current time slot comprises the transmission power, bit error rate and signal to interference plus noise ratio of the last time slot; the action comprises the transmission power of the current time slot and the relay unmanned aerial vehicle of the current time slot; and the experience e comprises the state of the current time slot, the action of the current time slot, the benefit of the current time slot and the state of the next time slot;
step 3, performing the operations of step 2 on the divided time slots in sequence, and updating the parameters θ⁻ of the target Q network once every fixed interval of T time slots by setting θ⁻ = θ.
Further, in the k-th time slot, U_0 selects a relay U_r and sends a message to U_r with power p^(k). After receiving the message, U_r calculates the signal to interference plus noise ratio ρ_{0-r}^(k) and bit error rate b_{0-r}^(k) of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, U_r relays the message to the ground node at a fixed relay power p_r. After the ground node receives the message, it calculates the signal to interference plus noise ratio ρ_{r-G}^(k) and bit error rate b_{r-G}^(k) of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, the signal to interference plus noise ratio ρ^(k) and bit error rate b^(k) of the whole communication process are obtained from the per-hop values:

[Equation images: ρ^(k) and b^(k) expressed in terms of ρ_{0-r}^(k), ρ_{r-G}^(k), b_{0-r}^(k) and b_{r-G}^(k).]
Further, the manner of judging whether the transmission is interrupted is specifically: comparing the signal to interference plus noise ratio with a threshold τ; if the signal to interference plus noise ratio is greater than the threshold τ, the communication process is judged not to be interrupted.
Further, the interruption rate is calculated according to the following formula:
Figure BDA0003210583940000029
Further, the intelligent jammer is denoted J1 and the fixed jammer J2. The interference power p_{J1}^(k) of intelligent jammer J1 has maximum J_max and is quantized into Y levels, p_{J1}^(k) ∈ B = {y·J_max/Y}, 1 ≤ y ≤ Y, where B is the interference power set. The benefit of the intelligent jammer after taking an action is calculated as:

u_J^(k) = I{O^(k) = 1} − C_j·p_{J1}^(k)

where C_j represents the weight of the intelligent jammer's energy consumption and I{·} is the interruption indication function, equal to 1 if the transmission is interrupted (O = 1) and 0 if it is not (O = 0). The intelligent jammer J1 observes the signal to interference plus noise ratio ρ^(k−1) of the last time slot and selects its interference power so as to raise the interruption rate and maximize its benefit, while the interference power p_{J2} of the fixed jammer J2 always remains a fixed value. Because the jammers transmit jamming signals in all directions, both the relay unmanned aerial vehicle U_r and the ground node are interfered; the intelligent jammer needs to select its interference power through reinforcement learning.
Further, the specific calculation formula of the signal to interference plus noise ratio is as follows:
Figure BDA0003210583940000035
where ρ ism-nRepresenting the signal-to-interference-and-noise ratio of the message received by the n node when the m node sends the message to the n node, p represents the transmission power of the m node, hm-nMultiple of dB value representing path loss from m node to n node, pJiRepresenting the power of the jammer, i taking 1 or 0, pJ1Power, p, representing smart jammer J1J2Represents the power, h, of a stationary jammer J2i-nMultiple of dB value, sigma, representing what jammer Ji node to n node path loss2Power as background noise;
h ism-nThe calculation formula of (a) is as follows:
Figure BDA0003210583940000036
wherein L ism-n(r) represents the path loss from the m node to the n node, and the specific calculation formula is as follows:
Figure BDA0003210583940000037
r is the Euclidean distance between two nodes of m and N, m is more than or equal to 0 and is not equal to N and is not more than N, c is the speed of light, f is the communication frequency, alphapRepresents the path loss index, alpha when m and n nodes are the source unmanned aerial vehicle node and the relay unmanned aerial vehicle nodep2.05, when m and n nodes are relay unmanned aerial vehicle and ground nodeAt a point of time, αp=2.32。
Further, the bit error rate is calculated as follows:
Figure BDA0003210583940000041
where ρ represents the signal to interference plus noise ratio.
Further, the calculation formula of the benefit is as follows:
u(k)=10-δb(k)-Cup(k)
where δ represents the weight of the bit error rate, CuRepresenting the weight of energy consumption.
Further, the updating formula of the Q neural network parameter θ is as follows:
Figure BDA0003210583940000042
where s, x, u, s ' respectively represent the state, action, benefit and next state in experience e, gamma represents the discount factor, x ' represents the action in the s ' state,
Figure BDA0003210583940000043
the Q value after selecting the operation x 'in the next state s' is shown, and α represents the learning rate.
The invention has the following advantages: a deep reinforcement learning algorithm simultaneously optimizes the transmission power of the source unmanned aerial vehicle and the selection of the relay unmanned aerial vehicle, which effectively reduces the bit error rate and interruption rate of the communication system, improves the anti-interference performance, and effectively reduces the energy consumption of the source unmanned aerial vehicle; no specific channel model or interference model needs to be known, so the method is well suited to practical application and convenient to popularize.
Drawings
The invention is further described below with reference to embodiments and the accompanying drawings.
Fig. 1 is an execution flow chart of the unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning.
Fig. 2 is a schematic diagram of a communication network system model according to an embodiment of the present invention.
Fig. 3 is a schematic diagram comparing the bit error rate of an embodiment of the present invention with that of conventional anti-interference methods.
Fig. 4 is a schematic diagram comparing the energy consumption of the source drone in an embodiment of the present invention with that under conventional anti-interference methods.
Fig. 5 is a schematic diagram comparing the interruption rate of an embodiment of the present invention with that of conventional anti-interference methods.
Detailed Description
The invention provides an unmanned aerial vehicle relay anti-interference scheme based on deep reinforcement learning. The anti-interference performance indicators to be optimized are the bit error rate of received messages, the interruption rate of the communication system and the energy consumption of the source unmanned aerial vehicle, and the scheme is a joint optimization scheme. First, several relay unmanned aerial vehicles are arranged between the ground station and the source unmanned aerial vehicle, and the path loss of each transmission is reduced by relaying and forwarding messages. Second, by applying the deep reinforcement learning algorithm DQN, the source unmanned aerial vehicle can obtain the optimal transmit power and relay unmanned aerial vehicle to improve the anti-interference performance. Finally, the simulation results show that the proposed scheme can significantly improve the anti-interference performance of the communication system without knowing the interference model or the channel model.
As shown in fig. 1, the method for resisting interference of the relay of the unmanned aerial vehicle based on deep reinforcement learning of the present invention includes:
Step 1, building an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer, wherein information is forwarded between the ground node and the source unmanned aerial vehicle through a relay unmanned aerial vehicle, and the jammers simultaneously transmit jamming signals to the ground node and the relay unmanned aerial vehicle nodes. The set of unmanned aerial vehicle nodes in the communication network is defined as U = {U_n}, 0 ≤ n ≤ N, where U_0 is the source unmanned aerial vehicle node and U_r is a relay unmanned aerial vehicle node, 1 ≤ r ≤ N, and the flight period of the source unmanned aerial vehicle is divided into a number of communication time slots. In the communication network, the unmanned aerial vehicle nodes hover at different heights and the jammers are located on the ground. The direct link between the source unmanned aerial vehicle node and the ground node is broken by the interference, and the relay unmanned aerial vehicle nodes can help the source unmanned aerial vehicle node relay messages to the ground node. The feedback channel, assumed not to be jammed, is used to transmit the bit error rate and signal to interference plus noise ratio back to the source unmanned aerial vehicle node U_0.
Step 2, taking the observed state of the current time slot as the input of a target Q neural network, obtaining the Q values of all actions after analysis by the target Q neural network, selecting the action of the current time slot according to an ε-greedy strategy, calculating the bit error rate, signal to interference plus noise ratio, interruption rate and benefit for the current time slot, observing the state of the next time slot, storing the experience e in an experience pool, and randomly extracting a number of historical experiences from the experience pool to update the Q neural network parameter θ with a stochastic gradient descent algorithm. The state of the current time slot comprises the transmission power, bit error rate and signal to interference plus noise ratio of the last time slot; the action comprises the transmission power of the current time slot and the relay unmanned aerial vehicle of the current time slot; and the experience e comprises the state of the current time slot, the action of the current time slot, the benefit of the current time slot and the state of the next time slot. The invention provides a DQN-based unmanned aerial vehicle relay anti-interference scheme, where DQN is a deep reinforcement learning algorithm that combines a neural network with Q learning. When the problem to be solved by Q learning is complex, the state space or action set may be large, and learning efficiency drops if states must be retrieved from a very large Q table every time. Taking the state as the input of a neural network and obtaining the Q values of all actions from the network avoids constructing a large Q table to store the Q values.
Step 3, the operation in the step 2 is executed to the divided time slots in sequence, and the parameters of the target Q network are updated once every fixed time slot T
Figure BDA0003210583940000061
Order to
Figure BDA0003210583940000062
For example, in the k-th time slot, the system state is defined as s(k)=[p(k-1),b(k-1)(k-1)]Containing the transmission power p of the last time slotk-1Bit error rate bk-1Signal to interference plus noise ratio ρk-1,U0The selection action is
Figure BDA0003210583940000063
A is the action set of the source node and comprises selectable transmission power pkAnd relay unmanned aerial vehicle Ur. At U0After the message is sent, the bit error rate and the signal interference noise ratio of the received message are respectively in UrAnd the ground node carries out calculation and then sends the result to the U through a feedback channel0And calculating the benefit u of the time slot(k). Next time slot, U0Observing the feedback results determines a new state and then continuing to select a new action based on this state. And so on.
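To make the state/action bookkeeping concrete, the following Python sketch (all names hypothetical; M = 5 power levels and three relays are taken from the embodiment below) enumerates the joint action set A and applies the ε-greedy rule to a vector of Q values produced by the network:

```python
import random
import numpy as np

# Joint action set A: every (transmit power, relay index) pair.
# M = 5 power levels and N_RELAYS = 3 follow the embodiment; both are assumptions here.
P_MAX, M = 100.0, 5        # mW
N_RELAYS = 3
ACTIONS = [(m * P_MAX / M, r) for m in range(1, M + 1) for r in range(N_RELAYS)]

def epsilon_greedy(q_values, epsilon):
    """Select a random action with probability epsilon, otherwise the greedy action.

    q_values: length-len(ACTIONS) vector output by the Q network for state s^(k).
    Returns an index into ACTIONS, i.e. the action x^(k) = [p^(k), U_r^(k)].
    """
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(q_values))
```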
Preferably, in the k-th time slot, U_0 selects a relay U_r and sends a message to U_r with power p^(k). After receiving the message, U_r calculates the signal to interference plus noise ratio ρ_{0-r}^(k) and bit error rate b_{0-r}^(k) of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, U_r relays the message to the ground node at a fixed relay power p_r. After the ground node receives the message, it calculates the signal to interference plus noise ratio ρ_{r-G}^(k) and bit error rate b_{r-G}^(k) of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, the signal to interference plus noise ratio ρ^(k) and bit error rate b^(k) of the whole communication process are obtained from the per-hop values:

[Equation images: ρ^(k) and b^(k) expressed in terms of ρ_{0-r}^(k), ρ_{r-G}^(k), b_{0-r}^(k) and b_{r-G}^(k).]

The cooperation mode of the relay unmanned aerial vehicle nodes can be set to decode-and-forward (DF). The transmission power p^(k) of U_0 has maximum P_max and is quantized into M levels, p^(k) ∈ {m·P_max/M}, 1 ≤ m ≤ M.
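A minimal per-slot sketch of the two-hop decode-and-forward exchange described above; `sinr` and `ber` are assumed helpers implementing the patent's formulas, and since the end-to-end combining equations appear only as images, the min-of-hops convention used here for ρ^(k) is an assumption, not the patent's stated formula:

```python
def one_slot(p_src, p_relay, sinr, ber, tau):
    """Simulate one time slot of the two-hop DF transmission.

    sinr(p, tx, rx): assumed helper returning the SINR of the tx->rx hop at power p.
    ber(rho): assumed helper returning the bit error rate at SINR rho.
    tau: outage threshold (linear scale).
    Returns (rho, b, outage) for the whole slot.
    """
    rho_0r = sinr(p_src, "U0", "Ur")      # source -> relay hop
    if rho_0r <= tau:                      # relay cannot decode: transmission interrupted
        return rho_0r, None, 1
    rho_rg = sinr(p_relay, "Ur", "G")      # relay -> ground hop at fixed power p_r
    if rho_rg <= tau:
        return rho_rg, None, 1
    rho = min(rho_0r, rho_rg)              # ASSUMPTION: end-to-end SINR taken as min of hops
    return rho, ber(rho), 0
```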
Preferably, the manner of determining whether the transmission is interrupted is specifically: comparing the signal to interference plus noise ratio with a threshold τ, and if the signal to interference plus noise ratio is greater than the threshold τ, judging that the communication process is not interrupted.
Preferably, the interruption rate is calculated as O^(k) = I{ρ^(k) ≤ τ}, i.e., O^(k) = 1 when the signal to interference plus noise ratio of slot k does not exceed the threshold τ, and O^(k) = 0 otherwise.
Preferably, the intelligent jammer is denoted J1 and the fixed jammer J2. The interference power p_{J1}^(k) of intelligent jammer J1 has maximum J_max and is quantized into Y levels, p_{J1}^(k) ∈ B = {y·J_max/Y}, 1 ≤ y ≤ Y, where B is the interference power set. The benefit of the intelligent jammer after taking an action is calculated as:

u_J^(k) = I{O^(k) = 1} − C_j·p_{J1}^(k)

where C_j represents the weight of the intelligent jammer's energy consumption and I{·} is the interruption indication function, equal to 1 if the transmission is interrupted (O = 1) and 0 if it is not (O = 0). The intelligent jammer J1 observes the signal to interference plus noise ratio ρ^(k−1) of the last time slot and selects its interference power so as to raise the interruption rate and maximize its benefit, while the interference power p_{J2} of the fixed jammer J2 always remains a fixed value. Because the jammers transmit jamming signals in all directions, both the relay unmanned aerial vehicle U_r and the ground node are interfered; the intelligent jammer needs to select its interference power through reinforcement learning.
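The patent specifies only that J1 observes ρ^(k−1) and learns its power by reinforcement learning; the tabular Q-learning agent below is one illustrative realization (the learner itself and all names are assumptions):

```python
import random
from collections import defaultdict

J_MAX, Y = 40.0, 4                            # mW and quantization, per the embodiment below
B = [y * J_MAX / Y for y in range(1, Y + 1)]  # interference power set B

class SmartJammer:
    """Hypothetical tabular Q-learning jammer keyed on the quantized previous-slot SINR."""
    def __init__(self, alpha=0.1, gamma=0.5, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * len(B))
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, rho_level):
        if random.random() < self.eps:
            return random.randrange(len(B))
        qs = self.q[rho_level]
        return qs.index(max(qs))

    def learn(self, s, a, outage, c_j, s_next):
        # Jammer benefit: interruption indicator minus weighted energy cost.
        u = (1.0 if outage else 0.0) - c_j * B[a]
        target = u + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```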
Preferably, the interference of the jammers affects the communication quality between any two nodes in the communication network. The specific calculation formula of the signal to interference plus noise ratio is:

ρ_{m-n} = p·h_{m-n} / (Σ_i p_{Ji}·h_{Ji-n} + σ²)

where ρ_{m-n} represents the signal to interference plus noise ratio of the message received by node n when node m sends a message to node n, p represents the transmission power of node m, h_{m-n} is the linear multiple converted from the dB value of the path loss from node m to node n, p_{Ji} represents the power of jammer J_i with i = 1 or 2 (p_{J1} the power of intelligent jammer J1 and p_{J2} the power of fixed jammer J2), h_{Ji-n} is the linear multiple converted from the dB value of the path loss from jammer J_i to node n, and σ² is the power of the background noise. The coding and modulation mode of the transmitted messages can be QPSK.

h_{m-n} is calculated as h_{m-n} = 10^(−L_{m-n}(r)/10) (i.e., the dB value of the path loss L_{m-n}(r) is converted into the multiple h_{m-n}), where L_{m-n}(r) is calculated as:

L_{m-n}(r) = 10·α_p·log₁₀(4π·r·f/c)

where r is the Euclidean distance between nodes m and n, 0 ≤ m ≠ n ≤ N, c is the speed of light, f is the communication frequency, and α_p denotes the path loss exponent. Because the communication channel from U_0 to U_r is an air-to-air channel, its path loss can be described by free-space propagation and fades slowly; but for the air-to-ground channel from U_r to the ground node, fading dominates due to objects near the ground node and more severe path loss. Therefore α_p = 2.05 when nodes m and n are the source unmanned aerial vehicle node and a relay unmanned aerial vehicle node, and α_p = 2.32 when nodes m and n are a relay unmanned aerial vehicle node and the ground node.
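These formulas translate directly into code; the helper names below are hypothetical, and the explicit free-space form of L_{m-n}(r) follows the expression given above (itself reconstructed from an equation image, so it should be read as an assumption):

```python
import math

C_LIGHT = 3.0e8     # speed of light, m/s
FREQ = 2.4e9        # communication frequency f, Hz (embodiment value)

def path_loss_db(r, alpha_p):
    """L_{m-n}(r) in dB: free-space form with path loss exponent alpha_p."""
    return 10.0 * alpha_p * math.log10(4.0 * math.pi * r * FREQ / C_LIGHT)

def linear_factor(loss_db):
    """h_{m-n} = 10^(-L/10): the linear multiple of a path loss given in dB."""
    return 10.0 ** (-loss_db / 10.0)

def sinr(p_tx, h_tx_rx, jammer_terms, noise):
    """rho_{m-n} = p * h_{m-n} / (sum_i p_Ji * h_{Ji-n} + sigma^2).

    jammer_terms: iterable of (p_Ji, h_Ji_n) pairs for jammers J1 and J2.
    noise: background noise power sigma^2 (linear scale).
    """
    interference = sum(p * h for p, h in jammer_terms)
    return p_tx * h_tx_rx / (interference + noise)
```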
Preferably, the bit error rate b is calculated as a function of the signal to interference plus noise ratio ρ:

[Equation image: b as a function of ρ.]
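The BER curve itself is only available as an image; as an illustrative stand-in consistent with the QPSK modulation mentioned above, one may assume the Gray-coded QPSK relation b = Q(√ρ) with ρ the per-symbol SINR (an assumption, not the patent's published formula):

```python
import math

def q_function(x):
    """Gaussian Q-function: Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_from_sinr(rho):
    """ASSUMED stand-in for the patent's BER curve: Gray-coded QPSK,
    b = Q(sqrt(rho)) with rho the per-symbol SINR on a linear scale."""
    return q_function(math.sqrt(rho))
```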
Preferably, the calculation formula of the benefit is as follows:

u^(k) = 10 − δ·b^(k) − C_u·p^(k)

where δ represents the weight of the bit error rate and C_u represents the weight of energy consumption.
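The benefit is directly computable; the defaults below take the embodiment's weights (δ = 3000, C_u = 100), and the unit convention for p^(k) is an assumption:

```python
def benefit(b_k, p_k, delta=3000.0, c_u=100.0):
    """u^(k) = 10 - delta * b^(k) - C_u * p^(k).

    b_k: bit error rate of slot k; p_k: transmit power of slot k
    (the power unit/normalization paired with C_u is an assumption).
    """
    return 10.0 - delta * b_k - c_u * p_k
```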
Preferably, the update formula of the Q neural network parameter θ is the stochastic gradient step on the squared temporal-difference error:

θ ← θ + α·[u + γ·max_{x'} Q(s', x'; θ⁻) − Q(s, x; θ)]·∇_θ Q(s, x; θ)

where s, x, u and s' respectively represent the state, action, benefit and next state in experience e, γ represents the discount factor, x' represents an action in state s', Q(s', x'; θ⁻) is the Q value after selecting action x' in the next state s', and α represents the learning rate.
In each time slot, U_0 stores the experience e^(k) = {s^(k), x^(k), u^(k), s^(k+1)} into its experience pool, defined as R = {e^(i)}, 1 ≤ i ≤ k−1. The Q network and the target Q network have the same structure, and the initial network parameters are the same, θ⁻ = θ. The network parameter θ of the Q network is updated every time slot: when updating, U_0 randomly extracts a number of historical experiences from the experience pool and updates θ with a stochastic gradient descent algorithm, with the loss function:

L(θ) = E[(u + γ·max_{x'} Q(s', x'; θ⁻) − Q(s, x; θ))²]

from which the above update formula of θ is obtained. The network parameters of the target Q network are updated once every fixed interval of T time slots by directly setting θ⁻ = θ. The target Q network is used to calculate the target value, which reduces the correlation between the current Q value and the target Q value.
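The experience pool R admits a compact implementation; the capacity bound below is an assumption (the text keeps all past experiences e^(1) … e^(k−1)):

```python
import random
from collections import deque

class ExperiencePool:
    """Pool R of experiences e = (s, x, u, s_next); bounded capacity is an assumption."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)

    def store(self, s, x, u, s_next):
        self.pool.append((s, x, u, s_next))

    def sample(self, z):
        """Randomly extract Z historical experiences for the SGD update of theta."""
        return random.sample(self.pool, min(z, len(self.pool)))

    def __len__(self):
        return len(self.pool)
```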
The detailed algorithm set according to the technical scheme of the invention proceeds as follows:

1: Initialize γ, p^(0), b^(0), ρ^(0), θ, θ⁻, p_{J2}, ε, τ, C_u, C_j, δ, P_max, J_max
2: for k = 1, 2, 3, … do
3:   Observe the state s^(k) = [p^(k−1), b^(k−1), ρ^(k−1)]
4:   Obtain the Q values output by the neural network and select an action x^(k) = [p^(k), U_r^(k)] according to the ε-greedy strategy
5:   Calculate ρ^(k) and b^(k) respectively
6:   Calculate the interruption rate
7:   Calculate u^(k)
8:   Observe the next state s^(k+1) = [p^(k), b^(k), ρ^(k)]
9:   Store the experience e^(k) = {s^(k), x^(k), u^(k), s^(k+1)} in the experience pool R
10:  Randomly extract Z experiences from the experience pool
11:  Update the network parameter θ
12:  if k is an integer multiple of T then
13:    Set θ⁻ = θ
14:  end if
15:  Set s^(k) = s^(k+1), performing the state iteration
16: end for
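Tying the pieces together, a training-loop skeleton mirroring steps 1–16 might look as follows (reusing the hypothetical helpers sketched above; `env` and `to_tensors` are assumed stand-ins for the communication simulation and batch conversion):

```python
import torch

def train(q_net, target_net, optimizer, pool, env, num_slots, T, Z, eps):
    target_net.load_state_dict(q_net.state_dict())              # step 1: theta^- = theta
    s = env.initial_state()                                     # [p^(0), b^(0), rho^(0)]
    for k in range(1, num_slots + 1):                           # step 2
        q_vals = q_net(torch.as_tensor(s, dtype=torch.float32).unsqueeze(0))
        a = epsilon_greedy(q_vals.detach().numpy().ravel(), eps)  # steps 3-4
        u, s_next = env.step(ACTIONS[a])                        # steps 5-8: SINR, BER, outage, benefit
        pool.store(s, a, u, s_next)                             # step 9
        if len(pool) >= Z:
            dqn_update(q_net, target_net, optimizer,
                       to_tensors(pool.sample(Z)))              # steps 10-11
        if k % T == 0:
            target_net.load_state_dict(q_net.state_dict())      # steps 12-13: theta^- = theta
        s = s_next                                              # step 15
    return q_net
```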
For a more detailed description of the present invention, reference is now made to a specific embodiment:
As shown in fig. 2, this embodiment uses a three-dimensional coordinate system to represent the communication network. Three optional relay drone nodes U_r (namely U_1, U_2, U_3) are placed at (10,30,30), (20,20,20) and (30,10,10); one source drone node U_0 at (40,40,40); one ground node G at (0,0,0); and two jammers, the fixed jammer J2 at (110,0,0) and the intelligent jammer J1 at (0,110,0). In accordance with the common civil drone standard, the communication frequency f is set to 2.4 GHz. The transmission power p^(k) ranges over [30,100] mW, uniformly quantized into 5 levels. Considering the distances between the relay nodes, the ground node and the jammers, the relay power p_r is set to 60 mW. The interference power p_{J2} of the fixed jammer is set to 30 mW, and the interference power p_{J1}^(k) of the intelligent jammer ranges over [10,40] mW, uniformly quantized into 4 levels. The background noise power σ² is set to −100 dBm and the threshold τ to 10 dB. In the DQN algorithm, the learning rate α = 0.001, the discount factor γ = 0.5, δ = 3000, C_u = 100, and C_j = 60.
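For reference, the embodiment's settings gathered into a single configuration (values as stated in the text; the dictionary layout is merely illustrative):

```python
CONFIG = {
    "relay_positions": [(10, 30, 30), (20, 20, 20), (30, 10, 10)],  # U_1, U_2, U_3
    "source_position": (40, 40, 40),                                 # U_0
    "ground_position": (0, 0, 0),                                    # G
    "fixed_jammer_position": (110, 0, 0),                            # J2
    "smart_jammer_position": (0, 110, 0),                            # J1
    "frequency_hz": 2.4e9,
    "tx_power_mw": {"range": (30, 100), "levels": 5},
    "relay_power_mw": 60,
    "fixed_jammer_power_mw": 30,
    "smart_jammer_power_mw": {"range": (10, 40), "levels": 4},
    "noise_dbm": -100,
    "threshold_db": 10,
    "learning_rate": 0.001,
    "discount_gamma": 0.5,
    "delta": 3000,
    "C_u": 100,
    "C_j": 60,
}
```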
Based on the above algorithm, the simulation results are filtered with the least-squares smoothing filter function sgolayfilt in MATLAB, which reduces errors in the simulation results and smooths the curves. As shown in fig. 3, the bit error rate of the DQN-based drone relay anti-interference scheme of the invention is improved to a certain extent over the two conventional comparison schemes. The bit error rate of the Q-learning-based drone relay anti-interference scheme falls from an initial 7.2×10⁻⁴ and finally converges to 2.8×10⁻⁵ at about 4500 time slots, while the bit error rate of the DQN-based scheme falls from the same value and finally converges to 7.1×10⁻⁶ at around 2000 time slots. The proposed algorithm therefore converges faster and learns better. Fig. 4 plots the variation of the source drone's energy consumption: the DQN-based scheme is superior to the other two schemes in both convergence speed and final result. In the Q-learning-based scheme, the energy consumption decreases from an initial 65 mJ and finally converges to 36.2 mJ at about 4500 time slots; in the proposed scheme, the energy consumption decreases from the same value and converges to 30.4 mJ after only 2000 time slots, a better learning effect than the other schemes. As shown in fig. 5, the DQN-based scheme effectively reduces the interruption rate of the communication system, and to a larger degree than the other two conventional schemes. In the Q-learning-based scheme, the interruption rate decreases from 0.24 and finally converges to 0.08 at about 4500 time slots; in the proposed scheme, the interruption rate also decreases from 0.24 and finally converges to 0.005 at around 2000 time slots, a reduction of about 93.75% compared with the former.
Unlike the traditional drone power-optimization anti-interference scheme, in the joint optimization scheme provided by the invention the source unmanned aerial vehicle can reduce the path loss by selecting a relay unmanned aerial vehicle and can optimize its transmit power according to the position of the relay unmanned aerial vehicle. This scheme can significantly improve the anti-interference performance of the communication system. Meanwhile, the invention applies a deep reinforcement learning algorithm to the joint optimization scheme, yielding a DQN-based unmanned aerial vehicle relay anti-interference scheme, under which the source unmanned aerial vehicle can obtain the optimal communication strategy through continuous trial and error and accumulated experience, without knowing the specific channel model or interference model. The scheme is therefore better suited to application in actual scenarios and has a certain universality.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims (9)

1. An unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning is characterized in that: the method comprises the following steps:
step 1, building an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer, wherein information is forwarded between the ground node and the source unmanned aerial vehicle through a relay unmanned aerial vehicle, and the jammers simultaneously transmit jamming signals to the ground node and the relay unmanned aerial vehicle nodes; the set of unmanned aerial vehicle nodes in the communication network is defined as U = {U_n}, 0 ≤ n ≤ N, where U_0 is the source unmanned aerial vehicle node and U_r is a relay unmanned aerial vehicle node, 1 ≤ r ≤ N;
step 2, taking the observed state of the current time slot as the input of a target Q neural network, obtaining the Q values of all actions after analysis by the target Q neural network, selecting the action of the current time slot according to an ε-greedy strategy, calculating the bit error rate, signal to interference plus noise ratio, interruption rate and benefit for the current time slot, observing the state of the next time slot, storing the experience e in an experience pool, and randomly extracting a number of historical experiences from the experience pool to update the Q neural network parameter θ with a stochastic gradient descent algorithm; the state of the current time slot comprises the transmission power, bit error rate and signal to interference plus noise ratio of the last time slot; the action comprises the transmission power of the current time slot and the relay unmanned aerial vehicle of the current time slot; and the experience e comprises the state of the current time slot, the action of the current time slot, the benefit of the current time slot and the state of the next time slot;
step 3, performing the operations of step 2 on the divided time slots in sequence, and updating the parameters θ⁻ of the target Q network once every fixed interval of T time slots by setting θ⁻ = θ.
2. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 1, characterized in that:
in the k-th time slot, U_0 selects a relay U_r and sends a message to U_r with power p^(k); after receiving the message, U_r calculates the signal to interference plus noise ratio ρ_{0-r}^(k) and bit error rate b_{0-r}^(k) of the received message and judges whether the transmission is interrupted; if the transmission is not interrupted, U_r relays the message to the ground node at a fixed relay power p_r; after the ground node receives the message, it calculates the signal to interference plus noise ratio ρ_{r-G}^(k) and bit error rate b_{r-G}^(k) of the received message and judges whether the transmission is interrupted; if the transmission is not interrupted, the signal to interference plus noise ratio ρ^(k) and bit error rate b^(k) of the whole communication process are obtained from the per-hop values:

[Equation images: ρ^(k) and b^(k) expressed in terms of ρ_{0-r}^(k), ρ_{r-G}^(k), b_{0-r}^(k) and b_{r-G}^(k).]
3. the unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the method for judging whether transmission is interrupted specifically comprises the following steps: and comparing the signal interference noise ratio with a threshold tau, and if the signal interference noise ratio is greater than the threshold tau, judging that the communication process is not interrupted.
4. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 3, characterized in that: the interruption rate is calculated as O^(k) = I{ρ^(k) ≤ τ}.
5. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the intelligent jammer is denoted J1 and the fixed jammer J2; the interference power p_{J1}^(k) of intelligent jammer J1 has maximum J_max and is quantized into Y levels, p_{J1}^(k) ∈ B = {y·J_max/Y}, 1 ≤ y ≤ Y, where B is the interference power set; the benefit of the intelligent jammer after taking an action is calculated as:

u_J^(k) = I{O^(k) = 1} − C_j·p_{J1}^(k)

wherein C_j represents the weight of the intelligent jammer's energy consumption, and I{·} represents the interruption indication function, equal to 1 if the transmission is interrupted (O = 1) and 0 if it is not (O = 0);
the intelligent jammer J1 observes the signal to interference plus noise ratio ρ^(k−1) of the last time slot to select its interference power, while the interference power p_{J2} of the fixed jammer J2 always remains a fixed value.
6. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the specific calculation formula of the signal to interference plus noise ratio is:

ρ_{m-n} = p·h_{m-n} / (Σ_i p_{Ji}·h_{Ji-n} + σ²)

where ρ_{m-n} represents the signal to interference plus noise ratio of the message received by node n when node m sends a message to node n, p represents the transmission power of node m, h_{m-n} is the linear multiple converted from the dB value of the path loss from node m to node n, p_{Ji} represents the power of jammer J_i with i = 1 or 2, p_{J1} representing the power of intelligent jammer J1 and p_{J2} the power of fixed jammer J2, h_{Ji-n} is the linear multiple converted from the dB value of the path loss from jammer J_i to node n, and σ² is the power of the background noise;
h_{m-n} is calculated as h_{m-n} = 10^(−L_{m-n}(r)/10), where L_{m-n}(r) represents the path loss from node m to node n, calculated as:

L_{m-n}(r) = 10·α_p·log₁₀(4π·r·f/c)

where r is the Euclidean distance between nodes m and n, 0 ≤ m ≠ n ≤ N, c is the speed of light, f is the communication frequency, and α_p represents the path loss exponent: α_p = 2.05 when nodes m and n are the source unmanned aerial vehicle node and a relay unmanned aerial vehicle node, and α_p = 2.32 when nodes m and n are a relay unmanned aerial vehicle node and the ground node.
7. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the bit error rate b is calculated as a function of the signal to interference plus noise ratio ρ:

[Equation image: b as a function of ρ.]
8. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the calculation formula of the benefit is as follows:

u^(k) = 10 − δ·b^(k) − C_u·p^(k)

where δ represents the weight of the bit error rate and C_u represents the weight of energy consumption.
9. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the update formula of the Q neural network parameter θ is:

θ ← θ + α·[u + γ·max_{x'} Q(s', x'; θ⁻) − Q(s, x; θ)]·∇_θ Q(s, x; θ)

where s, x, u and s' respectively represent the state, action, benefit and next state in experience e, γ represents the discount factor, x' represents an action in state s', Q(s', x'; θ⁻) is the Q value after selecting action x' in the next state s', and α represents the learning rate.
CN202110930717.6A 2021-08-13 2021-08-13 Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning Active CN113507342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110930717.6A CN113507342B (en) 2021-08-13 2021-08-13 Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110930717.6A CN113507342B (en) 2021-08-13 2021-08-13 Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113507342A true CN113507342A (en) 2021-10-15
CN113507342B CN113507342B (en) 2023-06-02

Family

ID=78015555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110930717.6A Active CN113507342B (en) 2021-08-13 2021-08-13 Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113507342B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140206279A1 (en) * 2013-01-22 2014-07-24 Eden Rock Communications, Llc Method and system for intelligent jamming signal generation
CN109274456A (en) * 2018-09-10 2019-01-25 电子科技大学 A kind of imperfect information intelligence anti-interference method based on intensified learning
EP3854013A1 (en) * 2018-09-19 2021-07-28 Rheinmetall Air Defence AG Signal interference device and a method for operating a signal interference device for protecting unmanned aerial vehicles (uav), in particular drones
CN111917508A (en) * 2020-08-10 2020-11-10 中国人民解放军陆军工程大学 Anti-interference communication model based on multiple antennas and dynamic spatial spectrum anti-interference method
CN112564849A (en) * 2020-12-01 2021-03-26 国网辽宁省电力有限公司营口供电公司 Identification and trapping method for multi-model unmanned aerial vehicle

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHICHAO SHENG et al.: "UAV-Aided Two-Way Multi-User Relaying" *

Also Published As

Publication number Publication date
CN113507342B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN109729528B (en) D2D resource allocation method based on multi-agent deep reinforcement learning
CN109474980B (en) Wireless network resource allocation method based on deep reinforcement learning
CN108880662B (en) Wireless information and energy transmission optimization method based on unmanned aerial vehicle
CN110620611B (en) Cooperative spectrum sensing method based on GEO and LEO double-layer satellite network
CN110784882B (en) Energy acquisition D2D communication resource allocation method based on reinforcement learning
CN106680780A (en) Radar optimal waveform design method based on radio frequency stealth in frequency spectrum shared environment
CN111511038B (en) Distributed channel intelligent sensing and access method for wireless cooperative network
CN113596785B (en) D2D-NOMA communication system resource allocation method based on deep Q network
CN109861728B (en) Joint multi-relay selection and time slot resource allocation method for large-scale MIMO system
CN112583453A (en) Downlink NOMA power distribution method of multi-beam LEO satellite communication system
CN109195207B (en) Energy-collecting wireless relay network throughput maximization method based on deep reinforcement learning
CN109661034B (en) Antenna selection and resource allocation method in wireless energy supply communication network
CN101729164B (en) Wireless resource allocation method and cognitive radio user equipment
CN112040498B (en) Fixed point iteration-based wireless energy supply sensor network time allocation method
CN110139282B (en) Energy acquisition D2D communication resource allocation method based on neural network
CN115766089A (en) Energy acquisition cognitive Internet of things anti-interference optimal transmission method
CN113255218A (en) Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
CN110366225B (en) Wireless energy supply multi-hop communication system node selection method
CN113795050B (en) Sum Tree sampling-based deep double-Q network dynamic power control method
CN111741520A (en) Cognitive underwater acoustic communication system power distribution method based on particle swarm
CN113507342A (en) Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning
CN108449790B (en) Time and power distribution method of cognitive wireless network based on differential evolution algorithm
CN103957565B (en) Resource allocation methods based on target SINR in distributed wireless networks
CN115119174A (en) Unmanned aerial vehicle autonomous deployment method based on energy consumption optimization in irrigation area scene
Du et al. Joint time and power control of energy harvesting CRN based on PPO

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant