CN113507342A - Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning - Google Patents
- Publication number: CN113507342A (application CN202110930717.6A)
- Authority: CN (China)
- Legal status: Granted (status assumed by Google Patents; not a legal conclusion)
Classifications
- H04K3/224 — Countermeasures against jamming including jamming detection and monitoring, with countermeasures at transmission and/or reception of the jammed signal (e.g. stopping operation of transmitter or receiver, nulling, or enhancing transmitted power in the direction of or at the frequency of the jammer)
- H04K3/84 — Jamming or countermeasure characterized by its function, related to preventing electromagnetic interference in petrol station, hospital, plane or cinema
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- H04K3/28 — Countermeasures against jamming with jamming and anti-jamming mechanisms both included in the same device or system
- H04K3/45 — Jamming having variable characteristics, characterized by including monitoring of the target or target signal (e.g. reactive or follower jammers, "look-through mode")
- H04K2203/22 — Jamming or countermeasure used for a particular application, for communication related to vehicles
- H04K2203/34 — Jamming or countermeasure characterized by the infrastructure components, involving multiple cooperating jammers
- Y02D30/70 — Reducing energy consumption in wireless communication networks
Abstract
The invention provides an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning, which comprises the following steps. Step 1: build an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer. Step 2: take the observed state of the current time slot as the input of the target Q neural network and obtain the Q values of all actions; select the action of the current time slot according to an ε-greedy strategy; calculate the bit error rate, signal to interference plus noise ratio, interruption rate and benefit of the current time slot; observe the state of the next time slot; store the experience e in the experience pool; and randomly extract several experiences from the experience pool to update the Q neural network parameter θ. Step 3: execute the operations of step 2 on the divided time slots in sequence, updating the parameters of the target Q network once every fixed number of time slots T. The method can reduce the bit error rate and interruption rate of the communication system, improve the anti-interference performance, and reduce the energy consumption of the source unmanned aerial vehicle.
Description
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, in particular to an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning.
Background
Due to the broadcast nature of wireless communication, the link between a single drone and a ground station is easily interfered with by hostile ground jammers. In addition, a single unmanned aerial vehicle performing a task is usually far from the ground station, so the path loss can be large. Under the influence of these two main factors, the direct link between a single unmanned aerial vehicle and a ground station suffers severe interference, which degrades the quality of the received signal and causes information errors or loss.
Traditional unmanned aerial vehicle anti-interference schemes mostly improve anti-interference performance by optimizing the transmit power. However, when the source unmanned aerial vehicle is far from the ground station, the path loss is very large, and ideal anti-interference performance cannot be achieved by optimizing the transmit power of the source unmanned aerial vehicle alone. Meanwhile, traditional schemes need to know a specific channel model and interference model in order to optimize the transmit power, which greatly limits their applicability in actual scenarios.
Disclosure of Invention
The invention aims to solve this technical problem by providing an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning, which can significantly improve the anti-interference performance of the communication system without knowledge of the opponent's interference model or the channel model.
The invention is realized as follows: an unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning, comprising the following steps:
step 1, building an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer, wherein information between the ground node and the source unmanned aerial vehicle is forwarded through a relay unmanned aerial vehicle while the jammers transmit jamming signals to the ground node and the relay unmanned aerial vehicle nodes; the unmanned aerial vehicle node set in the communication network is defined as U = {U_n}, 0 ≤ n ≤ N, where U_0 is the source unmanned aerial vehicle node and U_r, 1 ≤ r ≤ N, are the relay unmanned aerial vehicle nodes;
step 2, taking the observed state of the current time slot as the input of the target Q neural network and obtaining the Q values of all actions after analysis by the target Q neural network; selecting the action of the current time slot according to an ε-greedy strategy; calculating the bit error rate, the signal to interference plus noise ratio, the interruption rate and the benefit of the current time slot; observing the state of the next time slot; storing the experience e in the experience pool; and randomly extracting several historical experiences from the experience pool and updating the Q neural network parameter θ by a stochastic gradient descent algorithm; wherein the state of the current time slot comprises the transmission power, the bit error rate and the signal to interference plus noise ratio of the last time slot, the action comprises the transmission power of the current time slot and the relay unmanned aerial vehicle of the current time slot, and the experience e comprises the state of the current time slot, the action of the current time slot, the benefit of the current time slot and the state of the next time slot;
step 3, executing the operations of step 2 on the divided time slots in sequence, and updating the parameters of the target Q network once every fixed number of time slots T by setting θ⁻ ← θ.
Further, in the k-th time slot, U_0 selects a U_r and sends a message to U_r with power p^(k). After receiving the message, U_r calculates the signal to interference plus noise ratio ρ^(k)_{0-r} and the bit error rate b^(k)_{0-r} of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, U_r relays the message to the ground node at a fixed relay power p_r. After the ground node receives the message, it calculates the signal to interference plus noise ratio ρ^(k)_{r-G} and the bit error rate b^(k)_{r-G} of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, the signal to interference plus noise ratio and the bit error rate of the whole communication process are obtained according to the formulas given below.
further, the manner of "judging whether transmission is interrupted" is specifically: and comparing the signal interference noise ratio with a threshold tau, and if the signal interference noise ratio is greater than the threshold tau, judging that the communication process is not interrupted.
Further, the transmission of the k-th time slot is interrupted (O^(k) = 1) if the signal to interference plus noise ratio of either hop does not exceed the threshold, i.e. O^(k) = 1 if min(ρ^(k)_{0-r}, ρ^(k)_{r-G}) ≤ τ and O^(k) = 0 otherwise, and the interruption rate over K time slots is the proportion of interrupted slots, (1/K)·Σ_k O^(k).
further, an intelligent jammer is represented by J1, a fixed jammer is represented by J2, and the interference power of the intelligent jammer J1Maximum JmaxThe interference power is quantized to Y level,b is an interference power set, and a benefit calculation formula after the intelligent jammer takes action is as follows:
wherein i is equal to 0 or 1, CjWeight representing energy consumption of intelligent jammer, I {. represents interruption indication function, if transmission is interrupted (O ═ 1) then is 1, and if transmission is not interrupted (O ═ 0) then is 0, intelligent jammer J1 observes signal to interference plus noise ratio ρ of last time slot(k-1)Interference power is selected to improve outage rate and maximize efficiency, and the interference power of jammer J2 is fixedA fixed value is maintained at all times. Because the jammer transmits the jamming signal in all directions, the relay unmanned aerial vehicle UrAnd the ground node can be interfered, and for the intelligent jammer, the interference power of the intelligent jammer needs to be selected through reinforcement learning.
Further, the specific calculation formula of the signal to interference plus noise ratio is:

ρ_{m-n} = p·h_{m-n} / (p_{Ji}·h_{i-n} + σ²)

where ρ_{m-n} represents the signal to interference plus noise ratio of the message received by node n when node m sends a message to node n, p represents the transmission power of node m, h_{m-n} is the linear multiple corresponding to the dB value of the path loss from node m to node n, p_{Ji} represents the power of jammer Ji with i taking 1 or 2 (p_{J1} is the power of the intelligent jammer J1 and p_{J2} is the power of the fixed jammer J2), h_{i-n} is the linear multiple corresponding to the dB value of the path loss from jammer Ji to node n, and σ² is the power of the background noise;
The calculation formula of h_{m-n} is as follows:

h_{m-n} = 10^(−L_{m-n}(r)/10)

wherein L_{m-n}(r) represents the path loss in dB from node m to node n, with the specific calculation formula:

L_{m-n}(r) = 10·α_p·log₁₀(4πfr/c)

where r is the Euclidean distance between nodes m and n, 0 ≤ m ≠ n ≤ N, c is the speed of light, f is the communication frequency, and α_p represents the path loss exponent: α_p = 2.05 when m and n are the source unmanned aerial vehicle node and a relay unmanned aerial vehicle node, and α_p = 2.32 when m and n are a relay unmanned aerial vehicle and the ground node.
Further, the bit error rate is calculated as follows:

b = (1/2)·erfc(√ρ)

where ρ represents the signal to interference plus noise ratio and erfc(·) is the complementary error function.
Further, the calculation formula of the benefit is as follows:
u^(k) = 10 − δ·b^(k) − C_u·p^(k)

where δ represents the weight of the bit error rate and C_u represents the weight of energy consumption.
Further, the update formula of the Q neural network parameter θ is as follows:

θ ← θ + α·[u + γ·max_{x′} Q̂(s′, x′) − Q(s, x; θ)]·∇_θ Q(s, x; θ)

where s, x, u, s′ respectively represent the state, action, benefit and next state in the experience e, γ represents the discount factor, x′ represents an action in the state s′, Q̂(s′, x′) represents the Q value of action x′ in the next state s′ computed by the target Q network, and α represents the learning rate.
The invention has the advantages that the transmission power of the source unmanned aerial vehicle and the choice of relay unmanned aerial vehicle are optimized simultaneously by a deep reinforcement learning algorithm, which effectively reduces the bit error rate and interruption rate of the communication system, improves the anti-interference performance, and effectively reduces the energy consumption of the source unmanned aerial vehicle; moreover, no specific channel model or interference model needs to be known, so the method is better suited to practical application and convenient to popularize.
Drawings
The invention will be further described with reference to the following examples and the accompanying drawings.
Fig. 1 is an execution flow chart of the unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning.
Fig. 2 is a schematic diagram of a communication network system model according to an embodiment of the present invention.
Fig. 3 is a schematic diagram comparing the bit error rate of an embodiment of the present invention with that of conventional anti-interference methods.
Fig. 4 is a schematic diagram comparing the energy consumption of the source drone in an embodiment of the present invention with that under conventional anti-interference methods.
Fig. 5 is a schematic diagram comparing the interruption rate of an embodiment of the present invention with that of conventional anti-interference methods.
Detailed Description
The invention provides an unmanned aerial vehicle relay anti-interference scheme based on deep reinforcement learning. The optimized anti-interference performance indexes are the bit error rate of received messages, the interruption rate of the communication system, and the energy consumption of the source unmanned aerial vehicle. The scheme is a joint optimization scheme: first, several relay unmanned aerial vehicles are arranged between the ground station and the source unmanned aerial vehicle, and the path loss of each transmission is reduced by relaying and forwarding messages; second, by applying the deep reinforcement learning algorithm DQN, the source unmanned aerial vehicle obtains the optimal transmit power and relay unmanned aerial vehicle to improve the anti-interference performance. Finally, the simulation results show that the proposed scheme can significantly improve the anti-interference performance of the communication system without knowing the interference model or the channel model.
As shown in fig. 1, the method for resisting interference of the relay of the unmanned aerial vehicle based on deep reinforcement learning of the present invention includes:
Step 1, building an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer, wherein information between the ground node and the source unmanned aerial vehicle is forwarded through a relay unmanned aerial vehicle while the jammers transmit jamming signals to the ground node and the relay unmanned aerial vehicle nodes. The unmanned aerial vehicle node set in the communication network is defined as U = {U_n}, 0 ≤ n ≤ N, where U_0 is the source unmanned aerial vehicle node and U_r, 1 ≤ r ≤ N, are the relay unmanned aerial vehicle nodes, and the flight period of the source unmanned aerial vehicle is divided into a number of communication time slots. In the communication network, the unmanned aerial vehicle nodes hover at different heights and the jammers are located on the ground. Because the direct link between the source unmanned aerial vehicle node and the ground node is broken by the interference, a relay unmanned aerial vehicle node helps the source unmanned aerial vehicle node relay messages to the ground node; the feedback channel, which is assumed not to be interfered, is used to transmit the bit error rate and the signal to interference plus noise ratio back to the source unmanned aerial vehicle node U_0;
Step 2, taking the observed state of the current time slot as the input of the target Q neural network and obtaining the Q values of all actions after analysis by the target Q neural network; selecting the action of the current time slot according to an ε-greedy strategy; calculating the bit error rate, the signal to interference plus noise ratio, the interruption rate and the benefit of the current time slot; observing the state of the next time slot; storing the experience e in the experience pool; and randomly extracting several historical experiences from the experience pool and updating the Q neural network parameter θ by a stochastic gradient descent algorithm. The state of the current time slot comprises the transmission power, the bit error rate and the signal to interference plus noise ratio of the last time slot; the action comprises the transmission power of the current time slot and the relay unmanned aerial vehicle of the current time slot; and the experience e comprises the state, action and benefit of the current time slot and the state of the next time slot. The invention provides an unmanned aerial vehicle relay anti-interference scheme based on DQN, a deep reinforcement learning algorithm that combines a neural network with Q learning. When the problem to be solved by Q learning is complex, the state space or action set may be large, and learning efficiency drops if states are retrieved from a very large Q table each time. Taking the state as the input of the neural network and obtaining the Q values of all actions from the network output avoids constructing a large Q table to store the Q values.
Step 3, executing the operations of step 2 on the divided time slots in sequence, and updating the parameters of the target Q network once every fixed number of time slots T by setting θ⁻ ← θ. For example, in the k-th time slot, the system state is defined as s^(k) = [p^(k−1), b^(k−1), ρ^(k−1)], containing the transmission power p^(k−1), the bit error rate b^(k−1) and the signal to interference plus noise ratio ρ^(k−1) of the last time slot. U_0 selects an action x^(k) from A, the action set of the source node, which comprises the selectable transmission power p^(k) and relay unmanned aerial vehicle U_r. After U_0 sends the message, the bit error rate and the signal to interference plus noise ratio of the received message are calculated at U_r and at the ground node respectively, the results are sent back to U_0 through the feedback channel, and the benefit u^(k) of the time slot is calculated. In the next time slot, U_0 observes the feedback results to determine the new state and then continues to select a new action based on this state, and so on.
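The ε-greedy action selection over the source node's joint (power level, relay) action set can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the flat action encoding, the function names, and the sizes M = 5 and R = 3 (taken from the embodiment) are assumptions.

```python
import random

# Joint action space of the source node U_0: M power levels x R candidate
# relays (sizes from the embodiment: M = 5, R = 3 -> 15 joint actions).
M, R = 5, 3

def decode_action(a):
    """Map a flat action index to (power-level index, relay index)."""
    return a // R, a % R

def epsilon_greedy(q_values, epsilon):
    """Select a joint action from the Q-network outputs: explore uniformly
    with probability epsilon, otherwise pick the largest-Q action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With ε decayed over time, this trades exploration of untried power/relay combinations against exploitation of the current Q estimates.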
Preferably, in the k-th time slot, U_0 selects a U_r and sends a message to U_r with power p^(k). After receiving the message, U_r calculates the signal to interference plus noise ratio ρ^(k)_{0-r} and the bit error rate b^(k)_{0-r} of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, U_r relays the message to the ground node at a fixed relay power p_r. After the ground node receives the message, it calculates the signal to interference plus noise ratio ρ^(k)_{r-G} and the bit error rate b^(k)_{r-G} of the received message and judges whether the transmission is interrupted. If the transmission is not interrupted, the signal to interference plus noise ratio and the bit error rate of the whole communication process are obtained according to the formulas given below.
the cooperation mode of the relay unmanned aerial vehicle node can be set as Decoding Forwarding (DF), U0Is transmitted with a power p(k)Maximum PmaxThe transmission power being quantized to M levels, e.g. p(k)={mPmax/M},1≤m≤M。
Preferably, whether the transmission is interrupted is judged by comparing the signal to interference plus noise ratio with a threshold τ: if the signal to interference plus noise ratio is greater than τ, the communication process is judged not to be interrupted.
Preferably, the transmission of the k-th time slot is interrupted (O^(k) = 1) if the signal to interference plus noise ratio of either hop does not exceed the threshold, i.e. O^(k) = 1 if min(ρ^(k)_{0-r}, ρ^(k)_{r-G}) ≤ τ and O^(k) = 0 otherwise, and the interruption rate over K time slots is the proportion of interrupted slots, (1/K)·Σ_k O^(k).
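The per-slot interruption test and the empirical interruption rate can be sketched as below. This is a sketch under the stated threshold rule (SINR must exceed τ on both hops); the list-of-hops input format is an assumption, and τ = 10 dB is taken from the embodiment.

```python
def is_interrupted(rho, tau=10.0):
    """A hop is not interrupted only if its SINR exceeds the threshold tau
    (both in dB here; tau = 10 dB as in the embodiment)."""
    return not (rho > tau)

def outage_rate(slot_sinrs, tau=10.0):
    """Empirical interruption rate: fraction of time slots in which any hop
    fails the threshold test. slot_sinrs is a list of per-slot lists of hop
    SINRs, e.g. [sinr(source->relay), sinr(relay->ground)]."""
    outages = [1 if any(is_interrupted(r, tau) for r in hops) else 0
               for hops in slot_sinrs]
    return sum(outages) / len(outages)
```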
preferably, the intelligent jammer is represented by J1, the fixed jammer is represented by J2, and the interference power of the intelligent jammer J1Maximum JmaxThe interference power is quantized to Y level,b is an interference power set, and a benefit calculation formula after the intelligent jammer takes action is as follows:
wherein i is equal to 0 or 1, CjA weight representing the energy consumption of the intelligent jammer,i {. represents an interruption indication function, which is 1 if the transmission is interrupted (O ═ 1) and 0 if the transmission is not interrupted (O ═ 0), and the intelligent jammer J1 observes the sir ρ of the last timeslot(k-1)Interference power is selected to improve outage rate and maximize efficiency, and the interference power of jammer J2 is fixedA fixed value is maintained at all times. Because the jammer transmits the jamming signal in all directions, the relay unmanned aerial vehicle UrAnd the ground node can be interfered, and for the intelligent jammer, the interference power of the intelligent jammer needs to be selected through reinforcement learning.
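The jammer's benefit trade-off (interruption reward versus energy cost) can be sketched as below. The exact weighting in the patent's formula is not shown in this extraction, so the expression u_J = I{O=1} − C_j·p_J1 is a reconstruction; the unit convention (power in watts) and C_j = 60 from the embodiment are assumptions.

```python
def jammer_utility(outage, p_j1, c_j=60.0):
    """Reconstructed benefit of the intelligent jammer J1: reward 1 when the
    transmission is interrupted (O = 1), minus an energy penalty C_j * p_J1.
    p_j1 is the jamming power in watts."""
    return (1.0 if outage else 0.0) - c_j * p_j1
```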
Preferably, the interference of the jammers affects the quality of communication between any two nodes in the communication network. The specific calculation formula of the signal to interference plus noise ratio is:

ρ_{m-n} = p·h_{m-n} / (p_{Ji}·h_{i-n} + σ²)

where ρ_{m-n} represents the signal to interference plus noise ratio of the message received by node n when node m sends a message to node n, p represents the transmission power of node m, h_{m-n} is the linear multiple corresponding to the dB value of the path loss from node m to node n, p_{Ji} represents the power of jammer Ji with i taking 1 or 2 (p_{J1} is the power of the intelligent jammer J1 and p_{J2} is the power of the fixed jammer J2), h_{i-n} is the linear multiple corresponding to the dB value of the path loss from jammer Ji to node n, and σ² is the power of the background noise. The coding and modulation mode of the transmitted message can adopt QPSK;
The calculation formula of h_{m-n} is as follows:

h_{m-n} = 10^(−L_{m-n}(r)/10)

wherein L_{m-n}(r) represents the path loss in dB from node m to node n (the above formula converts the dB value of the path loss L_{m-n}(r) into the linear multiple h_{m-n}), and the specific calculation formula of L_{m-n}(r) is:

L_{m-n}(r) = 10·α_p·log₁₀(4πfr/c)

where r is the Euclidean distance between nodes m and n, 0 ≤ m ≠ n ≤ N, c is the speed of light, f is the communication frequency, and α_p denotes the path loss exponent. The channel from U_0 to U_r is an air-to-air channel whose path loss can be described by free-space propagation, so it fades little; but the channel from U_r to the ground node is an air-to-ground channel in which fading dominates, due to objects near the ground node and more severe path loss. Therefore α_p = 2.05 when m and n are the source unmanned aerial vehicle node and a relay unmanned aerial vehicle node, and α_p = 2.32 when m and n are a relay unmanned aerial vehicle and the ground node.
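The path-loss/SINR chain can be sketched numerically as follows. The path-loss form L(r) = 10·α_p·log₁₀(4πfr/c) is reconstructed from context (the original formula is lost in extraction), so treat it as an assumption; f = 2.4 GHz and the −100 dBm noise floor are taken from the embodiment.

```python
import math

C_LIGHT = 3.0e8   # speed of light, m/s
FREQ = 2.4e9      # communication frequency from the embodiment, Hz

def path_loss_db(r, alpha_p):
    """Path loss in dB with exponent alpha_p (reconstructed form):
    L(r) = 10 * alpha_p * log10(4*pi*f*r/c)."""
    return 10.0 * alpha_p * math.log10(4.0 * math.pi * FREQ * r / C_LIGHT)

def channel_gain(r, alpha_p):
    """Linear multiple h = 10^(-L/10) corresponding to the dB path loss."""
    return 10.0 ** (-path_loss_db(r, alpha_p) / 10.0)

def sinr(p_tx, r_tx, alpha_tx, p_jam, r_jam, alpha_jam, noise_w=1e-13):
    """rho = p * h_tx / (p_J * h_J + sigma^2). Powers in watts;
    noise_w = 1e-13 W corresponds to the embodiment's -100 dBm."""
    return (p_tx * channel_gain(r_tx, alpha_tx)) / (
        p_jam * channel_gain(r_jam, alpha_jam) + noise_w)
```

As expected from the formula, the gain falls monotonically with distance, and a more distant jammer raises the achievable SINR.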
Preferably, the bit error rate is calculated as follows:

b = (1/2)·erfc(√ρ)

where ρ represents the signal to interference plus noise ratio and erfc(·) is the complementary error function.
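Since the patent's exact BER expression is lost in this extraction and only QPSK modulation is stated, the textbook QPSK-over-AWGN form is sketched here as an assumption:

```python
import math

def qpsk_ber(rho):
    """Textbook QPSK bit error rate over an AWGN-style channel,
    b = 0.5 * erfc(sqrt(rho)), where rho is the (linear) SINR per bit.
    The patent's exact expression is not shown; this is a stand-in."""
    return 0.5 * math.erfc(math.sqrt(rho))
```

The function is monotone decreasing in ρ, which is all the reinforcement-learning reward needs: raising the received SINR lowers the BER term in the benefit.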
Preferably, the calculation formula of the benefit is as follows:
u^(k) = 10 − δ·b^(k) − C_u·p^(k)

where δ represents the weight of the bit error rate and C_u represents the weight of energy consumption.
Preferably, the update formula of the Q neural network parameter θ is as follows:

θ ← θ + α·[u + γ·max_{x′} Q̂(s′, x′) − Q(s, x; θ)]·∇_θ Q(s, x; θ)

where s, x, u, s′ respectively represent the state, action, benefit and next state in the experience e, γ represents the discount factor, x′ represents an action in the state s′, Q̂(s′, x′) represents the Q value of action x′ in the next state s′ computed by the target Q network, and α represents the learning rate.
In each time slot, U_0 stores the experience e^(k) = {s^(k), x^(k), u^(k), s^(k+1)} in its experience pool, defined as R = {e^(i)}, 1 ≤ i ≤ k−1. The Q network and the target Q network have the same structure and the same initial network parameters (θ⁻ = θ). The network parameter θ of the Q network is updated every time slot: U_0 randomly extracts several historical experiences from the experience pool and updates θ by the stochastic gradient descent algorithm, with the loss function

L(θ) = E[(u + γ·max_{x′} Q̂(s′, x′) − Q(s, x; θ))²]

from which the above update formula of θ is obtained. The network parameters of the target Q network are updated once every fixed number of time slots T, by directly setting θ⁻ ← θ. The target Q network is used to calculate the target value, which reduces the correlation between the current Q value and the target Q value.
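The experience pool and the temporal-difference target/loss can be sketched as below. This is a minimal sketch: the class and function names are illustrative, the pool capacity is an assumption, and γ = 0.5 is taken from the embodiment.

```python
import random
from collections import deque

class ExperiencePool:
    """Experience pool R storing tuples e = (s, x, u, s_next)."""
    def __init__(self, capacity=10000):
        self.pool = deque(maxlen=capacity)  # oldest entries drop automatically

    def store(self, s, x, u, s_next):
        self.pool.append((s, x, u, s_next))

    def sample(self, z):
        """Randomly extract up to z historical experiences."""
        return random.sample(self.pool, min(z, len(self.pool)))

def td_target(u, max_q_next, gamma=0.5):
    """Target value y = u + gamma * max_{x'} Qhat(s', x'), with the max taken
    over the target Q network's outputs for the next state."""
    return u + gamma * max_q_next

def td_loss(q_sx, y):
    """Squared loss (y - Q(s, x; theta))^2 minimized by gradient descent."""
    return (y - q_sx) ** 2
```

Sampling uniformly from the pool breaks the temporal correlation between consecutive slots, and computing the target with the frozen θ⁻ decorrelates the current and target Q values, as the text describes.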
The detailed algorithm steps, set according to the technical scheme of the invention, are as follows:
1: initialize the Q network parameter θ and the target Q network parameter θ⁻ = θ, and empty the experience pool R
2: for k = 1, 2, 3, … do
3:   observe the state s^(k) = [p^(k−1), b^(k−1), ρ^(k−1)]
4:   obtain the Q values output by the neural network and select an action x^(k) = [p^(k), U_r^(k)] according to the ε-greedy strategy
5:   calculate ρ^(k) and b^(k) respectively
6:   calculate the interruption rate
7:   calculate u^(k)
8:   observe the next state s^(k+1) = [p^(k), b^(k), ρ^(k)]
9:   store the experience e^(k) = {s^(k), x^(k), u^(k), s^(k+1)} in the experience pool R
10:  randomly extract Z experiences from the experience pool
11:  update the network parameter θ
12:  if k is an integer multiple of T then
13:    update the target network parameter θ⁻ ← θ
14:  end if
15:  set s^(k) = s^(k+1) to perform the state iteration
16: end for
For a more detailed description of the present invention, reference is now made to a specific embodiment:
As shown in fig. 2, this embodiment uses a three-dimensional coordinate system to represent the communication network, with 3 optional relay drone nodes U_r (i.e. U_1, U_2, U_3) at (10,30,30), (20,20,20) and (30,10,10); 1 source unmanned aerial vehicle node U_0 at (40,40,40); 1 ground node G at (0,0,0); and 2 jammers, the fixed jammer J2 at (110,0,0) and the intelligent jammer J1 at (0,110,0). According to common unmanned aerial vehicle standards, the communication frequency f is set to 2.4 GHz. The transmission power p^(k) is in the range [30,100] mW, uniformly quantized into 5 levels. Considering the distances between the relay nodes, the ground node and the jammers, the relay power p_r is set to 60 mW. The interference power p_{J2} of the fixed jammer is set to 30 mW, and the interference power p_{J1} of the intelligent jammer is in the range [10,40] mW, uniformly quantized into 4 levels. The background noise power σ² is set to −100 dBm and the threshold τ to 10 dB. In the DQN algorithm, the learning rate α = 0.001, the discount factor γ = 0.5, δ = 3000, C_u = 100, and C_j = 60.
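The uniform quantization of the power ranges in the embodiment can be sketched as below. Note this endpoint-inclusive grid matches the embodiment's "range [30,100] mW in 5 levels" reading; the claims' m·P_max/M form would give a slightly different grid, so the convention here is an assumption.

```python
def quantize_levels(p_min, p_max, n):
    """Uniformly quantize a power range into n levels, endpoints included;
    e.g. the source power range [30, 100] mW in 5 levels, or the intelligent
    jammer's [10, 40] mW in 4 levels."""
    step = (p_max - p_min) / (n - 1)
    return [p_min + i * step for i in range(n)]
```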
Based on the above algorithm, the simulation results are filtered with the least-squares smoothing filter function sgolayfilt in MATLAB, which reduces the noise in the simulation results and makes the curves smoother. As shown in fig. 3, the bit error rate of the DQN-based unmanned aerial vehicle relay anti-interference scheme of the present invention is clearly improved compared with the other two conventional schemes. The bit error rate of the Q-learning-based unmanned aerial vehicle relay anti-interference scheme decreases from an initial 7.2×10⁻⁴ and finally converges to 2.8×10⁻⁵ in about 4500 time slots, while the bit error rate of the DQN-based scheme decreases from the same value and converges to 7.1×10⁻⁶ in around 2000 time slots. The proposed algorithm therefore converges faster and learns better. As shown in fig. 4, which gives the variation of the energy consumption of the source drone, the DQN-based scheme is superior to the other two schemes in both convergence speed and final result. In the Q-learning-based scheme, the energy consumption decreases from 65 mJ at the beginning and finally converges to 36.2 mJ at about 4500 time slots; in the proposed scheme, the energy consumption also decreases from the same value and converges to 30.4 mJ after only 2000 time slots, so the learning effect is better than the other schemes. As shown in fig. 5, the DQN-based unmanned aerial vehicle relay anti-interference scheme effectively reduces the interruption rate of the communication system, and by a larger margin than the two conventional schemes. In the Q-learning-based scheme, the interruption rate decreases from 0.24 and finally converges to 0.08 at about 4500 time slots; in the proposed scheme, the interruption rate also decreases from 0.24 and finally converges to 0.005 in around 2000 time slots, a reduction of about 93.75% compared with the former.
The invention differs from conventional unmanned aerial vehicle power-optimization anti-interference schemes: in the proposed joint optimization scheme, the source unmanned aerial vehicle can reduce path loss by selecting a relay unmanned aerial vehicle and can optimize its transmit power according to the relay's position. This scheme significantly improves the anti-interference performance of the communication system. In addition, the invention applies a deep reinforcement learning algorithm within the joint optimization scheme, yielding a DQN-based unmanned aerial vehicle relay anti-interference scheme. Under this scheme, the source unmanned aerial vehicle can obtain an optimal communication strategy through continuous trial and error and accumulated experience, without knowing the specific channel model or interference model. The scheme is therefore better suited to real-world deployment and has a degree of generality.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.
Claims (9)
1. An unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning is characterized in that: the method comprises the following steps:
step 1, building an unmanned aerial vehicle cooperative communication network comprising a ground node, a source unmanned aerial vehicle, relay unmanned aerial vehicles, an intelligent jammer and a fixed jammer; information is forwarded between the ground node and the source unmanned aerial vehicle through a relay unmanned aerial vehicle, while the jammers transmit jamming signals to the ground node and the relay unmanned aerial vehicle nodes; the set of unmanned aerial vehicle nodes in the communication network is defined as U = {Un}, 0 ≤ n ≤ N, where U0 is the source unmanned aerial vehicle node and Ur, 1 ≤ r ≤ N, are the relay unmanned aerial vehicle nodes;
step 2, the observed state of the current time slot is taken as the input of a target Q neural network; the Q values of all actions are obtained after analysis by the target Q neural network; the action of the current time slot is selected according to an ε-greedy strategy; the bit error rate, signal-to-interference-plus-noise ratio, outage rate and benefit in the current time slot are calculated; the state of the next time slot is observed; the experience e is stored into an experience pool; and several historical experiences are randomly drawn from the experience pool to update the Q neural network parameter θ with a stochastic gradient descent algorithm; the state of the current time slot comprises the transmit power, bit error rate and signal-to-interference-plus-noise ratio of the previous time slot; the action comprises the transmit power of the current time slot and the relay unmanned aerial vehicle of the current time slot; and the experience e comprises the state of the current time slot, the action of the current time slot, the benefit of the current time slot and the state of the next time slot;
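The per-slot procedure of step 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Q network is reduced to a linear approximator, and the action count (15 = 5 power levels × 3 relays), state dimension (3) and exploration rate ε = 0.1 are assumptions, while α = 0.001 and γ = 0.5 are the embodiment's values.

```python
# Minimal sketch of the per-slot DQN step: epsilon-greedy action selection
# on the target network, experience storage, and a stochastic gradient
# update of the Q parameters from randomly sampled history.
import random

N_ACTIONS = 15        # assumed: 5 power levels x 3 relay choices
STATE_DIM = 3         # last slot's (transmit power, BER, SINR)
ALPHA, GAMMA, EPS = 0.001, 0.5, 0.1

# Linear Q(s, a) = w[a] . s  (stand-in for the Q neural network theta)
w = [[0.0] * STATE_DIM for _ in range(N_ACTIONS)]
target_w = [row[:] for row in w]   # target Q network (periodic copy)
replay = []                        # experience pool

def q_values(weights, s):
    return [sum(wi * si for wi, si in zip(row, s)) for row in weights]

def select_action(s):
    # epsilon-greedy on the target network's Q values
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    q = q_values(target_w, s)
    return q.index(max(q))

def store_and_learn(s, a, u, s_next, batch=4):
    replay.append((s, a, u, s_next))   # experience e = (s, a, u, s')
    for s_, a_, u_, sn in random.sample(replay, min(batch, len(replay))):
        target = u_ + GAMMA * max(q_values(target_w, sn))
        err = target - q_values(w, s_)[a_]
        for i in range(STATE_DIM):     # stochastic gradient step on theta
            w[a_][i] += ALPHA * err * s_[i]

random.seed(0)
s = [60.0, 1e-4, 12.0]                 # illustrative previous-slot state
a = select_action(s)
store_and_learn(s, a, u=1.0, s_next=[65.0, 8e-5, 13.0])
print(0 <= a < N_ACTIONS)  # True
```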
2. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 1, characterized in that:
in the k-th time slot, U0 selects a Ur and sends a message to Ur with transmit power p(k); after receiving the message, Ur calculates the signal-to-interference-plus-noise ratio and bit error rate of the received message and judges whether the transmission is interrupted; if the transmission is not interrupted, Ur relays the message to the ground node at the fixed relay power pr; after the ground node receives the message, it calculates the signal-to-interference-plus-noise ratio and bit error rate of the received message and judges whether the transmission is interrupted; if the transmission is not interrupted, the signal-to-interference-plus-noise ratio and bit error rate over the whole communication process are obtained as follows:
3. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the method for judging whether the transmission is interrupted is specifically: the signal-to-interference-plus-noise ratio is compared with the threshold τ, and if the signal-to-interference-plus-noise ratio is greater than the threshold τ, the communication process is judged not to be interrupted.
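Claim 3's outage test is a direct threshold comparison. A small sketch, assuming the SINR is computed on a linear scale and converted to dB before being compared against the embodiment's τ = 10 dB (the dB conversion is an interpretation, since the claim does not state the scale):

```python
import math

TAU_DB = 10.0  # outage threshold tau from the embodiment

def is_interrupted(sinr_linear):
    """Claim 3: the link is NOT interrupted when the SINR exceeds tau."""
    sinr_db = 10 * math.log10(sinr_linear)
    return not (sinr_db > TAU_DB)

print(is_interrupted(20.0))  # 20x ~= 13 dB -> not interrupted -> False
print(is_interrupted(5.0))   # 5x  ~=  7 dB -> interrupted     -> True
```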
5. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the intelligent jammer is denoted J1 and the fixed jammer J2; the interference power of the intelligent jammer J1 has maximum Jmax and is quantized into Y levels, B being the set of interference powers; the benefit calculation formula after the intelligent jammer takes an action is as follows:
wherein i equals 0 or 1, Cj represents the weight of the energy consumption of the intelligent jammer, and I{·} represents the interruption indicator function, which equals 1 if the transmission is interrupted (O = 1) and 0 if it is not interrupted (O = 0);
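The benefit formula of claim 5 did not survive extraction. Based purely on the description above (an interruption indicator I{·} rewarding the jammer, minus an energy term weighted by Cj), one plausible, entirely hypothetical form is:

```python
C_J = 60  # weight of the smart jammer's energy consumption (embodiment value)

def jammer_benefit(interrupted, p_j1_watts):
    """Hypothetical form of claim 5's benefit: the smart jammer is rewarded
    by the indicator I{O=1} when the transmission is interrupted and pays an
    energy cost C_j * p_J1. The patent's exact expression is not shown."""
    indicator = 1 if interrupted else 0
    return indicator - C_J * p_j1_watts

print(jammer_benefit(True, 0.01))   # caused an outage at 10 mW
print(jammer_benefit(False, 0.01))  # paid the energy cost for nothing
```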
6. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the specific calculation formula of the signal-to-interference-plus-noise ratio is as follows:

ρm-n = p · hm-n / (pJi · hJi-n + σ²)
where ρm-n represents the signal-to-interference-plus-noise ratio of the message received by node n when node m sends a message to node n, p represents the transmit power of node m, hm-n represents the linear multiple corresponding to the dB value of the path loss from node m to node n, pJi represents the power of jammer Ji with i taking 1 or 2, pJ1 represents the power of the smart jammer J1, pJ2 represents the power of the fixed jammer J2, hJi-n represents the linear multiple corresponding to the dB value of the path loss from jammer Ji to node n, and σ² is the power of the background noise;
the calculation formula of hm-n is as follows:
wherein Lm-n(r) represents the path loss from node m to node n, and the specific calculation formula is as follows:
wherein r is the Euclidean distance between nodes m and n, 0 ≤ m ≠ n ≤ N, c is the speed of light, f is the communication frequency, and αp represents the path loss exponent; when nodes m and n are the source unmanned aerial vehicle node and a relay unmanned aerial vehicle node, αp = 2.05; when nodes m and n are a relay unmanned aerial vehicle and the ground node, αp = 2.32.
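Claims 6 and 7 together determine the received signal quality: a distance- and frequency-dependent path loss Lm-n(r) with exponent αp, its linear gain hm-n, and the SINR ρm-n = p·hm-n/(pJi·hJi-n + σ²). A sketch under the embodiment's parameters; the exact expression of Lm-n(r) is not reproduced in the text, so a free-space loss generalized with the exponent αp is assumed:

```python
import math

C = 3e8        # speed of light (m/s)
F = 2.4e9      # communication frequency (Hz)
SIGMA2_W = 10 ** (-100 / 10) / 1000   # -100 dBm background noise, in watts

def path_loss_db(r, alpha_p):
    """Assumed form of L_{m-n}(r): free-space loss in dB generalized with
    the path-loss exponent alpha_p (the patent's formula is not shown)."""
    return 10 * alpha_p * math.log10(4 * math.pi * F * r / C)

def gain(r, alpha_p):
    # h_{m-n}: linear multiple corresponding to the dB path-loss value
    return 10 ** (-path_loss_db(r, alpha_p) / 10)

def sinr(p_tx_w, h_sig, p_jam_w, h_jam):
    # Claim 6: rho = p * h / (p_Ji * h_Ji + sigma^2)
    return p_tx_w * h_sig / (p_jam_w * h_jam + SIGMA2_W)

# Source U0 (40,40,40) -> relay U2 (20,20,20); smart jammer J1 at (0,110,0).
# 100 mW is the maximum transmit power, 40 mW the maximum smart-jam power.
d_sr = math.dist((40, 40, 40), (20, 20, 20))
d_jr = math.dist((0, 110, 0), (20, 20, 20))
rho = sinr(0.1, gain(d_sr, 2.05), 0.04, gain(d_jr, 2.05))
print(rho > 1)  # the signal path is much shorter than the jamming path
```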
8. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the calculation formula of the benefit is as follows:
u(k) = 10 - δ·b(k) - Cu·p(k)
wherein δ represents the weight of the bit error rate, and Cu represents the weight of the energy consumption.
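With the embodiment's weights δ = 3000 and Cu = 100, claim 8's benefit can be evaluated directly; treating p(k) in watts so that the three terms are of comparable magnitude is an interpretation, not stated in the patent:

```python
DELTA, C_U = 3000, 100   # weights from the embodiment

def source_benefit(ber, p_tx_w):
    # Claim 8: u^(k) = 10 - delta * b^(k) - C_u * p^(k)
    return 10 - DELTA * ber - C_U * p_tx_w

# e.g. the converged BER 7.1e-6 at 60 mW transmit power
u = source_benefit(7.1e-6, 0.060)
print(round(u, 3))  # 3.979
```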
9. The unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning of claim 2, characterized in that: the updating formula of the Q neural network parameter theta is as follows:
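The update formula of claim 9 is not reproduced in the text. The standard DQN rule it presumably follows is one stochastic-gradient step on the squared TD error against the target network, θ ← θ + α(u + γ·max Q(s',a';θ⁻) - Q(s,a;θ))·∇θQ(s,a;θ). A scalar numeric sketch under that assumption:

```python
# Hypothetical stand-in for claim 9's update: one SGD step on the squared
# TD error, using the embodiment's alpha=0.001 and gamma=0.5. Q is reduced
# to a single scalar parameter theta for illustration (Q(theta) = theta,
# so dQ/dtheta = 1 and the TD error directly moves theta).
ALPHA, GAMMA = 0.001, 0.5

def dqn_update(theta, u, q_next_max):
    td_error = u + GAMMA * q_next_max - theta
    return theta + ALPHA * td_error

theta = 0.0
for _ in range(20000):
    theta = dqn_update(theta, u=1.0, q_next_max=theta)
print(round(theta, 2))  # approaches the fixed point u / (1 - gamma) = 2.0
```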
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110930717.6A CN113507342B (en) | 2021-08-13 | 2021-08-13 | Unmanned aerial vehicle relay anti-interference method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113507342A true CN113507342A (en) | 2021-10-15 |
CN113507342B CN113507342B (en) | 2023-06-02 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140206279A1 (en) * | 2013-01-22 | 2014-07-24 | Eden Rock Communications, Llc | Method and system for intelligent jamming signal generation |
CN109274456A (en) * | 2018-09-10 | 2019-01-25 | 电子科技大学 | A kind of imperfect information intelligence anti-interference method based on intensified learning |
CN111917508A (en) * | 2020-08-10 | 2020-11-10 | 中国人民解放军陆军工程大学 | Anti-interference communication model based on multiple antennas and dynamic spatial spectrum anti-interference method |
CN112564849A (en) * | 2020-12-01 | 2021-03-26 | 国网辽宁省电力有限公司营口供电公司 | Identification and trapping method for multi-model unmanned aerial vehicle |
EP3854013A1 (en) * | 2018-09-19 | 2021-07-28 | Rheinmetall Air Defence AG | Signal interference device and a method for operating a signal interference device for protecting unmanned aerial vehicles (uav), in particular drones |
Non-Patent Citations (1)
Title |
---|
ZHICHAO SHENG 等: "UAV-Aided Two-Way Multi-User Relaying" * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||