CN113890564B - Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning - Google Patents

Info

Publication number
CN113890564B
Authority
CN
China
Prior art keywords
dqn, model, local, unmanned aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110976965.4A
Other languages
Chinese (zh)
Other versions
CN113890564A (en)
Inventor
叶远帆
雷鸣
赵民建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110976965.4A priority Critical patent/CN113890564B/en
Publication of CN113890564A publication Critical patent/CN113890564A/en
Application granted granted Critical
Publication of CN113890564B publication Critical patent/CN113890564B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B1/00Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/69Spread spectrum techniques
    • H04B1/713Spread spectrum techniques using frequency hopping
    • H04B1/715Interference-related aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/14Relay systems
    • H04B7/15Active relay systems
    • H04B7/185Space-based or airborne stations; Stations for satellite systems
    • H04B7/18502Airborne stations
    • H04B7/18506Communications with or from aircraft, i.e. aeronautical mobile service
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W84/00Network topologies
    • H04W84/18Self-organising networks, e.g. ad-hoc networks or sensor networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application provides a frequency hopping anti-interference method and device for an unmanned aerial vehicle dedicated ad hoc network based on federal learning. In the method, a frequency hopping anti-interference model based on the DQN algorithm is trained in each unmanned aerial vehicle client: historical experience is selected from an experience pool as training data to update the parameters of the DQN local model and obtain a high-quality frequency hopping strategy; according to the channel state of the current time slot, the trained Q values corresponding to the different channels are output as a sequence and the corresponding channel is selected for communication; the loss coefficient is sent to a ground server as the local weight. The ground server optimizes the collected local weights by federal averaging, uses the optimized result to train its global model, and sends the updated global weights back to all unmanned aerial vehicles, which update their loss coefficients accordingly and continue training the DQN frequency-hopping channel-selection anti-interference model. According to the invention, a corresponding frequency hopping strategy can be generated according to the interference situation, guiding the unmanned aerial vehicle to select an interference-free channel for communication, which can effectively improve communication efficiency.

Description

Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning
Technical Field
The invention relates to the technical field of ad hoc networks, in particular to a frequency hopping anti-interference method and device of an ad hoc network special for an unmanned aerial vehicle based on federal learning.
Background
Drones are becoming increasingly popular because they can accomplish a variety of challenging tasks in three-dimensional space. Thanks to their high mobility, drones can be deployed to good effect in many challenging applications, such as border monitoring, relay networks, and disaster monitoring. The Flying Ad-hoc Network (FANET) is a distributed communication network composed of drones, intended to alleviate the challenges faced by drone networks that rely on complete infrastructure. The reliability of wireless communication is important in any application, especially in military ones. In general, a jamming attack can effectively prevent a drone from transmitting data to its target. Therefore, the unmanned aerial vehicle ad hoc network urgently needs an effective anti-interference strategy to solve the interference problem.
Federated Learning (FL) is a distributed Machine Learning (ML) technique intended for on-device training on mobile devices. Federated learning was originally proposed by Google in 2016 to solve the problem of updating local models on end users' mobile phones; with the development of the technology, it has begun to be used in communication technology. Federated learning aims to realize joint modeling and improve the effect of AI models while ensuring the security of data transmission.
Deep Reinforcement Learning is a combination of Deep Learning (DL) and Reinforcement Learning (RL). The DQN algorithm, the seminal work of deep reinforcement learning, was published by DeepMind at NIPS 2013, and an improved version was then presented in Nature in 2015. The DQN algorithm is a learning technique combining Q-Learning with a neural network; compared with Q-Learning, it can obtain low-dimensional action outputs from high-dimensional state inputs and has wider applicability.
Due to the complex environment faced by unmanned aerial vehicle ad hoc networks, their communication is particularly susceptible to interference.
Disclosure of Invention
The present invention is directed to solving, at least in part, one of the technical problems in the related art.
Therefore, the first purpose of the invention is to provide a frequency hopping anti-interference method for the unmanned aerial vehicle dedicated ad hoc network based on federal learning. It is assumed that the spectrum channels in the area where the unmanned aerial vehicle is located suffer intermittent interference: when the channel selected by the unmanned aerial vehicle is interfered, the data sent by the unmanned aerial vehicle cannot be transmitted to the target; when the selected channel is free of interference, the data is successfully transmitted to the target. The DQN algorithm and the federal learning algorithm are therefore combined to learn the channel interference situation, so that a channel suitable for the current state is selected in each state and information is successfully sent for communication.
The second purpose of the invention is to provide a frequency hopping anti-jamming device for the unmanned aerial vehicle dedicated ad hoc network based on federal learning.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a frequency hopping anti-interference method for an ad hoc network dedicated to an unmanned aerial vehicle based on federal learning, including the following steps:
s1, in an unmanned aerial vehicle client, training is carried out on a frequency hopping anti-interference model based on a DQN algorithm, and in each training round, a preset number of historical experiences are selected from an experience pool and are used as training data to be input into a DQN local model;
s2, parameters of the local DQN local model are updated by inputting the training data to a neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy;
s3, the unmanned aerial vehicle outputs Q values corresponding to different channels trained by the DQN local model as a sequence according to the channel state of the current time slot, the frequency hopping strategy selects corresponding channels to communicate according to a preset threshold value in the sequence, and loss coefficients in the DQN local model are sent to a ground server as local weights;
and S4, the ground server collects the local weights and optimizes them by adopting a federal averaging algorithm to obtain an optimized result, the optimized result is used for training a global model of the ground server, the ground server retransmits the updated global weights to each unmanned aerial vehicle, and the unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model.
In addition, the frequency hopping anti-interference method of the special ad hoc network for the unmanned aerial vehicle based on federal learning according to the embodiment of the invention can also have the following additional technical characteristics:
Further, in an embodiment of the present invention, in S1, the input of the DQN local model in the drone client is (s, a, r, s′) from the experience pool;
wherein s represents the current state when the action is taken, i.e., the channel selected before this round of learning; a represents the action taken at that time, i.e., the channel selected according to the current state; r represents the feedback received after the action is taken, which is a positive value if the channel has no interference and a negative value if the channel has interference; and s′ represents the next state entered after the action is taken, i.e., the channel selected by a.
Further, in an embodiment of the present invention, the drone uses the channel condition of the previous time slot as the historical state, the channel selected in the previous time slot as the historical action, and whether information could be communicated as the corresponding feedback information.
Further, in an embodiment of the present invention, the randomly selecting a preset number of historical experiences from the experience pool as training data to be input into the DQN local model includes: after testing a channel at each time slot, saving the historical states, the historical actions and the feedback information as historical information into the experience pool of the DQN, and randomly selecting a preset number of the historical states, historical actions and feedback information from the experience pool as input to the DQN local model.
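For illustration, a minimal Python sketch of such an experience pool follows; the class name ReplayPool and the field layout are illustrative assumptions rather than part of the claimed method, and the default capacity of 5000 is taken from the detailed description below.

```python
import random
from collections import deque

class ReplayPool:
    """Minimal experience pool holding (s, a, r, s_next) transitions."""

    def __init__(self, capacity=5000):
        # a deque silently drops the oldest experience once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        # state: channel situation of the previous slot; action: channel chosen;
        # reward: positive if the slot was interference-free, negative otherwise
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # uniform random sampling breaks the correlation between consecutive slots
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```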
Further, in an embodiment of the present invention, in S2, the frequency hopping anti-interference model is based on the DQN algorithm, which combines a neural network with the Q-learning algorithm; an ε-greedy strategy is adopted to explore the channels, channels are selected according to the DQN local model, and a target Q-Network is used to reduce the loss function and promote convergence of the DQN local model.
Further, in an embodiment of the present invention, the reducing the loss function and promoting the DQN local model convergence using a target Q-Network comprises:
the target Q-Network and the action Q-Network have the same structure, and every N training rounds the parameters of the action Q-Network are assigned to the target Q-Network, wherein the loss function at each replacement is calculated as:

L(θ) = E[(r + γ max_{a′} Q(s′, a′; θ′) − Q(s, a; θ))²]

wherein Q(s, a; θ) is the Q value output by the action Q-Network under the model parameters of the current moment when the current state is taken as input; r is the feedback at the current time; γ is the attenuation coefficient; s′ is the state at the next time; a′ is the action at the next moment; θ′ is the model parameter after the last update; and max_{a′} Q(s′, a′; θ′) is the largest Q value in the sequence of Q values output by the target Q-Network under the last-updated model parameters when the next state s′ is taken as input.
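For illustration only, this loss can be computed as in the following sketch, assuming PyTorch, an action network q_net, a frozen target network target_net, and an assumed attenuation coefficient γ = 0.9; none of these names come from the patent.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.9):
    # batch: tensors of states s, integer actions a, rewards r, next states s_next
    s, a, r, s_next = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a; θ)
    with torch.no_grad():                                  # θ′ is held fixed between syncs
        max_q_next = target_net(s_next).max(dim=1).values  # max_{a′} Q(s′, a′; θ′)
    td_target = r + gamma * max_q_next
    return F.mse_loss(q_sa, td_target)                     # squared temporal-difference error
```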
Further, in an embodiment of the present invention, the S3 includes:
the DQN local model adopts the Bellman equation to update the Q value:

Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]

wherein Q(s_t, a_t) is the Q value obtained by taking action a_t in the current state s_t; α is the learning rate; γ is the attenuation coefficient; r_t is the feedback at the current time t; and max_a Q(s_{t+1}, a_{t+1}) is the maximum Q value obtainable in the next state s_{t+1} after taking action a_t in the current state s_t;

the frequency hopping strategy of the unmanned aerial vehicle client selects, according to the maximum value in the output sequence of the DQN local model,

a = argmax_a Q(s, a; θ)

the corresponding channel a to communicate with the server, and sends the loss coefficient θ to the ground server as the local weight.
Further, in an embodiment of the present invention, in S4, the ground server collects the local weights and optimizes them by a federal averaging algorithm to obtain an optimized result, where the formula of the federal averaging algorithm is:

ω_{m+1} = Σ_{k=1}^{K} (p_k / p) ω_{m+1}^k

wherein ω_{m+1}^k is the local weight from the k-th unmanned aerial vehicle client, i.e., the loss coefficient θ; p_k / p means that the local weight of the k-th unmanned aerial vehicle client is weighted, the weighted values are summed to obtain the updated global weight, and the updated global weight is input into the global model of the ground server for training to obtain the updated global model.
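For illustration, this federal averaging step can be sketched as follows, assuming the local weights are PyTorch parameter dictionaries and sample_counts stands in for the p_k values; the function name is illustrative.

```python
import torch

def federated_average(local_weights, sample_counts):
    # local_weights: one parameter dict per drone client (the uploaded θ)
    # sample_counts: p_k for each client; p is their sum
    p = float(sum(sample_counts))
    global_weights = {name: torch.zeros_like(t) for name, t in local_weights[0].items()}
    for w_k, p_k in zip(local_weights, sample_counts):
        for name in global_weights:
            global_weights[name] += (p_k / p) * w_k[name]  # (p_k / p) · ω_k
    return global_weights
```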
According to the frequency hopping anti-interference method for the unmanned aerial vehicle dedicated ad hoc network based on federal learning of the embodiment of the invention, a frequency hopping anti-interference model based on the DQN algorithm is trained in the unmanned aerial vehicle client, and in each training round a preset number of historical experiences are selected from the experience pool as training data and input into the DQN local model; the parameters of the DQN local model are updated by inputting the training data to its neural network, so that the DQN local model obtains a high-quality frequency hopping strategy; according to the channel state of the current time slot, the unmanned aerial vehicle outputs the Q values corresponding to the different channels trained by the DQN local model as a sequence, the frequency hopping strategy selects the corresponding channel for communication according to a preset threshold value in the sequence, and the loss coefficient in the DQN local model is sent to the ground server as the local weight; the ground server collects the local weights and optimizes them by the federal averaging algorithm, the optimized result is used to train the global model of the ground server, the ground server retransmits the updated global weights to each unmanned aerial vehicle, and the unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model. According to the invention, a corresponding frequency hopping strategy can be generated according to the interference situation, guiding the unmanned aerial vehicle to select an interference-free channel for communication, which can effectively improve communication efficiency.
In order to achieve the above object, an embodiment of a second aspect of the present invention provides an ad hoc network frequency hopping anti-interference apparatus dedicated for an unmanned aerial vehicle based on federal learning, including:
the training module is used for training a frequency hopping anti-interference model based on a DQN algorithm in an unmanned aerial vehicle client, and randomly selecting a preset number of historical experiences from the experience pool as training data to be input into the DQN local model in each training round;
the updating module is used for updating the parameters of the DQN local model by inputting the training data to the neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy;
the communication module is used for outputting Q values corresponding to different channels trained by the DQN local model as a sequence by the unmanned aerial vehicle according to the channel state of the current time slot, selecting the corresponding channels for communication according to a preset threshold value in the sequence by the frequency hopping strategy, and sending loss coefficients in the DQN local model as local weights to a ground server;
and the optimization module is used for collecting the local weights at the ground server and optimizing them by adopting a federal averaging algorithm to obtain an optimized result, training the global model of the ground server with the optimized result, retransmitting the updated global weights to each unmanned aerial vehicle, and updating the loss coefficient at the unmanned aerial vehicle client according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model.
According to the frequency hopping anti-interference device for the unmanned aerial vehicle dedicated ad hoc network based on federal learning of the embodiment of the invention, a frequency hopping anti-interference model based on the DQN algorithm is trained in the unmanned aerial vehicle client, and in each training round a preset number of historical experiences are selected from the experience pool as training data and input into the DQN local model; the parameters of the DQN local model are updated by inputting the training data to its neural network, so that the DQN local model obtains a high-quality frequency hopping strategy; according to the channel state of the current time slot, the unmanned aerial vehicle outputs the Q values corresponding to the different channels trained by the DQN local model as a sequence, the frequency hopping strategy selects the corresponding channel for communication according to a preset threshold value in the sequence, and the loss coefficient in the DQN local model is sent to the ground server as the local weight; the ground server collects the local weights and optimizes them by the federal averaging algorithm, the optimized result is used to train the global model of the ground server, the ground server retransmits the updated global weights to each unmanned aerial vehicle, and the unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model. According to the invention, a corresponding frequency hopping strategy can be generated according to the interference situation, guiding the unmanned aerial vehicle to select an interference-free channel for communication, which can effectively improve communication efficiency.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic diagram of an ad hoc network of a drone and an interference area according to one embodiment of the present invention;
fig. 2 is a flowchart of a frequency hopping anti-interference method of a dedicated ad hoc network of an unmanned aerial vehicle based on federal learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a federated learning model in accordance with an embodiment of the present invention;
fig. 4 is a model schematic of a DQN algorithm according to one embodiment of the invention;
FIG. 5 is a schematic diagram of a frequency hopping strategy model for Federal reinforcement learning according to an embodiment of the present invention;
fig. 6 is a transmission accuracy simulation diagram of an unmanned aerial vehicle using the federal reinforcement learning algorithm and the DQN algorithm, respectively, according to an embodiment of the present invention;
fig. 7 is a graph of an average return simulation of a drone using the federal reinforcement learning algorithm and DQN algorithm, respectively, in accordance with one embodiment of the present invention;
FIG. 8 is a graph of an average return simulation using the Federal reinforcement learning algorithm and the DQN algorithm for different numbers of drones, in accordance with one embodiment of the present invention;
fig. 9 is a structural diagram of a frequency hopping anti-interference device of a dedicated ad hoc network of an unmanned aerial vehicle based on federal learning according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The invention discloses a frequency hopping anti-interference method and device of a special ad hoc network for an unmanned aerial vehicle based on federal learning, and relates to the technical field of communication.
Fig. 1 is a schematic diagram of the unmanned aerial vehicle network and the interference area addressed by the invention. The irregularly shaped region represents the interference area; the table with alternating blue and white cells at the lower right represents the channel interference pattern, where blue indicates that the channel is interfered in that time slot and white indicates that the channel is free of interference. The unmanned aerial vehicle communicates with the ground terminal inside the interference area: if the unmanned aerial vehicle selects an interfered channel, it cannot communicate with the ground terminal normally; if it selects an interference-free channel, the ground terminal receives the information sent by the unmanned aerial vehicle and normal communication proceeds.
Fig. 2 is a flowchart of a frequency hopping anti-interference method of a dedicated ad hoc network for an unmanned aerial vehicle based on federal learning according to an embodiment of the present invention.
As shown in fig. 2, the frequency hopping anti-interference method for the ad hoc network dedicated to the unmanned aerial vehicle based on federal learning includes:
step S1, in an unmanned aerial vehicle client, training is carried out on a frequency hopping anti-interference model based on a DQN algorithm, and in each training round, a preset number of historical experiences are selected from an experience pool and used as training data to be input into a DQN local model.
It can be appreciated that, in the drone client, training is performed through a DQN-based frequency hopping anti-interference model. The unmanned aerial vehicle takes the channel condition of the previous time slot as the historical state, the channel selected in the previous time slot as the historical action, and whether information could be communicated as the corresponding feedback information. After testing the channel at each time slot, the historical state, historical action and feedback information are saved as historical information into the experience pool of the DQN. In each training round, a certain number of historical experiences are randomly selected from the experience pool as training data and input into the local DQN learning model.
Fig. 4 is a model schematic of the DQN algorithm, which essentially consists of an action Q-Network, a target Q-Network, an experience pool, and a loss function.
Specifically, the input of the DQN-based local model in the drone client is (s, a, r, s′) from the experience pool. s represents the current state when the action is taken, i.e., the channel selected before this round of learning. a represents the action taken at that time, i.e., the channel selected according to the current state. r represents the feedback received after the action: a positive value if the channel has no interference, and a negative value if it has interference. s′ represents the next state entered after the action is taken, i.e., the channel selected by the action.
And S2, updating the parameters of the DQN local model by inputting training data to the neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy.
It can be understood that the frequency hopping anti-interference method provided by the invention is based on the DQN algorithm, which combines a neural network with the Q-learning algorithm. First, an ε-greedy strategy is adopted to explore more channels: ε represents the exploration probability and decreases from 1 to 0 as the number of training rounds increases, i.e., the exploration probability gradually decreases while the probability of selecting channels according to the DQN model increases. The size of the experience pool is 5000; appropriately increasing the size of the experience pool can reduce overfitting. The DQN algorithm randomly selects historical experience from the experience pool to eliminate sample correlation and make the model converge faster. A target Q-Network is used to reduce the loss function and facilitate convergence of the DQN model. The target Q-Network has the same structure as the action Q-Network: a fully connected neural network with 2 hidden layers, with the learning rate adjusted by an Adam optimizer. Every N training rounds, the parameters of the action Q-Network are assigned to the target Q-Network. The loss function at each replacement is calculated as:

L(θ) = E[(r + γ max_{a′} Q(s′, a′; θ′) − Q(s, a; θ))²]

wherein Q(s, a; θ) is the Q value output by the action Q-Network under the model parameters of the current moment when the current state is taken as input; r is the feedback at the current time; γ is the attenuation coefficient; s′ is the state at the next time; a′ is the action at the next moment; θ′ is the model parameter after the last update; and max_{a′} Q(s′, a′; θ′) is the largest Q value in the sequence of Q values output by the target Q-Network under the last-updated model parameters when the next state s′ is taken as input.
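For illustration, a minimal PyTorch-style sketch of this network and exploration schedule is given below; the hidden-layer width of 64, the learning rate, and the channel count of 16 are assumed values (the patent only specifies two hidden layers, an Adam optimizer, and ε decaying from 1 to 0), and all names are illustrative rather than part of the patent.

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected Q-Network with two hidden layers (action and target nets share this structure)."""

    def __init__(self, n_channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_channels, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_channels),  # one Q value per candidate channel
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork(n_channels=16)                  # action Q-Network
target_net = QNetwork(n_channels=16)             # target Q-Network
target_net.load_state_dict(q_net.state_dict())   # every N rounds: copy the action net's parameters
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_channel(state, epsilon, n_channels=16):
    # ε-greedy: pick a random channel with probability ε, otherwise the channel
    # with the largest predicted Q value
    if random.random() < epsilon:
        return random.randrange(n_channels)
    with torch.no_grad():
        return int(q_net(state).argmax().item())
```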
In the frequency hopping strategy, if the server successfully receives the packet sent by the unmanned aerial vehicle, the channel is free of interference and its frequency can be adopted; otherwise, the channel is interfered and its frequency cannot be adopted.
And S3, the unmanned aerial vehicle outputs Q values corresponding to different channels trained by the DQN local model as a sequence according to the channel state of the current time slot, the frequency hopping strategy selects the corresponding channels according to a preset threshold value in the sequence for communication, and the loss coefficient in the DQN local model is used as a local weight and sent to the ground server.
Specifically, according to the current state, the Q values corresponding to different actions trained by the DQN local model are output as a sequence, and the frequency hopping strategy selects a corresponding channel according to the maximum value in the sequence for communication, and at the same time, sends the loss coefficient θ as the local weight to the ground server.
First, the DQN model adopts the Bellman equation to update the Q value:

Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]

wherein Q(s_t, a_t) is the Q value obtained by taking action a_t in the current state s_t; α is the learning rate; γ is the attenuation coefficient; r_t is the feedback at the current time t; and max_a Q(s_{t+1}, a_{t+1}) is the maximum Q value obtainable in the next state s_{t+1} after taking action a_t in the current state s_t.
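To make the update rule concrete, the following tabular sketch applies the formula directly; in the patent the Q function is represented by the neural network rather than a table, and the α and γ defaults here are assumed values.

```python
def bellman_update(Q, s_t, a_t, r_t, s_next, alpha=0.1, gamma=0.9):
    # Q is a 2-D array of Q values indexed as Q[state][action]
    td_target = r_t + gamma * max(Q[s_next])          # r_t + γ max_a Q(s_{t+1}, a)
    Q[s_t][a_t] += alpha * (td_target - Q[s_t][a_t])  # move Q(s_t, a_t) toward the target
```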
The frequency hopping strategy of the unmanned aerial vehicle client selects, according to the maximum value in the output sequence of the DQN model,

a = argmax_a Q(s, a; θ)

the corresponding channel a to communicate with the server, and sends the loss coefficient θ in the loss function to the server.
And S4, the ground server collects the local weights and optimizes them by adopting a federal averaging algorithm to obtain an optimized result, the optimized result is used for training the global model of the ground server, the ground server retransmits the updated global weights to each unmanned aerial vehicle, and the unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model.
Specifically, as shown in fig. 5, the system is composed of unmanned aerial vehicles and a server, and specifically includes the following:
the local model runs in the drone to support the frequency hopping strategy through the DQN algorithm described in figure 4. The trained Q-network is used as a local model, and the DQN algorithm is trained according to the local model updated by the global model. The updated loss coefficient θ trained by DQN will be assigned to the local model. At the same time, the updated loss factor θ will be sent to the frequency hopping strategy to make the decision. The local model then sends the updated local weights to the server.
First, a global model is trained in the server and updated by the federal averaging algorithm, whose formula is:

ω_{m+1} = Σ_{k=1}^{K} (p_k / p) ω_{m+1}^k

wherein ω_{m+1}^k is the local weight from the k-th unmanned aerial vehicle client, and p_k / p means that the local weight of the k-th unmanned aerial vehicle client is weighted; finally, the K weighted values are summed to obtain the optimized global weight. Then, the updated ω_{m+1} is input into the global model in the server for training to obtain a new global model. After the new global weight is obtained, the global model is updated with it so as to obtain a better strategy after each training round. The global model then retransmits the updated loss coefficient θ as the current global weight to the drones. Each drone receives the updated global weights and updates its local model with them.
Fig. 6 is a simulation diagram of the transmission accuracy of a drone using the federal reinforcement learning algorithm and the DQN algorithm, respectively, where the abscissa represents training time and the ordinate represents the accuracy of successfully transmitted data. The accuracy curves of the two algorithms are close at the beginning, because both algorithms are in the exploration phase at the start and the frequency hopping strategy is selected randomly. Because federal learning combines the information of multiple drone clients for optimization, the accuracy of the proposed algorithm increases faster and reaches 100% sooner than that of the DQN algorithm, which shows that the proposed algorithm can perform frequency hopping selection more quickly and effectively. Finally, the curve stabilizes at 100%, which shows that the proposed algorithm obtains the optimal frequency hopping strategy and can effectively select interference-free channels for communication.
Fig. 7 is a simulation plot of the average return of a drone using the federal reinforcement learning algorithm and the DQN algorithm, respectively, with the abscissa representing training time and the ordinate representing average return. As can be seen from the simulation, both the proposed federal reinforcement learning algorithm and the DQN algorithm converge, but the proposed federal reinforcement learning algorithm requires shorter training time and obtains an effective frequency hopping strategy more quickly.
Fig. 8 compares average return simulation curves of the federal reinforcement learning algorithm and the DQN algorithm for different numbers of drone clients, with the abscissa representing training time. The simulation shows that the convergence speed with 10 drone clients is obviously higher than with 6 drone clients, which in turn is obviously higher than with the 3 and 2 clients adopting the DQN algorithm. As the number of drones increases, convergence becomes faster, illustrating that the performance of the proposed federal reinforcement learning model improves as the number of drone clients grows.
According to the frequency hopping anti-interference method for the unmanned aerial vehicle dedicated ad hoc network based on federal learning of the embodiment of the invention, a frequency hopping anti-interference model based on the DQN algorithm is trained in the unmanned aerial vehicle client, and in each training round a preset number of historical experiences are selected from the experience pool as training data and input into the DQN local model; the parameters of the DQN local model are updated by inputting the training data to its neural network, so that the DQN local model obtains a high-quality frequency hopping strategy; according to the channel state of the current time slot, the unmanned aerial vehicle outputs the Q values corresponding to the different channels trained by the DQN local model as a sequence, the frequency hopping strategy selects the corresponding channel for communication according to a preset threshold value in the sequence, and the loss coefficient in the DQN local model is sent to the ground server as the local weight; the ground server collects the local weights and optimizes them by the federal averaging algorithm, the optimized result is used to train the global model of the ground server, the ground server retransmits the updated global weights to each unmanned aerial vehicle, and the unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model. According to the invention, a corresponding frequency hopping strategy can be generated according to the interference situation, guiding the unmanned aerial vehicle to select an interference-free channel for communication, which can effectively improve communication efficiency.
Fig. 9 is a schematic structural diagram of a frequency hopping anti-jamming device of a dedicated ad hoc network of an unmanned aerial vehicle based on federal learning according to an embodiment of the invention.
As shown in fig. 9, the apparatus 10 includes:
training module 100, updating module 200, communication module 300, and optimization module 400.
The training module 100 is configured to train, in an unmanned aerial vehicle client, a frequency hopping anti-interference model based on a DQN algorithm, and randomly select a preset number of historical experiences from an experience pool as training data to input the training data into a DQN local model in each training round;
an updating module 200, configured to update the parameters of the DQN local model by inputting training data to the neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy;
the communication module 300 is configured to enable the unmanned aerial vehicle to output Q values corresponding to different channels trained by the DQN local model as a sequence according to a channel state of a current time slot, select a corresponding channel for communication according to a preset threshold in the sequence by a frequency hopping strategy, and send a loss coefficient θ in the DQN local model as a local weight to the ground server;
and the optimization module 400, configured for the ground server to collect the local weights and optimize them by adopting a federal averaging algorithm to obtain an optimized result, train the global model of the ground server with the optimized result, and retransmit the updated global weights to each unmanned aerial vehicle, each unmanned aerial vehicle client updating its loss coefficient θ according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model.
Further, in the training module 100, the input of the DQN local model in the drone client is (s, a, r, s′) from the experience pool;
wherein s represents the current state when the action is taken, i.e., the channel selected before this round of learning; a represents the action taken at that time, i.e., the channel selected according to the current state; r represents the feedback received after the action is taken, which is a positive value if the channel has no interference and a negative value if the channel has interference; and s′ represents the next state entered after the action is taken, i.e., the channel selected by a.
According to the frequency hopping anti-interference device for the unmanned aerial vehicle dedicated ad hoc network based on federal learning of the embodiment of the invention, the training module trains a frequency hopping anti-interference model based on the DQN algorithm in the unmanned aerial vehicle client and, in each training round, selects a preset number of historical experiences from the experience pool as training data to input into the DQN local model; the updating module updates the parameters of the DQN local model by inputting the training data to its neural network, so that the DQN local model obtains a high-quality frequency hopping strategy; the communication module outputs the Q values corresponding to the different channels trained by the DQN local model as a sequence according to the channel state of the current time slot, selects the corresponding channel for communication according to a preset threshold value in the sequence, and sends the loss coefficient in the DQN local model to the ground server as the local weight; and the optimization module collects the local weights at the ground server, optimizes them by the federal averaging algorithm, trains the global model of the ground server with the optimized result, and retransmits the updated global weights to each unmanned aerial vehicle, whereupon each unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model. The invention can generate a corresponding frequency hopping strategy according to the interference situation, guide the unmanned aerial vehicle to select an interference-free channel for communication, and effectively improve communication efficiency.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (6)

1. A frequency hopping anti-interference method of a special ad hoc network of an unmanned aerial vehicle based on federal learning is characterized by comprising the following steps:
s1, in an unmanned aerial vehicle client, training is carried out on a frequency hopping anti-interference model based on a DQN algorithm, and in each training round, a preset number of historical experiences are selected from an experience pool and are used as training data to be input into a DQN local model;
s2, updating parameters of the DQN local model by inputting the training data to a neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy;
s3, the unmanned aerial vehicle outputs Q values corresponding to different channels trained by the DQN local model as a sequence according to the channel state of the current time slot, the frequency hopping strategy selects corresponding channels to communicate according to a preset threshold value in the sequence, and a loss coefficient theta in the DQN local model is used as a local weight and sent to a ground server;
s4, the ground server collects the local weights and optimizes the local weights by adopting a federal average algorithm to obtain an optimized result, the optimized result is used for training a global model of the ground server, the ground server retransmits the updated global weights to all the unmanned aerial vehicles, and the unmanned aerial vehicle client updates a loss coefficient theta according to the updated global weights so as to train the DQN frequency hopping to select an anti-interference model;
the selecting a preset amount of historical experiences from the experience pool as training data to be input into the DQN local model comprises the following steps:
after testing a channel at each time slot, saving historical states, historical actions and feedback information as historical information into an experience pool of the DQN, randomly selecting a preset number of the historical states, the historical actions and the feedback information from the experience pool as input, and inputting the input into the DQN local model;
in the S2, the frequency hopping anti-interference model is based on the DQN algorithm, which combines a neural network with the Q-learning algorithm; an ε-greedy strategy is adopted to explore a plurality of channels, channels are selected according to the DQN local model, and a target Q-Network is used to reduce the loss function and promote convergence of the DQN local model;
the reducing loss functions and facilitating the DQN local model convergence using a target Q-Network, comprising:
the target Q-Network and the action Q-Network have the same structure, and every N training rounds, the parameters of the action Q-Network are assigned to the target Q-Network, wherein the calculation formula of the loss function during each replacement is as follows:
L(θ) = E[(r + γ max_{a′} Q(s′, a′; θ′) − Q(s, a; θ))²]

wherein Q(s, a; θ) is the Q value output by the action Q-Network under the model parameters of the current moment when the current state is taken as input; r is the feedback at the current time; γ is the attenuation coefficient; s′ is the state at the next time; a′ is the action at the next moment; θ′ is the model parameter after the last update; and max_{a′} Q(s′, a′; θ′) is the largest Q value in the sequence of Q values output by the target Q-Network under the last-updated model parameters when the next state s′ is taken as input;
the S3 comprises the following steps:
the DQN local model adopts a Bellman formula to update the Q value, and the formula is as follows:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]

wherein Q(s_t, a_t) is the Q value obtained by taking action a_t in the current state s_t; α is the learning rate; γ is the attenuation coefficient; r_t is the feedback at the current time t; and max_a Q(s_{t+1}, a_{t+1}) is the maximum Q value obtainable in the next state s_{t+1} after taking action a_t in the current state s_t;

the frequency hopping strategy of the unmanned aerial vehicle client selects, according to the maximum value in the output sequence of the DQN local model,

a = argmax_a Q(s, a; θ)

the corresponding channel a to communicate with the server, and sends the loss coefficient θ to the ground server as the local weight.
2. The ad-hoc network frequency-hopping anti-jamming method special for unmanned aerial vehicle based on federal learning as claimed in claim 1, wherein in S1, the input of the DQN local model in the unmanned aerial vehicle client is (s, a, r, s′) from the experience pool;
wherein s represents the current state when the action is taken, i.e., the channel selected before this round of learning; a represents the action taken at that time, i.e., the channel selected according to the current state; r represents the feedback received after the action is taken, which is a positive value if the channel has no interference and a negative value if the channel has interference; and s′ represents the next state entered after the action is taken, i.e., the channel selected by a.
3. The special ad-hoc network frequency hopping anti-interference method for the unmanned aerial vehicle based on the federal learning as claimed in claim 1, wherein the unmanned aerial vehicle takes the channel condition of the previous time slot as the historical state, takes the channel selection of the previous time slot as the historical action, and takes whether information could be communicated as the corresponding feedback information.
4. The frequency hopping anti-jamming method for the ad hoc network dedicated for the unmanned aerial vehicle based on the federal learning as claimed in claim 1,
in S4, the ground server collects the local weights and performs optimization by using a federal mean algorithm to obtain an optimized result, where the formula of the federal mean algorithm is as follows:
ω_{m+1} = Σ_{k=1}^{K} (p_k / p) ω_{m+1}^k

wherein ω_{m+1}^k is the local weight from the k-th drone client, i.e., the loss coefficient θ; p_k / p means that the local weight of the k-th unmanned aerial vehicle client is weighted, the weighted values are summed to obtain the updated global weight, and the updated global weight is input into the global model of the ground server for training to obtain the updated global model.
5. The utility model provides a special ad hoc network frequency hopping anti jamming unit of unmanned aerial vehicle based on federal study which characterized in that includes:
the training module is used for training a frequency hopping anti-interference model based on a DQN algorithm in an unmanned aerial vehicle client, and randomly selecting a preset number of historical experiences from an experience pool as training data to be input into a DQN local model in each training round;
the updating module is used for updating the parameters of the DQN local model by inputting the training data to the neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy;
the communication module is used for outputting Q values corresponding to different channels trained by the DQN local model as a sequence by the unmanned aerial vehicle according to the channel state of the current time slot, selecting the corresponding channels for communication according to a preset threshold value in the sequence by the frequency hopping strategy, and sending a loss coefficient theta in the DQN local model as a local weight to a ground server;
the optimization module is used for the ground server to collect the local weights and optimize them by adopting a federal averaging algorithm to obtain an optimized result, train the global model of the ground server with the optimized result, and retransmit the updated global weights to each unmanned aerial vehicle, the unmanned aerial vehicle client updating the loss coefficient θ according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model;
the training module is further configured to:
after testing a channel at each time slot, saving historical states, historical actions and feedback information as historical information into an experience pool of the DQN, randomly selecting a preset number of the historical states, the historical actions and the feedback information from the experience pool as input, and inputting the input into the DQN local model;
the frequency hopping anti-interference model is based on the DQN algorithm, which combines a neural network with the Q-learning algorithm; an ε-greedy strategy is adopted to explore a plurality of channels, channels are selected according to the DQN local model, and a target Q-Network is used to reduce the loss function and promote convergence of the DQN local model;
the reducing loss functions and facilitating the DQN local model convergence using a target Q-Network, comprising:
the target Q-Network and the action Q-Network have the same structure, and every N training rounds, the parameters of the action Q-Network are assigned to the target Q-Network, wherein the calculation formula of the loss function during each replacement is as follows:
L(θ) = E[(r + γ max_{a′} Q(s′, a′; θ′) − Q(s, a; θ))²]

wherein Q(s, a; θ) is the Q value output by the action Q-Network under the model parameters of the current moment when the current state is taken as input; r is the feedback at the current time; γ is the attenuation coefficient; s′ is the state at the next time; a′ is the action at the next moment; θ′ is the model parameter after the last update; and max_{a′} Q(s′, a′; θ′) is the largest Q value in the sequence of Q values output by the target Q-Network under the last-updated model parameters when the next state s′ is taken as input;
the communication module is further configured to:
the DQN local model adopts a Bellman formula to update the Q value, and the formula is as follows:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]

wherein Q(s_t, a_t) is the Q value obtained by taking action a_t in the current state s_t; α is the learning rate; γ is the attenuation coefficient; r_t is the feedback at the current time t; and max_a Q(s_{t+1}, a_{t+1}) is the maximum Q value obtainable in the next state s_{t+1} after taking action a_t in the current state s_t;

the frequency hopping strategy of the unmanned aerial vehicle client selects, according to the maximum value in the output sequence of the DQN local model,

a = argmax_a Q(s, a; θ)

the corresponding channel a to communicate with the server, and sends the loss coefficient θ to the ground server as the local weight.
6. The special ad-hoc network frequency hopping anti-jamming device for unmanned aerial vehicle based on federal learning of claim 5, wherein:
in the training module, the input of the DQN local model in the unmanned aerial vehicle client is (s, a, r, s′) from the experience pool;
wherein s represents the current state when the action is taken, i.e., the channel selected before this round of learning; a represents the action taken at that time, i.e., the channel selected according to the current state; r represents the feedback received after the action is taken, which is a positive value if the channel has no interference and a negative value if the channel has interference; and s′ represents the next state entered after the action is taken, i.e., the channel selected by a.
CN202110976965.4A 2021-08-24 2021-08-24 Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning Active CN113890564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110976965.4A CN113890564B (en) 2021-08-24 2021-08-24 Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110976965.4A CN113890564B (en) 2021-08-24 2021-08-24 Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning

Publications (2)

Publication Number Publication Date
CN113890564A CN113890564A (en) 2022-01-04
CN113890564B (en) 2023-04-11

Family

ID=79011315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110976965.4A Active CN113890564B (en) 2021-08-24 2021-08-24 Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning

Country Status (1)

Country Link
CN (1) CN113890564B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114548426B (en) * 2022-02-17 2023-11-24 北京百度网讯科技有限公司 Asynchronous federal learning method, business service prediction method, device and system
CN114509732B (en) * 2022-02-21 2023-05-09 四川大学 Deep reinforcement learning anti-interference method of frequency agile radar
CN115622616B (en) * 2022-12-09 2023-04-28 清华大学 Resource control method and device in federal learning model training process
CN117332878B (en) * 2023-10-31 2024-04-16 慧之安信息技术股份有限公司 Model training method and system based on ad hoc network system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109302262A (en) * 2018-09-27 2019-02-01 电子科技大学 A kind of communication anti-interference method determining Gradient Reinforcement Learning based on depth
CN114726743A (en) * 2022-03-04 2022-07-08 重庆邮电大学 Service function chain deployment method based on federal reinforcement learning

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11462209B2 (en) * 2018-05-18 2022-10-04 Baidu Usa Llc Spectrogram to waveform synthesis using convolutional networks
CN108777872B (en) * 2018-05-22 2020-01-24 中国人民解放军陆军工程大学 Intelligent anti-interference method and intelligent anti-interference system based on deep Q neural network anti-interference model
US20200153535A1 (en) * 2018-11-09 2020-05-14 Bluecom Systems and Consulting LLC Reinforcement learning based cognitive anti-jamming communications system and method
CN110213025A (en) * 2019-05-22 2019-09-06 浙江大学 Dedicated ad hoc network anti-interference method based on deeply study
CN110531617B (en) * 2019-07-30 2021-01-08 北京邮电大学 Multi-unmanned aerial vehicle 3D hovering position joint optimization method and device and unmanned aerial vehicle base station
US11473913B2 (en) * 2019-09-20 2022-10-18 Prince Sultan University System and method for service oriented cloud based management of internet of drones
WO2021158313A1 (en) * 2020-02-03 2021-08-12 Intel Corporation Systems and methods for distributed learning for wireless edge dynamics
CN111970072B (en) * 2020-07-01 2023-05-26 中国人民解放军陆军工程大学 Broadband anti-interference system and method based on deep reinforcement learning
CN112584347B (en) * 2020-09-28 2022-07-08 西南电子技术研究所(中国电子科技集团公司第十研究所) UAV heterogeneous network multi-dimensional resource dynamic management method
CN113239023A (en) * 2021-04-20 2021-08-10 浙江大学德清先进技术与产业研究院 Remote sensing data-oriented federal learning model training method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109302262A (en) * 2018-09-27 2019-02-01 电子科技大学 A kind of communication anti-interference method determining Gradient Reinforcement Learning based on depth
CN114726743A (en) * 2022-03-04 2022-07-08 重庆邮电大学 Service function chain deployment method based on federal reinforcement learning

Also Published As

Publication number Publication date
CN113890564A (en) 2022-01-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant