CN113890564B - Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning - Google Patents

Info

Publication number
CN113890564B
Authority
CN
China
Prior art keywords
dqn, model, local, unmanned aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110976965.4A
Other languages
Chinese (zh)
Other versions
CN113890564A (en)
Inventor
叶远帆
雷鸣
赵民建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110976965.4A priority Critical patent/CN113890564B/en
Publication of CN113890564A publication Critical patent/CN113890564A/en
Application granted granted Critical
Publication of CN113890564B publication Critical patent/CN113890564B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B1/00Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/69Spread spectrum techniques
    • H04B1/713Spread spectrum techniques using frequency hopping
    • H04B1/715Interference-related aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B7/00Radio transmission systems, i.e. using radiation field
    • H04B7/14Relay systems
    • H04B7/15Active relay systems
    • H04B7/185Space-based or airborne stations; Stations for satellite systems
    • H04B7/18502Airborne stations
    • H04B7/18506Communications with or from aircraft, i.e. aeronautical mobile service
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W84/00Network topologies
    • H04W84/18Self-organising networks, e.g. ad-hoc networks or sensor networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application provides a frequency hopping anti-interference method and device for an unmanned aerial vehicle dedicated ad hoc network based on federal learning. In the method, a frequency hopping anti-interference model based on the DQN algorithm is trained in each unmanned aerial vehicle client: historical experience is selected from an experience pool as training data to update the parameters of the DQN local model and obtain a high-quality frequency hopping strategy; according to the channel state of the current time slot, the trained Q values corresponding to the different channels are output as a sequence and the corresponding channel is selected for communication; the loss coefficient is sent to a ground server as the local weight. The ground server optimizes the collected local weights by federal averaging, uses the optimized result to train its global model, and sends the updated global weights back to all unmanned aerial vehicles, which update their loss coefficients accordingly and continue training the DQN frequency-hopping channel-selection anti-interference model. According to the invention, a corresponding frequency hopping strategy can be generated according to the interference situation, guiding the unmanned aerial vehicle to select an interference-free channel for communication, which can effectively improve communication efficiency.

Description

Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning
Technical Field
The invention relates to the technical field of ad hoc networks, in particular to a frequency hopping anti-interference method and device of an ad hoc network special for an unmanned aerial vehicle based on federal learning.
Background
Drones are becoming increasingly popular because they can accomplish a variety of challenging tasks in three-dimensional space. Thanks to their high mobility, drones can be deployed to good effect in many challenging applications, such as border monitoring, relay networks, and disaster monitoring. The Flying Ad-hoc Network (FANET) is a distributed communication network composed of drones, intended to alleviate the challenges faced by drone networks that rely on complete infrastructure. The reliability of wireless communication is important in any application, especially in military ones. In general, a jamming attack can effectively prevent a drone from transmitting data to its target. Therefore, the unmanned aerial vehicle ad hoc network urgently needs an effective anti-interference strategy to solve the interference problem.
Federated Learning (FL) is a distributed Machine Learning (ML) technique intended for on-device training on mobile devices. Federated learning was originally proposed by Google in 2016 to solve the problem of updating local models on end users' mobile phones; with the development of the technology, it has begun to be used in communication technology. Federated learning aims to realize joint modeling and improve the effect of AI models while ensuring the security of data transmission.
Deep Reinforcement Learning is a combination of Deep Learning (DL) and Reinforcement Learning (RL). The DQN algorithm, the seminal work of deep reinforcement learning, was published by DeepMind at NIPS 2013, and an improved version was then presented in Nature in 2015. The DQN algorithm is a learning technique combining Q-Learning with a neural network; compared with Q-Learning, it can obtain low-dimensional action outputs from high-dimensional state inputs and has wider applicability.
Due to the complex environment faced by unmanned aerial vehicle ad hoc networks, their communication is particularly susceptible to interference.
Disclosure of Invention
The present invention is directed to solving, at least in part, one of the technical problems in the related art.
Therefore, the first purpose of the invention is to provide a frequency hopping anti-interference method for the unmanned aerial vehicle dedicated ad hoc network based on federal learning. It is assumed that the spectrum channels in the area where the unmanned aerial vehicle is located suffer intermittent interference: when the channel selected by the unmanned aerial vehicle is interfered, the data sent by the unmanned aerial vehicle cannot be transmitted to the target; when the selected channel is free of interference, the data is successfully transmitted to the target. The DQN algorithm and the federal learning algorithm are therefore combined to learn the channel interference situation, so that a channel suitable for the current state is selected in each state and information is successfully sent for communication.
The second purpose of the invention is to provide a frequency hopping anti-jamming device for the unmanned aerial vehicle dedicated ad hoc network based on federal learning.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a frequency hopping anti-interference method for an ad hoc network dedicated to an unmanned aerial vehicle based on federal learning, including the following steps:
s1, in an unmanned aerial vehicle client, training is carried out on a frequency hopping anti-interference model based on a DQN algorithm, and in each training round, a preset number of historical experiences are selected from an experience pool and are used as training data to be input into a DQN local model;
s2, parameters of the local DQN local model are updated by inputting the training data to a neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy;
s3, the unmanned aerial vehicle outputs Q values corresponding to different channels trained by the DQN local model as a sequence according to the channel state of the current time slot, the frequency hopping strategy selects corresponding channels to communicate according to a preset threshold value in the sequence, and loss coefficients in the DQN local model are sent to a ground server as local weights;
and S4, the ground server collects the local weights and optimizes them by adopting a federal averaging algorithm to obtain an optimized result, the optimized result is used for training a global model of the ground server, the ground server retransmits the updated global weights to each unmanned aerial vehicle, and the unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model.
In addition, the frequency hopping anti-interference method of the special ad hoc network for the unmanned aerial vehicle based on federal learning according to the embodiment of the invention can also have the following additional technical characteristics:
Further, in an embodiment of the present invention, in S1, the input of the DQN local model in the drone client is (s, a, r, s′) from the experience pool;
wherein s represents the current state when the action is taken, i.e., the channel selected before this round of learning; a represents the action taken at that time, i.e., the channel selected according to the current state; r represents the feedback received after the action is taken, which is a positive value if the channel has no interference and a negative value if the channel has interference; and s′ represents the next state entered after the action is taken, i.e., the channel selected by a.
Further, in an embodiment of the present invention, the drone uses the channel condition of the previous time slot as the historical state, the channel selected in the previous time slot as the historical action, and whether information could be communicated as the corresponding feedback information.
Further, in an embodiment of the present invention, the randomly selecting a preset number of historical experiences from the experience pool as training data to be input into the DQN local model includes: after testing a channel at each time slot, saving the historical states, the historical actions and the feedback information as historical information into the experience pool of the DQN, and randomly selecting a preset number of the historical states, historical actions and feedback information from the experience pool as input to the DQN local model.
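For illustration, a minimal Python sketch of such an experience pool follows; the class name ReplayPool and the field layout are illustrative assumptions rather than part of the claimed method, and the default capacity of 5000 is taken from the detailed description below.

```python
import random
from collections import deque

class ReplayPool:
    """Minimal experience pool holding (s, a, r, s_next) transitions."""

    def __init__(self, capacity=5000):
        # a deque silently drops the oldest experience once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        # state: channel situation of the previous slot; action: channel chosen;
        # reward: positive if the slot was interference-free, negative otherwise
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # uniform random sampling breaks the correlation between consecutive slots
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```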
Further, in an embodiment of the present invention, in S2, the frequency hopping anti-interference model is based on the DQN algorithm, which combines a neural network with the Q-learning algorithm; an ε-greedy strategy is adopted to explore the channels, channels are selected according to the DQN local model, and a target Q-Network is used to reduce the loss function and promote convergence of the DQN local model.
Further, in an embodiment of the present invention, the reducing the loss function and promoting the DQN local model convergence using a target Q-Network comprises:
the target Q-Network and the action Q-Network have the same structure, and every N training rounds the parameters of the action Q-Network are assigned to the target Q-Network, wherein the loss function at each replacement is calculated as:

L(θ) = E[(r + γ max_{a′} Q(s′, a′; θ′) − Q(s, a; θ))²]

wherein Q(s, a; θ) is the Q value output by the action Q-Network under the model parameters of the current moment when the current state is taken as input; r is the feedback at the current time; γ is the attenuation coefficient; s′ is the state at the next time; a′ is the action at the next moment; θ′ is the model parameter after the last update; and max_{a′} Q(s′, a′; θ′) is the largest Q value in the sequence of Q values output by the target Q-Network under the last-updated model parameters when the next state s′ is taken as input.
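For illustration only, this loss can be computed as in the following sketch, assuming PyTorch, an action network q_net, a frozen target network target_net, and an assumed attenuation coefficient γ = 0.9; none of these names come from the patent.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.9):
    # batch: tensors of states s, integer actions a, rewards r, next states s_next
    s, a, r, s_next = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a; θ)
    with torch.no_grad():                                  # θ′ is held fixed between syncs
        max_q_next = target_net(s_next).max(dim=1).values  # max_{a′} Q(s′, a′; θ′)
    td_target = r + gamma * max_q_next
    return F.mse_loss(q_sa, td_target)                     # squared temporal-difference error
```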
Further, in an embodiment of the present invention, the S3 includes:
the DQN local model adopts the Bellman equation to update the Q value:

Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]

wherein Q(s_t, a_t) is the Q value obtained by taking action a_t in the current state s_t; α is the learning rate; γ is the attenuation coefficient; r_t is the feedback at the current time t; and max_a Q(s_{t+1}, a_{t+1}) is the maximum Q value obtainable in the next state s_{t+1} after taking action a_t in the current state s_t;

the frequency hopping strategy of the unmanned aerial vehicle client selects, according to the maximum value in the output sequence of the DQN local model,

a = argmax_a Q(s, a; θ)

the corresponding channel a to communicate with the server, and sends the loss coefficient θ to the ground server as the local weight.
Further, in an embodiment of the present invention, in S4, the ground server collects the local weights and optimizes them by a federal averaging algorithm to obtain an optimized result, where the formula of the federal averaging algorithm is:

ω_{m+1} = Σ_{k=1}^{K} (p_k / p) ω_{m+1}^k

wherein ω_{m+1}^k is the local weight from the k-th unmanned aerial vehicle client, i.e., the loss coefficient θ; p_k / p means that the local weight of the k-th unmanned aerial vehicle client is weighted, the weighted values are summed to obtain the updated global weight, and the updated global weight is input into the global model of the ground server for training to obtain the updated global model.
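For illustration, this federal averaging step can be sketched as follows, assuming the local weights are PyTorch parameter dictionaries and sample_counts stands in for the p_k values; the function name is illustrative.

```python
import torch

def federated_average(local_weights, sample_counts):
    # local_weights: one parameter dict per drone client (the uploaded θ)
    # sample_counts: p_k for each client; p is their sum
    p = float(sum(sample_counts))
    global_weights = {name: torch.zeros_like(t) for name, t in local_weights[0].items()}
    for w_k, p_k in zip(local_weights, sample_counts):
        for name in global_weights:
            global_weights[name] += (p_k / p) * w_k[name]  # (p_k / p) · ω_k
    return global_weights
```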
According to the frequency hopping anti-interference method for the unmanned aerial vehicle dedicated ad hoc network based on federal learning of the embodiment of the invention, a frequency hopping anti-interference model based on the DQN algorithm is trained in the unmanned aerial vehicle client, and in each training round a preset number of historical experiences are selected from the experience pool as training data and input into the DQN local model; the parameters of the DQN local model are updated by inputting the training data to its neural network, so that the DQN local model obtains a high-quality frequency hopping strategy; according to the channel state of the current time slot, the unmanned aerial vehicle outputs the Q values corresponding to the different channels trained by the DQN local model as a sequence, the frequency hopping strategy selects the corresponding channel for communication according to a preset threshold value in the sequence, and the loss coefficient in the DQN local model is sent to the ground server as the local weight; the ground server collects the local weights and optimizes them by the federal averaging algorithm, the optimized result is used to train the global model of the ground server, the ground server retransmits the updated global weights to each unmanned aerial vehicle, and the unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model. According to the invention, a corresponding frequency hopping strategy can be generated according to the interference situation, guiding the unmanned aerial vehicle to select an interference-free channel for communication, which can effectively improve communication efficiency.
In order to achieve the above object, an embodiment of a second aspect of the present invention provides an ad hoc network frequency hopping anti-interference apparatus dedicated for an unmanned aerial vehicle based on federal learning, including:
the training module is used for training a frequency hopping anti-interference model based on a DQN algorithm in an unmanned aerial vehicle client, and randomly selecting a preset number of historical experiences from the experience pool as training data to be input into the DQN local model in each training round;
the updating module is used for updating the parameters of the DQN local model by inputting the training data to the neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy;
the communication module is used for outputting Q values corresponding to different channels trained by the DQN local model as a sequence by the unmanned aerial vehicle according to the channel state of the current time slot, selecting the corresponding channels for communication according to a preset threshold value in the sequence by the frequency hopping strategy, and sending loss coefficients in the DQN local model as local weights to a ground server;
and the optimization module is used for collecting the local weights at the ground server and optimizing them by adopting a federal averaging algorithm to obtain an optimized result, training the global model of the ground server with the optimized result, retransmitting the updated global weights to each unmanned aerial vehicle, and updating the loss coefficient at the unmanned aerial vehicle client according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model.
According to the frequency hopping anti-interference device for the unmanned aerial vehicle dedicated ad hoc network based on federal learning of the embodiment of the invention, a frequency hopping anti-interference model based on the DQN algorithm is trained in the unmanned aerial vehicle client, and in each training round a preset number of historical experiences are selected from the experience pool as training data and input into the DQN local model; the parameters of the DQN local model are updated by inputting the training data to its neural network, so that the DQN local model obtains a high-quality frequency hopping strategy; according to the channel state of the current time slot, the unmanned aerial vehicle outputs the Q values corresponding to the different channels trained by the DQN local model as a sequence, the frequency hopping strategy selects the corresponding channel for communication according to a preset threshold value in the sequence, and the loss coefficient in the DQN local model is sent to the ground server as the local weight; the ground server collects the local weights and optimizes them by the federal averaging algorithm, the optimized result is used to train the global model of the ground server, the ground server retransmits the updated global weights to each unmanned aerial vehicle, and the unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model. According to the invention, a corresponding frequency hopping strategy can be generated according to the interference situation, guiding the unmanned aerial vehicle to select an interference-free channel for communication, which can effectively improve communication efficiency.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic diagram of an ad hoc network of a drone and an interference area according to one embodiment of the present invention;
fig. 2 is a flowchart of a frequency hopping anti-interference method of a dedicated ad hoc network of an unmanned aerial vehicle based on federal learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a federated learning model in accordance with an embodiment of the present invention;
fig. 4 is a model schematic of a DQN algorithm according to one embodiment of the invention;
FIG. 5 is a schematic diagram of a frequency hopping strategy model for Federal reinforcement learning according to an embodiment of the present invention;
fig. 6 is a transmission accuracy simulation diagram of an unmanned aerial vehicle using the federal reinforcement learning algorithm and the DQN algorithm, respectively, according to an embodiment of the present invention;
fig. 7 is a graph of an average return simulation of a drone using the federal reinforcement learning algorithm and DQN algorithm, respectively, in accordance with one embodiment of the present invention;
FIG. 8 is a graph of an average return simulation using the Federal reinforcement learning algorithm and the DQN algorithm for different numbers of drones, in accordance with one embodiment of the present invention;
fig. 9 is a structural diagram of a frequency hopping anti-interference device of a dedicated ad hoc network of an unmanned aerial vehicle based on federal learning according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The invention discloses a frequency hopping anti-interference method and device of a special ad hoc network for an unmanned aerial vehicle based on federal learning, and relates to the technical field of communication.
Fig. 1 is a schematic diagram of the unmanned aerial vehicle network and the interference area addressed by the invention. The irregularly shaped region represents the interference area; the table with alternating blue and white cells at the lower right represents the channel interference pattern, where blue indicates that the channel is interfered in that time slot and white indicates that the channel is free of interference. The unmanned aerial vehicle communicates with the ground terminal inside the interference area: if the unmanned aerial vehicle selects an interfered channel, it cannot communicate with the ground terminal normally; if it selects an interference-free channel, the ground terminal receives the information sent by the unmanned aerial vehicle and normal communication proceeds.
Fig. 2 is a flowchart of a frequency hopping anti-interference method of a dedicated ad hoc network for an unmanned aerial vehicle based on federal learning according to an embodiment of the present invention.
As shown in fig. 2, the frequency hopping anti-interference method for the ad hoc network dedicated to the unmanned aerial vehicle based on federal learning includes:
step S1, in an unmanned aerial vehicle client, training is carried out on a frequency hopping anti-interference model based on a DQN algorithm, and in each training round, a preset number of historical experiences are selected from an experience pool and used as training data to be input into a DQN local model.
It can be appreciated that, in the drone client, training is performed through a DQN-based frequency hopping anti-interference model. The unmanned aerial vehicle takes the channel condition of the previous time slot as the historical state, the channel selected in the previous time slot as the historical action, and whether information could be communicated as the corresponding feedback information. After testing the channel at each time slot, the historical state, historical action and feedback information are saved as historical information into the experience pool of the DQN. In each training round, a certain number of historical experiences are randomly selected from the experience pool as training data and input into the local DQN learning model.
Fig. 4 is a model schematic of the DQN algorithm, which essentially consists of an action Q-Network, a target Q-Network, an experience pool, and a loss function.
Specifically, the input of the DQN-based local model in the drone client is (s, a, r, s′) from the experience pool. s represents the current state when the action is taken, i.e., the channel selected before this round of learning. a represents the action taken at that time, i.e., the channel selected according to the current state. r represents the feedback received after the action: a positive value if the channel has no interference, and a negative value if it has interference. s′ represents the next state entered after the action is taken, i.e., the channel selected by the action.
And S2, updating the parameters of the DQN local model by inputting training data to the neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy.
It can be understood that the frequency hopping anti-interference method provided by the invention is based on the DQN algorithm, which combines a neural network with the Q-learning algorithm. First, an ε-greedy strategy is adopted to explore more channels: ε represents the exploration probability and decreases from 1 to 0 as the number of training rounds increases, i.e., the exploration probability gradually decreases while the probability of selecting channels according to the DQN model increases. The size of the experience pool is 5000; appropriately increasing the size of the experience pool can reduce overfitting. The DQN algorithm randomly selects historical experience from the experience pool to eliminate sample correlation and make the model converge faster. A target Q-Network is used to reduce the loss function and facilitate convergence of the DQN model. The target Q-Network has the same structure as the action Q-Network: a fully connected neural network with 2 hidden layers, with the learning rate adjusted by an Adam optimizer. Every N training rounds, the parameters of the action Q-Network are assigned to the target Q-Network. The loss function at each replacement is calculated as:

L(θ) = E[(r + γ max_{a′} Q(s′, a′; θ′) − Q(s, a; θ))²]

wherein Q(s, a; θ) is the Q value output by the action Q-Network under the model parameters of the current moment when the current state is taken as input; r is the feedback at the current time; γ is the attenuation coefficient; s′ is the state at the next time; a′ is the action at the next moment; θ′ is the model parameter after the last update; and max_{a′} Q(s′, a′; θ′) is the largest Q value in the sequence of Q values output by the target Q-Network under the last-updated model parameters when the next state s′ is taken as input.
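For illustration, a minimal PyTorch-style sketch of this network and exploration schedule is given below; the hidden-layer width of 64, the learning rate, and the channel count of 16 are assumed values (the patent only specifies two hidden layers, an Adam optimizer, and ε decaying from 1 to 0), and all names are illustrative rather than part of the patent.

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected Q-Network with two hidden layers (action and target nets share this structure)."""

    def __init__(self, n_channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_channels, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_channels),  # one Q value per candidate channel
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork(n_channels=16)                  # action Q-Network
target_net = QNetwork(n_channels=16)             # target Q-Network
target_net.load_state_dict(q_net.state_dict())   # every N rounds: copy the action net's parameters
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_channel(state, epsilon, n_channels=16):
    # ε-greedy: pick a random channel with probability ε, otherwise the channel
    # with the largest predicted Q value
    if random.random() < epsilon:
        return random.randrange(n_channels)
    with torch.no_grad():
        return int(q_net(state).argmax().item())
```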
In the frequency hopping strategy, if the server successfully receives the packet sent by the unmanned aerial vehicle, the channel is free of interference and its frequency can be adopted; otherwise, the channel is interfered and its frequency cannot be adopted.
And S3, the unmanned aerial vehicle outputs Q values corresponding to different channels trained by the DQN local model as a sequence according to the channel state of the current time slot, the frequency hopping strategy selects the corresponding channels according to a preset threshold value in the sequence for communication, and the loss coefficient in the DQN local model is used as a local weight and sent to the ground server.
Specifically, according to the current state, the Q values corresponding to different actions trained by the DQN local model are output as a sequence, and the frequency hopping strategy selects a corresponding channel according to the maximum value in the sequence for communication, and at the same time, sends the loss coefficient θ as the local weight to the ground server.
First, the DQN model adopts the Bellman equation to update the Q value:

Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]

wherein Q(s_t, a_t) is the Q value obtained by taking action a_t in the current state s_t; α is the learning rate; γ is the attenuation coefficient; r_t is the feedback at the current time t; and max_a Q(s_{t+1}, a_{t+1}) is the maximum Q value obtainable in the next state s_{t+1} after taking action a_t in the current state s_t.
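To make the update rule concrete, the following tabular sketch applies the formula directly; in the patent the Q function is represented by the neural network rather than a table, and the α and γ defaults here are assumed values.

```python
def bellman_update(Q, s_t, a_t, r_t, s_next, alpha=0.1, gamma=0.9):
    # Q is a 2-D array of Q values indexed as Q[state][action]
    td_target = r_t + gamma * max(Q[s_next])          # r_t + γ max_a Q(s_{t+1}, a)
    Q[s_t][a_t] += alpha * (td_target - Q[s_t][a_t])  # move Q(s_t, a_t) toward the target
```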
The frequency hopping strategy of the unmanned aerial vehicle client selects, according to the maximum value in the output sequence of the DQN model,

a = argmax_a Q(s, a; θ)

the corresponding channel a to communicate with the server, and sends the loss coefficient θ in the loss function to the server.
And S4, the ground server collects the local weights and optimizes them by adopting a federal averaging algorithm to obtain an optimized result, the optimized result is used for training the global model of the ground server, the ground server retransmits the updated global weights to each unmanned aerial vehicle, and the unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model.
Specifically, as shown in fig. 5, the system is composed of unmanned aerial vehicles and a server, and specifically includes the following:
the local model runs in the drone to support the frequency hopping strategy through the DQN algorithm described in figure 4. The trained Q-network is used as a local model, and the DQN algorithm is trained according to the local model updated by the global model. The updated loss coefficient θ trained by DQN will be assigned to the local model. At the same time, the updated loss factor θ will be sent to the frequency hopping strategy to make the decision. The local model then sends the updated local weights to the server.
First, a global model is trained in the server and updated by the federal averaging algorithm, whose formula is:

ω_{m+1} = Σ_{k=1}^{K} (p_k / p) ω_{m+1}^k

wherein ω_{m+1}^k is the local weight from the k-th unmanned aerial vehicle client, and p_k / p means that the local weight of the k-th unmanned aerial vehicle client is weighted; finally, the K weighted values are summed to obtain the optimized global weight. Then, the updated ω_{m+1} is input into the global model in the server for training to obtain a new global model. After the new global weight is obtained, the global model is updated with it so as to obtain a better strategy after each training round. The global model then retransmits the updated loss coefficient θ as the current global weight to the drones. Each drone receives the updated global weights and updates its local model with them.
Fig. 6 is a simulation diagram of the transmission accuracy of a drone using the federal reinforcement learning algorithm and the DQN algorithm, respectively, where the abscissa represents training time and the ordinate represents the accuracy of successfully transmitted data. The accuracy curves of the two algorithms are close at the beginning, because both algorithms are in the exploration phase at the start and the frequency hopping strategy is selected randomly. Because federal learning combines the information of multiple drone clients for optimization, the accuracy of the proposed algorithm increases faster and reaches 100% sooner than that of the DQN algorithm, which shows that the proposed algorithm can perform frequency hopping selection more quickly and effectively. Finally, the curve stabilizes at 100%, which shows that the proposed algorithm obtains the optimal frequency hopping strategy and can effectively select interference-free channels for communication.
Fig. 7 is a simulation plot of the average return of a drone using the federal reinforcement learning algorithm and the DQN algorithm, respectively, with the abscissa representing training time and the ordinate representing average return. As can be seen from the simulation, both the proposed federal reinforcement learning algorithm and the DQN algorithm converge, but the proposed federal reinforcement learning algorithm requires shorter training time and obtains an effective frequency hopping strategy more quickly.
Fig. 8 compares average return simulation curves of the federal reinforcement learning algorithm and the DQN algorithm for different numbers of drone clients, with the abscissa representing training time. The simulation shows that the convergence speed with 10 drone clients is obviously higher than with 6 drone clients, which in turn is obviously higher than with the 3 and 2 clients adopting the DQN algorithm. As the number of drones increases, convergence becomes faster, illustrating that the performance of the proposed federal reinforcement learning model improves as the number of drone clients grows.
According to the frequency hopping anti-interference method for the unmanned aerial vehicle dedicated ad hoc network based on federal learning of the embodiment of the invention, a frequency hopping anti-interference model based on the DQN algorithm is trained in the unmanned aerial vehicle client, and in each training round a preset number of historical experiences are selected from the experience pool as training data and input into the DQN local model; the parameters of the DQN local model are updated by inputting the training data to its neural network, so that the DQN local model obtains a high-quality frequency hopping strategy; according to the channel state of the current time slot, the unmanned aerial vehicle outputs the Q values corresponding to the different channels trained by the DQN local model as a sequence, the frequency hopping strategy selects the corresponding channel for communication according to a preset threshold value in the sequence, and the loss coefficient in the DQN local model is sent to the ground server as the local weight; the ground server collects the local weights and optimizes them by the federal averaging algorithm, the optimized result is used to train the global model of the ground server, the ground server retransmits the updated global weights to each unmanned aerial vehicle, and the unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model. According to the invention, a corresponding frequency hopping strategy can be generated according to the interference situation, guiding the unmanned aerial vehicle to select an interference-free channel for communication, which can effectively improve communication efficiency.
Fig. 9 is a schematic structural diagram of a frequency hopping anti-jamming device of a dedicated ad hoc network of an unmanned aerial vehicle based on federal learning according to an embodiment of the invention.
As shown in fig. 9, the apparatus 10 includes:
training module 100, updating module 200, communication module 300, and optimization module 400.
The training module 100 is configured to train, in an unmanned aerial vehicle client, a frequency hopping anti-interference model based on a DQN algorithm, and randomly select a preset number of historical experiences from an experience pool as training data to input the training data into a DQN local model in each training round;
an updating module 200, configured to update the parameters of the DQN local model by inputting training data to the neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy;
the communication module 300 is configured to enable the unmanned aerial vehicle to output Q values corresponding to different channels trained by the DQN local model as a sequence according to a channel state of a current time slot, select a corresponding channel for communication according to a preset threshold in the sequence by a frequency hopping strategy, and send a loss coefficient θ in the DQN local model as a local weight to the ground server;
and the optimization module 400, configured for the ground server to collect the local weights and optimize them by adopting a federal averaging algorithm to obtain an optimized result, train the global model of the ground server with the optimized result, and retransmit the updated global weights to each unmanned aerial vehicle, each unmanned aerial vehicle client updating its loss coefficient θ according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model.
Further, in the training module 100, the input of the DQN local model in the drone client is (s, a, r, s′) from the experience pool;
wherein s represents the current state when the action is taken, i.e., the channel selected before this round of learning; a represents the action taken at that time, i.e., the channel selected according to the current state; r represents the feedback received after the action is taken, which is a positive value if the channel has no interference and a negative value if the channel has interference; and s′ represents the next state entered after the action is taken, i.e., the channel selected by a.
According to the frequency hopping anti-interference device for the unmanned aerial vehicle dedicated ad hoc network based on federal learning of the embodiment of the invention, the training module trains a frequency hopping anti-interference model based on the DQN algorithm in the unmanned aerial vehicle client and, in each training round, selects a preset number of historical experiences from the experience pool as training data to input into the DQN local model; the updating module updates the parameters of the DQN local model by inputting the training data to its neural network, so that the DQN local model obtains a high-quality frequency hopping strategy; the communication module outputs the Q values corresponding to the different channels trained by the DQN local model as a sequence according to the channel state of the current time slot, selects the corresponding channel for communication according to a preset threshold value in the sequence, and sends the loss coefficient in the DQN local model to the ground server as the local weight; and the optimization module collects the local weights at the ground server, optimizes them by the federal averaging algorithm, trains the global model of the ground server with the optimized result, and retransmits the updated global weights to each unmanned aerial vehicle, whereupon each unmanned aerial vehicle client updates its loss coefficient according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model. The invention can generate a corresponding frequency hopping strategy according to the interference situation, guide the unmanned aerial vehicle to select an interference-free channel for communication, and effectively improve communication efficiency.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (6)

1. A frequency hopping anti-interference method of a special ad hoc network of an unmanned aerial vehicle based on federal learning is characterized by comprising the following steps:
s1, in an unmanned aerial vehicle client, training is carried out on a frequency hopping anti-interference model based on a DQN algorithm, and in each training round, a preset number of historical experiences are selected from an experience pool and are used as training data to be input into a DQN local model;
s2, updating parameters of the DQN local model by inputting the training data to a neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy;
s3, the unmanned aerial vehicle outputs Q values corresponding to different channels trained by the DQN local model as a sequence according to the channel state of the current time slot, the frequency hopping strategy selects corresponding channels to communicate according to a preset threshold value in the sequence, and a loss coefficient theta in the DQN local model is used as a local weight and sent to a ground server;
s4, the ground server collects the local weights and optimizes the local weights by adopting a federal average algorithm to obtain an optimized result, the optimized result is used for training a global model of the ground server, the ground server retransmits the updated global weights to all the unmanned aerial vehicles, and the unmanned aerial vehicle client updates a loss coefficient theta according to the updated global weights so as to train the DQN frequency hopping to select an anti-interference model;
the selecting a preset amount of historical experiences from the experience pool as training data to be input into the DQN local model comprises the following steps:
after testing a channel at each time slot, saving historical states, historical actions and feedback information as historical information into an experience pool of the DQN, randomly selecting a preset number of the historical states, the historical actions and the feedback information from the experience pool as input, and inputting the input into the DQN local model;
in the S2, the frequency hopping anti-interference model is based on the DQN algorithm, which combines a neural network with the Q-learning algorithm; an ε-greedy strategy is adopted to explore a plurality of channels, channels are selected according to the DQN local model, and a target Q-Network is used to reduce the loss function and promote convergence of the DQN local model;
the reducing loss functions and facilitating the DQN local model convergence using a target Q-Network, comprising:
the target Q-Network and the action Q-Network have the same structure, and every N training rounds, the parameters of the action Q-Network are assigned to the target Q-Network, wherein the calculation formula of the loss function during each replacement is as follows:
L(θ) = E[(r + γ max_{a′} Q(s′, a′; θ′) − Q(s, a; θ))²]

wherein Q(s, a; θ) is the Q value output by the action Q-Network under the model parameters of the current moment when the current state is taken as input; r is the feedback at the current time; γ is the attenuation coefficient; s′ is the state at the next time; a′ is the action at the next moment; θ′ is the model parameter after the last update; and max_{a′} Q(s′, a′; θ′) is the largest Q value in the sequence of Q values output by the target Q-Network under the last-updated model parameters when the next state s′ is taken as input;
the S3 comprises the following steps:
the DQN local model adopts a Bellman formula to update the Q value, and the formula is as follows:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]

wherein Q(s_t, a_t) is the Q value obtained by taking action a_t in the current state s_t; α is the learning rate; γ is the attenuation coefficient; r_t is the feedback at the current time t; and max_a Q(s_{t+1}, a_{t+1}) is the maximum Q value obtainable in the next state s_{t+1} after taking action a_t in the current state s_t;

the frequency hopping strategy of the unmanned aerial vehicle client selects, according to the maximum value in the output sequence of the DQN local model,

a = argmax_a Q(s, a; θ)

the corresponding channel a to communicate with the server, and sends the loss coefficient θ to the ground server as the local weight.
2. The ad-hoc network frequency-hopping anti-jamming method special for unmanned aerial vehicle based on federal learning as claimed in claim 1, wherein in S1, the input of the DQN local model in the unmanned aerial vehicle client is (s, a, r, s′) from the experience pool;
wherein s represents the current state when the action is taken, i.e., the channel selected before this round of learning; a represents the action taken at that time, i.e., the channel selected according to the current state; r represents the feedback received after the action is taken, which is a positive value if the channel has no interference and a negative value if the channel has interference; and s′ represents the next state entered after the action is taken, i.e., the channel selected by a.
3. The special ad-hoc network frequency hopping anti-interference method for the unmanned aerial vehicle based on the federal learning as claimed in claim 1, wherein the unmanned aerial vehicle takes the channel condition of the previous time slot as the historical state, takes the channel selection of the previous time slot as the historical action, and takes whether information could be communicated as the corresponding feedback information.
4. The frequency hopping anti-jamming method for the ad hoc network dedicated for the unmanned aerial vehicle based on the federal learning as claimed in claim 1,
in S4, the ground server collects the local weights and performs optimization by using a federal mean algorithm to obtain an optimized result, where the formula of the federal mean algorithm is as follows:
ω_{m+1} = Σ_{k=1}^{K} (p_k / p) ω_{m+1}^k

wherein ω_{m+1}^k is the local weight from the k-th drone client, i.e., the loss coefficient θ; p_k / p means that the local weight of the k-th unmanned aerial vehicle client is weighted, the weighted values are summed to obtain the updated global weight, and the updated global weight is input into the global model of the ground server for training to obtain the updated global model.
5. The utility model provides a special ad hoc network frequency hopping anti jamming unit of unmanned aerial vehicle based on federal study which characterized in that includes:
the training module is used for training a frequency hopping anti-interference model based on a DQN algorithm in an unmanned aerial vehicle client, and randomly selecting a preset number of historical experiences from an experience pool as training data to be input into a DQN local model in each training round;
the updating module is used for updating the parameters of the DQN local model by inputting the training data to the neural network of the DQN local model, so that the DQN local model obtains a high-quality frequency hopping strategy;
the communication module is used for outputting Q values corresponding to different channels trained by the DQN local model as a sequence by the unmanned aerial vehicle according to the channel state of the current time slot, selecting the corresponding channels for communication according to a preset threshold value in the sequence by the frequency hopping strategy, and sending a loss coefficient theta in the DQN local model as a local weight to a ground server;
the optimization module is used for the ground server to collect the local weights and optimize them by adopting a federal averaging algorithm to obtain an optimized result, train the global model of the ground server with the optimized result, and retransmit the updated global weights to each unmanned aerial vehicle, the unmanned aerial vehicle client updating the loss coefficient θ according to the updated global weights so as to continue training the DQN frequency-hopping channel-selection anti-interference model;
the training module is further configured to:
after testing a channel at each time slot, saving historical states, historical actions and feedback information as historical information into an experience pool of the DQN, randomly selecting a preset number of the historical states, the historical actions and the feedback information from the experience pool as input, and inputting the input into the DQN local model;
the frequency hopping anti-interference model is based on the DQN algorithm, which combines a neural network with the Q-learning algorithm; an ε-greedy strategy is adopted to explore a plurality of channels, channels are selected according to the DQN local model, and a target Q-Network is used to reduce the loss function and promote convergence of the DQN local model;
the reducing loss functions and facilitating the DQN local model convergence using a target Q-Network, comprising:
the target Q-Network and the action Q-Network have the same structure, and every N training rounds, the parameters of the action Q-Network are assigned to the target Q-Network, wherein the calculation formula of the loss function during each replacement is as follows:
L(θ) = E[(r + γ max_{a′} Q(s′, a′; θ′) − Q(s, a; θ))²]

wherein Q(s, a; θ) is the Q value output by the action Q-Network under the model parameters of the current moment when the current state is taken as input; r is the feedback at the current time; γ is the attenuation coefficient; s′ is the state at the next time; a′ is the action at the next moment; θ′ is the model parameter after the last update; and max_{a′} Q(s′, a′; θ′) is the largest Q value in the sequence of Q values output by the target Q-Network under the last-updated model parameters when the next state s′ is taken as input;
the communication module is further configured to:
the DQN local model adopts a Bellman formula to update the Q value, and the formula is as follows:
Q(s_t, a_t) ← Q(s_t, a_t) + α[r_t + γ max_a Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]

wherein Q(s_t, a_t) is the Q value obtained by taking action a_t in the current state s_t; α is the learning rate; γ is the attenuation coefficient; r_t is the feedback at the current time t; and max_a Q(s_{t+1}, a_{t+1}) is the maximum Q value obtainable in the next state s_{t+1} after taking action a_t in the current state s_t;

the frequency hopping strategy of the unmanned aerial vehicle client selects, according to the maximum value in the output sequence of the DQN local model,

a = argmax_a Q(s, a; θ)

the corresponding channel a to communicate with the server, and sends the loss coefficient θ to the ground server as the local weight.
6. The special ad-hoc network frequency hopping anti-jamming device for unmanned aerial vehicle based on federal learning of claim 5, wherein:
in the training module, the input of the DQN local model in the unmanned aerial vehicle client is (s, a, r, s′) from the experience pool;
wherein s represents the current state when the action is taken, i.e., the channel selected before this round of learning; a represents the action taken at that time, i.e., the channel selected according to the current state; r represents the feedback received after the action is taken, which is a positive value if the channel has no interference and a negative value if the channel has interference; and s′ represents the next state entered after the action is taken, i.e., the channel selected by a.
CN202110976965.4A 2021-08-24 2021-08-24 Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning Active CN113890564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110976965.4A CN113890564B (en) 2021-08-24 2021-08-24 Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110976965.4A CN113890564B (en) 2021-08-24 2021-08-24 Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning

Publications (2)

Publication Number Publication Date
CN113890564A CN113890564A (en) 2022-01-04
CN113890564B (en) 2023-04-11

Family

ID=79011315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110976965.4A Active CN113890564B (en) 2021-08-24 2021-08-24 Special ad hoc network frequency hopping anti-interference method and device for unmanned aerial vehicle based on federal learning

Country Status (1)

Country Link
CN (1) CN113890564B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114548426B (en) * 2022-02-17 2023-11-24 北京百度网讯科技有限公司 Asynchronous federal learning method, business service prediction method, device and system
CN114509732B (en) * 2022-02-21 2023-05-09 四川大学 Deep reinforcement learning anti-interference method of frequency agile radar
CN115622616B (en) * 2022-12-09 2023-04-28 清华大学 Resource control method and device in federal learning model training process
CN117332878B (en) * 2023-10-31 2024-04-16 慧之安信息技术股份有限公司 Model training method and system based on ad hoc network system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109302262A (en) * 2018-09-27 2019-02-01 电子科技大学 A kind of communication anti-interference method determining Gradient Reinforcement Learning based on depth
CN114726743A (en) * 2022-03-04 2022-07-08 重庆邮电大学 Service function chain deployment method based on federal reinforcement learning

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11462209B2 (en) * 2018-05-18 2022-10-04 Baidu Usa Llc Spectrogram to waveform synthesis using convolutional networks
CN108777872B (en) * 2018-05-22 2020-01-24 中国人民解放军陆军工程大学 Intelligent anti-interference method and intelligent anti-interference system based on deep Q neural network anti-interference model
US20200153535A1 (en) * 2018-11-09 2020-05-14 Bluecom Systems and Consulting LLC Reinforcement learning based cognitive anti-jamming communications system and method
CN110213025A (en) * 2019-05-22 2019-09-06 浙江大学 Dedicated ad hoc network anti-interference method based on deeply study
CN110531617B (en) * 2019-07-30 2021-01-08 北京邮电大学 Multi-unmanned aerial vehicle 3D hovering position joint optimization method and device and unmanned aerial vehicle base station
US11473913B2 (en) * 2019-09-20 2022-10-18 Prince Sultan University System and method for service oriented cloud based management of internet of drones
WO2021158313A1 (en) * 2020-02-03 2021-08-12 Intel Corporation Systems and methods for distributed learning for wireless edge dynamics
CN111970072B (en) * 2020-07-01 2023-05-26 中国人民解放军陆军工程大学 Broadband anti-interference system and method based on deep reinforcement learning
CN112584347B (en) * 2020-09-28 2022-07-08 西南电子技术研究所(中国电子科技集团公司第十研究所) UAV heterogeneous network multi-dimensional resource dynamic management method
CN113239023A (en) * 2021-04-20 2021-08-10 浙江大学德清先进技术与产业研究院 Remote sensing data-oriented federal learning model training method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109302262A (en) * 2018-09-27 2019-02-01 电子科技大学 A kind of communication anti-interference method determining Gradient Reinforcement Learning based on depth
CN114726743A (en) * 2022-03-04 2022-07-08 重庆邮电大学 Service function chain deployment method based on federal reinforcement learning

Also Published As

Publication number Publication date
CN113890564A (en) 2022-01-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant