CN113573323A

CN113573323A - Knowledge gradient-based rapid selection method for optimal channel of unmanned aerial vehicle

Info

Publication number: CN113573323A
Application number: CN202110681497.8A
Authority: CN
Inventors: 杜丰; 林艳; 李骏
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2021-06-18
Filing date: 2021-06-18
Publication date: 2021-10-29

Abstract

The invention discloses a knowledge gradient-based method for rapidly selecting the optimal channel of an unmanned aerial vehicle. The method comprises the following steps: modeling the channel capacity of all channels as a lookup table belief model based on Bayes' theorem; initializing the belief model according to the past communication task experience of the unmanned aerial vehicle; calculating the knowledge gradient values of all channels according to the belief state about the channel capacity at the current moment, and selecting the channel with the maximum knowledge gradient as the channel for the current moment; communicating on the selected channel while monitoring the transmission rate, and updating the belief state of the channel capacity according to the monitored transmission rate; and repeating the above process until the time limit for channel selection, i.e. the budget, is exhausted. The method is suitable for rapid channel selection in highly dynamic unmanned aerial vehicle networks and effectively increases the speed of optimal channel selection.

Description

Knowledge gradient-based rapid selection method for optimal channel of unmanned aerial vehicle

Technical Field

The invention belongs to the technical field of unmanned aerial vehicle communication, and particularly relates to an unmanned aerial vehicle optimal channel rapid selection method based on knowledge gradient.

Background

With the large-scale development and application of unmanned aerial vehicle technology, the anti-interference problem in unmanned aerial vehicle communication has become increasingly severe. Interference during unmanned aerial vehicle communication comes not only from background noise but possibly also from jammers. A jammer transmits a signal of a certain interference strength on a channel to occupy channel resources; the unmanned aerial vehicle can respond with a strategy that selects channels that are not jammed or are only lightly jammed, so that it obtains a higher transmission rate on those channels.

At present, among the many anti-interference methods for unmanned aerial vehicles, frequency hopping is a common and easy-to-implement approach. However, few related algorithms have been proposed, and they are limited mainly by the particular adversarial environment the unmanned aerial vehicle faces. Based on the cognitive radio assumption, one approach models the channel selection process of the unmanned aerial vehicle as a Markov decision process (MDP) and maximizes the benefit of the unmanned aerial vehicle through a reinforcement learning algorithm (such as Q-learning), so that a channel with low interference and high benefit is selected in each time slot (Zhanyu. Research on anti-interference methods for unmanned aerial vehicle networks [D]. Beijing University of Posts and Telecommunications, 2019). Another approach likewise models the channel selection process as an MDP and selects the best channel by minimizing the perceived interference power through Q-learning (Huang Bang). Reinforcement learning copes well with jammers that follow fixed strategies, but when the interference power and the jammed channels are highly random, the reinforcement learning algorithm converges with difficulty and takes a long time, so it struggles in the adversarial environment of the unmanned aerial vehicle. In addition, there are methods that model unmanned aerial vehicle channel selection as a multi-armed bandit and estimate the best channel with a greedy algorithm or the UCB algorithm; although these statistics-based methods handle randomness better, they face the same problem of requiring a large amount of training to converge.

Disclosure of Invention

The invention aims to provide a method for quickly selecting an optimal channel of an unmanned aerial vehicle based on knowledge gradient, so that the speed of selecting the optimal channel of the unmanned aerial vehicle under the scene of random dynamic change of interference power is increased.

The technical solution for realizing the purpose of the invention is as follows: an unmanned aerial vehicle optimal channel rapid selection method based on knowledge gradient comprises the following steps:

step 1, modeling the channel capacity of all channels into a lookup table belief model based on Bayesian theorem;

step 2, initializing a lookup table belief model according to the unmanned aerial vehicle communication task experience;

step 3, calculating to obtain knowledge gradient values of all channels according to the belief state of the channel capacity at the current moment, and selecting the channel with the maximum knowledge gradient as a communication channel at the current moment;

step 4, the unmanned aerial vehicle communicates on the selected channel, simultaneously monitors the transmission rate, and updates the belief state of the channel capacity according to the transmission rate.

Further, the lookup table belief model in step 1 is based on Bayes' theorem, specifically as follows:

the lookup table belief model is used for modeling the channel capacity and consists of a belief mean and a belief variance of the channel capacity, together called the belief state; the belief state updated at the previous moment is the posterior distribution of the channel capacity and serves as the prior distribution in the calculation at the current moment.

Further, in step 2 the lookup table belief model is initialized, wherein the initial values of the belief state are aggregated from experience and comprise the belief mean of the channel capacity.
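The belief state described above can be pictured as a small table holding one mean and one variance per channel. The following is a minimal sketch under stated assumptions: the names `BeliefState` and `init_belief` and the numeric prior values are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class BeliefState:
    """Lookup-table belief: one (mean, variance) pair per channel."""
    mean: np.ndarray      # belief mean of each channel's capacity
    variance: np.ndarray  # belief variance of each channel's capacity


def init_belief(prior_means, prior_variance):
    """Build the initial belief from experience-based capacity estimates.

    prior_means    -- hypothetical per-channel capacity estimates from past tasks
    prior_variance -- hypothetical common belief variance given to every channel
    """
    mean = np.asarray(prior_means, dtype=float)
    variance = np.full(mean.shape, float(prior_variance))
    return BeliefState(mean=mean, variance=variance)


# Illustrative values only; the patent does not publish its prior.
belief = init_belief([1.2, 1.0, 0.9, 0.8, 0.7], prior_variance=4.0)
```

A large common variance expresses low confidence in the experience-based means, which leaves more room for early observations to move the belief.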

Further, in step 3, according to the belief state of the current time about the channel capacity, a knowledge gradient value about each channel is obtained by calculation, and a channel with the largest knowledge gradient is selected as the communication channel of the current time, specifically:

using a knowledge gradient algorithm based on a lookup table model, taking a belief state about channel capacity as prior distribution of the iteration, wherein the prior distribution obeys Gaussian distribution and is a two-dimensional table consisting of a belief mean and a belief variance;

and calculating to obtain knowledge gradient values of all channels in the current belief state according to a calculation formula of the knowledge gradient, selecting the channel with the maximum knowledge gradient value as a communication channel at the current moment, and monitoring the selected channel.

Further, a knowledge gradient algorithm based on a lookup table belief model is used for selecting a channel with minimum interference, and for a scene of variable power interference of a single jammer or multiple jammers on a communication channel, the method specifically comprises the following steps:

(1) lookup table belief model

Define $S^t = (\theta_n^t, \sigma_n^t)_{n=1}^{N}$ as the belief state over the channels $n \in \{1, 2, \ldots, N\}$; the action is to select one of the $N$ channels,

where $\theta_n^t$ is the channel capacity estimate (belief mean) for channel $n$ at time $t$, and $\sigma_n^t$ is the belief standard deviation for channel $n$ at time $t$. The belief state records the belief about the actual value $r_n = \mathbb{E}\,F(n, W)$ of the channel capacity, $W$ representing the observation of channel $n$. Assuming $V^t(S^t) = \max_{n'} \theta_{n'}^t$,

the knowledge gradient of channel $n$ at time $t$ is defined as:

$$\nu_n^{KG,t} = \mathbb{E}\left[ V^{t+1}\big(S^{t+1}(n)\big) - V^t(S^t) \mid S^t \right]$$

where $S^{t+1}(n)$ is the update of $S^t$ after channel $n^t = n$ is selected at time $t$;

after the action selection is made at each moment, the following reward, namely the observed value, is obtained:

$$W_n^{t+1} = r_n + \varepsilon_n^{t+1}$$

where $r_n$ is the actual channel capacity of channel $n$ and $\varepsilon_n^{t+1}$ is the observation noise;

define the belief precision $\beta_n^t = 1/(\sigma_n^t)^2$ and the noise precision $\beta_n^W = 1/(\sigma_n^W)^2$,

where $(\sigma_n^W)^2$ is the variance of the channel capacity of channel $n$ obtained from empirical and historical statistics, and $(\sigma_n^t)^2$ is the belief variance, i.e. the square of the belief standard deviation;

based on the above definitions and formulas, the belief mean and the belief precision of the channel $n$ selected at time $t$ are updated as follows:

$$\theta_n^{t+1} = \frac{\beta_n^t \theta_n^t + \beta_n^W W_n^{t+1}}{\beta_n^t + \beta_n^W}, \qquad \beta_n^{t+1} = \beta_n^t + \beta_n^W$$

the remaining channels keep the belief state of the previous moment;
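The update step can be sketched in a few lines. This is an illustrative sketch that assumes the standard conjugate Gaussian update with known noise variance (a precision-weighted average of the prior mean and the new observation); the function name `update_belief` and the numbers are hypothetical.

```python
def update_belief(mean, variance, noise_variance, channel, observation):
    """Return updated (mean, variance) lists after observing one channel.

    Only the selected channel changes; all other channels keep their
    previous belief, as the text above describes.
    """
    new_mean = list(mean)
    new_variance = list(variance)
    beta = 1.0 / variance[channel]            # belief precision
    beta_w = 1.0 / noise_variance[channel]    # noise precision
    # Precision-weighted average of the prior mean and the observation.
    new_mean[channel] = (beta * mean[channel] + beta_w * observation) / (beta + beta_w)
    # Precisions add, so the belief variance can only shrink.
    new_variance[channel] = 1.0 / (beta + beta_w)
    return new_mean, new_variance


# Illustrative numbers only: two channels, channel 0 is observed at rate 3.0.
mean, variance = update_belief([2.0, 2.5], [4.0, 4.0], [1.0, 1.0],
                               channel=0, observation=3.0)
```

Because the noise precision here is four times the belief precision, the new mean lands much closer to the observation than to the prior.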

(2) knowledge gradient algorithm

The knowledge gradient of each channel capacity in the scenario of multi-channel variable-power jamming is calculated by a knowledge gradient algorithm based on the lookup table belief model, wherein the size of the knowledge gradient represents the amount of information that can be obtained by selecting the corresponding channel; based on Bayesian theory and the assumption that the belief state follows a Gaussian distribution, the knowledge gradient algorithm calculates the amount of information obtainable from each channel, and the larger this amount, the greater the improvement in decision making after the belief state is updated;

denote the variance of the change in the belief mean caused by selecting channel $n$ at time $t$ as

$$\tilde{\sigma}_n^{2,t} = \mathrm{Var}\left[\theta_n^{t+1} - \theta_n^t \mid S^t\right] = (\sigma_n^t)^2 - (\sigma_n^{t+1})^2 = \frac{(\sigma_n^t)^2}{1 + (\sigma_n^W)^2 / (\sigma_n^t)^2}$$

where $(\sigma_n^W)^2$ indicates the observation error;

then calculate

$$\zeta_n^t = -\left| \frac{\theta_n^t - \max_{n' \neq n} \theta_{n'}^t}{\tilde{\sigma}_n^t} \right|$$

which is called the normalized influence of action $n$; it gives the distance, in standard deviations, between the channel capacity estimate of the current action $n$ and the best estimate among the other channels;

then calculate

f(ζ)＝ζΦ(ζ)+φ(ζ)

where Φ(ζ) and φ(ζ) represent the cumulative standard normal distribution function and the standard normal density function, respectively;

in summary, the knowledge gradient corresponding to channel $n$ at time $t$ is written as:

$$\nu_n^{KG,t} = \tilde{\sigma}_n^t \, f(\zeta_n^t)$$

considering the influence of the knowledge gradient of the current moment on the remaining moments, and weighing exploration against exploitation, the online knowledge gradient is finally used as the basis for channel selection:

$$\nu_n^{OLKG,t} = \theta_n^t + (T - t)\, \nu_n^{KG,t}$$

where $T$ is the budget. Given the current belief state $S^t$, define $\tilde{\sigma}_n^{2,t}$ as above, calculate from it the normalized influence $\zeta_n^t$ and the corresponding $f(\zeta_n^t)$, and finally obtain the online knowledge gradients $\nu_n^{OLKG,t}$ of all channels at time $t$.

The action is selected as

$$n^t = \arg\max_{n} \nu_n^{OLKG,t}$$

The actual communication rate of the selected channel is then observed, the belief state is updated according to the update equation of the lookup table belief model in (1), and channel selection for the next moment begins; each moment consumes one unit of budget, and the iteration of the algorithm stops when the budget is used up.
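The selection rule can be sketched as follows. This is a minimal illustration that assumes the standard knowledge gradient expressions for independent Gaussian beliefs from the cited Powell reference, and an online score of the form mean plus remaining budget times knowledge gradient; the function names and the argument `remaining` (the number of decisions left) are hypothetical.

```python
import math


def std_normal_pdf(z):
    """Standard normal density function, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Cumulative standard normal distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradients(mean, variance, noise_variance):
    """Knowledge gradient of every channel (needs at least two channels)."""
    kg = []
    for n in range(len(mean)):
        # Std. dev. of the change in the belief mean if channel n is measured once.
        sigma_tilde = math.sqrt(variance[n] / (1.0 + noise_variance[n] / variance[n]))
        # Best competing estimate among the other channels.
        best_other = max(m for i, m in enumerate(mean) if i != n)
        zeta = -abs(mean[n] - best_other) / sigma_tilde   # normalized influence
        f = zeta * std_normal_cdf(zeta) + std_normal_pdf(zeta)
        kg.append(sigma_tilde * f)
    return kg


def select_channel(mean, variance, noise_variance, remaining):
    """Pick the channel maximising the online score mean + remaining * KG."""
    kg = knowledge_gradients(mean, variance, noise_variance)
    scores = [m + remaining * g for m, g in zip(mean, kg)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With no budget left (`remaining=0`) the rule reduces to picking the highest belief mean; with a large remaining budget, an uncertain channel with a lower mean can win because measuring it is still informative.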

Compared with the prior art, the invention has the following notable advantages: (1) by adopting a decision-making approach based on Bayesian theory, it judges more rigorously how valuable the unmanned aerial vehicle's channel information is and whether that information needs to be collected again; unlike ordinary decision rules, which either fully trust or fully distrust a calculated result, it measures the data through the belief variance and thereby quantifies the degree of confidence; (2) by using the concept of the knowledge gradient, constructed on Bayesian theory, it quantifies the value of the information obtainable by selecting a channel, and it uses the online knowledge gradient, which balances the channel capacity mean against the value of channel information, as the reference for channel selection, so that the method converges faster.

The present invention is described in further detail below with reference to the attached drawing figures.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Fig. 2 is a schematic diagram of the unmanned aerial vehicle communication anti-interference transmission model.

Fig. 3 is a graphical representation of the accumulated interference power over 5 channels as a function of time.

Fig. 4 is a schematic diagram of channel selection for each slot drone.

Fig. 5 is a statistical diagram of the number of times each channel is selected by the drones.

Detailed Description

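Taking the uncertainty of jammers into account, the invention applies the knowledge gradient (Powell, W.B. "The Knowledge Gradient for Optimal Learning," Encyclopedia for Operations Research and Management Science, 2011, John Wiley and Sons) to the field of unmanned aerial vehicle communication, greatly reducing the convergence time. The invention provides a knowledge gradient-based method for rapidly selecting the optimal channel of an unmanned aerial vehicle, which rapidly learns the information of all channels through the update method of a lookup table belief model constructed on the Bayesian assumption and a knowledge gradient calculation formula based on the Gaussian assumption, and which, after few iterations, gives the channel selection with the minimum accumulated interference, namely the channel with the maximum accumulated channel capacity. The method specifically comprises the following steps:

step 1, modeling the channel capacity of all channels as a lookup table belief model based on Bayes' theorem;

step 2, initializing the belief model according to the communication task experience of the unmanned aerial vehicle;

step 3, calculating the knowledge gradient values of all channels according to the belief state of the channel capacity at the current moment, and selecting the channel with the maximum knowledge gradient as the communication channel at the current moment;

step 4, the unmanned aerial vehicle communicates on the selected channel, simultaneously monitors the transmission rate, and updates the belief state of the channel capacity according to the transmission rate.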

Further, the lookup table belief model in step 1 is based on Bayes' theorem, specifically as follows:

the lookup table belief model is used for modeling the channel capacity and consists of a belief mean and a belief variance of the channel capacity, together called the belief state; the belief state updated at the previous moment is the posterior distribution of the channel capacity and can serve as the prior distribution in the calculation at the current moment.

Further, the initial value of the belief state in step 2 is aggregated from past experience and mainly comprises the belief mean of the channel capacity.

Further, the belief state about the channel capacity in step 3 is used as the prior distribution of the iteration, and then the knowledge gradient values of all channels in the current belief state are calculated, specifically:

using a knowledge gradient algorithm based on a lookup table model, taking the belief state obtained in the step 2 or the step 4 as prior distribution, wherein the distribution obeys Gaussian distribution and is a two-dimensional table consisting of a belief mean and a belief variance;

and calculating to obtain the knowledge gradient value of each channel according to a calculation formula of the knowledge gradient, selecting the channel with the maximum value as the communication channel at the current moment, and monitoring the selected channel.

Further, aiming at the variable power interference scene of a single jammer or a plurality of jammers on a communication channel, a knowledge gradient algorithm based on a lookup table belief model is used for selecting the channel with the minimum interference, and the method specifically comprises the following steps:

(1) lookup table belief model

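Define $S^t = (\theta_n^t, \sigma_n^t)_{n=1}^{N}$ as the belief state over the channels $n \in \{1, 2, \ldots, N\}$; the action is to select one of the $N$ channels,

where $\theta_n^t$ is the channel capacity estimate (belief mean) for channel $n$ at time $t$, and $\sigma_n^t$ is the belief standard deviation for channel $n$ at time $t$. The belief state records the belief about the actual value $r_n = \mathbb{E}\,F(n, W)$ of the channel capacity, $W$ denoting the observation of channel $n$. Assuming $V^t(S^t) = \max_{n'} \theta_{n'}^t$,

the knowledge gradient of channel $n$ at time $t$ is defined as:

$$\nu_n^{KG,t} = \mathbb{E}\left[ V^{t+1}\big(S^{t+1}(n)\big) - V^t(S^t) \mid S^t \right]$$

where $S^{t+1}(n)$ is the update of $S^t$ after channel $n^t = n$ is selected at time $t$.

After the action selection is made at each moment, the observed value $W_n^{t+1} = r_n + \varepsilon_n^{t+1}$ is obtained, where $r_n$ is the actual channel capacity of channel $n$ and $\varepsilon_n^{t+1}$ is the observation noise.

Define the belief precision and the noise precision:

$$\beta_n^t = \frac{1}{(\sigma_n^t)^2}, \qquad \beta_n^W = \frac{1}{(\sigma_n^W)^2}$$

where $(\sigma_n^W)^2$ is the variance of the channel capacity of channel $n$ obtained from empirical and historical statistics, and $(\sigma_n^t)^2$ is the belief variance, i.e. the square of the belief standard deviation.

For the channel $n$ selected at time $t$, the belief mean and the belief precision are updated as follows:

$$\theta_n^{t+1} = \frac{\beta_n^t \theta_n^t + \beta_n^W W_n^{t+1}}{\beta_n^t + \beta_n^W}, \qquad \beta_n^{t+1} = \beta_n^t + \beta_n^W$$

The remaining channels keep the belief state of the previous moment.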
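As a purely illustrative check of the update rule, the following worked example uses made-up numbers that do not come from the patent: belief mean 2, belief variance 4, noise variance 1, and a monitored rate of 3. It assumes the standard precision-weighted Gaussian update, in which precision is the reciprocal of variance.

```latex
\begin{aligned}
\beta_n^t &= \tfrac{1}{4} = 0.25, \qquad \beta_n^W = \tfrac{1}{1} = 1, \\
\theta_n^{t+1} &= \frac{0.25 \cdot 2 + 1 \cdot 3}{0.25 + 1} = \frac{3.5}{1.25} = 2.8, \\
\beta_n^{t+1} &= 0.25 + 1 = 1.25, \qquad (\sigma_n^{t+1})^2 = \frac{1}{1.25} = 0.8 .
\end{aligned}
```

A single observation moves the mean most of the way toward the measurement and cuts the belief variance from 4 to 0.8, because the observation is four times as precise as the prior.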

(2) Knowledge gradient algorithm

The knowledge gradient of each channel capacity in the scenario of multi-channel variable-power jamming is calculated by a knowledge gradient algorithm based on the lookup table belief model, wherein the size of the knowledge gradient represents the amount of information that can be obtained by selecting the corresponding channel. Based on Bayesian theory and the assumption that the belief state follows a Gaussian distribution, the knowledge gradient algorithm calculates the amount of information obtainable from each channel; the larger this amount, the greater the improvement in decision making after the belief state is updated.

First, define the variance of the change in the belief mean caused by selecting channel $n$ at time $t$:

$$\tilde{\sigma}_n^{2,t} = \mathrm{Var}\left[\theta_n^{t+1} - \theta_n^t \mid S^t\right] = (\sigma_n^t)^2 - (\sigma_n^{t+1})^2 = \frac{(\sigma_n^t)^2}{1 + (\sigma_n^W)^2 / (\sigma_n^t)^2}$$

where $(\sigma_n^W)^2$ indicates the observation error.

Then calculate

$$\zeta_n^t = -\left| \frac{\theta_n^t - \max_{n' \neq n} \theta_{n'}^t}{\tilde{\sigma}_n^t} \right|$$

which is called the normalized influence of action $n$; it gives the distance, in standard deviations, between the channel capacity estimate of the current action $n$ and the best estimate among the other channels.

Then calculate

f(ζ)＝ζΦ(ζ)+φ(ζ)

where Φ(ζ) and φ(ζ) represent the cumulative standard normal distribution function and the standard normal density function, respectively.

In summary, the knowledge gradient corresponding to channel $n$ at time $t$ can be written as:

$$\nu_n^{KG,t} = \tilde{\sigma}_n^t \, f(\zeta_n^t)$$

Considering the influence of the knowledge gradient of the current moment on the remaining moments, the online knowledge gradient is used as the basis for channel selection:

$$\nu_n^{OLKG,t} = \theta_n^t + (T - t)\, \nu_n^{KG,t}$$

where $T$ is the budget. Given the current belief state $S^t$, define $\tilde{\sigma}_n^{2,t}$ as above, calculate from it the normalized influence $\zeta_n^t$ and the corresponding $f(\zeta_n^t)$, and obtain the online knowledge gradients of all channels at time $t$.

The action is selected as

$$n^t = \arg\max_{n} \nu_n^{OLKG,t}$$

The actual communication rate of the selected channel is then observed, the belief state is updated according to the update equation of the lookup table belief model in (1), and channel selection for the next moment begins; each moment consumes one unit of budget, and the iteration of the algorithm stops when the budget is used up.
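To make the quantities concrete, here is a worked example with made-up numbers that are not taken from the patent. It assumes the standard knowledge gradient expressions for independent Gaussian beliefs: a channel with belief mean 2 and belief variance 4, noise variance 1, a best competing belief mean that is also 2 (a tie), and 10 decisions remaining in the budget.

```latex
\begin{aligned}
\tilde{\sigma}_n^{2,t} &= \frac{4}{1 + 1/4} = 3.2, \qquad \tilde{\sigma}_n^{t} = \sqrt{3.2} \approx 1.789, \\
\zeta_n^t &= -\left|\frac{2 - 2}{1.789}\right| = 0, \qquad
f(0) = 0 \cdot \Phi(0) + \varphi(0) = \frac{1}{\sqrt{2\pi}} \approx 0.399, \\
\nu_n^{KG,t} &\approx 1.789 \times 0.399 \approx 0.714, \qquad
\nu_n^{OLKG,t} \approx 2 + 10 \times 0.714 \approx 9.14 .
\end{aligned}
```

The tie case gives the largest possible value of f for a given uncertainty: when a channel's estimate is level with the best alternative, one more measurement is most likely to change which channel looks best.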

Examples

One embodiment of the invention is described in detail below. The simulation is programmed in Python, and the parameter settings do not affect generality. The embodiment verifies the effectiveness and convergence of the proposed model and algorithm, and the scenario is as follows:

Since the algorithm is based on the assumption of short-duration tasks, the scene contains only a single unmanned aerial vehicle user; the unmanned aerial vehicle and the ground base station hold fixed relative positions while communicating, and 5 channels are available. In each time slot, a communication link is established between the airborne unmanned aerial vehicle and the ground base station; the flying height of the unmanned aerial vehicle is 80 m, the distance between the unmanned aerial vehicle and the base station is 100 m, and the transmitting power of the unmanned aerial vehicle is 0.4 W. Five malicious jammers simultaneously interfere with the normal transmission of the user; the positions of the jammers are also fixed, and each jammer is 100 m from the base station. The interference patterns are all multi-tone random interference, i.e. each single jammer interferes with all 5 channels, the jammers differing from one another in the distribution of their interference power. In addition, the interference powers of the jammers are independent of each other and follow different distributions: some jammers use interference powers that follow a Gaussian distribution, and others use a uniform distribution. The specific interference power settings are shown in Table 1:

TABLE 1

Jammer	Interference power mean / lower bound (per channel)	Interference power variance / upper bound (per channel)
Jammer 1 (Gauss)	[0.4,0.6,0.8,0.8,1.0]	[0.3,0.3,0.3,0.3,0.3]
Jammer 2 (Gauss)	[0.5,0.5,0.8,0.7,0.9]	[0.2,0.2,0.2,0.2,0.2]
Jammer 3 (Uniform)	[0.1,0.2,0.3,0.4,0.5]	[0.8,0.9,0.9,1.0,1.2]
Jammer 4 (Uniform)	[0.3,0.2,0.3,0.6,0.8]	[1.0,1.2,1.0,1.4,1.5]
Jammer 5 (Uniform)	[0.2,0.2,0.2,0.2,0.2]	[0.9,0.9,0.9,0.9,0.9]
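The jammer settings in Table 1 can be turned into a small sampler. This is a sketch under stated assumptions: the Gaussian rows are read as mean and variance and the uniform rows as lower and upper bound, negative Gaussian draws are clipped to zero (an assumption, since the patent does not say how they are handled), and the function name `sample_total_interference` is hypothetical. The table does not state units.

```python
import numpy as np

# Per-channel parameters transcribed from Table 1 (rows = jammers, columns = channels).
GAUSS_MEAN = np.array([[0.4, 0.6, 0.8, 0.8, 1.0],
                       [0.5, 0.5, 0.8, 0.7, 0.9]])
GAUSS_VAR = np.array([[0.3] * 5,
                      [0.2] * 5])
UNIF_LOW = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                     [0.3, 0.2, 0.3, 0.6, 0.8],
                     [0.2, 0.2, 0.2, 0.2, 0.2]])
UNIF_HIGH = np.array([[0.8, 0.9, 0.9, 1.0, 1.2],
                      [1.0, 1.2, 1.0, 1.4, 1.5],
                      [0.9, 0.9, 0.9, 0.9, 0.9]])


def sample_total_interference(rng):
    """Draw one time slot of summed interference power on each of the 5 channels."""
    gauss = rng.normal(GAUSS_MEAN, np.sqrt(GAUSS_VAR))
    # Assumption: a power cannot be negative, so negative draws are clipped to zero.
    gauss = np.clip(gauss, 0.0, None)
    unif = rng.uniform(UNIF_LOW, UNIF_HIGH)
    # The five jammers act independently; their powers add on each channel.
    return gauss.sum(axis=0) + unif.sum(axis=0)


rng = np.random.default_rng(0)
slot = sample_total_interference(rng)   # array of 5 per-channel totals
```

Averaged over many slots, the first channel receives the least total interference under these settings, which matches the role of channel C0 in the results discussion.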

With reference to Figs. 1-2, the specific process of the knowledge gradient-based unmanned aerial vehicle channel selection algorithm provided by the invention is as follows:

Step 1: Initialization: initialize the iteration time t = 0; initialize the belief model by estimating the channel capacity mean from past experience and setting the same belief variance for every channel; initialize the other fixed system parameters.

Step 2: Iterate repeatedly, each iteration being divided into the following substeps:

1) calculate and store the online knowledge gradients of all channels in turn, the knowledge gradient being calculated as follows:

① calculate the variance of the change in the belief mean

$$\tilde{\sigma}_n^{2,t} = (\sigma_n^t)^2 - (\sigma_n^{t+1})^2 = \frac{(\sigma_n^t)^2}{1 + (\sigma_n^W)^2 / (\sigma_n^t)^2}$$

② calculate the normalized influence of channel $n$

$$\zeta_n^t = -\left| \frac{\theta_n^t - \max_{n' \neq n} \theta_{n'}^t}{\tilde{\sigma}_n^t} \right|$$

③ calculate f(ζ)＝ζΦ(ζ)+φ(ζ), where Φ(ζ) and φ(ζ) represent the cumulative standard normal distribution function and the standard normal density function, respectively;

④ calculate the knowledge gradient

$$\nu_n^{KG,t} = \tilde{\sigma}_n^t \, f(\zeta_n^t)$$

⑤ calculate the online knowledge gradient

$$\nu_n^{OLKG,t} = \theta_n^t + (T - t)\, \nu_n^{KG,t}$$

2) select the communication channel according to the online knowledge gradients of the channels at the current time $t$, i.e. $n^t = \arg\max_n \nu_n^{OLKG,t}$;

3) after the channel is selected, the unmanned aerial vehicle monitors the actual transmission rate $W_n^{t+1}$ of the channel, and the belief state is updated by the following formulas

$$\theta_n^{t+1} = \frac{\beta_n^t \theta_n^t + \beta_n^W W_n^{t+1}}{\beta_n^t + \beta_n^W}, \qquad \beta_n^{t+1} = \beta_n^t + \beta_n^W$$

where the belief precision and the noise precision are as follows:

$$\beta_n^t = \frac{1}{(\sigma_n^t)^2}, \qquad \beta_n^W = \frac{1}{(\sigma_n^W)^2}$$
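Steps 1 and 2 can be combined into one loop. The sketch below is not the patent's simulation code: it assumes the standard knowledge gradient formulas for independent Gaussian beliefs, draws the monitored rate from a Gaussian around a hypothetical true capacity instead of modelling the jammers, and uses illustrative names (`online_kg_scores`, `run`) and parameter values throughout.

```python
import math

import numpy as np


def online_kg_scores(mean, var, noise_var, remaining):
    """Online knowledge gradient score of every channel (substeps 1-5)."""
    sigma_tilde = np.sqrt(var / (1.0 + noise_var / var))
    scores = np.empty_like(mean)
    for n in range(len(mean)):
        best_other = np.max(np.delete(mean, n))
        zeta = -abs(mean[n] - best_other) / sigma_tilde[n]
        cdf = 0.5 * (1.0 + math.erf(zeta / math.sqrt(2.0)))
        pdf = math.exp(-0.5 * zeta * zeta) / math.sqrt(2.0 * math.pi)
        scores[n] = mean[n] + remaining * sigma_tilde[n] * (zeta * cdf + pdf)
    return scores


def run(true_capacity, noise_var, prior_mean, prior_var, budget, seed=0):
    """Select a channel, observe its rate, update the belief; repeat until the budget is spent."""
    rng = np.random.default_rng(seed)
    mean = np.array(prior_mean, dtype=float)
    var = np.full(len(mean), float(prior_var))   # same belief variance for every channel
    noise_var = np.asarray(noise_var, dtype=float)
    choices = []
    for t in range(budget):
        n = int(np.argmax(online_kg_scores(mean, var, noise_var, budget - t)))
        # Assumption: the monitored rate is the true capacity plus Gaussian noise.
        w = rng.normal(true_capacity[n], math.sqrt(noise_var[n]))
        beta, beta_w = 1.0 / var[n], 1.0 / noise_var[n]
        mean[n] = (beta * mean[n] + beta_w * w) / (beta + beta_w)
        var[n] = 1.0 / (beta + beta_w)
        choices.append(n)
    return choices, mean, var


choices, mean, var = run(
    true_capacity=[1.5, 1.2, 1.0, 0.9, 0.7],   # hypothetical values, not from the patent
    noise_var=[0.25] * 5, prior_mean=[1.0] * 5, prior_var=1.0, budget=100)
```

Counting how often each channel appears in `choices` gives the kind of selection histogram that Fig. 5 is described as showing; the exact counts depend on the seed and on the assumed parameters.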

As can be seen from Fig. 3, the accumulated interference power on each channel increases with time. Channel C0 is the channel with the least accumulated interference and, conversely, the largest channel capacity. Fig. 4 shows, for each time slot, whether the unmanned aerial vehicle selects the channel with the smallest accumulated interference. Figs. 3 and 4 together show that after a number of iterations the channel selection of the unmanned aerial vehicle stabilizes, i.e. the algorithm converges. Convergence is reached within 100 iterations, which is faster than algorithms based on historical statistics such as UCB or ε-greedy. As can be seen from Fig. 5, while learning the channel capacity and selecting a communication channel, the unmanned aerial vehicle tends to select the channel with the minimum actual interference mean, which verifies the validity of the algorithm.

In conclusion, the knowledge gradient-based rapid selection method for the optimal channel of the unmanned aerial vehicle makes full use of the experience about channel capacity gained in historical communication and, by applying Bayesian theory, converges quickly, learns which channel has the maximum mean channel capacity, and then keeps selecting that channel as the actual communication channel. The method is therefore well suited to unmanned aerial vehicle channel selection tasks in time-limited environments.

Claims

1. A method for quickly selecting an optimal channel of an unmanned aerial vehicle based on knowledge gradient is characterized by comprising the following steps:

2. The method for quickly selecting the optimal channel of the unmanned aerial vehicle based on the knowledge gradient according to claim 1, wherein the lookup table belief model in the step 1 is based on Bayesian theorem and specifically comprises the following steps:

3. The knowledge gradient-based unmanned aerial vehicle optimal channel rapid selection method according to claim 1, wherein the initialization lookup table belief model in step 2 is obtained by empirically aggregating initial values of belief states, including belief mean values of channel capacity.

4. The method for quickly selecting the optimal channel of the unmanned aerial vehicle based on the knowledge gradient as claimed in claim 1,2 or 3, wherein in step 3, the knowledge gradient value of each channel is calculated according to the belief state about the channel capacity at the current moment, and the channel with the largest knowledge gradient is selected as the communication channel at the current moment, specifically:

5. The method for quickly selecting the optimal channel of the unmanned aerial vehicle based on the knowledge gradient as claimed in claim 4, wherein a knowledge gradient algorithm based on a lookup table belief model is used to select the channel with the minimum interference, and for a scene of variable power interference of a single jammer or multiple jammers on a communication channel, the method specifically comprises the following steps:

(1) lookup table belief model