CN111600676A

CN111600676A - Q value table determining method, anti-interference method, device and equipment

Info

Publication number: CN111600676A
Application number: CN202010506845.3A
Authority: CN
Inventors: 李瀚�; 姜化京; 姜维
Original assignee: Shanghai Terjin Wireless Technology Co ltd
Current assignee: Shanghai Terjin Wireless Technology Co ltd
Priority date: 2020-06-05
Filing date: 2020-06-05
Publication date: 2020-08-28

Abstract

The invention provides a method for determining an unmanned aerial vehicle communication Q value table, and an unmanned aerial vehicle communication anti-interference method, device and equipment. The Q value table determining method comprises the following steps: determining an interfered state set, an anti-interference strategy set and a reward function; determining an initial Q value table according to the interfered state set and the anti-interference strategy set; and, when the learning unmanned aerial vehicle is controlled to execute each anti-interference strategy, updating the Q values in the Q value table by Q learning, where each updated Q value is associated with the anti-interference strategy adopted by the learning unmanned aerial vehicle, the power information and channel information of the received interference signal, and the reward function. Different interfered state information represents different power intervals in which the power of the interference signal received by the learning unmanned aerial vehicle lies and/or different channels of that signal. The state therefore reflects the influence of the external environment on unmanned aerial vehicle communication more comprehensively, and the anti-interference strategy determined from it better matches actual flight conditions.

Description

Q value table determining method, anti-interference method, device and equipment

Technical Field

The invention relates to the field of unmanned aerial vehicles, and in particular to a method for determining an unmanned aerial vehicle communication Q value table, and to an unmanned aerial vehicle communication anti-interference method, device and equipment.

Background

With the continuous development of unmanned aerial vehicle technology, more and more consumer-grade unmanned aerial vehicles are used in everyday life.

During flight, malicious interference poses a serious threat to the communication security of an unmanned aerial vehicle. To counter this threat, the prior art senses the channel of the interference signal when the unmanned aerial vehicle's communication is interfered with, and then switches the unmanned aerial vehicle's communication channel to a channel different from that of the interference signal, so that the interference signal no longer disturbs the communication.

The prior art therefore acquires only the channel of the interference signal in the external environment, which cannot comprehensively reflect the influence of the external environment on unmanned aerial vehicle communication. In addition, the existing techniques resist interference only by switching the communication channel, so the anti-interference strategy is limited to a single option and is inflexible.

Disclosure of Invention

The invention provides a method for determining an unmanned aerial vehicle communication Q value table, and an unmanned aerial vehicle communication anti-interference method, device and equipment, to solve the problems that the existing unmanned aerial vehicle anti-interference technology cannot comprehensively reflect the influence of the external environment on unmanned aerial vehicle communication, and that its anti-interference strategy is limited to a single option and is inflexible.

According to a first aspect of the present invention, there is provided a method for determining a communication Q-value table of an unmanned aerial vehicle, which is applied to a control end, and includes:

determining an interfered state set, an anti-interference strategy set and a reward function;

determining an initial Q value table according to the interfered state set and the anti-interference strategy set;

when the unmanned aerial vehicle for learning is controlled to execute each anti-interference strategy, updating the Q value in the Q value table in a Q learning mode; the updated Q value is associated with an anti-interference policy employed by the learning drone, power information and channel information of the received interfering signal, and the reward function;

the Q value table records the Q values corresponding to executing different anti-interference strategies under each type of interfered state information; the interfered state set is a set of predefined interfered state information; the anti-interference strategy set is the set of anti-interference strategies executable by the learning unmanned aerial vehicle; the reward function characterizes the functional relationship among the anti-interference strategy executed by the learning unmanned aerial vehicle, the interfered state information before that strategy is executed, and the reward value; different interfered state information represents different power intervals in which the power of the interference signal received by the learning unmanned aerial vehicle lies and/or different channels of that signal; and executing an anti-interference strategy adjusts the communication channel and/or the communication bandwidth of the learning unmanned aerial vehicle.

Optionally, updating the Q-value table in a Q-learning manner includes:

acquiring first interfered state information sensed by the unmanned aerial vehicle for learning at any Kth time slot;

determining a first anti-interference strategy of the learning unmanned aerial vehicle at a Kth time slot;

determining a first reward value obtained by the learning unmanned aerial vehicle executing the first anti-interference strategy under the first interfered state information according to the reward function;

acquiring second interfered state information perceived by the learning unmanned aerial vehicle at the (K+1)th time slot;

and updating the Q value table according to the Q value table corresponding to the Kth time slot, the first interfered state information, the first anti-interference strategy, the second interfered state information and the first reward value, to obtain the Q value table corresponding to the (K+1)th time slot.

Optionally, determining the first anti-interference strategy of the learning unmanned aerial vehicle at the Kth time slot includes:

determining an anti-interference strategy subset corresponding to the first interfered state information according to a Q value table corresponding to the Kth time slot;

and determining the anti-interference strategy with the maximum Q value in the anti-interference strategy subset as the first anti-interference strategy.

Optionally, determining the first anti-interference strategy of the learning unmanned aerial vehicle at the Kth time slot includes:

acquiring third interfered state information perceived by the learning unmanned aerial vehicle at the (K-1)th time slot;

determining an anti-interference strategy subset corresponding to the third interfered state information according to the Q value table corresponding to the (K-1)th time slot;

and determining the first anti-interference strategy through a Boltzmann random strategy according to the Q value corresponding to each anti-interference strategy in the anti-interference strategy subset.

Optionally, before determining, according to the reward function, a first reward value obtained by the learning drone executing the first anti-interference policy under the first interfered state information, the method further includes:

determining a signal-to-noise ratio of a communication signal received by the learning drone after the first anti-jamming policy is executed by the learning drone under the first interfered state information;

if the signal-to-noise ratio is smaller than or equal to a preset signal demodulation threshold of the learning unmanned aerial vehicle, then: determining the function value of the reward function as a preset reward value;

if the signal-to-noise ratio is greater than the preset signal demodulation threshold, then: determining the reward function according to the signal throughput and the channel switching overhead; the signal throughput characterizes an ability of the learning drone to receive the communication signal, and the channel switching overhead characterizes a loss incurred by the learning drone to switch communication channels.
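The two-branch reward rule above can be sketched as follows. This is only an illustrative sketch: the text does not state how throughput and switching overhead are combined, so the Shannon-capacity throughput model, the subtraction of a fixed switching cost, and every name and default value (`reward`, `preset_reward`, `switch_cost`) are assumptions made for the example.

```python
import math

def reward(snr_linear, demod_threshold, bandwidth_hz, switched_channel,
           preset_reward=0.0, switch_cost=1.0e6):
    """Hypothetical reward following the two branches described above.

    Assumptions (not from the source): throughput is modelled as
    bandwidth * log2(1 + SNR), and switching channels subtracts a
    fixed cost expressed in the same units as the throughput.
    """
    # Branch 1: the signal cannot be demodulated, use the preset reward.
    if snr_linear <= demod_threshold:
        return preset_reward
    # Branch 2: reward depends on throughput and channel switching overhead.
    throughput = bandwidth_hz * math.log2(1.0 + snr_linear)
    overhead = switch_cost if switched_channel else 0.0
    return throughput - overhead
```

With these assumed defaults, a strategy that leaves the signal-to-noise ratio at or below the threshold always yields the preset value, so the learner is pushed away from strategies that leave the link undecodable.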

Optionally, the signal-to-noise ratio is determined according to the following formula:

wherein:

t represents a time slot;

represents the signal-to-noise ratio at the t-th time slot;

a_t represents the communication channel of the learning drone at the t-th time slot;

c_t represents the channel of the interference signal at the t-th time slot;

indicating a bandwidth of a communication channel of the learning drone at the t-th time slot;

indicating the bandwidth of the interference signal at the t-th time slot;

P₁(t) represents the communication signal power received by the learning drone at the t-th time slot;

P₂(t) represents the power of the interference signal received by the learning drone at the t-th time slot.

According to a second aspect of the invention, an unmanned aerial vehicle communication anti-interference method is provided, which includes:

determining interfered state information of the unmanned aerial vehicle at the current moment, wherein different interfered state information represents different power intervals and/or different channels where the power of an interference signal received by the corresponding unmanned aerial vehicle is located;

determining an anti-interference strategy to be executed by the unmanned aerial vehicle at the current moment according to the interfered state information at the current moment and a Q value table determined according to the method related to the first aspect and the optional schemes thereof;

and executing the anti-interference strategy required to be executed at the current moment so as to adjust the communication channel and/or the communication bandwidth of the unmanned aerial vehicle.

According to a third aspect of the present invention, there is provided an unmanned aerial vehicle communication Q-value table determination apparatus, including:

a first module for determining an interfered state set, an anti-interference policy set, and a reward function;

a second module, configured to determine an initial Q-value table according to the interfered state set and the anti-interference policy set;

the third module is used for updating the Q value in the Q value table in a Q learning mode when the unmanned aerial vehicle for learning is controlled to execute each anti-interference strategy; the updated Q value is associated with an anti-interference policy employed by the learning drone, power information and channel information of the received interfering signal, and the reward function;

Optionally, the third module includes:

the first information acquisition unit is used for acquiring first interfered state information perceived by the learning unmanned aerial vehicle at any Kth time slot;

the first strategy determining unit is used for determining a first anti-interference strategy of the learning unmanned aerial vehicle at a Kth time slot;

the first calculation unit is used for determining a first reward value obtained by the learning unmanned aerial vehicle executing the first anti-interference strategy under the first interfered state information according to the reward function;

the second information acquisition unit is used for acquiring second interfered state information perceived by the learning unmanned aerial vehicle at the (K+1)th time slot;

and the second calculation unit is used for updating the Q value table according to the Q value table corresponding to the Kth time slot, the first interfered state information, the first anti-interference strategy, the second interfered state information and the first reward value, to obtain the Q value table corresponding to the (K+1)th time slot.

Optionally, the first policy determining unit includes:

a first policy subset determining subunit, configured to determine, according to the Q value table corresponding to the Kth time slot, an anti-interference strategy subset corresponding to the first interfered state information;

and the first strategy selection subunit is used for determining the anti-interference strategy with the maximum Q value in the anti-interference strategy subset as the first anti-interference strategy.

Optionally, the first policy determining unit includes:

the information acquisition subunit is used for acquiring third interfered state information perceived by the learning unmanned aerial vehicle at the (K-1)th time slot;

a second policy subset determining subunit, configured to determine, according to the Q value table corresponding to the (K-1)th time slot, an anti-interference strategy subset corresponding to the third interfered state information;

and the second strategy selection subunit is used for determining the first anti-interference strategy through a Boltzmann random strategy according to the Q value corresponding to each anti-interference strategy in the anti-interference strategy subset.

Optionally, the apparatus further comprises:

a third calculating unit, configured to determine a signal-to-noise ratio of a communication signal received by the learning unmanned aerial vehicle after the learning unmanned aerial vehicle executes the first anti-jamming policy under the first interfered state information;

According to a fourth aspect of the present invention, there is provided an electronic device comprising a processor and a memory, the memory for storing code and associated data;

the processor is adapted to execute the code in the memory to implement the method according to the first aspect of the invention and its alternatives.

According to a fifth aspect of the present invention, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, carries out the method according to the first aspect of the present invention and its alternatives.

The invention provides a method for determining an unmanned aerial vehicle communication Q value table, and an unmanned aerial vehicle communication anti-interference method, device and equipment. The determined Q value table contains the Q values corresponding to executing different anti-interference strategies under each type of interfered state information, and different interfered state information represents different power intervals in which the power of the interference signal received by the learning unmanned aerial vehicle lies and/or different channels of that signal. Compared with the prior art, the influence of the external environment on unmanned aerial vehicle communication is therefore reflected more comprehensively, and the determined anti-interference strategy better matches actual flight conditions. In addition, because executing an anti-interference strategy adjusts the communication channel and/or the communication bandwidth of the unmanned aerial vehicle, the choice of anti-interference strategy is more flexible.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a diagram of a learning scenario for determining a Q-value table in accordance with an embodiment of the present invention;

FIG. 2 is a first flowchart of a method for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention;

FIG. 3 is a second flowchart of a method for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention;

FIG. 4 is a third flowchart of a method for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention;

FIG. 5 is a fourth flowchart of a method for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention;

FIG. 6 is a fifth flowchart of a method for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention;

FIG. 7a is a schematic diagram of the case where the gain factor of the interference is 0;

FIG. 7b is a schematic diagram of the case where the gain factor of the interference is 1;

FIG. 7c is a schematic diagram of the case where the gain factor of the interference is [expression missing in source];

FIG. 8 is a schematic diagram of a scenario in which an unmanned aerial vehicle flies in an embodiment of the present invention;

FIG. 9 is a flowchart of an unmanned aerial vehicle communication anti-interference method according to an embodiment of the present invention;

FIG. 10 is a schematic block diagram of an unmanned aerial vehicle communication anti-interference device according to an embodiment of the present invention;

FIG. 11 is a first block diagram of a device for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention;

FIG. 12 is a second block diagram of a device for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention;

FIG. 13 is a third block diagram of a device for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention;

FIG. 14 is a fourth block diagram of a device for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention;

FIG. 15 is a fifth block diagram of a device for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention;

FIG. 16 is a schematic structural diagram of an electronic device in an embodiment of the present invention.

Description of reference numerals:

11-a learning drone;

12-unmanned aerial vehicle for communication;

13-interfering drone;

14-a control terminal;

21-unmanned aerial vehicle;

22-a source of interference;

23-a communication station;

31-a state acquisition module;

32-a policy determination module;

33-a policy enforcement module;

41-a first module;

42-a second module;

43-a third module;

431-a first information acquisition unit;

432-a first policy determination unit;

4321-first policy subset determination subunit;

4322-first policy selection subunit;

4323-information acquisition subunit;

4324-second policy subset determination subunit;

4325-second policy selection subunit;

433-a first calculation unit;

434-a second information acquisition unit;

435-a second calculation unit;

436-a third calculation unit;

51-a processor;

52-a bus;

53-memory.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

For convenience of description, the following partial terms related to the embodiments of the present application will be briefly described:

the reward function: the reward function is used in this application to characterize the direct feedback (i.e., the reward value determined based on the reward function) that the drone obtains after performing the anti-jamming policy. Further, the reward function is used for representing different anti-interference strategies executed by the unmanned aerial vehicle, interfered state information before the corresponding anti-interference strategies are executed, and a functional relation between reward values.

Q value: the Q value is determined by a Q (quality) function. In this application, the Q value represents the feedback, determined by the Q function, for the unmanned aerial vehicle executing a given anti-interference strategy under given interfered state information. Among the anti-interference strategies corresponding to each interfered state, the strategy with the maximum Q value is the optimal anti-interference strategy for that state.

Q value table: the Q value table may be understood as the set of Q values corresponding to each anti-interference strategy executed under each type of interfered state information.

FIG. 1 is a diagram of a learning scenario for determining a Q-value table according to an embodiment of the present invention.

Referring to FIG. 1, the devices involved in the scenario may include a learning drone 11, a communication drone 12, an interfering drone 13, and a control terminal 14.

It should be noted that, in the embodiment of the present invention, the Q value table is determined by reinforcement learning. The learning drone 11 operates in a dynamic, time-varying interference environment, and the embodiment models the anti-interference problem in this environment as a Markov decision process, which provides theoretical support for designing the anti-interference algorithm.

Referring to FIG. 1, the learning drone 11 and the communication drone 12 may communicate in real time, and the interfering drone 13 may emit an interference signal that affects the communication between them. The learning drone 11 may receive control commands from the control terminal 14, and these may include communication anti-interference commands. The control terminal 14 may be external to the learning drone 11 or integrated inside it.

Fig. 2 is a first flowchart of a method for determining a communication Q-value table of a drone according to an embodiment of the present invention.

Referring to FIG. 2, in an embodiment, a method for determining an unmanned aerial vehicle communication Q value table, applied to the control terminal 14, includes:

S11: determining an interfered state set, an anti-interference strategy set and a reward function;

S12: determining an initial Q value table according to the interfered state set and the anti-interference strategy set;

S13: when the learning drone 11 is controlled to execute each anti-interference strategy, updating the Q values in the Q value table by Q learning; each updated Q value is associated with the anti-interference strategy adopted by the learning drone 11, the power information and channel information of the received interference signal, and the reward function;

the Q value table records the Q values corresponding to executing different anti-interference strategies under each type of interfered state information; the interfered state set is a set of predefined interfered state information; the anti-interference strategy set is the set of anti-interference strategies executable by the learning drone 11; the reward function characterizes the functional relationship among the anti-interference strategy executed by the learning drone 11, the interfered state information before that strategy is executed, and the reward value; different interfered state information represents different power intervals in which the power of the interference signal received by the learning drone 11 lies and/or different channels of that signal; and executing an anti-interference strategy adjusts the communication channel and/or the communication bandwidth of the learning drone 11.

In the method for determining the unmanned aerial vehicle communication Q value table in the embodiment of the invention, the determined Q value table contains the Q values corresponding to executing different anti-interference strategies under each type of interfered state information, and different interfered state information represents different power intervals in which the power of the interference signal received by the learning drone 11 lies and/or different channels of that signal. Compared with the prior art, the influence of the external environment on unmanned aerial vehicle communication is therefore reflected more comprehensively, and the determined anti-interference strategy better matches actual flight conditions. In addition, because executing an anti-interference strategy adjusts the communication channel and/or the communication bandwidth of the unmanned aerial vehicle, the choice of anti-interference strategy is more flexible.

In the embodiment of the present invention, a power interval may be understood as one of the intervals obtained by dividing the power range at set boundaries. For example, the power of the interference signal received by the learning drone 11 is divided into L consecutive, non-overlapping ranges, which correspond in order to L received power intervals (these may also be understood as L power levels), with boundaries:

$h_1 < h_2 < \dots < h_l < h_{l+1} < \dots < h_{L+1}$;

When $P_1(t) \in [h_l, h_{l+1})$, the received power interval is $l$, where $h_l$ is the lower boundary and $h_{l+1}$ is the upper boundary of that power interval, $1 \le l \le L$, and $P_1(t)$ is the power of the interference signal received by the learning drone 11 in the t-th time slot.
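The mapping from a measured interference power to its interval index l can be sketched with a binary search over the boundaries. The function name `power_interval` and the example boundary values in dBm are hypothetical and chosen only for illustration; the half-open convention [h_l, h_{l+1}) follows the text above.

```python
from bisect import bisect_right

def power_interval(power, boundaries):
    """Return the 1-based index l with boundaries[l-1] <= power < boundaries[l].

    `boundaries` is the strictly increasing list [h_1, ..., h_{L+1}],
    so index l corresponds to the half-open interval [h_l, h_{l+1}).
    """
    if not boundaries[0] <= power < boundaries[-1]:
        raise ValueError("power lies outside [h_1, h_{L+1})")
    # bisect_right counts how many boundaries are <= power, which is l.
    return bisect_right(boundaries, power)

# Hypothetical boundaries in dBm giving L = 3 power intervals.
H = [-90.0, -70.0, -50.0, -30.0]
level = power_interval(-60.0, H)  # falls in [h_2, h_3)
```

A power exactly equal to a boundary h_l is assigned to interval l, matching the closed lower bound in the definition.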

In embodiments of the present invention, the predefined interfered state set may be characterized as:

$S = \{s_1(l_1, c_1), s_2(l_2, c_2), \dots, s_X(l_X, c_X)\}$;

where:

the size of S (i.e., the number of pieces of interfered state information included in S) is $X = L \times N$, where N is the number of channels of the interference signal;

$s_X$ represents the Xth interfered state information;

$l_X$ represents the power interval corresponding to the Xth interfered state information;

$c_X$ represents the channel corresponding to the Xth interfered state information.
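As a minimal sketch of the interfered state set, the X = L × N states can be enumerated as (power interval, channel) pairs and given row indices for later use in a Q value table. The function name `build_state_set`, the 1-based numbering, and the sizes L = 3 and N = 4 are assumptions for illustration.

```python
from itertools import product

def build_state_set(num_power_intervals, num_interference_channels):
    """Enumerate states s_x = (l_x, c_x) with 1-based interval and channel."""
    return list(product(range(1, num_power_intervals + 1),
                        range(1, num_interference_channels + 1)))

S = build_state_set(3, 4)  # hypothetical L = 3, N = 4
# Map each state to the row it would occupy in a Q value table.
state_index = {state: row for row, state in enumerate(S)}
```

Keeping an explicit `state_index` dictionary avoids hand-written index arithmetic when a sensed (interval, channel) pair has to be looked up.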

In the embodiment of the present invention, an anti-interference strategy may be a combination of the channel and the bandwidth of the communication signal of the learning drone 11. The anti-interference strategy set is the set of all possible anti-interference strategies.

The anti-interference strategy set may be characterized as:

$G = \{g_1(B_1, a_1), g_2(B_2, a_2), \dots, g_Y(B_Y, a_Y)\}$;

where:

the size of G (i.e., the number of anti-interference strategies included in G) is $Y = B \times M$, where B is the number of adjustable bandwidths of the learning drone 11 and M is the number of communication channels of the learning drone 11;

$g_Y$ represents the Yth anti-interference strategy;

$B_Y$ represents the bandwidth of the communication signal of the learning drone 11 corresponding to the Yth anti-interference strategy;

$a_Y$ represents the channel of the communication signal of the learning drone 11 corresponding to the Yth anti-interference strategy.

In other embodiments, the anti-interference strategy may instead characterize the action by which the learning drone 11 adjusts the channel and bandwidth of its current communication signal. For example, the anti-interference strategy $g_1$ can be characterized as:

$g_1 = g((B_1 \to B_2), (a_1 \to a_2))$; the anti-interference strategy $g_1$ indicates that the learning drone 11 needs to adjust the current bandwidth $B_1$ to bandwidth $B_2$ and the current channel $a_1$ to channel $a_2$. In this case, the size of G is $Y = B^2 \times M^2$.
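Both forms of the anti-interference strategy set described above can be enumerated in a few lines, which also confirms the two set sizes. The function names and the example values (two bandwidths in MHz, three channels) are hypothetical.

```python
from itertools import product

def combination_strategies(bandwidths, channels):
    """First form: each strategy is a target (bandwidth, channel) pair."""
    return list(product(bandwidths, channels))

def transition_strategies(bandwidths, channels):
    """Second form: each strategy is ((B_from, B_to), (a_from, a_to))."""
    return list(product(product(bandwidths, repeat=2),
                        product(channels, repeat=2)))

bandwidths_mhz = [5, 10]  # hypothetical, B = 2
channels = [1, 2, 3]      # hypothetical, M = 3
G = combination_strategies(bandwidths_mhz, channels)            # B * M entries
G_transitions = transition_strategies(bandwidths_mhz, channels)  # B^2 * M^2 entries
```

The transition form grows quadratically in both B and M, so the corresponding Q value table has many more columns to learn than the combination form.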

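The initial Q value table may be an all-zero matrix of size X × Y, which may be specifically characterized as:

$Q = \begin{bmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{bmatrix}_{X \times Y}$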
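A minimal sketch of building such an all-zero table in plain Python follows. The function name `initial_q_table` and the sizes X = 12 and Y = 6 are assumptions for illustration.

```python
def initial_q_table(num_states, num_strategies):
    """Return an X-by-Y table of zeros as a list of independent rows."""
    return [[0.0] * num_strategies for _ in range(num_states)]

X, Y = 12, 6  # hypothetical sizes: X = L * N states, Y = B * M strategies
Q = initial_q_table(X, Y)
Q[0][1] = 1.0  # writing one entry must not affect any other row
```

Each row is created by its own list expression; writing `[[0.0] * Y] * X` instead would make every row alias the same list, so updating one state would silently update all of them.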

FIG. 3 is a second flowchart of a method for determining an unmanned aerial vehicle communication Q value table according to an embodiment of the present invention.

Referring to FIG. 3, in one embodiment, step S13 of updating the Q value table by Q learning includes:

S131: acquiring first interfered state information perceived by the learning drone 11 at any Kth time slot;

S132: determining a first anti-interference strategy of the learning drone 11 in the Kth time slot;

S133: determining, according to the reward function, a first reward value obtained by the learning drone 11 executing the first anti-interference strategy under the first interfered state information;

S134: acquiring second interfered state information perceived by the learning drone 11 at the (K+1)th time slot;

S135: updating the Q value table according to the Q value table corresponding to the Kth time slot, the first interfered state information, the first anti-interference strategy, the second interfered state information and the first reward value, to obtain the Q value table corresponding to the (K+1)th time slot.

The update formula for the Q value can be characterized as:

$Q_{k+1}(s_k, g_k) = Q_k(s_k, g_k) + \tau\,(R_k + \gamma\,\chi_{k+1} - Q_k(s_k, g_k))$;

where:

$Q_k$ represents the Q value table corresponding to the Kth time slot;

$Q_{k+1}$ represents the Q value table corresponding to the (K+1)th time slot;

$s_k$ represents the first interfered state information;

$g_k$ represents the first anti-interference strategy;

$R_k$ represents the first reward value;

$s_{k+1}$ represents the second interfered state information;

$\tau \in (0, 1)$ represents the learning step size, which controls the convergence rate of Q learning;

$\gamma \in (0, 1)$ represents the discount factor, which weights the influence of future reward values on the current decision;

represents all possible anti-interference strategies in state $s_{k+1}$.
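The update formula can be sketched as a single in-place table update. The definition of $\chi_{k+1}$ is not reproduced in the text, so this sketch assumes the standard Q-learning choice, namely the largest Q value over all strategies in state $s_{k+1}$. The function name `q_update`, the integer indexing of states and strategies, and the default values of `tau` and `gamma` are likewise assumptions.

```python
def q_update(Q, s_k, g_k, reward, s_next, tau=0.1, gamma=0.9):
    """Apply one tabular update to Q[s_k][g_k] in place and return it.

    Q is a list of rows (one per state), each a list of Q values
    (one per strategy). Assumption: chi_{k+1} = max over g of Q[s_next][g].
    """
    chi_next = max(Q[s_next])
    Q[s_k][g_k] += tau * (reward + gamma * chi_next - Q[s_k][g_k])
    return Q[s_k][g_k]

# Two states and two strategies, with one non-zero entry in the next state.
Q = [[0.0, 0.0],
     [0.0, 2.0]]
new_value = q_update(Q, s_k=0, g_k=1, reward=1.0, s_next=1, tau=0.5, gamma=0.5)
```

Only the entry for the visited (state, strategy) pair changes in each time slot; all other entries of the table carry over unchanged from slot K to slot K+1.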

Fig. 4 is a flowchart of a method for determining a communication Q-value table of an unmanned aerial vehicle according to an embodiment of the present invention.

Referring to fig. 4, in one embodiment, determining the first anti-interference strategy of the learning drone 11 at the Kth time slot, i.e., step S132, includes:

S1321: determining an anti-interference strategy subset corresponding to the first interfered state information according to the Q value table corresponding to the Kth time slot;

S1322: determining the anti-interference strategy with the maximum Q value in the anti-interference strategy subset as the first anti-interference strategy.

Determining the anti-interference strategy subset corresponding to the first interfered state information may be understood as determining, according to Q_k, the set of all possible anti-interference strategies under the first interfered state information.
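Steps S1321 and S1322 amount to a greedy lookup in one row of the Q value table. A minimal sketch, assuming the table is a NumPy array whose rows are interfered states and whose columns are anti-interference strategies (the function name `greedy_strategy` and the example values are hypothetical):

```python
import numpy as np

def greedy_strategy(q, state):
    """Index of the anti-interference strategy with the largest Q value in `state`."""
    # q[state] is the strategy subset for this state; argmax picks the best entry.
    return int(np.argmax(q[state]))

q = np.array([[0.1, 0.7, 0.3],
              [0.5, 0.2, 0.4]])
best_for_state_0 = greedy_strategy(q, 0)
best_for_state_1 = greedy_strategy(q, 1)
```

When several strategies share the maximum Q value, `np.argmax` returns the first of them; the embodiment does not specify a tie-breaking rule.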

Fig. 5 is a fourth flowchart of a method for determining a communication Q-value table of a drone according to an embodiment of the invention.

Referring to fig. 5, in one embodiment, determining the first anti-interference strategy of the learning drone 11 at the Kth time slot, i.e., step S132, includes:

S1323: acquiring third interfered state information sensed by the learning unmanned aerial vehicle 11 at the (K-1)th time slot;

S1324: determining an anti-interference strategy subset corresponding to the third interfered state information according to the Q value table corresponding to the (K-1)th time slot;

S1325: determining the first anti-interference strategy through a Boltzmann random strategy according to the Q value corresponding to each anti-interference strategy in the anti-interference strategy subset.

Determining the first anti-interference strategy via the Boltzmann random strategy may be characterized as:

W(k)＝[w₁(k),w₂(k),…,w_y(k),…,w_Y(k)]；

wherein:

ζ is the Boltzmann update coefficient, and w_y(k) is the probability that the unmanned aerial vehicle selects anti-interference strategy g_y at the Kth time slot.

In other embodiments, the first anti-interference strategy of the learning drone 11 at the Kth time slot may also be determined by, for example, an ε-greedy strategy or a Gaussian strategy.
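The two random selection rules mentioned above can be sketched as follows. The expression for the probabilities w_y(k) is not reproduced in this text, so the sketch assumes the common softmax form, in which each probability is proportional to exp(Q/ζ) and ζ acts as a temperature. The function names, the default values of ζ and ε, and the use of NumPy's random generator are illustrative assumptions.

```python
import numpy as np

def boltzmann_probabilities(q_row, zeta=1.0):
    """Selection probabilities W(k) over the strategies in one row (assumed softmax form)."""
    q_row = np.asarray(q_row, dtype=float)
    z = (q_row - q_row.max()) / zeta  # subtract the maximum for numerical stability
    w = np.exp(z)
    return w / w.sum()

def boltzmann_strategy(q_row, zeta=1.0, rng=None):
    """Draw one strategy index according to the Boltzmann probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    w = boltzmann_probabilities(q_row, zeta)
    return int(rng.choice(len(w), p=w))

def epsilon_greedy_strategy(q_row, epsilon=0.1, rng=None):
    """With probability epsilon pick a random strategy, otherwise the best one."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))  # explore
    return int(np.argmax(q_row))              # exploit

w = boltzmann_probabilities([1.0, 2.0, 3.0], zeta=1.0)
```

Under this assumed form, a large ζ makes the probabilities nearly uniform (more exploration), while a small ζ concentrates them on the strategy with the largest Q value.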

Fig. 6 is a flowchart of a method for determining a communication Q-value table of a drone according to an embodiment of the invention.

Referring to fig. 6, in an embodiment, before determining, according to the reward function, a first reward value obtained by the learning drone 11 executing the first anti-interference policy under the first interfered state information, that is, before step S133, the method further includes:

S136: determining the signal-to-noise ratio of the communication signal received by the learning unmanned aerial vehicle 11 after it executes the first anti-interference strategy under the first interfered state information;

if the signal-to-noise ratio is less than or equal to the preset signal demodulation threshold of the learning unmanned aerial vehicle 11, determining the function value of the reward function to be a preset reward value;

if the signal-to-noise ratio is greater than the preset signal demodulation threshold, determining the reward function according to the signal throughput and the channel switching overhead, wherein the signal throughput characterizes the ability of the learning drone 11 to receive and/or transmit communication signals, and the channel switching overhead characterizes the loss incurred when the learning drone 11 switches communication channels.

In the above scheme, the signal demodulation threshold may be understood as the threshold above which the learning unmanned aerial vehicle 11 can demodulate a received communication signal; specifically, it may be a value of the signal-to-noise ratio of the signal received by the learning unmanned aerial vehicle 11. When that signal-to-noise ratio is too low, that is, less than or equal to the set signal demodulation threshold, the learning unmanned aerial vehicle 11 cannot use the received communication signal and can be considered unable to work, so the function value of the reward function is set to a preset reward value. The preset reward value represents the feedback for the learning unmanned aerial vehicle 11 after it executes the first anti-interference strategy. Because the learning unmanned aerial vehicle 11 cannot demodulate the communication signal after executing that strategy, the preset reward value may be a value representing negative feedback, for example a negative number.

In the above scheme, the channel switching overhead represents the loss incurred when the learning drone 11 switches communication channels (it may also be understood as a correction term of the reward function). This loss may include, for example, the time overhead and the energy consumed in executing the switching action, which are collectively represented by the switching overhead.
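The two branches of the reward described above can be sketched as a small Python function. The sketch makes several assumptions that are not stated in this text: throughput is modelled with a Shannon-style expression (bandwidth times log2(1 + SNR)), the switching overhead is subtracted from the throughput, and the indicator function equals 1 when the channel is unchanged between consecutive time slots. The function name and all default values are hypothetical.

```python
import math

def reward(snr, bandwidth, channel, prev_channel,
           threshold=1.0, kappa_c=0.5, preset_reward=-1.0):
    """Reward for one time slot, following the two cases of step S136."""
    if snr <= threshold:
        # The signal cannot be demodulated: return the preset (negative) reward.
        return preset_reward
    # Assumed Shannon-style throughput; the embodiment's exact expression may differ.
    throughput = bandwidth * math.log2(1.0 + snr)
    # Indicator: 1 if the channel is unchanged, so no switching overhead is charged.
    same_channel = 1.0 if channel == prev_channel else 0.0
    return throughput - kappa_c * (1.0 - same_channel)

blocked = reward(snr=0.5, bandwidth=2.0, channel=1, prev_channel=1)
stayed = reward(snr=3.0, bandwidth=2.0, channel=1, prev_channel=1)
switched = reward(snr=3.0, bandwidth=2.0, channel=2, prev_channel=1)
```

With these example values, switching channels costs exactly `kappa_c` relative to staying on the same channel, which is how the overhead acts as a correction term.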

In one embodiment, the reward function may be characterized as:

wherein:

represents the reward function (also referred to as a utility function) at the t-th time slot;

represents the signal throughput;

κ_c(1-(a_t,a_t-1) represents the channel switching overhead;

λ is the signal demodulation threshold;

a_t-1 indicates the communication channel of the learning drone 11 at the (t-1)th time slot;

a_t indicates the communication channel of the learning drone 11 at the t-th time slot;

c_t represents the interference signal channel at the t-th time slot;

indicates the bandwidth of the communication channel of the learning drone 11 at the t-th time slot;

indicates the bandwidth of the interference signal at the t-th time slot;

represents the signal-to-noise ratio at the t-th time slot;

κ_c represents the channel switching overhead coefficient, a constant that may be set from empirical values or determined from the results of multiple rounds of Q learning;

(a_t,a_t-1) is an indicator function.

The signal-to-noise ratio can be characterized as:

wherein:

P₁(t) represents the communication signal power received by the learning drone 11 at the t-th time slot;

P₂(t) represents the power of the interference signal received by the learning drone 11 at the t-th time slot;

N₀ represents the noise power per unit bandwidth.

When the learning unmanned aerial vehicle 11 is in actual flight, P₁(t) and P₂(t) may both be acquired directly by its own sensors or other power detection devices.

P₁(t) may be characterized in particular as:

P₁(t)＝pω₁(t)；

wherein:

p represents the signal transmission power of the communication drone 12;

ω₁(t) represents a channel gain at which the learning drone 11 receives the communication signal at the t-th time slot;

d₁ indicates the distance between the learning drone 11 and the communication drone 12 at the t-th time slot;

α represents the path fading factor;

₁ represents the instantaneous fading coefficient, which follows an exponential distribution with unit mean;

P₂(t) may be characterized in particular as:

P₂(t)＝Jω₂(t)；

wherein:

J represents the power of the interference signal transmitted by the interference source;

ω₂(t) represents the channel gain at which the learning drone 11 receives the interference signal at the t-th time slot;

d₂ indicates the distance between the learning drone 11 and the interference source (interfering drone 13) at the t-th time slot;

β represents the path fading factor;

₂ represents the instantaneous fading coefficient, which follows an exponential distribution with unit mean;

represents the gain factor of the interference at the t-th time slot.
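To illustrate how P₁(t)＝pω₁(t) and P₂(t)＝Jω₂(t) might be simulated during learning, the sketch below draws received powers from a simple channel model. The expressions for ω₁(t) and ω₂(t) are not reproduced in this text, so the sketch assumes a common form: distance raised to the negative path fading factor, multiplied by a fading coefficient drawn from a unit-mean exponential distribution. It also leaves out the interference gain factor. The function name, distances, exponents and powers are hypothetical.

```python
import numpy as np

def channel_gain(distance, path_exponent, rng):
    """Assumed gain: distance ** (-path_exponent) times a unit-mean exponential fading draw."""
    fading = rng.exponential(scale=1.0)  # instantaneous fading coefficient
    return distance ** (-path_exponent) * fading

rng = np.random.default_rng(0)
p, J = 1.0, 2.0                            # transmit powers (hypothetical values)
P1 = p * channel_gain(100.0, 2.0, rng)     # communication signal power, d1 = 100
P2 = J * channel_gain(300.0, 2.0, rng)     # interference signal power, d2 = 300
```

Because the fading draw has unit mean, the average received power in this model is governed by the transmit power and the distance term alone.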

Fig. 7c is a schematic diagram of the gain factor of the interference.

In figs. 7a, 7b and 7c, the ordinate represents the signal power spectrum and the abscissa represents the frequency.

Referring to fig. 7a, when a_t≠c_t, the interference signal channel does not coincide with the communication signal channel, so the interference signal does not interfere with the communication signal and the gain coefficient of the interference signal is 0.

Referring to fig. 7b, when a_t＝c_t and

the learning drone 11 may receive all of the interference signal, and the gain factor of the interference signal is then 1.

Referring to fig. 7c, when a_t＝c_t and

the learning drone 11 may receive all the interference signals, and the ratio of the received interference signals (which may be understood as the gain coefficient of the interference signal at this time) is

Fig. 8 is a schematic view of a flight scenario of the drone according to an embodiment of the present invention.

Referring to fig. 8, the apparatus associated with the scenario may include: a drone 21, an interference source 22, a communication station 23.

The drone 21 communicates with the communication station 23, and the communication station 23 may be, for example, a ground-fixed communication station (e.g., a remote control center), a ground-mobile communication station (e.g., a vehicle-mounted communication station), or another drone.

The interference source 22 continuously emits an interference signal in the flight area, which interference signal can interfere with the communication of the drone 21 with the communication station 23. The interference source 22 may be, for example, a ground-fixed interference source, a ground-moving interference source, or another drone emitting an interference signal. The interference source 22 may further be understood as a normally operating electronic device that may emit wireless signals that can interfere with the communications of the drone 21.

Fig. 9 is a flowchart of an anti-jamming method for communication of a drone according to an embodiment of the present invention.

Referring to fig. 9, an anti-jamming method for communication of an unmanned aerial vehicle includes:

S21: determining interfered state information of the unmanned aerial vehicle 21 at the current moment, wherein different interfered state information represents different power intervals and/or different channels in which the power of the interference signal received by the unmanned aerial vehicle 21 lies;

S22: determining the anti-interference strategy to be executed by the unmanned aerial vehicle at the current moment according to the interfered state information at the current moment and the Q value table determined by the method described above;

S23: executing the anti-interference strategy to be executed at the current moment so as to adjust the communication channel and/or the communication bandwidth of the unmanned aerial vehicle 21.

In the embodiment of the present invention, determining the interfered state information of the unmanned aerial vehicle 21 at the current time may be further understood as determining the power of the interfering signal received or sensed by the unmanned aerial vehicle 21 at the current time and the channel of the interfering signal, and determining the power interval corresponding to the power according to the power of the sensed interfering signal. In one embodiment, the power and channel of the interference signal may be determined by a spectrum sensing technique or a wideband spectrum sensing technique.

In the above scheme, the anti-interference strategy to be executed at the current moment is determined from the power and channel of the interference signal in the flight environment of the unmanned aerial vehicle 21 at the current moment, together with the determined Q value table. Because the strategy takes the current power and channel of the interference signal into account, it reflects the influence of the external environment on the communication of the unmanned aerial vehicle 21 more comprehensively than the prior art, and the determined anti-interference strategy better matches actual flight conditions. In addition, because executing the strategy adjusts the communication channel and/or the communication bandwidth of the unmanned aerial vehicle 21, the choice of anti-interference strategy is more flexible.
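Steps S21 and S22 can be sketched as a mapping from the sensed interference to a row of the Q value table, followed by a lookup. The power interval boundaries, the number of channels, the row indexing scheme and the greedy lookup are all assumptions made for illustration; step S23 (retuning the radio) depends on the hardware and is not shown.

```python
import bisect
import numpy as np

POWER_EDGES = [-90.0, -70.0, -50.0]  # hypothetical interval boundaries in dBm
NUM_CHANNELS = 4                      # hypothetical number of channels

def interfered_state(jam_channel, jam_power_dbm):
    """Map (interference channel, power interval) to a row index of the Q table."""
    # bisect_right returns the interval index, from 0 to len(POWER_EDGES).
    interval = bisect.bisect_right(POWER_EDGES, jam_power_dbm)
    return jam_channel * (len(POWER_EDGES) + 1) + interval

def select_strategy(q, jam_channel, jam_power_dbm):
    """S21: derive the state; S22: pick the strategy with the largest Q value (assumed)."""
    state = interfered_state(jam_channel, jam_power_dbm)
    return int(np.argmax(q[state]))

# Toy Q table with 3 strategies per state; one entry is set by hand.
q = np.zeros((NUM_CHANNELS * (len(POWER_EDGES) + 1), 3))
q[interfered_state(2, -60.0), 1] = 5.0
chosen = select_strategy(q, jam_channel=2, jam_power_dbm=-60.0)
```

Discretizing the sensed power into intervals keeps the state set finite, so the table learned offline can be consulted with a single index computation in flight.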

Fig. 10 is a schematic block diagram of an apparatus for interference rejection of unmanned aerial vehicle communication according to an embodiment of the present invention.

Referring to fig. 10, an apparatus for interference rejection of communication of unmanned aerial vehicle includes:

the state obtaining module 31 is configured to determine interfered state information of the unmanned aerial vehicle at the current time, where different interfered state information represents different power intervals and/or different channels where power information corresponding to an interference signal received by the unmanned aerial vehicle is located;

the strategy determining module 32 is configured to determine an anti-interference strategy that the unmanned aerial vehicle needs to execute at the current moment according to the interfered state information at the current moment and the determined Q value table; the Q value table records the Q values corresponding to different anti-interference strategies executed under each type of interfered state information;

and the policy execution module 33 is configured to execute the anti-interference policy that needs to be executed at the current time, so as to adjust a communication channel and/or a communication bandwidth of the drone.

Fig. 11 is a first schematic block diagram of an apparatus for determining a communication Q-value table of a drone according to an embodiment of the present invention.

Referring to fig. 11, an apparatus for determining a communication Q value table of a drone includes:

a first module 41 for determining an interfered state set, an anti-interference policy set, and a reward function;

a second module 42, configured to determine an initial Q-value table according to the interfered state set and the anti-interference policy set;

a third module 43, configured to update the Q value in the Q value table in a Q learning manner when the learning unmanned aerial vehicle 11 is controlled to execute each anti-interference policy; the updated Q value is associated with the anti-interference policy adopted by the learning drone 11, the power information and channel information of the received interference signal, and the reward function;

the Q value table records the Q values corresponding to different anti-interference strategies executed under each type of interfered state information; the interfered state set is a set of predefined interfered state information; the anti-interference strategy set is the set of anti-interference strategies executable by the learning unmanned aerial vehicle 11; the reward function characterizes the functional relationship among the anti-interference strategy executed by the learning unmanned aerial vehicle 11, the interfered state information before that strategy is executed, and the reward value; different interfered state information represents different power intervals and/or different channels in which the power of the interference signal received by the learning unmanned aerial vehicle 11 lies; and each anti-interference strategy, when executed, adjusts the communication channel and/or the communication bandwidth of the learning unmanned aerial vehicle 11.

In the device for determining the unmanned aerial vehicle communication Q value table in the embodiment of the invention, the determined Q value table contains the Q values corresponding to different anti-interference strategies executed under each type of interfered state information, and different interfered state information represents different power intervals and/or different channels in which the power of the interference signal received by the learning unmanned aerial vehicle 11 lies. Compared with the prior art, the table can therefore reflect the influence of the external environment on unmanned aerial vehicle communication more comprehensively, and the determined anti-interference strategy better matches actual flight conditions. Because executing an anti-interference strategy adjusts the communication channel and/or the communication bandwidth of the unmanned aerial vehicle, the choice of anti-interference strategy is also more flexible.
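The work of the first module 41 and the second module 42 can be sketched as building the two sets and an initial table. The sketch assumes that each interfered state is a pair (interference channel, power interval), that each anti-interference strategy is a pair (communication channel, bandwidth), and that the initial Q values are zero; the set sizes and all names are hypothetical.

```python
from itertools import product
import numpy as np

channels = [0, 1, 2]        # hypothetical channel indices
power_intervals = [0, 1]    # hypothetical power-interval indices
bandwidths_mhz = [5, 10]    # hypothetical selectable bandwidths

# Interfered state set: (interference channel, power interval) pairs.
states = list(product(channels, power_intervals))
# Anti-interference strategy set: (communication channel, bandwidth) pairs.
strategies = list(product(channels, bandwidths_mhz))

# Initial Q value table: one row per state, one column per strategy (zeros assumed).
q_table = np.zeros((len(states), len(strategies)))
state_index = {s: i for i, s in enumerate(states)}
```

The `state_index` dictionary gives the row for a sensed state, so the third module 43 can update `q_table[state_index[s], g]` as learning proceeds.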

Fig. 12 is a block diagram illustrating a second exemplary embodiment of an apparatus for determining a communication Q-value table of a drone.

Referring to fig. 12, in one embodiment, the third module 43 includes:

a first information obtaining unit 431, configured to obtain first interfered state information sensed by the learning unmanned aerial vehicle 11 at any Kth time slot;

a first policy determining unit 432, configured to determine a first anti-interference policy of the learning drone 11 at the Kth time slot;

a first calculating unit 433, configured to determine, according to the reward function, a first reward value obtained by the learning unmanned aerial vehicle 11 executing the first anti-interference policy under the first interfered state information;

a second information obtaining unit 434, configured to obtain second interfered state information sensed by the learning unmanned aerial vehicle 11 at the (K+1)th time slot;

a second calculating unit 435, configured to update the Q value table according to the Q value table corresponding to the Kth time slot, the first interfered state information, the first anti-interference policy, the second interfered state information, and the first reward value, so as to obtain the Q value table corresponding to the (K+1)th time slot.

Fig. 13 is a block diagram illustrating a third exemplary embodiment of an apparatus for determining a communication Q-value table of a drone.

Referring to fig. 13, in an embodiment, the first policy determining unit 432 includes:

a first policy subset determining subunit 4321, configured to determine, according to the Q-value table corresponding to the Kth time slot, an anti-interference policy subset corresponding to the first interfered state information;

a first policy selecting subunit 4322, configured to determine, as the first anti-interference policy, the anti-interference policy with the largest Q value in the anti-interference policy subset.

Fig. 14 is a block diagram illustrating a fourth exemplary embodiment of the apparatus for determining a communication Q-value table of a drone.

Referring to fig. 14, in an embodiment, the first policy determining unit 432 includes:

an information obtaining subunit 4323, configured to obtain third interfered state information sensed by the learning unmanned aerial vehicle 11 at the (K-1)th time slot;

a second policy subset determining subunit 4324, configured to determine, according to the Q-value table corresponding to the (K-1)th time slot, an anti-interference policy subset corresponding to the third interfered state information;

and a second policy selecting subunit 4325, configured to determine, according to the Q value corresponding to each anti-interference policy in the anti-interference policy subset, the first anti-interference policy through a Boltzmann random policy.

Fig. 15 is a block diagram of a device for determining a communication Q-value table of a drone according to an embodiment of the present invention.

Referring to fig. 15, in an embodiment, the apparatus further includes:

a third calculating unit 436, configured to determine the signal-to-noise ratio of the communication signal received by the learning unmanned aerial vehicle 11 after it executes the first anti-interference policy under the first interfered state information;

if the signal-to-noise ratio is greater than a preset signal demodulation threshold, then: determining a reward function according to the signal throughput and the channel switching overhead; the signal throughput characterizes the ability of the learning drone 11 to receive communication signals, and the channel switching overhead characterizes the loss incurred by the learning drone 11 to switch communication channels.

wherein:

t represents a time slot;

represents the signal-to-noise ratio at the t-th time slot;

c_t indicates the interference signal channel at the t-th time slot;

indicates the bandwidth of the communication channel of the learning drone 11 at the t-th time slot;

indicating the bandwidth of the interference signal at the t-th time slot;

P₂(t) represents the power of the interference signal received by the learning drone 11 at the t-th time slot.

To sum up, in the apparatus for determining a Q-value table for unmanned aerial vehicle communication provided in an embodiment of the present invention, the policy determination module 32 determines the anti-interference policy to be executed at the current time from the power and channel of the interference signal in the flight environment of the unmanned aerial vehicle at the current time, together with the determined Q-value table. Because the policy takes the current power and channel of the interference signal into account, the apparatus reflects the influence of the external environment on unmanned aerial vehicle communication more comprehensively than the prior art, and the determined anti-interference policy better matches actual flight conditions. In addition, because executing the policy adjusts the communication channel and/or the communication bandwidth of the unmanned aerial vehicle, the choice of anti-interference policy is more flexible.

Referring to fig. 16, an electronic device includes a processor 51 and a memory 53,

a memory 53 for storing codes and related data;

a processor 51 for executing code in a memory 53 for implementing the method according to the first aspect of the invention and its alternatives.

The processor 51 is capable of communicating with the memory 53 via the bus 52.

An embodiment of the present invention also provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the method according to the first aspect of the present invention and its alternatives.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for determining a communication Q value table of an unmanned aerial vehicle is applied to a control end and is characterized by comprising the following steps:

2. The method of claim 1, wherein updating the Q-value table by Q-learning comprises:

3. The method of claim 2, wherein determining the first anti-jamming policy for the learning drone at the kth time slot comprises:

4. The method of claim 2, wherein determining the first anti-jamming policy for the learning drone at the kth time slot comprises:

5. The method of any of claims 2-4, wherein prior to determining, from the reward function, a first reward value obtained by the learning drone for performing the first anti-interference policy under the first disturbed state information, further comprising:

6. The method of claim 5, wherein the signal-to-noise ratio is determined according to the following equation:

wherein:

t represents a time slot;

represents the signal-to-noise ratio at the t-th time slot;

c_t indicates the interference signal channel at the t-th time slot;

indicating the bandwidth of the interference signal at the t-th time slot;

7. An unmanned aerial vehicle communication anti-interference method is characterized by comprising the following steps:

determining an anti-interference strategy to be executed by the unmanned aerial vehicle at the current moment according to the interfered state information at the current moment and the Q value table determined according to the method of any one of claims 1 to 6;

8. An unmanned aerial vehicle communication Q value table determining device, comprising:

9. The apparatus of claim 8, wherein the third module comprises:

10. The apparatus of claim 9, wherein the first policy determining unit comprises:

11. The apparatus of claim 9, wherein the first policy determining unit comprises:

12. The apparatus of any one of claims 9-11, further comprising:

13. An electronic device, comprising a processor and a memory,

the memory is used for storing codes and related data;

the processor to execute code in the memory to implement the method of any of claims 1-6.

14. A storage medium having stored thereon a computer program which, when executed by a processor, carries out the method of any one of claims 1 to 6.