CN112235810B - Multi-dimensional optimization method and system of unmanned aerial vehicle communication system based on reinforcement learning - Google Patents


Info

Publication number
CN112235810B
Authority
CN
China
Prior art keywords
power distribution
aerial vehicle
unmanned aerial
optimization problem
optimization
Legal status
Active
Application number
CN202010991491.6A
Other languages
Chinese (zh)
Other versions
CN112235810A (en)
Inventor
邓单
Current Assignee
Shanghai Gala Information Technology Co.,Ltd.
Original Assignee
Guangzhou Panyu Polytechnic
Application filed by Guangzhou Panyu Polytechnic
Priority to CN202010991491.6A
Publication of CN112235810A
Application granted
Publication of CN112235810B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W52/00 Power management, e.g. TPC [Transmission Power Control], power saving or power classes
    • H04W52/04 TPC
    • H04W52/30 TPC using constraints in the total amount of available transmission power
    • H04W52/34 TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading
    • H04W52/346 TPC management, i.e. sharing limited amount of power among users or channels or data types, e.g. cell loading distributing total power among users or channels
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a multidimensional optimization method and system for an unmanned aerial vehicle (UAV) communication system based on reinforcement learning. The method comprises the following steps: step S1, establishing a flight trajectory and power allocation optimization problem model of the UAV communication system under a minimum transmission rate constraint; step S2, fixing the flight trajectory, reformulating the power allocation strategy optimization subproblem of the established model, and solving it with a convex optimization method to obtain the power allocation factors; and step S3, optimizing the flight trajectory with an iterative reinforcement learning method.

Description

Multi-dimensional optimization method and system of unmanned aerial vehicle communication system based on reinforcement learning
Technical Field
The invention relates to the technical field of unmanned aerial vehicle communication, in particular to a multidimensional optimization method and system of an unmanned aerial vehicle communication system based on reinforcement learning.
Background
A joint flight trajectory and precoding optimization method for an unmanned aerial vehicle (UAV) communication system using non-orthogonal multiple access (NOMA) is described in detail in a journal paper published in IEEE Transactions on Communications in 2019. That paper uses approximate convex optimization to convert the complex non-convex problem into convex subproblems, which are then solved to obtain the optimal flight trajectory and precoding matrix.
However, this method has the following drawback: convex-optimization-based methods require an accurate model of the system capacity, but in a practical communication system the key parameters are difficult to describe accurately because of channel distortion, disturbances, and similar effects, which limits the scenarios in which the method can be used.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide a multidimensional optimization method and system of an unmanned aerial vehicle communication system based on reinforcement learning.
In order to achieve the above and other objects, the present invention provides a multidimensional optimization method for an unmanned aerial vehicle communication system based on reinforcement learning, comprising the following steps:
step S1, establishing a flight path and power distribution optimization problem model of the unmanned aerial vehicle communication system under the constraint of the minimum transmission rate;
step S2, fixing the flight trajectory, reformulating the power allocation strategy optimization subproblem of the flight trajectory and power allocation optimization problem model established for the UAV communication system under the minimum transmission rate constraint, and solving it with a convex optimization method to obtain the power allocation factors;
and step S3, optimizing the flight trajectory with an iterative reinforcement learning method.
Preferably, in step S1, the flight trajectory and power allocation optimization problem model under the constraint of the minimum transmission rate is represented as:
(P0):
Figure BDA0002687021550000021
Figure BDA0002687021550000022
Figure BDA0002687021550000023
Figure BDA0002687021550000024
Figure BDA0002687021550000025
where $r_k[n]$ denotes the secrecy capacity of the k-th user, $\xi_i[n]$ denotes the power allocation factor of the i-th user, $w[n]$ denotes the horizontal position of the UAV in time slot n, $v_m$ is the maximum moving speed of the UAV, N is the number of time slots into which a given observation time T is divided, the interval between two adjacent time slots is $\delta = T/N$, and $R_{k,k}[n]$ denotes the capacity of the k-th user.
Preferably, the step S2 further includes:
step S200, assuming that the flight trajectory is fixed, reformulating the power allocation strategy optimization subproblem of the flight trajectory and power allocation optimization problem model under the minimum transmission rate constraint as a convex optimization problem in three variables;
step S201, solving the power distribution strategy optimization problem converted in the step S200 by adopting an iterative approximately convex optimization method to obtain a power distribution factor.
Preferably, in step S200, the objective function $r_k[n]$ is converted into the convex function
Figure BDA0002687021550000026
so that the power allocation strategy optimization subproblem of the flight trajectory and power allocation optimization problem model under the minimum transmission rate constraint is reformulated as a convex optimization problem in three variables.
Preferably, in step S200, a first-order Taylor expansion is applied and a slack variable is introduced to convert the objective function into a convex function, so that the power allocation strategy optimization subproblem of the flight trajectory and power allocation optimization problem model under the minimum transmission rate constraint is reformulated as a convex optimization problem P1 in three variables.
Preferably, in step S201, the solving process of the P1 problem includes:
step 1, obtaining initial power allocation factors from the minimum transmission rate requirements and allocating all remaining power to the strongest user; initializing the iteration index $r = 0$ and computing $\xi_r[n]$, $I_r[n]$, $I_{e,r}[n]$, $\eta_r[n]$;
step 2, given $\eta_r[n]$, solving the P1 problem with a standard convex optimization tool to obtain the updated $\xi_{r+1}[n]$, $I_{r+1}[n]$, $I_{e,r+1}[n]$, $\eta_{r+1}[n]$, and updating the iteration index $r = r + 1$;
step 3, if r reaches the maximum number of iterations or the increase of the objective function of the P1 problem is smaller than a preset threshold (Figure BDA0002687021550000031), the iteration stops; otherwise, step 2 is repeated.
Preferably, the step S3 further includes:
step S300, dividing the horizontal target space of the UAV flight trajectory into a grid with cell size $v_m\delta \times v_m\delta$, converting the grid cells into the reinforcement learning state space according to their coordinates, and approximating the continuous action space of the UAV by a discrete action space consisting of five optional actions;
step S301, defining the sum of the secrecy capacities after the UAV position update as the reward function, and iteratively updating the value function;
step S302, after each reinforcement learning step the UAV obtains a new position, the updated power allocation factors are computed with the P1 solution method of step S2, and the value function is iteratively updated;
and step S303, after the UAV has explored for multiple rounds, the value function gradually approaches the optimal value function, and the optimal UAV flight trajectory is finally obtained.
Preferably, in step S301, the value function is iteratively updated according to the following iterative formula:
Figure BDA0002687021550000032
where $Q_n(s_n, a_n)$ is the value function, initialized to all zeros, $R_n$ is the reward function, θ is the learning rate factor, and β is the discount factor.
In order to achieve the above object, the present invention further provides a multidimensional optimization system of an unmanned aerial vehicle communication system based on reinforcement learning, including:
the model building unit is used for building a flight trajectory and power distribution optimization problem model of the unmanned aerial vehicle communication system under the constraint of the minimum transmission rate;
the convex optimization solving unit is used for fixing the flight trajectory, reformulating the power allocation strategy optimization subproblem of the flight trajectory and power allocation optimization problem model established for the UAV communication system under the minimum transmission rate constraint, and solving it with a convex optimization method to obtain the power allocation factors;
and the reinforcement learning optimization unit is used for optimizing the flight trajectory with an iterative reinforcement learning method.
Preferably, the reinforcement learning optimization unit is specifically configured to:
firstly, dividing the horizontal target space of the UAV flight trajectory into a grid with cell size $v_m\delta \times v_m\delta$, converting the grid cells into the reinforcement learning state space according to their coordinates, and approximating the continuous action space of the UAV by a discrete action space consisting of five optional actions;
defining the sum of the secrecy capacities after the UAV position update as the reward function, so as to iteratively update the value function;
after each reinforcement learning step the UAV obtains a new position, the updated power allocation factors are computed with the P1 solution method of the convex optimization solving unit, and the value function is iteratively updated;
after the UAV has explored for multiple rounds, the value function gradually approaches the optimal value function, and the optimal UAV flight trajectory is finally obtained.
Compared with the prior art, the disclosed reinforcement-learning-based multidimensional optimization method and system for a UAV communication system jointly optimize multiple dimensions, such as the flight trajectory and the power allocation factors, by combining convex optimization with reinforcement learning. Because the reward function is obtained from feedback in reinforcement learning, the communication system does not need to be modeled accurately; the method therefore applies to a wider range of scenarios and can obtain the optimal multidimensional optimization result.
Drawings
Fig. 1 is a flowchart of the steps of the reinforcement-learning-based multidimensional optimization method for a UAV communication system according to the present invention;
Fig. 2 is a block diagram of the UAV communication system model to which the present invention is applied;
Fig. 3 is a system architecture diagram of the reinforcement-learning-based multidimensional optimization system for a UAV communication system according to the present invention;
Fig. 4 shows a reinforcement-learning-based UAV flight trajectory according to an embodiment of the present invention;
Fig. 5 shows the sum of the secrecy capacities along the UAV flight trajectory according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below through specific examples with reference to the accompanying drawings; other advantages and capabilities of the present invention will be readily apparent to those skilled in the art from this disclosure. The invention may also be implemented or applied through other, different embodiments, and its details may be modified in various respects without departing from the spirit and scope of the present invention.
Fig. 1 is a flowchart illustrating steps of a multidimensional optimization method for an unmanned aerial vehicle communication system based on reinforcement learning according to the present invention. The invention relates to a multidimensional optimization method of an unmanned aerial vehicle communication system based on reinforcement learning, which comprises the following steps:
Step S1, establishing the flight trajectory and power allocation optimization problem model of the UAV communication system under the minimum transmission rate constraint.
The unmanned aerial vehicle (UAV) communication system model to which the present invention applies is shown in Fig. 2; it contains one UAV communication base station, K target users, and one eavesdropping user. The UAV base station moves freely within a horizontal target area at height H. The location of the i-th target user is $L_i = [x_i, y_i]^T$, $i \in [1, K]$, and the location of the eavesdropping user is $L_e$. The flight trajectory of the UAV base station over the observation time points can be expressed as:
$W = \{ w[n] = [x[n], y[n]]^T,\ n = 1, 2, \ldots, N \}$ (formula one)
where $w[n]$ denotes the horizontal coordinates of the UAV at the n-th observation time point, N is the number of time slots into which a given observation time T is divided, the interval between two adjacent slots is $\delta = T/N$, and $v_m$ is the maximum moving speed of the UAV. The channel fading power from the i-th user to the UAV base station can then be expressed as:
Figure BDA0002687021550000061
where $d_i$ denotes the distance between the i-th user and the UAV, $\rho_o$ is the reference channel power gain at unit distance, and $\alpha \ge 2$ is the channel path-loss exponent, which generally lies between 2 and 4. Similarly, the channel fading power from the eavesdropping user to the UAV base station is:
Figure BDA0002687021550000062
where $d_e$ denotes the distance between the eavesdropping user and the UAV.
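The channel-gain formulas above survive only as figure placeholders. A common line-of-sight form consistent with the quantities defined here is $g_i[n] = \rho_o d_i^{-\alpha}$ with $d_i = \sqrt{H^2 + \lVert w[n] - L_i \rVert^2}$. The Python sketch below assumes exactly that form (the function name `channel_gain` is hypothetical), so it illustrates the model rather than reproducing the patent's exact expression; the same function gives the eavesdropper gain when $L_e$ is passed instead of $L_i$.

```python
import math


def channel_gain(uav_xy, user_xy, height, rho0, alpha=2.0):
    """Assumed LoS channel power gain g = rho0 * d**(-alpha).

    uav_xy:  UAV horizontal position w[n] = (x[n], y[n])
    user_xy: ground node position L_i (or L_e for the eavesdropper)
    height:  fixed flight altitude H
    rho0:    reference channel power gain at unit distance
    alpha:   path-loss exponent (>= 2, typically between 2 and 4)
    """
    dx = uav_xy[0] - user_xy[0]
    dy = uav_xy[1] - user_xy[1]
    # 3-D link distance: horizontal offset combined with the altitude H.
    d = math.sqrt(height ** 2 + dx ** 2 + dy ** 2)
    return rho0 * d ** (-alpha)


# Example: UAV above the origin at altitude 12, user at (3, 4), so d = 13.
g = channel_gain((0.0, 0.0), (3.0, 4.0), height=12.0, rho0=1.0)
```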
Assuming that the UAV communication system employs a non-orthogonal multiple access (NOMA) communication protocol, its downlink transmission signal can be expressed as:
Figure BDA0002687021550000063
where P is the total transmit power of the UAV base station, $x_i$ is the data symbol of the i-th user, and $\xi_i$ is the power allocation factor of the i-th user.
Considering the power constraint, we have
Figure BDA0002687021550000064
where $\Omega = [1, K]$ denotes the set of all users.
Then the received signal of the ith user is:
Figure BDA0002687021550000065
where $n_i$ denotes the receiver noise of the target user, with power $\sigma^2$, and $g_i$ is the channel fading power $g_i[n]$ from the i-th user to the UAV base station. According to successive interference cancellation (SIC) at the NOMA receiver, the signal-to-interference-plus-noise ratio (SINR) of the k-th data stream symbol at the i-th user can be expressed as:
Figure BDA0002687021550000066
In the above formula,
Figure BDA0002687021550000067
denotes the interference term of the k-th data stream, and $\xi_i$ denotes the power allocation factor of the i-th user. The capacity of the k-th user can then be expressed as:
Figure BDA0002687021550000071
Each user is assumed to be subject to a minimum transmission rate constraint, i.e.:
Figure BDA0002687021550000072
Figure BDA0002687021550000073
denotes the minimum transmission rate requirement of the k-th user.
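Because the SINR and capacity expressions are also image placeholders, the sketch below assumes the textbook downlink NOMA model: users are indexed in ascending order of channel gain, user k removes the streams of weaker users by SIC and treats the streams of stronger users as interference, so $\mathrm{SINR}_k = P\xi_k g_k / (P g_k \sum_{j>k}\xi_j + \sigma^2)$ and $R_k = \log_2(1 + \mathrm{SINR}_k)$. The function names and the ordering convention are assumptions for illustration, not taken from the patent.

```python
import math


def noma_rates(gains, xi, power, noise):
    """Per-user downlink NOMA rates with SIC (assumed textbook model).

    gains: channel gains g_k, sorted in ascending order (weakest user first)
    xi:    power allocation factors, same order, with sum(xi) <= 1
    """
    if sum(xi) > 1.0 + 1e-9:
        raise ValueError("power allocation factors must not sum to more than 1")
    rates = []
    for k, g in enumerate(gains):
        # Streams of weaker users (j < k) are cancelled by SIC; streams of
        # stronger users (j > k) remain as interference.
        interference = power * g * sum(xi[k + 1:])
        sinr = power * g * xi[k] / (interference + noise)
        rates.append(math.log2(1 + sinr))
    return rates


def meets_min_rate(rates, r_min):
    """Check the minimum transmission rate constraints R_k >= R_k^min."""
    return all(r >= rm for r, rm in zip(rates, r_min))


# Two users: the weak user (gain 1) gets 75 % of the power, the strong user 25 %.
rates = noma_rates([1.0, 4.0], [0.75, 0.25], power=1.0, noise=1.0)
```

In this example the weak user's SINR is 0.75 / 1.25 = 0.6 and the strong user's is 1.0, which is why the larger share of power goes to the weaker user in NOMA.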
Similarly, for an eavesdropping user, the capacity of the kth data stream can be expressed as:
Figure BDA0002687021550000074
where:
Figure BDA0002687021550000075
Figure BDA0002687021550000076
By the definition of secrecy capacity, the secrecy capacity of the k-th user is:
Figure BDA0002687021550000077
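The secrecy-capacity expression is likewise not recoverable from the extraction. Under the standard definition, which is assumed here, each user's secrecy capacity is the positive part of the gap between its own rate and the eavesdropper's rate for the same stream, $r_k = [R_k - R_{e,k}]^+$; the sum of these values is what step S3 later uses as the reward. A minimal sketch with hypothetical helper names:

```python
def secrecy_capacities(user_rates, eve_rates):
    """Per-user secrecy capacity r_k = max(R_k - R_{e,k}, 0) (assumed standard definition)."""
    if len(user_rates) != len(eve_rates):
        raise ValueError("rate lists must have the same length")
    return [max(r - r_e, 0.0) for r, r_e in zip(user_rates, eve_rates)]


def secrecy_sum(user_rates, eve_rates):
    """Sum of the secrecy capacities, i.e. the quantity used as the reward in step S3."""
    return sum(secrecy_capacities(user_rates, eve_rates))


# The second user's stream is decoded better by the eavesdropper, so it contributes 0.
caps = secrecy_capacities([2.0, 1.0], [0.5, 1.5])  # [1.5, 0.0]
```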
In summary, the flight trajectory and power allocation optimization problem model under the minimum transmission rate constraint can be expressed as:
(P0):
Figure BDA0002687021550000078
Figure BDA0002687021550000079
Figure BDA00026870215500000710
Figure BDA00026870215500000711
Figure BDA00026870215500000712
where w denotes the optimal flight trajectory and ξ denotes the optimal power allocation factors.
Step S2, fixing the flight trajectory, reformulating the power allocation strategy optimization subproblem of the established flight trajectory and power allocation optimization problem model of the UAV communication system under the minimum transmission rate constraint, and solving it with a convex optimization method to obtain the power allocation factors.
Because the flight trajectory and power allocation optimization problem model (formula nine) under the minimum transmission rate constraint established in step S1 is non-convex and difficult to solve directly, the present invention solves it by combining approximate convex optimization with reinforcement learning, i.e., the flight trajectory, the power allocation factors, and other dimensions are jointly optimized by an optimization method that combines convex optimization and reinforcement learning.
Specifically, step S2 further includes:
Step S200, assuming that the flight trajectory is fixed, reformulating the power allocation strategy optimization subproblem of the flight trajectory and power allocation optimization problem model under the minimum transmission rate constraint as a convex optimization problem in three variables.
First, with the flight trajectory assumed fixed, the power allocation strategy optimization problem is solved.
Considering the minimum transmission rate constraint, formula six can be transformed into:
Figure BDA0002687021550000081
The objective function of the P0 problem can be approximated as:
Figure BDA0002687021550000082
where
Figure BDA0002687021550000083
Figure BDA0002687021550000084
Figure BDA0002687021550000085
The objective function is still not convex; therefore, the present invention applies a first-order Taylor expansion and introduces the slack variable
Figure BDA0002687021550000086
The objective function is further converted into:
Figure BDA0002687021550000087
where
Figure BDA0002687021550000088
Figure BDA0002687021550000089
are the results obtained in the r-th iteration.
At this point, the power allocation strategy optimization problem can be reformulated as:
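The Taylor-expanded expressions themselves are lost in the extraction. As a generic illustration of why a first-order expansion yields a convex problem (not the patent's exact terms), consider a concave rate term that enters the objective with a minus sign, such as an eavesdropper rate written as a function of a nonnegative variable x. Because $\log_2(1+x)$ is concave, its tangent at the previous iterate $x^{(r)}$ is a global upper bound:

```latex
\log_2(1+x) \;\le\; \log_2\!\left(1+x^{(r)}\right) + \frac{x - x^{(r)}}{\left(1+x^{(r)}\right)\ln 2},
\qquad x \ge 0,
\qquad \text{with equality at } x = x^{(r)} .
```

Replacing $-\log_2(1+x)$ by the negative of the right-hand side gives an affine, hence concave, lower bound that is tight at $x = x^{(r)}$. Provided each surrogate problem is solved exactly, the original objective is therefore non-decreasing from one iteration to the next, which is why the iterative procedure of step S201 can stop once the objective increment becomes small.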
(P1):
Figure BDA00026870215500000810
Figure BDA0002687021550000091
Figure BDA0002687021550000092
Figure BDA0002687021550000093
Figure BDA0002687021550000094
Figure BDA0002687021550000095
Thus, the P1 problem is a convex optimization problem in three variables.
Step S201, solving the power distribution strategy optimization problem converted in the step S200 by adopting an iterative approximately convex optimization method to obtain a power distribution factor.
In a specific embodiment of the invention, a standard convex optimization solver, such as CVX, is used to perform the numerical solution. Specifically, the flow of the solving method of the P1 problem is as follows:
Step 1, initialization: obtaining the initial power allocation factors from the minimum transmission rate requirements and allocating all remaining power to the strongest user; initializing the iteration index $r = 0$ and computing $\xi_r[n]$, $I_r[n]$, $I_{e,r}[n]$, $\eta_r[n]$;
Step 2, given $\eta_r[n]$, solving the P1 problem with the CVX tool to obtain the updated $\xi_{r+1}[n]$, $I_{r+1}[n]$, $I_{e,r+1}[n]$, $\eta_{r+1}[n]$, and updating the iteration index $r = r + 1$;
Step 3, if r reaches the maximum number of iterations or the increase of the objective function of the P1 problem is smaller than the preset threshold (Figure BDA0002687021550000096), the iteration stops; otherwise, step 2 is repeated.
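The actual P1 problem couples several variables and is solved with CVX. To show only the structure of the iteration (linearize, solve the convex surrogate, stop on a small increment or an iteration cap), the Python sketch below applies the same idea to a hypothetical single-user toy problem: maximize the secrecy rate $\log_2(1+ap) - \log_2(1+bp)$ over $0 \le p \le P_{\max}$, where $a > 0$ and $b > 0$ are normalized channel gains of the user and the eavesdropper. Linearizing the subtracted term at $p_r$ leaves a concave surrogate whose maximizer has the closed form $p = (1+bp_r)/b - 1/a$, clipped to the feasible interval, so no solver is needed. The model and function names are assumptions, not the patent's.

```python
import math


def secrecy_rate(p, a, b):
    """Toy single-user secrecy rate (hypothetical model, a > 0, b > 0)."""
    return math.log2(1 + a * p) - math.log2(1 + b * p)


def sca_power(a, b, p_max, p0=0.0, max_iter=100, tol=1e-9):
    """Successive convex approximation: linearize, maximize the surrogate, check the stopping rule."""
    p = p0
    history = [secrecy_rate(p, a, b)]
    for _ in range(max_iter):
        # Maximizer of log2(1 + a p) - [log2(1 + b p_r) + b (p - p_r) / ((1 + b p_r) ln 2)],
        # found by setting its derivative to zero, then projected onto [0, p_max].
        p_new = (1 + b * p) / b - 1 / a
        p_new = min(max(p_new, 0.0), p_max)
        history.append(secrecy_rate(p_new, a, b))
        p = p_new
        # Stop once the objective no longer increases by more than the threshold.
        if history[-1] - history[-2] < tol:
            break
    return p, history


p_opt, hist = sca_power(a=10.0, b=2.0, p_max=1.0)
```

With a = 10 and b = 2 the iterates are 0.4, 0.8, and then the clipped value 1.0; full power is optimal here because the user's channel is stronger than the eavesdropper's, and the recorded objective never decreases, mirroring the monotonic behaviour expected of steps 1 to 3.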
Step S3, optimizing the flight trajectory with an iterative reinforcement learning method.
After the power allocation factors are obtained, the flight trajectory is optimized next. The present invention solves this with a reinforcement-learning-based optimization method. Specifically, step S3 further includes:
Step S300, dividing the horizontal target space of the UAV flight trajectory into a grid with cell size $v_m\delta \times v_m\delta$, and converting the grid cells into the reinforcement learning state space $s_n$ according to their coordinates.
Step S301, approximating the continuous action space of the UAV by a discrete action space $a_n$ consisting of five optional actions.
Step S302, defining the sum of the secrecy capacities after the UAV position update as the reward function, and iteratively updating the value function with the following iterative formula:
Figure BDA0002687021550000101
where $Q_n(s_n, a_n)$ is the value function, initialized to all zeros, $R_n$ is the reward function, θ is the learning rate factor, and β is the discount factor; $s_n$ denotes the state at time point n, i.e., the horizontal coordinates, and $a_n$ denotes the action taken by the UAV at time point n.
Step S303, the UAV obtains a new position after each reinforcement learning step. The reinforcement learning update uses a probabilistic greedy algorithm: the optimal action under the current value function is selected with a certain probability, and the remaining probability is distributed evenly among all other, non-optimal actions. The updated power allocation factors are computed with the P1 solution method of step S2, and the value function is updated with the above iterative formula. If the value function has not yet approached the optimal value function, a new position is obtained, the power allocation factors are recomputed, and the iteration continues until, in step S304, the value function approaches the optimal value function.
Step S304, after the UAV has explored for multiple rounds, the value function gradually approaches the optimal value function, and the optimal UAV flight trajectory is finally obtained.
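The trajectory-learning loop of steps S300 to S304 can be sketched in Python as follows. Several parts are assumptions: the fifth action is taken to be hovering (the system description lists forward, backward, left, and right), the update is written in the standard tabular Q-learning form $Q(s_n,a_n) \leftarrow (1-\theta)Q(s_n,a_n) + \theta[R_n + \beta \max_a Q(s_{n+1},a)]$ because the patent's formula is only an image placeholder, each episode starts from a random cell to cover the grid, and `toy_reward` is a hypothetical stand-in for re-solving P1 at the new position and summing the secrecy capacities. A cell size of $v_m\delta$ means that moving one cell per slot never exceeds the UAV's maximum speed.

```python
import random

# Assumed discrete action set: forward, backward, left, right, hover.
ACTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0), (0, 0)]


def to_state(xy, cell):
    """Map a horizontal coordinate to its grid cell; cell = v_m * delta."""
    return (int(xy[0] // cell), int(xy[1] // cell))


def move(s, a, n):
    """Apply action a in state s, keeping the UAV inside the n x n grid."""
    dx, dy = ACTIONS[a]
    return (min(max(s[0] + dx, 0), n - 1), min(max(s[1] + dy, 0), n - 1))


def best_action(q, s):
    """Index of the action with the largest value in state s (first one on ties)."""
    return max(range(len(ACTIONS)), key=q[s].__getitem__)


def choose_action(q, s, p_best, rng):
    """Probabilistic greedy: best action with probability p_best, else uniform over the others."""
    best = best_action(q, s)
    if rng.random() < p_best:
        return best
    return rng.choice([a for a in range(len(ACTIONS)) if a != best])


def q_learning(reward_fn, n, episodes=3000, slots=20, theta=0.5, beta=0.9,
               p_best=0.8, seed=0):
    """Tabular Q-learning over the grid; reward_fn(state) plays the role of R_n."""
    rng = random.Random(seed)
    # Value function initialized to all zeros.
    q = {(x, y): [0.0] * len(ACTIONS) for x in range(n) for y in range(n)}
    for _ in range(episodes):
        s = (rng.randrange(n), rng.randrange(n))
        for _ in range(slots):
            a = choose_action(q, s, p_best, rng)
            s_next = move(s, a, n)
            reward = reward_fn(s_next)
            target = reward + beta * max(q[s_next])
            q[s][a] = (1 - theta) * q[s][a] + theta * target
            s = s_next
    return q


def greedy_trajectory(q, start, n, slots):
    """Roll out the learned greedy policy to obtain the flight trajectory."""
    path = [start]
    for _ in range(slots):
        path.append(move(path[-1], best_action(q, path[-1]), n))
    return path


def toy_reward(s):
    # Hypothetical stand-in for the sum secrecy capacity at cell s:
    # less negative the closer the UAV is to cell (3, 3).
    return -(abs(s[0] - 3) + abs(s[1] - 3))


q_table = q_learning(toy_reward, n=5)
trajectory = greedy_trajectory(q_table, to_state((1.0, 2.0), cell=5.0), n=5, slots=10)
```

In a real implementation, `toy_reward` would be replaced by the P1 solution at the new position, which is the expensive part of each step; the tabular form keeps one value per (cell, action) pair, so memory grows with the number of grid cells.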
Fig. 3 is a system architecture diagram of a multidimensional optimization system of an unmanned aerial vehicle communication system based on reinforcement learning according to the present invention. The invention relates to a multidimensional optimization system of an unmanned aerial vehicle communication system based on reinforcement learning, which comprises the following components:
The model building unit 301 is configured to build the flight trajectory and power allocation optimization problem model of the UAV communication system under the minimum transmission rate constraint.
The unmanned aerial vehicle (UAV) communication system model to which the present invention applies is shown in Fig. 2; it contains one UAV communication base station, K target users, and one eavesdropping user. The UAV base station moves freely within a horizontal target area at height H. The location of the i-th target user is $L_i = [x_i, y_i]^T$, $i \in [1, K]$, and the location of the eavesdropping user is $L_e$. The flight trajectory of the UAV base station over the observation time points can be expressed as:
$W = \{ w[n] = [x[n], y[n]]^T,\ n = 1, 2, \ldots, N \}$
where N is the number of time slots into which a given observation time T is divided, the interval between two adjacent slots is $\delta = T/N$, and $v_m$ is the maximum moving speed of the UAV. The channel fading power from the i-th user to the UAV base station can then be expressed as:
Figure BDA0002687021550000111
where $\rho_o$ is the reference channel power gain at unit distance, and $\alpha \ge 2$ is the channel path-loss exponent. Similarly, the channel fading power from the eavesdropping user to the UAV base station is:
Figure BDA0002687021550000112
Assuming that the UAV communication system employs the non-orthogonal multiple access (NOMA) communication protocol, its downlink transmit signal can be expressed as:
Figure BDA0002687021550000113
where P is the total transmit power of the UAV base station, $x_i$ is the data symbol of the i-th user, and $\xi_i$ is the power allocation factor of the i-th user.
Considering the power constraint, we have
Figure BDA0002687021550000114
Then the received signal of the ith user is:
Figure BDA0002687021550000115
where $n_i$ denotes the receiver noise of the target user. According to successive interference cancellation (SIC) at the NOMA receiver, the signal-to-interference-plus-noise ratio (SINR) of the k-th data stream symbol at the i-th user can be expressed as:
Figure BDA0002687021550000116
In the above formula,
Figure BDA0002687021550000117
denotes the interference term of the k-th data stream. The capacity of the k-th user can then be expressed as:
Figure BDA0002687021550000118
Each user is assumed to be subject to a minimum transmission rate constraint, i.e.:
Figure BDA0002687021550000119
Similarly, for the eavesdropping user, the capacity of the k-th data stream can be expressed as:
Figure BDA00026870215500001110
where:
Figure BDA0002687021550000121
Figure BDA0002687021550000122
By the definition of secrecy capacity, the secrecy capacity of the k-th user is:
Figure BDA0002687021550000123
In summary, the flight trajectory and power allocation optimization problem model under the minimum transmission rate constraint can be expressed as:
(P0):
Figure BDA0002687021550000124
Figure BDA0002687021550000125
Figure BDA0002687021550000126
Figure BDA0002687021550000127
Figure BDA0002687021550000128
The convex optimization solving unit 302 is configured to fix the flight trajectory, reformulate the power allocation strategy optimization subproblem of the established flight trajectory and power allocation optimization problem model of the UAV communication system under the minimum transmission rate constraint, and solve it with a convex optimization method to obtain the power allocation factors.
Because the flight trajectory and power allocation optimization problem model (formula nine) under the minimum transmission rate constraint established by the model building unit 301 is non-convex and difficult to solve directly, it is solved by combining approximate convex optimization with reinforcement learning, i.e., the flight trajectory, the power allocation factors, and other dimensions are jointly optimized by an optimization method that combines convex optimization and reinforcement learning.
Specifically, the convex optimization solving unit 302 further includes:
The model conversion module is used for assuming that the flight trajectory is fixed and reformulating the power allocation strategy optimization subproblem of the flight trajectory and power allocation optimization problem model under the minimum transmission rate constraint as a convex optimization problem in three variables.
First, with the flight trajectory assumed fixed, the power allocation strategy optimization problem is solved.
Considering the minimum transmission rate constraint, it can be transformed into:
Figure BDA0002687021550000131
The objective function of the P0 problem can be approximated as:
Figure BDA0002687021550000132
where
Figure BDA0002687021550000133
Figure BDA0002687021550000134
Figure BDA0002687021550000135
The objective function is still not convex; therefore, the invention applies a first-order Taylor expansion, introduces a slack variable, and further converts the objective function into:
Figure BDA0002687021550000136
where
Figure BDA0002687021550000137
Figure BDA0002687021550000138
are the results obtained in the r-th iteration.
At this point, the power allocation strategy optimization problem can be reformulated as:
(P1):
Figure BDA0002687021550000139
Figure BDA00026870215500001310
Figure BDA00026870215500001311
Figure BDA00026870215500001312
Figure BDA00026870215500001313
Figure BDA00026870215500001314
Thus, the P1 problem is a convex optimization problem in three variables.
The convex optimization solving module is used for solving the power allocation strategy optimization problem converted by the model conversion module with an iterative approximate convex optimization method to obtain the power allocation factors.
In a specific embodiment of the invention, the problem is solved numerically with a standard convex optimization solver such as CVX. The procedure for solving problem P1 is as follows:
Step 1, initialization: obtain the initial power distribution factors from the minimum transmission rate requirements and allocate all remaining power to the strongest user; set the iteration index $r = 0$ and compute $\xi_r[n]$, $I_r[n]$, $I_{e,r}[n]$, $\eta_r[n]$;
Step 2: given $\eta_r[n]$, solve problem P1 with the CVX tool to obtain the updated $\xi_{r+1}[n]$, $I_{r+1}[n]$, $I_{e,r+1}[n]$, $\eta_{r+1}[n]$, and update the iteration index $r = r + 1$;
Step 3: if $r$ reaches the maximum number of iterations or the increase of the objective function of P1 is smaller than a preset threshold $\tau$, stop the iteration; otherwise, repeat Step 2.
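Steps 1 to 3 can be sketched on an assumed single-user toy model: this is not the patent's P1 (whose formulas are not reproduced), and CVX is replaced by the closed-form maximizer of a one-dimensional concave surrogate. The model (legitimate rate $\log_2(1+ax)$, eavesdropper rate $\log_2(1+bx)$, power $x$ bounded by a budget) and the names `sca_maximize`, `tau`, and `max_iter` are hypothetical; `tau` plays the role of the threshold $\tau$ and `max_iter` the maximum iteration number.

```python
import math
from typing import List, Tuple


def sca_maximize(a: float, b: float, p_max: float, r_th: float,
                 tau: float = 1e-6, max_iter: int = 50) -> Tuple[float, List[float]]:
    """Toy iterative convex-approximation loop mirroring Steps 1 to 3.

    Assumed model (illustration only): one user with rate log2(1 + a*x),
    one eavesdropper with rate log2(1 + b*x), transmit power x in
    [x_min, p_max], where x_min is the smallest power meeting rate r_th.
    """
    def objective(x: float) -> float:
        return math.log2(1 + a * x) - math.log2(1 + b * x)

    # Step 1: initialize at the smallest power satisfying the minimum rate.
    x_min = (2 ** r_th - 1) / a
    if x_min > p_max:
        raise ValueError("minimum rate requirement is infeasible under p_max")
    x = x_min
    history = [objective(x)]
    for _ in range(max_iter):
        # Step 2: maximize log2(1 + a*x) minus the first-order Taylor expansion
        # of log2(1 + b*x) at the current x. Setting the derivative to zero
        # gives a / (1 + a*x) = b / (1 + b*x_r); then clip to the feasible set.
        x_stationary = (1 + b * x) / b - 1 / a
        x = min(max(x_stationary, x_min), p_max)
        history.append(objective(x))
        # Step 3: stop when the objective gain falls below tau.
        if history[-1] - history[-2] < tau:
            break
    return x, history


if __name__ == "__main__":
    x_opt, hist = sca_maximize(a=4.0, b=1.0, p_max=2.0, r_th=1.0)
    print(x_opt, [round(h, 4) for h in hist])
```

With a = 4, b = 1, p_max = 2 and r_th = 1, the loop starts at x = 0.25, increases the objective at every iteration, and stops at x = 2.0, the power budget; in this toy the optimum lies at the budget because a > b makes the objective increasing in x.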
The reinforcement learning optimization unit 303 is configured to obtain the optimal flight trajectory with an iterative reinforcement learning method.
Once the power distribution factors have been obtained, the flight trajectory is optimized next. The reinforcement learning optimization unit 303 solves this subproblem with a reinforcement-learning-based optimization method and is specifically configured as follows:
First, the horizontal target space of the flight trajectory of the unmanned aerial vehicle (UAV) is partitioned into a grid with cell size $v_m\delta \times v_m\delta$; the grid cells are mapped, by their coordinates, to the states of the reinforcement learning state space, and the continuous action space of the UAV is approximated by a discrete action space of five optional actions, including forward, backward, left, and right.
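Choosing the cell size $v_m\delta$ means one cell is the farthest the UAV can fly in one slot of length $\delta$ at maximum speed $v_m$, so every single-cell move along an axis respects the speed limit. A minimal sketch of the state and action mapping follows; the fifth action is assumed to be hover, keeping the UAV in place at the area boundary is an assumption, and the function names are hypothetical.

```python
from typing import Dict, Tuple

# Action set: forward/backward/left/right come from the text; the fifth
# action is ASSUMED to be "hover" (stay in the current cell).
ACTIONS: Dict[str, Tuple[int, int]] = {
    "forward": (0, 1),
    "backward": (0, -1),
    "left": (-1, 0),
    "right": (1, 0),
    "hover": (0, 0),
}


def position_to_state(x: float, y: float, v_m: float, delta: float, cols: int) -> int:
    """Map a horizontal position (x, y) to a grid-cell index, cell size v_m*delta."""
    cell = v_m * delta
    return int(y // cell) * cols + int(x // cell)


def step(state: int, action: str, cols: int, rows: int) -> int:
    """Move one cell; a move that would leave the area keeps the UAV in place."""
    row, col = divmod(state, cols)
    dx, dy = ACTIONS[action]
    new_col, new_row = col + dx, row + dy
    if 0 <= new_col < cols and 0 <= new_row < rows:
        return new_row * cols + new_col
    return state
```

For example, with $v_m = 10$ m/s, $\delta = 1$ s and a 100 m by 100 m area (10 by 10 cells), position (25, 37) maps to state 32, and moving right from there gives state 33.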
The sum of the secrecy capacities after the UAV position update is defined as the reward function, and the value function is updated iteratively with the following formula:
[Formula image not reproduced]
where $Q_n(s_n, a_n)$ is the value function, initialized to all zeros, $R_n$ is the reward function, $\theta$ is the learning rate factor, and $\beta$ is the discount factor.
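The update formula itself is only available as an image. Assuming it has the standard tabular Q-learning form $Q(s_n,a_n) \leftarrow (1-\theta)\,Q(s_n,a_n) + \theta\,[R_n + \beta \max_{a} Q(s_{n+1},a)]$ with learning rate $\theta$ and discount factor $\beta$, one update step can be sketched as follows; `q_update` is a hypothetical name, and the table is a plain list of lists indexed by state and action.

```python
from typing import List


def q_update(Q: List[List[float]], s: int, a: int, reward: float,
             s_next: int, theta: float, beta: float) -> None:
    """One tabular Q-learning step (assumed standard form), modifying Q in place.

    Q[s][a] <- (1 - theta) * Q[s][a] + theta * (reward + beta * max_a' Q[s_next][a'])
    """
    target = reward + beta * max(Q[s_next])
    Q[s][a] = (1 - theta) * Q[s][a] + theta * target


if __name__ == "__main__":
    n_states, n_actions = 100, 5
    Q = [[0.0] * n_actions for _ in range(n_states)]  # all-zero initial value function
    q_update(Q, s=32, a=3, reward=1.5, s_next=33, theta=0.1, beta=0.9)
    print(Q[32][3])
```

Starting from an all-zero table, the first update simply stores $\theta R_n$; later updates propagate value backward through the $\beta \max$ term.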
After each reinforcement learning step, the UAV obtains a new updated position. Actions are chosen with a probabilistic greedy strategy: the action that is optimal under the current value function is selected with a certain probability, and the remaining probability is distributed evenly over all other, non-optimal actions. The updated power distribution factors are then computed with the P1 solution method of the convex optimization solving unit 302, and the value function is updated with the iterative formula above. If the value function has not yet approached the optimal value function, a new position is obtained, the power distribution factors are recomputed, and the iteration continues until the value function approaches the optimal value function.
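This probabilistic greedy rule differs slightly from textbook epsilon-greedy: the leftover probability goes only to the non-optimal actions, so each of them is chosen with probability $(1-p)/(|A|-1)$, where $p$ is the greedy probability. A minimal sketch with hypothetical function names, in which ties for the best action are broken by lowest index (an assumption):

```python
import random
from typing import List, Optional


def greedy_index(q_row: List[float]) -> int:
    """Index of the best action under the current value function (ties: lowest index)."""
    return max(range(len(q_row)), key=lambda i: q_row[i])


def action_probabilities(q_row: List[float], p_greedy: float) -> List[float]:
    """Selection probability of each action under the rule described above."""
    best = greedy_index(q_row)
    rest = (1.0 - p_greedy) / (len(q_row) - 1)
    return [p_greedy if i == best else rest for i in range(len(q_row))]


def select_action(q_row: List[float], p_greedy: float,
                  rng: Optional[random.Random] = None) -> int:
    """Greedy action with probability p_greedy, else a uniformly chosen non-optimal one."""
    rng = rng or random.Random()
    best = greedy_index(q_row)
    if rng.random() < p_greedy:
        return best
    return rng.choice([i for i in range(len(q_row)) if i != best])
```

With five actions and a greedy probability of 0.8, the best action is chosen 80% of the time and each of the other four 5% of the time.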
After several rounds of exploration by the UAV, the value function gradually approaches the optimal value function, and the optimal UAV flight trajectory is finally obtained.
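The alternating procedure of units 302 and 303 (move, re-solve the power distribution, compute the secrecy-capacity reward, update the value function) can be sketched end to end as below. Everything scenario-specific is hypothetical: the grid size, the user and eavesdropper cells, the $1/(1+d^2)$ channel gain, the transmit power, the hover action, and the equal-split `solve_power` stand-in for the P1 solver. The sketch also runs a fixed number of episodes instead of testing convergence to the optimal value function.

```python
import math
import random
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

# Hypothetical toy scenario (not from the patent): 10 x 10 grid cells,
# two legitimate users, one eavesdropper, channel gain 1 / (1 + d^2).
COLS = ROWS = 10
USERS: List[Cell] = [(2, 2), (7, 7)]
EAVESDROPPER: Cell = (9, 0)
MOVES: List[Cell] = [(0, 1), (0, -1), (-1, 0), (1, 0), (0, 0)]  # 5th = hover (assumed)


def gain(cell: Cell, target: Cell) -> float:
    return 1.0 / (1.0 + (cell[0] - target[0]) ** 2 + (cell[1] - target[1]) ** 2)


def solve_power(cell: Cell) -> List[float]:
    """Placeholder for the P1 solver: equal power split (assumption)."""
    return [1.0 / len(USERS)] * len(USERS)


def secrecy_sum(cell: Cell, xi: List[float], p_tx: float = 100.0) -> float:
    """Reward: sum over users of [log2(1+SNR_k) - log2(1+SNR_e)]^+ (assumed form)."""
    total = 0.0
    for k, user in enumerate(USERS):
        legit = math.log2(1 + p_tx * xi[k] * gain(cell, user))
        eave = math.log2(1 + p_tx * xi[k] * gain(cell, EAVESDROPPER))
        total += max(legit - eave, 0.0)
    return total


def move(cell: Cell, a: int) -> Cell:
    x, y = cell[0] + MOVES[a][0], cell[1] + MOVES[a][1]
    return (x, y) if 0 <= x < COLS and 0 <= y < ROWS else cell


def best_action(Q: Dict[Tuple[Cell, int], float], cell: Cell) -> int:
    return max(range(len(MOVES)), key=lambda a: Q.get((cell, a), 0.0))


def train(episodes: int = 300, steps: int = 30, theta: float = 0.1,
          beta: float = 0.9, p_greedy: float = 0.8,
          seed: int = 0) -> Dict[Tuple[Cell, int], float]:
    rng = random.Random(seed)
    Q: Dict[Tuple[Cell, int], float] = {}  # missing entries read as 0 (all-zero init)
    for _ in range(episodes):
        cell: Cell = (0, 0)
        for _ in range(steps):
            best = best_action(Q, cell)
            if rng.random() < p_greedy:
                a = best
            else:
                a = rng.choice([i for i in range(len(MOVES)) if i != best])
            nxt = move(cell, a)
            xi = solve_power(nxt)          # re-solve power at the new position
            reward = secrecy_sum(nxt, xi)  # secrecy-capacity-sum reward
            target = reward + beta * Q.get((nxt, best_action(Q, nxt)), 0.0)
            Q[(cell, a)] = (1 - theta) * Q.get((cell, a), 0.0) + theta * target
            cell = nxt
    return Q


def greedy_trajectory(Q: Dict[Tuple[Cell, int], float], start: Cell = (0, 0),
                      steps: int = 30) -> List[Cell]:
    cell, path = start, [start]
    for _ in range(steps):
        cell = move(cell, best_action(Q, cell))
        path.append(cell)
    return path


if __name__ == "__main__":
    path = greedy_trajectory(train())
    print(path[-1], round(secrecy_sum(path[-1], solve_power(path[-1])), 3))
```

Replacing `solve_power` with a real P1 solver is where the convex optimization step of unit 302 would plug into the loop.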
Examples
In this embodiment, 9 target users are assumed to be distributed along a 45-degree diagonal, and the eavesdropping (intercepting) user is located at coordinates (100 ). The resulting flight trajectory is shown in Fig. 2 and the corresponding secrecy capacity in Fig. 3. As Figs. 4 and 5 show, the reinforcement-learning-based multidimensional optimization method reaches an approximately steady state after about 2000 explorations and maintains the optimal capacity sum with high probability.
In summary, the invention provides a multidimensional optimization method and system for an unmanned aerial vehicle communication system based on reinforcement learning, which jointly optimizes multiple dimensions, such as the flight trajectory and the power distribution factors, with an optimization method that combines convex optimization and reinforcement learning.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.

Claims (8)

1. A multidimensional optimization method of an unmanned aerial vehicle communication system based on reinforcement learning is characterized by comprising the following steps:
step S1, establishing a flight path and power distribution optimization problem model of the unmanned aerial vehicle communication system under the constraint of the minimum transmission rate;
the flight trajectory and power distribution optimization problem model under the constraint of the minimum transmission rate is as follows:
[Formula images not reproduced]
where $r_k[n]$ denotes the secrecy capacity of the $k$-th user, $\xi_i[n]$ denotes the power allocation factor of the $i$-th user, $w[n]$ is the location of the target user, $v_m$ is the maximum moving speed of the unmanned aerial vehicle, $N$ is the number of time slots into which an observation time $T$ is divided, the interval between two adjacent time slots is $\delta = T/N$, $R_{k,k}[n]$ denotes the capacity of the $k$-th user, $K$ denotes the target users, and $R_k^{th}$ denotes the minimum transmission rate requirement of the $k$-th user;
step S2, fixing the flight trajectory, reformulating the power distribution strategy optimization problem of the established flight trajectory and power distribution optimization problem model of the unmanned aerial vehicle communication system under the minimum transmission rate constraint, and solving it with a convex optimization method to obtain the power distribution factors;
step S200, assuming that the flight trajectory is fixed, reformulating the power distribution strategy optimization problem of the flight trajectory and power distribution optimization problem model under the minimum transmission rate constraint as a convex optimization problem in three variables:
(P1):
[Formula images not reproduced]
the P1 problem is a convex optimization problem with respect to three variables;
step S201, solving the power distribution strategy optimization problem converted in step S200 with an iterative approximate convex optimization method to obtain the power distribution factors;
and step S3, obtaining the optimal flight trajectory with an iterative reinforcement learning method.
2. The multidimensional optimization method of an unmanned aerial vehicle communication system based on reinforcement learning according to claim 1, wherein in step S200, the objective function $r_k[n]$ is converted into a convex function $r_k^{lb}[n]$, so that the power distribution strategy optimization problem of the flight trajectory and power distribution optimization problem model under the minimum transmission rate constraint is reformulated as a convex optimization problem in three variables.
3. The method as claimed in claim 2, wherein in step S200, a first-order Taylor expansion is applied and a slack (relaxation) variable is introduced to convert the objective function into a convex function, so that the power distribution strategy optimization problem of the flight trajectory and power distribution optimization problem model under the minimum transmission rate constraint is reformulated as a convex optimization problem P1 in three variables.
4. The method as claimed in claim 3, wherein in step S201, the process of solving the P1 problem includes:
step 1, obtaining the initial power distribution factors from the minimum transmission rate requirements and allocating all remaining power to the strongest user; initializing the iteration index $r = 0$ and computing $\xi_r[n]$, $I_r[n]$, $I_{e,r}[n]$, $\eta_r[n]$;
step 2, given $\eta_r[n]$, solving problem P1 with a standard convex optimization solver to obtain the updated $\xi_{r+1}[n]$, $I_{r+1}[n]$, $I_{e,r+1}[n]$, $\eta_{r+1}[n]$, and updating the iteration index $r = r + 1$;
step 3, if $r$ reaches the maximum number of iterations or the increase of the objective function of P1 is smaller than a preset threshold $\tau$, stopping the iteration; otherwise, repeating step 2.
5. The method for multi-dimensional optimization of unmanned aerial vehicle communication system based on reinforcement learning of claim 4, wherein step S3 further comprises:
step S300, partitioning the horizontal target space of the flight trajectory of the unmanned aerial vehicle into a grid with cell size $v_m\delta \times v_m\delta$, mapping the grid cells to the states of the reinforcement learning state space by their coordinates, and approximating the continuous action space of the unmanned aerial vehicle by a discrete action space of five optional actions;
step S301, defining the sum of the secrecy capacities after the position update of the unmanned aerial vehicle as the reward function, and iteratively updating the value function;
step S302, obtaining a new position after each reinforcement learning step of the unmanned aerial vehicle, computing the updated power distribution factors with the P1 solution method of step S2, and iteratively updating the value function;
and step S303, after several rounds of exploration by the unmanned aerial vehicle, the value function gradually approaching the optimal value function, so that the optimal flight trajectory of the unmanned aerial vehicle is finally obtained.
6. The method for multidimensional optimization of unmanned aerial vehicle communication system based on reinforcement learning of claim 5, wherein in step S301, the value function is iteratively updated according to the following iterative formula:
[Formula image not reproduced]
where $Q_n(s_n, a_n)$ is the value function, initialized to all zeros, $R_n$ is the reward function, $\theta$ is the learning rate factor, and $\beta$ is the discount factor.
7. A multidimensional optimization system of an unmanned aerial vehicle communication system based on reinforcement learning is characterized by comprising:
the model building unit is used for building a flight trajectory and power distribution optimization problem model of the unmanned aerial vehicle communication system under the constraint of the minimum transmission rate;
the flight trajectory and power distribution optimization problem model under the constraint of the minimum transmission rate is as follows:
(P0):
[Formula images not reproduced]
where $r_k[n]$ denotes the secrecy capacity of the $k$-th user, $\xi_i[n]$ denotes the power allocation factor of the $i$-th user, $w[n]$ is the location of the target user, $v_m$ is the maximum moving speed of the unmanned aerial vehicle, $N$ is the number of time slots into which an observation time $T$ is divided, the interval between two adjacent time slots is $\delta = T/N$, $R_{k,k}[n]$ denotes the capacity of the $k$-th user, $K$ denotes the target users, and $R_k^{th}$ denotes the minimum transmission rate requirement of the $k$-th user;
the convex optimization solving unit, configured to fix the flight trajectory, reformulate the power distribution strategy optimization problem of the established flight trajectory and power distribution optimization problem model of the unmanned aerial vehicle communication system under the minimum transmission rate constraint, and solve it with a convex optimization method to obtain the power distribution factors;
assuming that the flight trajectory is fixed, reformulating the power distribution strategy optimization problem of the flight trajectory and power distribution optimization problem model under the minimum transmission rate constraint as a convex optimization problem in three variables:
(P1):
[Formula images not reproduced]
the P1 problem is a convex optimization problem with respect to three variables;
solving the power distribution strategy optimization problem converted in step S200 with an iterative approximate convex optimization method to obtain the power distribution factors;
and the reinforcement learning optimization unit, configured to obtain the optimal flight trajectory with an iterative reinforcement learning method.
8. The system of claim 7, wherein the reinforcement learning optimization unit is specifically configured to:
first, partitioning the horizontal target space of the flight trajectory of the unmanned aerial vehicle (UAV) into a grid with cell size $v_m\delta \times v_m\delta$, mapping the grid cells to the states of the reinforcement learning state space by their coordinates, and approximating the continuous action space of the UAV by a discrete action space of five optional actions;
defining the sum of the secrecy capacities after the position update of the UAV as the reward function for iteratively updating the value function;
obtaining a new position after each reinforcement learning step of the unmanned aerial vehicle, computing the updated power distribution factors with the P1 solution method of the convex optimization solving unit, and iteratively updating the value function;
after several rounds of exploration by the unmanned aerial vehicle, the value function gradually approaches the optimal value function, and the optimal flight trajectory of the unmanned aerial vehicle is finally obtained;
the solving process of the P1 problem comprises the following steps:
step 1, obtaining the initial power distribution factors from the minimum transmission rate requirements and allocating all remaining power to the strongest user; initializing the iteration index $r = 0$ and computing $\xi_r[n]$, $I_r[n]$, $I_{e,r}[n]$, $\eta_r[n]$;
step 2, given $\eta_r[n]$, solving problem P1 with a standard convex optimization solver to obtain the updated $\xi_{r+1}[n]$, $I_{r+1}[n]$, $I_{e,r+1}[n]$, $\eta_{r+1}[n]$, and updating the iteration index $r = r + 1$;
step 3, if $r$ reaches the maximum number of iterations or the increase of the objective function of P1 is smaller than a preset threshold $\tau$, stopping the iteration; otherwise, repeating step 2.
CN202010991491.6A 2020-09-17 2020-09-17 Multi-dimensional optimization method and system of unmanned aerial vehicle communication system based on reinforcement learning Active CN112235810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010991491.6A CN112235810B (en) 2020-09-17 2020-09-17 Multi-dimensional optimization method and system of unmanned aerial vehicle communication system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010991491.6A CN112235810B (en) 2020-09-17 2020-09-17 Multi-dimensional optimization method and system of unmanned aerial vehicle communication system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112235810A CN112235810A (en) 2021-01-15
CN112235810B true CN112235810B (en) 2021-07-09

Family

ID=74108006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010991491.6A Active CN112235810B (en) 2020-09-17 2020-09-17 Multi-dimensional optimization method and system of unmanned aerial vehicle communication system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112235810B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113423060B (en) * 2021-06-22 2022-05-10 Guangdong University of Technology Online optimization method for flight route of unmanned aerial communication platform
CN115642949B (en) * 2022-10-11 2024-05-03 North China Electric Power University C-NOMA enabled 6G heterogeneous network unmanned aerial vehicle track optimization method
CN116704823B (en) * 2023-06-12 2023-12-19 Dalian University of Technology Unmanned aerial vehicle intelligent track planning and general sense resource allocation method based on reinforcement learning

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108521667B (en) * 2018-03-07 2020-12-29 Hangzhou Dianzi University Unmanned aerial vehicle data transmission method with low transmission energy consumption
CN108848465B (en) * 2018-08-15 2020-10-30 Army Engineering University of PLA Unmanned aerial vehicle flight trajectory and resource scheduling joint optimization method oriented to data distribution
CN109151718A (en) * 2018-09-17 2019-01-04 Nanchang University Unmanned plane efficiency maximum resource distribution method based on safety of physical layer
CN110380773B (en) * 2019-06-13 2021-10-29 Guangdong University of Technology Trajectory optimization and resource allocation method of unmanned aerial vehicle multi-hop relay communication system
CN110381445B (en) * 2019-06-28 2021-01-15 Guangdong University of Technology Resource allocation and flight trajectory optimization method based on unmanned aerial vehicle base station system
CN110488861B (en) * 2019-07-30 2020-08-28 Beijing University of Posts and Telecommunications Unmanned aerial vehicle track optimization method and device based on deep reinforcement learning and unmanned aerial vehicle
CN110856191A (en) * 2019-10-24 2020-02-28 Guangdong University of Technology Unmanned aerial vehicle track optimization method based on wireless communication
CN111182469B (en) * 2020-01-07 2021-04-16 Southeast University Energy collection network time distribution and unmanned aerial vehicle track optimization method
CN111562797B (en) * 2020-07-06 2021-07-30 Beijing Institute of Technology Unmanned aerial vehicle flight time optimal real-time trajectory optimization method capable of ensuring convergence

Also Published As

Publication number Publication date
CN112235810A (en) 2021-01-15

Similar Documents

Publication Publication Date Title
CN112235810B (en) Multi-dimensional optimization method and system of unmanned aerial vehicle communication system based on reinforcement learning
CN108924936B (en) Resource allocation method of unmanned aerial vehicle-assisted wireless charging edge computing network
CN111628855B (en) Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning
CN110730031B (en) Unmanned aerial vehicle track and resource allocation joint optimization method for multi-carrier communication
CN112351503A (en) Task prediction-based multi-unmanned-aerial-vehicle-assisted edge computing resource allocation method
CN110311715B (en) Large-scale MIMO non-orthogonal unicast and multicast transmission power distribution method with optimal energy efficiency
CN114585006B (en) Edge computing task unloading and resource allocation method based on deep learning
CN111586718A (en) Fountain code design method for unmanned aerial vehicle relay communication system
CN116156563A (en) Heterogeneous task and resource end edge collaborative scheduling method based on digital twin
CN113552898B (en) Unmanned aerial vehicle robust trajectory planning method under uncertain interference environment
CN117369495A (en) Unmanned aerial vehicle formation track planning method based on model predictive control
CN115086964A (en) Dynamic spectrum allocation method and system based on multi-dimensional vector space optimization
Siddiqi et al. Deep reinforcement based power allocation for the max-min optimization in non-orthogonal multiple access
Hao et al. Topology optimised fixed‐time consensus for multi‐UAV system in a multipath fading channel
Orlov et al. Simulation of devices mobility to estimate wireless channel quality metrics in 5G networks
CN117614520A (en) Method for optimizing large-scale MIMO (multiple input multiple output) resources by removing cells based on unmanned aerial vehicle-satellite cooperation
CN115633320B (en) Multi-unmanned aerial vehicle assisted data acquisition and return method, system, equipment and medium
CN113543271B (en) Effective capacity-oriented resource allocation method and system
CN114599102A (en) Method for unloading linear dependent tasks of edge computing network of unmanned aerial vehicle
CN112752290B (en) Method and equipment for predicting data traffic of wireless base station
CN114980205A (en) QoE (quality of experience) maximization method and device for multi-antenna unmanned aerial vehicle video transmission system
Liu et al. Joint optimization of resource scheduling and mobility for UAV-assisted vehicle platoons
Kim et al. RL-based transmission completion time minimization with energy harvesting for time-varying channels
CN115866638A (en) Rate optimization method for uplink rate division multiple access system
Si et al. Uav-assisted semantic communication with hybrid action reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211208

Address after: 200233 room 805, No. 418, Guiping Road, Xuhui District, Shanghai

Patentee after: Shanghai jianeng Intelligent Technology Co.,Ltd.

Address before: 510000 Qingshan Lake, Shawan Town, Panyu District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU PANYU POLYTECHNIC

TR01 Transfer of patent right

Effective date of registration: 20220111

Address after: No.398, Xinlu Road, Xinbang Town, Songjiang District, Shanghai, 201605

Patentee after: Shanghai Gala Information Technology Co.,Ltd.

Address before: 200233 room 805, No. 418, Guiping Road, Xuhui District, Shanghai

Patentee before: Shanghai jianeng Intelligent Technology Co.,Ltd.

TR01 Transfer of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Multi dimensional optimization method and system of UAV Communication System Based on Reinforcement Learning

Effective date of registration: 20220506

Granted publication date: 20210709

Pledgee: The Bank of Shanghai branch Caohejing Limited by Share Ltd.

Pledgor: Shanghai Gala Information Technology Co.,Ltd.

Registration number: Y2022980005170

PE01 Entry into force of the registration of the contract for pledge of patent right
PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230727

Granted publication date: 20210709

Pledgee: The Bank of Shanghai branch Caohejing Limited by Share Ltd.

Pledgor: Shanghai Gala Information Technology Co.,Ltd.

Registration number: Y2022980005170

PC01 Cancellation of the registration of the contract for pledge of patent right