CN111786713A - Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning


Info

Publication number: CN111786713A
Authority: CN (China)
Prior art keywords: unmanned aerial vehicle, network, ground, base station
Prior art date: 2020-06-04
Legal status: Granted
Application number: CN202010497656.4A
Other languages: Chinese (zh)
Other versions: CN111786713B (en)
Inventors: 刘中豪, 覃振权, 卢炳先, 王雷, 朱明
Current Assignee: Dalian University of Technology
Original Assignee: Dalian University of Technology
Priority date: 2020-06-04
Filing date: 2020-06-04
Application filed by Dalian University of Technology
Priority to CN202010497656.4A
Publication of CN111786713A: 2020-10-16
Application granted; publication of CN111786713B: 2021-06-08
Current legal status: Active

Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04B: TRANSMISSION
          • H04B 7/00: Radio transmission systems, i.e. using radiation field
            • H04B 7/14: Relay systems
              • H04B 7/15: Active relay systems
                • H04B 7/185: Space-based or airborne stations; stations for satellite systems
                  • H04B 7/18502: Airborne stations
                  • H04B 7/18506: Communications with or from aircraft, i.e. aeronautical mobile service
        • H04W: WIRELESS COMMUNICATION NETWORKS
          • H04W 16/00: Network planning, e.g. coverage or traffic planning tools; network deployment, e.g. resource partitioning or cell structures
            • H04W 16/18: Network planning tools
            • H04W 16/22: Traffic simulation tools or models
    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00: Computing arrangements based on biological models
            • G06N 3/02: Neural networks
              • G06N 3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
      • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
          • Y02D 30/00: Reducing energy consumption in communication networks
            • Y02D 30/70: Reducing energy consumption in wireless communication networks


Abstract

An unmanned aerial vehicle (UAV) network hovering position optimization method based on multi-agent deep reinforcement learning. First, the channel model, coverage model and energy loss model of the UAV-to-ground communication scenario are built. The throughput maximization problem of the UAV-to-ground communication network is then modeled as a partially observable Markov decision process. Local observations and instantaneous rewards are obtained through continuous interaction between the UAVs and the environment, and centralized training on this information yields a distributed policy network. The policy network is deployed to each UAV, so that each UAV can decide a movement direction and distance from its own local observations, adjust its hovering position, and cooperate in a distributed manner. The invention further introduces proportional fair scheduling and the UAVs' energy consumption into the instantaneous reward function, which improves throughput, guarantees fair service to ground users, reduces energy loss, and enables the UAV swarm to adapt to a dynamic environment.

Description

Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
Technical Field
The invention relates to the technical field of wireless communication, and in particular to a multi-UAV network hovering position optimization method based on multi-agent deep reinforcement learning.
Background
In recent years, owing to the high mobility, easy deployment and low cost of unmanned aerial vehicles (UAVs), UAV-based communication has drawn wide attention and become a new research hotspot in the field of wireless communication. UAV-assisted communication has three main application scenarios: a UAV serves as a mobile base station providing communication coverage for infrastructure-scarce or post-disaster areas; a UAV serves as a relay node providing a wireless connection between two communication nodes that are too far apart to connect directly; and UAV-based data distribution and collection. The present invention is primarily directed at the first scenario, in which the hovering positions of the UAVs determine the coverage performance and throughput of the entire UAV network. The ground devices served by the UAV network may be mobile, so the UAVs need to keep adjusting their hovering positions to maintain optimal performance.
In 2018, Qingqing Wu et al. proposed a UAV path planning scheme for a multi-UAV-to-ground communication system in the paper "Joint Trajectory and Communication Design for Multi-UAV Enabled Wireless Networks". The scheme divides time into multiple periods with identical UAV movement trajectories in each period, and in each time slot a UAV base station serves one specific ground user. It models the optimization problem as a mixed-integer program and solves it with block coordinate descent and successive convex optimization techniques, obtaining the optimal hovering position for each time slice within a period and maximizing the downlink throughput to the ground users. However, the solution in that paper applies only to a static environment: it assumes the ground devices are not mobile, and it does not suit scenarios where ground users keep moving. Chi Harold Liu et al. proposed a deep reinforcement learning-based UAV path planning algorithm in the paper "Energy-Efficient UAV Control for Effective and Fair Communication Coverage": a decision model is trained by deep reinforcement learning and outputs the UAVs' next decision (movement direction and distance) from the current state. Their method achieves fair wireless coverage over a large area while reducing the UAVs' energy consumption as much as possible. However, it only considers the coverage performance of the UAV network, and its fairness is coarse-grained coverage fairness over areas rather than fine-grained fairness over users. In addition, it is a centralized solution that requires a controller to collect the information of all UAVs in every time slot to make decisions.
In summary, existing UAV path planning techniques for UAV base station ground communication networks have the following main defects: (1) the dynamics of the environment, i.e. the mobility of ground users, are not taken into account; (2) the centralized algorithms adopted depend on global information and centralized control, which is difficult in large-scale scenarios, so a distributed control strategy is needed in which each UAV base station decides using only the information it obtains itself; (3) service fairness at the user level is ignored. Because of these defects, existing UAV trajectory optimization methods for UAV networks cannot be applied in real communication environments.
Disclosure of Invention
The invention aims to provide a multi-UAV hovering position optimization method based on multi-agent reinforcement learning to solve the above technical problems.
The technical scheme of the invention is as follows:
An unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning comprises the following steps:
(1) The method for establishing the multi-UAV communication network model comprises the following four steps:
(1.1) establishing a scene model: a square target area with side length l is established, containing N ground users and M unmanned aerial vehicle base stations (UAV-BSs) that provide communication services to the ground users. Time is divided into T equal slots; between one slot and the next, a ground user may stay still or move, so each drone base station needs to find a new optimal hovering position in every slot and, after reaching the target position, select ground users for data transmission service.
(1.2) establishing an air-to-ground communication model: the invention models the channel between a drone base station and a ground user with an air-to-ground channel model; owing to its high flying altitude, a drone base station establishes a line-of-sight (LoS) link with a ground user more easily than a terrestrial base station. Under the LoS condition, the path loss between drone base station m and ground user n is:
L_{n,m}(t) = η ( 4π f_c d_{n,m}(t) / c )^α
where η denotes the excess path loss coefficient, c the speed of light, f_c the subcarrier frequency, and α the path loss exponent;
d_{n,m}(t) = sqrt( r_{n,m}(t)² + h² )
represents the distance between drone base station m and ground user n, with r_{n,m}(t) the horizontal distance and h the fixed flying height of the drone base station. The channel gain can be expressed through the path loss as
g_{n,m}(t) = 1 / L_{n,m}(t)
According to the channel gain, the data transmission rate between drone base station m and ground user n in time slot t is:
R_{n,m}(t) = log₂( 1 + p_t g_{n,m}(t) / σ² )
where σ² represents the additive white Gaussian noise power, p_t the transmit power of the drone base station, and g_{n,m}(t) the channel gain between drone base station m and ground user n at time t.
(1.3) establishing a coverage model: due to hardware limitations, the coverage of each drone base station is limited. The invention defines a maximum tolerable path loss L_max: if at a given moment the path loss between a drone base station and a user is less than L_max, the established connection is considered reliable; otherwise it is considered failed. The effective coverage of each drone base station can therefore be defined from the maximum tolerable path loss as the disc centred at the drone base station's projection point on the ground with radius R_cov; according to the path loss formula, R_cov can be expressed as:
R_cov = sqrt( ( (c / (4π f_c)) · (L_max / η)^(1/α) )² − h² )
(1.4) establishing an energy loss model: the invention focuses mainly on the energy loss caused by drone movement; given the drone's flying speed V and flying power p_f, the flight energy consumption of drone base station m in time slot t depends on the distance flown:
Δe_m(t) = (p_f / V) · sqrt( (x_m(t+1) − x_m(t))² + (y_m(t+1) − y_m(t))² )
where x_m(t) and y_m(t) respectively represent the drone's x-axis and y-axis position coordinates on the horizontal plane.
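For concreteness, the following is a minimal Python sketch of the models in steps (1.2) to (1.4); all constant values (carrier frequency, flying height, powers, the linear-scale L_max) are illustrative assumptions rather than parameters fixed by the invention.

```python
import numpy as np

# Illustrative constants (assumed values, not prescribed by the invention)
ETA = 1.6        # excess path loss coefficient eta (LoS)
F_C = 2.0e9      # subcarrier frequency f_c [Hz]
C = 3.0e8        # speed of light c [m/s]
ALPHA = 2.0      # path loss exponent alpha
H = 100.0        # fixed flying height h [m]
P_T = 1.0        # transmit power p_t [W]
SIGMA2 = 1e-13   # noise power sigma^2 [W]
L_MAX = 5.0e8    # maximum tolerable path loss (linear scale, assumed)
P_F = 50.0       # flying power p_f [W]
V = 10.0         # flying speed V [m/s]

def path_loss(r_horizontal):
    """LoS path loss L_{n,m} between a drone base station and a ground user."""
    d = np.sqrt(r_horizontal ** 2 + H ** 2)          # 3-D distance d_{n,m}
    return ETA * (4 * np.pi * F_C * d / C) ** ALPHA

def rate(r_horizontal):
    """Data rate R_{n,m} [bit/s/Hz] from the channel gain g = 1 / L."""
    g = 1.0 / path_loss(r_horizontal)
    return np.log2(1.0 + P_T * g / SIGMA2)

def coverage_radius():
    """Effective coverage radius R_cov implied by L_max."""
    d_max = (C / (4 * np.pi * F_C)) * (L_MAX / ETA) ** (1.0 / ALPHA)
    return np.sqrt(max(d_max ** 2 - H ** 2, 0.0))

def flight_energy(pos_old, pos_new):
    """Flight energy of one slot: power p_f times flight time (distance / V)."""
    dist = np.linalg.norm(np.asarray(pos_new) - np.asarray(pos_old))
    return P_F * dist / V
```

Under these assumed constants, coverage_radius() evaluates to roughly 186 m; in practice the constants would be fitted to the deployment scenario.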
(2) Modeling the problem as a partially observable Markov decision process:
Each drone base station is treated as an agent; in each time slot with environment state s(t), agent m can only obtain the local observation o_m within its own coverage, and selects an action a_m from the action set A according to its decision function u_m(o_m), so as to maximize the total expected discounted reward
G_m = E[ Σ_{t=1}^{T} γ^(t−1) r_m(t) ]
where γ ∈ (0,1) is the discount coefficient and r_m(t) represents the reward of agent m at time t;
The system state set is S = { s(t) | s(t) = (S_u(t), S_g(t)) }, containing the current state of the drone base stations S_u(t) = { s_1^u(t), …, s_M^u(t) } and the current state of the ground users S_g(t) = { s_1^g(t), …, s_N^g(t) }. Each drone base station state s_m^u(t) contains the drone's current position information; each ground user state s_n^g(t) contains the current ground user's location information.
In time slot t, after obtaining its current local observation, drone m makes a decision a_m(t) and moves to the next hovering position, so the action set includes the flight rotation angle θ(t) and the movement distance d(t).
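As a small illustration of the action semantics, the sketch below applies a_m(t) = (θ(t), d(t)) to a hover position; clipping to the l × l target area is an added assumption.

```python
import numpy as np

def apply_action(pos, theta, dist, l=1000.0):
    """Move a drone from `pos` by `dist` metres in direction `theta` (radians);
    clipping keeps it inside the l x l target area (assumed behaviour)."""
    x = np.clip(pos[0] + dist * np.cos(theta), 0.0, l)
    y = np.clip(pos[1] + dist * np.sin(theta), 0.0, l)
    return np.array([x, y])
```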
System instantaneous reward r(t): the objective here is to maximize the throughput of the drone network while taking user service fairness and energy consumption into account. The extra throughput generated by adjusting the drones' hovering positions at each time t is a positive reward, expressed as:
ΔC(t) = C(S_u(t+1), S_g(t)) − C(S_u(t), S_g(t))
where C(S_u(t), S_g(t)) denotes the throughput generated by the network when the drone base station state is S_u(t) and the ground user state is S_g(t), and C(S_u(t+1), S_g(t)) the throughput when the drone base station state is S_u(t+1) and the ground user state is S_g(t). Considering fairness of user service: if a large number of users gather in one area while another area holds only one user, drone base stations pursuing maximum throughput would always hover over the high-density area and ignore the low-density one; the invention therefore applies a weight w_n(t) to each user's throughput reward to implement proportional fair scheduling. R_req denotes the minimum communication rate required by a ground user, and R_n(t) the average communication rate of ground user n from the start phase up to time t:
w_n(t) = R_req / R_n(t)
R_n(t) = (1/t) Σ_{τ=1}^{t} Σ_{m=1}^{M} a_{n,m}(τ) R_{n,m}(τ)
When a drone base station serves this user, R_n(t) increases and the user's weight gradually decreases; if the user is not served, R_n(t) decreases and the user's weight increases. The reward weight of user-sparse areas therefore grows continuously, attracting drone base stations to serve them. Here a_{n,m}(t) is an indicator variable: at time t, if drone base station m serves ground user n then a_{n,m}(t) = 1, otherwise a_{n,m}(t) = 0. Comprehensively considering the fairness throughput reward and the energy loss penalty, the invention gives the system instantaneous reward r(t) as:
r(t) = Σ_{n=1}^{N} w_n(t) ΔC_n(t) − α Σ_{m=1}^{M} Δe_m(t)
where ΔC_n(t) is user n's share of the throughput gain ΔC(t), and α represents the weight of the energy consumption penalty: the larger α is, the more the system attends to energy loss when making decisions; conversely, the more energy loss is ignored.
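A sketch of the proportional-fair bookkeeping described above; the running-average update of R_n(t) and the per-user split of the throughput gain follow the reconstruction given in the text and should be read as assumptions, not as the patent's exact formulas.

```python
import numpy as np

def update_weights(avg_rate, slot_rate, t, r_req=1.0, eps=1e-6):
    """One-slot update of the average rates R_n(t) and fairness weights w_n(t).

    avg_rate:  R_n(t-1) for every user n (array of shape [N])
    slot_rate: rate each user received in slot t (0 for unserved users)
    Served users see R_n(t) rise and their weight fall; unserved users see
    R_n(t) decay and their weight grow, attracting drone base stations.
    """
    avg_rate = ((t - 1) * avg_rate + slot_rate) / t   # running average R_n(t)
    weights = r_req / (avg_rate + eps)                # assumed w_n(t) = R_req / R_n(t)
    return avg_rate, weights

def instant_reward(weights, delta_c_per_user, move_energy, alpha=0.1):
    """r(t): fairness-weighted throughput gain minus the alpha-weighted energy penalty."""
    return float(np.dot(weights, delta_c_per_user) - alpha * np.sum(move_energy))
```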
Local observation set O(t) = { o_1(t), …, o_M(t) }: when multiple drone base stations work cooperatively over a wide area, no drone can observe global information; each can only observe the ground user information within its own coverage. o_m(t) is the position information of the ground users within the coverage of drone base station m observed at time t.
(3) Training based on the multi-agent deep reinforcement learning algorithm:
The multi-agent deep reinforcement learning algorithm MADDPG is introduced into the hovering position optimization of the drones in the ground communication network, adopting a centralized training, distributed execution architecture: global information is used during training to better guide the gradient updates of each drone's decision function, while during execution each drone makes its next decision using only the local information it observes itself, which better matches the needs of actual scenarios. Each agent is trained with a DDPG network of Actor-Critic structure: the policy network is used to fit a policy function u(o), taking a local observation o as input and outputting an action a; the evaluation network is used to fit a state-action function Q(s, a) representing the expected reward of taking action a when the system state is s. Let u = { u_1, …, u_M } denote the deterministic policy functions of the M agents and θ^u = { θ^u_1, …, θ^u_M } the parameters of their policy networks; let Q = { Q_1, …, Q_M } denote the evaluation networks of the M agents and θ^Q = { θ^Q_1, …, θ^Q_M } their parameters. Step (3) comprises:
(3.1) initializing the experience replay space, setting its size B, and initializing the parameters of each DDPG network, the number of training rounds, and the other hyperparameters.
(3.2) starting from training round epoch = 1 and time t = 1.
(3.3) acquiring the drones' current local observations o and the current state s of the whole system. Each drone m uses the local observation obtained in time slot t and outputs its decision a_m based on an ε-greedy strategy and its DDPG network, adjusts its hovering position, and, according to the path loss between itself and the ground users, greedily selects the W ground users with the lowest path loss for communication service, obtaining an instantaneous reward r, reaching the next system state s′, and obtaining the new local observations o′. Store (s, o, a, r, s′, o′) as a sample in the experience replay space, where a = { a_1, …, a_M } denotes the joint action of all drones and o = { o_1, …, o_M } the local observations of all drones; set t = t + 1.
(3.4) if the number of samples stored in the replay space is greater than B, go to step (3.5); otherwise, continue collecting samples and return to step (3.3).
(3.5) for each agent m, randomly sampling a fixed number K of samples from the experience replay space and calculating the target values, where for the k-th sample (s_k, o_k, a_k, r_k, s′_k, o′_k) the target value y_k can be expressed as:
y_k = r_k + γ Q′_m(s′_k, a′_1, …, a′_M), with a′_j = u′_j(o′_{j,k})
where Q′_m represents the target network of the m-th agent's evaluation network, u′_m the target network of the m-th agent's policy network, r_k the instantaneous reward in the k-th sample, and a′_m the decision made by drone m in system state s′_k according to its local observation o′_{m,k}. Using gradient descent based on global information, minimize the loss function
L(θ^Q_m) = (1/K) Σ_{k=1}^{K} ( y_k − Q_m(s_k, a_{1,k}, …, a_{M,k}) )²
and update the parameters of the agent's evaluation network. Then, according to the evaluation network and the sample information, update the parameters of the agent's policy network based on the sampled policy gradient:
∇_{θ^u_m} J ≈ (1/K) Σ_{k=1}^{K} ∇_{θ^u_m} u_m(o_{m,k}) ∇_{a_m} Q_m(s_k, a_{1,k}, …, a_m, …, a_{M,k}) |_{a_m = u_m(o_{m,k})}
(3.6) at a fixed interval of rounds, updating the target network parameters θ^{Q′} and θ^{u′}: θ^{Q′} = τ θ^Q + (1 − τ) θ^{Q′}, θ^{u′} = τ θ^u + (1 − τ) θ^{u′}. When the total duration T is reached or a drone's energy is exhausted, exit the current training round; otherwise return to step (3.3). If the number of training rounds is used up, exit the training process; otherwise enter a new training round.
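Steps (3.5) and (3.6) can be sketched in PyTorch as follows. The MLP architectures and sizes are illustrative assumptions, and the optimizers (e.g. one torch.optim.Adam per network) are assumed to be supplied by the caller; the update logic itself follows the standard MADDPG scheme named in step (3).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Policy network u_m: local observation -> action, squashed to [-1, 1]."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Centralized evaluation network Q_m: global state + joint action -> value."""
    def __init__(self, state_dim, joint_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, state, joint_action):
        return self.net(torch.cat([state, joint_action], dim=-1))

def maddpg_update(m, actors, critics, t_actors, t_critics,
                  a_opts, c_opts, batch, gamma=0.95, tau=0.01):
    """One update of agent m on a sampled minibatch (steps 3.5 and 3.6).

    batch = (s, o, a, r, s2, o2): s, s2 are [B, state_dim]; o, o2 are
    [B, M, obs_dim]; a is the flattened joint action [B, M * act_dim];
    r is agent m's reward [B, 1].
    """
    s, o, a, r, s2, o2 = batch
    M = len(actors)
    # Target value y_k = r_k + gamma * Q'_m(s', a'_1, ..., a'_M), a'_j = u'_j(o'_j)
    with torch.no_grad():
        a2 = torch.cat([t_actors[j](o2[:, j]) for j in range(M)], dim=-1)
        y = r + gamma * t_critics[m](s2, a2)
    # Critic step: minimize the mean squared error (y_k - Q_m(s_k, a_k))^2
    c_loss = F.mse_loss(critics[m](s, a), y)
    c_opts[m].zero_grad(); c_loss.backward(); c_opts[m].step()
    # Actor step: ascend Q_m with respect to agent m's own action only
    a_new = [actors[j](o[:, j]).detach() for j in range(M)]
    a_new[m] = actors[m](o[:, m])            # keep the gradient path for agent m
    a_loss = -critics[m](s, torch.cat(a_new, dim=-1)).mean()
    a_opts[m].zero_grad(); a_loss.backward(); a_opts[m].step()
    # Step 3.6: soft update of the target networks with rate tau
    for net, t_net in ((critics[m], t_critics[m]), (actors[m], t_actors[m])):
        for p, tp in zip(net.parameters(), t_net.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)
```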
(4) distributing the trained policy networks u to the drones and deploying them to the target area; in every time slot, each drone adjusts its hovering position according to its own local observation and provides communication service to the ground users.
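After training, step (4) runs fully distributed on board each drone. Below is a minimal sketch of one execution slot, assuming the tanh-squashed actor outputs are rescaled to an angle in [−π, π] and a bounded step length of 50 m (both assumed conventions).

```python
import math
import torch

def hover_step(actor, local_obs, pos, l=1000.0):
    """One execution time slot for a single drone: local observation in,
    new hover position out; no global information is needed."""
    with torch.no_grad():
        act = actor(torch.as_tensor(local_obs, dtype=torch.float32))
    theta = float(act[0]) * math.pi              # rotation angle in [-pi, pi]
    dist = (float(act[1]) + 1.0) / 2.0 * 50.0    # distance in [0, 50] m (assumed cap)
    return apply_action(pos, theta, dist, l)     # reuses the sketch from step (2)
```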
The invention has the following beneficial effects: it provides an unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning, which models the throughput maximization problem of the UAV-to-ground communication network scenario as a partially observable Markov decision process and introduces the multi-agent deep reinforcement learning method MADDPG for centralized training and distributed execution, solving the UAV hovering position optimization problem in dynamic environments. The method lets the UAV swarm adapt better to a dynamic environment, and the UAVs can cooperate in a distributed manner without depending on a centralized controller.
Drawings
Fig. 1 is a schematic view of the UAV-to-ground communication network scenario of the present invention.
Fig. 2 is a flow chart of the unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning.
Fig. 3 is a flow chart of training the UAVs' distributed policy networks based on multi-agent deep reinforcement learning according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning is applied to emergency communication recovery in areas lacking ground infrastructure or struck by disaster. As shown in fig. 1, such an area lacks infrastructure communication facilities, so UAVs serve as mobile base stations for communication coverage; the ground environment changes dynamically and ground devices may move, so the UAV base stations must keep adjusting their hovering positions to provide better communication service (maximizing the system throughput). At the same time, service fairness and energy loss are considered: no ground users should be ignored in the pursuit of maximum throughput, and the energy loss caused by UAV base station movement should be reduced as much as possible. As shown in fig. 2, first the communication model, coverage model and energy consumption model of the specific application scenario are built and the optimization objective is constructed; second, according to the optimization objective and the characteristics of the multi-UAV system, the optimization problem is modeled as a partially observable Markov decision process; then a simulation platform simulates the multi-UAV-to-ground communication scenario, samples are collected through interaction between the UAV swarm and the environment, and centralized training with the multi-agent deep reinforcement learning algorithm MADDPG yields each UAV's distributed policy. Finally, the trained policy networks are deployed on the UAVs, the UAV swarm is deployed to the target area, and the UAVs cooperate to deliver high-throughput, low-energy, fair communication coverage.
The method comprises the following specific steps:
(1) The method for establishing the multi-UAV communication network model comprises the following four steps:
(1.1) establishing a scene model: a square target area with side length l is established, containing N ground users and M unmanned aerial vehicle base stations (UAV-BSs) that provide communication services to the ground users. Time is divided into T equal slots; between one slot and the next, a ground user may stay still or move, so each drone base station needs to find a new optimal hovering position in every slot and, after reaching the target position, select ground users for data transmission service.
(1.2) establishing an air-to-ground communication model: the invention models the channel between a drone base station and a ground user with an air-to-ground channel model; owing to its high flying altitude, a drone base station establishes a line-of-sight (LoS) link with a ground user more easily than a terrestrial base station. Under the LoS condition, the path loss between drone base station m and ground user n is:
L_{n,m}(t) = η ( 4π f_c d_{n,m}(t) / c )^α
where η denotes the excess path loss coefficient, c the speed of light, f_c the subcarrier frequency, and α the path loss exponent;
d_{n,m}(t) = sqrt( r_{n,m}(t)² + h² )
represents the distance between drone base station m and ground user n, with r_{n,m}(t) the horizontal distance and h the fixed flying height of the drone base station. The channel gain can be expressed through the path loss as
g_{n,m}(t) = 1 / L_{n,m}(t)
According to the channel gain, the data transmission rate between drone base station m and ground user n in time slot t is:
R_{n,m}(t) = log₂( 1 + p_t g_{n,m}(t) / σ² )
where σ² represents the additive white Gaussian noise power, p_t the transmit power of the drone base station, and g_{n,m}(t) the channel gain between drone base station m and ground user n at time t.
(1.3) establishing a coverage model: due to hardware limitations, the coverage of each drone base station is limited. The invention defines a maximum tolerable path loss L_max: if at a given moment the path loss between a drone base station and a user is less than L_max, the established connection is considered reliable; otherwise it is considered failed. The effective coverage of each drone base station can therefore be defined from the maximum tolerable path loss as the disc centred at the drone base station's projection point on the ground with radius R_cov; according to the path loss formula, R_cov can be expressed as:
R_cov = sqrt( ( (c / (4π f_c)) · (L_max / η)^(1/α) )² − h² )
(1.4) establishing an energy loss model: the invention focuses mainly on the energy loss caused by drone movement; given the drone's flying speed V and flying power p_f, the flight energy consumption of drone base station m in time slot t depends on the distance flown:
Δe_m(t) = (p_f / V) · sqrt( (x_m(t+1) − x_m(t))² + (y_m(t+1) − y_m(t))² )
where x_m(t) and y_m(t) respectively represent the drone's x-axis and y-axis position coordinates on the horizontal plane.
(2) Modeling the problem as a partially observable Markov decision process:
Each drone base station is treated as an agent; in each time slot with environment state s(t), agent m can only obtain the local observation o_m within its own coverage, and selects an action a_m from the action set A according to its decision function u_m(o_m), so as to maximize the total expected discounted reward
G_m = E[ Σ_{t=1}^{T} γ^(t−1) r_m(t) ]
where γ ∈ (0,1) is the discount coefficient and r_m(t) represents the reward of agent m at time t;
The system state set is S = { s(t) | s(t) = (S_u(t), S_g(t)) }, containing the current state of the drone base stations S_u(t) = { s_1^u(t), …, s_M^u(t) } and the current state of the ground users S_g(t) = { s_1^g(t), …, s_N^g(t) }. Each drone base station state s_m^u(t) contains the drone's current position information; each ground user state s_n^g(t) contains the current ground user's location information.
In time slot t, after obtaining its current local observation, drone m makes a decision a_m(t) and moves to the next hovering position, so the action set includes the flight rotation angle θ(t) and the movement distance d(t).
System instantaneous reward r(t): the objective here is to maximize the throughput of the drone network while taking user service fairness and energy consumption into account. The extra throughput generated by adjusting the drones' hovering positions at each time t is a positive reward, expressed as:
ΔC(t) = C(S_u(t+1), S_g(t)) − C(S_u(t), S_g(t))
where C(S_u(t), S_g(t)) denotes the throughput generated by the network when the drone base station state is S_u(t) and the ground user state is S_g(t), and C(S_u(t+1), S_g(t)) the throughput when the drone base station state is S_u(t+1) and the ground user state is S_g(t). Considering fairness of user service: if a large number of users gather in one area while another area holds only one user, drone base stations pursuing maximum throughput would always hover over the high-density area and ignore the low-density one; the invention therefore applies a weight w_n(t) to each user's throughput reward to implement proportional fair scheduling. R_req denotes the minimum communication rate required by a ground user, and R_n(t) the average communication rate of ground user n from the start phase up to time t:
w_n(t) = R_req / R_n(t)
R_n(t) = (1/t) Σ_{τ=1}^{t} Σ_{m=1}^{M} a_{n,m}(τ) R_{n,m}(τ)
When a drone base station serves this user, R_n(t) increases and the user's weight gradually decreases; if the user is not served, R_n(t) decreases and the user's weight increases. The reward weight of user-sparse areas therefore grows continuously, attracting drone base stations to serve them. Comprehensively considering the fairness throughput reward and the energy loss penalty, the invention gives the system instantaneous reward r(t) as:
r(t) = Σ_{n=1}^{N} w_n(t) ΔC_n(t) − α Σ_{m=1}^{M} Δe_m(t)
where ΔC_n(t) is user n's share of the throughput gain ΔC(t), and α represents the weight of the energy consumption penalty: the larger α is, the more the system attends to energy loss when making decisions; conversely, the more energy loss is ignored.
Local observation set O(t) = { o_1(t), …, o_M(t) }: when multiple drone base stations work cooperatively over a wide area, no drone can observe global information; each can only observe the ground user information within its own coverage. o_m(t) represents the position information of the ground users within the coverage of drone base station m.
(3) Training based on the multi-agent deep reinforcement learning algorithm:
The invention introduces the multi-agent deep reinforcement learning algorithm MADDPG into the hovering position optimization of the drones in the ground communication network, adopting a centralized training, distributed execution architecture: global information is used during training to better guide the gradient updates of each drone's decision function, while during execution each drone makes its next decision using only the local information it observes itself, which better matches the needs of actual scenarios. Each agent is trained with a DDPG network of Actor-Critic structure: the policy network is used to fit a policy function u(o), taking a local observation o as input and outputting an action a; the evaluation network is used to fit a state-action function Q(s, a) representing the expected reward of taking action a when the system state is s. Let u = { u_1, …, u_M } denote the deterministic policy functions of the M agents and θ^u = { θ^u_1, …, θ^u_M } the parameters of their policy networks; let Q = { Q_1, …, Q_M } denote the evaluation networks of the M agents and θ^Q = { θ^Q_1, …, θ^Q_M } their parameters. As shown in fig. 3, step (3) includes:
(3.1) initializing the experience replay space, setting its size B, and initializing the parameters θ of each DDPG network, the number of training rounds P, the episode length T, and the other hyperparameters.
(3.2) starting from training round epoch = 1 and time t = 1.
(3.3) acquiring the drones' current local observations o and the current state s of the whole system. Each drone m uses the local observation obtained in time slot t and outputs its decision a_m based on an ε-greedy strategy and its DDPG network, adjusts its hovering position, and, according to the path loss between itself and the ground users, greedily selects the W ground users with the lowest path loss for communication service, obtaining an instantaneous reward r, reaching the next system state s′, and obtaining the new local observations o′. Store (s, o, a, r, s′, o′) as a sample in the experience replay space, where a = { a_1, …, a_M } denotes the joint action of all drones and o = { o_1, …, o_M } the local observations of all drones; set t = t + 1.
(3.4) if the number of samples stored in the replay space is greater than B, go to step (3.5); otherwise, continue collecting samples and return to step (3.3).
(3.5) for each agent m, randomly sampling a fixed number K of samples from the experience replay space and calculating the target values, where for the k-th sample (s_k, o_k, a_k, r_k, s′_k, o′_k) the target value y_k can be expressed as:
y_k = r_k + γ Q′_m(s′_k, a′_1, …, a′_M), with a′_j = u′_j(o′_{j,k})
where Q′_m represents the target network of the m-th agent's evaluation network, u′_m the target network of the m-th agent's policy network, r_k the instantaneous reward in the k-th sample, and a′_m the decision made by drone m in system state s′_k according to its local observation o′_{m,k}. Using gradient descent based on global information, minimize the loss function
L(θ^Q_m) = (1/K) Σ_{k=1}^{K} ( y_k − Q_m(s_k, a_{1,k}, …, a_{M,k}) )²
and update the parameters of the agent's evaluation network. Then, according to the evaluation network and the sample information, update the parameters of the agent's policy network based on the sampled policy gradient:
∇_{θ^u_m} J ≈ (1/K) Σ_{k=1}^{K} ∇_{θ^u_m} u_m(o_{m,k}) ∇_{a_m} Q_m(s_k, a_{1,k}, …, a_m, …, a_{M,k}) |_{a_m = u_m(o_{m,k})}
(3.6) at a fixed interval of rounds, updating the evaluation target network parameters θ^{Q′} and the policy target network parameters θ^{u′}: θ^{Q′} = τ θ^Q + (1 − τ) θ^{Q′}, θ^{u′} = τ θ^u + (1 − τ) θ^{u′}. When the total duration T is reached or a drone's energy is exhausted, exit the current training round; otherwise return to step (3.3). If the number of training rounds is used up, exit the training process; otherwise enter a new training round.
(4) distributing the trained policy networks u to the drones and deploying them to the target area; in every time slot, each drone adjusts its hovering position according to its own local observation and provides communication service to the ground users.
In summary:
The invention provides an unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning, which models the throughput maximization problem of the multi-UAV-to-ground communication scenario as a partially observable Markov decision process and solves it with the MADDPG algorithm, so that the UAV swarm can adapt to a dynamic environment and cooperate in a distributed manner, achieving high throughput, low energy consumption and service fairness for the network.
The foregoing shows and describes the general principles, main features and advantages of the present invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above, which are given in the specification and drawings only to illustrate the principles of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (1)

1. An unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning is characterized by comprising the following steps:
(1) establishing the multi-UAV-to-ground communication network model;
(1.1) establishing a scene model: a square target area with side length l is established, containing N ground users and M drone base stations that provide communication services to the ground users; time is divided into T equal slots; between one slot and the next, a ground user may stay still or move, so each drone base station needs to search for a new optimal hovering position in every slot and, after reaching the target position, select ground users for data transmission service;
(1.2) establishing an air-to-ground communication model: the channel between a drone base station and a ground user is modeled with an air-to-ground channel model; owing to its high flying altitude, a drone base station establishes a line-of-sight link (LoS) with a ground user more easily than a terrestrial base station; under the LoS condition, the path loss between drone base station m and ground user n is:
L_{n,m}(t) = η ( 4π f_c d_{n,m}(t) / c )^α
where η denotes the excess path loss coefficient, c the speed of light, f_c the subcarrier frequency, and α the path loss exponent;
d_{n,m}(t) = sqrt( r_{n,m}(t)² + h² )
represents the distance between drone base station m and ground user n, with r_{n,m}(t) the horizontal distance and h the fixed flying height of the drone base station; the channel gain is expressed as
g_{n,m}(t) = 1 / L_{n,m}(t)
and, according to the channel gain, the data transmission rate between drone base station m and ground user n in time slot t is R_{n,m}(t):
R_{n,m}(t) = log₂( 1 + p_t g_{n,m}(t) / σ² )
where σ² represents the additive white Gaussian noise power, p_t the transmit power of the drone base station, and g_{n,m}(t) the channel gain between drone base station m and ground user n at time t;
(1.3) establishing a coverage model: a maximum tolerable path loss L_max is defined; if at a given moment the path loss between a drone base station and a user is less than L_max, the established connection is considered reliable, otherwise it is considered failed; the effective coverage of each drone base station is defined from the maximum tolerable path loss as the disc centred at the drone base station's projection point on the ground with radius R_cov; according to the path loss formula, R_cov is expressed as:
R_cov = sqrt( ( (c / (4π f_c)) · (L_max / η)^(1/α) )² − h² )
(1.4) establishing an energy loss model: attention is paid to the energy loss caused by drone movement; given the drone's flying speed V and flying power p_f, the flight energy consumption Δe_m(t) of drone base station m in time slot t depends on the distance flown:
Δe_m(t) = (p_f / V) · sqrt( (x_m(t+1) − x_m(t))² + (y_m(t+1) − y_m(t))² )
wherein x_m(t) and y_m(t) respectively represent the drone's x-axis and y-axis position coordinates on the horizontal plane at time t;
(2) modeling the problem as a partially observable Markov decision process:
each drone base station is treated as an agent; in each time slot with environment state s(t), agent m can only obtain the local observation o_m within its own coverage, and selects an action a_m from the action set according to its decision function u_m(o_m), so as to maximize the total expected discounted reward
G_m = E[ Σ_{t=1}^{T} γ^(t−1) r_m(t) ]
where γ ∈ (0,1) is the discount coefficient and r_m(t) represents the reward of agent m at time t;
the system state set is S = { s(t) | s(t) = (S_u(t), S_g(t)) }, containing the current state of the drone base stations S_u(t) = { s_1^u(t), …, s_M^u(t) } and the current state of the ground users S_g(t) = { s_1^g(t), …, s_N^g(t) }; the state s_m^u(t) of each drone base station contains the drone's current position information; the state s_n^g(t) of each ground user contains the current ground user's location information;
in time slot t, after obtaining its current local observation, drone m makes a decision a_m(t) and moves to the next hovering position, so the action set includes the flight rotation angle θ(t) and the movement distance d(t);
system instantaneous reward r(t): the throughput of the drone network is maximized while user service fairness and energy consumption are taken into account; thus, the extra throughput generated by adjusting the drones' hovering positions at each time t is a positive reward, expressed as:
ΔC(t) = C(S_u(t+1), S_g(t)) − C(S_u(t), S_g(t))
wherein C(S_u(t), S_g(t)) denotes the throughput generated by the network when the drone base station state is S_u(t) and the ground user state is S_g(t), and C(S_u(t+1), S_g(t)) the throughput when the drone base station state is S_u(t+1) and the ground user state is S_g(t); considering fairness of user service, if a large number of users gather in one area while another area holds only a small number of users, drone base stations pursuing maximum throughput would always hover over the high-density area and ignore the low-density one, so a weight w_n(t) is applied to each user's throughput reward to implement proportional fair scheduling; R_req denotes the minimum communication rate required by a ground user, and R_n(t) the average communication rate of ground user n from the beginning up to time t:
w_n(t) = R_req / R_n(t)
R_n(t) = (1/t) Σ_{τ=1}^{t} Σ_{m=1}^{M} a_{n,m}(τ) R_{n,m}(τ)
when a drone base station serves this user, R_n(t) increases and the user's weight gradually decreases; if the user is not served, R_n(t) decreases and the user's weight increases; the reward weight of user-sparse areas therefore grows continuously, attracting drone base stations to serve them;
wherein a_{n,m}(t) is an indicator variable: at time t, if drone base station m serves ground user n then a_{n,m}(t) = 1, otherwise a_{n,m}(t) = 0; therefore, comprehensively considering the fairness throughput reward and the energy loss penalty, the system instantaneous reward r(t) is:
r(t) = Σ_{n=1}^{N} w_n(t) ΔC_n(t) − α Σ_{m=1}^{M} Δe_m(t)
wherein ΔC_n(t) is user n's share of the throughput gain ΔC(t), and α represents the weight of the energy consumption penalty: the larger α is, the more the system attends to energy loss when making decisions; conversely, the more energy loss is ignored;
local observation set O(t) = { o_1(t), …, o_M(t) }: when multiple drone base stations work cooperatively over a wide area, no drone can observe global information; each can only observe the ground user information within its own coverage; o_m(t) represents the position information of the ground users within the coverage of drone base station m observed at time t;
(3) training based on the multi-agent deep reinforcement learning algorithm:
the multi-agent deep reinforcement learning algorithm MADDPG is introduced into the hovering position optimization of the drones in the ground communication network, adopting a centralized training, distributed execution architecture: global information is used during training to better guide the gradient updates of each drone's decision function, while during execution each drone makes its next decision using only the local information it observes itself, which better suits the needs of actual scenarios; each agent is trained with a DDPG network of Actor-Critic structure: the policy network is used to fit a policy function u(o), taking a local observation o as input and outputting an action a; the evaluation network is used to fit a state-action function Q(s, a) representing the expected reward of taking action a when the system state is s; let u = { u_1, …, u_M } denote the deterministic policy functions of the M agents and θ^u = { θ^u_1, …, θ^u_M } the parameters of their policy networks; let Q = { Q_1, …, Q_M } denote the evaluation networks of the M agents and θ^Q = { θ^Q_1, …, θ^Q_M } the parameters of the evaluation networks;
(3.1) initializing the experience replay space, setting its size, and initializing the parameters of each DDPG network and the number of training rounds;
(3.2) starting from training round epoch = 1 and time t = 1;
(3.3) acquiring the drones' current local observations o and the current state s of the whole system; each drone m uses the local observation obtained in time slot t and outputs its decision a_m based on an ε-greedy strategy and its DDPG network, adjusts its hovering position, and, according to the path loss between itself and the ground users, greedily selects the W ground users with the lowest path loss for communication service, obtaining an instantaneous reward r, reaching the next system state s′ and obtaining the new local observations o′; (s, o, a, r, s′, o′) is stored as a sample in the experience replay space, where a = { a_1, …, a_M } denotes the joint action of all drones and o = { o_1, …, o_M } the local observations of all drones; t = t + 1;
(3.4) if the number of samples stored in the replay space is greater than B, going to step (3.5); otherwise, continuing to collect samples and returning to step (3.3);
(3.5) for each agent m, randomly sampling a fixed number K of samples from the experience replay space and calculating the target values, where for the k-th sample (s_k, o_k, a_k, r_k, s′_k, o′_k) the target value y_k can be expressed as:
y_k = r_k + γ Q′_m(s′_k, a′_1, …, a′_M), with a′_j = u′_j(o′_{j,k})
wherein Q′_m represents the target network of the m-th agent's evaluation network, u′_m the target network of the m-th agent's policy network, r_k the instantaneous reward in the k-th sample, and a′_m the decision made by drone m in system state s′_k according to its local observation o′_{m,k}; using gradient descent based on global information, the loss function
L(θ^Q_m) = (1/K) Σ_{k=1}^{K} ( y_k − Q_m(s_k, a_{1,k}, …, a_{M,k}) )²
is minimized and the parameters of the agent's evaluation network are updated; then, according to the evaluation network and the sample information, the parameters of the agent's policy network are updated based on the sampled policy gradient:
∇_{θ^u_m} J ≈ (1/K) Σ_{k=1}^{K} ∇_{θ^u_m} u_m(o_{m,k}) ∇_{a_m} Q_m(s_k, a_{1,k}, …, a_m, …, a_{M,k}) |_{a_m = u_m(o_{m,k})}
(3.6) at a fixed interval of rounds, updating the evaluation target network parameters θ^{Q′} and the policy target network parameters θ^{u′}: θ^{Q′} = τ θ^Q + (1 − τ) θ^{Q′}, θ^{u′} = τ θ^u + (1 − τ) θ^{u′}, where τ ∈ (0,1) is the update weight; when the total duration T is reached or a drone's energy is exhausted, exiting the current training round, otherwise returning to step (3.3); if the number of training rounds is used up, exiting the training process, otherwise entering a new training round;
(4) distributing the trained policy networks u to the drones and deploying them to the target area; in every time slot, each drone adjusts its hovering position according to its own local observation and provides communication service to the ground users.
CN202010497656.4A (priority date 2020-06-04, filing date 2020-06-04): Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning. Granted as CN111786713B (Active).

Priority Applications (1)

Application number: CN202010497656.4A; priority date: 2020-06-04; filing date: 2020-06-04
Title: Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning

Publications (2)

CN111786713A, published 2020-10-16
CN111786713B, published 2021-06-08

Family ID: 72753669

Family Applications (1)

CN202010497656.4A (Active), granted as CN111786713B: Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning

Country Status (1)

CN: CN111786713B (granted)

Cited By (78)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256056A (en) * 2020-10-19 2021-01-22 中山大学 Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning
CN112511250A (en) * 2020-12-03 2021-03-16 中国人民解放军火箭军工程大学 DRL-based multi-unmanned aerial vehicle air base station dynamic deployment method and system
CN112511197A (en) * 2020-12-01 2021-03-16 南京工业大学 Unmanned aerial vehicle auxiliary elastic video multicast method based on deep reinforcement learning
CN112512115A (en) * 2020-11-20 2021-03-16 北京邮电大学 Method and device for determining position of air base station and electronic equipment
CN112566209A (en) * 2020-11-24 2021-03-26 山西三友和智慧信息技术股份有限公司 UAV-BSs energy and service priority track design method based on double Q learning
CN112636811A (en) * 2020-12-08 2021-04-09 北京邮电大学 Relay unmanned aerial vehicle deployment method and device
CN112672361A (en) * 2020-12-17 2021-04-16 东南大学 Large-scale MIMO capacity increasing method based on unmanned aerial vehicle cluster deployment
CN112752357A (en) * 2020-12-02 2021-05-04 宁波大学 Online unmanned aerial vehicle auxiliary data collection method and device based on energy harvesting technology
CN112821938A (en) * 2021-01-08 2021-05-18 重庆大学 Total throughput and energy consumption optimization method of air-space-ground satellite communication system
CN112904890A (en) * 2021-01-15 2021-06-04 北京国网富达科技发展有限责任公司 Unmanned aerial vehicle automatic inspection system and method for power line
CN112947575A (en) * 2021-03-17 2021-06-11 中国人民解放军国防科技大学 Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning
CN113094982A (en) * 2021-03-29 2021-07-09 天津理工大学 Internet of vehicles edge caching method based on multi-agent deep reinforcement learning
CN113115344A (en) * 2021-04-19 2021-07-13 中国人民解放军火箭军工程大学 Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization
CN113162679A (en) * 2021-04-01 2021-07-23 南京邮电大学 DDPG algorithm-based IRS (inter-Range instrumentation System) auxiliary unmanned aerial vehicle communication joint optimization method
CN113190039A (en) * 2021-04-27 2021-07-30 大连理工大学 Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning
CN113194488A (en) * 2021-03-31 2021-07-30 西安交通大学 Unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system
CN113242556A (en) * 2021-06-04 2021-08-10 重庆邮电大学 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services
CN113255218A (en) * 2021-05-27 2021-08-13 电子科技大学 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
CN113286275A (en) * 2021-04-23 2021-08-20 南京大学 Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning
CN113286314A (en) * 2021-05-25 2021-08-20 重庆邮电大学 Unmanned aerial vehicle base station deployment and user association method based on Q learning algorithm
CN113328775A (en) * 2021-05-28 2021-08-31 怀化学院 UAV height positioning system and computer storage medium
CN113346944A (en) * 2021-06-28 2021-09-03 上海交通大学 Time delay minimization calculation task unloading method and system in air-space-ground integrated network
CN113342029A (en) * 2021-04-16 2021-09-03 山东师范大学 Maximum sensor data acquisition path planning method and system based on unmanned aerial vehicle cluster
CN113364495A (en) * 2021-05-25 2021-09-07 西安交通大学 Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system
CN113359480A (en) * 2021-07-16 2021-09-07 中国人民解放军火箭军工程大学 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN113364630A (en) * 2021-06-15 2021-09-07 广东技术师范大学 Quality of service (QoS) differentiation optimization method and device
CN113382060A (en) * 2021-06-07 2021-09-10 北京理工大学 Unmanned aerial vehicle track optimization method and system in Internet of things data collection
CN113395708A (en) * 2021-07-13 2021-09-14 东南大学 Multi-autonomous-subject centralized region coverage method and system based on global environment prediction
CN113392971A (en) * 2021-06-11 2021-09-14 武汉大学 Strategy network training method, device, equipment and readable storage medium
CN113467508A (en) * 2021-06-30 2021-10-01 天津大学 Multi-unmanned aerial vehicle intelligent cooperative decision-making method for trapping task
CN113572548A (en) * 2021-06-18 2021-10-29 南京理工大学 Unmanned aerial vehicle network cooperative fast frequency hopping method based on multi-agent reinforcement learning
CN113613339A (en) * 2021-07-10 2021-11-05 西北农林科技大学 Channel access method of multi-priority wireless terminal based on deep reinforcement learning
CN113625751A (en) * 2021-08-05 2021-11-09 南京航空航天大学 Unmanned aerial vehicle position and resource joint optimization method for air-ground integrated federal learning
CN113625569A (en) * 2021-08-12 2021-11-09 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control hybrid decision method and system based on deep reinforcement learning and rule driving
CN113641192A (en) * 2021-07-06 2021-11-12 暨南大学 Route planning method for unmanned aerial vehicle crowd sensing task based on reinforcement learning
CN113660681A (en) * 2021-05-31 2021-11-16 西北工业大学 Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
CN113691294A (en) * 2021-09-27 2021-11-23 中国人民解放军空军预警学院 Near-field sparse array antenna beam establishing method and device
CN113706023A (en) * 2021-08-31 2021-11-26 哈尔滨理工大学 Shipboard aircraft guarantee operator scheduling method based on deep reinforcement learning
CN113762512A (en) * 2021-11-10 2021-12-07 北京航空航天大学杭州创新研究院 Distributed model training method, system and related device
CN113776531A (en) * 2021-07-21 2021-12-10 电子科技大学长三角研究院(湖州) Multi-unmanned-aerial-vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network
CN114021775A (en) * 2021-09-30 2022-02-08 成都海天数联科技有限公司 Agent handicap device placement method based on optimal solution
CN114051252A (en) * 2021-09-28 2022-02-15 嘉兴学院 Multi-user intelligent transmitting power control method in wireless access network
CN114124784A (en) * 2022-01-27 2022-03-01 军事科学院系统工程研究院网络信息研究所 Intelligent routing decision protection method and system based on vertical federation
CN114142912A (en) * 2021-11-26 2022-03-04 西安电子科技大学 Resource control method for guaranteeing time coverage continuity of high-dynamic air network
CN114222251A (en) * 2021-11-30 2022-03-22 中山大学·深圳 Adaptive networking and trajectory optimization method for multiple unmanned aerial vehicles
CN114268986A (en) * 2021-12-14 2022-04-01 北京航空航天大学 Unmanned aerial vehicle computing unloading and charging service efficiency optimization method
CN114268963A (en) * 2021-12-24 2022-04-01 北京航空航天大学 Unmanned aerial vehicle network autonomous deployment method facing communication coverage
CN114339842A (en) * 2022-01-06 2022-04-12 北京邮电大学 Method and device for designing dynamic trajectory of unmanned aerial vehicle cluster under time-varying scene based on deep reinforcement learning
CN114374951A (en) * 2022-01-12 2022-04-19 重庆邮电大学 Multi-unmanned aerial vehicle dynamic pre-deployment method
CN114372612A (en) * 2021-12-16 2022-04-19 电子科技大学 Route planning and task unloading method for unmanned aerial vehicle mobile edge computing scene
CN114449482A (en) * 2022-03-11 2022-05-06 南京理工大学 Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning
CN114548551A (en) * 2022-02-21 2022-05-27 广东汇天航空航天科技有限公司 Method and device for determining residual endurance time, aircraft and medium
CN114567888A (en) * 2022-03-04 2022-05-31 重庆邮电大学 Multi-unmanned aerial vehicle dynamic deployment method
CN114578335A (en) * 2022-03-03 2022-06-03 电子科技大学长三角研究院(衢州) Positioning method based on multi-agent deep reinforcement learning and least square
CN114625151A (en) * 2022-03-10 2022-06-14 大连理工大学 Underwater robot obstacle avoidance path planning method based on reinforcement learning
CN114679699A (en) * 2022-03-23 2022-06-28 重庆邮电大学 Multi-unmanned-aerial-vehicle energy-saving cruise communication coverage method based on deep reinforcement learning
CN114884895A (en) * 2022-05-05 2022-08-09 郑州轻工业大学 Intelligent traffic scheduling method based on deep reinforcement learning
CN114942653A (en) * 2022-07-26 2022-08-26 北京邮电大学 Method and device for determining unmanned cluster flight strategy and electronic equipment
CN114980169A (en) * 2022-05-16 2022-08-30 北京理工大学 Unmanned aerial vehicle auxiliary ground communication method based on combined optimization of track and phase
CN114980020A (en) * 2022-05-17 2022-08-30 重庆邮电大学 Unmanned aerial vehicle data collection method based on MADDPG algorithm
CN114997617A (en) * 2022-05-23 2022-09-02 华中科技大学 Multi-unmanned platform multi-target joint detection task allocation method and system
CN115038155A (en) * 2022-05-23 2022-09-09 香港中文大学(深圳) Ultra-dense multi-access-point dynamic cooperative transmission method
CN115113651A (en) * 2022-07-18 2022-09-27 中国电子科技集团公司第五十四研究所 Unmanned robot swarm cooperative coverage optimization method based on ellipse fitting
CN115314904A (en) * 2022-06-14 2022-11-08 北京邮电大学 Communication coverage method and related equipment based on multi-agent maximum entropy reinforcement learning
CN115460543A (en) * 2022-08-31 2022-12-09 中国地质大学(武汉) Distributed ring fence covering method, device and storage device
CN115499849A (en) * 2022-11-16 2022-12-20 国网湖北省电力有限公司信息通信公司 Wireless access point and reconfigurable intelligent surface cooperation method
CN115713130A (en) * 2022-09-07 2023-02-24 华东交通大学 Vehicle scheduling method based on hyper-parameter network weight distribution deep reinforcement learning
CN115802313A (en) * 2022-11-16 2023-03-14 河南大学 Air-ground mobile network energy-carrying fair communication method based on intelligent reflecting surface
CN116009590A (en) * 2023-02-01 2023-04-25 中山大学 Unmanned aerial vehicle network distributed track planning method, system, equipment and medium
CN116017479A (en) * 2022-12-30 2023-04-25 河南大学 Distributed multi-unmanned aerial vehicle relay network coverage method
CN116208968A (en) * 2022-12-30 2023-06-02 北京信息科技大学 Track planning method and device based on federal learning
CN116456307A (en) * 2023-05-06 2023-07-18 山东省计算中心(国家超级计算济南中心) Q learning-based energy-limited Internet of things data acquisition and fusion method
CN116502547A (en) * 2023-06-29 2023-07-28 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
CN116980881A (en) * 2023-08-29 2023-10-31 北方工业大学 Multi-unmanned aerial vehicle collaboration data distribution method, system, electronic equipment and medium
CN117376934A (en) * 2023-12-08 2024-01-09 山东科技大学 Deep reinforcement learning-based multi-unmanned aerial vehicle offshore mobile base station deployment method
CN117835463A (en) * 2023-12-27 2024-04-05 武汉大学 Space-to-ground ad hoc communication network space-time dynamic deployment method based on deep reinforcement learning
CN117856903A (en) * 2023-12-07 2024-04-09 山东科技大学 Marine unmanned aerial vehicle optical link data transmission method based on multi-agent reinforcement learning
CN116017479B (en) * 2022-12-30 2024-10-25 河南大学 Distributed multi-unmanned aerial vehicle relay network coverage method

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129882A1 (en) * 2016-11-08 2018-05-10 Dedrone Holdings, Inc. Systems, Methods, Apparatuses, and Devices for Identifying, Tracking, and Managing Unmanned Aerial Vehicles
CN109923799A (en) * 2016-11-11 2019-06-21 高通股份有限公司 Method for beam recovery in millimeter-wave systems
CN209085657U (en) * 2017-08-02 2019-07-09 强力物联网投资组合2016有限公司 Data collection system for industrial environments with chemical production processes
US20200115047A1 (en) * 2018-10-11 2020-04-16 Beihang University Multi-uav continuous movement control method, apparatus, device, and storage medium for energy efficient communication coverage
CN110198531A (en) * 2019-05-24 2019-09-03 吉林大学 Dynamic D2D relay selection method based on relative velocity
CN110430527A (en) * 2019-07-17 2019-11-08 大连理工大学 Power allocation method for secure unmanned aerial vehicle-to-ground transmission
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned aerial vehicle trajectory optimization method and device based on deep reinforcement learning, and unmanned aerial vehicle
CN110531617A (en) * 2019-07-30 2019-12-03 北京邮电大学 Multi-unmanned aerial vehicle 3D hovering position joint optimization method and device, and unmanned aerial vehicle base station
CN110730028A (en) * 2019-08-29 2020-01-24 广东工业大学 Unmanned aerial vehicle-assisted backscatter communication device and resource allocation control method
CN110809274A (en) * 2019-10-28 2020-02-18 南京邮电大学 Narrowband Internet of things-oriented unmanned aerial vehicle base station enhanced network optimization method
CN111132009A (en) * 2019-12-23 2020-05-08 北京邮电大学 Mobile edge computing method, device and system for the Internet of Things
CN111026147A (en) * 2019-12-25 2020-04-17 北京航空航天大学 Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CONG WANG: "Research of UAV Target Detection and Flight Control Based on Deep Learning", 2018 International Conference on Artificial Intelligence and Big Data (ICAIBD) *
周毅 (ZHOU YI): "基于深度强化学习的无人机自主部署及能效优化策略" [Autonomous deployment and energy-efficiency optimization strategies for unmanned aerial vehicles based on deep reinforcement learning], 《物联网学报》 (Chinese Journal on Internet of Things) *

Cited By (126)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256056B (en) * 2020-10-19 2022-03-01 中山大学 Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning
CN112256056A (en) * 2020-10-19 2021-01-22 中山大学 Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning
CN112512115B (en) * 2020-11-20 2022-02-11 北京邮电大学 Method and device for determining position of air base station and electronic equipment
CN112512115A (en) * 2020-11-20 2021-03-16 北京邮电大学 Method and device for determining position of air base station and electronic equipment
CN112566209A (en) * 2020-11-24 2021-03-26 山西三友和智慧信息技术股份有限公司 UAV-BSs energy and service priority trajectory design method based on double Q-learning
CN112511197A (en) * 2020-12-01 2021-03-16 南京工业大学 Unmanned aerial vehicle auxiliary elastic video multicast method based on deep reinforcement learning
CN112752357A (en) * 2020-12-02 2021-05-04 宁波大学 Online unmanned aerial vehicle auxiliary data collection method and device based on energy harvesting technology
CN112752357B (en) * 2020-12-02 2022-06-17 宁波大学 Online unmanned aerial vehicle auxiliary data collection method and device based on energy harvesting technology
CN112511250A (en) * 2020-12-03 2021-03-16 中国人民解放军火箭军工程大学 DRL-based multi-unmanned aerial vehicle air base station dynamic deployment method and system
CN112636811A (en) * 2020-12-08 2021-04-09 北京邮电大学 Relay unmanned aerial vehicle deployment method and device
CN112672361A (en) * 2020-12-17 2021-04-16 东南大学 Large-scale MIMO capacity increasing method based on unmanned aerial vehicle cluster deployment
CN112672361B (en) * 2020-12-17 2022-12-02 东南大学 Large-scale MIMO capacity increasing method based on unmanned aerial vehicle cluster deployment
CN112821938A (en) * 2021-01-08 2021-05-18 重庆大学 Total throughput and energy consumption optimization method of air-space-ground satellite communication system
CN112904890A (en) * 2021-01-15 2021-06-04 北京国网富达科技发展有限责任公司 Unmanned aerial vehicle automatic inspection system and method for power line
CN112904890B (en) * 2021-01-15 2023-06-30 北京国网富达科技发展有限责任公司 Unmanned aerial vehicle automatic inspection system and method for power line
CN112947575A (en) * 2021-03-17 2021-06-11 中国人民解放军国防科技大学 Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning
CN113094982A (en) * 2021-03-29 2021-07-09 天津理工大学 Internet of vehicles edge caching method based on multi-agent deep reinforcement learning
CN113194488A (en) * 2021-03-31 2021-07-30 西安交通大学 Unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system
CN113162679A (en) * 2021-04-01 2021-07-23 南京邮电大学 DDPG algorithm-based IRS (intelligent reflecting surface) auxiliary unmanned aerial vehicle communication joint optimization method
CN113162679B (en) * 2021-04-01 2023-03-10 南京邮电大学 DDPG algorithm-based IRS (intelligent reflecting surface) assisted unmanned aerial vehicle communication joint optimization method
CN113342029A (en) * 2021-04-16 2021-09-03 山东师范大学 Maximum sensor data acquisition path planning method and system based on unmanned aerial vehicle cluster
CN113115344A (en) * 2021-04-19 2021-07-13 中国人民解放军火箭军工程大学 Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization
CN113115344B (en) * 2021-04-19 2021-12-14 中国人民解放军火箭军工程大学 Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization
CN113286275A (en) * 2021-04-23 2021-08-20 南京大学 Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning
CN113190039B (en) * 2021-04-27 2024-04-16 大连理工大学 Unmanned aerial vehicle acquisition path planning method based on layered deep reinforcement learning
CN113190039A (en) * 2021-04-27 2021-07-30 大连理工大学 Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning
CN113364495A (en) * 2021-05-25 2021-09-07 西安交通大学 Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system
CN113364495B (en) * 2021-05-25 2022-08-05 西安交通大学 Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system
CN113286314B (en) * 2021-05-25 2022-03-08 重庆邮电大学 Unmanned aerial vehicle base station deployment and user association method based on Q learning algorithm
CN113286314A (en) * 2021-05-25 2021-08-20 重庆邮电大学 Unmanned aerial vehicle base station deployment and user association method based on Q learning algorithm
CN113255218A (en) * 2021-05-27 2021-08-13 电子科技大学 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
CN113255218B (en) * 2021-05-27 2022-05-31 电子科技大学 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
CN113328775A (en) * 2021-05-28 2021-08-31 怀化学院 UAV height positioning system and computer storage medium
CN113328775B (en) * 2021-05-28 2022-06-21 怀化学院 UAV height positioning system and computer storage medium
CN113660681B (en) * 2021-05-31 2023-06-06 西北工业大学 Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
CN113660681A (en) * 2021-05-31 2021-11-16 西北工业大学 Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
CN113242556A (en) * 2021-06-04 2021-08-10 重庆邮电大学 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services
CN113242556B (en) * 2021-06-04 2022-08-23 重庆邮电大学 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services
CN113382060B (en) * 2021-06-07 2022-03-22 北京理工大学 Unmanned aerial vehicle track optimization method and system in Internet of things data collection
CN113382060A (en) * 2021-06-07 2021-09-10 北京理工大学 Unmanned aerial vehicle track optimization method and system in Internet of things data collection
CN113392971A (en) * 2021-06-11 2021-09-14 武汉大学 Strategy network training method, device, equipment and readable storage medium
CN113364630A (en) * 2021-06-15 2021-09-07 广东技术师范大学 Quality of service (QoS) differentiation optimization method and device
CN113572548B (en) * 2021-06-18 2023-07-07 南京理工大学 Unmanned aerial vehicle network cooperative fast frequency hopping method based on multi-agent reinforcement learning
CN113572548A (en) * 2021-06-18 2021-10-29 南京理工大学 Unmanned aerial vehicle network cooperative fast frequency hopping method based on multi-agent reinforcement learning
CN113346944A (en) * 2021-06-28 2021-09-03 上海交通大学 Time delay minimization calculation task unloading method and system in air-space-ground integrated network
CN113346944B (en) * 2021-06-28 2022-06-10 上海交通大学 Time delay minimization calculation task unloading method and system in air-space-ground integrated network
CN113467508B (en) * 2021-06-30 2022-06-28 天津大学 Multi-unmanned aerial vehicle intelligent cooperative decision-making method for trapping task
CN113467508A (en) * 2021-06-30 2021-10-01 天津大学 Multi-unmanned aerial vehicle intelligent cooperative decision-making method for trapping task
CN113641192A (en) * 2021-07-06 2021-11-12 暨南大学 Route planning method for unmanned aerial vehicle crowd sensing task based on reinforcement learning
CN113641192B (en) * 2021-07-06 2023-07-18 暨南大学 Route planning method for intelligent perception task of unmanned aerial vehicle group based on reinforcement learning
CN113613339A (en) * 2021-07-10 2021-11-05 西北农林科技大学 Channel access method of multi-priority wireless terminal based on deep reinforcement learning
CN113613339B (en) * 2021-07-10 2023-10-17 西北农林科技大学 Channel access method of multi-priority wireless terminal based on deep reinforcement learning
CN113395708A (en) * 2021-07-13 2021-09-14 东南大学 Multi-autonomous-subject centralized region coverage method and system based on global environment prediction
CN113359480B (en) * 2021-07-16 2022-02-01 中国人民解放军火箭军工程大学 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN113359480A (en) * 2021-07-16 2021-09-07 中国人民解放军火箭军工程大学 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN113776531A (en) * 2021-07-21 2021-12-10 电子科技大学长三角研究院(湖州) Multi-unmanned-aerial-vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network
CN113625751B (en) * 2021-08-05 2023-02-24 南京航空航天大学 Unmanned aerial vehicle position and resource joint optimization method for air-ground integrated federal learning
CN113625751A (en) * 2021-08-05 2021-11-09 南京航空航天大学 Unmanned aerial vehicle position and resource joint optimization method for air-ground integrated federal learning
CN113625569A (en) * 2021-08-12 2021-11-09 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control hybrid decision method and system based on deep reinforcement learning and rule driving
CN113625569B (en) * 2021-08-12 2022-02-08 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model
CN113706023B (en) * 2021-08-31 2022-07-12 哈尔滨理工大学 Shipboard aircraft guarantee operator scheduling method based on deep reinforcement learning
CN113706023A (en) * 2021-08-31 2021-11-26 哈尔滨理工大学 Shipboard aircraft guarantee operator scheduling method based on deep reinforcement learning
CN113691294A (en) * 2021-09-27 2021-11-23 中国人民解放军空军预警学院 Near-field sparse array antenna beam establishing method and device
CN114051252A (en) * 2021-09-28 2022-02-15 嘉兴学院 Multi-user intelligent transmitting power control method in wireless access network
CN114021775A (en) * 2021-09-30 2022-02-08 成都海天数联科技有限公司 Agent handicap device placement method based on optimal solution
CN113762512A (en) * 2021-11-10 2021-12-07 北京航空航天大学杭州创新研究院 Distributed model training method, system and related device
CN114142912A (en) * 2021-11-26 2022-03-04 西安电子科技大学 Resource control method for guaranteeing time coverage continuity of high-dynamic air network
CN114222251A (en) * 2021-11-30 2022-03-22 中山大学·深圳 Adaptive networking and trajectory optimization method for multiple unmanned aerial vehicles
CN114268986A (en) * 2021-12-14 2022-04-01 北京航空航天大学 Unmanned aerial vehicle computing unloading and charging service efficiency optimization method
CN114372612B (en) * 2021-12-16 2023-04-28 电子科技大学 Path planning and task unloading method for unmanned aerial vehicle mobile edge computing scene
CN114372612A (en) * 2021-12-16 2022-04-19 电子科技大学 Route planning and task unloading method for unmanned aerial vehicle mobile edge computing scene
CN114268963A (en) * 2021-12-24 2022-04-01 北京航空航天大学 Unmanned aerial vehicle network autonomous deployment method facing communication coverage
CN114268963B (en) * 2021-12-24 2023-07-11 北京航空航天大学 Communication coverage-oriented unmanned aerial vehicle network autonomous deployment method
CN114339842A (en) * 2022-01-06 2022-04-12 北京邮电大学 Method and device for designing dynamic trajectory of unmanned aerial vehicle cluster under time-varying scene based on deep reinforcement learning
CN114339842B (en) * 2022-01-06 2022-12-20 北京邮电大学 Method and device for designing dynamic trajectory of unmanned aerial vehicle cluster in time-varying scene based on deep reinforcement learning
CN114374951B (en) * 2022-01-12 2024-04-30 重庆邮电大学 Dynamic pre-deployment method for multiple unmanned aerial vehicles
CN114374951A (en) * 2022-01-12 2022-04-19 重庆邮电大学 Multi-unmanned aerial vehicle dynamic pre-deployment method
CN114124784B (en) * 2022-01-27 2022-04-12 军事科学院系统工程研究院网络信息研究所 Intelligent routing decision protection method and system based on vertical federation
CN114124784A (en) * 2022-01-27 2022-03-01 军事科学院系统工程研究院网络信息研究所 Intelligent routing decision protection method and system based on vertical federation
CN114548551A (en) * 2022-02-21 2022-05-27 广东汇天航空航天科技有限公司 Method and device for determining residual endurance time, aircraft and medium
CN114578335B (en) * 2022-03-03 2024-08-16 电子科技大学长三角研究院(衢州) Positioning method based on multi-agent deep reinforcement learning and least square
CN114578335A (en) * 2022-03-03 2022-06-03 电子科技大学长三角研究院(衢州) Positioning method based on multi-agent deep reinforcement learning and least square
CN114567888A (en) * 2022-03-04 2022-05-31 重庆邮电大学 Multi-unmanned aerial vehicle dynamic deployment method
CN114567888B (en) * 2022-03-04 2023-12-26 国网浙江省电力有限公司台州市黄岩区供电公司 Multi-unmanned aerial vehicle dynamic deployment method
CN114625151B (en) * 2022-03-10 2024-05-28 大连理工大学 Underwater robot obstacle avoidance path planning method based on reinforcement learning
CN114625151A (en) * 2022-03-10 2022-06-14 大连理工大学 Underwater robot obstacle avoidance path planning method based on reinforcement learning
CN114449482B (en) * 2022-03-11 2024-05-14 南京理工大学 Heterogeneous Internet of vehicles user association method based on multi-agent deep reinforcement learning
CN114449482A (en) * 2022-03-11 2022-05-06 南京理工大学 Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning
CN114679699A (en) * 2022-03-23 2022-06-28 重庆邮电大学 Multi-unmanned-aerial-vehicle energy-saving cruise communication coverage method based on deep reinforcement learning
CN114884895B (en) * 2022-05-05 2023-08-22 郑州轻工业大学 Intelligent flow scheduling method based on deep reinforcement learning
CN114884895A (en) * 2022-05-05 2022-08-09 郑州轻工业大学 Intelligent traffic scheduling method based on deep reinforcement learning
CN114980169B (en) * 2022-05-16 2024-08-20 北京理工大学 Unmanned aerial vehicle auxiliary ground communication method based on track and phase joint optimization
CN114980169A (en) * 2022-05-16 2022-08-30 北京理工大学 Unmanned aerial vehicle auxiliary ground communication method based on combined optimization of track and phase
CN114980020B (en) * 2022-05-17 2024-07-12 中科润物科技(南京)有限公司 MADDPG algorithm-based unmanned aerial vehicle data collection method
CN114980020A (en) * 2022-05-17 2022-08-30 重庆邮电大学 Unmanned aerial vehicle data collection method based on MADDPG algorithm
CN114997617B (en) * 2022-05-23 2024-06-07 华中科技大学 Multi-unmanned platform multi-target combined detection task allocation method and system
CN115038155A (en) * 2022-05-23 2022-09-09 香港中文大学(深圳) Ultra-dense multi-access-point dynamic cooperative transmission method
CN114997617A (en) * 2022-05-23 2022-09-02 华中科技大学 Multi-unmanned platform multi-target joint detection task allocation method and system
CN115314904B (en) * 2022-06-14 2024-03-29 北京邮电大学 Communication coverage method based on multi-agent maximum entropy reinforcement learning and related equipment
CN115314904A (en) * 2022-06-14 2022-11-08 北京邮电大学 Communication coverage method and related equipment based on multi-agent maximum entropy reinforcement learning
CN115113651A (en) * 2022-07-18 2022-09-27 中国电子科技集团公司第五十四研究所 Unmanned robot swarm cooperative coverage optimization method based on ellipse fitting
CN114942653A (en) * 2022-07-26 2022-08-26 北京邮电大学 Method and device for determining unmanned cluster flight strategy and electronic equipment
CN115460543A (en) * 2022-08-31 2022-12-09 中国地质大学(武汉) Distributed ring fence covering method, device and storage device
CN115460543B (en) * 2022-08-31 2024-04-19 中国地质大学(武汉) Distributed annular fence coverage method, device and storage device
CN115713130A (en) * 2022-09-07 2023-02-24 华东交通大学 Vehicle scheduling method based on hyper-parameter network weight distribution deep reinforcement learning
CN115713130B (en) * 2022-09-07 2023-09-05 华东交通大学 Vehicle scheduling method based on hyper-parameter network weight distribution deep reinforcement learning
CN115802313A (en) * 2022-11-16 2023-03-14 河南大学 Air-ground mobile network energy-carrying fair communication method based on intelligent reflecting surface
CN115499849A (en) * 2022-11-16 2022-12-20 国网湖北省电力有限公司信息通信公司 Wireless access point and reconfigurable intelligent surface cooperation method
CN115499849B (en) * 2022-11-16 2023-04-07 国网湖北省电力有限公司信息通信公司 Wireless access point and reconfigurable intelligent surface cooperation method
CN116017479B (en) * 2022-12-30 2024-10-25 河南大学 Distributed multi-unmanned aerial vehicle relay network coverage method
CN116208968B (en) * 2022-12-30 2024-04-05 北京信息科技大学 Track planning method and device based on federal learning
CN116017479A (en) * 2022-12-30 2023-04-25 河南大学 Distributed multi-unmanned aerial vehicle relay network coverage method
CN116208968A (en) * 2022-12-30 2023-06-02 北京信息科技大学 Track planning method and device based on federal learning
CN116009590A (en) * 2023-02-01 2023-04-25 中山大学 Unmanned aerial vehicle network distributed track planning method, system, equipment and medium
CN116009590B (en) * 2023-02-01 2023-11-17 中山大学 Unmanned aerial vehicle network distributed track planning method, system, equipment and medium
CN116456307B (en) * 2023-05-06 2024-04-09 山东省计算中心(国家超级计算济南中心) Q learning-based energy-limited Internet of things data acquisition and fusion method
CN116456307A (en) * 2023-05-06 2023-07-18 山东省计算中心(国家超级计算济南中心) Q learning-based energy-limited Internet of things data acquisition and fusion method
CN116502547B (en) * 2023-06-29 2024-06-04 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
CN116502547A (en) * 2023-06-29 2023-07-28 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
CN116980881A (en) * 2023-08-29 2023-10-31 北方工业大学 Multi-unmanned aerial vehicle collaboration data distribution method, system, electronic equipment and medium
CN116980881B (en) * 2023-08-29 2024-01-23 北方工业大学 Multi-unmanned aerial vehicle collaboration data distribution method, system, electronic equipment and medium
CN117856903A (en) * 2023-12-07 2024-04-09 山东科技大学 Marine unmanned aerial vehicle optical link data transmission method based on multi-agent reinforcement learning
CN117856903B (en) * 2023-12-07 2024-08-30 山东科技大学 Marine unmanned aerial vehicle optical link data transmission method based on multi-agent reinforcement learning
CN117376934A (en) * 2023-12-08 2024-01-09 山东科技大学 Deep reinforcement learning-based multi-unmanned aerial vehicle offshore mobile base station deployment method
CN117376934B (en) * 2023-12-08 2024-02-27 山东科技大学 Deep reinforcement learning-based multi-unmanned aerial vehicle offshore mobile base station deployment method
CN117835463A (en) * 2023-12-27 2024-04-05 武汉大学 Space-to-ground ad hoc communication network space-time dynamic deployment method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN111786713B (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN111786713B (en) Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN109831797B (en) Unmanned aerial vehicle base station bandwidth and trajectory joint optimization method with limited propulsion power
Zhang et al. Energy-efficient trajectory optimization for UAV-assisted IoT networks
CN110364031B (en) Path planning and wireless communication method for unmanned aerial vehicle cluster in ground sensor network
CN109286913B (en) Energy consumption optimization method of unmanned aerial vehicle mobile edge computing system based on cellular network connection
CN111263332A (en) Unmanned aerial vehicle track and power joint optimization method based on deep reinforcement learning
CN109885088B (en) Unmanned aerial vehicle flight trajectory optimization method based on machine learning in edge computing network
CN109067490A (en) Resource allocation method for cellular-connected multi-unmanned aerial vehicle cooperative mobile edge computing systems
CN114690799A (en) Air-space-ground integrated unmanned aerial vehicle Internet of things data acquisition method based on information age
CN113660681B (en) Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
CN113359480B (en) Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN108668257B (en) Distributed unmanned aerial vehicle postman difference relay trajectory optimization method
CN112702713B (en) Low-altitude unmanned aerial vehicle communication deployment method under multi-constraint conditions
CN114980169A (en) Unmanned aerial vehicle auxiliary ground communication method based on combined optimization of track and phase
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
Gangula et al. A landing spot approach for enhancing the performance of UAV-aided wireless networks
CN113163332A (en) Road sign graph coloring unmanned aerial vehicle energy-saving endurance data collection method based on metric learning
CN114205769A (en) Joint trajectory optimization and bandwidth allocation method based on unmanned aerial vehicle data acquisition system
CN117119489A (en) Deployment and resource optimization method of wireless energy supply network based on multi-unmanned aerial vehicle assistance
Babu et al. Fairness-based energy-efficient 3-D path planning of a portable access point: A deep reinforcement learning approach
CN115407794A (en) Real-time trajectory planning method for sea-area safety communication unmanned aerial vehicles based on reinforcement learning
CN114020024B (en) Unmanned aerial vehicle path planning method based on Monte Carlo tree search
CN113050672B (en) Unmanned aerial vehicle path planning method for emergency information acquisition and transmission
CN113776531A (en) Multi-unmanned-aerial-vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network
CN116882270A (en) Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant