CN111786713A - Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning - Google Patents
- Publication number: CN111786713A (application number CN202010497656.4A)
- Authority: CN (China)
- Prior art keywords: unmanned aerial vehicle, network, ground, base station
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/14—Relay systems
- H04B7/15—Active relay systems
- H04B7/185—Space-based or airborne stations; Stations for satellite systems
- H04B7/18502—Airborne stations
- H04B7/18506—Communications with or from aircraft, i.e. aeronautical mobile service
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/18—Network planning tools
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/22—Traffic simulation tools or models
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
An unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning comprises the steps of: first, modeling the channel model, coverage model and energy consumption model of an unmanned aerial vehicle to-ground communication scenario; modeling the throughput maximization problem of the unmanned aerial vehicle to-ground communication network as a partially observable Markov decision process; obtaining local observation information and instantaneous rewards through continuous interaction between the unmanned aerial vehicles and the environment, and performing centralized training based on this information to obtain a distributed policy network; and deploying the policy network to each unmanned aerial vehicle, so that each unmanned aerial vehicle decides a moving direction and a moving distance based only on its own local observation, adjusts its hovering position and cooperates in a distributed manner. The invention further introduces proportional fair scheduling and the energy consumption of the unmanned aerial vehicles into the instantaneous reward function, thereby improving throughput, ensuring fairness of the service provided to ground users, reducing energy consumption, and enabling the unmanned aerial vehicle cluster to adapt to a dynamic environment.
Description
Technical Field
The invention relates to the technical field of wireless communication, in particular to a multi-agent deep reinforcement learning-based multi-unmanned-aerial-vehicle network hovering position optimization method.
Background
In recent years, owing to the high mobility, easy deployment and low cost of unmanned aerial vehicles (UAVs), UAV-based communication technology has drawn wide attention and become a new research hotspot in the field of wireless communication. UAV-assisted communication mainly has the following application scenarios: the UAV serves as a mobile base station to provide communication coverage for infrastructure-scarce or post-disaster areas; the UAV serves as a relay node to provide a wireless connection between two distant communication nodes that cannot be connected directly; and the UAV performs data distribution and collection. The present invention is primarily directed to the first scenario, in which the hovering positions of the UAVs determine the coverage performance and throughput of the entire UAV network. Ground devices served by the UAV network may be mobile, so the UAVs need to constantly adjust their hovering positions to maintain optimal performance.
In 2018, Qingqing Wu et al. proposed a UAV path planning scheme for a multi-UAV-to-ground communication system in the paper "Joint Trajectory and Communication Design for Multi-UAV Enabled Wireless Networks". Time is divided into multiple periods, the UAV movement trajectory is the same in every period, and in each time slot a UAV base station serves one specific ground user. The scheme models the optimization problem as a mixed-integer programming problem and solves it with block coordinate descent and successive convex approximation techniques, obtaining the optimal hovering position for each time slot within a period and maximizing the downlink throughput to the ground users. However, the solution proposed in that paper applies only to a static environment in which the ground devices have no mobility, and is not applicable to scenarios where the ground users keep moving. Chi Harold Liu et al. proposed a deep reinforcement learning-based UAV path planning algorithm in the paper "Energy-Efficient UAV Control for Effective and Fair Communication Coverage". A decision model is trained by deep reinforcement learning, and the model outputs the next decision of the UAVs (moving direction and moving distance) according to the current state. The method achieves fair wireless coverage over a large area and reduces the energy consumption of the UAVs as much as possible. However, it only considers the coverage performance of the UAV network, and the fairness it pursues is coarse-grained coverage fairness over areas rather than fine-grained fairness over individual users. In addition, the method is a centralized solution that requires a controller to collect the information of all drones in every time slot before making a decision.
In summary, existing UAV path planning techniques for ground communication networks based on UAV base stations mainly have the following shortcomings: (1) the dynamics of the environment, i.e., the mobility of the ground users, are not taken into account; (2) the adopted centralized algorithms depend on global information and centralized control, which is difficult in some large-scale scenarios, so a distributed control strategy is needed in which each UAV base station makes decisions using only the information it can obtain itself; (3) service fairness at the user level is ignored. Because of these shortcomings, existing UAV trajectory optimization methods for UAV networks cannot be applied to practical communication environments.
Disclosure of Invention
The invention aims to provide a multi-unmanned-aerial-vehicle hovering position optimization method based on multi-agent deep reinforcement learning to solve the above technical problems.
The technical scheme of the invention is as follows:
an unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning comprises the following steps:
(1) the method for establishing the multi-unmanned aerial vehicle communication network model comprises the following 4 steps:
(1.1) Establishing a scene model: a square target area with side length l is established, in which there are N ground users and M unmanned aerial vehicle base stations (UAV-BSs); the UAV base stations provide communication services for the ground users. Time is divided into T equal time slots; between one time slot and the next, a ground user may remain stationary or may move, so each UAV base station needs to find a new optimal hovering position in every time slot and, after reaching the target position, select ground users for data transmission service.
(1.2) Establishing an air-to-ground communication model: the invention uses an air-to-ground channel model to model the channel between a UAV base station and a ground user. Because of its high flight altitude, a UAV base station establishes a line-of-sight (LoS) link with a ground user more easily than a ground base station does. Under the LoS condition, the path loss between UAV base station m and ground user n is determined by the excess path loss coefficient η, the speed of light c, the subcarrier frequency f_c, the path loss exponent α, and the distance d_{n,m} between UAV base station m and ground user n, where d_{n,m} is obtained from the horizontal distance r_{n,m} and the fixed flying height h of the UAV base station. The channel gain g_{n,m}(t) can be expressed in terms of the path loss. According to the channel gain, the data transmission rate R_{n,m}(t) between UAV base station m and ground user n in time slot t is determined by the transmit power, the channel gain and the noise, where σ represents the additive white Gaussian noise, p_t the transmit power of the UAV base station, and g_{n,m}(t) the channel gain between UAV base station m and ground user n at time t.
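The display formulas for the path loss and the data rate are reproduced as figures in the original publication and are not recoverable from the text. As a non-limiting illustration in Python, the sketch below assumes one common LoS path-loss form built from the variables named above (η, c, f_c, α, and d_{n,m} = sqrt(r_{n,m}² + h²)) and a Shannon-type rate; the exact formulas, the bandwidth argument and all function names are assumptions made for this sketch only.

```python
import math

def path_loss_db(r_horizontal, h, f_c, eta, alpha):
    """Illustrative LoS path loss between UAV base station m and ground user n.

    r_horizontal: horizontal distance r_{n,m} (m); h: fixed flight height (m);
    f_c: subcarrier frequency (Hz); eta: excess path loss (dB); alpha: path loss exponent.
    """
    d = math.sqrt(r_horizontal ** 2 + h ** 2)             # 3-D distance d_{n,m}
    return 10.0 * alpha * math.log10(4.0 * math.pi * f_c * d / 3e8) + eta

def data_rate_bps(bandwidth, p_t, loss_db, noise_power):
    """Shannon-type rate R_{n,m}(t) from transmit power p_t, channel gain and AWGN power."""
    g = 10.0 ** (-loss_db / 10.0)                          # channel gain g_{n,m}(t) from path loss
    return bandwidth * math.log2(1.0 + p_t * g / noise_power)
```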
(1.3) Establishing a coverage model: due to hardware limitations, the coverage of each UAV base station is limited. The invention defines a maximum tolerable path loss L_max: if at a given moment the path loss between a UAV base station and a user is less than L_max, the established connection is considered reliable; otherwise the connection is considered failed. The effective coverage of each UAV base station can therefore be defined from the maximum tolerable path loss as a circle centered at the projection of the UAV base station on the ground with radius R_cov; according to the path loss formula, R_cov is the horizontal distance at which the path loss equals L_max.
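Continuing the same non-limiting assumption about the path-loss form, R_cov can be obtained by inverting that model at L_max; the inversion below is illustrative and not taken from the original figures.

```python
import math

def coverage_radius(h, f_c, eta, alpha, loss_max_db):
    """Illustrative horizontal coverage radius R_cov implied by the maximum tolerable loss L_max."""
    # Solve 10*alpha*log10(4*pi*f_c*d/c) + eta = L_max for the 3-D distance d.
    d_max = (3e8 / (4.0 * math.pi * f_c)) * 10.0 ** ((loss_max_db - eta) / (10.0 * alpha))
    # The UAV flies at fixed height h, so the ground-projected radius follows by Pythagoras.
    return math.sqrt(d_max ** 2 - h ** 2) if d_max > h else 0.0
```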
(1.4) Establishing an energy loss model: the invention mainly considers the energy loss caused by UAV movement. Given the flight speed V and the flight power p_f of the UAV, the flight energy consumption of UAV base station m in time slot t depends on the distance flown, which is computed from the x-axis and y-axis position coordinates of the UAV on the horizontal plane in consecutive time slots.
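A minimal sketch of the per-slot flight energy, under the natural (but assumed) reading that the energy equals the flight power p_f multiplied by the flight time, i.e. the distance flown divided by the speed V; names are illustrative.

```python
import math

def flight_energy(prev_xy, new_xy, p_f, v):
    """Illustrative per-slot flight energy: flight power times (distance flown / flight speed V)."""
    dx, dy = new_xy[0] - prev_xy[0], new_xy[1] - prev_xy[1]
    return p_f * math.hypot(dx, dy) / v
```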
(2) Modeling the problem as a partially observable Markov decision process:
Each UAV base station is treated as an agent. In each time slot with environment state s(t), agent m can only obtain a local observation o_m within its own coverage and, according to a decision function u_m(o_m), selects an action a_m from the action set A so as to maximize the total expected discounted reward sum_t γ^t r_m(t), where γ ∈ (0,1) is the discount coefficient and r_m(t) represents the reward of agent m at time t.
The system state set is S = {s(t) | s(t) = (S_u(t), S_g(t))}, containing the current state of the UAV base stations S_u(t) and the current state of the ground users S_g(t). The UAV base station state S_u(t) contains the current position information of the UAVs; the ground user state S_g(t) contains the position information of the current ground users.
In time slot t, after obtaining its current local observation information, UAV m needs to make a decision a_m(t) and move to the next hovering position, so the action consists of a flight rotation angle θ(t) and a movement distance d(t).
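For illustration only, an action (θ(t), d(t)) could be applied to the current hovering position as in the sketch below; clamping the new position to the l × l target area is an added assumption rather than a statement from the original.

```python
import math

def apply_action(x, y, theta, dist, side_len):
    """Move a UAV by rotation angle theta and distance dist, kept inside the l x l area."""
    new_x = min(max(x + dist * math.cos(theta), 0.0), side_len)
    new_y = min(max(y + dist * math.sin(theta), 0.0), side_len)
    return new_x, new_y
```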
System instantaneous reward r(t): the objective is to maximize the throughput of the UAV network while taking user service fairness and energy consumption into account. The extra throughput produced by adjusting the UAV hovering positions at each time t is therefore a positive reward, expressed as:
ΔC(t) = C(S_u(t+1), S_g(t)) − C(S_u(t), S_g(t))
where C(S_u(t), S_g(t)) denotes the throughput generated by the network when the UAV base station state is S_u(t) and the ground user state is S_g(t), and C(S_u(t+1), S_g(t)) denotes the throughput generated when the UAV base station state is S_u(t+1) and the ground user state is S_g(t). Considering fairness of user service: if many users are gathered in one area while another area has only one user, the UAV base stations would always hover over the high-density area in pursuit of maximum throughput and ignore the low-density area. The invention therefore applies a weight w_n(t) to the throughput reward of each user to implement proportional fair scheduling. R_req denotes the minimum communication rate required by the ground users, and R_n(t) denotes the average communication rate of ground user n from the start up to time t. When a UAV base station serves this user, R_n(t) increases and the user's weight gradually decreases; if the user is not served, R_n(t) decreases and the user's weight increases. As a result, the reward weight of a sparsely served area keeps growing and attracts a UAV base station to provide service.
Here a_{n,m}(t) is an indicator variable: at time t, if UAV base station m serves ground user n, then a_{n,m}(t) = 1, otherwise a_{n,m}(t) = 0. Combining the fairness-weighted throughput reward with the energy loss penalty, the invention obtains the system instantaneous reward r(t):
where α represents the weight of the energy consumption penalty; the larger α is, the more the system emphasizes energy consumption in its decisions, and the smaller α is, the more the energy consumption is ignored.
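The explicit expressions for w_n(t) and r(t) appear as figures in the original and are not reproduced here. The sketch below assumes a common proportional-fair weight, w_n(t) = R_req / R_n(t), and a reward that adds the fairness-weighted throughput gains of the served users and subtracts the α-weighted flight energy; both functional forms and all names are assumptions.

```python
def fairness_weight(r_req, avg_rate_n, eps=1e-6):
    """Assumed proportional-fair weight w_n(t): grows while user n stays under-served."""
    return r_req / max(avg_rate_n, eps)

def instant_reward(delta_rate, served, weights, delta_energy, alpha):
    """Assumed system reward r(t): weighted throughput gain minus alpha-weighted flight energy.

    delta_rate[n]: extra rate obtained by user n after the position adjustment;
    served[n]: indicator a_{n,m}(t) (1 if user n is served in this slot, else 0);
    weights[n]: proportional-fair weight w_n(t).
    """
    fair_gain = sum(w * a * dc for w, a, dc in zip(weights, served, delta_rate))
    return fair_gain - alpha * delta_energy
```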
Local observation set o(t) = {o_1(t), …, o_M(t)}: when multiple UAV base stations cooperate over a wide area, each UAV cannot observe global information and can only observe the information of ground users within its own coverage. o_m(t) denotes the position information of the ground users within the coverage of UAV base station m observed at time t.
(3) Training based on a multi-agent deep reinforcement learning algorithm:
the multi-agent deep reinforcement learning algorithm MADDPG is introduced into the hovering position optimization of the unmanned aerial vehicle in the ground communication network, a centralized training and distributed execution architecture is adopted, global information is used during training, the gradient updating of a decision function of each unmanned aerial vehicle is better guided, each unmanned aerial vehicle only uses local information observed by the unmanned aerial vehicle to make a next decision during execution, and the needs of an actual scene are better met; each agent adopts an Actor-Critic structured DDPG network for training, the strategy network is used for fitting a strategy function u (o), inputting a local observation o and outputting an action strategy a; the evaluation network is used to fit a state-action function Q (s, a) representing the desired reward for taking action a when the system state is s; let u be { u ═ u1,…,uMDenotes the deterministic policy functions of M agents,parameter representing each policy network, Q ═ Q1,…,QMDenotes the evaluation network of M agents,a parameter indicative of an evaluation network, the step (3) comprising:
Step (3) comprises the following sub-steps:
(3.1) Initialize the experience replay buffer and set its size B; initialize the parameters of each DDPG network, the number of training rounds, and the other hyperparameters.
(3.2) Start from training round epoch = 1 and time t = 1.
(3.3) Obtain the local observation information o of each UAV and the current state s of the whole system. Each UAV m uses the local observation obtained in time slot t and outputs decision information a_m based on an ε-greedy strategy and its DDPG network, adjusts its hovering position, and, according to the path loss between the new hovering position and the ground users, greedily selects the W ground users with the lowest path loss for communication service; it obtains an instantaneous reward r, the system reaches the next state s', and the local observations o' are obtained. Store (s, o, a, r, s', o') as a sample in the experience replay buffer, where a = {a_1, …, a_M} denotes the joint action of all UAVs and o = {o_1, …, o_M} denotes the local observations of all UAVs; set t = t + 1.
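A minimal experience replay buffer matching step (3.3); the class and method names are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s, o, a, r, s', o') transitions and draws uniform minibatches of size K."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, o, a, r, s_next, o_next):
        self.buffer.append((s, o, a, r, s_next, o_next))

    def sample(self, k):
        return random.sample(self.buffer, k)

    def __len__(self):
        return len(self.buffer)
```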
(3.4) If the number of samples stored in the replay buffer is greater than B, go to step (3.5); otherwise continue collecting samples and return to step (3.3).
(3.5) For each agent m, randomly sample a fixed number K of samples from the experience replay buffer and compute the target values. For the k-th sample (s_k, o_k, a_k, r_k, s'_k, o'_k), the target value y_k can be expressed as y_k = r_k + γ Q'_m(s'_k, a'_1, …, a'_M), where Q'_m denotes the target network of the evaluation network of the m-th agent, u'_m the target network of the policy network of the m-th agent, r_k the instantaneous reward in the k-th sample, and a'_m the decision made by UAV m in system state s'_k according to its local observation. Using gradient descent based on the global information, minimize the loss function and update the parameters of the agent's evaluation network.
Then, according to the evaluation network and the sample information, update the parameters of the agent's policy network based on the sampled policy gradient.
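A hedged sketch of one MADDPG update for agent m corresponding to step (3.5): the target y_k = r_k + γ Q'_m(s'_k, a'_1, …, a'_M) with a'_j = u'_j(o'_j), a mean-squared-error critic loss, and a policy-gradient actor step. The batching convention (each entry of the batch is a tensor stacked over the K samples), the MSE form of the loss and the optimizer interface are assumptions.

```python
import torch

def maddpg_update(m, batch, actors, critics, target_actors, target_critics,
                  actor_opts, critic_opts, gamma):
    """One MADDPG update for agent m; batch entries are tensors stacked over K samples."""
    num_agents = len(actors)
    # Critic target: y = r + gamma * Q'_m(s', a'_1..a'_M), with a'_j = u'_j(o'_j).
    with torch.no_grad():
        next_actions = [target_actors[j](batch["o_next"][j]) for j in range(num_agents)]
        y = batch["r"][m] + gamma * target_critics[m](
            batch["s_next"], torch.cat(next_actions, dim=-1))      # shape (K, 1)
    # Critic step: minimize the squared error between y and Q_m(s, a_1..a_M).
    q = critics[m](batch["s"], torch.cat(batch["a"], dim=-1))
    critic_loss = torch.mean((y - q) ** 2)
    critic_opts[m].zero_grad(); critic_loss.backward(); critic_opts[m].step()

    # Actor step: ascend Q_m with respect to agent m's own action, others taken from the batch.
    actions = [a.detach() for a in batch["a"]]
    actions[m] = actors[m](batch["o"][m])
    actor_loss = -critics[m](batch["s"], torch.cat(actions, dim=-1)).mean()
    actor_opts[m].zero_grad(); actor_loss.backward(); actor_opts[m].step()
```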
(3.6) After every certain number of rounds, update the target network parameters θ^{Q'} and θ^{u'}: θ^{Q'} = τθ^Q + (1−τ)θ^{Q'}, θ^{u'} = τθ^u + (1−τ)θ^{u'}. When the total duration T is reached or the UAV energy is exhausted, exit the current training round; otherwise return to step (3.3). If the number of training rounds has been reached, exit the training process; otherwise start a new training round.
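The soft target-network update θ' = τθ + (1 − τ)θ' of step (3.6) as a small helper (Polyak averaging); the function name is illustrative.

```python
def soft_update(target_net, online_net, tau):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    for tp, p in zip(target_net.parameters(), online_net.parameters()):
        tp.data.copy_(tau * p.data + (1.0 - tau) * tp.data)
```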
(4) Distribute the trained policy network u to each UAV and deploy the UAVs to the target area; in each time slot, each UAV adjusts its hovering position according to its own local observation and provides communication service to the ground users.
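For illustration, distributed execution in step (4), where each UAV maps only its own local observation to its next hovering position, might look like the sketch below; the action scaling and the tensor interface are assumptions.

```python
import math
import torch

def execute_step(actor, local_obs, x, y, side_len, max_dist):
    """One distributed decision: local observation -> (theta, d) -> new clipped hover position."""
    with torch.no_grad():
        action = actor(torch.as_tensor(local_obs, dtype=torch.float32))
    theta = float(action[0]) * math.pi                    # [-1, 1] -> angle in [-pi, pi]
    dist = (float(action[1]) + 1.0) / 2.0 * max_dist      # [-1, 1] -> distance in [0, max_dist]
    new_x = min(max(x + dist * math.cos(theta), 0.0), side_len)
    new_y = min(max(y + dist * math.sin(theta), 0.0), side_len)
    return new_x, new_y
```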
The invention has the following beneficial effects: the invention provides an unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning, which models the throughput maximization problem of a UAV-to-ground communication network as a partially observable Markov decision process and introduces the multi-agent deep reinforcement learning method MADDPG for centralized training and distributed execution, solving the UAV hovering position optimization problem in a dynamic environment. The method enables the UAV cluster to adapt better to a dynamic environment, and the UAVs can cooperate in a distributed manner without relying on a centralized controller.
Drawings
Fig. 1 is a schematic view of a scene of the unmanned aerial vehicle to-ground communication network according to the present invention.
FIG. 2 is a flow chart of the unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning.
Fig. 3 is a flow chart of training the distributed policy networks of the unmanned aerial vehicles based on multi-agent deep reinforcement learning according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning is applied to emergency communication recovery in areas that lack ground infrastructure or have been struck by disaster. As shown in Fig. 1, the area lacks infrastructure communication facilities, and unmanned aerial vehicles are used as mobile base stations to provide communication coverage. The ground environment changes dynamically and the ground devices may move, so the UAV base stations need to continuously adjust their hovering positions to provide better communication service (maximize the throughput of the system). At the same time, fairness of service and energy consumption are taken into account: certain ground users must not be ignored in the pursuit of maximum throughput, and the energy consumed by UAV base station movement should be reduced as far as possible. As shown in Fig. 2, first the communication model, coverage model, energy consumption model and so on of the specific application scenario are modeled and the optimization objective is constructed; second, according to the optimization objective and the characteristics of the multi-UAV system, the optimization problem is modeled as a partially observable Markov decision process; then a simulation platform is used to simulate the multi-UAV-to-ground communication scenario, samples are collected through the interaction of the UAV cluster with the environment, and centralized training with the multi-agent deep reinforcement learning algorithm MADDPG yields a distributed policy for each UAV. Finally, the trained policy networks are deployed on the UAVs, the UAV cluster is deployed to the target area, and the UAVs cooperate with one another to provide high-throughput, low-energy-consumption and fair communication coverage.
The method comprises the following specific steps:
(1) the method for establishing the multi-unmanned aerial vehicle communication network model comprises the following 4 steps:
(1.1) Establishing a scene model: a square target area with side length l is established, in which there are N ground users and M unmanned aerial vehicle base stations (UAV-BSs); the UAV base stations provide communication services for the ground users. Time is divided into T equal time slots; between one time slot and the next, a ground user may remain stationary or may move, so each UAV base station needs to find a new optimal hovering position in every time slot and, after reaching the target position, select ground users for data transmission service.
(1.2) Establishing an air-to-ground communication model: the invention uses an air-to-ground channel model to model the channel between a UAV base station and a ground user. Because of its high flight altitude, a UAV base station establishes a line-of-sight (LoS) link with a ground user more easily than a ground base station does. Under the LoS condition, the path loss between UAV base station m and ground user n is determined by the excess path loss coefficient η, the speed of light c, the subcarrier frequency f_c, the path loss exponent α, and the distance d_{n,m} between UAV base station m and ground user n, where r_{n,m} is the horizontal distance and h is the fixed flying height of the UAV base station. The channel gain g_{n,m}(t) can be expressed in terms of the path loss. According to the channel gain, the data transmission rate R_{n,m}(t) between UAV base station m and ground user n in time slot t is determined by the transmit power, the channel gain and the noise, where σ represents the additive white Gaussian noise, p_t the transmit power of the UAV base station, and g_{n,m}(t) the channel gain between UAV base station m and ground user n at time t.
(1.3) Establishing a coverage model: due to hardware limitations, the coverage of each UAV base station is limited. The invention defines a maximum tolerable path loss L_max: if at a given moment the path loss between a UAV base station and a user is less than L_max, the established connection is considered reliable; otherwise the connection is considered failed. The effective coverage of each UAV base station can therefore be defined from the maximum tolerable path loss as a circle centered at the projection of the UAV base station on the ground with radius R_cov; according to the path loss formula, R_cov is the horizontal distance at which the path loss equals L_max.
(1.4) Establishing an energy loss model: the invention mainly considers the energy loss caused by UAV movement. Given the flight speed V and the flight power p_f of the UAV, the flight energy consumption of UAV base station m in time slot t depends on the distance flown, which is computed from the x-axis and y-axis position coordinates of the UAV on the horizontal plane in consecutive time slots.
(2) Modeling the problem as a partially observable Markov decision process:
Each UAV base station is treated as an agent. In each time slot with environment state s(t), agent m can only obtain a local observation o_m within its own coverage and, according to a decision function u_m(o_m), selects an action a_m from the action set A so as to maximize the total expected discounted reward sum_t γ^t r_m(t), where γ ∈ (0,1) is the discount coefficient and r_m(t) represents the reward of agent m at time t.
The system state set is S = {s(t) | s(t) = (S_u(t), S_g(t))}, containing the current state of the UAV base stations S_u(t) and the current state of the ground users S_g(t). The UAV base station state S_u(t) contains the current position information of the UAVs; the ground user state S_g(t) contains the position information of the current ground users.
In time slot t, after obtaining its current local observation information, UAV m needs to make a decision a_m(t) and move to the next hovering position, so the action consists of a flight rotation angle θ(t) and a movement distance d(t).
System instantaneous reward r(t): the objective is to maximize the throughput of the UAV network while taking user service fairness and energy consumption into account. The extra throughput produced by adjusting the UAV hovering positions at each time t is therefore a positive reward, expressed as:
ΔC(t) = C(S_u(t+1), S_g(t)) − C(S_u(t), S_g(t))
where C(S_u(t), S_g(t)) denotes the throughput generated by the network when the UAV base station state is S_u(t) and the ground user state is S_g(t), and C(S_u(t+1), S_g(t)) denotes the throughput generated when the UAV base station state is S_u(t+1) and the ground user state is S_g(t). Considering fairness of user service: if many users are gathered in one area while another area has only one user, the UAV base stations would always hover over the high-density area in pursuit of maximum throughput and ignore the low-density area, so the invention applies a weight w_n(t) to the throughput reward of each user to implement proportional fair scheduling. R_req denotes the minimum communication rate required by the ground users, and R_n(t) denotes the average communication rate of ground user n from the start up to time t. When a UAV base station serves this user, R_n(t) increases and the user's weight gradually decreases; if the user is not served, R_n(t) decreases and the user's weight increases. As a result, the reward weight of a sparsely served area keeps growing and attracts a UAV base station to provide service.
Therefore, combining the fairness-weighted throughput reward with the energy loss penalty, the invention obtains the system instantaneous reward r(t):
where α represents the weight of the energy consumption penalty; the larger α is, the more the system emphasizes energy consumption in its decisions, and the smaller α is, the more the energy consumption is ignored.
Local observation set o(t) = {o_1(t), …, o_M(t)}: when multiple UAV base stations cooperate over a wide area, each UAV cannot observe global information and can only observe the information of ground users within its own coverage. o_m(t) denotes the position information of the ground users within the coverage of UAV base station m.
(3) Training based on a multi-agent deep reinforcement learning algorithm:
The invention introduces the multi-agent deep reinforcement learning algorithm MADDPG into the hovering position optimization of the UAV-to-ground communication network, adopting a centralized training and distributed execution architecture: global information is used during training to better guide the gradient updates of each UAV's decision function, while during execution each UAV makes its next decision using only the local information it observes, which better matches the needs of practical scenarios. Each agent is trained with a DDPG network of Actor-Critic structure: the policy network fits the policy function u(o), taking a local observation o as input and outputting an action a; the evaluation network fits the state-action function Q(s, a), which represents the expected reward of taking action a when the system state is s. Let u = {u_1, …, u_M} denote the deterministic policy functions of the M agents, θ^u the parameters of the policy networks, Q = {Q_1, …, Q_M} the evaluation networks of the M agents, and θ^Q the parameters of the evaluation networks. As shown in Fig. 3, step (3) includes:
(3.1) Initialize the experience replay buffer and set its size B; initialize the parameters θ of each DDPG network, the number of training rounds P, the episode duration T, and the other hyperparameters.
(3.2) Start from training round epoch = 1 and time t = 1.
(3.3) Obtain the local observation information o of each UAV and the current state s of the whole system. Each UAV m uses the local observation obtained in time slot t and outputs decision information a_m based on an ε-greedy strategy and its DDPG network, adjusts its hovering position, and, according to the path loss between the new hovering position and the ground users, greedily selects the W ground users with the lowest path loss for communication service; it obtains an instantaneous reward r, the system reaches the next state s', and the local observations o' are obtained. Store (s, o, a, r, s', o') as a sample in the experience replay buffer, where a = {a_1, …, a_M} denotes the joint action of all UAVs and o = {o_1, …, o_M} denotes the local observations of all UAVs; set t = t + 1.
(3.4) If the number of samples stored in the replay buffer is greater than B, go to step (3.5); otherwise continue collecting samples and return to step (3.3).
(3.5) For each agent m, randomly sample a fixed number K of samples from the experience replay buffer and compute the target values. For the k-th sample (s_k, o_k, a_k, r_k, s'_k, o'_k), the target value y_k can be expressed as y_k = r_k + γ Q'_m(s'_k, a'_1, …, a'_M), where Q'_m denotes the target network of the evaluation network of the m-th agent, u'_m the target network of the policy network of the m-th agent, r_k the instantaneous reward in the k-th sample, and a'_m the decision made by UAV m in system state s'_k according to its local observation. Using gradient descent based on the global information, minimize the loss function and update the parameters of the agent's evaluation network; then, according to the evaluation network and the sample information, update the parameters of the agent's policy network based on the sampled policy gradient.
(3.6) After every certain number of rounds, update the evaluation target network parameters θ^{Q'} and the policy target network parameters θ^{u'}: θ^{Q'} = τθ^Q + (1−τ)θ^{Q'}, θ^{u'} = τθ^u + (1−τ)θ^{u'}. When the total duration T is reached or the UAV energy is exhausted, exit the current training round; otherwise return to step (3.3). If the number of training rounds has been reached, exit the training process; otherwise start a new training round.
(4) Distribute the trained policy network u to each UAV and deploy the UAVs to the target area; in each time slot, each UAV adjusts its hovering position according to its own local observation and provides communication service to the ground users.
In summary:
The invention provides an unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning, which models the throughput maximization problem of a multi-UAV-to-ground communication scenario as a partially observable Markov decision process and solves it with the MADDPG algorithm, so that the UAV cluster can adapt to a dynamic environment and cooperate in a distributed manner, achieving high throughput, low energy consumption and service fairness for the network.
The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (1)
1. An unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning is characterized by comprising the following steps:
(1) establishing a multi-unmanned-aerial-vehicle-to-ground communication network model
(1.1) establishing a scene model: establishing a square target area with side length l, in which there are N ground users and M unmanned aerial vehicle base stations, the unmanned aerial vehicle base stations providing communication service for the ground users; time is divided into T equal time slots, and between one time slot and the next a ground user may remain static or may move, so each unmanned aerial vehicle base station needs to search for a new optimal hovering position in every time slot and, after reaching the target position, select ground users for data transmission service;
(1.2) establishing an air-to-ground communication model: an air-to-ground channel model is used to model the channel between an unmanned aerial vehicle base station and a ground user; because of its high flight altitude, the unmanned aerial vehicle base station establishes a line-of-sight link LoS with a ground user more easily than a ground base station does; under the LoS condition, the path loss between unmanned aerial vehicle base station m and ground user n is determined by the excess path loss coefficient η, the speed of light c, the subcarrier frequency f_c, the path loss exponent α, and the distance d_{n,m} between unmanned aerial vehicle base station m and ground user n, where r_{n,m} is the horizontal distance and h is the fixed flying height of the unmanned aerial vehicle base station; the channel gain g_{n,m}(t) is expressed in terms of the path loss, and according to the channel gain, the data transmission rate between unmanned aerial vehicle base station m and ground user n in time slot t is R_{n,m}(t), where σ represents the additive white Gaussian noise, p_t the transmit power of the unmanned aerial vehicle base station, and g_{n,m}(t) the channel gain between unmanned aerial vehicle base station m and ground user n at time t;
(1.3) establishing a coverage model: a maximum tolerable path loss L_max is defined; if at a given moment the path loss between the unmanned aerial vehicle base station and a user is less than L_max, the connection is considered reliably established, otherwise the connection is considered failed; the effective coverage of each unmanned aerial vehicle base station is defined from the maximum tolerable path loss as a circle centered at the projection of the unmanned aerial vehicle base station on the ground with radius R_cov, and according to the path loss formula R_cov is the horizontal distance at which the path loss equals L_max;
(1.4) establishing an energy loss model: attention is paid to the energy loss caused by movement of the unmanned aerial vehicle; given the flight speed V and the flight power p_f of the unmanned aerial vehicle, the flight energy consumption Δe_m(t) of unmanned aerial vehicle base station m in time slot t depends on the distance flown, which is computed from the x-axis and y-axis position coordinates of the unmanned aerial vehicle on the horizontal plane at time t;
(2) modeling the problem as a partially observable Markov decision process:
each unmanned aerial vehicle base station is treated as an agent; in each time slot with environment state s(t), agent m can only obtain a local observation o_m within its own coverage and, according to a decision function u_m(o_m), selects an action a_m from the action set so as to maximize the total expected discounted reward sum_t γ^t r_m(t), where γ ∈ (0,1) is the discount coefficient and r_m(t) represents the reward of agent m at time t;
the system state set is S = {s(t) | s(t) = (S_u(t), S_g(t))}, containing the current state of the unmanned aerial vehicle base stations S_u(t) and the current state of the ground users S_g(t); the state of each unmanned aerial vehicle base station contains its current position information, and the state of each ground user contains the position information of the current ground user;
in time slot t, after obtaining its current local observation information, unmanned aerial vehicle m needs to make a decision a_m(t) and move to the next hovering position, so the action set includes the flight rotation angle θ(t) and the movement distance d(t);
system instantaneous reward r(t): the throughput of the unmanned aerial vehicle network is maximized while user service fairness and energy consumption are considered; thus the extra throughput generated by adjusting the hovering positions of the unmanned aerial vehicles at each time t is a positive reward, expressed as:
ΔC(t) = C(S_u(t+1), S_g(t)) − C(S_u(t), S_g(t))
wherein C(S_u(t), S_g(t)) denotes the throughput generated by the network when the unmanned aerial vehicle base station state is S_u(t) and the ground user state is S_g(t), and C(S_u(t+1), S_g(t)) denotes the throughput generated by the network when the unmanned aerial vehicle base station state is S_u(t+1) and the ground user state is S_g(t); considering fairness of user service, if many users are gathered in one area while another area has only a small number of users, the unmanned aerial vehicle base station would always hover over the high-density area in pursuit of maximum throughput and ignore the low-density area, so a weight w_n(t) is applied to the throughput reward of each user to implement proportional fair scheduling; R_req denotes the minimum communication rate required by the ground users, and R_n(t) denotes the average communication rate of ground user n from the beginning up to time t; when the unmanned aerial vehicle base station serves this user, R_n(t) increases and the user's weight gradually becomes smaller; if the user is not served, R_n(t) decreases and the user's weight increases; therefore the reward weight of a sparsely served user area is continuously increased and attracts the unmanned aerial vehicle base station to provide service;
wherein a_{n,m}(t) is an indicator variable: at time t, if unmanned aerial vehicle base station m serves ground user n, then a_{n,m}(t) = 1, otherwise a_{n,m}(t) = 0; therefore, considering the fairness throughput reward and the energy loss penalty comprehensively, the system instantaneous reward is r(t):
wherein α represents the weight of the energy consumption penalty; the larger α is, the more the system emphasizes energy consumption in its decisions, and the smaller α is, the more the energy consumption is ignored;
local observation set o(t) = {o_1(t), ..., o_M(t)}: when a plurality of unmanned aerial vehicle base stations cooperate over a wide area, each unmanned aerial vehicle cannot observe global information and can only observe ground user information within its own coverage; o_m(t) represents the position information of the ground users within the coverage of unmanned aerial vehicle base station m observed at time t;
(3) training based on a multi-agent deep reinforcement learning algorithm:
the multi-agent deep reinforcement learning algorithm MADDPG is introduced into the hovering position optimization of the unmanned aerial vehicle to-ground communication network, adopting a centralized training and distributed execution architecture: global information is used during training to better guide the gradient updating of the decision function of each unmanned aerial vehicle, and during execution each unmanned aerial vehicle uses only the local information it observes to make the next decision, which better suits the needs of practical scenarios; each agent is trained with a DDPG network of Actor-Critic structure: the policy network is used to fit the policy function u(o), taking a local observation o as input and outputting an action a; the evaluation network is used to fit the state-action function Q(s, a), which represents the expected reward of taking action a when the system state is s; let u = {u_1, ..., u_M} denote the deterministic policy functions of the M agents, θ^u the parameters of the policy networks, Q = {Q_1, ..., Q_M} the evaluation networks of the M agents, and θ^Q the parameters of the evaluation networks;
(3.1) initializing the experience replay buffer, setting its size, and initializing the parameters of each DDPG network and the number of training rounds;
(3.2) starting from training round epoch = 1 and time t = 1;
(3.3) obtaining the local observation information o of each unmanned aerial vehicle and the current state s of the whole system; each unmanned aerial vehicle m uses the local observation obtained in time slot t and outputs decision information a_m based on an ε-greedy strategy and its DDPG network, adjusts its hovering position, and, according to the path loss between the new hovering position and the ground users, greedily selects the W ground users with the lowest path loss for communication service; it obtains an instantaneous reward r, the system reaches the next state s', and local observation information o' is obtained; (s, o, a, r, s', o') is stored as a sample in the experience replay buffer, where a = {a_1, ..., a_M} denotes the joint action of all unmanned aerial vehicles and o = {o_1, ..., o_M} denotes the local observation information of all unmanned aerial vehicles; t = t + 1;
(3.4) if the number of samples stored in the replay buffer is greater than B, going to step (3.5); otherwise continuing to collect samples and returning to step (3.3);
(3.5) for each agent m, randomly sampling a fixed number K of samples from the experience replay buffer and calculating the target values, wherein for the k-th sample (s_k, o_k, a_k, r_k, s'_k, o'_k) the target value y_k can be expressed as y_k = r_k + γ Q'_m(s'_k, a'_1, ..., a'_M), wherein Q'_m denotes the target network of the evaluation network of the m-th agent, u'_m the target network of the policy network of the m-th agent, r_k the instantaneous reward in the k-th sample, and a'_m the decision made by unmanned aerial vehicle m in system state s'_k according to its local observation; based on the global information, the loss function is minimized by gradient descent and the parameters of the agent's evaluation network are updated; then, according to the evaluation network and the sample information, the parameters of the agent's policy network are updated based on the sampled policy gradient;
(3.6) after every certain number of rounds, updating the evaluation target network parameters θ^{Q'} and the policy target network parameters θ^{u'}: θ^{Q'} = τθ^Q + (1−τ)θ^{Q'}, θ^{u'} = τθ^u + (1−τ)θ^{u'}, where τ ∈ (0,1) represents the update weight; when the total duration T is reached or the energy of the unmanned aerial vehicles is exhausted, exiting the current training round, otherwise returning to step (3.3); if the number of training rounds has been reached, exiting the training process, otherwise entering a new training round;
(4) distributing the trained policy network u to each unmanned aerial vehicle and deploying the unmanned aerial vehicles to the target area; in each time slot, each unmanned aerial vehicle adjusts its hovering position according to its own local observation and provides communication service to the ground users.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010497656.4A CN111786713B (en) | 2020-06-04 | 2020-06-04 | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010497656.4A CN111786713B (en) | 2020-06-04 | 2020-06-04 | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111786713A true CN111786713A (en) | 2020-10-16 |
CN111786713B CN111786713B (en) | 2021-06-08 |
Family
ID=72753669
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010497656.4A Active CN111786713B (en) | 2020-06-04 | 2020-06-04 | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111786713B (en) |
Cited By (78)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112256056A (en) * | 2020-10-19 | 2021-01-22 | 中山大学 | Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning |
CN112511250A (en) * | 2020-12-03 | 2021-03-16 | 中国人民解放军火箭军工程大学 | DRL-based multi-unmanned aerial vehicle air base station dynamic deployment method and system |
CN112511197A (en) * | 2020-12-01 | 2021-03-16 | 南京工业大学 | Unmanned aerial vehicle auxiliary elastic video multicast method based on deep reinforcement learning |
CN112512115A (en) * | 2020-11-20 | 2021-03-16 | 北京邮电大学 | Method and device for determining position of air base station and electronic equipment |
CN112566209A (en) * | 2020-11-24 | 2021-03-26 | 山西三友和智慧信息技术股份有限公司 | UAV-BSs energy and service priority track design method based on double Q learning |
CN112636811A (en) * | 2020-12-08 | 2021-04-09 | 北京邮电大学 | Relay unmanned aerial vehicle deployment method and device |
CN112672361A (en) * | 2020-12-17 | 2021-04-16 | 东南大学 | Large-scale MIMO capacity increasing method based on unmanned aerial vehicle cluster deployment |
CN112752357A (en) * | 2020-12-02 | 2021-05-04 | 宁波大学 | Online unmanned aerial vehicle auxiliary data collection method and device based on energy harvesting technology |
CN112821938A (en) * | 2021-01-08 | 2021-05-18 | 重庆大学 | Total throughput and energy consumption optimization method of air-space-ground satellite communication system |
CN112904890A (en) * | 2021-01-15 | 2021-06-04 | 北京国网富达科技发展有限责任公司 | Unmanned aerial vehicle automatic inspection system and method for power line |
CN112947575A (en) * | 2021-03-17 | 2021-06-11 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning |
CN113094982A (en) * | 2021-03-29 | 2021-07-09 | 天津理工大学 | Internet of vehicles edge caching method based on multi-agent deep reinforcement learning |
CN113115344A (en) * | 2021-04-19 | 2021-07-13 | 中国人民解放军火箭军工程大学 | Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization |
CN113162679A (en) * | 2021-04-01 | 2021-07-23 | 南京邮电大学 | DDPG algorithm-based IRS (inter-Range instrumentation System) auxiliary unmanned aerial vehicle communication joint optimization method |
CN113190039A (en) * | 2021-04-27 | 2021-07-30 | 大连理工大学 | Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning |
CN113194488A (en) * | 2021-03-31 | 2021-07-30 | 西安交通大学 | Unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system |
CN113242556A (en) * | 2021-06-04 | 2021-08-10 | 重庆邮电大学 | Unmanned aerial vehicle resource dynamic deployment method based on differentiated services |
CN113255218A (en) * | 2021-05-27 | 2021-08-13 | 电子科技大学 | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network |
CN113286275A (en) * | 2021-04-23 | 2021-08-20 | 南京大学 | Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning |
CN113286314A (en) * | 2021-05-25 | 2021-08-20 | 重庆邮电大学 | Unmanned aerial vehicle base station deployment and user association method based on Q learning algorithm |
CN113328775A (en) * | 2021-05-28 | 2021-08-31 | 怀化学院 | UAV height positioning system and computer storage medium |
CN113346944A (en) * | 2021-06-28 | 2021-09-03 | 上海交通大学 | Time delay minimization calculation task unloading method and system in air-space-ground integrated network |
CN113342029A (en) * | 2021-04-16 | 2021-09-03 | 山东师范大学 | Maximum sensor data acquisition path planning method and system based on unmanned aerial vehicle cluster |
CN113364495A (en) * | 2021-05-25 | 2021-09-07 | 西安交通大学 | Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system |
CN113359480A (en) * | 2021-07-16 | 2021-09-07 | 中国人民解放军火箭军工程大学 | Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm |
CN113364630A (en) * | 2021-06-15 | 2021-09-07 | 广东技术师范大学 | Quality of service (QoS) differentiation optimization method and device |
CN113382060A (en) * | 2021-06-07 | 2021-09-10 | 北京理工大学 | Unmanned aerial vehicle track optimization method and system in Internet of things data collection |
CN113395708A (en) * | 2021-07-13 | 2021-09-14 | 东南大学 | Multi-autonomous-subject centralized region coverage method and system based on global environment prediction |
CN113392971A (en) * | 2021-06-11 | 2021-09-14 | 武汉大学 | Strategy network training method, device, equipment and readable storage medium |
CN113467508A (en) * | 2021-06-30 | 2021-10-01 | 天津大学 | Multi-unmanned aerial vehicle intelligent cooperative decision-making method for trapping task |
CN113572548A (en) * | 2021-06-18 | 2021-10-29 | 南京理工大学 | Unmanned aerial vehicle network cooperative fast frequency hopping method based on multi-agent reinforcement learning |
CN113613339A (en) * | 2021-07-10 | 2021-11-05 | 西北农林科技大学 | Channel access method of multi-priority wireless terminal based on deep reinforcement learning |
CN113625751A (en) * | 2021-08-05 | 2021-11-09 | 南京航空航天大学 | Unmanned aerial vehicle position and resource joint optimization method for air-ground integrated federal learning |
CN113625569A (en) * | 2021-08-12 | 2021-11-09 | 中国人民解放军32802部队 | Small unmanned aerial vehicle prevention and control hybrid decision method and system based on deep reinforcement learning and rule driving |
CN113641192A (en) * | 2021-07-06 | 2021-11-12 | 暨南大学 | Route planning method for unmanned aerial vehicle crowd sensing task based on reinforcement learning |
CN113660681A (en) * | 2021-05-31 | 2021-11-16 | 西北工业大学 | Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission |
CN113691294A (en) * | 2021-09-27 | 2021-11-23 | 中国人民解放军空军预警学院 | Near-field sparse array antenna beam establishing method and device |
CN113706023A (en) * | 2021-08-31 | 2021-11-26 | 哈尔滨理工大学 | Shipboard aircraft guarantee operator scheduling method based on deep reinforcement learning |
CN113762512A (en) * | 2021-11-10 | 2021-12-07 | 北京航空航天大学杭州创新研究院 | Distributed model training method, system and related device |
CN113776531A (en) * | 2021-07-21 | 2021-12-10 | 电子科技大学长三角研究院(湖州) | Multi-unmanned-aerial-vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network |
CN114021775A (en) * | 2021-09-30 | 2022-02-08 | 成都海天数联科技有限公司 | Intelligent body handicap device putting method based on optimal solution |
CN114051252A (en) * | 2021-09-28 | 2022-02-15 | 嘉兴学院 | Multi-user intelligent transmitting power control method in wireless access network |
CN114124784A (en) * | 2022-01-27 | 2022-03-01 | 军事科学院系统工程研究院网络信息研究所 | Intelligent routing decision protection method and system based on vertical federation |
CN114142912A (en) * | 2021-11-26 | 2022-03-04 | 西安电子科技大学 | Resource control method for guaranteeing time coverage continuity of high-dynamic air network |
CN114222251A (en) * | 2021-11-30 | 2022-03-22 | 中山大学·深圳 | Adaptive network forming and track optimizing method for multiple unmanned aerial vehicles |
CN114268986A (en) * | 2021-12-14 | 2022-04-01 | 北京航空航天大学 | Unmanned aerial vehicle computing unloading and charging service efficiency optimization method |
CN114268963A (en) * | 2021-12-24 | 2022-04-01 | 北京航空航天大学 | Unmanned aerial vehicle network autonomous deployment method facing communication coverage |
CN114339842A (en) * | 2022-01-06 | 2022-04-12 | 北京邮电大学 | Method and device for designing dynamic trajectory of unmanned aerial vehicle cluster under time-varying scene based on deep reinforcement learning |
CN114374951A (en) * | 2022-01-12 | 2022-04-19 | 重庆邮电大学 | Multi-unmanned aerial vehicle dynamic pre-deployment method |
CN114372612A (en) * | 2021-12-16 | 2022-04-19 | 电子科技大学 | Route planning and task unloading method for unmanned aerial vehicle mobile edge computing scene |
CN114449482A (en) * | 2022-03-11 | 2022-05-06 | 南京理工大学 | Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning |
CN114548551A (en) * | 2022-02-21 | 2022-05-27 | 广东汇天航空航天科技有限公司 | Method and device for determining residual endurance time, aircraft and medium |
CN114567888A (en) * | 2022-03-04 | 2022-05-31 | 重庆邮电大学 | Multi-unmanned aerial vehicle dynamic deployment method |
CN114578335A (en) * | 2022-03-03 | 2022-06-03 | 电子科技大学长三角研究院(衢州) | Positioning method based on multi-agent deep reinforcement learning and least square |
CN114625151A (en) * | 2022-03-10 | 2022-06-14 | 大连理工大学 | Underwater robot obstacle avoidance path planning method based on reinforcement learning |
CN114679699A (en) * | 2022-03-23 | 2022-06-28 | 重庆邮电大学 | Multi-unmanned-aerial-vehicle energy-saving cruise communication coverage method based on deep reinforcement learning |
CN114884895A (en) * | 2022-05-05 | 2022-08-09 | 郑州轻工业大学 | Intelligent traffic scheduling method based on deep reinforcement learning |
CN114942653A (en) * | 2022-07-26 | 2022-08-26 | 北京邮电大学 | Method and device for determining unmanned cluster flight strategy and electronic equipment |
CN114980169A (en) * | 2022-05-16 | 2022-08-30 | 北京理工大学 | Unmanned aerial vehicle auxiliary ground communication method based on combined optimization of track and phase |
CN114980020A (en) * | 2022-05-17 | 2022-08-30 | 重庆邮电大学 | Unmanned aerial vehicle data collection method based on MADDPG algorithm |
CN114997617A (en) * | 2022-05-23 | 2022-09-02 | 华中科技大学 | Multi-unmanned platform multi-target joint detection task allocation method and system |
CN115038155A (en) * | 2022-05-23 | 2022-09-09 | 香港中文大学(深圳) | Ultra-dense multi-access-point dynamic cooperative transmission method |
CN115113651A (en) * | 2022-07-18 | 2022-09-27 | 中国电子科技集团公司第五十四研究所 | Unmanned robot swarm cooperative coverage optimization method based on ellipse fitting |
CN115314904A (en) * | 2022-06-14 | 2022-11-08 | 北京邮电大学 | Communication coverage method and related equipment based on multi-agent maximum entropy reinforcement learning |
CN115460543A (en) * | 2022-08-31 | 2022-12-09 | 中国地质大学(武汉) | Distributed ring fence covering method, device and storage device |
CN115499849A (en) * | 2022-11-16 | 2022-12-20 | 国网湖北省电力有限公司信息通信公司 | Wireless access point and reconfigurable intelligent surface cooperation method |
CN115713130A (en) * | 2022-09-07 | 2023-02-24 | 华东交通大学 | Vehicle scheduling method based on hyper-parameter network weight distribution deep reinforcement learning |
CN115802313A (en) * | 2022-11-16 | 2023-03-14 | 河南大学 | Air-ground mobile network energy-carrying fair communication method based on intelligent reflecting surface |
CN116009590A (en) * | 2023-02-01 | 2023-04-25 | 中山大学 | Unmanned aerial vehicle network distributed track planning method, system, equipment and medium |
CN116017479A (en) * | 2022-12-30 | 2023-04-25 | 河南大学 | Distributed multi-unmanned aerial vehicle relay network coverage method |
CN116208968A (en) * | 2022-12-30 | 2023-06-02 | 北京信息科技大学 | Track planning method and device based on federal learning |
CN116456307A (en) * | 2023-05-06 | 2023-07-18 | 山东省计算中心(国家超级计算济南中心) | Q learning-based energy-limited Internet of things data acquisition and fusion method |
CN116502547A (en) * | 2023-06-29 | 2023-07-28 | 深圳大学 | Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning |
CN116980881A (en) * | 2023-08-29 | 2023-10-31 | 北方工业大学 | Multi-unmanned aerial vehicle collaboration data distribution method, system, electronic equipment and medium |
CN117376934A (en) * | 2023-12-08 | 2024-01-09 | 山东科技大学 | Deep reinforcement learning-based multi-unmanned aerial vehicle offshore mobile base station deployment method |
CN117835463A (en) * | 2023-12-27 | 2024-04-05 | 武汉大学 | Space-to-ground ad hoc communication network space-time dynamic deployment method based on deep reinforcement learning |
CN117856903A (en) * | 2023-12-07 | 2024-04-09 | 山东科技大学 | Marine unmanned aerial vehicle optical link data transmission method based on multi-agent reinforcement learning |
CN116017479B (en) * | 2022-12-30 | 2024-10-25 | 河南大学 | Distributed multi-unmanned aerial vehicle relay network coverage method |
2020-06-04: CN application CN202010497656.4A filed; granted as CN111786713B (status: active)
Patent Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180129882A1 (en) * | 2016-11-08 | 2018-05-10 | Dedrone Holdings, Inc. | Systems, Methods, Apparatuses, and Devices for Identifying, Tracking, and Managing Unmanned Aerial Vehicles |
CN109923799A (en) * | 2016-11-11 | 2019-06-21 | 高通股份有限公司 | Method for beam recovery in millimeter-wave systems |
CN209085657U (en) * | 2017-08-02 | 2019-07-09 | 强力物联网投资组合2016有限公司 | Data collection system for industrial environments related to chemical production processes |
US20200115047A1 (en) * | 2018-10-11 | 2020-04-16 | Beihang University | Multi-uav continuous movement control method, apparatus, device, and storage medium for energy efficient communication coverage |
CN110198531A (en) * | 2019-05-24 | 2019-09-03 | 吉林大学 | Dynamic D2D relay selection method based on relative velocity |
CN110430527A (en) * | 2019-07-17 | 2019-11-08 | 大连理工大学 | Unmanned aerial vehicle-to-ground secure transmission power allocation method |
CN110488861A (en) * | 2019-07-30 | 2019-11-22 | 北京邮电大学 | Unmanned aerial vehicle trajectory optimization method and device based on deep reinforcement learning, and unmanned aerial vehicle |
CN110531617A (en) * | 2019-07-30 | 2019-12-03 | 北京邮电大学 | Multi-unmanned aerial vehicle 3D hovering position joint optimization method, device and unmanned aerial vehicle base station |
CN110730028A (en) * | 2019-08-29 | 2020-01-24 | 广东工业大学 | Unmanned aerial vehicle-assisted backscatter communication device and resource allocation control method |
CN110809274A (en) * | 2019-10-28 | 2020-02-18 | 南京邮电大学 | Narrowband Internet of things-oriented unmanned aerial vehicle base station enhanced network optimization method |
CN111132009A (en) * | 2019-12-23 | 2020-05-08 | 北京邮电大学 | Mobile edge calculation method, device and system of Internet of things |
CN111026147A (en) * | 2019-12-25 | 2020-04-17 | 北京航空航天大学 | Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
CONG WANG: "Research of UAV Target Detection and Flight Control Based on Deep Learning", 《2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA (ICAIBD)》 * |
ZHOU YI: "UAV Autonomous Deployment and Energy Efficiency Optimization Strategy Based on Deep Reinforcement Learning", 《物联网学报》 (Chinese Journal on Internet of Things) *
Cited By (126)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112256056B (en) * | 2020-10-19 | 2022-03-01 | 中山大学 | Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning |
CN112256056A (en) * | 2020-10-19 | 2021-01-22 | 中山大学 | Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning |
CN112512115B (en) * | 2020-11-20 | 2022-02-11 | 北京邮电大学 | Method and device for determining position of air base station and electronic equipment |
CN112512115A (en) * | 2020-11-20 | 2021-03-16 | 北京邮电大学 | Method and device for determining position of air base station and electronic equipment |
CN112566209A (en) * | 2020-11-24 | 2021-03-26 | 山西三友和智慧信息技术股份有限公司 | UAV-BSs energy and service priority track design method based on double Q learning |
CN112511197A (en) * | 2020-12-01 | 2021-03-16 | 南京工业大学 | Unmanned aerial vehicle auxiliary elastic video multicast method based on deep reinforcement learning |
CN112752357A (en) * | 2020-12-02 | 2021-05-04 | 宁波大学 | Online unmanned aerial vehicle auxiliary data collection method and device based on energy harvesting technology |
CN112752357B (en) * | 2020-12-02 | 2022-06-17 | 宁波大学 | Online unmanned aerial vehicle auxiliary data collection method and device based on energy harvesting technology |
CN112511250A (en) * | 2020-12-03 | 2021-03-16 | 中国人民解放军火箭军工程大学 | DRL-based multi-unmanned aerial vehicle air base station dynamic deployment method and system |
CN112636811A (en) * | 2020-12-08 | 2021-04-09 | 北京邮电大学 | Relay unmanned aerial vehicle deployment method and device |
CN112672361A (en) * | 2020-12-17 | 2021-04-16 | 东南大学 | Large-scale MIMO capacity increasing method based on unmanned aerial vehicle cluster deployment |
CN112672361B (en) * | 2020-12-17 | 2022-12-02 | 东南大学 | Large-scale MIMO capacity increasing method based on unmanned aerial vehicle cluster deployment |
CN112821938A (en) * | 2021-01-08 | 2021-05-18 | 重庆大学 | Total throughput and energy consumption optimization method of air-space-ground satellite communication system |
CN112904890A (en) * | 2021-01-15 | 2021-06-04 | 北京国网富达科技发展有限责任公司 | Unmanned aerial vehicle automatic inspection system and method for power line |
CN112904890B (en) * | 2021-01-15 | 2023-06-30 | 北京国网富达科技发展有限责任公司 | Unmanned aerial vehicle automatic inspection system and method for power line |
CN112947575A (en) * | 2021-03-17 | 2021-06-11 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning |
CN113094982A (en) * | 2021-03-29 | 2021-07-09 | 天津理工大学 | Internet of vehicles edge caching method based on multi-agent deep reinforcement learning |
CN113194488A (en) * | 2021-03-31 | 2021-07-30 | 西安交通大学 | Unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system |
CN113162679A (en) * | 2021-04-01 | 2021-07-23 | 南京邮电大学 | DDPG algorithm-based IRS (intelligent reflecting surface) assisted unmanned aerial vehicle communication joint optimization method |
CN113162679B (en) * | 2021-04-01 | 2023-03-10 | 南京邮电大学 | DDPG algorithm-based IRS (intelligent reflecting surface) assisted unmanned aerial vehicle communication joint optimization method |
CN113342029A (en) * | 2021-04-16 | 2021-09-03 | 山东师范大学 | Maximum sensor data acquisition path planning method and system based on unmanned aerial vehicle cluster |
CN113115344A (en) * | 2021-04-19 | 2021-07-13 | 中国人民解放军火箭军工程大学 | Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization |
CN113115344B (en) * | 2021-04-19 | 2021-12-14 | 中国人民解放军火箭军工程大学 | Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization |
CN113286275A (en) * | 2021-04-23 | 2021-08-20 | 南京大学 | Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning |
CN113190039B (en) * | 2021-04-27 | 2024-04-16 | 大连理工大学 | Unmanned aerial vehicle acquisition path planning method based on layered deep reinforcement learning |
CN113190039A (en) * | 2021-04-27 | 2021-07-30 | 大连理工大学 | Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning |
CN113364495A (en) * | 2021-05-25 | 2021-09-07 | 西安交通大学 | Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system |
CN113364495B (en) * | 2021-05-25 | 2022-08-05 | 西安交通大学 | Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system |
CN113286314B (en) * | 2021-05-25 | 2022-03-08 | 重庆邮电大学 | Unmanned aerial vehicle base station deployment and user association method based on Q learning algorithm |
CN113286314A (en) * | 2021-05-25 | 2021-08-20 | 重庆邮电大学 | Unmanned aerial vehicle base station deployment and user association method based on Q learning algorithm |
CN113255218A (en) * | 2021-05-27 | 2021-08-13 | 电子科技大学 | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network |
CN113255218B (en) * | 2021-05-27 | 2022-05-31 | 电子科技大学 | Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network |
CN113328775A (en) * | 2021-05-28 | 2021-08-31 | 怀化学院 | UAV height positioning system and computer storage medium |
CN113328775B (en) * | 2021-05-28 | 2022-06-21 | 怀化学院 | UAV height positioning system and computer storage medium |
CN113660681B (en) * | 2021-05-31 | 2023-06-06 | 西北工业大学 | Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission |
CN113660681A (en) * | 2021-05-31 | 2021-11-16 | 西北工业大学 | Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission |
CN113242556A (en) * | 2021-06-04 | 2021-08-10 | 重庆邮电大学 | Unmanned aerial vehicle resource dynamic deployment method based on differentiated services |
CN113242556B (en) * | 2021-06-04 | 2022-08-23 | 重庆邮电大学 | Unmanned aerial vehicle resource dynamic deployment method based on differentiated services |
CN113382060B (en) * | 2021-06-07 | 2022-03-22 | 北京理工大学 | Unmanned aerial vehicle track optimization method and system in Internet of things data collection |
CN113382060A (en) * | 2021-06-07 | 2021-09-10 | 北京理工大学 | Unmanned aerial vehicle track optimization method and system in Internet of things data collection |
CN113392971A (en) * | 2021-06-11 | 2021-09-14 | 武汉大学 | Strategy network training method, device, equipment and readable storage medium |
CN113364630A (en) * | 2021-06-15 | 2021-09-07 | 广东技术师范大学 | Quality of service (QoS) differentiation optimization method and device |
CN113572548B (en) * | 2021-06-18 | 2023-07-07 | 南京理工大学 | Unmanned plane network cooperative fast frequency hopping method based on multi-agent reinforcement learning |
CN113572548A (en) * | 2021-06-18 | 2021-10-29 | 南京理工大学 | Unmanned aerial vehicle network cooperative fast frequency hopping method based on multi-agent reinforcement learning |
CN113346944A (en) * | 2021-06-28 | 2021-09-03 | 上海交通大学 | Time delay minimization calculation task unloading method and system in air-space-ground integrated network |
CN113346944B (en) * | 2021-06-28 | 2022-06-10 | 上海交通大学 | Time delay minimization calculation task unloading method and system in air-space-ground integrated network |
CN113467508B (en) * | 2021-06-30 | 2022-06-28 | 天津大学 | Multi-unmanned aerial vehicle intelligent cooperative decision-making method for trapping task |
CN113467508A (en) * | 2021-06-30 | 2021-10-01 | 天津大学 | Multi-unmanned aerial vehicle intelligent cooperative decision-making method for trapping task |
CN113641192A (en) * | 2021-07-06 | 2021-11-12 | 暨南大学 | Route planning method for unmanned aerial vehicle crowd sensing task based on reinforcement learning |
CN113641192B (en) * | 2021-07-06 | 2023-07-18 | 暨南大学 | Route planning method for intelligent perception task of unmanned aerial vehicle group based on reinforcement learning |
CN113613339A (en) * | 2021-07-10 | 2021-11-05 | 西北农林科技大学 | Channel access method of multi-priority wireless terminal based on deep reinforcement learning |
CN113613339B (en) * | 2021-07-10 | 2023-10-17 | 西北农林科技大学 | Channel access method of multi-priority wireless terminal based on deep reinforcement learning |
CN113395708A (en) * | 2021-07-13 | 2021-09-14 | 东南大学 | Multi-autonomous-subject centralized region coverage method and system based on global environment prediction |
CN113359480B (en) * | 2021-07-16 | 2022-02-01 | 中国人民解放军火箭军工程大学 | Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm |
CN113359480A (en) * | 2021-07-16 | 2021-09-07 | 中国人民解放军火箭军工程大学 | Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm |
CN113776531A (en) * | 2021-07-21 | 2021-12-10 | 电子科技大学长三角研究院(湖州) | Multi-unmanned-aerial-vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network |
CN113625751B (en) * | 2021-08-05 | 2023-02-24 | 南京航空航天大学 | Unmanned aerial vehicle position and resource joint optimization method for air-ground integrated federal learning |
CN113625751A (en) * | 2021-08-05 | 2021-11-09 | 南京航空航天大学 | Unmanned aerial vehicle position and resource joint optimization method for air-ground integrated federal learning |
CN113625569A (en) * | 2021-08-12 | 2021-11-09 | 中国人民解放军32802部队 | Small unmanned aerial vehicle prevention and control hybrid decision method and system based on deep reinforcement learning and rule driving |
CN113625569B (en) * | 2021-08-12 | 2022-02-08 | 中国人民解放军32802部队 | Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model |
CN113706023B (en) * | 2021-08-31 | 2022-07-12 | 哈尔滨理工大学 | Shipboard aircraft guarantee operator scheduling method based on deep reinforcement learning |
CN113706023A (en) * | 2021-08-31 | 2021-11-26 | 哈尔滨理工大学 | Shipboard aircraft guarantee operator scheduling method based on deep reinforcement learning |
CN113691294A (en) * | 2021-09-27 | 2021-11-23 | 中国人民解放军空军预警学院 | Near-field sparse array antenna beam establishing method and device |
CN114051252A (en) * | 2021-09-28 | 2022-02-15 | 嘉兴学院 | Multi-user intelligent transmitting power control method in wireless access network |
CN114021775A (en) * | 2021-09-30 | 2022-02-08 | 成都海天数联科技有限公司 | Intelligent body handicap device putting method based on optimal solution |
CN113762512A (en) * | 2021-11-10 | 2021-12-07 | 北京航空航天大学杭州创新研究院 | Distributed model training method, system and related device |
CN114142912A (en) * | 2021-11-26 | 2022-03-04 | 西安电子科技大学 | Resource control method for guaranteeing time coverage continuity of high-dynamic air network |
CN114222251A (en) * | 2021-11-30 | 2022-03-22 | 中山大学·深圳 | Adaptive network forming and track optimizing method for multiple unmanned aerial vehicles |
CN114268986A (en) * | 2021-12-14 | 2022-04-01 | 北京航空航天大学 | Unmanned aerial vehicle computing unloading and charging service efficiency optimization method |
CN114372612B (en) * | 2021-12-16 | 2023-04-28 | 电子科技大学 | Path planning and task unloading method for unmanned aerial vehicle mobile edge computing scene |
CN114372612A (en) * | 2021-12-16 | 2022-04-19 | 电子科技大学 | Route planning and task unloading method for unmanned aerial vehicle mobile edge computing scene |
CN114268963A (en) * | 2021-12-24 | 2022-04-01 | 北京航空航天大学 | Unmanned aerial vehicle network autonomous deployment method facing communication coverage |
CN114268963B (en) * | 2021-12-24 | 2023-07-11 | 北京航空航天大学 | Communication coverage-oriented unmanned aerial vehicle network autonomous deployment method |
CN114339842A (en) * | 2022-01-06 | 2022-04-12 | 北京邮电大学 | Method and device for designing dynamic trajectory of unmanned aerial vehicle cluster under time-varying scene based on deep reinforcement learning |
CN114339842B (en) * | 2022-01-06 | 2022-12-20 | 北京邮电大学 | Method and device for designing dynamic trajectory of unmanned aerial vehicle cluster in time-varying scene based on deep reinforcement learning |
CN114374951B (en) * | 2022-01-12 | 2024-04-30 | 重庆邮电大学 | Dynamic pre-deployment method for multiple unmanned aerial vehicles |
CN114374951A (en) * | 2022-01-12 | 2022-04-19 | 重庆邮电大学 | Multi-unmanned aerial vehicle dynamic pre-deployment method |
CN114124784B (en) * | 2022-01-27 | 2022-04-12 | 军事科学院系统工程研究院网络信息研究所 | Intelligent routing decision protection method and system based on vertical federation |
CN114124784A (en) * | 2022-01-27 | 2022-03-01 | 军事科学院系统工程研究院网络信息研究所 | Intelligent routing decision protection method and system based on vertical federation |
CN114548551A (en) * | 2022-02-21 | 2022-05-27 | 广东汇天航空航天科技有限公司 | Method and device for determining residual endurance time, aircraft and medium |
CN114578335B (en) * | 2022-03-03 | 2024-08-16 | 电子科技大学长三角研究院(衢州) | Positioning method based on multi-agent deep reinforcement learning and least square |
CN114578335A (en) * | 2022-03-03 | 2022-06-03 | 电子科技大学长三角研究院(衢州) | Positioning method based on multi-agent deep reinforcement learning and least square |
CN114567888A (en) * | 2022-03-04 | 2022-05-31 | 重庆邮电大学 | Multi-unmanned aerial vehicle dynamic deployment method |
CN114567888B (en) * | 2022-03-04 | 2023-12-26 | 国网浙江省电力有限公司台州市黄岩区供电公司 | Multi-unmanned aerial vehicle dynamic deployment method |
CN114625151B (en) * | 2022-03-10 | 2024-05-28 | 大连理工大学 | Underwater robot obstacle avoidance path planning method based on reinforcement learning |
CN114625151A (en) * | 2022-03-10 | 2022-06-14 | 大连理工大学 | Underwater robot obstacle avoidance path planning method based on reinforcement learning |
CN114449482B (en) * | 2022-03-11 | 2024-05-14 | 南京理工大学 | Heterogeneous Internet of vehicles user association method based on multi-agent deep reinforcement learning |
CN114449482A (en) * | 2022-03-11 | 2022-05-06 | 南京理工大学 | Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning |
CN114679699A (en) * | 2022-03-23 | 2022-06-28 | 重庆邮电大学 | Multi-unmanned-aerial-vehicle energy-saving cruise communication coverage method based on deep reinforcement learning |
CN114884895B (en) * | 2022-05-05 | 2023-08-22 | 郑州轻工业大学 | Intelligent flow scheduling method based on deep reinforcement learning |
CN114884895A (en) * | 2022-05-05 | 2022-08-09 | 郑州轻工业大学 | Intelligent traffic scheduling method based on deep reinforcement learning |
CN114980169B (en) * | 2022-05-16 | 2024-08-20 | 北京理工大学 | Unmanned aerial vehicle auxiliary ground communication method based on track and phase joint optimization |
CN114980169A (en) * | 2022-05-16 | 2022-08-30 | 北京理工大学 | Unmanned aerial vehicle auxiliary ground communication method based on combined optimization of track and phase |
CN114980020B (en) * | 2022-05-17 | 2024-07-12 | 中科润物科技(南京)有限公司 | MADDPG algorithm-based unmanned aerial vehicle data collection method |
CN114980020A (en) * | 2022-05-17 | 2022-08-30 | 重庆邮电大学 | Unmanned aerial vehicle data collection method based on MADDPG algorithm |
CN114997617B (en) * | 2022-05-23 | 2024-06-07 | 华中科技大学 | Multi-unmanned platform multi-target combined detection task allocation method and system |
CN115038155A (en) * | 2022-05-23 | 2022-09-09 | 香港中文大学(深圳) | Ultra-dense multi-access-point dynamic cooperative transmission method |
CN114997617A (en) * | 2022-05-23 | 2022-09-02 | 华中科技大学 | Multi-unmanned platform multi-target joint detection task allocation method and system |
CN115314904B (en) * | 2022-06-14 | 2024-03-29 | 北京邮电大学 | Communication coverage method based on multi-agent maximum entropy reinforcement learning and related equipment |
CN115314904A (en) * | 2022-06-14 | 2022-11-08 | 北京邮电大学 | Communication coverage method and related equipment based on multi-agent maximum entropy reinforcement learning |
CN115113651A (en) * | 2022-07-18 | 2022-09-27 | 中国电子科技集团公司第五十四研究所 | Unmanned robot swarm cooperative coverage optimization method based on ellipse fitting |
CN114942653A (en) * | 2022-07-26 | 2022-08-26 | 北京邮电大学 | Method and device for determining unmanned cluster flight strategy and electronic equipment |
CN115460543A (en) * | 2022-08-31 | 2022-12-09 | 中国地质大学(武汉) | Distributed ring fence covering method, device and storage device |
CN115460543B (en) * | 2022-08-31 | 2024-04-19 | 中国地质大学(武汉) | Distributed annular fence coverage method, device and storage device |
CN115713130A (en) * | 2022-09-07 | 2023-02-24 | 华东交通大学 | Vehicle scheduling method based on hyper-parameter network weight distribution deep reinforcement learning |
CN115713130B (en) * | 2022-09-07 | 2023-09-05 | 华东交通大学 | Vehicle scheduling method based on super-parameter network weight distribution deep reinforcement learning |
CN115802313A (en) * | 2022-11-16 | 2023-03-14 | 河南大学 | Air-ground mobile network energy-carrying fair communication method based on intelligent reflecting surface |
CN115499849A (en) * | 2022-11-16 | 2022-12-20 | 国网湖北省电力有限公司信息通信公司 | Wireless access point and reconfigurable intelligent surface cooperation method |
CN115499849B (en) * | 2022-11-16 | 2023-04-07 | 国网湖北省电力有限公司信息通信公司 | Wireless access point and reconfigurable intelligent surface cooperation method |
CN116017479B (en) * | 2022-12-30 | 2024-10-25 | 河南大学 | Distributed multi-unmanned aerial vehicle relay network coverage method |
CN116208968B (en) * | 2022-12-30 | 2024-04-05 | 北京信息科技大学 | Track planning method and device based on federal learning |
CN116017479A (en) * | 2022-12-30 | 2023-04-25 | 河南大学 | Distributed multi-unmanned aerial vehicle relay network coverage method |
CN116208968A (en) * | 2022-12-30 | 2023-06-02 | 北京信息科技大学 | Track planning method and device based on federal learning |
CN116009590A (en) * | 2023-02-01 | 2023-04-25 | 中山大学 | Unmanned aerial vehicle network distributed track planning method, system, equipment and medium |
CN116009590B (en) * | 2023-02-01 | 2023-11-17 | 中山大学 | Unmanned aerial vehicle network distributed track planning method, system, equipment and medium |
CN116456307B (en) * | 2023-05-06 | 2024-04-09 | 山东省计算中心(国家超级计算济南中心) | Q learning-based energy-limited Internet of things data acquisition and fusion method |
CN116456307A (en) * | 2023-05-06 | 2023-07-18 | 山东省计算中心(国家超级计算济南中心) | Q learning-based energy-limited Internet of things data acquisition and fusion method |
CN116502547B (en) * | 2023-06-29 | 2024-06-04 | 深圳大学 | Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning |
CN116502547A (en) * | 2023-06-29 | 2023-07-28 | 深圳大学 | Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning |
CN116980881A (en) * | 2023-08-29 | 2023-10-31 | 北方工业大学 | Multi-unmanned aerial vehicle collaboration data distribution method, system, electronic equipment and medium |
CN116980881B (en) * | 2023-08-29 | 2024-01-23 | 北方工业大学 | Multi-unmanned aerial vehicle collaboration data distribution method, system, electronic equipment and medium |
CN117856903A (en) * | 2023-12-07 | 2024-04-09 | 山东科技大学 | Marine unmanned aerial vehicle optical link data transmission method based on multi-agent reinforcement learning |
CN117856903B (en) * | 2023-12-07 | 2024-08-30 | 山东科技大学 | Marine unmanned aerial vehicle optical link data transmission method based on multi-agent reinforcement learning |
CN117376934A (en) * | 2023-12-08 | 2024-01-09 | 山东科技大学 | Deep reinforcement learning-based multi-unmanned aerial vehicle offshore mobile base station deployment method |
CN117376934B (en) * | 2023-12-08 | 2024-02-27 | 山东科技大学 | Deep reinforcement learning-based multi-unmanned aerial vehicle offshore mobile base station deployment method |
CN117835463A (en) * | 2023-12-27 | 2024-04-05 | 武汉大学 | Space-to-ground ad hoc communication network space-time dynamic deployment method based on deep reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN111786713B (en) | 2021-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111786713B (en) | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning | |
CN109831797B (en) | Unmanned aerial vehicle base station bandwidth and trajectory joint optimization method with limited propulsion power | |
Zhang et al. | Energy-efficient trajectory optimization for UAV-assisted IoT networks | |
CN110364031B (en) | Path planning and wireless communication method for unmanned aerial vehicle cluster in ground sensor network | |
CN109286913B (en) | Energy consumption optimization method of unmanned aerial vehicle mobile edge computing system based on cellular network connection | |
CN111263332A (en) | Unmanned aerial vehicle track and power joint optimization method based on deep reinforcement learning | |
CN109885088B (en) | Unmanned aerial vehicle flight trajectory optimization method based on machine learning in edge computing network | |
CN109067490A (en) | Resource allocation method for cellular-connected multi-unmanned aerial vehicle cooperative mobile edge computing systems | |
CN114690799A (en) | Air-space-ground integrated unmanned aerial vehicle Internet of things data acquisition method based on information age | |
CN113660681B (en) | Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission | |
CN113359480B (en) | Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm | |
CN108668257B (en) | Distributed unmanned aerial vehicle postman difference relay trajectory optimization method | |
CN112702713B (en) | Low-altitude unmanned aerial vehicle communication deployment method under multi-constraint conditions | |
CN114980169A (en) | Unmanned aerial vehicle auxiliary ground communication method based on combined optimization of track and phase | |
CN115499921A (en) | Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network | |
Gangula et al. | A landing spot approach for enhancing the performance of UAV-aided wireless networks | |
CN113163332A (en) | Road sign graph coloring unmanned aerial vehicle energy-saving endurance data collection method based on metric learning | |
CN114205769A (en) | Joint trajectory optimization and bandwidth allocation method based on unmanned aerial vehicle data acquisition system | |
CN117119489A (en) | Deployment and resource optimization method of wireless energy supply network based on multi-unmanned aerial vehicle assistance | |
Babu et al. | Fairness-based energy-efficient 3-D path planning of a portable access point: A deep reinforcement learning approach | |
CN115407794A (en) | Sea area safety communication unmanned aerial vehicle track real-time planning method based on reinforcement learning | |
CN114020024B (en) | Unmanned aerial vehicle path planning method based on Monte Carlo tree search | |
CN113050672B (en) | Unmanned aerial vehicle path planning method for emergency information acquisition and transmission | |
CN113776531A (en) | Multi-unmanned-aerial-vehicle autonomous navigation and task allocation algorithm of wireless self-powered communication network | |
CN116882270A (en) | Multi-unmanned aerial vehicle wireless charging and edge computing combined optimization method and system based on deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |