CN111786713A - A UAV network hovering position optimization method based on multi-agent deep reinforcement learning - Google Patents

A UAV network hovering position optimization method based on multi-agent deep reinforcement learning

Info

Publication number
CN111786713A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
ground
network
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010497656.4A
Other languages
Chinese (zh)
Other versions
CN111786713B (en)
Inventor
刘中豪
覃振权
卢炳先
王雷
朱明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202010497656.4A priority Critical patent/CN111786713B/en
Publication of CN111786713A publication Critical patent/CN111786713A/en
Application granted granted Critical
Publication of CN111786713B publication Critical patent/CN111786713B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B 7/00 Radio transmission systems, i.e. using radiation field
    • H04B 7/14 Relay systems
    • H04B 7/15 Active relay systems
    • H04B 7/185 Space-based or airborne stations; Stations for satellite systems
    • H04B 7/18502 Airborne stations
    • H04B 7/18506 Communications with or from aircraft, i.e. aeronautical mobile service
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/18 Network planning tools
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/22 Traffic simulation tools or models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Astronomy & Astrophysics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An unmanned aerial vehicle (UAV) network hovering position optimization method based on multi-agent deep reinforcement learning comprises: first, modeling the channel model, coverage model and energy loss model of a UAV-to-ground communication scenario; modeling the throughput maximization problem of the UAV-to-ground communication network as a partially observable Markov decision process; obtaining local observation information and instantaneous rewards through continuous interaction between the UAVs and the environment, and performing centralized training based on this information to obtain distributed policy networks; and deploying the policy network to each UAV, so that each UAV can derive a moving-direction and moving-distance decision from its own local observations, adjust its hovering position, and cooperate in a distributed manner. The invention also introduces proportional fair scheduling and UAV energy consumption information into the instantaneous reward function, thereby improving throughput while ensuring fairness of service to ground users, reducing energy consumption, and enabling the UAV swarm to adapt to dynamic environments.

Description

A UAV network hovering position optimization method based on multi-agent deep reinforcement learning

Technical Field

The present invention relates to the technical field of wireless communication, and in particular to a method for optimizing the hovering positions of a multi-UAV network based on multi-agent deep reinforcement learning.

Background Art

In recent years, owing to the high mobility, easy deployment and low cost of UAVs, UAV-based communication technology has attracted extensive attention and has become a new research hotspot in the field of wireless communication. UAV-assisted communication mainly has the following application scenarios: a UAV acting as a mobile base station to provide communication coverage for areas with sparse infrastructure or post-disaster areas; a UAV acting as a relay node to provide a wireless connection between two distant communication nodes that cannot establish a direct link; and UAV-based data distribution and collection. The present invention mainly addresses the first scenario, in which the hovering positions of the UAVs determine the coverage performance and throughput of the entire UAV network. The ground devices served by the UAV network may be mobile, so the UAVs need to continually adjust their hovering positions to achieve optimal performance.

In 2018, Qingqing Wu et al. proposed a UAV path planning scheme for a multi-UAV-to-ground communication system in the paper "Joint Trajectory and Communication Design for Multi-UAV Enabled Wireless Networks". Time is divided into multiple periods with identical UAV trajectories in each period, and in each time slot the UAV base stations serve specific ground users. The scheme models the optimization problem as a mixed-integer programming problem and solves it using block coordinate gradient descent and approximate convex optimization techniques, obtaining the optimal hovering position for each time slice within a period so as to maximize the downlink throughput to ground users. However, the scheme proposed in that paper only suits static environments: it assumes that ground devices are not mobile, and therefore does not apply to scenarios in which ground users keep moving. Chi Harold Liu et al., in the paper "Energy-Efficient UAV Control for Effective and Fair Communication Coverage: A Deep Reinforcement Learning Approach", proposed a UAV path planning algorithm based on deep reinforcement learning, training a decision model that outputs the UAVs' next decision (moving direction and moving distance) according to the current state. The method can achieve fair wireless coverage over a large area while minimizing the energy consumption of the UAVs. However, it only considers the coverage performance of the UAV network, and the fairness it pursues is coarse-grained coverage fairness over regions rather than fine-grained fairness over users. Furthermore, the method is a centralized scheme that requires a controller to collect information from all UAVs in every time slot before a decision can be made.

In summary, existing UAV path planning techniques for ground communication networks based on UAV base stations have the following main defects: (1) the dynamics of the environment, i.e. the mobility of ground users, are not considered; (2) centralized algorithms are used, relying on global information and centralized control, yet in some large-scale scenarios centralized control is difficult, so a distributed control strategy is needed in which each UAV base station makes decisions relying only on the information it obtains itself; (3) user-level service fairness is ignored. These defects make existing UAV trajectory optimization methods unsuitable for practical communication environments.

Summary of the Invention

The purpose of the present invention is to propose a multi-UAV hovering position optimization method based on multi-agent reinforcement learning, so as to solve the above technical problems.

The technical solution of the present invention:

A UAV network hovering position optimization method based on multi-agent deep reinforcement learning, comprising the following steps:

(1) Establish a multi-UAV-to-ground communication network model, which mainly includes the following four steps:

(1.1) Establish the scene model: a square target area with side length l is set up, containing N ground users and M UAV base stations (UAV-BSs) that provide communication services for the ground users. Time is divided into T identical time slots; from the previous time slot to the current one, a ground user may stay still or may move, so each UAV base station needs to find a new optimal hovering position in every time slot and, after reaching the target position, select ground users for data transmission service.
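For illustration only, the following is a minimal Python sketch of such a scene; the numeric values (side length, numbers of users and UAVs, slot count, step size) and the random-walk mobility model are placeholder assumptions, not figures from the patent.

import numpy as np

# Placeholder scene parameters; the patent does not fix these numerically.
L_SIDE = 1000.0   # side length l of the square target area (meters, assumed)
N_USERS = 50      # number of ground users N (assumed)
M_UAVS = 4        # number of UAV base stations M (assumed)
T_SLOTS = 200     # number of time slots T (assumed)

rng = np.random.default_rng(0)

# Initial horizontal positions of users and of UAV ground projections.
user_pos = rng.uniform(0.0, L_SIDE, size=(N_USERS, 2))
uav_pos = rng.uniform(0.0, L_SIDE, size=(M_UAVS, 2))

def move_users(positions, max_step=5.0):
    # Simple random-walk mobility: each user stays put or moves a small step.
    step = rng.uniform(-max_step, max_step, size=positions.shape)
    return np.clip(positions + step, 0.0, L_SIDE)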

(1.2) Establish the air-to-ground communication model: the present invention uses an air-to-ground channel model to model the channel between a UAV base station and a ground user. Owing to its high flying altitude, a UAV base station can establish a line-of-sight (LoS) link with a ground user more easily than a terrestrial base station. In the LoS case, the path loss model between UAV base station m and ground user n is:

L_{n,m}(t) = η · (4π · f_c · d_{n,m}(t) / c)^α

where η denotes the excess path loss coefficient, c the speed of light, f_c the subcarrier frequency, and α the path loss exponent; d_{n,m} = sqrt(r_{n,m}^2 + h^2) denotes the distance between UAV base station m and ground user n, where r_{n,m} is their horizontal distance and h is the fixed flying height of the UAV base station. From the path loss, the channel gain can be expressed as g_{n,m}(t) = 1 / L_{n,m}(t). Based on the channel gain, the data transmission rate between UAV base station m and ground user n in time slot t is:

R_{n,m}(t) = log_2(1 + p_t · g_{n,m}(t) / σ)

where σ denotes the additive white Gaussian noise power, p_t the transmit power of the UAV base station, and g_{n,m}(t) the channel gain between UAV base station m and ground user n at time t.
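The following Python sketch evaluates the LoS path loss, channel gain and per-link rate as reconstructed above; the default carrier frequency, path loss exponent, excess loss, transmit power and noise power are placeholder assumptions, and the bandwidth is normalized to 1 Hz.

import numpy as np

def path_loss(r_horiz, h, eta=1.0, f_c=2e9, alpha=2.0, c=3e8):
    # LoS path loss L_{n,m} = eta * (4*pi*f_c*d/c)^alpha with d = sqrt(r^2 + h^2);
    # the default parameter values are assumptions, not patent figures.
    d = np.sqrt(r_horiz ** 2 + h ** 2)
    return eta * (4.0 * np.pi * f_c * d / c) ** alpha

def data_rate(r_horiz, h, p_t=1.0, noise=1e-13, **pl_kwargs):
    # Shannon-style rate log2(1 + p_t * g / sigma) with channel gain g = 1/L;
    # bandwidth is normalized to 1 Hz in this sketch.
    g = 1.0 / path_loss(r_horiz, h, **pl_kwargs)
    return np.log2(1.0 + p_t * g / noise)

# Example: rate for a user 200 m away horizontally from a UAV flying at 100 m.
print(data_rate(200.0, 100.0))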

(1.3) Establish the coverage model: due to hardware limitations, the coverage of each UAV base station is limited. The present invention defines a maximum tolerable path loss L_max: if, at a given moment, the path loss between a UAV base station and a user is less than L_max, the established connection is considered reliable; otherwise, establishing the connection is considered to have failed. Therefore, the effective coverage of each UAV base station can be defined according to the maximum tolerable path loss, as a circle centered at the projection point of the UAV base station on the ground with radius R_cov. From the path loss formula, R_cov can be expressed as:

R_cov = sqrt( (c / (4π · f_c))^2 · (L_max / η)^(2/α) - h^2 )
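A small helper, under the same assumptions as the sketch above, that inverts the reconstructed path-loss model at L_max to obtain the coverage radius R_cov; the default parameter values are again placeholders.

import numpy as np

def coverage_radius(L_max, h, eta=1.0, f_c=2e9, alpha=2.0, c=3e8):
    # Maximum link distance at which the LoS path loss equals L_max, projected
    # onto the ground plane; returns 0 if the altitude alone already exceeds
    # the tolerable loss.
    d_max = (c / (4.0 * np.pi * f_c)) * (L_max / eta) ** (1.0 / alpha)
    return float(np.sqrt(max(d_max ** 2 - h ** 2, 0.0)))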

(1.4) Establish the energy loss model: the present invention mainly focuses on the energy loss caused by UAV movement. Considering the flight speed V and flight power p_f of the UAV, the flight energy consumption of UAV base station m in time slot t depends on the distance flown:

Δe_m(t) = (p_f / V) · sqrt( (x_m(t+1) - x_m(t))^2 + (y_m(t+1) - y_m(t))^2 )

where x_m(t) and y_m(t) denote the position coordinates of the UAV on the x-axis and y-axis of the horizontal plane, respectively.
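A sketch of the per-slot flight energy under the model above: energy is flight power times flight time, i.e. the distance moved divided by the speed V; the default power and speed values are assumptions.

import numpy as np

def flight_energy(pos_t, pos_t1, p_f=100.0, v=10.0):
    # Delta e_m(t) = p_f * (distance flown in slot t) / V;
    # p_f in watts and v in m/s are placeholder values.
    dist = float(np.linalg.norm(np.asarray(pos_t1) - np.asarray(pos_t)))
    return p_f * dist / v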

(2) Model the problem as a partially observable Markov decision process:

Each UAV base station corresponds to an agent. In each time slot with environment state S(t), agent m can only obtain a local observation o_m within its own coverage range and, according to its decision function u_m(o_m), selects an action a_m from the action set A so as to maximize the total discounted expected reward E[ Σ_t γ^t · r_m(t) ], where γ ∈ (0,1) is the discount factor and r_m(t) denotes the reward of agent m at time t.

System state set S = {S(t) | S(t) = (S_u(t), S_g(t))}, comprising the current state of the UAV base stations S_u(t) and the current state of the ground users S_g(t). The UAV base station state S_u(t) includes the current position information of the UAVs; the ground user state S_g(t) includes the current position information of the ground users.

UAV action set A = {a(t) | a(t) = (θ(t), d(t))}: in time slot t, after obtaining its current local observation, UAV m needs to make a decision a_m(t) and move to the next hovering position; the action therefore consists of a flight rotation angle θ(t) and a moving distance d(t).
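As an illustration of how an action (θ(t), d(t)) is applied, the sketch below moves a UAV's horizontal position along the chosen heading and clips it to the target area; the clipping rule is an assumption of this sketch.

import numpy as np

def apply_action(pos, theta, d, area_side):
    # Move by distance d along heading theta (radians) and keep the UAV
    # inside the square target area.
    new_pos = np.asarray(pos) + d * np.array([np.cos(theta), np.sin(theta)])
    return np.clip(new_pos, 0.0, area_side)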

System instantaneous reward r(t): the goal of the present invention is to maximize the throughput of the UAV network while taking user service fairness and energy consumption into account. Therefore, the extra throughput produced at each time t by adjusting the hovering positions of the UAVs is a positive reward term, expressed as:

ΔC(t) = C(S_u(t+1), S_g(t)) - C(S_u(t), S_g(t))

where C(S_u(t), S_g(t)) denotes the throughput produced by the network when the UAV base station state is S_u(t) and the ground user state is S_g(t), and C(S_u(t+1), S_g(t)) denotes the throughput produced when the UAV base station state is S_u(t+1) and the ground user state is S_g(t). Considering the fairness of user service: if many users are gathered in one area while another area contains only a single user, the UAV base stations would keep hovering over the high-density area in pursuit of maximum throughput and neglect the low-density area. The present invention therefore applies a weight w_n(t) to each user's throughput reward to achieve proportional fair scheduling. R_req denotes the minimum communication rate required by a ground user, and R_n(t) denotes the average communication rate of ground user n from the start up to time t. When a UAV base station serves the user, R_n(t) grows and the user's weight gradually decreases; if the user is not served, R_n(t) decreases and the user's weight keeps increasing. As a result, the reward weight of user-sparse areas keeps growing, attracting UAV base stations to serve them.

[Equation: proportional-fairness weight w_n(t), defined in terms of R_req and R_n(t)]

[Equation: fairness-weighted throughput reward term involving w_n(t) and the user-association indicator a_{n,m}(t)]

where a_{n,m}(t) is an indicator variable: at time t, if UAV base station m serves ground user n, then a_{n,m}(t) = 1, and otherwise a_{n,m}(t) = 0. Therefore, jointly considering the fairness-weighted throughput reward and the energy consumption penalty, the present invention gives the system instantaneous reward r(t):

[Equation: system instantaneous reward r(t), combining the fairness-weighted throughput reward with an energy consumption penalty weighted by α]

where α denotes the weight of the energy consumption penalty; the larger α is, the more the system emphasizes energy loss when making decisions, and the smaller α is, the more energy loss is ignored.

Local observation set O(t) = {o_1(t), ..., o_M(t)}: when multiple UAV base stations work cooperatively over a large area, each UAV cannot observe the global information and can only observe the ground user information within its own coverage range. o_m(t) denotes the position information of the ground users within its coverage observed by UAV base station m at time t.
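Since the weight and reward formulas appear as images in the original text, the sketch below only illustrates one plausible composition consistent with the described behaviour: a ratio-type proportional-fairness weight w_n(t) of roughly R_req / R_n(t), and a reward equal to the fairness-weighted throughput gain minus an α-weighted energy penalty. Both choices are assumptions, not the patent's exact formulas.

import numpy as np

def fairness_weight(R_req, R_n, eps=1e-6):
    # Users whose running-average rate R_n falls below the requirement R_req
    # get a larger weight; the ratio form is an assumption.
    return R_req / (np.asarray(R_n) + eps)

def instant_reward(delta_rate_per_user, weights, served_mask, energy_per_uav, alpha=0.1):
    # Assumed reward: fairness-weighted throughput gain of the served users
    # minus the energy penalty weighted by alpha.
    fair_gain = float(np.sum(np.asarray(weights) * np.asarray(served_mask)
                             * np.asarray(delta_rate_per_user)))
    return fair_gain - alpha * float(np.sum(energy_per_uav))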

(3) Train based on the multi-agent deep reinforcement learning algorithm:

The present invention introduces the multi-agent deep reinforcement learning algorithm MADDPG into the hovering position optimization of the UAV-to-ground communication network, adopting a centralized-training and distributed-execution architecture: global information is used during training to better guide the gradient updates of each UAV's decision function, while during execution each UAV makes its next decision using only the local information it observes itself, which better fits the needs of practical scenarios. Each agent is trained with a DDPG network in the Actor-Critic architecture: the policy network is used to fit the policy function u(o), taking the local observation o as input and outputting the action a; the critic (evaluation) network is used to fit the state-action function Q(s, a), which denotes the expected reward obtained by taking action a when the system state is s. Let u = {u_1, ..., u_M} denote the deterministic policy functions of the M agents, θ_u = {θ_{u_1}, ..., θ_{u_M}} the parameters of the policy networks, Q = {Q_1, ..., Q_M} the critic networks of the M agents, and θ_Q = {θ_{Q_1}, ..., θ_{Q_M}} the parameters of the critic networks. Step (3) includes:
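The following PyTorch sketch shows one possible shape of the actor (policy) and centralized critic (evaluation) networks described above; layer sizes, activations and the action scaling are assumptions rather than the patent's specification.

import torch
import torch.nn as nn

class Actor(nn.Module):
    # Policy network u_m(o): local observation -> bounded action (angle, distance).
    def __init__(self, obs_dim, act_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    # Centralized critic Q_m(s, a_1..a_M): global state plus the joint action
    # of all UAVs, used only during centralized training.
    def __init__(self, state_dim, joint_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, joint_action):
        return self.net(torch.cat([state, joint_action], dim=-1))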

(3.1) Initialize the experience replay buffer and set its size, and initialize the parameters of each DDPG network, the number of training episodes, etc.

(3.2) Start from training episode epoch = 1 and from time t = 1.

(3.3) Obtain the local observation information o of the current UAVs and the current state s of the whole system. Using the local observation obtained in time slot t, each UAV m adjusts its hovering position based on the ε-greedy strategy and the decision a_m output by its DDPG network, and, according to the path loss to the ground users, selects the W ground users with the lowest path loss for communication service based on a greedy scheme, obtaining the instantaneous reward r, reaching the next system state s′ and obtaining the local observation o′. Store (s, o, a, r, s′, o′) as a sample in the experience replay buffer, where a = {a_1, ..., a_M} denotes the joint action of all UAVs and o = {o_1, ..., o_M} denotes the local observations of all UAVs; set t = t + 1.
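A minimal experience replay buffer for the (s, o, a, r, s′, o′) samples described in step (3.3); the capacity is a placeholder value.

import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100000):
        # Fixed-size buffer; the oldest samples are discarded first.
        self.buffer = deque(maxlen=capacity)

    def add(self, s, o, a, r, s_next, o_next):
        self.buffer.append((s, o, a, r, s_next, o_next))

    def sample(self, k):
        # Uniform random mini-batch of k stored transitions.
        return random.sample(self.buffer, k)

    def __len__(self):
        return len(self.buffer)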

(3.4) If the number of samples stored in the replay buffer is greater than B, go to step (3.5); otherwise, continue collecting samples and return to step (3.3).

(3.5) For each agent m, randomly sample a fixed number K of samples from the experience replay buffer and compute the target values, where the target value y_k of the k-th sample (s_k, o_k, a_k, r_k, s′_k, o′_k) can be expressed as y_k = r_k + γ · Q′_m(s′_k, a′_1, ..., a′_M), where Q′_m denotes the target network of the m-th agent's critic network, u′_m denotes the target network of the m-th agent's policy network, r_k denotes the instantaneous reward in the k-th sample, and a′_m denotes the decision made by UAV m in system state s′_k according to its local observation o′_{m,k}. Based on global information, the parameters of this agent's critic network are updated by minimizing the loss function L(θ_{Q_m}) = (1/K) · Σ_k (y_k - Q_m(s_k, a^k_1, ..., a^k_M))^2 using gradient descent:

[Equation: gradient-descent update of the critic network parameters θ_{Q_m}]

Then, according to the critic network and the sample information, update the parameters of this agent's policy network based on the sampled policy gradient:

[Equation: sampled policy-gradient update of the policy network parameters θ_{u_m}]
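The sketch below shows one MADDPG-style update for agent m on a sampled mini-batch, following the centralized-critic description above; tensor layouts, the shared scalar reward, and the use of per-agent optimizers are assumptions of this sketch, not the patent's exact update rules.

import torch
import torch.nn.functional as F

def maddpg_update(m, batch, actors, critics, target_actors, target_critics,
                  actor_opts, critic_opts, gamma=0.95):
    # batch: s, s_next are [K, state_dim] tensors; o, a, o_next are lists of
    # per-agent tensors; r is a [K, 1] tensor with the instantaneous reward.
    s, o, a, r, s_next, o_next = batch
    M = len(actors)

    # Target value y_k = r_k + gamma * Q'_m(s', a'_1..a'_M), with a'_j taken
    # from the target policy of agent j on its next local observation.
    with torch.no_grad():
        a_next = [target_actors[j](o_next[j]) for j in range(M)]
        y = r + gamma * target_critics[m](s_next, torch.cat(a_next, dim=-1))

    # Critic update: minimize (y - Q_m(s, a_1..a_M))^2 over the mini-batch.
    q = critics[m](s, torch.cat(a, dim=-1))
    critic_loss = F.mse_loss(q, y)
    critic_opts[m].zero_grad()
    critic_loss.backward()
    critic_opts[m].step()

    # Actor update: ascend the sampled policy gradient through the critic,
    # re-evaluating only agent m's action with its current policy.
    a_pred = [actors[j](o[j]) if j == m else a[j] for j in range(M)]
    actor_loss = -critics[m](s, torch.cat(a_pred, dim=-1)).mean()
    actor_opts[m].zero_grad()
    actor_loss.backward()
    actor_opts[m].step()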

(3.6) After a certain number of iterations, update the critic target network parameters θ_Q′ and the policy target network parameters θ_u′: θ_Q′ = τ·θ_Q + (1-τ)·θ_Q′, θ_u′ = τ·θ_u + (1-τ)·θ_u′. When the total duration T is reached or the UAV energy is exhausted, exit the current training episode; otherwise, return to step (3.3). If the set number of training episodes has been reached, exit the training process; otherwise, enter a new training episode.
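A small helper, assuming PyTorch modules as in the earlier sketches, for the soft (Polyak) target-network update θ′ = τθ + (1-τ)θ′ used in step (3.6).

def soft_update(target_net, net, tau=0.01):
    # theta' <- tau * theta + (1 - tau) * theta', with tau in (0, 1).
    for p_target, p in zip(target_net.parameters(), net.parameters()):
        p_target.data.copy_(tau * p.data + (1.0 - tau) * p_target.data)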

(4) Assign the trained policy network u to each UAV and deploy the UAVs to the target area; in each time slot, each UAV adjusts its hovering position according to its own local observations and provides communication services to the ground users.
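A sketch of the distributed execution in step (4), assuming the Actor module from the earlier sketch: each UAV feeds only its own local observation to its policy and moves accordingly; the scaling of the network output to an angle and a distance is an assumption.

import numpy as np
import torch

def execute_step(actors, local_obs, uav_pos, area_side, max_dist=50.0):
    # One decision step per UAV using only local observations (no controller).
    new_pos = np.array(uav_pos, dtype=float)
    for m, actor in enumerate(actors):
        with torch.no_grad():
            a = actor(torch.as_tensor(local_obs[m], dtype=torch.float32)).numpy()
        theta = float(a[0]) * np.pi                   # heading in [-pi, pi]
        d = (float(a[1]) + 1.0) / 2.0 * max_dist      # distance in [0, max_dist]
        move = d * np.array([np.cos(theta), np.sin(theta)])
        new_pos[m] = np.clip(new_pos[m] + move, 0.0, area_side)
    return new_pos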

Beneficial effects of the present invention: the present invention proposes a UAV network hovering position optimization method based on multi-agent deep reinforcement learning, which models the throughput maximization problem of the UAV-to-ground communication network as a partially observable Markov decision process and introduces the multi-agent deep reinforcement learning method MADDPG for centralized training and distributed execution, solving the UAV hovering position optimization problem in dynamic environments. The method enables the UAV swarm to adapt better to dynamic environments, and multiple UAVs can cooperate in a distributed manner without relying on a centralized controller. The present invention introduces proportional-fairness weights and energy consumption information into the construction of the instantaneous reward function, which improves throughput while ensuring, to a certain extent, the fairness of user service and the low energy consumption of the UAV swarm.

Description of Drawings

FIG. 1 is a schematic diagram of the UAV-to-ground communication network scenario according to the present invention.

FIG. 2 is a flowchart of the UAV network hovering position optimization method based on multi-agent deep reinforcement learning of the present invention.

FIG. 3 is a flowchart of training the distributed policy networks of the UAVs based on multi-agent deep reinforcement learning according to the present invention.

Detailed Description

In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.

A UAV network hovering position optimization method based on multi-agent deep reinforcement learning is applied to emergency communication recovery in areas lacking ground infrastructure or in post-disaster areas. As shown in FIG. 1, the area lacks basic communication facilities and UAVs serve as mobile base stations to provide communication coverage. The ground environment changes dynamically and the ground devices may move, so the UAV base stations need to continually adjust their hovering positions to provide better communication service (maximizing the throughput of the system). At the same time, service fairness and energy loss must also be considered: ground users must not be neglected in the pursuit of maximum throughput, and the energy loss caused by the movement of the UAV base stations should be minimized. The flow of the present invention is shown in FIG. 2. First, the communication model, coverage model and energy consumption model of the specific application scenario are modeled and the optimization objective is constructed; second, according to the optimization objective and the characteristics of the multi-UAV system, the optimization problem is modeled as a partially observable Markov decision process; then, a simulation platform is used to simulate the multi-UAV-to-ground communication scenario, samples are collected through the interaction between the UAV swarm and the environment, and centralized training is performed with the multi-agent deep reinforcement learning algorithm MADDPG to obtain a distributed policy for each UAV. Finally, the trained policy networks are deployed to the UAVs and the UAV swarm is deployed to the target area, where the UAVs cooperate with each other to achieve high-throughput, low-energy-consumption and fair communication coverage.

The specific steps are as follows:

(1) Establish a multi-UAV-to-ground communication network model, which mainly includes the following four steps:

(1.1) Establish the scene model: a square target area with side length l is set up, containing N ground users and M UAV base stations (UAV-BSs) that provide communication services for the ground users. Time is divided into T identical time slots; from the previous time slot to the current one, a ground user may stay still or may move, so each UAV base station needs to find a new optimal hovering position in every time slot and, after reaching the target position, select ground users for data transmission service.

(1.2) Establish the air-to-ground communication model: the present invention uses an air-to-ground channel model to model the channel between a UAV base station and a ground user. Owing to its high flying altitude, a UAV base station can establish a line-of-sight (LoS) link with a ground user more easily than a terrestrial base station. In the LoS case, the path loss model between UAV base station m and ground user n is:

L_{n,m}(t) = η · (4π · f_c · d_{n,m}(t) / c)^α

where η denotes the excess path loss coefficient, c the speed of light, f_c the subcarrier frequency, and α the path loss exponent; d_{n,m} = sqrt(r_{n,m}^2 + h^2) denotes the distance between UAV base station m and ground user n, where r_{n,m} is the horizontal distance and h is the fixed flying height of the UAV base station. From the path loss, the channel gain can be expressed as g_{n,m}(t) = 1 / L_{n,m}(t). Based on the channel gain, the data transmission rate between UAV base station m and ground user n in time slot t is:

R_{n,m}(t) = log_2(1 + p_t · g_{n,m}(t) / σ)

where σ denotes the additive white Gaussian noise power, p_t the transmit power of the UAV base station, and g_{n,m}(t) the channel gain between UAV base station m and ground user n at time t.

(1.3) Establish the coverage model: due to hardware limitations, the coverage of each UAV base station is limited. The present invention defines a maximum tolerable path loss L_max: if, at a given moment, the path loss between a UAV base station and a user is less than L_max, the established connection is considered reliable; otherwise, establishing the connection is considered to have failed. Therefore, the effective coverage of each UAV base station can be defined according to the maximum tolerable path loss, as a circle centered at the projection point of the UAV base station on the ground with radius R_cov. From the path loss formula, R_cov can be expressed as:

R_cov = sqrt( (c / (4π · f_c))^2 · (L_max / η)^(2/α) - h^2 )

(1.4) Establish the energy loss model: the present invention mainly focuses on the energy loss caused by UAV movement. Considering the flight speed V and flight power p_f of the UAV, the flight energy consumption of UAV base station m in time slot t depends on the distance flown:

Δe_m(t) = (p_f / V) · sqrt( (x_m(t+1) - x_m(t))^2 + (y_m(t+1) - y_m(t))^2 )

where x_m(t) and y_m(t) denote the position coordinates of the UAV on the x-axis and y-axis of the horizontal plane, respectively.

(2) Model the problem as a partially observable Markov decision process:

Each UAV base station corresponds to an agent. In each time slot with environment state S(t), agent m can only obtain a local observation o_m within its own coverage range and, according to its decision function u_m(o_m), selects an action a_m from the action set A so as to maximize the total discounted expected reward E[ Σ_t γ^t · r_m(t) ], where γ ∈ (0,1) is the discount factor and r_m(t) denotes the reward of agent m at time t.

System state set S = {S(t) | S(t) = (S_u(t), S_g(t))}, comprising the current state of the UAV base stations S_u(t) and the current state of the ground users S_g(t). The UAV base station state S_u(t) includes the current position information of the UAVs; the ground user state S_g(t) includes the current position information of the ground users.

UAV action set A = {a(t) | a(t) = (θ(t), d(t))}: in time slot t, after obtaining its current local observation, UAV m needs to make a decision a_m(t) and move to the next hovering position; the action therefore consists of a flight rotation angle θ(t) and a moving distance d(t).

System instantaneous reward r(t): the goal of the present invention is to maximize the throughput of the UAV network while taking user service fairness and energy consumption into account. Therefore, the extra throughput produced at each time t by adjusting the hovering positions of the UAVs is a positive reward term, expressed as:

ΔC(t) = C(S_u(t+1), S_g(t)) - C(S_u(t), S_g(t))

where C(S_u(t), S_g(t)) denotes the throughput produced by the network when the UAV base station state is S_u(t) and the ground user state is S_g(t), and C(S_u(t+1), S_g(t)) denotes the throughput produced when the UAV base station state is S_u(t+1) and the ground user state is S_g(t). Considering the fairness of user service: if many users are gathered in one area while another area contains only a single user, the UAV base stations would keep hovering over the high-density area in pursuit of maximum throughput and neglect the low-density area. The present invention therefore applies a weight w_n(t) to each user's throughput reward to achieve proportional fair scheduling. R_req denotes the minimum communication rate required by a ground user, and R_n(t) denotes the average communication rate of ground user n from the start up to time t. When a UAV base station serves the user, R_n(t) grows and the user's weight gradually decreases; if the user is not served, R_n(t) decreases and the user's weight keeps increasing. As a result, the reward weight of user-sparse areas keeps growing, attracting UAV base stations to serve them.

[Equation: proportional-fairness weight w_n(t), defined in terms of R_req and R_n(t)]

[Equation: fairness-weighted throughput reward term involving w_n(t) and the user-association indicator a_{n,m}(t)]

Therefore, jointly considering the fairness-weighted throughput reward and the energy consumption penalty, the present invention gives the system instantaneous reward r(t):

[Equation: system instantaneous reward r(t), combining the fairness-weighted throughput reward with an energy consumption penalty weighted by α]

where α denotes the weight of the energy consumption penalty; the larger α is, the more the system emphasizes energy loss when making decisions, and the smaller α is, the more energy loss is ignored.

Local observation set O(t) = {o_1(t), ..., o_M(t)}: when multiple UAV base stations work cooperatively over a large area, each UAV cannot observe the global information and can only observe the ground user information within its own coverage range. o_m(t) denotes the position information of the ground users within its coverage observed by UAV base station m.

(3) Train based on the multi-agent deep reinforcement learning algorithm:

The present invention introduces the multi-agent deep reinforcement learning algorithm MADDPG into the hovering position optimization of the UAV-to-ground communication network, adopting a centralized-training and distributed-execution architecture: global information is used during training to better guide the gradient updates of each UAV's decision function, while during execution each UAV makes its next decision using only the local information it observes itself, which better fits the needs of practical scenarios. Each agent is trained with a DDPG network in the Actor-Critic architecture: the policy network is used to fit the policy function u(o), taking the local observation o as input and outputting the action a; the critic (evaluation) network is used to fit the state-action function Q(s, a), which denotes the expected reward obtained by taking action a when the system state is s. Let u = {u_1, ..., u_M} denote the deterministic policy functions of the M agents, θ_u = {θ_{u_1}, ..., θ_{u_M}} the parameters of the policy networks, Q = {Q_1, ..., Q_M} the critic networks of the M agents, and θ_Q = {θ_{Q_1}, ..., θ_{Q_M}} the parameters of the critic networks. As shown in FIG. 3, step (3) includes:

(3.1) Initialize the experience replay buffer and set its size B, and initialize the parameters θ of each DDPG network, the number of training episodes P, the duration T, etc.

(3.2) Start from training episode epoch = 1 and from time t = 1.

(3.3) Obtain the local observation information o of the current UAVs and the current state s of the whole system. Using the local observation obtained in time slot t, each UAV m adjusts its hovering position based on the ε-greedy strategy and the decision a_m output by its DDPG network, and, according to the path loss to the ground users, selects the W ground users with the lowest path loss for communication service based on a greedy scheme, obtaining the instantaneous reward r, reaching the next system state s′ and obtaining the local observation o′. Store (s, o, a, r, s′, o′) as a sample in the experience replay buffer, where a = {a_1, ..., a_M} denotes the joint action of all UAVs and o = {o_1, ..., o_M} denotes the local observations of all UAVs; set t = t + 1.

(3.4) If the number of samples stored in the replay buffer is greater than B, go to step (3.5); otherwise, continue collecting samples and return to step (3.3).

(3.5) For each agent m, randomly sample a fixed number K of samples from the experience replay buffer and compute the target values, where the target value y_k of the k-th sample (s_k, o_k, a_k, r_k, s′_k, o′_k) can be expressed as y_k = r_k + γ · Q′_m(s′_k, a′_1, ..., a′_M), where Q′_m denotes the target network of the m-th agent's critic network, u′_m denotes the target network of the m-th agent's policy network, r_k denotes the instantaneous reward in the k-th sample, and a′_m denotes the decision made by UAV m in system state s′_k according to its local observation o′_{m,k}. Based on global information, the parameters of this agent's critic network are updated by minimizing the loss function L(θ_{Q_m}) = (1/K) · Σ_k (y_k - Q_m(s_k, a^k_1, ..., a^k_M))^2 using gradient descent:

[Equation: gradient-descent update of the critic network parameters θ_{Q_m}]

Then, according to the critic network and the sample information, update the parameters of this agent's policy network based on the sampled policy gradient:

[Equation: sampled policy-gradient update of the policy network parameters θ_{u_m}]

(3.6) After a certain number of iterations, update the critic target network parameters θ_Q′ and the policy target network parameters θ_u′: θ_Q′ = τ·θ_Q + (1-τ)·θ_Q′, θ_u′ = τ·θ_u + (1-τ)·θ_u′. When the total duration T is reached or the UAV energy is exhausted, exit the current training episode; otherwise, return to step (3.3). If the set number of training episodes has been reached, exit the training process; otherwise, enter a new training episode.

(4) Assign the trained policy network u to each UAV and deploy the UAVs to the target area; in each time slot, each UAV adjusts its hovering position according to its own local observations and provides communication services to the ground users.

In summary:

The present invention proposes a UAV network hovering position optimization method based on multi-agent deep reinforcement learning. By modeling the throughput maximization problem of the multi-UAV-to-ground communication scenario as a partially observable Markov decision process and solving it with the MADDPG algorithm, the UAV swarm can adapt to dynamic environments and cooperate in a distributed manner, achieving high throughput, low energy consumption and service fairness of the network.

The basic principles, main features and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above embodiments; the above embodiments and the description only illustrate the principle of the present invention, and various changes and improvements may be made without departing from its spirit and scope, all of which fall within the scope of the claimed invention. The scope of protection claimed by the present invention is defined by the appended claims and their equivalents.

Claims (1)

1. An unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning is characterized by comprising the following steps:
(1) establishing a multi-unmanned aerial vehicle to ground communication network model:
(1.1) establishing a scene model: establishing a square target area with side length l, wherein N ground users and M unmanned aerial vehicle base stations are arranged in the area, and the unmanned aerial vehicle base stations provide communication service for the ground users; time is divided into T identical time slots, and from the last time slot to the current time slot a ground user may be static or may move, so the unmanned aerial vehicle base station needs to search for a new optimal hovering position in each time slot and select ground users for data transmission service after reaching the target position;
(1.2) establishing an air-to-ground communication model: an air-to-ground channel model is used to model the channel between an unmanned aerial vehicle base station and a ground user; owing to its high flight altitude, an unmanned aerial vehicle base station establishes a line-of-sight (LoS) link with a ground user more easily than a ground base station does; in the LoS case, the path loss model between unmanned aerial vehicle base station m and ground user n is:
L_{n,m}(t) = η · (4π · f_c · d_{n,m}(t) / c)^α
where η denotes the excess path loss coefficient, c denotes the speed of light, f_c denotes the subcarrier frequency, α denotes the path loss exponent, d_{n,m} = sqrt(r_{n,m}^2 + h^2) denotes the distance between the unmanned aerial vehicle base station m and the ground user n, r_{n,m} is the horizontal distance, and h is the fixed flying height of the unmanned aerial vehicle base station; the channel gain is expressed as g_{n,m}(t) = 1 / L_{n,m}(t);
according to the channel gain, the data transmission rate between the unmanned aerial vehicle base station m and the ground user n in time slot t is R_{n,m}(t):
R_{n,m}(t) = log_2(1 + p_t · g_{n,m}(t) / σ)
where σ represents the additive white Gaussian noise power, p_t represents the transmit power of the unmanned aerial vehicle base station, and g_{n,m}(t) represents the channel gain between the unmanned aerial vehicle base station m and the ground user n at time t;
(1.3) establishing a coverage model: defining a maximum tolerable path loss L_max; if, at a given moment, the path loss between the unmanned aerial vehicle base station and the user is less than L_max, the established connection is considered reliable, and otherwise establishing the connection is considered to have failed; defining the effective coverage range of each unmanned aerial vehicle base station according to the maximum tolerable path loss, wherein the range takes the projection point of the unmanned aerial vehicle base station on the ground as the circle center and R_cov as the radius; according to the path loss formula, R_cov is expressed as:
R_cov = sqrt( (c / (4π · f_c))^2 · (L_max / η)^(2/α) - h^2 )
(1.4) establishing an energy loss model: focusing on the energy loss caused by movement of the unmanned aerial vehicle, and considering the flying speed V and the flying power p_f of the unmanned aerial vehicle, the flight energy consumption Δe_m(t) of the unmanned aerial vehicle base station m in time slot t depends on the distance flown:
Δe_m(t) = (p_f / V) · sqrt( (x_m(t+1) - x_m(t))^2 + (y_m(t+1) - y_m(t))^2 )
wherein x_m(t) and y_m(t) respectively represent the position coordinates of the unmanned aerial vehicle on the x-axis and the y-axis of the horizontal plane at time t;
(2) modeling the problem as a partially observable Markov decision process:
each unmanned aerial vehicle base station is equivalent to an agent; in each time slot with environment state S(t), the agent m can only obtain a local observation o_m within its own coverage range and, according to a decision function u_m(o_m), selects an action a_m from the action set so as to maximize the total discounted expected reward E[ Σ_t γ^t · r_m(t) ], where γ ∈ (0,1) is the discount coefficient and r_m(t) represents the reward of agent m at time t;
the system state set S = {S(t) | S(t) = (S_u(t), S_g(t))} respectively contains the current state of the unmanned aerial vehicle base stations S_u(t) and the current state of the ground users S_g(t); the state of each unmanned aerial vehicle base station includes the current position information of the unmanned aerial vehicle; the state of each ground user includes the location information of the current ground user;
in time slot t, the unmanned aerial vehicle m needs to make a decision a_m(t) after obtaining its current local observation information and move to the next hovering position, so the action set includes a flight rotation angle θ(t) and a movement distance d(t);
system real-time reward r(t): the throughput of the unmanned aerial vehicle network is maximized while user service fairness and energy consumption are considered; thus, the extra throughput generated by adjusting the hovering positions of the unmanned aerial vehicles at each time t is a positive reward, expressed as:
ΔC(t) = C(S_u(t+1), S_g(t)) - C(S_u(t), S_g(t))
wherein C(S_u(t), S_g(t)) represents the throughput generated by the network when the unmanned aerial vehicle base station state is S_u(t) and the ground user state is S_g(t), and C(S_u(t+1), S_g(t)) represents the throughput generated by the network when the unmanned aerial vehicle base station state is S_u(t+1) and the ground user state is S_g(t); considering the fairness of user service, if a large number of users are gathered in a certain area while another area has only a small number of users, the unmanned aerial vehicle base stations would always hover in the high-density area in pursuit of maximum throughput and ignore the low-density area, so a weight w_n(t) is applied to the throughput reward of each user to implement proportional fair scheduling; R_req represents the minimum communication rate required by a ground user, and R_n(t) represents the average communication rate of ground user n from the beginning to time t; when an unmanned aerial vehicle base station serves this user, R_n(t) increases and the user's weight gradually becomes smaller; if the user is not served, R_n(t) decreases and the user's weight keeps increasing; therefore, the reward weight of a user-sparse area continuously increases, attracting unmanned aerial vehicle base stations to provide service;
[Equation: proportional-fairness weight w_n(t), defined in terms of R_req and R_n(t)]
[Equation: fairness-weighted throughput reward term involving w_n(t) and the indicator a_{n,m}(t)]
wherein a_{n,m}(t) is an indicator variable: at time t, if unmanned aerial vehicle base station m serves ground user n, then a_{n,m}(t) = 1, and otherwise a_{n,m}(t) = 0; therefore, comprehensively considering the fairness-weighted throughput reward and the energy loss penalty, the system real-time reward r(t) is:
[Equation: system real-time reward r(t), combining the fairness-weighted throughput reward with an energy consumption penalty weighted by α]
wherein α represents the weight of the energy consumption penalty; the larger α is, the more the system emphasizes energy loss when making decisions, and conversely, the more energy loss is ignored;
local observation set O(t) = {o_1(t), ..., o_M(t)}: when a plurality of unmanned aerial vehicle base stations work cooperatively in a large area, each unmanned aerial vehicle cannot observe global information and can only observe the ground user information within its own coverage area; o_m(t) represents the position information of the ground users within the coverage range of unmanned aerial vehicle base station m observed at time t;
(3) training based on a multi-agent deep reinforcement learning algorithm:
the multi-agent deep reinforcement learning algorithm MADDPG is introduced into the hovering position optimization of the unmanned aerial vehicle to-ground communication network, adopting a centralized-training and distributed-execution architecture; global information is used during training to better guide the gradient updates of the decision function of each unmanned aerial vehicle, and during execution each unmanned aerial vehicle only uses the local information it observes itself to make the next decision, which better fits the needs of practical scenarios; each agent adopts a DDPG network with the Actor-Critic architecture for training: the policy network is used to fit the policy function u(o), taking the local observation o as input and outputting an action a; the evaluation network is used to fit the state-action function Q(s, a), which represents the expected reward of taking action a when the system state is s; let u = {u_1, ..., u_M} denote the deterministic policy functions of the M agents, θ_u = {θ_{u_1}, ..., θ_{u_M}} denote the parameters of the policy networks, Q = {Q_1, ..., Q_M} denote the evaluation networks of the M agents, and θ_Q = {θ_{Q_1}, ..., θ_{Q_M}} denote the parameters of the evaluation networks;
(3.1) initializing an experience playback space, setting the size of the experience playback space, initializing parameters of each DDPG network and training rounds;
(3.2) starting from training round epoch = 1 and from time t = 1;
(3.3) acquire each UAV's local observation o and the current state s of the whole system; each UAV m uses the local observation obtained in time slot t and outputs its decision a_m from an ε-greedy strategy and its DDPG network, adjusts its hovering position, and, based on the path loss between the hovering position and the ground users, greedily selects the W ground users with the lowest path loss to serve; the instant reward r is obtained, the system reaches the next state s', and the new local observations o' are collected; store (s, o, a, r, s', o') as a sample in the experience replay space, where a = {a_1, ..., a_M} denotes the joint action of all UAVs and o = {o_1, ..., o_M} denotes the local observations of all UAVs; set t = t + 1;
(3.4) if the number of samples stored in the replay space is greater than B, go to step (3.5); otherwise, continue collecting samples and return to step (3.3);
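A small sketch of the experience replay space used in steps (3.3)-(3.4); the capacity and tuple layout are assumptions for illustration:

```python
import random
from collections import deque

class ReplayBuffer:
    """Shared experience replay space storing (s, o, a, r, s', o') tuples."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, s, o, a, r, s_next, o_next):
        self.buffer.append((s, o, a, r, s_next, o_next))

    def __len__(self):
        return len(self.buffer)

    def sample(self, k):
        """Uniformly sample K stored transitions for one training update."""
        return random.sample(self.buffer, k)

# Collection loop sketch: interact with the environment and keep adding
# samples until more than B transitions are stored, then start training.
```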
(3.5) for each agent m, randomly sample a fixed number K of samples from the experience replay space and compute the target values, where the target value y_k of the k-th sample (s_k, o_k, a_k, r_k, s'_k, o'_k) can be expressed as:

y_k = r_k + γ·Q'_m(s'_k, a'_1, ..., a'_M), with a'_j = u'_j(o'_{j,k}) for every agent j,

where Q'_m denotes the target network of the evaluation network of the m-th agent, u'_m denotes the target network of the policy network of the m-th agent, r_k denotes the instant reward in the k-th sample, and a'_m denotes the decision made by UAV m from its local observation o'_{m,k} when the system state is s'_k; using gradient descent based on global information, minimize the loss function

L(θ^Q_m) = (1/K)·Σ_k ( y_k − Q_m(s_k, a^k_1, ..., a^k_M) )²

and update the parameters of the agent's evaluation network; then, according to the evaluation network and the sample information, update the parameters of the agent's policy network along the sampled policy gradient:

∇_{θ^u_m} J ≈ (1/K)·Σ_k ∇_{θ^u_m} u_m(o^k_m) · ∇_{a_m} Q_m(s_k, a^k_1, ..., a_m, ..., a^k_M) |_{a_m = u_m(o^k_m)};
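The following sketch shows how step (3.5) could be realized for one agent m in PyTorch, reusing the Actor/Critic shapes sketched earlier; the batch layout (lists of per-agent tensors, column-shaped rewards) and the optimizer handling are assumptions:

```python
import torch
import torch.nn.functional as F

def maddpg_update(m, batch, actors, critics, target_actors, target_critics,
                  actor_opts, critic_opts, gamma):
    """One MADDPG update of agent m from a sampled mini-batch.

    batch["s"], batch["s_next"]: global state tensors, shape (K, state_dim)
    batch["obs"], batch["obs_next"]: lists of per-agent observation tensors
    batch["acts"]: list of per-agent action tensors
    batch["r"][m]: agent m's rewards, shape (K, 1)
    """
    # Target value y_k: every agent's target policy proposes its next action.
    with torch.no_grad():
        next_acts = [ta(o) for ta, o in zip(target_actors, batch["obs_next"])]
        q_next = target_critics[m](batch["s_next"], torch.cat(next_acts, dim=-1))
        y = batch["r"][m] + gamma * q_next

    # Evaluation (critic) network: minimize (y_k - Q_m(s_k, a_1..a_M))^2.
    q = critics[m](batch["s"], torch.cat(batch["acts"], dim=-1))
    critic_loss = F.mse_loss(q, y)
    critic_opts[m].zero_grad()
    critic_loss.backward()
    critic_opts[m].step()

    # Policy (actor) network: ascend the sampled policy gradient by
    # re-evaluating agent m's own action and maximizing Q_m.
    acts = [a.detach() for a in batch["acts"]]
    acts[m] = actors[m](batch["obs"][m])
    actor_loss = -critics[m](batch["s"], torch.cat(acts, dim=-1)).mean()
    actor_opts[m].zero_grad()
    actor_loss.backward()
    actor_opts[m].step()
```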
(3.6) after a certain interval of rounds, update the target-network parameters θ^{Q'} of the evaluation network and θ^{u'} of the policy network: θ^{Q'} = τ·θ^Q + (1−τ)·θ^{Q'}, θ^{u'} = τ·θ^u + (1−τ)·θ^{u'}, where τ ∈ (0,1) is the update weight; when the total duration T is reached or the UAV's energy is exhausted, exit the current training round, otherwise return to step (3.3); if the number of training rounds has been reached, exit the training process, otherwise start a new training round;
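The soft target-network update in step (3.6) can be written as a short helper; the function name is illustrative:

```python
def soft_update(target_net, net, tau):
    """θ' ← τ·θ + (1 − τ)·θ' for every parameter of a target network."""
    for p_target, p in zip(target_net.parameters(), net.parameters()):
        p_target.data.copy_(tau * p.data + (1.0 - tau) * p_target.data)
```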
(4) distribute the trained policy networks u to the UAVs and deploy the UAVs to the target area; in each time slot, each UAV adjusts its hovering position according to its own local observation and provides communication service to the ground users.
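A sketch of the distributed execution phase in step (4): at run time each UAV only runs its trained policy network on its own local observation; the helper name and the tensor conversion are assumptions:

```python
import torch

def execute_step(actor, local_obs):
    """Distributed execution: a UAV chooses its next hovering adjustment from
    its own local observation only; no global state is needed at run time."""
    with torch.no_grad():
        action = actor(torch.as_tensor(local_obs, dtype=torch.float32))
    return action.cpu().numpy()
```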
CN202010497656.4A 2020-06-04 2020-06-04 A UAV network hovering position optimization method based on multi-agent deep reinforcement learning Active CN111786713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010497656.4A CN111786713B (en) 2020-06-04 2020-06-04 A UAV network hovering position optimization method based on multi-agent deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111786713A true CN111786713A (en) 2020-10-16
CN111786713B CN111786713B (en) 2021-06-08

Family

ID=72753669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010497656.4A Active CN111786713B (en) 2020-06-04 2020-06-04 A UAV network hovering position optimization method based on multi-agent deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111786713B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129882A1 (en) * 2016-11-08 2018-05-10 Dedrone Holdings, Inc. Systems, Methods, Apparatuses, and Devices for Identifying, Tracking, and Managing Unmanned Aerial Vehicles
CN109923799A (en) * 2016-11-11 2019-06-21 高通股份有限公司 The method restored for wave beam in millimeter-wave systems
CN209085657U (en) * 2017-08-02 2019-07-09 强力物联网投资组合2016有限公司 For data gathering system related or industrial environment with chemical production technology
US20200115047A1 (en) * 2018-10-11 2020-04-16 Beihang University Multi-uav continuous movement control method, apparatus, device, and storage medium for energy efficient communication coverage
CN110198531A (en) * 2019-05-24 2019-09-03 吉林大学 A kind of dynamic D2D relay selection method based on relative velocity
CN110430527A (en) * 2019-07-17 2019-11-08 大连理工大学 A kind of unmanned plane safe transmission power distribution method over the ground
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study
CN110531617A (en) * 2019-07-30 2019-12-03 北京邮电大学 Multiple no-manned plane 3D hovering position combined optimization method, device and unmanned plane base station
CN110730028A (en) * 2019-08-29 2020-01-24 广东工业大学 Unmanned aerial vehicle-assisted backscatter communication device and resource allocation control method
CN110809274A (en) * 2019-10-28 2020-02-18 南京邮电大学 An enhanced network optimization method for UAV base stations for narrowband Internet of Things
CN111132009A (en) * 2019-12-23 2020-05-08 北京邮电大学 Mobile edge calculation method, device and system of Internet of things
CN111026147A (en) * 2019-12-25 2020-04-17 北京航空航天大学 A zero-overshoot UAV position control method and device based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CONG WANG: "Research of UAV Target Detection and Flight Control Based on Deep Learning", 《2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA (ICAIBD)》 *
周毅: "基于深度强化学习的无人机自主部署及能效优化策略", 《物联网学报》 *

Cited By (128)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256056A (en) * 2020-10-19 2021-01-22 中山大学 Unmanned aerial vehicle control method and system based on multi-agent deep reinforcement learning
CN112256056B (en) * 2020-10-19 2022-03-01 中山大学 UAV control method and system based on multi-agent deep reinforcement learning
CN112512115B (en) * 2020-11-20 2022-02-11 北京邮电大学 A method, device and electronic device for determining the position of an air base station
CN112512115A (en) * 2020-11-20 2021-03-16 北京邮电大学 Method and device for determining position of air base station and electronic equipment
CN112566209A (en) * 2020-11-24 2021-03-26 山西三友和智慧信息技术股份有限公司 UAV-BSs energy and service priority track design method based on double Q learning
CN112511197A (en) * 2020-12-01 2021-03-16 南京工业大学 Unmanned aerial vehicle auxiliary elastic video multicast method based on deep reinforcement learning
CN112752357B (en) * 2020-12-02 2022-06-17 宁波大学 Online UAV-assisted data collection method and device based on energy harvesting technology
CN112752357A (en) * 2020-12-02 2021-05-04 宁波大学 Online unmanned aerial vehicle auxiliary data collection method and device based on energy harvesting technology
CN112511250A (en) * 2020-12-03 2021-03-16 中国人民解放军火箭军工程大学 DRL-based multi-unmanned aerial vehicle air base station dynamic deployment method and system
CN112636811A (en) * 2020-12-08 2021-04-09 北京邮电大学 Relay unmanned aerial vehicle deployment method and device
CN112672361A (en) * 2020-12-17 2021-04-16 东南大学 Large-scale MIMO capacity increasing method based on unmanned aerial vehicle cluster deployment
CN112672361B (en) * 2020-12-17 2022-12-02 东南大学 Large-scale MIMO capacity increasing method based on unmanned aerial vehicle cluster deployment
CN112821938A (en) * 2021-01-08 2021-05-18 重庆大学 Total throughput and energy consumption optimization method of air-space-ground satellite communication system
CN112904890B (en) * 2021-01-15 2023-06-30 北京国网富达科技发展有限责任公司 Unmanned aerial vehicle automatic inspection system and method for power line
CN112904890A (en) * 2021-01-15 2021-06-04 北京国网富达科技发展有限责任公司 Unmanned aerial vehicle automatic inspection system and method for power line
CN112947575A (en) * 2021-03-17 2021-06-11 中国人民解放军国防科技大学 Unmanned aerial vehicle cluster multi-target searching method and system based on deep reinforcement learning
CN113094982A (en) * 2021-03-29 2021-07-09 天津理工大学 Internet of vehicles edge caching method based on multi-agent deep reinforcement learning
CN113194488A (en) * 2021-03-31 2021-07-30 西安交通大学 Unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system
CN113162679A (en) * 2021-04-01 2021-07-23 南京邮电大学 DDPG algorithm-based IRS (inter-Range instrumentation System) auxiliary unmanned aerial vehicle communication joint optimization method
CN113162679B (en) * 2021-04-01 2023-03-10 南京邮电大学 DDPG algorithm-based IRS (intelligent resilient software) assisted unmanned aerial vehicle communication joint optimization method
CN113342029A (en) * 2021-04-16 2021-09-03 山东师范大学 Maximum sensor data acquisition path planning method and system based on unmanned aerial vehicle cluster
CN113115344A (en) * 2021-04-19 2021-07-13 中国人民解放军火箭军工程大学 Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization
CN113115344B (en) * 2021-04-19 2021-12-14 中国人民解放军火箭军工程大学 Unmanned aerial vehicle base station communication resource allocation strategy prediction method based on noise optimization
CN113286275A (en) * 2021-04-23 2021-08-20 南京大学 Unmanned aerial vehicle cluster efficient communication method based on multi-agent reinforcement learning
CN113190039A (en) * 2021-04-27 2021-07-30 大连理工大学 Unmanned aerial vehicle acquisition path planning method based on hierarchical deep reinforcement learning
CN113190039B (en) * 2021-04-27 2024-04-16 大连理工大学 Unmanned aerial vehicle acquisition path planning method based on layered deep reinforcement learning
CN113364495A (en) * 2021-05-25 2021-09-07 西安交通大学 Multi-unmanned aerial vehicle track and intelligent reflecting surface phase shift joint optimization method and system
CN113286314B (en) * 2021-05-25 2022-03-08 重庆邮电大学 Unmanned aerial vehicle base station deployment and user association method based on Q learning algorithm
CN113364495B (en) * 2021-05-25 2022-08-05 西安交通大学 A method and system for joint optimization of multi-UAV trajectory and intelligent reflector phase shift
CN113286314A (en) * 2021-05-25 2021-08-20 重庆邮电大学 Unmanned aerial vehicle base station deployment and user association method based on Q learning algorithm
CN113255218A (en) * 2021-05-27 2021-08-13 电子科技大学 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
CN113255218B (en) * 2021-05-27 2022-05-31 电子科技大学 Unmanned aerial vehicle autonomous navigation and resource scheduling method of wireless self-powered communication network
CN113328775B (en) * 2021-05-28 2022-06-21 怀化学院 UAV height positioning system and computer storage medium
CN113328775A (en) * 2021-05-28 2021-08-31 怀化学院 UAV height positioning system and computer storage medium
CN113660681A (en) * 2021-05-31 2021-11-16 西北工业大学 Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
CN113660681B (en) * 2021-05-31 2023-06-06 西北工业大学 Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
CN113242556B (en) * 2021-06-04 2022-08-23 重庆邮电大学 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services
CN113242556A (en) * 2021-06-04 2021-08-10 重庆邮电大学 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services
CN113382060A (en) * 2021-06-07 2021-09-10 北京理工大学 Unmanned aerial vehicle track optimization method and system in Internet of things data collection
CN113382060B (en) * 2021-06-07 2022-03-22 北京理工大学 A method and system for UAV trajectory optimization in IoT data collection
CN113392971A (en) * 2021-06-11 2021-09-14 武汉大学 Strategy network training method, device, equipment and readable storage medium
CN113364630A (en) * 2021-06-15 2021-09-07 广东技术师范大学 Quality of service (QoS) differentiation optimization method and device
CN113572548A (en) * 2021-06-18 2021-10-29 南京理工大学 A fast frequency hopping method for UAV network collaboration based on multi-agent reinforcement learning
CN113572548B (en) * 2021-06-18 2023-07-07 南京理工大学 A Cooperative Fast Frequency Hopping Method for UAV Network Based on Multi-agent Reinforcement Learning
CN113346944B (en) * 2021-06-28 2022-06-10 上海交通大学 Time delay minimization calculation task unloading method and system in air-space-ground integrated network
CN113346944A (en) * 2021-06-28 2021-09-03 上海交通大学 Time delay minimization calculation task unloading method and system in air-space-ground integrated network
CN113467508B (en) * 2021-06-30 2022-06-28 天津大学 Multi-unmanned aerial vehicle intelligent cooperative decision-making method for trapping task
CN113467508A (en) * 2021-06-30 2021-10-01 天津大学 Multi-unmanned aerial vehicle intelligent cooperative decision-making method for trapping task
CN113641192A (en) * 2021-07-06 2021-11-12 暨南大学 A Path Planning Method for UAV Swarm Perception Task Based on Reinforcement Learning
CN113641192B (en) * 2021-07-06 2023-07-18 暨南大学 A Path Planning Method for Unmanned Aerial Vehicle Crowd Sensing Task Based on Reinforcement Learning
CN113613339B (en) * 2021-07-10 2023-10-17 西北农林科技大学 Channel access method for multi-priority wireless terminals based on deep reinforcement learning
CN113613339A (en) * 2021-07-10 2021-11-05 西北农林科技大学 Channel access method of multi-priority wireless terminal based on deep reinforcement learning
CN113395708A (en) * 2021-07-13 2021-09-14 东南大学 Multi-autonomous-subject centralized region coverage method and system based on global environment prediction
CN113359480A (en) * 2021-07-16 2021-09-07 中国人民解放军火箭军工程大学 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN113359480B (en) * 2021-07-16 2022-02-01 中国人民解放军火箭军工程大学 Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
CN113776531A (en) * 2021-07-21 2021-12-10 电子科技大学长三角研究院(湖州) Multi-UAV autonomous navigation and task assignment algorithm based on wireless self-powered communication network
CN113625751B (en) * 2021-08-05 2023-02-24 南京航空航天大学 Unmanned aerial vehicle position and resource joint optimization method for air-ground integrated federal learning
CN113625751A (en) * 2021-08-05 2021-11-09 南京航空航天大学 Unmanned aerial vehicle position and resource joint optimization method for air-ground integrated federal learning
CN113625569B (en) * 2021-08-12 2022-02-08 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control decision method and system based on hybrid decision model
CN113625569A (en) * 2021-08-12 2021-11-09 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control hybrid decision method and system based on deep reinforcement learning and rule driving
CN113706023B (en) * 2021-08-31 2022-07-12 哈尔滨理工大学 Shipboard aircraft guarantee operator scheduling method based on deep reinforcement learning
CN113706023A (en) * 2021-08-31 2021-11-26 哈尔滨理工大学 Shipboard aircraft guarantee operator scheduling method based on deep reinforcement learning
CN113691294A (en) * 2021-09-27 2021-11-23 中国人民解放军空军预警学院 Near-field sparse array antenna beam establishing method and device
CN114051252A (en) * 2021-09-28 2022-02-15 嘉兴学院 Multi-user intelligent transmitting power control method in wireless access network
CN114021775A (en) * 2021-09-30 2022-02-08 成都海天数联科技有限公司 Intelligent body handicap device putting method based on optimal solution
CN113762512A (en) * 2021-11-10 2021-12-07 北京航空航天大学杭州创新研究院 Distributed model training method, system and related device
CN114142912A (en) * 2021-11-26 2022-03-04 西安电子科技大学 Resource management and control method for high dynamic air network time coverage continuity guarantee
CN114222251A (en) * 2021-11-30 2022-03-22 中山大学·深圳 Adaptive network forming and track optimizing method for multiple unmanned aerial vehicles
CN114268986A (en) * 2021-12-14 2022-04-01 北京航空航天大学 Unmanned aerial vehicle computing unloading and charging service efficiency optimization method
CN114372612A (en) * 2021-12-16 2022-04-19 电子科技大学 Route planning and task unloading method for unmanned aerial vehicle mobile edge computing scene
CN114372612B (en) * 2021-12-16 2023-04-28 电子科技大学 Path planning and task unloading method for unmanned aerial vehicle mobile edge computing scene
CN114268963B (en) * 2021-12-24 2023-07-11 北京航空航天大学 Communication coverage-oriented unmanned aerial vehicle network autonomous deployment method
CN114268963A (en) * 2021-12-24 2022-04-01 北京航空航天大学 A method for autonomous deployment of UAV networks for communication coverage
CN114339842A (en) * 2022-01-06 2022-04-12 北京邮电大学 Dynamic trajectory design method and device for UAV swarms in time-varying scenarios based on deep reinforcement learning
CN114339842B (en) * 2022-01-06 2022-12-20 北京邮电大学 Method and device for dynamic trajectory design of UAV swarms in time-varying scenarios based on deep reinforcement learning
CN114374951B (en) * 2022-01-12 2024-04-30 重庆邮电大学 Dynamic pre-deployment method for multiple unmanned aerial vehicles
CN114374951A (en) * 2022-01-12 2022-04-19 重庆邮电大学 A method for dynamic pre-deployment of multiple UAVs
CN114124784A (en) * 2022-01-27 2022-03-01 军事科学院系统工程研究院网络信息研究所 Intelligent routing decision protection method and system based on vertical federation
CN114124784B (en) * 2022-01-27 2022-04-12 军事科学院系统工程研究院网络信息研究所 Intelligent routing decision protection method and system based on vertical federation
CN114548551A (en) * 2022-02-21 2022-05-27 广东汇天航空航天科技有限公司 Method and device for determining residual endurance time, aircraft and medium
CN114578335A (en) * 2022-03-03 2022-06-03 电子科技大学长三角研究院(衢州) Positioning method based on multi-agent deep reinforcement learning and least square
CN114578335B (en) * 2022-03-03 2024-08-16 电子科技大学长三角研究院(衢州) Positioning method based on multi-agent deep reinforcement learning and least square
CN114567888B (en) * 2022-03-04 2023-12-26 国网浙江省电力有限公司台州市黄岩区供电公司 Multi-unmanned aerial vehicle dynamic deployment method
CN114567888A (en) * 2022-03-04 2022-05-31 重庆邮电大学 Multi-unmanned aerial vehicle dynamic deployment method
CN114625151A (en) * 2022-03-10 2022-06-14 大连理工大学 Underwater robot obstacle avoidance path planning method based on reinforcement learning
CN114625151B (en) * 2022-03-10 2024-05-28 大连理工大学 Underwater robot obstacle avoidance path planning method based on reinforcement learning
CN114449482A (en) * 2022-03-11 2022-05-06 南京理工大学 Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning
CN114449482B (en) * 2022-03-11 2024-05-14 南京理工大学 Heterogeneous Internet of vehicles user association method based on multi-agent deep reinforcement learning
CN114679699A (en) * 2022-03-23 2022-06-28 重庆邮电大学 Multi-UAV energy-saving cruise communication coverage method based on deep reinforcement learning
CN114884895A (en) * 2022-05-05 2022-08-09 郑州轻工业大学 Intelligent traffic scheduling method based on deep reinforcement learning
CN114884895B (en) * 2022-05-05 2023-08-22 郑州轻工业大学 Intelligent flow scheduling method based on deep reinforcement learning
CN114980169B (en) * 2022-05-16 2024-08-20 北京理工大学 Unmanned aerial vehicle auxiliary ground communication method based on track and phase joint optimization
CN114980169A (en) * 2022-05-16 2022-08-30 北京理工大学 A UAV-assisted ground communication method based on joint optimization of trajectory and phase
CN114980020A (en) * 2022-05-17 2022-08-30 重庆邮电大学 Unmanned aerial vehicle data collection method based on MADDPG algorithm
CN114980020B (en) * 2022-05-17 2024-07-12 中科润物科技(南京)有限公司 MADDPG algorithm-based unmanned aerial vehicle data collection method
CN114997617B (en) * 2022-05-23 2024-06-07 华中科技大学 A method and system for allocating multi-unmanned platform multi-target joint detection tasks
CN115038155A (en) * 2022-05-23 2022-09-09 香港中文大学(深圳) Ultra-dense multi-access-point dynamic cooperative transmission method
CN114997617A (en) * 2022-05-23 2022-09-02 华中科技大学 Multi-unmanned platform multi-target joint detection task allocation method and system
CN115314904A (en) * 2022-06-14 2022-11-08 北京邮电大学 Communication coverage method and related equipment based on multi-agent maximum entropy reinforcement learning
CN115314904B (en) * 2022-06-14 2024-03-29 北京邮电大学 Communication coverage method and related equipment based on multi-agent maximum entropy reinforcement learning
CN115113651B (en) * 2022-07-18 2024-11-08 中国电子科技集团公司第五十四研究所 An optimization method for UAV leader-officer collaborative coverage based on ellipse fitting
CN115113651A (en) * 2022-07-18 2022-09-27 中国电子科技集团公司第五十四研究所 Unmanned robot bureaucratic cooperative coverage optimization method based on ellipse fitting
CN114942653A (en) * 2022-07-26 2022-08-26 北京邮电大学 Method, device and electronic device for determining flight strategy of unmanned swarm
CN115460543A (en) * 2022-08-31 2022-12-09 中国地质大学(武汉) A distributed ring fence covering method, device and storage device
CN115460543B (en) * 2022-08-31 2024-04-19 中国地质大学(武汉) A distributed annular fence covering method, device and storage device
CN115713130A (en) * 2022-09-07 2023-02-24 华东交通大学 Vehicle scheduling method based on hyper-parameter network weight distribution deep reinforcement learning
CN115713130B (en) * 2022-09-07 2023-09-05 华东交通大学 Vehicle scheduling method based on super-parameter network weight distribution deep reinforcement learning
CN115616906A (en) * 2022-09-08 2023-01-17 南京航空航天大学 A Joint Optimization Method of UAV Data Acquisition Trajectories and User Association Based on Reinforcement Learning in Wireless Networks
CN115802313A (en) * 2022-11-16 2023-03-14 河南大学 Air-ground mobile network energy-carrying fair communication method based on intelligent reflecting surface
CN115499849B (en) * 2022-11-16 2023-04-07 国网湖北省电力有限公司信息通信公司 A method for cooperation between a wireless access point and a reconfigurable smart surface
CN115499849A (en) * 2022-11-16 2022-12-20 国网湖北省电力有限公司信息通信公司 A method for cooperation between a wireless access point and a reconfigurable smart surface
CN116017479B (en) * 2022-12-30 2024-10-25 河南大学 Distributed multi-unmanned aerial vehicle relay network coverage method
CN116017479A (en) * 2022-12-30 2023-04-25 河南大学 A method for distributed multi-UAV relay network coverage
CN116208968B (en) * 2022-12-30 2024-04-05 北京信息科技大学 Trajectory planning method and device based on federated learning
CN116208968A (en) * 2022-12-30 2023-06-02 北京信息科技大学 Trajectory planning method and device based on federated learning
CN116009590A (en) * 2023-02-01 2023-04-25 中山大学 Distributed trajectory planning method, system, equipment and medium for UAV network
CN116009590B (en) * 2023-02-01 2023-11-17 中山大学 Unmanned aerial vehicle network distributed track planning method, system, equipment and medium
CN116456307B (en) * 2023-05-06 2024-04-09 山东省计算中心(国家超级计算济南中心) Q learning-based energy-limited Internet of things data acquisition and fusion method
CN116456307A (en) * 2023-05-06 2023-07-18 山东省计算中心(国家超级计算济南中心) A data acquisition and fusion method for energy-constrained Internet of Things based on Q-learning
CN116502547A (en) * 2023-06-29 2023-07-28 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
CN116502547B (en) * 2023-06-29 2024-06-04 深圳大学 Multi-unmanned aerial vehicle wireless energy transmission method based on graph reinforcement learning
CN116980881A (en) * 2023-08-29 2023-10-31 北方工业大学 Multi-unmanned aerial vehicle collaboration data distribution method, system, electronic equipment and medium
CN116980881B (en) * 2023-08-29 2024-01-23 北方工业大学 Multi-unmanned aerial vehicle collaboration data distribution method, system, electronic equipment and medium
CN117856903A (en) * 2023-12-07 2024-04-09 山东科技大学 Marine unmanned aerial vehicle optical link data transmission method based on multi-agent reinforcement learning
CN117856903B (en) * 2023-12-07 2024-08-30 山东科技大学 Marine unmanned aerial vehicle optical link data transmission method based on multi-agent reinforcement learning
CN117376934B (en) * 2023-12-08 2024-02-27 山东科技大学 Deep reinforcement learning-based multi-unmanned aerial vehicle offshore mobile base station deployment method
CN117376934A (en) * 2023-12-08 2024-01-09 山东科技大学 Deep reinforcement learning-based multi-unmanned aerial vehicle offshore mobile base station deployment method
CN117835463A (en) * 2023-12-27 2024-04-05 武汉大学 Space-to-ground ad hoc communication network space-time dynamic deployment method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN111786713B (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN111786713B (en) A UAV network hovering position optimization method based on multi-agent deep reinforcement learning
CN112118556B (en) Unmanned aerial vehicle track and power joint optimization method based on deep reinforcement learning
Li et al. On-board deep Q-network for UAV-assisted online power transfer and data collection
You et al. 3D trajectory optimization in Rician fading for UAV-enabled data harvesting
US20210165405A1 (en) Multiple unmanned aerial vehicles navigation optimization method and multiple unmanned aerial vehicles system using the same
CN114025330B (en) A self-organizing network data transmission method for air-ground coordination
CN109885088B (en) Optimization method of UAV flight trajectory based on machine learning in edge computing network
CN110380776B (en) Internet of things system data collection method based on unmanned aerial vehicle
CN114339842B (en) Method and device for dynamic trajectory design of UAV swarms in time-varying scenarios based on deep reinforcement learning
Zhou et al. QoE-driven adaptive deployment strategy of multi-UAV networks based on hybrid deep reinforcement learning
CN115499921A (en) Three-dimensional trajectory design and resource scheduling optimization method for complex unmanned aerial vehicle network
CN113163332B (en) Metric learning-based roadmap coloring UAV energy-saving endurance data collection method
CN112671451B (en) A kind of unmanned aerial vehicle data collection method, equipment, electronic equipment and storage medium
CN113359480A (en) Multi-unmanned aerial vehicle and user cooperative communication optimization method based on MAPPO algorithm
Wu et al. 3D aerial base station position planning based on deep Q-network for capacity enhancement
CN114945182B (en) Multi-unmanned aerial vehicle relay optimization deployment method in urban environment
CN113660681A (en) Multi-agent resource optimization method applied to unmanned aerial vehicle cluster auxiliary transmission
CN114980169A (en) A UAV-assisted ground communication method based on joint optimization of trajectory and phase
CN112702713A (en) Low-altitude unmanned-machine communication deployment method under multi-constraint condition
CN114020024A (en) UAV path planning method based on Monte Carlo tree search
CN115119174A (en) Autonomous deployment method of unmanned aerial vehicle based on energy consumption optimization in irrigation area
CN109413664A (en) A kind of super-intensive based on interference is tethered at unmanned plane base station height adjusting method
CN115314904A (en) Communication coverage method and related equipment based on multi-agent maximum entropy reinforcement learning
Yang et al. Deep reinforcement learning in NOMA-assisted UAV networks for path selection and resource offloading
CN113776531A (en) Multi-UAV autonomous navigation and task assignment algorithm based on wireless self-powered communication network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant