CN116257335A

CN116257335A - UAV-assisted MEC system joint task scheduling and motion trajectory optimization method

Info

Publication number: CN116257335A
Application number: CN202211613821.3A
Authority: CN
Inventors: 刘宜明; 王熠鹏; 张家祥; 刘宝玲
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2022-12-15
Filing date: 2022-12-15
Publication date: 2023-06-13

Abstract

The invention discloses a joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system. The method takes full account of changes in mobile device positions in a dynamic environment (where both the ground terminal devices and the UAV are moving) and of partial offloading of computing tasks. By jointly optimizing user scheduling and the choice of computing task offloading mode, the UAV motion parameters (flight angle and flight speed), and power allocation, it optimizes communication, computation, and motion together, which effectively reduces the computing task processing delay of the terminal devices and improves the offloading efficiency of the UAV-assisted edge computing system. In simulation experiments against DQN and other baseline algorithms, the DDPG algorithm shows a significant improvement in processing delay.

Description

Joint task scheduling and motion trajectory optimization method for UAV-assisted MEC system

Technical Field

The present invention relates to the technical field of privacy protection for Internet of Vehicles services, and in particular to a joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system.

Background Art

With the development of fifth-generation mobile communication technology, new computation-intensive and delay-sensitive applications such as automatic navigation, face recognition, and online gaming have emerged rapidly. However, terminal devices usually have limited computing capability and struggle to process large numbers of computing tasks. Traditional cloud computing concentrates computing resources in the cloud, so heavy task traffic and the distance between the cloud and the terminal devices lead to large transmission delays. Mobile Edge Computing (MEC) deploys computing servers at the network edge close to the users, so it can conveniently provide computing services for the computation-intensive tasks of terminals, effectively reducing processing delay and improving the user's service experience. The mobile edge computing paradigm has therefore greatly relieved the pressure on network bandwidth and cloud computing centers and improved the responsiveness of computing and storage services.

Existing mobile edge computing servers and computing centers are usually deployed at fixed locations. However, where communication facilities are sparse in the field or a sudden disaster occurs, fixed infrastructure often cannot provide effective service. Unmanned aerial vehicles (UAVs) are flexible, mobile, and easy to deploy. By establishing line-of-sight (LoS) links with ground terminals, they can support ground communication relaying, edge computing services, and the like.

In schemes where a UAV supports ground communication, the UAV often remains stationary and acts as an aerial base station, which suits scenarios such as damaged infrastructure or traffic bursts at large events. In addition, a UAV can use its own computing capability to provide computing resources on demand to ground terminal devices, effectively improving the user's service experience.

At present, academia and industry are studying UAV-assisted MEC systems. In reference [1] (T. Ren et al., "Enabling Efficient Scheduling in Large-Scale UAV-Assisted Mobile-Edge Computing via Hierarchical Reinforcement Learning," IEEE Internet of Things Journal, vol. 9, no. 10, pp. 7095-7109, 15 May 2022, doi: 10.1109/JIOT.2021.3071531), the authors address UAV motion planning and computation offloading scheduling for ground terminal devices by decomposing the scheduling problem into two layers of sub-problems and optimizing them alternately with a hierarchical reinforcement learning algorithm, so as to obtain a real-time scheduling policy in a dynamic environment.

However, the prior art mainly considers either the UAV acting as a communication relay node that provides communication services, or the UAV acting as an edge computing node that provides computing services to terminal devices, and rarely considers schemes in which the two service modes cooperate. Because the computing capability of a UAV is limited, it often cannot meet the computing task offloading demands of the terminal devices.

Summary of the Invention

To address the problem that the limited computing capability of existing UAVs cannot meet the computing task offloading demands of terminal devices, the present invention proposes a joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system. The method considers not only the case where the UAV acts as an edge computing node that provides computing services to users, but also the case where the UAV acts as a communication relay node that forwards computing tasks to the edge computing server at the base station. User scheduling, computing task offloading selection, UAV flight parameters, and communication resource allocation are then jointly optimized, which effectively reduces the computing task processing delay of the ground terminal devices and improves the offloading efficiency of the UAV-assisted edge computing system.

To achieve the above object, the present invention provides the following technical solution:

A joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system, in which minimizing the delay of completing the processing of all terminal computing tasks is the optimization objective; the UAV transmission power, the task offloading variable, the UAV flight speed, the UAV flight angle, the UAV's own energy, the movement area of the UAV, and the movement area of the terminal devices are the constraints; and a deep deterministic policy gradient algorithm is used to solve the optimization objective.

Further, the optimization objective and constraints are expressed as follows:

C1: [formula not reproduced]

C2: [formula not reproduced]

C3: [formula not reproduced]

C4: 0 ≤ β(n) ≤ 2π

C5: [formula not reproduced]

C6: {x_u(n) ∈ [0, L], y_u(n) ∈ [0, W]}

C7: {x_m(n) ∈ [0, L], y_m(n) ∈ [0, W]}

where t_sum(n) is the total delay for completing the processing of the computing tasks within the time slots T [formula not reproduced]. The terms appearing in t_sum(n) are the transmission delay for the ground terminal device to offload its entire computing task to the UAV, the transmission delay for the UAV to offload part of the computing task to the base station, the UAV computing delay t_mu(n), and the time t_mb(n) required by the MEC server deployed at the base station to complete the computing task.

C1 constrains the UAV transmission power P_u(n), C2 constrains the task offloading variable, C3 constrains the UAV flight speed v_u(n), C4 constrains the UAV flight angle β(n), C5 constrains the UAV's own energy, and C6 and C7 restrict the movement areas of the UAV and the terminal devices, respectively. E_fly = φ||v_u(n)||² is the flight energy consumption of the UAV, where φ = 0.5·M_UAV·t_fly, M_UAV is the mass of the UAV, and t_fly is the flight time. E_mu(n) = γ_u(f_u)³t_mu(n) is the local computing energy consumption of the UAV, where f_u is the computing capability of the UAV and γ_u is the factor capturing the effect of the chip architecture on CPU processing. E^U is the battery energy carried by the UAV.

Further, the process of solving the optimization objective with the deep deterministic policy gradient algorithm is as follows:

The state space [formula not reproduced] comprises q(n), the position of the UAV; p_m(n), the position of the ground terminal device; D_m(n), the amount of remaining computing task data; and the remaining battery energy of the UAV.

The action space [formula not reproduced] comprises v_u(n), the speed of the UAV; β(n), the flight angle of the UAV; P_u(n), the transmission power of the UAV; and the task offloading variable.

The reward function is r_n = -t_sum(n), that is, the negative of the total delay.

A Q-table is used to record and update the state-action value Q(s, a), while critic and actor networks are used for updating, expressed as follows:

θ^Q' ← ρθ^Q + (1 - ρ)θ^Q'

θ^μ' ← ρθ^μ + (1 - ρ)θ^μ'

where θ^Q is the parameter of the critic neural network, ρ is a constant, and θ^μ is the parameter of the actor neural network.

Further, the critic network is updated as:

L(θ^Q) = E_μ'[(y_n - Q(s_n, a_n | θ^Q))²]

where E_μ' is the mean (expectation) operator, y_n is the Q target value with y_n = r_n + γQ(s_{n+1}, μ(s_{n+1}) | θ^Q), γ is the discount factor, and r_n is the reward function.

Further, the actor network is updated as:

[formula not reproduced]

where the left-hand side of the formula is the policy gradient, μ is the policy network, and N is the size of the batch randomly sampled from the experience pool.

Compared with the prior art, the present invention has the following beneficial effects:

The proposed joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system takes full account of changes in mobile device positions in a dynamic environment (where both the ground terminal devices and the UAV are moving) and of partial offloading of computing tasks. By jointly optimizing user scheduling and the choice of computing task offloading mode, the UAV motion parameters (flight angle and flight speed), and power allocation, it optimizes communication, computation, and motion together, which effectively reduces the computing task processing delay of the terminal devices and improves the offloading efficiency of the UAV-assisted edge computing system. In simulation experiments against DQN and other baseline algorithms, the DDPG algorithm shows a significant improvement in processing delay.

Brief Description of the Drawings

To explain the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present invention, and a person of ordinary skill in the art can derive other drawings from them.

Figure 1 is a model diagram of the UAV-assisted edge system.

Figure 2 is a system architecture diagram of the joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system provided in an embodiment of the present invention.

Figure 3 shows the comparison of DDPG and DQN in processing delay.

Figure 4 shows the comparison of DDPG with the Local baseline algorithm and the Offload baseline algorithm in processing delay.

Detailed Description

For a better understanding of the technical solution, the method of the present invention is described in detail below with reference to the drawings.

1) System model

Figure 1 shows the UAV-assisted MEC system, which includes one UAV, multiple ground terminal devices (MDs), and a base station. Time is divided into slots T = {1, 2, ..., N}. In time slot n, terminal device m is located at p_m(n) = [x_m(n), y_m(n), 0]^T and moves randomly at speed v_m; the position of the UAV is denoted q(n) [coordinate formula not reproduced]; and base station b is located at l_b(n) = [x_b(n), y_b(n), h]^T.

Since the UAV is deployed in the air, transmission between the UAV and the MDs is dominated by line-of-sight propagation. The channel gain between the UAV and an MD can be expressed as:

[formula not reproduced]

where g₀ is the channel gain at the reference distance of 1 m, and the remaining term is the distance between the UAV and the MD.

The channel gain between the UAV and the base station can be expressed as:

[formula not reproduced]

where the remaining term is the distance between the UAV and the base station.
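The two channel gain formulas are not legible in this text, so the following is only a minimal sketch under a stated assumption: it uses the common free-space LoS form g = g₀ / d², where d is the Euclidean distance between the two nodes. That form, the function names, and the sample coordinates are assumptions for illustration and are not taken from the patent.

```python
import math


def los_channel_gain(g0, pos_a, pos_b):
    """LoS channel gain g0 / d^2 between two 3-D points (assumed form).

    g0 is the channel gain at the 1 m reference distance, as in the text.
    """
    d = math.dist(pos_a, pos_b)  # Euclidean distance in metres
    if d <= 0.0:
        raise ValueError("the two nodes must not be co-located")
    return g0 / (d * d)


# Hypothetical positions: UAV at 10 m altitude, MD on the ground.
uav_pos = (3.0, 4.0, 10.0)
md_pos = (0.0, 0.0, 0.0)
gain = los_channel_gain(1e-3, uav_pos, md_pos)
```

The same helper can be applied to the UAV and base station pair by passing the base station coordinates l_b(n) as the second point.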

In time slot n, the computing task carried by MD m can be expressed as:

W_m(n) = {D_m(n), C_m(n), T_m(n)}

where D_m(n) is the data size of the computing task, C_m(n) is the number of CPU cycles required to process each bit of data, and T_m(n) is the maximum time allowed for processing the computing task.
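The task tuple W_m(n) maps naturally onto a small record type. The sketch below is illustrative only: the class and field names are hypothetical, and the total-cycles helper simply multiplies D_m(n) by C_m(n), which follows from their definitions as bits and cycles per bit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputingTask:
    """W_m(n) = {D_m(n), C_m(n), T_m(n)} for one MD in one time slot."""

    data_bits: float       # D_m(n): task data size in bits
    cycles_per_bit: float  # C_m(n): CPU cycles needed per bit
    max_delay_s: float     # T_m(n): maximum allowed processing time

    def total_cycles(self):
        """CPU cycles needed to process the whole task."""
        return self.data_bits * self.cycles_per_bit

    def meets_deadline(self, delay_s):
        """True if a candidate processing delay respects T_m(n)."""
        return delay_s <= self.max_delay_s


task = ComputingTask(data_bits=2e6, cycles_per_bit=1000.0, max_delay_s=1.5)
```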

2) Communication model

(1) Communication model between the MDs and the UAV

The transmission delay for an MD to offload its entire computing task to the UAV is the task data size D_m(n) divided by the transmission rate between the MD and the UAV [formula not reproduced].

The transmission rate is given by [formula not reproduced], where B_u is the available bandwidth for communication between the UAV and the MDs, P_m is the transmission power of the MDs, the noise term is white Gaussian noise, P_nlos is the non-line-of-sight transmission loss, and f(n) ∈ {0, 1} is a binary function: f(n) = 0 means there is no obstruction between the UAV and the MD, and f(n) = 1 means there is an obstruction.
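The rate formula is missing from this text. As a hedged illustration, the sketch below assumes a Shannon-type rate in which the NLoS loss P_nlos is added to the noise power whenever f(n) = 1, that is, R = B_u · log2(1 + P_m · g / (σ² + f(n) · P_nlos)). This expression is an assumption chosen to be consistent with the symbols defined above, not the patent's confirmed formula. The delay is the data size divided by the rate, as the text states.

```python
import math


def uplink_rate(bandwidth_hz, tx_power_w, gain, noise_w, blocked, p_nlos_w):
    """Assumed MD-to-UAV rate in bit/s; `blocked` is the binary f(n)."""
    if blocked not in (0, 1):
        raise ValueError("f(n) must be 0 or 1")
    # The NLoS loss only contributes when the link is obstructed.
    denominator = noise_w + blocked * p_nlos_w
    return bandwidth_hz * math.log2(1.0 + tx_power_w * gain / denominator)


def transmission_delay(data_bits, rate_bps):
    """Time to send data_bits at rate_bps (data size divided by rate)."""
    return data_bits / rate_bps


# Hypothetical numbers chosen so the unobstructed SNR equals 1.
rate_clear = uplink_rate(1e6, 1.0, 1e-9, 1e-9, 0, 1e-8)
rate_blocked = uplink_rate(1e6, 1.0, 1e-9, 1e-9, 1, 1e-8)
delay_clear = transmission_delay(2e6, rate_clear)
```

Under this assumed form an obstructed link always yields a lower rate, and therefore a longer offloading delay, than an unobstructed one with the same parameters.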

Therefore, the transmission energy consumption can be expressed as:

[formula not reproduced]

(2) Communication model between the UAV and the base station

The transmission delay for the UAV to offload part of the computing task to the base station is:

[formula not reproduced]

which depends on the task offloading variable and on the transmission rate between the UAV and the base station.

The transmission rate is given by [formula not reproduced], where B_k is the available bandwidth for communication between the UAV and the base station, and the remaining terms are the transmission power of the UAV, the maximum transmission power of the UAV, and white Gaussian noise.

Therefore, once the transmission delay between the UAV and the base station is known, the transmission energy consumption can be expressed as:

[formula not reproduced]

(3) UAV mobility model

In time slot n, the UAV flies from q(n) to a new position, which can be expressed as:

q(n+1) = [x_u(n) + l_b(n)cosβ(n), y_u(n) + l_b(n)sinβ(n)]

where l_b(n) = v_u(n)t_fly is the distance flown by the UAV, t_fly is the flight time, and β(n) is the flight angle of the UAV.
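The position update above translates directly into code. This is a minimal sketch of the stated formula in the horizontal plane; the function name and the sample values are hypothetical.

```python
import math


def next_uav_position(x_u, y_u, speed, angle_rad, t_fly):
    """Apply q(n+1) = [x_u + l*cos(beta), y_u + l*sin(beta)], l = v_u * t_fly."""
    flight_distance = speed * t_fly  # l_b(n) in the text
    return (
        x_u + flight_distance * math.cos(angle_rad),
        y_u + flight_distance * math.sin(angle_rad),
    )


# Fly 10 m along the x axis, then 10 m along the y axis.
x1, y1 = next_uav_position(10.0, 20.0, 5.0, 0.0, 2.0)
x2, y2 = next_uav_position(x1, y1, 5.0, math.pi / 2, 2.0)
```

A caller would still need to check the result against the area constraint C6 before accepting the move.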

Therefore, the flight energy consumption of the UAV can be expressed as:

E_fly = φ||v_u(n)||²

where φ = 0.5·M_UAV·t_fly and M_UAV is the mass of the UAV.
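As a quick worked example of the flight energy expression, the sketch below evaluates E_fly = φ·v² with φ = 0.5·M_UAV·t_fly. The numeric values are hypothetical and serve only to show the quadratic dependence on speed.

```python
def flight_energy(mass_kg, speed_mps, t_fly_s):
    """E_fly = phi * v^2 with phi = 0.5 * M_UAV * t_fly, as in the text."""
    phi = 0.5 * mass_kg * t_fly_s
    return phi * speed_mps ** 2


# Hypothetical 2 kg UAV flying for 1 s at 3 m/s and at 6 m/s.
energy_slow = flight_energy(2.0, 3.0, 1.0)
energy_fast = flight_energy(2.0, 6.0, 1.0)
```

Doubling the speed quadruples the flight energy, which is why the speed appears both in the action space and in the energy constraint C5.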

3) Computation model

First, the computing delay of the UAV is:

[formula not reproduced]

where f_u is the computing capability of the UAV, in CPU cycles per second.

The energy consumed by the UAV computing task is then:

E_mu(n) = γ_u(f_u)³t_mu(n)

where γ_u is the factor capturing the effect of the chip architecture on CPU processing.

In addition, the time required by the MEC server deployed at the base station to complete the computing task is:

[formula not reproduced]

where f_b is the computing capability of the base station, in CPU cycles per second.
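The two delay formulas are not legible here, so the sketch below rests on an explicit assumption: the task offloading variable is modeled as a fraction `offload_ratio` in [0, 1] of the task that the UAV forwards to the base station, giving t_mu = (1 - ratio)·D·C / f_u and t_mb = ratio·D·C / f_b. That split and the parameter names are hypothetical. The energy expression E_mu(n) = γ_u(f_u)³t_mu(n) is taken from the text.

```python
def split_compute_delays(data_bits, cycles_per_bit, offload_ratio, f_uav, f_bs):
    """Return (t_mu, t_mb) under the assumed fractional task split."""
    if not 0.0 <= offload_ratio <= 1.0:
        raise ValueError("offload_ratio must lie in [0, 1]")
    total_cycles = data_bits * cycles_per_bit
    t_mu = (1.0 - offload_ratio) * total_cycles / f_uav  # UAV share
    t_mb = offload_ratio * total_cycles / f_bs           # base station share
    return t_mu, t_mb


def uav_compute_energy(gamma_u, f_uav, t_mu):
    """E_mu(n) = gamma_u * f_u^3 * t_mu(n), as stated in the text."""
    return gamma_u * f_uav ** 3 * t_mu


# Hypothetical task: 1 Mbit at 1000 cycles/bit, half forwarded to the BS.
t_mu, t_mb = split_compute_delays(1e6, 1000.0, 0.5, 1e9, 2e9)
e_mu = uav_compute_energy(1e-27, 1e9, t_mu)
```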

As shown in Figure 2, in the UAV-assisted edge computing system, computing tasks are generated randomly by the terminals and transmitted to the UAV, after which they are handled in one of two ways: the UAV acts as an edge computing node and processes the offloaded task itself, or the UAV acts as a relay and completes the task jointly with the edge node (the base station).

The joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system proposed by the present invention is as follows.

Optimization objective:

Considering that the processing of the computing tasks is completed within the time slots T, the total delay t_sum(n) can be expressed as:

[formula not reproduced]

The present invention aims to minimize the delay of completing the processing of all terminal computing tasks. Jointly optimizing the selection of ground terminal devices and computing task offloading, the UAV motion parameters (flight angle and flight speed), and the transmission power, the joint optimization problem is modeled as follows:

C1: [formula not reproduced]

C2: [formula not reproduced]

C3: [formula not reproduced]

C4: 0 ≤ β(n) ≤ 2π

C5: [formula not reproduced]

C6: {x_u(n) ∈ [0, L], y_u(n) ∈ [0, W]}

C7: {x_m(n) ∈ [0, L], y_m(n) ∈ [0, W]}

where C1 constrains the UAV transmission power, C2 constrains the task offloading variable, C3 constrains the UAV flight speed, C4 constrains the UAV flight angle, C5 constrains the UAV's own energy, and C6 and C7 restrict the movement areas of the UAV and the terminal devices.
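One way a learning agent can respect box constraints of this kind is to project each raw action into the feasible range before applying it. The sketch below is only an illustration: the bounds `v_max` and `p_max`, and the treatment of the offloading variable as a value in [0, 1], are assumptions, since formulas C1 to C3 are not legible in this text. C4 and C6 are implemented as written.

```python
import math


def clip(value, low, high):
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(high, value))


def project_action(speed, angle, power, offload, v_max, p_max):
    """Map a raw action onto assumed bounds for C1-C3 and the stated C4."""
    return (
        clip(speed, 0.0, v_max),    # C3 (assumed 0 <= v_u <= v_max)
        angle % (2.0 * math.pi),    # C4: wrap the angle into [0, 2*pi)
        clip(power, 0.0, p_max),    # C1 (assumed 0 <= P_u <= p_max)
        clip(offload, 0.0, 1.0),    # C2 (assumed fractional variable)
    )


def inside_area(x, y, length, width):
    """C6/C7: the point must stay inside the L-by-W rectangle."""
    return 0.0 <= x <= length and 0.0 <= y <= width


action = project_action(60.0, -math.pi / 2, 0.2, 1.3, v_max=50.0, p_max=0.1)
```

The energy constraint C5 couples several terms and is not a simple box, so it would need a separate check that this sketch does not attempt.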

To solve the above optimization problem, the present invention transforms the non-convex optimization problem into a Markov decision process (MDP). Because the channel conditions and device states of the system are dynamic and time-varying, the deep deterministic policy gradient (DDPG) algorithm is used to solve the problem. This algorithm can effectively solve optimization problems with continuous action spaces.

The process of solving the optimization objective is as follows:

The state space [formula not reproduced] includes the remaining battery energy of the UAV.

The action space [formula not reproduced] includes the task offloading variable.

The reward function is r_n = -t_sum(n), that is, the negative of the total delay.
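The state and reward definitions can be sketched as plain functions. The state layout below (UAV position, then each device position, then each remaining data amount, then the remaining battery energy) is a hypothetical ordering based on the quantities named in the summary of the invention; the patent does not specify the ordering in this text. The reward follows r_n = -t_sum(n).

```python
def build_state(uav_xy, device_xys, remaining_bits, battery_j):
    """Flatten the observed quantities into one state vector (assumed layout)."""
    state = list(uav_xy)                 # q(n): UAV position
    for xy in device_xys:
        state.extend(xy)                 # p_m(n): position of each MD
    state.extend(remaining_bits)         # D_m(n): remaining data per MD
    state.append(battery_j)              # remaining UAV battery energy
    return state


def reward(total_delay_s):
    """r_n = -t_sum(n): a shorter total delay gives a larger reward."""
    return -total_delay_s


state = build_state((1.0, 2.0), [(3.0, 4.0), (5.0, 6.0)], [1e6, 2e6], 500.0)
```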

In traditional reinforcement learning, Q-learning is widely used. It uses a Q-table to record and update the state-action values Q(s, a). However, Q-learning has difficulty extracting and generalizing features from previous experience, and it performs poorly on high-dimensional problems. DDPG not only predicts Q values by training continuously on previous experience, but also uses critic and actor networks for updating. In the DDPG algorithm, the critic network is updated by minimizing the loss:

L(θ^Q) = E_μ'[(y_n - Q(s_n, a_n | θ^Q))²]
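The critic loss can be checked numerically with a few lines of plain Python. In this sketch the target follows y_n = r_n + γQ(s_{n+1}, μ(s_{n+1}) | θ^Q) as given in the summary of the invention, and the Q values are supplied as plain numbers (hypothetical inputs) rather than produced by a neural network.

```python
def td_targets(rewards, next_q_values, gamma):
    """y_n = r_n + gamma * Q(s_{n+1}, mu(s_{n+1})) for each sampled transition."""
    return [r + gamma * q_next for r, q_next in zip(rewards, next_q_values)]


def critic_loss(targets, q_values):
    """Mean squared error between targets y_n and current estimates Q(s_n, a_n)."""
    if len(targets) != len(q_values) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - q) ** 2 for y, q in zip(targets, q_values)]
    return sum(squared) / len(squared)


# Two hypothetical transitions; rewards are negative delays.
targets = td_targets([-1.0, -2.0], [-10.0, -20.0], gamma=0.9)
loss = critic_loss(targets, [-9.0, -22.0])
```

In a full implementation the loss would be minimized by gradient descent on θ^Q, a step this sketch leaves out.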

The actor network is updated as:

[formula not reproduced]

where the left-hand side of the formula is the policy gradient and μ is the policy network.

DDPG soft-updates the critic and actor target networks as follows:

θ^Q' ← ρθ^Q + (1 - ρ)θ^Q'

θ^μ' ← ρθ^μ + (1 - ρ)θ^μ'

where θ^Q is the parameter of the critic neural network, ρ is a constant, and θ^μ is the parameter of the actor neural network.
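The soft update rule is the same for both target networks, so a single helper covers it. The sketch below treats the parameters as flat lists of floats for illustration; a real implementation would apply the same rule to each tensor of network weights.

```python
def soft_update(target_params, online_params, rho):
    """Return rho * theta + (1 - rho) * theta' for each parameter pair."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have equal length")
    return [
        rho * theta + (1.0 - rho) * theta_target
        for theta, theta_target in zip(online_params, target_params)
    ]


# Hypothetical weights: the target moves 10% of the way to the online network.
new_target = soft_update([0.0, 0.0], [1.0, 2.0], rho=0.1)
```

A small ρ makes the target networks track the online networks slowly, which keeps the Q target y_n from shifting abruptly between updates.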

The DDPG algorithm is as follows:

Randomly initialize the critic and actor networks;

Construct the critic and actor target networks;

Initialize the experience replay pool;

For episode = 1, ..., M do:

Initialize a random noise process for exploration;

Obtain the initial state s_1;

For t = 1, ..., T do:

Select an action a_t according to the current policy and the exploration noise;

After the action is executed, the environment returns the immediate reward r_{t+1} and the new state s_{t+1};

Add the transition (s_t, a_t, r_{t+1}, s_{t+1}) to the experience pool;

Randomly sample a mini-batch of transitions (s_t, a_t, r_{t+1}, s_{t+1}) from the experience pool;

Compute the Q values and update the critic network parameters based on the mean squared loss;

Update the actor network;

Finally, soft-update the critic and actor target networks.
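The experience pool used in the steps above can be sketched with the standard library alone. The class below is a hypothetical minimal replay buffer, not the patent's implementation: it keeps the most recent transitions up to a fixed capacity and samples a mini-batch uniformly at random without replacement.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s_t, a_t, r_{t+1}, s_{t+1}) transitions."""

    def __init__(self, capacity, seed=None):
        self._items = deque(maxlen=capacity)  # oldest entries drop out first
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self._items.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Uniformly sample batch_size distinct stored transitions."""
        return self._rng.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    # Hypothetical scalar states and actions, only to exercise the buffer.
    buffer.add(step, 0.0, -1.0, step + 1)
batch = buffer.sample(2)
```

Sampling at random rather than in arrival order breaks the correlation between consecutive transitions, which is the usual reason for using a replay pool in DDPG-style training.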

While taking full account of changes in mobile device positions in a dynamic environment and of partial offloading of computing tasks, the present invention jointly optimizes user scheduling and the choice of computing task offloading mode, the UAV motion parameters (flight angle and flight speed), and power allocation. This effectively reduces the computing task processing delay of the terminal devices and improves the offloading efficiency of the UAV-assisted edge computing system. In simulation experiments against DQN and other baseline algorithms, the DDPG algorithm shows a significant improvement in processing delay. The comparison of DDPG and DQN (Figure 3) shows that DDPG improves processing delay by nearly 20% relative to DQN. As shown in Figure 4, DDPG improves processing delay by nearly 60% relative to the Local baseline algorithm (computing tasks are processed only locally) and by nearly 34% relative to the Offload baseline algorithm (all computing tasks are offloaded to the base station for edge computing).

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system, characterized in that minimizing the delay of completing the processing of all terminal computing tasks is taken as the optimization objective; the UAV transmission power, the task offloading variable, the UAV flight speed, the UAV flight angle, the UAV's own energy, the movement area of the UAV, and the movement area of the terminal devices are taken as constraints; and a deep deterministic policy gradient algorithm is used to solve the optimization objective.

2. The joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system according to claim 1, wherein the optimization objective and constraints are expressed as follows:

C1: [formula not reproduced]

C2: [formula not reproduced]

C3: [formula not reproduced]

C4: 0 ≤ β(n) ≤ 2π

C5: [formula not reproduced]

C6: {x_u(n) ∈ [0, L], y_u(n) ∈ [0, W]}

C7: {x_m(n) ∈ [0, L], y_m(n) ∈ [0, W]}

wherein t_sum(n) is the total delay for completing the processing of the computing tasks within the time slots T [formula not reproduced], whose terms are the transmission delay for the ground terminal device to offload its entire computing task to the UAV, the transmission delay for the UAV to offload part of the computing task to the base station, the UAV computing delay t_mu(n), and the time t_mb(n) required by the MEC server deployed at the base station to complete the computing task;

C1 is the constraint on the UAV transmission power P_u(n), C2 is the constraint on the task offloading variable, C3 is the constraint on the UAV flight speed v_u(n), C4 is the constraint on the UAV flight angle β(n), C5 is the constraint on the UAV's own energy, and C6 and C7 are the restrictions on the movement areas of the UAV and the terminal devices, respectively; E_fly = φ||v_u(n)||² is the flight energy consumption of the UAV, wherein φ = 0.5·M_UAV·t_fly, M_UAV is the mass of the UAV, and t_fly is the flight time; E_mu(n) = γ_u(f_u)³t_mu(n) is the local computing energy consumption of the UAV, f_u is the computing capability of the UAV, γ_u is the factor capturing the effect of the chip architecture on CPU processing, and E^U is the battery energy carried by the UAV.

3. The joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system according to claim 1, wherein the process of solving the optimization objective with the deep deterministic policy gradient algorithm is as follows:

the state space [formula not reproduced] comprises q(n), the position of the UAV; p_m(n), the position of the ground terminal device; D_m(n), the amount of remaining computing task data; and the remaining battery energy of the UAV;

the action space [formula not reproduced] comprises v_u(n), the speed of the UAV; β(n), the flight angle of the UAV; P_u(n), the transmission power of the UAV; and the task offloading variable;

the reward function is r_n = -t_sum(n), that is, the negative of the total delay;

a Q-table is used to record and update the state-action value Q(s, a), while critic and actor networks are used for updating, expressed as follows:

θ^Q' ← ρθ^Q + (1 - ρ)θ^Q'

θ^μ' ← ρθ^μ + (1 - ρ)θ^μ'

wherein θ^Q is the parameter of the critic neural network, ρ is a constant, and θ^μ is the parameter of the actor neural network.

4. The joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system according to claim 3, wherein the critic network is updated as:

L(θ^Q) = E_μ'[(y_n - Q(s_n, a_n | θ^Q))²]

wherein E_μ' is the mean (expectation) operator, y_n is the Q target value with y_n = r_n + γQ(s_{n+1}, μ(s_{n+1}) | θ^Q), γ is the discount factor, and r_n is the reward function.

5. The joint task scheduling and motion trajectory optimization method for a UAV-assisted MEC system according to claim 3, wherein the actor network is updated as:

[formula not reproduced]

wherein the left-hand side of the formula is the policy gradient, μ is the policy network, and N is the size of the batch randomly sampled from the experience pool.