CN117354759B

CN117354759B - Joint optimization method for task offloading and charging scheduling in multi-UAV-assisted MEC

Info

Publication number: CN117354759B
Application number: CN202311660129.0A
Authority: CN
Inventors: 梅芳; 吉非凡; 孙庚; 康辉; 刘雨晴
Original assignee: Jilin University
Current assignee: Jilin University
Priority date: 2023-12-06
Filing date: 2023-12-06
Publication date: 2024-03-19
Anticipated expiration: 2043-12-06
Also published as: CN117354759A

Abstract

The invention discloses a joint optimization method for task offloading and charging scheduling in multi-UAV-assisted MEC, relating to the technical field of mobile edge computing. The method comprises the following steps: 1. establishing a multi-UAV-assisted mobile and charging model; 2. determining the time delay and the total energy consumption with which the model completes tasks; 3. constructing an optimization objective; 4. modeling the optimization problem as a discrete-time Markov decision process; 5. obtaining the optimal task offloading and charging scheduling strategy using the P-TD3 algorithm; 6. cooperatively executing the computation-intensive tasks according to the computed optimal task offloading ratio of the UAVs, flight trajectory, charging schedule and optimal task offloading ratio of the user equipment. By deploying a charging station in the multi-UAV-assisted mobile edge computing system to charge the UAVs, the invention effectively reduces the total energy consumption of the system while maximizing the number of completed tasks; charging scheduling also resolves the problem of insufficient UAV energy and improves the overall quality of service.

Description

Joint optimization method for task offloading and charging scheduling in multi-UAV-assisted MEC

Technical Field

The invention relates to the technical field of mobile edge computing, and in particular to a joint optimization method for task offloading and charging scheduling in multi-UAV-assisted MEC.

Background

The advent of the 5G communications era has brought a substantial increase in the number of user devices and intelligent applications, and with it massive amounts of task data and an unprecedented demand for computing resources and low-latency services. In this context, mobile edge computing (MEC) offloads computation-intensive tasks to the edge of the wireless network, significantly reducing service delay and improving quality of service. Because of their flexible deployment, wide coverage and reliable wireless communication capabilities, unmanned aerial vehicles (UAVs) have been widely used to assist MEC systems: they fly above a target area and support user equipment on the ground in performing computation-intensive tasks.

However, in current research on multi-UAV-assisted mobile edge computing, most solutions do not jointly consider the task offloading policy and the scheduling of UAV charging at charging stations. UAVs are energy-limited: flight, communication and task execution all consume energy, and if a UAV stops operating during task execution, the quality of service is severely affected.

Therefore, charging stations should be deployed on the ground to provide the UAVs with a stable and continuous power supply, and the UAVs should in turn manage their energy effectively and schedule charging in good time.

Disclosure of Invention

The invention aims to design and develop a joint optimization method for task offloading and charging scheduling in multi-UAV-assisted MEC. The method establishes a multi-UAV-assisted mobile and charging model and uses the P-TD3 algorithm to obtain the optimal task offloading ratio of the UAVs, the flight trajectory, the charging schedule and the optimal task offloading ratio of the user equipment when computation-intensive tasks are executed cooperatively, thereby reducing energy consumption and resolving the problem of insufficient UAV energy.

The technical scheme provided by the invention is as follows:

A joint optimization method for task offloading and charging scheduling in multi-UAV-assisted MEC comprises the following steps:

step one, collecting geographical position information of the user equipment, the base station and the charging station, and establishing a multi-UAV-assisted mobile and charging model;

step two, determining the time delay and the total energy consumption with which the multi-UAV-assisted mobile and charging model completes tasks;

the time delay satisfies:

；

where $T_m(t)$ is the time delay for user equipment $m$ to complete its task in time slot $t$, $T_m^{\mathrm{loc}}(t)$ is the local computation delay of user equipment $m$ in time slot $t$, $T_{m,n}^{\mathrm{tr}}(t)$ is the data transmission delay between user equipment $m$ and UAV $n$ in time slot $t$, $T_n^{\mathrm{comp}}(t)$ is the computation delay of UAV $n$ processing tasks in time slot $t$, $T_n^{\mathrm{tr}}(t)$ is the data transmission delay of UAV $n$ in time slot $t$, and $T_{\mathrm{BS}}^{\mathrm{comp}}(t)$ is the computation delay of the base station processing tasks in time slot $t$;

the total energy consumption satisfies:

；

where $E(t)$ is the total energy consumption of the system in time slot $t$, $E_m^{\mathrm{loc}}(t)$ is the local computation energy consumption of user equipment $m$ in time slot $t$, $E_{m,n}^{\mathrm{tr}}(t)$ is the data transmission energy consumption between user equipment $m$ and UAV $n$ in time slot $t$, $E_n^{\mathrm{comp}}(t)$ is the computation energy consumption of UAV $n$ processing tasks in time slot $t$, $E_n^{\mathrm{tr}}(t)$ is the data transmission energy consumption of UAV $n$ in time slot $t$, and $E_{\mathrm{BS}}^{\mathrm{comp}}(t)$ is the computation energy consumption of the base station processing tasks in time slot $t$;

step three, constructing an optimization target as follows:

；

where $\Theta=\{\theta_n(t)\}$ is the set of horizontal flight angles of the UAVs, $\mathcal{D}=\{d_n(t)\}$ is the set of horizontal flight distances of the UAVs, $\mathcal{C}=\{c_n(t)\}$ is the set of UAV states, $\Lambda=\{\beta_n(t),\alpha_m(t)\}$ is the set of UAV offloading ratios and user equipment offloading ratios, $t$ is the time slot, $\omega_1$ is the first weight factor, $\omega_2$ is the second weight factor, and $K(t)$ is the number of tasks completed in time slot $t$;

step four, modeling the optimization objective as a discrete-time Markov decision process, and obtaining the optimal task offloading ratio of the UAVs, the flight trajectory, the charging schedule and the optimal task offloading ratio of the user equipment according to the P-TD3 algorithm.

Preferably, the multi-UAV-assisted mobile and charging model consists of $M$ user equipments, $N$ UAVs each equipped with a mobile edge computing server, a base station equipped with an MEC server, and a fixed charging station.

Preferably, the local computation delay of user equipment $m$ in time slot $t$ satisfies:

；

where $\alpha_m(t)$ is the proportion of the task that user equipment $m$ offloads to the UAV in time slot $t$, $1-\alpha_m(t)$ is the proportion of the task that user equipment $m$ computes locally in time slot $t$, $f_m(t)$ is the CPU frequency of user equipment $m$ in time slot $t$, $D_m(t)$ is the size of the task data generated by user equipment $m$ in time slot $t$, and $C_m(t)$ is the total number of CPU cycles required for the local computation of user equipment $m$ in time slot $t$;

the local computation energy consumption of user equipment $m$ in time slot $t$ satisfies:

；

where $\kappa$ is the effective capacitance coefficient of user equipment $m$.

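Preferably, the data transmission delay between user equipment $m$ and UAV $n$ in time slot $t$ satisfies:

$$T_{m,n}^{\mathrm{tr}}(t)=\frac{\alpha_m(t)\,D_m(t)}{R_{m,n}(t)}；$$

where $\alpha_m(t)$ is the proportion of the task that user equipment $m$ offloads to the UAV in time slot $t$, $D_m(t)$ is the size of the task data generated by user equipment $m$ in time slot $t$, and $R_{m,n}(t)$ is the data transmission rate between user equipment $m$ and UAV $n$ in time slot $t$, which satisfies:

$$R_{m,n}(t)=\frac{B}{U_n(t)}\log_2\!\left(1+\frac{p_m\,g_{m,n}(t)}{\sigma_{\mathrm{u}}^2}\right)；$$

where $B$ is the uplink bandwidth, $U_n(t)$ is the number of user equipments offloading tasks to UAV $n$ in time slot $t$, $p_m$ is the transmit power of user equipment $m$, $\sigma_{\mathrm{u}}^2$ is the additive white Gaussian noise power at the UAV, and $g_{m,n}(t)$ is the channel gain between user equipment $m$ and UAV $n$ in time slot $t$, which satisfies:

$$g_{m,n}(t)=\frac{\rho_0}{d_{m,n}^2(t)}；$$

where $\rho_0$ is the power gain at a reference distance of 1 meter and $d_{m,n}(t)$ is the distance between user equipment $m$ and UAV $n$ in time slot $t$.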

Preferably, the data transmission energy consumption between user equipment $m$ and UAV $n$ in time slot $t$ satisfies:

；

in the method, in the process of the invention,is unmanned plane->Is provided.

Preferably, the computation delay of UAV $n$ processing tasks in time slot $t$ satisfies:

；

where $f_n(t)$ is the CPU frequency that UAV $n$ uses for computation in time slot $t$, $\beta_n(t)$ is the proportion of the task that UAV $n$ offloads to the base station in time slot $t$, and $1-\beta_n(t)$ is the proportion of the task that UAV $n$ computes locally in time slot $t$;

the computation energy consumption of UAV $n$ processing tasks in time slot $t$ satisfies:

。

Preferably, the data transmission delay of UAV $n$ in time slot $t$ satisfies:

；

where $R_n(t)$ is the data transmission rate between UAV $n$ and the base station in time slot $t$, which satisfies:

$$R_n(t)=B_{\mathrm{BS}}\log_2\!\left(1+\frac{p_n\,g_{n,\mathrm{BS}}(t)}{\sigma_{\mathrm{BS}}^2}\right)；$$

where $B_{\mathrm{BS}}$ is the bandwidth pre-allocated to the base station, $p_n$ is the transmit power of UAV $n$, $\sigma_{\mathrm{BS}}^2$ is the additive white Gaussian noise power at the base station, and $g_{n,\mathrm{BS}}(t)$ is the channel gain between UAV $n$ and the base station in time slot $t$, which satisfies:

$$g_{n,\mathrm{BS}}(t)=\frac{\rho_0}{d_{n,\mathrm{BS}}^2(t)}；$$

where $\rho_0$ is the power gain at a reference distance of 1 meter and $d_{n,\mathrm{BS}}(t)$ is the distance between UAV $n$ and the base station in time slot $t$;

the data transmission energy consumption of UAV $n$ in time slot $t$ satisfies:

。

Preferably, the computation delay of the base station processing tasks in time slot $t$ satisfies:

；

where $f_{\mathrm{BS}}$ is the CPU frequency of the base station;

the computation energy consumption of the base station processing tasks in time slot $t$ satisfies:

。

Preferably, modeling the optimization objective as a discrete-time Markov decision process specifically includes:

the state space is: $s(t)=\{x_n(t),\,y_n(t),\,b_n(t)\}$；

the action space is: $a(t)=\{\theta_n(t),\,d_n(t),\,c_n(t),\,\beta_n(t),\,\alpha_m(t)\}$；

the reward function is: $r(t)=\omega_1 K(t)-\omega_2 E(t)$；

where $s(t)$ is the state space of time slot $t$. A plane rectangular coordinate system is established on the ground with the base station as the origin: the north-south direction through the base station is the $y$ axis, with north as the positive direction, and the east-west direction through the base station is the $x$ axis, with east as the positive direction. $x_n(t)$ is the abscissa of UAV $n$ in time slot $t$, $y_n(t)$ is the ordinate of UAV $n$ in time slot $t$, and $b_n(t)$ is the remaining battery of UAV $n$ in time slot $t$. $a(t)$ is the action space of time slot $t$, $\theta_n(t)$ is the horizontal angle of UAV $n$ in time slot $t$, $d_n(t)$ is the horizontal distance of UAV $n$ in time slot $t$, $c_n(t)$ is the state of UAV $n$ in time slot $t$, $\beta_n(t)$ is the offloading ratio of UAV $n$ in time slot $t$, and $\alpha_m(t)$ is the offloading ratio of user equipment $m$ in time slot $t$. $r(t)$ is the reward function of time slot $t$, $\omega_1$ and $\omega_2$ are the first and second weight factors, $K(t)$ is the number of tasks completed in time slot $t$, and $E(t)$ is the total energy consumption of the system in time slot $t$. Further, $\theta_n(t)\in[0,2\pi]$ and $d_n(t)\in[0,d_{\max}]$, where $d_{\max}$ is the maximum distance a UAV can fly horizontally in one time slot; $c_n(t)\in\{0,1\}$, where $c_n(t)=1$ indicates that UAV $n$ is charging in time slot $t$ and $c_n(t)=0$ indicates that UAV $n$ is executing tasks in time slot $t$; $\beta_n(t)\in[0,1]$ and $\alpha_m(t)\in[0,1]$.

Preferably, the P-TD3 algorithm specifically includes:

Step 1, initializing an Actor network $\pi_{\phi}$, a first Critic network $Q_{w_1}$ and a second Critic network $Q_{w_2}$, where the parameters of the Actor network are $\phi$, the parameters of the first Critic network are $w_1$ and the parameters of the second Critic network are $w_2$; then initializing a target Actor network $\pi_{\phi'}$, a first target Critic network $Q_{w_1'}$ and a second target Critic network $Q_{w_2'}$, where the parameters of the target Actor network are $\phi'$, the parameters of the first target Critic network are $w_1'$ and the parameters of the second target Critic network are $w_2'$; copying the parameters of the three networks to the parameters of the three target networks respectively, and at the same time initializing an experience replay buffer and Gaussian noise;

wherein the experience replay buffer has a capacity of 10000 experiences;

Step 2, initializing the time slot $t$, with 20 training time slots per round, and the state space $s(t)$ of time slot $t$, in which the abscissa $x_n(t)$ and ordinate $y_n(t)$ of UAV $n$ in time slot $t$ take random values within the specified range and the remaining battery $b_n(t)$ is set to the initial battery;

Step 3, replacing the hybrid action space $a(t)$ of time slot $t$ with the continuous action space $\tilde a(t)$ of time slot $t$, in which the discrete state $c_n(t)$ of each UAV is replaced by two continuous actions, where $e_n(t)$ is the task execution action of UAV $n$ in time slot $t$ and $h_n(t)$ is the charging action of UAV $n$ in time slot $t$; inputting the state space $s(t)$ of time slot $t$ into the target Actor network to generate the current continuous action space $\tilde a(t)$, adding noise, and then converting the continuous action space back into a hybrid action space;

wherein the discrete state $c_n(t)$ of each UAV is determined from $e_n(t)$ and $h_n(t)$: in one case $c_n(t)=0$, and in the other case $c_n(t)=1$;

Step 4, the environment executes the obtained hybrid action space, yielding the corresponding reward function $r(t)$ and the state space $s(t+1)$ of the next time slot;

Step 5, storing the current state space $s(t)$, the continuous action space $\tilde a(t)$, the reward function $r(t)$ and the state space $s(t+1)$ of the next time slot in the experience replay buffer;

Step 6, if the experience replay buffer stores more than 128 experiences, randomly sampling 128 experiences from the experience replay buffer; for each sample, using the first target Critic network and the second target Critic network to compute an initial next-state Q value each, selecting the smaller of the two as the next-state Q value actually used, and updating the parameters $w_1$, $w_2$ of the two Critic networks by gradient descent;

Step 7, every 2 updates of the two Critic networks, updating the parameters $\phi$ of the Actor network once by gradient descent;

Step 8, every 4 updates of the two Critic networks, updating once, using a soft update strategy, the parameters $\phi'$ of the target Actor network, the parameters $w_1'$ of the first target Critic network and the parameters $w_2'$ of the second target Critic network; the time slot is then complete;

Step 9, if the current round has not yet completed its 20 time slots, jumping directly to step 3;

if the current round is complete and the number of training rounds has not reached its maximum, jumping to step 2;

if the current round is complete and the maximum number of training rounds has been reached, the algorithm ends.

The beneficial effects of the invention are as follows:

The joint optimization method for task offloading and charging scheduling in multi-UAV-assisted MEC addresses the problem of limited computing capacity and energy. A user can transmit tasks to a UAV carrying an edge server for computation, and the UAV can in turn forward tasks offloaded to it to a remote base station for computation. With maximizing the number of completed tasks and minimizing the total energy consumption of the system as the goal, the P-TD3 algorithm, an improvement on the twin delayed deep deterministic policy gradient (TD3) algorithm, jointly optimizes the task offloading ratio of the UAVs, the flight trajectory, the charging schedule and the task offloading ratio of the user equipment. This effectively reduces the total energy consumption of the system while maximizing the number of completed tasks; efficient charging scheduling also resolves the problem of insufficient UAV energy and improves the overall quality of service.

Drawings

Fig. 1 is a flow chart of the joint optimization method for task offloading and charging scheduling in multi-UAV-assisted MEC according to the invention.

Fig. 2 is a schematic diagram of the system structure of the multi-UAV-assisted MEC according to the invention.

Detailed Description

The invention is described in further detail below so that those skilled in the art can practice it by reference to the specification.

As shown in Fig. 1, the joint optimization method for task offloading and charging scheduling in multi-UAV-assisted MEC provided by the invention comprises the following steps:

Step one, as shown in Fig. 2, geographical position information of the user equipment, the base station and the charging station is collected, and a multi-UAV-assisted mobile and charging model is established:

The model consists of $M$ user equipments, $N$ UAVs each equipped with a mobile edge computing (MEC) server, a base station equipped with an MEC server, and a fixed charging station (CS), with user equipment index $m\in\{1,\dots,M\}$ and UAV index $n\in\{1,\dots,N\}$. The task duration is divided into $T$ time slots of length $\tau$, and $\mathcal{T}$ denotes the set of time slots. Each user equipment is assumed to process one computation task $(D_m(t),\,C_m(t))$ per time slot, where $D_m(t)$ is the size of the task data and $C_m(t)$ is the total number of CPU cycles required to execute the task. Each task can be executed by the user equipment itself, or part of it can be offloaded to the nearest UAV, which can further offload part of it to the base station for processing. The remaining battery of each UAV is denoted $b_n(t)$; its value at the start of the task is $b_{\max}$, and $b_n(t)=0$ means that the UAV's battery is exhausted and it cannot continue to operate.

To prevent the UAVs from running out of battery during operation, the CS supplies energy to the UAVs so as to ensure service stability. The sensing range of the CS is assumed to be $r_{\mathrm{CS}}$, i.e., a UAV can stay in any area within $r_{\mathrm{CS}}$ of the CS. The state of UAV $n$ is defined as a binary variable $c_n(t)$ indicating whether UAV $n$ charges in time slot $t$: $c_n(t)=1$ means that UAV $n$ is charging, and $c_n(t)=0$ means that UAV $n$ is executing tasks. The charging scheduling decision takes effect only when the distance between the UAV and the CS is less than or equal to $r_{\mathrm{CS}}$; after charging, the remaining battery $b_n(t)$ of UAV $n$ is restored to the initial battery $b_{\max}$.
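The charging rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `apply_charging`, its parameters, and the use of the horizontal distance to the CS are all introduced here for illustration.

```python
import math


def apply_charging(uav_xy, cs_xy, charge_flag, battery, full_battery, cs_range):
    """Return the UAV's remaining battery after the charging decision.

    charge_flag is the binary UAV state (1 = charge, 0 = execute tasks).
    The decision only takes effect when the UAV is within cs_range of the
    charging station; the horizontal distance is used here (an assumption).
    """
    distance = math.hypot(uav_xy[0] - cs_xy[0], uav_xy[1] - cs_xy[1])
    if charge_flag == 1 and distance <= cs_range:
        # Charging restores the battery to its initial level.
        return full_battery
    # Out of range, or the UAV chose to execute tasks: battery unchanged here.
    return battery
```

A UAV that requests charging while out of range keeps its current battery level, which is how the sketch models a charging decision that does not take effect.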

The user equipment is fixed on the ground, and the flight height of the UAVs is fixed at $H$. In time slot $t$, the flight action of each UAV is determined by the horizontal angle $\theta_n(t)$ and the horizontal distance $d_n(t)\in[0,d_{\max}]$, and must not take the UAV beyond the boundary of the target area, where $d_{\max}$ is the maximum distance a UAV can fly horizontally in one time slot.

A spatial rectangular coordinate system is established with the base station as the origin: the north-south direction through the base station is the $y$ axis, with north as the positive direction; the east-west direction through the base station is the $x$ axis, with east as the positive direction; and the direction perpendicular to the ground is the $z$ axis, with upward as the positive direction. Let the coordinates of UAV $n$ in time slot $t$ be $(x_n(t),\,y_n(t),\,H)$; the coordinates of UAV $n$ in time slot $t+1$ can then be calculated as:

$$x_n(t+1)=x_n(t)+d_n(t)\cos\theta_n(t),\qquad y_n(t+1)=y_n(t)+d_n(t)\sin\theta_n(t)；$$
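The position update of a UAV from one time slot to the next can be illustrated with a short Python sketch. It assumes that the horizontal angle is measured in radians counter-clockwise from the positive x (east) axis and that a move leaving the target area is clamped to the boundary; both conventions, as well as the name `next_position` and the `area` tuple, are assumptions made for illustration.

```python
import math


def next_position(x, y, angle, distance, max_distance, area):
    """Return the UAV's horizontal coordinates in the next time slot.

    area is a hypothetical (x_min, x_max, y_min, y_max) bounding box of the
    target area. The flight height is fixed, so only x and y change.
    """
    # The horizontal distance flown in one slot is limited to [0, max_distance].
    distance = min(max(distance, 0.0), max_distance)
    new_x = x + distance * math.cos(angle)
    new_y = y + distance * math.sin(angle)
    x_min, x_max, y_min, y_max = area
    # Keep the UAV inside the target area (clamping is an assumed policy).
    new_x = min(max(new_x, x_min), x_max)
    new_y = min(max(new_y, y_min), y_max)
    return new_x, new_y
```

Clamping is only one way to respect the area boundary; rejecting the move or penalizing it in the reward would be equally compatible with the description.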

(1) User equipment calculates:

Assuming that each user equipment can perform local computation and task offloading simultaneously, a partial offloading policy is adopted in the MEC system for the user equipment tasks of each time slot. Let $\alpha_m(t)$ be the proportion of the task that user equipment $m$ offloads to the UAV in time slot $t$, and $1-\alpha_m(t)$ the proportion of the task that user equipment $m$ computes locally in time slot $t$. The local computation delay of user equipment $m$ in time slot $t$ can be expressed as:

；

where $f_m(t)$ is the CPU frequency of user equipment $m$ in time slot $t$, $D_m(t)$ is the size of the task data generated by user equipment $m$ in time slot $t$, and $C_m(t)$ is the total number of CPU cycles required to execute the task;

thus, the local computation energy consumption of user equipment $m$ in time slot $t$ can be expressed as:

；

where $\kappa$ is the effective capacitance coefficient of user equipment $m$, which depends on the chip architecture of its processor.
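For orientation, the local computation cost can be sketched in Python using a model that is common in the MEC literature: the delay is the locally processed CPU cycles divided by the CPU frequency, and the energy is the effective capacitance coefficient times the squared frequency times the locally processed cycles. These two expressions are assumptions, since the equations of the source did not survive extraction, and the function `local_cost` is hypothetical.

```python
def local_cost(offload_ratio, cpu_cycles, cpu_freq_hz, kappa):
    """Delay (s) and energy (J) of the locally computed share of one task.

    Assumed model (not taken from the source text):
      delay  = (1 - offload_ratio) * cpu_cycles / cpu_freq_hz
      energy = kappa * cpu_freq_hz**2 * (1 - offload_ratio) * cpu_cycles
    """
    if not 0.0 <= offload_ratio <= 1.0:
        raise ValueError("offload_ratio must lie in [0, 1]")
    local_cycles = (1.0 - offload_ratio) * cpu_cycles
    delay = local_cycles / cpu_freq_hz
    energy = kappa * cpu_freq_hz ** 2 * local_cycles
    return delay, energy
```

Under this model, offloading a larger share of the task lowers both the local delay and the local energy linearly, which is what makes the offloading ratio a useful decision variable.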

(2) The user equipment offloads to the drone:

The user equipment may offload part of its data to the nearest UAV for computation. The channel gain $g_{m,n}(t)$ between user equipment $m$ and UAV $n$ in time slot $t$ can be defined as:

$$g_{m,n}(t)=\frac{\rho_0}{d_{m,n}^2(t)}；$$

where $\rho_0$ is the power gain at a reference distance of 1 meter and $d_{m,n}(t)$ is the distance between user equipment $m$ and UAV $n$ in time slot $t$;

in the task offloading process, the uplink bandwidth $B$ is assumed to be equally allocated to each user equipment, and the data transmission rate between user equipment $m$ and UAV $n$ in time slot $t$ is:

$$R_{m,n}(t)=\frac{B}{U_n(t)}\log_2\!\left(1+\frac{p_m\,g_{m,n}(t)}{\sigma_{\mathrm{u}}^2}\right)；$$

where $U_n(t)$ is the number of user equipments offloading tasks to UAV $n$ in time slot $t$, $p_m$ is the transmit power of user equipment $m$, and $\sigma_{\mathrm{u}}^2$ is the additive white Gaussian noise power at the UAV;

the data transmission delay between user equipment $m$ and UAV $n$ in time slot $t$ can be defined as:

$$T_{m,n}^{\mathrm{tr}}(t)=\frac{\alpha_m(t)\,D_m(t)}{R_{m,n}(t)}；$$

where $\alpha_m(t)$ is the proportion of the task that user equipment $m$ offloads to the UAV in time slot $t$ and $D_m(t)$ is the size of the task data generated by user equipment $m$ in time slot $t$;

similarly, the data transmission energy consumption between user equipment $m$ and UAV $n$ in time slot $t$ can be defined as:

；

in the method, in the process of the invention,is unmanned plane->Is provided.
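The user-equipment-to-UAV link can be sketched in Python as follows, assuming a free-space channel gain (reference gain divided by squared distance), a Shannon-type rate on an equal share of the uplink bandwidth, and a transmission energy equal to transmit power times airtime. The function `uplink` and its parameter names are hypothetical, and the energy expression of the source, which also involves a UAV-side quantity that did not survive extraction, is deliberately not reproduced.

```python
import math


def uplink(offload_ratio, data_bits, bandwidth_hz, n_sharing, tx_power_w,
           ref_gain, distance_m, noise_w):
    """Rate (bit/s), delay (s) and transmit energy (J) of one offloading link.

    n_sharing is the number of user equipments sharing the uplink bandwidth
    of the same UAV in this time slot.
    """
    # Channel gain: power gain at 1 m divided by the squared distance.
    gain = ref_gain / distance_m ** 2
    # Shannon-type rate on an equal share of the uplink bandwidth.
    rate = (bandwidth_hz / n_sharing) * math.log2(1.0 + tx_power_w * gain / noise_w)
    # Only the offloaded share of the task data is transmitted.
    delay = offload_ratio * data_bits / rate
    # Assumed energy model: transmit power multiplied by airtime.
    energy = tx_power_w * delay
    return rate, delay, energy
```

Because the bandwidth is split equally, the rate of each user equipment drops as more devices offload to the same UAV in the same time slot.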

(3) Unmanned aerial vehicle calculates:

After receiving the input data of the user equipment, each UAV can perform local computation and task offloading simultaneously. Let $\beta_n(t)$ be the proportion of the task that UAV $n$ offloads to the base station in time slot $t$, and $1-\beta_n(t)$ the proportion of the task that UAV $n$ computes locally in time slot $t$. The computing resource $F_n$ of each UAV is equally allocated to the user equipments it serves, i.e. $f_n(t)=F_n/U_n(t)$, where $U_n(t)$ is the number of user equipments offloading tasks to UAV $n$ in time slot $t$;

the computation delay and energy consumption of UAV $n$ processing tasks in time slot $t$ are:

；

(4) The drone offloads to the base station:

Considering that the UAV may offload part of the task to the base station for further computation, the channel gain $g_{n,\mathrm{BS}}(t)$ between UAV $n$ and the base station in time slot $t$ can be defined as:

$$g_{n,\mathrm{BS}}(t)=\frac{\rho_0}{d_{n,\mathrm{BS}}^2(t)}；$$

where $\rho_0$ is the power gain at a reference distance of 1 meter and $d_{n,\mathrm{BS}}(t)$ is the distance between UAV $n$ and the base station in time slot $t$.

The data transmission rate between UAV $n$ and the base station in time slot $t$ is then:

$$R_n(t)=B_{\mathrm{BS}}\log_2\!\left(1+\frac{p_n\,g_{n,\mathrm{BS}}(t)}{\sigma_{\mathrm{BS}}^2}\right)；$$

where $B_{\mathrm{BS}}$ is the bandwidth pre-allocated to the base station, $p_n$ is the transmit power of UAV $n$, and $\sigma_{\mathrm{BS}}^2$ is the additive white Gaussian noise power at the base station.

The data transmission delay between UAV $n$ and the base station in time slot $t$ can then be defined as:

；

likewise, the data transmission energy consumption between UAV $n$ and the base station in time slot $t$ is:

；

the remaining battery $b_n(t)$ of UAV $n$ is then updated to reflect the energy consumed;

(5) And (3) calculating by a base station:

The computation delay and energy consumption of the base station processing tasks in time slot $t$ are:

；

where $f_{\mathrm{BS}}$ is the CPU frequency of the base station.

In summary, the time delay $T_m(t)$ of the task of user equipment $m$ in time slot $t$ and the total energy consumption $E(t)$ of the system in time slot $t$ can be expressed as:

；

step three, constructing an optimization target:

The invention takes maximizing the number of completed tasks and minimizing the total energy consumption of the system as the optimization objective, and jointly optimizes the task offloading ratio of the UAVs, the flight trajectory, the charging schedule and the task offloading ratio of the user equipment. Specifically, the joint optimization problem can be expressed as:

；

where $\omega_1$ is the first weight factor and $\omega_2$ is the second weight factor; preferably, $\omega_1$ and $\omega_2$ both take the value 1. $K(t)$ is the number of tasks completed in time slot $t$: if the time delay $T_m(t)$ of user equipment $m$ is less than or equal to the time slot length $\tau$, i.e. $T_m(t)\le\tau$, the task of user equipment $m$ is successfully completed in time slot $t$ and $K(t)$ increases by 1.
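The counting rule for completed tasks, and the way it can be traded off against energy, can be sketched in Python. The weighted difference used here (weight times completed tasks minus weight times total energy) is an assumed form consistent with the stated goal of maximizing completed tasks while minimizing energy; the function `slot_reward` and its defaults are hypothetical.

```python
def slot_reward(task_delays, slot_length, total_energy, w1=1.0, w2=1.0):
    """Return (completed_tasks, reward) for one time slot.

    A task counts as completed when its total delay does not exceed the
    slot length. The reward form w1 * completed - w2 * energy is an
    assumption made for this sketch.
    """
    completed = sum(1 for delay in task_delays if delay <= slot_length)
    reward = w1 * completed - w2 * total_energy
    return completed, reward
```

For example, with three tasks of delay 0.5 s, 1.0 s and 1.5 s and a 1 s slot, two tasks count as completed; the third misses the slot deadline and contributes only its energy cost.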

Step four, modeling the optimization objective as a discrete-time Markov decision process, and obtaining the optimal task offloading ratio of the UAVs, the flight trajectory, the charging schedule and the optimal task offloading ratio of the user equipment according to the P-TD3 algorithm;

the decision process specifically includes:

State space: comprises the coordinates and remaining battery of each UAV in time slot $t$, expressed as $s(t)=\{x_n(t),\,y_n(t),\,b_n(t)\}$；

Action space: comprises the horizontal flight distance, horizontal direction angle, offloading ratio and charging decision of each UAV and the offloading ratio of each user equipment in time slot $t$, expressed as $a(t)=\{\theta_n(t),\,d_n(t),\,c_n(t),\,\beta_n(t),\,\alpha_m(t)\}$；

Reward function: according to the optimization objective, the reward function of time slot $t$ is expressed as $r(t)=\omega_1 K(t)-\omega_2 E(t)$；

where $s(t)$ is the state space of time slot $t$, $x_n(t)$ is the abscissa of UAV $n$ in time slot $t$, $y_n(t)$ is the ordinate of UAV $n$ in time slot $t$, and $b_n(t)$ is the remaining battery of UAV $n$ in time slot $t$; $a(t)$ is the action space of time slot $t$, $\theta_n(t)$ is the horizontal angle of UAV $n$ in time slot $t$, $d_n(t)$ is the horizontal distance of UAV $n$ in time slot $t$, $c_n(t)$ is the state of UAV $n$ in time slot $t$, $\beta_n(t)$ is the offloading ratio of UAV $n$ in time slot $t$, and $\alpha_m(t)$ is the offloading ratio of user equipment $m$ in time slot $t$; $r(t)$ is the reward function of time slot $t$, $\omega_1$ is the first weight factor, $\omega_2$ is the second weight factor, $K(t)$ is the number of tasks completed in time slot $t$, and $E(t)$ is the total energy consumption of the system in time slot $t$. Further, $\theta_n(t)\in[0,2\pi]$ and $d_n(t)\in[0,d_{\max}]$, where $d_{\max}$ is the maximum distance a UAV can fly horizontally in one time slot; $c_n(t)\in\{0,1\}$, where $c_n(t)=1$ indicates that UAV $n$ is charging in time slot $t$ and $c_n(t)=0$ indicates that UAV $n$ is executing tasks in time slot $t$; $\beta_n(t)\in[0,1]$ and $\alpha_m(t)\in[0,1]$.

The original TD3 algorithm is applicable to continuous action spaces, but $a(t)$ contains the discrete action $c_n(t)$ as well as the continuous actions $\theta_n(t)$, $d_n(t)$, $\beta_n(t)$ and $\alpha_m(t)$, and is therefore a hybrid action space. To handle the discrete action, the original TD3 algorithm is improved to obtain the P-TD3 algorithm, which specifically comprises:

(1) Initializing parameters:

initializing an Actor network $\pi_{\phi}$ with parameters $\phi$;

initializing two Critic networks $Q_{w_1}$, $Q_{w_2}$ with parameters $w_1$, $w_2$;

initializing the target Actor network $\pi_{\phi'}$ of the Actor network, with parameters $\phi'$;

initializing the target Critic networks $Q_{w_1'}$, $Q_{w_2'}$ of the two Critic networks, with parameters $w_1'$, $w_2'$;

copying the parameters of the three networks $\pi_{\phi}$, $Q_{w_1}$, $Q_{w_2}$ to their corresponding target networks, i.e. $\phi'\leftarrow\phi$, $w_1'\leftarrow w_1$, $w_2'\leftarrow w_2$; at the same time initializing an empty experience replay buffer with a capacity of 10000 experiences, the Gaussian noise, and the training round counter episode, where each round includes 20 time slots;

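(2) Initializing the time slot $t$ and the state space $s(t)$:

the time slot $t$ is reset to its initial value, and in the state space $s(t)$ of time slot $t$ the abscissa $x_n(t)$ and ordinate $y_n(t)$ of UAV $n$ take random values within the specified range and the remaining battery $b_n(t)$ is set to the initial battery $b_{\max}$;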

(3) Selecting:

To handle the discrete action, the discrete action $c_n(t)$ in the hybrid action space $a(t)$ is replaced by two continuous actions $e_n(t)$ and $h_n(t)$, where $e_n(t)$ is the task execution action of UAV $n$ in time slot $t$ and $h_n(t)$ is the charging action of UAV $n$ in time slot $t$, giving the continuous action space $\tilde a(t)=\{\theta_n(t),\,d_n(t),\,e_n(t),\,h_n(t),\,\beta_n(t),\,\alpha_m(t)\}$. The action is generated by feeding the state space $s(t)$ of the current time slot into the Actor network, and noise is added to the action to increase exploration. The value of the discrete action $c_n(t)$ is then determined from $e_n(t)$ and $h_n(t)$:

in the first case, UAV $n$ executes tasks in time slot $t$, i.e. $c_n(t)=0$;

in the second case, UAV $n$ charges in time slot $t$, i.e. $c_n(t)=1$;

having obtained the discrete action $c_n(t)$, $e_n(t)$ and $h_n(t)$ are replaced back by $c_n(t)$ to give the hybrid action space $a(t)$;
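The conversion from the continuous action space back to the hybrid action space can be illustrated in Python. The exact comparison that decides the discrete state did not survive in the text, so this sketch assumes that the larger of the two continuous scores wins, with ties resolved in favor of task execution; the functions `discretize_charging` and `to_hybrid_action` and the dictionary keys are hypothetical.

```python
def discretize_charging(exec_score, charge_score):
    """Map the two continuous scores of one UAV to its binary state.

    Assumed rule: the UAV charges (1) only when the charging score is
    strictly larger than the task execution score, otherwise it executes
    tasks (0).
    """
    return 1 if charge_score > exec_score else 0


def to_hybrid_action(angles, distances, exec_scores, charge_scores,
                     uav_ratios, ue_ratios):
    """Replace the per-UAV score pairs with discrete states, keep the rest."""
    states = [discretize_charging(e, h)
              for e, h in zip(exec_scores, charge_scores)]
    return {
        "angle": list(angles),            # horizontal flight angle per UAV
        "distance": list(distances),      # horizontal flight distance per UAV
        "state": states,                  # 1 = charging, 0 = executing tasks
        "uav_offload": list(uav_ratios),  # UAV-to-base-station ratios
        "ue_offload": list(ue_ratios),    # user-equipment-to-UAV ratios
    }
```

Representing the binary choice by two continuous outputs lets a purely continuous actor express a preference between charging and task execution without changing the underlying TD3 update.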

(4) The actions are performed:

The environment executes the hybrid action space $a(t)$, yielding the reward function $r(t)$ of this time slot and the state space $s(t+1)$ of the next time slot;

(5) Storing experience:

the state space $s(t)$ of the current time slot, the continuous action space $\tilde a(t)$ of the current time slot, the reward function $r(t)$ of the current time slot and the state space $s(t+1)$ of the next time slot are stored in the experience replay buffer; if the experience replay buffer is already full, the new experience overwrites the earliest stored experience. The current state space is then updated by letting $s(t)\leftarrow s(t+1)$;
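A replay buffer with the behavior described above (fixed capacity of 10000, oldest experience overwritten when full, batches of 128 sampled only once more than 128 experiences are stored) can be sketched with the Python standard library. The class name `ReplayBuffer` and its method names are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience store that drops the oldest entry when full."""

    def __init__(self, capacity=10000):
        # A deque with maxlen discards its oldest item on overflow.
        self._data = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._data.append((state, action, reward, next_state))

    def __len__(self):
        return len(self._data)

    def ready(self, batch_size=128):
        # Sampling starts only once more than batch_size experiences exist.
        return len(self._data) > batch_size

    def sample(self, batch_size=128):
        # Uniform random sampling without replacement.
        return random.sample(list(self._data), batch_size)
```

Storing the continuous action rather than the converted hybrid action keeps the stored data in the same space that the Actor and Critic networks operate on.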

(6) Updating the Critic network:

If the experience replay buffer stores more than 128 experiences, 128 experiences are randomly sampled from it. For each sample, the two target Critic networks $Q_{w_1'}$, $Q_{w_2'}$ each compute an initial next-state Q value, the smaller of the two is selected as the next-state Q value actually used, and the parameters $w_1$, $w_2$ of the two Critic networks $Q_{w_1}$, $Q_{w_2}$ are updated by gradient descent;
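The "take the smaller of the two target Q values" rule can be made concrete with a scalar Python sketch. The discount factor `gamma` and the mean-squared-error loss are standard TD3 ingredients assumed here, since the text does not state them; `td_target` and `critic_loss` are hypothetical helper names.

```python
def td_target(reward, q1_next, q2_next, gamma=0.99):
    """Target value built from the smaller of the two target-critic estimates.

    gamma is an assumed discount factor; it is not specified in the text.
    """
    return reward + gamma * min(q1_next, q2_next)


def critic_loss(q_predictions, targets):
    """Mean squared error between critic predictions and their targets."""
    errors = [(q - y) ** 2 for q, y in zip(q_predictions, targets)]
    return sum(errors) / len(errors)
```

Using the minimum of the two estimates counteracts the overestimation bias that a single critic tends to accumulate; each Critic network is then trained to reduce its own loss against the same target.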

(7) Updating an Actor network:

Every 2 updates of the two Critic networks $Q_{w_1}$, $Q_{w_2}$, the parameters $\phi$ of the Actor network $\pi_{\phi}$ are updated once by gradient descent;

(8) Updating the target network:

Every 4 updates of the two Critic networks, the parameters $\phi'$, $w_1'$, $w_2'$ of the target networks $\pi_{\phi'}$, $Q_{w_1'}$, $Q_{w_2'}$ are updated once using a soft update strategy;
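The delayed update schedule and the soft update can be sketched in plain Python, with network parameters represented as flat lists of floats. The mixing coefficient `tau` is an assumed hyperparameter that the text does not specify, and both function names are hypothetical.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a small step toward its online counterpart.

    tau is an assumed mixing coefficient; the text only names the strategy.
    """
    return [tau * w + (1.0 - tau) * w_target
            for w, w_target in zip(online_params, target_params)]


def updates_due(critic_updates, actor_period=2, target_period=4):
    """Return (update_actor, update_targets) after this many critic updates."""
    return (critic_updates % actor_period == 0,
            critic_updates % target_period == 0)
```

With these periods, the Actor network is refreshed on every second Critic update and the three target networks on every fourth, so the targets change more slowly than the networks they track.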

(9) The algorithm ends:

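The time slot $t$ is incremented by 1. If the current round has not yet completed its 20 time slots, the process jumps directly to step (3); otherwise the number of training rounds episode is first incremented by 1 and then checked: if it has not reached the maximum number of training rounds, the process jumps to step (2); otherwise the algorithm ends.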
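The control flow of steps (2) to (9) amounts to two nested loops, which the following Python sketch makes explicit. Slot and round counters starting at 0 and the parameter `max_episodes` are assumptions, since the initial values and the maximum number of rounds are not recoverable from the text; the function `run_schedule` is hypothetical and only records which (round, slot) pairs would be visited.

```python
def run_schedule(max_episodes, slots_per_episode=20):
    """Return the (episode, slot) pairs visited by the training loop."""
    visited = []
    episode = 0
    while episode < max_episodes:
        t = 0  # step (2): reset the time slot and the state space
        while t < slots_per_episode:
            # Steps (3) to (8) would run here: select an action, execute it,
            # store the experience and update the networks.
            visited.append((episode, t))
            t += 1  # step (9): advance the time slot
        episode += 1  # the round is complete, move on to the next one
    return visited
```

Calling `run_schedule(3)` visits 60 slots in total, 20 per round, with the state space re-initialized at the start of each round.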

In this embodiment, the P-TD3 algorithm is run using PyTorch 1.7.0, which can initialize the network parameters automatically.

The user equipment, the UAVs and the base station cooperatively execute the computation-intensive tasks according to the computed optimal task offloading ratio of the UAVs, flight trajectory, charging schedule and optimal task offloading ratio of the user equipment, so as to maximize the number of completed tasks and reduce the total energy consumption of the system.

Although embodiments of the invention have been disclosed above, the invention is not limited to the applications set out in the specification and the embodiments; it can be applied in any field to which it is suited, and further modifications can readily be made by those skilled in the art. Accordingly, without departing from the general concepts defined by the claims and their equivalents, the invention is not limited to the particular details and examples shown and described herein.

Claims

1. A joint optimization method for task offloading and charging scheduling in multi-UAV-assisted MEC, characterized by comprising the following steps:

the time delay satisfies:

；

the total energy consumption satisfies:

；

where $E(t)$ is the total energy consumption of the system in time slot $t$, $E_m^{\mathrm{loc}}(t)$ is the local computation energy consumption of user equipment $m$ in time slot $t$, $E_{m,n}^{\mathrm{tr}}(t)$ is the data transmission energy consumption between user equipment $m$ and UAV $n$ in time slot $t$, $E_n^{\mathrm{comp}}(t)$ is the computation energy consumption of UAV $n$ processing tasks in time slot $t$, $E_n^{\mathrm{tr}}(t)$ is the data transmission energy consumption of UAV $n$ in time slot $t$, and $E_{\mathrm{BS}}^{\mathrm{comp}}(t)$ is the computation energy consumption of the base station processing tasks in time slot $t$;

step three, constructing an optimization target as follows:

；

where $\Theta=\{\theta_n(t)\}$ is the set of horizontal flight angles of the UAVs, $\mathcal{D}=\{d_n(t)\}$ is the set of horizontal flight distances of the UAVs, $\mathcal{C}=\{c_n(t)\}$ is the set of UAV states, $\Lambda=\{\beta_n(t),\alpha_m(t)\}$ is the set of UAV offloading ratios and user equipment offloading ratios, $t$ is the time slot, $\omega_1$ is the first weight factor, $\omega_2$ is the second weight factor, $K(t)$ is the number of tasks completed in time slot $t$, $\theta_n(t)$ is the horizontal angle of UAV $n$ in time slot $t$, $d_n(t)$ is the horizontal distance of UAV $n$ in time slot $t$, $c_n(t)$ is the state of UAV $n$ in time slot $t$, $\beta_n(t)$ is the offloading ratio of UAV $n$ in time slot $t$, $\alpha_m(t)$ is the offloading ratio of user equipment $m$ in time slot $t$, $M$ is the total number of user equipments in the multi-UAV-assisted mobile and charging model, $N$ is the total number of UAVs equipped with a mobile edge computing server in the multi-UAV-assisted mobile and charging model, and $\mathcal{T}$ is the set of time slots;

modeling the optimization objective as a discrete-time Markov decision process, which specifically includes:

the state space is:；

the action space is as follows:；

the reward function is:；

wherein the symbols denote the following: the state space of time slot t, for which a plane rectangular coordinate system is established on the ground with the base station as the origin, the north-south direction through the base station being one coordinate axis with north as its positive direction and the east-west direction through the base station being the other coordinate axis; the abscissa of an unmanned aerial vehicle in time slot t; the ordinate of an unmanned aerial vehicle in time slot t; the remaining energy of an unmanned aerial vehicle in time slot t; the action space of time slot t; and the reward function of time slot t; the action variables are subject to bounds, including the maximum distance that an unmanned aerial vehicle can fly horizontally in one time slot; one value of the state variable indicates that the unmanned aerial vehicle is charging in time slot t, and the other value indicates that the unmanned aerial vehicle is executing tasks in time slot t;

The P-TD3 algorithm specifically comprises the following steps:

step 1, initializing an Actor network, a first Critic network and a second Critic network together with their respective parameters; initializing a target Actor network, a first target Critic network and a second target Critic network together with their respective parameters; copying the parameters of the three networks to the parameters of the three corresponding target networks; and initializing an experience replay buffer and Gaussian noise at the same time;

wherein the experience replay buffer has a capacity of 10000 experiences;

step 2, initializing the time slot index, each training round comprising 20 time slots, and initializing the state space of the initial time slot, in which the abscissa and the ordinate of each unmanned aerial vehicle take random values within a specified range and the remaining energy is set to the initial energy of each unmanned aerial vehicle;

step 3, replacing the hybrid action space of time slot t with a continuous action space of time slot t, which contains, for each unmanned aerial vehicle, a task-execution action and a charging action in time slot t, both of which are continuous actions; inputting the state space of time slot t into the target Actor network to generate the current continuous action space; adding noise; and then converting the continuous action space into the hybrid action space;
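One possible reading of the conversion in step 3 is sketched below. It is an assumption and not the claimed mapping: the Actor output is taken to lie in [-1, 1], the discrete state is chosen by comparing the continuous task-execution score with the continuous charging score, and the remaining outputs are rescaled to an angle in [0, 2π], a distance in [0, `d_max`] and an offloading ratio in [0, 1]. The names `HybridAction`, `to_hybrid_action`, `d_max` and `noise_std` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class HybridAction:
    charging: bool        # discrete part: True = charge, False = execute tasks
    angle: float          # horizontal flight angle in radians
    distance: float       # horizontal flight distance within one time slot
    offload_ratio: float  # share of the task that is offloaded onward


def _rescale(x: float, low: float, high: float) -> float:
    """Map x from [-1, 1] to [low, high], clipping out-of-range inputs."""
    x = max(-1.0, min(1.0, x))
    return low + (x + 1.0) * 0.5 * (high - low)


def to_hybrid_action(raw, d_max, noise_std=0.0, rng=random):
    """Convert one UAV's continuous Actor output into a hybrid action.

    raw = (task_score, charge_score, angle, distance, offload_ratio), each
    nominally in [-1, 1]. Gaussian exploration noise is added before conversion.
    """
    noisy = [v + rng.gauss(0.0, noise_std) for v in raw]
    task_score, charge_score, angle, distance, ratio = noisy
    return HybridAction(
        charging=charge_score > task_score,  # assumed rule: larger score wins
        angle=_rescale(angle, 0.0, 2.0 * math.pi),
        distance=_rescale(distance, 0.0, d_max),
        offload_ratio=_rescale(ratio, 0.0, 1.0),
    )
```

Keeping the Actor output fully continuous and deferring the discrete choice to this conversion is what lets a TD3-style method handle a mixed discrete and continuous action space.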

step 5, storing the current state space, the continuous action space, the reward function and the state space of the next time slot in the experience replay buffer;
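The storage described in step 5 can be sketched with a fixed-capacity buffer. The capacity of 10000 comes from the claim; the first-in, first-out eviction of old experiences and the class name `ReplayBuffer` are assumptions made for this illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience store; the oldest entries are dropped first."""

    def __init__(self, capacity=10000):
        self._items = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        """Append one (state, action, reward, next_state) experience."""
        self._items.append((state, action, reward, next_state))

    def __len__(self):
        return len(self._items)

    def sample(self, batch_size, rng=random):
        """Draw batch_size distinct experiences uniformly at random."""
        # random.sample expects a sequence, so the deque is copied to a list
        return rng.sample(list(self._items), batch_size)
```

A caller would check `len(buffer) > 128` before drawing a batch of 128, matching the threshold stated in step 6.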

step 6, if the number of experiences stored in the experience replay buffer is greater than 128, randomly sampling 128 experiences from the experience replay buffer; for each sample, using the first target Critic network and the second target Critic network to calculate an initial next-state Q value each, selecting the smaller of the two initial next-state Q values as the next-state Q value actually used, and updating the parameters of the two Critic networks by a gradient descent method;
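The minimum-of-two-Critics rule in step 6 corresponds to the clipped double-Q target of standard TD3, sketched below for a single sample. The discount factor `gamma`, the `done` flag and the squared-error loss are assumptions taken from common TD3 practice and are not stated in the claim; the function names are hypothetical.

```python
def td3_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Clipped double-Q target: r + gamma * min(Q1', Q2') for non-terminal steps."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)


def critic_loss(q_pred, target):
    """Squared error between one Critic's estimate and the shared target."""
    return (q_pred - target) ** 2
```

Both Critic networks regress toward the same target, and taking the smaller estimate counteracts the overestimation that a single Critic tends to accumulate.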

step 7, updating the parameters of the Actor network once by a gradient descent method after every 2 updates of the two Critic networks;
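The update schedule of step 7 can be sketched as a simple counter. The callbacks `update_critics` and `update_actor` are hypothetical placeholders for the actual gradient steps, and `policy_delay=2` reflects the ratio given in the claim.

```python
def run_updates(num_critic_updates, update_critics, update_actor, policy_delay=2):
    """Call update_actor once after every `policy_delay` Critic updates."""
    actor_updates = 0
    for step in range(1, num_critic_updates + 1):
        update_critics()
        if step % policy_delay == 0:
            update_actor()
            actor_updates += 1
    return actor_updates
```

Delaying the Actor in this way lets the Critic estimates settle before the policy is moved along them.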

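step 9, if the current round has not yet reached its last time slot, jumping directly to step 3;

if the current round has reached its last time slot and the number of training rounds has not yet been reached, jumping to step 2;

if the current round has reached its last time slot and the number of training rounds has been reached, ending the algorithm.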
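The jumps in step 9 amount to a nested loop, sketched below. The sketch assumes the following reading, since the jump conditions are not legible in this text: a round continues with step 3 until its 20 time slots are used up, a new round then starts at step 2, and the algorithm ends once the number of training rounds is reached. The function `train` and its callbacks `reset` and `step` are hypothetical.

```python
def train(num_rounds, slots_per_round=20, reset=None, step=None):
    """Run rounds in the outer loop (step 2) and time slots in the inner loop."""
    trace = []
    for round_idx in range(num_rounds):       # jump to step 2: start a new round
        if reset is not None:
            reset(round_idx)
        for t in range(slots_per_round):      # jump to step 3: next time slot
            if step is not None:
                step(round_idx, t)
            trace.append((round_idx, t))
    return trace                              # both loops exhausted: algorithm ends
```

With 3 rounds of 20 time slots each, the inner body runs 60 times in total.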

2. The multi-unmanned aerial vehicle assisted MEC task offloading and charging scheduling joint optimization method of claim 1, wherein the multi-unmanned aerial vehicle assisted movement and charging model consists of a number of user equipments, a number of unmanned aerial vehicles each equipped with a mobile edge computing server, a base station provided with an MEC server, and a fixed charging station.

3. The multi-unmanned aerial vehicle assisted MEC task offloading and charging scheduling joint optimization method of claim 2, wherein the local computation time delay of the user equipment in time slot t satisfies:

；

4. The multi-unmanned aerial vehicle assisted MEC task offloading and charging scheduling joint optimization method of claim 3, wherein the data transmission time delay between the user equipment and the unmanned aerial vehicle in time slot t satisfies:

；

where the symbols denote, in order: the uplink bandwidth; the data volume of the tasks offloaded to the unmanned aerial vehicle in time slot t; the transmit power of the user equipment; the additive white Gaussian noise power at the unmanned aerial vehicle; and the channel gain between the user equipment and the unmanned aerial vehicle in time slot t, which satisfies:

；
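The quantities listed in this claim (bandwidth, offloaded data, transmit power, noise power and channel gain) are those of the usual Shannon-capacity link model. The sketch below assumes that model, with rate = B·log2(1 + p·g/σ²) and delay = data volume / rate; the claimed formulas themselves are not reproduced in this text, and all function and parameter names are hypothetical.

```python
import math


def uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Achievable rate in bit/s under the Shannon capacity formula."""
    snr = tx_power_w * channel_gain / noise_power_w
    return bandwidth_hz * math.log2(1.0 + snr)


def transmission_delay(data_bits, bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Time in seconds needed to send data_bits over the link."""
    rate = uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w)
    return data_bits / rate
```

For example, a signal-to-noise ratio of 3 on a 1 MHz channel gives a rate of about 2 Mbit/s, so 4 Mbit of offloaded data would take about 2 s.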

5. The multi-unmanned aerial vehicle assisted MEC task offloading and charging scheduling joint optimization method of claim 4, wherein the data transmission energy consumption between the user equipment and the unmanned aerial vehicle in time slot t satisfies:

；

in the method, in the process of the invention,is unmanned plane->Is provided.

6. The multi-unmanned aerial vehicle assisted MEC task offloading and charging scheduling joint optimization method of claim 5, wherein the computation time delay of the unmanned aerial vehicle processing tasks in time slot t satisfies:

；

。

7. The multi-unmanned aerial vehicle assisted MEC task offloading and charging scheduling joint optimization method of claim 6, wherein the data transmission time delay of the unmanned aerial vehicle in time slot t satisfies:

；

。

8. The multi-unmanned aerial vehicle assisted MEC task offloading and charging scheduling joint optimization method of claim 7, wherein the computation time delay of the base station processing tasks in time slot t satisfies:

；

。