CN117236561A - SAC-based multi-unmanned aerial vehicle auxiliary mobile edge computing method, device and storage medium - Google Patents


Info

Publication number
CN117236561A
Authority
CN
China
Prior art keywords
unmanned aerial vehicle
network
resource allocation
critic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311293225.6A
Other languages
Chinese (zh)
Inventor
董璐
姜骏永
田卜玮
石祥沛
袁心
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202311293225.6A priority Critical patent/CN117236561A/en
Publication of CN117236561A publication Critical patent/CN117236561A/en
Pending legal-status Critical Current

Classifications

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a SAC-based multi-unmanned aerial vehicle assisted mobile edge computing method, device, and storage medium, belonging to the technical field of mobile edge computing. The method comprises the following steps: acquiring basic element information of an edge computing system; establishing an optimization model of unmanned aerial vehicle path planning and resource allocation according to the information; taking each unmanned aerial vehicle as a decision maker, its observation as the state, and its selected path planning and resource allocation strategies as actions, and converting the optimization model into a Markov decision process based on a preset reward function and a discount factor; and having each unmanned aerial vehicle solve the Markov decision process with a pre-trained deep reinforcement neural network based on real-time observation information, obtaining an optimized unmanned aerial vehicle flight trajectory and resource allocation strategy. In the invention, multiple unmanned aerial vehicles, jointly with a base station, assist ground users with edge computing, and policy optimization improves task processing efficiency, disperses energy consumption, and saves computing resources.

Description

SAC-based multi-unmanned aerial vehicle auxiliary mobile edge computing method, device and storage medium
Technical Field
The invention relates to a SAC (Soft Actor-Critic)-based multi-unmanned aerial vehicle assisted mobile edge computing method, device, and storage medium, and belongs to the technical field of mobile edge computing.
Background
Mobile edge computing can provide computing services with lower transmission latency and lighter access load for user devices located at the edge of a wireless network. Unmanned aerial vehicles play an important role in the field of mobile edge computing due to their flexibility, maneuverability, low cost, and other characteristics.
However, conventional methods achieve low processing efficiency because the unmanned aerial vehicle operates in a dynamic wireless channel environment and is subject to battery capacity limitations, computational resource constraints, and the like. During task execution, the decision-making process of an unmanned aerial vehicle is in fact a constrained mixed optimization problem. Conventional methods for solving such problems face high computational cost and long latency, and in a complex and changeable environment every step must be re-solved from scratch, so real-time requirements cannot be met.
The rapid development of reinforcement learning (RL) provides an efficient and viable solution to the above problems. Reinforcement learning is a trial-and-error learning approach that can realize real-time resource allocation, trajectory design, and intelligent decision-making, and is well suited to latency-sensitive tasks. Meanwhile, reinforcement learning can remarkably reduce the amount of computation, has strong robustness, and can effectively cope with complex and changeable environments.
However, existing algorithms only consider computation offloading and task distribution between a single unmanned aerial vehicle and ground users, whereas in practical applications an unmanned aerial vehicle edge computing scenario may involve multiple unmanned aerial vehicles simultaneously providing computing services to ground users. In this case, if each unmanned aerial vehicle performs direct computation offloading only with the ground users, computing resources may be wasted and used inefficiently.
Disclosure of Invention
The invention aims to provide a SAC-based multi-unmanned aerial vehicle assisted mobile edge computing method that uses a deep reinforcement learning algorithm to optimize the strategies for unmanned aerial vehicle path planning, resource allocation, and task offloading when multiple unmanned aerial vehicles, jointly with a base station, assist ground users with edge computing, thereby improving the processing efficiency of user tasks and avoiding concentrated energy consumption and resource waste.
In order to solve the technical problems, the invention is realized by adopting the following technical scheme:
In one aspect, the invention provides a multi-unmanned aerial vehicle assisted mobile edge computing method, comprising the following steps:
acquiring basic element information of a system in which multiple unmanned aerial vehicles, jointly with a base station, assist ground users with edge computing;
establishing an optimization model of unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation according to the basic element information;
taking each unmanned aerial vehicle as a decision maker, its observation as the state, and its selected path planning, channel resource allocation, and computing resource allocation strategies as actions, and converting the optimization model into a Markov decision process based on a preset reward function and a discount factor; and
having each unmanned aerial vehicle solve the Markov decision process with a pre-trained deep reinforcement neural network based on real-time observation information to obtain an optimized unmanned aerial vehicle flight trajectory and channel and computing resource allocation strategies.
To avoid the defect of the prior art in which each unmanned aerial vehicle performs direct computation offloading with the ground users, wasting computing resources and operating inefficiently, the invention, on the basis of a deep reinforcement learning algorithm, lets the base station participate as a computing resource in the computation offloading and task allocation between the unmanned aerial vehicles and the ground users. Through cooperation between the edge computing servers and the ground server, part of the computation-intensive tasks can be offloaded from the unmanned aerial vehicles to the ground base station server; the unmanned aerial vehicle flight trajectories and the channel and computing resource allocation strategies are further optimized, and ultimately the quality of the users' computing experience and the overall computing efficiency are greatly improved.
Optionally, the system in which multiple unmanned aerial vehicles, jointly with a base station, assist ground users with edge computing comprises a plurality of unmanned aerial vehicles, a plurality of ground users, and a base station, wherein the unmanned aerial vehicles serve as edge computing servers and the base station serves as the ground server;
specifically, the basic element information includes: the number of ground users N; the number of unmanned aerial vehicles M; the positions of the ground users X = (x_1, ..., x_N); the positions of the unmanned aerial vehicles Y = (y_1, ..., y_M); the movements of the unmanned aerial vehicles move = (move_1, ..., move_M); the channel resource allocation of the unmanned aerial vehicles to the ground users α = (α_1, ..., α_M); the computing resource allocation of the unmanned aerial vehicles to the ground users β = (β_1, ..., β_M); the task data amount, computation amount, and position (D_i, f_i, x_i) of the i-th ground user; the energy consumption E; the numbers of ground users served by the unmanned aerial vehicles (I_1, ..., I_M); the fairness coefficient I; the maximum delay T; the information transmission power of the users P_user; the maximum computing resource of an unmanned aerial vehicle F_uav; the maximum bandwidth of an unmanned aerial vehicle B_0; the transmission power of an unmanned aerial vehicle P_uav; the communication time T_0i^j for the i-th ground user to offload its task to the j-th unmanned aerial vehicle; the computation time T_1i^j for the j-th unmanned aerial vehicle to compute the task of the i-th ground user locally; the communication time T_2i^j for the j-th unmanned aerial vehicle to offload the task of the i-th ground user to the base station; the power gain g_0 at a reference distance of 1 meter; the effective switched capacitance k; the noise power δ²; the single-step maximum movement distance of an unmanned aerial vehicle V_max; the boundary of the unmanned aerial vehicle flight range; the position U of the base station; and the maximum communication distance R between an unmanned aerial vehicle and a ground user.
Optionally, the optimization model of unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation is expressed as:
wherein the intermediate parameters k_{i,j} and θ_{i,j} satisfy the following relationship:
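The formula images for this model and for the relationship between k_{i,j} and θ_{i,j} are not reproduced in the text. Purely to illustrate the general shape such a model takes — the weight λ and the exact constraint set are assumptions inferred from the stated goals of minimizing energy consumption while ensuring fairness — it might read:

```latex
\begin{aligned}
\min_{\alpha,\ \beta,\ move} \quad & E \;-\; \lambda\, I \\
\text{s.t.}\quad
& \min\!\big( T_{0i}^{\,j} + T_{1i}^{\,j},\; T_{0i}^{\,j} + T_{2i}^{\,j} \big) \le T
  && \text{(per-task delay bound)} \\
& \lVert move_j \rVert \le V_{\max},\quad y_j \ \text{inside the flight boundary}
  && \text{(mobility)} \\
& \textstyle\sum_{i} \alpha_j^{i} \le 1,\quad \sum_{i} \beta_j^{i} \le 1
  && \text{(bandwidth and computing budgets)} \\
& \lVert x_i - y_j \rVert \le R \ \text{for every served pair } (i, j)
  && \text{(coverage)}
\end{aligned}
```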
optionally, the converting the optimization model into a markov decision process includes:
let the observation of the j-th unmanned aerial vehicle be (D_1, ..., D_N, f_1, ..., f_N, x_1, ..., x_N, y_1, ..., y_M);
let the action of the j-th unmanned aerial vehicle be (α_j, β_j, move_j);
the reward function is set to one of two preset alternative forms;
if an unmanned aerial vehicle overspeeds or moves out of the boundary, the reward is reduced by 0.3;
if the task of a ground user is not offloaded, the reward is reduced by 0.1 times the number of ground users whose tasks are not offloaded.
By converting the optimization model into a Markov decision process, each decision maker can periodically or continuously observe a stochastic dynamic system with the Markov property and make corresponding decisions sequentially.
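As an illustration only, a minimal Python sketch of this reward shaping follows; the base reward is left abstract because its exact formula appears only in the patent's figures, and v_max = 20 with a 300 × 300 boundary are the values used in the embodiment below:

```python
def shaped_reward(base_reward, uav_speeds, uav_positions, unoffloaded_count,
                  v_max=20.0, bound=300.0):
    """Apply the penalty terms above to a preset base reward.

    base_reward is left abstract: its exact formula is given only in the
    patent's figures. v_max and the square boundary are the Example 2 values.
    """
    r = base_reward
    for speed, (px, py) in zip(uav_speeds, uav_positions):
        # -0.3 whenever a UAV overspeeds or leaves the flight area
        if speed > v_max or not (0.0 <= px <= bound and 0.0 <= py <= bound):
            r -= 0.3
    # -0.1 per ground user whose task was not offloaded
    r -= 0.1 * unoffloaded_count
    return r
```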
Optionally, the pre-trained deep reinforcement neural network is obtained by training with the SAC algorithm, the training process comprising:
constructing and initializing a Target V network, a Critic V network, a Critic Q1 network, and a Critic Q2 network of the SAC algorithm architecture;
wherein the Target V network is initialized with the same parameters as the Critic V network, and the Critic Q1 network is initialized with the same parameters as the Critic Q2 network;
the inputs of the Target V network and the Critic V network are the observations of any unmanned aerial vehicle, and their output is the state value under that observation; the inputs of the Critic Q1 network and the Critic Q2 network are the observation of any unmanned aerial vehicle together with the actions of all unmanned aerial vehicles, and their output is the value of taking those actions under that observation;
further, the training process further includes:
constructing and initializing an Actor network of the SAC algorithm architecture for each unmanned aerial vehicle, each Actor network taking the observation of its corresponding unmanned aerial vehicle as input, generating a mean and a variance for each element of the action, sampling from a normal distribution with that mean and variance, and processing the obtained data to obtain the action of the unmanned aerial vehicle;
wherein the observations of different unmanned aerial vehicles are identical at any given time;
during training, the Actor networks generate an action for each unmanned aerial vehicle under the current observation, the actions are integrated into an action group, a reward is obtained, and the process transitions to the next observation, thereby forming a quadruple.
Specifically, the quadruple is expressed as (obs_z, a_z, r_z, next_obs_z), where the four elements respectively represent an observation, an action group, a reward, and the next observation; the quadruples are stored in a replay buffer;
once the capacity of the replay buffer is reached, the newest quadruple replaces the oldest one, and Z quadruples are randomly sampled from the replay buffer for training.
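As an illustrative sketch, such a replay buffer might look as follows in Python; the capacity of 50000 and batch size of 128 are the values used in Example 2 below, and the class and method names are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (observation, action group, reward, next observation)."""

    def __init__(self, capacity=50000):
        # deque with maxlen: appending when full silently discards the oldest quadruple
        self.buffer = deque(maxlen=capacity)

    def push(self, obs, action_group, reward, next_obs):
        self.buffer.append((obs, action_group, reward, next_obs))

    def sample(self, batch_size=128):
        # uniform random draw of Z = batch_size quadruples for one training step
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```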
Further, Z quadruples are randomly sampled from the replay buffer to optimize the unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation policies, the optimization process comprising:
using the sampled quadruples (obs_z, a_z, r_z, next_obs_z), updating the Critic V network by performing gradient descent on its loss function;
updating the Critic Q1 network and the Critic Q2 network by performing gradient descent on their respective loss functions;
updating the M Actor networks by performing gradient ascent on the maximization objective;
soft-updating the Target V network using the Critic V network;
repeating the above operations and periodically resetting the environment until the policy converges;
wherein the loss function of the Actor network, the loss functions of the Critic Q1 and Critic Q2 networks, and the gradient-ascent maximization objective are as follows:
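A sketch of these objectives in the canonical SAC form, which matches the architecture described above (twin Q networks, a state-value network with a soft-updated target, and stochastic actors); this is offered as an assumption based on the standard SAC formulation, since the patent's own formula images are not reproduced in this text:

```latex
% Critic V loss (minimized by gradient descent): regress V toward the soft value
% of actions sampled from the current policy
J_V(\psi) = \mathbb{E}_{obs \sim \mathcal{D}} \left[ \tfrac{1}{2} \Big( V_{\psi}(obs)
    - \mathbb{E}_{a \sim \pi_{\phi}} \big[ \min_{k=1,2} Q_{\theta_k}(obs, a)
    - \log \pi_{\phi}(a \mid obs) \big] \Big)^{2} \right]

% Critic Q1 / Q2 losses (minimized by gradient descent): soft Bellman residual
% computed against the Target V network
J_{Q}(\theta_k) = \mathbb{E}_{(obs,\, a,\, r,\, next\_obs) \sim \mathcal{D}} \left[ \tfrac{1}{2}
    \Big( Q_{\theta_k}(obs, a) - r - \gamma\, V_{\bar{\psi}}(next\_obs) \Big)^{2} \right],
    \qquad k = 1, 2

% Actor objective (maximized by gradient ascent)
J_{\pi}(\phi) = \mathbb{E}_{obs \sim \mathcal{D},\ a \sim \pi_{\phi}}
    \left[ \min_{k=1,2} Q_{\theta_k}(obs, a) - \log \pi_{\phi}(a \mid obs) \right]

% Soft update of the Target V network from the Critic V network
\bar{\psi} \leftarrow \tau\, \psi + (1 - \tau)\, \bar{\psi}
```

In the embodiment below, the discount factor is γ = 0.99 and the soft-update coefficient is τ = 0.001.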
the ground base station is a fixed server, and the coverage area is also fixed, however, the terminal equipment has mobility, so that the unloading of the calculation task in the movement cannot be well processed; for the unmanned aerial vehicle, although the unmanned aerial vehicle can movably cover the unloading area, the energy consumption of the unmanned aerial vehicle for self-flight and hovering is removed, and a large amount of energy is consumed for task unloading and calculation;
therefore, the ground base station is used as a computing resource, the unmanned aerial vehicle is used as a relay node to selectively offload tasks with high computing complexity of the ground user to the ground base station, and meanwhile, the unmanned aerial vehicle utilizes the pre-trained deep reinforcement neural network to timely optimize the flight track of the unmanned aerial vehicle, the channel resource allocation and the strategy of computing resource allocation according to the observed real-time information, so that the energy consumed by the unmanned aerial vehicle is minimized, and the fairness of the unmanned aerial vehicle for offloading tasks is ensured.
In another aspect, the invention further provides a device for multi-unmanned aerial vehicle assisted mobile edge computing, comprising:
an information acquisition module, configured to acquire basic element information of a system in which multiple unmanned aerial vehicles, jointly with a base station, assist ground users with edge computing;
an optimization model construction module, configured to establish an optimization model of unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation according to the basic element information;
an optimization model conversion module, configured to take each unmanned aerial vehicle as a decision maker, its observation as the state, and its selected path planning, channel resource allocation, and computing resource allocation strategies as actions, and to convert the optimization model into a Markov decision process based on a preset reward function and a discount factor; and
a training optimization module, configured to solve the Markov decision process with a pre-trained deep reinforcement neural network based on real-time observation information, obtaining an optimized unmanned aerial vehicle flight trajectory and channel and computing resource allocation strategies.
Since the device for multi-unmanned aerial vehicle assisted mobile edge computing can execute the multi-unmanned aerial vehicle assisted mobile edge computing method described above, it achieves the technical effects corresponding to that method.
In a third aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described in the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
(1) The base station creatively participates as a computing resource in the computation offloading and task distribution between the unmanned aerial vehicles and the ground users. By effectively utilizing the computing resources and network bandwidth of the base station, tasks can be completed and returned in time; this cooperation not only improves task processing efficiency but also avoids concentrated energy consumption and resource waste;
(2) Based on the SAC algorithm, the method realizes real-time resource allocation, trajectory design, and intelligent decision-making under a cloud-edge-end architecture and effectively copes with complex and changeable computing environments; moreover, the algorithm adaptively and dynamically adjusts to the current environment and task demands to achieve adaptive computing resource allocation and task offloading;
(3) Through cooperative computation among multiple unmanned aerial vehicles, tasks are distributed to different unmanned aerial vehicles while each is guaranteed a reasonable computational load; meanwhile, the fairness coefficient addresses the large disparity in the number of users directly served by each unmanned aerial vehicle when the vehicles act as relay nodes offloading tasks to the base station, better balancing the computing resource allocation across unmanned aerial vehicles and improving overall task completion efficiency (a common form of this coefficient is sketched below).
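The formula for the fairness coefficient I appears only in the patent's figures; a common choice in UAV edge-computing work, offered here purely as an assumption, is Jain's fairness index over the per-UAV service counts (I_1, ..., I_M):

```latex
I = \frac{\left( \sum_{j=1}^{M} I_j \right)^{2}}{M \sum_{j=1}^{M} I_j^{2}},
\qquad \frac{1}{M} \le I \le 1
```

Under this form, I = 1 exactly when every unmanned aerial vehicle serves the same number of ground users, so maximizing I balances the load across the fleet.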
Drawings
Fig. 1 shows the fairness coefficient I curve obtained by training for 5000 episodes (100 steps per episode) with N = 24 and M = 3;
Fig. 2 shows the log curve obtained by training for 5000 episodes (100 steps per episode) with N = 24 and M = 3;
Fig. 3 shows the reward curve obtained by training for 5000 episodes (100 steps per episode) with N = 24 and M = 3;
Fig. 4 is a schematic diagram of the algorithm structure with N = 24 and M = 3.
Detailed Description
The technical conception of the invention is as follows: the base station participates as a computing resource in the computation offloading and task allocation between the unmanned aerial vehicles and the ground users, and computing resource allocation and task offloading are performed among the multiple unmanned aerial vehicles and between the unmanned aerial vehicles and the base station, thereby improving overall computing efficiency.
The implementation of the above technical concept needs to consider the following problems:
(1) Cooperative computing among multiple unmanned aerial vehicles: when multiple unmanned aerial vehicles participate in computation offloading simultaneously, they need to work cooperatively to avoid waste of and conflicts over computing resources; the algorithm should consider how to distribute tasks to different unmanned aerial vehicles while guaranteeing each a reasonable computational load;
(2) Computation offloading between the unmanned aerial vehicles and the base station: when the base station participates, the algorithm should consider how to offload computation tasks to the base station, which requires effectively utilizing the base station's computing resources and network bandwidth and ensuring that tasks are completed and returned in time;
(3) Dynamic allocation of computing resources: as the computing resources of the unmanned aerial vehicles and the base station change over time, the algorithm must perform resource allocation and task offloading in real time, which requires dynamic adjustment capability and adaptive adjustment according to the current environment and task demands.
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Example 1
With reference to fig. 1 to fig. 4, this embodiment provides a multi-unmanned aerial vehicle assisted mobile edge computing method, which specifically comprises the following steps:
basic element information of a system in which multiple unmanned aerial vehicles, jointly with a base station, assist ground users with edge computing is acquired;
an optimization model of unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation is established according to the basic element information;
taking each unmanned aerial vehicle as a decision maker, its observation as the state, and its selected path planning, channel resource allocation, and computing resource allocation strategies as actions, the optimization model is converted into a Markov decision process based on a preset reward function and a discount factor;
each unmanned aerial vehicle solves the Markov decision process with a pre-trained deep reinforcement neural network based on real-time observation information, obtaining an optimized unmanned aerial vehicle flight trajectory and channel and computing resource allocation strategies.
When this embodiment is applied, the SAC algorithm in deep reinforcement learning is optimized: a centralized algorithm is first provided to obtain good results, and on that basis a centralized-training, distributed-execution strategy is adopted to further improve the optimization effect. Meanwhile, the base station participates in resource allocation as a computing resource, so that the unmanned aerial vehicles and the base station process computation cooperatively, greatly improving efficiency and avoiding concentrated energy consumption and resource waste.
Example 2
With reference to fig. 1 to 4, this embodiment also has the following design on the basis of embodiment 1:
the system in which multiple unmanned aerial vehicles, jointly with a base station, assist ground users with edge computing comprises a plurality of unmanned aerial vehicles, a plurality of ground users, and a base station, wherein the unmanned aerial vehicles serve as edge computing servers and the base station serves as the ground server;
specifically, the flight area of the unmanned aerial vehicle and the distribution area of the ground users are limited to a square area of 300×300, the number of the ground users is 24, the number of the unmanned aerial vehicles is 3, and x= (x) 1 ,...,x 24 ) Represents the location of the ground user, using y= (y) 1 ,...,y 24 ) To represent the position coordinates of the unmanned aerial vehicle, move= (move) 1 ,...,move 24 ) To represent the movement of the drone, α= (α) 123 ) Represents the channel resource allocation of the unmanned aerial vehicle to the ground user, β= (β) 123 ) Representing allocation of computing resources of the drone to the ground user, (D) i ,f i ,x i ) Respectively representing task data quantity, calculation quantity and position coordinates of ith ground user, E represents energy consumption, (I) 1 ,I 2 ,I 3 ) The number of ground users served by the unmanned plane is represented, I represents a fairness coefficient, T represents the maximum time delay, the value is 1, F uav Represents the maximum computing resource of the unmanned aerial vehicle, and the value is 3e9 and B 0 Represents the maximum bandwidth of the unmanned plane, and takes the value of 1e7 and P user Representing the information transmission power of the user, wherein the value is 0.5 and P uav Representing the transmission power of the unmanned aerial vehicle, wherein the transmission power is 5, T 0i j Representing communication time of the ith ground user to offload tasks to the jth unmanned plane, T 1i j Representing the calculation time of the task of the ith ground user of the jth unmanned aerial vehicle in local operation, T 2i j Representing the communication time g of the jth unmanned plane for offloading the task of the ith ground user to the base station 0 Representing the power gain at a reference distance of 1 meter, taking the value as1e-5, k is the capacitance coefficient of the unmanned aerial vehicle CPU, and the value is 1e-28, delta 2 Is the noise power, the value is 1e-10, V max The single maximum movement distance of the unmanned plane is 20;
in addition, the users are set to be uniformly distributed on the diagonal line of the square area, the initial positions of the three unmanned aerial vehicles are respectively (0, 0), (120 ), (240, 240), the task data size of each ground user is {1e5,2e5,3e5}, the value range of the calculated amount is {100, 200, 300}, the maximum communication distance between the unmanned aerial vehicle and the ground user is 200, and meanwhile, for the sake of simplicity, the distance between the unmanned aerial vehicle and the base station is assumed to be a constant value 1000.
Taking each unmanned aerial vehicle as a decision maker, its observation as the state, and its selected path planning, channel resource allocation, and computing resource allocation strategies as actions, the optimization model is converted into a Markov decision process based on a preset reward function and a discount factor.
Specifically, the optimization model of unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation is expressed as:
wherein:
specifically, let the observation of the j-th unmanned aerial vehicle be (D_1, ..., D_N, f_1, ..., f_N, x_1, ..., x_N, y_1, ..., y_M) and its action be (α_j, β_j, move_j). The reward function is set to a preset form; once an unmanned aerial vehicle overspeeds or moves out of the boundary, the reward is reduced by 0.3, and once the task of a ground user is not offloaded, the reward is reduced by 0.1 times the number of ground users whose tasks are not offloaded. The discount factor is γ = 0.99.
Each unmanned aerial vehicle solves the Markov decision process with the pre-trained deep reinforcement neural network based on real-time observation information, obtaining an optimized unmanned aerial vehicle flight trajectory and channel and computing resource allocation strategies.
Specifically, the four initialized neural networks are Target V, Critic V, Critic Q1, and Critic Q2. Target V and Critic V have the same structure and the same parameter initialization; both are three-layer fully connected networks with 256 neurons per layer. Critic Q1 and Critic Q2 likewise have the same structure and the same parameter initialization, also three-layer fully connected networks with 256 neurons per layer.
The two V networks take the observation of any unmanned aerial vehicle as input and output the state value under that observation; the two Q networks take the observation of any unmanned aerial vehicle and the actions of all unmanned aerial vehicles as input and output the value of taking those actions under that observation.
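A minimal PyTorch sketch of these critic networks, assuming the three-layer fully connected structure with 256 neurons per layer described above; the dimension arguments are illustrative:

```python
import torch
import torch.nn as nn

class VNetwork(nn.Module):
    """State-value network: observation -> scalar V(obs).
    Used for both Critic V and Target V (same structure, same initialization)."""

    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs)

class QNetwork(nn.Module):
    """Action-value network: (observation, joint actions of all UAVs) -> Q value.
    Used for both Critic Q1 and Critic Q2."""

    def __init__(self, obs_dim, joint_act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, joint_actions):
        return self.net(torch.cat([obs, joint_actions], dim=-1))
```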
An Actor network is initialized for each unmanned aerial vehicle; it is a three-layer fully connected network with 256 neurons per layer, and a sigmoid function is applied at the output.
Each Actor network takes the state of its unmanned aerial vehicle as input and generates a mean and a variance for each element of the action; it samples from a normal distribution with that mean and variance, subtracts 0.5 from the sampled position parameters and multiplies them by 20, and processes the communication resource allocation parameters and the computing resource allocation parameters each with a softmax function, finally obtaining the unmanned aerial vehicle's action in that state.
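One way the described sampling and post-processing could be implemented is sketched below; the Gaussian heads, the placement of the sigmoid, and the tensor layout are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class ActorNetwork(nn.Module):
    """Per-UAV actor: observation -> (alpha_j, beta_j, move_j)."""

    def __init__(self, obs_dim, n_users, hidden=256):
        super().__init__()
        self.n = n_users
        # action vector: channel shares, computing shares, and a 2-D movement
        act_dim = 2 * n_users + 2
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        mean = torch.sigmoid(self.mean_head(h))            # sigmoid at the output
        std = torch.exp(self.log_std_head(h).clamp(-5, 2))
        sample = torch.distributions.Normal(mean, std).rsample()

        alpha = torch.softmax(sample[..., : self.n], dim=-1)            # channel shares
        beta = torch.softmax(sample[..., self.n : 2 * self.n], dim=-1)  # computing shares
        move = (sample[..., -2:] - 0.5) * 20.0   # the (x - 0.5) * 20 position scaling
        return alpha, beta, move
```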
It should be noted that, since the states of different unmanned aerial vehicles are always identical at the same time, they are no longer distinguished.
Meanwhile, the neural network optimizer is set to Adam with a learning rate of 3e-4 for all networks, and the soft-update coefficient of the Target V network is set to 0.001.
With reference to fig. 4, during training the Actor networks generate an action for each unmanned aerial vehicle in the current state; the actions are integrated into an action group, a reward is obtained, and the process transitions to the next observation. The quadruple (observation, action group, reward, next observation) is stored in the replay buffer; once the replay buffer reaches its capacity of 50000, the newest quadruple replaces the oldest one, and 128 quadruples are randomly sampled from the buffer for training.
Z quadruples are randomly sampled from the replay buffer to optimize the unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation strategies; the specific optimization process is as follows:
first, the sampled quadruples (obs_z, a_z, r_z, next_obs_z) are used to update the Critic V network by gradient descent on its loss function;
secondly, the two Q networks are updated by gradient descent on their respective loss functions;
then, the M Actor networks are updated by gradient ascent on the maximization objective;
finally, the Target V network is soft-updated using the Critic V network;
this is done 100 times, after which the environment is reset, completing one training episode; the whole process is repeated for 5000 episodes.
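Putting the pieces together, a condensed sketch of this training loop under the stated settings (100 steps per episode, 5000 episodes, replay capacity 50000, batch size 128, τ = 0.001); env, the actors' act method, and the three commented update steps are illustrative interfaces, not the patent's code:

```python
STEPS_PER_EPISODE = 100   # steps before the environment is reset
EPISODES = 5000           # total training episodes
BATCH_SIZE = 128          # quadruples sampled per update
TAU = 0.001               # soft-update coefficient for Target V

def soft_update(target_v, critic_v, tau=TAU):
    # Target V <- tau * Critic V + (1 - tau) * Target V
    for t, s in zip(target_v.parameters(), critic_v.parameters()):
        t.data.copy_(tau * s.data + (1.0 - tau) * t.data)

def train(env, actors, critic_v, target_v, critic_q1, critic_q2, buffer):
    for episode in range(EPISODES):
        obs = env.reset()
        for _ in range(STEPS_PER_EPISODE):
            # one action per UAV, integrated into an action group
            action_group = [actor.act(obs) for actor in actors]
            next_obs, reward = env.step(action_group)
            buffer.push(obs, action_group, reward, next_obs)
            obs = next_obs
            if len(buffer) >= BATCH_SIZE:
                batch = buffer.sample(BATCH_SIZE)
                # 1) gradient descent on the Critic V loss
                # 2) gradient descent on the Critic Q1 / Q2 losses (using Target V)
                # 3) gradient ascent on each of the M Actor objectives
                # 4) soft-update Target V from Critic V:
                soft_update(target_v, critic_v)
```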
In this way, the invention obtains path planning, communication resource allocation, and computing resource allocation strategies for multi-unmanned aerial vehicle mobile edge computing in this environment.
wherein the loss function of the Actor network, the loss functions of the Critic Q1 and Critic Q2 networks, and the gradient-ascent maximization objective are as given above.
In the invention, multiple unmanned aerial vehicles and a ground base station are applied to realize task offloading for all ground users within a certain range while meeting the delay requirement within a given time, and fairness and energy consumption minimization must be ensured when the unmanned aerial vehicles serve the ground users.
To solve the above problems, this embodiment optimizes the flight trajectories, wireless channel resource allocation, and computing resource allocation of multiple unmanned aerial vehicles based on a SAC-based centralized-training, distributed-execution algorithm.
Specifically, the base station is added to the task processing as a computing resource; meanwhile, each unmanned aerial vehicle only needs the current position information of the ground users, the task information, and its own position information to rapidly make path planning and resource allocation decisions, realizing the optimization of the final decision objective.
Example 3
This embodiment provides a device for multi-unmanned aerial vehicle assisted mobile edge computing, which may be used to implement the method described in embodiment 2 above and which comprises:
an information acquisition module, configured to acquire basic element information of a system in which multiple unmanned aerial vehicles, jointly with a base station, assist ground users with edge computing;
an optimization model construction module, configured to establish an optimization model of unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation according to the basic element information;
an optimization model conversion module, configured to take each unmanned aerial vehicle as a decision maker, its observation as the state, and its selected path planning, channel resource allocation, and computing resource allocation strategies as actions, and to convert the optimization model into a Markov decision process based on a preset reward function and a discount factor; and
a training optimization module, configured to solve the Markov decision process with a pre-trained deep reinforcement neural network based on real-time observation information, obtaining an optimized unmanned aerial vehicle flight trajectory and channel and computing resource allocation strategies.
With reference to figs. 1 to 4, the device for multi-unmanned aerial vehicle assisted mobile edge computing provided in this embodiment can perform the method provided in embodiment 2 above and achieve the beneficial effects corresponding to the multi-unmanned aerial vehicle assisted mobile edge computing method described there.
Example 4
The present embodiment provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any of claims 1-8.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (10)

1. A multi-unmanned aerial vehicle assisted mobile edge computing method, characterized by comprising the following steps:
acquiring basic element information of a system in which multiple unmanned aerial vehicles, jointly with a base station, assist ground users with edge computing;
establishing an optimization model of unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation according to the basic element information;
taking each unmanned aerial vehicle as a decision maker, its observation as the state, and its selected path planning, channel resource allocation, and computing resource allocation strategies as actions, and converting the optimization model into a Markov decision process based on a preset reward function and a discount factor; and
having each unmanned aerial vehicle solve the Markov decision process with a pre-trained deep reinforcement neural network based on real-time observation information to obtain an optimized unmanned aerial vehicle flight trajectory and channel and computing resource allocation strategies.
2. The multi-unmanned aerial vehicle assisted mobile edge computing method according to claim 1, wherein the system for assisting the ground users in edge computing by the multi-unmanned aerial vehicle combined with the base station comprises a plurality of unmanned aerial vehicles, a plurality of ground users and the base station, wherein the unmanned aerial vehicles serve as edge computing servers, and the base station serves as a ground server;
the basic element information includes: number of ground users N, number of unmanned aerial vehicles M, location of ground users x= (x) 1 ,...,x M ) Position y= (y) of unmanned plane 1 ,...,y M ) Movement= (movement) of unmanned plane 1 ,...,move M ) Channel resource allocation alpha= (alpha) of unmanned plane to ground user 1 ,...,α M ) Calculation resource allocation beta= (beta) of unmanned plane to ground user 1 ,...,β M ) Task data amount, calculation amount and position (D) of the ith ground user i ,f i ,x i ) Energy consumption E, number of ground users served by unmanned aerial vehicle (I 1 ,...,I M ) Fairness coefficient I, maximum delay T, information transmission power P of user user Maximum computing resource F of unmanned aerial vehicle uav Unmanned aerial vehicle maximum bandwidth B 0 Transmission power P of unmanned aerial vehicle uav Communication time T for the ith ground user to offload tasks to the jth drone 0i j Calculation time T of task of ith ground user calculated locally by jth unmanned aerial vehicle 1i j Communication time T for the jth unmanned aerial vehicle to offload tasks of the ith ground user to the base station 2i j Power gain g at a reference distance of 1 meter 0 Effective switch capacitance k, noise power delta 2 Single maximum movement distance V of unmanned plane max Unmanned aerial vehicle flight range boundary, position U of basic station, unmanned aerial vehicle and ground user's communication maximum distance R.
3. The multi-unmanned aerial vehicle assisted mobile edge computing method of claim 2, wherein the optimization model of unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation is expressed as:
wherein the intermediate parameters k_{i,j} and θ_{i,j} satisfy the following relationship:
4. a multi-unmanned aerial vehicle assisted mobile edge computing method according to claim 3, wherein the converting the optimization model into a markov decision process comprises:
let the observation of the j-th unmanned aerial vehicle be (D_1, ..., D_N, f_1, ..., f_N, x_1, ..., x_N, y_1, ..., y_M);
let the action of the j-th unmanned aerial vehicle be (α_j, β_j, move_j);
the reward function is set to one of two preset alternative forms;
if any unmanned aerial vehicle overspeeds or moves out of the boundary, the reward is reduced by 0.3;
if the task of any ground user is not offloaded, the reward is reduced by 0.1 times the number of ground users whose tasks are not offloaded.
5. The multi-unmanned aerial vehicle assisted mobile edge computing method of claim 1, wherein the pre-trained deep reinforcement neural network is trained by using a SAC algorithm, and the training process comprises:
constructing and initializing a Target V network, a Critic V network, a Critic Q1 network, and a Critic Q2 network of the SAC algorithm architecture;
wherein the Target V network and the Critic V network have the same parameter initialization, and the Critic Q1 network and the Critic Q2 network have the same parameter initialization;
the inputs of the Target V network and the Critic V network are the observations of any unmanned aerial vehicle, and their output is the state value under that observation; the inputs of the Critic Q1 network and the Critic Q2 network are the observation of any unmanned aerial vehicle together with the actions of all unmanned aerial vehicles, and their output is the value of taking those actions under that observation.
6. The multi-unmanned aerial vehicle assisted mobile edge computing method of claim 5, wherein the training process further comprises:
constructing and initializing an Actor network of the SAC algorithm architecture for each unmanned aerial vehicle, each Actor network taking the observation of its corresponding unmanned aerial vehicle as input, generating a mean and a variance for each element of the action, sampling from a normal distribution with that mean and variance, and processing the obtained data to obtain the action of the unmanned aerial vehicle;
wherein the observations of different unmanned aerial vehicles are identical at any given time;
during training, the Actor networks generate an action for each unmanned aerial vehicle under the current observation, the actions are integrated into an action group, a reward is obtained, and the process transitions to the next observation, thereby forming a quadruple.
7. The multi-unmanned aerial vehicle assisted mobile edge computing method of claim 6, wherein the quadruple is expressed as (obs_z, a_z, r_z, next_obs_z), the four elements respectively representing an observation, an action group, a reward, and the next observation, and the quadruples are stored in a replay buffer;
once the replay buffer reaches its capacity, the newest quadruple replaces the oldest one, and Z quadruples are randomly sampled from the replay buffer for training.
8. The multi-unmanned aerial vehicle assisted mobile edge computing method of claim 7, wherein the Z quadruples randomly sampled from the replay buffer are used to optimize the unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation policies, the optimization process comprising:
using the sampled quadruples (obs_z, a_z, r_z, next_obs_z) to update the Critic V network by gradient descent on its loss function;
updating the Critic Q1 network and the Critic Q2 network by gradient descent on their respective loss functions;
updating the M Actor networks by gradient ascent on the maximization objective;
soft-updating the Target V network using the Critic V network;
repeating the above operations and periodically resetting the environment until the policy converges;
wherein, the loss function of the Actor network is:
the loss function of the Critic Q1 network and the Critic Q2 network is as follows:
the gradient ascent maximization function is:
9. An apparatus for multi-unmanned aerial vehicle assisted mobile edge computing, comprising:
an information acquisition module, configured to acquire basic element information of a system in which multiple unmanned aerial vehicles, jointly with a base station, assist ground users with edge computing;
an optimization model construction module, configured to establish an optimization model of unmanned aerial vehicle path planning, channel resource allocation, and computing resource allocation according to the basic element information;
an optimization model conversion module, configured to take each unmanned aerial vehicle as a decision maker, its observation as the state, and its selected path planning, channel resource allocation, and computing resource allocation strategies as actions, and to convert the optimization model into a Markov decision process based on a preset reward function and a discount factor; and
a training optimization module, configured to solve the Markov decision process with a pre-trained deep reinforcement neural network based on real-time observation information, obtaining an optimized unmanned aerial vehicle flight trajectory and channel and computing resource allocation strategies.
10. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any of claims 1 to 8.
CN202311293225.6A 2023-10-08 2023-10-08 SAC-based multi-unmanned aerial vehicle auxiliary mobile edge computing method, device and storage medium Pending CN117236561A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311293225.6A CN117236561A (en) 2023-10-08 2023-10-08 SAC-based multi-unmanned aerial vehicle auxiliary mobile edge computing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311293225.6A CN117236561A (en) 2023-10-08 2023-10-08 SAC-based multi-unmanned aerial vehicle auxiliary mobile edge computing method, device and storage medium

Publications (1)

Publication Number Publication Date
CN117236561A true CN117236561A (en) 2023-12-15

Family

ID=89085931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311293225.6A Pending CN117236561A (en) 2023-10-08 2023-10-08 SAC-based multi-unmanned aerial vehicle auxiliary mobile edge computing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN117236561A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117553803A (en) * 2024-01-09 2024-02-13 大连海事大学 Multi-unmanned aerial vehicle intelligent path planning method based on deep reinforcement learning
CN117553803B (en) * 2024-01-09 2024-03-19 大连海事大学 Multi-unmanned aerial vehicle intelligent path planning method based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
CN112367353B (en) Mobile edge computing unloading method based on multi-agent reinforcement learning
CN112351503B (en) Task prediction-based multi-unmanned aerial vehicle auxiliary edge computing resource allocation method
CN111405568B (en) Computing unloading and resource allocation method and device based on Q learning
CN113543156B (en) Industrial wireless network resource allocation method based on multi-agent deep reinforcement learning
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN111405569A (en) Calculation unloading and resource allocation method and device based on deep reinforcement learning
CN114665952B (en) Low-orbit satellite network beam-jumping optimization method based on star-ground fusion architecture
CN114169234A (en) Scheduling optimization method and system for unmanned aerial vehicle-assisted mobile edge calculation
CN117236561A (en) SAC-based multi-unmanned aerial vehicle auxiliary mobile edge computing method, device and storage medium
CN113254188B (en) Scheduling optimization method and device, electronic equipment and storage medium
CN115175217A (en) Resource allocation and task unloading optimization method based on multiple intelligent agents
CN116451934B (en) Multi-unmanned aerial vehicle edge calculation path optimization and dependent task scheduling optimization method and system
CN116489708B (en) Meta universe oriented cloud edge end collaborative mobile edge computing task unloading method
CN116893861A (en) Multi-agent cooperative dependency task unloading method based on space-ground cooperative edge calculation
CN115065678A (en) Multi-intelligent-device task unloading decision method based on deep reinforcement learning
CN117055619A (en) Unmanned aerial vehicle scheduling method based on multi-agent reinforcement learning
Bayerlein et al. Learning to rest: A Q-learning approach to flying base station trajectory design with landing spots
CN115659803A (en) Intelligent unloading method for computing tasks under unmanned aerial vehicle twin network mapping error condition
CN116634498A (en) Low orbit satellite constellation network edge calculation multistage unloading method based on reinforcement learning
CN113821346B (en) Edge computing unloading and resource management method based on deep reinforcement learning
CN117580105A (en) Unmanned aerial vehicle task unloading optimization method for power grid inspection
CN115756873B (en) Mobile edge computing and unloading method and platform based on federation reinforcement learning
CN116737391A (en) Edge computing cooperation method based on mixing strategy in federal mode
CN116774584A (en) Unmanned aerial vehicle differentiated service track optimization method based on multi-agent deep reinforcement learning
CN115967430A (en) Cost-optimal air-ground network task unloading method based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination