CN115567978A - System and method for joint optimization of computation offloading and resource allocation under multi-constraint edge environment - Google Patents

System and method for joint optimization of computation offloading and resource allocation under multi-constraint edge environment

Info

Publication number
CN115567978A
CN115567978A (application CN202211200913.9A)
Authority
CN
China
Prior art keywords
resource allocation
network
task
tasks
mds
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211200913.9A
Other languages
Chinese (zh)
Inventor
陈哲毅
黄思进
张俊杰
熊兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Fuzhou University
Priority date / filing date: 2022-09-29 (CN202211200913.9A)
Publication of CN115567978A: 2023-01-03
Also filed as PCT/CN2022/126471 (published as WO2024065903A1) and NL2033996A
Legal status: Pending

Classifications

    • H04L67/60: Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources (under H04L67/00 Network arrangements or protocols for supporting network services or applications; H04L67/50 Network services)
    • H04W28/06: Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information (under H04W28/00 Network traffic management; H04W28/02 Traffic management, e.g. flow control or congestion control)
    • H04W28/0925: Management of load balancing or load distribution using policies (under H04W28/08 Load balancing or load distribution; H04W28/09 Management thereof)
    • H04W28/0975: Quality of Service [QoS] parameters for reducing delays (under H04W28/0958 Management based on metrics or performance parameters; H04W28/0967 Quality of Service [QoS] parameters)

Abstract

The invention relates to a joint optimization system and method for computation offloading and resource allocation in a multi-constraint edge environment. A unified computation offloading and resource allocation model is designed for a dynamic MEC system under multiple constraints, taking the delay and energy consumption of task execution as the optimization targets. A task priority preprocessing mechanism is designed, which assigns priorities to tasks according to the data volume of the tasks and the performance of the mobile devices, and a joint optimization method for computation offloading and resource allocation based on deep reinforcement learning, JOA-RL, is proposed. In the JOA-RL method, the critic network adopts a value-function-based single-step update mode and evaluates the current offloading scheme and resource scheduling strategy, while the actor network adopts a policy-gradient-based update mode and outputs the offloading scheme and resource scheduling strategy. The method has a marked effect on improving the task execution success rate and reducing the delay and energy consumption of task execution.

Description

System and method for joint optimization of computation offloading and resource allocation under multi-constraint edge environment
Technical Field
The invention relates to a system and a method for joint optimization of computation offloading and resource allocation under a multi-constraint edge environment.
Background
With the rapid development and popularization of communication technologies and mobile devices, a variety of emerging applications continue to appear. These applications often collect large amounts of sensory data and involve computation-intensive tasks to support high-quality intelligent services, which poses significant challenges to the hardware performance of mobile devices. However, due to device size and manufacturing cost, mobile devices are usually equipped with batteries of limited capacity and processors of limited computing power, which cannot support the demand of emerging applications for high-performance, sustainable processing. Cloud computing provides sufficient computing and storage resources, and mobile devices can make up for their deficiencies in hardware performance by means of cloud services. Therefore, one possible solution is to offload the computation-intensive tasks on a mobile device to a remote cloud with sufficient resources for execution and feed the results back to the mobile device after the tasks are completed. However, the long distance between the mobile device and the remote cloud can cause severe data transmission delay, which may not meet the requirements of delay-sensitive applications and can also significantly degrade the user's service experience.
Compared with cloud computing, Mobile Edge Computing (MEC) deploys computing and storage resources at the network edge, closer to the mobile devices. Therefore, using MEC for computation offloading can effectively avoid the network congestion encountered in cloud computing, shorten the response time of network services, and better meet users' basic requirements for privacy protection. An MEC server has fewer resources than a cloud server but is more flexible. Hence, how to achieve reasonable resource allocation in a resource-constrained MEC system is a difficulty. In addition, mobile devices often need to run continuously to support various intelligent applications, but are limited by battery capacity, which also affects the computation offloading process of tasks to a certain extent. Integrating MEC with radio-frequency-based Wireless Power Transfer (WPT) has recently become a viable and promising solution for providing on-demand energy to the radio transceivers of wireless mobile devices. However, the combined constraints of energy and delay bring new challenges to computation offloading and resource allocation in the edge environment; therefore, an effective computation offloading and resource allocation method needs to be designed.
Disclosure of Invention
In view of this, the present invention provides a joint optimization system and method for computation offloading and resource allocation in a multi-constraint edge environment, which can obtain an optimal policy for computation offloading and resource allocation in a dynamic MEC environment.
To achieve this purpose, the invention adopts the following technical scheme:
a joint optimization system for computation offload and resource allocation under a multi-constrained edge environment comprises a base station BS, an MEC server and N chargeable mobile devices MDs, wherein the N chargeable mobile devices MDs are recorded as a set MD = { MDS = 1 ,MD 2 ,...MD i ...,MD N }; the chargeable mobile equipment MDs is accessed to the base station BS through a 5G or LTE mode, and the base station BS is provided with an MEC server.
Further, the MDs are equipped with Energy Harvesting (EH) components and are powered by energy harvested from Radio Frequency (RF) signals.
Further, when a chargeable mobile device MD generates tasks, the computing tasks are offloaded to the MEC server for execution or executed locally, and tasks with higher priority tend to be offloaded to the MEC server. Specifically, the priority pr_i^t is defined by a formula (reproduced as an image in the original) in which h_i^t represents the transmission channel gain in sub-slot t, d_i^t is the data volume of Task_i, f_i is the computing power of MD_i, and P_i denotes the transmission power of MD_i.
An optimization method of the joint optimization system for computation offloading and resource allocation in a multi-constraint edge environment comprises the following steps:
Step S1, generating an offloading decision and a resource allocation decision based on the joint optimization model for computation offloading and resource allocation, according to the tasks generated on different MDs, the offloading priorities of the tasks, the battery levels of the MDs, and the computing resources currently available on the MEC server;
Step S2, issuing communication resources according to the resource allocation decision, and the MDs offloading the tasks to the local device or the MEC server for execution according to the offloading decision;
Step S3, the job scheduler assigning jobs from the job sequence to the server according to the resource allocation decision.
Further, the joint optimization model for computation offloading and resource allocation is constructed and trained based on Python 3.6 and the open-source framework PyTorch, specifically comprising the following steps:
(1) Obtain the computing power f_i of MD_i, the computing power f^e of the MEC server, and the network bandwidth B, and initialize the system;
(2) Perform training: in each training step, input the observed system environment state s_t into the actor network, execute the action a_t output by the actor network in the environment, and perform the corresponding offloading computation and resource allocation operations;
(3) Calculate the corresponding reward according to the reward formula; the environment feeds back the cumulative task execution reward r_t of this step and the next state s_{t+1}, and the training sample is stored into the experience replay pool via M.push(s_t, a_t, r_t, s_{t+1});
(4) When the number of training samples stored in M reaches N, randomly select N records for training the network parameters to obtain the final joint optimization model for computation offloading and resource allocation.
Further, initializing the system specifically comprises: based on the state space, action space, and reward function, first initialize the parameters θ^μ of the actor network and θ^Q of the critic network; then assign the actor network parameters θ^μ to the target actor network parameters θ^μ′, and the critic network parameters θ^Q to the target critic network parameters θ^Q′, while initializing the experience replay pool M, the number of training rounds P, and the time-series length T_max.
Further, the state space, action space, and reward function are as follows:
State space: the state space contains the tasks Task_t generated on all MDs in sub-slot t, the offloading priorities pr_t of the tasks, the battery levels b_t of the MDs, and the computing resources f_t^e currently available on the MEC server. Thus, the system state in sub-slot t is represented as:
s_t = {Task_t, pr_t, b_t, f_t^e}    Equation (14)
where Task_t, pr_t, and b_t collect the corresponding quantities of all N MDs.
Action space: the DRL agent makes the computation offloading and resource allocation actions according to the current system state; the action space contains the offloading decisions α_t, the upload bandwidth allocations w_t of the tasks, and the MEC server computing resources p_t allocated to the tasks. Thus, the action in sub-slot t is represented as:
a_t = {α_t, w_t, p_t}    Equation (15)
where α_t, w_t, and p_t collect the corresponding decisions for all N MDs.
Reward function: the goal of the system is to minimize the weighted sum of the system delay and energy consumption costs under the constraints of the optimization problem P1; therefore, at sub-slot t, the immediate reward of the system is expressed as:
r_t = −(w_1·F(T_t) + w_2·F(E_t)) − Pu·1{task failure}    Equation (16)
where w_1 and w_2 respectively denote the weights of the delay and the energy consumption generated by task execution, F denotes a normalization function, and Pu denotes the penalty coefficient for task failure.
Further, the training specifically comprises: training the critic network θ^Q to fit Q(s_t, a_t); when Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t). Q(s_t, a_t) is expressed as:
Q(s_t, a_t) = E_env[r(s_t, a_t) + γ·Q(s_{t+1}, μ(s_{t+1}))]    Equation (17)
where the actor network θ^μ outputs the action a_t that maximizes the Q value according to the current state s_t; this process is represented as:
a_t = μ(s_t | θ^μ)    Equation (18)
The performance objective of the actor network is defined as:
J(θ^μ) = E[Q(s_t, μ(s_t | θ^μ) | θ^Q)]    Equation (19)
Further, a target actor network θ^μ′ and a target critic network θ^Q′ are defined.
The critic network is responsible for computing the current Q value Q(s_t, a_t), and a target Q value y_t is defined:
y_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′)    Equation (20)
The optimal policy of the actor network is approximated by gradient ascent, and the loss function of the critic network is defined as:
L(θ^Q) = E[(y_t − Q(s_t, a_t | θ^Q))²]    Equation (21)
In each training step, the target actor network and target critic network move toward the actor network and critic network according to the update step τ.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention can generate an appropriate computation offloading and resource allocation scheme according to the available computing resources and network conditions, improving the task execution success rate and reducing the delay and energy consumption of task execution.
2. The invention can assign priorities to tasks according to the task data volume and the performance of the mobile devices.
Drawings
Fig. 1 illustrates the single-edge multi-mobile-device MEC system in an embodiment of the invention;
FIG. 2 illustrates the sequential task workflow in an embodiment of the present invention;
FIG. 3 is a flow chart of the JOA-RL method in an embodiment of the present invention;
FIG. 4 compares the convergence of different methods in an embodiment of the invention;
FIG. 5 illustrates the impact of network bandwidth on different methods in an embodiment of the present invention;
FIG. 6 illustrates the impact of the computing power of the MEC server on different methods in an embodiment of the present invention;
FIG. 7 illustrates the impact of the MD battery maximum capacity on different methods in an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
The invention designs a unified computation offloading and resource allocation model for a dynamic MEC system under multi-constraint conditions, taking the delay and energy consumption of task execution as the optimization targets. A task priority preprocessing mechanism is designed, which can assign priorities to tasks according to the data volume of the tasks and the performance of the mobile devices. Accordingly, for the DRL framework, a state space, an action space, and a reward function for the computation offloading and resource allocation problem in the MEC environment are defined, and the optimization problem is formally expressed as a Markov Decision Process (MDP). Then, a joint optimization method for computation offloading and resource allocation based on deep reinforcement learning, JOA-RL, is proposed. In the JOA-RL method, the critic network adopts a value-function-based single-step update mode and is used to evaluate the current offloading scheme and resource scheduling strategy, while the actor network adopts a policy-gradient-based update mode and is used to output the offloading scheme and resource scheduling strategy.
Referring to FIG. 1, the present invention provides an MEC system composed of a Base Station (BS), an MEC server, and N chargeable Mobile Devices (MDs), denoted as a set MD = {MD_1, MD_2, ..., MD_N}. The MDs access the BS via 5G or LTE, and the BS is equipped with an MEC server. In addition, all MDs are equipped with Energy Harvesting (EH) components and are powered by energy harvested from Radio Frequency (RF) signals.
As shown in FIG. 2, at the beginning of each time slot T, each MD generates a computation task Task_i^t = {d_i^t, c_i^t, T_d}, where d_i^t is the data volume of the task, c_i^t is the computing resources required by the task, and T_d is the maximum completion delay allowed for the task. The MDs draw power from the radio-frequency signals of the BS. A task must be completed within its maximum tolerated delay and within the existing battery charge; otherwise the task is judged to have failed. In the proposed MEC system, tasks from the MDs may be completed with the assistance of the MEC server; the specific communication model, computation model, and energy harvesting model are defined as follows.
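For concreteness, the task tuple can be mirrored in code. The sketch below is illustrative only: the field names and units are assumptions, since the patent specifies only the three quantities.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Computation task generated by a mobile device (MD) in a time slot.

    Field names and units are assumptions for illustration; the patent
    defines only the three quantities themselves.
    """
    data_bits: float     # d_i^t: data volume of the task
    cpu_cycles: float    # c_i^t: computing resources required by the task
    max_delay_s: float   # T_d: maximum completion delay allowed

# Example: a 1 Mbit task needing 1e9 CPU cycles, due within 0.25 s
task = Task(data_bits=1e6, cpu_cycles=1e9, max_delay_s=0.25)
```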
1 Communication model
As shown in FIG. 2, α_i^t is defined as the offloading decision generated for the task of MD_i at the start of time slot T. When α_i^t = 1, MD_i offloads the task to the MEC server for execution; when α_i^t = 0, MD_i executes the task locally. When MD_i chooses to offload its task to the MEC server, the data on which the task computation depends is uploaded accordingly, and the bandwidth for uploading the task is allocated by the BS. Thus, the signal-to-noise ratio of MD_i in sub-slot t is
SNR_i^t = P_i·h_i^t / δ
where δ represents the average power of the white Gaussian noise, and h_i^t and P_i respectively represent the channel gain and transmission power of MD_i in sub-slot t. Accordingly, the rate at which MD_i transmits the computation task is
r_i^t = w_i^t·B_t·log2(1 + SNR_i^t)
where B_t denotes the upload bandwidth shared by all MDs in the current sub-slot t, and w_i^t denotes the proportion of the upload bandwidth that the BS allocates to MD_i for uploading the task in sub-slot t.
2 Computation model
In the proposed MEC system, when an MD generates a task, the task is first added to the task buffer queue of the corresponding MD, and the task added to the queue first is completed before subsequent tasks are executed. Since both the MDs and the MEC server can provide computing services, two computing modes are defined as follows:
(1) Local computing mode
It is assumed that the computing power (i.e., CPU frequency) of different MDs may differ but does not change during task execution. Thus, the delay and energy consumption of the local computing mode are defined as
T_i^{l,t} = c_i^t / f_i
E_i^{l,t} = κ·(f_i)²·c_i^t
where f_i denotes the CPU frequency of MD_i, c_i^t denotes the computing resources required by Task_i^t, and κ denotes the effective capacitance coefficient.
(2) Edge computing mode
When an MD offloads its task to the MEC server for execution, the MEC server selects and allocates part of its currently available computing resources to the MD, and returns the result to the MD after the task is executed. Generally, the data size of the computation result is very small, so the delay and energy consumption for downloading the result are negligible. Therefore, the delay and energy consumption of the edge computing mode are respectively defined as
T_i^{e,t} = d_i^t / r_i^t + c_i^t / (p_i^t·f_t^e)
E_i^{e,t} = P_i·d_i^t / r_i^t + P_e·c_i^t / (p_i^t·f_t^e)
where f_t^e denotes the computing resources available on the MEC server at the start of sub-slot t, p_i^t denotes the proportion of computing resources allocated to MD_i in sub-slot t, and P_e denotes the computational power allocated by the MEC server to the task.
Thus, the delay of executing Task_i^t can be expressed as
T_i^t = α_i^t·T_i^{e,t} + (1 − α_i^t)·T_i^{l,t}
and the energy consumption of executing Task_i^t can be expressed as
E_i^t = α_i^t·E_i^{e,t} + (1 − α_i^t)·E_i^{l,t}
where α_i^t represents the offloading decision of Task_i^t.
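A compact sketch of the two computing modes and the decision-weighted totals follows. Since the formulas are images in the original, the expressions below are reconstructions from the surrounding text; the default κ (kappa) value, the P_e argument, and all parameter names are assumptions.

```python
def local_cost(cpu_cycles, f_i, kappa=1e-27):
    """Delay and energy of the local computing mode on MD_i."""
    delay = cpu_cycles / f_i                 # execution time at CPU frequency f_i
    energy = kappa * f_i ** 2 * cpu_cycles   # dynamic CPU energy, coefficient kappa
    return delay, energy

def edge_cost(data_bits, cpu_cycles, rate, p_it, f_e, P_i, P_e):
    """Delay and energy of the edge computing mode (result download ignored)."""
    t_up = data_bits / rate                  # upload time of the task data
    t_exec = cpu_cycles / (p_it * f_e)       # execution on the allocated MEC share
    return t_up + t_exec, P_i * t_up + P_e * t_exec

def total_cost(alpha, data_bits, cpu_cycles, rate, p_it, f_e, f_i, P_i, P_e):
    """Blend the two modes by the offloading decision alpha (1 = offload)."""
    if alpha:
        return edge_cost(data_bits, cpu_cycles, rate, p_it, f_e, P_i, P_e)
    return local_cost(cpu_cycles, f_i)
```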
In order to make quick decisions on a proper computing mode for different tasks, the invention provides a task priority preprocessing mechanism, which assigns priorities to tasks according to the data volume of the tasks and the performance of the mobile devices. The mechanism measures how suitable different tasks are for uploading to the MEC server for execution, and tasks with higher priority tend to be offloaded to the MEC server. Specifically, the priority pr_i^t is defined by a formula (reproduced as an image in the original) in which h_i^t represents the transmission channel gain in sub-slot t, f_i is the computing power of MD_i, and P_i denotes the transmission power of MD_i. Corresponding priorities are assigned to tasks according to their computing environments, so that the total time and energy consumption of task computation are reduced while the successful completion of high-priority tasks is ensured, improving the quality of service.
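The priority formula itself is only reproduced as an image in the source, so the sketch below uses a stand-in heuristic (estimated local execution time against an upload-cost proxy) purely to show where the preprocessing step sits in the pipeline; it is not the patent's formula.

```python
def task_priority(data_bits, f_i, P_i, h_it, cpu_cycles):
    """Stand-in for pr_i^t: NOT the patent's formula (which is an image).

    The heuristic favors offloading tasks that are expensive locally but
    cheap to transmit, using the inputs the patent names (h_i^t, data
    volume, f_i, P_i) plus the required CPU cycles.
    """
    local_time = cpu_cycles / f_i          # rough local execution time
    upload_cost = P_i * data_bits / h_it   # rough transmission-cost proxy
    return local_time / upload_cost

# Higher-priority tasks would then tend to be offloaded first, e.g.:
# tasks.sort(key=lambda t: task_priority(...), reverse=True)
```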
3 Energy harvesting model
In the proposed MEC system, all MDs are equipped with rechargeable batteries with a maximum capacity of B_max, and the battery level of MD_i at the beginning of sub-slot t is denoted b_i^t. In particular, the energy transmitter (ET) and the MEC server are deployed at the network edge, allowing the ET to provide on-demand energy through WPT to the Central Processing Unit (CPU) and radio transceiver of the wireless devices in a fully controllable manner, with the harvested energy fed into the batteries of the MDs. With the harvested energy, the MDs may offload computing tasks to the MEC server for execution or execute the tasks locally. To simplify the model, it is assumed that energy arrives at the MDs in the form of energy packets during the harvesting process; that is, at the beginning of each sub-slot t, an MD obtains an energy packet through its EH module and feeds it into the battery, with the packet size denoted e_t. The change of an MD's battery level under the different execution states of a task is considered as follows:
(1) If the task in sub-slot t fails because the decision cannot be completed within the energy MD_i can supply, or if no task is currently executed, only the harvested charge of the wireless component changes during sub-slot t. Thus, at the beginning of sub-slot t+1, the battery level of MD_i is
b_i^{t+1} = min(b_i^t + e_t, B_max)
(2) If MD_i executes the task locally in sub-slot t with energy consumption E_i^{l,t}, then at the beginning of sub-slot t+1 the battery level of MD_i is
b_i^{t+1} = min(b_i^t − E_i^{l,t} + e_t, B_max)
(3) If MD_i offloads the task to the MEC server in sub-slot t with energy consumption E_i^{e,t}, then at the beginning of sub-slot t+1 the battery level of MD_i is
b_i^{t+1} = min(b_i^t − E_i^{e,t} + e_t, B_max)
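A one-line sketch covers the three battery-update cases above; the min cap at B_max is an assumption implied by the stated maximum capacity, and the names are illustrative.

```python
def battery_update(b_t, e_t, B_max, task_energy=0.0):
    """Battery level of an MD at the start of the next sub-slot.

    Pass task_energy=0.0 for case (1) (no task ran, or it failed before
    consuming energy), the local-execution energy for case (2), or the
    offloading energy for case (3). Capping at B_max is an assumption
    implied by the stated maximum battery capacity.
    """
    return min(b_t - task_energy + e_t, B_max)

# Case (1): battery_update(b, e, B_max)
# Case (2): battery_update(b, e, B_max, task_energy=E_local)
# Case (3): battery_update(b, e, B_max, task_energy=E_edge)
```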
Based on the above system model definitions, the proposed MEC system aims to minimize the weighted sum of the delay and energy consumption overheads produced by executing sequential tasks on the MDs, which can be formalized as an optimization problem P1:
P1: min Σ_t Σ_i (w_1·T_i^t + w_2·E_i^t)  s.t. C1-C5
where w_1 and w_2 respectively denote the weights of the delay and the energy consumption generated by task execution. C1 indicates that a task can only be executed locally or offloaded to the MEC server. C2 indicates that the energy consumed by executing a task cannot exceed the currently available battery level of the device. C3 indicates that the execution time of a task cannot exceed its maximum tolerated delay T_d. C4 constrains the proportion of upload bandwidth allocated to offloaded tasks. C5 constrains the proportion of MEC server computing resources allocated to offloaded tasks.
In this embodiment, referring to FIG. 3, the present invention provides JOA-RL, a joint optimization method for computation offloading and resource allocation based on deep reinforcement learning; the computation offloading and resource allocation in the MEC system are treated as the environment, and the DRL agent selects corresponding actions by interacting with the environment.
The state space, action space, and reward function defined in the JOA-RL method are as follows:
State space: the state space contains the tasks Task_t generated on all MDs in sub-slot t, the offloading priorities pr_t of the tasks, the battery levels b_t of the MDs, and the computing resources f_t^e currently available on the MEC server. Thus, the system state in sub-slot t can be expressed as:
s_t = {Task_t, pr_t, b_t, f_t^e}    Equation (14)
where Task_t, pr_t, and b_t collect the corresponding quantities of all N MDs.
Action space: the DRL agent performs the computation offloading and resource allocation actions according to the current system state. The action space contains the offloading decisions α_t, the upload bandwidth allocations w_t of the tasks, and the MEC server computing resources p_t allocated to the tasks. Thus, the action in sub-slot t can be expressed as:
a_t = {α_t, w_t, p_t}    Equation (15)
where α_t, w_t, and p_t collect the corresponding decisions for all N MDs.
Reward function: the objective of the proposed MEC system is to minimize the weighted sum of the system delay and energy consumption costs while satisfying the constraints of the optimization problem P1. Thus, at sub-slot t, the immediate reward of the system can be expressed as:
r_t = −(w_1·F(T_t) + w_2·F(E_t)) − Pu·1{task failure}    Equation (16)
where w_1 and w_2 respectively denote the weights of the delay and the energy consumption generated by task execution, F denotes a normalization function that maps the delay and energy values into the same value interval, and Pu denotes the penalty coefficient for task failure.
In the process of optimizing computation offloading and resource allocation in a multi-constraint MEC environment, the DRL agent selects an action a_t (computation offloading and resource allocation) according to the current system state s_t (including task states and resource usage) under policy μ. The environment feeds back a reward r_t according to action a_t and transitions to a new system state s_{t+1}; this process can be described as an MDP.
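The MDP loop can be pictured with a toy environment; everything below (class name, toy dynamics, magnitudes) is an illustrative assumption, as the patent defines only the state, action, and reward semantics.

```python
import random

class MECEnvSketch:
    """Toy sketch of the MDP loop s_t -> a_t -> (r_t, s_{t+1})."""

    def __init__(self, w1=0.5, w2=0.5, penalty=1.0):
        self.w1, self.w2, self.penalty = w1, w2, penalty

    def reset(self):
        # s_t = {Task_t, pr_t, b_t, f_t^e} flattened into a feature vector
        self.state = [random.random() for _ in range(4)]
        return self.state

    def step(self, action):
        alpha, w, p = action          # offload flag, bandwidth share, compute share
        delay = random.random() * (0.5 if alpha else 1.0)   # toy dynamics
        energy = random.random() * (1.0 if alpha else 0.5)
        failed = delay > 0.9          # toy stand-in for the delay constraint
        reward = -(self.w1 * delay + self.w2 * energy)      # weighted cost
        if failed:
            reward -= self.penalty    # Pu: task-failure penalty
        self.state = [random.random() for _ in range(4)]
        return self.state, reward, failed

env = MECEnvSketch()
s = env.reset()
s_next, r, done = env.step((1, 0.3, 0.2))
```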
In this embodiment, JOA-RL can effectively approach the optimal policy for computation offloading and resource allocation in a dynamic MEC environment, achieves a better balance between delay and energy consumption under the constraints of the maximum task delay and the device battery level, and shows a higher task execution success rate.
The JOA-RL method utilizes the Deep Deterministic Policy Gradient (DDPG) to train the DNNs and obtain the optimal computation offloading and resource allocation policy.
In the JOA-RL method, the critic network adopts a value-function-based single-step update mode and is responsible for evaluating the Q value corresponding to each action, while the actor network adopts a policy-gradient-based update mode and is responsible for generating the corresponding computation offloading and resource allocation actions in the current system state.
Using the critic network can effectively reduce the error of the policy gradient, because the critic network can guide the actor network to learn the optimal policy. Furthermore, by integrating DNNs, the JOA-RL method can handle high-dimensional state spaces well.
The key steps of the JOA-RL method provided by the invention are shown as Algorithm 1 (reproduced as images in the original publication).
Based on the definitions of the state space in Equation (14), the action space in Equation (15), and the reward function in Equation (16), first the parameters θ^μ of the actor network and θ^Q of the critic network are initialized. Then, the actor network parameters θ^μ are assigned to the target actor network parameters θ^μ′, and the critic network parameters θ^Q are assigned to the target critic network parameters θ^Q′, while the experience replay pool M, the number of training rounds P, and the time-series length T_max are initialized. In particular, independent target networks are employed in the method, which reduces the correlation among data and enhances the stability and robustness of the method; the introduced experience replay mechanism likewise reduces data correlation.
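A minimal experience replay pool matching M.push(s_t, a_t, r_t, s_{t+1}) might look as follows; the capacity and uniform sampling are assumptions, only the push/sample semantics come from the text.

```python
import random
from collections import deque

class ReplayPool:
    """Sketch of the experience replay pool M."""

    def __init__(self, capacity=100_000):   # capacity is an assumed value
        self.buffer = deque(maxlen=capacity)

    def push(self, s_t, a_t, r_t, s_next):
        self.buffer.append((s_t, a_t, r_t, s_next))

    def sample(self, n):
        # Randomly select n records for training, as in the method
        return random.sample(self.buffer, n)

    def __len__(self):
        return len(self.buffer)
```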
After initialization is completed, training begins. In each training round, at each step the method feeds the observed system environment state s_t into the actor network, executes the action a_t output by the actor network in the environment, and performs the corresponding offloading computation and resource allocation operations (lines 5-11). The corresponding reward is calculated according to the reward formula, and the environment feeds back the cumulative task execution reward r_t of this step and the next state s_{t+1} (line 12).
Since the system states and resource allocation actions in the MEC environment take continuous values, the JOA-RL method treats the problem as an MDP with continuous states and actions. The JOA-RL method trains the critic network θ^Q to fit Q(s_t, a_t); when Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t). However, the mapping from s_t to a_t is very complex: given s_t, the Q value is a high-dimensional, multi-level nested nonlinear function of a_t. To address this problem, the actor network θ^μ is used to fit this complex mapping. Specifically, Q(s_t, a_t) is expressed as:
Q(s_t, a_t) = E_env[r(s_t, a_t) + γ·Q(s_{t+1}, μ(s_{t+1}))]    Equation (17)
where the actor network θ^μ outputs the action a_t that maximizes the Q value according to the current state s_t; this process can be expressed as:
a_t = μ(s_t | θ^μ)    Equation (18)
In this method, the performance objective of the actor network is defined as:
J(θ^μ) = E[Q(s_t, μ(s_t | θ^μ) | θ^Q)]    Equation (19)
when the number of training samples stored in M reaches N, N records are randomly selected for training the network parameters (line 14). An important problem faced by the method in optimizing the loss function is that the performance is unstable when derivation optimization is carried out on an expression containing max, and the update parameters can not necessarily enable max(s) t+1 ,a t+1 ) Changing towards the ideal direction. This is especially true when the motion space is continuous, resulting in a training Q(s) t ,a t ) The target network itself is moving while moving toward the target network.
To solve this problem, in the method, target operator networks θ are defined, respectively μ And a target critic network theta Q
The critic network is responsible for calculating the current Q value Q(s) t ,a t ) And defines a target Q value y t
y t =r t +γQ′(s t+1 ,μ′(s t+1μ′ )|θ Q′ ) Formula (20)
The strategy optimal solution of the actor network is approximated by adopting a gradient ascending method, and the loss function of the critic network is defined as follows:
Figure BDA0003872432350000171
in each training step, the target operator network and the target critic network approach to the operator network and the critic network according to the updating step tau. Compared with the method of simply copying the network parameters, the updating method can make the method more stable.
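Equations (17)-(21) correspond to a standard DDPG update step. The PyTorch sketch below illustrates one such step under assumed network sizes; the learning rates and discount factor follow the embodiment, while everything else is illustrative rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn

state_dim, action_dim, tau, gamma = 4, 3, 0.005, 0.95  # tau is an assumed value

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out))

actor, critic = mlp(state_dim, action_dim), mlp(state_dim + action_dim, 1)
target_actor, target_critic = mlp(state_dim, action_dim), mlp(state_dim + action_dim, 1)
target_actor.load_state_dict(actor.state_dict())      # theta_mu' <- theta_mu
target_critic.load_state_dict(critic.state_dict())    # theta_Q'  <- theta_Q

actor_opt = torch.optim.Adam(actor.parameters(), lr=6e-4)    # embodiment lr
critic_opt = torch.optim.Adam(critic.parameters(), lr=6e-3)  # embodiment lr

def ddpg_step(s, a, r, s_next):
    """One training step over a sampled minibatch (Equations 17-21)."""
    with torch.no_grad():                               # target Q value, Eq. (20)
        a_next = target_actor(s_next)
        y = r + gamma * target_critic(torch.cat([s_next, a_next], dim=1))
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, y)          # Eq. (21)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Minimizing -Q ascends the actor objective J, Eq. (19)
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)   # soft update by step tau

# Example minibatch of 32 transitions
s = torch.randn(32, state_dim); a = torch.rand(32, action_dim)
r = torch.randn(32, 1); s2 = torch.randn(32, state_dim)
ddpg_step(s, a, r, s2)
```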
Example 1:
The joint optimization model for computation offloading and resource allocation proposed in this embodiment is constructed and trained based on Python 3.6 and the open-source framework PyTorch. All simulation experiments were carried out on a laptop equipped with an Intel i5-7300HQ CPU with a clock frequency of 2.5 GHz and 8 GB of memory. In the experiments, all MDs are randomly distributed within the coverage area of the AP and share its bandwidth, and the AP is equipped with an MEC server. The computing power of each MD is distributed in [1, 1.2] GHz, and the computing power of the MEC server is 20 GHz. Under the default experimental settings, 10 MDs share a bandwidth of 10 MHz, the duration of each time slot T is 1 s, the duration of each sub-slot t is 0.25 s, and one training round contains 48 time slots T.
During training, the learning rate of the actor network is 0.0006, the learning rate of the critic network is 0.006, and the reward discount factor γ is set to 0.95. After the JOA-RL method completes training, it is applicable to the joint optimization of computation offloading and resource allocation in a changing MEC environment.
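Collected as a configuration sketch (values are those reported in this embodiment; the key names are illustrative):

```python
# Default experimental settings from this embodiment; key names are
# illustrative, values are as reported in the text.
CONFIG = {
    "num_mds": 10,
    "md_cpu_ghz": (1.0, 1.2),   # per-MD computing power range
    "mec_cpu_ghz": 20.0,        # MEC server computing power
    "bandwidth_mhz": 10.0,      # shared upload bandwidth
    "slot_T_s": 1.0,            # time slot duration
    "subslot_t_s": 0.25,        # sub-slot duration
    "slots_per_round": 48,      # time slots per training round
    "actor_lr": 6e-4,
    "critic_lr": 6e-3,
    "gamma": 0.95,              # reward discount factor
}
```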
Based on the above settings, extensive simulation experiments were performed to evaluate the performance of the proposed deep-reinforcement-learning-based joint optimization method for computation offloading and resource allocation. To analyze the effectiveness and advantages of the proposed JOA-RL method, it was compared with the following 5 baseline methods.
Local: all tasks are executed on the MDs;
MEC: all tasks are offloaded to the MEC server for execution;
Random: tasks are executed on the MDs or the MEC server at random;
Greedy: on the premise of meeting the maximum tolerated delay of a task, execution on the MDs is preferred;
DQN: a value-based DRL method that learns a deterministic policy by computing the probability of each computation offloading and resource allocation action.
As shown in FIG. 4(a), comparing the convergence of the different methods, the Local, MEC, Random, and Greedy methods make single-step decisions and have no learning or optimization process. When processing sequential tasks, the Local, MEC, and Random methods perform worse than the other three methods. This is because they place tasks blindly, without taking the current system state and task characteristics into account, so a large portion of tasks fail by exceeding the delay and battery constraints. For example, compared with the MEC server, the limited local computing power may prevent tasks from completing within the delay constraint; conversely, if tasks are frequently offloaded to the MEC server, the battery of an MD may not support the offloading process, causing task failures. In contrast to the JOA-RL and DQN methods, the Greedy method only considers the immediate reward a task can achieve and does not account well for long-term reward. In the early stage of training, the Greedy method outperforms the two DRL-based methods, JOA-RL and DQN; in the later stage, JOA-RL and DQN perform better because they consider the long-term reward of the system. The proposed JOA-RL method integrates value-based and policy-based DRL, can cope with high-dimensional continuous action spaces, and converges faster, so it performs better than the DQN method. As shown in FIG. 4(b), comparing the average energy consumed per successfully completed task, the MEC and Local methods exhibit the highest and lowest average task energy consumption, respectively. The Greedy method preferentially executes tasks locally under the maximum tolerated delay, so its average task energy consumption is only higher than that of the Local method. After convergence, the JOA-RL method also outperforms the DQN method. As shown in FIG. 4(c), comparing the average task waiting time of the different methods, the JOA-RL method after convergence is superior to the other 5 methods, while the Local method, limited by local computing capacity and the long time required to complete tasks, has an average task waiting time far higher than the other 5 methods. As shown in FIG. 4(d), the task success rates of the different methods are compared.
As shown in FIG. 5, the Local method is unaffected by changes in network bandwidth, since it involves no computation offloading. For the MEC method, when the network bandwidth is low, the bandwidth allocated to each uploaded task is small, resulting in long task upload times, and many tasks fail to meet the maximum delay constraint, so the MEC method performs poorly. As the network bandwidth increases, the performance of the 5 methods other than Local tends to improve. Among them, the performance improvement of the MEC method is most significant, because its performance depends heavily on network bandwidth. Compared with the DQN method, the proposed JOA-RL method handles the continuous resource allocation problem better and achieves lower delay and energy consumption, which shows that JOA-RL is more advantageous in the joint optimization of computation offloading and resource allocation. When the network bandwidth increases to a certain degree, the performance of the 5 methods other than Local basically stabilizes. This is because, as the network bandwidth increases, fewer tasks fail by exceeding the delay constraint during computation offloading, but the remaining constraint of MD battery capacity prevents further performance improvement.
As shown in FIG. 6, the Local method is unaffected by changes in the computing power of the MEC server, since it involves no computation offloading. As the computing power of the MEC server increases, the performance of the 5 methods other than Local tends to improve. Compared with the DQN method, the proposed JOA-RL method achieves lower delay and energy consumption and handles the continuous resource allocation problem better, which shows that JOA-RL is more advantageous in the joint optimization of computation offloading and resource allocation. When the computing power of the MEC server increases to a certain extent, the performance of the 5 methods other than Local also tends to stabilize. This is because, as the MEC server's computing power increases, fewer tasks fail by exceeding the delay constraint during computation offloading, but the constraint of MD battery capacity prevents further performance improvement.
As shown in FIG. 7, for the Local method, the energy consumed by local task computation is lower than the maximum battery capacity, so an increase in the maximum MD battery capacity has no effect on the Local method. For the other five methods, uploading tasks consumes considerable energy, so when the maximum MD battery capacity is small, tasks often fail because the battery cannot support computation offloading. As the maximum MD battery capacity increases, the stored energy can support more computation offloading, so the performance of these five methods rises. When the maximum MD battery capacity increases to a certain extent, failures of computation offloading due to insufficient battery capacity are substantially eliminated, and the performance of these methods stabilizes. Compared with the DQN method, the proposed JOA-RL method handles the continuous resource allocation problem better and achieves lower delay and energy consumption, which again shows that JOA-RL is more advantageous in the joint optimization of computation offloading and resource allocation.
The above description is only a preferred embodiment of the present invention; all equivalent changes and modifications made in accordance with the claims of the present invention shall fall within the scope of the present invention.

Claims (9)

1. A joint optimization system for computation offloading and resource allocation in a multi-constraint edge environment, characterized by comprising
a base station BS, an MEC server, and N chargeable mobile devices MDs, wherein the N chargeable mobile devices MDs are denoted as a set MD = {MD_1, MD_2, ..., MD_i, ..., MD_N}; the chargeable mobile devices MDs access the base station BS via 5G or LTE, and the base station BS is equipped with an MEC server.
2. The joint optimization system for computation offloading and resource allocation in a multi-constraint edge environment according to claim 1, wherein the MDs are equipped with energy harvesting components and are powered by energy harvested from RF signals.
3. The joint optimization system for computation offloading and resource allocation in a multi-constraint edge environment according to claim 1, wherein when a chargeable mobile device MD generates tasks, the computing tasks are offloaded to the MEC server for execution or executed locally, and tasks with higher priority tend to be offloaded to the MEC server; specifically, the priority pr_i^t is defined by a formula (reproduced as an image in the original) in which h_i^t represents the transmission channel gain in sub-slot t, d_i^t is the data volume of Task_i, f_i is the computing power of MD_i, and P_i denotes the transmission power of MD_i.
4. The optimization method of the joint optimization system for computation offloading and resource allocation in a multi-constraint edge environment according to claim 1, characterized by comprising the following steps:
Step S1, generating an offloading decision and a resource allocation decision based on the joint optimization model for computation offloading and resource allocation, according to the tasks generated on different MDs, the offloading priorities of the tasks, the battery levels of the MDs, and the computing resources currently available on the MEC server;
Step S2, issuing communication resources according to the resource allocation decision, and the MDs offloading the tasks to the local device or the MEC server for execution according to the offloading decision;
Step S3, the job scheduler assigning jobs from the job sequence to the server according to the resource allocation decision.
5. The optimization method according to claim 4, wherein the joint optimization model for computation offloading and resource allocation is constructed and trained based on Python 3.6 and the open-source framework PyTorch, specifically comprising the following steps:
(1) Obtaining the computing power f_i of MD_i, the computing power f^e of the MEC server, and the network bandwidth B, and initializing the system;
(2) Performing training: in each training step, inputting the observed system environment state s_t into the actor network, executing the action a_t output by the actor network in the environment, and performing the corresponding offloading computation and resource allocation operations;
(3) Calculating the corresponding reward according to the reward formula, the environment feeding back the cumulative task execution reward r_t of this step and the next state s_{t+1}, and storing the training sample into the experience replay pool via M.push(s_t, a_t, r_t, s_{t+1});
(4) When the number of training samples stored in M reaches N, randomly selecting N records for training the network parameters to obtain the final joint optimization model for computation offloading and resource allocation.
6. The optimization method according to claim 4, wherein initializing the system specifically comprises: based on the state space, action space, and reward function, first initializing the parameters θ^μ of the actor network and θ^Q of the critic network; then assigning the actor network parameters θ^μ to the target actor network parameters θ^μ′, and the critic network parameters θ^Q to the target critic network parameters θ^Q′, while initializing the experience replay pool M, the number of training rounds P, and the time-series length T_max.
7. The optimization method according to claim 6, wherein the state space, action space, and reward function are as follows:
State space: the state space contains the tasks Task_t generated on all MDs in sub-slot t, the offloading priorities pr_t of the tasks, the battery levels b_t of the MDs, and the computing resources f_t^e currently available on the MEC server; thus, the system state in sub-slot t is represented as:
s_t = {Task_t, pr_t, b_t, f_t^e}    Equation (14)
where Task_t, pr_t, and b_t collect the corresponding quantities of all N MDs;
Action space: the DRL agent makes the computation offloading and resource allocation actions according to the current system state; the action space contains the offloading decisions α_t, the upload bandwidth allocations w_t of the tasks, and the MEC server computing resources p_t allocated to the tasks; therefore, the action in sub-slot t is represented as:
a_t = {α_t, w_t, p_t}    Equation (15)
where α_t, w_t, and p_t collect the corresponding decisions for all N MDs;
Reward function: the goal of the system is to minimize the weighted sum of the system delay and energy consumption costs under the constraints of the optimization problem P1; therefore, at sub-slot t, the immediate reward of the system is expressed as:
r_t = −(w_1·F(T_t) + w_2·F(E_t)) − Pu·1{task failure}    Equation (16)
where w_1 and w_2 respectively denote the weights of the delay and the energy consumption generated by task execution, F denotes a normalization function, and Pu denotes the penalty coefficient for task failure.
8. The optimization method according to claim 4, wherein the training specifically comprises: training the critic network θ^Q to fit Q(s_t, a_t); when Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t); Q(s_t, a_t) is expressed as:
Q(s_t, a_t) = E_env[r(s_t, a_t) + γ·Q(s_{t+1}, μ(s_{t+1}))]    Equation (17)
wherein the actor network θ^μ outputs the action a_t that maximizes the Q value according to the current state s_t; this process is represented as:
a_t = μ(s_t | θ^μ)    Equation (18)
The performance objective of the actor network is defined as:
J(θ^μ) = E[Q(s_t, μ(s_t | θ^μ) | θ^Q)]    Equation (19)
9. The optimization method according to claim 4, wherein a target actor network θ^μ′ and a target critic network θ^Q′ are defined;
the critic network is responsible for computing the current Q value Q(s_t, a_t), and a target Q value y_t is defined:
y_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′)    Equation (20)
The optimal policy of the actor network is approximated by gradient ascent, and the loss function of the critic network is defined as:
L(θ^Q) = E[(y_t − Q(s_t, a_t | θ^Q))²]    Equation (21)
In each training step, the target actor network and target critic network move toward the actor network and critic network according to the update step τ.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination