CN115567978A - System and method for joint optimization of computation offloading and resource allocation under multi-constraint edge environment - Google Patents

System and method for joint optimization of computation offloading and resource allocation under multi-constraint edge environment

Info

Publication number
CN115567978A
CN115567978A (application CN202211200913.9A)
Authority
CN
China
Prior art keywords
resource allocation
network
task
tasks
mds
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211200913.9A
Other languages
Chinese (zh)
Inventor
陈哲毅
黄思进
张俊杰
熊兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Fuzhou University
Priority date / filing date: 2022-09-29 (CN202211200913.9A)
Publication of CN115567978A: 2023-01-03
Also filed as PCT/CN2022/126471 (published as WO2024065903A1) and NL2033996A
Legal status: Pending

Classifications

    • H04L67/60: Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources (under H04L67/00 Network arrangements or protocols for supporting network services or applications; H04L67/50 Network services)
    • H04W28/06: Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information (under H04W28/00 Network traffic management; H04W28/02 Traffic management, e.g. flow control or congestion control)
    • H04W28/0925: Management of load balancing or load distribution using policies (under H04W28/08 Load balancing or load distribution; H04W28/09 Management thereof)
    • H04W28/0975: Quality of Service [QoS] parameters for reducing delays (under H04W28/0958 Management based on metrics or performance parameters; H04W28/0967 Quality of Service [QoS] parameters)

Abstract

The invention relates to a joint optimization system and method for computation offloading and resource allocation in a multi-constraint edge environment. A unified computation offloading and resource allocation model is designed for a dynamic MEC system under multiple constraints, taking the delay and energy consumption of task execution as the optimization targets. A task priority preprocessing mechanism is designed, which assigns priorities to tasks according to the data volume of the tasks and the performance of the mobile devices, and a joint optimization method for computation offloading and resource allocation based on deep reinforcement learning, JOA-RL, is proposed. In the JOA-RL method, the critic network adopts a value-function-based single-step update mode and evaluates the current offloading scheme and resource scheduling strategy, while the actor network adopts a policy-gradient-based update mode and outputs the offloading scheme and resource scheduling strategy. The method has a marked effect on improving the task execution success rate and reducing the delay and energy consumption of task execution.

Description

System and method for joint optimization of computation offloading and resource allocation under multi-constraint edge environment
Technical Field
The invention relates to a system and a method for joint optimization of computation offloading and resource allocation under a multi-constraint edge environment.
Background
With the rapid development and popularization of communication technologies and mobile devices, a variety of emerging applications continue to appear. These applications often collect large amounts of sensory data and involve computation-intensive tasks to support high-quality intelligent services, which poses significant challenges to the hardware performance of mobile devices. However, due to device size and manufacturing cost, mobile devices are usually equipped with batteries of limited capacity and processors of limited computing power, which cannot support the demand of emerging applications for high-performance, sustainable processing. Cloud computing provides sufficient computing and storage resources, and mobile devices can make up for their deficiencies in hardware performance by means of cloud services. Therefore, one possible solution is to offload the computation-intensive tasks on a mobile device to a remote cloud with sufficient resources for execution and feed the results back to the mobile device after the tasks are completed. However, the long distance between the mobile device and the remote cloud can cause severe data transmission delay, which may not meet the requirements of delay-sensitive applications and can also significantly degrade the user's service experience.
Compared with cloud computing, Mobile Edge Computing (MEC) deploys computing and storage resources at the network edge, closer to the mobile devices. Therefore, using MEC for computation offloading can effectively avoid the network congestion encountered in cloud computing, shorten the response time of network services, and better meet users' basic requirements for privacy protection. An MEC server has fewer resources than a cloud server but is more flexible. Hence, how to achieve reasonable resource allocation in a resource-constrained MEC system is a difficulty. In addition, mobile devices often need to run continuously to support various intelligent applications, but are limited by battery capacity, which also affects the computation offloading process of tasks to a certain extent. Integrating MEC with radio-frequency-based Wireless Power Transfer (WPT) has recently become a viable and promising solution for providing on-demand energy to the radio transceivers of wireless mobile devices. However, the combined constraints of energy and delay bring new challenges to computation offloading and resource allocation in the edge environment; therefore, an effective computation offloading and resource allocation method needs to be designed.
Disclosure of Invention
In view of this, the present invention provides a joint optimization system and method for computation offloading and resource allocation in a multi-constraint edge environment, which can obtain an optimal policy for computation offloading and resource allocation in a dynamic MEC environment.
To achieve this purpose, the invention adopts the following technical scheme:
a joint optimization system for computation offload and resource allocation under a multi-constrained edge environment comprises a base station BS, an MEC server and N chargeable mobile devices MDs, wherein the N chargeable mobile devices MDs are recorded as a set MD = { MDS = 1 ,MD 2 ,...MD i ...,MD N }; the chargeable mobile equipment MDs is accessed to the base station BS through a 5G or LTE mode, and the base station BS is provided with an MEC server.
Further, the MDs are equipped with Energy Harvesting (EH) components and are powered by energy harvested from Radio Frequency (RF) signals.
Further, when a chargeable mobile device MD generates tasks, the computing tasks are offloaded to the MEC server for execution or executed locally, and tasks with higher priority tend to be offloaded to the MEC server. Specifically, the priority pr_i^t is defined by a formula (reproduced as an image in the original) in which h_i^t represents the transmission channel gain in sub-slot t, d_i^t is the data volume of Task_i, f_i is the computing power of MD_i, and P_i denotes the transmission power of MD_i.
An optimization method of the joint optimization system for computation offloading and resource allocation in a multi-constraint edge environment comprises the following steps:
Step S1, generating an offloading decision and a resource allocation decision based on the joint optimization model for computation offloading and resource allocation, according to the tasks generated on different MDs, the offloading priorities of the tasks, the battery levels of the MDs, and the computing resources currently available on the MEC server;
Step S2, issuing communication resources according to the resource allocation decision, and the MDs offloading the tasks to the local device or the MEC server for execution according to the offloading decision;
Step S3, the job scheduler assigning jobs from the job sequence to the server according to the resource allocation decision.
Further, the joint optimization model for computation offloading and resource allocation is constructed and trained based on Python 3.6 and the open-source framework PyTorch, specifically comprising the following steps:
(1) Obtain the computing power f_i of MD_i, the computing power f^e of the MEC server, and the network bandwidth B, and initialize the system;
(2) Perform training: in each training step, input the observed system environment state s_t into the actor network, execute the action a_t output by the actor network in the environment, and perform the corresponding offloading computation and resource allocation operations;
(3) Calculate the corresponding reward according to the reward formula; the environment feeds back the cumulative task execution reward r_t of this step and the next state s_{t+1}, and the training sample is stored into the experience replay pool via M.push(s_t, a_t, r_t, s_{t+1});
(4) When the number of training samples stored in M reaches N, randomly select N records for training the network parameters to obtain the final joint optimization model for computation offloading and resource allocation.
Further, initializing the system specifically comprises: based on the state space, action space, and reward function, first initialize the parameters θ^μ of the actor network and θ^Q of the critic network; then assign the actor network parameters θ^μ to the target actor network parameters θ^μ′, and the critic network parameters θ^Q to the target critic network parameters θ^Q′, while initializing the experience replay pool M, the number of training rounds P, and the time-series length T_max.
Further, the state space, action space, and reward function are as follows:
State space: the state space contains the tasks Task_t generated on all MDs in sub-slot t, the offloading priorities pr_t of the tasks, the battery levels b_t of the MDs, and the computing resources f_t^e currently available on the MEC server. Thus, the system state in sub-slot t is represented as:
s_t = {Task_t, pr_t, b_t, f_t^e}    Equation (14)
where Task_t, pr_t, and b_t collect the corresponding quantities of all N MDs.
Action space: the DRL agent makes the computation offloading and resource allocation actions according to the current system state; the action space contains the offloading decisions α_t, the upload bandwidth allocations w_t of the tasks, and the MEC server computing resources p_t allocated to the tasks. Thus, the action in sub-slot t is represented as:
a_t = {α_t, w_t, p_t}    Equation (15)
where α_t, w_t, and p_t collect the corresponding decisions for all N MDs.
Reward function: the goal of the system is to minimize the weighted sum of the system delay and energy consumption costs under the constraints of the optimization problem P1; therefore, at sub-slot t, the immediate reward of the system is expressed as:
r_t = −(w_1·F(T_t) + w_2·F(E_t)) − Pu·1{task failure}    Equation (16)
where w_1 and w_2 respectively denote the weights of the delay and the energy consumption generated by task execution, F denotes a normalization function, and Pu denotes the penalty coefficient for task failure.
Further, the training specifically comprises: training the critic network θ^Q to fit Q(s_t, a_t); when Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t). Q(s_t, a_t) is expressed as:
Q(s_t, a_t) = E_env[r(s_t, a_t) + γ·Q(s_{t+1}, μ(s_{t+1}))]    Equation (17)
where the actor network θ^μ outputs the action a_t that maximizes the Q value according to the current state s_t; this process is represented as:
a_t = μ(s_t | θ^μ)    Equation (18)
The performance objective of the actor network is defined as:
J(θ^μ) = E[Q(s_t, μ(s_t | θ^μ) | θ^Q)]    Equation (19)
Further, a target actor network θ^μ′ and a target critic network θ^Q′ are defined.
The critic network is responsible for computing the current Q value Q(s_t, a_t), and a target Q value y_t is defined:
y_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′)    Equation (20)
The optimal policy of the actor network is approximated by gradient ascent, and the loss function of the critic network is defined as:
L(θ^Q) = E[(y_t − Q(s_t, a_t | θ^Q))²]    Equation (21)
In each training step, the target actor network and target critic network move toward the actor network and critic network according to the update step τ.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention can generate an appropriate computation offloading and resource allocation scheme according to the available computing resources and network conditions, improving the task execution success rate and reducing the delay and energy consumption of task execution.
2. The invention can assign priorities to tasks according to the task data volume and the performance of the mobile devices.
Drawings
Fig. 1 illustrates the single-edge multi-mobile-device MEC system in an embodiment of the invention;
FIG. 2 illustrates the sequential task workflow in an embodiment of the present invention;
FIG. 3 is a flow chart of the JOA-RL method in an embodiment of the present invention;
FIG. 4 compares the convergence of different methods in an embodiment of the invention;
FIG. 5 illustrates the impact of network bandwidth on different methods in an embodiment of the present invention;
FIG. 6 illustrates the impact of the computing power of the MEC server on different methods in an embodiment of the present invention;
FIG. 7 illustrates the impact of the MD battery maximum capacity on different methods in an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
The invention designs a unified computation offloading and resource allocation model for a dynamic MEC system under multi-constraint conditions, taking the delay and energy consumption of task execution as the optimization targets. A task priority preprocessing mechanism is designed, which can assign priorities to tasks according to the data volume of the tasks and the performance of the mobile devices. Accordingly, for the DRL framework, a state space, an action space, and a reward function for the computation offloading and resource allocation problem in the MEC environment are defined, and the optimization problem is formally expressed as a Markov Decision Process (MDP). Then, a joint optimization method for computation offloading and resource allocation based on deep reinforcement learning, JOA-RL, is proposed. In the JOA-RL method, the critic network adopts a value-function-based single-step update mode and is used to evaluate the current offloading scheme and resource scheduling strategy, while the actor network adopts a policy-gradient-based update mode and is used to output the offloading scheme and resource scheduling strategy.
Referring to FIG. 1, the present invention provides an MEC system composed of a Base Station (BS), an MEC server, and N chargeable Mobile Devices (MDs), denoted as a set MD = {MD_1, MD_2, ..., MD_N}. The MDs access the BS via 5G or LTE, and the BS is equipped with an MEC server. In addition, all MDs are equipped with Energy Harvesting (EH) components and are powered by energy harvested from Radio Frequency (RF) signals.
As shown in FIG. 2, at the beginning of each time slot T, each MD generates a computation task Task_i^t = {d_i^t, c_i^t, T_d}, where d_i^t is the data volume of the task, c_i^t is the computing resources required by the task, and T_d is the maximum completion delay allowed for the task. The MDs draw power from the radio-frequency signals of the BS. A task must be completed within its maximum tolerated delay and within the existing battery charge; otherwise the task is judged to have failed. In the proposed MEC system, tasks from the MDs may be completed with the assistance of the MEC server; the specific communication model, computation model, and energy harvesting model are defined as follows.
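For concreteness, the task tuple can be mirrored in code. The sketch below is illustrative only: the field names and units are assumptions, since the patent specifies only the three quantities.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Computation task generated by a mobile device (MD) in a time slot.

    Field names and units are assumptions for illustration; the patent
    defines only the three quantities themselves.
    """
    data_bits: float     # d_i^t: data volume of the task
    cpu_cycles: float    # c_i^t: computing resources required by the task
    max_delay_s: float   # T_d: maximum completion delay allowed

# Example: a 1 Mbit task needing 1e9 CPU cycles, due within 0.25 s
task = Task(data_bits=1e6, cpu_cycles=1e9, max_delay_s=0.25)
```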
1 Communication model
As shown in FIG. 2, α_i^t is defined as the offloading decision generated for the task of MD_i at the start of time slot T. When α_i^t = 1, MD_i offloads the task to the MEC server for execution; when α_i^t = 0, MD_i executes the task locally. When MD_i chooses to offload its task to the MEC server, the data on which the task computation depends is uploaded accordingly, and the bandwidth for uploading the task is allocated by the BS. Thus, the signal-to-noise ratio of MD_i in sub-slot t is
SNR_i^t = P_i·h_i^t / δ
where δ represents the average power of the white Gaussian noise, and h_i^t and P_i respectively represent the channel gain and transmission power of MD_i in sub-slot t. Accordingly, the rate at which MD_i transmits the computation task is
r_i^t = w_i^t·B_t·log2(1 + SNR_i^t)
where B_t denotes the upload bandwidth shared by all MDs in the current sub-slot t, and w_i^t denotes the proportion of the upload bandwidth that the BS allocates to MD_i for uploading the task in sub-slot t.
2 Computation model
In the proposed MEC system, when an MD generates a task, the task is first added to the task buffer queue of the corresponding MD, and the task added to the queue first is completed before subsequent tasks are executed. Since both the MDs and the MEC server can provide computing services, two computing modes are defined as follows:
(1) Local computing mode
It is assumed that the computing power (i.e., CPU frequency) of different MDs may differ but does not change during task execution. Thus, the delay and energy consumption of the local computing mode are defined as
T_i^{l,t} = c_i^t / f_i
E_i^{l,t} = κ·(f_i)²·c_i^t
where f_i denotes the CPU frequency of MD_i, c_i^t denotes the computing resources required by Task_i^t, and κ denotes the effective capacitance coefficient.
(2) Edge computing mode
When an MD offloads its task to the MEC server for execution, the MEC server selects and allocates part of its currently available computing resources to the MD, and returns the result to the MD after the task is executed. Generally, the data size of the computation result is very small, so the delay and energy consumption for downloading the result are negligible. Therefore, the delay and energy consumption of the edge computing mode are respectively defined as
T_i^{e,t} = d_i^t / r_i^t + c_i^t / (p_i^t·f_t^e)
E_i^{e,t} = P_i·d_i^t / r_i^t + P_e·c_i^t / (p_i^t·f_t^e)
where f_t^e denotes the computing resources available on the MEC server at the start of sub-slot t, p_i^t denotes the proportion of computing resources allocated to MD_i in sub-slot t, and P_e denotes the computational power allocated by the MEC server to the task.
Thus, the delay of executing Task_i^t can be expressed as
T_i^t = α_i^t·T_i^{e,t} + (1 − α_i^t)·T_i^{l,t}
and the energy consumption of executing Task_i^t can be expressed as
E_i^t = α_i^t·E_i^{e,t} + (1 − α_i^t)·E_i^{l,t}
where α_i^t represents the offloading decision of Task_i^t.
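A compact sketch of the two computing modes and the decision-weighted totals follows. Since the formulas are images in the original, the expressions below are reconstructions from the surrounding text; the default κ (kappa) value, the P_e argument, and all parameter names are assumptions.

```python
def local_cost(cpu_cycles, f_i, kappa=1e-27):
    """Delay and energy of the local computing mode on MD_i."""
    delay = cpu_cycles / f_i                 # execution time at CPU frequency f_i
    energy = kappa * f_i ** 2 * cpu_cycles   # dynamic CPU energy, coefficient kappa
    return delay, energy

def edge_cost(data_bits, cpu_cycles, rate, p_it, f_e, P_i, P_e):
    """Delay and energy of the edge computing mode (result download ignored)."""
    t_up = data_bits / rate                  # upload time of the task data
    t_exec = cpu_cycles / (p_it * f_e)       # execution on the allocated MEC share
    return t_up + t_exec, P_i * t_up + P_e * t_exec

def total_cost(alpha, data_bits, cpu_cycles, rate, p_it, f_e, f_i, P_i, P_e):
    """Blend the two modes by the offloading decision alpha (1 = offload)."""
    if alpha:
        return edge_cost(data_bits, cpu_cycles, rate, p_it, f_e, P_i, P_e)
    return local_cost(cpu_cycles, f_i)
```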
In order to make quick decisions on a proper computing mode for different tasks, the invention provides a task priority preprocessing mechanism, which assigns priorities to tasks according to the data volume of the tasks and the performance of the mobile devices. The mechanism measures how suitable different tasks are for uploading to the MEC server for execution, and tasks with higher priority tend to be offloaded to the MEC server. Specifically, the priority pr_i^t is defined by a formula (reproduced as an image in the original) in which h_i^t represents the transmission channel gain in sub-slot t, f_i is the computing power of MD_i, and P_i denotes the transmission power of MD_i. Corresponding priorities are assigned to tasks according to their computing environments, so that the total time and energy consumption of task computation are reduced while the successful completion of high-priority tasks is ensured, improving the quality of service.
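The priority formula itself is only reproduced as an image in the source, so the sketch below uses a stand-in heuristic (estimated local execution time against an upload-cost proxy) purely to show where the preprocessing step sits in the pipeline; it is not the patent's formula.

```python
def task_priority(data_bits, f_i, P_i, h_it, cpu_cycles):
    """Stand-in for pr_i^t: NOT the patent's formula (which is an image).

    The heuristic favors offloading tasks that are expensive locally but
    cheap to transmit, using the inputs the patent names (h_i^t, data
    volume, f_i, P_i) plus the required CPU cycles.
    """
    local_time = cpu_cycles / f_i          # rough local execution time
    upload_cost = P_i * data_bits / h_it   # rough transmission-cost proxy
    return local_time / upload_cost

# Higher-priority tasks would then tend to be offloaded first, e.g.:
# tasks.sort(key=lambda t: task_priority(...), reverse=True)
```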
3 Energy harvesting model
In the proposed MEC system, all MDs are equipped with rechargeable batteries with a maximum capacity of B_max, and the battery level of MD_i at the beginning of sub-slot t is denoted b_i^t. In particular, the energy transmitter (ET) and the MEC server are deployed at the network edge, allowing the ET to provide on-demand energy through WPT to the Central Processing Unit (CPU) and radio transceiver of the wireless devices in a fully controllable manner, with the harvested energy fed into the batteries of the MDs. With the harvested energy, the MDs may offload computing tasks to the MEC server for execution or execute the tasks locally. To simplify the model, it is assumed that energy arrives at the MDs in the form of energy packets during the harvesting process; that is, at the beginning of each sub-slot t, an MD obtains an energy packet through its EH module and feeds it into the battery, with the packet size denoted e_t. The change of an MD's battery level under the different execution states of a task is considered as follows:
(1) If the task in sub-slot t fails because the decision cannot be completed within the energy MD_i can supply, or if no task is currently executed, only the harvested charge of the wireless component changes during sub-slot t. Thus, at the beginning of sub-slot t+1, the battery level of MD_i is
b_i^{t+1} = min(b_i^t + e_t, B_max)
(2) If MD_i executes the task locally in sub-slot t with energy consumption E_i^{l,t}, then at the beginning of sub-slot t+1 the battery level of MD_i is
b_i^{t+1} = min(b_i^t − E_i^{l,t} + e_t, B_max)
(3) If MD_i offloads the task to the MEC server in sub-slot t with energy consumption E_i^{e,t}, then at the beginning of sub-slot t+1 the battery level of MD_i is
b_i^{t+1} = min(b_i^t − E_i^{e,t} + e_t, B_max)
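A one-line sketch covers the three battery-update cases above; the min cap at B_max is an assumption implied by the stated maximum capacity, and the names are illustrative.

```python
def battery_update(b_t, e_t, B_max, task_energy=0.0):
    """Battery level of an MD at the start of the next sub-slot.

    Pass task_energy=0.0 for case (1) (no task ran, or it failed before
    consuming energy), the local-execution energy for case (2), or the
    offloading energy for case (3). Capping at B_max is an assumption
    implied by the stated maximum battery capacity.
    """
    return min(b_t - task_energy + e_t, B_max)

# Case (1): battery_update(b, e, B_max)
# Case (2): battery_update(b, e, B_max, task_energy=E_local)
# Case (3): battery_update(b, e, B_max, task_energy=E_edge)
```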
Based on the above system model definitions, the proposed MEC system aims to minimize the weighted sum of the delay and energy consumption overheads produced by executing sequential tasks on the MDs, which can be formalized as an optimization problem P1:
P1: min Σ_t Σ_i (w_1·T_i^t + w_2·E_i^t)  s.t. C1-C5
where w_1 and w_2 respectively denote the weights of the delay and the energy consumption generated by task execution. C1 indicates that a task can only be executed locally or offloaded to the MEC server. C2 indicates that the energy consumed by executing a task cannot exceed the currently available battery level of the device. C3 indicates that the execution time of a task cannot exceed its maximum tolerated delay T_d. C4 constrains the proportion of upload bandwidth allocated to offloaded tasks. C5 constrains the proportion of MEC server computing resources allocated to offloaded tasks.
In this embodiment, referring to FIG. 3, the present invention provides JOA-RL, a joint optimization method for computation offloading and resource allocation based on deep reinforcement learning; the computation offloading and resource allocation in the MEC system are treated as the environment, and the DRL agent selects corresponding actions by interacting with the environment.
The state space, action space, and reward function defined in the JOA-RL method are as follows:
State space: the state space contains the tasks Task_t generated on all MDs in sub-slot t, the offloading priorities pr_t of the tasks, the battery levels b_t of the MDs, and the computing resources f_t^e currently available on the MEC server. Thus, the system state in sub-slot t can be expressed as:
s_t = {Task_t, pr_t, b_t, f_t^e}    Equation (14)
where Task_t, pr_t, and b_t collect the corresponding quantities of all N MDs.
Action space: the DRL agent performs the computation offloading and resource allocation actions according to the current system state. The action space contains the offloading decisions α_t, the upload bandwidth allocations w_t of the tasks, and the MEC server computing resources p_t allocated to the tasks. Thus, the action in sub-slot t can be expressed as:
a_t = {α_t, w_t, p_t}    Equation (15)
where α_t, w_t, and p_t collect the corresponding decisions for all N MDs.
Reward function: the objective of the proposed MEC system is to minimize the weighted sum of the system delay and energy consumption costs while satisfying the constraints of the optimization problem P1. Thus, at sub-slot t, the immediate reward of the system can be expressed as:
r_t = −(w_1·F(T_t) + w_2·F(E_t)) − Pu·1{task failure}    Equation (16)
where w_1 and w_2 respectively denote the weights of the delay and the energy consumption generated by task execution, F denotes a normalization function that maps the delay and energy values into the same value interval, and Pu denotes the penalty coefficient for task failure.
In the process of optimizing computation offloading and resource allocation in a multi-constraint MEC environment, the DRL agent selects an action a_t (computation offloading and resource allocation) according to the current system state s_t (including task states and resource usage) under policy μ. The environment feeds back a reward r_t according to action a_t and transitions to a new system state s_{t+1}; this process can be described as an MDP.
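The MDP loop can be pictured with a toy environment; everything below (class name, toy dynamics, magnitudes) is an illustrative assumption, as the patent defines only the state, action, and reward semantics.

```python
import random

class MECEnvSketch:
    """Toy sketch of the MDP loop s_t -> a_t -> (r_t, s_{t+1})."""

    def __init__(self, w1=0.5, w2=0.5, penalty=1.0):
        self.w1, self.w2, self.penalty = w1, w2, penalty

    def reset(self):
        # s_t = {Task_t, pr_t, b_t, f_t^e} flattened into a feature vector
        self.state = [random.random() for _ in range(4)]
        return self.state

    def step(self, action):
        alpha, w, p = action          # offload flag, bandwidth share, compute share
        delay = random.random() * (0.5 if alpha else 1.0)   # toy dynamics
        energy = random.random() * (1.0 if alpha else 0.5)
        failed = delay > 0.9          # toy stand-in for the delay constraint
        reward = -(self.w1 * delay + self.w2 * energy)      # weighted cost
        if failed:
            reward -= self.penalty    # Pu: task-failure penalty
        self.state = [random.random() for _ in range(4)]
        return self.state, reward, failed

env = MECEnvSketch()
s = env.reset()
s_next, r, done = env.step((1, 0.3, 0.2))
```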
In this embodiment, JOA-RL can effectively approach the optimal policy for computation offloading and resource allocation in a dynamic MEC environment, achieves a better balance between delay and energy consumption under the constraints of the maximum task delay and the device battery level, and shows a higher task execution success rate.
The JOA-RL method utilizes the Deep Deterministic Policy Gradient (DDPG) to train the DNNs and obtain the optimal computation offloading and resource allocation policy.
In the JOA-RL method, the critic network adopts a value-function-based single-step update mode and is responsible for evaluating the Q value corresponding to each action, while the actor network adopts a policy-gradient-based update mode and is responsible for generating the corresponding computation offloading and resource allocation actions in the current system state.
Using the critic network can effectively reduce the error of the policy gradient, because the critic network can guide the actor network to learn the optimal policy. Furthermore, by integrating DNNs, the JOA-RL method can handle high-dimensional state spaces well.
The key steps of the JOA-RL method provided by the invention are shown as Algorithm 1 (reproduced as images in the original publication).
Based on the definitions of the state space in Equation (14), the action space in Equation (15), and the reward function in Equation (16), first the parameters θ^μ of the actor network and θ^Q of the critic network are initialized. Then, the actor network parameters θ^μ are assigned to the target actor network parameters θ^μ′, and the critic network parameters θ^Q are assigned to the target critic network parameters θ^Q′, while the experience replay pool M, the number of training rounds P, and the time-series length T_max are initialized. In particular, independent target networks are employed in the method, which reduces the correlation among data and enhances the stability and robustness of the method; the introduced experience replay mechanism likewise reduces data correlation.
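A minimal experience replay pool matching M.push(s_t, a_t, r_t, s_{t+1}) might look as follows; the capacity and uniform sampling are assumptions, only the push/sample semantics come from the text.

```python
import random
from collections import deque

class ReplayPool:
    """Sketch of the experience replay pool M."""

    def __init__(self, capacity=100_000):   # capacity is an assumed value
        self.buffer = deque(maxlen=capacity)

    def push(self, s_t, a_t, r_t, s_next):
        self.buffer.append((s_t, a_t, r_t, s_next))

    def sample(self, n):
        # Randomly select n records for training, as in the method
        return random.sample(self.buffer, n)

    def __len__(self):
        return len(self.buffer)
```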
After initialization is completed, training begins. In each training round, at each step the method feeds the observed system environment state s_t into the actor network, executes the action a_t output by the actor network in the environment, and performs the corresponding offloading computation and resource allocation operations (lines 5-11). The corresponding reward is calculated according to the reward formula, and the environment feeds back the cumulative task execution reward r_t of this step and the next state s_{t+1} (line 12).
Since the system states and resource allocation actions in the MEC environment take continuous values, the JOA-RL method treats the problem as an MDP with continuous states and actions. The JOA-RL method trains the critic network θ^Q to fit Q(s_t, a_t); when Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t). However, the mapping from s_t to a_t is very complex: given s_t, the Q value is a high-dimensional, multi-level nested nonlinear function of a_t. To address this problem, the actor network θ^μ is used to fit this complex mapping. Specifically, Q(s_t, a_t) is expressed as:
Q(s_t, a_t) = E_env[r(s_t, a_t) + γ·Q(s_{t+1}, μ(s_{t+1}))]    Equation (17)
where the actor network θ^μ outputs the action a_t that maximizes the Q value according to the current state s_t; this process can be expressed as:
a_t = μ(s_t | θ^μ)    Equation (18)
In this method, the performance objective of the actor network is defined as:
J(θ^μ) = E[Q(s_t, μ(s_t | θ^μ) | θ^Q)]    Equation (19)
when the number of training samples stored in M reaches N, N records are randomly selected for training the network parameters (line 14). An important problem faced by the method in optimizing the loss function is that the performance is unstable when derivation optimization is carried out on an expression containing max, and the update parameters can not necessarily enable max(s) t+1 ,a t+1 ) Changing towards the ideal direction. This is especially true when the motion space is continuous, resulting in a training Q(s) t ,a t ) The target network itself is moving while moving toward the target network.
To solve this problem, in the method, target operator networks θ are defined, respectively μ And a target critic network theta Q
The critic network is responsible for calculating the current Q value Q(s) t ,a t ) And defines a target Q value y t
y t =r t +γQ′(s t+1 ,μ′(s t+1μ′ )|θ Q′ ) Formula (20)
The strategy optimal solution of the actor network is approximated by adopting a gradient ascending method, and the loss function of the critic network is defined as follows:
Figure BDA0003872432350000171
in each training step, the target operator network and the target critic network approach to the operator network and the critic network according to the updating step tau. Compared with the method of simply copying the network parameters, the updating method can make the method more stable.
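Equations (17)-(21) correspond to a standard DDPG update step. The PyTorch sketch below illustrates one such step under assumed network sizes; the learning rates and discount factor follow the embodiment, while everything else is illustrative rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn

state_dim, action_dim, tau, gamma = 4, 3, 0.005, 0.95  # tau is an assumed value

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out))

actor, critic = mlp(state_dim, action_dim), mlp(state_dim + action_dim, 1)
target_actor, target_critic = mlp(state_dim, action_dim), mlp(state_dim + action_dim, 1)
target_actor.load_state_dict(actor.state_dict())      # theta_mu' <- theta_mu
target_critic.load_state_dict(critic.state_dict())    # theta_Q'  <- theta_Q

actor_opt = torch.optim.Adam(actor.parameters(), lr=6e-4)    # embodiment lr
critic_opt = torch.optim.Adam(critic.parameters(), lr=6e-3)  # embodiment lr

def ddpg_step(s, a, r, s_next):
    """One training step over a sampled minibatch (Equations 17-21)."""
    with torch.no_grad():                               # target Q value, Eq. (20)
        a_next = target_actor(s_next)
        y = r + gamma * target_critic(torch.cat([s_next, a_next], dim=1))
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, y)          # Eq. (21)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Minimizing -Q ascends the actor objective J, Eq. (19)
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)   # soft update by step tau

# Example minibatch of 32 transitions
s = torch.randn(32, state_dim); a = torch.rand(32, action_dim)
r = torch.randn(32, 1); s2 = torch.randn(32, state_dim)
ddpg_step(s, a, r, s2)
```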
Example 1:
The joint optimization model for computation offloading and resource allocation proposed in this embodiment is constructed and trained based on Python 3.6 and the open-source framework PyTorch. All simulation experiments were carried out on a laptop equipped with an Intel i5-7300HQ CPU with a clock frequency of 2.5 GHz and 8 GB of memory. In the experiments, all MDs are randomly distributed within the coverage area of the AP and share its bandwidth, and the AP is equipped with an MEC server. The computing power of each MD is distributed in [1, 1.2] GHz, and the computing power of the MEC server is 20 GHz. Under the default experimental settings, 10 MDs share a bandwidth of 10 MHz, the duration of each time slot T is 1 s, the duration of each sub-slot t is 0.25 s, and one training round contains 48 time slots T.
During training, the learning rate of the actor network is 0.0006, the learning rate of the critic network is 0.006, and the reward discount factor γ is set to 0.95. After the JOA-RL method completes training, it is applicable to the joint optimization of computation offloading and resource allocation in a changing MEC environment.
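Collected as a configuration sketch (values are those reported in this embodiment; the key names are illustrative):

```python
# Default experimental settings from this embodiment; key names are
# illustrative, values are as reported in the text.
CONFIG = {
    "num_mds": 10,
    "md_cpu_ghz": (1.0, 1.2),   # per-MD computing power range
    "mec_cpu_ghz": 20.0,        # MEC server computing power
    "bandwidth_mhz": 10.0,      # shared upload bandwidth
    "slot_T_s": 1.0,            # time slot duration
    "subslot_t_s": 0.25,        # sub-slot duration
    "slots_per_round": 48,      # time slots per training round
    "actor_lr": 6e-4,
    "critic_lr": 6e-3,
    "gamma": 0.95,              # reward discount factor
}
```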
Based on the above settings, extensive simulation experiments were performed to evaluate the performance of the proposed deep-reinforcement-learning-based joint optimization method for computation offloading and resource allocation. To analyze the effectiveness and advantages of the proposed JOA-RL method, it was compared with the following 5 baseline methods.
Local: all tasks are executed on the MDs;
MEC: all tasks are offloaded to the MEC server for execution;
Random: tasks are executed on the MDs or the MEC server at random;
Greedy: on the premise of meeting the maximum tolerated delay of a task, execution on the MDs is preferred;
DQN: a value-based DRL method that learns a deterministic policy by computing the probability of each computation offloading and resource allocation action.
As shown in FIG. 4(a), comparing the convergence of the different methods, the Local, MEC, Random, and Greedy methods make single-step decisions and have no learning or optimization process. When processing sequential tasks, the Local, MEC, and Random methods perform worse than the other three methods. This is because they place tasks blindly, without taking the current system state and task characteristics into account, so a large portion of tasks fail by exceeding the delay and battery constraints. For example, compared with the MEC server, the limited local computing power may prevent tasks from completing within the delay constraint; conversely, if tasks are frequently offloaded to the MEC server, the battery of an MD may not support the offloading process, causing task failures. In contrast to the JOA-RL and DQN methods, the Greedy method only considers the immediate reward a task can achieve and does not account well for long-term reward. In the early stage of training, the Greedy method outperforms the two DRL-based methods, JOA-RL and DQN; in the later stage, JOA-RL and DQN perform better because they consider the long-term reward of the system. The proposed JOA-RL method integrates value-based and policy-based DRL, can cope with high-dimensional continuous action spaces, and converges faster, so it performs better than the DQN method. As shown in FIG. 4(b), comparing the average energy consumed per successfully completed task, the MEC and Local methods exhibit the highest and lowest average task energy consumption, respectively. The Greedy method preferentially executes tasks locally under the maximum tolerated delay, so its average task energy consumption is only higher than that of the Local method. After convergence, the JOA-RL method also outperforms the DQN method. As shown in FIG. 4(c), comparing the average task waiting time of the different methods, the JOA-RL method after convergence is superior to the other 5 methods, while the Local method, limited by local computing capacity and the long time required to complete tasks, has an average task waiting time far higher than the other 5 methods. As shown in FIG. 4(d), the task success rates of the different methods are compared.
As shown in FIG. 5, the Local method is unaffected by changes in network bandwidth, since it involves no computation offloading. For the MEC method, when the network bandwidth is low, the bandwidth allocated to each uploaded task is small, resulting in long task upload times, and many tasks fail to meet the maximum delay constraint, so the MEC method performs poorly. As the network bandwidth increases, the performance of the 5 methods other than Local tends to improve. Among them, the performance improvement of the MEC method is most significant, because its performance depends heavily on network bandwidth. Compared with the DQN method, the proposed JOA-RL method handles the continuous resource allocation problem better and achieves lower delay and energy consumption, which shows that JOA-RL is more advantageous in the joint optimization of computation offloading and resource allocation. When the network bandwidth increases to a certain degree, the performance of the 5 methods other than Local basically stabilizes. This is because, as the network bandwidth increases, fewer tasks fail by exceeding the delay constraint during computation offloading, but the remaining constraint of MD battery capacity prevents further performance improvement.
As shown in FIG. 6, the Local method is unaffected by changes in the computing power of the MEC server, since it involves no computation offloading. As the computing power of the MEC server increases, the performance of the 5 methods other than Local tends to improve. Compared with the DQN method, the proposed JOA-RL method achieves lower delay and energy consumption and handles the continuous resource allocation problem better, which shows that JOA-RL is more advantageous in the joint optimization of computation offloading and resource allocation. When the computing power of the MEC server increases to a certain extent, the performance of the 5 methods other than Local also tends to stabilize. This is because, as the MEC server's computing power increases, fewer tasks fail by exceeding the delay constraint during computation offloading, but the constraint of MD battery capacity prevents further performance improvement.
As shown in FIG. 7, for the Local method, the energy consumed by local task computation is lower than the maximum battery capacity, so an increase in the maximum MD battery capacity has no effect on the Local method. For the other five methods, uploading tasks consumes considerable energy, so when the maximum MD battery capacity is small, tasks often fail because the battery cannot support computation offloading. As the maximum MD battery capacity increases, the stored energy can support more computation offloading, so the performance of these five methods rises. When the maximum MD battery capacity increases to a certain extent, failures of computation offloading due to insufficient battery capacity are substantially eliminated, and the performance of these methods stabilizes. Compared with the DQN method, the proposed JOA-RL method handles the continuous resource allocation problem better and achieves lower delay and energy consumption, which again shows that JOA-RL is more advantageous in the joint optimization of computation offloading and resource allocation.
The above description is only a preferred embodiment of the present invention; all equivalent changes and modifications made in accordance with the claims of the present invention shall fall within the scope of the present invention.

Claims (9)

1. A joint optimization system for computation offloading and resource allocation in a multi-constraint edge environment, characterized by comprising
a base station BS, an MEC server, and N chargeable mobile devices MDs, wherein the N chargeable mobile devices MDs are denoted as a set MD = {MD_1, MD_2, ..., MD_i, ..., MD_N}; the chargeable mobile devices MDs access the base station BS via 5G or LTE, and the base station BS is equipped with an MEC server.
2. The joint optimization system for computation offloading and resource allocation in a multi-constraint edge environment according to claim 1, wherein the MDs are equipped with energy harvesting components and are powered by energy harvested from RF signals.
3. The joint optimization system for computation offloading and resource allocation in a multi-constraint edge environment according to claim 1, wherein when a chargeable mobile device MD generates tasks, the computing tasks are offloaded to the MEC server for execution or executed locally, and tasks with higher priority tend to be offloaded to the MEC server; specifically, the priority pr_i^t is defined by a formula (reproduced as an image in the original) in which h_i^t represents the transmission channel gain in sub-slot t, d_i^t is the data volume of Task_i, f_i is the computing power of MD_i, and P_i denotes the transmission power of MD_i.
4. The optimization method of the joint optimization system for computation offloading and resource allocation in a multi-constraint edge environment according to claim 1, characterized by comprising the following steps:
Step S1, generating an offloading decision and a resource allocation decision based on the joint optimization model for computation offloading and resource allocation, according to the tasks generated on different MDs, the offloading priorities of the tasks, the battery levels of the MDs, and the computing resources currently available on the MEC server;
Step S2, issuing communication resources according to the resource allocation decision, and the MDs offloading the tasks to the local device or the MEC server for execution according to the offloading decision;
Step S3, the job scheduler assigning jobs from the job sequence to the server according to the resource allocation decision.
5. The optimization method according to claim 4, wherein the joint optimization model for computation offloading and resource allocation is constructed and trained based on Python 3.6 and the open-source framework PyTorch, specifically comprising the following steps:
(1) Obtaining the computing power f_i of MD_i, the computing power f^e of the MEC server, and the network bandwidth B, and initializing the system;
(2) Performing training: in each training step, inputting the observed system environment state s_t into the actor network, executing the action a_t output by the actor network in the environment, and performing the corresponding offloading computation and resource allocation operations;
(3) Calculating the corresponding reward according to the reward formula, the environment feeding back the cumulative task execution reward r_t of this step and the next state s_{t+1}, and storing the training sample into the experience replay pool via M.push(s_t, a_t, r_t, s_{t+1});
(4) When the number of training samples stored in M reaches N, randomly selecting N records for training the network parameters to obtain the final joint optimization model for computation offloading and resource allocation.
6. The optimization method according to claim 4, wherein initializing the system specifically comprises: based on the state space, action space, and reward function, first initializing the parameters θ^μ of the actor network and θ^Q of the critic network; then assigning the actor network parameters θ^μ to the target actor network parameters θ^μ′, and the critic network parameters θ^Q to the target critic network parameters θ^Q′, while initializing the experience replay pool M, the number of training rounds P, and the time-series length T_max.
7. The optimization method according to claim 6, wherein the state space, action space, and reward function are as follows:
State space: the state space contains the tasks Task_t generated on all MDs in sub-slot t, the offloading priorities pr_t of the tasks, the battery levels b_t of the MDs, and the computing resources f_t^e currently available on the MEC server; thus, the system state in sub-slot t is represented as:
s_t = {Task_t, pr_t, b_t, f_t^e}    Equation (14)
where Task_t, pr_t, and b_t collect the corresponding quantities of all N MDs;
Action space: the DRL agent makes the computation offloading and resource allocation actions according to the current system state; the action space contains the offloading decisions α_t, the upload bandwidth allocations w_t of the tasks, and the MEC server computing resources p_t allocated to the tasks; therefore, the action in sub-slot t is represented as:
a_t = {α_t, w_t, p_t}    Equation (15)
where α_t, w_t, and p_t collect the corresponding decisions for all N MDs;
Reward function: the goal of the system is to minimize the weighted sum of the system delay and energy consumption costs under the constraints of the optimization problem P1; therefore, at sub-slot t, the immediate reward of the system is expressed as:
r_t = −(w_1·F(T_t) + w_2·F(E_t)) − Pu·1{task failure}    Equation (16)
where w_1 and w_2 respectively denote the weights of the delay and the energy consumption generated by task execution, F denotes a normalization function, and Pu denotes the penalty coefficient for task failure.
8. The optimization method according to claim 4, wherein the training specifically comprises: training the critic network θ^Q to fit Q(s_t, a_t); when Q(s_t, a_t) is determined, for a fixed s_t there must exist an a_t that maximizes Q(s_t, a_t); Q(s_t, a_t) is expressed as:
Q(s_t, a_t) = E_env[r(s_t, a_t) + γ·Q(s_{t+1}, μ(s_{t+1}))]    Equation (17)
wherein the actor network θ^μ outputs the action a_t that maximizes the Q value according to the current state s_t; this process is represented as:
a_t = μ(s_t | θ^μ)    Equation (18)
The performance objective of the actor network is defined as:
J(θ^μ) = E[Q(s_t, μ(s_t | θ^μ) | θ^Q)]    Equation (19)
9. The optimization method according to claim 4, wherein a target actor network θ^μ′ and a target critic network θ^Q′ are defined;
the critic network is responsible for computing the current Q value Q(s_t, a_t), and a target Q value y_t is defined:
y_t = r_t + γ·Q′(s_{t+1}, μ′(s_{t+1} | θ^μ′) | θ^Q′)    Equation (20)
The optimal policy of the actor network is approximated by gradient ascent, and the loss function of the critic network is defined as:
L(θ^Q) = E[(y_t − Q(s_t, a_t | θ^Q))²]    Equation (21)
In each training step, the target actor network and target critic network move toward the actor network and critic network according to the update step τ.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination