CN113993218A - Multi-agent DRL-based cooperative offloading and resource allocation method under MEC architecture - Google Patents

Multi-agent DRL-based cooperative offloading and resource allocation method under MEC architecture

Info

Publication number
CN113993218A
Authority
CN
China
Prior art keywords
task
agent
offloading
resource allocation
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111367135.8A
Other languages
Chinese (zh)
Inventor
唐元春
夏炳森
陈端云
冷正龙
林文钦
林彧茜
周钊正
李翠
游敏毅
黄莘程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Fujian Electric Power Co Ltd
Economic and Technological Research Institute of State Grid Fujian Electric Power Co Ltd
Original Assignee
State Grid Fujian Electric Power Co Ltd
Economic and Technological Research Institute of State Grid Fujian Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Fujian Electric Power Co Ltd, Economic and Technological Research Institute of State Grid Fujian Electric Power Co Ltd filed Critical State Grid Fujian Electric Power Co Ltd
Priority to CN202111367135.8A
Publication of CN113993218A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/52Allocation or scheduling criteria for wireless resources based on load
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/53Allocation or scheduling criteria for wireless resources based on regulatory allocation policies

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to a multi-agent DRL-based cooperative offloading and resource allocation method under an MEC architecture. The method comprises the following steps: 1) providing a collaborative MEC system architecture that considers cooperation among edge nodes, i.e., when an edge node is overloaded, its task requests are migrated to other low-load edge nodes for collaborative processing; 2) adopting a partial offloading strategy, i.e., offloading part of each computing task to the edge server for execution and distributing the remainder to the local IoT device for execution; 3) modeling the joint optimization problem of task offloading decisions, computing resource allocation decisions and communication resource allocation decisions as an MDP problem, in view of the dynamically changing characteristics of task arrivals; 4) dynamically allocating resources with a cooperative task offloading and resource allocation method based on multi-agent reinforcement learning, so as to maximize the quality of experience of users in the system. The invention realizes dynamic management of system resources under the collaborative MEC system architecture and reduces the average delay and energy consumption of the system.

Description

Multi-agent DRL-based cooperative offloading and resource allocation method under MEC architecture
Technical Field
The invention belongs to the technical field of mobile communication, and particularly relates to a multi-agent DRL-based cooperative offloading and resource allocation method under an MEC architecture.
Background
The explosive growth of smart mobile devices (SMDs) and the internet of things (IoT) is accelerating the development of computation-intensive and delay-sensitive applications such as virtual/augmented reality, autonomous driving, face recognition, smart cities and smart grids, which places tremendous pressure on mobile terminals and smart IoT devices with limited computing power [1]. Such complex applications require SMDs with higher computing power, memory and battery life [2]. Fortunately, mobile edge computing (MEC) provides an effective solution to this problem by deploying computing resources on edge servers close to the base station (BS), so that smart IoT devices can offload their computing tasks. There are still challenges in MEC systems, particularly task offloading and resource allocation. A key issue in task offloading is how to select an appropriate edge server that not only helps offload tasks but also guarantees the service requirements, especially in a multi-user, multi-edge-server MEC system. The joint optimization of computation offloading and resource allocation can significantly reduce the response time of tasks and save energy consumption.
The prior art proposes a three-tier computing network that exploits vertical cooperation between devices, edge nodes and cloud servers, and horizontal cooperation between edge nodes. It jointly optimizes offloading decisions and computing resource allocation to minimize the average task time under limited device battery capacity, but does not consider the energy consumption of the edge nodes. The prior art also proposes an online optimization strategy for MEC server (MECS) computation task offloading with a sleep-control scheme to minimize the long-term energy consumption of MECS networks. However, it is concerned only with the energy consumption of the edge nodes and considers neither the resource allocation problem nor the cooperation between edge servers.
From the above analysis, it is clear that existing research does not simultaneously consider cooperation between edge servers and joint optimization of resource allocation, nor the influence of dynamic changes in the environment; it basically considers only the offloading problem of a single service and does not extend to the case of multiple services arriving continuously. Therefore, there is a need to develop a cooperative offloading and resource allocation method based on multi-agent deep reinforcement learning under the MEC architecture.
Disclosure of Invention
The invention aims to provide a multi-agent DRL-based cooperative offloading and resource allocation method under an MEC architecture, which minimizes the system cost by selecting the most appropriate task offloading and resource allocation scheme.
In order to achieve the above purpose, the technical scheme of the invention is as follows: a multi-agent DRL-based cooperative offloading and resource allocation method under an MEC architecture, comprising the following steps:
S1, providing a collaborative MEC system architecture that considers cooperation among edge nodes, i.e., when an edge node is overloaded, its task requests are migrated to other low-load edge nodes for collaborative processing;
S2, adopting a partial offloading strategy, i.e., offloading part of each computing task to the edge server for execution and distributing the remainder to the local IoT device for execution;
S3, modeling the joint optimization problem of task offloading decisions, computing resource allocation decisions and communication resource allocation decisions as an MDP problem, in view of the dynamically changing characteristics of task arrivals;
S4, dynamically allocating resources with a cooperative task offloading and resource allocation method based on multi-agent deep reinforcement learning, so as to maximize the quality of experience of users in the system.
In an embodiment of the present invention, the architecture of the collaboration-based MEC system comprises a device layer and an edge layer; the device layer consists of various IoT devices, the edge layer consists of a plurality of base stations connected through a wired network, and each base station is equipped with an MEC server with computing and storage capabilities; when offloading tasks to the edge servers, the selection of the target edge server takes into account the current remaining resource state of each edge server; a user may first choose to offload a task for execution to the edge server associated with the base station that provides it access; however, because different edge servers serve different numbers of users and the users' task arrival rates differ, the spatio-temporal distribution of services is uneven, producing differing communication and computation loads at different nodes; the mismatch between computing capacity and computing load makes certain edge nodes particularly busy, generating large computation delays, while other edge nodes sit largely idle, resulting in low resource utilization; therefore, computation load balancing is achieved through cooperation among the edge nodes, i.e., tasks of busy edge nodes are transferred to idle edge nodes for collaborative processing, thereby improving the overall resource utilization of the system and reducing task execution delay.
In an embodiment of the present invention, step S2 is specifically implemented as follows: in order to better utilize the computing resources on both the IoT devices and the edge server, a partial offloading strategy is designed; it is assumed that in each time slot the IoT user device offloads part of its computing task to the edge node for execution while the remainder is processed locally, so that the computing resources on the local IoT device are fully utilized and the processing of the computing task is accelerated; the system determines the amount of computation left for local processing according to the IoT device's resources and their current remaining amount, and offloads the remaining computation to edge nodes with stronger computing power for processing, so that the computing task can be processed in parallel and the task processing speed is increased.
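For illustration only, the parallel completion-time logic of this partial offloading strategy can be sketched as follows; the function and parameter names (split ratio, CPU frequencies, uplink rate) are assumptions, not notation fixed by the invention:

```python
# Minimal sketch of the partial-offloading delay model described above.
# All names and values are illustrative assumptions.

def partial_offload_delay(d_bits, c_cycles_per_bit, offload_ratio,
                          f_local_hz, f_edge_hz, uplink_rate_bps):
    """Completion time when a fraction of the task runs on the edge server
    while the remainder runs on the local IoT device, in parallel."""
    d_local = (1.0 - offload_ratio) * d_bits
    d_edge = offload_ratio * d_bits

    t_local = d_local * c_cycles_per_bit / f_local_hz   # local computation
    t_tx = d_edge / uplink_rate_bps                     # uplink transmission
    t_edge = d_edge * c_cycles_per_bit / f_edge_hz      # edge computation

    # The local branch and the (transmit + edge) branch run in parallel,
    # so the task finishes when the slower branch finishes.
    return max(t_local, t_tx + t_edge)

# Example: a 1 Mbit task at 500 cycles/bit with 60% offloaded.
print(partial_offload_delay(1e6, 500, 0.6, 1e9, 10e9, 20e6))
```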
In an embodiment of the present invention, step S3 is specifically implemented as follows: it is assumed that each user adopts a first-in first-out (FIFO) task scheduling rule, and the task scheduler decides whether to offload part or all of a task to the edge server and/or execute it locally; similarly, each edge server is also equipped with a FIFO task scheduler, which decides whether an incoming task is processed by this server or transferred to an adjacent edge server; since a stochastic task model is more consistent with the dynamically changing environment, task arrivals are assumed to follow a Poisson distribution; under the stochastic task model, the resource control decision for a particular task must consider its impact on future tasks from the perspective of the system's long-term average performance; by modeling the task model, the local computing model and the offloading model, the total execution delay and energy consumption of the tasks generated in each time slot are calculated, and an optimization model satisfying the task scheduling, communication resource and computing resource constraints is established; since the constraints of the optimization model contain binary variables, the problem is non-convex, and the optimization model is therefore recast as a Markov game based on a multi-agent MDP architecture.
In an embodiment of the present invention, the optimization model is modeled as the Markov game based on the multi-agent MDP architecture in the following specific way:
the Markov game of M agents is represented by ⟨M, S, A, R, P⟩, where M denotes the number of agents, S denotes the state set of the system, A denotes the joint action set of all agents, P denotes the state transition probability, and R = {R_1(s(t), a(t)), …, R_M(s(t), a(t))}, where R_m(s(t), a(t)) denotes the reward function of agent m when it selects action a_m(t) in state s_m(t); further, the MDP model of each agent is defined as follows:
1) State space: the state space consists of the task arrival rate of each time slot, the service characteristics and the remaining computing resources;
2) Action space: the action space consists of the cooperative offloading decision, offloading ratio, sub-channel allocation factor, transmission power and computing resource allocation of each time slot;
3) Reward function: the reward function represents the reward obtained when the agents select joint action a(t) in state s(t); each reward function reflects the total delay and energy-consumption cost of the system, as illustrated by the sketch below.
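For concreteness, the per-agent state, action and reward just defined can be sketched as plain data structures; the field names and the delay/energy weights below are illustrative assumptions rather than definitions from the invention:

```python
# Illustrative containers for the per-agent MDP defined above.
# Field names and cost weights are assumptions for exposition.
from dataclasses import dataclass

@dataclass
class AgentState:
    task_arrival_rate: float   # tasks arriving in this time slot
    task_size_bits: float      # service characteristic: data size
    cycles_per_bit: float      # service characteristic: compute intensity
    remaining_cpu_hz: float    # remaining computing resources

@dataclass
class AgentAction:
    cooperate_node: int        # cooperative offloading decision (target edge node)
    offload_ratio: float       # fraction of the task offloaded (0..1)
    subchannel: int            # sub-channel allocation factor
    tx_power_w: float          # transmission power
    cpu_share: float           # computing resource allocation

def reward(total_delay_s: float, total_energy_j: float,
           w_delay: float = 1.0, w_energy: float = 1.0) -> float:
    # Agents maximize reward, so the delay-plus-energy cost enters negatively.
    return -(w_delay * total_delay_s + w_energy * total_energy_j)
```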
In an embodiment of the present invention, for the Markov game based on the multi-agent MDP architecture, a task offloading and resource allocation method based on the multi-agent deep deterministic policy gradient algorithm MADDPG is proposed to solve it, specifically as follows:
in a multi-agent environment, each agent independently updates its own policy during learning, which may make the training process unstable; in addition, the agents need to cooperate to obtain the decision information of the other agents in order to compute a local Q function; therefore, to overcome the instability of the multi-agent environment and enable the agents to cooperate with one another, the multi-agent deep deterministic policy gradient algorithm MADDPG is proposed; through cooperation, each agent can obtain the states and actions of the other agents, so that the agents learn cooperative strategies more accurately; MADDPG models the influence of the other agents in the environment by using a centralized value function, and adopts centralized training with distributed execution.
In an embodiment of the present invention, the task offloading and resource allocation method based on the multi-agent deep deterministic policy gradient algorithm MADDPG specifically comprises:
each agent is divided into two parts: a critic part and an actor part; in each time slot each agent obtains its own state information and inputs it into the actor network to obtain the optimal action under the current policy; by performing this action, each agent receives the current reward and enters a new state; in order to break the correlation between experience data, experience replay with a replay pool is adopted; at the beginning of each time slot, each agent randomly samples H experiences from the global replay pool and feeds them into its critic network to obtain the state-action value function; the actor network is responsible for selecting the action, and the critic network tells the actor network whether the selected action is appropriate according to the obtained state-action value function.
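The update step of this actor-critic scheme can be condensed into the following sketch: each critic is centralized (it sees the states and actions of all agents), each actor is decentralized, and H experiences are drawn from the shared replay pool. PyTorch, the network sizes and all hyperparameters are assumptions for illustration; the patent does not prescribe them:

```python
# Sketch of one MADDPG update: centralized critics, decentralized actors.
# Dimensions, learning rates and the replay layout are illustrative assumptions.
import torch
import torch.nn as nn

M, S_DIM, A_DIM, H = 3, 8, 5, 64        # agents, state/action dims, batch size
GAMMA, TAU = 0.95, 0.01                 # discount factor, soft-update rate

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, out))

actors  = [mlp(S_DIM, A_DIM) for _ in range(M)]
critics = [mlp(M * (S_DIM + A_DIM), 1) for _ in range(M)]  # see ALL states/actions
targ_actors  = [mlp(S_DIM, A_DIM) for _ in range(M)]
targ_critics = [mlp(M * (S_DIM + A_DIM), 1) for _ in range(M)]
for net, targ in zip(actors + critics, targ_actors + targ_critics):
    targ.load_state_dict(net.state_dict())
a_opts = [torch.optim.Adam(a.parameters(), lr=1e-4) for a in actors]
c_opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]

def update(batch):
    """batch: dict of tensors s, a, r, s2 with shapes [H, M, *]."""
    s, a, r, s2 = batch["s"], batch["a"], batch["r"], batch["s2"]
    a2 = torch.stack([targ_actors[m](s2[:, m]) for m in range(M)], dim=1)
    for m in range(M):
        # Critic update: the TD target uses the next actions of ALL agents.
        y = r[:, m:m + 1] + GAMMA * targ_critics[m](
            torch.cat([s2.reshape(H, -1), a2.reshape(H, -1)], dim=1))
        q = critics[m](torch.cat([s.reshape(H, -1), a.reshape(H, -1)], dim=1))
        c_opts[m].zero_grad()
        nn.functional.mse_loss(q, y.detach()).backward()
        c_opts[m].step()

        # Actor update: ascend the centralized Q w.r.t. agent m's own action.
        a_new = a.clone()
        a_new[:, m] = actors[m](s[:, m])
        a_loss = -critics[m](torch.cat([s.reshape(H, -1),
                                        a_new.reshape(H, -1)], dim=1)).mean()
        a_opts[m].zero_grad()
        a_loss.backward()
        a_opts[m].step()

    # Soft-update the target networks toward the learned networks.
    for net, targ in zip(actors + critics, targ_actors + targ_critics):
        for p, tp in zip(net.parameters(), targ.parameters()):
            tp.data.mul_(1 - TAU).add_(TAU * p.data)
```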
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a collaboration-based MEC system architecture that considers cooperation among edge nodes, i.e., when an edge node is overloaded, its task requests are migrated to other low-load edge nodes for collaborative processing, which further improves the resource utilization of the edge nodes and alleviates congestion on busy nodes.
2. A partial offloading strategy is adopted, i.e., part of each computing task is offloaded to the edge server for execution while the remainder is executed on the local IoT device, so that the computing resources on the local IoT device are fully utilized and the processing of the computing task is accelerated.
3. Considering the dynamically changing characteristics of task arrivals, the joint optimization problem of task offloading decisions, computing resource allocation decisions and communication resource allocation decisions is modeled as an MDP problem.
4. Further, a cooperative task offloading and resource allocation method based on multi-agent deep reinforcement learning is used to dynamically allocate resources so as to maximize the quality of experience of users in the system, thereby realizing dynamic management of system resources under the collaborative MEC system architecture and reducing the average delay and energy consumption of the system.
Drawings
FIG. 1 is a diagram illustrating a network architecture scenario to which embodiments of the present invention may be applied;
FIG. 2 is a diagram of the actor network architecture;
FIG. 3 is a diagram of the critic network architecture;
FIG. 4 is a flow chart of the cooperative offloading and resource allocation method based on multi-agent deep reinforcement learning.
Detailed Description
The technical scheme of the invention is specifically explained below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a network architecture scenario to which an embodiment of the present invention may be applied, comprising a device layer and an edge layer. The device layer is composed of various IoT devices; the edge layer is composed of a plurality of base stations connected through a wired network, each equipped with an MEC server with computing and storage capabilities. When offloading tasks to the edge servers, the selection of the target edge server should therefore take into account the current remaining resource state of each edge server. A user may first choose to offload a task for execution to the edge server associated with the base station that provides it access. However, because different edge servers serve different numbers of users and the users' task arrival rates differ, the spatio-temporal distribution of services is uneven, so different nodes carry different communication and computation loads. The mismatch between computing capacity and computing load makes certain edge nodes particularly busy, generating large computation delays, while other edge nodes sit largely idle, resulting in low resource utilization. Therefore, computation load balancing is achieved through cooperation among the edge nodes, i.e., tasks of busy edge nodes are transferred to idle edge nodes for collaborative processing, thereby improving the overall resource utilization of the system and reducing task execution delay.
FIG. 2 is a diagram of the actor network architecture. As the figure shows, the input to the actor network is the state of each agent, which then passes through a two-layer fully-connected network with rectified linear units (ReLU) as the activation function to finally output a series of actions.
Fig. 3 is a diagram of the critic network structure, which consists of two fully connected hidden layers and an output layer of one node, where the hidden layers still use ReLU as the activation function. Unlike the actor network, the inputs to the critic network are the states and actions of all agents, and the output is the state-action value function.
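A minimal sketch of the two networks as drawn in Figs. 2 and 3 follows; the hidden-layer width is an assumption (the description fixes only the two ReLU fully-connected layers and, for the critic, the single-node output):

```python
# Actor and critic networks matching the structures of Figs. 2 and 3.
# The hidden width of 128 is an illustrative assumption.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Input: one agent's state. Output: that agent's action vector."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class Critic(nn.Module):
    """Input: states and actions of ALL agents. Output: scalar Q value."""
    def __init__(self, joint_state_dim: int, joint_action_dim: int,
                 hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_state_dim + joint_action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))  # single-node output layer

    def forward(self, joint_state, joint_action):
        return self.net(torch.cat([joint_state, joint_action], dim=-1))
```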
FIG. 4 is a flowchart of the cooperative offloading and resource allocation method based on multi-agent deep reinforcement learning, which specifically comprises the following steps:
201. Initialize the state of the system and the relevant parameters in the system, including the cooperative offloading decision, offloading ratio, sub-channel allocation factor, transmission power and computing resource allocation.
202. Model the offloading task model and the communication model, and characterize them by the corresponding parameters. The tasks generated by IoT device U_{n,m} connected to access point m follow a Poisson process with mean arrival rate λ_{n,m} (packets/s), and these tasks are independent of each other. The task generated by device U_{n,m} in time slot t is Λ_{n,m}(t) = (d_{n,m}(t), c_{n,m}(t), τ_{n,m}(t)), where d_{n,m}(t) denotes the data size (in bits) of the task arriving at IoT device U_{n,m} at time t, c_{n,m}(t) denotes the computational resource intensity required by the task (CPU cycles/bit), and τ_{n,m}(t) denotes the delay constraint on the execution of the task. The communication model quantifies the transmission rate through the Shannon formula.
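The step-202 models can be sketched as follows; the numerical ranges, bandwidth, power and noise figures are illustrative assumptions that the patent does not fix:

```python
# Sketch of step 202: Poisson task arrivals and the Shannon-rate model.
# All numerical values are illustrative assumptions.
import math
import random

def generate_tasks(lam: float):
    """Number of tasks of device U_{n,m} in a slot ~ Poisson(lambda_{n,m})."""
    k, p, threshold = 0, 1.0, math.exp(-lam)
    while p > threshold:              # Knuth's Poisson sampler
        k += 1
        p *= random.random()
    # Each task Lambda_{n,m}(t) = (d, c, tau): size, cycles/bit, deadline.
    return [(random.uniform(1e5, 1e6),    # d: data size in bits
             random.uniform(100, 1000),   # c: CPU cycles per bit
             random.uniform(0.1, 1.0))    # tau: delay constraint in seconds
            for _ in range(k - 1)]

def shannon_rate(bandwidth_hz, tx_power_w, channel_gain, noise_w):
    """Transmission rate r = B * log2(1 + p * g / sigma^2), in bits/s."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_w)

tasks = generate_tasks(lam=3.0)
rate = shannon_rate(1e6, 0.5, 1e-6, 1e-9)
```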
203. Calculate the average delay and average energy consumption of the system at time t, obtained by accumulating the delay and energy consumption required by the tasks completed locally and by the tasks completed on the servers.
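Continuing the illustration, the step-203 accumulation over local and offloaded task portions can be sketched as below; the effective-capacitance coefficient kappa in the local energy model is a standard assumption, not a value given by the patent:

```python
# Sketch of step 203: accumulate delay and energy over the local and the
# offloaded portion of each task. kappa (effective switched capacitance of
# the local CPU) is an illustrative assumption.

def system_cost(tasks, offload_ratio, f_local, f_edge, rate, p_tx, kappa=1e-27):
    total_delay, total_energy = 0.0, 0.0
    for d, c, _tau in tasks:                      # (size, cycles/bit, deadline)
        d_loc, d_off = (1 - offload_ratio) * d, offload_ratio * d
        t_loc = d_loc * c / f_local               # local computation time
        t_tx = d_off / rate                       # uplink transmission time
        t_edge = d_off * c / f_edge               # edge computation time
        total_delay += max(t_loc, t_tx + t_edge)  # branches run in parallel
        e_loc = kappa * (f_local ** 2) * d_loc * c   # local CPU energy
        e_tx = p_tx * t_tx                           # transmission energy
        total_energy += e_loc + e_tx
    n = max(len(tasks), 1)
    return total_delay / n, total_energy / n      # averages at time t
```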
204. From the average delay and average energy consumption of the system, the corresponding optimization model can be established: the objective is to minimize the long-term average cost (delay and energy consumption) of the system, and the constraints are the offloading constraint, the delay constraint (the maximum allowed delay cannot be exceeded), the sub-channel allocation constraint (each device may be allocated only one sub-channel), the computing resource constraint (the allocated computing resources cannot exceed the maximum amount of computing resources) and the power constraint (the power cannot exceed the maximum power).
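Under the definitions of step 202, the step-204 problem can be written out in the following general form; this is a reconstruction from the constraints listed above, and the decision-variable symbols (x, ρ, β, f, p) and weights w1, w2 are assumptions:

```latex
% D(t), E(t): total delay and energy consumption of the system in slot t.
\begin{equation*}
\begin{aligned}
\min_{x,\,\rho,\,\beta,\,p,\,f}\quad
  & \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\bigl(w_1 D(t) + w_2 E(t)\bigr) \\
\text{s.t.}\quad
  & x_{n,m}(t)\in\{0,1\},\; 0\le\rho_{n,m}(t)\le 1
      && \text{(offloading decision and ratio)} \\
  & D_{n,m}(t)\le \tau_{n,m}(t)
      && \text{(maximum allowed delay)} \\
  & \sum_{k}\beta^{k}_{n,m}(t)\le 1,\;\beta^{k}_{n,m}(t)\in\{0,1\}
      && \text{(one sub-channel per device)} \\
  & \sum_{n} f_{n,m}(t)\le F^{\max}_{m}
      && \text{(computing resources)} \\
  & 0\le p_{n,m}(t)\le p^{\max}
      && \text{(transmission power)}
\end{aligned}
\end{equation*}
```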
205. Establish the Markov decision process based on multiple agents. The Markov game of M agents is represented by ⟨M, S, A, R, P⟩, where M denotes the number of agents, S denotes the state set of the system, A denotes the action set of all agents, P denotes the state transition probability, and R = {R_1(s(t), a(t)), …, R_M(s(t), a(t))}, where R_m(s(t), a(t)) denotes the reward function, i.e., the reward obtained when the m-th agent selects action a_m(t) in state s_m(t). Further, the MDP model of each agent is defined as follows:
1) State space: the state space consists of the task arrival rate of each time slot, the service characteristics and the remaining computing resources;
2) Action space: the action space consists of the cooperative offloading decision, offloading ratio, sub-channel allocation factor, transmission power and computing resource allocation of each time slot;
3) Reward function: the reward function represents the reward obtained when the agents select joint action a(t) in state s(t); each reward function reflects the total delay and energy-consumption cost of the system.
206. Select the optimal offloading and resource allocation decision at the current time according to the MADDPG algorithm, i.e., each agent m takes the action given by its current actor policy:

a_m(t) = μ_θm(s_m(t))

where μ_θm denotes the deterministic policy of agent m's actor network with parameters θm.
207. Execute the optimal decision at the current time, and the system transitions to the state of the next time slot.
208. Judge whether the number of iterations of the entire multi-agent deep reinforcement learning cooperative offloading and resource allocation procedure has reached the maximum number of iterations T; if not, return to step 201 and continue; otherwise, end the whole procedure.
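Putting steps 201-208 together, the overall flow of Fig. 4 can be sketched as one training loop; `env`, `agents` and `buffer` are hypothetical stand-ins for the system model, the MADDPG agents and the global replay pool described above:

```python
# Sketch of the Fig. 4 flow (steps 201-208). env, agents and buffer are
# hypothetical stand-ins for the components described in the text.

def train(env, agents, buffer, T_max: int, H: int):
    for episode in range(T_max):                  # 208: at most T iterations
        states = env.reset()                      # 201: initialize system state
        done = False
        while not done:
            # 202-204 happen inside env.step(): Poisson task generation,
            # Shannon-rate transmission, and delay/energy accounting.
            actions = [ag.act(s) for ag, s in zip(agents, states)]   # 206
            states2, rewards, done = env.step(actions)               # 207
            buffer.add(states, actions, rewards, states2)
            if len(buffer) >= H:
                batch = buffer.sample(H)          # break data correlations
                for ag in agents:
                    ag.update(batch)              # MADDPG critic/actor updates
            states = states2
```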
Finally, it is noted that the above-mentioned preferred embodiments illustrate rather than limit the invention, and that, although the invention has been described in detail with reference to the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims.
The above are preferred embodiments of the present invention; all equivalent changes and modifications made according to the technical scheme of the present invention belong to the protection scope of the present invention.

Claims (7)

1. A multi-agent DRL-based cooperative offloading and resource allocation method under an MEC architecture, characterized by comprising the following steps:
S1, providing a collaborative MEC system architecture that considers cooperation among edge nodes, i.e., when an edge node is overloaded, its task requests are migrated to other low-load edge nodes for collaborative processing;
S2, adopting a partial offloading strategy, i.e., offloading part of each computing task to the edge server for execution and distributing the remainder to the local IoT device for execution;
S3, modeling the joint optimization problem of task offloading decisions, computing resource allocation decisions and communication resource allocation decisions as an MDP problem, in view of the dynamically changing characteristics of task arrivals;
S4, dynamically allocating resources with a cooperative task offloading and resource allocation method based on multi-agent deep reinforcement learning, so as to maximize the quality of experience of users in the system.
2. The multi-agent DRL-based cooperative offloading and resource allocation method under the MEC architecture of claim 1, wherein the collaboration-based MEC system architecture comprises a device layer and an edge layer; the device layer consists of various IoT devices, the edge layer consists of a plurality of base stations connected through a wired network, and each base station is equipped with an MEC server with computing and storage capabilities; when offloading tasks to the edge servers, the selection of the target edge server takes into account the current remaining resource state of each edge server; a user may first choose to offload a task for execution to the edge server associated with the base station that provides it access; however, because different edge servers serve different numbers of users and the users' task arrival rates differ, the spatio-temporal distribution of services is uneven, producing differing communication and computation loads at different nodes; the mismatch between computing capacity and computing load makes certain edge nodes particularly busy, generating large computation delays, while other edge nodes sit largely idle, resulting in low resource utilization; therefore, computation load balancing is achieved through cooperation among the edge nodes, i.e., tasks of busy edge nodes are transferred to idle edge nodes for collaborative processing, thereby improving the overall resource utilization of the system and reducing task execution delay.
3. The multi-agent DRL-based cooperative offloading and resource allocation method under the MEC architecture of claim 1, wherein step S2 is specifically implemented as follows: in order to better utilize the computing resources on both the IoT devices and the edge server, a partial offloading strategy is designed; it is assumed that in each time slot the IoT user device offloads part of its computing task to the edge node for execution while the remainder is processed locally, so that the computing resources on the local IoT device are fully utilized and the processing of the computing task is accelerated; the system determines the amount of computation left for local processing according to the IoT device's resources and their current remaining amount, and offloads the remaining computation to edge nodes with stronger computing power for processing, so that the computing task can be processed in parallel and the task processing speed is increased.
4. The multi-agent DRL-based cooperative offloading and resource allocation method under the MEC architecture of claim 1, wherein step S3 is specifically implemented as follows: it is assumed that each user adopts a first-in first-out (FIFO) task scheduling rule, and the task scheduler decides whether to offload part or all of a task to the edge server and/or execute it locally; similarly, each edge server is also equipped with a FIFO task scheduler, which decides whether an incoming task is processed by this server or transferred to an adjacent edge server; since a stochastic task model is more consistent with the dynamically changing environment, task arrivals are assumed to follow a Poisson distribution; under the stochastic task model, the resource control decision for a particular task must consider its impact on future tasks from the perspective of the system's long-term average performance; by modeling the task model, the local computing model and the offloading model, the total execution delay and energy consumption of the tasks generated in each time slot are calculated, and an optimization model satisfying the task scheduling, communication resource and computing resource constraints is established; since the constraints of the optimization model contain binary variables, the problem is non-convex, and the optimization model is therefore recast as a Markov game based on a multi-agent MDP architecture.
5. The multi-agent DRL-based cooperative offloading and resource allocation method under the MEC architecture of claim 4, wherein the optimization model is modeled as the Markov game based on the multi-agent MDP architecture in the following specific way:
the Markov game of M agents is represented by ⟨M, S, A, R, P⟩, where M denotes the number of agents, S denotes the state set of the system, A denotes the joint action set of all agents, P denotes the state transition probability, and R = {R_1(s(t), a(t)), …, R_M(s(t), a(t))}, where R_m(s(t), a(t)) denotes the reward function of agent m when it selects action a_m(t) in state s_m(t); further, the MDP model of each agent is defined as follows:
1) State space: the state space consists of the task arrival rate of each time slot, the service characteristics and the remaining computing resources;
2) Action space: the action space consists of the cooperative offloading decision, offloading ratio, sub-channel allocation factor, transmission power and computing resource allocation of each time slot;
3) Reward function: the reward function represents the reward obtained when the agents select joint action a(t) in state s(t); each reward function reflects the total delay and energy-consumption cost of the system.
6. The multi-agent DRL-based cooperative offloading and resource allocation method under the MEC architecture of claim 5, wherein, for the Markov game based on the multi-agent MDP architecture, a task offloading and resource allocation method based on the multi-agent deep deterministic policy gradient algorithm MADDPG is proposed to solve it, specifically as follows:
in a multi-agent environment, each agent independently updates its own policy during learning, which may make the training process unstable; in addition, the agents need to cooperate to obtain the decision information of the other agents in order to compute a local Q function; therefore, to overcome the instability of the multi-agent environment and enable the agents to cooperate with one another, the multi-agent deep deterministic policy gradient algorithm MADDPG is proposed; through cooperation, each agent can obtain the states and actions of the other agents, so that the agents learn cooperative strategies more accurately; MADDPG models the influence of the other agents in the environment by using a centralized value function, and adopts centralized training with distributed execution.
7. The multi-agent DRL-based cooperative offloading and resource allocation method under the MEC architecture of claim 6, wherein the task offloading and resource allocation method based on the multi-agent deep deterministic policy gradient algorithm MADDPG is specifically as follows:
each agent is divided into two parts: a critic part and an actor part; in each time slot each agent obtains its own state information and inputs it into the actor network to obtain the optimal action under the current policy; by performing this action, each agent receives the current reward and enters a new state; in order to break the correlation between experience data, experience replay with a replay pool is adopted; at the beginning of each time slot, each agent randomly samples H experiences from the global replay pool and feeds them into its critic network to obtain the state-action value function; the actor network is responsible for selecting the action, and the critic network tells the actor network whether the selected action is appropriate according to the obtained state-action value function.
CN202111367135.8A 2021-11-18 2021-11-18 Multi-agent DRL-based cooperative offloading and resource allocation method under MEC architecture Pending CN113993218A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111367135.8A CN113993218A (en) Multi-agent DRL-based cooperative offloading and resource allocation method under MEC architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111367135.8A CN113993218A (en) Multi-agent DRL-based cooperative offloading and resource allocation method under MEC architecture

Publications (1)

Publication Number Publication Date
CN113993218A true CN113993218A (en) 2022-01-28

Family

ID=79749207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111367135.8A Pending CN113993218A (en) Multi-agent DRL-based cooperative offloading and resource allocation method under MEC architecture

Country Status (1)

Country Link
CN (1) CN113993218A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114170560A (en) * 2022-02-08 2022-03-11 深圳大学 Multi-device edge video analysis system based on deep reinforcement learning
CN114170560B (en) * 2022-02-08 2022-05-20 深圳大学 Multi-device edge video analysis system based on deep reinforcement learning
CN114466023A (en) * 2022-03-07 2022-05-10 中南大学 Computing service dynamic pricing method and system for large-scale edge computing system
CN114466023B (en) * 2022-03-07 2023-07-11 中南大学 Computing service dynamic pricing method and system for large-scale edge computing system
CN114661466A (en) * 2022-03-21 2022-06-24 东南大学 Task unloading method for intelligent workflow application in edge computing environment
CN115037749A (en) * 2022-06-08 2022-09-09 山东省计算中心(国家超级计算济南中心) Performance-aware intelligent multi-resource cooperative scheduling method and system for large-scale micro-service
CN115037749B (en) * 2022-06-08 2023-07-28 山东省计算中心(国家超级计算济南中心) Large-scale micro-service intelligent multi-resource collaborative scheduling method and system
CN115208892A (en) * 2022-07-19 2022-10-18 河海大学 Vehicle-road cooperative online task scheduling method and system based on dynamic resource demand
CN115208892B (en) * 2022-07-19 2023-10-24 河海大学 Vehicle-road collaborative online task scheduling method and system based on dynamic resource demand
CN115344395A (en) * 2022-10-18 2022-11-15 合肥工业大学智能制造技术研究院 Heterogeneous task generalization-oriented edge cache scheduling and task unloading method and system
CN116610454A (en) * 2023-07-17 2023-08-18 中国海洋大学 MADDPG algorithm-based hybrid cloud resource elastic expansion system and operation method
CN116610454B (en) * 2023-07-17 2023-10-17 中国海洋大学 MADDPG algorithm-based hybrid cloud resource elastic expansion system and operation method

Similar Documents

Publication Publication Date Title
CN113950066B (en) Single server part calculation unloading method, system and equipment under mobile edge environment
CN113993218A (en) Multi-agent DRL-based cooperative offloading and resource allocation method under MEC architecture
CN111930436B (en) Random task queuing unloading optimization method based on edge calculation
CN108063830B (en) Network slice dynamic resource allocation method based on MDP
Cheng et al. Multiagent DDPG-based joint task partitioning and power control in fog computing networks
Chen et al. Efficiency and fairness oriented dynamic task offloading in internet of vehicles
CN112004239A (en) Computing unloading method and system based on cloud edge cooperation
CN111475274B (en) Cloud collaborative multi-task scheduling method and device
CN109167671A (en) A kind of adapted communication system equally loaded dispatching algorithm towards quantum key distribution business
Xu et al. Cnn partitioning and offloading for vehicular edge networks in web3
EP4024212A1 (en) Method for scheduling interference workloads on edge network resources
Dong et al. Quantum particle swarm optimization for task offloading in mobile edge computing
CN111988787B (en) Task network access and service placement position selection method and system
Yang et al. Cooperative task offloading for mobile edge computing based on multi-agent deep reinforcement learning
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
Wu et al. Meccas: Collaborative storage algorithm based on alternating direction method of multipliers on mobile edge cloud
Hu et al. Dynamic task offloading in MEC-enabled IoT networks: A hybrid DDPG-D3QN approach
Cui et al. A new approach on task offloading scheduling for application of mobile edge computing
CN116209084A (en) Task unloading and resource allocation method in energy collection MEC system
Chen et al. Joint optimization of task caching, computation offloading and resource allocation for mobile edge computing
Lu et al. A game theoretical balancing approach for offloaded tasks in edge datacenters
Li et al. Joint optimization of auto-scaling and adaptive service placement in edge computing
Zhang et al. Time-sensitive multi-user oriented mobile edge computing task scheduling algorithm
Hossen et al. Mobile task offloading under unreliable edge performance
CN115499875A (en) Satellite internet task unloading method and system and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination