CN116828541A - Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning

Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning

Info

Publication number
CN116828541A
CN116828541A (application CN202310711734.XA)
Authority
CN
China
Prior art keywords
task
edge
computing
dependent
energy consumption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310711734.XA
Other languages
Chinese (zh)
Inventor
陈建海
李秋惠
刘二腾
沈睿
刘振广
何钦铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202310711734.XA
Publication of CN116828541A
Legal status: Pending (Current)

Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 28/0917: Network traffic/resource management; load balancing or load distribution; management thereof based on the energy state of entities
    • H04W 28/0925: Management thereof using policies
    • H04W 28/0942: Management using policies based on measured or predicted load of entities or links
    • H04W 28/0958: Management thereof based on metrics or performance parameters
    • H04W 28/0975: Quality of Service [QoS] parameters for reducing delays
    • G PHYSICS; G06 COMPUTING; G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5044: Allocation of resources considering hardware capabilities
    • G06F 9/5072: Grid computing
    • G06F 9/5094: Allocation of resources taking into account power or heat criteria
    • G06F 2209/502: Indexing scheme relating to resource allocation: proximity
    • G06F 2209/509: Indexing scheme relating to resource allocation: offload
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/092: Learning methods; reinforcement learning

Abstract

The invention relates to the fields of reinforcement learning and edge computing task scheduling, and in particular to a method and system for dynamically offloading edge computing dependent tasks based on multi-agent reinforcement learning. The method comprises the following steps: abstracting and generalizing the edge computing environment, and defining a computation offloading system model; determining the basic elements of the reinforcement learning algorithm according to the computation offloading system model; and determining a network structure and training method, and training the model to obtain an edge computing dependent task offloading decision model. The method treats all edge devices and edge servers as one system, thereby achieving efficient offloading of dependent tasks within the system.

Description

Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning
Technical Field
The invention relates to the fields of reinforcement learning and edge computing task scheduling, and in particular to a method and system for dynamically offloading edge computing dependent tasks based on multi-agent reinforcement learning.
Background
With the development of 5G, Internet of Things and artificial intelligence technology, the number of terminal devices of all kinds has grown explosively; the IMT-2020 (5G) Promotion Group predicts that the number of global Internet of Things device connections will reach about 100 billion by 2030. With this growth in terminal devices, a large number of computation-intensive and delay-sensitive applications have emerged, such as smart homes, autonomous driving, face recognition and virtual reality.
1. Edge computing and its challenges
In Internet of Things scenarios, resource-constrained terminal devices and comparatively distant cloud servers struggle to meet the requirements of large-scale computation-intensive and delay-sensitive applications, which is what gave rise to edge computing. Edge computing refers to computation performed at or near the physical location of a user or data source, thereby reducing latency and saving bandwidth. In the edge computing scenario, the edge device is the terminal device, and an edge server is a server physically close to the terminal or data source. An edge device can send some of its computation-intensive and delay-sensitive tasks to an edge server for processing, thereby improving task execution efficiency.
However, the number of tasks on and the load of each edge device, the communication bandwidth between edge devices and edge servers, and the load of each edge server all vary dynamically and in complex ways over time. How to determine the offloading decision of each edge device so as to reduce the delay of each task and improve the task execution efficiency of the whole system is the problem to be solved.
2. Reinforcement learning and the state of the art
The advent of reinforcement learning provides an effective solution for edge computing task offloading decisions. Reinforcement learning is a branch of machine learning that studies how an agent can maximize the reward it obtains in a complex, uncertain environment. The elements of a reinforcement learning environment include states, actions and rewards; in deep reinforcement learning, the policy that maps states to actions is represented by a neural network. During reinforcement learning, the agent obtains a state from the environment, takes an action according to that state, receives the reward fed back by the environment, and the state of the environment then changes; this cycle repeats. Initially, the agent selects actions completely at random; as training progresses, the agent continuously adjusts its actions based on the rewards, with the ultimate goal of obtaining as much reward from the environment as possible. In the task offloading scenario of edge computing, the state is a set of features extracted from the environment, such as the bandwidth or the memory of an edge server; an action is a task offloading decision, i.e. execute locally or offload to a particular edge server; and the reward may be the negative of the delay, or the negative of the delay plus a factor such as energy consumption.
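To make this interaction loop concrete, the following minimal Python sketch shows one training episode; the environment interface (reset/step) and the method names are common reinforcement learning conventions assumed here for illustration, not part of the patent:

```python
def run_episode(env, agent):
    """One agent-environment loop: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)                 # e.g. run locally or offload to server j
        next_state, reward, done = env.step(action)
        agent.learn(state, action, reward, next_state)  # adjust policy from feedback
        total_reward += reward
        state = next_state
    return total_reward
```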
Existing reinforcement-learning-based offloading methods treat each device as an independent agent that interacts with the edge servers on its own. In practice, edge devices are not independent of one another but influence each other: the decision of one edge device can affect the offloading decisions of the others. In addition, most existing work focuses on the mutually independent tasks generated by edge devices, and research on dependent tasks, i.e. tasks with dependency relations, is lacking.
In view of the foregoing, there is a need for a dynamic dependent task offloading method that comprehensively considers the behavior of all edge devices so as to achieve efficient offloading of dependent tasks in the system.
Disclosure of Invention
Aiming at the shortcomings of traditional edge computing dependent task offloading methods, the invention provides a multi-agent-reinforcement-learning-based method and system for dynamically offloading edge computing dependent tasks. In the edge computing scenario, each agent is an edge device and the environment is the edge computing environment composed of the network, the edge servers and so on; all edge devices and edge servers are treated as one system, thereby achieving efficient offloading of dependent tasks within the system.
The specific technical scheme is as follows:
The invention provides a multi-agent-reinforcement-learning-based dynamic offloading method for edge computing dependent tasks, comprising the following steps:
1) Abstract and generalize the edge computing environment, and construct a computation offloading system model comprising a task definition part, a latency computation part and an energy consumption computation part;
2) Determine the basic elements of the reinforcement learning algorithm according to the computation offloading system model, and construct a reinforcement learning model;
3) Determine the network structure and training method, and train the reinforcement learning model to obtain the edge computing dependent task offloading decision model.
As a preferred embodiment of the present invention, the computation offloading system model in step 1) specifically comprises: a task definition part, which represents a set of dependent tasks to be offloaded in the form of a directed acyclic graph (DAG); a latency computation part, which computes the time from when a decision is made for a task until its execution result is returned; and an energy consumption computation part, which computes the energy consumed by task execution.
As a preferred embodiment of the present invention, the task definition part is specifically as follows: a directed acyclic graph (DAG) is used to represent a set of dependent tasks requiring computation offloading. A vertex in the DAG represents a task, and an edge pointing from vertex B to vertex A represents the dependency of task A on task B; task B is called a parent task of task A. A child task can be executed only after all of its parent tasks have finished executing. A tuple $(\lambda_i, d_i, r_i)$ represents the attributes of task i, where $\lambda_i$ is the number of CPU clock cycles required to execute task i, $d_i$ is the input data size of task i, and $r_i$ is the returned data size of task i.
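To illustrate this task model, here is a minimal Python sketch of the DAG representation; the class and field names (DependentTask, parents, and so on) are illustrative assumptions rather than anything specified by the patent:

```python
from dataclasses import dataclass, field

@dataclass
class DependentTask:
    """One vertex of the dependency DAG with attributes (lambda_i, d_i, r_i)."""
    idx: int                   # identifier of the task
    cycles: float              # lambda_i: CPU clock cycles needed to execute the task
    d_in: float                # d_i: input data size
    r_out: float               # r_i: returned data size
    parents: list = field(default_factory=list)   # indices of parent tasks
    children: list = field(default_factory=list)  # indices of child tasks

def ready_tasks(tasks, finished):
    """A task may execute only after all of its parent tasks have finished."""
    return [t for t in tasks if t.idx not in finished
            and all(p in finished for p in t.parents)]
```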
As a preferred embodiment of the present invention, the latency computation part is specifically as follows:
When all parent tasks of task i have finished executing at time t, if task i is offloaded to edge server j for execution, the sending delay of task i is

$T_i^{up} = d_i / w^{up}$,

the receiving delay is

$T_i^{down} = r_i / w^{down}$,

the execution delay is

$T_i^{exec} = \lambda_i / f^{es}$,

and the total delay is

$T_i^{es,j} = T_i^{up} + T_i^{exec} + T_i^{down}$,

where $w^{up}$ is the uplink channel bandwidth between the edge device and the edge server, $w^{down}$ is the downlink channel bandwidth between the edge device and the edge server, and $f^{es}$ is the CPU clock frequency of edge server j.

If task i is executed locally, the total delay is

$T^{ed} = \lambda_i / f^{ed}$,

where $f^{ed}$ is the CPU clock frequency of the edge device.
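For concreteness, a small Python sketch of the latency model above, reusing the DependentTask sketch from earlier; the formulas follow the standard send/execute/receive decomposition reconstructed here, and all parameter names are assumptions:

```python
def offload_delay(task, w_up, w_down, f_es):
    """Total delay when the task is offloaded: send + execute + receive."""
    t_up = task.d_in / w_up         # sending delay d_i / w_up
    t_exec = task.cycles / f_es     # execution delay lambda_i / f_es
    t_down = task.r_out / w_down    # receiving delay r_i / w_down
    return t_up + t_exec + t_down

def local_delay(task, f_ed):
    """Total delay when the task runs on the edge device itself."""
    return task.cycles / f_ed       # lambda_i / f_ed
```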
As a preferred embodiment of the present invention, the energy consumption computation part is specifically as follows:
When task i is offloaded to an edge server, the total energy consumption is

$E^{es} = E_i^{up} + E_i^{exec} + E_i^{down}$, with $E_i^{up} = P^{up} \, T_i^{up}$, $E_i^{down} = P^{down} \, T_i^{down}$ and $E_i^{exec} = c_1 (f^{es})^2 \lambda_i$.

The energy consumption when task i is executed locally is

$E^{ed} = c_2 (f^{ed})^2 \lambda_i$,

where $E_i^{up}$ is the transmission energy consumed while offloading task i to the edge server, $E_i^{down}$ is the reception energy consumed while offloading task i to the edge server, $E_i^{exec}$ is the execution energy consumed when task i is offloaded to the edge server, $E^{es}$ is the total energy consumption when task i is offloaded to the edge server, $P^{up}$ is the transmission power, $P^{down}$ is the reception power, $c_1$ is the effective switched capacitance associated with the edge server chip architecture, and $c_2$ is the effective switched capacitance associated with the edge device chip architecture.
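A matching Python sketch of the energy model, under the same reconstruction assumptions (a dynamic-power model c·f²·λ for execution, and power multiplied by transmission time for communication):

```python
def offload_energy(task, w_up, w_down, f_es, p_up, p_down, c1):
    """Total energy when the task is offloaded: transmit + execute + receive."""
    e_up = p_up * task.d_in / w_up          # P_up * T_up
    e_exec = c1 * f_es ** 2 * task.cycles   # c1 * f_es^2 * lambda_i
    e_down = p_down * task.r_out / w_down   # P_down * T_down
    return e_up + e_exec + e_down

def local_energy(task, f_ed, c2):
    """Energy when the task runs locally on the edge device."""
    return c2 * f_ed ** 2 * task.cycles     # c2 * f_ed^2 * lambda_i
```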
As a preferred embodiment of the present invention, the basic elements of the reinforcement learning algorithm comprise the state of task i, the execution mode of task i, and the reward function. The state of task i is

$S_i = (G(T, E),\ K_p,\ L_{t-h:t-1})$,

the execution mode of task i is

$A_i \in \{0, 1, \dots, M\}$,

and the reward function $R_i$ is the negative of the sum of delay and energy consumption:

$R_i = -(T^{ed} + E^{ed})$ if $A_i = 0$ (task i is executed locally), and $R_i = -(T_i^{es,j} + E^{es})$ if $A_i = j$ (task i is offloaded to edge server j),

where $K_p$ denotes, for task i, the decisions of the tasks that are not its ancestors and that have been offloaded but not completed at the current time; $L_{t-h:t-1}$ is the load of all edge servers from time t-h to time t-1; $G(T, E)$ is the encoding of the DAG in which task i is located; $E^{ed}$ is the energy consumption when the task is executed locally; $E^{es}$ is the energy consumption when the task is offloaded to an edge server; $T^{ed}$ is the delay when the task is executed locally; and $T_i^{es,j}$ is the delay when the task is offloaded to edge server j for execution.
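A hedged sketch of how these elements could be assembled in code, building on the delay and energy sketches above; the flattened state layout and the params dictionary are assumptions:

```python
import numpy as np

def build_state(dag_encoding, pending_decisions, server_loads_hist):
    """State S_i = (G(T,E), K_p, L_{t-h:t-1}) flattened into one vector."""
    return np.concatenate([
        np.asarray(dag_encoding, dtype=np.float32).ravel(),
        np.asarray(pending_decisions, dtype=np.float32).ravel(),
        np.asarray(server_loads_hist, dtype=np.float32).ravel(),
    ])

def reward(action, task, params):
    """R_i = -(delay + energy); action 0 = local, j > 0 = offload to server j."""
    if action == 0:
        return -(local_delay(task, params["f_ed"])
                 + local_energy(task, params["f_ed"], params["c2"]))
    return -(offload_delay(task, params["w_up"], params["w_down"], params["f_es"])
             + offload_energy(task, params["w_up"], params["w_down"], params["f_es"],
                              params["p_up"], params["p_down"], params["c1"]))
```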
As a preferred embodiment of the present invention, in step 3) the network structure is specifically as follows: each edge device k comprises an Actor network $\mu_k$ and a Critic network $Q_k$. The Actor network is responsible for selecting the task execution mode, and the Critic network is responsible for scoring the task execution mode selected by the edge device.
As a preferred embodiment of the present invention, in step 3) the training method is specifically as follows:
3.1) Set the algorithm hyper-parameters, such as the learning rate, the maximum number of iterations and the reward discount factor; randomly initialize, for each edge device k, the parameters $\theta_k$ of its Actor network, the parameters $w_k$ of its Critic network, the parameters $\theta_k'$ of its ActorTarget network, and the parameters $w_k'$ of its CriticTarget network.
3.2) For edge device k, if all parent tasks of its current task i have finished executing at time t-1, start executing task i and update the network parameters: edge device k observes its own state $S_i$, inputs it into its own Actor network, outputs the offloading decision, and then executes task i.
3.3) At time t-1, if a task for which edge device k made a decision at time t-p has finished executing, store the reward $R_k^{t-p}$ of edge device k at time t-p; once the global reward $r_{t-p}$ of the system at time t-p has been fully collected, update the parameters.
4) Repeat the operations of steps 3.2) and 3.3) until the maximum number of iterations is reached, obtaining the edge computing dependent task offloading decision model.
The invention also provides an edge computing dependent task dynamic offloading system based on the edge computing dependent task dynamic offloading method described above, comprising:
a system model construction module, used to abstract and generalize the edge computing dependent task dynamic offloading environment and construct a computation offloading system model comprising a task definition part, a latency computation part and an energy consumption computation part;
a reinforcement learning model construction module, used to analyze the computation offloading system model and determine the basic elements of the reinforcement learning algorithm, including the states, actions and reward function; and
a training module, used to determine the network structure and training method and train the reinforcement learning model to obtain the edge computing dependent task offloading decision model.
Compared with the prior art, the invention has the following beneficial effects:
(1) Compared with existing methods that consider only independent task offloading, the offloading objects of the invention are dependent tasks, i.e. tasks with dependency relations.
(2) By adopting a multi-agent reinforcement learning algorithm, the influence of each edge device's behavior on the environment is comprehensively considered and delay and energy consumption are jointly optimized, thereby achieving efficient offloading of dependent tasks in the edge computing scenario.
(3) An attention structure is added to the Critic network of each agent, which handles the dynamically changing number of edge devices in the dependent task offloading scenario.
Drawings
FIG. 1 is a schematic diagram of a scenario of the dependent task dynamic offloading method of the present invention;
FIG. 2 is a schematic workflow diagram of the dependent task dynamic offloading method of the present invention;
FIG. 3 is a schematic diagram of the agent Actor network architecture;
FIG. 4 is a schematic diagram of the agent Critic network architecture;
FIG. 5 is a schematic diagram of the dependent task dynamic offloading system.
Detailed Description
The invention is further illustrated and described below in connection with specific embodiments. The described embodiments are merely examples of the present disclosure and do not limit its scope. The technical features of the embodiments of the invention can be combined with one another provided they do not conflict.
Aiming at the shortcomings of traditional edge computing dependent task offloading methods, in the invention, under the edge computing scenario, each agent is an edge device and the environment is the edge computing environment composed of the network, the edge servers and so on; all edge devices and edge servers are treated as one system, thereby achieving efficient offloading of dependent tasks within the system. As shown in FIG. 5, the invention provides an edge computing dependent task dynamic offloading system implementing the edge computing dependent task dynamic offloading method, comprising:
a system model construction module, used to abstract and generalize the edge computing dependent task dynamic offloading environment and construct a computation offloading system model comprising task definition, latency computation and energy consumption computation;
a reinforcement learning model construction module, used to analyze the computation offloading system model and determine the basic elements of the reinforcement learning algorithm, including the states, actions and reward function; and
a training module, used to determine the network structure and training method and train the reinforcement learning model to obtain the edge computing dependent task offloading decision model.
In a specific embodiment of the present invention, as shown in FIG. 1, the system has N edge devices and M edge servers, connected through a wireless network. The edge devices randomly generate dependent tasks with random structures; the tasks are represented by a directed acyclic graph in which an edge pointing from vertex B to vertex A represents the dependency of task A on task B, and task B is called a parent task of task A.
As shown in FIG. 2, the workflow of the dependent task dynamic offloading method comprises the following steps:
1) Encode the directed acyclic graph (DAG) representing the dependent tasks into a vector: sort the tasks in the DAG using a topological sort and use the index $idx_i$ in the sorted result as the number of each task, so that a child task's number is always larger than its parent task's number. In the encoding $G(T, E)$ of the DAG in which task i is located, the tasks are ordered by ascending number, and each task is represented by the triple of its task attributes, the indices of its parent tasks and the indices of its child tasks: $(\{\lambda_i, d_i, r_i\}, \{idx_p \mid p \in parent(i)\}, \{idx_c \mid c \in child(i)\})$.
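A brief Python sketch of this encoding step, reusing the DependentTask structure sketched earlier; Kahn's algorithm is one standard way to obtain the topological order the patent calls for, and is an implementation assumption here:

```python
from collections import deque

def topological_encode(tasks):
    """Number tasks so every child's index exceeds its parents' indices, then
    emit, per task, ((lambda, d, r), parent indices, child indices)."""
    by_idx = {t.idx: t for t in tasks}
    indeg = {t.idx: len(t.parents) for t in tasks}
    queue = deque(i for i, d in indeg.items() if d == 0)
    order = {}
    while queue:
        i = queue.popleft()
        order[i] = len(order)                # topological index idx_i
        for c in by_idx[i].children:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return [((t.cycles, t.d_in, t.r_out),
             sorted(order[p] for p in t.parents),
             sorted(order[c] for c in t.children))
            for t in sorted(tasks, key=lambda t: order[t.idx])]
```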
2) Set the parameters of the computation offloading system model: the latency computation parameters comprise the uplink channel bandwidth $w^{up}$, the downlink channel bandwidth $w^{down}$, the CPU clock frequency $f^{ed}$ of the edge devices and the CPU clock frequency $f^{es}$ of the edge servers; the energy consumption computation parameters comprise the transmission power $P^{up}$ from edge device to edge server, the reception power $P^{down}$, the effective switched capacitance $c_1$ associated with the edge server chip architecture, and the effective switched capacitance $c_2$ associated with the edge device chip architecture.
3) Set the algorithm hyper-parameters, such as the learning rate, the maximum number of iterations and the reward discount factor; randomly initialize the network parameters, including the parameters $\theta_k$ of each agent's Actor network, the parameters $w_k$ of its Critic network, the parameters $\theta_k'$ of its ActorTarget network, and the parameters $w_k'$ of its CriticTarget network.
4) For agent k, if all parent tasks of its current task i have finished executing at time t-1, start executing task i and update the network parameters: agent k observes its own observation vector $o_i^t$, inputs it into its own Actor network, outputs the offloading decision, and then executes task i.
4-1) The observation vector of task i at time t consists of the DAG encoding, the offloading decisions of the first i-1 tasks, and the pending queue sizes of all edge servers from time t-h to time t-1: $o_i^t = (G(T, E),\ A_1, \dots, A_{i-1},\ L_{t-h:t-1})$.
4-2) The action vector $a_i^t = A_i$ indicates the offloading decision of the task: $A_i = 0$ indicates that task i is executed locally, and $A_i = j$ with j > 0 means that task i is offloaded to edge server j for execution.
4-3) The system reward $r_t$ is the sum of the rewards of all tasks participating in offloading decisions at time t after their offloading decisions are made: $r_t = \sum_i R_i^t$.
5) At time t-1, if a task for which agent k made a decision at time t-p has finished executing, store the reward $R_k^{t-p}$ of agent k at time t-p; once the global reward $r_{t-p}$ of the system at time t-p has been fully collected, update the parameters:
5-1) Store the tuple of the global observation at time t-p, the actions of all agents, the global reward and the global observation at the next time, $(x_{t-p}, a_{t-p}, r_{t-p}, x_{t-p+1})$, into the global experience replay pool D.
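A minimal Python sketch of such a global experience replay pool; the fixed-capacity deque is an implementation assumption:

```python
import random
from collections import deque

class ReplayPool:
    """Global experience replay pool D of (x, a, r, x_next) tuples."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, x, a, r, x_next):
        self.buffer.append((x, a, r, x_next))

    def sample(self, b):
        """Randomly sample a mini-batch of b transitions."""
        return random.sample(self.buffer, b)
```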
5-2) Randomly sample b samples from the experience replay pool, and for each agent z that made a task decision at time t-p, update the parameters of its Critic network according to the loss function

$L(w_z) = \frac{1}{b} \sum \big( Q_z(x_{t-p}, a_{t-p}) - y_{t-p} \big)^2$,

where $y_{t-p}$ is the TD target, comprising the true reward at time t-p and the reward predicted by the CriticTarget network at time t-p+1:

$y_{t-p} = r_{t-p} + \gamma \, Q_z'\big(x_{t-p+1}, \mu_1'(o_1), \dots, \mu_N'(o_N)\big)$,

in which $Q_z'$ is the CriticTarget network of agent z, $\mu_z'$ is the ActorTarget network of agent z, and $\gamma$ is the reward discount factor.
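A hedged PyTorch sketch of this critic update, in the style of MADDPG-type centralized critics; the network interfaces and tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def critic_update(x, a, r, x_next, obs_next, critic, critic_target,
                  actor_targets, critic_opt, gamma=0.99):
    """One critic step for agent z: regress Q_z(x, a) toward the TD target y."""
    with torch.no_grad():
        # Each agent's ActorTarget network acts on its own next observation.
        a_next = torch.cat([mu(o) for mu, o in zip(actor_targets, obs_next)], dim=-1)
        y = r + gamma * critic_target(x_next, a_next)   # TD target y_{t-p}
    loss = F.mse_loss(critic(x, a), y)                  # mean squared TD error
    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()
```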
5-3) Update the parameters of the Actor network according to the gradient

$\nabla_{\theta_z} J \approx \frac{1}{b} \sum \nabla_{\theta_z} \mu_z(o_z) \, \nabla_{a_z} Q_z(x, a_1, \dots, a_N) \big|_{a_z = \mu_z(o_z)}$,

where $\mu_z$ is the Actor network of agent z and $Q_z$ is the Critic network of agent z.
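And a matching sketch of the actor step, continuing the PyTorch assumptions above: agent z's action is re-derived through its Actor network so the critic's score can be back-propagated into the actor parameters:

```python
def actor_update(x, obs, actions, z, actor, critic, actor_opt):
    """One actor step for agent z: ascend Q_z along the policy gradient."""
    a = list(actions)                    # detached actions of all agents
    a[z] = actor(obs[z])                 # re-derive agent z's action from its Actor
    loss = -critic(x, torch.cat(a, dim=-1)).mean()   # maximize Q == minimize -Q
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
```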
5-4) Update the parameters of the CriticTarget and ActorTarget networks according to $w_k' = \tau w_k + (1 - \tau) w_k'$ and $\theta_k' = \tau \theta_k + (1 - \tau) \theta_k'$.
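The soft update itself is a short helper in PyTorch; a sketch with an assumed function name:

```python
def soft_update(target_net, net, tau=0.01):
    """w' <- tau * w + (1 - tau) * w', applied parameter-wise."""
    with torch.no_grad():
        for p_t, p in zip(target_net.parameters(), net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
```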
6) Repeat steps 4) and 5) until the maximum number of iterations is reached.
7) Deploy the trained Actor network of each agent into the actual scenario; the Critic network is no longer used.
As shown in FIG. 3, the Actor network structure of agent k comprises an LSTM layer, a fully connected layer and a multi-layer neural network. The LSTM layer takes as input the pending queue sizes of all edge servers from time t-h to time t-1, $L_{t-h:t-1}$, and outputs a prediction of the edge server loads from time t to time t+u.
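A hedged PyTorch sketch of such an Actor network; the layer sizes and the way the LSTM's load prediction is fused with the rest of the observation are assumptions, since FIG. 3 only fixes the overall structure:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor mu_k: LSTM load predictor plus an MLP over the full observation."""
    def __init__(self, n_servers, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_servers, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(obs_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),   # logits over {local, server 1..M}
        )

    def forward(self, loads_hist, obs_rest):
        # loads_hist: [b, h, n_servers] queue sizes; obs_rest: [b, obs_dim]
        _, (h_n, _) = self.lstm(loads_hist)          # load-prediction features
        z = torch.cat([obs_rest, h_n[-1]], dim=-1)
        return torch.softmax(self.head(z), dim=-1)   # offloading decision distribution
```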
As shown in FIG. 4, the Critic network structure of agent k comprises an Attention layer and fully connected layers. The observation and action information of agent k is input directly into the first fully connected layer and then into the second fully connected layer, while the observation and action information of the other agents is first encoded into a vector $v_i$ through the Attention layer and then input into the second fully connected layer.
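A sketch of the attention-based Critic under the same assumptions; pooling the other agents' observation-action encodings through attention is what lets the network accept a varying number of edge devices:

```python
class Critic(nn.Module):
    """Critic Q_k: own (obs, act) through an FC layer; other agents pooled by attention."""
    def __init__(self, own_dim, other_dim, hidden=128):
        super().__init__()
        self.own_fc = nn.Linear(own_dim, hidden)
        self.enc = nn.Linear(other_dim, hidden)   # per-agent encoder for the others
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.out = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, own, others):
        # own: [b, own_dim]; others: [b, n_other, other_dim], n_other may vary
        q_own = torch.relu(self.own_fc(own))
        keys = torch.relu(self.enc(others))
        v, _ = self.attn(q_own.unsqueeze(1), keys, keys)   # pooled vector v_i
        return self.out(torch.cat([q_own, v.squeeze(1)], dim=-1))
```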
The foregoing examples illustrate only a few embodiments of the invention in detail and are not thereby to be construed as limiting its scope. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.

Claims (9)

1. An edge computing dependent task dynamic offloading method based on multi-agent reinforcement learning, characterized by comprising the following steps:
1) abstracting and generalizing the edge computing environment, and constructing a computation offloading system model comprising a task definition part, a latency computation part and an energy consumption computation part;
2) determining the basic elements of the reinforcement learning algorithm according to the computation offloading system model, and constructing a reinforcement learning model;
3) determining a network structure and a training method, and training the reinforcement learning model to obtain an edge computing dependent task offloading decision model.
2. The edge computing dependent task dynamic offloading method of claim 1, wherein the computation offloading system model in step 1) specifically comprises: a task definition part, which represents a set of dependent tasks to be offloaded in the form of a directed acyclic graph (DAG); a latency computation part, which computes the time from when a decision is made for a task until its execution result is returned; and an energy consumption computation part, which computes the energy consumed by task execution.
3. The edge computing dependent task dynamic offloading method of claim 2, wherein the task definition part is specifically as follows: a directed acyclic graph (DAG) is used to represent a set of dependent tasks requiring computation offloading; a vertex in the DAG represents a task, an edge pointing from vertex B to vertex A represents the dependency of task A on task B, and task B is called a parent task of task A; a child task can be executed only after all of its parent tasks have finished executing; a tuple $(\lambda_i, d_i, r_i)$ represents the attributes of task i, where $\lambda_i$ is the number of CPU clock cycles required to execute task i, $d_i$ is the input data size of task i, and $r_i$ is the returned data size of task i.
4. The edge computing dependent task dynamic offloading method of claim 2, wherein the latency computation part is specifically as follows:
when all parent tasks of task i have finished executing at time t, if task i is offloaded to edge server j for execution, the sending delay of task i is $T_i^{up} = d_i / w^{up}$, the receiving delay is $T_i^{down} = r_i / w^{down}$, the execution delay is $T_i^{exec} = \lambda_i / f^{es}$, and the total delay is $T_i^{es,j} = T_i^{up} + T_i^{exec} + T_i^{down}$,
where $w^{up}$ is the uplink channel bandwidth between the edge device and the edge server, $w^{down}$ is the downlink channel bandwidth between the edge device and the edge server, and $f^{es}$ is the CPU clock frequency of edge server j;
if task i is executed locally, the total delay is $T^{ed} = \lambda_i / f^{ed}$, where $f^{ed}$ is the CPU clock frequency of the edge device.
5. The edge computing dependent task dynamic offloading method of claim 2, wherein the energy consumption computation part is specifically as follows:
when task i is offloaded to an edge server, the total energy consumption is $E^{es} = E_i^{up} + E_i^{exec} + E_i^{down}$, with $E_i^{up} = P^{up} \, T_i^{up}$, $E_i^{down} = P^{down} \, T_i^{down}$ and $E_i^{exec} = c_1 (f^{es})^2 \lambda_i$; the energy consumption when task i is executed locally is $E^{ed} = c_2 (f^{ed})^2 \lambda_i$,
where $E_i^{up}$ is the transmission energy consumed while offloading task i to the edge server, $E_i^{down}$ is the reception energy consumed while offloading task i to the edge server, $E_i^{exec}$ is the execution energy consumed when task i is offloaded to the edge server, $E^{es}$ is the total energy consumption when task i is offloaded to the edge server, $P^{up}$ is the transmission power, $P^{down}$ is the reception power, $c_1$ is the effective switched capacitance associated with the edge server chip architecture, and $c_2$ is the effective switched capacitance associated with the edge device chip architecture.
6. The edge computing dependent task dynamic offloading method of claim 1, wherein the basic elements of the reinforcement learning algorithm comprise: the state of task i, the execution mode of task i, and the reward function; the state of task i is $S_i = (G(T, E), K_p, L_{t-h:t-1})$; the execution mode of task i is $A_i \in \{0, 1, \dots, M\}$; the reward function $R_i$ is the negative of the sum of delay and energy consumption: $R_i = -(T^{ed} + E^{ed})$ if task i is executed locally ($A_i = 0$), and $R_i = -(T_i^{es,j} + E^{es})$ if task i is offloaded to edge server j ($A_i = j$),
where $K_p$ denotes, for task i, the decisions of the tasks that are not its ancestors and that have been offloaded but not completed at the current time; $L_{t-h:t-1}$ is the load of all edge servers from time t-h to time t-1; $G(T, E)$ is the encoding of the DAG in which task i is located; $E^{ed}$ is the energy consumption when the task is executed locally; $E^{es}$ is the energy consumption when the task is offloaded to an edge server; $T^{ed}$ is the delay when the task is executed locally; and $T_i^{es,j}$ is the delay when the task is offloaded to edge server j for execution.
7. The edge computing dependent task dynamic offloading method of claim 1, wherein in step 3) the network structure is specifically as follows: each edge device k comprises an Actor network $\mu_k$ and a Critic network $Q_k$; the Actor network is responsible for selecting the task execution mode, and the Critic network is responsible for scoring the task execution mode selected by the edge device.
8. The edge computing dependent task dynamic offloading method of claim 7, wherein in step 3) the training method is specifically as follows:
3.1) set the algorithm hyper-parameters, such as the learning rate, the maximum number of iterations and the reward discount factor; randomly initialize, for each edge device k, the parameters $\theta_k$ of its Actor network, the parameters $w_k$ of its Critic network, the parameters $\theta_k'$ of its ActorTarget network, and the parameters $w_k'$ of its CriticTarget network;
3.2) for edge device k, if all parent tasks of its current task i have finished executing at time t-1, start executing task i and update the network parameters: edge device k observes its own state $S_i$, inputs it into its own Actor network, outputs the offloading decision, and then executes task i;
3.3) at time t-1, if a task for which edge device k made a decision at time t-p has finished executing, store the reward $R_k^{t-p}$ of edge device k at time t-p; once the global reward $r_{t-p}$ of the system at time t-p has been fully collected, update the parameters;
4) repeat the operations of steps 3.2) and 3.3) until the maximum number of iterations is reached, obtaining the edge computing dependent task offloading decision model.
9. An edge computing dependent task dynamic offloading system based on the edge computing dependent task dynamic offloading method of claim 1, comprising:
a system model construction module, used to abstract and generalize the edge computing dependent task dynamic offloading environment and construct a computation offloading system model comprising task definition, latency computation and energy consumption computation;
a reinforcement learning model construction module, used to analyze the computation offloading system model and determine the basic elements of the reinforcement learning algorithm, including the states, actions and reward function; and
a training module, used to determine the network structure and training method and train the reinforcement learning model to obtain the edge computing dependent task offloading decision model.
CN202310711734.XA 2023-06-15 2023-06-15 Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning Pending CN116828541A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310711734.XA CN116828541A Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310711734.XA CN116828541A Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning

Publications (1)

Publication Number Publication Date
CN116828541A 2023-09-29

Family

ID=88119711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310711734.XA 2023-06-15 Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN116828541A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117499491A (en) * 2023-12-27 2024-02-02 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning
CN117499491B (en) * 2023-12-27 2024-03-26 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination