CN116828541A - Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning

Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning

Info

Publication number
CN116828541A
CN116828541A (application CN202310711734.XA)
Authority
CN
China
Prior art keywords
task
edge
computing
dependent
energy consumption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310711734.XA
Other languages
Chinese (zh)
Inventor
陈建海
李秋惠
刘二腾
沈睿
刘振广
何钦铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202310711734.XA
Publication of CN116828541A
Legal status: Pending (Current)

Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 28/0917: Network traffic/resource management; load balancing or load distribution; management thereof based on the energy state of entities
    • H04W 28/0925: Management thereof using policies
    • H04W 28/0942: Management using policies based on measured or predicted load of entities or links
    • H04W 28/0958: Management thereof based on metrics or performance parameters
    • H04W 28/0975: Quality of Service [QoS] parameters for reducing delays
    • G PHYSICS; G06 COMPUTING; G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5044: Allocation of resources considering hardware capabilities
    • G06F 9/5072: Grid computing
    • G06F 9/5094: Allocation of resources taking into account power or heat criteria
    • G06F 2209/502: Indexing scheme relating to resource allocation: proximity
    • G06F 2209/509: Indexing scheme relating to resource allocation: offload
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/092: Learning methods; reinforcement learning

Abstract

The invention relates to the fields of reinforcement learning and edge computing task scheduling, and in particular to a method and system for dynamically offloading edge computing dependent tasks based on multi-agent reinforcement learning. The method comprises the following steps: abstracting and generalizing the edge computing environment, and defining a computation offloading system model; determining the basic elements of the reinforcement learning algorithm according to the computation offloading system model; and determining a network structure and training method, and training the model to obtain an edge computing dependent task offloading decision model. The method treats all edge devices and edge servers as one system, thereby achieving efficient offloading of dependent tasks within the system.

Description

Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning
Technical Field
The invention relates to the fields of reinforcement learning and edge computing task scheduling, and in particular to a method and system for dynamically offloading edge computing dependent tasks based on multi-agent reinforcement learning.
Background
With the development of 5G, Internet of Things and artificial intelligence technology, the number of terminal devices of all kinds has grown explosively; the IMT-2020 (5G) Promotion Group predicts that the number of global Internet of Things device connections will reach about 100 billion by 2030. With this growth in terminal devices, a large number of computation-intensive and delay-sensitive applications have emerged, such as smart homes, autonomous driving, face recognition and virtual reality.
1. Edge computing and its challenges
In Internet of Things scenarios, resource-constrained terminal devices and comparatively distant cloud servers struggle to meet the requirements of large-scale computation-intensive and delay-sensitive applications, which is what gave rise to edge computing. Edge computing refers to computation performed at or near the physical location of a user or data source, thereby reducing latency and saving bandwidth. In the edge computing scenario, the edge device is the terminal device, and an edge server is a server physically close to the terminal or data source. An edge device can send some of its computation-intensive and delay-sensitive tasks to an edge server for processing, thereby improving task execution efficiency.
However, the number of tasks on and the load of each edge device, the communication bandwidth between edge devices and edge servers, and the load of each edge server all vary dynamically and in complex ways over time. How to determine the offloading decision of each edge device so as to reduce the delay of each task and improve the task execution efficiency of the whole system is the problem to be solved.
2. Reinforcement learning and the state of the art
The advent of reinforcement learning provides an effective solution for edge computing task offloading decisions. Reinforcement learning is a branch of machine learning that studies how an agent can maximize the reward it obtains in a complex, uncertain environment. The elements of a reinforcement learning environment include states, actions and rewards; in deep reinforcement learning, the policy that maps states to actions is represented by a neural network. During reinforcement learning, the agent obtains a state from the environment, takes an action according to that state, receives the reward fed back by the environment, and the state of the environment then changes; this cycle repeats. Initially, the agent selects actions completely at random; as training progresses, the agent continuously adjusts its actions based on the rewards, with the ultimate goal of obtaining as much reward from the environment as possible. In the task offloading scenario of edge computing, the state is a set of features extracted from the environment, such as the bandwidth or the memory of an edge server; an action is a task offloading decision, i.e. execute locally or offload to a particular edge server; and the reward may be the negative of the delay, or the negative of the delay plus a factor such as energy consumption.
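To make this interaction loop concrete, the following minimal Python sketch shows one training episode; the environment interface (reset/step) and the method names are common reinforcement learning conventions assumed here for illustration, not part of the patent:

```python
def run_episode(env, agent):
    """One agent-environment loop: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)                 # e.g. run locally or offload to server j
        next_state, reward, done = env.step(action)
        agent.learn(state, action, reward, next_state)  # adjust policy from feedback
        total_reward += reward
        state = next_state
    return total_reward
```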
Existing reinforcement-learning-based offloading methods treat each device as an independent agent that interacts with the edge servers on its own. In practice, edge devices are not independent of one another but influence each other: the decision of one edge device can affect the offloading decisions of the others. In addition, most existing work focuses on the mutually independent tasks generated by edge devices, and research on dependent tasks, i.e. tasks with dependency relations, is lacking.
In view of the foregoing, there is a need for a dynamic dependent task offloading method that comprehensively considers the behavior of all edge devices so as to achieve efficient offloading of dependent tasks in the system.
Disclosure of Invention
Aiming at the shortcomings of traditional edge computing dependent task offloading methods, the invention provides a multi-agent-reinforcement-learning-based method and system for dynamically offloading edge computing dependent tasks. In the edge computing scenario, each agent is an edge device and the environment is the edge computing environment composed of the network, the edge servers and so on; all edge devices and edge servers are treated as one system, thereby achieving efficient offloading of dependent tasks within the system.
The specific technical scheme is as follows:
The invention provides a multi-agent-reinforcement-learning-based dynamic offloading method for edge computing dependent tasks, comprising the following steps:
1) Abstract and generalize the edge computing environment, and construct a computation offloading system model comprising a task definition part, a latency computation part and an energy consumption computation part;
2) Determine the basic elements of the reinforcement learning algorithm according to the computation offloading system model, and construct a reinforcement learning model;
3) Determine the network structure and training method, and train the reinforcement learning model to obtain the edge computing dependent task offloading decision model.
As a preferred embodiment of the present invention, the computation offloading system model in step 1) specifically comprises: a task definition part, which represents a set of dependent tasks to be offloaded in the form of a directed acyclic graph (DAG); a latency computation part, which computes the time from when a decision is made for a task until its execution result is returned; and an energy consumption computation part, which computes the energy consumed by task execution.
As a preferred embodiment of the present invention, the task definition part is specifically as follows: a directed acyclic graph (DAG) is used to represent a set of dependent tasks requiring computation offloading. A vertex in the DAG represents a task, and an edge pointing from vertex B to vertex A represents the dependency of task A on task B; task B is called a parent task of task A. A child task can be executed only after all of its parent tasks have finished executing. A tuple $(\lambda_i, d_i, r_i)$ represents the attributes of task i, where $\lambda_i$ is the number of CPU clock cycles required to execute task i, $d_i$ is the input data size of task i, and $r_i$ is the returned data size of task i.
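To illustrate this task model, here is a minimal Python sketch of the DAG representation; the class and field names (DependentTask, parents, and so on) are illustrative assumptions rather than anything specified by the patent:

```python
from dataclasses import dataclass, field

@dataclass
class DependentTask:
    """One vertex of the dependency DAG with attributes (lambda_i, d_i, r_i)."""
    idx: int                   # identifier of the task
    cycles: float              # lambda_i: CPU clock cycles needed to execute the task
    d_in: float                # d_i: input data size
    r_out: float               # r_i: returned data size
    parents: list = field(default_factory=list)   # indices of parent tasks
    children: list = field(default_factory=list)  # indices of child tasks

def ready_tasks(tasks, finished):
    """A task may execute only after all of its parent tasks have finished."""
    return [t for t in tasks if t.idx not in finished
            and all(p in finished for p in t.parents)]
```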
As a preferred embodiment of the present invention, the latency computation part is specifically as follows:
When all parent tasks of task i have finished executing at time t, if task i is offloaded to edge server j for execution, the sending delay of task i is

$T_i^{up} = d_i / w^{up}$,

the receiving delay is

$T_i^{down} = r_i / w^{down}$,

the execution delay is

$T_i^{exec} = \lambda_i / f^{es}$,

and the total delay is

$T_i^{es,j} = T_i^{up} + T_i^{exec} + T_i^{down}$,

where $w^{up}$ is the uplink channel bandwidth between the edge device and the edge server, $w^{down}$ is the downlink channel bandwidth between the edge device and the edge server, and $f^{es}$ is the CPU clock frequency of edge server j.

If task i is executed locally, the total delay is

$T^{ed} = \lambda_i / f^{ed}$,

where $f^{ed}$ is the CPU clock frequency of the edge device.
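For concreteness, a small Python sketch of the latency model above, reusing the DependentTask sketch from earlier; the formulas follow the standard send/execute/receive decomposition reconstructed here, and all parameter names are assumptions:

```python
def offload_delay(task, w_up, w_down, f_es):
    """Total delay when the task is offloaded: send + execute + receive."""
    t_up = task.d_in / w_up         # sending delay d_i / w_up
    t_exec = task.cycles / f_es     # execution delay lambda_i / f_es
    t_down = task.r_out / w_down    # receiving delay r_i / w_down
    return t_up + t_exec + t_down

def local_delay(task, f_ed):
    """Total delay when the task runs on the edge device itself."""
    return task.cycles / f_ed       # lambda_i / f_ed
```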
As a preferred embodiment of the present invention, the energy consumption computation part is specifically as follows:
When task i is offloaded to an edge server, the total energy consumption is

$E^{es} = E_i^{up} + E_i^{exec} + E_i^{down}$, with $E_i^{up} = P^{up} \, T_i^{up}$, $E_i^{down} = P^{down} \, T_i^{down}$ and $E_i^{exec} = c_1 (f^{es})^2 \lambda_i$.

The energy consumption when task i is executed locally is

$E^{ed} = c_2 (f^{ed})^2 \lambda_i$,

where $E_i^{up}$ is the transmission energy consumed while offloading task i to the edge server, $E_i^{down}$ is the reception energy consumed while offloading task i to the edge server, $E_i^{exec}$ is the execution energy consumed when task i is offloaded to the edge server, $E^{es}$ is the total energy consumption when task i is offloaded to the edge server, $P^{up}$ is the transmission power, $P^{down}$ is the reception power, $c_1$ is the effective switched capacitance associated with the edge server chip architecture, and $c_2$ is the effective switched capacitance associated with the edge device chip architecture.
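A matching Python sketch of the energy model, under the same reconstruction assumptions (a dynamic-power model c·f²·λ for execution, and power multiplied by transmission time for communication):

```python
def offload_energy(task, w_up, w_down, f_es, p_up, p_down, c1):
    """Total energy when the task is offloaded: transmit + execute + receive."""
    e_up = p_up * task.d_in / w_up          # P_up * T_up
    e_exec = c1 * f_es ** 2 * task.cycles   # c1 * f_es^2 * lambda_i
    e_down = p_down * task.r_out / w_down   # P_down * T_down
    return e_up + e_exec + e_down

def local_energy(task, f_ed, c2):
    """Energy when the task runs locally on the edge device."""
    return c2 * f_ed ** 2 * task.cycles     # c2 * f_ed^2 * lambda_i
```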
As a preferred embodiment of the present invention, the basic elements of the reinforcement learning algorithm comprise the state of task i, the execution mode of task i, and the reward function. The state of task i is

$S_i = (G(T, E),\ K_p,\ L_{t-h:t-1})$,

the execution mode of task i is

$A_i \in \{0, 1, \dots, M\}$,

and the reward function $R_i$ is the negative of the sum of delay and energy consumption:

$R_i = -(T^{ed} + E^{ed})$ if $A_i = 0$ (task i is executed locally), and $R_i = -(T_i^{es,j} + E^{es})$ if $A_i = j$ (task i is offloaded to edge server j),

where $K_p$ denotes, for task i, the decisions of the tasks that are not its ancestors and that have been offloaded but not completed at the current time; $L_{t-h:t-1}$ is the load of all edge servers from time t-h to time t-1; $G(T, E)$ is the encoding of the DAG in which task i is located; $E^{ed}$ is the energy consumption when the task is executed locally; $E^{es}$ is the energy consumption when the task is offloaded to an edge server; $T^{ed}$ is the delay when the task is executed locally; and $T_i^{es,j}$ is the delay when the task is offloaded to edge server j for execution.
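A hedged sketch of how these elements could be assembled in code, building on the delay and energy sketches above; the flattened state layout and the params dictionary are assumptions:

```python
import numpy as np

def build_state(dag_encoding, pending_decisions, server_loads_hist):
    """State S_i = (G(T,E), K_p, L_{t-h:t-1}) flattened into one vector."""
    return np.concatenate([
        np.asarray(dag_encoding, dtype=np.float32).ravel(),
        np.asarray(pending_decisions, dtype=np.float32).ravel(),
        np.asarray(server_loads_hist, dtype=np.float32).ravel(),
    ])

def reward(action, task, params):
    """R_i = -(delay + energy); action 0 = local, j > 0 = offload to server j."""
    if action == 0:
        return -(local_delay(task, params["f_ed"])
                 + local_energy(task, params["f_ed"], params["c2"]))
    return -(offload_delay(task, params["w_up"], params["w_down"], params["f_es"])
             + offload_energy(task, params["w_up"], params["w_down"], params["f_es"],
                              params["p_up"], params["p_down"], params["c1"]))
```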
As a preferred embodiment of the present invention, in step 3) the network structure is specifically as follows: each edge device k comprises an Actor network $\mu_k$ and a Critic network $Q_k$. The Actor network is responsible for selecting the task execution mode, and the Critic network is responsible for scoring the task execution mode selected by the edge device.
As a preferred embodiment of the present invention, in step 3) the training method is specifically as follows:
3.1) Set the algorithm hyper-parameters, such as the learning rate, the maximum number of iterations and the reward discount factor; randomly initialize, for each edge device k, the parameters $\theta_k$ of its Actor network, the parameters $w_k$ of its Critic network, the parameters $\theta_k'$ of its ActorTarget network, and the parameters $w_k'$ of its CriticTarget network.
3.2) For edge device k, if all parent tasks of its current task i have finished executing at time t-1, start executing task i and update the network parameters: edge device k observes its own state $S_i$, inputs it into its own Actor network, outputs the offloading decision, and then executes task i.
3.3) At time t-1, if a task for which edge device k made a decision at time t-p has finished executing, store the reward $R_k^{t-p}$ of edge device k at time t-p; once the global reward $r_{t-p}$ of the system at time t-p has been fully collected, update the parameters.
4) Repeat the operations of steps 3.2) and 3.3) until the maximum number of iterations is reached, obtaining the edge computing dependent task offloading decision model.
The invention also provides an edge computing dependent task dynamic offloading system based on the edge computing dependent task dynamic offloading method described above, comprising:
a system model construction module, used to abstract and generalize the edge computing dependent task dynamic offloading environment and construct a computation offloading system model comprising a task definition part, a latency computation part and an energy consumption computation part;
a reinforcement learning model construction module, used to analyze the computation offloading system model and determine the basic elements of the reinforcement learning algorithm, including the states, actions and reward function; and
a training module, used to determine the network structure and training method and train the reinforcement learning model to obtain the edge computing dependent task offloading decision model.
Compared with the prior art, the invention has the following beneficial effects:
(1) Compared with existing methods that consider only independent task offloading, the offloading objects of the invention are dependent tasks, i.e. tasks with dependency relations.
(2) By adopting a multi-agent reinforcement learning algorithm, the influence of each edge device's behavior on the environment is comprehensively considered and delay and energy consumption are jointly optimized, thereby achieving efficient offloading of dependent tasks in the edge computing scenario.
(3) An attention structure is added to the Critic network of each agent, which handles the dynamically changing number of edge devices in the dependent task offloading scenario.
Drawings
FIG. 1 is a schematic diagram of a scenario of the dependent task dynamic offloading method of the present invention;
FIG. 2 is a schematic workflow diagram of the dependent task dynamic offloading method of the present invention;
FIG. 3 is a schematic diagram of the agent Actor network architecture;
FIG. 4 is a schematic diagram of the agent Critic network architecture;
FIG. 5 is a schematic diagram of the dependent task dynamic offloading system.
Detailed Description
The invention is further illustrated and described below in connection with specific embodiments. The described embodiments are merely examples of the present disclosure and do not limit its scope. The technical features of the embodiments of the invention can be combined with one another provided they do not conflict.
Aiming at the shortcomings of traditional edge computing dependent task offloading methods, in the invention, under the edge computing scenario, each agent is an edge device and the environment is the edge computing environment composed of the network, the edge servers and so on; all edge devices and edge servers are treated as one system, thereby achieving efficient offloading of dependent tasks within the system. As shown in FIG. 5, the invention provides an edge computing dependent task dynamic offloading system implementing the edge computing dependent task dynamic offloading method, comprising:
a system model construction module, used to abstract and generalize the edge computing dependent task dynamic offloading environment and construct a computation offloading system model comprising task definition, latency computation and energy consumption computation;
a reinforcement learning model construction module, used to analyze the computation offloading system model and determine the basic elements of the reinforcement learning algorithm, including the states, actions and reward function; and
a training module, used to determine the network structure and training method and train the reinforcement learning model to obtain the edge computing dependent task offloading decision model.
In a specific embodiment of the present invention, as shown in FIG. 1, the system has N edge devices and M edge servers, connected through a wireless network. The edge devices randomly generate dependent tasks with random structures; the tasks are represented by a directed acyclic graph in which an edge pointing from vertex B to vertex A represents the dependency of task A on task B, and task B is called a parent task of task A.
As shown in FIG. 2, the workflow of the dependent task dynamic offloading method comprises the following steps:
1) Encode the directed acyclic graph (DAG) representing the dependent tasks into a vector: sort the tasks in the DAG using a topological sort and use the index $idx_i$ in the sorted result as the number of each task, so that a child task's number is always larger than its parent task's number. In the encoding $G(T, E)$ of the DAG in which task i is located, the tasks are ordered by ascending number, and each task is represented by the triple of its task attributes, the indices of its parent tasks and the indices of its child tasks: $(\{\lambda_i, d_i, r_i\}, \{idx_p \mid p \in parent(i)\}, \{idx_c \mid c \in child(i)\})$.
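A brief Python sketch of this encoding step, reusing the DependentTask structure sketched earlier; Kahn's algorithm is one standard way to obtain the topological order the patent calls for, and is an implementation assumption here:

```python
from collections import deque

def topological_encode(tasks):
    """Number tasks so every child's index exceeds its parents' indices, then
    emit, per task, ((lambda, d, r), parent indices, child indices)."""
    by_idx = {t.idx: t for t in tasks}
    indeg = {t.idx: len(t.parents) for t in tasks}
    queue = deque(i for i, d in indeg.items() if d == 0)
    order = {}
    while queue:
        i = queue.popleft()
        order[i] = len(order)                # topological index idx_i
        for c in by_idx[i].children:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return [((t.cycles, t.d_in, t.r_out),
             sorted(order[p] for p in t.parents),
             sorted(order[c] for c in t.children))
            for t in sorted(tasks, key=lambda t: order[t.idx])]
```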
2) Set the parameters of the computation offloading system model: the latency computation parameters comprise the uplink channel bandwidth $w^{up}$, the downlink channel bandwidth $w^{down}$, the CPU clock frequency $f^{ed}$ of the edge devices and the CPU clock frequency $f^{es}$ of the edge servers; the energy consumption computation parameters comprise the transmission power $P^{up}$ from edge device to edge server, the reception power $P^{down}$, the effective switched capacitance $c_1$ associated with the edge server chip architecture, and the effective switched capacitance $c_2$ associated with the edge device chip architecture.
3) Set the algorithm hyper-parameters, such as the learning rate, the maximum number of iterations and the reward discount factor; randomly initialize the network parameters, including the parameters $\theta_k$ of each agent's Actor network, the parameters $w_k$ of its Critic network, the parameters $\theta_k'$ of its ActorTarget network, and the parameters $w_k'$ of its CriticTarget network.
4) For agent k, if all parent tasks of its current task i have finished executing at time t-1, start executing task i and update the network parameters: agent k observes its own observation vector $o_i^t$, inputs it into its own Actor network, outputs the offloading decision, and then executes task i.
4-1) The observation vector of task i at time t consists of the DAG encoding, the offloading decisions of the first i-1 tasks, and the pending queue sizes of all edge servers from time t-h to time t-1: $o_i^t = (G(T, E),\ A_1, \dots, A_{i-1},\ L_{t-h:t-1})$.
4-2) The action vector $a_i^t = A_i$ indicates the offloading decision of the task: $A_i = 0$ indicates that task i is executed locally, and $A_i = j$ with j > 0 means that task i is offloaded to edge server j for execution.
4-3) The system reward $r_t$ is the sum of the rewards of all tasks participating in offloading decisions at time t after their offloading decisions are made: $r_t = \sum_i R_i^t$.
5) At time t-1, if a task for which agent k made a decision at time t-p has finished executing, store the reward $R_k^{t-p}$ of agent k at time t-p; once the global reward $r_{t-p}$ of the system at time t-p has been fully collected, update the parameters:
5-1) Store the tuple of the global observation at time t-p, the actions of all agents, the global reward and the global observation at the next time, $(x_{t-p}, a_{t-p}, r_{t-p}, x_{t-p+1})$, into the global experience replay pool D.
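A minimal Python sketch of such a global experience replay pool; the fixed-capacity deque is an implementation assumption:

```python
import random
from collections import deque

class ReplayPool:
    """Global experience replay pool D of (x, a, r, x_next) tuples."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, x, a, r, x_next):
        self.buffer.append((x, a, r, x_next))

    def sample(self, b):
        """Randomly sample a mini-batch of b transitions."""
        return random.sample(self.buffer, b)
```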
5-2) Randomly sample b samples from the experience replay pool, and for each agent z that made a task decision at time t-p, update the parameters of its Critic network according to the loss function

$L(w_z) = \frac{1}{b} \sum \big( Q_z(x_{t-p}, a_{t-p}) - y_{t-p} \big)^2$,

where $y_{t-p}$ is the TD target, comprising the true reward at time t-p and the reward predicted by the CriticTarget network at time t-p+1:

$y_{t-p} = r_{t-p} + \gamma \, Q_z'\big(x_{t-p+1}, \mu_1'(o_1), \dots, \mu_N'(o_N)\big)$,

in which $Q_z'$ is the CriticTarget network of agent z, $\mu_z'$ is the ActorTarget network of agent z, and $\gamma$ is the reward discount factor.
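A hedged PyTorch sketch of this critic update, in the style of MADDPG-type centralized critics; the network interfaces and tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def critic_update(x, a, r, x_next, obs_next, critic, critic_target,
                  actor_targets, critic_opt, gamma=0.99):
    """One critic step for agent z: regress Q_z(x, a) toward the TD target y."""
    with torch.no_grad():
        # Each agent's ActorTarget network acts on its own next observation.
        a_next = torch.cat([mu(o) for mu, o in zip(actor_targets, obs_next)], dim=-1)
        y = r + gamma * critic_target(x_next, a_next)   # TD target y_{t-p}
    loss = F.mse_loss(critic(x, a), y)                  # mean squared TD error
    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()
```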
5-3) Update the parameters of the Actor network according to the gradient

$\nabla_{\theta_z} J \approx \frac{1}{b} \sum \nabla_{\theta_z} \mu_z(o_z) \, \nabla_{a_z} Q_z(x, a_1, \dots, a_N) \big|_{a_z = \mu_z(o_z)}$,

where $\mu_z$ is the Actor network of agent z and $Q_z$ is the Critic network of agent z.
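And a matching sketch of the actor step, continuing the PyTorch assumptions above: agent z's action is re-derived through its Actor network so the critic's score can be back-propagated into the actor parameters:

```python
def actor_update(x, obs, actions, z, actor, critic, actor_opt):
    """One actor step for agent z: ascend Q_z along the policy gradient."""
    a = list(actions)                    # detached actions of all agents
    a[z] = actor(obs[z])                 # re-derive agent z's action from its Actor
    loss = -critic(x, torch.cat(a, dim=-1)).mean()   # maximize Q == minimize -Q
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
```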
5-4) Update the parameters of the CriticTarget and ActorTarget networks according to $w_k' = \tau w_k + (1 - \tau) w_k'$ and $\theta_k' = \tau \theta_k + (1 - \tau) \theta_k'$.
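The soft update itself is a short helper in PyTorch; a sketch with an assumed function name:

```python
def soft_update(target_net, net, tau=0.01):
    """w' <- tau * w + (1 - tau) * w', applied parameter-wise."""
    with torch.no_grad():
        for p_t, p in zip(target_net.parameters(), net.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
```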
6) Repeat steps 4) and 5) until the maximum number of iterations is reached.
7) Deploy the trained Actor network of each agent into the actual scenario; the Critic network is no longer used.
As shown in FIG. 3, the Actor network structure of agent k comprises an LSTM layer, a fully connected layer and a multi-layer neural network. The LSTM layer takes as input the pending queue sizes of all edge servers from time t-h to time t-1, $L_{t-h:t-1}$, and outputs a prediction of the edge server loads from time t to time t+u.
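A hedged PyTorch sketch of such an Actor network; the layer sizes and the way the LSTM's load prediction is fused with the rest of the observation are assumptions, since FIG. 3 only fixes the overall structure:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor mu_k: LSTM load predictor plus an MLP over the full observation."""
    def __init__(self, n_servers, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_servers, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(obs_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),   # logits over {local, server 1..M}
        )

    def forward(self, loads_hist, obs_rest):
        # loads_hist: [b, h, n_servers] queue sizes; obs_rest: [b, obs_dim]
        _, (h_n, _) = self.lstm(loads_hist)          # load-prediction features
        z = torch.cat([obs_rest, h_n[-1]], dim=-1)
        return torch.softmax(self.head(z), dim=-1)   # offloading decision distribution
```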
As shown in FIG. 4, the Critic network structure of agent k comprises an Attention layer and fully connected layers. The observation and action information of agent k is input directly into the first fully connected layer and then into the second fully connected layer, while the observation and action information of the other agents is first encoded into a vector $v_i$ through the Attention layer and then input into the second fully connected layer.
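A sketch of the attention-based Critic under the same assumptions; pooling the other agents' observation-action encodings through attention is what lets the network accept a varying number of edge devices:

```python
class Critic(nn.Module):
    """Critic Q_k: own (obs, act) through an FC layer; other agents pooled by attention."""
    def __init__(self, own_dim, other_dim, hidden=128):
        super().__init__()
        self.own_fc = nn.Linear(own_dim, hidden)
        self.enc = nn.Linear(other_dim, hidden)   # per-agent encoder for the others
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.out = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, own, others):
        # own: [b, own_dim]; others: [b, n_other, other_dim], n_other may vary
        q_own = torch.relu(self.own_fc(own))
        keys = torch.relu(self.enc(others))
        v, _ = self.attn(q_own.unsqueeze(1), keys, keys)   # pooled vector v_i
        return self.out(torch.cat([q_own, v.squeeze(1)], dim=-1))
```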
The foregoing examples illustrate only a few embodiments of the invention in detail and are not thereby to be construed as limiting its scope. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.

Claims (9)

1. An edge computing dependent task dynamic offloading method based on multi-agent reinforcement learning, characterized by comprising the following steps:
1) abstracting and generalizing the edge computing environment, and constructing a computation offloading system model comprising a task definition part, a latency computation part and an energy consumption computation part;
2) determining the basic elements of the reinforcement learning algorithm according to the computation offloading system model, and constructing a reinforcement learning model;
3) determining a network structure and a training method, and training the reinforcement learning model to obtain an edge computing dependent task offloading decision model.
2. The edge computing dependent task dynamic offloading method of claim 1, wherein the computation offloading system model in step 1) specifically comprises: a task definition part, which represents a set of dependent tasks to be offloaded in the form of a directed acyclic graph (DAG); a latency computation part, which computes the time from when a decision is made for a task until its execution result is returned; and an energy consumption computation part, which computes the energy consumed by task execution.
3. The edge computing dependent task dynamic offloading method of claim 2, wherein the task definition part is specifically as follows: a directed acyclic graph (DAG) is used to represent a set of dependent tasks requiring computation offloading; a vertex in the DAG represents a task, an edge pointing from vertex B to vertex A represents the dependency of task A on task B, and task B is called a parent task of task A; a child task can be executed only after all of its parent tasks have finished executing; a tuple $(\lambda_i, d_i, r_i)$ represents the attributes of task i, where $\lambda_i$ is the number of CPU clock cycles required to execute task i, $d_i$ is the input data size of task i, and $r_i$ is the returned data size of task i.
4. The edge computing dependent task dynamic offloading method of claim 2, wherein the latency computation part is specifically as follows:
when all parent tasks of task i have finished executing at time t, if task i is offloaded to edge server j for execution, the sending delay of task i is $T_i^{up} = d_i / w^{up}$, the receiving delay is $T_i^{down} = r_i / w^{down}$, the execution delay is $T_i^{exec} = \lambda_i / f^{es}$, and the total delay is $T_i^{es,j} = T_i^{up} + T_i^{exec} + T_i^{down}$,
where $w^{up}$ is the uplink channel bandwidth between the edge device and the edge server, $w^{down}$ is the downlink channel bandwidth between the edge device and the edge server, and $f^{es}$ is the CPU clock frequency of edge server j;
if task i is executed locally, the total delay is $T^{ed} = \lambda_i / f^{ed}$, where $f^{ed}$ is the CPU clock frequency of the edge device.
5. The edge computing dependent task dynamic offloading method of claim 2, wherein the energy consumption computation part is specifically as follows:
when task i is offloaded to an edge server, the total energy consumption is $E^{es} = E_i^{up} + E_i^{exec} + E_i^{down}$, with $E_i^{up} = P^{up} \, T_i^{up}$, $E_i^{down} = P^{down} \, T_i^{down}$ and $E_i^{exec} = c_1 (f^{es})^2 \lambda_i$; the energy consumption when task i is executed locally is $E^{ed} = c_2 (f^{ed})^2 \lambda_i$,
where $E_i^{up}$ is the transmission energy consumed while offloading task i to the edge server, $E_i^{down}$ is the reception energy consumed while offloading task i to the edge server, $E_i^{exec}$ is the execution energy consumed when task i is offloaded to the edge server, $E^{es}$ is the total energy consumption when task i is offloaded to the edge server, $P^{up}$ is the transmission power, $P^{down}$ is the reception power, $c_1$ is the effective switched capacitance associated with the edge server chip architecture, and $c_2$ is the effective switched capacitance associated with the edge device chip architecture.
6. The edge computing dependent task dynamic offloading method of claim 1, wherein the basic elements of the reinforcement learning algorithm comprise: the state of task i, the execution mode of task i, and the reward function; the state of task i is $S_i = (G(T, E), K_p, L_{t-h:t-1})$; the execution mode of task i is $A_i \in \{0, 1, \dots, M\}$; the reward function $R_i$ is the negative of the sum of delay and energy consumption: $R_i = -(T^{ed} + E^{ed})$ if task i is executed locally ($A_i = 0$), and $R_i = -(T_i^{es,j} + E^{es})$ if task i is offloaded to edge server j ($A_i = j$),
where $K_p$ denotes, for task i, the decisions of the tasks that are not its ancestors and that have been offloaded but not completed at the current time; $L_{t-h:t-1}$ is the load of all edge servers from time t-h to time t-1; $G(T, E)$ is the encoding of the DAG in which task i is located; $E^{ed}$ is the energy consumption when the task is executed locally; $E^{es}$ is the energy consumption when the task is offloaded to an edge server; $T^{ed}$ is the delay when the task is executed locally; and $T_i^{es,j}$ is the delay when the task is offloaded to edge server j for execution.
7. The edge computing dependent task dynamic offloading method of claim 1, wherein in step 3) the network structure is specifically as follows: each edge device k comprises an Actor network $\mu_k$ and a Critic network $Q_k$; the Actor network is responsible for selecting the task execution mode, and the Critic network is responsible for scoring the task execution mode selected by the edge device.
8. The edge computing dependent task dynamic offloading method of claim 7, wherein in step 3) the training method is specifically as follows:
3.1) set the algorithm hyper-parameters, such as the learning rate, the maximum number of iterations and the reward discount factor; randomly initialize, for each edge device k, the parameters $\theta_k$ of its Actor network, the parameters $w_k$ of its Critic network, the parameters $\theta_k'$ of its ActorTarget network, and the parameters $w_k'$ of its CriticTarget network;
3.2) for edge device k, if all parent tasks of its current task i have finished executing at time t-1, start executing task i and update the network parameters: edge device k observes its own state $S_i$, inputs it into its own Actor network, outputs the offloading decision, and then executes task i;
3.3) at time t-1, if a task for which edge device k made a decision at time t-p has finished executing, store the reward $R_k^{t-p}$ of edge device k at time t-p; once the global reward $r_{t-p}$ of the system at time t-p has been fully collected, update the parameters;
4) repeat the operations of steps 3.2) and 3.3) until the maximum number of iterations is reached, obtaining the edge computing dependent task offloading decision model.
9. An edge computing dependent task dynamic offloading system based on the edge computing dependent task dynamic offloading method of claim 1, comprising:
a system model construction module, used to abstract and generalize the edge computing dependent task dynamic offloading environment and construct a computation offloading system model comprising task definition, latency computation and energy consumption computation;
a reinforcement learning model construction module, used to analyze the computation offloading system model and determine the basic elements of the reinforcement learning algorithm, including the states, actions and reward function; and
a training module, used to determine the network structure and training method and train the reinforcement learning model to obtain the edge computing dependent task offloading decision model.
CN202310711734.XA 2023-06-15 2023-06-15 Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning Pending CN116828541A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310711734.XA CN116828541A Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310711734.XA CN116828541A Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning

Publications (1)

Publication Number Publication Date
CN116828541A 2023-09-29

Family

ID=88119711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310711734.XA 2023-06-15 Edge computing dependent task dynamic offloading method and system based on multi-agent reinforcement learning

Country Status (1)

Country Link
CN (1) CN116828541A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117499491A (en) * 2023-12-27 2024-02-02 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning
CN117499491B (en) * 2023-12-27 2024-03-26 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination