CN116450241A - Multi-user time sequence dependent service calculation unloading method based on graph neural network

Multi-user time sequence dependent service calculation unloading method based on graph neural network

Info

Publication number
CN116450241A
CN116450241A (application CN202310425029.3A)
Authority
CN
China
Prior art keywords
task
time
user
application
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310425029.3A
Other languages
Chinese (zh)
Inventor
孙阳
边钰薇
翟雨晴
吴文君
王朱伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202310425029.3A priority Critical patent/CN116450241A/en
Publication of CN116450241A publication Critical patent/CN116450241A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/445 - Program loading or initiating
    • G06F9/44594 - Unloading
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 - Task transfer initiation or dispatching
    • G06F9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 - Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/042 - Knowledge-based neural networks; Logical representations of neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/047 - Probabilistic or stochastic networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/092 - Reinforcement learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a time-sequence-dependent service offloading method based on a graph neural network and deep reinforcement learning. A multi-user, multi-server MEC system is modeled in which the applications generated by users arrive dynamically and may be of arbitrary size and shape, and the time-sequence-dependent tasks that make up an application can be offloaded either to the MEC server at the serving base station or to the MEC servers of other neighboring base stations. A graph neural network is used to capture the feature states of the time-sequence-dependent services and applications more effectively. The fine-grained offloading problem is modeled as a Markov decision process; decisions are made with a policy-gradient deep reinforcement learning algorithm, and the task-selection order and offloading locations are optimized in a complex, high-dimensional state space to reduce the application scheduling time in the system. Simulations show that the algorithm converges well, reduces application scheduling time, and outperforms the baseline algorithms, thereby improving the efficiency of the MEC system.

Description

Multi-user time sequence dependent service calculation unloading method based on graph neural network
Technical Field
The invention relates to a fine-grained task scheduling and offloading method based on a graph neural network and a deep reinforcement learning algorithm. It designs a decision scheme that reduces application scheduling time in a multi-user, multi-server MEC (mobile edge computing) scenario: in a complex, high-dimensional state space it selects the task to be scheduled and the server to which that task is offloaded, thereby optimizing both the offloading order and the offloading location. The invention belongs to the fields of wireless communication and computation offloading.
Background
With the rapid development of the Internet of Things (IoT) and mobile devices, applications are increasingly computation- and latency-sensitive, which poses a serious challenge for resource-limited devices. Mobile edge computing (MEC) provides infrastructure such as computing and storage close to the data source or user and supplies cloud services and IT environment services to edge applications, alleviating long latency and excessive aggregated traffic. Computation offloading, a key technology in MEC, provides better support for real-time and bandwidth-intensive business applications. A more fine-grained and intelligent task offloading mechanism, built around the characteristics and diverse performance requirements of future intelligent products and applications, allows network resource management to be planned holistically, improves network service efficiency, and enhances the user experience.
In offloading research that considers time-sequence-dependent services, the dependency structure of a service is usually modeled as a directed acyclic graph (Directed Acyclic Graph, DAG). Because DAG structures are complex and varied, computation offloading for DAG-type applications is difficult. References [1-3] adopt heuristic algorithms to solve the scheduling and offloading of time-sequence-dependent services; however, heuristic algorithms only reach good solutions after a large number of iterations and are therefore unsuitable for MEC systems with strict real-time requirements. Deep reinforcement learning can search for near-optimal solutions in complex and huge state spaces. Reference [4] proposes a priority-based scheduling scheme for time-sequence-dependent services that improves scheduling efficiency while preserving task dependencies, and achieves efficient offloading of computation tasks through offline training and online deployment. Reference [5] proposes capturing the timing dependencies of a DAG with a sequence-to-sequence (S2S) neural network: the DAG nodes are encoded into a series of fixed-length embeddings, the S2S network is trained with a policy gradient algorithm, and computation offloading is realized in a single-user, single-server scenario. Because the embedding vector of each task in that method has a fixed length, and the embedding fed into the neural network cannot be made arbitrarily long, the structural information of the DAG cannot be extracted accurately. Reference [6] proposes a directed acyclic graph neural network (DAGNN) to address the loss of DAG structural information when the graph is fed into the neural network of a DRL algorithm, and combines a GNN with a DRL algorithm to solve the offloading and scheduling problem in a single-user scenario.
The above works that solve the offloading problem with deep reinforcement learning mostly target static single-user or single-server scenarios. They lack offloading research on dynamically arriving applications in multi-user, multi-server MEC scenarios, and their feature extraction for time-sequence-dependent services is limited to information about each task itself, lacking global information across tasks and applications.
Different from existing work: (1) the invention considers a more complex MEC scenario and models a multi-user, multi-server MEC system in which the time-sequence-dependent services of each user can be offloaded to the directly connected server or to other servers; the applications generated by users arrive dynamically and their size and shape are arbitrary. (2) The invention uses two levels of graph neural network embedding (task-level and global-level) to capture the states of time-sequence-dependent services and applications of arbitrary size and shape more effectively. (3) The invention models the fine-grained offloading problem as a Markov decision process and makes decisions with a policy gradient deep reinforcement learning algorithm, selecting the task to be scheduled and the server to which that task is offloaded, thereby optimizing both the offloading order and the offloading location.
[1] W. He, L. Gao and J. Luo, "A Multi-Layer Offloading Framework for Dependency-Aware Tasks in MEC," ICC 2021 - IEEE International Conference on Communications, Montreal, QC, Canada, 2021, pp. 1-6.
[2] H. Liao, X. Li, D. Guo, W. Kang and J. Li, "Dependency-Aware Application Assigning and Scheduling in Edge Computing," IEEE Internet of Things Journal, vol. 9, no. 6, pp. 4451-4463.
[3] L. Chen, J. Wu, J. Zhang, H.-N. Dai, X. Long and M. Yao, "Dependency-Aware Computation Offloading for Mobile Edge Computing With Edge-Cloud Cooperation," IEEE Transactions on Cloud Computing, vol. 10, no. 4, pp. 2451-2468.
[4] L. Chen, J. Wu, J. Zhang, H.-N. Dai, X. Long and M. Yao, "Dependency-Aware Computation Offloading for Mobile Edge Computing With Edge-Cloud Cooperation," IEEE Transactions on Cloud Computing, vol. 10, no. 4, pp. 2451-2468.
[5] N. Mao, Y. Chen, M. Guizani and G. M. Lee, "Graph Mapping Offloading Model Based On Deep Reinforcement Learning With Dependent Task," 2021 International Wireless Communications and Mobile Computing (IWCMC), 2021, pp. 21-28.
[6] Cui Shuo, Agaroux, Xie Zhi, Zhang Guhao, Shengjiang, "Computation offloading with dependent tasks based on graph neural network [J]," Computer Measurement and Control, 2021, 29(11): 189-195.
Disclosure of Invention
In a multi-user, multi-server MEC scenario, the invention provides a fine-grained task scheduling and offloading method based on a graph neural network and a deep reinforcement learning algorithm. Each user generates a series of dynamically arriving applications. A graph neural network over a scalable state is used to better capture the features of each time-sequence-dependent service, taking global information across multiple tasks and applications fully into account. The parameters of the graph neural network and the policy network are trained jointly with a policy gradient algorithm, and in a complex, high-dimensional state space the method selects the best task to schedule and the best server for that task, so as to reduce the scheduling time of all applications in the system.
The key problem solved by the invention is the offloading and scheduling of the dynamically arriving time-sequence-dependent services of different users in a multi-user, multi-server network scenario. The adopted technical scheme is as follows.
The graph-neural-network-based multi-user dependent service computation offloading optimization algorithm comprises the following steps:
Step 1: establish a system model according to the network scenario.
The network scenario is a multi-user, multi-server MEC system in which each user generates a series of streaming applications, each of which consists of time-sequence-dependent tasks.
Step 1.1: establish a dynamic model of streaming application arrivals in the multi-user, multi-server scenario according to the server and user information.
Fig. 1 shows an MEC system consisting of multiple heterogeneous edge servers with different computing capabilities. The set of user equipments and the set of servers in the MEC system are denoted $\mathcal{U}$ and $\mathcal{M}$, respectively, where $u$ denotes any user equipment in $\mathcal{U}$, $m$ denotes any server in $\mathcal{M}$, and the computing capability of server $m$ is denoted $f_m$. A group of user equipments is connected to the same base station, and a user communicates with its directly connected server during offloading, so a user task can be offloaded to the MEC server at the serving base station or to the MEC servers of other neighboring base stations.
As shown in Fig. 2, each application is modeled as a directed acyclic graph (DAG). A DAG is described as $G_{un} = (V_{un}, E_{un})$, where $V_{un}$ and $E_{un}$ denote the vertex (task) set and the edge set of the graph, respectively. The set of applications generated by any user $u$ in the MEC system can be expressed as $G_u = \{G_{u1}, G_{u2}, \dots, G_{un}, \dots, G_{uN_u}\}$, where $G_{un}$ denotes any application $n$ generated by user $u$ and $N_u$ denotes the total number of applications generated by user $u$.
Any application $G_{un}$ generated by a user $u \in \mathcal{U}$ contains a number of subtasks. The set of all tasks in $G_{un}$ is expressed as $V_{un} = \{v_{un,1}, \dots, v_{un,i}, \dots, v_{un,I_{un}}\}$, where $I_{un}$ denotes the total number of tasks in $G_{un}$ and $v_{un,i}$ denotes task $i$ of application $G_{un}$. An edge $(v_{un,i}, v_{un,j}) \in E_{un}$ represents a dependency between tasks: $v_{un,j}$ can only start executing after $v_{un,i}$ has finished executing, $v_{un,i}$ is called a parent task of $v_{un,j}$, and $v_{un,j}$ is a subtask (child task) of $v_{un,i}$. For any subtask $v_{un,i}$, the set of all its forward (predecessor) tasks is denoted $pre(v_{un,i})$ and the set of its backward tasks (subtasks) is denoted $suc(v_{un,i})$; the data volume uploaded by the task is denoted $d_{un,i}$ and the number of CPU cycles it requires is denoted $c_{un,i}$.
Each user generates a series of streaming applications. Following the queuing-theory assumption, the time intervals between successive applications arriving at a given user within a period obey an exponential distribution. The interval from time 0 to the first application is $\lambda_0$, and the interval between the $n$-th and the $(n+1)$-th applications is $\lambda_n$:

$$\lambda_n \sim \mathrm{Exp}(\lambda)$$

where $\lambda$ is the arrival-rate parameter, i.e. the average number of applications generated by a user per unit time. The arrival times of two adjacent applications are therefore related by

$$t_{u,n+1}^{arr} = t_{u,n}^{arr} + \lambda_n .$$
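To make the arrival model concrete, the following is a minimal Python sketch of sampling such exponentially distributed inter-arrival times; the function name generate_arrival_times and the use of NumPy are illustrative assumptions, not part of the patent.

```python
import numpy as np

def generate_arrival_times(arrival_rate: float, num_apps: int, seed: int = 0) -> np.ndarray:
    """Sample application arrival times for one user.

    Inter-arrival gaps lambda_n are drawn i.i.d. from Exp(arrival_rate), so the
    arrival time of the (n+1)-th application is the arrival time of the n-th
    application plus the sampled gap.
    """
    rng = np.random.default_rng(seed)
    gaps = rng.exponential(scale=1.0 / arrival_rate, size=num_apps)  # mean gap = 1/lambda
    return np.cumsum(gaps)

# Example: a user generating on average 0.5 applications per second, 5 applications in total.
print(generate_arrival_times(arrival_rate=0.5, num_apps=5))
```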
Step 1.2: model the multi-user, multi-server system. A user communicates with the base station to which it is directly connected; when a task of an application generated by the user is offloaded, either the directly connected server or another server can be selected. From the influence of the offloading decision on delay, a model of the system delay is obtained.
The user and the edge server communicate wirelessly through the base station. Let $B$ be the channel bandwidth, $p_{u,m}$ the upload power of user $u$, $g_{u,m}$ the channel gain between user $u$ and the connected server $m$, and $\sigma^2$ the noise power. The transmission rate follows from the Shannon formula:

$$r_{u,m} = B \log_2\!\left(1 + \frac{p_{u,m}\, g_{u,m}}{\sigma^2}\right)$$

Data transmission between any two servers takes place over a wired connection with a fixed rate $r_{m,m'}$. If task $v_{un,i}$ is offloaded to a server other than the directly connected one, the wired transfer time must be added on top of the wireless upload time, so the transmission time can be written as

$$t^{tr}_{un,i} = \frac{d_{un,i}}{r_{u,m}} + \frac{d_{un,i}}{r_{m,m'}} \cdot \mathbb{1}\{\text{offloaded to another server } m'\}$$

After the task has been uploaded to an MEC server, the delay for executing the task on that server depends on the number of CPU cycles the task requires and on the computing frequency $f_{selected}$ of the selected server:

$$t^{comp}_{un,i} = \frac{c_{un,i}}{f_{selected}}$$
task start timeThe start time of a task is equal to the maximum value of all forward task completion times and the waiting time +.>Is added to the sum of (3).
Task end timeThe end time of a task is equal to the sum of the start time of the task and the transmission time, calculation time.
End time of applicationThe end time of an application is equal to the maximum of the completion times of all tasks in the application.
Setting the start time of the job as the arrival time of the applicationThe scheduling time of the application is->Can be represented by the start time and end time of the application:
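As a worked illustration of the timing relations above, the following Python sketch computes the task start/end times and the application scheduling time; the list-based DAG representation and the helper name application_schedule_time are assumptions.

```python
def application_schedule_time(parents: list[list[int]], delay: list[float],
                              wait: list[float], arrival_time: float) -> float:
    """Scheduling time of one application.

    parents[i] -- indices of the forward (predecessor) tasks of task i
    delay[i]   -- transmission + computation delay of task i
    wait[i]    -- waiting time of task i before it starts
    Task indices are assumed to be in topological order (parents before children).
    """
    end = [0.0] * len(delay)
    for i in range(len(delay)):
        # start time = max completion time of the forward tasks (or the arrival time) + waiting time
        start = max((end[j] for j in parents[i]), default=arrival_time) + wait[i]
        end[i] = start + delay[i]
    # application end time = latest task completion; scheduling time is measured from arrival
    return max(end) - arrival_time

# Example: a 3-task chain 0 -> 1 -> 2 arriving at t = 0 gives 1.0 + 2.0 + 0.5 = 3.5.
print(application_schedule_time(parents=[[], [0], [1]], delay=[1.0, 2.0, 0.5],
                                wait=[0.0, 0.0, 0.0], arrival_time=0.0))
```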
and 2, establishing a calculation unloading optimization problem in the MEC system.
Optimization target: minimizing the scheduling time for all applications in the system:
constraint 1: the subtasks must ensure that all forward tasks are completed before scheduling can begin;
constraint 2: the subtask can only select one of all servers to unload;
wherein,,for unloading decision variables, if->Representative subtask->Offloading on mth server, otherwise +.>
Step 3: solve the embedding of the time-sequence-dependent service state with a graph neural network.
When solving the computation offloading problem with DRL, the state information must be converted into features that are passed to the policy network. However, most neural networks are only suited to Euclidean-space data and cannot directly handle the offloading problem for DAG-type applications. To feed DAG-type graphs of arbitrary number and size into the neural network of the DRL agent, a graph neural network is used to process this scalable state: it encodes the state information into a set of embedding vectors.
Step 3.1: obtain the task-level embedding.
As shown in the graph neural network part of Fig. 3, the graph neural network takes the DAG-type applications as input. Each task of an application has a set of features (the CPU cycles of the task, the data volume uploaded by the task, the average transmission time of the task to the servers, the average computation time of the task on the servers, etc.), and two levels of embedding are output: the task-level embedding and the global-level embedding. The task-level embedding captures information about a task and its subtasks; the global-level embedding contains embedded information from all DAG-type applications.
Task-level embedding: the $i$-th subtask of the $n$-th application generated by the $u$-th user, $v_{un,i}$, has feature vector $x_{un,i}$. In the present method, the original features of task $v_{un,i}$ include the upload data volume of the task $d_{un,i}$, the number of CPU cycles required by the task $c_{un,i}$, the transmission time between the task and the server at the serving base station, and the average computation time of the task on servers of different computing capabilities.
Two layers of nonlinear functions $f_1(\cdot)$ and $f_2(\cdot)$ are used to obtain the embedded feature $e_{un,i}$ of any subtask $v_{un,i}$ of the $n$-th application $G_{un}$ generated by the $u$-th user. Starting from the leaf nodes of the DAG, features are propagated along the message-passing order to the forward nodes of each node. The embedding of any subtask $v_{un,i}$ is computed by aggregating the embeddings of its child tasks with $f_1(\cdot)$ and combining the aggregate with the task's own features through $f_2(\cdot)$, where $f_1(\cdot)$ and $f_2(\cdot)$ are nonlinear functions implemented as small neural networks.
Step 3.2: obtain the global-level embedding.
The global-level embedding contains embedded information from all DAG-type applications (e.g., the number of applications). The graph neural network first computes, for each DAG, a summary of all of its node embeddings; to compute this summary, a summary node is added to each DAG graph that takes all nodes in the DAG as its children, and the global-level summary is then obtained over all DAGs.
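The two embedding levels described in steps 3.1 and 3.2 can be sketched in PyTorch as below; the additive way the child aggregate is combined with a task's own features, the sum-based summaries, and all module names are illustrative assumptions rather than the patent's exact formulas.

```python
import torch
import torch.nn as nn

class TaskLevelEmbedding(nn.Module):
    """Task-level embedding: messages flow from leaf tasks toward their parents.

    For each task i, f1 aggregates the embeddings of its child tasks, f2 transforms
    the aggregate, and the result is combined with a projection of the task's own
    features x_i (the additive combination is an illustrative choice).
    """
    def __init__(self, feat_dim: int, emb_dim: int):
        super().__init__()
        self.f1 = nn.Sequential(nn.Linear(emb_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))
        self.f2 = nn.Sequential(nn.Linear(emb_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim))
        self.proj = nn.Linear(feat_dim, emb_dim)  # lift raw task features into the embedding space

    def forward(self, features: torch.Tensor, children: list[list[int]]) -> torch.Tensor:
        """features: (num_tasks, feat_dim); children[i]: child-task indices of task i.
        Task indices are assumed topologically ordered (parents before children),
        so iterating in reverse visits every child before its parent."""
        emb = [torch.zeros(self.proj.out_features)] * features.size(0)
        for i in reversed(range(features.size(0))):
            msgs = [self.f1(emb[j]) for j in children[i]]
            agg = torch.stack(msgs).sum(dim=0) if msgs else torch.zeros(self.proj.out_features)
            emb[i] = self.f2(agg) + self.proj(features[i])
        return torch.stack(emb)

class GlobalSummary(nn.Module):
    """Summary node per DAG plus a global summary over all DAG-type applications."""
    def __init__(self, emb_dim: int):
        super().__init__()
        self.dag_agg = nn.Sequential(nn.Linear(emb_dim, emb_dim), nn.ReLU())
        self.global_agg = nn.Sequential(nn.Linear(emb_dim, emb_dim), nn.ReLU())

    def forward(self, per_dag_task_embeddings: list[torch.Tensor]) -> tuple[torch.Tensor, torch.Tensor]:
        """per_dag_task_embeddings: one (num_tasks_i, emb_dim) tensor per application DAG."""
        dag_summaries = torch.stack([self.dag_agg(e.sum(dim=0)) for e in per_dag_task_embeddings])
        global_summary = self.global_agg(dag_summaries.sum(dim=0))  # information from all applications
        return dag_summaries, global_summary

# Example: one application that is a 3-task chain 0 -> 1 -> 2 with 4 raw features per task.
task_net, summary_net = TaskLevelEmbedding(feat_dim=4, emb_dim=8), GlobalSummary(emb_dim=8)
task_emb = task_net(torch.randn(3, 4), children=[[1], [2], []])
dag_sums, glob_sum = summary_net([task_emb])
print(task_emb.shape, dag_sums.shape, glob_sum.shape)  # (3, 8), (1, 8), (8,)
```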
Step 4: make offloading decisions with the deep reinforcement learning algorithm.
To balance the size of the action space against the number of operations, a scheduling decision is decomposed into a series of two-dimensional actions: the next task to be scheduled and the server on which that task is to be offloaded. The scheduling agent observes the system state to determine each scheduling operation and receives a reward aligned with the optimization objective; the graph neural network converts the application DAGs into the vectors fed to the policy network, and the reward guides the policy network to output the action of each decision.
Step 4.1: model a Markov decision process (MDP) from the offloading problem.
In the reinforcement learning framework, the machine that performs learning and decision making is called the agent, and the object the agent interacts with is called the environment. At each discrete time $t$, the agent interacts with the environment: it observes a feature representation $S_t$ of the environment state, selects an action $A_t$ on that basis, receives a reward $R_{t+1}$ at the next time step as the consequence of that action, and enters a new state $S_{t+1}$. This process is called a Markov decision process and is usually described as a quadruple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R})$, denoting the state space, action space, state transition probability, and reward, respectively. State space $\mathcal{S}$: the states fed into the graph neural network consist of two parts, the application state $S_{G_{un}}$ and the task state $S_{G_{un},i}$.
State of any task $v_{un,i}$: the state of a task includes the task's upload data volume $d_{un,i}$, the number of CPU cycles it requires $c_{un,i}$, the transmission time between the task and the server connected to the user, and the average computation time of the task on servers of different computing capabilities.
The state of the application includes: the index $I_u$ of the server to which user $u$ is connected, and the channel gain $g_{u,m}$ between the user $u$ that generated the application and the connected server $m$.
Action space $\mathcal{A}$: the decision action is two-dimensional, i.e. it outputs the next task to be scheduled $V_{next}$ and the server $s_{next}$ to which that task is offloaded, so both the offloading location and the offloading order are optimized:

$$A_t = \langle V_{next}, s_{next} \rangle$$

Scheduling event: a scheduling event at time $t$ is triggered when a task completes, releasing the corresponding executor, or when a new application arrives.
Task selection: when a scheduling event occurs at time $t$ with corresponding state $S_t$, a priority score $q_{un,i}$ is computed for any schedulable subtask $v_{un,i}$ of the $n$-th application generated by the $u$-th user; the score function converts the embedding vectors of the graph neural network into a scalar that reflects, to some extent, the priority of the schedulable task. The probability of selecting a task is then computed from the priority scores with a softmax operation:

$$P(\text{task} = v_{un,i}) = \frac{\exp(q_{un,i})}{\sum_{x \in \mathcal{V}_t} \exp(q_x)}$$

where $\mathcal{V}_t$ is the set of tasks that can be scheduled at time $t$, and $G(x)$ is the application DAG in which task $x$ is located.
Server selection: for each server, the policy network also computes a score with another score function $g(\cdot)$, which assigns an offloading location to the task. As with task selection, the system applies a softmax operation to these scores to compute the probability of selecting each server.
Reward $R$: let $T$ be the total number of actions in one epoch, $t_k$ the time of the $k$-th action, and $J_k$ the number of applications present in the system during the interval $(t_{k-1}, t_k]$. The reward of the $k$-th action is

$$R_k = -(t_k - t_{k-1})\, J_k$$
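To illustrate the task-selection step described above, here is a hedged PyTorch sketch of the priority score plus softmax over the schedulable tasks; the inputs to the score function (task embedding, the summary of its DAG, and the global summary) and the layer sizes are assumptions, and the same pattern would apply to server selection with a second score function.

```python
import torch
import torch.nn as nn

class TaskSelector(nn.Module):
    """Priority score + softmax over the currently schedulable tasks (illustrative sketch)."""
    def __init__(self, emb_dim: int):
        super().__init__()
        # score function: maps [task embedding, its DAG summary, global summary] to a scalar
        self.score = nn.Sequential(nn.Linear(3 * emb_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, 1))

    def forward(self, task_emb: torch.Tensor, dag_summary: torch.Tensor,
                global_summary: torch.Tensor, schedulable: torch.Tensor) -> torch.Tensor:
        """task_emb: (num_tasks, emb_dim); dag_summary: (num_tasks, emb_dim), the summary of each
        task's own DAG; global_summary: (emb_dim,); schedulable: boolean mask of shape (num_tasks,)."""
        g = global_summary.expand(task_emb.size(0), -1)
        q = self.score(torch.cat([task_emb, dag_summary, g], dim=-1)).squeeze(-1)
        q = q.masked_fill(~schedulable, float("-inf"))   # only schedulable tasks compete
        return torch.softmax(q, dim=0)                   # probability of choosing each task
```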
Step 4.2: make offloading decisions with the policy gradient algorithm.
Policy gradient methods are a class of reinforcement learning methods that optimize the policy directly for the expected return by gradient-based updates. They avoid several difficulties faced by other conventional reinforcement learning methods, such as the lack of an accurate value function, or difficulties caused by continuous state and action spaces and by uncertainty in the state information.
Using a baseline (reference) function $b(S_{t'})$ greatly improves the performance of the policy gradient method. All operations of the system, from the graph neural network to the policy network, are differentiable. For simplicity, all parameters of these operations are jointly denoted $\theta$, and the scheduling policy $\pi_\theta(A_t \mid S_t)$ is defined as the probability of taking action $A_t$ in state $S_t$. Considering an episode of length $T$, the agent collects (state, action, reward) at each step $t$, i.e. $(S_t, A_t, R_t)$, and updates the parameters $\theta$ of its policy $\pi_\theta(A_t \mid S_t)$ with the REINFORCE policy gradient algorithm. With the baseline $b(S_{t'})$, the gradient of the objective function is written as

$$\nabla_\theta J(\theta) \approx \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(A_t \mid S_t) \left( \sum_{t'=t}^{T} R_{t'} - b(S_t) \right)$$

The parameters are updated as follows, where $\alpha$ is the learning rate:

$$\theta \leftarrow \theta + \alpha \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(A_t \mid S_t) \left( \sum_{t'=t}^{T} R_{t'} - b(S_t) \right)$$
summary of the technology:
compared with the prior art, the invention considers a more complex MEC system, and carries out fine-granularity unloading research on the dynamically arrived time sequence dependent service under the network scene of multiple users and multiple servers, and subtasks of the application generated by each user can be unloaded to a directly connected server or other servers. The invention utilizes the graph neural network to more effectively capture the characteristic states of time sequence dependent services and applications with arbitrary sizes and shapes, and comprehensively considers the information and global information of the task. And finally, carrying out Markov process modeling on the unloading problem, carrying out unloading decision by utilizing a strategy gradient algorithm of deep reinforcement learning, selecting a task to be scheduled and selecting a server to be unloaded for the task, thereby optimizing the unloading time sequence and the unloading position.
Drawings
Fig. 1 shows the system model: the structure of base stations, users, and the streaming applications generated by the users.
FIG. 2 shows the dependency of a time-dependent service in the form of a directed acyclic graph.
FIG. 3 illustrates the graph neural network and deep reinforcement learning framework, showing the relationship between states, actions, and rewards. The squares in the graph-neural-network part of the figure represent task-level feature summaries, and the triangles represent state-information summaries across all applications.
FIG. 4 shows the convergence of the algorithm with a learning rate lr of 0.001; the abscissa is the number of training epochs and the ordinate is the total discounted reward during training.
FIG. 5 compares the average application scheduling delay (s) of the proposed scheme with other algorithms, including a first-in-first-out algorithm and a random scheduling method.
Detailed Description
The following is a further description of the technical scheme of the time-sequence-dependent service offloading method based on the graph neural network with reference to the accompanying drawings and examples.
The system model diagram of the invention is shown in figure 1.
The application model diagram of the invention is shown in fig. 2.
The graph neural network and deep reinforcement learning algorithm framework of the invention is shown in fig. 3.
The convergence of the algorithm of the invention is shown in fig. 4.
The comparison of the average application completion time of the system with other algorithms is shown in fig. 5.
1. Initialize the system: generate the number of users and servers in the system, the positional relationship between users and servers, the arrival rate of the applications generated by each user within the period, the adjacency matrix of each application, the number of tasks contained in each application, the system bandwidth, the users' transmit power and the noise power, and the computing capabilities of the different servers.
2. Obtain the delays under different offloading decisions from the problem model.
3. Establish the Markov decision process (state, action, reward) from the offloading problem.
4. Obtain the task-level and application-level feature embeddings through the message-passing and embedding process of the graph neural network, converting each DAG into a vector for the policy network.
5. Learn iteratively with the policy gradient algorithm: select actions (a schedulable task and a server), update the state, and obtain rewards.
6. Receive rewards aligned with the optimization objective, and use the rewards to guide the policy network to output the action of each decision.
Main parameter settings:
Algorithm 1 shows the flow of the policy gradient algorithm with baseline.
Algorithm 2 shows the overall flow of the graph-neural-network-based multi-user time-sequence-dependent service offloading method (a rough outline is sketched at the end of this section).
FIG. 4 is the convergence graph of the algorithm with the learning rate equal to 1e-3. As can be seen from FIG. 4, the total discounted reward of the system rises gradually as the number of epochs increases until it remains essentially unchanged, i.e. the algorithm converges.
Fig. 5 compares the average application completion time under the scheme of the invention with an application first-in-first-out (FIFO) scheme and a scheme that randomly selects the server and task (random). In this case the number of users is 6, the total number of applications generated by all users is 27, each application contains 4 to 6 tasks, and the applications arrive within 10 s. As can be seen from Fig. 5, the average application scheduling time of the proposed scheme is lower than, and thus better than, that of the other schemes.
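Since the text of Algorithm 2 is not reproduced above, the outline below is only an assumed reconstruction of the overall flow in steps 1-6; every helper it names (init_system, build_gnn_and_policy, make_optimizer, gnn_embed, select_action, env_step) is a hypothetical placeholder, and reinforce_update refers to the earlier policy-gradient sketch.

```python
# Hypothetical outline of the overall flow (steps 1-6 above); all helpers are placeholders.
def train(num_epochs: int, episode_len: int, lr: float = 1e-3):
    env = init_system()                        # step 1: users, servers, bandwidth, arrival rate, DAGs
    policy = build_gnn_and_policy()            # graph neural network + policy network, parameters theta
    optimizer = make_optimizer(policy, lr)
    for epoch in range(num_epochs):
        state, log_probs, rewards, baselines = env.reset(), [], [], []
        for t in range(episode_len):
            task_emb, dag_sum, glob_sum = gnn_embed(policy, state)              # step 4: two-level embedding
            action, log_p = select_action(policy, task_emb, dag_sum, glob_sum)  # step 5: task + server
            state, reward, done = env_step(env, action)                         # steps 2-3: delay model, MDP
            log_probs.append(log_p); rewards.append(reward); baselines.append(env.baseline(t))
            if done:
                break
        reinforce_update(log_probs, rewards, baselines, optimizer)              # step 6: policy gradient update
```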

Claims (2)

1. The graph-neural-network-based multi-user dependent service computation offloading optimization method is characterized by comprising the following steps:
step 1, establishing a system model according to a network scene;
the network scene is a multi-user multi-server MEC system, each user in the MEC system generates a series of streaming arriving applications, and each application is composed of time sequence dependent tasks;
step 1.1, establishing a dynamic model of application flow arrival in a multi-user multi-server scene according to server and user information;
in an MEC system composed of multiple heterogeneous edge servers, the set of user equipments and the set of servers are denoted $\mathcal{U}$ and $\mathcal{M}$, respectively, where $u$ denotes any user equipment in $\mathcal{U}$, $m$ denotes any server in $\mathcal{M}$, and the computing capability of server $m$ is denoted $f_m$;
each application is modeled as a directed acyclic graph DAG, described as $G_{un}=(V_{un},E_{un})$, where $V_{un}$ and $E_{un}$ denote the vertex (task) set and the edge set of the graph, respectively; the set of applications generated by any user $u$ in the MEC system can be denoted $G_u=\{G_{u1},G_{u2},\dots,G_{un},\dots,G_{uN_u}\}$, where $G_{un}$ denotes any application $n$ generated by user $u$ and $N_u$ denotes the total number of applications generated by user $u$;
any application $G_{un}$ generated by a user $u \in \mathcal{U}$ contains a number of subtasks; the set of all tasks in $G_{un}$ is expressed as $V_{un}=\{v_{un,1},\dots,v_{un,i},\dots,v_{un,I_{un}}\}$, where $I_{un}$ denotes the total number of tasks in $G_{un}$ and $v_{un,i}$ denotes task $i$ of application $G_{un}$; an edge $(v_{un,i},v_{un,j}) \in E_{un}$ represents a dependency between tasks: $v_{un,j}$ can only start executing after $v_{un,i}$ has finished executing, $v_{un,i}$ is called a parent task of $v_{un,j}$, and $v_{un,j}$ is a subtask of $v_{un,i}$; for any subtask $v_{un,i}$, the set of all its forward tasks is denoted $pre(v_{un,i})$ and the set of its backward tasks (subtasks) is denoted $suc(v_{un,i})$; the data volume uploaded by the task is denoted $d_{un,i}$ and the number of CPU cycles it requires is denoted $c_{un,i}$;
each user generates a series of streaming applications; the time intervals between applications arriving at a user within a period obey an exponential distribution according to the queuing-theory assumption; the interval from time 0 to the first application is $\lambda_0$, and the interval between the $n$-th and the $(n+1)$-th applications is $\lambda_n$:
$$\lambda_n \sim \mathrm{Exp}(\lambda)$$
where $\lambda$ is the arrival-rate parameter, denoting the average number of applications generated by a user per unit time, and the arrival times of two adjacent applications are related by $t_{u,n+1}^{arr}=t_{u,n}^{arr}+\lambda_n$;
step 1.2, modeling the multi-user, multi-server system: a user communicates with the base station to which it is directly connected, and when a subtask of an application generated by the user is offloaded, it is offloaded to the MEC server at the serving base station or to the MEC servers of other neighboring base stations; from the influence of the offloading decision on delay, a model of the system delay is obtained;
the user and the edge server communicate wirelessly through the base station; with $B$ the channel bandwidth, $p_{u,m}$ the upload power of user $u$, $g_{u,m}$ the channel gain between user $u$ and the connected server $m$, and $\sigma^2$ the noise power, the transmission rate is obtained from the Shannon formula:
$$r_{u,m}=B\log_2\!\left(1+\frac{p_{u,m}\,g_{u,m}}{\sigma^2}\right)$$
data transmission between any two servers takes place over a wired connection with a fixed rate $r_{m,m'}$; if task $v_{un,i}$ is offloaded to another server, the wired transfer time is added on top of the wireless upload time, so the transmission time is written as
$$t^{tr}_{un,i}=\frac{d_{un,i}}{r_{u,m}}+\frac{d_{un,i}}{r_{m,m'}}\cdot\mathbb{1}\{\text{offloaded to another server }m'\}$$
after the task is uploaded to the MEC server, the delay for executing the task on the MEC server depends on the number of CPU cycles required by the task and the computing frequency $f_{selected}$ of the selected server:
$$t^{comp}_{un,i}=\frac{c_{un,i}}{f_{selected}}$$
task start time: the start time of a task equals the maximum completion time over all of its forward tasks plus its waiting time, $T^{start}_{un,i}=\max_{j\in pre(v_{un,i})}T^{end}_{un,j}+t^{wait}_{un,i}$;
task end time: the end time of a task equals the sum of its start time, transmission time and computation time, $T^{end}_{un,i}=T^{start}_{un,i}+t^{tr}_{un,i}+t^{comp}_{un,i}$;
application end time: the end time of an application equals the maximum completion time over all tasks in the application, $T^{end}_{un}=\max_i T^{end}_{un,i}$;
taking the start time of an application to be its arrival time $T^{arr}_{un}$, the scheduling time of the application can be expressed through its start and end times as $T^{sched}_{un}=T^{end}_{un}-T^{arr}_{un}$;
step 2, formulating the computation offloading optimization problem in the MEC system;
optimization objective: minimize the total scheduling time of all applications in the system,
$$\min \sum_{u\in\mathcal{U}}\sum_{n=1}^{N_u} T^{sched}_{un}$$
constraint 1: a subtask can only start being scheduled after all of its forward tasks have completed;
constraint 2: a subtask can select only one of the servers for offloading,
$$\sum_{m\in\mathcal{M}} x_{un,i,m}=1,\qquad x_{un,i,m}\in\{0,1\}$$
where $x_{un,i,m}$ is the offloading decision variable: $x_{un,i,m}=1$ means that subtask $v_{un,i}$ is offloaded to the $m$-th server, and otherwise $x_{un,i,m}=0$;
step 3, solving the embedding of the time-sequence-dependent service state with a graph neural network;
the scalable state is processed with a graph neural network, which encodes the state information into a set of embedding vectors;
step 3.1, obtaining the task-level embedding;
the graph neural network takes the DAG-type applications as input; each task of an application has a set of features, and two levels of embedding are output: the task-level embedding and the global-level embedding; the task-level embedding captures information about a task and its subtasks, and the global-level embedding contains embedded information from all DAG-type applications;
task-level embedding: the $i$-th subtask of the $n$-th application generated by the $u$-th user, $v_{un,i}$, has feature vector $x_{un,i}$; in the present method, the original features of task $v_{un,i}$ include the upload data volume of the task $d_{un,i}$, the number of CPU cycles required by the task $c_{un,i}$, the transmission time between the task and the server at the serving base station, and the average computation time of the task on servers of different computing capabilities;
two layers of nonlinear functions $f_1(\cdot)$ and $f_2(\cdot)$ are used to obtain the embedded feature $e_{un,i}$ of any subtask $v_{un,i}$ of the $n$-th application $G_{un}$ generated by the $u$-th user; starting from the leaf nodes of the DAG, features are propagated along the message-passing order to the forward nodes of each node; the embedding of any subtask $v_{un,i}$ is computed by aggregating the embeddings of its child tasks with $f_1(\cdot)$ and combining the aggregate with the task's own features through $f_2(\cdot)$, where $f_1(\cdot)$ and $f_2(\cdot)$ are nonlinear functions implemented as small neural networks;
step 3.2, obtaining the global-level embedding;
the global-level embedding contains embedded information from all DAG-type applications; the graph neural network first computes, for each DAG, a summary of all of its node embeddings; to compute this summary, a summary node is added to each DAG graph that takes all nodes in the DAG as its children, and the global-level summary is then obtained over all DAGs;
Step 4, making offloading decisions with the deep reinforcement learning algorithm;
to balance the size of the action space against the number of operations, a scheduling decision is decomposed into a series of two-dimensional actions that output the next task to be scheduled and the server on which that task is offloaded; the scheduling agent observes the system state to determine each scheduling operation and receives a reward aligned with the optimization objective; the graph neural network converts the application DAGs into the vectors fed to the policy network, and the reward guides the policy network to output the action of each decision;
step 4.1, modeling a Markov decision process (MDP) from the offloading problem;
in the reinforcement learning framework, at each discrete time $t$ the agent interacts with the environment: it observes a feature representation $S_t$ of the environment state, selects an action $A_t$ on that basis, receives a reward $R_{t+1}$ as the consequence of that action, and enters a new state $S_{t+1}$; this is described as a quadruple $(\mathcal{S},\mathcal{A},\mathcal{P},\mathcal{R})$ denoting the state space, action space, state transition probability, and reward, respectively;
state space $\mathcal{S}$: the states fed into the graph neural network consist of two parts, the application state $S_{G_{un}}$ and the task state $S_{G_{un},i}$;
state of any task $v_{un,i}$: the state of a task includes the task's upload data volume $d_{un,i}$, the number of CPU cycles it requires $c_{un,i}$, the transmission time between the task and the server connected to the user, and the average computation time of the task on servers of different computing capabilities;
the state of the application includes: the index $I_u$ of the server to which user $u$ is connected, and the channel gain $g_{u,m}$ between the user $u$ that generated the application and the connected server $m$;
action space $\mathcal{A}$: the decision action is two-dimensional, i.e. it outputs the next task to be scheduled $V_{next}$ and the server $s_{next}$ to which that task is offloaded, so both the offloading location and the offloading order are optimized;
$$A_t=\langle V_{next}, s_{next}\rangle$$
scheduling event: a scheduling event at time $t$ is triggered when a task completes, releasing the corresponding executor, or when a new application arrives;
task selection: when a scheduling event occurs at time $t$ with corresponding state $S_t$, a priority score $q_{un,i}$ is computed for any schedulable subtask $v_{un,i}$ of the $n$-th application generated by the $u$-th user; the score function converts the embedding vectors of the graph neural network into a scalar that reflects, to some extent, the priority of the schedulable task, and the probability of selecting a task is then computed from the priority scores with a softmax operation:
$$P(\text{task}=v_{un,i})=\frac{\exp(q_{un,i})}{\sum_{x\in\mathcal{V}_t}\exp(q_x)}$$
where $\mathcal{V}_t$ is the set of tasks schedulable at time $t$ and $G(x)$ is the application DAG in which task $x$ is located;
server selection: for each server, the policy network also computes a score with another score function $g(\cdot)$, which assigns an offloading location to the task; as with task selection, the system applies a softmax operation to these scores to compute the probability of selecting each server;
reward $R$: let $T$ be the total number of actions in one epoch, $t_k$ the time of the $k$-th action, and $J_k$ the number of applications present in the system during $(t_{k-1},t_k]$; the reward of the $k$-th action is
$$R_k=-(t_k-t_{k-1})\,J_k$$
step 4.2, making offloading decisions with the policy gradient algorithm;
a policy gradient algorithm with a baseline function $b(S_{t'})$ is used; all parameters of the operations, from the graph neural network to the policy network, are jointly denoted $\theta$, and the scheduling policy $\pi_\theta(A_t\mid S_t)$ is defined as the probability of taking action $A_t$ in state $S_t$; considering an episode of length $T$, the agent collects (state, action, reward) at each step $t$, i.e. $(S_t,A_t,R_t)$, and updates the parameters $\theta$ of its policy $\pi_\theta(A_t\mid S_t)$ with the REINFORCE policy gradient algorithm; with the baseline $b(S_{t'})$ the gradient of the objective function is written as
$$\nabla_\theta J(\theta)\approx\sum_{t=1}^{T}\nabla_\theta\log\pi_\theta(A_t\mid S_t)\left(\sum_{t'=t}^{T}R_{t'}-b(S_t)\right)$$
the parameters are updated as follows, where $\alpha$ is the learning rate:
$$\theta\leftarrow\theta+\alpha\sum_{t=1}^{T}\nabla_\theta\log\pi_\theta(A_t\mid S_t)\left(\sum_{t'=t}^{T}R_{t'}-b(S_t)\right)$$
2. The graph-neural-network-based multi-user dependent service computation offloading optimization method according to claim 1, wherein a group of user equipments is connected to the same base station, the user communicates with the directly connected server during offloading execution, and user tasks are offloaded either to the directly connected server or to other servers via wired transmission.
CN202310425029.3A 2023-04-20 2023-04-20 Multi-user time sequence dependent service calculation unloading method based on graph neural network Pending CN116450241A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310425029.3A CN116450241A (en) 2023-04-20 2023-04-20 Multi-user time sequence dependent service calculation unloading method based on graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310425029.3A CN116450241A (en) 2023-04-20 2023-04-20 Multi-user time sequence dependent service calculation unloading method based on graph neural network

Publications (1)

Publication Number Publication Date
CN116450241A true CN116450241A (en) 2023-07-18

Family

ID=87133345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310425029.3A Pending CN116450241A (en) 2023-04-20 2023-04-20 Multi-user time sequence dependent service calculation unloading method based on graph neural network

Country Status (1)

Country Link
CN (1) CN116450241A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117632298A (en) * 2023-12-06 2024-03-01 江西理工大学 Task unloading and resource allocation method based on priority list indexing mechanism
CN117632298B (en) * 2023-12-06 2024-05-31 江西理工大学 Task unloading and resource allocation method based on priority list indexing mechanism

Similar Documents

Publication Publication Date Title
CN113950066B (en) Single server part calculation unloading method, system and equipment under mobile edge environment
Tang et al. Deep reinforcement learning for task offloading in mobile edge computing systems
CN109561148B (en) Distributed task scheduling method based on directed acyclic graph in edge computing network
CN109669768B (en) Resource allocation and task scheduling method for edge cloud combined architecture
CN111918339B (en) AR task unloading and resource allocation method based on reinforcement learning in mobile edge network
CN113242568A (en) Task unloading and resource allocation method in uncertain network environment
CN112422644B (en) Method and system for unloading computing tasks, electronic device and storage medium
CN110069341B (en) Method for scheduling tasks with dependency relationship configured according to needs by combining functions in edge computing
CN112416554A (en) Task migration method and device, electronic equipment and storage medium
CN113064671A (en) Multi-agent-based edge cloud extensible task unloading method
CN115190033B (en) Cloud edge fusion network task unloading method based on reinforcement learning
CN113626104B (en) Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture
CN114205353B (en) Calculation unloading method based on hybrid action space reinforcement learning algorithm
EP4024212B1 (en) Method for scheduling inference workloads on edge network resources
CN115277689A (en) Yun Bianwang network communication optimization method and system based on distributed federal learning
CN112799823A (en) Online dispatching and scheduling method and system for edge computing tasks
CN116450241A (en) Multi-user time sequence dependent service calculation unloading method based on graph neural network
Chua et al. Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN115022332A (en) Dynamic service placement method based on deep reinforcement learning in edge calculation
CN113946423A (en) Multi-task edge computing scheduling optimization method based on graph attention network
CN117579701A (en) Mobile edge network computing and unloading method and system
CN116009990B (en) Cloud edge collaborative element reinforcement learning computing unloading method based on wide attention mechanism
CN116455903A (en) Method for optimizing dependency task unloading in Internet of vehicles by deep reinforcement learning
CN116567651A (en) Decision method and system for MEC task unloading and migration based on particle swarm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination