CN115686846A - Container cluster online deployment method for fusing graph neural network and reinforcement learning in edge computing
- Publication number: CN115686846A (application CN202211347967.8A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, comprising the following steps: S1, extracting, by a graph convolution network, the topological association relation existing between containers; and S2, inferring a deployment strategy with a sequence-to-sequence network assisted by the graph convolution network. The invention enables reasonable container cluster deployment in edge computing according to the constructed optimization model.
Description
Technical Field
The invention relates to the technical field of edge deployment, and in particular to a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing.
Background
In recent years, with the rapid development of wireless access technology, mobile internet and novel Internet of Things applications have proliferated, and services increasingly present new characteristics such as shorter response time requirements, higher service quality requirements, more diverse resource requirements, and dynamically changing resource demand scales. These new requirements are difficult to meet with the cloud computing mode of concentrating IT resources in a data center to serve users. Edge computing deploys service nodes in a distributed manner at the network edge, closer to the user, so that mobile users can access services on nearby edge service nodes, significantly improving service quality and effectively reducing the resource load of the data center. By introducing virtualization technology, an edge service provider can abstract the physical resources of an edge node into Virtual Network Functions (VNF), improving the utilization efficiency of IT resources while meeting user service requirements, and further reducing the provider's operating expense (OPEX). Currently, virtualization based on Virtual Machines (VM-VNF) is the most widely used. However, VM-VNF suffers from limitations such as slow startup and migration and large resource overhead, which make it too slow to meet the dynamic requirements of tasks. With the recent rise of Serverless Computing, network functions can be deployed in the form of Containers (CT), forming container-based virtualization (CT-VNF). CT-VNF is increasingly used by edge service providers owing to its lighter-weight resource usage, shorter service startup time, and higher migration efficiency. Providing services to tasks at the edge often requires deploying multiple container units on edge service nodes and connecting them to each other to build a Container Cluster (CC); for example, a real-time data analysis service with information security requirements may need to include a firewall, an IDS, a plurality of computing units, a load balancer and other functional units. These functional units are mapped to the same or different edge service nodes in the form of containers, and virtual networks are established to interconnect them. The complexity of the service itself and the high demands on service efficiency make optimized CC deployment in edge computing environments a challenging problem, which must simultaneously consider: 1) the resource demands of service requests; 2) the logical association relationships among multiple containers; 3) the remaining IT resources of the currently available edge nodes; 4) the energy consumption expense of container deployment; 5) the service quality degradation that container deployment may cause; and so on.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and in particular innovatively provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing.
In order to achieve the above object, the present invention provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, comprising the following steps:
S1, extracting, by a graph convolution network, the topological association relation existing between containers;
and S2, inferring a deployment strategy with a sequence-to-sequence network assisted by the graph convolution network.
In a preferred embodiment of the present invention, the layer-wise propagation of the graph convolution network in step S1 is:

H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l) )

wherein H^(l+1) represents the features of layer l+1;
σ(·) represents an activation function;
Ã = A + I_N, where A represents the relationship matrix between the nodes in graph G and I_N is the identity matrix of order N; D̃ is the degree matrix of Ã;
H^(l) represents the features of layer l;
W^(l) represents the training parameter matrix of layer l.
In a preferred embodiment of the present invention, the deployment strategy in step S2 is:

π(p | c, θ) = Pr{ A_t = p | S_t = c, θ_t = θ }

wherein π(p | c, θ) represents the probability of outputting deployment policy p for a given input c;
θ represents the training parameters of the model;
Pr represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
In a preferred embodiment of the present invention, a step S3 is further included after step S1, in which the critic network evaluates the return obtained after the actor performs its action.
In a preferred embodiment of the present invention, a step S4 is further included after step S1, in which the actor network updates the optimization model parameters according to the output of the critic module.
In a preferred embodiment of the present invention, the optimization model is:

max (total charging revenue − total energy consumption expense)   (1.1)

with the total charging revenue given by

Σ_{k∈N} Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k [ G_c (1 − η_{k,c}) d_{i,j}^c + G_m d_{i,j}^m + G_s d_{i,j}^s ]   (1.2)

wherein N represents the set of physical nodes;
G_c represents the revenue per unit of computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 indicates that container j of request i is deployed on physical node k;
d_{i,j}^c, d_{i,j}^m and d_{i,j}^s represent the demands of container j of request i for computing, memory and storage resources;
G_m represents the revenue per unit of memory resource;
G_s represents the revenue per unit of storage resource;

and the total energy consumption expense given by

C · Σ_{k∈N} u_k [ (P_k^max − P_k^min) η_{k,c} + P_k^min ]   (1.3)

wherein u_k represents a binary flag bit; u_k = 1 indicates that physical node k is in an active state;
P_k^max and P_k^min represent the maximum and minimum energy consumption values of physical node k;
C represents the unit energy consumption expenditure coefficient.
In a preferred embodiment of the present invention, the optimization model is alternatively: min (total energy consumption expense), where min(·) denotes taking the minimum and max(·) denotes taking the maximum, the total energy consumption expense being as in equation (1.3);

wherein N represents the set of physical nodes;
I represents the set of service requests;
V_i represents the set of containers of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 indicates that container j of request i is deployed on physical node k; d_{i,j}^c represents the demand of container j of request i for computing resources;
u_k represents a binary flag bit; u_k = 1 indicates that physical node k is in an active state;
C represents the unit energy consumption expenditure coefficient.
In a preferred embodiment of the present invention, the constraint conditions of the optimization model are:

0 ≤ η_{k,c} = ( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^c ) / D_k^c ≤ 1   (1.4)

wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 indicates that container j of request i is deployed on physical node k; d_{i,j}^c represents the demand of container j of request i for computing resources; D_k^c represents the total computing resource of physical node k;

Σ_{k∈N} x_{i,j}^k = 1, ∀ i ∈ I, j ∈ V_i   (1.5)

wherein V_i represents the set of containers of service request i;

Σ_{i∈I} Σ_{m,n∈V_i} x_{i,m}^{k_u} x_{i,n}^{k_v} d_{i,mn}^{bw} ≤ B_{k_u,k_v}   (1.6)

wherein x_{i,m}^{k_u} represents a binary flag bit; x_{i,m}^{k_u} = 1 indicates that container m of request i is deployed on physical node k_u;
x_{i,n}^{k_v} represents a binary flag bit; x_{i,n}^{k_v} = 1 indicates that container n of request i is deployed on physical node k_v;
d_{i,mn}^{bw} represents the bandwidth demand between containers m and n of request i; B_{k_u,k_v} represents the total bandwidth resource between physical nodes k_u and k_v;

Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^c ≤ D_k^c,  Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^m ≤ D_k^m,  Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^s ≤ D_k^s,  ∀ k ∈ N   (1.7)–(1.9)

wherein d_{i,j}^m and d_{i,j}^s represent the memory and storage demands of container j of request i; D_k^m and D_k^s represent the total memory and storage resources of physical node k.
In a preferred embodiment of the invention, the model is updated as:

θ_{k+1} = θ_k + α ∇_θ J_L(λ, θ_k)

wherein θ_{k+1} represents the model parameters at the next time instant;
θ_k represents the model parameters at the current time;
α represents the learning rate.

In a preferred embodiment of the present invention, the model updating further comprises:

L(σ) = (1/m) Σ_{i=1}^{m} ( b(c, p_i) − Q(c, p_i) )²

wherein L(σ) represents the mean square error of the evaluation value b(c, p) given by the benchmark evaluator and the reward value Q(c, p);
m represents the number of samples;
Q(c, p_i) represents the reward obtained when the algorithm makes decision p_i for a given input container cluster c;
b(c, p_i) represents the evaluation value given by the benchmark evaluator b for a given input container cluster c and decision p_i.
In summary, with the adoption of the above technical scheme, container clusters can be reasonably deployed in edge computing according to the constructed optimization model.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram of container cluster deployment in an edge network environment according to the present invention.
FIG. 2 is a diagram of the decision-reward cycle of the reinforcement learning model according to the present invention.
FIG. 3 is a schematic diagram of the model training process of the present invention.
FIG. 4 is a schematic diagram of the details of the actor network model of the present invention.
FIG. 5 is a schematic diagram of the training history of the present invention in three experimental scenarios;
where (a) is the training history (small-scale scenario), (b) is the training history (medium-scale scenario), (c) is the training history (large-scale scenario), (d) is the training loss (small-scale scenario), (e) is the training loss (medium-scale scenario), and (f) is the training loss (large-scale scenario).
FIG. 6 is a solution time comparison diagram of the present invention.
FIG. 7 is a comparison diagram of the deployment error rate of the present invention.
FIG. 8 is a diagram comparing the cumulative revenue of the present invention over a period of time;
where (a) is the cumulative revenue comparison (small-scale scenario), (b) is the cumulative revenue comparison (medium-scale scenario), and (c) is the cumulative revenue comparison (large-scale scenario).
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The invention mainly comprises: modeling the container cluster deployment problem in an edge computing network environment, and an edge computing container cluster deployment strategy solving framework based on Actor-Critic reinforcement learning. A graph convolution network is introduced to extract features of the mesh relationship topology among the multiple containers in a container cluster, and the extracted features are used as input to the attention mechanism in a Seq2Seq network to improve the output quality of the solution; the encoder part of the Seq2Seq network embeds and encodes the container cluster, and the decoder part outputs the corresponding container deployment positions. An Actor-Critic-based reinforcement learning framework is adopted to train the network: no labeled mappings are needed, and the Actor network and the Critic network learn from each other while improving independently. The solutions given by the trained network improve system revenue significantly.
An edge computing platform may receive different numbers of service requests in the same period, and the functions each service request needs to implement are not all the same. Services with different functions require containers of different types and numbers, and uncertain communication demands exist among a given number of containers. The most intuitive impact of service request scale and category is the change of virtual nodes and links, i.e., a change of the structural configuration. Workload fluctuations typically change the resource requirements of a virtual node or link, i.e., a change of the resource configuration. The process of mapping two different container clusters onto the underlying physical network is illustrated in Fig. 1.
1. Reinforcement learning solving framework combined with the graph convolution network
In the invention, an Actor-Critic reinforcement learning framework is adopted to train the model. The whole model involves two neural networks: the actor network and the critic network. Their workflow is shown in Fig. 2: for a given container cluster input into the decision system, the agent (actor network) gives a suitable decision a_t according to the current network state S_t; in our problem this is the deployment policy Placement, which indicates the deployment locations of the containers in the container cluster. The environment then evaluates the deployment policy and generates corresponding feedback (a reward) R_{t+1} indicating the quality of the deployment policy; at the same time, the environment is updated to the new post-deployment state S_{t+1}. The critic network evaluates the return obtained after the actor acts (namely the Q value), and its evaluation result is the Baseline; the actor network updates the model parameters based on the output of the critic module (the actor network updates its parameters in the direction of higher returns). The training process of the model is shown in Fig. 3.
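As an illustration of this decision-reward cycle, the following toy sketch steps through state, action, reward and updated state for a stream of clusters; the single-resource environment, capacities, reward shaping and the random stand-in for the actor's decision are all assumptions for exposition, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
K, CAP = 4, 10.0                                  # assumed: 4 edge nodes, CPU capacity 10 each

def deploy(placement, demands, load):
    """Apply a placement (the action a_t); return the reward R_{t+1} and new state S_{t+1}."""
    for node, d in zip(placement, demands):
        load[node] += d
    over = np.maximum(load - CAP, 0.0).sum()      # constraint-violation signal
    reward = float(demands.sum()) - 2.0 * over    # toy revenue minus weighted penalty
    return reward, load

load = np.zeros(K)                                # S_1: all nodes empty
for _ in range(3):                                # a stream of incoming container clusters
    demands = rng.uniform(1.0, 3.0, size=5)       # CPU demands of a 5-container cluster
    placement = rng.integers(0, K, size=5)        # stand-in for the actor's decision a_t
    reward, load = deploy(placement, demands, load)
    print(placement, round(reward, 2))            # the feedback a critic would evaluate
```

In the full method, the random `placement` above is replaced by the actor network's output, and `reward` feeds both the critic's Baseline and the actor's parameter update.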
In the invention, a Graph Convolutional Network (GCN) is employed, extending the theory of neural combinatorial optimization, to extract the topological link relations existing in a container cluster, so that the agent can perceive the topological structure of the container cluster in advance and give a deployment strategy more accurately. Specifically, we use a graph convolution network together with a sequence-to-sequence model based on an encoder-decoder structure to infer deployment strategies. For container clusters of the same training batch, we adopt the following method: the feature information of the multiple container clusters is grouped and, together with a block-diagonal adjacency matrix, input to the graph convolution network for training. To explain the working process of the above model more clearly, we assume a set of container clusters [Q, V, W] that needs to be mapped into the underlying physical network. Each container cluster corresponding to a service request has a variable number of containers m, e.g., Q = (f_1, f_2, ..., f_m). The container clusters [Q, V, W] serve as input to the GCN network, the containers Q = (f_1, f_2, ..., f_m) of a cluster serve as input to the encoder, and the decoder part outputs a deployment policy P = (p_1, p_2, ..., p_m) indicating the deployment location of each container. The actor network model in the method proposed by the present invention is shown in Fig. 4.
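The block-diagonal batching mentioned above can be sketched as follows; cluster sizes, feature widths and the use of `scipy.linalg.block_diag` are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import block_diag

def batch_clusters(adjs, feats):
    """Stack per-cluster adjacency matrices block-diagonally and concatenate node
    features, so disjoint clusters can share one GCN forward pass."""
    return block_diag(*adjs), np.vstack(feats)

A_q = np.array([[0.0, 1.0], [1.0, 0.0]])         # cluster Q: two linked containers
A_v = np.ones((3, 3)) - np.eye(3)                # cluster V: three fully meshed containers
X_q, X_v = np.ones((2, 4)), np.ones((3, 4))      # four features per container (assumed)
A_batch, X_batch = batch_clusters([A_q, A_v], [X_q, X_v])
print(A_batch.shape, X_batch.shape)              # (5, 5) (5, 4)
```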
One part of the task request is input into the GCN network to extract the topological features, and the other part is input into the encoder part of the Seq2Seq network to control the order of container deployment. The output of the GCN network and the output of the encoder are combined by a matrix dot-product operation and fed into the decoder part of the Seq2Seq network, and the decoder finally gives the deployment strategy of the containers.
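A minimal sketch of this fusion, assuming GRU encoder/decoder layers, arbitrary layer sizes and random stand-ins for the GCN features, is:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m, d, nodes = 5, 16, 8                        # containers, hidden width, physical nodes (assumed)
enc = nn.GRU(input_size=4, hidden_size=d, batch_first=True)
dec = nn.GRU(input_size=d, hidden_size=d, batch_first=True)
head = nn.Linear(d, nodes)                    # logits over candidate physical nodes

x = torch.randn(1, m, 4)                      # container feature sequence (encoder input)
g = torch.randn(1, m, d)                      # topological features from the GCN
h_enc, h_last = enc(x)                        # embedded encoding of the container cluster
fused = h_enc * g                             # dot-product style fusion of the two outputs
h_dec, _ = dec(fused, h_last)                 # decoder consumes the fused sequence
placement = head(h_dec).argmax(dim=-1)        # one node index per container
print(placement)
```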
The invention constructs the optimization model from the perspective of an edge computing service provider: on the premise of satisfying users' service requests as far as possible, it seeks to reduce the total energy consumption expense so as to maximize the service provider's profit.
max (total charging revenue − total energy consumption expense)   (1.1)

The objective function consists of two parts. Equation (1.2) specifies how the edge computing service provider charges for rented resources: for each container j ∈ V_i included in a service request i ∈ I, the occupied physical resources — computing resource d_{i,j}^c, memory resource d_{i,j}^m and storage resource d_{i,j}^s — are multiplied by the corresponding charging coefficients G_c, G_m and G_s respectively. Notably, for the charging rule of the computing resource we creatively add a service effect coefficient (1 − η_{k,c}) to penalize the intensified competition of containers for physical resources, which leads to reduced service capability:

Σ_{k∈N} Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k [ G_c (1 − η_{k,c}) d_{i,j}^c + G_m d_{i,j}^m + G_s d_{i,j}^s ]   (1.2)

wherein N represents the set of physical nodes;
G_c represents the revenue per unit of computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 indicates that container j of request i is deployed on physical node k;
G_m represents the revenue per unit of memory resource;
G_s represents the revenue per unit of storage resource;
In equation (1.3) we define the energy consumption expense incurred by the underlying physical network; considering that energy consumption accounts for a large part of a service provider's daily operating expenses, our optimization model treats the energy consumption expense as the only operator expense. P_k^max is the maximum energy consumption value of physical node k and P_k^min is its minimum energy consumption value. Since energy consumption is positively correlated with resource utilization, we use the product of (P_k^max − P_k^min) and the computing resource occupancy rate η_{k,c} to represent the dynamic energy consumption of physical node k; since a physical node also consumes energy when idle, the idle energy consumption P_k^min is added, and the sum is finally multiplied by the unit energy consumption expenditure coefficient to express the total energy consumption expense of the service provider:

C · Σ_{k∈N} u_k [ (P_k^max − P_k^min) η_{k,c} + P_k^min ]   (1.3)

wherein N represents the set of physical nodes;
I represents the set of service requests;
V_i represents the set of containers of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 indicates that container j of request i is deployed on physical node k;
u_k represents a binary flag bit; u_k = 1 indicates that physical node k is in an active state;
C represents the unit energy consumption expenditure coefficient;
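Numerically, the objective (1.1) with revenue (1.2) and energy expense (1.3) can be sketched as below; all prices, power values and capacities are assumed placeholder figures.

```python
import numpy as np

G_c, G_m, G_s, C_e = 3.0, 1.0, 0.5, 0.8         # assumed unit prices and energy coefficient C
P_MIN, P_MAX, CPU_CAP = 50.0, 150.0, 32.0       # assumed node power range and CPU capacity

def profit(x, d_c, d_m, d_s):
    """x[k, j] = 1 if container j is on node k; d_* are per-container demands."""
    eta = (x @ d_c) / CPU_CAP                   # eta_{k,c}: CPU utilisation per node
    revenue = (x * (G_c * (1.0 - eta)[:, None] * d_c   # CPU priced with (1 - eta) discount
                    + G_m * d_m + G_s * d_s)).sum()    # memory and storage priced flat
    active = x.any(axis=1)                      # u_k: node hosts at least one container
    energy = C_e * ((P_MAX - P_MIN) * eta + P_MIN)[active].sum()
    return revenue - energy                     # objective (1.1)

x = np.array([[1, 1, 0], [0, 0, 1]])            # three containers placed on two nodes
print(round(profit(x, np.array([4.0, 6.0, 8.0]), np.ones(3), np.ones(3)), 2))
```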
The optimization model is subject to several constraints. Constraint (1.4) concerns the utilization η_{k,c} of computing resources on physical node k and limits its range of values to [0, 1]:

0 ≤ η_{k,c} = ( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^c ) / D_k^c ≤ 1   (1.4)

wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 indicates that container j of request i is deployed on physical node k; d_{i,j}^c represents the demand of container j of request i for computing resources; D_k^c represents the total computing resource of physical node k;
Constraint (1.5) specifies that the j-th container of the i-th service request can only be deployed on one physical node and cannot be deployed repeatedly:

Σ_{k∈N} x_{i,j}^k = 1,  ∀ i ∈ I, j ∈ V_i   (1.5)

wherein N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 indicates that container j of request i is deployed on physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i;
Constraint (1.6) specifies that the bandwidth resources occupied by communication between two containers m and n of service request i, located on physical nodes k_u and k_v respectively, do not exceed the total bandwidth resources between physical nodes k_u and k_v:

Σ_{i∈I} Σ_{m,n∈V_i} x_{i,m}^{k_u} x_{i,n}^{k_v} d_{i,mn}^{bw} ≤ B_{k_u,k_v}   (1.6)

wherein I represents the set of service requests;
V_i represents the set of containers of service request i;
x_{i,m}^{k_u} represents a binary flag bit; x_{i,m}^{k_u} = 1 indicates that container m of request i is deployed on physical node k_u;
x_{i,n}^{k_v} represents a binary flag bit; x_{i,n}^{k_v} = 1 indicates that container n of request i is deployed on physical node k_v;
d_{i,mn}^{bw} represents the bandwidth demand between containers m and n of request i; B_{k_u,k_v} represents the total bandwidth resource between physical nodes k_u and k_v;
Constraints (1.7), (1.8) and (1.9) respectively require that the total resources of all containers deployed on a physical node do not exceed that node's total computing, memory and storage resources:

Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^c ≤ D_k^c,  Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^m ≤ D_k^m,  Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^s ≤ D_k^s,  ∀ k ∈ N   (1.7)–(1.9)

wherein I represents the set of service requests;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 indicates that container j of request i is deployed on physical node k; d_{i,j}^m and d_{i,j}^s represent the memory and storage demands of container j of request i; D_k^m and D_k^s represent the total memory and storage resources of physical node k;
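A feasibility check over these constraints can be sketched as follows; capacities are assumed, and the bandwidth constraint (1.6) is omitted here for brevity since it needs the inter-node link topology.

```python
import numpy as np

def feasible(x, d_c, d_m, d_s, caps):
    """x[k, j] = 1 if container j is on node k; caps holds per-node capacities."""
    placed_once = (x.sum(axis=0) == 1).all()    # (1.5): each container on exactly one node
    cpu_ok = (x @ d_c <= caps["cpu"]).all()     # (1.7), which also keeps eta in [0, 1] (1.4)
    mem_ok = (x @ d_m <= caps["mem"]).all()     # (1.8)
    sto_ok = (x @ d_s <= caps["sto"]).all()     # (1.9)
    return placed_once and cpu_ok and mem_ok and sto_ok

x = np.array([[1, 0, 1], [0, 1, 0]])            # three containers on two nodes
caps = {"cpu": 16.0, "mem": 32.0, "sto": 100.0}
print(feasible(x, np.array([4.0, 5.0, 6.0]), np.ones(3), np.ones(3), caps))  # True
```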
2. Topological relation description based on the graph convolution network
The invention adopts the graph convolution network to extract the topological relations of the input container cluster and uses the extracted features to assist the agent in giving a more accurate deployment strategy without violating the constraint conditions, thereby reducing the container deployment cost and improving the overall profit of the edge computing service provider.
Assume that the graph of a container cluster configuration is represented by G = (N, E), where N represents the vertices in the graph, i.e., the containers in the container cluster, and E represents the edges in the graph, i.e., the links resulting from communication between containers in the container cluster. The features of the vertices in G form an N × D matrix X, where D represents the number of features. The container-to-container relationships are represented by an N × N matrix A, i.e., the adjacency matrix of G. The layer-wise propagation of the graph convolution network is shown in equation (10).
H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l) )   (10)

wherein H^(l+1) represents the features of layer l+1;
σ(·) represents an activation function, e.g., ReLU or Sigmoid (we use ReLU in our model);
Ã = A + I_N is the adjacency matrix of the undirected graph G with self-connections attached, where A is the adjacency matrix of G and I_N is the identity matrix of order N;
D̃ is the degree matrix of Ã, i.e., D̃_{ii} = Σ_j Ã_{ij};
X represents the feature matrix formed by the node features of graph G;
W^(l) represents the training parameter matrix of layer l;
H^(l) represents the features of layer l, with H^(0) = X for the input layer.
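Equation (10) transcribes directly into a few lines of NumPy; the weights below are random stand-ins for the trained W^(l).

```python
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])             # attach self-connections
    d = A_tilde.sum(axis=1)                      # degree of each container vertex
    D_inv_sqrt = np.diag(d ** -0.5)              # the D̃^{-1/2} normalisation
    return np.maximum(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W, 0.0)  # ReLU activation

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # a 3-container chain
X = np.random.default_rng(1).normal(size=(3, 4))              # H^(0) = X, D = 4 features
W = np.random.default_rng(2).normal(size=(4, 8))              # random stand-in weights
print(gcn_layer(A, X, W).shape)                               # (3, 8)
```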
3. Policy-gradient-based constrained optimization
Assuming that the set of container clusters is represented by C, and a single container cluster is represented by c (c ∈ C), the policy function for c is expressed as:

π(p | c, θ) = Pr{ A_t = p | S_t = c, θ_t = θ }

wherein π(p | c, θ) represents the probability of outputting deployment policy p for a given input c;
θ represents the training parameters of the model;
Pr represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t;
The policy function expresses that at time t, with input c and parameters θ, the probability of outputting deployment policy p is Pr. The policy assigns a higher probability to high-revenue deployment policies p and a lower probability to low-revenue ones. The interaction of input container clusters with output policies within a period T generates a trajectory of a Markov decision process, τ = (c_1, p_1, ..., c_T, p_T), whose probability can be expressed as:

P_θ(c_1, p_1, ..., c_T, p_T) = p(c_1) Π_{t=1}^{T} π_θ(p_t | c_t) p(c_{t+1} | c_t, p_t)

wherein P_θ(c_1, p_1, ..., c_T, p_T) represents the probability that the trajectory τ = (c_1, p_1, ..., c_T, p_T) occurs under parameters θ;
p(c_1) represents the probability that state c_1 occurs (i.e., the input at time t = 1 is c_1);
T represents the period;
π_θ(p_t | c_t) represents the probability that, at time t with current state c_t (i.e., the input container cluster) and parameters θ, the agent takes action p_t (i.e., the output deployment policy);
p(c_{t+1} | c_t, p_t) represents the probability that, given state c_t and action p_t at time t, the system state (i.e., the input container cluster) at time t+1 is c_{t+1};
c_1 represents the system state (i.e., the input container cluster) at time t = 1;
p_1 represents the deployment policy at time t = 1;
c_t represents the input at time t;
p_t represents the deployment policy output at time t;
In the above policy function, the deployment policy p_t for the current input container cluster c_t depends on the deployment positions p_(<t) of the previous container clusters and on the system state. For simplicity, we assume that the system state is fully defined by the container cluster c. The only output of the policy function is the probability indicating the container cluster deployment locations. The objective of the policy gradient method is to find an optimal set of parameters θ* that yields optimal deployment locations for the container clusters. To do this, we need to define an objective function describing the quality of a deployment policy:

J_R(θ | c) = E_{p ~ π_θ(·|c)} [ R(p) ]

wherein J_R(θ | c) represents the policy quality corresponding to input c;
R(p) represents the service revenue corresponding to deployment policy p;
p ~ π_θ(·|c) represents all deployment policies p for the given input c;

In the above equation, we use the expected service revenue R(p) of a deployment policy for a given container cluster c as the objective function describing the quality of the deployment policy. Because the agent infers deployment policies from all container clusters, the revenue expectation can be defined as the expectation over the container probability distribution:

J_R(θ) = E_{c ~ C} [ J_R(θ | c) ]

wherein J_R(θ) represents the policy quality, i.e., the expected value of revenue;
J_R(θ | c) represents the policy quality corresponding to input c;
c ~ C represents all container clusters c;
Similarly, the expected penalty incurred by violating the constraints can be expressed as:

J_C(θ) = E_{c ~ C} [ J_C(θ | c) ]

wherein J_C(θ) represents the expected penalty value;
J_C(θ | c) represents the penalty value corresponding to input c;
c ~ C represents all container clusters c;
Here, we define four constraint signals: computing resource cpu, memory resource mem, storage resource sto and bandwidth resource bw. The final optimization objective can be transformed into an unconstrained problem by the Lagrangian relaxation technique:

J_L(λ, θ) = J_R(θ) + Σ_i λ_i J_{C_i}(θ) = J_R(θ) + J_ξ(θ)

wherein J_L(λ, θ) represents the Lagrangian value, calculated as the expected revenue J_R(θ) plus the weighted sum of the expected penalty values corresponding to the several resources;
λ represents the weights of the four constraint signals;
J_R(θ) represents the policy quality, i.e., the expected value of revenue;
λ_i represents the weight of constraint signal i;
J_{C_i}(θ) represents the expected penalty value of constraint signal i;
J_ξ(θ) represents the weighted sum of the expected penalty values of the four constraint signals;

where λ holds the weights of the four constraint signals and J_ξ(θ) is their weighted expected penalty sum. Next, we calculate the gradient of J_L(λ, θ) using the log-likelihood trick:

∇_θ J_L(λ, θ) = E_{p ~ π_θ(·|c)} [ Q(c, p) ∇_θ log π_θ(p | c) ]

wherein J_L(λ, θ) represents the Lagrangian value as defined above;
π_θ(p | c) represents the policy function of c;
Q(c, p) represents the reward accrued when the algorithm makes decision p given input container cluster c;
p ~ π_θ(·|c) represents all deployment policies p for the given input c;
In the above equation, Q(c, p) describes the reward achievable given input c when the algorithm makes decision p. It is calculated by adding the weighted sum of all constraint-violation values C(p) to the profit value R(p), as shown in (18):

Q(c, p) = R(p) + Σ_i λ_i C_i(p) = R(p) + ξ(p)   (18)

wherein Q(c, p) represents the reward accrued when the algorithm makes decision p given input container cluster c;
R(p) represents the reward the system yields for decision p;
ξ(p) represents the weighted sum of the penalty values of all constraint signals of decision p;
λ_i represents the weight of constraint signal i;
C_i(p) represents the penalty value produced by constraint signal i under decision p;
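As a numeric sketch of (18) under assumed values: if the violation signals C_i(p) are taken as non-positive penalty values, adding their λ-weighted sum to R(p) lowers the reward of infeasible decisions.

```python
def q_value(revenue, violations, lambdas):
    """Q(c, p) = R(p) + sum_i lambda_i * C_i(p); signals ordered (cpu, mem, sto, bw)."""
    xi = sum(l * c for l, c in zip(lambdas, violations))  # xi(p): weighted penalty sum
    return revenue + xi

# assumed figures: cpu and bw are violated (negative signals), mem and sto satisfied
print(q_value(12.0, [-1.5, 0.0, 0.0, -0.3], [1.0, 1.0, 1.0, 2.0]))  # 9.9
```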
Then, we approximate the Lagrangian gradient using Monte Carlo sampling, where m is the number of samples; to reduce the variance of the gradient and accelerate model convergence, a critic network consisting of a simple RNN is used as the benchmark evaluator b. The Lagrangian gradient can then be expressed as:

∇_θ J_L(λ, θ) ≈ (1/m) Σ_{i=1}^{m} ( Q(c, p_i) − b(c, p_i) ) ∇_θ log π_θ(p_i | c)

wherein m represents the number of samples;
Q(c, p_i) represents the reward obtained when the algorithm makes decision p_i given input container cluster c;
b(c, p_i) represents the evaluation value given by the benchmark evaluator b for input container cluster c and decision p_i;
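A minimal sketch of this Monte Carlo estimate, with toy logits and rewards standing in for the actor outputs and environment feedback, is:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 8, requires_grad=True)     # actor output: 5 containers x 8 nodes
dist = torch.distributions.Categorical(logits=logits)
placements = dist.sample((4,))                     # m = 4 sampled decisions p_i
Q = torch.tensor([3.0, 1.0, 4.0, 2.0])             # rewards Q(c, p_i) (assumed values)
b = torch.full((4,), 2.5)                          # critic baseline b(c, p_i)

log_prob = dist.log_prob(placements).sum(dim=1)    # log pi_theta(p_i | c) per sample
loss = -((Q - b) * log_prob).mean()                # negated Monte Carlo gradient estimate
loss.backward()                                    # gradients now populate logits.grad
print(logits.grad.shape)                           # torch.Size([5, 8])
```

Minimizing this surrogate loss with any gradient optimizer performs the parameter update toward higher returns described next.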
Finally, the parameter θ of the network model is updated by the stochastic gradient method:

θ_{k+1} = θ_k + α ∇_θ J_L(λ, θ_k)

wherein θ_{k+1} represents the model parameters at the next time instant;
θ_k represents the model parameters at the current time;
α represents the learning rate;
The benchmark evaluator gives an evaluation b(c, p) of the return of the current container cluster; the parameter σ of the benchmark evaluator is then updated with stochastic gradient descent based on the mean square error of b(c, p) and the reward value Q(c, p):

L(σ) = (1/m) Σ_{i=1}^{m} ( b(c, p_i) − Q(c, p_i) )²

wherein L(σ) represents the mean square error of the evaluation value b(c, p) given by the benchmark evaluator and the reward value Q(c, p);
m represents the number of samples;
Q(c, p_i) represents the reward obtained when the algorithm makes decision p_i given input container cluster c;
b(c, p_i) represents the evaluation value given by the benchmark evaluator b for input container cluster c and decision p_i;
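The benchmark evaluator's update can be sketched in the same style; the linear layer is a stand-in for the simple RNN critic, and the state encodings are assumed.

```python
import torch

torch.manual_seed(0)
critic = torch.nn.Linear(16, 1)                    # stand-in for the RNN benchmark evaluator
opt = torch.optim.SGD(critic.parameters(), lr=1e-2)

states = torch.randn(4, 16)                        # assumed encodings of (c, p_i) pairs
Q = torch.tensor([[3.0], [1.0], [4.0], [2.0]])     # observed rewards Q(c, p_i)

b = critic(states)                                 # evaluation values b(c, p_i)
mse = torch.nn.functional.mse_loss(b, Q)           # (1/m) * sum (b - Q)^2
opt.zero_grad(); mse.backward(); opt.step()        # stochastic gradient step on sigma
print(float(mse))
```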
the container cluster deployment algorithm training process based on the graph convolution network and neural combinatorial optimization can be described as table 1:
table 1. Container Cluster deployment algorithm training process description based on graph convolution network and neural combinatorial optimization
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (9)
1. A container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, characterized by comprising the following steps:
S1, extracting, by a graph convolution network, the topological association relation existing between containers;
and S2, inferring a deployment strategy with a sequence-to-sequence network assisted by the graph convolution network.
2. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 1, characterized in that the layer-wise propagation of the graph convolution network in step S1 is:

H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l) )

wherein H^(l+1) represents the features of layer l+1;
σ(·) represents an activation function;
Ã = A + I_N, where A represents the relationship matrix between the nodes in graph G and I_N is the identity matrix of order N; D̃ is the degree matrix of Ã;
H^(l) represents the features of layer l;
W^(l) represents the training parameter matrix of layer l.
3. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 1, characterized in that the deployment strategy in step S2 is:

π(p | c, θ) = Pr{ A_t = p | S_t = c, θ_t = θ }

wherein π(p | c, θ) represents the probability of outputting deployment policy p for a given input c;
θ represents the training parameters of the model;
Pr represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
4. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 1, characterized in that a step S3 is further included after step S1, in which the critic network evaluates the return obtained after the actor performs its action.
5. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 1, characterized in that a step S4 is further included after step S1, in which the actor network updates the optimization model parameters according to the output of the critic module.
6. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 5, characterized in that the optimization model is:

max (total charging revenue − total energy consumption expense)   (1.1)

with the total charging revenue given by

Σ_{k∈N} Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k [ G_c (1 − η_{k,c}) d_{i,j}^c + G_m d_{i,j}^m + G_s d_{i,j}^s ]   (1.2)

wherein N represents the set of physical nodes;
G_c represents the revenue per unit of computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 indicates that container j of request i is deployed on physical node k; d_{i,j}^c, d_{i,j}^m and d_{i,j}^s represent the demands of container j of request i for computing, memory and storage resources;
G_m represents the revenue per unit of memory resource;
G_s represents the revenue per unit of storage resource;

and the total energy consumption expense given by

C · Σ_{k∈N} u_k [ (P_k^max − P_k^min) η_{k,c} + P_k^min ]   (1.3)

wherein u_k represents a binary flag bit; u_k = 1 indicates that physical node k is in an active state;
P_k^max and P_k^min represent the maximum and minimum energy consumption values of physical node k;
C represents the unit energy consumption expenditure coefficient;

or, alternatively: min (total energy consumption expense), with the total energy consumption expense as in equation (1.3).
7. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 6, characterized in that the constraint conditions of the optimization model are:

0 ≤ η_{k,c} = ( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^c ) / D_k^c ≤ 1   (1.4)

wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit; x_{i,j}^k = 1 indicates that container j of request i is deployed on physical node k; d_{i,j}^c represents the demand of container j of request i for computing resources; D_k^c represents the total computing resource of physical node k;

Σ_{k∈N} x_{i,j}^k = 1, ∀ i ∈ I, j ∈ V_i   (1.5)

wherein V_i represents the set of containers of service request i;

Σ_{i∈I} Σ_{m,n∈V_i} x_{i,m}^{k_u} x_{i,n}^{k_v} d_{i,mn}^{bw} ≤ B_{k_u,k_v}   (1.6)

wherein x_{i,m}^{k_u} represents a binary flag bit; x_{i,m}^{k_u} = 1 indicates that container m of request i is deployed on physical node k_u;
x_{i,n}^{k_v} represents a binary flag bit; x_{i,n}^{k_v} = 1 indicates that container n of request i is deployed on physical node k_v;
d_{i,mn}^{bw} represents the bandwidth demand between containers m and n of request i; B_{k_u,k_v} represents the total bandwidth resource between physical nodes k_u and k_v;

Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^c ≤ D_k^c,  Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^m ≤ D_k^m,  Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k d_{i,j}^s ≤ D_k^s,  ∀ k ∈ N   (1.7)–(1.9)

wherein d_{i,j}^m and d_{i,j}^s represent the memory and storage demands of container j of request i; D_k^m and D_k^s represent the total memory and storage resources of physical node k.
8. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 5, characterized in that the model is updated as:

θ_{k+1} = θ_k + α ∇_θ J_L(λ, θ_k)

wherein θ_{k+1} represents the model parameters at the next time instant;
θ_k represents the model parameters at the current time;
α represents the learning rate.
9. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 8, characterized in that the model updating further comprises:

L(σ) = (1/m) Σ_{i=1}^{m} ( b(c, p_i) − Q(c, p_i) )²

wherein L(σ) represents the mean square error of the evaluation value b(c, p) given by the benchmark evaluator and the reward value Q(c, p); m represents the number of samples;
Q(c, p_i) represents the reward obtained when the algorithm makes decision p_i for a given input container cluster c;
b(c, p_i) represents the evaluation value given by the benchmark evaluator b for a given input container cluster c and decision p_i.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211347967.8A CN115686846B (en) | 2022-10-31 | 2022-10-31 | Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115686846A true CN115686846A (en) | 2023-02-03 |
CN115686846B CN115686846B (en) | 2023-05-02 |
Family
ID=85045641
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211347967.8A Active CN115686846B (en) | 2022-10-31 | 2022-10-31 | Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115686846B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008819A (en) * | 2019-01-30 | 2019-07-12 | 武汉科技大学 | A facial expression recognition method based on graph convolutional neural networks
CN112631717A (en) * | 2020-12-21 | 2021-04-09 | 重庆大学 | Network service function chain dynamic deployment system and method based on asynchronous reinforcement learning |
CN112711475A (en) * | 2021-01-20 | 2021-04-27 | 上海交通大学 | Workflow scheduling method and system based on graph convolution neural network |
CN113568675A (en) * | 2021-07-08 | 2021-10-29 | 广东利通科技投资有限公司 | Internet of vehicles edge calculation task unloading method based on layered reinforcement learning |
CN113778648A (en) * | 2021-08-31 | 2021-12-10 | 重庆理工大学 | Task scheduling method based on deep reinforcement learning in hierarchical edge computing environment |
US20220124543A1 (en) * | 2021-06-30 | 2022-04-21 | Oner Orhan | Graph neural network and reinforcement learning techniques for connection management |
US20220343143A1 (en) * | 2019-09-11 | 2022-10-27 | Siemens Aktiengesellschaft | Method for generating an adapted task graph |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116069512A (en) * | 2023-03-23 | 2023-05-05 | 之江实验室 | Serverless efficient resource allocation method and system based on reinforcement learning |
CN116069512B (en) * | 2023-03-23 | 2023-08-04 | 之江实验室 | Serverless efficient resource allocation method and system based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN115686846B (en) | 2023-05-02 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |