CN115686846A: Container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing (Google Patents)
Publication number: CN115686846A (application CN202211347967.8A)
Authority: CN (China)
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a container cluster online deployment method that fuses a graph neural network with reinforcement learning in edge computing, comprising the following steps: S1, a graph convolution network extracts the topological association relations existing between containers; and S2, a sequence-to-sequence network, assisted by the graph convolution network, infers a deployment strategy. The invention can reasonably deploy container clusters in edge computing according to the constructed optimization model.
Description
Technical Field
The invention relates to the technical field of edge deployment, and in particular to a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing.
Background
In recent years, with the rapid development of wireless access technology, various mobile internet and novel Internet of Things applications have emerged continuously. Services increasingly present new characteristics such as shorter response-time requirements, higher quality-of-service requirements, more diverse resource requirements, and dynamically changing resource-demand scales. These new requirements are difficult to meet with the cloud computing model, which concentrates IT resources in a data center to provide services to users. Edge computing deploys service nodes in a distributed manner at the network edge, closer to the user, so that mobile users can access services on nearby edge service nodes, significantly improving service quality and effectively reducing the resource load on the data center. By introducing virtualization technology, an edge service provider can abstract the physical resources of an edge node into Virtual Network Functions (VNFs), improving the utilization efficiency of IT resources while meeting user service requirements and thereby reducing the provider's operating expense (OPEX). At present, virtualization based on Virtual Machines (VM-VNF) is the most widely used. However, VM-VNF suffers from limitations such as slow startup and migration and large resource overhead, which make it ill-suited to the dynamic requirements of tasks. With the recent rise of serverless computing, network functions can be deployed in the form of Containers (CT), forming container-based virtualization (CT-VNF). CT-VNF is increasingly used by edge service providers due to its lighter-weight resource usage, shorter service startup time, and higher migration efficiency.
Providing services to tasks at the edge often requires deploying multiple container units on edge service nodes and connecting them to one another to build a Container Cluster (CC). For example, a real-time data analysis service with information security requirements may need to include a firewall, an IDS, several computing units, a load balancer, and other functional units. These functional units are mapped to the same or different edge service nodes in the form of containers, and virtual networks are established to interconnect them. The complexity of the service itself and the high demand for service efficiency make optimized CC deployment in edge computing environments a challenging problem, which must simultaneously consider: 1) the service's requests for resources; 2) the logical association relations among multiple containers; 3) the remaining IT resources of the currently available edge nodes; 4) the energy-consumption expense of container deployment; and 5) the degradation of service quality that container deployment may cause.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and in particular innovatively provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing.
In order to achieve the above object, the present invention provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, comprising the following steps:
S1, a graph convolution network extracts the topological association relations existing between containers;
S2, a sequence-to-sequence network, assisted by the graph convolution network, infers a deployment strategy.
In a preferred embodiment of the present invention, the layer-wise propagation of the graph convolution network in step S1 is:

H^{(l+1)} = σ( D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W^{(l)} )

wherein H^{(l+1)} represents the features of layer l+1;
σ(·) represents an activation function;
Ã represents the self-connection-augmented relationship matrix between the nodes of graph G, and D̃ its degree matrix;
H^{(l)} represents the features of layer l;
W^{(l)} represents the training parameter matrix of layer l.
In a preferred embodiment of the present invention, the deployment strategy in step S2 is:

π(p | c, θ) = P_r{ A_t = p | S_t = c, θ_t = θ }

where π(p | c, θ) represents the probability of outputting deployment policy p for a given input c;
θ represents the training parameters of the model;
P_r represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
In a preferred embodiment of the present invention, after step S1, a step S3 is further included, in which the critic network evaluates the reward obtained after the actor performs its action.
In a preferred embodiment of the present invention, after step S1, a step S4 is further included, in which the actor network updates the parameters of the optimization model according to the output of the critic module.
In a preferred embodiment of the present invention, the optimization model is:

max (total charge − total energy consumption expense)   (1.1)

wherein N represents the set of physical nodes;
G_c represents the revenue per unit of computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i;
a binary flag indicates whether container j of request i is deployed on physical node k;
G_m represents the revenue per unit of memory resource;
G_s represents the revenue per unit of storage resource;
u_k is a binary flag; u_k = 1 indicates that physical node k is in the active state;
C represents the unit energy consumption expense coefficient.
In a preferred embodiment of the present invention, the optimization model may alternatively be: min (total energy consumption expense), where min(·) denotes minimization and max(·) denotes maximization.

wherein N represents the set of physical nodes;
I represents the set of service requests;
V_i represents the set of containers of service request i;
a binary flag indicates whether container j of request i is deployed on physical node k, and a demand term represents the demand of container j of request i for computing resources;
u_k is a binary flag; u_k = 1 indicates that physical node k is in the active state;
C represents the unit energy consumption expense coefficient.
In a preferred embodiment of the present invention, the constraint conditions of the optimization model are:

wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
N represents the set of physical nodes;
a binary flag indicates whether container j of request i is deployed on physical node k, and a demand term represents the demand of container j of request i for computing resources;
V_i represents the set of containers of service request i;
a binary flag indicates whether container m of request i is deployed on physical node k_u;
a binary flag indicates whether container n of request i is deployed on physical node k_v.
In a preferred embodiment of the invention, the model parameters are updated as:

wherein θ_{k+1} represents the model parameters at the next time instant;
θ_k represents the model parameters at the current time;
α represents the learning rate.
In a preferred embodiment of the present invention, the model update further comprises:

wherein the loss represents the mean square error between the evaluation value b(c, p) given by the reference evaluator and the reward value Q(c, p);
M represents the number of samples;
Q(c, p_i) represents the reward obtained when the algorithm makes decision p_i given the input container cluster c;
b(c, p_i) represents the evaluation value given by the reference evaluator b for the given input container cluster c and decision p_i.
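As a rough sketch of the mean-square-error evaluation above (the reward and baseline values are illustrative, and `critic_mse_loss` is a hypothetical helper name, not the patent's implementation):

```python
# Mean-squared-error loss between the critic's baseline b(c, p_i) and the
# observed reward Q(c, p_i), averaged over M sampled decisions.
def critic_mse_loss(q_values, baselines):
    assert len(q_values) == len(baselines)
    m = len(q_values)
    return sum((b - q) ** 2 for q, b in zip(q_values, baselines)) / m

# Illustrative values: rewards observed for three decisions and the
# critic's corresponding baseline estimates.
q = [10.0, 12.0, 8.0]
b = [9.0, 12.0, 10.0]
loss = critic_mse_loss(q, b)  # (1 + 0 + 4) / 3
```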
In summary, by adopting the above technical scheme, container clusters can be reasonably deployed in edge computing according to the constructed optimization model.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic diagram of container cluster deployment in an edge network environment according to the present invention.
FIG. 2 is a diagram of the reinforcement learning model's decision-reward cycle according to the present invention.
FIG. 3 is a schematic diagram of the model training process of the present invention.
Fig. 4 is a schematic diagram of details of the network model of the actor of the present invention.
FIG. 5 is a schematic diagram of the training history of the present invention in three experimental scenarios;
where (a) is the training history (small-scale scenario), (b) the training history (medium-scale scenario), (c) the training history (large-scale scenario), (d) the training loss (small-scale scenario), (e) the training loss (medium-scale scenario), and (f) the training loss (large-scale scenario).
FIG. 6 is a solution time comparison diagram of the present invention.
FIG. 7 is a comparative illustration of the deployment error rate of the present invention.
FIG. 8 is a graphical illustration of a comparison of the cumulative revenue of the present invention over a period of time;
where (a) is the cumulative benefit comparison (small-scale scenario), (b) the cumulative benefit comparison (medium-scale scenario), and (c) the cumulative benefit comparison (large-scale scenario).
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The invention mainly comprises: modeling of the container cluster deployment problem in an edge computing network environment, and a solving framework for the edge-computing container cluster deployment strategy based on Actor-Critic reinforcement learning. A graph convolution network is introduced to extract features of the mesh-like topological relations among the multiple containers in a container cluster, and the extracted features serve as input to the attention mechanism in a Seq2Seq network to improve the output quality of the solution; the encoder part of the Seq2Seq network embeds and encodes the container cluster, and the decoder part outputs the corresponding container deployment positions. An Actor-Critic reinforcement learning framework is adopted to train the network; no label mapping is needed, and the actor network and the critic network train and learn from each other while improving independently. The solutions given by the trained network improve the system benefit significantly.
The edge computing platform may receive different numbers of service requests in the same period, and the functions each service request must implement can differ greatly. Services with different functions require containers of different types and in different numbers, and the containers of one service have varying communication needs among themselves. The most intuitive impact of service request scale and category is the change of virtual nodes and links, i.e., a change of structural configuration. Workload fluctuations typically change the resource demands of a virtual node or link, i.e., a change of resource configuration. The process of mapping two different container clusters onto the underlying physical network is illustrated in fig. 1.
1. Reinforcement learning solving framework combined with a graph convolution network
In the invention, an Actor-Critic reinforcement learning framework is adopted to train the model. The whole model involves two neural networks: the actor network and the critic network. Their workflow is shown in fig. 2: for a given container cluster input into the decision system, the agent (actor network) gives a suitable decision a_t according to the current network state S_t; in our problem this is the deployment policy (Placement), which indicates the deployment location of each container in the container cluster. The environment then evaluates the deployment policy and generates corresponding feedback information (reward) R_{t+1} indicating the quality of the deployment policy; at the same time, the environment transitions to the new post-deployment state S_{t+1}. The critic network evaluates the return (i.e., the Q value) obtained after the actor acts, and its evaluation result is the baseline; the actor network updates the model parameters based on the output of the critic module (the actor network updates its parameters in the direction of higher returns). The training process of the model is shown in fig. 3.
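The decision-reward cycle described above can be sketched as a minimal loop; the stub actor, environment, and critic below are illustrative placeholders, not the patent's networks:

```python
import random

# Minimal actor-critic decision-reward loop: the actor picks a placement,
# the environment returns a reward, the critic supplies a baseline, and the
# advantage (reward - baseline) drives the actor's update direction.
def actor(state):
    # Stub actor: deploy each container on a random node (placeholder policy).
    containers, nodes = state
    return [random.randrange(nodes) for _ in range(containers)]

def environment(state, placement):
    # Stub environment: reward favors spreading containers over nodes.
    reward = float(len(set(placement)))  # crude quality signal
    return reward, state                 # next state unchanged in this sketch

def critic(state, placement):
    return 1.0  # stub baseline b (would be learned in practice)

random.seed(0)
state = (4, 3)                                  # 4 containers, 3 physical nodes
placement = actor(state)                        # decision a_t
reward, state = environment(state, placement)   # feedback R_{t+1}, new state S_{t+1}
advantage = reward - critic(state, placement)   # drives the actor update
```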
In the invention, a Graph Convolutional Network (GCN) is extended on the basis of neural combinatorial optimization theory to extract the topological link relations existing in a container cluster, so that the agent can perceive the topological structure of the container cluster in advance and give deployment strategies more accurately. Specifically, we use a graph convolution network together with a sequence-to-sequence model based on an encoder-decoder structure to infer deployment strategies. For container clusters in the same training batch, we adopt the following method: the feature information of the multiple container clusters is grouped and, together with a block-diagonal adjacency matrix, input to the graph convolution network for training. To explain the working process of the above model more clearly, assume a set of container clusters [Q, V, W] needs to be mapped onto the underlying physical network. Each container cluster corresponding to a service request has a variable number m of containers, e.g., Q = (f_1, f_2, ..., f_m). The container clusters [Q, V, W] serve as input to the GCN network; the containers Q = (f_1, f_2, ..., f_m) of a container cluster serve as input to the encoder, and the decoder part outputs a deployment policy P = (p_1, p_2, ..., p_m) indicating the deployment location of each container. The actor network model of the method proposed by the present invention is shown in fig. 4.
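The block-diagonal batching of container clusters of different sizes mentioned above can be sketched as follows (the cluster adjacency matrices and sizes are illustrative, and `block_diagonal` is a hypothetical helper):

```python
import numpy as np

def block_diagonal(mats):
    # Stack per-cluster adjacency matrices into one block-diagonal matrix so
    # one GCN pass processes the whole batch without cross-cluster edges.
    n = sum(m.shape[0] for m in mats)
    out = np.zeros((n, n))
    i = 0
    for m in mats:
        k = m.shape[0]
        out[i:i + k, i:i + k] = m
        i += k
    return out

# Illustrative clusters Q (2 containers) and V (3 containers, a chain).
A_q = np.array([[0, 1], [1, 0]])
A_v = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A_batch = block_diagonal([A_q, A_v])  # shape (5, 5), zeros off the blocks
```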
One part of the task request is input into the GCN network to extract topological features, while another part is input into the encoder part of the Seq2Seq network to control the order of container deployment. The output of the GCN network and the output of the encoder are combined by a matrix dot-product operation and fed into the decoder part of the Seq2Seq network; the decoder finally gives the deployment strategy for the containers.
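The matrix dot-product fusion of the GCN output with the encoder output, feeding the decoder's attention, might be sketched numerically as follows (the dimensions and random feature values are illustrative assumptions, not the patent's exact wiring):

```python
import numpy as np

np.random.seed(0)
m, d = 4, 8                      # m containers, feature dimension d (illustrative)
gcn_out = np.random.rand(m, d)   # topological features from the GCN
enc_out = np.random.rand(m, d)   # per-step encoder hidden states

# Matrix dot-product fusion: combine both views into compatibility scores,
# then normalize per row and form an attention context for the decoder.
fused = gcn_out @ enc_out.T                                       # (m, m) scores
attn = np.exp(fused) / np.exp(fused).sum(axis=1, keepdims=True)   # row softmax
context = attn @ enc_out                                          # (m, d) context
```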
The invention constructs an optimization model from the perspective of an edge computing service provider, aiming to reduce the total energy consumption expense while satisfying users' service requests as far as possible, so as to maximize the service provider's benefit.
max (total charge − total energy consumption expense)   (1.1)
The objective function is divided into two parts. Equation (1.2) specifies how the edge computing service provider charges regularly for rented resources: for each container j ∈ V_i included in a service request i ∈ I, the occupied physical resources (computing, memory, and storage) are multiplied by the corresponding charging coefficients G_c, G_m and G_s, respectively. It is worth noting that, for the charging rule of the computing resource, a service effect coefficient (1 − η_{k,c}) is creatively added, since intensified competition among containers for physical resources results in reduced service capacity.
wherein N represents the set of physical nodes;
G_c represents the revenue per unit of computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i;
a binary flag indicates whether container j of request i is deployed on physical node k;
G_m represents the revenue per unit of memory resource;
G_s represents the revenue per unit of storage resource.
In equation (1.3) we define the energy consumption expense incurred by the underlying physical network. Considering that energy consumption accounts for a large part of a service provider's daily operating expenses, our optimization model considers only the energy consumption expense as the operator expense. Given the maximum energy consumption value of physical node k and its minimum (idle) energy consumption value: since energy consumption is positively correlated with resource utilization, we represent the load-dependent energy consumption of physical node k by the product of the consumption range and the computing resource occupancy rate; and since energy is also consumed when a physical node is idle, the idle energy consumption value of physical node k is added. Finally, the sum of the two is multiplied by the unit energy consumption expense coefficient to express the service provider's total energy consumption expense.
wherein N represents the set of physical nodes;
I represents the set of service requests;
V_i represents the set of containers of service request i;
a binary flag indicates whether container j of request i is deployed on physical node k;
u_k is a binary flag; u_k = 1 indicates that physical node k is in the active state;
C represents the unit energy consumption expense coefficient.
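As a rough numerical illustration of objective (1.1), the sketch below follows the charging rule of (1.2) and assumes a common linear power model for (1.3); the demand amounts, node power values, prices, and the names `demand`, `x`, `p_max`, `p_min` are all hypothetical, not taken from the patent:

```python
# Objective (1.1): total charge minus total energy consumption expense.
# x[(i, j)] = k encodes the binary flag "container j of request i on node k".
G_c, G_m, G_s = 3.0, 2.0, 1.0   # revenue per unit of cpu / memory / storage
C_energy = 0.5                   # unit energy consumption expense coefficient

# demand[(i, j)] = (cpu, mem, sto) demanded by container j of request i.
demand = {(0, 0): (0.2, 1.0, 2.0), (0, 1): (0.3, 2.0, 1.0)}
x = {(0, 0): 0, (0, 1): 0}       # both containers on physical node 0
p_max = {0: 10.0}                # maximum energy consumption of node 0
p_min = {0: 2.0}                 # idle (minimum) energy consumption of node 0

util = {k: 0.0 for k in p_max}   # cpu utilization per node
for (i, j), k in x.items():
    util[k] += demand[(i, j)][0]

# Eq. (1.2): charge, with service effect coefficient (1 - utilization) on cpu.
charge = sum((1 - util[k]) * G_c * demand[(i, j)][0]
             + G_m * demand[(i, j)][1] + G_s * demand[(i, j)][2]
             for (i, j), k in x.items())
# Eq. (1.3), assumed linear power model: idle part plus load-driven part,
# counted only for active nodes, times the unit expense coefficient.
energy = C_energy * sum(p_min[k] + (p_max[k] - p_min[k]) * util[k]
                        for k in p_max if util[k] > 0)
objective = charge - energy
```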
The optimization model is subject to several constraints. Constraint (1.4) concerns the utilization η_{k,c} of computing resources on physical node k, limiting its range of values to [0, 1].
wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
N represents the set of physical nodes;
a binary flag indicates whether container j of request i is deployed on physical node k.
Constraint (1.5) requires that the j-th container of the i-th service request be deployed on exactly one physical node, without repeated deployment.
wherein N represents the set of physical nodes;
a binary flag indicates whether container j of request i is deployed on physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i.
Constraint (1.6) requires that the bandwidth resources occupied by communication between two containers m and n of service request i, deployed on physical nodes k_u and k_v respectively, do not exceed the total amount of bandwidth resources between k_u and k_v.
wherein I represents the set of service requests;
V_i represents the set of containers of service request i;
a binary flag indicates whether container m of request i is deployed on physical node k_u;
a binary flag indicates whether container n of request i is deployed on physical node k_v.
Constraints (1.7), (1.8) and (1.9) respectively require that the sum of the resources of all containers contained in the service requests does not exceed the total amounts of computing, memory, and storage resources.
wherein I represents the set of service requests;
N represents the set of physical nodes;
a binary flag indicates whether container j of request i is deployed on physical node k.
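A feasibility check of a candidate deployment against the capacity-style constraints above might look like this sketch (capacities and demands are illustrative; encoding the placement as a map from container to node enforces constraint (1.5) by construction, and the bandwidth constraint (1.6) is omitted for brevity):

```python
# Check a candidate placement against capacity-style constraints.
# x[(i, j)] = k: container j of request i deployed on physical node k,
# so each container appears on exactly one node (constraint 1.5).
def feasible(x, demand, capacity):
    used = {k: [0.0, 0.0, 0.0] for k in capacity}  # cpu, mem, sto per node
    for (i, j), k in x.items():
        for r in range(3):
            used[k][r] += demand[(i, j)][r]
    # Constraints (1.4)/(1.7)-(1.9): per-node usage within capacity, which
    # also keeps cpu utilization in [0, 1] after normalization by capacity.
    return all(used[k][r] <= capacity[k][r]
               for k in capacity for r in range(3))

capacity = {0: (1.0, 4.0, 4.0), 1: (1.0, 4.0, 4.0)}
demand = {(0, 0): (0.6, 2.0, 1.0), (0, 1): (0.6, 1.0, 2.0)}

ok_split = feasible({(0, 0): 0, (0, 1): 1}, demand, capacity)  # fits
bad_same = feasible({(0, 0): 0, (0, 1): 0}, demand, capacity)  # cpu 1.2 > 1.0
```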
2. Topological relation description based on the graph convolution network
The invention adopts a graph convolution network to extract the topological relations of the input container cluster and uses the extracted features to help the agent give a more accurate deployment strategy without violating the constraint conditions, thereby reducing the container deployment cost and improving the overall benefit of the edge computing service provider.
Assume that the graph of one container cluster configuration is represented by G = (N, E), where N represents the vertices of the graph, i.e., the containers in the container cluster, and E represents the edges of the graph, i.e., the links resulting from communication between containers in the container cluster. The features of the vertices of G form an N × D matrix X, where D is the number of features. The container-to-container relations are represented by an N × N matrix A, i.e., the adjacency matrix of G. The layer-wise propagation of the graph convolution network is shown in equation (10).
H^{(l+1)} = σ( D̃^{-1/2} Ã D̃^{-1/2} H^{(l)} W^{(l)} )   (10)

wherein H^{(l+1)} represents the features of layer l+1;
σ(·) represents an activation function;
A represents the relationship matrix between the nodes of graph G;
H^{(l)} represents the features of layer l;
W^{(l)} represents the training parameter matrix of layer l;
I_N represents the identity matrix of order N;
X represents the feature matrix formed by the node features of graph G.
In this equation, Ã = A + I_N is the adjacency matrix of the undirected graph G with self-connections attached, where A is the adjacency matrix of G and I_N is the identity matrix; D̃ is the degree matrix of Ã. W^{(l)} is the training parameter matrix of layer l. σ represents an activation function, such as ReLU or Sigmoid (we use ReLU in our model). H^{(l)} represents the features of layer l, and for the input layer H = X.
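One propagation step of equation (10) can be sketched with numpy (the chain-shaped cluster, one-hot features, and parameter values are illustrative):

```python
import numpy as np

def gcn_layer(A, H, W):
    # H^(l+1) = ReLU(D^{-1/2} A~ D^{-1/2} H^(l) W^(l)), with A~ = A + I_N.
    n = A.shape[0]
    A_tilde = A + np.eye(n)                 # add self-connections
    d = A_tilde.sum(axis=1)                 # degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    return np.maximum(0, D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)  # ReLU

# Illustrative 3-container cluster: a chain f1 - f2 - f3.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)                # one-hot features, so H = X at the input layer
W = np.ones((3, 2)) * 0.5    # illustrative parameter matrix
H1 = gcn_layer(A, X, W)      # features of layer l+1, shape (3, 2)
```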
3. Policy-gradient-based constrained optimization
Assume that the set of container clusters is represented by C, and one container cluster by c (c ∈ C). The policy function for c is expressed as:
π(p | c, θ) = P_r{ A_t = p | S_t = c, θ_t = θ }
where π(p | c, θ) represents the probability of outputting deployment policy p for a given input c;
θ represents the training parameters of the model;
P_r represents the probability of outputting the deployment policy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
The policy function expresses that at time t, with input c and parameters θ, the probability of outputting deployment policy p is P_r. The policy assigns a higher probability to a high-revenue deployment policy p and a lower probability to a low-revenue one. The interaction of input container clusters with output policies over a period T generates a trajectory of a Markov decision process, τ = (c_1, p_1, ..., c_T, p_T), whose probability can be expressed as:
P_θ(c_1, p_1, ..., c_T, p_T) = p(c_1) ∏_{t=1}^{T} π_θ(p_t | c_t) p(c_{t+1} | c_t, p_t)

wherein P_θ(c_1, p_1, ..., c_T, p_T) represents the probability that the trajectory τ = (c_1, p_1, ..., c_T, p_T) occurs under parameters θ;
p(c_1) represents the probability that state c_1 occurs (i.e., that the input at time t = 1 is c_1);
T represents the length of the period;
π_θ(p_t | c_t) represents the probability that, at time t, with current state c_t (the input container cluster) and parameters θ, the agent takes action p_t (the output deployment policy);
p(c_{t+1} | c_t, p_t) represents the probability that, given state c_t (the input container cluster) and action p_t (the output deployment policy) at time t, the system state at time t+1 (the input container cluster) is c_{t+1};
c_1 represents the system state (the input container cluster) at time t = 1;
p_1 represents the deployment policy at time t = 1;
c_t represents the input at time t;
p_t represents the deployment policy output at time t.
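The trajectory probability above factorizes into a sum of logarithmic terms, as this sketch shows (the probability values are illustrative placeholders):

```python
import math

# log P_theta(tau) = log p(c_1)
#                    + sum_t [log pi_theta(p_t | c_t) + log p(c_{t+1} | c_t, p_t)]
def trajectory_log_prob(p_c1, policy_probs, transition_probs):
    return (math.log(p_c1)
            + sum(math.log(p) for p in policy_probs)
            + sum(math.log(p) for p in transition_probs))

# Illustrative two-step trajectory.
logp = trajectory_log_prob(p_c1=0.5,
                           policy_probs=[0.8, 0.6],
                           transition_probs=[0.9, 1.0])
```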
Under the above policy function, the deployment policy p_t for the currently input container cluster c_t depends on the deployment positions p_{<t} of the previous container clusters and on the system state. For simplicity, we assume that the system state is fully determined by the container cluster set C. The only output of the policy function is the probability indicating the container cluster deployment locations. The objective of the policy gradient method is to find an optimal set of parameters θ* that yields optimal deployment locations for the container clusters. To this end, we need to define an objective function describing the quality of the deployment strategy.
wherein J_R(θ | c) represents the policy quality corresponding to input c;
R(p) represents the service revenue corresponding to deployment policy p;
p ~ π_θ(· | c) represents sampling the deployment policy p for a given input c.
In the above equation, we use the expected service revenue R(p) of the deployment policies for a given container cluster c as the objective function describing the quality of the deployment strategy. Since the agent infers deployment policies over all container clusters, the revenue expectation can be defined as an expectation over the container probability distribution:
wherein J_R(θ) represents the policy quality, i.e., the expected revenue;
J_R(θ | c) represents the policy quality corresponding to input c;
c ~ C represents taking the expectation over all container clusters c.
The expected penalty due to violation of the constraints can be expressed in the same way:
J_C(θ) = E_{c∼C}[J_C(θ | c)]
wherein, J_C(θ) represents the expected value of the penalty;
J_C(θ | c) represents the penalty value corresponding to input c;
c ∼ C denotes the expectation over all container clusters C;
Here, we define four constraint signals: the computing resource (cpu), the memory resource (mem), the storage resource (sto), and the bandwidth resource (bw). The final optimization objective can then be transformed into an unconstrained problem by the Lagrangian relaxation technique:
J_L(λ, θ) = J_R(θ) + Σ_i λ_i J_{C_i}(θ) = J_R(θ) + J_ξ(θ)
wherein, J_L(λ, θ) represents the Lagrangian value, computed as the expected revenue J_R(θ) plus the weighted sum of the expected penalty values J_{C_i}(θ) of the individual resources;
λ represents the weights of the four constraint signals;
J_R(θ) represents the policy quality, i.e., the expected value of the revenue;
λ_i represents the weight of the i-th constraint signal;
J_{C_i}(θ) represents the expected penalty value of the i-th constraint signal;
J_ξ(θ) represents the weighted sum of the expected penalty values of the four constraint signals;
where λ is the vector of weights of the four constraint signals, and J_ξ(θ) is the weighted sum of the expected penalty values of the four constraint signals. Next, we calculate the gradient of J_L(λ, θ) using the log-likelihood trick:
∇_θ J_L(λ, θ) = E_{p∼π_θ(·|c)}[Q(c, p) ∇_θ log π_θ(p | c)]
wherein, J_L(λ, θ) represents the Lagrangian value, computed as the expected revenue J_R(θ) plus the weighted sum of the expected penalty values of the individual resources;
π_θ(p | c) represents the policy function for input c;
Q(c, p) represents the reward accrued when, given the input container cluster c, the algorithm makes decision p;
p ∼ π_θ(·|c) denotes that the deployment policies p are drawn from the policy distribution for the given input c;
In the above equation, Q(c, p) describes the reward that can be achieved when, given input c, the algorithm makes decision p. It is computed by adding the weighted sum of all constraint-violation values to the revenue value R(p), as shown in (18):
Q(c, p) = R(p) + ξ(p) = R(p) + Σ_i λ_i C_i(p)    (18)
where Q(c, p) represents the reward accrued when, given the input container cluster c, the algorithm makes decision p;
R(p) represents the reward the system grants to decision p;
ξ(p) represents the weighted sum of the penalty values of all constraint signals under decision p;
λ_i represents the weight of the i-th constraint signal;
C_i(p) represents the penalty value produced by the i-th constraint signal under decision p;
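A minimal sketch of the reward signal in equation (18), combining the revenue R(p) with the weighted penalties of the four constraint signals (cpu, mem, sto, bw). The sign convention (violations carried as negative penalty values) and all concrete numbers are illustrative assumptions, not values from the patent.

```python
def q_value(revenue, penalties, weights):
    """Q(c, p) = R(p) + sum_i lambda_i * C_i(p): service revenue plus the
    weighted penalty values of the four constraint signals. Penalties are
    assumed non-positive here, so constraint violations reduce the reward.
    """
    assert set(penalties) == set(weights)
    return revenue + sum(weights[k] * penalties[k] for k in penalties)

# Illustrative numbers: a small cpu and storage violation.
penalties = {"cpu": -0.2, "mem": 0.0, "sto": -0.1, "bw": 0.0}
weights = {"cpu": 1.0, "mem": 1.0, "sto": 0.5, "bw": 1.0}
q = q_value(10.0, penalties, weights)
```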
Then, we approximate the Lagrangian gradient using Monte Carlo sampling, where m is the number of samples. To reduce the variance of the gradient and accelerate the convergence of the model, a critic network, composed of a simple RNN, is used as a benchmark estimator b. The Lagrangian gradient can then be expressed as:
∇_θ J_L(λ, θ) ≈ (1/m) Σ_{i=1}^{m} (Q(c, p_i) − b(c, p_i)) ∇_θ log π_θ(p_i | c)
wherein, m represents the number of samples;
Q(c, p_i) represents the reward obtained when, given the input container cluster c, the algorithm makes decision p_i;
b(c, p_i) represents the evaluation value given by the benchmark estimator b for the input container cluster c and decision p_i;
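The Monte Carlo estimate above, a score-function gradient averaged over m samples with the critic baseline subtracted, can be sketched as follows. Function names and array shapes are illustrative assumptions.

```python
import numpy as np

def lagrangian_gradient(grad_log_probs, q_values, baselines):
    """Monte Carlo estimate of the Lagrangian policy gradient:
    grad J_L ~= (1/m) sum_i (Q(c, p_i) - b(c, p_i)) * grad log pi_theta(p_i | c).
    Subtracting the baseline b reduces variance without biasing the estimate.

    grad_log_probs: (m, n_params) per-sample score functions.
    q_values, baselines: length-m rewards and baseline evaluations.
    """
    m = len(q_values)
    advantages = np.asarray(q_values) - np.asarray(baselines)
    return advantages @ np.asarray(grad_log_probs) / m

# Two samples over a 2-parameter policy.
g = lagrangian_gradient([[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], [1.0, 1.0])
```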
Finally, the parameter θ of the network model is updated by stochastic gradient descent:
θ_{k+1} = θ_k − α ∇_θ J_L(λ, θ_k)
wherein, θ_{k+1} represents the model parameters at the next time instant;
θ_k represents the model parameters at the current time;
α represents the learning rate;
The benchmark estimator gives an evaluation b(c, p) of the return of the current container cluster; the parameter σ of the benchmark estimator is then updated by stochastic gradient descent based on the mean square error between b(c, p) and the reward value Q(c, p):
L(σ) = (1/m) Σ_{i=1}^{m} (Q(c, p_i) − b(c, p_i))^2
wherein, L(σ) represents the mean square error between the evaluation value b(c, p) given by the benchmark estimator and the reward value Q(c, p);
m represents the number of samples;
Q(c, p_i) represents the reward obtained when, given the input container cluster c, the algorithm makes decision p_i;
b(c, p_i) represents the evaluation value given by the benchmark estimator b for the input container cluster c and decision p_i;
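A sketch of the benchmark-estimator update: the mean-square-error loss between b(c, p_i) and Q(c, p_i), followed by one stochastic-gradient-descent step on a parameter vector σ. The closed-form gradient assumes b is differentiable in σ with a known per-sample Jacobian grad_b; all names and shapes are illustrative assumptions.

```python
import numpy as np

def critic_loss_and_update(b_values, q_values, sigma, grad_b, alpha=0.01):
    """Compute L(sigma) = (1/m) sum_i (Q_i - b_i)^2 and take one SGD step.

    b_values, q_values: length-m baseline evaluations and rewards.
    sigma: current critic parameters, shape (n_params,).
    grad_b: (m, n_params) array; row i is d b(c, p_i) / d sigma.
    """
    b = np.asarray(b_values)
    q = np.asarray(q_values)
    m = len(q)
    loss = np.mean((q - b) ** 2)
    # dL/dsigma = -(2/m) * sum_i (Q_i - b_i) * grad_b_i
    grad_sigma = -(2.0 / m) * ((q - b) @ np.asarray(grad_b))
    return loss, sigma - alpha * grad_sigma

loss, new_sigma = critic_loss_and_update(
    [1.0, 2.0], [2.0, 2.0], np.array([0.0]), [[1.0], [1.0]]
)
```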
The training process of the container cluster deployment algorithm based on the graph convolutional network and neural combinatorial optimization can be described as in Table 1:
Table 1. Training process of the container cluster deployment algorithm based on the graph convolutional network and neural combinatorial optimization
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (9)
1. A container cluster online deployment method for fusing graph neural network and reinforcement learning in edge computing is characterized by comprising the following steps:
s1, extracting a topological association relation existing between containers by a graph convolution network;
and S2, inferring a deployment strategy by a sequence-to-sequence network with the assistance of the graph convolution network.
2. The method for deploying container clusters fusing graph neural networks and reinforcement learning on line in edge computing according to claim 1, wherein the hierarchical propagation of the graph convolution network in step S1 is as follows:
H^{(l+1)} = σ(A H^{(l)} W^{(l)})
wherein, H^{(l+1)} represents the features of the (l+1)-th layer;
σ(·) represents an activation function;
A represents the relationship matrix between the nodes in graph G;
H^{(l)} represents the features of the l-th layer;
W^{(l)} represents the training parameter matrix of the l-th layer.
3. The method for deploying the container cluster fusing the graph neural network and the reinforcement learning in the edge computing on line according to claim 1, wherein the deployment strategy in step S2 is as follows:
π(p | c, θ) = P_r{A_t = p | S_t = c, θ_t = θ}
where π(p | c, θ) represents the probability of outputting deployment policy p for a given input c;
theta represents a training parameter of the model;
P _{r} representing a probability of outputting the deployment policy p;
A_t represents the action at time t;
S _{t} indicates the state at time t;
θ _{t} representing the training parameters at time t.
4. The container cluster online deployment method fusing graph neural network and reinforcement learning in edge computing according to claim 1, characterized by further comprising, after step S1, a step S3 in which the critic network evaluates the reward obtained after the actor network acts.
5. The container cluster online deployment method fusing graph neural network and reinforcement learning in edge computing according to claim 1, characterized by further comprising, after step S1, a step S4 in which the actor network updates and optimizes the model parameters according to the output of the critic module.
6. The container cluster online deployment method fusing graph neural network and reinforcement learning in edge computing according to claim 5, wherein the optimization model is as follows:
max (total revenue − total energy consumption)    (1.1)
Wherein N represents a set of physical nodes;
G _{c} representing revenue per unit of computing resource;
η _{k,c} representing the utilization of computing resources on physical node k;
i represents a service request set;
V _{i} a set of containers representing service requests i;
a binary flag bit whose value 1 indicates that container j of request i is deployed on physical node k;
G _{m} expressing the income of memory resources per unit;
G _{s} representing revenue per unit of storage resource;
wherein N represents a set of physical nodes;
i represents a service request set;
V _{i} a set of containers representing service request i;
a binary flag bit whose value 1 indicates that container j of request i is deployed on physical node k;
the demand of container j of request i for the computing resource;
u_k represents a binary flag bit; u_k = 1 indicates that physical node k is in an active state;
c represents the unit energy consumption coefficient;
or, min (total energy consumption)
Wherein N represents a set of physical nodes;
i represents a service request set;
V _{i} a set of containers representing service requests i;
a binary flag bit whose value 1 indicates that container j of request i is deployed on physical node k;
u_k represents a binary flag bit; u_k = 1 indicates that physical node k is in an active state;
c represents the unit energy consumption coefficient.
7. The method for deploying the fusion graph neural network and the container cluster for reinforcement learning in the edge computing on line according to claim 6, wherein the constraint conditions of the optimization model are as follows:
wherein ,η_{k,c} Representing the utilization of computing resources on physical node k;
i represents a service request set;
n represents a physical node set;
a binary flag bit whose value 1 indicates that container j of request i is deployed on physical node k;
wherein N represents a set of physical nodes;
a binary flag bit whose value 1 indicates that container j of request i is deployed on physical node k;
i represents a service request set;
V _{i} a set of containers representing service requests i;
wherein I represents a service request set;
V _{i} a set of containers representing service request i;
a binary flag bit whose value 1 indicates that container m of request i is deployed on physical node k_u;
a binary flag bit whose value 1 indicates that container n of request i is deployed on physical node k_v;
wherein I represents a service request set;
n represents a physical node set;
a binary flag bit whose value 1 indicates that container j of request i is deployed on physical node k;
8. The container cluster online deployment method fusing graph neural network and reinforcement learning in edge computing according to claim 5, wherein the model is updated as follows:
θ_{k+1} = θ_k − α ∇_θ J_L(λ, θ_k)
wherein, θ_{k+1} represents the model parameters at the next time instant;
θ_k represents the model parameters at the current time;
α represents the learning rate.
9. The container cluster online deployment method fusing graph neural network and reinforcement learning in edge computing according to claim 8, wherein the model updating further comprises:
L(σ) = (1/m) Σ_{i=1}^{m} (Q(c, p_i) − b(c, p_i))^2
wherein, L(σ) represents the mean square error between the evaluation value b(c, p) given by the benchmark estimator and the reward value Q(c, p);
m represents the number of samples;
Q(c, p_i) represents the reward obtained when, given the input container cluster c, the algorithm makes decision p_i;
b(c, p_i) represents the evaluation value given by the benchmark estimator b for the input container cluster c and decision p_i.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN202211347967.8A CN115686846B (en)  20221031  20221031  Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation 
Publications (2)
Publication Number  Publication Date 

CN115686846A true CN115686846A (en)  20230203 
CN115686846B CN115686846B (en)  20230502 
Family
ID=85045641
Cited By (1)
Publication number  Priority date  Publication date  Assignee  Title 

CN116069512A (en) *  20230323  20230505  之江实验室  Serverless efficient resource allocation method and system based on reinforcement learning 
Citations (7)
Publication number  Priority date  Publication date  Assignee  Title 

CN110008819A (en) *  20190130  20190712  武汉科技大学  A kind of facial expression recognizing method based on figure convolutional neural networks 
CN112631717A (en) *  20201221  20210409  重庆大学  Network service function chain dynamic deployment system and method based on asynchronous reinforcement learning 
CN112711475A (en) *  20210120  20210427  上海交通大学  Workflow scheduling method and system based on graph convolution neural network 
CN113568675A (en) *  20210708  20211029  广东利通科技投资有限公司  Internet of vehicles edge calculation task unloading method based on layered reinforcement learning 
CN113778648A (en) *  20210831  20211210  重庆理工大学  Task scheduling method based on deep reinforcement learning in hierarchical edge computing environment 
US20220124543A1 (en) *  20210630  20220421  Oner Orhan  Graph neural network and reinforcement learning techniques for connection management 
US20220343143A1 (en) *  20190911  20221027  Siemens Aktiengesellschaft  Method for generating an adapted task graph 

2022
 20221031 CN CN202211347967.8A patent/CN115686846B/en active Active
Legal Events
Date  Code  Title  Description

PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant