CN115686846A - Container cluster online deployment method for fusing graph neural network and reinforcement learning in edge computing - Google Patents

Container cluster online deployment method for fusing graph neural network and reinforcement learning in edge computing

Info

Publication number
CN115686846A
CN115686846A (application CN202211347967.8A)
Authority
CN
China
Prior art keywords
representing, container, physical node, request, network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211347967.8A
Other languages
Chinese (zh)
Other versions
CN115686846B (en)
Inventor
陈卓
朱博文
周川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN202211347967.8A priority Critical patent/CN115686846B/en
Publication of CN115686846A publication Critical patent/CN115686846A/en
Application granted granted Critical
Publication of CN115686846B publication Critical patent/CN115686846B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, which comprises the following steps: S1, a graph convolution network extracts the topological association relations existing between containers; S2, a sequence-to-sequence network, assisted by the graph convolution network, infers a deployment strategy. The invention can reasonably deploy containers on edge computing nodes according to the constructed optimization model.

Description

Container cluster online deployment method for fusing graph neural network and reinforcement learning in edge computing
Technical Field
The invention relates to the technical field of edge deployment, in particular to a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing.
Background
In recent years, with the rapid development of wireless access technology, mobile internet and novel internet-of-things applications have proliferated. Services increasingly exhibit new characteristics such as shorter response-time requirements, higher service-quality requirements, more diverse resource requirements, and dynamically changing resource-demand scales. These new requirements are difficult to satisfy with the cloud computing model, which concentrates IT resources in a data center to serve users. Edge computing deploys service nodes in a distributed manner at the network edge, closer to the user, so that mobile users can access services on nearby edge service nodes, significantly improving service quality and effectively reducing the resource load on the data center. By introducing virtualization technology, an edge service provider can abstract the physical resources of an edge node into Virtual Network Functions (VNF), improving the utilization efficiency of IT resources while meeting user service requirements and thereby reducing the provider's operating expense (OPEX). Currently, virtualization based on Virtual Machines (VM-VNF) is the most widely used. However, VM-VNF suffers from slow startup and migration and large resource overhead, which makes it slow to respond to the dynamic requirements of tasks. With the recent rise of serverless computing, network functions can instead be deployed in the form of Containers (CT), forming container-based virtualization (CT-VNF). CT-VNF is increasingly adopted by edge service providers thanks to its lighter resource usage, shorter service startup time, and higher migration efficiency. Serving tasks at the edge often requires deploying multiple container units on edge service nodes and interconnecting them to build a Container Cluster (CC); for example, a real-time data analysis service with information-security requirements may need functional units including a firewall, an IDS, several computing units, and a load balancer. These functional units are mapped onto the same or different edge service nodes in the form of containers and interconnected through virtual networks. The complexity of the service itself and the high demands on service efficiency make optimized CC deployment in edge computing environments a challenging problem, which must simultaneously consider: 1) the service's requests for resources; 2) the logical association relationships among multiple containers; 3) the remaining IT resources of the currently available edge nodes; 4) the energy-consumption expense of container deployment; and 5) the degraded service quality that a poor container deployment may cause.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and in particular innovatively provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing.
In order to achieve the above object, the present invention provides a container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, which comprises the following steps:
S1, a graph convolution network extracts the topological association relations existing between containers;
S2, a sequence-to-sequence network, assisted by the graph convolution network, infers a deployment strategy.
In a preferred embodiment of the present invention, the layer-wise propagation of the graph convolution network in step S1 is:
H^{(l+1)} = σ( D̃^{-1/2}·Ã·D̃^{-1/2}·H^{(l)}·W^{(l)} )

wherein H^{(l+1)} represents the features of layer l+1;
σ(·) represents an activation function;
Ã = A + I_N represents the adjacency matrix of the undirected graph G with self-connections added, A being the relationship matrix between the nodes in graph G;
D̃ represents the degree matrix of Ã, and D̃^{-1/2} denotes raising that matrix to the power −1/2;
H^{(l)} represents the features of layer l;
W^{(l)} represents the training parameter matrix of layer l.
In a preferred embodiment of the present invention, the deployment strategy in step S2 is:

π(p|c,θ) = Pr{A_t = p | S_t = c, θ_t = θ}

wherein π(p|c,θ) represents the probability of outputting deployment strategy p for a given input c;
θ represents the training parameters of the model;
Pr represents the probability of outputting the deployment strategy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
In a preferred embodiment of the present invention, after step S1 there is further included a step S3: the critic network evaluates the return obtained after the actor network acts.
In a preferred embodiment of the present invention, after step S1 there is further included a step S4: the actor network updates the optimized model parameters according to the output of the critic module.
In a preferred embodiment of the present invention, the optimization model is:

max(total revenue − total energy expense)   (1.1)

Total revenue = Σ_{k∈N} Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·[ G_c·(1 − η_{k,c})·d_{i,j}^cpu + G_m·d_{i,j}^mem + G_s·d_{i,j}^sto ]   (1.2)

wherein N represents the set of physical nodes;
G_c represents the revenue per unit of computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i;
x_{i,j}^k represents a binary flag bit, x_{i,j}^k = 1 indicating that container j of request i is deployed on physical node k;
d_{i,j}^cpu represents the demand of container j of request i for computing resources;
G_m represents the revenue per unit of memory resource;
d_{i,j}^mem represents the demand of container j of request i for memory resources;
G_s represents the revenue per unit of storage resource;
d_{i,j}^sto represents the demand of container j of request i for storage resources;

Total energy expense = C·Σ_{k∈N} [ (E_k^max − E_k^idle)·( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^cpu ) / R_k^cpu + u_k·E_k^idle ]   (1.3)

wherein E_k^max represents the maximum energy consumption value of physical node k;
E_k^idle represents the idle energy consumption value of physical node k;
R_k^cpu represents the total amount of computing resources of physical node k;
u_k represents a binary flag bit, u_k = 1 indicating that physical node k is in an active state;
C represents the unit energy expense coefficient.
In a preferred embodiment of the present invention, the optimization model may instead be: min(total energy expense), where min(·) denotes taking the minimum and max(·) denotes taking the maximum:

Total energy expense = C·Σ_{k∈N} [ (E_k^max − E_k^idle)·( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^cpu ) / R_k^cpu + u_k·E_k^idle ]   (1.3)

wherein N represents the set of physical nodes;
E_k^max represents the maximum energy consumption value of physical node k;
E_k^idle represents the idle energy consumption value of physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i;
x_{i,j}^k represents a binary flag bit, x_{i,j}^k = 1 indicating that container j of request i is deployed on physical node k;
d_{i,j}^cpu represents the demand of container j of request i for computing resources;
R_k^cpu represents the total amount of computing resources of physical node k;
u_k represents a binary flag bit, u_k = 1 indicating that physical node k is in an active state;
C represents the unit energy expense coefficient.
In a preferred embodiment of the present invention, the constraint conditions of the optimization model are:

η_{k,c} = ( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^cpu ) / R_k^cpu,  0 ≤ η_{k,c} ≤ 1,  ∀k∈N   (1.4)

wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit, x_{i,j}^k = 1 indicating that container j of request i is deployed on physical node k;
d_{i,j}^cpu represents the demand of container j of request i for computing resources;
R_k^cpu represents the total amount of computing resources of physical node k;

Σ_{k∈N} x_{i,j}^k = 1,  ∀i∈I, ∀j∈V_i   (1.5)

wherein V_i represents the set of containers of service request i;

Σ_{i∈I} Σ_{m,n∈V_i} x_{i,m}^{k_u}·x_{i,n}^{k_v}·b_i^{m,n} ≤ B^{k_u,k_v},  ∀k_u,k_v∈N   (1.6)

wherein b_i^{m,n} represents the bandwidth requirement between container m and container n of request i;
x_{i,m}^{k_u} represents a binary flag bit, x_{i,m}^{k_u} = 1 indicating that container m of request i is deployed on physical node k_u;
x_{i,n}^{k_v} represents a binary flag bit, x_{i,n}^{k_v} = 1 indicating that container n of request i is deployed on physical node k_v;
B^{k_u,k_v} represents the total amount of bandwidth resources between physical nodes k_u and k_v;

Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^cpu ≤ R_k^cpu,  ∀k∈N   (1.7)
Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^mem ≤ R_k^mem,  ∀k∈N   (1.8)
Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^sto ≤ R_k^sto,  ∀k∈N   (1.9)

wherein d_{i,j}^mem represents the demand of container j of request i for memory resources;
R_k^mem represents the total amount of memory resources of physical node k;
d_{i,j}^sto represents the demand of container j of request i for storage resources;
R_k^sto represents the total amount of storage resources of physical node k.
In a preferred embodiment of the invention, the model is updated as:

θ_{k+1} = θ_k + α·∇_θ J_L(λ, θ_k)

wherein θ_{k+1} represents the model parameters at the next time instant;
θ_k represents the model parameters at the current time;
α represents the learning rate;
∇_θ J_L(λ, θ_k) represents the Lagrangian gradient approximated using Monte Carlo sampling.
In a preferred embodiment of the present invention, the model updating further comprises:

L(σ) = (1/m)·Σ_{i=1}^{m} ‖ b(c, p_i) − Q(c, p_i) ‖²

wherein L(σ) represents the mean square error between the evaluation value b(c, p) given by the benchmark evaluator and the reward value Q(c, p);
m represents the number of samples;
Q(c, p_i) represents the reward obtained when the algorithm makes decision p_i for a given input container cluster c;
b(c, p_i) represents the evaluation value given by the benchmark evaluator b for a given input container cluster c and decision p_i.
In summary, by adopting the above technical scheme, containers can be reasonably deployed in edge computing according to the constructed optimization model.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic diagram of container cluster deployment in an edge network environment according to the present invention.
FIG. 2 is a diagram of a reinforcement learning model decision-reward cycle according to the present invention.
FIG. 3 is a schematic diagram of the model training process of the present invention.
Fig. 4 is a schematic diagram of details of the network model of the actor of the present invention.
FIG. 5 is a schematic diagram of training history of the present invention in three experimental scenarios;
where (a) is a training history (small-scale scenario), (b) is a training history (medium-scale scenario), (c) is a training history (large-scale scenario), (d) is a training loss (small-scale scenario), (e) is a training loss (medium-scale scenario), and (f) is a training loss (large-scale scenario).
FIG. 6 is a solution time comparison diagram of the present invention.
FIG. 7 is a comparative illustration of the deployment error rate of the present invention.
FIG. 8 is a graphical illustration of a comparison of the cumulative revenue of the present invention over a period of time;
where (a) is the cumulative revenue comparison (small-scale scenario), (b) the cumulative revenue comparison (medium-scale scenario), and (c) the cumulative revenue comparison (large-scale scenario).
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The invention mainly comprises: modeling of the container cluster deployment problem in an edge computing network environment, and a solving framework for the edge computing container cluster deployment strategy based on Actor-Critic reinforcement learning. By introducing a graph convolution network, features of the mesh-like topological relations among the multiple containers in a container cluster are extracted and used as input to the attention mechanism of a Seq2Seq network to improve the output quality of the solution; the encoder part of the Seq2Seq network embeds and encodes the container cluster, and the decoder part outputs the corresponding container deployment positions. An Actor-Critic reinforcement learning framework is adopted to train the network; no label mapping is needed, and the actor network and the critic network train and learn from each other while improving independently. The solution given by the trained network significantly improves the system benefit.
An edge computing platform may receive different numbers of service requests in the same period, and the functions each service request needs to implement can differ greatly. Services with different functions require containers of different types and numbers, and uncertain communication demands exist among these containers. The most intuitive impact of service-request scale and category is the change of virtual nodes and links, i.e., a change of the structural configuration. Workload fluctuations typically change the resource demands of a virtual node or link, i.e., the resource configuration. The process of mapping two different container clusters onto the underlying physical network is illustrated in fig. 1.
1. Reinforced learning solving framework combined with graph convolution network
In the invention, an Actor-Critic reinforcement learning framework is adopted to train the model. The entire model involves two neural networks: the actor network and the critic network. Their workflow is shown in fig. 2: for a given container cluster input into the decision system, the agent (actor network) gives a suitable decision a_t according to the current network state S_t; in our problem this is the deployment policy Placement, which indicates the deployment locations of the containers in the container cluster. The environment then evaluates the deployment policy and generates corresponding feedback information (reward) R_{t+1} indicating the quality of the deployment policy; at the same time, the environment is updated to the new post-deployment state S_{t+1}. The critic network evaluates the return (namely the Lagrangian value) obtained after the actor acts, and its evaluation result is the baseline; the actor network updates the model parameters based on the output of the critic module (the actor network updates its parameters in the direction of higher return). The training process of the model is shown in fig. 3.
In the invention, a Graph Convolutional Network (GCN) is introduced, extending the neural combinatorial optimization theory, to extract the topological link relations existing in a container cluster, so that the agent can perceive the topological structure of the container cluster in advance and give a deployment strategy more accurately. Specifically, we use a graph convolution network together with a sequence-to-sequence model based on an encoder-decoder structure to infer deployment strategies. For the container clusters of the same training batch, we adopt the following method: the feature information of multiple container clusters is grouped and, together with a block-diagonal adjacency matrix, input to the graph convolution network for training. To explain the working process of the above model more clearly, we assume that a set of container clusters [Q, V, W] needs to be mapped into the underlying physical network. Each container cluster corresponding to a service request has a variable number m of containers, e.g., Q = (f_1, f_2, ..., f_m). The container clusters [Q, V, W] serve as input to the GCN network, and the containers Q = (f_1, f_2, ..., f_m) of a cluster serve as input to the encoder; the decoder part outputs a deployment policy P = (p_1, p_2, ..., p_m) indicating the deployment location of each container. The actor network model in the method proposed by the invention is shown in fig. 4.
One part of the task request is input into the GCN network to extract topological features, and the other part is input into the encoder part of the Seq2Seq network to control the order of container deployment. The output of the GCN network and the output of the encoder are combined by a matrix dot-product operation and input to the decoder part of the Seq2Seq network; finally, the decoder gives the deployment strategy of the containers.
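To make this pipeline concrete, the following PyTorch sketch shows one plausible shape of the actor network. The patent fixes only the overall structure (a GCN for topology features and a Seq2Seq encoder-decoder fed by the fused GCN/encoder output); the layer types, dimensions, the element-wise fusion, and all names here (`ActorNetwork`, `feat_dim`, `hid_dim`) are our own illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorNetwork(nn.Module):
    """Sketch of the actor: GCN topology features assist a Seq2Seq decoder
    that emits one physical node per container. Layer choices are assumptions."""

    def __init__(self, feat_dim, hid_dim, num_phys_nodes):
        super().__init__()
        self.gcn_w = nn.Linear(feat_dim, hid_dim, bias=False)   # GCN weight W^(l)
        self.encoder = nn.GRU(feat_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRUCell(hid_dim, hid_dim)
        self.out = nn.Linear(2 * hid_dim, num_phys_nodes)       # scores over nodes

    def forward(self, x, adj):
        # x: (m, feat_dim) container features; adj: (m, m) cluster adjacency A
        a_hat = adj + torch.eye(adj.size(0))                    # A~ = A + I
        d_inv = torch.diag(a_hat.sum(1).pow(-0.5))              # D~^(-1/2)
        h_gcn = F.relu(self.gcn_w(d_inv @ a_hat @ d_inv @ x))   # topology features
        enc_out, h = self.encoder(x.unsqueeze(0))               # embed container sequence
        ctx = enc_out.squeeze(0) * h_gcn      # fuse GCN and encoder outputs (dot product)
        h_t = h.squeeze(0)                    # (1, hid_dim) decoder state
        placements, log_probs = [], []
        for t in range(x.size(0)):            # one decoding step per container
            h_t = self.decoder(ctx[t].unsqueeze(0), h_t)
            logits = self.out(torch.cat([h_t, h_gcn[t].unsqueeze(0)], dim=-1))
            dist = torch.distributions.Categorical(logits=logits.squeeze(0))
            p_t = dist.sample()               # physical node chosen for container t
            placements.append(p_t)
            log_probs.append(dist.log_prob(p_t))
        return torch.stack(placements), torch.stack(log_probs).sum()
```

A call such as `ActorNetwork(feat_dim=8, hid_dim=64, num_phys_nodes=10)(x, adj)` returns a placement vector P = (p_1, ..., p_m) together with its log-probability, the quantity needed later for the policy gradient.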
The invention constructs the optimization model from the perspective of an edge computing service provider, hoping to reduce the total energy expense while satisfying users' service requests as far as possible, so as to maximize the service provider's benefit.
max(total revenue − total energy expense)   (1.1)
The objective function consists of two parts. Equation (1.2) specifies how the edge computing service provider regularly charges for rented resources: for each container j ∈ V_i included in a service request i ∈ I, the occupied physical resources, namely computing resources d_{i,j}^cpu, memory resources d_{i,j}^mem, and storage resources d_{i,j}^sto, are multiplied by the corresponding charging coefficients G_c, G_m, and G_s respectively. It is worth noting that, for the charging rule of computing resources, a service effect coefficient (1 − η_{k,c}) is creatively added to constrain the intensified competition among containers for physical resources, which would otherwise reduce service capacity.
Total revenue = Σ_{k∈N} Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·[ G_c·(1 − η_{k,c})·d_{i,j}^cpu + G_m·d_{i,j}^mem + G_s·d_{i,j}^sto ]   (1.2)

wherein N represents the set of physical nodes;
G_c represents the revenue per unit of computing resource;
η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i;
x_{i,j}^k represents a binary flag bit, x_{i,j}^k = 1 indicating that container j of request i is deployed on physical node k;
d_{i,j}^cpu represents the demand of container j of request i for computing resources;
G_m represents the revenue per unit of memory resource;
d_{i,j}^mem represents the demand of container j of request i for memory resources;
G_s represents the revenue per unit of storage resource;
d_{i,j}^sto represents the demand of container j of request i for storage resources.
Equation (1.3) defines the energy expense incurred by the underlying physical network. Considering that energy expense accounts for a large part of a service provider's daily operating expense, our optimization model treats only the energy expense as the operator expense. E_k^max is the maximum energy consumption value of physical node k, and E_k^idle is the idle (minimum) energy consumption value of physical node k. Since energy consumption is positively correlated with resource utilization, we represent the load-dependent energy consumption of physical node k by the product of (E_k^max − E_k^idle) and the computing resource occupancy rate; since energy is also consumed when a physical node is idle, the idle energy consumption value u_k·E_k^idle is added. Finally, the sum of the two is multiplied by the unit energy expense coefficient C to express the total energy expense of the service provider.
Total energy expense = C·Σ_{k∈N} [ (E_k^max − E_k^idle)·( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^cpu ) / R_k^cpu + u_k·E_k^idle ]   (1.3)

wherein N represents the set of physical nodes;
E_k^max represents the maximum energy consumption value of physical node k;
E_k^idle represents the idle energy consumption value of physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i;
x_{i,j}^k represents a binary flag bit, x_{i,j}^k = 1 indicating that container j of request i is deployed on physical node k;
d_{i,j}^cpu represents the demand of container j of request i for computing resources;
R_k^cpu represents the total amount of computing resources of physical node k;
u_k represents a binary flag bit, u_k = 1 indicating that physical node k is in an active state;
C represents the unit energy expense coefficient.
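Under the reconstruction above, the objective is straightforward to evaluate for a concrete placement. The NumPy sketch below computes (1.1) = (1.2) − (1.3); the tensor layout and the default coefficient values are illustrative assumptions, not values from the patent.

```python
import numpy as np

def objective(x, d_cpu, d_mem, d_sto, R_cpu, E_max, E_idle,
              G_c=1.0, G_m=1.0, G_s=1.0, C=1.0):
    """Evaluate objective (1.1) = (1.2) - (1.3) for a placement.
    x:     binary placement tensor, x[i, j, k] = x_{i,j}^k
    d_*:   per-container demands d_{i,j}^{cpu|mem|sto}, shape (I, J)
    R_cpu: per-node compute capacity R_k^cpu, shape (K,)
    E_max, E_idle: per-node max / idle energy, shape (K,)"""
    cpu_per_node = np.einsum('ijk,ij->k', x, d_cpu)   # compute load on each node k
    eta = cpu_per_node / R_cpu                        # utilization eta_{k,c}, eq. (1.4)
    u = (x.sum(axis=(0, 1)) > 0).astype(float)        # u_k: node k hosts a container
    revenue = (G_c * ((1.0 - eta) * cpu_per_node).sum()
               + G_m * np.einsum('ijk,ij->', x, d_mem)
               + G_s * np.einsum('ijk,ij->', x, d_sto))          # eq. (1.2)
    energy = C * ((E_max - E_idle) * eta + u * E_idle).sum()     # eq. (1.3)
    return revenue - energy                                      # eq. (1.1)
```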
The optimization model is subject to several constraints. Constraint (1.4) defines the computing resource utilization η_{k,c} on physical node k and limits its value range to [0, 1]:

η_{k,c} = ( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^cpu ) / R_k^cpu,  0 ≤ η_{k,c} ≤ 1,  ∀k∈N   (1.4)

wherein η_{k,c} represents the utilization of computing resources on physical node k;
I represents the set of service requests;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit, x_{i,j}^k = 1 indicating that container j of request i is deployed on physical node k;
d_{i,j}^cpu represents the demand of container j of request i for computing resources;
R_k^cpu represents the total amount of computing resources of physical node k.
Constraint (1.5) specifies that the j-th container of the i-th service request can only be deployed on one physical node and cannot be deployed repeatedly:

Σ_{k∈N} x_{i,j}^k = 1,  ∀i∈I, ∀j∈V_i   (1.5)

wherein N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit, x_{i,j}^k = 1 indicating that container j of request i is deployed on physical node k;
I represents the set of service requests;
V_i represents the set of containers of service request i.
Constraint (1.6) specifies that, for two containers m and n of service request i deployed on physical nodes k_u and k_v respectively, the bandwidth resources occupied by their communication must not exceed the total amount of bandwidth resources between k_u and k_v:

Σ_{i∈I} Σ_{m,n∈V_i} x_{i,m}^{k_u}·x_{i,n}^{k_v}·b_i^{m,n} ≤ B^{k_u,k_v},  ∀k_u,k_v∈N   (1.6)

wherein I represents the set of service requests;
V_i represents the set of containers of service request i;
b_i^{m,n} represents the bandwidth requirement between container m and container n of request i;
x_{i,m}^{k_u} represents a binary flag bit, x_{i,m}^{k_u} = 1 indicating that container m of request i is deployed on physical node k_u;
x_{i,n}^{k_v} represents a binary flag bit, x_{i,n}^{k_v} = 1 indicating that container n of request i is deployed on physical node k_v;
B^{k_u,k_v} represents the total amount of bandwidth resources between physical nodes k_u and k_v.
Constraints (1.7), (1.8) and (1.9) respectively limit the total computing, memory, and storage resources of all containers placed on a physical node to not exceed that node's total amounts:

Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^cpu ≤ R_k^cpu,  ∀k∈N   (1.7)
Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^mem ≤ R_k^mem,  ∀k∈N   (1.8)
Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^sto ≤ R_k^sto,  ∀k∈N   (1.9)

wherein I represents the set of service requests;
N represents the set of physical nodes;
x_{i,j}^k represents a binary flag bit, x_{i,j}^k = 1 indicating that container j of request i is deployed on physical node k;
d_{i,j}^cpu, d_{i,j}^mem and d_{i,j}^sto represent the demand of container j of request i for computing, memory and storage resources respectively;
R_k^cpu, R_k^mem and R_k^sto represent the total amounts of computing, memory and storage resources of physical node k.
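A deployment is admissible only if it satisfies (1.4)-(1.9) simultaneously; a compact vectorized check, under the same illustrative tensor layout as the objective sketch above, might look like this:

```python
import numpy as np

def feasible(x, d_cpu, d_mem, d_sto, b, R_cpu, R_mem, R_sto, B):
    """Check constraints (1.4)-(1.9). b[i, m, n] is the bandwidth demand
    b_i^{m,n}; B[u, v] is the capacity B^{k_u,k_v} (its diagonal, i.e.
    intra-node traffic, is assumed unconstrained here)."""
    if not (x.sum(axis=2) == 1).all():                      # (1.5) place each container once
        return False
    if (np.einsum('ijk,ij->k', x, d_cpu) > R_cpu).any():    # (1.7); this also keeps
        return False                                        # eta_{k,c} <= 1 as in (1.4)
    if (np.einsum('ijk,ij->k', x, d_mem) > R_mem).any():    # (1.8)
        return False
    if (np.einsum('ijk,ij->k', x, d_sto) > R_sto).any():    # (1.9)
        return False
    bw = np.einsum('imu,inv,imn->uv', x, x, b)              # traffic between node pairs
    np.fill_diagonal(bw, 0.0)                               # ignore same-node traffic
    return bool((bw <= B).all())                            # (1.6)
```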
2. topological relation description based on graph convolution network
The invention adopts the graph convolution network to extract the topological relation of the input container cluster, and uses the extracted characteristics to assist the intelligent agent to provide a more accurate deployment strategy on the premise of not damaging the constraint condition, thereby reducing the container deployment cost and improving the overall benefit of the edge computing service provider.
Assume that the graph of a container cluster configuration is represented by G = (N, E), where N represents the vertices of the graph, i.e., the containers in the container cluster, and E represents the edges, i.e., the links arising from communication between containers in the cluster. The features of the vertices in G form an N × D matrix X, where D is the number of features. The container-to-container relationships are represented by an N × N matrix A, i.e., the adjacency matrix of G. The layer-wise propagation of the graph convolution network is shown in equation (10).
H^{(l+1)} = σ( D̃^{-1/2}·Ã·D̃^{-1/2}·H^{(l)}·W^{(l)} )   (10)

wherein H^{(l+1)} represents the features of layer l+1;
σ(·) represents an activation function;
Ã = A + I_N represents the adjacency matrix of the undirected graph G with self-connections added;
A represents the relationship matrix between the nodes in graph G;
I_N represents the identity matrix of order N;
D̃ represents the degree matrix of Ã, with D̃_ii = Σ_j Ã_ij, and D̃^{-1/2} denotes raising that matrix to the power −1/2;
H^{(l)} represents the features of layer l;
W^{(l)} represents the training parameter matrix of layer l;
X represents the feature matrix formed by the node features of graph G.

In this equation, Ã is the adjacency matrix of the undirected graph G with attached self-connections, where A is the adjacency matrix of G and I_N is the identity matrix. D̃ is the degree matrix of Ã, and W^{(l)} is the training parameter matrix of layer l. σ represents an activation function such as ReLU or Sigmoid (we use ReLU in our model). H^{(l)} represents the features of layer l; for the input layer, H = X.
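Equation (10) translates almost line-for-line into code. The following PyTorch sketch implements a single propagation layer under the definitions above; the ReLU choice follows the text, while dimensions and the random example inputs are illustrative.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One propagation step of equation (10):
    H^(l+1) = ReLU( D~^(-1/2) A~ D~^(-1/2) H^(l) W^(l) )."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w = nn.Linear(in_dim, out_dim, bias=False)      # W^(l)

    def forward(self, h, adj):
        a_hat = adj + torch.eye(adj.size(0))                 # A~ = A + I_N
        d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))  # D~^(-1/2)
        return torch.relu(d_inv_sqrt @ a_hat @ d_inv_sqrt @ self.w(h))

# For the input layer H = X, the N x D feature matrix of the containers.
x = torch.rand(5, 8)                           # N = 5 containers, D = 8 features
adj = torch.bernoulli(torch.full((5, 5), 0.5))
adj = ((adj + adj.T) > 0).float()              # symmetric adjacency of undirected G
h1 = GCNLayer(8, 16)(x, adj)                   # H^(l+1), the layer-(l+1) features
```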
3. Policy gradient based constrained optimization
Assume that the set of container clusters is represented by C and a single container cluster by c (c ∈ C); the policy function for c is expressed as:

π(p|c,θ) = Pr{A_t = p | S_t = c, θ_t = θ}

wherein π(p|c,θ) represents the probability of outputting deployment strategy p for a given input c;
θ represents the training parameters of the model;
Pr represents the probability of outputting the deployment strategy p;
A_t represents the action at time t;
S_t represents the state at time t;
θ_t represents the training parameters at time t.
The policy function expresses that at time t, with input c and parameters θ, the probability of outputting deployment strategy p is Pr. The policy assigns a higher probability to a high-revenue deployment strategy p and a lower probability to a low-revenue one. The interaction of input container clusters with output policies within a period T generates a trajectory of a Markov decision process, τ = (c_1, p_1, ..., c_T, p_T), whose probability can be expressed as:

P_θ(c_1, p_1, ..., c_T, p_T) = p(c_1)·Π_{t=1}^{T} π_θ(p_t|c_t)·p(c_{t+1}|c_t, p_t)

wherein P_θ(c_1, p_1, ..., c_T, p_T) represents the probability that trajectory τ = (c_1, p_1, ..., c_T, p_T) occurs under parameters θ;
p(c_1) represents the probability that state c_1 occurs (i.e., that the input at time t = 1 is c_1);
T represents the length of the period;
π_θ(p_t|c_t) represents the probability that at time t, with current state c_t (the input container cluster) and parameters θ, the agent takes action p_t (the output deployment policy);
p(c_{t+1}|c_t, p_t) represents the probability that, given state c_t and action p_t at time t, the system state at time t+1 is c_{t+1};
c_t represents the input at time t, with c_1 the input container cluster at time t = 1;
p_t represents the deployment strategy output at time t, with p_1 the deployment strategy at time t = 1.
In the above policy function, for the current input container cluster c_t, the deployment policy p_t depends on the previous deployment positions p_(<t) and the system state. For simplicity, we assume that the system state is fully defined by the container cluster c. The only output of the policy function is the probability indicating the container cluster deployment location. The objective of the policy gradient method is to find an optimal set of parameters θ* that yields an optimal deployment location for the container cluster. To do this, we need to define an objective function describing the quality of a deployment strategy.
J_R(θ|c) = E_{p~π_θ(·|c)}[ R(p) ]

wherein J_R(θ|c) represents the policy quality corresponding to input c;
E[·] represents the expectation;
R(p) represents the service revenue corresponding to deployment strategy p;
p ~ π_θ(·|c) represents all deployment policies p for a given input c.

In the above equation, we use the expected service revenue R(p) of a deployment policy for a given container cluster c as the objective function describing the quality of the deployment policy. Because the agent infers deployment policies over all container clusters, the revenue expectation can then be defined as an expectation over the container probability distribution:
J_R(θ) = E_{c~C}[ J_R(θ|c) ]

wherein J_R(θ) represents the policy quality, i.e., the expected value of revenue;
E[·] represents the expectation;
J_R(θ|c) represents the policy quality corresponding to input c;
c ~ C represents taking the expectation over all container clusters c.
The expected penalty due to violation of the constraints can be expressed in the same way:

J_C(θ) = E_{c~C}[ J_C(θ|c) ]

wherein J_C(θ) represents the expected value of the penalty;
E[·] represents the expectation;
J_C(θ|c) represents the penalty value corresponding to input c;
c ~ C represents taking the expectation over all container clusters c.
Here we define four constraint signals: the computing resource cpu, the memory resource mem, the storage resource sto, and the bandwidth resource bw. The final optimization objective can be transformed into an unconstrained problem by the Lagrangian relaxation technique:

J_L(λ, θ) = J_R(θ) + Σ_i λ_i·J_{C_i}(θ) = J_R(θ) + J_ξ(θ)

wherein J_L(λ, θ) represents the Lagrangian value, calculated as the expected revenue J_R(θ) plus the weighted sum of the expected penalty values corresponding to the resource constraints;
λ represents the weights of the four constraint signals, with λ_i the weight of constraint signal i;
J_{C_i}(θ) represents the expected penalty value of constraint signal i;
J_ξ(θ) represents the weighted sum of the expected penalty values of the four constraint signals.

Next, we calculate the gradient of J_L(λ, θ) using the log-likelihood:
∇_θ J_L(λ, θ) = E_{p~π_θ(·|c)}[ Q(c, p)·∇_θ log π_θ(p|c) ]

wherein ∇_θ represents the gradient operation with respect to θ;
E[·] represents the expectation;
π_θ(p|c) represents the policy function for c;
Q(c, p) represents the reward accrued when the algorithm makes decision p for a given input container cluster c;
p ~ π_θ(·|c) represents all deployment policies p for a given input c.
In the above equation, Q(c, p) describes the reward obtainable for a given input c when the algorithm makes decision p. It is calculated by adding the weighted sum of all constraint dissatisfaction values C(p) to the revenue value R(p), as shown in (18):

Q(c, p) = R(p) + Σ_i λ_i·C_i(p) = R(p) + ξ(p)   (18)

wherein Q(c, p) represents the reward accrued when the algorithm makes decision p for a given input container cluster c;
R(p) represents the reward the system obtains for decision p;
ξ(p) represents the weighted sum of the penalty values of all constraint signals for decision p;
λ_i represents the weight of constraint signal i;
C_i(p) represents the penalty value produced by constraint signal i under decision p.
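Equation (18) is a plain scalar combination, sketched below. How the four λ_i are signed or scheduled is not fixed by the text above, so the values in the example are placeholders; the only requirement is that constraint violations pull Q(c, p) down.

```python
def q_value(R_p, penalties, lambdas):
    """Sketch of equation (18): Q(c, p) = R(p) + sum_i lambda_i * C_i(p).
    `penalties` holds the violation signals C_i(p) for cpu, mem, sto, bw.
    Assumed convention: weights are negative so violations lower the reward."""
    xi_p = sum(lam * c for lam, c in zip(lambdas, penalties))  # xi(p)
    return R_p + xi_p

# Illustrative use: a placement earning revenue 10.0 that violates only the
# bandwidth constraint by 2 units, with placeholder violation weights of -0.5.
q = q_value(10.0, penalties=[0.0, 0.0, 0.0, 2.0], lambdas=[-0.5] * 4)  # -> 9.0
```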
Then, we approximate the Lagrangian gradient using Monte Carlo sampling:

∇_θ J_L(λ, θ) ≈ (1/m)·Σ_{i=1}^{m} Q(c, p_i)·∇_θ log π_θ(p_i|c)

where m is the number of samples. To reduce the variance of the gradient and accelerate the convergence of the model, a critic network is used as a benchmark evaluator b, composed of a simple RNN network. The Lagrangian gradient can then be expressed as:

∇_θ J_L(λ, θ) ≈ (1/m)·Σ_{i=1}^{m} ( Q(c, p_i) − b(c, p_i) )·∇_θ log π_θ(p_i|c)

wherein ∇_θ J_L(λ, θ) represents the Lagrangian gradient;
m represents the number of samples;
Q(c, p_i) represents the reward obtained when the algorithm makes decision p_i for a given input container cluster c;
b(c, p_i) represents the evaluation value given by the benchmark evaluator b for a given input container cluster c and decision p_i;
∇_θ log π_θ(p_i|c) represents the gradient of the logarithm of the policy function.
Finally, the parameters θ of the network model are updated by stochastic gradient descent:

θ_{k+1} = θ_k + α·∇_θ J_L(λ, θ_k)

wherein θ_{k+1} represents the model parameters at the next time instant;
θ_k represents the model parameters at the current time;
α represents the learning rate;
∇_θ J_L(λ, θ_k) represents the Lagrangian gradient approximated using Monte Carlo sampling.
The benchmark evaluator gives an evaluation b(c, p) of the current container cluster's return; the parameters σ of the benchmark evaluator are then updated with stochastic gradient descent based on the mean square error between b(c, p) and the reward value Q(c, p):

L(σ) = (1/m)·Σ_{i=1}^{m} ‖ b(c, p_i) − Q(c, p_i) ‖²

wherein L(σ) represents the mean square error between the evaluation value b(c, p) given by the benchmark evaluator and the reward value Q(c, p);
m represents the number of samples;
Q(c, p_i) represents the reward obtained when the algorithm makes decision p_i for a given input container cluster c;
b(c, p_i) represents the evaluation value given by the benchmark evaluator b for a given input container cluster c and decision p_i.
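The two updates, the ascent step on θ with the baseline-corrected Monte Carlo gradient and the mean-square-error fit of the benchmark evaluator, can be sketched with standard autograd as follows (the optimizer objects and batching convention are our assumptions):

```python
import torch

def update_actor_critic(log_probs, q_values, baselines, actor_opt, critic_opt):
    """One update step. log_probs: log pi_theta(p_i|c) for m sampled placements;
    q_values: Q(c, p_i); baselines: b(c, p_i) from the critic (with grad)."""
    advantage = (q_values - baselines).detach()          # Q(c,p_i) - b(c,p_i)
    actor_loss = -(advantage * log_probs).mean()         # minus: the optimizer descends,
    actor_opt.zero_grad()                                # so theta ascends J_L
    actor_loss.backward()
    actor_opt.step()                                     # theta_{k+1} = theta_k + alpha*grad

    critic_loss = ((baselines - q_values.detach()) ** 2).mean()  # L(sigma): MSE of b vs Q
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()                                    # updates critic parameters sigma
```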
The training process of the container cluster deployment algorithm based on the graph convolution network and neural combinatorial optimization can be described as Table 1:
Table 1. Training process of the container cluster deployment algorithm based on the graph convolution network and neural combinatorial optimization
(The algorithm listing of Table 1 appears only as an image in the original publication.)
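Since the Table 1 listing survives only as an image, the following loop is a hedged reconstruction of the training procedure from the description above, reusing `update_actor_critic` from the previous sketch: sample m placements per cluster with the actor, score them with the Lagrangian reward Q of equation (18), baseline them with the critic, and apply the two gradient steps. `env.evaluate` and all other names are assumptions.

```python
import torch

def train(actor, critic, env, clusters, epochs, actor_opt, critic_opt, m_samples=8):
    """Hedged reconstruction of the Table 1 training loop.
    env.evaluate(c, p) is assumed to return Q(c, p) of equation (18)."""
    for _ in range(epochs):
        for c in clusters:                                  # container cluster c in C
            log_probs, q_vals, baselines = [], [], []
            for _ in range(m_samples):                      # m Monte Carlo samples
                p, logp = actor(c.features, c.adjacency)    # placement p_i ~ pi_theta
                q_vals.append(env.evaluate(c, p))           # reward Q(c, p_i)
                baselines.append(critic(c, p))              # baseline b(c, p_i)
                log_probs.append(logp)
            update_actor_critic(torch.stack(log_probs),
                                torch.tensor(q_vals),
                                torch.stack(baselines),
                                actor_opt, critic_opt)
```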
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (9)

1. A container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing, characterized by comprising the following steps:
S1, a graph convolution network extracts the topological association relations existing between containers;
S2, a sequence-to-sequence network, assisted by the graph convolution network, infers a deployment strategy.
2. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 1, characterized in that the layer-wise propagation of the graph convolution network in step S1 is:

H^{(l+1)} = σ( D̃^{-1/2}·Ã·D̃^{-1/2}·H^{(l)}·W^{(l)} )

wherein H^{(l+1)} represents the features of layer l+1; σ(·) represents an activation function; Ã = A + I_N represents the adjacency matrix of the undirected graph G with self-connections added, A being the relationship matrix between the nodes in graph G; D̃ represents the degree matrix of Ã, and D̃^{-1/2} denotes raising that matrix to the power −1/2; H^{(l)} represents the features of layer l; W^{(l)} represents the training parameter matrix of layer l.
3. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 1, characterized in that the deployment strategy in step S2 is:

π(p|c,θ) = Pr{A_t = p | S_t = c, θ_t = θ}

wherein π(p|c,θ) represents the probability of outputting deployment strategy p for a given input c; θ represents the training parameters of the model; Pr represents the probability of outputting the deployment strategy p; A_t represents the action at time t; S_t represents the state at time t; θ_t represents the training parameters at time t.
4. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 1, characterized in that after step S1 there is further included a step S3: the critic network evaluates the return obtained after the actor network acts.
5. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 1, characterized in that after step S1 there is further included a step S4: the actor network updates the optimized model parameters according to the output of the critic module.
6. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 5, characterized in that the optimization model is:

max(total revenue − total energy expense)   (1.1)

Total revenue = Σ_{k∈N} Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·[ G_c·(1 − η_{k,c})·d_{i,j}^cpu + G_m·d_{i,j}^mem + G_s·d_{i,j}^sto ]   (1.2)

wherein N represents the set of physical nodes; G_c represents the revenue per unit of computing resource; η_{k,c} represents the utilization of computing resources on physical node k; I represents the set of service requests; V_i represents the set of containers of service request i; x_{i,j}^k represents a binary flag bit, x_{i,j}^k = 1 indicating that container j of request i is deployed on physical node k; d_{i,j}^cpu represents the demand of container j of request i for computing resources; G_m represents the revenue per unit of memory resource; d_{i,j}^mem represents the demand of container j of request i for memory resources; G_s represents the revenue per unit of storage resource; d_{i,j}^sto represents the demand of container j of request i for storage resources;

Total energy expense = C·Σ_{k∈N} [ (E_k^max − E_k^idle)·( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^cpu ) / R_k^cpu + u_k·E_k^idle ]   (1.3)

wherein E_k^max represents the maximum energy consumption value of physical node k; E_k^idle represents the idle energy consumption value of physical node k; R_k^cpu represents the total amount of computing resources of physical node k; u_k represents a binary flag bit, u_k = 1 indicating that physical node k is in an active state; C represents the unit energy expense coefficient;

or, min(total energy expense), with the total energy expense given by equation (1.3) above and the symbols defined as before.
7. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 6, characterized in that the constraint conditions of the optimization model are:

η_{k,c} = ( Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^cpu ) / R_k^cpu,  0 ≤ η_{k,c} ≤ 1,  ∀k∈N   (1.4)

wherein η_{k,c} represents the utilization of computing resources on physical node k; I represents the set of service requests; N represents the set of physical nodes; x_{i,j}^k represents a binary flag bit, x_{i,j}^k = 1 indicating that container j of request i is deployed on physical node k; d_{i,j}^cpu represents the demand of container j of request i for computing resources; R_k^cpu represents the total amount of computing resources of physical node k;

Σ_{k∈N} x_{i,j}^k = 1,  ∀i∈I, ∀j∈V_i   (1.5)

wherein V_i represents the set of containers of service request i;

Σ_{i∈I} Σ_{m,n∈V_i} x_{i,m}^{k_u}·x_{i,n}^{k_v}·b_i^{m,n} ≤ B^{k_u,k_v},  ∀k_u,k_v∈N   (1.6)

wherein b_i^{m,n} represents the bandwidth requirement between container m and container n of request i; x_{i,m}^{k_u} represents a binary flag bit, x_{i,m}^{k_u} = 1 indicating that container m of request i is deployed on physical node k_u; x_{i,n}^{k_v} represents a binary flag bit, x_{i,n}^{k_v} = 1 indicating that container n of request i is deployed on physical node k_v; B^{k_u,k_v} represents the total amount of bandwidth resources between physical nodes k_u and k_v;

Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^cpu ≤ R_k^cpu,  ∀k∈N   (1.7)
Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^mem ≤ R_k^mem,  ∀k∈N   (1.8)
Σ_{i∈I} Σ_{j∈V_i} x_{i,j}^k·d_{i,j}^sto ≤ R_k^sto,  ∀k∈N   (1.9)

wherein d_{i,j}^mem represents the demand of container j of request i for memory resources; R_k^mem represents the total amount of memory resources of physical node k; d_{i,j}^sto represents the demand of container j of request i for storage resources; R_k^sto represents the total amount of storage resources of physical node k.
8. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 5, characterized in that the model is updated as:

θ_{k+1} = θ_k + α·∇_θ J_L(λ, θ_k)

wherein θ_{k+1} represents the model parameters at the next time instant; θ_k represents the model parameters at the current time; α represents the learning rate; ∇_θ J_L(λ, θ_k) represents the Lagrangian gradient approximated using Monte Carlo sampling.
9. The container cluster online deployment method fusing a graph neural network and reinforcement learning in edge computing according to claim 8, characterized in that the model updating further comprises:

L(σ) = (1/m)·Σ_{i=1}^{m} ‖ b(c, p_i) − Q(c, p_i) ‖²

wherein L(σ) represents the mean square error between the evaluation value b(c, p) given by the benchmark evaluator and the reward value Q(c, p); m represents the number of samples; Q(c, p_i) represents the reward obtained when the algorithm makes decision p_i for a given input container cluster c; b(c, p_i) represents the evaluation value given by the benchmark evaluator b for a given input container cluster c and decision p_i.
CN202211347967.8A 2022-10-31 2022-10-31 Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation Active CN115686846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211347967.8A CN115686846B (en) 2022-10-31 2022-10-31 Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211347967.8A CN115686846B (en) 2022-10-31 2022-10-31 Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation

Publications (2)

Publication Number Publication Date
CN115686846A 2023-02-03
CN115686846B CN115686846B (en) 2023-05-02

Family

ID=85045641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211347967.8A Active CN115686846B (en) 2022-10-31 2022-10-31 Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation

Country Status (1)

Country Link
CN (1) CN115686846B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116069512A (en) * 2023-03-23 2023-05-05 之江实验室 Serverless efficient resource allocation method and system based on reinforcement learning
CN117149443A (en) * 2023-10-30 2023-12-01 江西师范大学 Edge computing service deployment method based on neural network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008819A (en) * 2019-01-30 2019-07-12 武汉科技大学 A kind of facial expression recognizing method based on figure convolutional neural networks
US20220343143A1 (en) * 2019-09-11 2022-10-27 Siemens Aktiengesellschaft Method for generating an adapted task graph
CN112631717A (en) * 2020-12-21 2021-04-09 重庆大学 Network service function chain dynamic deployment system and method based on asynchronous reinforcement learning
CN112711475A (en) * 2021-01-20 2021-04-27 上海交通大学 Workflow scheduling method and system based on graph convolution neural network
US20220124543A1 (en) * 2021-06-30 2022-04-21 Oner Orhan Graph neural network and reinforcement learning techniques for connection management
CN113568675A (en) * 2021-07-08 2021-10-29 广东利通科技投资有限公司 Internet of vehicles edge calculation task unloading method based on layered reinforcement learning
CN113778648A (en) * 2021-08-31 2021-12-10 重庆理工大学 Task scheduling method based on deep reinforcement learning in hierarchical edge computing environment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116069512A (en) * 2023-03-23 2023-05-05 之江实验室 Serverless efficient resource allocation method and system based on reinforcement learning
CN116069512B (en) * 2023-03-23 2023-08-04 之江实验室 Serverless efficient resource allocation method and system based on reinforcement learning
CN117149443A (en) * 2023-10-30 2023-12-01 江西师范大学 Edge computing service deployment method based on neural network
CN117149443B (en) * 2023-10-30 2024-01-26 江西师范大学 Edge computing service deployment method based on neural network

Also Published As

Publication number Publication date
CN115686846B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN115686846B (en) Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation
CN109818786B (en) Method for optimally selecting distributed multi-resource combined path capable of sensing application of cloud data center
Babazadeh et al. Application of particle swarm optimization to transportation network design problem
Djigal et al. Machine and deep learning for resource allocation in multi-access edge computing: A survey
CN113705610A (en) Heterogeneous model aggregation method and system based on federal learning
Yuan et al. A Q-learning-based approach for virtual network embedding in data center
CN113098714A (en) Low-delay network slicing method based on deep reinforcement learning
CN112990485A (en) Knowledge strategy selection method and device based on reinforcement learning
Rkhami et al. On the use of graph neural networks for virtual network embedding
Bi et al. Green energy forecast-based bi-objective scheduling of tasks across distributed clouds
CN108170531A (en) A kind of cloud data center request stream scheduling method based on depth belief network
Xu et al. Living with artificial intelligence: A paradigm shift toward future network traffic control
Qin et al. Dynamic IoT service placement based on shared parallel architecture in fog-cloud computing
Zhang et al. Offloading demand prediction-driven latency-aware resource reservation in edge networks
CN113543160A (en) 5G slice resource allocation method and device, computing equipment and computer storage medium
CN115001978B (en) Cloud tenant virtual network intelligent mapping method based on reinforcement learning model
CN115499511A (en) Micro-service active scaling method based on space-time diagram neural network load prediction
CN115883371A (en) Virtual network function placement method based on learning optimization method in edge-cloud collaborative system
CN113783726B (en) SLA-oriented resource self-adaptive customization method for edge cloud system
CN112906745B (en) Integrity intelligent network training method based on edge cooperation
CN115630979A (en) Day-ahead electricity price prediction method and device, storage medium and computer equipment
Tsang et al. On-Chain and Off-Chain Data Management for Blockchain-Internet of Things: A Multi-Agent Deep Reinforcement Learning Approach
Bhargavi et al. Uncertainty aware resource provisioning framework for cloud using expected 3-SARSA learning agent: NSS and FNSS based approach
Liu et al. Towards Multi-Task Generative-AI Edge Services with an Attention-based Diffusion DRL Approach
CN117648174B (en) Cloud computing heterogeneous task scheduling and container management method based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant