CN115361387A

CN115361387A - Joint control method and device for adjusting number of container instances and distributing user requests

Info

Publication number: CN115361387A
Application number: CN202210990289.0A
Authority: CN
Inventors: 马骁; 李元哲; 周傲; 徐梦炜; 孙其博; 王尚广
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2022-08-18
Filing date: 2022-08-18
Publication date: 2022-11-18

Abstract

A joint control method and device for adjusting the number of container instances and distributing user requests, implemented with a trained deep reinforcement learning model. The model takes as input the number of container instances that can currently be started, the number of container instances running in the current time slot, and the user request state information of the current time slot, and outputs the increase and decrease information of the number of container instances required in the current time slot and the user request distribution information of the current time slot. Based on the model output, the number of container instances in the cross-domain edge network is increased or decreased, and user requests are distributed to the corresponding edge clouds. The model is trained by solving a preset problem model, which is obtained by Markov decision modeling with the goal of minimizing the average user request cost per time slot of the cross-domain edge network. Dynamic management of computing resources in the cross-domain edge network is thereby achieved with the user request as the scheduling granularity.

Description

Joint control method and device for adjusting number of container instances and distributing user requests

Technical Field

The present application relates to the field of edge computing in computer networks, and in particular, to a method and an apparatus for joint control of container instance number adjustment and user request distribution.

Background

In edge computing over a cross-domain edge network, the service quality of large-scale distributed artificial intelligence applications across the whole network can be guaranteed only if computing resources are supplied dynamically. Specifically, the cross-domain edge network comprises a plurality of distributed edge clouds hosting artificial intelligence applications, container instances are dynamically deployed in each edge cloud, and each container instance can process the user requests of a plurality of users. The cross-domain edge network manages the computing resources for user requests in a containerized manner: after an offloading decision for the user requests is obtained based on the computing resources they currently require, the number of container instances in the relevant edge clouds is adjusted according to that decision, and each user request is offloaded to the corresponding container instance in an edge cloud for processing.

In the process of managing the computing resources, when the computing resources required by the user requests in the cross-domain edge network increase, the corresponding edge cloud should increase the number of container instances processing the user requests, so that sufficient computing resources are supplied to the cross-domain edge network and user requests do not queue at the network-side server and incur excessive delay; when the computing resources required by the user requests decrease, the corresponding edge cloud needs to reduce the number of running container instances, so that computing resources in the cross-domain edge network are saved and their utilization is improved.

However, dynamically adjusting the supply of computing resources alone is limited in its ability to guarantee the service quality of large-scale distributed artificial intelligence applications in the cross-domain edge network. On the one hand, starting or terminating a container instance in an edge cloud incurs a time cost: the process may last from several seconds to several tens of seconds, so the adjustment of computing resources in the cross-domain edge network lags and cannot respond immediately. On the other hand, because each container instance can handle multiple user requests, this scheduling granularity does not allow management at the user request level. Therefore, adjusting the supply of computing resources across the cross-domain edge network by changing the number of container instances is currently suitable only for coarse-grained, low-frequency management.

In order to solve the above problem, it can be considered to migrate the traditional user-level computational resource scheduling method based on computer network architecture into the cross-domain edge network for implementation. However, different edge clouds in the cross-domain edge network are deployed in a distributed manner and are in a cross-domain isolated state, so that the user request cannot be directly scheduled by adopting a traditional method. In addition, when the cross-domain edge network is based on a mobile communication network, because of a user plane bearer mechanism, an Internet Protocol (IP) packet of a user is encapsulated in a General Packet Radio Service (GPRS) tunneling protocol (GTP-U) tunnel of a user plane, and an IP address in the packet cannot affect a route of the tunnel, a Domain Name System (DNS) mechanism of a conventional method cannot be directly applied to schedule a user request.

Therefore, how to realize the management of computational resources by using the user request as the scheduling granularity in the cross-domain edge network becomes an urgent problem to be solved.

Disclosure of Invention

In view of this, the embodiments of the present application provide a joint control method for adjusting the number of container instances and offloading user requests, where the method can use a user request as a scheduling granularity to implement dynamic management of computing resources in a cross-domain edge network.

The embodiment of the application also provides a combined control device for adjusting the number of the container instances and distributing the user requests, and the device can realize dynamic management of computing resources in a cross-domain edge network by taking the user requests as scheduling granularity.

The embodiment of the application is realized as follows:

one embodiment of the present application provides a method for jointly controlling the number adjustment of container instances and the offloading of user requests, where the method includes:

aiming at minimizing the average user request cost of a unit time slot of a cross-domain edge network, carrying out problem modeling by adopting a Markov decision mode, wherein the state in the problem model is set based on the number of container instances which can be started by the system, the number of container instances operated at the current time slot and the user request state information of the current time slot; actions in the problem model are set based on container instance control information of each edge cloud of the current time slot and flow distribution information of a user request of the current time slot; an instant reward in the problem model is set based on a policy that each user request of a current time slot costs the lowest;

solving the problem model by adopting a deep reinforcement learning model to train the deep reinforcement learning model;

and inputting the number of container instances which can be started currently, the number of container instances operated at the current time slot and the user request state information of the current time slot in the network into the deep reinforcement learning model obtained by training, and outputting increase and decrease information of the number of container instances required by the current time slot and user request shunt information of the current time slot.

In the above embodiment, the problem of minimizing the average user request cost per time slot of the network comprises:

in the network, an upper limit value of the number of container instances in each edge cloud is set, the number of container instances in the edge cloud is dynamically adjusted in each time slot, and user request distribution information is calculated, so that the cost spent for each user request in each time slot is the lowest on average.

In the above embodiment, the setting of the state in the problem model based on the number of container instances that the system can start, the number of container instances that the system runs in the current time slot, and the user request state information of the current time slot includes:

the upper limit (M_i) of the number of container instances that the network can start;

the number of container instances running in the current time slot;

the number of user requests that are in service and not yet finished in the current time slot;

and the number of user requests received by each edge cloud in the current time slot.

In the above embodiment, the setting of the action in the problem model based on the container instance control information of each edge cloud of the current time slot and the offloading information requested by the user of the current time slot includes:

each action (a_i) in the action space (A) of the problem model contains the container instance control information of each edge cloud of the current time slot and the distribution information of the user requests of the current time slot;

the container instance control information of each edge cloud comprises information for adding one container instance, information for removing one container instance, and information for keeping the number of container instances unchanged;

for the network having a first number n_e of edge clouds and a second number n_q of user requests, the action a_i is expressed as a vector consisting of the container instance control information of the n_e edge clouds and the distribution information of the n_q user requests;

the action (a_i) is implemented as follows: a continuous action (a'_i) is discretized to obtain the discretized action (a_i), wherein the first n_e components of the discretized action (a_i) represent the container instance control information of the respective edge clouds, and the following n_q components represent information on whether the corresponding user request is shunted to an edge cloud.

In the above embodiment, the instant reward in the problem model, set based on the policy that each user request of the current time slot costs the lowest, includes:

the instant reward is defined as w^t - c^t, wherein

w^t represents the reward value brought by the user requests whose delay satisfaction rate in the current time slot (t) reaches the set delay satisfaction rate threshold; w₀ is the reward value for a single user request that meets the set delay satisfaction rate threshold; and c^t is the total cost of the system for the current time slot, comprising the running cost of the computing resources occupied from the previous time slot to the current time slot, the container instance switching cost, and the user request penalty cost.
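The instant reward w^t - c^t can be illustrated with a minimal sketch. The exact expression for w^t did not survive in this text, so the sketch assumes that w^t equals w₀ multiplied by the number of user requests that meet the delay threshold in the slot; that form, the function name `instant_reward`, and all parameter names are assumptions for illustration only.

```python
def instant_reward(
    n_satisfied: int,
    w0: float,
    running_cost: float,
    switch_cost: float,
    penalty_cost: float,
) -> float:
    """Return w^t - c^t for one time slot.

    Assumption (not confirmed by the source): w^t = w0 * n_satisfied, where
    n_satisfied counts the user requests that met the delay threshold.
    c^t sums the three cost components named in the text.
    """
    if n_satisfied < 0:
        raise ValueError("n_satisfied must be non-negative")
    w_t = w0 * n_satisfied                            # reward from satisfied requests
    c_t = running_cost + switch_cost + penalty_cost   # total system cost of the slot
    return w_t - c_t
```

With four satisfied requests, w₀ = 2.0 and costs of 3.0, 1.0 and 0.5, the sketch gives 8.0 - 4.5 = 3.5.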

In the above embodiment, the deep reinforcement learning model is implemented by using the advantage actor-critic (A2C) algorithm.

In the above embodiment, the method further comprises:

in the next time slot, the core network of the network increases and decreases the number of corresponding container instances in each edge cloud in the network according to the increase and decrease information of the number of container instances required by the current time slot;

and in the next time slot, the core network of the network establishes a protocol data unit session to the corresponding user, and distributes the user request to the corresponding edge cloud according to the user request distribution information of the current time slot.

In another embodiment of the present application, a combined control device for adjusting the number of container instances and splitting user requests is provided, which includes: a problem modeling unit, a model training unit and a model application unit, wherein,

the problem modeling unit is used for performing problem modeling by adopting a Markov decision mode with the aim of minimizing the average user request cost of a unit time slot of a cross-domain edge network, wherein the state in the problem model is set based on the number of container instances which can be started by the system, the number of container instances operated at the current time slot and the user request state information of the current time slot; actions in the problem model are set based on container instance control information of each edge cloud of the current time slot and flow distribution information of a user request of the current time slot; an instant reward in the problem model is set based on a policy that each user request of the current time slot costs the lowest;

the model training unit is used for solving the problem model by adopting a deep reinforcement learning model so as to train the deep reinforcement learning model until the instant reward in the problem model reaches a set evaluation value;

and the model application unit is used for inputting the number of container instances which can be started currently, the number of container instances operated at the current time slot and the user request state information of the current time slot in the network into the deep reinforcement learning model obtained by training and outputting the increase and decrease information of the number of container instances required by the current time slot and the user request shunt information of the current time slot.

In the above apparatus, further comprising:

the core network of the network increases and decreases the number of corresponding container instances in each edge cloud in the network according to the increase and decrease information of the number of container instances required by the current time slot in the next time slot; and in the next time slot, the core network of the network establishes a protocol data unit session to the corresponding user, and distributes the user request to the corresponding edge cloud according to the user request distribution information of the current time slot.

In another embodiment of the present application, there is provided an electronic device including:

a processor;

a memory storing a program configured to implement the joint control method of container instance number adjustment and user request splitting described above when executed by the processor.

As can be seen from the above, when the cross-domain edge network according to the embodiment of the present application manages computing resources, a trained deep reinforcement learning model is used to complete the management, where the deep reinforcement learning model takes the number of container instances that can be currently started, the number of container instances that are currently operated in a current time slot, and user request state information of the current time slot as input, and outputs increase and decrease information of the number of container instances required by the current time slot and user request split information of the current time slot. And on the basis of the output result of the deep reinforcement learning model, increasing and decreasing the number of container instances in the cross-domain edge network and controlling the process of shunting the user request to the corresponding edge cloud. The training of the deep reinforcement learning model is realized by solving based on a preset problem model, and the problem model is obtained by performing problem modeling in a Markov decision mode with the aim of minimizing the average user request cost in unit time slot of a cross-domain edge network. Therefore, the embodiment of the application takes the user request as the scheduling granularity, and realizes the dynamic management of the computing resources in the cross-domain edge network.

Drawings

Fig. 1 is a flowchart of a method for jointly controlling the number adjustment of container instances and the user request offloading according to an embodiment of the present disclosure;

fig. 2 is a schematic diagram of a structure in which a cross-domain edge network manages computing resources with the user request as the scheduling granularity, according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of a joint control device for adjusting the number of container instances and splitting user requests according to an embodiment of the present application;

fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.

When performing computational resource management on user requests by a cross-domain edge network, the user requests of each user need to be finely scheduled, and computational resource supply on the edge side needs to be dynamically deployed. If only the number of container instances on the edge side is adjusted, there is a waste of resources due to coarse grain management and a slow response due to delay of the container switching process. And because the mode of session management based on a GTP-U tunnel in a communication network is different from the mode of routing based on the IP address of a user packet in the traditional computer network, the traditional user-level resource scheduling method based on the computer network architecture cannot be directly transplanted to a cross-domain edge network for realization.

In order to overcome the above problems, in the embodiment of the present application, when the cross-domain edge network manages computing resources, a trained deep reinforcement learning model is used to complete the management, where the deep reinforcement learning model takes the number of container instances that can be currently started, the number of container instances that operate in a current time slot, and user request state information of the current time slot as inputs, and outputs increase and decrease information of the number of container instances required by the current time slot and user request shunt information of the current time slot. And on the basis of the output result of the deep reinforcement learning model, increasing and decreasing the number of container instances in the cross-domain edge network and controlling the process of shunting the user request to the corresponding edge cloud. The training of the deep reinforcement learning model is realized by solving based on a preset problem model, and the problem model is obtained by performing problem modeling in a Markov decision mode by taking the minimum average user request cost of a unit time slot of a cross-domain edge network as a target.

Therefore, the number of container instances started in each edge cloud can be adjusted, the supply amount of computing resources is dynamically adjusted to adapt to the change of the user request demand, and meanwhile, the cross-domain edge network dynamically unloads the user request to the corresponding edge cloud, so that the number of container instances is prevented from being frequently adjusted, and the service quality of a user side is guaranteed.

Therefore, the embodiment of the application takes the user request as the scheduling granularity, and realizes the dynamic management of the computing resources in the cross-domain edge network.

In the embodiment of the application, given the complex computing resources in the cross-domain edge network, the offloading of user requests in the cross-domain edge network and the adjustment of the number of container instances on each edge cloud are scheduled jointly, and the operation of the cross-domain edge network needs to be guaranteed at the lowest cost. The embodiment of the application uses a trained deep reinforcement learning model to manage the computing resources of the cross-domain edge network and can thereby minimize the average user request cost per time slot in the cross-domain edge network.

Fig. 1 is a flowchart of a joint control method for adjusting the number of container instances and offloading user requests according to an embodiment of the present application, which includes the following specific steps:

step 101, aiming at the minimum average user request cost of a unit time slot of a cross-domain edge network, performing problem modeling by adopting a Markov decision mode, wherein the state in the problem model is set based on the number of container instances which can be started by the system, the number of container instances operated at the current time slot and the user request state information of the current time slot; actions in the problem model are set based on container instance control information of each edge cloud of the current time slot and the shunting information of the user request of the current time slot; an instant reward in the problem model is set based on a policy that each user of the current time slot requests the lowest cost;

step 102, solving the problem model by adopting a deep reinforcement learning model so as to train the deep reinforcement learning model;

step 103, inputting the number of container instances which can be started currently, the number of container instances operated at the current time slot and the user request state information of the current time slot in the network into the deep reinforcement learning model obtained by training, and outputting increase and decrease information of the number of container instances required by the current time slot and user request shunt information of the current time slot.

In the above method, the edge cloud includes at least one edge node, each of the edge nodes may include a plurality of container instances, and the number of included container instances is increased, decreased, or unchanged under the control of a core network of the system. Each container instance may process at least one user request.

In the above method, the problem of minimizing an average user request cost per time slot of the network comprises:

in the network, an upper limit is set on the number of container instances in each edge cloud, the number of container instances in each edge cloud is dynamically adjusted in each time slot, and the user request distribution information is calculated, so that the average cost spent on each user request in each time slot is minimized. Specifically, the optimization problem of the average user request cost per time slot can be described quantitatively as follows: given the upper limit M_i on the number of container instances in edge cloud e_i, dynamically adjust the number of running container instances in the edge cloud in each time slot

and calculate the offloading relationship

so that the average cost spent on each user request in each time slot is minimized, wherein

represents the computation offloading relationship: if a user request x is offloaded to edge cloud e_j, then

otherwise

In the embodiment of the application, problem modeling is carried out in a Markov decision mode, with the goal of minimizing the average user request cost per time slot of the cross-domain edge network. Specifically, the user request offloading and dynamic computing resource deployment problems are mapped onto a Markov decision process model. A Markov decision process model M = (S, A, P, R) comprises a finite state set S, a finite action set A, the state transition probability P of the system, and the instant reward R obtained after an action is taken. In this problem, the effect of an action on the network is deterministic, so the modeling focuses on defining the state, the action, and the instant reward.
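Because the effect of an action on the network is deterministic, the container-count part of the state transition can be written as a plain function. The following is a minimal sketch rather than the method of the application: the function name `next_container_counts` and the choice to clip the result to the range [0, M_i] are assumptions.

```python
from typing import List


def next_container_counts(
    running: List[int], deltas: List[int], upper: List[int]
) -> List[int]:
    """Apply one control decision per edge cloud and return the new counts.

    running[i]: container instances running in edge cloud e_i this slot
    deltas[i]:  -1 (remove one), 0 (keep), +1 (add one)
    upper[i]:   upper limit M_i for edge cloud e_i
    Clipping to [0, M_i] is an assumption of this sketch.
    """
    if not (len(running) == len(deltas) == len(upper)):
        raise ValueError("inputs must have one entry per edge cloud")
    result = []
    for count, delta, limit in zip(running, deltas, upper):
        if delta not in (-1, 0, 1):
            raise ValueError("delta must be -1, 0 or +1")
        result.append(max(0, min(limit, count + delta)))
    return result
```

For three edge clouds with M_i = 3 each, counts [0, 2, 3] and decisions [-1, +1, +1] yield [0, 3, 3]: the first and last decisions are absorbed by the bounds.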

Specifically, the state in the problem model, set based on the number of container instances that the system can start, the number of container instances running in the current time slot, and the user request state information of the current time slot, comprises: the upper limit (M_i) of the number of container instances that the network can start; the number of container instances running in the current time slot; the number of user requests that are in service and not yet finished in the current time slot; and the number of user requests received by each edge cloud in the current time slot.
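The four state components can be assembled into a flat vector for the model input. This is a sketch under stated assumptions: the class name `EdgeCloudObservation`, the function `build_state`, the per-edge-cloud layout, and tracking unfinished requests per edge cloud are all hypothetical choices that the text does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeCloudObservation:
    max_containers: int       # upper limit M_i
    running_containers: int   # container instances running in this slot
    unfinished_requests: int  # in service and not finished (assumed per cloud)
    received_requests: int    # user requests received in this slot


def build_state(clouds: List[EdgeCloudObservation]) -> List[float]:
    """Concatenate the four state components of every edge cloud."""
    state: List[float] = []
    for c in clouds:
        if not 0 <= c.running_containers <= c.max_containers:
            raise ValueError("running containers must lie in [0, M_i]")
        state.extend([
            float(c.max_containers),
            float(c.running_containers),
            float(c.unfinished_requests),
            float(c.received_requests),
        ])
    return state
```

With n_e edge clouds the resulting vector has 4 * n_e entries; a real system might normalize these values before feeding them to a neural network.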

The action in the problem model, set based on the container instance control information of each edge cloud of the current time slot and the distribution information of the user requests of the current time slot, comprises: each action (a_i) in the action space (A) of the problem model contains the container instance control information of each edge cloud of the current time slot and the distribution information of the user requests of the current time slot; the container instance control information of each edge cloud comprises information for adding one container instance, information for removing one container instance, and information for keeping the number of container instances unchanged;

for the network having a first number n_e of edge clouds and a second number n_q of user requests, the action a_i is expressed as a vector consisting of the container instance control information of the n_e edge clouds and the distribution information of the n_q user requests;

the action (a_i) is implemented as follows: a continuous action (a'_i) is discretized to obtain the discretized action (a_i), wherein the first n_e components of the discretized action (a_i) represent the container instance control information of the respective edge clouds, and the following n_q components represent information on whether the corresponding user request is shunted to an edge cloud.
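The discretization of a continuous action a'_i into a_i can be sketched as below. The text does not give the discretization rule, so several assumptions are made: each continuous component lies in [-1, 1], the control components are split at thresholds of -1/3 and +1/3, and each request component is binned uniformly into a target edge cloud index. The function name `discretize_action` is hypothetical.

```python
from typing import List, Tuple


def discretize_action(
    raw: List[float], n_e: int, n_q: int
) -> Tuple[List[int], List[int]]:
    """Split a continuous action of length n_e + n_q into discrete parts.

    Returns (control, targets):
      control[i] in {-1, 0, +1}: remove / keep / add one container in e_i
      targets[k] in {0, ..., n_e - 1}: edge cloud index for user request k
    Thresholds and uniform binning are assumptions of this sketch.
    """
    if n_e < 1 or len(raw) != n_e + n_q:
        raise ValueError("raw must have n_e + n_q components, with n_e >= 1")

    def clamp(x: float) -> float:
        return max(-1.0, min(1.0, x))

    control = []
    for x in raw[:n_e]:
        x = clamp(x)
        control.append(-1 if x < -1 / 3 else (1 if x > 1 / 3 else 0))

    targets = []
    for x in raw[n_e:]:
        u = (clamp(x) + 1.0) / 2.0                  # rescale to [0, 1]
        targets.append(min(n_e - 1, int(u * n_e)))  # uniform bins over clouds
    return control, targets
```

For n_e = 3 and n_q = 3, the continuous action [-0.9, 0.0, 0.8, -1.0, 0.1, 1.0] maps to control decisions [-1, 0, 1] and target edge clouds [0, 1, 2].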

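The instant reward in the problem model, set based on the policy that each user request of the current time slot costs the lowest, comprises:

the instant reward is defined as w^t - c^t, where w^t and c^t are the reward value and the total system cost of the current time slot (t), as defined above.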

In the method, the problem model is solved by the deep reinforcement learning model. During training, the instant reward is calculated from the action result obtained in each training round, the parameters of the deep reinforcement learning model are updated by backpropagation according to the instant reward, and training ends once the total number of training rounds reaches the preset number.

In the method, the deep reinforcement learning model is specifically implemented by using the advantage actor-critic (A2C) algorithm.
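In an actor-critic method of this kind, the critic's value estimates are typically used to turn the instant rewards into advantages that weight the actor's policy update. The sketch below shows only that advantage computation in plain Python; the discount factor `gamma`, the bootstrap value, and the function names are assumptions, and the application does not state its network architecture or hyperparameters.

```python
from typing import List


def discounted_returns(
    rewards: List[float], gamma: float, bootstrap_value: float = 0.0
) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1}, seeded with a bootstrap value."""
    returns: List[float] = []
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def advantages(
    rewards: List[float],
    values: List[float],
    gamma: float,
    bootstrap_value: float = 0.0,
) -> List[float]:
    """Advantage A_t = G_t - V(s_t), with V(s_t) supplied by the critic."""
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    rets = discounted_returns(rewards, gamma, bootstrap_value)
    return [g - v for g, v in zip(rets, values)]
```

Here `rewards` would hold the instant rewards w^t - c^t of consecutive time slots. A positive advantage means the chosen action did better than the critic expected, so the actor update raises its probability.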

In the method, when the deep reinforcement learning model obtained by training is applied, increase and decrease information of the number of container instances required by the current time slot and user request diversion information of the current time slot are obtained, and the diversion policy may be applied to diversion control, and specifically includes: in the next time slot, the core network of the network increases and decreases the number of corresponding container instances in each edge cloud in the network according to the increase and decrease information of the number of container instances required by the current time slot; and in the next time slot, the core network of the network establishes a Protocol Data Unit (PDU) session to the corresponding user, and distributes the user request to the corresponding edge cloud according to the user request distribution information of the current time slot.
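The two outputs of the model can be combined into a per-edge-cloud plan that the core network would carry out in the next time slot. This is only a data-structuring sketch: the function name `build_slot_plan` and the dictionary layout are hypothetical, and no 5G core interface is modeled here.

```python
from typing import Dict, List


def build_slot_plan(control: List[int], targets: List[int]) -> Dict[int, dict]:
    """Group the model output by edge cloud for the next time slot.

    control[i]: container delta (-1, 0, +1) for edge cloud i
    targets[k]: index of the edge cloud that user request k is sent to
    """
    plan = {
        cloud: {"container_delta": delta, "requests": []}
        for cloud, delta in enumerate(control)
    }
    for request_index, cloud in enumerate(targets):
        if cloud not in plan:
            raise ValueError(f"unknown edge cloud index {cloud}")
        plan[cloud]["requests"].append(request_index)
    return plan
```

Each entry then tells the controller how to change the container count of one edge cloud and which user requests need a PDU session towards it.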

That is, the increase and decrease of container instances in each edge cloud in the network is controlled through the Policy Control Function (PCF) - Network Exposure Function (NEF) - Application Function (AF) mechanism of the network, and the offloading of user requests is controlled through PDU sessions. Increasing and decreasing the container instances in each edge cloud allocates the computing resources, offloading the user requests distributes the user demand, and the two together achieve complete scheduling of the computing resources of the network. The network may be a fifth-generation (5G) system: the PCF-NEF-AF interaction mechanism of the 5G system handles the cross-domain exchange of information between the communication network state and the internal state of the edge clouds, and the session establishment mechanism of the 5G system is used to offload the user requests.

Fig. 2 is a schematic diagram of a structure in which a cross-domain edge network manages computing resources with the user request as the scheduling granularity, according to an embodiment of the present application. As shown in fig. 2, the core network in the cross-domain edge network controls the offloading of user requests to edge cloud e₁, edge cloud e₂, or edge cloud e₃ by establishing different PDU sessions. Meanwhile, the number of started container instances in the three edge clouds is influenced through the PCF-NEF-AF mechanism. Fig. 2 labels the macro base station (Macro Base Station) through which users access the network, the micro base station (Micro Base Station) that accesses the macro base station, the edge server (Edge Server) of the edge cloud that controls and hosts the container instances, and the core network (Core Network) of the network.

Fig. 3 is a schematic structural diagram of a joint control device for adjusting the number of container instances and splitting user requests according to an embodiment of the present application, including: a problem modeling unit, a model training unit and a model application unit, wherein,

the problem modeling unit is configured to perform problem modeling in a Markov decision mode with the aim of minimizing the average user request cost per unit time slot of a cross-domain edge network, wherein the state in the problem model is set based on the number of container instances that the system can start, the number of container instances running in the current time slot, and the user request state information of the current time slot; the actions in the problem model are set based on the container instance control information of each edge cloud for the current time slot and the distribution information of the user requests of the current time slot; the instant reward in the problem model is set based on a policy that each user request of the current time slot has the lowest cost;

The above-mentioned apparatus may be disposed on a certain control entity of a core network in a cross-domain edge network, and is not limited herein.

In the above apparatus, further, in the next time slot, the core network of the network increases or decreases the number of container instances in each edge cloud of the network according to the increase and decrease information of the number of container instances required by the current time slot; and in the next time slot, the core network of the network establishes a protocol data unit session for the corresponding user and distributes the user request to the corresponding edge cloud according to the user request distribution information of the current time slot.

The following description will be given with reference to a specific example.

The present example provides a method, run in the core network of a cross-domain edge network, for determining the distribution information of user requests and the number of container instances in each edge cloud, so as to minimize the user request cost per time slot in the network.

First, a quantitative description of the network cost is needed, as described in detail below.

For a first time slot $t_1$ and a second time slot $t_2$, where $t_1, t_2 \in T$, the network cost comes from three parts: the container instance running cost, the container switching cost, and the user request default cost. The container running cost is the expense incurred by running a certain number of containers in each time slot and thereby occupying computing resources in the edge cloud. In traditional edge cloud computing there are several ways to pay for container instances, including annual, monthly, and pay-on-demand billing. This example uses the flexible pay-on-demand mode, in which payment is made according to the number of time slots for which container instance resources are occupied. The running cost is specifically defined as follows:

where $C_r(t_1, t_2)$ represents the total running cost in the network, $p_{run}$ is the price of running a single container instance for one time slot, and

is the number of container instances that edge cloud $e_i$ runs at time slot $t$.

The container switching cost comes mainly from the edge cloud adjusting the number of running container instances to accommodate load changes. Switching container instances incurs additional overhead for the network, and adjusting the computing resources of the network by switching container instances takes effect with some lag. A container switching cost is therefore introduced to avoid frequent container instance switching operations. It is specifically defined as follows:

where $C_s(t_1, t_2)$ represents the total container switching cost in the network, and $p_{switch}$ is the cost of opening or closing a single container instance once.

The user request default cost comes mainly from the total delay of a user request exceeding the limit. Reasons for the total delay exceeding the limit include additional transmission delay caused by improper distribution of user requests and additional computation delay caused by insufficient computing resources. When the total delay of a user request exceeds the limit, the quality of the user request cannot meet the requirement, and under the Service Level Agreement (SLA) between the service provider operating the network and the user, the service provider must compensate the user. This compensation is the user request default cost.

where $C_d(t_1, t_2)$ represents the total user request default cost of the network, $p_{delay}$ is the default penalty for a single user request,

is the delay satisfaction rate of user request $x$ in time slot $t$, that is, the proportion of the delay samples taken within the time slot that satisfy the delay requirement, and $R_{min}$ is the lowest agreed delay satisfaction rate.

To sum up, the total cost of the system from time slot $t_1$ to time slot $t_2$ is:

$$C_{total}(t_1, t_2) = C_r(t_1, t_2) + C_s(t_1, t_2) + C_d(t_1, t_2)$$
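As a hedged illustration of how the three cost terms combine, the following sketch computes a total cost over a short run of time slots. Because the individual cost formulas are not reproduced in the text, the sketch assumes that the running cost charges $p_{run}$ per running instance per slot, that the switching cost charges $p_{switch}$ per instance opened or closed between consecutive slots, and that the default cost charges $p_{delay}$ for every request whose delay satisfaction rate is below $R_{min}$. All function names and sample numbers are hypothetical.

```python
def running_cost(instances, p_run):
    # instances[t][i]: container instances running in edge cloud i during slot t
    return p_run * sum(sum(slot) for slot in instances)


def switching_cost(instances, p_switch):
    # Assumption: every instance opened or closed between two slots is charged once.
    switched = 0
    for prev, cur in zip(instances, instances[1:]):
        switched += sum(abs(c - p) for p, c in zip(prev, cur))
    return p_switch * switched


def default_cost(satisfaction, r_min, p_delay):
    # satisfaction[t]: delay satisfaction rates of the user requests in slot t
    # Assumption: a request defaults when its rate is below r_min.
    violations = sum(1 for slot in satisfaction for rate in slot if rate < r_min)
    return p_delay * violations


def total_cost(instances, satisfaction, p_run, p_switch, p_delay, r_min):
    # Sum of the three parts, matching C_total = C_r + C_s + C_d.
    return (running_cost(instances, p_run)
            + switching_cost(instances, p_switch)
            + default_cost(satisfaction, r_min, p_delay))


instances = [[1, 2], [2, 2], [2, 1]]                        # 3 slots, 2 edge clouds
satisfaction = [[0.95, 0.80], [0.99], [0.70, 0.92, 0.91]]   # per-request rates per slot
cost = total_cost(instances, satisfaction,
                  p_run=1.0, p_switch=0.5, p_delay=2.0, r_min=0.9)
```

With these sample values the running part is 10.0, the switching part is 1.0, and the default part is 4.0.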

Then, to guarantee the quality of service of user request processing, the cost that the system spends on each user request in each time slot is as follows:

where

is the number of user requests in time slot $t$.

Second, the method by which the cross-domain edge network determines the distribution information of user requests and the number of container instances in each edge cloud is described, that is, the cost-optimal computing-power distribution decision algorithm executed by the core network of the cross-domain edge network.

The optimization problem of the average cost per user request per time slot can be described quantitatively as follows: given the upper limit $M_i$ on the number of container instances of edge cloud $e_i$, dynamically adjust, in each time slot, the number of running container instances in the edge clouds

and the computing-power distribution relationship

so that the average cost spent on each user request within each time slot is minimized, where

Otherwise

To address the above problem, the problem is modeled as a Markov decision process and then solved with the classical A2C algorithm. In the modeling, the distribution of user requests and the dynamic allocation of computing resources in the network are mapped onto a Markov decision process model. A Markov decision process model M = (S, A, P, R) comprises a finite state set S, a finite action set A, the state transition probability P of the network, and the instant reward R obtained after an action is taken. In the above problem, the effect of an action on the network is deterministic, so the modeling focuses on defining the state, the action, and the instant reward.

The state, the action, and the instant reward are specifically defined as follows.

In each time slot, the state of the network includes: 1) the upper limit ($M_i$) on the number of container instances that the network can start; 2) the number of container instances running in the current time slot

3) the number of user requests that are in service and not yet finished in the current time slot; 4) the number of user requests received by each edge cloud in the current time slot.
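A minimal sketch of how the four state components listed above might be assembled into one input vector for the model is given below. The function `build_state`, the flat concatenation order, and the treatment of the unfinished in-service requests as a single total are assumptions of this sketch and are not specified in the example.

```python
def build_state(max_instances, running, in_service, received):
    """Concatenate the four state components into one flat vector.

    max_instances[i]: upper limit of container instances of edge cloud i
    running[i]:       container instances running in edge cloud i this slot
    in_service:       unfinished user requests still in service (assumed a total)
    received[i]:      user requests received by edge cloud i this slot
    """
    n_e = len(max_instances)
    if not (len(running) == n_e and len(received) == n_e):
        raise ValueError("per-edge-cloud components must have the same length")
    return list(max_instances) + list(running) + [in_service] + list(received)


state = build_state(max_instances=[3, 2, 4], running=[1, 2, 0],
                    in_service=5, received=[2, 0, 3])
```

For three edge clouds this yields a vector of ten entries, which keeps the input size fixed across time slots.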

Each action $a_t \in A$ in the finite action set $A$ contains two parts: the control information of the container instances in each edge cloud for the time slot, and the distribution information of each user request for the time slot. In this example, it is assumed that within one time slot each edge cloud has only three container instance operations: add one container instance, remove one container instance, or keep the number of container instances unchanged. Assuming a network with a first number ($n_e$) of edge clouds and at most a second number ($n_q$) of user requests per time slot, $a_t$ is a vector of $n_e + n_q$ positions: the first $n_e$ positions correspond to the control information of the container instances of each edge cloud, and the last $n_q$ positions correspond to the distribution information of the user requests. Note that in this example, because the actual number of user requests in each time slot is

only positions 1 to

of the whole action vector are valid, and the remaining positions are filled with 0 by default. In this example the model outputs a continuous action $a'_t$, which is then discretized into $a_t$. In the discretized action vector $a_t$, the values 1, 0, and -1 in the first $n_e$ positions indicate, respectively, that one container instance is added to the corresponding edge cloud, that the number of container instances is unchanged, and that one container instance is removed. If one of the remaining positions of $a_t$ is nonzero, the corresponding user request is distributed to the corresponding edge cloud; otherwise the position is invalid. The discretization algorithm for the continuous action $a'_t$ may be:

a) Clamp $a'_t$ to $[-\theta, \theta]$;

b) For positions 1 to $n_e$ of $a'_t$: if $a'_t[i]$ is greater than 0.5 and

then $a_t[i] = 1$; if

then $a_t[i] = -1$; otherwise $a_t[i] = 0$;

c) For positions $n_e + 1$ to

of $a'_t$,

d) For positions

to $n_e + n_q$ of $a'_t$, $a_t[i] = 0$;

e) Return $a_t$.

Thus, the action $a_t$ of the current time slot is obtained.

In point a), where $a'_t$ is clamped to $[-\theta, \theta]$, $\theta$ is an empirical value that varies with the problem; it is 1.5 in this example, which is not limiting.

In point b), the calculation of $a_t[i]$ depends on the number of containers and the user requests of the current time slot;

in point c), the formula rounds the resulting value of $a_t[i]$ and maps it onto the corresponding edge cloud.
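Steps a) to e) can be sketched as follows, with the caveat that several conditions and formulas are not reproduced in the text. The sketch therefore assumes that an instance is added only while the edge cloud is below its upper limit, that an instance is removed when the value is below -0.5 and at least one instance is running, and that a request value is mapped linearly from $[-\theta, \theta]$ onto the edge cloud indices 1 to $n_e$ and rounded. These rules, the -0.5 threshold, and the function name `discretize` are assumptions, not the formulas of the example.

```python
def discretize(a_cont, n_e, n_q, running, max_instances, n_requests, theta=1.5):
    """Turn a continuous action of length n_e + n_q into a discrete one."""
    # a) clamp every component to [-theta, theta]
    a = [max(-theta, min(theta, x)) for x in a_cont]
    out = [0] * (n_e + n_q)

    # b) container instance control for each edge cloud
    for i in range(n_e):
        if a[i] > 0.5 and running[i] < max_instances[i]:      # assumed capacity check
            out[i] = 1
        elif a[i] < -0.5 and running[i] > 0:                  # assumed mirror condition
            out[i] = -1

    # c) valid request positions: map the value to an edge cloud index 1..n_e
    valid = min(n_requests, n_q)
    for j in range(n_e, n_e + valid):
        scaled = (a[j] + theta) / (2 * theta) * (n_e - 1)     # assumed linear mapping
        out[j] = int(scaled + 0.5) + 1                        # round half up, 1-based

    # d) positions beyond the valid requests stay 0 (invalid)
    # e) return the discretized action
    return out


action = discretize(
    a_cont=[0.9, 2.0, -1.0, -1.5, 1.2, 0.3, 0.7],
    n_e=3, n_q=4,
    running=[1, 2, 0], max_instances=[3, 2, 4],
    n_requests=2,
)
```

In this usage example only the first edge cloud gains an instance, because the second is at its upper limit and the third has nothing to remove; the two valid requests are mapped to edge clouds 1 and 3, and the two unused request positions remain 0.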

In each time slot, the instant reward obtained after action $a_t$ acts on the network is $w^t - c^t$, where

represents the reward value brought by the user requests whose delay satisfaction rate reaches the set delay satisfaction rate threshold in each time slot $t$, and $w_0$ is the reward value of a single user request whose delay satisfaction rate reaches the set threshold; $c^t$ is the total cost of the system for the current time slot, $c^t = C_{total}(t-1, t)$, which comprises the running cost of the computing resources occupied from the previous time slot to the current time slot, the container instance switching cost, and the user request default cost.
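The instant reward can be illustrated with the short sketch below. Since the formula for $w^t$ is not reproduced in the text, the sketch assumes that $w^t$ equals $w_0$ multiplied by the number of user requests whose delay satisfaction rate reaches the threshold; the function name and sample values are hypothetical.

```python
def instant_reward(satisfaction_rates, threshold, w0, slot_cost):
    """Return w^t - c^t for one time slot.

    satisfaction_rates: delay satisfaction rate of each user request in the slot
    threshold:          set delay satisfaction rate threshold
    w0:                 reward of a single request that reaches the threshold
    slot_cost:          c^t, the total cost C_total(t-1, t) of the slot
    """
    # Assumption: every request that reaches the threshold contributes w0.
    satisfied = sum(1 for rate in satisfaction_rates if rate >= threshold)
    w_t = w0 * satisfied
    return w_t - slot_cost


reward = instant_reward([0.95, 0.80, 0.92], threshold=0.9, w0=2.0, slot_cost=3.5)
```

Here two of the three requests reach the threshold, so the reward is 2 × 2.0 minus the slot cost of 3.5.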

In this example, based on the problem model described above, the A2C algorithm is trained over multiple rounds. In each round, the instant reward is calculated from the result of the action obtained in that round, and the parameters of the deep reinforcement learning model are trained by back-propagation according to the instant reward. When the total number of training rounds reaches the preset number, the training process of the A2C algorithm is complete. The trained A2C model can then be applied in the cross-domain edge network, and its output is used for the distribution control of user requests and the adjustment of the number of container instances.
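As a rough illustration of the quantity that drives an A2C update, the sketch below computes discounted returns from a sequence of instant rewards and subtracts the critic's value estimates to obtain advantages. The discount factor `gamma`, the bootstrap value, and the sample numbers are assumptions; the example does not state its hyperparameters or network structure, and the neural network update itself is omitted.

```python
def discounted_returns(rewards, gamma, bootstrap_value=0.0):
    # Accumulate rewards backwards: G_t = r_t + gamma * G_{t+1}
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, gamma, bootstrap_value=0.0):
    # Advantage = discounted return minus the critic's value estimate.
    returns = discounted_returns(rewards, gamma, bootstrap_value)
    return [g - v for g, v in zip(returns, values)]


rewards = [1.0, 0.0, 2.0]      # instant rewards of three consecutive time slots
values = [1.5, 1.0, 1.0]       # hypothetical critic estimates for the same slots
adv = advantages(rewards, values, gamma=0.5)
```

In an actor-critic update, a positive advantage makes the taken action more likely and a negative one makes it less likely, while the critic is regressed toward the discounted returns.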

It can be seen that, from the perspective of both the network operator and the user, the embodiments of the present application provide a scheduling method for computing-power distribution and the number of container instances that, based on a deep reinforcement learning algorithm, minimizes the average user request cost per time slot when the edge cloud computing resources in a cross-domain edge network are limited.

In another embodiment of the present application, a non-transitory computer readable storage medium is provided, which stores instructions that, when executed by a processor, cause the processor to perform the joint control method of container instance number adjustment and user request offloading in the foregoing embodiments.

Fig. 4 is a schematic diagram of an electronic device according to another embodiment of the present application. As shown in fig. 4, another embodiment of the present application further provides an electronic device, which may include a processor 401, where the processor 401 is configured to execute the steps of the above joint control method for adjusting the number of container instances and distributing user requests. As can also be seen from fig. 4, the electronic device provided in the above embodiment further includes a non-transitory computer readable storage medium 402, where a computer program is stored on the non-transitory computer readable storage medium 402, and when the computer program is executed by the processor 401, the steps of the above joint control method for adjusting the number of container instances and distributing user requests are performed.

Specifically, the non-transitory computer readable storage medium 402 can be a general storage medium, such as a removable disk, a hard disk, a flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), or a portable compact disc read-only memory (CD-ROM), and the like. When the computer program on the non-transitory computer readable storage medium 402 is executed by the processor 401, the processor 401 can be caused to execute the steps of the above joint control method for adjusting the number of container instances and distributing user requests.

In practical applications, the non-transitory computer readable storage medium 402 may be included in the device/apparatus/system described in the above embodiments, or may exist alone without being assembled into the device/apparatus/system. The computer readable storage medium carries one or more programs which, when executed, are capable of performing the steps of the above joint control method for adjusting the number of container instances and distributing user requests.

Yet another embodiment of the present application further provides a computer program product, which includes a computer program or instructions that, when executed by a processor, implement the steps of the above joint control method for adjusting the number of container instances and distributing user requests.

The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or coupled in various ways, even if such combinations or couplings are not explicitly recited in the present application. In particular, the features recited in the various embodiments and/or claims of the present application may be combined and/or coupled in various ways without departing from the spirit and teachings of the present application, and all such combinations and/or couplings fall within the scope of the present disclosure.

The principles and embodiments of the present application are described herein using specific examples, which are provided only to aid understanding of the method and the core idea of the present application and are not intended to limit the present application. It will be appreciated by those skilled in the art that changes may be made to these embodiments without departing from the principles, spirit and scope of the invention, and that all such modifications, equivalents, and improvements falling within the scope of the invention are intended to be protected by the claims.

Claims

1. A joint control method for adjusting the number of container instances and distributing user requests is characterized by comprising the following steps:

aiming at minimizing the average user request cost of a unit time slot of a cross-domain edge network, problem modeling is carried out by adopting a Markov decision mode, and the state in the problem model is set based on the number of container instances which can be started by the system, the number of container instances operated at the current time slot and the user request state information of the current time slot; actions in the problem model are set based on container instance control information of each edge cloud of the current time slot and the shunting information of the user request of the current time slot; an instant reward in the problem model is set based on a policy that each user of the current time slot requests the lowest cost;

2. The method of claim 1, wherein the problem of minimizing the average user request cost per time slot of the network comprises:

3. The method of claim 1, wherein the setting of the state in the problem model based on the number of container instances that the system can start, the number of container instances that the current time slot runs, and the user requested state information for the current time slot comprises:

Number of container instances running in current time slot

4. The method of claim 1, wherein setting the actions in the problem model based on the container instance control information of each edge cloud for the current time slot and the distribution information of the user requests of the current time slot comprises:

each action ($a_i$) in the action space (A) of the problem model includes the container instance control information of each edge cloud for the current time slot and the distribution information of the user requests of the current time slot;

the container instance control information of each edge cloud comprises adding one container instance information, reducing one container instance information and keeping the container instance unchanged information;

for the network having a set first number $n_e$ of edge clouds and a second number $n_q$ of user requests, the action $a_i$ is expressed as a vector of the container instance control information of the first number $n_e$ of edge clouds and the distribution information of the second number $n_q$ of user requests;

the action ($a_i$) is obtained as follows: the continuous action ($a'_i$) is discretized to obtain the discretized action ($a_i$), wherein the first number ($n_e$) of elements of the discretized action ($a_i$) represent the container instance control information corresponding to the respective edge clouds, and the following second number ($n_q$) of elements represent whether the corresponding user requests are distributed to an edge cloud.

5. The method of claim 1, wherein setting the instant reward based on the policy that each user request of the current time slot has the lowest cost comprises:

the instant reward is defined as $w^t - c^t$, where

represents the reward value brought by the user requests whose delay satisfaction rate in the current time slot ($t$) reaches the set delay satisfaction rate threshold, and $w_0$ is the reward value of a single user request whose delay satisfaction rate reaches the set threshold; $c^t$ is the total cost of the system for the current time slot, which includes the running cost of the computing resources occupied from the previous time slot to the current time slot, the container instance switching cost, and the user request default cost.

6. The method of claim 1 or 2, wherein the deep reinforcement learning model is implemented using the advantage actor-critic (A2C) algorithm.

7. The method of claim 1 or 2, further comprising:

8. A combined control device for adjusting the number of container instances and splitting user requests is characterized by comprising: a problem modeling unit, a model training unit and a model application unit, wherein,

9. The apparatus of claim 8, wherein the apparatus further comprises:

10. An electronic device, comprising:

a processor;

a memory storing a program configured to implement the joint control method for adjusting the number of container instances and distributing user requests according to any one of claims 1 to 7 when executed by the processor.