WO2023092466A1 - Resource sharing-aware online task unloading method - Google Patents

Resource sharing-aware online task unloading method

Info

Publication number: WO2023092466A1 (application PCT/CN2021/133572)
Authority: WIPO (PCT)
Prior art keywords: user, server, site, users, tasks
Application number: PCT/CN2021/133572
Other languages: French (fr), Chinese (zh)
Inventors: 谢瑞桃, 方俊鸿, 姚俊梅, 伍楷舜
Original assignee: 深圳大学 (Shenzhen University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 深圳大学
Priority to PCT/CN2021/133572
Publication of WO2023092466A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating

Definitions

  • the present invention relates to the field of computer technology, and more specifically, to a resource sharing-aware online task offloading method.
  • Cloud-based interactive applications place high demands on the user's network environment, making it difficult to meet users' delay requirements.
  • By utilizing emerging mobile edge computing and 5G networks, the rendering tasks of interactive applications can be offloaded to edge servers close to users, reducing user latency.
  • Edge computing comes in the form of localized clouds. Telecom operators are upgrading infrastructure in existing communication networks to mobile edge computing platforms, with access sites such as cellular radio base stations, aggregation sites such as housing distributed antenna systems, and core sites such as central offices. These sites are equipped with computing and storage resources, cooling, power systems, etc., and have been redesigned to accommodate edge servers. Since the edge server is close to the user, using mobile edge computing can reduce the response time of user requests.
  • Cloud-based interactive applications, such as virtual reality and cloud games, utilize cloud resources to handle computationally intensive tasks. This avoids the need for high-end hardware (usually expensive and energy-intensive) on user devices and keeps clients lightweight.
  • however, cloud-based interactive applications require high-throughput, low-latency network connections. If the user is far from the data center, the user's low-latency requirement is difficult to meet.
  • edge-assisted interactive applications offload computationally intensive 3D rendering tasks to mobile edge computing systems, and stream edge-rendered video to end users via 5G connections. Since the edge server is close to the end user, this approach can greatly reduce latency.
  • an edge-assisted interactive application system consists of cloud servers, edge servers, and user equipment.
  • the cloud server hosts the core logic of the application; the edge server is responsible for rendering and encoding; and the user device is responsible for decoding and display.
  • The three form an information loop.
  • the cloud server generates a rendering command and transmits it to the edge server; the edge server renders and generates a video frame and transmits it to the user device; the user device generates a control command and transmits it to the cloud server, which updates the application logic after receiving it.
  • task offloading in edge computing can be divided into three categories according to optimization objectives: minimizing task delay, minimizing system cost, and simultaneously minimizing task delay and system cost.
  • a rendering task involves multiple types of resources such as storage, memory, GPU memory, CPU, GPU, and bandwidth. With the exception of bandwidth, each type of resource can be shared by multiple users served by the same server. Utilizing the feature of multi-dimensional resource sharing in rendering tasks, users who share resources can be assigned to the same server, thereby saving resource consumption and cost. See Figure 2 for an example to illustrate resource sharing.
  • for one application (a multiplayer 3D chess game in the example), the rendered assets (the board and pieces) are stored in storage and can be shared by multiple application instances. An instance is one execution of the application, e.g., a game of chess with two cooperating users on each side.
  • within an instance, this cached data can be shared by all rendering tasks of the instance.
  • some users have the same viewpoint throughout the running of the application, so these users can share the rendering tasks run by the CPU and GPU, for example, user 1 and user 2 in the example share the viewpoint, and therefore share rendering task 1.
  • users of an application instance can be allocated together to save resource consumption.
  • each user's rendering task must be offloaded to a server at some edge site. Since interactive applications are latency-sensitive, the edge site chosen for offloading must keep every user's latency acceptable. Moreover, server costs may differ across sites. A new task offloading method is therefore needed that exploits the multi-dimensional resource sharing of rendering tasks to minimize the cost of the servers used, or, when server costs are equal, to minimize the number of servers used.
  • the purpose of the present invention is to overcome the above-mentioned defects in the prior art, and provide an online task offloading method with resource sharing awareness.
  • the method includes:
  • sort the sites to obtain a site sequence, and let C denote the first site; then repeat the following until the tasks of all users in the user set have been loaded or all target sites have been tried: given the instance's user set U and the target site set V' (the set of all sites in the sequence starting from site C), pre-assign users to sites by solving the modeled user allocation problem, whose optimization goal is to meet each user's delay requirement while maximizing the opportunity for resource sharing;
  • at the current site C, for the users pre-assigned to it, determine the servers to load their tasks by solving the modeled server loading problem, whose optimization goal is to load the tasks of the users pre-assigned to each site with the fewest servers while satisfying the servers' resource capacity constraints.
  • compared with the prior art, the present invention exploits the multi-dimensional resource sharing of tasks to provide a new scheme for offloading rendering tasks, which both meets each user's delay requirement and minimizes the cost of the servers used; the cost may be rental cost, energy consumption, etc.
  • FIG. 1 is a schematic diagram of a system architecture of an edge-assisted interactive application in the prior art
  • FIG. 2 is an example of resource sharing in the prior art
  • Figure 3 is an example of user allocation according to one embodiment of the present invention.
  • Fig. 4 is a schematic diagram showing that the cost reduction rate increases as the number of instances increases according to an embodiment of the present invention
  • Fig. 5 is a schematic diagram of the cost reduction rate in the case of equal server costs according to an embodiment of the present invention.
  • Fig. 6 is a schematic diagram of performance in a dynamic environment according to an embodiment of the present invention.
  • Fig. 7 is a schematic diagram of costs under various delay limit values according to an embodiment of the present invention.
  • Fig. 8 is a schematic diagram of the cost reduction rate when the delay is limited to 40 ms according to an embodiment of the present invention.
  • Fig. 9 is a schematic diagram of the empirical cumulative distribution function of the cardinality of the set partition blocks under different weight values θ according to an embodiment of the present invention.
  • Fig. 10 is a schematic diagram of cost reduction rates under different weight values θ according to an embodiment of the present invention.
  • the present invention proposes a sharing-aware online task offloading method for optimizing the task offloading problem of multiple instances of a single application; the goal is to satisfy each user's delay requirement while minimizing the cost of the servers used.
  • the present invention alternately and iteratively solves two sub-problems: the user allocation problem and the server loading problem.
  • the user allocation problem is to allocate users to edge sites so that the latency of each user satisfies the requirement while maximizing the chance of resource sharing.
  • the user assignment problem is modeled as a set partition problem and a heuristic algorithm is proposed to solve it.
  • the problem of server loading is that in each site, for the above-mentioned users pre-allocated to the site, use the least number of servers to load their rendering tasks, and at the same time meet the resource capacity limit of the server.
  • the server loading problem is modeled as a multidimensional bin packing problem.
  • the shared nature of resources complicates the server loading problem. Therefore, it is further proposed that each user can be regarded as an item, or a group of users sharing a viewpoint can be regarded as an item, or the users belonging to the same instance can be regarded as an item, and the bin packing problem is then solved over these items.
  • the three loading methods above differ in granularity, and hence in loading scheme and resulting cost; there are thus three granularity levels: single user, shared-viewpoint user group, and instance.
  • to find the granularity level that achieves the lowest cost, the granularity decision problem is modeled as a multi-armed bandit problem and solved using a reinforcement learning algorithm.
  • in the following, user delay is analyzed first; then the task offloading problem, the user allocation problem, and the server loading problem are introduced; finally the sharing-aware online offloading method and its two sub-methods, the sharing-aware user allocation algorithm and the granularity decision algorithm, are described in detail.
  • where rendering tasks are offloaded affects user latency.
  • user latency consists of the uplink latency of transmitting control commands and the downlink latency of transmitting rendering commands and rendered video frames.
  • where rendering tasks are offloaded affects the downlink latency, not the uplink latency; therefore, the downlink delay is taken as the example here.
  • suppose user u's rendering task is offloaded to edge site v, with downlink delay t_uv. Let r and f denote the average data sizes of a rendering command and a video frame, and let b_v and d_v denote the downlink throughput and delay from the cloud server to edge site v; the delay of transmitting a rendering command is then r/b_v + d_v.
  • let b_vu and d_vu denote the downlink throughput and delay from edge site v to user device u; the delay of transmitting a video frame is then f/b_vu + d_vu.
  • combining the two, the downlink delay is t_uv = r/b_v + d_v + f/b_vu + d_vu, which depends on the location of v.
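As a concrete illustration, here is a minimal sketch of this downlink-delay computation; the function name and all numeric values are illustrative assumptions, not taken from the patent.

```python
def downlink_delay(r, f, b_v, d_v, b_vu, d_vu):
    """Downlink delay t_uv of offloading user u's rendering task to site v.

    r, f: average data sizes of a rendering command and a video frame (bits)
    b_v, d_v: throughput (bit/s) and delay (s) from the cloud server to site v
    b_vu, d_vu: throughput (bit/s) and delay (s) from site v to user device u
    """
    # command transfer (rendering command) plus video-frame transfer
    return (r / b_v + d_v) + (f / b_vu + d_vu)

# illustrative values: 4 KB command, 250 KB frame, 100/400 Mbit/s links
t_uv = downlink_delay(r=4e3 * 8, f=250e3 * 8,
                      b_v=100e6, d_v=0.005,
                      b_vu=400e6, d_vu=0.002)
print(f"t_uv = {t_uv * 1e3:.2f} ms")  # compare against the delay cap tau
```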
  • each instance is an execution of the application, participated in by one or more users.
  • a group of edge servers (denoted by S)
  • a group of edge sites (denoted by V).
  • the cost of an edge server depends on the location of the edge site where it is located.
  • the goal of the present invention is to offload the rendering tasks of these instances to edge servers and minimize the overall cost.
  • the following modeling can be extended to heterogeneous edge servers, e.g., servers with different hardware configurations or capacities, and the proposed method also applies to that scenario.
  • the task assignment problem is defined as: given a set of users U and a set of edge servers S, assign each user in U to an edge server in S such that each user's delay requirement is satisfied and the resources on each edge server stay within its capacity, while minimizing the total cost of the edge servers used. For each user, the rendering task serving that user runs on the assigned server. On each server, all users sharing a viewpoint share the same rendering task.
  • let t_uv denote the network delay incurred by offloading user u's tasks to site v.
  • for any server j in site v, t_uj = t_uv; therefore, the delay of user u is ∑_{j∈S} t_uj·x_uj.
  • the delay is capped at a tolerable maximum value, denoted by τ, which can be expressed as: ∑_{j∈S} t_uj·x_uj ≤ τ, ∀u∈U.
  • the CPU and GPU constraints can be merged: letting p_K = min{p_C, p_G} represent the maximum number of tasks allowed on each server, the merged constraint is ∑_{k∈K} y_kj ≤ p_K·w_j, ∀j∈S.
  • a Boolean variable z_ij is introduced, denoting whether server j hosts instance i∈I. The memory and GPU memory capacity constraints can then be expressed as ∑_{i∈I} z_ij ≤ p_M·w_j and ∑_{i∈I} z_ij ≤ p_GM·w_j, ∀j∈S.
  • since bandwidth is not shared among users, the bandwidth constraint can be expressed as ∑_{u∈U} x_uj ≤ p_B·w_j, ∀j∈S.
  • let k(u) denote the shared-viewpoint user group to which user u belongs; then x_uj can be 1 only if y_{k(u),j} is 1, i.e., server j hosting user group k(u) is a necessary condition for server j hosting user u, giving x_uj ≤ y_{k(u),j}. A similar relationship exists between y_kj and z_ij: if server j hosts user group k, it must host the instance to which that group belongs. Letting i(k) denote the instance containing user group k, this gives y_kj ≤ z_{i(k),j}.
  • the goal of the present invention is to minimize the total cost of the servers used, such as the total fee paid by the application provider to the edge computing service provider.
  • the goal is to minimize ∑_{j∈S} c_j·w_j. Therefore, the task assignment problem is modeled as a Boolean linear program over the variables x, y, z, and w subject to the constraints above.
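As an illustration of this model, the following is a minimal sketch using the PuLP modeling library; the tiny problem data (users, groups, costs, capacities) are invented for illustration, and the full patent model contains additional constraints not reproduced here.

```python
import pulp

# toy data (illustrative only)
users = ["u1", "u2", "u3"]
group_of = {"u1": "k1", "u2": "k1", "u3": "k2"}   # k(u)
inst_of_group = {"k1": "i1", "k2": "i1"}          # i(k)
groups, insts = ["k1", "k2"], ["i1"]
servers = ["s1", "s2"]
cost = {"s1": 2.0, "s2": 3.0}                     # c_j
delay = {(u, j): 0.02 for u in users for j in servers}  # t_uj
tau, p_K, p_M, p_B = 0.03, 2, 2, 3

prob = pulp.LpProblem("task_assignment", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", [(u, j) for u in users for j in servers], cat="Binary")
y = pulp.LpVariable.dicts("y", [(k, j) for k in groups for j in servers], cat="Binary")
z = pulp.LpVariable.dicts("z", [(i, j) for i in insts for j in servers], cat="Binary")
w = pulp.LpVariable.dicts("w", servers, cat="Binary")

prob += pulp.lpSum(cost[j] * w[j] for j in servers)                 # objective
for u in users:
    prob += pulp.lpSum(x[u, j] for j in servers) == 1               # (4) assignment
    prob += pulp.lpSum(delay[u, j] * x[u, j] for j in servers) <= tau  # (5) delay
for j in servers:
    prob += pulp.lpSum(y[k, j] for k in groups) <= p_K * w[j]       # (9) CPU/GPU
    prob += pulp.lpSum(z[i, j] for i in insts) <= p_M * w[j]        # memory
    prob += pulp.lpSum(x[u, j] for u in users) <= p_B * w[j]        # bandwidth
    for u in users:
        prob += x[u, j] <= y[group_of[u], j]                        # sharing links
    for k in groups:
        prob += y[k, j] <= z[inst_of_group[k], j]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))
```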
  • the user allocation problem is to allocate users to edge sites, not only to satisfy each user's latency requirement, but also to maximize resource sharing.
  • a given set of users is partitioned into disjoint subsets, and each subset is assigned to a site. Different partitions will result in different resource consumption and thus different costs. Finding the optimal partition is a set partitioning problem.
  • for example, R_3 = {u_2, u_3, u_4, u_7}.
  • Users assigned to site v must have latency requirements met, so these users must be a subset of R v . Additionally, each user must be assigned to a site. Therefore, the sets of users assigned to each site must be pairwise disjoint, and the union of these sets equals the entire set of users. That is, the result of user assignment is a set partition.
  • the user allocation scheme in Figure 3(b) is a set partition
  • the user allocation scheme {u_1, u_3}, {u_2, u_5, u_6}, {u_4, u_7}, {u_8} in Figure 3(c) is another set partition. Both allocation schemes are valid but may incur different costs.
  • the allocation scheme in Figure 3(b) requires 3 rendering tasks, running at sites v_1, v_3, and v_4, because at site v_3 users u_2, u_3, and u_4 share the viewpoint and thus share the same rendering task; the same holds for the users assigned to site v_4.
  • the allocation scheme in Figure 3(c) requires 1, 2, 2, and 1 rendering tasks in each site, for a total of 6. The higher the number of tasks, the higher the cost incurred.
  • let U represent the user set of a single instance, distinct from the symbol U used earlier for the user set of all instances.
  • the user assignment problem is to find a partition of U.
  • a partition corresponds to an indexed family composed of subsets of U, with index set V, written {U_v : v∈V}.
  • the indexed family satisfies the following two constraints.
  • a partition block U_v corresponds to the set of users assigned to site v.
  • the partition ensures that each user is assigned to an edge site and that the user's latency requirement is met at the assigned site.
  • let R_v denote all users in U that meet the delay requirement at site v, i.e., R_v = {u∈U : t_uv ≤ τ}.
  • the user set U_v allocated to edge site v must satisfy U_v ⊆ R_v.
  • each candidate assignment is a two-tuple ⟨v, A⟩ pairing a site v with a subset A of users that may be assigned to it, and a cost is defined for each two-tuple, in which θ is a weight.
  • the value of ⁇ affects the degree of resource sharing.
  • the user assignment problem for a single instance is defined as: given the user set U of an instance, the set of two-tuples defined in Equation (22), and a cost function c, find the subset of two-tuples with the smallest total cost that forms a partition of U. This is a set partitioning problem.
  • This modeling ensures that each user is assigned to a certain edge site, and the user's latency requirements are satisfied at this site, that is, the first two constraints of the mathematical model (17) are satisfied.
  • a shared-aware user assignment algorithm is introduced to solve this problem.
  • next, the server loading problem is modeled and solved.
  • the optimization goal of the present invention is to use the least number of servers to load the user's rendering tasks.
  • each server has three resource capacity constraints: a limited number of instances, a limited number of shared view user groups, and a limited number of users. Therefore, the server loading problem can be modeled as a three-dimensional bin packing problem.
  • each user can be regarded as an item with resource requirements ⟨1, 1, 1⟩, packed into a bin with capacity ⟨p_I, p_K, p_B⟩.
  • each element in the triple corresponds to the number of instances, tasks, and users, respectively.
  • users belonging to the same shared-viewpoint user group can be grouped together and regarded as an item with resource requirements ⟨1, 1, n_B⟩, where n_B is the number of users in the group.
  • users belonging to the same instance can be grouped together and regarded as an item with resource requirements ⟨1, n_K, n_B⟩, where n_K and n_B are the numbers of tasks (i.e., shared-viewpoint user groups) and users in the instance, respectively.
  • the granularity thus ranges from coarse-grained (e.g., instances) to fine-grained (e.g., users).
  • the optimal granularity level depends on many factors, such as the server capacity ⟨p_I, p_K, p_B⟩, the delay limit τ, and whether server costs are equal across edge sites.
  • in different situations the optimal granularity level differs, and it is difficult to derive a decision formula for the granularity level from an analytical model. Therefore, in one embodiment, the optimal granularity level is learned from experience using reinforcement learning.
  • the granular decision-making problem is modeled as a multi-armed bandit problem.
  • the goal is to maximize the expected total reward over the long run of action selection. The granularity decision algorithm used to solve this problem is introduced below.
  • when a granularity level is selected, a group of users forms a group of items according to that level, and the fewest possible servers are then used to load their rendering tasks while meeting the three-dimensional resource capacity constraints.
  • the 3D bin packing problem is NP-hard and there are many approximate or heuristic algorithms that can solve it.
  • the traditional First-Fit algorithm can be used: when loading a new item, the algorithm scans the already-started servers in order and places the item on the first server that satisfies the capacity constraints; if no started server satisfies them, a new server is started.
  • when checking feasibility, the resource capacity constraints (9), (13), and (14) must be evaluated, taking into account the resource sharing captured in the mathematical model (17). Notably, the granularity decision algorithm of the present invention works with any bin packing algorithm (a code sketch of First-Fit loading follows below).
  • the server loading problem becomes a cardinality constrained bin packing problem, which is still NP-hard, and some approximate algorithms have been proposed to solve it.
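The following is a minimal sketch of First-Fit loading under the three-dimensional capacity ⟨p_I, p_K, p_B⟩; the item construction and data structures are illustrative assumptions. To respect resource sharing, each server tracks which instances and user groups it already hosts, so an item that shares them consumes no extra instance or task capacity.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """An item at some granularity: the instances/groups it brings, plus its users."""
    instances: set   # instance ids the item belongs to
    groups: set      # shared-viewpoint user-group ids (i.e., rendering tasks)
    n_users: int     # number of users in the item

@dataclass
class Server:
    instances: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    n_users: int = 0

def fits(server, item, p_I, p_K, p_B):
    # instances/groups already hosted on the server consume no extra capacity
    return (len(server.instances | item.instances) <= p_I
            and len(server.groups | item.groups) <= p_K
            and server.n_users + item.n_users <= p_B)

def first_fit(items, p_I, p_K, p_B):
    """Scan started servers in order; start a new one only when nothing fits."""
    servers = []
    for item in items:
        for s in servers:
            if fits(s, item, p_I, p_K, p_B):
                break
        else:
            s = Server()
            servers.append(s)
        s.instances |= item.instances
        s.groups |= item.groups
        s.n_users += item.n_users
    return servers

# user granularity: each user is an item with requirements <1,1,1>
items = [Item({"i1"}, {"k1"}, 1), Item({"i1"}, {"k1"}, 1), Item({"i1"}, {"k2"}, 1)]
print(len(first_fit(items, p_I=2, p_K=2, p_B=4)))  # -> 1 server, sharing i1/k1
```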
  • the sharing-aware user allocation algorithm includes the following steps (a code sketch follows the list):
  • Step S101: given the user set U of an instance and the set of edge sites V, obtain a collection of two-tuples, where the first element of each two-tuple is a site v and the second element is the subset R_v of U containing the users that meet the delay requirement at site v.
  • Step S102: from the collection, select the two-tuple with the smallest cost (denoted ⟨v*, A*⟩); A* becomes the partition block of the user set U corresponding to site v*.
  • Step S103: update the collection by deleting the users assigned in step S102 (namely A*) from every two-tuple, ensuring the selected two-tuples remain pairwise disjoint.
  • Step S104: remove the two-tuples that have become invalid from the collection.
  • Step S105: repeat steps S102 to S104 until the union of the selected two-tuples equals U.
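A minimal sketch of steps S101-S105 follows, assuming a caller-supplied cost function for the two-tuples; the patent's actual cost definition (with weight θ) is not reproduced here, so `tuple_cost` is a placeholder interface.

```python
def shared_aware_user_allocation(U, sites, R, tuple_cost):
    """Greedy set-partition heuristic (steps S101-S105).

    U: set of user ids of one instance
    sites: iterable of site ids
    R: dict site -> set of users of U meeting the delay requirement there (S101)
    tuple_cost: cost function c(site, user_subset) for a two-tuple <v, A>
    """
    remaining = set(U)
    collection = {v: set(R[v]) & remaining for v in sites}
    partition = {}
    while remaining:
        # S102: pick the two-tuple <v*, A*> with the smallest cost
        candidates = [(v, A) for v, A in collection.items() if A]
        if not candidates:
            break  # some users cannot be placed at any site
        v_star, A_star = min(candidates, key=lambda t: tuple_cost(*t))
        partition[v_star] = A_star
        remaining -= A_star
        # S103/S104: drop assigned users everywhere, discard emptied tuples
        collection = {v: A - A_star for v, A in collection.items()
                      if v != v_star and A - A_star}
    return partition  # S105: the loop ends once the blocks cover U
```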
  • the granular decision problem is modeled as a multi-armed bandit problem, which is solved using the action-value method in reinforcement learning.
  • Q(a) denote the value function of action a, which is used to evaluate the merits of choosing each action.
  • the value function Q(a) is obtained by computing the average reward obtained in the steps of choosing action a.
  • since the initial value of the value function Q(a) strongly influences convergence, Q(a) is first initialized and then used to select actions.
  • each action a is selected for m steps to initialize Q(a).
  • N(a) denote the number of times action a is selected.
  • the granularity decision algorithm includes the following steps (a code sketch follows the parameter discussion below):
  • Step S201: initialize the value function Q(a) as follows: take the three actions in turn at the user-group, instance, and user granularity levels; repeat until each action has been selected m times, ending the initialization phase; assign the average reward obtained by action a over its m steps to Q(a), and set N(a) = m.
  • Step S202: after the initialization phase ends, select the action with the highest value function Q(a) with probability 1−ε, and select an action at random with probability ε.
  • Step S203: use action a to perform server loading for the next k sequentially arriving instances; count the number of newly started servers during this period, and use its negation as the reward R obtained by action a.
  • Step S204: increment the counting function N(a) by 1.
  • Step S205: update the value function so that Q(a) remains the average reward of action a, i.e., Q(a) ← Q(a) + (R − Q(a))/N(a).
  • Step S206: repeat steps S202 to S205.
  • the parameter ε balances exploration and exploitation in reinforcement learning.
  • the parameter m affects the quality of the initialization and further affects the learning process. The larger m is, the closer the value function is to the real value, but too large m may slow down the convergence of the algorithm.
  • the parameter k is the number of instances offloaded at each step, which affects the quality of the reward. If k is too small (1 in the extreme case), the rewards of the actions are very similar or even all 0, making it impossible to distinguish good actions from bad ones; if k is too large, each update of the value function requires offloading many instances, which likewise slows the algorithm's convergence.
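A minimal sketch of this ε-greedy procedure (steps S201-S206) follows, assuming a caller-supplied `load_instances(action, k)` that performs server loading for the next k arriving instances and returns the number of newly started servers; that interface and the default parameter values are illustrative assumptions.

```python
import random

ACTIONS = ["user", "user_group", "instance"]  # the three granularity levels

def granularity_bandit(load_instances, m=3, k=50, eps=0.1, steps=100):
    """Epsilon-greedy multi-armed bandit over granularity levels (S201-S206)."""
    Q = {a: 0.0 for a in ACTIONS}  # action-value estimates (average reward)
    N = {a: 0 for a in ACTIONS}    # selection counts

    def play(a):
        reward = -load_instances(a, k)   # S203: negated count of new servers
        N[a] += 1                        # S204
        Q[a] += (reward - Q[a]) / N[a]   # S205: incremental average
        return reward

    # S201: initialization - cycle through the actions until each is played m times
    for _ in range(m):
        for a in ACTIONS:
            play(a)

    # S202/S206: exploit the best-valued action, explore with probability eps
    for _ in range(steps):
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=Q.get)
        play(a)
    return Q
```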
  • the online offloading method is introduced based on the sub-algorithms introduced above.
  • the present invention offloads instances one by one in the order they arrive. For each instance, the user allocation problem is solved first, followed by the server loading problem at each site.
  • a site may be unable to accommodate all the users assigned to it, because the number of servers per site is not considered during user allocation and resource shortages occur from time to time. Therefore, the sites are sorted in non-decreasing order of cost and server loading proceeds site by site, giving priority to the lowest-cost site.
  • the steps for offloading any one instance in the sharing-aware online offloading method are as follows (a code sketch follows the list):
  • Step S301: sort the sites in non-decreasing order of cost; if costs are equal, the site with more servers has higher priority.
  • let C denote the site ranked first, i.e., the current lowest-cost site.
  • Step S302: according to the site sequence obtained in step S301, let V' denote the set of all sites in the sequence starting from site C.
  • run the user allocation algorithm to obtain a set partition {U_v : v∈V'} of the user set U, where U_C is the block corresponding to the current site C.
  • Step S303: obtain the current optimal granularity level via the granularity decision algorithm.
  • Step S304: at the granularity level obtained in step S303, use the First-Fit algorithm to load servers at the current site C with the tasks of the users in U_C obtained in step S302, until the site runs out of servers.
  • Step S305: delete the users whose tasks were successfully loaded in step S304 from the instance's user set U.
  • Step S306: following the site sequence from step S301, take the site after the current site C as the new site C.
  • Step S307: repeat steps S302 to S306 until the tasks of all users in the user set U have been loaded or all sites have been tried.
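The per-instance loop (steps S301-S307) could look like the following sketch; `allocate_users`, `decide_granularity`, and `load_at_site` stand in for the sub-algorithms above and are assumed interfaces rather than the patent's literal API.

```python
def offload_instance(U, sites, allocate_users, decide_granularity, load_at_site):
    """Sharing-aware online offloading of one instance (steps S301-S307).

    U: set of the instance's users still to be placed
    sites: list of (site_id, cost, n_servers) tuples
    """
    # S301: sort by non-decreasing cost; ties broken by more servers first
    order = [s for s, _, _ in sorted(sites, key=lambda t: (t[1], -t[2]))]
    U = set(U)
    for idx, C in enumerate(order):          # S306/S307: advance through the sequence
        if not U:
            break                            # all user tasks loaded
        V_prime = order[idx:]                # S302: sites from C onward
        partition = allocate_users(U, V_prime)     # set partition {U_v : v in V'}
        U_C = partition.get(C, set())
        granularity = decide_granularity()         # S303: bandit's current choice
        loaded = load_at_site(C, U_C, granularity)  # S304: First-Fit until full
        U -= loaded                          # S305: drop successfully loaded users
    return U  # any users left over could not be placed at any site
```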
  • the sharing-aware offloading method (or called SAO) of the present invention takes resource sharing into consideration when offloading rendering tasks.
  • for comparison, a baseline offloading algorithm (called SBO) that does not consider resource sharing is used. The SBO algorithm uses different user allocation and server loading algorithms: for each instance, it assigns each user to the edge site with the lowest latency and the lowest cost; if costs are equal, the site with the most servers is preferred.
  • for server loading, the SBO algorithm uses the First-Fit algorithm at user granularity.
  • the method of the present invention is also compared with variants that perform server loading at user granularity (SAO-U), user-group granularity (SAO-G), and instance granularity (SAO-I).
  • the performance of the algorithm is evaluated using the cost reduction rate defined below:
  • the parameter settings are shown in Table 1. Instances are started sequentially and remain running until the end of the experiment. Each experimental result is obtained from 20 random simulation experiments.
  • the present invention can significantly reduce the cost.
  • the method of the present invention reduces the cost by up to 52% compared to the SBO algorithm.
  • the performance improvement comes from two aspects: 1) user allocation algorithm based on shared awareness; 2) granular decision algorithm.
  • the SAO-U algorithm reduces the cost by 50%, showing that even when server loading is performed at user granularity, resource sharing can reduce costs.
  • the method of the present invention outperforms the SAO-U algorithm by about two percentage points. This suggests that granular decision algorithms can further reduce costs.
  • Table 2 shows the convergence results of the granular decision-making algorithm in various situations through 20 random simulation experiments with 4000 instances.
  • the granular decision-making algorithm converges to the granularity level of instances in 16 experiments, and converges to the granularity level of user groups in 4 experiments.
  • the method (SAO) of the present invention can reduce the cost by as much as 30%.
  • the performance of the SAO-I algorithm is very poor, because
  • the performance of the SAO-U algorithm is the best, because the size of items with user granularity is small, and resources can be fully utilized.
  • the method of the present invention (SAO) achieves almost the best performance by learning.
  • the granularity decision algorithm converges to the granularity level of users in 16 experiments, and converges to the granularity level of user groups in 4 experiments. It successfully avoids the granularity level of the worst performing instance and almost converges to the best granularity level.
  • in a dynamic environment, when an instance arrives, the method of the present invention allocates resources for it; when the instance completes, the occupied resources are released for later reuse.
  • to evaluate performance in this dynamic environment, 4000 instances arrive and leave sequentially; inter-arrival times follow an exponential distribution with a mean of 500 ms, and instance running times are uniformly distributed between 10 and 20 minutes. Other parameter settings are shown in Table 1.
  • the experimental results are obtained through 20 random simulation experiments.
  • Figure 6(a) shows the average number of running instances in the system over time. In each simulation experiment, the number of running instances first increases as the instances arrive sequentially, then reaches a maximum before the first instance leaves, remains there for a while, and then decreases after the last instance arrives.
  • Figure 6(b) shows the average cost reduction rate of each algorithm over time.
  • the method (SAO) of the present invention reduces the cost by more than 40%, and learns the optimal granularity of server loading.
  • the server cost is compared under three values of the delay limit τ: 20 ms, 30 ms, and 40 ms. Since the SBO algorithm consumes a large number of servers when τ is 20 ms, in this set of experiments the number of servers per site is drawn from a uniform distribution between 100 and 150. As shown in Fig. 7, the cost of both the method of the present invention (SAO) and the SBO algorithm decreases as the delay limit grows: the higher the delay limit, the more users satisfy the latency requirement at each edge site, and the greater the opportunity for resource sharing and cost reduction.
  • Fig. 8 shows the cost reduction rate when the delay limit τ is 40 ms. Unlike the case where τ is 30 ms, the SAO-I algorithm performs poorly here: the higher the delay limit, the more users satisfy the latency requirement at each edge site, so more users may be clustered together after user allocation, and loading servers at instance granularity wastes resources because the items are large. The method of the present invention (SAO) achieves the best performance through learning: as shown in the third row of Table 2, in 20 random simulation experiments with 4000 instances, the granularity decision algorithm converges to the user-group granularity level in all experiments.
  • Figure 10 shows the cost reduction rate of the method of the present invention (SAO) under different values of θ.
  • in summary, the present invention proposes a sharing-aware online offloading method for the task offloading problem of multiple instances of a single application, which satisfies the delay requirement of each user in an instance while minimizing the cost of the servers used.
  • the present invention utilizes the characteristic of multi-dimensional resource sharing of tasks, and reduces resource consumption through resource sharing, thereby reducing system cost.
  • a sharing-aware user allocation algorithm is proposed to allocate users to edge sites so that each user's latency meets the requirement while resource sharing is maximized. The algorithm both satisfies users' latency requirements and aggregates users together, lowering server cost and increasing the chance of resource sharing among users.
  • an optimal level of granularity for loading tasks is learned from experience using reinforcement learning methods. Since the cost of server loading at different granularity levels is different, using the best granularity level can reduce the cost of server loading.
  • the present invention can be a system, method and/or computer program product.
  • a computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present invention.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the above.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within that device.
  • Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, or Python, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, via the Internet using an Internet service provider).
  • in one embodiment, an electronic circuit, such as a programmable logic circuit, field programmable gate array (FPGA), or programmable logic array (PLA), can be personalized with state information of the computer-readable program instructions and execute them so as to implement various aspects of the present invention.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium and cause computers, programmable data processing apparatuses, and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions constitutes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

Abstract

Disclosed in the present invention is a resource sharing-aware online task offloading method. The method comprises: sorting the sites to obtain a site sequence; pre-allocating users to the sites by solving a modeled user allocation problem, whose optimization goals are to meet each user's delay requirement and to maximize the opportunity for resource sharing; and, for each site, determining the servers on which the tasks of the obtained user set are to be loaded by solving a modeled server loading problem, whose optimization goals are to load the tasks of the users pre-allocated to each site using the fewest servers while meeting the servers' resource capacity limits. The present invention solves the task offloading problem of multiple instances of a single application, meets the delay requirement of each user in the instances, and minimizes the cost of the servers used.

Description

A resource sharing-aware online task offloading method

Technical Field

The present invention relates to the field of computer technology and, more specifically, to a resource sharing-aware online task offloading method.
Background

Cloud-based interactive applications place high demands on the user's network environment, making it difficult to meet users' delay requirements. By utilizing emerging mobile edge computing and 5G networks, the rendering tasks of interactive applications can be offloaded to edge servers close to users, reducing user latency.
Edge computing comes in the form of localized clouds. Telecom operators are upgrading the infrastructure of existing communication networks into mobile edge computing platforms, with access sites such as cellular radio base stations, aggregation sites such as those housing distributed antenna systems, and core sites such as central offices. These sites are equipped with computing and storage resources, cooling, power systems, and so on, and have been redesigned to accommodate edge servers. Since edge servers are close to users, mobile edge computing can reduce the response time of user requests.
Cloud-based interactive applications, such as virtual reality and cloud games, utilize cloud resources to handle computationally intensive tasks. This avoids the need for high-end hardware (usually expensive and energy-intensive) on user devices and keeps clients lightweight. However, cloud-based interactive applications require high-throughput, low-latency network connections. If the user is far from the data center, the user's low-latency requirement is difficult to satisfy. Leveraging emerging mobile edge computing, edge-assisted interactive applications offload computationally intensive 3D rendering tasks to mobile edge computing systems and stream the edge-rendered video to end users via 5G connections. Since edge servers are close to end users, this approach can greatly reduce latency.
As shown in Figure 1, an edge-assisted interactive application system consists of cloud servers, edge servers, and user devices. The cloud server hosts the core logic of the application; the edge server is responsible for rendering and encoding; and the user device is responsible for decoding and display. The three form an information loop: the cloud server generates rendering commands and transmits them to the edge server; the edge server renders, generates video frames, and transmits them to the user device; and the user device generates control commands and transmits them to the cloud server, which updates the application logic upon receiving them.
In the prior art, task offloading in edge computing can be divided into three categories according to the optimization objective: minimizing task delay, minimizing system cost, and minimizing both simultaneously. Although these solutions reduce task delay or system cost in different ways, none of them considers the multi-dimensional resource sharing characteristic of tasks; exploiting resource sharing between tasks can effectively reduce resource consumption and thereby reduce system cost.
Take running a rendering task as an example: it involves multiple types of resources, such as storage, memory, GPU memory, CPU, GPU, and bandwidth. With the exception of bandwidth, each type of resource can be shared by multiple users served by the same server. By exploiting the multi-dimensional resource sharing of rendering tasks, users who share resources can be assigned to the same server, saving resource consumption and cost. See the example in Figure 2. First, for one application (a multiplayer 3D chess game in the example), the rendered assets (the board and pieces) are stored in storage and can be shared by multiple application instances; an instance is one execution of the application, e.g., one game of chess with two cooperating users on each side. Second, within an instance, the cached data can be shared by all rendering tasks of that instance. Third, some users keep the same viewpoint throughout the run of the application, so these users can share the rendering task executed by the CPU and GPU; in the example, user 1 and user 2 share a viewpoint and therefore share rendering task 1. Using this multi-dimensional resource sharing, the users of an application instance can be allocated together to save resource consumption.
The multiple users of an application instance are usually located at different positions in the network. Given several edge sites equipped with servers, each user's rendering task must be offloaded to a server at some edge site. Since interactive applications are latency-sensitive, the edge site chosen for offloading must keep every user's latency within the requirement. In addition, server costs may differ across sites. A new task offloading method is therefore needed that exploits the multi-dimensional resource sharing of rendering tasks to minimize the cost of the servers used, or, when server costs are equal, to minimize the number of servers used.
Summary of the Invention

The purpose of the present invention is to overcome the above defects of the prior art and to provide a resource sharing-aware online task offloading method. The method includes:
sorting the sites to obtain a site sequence, where C denotes the site ranked first;
performing the following steps until the tasks of the users in the user set have been loaded or all target sites have been traversed: based on the instance's user set U and the target site set V', where V' denotes the set of all sites in the sequence starting from site C, pre-allocating users to sites by solving the modeled user allocation problem, whose optimization goal is to meet each user's delay requirement while maximizing the opportunity for resource sharing; and, at the current site C, for the user tasks in the obtained user set, determining the servers to be loaded by solving the modeled server loading problem, whose optimization goal is to load the tasks of the users pre-allocated to each site using the fewest servers while satisfying the servers' resource capacity constraints.
Compared with the prior art, the present invention has the advantage of exploiting the multi-dimensional resource sharing of tasks to provide a new technical solution for offloading rendering tasks, which both meets each user's delay requirement and minimizes the cost of the servers used; the cost may be rental cost, energy consumption, etc.
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Figure 1 is a schematic diagram of the system architecture of an edge-assisted interactive application in the prior art;
Figure 2 is an example of resource sharing in the prior art;
Figure 3 is an example of user allocation according to an embodiment of the present invention;
Figure 4 is a schematic diagram showing that the cost reduction rate increases as the number of instances increases, according to an embodiment of the present invention;
Figure 5 is a schematic diagram of the cost reduction rate in the case of equal server costs, according to an embodiment of the present invention;
Figure 6 is a schematic diagram of performance in a dynamic environment, according to an embodiment of the present invention;
Figure 7 is a schematic diagram of costs under various delay limit values, according to an embodiment of the present invention;
Figure 8 is a schematic diagram of the cost reduction rate when the delay limit is 40 ms, according to an embodiment of the present invention;
Figure 9 is a schematic diagram of the empirical cumulative distribution function of the cardinality of set partition blocks under different weight values θ, according to an embodiment of the present invention;
Figure 10 is a schematic diagram of cost reduction rates under different weight values θ, according to an embodiment of the present invention.
Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specifically stated, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application, or its uses.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, they should be considered part of the description.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary rather than limiting; other instances of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following figures; once an item is defined in one figure, it need not be discussed further in subsequent figures.
The present invention proposes a sharing-aware online task offloading method for optimizing the task offloading problem of multiple instances of a single application. The goal is to satisfy each user's delay requirement while minimizing the cost of the servers used. The present invention alternately and iteratively solves two sub-problems: the user allocation problem and the server loading problem.
The user allocation problem is to allocate users to edge sites so that each user's latency satisfies the requirement while the opportunity for resource sharing is maximized. In one embodiment, the user allocation problem is modeled as a set partitioning problem, and a heuristic algorithm is proposed to solve it.
The server loading problem is, at each site, to load the rendering tasks of the users pre-allocated to that site using the fewest servers while satisfying the servers' resource capacity limits. In one embodiment, the server loading problem is modeled as a multi-dimensional bin packing problem. However, the shared nature of the resources complicates server loading. It is therefore further proposed that each user can be regarded as an item, or a group of users sharing a viewpoint can be regarded as an item, or the users belonging to the same instance can be regarded as an item, and the bin packing problem is then solved over these items. The three loading methods differ in granularity, and hence in loading scheme and resulting cost; there are thus three granularity levels: the single-user level, the shared-viewpoint user-group level, and the instance level. To find the granularity level that achieves the lowest cost, the granularity decision problem is modeled as a multi-armed bandit problem and solved with a reinforcement learning algorithm.
In the following, user delay is analyzed first; the task offloading problem, the user allocation problem, and the server loading problem are then introduced; and the sharing-aware online offloading method and its two sub-methods, the sharing-aware user allocation algorithm and the granularity decision algorithm, are described in detail.
1) User delay analysis
The offloading location of a rendering task affects user latency. As shown in Figure 1, in edge-assisted interactive applications, user latency consists of the uplink latency of transmitting control commands and the downlink latency of transmitting rendering commands and rendered video frames. The offloading location is related to the downlink delay and not to the uplink delay; therefore, the downlink delay is taken as the example below.
Suppose user u's rendering task is offloaded to edge site v, and denote its downlink delay by t_uv. Let r and f denote the average data sizes of a rendering command and a video frame, respectively, and let b_v and d_v denote the downlink throughput and delay from the cloud server to edge site v. The delay of transmitting a rendering command is then:

    r/b_v + d_v    (1)

Let b_vu and d_vu denote the downlink throughput and delay from edge site v to user device u. The delay of transmitting a video frame is then:

    f/b_vu + d_vu    (2)

From formulas (1) and (2), the downlink delay is:

    t_uv = r/b_v + d_v + f/b_vu + d_vu    (3)
Formula (3) shows that the downlink delay depends on the location of v. Therefore, when offloading rendering tasks, it is necessary to ensure that each user's delay requirement is met.
2) Task offloading problem modeling
Given an application and a set of instances (denoted by I), each instance is one execution of the application, participated in by one or more users. In addition, a set of edge servers (denoted by S) is given, distributed across a set of edge sites (denoted by V). Assuming that all edge servers have the same hardware configuration, the cost of an edge server depends on the location of the edge site where it resides. The goal of the present invention is to offload the rendering tasks of these instances onto edge servers while minimizing the total cost. The following modeling can be extended to scenarios with heterogeneous edge servers, e.g., servers with different hardware configurations or capacities, and the proposed method also applies to such scenarios.
The task assignment problem is defined as follows: given a set of users U and a set of edge servers S, assign each user in U to an edge server in S such that each user's delay requirement is satisfied and the resources on each edge server stay within its capacity, while minimizing the total cost of the edge servers used. For each user, the rendering task serving that user runs on the assigned server. On each server, all users sharing a viewpoint share the same rendering task.
Let the Boolean variable x_uj denote whether server j hosts the rendering task of user u. Each user must be assigned to exactly one server:

∑_{j∈S} x_uj = 1, ∀u ∈ U  (4)
As noted above, user latency depends on the site to which the task is offloaded. Let t_uv denote the network delay incurred by offloading user u's task to site v. For any server j in site v, t_uj = t_uv. The latency of user u is therefore ∑_{j∈S} t_uj x_uj, and it is capped by a maximum tolerable value, denoted by τ:

∑_{j∈S} t_uj x_uj ≤ τ, ∀u ∈ U  (5)
Next, the resource constraints are modeled. Each edge server is assumed to have enough storage to host the application; otherwise, servers unable to host any rendering task are simply ignored. Let the vector p = (p_C, p_G, p_M, p_GM, p_B) denote the CPU, GPU, memory, GPU-memory, and bandwidth capacity of each server. For simplicity, each rendering task is assumed to require one unit of each resource. The multi-dimensional resource sharing of rendering tasks complicates the modeling of resource constraints, so auxiliary variables are introduced.
Suppose the users in U are divided into shared-view user groups, denoted by K. To model the CPU and GPU constraints, a Boolean variable y_kj is introduced to indicate whether server j hosts the rendering task of shared-view user group k. The number of tasks on server j is then ∑_{k∈K} y_kj. Because of possible task sharing, this number may be smaller than the number of users assigned to the server (i.e., ∑_{u∈U} x_uj). Let the Boolean variable w_j denote whether server j has been started. The capacity constraints can then be expressed as:

∑_{k∈K} y_kj ≤ p_C w_j, ∀j ∈ S  (6)

∑_{k∈K} y_kj ≤ p_G w_j, ∀j ∈ S  (7)
The two constraints above can be merged. Let p_K denote the maximum number of tasks allowed per server:

p_K = min{p_C, p_G}  (8)

Constraints (6) and (7) are then replaced by:

∑_{k∈K} y_kj ≤ p_K w_j, ∀j ∈ S  (9)
Constraint (9) ensures that if w_j is 0, all variables y_kj must be 0.
To model the memory and GPU-memory constraints, a Boolean variable z_ij is introduced, denoting whether server j hosts instance i ∈ I. The capacity constraints can then be expressed as:

∑_{i∈I} z_ij ≤ p_M w_j, ∀j ∈ S  (10)

∑_{i∈I} z_ij ≤ p_GM w_j, ∀j ∈ S  (11)
Since any two instances have separate memory and GPU memory, the sum runs over all instances. Constraints (10) and (11) can likewise be merged. Let p_I denote the maximum number of instances allowed per server:

p_I = min{p_M, p_GM}  (12)

Constraints (10) and (11) are then replaced by:

∑_{i∈I} z_ij ≤ p_I w_j, ∀j ∈ S  (13)
Unlike the resources above, bandwidth is assumed not to be shared among users. The bandwidth constraint is therefore:

∑_{u∈U} x_uj ≤ p_B w_j, ∀j ∈ S  (14)
Note that the variables above are interrelated. First, x_uj and y_kj must satisfy:

x_uj ≤ y_k(u),j, ∀u ∈ U, ∀j ∈ S  (15)
where k(u) denotes the shared-view user group to which user u belongs. This means x_uj can be 1 only if y_k(u),j is 1; that is, server j hosting user group k(u) is a necessary condition for server j hosting user u. A similar relationship holds between y_kj and z_ij: if server j hosts user group k, it must host the instance to which that group belongs. Let i(k) denote the instance containing user group k; then:

y_kj ≤ z_i(k),j, ∀k ∈ K, ∀j ∈ S  (16)
Finally, constraint (13) yields the necessary condition z_ij ≤ w_j.
The goal of the present invention is to minimize the total cost of the servers used, e.g., the total fee the application provider pays to the edge computing service provider. Let c_v denote the cost of each server in site v; for any server j in site v, c_j = c_v. The objective is to minimize ∑_{j∈S} c_j w_j. The task assignment problem is therefore modeled as the following Boolean linear program:

minimize ∑_{j∈S} c_j w_j
subject to constraints (4), (5), (9), (13), (14), (15), and (16),
x_uj, y_kj, z_ij, w_j ∈ {0, 1}, ∀u ∈ U, ∀k ∈ K, ∀i ∈ I, ∀j ∈ S  (17)
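As a sanity check, model (17) can be written down directly with an off-the-shelf MILP front end. The sketch below uses PuLP as an assumed choice of library; U, K, I, S, t, tau, p_K, p_I, p_B, c, k_of, and i_of stand for the sets, per-server delays t_uj, capacities, costs, and the user-to-group and group-to-instance maps defined above.

```python
# Sketch: Boolean linear program (17) in PuLP (assumed solver front end).
import pulp

def build_blp(U, K, I, S, t, tau, p_K, p_I, p_B, c, k_of, i_of):
    m = pulp.LpProblem("task_assignment", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", (U, S), cat=pulp.LpBinary)
    y = pulp.LpVariable.dicts("y", (K, S), cat=pulp.LpBinary)
    z = pulp.LpVariable.dicts("z", (I, S), cat=pulp.LpBinary)
    w = pulp.LpVariable.dicts("w", S, cat=pulp.LpBinary)
    m += pulp.lpSum(c[j] * w[j] for j in S)                  # objective
    for u in U:
        m += pulp.lpSum(x[u][j] for j in S) == 1             # (4)
        m += pulp.lpSum(t[u][j] * x[u][j] for j in S) <= tau # (5)
    for j in S:
        m += pulp.lpSum(y[k][j] for k in K) <= p_K * w[j]    # (9)
        m += pulp.lpSum(z[i][j] for i in I) <= p_I * w[j]    # (13)
        m += pulp.lpSum(x[u][j] for u in U) <= p_B * w[j]    # (14)
        for u in U:
            m += x[u][j] <= y[k_of[u]][j]                    # (15)
        for k in K:
            m += y[k][j] <= z[i_of[k]][j]                    # (16)
    return m
```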
3) Modeling the user assignment problem
The user assignment problem is to assign users to edge sites such that every user's latency requirement is satisfied while resource sharing is maximized. Ultimately, the given user set is partitioned into disjoint subsets, and each subset is assigned to one site. Different partitions lead to different resource consumption and hence different costs. Finding the best partition is a set partitioning problem.
Figure 3 illustrates the user assignment problem. Given are 4 edge sites (squares) and 8 users (circles). All users belong to the same instance but to 2 different shared-view user groups, distinguished by solid and hollow circles. The relationship between users and sites is represented by a bipartite graph: an edge exists between a user and a site if the user's latency requirement is satisfied at that site. Light gray edges denote feasible assignments, and dark gray edges denote actual assignments. Each site has a set of users connected to it; the set connected to site v is denoted by R_v. For example, in Fig. 3(a), R_3 = {u_2, u_3, u_4, u_7}. The users assigned to site v must have their latency requirements satisfied, so they must form a subset of R_v. Moreover, each user must be assigned to some site. Therefore, the user sets assigned to the sites must be pairwise disjoint, and their union must equal the entire user set; that is, the result of user assignment is a set partition. For example, the user assignment scheme shown in Fig. 3(b) is one set partition, and the scheme {{u_1, u_3}, {u_2, u_5, u_6}, {u_4, u_7}, {u_8}} in Fig. 3(c) is another. Both assignments are valid but may incur different costs. The assignment in Fig. 3(b) requires 3 rendering tasks, running in sites v_1, v_3, and v_4, because in site v_3 users u_2, u_3, and u_4 share a viewpoint and thus share one rendering task; the same holds for the users assigned to site v_4. In contrast, the assignment in Fig. 3(c) requires 1, 2, 2, and 1 rendering tasks at the respective sites, 6 in total. The more tasks, the higher the cost.
The user assignment problem is now defined formally. Let U denote the user set of a single instance, as distinct from the earlier use of U for the user set of all instances. Given the user set U of an instance and a set of edge sites V, the user assignment problem is to find a partition of U. A partition corresponds to an indexed family of subsets of U with index set V, denoted {U_v : v ∈ V}, satisfying the following two constraints.
1) The union of all subsets equals U:

∪_{v∈V} U_v = U  (18)

2) All subsets are pairwise disjoint:

U_v ∩ U_{v'} = ∅, ∀v, v' ∈ V, v ≠ v'  (19)
A partition block U_v corresponds to the set of users assigned to site v.
The partition must ensure that each user is assigned to an edge site and that the user's latency requirement is satisfied at the assigned site. Let R_v denote the set of all users in U whose latency requirement is met at site v:

R_v = {u ∈ U | t_uv ≤ τ}  (20)

The user set U_v assigned to edge site v must therefore satisfy:

U_v ⊆ R_v, ∀v ∈ V  (21)
To bring this constraint into the set partitioning model, a two-tuple ⟨v, A⟩ is introduced, denoting a site v together with a feasible set A ⊆ R_v of that site. This yields the set of candidate two-tuples (written P below):

P = {⟨v, A⟩ | v ∈ V, A ⊆ R_v}  (22)
The problem then becomes: given U and P, select a subset of P whose second elements form a partition of U, i.e., the second elements of the chosen two-tuples satisfy constraints (18) and (19).
Many partitions of U can be obtained from P. However, different partitions lead to different kinds of resource sharing and thus different resource (memory and processor) consumption. For example, in a partition where every user is assigned to a different edge site, there is no resource sharing, resulting in high resource consumption. In contrast, in a partition where all users are assigned to the same edge site, resources can be shared if the users' rendering tasks are loaded onto the same server, reducing resource consumption.
Given the set of candidate two-tuples P defined in formula (22), sets A with large cardinality are preferred, since they offer greater opportunity for resource sharing; low-cost edge sites are also preferred. To satisfy both requirements at once, the cost of a two-tuple ⟨v, A⟩ ∈ P is defined as:

c(⟨v, A⟩) = c_v |A|^(−θ)  (23)

The lower the cost, the better. θ is the weight of |A|, balancing the server cost against the cardinality of the set; the value of θ affects the degree of resource sharing.
The user assignment problem for a single instance is therefore defined as: given the user set U of an instance, the set P defined in formula (22), and a cost function c: P → ℝ, find a minimum-cost subset of P that forms a partition of U. This is a set partitioning problem. The modeling ensures that each user is assigned to some edge site and that the user's latency requirement is met at that site, i.e., the first two constraints of mathematical model (17) are satisfied. A sharing-aware user assignment algorithm for this problem is introduced below.
4) The server loading problem
Once users have been assigned to edge sites, the server loading problem is solved. For each site, given the set of users assigned to it, the optimization goal of the present invention is to load the users' rendering tasks using as few servers as possible. According to mathematical model (17), each server has three resource capacity constraints: a limited number of instances, a limited number of shared-view user groups, and a limited number of users. The server loading problem can therefore be modeled as a three-dimensional bin packing problem.
Resource sharing complicates the server loading problem, since loading some users' tasks together may reduce resource consumption. In one embodiment, each user is treated as an item with resource demand ⟨1, 1, 1⟩, packed into a bin of capacity ⟨p_I, p_K, p_B⟩; the elements of the triple correspond to the number of instances, tasks, and users, respectively. Alternatively, the users of one shared-view user group can be combined into a single item with resource demand ⟨1, 1, n_B⟩, where n_B is the number of users in the group. Similarly, the users of one instance can be combined into a single item with resource demand ⟨1, n_K, n_B⟩, where n_K and n_B are the numbers of tasks (i.e., user groups) and users in the instance. Different granularity levels may incur different costs: a coarse granularity (e.g., instance) maximizes resource sharing but may waste resources because the items are large, whereas a fine granularity (e.g., user) does not maximize resource sharing but may utilize resources more fully because the items are small. The task is thus to find the granularity level that minimizes the loading cost.
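To make the three granularity levels concrete, the sketch below shows one way the users assigned to a site could be turned into bin-packing items; the (user, group, instance) tuple layout is an assumption for illustration.

```python
# Sketch: build packing items at the three granularity levels.
# `users` is a list of (user_id, group_id, instance_id) tuples; the
# returned triples are the <instances, tasks, users> demands from above.
from collections import defaultdict

def make_items(users, level):
    if level == "user":        # one item per user: <1, 1, 1>
        return [(1, 1, 1) for _ in users]
    if level == "group":       # one item per shared-view group: <1, 1, n_B>
        groups = defaultdict(int)
        for _, g, i in users:
            groups[(i, g)] += 1
        return [(1, 1, n) for n in groups.values()]
    if level == "instance":    # one item per instance: <1, n_K, n_B>
        insts = defaultdict(lambda: [set(), 0])
        for _, g, i in users:
            insts[i][0].add(g)
            insts[i][1] += 1
        return [(1, len(gs), n) for gs, n in insts.values()]
    raise ValueError(f"unknown granularity level: {level}")
```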
Simulation experiments show that the best granularity level depends on many factors, such as the server capacity ⟨p_I, p_K, p_B⟩, the delay limit τ, and whether the server costs of the edge sites are equal. The best granularity level differs across situations, and a closed-form decision rule is hard to derive analytically. Therefore, in one embodiment, a reinforcement learning method is used to learn the best granularity level from experience.
Specifically, the granularity decision problem is modeled as a multi-armed bandit problem. Given three actions (granularity levels), at each step one action is selected and used for server loading of the next k instances offloaded in sequence; this process repeats at every step. Some servers are started to load the instances, and the fewer servers started, the better. The reward of an action (denoted by R) is therefore defined as the negative of the number of newly started servers (denoted by C), i.e., R = −C. The goal of the present invention is to maximize the expected total reward over the long run of action selection. The granularity decision algorithm for this problem is detailed below.
Once a granularity level is selected, the set of users is formed into items at that granularity, and their rendering tasks are loaded using as few servers as possible while satisfying the three-dimensional capacity constraints. The three-dimensional bin packing problem is NP-hard, and many approximation and heuristic algorithms exist for it. For example, the classic First-Fit algorithm can be used: when loading a new item, the algorithm scans the started servers in order and loads the item onto the first server that satisfies the capacity constraints; if no started server does, a new server is started. To decide whether an item fits a server's capacity, the resource capacity constraints (9), (13), and (14) must be evaluated, taking into account the resource sharing captured in mathematical model (17). Notably, the granularity decision algorithm of the present invention works with any bin packing algorithm.
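A minimal First-Fit sketch under the three-dimensional capacity ⟨p_I, p_K, p_B⟩ follows. To reflect the sharing in model (17), items here carry the identities of the instances and groups they contain (rather than bare counts), so a server only pays instance and task capacity for instances and groups it does not already host; the data layout is an assumption for illustration, not the patent's exact implementation.

```python
# First-Fit with sharing-aware capacity checks (a sketch).
# Each item is (inst_ids, group_ids, n_users); each server tracks
# the instances and groups it already hosts.

def first_fit(items, p_I, p_K, p_B):
    servers = []
    for inst_ids, group_ids, n_users in items:
        placed = False
        for s in servers:                      # scan started servers in order
            new_i = len(inst_ids - s["instances"])
            new_k = len(group_ids - s["groups"])
            if (len(s["instances"]) + new_i <= p_I
                    and len(s["groups"]) + new_k <= p_K
                    and s["users"] + n_users <= p_B):
                s["instances"] |= inst_ids     # shared instances cost nothing
                s["groups"] |= group_ids       # shared tasks cost nothing
                s["users"] += n_users          # bandwidth is never shared
                placed = True
                break
        if not placed:                         # no fit: start a new server
            servers.append({"instances": set(inst_ids),
                            "groups": set(group_ids),
                            "users": n_users})
    return servers
```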
Consider a special case of the above problem in which each application instance has exactly one user group. User groups and instances then correspond one-to-one, the two variables y_kj and z_ij in mathematical model (17) coincide, and the two corresponding capacity constraints collapse into one:

∑_{i∈I} z_ij ≤ min{p_K, p_I} w_j, ∀j ∈ S  (24)
As a result, the server loading problem becomes a cardinality-constrained bin packing problem, which is still NP-hard; several approximation algorithms have been proposed for it.
The algorithms used to solve the above modeling problems are introduced below.
1) The sharing-aware user assignment algorithm
For the user assignment problem modeled as a set partitioning problem above, the set of two-tuples P defined in formula (22) has exponential cardinality, so solving the problem directly is inefficient. Therefore, in one embodiment, an efficient heuristic (the sharing-aware user assignment algorithm) is proposed. Rather than selecting a partition from the exponentially large set P, the algorithm starts from a subset P̂ of P:

P̂ = {⟨v, R_v⟩ | v ∈ V}  (25)

The set P̂ has only |V| elements.
Specifically, the sharing-aware user assignment algorithm proceeds as follows:
Step S101: from the given user set U of an instance and the set of edge sites V, build the two-tuple set P̂. The first element of each two-tuple is a site v, and the second element is the subset R_v of the user set U, namely the users whose delay requirement is met at site v.
Step S102: from the set P̂, select the two-tuple with the smallest cost c(⟨v, A⟩) = c_v |A|^(−θ), denoted ⟨v*, A*⟩, as the partition block of the user set U assigned to site v*.
Step S103: update the set P̂ by removing the users already assigned in step S102 (i.e., A*) from every two-tuple, ensuring that the selected two-tuples are pairwise disjoint.
Step S104: remove the invalid two-tuples ⟨v, ∅⟩, i.e., those whose user set has become empty, from the set P̂.
Step S105: repeat steps S102 to S104 until the union of the selected two-tuples equals U.
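A compact Python sketch of steps S101-S105 is given below. R maps each site to R_v, cost_v to c_v, and theta is the weight from formula (23); all are assumed inputs, and the sketch assumes every user meets the delay bound at at least one site.

```python
# Sketch of the sharing-aware user assignment heuristic (S101-S105).

def assign_users(U, V, R, cost_v, theta):
    P = {v: set(R[v]) & set(U) for v in V}     # S101: pairs <v, R_v>
    P = {v: A for v, A in P.items() if A}
    partition, unassigned = {}, set(U)
    while unassigned:
        # S102: pick the pair with minimum cost c_v / |A|^theta
        v_star = min(P, key=lambda v: cost_v[v] / len(P[v]) ** theta)
        A_star = P.pop(v_star)
        partition[v_star] = A_star
        unassigned -= A_star
        for v in list(P):                      # S103: drop assigned users
            P[v] -= A_star
            if not P[v]:                       # S104: remove invalid pairs
                del P[v]
    return partition                           # S105: loop until U covered
```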
2) The granularity decision algorithm
In one embodiment, the granularity decision problem is modeled as a multi-armed bandit problem and solved with the action-value method from reinforcement learning. Let Q(a) denote the value function of action a, used to evaluate how good selecting each action is. Q(a) is computed as the average reward obtained over the steps in which action a was selected. The initial value of Q(a) strongly affects convergence, so Q(a) is first initialized and then used to select actions. In the initialization phase, action a is selected for m steps to initialize Q(a). Let N(a) denote the number of times action a has been selected.
Specifically, the granularity decision algorithm includes the following steps:
Step S201: initialize the value function Q(a) as follows. In each step, take the three actions (user-group granularity, instance granularity, and user granularity) in turn. Repeat until every action has been selected m times, which ends the initialization phase. Assign the average reward obtained by action a over its m steps to Q(a), and assign m to N(a).

Step S202: after the initialization phase, select the action with the highest value Q(a) with probability 1−ξ, and select an action at random with probability ξ.

Step S203: use action a to perform server loading for the next k arriving instances. Count the number of servers newly started during this period, and take its negative as the reward R obtained by action a.

Step S204: increment the counter N(a) by 1.

Step S205: update the value function Q(a) using the counter N(a), i.e., Q(a) = Q(a) + [R − Q(a)]/N(a).

Step S206: repeat steps S202 to S205.
In the above algorithm, the parameter ξ balances exploration and exploitation in reinforcement learning. The parameter m affects the quality of initialization and, in turn, the learning process: the larger m is, the closer the value function gets to its true value, but too large an m may slow the algorithm's convergence. The parameter k is the number of instances offloaded per step, and it affects the quality of the reward. If k is too small (1 in the extreme case), the rewards of all actions are so similar, or even all zero, that the actions cannot be distinguished; if k is too large, each value-function update requires offloading a large number of instances, which also slows convergence.
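The steps above amount to an ε-greedy action-value method with incremental averaging. A sketch follows; `load_next_k` stands for whatever routine performs server loading for the next k instances at a given granularity and returns the number of newly started servers — it is a placeholder, not a fixed API, and the sequential initialization is a simplification of S201.

```python
# Sketch of the granularity decision algorithm (S201-S206).
import random

ACTIONS = ("user", "group", "instance")

def granularity_bandit(load_next_k, k, m, xi, n_steps):
    Q = {a: 0.0 for a in ACTIONS}
    N = {a: 0 for a in ACTIONS}
    for a in ACTIONS:                          # S201: initialization phase
        rewards = [-load_next_k(a, k) for _ in range(m)]
        Q[a], N[a] = sum(rewards) / m, m
    for _ in range(n_steps):
        if random.random() < xi:               # S202: explore with prob. xi
            a = random.choice(ACTIONS)
        else:                                  # ... exploit with prob. 1-xi
            a = max(ACTIONS, key=Q.get)
        R = -load_next_k(a, k)                 # S203: reward = -new servers
        N[a] += 1                              # S204
        Q[a] += (R - Q[a]) / N[a]              # S205: incremental update
    return Q                                   # S206 is the loop itself
```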
3) The sharing-aware online offloading method
Next, the online offloading method is introduced, built from the sub-algorithms above. The present invention offloads instances one by one in their order of arrival. For each instance, the user assignment problem is solved first, and then the server loading problem is solved for each site. However, a site may be unable to accommodate all the users assigned to it, because user assignment does not account for the number of servers at each site. Therefore, the sites are sorted in non-decreasing order of cost, and server loading is performed site by site. The aim is to use the cheapest sites first; among sites of equal cost, the site with more servers has priority, to avoid running out of resources during server loading.
Specifically, the sharing-aware online offloading method offloads any one instance by the following steps:
Step S301: sort the sites in non-decreasing order of cost; among sites of equal cost, the site with more servers has higher priority. Let C denote the site with the lowest current cost.

Step S302: based on the site sequence from step S301, let V' denote the set of all sites in the sequence starting from site C. Given the instance's user set U and the site set V', the user assignment algorithm produces a set partition {U_v : v ∈ V'} of U, in which the user set corresponding to the current site C is U_C.

Step S303: obtain the current best granularity level from the granularity decision algorithm.

Step S304: using the granularity level from step S303, run the First-Fit algorithm at the current site C to load the tasks of the users in U_C from step S302 onto servers, until the servers run out.

Step S305: remove the users whose tasks were successfully loaded in step S304 from the instance's user set U.

Step S306: following the site sequence from step S301, take the site after the current site C as the new site C.

Step S307: repeat steps S302 to S306 until the tasks of all users in U have been loaded or all sites have been tried.
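Putting the pieces together, a high-level sketch of offloading one arriving instance (steps S301-S307) might look as follows. `assign_users` is the assignment sketch above, `pick_level` stands for the granularity decision, and `load_at_site` for sharing-aware First-Fit at one site, returning the users whose tasks were placed; all three are assumed interfaces, so this is illustrative glue rather than the exact control flow of the invention.

```python
# Sketch of the sharing-aware online offloading of one instance (S301-S307).
# `sites` maps each site to its number of servers.

def offload_instance(U, sites, R, cost_v, theta, pick_level, load_at_site):
    # S301: non-decreasing cost; ties broken by more servers first
    order = sorted(sites, key=lambda v: (cost_v[v], -sites[v]))
    remaining = set(U)
    for idx, C in enumerate(order):            # S306/S307: walk the sequence
        if not remaining:
            break
        V_prime = order[idx:]                  # S302: sites from C onward
        partition = assign_users(remaining, V_prime, R, cost_v, theta)
        U_C = partition.get(C, set())
        level = pick_level()                   # S303: current best granularity
        loaded = load_at_site(C, U_C, level)   # S304: first fit until full
        remaining -= set(loaded)               # S305: drop placed users
    return remaining                           # users left unplaced, if any
```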
In summary, the sharing-aware offloading method of the present invention (SAO) takes resource sharing into account when offloading rendering tasks. By comparison, the sharing-blind offloading algorithm (SBO) solves the user assignment and server loading subproblems alternately and iteratively just like the method of the present invention, but uses different user assignment and server loading algorithms. For each instance, SBO assigns each user to the cheapest edge site at which the delay requirement is met; among sites of equal cost, the site with the most servers has priority. For server loading, SBO uses the First-Fit algorithm at user granularity. Note that for either method, when First-Fit decides whether an item (a single user, a group of users in one shared-view user group, or a group of users in one instance) fits a server's capacity, the resource sharing in mathematical model (17) must be taken into account.
Further, to verify the effect of the present invention, simulation experiments were conducted comparing the present invention against server loading at user granularity (SAO-U), at user-group granularity (SAO-G), and at instance granularity (SAO-I). Algorithm performance is evaluated with the cost reduction rate, defined relative to the SBO baseline as:

cost reduction rate = (C_SBO − C) / C_SBO × 100%

where C is the server cost incurred by the evaluated algorithm and C_SBO the cost incurred by SBO.
In the simulation experiments, unless stated otherwise, the parameters are set as shown in Table 1. Instances are started sequentially and keep running until the end of the experiment. Each experimental result is obtained over 20 randomized simulation runs. The algorithm parameters are set as follows: θ = 1, ξ = 0, k = 200, m = 1. Consequently, in the granularity decision algorithm, initializing the value function requires 3mk = 600 instances. In addition, the granularity decision algorithm starts initialization from the second step, because at the first step no servers have been started yet, and starting from zero could bias the rewards considerably.
Table 1. Parameter settings of the simulation environment

Parameter | Value
Number of shared-view user groups per instance | 2
Number of users per shared-view user group | 4
Number of edge sites | 50
Number of servers per edge site | uniform on [50, 100]
Server cost at each edge site (c_v) | uniform on [1, 10]
⟨p_I, p_K, p_B⟩ | ⟨5, 10, 20⟩
Delay t_uv (ms) | uniform on [10, 50]
Delay limit τ (ms) | 30
1) Cost reduction
Compared with the SBO algorithm, the present invention significantly reduces cost. As shown in Fig. 4, the method of the present invention (SAO) cuts cost by up to 52% relative to SBO. The performance gain comes from two sources: 1) the sharing-aware user assignment algorithm, and 2) the granularity decision algorithm. First, compared with SBO, the SAO-U algorithm reduces cost by 50%, showing that even at the same user granularity for server loading, exploiting resource sharing lowers cost. Furthermore, the method of the present invention (SAO) outperforms SAO-U by about two percentage points, showing that the granularity decision algorithm reduces cost further.
Table 2 shows the convergence results of the granularity decision algorithm under various conditions, over 20 randomized simulation runs with 4000 instances each.

Table 2. Experimental results

Environment parameters | Instance | User group | User
Default delay limit τ, default server costs | 16 | 4 | 0
Default delay limit τ, equal server costs | 0 | 4 | 16
Delay limit τ = 40 ms, default server costs | 0 | 20 | 0
As the first row of Table 2 shows, over the 20 randomized runs with 4000 instances, the granularity decision algorithm converged to instance granularity in 16 runs and to user-group granularity in 4 runs.
Next, the costs of all servers were set to the same value. As Fig. 5 shows, the method of the present invention (SAO) reduces cost by as much as 30%. In this setting, the SAO-I algorithm performs very poorly: with equal server costs, |A| dominates in formula (23), so the users of an instance are clustered together more tightly, and loading servers at instance granularity, although it maximizes resource sharing, wastes resources because the items are large. In contrast, SAO-U performs best, because user-granularity items are small and can use resources fully. The method of the present invention (SAO) learns to achieve nearly the best performance. As the second row of Table 2 shows, over the 20 randomized runs with 4000 instances, the granularity decision algorithm converged to user granularity in 16 runs and to user-group granularity in 4 runs, successfully avoiding the worst-performing instance granularity and nearly converging to the best granularity level.
2) Performance in a dynamic environment
When an instance starts, the method of the present invention allocates resources for it; when the instance finishes, the occupied resources are released and later reused. The algorithms were evaluated in such a dynamic environment: 4000 instances arrive and depart in sequence, instance inter-arrival times follow an exponential distribution with a mean of 500 ms, and instance running times follow a uniform distribution between 10 and 20 minutes. The other parameter settings are as in Table 1. Results were obtained over 20 randomized simulation runs. Figure 6(a) shows the average number of running instances in the system over time: in every run, the number of running instances first grows as instances arrive, peaks before the first instance departs and stays there for a while, and then declines after the last instance has arrived. Figure 6(b) shows the average cost reduction rate of each algorithm over time. Compared with SBO, the method of the present invention (SAO) reduces cost by more than 40% and learns the best granularity for server loading.
3) Impact of the delay limit τ
To evaluate the impact of the delay limit τ on performance, server costs were compared under three values of τ: 20 ms, 30 ms, and 40 ms. Because the SBO algorithm consumes a large number of servers when τ is 20 ms, in this set of experiments the number of servers per site was drawn from a uniform distribution between 100 and 150. As Fig. 7 shows, the costs of both the method of the present invention (SAO) and SBO decrease as the delay limit grows. The reason is that the higher the delay limit, the more users have their delay requirement satisfied at each edge site, and the greater the opportunities for resource sharing and cost reduction.
Figure 8 shows the cost reduction rate when the delay limit τ is 40 ms. Unlike the case of τ = 30 ms, the SAO-I algorithm performs poorly here. The higher the delay limit, the more users satisfy the delay requirement at each edge site, so after user assignment more users may be clustered together; loading servers at instance granularity then wastes resources because the items are large. The method of the present invention (SAO) learns to achieve the best performance. As the third row of Table 2 shows, over the 20 randomized runs with 4000 instances, the granularity decision algorithm converged to user-group granularity in every run.
4) Impact of the weight θ
Further, the impact of θ on performance was evaluated, i.e., the weight of |A| (the cardinality of a partition block) relative to c_v (the server cost). Figure 9 shows the empirical cumulative distribution function of partition-block cardinality in a simulation experiment with 2000 instances. With a larger value of θ, the sharing-aware user assignment algorithm tends to select partition blocks containing more users.
Figure 10 shows the cost reduction rate of the method of the present invention (SAO) under different values of θ. For larger problem sizes, performance improves as θ grows from 0 to 1, because the weight on |A| increases the aggregation of users at a site and hence the opportunities for resource sharing among users. As θ grows further, however, performance degrades: with a heavier weight on |A|, the server cost is effectively ignored, and although user aggregation increases, users may be assigned to edge sites with expensive servers. In one embodiment, θ is set to 1 to strike a good balance between server cost and user aggregation.
In summary, the present invention proposes a sharing-aware online offloading method for the task offloading problem of multiple instances of a single application, satisfying the latency requirement of every user in each instance while minimizing the cost of the servers used. The present invention exploits the multi-dimensional resource sharing of tasks, reducing resource consumption through sharing and thereby reducing system cost. A sharing-aware user assignment algorithm is proposed to assign users to edge sites so that every user's latency meets the requirement while resource sharing is maximized; the algorithm aggregates users while satisfying their latency requirements, minimizing server cost and increasing the opportunities for resource sharing among users. In addition, a reinforcement learning method is proposed to learn from experience the best granularity level for loading tasks; since server loading at different granularity levels incurs different costs, using the best granularity level lowers the loading cost.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to the respective computing/processing device, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, or Python, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), may be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry may execute the computer-readable program instructions in order to implement aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
Having described embodiments of the present invention, the foregoing description is exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims (10)

  1. A resource sharing-aware online task offloading method, comprising:
    sorting the sites to obtain a site sequence, wherein C denotes the first-ranked site;
    performing the following steps until the user tasks in a user set have been loaded or the target sites have been traversed:
    based on a user set U of an instance and a target site set V', pre-assigning users to sites by solving a modeled user assignment problem, wherein V' denotes the set of all sites in the site sequence starting from site C, and the optimization goal of the user assignment problem is to satisfy the delay requirement of every user and maximize the opportunity for resource sharing;
    at the current site C, for the user tasks in the obtained user set, determining the servers onto which the tasks are to be loaded by solving a modeled server loading problem, wherein the optimization goal of the server loading problem is, for the users pre-assigned to each site, to load their tasks using the fewest servers while satisfying the servers' resource capacity limits.
  2. The method according to claim 1, wherein the user assignment problem is modeled as a set partitioning problem: a given user set is divided into disjoint subsets, each subset is assigned to one site, and the union of these subsets equals the entire user set.
  3. The method according to claim 2, wherein the user assignment problem of a single instance is defined as: given a user set U and the set P, selecting a subset of P that forms a partition of U, where P is the set of candidate two-tuples:
    P = {⟨v, A⟩ | v ∈ V, A ⊆ R_v}
    wherein the two-tuple ⟨v, A⟩ denotes a site v and a feasible set A ⊆ R_v of that site, R_v denotes all users in the user set U whose delay requirement is met at site v, and V denotes a set of sites.
  4. The method according to claim 3, wherein the user assignment problem of the single instance is solved by the following steps:
    step S41: for a given user set U of an instance and a set of sites V, obtaining a two-tuple set P̂, wherein P̂ is a subset of P;
    step S42: selecting from the set P̂ the two-tuple with the smallest cost-effectiveness, denoted ⟨v*, A*⟩, as the partition block of the user set U for the corresponding site;
    step S43: updating the set P̂ by removing the already assigned users A* from each two-tuple;
    step S44: removing invalid two-tuples from the set P̂;
    step S45: repeating steps S42 to S44 until the union of the selected two-tuples equals U.
  5. The method according to claim 4, wherein the cost of a two-tuple ⟨v, A⟩ ∈ P is defined as:
    c(⟨v, A⟩) = c_v |A|^(−θ)
    wherein θ is the weight of |A|, used to balance the server cost against the cardinality of the set, and the value of θ affects the degree of resource sharing.
  6. The method according to claim 1, wherein the server loading problem is modeled as a three-dimensional bin packing problem, and a user granularity level, a user-group granularity level, and an instance granularity level are set for decision-making, wherein the user granularity level treats each user as one item, the user-group granularity level treats a group of users sharing a viewpoint as one item, and the instance granularity level treats the users belonging to one instance as one item.
  7. The method according to claim 5, wherein the granularity-level decision problem is modeled as a multi-armed bandit problem and solved using the action-value method from reinforcement learning, wherein actions in the reinforcement learning correspond to granularity levels, the goal of the reinforcement learning is set to maximizing the expected total reward of action selection, for each selected action multiple instances are offloaded in sequence and server loading is performed with the selected action, and the reward of an action is defined as the negative of the number of newly started servers.
  8. 根据权利要求6所述的方法,其特征在于,求解所述服务器装载问题的约束条件包括:The method according to claim 6, wherein solving the constraint conditions of the server loading problem comprises:
    [Constraint formulas, presented as images PCTCN2021133572-appb-100013 to appb-100024 in the original publication.]
    where the Boolean variable x_uj indicates whether server j hosts the task of user u; t_uv represents the network delay incurred by offloading the task of user u to site v, so the delay of user u is Σ_{j∈S} t_uj x_uj, t_uj denoting the delay between user u and the site of server j; τ represents the allowed delay upper bound; U represents the set of users; S represents the set of edge servers; the Boolean variable w_j indicates whether server j is started; the Boolean variable y_kj indicates whether server j hosts the rendering task of shared-view user group k, so the number of tasks on server j is Σ_{k∈K} y_kj; p_K represents the maximum number of tasks allowed per server; the Boolean variable z_ij indicates whether server j hosts instance i ∈ I; p_I represents the maximum number of instances allowed per server; k(u) represents the shared-view user group to which user u belongs; c_v represents the cost of each server in site v; and the vector p = (p_C, p_G, p_M, p_GM, p_B) represents the CPU, GPU, memory, GPU-memory, and bandwidth capacities of each server.
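    The constraint images themselves are not recoverable from this text. As a hedged reconstruction consistent with the variable definitions above (the per-user resource-demand vector r_u and the cost objective are assumptions not stated in this excerpt), the program would look roughly like:

    \min \textstyle\sum_{j \in S} c_{v(j)}\, w_j \quad \text{subject to}
    \begin{align*}
    \sum_{j \in S} x_{uj} &= 1 & \forall u \in U &\quad \text{(each task hosted by one server)} \\
    \sum_{j \in S} t_{uj}\, x_{uj} &\le \tau & \forall u \in U &\quad \text{(delay bound)} \\
    x_{uj} &\le y_{k(u)j} & \forall u \in U,\ j \in S &\quad \text{(task served by its group's renderer)} \\
    \sum_{k \in K} y_{kj} &\le p_K\, w_j & \forall j \in S &\quad \text{(at most $p_K$ tasks per server)} \\
    \sum_{i \in I} z_{ij} &\le p_I\, w_j & \forall j \in S &\quad \text{(at most $p_I$ instances per server)} \\
    \sum_{u \in U} r_u\, x_{uj} &\le p\, w_j & \forall j \in S &\quad \text{(CPU, GPU, memory, GPU memory, bandwidth)} \\
    x_{uj},\ w_j,\ y_{kj},\ z_{ij} &\in \{0, 1\}
    \end{align*}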
  9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
  10. A computer device comprising a memory and a processor, the memory storing a computer program runnable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 8.
PCT/CN2021/133572 2021-11-26 2021-11-26 Resource sharing-aware online task unloading method WO2023092466A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/133572 WO2023092466A1 (en) 2021-11-26 2021-11-26 Resource sharing-aware online task unloading method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/133572 WO2023092466A1 (en) 2021-11-26 2021-11-26 Resource sharing-aware online task unloading method

Publications (1)

Publication Number Publication Date
WO2023092466A1 true WO2023092466A1 (en) 2023-06-01

Family

ID=86538639

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/133572 WO2023092466A1 (en) 2021-11-26 2021-11-26 Resource sharing-aware online task unloading method

Country Status (1)

Country Link
WO (1) WO2023092466A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190366208A1 (en) * 2018-06-01 2019-12-05 At&T Intellectual Property I, L.P. Virtualized gaming emulation as a network service
US20200154459A1 (en) * 2018-11-13 2020-05-14 Verizon Patent And Licensing Inc. Systems and methods for assignment of multi-access edge computing resources based on network performance indicators
CN112637290A (en) * 2020-12-14 2021-04-09 厦门宏泰科技研究院有限公司 Global communication network system based on micro base station and edge calculation
US11032164B1 (en) * 2019-05-30 2021-06-08 Cox Communications, Inc. Edge-based cloud application acceleration
US20210266834A1 (en) * 2020-02-25 2021-08-26 South China University Of Technology METHOD OF MULTI-ACCESS EDGE COMPUTING TASK OFFLOADING BASED ON D2D IN INTERNET OF VEHICLES (IoV) ENVIRONMENT
CN114265630A (en) * 2021-11-26 2022-04-01 深圳大学 Resource sharing perception online task unloading method


Similar Documents

Publication Publication Date Title
CN113950066B (en) Single server part calculation unloading method, system and equipment under mobile edge environment
Al-Habob et al. Task scheduling for mobile edge computing using genetic algorithm and conflict graphs
CN110543336B (en) Edge calculation task unloading method and device based on non-orthogonal multiple access technology
CN110505644B (en) User task unloading and resource allocation joint optimization method
CN109918201B (en) Task unloading control method and system
CN113268341B (en) Distribution method, device, equipment and storage medium of power grid edge calculation task
CN110474966B (en) Method for processing cloud platform resource fragments and related equipment
US20180321981A1 (en) System and method for self organizing data center
CN110519370B (en) Edge computing resource allocation method based on facility site selection problem
CN114338504A (en) Micro-service deployment and routing method based on network edge system
CN110058937B (en) Method, apparatus and medium for scheduling dedicated processing resources
CN113645273B (en) Internet of vehicles task unloading method based on service priority
CN111342883A (en) Resource allocation method and device
CN110719335B (en) Resource scheduling method, system and storage medium under space-based cloud computing architecture
CN115460216A (en) Calculation force resource scheduling method and device, calculation force resource scheduling equipment and system
CN113867843A (en) Mobile edge computing task unloading method based on deep reinforcement learning
Lin et al. Distributed deep neural network deployment for smart devices from the edge to the cloud
Lorido-Botran et al. ImpalaE: Towards an optimal policy for efficient resource management at the edge
CN114265630A (en) Resource sharing perception online task unloading method
CN113868808A (en) Road network approach detection time delay optimization method, device and system
WO2023092466A1 (en) Resource sharing-aware online task unloading method
Alghayadh et al. Ubiquitous learning models for 5G communication network utility maximization through utility-based service function chain deployment
KR102056894B1 (en) Dynamic resource orchestration for fog-enabled industrial internet of things networks
CN111694670B (en) Resource allocation method, apparatus, device and computer readable medium
CN113949666A (en) Flow control method, device, equipment and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21965200

Country of ref document: EP

Kind code of ref document: A1