CN113672372B - Multi-edge collaborative load balancing task scheduling method based on reinforcement learning - Google Patents

Multi-edge collaborative load balancing task scheduling method based on reinforcement learning

Info

Publication number
CN113672372B
Authority
CN
China
Prior art keywords
value
edge
load balancing
action
balancing scheme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111000830.0A
Other languages
Chinese (zh)
Other versions
CN113672372A (en)
Inventor
陈哲毅
胡俊钦
陈星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202111000830.0A priority Critical patent/CN113672372B/en
Publication of CN113672372A publication Critical patent/CN113672372A/en
Application granted granted Critical
Publication of CN113672372B publication Critical patent/CN113672372B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

The invention relates to a multi-edge collaborative load balancing task scheduling method based on reinforcement learning, which comprises the following steps: step S1: according to the historical data set, a reinforcement learning algorithm is used for evaluating the Q value of each adjustment operation under different system states; step S2: preprocessing the Q value of the adjustment operation in the Q value table constructed in the step S1, and then training a Q value prediction model by using a machine learning algorithm; step S3: each edge independently makes decisions in parallel according to the Q-value prediction model. The invention combines reinforcement learning and machine learning, and designs a multi-edge cooperative load balancing algorithm in the wireless metropolitan area network. Each edge node can independently perform load balancing scheduling between the edge node and the adjacent nodes by only using local information, and gradually search for a proper load balancing scheme through feedback control and multi-edge cooperation. The sought solution can effectively reduce the response time of the task.

Description

Multi-edge collaborative load balancing task scheduling method based on reinforcement learning
Technical Field
The invention relates to the field of load balancing scheduling strategies in edge computing, and in particular to a multi-edge collaborative load balancing task scheduling method based on reinforcement learning.
Background
In recent years, advances in mobile computing technology have enabled users to experience a wide variety of applications. However, as the resource requirements of newly developed applications continue to grow, the computing power of mobile devices remains limited. A traditional approach to overcoming mobile device resource scarcity is to utilize the abundant computing resources in the remote cloud. Mobile devices can reduce their workload and extend battery life by offloading computation-intensive tasks to the remote cloud for execution. However, the long distance between the cloud and the user causes delays in applications with frequent user interaction, degrading the user experience. To minimize the delay of offloading tasks to the remote cloud, researchers have proposed offloading tasks to edges closer to the user.
An edge is a cluster of computers rich in resources that connect to nearby mobile users through a wireless network. By providing low latency access to its rich computing resources, edges can significantly improve the performance of mobile applications. Although edges are often defined as isolated "data centers in boxes," it is a clear benefit to connect multiple edges together to form a network. Cities are typically highly populated, meaning that edges will be available to a large number of users. This increases the cost effectiveness of the edges because they are less likely to be idle. In addition, due to the size of the network, wireless metropolitan area network service providers can take advantage of economies of scale in providing edge services over wireless metropolitan area networks, making the edge services more acceptable to the public.
One major problem faced by wireless metropolitan area network service providers is how to distribute users' task requests to different edges so that the workload among the edges in the wireless metropolitan area network is well balanced, thereby shortening task response delay and enhancing the user experience. In particular, the large number of users in the network means that the workload of each edge will be highly unstable. If an edge is suddenly overwhelmed by user requests, its task response time will increase dramatically, causing delays in user applications and degrading the user experience. To prevent some edges from being overloaded, it is important to assign user requests to different edges so that the workload among the edges is well balanced, thereby reducing the maximum response time.
Disclosure of Invention
In view of the above, the present invention aims to provide a multi-edge collaborative load balancing task scheduling method based on reinforcement learning for solving the multi-edge collaborative load balancing problem in a wireless metropolitan area network. Each edge node independently performs load balancing scheduling between itself and its adjacent nodes based on local information, makes individual scheduling decisions using a method that combines reinforcement learning and machine learning, and gradually searches for a suitable load balancing scheme through feedback control and multi-edge collaboration, which can effectively reduce task response time.
The invention is realized by adopting the following scheme: a multi-edge collaborative load balancing task scheduling method based on reinforcement learning, comprising the following steps:
step S1: according to the historical data set, using a reinforcement learning algorithm to evaluate the Q value of each adjustment operation under different multi-edge collaborative system states;
step S2: preprocessing the Q value of the adjustment operation in the Q value table constructed in the step S1, and then training a Q value prediction model by using a machine learning algorithm;
step S3: each edge independently makes decisions in parallel according to the Q-value prediction model.
Further, the reinforcement learning algorithm in step S1 is:
State space: the state of edge e_i is represented by a triplet consisting of the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges;
Action space: the action space of edge e_i consists of adjustment operations, each of which either increases or decreases the amount of edge e_i's arriving tasks that are scheduled to be executed on an adjacent edge;
Reward function: a reward function is defined over the state transitions caused by the adjustment operations;
the reinforcement learning algorithm adopts the Q-learning algorithm, and the Q value update formula is as follows:
Q(s,a)=Q(s,a)+α[r+γ·max(Q(s',a'))-Q(s,a)]
where max(Q(s',a')) represents the maximum Q value obtained by selecting action a' in state s', the parameter α represents the learning rate, and the parameter γ represents the reward discount factor.
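For illustration, a minimal Python sketch of how the state triplet and the adjustment actions of one edge could be represented is given below; the names EdgeState, Action and action_space, and the integer encoding of the adjustment amounts, are assumptions made for this sketch and are not part of the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class EdgeState:
    arrival_rate: float              # lambda_i: task arrival rate of edge e_i
    local_scheme: Tuple[int, ...]    # current local load balancing scheme (task amount sent to each neighbour)
    load_rates: Tuple[float, ...]    # load rates of e_i and its adjacent edges

@dataclass(frozen=True)
class Action:
    neighbor: int                    # index of the adjacent edge whose share is adjusted
    delta: int                       # +1: schedule more arriving tasks there, -1: schedule fewer

def action_space(num_neighbors: int) -> List[Action]:
    """One increase action and one decrease action per adjacent edge."""
    return [Action(j, d) for j in range(num_neighbors) for d in (+1, -1)]

print(action_space(2))               # an edge with two neighbours has four adjustment operations
```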
Further, the step S1 specifically includes the following steps:
step S11: initializing a Q value table;
step S12: using a Q-learning algorithm to evaluate the Q value of the adjustment operation in each piece of historical data, and continuing the training process until the Q value converges;
step S13: obtaining a Q value table that records, at different moments, the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, the load rates L_i of edge e_i and its adjacent edges, and the Q value corresponding to each adjustment operation.
Further, the step S12 specifically includes the following steps:
step S121: in each round, first randomly initializing the current local load balancing scheme and generating the current system state;
the current local load balancing scheme is initialized randomly;
the state of edge e_i is represented by a triplet consisting of the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges;
step S122: if the current local load balancing scheme is not the target local load balancing scheme, cyclically executing steps S123 to S125;
step S123: selecting an action a from the action space according to an epsilon-greedy strategy; selecting an action a with an epsilon-greedy policy:
a=select_action(s,Q_table);
step S124: reaching state s' and obtaining a reward r under the action of the state transfer function;
the current state is transformed into s' by the state transfer function:
s'=T(s,a)
and the agent obtains a reward value r;
step S125: updating the Q value according to the following formula, and finally replacing the current state s with the state s';
updating the Q value:
Q(s,a)=Q(s,a)+α[r+γ·max(Q(s',a'))-Q(s,a)]
updating the current state:
s=s'。
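A minimal Python sketch of the Q value table construction described in steps S11 to S13 follows. The environment object env with reset(), is_target() and step() methods is a stand-in for the patent's state transfer and reward functions, which are not reproduced here; the hyper-parameter values are taken from the embodiment described later (100 rounds, α = 0.1, γ = 0.9, ε = 0.1).

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 0.9, 0.1, 100    # values used in the embodiment

def train_q_table(env, actions):
    Q = defaultdict(float)                               # step S11: initialise the Q value table
    for _ in range(EPISODES):                            # step S12: repeat until the Q values converge
        s = env.reset()                                  # step S121: random initial local scheme -> state
        while not env.is_target(s):                      # step S122: loop until the target scheme is reached
            if random.random() < EPSILON:                # step S123: epsilon-greedy action selection
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r = env.step(s, a)                   # step S124: state transfer function + reward
            best_next = max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])   # step S125: Q value update
            s = s_next                                   # replace the current state s with s'
    return Q                                             # step S13: the resulting Q value table
```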
Further, in step S2 the Q values are preprocessed according to the following rule:
if a Q value equals 0 and the current local load balancing scheme is not the target local load balancing scheme, the corresponding adjustment operation is considered illegal and its Q value is marked as I; if the current local load balancing scheme equals the target local load balancing scheme, i.e., a target local load balancing scheme has been found, the Q value remains 0; in the remaining cases the Q value is set to its inverse; after preprocessing, the closer the current local load balancing scheme is to the target local load balancing scheme, the smaller the Q value of the adjustment operation, and the Q value takes its minimum of 0 when the target local load balancing scheme is found; based on the preprocessed Q value table, an SVR algorithm is used to train a Q value prediction model, whose regression equation is expressed as:
f(x) = Σ_{i=1}^{m} (α̂_i − α_i)·κ(x, x_i) + b
where m is the number of training samples, κ(x, x_i) is a kernel function, and the remaining parameters are model parameters; a Gaussian kernel is used as the kernel function, i.e.
κ(x, x_i) = exp(−‖x − x_i‖² / (2χ²))
where χ > 0 is the bandwidth of the Gaussian kernel.
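The following Python sketch illustrates the preprocessing rule and the SVR training of step S2 using scikit-learn's SVR with an RBF (Gaussian) kernel. Reading "inverse" as the arithmetic opposite of the Q value, using NaN to stand for the label I, and the feature encoding are assumptions of this sketch.

```python
import numpy as np
from sklearn.svm import SVR

def preprocess_q(q_value, is_target_scheme):
    """Apply the preprocessing rule to one Q value from the Q value table."""
    if q_value == 0 and not is_target_scheme:
        return np.nan            # illegal adjustment operation, marked I (excluded from training)
    if is_target_scheme:
        return 0.0               # target local load balancing scheme found: Q value stays 0
    return -q_value              # remaining cases: opposite value, so smaller means closer to the target

def fit_q_predictor(features, q_values, is_target_flags, bandwidth=1.0):
    """Train a Q value prediction model on the preprocessed Q value table."""
    y = np.array([preprocess_q(q, t) for q, t in zip(q_values, is_target_flags)])
    mask = ~np.isnan(y)                                   # drop the entries marked I
    X = np.asarray(features, dtype=float)[mask]
    # RBF kernel exp(-gamma * ||x - x_i||^2); gamma is derived here from an assumed bandwidth parameter
    model = SVR(kernel="rbf", gamma=1.0 / (2 * bandwidth ** 2))
    model.fit(X, y[mask])
    return model
```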
Further, the specific content of the step S3 is as follows:
adopting an adjustment operation decision algorithm: during decision making, the Q value prediction model is used to predict the Q value of each adjustment operation under different multi-edge collaborative system states, and the action with the smallest Q value is selected; a threshold T is set, and when the Q value corresponding to every adjustment operation is smaller than the threshold T, the current local load balancing scheme is considered close to the target local load balancing scheme and is approximately taken as the target local load balancing scheme; the inputs of the adjustment operation decision algorithm include the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges; the output is the next load balancing adjustment operation a of edge e_i; the specific process is as follows:
firstly, the Q value of each action, i.e., each adjustment operation, is evaluated: if an action is deemed illegal, its corresponding Q value is marked as I; otherwise, the Q value corresponding to each action is predicted by the Q value prediction model;
where prediction_model() — invokes the Q value prediction model
Q_value(a) — the Q value of action a
Then, judging whether the Q values of all legal adjustment operations are smaller than a threshold T, wherein the Q values marked as I are excluded; if the Q values of all actions are smaller than the threshold T, the target local load balancing scheme is considered to be found, and adjustment is not needed, so that the adjustment operation is Null; otherwise, selecting the adjustment operation with the minimum Q value, and if the Q values of a plurality of adjustment operations are the same and are the minimum Q value, randomly selecting one adjustment operation from the adjustment operations;
if(for each Q_value(a)≤T||Q_value(a)==I):
a=Null
else:
record the actions with the minimum Q value that are not marked I:
a_List = A_i.getAction_MinQvalue()
randomly select an action from a_List:
a = a_List.get_Action_Random()
finally, returning the selected next load balancing adjustment operation a of edge e_i.
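A Python sketch of the adjustment operation decision described above is given below; encode_state() and is_legal() are assumed helper functions, None stands for the Null adjustment operation, and the threshold value 0.15 is taken from the embodiment.

```python
import random

THRESHOLD_T = 0.15   # threshold T used in the embodiment

def next_adjustment(model, state, actions, encode_state, is_legal):
    """Return the next adjustment operation for one edge, or None (Null) if no adjustment is needed."""
    predicted = []                                             # (action, Q value) for legal actions only
    for a in actions:
        if not is_legal(state, a):
            continue                                           # illegal action: Q value marked I, excluded
        q = float(model.predict([encode_state(state, a)])[0])  # Q value predicted by the trained model
        predicted.append((a, q))
    if not predicted or all(q <= THRESHOLD_T for _, q in predicted):
        return None                                            # target local scheme (approximately) found
    q_min = min(q for _, q in predicted)
    candidates = [a for a, q in predicted if q == q_min]
    return random.choice(candidates)                           # break ties among minimum-Q actions randomly
```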
Compared with the prior art, the invention has the following beneficial effects:
the invention combines reinforcement learning and machine learning, and designs a multi-edge cooperative load balancing method in a wireless metropolitan area network. Each edge node can independently perform load balancing scheduling between the edge node and the adjacent nodes by only using local information, and gradually search for a proper load balancing scheme through feedback control and multi-edge cooperation. The sought solution can effectively reduce the response time of the task.
Drawings
Fig. 1 is a general frame diagram of an embodiment of the present invention.
FIG. 2 is a graph comparing performance of an embodiment of the present invention with that of a conventional method.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in fig. 1 and 2, the present embodiment provides a multi-edge collaborative load balancing task scheduling method based on reinforcement learning, which includes the following steps:
step S1: according to the historical data set, using a reinforcement learning algorithm to evaluate the Q value of each adjustment operation under different multi-edge collaborative system states; the multi-edge cooperative system refers to a multi-edge cooperative system in a wireless metropolitan area network, and consists of a plurality of edges connected with each other through the wireless metropolitan area network.
Step S2: preprocessing the Q value of the adjustment operation in the Q value table constructed in the step S1, and then training a Q value prediction model by using a machine learning algorithm;
step S3: each edge independently makes decisions in parallel according to the Q-value prediction model.
In this embodiment, the reinforcement learning algorithm in step S1 is:
State space: the state of edge e_i is represented by a triplet consisting of the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges;
Action space: the action space of edge e_i consists of adjustment operations, each of which either increases or decreases the amount of edge e_i's arriving tasks that are scheduled to be executed on an adjacent edge;
Reward function: a reward function is defined over the state transitions caused by the adjustment operations;
the reinforcement learning algorithm adopts the Q-learning algorithm, and the Q value update formula is as follows:
Q(s,a)=Q(s,a)+α[r+γ·max(Q(s',a'))-Q(s,a)]
where max(Q(s',a')) represents the maximum Q value obtained by selecting action a' in state s', the parameter α represents the learning rate, and the parameter γ represents the reward discount factor.
In this embodiment, the step S1 specifically includes the following steps:
step S11: initializing a Q value table;
step S12: using a Q-learning algorithm to evaluate the Q value of the adjustment operation in each piece of historical data, and continuing the training process until the Q value converges;
step S13: obtaining a Q value table that records, at different moments, the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, the load rates L_i of edge e_i and its adjacent edges, and the Q value corresponding to each adjustment operation.
In this embodiment, the step S12 specifically includes the following steps:
step S121: in each round, first randomly initializing the current local load balancing scheme and generating the current system state;
the current local load balancing scheme is initialized randomly;
the state of edge e_i is represented by a triplet consisting of the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges;
step S122: if the current local load balancing scheme is not the target local load balancing scheme, cyclically executing steps S123 to S125;
step S123: selecting an action a from the action space according to an epsilon-greedy strategy; selecting an action a with an epsilon-greedy policy:
a=select_action(s,Q_table);
step S124: reaching state s' and obtaining a reward r under the action of the state transfer function;
the current state is transformed into s' by the state transfer function:
s'=T(s,a)
and the agent obtains a reward value r;
step S125: updating the Q value according to the following formula, and finally replacing the current state s with the state s';
updating the Q value:
Q(s,a)=Q(s,a)+α[r+γ·max(Q(s',a'))-Q(s,a)]
updating the current state:
s=s'。
In the present embodiment, in step S2 the Q values are preprocessed according to the following rule:
if a Q value equals 0 and the current local load balancing scheme is not the target local load balancing scheme, the corresponding adjustment operation is considered illegal and its Q value is marked as I; if the current local load balancing scheme equals the target local load balancing scheme, i.e., a target local load balancing scheme has been found, the Q value remains 0; in the remaining cases the Q value is set to its inverse; after preprocessing, the closer the current local load balancing scheme is to the target local load balancing scheme, the smaller the Q value of the adjustment operation, and the Q value takes its minimum of 0 when the target local load balancing scheme is found; based on the preprocessed Q value table, an SVR algorithm is used to train a Q value prediction model (training the Q value prediction model means training the parameters of its regression equation), whose regression equation is expressed as:
f(x) = Σ_{i=1}^{m} (α̂_i − α_i)·κ(x, x_i) + b
where m is the number of training samples, κ(x, x_i) is a kernel function, and the remaining parameters are model parameters; a Gaussian kernel is used as the kernel function, i.e.
κ(x, x_i) = exp(−‖x − x_i‖² / (2χ²))
where χ > 0 is the bandwidth of the Gaussian kernel.
In this embodiment, the specific content of step S3 is as follows:
adopting an adjustment operation decision algorithm: during decision making, the Q value prediction model is used to predict the Q value of each adjustment operation under different multi-edge collaborative system states, and the action with the smallest Q value is selected; a threshold T is set, and when the Q value corresponding to every adjustment operation is smaller than the threshold T, the current local load balancing scheme is considered close to the target local load balancing scheme and is approximately taken as the target local load balancing scheme; the inputs of the adjustment operation decision algorithm include the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges; the output is the next load balancing adjustment operation a of edge e_i; the specific process is as follows:
firstly, the Q value of each action, i.e., each adjustment operation, is evaluated: if an action is deemed illegal, its corresponding Q value is marked as I; otherwise, the Q value corresponding to each action is predicted by the Q value prediction model;
where prediction_model() — invokes the Q value prediction model
Q_value(a) — the Q value of action a
Then, judging whether the Q values of all legal adjustment operations are smaller than a threshold T, wherein the Q values marked as I are excluded; if the Q values of all actions are smaller than the threshold T, the target local load balancing scheme is considered to be found, and adjustment is not needed, so that the adjustment operation is Null; otherwise, selecting the adjustment operation with the minimum Q value, and if the Q values of a plurality of adjustment operations are the same and are the minimum Q value, randomly selecting one adjustment operation from the adjustment operations;
if(for each Q_value(a)≤T||Q_value(a)==I):
a=Null
else:
record the actions with the minimum Q value that are not marked I:
a_List = A_i.getAction_MinQvalue()
randomly select an action from a_List:
a = a_List.get_Action_Random()
finally, returning the selected next load balancing adjustment operation a of edge e_i.
Preferably, in the present embodiment, the symbols are defined as follows:
Definition 1: the set of edges deployed in the wireless metropolitan area network is denoted E = {e_1, e_2, ..., e_N}, where e_i represents the i-th edge and N is the total number of edges.
Definition 2: the service rates of the N edges are denoted V = {v_1, v_2, ..., v_N}, where v_i represents the service rate of edge e_i.
Definition 3: the unit task transfer times between the N edges are expressed as a matrix D:
where d_{i,j} represents the unit task transmission time between edge e_i and edge e_j.
Definition 4: the task arrival rates of the N edges are denoted λ = {λ_1, λ_2, ..., λ_N}, where λ_i > 0 represents the task arrival rate of edge e_i. For convenience of description, the initial tasks offloaded by users to an edge are hereinafter referred to as the arriving tasks of that edge. Definition 5: the global load balancing scheme is expressed as:
where F_i represents the local load balancing scheme of edge e_i.
Definition 6: the actual task loads of the N edges are denoted W = {w_1, w_2, ..., w_N}, where w_i > 0 represents the actual task amount of edge e_i per unit time.
Definition 7: the actual task load rate l_j of edge e_j per unit time is expressed as:
preferably, the problem in this embodiment is defined as follows:
Definition 8: based on queuing theory, the average execution time of tasks on different edges is:
Definition 9: the task response time consists of execution time and transmission time, so the average response time of the tasks in edge e_i's arriving tasks that are scheduled to be executed on adjacent edge e_j per unit time is:
t_{i,j} = T_a(l_j) + d_{i,j}
Definition 10: the average response time T_i^r of the arriving tasks of edge e_i is:
Definition 11: the maximum average response time of arriving tasks on the N edges is defined as:
T_max = max{T_1^r, T_2^r, ..., T_N^r}
Definition 12: the objective function is:
min(T_max)
i.e., to minimize the maximum average response time of arriving tasks on the N edges.
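The following Python sketch illustrates definitions 9 to 12; the M/M/1-style execution time used for T_a is an assumption, since the queuing formula of definition 8 is not reproduced in the text.

```python
from typing import List

def execution_time(load_rate: float, service_rate: float) -> float:
    """Assumed queuing-style execution time T_a: grows sharply as the load rate approaches 1."""
    return 1.0 / (service_rate * (1.0 - load_rate))

def response_time(i: int, j: int, load_rates: List[float],
                  service_rates: List[float], d: List[List[float]]) -> float:
    """Definition 9: t_ij = T_a(l_j) + d_ij (execution time plus transmission time)."""
    return execution_time(load_rates[j], service_rates[j]) + d[i][j]

def max_avg_response_time(avg_response_per_edge: List[float]) -> float:
    """Definition 11: T_max; the objective of definition 12 is to minimise this value."""
    return max(avg_response_per_edge)
```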
Reinforcement learning:
State space: the state of edge e_i is represented by a triplet consisting of the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges.
Action space: the action space of edge e_i consists of adjustment operations, each of which either increases or decreases the amount of edge e_i's arriving tasks that are scheduled to be executed on an adjacent edge.
Reward function: a reward function is defined herein over the state transitions caused by the adjustment operations.
The reinforcement learning algorithm of this embodiment adopts the Q-learning algorithm, and the Q value update formula is as follows:
Q(s,a)=Q(s,a)+α[r+γ·max(Q(s',a'))-Q(s,a)]
where max(Q(s',a')) represents the maximum Q value obtained by selecting action a' in state s', the parameter α represents the learning rate, and the parameter γ represents the reward discount factor.
Based on the above definitions, the Q-learning algorithm is used to evaluate the Q values of the different adjustment operations according to the data set of Table 3, as shown in Algorithm 1. First, the Q value table is initialized (line 1). The Q value of the adjustment operation in each piece of historical data is then evaluated using the Q-learning algorithm, and the training process continues until the Q values converge (lines 2-12). In each round, the current local load balancing scheme is first randomly initialized and the current system state is generated (line 3). Then, as long as the current local load balancing scheme is not the target local load balancing scheme, the following procedure (lines 5-11) is cyclically performed: first, an action a is selected from the action space according to the ε-greedy strategy (line 6); then state s' is reached under the state transfer function and a reward r is obtained (lines 7-8); next, the Q value is updated according to equation (9) (line 9); finally, the current state s is replaced by the state s' (line 10).
Thus, a Q value table is obtained that records, at different moments, the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, the load rates L_i of edge e_i and its adjacent edges, and the Q value corresponding to each adjustment operation.
Q value prediction model:
the Q value is preprocessed, and the processing rule is as follows:
If a Q value equals 0 and the current local load balancing scheme is not the target local load balancing scheme, the corresponding adjustment operation is considered illegal and its Q value is marked as I. If the current local load balancing scheme equals the target local load balancing scheme (i.e., the target local load balancing scheme is found), the Q value remains 0. For the remaining cases, the Q value is set to its inverse. After preprocessing, the closer the current local load balancing scheme is to the target local load balancing scheme, the smaller the Q value of the adjustment operation, and the Q value takes its minimum of 0 when the target local load balancing scheme is found. Based on the preprocessed Q value table, an SVR algorithm is used to train a Q value prediction model, whose regression equation can be expressed as:
f(x) = Σ_{i=1}^{m} (α̂_i − α_i)·κ(x, x_i) + b
where m is the number of training samples, κ(x, x_i) is a kernel function, and the remaining parameters are model parameters. We choose a Gaussian kernel as the kernel function, i.e.
κ(x, x_i) = exp(−‖x − x_i‖² / (2χ²))
where χ > 0 is the bandwidth of the Gaussian kernel.
Local load balancing scheduling algorithm:
First, edge e_i's current local load balancing scheme is initialized (line 1), i.e., the arriving tasks of edge e_i are all executed at the node itself.
Then edge e_i repeatedly performs the following procedure (lines 2 to 11):
Step 1: edge e_i acquires the load rates L_i of itself and its adjacent edges (line 3).
Step 2: using the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges as input, obtain edge e_i's next load balancing adjustment operation according to Algorithm 3 (line 4).
Step 3: if the adjustment operation returned by Algorithm 3 is Null, edge e_i is declared to have found a target local load balancing scheme and no further adjustment is needed (lines 5-6); otherwise, edge e_i performs the obtained adjustment operation and updates its current local load balancing scheme (lines 7 to 9).
The algorithm 3 predicts the Q value of each operation in different system states by using a Q value prediction model, and selects the action with the smallest Q value. When the Q value corresponding to each adjustment operation is smaller than the threshold T, we consider that the current local load balancing scheme is close enough to the target local load balancing scheme, and at this time, the current local load balancing scheme can be approximately used as the target local load balancing scheme.
The inputs to Algorithm 3 include the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges; the output is the next load balancing adjustment operation a of edge e_i. The specific process is as follows:
First, the Q value of each action (adjustment operation) is evaluated. If an action is deemed illegal, its corresponding Q value is marked as I; in other cases, the Q value corresponding to each action is predicted by the Q value prediction model (lines 1 to 7).
Then, it is determined whether the Q values of all legal adjustment operations are less than the threshold T (the Q values marked as I are excluded). If the Q values of all actions are smaller than the threshold T, the target local load balancing scheme is considered to be found and no adjustment is needed, so the adjustment operation is Null. Otherwise, the adjustment operation with the smallest Q value is selected; if several adjustment operations share the same smallest Q value, one of them is selected randomly (lines 8-13).
Finally, return the selected edge e i Next load balancing adjustment operation a (line 14).
Preferably, five areas are randomly selected on the distribution map of wireless base stations in Shanghai, and five different simulation scenarios are designed. In each scenario the total number of edges is N = 15, and the longitude and latitude coordinates of 15 wireless base stations are randomly selected in each area as the coordinates of the edges. The task arrival rate λ_i of each edge follows the normal distribution N(10, 4) and the service rate v_i follows the normal distribution N(15, 6); the number of edges each edge is connected to is greater than 0 and at most 3, and the unit task transmission time D between edges is mapped into the interval [0.1, 0.2] according to the distance between the edges, so the closer two edges are, the smaller the unit task transmission time. The number of reinforcement learning iteration rounds (episodes), the learning rate α, and the reward discount γ are set to 100, 0.1, and 0.9, respectively. ε = 0.1 is used in the ε-greedy policy, and the threshold T is set to 0.15.
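For illustration, one such simulation scenario could be generated as in the Python sketch below; reading N(10, 4) and N(15, 6) as mean/standard-deviation pairs, the placeholder coordinate ranges, the clipping to positive values, and the linear distance-to-time mapping are assumptions of this sketch.

```python
import random

N = 15                                   # total number of edges per scenario
random.seed(0)

# Placeholder longitude/latitude ranges standing in for randomly chosen base-station coordinates.
coords = [(random.uniform(121.0, 121.9), random.uniform(30.7, 31.5)) for _ in range(N)]
arrival_rate = [max(0.1, random.gauss(10, 4)) for _ in range(N)]   # lambda_i drawn from N(10, 4)
service_rate = [max(0.1, random.gauss(15, 6)) for _ in range(N)]   # v_i drawn from N(15, 6)

def unit_transfer_time(p, q, d_min, d_max):
    """Map the inter-edge distance linearly into [0.1, 0.2]; closer edges transfer faster."""
    dist = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    if d_max <= d_min:
        return 0.1
    return 0.1 + 0.1 * (dist - d_min) / (d_max - d_min)
```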
The experimental results, comparing the method proposed in this embodiment (RF-CLB) with a classical ML-based method and a rule-based method, are shown in FIG. 2. The results show that the response time of the load balancing scheme obtained by RF-CLB is 6-9% and 10-12% lower than that of the classical ML-based method and the rule-based method, respectively.
The foregoing description is only of the preferred embodiments of the invention, and all changes and modifications that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (1)

1. A multi-edge collaborative load balancing task scheduling method based on reinforcement learning, characterized in that the method comprises the following steps:
step S1: according to the historical data set, using a reinforcement learning algorithm to evaluate the Q value of each adjustment operation under different multi-edge collaborative system states;
step S2: preprocessing the Q value of the adjustment operation in the Q value table constructed in the step S1, and then training a Q value prediction model by using a machine learning algorithm;
step S3: each edge independently and parallelly makes decisions according to the Q value prediction model;
the reinforcement learning algorithm in step S1 is:
State space: the state of edge e_i is represented by a triplet consisting of the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges;
Action space: the action space of edge e_i consists of adjustment operations, each of which either increases or decreases the amount of edge e_i's arriving tasks that are scheduled to be executed on an adjacent edge;
Reward function: a reward function is defined over the state transitions caused by the adjustment operations;
the reinforcement learning algorithm adopts the Q-learning algorithm, and the Q value update formula is as follows:
Q(s,a)=Q(s,a)+α[r+γmax(Q(s',a'))-Q(s,a)]
wherein max(Q(s',a')) represents the maximum Q value obtained by selecting action a' in state s', the parameter α represents the learning rate, and the parameter γ represents the reward discount factor;
the step S1 specifically comprises the following steps:
step S11: initializing a Q value table;
step S12: using a Q-learning algorithm to evaluate the Q value of the adjustment operation in each piece of historical data, and continuing the training process until the Q value converges;
step S13: obtaining a Q value table that records, at different moments, the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, the load rates L_i of edge e_i and its adjacent edges, and the Q value corresponding to each adjustment operation;
the step S12 specifically includes the following steps:
step S121: in each round, first randomly initializing the current local load balancing scheme and generating the current system state;
the current local load balancing scheme is initialized randomly;
the state of edge e_i is represented by a triplet consisting of the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges;
step S122: if the current local load balancing scheme is not the target local load balancing scheme, cyclically executing steps S123 to S125;
step S123: selecting an action a from the action space according to an epsilon-greedy strategy;
selecting an action a with an epsilon-greedy policy:
a=select_action(s,Q_table);
step S124: reaching state s' and obtaining a reward r under the action of the state transfer function;
the current state is transformed into s' by the state transfer function:
s'=T(s,a)
and the agent obtains a reward value r;
step S125: updating the Q value according to the following formula, and finally replacing the current state s with the state s';
updating the Q value:
Q(s,a)=Q(s,a)+α[r+γmax(Q(s',a'))-Q(s,a)]
updating the current state:
s=s';
in step S2, the Q values are preprocessed according to the following rule:
if a Q value equals 0 and the current local load balancing scheme is not the target local load balancing scheme, the corresponding adjustment operation is considered illegal and its Q value is marked as I; if the current local load balancing scheme equals the target local load balancing scheme, i.e., a target local load balancing scheme has been found, the Q value remains 0; in the remaining cases the Q value is set to its inverse; after preprocessing, the closer the current local load balancing scheme is to the target local load balancing scheme, the smaller the Q value of the adjustment operation, and the Q value takes its minimum of 0 when the target local load balancing scheme is found; based on the preprocessed Q value table, an SVR algorithm is used to train a Q value prediction model, whose regression equation is expressed as:
f(x) = Σ_{i=1}^{m} (α̂_i − α_i)·κ(x, x_i) + b
where m is the number of training samples, κ(x, x_i) is a kernel function, and the remaining parameters are model parameters; a Gaussian kernel is used as the kernel function, i.e.
κ(x, x_i) = exp(−‖x − x_i‖² / (2χ²))
where χ > 0 is the bandwidth of the Gaussian kernel;
the specific content of the step S3 is as follows:
adopting an adjustment operation decision algorithm: during decision making, the Q value prediction model is used to predict the Q value of each adjustment operation under different multi-edge collaborative system states, and the action with the smallest Q value is selected; a threshold T is set, and when the Q value corresponding to every adjustment operation is smaller than the threshold T, the current local load balancing scheme is considered close to the target local load balancing scheme and is approximately taken as the target local load balancing scheme; the inputs of the adjustment operation decision algorithm include the task arrival rate λ_i of edge e_i, the current local load balancing scheme of edge e_i, and the load rates L_i of edge e_i and its adjacent edges; the output is the next load balancing adjustment operation a of edge e_i; the specific process is as follows:
firstly, the Q value of each action, i.e., each adjustment operation, is evaluated: if an action is deemed illegal, its corresponding Q value is marked as I; otherwise, the Q value corresponding to each action is predicted by the Q value prediction model;
where prediction_model() — invokes the Q value prediction model
Q_value(a) — the Q value of action a
Then, judging whether the Q values of all legal adjustment operations are smaller than a threshold T, wherein the Q values marked as I are excluded; if the Q values of all actions are smaller than the threshold T, the target local load balancing scheme is considered to be found, and adjustment is not needed, so that the adjustment operation is Null; otherwise, selecting the adjustment operation with the minimum Q value, and if the Q values of a plurality of adjustment operations are the same and are the minimum Q value, randomly selecting one adjustment operation from the adjustment operations;
finally, returning the selected next load balancing adjustment operation a of edge e_i.
CN202111000830.0A 2021-08-30 2021-08-30 Multi-edge collaborative load balancing task scheduling method based on reinforcement learning Active CN113672372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111000830.0A CN113672372B (en) 2021-08-30 2021-08-30 Multi-edge collaborative load balancing task scheduling method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111000830.0A CN113672372B (en) 2021-08-30 2021-08-30 Multi-edge collaborative load balancing task scheduling method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN113672372A CN113672372A (en) 2021-11-19
CN113672372B true CN113672372B (en) 2023-08-08

Family

ID=78547253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111000830.0A Active CN113672372B (en) 2021-08-30 2021-08-30 Multi-edge collaborative load balancing task scheduling method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113672372B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115118728B (en) * 2022-06-21 2024-01-19 福州大学 Edge load balancing task scheduling method based on ant colony algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947567A (en) * 2019-03-14 2019-06-28 深圳先进技术研究院 A kind of multiple agent intensified learning dispatching method, system and electronic equipment
WO2020011068A1 (en) * 2018-07-10 2020-01-16 第四范式(北京)技术有限公司 Method and system for executing machine learning process
CN112506643A (en) * 2020-10-12 2021-03-16 苏州浪潮智能科技有限公司 Load balancing method and device of distributed system and electronic equipment
CN112948112A (en) * 2021-02-26 2021-06-11 杭州电子科技大学 Edge computing workload scheduling method based on reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020011068A1 (en) * 2018-07-10 2020-01-16 第四范式(北京)技术有限公司 Method and system for executing machine learning process
CN109947567A (en) * 2019-03-14 2019-06-28 深圳先进技术研究院 A kind of multiple agent intensified learning dispatching method, system and electronic equipment
CN112506643A (en) * 2020-10-12 2021-03-16 苏州浪潮智能科技有限公司 Load balancing method and device of distributed system and electronic equipment
CN112948112A (en) * 2021-02-26 2021-06-11 杭州电子科技大学 Edge computing workload scheduling method based on reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on distributed machine learning task scheduling algorithms for cloud computing; 孟彬彬; 吴艳; 西安文理学院学报(自然科学版) (01); full text *

Also Published As

Publication number Publication date
CN113672372A (en) 2021-11-19

Similar Documents

Publication Publication Date Title
Cui et al. Novel method of mobile edge computation offloading based on evolutionary game strategy for IoT devices
Qu et al. DMRO: A deep meta reinforcement learning-based task offloading framework for edge-cloud computing
Wei et al. Joint optimization of caching, computing, and radio resources for fog-enabled IoT using natural actor–critic deep reinforcement learning
CN109947545B (en) Task unloading and migration decision method based on user mobility
Liu et al. A reinforcement learning-based resource allocation scheme for cloud robotics
Li et al. NOMA-enabled cooperative computation offloading for blockchain-empowered Internet of Things: A learning approach
Sun et al. Autonomous resource slicing for virtualized vehicular networks with D2D communications based on deep reinforcement learning
Zhang et al. Joint task offloading and data caching in mobile edge computing networks
CN108416465B (en) Workflow optimization method in mobile cloud environment
Zhao et al. MESON: A mobility-aware dependent task offloading scheme for urban vehicular edge computing
Zhou et al. Learning from peers: Deep transfer reinforcement learning for joint radio and cache resource allocation in 5G RAN slicing
CN115190033B (en) Cloud edge fusion network task unloading method based on reinforcement learning
Supreeth et al. Hybrid genetic algorithm and modified-particle swarm optimization algorithm (GA-MPSO) for predicting scheduling virtual machines in educational cloud platforms
EP4024212B1 (en) Method for scheduling inference workloads on edge network resources
CN107566535B (en) Self-adaptive load balancing method based on concurrent access timing sequence rule of Web map service
CN113672372B (en) Multi-edge collaborative load balancing task scheduling method based on reinforcement learning
ABDULKAREEM et al. OPTIMIZATION OF LOAD BALANCING ALGORITHMS TO DEAL WITH DDOS ATTACKS USING WHALE OPTIMIZATION ALGORITHM
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
CN115022926A (en) Multi-objective optimization container migration method based on resource balance
Akhtar et al. A comparative study of the application of glowworm swarm optimization algorithm with other nature-inspired algorithms in the network load balancing problem
Chen et al. Joint optimization of task offloading and resource allocation via deep reinforcement learning for augmented reality in mobile edge network
CN115499875B (en) Satellite internet task unloading method, system and readable storage medium
Mohammadi et al. SDN-IoT: SDN-based efficient clustering scheme for IoT using improved Sailfish optimization algorithm
CN116302578A (en) QoS (quality of service) constraint stream application delay ensuring method and system
CN114500561B (en) Power Internet of things network resource allocation decision-making method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant