CN114116156B - Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method - Google Patents
Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method
- Publication number
- CN114116156B CN202111209997.8A
- Authority
- CN
- China
- Prior art keywords
- resource
- task
- representing
- user
- computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/502—Proximity
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a cloud-edge collaborative dual-benefit balanced tabu reinforcement learning resource allocation method, comprising the following steps: 1) establishing a resource allocation framework in a cloud-edge environment; 2) determining a user benefit optimization objective function, a service provider benefit optimization objective function and a bilateral benefit balancing objective function; 3) constructing the three elements of reinforcement learning in a resource allocator; 4) selecting a computing node a_i; 5) updating according to the selected action a_i to obtain a new state s_{t+1}; 6) simulating an action a'_i according to the new state s_{t+1}; 7) calculating a target value; 8) calculating the Actor-Critic network parameter η_Q; 9) updating the Actor-Critic network parameters; and 10) repeating steps 3)-9) until the Actor-Critic network converges, obtaining the optimal solution of the bilateral benefit balancing objective function. The invention takes the average resource utilization of the service provider as the service provider benefit index and, through a tabu reinforcement learning method, adaptively makes optimal resource allocation decisions when facing real-time, dynamic user tasks.
Description
Technical Field
The invention relates to a system resource allocation method in the fields of cloud computing and edge computing, and in particular to a cloud-edge collaborative dual-benefit balanced tabu reinforcement learning resource allocation method.
Background Art
Cloud-edge collaboration, as a brand-new Internet of Things computing paradigm, executes large-scale complex computing tasks through computation/data migration and resource cooperation between the remote cloud and edge clouds and through mutual cooperation between computing nodes, and has gradually become a focus and frontier field of academia and industry at home and abroad. In traditional cloud computing and edge computing models, users are only the ultimate "consumers" of data, for example watching online videos on a mobile phone. In contrast, the cloud-edge collaborative mode is an interconnected system composed of heterogeneous computing nodes of multiple types, forming an integrated collaborative computing system that provides intelligent services to users nearby. Users play the dual roles of data consumer and data producer, for example sharing videos through WeChat, Douyin (TikTok) and the like. Users are concerned with how much benefit they can obtain from completing their task requests, how much they need to pay the provider to complete these task requests, the user experience, and so on. If users obtain poor benefits when using the cloud-edge collaborative computing mode, they will refuse to use the cloud-edge collaborative service and choose to complete their job tasks locally instead. Conversely, if the benefits of a large number of users can be optimized, users will be more willing to use the cloud-edge collaborative computing mode, which will also attract more potential users in the market to cloud-edge collaboration.
In fact, the interests of users are closely related to the interests of the service provider. As mentioned above, cloud-edge collaboration is a new application paradigm that includes software, platform and infrastructure services shared by users. For service providers, income derives from the fees charged to users for providing services (users as consumers) and the fees charged for the use of the data users share (users as producers). Increased income in turn improves the quality of service, attracts more users, and ultimately creates a virtuous circle. Therefore, increasing the service provider's benefit while optimizing the users' benefit must also be considered, and reasonably allocating resources to balance the interests of users and service providers in the cloud-edge collaborative environment is of great significance.
In much of the existing research, the resource allocation problem has been identified as a multi-constraint, multi-objective NP-hard optimization problem. Existing resource allocation solutions are oriented only to a single cloud computing or edge computing environment and lack generality, so they are difficult to apply directly to the complex cloud-edge collaborative environment. In addition, most of these schemes maximize the benefit of a single party and do not consider the benefits of the user and the service provider jointly. Therefore, a resource allocation method that balances the interests of users and service providers is needed to solve the above problems.
Disclosure of Invention
The object of the invention is to overcome the defects of the prior art and provide a cloud-edge collaborative dual-benefit balanced tabu reinforcement learning resource allocation method, which takes the average completion time of user tasks as the user benefit index and the average resource utilization of the service provider as the service provider benefit index, and adaptively makes optimal resource allocation decisions when facing real-time, dynamic user tasks through a tabu reinforcement learning method.
In order to achieve the above object, the invention provides a cloud-edge collaborative dual-benefit balanced tabu reinforcement learning resource allocation method, characterized by comprising the following steps:
1) establishing a resource allocation framework in a cloud-edge environment, comprising: a user resource demand model, a computing node resource state model and a resource allocator;
2) determining a user benefit optimization objective function, a service provider benefit optimization objective function and a bilateral benefit balancing objective function;
3) constructing the three elements of reinforcement learning in the resource allocator: state space, action space and reward function;
4) the resource allocator sends the state space to the Actor network, which selects a group of computing nodes a_i from the action space according to the policy and assigns user tasks to them, with a_i as the action vector:
a_i = μ(s_t, η_μ) + Ψ
wherein s_t represents the state of the cloud-edge system at time t; μ represents the policy fitted by a convolutional neural network; Ψ is random noise; η_μ is an Actor-Critic network parameter;
5) the state space is updated according to the action a_i selected in step 4), obtaining a new state s_{t+1}; the resource allocator assigns the users' tasks to the nodes a_i in sequence and calculates the reward value r_t over time period t; if the obtained reward value is negative, the selected action vector is stored in the tabu list; if the obtained reward value is positive, it is stored in the experience replay pool;
6) an action a'_i is simulated from the new state s_{t+1}:
a'_i = μ'(s_{t+1}, η_{μ'}) + Ψ
wherein μ' represents the policy fitted by a convolutional neural network, Ψ is random noise, and η_{μ'} is an Actor-Critic network parameter;
7) calculating the target value y_t:
y_t = Reward + γ·Q_{μ'}(s_{t+1}, a'_i, η_{Q'})
wherein Reward represents the reward function, γ is the attenuation factor, Q_{μ'} represents the Q evaluation value of the policy adopted in state s_{t+1}, η_{Q'} is the target policy network parameter in the Critic network, and η_{μ'} is the target policy network parameter in the Actor network;
8) calculating the Actor-Critic network parameter η_Q using the minimum mean square error loss function:
L(η_Q) = (1/X)·Σ_{t=1}^{X} (y_t − Q_μ(s_t, a_i, η_Q))²
wherein X represents the number of experiences in the experience replay pool, and Q_μ represents the Q value of taking action a_i in state s_t and always following policy μ thereafter;
9) updating the Actor-Critic network parameters by evaluating the policy μ with a Monte Carlo method;
10) repeating steps 3) to 9) until the Actor-Critic network converges, obtaining the optimal solution of the bilateral benefit balancing objective function.
Preferably, in step 1), at each scheduling time t:
each computing node transmits its own state to the resource allocator; the state specifically includes: CPU resource margin, memory resource margin and storage resource margin;
each user transmits its computing task demand to the resource allocator by means of a terminal device; the demand specifically includes: the location of the user, the size of the task, and the demands for CPU, memory and storage resources.
Preferably, the resource allocator in step 1) stores the user demands and computing node states in matrix form:
U^t = [s^t_{k,cpu}  s^t_{k,mem}  s^t_{k,storage}]_{K×3},  C^t = [c^t_{m,cpu}  c^t_{m,mem}  c^t_{m,storage}]_{M×3}
wherein U^t represents the user demand matrix at time t; K represents the total number of users at time t; s^t_{k,cpu} represents the demand of the kth user for CPU resources; s^t_{k,mem} represents the demand of the kth user for memory resources; s^t_{k,storage} represents the demand of the kth user for storage resources; C^t represents the state matrix of the computing nodes at time t; M represents the total number of computing nodes; c^t_{m,cpu} represents the CPU resource margin of the mth computing node; c^t_{m,mem} represents the memory resource margin of the mth computing node; c^t_{m,storage} represents the storage resource margin of the mth computing node.
Preferably, the user benefit optimization objective function in step 2) consists of the average task execution time of all users:
ART = (1/K)·Σ_{i=1}^{K} art_i
wherein art_i represents the task execution time of user i; ART represents the average task execution time of all users; K represents the number of users; art_i consists of the delay of transmitting the task to the computing node, the delay of waiting for execution in the computing node, and the task computing time:
art_i = art_delay + art_wait + art_computing
wherein art_delay represents the delay of transmitting the task to the computing node, art_wait represents the delay of the task waiting for execution in the computing node, and art_computing represents the time for which the task is computed in the computing node.
Preferably, the service provider benefit optimization objective function in step 2) consists of the resource utilization of all computing nodes:
wherein asr_j represents the resource utilization of computing node j; ASR represents the average resource utilization of all computing nodes; N represents the total number of resource types; c^t_{m,n} represents the margin of the nth resource of the mth computing node at time t; s^t_{k,n} represents the demand of the kth task for the nth resource at time t; A^t represents the scheduling action selected by the scheduler at time t.
Preferably, the bilateral benefit balancing objective function in step 2) consists of the user benefit optimization objective function and the service provider benefit optimization objective function:
wherein Z represents the bilateral benefit balancing objective function; θ represents the weight coefficient of the user benefit optimization objective function; φ represents the weight coefficient of the service provider benefit optimization objective function, with θ + φ = 1.
preferably, the resource allocator in step 6) evaluates the policy μ using bellman's formula:
Q μ (s t ,a i ,η μ )=E[Reward+γQ μ (s t+1 ,μ(s t+1 ,η Q ),η μ )]
e represents expectation.
Preferably, the method of updating the Actor-Critic network parameters by evaluating the policy μ with the Monte Carlo method in step 9) is:
η_{Q'} ← v·η_Q + (1 − v)·η_{Q'}
η_{μ'} ← v·η_μ + (1 − v)·η_{μ'}
wherein ∇ represents the gradient, and v is the update factor, with a value of 0.001.
Preferably, the delay art_delay of transmitting the task to the computing node is calculated as follows:
art_delay = α·Distance_ij
Distance_ij = R·cos⁻¹[sin(Mlat_i)·sin(Mlat_j)·cos(Mlon_i − Mlon_j) + cos(Mlat_i)·cos(Mlat_j)]·π÷180
wherein α is the delay coefficient; Distance_ij represents the distance between user i and computing node j; R represents the mean radius of the earth, with a value of 6371.004 km; π denotes pi; Mlat_i represents the calculated latitude value of user i; Mlon_i represents the calculated longitude value of user i.
Preferably, the delay art_wait of the task waiting for execution in the computing node is calculated as follows:
art_wait = task_begin − task_arrive
wherein task_begin represents the time at which the task computation starts, obtained from the system record; task_arrive represents the arrival time of the task, obtained from the system record;
the time art_computing for which the task is computed in the computing node is calculated as follows:
art_computing = task_size / f_j
wherein task_size is the size of the task and f_j represents the computing frequency of computing node j.
The invention obtains environment information by interacting with the cloud-edge environment and performs corresponding allocation actions according to changes in the environment information to achieve optimal resource allocation. Its advantages are:
1. compared with existing methods, the average resource utilization is improved by 35.08%;
2. compared with existing methods, the average task completion time is reduced by 24.2%;
3. compared with existing methods, the user benefit is improved by 32.96% while the service provider's benefit is guaranteed, showing better benefit balancing performance.
Drawings
FIG. 1 is a system architecture diagram of the resource allocation method for balancing the bilateral benefits of users and resource providers based on tabu reinforcement learning.
FIG. 2 is an overall architecture diagram of the tabu reinforcement learning algorithm.
FIG. 3 compares the user benefit of the method of the invention (SHAER) with existing methods (NSGA-II, MSQL, ICPSO) in an embodiment of the invention.
FIG. 4 compares the service provider benefit of the method of the invention (SHAER) with existing methods (NSGA-II, MSQL, ICPSO) in an embodiment of the invention.
FIG. 5 compares the average task completion time of the method of the invention (SHAER) with existing methods (NSGA-II, MSQL, ICPSO) in an embodiment of the invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments.
As shown in FIG. 1, the cloud-edge collaborative dual-benefit balanced tabu reinforcement learning resource allocation method provided by the invention interacts with the cloud-edge environment to obtain environment information and performs corresponding allocation actions according to changes in the environment information, thereby achieving optimal resource allocation. The specific steps are as follows:
1) establishing a resource allocation framework in a cloud-edge environment, comprising: a user resource demand model, a computing node resource state model and a resource allocator. At each scheduling time t:
each computing node transmits its own state to the resource allocator; the state specifically includes: CPU resource margin, memory resource margin and storage resource margin;
each user transmits its computing task demand to the resource allocator by means of a terminal device; the demand specifically includes: the location of the user, the size of the task, and the demands for CPU, memory and storage resources.
The resource allocator stores the user demands and computing node states in matrix form:
U^t = [s^t_{k,cpu}  s^t_{k,mem}  s^t_{k,storage}]_{K×3},  C^t = [c^t_{m,cpu}  c^t_{m,mem}  c^t_{m,storage}]_{M×3}
wherein U^t represents the user demand matrix at time t; K represents the total number of users at time t; s^t_{k,cpu} represents the demand of the kth user for CPU resources; s^t_{k,mem} represents the demand of the kth user for memory resources; s^t_{k,storage} represents the demand of the kth user for storage resources; C^t represents the state matrix of the computing nodes at time t; M represents the total number of computing nodes; c^t_{m,cpu} represents the CPU resource margin of the mth computing node; c^t_{m,mem} represents the memory resource margin of the mth computing node; c^t_{m,storage} represents the storage resource margin of the mth computing node.
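By way of illustration only, the following is a minimal Python sketch of how the time-t demand and state matrices described above might be assembled; the numpy representation, function name and sample values are assumptions, not part of the claimed method:

```python
import numpy as np

def build_matrices(user_demands, node_margins):
    """user_demands: list of (cpu, mem, storage) demands, one per user.
    node_margins: list of (cpu, mem, storage) margins, one per node."""
    U_t = np.asarray(user_demands, dtype=float)   # K x 3 user demand matrix U^t
    C_t = np.asarray(node_margins, dtype=float)   # M x 3 node state matrix C^t
    return U_t, C_t

# Example with K = 2 users and M = 2 computing nodes (arbitrary values)
U_t, C_t = build_matrices(
    [(2.0, 4.0, 10.0), (1.0, 2.0, 5.0)],
    [(8.0, 16.0, 100.0), (4.0, 8.0, 50.0)],
)
```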
2) determining the user benefit optimization objective function, the service provider benefit optimization objective function and the bilateral benefit balancing objective function. Wherein:
the user benefit optimization objective function consists of the average task execution time of all users:
ART = (1/K)·Σ_{i=1}^{K} art_i
wherein art_i represents the task execution time of user i; ART represents the average task execution time of all users; K represents the number of users; art_i consists of the delay of transmitting the task to the computing node, the delay of waiting for execution in the computing node, and the task computing time:
art_i = art_delay + art_wait + art_computing
wherein art_delay represents the delay of transmitting the task to the computing node, art_wait represents the delay of the task waiting for execution in the computing node, and art_computing represents the time for which the task is computed in the computing node.
The delay art_delay of transmitting the task to the computing node is calculated as follows:
art_delay = α·Distance_ij
Distance_ij = R·cos⁻¹[sin(Mlat_i)·sin(Mlat_j)·cos(Mlon_i − Mlon_j) + cos(Mlat_i)·cos(Mlat_j)]·π÷180
wherein α is the delay coefficient; Distance_ij represents the distance between user i and computing node j; R represents the mean radius of the earth, with a value of 6371.004 km; π denotes pi. Mlat_i represents the calculated latitude value of user i: if the geographic location is in the northern hemisphere, Mlat_i = 90 − lat_i; if in the southern hemisphere, Mlat_i = 90 + lat_i, where lat_i is the true latitude value of user i, obtained from GPS data; Mlat_j is calculated in the same way as Mlat_i. Mlon_i represents the calculated longitude value of user i: if the geographic location is in the eastern hemisphere, Mlon_i = lon_i; if in the western hemisphere, Mlon_i = −lon_i, where lon_i is the true longitude value of user i, obtained from GPS data; Mlon_j is calculated in the same way as Mlon_i.
The delay art_wait of the task waiting for execution in the computing node is calculated as follows:
art_wait = task_begin − task_arrive
wherein task_begin represents the time at which the task computation starts, obtained from the system record; task_arrive represents the arrival time of the task, obtained from the system record.
The time art_computing for which the task is computed in the computing node is calculated as follows:
art_computing = task_size / f_j
wherein task_size is the size of the task and f_j represents the computing frequency of computing node j.
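By way of illustration, a minimal Python sketch of the art_i computation follows; it evaluates the great-circle formula above in radians (equivalent to the degree form with the π÷180 factor), and the signed-degree GPS input convention and the linear delay-distance relation via α are assumptions:

```python
import math

R_EARTH = 6371.004  # mean earth radius in km, as given in the description

def great_circle_km(lat_i, lon_i, lat_j, lon_j):
    # Colatitude conversion from the description: Mlat = 90 - lat (north) and
    # 90 + |lat| (south); with signed degrees both cases reduce to 90 - lat.
    mlat_i, mlat_j = math.radians(90.0 - lat_i), math.radians(90.0 - lat_j)
    dlon = math.radians(lon_i - lon_j)  # Mlon_i - Mlon_j with signed longitudes
    cos_c = (math.sin(mlat_i) * math.sin(mlat_j) * math.cos(dlon)
             + math.cos(mlat_i) * math.cos(mlat_j))
    return R_EARTH * math.acos(max(-1.0, min(1.0, cos_c)))  # clamp for safety

def task_execution_time(dist_km, alpha, task_begin, task_arrive,
                        task_size, f_j):
    art_delay = alpha * dist_km          # assumed: delay proportional to distance
    art_wait = task_begin - task_arrive  # queueing delay from the system record
    art_computing = task_size / f_j      # computing time on node j
    return art_delay + art_wait + art_computing
```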
The service provider benefit optimization objective function consists of the resource utilization of all computing nodes:
wherein asr_j represents the resource utilization of computing node j; ASR represents the average resource utilization of all computing nodes; N represents the total number of resource types; c^t_{m,n} represents the margin of the nth resource of the mth computing node at time t; s^t_{k,n} represents the demand of the kth task for the nth resource at time t; A^t represents the scheduling action selected by the scheduler at time t.
The bilateral benefit balancing objective function consists of the user benefit optimization objective function and the service provider benefit optimization objective function:
wherein Z represents the bilateral benefit balancing objective function; θ represents the weight coefficient of the user benefit optimization objective function; φ represents the weight coefficient of the service provider benefit optimization objective function, with θ + φ = 1.
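By way of illustration, a minimal Python sketch of the two benefit indices and their weighted combination follows; since the exact form of Z is not reproduced above, the blend below (penalizing long completion time, rewarding high utilization) is an assumption:

```python
import numpy as np

def bilateral_objective(art, utilization, theta=0.5):
    """art: per-user task execution times art_i; utilization: per-node resource
    utilization asr_j in [0, 1]; theta: user-benefit weight (provider 1-theta)."""
    ART = float(np.mean(art))          # average task execution time (user benefit)
    ASR = float(np.mean(utilization))  # average resource utilization (provider benefit)
    # Assumed blend: shorter completion times and higher utilization raise Z.
    return theta * (1.0 / (1.0 + ART)) + (1.0 - theta) * ASR
```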
3) constructing the three elements of reinforcement learning in the resource allocator: state space, action space and reward function. As shown in FIG. 2, this embodiment employs the DDPG algorithm, which consists of an Actor network and a Critic network. At each time t, the algorithm determines the computing nodes to which the user tasks are allocated. The state space is represented by the computing node state matrix:
S = {C^t}
wherein S represents the state space. The action space is represented by the sets of computing nodes that can satisfy the execution of the user tasks:
A = {a_1, a_2, …, a_i}
wherein A represents the action space and a_i represents a set of computing nodes satisfying the execution of the user tasks.
The reward function is composed of the bilateral benefit balancing objective function and is calculated as follows:
wherein Reward represents the reward function.
4) The resource allocator sends the state space to the Actor network, which selects a group of computing nodes a_i from the action space according to the policy and assigns user tasks to them, with a_i as the action vector:
a_i = μ(s_t, η_μ) + Ψ
wherein s_t represents the state of the cloud-edge system at time t; μ represents the policy fitted by a convolutional neural network; Ψ is random noise; η_μ is an Actor-Critic network parameter.
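By way of illustration, a minimal PyTorch sketch of the action selection in step 4) follows; the actor network class and the Gaussian form of the noise Ψ are assumptions:

```python
import torch

def select_action(actor, state, noise_std=0.1):
    """Sketch of a_i = mu(s_t, eta_mu) + Psi, with Gaussian exploration noise
    standing in for Psi (the noise distribution is an assumption)."""
    with torch.no_grad():
        action = actor(state)  # mu(s_t, eta_mu)
    return action + noise_std * torch.randn_like(action)
```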
5) The state space is updated according to the action a_i selected in step 4), obtaining a new state s_{t+1}; the resource allocator assigns the users' tasks to the nodes a_i in sequence and calculates the reward value r_t over time period t; if the obtained reward value is negative, the selected action vector is stored in the tabu list, and if the obtained reward value is positive, it is stored in the experience replay pool.
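By way of illustration, a minimal Python sketch of the reward-sign routing rule in step 5) follows; the buffer capacities and the tabu membership test are assumptions:

```python
from collections import deque

class TabuReplayBuffer:
    """Negative-reward actions go to a tabu list (to be avoided later);
    positive-reward transitions go to the experience replay pool."""

    def __init__(self, replay_capacity=10000, tabu_capacity=500):
        self.replay = deque(maxlen=replay_capacity)  # experience replay pool
        self.tabu = deque(maxlen=tabu_capacity)      # tabu list of action vectors

    def store(self, state, action, reward, next_state):
        if reward < 0:
            self.tabu.append(tuple(action))          # forbid this action vector
        else:
            self.replay.append((state, action, reward, next_state))

    def is_tabu(self, action):
        return tuple(action) in self.tabu
```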
6) An action a'_i is simulated from the new state s_{t+1}:
a'_i = μ'(s_{t+1}, η_{μ'}) + Ψ
wherein μ' represents the policy fitted by a convolutional neural network, Ψ is random noise, and η_{μ'} is an Actor-Critic network parameter. The resource allocator evaluates the policy μ using the Bellman equation:
Q_μ(s_t, a_i, η_μ) = E[Reward + γ·Q_μ(s_{t+1}, μ(s_{t+1}, η_Q), η_μ)]
wherein γ is the attenuation factor and η_Q is an Actor-Critic network parameter.
7) Calculating the target value y_t:
y_t = Reward + γ·Q_{μ'}(s_{t+1}, a'_i, η_{Q'})
wherein Reward represents the reward function, γ is the attenuation factor, Q_{μ'} represents the Q evaluation value of the policy adopted in state s_{t+1}, η_{Q'} is the target policy network parameter in the Critic network, and η_{μ'} is the target policy network parameter in the Actor network.
8) Calculating the Actor-Critic network parameter η_Q using the minimum mean square error loss function:
L(η_Q) = (1/X)·Σ_{t=1}^{X} (y_t − Q_μ(s_t, a_i, η_Q))²
wherein X represents the number of experiences in the experience replay pool, and Q_μ represents the Q value of taking action a_i in state s_t and always following policy μ thereafter.
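By way of illustration, a minimal PyTorch sketch of steps 7) and 8) follows; the network classes, batch layout and optimizer are assumptions:

```python
import torch
import torch.nn.functional as F

def critic_update(critic, target_critic, target_actor, critic_opt,
                  batch, gamma=0.99):
    """Form the target value from the target networks, then fit the critic
    with a mean-squared-error loss over a sampled batch (s, a, r, s_next)."""
    s, a, r, s_next = batch
    with torch.no_grad():
        a_next = target_actor(s_next)                  # a'_i = mu'(s_{t+1})
        y = r + gamma * target_critic(s_next, a_next)  # target value y_t
    loss = F.mse_loss(critic(s, a), y)                 # minimum mean square error
    critic_opt.zero_grad()
    loss.backward()
    critic_opt.step()
    return loss.item()
```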
9) Updating the Actor-Critic network parameters by evaluating the policy μ with a Monte Carlo method:
η_{Q'} ← v·η_Q + (1 − v)·η_{Q'}
η_{μ'} ← v·η_μ + (1 − v)·η_{μ'}
wherein ∇ represents the gradient, and v is the update factor, with a value of 0.001.
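By way of illustration, a minimal Python sketch of the soft target update in step 9) follows, applied parameter-wise to torch modules (an assumption about the network representation):

```python
def soft_update(target_net, source_net, v=0.001):
    # eta' <- v * eta + (1 - v) * eta', applied to each parameter tensor
    for t_param, s_param in zip(target_net.parameters(),
                                source_net.parameters()):
        t_param.data.copy_(v * s_param.data + (1.0 - v) * t_param.data)
```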
10) Repeating steps 3) to 9) until the Actor-Critic network converges, obtaining the optimal solution of the bilateral benefit balancing objective function.
The method interacts with the cloud-edge environment to obtain environment information and performs corresponding allocation actions according to changes in the environment information, achieving optimal resource allocation. The invention was compared with existing methods (NSGA-II, MSQL, ICPSO) from multiple angles on the MO-FJSPW data set. The ratio of user benefit to service provider benefit is adopted as the performance index of dual-benefit balance: as can be seen from FIG. 3, the invention obtains the highest ratio under different numbers of users, which demonstrates the superiority of the method in benefit balancing. FIG. 4 shows that the method of the invention achieves higher average resource utilization under different numbers of users, significantly better than the other three methods. As can be seen from FIG. 5, the average task completion time of the invention is also superior to the other methods for different numbers of users.
Finally, it should be noted that the above detailed description is intended only to illustrate the technical solution of this patent and not to limit it. Although the patent has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solution of the patent can be modified or equivalently replaced without departing from the spirit and scope of the patent, all of which shall be covered by the claims of this patent.
Claims (5)
1. A cloud-edge collaborative dual-benefit balanced tabu reinforcement learning resource allocation method, characterized by comprising the following steps:
1) establishing a resource allocation framework in a cloud-edge environment, comprising: a user resource demand model, a computing node resource state model and a resource allocator; at each scheduling time t:
each computing node transmits its own state to the resource allocator; the state specifically includes: CPU resource margin, memory resource margin and storage resource margin;
each user transmits its computing task demand to the resource allocator by means of a terminal device; the demand specifically includes: the location of the user, the size of the task, and the demands for CPU, memory and storage resources;
the resource allocator stores the user demands and computing node states in matrix form:
U^t = [s^t_{k,cpu}  s^t_{k,mem}  s^t_{k,storage}]_{K×3},  C^t = [c^t_{m,cpu}  c^t_{m,mem}  c^t_{m,storage}]_{M×3}
wherein U^t represents the user demand matrix at time t; K represents the total number of users at time t; s^t_{k,cpu} represents the demand of the kth user for CPU resources; s^t_{k,mem} represents the demand of the kth user for memory resources; s^t_{k,storage} represents the demand of the kth user for storage resources; C^t represents the state matrix of the computing nodes at time t; M represents the total number of computing nodes; c^t_{m,cpu} represents the CPU resource margin of the mth computing node; c^t_{m,mem} represents the memory resource margin of the mth computing node; c^t_{m,storage} represents the storage resource margin of the mth computing node;
2) determining a user benefit optimization objective function, a service provider benefit optimization objective function and a bilateral benefit balancing objective function;
the user benefit optimization objective function consists of the average task execution time of all users:
ART = (1/K)·Σ_{i=1}^{K} art_i
wherein art_i represents the task execution time of user i; ART represents the average task execution time of all users; K represents the number of users; art_i consists of the delay of transmitting the task to the computing node, the delay of waiting for execution in the computing node, and the task computing time:
art_i = art_delay + art_wait + art_computing
wherein art_delay represents the delay of transmitting the task to the computing node, art_wait represents the delay of the task waiting for execution in the computing node, and art_computing represents the time for which the task is computed in the computing node;
the service provider benefit optimization objective function consists of the resource utilization of all computing nodes:
wherein asr_j represents the resource utilization of computing node j; ASR represents the average resource utilization of all computing nodes; N represents the total number of resource types; c^t_{m,n} represents the margin of the nth resource of the mth computing node at time t; s^t_{k,n} represents the demand of the kth task for the nth resource at time t; A^t represents the scheduling action selected by the scheduler at time t;
the bilateral benefit balancing objective function consists of the user benefit optimization objective function and the service provider benefit optimization objective function:
wherein Z represents the bilateral benefit balancing objective function; θ represents the weight coefficient of the user benefit optimization objective function; φ represents the weight coefficient of the service provider benefit optimization objective function, with θ + φ = 1;
3) constructing the three elements of reinforcement learning in the resource allocator: state space, action space and reward function;
4) the resource allocator sends the state space to the Actor network, which selects a group of computing nodes a_i from the action space according to the policy and assigns user tasks to them, with a_i as the action vector:
a_i = μ(s_t, η_μ) + Ψ
wherein s_t represents the state of the cloud-edge system at time t; μ represents the policy fitted by a convolutional neural network; Ψ is random noise; η_μ is an Actor-Critic network parameter;
5) the state space is updated according to the node a_i selected in step 4), obtaining a new state s_{t+1}; the resource allocator assigns the users' tasks to the nodes a_i in sequence and calculates the reward value r_t over time period t; if the obtained reward value is negative, the selected action vector is stored in the tabu list; if the obtained reward value is positive, it is stored in the experience replay pool;
6) an action a'_i is simulated from the new state s_{t+1}:
a'_i = μ'(s_{t+1}, η_{μ'}) + Ψ
wherein μ' represents the policy fitted by a convolutional neural network, Ψ is random noise, and η_{μ'} is an Actor-Critic network parameter;
7) calculating the target value y_t:
y_t = Reward + γ·Q_{μ'}(s_{t+1}, a'_i, η_{Q'})
wherein Reward represents the reward function, γ is the attenuation factor, Q_{μ'} represents the Q evaluation value of the policy adopted in state s_{t+1}, η_{Q'} is the target policy network parameter in the Critic network, and η_{μ'} is the target policy network parameter in the Actor network;
8) calculating the Actor-Critic network parameter η_Q using the minimum mean square error loss function:
L(η_Q) = (1/X)·Σ_{t=1}^{X} (y_t − Q_μ(s_t, a_i, η_Q))²
wherein X represents the number of experiences in the experience replay pool, and Q_μ represents the Q value of taking node a_i in state s_t and always following policy μ thereafter;
9) updating the Actor-Critic network parameters by evaluating the policy μ with a Monte Carlo method;
10) repeating steps 3) to 9) until the Actor-Critic network converges, obtaining the optimal solution of the bilateral benefit balancing objective function.
2. The cloud-edge collaborative dual-benefit balanced tabu reinforcement learning resource allocation method according to claim 1, characterized in that: the resource allocator in step 6) evaluates the policy μ using the Bellman equation:
Q_μ(s_t, a_i, η_μ) = E[Reward + γ·Q_μ(s_{t+1}, μ(s_{t+1}, η_Q), η_μ)]
wherein E represents the expectation.
3. The cloud-edge collaborative dual-benefit balanced tabu reinforcement learning resource allocation method according to claim 1, characterized in that: the method of updating the Actor-Critic network parameters by evaluating the policy μ with the Monte Carlo method in step 9) is:
η_{Q'} ← v·η_Q + (1 − v)·η_{Q'}
η_{μ'} ← v·η_μ + (1 − v)·η_{μ'}
4. The cloud-edge collaborative dual-benefit balanced tabu reinforcement learning resource allocation method according to claim 1, characterized in that: the delay art_delay of transmitting the task to the computing node is calculated as follows:
art_delay = α·Distance_ij
Distance_ij = R·cos⁻¹[sin(Mlat_i)·sin(Mlat_j)·cos(Mlon_i − Mlon_j) + cos(Mlat_i)·cos(Mlat_j)]·π÷180
wherein α is the delay coefficient; Distance_ij represents the distance between user i and computing node j; R represents the mean radius of the earth, with a value of 6371.004 km; π denotes pi; Mlat_i represents the calculated latitude value of user i; Mlon_i represents the calculated longitude value of user i.
5. The cloud-edge collaborative dual-benefit balanced tabu reinforcement learning resource allocation method according to claim 1, characterized in that: the delay art_wait of the task waiting for execution in the computing node is calculated as follows:
art_wait = task_begin − task_arrive
wherein task_begin represents the time at which the task computation starts, obtained from the system record; task_arrive represents the arrival time of the task, obtained from the system record;
the time art_computing for which the task is computed in the computing node is calculated as follows:
art_computing = task_size / f_j
wherein task_size is the size of the task and f_j represents the computing frequency of computing node j.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111209997.8A CN114116156B (en) | 2021-10-18 | 2021-10-18 | Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111209997.8A CN114116156B (en) | 2021-10-18 | 2021-10-18 | Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114116156A CN114116156A (en) | 2022-03-01 |
CN114116156B true CN114116156B (en) | 2022-09-09 |
Family
ID=80376227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111209997.8A Active CN114116156B (en) | 2021-10-18 | 2021-10-18 | Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114116156B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111444009A (en) * | 2019-11-15 | 2020-07-24 | 北京邮电大学 | Resource allocation method and device based on deep reinforcement learning |
CN112367353A (en) * | 2020-10-08 | 2021-02-12 | 大连理工大学 | Mobile edge computing unloading method based on multi-agent reinforcement learning |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111813539A (en) * | 2020-05-29 | 2020-10-23 | 西安交通大学 | Edge computing resource allocation method based on priority and cooperation |
CN111918339B (en) * | 2020-07-17 | 2022-08-05 | 西安交通大学 | AR task unloading and resource allocation method based on reinforcement learning in mobile edge network |
CN112363829B (en) * | 2020-11-03 | 2024-03-29 | 武汉理工大学 | User resource dynamic allocation method based on elastic scale aggregation |
CN112351433B (en) * | 2021-01-05 | 2021-05-25 | 南京邮电大学 | Heterogeneous network resource allocation method based on reinforcement learning |
-
2021
- 2021-10-18 CN CN202111209997.8A patent/CN114116156B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111444009A (en) * | 2019-11-15 | 2020-07-24 | 北京邮电大学 | Resource allocation method and device based on deep reinforcement learning |
CN112367353A (en) * | 2020-10-08 | 2021-02-12 | 大连理工大学 | Mobile edge computing unloading method based on multi-agent reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN114116156A (en) | 2022-03-01 |
Similar Documents
Publication | Title
---|---
CN108009023B (en) | Task scheduling method based on BP neural network time prediction in hybrid cloud
WO2021248607A1 (en) | Deep reinforcement learning-based taxi dispatching method and system
Murad et al. | A review on job scheduling technique in cloud computing and priority rule based intelligent framework
Keshk et al. | Cloud task scheduling for load balancing based on intelligent strategy
CN110247795A (en) | A kind of cloud net resource service chain method of combination and system based on intention
Aloqaily et al. | Fairness-aware game theoretic approach for service management in vehicular clouds
CN114546608A (en) | Task scheduling method based on edge calculation
Salimi et al. | Task scheduling with Load balancing for computational grid using NSGA II with fuzzy mutation
CN109710372A (en) | A kind of computation-intensive cloud workflow schedule method based on cat owl searching algorithm
CN113139639B (en) | MOMBI-oriented smart city application multi-target computing migration method and device
CN114116156B (en) | Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method
Zhou et al. | DPS: Dynamic pricing and scheduling for distributed machine learning jobs in edge-cloud networks
Ma et al. | Improved differential search algorithm based dynamic resource allocation approach for cloud application
CN110012507B (en) | Internet of vehicles resource allocation method and system with priority of user experience
Chen et al. | Profit-Aware Cooperative Offloading in UAV-Enabled MEC Systems Using Lightweight Deep Reinforcement Learning
CN115271130B (en) | Dynamic scheduling method and system for maintenance order of ship main power equipment
Milocco et al. | Evaluating the upper bound of energy cost saving by proactive data center management
CN115016889A (en) | Virtual machine optimization scheduling method for cloud computing
Zhao et al. | Hypergraph-based task-bundle scheduling towards efficiency and fairness in heterogeneous distributed systems
Sun et al. | An intelligent resource allocation mechanism in the cloud computing environment
CN112700269A (en) | Distributed data center selection method based on anisotropic reinforcement learning
CN110661649A (en) | Power communication network resource allocation method
Wang et al. | Reinforcement Contract Design for Vehicular-Edge Computing Scheduling and Energy Trading Via Deep Q-Network With Hybrid Action Space
Lee et al. | A market-based resource management and qos support framework for distributed multimedia systems
Zeng et al. | Game strategies among multiple cloud computing platforms for non-cooperative competing assignment user tasks
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||