CN114090239A - Model-based reinforcement learning edge resource scheduling method and device - Google Patents
- Publication number
- CN114090239A (application number CN202111285553.2A)
- Authority
- CN
- China
- Prior art keywords
- edge
- reinforcement learning
- model
- resource scheduling
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the load
- G06F9/5016—Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals, the resource being the memory
- G06F9/5055—Allocation of resources to service a request, the resource being a machine, considering software capabilities, i.e. software resources associated or available to the machine
- G06N3/04—Neural networks; architecture, e.g. interconnection topology
- G06N3/084—Neural network learning methods; backpropagation, e.g. using gradient descent
Abstract
The invention discloses a model-based reinforcement learning edge resource scheduling method and device. An edge server collects historical data on the load information, resource information, and user request information of edge nodes, and an edge environment model is constructed from this historical data through supervised learning. Reinforcement-learning edge-node resource scheduling is then performed against the edge environment model, and user requests are distributed to suitable edge nodes. For edge computing resource scheduling scenarios, the method and device handle dynamic resource load requests with higher sample efficiency and greater practicality.
Description
Technical Field
The invention relates to a model-based reinforcement learning edge resource scheduling method and device, and belongs to the technical field of the Internet of Things.
Background
Because the load of an edge node changes dynamically, it must be scheduled by an algorithm that distributes user task requests across different edge nodes, so as to guarantee service quality while balancing the load.
The prior art generally performs resource scheduling by the following methods:
1. Manual rules, for example allocating requests with low load demand to edge nodes that are busy, and requests with high load demand to nodes that are idle;
2. Combinatorial optimization, e.g. approximating the problem as bin packing, solving for a near-optimal allocation scheme, and assigning requests to the corresponding edge nodes;
3. Heuristic load-request distribution algorithms, for example simulated annealing;
4. Reinforcement-learning-based load-request distribution algorithms.
The manual-rule method depends on experienced personnel and on maintaining a very complex rule system, and is often ineffective. Combinatorial optimization can only handle static resource requests and does not apply to dynamic ones. Heuristic methods often fail to reach the global optimum. General (model-free) reinforcement-learning schedulers can handle dynamic requests, but they must explore by trial and error in the real edge computing environment, which can degrade performance and reduce user satisfaction.
Disclosure of Invention
The purpose is as follows: in order to overcome the defects of the prior art, the invention provides a model-based reinforcement learning edge resource scheduling method and device that have very high sample efficiency, realize resource allocation for edge computing scenarios, and are better suited to deployment in real edge computing environments.
The technical scheme is as follows: to solve the above technical problems, the invention adopts the following technical scheme.
in a first aspect, a model-based edge resource scheduling method for reinforcement learning includes the following steps:
collecting historical data of the load information, resource information, and user request information of the edge nodes through the edge server, and constructing an edge environment model through supervised learning from the historical data;
realizing reinforcement-learning edge-node resource scheduling based on the edge environment model, and distributing user requests to suitable edge nodes.
In a second aspect, an apparatus for scheduling edge resources based on model-based reinforcement learning includes the following modules:
the edge environment model building module: the edge server is used for collecting historical data of load information, resource information and user request information of the edge nodes and building an edge environment model through supervised learning according to the historical data.
A reinforcement learning module: the method is used for realizing reinforcement learning edge node resource scheduling based on the edge environment model and distributing the user request to the appropriate edge node.
Preferably, the method for constructing the edge environment model through supervised learning according to the historical data comprises the following steps:
Based on the collected historical data and supervised learning with a deep neural network: the input of the edge environment model is the current state and the current action, as an input vector X. The current state comprises: resource information of the edge nodes, load information of the edge nodes, and user request data; the current action comprises the allocation for each user request. The output of the edge environment model is the state at the next moment, as an output vector y; the state at the next moment comprises: resource information of the edge nodes, load information of the edge nodes, and user request data.
The input dimension of the deep neural network is the second (feature) dimension of the input vector X. In the deep neural network, several fully connected layers, ReLU activation layers, and batch normalization layers serve as intermediate network layers, and the network output is produced through a final fully connected layer.
The parameters of the deep neural network are updated by gradient descent and back-propagation according to a loss function.
Preferably, the resource information of an edge node includes: the number of CPU cores, the total memory, the total bandwidth, and the number of servers of the edge node. The load information of an edge node includes: yesterday's historical load, last week's historical average load, last month's historical average load, and last year's historical average load. The user request information includes: the amount of resources requested by each user and the response time of each user request.
As a preferred scheme, the method for implementing reinforcement learning edge node resource scheduling based on the edge environment model and allocating the user's request to a suitable edge node includes the following steps:
For reinforcement learning, the elements of the Markov decision process are defined:
State s: resource information of the edge nodes, load information of the edge nodes, and user request data.
Action a: the assignment of the user's request to an edge node.
Reward r: a weighted sum of user satisfaction and load balancing.
By constructing a state-action value function Q(s, a) = E[r | s_0 = s, a_0 = a], the cumulative reward is obtained; a policy function μ(o) for edge-node resource allocation outputs different actions with different probabilities; and according to the cumulative reward and the actions, the resource allocation scheme for each user request that maximizes the cumulative reward is output. Here s is the initial state, a is the initial action, and o is the state observed by the edge node.
As a preferred scheme, the state-action value function and the policy function for edge-node resource allocation are modeled by multilayer neural networks: the neural network constructed for the state-action value function updates its parameters by minimizing the temporal-difference (TD) error, and the neural network constructed for the edge-node resource allocation policy function updates its parameters using the policy gradient theorem, to obtain the updated networks.
Preferably, the state-action value function is updated as follows:
Q = (1 - w)·Q_g + w·Q
where Q_g is the global state-action value function and w is a weight.
Preferably, the satisfaction is a linear function of the response time: the longer the response time, the lower the satisfaction. The load-balancing term is the minimum load among the plurality of edge nodes. The weights of the weighted sum are set according to the preference of the edge-node administrator.
Beneficial effects: the model-based reinforcement learning edge resource scheduling method and device provided by the invention handle dynamic resource load requests in edge computing resource scheduling scenarios, with higher sample efficiency and greater practicality.
Drawings
FIG. 1 is a schematic diagram of a system architecture for edge computing resource allocation.
FIG. 2 is a schematic flow diagram of the method of the present invention.
Fig. 3 is a schematic diagram of a model for resource allocation by multi-edge node cooperation.
Detailed Description
The present invention will be further described with reference to the following examples.
The invention discloses a model-based reinforcement learning edge resource scheduling system, which is used for performing resource scheduling on dynamic user load requests and distributing the user requests to different edge nodes, thereby maximizing the satisfaction degree of users and simultaneously balancing the load among the edge nodes.
As shown in fig. 1, the system is composed of a plurality of unmanned aerial vehicle terminal devices, a base station, and an edge device cluster, and when the system performs edge computing resource allocation, a plurality of terminal devices send a load task to the edge device cluster through the base station. The edge device determines how many resources (CPU, memory) to allocate for each task according to the load and resource requirements of different tasks.
As shown in fig. 2, a model-based edge resource scheduling method for reinforcement learning includes the following steps:
and collecting load information, resource information and historical data of user request information of the edge nodes through the edge server, and constructing an edge environment model through supervised learning.
And realizing reinforcement learning edge node resource scheduling based on the edge environment model, and distributing the user request to a proper edge node.
The specific method comprises the following steps:
the construction method of the edge environment model comprises the following steps:
step 1: collecting historical data of edge nodes, specifically including the following categories:
the resource information of the edge node includes: the number of CPU cores of the edge nodes, the total amount of memory, the total amount of bandwidth and the number of servers of the edge nodes.
The load information of the edge node includes: yesterday historical load, last week historical average load, last month historical average load, last year calendar historical average load.
The user request information includes: the amount of resources requested by each user, the response time of the user request.
Step 2: construct the edge environment model through a supervised learning algorithm.
Based on the collected historical data, the edge environment model is constructed through supervised learning with a deep neural network. The input of the edge environment model is the current state and the current action, concatenated as an input vector X. The current state comprises: resource information of the edge nodes, load information of the edge nodes, and user request data; the current action comprises the allocation for each user request. The output of the edge environment model is the state at the next moment, as an output vector y, comprising: resource information of the edge nodes, load information of the edge nodes, and user request data.
The input dimension of the deep neural network is the second (feature) dimension of the input vector X. In the deep neural network, several fully connected layers, ReLU activation layers, and batch normalization layers serve as intermediate network layers, and the network output is produced through a final fully connected layer. At the output, the prediction is compared with the true state y and the following loss function is calculated:
L(θ) = (1/N) Σ_i || f(X_i) - y_i ||²
where f(·) denotes the output of the deep neural network and y_i is the true next state. The network parameters can then be updated from this loss by gradient descent and back-propagation.
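As a concrete illustration of this training loop, the sketch below builds a small fully connected network in plain NumPy and fits it to synthetic (state, action) → next-state data with an MSE loss, gradient descent, and back-propagation. All layer sizes, dimensions, and the synthetic data are illustrative assumptions, not values from the patent; batch normalization is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the patent).
STATE_DIM, ACTION_DIM, HIDDEN = 8, 3, 32
IN_DIM = STATE_DIM + ACTION_DIM  # input X = [state, action]

# Fully connected -> ReLU -> fully connected.
W1 = rng.normal(0.0, 0.1, (IN_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, STATE_DIM)); b2 = np.zeros(STATE_DIM)

def forward(X):
    h = np.maximum(X @ W1 + b1, 0.0)  # ReLU activation layer
    return h, h @ W2 + b2             # predicted next state f(X)

def train_step(X, y, lr=0.01):
    """One gradient-descent step on L = mean_i ||f(X_i) - y_i||^2."""
    global W1, b1, W2, b2
    n = X.shape[0]
    h, pred = forward(X)
    err = pred - y
    loss = float(np.mean(np.sum(err ** 2, axis=1)))
    g_pred = 2.0 * err / n            # dL/dpred
    gW2, gb2 = h.T @ g_pred, g_pred.sum(axis=0)
    g_h = (g_pred @ W2.T) * (h > 0)   # back-propagate through the ReLU
    gW1, gb1 = X.T @ g_h, g_h.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
    return loss

# Synthetic "historical data": next state as a fixed linear map of (state, action).
A = rng.normal(0.0, 0.3, (IN_DIM, STATE_DIM))
X = rng.normal(0.0, 1.0, (256, IN_DIM))
y = X @ A
losses = [train_step(X, y) for _ in range(500)]
```

In practice the X and y batches would come from the collected logs of edge-node states, allocations, and next states rather than a synthetic linear map.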
In addition, since the resource information of an edge node is static data, it does not need to be predicted. The load information of an edge node can likewise be determined directly once the user request information is known, so the output only needs to include the user request data at the next moment.
In the method for realizing reinforcement-learning edge-node resource scheduling based on the edge environment model, reinforcement learning explores by trial and error inside the edge environment model to find an optimal resource scheduling strategy. The method comprises the following steps:
step 1: for reinforcement learning, the elements in the markov decision process are respectively defined as follows:
and a state s: resource information of the edge node, load information of the edge node, and user request data.
Action a: the user's request is distributed to the edge nodes.
Reward r: a weighted sum of user satisfaction and load balancing. The satisfaction degree comprises: a linear function of the response time, the longer the response time, the lower the satisfaction; the load balancing comprises: a minimum load among the plurality of edge nodes; the weight of the weighted sum is set according to the preference of the edge node administrator.
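The reward defined above can be sketched as a small function. The maximum response time and the weights below are illustrative assumptions; the patent only specifies a linear satisfaction term, a minimum-load balance term, and administrator-chosen weights.

```python
def satisfaction(response_time_ms, max_time_ms=1000.0):
    """Linear in response time: 1.0 at t = 0, falling to 0.0 at max_time_ms.
    max_time_ms is an assumed cutoff, not a value from the patent."""
    return max(0.0, 1.0 - response_time_ms / max_time_ms)

def load_balance(node_loads):
    """The patent's balance term: the minimum load among the edge nodes."""
    return min(node_loads)

def reward(response_time_ms, node_loads, w_sat=0.7, w_bal=0.3):
    """Weighted sum; the weights stand in for administrator preferences."""
    return w_sat * satisfaction(response_time_ms) + w_bal * load_balance(node_loads)

print(reward(200.0, [0.4, 0.6, 0.5]))  # ≈ 0.68
```

Maximizing the minimum load pushes requests toward the least-loaded node, which is one standard max-min notion of balance.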
Step 2: output the resource allocation scheme for each user request through a deep reinforcement learning algorithm, thereby maximizing the long-term cumulative reward.
Define the state-action value function Q(s, a) = E[r | s_0 = s, a_0 = a], i.e. the cumulative reward the strategy can obtain when the initial state and action are s and a respectively.
Define the policy function for edge-node resource allocation as μ(o), i.e. the probability of adopting different allocation schemes after the edge node observes state o. Both the state-action value function and the policy function are modeled by multilayer neural networks, whose parameters are updated as follows.
For the state-action value function, the neural network parameters are updated by minimizing the temporal-difference error:
L(θ) = E[(Q(s, a) - y)²]
where y = r + γ max_{a'} Q(s', a'). Here s' is the state after action a is executed, a' is the action at the next moment, and γ is the discount coefficient weighting future rewards. The reward r and the next state s' are predicted by the edge environment model, so no interaction with the real environment is needed. This effectively speeds up training and also improves the stability of the algorithm.
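A minimal sketch of this model-based TD update, using a toy tabular Q function and a hand-written stand-in for the learned environment model. The transition and reward tables below are assumptions for illustration only.

```python
import random

random.seed(0)

N_STATES, N_ACTIONS = 4, 3
GAMMA, LR = 0.9, 0.1

def model_step(s, a):
    """Stand-in for the learned edge environment model: deterministic toy
    transitions, with reward 1.0 only when the next state is state 0."""
    s_next = (s + a) % N_STATES
    return (1.0 if s_next == 0 else 0.0), s_next

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def td_update(s, a):
    """Minimize the TD error: move Q(s,a) toward y = r + gamma * max_a' Q(s',a')."""
    r, s_next = model_step(s, a)
    y = r + GAMMA * max(Q[s_next])
    Q[s][a] += LR * (y - Q[s][a])
    return s_next

# Trial and error happens entirely inside the model, never in the real edge system.
s = 0
for _ in range(5000):
    a = random.randrange(N_ACTIONS)
    s = td_update(s, a)

best_action = max(range(N_ACTIONS), key=lambda a: Q[0][a])
```

In the patent's setting, Q would be a neural network and `model_step` would be the learned edge environment model; the structure of the update is the same.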
For the policy function of edge-node resource allocation, the neural network parameters are updated according to the policy gradient theorem, whose standard form is
∇_θ J(θ) = E[∇_θ log μ_θ(a | o) · Q(s, a)]
It should also be noted that reinforcement learning requires exploration in the environment. In the invention, the policy function is assumed to be a probability function, so different actions are output with different probabilities; during the execution of reinforcement learning, the variance of the probability function is gradually reduced, so that the finally executed action is output with a more stable value.
As shown in fig. 3, the method above performs unified, global scheduling over multiple edge nodes. If the edge nodes can only schedule in a distributed fashion, with each node performing only local resource allocation, it is difficult to maximize the global benefit. If, on the other hand, the states of the other edge nodes are taken into account, the overall benefit can be improved through cooperation. A state-action value function Q and a policy function μ may be maintained for each edge node; in addition, to improve overall cooperative efficiency, a global state-action value function Q_g may be used. The value of Q_g is not used directly to output actions, but may enter the update of each node's local Q value:
Q = (1 - w)·Q_g + w·Q
where w is a weight: the smaller w is, the more each edge node attends to the cooperative global reward; the larger w is, the more it attends to its own local reward.
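The blending rule above can be sketched directly. Taking the global Q_g as the average of the nodes' local estimates is an illustrative assumption; the patent only requires some global state-action value function.

```python
def mix_q(q_local, q_global, w):
    """Blended value: Q <- (1 - w) * Q_g + w * Q_local.
    Smaller w weighs the cooperative global estimate more heavily."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * q_global + w * q_local

# Each node keeps a local Q; Q_g here is assumed to be their average.
local_qs = [2.0, 4.0, 6.0]
q_g = sum(local_qs) / len(local_qs)           # 4.0
mixed = [mix_q(q, q_g, w=0.25) for q in local_qs]
print(mixed)  # → [3.5, 4.0, 4.5]
```

With w = 0.25, each node's estimate is pulled three quarters of the way toward the global value, which is how the cooperative signal dominates without replacing the local view entirely.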
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.
Claims (8)
1. A method for scheduling edge resources based on model reinforcement learning is characterized in that: the method comprises the following steps:
collecting historical data of load information, resource information and user request information of edge nodes through an edge server, and constructing an edge environment model through supervised learning according to the historical data;
and realizing reinforcement learning edge node resource scheduling based on the edge environment model, and distributing the user request to a proper edge node.
2. The method of claim 1, wherein the model-based edge resource scheduling method for reinforcement learning comprises: the method for constructing the edge environment model through supervised learning according to the historical data comprises the following steps:
based on the collected historical data and supervised learning with a deep neural network, the input of the edge environment model is the current state and the current action as an input vector X, where the current state comprises: resource information of the edge nodes, load information of the edge nodes, and user request data; the current action comprises the allocation for each user request; the output of the edge environment model is the state at the next moment as an output vector y, where the state at the next moment comprises: resource information of the edge nodes, load information of the edge nodes, and user request data;
the input dimension of the deep neural network is the second (feature) dimension of the input vector X; in the deep neural network, several fully connected layers, ReLU activation layers, and batch normalization layers serve as intermediate network layers, and the network output is produced through a fully connected layer;
the deep neural network updates parameters of the deep neural network through a gradient descent and back propagation method according to a loss function.
3. The method of claim 1, wherein: the resource information of the edge node includes: the number of CPU cores, the total memory, the total bandwidth, and the number of servers of the edge node; the load information of the edge node includes: yesterday's historical load, last week's historical average load, last month's historical average load, and last year's historical average load; the user request information includes: the amount of resources requested by each user and the response time of each user request.
4. The method of claim 1, wherein the model-based edge resource scheduling method for reinforcement learning comprises: the method for realizing the reinforcement learning edge node resource scheduling based on the edge environment model and distributing the user request to the proper edge node comprises the following steps:
for reinforcement learning, the elements of the Markov decision process are defined:
state s: resource information of the edge nodes, load information of the edge nodes, and user request data;
action a: distributing the user's request to an edge node;
reward r: a weighted sum of user satisfaction and load balancing;
by constructing a state-action value function Q(s, a) = E[r | s_0 = s, a_0 = a], a cumulative reward is obtained; through a policy function μ(o) for edge-node resource allocation, different actions output with different probabilities are obtained; and according to the cumulative reward and the actions, the resource allocation scheme for each user request that maximizes the cumulative reward is output; where s is the initial state, a is the initial action, and o is the state observed by the edge node.
5. The method of claim 4, wherein: the state-action value function and the policy function for edge-node resource allocation are modeled by multilayer neural networks, wherein the neural network constructed for the state-action value function updates its parameters by minimizing the temporal-difference error, and the neural network constructed for the edge-node resource allocation policy function updates its parameters using the policy gradient theorem, to obtain the updated neural networks.
6. The method of claim 4, wherein the state-action value function is updated by the following formula:
Q = (1 - w)·Q_g + w·Q
where Q_g is the global state-action value function and w is a weight.
7. The method of claim 4, wherein: the satisfaction is a linear function of the response time, the longer the response time, the lower the satisfaction; the load balancing is the minimum load among the plurality of edge nodes; and the weights of the weighted sum are set according to the preference of the edge-node administrator.
8. A model-based reinforcement learning edge resource scheduling device, characterized in that it comprises the following modules:
the edge environment model building module: the edge server is used for collecting historical data of load information, resource information and user request information of edge nodes and building an edge environment model through supervised learning according to the historical data;
a reinforcement learning module: the method is used for realizing reinforcement learning edge node resource scheduling based on the edge environment model and distributing the user request to the appropriate edge node.
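The two-module device of claim 8 can be outlined as follows; the class names, data shapes, and the least-loaded placeholder policy are hypothetical stand-ins for the supervised environment model and the learned scheduling policy:

```python
class EdgeEnvironmentModelBuilder:
    """Collects historical edge-node data and builds an environment model from it."""
    def __init__(self):
        self.history = []  # (load_info, resource_info, user_request) records

    def collect(self, load_info, resource_info, user_request):
        self.history.append((load_info, resource_info, user_request))

    def build_model(self):
        # Placeholder for supervised learning over self.history.
        return {"samples": len(self.history)}

class ReinforcementLearningScheduler:
    """Schedules user requests onto edge nodes on top of the environment model."""
    def __init__(self, env_model):
        self.env_model = env_model

    def schedule(self, user_request, nodes):
        # Placeholder policy: pick the least-loaded node.
        return min(nodes, key=lambda n: n["load"])

builder = EdgeEnvironmentModelBuilder()
builder.collect({"cpu": 0.5}, {"mem": 8}, {"req": "a"})
scheduler = ReinforcementLearningScheduler(builder.build_model())
chosen = scheduler.schedule({"req": "b"}, [{"id": 1, "load": 0.9}, {"id": 2, "load": 0.2}])
print(chosen["id"])  # 2
```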
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111285553.2A CN114090239B (en) | 2021-11-01 | 2021-11-01 | Method and device for dispatching edge resources based on model reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114090239A true CN114090239A (en) | 2022-02-25 |
CN114090239B CN114090239B (en) | 2024-08-13 |
Family
ID=80298547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111285553.2A Active CN114090239B (en) | 2021-11-01 | 2021-11-01 | Method and device for dispatching edge resources based on model reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114090239B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190384641A1 (en) * | 2018-06-15 | 2019-12-19 | EMC IP Holding Company LLC | Method, apparatus, and computer program product for processing computing task |
US20210303481A1 (en) * | 2020-03-27 | 2021-09-30 | Intel Corporation | Efficient data sharing for graphics data processing operations |
CN113495793A (en) * | 2020-04-02 | 2021-10-12 | 英特尔公司 | Method and apparatus for buffer sharing |
CN111506405A (en) * | 2020-04-08 | 2020-08-07 | 北京交通大学 | Edge calculation time slice scheduling method based on deep reinforcement learning |
CN112069903A (en) * | 2020-08-07 | 2020-12-11 | 之江实验室 | Method and device for achieving face recognition end side unloading calculation based on deep reinforcement learning |
CN113282368A (en) * | 2021-05-25 | 2021-08-20 | 国网湖北省电力有限公司检修公司 | Edge computing resource scheduling method for substation inspection |
CN113543156A (en) * | 2021-06-24 | 2021-10-22 | 中国科学院沈阳自动化研究所 | Industrial wireless network resource allocation method based on multi-agent deep reinforcement learning |
CN113467952A (en) * | 2021-07-15 | 2021-10-01 | 北京邮电大学 | Distributed federated learning collaborative computing method and system |
Non-Patent Citations (4)
Title |
---|
LEI LEI: ""Multiuser Resource Control With Deep Reinforcement Learning in IoT Edge Computing"", 《IEEE INTERNET OF THINGS JOURNAL》, vol. 6, no. 6, 31 December 2019 (2019-12-31), pages 10119 - 10133, XP011760732, DOI: 10.1109/JIOT.2019.2935543 * |
机器学习算法工程师 (Machine Learning Algorithm Engineer): "A Plain-Language Introduction to Reinforcement Learning, Part 1: The Markov Reward Process (MRP)", Retrieved from the Internet <URL:《https://cloud.tencent.com/developer/article/1167673》> * |
缪巍巍 (MIAO Weiwei): "A Resource Allocation Algorithm for Edge IoT Agents Based on Multi-Agent Reinforcement Learning", 《电力信息与通信技术》 (Electric Power Information and Communication Technology), vol. 19, no. 12, 25 December 2021 (2021-12-25), pages 9 - 15 * |
陆知遥 (LU Zhiyao): "Research on Dynamic Dispatching of Shared Vehicles Based on Multi-Agent Methods", 《中国优秀硕士学位论文全文数据库 工程科技II辑》 (China Masters' Theses Full-text Database, Engineering Science and Technology II), no. 2021, 15 September 2021 (2021-09-15), pages 034 - 64 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115022189A (en) * | 2022-05-31 | 2022-09-06 | 武汉大学 | Edge user distribution model construction method, device, equipment and readable storage medium |
CN115022189B (en) * | 2022-05-31 | 2024-03-26 | 武汉大学 | Edge user allocation model construction method, device, equipment and readable storage medium |
CN118227369A (en) * | 2024-05-22 | 2024-06-21 | 苏州元脑智能科技有限公司 | Active fault tolerance method, apparatus, device, medium and computer program product |
CN118227369B (en) * | 2024-05-22 | 2024-09-13 | 苏州元脑智能科技有限公司 | Active fault tolerance method, apparatus, device, medium and computer program product |
Also Published As
Publication number | Publication date |
---|---|
CN114090239B (en) | 2024-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Prem Jacob et al. | A multi-objective optimal task scheduling in cloud environment using cuckoo particle swarm optimization | |
CN109617826B (en) | Storm dynamic load balancing method based on cuckoo search | |
CN115248728A (en) | Distributed training task scheduling method, system and device for intelligent computing | |
Kaur et al. | Deep‐Q learning‐based heterogeneous earliest finish time scheduling algorithm for scientific workflows in cloud | |
CN114090239B (en) | Method and device for dispatching edge resources based on model reinforcement learning | |
CN112052092B (en) | Risk-aware edge computing task allocation method | |
CN110297699A (en) | Dispatching method, scheduler, storage medium and system | |
CN114610474B (en) | Multi-strategy job scheduling method and system under heterogeneous supercomputing environment | |
CN113342510B (en) | Water and power basin emergency command cloud-side computing resource cooperative processing method | |
CN115134371A (en) | Scheduling method, system, equipment and medium containing edge network computing resources | |
CN115237581A (en) | Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device | |
CN112732444A (en) | Distributed machine learning-oriented data partitioning method | |
CN113641445B (en) | Cloud resource self-adaptive configuration method and system based on depth deterministic strategy | |
CN111309472A (en) | Online virtual resource allocation method based on virtual machine pre-deployment | |
CN116932198A (en) | Resource scheduling method, device, electronic equipment and readable storage medium | |
CN106407007B (en) | Cloud resource configuration optimization method for elastic analysis process | |
CN118210609A (en) | Cloud computing scheduling method and system based on DQN model | |
CN112632615B (en) | Scientific workflow data layout method based on hybrid cloud environment | |
CN116701001B (en) | Target task allocation method and device, electronic equipment and storage medium | |
CN116500896B (en) | Intelligent real-time scheduling model and method for intelligent network-connected automobile domain controller multi-virtual CPU tasks | |
CN117436627A (en) | Task allocation method, device, terminal equipment and medium | |
CN112698911B (en) | Cloud job scheduling method based on deep reinforcement learning | |
CN115185651A (en) | Workflow optimization scheduling algorithm based on cloud computing | |
CN113238873A (en) | Method for optimizing and configuring spacecraft resources | |
Liu | A Programming Model for the Cloud Platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||