CN114090239A - Model-based reinforcement learning edge resource scheduling method and device - Google Patents

Model-based reinforcement learning edge resource scheduling method and device

Info

Publication number
CN114090239A
CN114090239A
Authority
CN
China
Prior art keywords
edge
reinforcement learning
model
resource scheduling
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111285553.2A
Other languages
Chinese (zh)
Other versions
CN114090239B (en)
Inventor
缪巍巍
曾锃
张明轩
张震
张瑞
滕昌志
李世豪
毕思博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd filed Critical Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202111285553.2A priority Critical patent/CN114090239B/en
Publication of CN114090239A publication Critical patent/CN114090239A/en
Application granted granted Critical
Publication of CN114090239B publication Critical patent/CN114090239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5055Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering software capabilities, i.e. software resources associated or available to the machine
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a model-based reinforcement learning edge resource scheduling method and device. Historical data of the load information, resource information and user request information of edge nodes are collected through an edge server, and an edge environment model is constructed from this historical data through supervised learning; reinforcement learning edge node resource scheduling is then realized on the basis of the edge environment model, and each user request is distributed to a suitable edge node. The method and device handle dynamic resource load requests in edge computing resource scheduling scenarios, with higher sample utilization and stronger practicability.

Description

Model-based reinforcement learning edge resource scheduling method and device
Technical Field
The invention relates to a model-based reinforcement learning edge resource scheduling method and device, and belongs to the technical field of the Internet of Things.
Background
Because the load of an edge node changes dynamically, it must be scheduled reasonably by an algorithm: user task requests are distributed to different edge nodes so as to guarantee optimal service while keeping the load balanced.
The prior art generally performs resource scheduling in one of the following ways:
1. Manual rules, for example allocating requests with low resource demand to edge nodes that are already busy and requests with high resource demand to idle nodes;
2. Combinatorial optimization, which treats the allocation as an approximate bin-packing problem, solves for a near-optimal assignment, and distributes each request to the corresponding edge node;
3. Heuristic algorithms, for example simulated annealing, used to derive a load request distribution;
4. Reinforcement-learning-based load request distribution algorithms.
The manual-rule approach depends on experienced personnel, requires maintaining a very complex rule system, and often performs poorly. Combinatorial optimization can only handle static resource requests and cannot be applied to dynamic request scenarios. Heuristic methods often fail to reach the global optimum. Although general reinforcement-learning-based resource scheduling can handle dynamic requests, it must explore by trial and error in the real edge computing environment, which causes performance loss and reduced user satisfaction.
Disclosure of Invention
The purpose is as follows: to overcome the defects of the prior art, the invention provides a model-based reinforcement learning edge resource scheduling method and device that achieve very high sample efficiency, can allocate resources in edge computing scenarios, and are better suited to deployment in real edge computing environments.
The technical scheme is as follows: to solve the above technical problems, the invention adopts the following technical scheme:
In a first aspect, a model-based reinforcement learning edge resource scheduling method includes the following steps:
collecting, through an edge server, historical data of the load information, resource information and user request information of edge nodes, and constructing an edge environment model through supervised learning according to the historical data;
realizing reinforcement learning edge node resource scheduling based on the edge environment model, and distributing each user request to a suitable edge node.
In a second aspect, an apparatus for scheduling edge resources based on model-based reinforcement learning includes the following modules:
the edge environment model building module: used for collecting, through the edge server, historical data of the load information, resource information and user request information of the edge nodes, and building an edge environment model through supervised learning according to the historical data;
a reinforcement learning module: used for realizing reinforcement learning edge node resource scheduling based on the edge environment model and distributing each user request to a suitable edge node.
Preferably, the method for constructing the edge environment model through supervised learning according to the historical data comprises the following steps:
based on the collected historical data, through the supervised learning of the deep neural network, the input of the edge environment model is the current state and the current action as an input vector X, and the current state comprises the following steps: the method comprises the steps of obtaining resource information of edge nodes, load information of the edge nodes and user request data; the current actions include: allocation is requested for each user. The output of the edge environment model is the state at the next moment as an output vector y, and the state at the next moment comprises: resource information of the edge node, load information of the edge node, and user request data.
The dimension of the deep neural network input is the second dimension of the input vector X, and in the deep neural network, network output is performed through a full connection layer by taking a plurality of full connection layers, a ReLU activation layer and a batch normalization layer as intermediate network layers.
The deep neural network updates parameters of the deep neural network through a gradient descent and back propagation method according to a loss function.
Preferably, the resource information of the edge node includes the number of CPU cores, the total amount of memory, the total amount of bandwidth and the number of servers of the edge node. The load information of the edge node includes yesterday's historical load, the historical average load of the last week, the historical average load of the last month and the historical average load of the last year. The user request information includes the amount of resources requested by each user and the response time of the user request.
As a preferred scheme, the method for implementing reinforcement learning edge node resource scheduling based on the edge environment model and allocating the user's request to a suitable edge node includes the following steps:
for reinforcement learning, the elements of the Markov decision process are defined as follows:
state s: the resource information of the edge nodes, the load information of the edge nodes and the user request data;
action a: the assignment of the user's request to an edge node;
reward r: a weighted sum of user satisfaction and load balancing.
A state-action value function Q(s, a) = E[r | s_0 = s, a_0 = a] is constructed to obtain the cumulative reward; a policy function μ(o), by which the edge node allocates resources, outputs different actions with different probabilities; and the resource allocation scheme for each user request that maximizes the cumulative reward is output according to the cumulative reward and the actions. Here s is the initial state, a is the initial action, and o is the state observed by the edge node.
As a preferred scheme, the state-action value function and the policy function for the edge node's resource allocation are modeled by multilayer neural networks; the neural network built for the state-action value function updates its parameters by minimizing the temporal-difference error, and the neural network built for the edge node's resource allocation policy function updates its parameters according to the policy gradient theorem, yielding the updated neural networks.
Preferably, the state-action value function is updated according to the following formula:
Q ← (1 − w)Q_g + wQ
where Q_g is the global state-action value function and w is a weight.
Preferably, the satisfaction is a linear function of the response time (the longer the response time, the lower the satisfaction); the load balancing term is the minimum load among the plurality of edge nodes; and the weights of the weighted sum are set according to the preference of the edge node administrator.
Beneficial effects: the model-based reinforcement learning edge resource scheduling method and device provided by the invention handle dynamic resource load requests in edge computing resource scheduling scenarios, with higher sample utilization and stronger practicability.
Drawings
FIG. 1 is a schematic diagram of a system architecture for edge computing resource allocation.
FIG. 2 is a schematic flow diagram of the method of the present invention.
Fig. 3 is a schematic diagram of a model for resource allocation by multi-edge node cooperation.
Detailed Description
The present invention will be further described with reference to the following examples.
The invention discloses a model-based reinforcement learning edge resource scheduling system, which schedules resources for dynamic user load requests and distributes the user requests to different edge nodes, thereby maximizing user satisfaction while balancing the load among the edge nodes.
As shown in fig. 1, the system consists of several unmanned aerial vehicle terminal devices, a base station and an edge device cluster. When edge computing resources are allocated, the terminal devices send load tasks to the edge device cluster through the base station, and the edge devices determine how many resources (CPU, memory) to allocate to each task according to the load and resource requirements of the different tasks.
As shown in fig. 2, a model-based edge resource scheduling method for reinforcement learning includes the following steps:
and collecting load information, resource information and historical data of user request information of the edge nodes through the edge server, and constructing an edge environment model through supervised learning.
And realizing reinforcement learning edge node resource scheduling based on the edge environment model, and distributing the user request to a proper edge node.
The specific method comprises the following steps:
the construction method of the edge environment model comprises the following steps:
step 1: collecting historical data of edge nodes, specifically including the following categories:
the resource information of the edge node includes: the number of CPU cores of the edge nodes, the total amount of memory, the total amount of bandwidth and the number of servers of the edge nodes.
The load information of the edge node includes: yesterday's historical load, the historical average load of the last week, the historical average load of the last month and the historical average load of the last year.
The user request information includes: the amount of resources requested by each user, the response time of the user request.
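As an illustration only (not part of the original disclosure), the collected historical records could be assembled into supervised training pairs for the environment model roughly as sketched below; the record fields and their layout are assumptions made for the sketch.

```python
import numpy as np

def build_training_pairs(records):
    """Assemble (X, y) pairs for the edge environment model from historical records.

    Each record is assumed to be a dict holding:
      'resources' - static node resources (CPU cores, memory, bandwidth, server count)
      'load'      - historical load features (yesterday / last week / last month / last year)
      'requests'  - per-user requested resource amounts and response times
      'action'    - the allocation decided for each user request
    Field names and shapes are illustrative assumptions, not the patent's exact format.
    """
    X, y = [], []
    for t in range(len(records) - 1):
        cur, nxt = records[t], records[t + 1]
        state = np.concatenate([cur['resources'], cur['load'], cur['requests']])
        next_state = np.concatenate([nxt['resources'], nxt['load'], nxt['requests']])
        X.append(np.concatenate([state, cur['action']]))  # input: current state + current action
        y.append(next_state)                              # target: state at the next moment
    return np.stack(X), np.stack(y)
```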
Step 2: construct the edge environment model through a supervised learning algorithm.
Based on the collected historical data, the edge environment model is built through the supervised learning of a deep neural network. The input of the edge environment model is the current state and the current action, forming an input vector X; the current state comprises the resource information of the edge nodes, the load information of the edge nodes and the user request data, and the current action comprises the allocation of each user request. The output of the edge environment model is the state at the next moment, forming an output vector y; the state at the next moment comprises the resource information of the edge nodes, the load information of the edge nodes and the user request data.
The input dimension of the deep neural network equals the second dimension of the input vector X; inside the network, several fully connected layers, ReLU activation layers and batch normalization layers serve as intermediate layers, and the network output is produced through a final fully connected layer. At the output, the prediction is compared with the true state y and the following loss function is calculated:
L(θ) = Σ_i ‖ f(X_i) − y_i ‖²
wherein: f () represents the output of the deep neural network, yiFor the real state, the network parameters can then be updated by gradient descent and back propagation methods, according to the loss function.
In addition, since the resource information of the edge node is static data, prediction is not required. For the load information of the edge node, when the user request information is known, the part can also be directly determined, so that the output part only needs to include the user request data at the next moment.
In the method for realizing reinforcement learning edge node resource scheduling based on the edge environment model, reinforcement learning explores and makes trial-and-error decisions inside the edge environment model and thereby finds an optimal resource scheduling policy. The method comprises the following steps:
step 1: for reinforcement learning, the elements in the markov decision process are respectively defined as follows:
and a state s: resource information of the edge node, load information of the edge node, and user request data.
Action a: the user's request is distributed to the edge nodes.
Reward r: a weighted sum of user satisfaction and load balancing. The satisfaction degree comprises: a linear function of the response time, the longer the response time, the lower the satisfaction; the load balancing comprises: a minimum load among the plurality of edge nodes; the weight of the weighted sum is set according to the preference of the edge node administrator.
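To make the reward concrete, a small sketch of this weighted sum is given below; the linear slope of the satisfaction term and the administrator-chosen weight are assumptions.

```python
def reward(response_times, node_loads, alpha=0.7, slope=0.01):
    """Reward r = weighted sum of user satisfaction and load balancing.

    Satisfaction is a linear function of response time (longer response -> lower
    satisfaction); the load-balancing term is the minimum load among the edge
    nodes; alpha is the administrator-set weight and slope is an assumed coefficient.
    """
    satisfaction = sum(1.0 - slope * t for t in response_times) / len(response_times)
    load_balance = min(node_loads)
    return alpha * satisfaction + (1.0 - alpha) * load_balance
```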
Step 2: and outputting a resource allocation scheme requested by each user through a deep reinforcement learning algorithm, thereby achieving the maximization of long-term accumulated benefits.
Define the state-action value function Q(s, a) = E[r | s_0 = s, a_0 = a], i.e., the cumulative reward the policy can obtain when the initial state and the initial action are s and a, respectively.
Define the policy function for the edge node's resource allocation as μ(o), i.e., the probability of adopting each allocation scheme after the edge node observes the state o. The state-action value function and the policy function for the edge node's resource allocation are both modeled by multilayer neural networks, whose parameters are updated as follows.
For the state-action value function, the neural network parameters are updated by minimizing the temporal-difference error:
L(θ) = E[(Q(s, a) − y)²]
where y = r + γ max_{a'} Q(s', a'), s' is the state after action a is executed, a' is the action at the next moment, and γ is the discount factor. The reward r and the next-moment state s' are predicted by the edge environment model, so no interaction with the real environment is needed; this effectively improves the training speed and also improves the stability of the algorithm.
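A sketch of this model-based temporal-difference update is shown below; the critic architecture, the way the reward is derived from the model's predicted state, and the set of candidate next actions are assumptions introduced for illustration.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Critic Q(s, a): concatenates state and action and outputs a scalar value."""
    def __init__(self, s_dim, a_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def td_update(q_net, env_model, reward_fn, states, actions, candidate_actions, opt, gamma=0.95):
    """Minimize L = E[(Q(s, a) - y)^2] with y = r + gamma * max_a' Q(s', a').

    s' and r are obtained from the learned edge environment model (env_model, reward_fn),
    so the real environment is never touched during training. candidate_actions is an
    assumed list of batched candidate next actions a' over which the max is taken.
    """
    q_sa = q_net(states, actions)
    with torch.no_grad():
        next_states = env_model(torch.cat([states, actions], dim=-1))  # model predicts s'
        rewards = reward_fn(next_states)                               # reward derived from predicted state
        q_next = torch.stack([q_net(next_states, a) for a in candidate_actions]).max(dim=0).values
        y = rewards + gamma * q_next
    loss = ((q_sa - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```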
For the policy function of the edge node's resource allocation, the neural network parameters are updated according to the policy gradient theorem:
∇_θ J(θ) = E[∇_θ log μ_θ(a | o) · Q(s, a)]
it should also be noted that in using reinforcement learning, exploration in the environment is required. In the invention, the strategy function is assumed to be a probability function, so that different actions can be output with different probabilities, and in the process of reinforcement learning execution, the variance of the probability function is gradually reduced, so that the finally executed action is output with a more stable value.
As shown in fig. 3, the above method performs unified, global scheduling across a plurality of edge nodes. If the edge nodes can only be scheduled in a distributed manner and each edge node performs only local resource allocation, it is difficult to maximize the global benefit. On the other hand, if the states of the other edge nodes are taken into account, the overall benefit can be improved through cooperation. A state-action value function Q and a policy function μ may be maintained for each edge node; in addition, to improve the overall cooperative efficiency, a global Q_g may be used as the global state-action value function. The value of Q_g is not used directly to output actions, but may be used to update part of the Q value:
Q ← (1 − w)Q_g + wQ
where w is a weight: the smaller w is, the more weight each edge node places on the cooperative global value, and the larger it is, the more the node emphasizes its own reward.
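A minimal sketch of this cooperative value mixing, with each node keeping its own critic and blending in a shared global critic, is shown below; how Q_g is trained and how w is chosen are assumptions.

```python
def mixed_q_value(q_local, q_global, w=0.5):
    """Blend the node's own value with the shared global one: Q <- (1 - w) * Q_g + w * Q.
    Smaller w puts more weight on the shared global value Q_g; larger w on the node's own Q."""
    return (1.0 - w) * q_global + w * q_local

# Usage sketch: each edge node i keeps its own critic q_nets[i] and policy, while a shared
# q_global_net provides the cooperative value mixed into its targets, e.g.
#   blended = mixed_q_value(q_nets[i](s, a), q_global_net(s, a), w=0.3)
```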
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (8)

1. A model-based reinforcement learning edge resource scheduling method, characterized in that it comprises the following steps:
collecting historical data of load information, resource information and user request information of edge nodes through an edge server, and constructing an edge environment model through supervised learning according to the historical data;
and realizing reinforcement learning edge node resource scheduling based on the edge environment model, and distributing the user request to a proper edge node.
2. The method of claim 1, wherein the model-based edge resource scheduling method for reinforcement learning comprises: the method for constructing the edge environment model through supervised learning according to the historical data comprises the following steps:
based on the collected historical data, through the supervised learning of a deep neural network, the input of the edge environment model is the current state and the current action, forming an input vector X, and the current state comprises: the resource information of the edge nodes, the load information of the edge nodes and the user request data; the current action comprises: the allocation of each user request; the output of the edge environment model is the state at the next moment, forming an output vector y, and the state at the next moment comprises: the resource information of the edge nodes, the load information of the edge nodes and the user request data;
the dimension of the deep neural network input is the second dimension of the input vector X, and in the deep neural network, network output is carried out through a full connection layer by taking a plurality of full connection layers, a ReLU activation layer and a batch normalization layer as intermediate network layers;
the deep neural network updates parameters of the deep neural network through a gradient descent and back propagation method according to a loss function.
3. The method of claim 1, wherein the model-based edge resource scheduling method for reinforcement learning comprises: the resource information of the edge node includes: the number of CPU cores, the total amount of memory, the total amount of bandwidth and the number of servers of the edge node; the load information of the edge node includes: yesterday's historical load, the historical average load of the last week, the historical average load of the last month and the historical average load of the last year; the user request information includes: the amount of resources requested by each user and the response time of the user request.
4. The method of claim 1, wherein the model-based edge resource scheduling method for reinforcement learning comprises: the method for realizing the reinforcement learning edge node resource scheduling based on the edge environment model and distributing the user request to the proper edge node comprises the following steps:
for reinforcement learning, the elements of the Markov decision process are defined:
state s: the resource information of the edge nodes, the load information of the edge nodes and the user request data;
action a: distributing the request of the user to an edge node;
reward r: a weighted sum of user satisfaction and load balancing;
by constructing a state-action value function Q(s, a) = E[r | s_0 = s, a_0 = a], a cumulative reward is acquired; different actions output with different probabilities are acquired through a policy function μ(o) by which the edge node allocates resources; and the resource allocation scheme for each user request that maximizes the cumulative reward is output according to the cumulative reward and the actions; where s is the initial state, a is the initial action, and o is the state observed by the edge node.
5. The method of claim 4, wherein the model-based reinforcement learning edge resource scheduling method comprises: the state-action value function and the policy function for the edge node's resource allocation are modeled by multilayer neural networks; the neural network built for the state-action value function updates its parameters by minimizing the temporal-difference error, and the neural network built for the edge node's resource allocation policy function updates its parameters according to the policy gradient theorem, so as to obtain the updated neural networks.
6. The method of claim 4, wherein the model-based reinforcement learning edge resource scheduling method comprises: the state-action value function is updated according to the following formula:
Q ← (1 − w)Q_g + wQ
where Q_g is the global state-action value function and w is a weight.
7. The method of claim 4, wherein the model-based reinforcement learning edge resource scheduling method comprises: the satisfaction is a linear function of the response time, with longer response time giving lower satisfaction; the load balancing term is the minimum load among the plurality of edge nodes; and the weights of the weighted sum are set according to the preference of the edge node administrator.
8. A model-based reinforcement learning edge resource scheduling device, characterized in that it comprises the following modules:
an edge environment model building module: used for collecting, through the edge server, historical data of the load information, resource information and user request information of the edge nodes and building an edge environment model through supervised learning according to the historical data;
a reinforcement learning module: used for realizing reinforcement learning edge node resource scheduling based on the edge environment model and distributing each user request to a suitable edge node.
CN202111285553.2A 2021-11-01 2021-11-01 Method and device for dispatching edge resources based on model reinforcement learning Active CN114090239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111285553.2A CN114090239B (en) 2021-11-01 2021-11-01 Method and device for dispatching edge resources based on model reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111285553.2A CN114090239B (en) 2021-11-01 2021-11-01 Method and device for dispatching edge resources based on model reinforcement learning

Publications (2)

Publication Number Publication Date
CN114090239A true CN114090239A (en) 2022-02-25
CN114090239B CN114090239B (en) 2024-08-13

Family

ID=80298547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111285553.2A Active CN114090239B (en) 2021-11-01 2021-11-01 Method and device for dispatching edge resources based on model reinforcement learning

Country Status (1)

Country Link
CN (1) CN114090239B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022189A (en) * 2022-05-31 2022-09-06 武汉大学 Edge user distribution model construction method, device, equipment and readable storage medium
CN118227369A (en) * 2024-05-22 2024-06-21 苏州元脑智能科技有限公司 Active fault tolerance method, apparatus, device, medium and computer program product

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190384641A1 (en) * 2018-06-15 2019-12-19 EMC IP Holding Company LLC Method, apparatus, and computer program product for processing computing task
CN111506405A (en) * 2020-04-08 2020-08-07 北京交通大学 Edge calculation time slice scheduling method based on deep reinforcement learning
CN112069903A (en) * 2020-08-07 2020-12-11 之江实验室 Method and device for achieving face recognition end side unloading calculation based on deep reinforcement learning
CN113282368A (en) * 2021-05-25 2021-08-20 国网湖北省电力有限公司检修公司 Edge computing resource scheduling method for substation inspection
US20210303481A1 (en) * 2020-03-27 2021-09-30 Intel Corporation Efficient data sharing for graphics data processing operations
CN113467952A (en) * 2021-07-15 2021-10-01 北京邮电大学 Distributed federated learning collaborative computing method and system
CN113495793A (en) * 2020-04-02 2021-10-12 英特尔公司 Method and apparatus for buffer sharing
CN113543156A (en) * 2021-06-24 2021-10-22 中国科学院沈阳自动化研究所 Industrial wireless network resource allocation method based on multi-agent deep reinforcement learning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190384641A1 (en) * 2018-06-15 2019-12-19 EMC IP Holding Company LLC Method, apparatus, and computer program product for processing computing task
US20210303481A1 (en) * 2020-03-27 2021-09-30 Intel Corporation Efficient data sharing for graphics data processing operations
CN113495793A (en) * 2020-04-02 2021-10-12 英特尔公司 Method and apparatus for buffer sharing
CN111506405A (en) * 2020-04-08 2020-08-07 北京交通大学 Edge calculation time slice scheduling method based on deep reinforcement learning
CN112069903A (en) * 2020-08-07 2020-12-11 之江实验室 Method and device for achieving face recognition end side unloading calculation based on deep reinforcement learning
CN113282368A (en) * 2021-05-25 2021-08-20 国网湖北省电力有限公司检修公司 Edge computing resource scheduling method for substation inspection
CN113543156A (en) * 2021-06-24 2021-10-22 中国科学院沈阳自动化研究所 Industrial wireless network resource allocation method based on multi-agent deep reinforcement learning
CN113467952A (en) * 2021-07-15 2021-10-01 北京邮电大学 Distributed federated learning collaborative computing method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LEI LEI: ""Multiuser Resource Control With Deep Reinforcement Learning in IoT Edge Computing"", 《IEEE INTERNET OF THINGS JOURNAL》, vol. 6, no. 6, 31 December 2019 (2019-12-31), pages 10119 - 10133, XP011760732, DOI: 10.1109/JIOT.2019.2935543 *
机器学习算法工程师 (Machine Learning Algorithm Engineer): "A Plain-Language Understanding of Reinforcement Learning, Part 1: The Markov Reward Process (MRP)", Retrieved from the Internet <URL:https://cloud.tencent.com/developer/article/1167673> *
缪巍巍: "Resource Allocation Algorithm for Edge IoT Agents Based on Multi-Agent Reinforcement Learning", Electric Power Information and Communication Technology, vol. 19, no. 12, 25 December 2021 (2021-12-25), pages 9-15 *
陆知遥: "Research on the Dynamic Dispatch of Shared Vehicles Based on Multi-Agent Methods", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 2021, 15 September 2021 (2021-09-15), pages 034-64 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022189A (en) * 2022-05-31 2022-09-06 武汉大学 Edge user distribution model construction method, device, equipment and readable storage medium
CN115022189B (en) * 2022-05-31 2024-03-26 武汉大学 Edge user allocation model construction method, device, equipment and readable storage medium
CN118227369A (en) * 2024-05-22 2024-06-21 苏州元脑智能科技有限公司 Active fault tolerance method, apparatus, device, medium and computer program product
CN118227369B (en) * 2024-05-22 2024-09-13 苏州元脑智能科技有限公司 Active fault tolerance method, apparatus, device, medium and computer program product

Also Published As

Publication number Publication date
CN114090239B (en) 2024-08-13

Similar Documents

Publication Publication Date Title
Prem Jacob et al. A multi-objective optimal task scheduling in cloud environment using cuckoo particle swarm optimization
CN109617826B (en) Storm dynamic load balancing method based on cuckoo search
CN115248728A (en) Distributed training task scheduling method, system and device for intelligent computing
Kaur et al. Deep‐Q learning‐based heterogeneous earliest finish time scheduling algorithm for scientific workflows in cloud
CN114090239B (en) Method and device for dispatching edge resources based on model reinforcement learning
CN112052092B (en) Risk-aware edge computing task allocation method
CN110297699A (en) Dispatching method, scheduler, storage medium and system
CN114610474B (en) Multi-strategy job scheduling method and system under heterogeneous supercomputing environment
CN113342510B (en) Water and power basin emergency command cloud-side computing resource cooperative processing method
CN115134371A (en) Scheduling method, system, equipment and medium containing edge network computing resources
CN115237581A (en) Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device
CN112732444A (en) Distributed machine learning-oriented data partitioning method
CN113641445B (en) Cloud resource self-adaptive configuration method and system based on depth deterministic strategy
CN111309472A (en) Online virtual resource allocation method based on virtual machine pre-deployment
CN116932198A (en) Resource scheduling method, device, electronic equipment and readable storage medium
CN106407007B (en) Cloud resource configuration optimization method for elastic analysis process
CN118210609A (en) Cloud computing scheduling method and system based on DQN model
CN112632615B (en) Scientific workflow data layout method based on hybrid cloud environment
CN116701001B (en) Target task allocation method and device, electronic equipment and storage medium
CN116500896B (en) Intelligent real-time scheduling model and method for intelligent network-connected automobile domain controller multi-virtual CPU tasks
CN117436627A (en) Task allocation method, device, terminal equipment and medium
CN112698911B (en) Cloud job scheduling method based on deep reinforcement learning
CN115185651A (en) Workflow optimization scheduling algorithm based on cloud computing
CN113238873A (en) Method for optimizing and configuring spacecraft resources
Liu A Programming Model for the Cloud Platform

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant