CN109947567B - Multi-agent reinforcement learning scheduling method and system and electronic equipment - Google Patents
Info
- Publication number
- CN109947567B CN109947567B CN201910193429.XA CN201910193429A CN109947567B CN 109947567 B CN109947567 B CN 109947567B CN 201910193429 A CN201910193429 A CN 201910193429A CN 109947567 B CN109947567 B CN 109947567B
- Authority
- CN
- China
- Prior art keywords
- scheduling
- agent
- virtual machine
- service node
- reinforcement learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06N3/04—Neural networks; Architecture, e.g. interconnection topology
- G06N3/08—Neural networks; Learning methods
Abstract
The application relates to a multi-agent reinforcement learning scheduling method and system and to electronic equipment. The method comprises the following steps: step a: collecting server parameters from the network data center and the load information of the virtual machines running on each server; step b: building a virtual simulation environment from the server parameters and virtual machine load information, and building a multi-agent deep reinforcement learning model; step c: performing offline training and learning with the multi-agent deep reinforcement learning model in the simulation environment, training one agent model for each server; step d: deploying the agent models to the real service nodes and scheduling according to the load of each service node. The method and system virtualize the services running on the servers through virtualization technology and balance load by scheduling virtual machines; resource allocation therefore operates at a macroscopic level, and the multiple agents can learn cooperative strategies in a complex dynamic environment.
Description
Technical Field
The present application relates to the field of multi-agent systems, and in particular, to a method, a system, and an electronic device for multi-agent reinforcement learning scheduling.
Background
In a cloud computing environment, the traditional service deployment mode copes poorly with variable access patterns. Although a fixed allocation of resources can provide services stably, it wastes a large amount of resources: within the same network topology, some servers may routinely run at full load while others host only a few services and leave much of their storage and computing capacity idle. Traditional deployment thus struggles to avoid this waste and to schedule efficiently, so resources cannot be fully utilized. A scheduling algorithm that adapts to dynamic environments is therefore needed to balance the load of the individual servers in the network.
With the development of virtualization technology, the appearance of virtual machines, containers, and related technologies has moved the resource scheduling problem from static allocation to dynamic allocation, and in recent years schemes for adaptive resource scheduling have multiplied. Most adopt a heuristic algorithm: they schedule dynamically by adjusting parameters, judge whether the available resources in the operating environment are abundant or insufficient against a threshold, and iteratively compute a suitable threshold with the heuristic. However, such a method merely seeks an optimal solution over a massive combination of data, and the resulting decision is optimal only for the current time node; the time-series information is not fully exploited, so the method struggles with resource allocation in a large-scale, complex, dynamic environment.
With the rise of artificial intelligence, the development of deep reinforcement learning has made agent decision-making over large state spaces possible. In multi-agent reinforcement learning, however, distributed learning with a traditional algorithm such as Q-learning or PG (the Policy Gradient method) still falls short of the expected effect: at every step each agent tries to learn and predict the actions of the other agents, and because the other agents keep changing in a dynamic environment, the environment becomes non-stationary, knowledge is hard to learn, and optimal resource allocation cannot be achieved. Moreover, most current scheduling approaches use either single-agent or distributed reinforcement learning. Training a single agent centrally is difficult and converges poorly, because a network topology produces a huge action space of complex state changes and permutations. Distributed reinforcement learning faces another problem: it usually trains multiple agents together only to accelerate convergence, so the agents share one scheduling strategy; the multiple entities merely speed up training, and the resulting homogeneous agents have no cooperative ability. In the traditional multi-agent method, each agent predicts the decisions of the other agents at every decision step, but since those decisions are unstable in a dynamic environment, training is very difficult and the agents end up behaving almost identically, without a cooperative strategy.
Disclosure of Invention
The application provides a multi-agent reinforcement learning scheduling method, a multi-agent reinforcement learning scheduling system and electronic equipment, and aims to solve at least one of the technical problems in the prior art to a certain extent.
In order to solve the above problems, the present application provides the following technical solutions:
a multi-agent reinforcement learning scheduling method comprises the following steps:
step a: collecting server parameters from the network data center and the load information of the virtual machines running on each server;
step b: building a virtual simulation environment from the server parameters and virtual machine load information, and building a multi-agent deep reinforcement learning model;
step c: performing offline training and learning with the multi-agent deep reinforcement learning model in the simulation environment, training one agent model for each server;
step d: deploying the agent models to the real service nodes and scheduling according to the load of each service node.
The technical scheme adopted by the embodiment of the application further comprises: the step a further comprises performing a normalization preprocessing operation on the collected server parameters and virtual machine load information. The normalization preprocessing comprises: defining the virtual machine information of each service node as a tuple containing the number of virtual machines and their respective configurations; each virtual machine has two scheduling states, to-be-scheduled and running; each service node has two states, saturated and hungry; and the sum of the resource ratios occupied by the virtual machines must remain below the configured upper limit of the server hosting them.
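As an illustration only, the node/VM tuple described above could be sketched in Python as follows. All field and class names here are assumptions for the sketch; the patent only specifies a tuple of VM count plus per-VM configuration, VM states {to-be-scheduled, running}, node states {saturated, hungry}, and the resource-ratio upper bound:

```python
from dataclasses import dataclass, field

@dataclass
class VirtualMachine:
    cpu: float   # fraction of host CPU occupied (hypothetical normalization)
    mem: float   # fraction of host memory occupied
    disk: float  # fraction of host disk occupied
    state: str = "running"  # "running" or "to_be_scheduled"

@dataclass
class ServiceNode:
    vms: list = field(default_factory=list)
    state: str = "hungry"   # "saturated" or "hungry"

    def is_valid(self) -> bool:
        # The sum of per-VM resource ratios must stay below the host's
        # configured upper limit (normalized here to 1.0).
        return all(
            sum(getattr(vm, r) for vm in self.vms) < 1.0
            for r in ("cpu", "mem", "disk")
        )

node = ServiceNode(vms=[VirtualMachine(0.3, 0.2, 0.1),
                        VirtualMachine(0.4, 0.3, 0.2)])
print(node.is_valid())
```

A node whose VMs together exceed the host limit on any resource would fail this check and could not absorb further scheduled-in VMs.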
The technical scheme adopted by the embodiment of the application further comprises: in the step b, the multi-agent deep reinforcement learning model specifically comprises a prediction module and a scheduling module. The prediction module predicts, from the information input by each service node, the resources that need to be scheduled out in the current state, and maps the action space onto the total capacity of the current service node according to its configuration information. The scheduling module reschedules and distributes the virtual machines marked as to-be-scheduled to generate a scheduling strategy, and the agent on each service node computes a return function from the generated scheduling action. The prediction module measures the quality of the scheduling strategy, so that the load of each service node in the whole network is balanced.
The technical scheme adopted by the embodiment of the application further comprises: in step c, performing offline training and learning with the multi-agent deep reinforcement learning model in the simulation environment, and training an agent model for each server, specifically comprises: the agent on each service node adjusts the amount of resource to be scheduled through its prediction module and marks the virtual machines to be scheduled out; a scheduling strategy is generated from the virtual machines in the to-be-scheduled state; the return value of each service node is computed, and the per-node returns are summed into a total return value, according to which the parameters of each prediction module are adjusted.
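A minimal sketch of this offline training loop, with the prediction, marking/scheduling, and per-node return computations replaced by simple stand-in functions (all names and the balancing heuristic are assumptions, not the patent's actual modules):

```python
import random

random.seed(0)
N_NODES = 4

def predict_space_to_free(node_load):
    # Stand-in for the prediction module: how much load this node should
    # schedule out (negative means it can absorb load from others).
    return node_load - 0.5

def mark_and_schedule(predictions):
    # Stand-in for the scheduling module: shift load from over-predicted
    # nodes toward under-predicted ones.
    mean = sum(predictions) / len(predictions)
    return [p - mean for p in predictions]

def node_reward(load):
    # Stand-in per-node return: less negative the closer a node sits
    # to the balanced load of 0.5.
    return -abs(load - 0.5)

loads = [random.random() for _ in range(N_NODES)]
for step in range(10):
    preds = [predict_space_to_free(l) for l in loads]
    moves = mark_and_schedule(preds)
    loads = [l - m for l, m in zip(loads, moves)]
    # Per-node returns are summed into one total return value, which in
    # the patent's scheme drives the update of every prediction module.
    total_return = sum(node_reward(l) for l in loads)
print(loads, total_return)
```

After a few iterations the stand-in scheduler equalizes the node loads, and the total return stabilizes; in the real model the prediction modules would instead be updated by gradient steps on this summed return.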
The technical scheme adopted by the embodiment of the application further comprises: in the step d, deploying the agent models to the real service nodes and scheduling according to the load of each service node specifically comprises: deploying each trained agent model to its corresponding service node in the real environment; sensing the state information of its host server over a period of time as input, predicting the resources the current server needs to release, and using a knapsack algorithm to select the virtual machines closest to that target, marking them as to-be-scheduled; then collecting, through the scheduling module, the predictions from all servers together with the virtual machines marked as to-be-scheduled, assigning the to-be-scheduled virtual machines to suitable servers as required to generate a scheduling strategy, and distributing the scheduling commands to the corresponding service nodes for execution; before executing the scheduling strategy, checking whether each scheduling command is legal; if not, feeding back a penalty reward to update the parameters and regenerating the scheduling strategy; if legal, executing the scheduling operation and using the feedback reward value to update the agent parameters.
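The legality check and penalty feedback described above could be sketched as follows. The capacity model, command format, and penalty value are assumptions for the sketch, not details from the patent:

```python
def is_legal(command, capacities, usage):
    # A scheduling command is taken to be legal when the target server
    # can absorb the VM without exceeding its capacity.
    vm_size, target = command
    return usage[target] + vm_size <= capacities[target]

def apply_or_penalize(commands, capacities, usage, penalty=-1.0):
    # Illegal strategy: feed back a penalty reward so the agents update
    # their parameters and regenerate the scheduling strategy.
    if not all(is_legal(c, capacities, usage) for c in commands):
        return penalty, usage
    # Legal strategy: execute every scheduling operation.
    for vm_size, target in commands:
        usage[target] += vm_size
    return 0.0, usage

caps = {"s1": 1.0, "s2": 1.0}
use = {"s1": 0.9, "s2": 0.2}
reward, use = apply_or_penalize([(0.3, "s2")], caps, dict(use))
print(reward, use["s2"])
```

Moving a 0.3-sized VM onto the lightly loaded server is legal and executes; trying to move it onto the nearly full server would instead return the penalty with the usage unchanged.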
Another technical scheme adopted by the embodiment of the application is as follows: a multi-agent reinforcement learning scheduling system comprising:
an information collection module: configured to collect the server parameters of the network data center and the load information of the virtual machines running on each server;
a reinforcement learning model construction module: configured to build a virtual simulation environment from the server parameters and virtual machine load information, and to build a multi-agent deep reinforcement learning model;
an agent model training module: configured to perform offline training and learning with the multi-agent deep reinforcement learning model in the simulation environment, training one agent model for each server;
an agent deployment module: configured to deploy the agent models to the real service nodes and to schedule according to the load of each service node.
The technical scheme adopted by the embodiment of the application further comprises a preprocessing module, configured to perform a normalization preprocessing operation on the collected server parameters and virtual machine load information. The normalization preprocessing comprises: defining the virtual machine information of each service node as a tuple containing the number of virtual machines and their respective configurations; each virtual machine has two scheduling states, to-be-scheduled and running; each service node has two states, saturated and hungry; and the sum of the resource ratios occupied by the virtual machines must remain below the configured upper limit of the server hosting them.
The technical scheme adopted by the embodiment of the application further comprises: the reinforcement learning model construction module comprises a prediction module and a scheduling module, wherein the prediction module comprises:
a state sensing unit: configured to predict, from the information input by each service node, the resources that need to be scheduled out in the current state;
an action space unit: configured to map the action space onto the total capacity of the current service node according to its configuration information;
the scheduling module reschedules and distributes the virtual machines marked as to-be-scheduled to generate a scheduling strategy, and the agent on each service node computes a return function from the generated scheduling action;
the prediction module further comprises:
a return function unit: configured to measure the quality of the scheduling strategy, so that the load of each service node in the whole network is balanced.
The technical scheme adopted by the embodiment of the application further comprises: the agent model training module performs offline training and learning with the multi-agent deep reinforcement learning model in the simulation environment; training an agent model for each server specifically comprises: the agent on each service node adjusts the amount of resource to be scheduled through its prediction module and marks the virtual machines to be scheduled out; a scheduling strategy is generated from the virtual machines in the to-be-scheduled state; the return value of each service node is computed, and the per-node returns are summed into a total return value, according to which the parameters of each prediction module are adjusted.
The technical scheme adopted by the embodiment of the application further comprises: the agent deployment module deploys the agent models to the real service nodes, and scheduling according to the load of each service node specifically comprises: deploying each trained agent model to its corresponding service node in the real environment; sensing the state information of its host server over a period of time as input, predicting the resources the current server needs to release, and using a knapsack algorithm to select the virtual machines closest to that target, marking them as to-be-scheduled; then collecting, through the scheduling module, the predictions from all servers together with the virtual machines marked as to-be-scheduled, assigning the to-be-scheduled virtual machines to suitable servers as required to generate a scheduling strategy, and distributing the scheduling commands to the corresponding service nodes for execution; before executing the scheduling strategy, checking whether each scheduling command is legal; if not, feeding back a penalty reward to update the parameters and regenerating the scheduling strategy; if legal, executing the scheduling operation and using the feedback reward value to update the agent parameters.
The embodiment of the application adopts another technical scheme that: an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the following operations of the multi-agent reinforcement learning scheduling method described above:
step a: collecting server parameters from the network data center and the load information of the virtual machines running on each server;
step b: building a virtual simulation environment from the server parameters and virtual machine load information, and building a multi-agent deep reinforcement learning model;
step c: performing offline training and learning with the multi-agent deep reinforcement learning model in the simulation environment, training one agent model for each server;
step d: deploying the agent models to the real service nodes and scheduling according to the load of each service node.
Compared with the prior art, the embodiment of the application is advantageous in that the multi-agent reinforcement learning scheduling method and system and the electronic equipment virtualize the services running on the servers through virtualization technology and balance load by scheduling virtual machines. Because the scheduling range is not limited to a single server, a virtual machine on a server in a high-load state can be scheduled onto other low-load servers; compared with per-server resource allocation schemes, this is more macroscopic. Meanwhile, the MADDPG framework extends the actor-critic architecture: the critic receives extra information about the decisions of the other agents during training, while each agent uses only local information when acting, and under this framework multiple agents can learn a cooperative strategy in a complex dynamic environment.
Drawings
FIG. 1 is a flow chart of a multi-agent reinforcement learning scheduling method according to an embodiment of the present application;
FIG. 2 is a diagram of a MADDPG scheduling framework according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a scheduling overall framework according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a multi-agent reinforcement learning scheduling system according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a hardware device of a multi-agent reinforcement learning scheduling method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To remedy the defects of the prior art, the multi-agent reinforcement learning scheduling method of the embodiment of the application applies multi-agent reinforcement learning: a model is built from the load information of each service node in the cloud service environment, a recurrent neural network learns the time-series information used for decisions, one agent is trained for each server, and the multiple agents with different tasks compete or cooperate to maintain load balance over the whole network topology. After initial training, each agent is placed on a real service node and schedules according to that node's load; while deciding and scheduling, each agent continues to learn and improve from its own environment and the decision memory of the other nodes, so that it can cooperate with the agents of the other nodes to generate a scheduling strategy and achieve load balance across the service nodes.
Specifically, please refer to fig. 1, which is a flowchart illustrating a multi-agent reinforcement learning scheduling method according to an embodiment of the present application. The multi-agent reinforcement learning scheduling method comprises the following steps:
step 100: collecting server parameters of a network data center and virtual machine load information running on each server;
in step 100, the collected server parameters specifically include: collecting configuration information, memory, hard disk storage space and the like of each server in a real scene for a period of time; the collected virtual machine load information specifically includes: and collecting parameters of resources occupied by the virtual machine running on each server, such as CPU occupancy rate, memory and hard disk occupancy rate and the like.
Step 200: performing preprocessing operations such as normalization on the collected server parameters and virtual machine load information;
in step 200, the preprocessing operation specifically includes: defining the virtual machine information of each service node as a tuple, wherein the tuple comprises the number of virtual machines and the respective configuration of the virtual machines, including a CPU, a memory, a hard disk and the current state, each virtual machine comprises two scheduling states, namely a to-be-scheduled state and a running state, each service node comprises two states, namely a saturated state and a hungry state, and the sum of the resource ratio occupied by each virtual machine cannot be more than the upper limit of the configuration of the server.
Step 300: establishing a virtual simulation environment with the preprocessed data, and establishing the multi-agent deep reinforcement learning model;
in step 300, establishing a deep reinforcement learning model of a multi-agent specifically includes: modeling the collected time sequence dynamic information (server parameters and virtual machine load information) to create a simulation environment for off-line training, wherein the model adopts a multi-agent deep reinforcement learning model, and in order to fully utilize the influence of time sequence data, an LSTM model is adopted in a deep network part in the model to extract the time sequence information, so that the influence of abnormal data fluctuation in an instantaneous state on decision making is avoided. The model adopts a MADDPG (Multi-Agent Deep Deterministic Policy Gradient, namely, a Multi-Agent activator-critical for Mixed Cooperative-comprehensive environment from OpenAI) framework, the MADDPG framework is the expansion of a DDPG (continuous control with Deep learning article published by Google Deep Mind) algorithm in the Multi-Agent field, and the DDPG algorithm applies Deep reinforcement learning to a continuous action space. And the action space obtained by the deep learning part is set as the resource occupation ratio of the virtual machine in the state to be scheduled, namely the load balance of the current service node can be maintained only by scheduling the occupied space. 
According to the obtained to-be-scheduled space, virtual machines of suitable size are marked as to-be-scheduled; then, for each service node in the whole network, the return rewards of the to-be-scheduled virtual machines with respect to each service node are computed, and the reward a virtual machine would obtain by being assigned to a service node is used as a distance measure to generate a scheduling strategy. Finally, the strategy is checked for executability: if executable, the to-be-scheduled virtual machines are scheduled to other suitable service nodes; if not, a negative-feedback penalty is returned and the agents generate the scheduling strategy again. The detailed scheduling framework is shown in fig. 2.
In the embodiment of the application, to counter the effect of instantaneous abnormal load fluctuations in a dynamic environment, a recurrent neural network, the LSTM (long short-term memory network), replaces the fully connected neural network in the deep reinforcement learning model, so that the agent can learn the hidden information among time-series data and thereby achieve adaptive scheduling based on spatio-temporal perception.
In the above, the agent on each service node marks virtual machines as to-be-scheduled by solving a knapsack problem: the predicted to-be-scheduled space serves as the knapsack capacity, the resources occupied by each virtual machine serve as both the weight and the value of the items, the maximum value that fits into the knapsack is computed, and the loaded virtual machines are marked as to-be-scheduled. The to-be-scheduled spaces predicted on the service nodes are then aggregated (negative values indicate how many scheduled-out resources a node can absorb to make full use of its resources); the objective is to minimize, over the service nodes, the sum of the occupied to-be-scheduled space and the remaining to-be-scheduled space, and a scheduling strategy is obtained by this calculation.
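The knapsack-style marking step can be sketched as a 0/1 knapsack solved by dynamic programming on a discretized resource grid, with each VM's occupied resource acting as both item weight and value, as described above. The function name, grid resolution, and backtracking details are assumptions of this sketch:

```python
def mark_vms_for_scheduling(vm_sizes, space_to_free, grid=100):
    """Select the subset of VMs whose total occupied resource comes
    closest to (without exceeding) the predicted space to free."""
    cap = int(space_to_free * grid)
    w = [int(s * grid) for s in vm_sizes]
    # dp[c] = best achievable total weight with capacity c
    dp = [0] * (cap + 1)
    keep = [[False] * (cap + 1) for _ in vm_sizes]
    for i, wi in enumerate(w):
        for c in range(cap, wi - 1, -1):  # iterate downward: 0/1 knapsack
            if dp[c - wi] + wi > dp[c]:
                dp[c] = dp[c - wi] + wi
                keep[i][c] = True
    # Backtrack to recover which VMs get marked as to-be-scheduled.
    marked, c = [], cap
    for i in range(len(vm_sizes) - 1, -1, -1):
        if keep[i][c]:
            marked.append(i)
            c -= w[i]
    return sorted(marked)

# Free 0.5 of the host: the 0.2 and 0.3 VMs together fit exactly.
print(mark_vms_for_scheduling([0.2, 0.3, 0.4], 0.5))  # [0, 1]
```

A VM larger than the space to free is never marked, matching the requirement that the selection must fit within the predicted knapsack capacity.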
In the embodiment of the application, the MADDPG framework extends deep reinforcement learning to the multi-agent field; the algorithm follows centralized learning with decentralized execution in a multi-agent environment, and with this framework multiple agents can learn to cooperate and compete.
Specifically, the MADDPG algorithm considers a game among $n$ agents whose policies are parameterized by $\theta = \{\theta_1, \theta_2, \theta_3, \dots, \theta_n\}$; the set of all agent policies is $\pi = \{\pi_1, \pi_2, \pi_3, \dots, \pi_n\}$. The expected return of the $i$-th agent is $J(\theta_i) = \mathbb{E}[R_i]$. Considering a deterministic policy $\mu_{\theta_i}$ (written $\mu_i$), the gradient can be expressed as:

$$\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{x,\, a \sim \mathcal{D}}\left[\nabla_{\theta_i}\mu_i(a_i \mid o_i)\, \nabla_{a_i} Q_i^{\mu}(x, a_1, \dots, a_n)\,\big|_{a_i=\mu_i(o_i)}\right]$$

where $x = (o_1, \dots, o_n)$ is the joint observation and $\mathcal{D}$ is the experience replay buffer.
Specifically, the deep reinforcement learning model comprises a prediction module and a scheduling module, the prediction module comprises a state sensing unit, an action space unit and a reward function unit, and the specific functions are as follows:
a state sensing unit: predicting resources needing to be scheduled out in the current state through information input by each node, wherein the input state is defined through load information of each node and resources occupied by running virtual machines;
an action space unit: mapping the action space to the total capacity of the current service node according to the configuration information of the current node;
a scheduling module: according to the marked virtual machine in the state to be scheduled, rescheduling and distributing are carried out to generate a scheduling strategy, and an agent on each service node calculates a return function according to the generated scheduling action;
a reward function unit: measuring the quality of a scheduling strategy, wherein the target is load balance of each service node in the whole network, and a return function on each service node is calculated independently; the return function is formulated as follows:
in the above formula, the first and second carbon atoms are,riis the reward return on each service node, wherein c represents the CPU occupancy rate on the ith machine, and alpha and beta are penalty coefficients. α can be set as the case may be, indicating a threshold value at which it is desired that the server CPU occupancy load remain steady.
In the above formula, R is the overall reward function, and the final optimization target is to maximize R for the scheduling policy cooperatively generated by the agents.
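The patent references reward formula images that are not reproduced in this text, so the exact functional form of r_i is unknown. The sketch below is a hedged reconstruction consistent with the surrounding description: it assumes a simple penalty of the form r_i = −β·|c_i − α|, which is maximal when a node's CPU occupancy sits at the desired threshold α, and sums the per-node rewards into the overall R.

```python
def node_reward(cpu, alpha, beta):
    """Assumed per-node reward: penalize deviation of CPU occupancy `cpu`
    from the desired steady-load threshold `alpha`, scaled by `beta`.
    The actual formula in the patent is not reproduced here."""
    return -beta * abs(cpu - alpha)

def total_reward(cpus, alpha, beta):
    """Overall R: sum of per-node rewards, which the cooperating agents
    jointly try to maximize."""
    return sum(node_reward(c, alpha, beta) for c in cpus)
```

Under this assumed form, a perfectly balanced network (every node at occupancy α) yields the maximum possible R of zero, and any deviation on any node lowers the shared return.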
Step 400: off-line training and learning are carried out by utilizing a deep reinforcement learning model of a plurality of intelligent agents and a simulation environment, and an intelligent agent model is trained for each server respectively;
In step 400, offline training is performed in a simulation environment established from real data, and an agent is created for each service node. The agent on each service node adjusts the amount of resources to be scheduled through its prediction module and marks the virtual machines to be scheduled out; a scheduling policy is generated from the virtual machines in the to-be-scheduled state, the return value of each service node is calculated, and the return values are summed to obtain a total return, according to which the parameters of each prediction module are adjusted.
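One offline training step, as described above, can be sketched as follows. This is an outline under stated assumptions: the method names on the `agents` and `env` objects (`predict_space`, `mark_vms`, `build_schedule`, `execute`, `update`) are illustrative, not APIs defined by the patent.

```python
def offline_training_step(agents, env):
    """One simulated step: each agent predicts its space to be scheduled,
    VMs are marked, a joint scheduling policy is built and executed, the
    per-node returns are summed into a total return, and every agent's
    prediction module is adjusted on that shared signal."""
    spaces = [a.predict_space(env.observe(i)) for i, a in enumerate(agents)]
    marked = env.mark_vms(spaces)        # knapsack-style marking per node
    policy = env.build_schedule(marked)  # joint scheduling policy
    rewards = env.execute(policy)        # per-service-node return values
    total = sum(rewards)                 # summarized total return
    for a in agents:
        a.update(total)                  # adjust prediction-module parameters
    return total
```

Note that every agent updates on the same summed return, matching the cooperative objective in which the agents jointly maximize the overall reward rather than their individual node rewards.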
Step 500: and deploying the trained intelligent agent model to the real service nodes, and scheduling according to the load condition of each service node.
In general, multi-agent reinforcement learning obtains a scheduling action directly from the environment input. In a complex network topology, however, the action space of a virtual machine scheduling strategy becomes too large, and an overly large action space makes the algorithm difficult to converge. In that formulation, each virtual machine running in the topology must be assigned a global id to specify a scheduling target; although the id can index the virtual machine, the resources it occupies are likely to change at runtime, so the strategy learned during training is unreliable. Even if the occupied resources do not change, an agent trained with that algorithm will not consider a newly added virtual machine in its decisions. The method therefore improves on the algorithm by redefining the model's action space as the resources the current server wants to release, that is, how many resources it expects to schedule out to maintain load balance across the overall network topology. With this arrangement, marking each virtual machine with a global id is avoided, and operation can continue even if a new virtual machine is added midway, making the scheduling algorithm more flexible and adaptable to a wider range of scenarios.
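The redefined action space can be made concrete with a small sketch. The squashing of the policy output into [−1, 1] is an assumption (a common DDPG convention via a tanh output layer), not something the patent specifies; the mapping onto node capacity follows the action space unit described above, with negative values meaning the node could absorb load instead of releasing it.

```python
def action_to_release_target(raw_action, node_capacity):
    """Map a normalized policy output (assumed in [-1, 1]) onto the node's
    total capacity, so the action reads as "resources to release".
    Negative results mean the node has room to absorb that much load."""
    clipped = max(-1.0, min(1.0, raw_action))
    return clipped * node_capacity
```

Because the action is an amount of resources rather than a per-VM choice, its dimensionality is independent of how many virtual machines exist, which is what lets newly added virtual machines participate without retraining.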
Please refer to fig. 4, which is a schematic structural diagram of a multi-agent reinforcement learning scheduling system according to an embodiment of the present application. The multi-agent reinforcement learning scheduling system comprises an information collection module, a preprocessing module, a reinforcement learning model construction module, an agent model training module and an agent deployment module.
An information collection module: used for collecting the server parameters of the data center and the load information of the virtual machines running on each server. The collected server parameters specifically include: configuration information, memory, hard disk storage space, and the like, gathered for each server in a real scenario over a period of time. The collected virtual machine load information specifically includes: the resources occupied by the virtual machines running on each server, such as CPU occupancy, memory and hard disk occupancy, and the like.
A preprocessing module: used for performing preprocessing operations such as normalization on the collected server parameters and virtual machine load information. The preprocessing specifically comprises: defining the virtual machine information of each service node as a tuple containing the number of virtual machines and their respective configurations, including CPU, memory, hard disk, and current state. Each virtual machine has two scheduling states, to-be-scheduled and running; each service node has two states, saturated and hungry; and the sum of the resource ratios occupied by the virtual machines must not exceed the configured upper limit of the server.
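The tuple representation and its invariants can be sketched with plain dataclasses. The field names and the 0.8 saturation threshold are illustrative assumptions; resource ratios are normalized so the server's configured upper limit is 1.0.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VM:
    cpu: float    # fraction of the host's CPU this VM occupies
    mem: float    # fraction of the host's memory
    disk: float   # fraction of the host's disk
    state: str    # "running" or "to_schedule"

@dataclass
class ServiceNode:
    vms: List[VM]

    def valid(self) -> bool:
        # the summed resource ratios of the VMs must not exceed the
        # server's configured upper limit (normalized to 1.0 here)
        return all(
            sum(getattr(v, r) for v in self.vms) <= 1.0
            for r in ("cpu", "mem", "disk")
        )

    def status(self, threshold: float = 0.8) -> str:
        # a node is "saturated" above the CPU threshold, else "hungry";
        # the 0.8 cut-off is an illustrative assumption
        return "saturated" if sum(v.cpu for v in self.vms) > threshold else "hungry"
```

A node holding VMs at 0.5 and 0.2 CPU is valid and hungry; pushing the CPU sum past 1.0 violates the invariant and would be rejected during preprocessing.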
A reinforcement learning model construction module: used for establishing a virtual simulation environment from the preprocessed data and building the multi-agent deep reinforcement learning model. Establishing the model specifically comprises: modeling the collected time-series dynamic information (server parameters and virtual machine load information) to create a simulation environment for offline training. The model is a multi-agent deep reinforcement learning model; to make full use of the time-series data, the deep network part adopts an LSTM to extract temporal information, avoiding the influence of transient abnormal data fluctuations on decision making. The model adopts the MADDPG framework, which extends the DDPG algorithm to the multi-agent setting; DDPG applies deep reinforcement learning to continuous action spaces. The action space produced by the deep learning part is defined as the resource ratio of virtual machines in the to-be-scheduled state, that is, the occupied space that must be scheduled out to keep the current service node load-balanced.
Virtual machines of suitable size are marked as to-be-scheduled according to the obtained to-be-scheduled space. The reward returns between the virtual machines in the to-be-scheduled state and each service node in the whole network are then calculated; using the reward a virtual machine would obtain on each service node as a distance measure, a scheduling strategy is generated. Finally, the strategy is checked for executability: if executable, the virtual machines in the to-be-scheduled state are scheduled to other suitable service nodes; if not, a negative-feedback penalty is returned and the agents regenerate the scheduling strategy.
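The assignment-with-feasibility-check step above can be sketched as a greedy placement. This is a simplification under stated assumptions: remaining free space stands in for the reward-based distance measure, and returning `None` stands in for the negative-feedback penalty that triggers regeneration.

```python
def assign_vms(marked_vms, free_space):
    """Greedy sketch: each VM in the to-be-scheduled state (a dict of
    vm_id -> load, largest first) goes to the node with the most remaining
    free space. Returns a vm_id -> node plan, or None when no node can
    host a VM (i.e. the strategy is not executable and must be redone)."""
    plan = {}
    free = dict(free_space)  # node -> remaining capacity, copied locally
    for vm_id, load in sorted(marked_vms.items(), key=lambda kv: -kv[1]):
        target = max(free, key=free.get)  # most spacious node first
        if free[target] < load:
            return None  # not executable -> penalize and regenerate
        plan[vm_id] = target
        free[target] -= load
    return plan
```

Placing big VMs first onto the emptiest nodes is a classic best-fit-decreasing heuristic; the patent's actual placement uses learned reward values, which this sketch does not model.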
In the embodiment of the application, in order to mitigate the influence of transient abnormal load fluctuations in a dynamic environment, a recurrent neural network, the LSTM (long short-term memory network), is used in place of the fully-connected neural network in deep reinforcement learning, so that the agent can learn hidden information among time-series data, thereby achieving adaptive scheduling based on spatio-temporal perception.
As described above, the agent on each service node marks virtual machines as being in the to-be-scheduled state by solving a knapsack problem: the predicted space to be scheduled serves as the knapsack capacity, the resources occupied by each virtual machine serve as both the weight and the value of the items, the maximum value that can be loaded into the knapsack is computed, and the loaded virtual machines are marked as to-be-scheduled. The predicted to-be-scheduled space on each service node is then aggregated (negative values indicate how many resources the node could absorb to fully utilize its capacity); the objective is to minimize, over all service nodes, the sum of the occupied space and the space still to be scheduled, from which a scheduling strategy can be computed.
In the embodiment of the application, the MADDPG framework extends deep reinforcement learning to the multi-agent setting; the algorithm follows centralized training with decentralized execution in a multi-agent environment, and with this framework multiple agents can learn to cooperate and compete.
Specifically, the MADDPG algorithm considers n agents with policies parameterized by θ = {θ_1, θ_2, θ_3, …, θ_n}, computed through the game among the agents; the set of all agents' policies is defined as π = {π_1, π_2, π_3, …, π_n}, and the expected return of the i-th agent is J(θ_i) = E[R_i]. For deterministic policies μ_{θ_i} parameterized by θ_i, the gradient can be expressed as:

∇_{θ_i} J(μ_i) = E_{x,a∼D}[ ∇_{θ_i} μ_i(a_i | o_i) ∇_{a_i} Q_i^μ(x, a_1, …, a_n) |_{a_i = μ_i(o_i)} ]

where x = (o_1, …, o_n) is the joint observation and D is the experience replay buffer.
Further, the reinforcement learning model building module comprises a prediction module and a scheduling module, the prediction module comprises a state sensing unit, an action space unit and a reward function unit, and the specific functions are as follows:
a state sensing unit: predicting resources needing to be scheduled out in the current state through information input by each node, wherein the input state is defined through load information of each node and resources occupied by running virtual machines;
an action space unit: mapping the action space to the total capacity of the current service node according to the configuration information of the current node;
a scheduling module: according to the marked virtual machine in the state to be scheduled, rescheduling and distributing are carried out to generate a scheduling strategy, and an agent on each service node calculates a return function according to the generated scheduling action;
a reward function unit: measuring the quality of a scheduling strategy, wherein the target is load balance of each service node in the whole network, and a return function on each service node is calculated independently; the return function is formulated as follows:
In the above formula, r_i is the reward return on each service node, c_i denotes the CPU occupancy on the i-th machine, and α and β are penalty coefficients. α can be set as circumstances require; it indicates the threshold at which the server CPU occupancy load is desired to remain steady.
In the above formula, R is the overall reward function, and the final optimization target is to maximize R for the scheduling policy cooperatively generated by the agents.
The agent model training module: used for performing offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training one agent model for each server. Offline training is performed in a simulation environment established from real data, with an agent created for each service node. The agent on each service node adjusts the amount of resources to be scheduled through its prediction module and marks the virtual machines to be scheduled out; a scheduling strategy is generated from the virtual machines in the to-be-scheduled state, the return value of each service node is calculated, and the return values are summed to obtain a total return, according to which the parameters of each prediction module are finally adjusted.
An agent deployment module: used for deploying the trained agent models to the real service nodes and scheduling according to the load condition of each service node. Each trained agent model is placed on its corresponding service node in the real environment. The agent's prediction module then predicts and marks the to-be-scheduled state, the scheduling module performs unified allocation to generate a scheduling strategy, and the scheduling commands are distributed to the corresponding nodes for execution. Before a scheduling action is executed, it is checked for executability; if it cannot be executed or fails, a penalty reward is fed back to update the parameters and the scheduling strategy is regenerated, iterating until all scheduling strategies are executable.
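The predict/check/execute-or-regenerate loop of the deployment module can be sketched as follows. The method names on the environment and agent objects are illustrative assumptions, as is the bounded retry count; the patent describes iterating until an executable strategy is found.

```python
def deploy_and_schedule(agent_models, env, max_retries=10):
    """Online scheduling loop: predict per-node states, build a joint
    policy, check legality before executing, and regenerate on failure,
    feeding back a penalty reward to update the agents' parameters."""
    for _ in range(max_retries):
        states = [m.predict(env.node_state(i))
                  for i, m in enumerate(agent_models)]
        policy = env.build_schedule(states)
        if env.is_legal(policy):             # legality check before execution
            reward = env.execute(policy)
            for m in agent_models:
                m.update(reward)             # feedback reward updates params
            return policy
        for m in agent_models:
            m.update(env.penalty())          # negative feedback, then retry
    raise RuntimeError("no executable scheduling policy found")
```

Checking legality before execution matters in a live cluster: an illegal migration command should cost only a penalty signal, never a failed migration of a running virtual machine.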
In general, multi-agent reinforcement learning obtains a scheduling action directly from the environment input. In a complex network topology, however, the action space of a virtual machine scheduling strategy becomes too large, and an overly large action space makes the algorithm difficult to converge. In that formulation, each virtual machine running in the topology must be assigned a global id to specify a scheduling target; although the id can index the virtual machine, the resources it occupies are likely to change at runtime, so the strategy learned during training is unreliable. Even if the occupied resources do not change, an agent trained with that algorithm will not consider a newly added virtual machine in its decisions. The method therefore improves on the algorithm by redefining the model's action space as the resources the current server wants to release, that is, how many resources it expects to schedule out to maintain load balance across the overall network topology. With this arrangement, marking each virtual machine with a global id is avoided, and operation can continue even if a new virtual machine is added midway, making the scheduling algorithm more flexible and adaptable to a wider range of scenarios.
Fig. 5 is a schematic structural diagram of a hardware device of a multi-agent reinforcement learning scheduling method according to an embodiment of the present application. As shown in fig. 5, the device includes one or more processors and memory. Taking a processor as an example, the apparatus may further include: an input system and an output system.
The processor, memory, input system, and output system may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor executes various functional applications and data processing of the electronic device, i.e., implements the processing method of the above-described method embodiment, by executing the non-transitory software program, instructions and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processing system over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system may receive input numeric or character information and generate a signal input. The output system may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following for any of the above method embodiments:
step a: collecting server parameters of a network data center and virtual machine load information running on each server;
step b: establishing a virtual simulation environment by using the server parameters and the virtual machine load information, and establishing a deep reinforcement learning model of the multi-agent;
step c: off-line training and learning are carried out by utilizing the deep reinforcement learning model and the simulation environment of the multi-agent, and an agent model is trained for each server respectively;
step d: and deploying the intelligent agent model to real service nodes, and scheduling according to the load condition of each service node.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
Embodiments of the present application provide a non-transitory (non-volatile) computer storage medium having stored thereon computer-executable instructions that may perform the following operations:
step a: collecting server parameters of a network data center and virtual machine load information running on each server;
step b: establishing a virtual simulation environment by using the server parameters and the virtual machine load information, and establishing a deep reinforcement learning model of the multi-agent;
step c: off-line training and learning are carried out by utilizing the deep reinforcement learning model and the simulation environment of the multi-agent, and an agent model is trained for each server respectively;
step d: and deploying the intelligent agent model to real service nodes, and scheduling according to the load condition of each service node.
Embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the following:
step a: collecting server parameters of a network data center and virtual machine load information running on each server;
step b: establishing a virtual simulation environment by using the server parameters and the virtual machine load information, and establishing a deep reinforcement learning model of the multi-agent;
step c: off-line training and learning are carried out by utilizing the deep reinforcement learning model and the simulation environment of the multi-agent, and an agent model is trained for each server respectively;
step d: and deploying the intelligent agent model to real service nodes, and scheduling according to the load condition of each service node.
The multi-agent reinforcement learning scheduling method, system, and electronic device of the application virtualize the services running on the servers through virtualization technology and perform load balancing by scheduling virtual machines. Because the scheduling scope is not limited to a single server, when one server is under high load its virtual machines can be scheduled to other low-load servers, a more global approach than per-server resource allocation schemes. Meanwhile, the MADDPG framework extends the actor-critic (AC) framework: the critic is given extra information about the decisions of the other agents, while each agent trains using only local information; through this framework, multiple agents can develop cooperative strategies in a complex dynamic environment.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (9)
1. A multi-agent reinforcement learning scheduling method is characterized by comprising the following steps:
step a: collecting server parameters of a network data center and virtual machine load information running on each server;
step b: establishing a virtual simulation environment by using the server parameters and the virtual machine load information, and establishing a deep reinforcement learning model of the multi-agent;
step c: off-line training and learning are carried out by utilizing the deep reinforcement learning model of the multi-agent and the virtual simulation environment, and an agent model is trained for each server;
step d: deploying the intelligent agent model to a real service node, and scheduling according to the load condition of each service node;
in the step d, the deploying the agent model to the real service nodes and scheduling according to the load condition of each service node specifically comprises: deploying a trained intelligent agent model to a corresponding service node in a real environment, sensing state information of a server where the intelligent agent model is located within a period of time as input, predicting to obtain resources needing to be released by the current server, and selecting a virtual machine closest to a standard by using a knapsack algorithm to mark the virtual machine as a state to be scheduled; then, collecting prediction results on all servers and the virtual machines marked as the to-be-scheduled states through a scheduling module, assigning the virtual machines in the to-be-scheduled states to suitable servers as required to generate a scheduling strategy, and distributing a scheduling command to corresponding service nodes to execute scheduling operation; before executing the scheduling strategy, checking whether each scheduling command is legal or not, if not, feeding back a punishment reward updating parameter, and regenerating the scheduling strategy; and if the intelligent agent parameter is legal, executing the scheduling operation, and obtaining the feedback reward value to update the intelligent agent parameter.
2. The multi-agent reinforcement learning scheduling method of claim 1, wherein the step a further comprises: carrying out standardized preprocessing operation on the collected server parameters and the virtual machine load information; the normalized preprocessing operation comprises: defining the virtual machine information of each service node as a tuple, wherein the tuple comprises the number of virtual machines and respective configuration of the virtual machines, each virtual machine comprises two scheduling states, namely a to-be-scheduled state and an operating state, each service node comprises two states, namely a saturated state and a hungry state, and the sum of the resource ratio occupied by each virtual machine is less than the upper limit of the configuration of the server where the virtual machine is located.
3. The multi-agent reinforcement learning scheduling method according to claim 1 or 2, wherein in the step b, the deep reinforcement learning model of the multi-agent specifically includes a prediction module and a scheduling module, the prediction module predicts the resources to be scheduled out in the current state according to the information input by each service node, and maps the action space to the total capacity of the current service node according to the configuration information of the current service node; the scheduling module carries out rescheduling and distribution to generate a scheduling strategy according to the marked virtual machine in the state to be scheduled, and an agent on each service node calculates a return function according to the generated scheduling action; the prediction module measures the quality of the scheduling strategy, so that the load of each service node in the whole network is balanced.
4. The multi-agent reinforcement learning scheduling method of claim 3, wherein in the step c, the off-line training and learning are performed by using the deep reinforcement learning model and the virtual simulation environment of the multi-agent, and the training of one agent model for each server specifically comprises: the intelligent agent on each service node adjusts the size of the resource to be scheduled through the prediction module, marks the virtual machine to be scheduled out, generates a scheduling strategy according to the virtual machine in the state to be scheduled, calculates the return value of each service node, summarizes and sums the return values to obtain a total return value, and adjusts the parameters of each prediction module according to the total return value.
5. A multi-agent reinforcement learning scheduling system, comprising:
an information collection module: the system comprises a data center, a data center and a server, wherein the data center is used for collecting server parameters of the data center and virtual machine load information running on each server;
a reinforcement learning model construction module: the system comprises a virtual simulation environment and a deep reinforcement learning model of a multi-agent, wherein the virtual simulation environment is established by using the server parameters and the virtual machine load information;
the intelligent agent model training module: the system comprises a plurality of servers, a deep reinforcement learning model of a multi-agent and a virtual simulation environment, wherein the deep reinforcement learning model of the multi-agent and the virtual simulation environment are used for off-line training and learning, and an agent model is trained for each server;
an agent deployment module: the intelligent agent model is deployed to real service nodes and is scheduled according to the load condition of each service node;
the intelligent agent deployment module deploys the intelligent agent model to the real service nodes, and the scheduling according to the load condition of each service node specifically comprises the following steps: deploying a trained intelligent agent model to a corresponding service node in a real environment, sensing state information of a server where the intelligent agent model is located within a period of time as input, predicting to obtain resources needing to be released by the current server, and selecting a virtual machine closest to a standard by using a knapsack algorithm to mark the virtual machine as a state to be scheduled; then, collecting prediction results on all servers and the virtual machines marked as the to-be-scheduled states through a scheduling module, assigning the virtual machines in the to-be-scheduled states to suitable servers as required to generate a scheduling strategy, and distributing a scheduling command to corresponding service nodes to execute scheduling operation; before executing the scheduling strategy, checking whether each scheduling command is legal or not, if not, feeding back a punishment reward updating parameter, and regenerating the scheduling strategy; and if the intelligent agent parameter is legal, executing the scheduling operation, and obtaining the feedback reward value to update the intelligent agent parameter.
6. The multi-agent reinforcement learning scheduling system of claim 5, further comprising a preprocessing module for performing a normalized preprocessing operation on the collected server parameters and virtual machine load information; the normalized preprocessing operation comprises: defining the virtual machine information of each service node as a tuple, wherein the tuple comprises the number of virtual machines and respective configuration of the virtual machines, each virtual machine comprises two scheduling states, namely a to-be-scheduled state and an operating state, each service node comprises two states, namely a saturated state and a hungry state, and the sum of the resource ratio occupied by each virtual machine is less than the upper limit of the configuration of the server where the virtual machine is located.
7. The multi-agent reinforcement learning scheduling system of claim 5 or 6, wherein the reinforcement learning model building module comprises a prediction module and a scheduling module, the prediction module comprising:
a state sensing unit: the system is used for predicting the resources needing to be scheduled out in the current state through the information input by each service node;
an action space unit: the action space is mapped into the total capacity of the current service node according to the configuration information of the current service node;
the scheduling module carries out rescheduling and distribution to generate a scheduling strategy according to the marked virtual machine in the state to be scheduled, and an agent on each service node calculates a return function according to the generated scheduling action;
the prediction module further comprises:
a reward function unit: the method is used for measuring the quality of the scheduling strategy, so that the load of each service node in the whole network is balanced.
8. The multi-agent reinforcement learning scheduling system of claim 7, wherein the agent model training module performs off-line training and learning using the deep reinforcement learning model and the virtual simulation environment of the multi-agent, and training one agent model for each server specifically comprises: the intelligent agent on each service node adjusts the size of the resource to be scheduled through the prediction module, marks the virtual machine to be scheduled out, generates a scheduling strategy according to the virtual machine in the state to be scheduled, calculates the return value of each service node, summarizes and sums the return values to obtain a total return value, and adjusts the parameters of each prediction module according to the total return value.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the multi-agent reinforcement learning scheduling method of any one of claims 1 to 4.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910193429.XA CN109947567B (en) | 2019-03-14 | 2019-03-14 | Multi-agent reinforcement learning scheduling method and system and electronic equipment |
PCT/CN2019/130582 WO2020181896A1 (en) | 2019-03-14 | 2019-12-31 | Multi-agent reinforcement learning scheduling method and system and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910193429.XA CN109947567B (en) | 2019-03-14 | 2019-03-14 | Multi-agent reinforcement learning scheduling method and system and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109947567A CN109947567A (en) | 2019-06-28 |
CN109947567B true CN109947567B (en) | 2021-07-20 |
Family
ID=67009966
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910193429.XA Active CN109947567B (en) | 2019-03-14 | 2019-03-14 | Multi-agent reinforcement learning scheduling method and system and electronic equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109947567B (en) |
WO (1) | WO2020181896A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2791840C2 (en) * | 2021-12-21 | 2023-03-13 | Владимир Германович Крюков | Decision-making system in a multi-agent environment |
Families Citing this family (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109947567B (en) * | 2019-03-14 | 2021-07-20 | 深圳先进技术研究院 | Multi-agent reinforcement learning scheduling method and system and electronic equipment |
CN110362411B (en) * | 2019-07-25 | 2022-08-02 | 哈尔滨工业大学 | CPU resource scheduling method based on Xen system |
CN110442129B (en) * | 2019-07-26 | 2021-10-22 | 中南大学 | Control method and system for multi-agent formation |
CN110471297B (en) * | 2019-07-30 | 2020-08-11 | 清华大学 | Multi-agent cooperative control method, system and equipment |
CN110427006A (en) * | 2019-08-22 | 2019-11-08 | 齐鲁工业大学 | A kind of multi-agent cooperative control system and method for process industry |
CN110516795B (en) * | 2019-08-28 | 2022-05-10 | 北京达佳互联信息技术有限公司 | Method and device for allocating processors to model variables and electronic equipment |
CN110728368B (en) * | 2019-10-25 | 2022-03-15 | 中国人民解放军国防科技大学 | Acceleration method for deep reinforcement learning of simulation robot |
CN111031387B (en) * | 2019-11-21 | 2020-12-04 | 南京大学 | Method for controlling video coding flow rate of monitoring video sending end |
CN111026549B (en) * | 2019-11-28 | 2022-06-10 | 国网甘肃省电力公司电力科学研究院 | Automatic test resource scheduling method for power information communication equipment |
CN110882544B (en) * | 2019-11-28 | 2023-09-15 | 网易(杭州)网络有限公司 | Multi-agent training method and device and electronic equipment |
CN111047014B (en) * | 2019-12-11 | 2023-06-23 | 中国航空工业集团公司沈阳飞机设计研究所 | Multi-agent air countermeasure distributed sampling training method and equipment |
CN111178545B (en) * | 2019-12-31 | 2023-02-24 | 中国电子科技集团公司信息科学研究院 | Dynamic reinforcement learning decision training system |
CN113067714B (en) * | 2020-01-02 | 2022-12-13 | 中国移动通信有限公司研究院 | Content distribution network scheduling processing method, device and equipment |
CN111310915B (en) * | 2020-01-21 | 2023-09-01 | 浙江工业大学 | Data anomaly detection defense method oriented to reinforcement learning |
CN111324358B (en) * | 2020-02-14 | 2020-10-16 | 南栖仙策(南京)科技有限公司 | Training method for automatic operation and maintenance strategy of information system |
CN111343095B (en) * | 2020-02-15 | 2021-11-05 | 北京理工大学 | Method for realizing controller load balance in software defined network |
CN111461338A (en) * | 2020-03-06 | 2020-07-28 | 北京仿真中心 | Intelligent system updating method and device based on digital twin |
CN111339675B (en) * | 2020-03-10 | 2020-12-01 | 南栖仙策(南京)科技有限公司 | Training method for intelligent marketing strategy based on machine learning simulation environment |
CN111538668B (en) * | 2020-04-28 | 2023-08-15 | 山东浪潮科学研究院有限公司 | Mobile terminal application testing method, device, equipment and medium based on reinforcement learning |
CN111585811B (en) * | 2020-05-06 | 2022-09-02 | 郑州大学 | Virtual optical network mapping method based on multi-agent deep reinforcement learning |
CN113822456A (en) * | 2020-06-18 | 2021-12-21 | 复旦大学 | Service combination optimization deployment method based on deep reinforcement learning in cloud and mist mixed environment |
CN111722910B (en) * | 2020-06-19 | 2023-07-21 | 广东石油化工学院 | Cloud job scheduling and resource allocation method |
CN111724001B (en) * | 2020-06-29 | 2023-08-29 | 重庆大学 | Aircraft detection sensor resource scheduling method based on deep reinforcement learning |
CN111860777B (en) * | 2020-07-06 | 2021-07-02 | 中国人民解放军军事科学院战争研究院 | Distributed reinforcement learning training method and device for super real-time simulation environment |
CN112001585B (en) * | 2020-07-14 | 2023-09-22 | 北京百度网讯科技有限公司 | Multi-agent decision method, device, electronic equipment and storage medium |
CN111967645B (en) * | 2020-07-15 | 2022-04-29 | 清华大学 | Social network information propagation range prediction method and system |
CN112422651A (en) * | 2020-11-06 | 2021-02-26 | 电子科技大学 | Cloud resource scheduling performance bottleneck prediction method based on reinforcement learning |
CN112838946B (en) * | 2020-12-17 | 2023-04-28 | 国网江苏省电力有限公司信息通信分公司 | Method for constructing intelligent sensing and early warning model based on communication network faults |
CN112766705B (en) * | 2021-01-13 | 2024-07-09 | 北京洛塔信息技术有限公司 | Distributed work order processing method, system, equipment and storage medium |
CN112966431B (en) * | 2021-02-04 | 2023-04-28 | 西安交通大学 | Data center energy consumption joint optimization method, system, medium and equipment |
CN112801303A (en) * | 2021-02-07 | 2021-05-14 | 中兴通讯股份有限公司 | Intelligent pipeline processing method and device, storage medium and electronic device |
CN113115451A (en) * | 2021-02-23 | 2021-07-13 | 北京邮电大学 | Interference management and resource allocation scheme based on multi-agent deep reinforcement learning |
CN113094171B (en) * | 2021-03-31 | 2024-07-26 | 北京达佳互联信息技术有限公司 | Data processing method, device, electronic equipment and storage medium |
US20220321605A1 (en) * | 2021-04-01 | 2022-10-06 | Cisco Technology, Inc. | Verifying trust postures of heterogeneous confidential computing clusters |
CN113325721B (en) * | 2021-08-02 | 2021-11-05 | 北京中超伟业信息安全技术股份有限公司 | Model-free adaptive control method and system for industrial system |
CN113672372B (en) * | 2021-08-30 | 2023-08-08 | 福州大学 | Multi-edge collaborative load balancing task scheduling method based on reinforcement learning |
CN114003121B (en) * | 2021-09-30 | 2023-10-31 | 中国科学院计算技术研究所 | Data center server energy efficiency optimization method and device, electronic equipment and storage medium |
CN113641462B (en) * | 2021-10-14 | 2021-12-21 | 西南民族大学 | Virtual network hierarchical distributed deployment method and system based on reinforcement learning |
WO2023121514A1 (en) * | 2021-12-21 | 2023-06-29 | Владимир Германович КРЮКОВ | System for making decisions in a multi-agent environment |
CN114116183B (en) * | 2022-01-28 | 2022-04-29 | 华北电力大学 | Data center service load scheduling method and system based on deep reinforcement learning |
CN114518948B (en) * | 2022-02-21 | 2024-09-24 | 南京航空航天大学 | Dynamic perception rescheduling method for large-scale micro-service application and application |
CN114648165B (en) * | 2022-03-24 | 2024-05-31 | 浙江英集动力科技有限公司 | Multi-heat source heating system optimal scheduling method based on multi-agent game |
CN114816659B (en) * | 2022-03-24 | 2024-08-23 | 阿里云计算有限公司 | Decision model training method for virtual machine network deployment scheme |
CN114924684A (en) * | 2022-04-24 | 2022-08-19 | 南栖仙策(南京)科技有限公司 | Environmental modeling method and device based on decision flow graph and electronic equipment |
CN114860416B (en) * | 2022-06-06 | 2024-04-09 | 清华大学 | Distributed multi-agent detection task allocation method and device in countermeasure scene |
CN114781072A (en) * | 2022-06-17 | 2022-07-22 | 北京理工大学前沿技术研究院 | Decision-making method and system for unmanned vehicle |
CN115293451B (en) * | 2022-08-24 | 2023-06-16 | 中国西安卫星测控中心 | Resource dynamic scheduling method based on deep reinforcement learning |
CN116151137B (en) * | 2023-04-24 | 2023-07-28 | 之江实验室 | Simulation system, method and device |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103873569B (en) * | 2014-03-05 | 2017-04-19 | 兰雨晴 | Resource optimized deployment method based on IaaS (infrastructure as a service) cloud platform |
CN105607952B (en) * | 2015-12-18 | 2021-04-20 | 航天恒星科技有限公司 | Method and device for scheduling virtualized resources |
CN108009016B (en) * | 2016-10-31 | 2021-10-22 | 华为技术有限公司 | Resource load balancing control method and cluster scheduler |
US10649966B2 (en) * | 2017-06-09 | 2020-05-12 | Microsoft Technology Licensing, Llc | Filter suggestion for selective data import |
CN108021451B (en) * | 2017-12-07 | 2021-08-13 | 上海交通大学 | Self-adaptive container migration method in fog computing environment |
CN108829494B (en) * | 2018-06-25 | 2020-09-29 | 杭州谐云科技有限公司 | Container cloud platform intelligent resource optimization method based on load prediction |
CN109165081B (en) * | 2018-08-15 | 2021-09-28 | 福州大学 | Web application self-adaptive resource allocation method based on machine learning |
CN109068350B (en) * | 2018-08-15 | 2021-09-28 | 西安电子科技大学 | Terminal autonomous network selection system and method for wireless heterogeneous network |
CN109947567B (en) * | 2019-03-14 | 2021-07-20 | 深圳先进技术研究院 | Multi-agent reinforcement learning scheduling method and system and electronic equipment |
Application timeline:
- 2019-03-14: CN application CN201910193429.XA, granted as CN109947567B (status: Active)
- 2019-12-31: PCT application PCT/CN2019/130582, published as WO2020181896A1 (status: Application Filing)
Also Published As
Publication number | Publication date |
---|---|
WO2020181896A1 (en) | 2020-09-17 |
CN109947567A (en) | 2019-06-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109947567B (en) | Multi-agent reinforcement learning scheduling method and system and electronic equipment | |
Liu et al. | Adaptive asynchronous federated learning in resource-constrained edge computing | |
Torabi et al. | A dynamic task scheduling framework based on chicken swarm and improved raven roosting optimization methods in cloud computing | |
CN104317658B (en) | A kind of loaded self-adaptive method for scheduling task based on MapReduce | |
CN104408518B (en) | Based on the neural network learning optimization method of particle swarm optimization algorithm | |
US20230206132A1 (en) | Method and Apparatus for Training AI Model, Computing Device, and Storage Medium | |
Mechalikh et al. | PureEdgeSim: A simulation framework for performance evaluation of cloud, edge and mist computing environments | |
CN114237869B (en) | Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment | |
CN112732444A (en) | Distributed machine learning-oriented data partitioning method | |
CN115168027A (en) | Calculation power resource measurement method based on deep reinforcement learning | |
CN115085202A (en) | Power grid multi-region intelligent power collaborative optimization method, device, equipment and medium | |
CN115543626A (en) | Power defect image simulation method adopting heterogeneous computing resource load balancing scheduling | |
Gand et al. | A Fuzzy Controller for Self-adaptive Lightweight Edge Container Orchestration. | |
CN114567560B (en) | Edge node dynamic resource allocation method based on generation of countermeasure imitation learning | |
CN114090239B (en) | Method and device for dispatching edge resources based on model reinforcement learning | |
CN115934344A (en) | Heterogeneous distributed reinforcement learning calculation method, system and storage medium | |
Moazeni et al. | Dynamic resource allocation using an adaptive multi-objective teaching-learning based optimization algorithm in cloud | |
CN114492052A (en) | Global stream level network simulation method, system and device | |
Faraji-Mehmandar et al. | A self-learning approach for proactive resource and service provisioning in fog environment | |
Tuli et al. | Optimizing the performance of fog computing environments using ai and co-simulation | |
CN115883371B (en) | Virtual network function placement method based on learning optimization method in edge-cloud cooperative system | |
CN111612124A (en) | Network structure adaptive optimization method for task-oriented intelligent scheduling | |
Yang et al. | Energy saving strategy of cloud data computing based on convolutional neural network and policy gradient algorithm | |
Su et al. | A power-aware virtual machine mapper using firefly optimization | |
Chen et al. | Conlar: Learning to allocate resources to docker containers under time-varying workloads |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |