WO2020181896A1 - 一种多智能体强化学习调度方法、系统及电子设备 - Google Patents
- Publication number: WO2020181896A1 (application PCT/CN2019/130582)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- agent
- scheduling
- service node
- server
- reinforcement learning
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/08—Learning methods
Definitions
- This application belongs to the technical field of multi-agent systems, and in particular relates to a multi-agent reinforcement learning scheduling method, system and electronic equipment.
- The traditional service deployment method is difficult to adapt to changing access patterns.
- Although a fixed allocation of resources can provide services stably, it also wastes a large amount of resources; for example, under the same network topology,
- some servers may often run at full load while others host only a few services and leave much of their storage space and computing power idle. Traditional deployment therefore struggles to avoid this waste of resources and to schedule efficiently, so resources cannot be used effectively. A scheduling algorithm that can adapt to a dynamic environment is needed to balance the load of the servers in the network.
- The large action space makes such algorithms difficult to train and slow to converge.
- Methods based on distributed reinforcement learning also face another problem.
- Distributed reinforcement learning trains multiple agents together to speed up convergence, but these agents in fact share the same scheduling strategy; the multiple clones exist only to accelerate training, so the end result is a set of homogeneous agents with no ability to collaborate.
- If instead each agent predicts the decisions of all other agents at every step of the decision process,
- training becomes very difficult, and what each agent can do is almost the same as having no collaborative strategy at all.
- the present application provides a multi-agent reinforcement learning scheduling method, system, and electronic device, which aim to solve at least one of the above technical problems in the prior art to a certain extent.
- A multi-agent reinforcement learning scheduling method includes the following steps:
- Step a: collect server parameters of the network data center and the load information of the virtual machines running on each server;
- Step b: use the server parameters and virtual machine load information to establish a virtual simulation environment, and build a multi-agent deep reinforcement learning model;
- Step c: use the multi-agent deep reinforcement learning model and the simulation environment for offline training and learning, and train an agent model for each server;
- Step d: deploy the agent models to the real service nodes, and perform scheduling according to the load conditions of each service node.
- The technical solution adopted in the embodiments of the application further includes: step a further includes performing a standardized preprocessing operation on the collected server parameters and virtual machine load information; the standardized preprocessing operation includes defining the virtual machine information of each service node
- as a tuple.
- The tuple includes the number of virtual machines and their respective configurations.
- Each virtual machine is in one of two scheduling states, to-be-scheduled or running, and each service node is in one of two states, saturated or starved; the sum of the resource shares occupied by the virtual machines on a node must be less than the upper limit of that server's configuration.
- The multi-agent deep reinforcement learning model specifically includes a prediction module and a scheduling module. From the current state and the information input by each service node,
- the prediction module predicts the resources that need to be scheduled, and maps the action space to the total capacity of the current service node according to that node's configuration information;
- the scheduling module performs rescheduling and allocation according to the virtual machines marked as to-be-scheduled;
- the agent on each service node calculates a reward function according to the generated scheduling action;
- the prediction module measures the quality of the scheduling strategy, so as to balance the load of each service node in the entire network.
- The technical solution adopted in the embodiments of the present application further includes: in step c, using the multi-agent deep reinforcement learning model and the simulation environment for offline training and learning, and training an agent model for each server, specifically includes:
- the agent on each service node adjusts the amount of resources to be scheduled through the prediction module, marks the virtual machines that need to be scheduled, and generates a scheduling strategy based on those virtual machines;
- each service node calculates its own return value, the return values are summed into a total return value, and the parameters of each prediction module are adjusted according to the total return value.
- The technical solution adopted in the embodiments of the present application further includes: in step d, deploying the agent models to the real service nodes and scheduling according to the load of each service node is specifically: the trained agent models are deployed on the corresponding service nodes in the real environment.
- Each agent model takes the state information perceived on its server over a period of time as input, predicts the resources the current server needs to release, and uses the knapsack algorithm to select the virtual machines closest to that target, marking them as to-be-scheduled. The scheduling module then collects the prediction results from all servers together with the virtual machines marked as to-be-scheduled, assigns the to-be-scheduled virtual machines to suitable servers as needed to generate a scheduling strategy, and distributes the scheduling commands to the corresponding service nodes to perform the scheduling operations. Before the scheduling strategy is executed, each scheduling command is checked for legality: if a command is illegal, a penalty reward is fed back to update the parameters and the scheduling strategy is regenerated; if it is legal, the scheduling operation is performed and the feedback reward value is obtained to update the agent parameters.
- a multi-agent reinforcement learning scheduling system including:
- Information collection module: used to collect server parameters of the network data center and the load information of the virtual machines running on each server;
- Reinforcement learning model building module: used to establish a virtual simulation environment from the server parameters and virtual machine load information, and to build a multi-agent deep reinforcement learning model;
- Agent model training module: used to perform offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training an agent model for each server;
- Agent deployment module: used to deploy the agent models to real service nodes and perform scheduling according to the load conditions of each service node.
- The technical solution adopted in the embodiments of the application further includes a preprocessing module, which is used to perform standardized preprocessing operations on the collected server parameters and virtual machine load information.
- The standardized preprocessing operations include: defining the virtual machine information of each service node as a tuple, where the tuple includes the number of virtual machines and their respective configurations; each virtual machine is in one of two scheduling states, to-be-scheduled or running, and each service node is in one of two states, saturated or starved; the resource share occupied by the virtual machines on a node must be less than the upper limit of the server configuration.
- the reinforcement learning model building module includes a prediction module and a scheduling module
- the prediction module includes:
- State perception unit: used to predict the resources that need to be scheduled in the current state from the information input by each service node;
- Action space unit: used to map the action space to the total capacity of the current service node according to that node's configuration information;
- the scheduling module performs rescheduling and allocation to generate a scheduling strategy according to the marked virtual machine to be scheduled, and the agent on each service node calculates a reward function according to the generated scheduling action;
- the prediction module further includes:
- Reward function unit: used to measure the quality of the scheduling strategy and to balance the load of each service node in the entire network.
- The technical solution adopted in the embodiments of the application further includes: the agent model training module uses the multi-agent deep reinforcement learning model and the simulation environment for offline training and learning; training an agent model for each server is specifically:
- the agent on each service node adjusts the amount of resources to be scheduled through the prediction module, marks the virtual machines that need to be scheduled, and generates a scheduling strategy based on those virtual machines;
- each service node calculates its own return value, the return values are summed to obtain the total return value, and the parameters of each prediction module are adjusted according to the total return value.
- The technical solution adopted in the embodiments of the application further includes: the agent deployment module deploys the agent models to the real service nodes and schedules according to the load of each service node. Specifically: the trained agent models are deployed on the corresponding service nodes in the real environment; each agent model takes the state information perceived on its server over a period of time as input, predicts the resources the current server needs to release, and uses the knapsack algorithm to select the closest-matching virtual machines, marking them as to-be-scheduled. The scheduling module then collects the prediction results from all servers and the virtual machines marked as to-be-scheduled, assigns the to-be-scheduled virtual machines to suitable servers to generate a scheduling strategy, and distributes the scheduling commands to the corresponding service nodes to perform the scheduling operations. Before the scheduling strategy is executed, each scheduling command is checked for legality: if a command is illegal, a penalty reward is fed back to update the parameters and the scheduling strategy is regenerated; if it is legal, the scheduling operation is performed and the feedback reward value is obtained to update the agent parameters.
- an electronic device including:
- at least one processor; and
- a memory communicatively connected with the at least one processor; wherein,
- the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can perform the following operations of the foregoing multi-agent reinforcement learning scheduling method:
- Step a: collect server parameters of the network data center and the load information of the virtual machines running on each server;
- Step b: use the server parameters and virtual machine load information to establish a virtual simulation environment, and build a multi-agent deep reinforcement learning model;
- Step c: use the multi-agent deep reinforcement learning model and the simulation environment for offline training and learning, and train an agent model for each server;
- Step d: deploy the agent models to the real service nodes, and perform scheduling according to the load conditions of each service node.
- The beneficial effects produced by the embodiments of the present application are: the multi-agent reinforcement learning scheduling method, system, and electronic device of the embodiments virtualize the services running on the servers through virtualization technology and perform load balancing by scheduling virtual machines; the scheduling scope is not limited to a single server.
- A virtual machine can be scheduled to run on another low-load server, which is a more macroscopic view than per-server resource allocation schemes.
- This application uses the MADDPG framework, which builds on the actor-critic (AC) framework:
- the critic receives additional information about the decisions of the other agents, while each actor can train only on local information. Through this framework, multiple agents can produce collaborative strategies in a complex dynamic environment.
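The centralized-training/decentralized-execution split that MADDPG uses can be sketched as below. This is an illustrative stand-in, not the patent's model: the number of agents and the observation/action dimensions are hypothetical, and real implementations feed these vectors into neural networks.

```python
# Illustrative MADDPG-style input construction: during training, each
# agent's critic sees every agent's observation and action, while each
# actor only ever sees its own local observation.

def actor_input(agent_idx, observations):
    """Decentralized execution: actor i uses only its local observation."""
    return observations[agent_idx]

def critic_input(observations, actions):
    """Centralized training: critic sees all observations and all actions."""
    flat = []
    for obs in observations:
        flat.extend(obs)
    for act in actions:
        flat.extend(act)
    return flat

# Three hypothetical agents, each with a 2-dim observation and 1-dim action.
obs = [[0.7, 0.1], [0.4, 0.3], [0.9, 0.2]]
acts = [[0.05], [0.10], [0.30]]

print(actor_input(0, obs))           # the local view only
print(len(critic_input(obs, acts)))  # 3*2 obs dims + 3*1 action dims = 9
```

At execution time only `actor_input` is needed, which is why the trained agents can run independently on their service nodes.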
- FIG. 1 is a flowchart of the multi-agent reinforcement learning scheduling method according to an embodiment of the present application;
- FIG. 2 is a schematic diagram of the MADDPG scheduling framework of an embodiment of the present application;
- FIG. 3 is a schematic diagram of the overall scheduling framework of an embodiment of the present application;
- FIG. 4 is a schematic structural diagram of the multi-agent reinforcement learning scheduling system according to an embodiment of the present application;
- FIG. 5 is a schematic diagram of the hardware device structure for the multi-agent reinforcement learning scheduling method provided by an embodiment of the present application.
- The multi-agent reinforcement learning scheduling method of the embodiments of the present application applies multi-agent reinforcement learning: it models the load information on each service node in the cloud service environment, uses recurrent neural networks to learn temporal information for decision-making, trains an agent for each server, and lets agents with different tasks compete or cooperate to maintain load balance over the entire network topology.
- Each agent is deployed to a real service node and then schedules according to the load of each node. While making decisions and scheduling, each agent continues to learn from its current local environment and the decision memory of the other nodes, so that the agents of different nodes can cooperatively generate scheduling strategies and achieve load balancing across the service nodes.
- FIG. 1 is a flowchart of a multi-agent reinforcement learning scheduling method according to an embodiment of the present application.
- the multi-agent reinforcement learning scheduling method of the embodiment of the present application includes the following steps:
- Step 100: collect server parameters of the network data center and the load information of the virtual machines running on each server.
- The collected server parameters specifically include the configuration information, memory, and hard disk storage space of each server over a period of time in the real scenario; the collected virtual machine load information specifically includes the resource parameters used by the virtual machines on each server, such as CPU usage, memory usage, and hard disk usage.
- Step 200: perform preprocessing operations such as normalization on the collected server parameters and virtual machine load information.
- The preprocessing operation specifically includes defining the virtual machine information of each service node as a tuple.
- The tuple includes the number of virtual machines and their respective configurations, including CPU, memory, hard disk, and current state.
- Each virtual machine is in one of two scheduling states, to-be-scheduled or running.
- Each service node is in one of two states, saturated or starved; the sum of the resources occupied by the virtual machines on a node cannot exceed the upper limit of the server configuration.
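The tuple and the two pairs of states described above could be encoded as follows. This is a hypothetical sketch whose class and field names are illustrative, not taken from the patent; the 0.8 saturation threshold is likewise an assumed parameter.

```python
# Hypothetical encoding of the per-node state described in the text:
# each VM carries its resource configuration and one of two scheduling
# states; a node is "saturated" or "starved" depending on its load.
from dataclasses import dataclass, field

PENDING, RUNNING = "pending", "running"      # two VM scheduling states
SATURATED, STARVED = "saturated", "starved"  # two service-node states

@dataclass
class VirtualMachine:
    cpu: float       # CPU share occupied (fraction of the host)
    mem_gb: float
    disk_gb: float
    state: str = RUNNING

@dataclass
class ServiceNode:
    cpu_capacity: float
    vms: list = field(default_factory=list)

    def cpu_used(self):
        return sum(vm.cpu for vm in self.vms)

    def valid(self):
        # the sum of resources occupied by all VMs may not exceed capacity
        return self.cpu_used() <= self.cpu_capacity

    def state(self, threshold=0.8):  # threshold is an assumed parameter
        if self.cpu_used() >= threshold * self.cpu_capacity:
            return SATURATED
        return STARVED

node = ServiceNode(cpu_capacity=1.0,
                   vms=[VirtualMachine(0.5, 8, 100), VirtualMachine(0.2, 4, 50)])
print(node.valid(), node.state())
```

The `valid` check corresponds to the constraint that the occupied resources stay under the server's configured upper limit.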
- Step 300: use the preprocessed data to establish a virtual simulation environment, and build a multi-agent deep reinforcement learning model.
- Establishing the multi-agent deep reinforcement learning model specifically includes modeling the collected time-series dynamic information (server parameters and virtual machine load information) to create a simulation environment for offline training. The model adopts multi-agent deep reinforcement learning, and in order to make full use of the time-series data, the deep network part of the model uses an LSTM to extract temporal information, avoiding transient data fluctuations unduly influencing decisions.
- The model adopts the MADDPG framework (Multi-Agent Deep Deterministic Policy Gradient, from OpenAI's "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments").
- The MADDPG framework extends DDPG (from "Continuous Control with Deep Reinforcement Learning", published by Google DeepMind).
- The DDPG algorithm applies deep reinforcement learning to continuous action spaces.
- The action obtained from the deep learning part is set to the resource proportion of the virtual machines to be placed in the to-be-scheduled state, that is, how much space should be scheduled away to maintain the load balance of the current service node.
- Virtual machines of the appropriate size are marked as to-be-scheduled; then the to-be-scheduled virtual machines on every service node in the network and the return reward of each service node are calculated, and the reward value obtained by assigning a virtual machine to a service
- node is used as a distance metric to generate the scheduling strategy.
- The detailed scheduling framework is shown in Figure 2.
- A recurrent neural network, the LSTM (Long Short-Term Memory) network, is used.
- The agent on each service node is responsible for marking virtual machines as to-be-scheduled.
- This is cast as a knapsack problem:
- the predicted space to be released is the knapsack capacity, and the resources occupied by each virtual machine serve as both the weight and the value of an item. The maximum value the knapsack can hold is computed, and the virtual machines in that packing are marked as to-be-scheduled. The predicted to-be-scheduled space on each service node is then tallied (a negative number indicates how many resources a node should take on to be fully utilized), and the goal is to minimize the sum, over all service nodes, of the to-be-scheduled space and the scheduled-in space; the scheduling strategy is obtained from this calculation.
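The marking step above can be sketched as a standard 0/1 knapsack, with the predicted space to release as the capacity and each VM's occupied resources as both weight and value. The function name, the integer granularity, and the example sizes are illustrative assumptions, not the patent's implementation.

```python
# 0/1 knapsack over VM sizes: pick the set of VMs whose total occupied
# space best fills the predicted space to release; those VMs are the ones
# marked as to-be-scheduled.

def mark_vms_to_schedule(vm_sizes, space_to_release, step=1):
    """Return indices of VMs whose total size best fills space_to_release."""
    cap = int(space_to_release / step)
    weights = [int(s / step) for s in vm_sizes]
    # dp[c] = (best packed value, indices chosen) for capacity c
    dp = [(0, [])] * (cap + 1)
    for i, w in enumerate(weights):
        for c in range(cap, w - 1, -1):   # iterate downward: each VM once
            candidate = dp[c - w][0] + w
            if candidate > dp[c][0]:
                dp[c] = (candidate, dp[c - w][1] + [i])
    return dp[cap][1]

# Hypothetical VM sizes (resource units) and a predicted 9 units to release:
chosen = mark_vms_to_schedule([4, 3, 5, 2], 9)
print(chosen, sum([4, 3, 5, 2][i] for i in chosen))
```

Because weight equals value here, "maximum value" simply means filling the release target as exactly as possible without exceeding it.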
- The MADDPG framework extends deep reinforcement learning to the multi-agent setting.
- The algorithm uses centralized learning and decentralized execution in a multi-agent environment,
- enabling multiple agents to learn to cooperate and compete.
- the deep reinforcement learning model includes a prediction module and a scheduling module.
- the prediction module includes a state perception unit, an action space unit, and a reward function unit.
- the specific functions are as follows:
- State perception unit: predicts the resources that need to be scheduled in the current state from the information input by each node; the input state is defined by the load information of each node and the resources occupied by its running virtual machines;
- Action space unit: maps the action space to the total capacity of the current service node according to that node's configuration information;
- Scheduling module: generates the scheduling strategy by rescheduling and allocating the virtual machines marked as to-be-scheduled; the agent on each service node calculates the reward function according to the generated scheduling action;
- Reward function unit: measures the quality of the scheduling strategy; its goal is to balance the load of each service node in the entire network, and the reward function on each service node is calculated separately. The reward function formula is as follows:
- r_i is the reward return on each service node, where c represents the CPU occupancy rate on the i-th machine, and two penalty coefficients weight the reward terms.
- A threshold, which can be set according to the situation, expresses the steady-state level at which the server's CPU load is expected to stay.
- R is the overall return function;
- the final optimization goal is for the scheduling strategy generated by the cooperating agents to maximize R.
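The patent's exact reward formula is not reproduced in this text, so the following is an illustrative stand-in built only from the ingredients the description names: a per-node reward r_i driven by that node's CPU occupancy, penalty coefficients (`alpha`, `beta` here are assumed names), a configurable steady-state threshold, and an overall return R summed over nodes.

```python
# Illustrative reward shaping (NOT the patent's formula): deviation from a
# target CPU load is penalized, with overload penalized harder than
# under-utilization, and the overall return R sums the per-node rewards.

def node_reward(c_i, alpha=1.0, beta=2.0, c_target=0.6):
    """Per-node reward r_i for CPU occupancy c_i on node i."""
    if c_i <= c_target:
        return -alpha * (c_target - c_i)   # under-utilized ("starved")
    return -beta * (c_i - c_target)        # over-utilized ("saturated")

def total_return(cpu_loads, **kw):
    """Overall return R; the cooperating agents try to maximize this."""
    return sum(node_reward(c, **kw) for c in cpu_loads)

balanced = total_return([0.6, 0.6, 0.6])
lopsided = total_return([0.95, 0.2, 0.6])
print(balanced, lopsided)
```

A perfectly balanced cluster earns the maximum return of 0, while any imbalance drives R negative, matching the stated goal of maximizing R through cooperation.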
- Step 400: use the multi-agent deep reinforcement learning model and the simulation environment for offline training and learning, and train an agent model for each server.
- In step 400, offline training is performed in the simulation environment established from real data, and an agent is created for each service node.
- The agent on each service node adjusts the amount of resources to be scheduled through the prediction module, marks the virtual machines that need
- to be scheduled, and generates a scheduling strategy based on them; each service node then calculates its own return value, the values are summed to obtain the total return value, and finally the parameters of each prediction module are adjusted according to the total return value.
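One training iteration of the loop described above can be sketched as follows. The environment, reward, and update rule here are deliberate stubs with assumed names (`PredictionModule`, `train_step`), not the patent's networks; the point is the data flow: per-node returns are summed, and the shared total drives every prediction module's update.

```python
# Schematic offline training step: each agent proposes how much to release,
# per-node returns are computed and summed, and the shared total return is
# used to adjust every prediction module's parameters.
import random

class PredictionModule:
    def __init__(self):
        self.param = 0.5                       # stand-in for network weights

    def predict_release(self, load):
        return max(0.0, load - self.param)     # resources to release

    def update(self, total_return, lr=0.01):
        # placeholder parameter step driven by the shared total return
        self.param += lr * total_return

def train_step(agents, loads):
    actions = [a.predict_release(l) for a, l in zip(agents, loads)]
    mean = sum(loads) / len(loads)
    rewards = [-abs(l - mean) for l in loads]  # toy per-node return value
    total = sum(rewards)                       # summed total return
    for a in agents:
        a.update(total)                        # every module sees the total
    return actions, total

random.seed(0)
agents = [PredictionModule() for _ in range(3)]
loads = [random.random() for _ in agents]
actions, total = train_step(agents, loads)
print(len(actions), total)
```

The key property mirrored from the text is that the update signal is the *total* return, so each agent is optimized for cluster-wide balance rather than its own node alone.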
- Step 500: deploy the trained agent models to the real service nodes, and perform scheduling according to the load conditions of each service node.
- Each trained agent model is transferred to the corresponding service node in the real environment.
- The agent first takes the state information perceived on the server over a period of time as input, and through its prediction module obtains
- the resources the server hopes to release; the knapsack algorithm then selects the virtual machines closest to that target and marks them as to-be-scheduled. The scheduling module collects the prediction results from all servers and the virtual machines marked as to-be-scheduled,
- assigns the to-be-scheduled virtual machines to suitable servers to generate a scheduling strategy, and distributes the scheduling commands to the corresponding nodes to perform the scheduling operations.
- Before the scheduling strategy is executed, each scheduling command is checked for legality. If a command is illegal, a penalty reward is fed back to update the parameters and the scheduling strategy is regenerated, iterating until every scheduling command can be executed; if it is legal, it is executed and the feedback reward value is obtained to update the agent parameters.
- the specific overall scheduling framework is shown in Figure 3.
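The validate-then-execute loop described above can be sketched as follows. The command format `(vm_size, target_node)`, the capacity model, and the naive regeneration rule are all illustrative assumptions; in the patent the regeneration is driven by the learned policy rather than a heuristic.

```python
# Execute a scheduling strategy only after every command passes a legality
# check; illegal commands feed back a penalty and force regeneration.

def is_legal(cmd, capacities, used):
    """A move is legal if the target node can absorb the VM's size."""
    vm_size, target = cmd
    return used[target] + vm_size <= capacities[target]

def execute_strategy(commands, capacities, used, regenerate, max_iters=10):
    penalty = 0
    for _ in range(max_iters):
        bad = [c for c in commands if not is_legal(c, capacities, used)]
        if not bad:
            for vm_size, target in commands:   # all legal: execute
                used[target] += vm_size
            return used, penalty
        penalty -= len(bad)                    # penalty reward feedback
        commands = regenerate(commands, capacities, used)
    raise RuntimeError("no executable strategy found")

caps, used = [10, 10], [9, 2]
# naive regeneration stub: retarget every move to the emptiest node
regen = lambda cmds, caps, used: [(s, used.index(min(used))) for s, _ in cmds]
final_used, penalty = execute_strategy([(3, 0)], caps, used, regen)
print(final_used, penalty)
```

Here the first command would overload node 0 (9 + 3 > 10), so a penalty is recorded and the regenerated command targets node 1 instead, which succeeds.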
- This application improves on the above algorithm by replacing the model's action space with the resources the current server hopes to release, that is, how many resources should be scheduled away from it to maintain load balance over the whole network topology.
- This setting avoids using a global id to mark each virtual machine: even if a new virtual machine is added midway, the scheme still works, making the scheduling algorithm more flexible and applicable to a wider range of scenarios.
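The design choice above can be made concrete with a small sketch (function name and units assumed): the policy output is a normalized "how much to release" value mapped to the node's own capacity, so no global VM id is needed and a newly added VM is handled transparently by the next knapsack pass.

```python
# Map a normalized action in [0, 1] to an absolute amount of resources to
# release, scaled by the acting node's own total capacity.

def action_to_release(action, node_capacity):
    """Clamp the action and scale it to this node's capacity."""
    action = min(1.0, max(0.0, action))
    return action * node_capacity

# The same policy output works for any node and any VM population:
print(action_to_release(0.25, node_capacity=64.0))   # on a 64-unit node
print(action_to_release(0.25, node_capacity=128.0))  # on a larger node
```

Because the action refers to an amount of capacity rather than to specific VM ids, the action space stays fixed even as VMs come and go.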
- FIG. 4 is a schematic structural diagram of a multi-agent reinforcement learning scheduling system according to an embodiment of the present application.
- the multi-agent reinforcement learning scheduling system of the embodiment of the present application includes an information collection module, a preprocessing module, a reinforcement learning model construction module, an agent model training module, and an agent deployment module.
- Information collection module: used to collect the server parameters of the network data center and the load information of the virtual machines running on each server. The collected server parameters specifically include the configuration information, memory, and hard disk storage space of each server over a period of time in the real scenario; the collected virtual machine load information specifically includes the parameters of the resources occupied by the running virtual machines, such as CPU occupancy rate and memory and hard disk occupancy rates.
- Preprocessing module: used to perform standardization and other preprocessing operations on the collected server parameters and virtual machine load information. The preprocessing operations specifically include defining the virtual machine information of each service node as a tuple containing the number of virtual machines and their respective configurations (CPU, memory, hard disk, and current state); each virtual machine is in one of two scheduling states, to-be-scheduled or running, each service node is in one of two states, saturated or starved, and the sum of the resources occupied by the virtual machines on a node cannot exceed the upper limit of the server configuration.
- Reinforcement learning model building module: used to establish a virtual simulation environment from the preprocessed data and to build a multi-agent deep reinforcement learning model. Establishing the model specifically includes modeling the collected time-series dynamic information (server parameters and virtual machine load information) to create a simulation environment for offline training.
- The model is a multi-agent deep reinforcement learning model.
- The deep network part of the model uses an LSTM to extract the time-series information, avoiding transient data fluctuations unduly influencing decisions.
- The model adopts the MADDPG framework, which extends the DDPG algorithm to the multi-agent setting.
- The DDPG algorithm applies deep reinforcement learning to continuous action spaces.
- The action obtained from the deep learning part is set to the resource proportion of the virtual machines to be placed in the to-be-scheduled state, that is, how much space should be scheduled away to maintain the load balance of the current service node.
- Virtual machines of the appropriate size are marked as to-be-scheduled; then the to-be-scheduled virtual machines on every service node in the network and the return reward of each service node are calculated, and the reward value obtained by assigning a virtual machine to a service
- node is used as a distance metric to generate the scheduling strategy.
- A recurrent neural network, the LSTM (Long Short-Term Memory) network, is used.
- The agent on each service node is responsible for marking virtual machines as to-be-scheduled.
- This is cast as a knapsack problem:
- the predicted space to be released is the knapsack capacity, and the resources occupied by each virtual machine serve as both the weight and the value of an item. The maximum value the knapsack can hold is computed, and the virtual machines in that packing are marked as to-be-scheduled. The predicted to-be-scheduled space on each service node is then tallied (a negative number indicates how many resources a node should take on to be fully utilized), and the goal is to minimize the sum, over all service nodes, of the to-be-scheduled space and the scheduled-in space; the scheduling strategy is obtained from this calculation.
- The MADDPG framework extends deep reinforcement learning to the multi-agent setting.
- The algorithm uses centralized learning and decentralized execution in a multi-agent environment,
- enabling multiple agents to learn to cooperate and compete.
- the reinforcement learning model building module includes a prediction module and a scheduling module.
- the prediction module includes a state perception unit, an action space unit, and a reward function unit. The specific functions are as follows:
- State perception unit: predicts the resources that need to be scheduled in the current state from the information input by each node; the input state is defined by the load information of each node and the resources occupied by its running virtual machines;
- Action space unit: maps the action space to the total capacity of the current service node according to that node's configuration information;
- Scheduling module: generates the scheduling strategy by rescheduling and allocating the virtual machines marked as to-be-scheduled; the agent on each service node calculates the reward function according to the generated scheduling action;
- Reward function unit: measures the quality of the scheduling strategy; its goal is to balance the load of each service node in the entire network, and the reward function on each service node is calculated separately. The reward function formula is as follows:
- r_i is the reward return on each service node, where c represents the CPU occupancy rate on the i-th machine, and two penalty coefficients weight the reward terms.
- A threshold, which can be set according to the situation, expresses the steady-state level at which the server's CPU load is expected to stay.
- R is the overall return function;
- the final optimization goal is for the scheduling strategy generated by the cooperating agents to maximize R.
- Agent model training module: used to perform offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training an agent model for each server. Offline training is carried out in the simulation environment established from real data, and an agent is created for each service node; the agent on each service node adjusts the amount of resources to be scheduled through the prediction module, marks the virtual machines that need to be scheduled, and generates a scheduling strategy based on them. Each service node then calculates its own return value, the values are summed to obtain the total return value, and finally the parameters of each prediction module are adjusted according to the total return value.
- Agent deployment module: deploys the trained agent models to the real service nodes and performs scheduling according to the load of each service node. Each trained agent model is distributed to its corresponding service node in the real environment, and the agent's prediction module then predicts and marks the virtual machines in the pending state;
- the scheduling module allocates the scheduling strategy uniformly and distributes the scheduling commands to the corresponding nodes for execution. Before a scheduling action is executed, it must be checked whether the action can be executed; if it cannot be executed, or execution fails, a penalty reward is fed back to update the parameters and the scheduling strategy is regenerated, iterating until every scheduling command can be executed.
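This validate-then-execute loop can be sketched as below; `can_execute`, `regenerate`, and `penalize` are hypothetical stand-ins for the patent's validity check, strategy regeneration, and penalty-reward feedback path:

```python
def dispatch(commands, can_execute, regenerate, penalize, max_iters=10):
    # iterate until every scheduling command in the strategy is executable
    for _ in range(max_iters):
        bad = [c for c in commands if not can_execute(c)]
        if not bad:
            return commands      # all commands executable: dispatch to nodes
        penalize(bad)            # penalty reward feeds back to update parameters
        commands = regenerate()  # produce a new scheduling strategy
    raise RuntimeError("no fully executable scheduling strategy found")
```

A toy run: with a capacity check `c <= 5`, an initial plan containing an oversized command is penalized once, and the regenerated plan is dispatched.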
- this application improves on the above algorithm by redefining the model's action space as the resources the current server wants to release, i.e., how much resource should be scheduled away from it to maintain load balance across the overall network topology.
- This design avoids marking each virtual machine with a global id; scheduling still works even when a new virtual machine is added midway, which makes the scheduling algorithm more flexible and applicable to a wider range of scenarios.
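One way to realize the redefined action space is shown below, assuming the policy emits a normalized continuous action in [-1, 1] (that output range is an assumption, common for continuous-control actors, not stated in the source) which is mapped into the node's total capacity:

```python
def map_action(raw_action, node_capacity):
    """Map a normalized policy output in [-1, 1] to 'resources to release',
    bounded by the current service node's total capacity."""
    fraction = (raw_action + 1.0) / 2.0   # [-1, 1] -> [0, 1]
    return fraction * node_capacity       # [0, 1]  -> [0, capacity]
```

Because the action is an amount of resource rather than a VM identifier, the same policy remains valid when VMs are created or destroyed mid-run.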
- FIG. 5 is a schematic diagram of the hardware device structure of the multi-agent reinforcement learning scheduling method provided by an embodiment of the present application.
- the device includes one or more processors and a memory; one processor is taken as an example. The device may also include an input system and an output system.
- the processor, the memory, the input system, and the output system may be connected by a bus or in other ways; in FIG. 5, connection by a bus is taken as an example.
- the memory can be used to store non-transitory software programs, non-transitory computer executable programs, and modules.
- the processor executes the various functional applications and data processing of the electronic device by running the non-transitory software programs, instructions, and modules stored in the memory, thereby implementing the processing methods of the foregoing method embodiments.
- the memory may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function; the data storage area can store data and the like.
- the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
- the memory may optionally include memory remotely arranged relative to the processor, and such remote memories may be connected to the processing system through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
- the input system can receive entered numeric or character information and generate signal input.
- the output system may include display devices such as a display screen.
- the one or more modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the foregoing method embodiments:
- Step a: collect server parameters of the network data center and load information of the virtual machines running on each server;
- Step b: use the server parameters and virtual machine load information to establish a virtual simulation environment, and build a multi-agent deep reinforcement learning model;
- Step c: use the multi-agent deep reinforcement learning model and the simulation environment for offline training and learning, and train one agent model for each server;
- Step d: deploy the agent models to real service nodes, and perform scheduling according to the load conditions of each service node.
- the embodiments of the present application provide a non-transitory (non-volatile) computer storage medium; the computer storage medium stores computer-executable instructions, and the computer-executable instructions can perform the following operations:
- Step a: collect server parameters of the network data center and load information of the virtual machines running on each server;
- Step b: use the server parameters and virtual machine load information to establish a virtual simulation environment, and build a multi-agent deep reinforcement learning model;
- Step c: use the multi-agent deep reinforcement learning model and the simulation environment for offline training and learning, and train one agent model for each server;
- Step d: deploy the agent models to real service nodes, and perform scheduling according to the load conditions of each service node.
- the embodiment of the present application provides a computer program product; the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions that, when executed by a computer, cause the computer to perform the following operations:
- Step a: collect server parameters of the network data center and load information of the virtual machines running on each server;
- Step b: use the server parameters and virtual machine load information to establish a virtual simulation environment, and build a multi-agent deep reinforcement learning model;
- Step c: use the multi-agent deep reinforcement learning model and the simulation environment for offline training and learning, and train one agent model for each server;
- Step d: deploy the agent models to real service nodes, and perform scheduling according to the load conditions of each service node.
- the multi-agent reinforcement learning scheduling method, system, and electronic device of the embodiments of the present application virtualize the services running on servers through virtualization technology and perform load balancing by scheduling virtual machines, because the scheduling scope is not limited to a single server.
- a virtual machine can be scheduled to run on another low-load server, which is more macroscopic than per-server resource allocation schemes.
- this application uses the MADDPG framework, which extends the actor-critic (AC) framework.
- during training, the critic is given additional information about the decisions of the other agents, while each actor trains using only local information; through this framework, multiple agents can produce collaborative strategies in a complex dynamic environment.
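The information flow described above — centralized critics, decentralized actors — can be sketched as below. The function names are illustrative, not MADDPG library APIs:

```python
def critic_input(all_states, all_actions):
    # centralized training: the critic for any one agent conditions on the
    # observations and actions of every agent (the 'additional information')
    flat = []
    for s in all_states:
        flat.extend(s)
    for a in all_actions:
        flat.extend(a)
    return flat

def actor_input(local_state):
    # decentralized execution: each actor sees only its own node's state
    return list(local_state)
```

The asymmetry is the point: the extra joint information stabilizes critic learning in the non-stationary multi-agent setting, while deployment needs only local observations.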
Claims (11)
- A multi-agent reinforcement learning scheduling method, characterized by comprising the following steps: step a: collecting server parameters of a network data center and load information of the virtual machines running on each server; step b: using the server parameters and virtual machine load information to build a virtual simulation environment and establish a multi-agent deep reinforcement learning model; step c: performing offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training one agent model for each server; step d: deploying the agent models to real service nodes and performing scheduling according to the load of each service node.
- The multi-agent reinforcement learning scheduling method according to claim 1, characterized in that step a further comprises: performing a normalization preprocessing operation on the collected server parameters and virtual machine load information; the normalization preprocessing operation comprises: defining the virtual machine information of each service node as a tuple, the tuple comprising the number of virtual machines and their respective configurations; each virtual machine has two scheduling states, a pending state and a running state; each service node has two states, a saturated state and a starved state; and the sum of the resource ratios occupied by the virtual machines is less than the upper bound of the configuration of the server on which they reside.
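The normalization in this claim — a per-node tuple of VMs, two VM scheduling states, two node states, and a capacity bound — can be sketched as data structures. All names are illustrative, not the patent's:

```python
from dataclasses import dataclass, field

PENDING, RUNNING = "pending", "running"      # the two VM scheduling states
SATURATED, STARVED = "saturated", "starved"  # the two service-node states

@dataclass
class VirtualMachine:
    cpu: float             # fraction of host resources this VM is configured to use
    state: str = RUNNING   # 'pending' (awaiting scheduling) or 'running'

@dataclass
class ServiceNode:
    capacity: float = 1.0
    vms: list = field(default_factory=list)

    def valid(self):
        # the summed resource ratio of the VMs must stay below the
        # upper bound configured for the host server
        return sum(vm.cpu for vm in self.vms) <= self.capacity
```

The `valid` check encodes the claim's invariant; a node whose VMs exceed the bound would be a candidate for marking VMs as pending.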
- The multi-agent reinforcement learning scheduling method according to claim 1 or 2, characterized in that, in step b, the multi-agent deep reinforcement learning model specifically comprises a prediction module and a scheduling module; the prediction module predicts, from the information input by each service node, the resources that need to be scheduled out in the current state, and maps the action space into the total capacity of the current service node according to the node's configuration information; the scheduling module reschedules and reallocates the virtual machines marked as pending to generate a scheduling strategy, and the agent on each service node computes the reward function from the generated scheduling action; the prediction module measures the quality of the scheduling strategy so as to balance the load of every service node in the network.
- The multi-agent reinforcement learning scheduling method according to claim 3, characterized in that, in step c, performing offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training one agent model for each server, specifically comprises: the agent on each service node adjusts the amount of resources to be scheduled through the prediction module and marks the virtual machines that need to be scheduled out; a scheduling strategy is generated from the virtual machines in the pending state; each service node computes its own return value, the returns are summed to obtain a total return value, and the parameters of each prediction module are adjusted according to the total return value.
- The multi-agent reinforcement learning scheduling method according to claim 4, characterized in that, in step d, deploying the agent models to real service nodes and scheduling according to the load of each service node is specifically: deploying the trained agent models to the corresponding service nodes in the real environment; each agent model takes as input the state information it perceives on its server over a period of time, predicts the resources the current server needs to release, and uses a knapsack algorithm to select the virtual machines closest to that target and mark them as pending; the scheduling module then collects the prediction results and the pending virtual machines from all servers, assigns the pending virtual machines to suitable servers as needed to generate a scheduling strategy, and distributes the scheduling commands to the corresponding service nodes for execution; before the scheduling strategy is executed, each scheduling command is checked for validity; if a command is invalid, a penalty reward is fed back to update the parameters and the scheduling strategy is regenerated; if valid, the scheduling operation is executed and the returned reward value is used to update the agent parameters.
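The knapsack step in this claim — selecting the virtual machines whose combined load comes closest to the predicted amount to release, without exceeding it — can be sketched as a subset-sum dynamic program. The discretization step and tie-breaking are assumptions, not the patent's exact algorithm:

```python
def select_vms_to_release(vm_loads, target, step=0.01):
    """Pick indices of VMs whose summed load best approaches `target`
    from below; loads are discretized in units of `step`."""
    cap = int(round(target / step))
    weights = [int(round(l / step)) for l in vm_loads]
    # best[w] = list of chosen VM indices reaching total weight w, or None
    best = [None] * (cap + 1)
    best[0] = []
    for i, w in enumerate(weights):
        for tot in range(cap, w - 1, -1):  # iterate down: each VM used once
            if best[tot] is None and best[tot - w] is not None:
                best[tot] = best[tot - w] + [i]
    for tot in range(cap, -1, -1):         # closest achievable total <= cap
        if best[tot] is not None:
            return best[tot]
    return []
```

For loads [0.2, 0.3, 0.5] and a target of 0.6, no subset reaches 0.6 exactly, so the selection sums to 0.5, the closest feasible total.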
- A multi-agent reinforcement learning scheduling system, characterized by comprising: an information collection module for collecting server parameters of a network data center and load information of the virtual machines running on each server; a reinforcement learning model construction module for using the server parameters and virtual machine load information to build a virtual simulation environment and establish a multi-agent deep reinforcement learning model; an agent model training module for performing offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training one agent model for each server; and an agent deployment module for deploying the agent models to real service nodes and performing scheduling according to the load of each service node.
- The multi-agent reinforcement learning scheduling system according to claim 6, characterized by further comprising a preprocessing module for performing a normalization preprocessing operation on the collected server parameters and virtual machine load information; the normalization preprocessing operation comprises: defining the virtual machine information of each service node as a tuple, the tuple comprising the number of virtual machines and their respective configurations; each virtual machine has two scheduling states, a pending state and a running state; each service node has two states, a saturated state and a starved state; and the sum of the resource ratios occupied by the virtual machines is less than the upper bound of the configuration of the server on which they reside.
- The multi-agent reinforcement learning scheduling system according to claim 6 or 7, characterized in that the reinforcement learning model construction module comprises a prediction module and a scheduling module, the prediction module comprising: a state-awareness unit for predicting, from the information input by each service node, the resources that need to be scheduled out in the current state; and an action space unit for mapping the action space into the total capacity of the current service node according to the node's configuration information; the scheduling module reschedules and reallocates the virtual machines marked as pending to generate a scheduling strategy, and the agent on each service node computes the reward function from the generated scheduling action; the prediction module further comprises a reward function unit for measuring the quality of the scheduling strategy so as to balance the load of every service node in the network.
- The multi-agent reinforcement learning scheduling system according to claim 8, characterized in that the agent model training module performs offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training one agent model for each server, specifically as follows: the agent on each service node adjusts the amount of resources to be scheduled through the prediction module and marks the virtual machines that need to be scheduled out; a scheduling strategy is generated from the virtual machines in the pending state; each service node computes its own return value, the returns are summed to obtain a total return value, and the parameters of each prediction module are adjusted according to the total return value.
- The multi-agent reinforcement learning scheduling system according to claim 9, characterized in that the agent deployment module deploys the agent models to real service nodes and schedules according to the load of each service node, specifically as follows: the trained agent models are deployed to the corresponding service nodes in the real environment; each agent model takes as input the state information it perceives on its server over a period of time, predicts the resources the current server needs to release, and uses a knapsack algorithm to select the virtual machines closest to that target and mark them as pending; the scheduling module then collects the prediction results and the pending virtual machines from all servers, assigns the pending virtual machines to suitable servers as needed to generate a scheduling strategy, and distributes the scheduling commands to the corresponding service nodes for execution; before the scheduling strategy is executed, each scheduling command is checked for validity; if a command is invalid, a penalty reward is fed back to update the parameters and the scheduling strategy is regenerated; if valid, the scheduling operation is executed and the returned reward value is used to update the agent parameters.
- An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following operations of the multi-agent reinforcement learning scheduling method according to any one of claims 1 to 5: step a: collecting server parameters of a network data center and load information of the virtual machines running on each server; step b: using the server parameters and virtual machine load information to build a virtual simulation environment and establish a multi-agent deep reinforcement learning model; step c: performing offline training and learning with the multi-agent deep reinforcement learning model and the simulation environment, training one agent model for each server; step d: deploying the agent models to real service nodes and performing scheduling according to the load of each service node.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910193429.X | 2019-03-14 | ||
CN201910193429.XA CN109947567B (zh) | 2019-03-14 | 2019-03-14 | 一种多智能体强化学习调度方法、系统及电子设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020181896A1 true WO2020181896A1 (zh) | 2020-09-17 |
Family
ID=67009966
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/130582 WO2020181896A1 (zh) | 2019-03-14 | 2019-12-31 | 一种多智能体强化学习调度方法、系统及电子设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109947567B (zh) |
WO (1) | WO2020181896A1 (zh) |
Families Citing this family (45)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109947567B (zh) * | 2019-03-14 | 2021-07-20 | 深圳先进技术研究院 | 一种多智能体强化学习调度方法、系统及电子设备 |
CN110362411B (zh) * | 2019-07-25 | 2022-08-02 | 哈尔滨工业大学 | 一种基于Xen系统的CPU资源调度方法 |
CN110442129B (zh) * | 2019-07-26 | 2021-10-22 | 中南大学 | 一种多智能体编队的控制方法和系统 |
CN110471297B (zh) * | 2019-07-30 | 2020-08-11 | 清华大学 | 多智能体协同控制方法、系统及设备 |
CN110427006A (zh) * | 2019-08-22 | 2019-11-08 | 齐鲁工业大学 | 一种用于流程工业的多智能体协同控制系统及方法 |
CN110516795B (zh) * | 2019-08-28 | 2022-05-10 | 北京达佳互联信息技术有限公司 | 一种为模型变量分配处理器的方法、装置及电子设备 |
CN110728368B (zh) * | 2019-10-25 | 2022-03-15 | 中国人民解放军国防科技大学 | 一种仿真机器人深度强化学习的加速方法 |
CN111031387B (zh) * | 2019-11-21 | 2020-12-04 | 南京大学 | 一种监控视频发送端视频编码流速控制的方法 |
CN110882544B (zh) * | 2019-11-28 | 2023-09-15 | 网易(杭州)网络有限公司 | 多智能体训练方法、装置和电子设备 |
CN111026549B (zh) * | 2019-11-28 | 2022-06-10 | 国网甘肃省电力公司电力科学研究院 | 一种电力信息通信设备自动化测试资源调度方法 |
CN111047014B (zh) * | 2019-12-11 | 2023-06-23 | 中国航空工业集团公司沈阳飞机设计研究所 | 一种多智能体空中对抗分布式采样训练方法及设备 |
CN111178545B (zh) * | 2019-12-31 | 2023-02-24 | 中国电子科技集团公司信息科学研究院 | 一种动态强化学习决策训练系统 |
CN113067714B (zh) * | 2020-01-02 | 2022-12-13 | 中国移动通信有限公司研究院 | 一种内容分发网络调度处理方法、装置及设备 |
CN111310915B (zh) * | 2020-01-21 | 2023-09-01 | 浙江工业大学 | 一种面向强化学习的数据异常检测防御方法 |
CN111324358B (zh) * | 2020-02-14 | 2020-10-16 | 南栖仙策(南京)科技有限公司 | 一种用于信息系统自动运维策略的训练方法 |
CN111343095B (zh) * | 2020-02-15 | 2021-11-05 | 北京理工大学 | 一种在软件定义网络中实现控制器负载均衡的方法 |
CN111461338A (zh) * | 2020-03-06 | 2020-07-28 | 北京仿真中心 | 基于数字孪生的智能系统更新方法、装置 |
CN111339675B (zh) * | 2020-03-10 | 2020-12-01 | 南栖仙策(南京)科技有限公司 | 基于机器学习构建模拟环境的智能营销策略的训练方法 |
CN111538668B (zh) * | 2020-04-28 | 2023-08-15 | 山东浪潮科学研究院有限公司 | 基于强化学习的移动端应用测试方法、装置、设备及介质 |
CN111585811B (zh) * | 2020-05-06 | 2022-09-02 | 郑州大学 | 一种基于多智能体深度强化学习的虚拟光网络映射方法 |
CN111722910B (zh) * | 2020-06-19 | 2023-07-21 | 广东石油化工学院 | 一种云作业调度及资源配置的方法 |
CN111724001B (zh) * | 2020-06-29 | 2023-08-29 | 重庆大学 | 一种基于深度强化学习的飞行器探测传感器资源调度方法 |
CN111860777B (zh) * | 2020-07-06 | 2021-07-02 | 中国人民解放军军事科学院战争研究院 | 面向超实时仿真环境的分布式强化学习训练方法及装置 |
CN112001585B (zh) * | 2020-07-14 | 2023-09-22 | 北京百度网讯科技有限公司 | 多智能体决策方法、装置、电子设备及存储介质 |
CN111967645B (zh) * | 2020-07-15 | 2022-04-29 | 清华大学 | 一种社交网络信息传播范围预测方法及系统 |
CN112422651A (zh) * | 2020-11-06 | 2021-02-26 | 电子科技大学 | 一种基于强化学习的云资源调度性能瓶颈预测方法 |
CN112838946B (zh) * | 2020-12-17 | 2023-04-28 | 国网江苏省电力有限公司信息通信分公司 | 基于通信网故障智能感知与预警模型的构建方法 |
CN112766705A (zh) * | 2021-01-13 | 2021-05-07 | 北京洛塔信息技术有限公司 | 分布式工单处理方法、系统、设备和存储介质 |
CN112966431B (zh) * | 2021-02-04 | 2023-04-28 | 西安交通大学 | 一种数据中心能耗联合优化方法、系统、介质及设备 |
CN112801303A (zh) * | 2021-02-07 | 2021-05-14 | 中兴通讯股份有限公司 | 一种智能流水线处理方法、装置、存储介质及电子装置 |
CN113115451A (zh) * | 2021-02-23 | 2021-07-13 | 北京邮电大学 | 基于多智能体深度强化学习的干扰管理和资源分配方案 |
CN113094171A (zh) * | 2021-03-31 | 2021-07-09 | 北京达佳互联信息技术有限公司 | 数据处理方法、装置、电子设备和存储介质 |
US20220321605A1 (en) * | 2021-04-01 | 2022-10-06 | Cisco Technology, Inc. | Verifying trust postures of heterogeneous confidential computing clusters |
CN113325721B (zh) * | 2021-08-02 | 2021-11-05 | 北京中超伟业信息安全技术股份有限公司 | 一种工业系统无模型自适应控制方法及系统 |
CN113672372B (zh) * | 2021-08-30 | 2023-08-08 | 福州大学 | 一种基于强化学习的多边缘协同负载均衡任务调度方法 |
CN114003121B (zh) * | 2021-09-30 | 2023-10-31 | 中国科学院计算技术研究所 | 数据中心服务器能效优化方法与装置、电子设备及存储介质 |
CN113641462B (zh) * | 2021-10-14 | 2021-12-21 | 西南民族大学 | 基于强化学习的虚拟网络层次化分布式部署方法及系统 |
WO2023121514A1 (ru) * | 2021-12-21 | 2023-06-29 | Владимир Германович КРЮКОВ | Система принятия решений в мультиагентной среде |
CN114116183B (zh) * | 2022-01-28 | 2022-04-29 | 华北电力大学 | 基于深度强化学习的数据中心业务负载调度方法及系统 |
CN114518948A (zh) * | 2022-02-21 | 2022-05-20 | 南京航空航天大学 | 面向大规模微服务应用的动态感知重调度的方法及应用 |
CN114924684A (zh) * | 2022-04-24 | 2022-08-19 | 南栖仙策(南京)科技有限公司 | 基于决策流图的环境建模方法、装置和电子设备 |
CN114860416B (zh) * | 2022-06-06 | 2024-04-09 | 清华大学 | 对抗场景中的分布式多智能体探测任务分配方法及装置 |
CN114781072A (zh) * | 2022-06-17 | 2022-07-22 | 北京理工大学前沿技术研究院 | 一种无人驾驶车辆的决策方法和系统 |
CN115293451B (zh) * | 2022-08-24 | 2023-06-16 | 中国西安卫星测控中心 | 基于深度强化学习的资源动态调度方法 |
CN116151137B (zh) * | 2023-04-24 | 2023-07-28 | 之江实验室 | 一种仿真系统、方法及装置 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103873569A (zh) * | 2014-03-05 | 2014-06-18 | 兰雨晴 | 一种基于IaaS云平台的资源优化部署方法 |
CN105607952A (zh) * | 2015-12-18 | 2016-05-25 | 航天恒星科技有限公司 | 一种虚拟化资源的调度方法及装置 |
WO2018076791A1 (zh) * | 2016-10-31 | 2018-05-03 | 华为技术有限公司 | 一种资源负载均衡控制方法及集群调度器 |
CN108829494A (zh) * | 2018-06-25 | 2018-11-16 | 杭州谐云科技有限公司 | 基于负载预测的容器云平台智能资源优化方法 |
CN109947567A (zh) * | 2019-03-14 | 2019-06-28 | 深圳先进技术研究院 | 一种多智能体强化学习调度方法、系统及电子设备 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10649966B2 (en) * | 2017-06-09 | 2020-05-12 | Microsoft Technology Licensing, Llc | Filter suggestion for selective data import |
CN108021451B (zh) * | 2017-12-07 | 2021-08-13 | 上海交通大学 | 一种雾计算环境下的自适应容器迁移方法 |
CN109165081B (zh) * | 2018-08-15 | 2021-09-28 | 福州大学 | 基于机器学习的Web应用自适应资源配置方法 |
CN109068350B (zh) * | 2018-08-15 | 2021-09-28 | 西安电子科技大学 | 一种无线异构网络的终端自主选网系统及方法 |
-
2019
- 2019-03-14 CN CN201910193429.XA patent/CN109947567B/zh active Active
- 2019-12-31 WO PCT/CN2019/130582 patent/WO2020181896A1/zh active Application Filing
Non-Patent Citations (1)
Title |
---|
WEI, LIANG: "Research on Resource Scheduling Algorithm and Experimental Platform for Cloud-network Integration", China Doctoral Dissertations Full-text Database (CNKI), 30 September 2018 (2018-09-30) * |
Also Published As
Publication number | Publication date |
---|---|
CN109947567A (zh) | 2019-06-28 |
CN109947567B (zh) | 2021-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020181896A1 (zh) | 一种多智能体强化学习调度方法、系统及电子设备 | |
Rossi et al. | Horizontal and vertical scaling of container-based applications using reinforcement learning | |
Liu et al. | Adaptive asynchronous federated learning in resource-constrained edge computing | |
Ghobaei-Arani et al. | A cost-efficient IoT service placement approach using whale optimization algorithm in fog computing environment | |
CN107888669B (zh) | 一种基于深度学习神经网络的大规模资源调度系统及方法 | |
Han et al. | Tailored learning-based scheduling for kubernetes-oriented edge-cloud system | |
CN109491790A (zh) | 基于容器的工业物联网边缘计算资源分配方法及系统 | |
CN107404523A (zh) | 云平台自适应资源调度系统和方法 | |
CN110231976B (zh) | 一种基于负载预测的边缘计算平台容器部署方法及系统 | |
CN108965014A (zh) | QoS感知的服务链备份方法及系统 | |
CN104102533B (zh) | 一种基于带宽感知的Hadoop调度方法和系统 | |
CN114787830A (zh) | 异构集群中的机器学习工作负载编排 | |
CN109783225B (zh) | 一种多租户大数据平台的租户优先级管理方法及系统 | |
CN114841345B (zh) | 一种基于深度学习算法的分布式计算平台及其应用 | |
CN112732444A (zh) | 一种面向分布式机器学习的数据划分方法 | |
CN113742089A (zh) | 异构资源中神经网络计算任务的分配方法、装置和设备 | |
Cardellini et al. | Self-adaptive container deployment in the fog: A survey | |
CN115543626A (zh) | 采用异构计算资源负载均衡调度的电力缺陷图像仿真方法 | |
Ye et al. | SHWS: Stochastic hybrid workflows dynamic scheduling in cloud container services | |
CN109976873B (zh) | 容器化分布式计算框架的调度方案获取方法及调度方法 | |
CN112446484A (zh) | 一种多任务训练集群智能网络系统及集群网络优化方法 | |
Tuli et al. | Optimizing the Performance of Fog Computing Environments Using AI and Co-Simulation | |
CN115562812A (zh) | 面向机器学习训练的分布式虚拟机调度方法、装置和系统 | |
Guérout et al. | Autonomic energy-aware tasks scheduling | |
CN111782354A (zh) | 一种基于强化学习的集中式数据处理时间优化方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19919235 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19919235 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 04/02/2022) |
|