WO2020119649A1 - Task scheduling simulation system - Google Patents

Task scheduling simulation system

Info

Publication number
WO2020119649A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
module
information
node
task scheduling
Prior art date
Application number
PCT/CN2019/124086
Inventor
喻之斌
李乐乐
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Priority to US17/055,606 (US11455189B2)
Publication of WO2020119649A1 (zh)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/4887: Scheduling strategies for dispatcher involving deadlines, e.g. rate based, periodic
    • G06F30/20: Design optimisation, verification or simulation

Definitions

  • This application belongs to the field of cloud computing technology, and particularly relates to a task scheduling simulation system.
  • Based on log records of tasks run in a production environment, a task scheduling simulation system on a cloud computing platform can use far fewer machine nodes than the production environment to faithfully reproduce the number of tasks that the resource management and scheduling system must handle at a given moment, as well as changes such as cluster machine nodes going down or being added. It gives researchers an experimental environment that closely matches the real production environment for studying scheduling algorithms, and thereby supports demonstrating their effectiveness.
  • Production-environment records lack worst-case scheduling scenarios. By contrast, the task scheduling simulation system can, through special settings, simulate scenarios in which task submissions peak or overall cluster resource utilization reaches a critical value, providing an environment for testing how efficiently a newly designed scheduling algorithm runs in the worst case.
  • The Yarn simulation system (Scheduler Load Simulator, SLS) in the big data processing system Hadoop simulates the running of Map-Reduce-based batch processing tasks.
  • The input to SLS is the running log of batch processing tasks, including the running time of each task and the CPU and memory resources it requires.
  • However, the 24-hour data center logs published by some cloud computing providers show that batch processing tasks and online tasks are scheduled and run together in a mixed deployment, whereas SLS can only simulate Map-Reduce batch processing tasks in Hadoop.
  • To address the problem that SLS can only simulate a single kind of workload, Map-Reduce batch processing tasks, this application provides a task scheduling simulation system.
  • The present application provides a task scheduling simulation system that includes a data preprocessing subsystem and a task scheduling subsystem.
  • The data preprocessing subsystem filters abnormal data out of the input cloud computing log information and extracts the running time of each task.
  • The task scheduling subsystem enqueues or dequeues tasks in the batch and real-time task run queues of each node, keeping the number and status of tasks currently running in the cluster consistent with the actual production environment. At the same time, according to each task's resource requirements, it updates the used and available CPU cores and memory capacity of each node and derives the latest resource utilization topology map of the entire cluster.
  • The data preprocessing subsystem includes a data anomaly and missing-data processing module, a task information statistics module, a resource demand statistics module, and a running time statistics module.
  • The data anomaly and missing-data processing module reads the native cloud computing cluster operation logs and removes abnormal data.
  • The task information statistics module counts the task information and the number of task instances of each submitted job.
  • The resource demand statistics module computes the total number of CPU cores and the total memory capacity required by each job.
  • The running time statistics module calculates the running time of each task instance and the estimated running time of each job.
  • The task information statistics module, the resource demand statistics module, and the running time statistics module can run in parallel on three threads started at the same time.
  • The task scheduling subsystem includes a task operation information processing unit, a control unit, and a machine node event processing unit.
  • The task operation information processing unit includes a task operation information loading module, a task event driving module, and a task scheduling algorithm module.
  • The control unit includes a simulator operation control module and a machine node resource information statistics collection module.
  • The machine node event processing unit includes a machine node event information module and a machine node event driving module.
  • The task event driving module includes a batch task event driving sub-module and an online task event driving sub-module.
  • The machine node event information module includes an add-node sub-module and a delete-node sub-module.
  • The machine node event driving module includes a hash table.
  • By providing a data preprocessing subsystem and a task scheduling subsystem, the task scheduling simulation system of this application implements mixed scheduling simulation of batch processing tasks and online tasks, and can also simulate the resources of cluster nodes that are heterogeneous in CPU core count and memory capacity.
  • FIG. 1 is a working flowchart of a data preprocessing subsystem of a task scheduling simulation system of this application;
  • FIG. 2 is a working flowchart of a task scheduling subsystem of a task scheduling simulation system of the present application.
  • A cluster is a group of independent computers interconnected by a high-speed network. They form a group and are managed as a single system. When a client interacts with the cluster, the cluster behaves like a single independent server.
  • Cluster configurations are used to improve availability and scalability.
  • Compared with traditional high-performance computer technology, cluster technology can use servers of various grades as nodes. The system cost is low, yet it can reach high computing speeds, complete computation-heavy workloads, respond quickly, and meet today's growing demand for information services.
  • Cluster technology is a general technology. Its purpose is to overcome the limited computing power and IO capacity of a single machine, improve service reliability, obtain scalability, and reduce the overall operation and maintenance cost of a solution (operation, upgrade, and maintenance). Cluster technology is worth considering whenever other technologies cannot achieve these goals, or can achieve them only at too high a cost.
  • In the data centers of modern cloud computing platforms, clusters contain thousands to tens of thousands of machines. For example, the cluster scale announced by Google in 2011 reached 12,500 nodes, and in 2016 the number of machines in Microsoft's data center exceeded 50,000. In these large data centers, tens of thousands of jobs and tasks are scheduled and run every day.
  • An effective job scheduling algorithm assigns jobs to machine nodes that meet their resource requirements, significantly improving cluster resource utilization and task throughput per unit time.
  • Compared with the large clusters of enterprise production environments, research teams have small clusters, generally fewer than a few hundred machines, which is not enough to reproduce the actual state of task scheduling in an enterprise production environment truly and accurately. To verify the effectiveness of a new scheduling algorithm in a real production environment, it is therefore particularly important to have a system that can fully simulate real task scheduling in production and can run on a few machines.
  • The input of SLS does not include the hardware resource information of cluster nodes or the log of nodes dynamically added and removed during operation. The output of SLS is JVM-level memory usage and cluster-wide CPU core usage; it records no resource utilization statistics for individual machine nodes.
  • SLS simulates the node resources in the cluster simply as homogeneous machines with the same number of CPU cores and the same memory size.
  • An API is a calling interface that the operating system exposes to application programs.
  • An application program causes the operating system to execute its commands by calling the operating system API.
  • A hash table is a data structure that is accessed directly by key value. It accesses records by mapping the key to a location in the table, which speeds up lookup. The mapping function is called a hash function, and the array storing the records is called a hash table.
  • Comma-separated values (CSV, sometimes called character-separated values because the separator need not be a comma) files store tabular data (numbers and text) in plain text.
  • Plain text means that the file is a sequence of characters and contains no data that must be interpreted as binary numbers.
  • A CSV file consists of any number of records separated by some newline character. Each record is composed of fields, and the separators between fields are other characters or strings, most commonly commas or tabs.
  • The input of the data preprocessing subsystem is the native cloud computing task running log, and the output is that log information plus the statistics above. Users can obtain these statistics through the API provided by the system, which returns JSON; with a third-party chart visualization tool, the hardware resource requirements of tasks can be displayed on a web page.
  • The task operation information loading module is used to:
  • S103: Load machine node data into an unordered map in the simulator's memory. The key is the timestamp of the machine node event and the value is the machine node's data record.
  • The machine node event driving module is used to:
  • S201: Using an event-driven model and the simulator's current wall clock time, update the set of globally available machine nodes in the cluster in response to machine node addition or failure events.
  • S202: Using the Google logging module, output machine node update information to the relevant directory.
  • The batch task event driving module is used to:
  • S301: Using an event-driven model and the simulator's current wall clock time, process events according to the running status of each batch task instance (ready, waiting, terminated, failed, cancelled, interrupted). If a batch task instance is waiting, the S5 task scheduling algorithm module is triggered to run the relevant algorithm and schedule the task. If a task instance has failed, terminated, or been cancelled, the resource information of the node it ran on is updated.
  • The online task event driving module is used to:
  • S401: Using an event-driven model and the simulator's current wall clock time, process online task events. If the event status of an online task is "generated", the S5 task scheduling algorithm module is triggered to schedule the task. If the event status is "removed", the resource usage of the related machine node is updated.
  • The task scheduling algorithm module is used to:
  • S501: Adopt a plug-in software design so that different task scheduling algorithms are integrated into the simulator's scheduling algorithm library. The user specifies the scheduling algorithm for a simulator run in an XML configuration file.
  • The machine node resource information statistics collection module is used to:
  • S601: According to the number of tasks running on each node and the resources those tasks consume, dynamically calculate the CPU core and memory capacity usage of each node at a given moment.
  • S602: If the user needs to analyze cluster resource usage in real time, the module, on receiving the user's instruction, outputs the resource utilization of the nodes to a CSV file at fixed intervals (for example, every 5 seconds).
  • The simulator operation control module is used to:
  • S701: Set the start time and end time of the simulator's wall clock. These two time points correspond to two time points in the Alibaba Cloud log.
  • S702: Set the speed-up ratio of the simulator run.
  • The task scheduling subsystem first uses the simulator operation control module to set the period of cloud data center task scheduling that the simulator is to simulate, and starts the simulator. The task operation information loading module then loads the task information to be simulated from the output of the data preprocessing subsystem. The machine node event driving module loads new machine node information in real time, the batch task and online task event driving modules manage the running status of tasks, and the task scheduling algorithm module loads the specified scheduling algorithm and schedules the tasks that are waiting. The machine node resource information statistics collection module calculates the CPU core and memory usage of each node in real time and writes it to the specified output directory.
  • This application is a task scheduling simulation system for a cluster environment.
  • Alibaba Cloud is used as the object for detailed description:
  • The first part is the data preprocessing subsystem.
  • The input of the data preprocessing subsystem is the 24-hour operation log published by Alibaba Cloud, and the output is a preprocessed CSV file that serves as input data for the subsequent simulator system.
  • Preprocessing is divided into 4 modules.
  • The data exception handling module reads the four native Alibaba Cloud cluster operation logs and performs exception handling.
  • Exception handling mainly consists of removing task instance records and online task records whose end time is earlier than their start time.
  • Where the resource requirement information of a batch task instance is missing, it is filled with the average resource requirement of the instances of the same task. For example, if the CPU core request of a batch task instance is missing, the average number of CPU cores requested by all other task instances with the same task ID is calculated and substituted for the missing value.
  • The task information statistics module gathers the task information of each submitted job, including the number of tasks in each job, and builds a map from job ID to task IDs: the key is the job ID and the value is the set of task IDs.
  • The resource demand statistics module first sums the required CPU cores and memory capacity over the task instances of a single task to obtain that task's total CPU and memory demand. It then sums the CPU and memory demand over the tasks of a single job to obtain the job's total CPU core demand and total memory demand.
  • The running time statistics module calculates the running time of each batch or online task instance from the log records. A task instance may start before the Alibaba Cloud log sampling begins and may end after the 24-hour window, so two cases are handled. First, if a task instance starts before 00:00, its start time is set to 0 seconds. Second, if a task instance ends after 24 hours, its end time is set to the maximum value of the int type. The running time of each task instance is then its end time minus its start time, in seconds.
  • The new log records generated by the task information, resource demand, and running time statistics modules are written to an intermediate CSV file.
  • These three modules can run in parallel on 3 threads started at the same time.
  • The specific workflow of the task scheduling subsystem is shown in Figure 2.
  • The user first enters the period of cloud platform task execution that the simulator is to simulate, such as 00:00 to 12:59. This period is used to initialize the simulator operation control module.
  • After initialization, the simulation clock starts running.
  • The simulator operation control module starts the machine node event driving module and the task event driving module, and then, according to the current simulation clock, reads the intermediate CSV files output by the data preprocessing subsystem line by line. If the information read belongs to the machine node event file, it is sent to the machine node event driving module, which parses it.
  • The machine node event information module is divided into an add-node sub-module and a delete-node sub-module.
  • The machine node event driving module uses a hash table to record the machine nodes currently in the cluster. When add-node and delete-node events are processed, the module adds or deletes cluster nodes in the hash table, so that the simulated number of cluster nodes and their resource status stay consistent with the production environment log.
  • The task operation information loading module loads data from the intermediate CSV file into an in-memory map, and the simulator operation control module sends the task events for the current clock time from this map to the task event driving module for processing.
  • The task event driving module parses these task events, obtains the CPU and memory requirements of each task, generates a directed acyclic graph of the tasks, and submits it to the task scheduling algorithm module for resource allocation and task scheduling.
  • Before the task scheduling algorithm runs, the simulator operation control module tells the machine node resource information statistics collection module to collect the resource utilization of each node, including the remaining CPU cores and allocatable memory capacity, and finally to update the resource utilization topology map of the entire cluster.
  • The task scheduling algorithm module takes this resource utilization topology map as input, loads the user-specified scheduling algorithm code from the scheduling algorithm library, and runs the task scheduler.
  • The simulator operation control module records the start time and end time of the scheduler run, calculates the scheduler's running time, and returns it to the user as the running efficiency of the scheduling algorithm. When the task scheduler finishes, the matching between tasks and nodes is obtained.
  • Based on this matching, the task event driving module updates the per-node task queue table it maintains, enqueuing or dequeuing tasks in the batch and real-time task run queues of each node, so that the number and status of tasks running in the cluster stay consistent with the actual production environment.
  • The machine node resource information statistics collection module rescans the task run queue of each node and updates the used and available CPU cores and memory capacity of each node according to the resource requirements of each task. Finally, it updates the resource utilization topology map of the entire cluster.
  • Based on the 24-hour cloud computing platform cluster task operation log published by Alibaba Cloud, this system simulates, on a single machine node, the submission, scheduling, running, and completion of tasks on Alibaba Cloud cluster nodes. At any moment within the 24 hours, the system can simulate the CPU and memory utilization of each machine node from the number of tasks running on it and their life cycle status.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

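A task scheduling simulation system, comprising a data preprocessing subsystem and a task scheduling subsystem. The data preprocessing subsystem filters abnormal data out of the input cloud computing log information and extracts the running time of each task. The task scheduling subsystem enqueues or dequeues tasks in the batch and real-time task run queues of each node, keeping the tasks running in the cluster consistent with the actual production environment, and updates the used and available CPU cores and memory capacity of each node according to each task's resource requirements. The system implements mixed scheduling simulation of batch processing tasks and online tasks, and can also simulate the resources of cluster nodes that are heterogeneous in CPU core count and memory capacity.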

Description

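Task scheduling simulation system
Technical Field
This application belongs to the field of cloud computing technology, and in particular relates to a task scheduling simulation system.
Background
Based on log records of tasks run in a production environment, a task scheduling simulation system on a cloud computing platform can use far fewer machine nodes than the production environment to faithfully reproduce the number of tasks that the resource management and scheduling system must handle at a given moment, as well as changes such as cluster machine nodes going down or being added. It gives researchers an experimental environment that closely matches the real production environment for studying scheduling algorithms, and thereby supports demonstrating their effectiveness. In addition, production-environment records lack worst-case scheduling scenarios, whereas a task scheduling simulation system can, through special settings, simulate scenarios in which task submissions peak or overall cluster resource utilization reaches a critical value, providing an environment for testing how efficiently a newly designed scheduling algorithm runs in the worst case.
The Yarn simulation system (Scheduler Load Simulator, SLS) in the big data processing system Hadoop simulates the running of Map-Reduce-based batch processing tasks. The input to SLS is the running log of batch processing tasks, including the running time of each task and the CPU and memory resources it requires. However, the 24-hour data center logs published by some cloud computing providers show that batch processing tasks and online tasks are scheduled and run together in a mixed deployment, whereas SLS can only simulate Map-Reduce batch processing tasks in Hadoop.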
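Summary of the Invention
1. Technical problem to be solved
The Yarn simulation system (Scheduler Load Simulator, SLS) in the big data processing system Hadoop simulates the running of Map-Reduce-based batch processing tasks. Its input is the running log of batch processing tasks, including the running time of each task and the CPU and memory resources it requires. However, the 24-hour data center logs published by some cloud computing providers show that batch processing tasks and online tasks are scheduled and run together in a mixed deployment, whereas SLS can only simulate Map-Reduce batch processing tasks in Hadoop. To address this problem, this application provides a task scheduling simulation system.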
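2. Technical solution
To achieve the above objective, this application provides a task scheduling simulation system that includes a data preprocessing subsystem and a task scheduling subsystem;
the data preprocessing subsystem is used to filter abnormal data out of the input cloud computing log information and extract the running time of each task;
the task scheduling subsystem is used to enqueue or dequeue tasks in the batch and real-time task run queues of each node, keeping the number and status of tasks currently running in the cluster consistent with the actual production environment, and, according to each task's resource requirements, to update the used and available CPU cores and memory capacity of each node and derive the latest resource utilization topology map of the entire cluster.
Optionally, the data preprocessing subsystem includes a data anomaly and missing-data processing module, a task information statistics module, a resource demand statistics module, and a running time statistics module;
the data anomaly and missing-data processing module is used to read the native cloud computing cluster operation logs and remove abnormal data;
the task information statistics module is used to count the task information and the number of task instances of each submitted job;
the resource demand statistics module is used to compute the total number of CPU cores and the total memory capacity required by each job;
the running time statistics module is used to calculate the running time of each task instance and the estimated running time of each job.
Optionally, the task information statistics module, the resource demand statistics module, and the running time statistics module can run in parallel on 3 threads started at the same time.
Optionally, the task scheduling subsystem includes a task operation information processing unit, a control unit, and a machine node event processing unit;
the task operation information processing unit includes a task operation information loading module, a task event driving module, and a task scheduling algorithm module;
the control unit includes a simulator operation control module and a machine node resource information statistics collection module;
the machine node event processing unit includes a machine node event information module and a machine node event driving module.
Optionally, the task event driving module includes a batch task event driving sub-module and an online task event driving sub-module.
Optionally, the machine node event information module includes an add-node sub-module and a delete-node sub-module.
Optionally, the machine node event driving module includes a hash table.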
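3. Beneficial effects
Compared with the prior art, the task scheduling simulation system provided by this application has the following beneficial effects:
By providing a data preprocessing subsystem and a task scheduling subsystem, the task scheduling simulation system of this application implements mixed scheduling simulation of batch processing tasks and online tasks, and can also simulate the resources of cluster nodes that are heterogeneous in CPU core count and memory capacity.
Brief Description of the Drawings
FIG. 1 is a workflow diagram of the data preprocessing subsystem of the task scheduling simulation system of this application;
FIG. 2 is a workflow diagram of the task scheduling subsystem of the task scheduling simulation system of this application.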
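Detailed Description
Hereinafter, specific embodiments of this application are described in detail with reference to the drawings. From these descriptions, those skilled in the art can clearly understand and implement this application. Without departing from the principles of this application, features of different embodiments may be combined to obtain new embodiments, or may replace certain features of certain embodiments to obtain other preferred embodiments.
A cluster is a group of independent computers interconnected by a high-speed network. They form a group and are managed as a single system. When a client interacts with the cluster, the cluster behaves like a single independent server. Cluster configurations are used to improve availability and scalability. Compared with traditional high-performance computer technology, cluster technology can use servers of various grades as nodes. The system cost is low, yet it can reach high computing speeds, complete computation-heavy workloads, respond quickly, and meet today's growing demand for information services. Cluster technology is a general technology. Its purpose is to overcome the limited computing power and IO capacity of a single machine, improve service reliability, obtain scalability, and reduce the overall operation and maintenance cost of a solution (operation, upgrade, and maintenance). Cluster technology is worth considering whenever other technologies cannot achieve these goals, or can achieve them only at too high a cost.
In the data centers of modern cloud computing platforms, clusters contain thousands to tens of thousands of machines. For example, the cluster scale announced by Google in 2011 reached 12,500 nodes, and in 2016 the number of machines in Microsoft's data center exceeded 50,000. In these large data centers, tens of thousands of jobs and tasks are scheduled and run every day. An effective job scheduling algorithm assigns jobs to machine nodes that meet their resource requirements, significantly improving cluster resource utilization and task throughput per unit time. However, compared with the large clusters of enterprise production environments, research teams have small clusters, generally fewer than a few hundred machines, which is not enough to reproduce the actual state of task scheduling in an enterprise production environment truly and accurately. To verify the effectiveness of a new scheduling algorithm in a real production environment, it is therefore particularly important to have a system that can fully simulate real task scheduling in production and can run on a few machines.
The input of SLS does not include the hardware resource information of cluster nodes or the log of nodes dynamically added and removed during operation. The output of SLS is JVM-level memory usage and cluster-wide CPU core usage; it records no resource utilization statistics for individual machine nodes. SLS simulates the node resources in the cluster simply as homogeneous machines with the same number of CPU cores and the same memory size.
An API is a calling interface that the operating system exposes to application programs; an application causes the operating system to execute its commands by calling the operating system API.
A hash table is a data structure that is accessed directly by key value. It accesses records by mapping the key to a location in the table, which speeds up lookup. The mapping function is called a hash function, and the array storing the records is called a hash table.
Comma-separated values (CSV, sometimes called character-separated values because the separator need not be a comma) files store tabular data (numbers and text) in plain text. Plain text means that the file is a sequence of characters and contains no data that must be interpreted as binary numbers. A CSV file consists of any number of records separated by some newline character. Each record is composed of fields, and the separators between fields are other characters or strings, most commonly commas or tabs.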
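This application provides a task scheduling simulation system that includes a data preprocessing subsystem and a task scheduling subsystem;
the data preprocessing subsystem is used to filter abnormal data out of the input cloud computing log information and extract the running time of each task;
the task scheduling subsystem is used to enqueue or dequeue tasks in the batch and real-time task run queues of each node, keeping the number and status of tasks currently running in the cluster consistent with the actual production environment, and, according to each task's resource requirements, to update the used and available CPU cores and memory capacity of each node and derive the latest resource utilization topology map of the entire cluster.
Optionally, the data preprocessing subsystem includes a data anomaly and missing-data processing module, a task information statistics module, a resource demand statistics module, and a running time statistics module;
the data anomaly and missing-data processing module is used to read the native cloud computing cluster operation logs and remove abnormal data;
the task information statistics module is used to count the task information and the number of task instances of each submitted job;
the resource demand statistics module is used to compute the total number of CPU cores and the total memory capacity required by each job;
the running time statistics module is used to calculate the running time of each task instance and the estimated running time of each job.
The input of the data preprocessing subsystem is the native cloud computing task running log, and the output is that log information plus the statistics above. Users can obtain these statistics through the API provided by the system, which returns JSON; with a third-party chart visualization tool, the hardware resource requirements of tasks can be displayed on a web page.
Optionally, the task information statistics module, the resource demand statistics module, and the running time statistics module can run in parallel on 3 threads started at the same time.
The task scheduling subsystem includes a task operation information processing unit, a control unit, and a machine node event processing unit;
the task operation information processing unit includes a task operation information loading module, a task event driving module, and a task scheduling algorithm module;
the control unit includes a simulator operation control module and a machine node resource information statistics collection module;
the machine node event processing unit includes a machine node event information module and a machine node event driving module.
The task event driving module includes a batch task event driving sub-module and an online task event driving sub-module.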
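The task operation information loading module is used to:
S101: Classify tasks into batch processing tasks and online tasks.
S102: Insert each task record into a Leveldb database, which the simulator uses because of its high sequential read/write performance, with the timestamp of the task record as the key and the task record as the value.
S103: Load machine node data into an unordered map in the simulator's memory, with the timestamp of the machine node event as the key and the machine node's data record as the value.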
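Steps S101 to S103 can be illustrated with a minimal Python sketch. It is not the patent's implementation: plain in-memory dictionaries stand in for the Leveldb store, and the record fields (`timestamp`, `kind`, `task_id`, `node_id`) are hypothetical names chosen for the example.

```python
from collections import defaultdict

def load_task_records(rows):
    """S101/S102: split records into batch and online, indexed by timestamp."""
    batch, online = defaultdict(list), defaultdict(list)
    for row in rows:
        target = batch if row["kind"] == "batch" else online
        target[row["timestamp"]].append(row)
    return batch, online

def load_node_events(rows):
    """S103: index machine node events by event timestamp in an in-memory map."""
    events = defaultdict(list)
    for row in rows:
        events[row["timestamp"]].append(row)
    return events

# Hypothetical sample records (field names are assumptions).
tasks = [
    {"timestamp": 10, "kind": "batch", "task_id": "t1"},
    {"timestamp": 10, "kind": "online", "task_id": "c1"},
    {"timestamp": 25, "kind": "batch", "task_id": "t2"},
]
batch, online = load_task_records(tasks)
node_events = load_node_events([{"timestamp": 0, "type": "add", "node_id": "m1"}])
```

Keying by timestamp lets the control module fetch all events for the current clock value with one lookup.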
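The machine node event driving module is used to:
S201: Using an event-driven model and the simulator's current wall clock time, update the set of globally available machine nodes in the cluster in response to machine node addition or failure events.
S202: Using the Google logging module, output machine node update information to the relevant directory.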
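The following sketch shows one way step S201 could work, assuming a Python dictionary as the hash table of available nodes. The event layout (`type`, `node_id`, `cpu`, `mem`) is an assumption for illustration, and the logging of S202 is omitted.

```python
def apply_node_events(cluster, events, now):
    """Apply every add/remove event whose timestamp is <= now, in time order."""
    for ts in sorted(t for t in events if t <= now):
        for ev in events.pop(ts):
            if ev["type"] == "add":
                cluster[ev["node_id"]] = {"cpu": ev["cpu"], "mem": ev["mem"]}
            elif ev["type"] == "remove":
                # Removing an unknown node is ignored in this sketch.
                cluster.pop(ev["node_id"], None)
    return cluster

# Hypothetical events keyed by timestamp in seconds.
events = {
    0: [{"type": "add", "node_id": "m1", "cpu": 64, "mem": 256},
        {"type": "add", "node_id": "m2", "cpu": 32, "mem": 128}],
    30: [{"type": "remove", "node_id": "m1"}],
}
cluster = {}
apply_node_events(cluster, events, now=10)
nodes_at_10 = sorted(cluster)
apply_node_events(cluster, events, now=30)
nodes_at_30 = sorted(cluster)
```

Because each node keeps its own `cpu` and `mem` values, heterogeneous machines are represented directly.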
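The batch task event driving module is used to:
S301: Using an event-driven model and the simulator's current wall clock time, process events according to the running status of each batch task instance (ready, waiting, terminated, failed, cancelled, interrupted). If a batch task instance is waiting, the S5 task scheduling algorithm module is triggered to run the relevant algorithm and schedule the task. If a task instance has failed, terminated, or been cancelled, the resource information of the node it ran on is updated.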
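The status dispatch in S301 might look like the sketch below. The status strings, the `pending` list handed to the scheduler, and the `node_free` bookkeeping are illustrative assumptions rather than details from the patent.

```python
def handle_batch_event(instance, node_free, pending):
    """Dispatch one batch task instance event by its status."""
    status = instance["status"]
    if status == "waiting":
        # Hand the instance over to the scheduling algorithm module.
        pending.append(instance)
    elif status in ("failed", "terminated", "cancelled"):
        node = instance.get("node")
        if node is not None:
            # Release the resources the instance was holding on its node.
            node_free[node]["cpu"] += instance["cpu"]
            node_free[node]["mem"] += instance["mem"]
    # Other statuses (ready, interrupted) need no action in this sketch.

node_free = {"m1": {"cpu": 2, "mem": 8}}
pending = []
handle_batch_event({"status": "waiting", "cpu": 1, "mem": 2}, node_free, pending)
handle_batch_event({"status": "terminated", "node": "m1", "cpu": 2, "mem": 4},
                   node_free, pending)
```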
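The online task event driving module is used to:
S401: Using an event-driven model and the simulator's current wall clock time, process online task events. If the event status of an online task is "generated", the S5 task scheduling algorithm module is triggered to schedule the task. If the event status is "removed", the resource usage of the related machine node is updated.
The task scheduling algorithm module is used to:
S501: Adopt a plug-in software design so that different task scheduling algorithms are integrated into the simulator's scheduling algorithm library. The user specifies the scheduling algorithm for a simulator run in an XML configuration file.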
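A plug-in arrangement like S501 can be sketched with a registry that maps algorithm names to functions, and an XML file that names the one to use. The XML layout, the registry, and both toy algorithms below are assumptions made for illustration; the patent does not specify them.

```python
import xml.etree.ElementTree as ET

SCHEDULERS = {}  # hypothetical scheduling algorithm library: name -> function

def register(name):
    """Decorator that adds a scheduling function to the library."""
    def decorator(fn):
        SCHEDULERS[name] = fn
        return fn
    return decorator

def _fits(task, free):
    return free["cpu"] >= task["cpu"] and free["mem"] >= task["mem"]

@register("first_fit")
def first_fit(task, nodes):
    """Return the first node with enough free CPU and memory, else None."""
    for node_id, free in nodes.items():
        if _fits(task, free):
            return node_id
    return None

@register("most_free_cpu")
def most_free_cpu(task, nodes):
    """Return the fitting node with the most free CPU cores, else None."""
    fitting = [n for n, free in nodes.items() if _fits(task, free)]
    return max(fitting, key=lambda n: nodes[n]["cpu"]) if fitting else None

# Hypothetical configuration file content.
CONFIG_XML = "<simulator><scheduler>most_free_cpu</scheduler></simulator>"
name = ET.fromstring(CONFIG_XML).findtext("scheduler")

nodes = {"m1": {"cpu": 4, "mem": 16}, "m2": {"cpu": 16, "mem": 64}}
task = {"cpu": 2, "mem": 4}
chosen = SCHEDULERS[name](task, nodes)
```

Adding a new algorithm then only requires defining one more decorated function; the simulator core does not change.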
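The machine node resource information statistics collection module is used to:
S601: According to the number of tasks running on each node and the resources those tasks consume, dynamically calculate the CPU core and memory capacity usage of each node at a given moment.
S602: If the user needs to analyze cluster resource usage in real time, the module, on receiving the user's instruction, outputs the resource utilization of the nodes to a CSV file at fixed intervals (for example, every 5 seconds).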
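The periodic output in S602 can be sketched as follows. The column names are assumptions, an in-memory buffer stands in for the CSV file, and the node state is held constant between samples to keep the example short.

```python
import csv
import io

HEADER = ["time", "node_id", "cpu_used", "cpu_total", "mem_used", "mem_total"]

def dump_utilization(writer, now, nodes):
    """Write one row per node for the simulated time `now`."""
    for node_id, n in sorted(nodes.items()):
        writer.writerow([now, node_id, n["cpu_used"], n["cpu_total"],
                         n["mem_used"], n["mem_total"]])

def run_sampling(out, nodes, start, end, interval=5):
    """Sample node utilization every `interval` simulated seconds."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for now in range(start, end + 1, interval):
        dump_utilization(writer, now, nodes)

buf = io.StringIO()  # stands in for an open CSV file
nodes = {"m1": {"cpu_used": 2, "cpu_total": 8, "mem_used": 4, "mem_total": 32}}
run_sampling(buf, nodes, start=0, end=10)
lines = buf.getvalue().splitlines()
```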
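The simulator operation control module is used to:
S701: Set the start time and end time of the simulator's wall clock. These two time points correspond to two time points in the Alibaba Cloud log.
S702: Set the speed-up ratio of the simulator run.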
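One possible reading of S701 and S702 is a clock that maps elapsed real time to simulated time through a speed-up ratio. The class below is a sketch under that assumption; the name `SimClock` and its methods are hypothetical.

```python
class SimClock:
    """Simulated wall clock over [start, end] seconds with a speed-up ratio."""

    def __init__(self, start, end, speedup=1.0):
        if end < start or speedup <= 0:
            raise ValueError("need start <= end and speedup > 0")
        self.start, self.end, self.speedup = start, end, speedup

    def sim_time(self, real_elapsed):
        """Simulated time reached after `real_elapsed` real seconds."""
        return min(self.start + real_elapsed * self.speedup, self.end)

    def finished(self, real_elapsed):
        return self.sim_time(real_elapsed) >= self.end

# Simulate 00:00 to 12:59 of the trace at 60x speed.
clock = SimClock(start=0, end=12 * 3600 + 59 * 60, speedup=60.0)
after_ten_seconds = clock.sim_time(10)
```

With a ratio of 60, one real second covers one simulated minute, so the 12 h 59 min window completes in 779 real seconds.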
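Optionally, the machine node event information module includes an add-node sub-module and a delete-node sub-module.
Optionally, the machine node event driving module includes a hash table.
The task scheduling subsystem first uses the simulator operation control module to set the period of cloud data center task scheduling that the simulator is to simulate, and starts the simulator. The task operation information loading module then loads the task information to be simulated from the output of the data preprocessing subsystem. The machine node event driving module loads new machine node information in real time, the batch task and online task event driving modules manage the running status of tasks, and the task scheduling algorithm module loads the specified scheduling algorithm and schedules the tasks that are waiting. The machine node resource information statistics collection module calculates the CPU core and memory usage of each node in real time and writes it to the specified output directory.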
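Embodiment
The present application is a task scheduling simulation system for cluster environments, and is described in detail below using Alibaba Cloud as the example: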
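Referring to Figs. 1 and 2, the data preprocessing subsystem is described first. As shown in Fig. 1, the input of the data preprocessing subsystem is the 24-hour run log published by Alibaba Cloud, and its output is a preprocessed CSV file that serves as the input data of the subsequent simulator system. Preprocessing is divided into four modules. The data anomaly processing module reads the four raw run logs of the Alibaba Cloud cluster and handles anomalies. Anomaly handling mainly consists of removing task instance records and online task records whose end time is earlier than their start time. Where resource demand information is missing for a batch task instance, the gap is filled with the average resource demand of the task instances of the same task. For example, if the record of the number of CPU cores requested by a batch task instance is missing, the average number of CPU cores requested by all other task instances with the same task ID is computed, and the missing value is replaced by this average.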
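The anomaly filtering and mean imputation described above can be sketched as follows. The record layout and sample values are assumptions for illustration; an instance whose task has no known CPU request at all is left unfilled, a case the text does not address.

```python
from collections import defaultdict

# Hypothetical instance records: (task_id, instance_id, start, end, cpu_request).
INSTANCES = [
    ("t1", "i1", 10, 50, 2.0),
    ("t1", "i2", 10, 60, 4.0),
    ("t1", "i3", 20, 70, None),   # missing CPU request
    ("t2", "i4", 90, 30, 1.0),    # ends before it starts: dropped
]


def clean(instances):
    """Drop records with end < start, then fill missing CPU requests."""
    valid = [r for r in instances if r[3] >= r[2]]

    # Collect the known CPU requests of each task.
    known = defaultdict(list)
    for task_id, _, _, _, cpu in valid:
        if cpu is not None:
            known[task_id].append(cpu)

    # Replace each missing value by the mean over the same task.
    cleaned = []
    for task_id, inst, start, end, cpu in valid:
        if cpu is None and known[task_id]:
            cpu = sum(known[task_id]) / len(known[task_id])
        cleaned.append((task_id, inst, start, end, cpu))
    return cleaned


cleaned = clean(INSTANCES)
```

Here instance `i3` receives the mean of the requests of `i1` and `i2`, and the inconsistent record `i4` is removed.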
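The task information statistics module collects the task information of each submitted job, including the number of tasks that each job contains, and builds a map from job IDs to sets of task IDs, in which the key is the job ID and the value is the set of task IDs.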
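The resource demand statistics module first sums the recorded CPU cores and memory capacity required by every task instance of a single task to obtain that task's total CPU core and memory demand. It then sums the CPU core and memory demand of every task of a single job to obtain that job's total CPU core demand and total memory demand.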
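The job-to-task map and the two-level summation (instances to task, tasks to job) might look like the sketch below. The row layout and values are invented for the example, and task demand is keyed by the pair (job ID, task ID) on the assumption that task IDs need not be unique across jobs.

```python
from collections import defaultdict

# Hypothetical rows: (job_id, task_id, instance_id, cpu_cores, mem_gb).
ROWS = [
    ("j1", "t1", "i1", 2, 4),
    ("j1", "t1", "i2", 2, 4),
    ("j1", "t2", "i3", 1, 8),
    ("j2", "t3", "i4", 4, 16),
]


def aggregate(rows):
    """Build the job -> task-ID map and the per-task and per-job totals."""
    job_tasks = defaultdict(set)                # job ID -> set of task IDs
    task_demand = defaultdict(lambda: [0, 0])   # (job, task) -> [cpu, mem]
    for job_id, task_id, _, cpu, mem in rows:
        job_tasks[job_id].add(task_id)
        task_demand[(job_id, task_id)][0] += cpu
        task_demand[(job_id, task_id)][1] += mem

    # Sum the task totals of each job.
    job_demand = {
        job: (
            sum(task_demand[(job, t)][0] for t in tasks),
            sum(task_demand[(job, t)][1] for t in tasks),
        )
        for job, tasks in job_tasks.items()
    }
    return dict(job_tasks), dict(task_demand), job_demand


job_tasks, task_demand, job_demand = aggregate(ROWS)
```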
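The running duration statistics module computes the running duration of each batch or online task instance from the log records. Because a task instance may have started before the start of the Alibaba Cloud log sampling window, and may end after the 24-hour mark, two cases must be handled. First, if a task instance started running before 00:00, its start time is set to 0 seconds; second, if a task instance ends after the 24-hour mark, its end time is set to the maximum value of the int type. Finally, the running duration of each task instance is computed as its end time minus its start time, in seconds.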
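The two boundary cases can be written as a small function. This sketch makes two assumptions that the text leaves open: a start before 00:00 is represented as a negative offset from the start of the window, and the int type is a 32-bit signed integer.

```python
INT_MAX = 2**31 - 1      # assumes a 32-bit signed int
WINDOW_END = 24 * 3600   # the log window covers 24 hours, in seconds


def clamp_and_duration(start, end):
    """Apply the two boundary rules, then return (start, end, duration)."""
    if start < 0:            # started before the sampling window
        start = 0
    if end > WINDOW_END:     # still running after the 24-hour mark
        end = INT_MAX
    return start, end, end - start
```

For example, `clamp_and_duration(-30, 100)` yields a start of 0 and a duration of 100 seconds, while an instance that outlives the window gets a very large duration, which effectively marks it as not finishing within the simulated period.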
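Finally, the new log records generated by the task information, resource demand and running duration statistics modules are written to the intermediate-data CSV file. These three modules can be run in parallel on three threads started at the same time.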
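Running the three statistics modules on three threads can be sketched with a thread pool. The three functions and the row layout below are simplified stand-ins, not the modules of the patent; each one only reads the shared rows, so no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows: (job_id, task_id, cpu_cores, start, end).
ROWS = [("j1", "t1", 2, 0, 30), ("j1", "t2", 1, 10, 50), ("j2", "t3", 4, 5, 25)]


def task_info(rows):
    """Job ID -> set of task IDs."""
    jobs = {}
    for job, task, *_ in rows:
        jobs.setdefault(job, set()).add(task)
    return jobs


def resource_demand(rows):
    """Job ID -> total CPU cores requested."""
    total = {}
    for job, _, cpu, *_ in rows:
        total[job] = total.get(job, 0) + cpu
    return total


def durations(rows):
    """Task ID -> running duration in seconds."""
    return {task: end - start for _, task, _, start, end in rows}


# One worker per statistics module, as in the text.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(f, ROWS) for f in (task_info, resource_demand, durations)]
    info, demand, runtime = (f.result() for f in futures)
```

In CPython, threads mainly help when the work is I/O-bound, such as reading large log files; for CPU-bound statistics a process pool may be the better fit.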
2. Task scheduling subsystem
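The specific workflow of the task scheduling subsystem is shown in Fig. 2. First, the user enters the time period of task execution on the cloud platform that the simulator is to simulate, for example 0:00 to 12:59. This time period is used to initialize the simulator run control module. Once initialization is complete, the simulation clock starts running. In the second step, the simulator run control module starts the machine node event driving module and the task event driving module, and then, according to the current simulation clock, reads line by line the intermediate-data CSV file output by the data preprocessing subsystem. If the information read belongs to the machine node event file, it is sent to the machine node event driving module, which is responsible for parsing it. The machine node event information module is divided into a node-adding submodule and a node-deleting submodule. The machine node event driving module uses a hash table to record the machine nodes currently in the cluster. Therefore, when node-adding or node-deleting event information has to be processed, the machine node event driving module operates on the hash table to add or delete cluster nodes, so that the simulated number of machine nodes in the cluster and their resource status are consistent with the logs of the actual production environment. Meanwhile, the task run information loading module loads data from the intermediate-data CSV file into an in-memory map data structure, and the simulator run control module sends the task event information for the current clock time from this map to the task event driving module for processing. The task event driving module parses this task event information, obtains the CPU and memory demand of each task, generates the directed acyclic graph of the tasks, and submits it to the task scheduling algorithm module for resource allocation and task scheduling.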
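The hash-table bookkeeping of machine nodes can be sketched with a Python `dict`, which is a hash table. The event tuple layout, the `Cluster` class and the `nodes_at` helper are hypothetical; the sketch only shows how replaying add and remove events up to the current simulated time reproduces the set of live nodes.

```python
class Cluster:
    """Tracks the machine nodes that are currently part of the cluster."""

    def __init__(self):
        self.nodes = {}   # node ID -> {"cpu": cores, "mem": memory}

    def apply(self, event):
        """Apply one (timestamp, node_id, action, cpu, mem) event."""
        _, node_id, action, cpu, mem = event
        if action == "add":
            self.nodes[node_id] = {"cpu": cpu, "mem": mem}
        elif action == "remove":
            self.nodes.pop(node_id, None)   # ignore removal of unknown nodes
        else:
            raise ValueError(f"unknown action: {action}")


# Hypothetical node events: (timestamp, node_id, action, cpu, mem).
EVENTS = [
    (0, "m1", "add", 64, 256),
    (0, "m2", "add", 96, 512),
    (3600, "m1", "remove", 0, 0),
]


def nodes_at(events, until):
    """Replay events in time order up to `until` and return the live node IDs."""
    cluster = Cluster()
    for event in sorted(events):
        if event[0] <= until:
            cluster.apply(event)
    return sorted(cluster.nodes)


at_half_hour = nodes_at(EVENTS, until=1800)
at_two_hours = nodes_at(EVENTS, until=7200)
```

Both adding and removing a node are average constant-time operations on the hash table, which suits a simulator that processes node events continuously.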
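Before the task scheduling algorithm is run, the simulator run control module notifies the machine node resource information statistics and collection module to collect the resource utilization of each node, including the remaining CPU cores and the allocatable memory capacity, and finally to update the resource utilization topology graph of the whole cluster. The task scheduling algorithm module takes this resource utilization topology graph as input, loads the code of the user-specified scheduling algorithm from the scheduling algorithm library, and runs the task scheduling program. At the same time, the simulator run control module records the start time and end time of the scheduling program's run and computes its running time, which is returned to the user as a measure of the scheduling algorithm's efficiency. When the task scheduling program finishes, the matching between tasks and nodes is obtained. Based on this information, the task event driving module updates the per-node task queue table that it maintains, that is, it enqueues tasks into or dequeues tasks from the batch and real-time task run queues of each node, so that the number and state of the tasks running in the current cluster stay consistent with the actual production environment. In addition, the machine node resource information statistics and collection module rescans the task run queue of each node and, according to the resource demand of each task, updates the used and available amounts of CPU cores and memory capacity on each node, and finally updates the resource utilization topology graph of the whole cluster.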
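One scheduling pass as described above (collect usage, run and time the algorithm, enqueue the placed tasks, recompute usage) can be sketched as follows. The first-fit algorithm is only a placeholder for whatever algorithm the user selects, a flat dictionary stands in for the resource utilization topology graph, and all names and values are assumptions for illustration.

```python
import time


def usage(capacity, queues):
    """Per-node (cpu_used, cpu_free, mem_used, mem_free) from the run queues."""
    topo = {}
    for node, (cpu_cap, mem_cap) in capacity.items():
        cpu_used = sum(cpu for _, cpu, _ in queues[node])
        mem_used = sum(mem for _, _, mem in queues[node])
        topo[node] = (cpu_used, cpu_cap - cpu_used, mem_used, mem_cap - mem_used)
    return topo


def first_fit(tasks, topo):
    """Placeholder algorithm: place each task on the first node with room."""
    free = {n: [u[1], u[3]] for n, u in topo.items()}
    placement = {}
    for task_id, cpu, mem in tasks:
        for node, (cpu_free, mem_free) in free.items():
            if cpu_free >= cpu and mem_free >= mem:
                placement[task_id] = node
                free[node] = [cpu_free - cpu, mem_free - mem]
                break
    return placement


def scheduling_pass(waiting, capacity, queues, algorithm):
    """Run one pass and return (placement, elapsed seconds, new usage)."""
    topo = usage(capacity, queues)            # input to the algorithm
    started = time.perf_counter()
    placement = algorithm(waiting, topo)
    elapsed = time.perf_counter() - started   # reported as the algorithm's cost
    for task in waiting:
        if task[0] in placement:
            queues[placement[task[0]]].append(task)   # enqueue on the chosen node
    return placement, elapsed, usage(capacity, queues)


capacity = {"m1": (4, 16), "m2": (8, 32)}
queues = {"m1": [("t0", 3, 8)], "m2": []}
waiting = [("t1", 2, 4), ("t2", 1, 4), ("t3", 16, 4)]
placement, elapsed, topo = scheduling_pass(waiting, capacity, queues, first_fit)
```

Task `t3` requests more cores than any node offers, so it stays unplaced and would remain in the waiting state for a later pass.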
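In Alibaba Cloud's data centers, machine nodes are heterogeneous in CPU core count and memory capacity because of hardware upgrades and replacement. For this reason, the log information published by the Alibaba Cloud data center records not only the CPU core count, memory capacity and disk capacity of each machine node, but also the timestamp at which each machine node joined the cluster or went down. Task scheduling on the cluster at any given moment is therefore constrained by the CPU cores and memory capacity available on each machine node. Because SLS does not take the hardware resources of the actual machine nodes into account and treats the CPU and memory resources of all machine nodes as being of the same type, it cannot simulate the scheduling of Alibaba Cloud tasks accurately and fully. Based on the 24-hour cluster task run log of the cloud computing platform published by Alibaba Cloud, the present system simulates, on a single machine node, the submission, scheduling, running and completion of tasks on the Alibaba Cloud cluster nodes. Moreover, at any moment within the 24 hours, the system can simulate the CPU and memory resource utilization of each machine node according to the number of tasks running on it and their life-cycle states.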
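Although the present application has been described above with reference to specific embodiments, those skilled in the art will understand that many modifications can be made to the configurations and details disclosed herein within the principles and scope of the disclosure. The scope of protection of the present application is determined by the appended claims, and the claims are intended to cover all modifications encompassed by the literal meaning or range of equivalents of the technical features in the claims.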

Claims (7)

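  1. A task scheduling simulation system, characterized in that the system comprises a data preprocessing subsystem and a task scheduling subsystem;
    the data preprocessing subsystem is configured to filter abnormal data from the input cloud computing log information and to extract the running duration of each task;
    the task scheduling subsystem is configured to enqueue tasks into, or dequeue tasks from, the batch and real-time task run queues of each node, so that the number and state of the tasks running in the current cluster stay consistent with the actual production environment; and, according to the resource demand of each task, to update the used and available amounts of CPU cores and memory capacity on each node, thereby obtaining the latest resource utilization topology graph of the whole cluster.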
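  2. The task scheduling simulation system according to claim 1, characterized in that the data preprocessing subsystem comprises a data anomaly and missing-data processing module, a task information statistics module, a resource demand statistics module and a running duration statistics module;
    the data anomaly and missing-data processing module is configured to read the raw cloud computing cluster run logs and remove abnormal data;
    the task information statistics module is configured to count the task information and the number of task instances of each submitted job;
    the resource demand statistics module is configured to compute the total number of CPU cores and the total memory capacity required by each job;
    the running duration statistics module is configured to calculate the running duration of each task instance and to compute the estimated running duration of each job.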
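  3. The task scheduling simulation system according to claim 2, characterized in that the task information statistics module, the resource demand statistics module and the running duration statistics module can be run in parallel on three threads started at the same time.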
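  4. The task scheduling simulation system according to claim 1, characterized in that the task scheduling subsystem comprises a task run information processing unit, a control unit and a machine node event processing unit;
    the task run information processing unit comprises a task run information loading module, a task event driving module and a task scheduling algorithm module;
    the control unit comprises a simulator run control module and a machine node resource information statistics and collection module;
    the machine node event processing unit comprises a machine node event information module and a machine node event driving module.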
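  5. The task scheduling simulation system according to claim 4, characterized in that the task event driving module comprises a batch task event driving submodule and an online task event driving submodule.
  6. The task scheduling simulation system according to claim 4, characterized in that the machine node event information module comprises a node-adding submodule and a node-deleting submodule.
  7. The task scheduling simulation system according to claim 4, characterized in that the machine node event driving module comprises a hash table.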
PCT/CN2019/124086 2018-12-14 2019-12-09 Task scheduling simulation system WO2020119649A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/055,606 US11455189B2 (en) 2018-12-14 2019-12-09 Task scheduling simulation system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811535124.4 2018-12-14
CN201811535124.4A CN111324445B (zh) 2018-12-14 2018-12-14 Task scheduling simulation system

Publications (1)

Publication Number Publication Date
WO2020119649A1 true WO2020119649A1 (zh) 2020-06-18

Family

ID=71077148

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/124086 WO2020119649A1 (zh) 2018-12-14 2019-12-09 Task scheduling simulation system

Country Status (3)

Country Link
US (1) US11455189B2 (zh)
CN (1) CN111324445B (zh)
WO (1) WO2020119649A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
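CN111897631A (zh) * 2020-07-15 2020-11-06 上海携旅信息技术有限公司 Batch-based model inference system, method, electronic device and medium
CN112817725A (zh) * 2021-02-06 2021-05-18 成都飞机工业(集团)有限责任公司 Microservice partitioning and optimization method based on an efficient global optimization algorithm
CN112948118A (zh) * 2021-03-12 2021-06-11 上海哔哩哔哩科技有限公司 Edge computing method, platform, computer device and readable storage medium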

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
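CN111309479B (zh) * 2020-02-14 2023-06-06 北京百度网讯科技有限公司 Method, apparatus, device and medium for implementing parallel task processing
CN111737010B (zh) * 2020-07-30 2024-02-02 腾讯科技(深圳)有限公司 Task processing method and apparatus, graphics task processing system, and storage medium
CN112068942B (zh) * 2020-09-07 2023-04-07 北京航空航天大学 Large-scale parallel system simulation method based on single-node simulation
CN113485794A (zh) * 2021-07-26 2021-10-08 上海中通吉网络技术有限公司 K8s-based big data offline scheduling method and system
CN113835953A (zh) * 2021-09-08 2021-12-24 曙光信息产业股份有限公司 Job information statistics method and apparatus, computer device and storage medium
CN113553140B (zh) * 2021-09-17 2022-03-18 阿里云计算有限公司 Resource scheduling method, device and system
CN113886029A (zh) * 2021-10-15 2022-01-04 中国科学院信息工程研究所 Task scheduling method and system for cross-region distributed data centers
CN116302450B (zh) * 2023-05-18 2023-09-01 深圳前海环融联易信息科技服务有限公司 Task batch processing method and apparatus, computer device and storage medium
CN116700933B (zh) * 2023-08-02 2023-11-21 之江实验室 Multi-cluster job scheduling system and method for a heterogeneous computing power federation
CN118069374B (zh) * 2024-04-18 2024-06-18 清华大学 Method, apparatus, device and medium for accelerating intelligent training simulation transactions in data centers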

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246439A (zh) * 2008-03-18 2008-08-20 中兴通讯股份有限公司 Automated testing method and system based on task scheduling
US20110161974A1 (en) * 2009-12-28 2011-06-30 Empire Technology Development Llc Methods and Apparatus for Parallelizing Heterogeneous Network Communication in Smart Devices
US20140245298A1 (en) * 2013-02-27 2014-08-28 Vmware, Inc. Adaptive Task Scheduling of Hadoop in a Virtualized Environment
CN104298550A (zh) * 2014-10-09 2015-01-21 南通大学 Hadoop-oriented dynamic scheduling method
CN104915407A (zh) * 2015-06-03 2015-09-16 华中科技大学 Resource scheduling method in a Hadoop-based multi-job environment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10346188B1 (en) * 2014-06-13 2019-07-09 Veritas Technologies Llc Booting virtual machine instances in a distributed data processing architecture
US9699049B2 (en) * 2014-09-23 2017-07-04 Ebay Inc. Predictive model for anomaly detection and feedback-based scheduling
EP3182288B1 (en) * 2015-12-15 2019-02-13 Tata Consultancy Services Limited Systems and methods for generating performance prediction model and estimating execution time for applications
CN106992901B (zh) * 2016-01-20 2020-08-18 阿里巴巴集团控股有限公司 Method and device for simulating load in resource scheduling
US10572306B2 (en) * 2016-09-14 2020-02-25 Cloudera, Inc. Utilization-aware resource scheduling in a distributed computing cluster
US10831633B2 (en) * 2018-09-28 2020-11-10 Optum Technology, Inc. Methods, apparatuses, and systems for workflow run-time prediction in a distributed computing system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
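CN111897631A (zh) * 2020-07-15 2020-11-06 上海携旅信息技术有限公司 Batch-based model inference system, method, electronic device and medium
CN111897631B (zh) * 2020-07-15 2022-08-30 上海携旅信息技术有限公司 Batch-based model inference system, method, electronic device and medium
CN112817725A (zh) * 2021-02-06 2021-05-18 成都飞机工业(集团)有限责任公司 Microservice partitioning and optimization method based on an efficient global optimization algorithm
CN112817725B (zh) * 2021-02-06 2023-08-11 成都飞机工业(集团)有限责任公司 Microservice partitioning and optimization method based on an efficient global optimization algorithm
CN112948118A (zh) * 2021-03-12 2021-06-11 上海哔哩哔哩科技有限公司 Edge computing method, platform, computer device and readable storage medium
CN112948118B (zh) * 2021-03-12 2024-01-16 上海哔哩哔哩科技有限公司 Edge computing method, platform, computer device and readable storage medium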

Also Published As

Publication number Publication date
CN111324445A (zh) 2020-06-23
US11455189B2 (en) 2022-09-27
CN111324445B (zh) 2024-04-02
US20210224110A1 (en) 2021-07-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19896645

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.11.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19896645

Country of ref document: EP

Kind code of ref document: A1