WO2019061122A1 - A Spark task allocation method and system - Google Patents

A Spark task allocation method and system

Info

Publication number
WO2019061122A1
WO2019061122A1 (PCT/CN2017/103877)
Authority
WO
WIPO (PCT)
Prior art keywords
node
task
resource
identifier
ssd
Prior art date
Application number
PCT/CN2017/103877
Other languages
English (en)
French (fr)
Inventor
毛睿
陆敏华
陆克中
朱金彬
隋秀峰
Original Assignee
深圳大学
Priority date
Filing date
Publication date
Application filed by 深圳大学 filed Critical 深圳大学
Priority to PCT/CN2017/103877 priority Critical patent/WO2019061122A1/zh
Publication of WO2019061122A1 publication Critical patent/WO2019061122A1/zh

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the invention belongs to the technical field of computers, and in particular relates to a Spark task allocation method and system.
  • Spark is an efficient big data computing framework widely used in the industry. Deploying Spark to a high-performance computing cluster can effectively improve Spark's big data processing efficiency.
  • A High Performance Computing Cluster (HPC Cluster) is built by configuring different storage devices on different compute nodes, forming a distributed file system and computing cluster that mixes solid-state drives (SSDs) and hard disk drives (HDDs).
  • High-performance computing clusters combine the high-speed read/write and high-throughput characteristics of SSDs with the large capacity and low cost of HDDs, achieving efficient storage and computing power while keeping storage and computing costs under control.
  • To achieve fault tolerance, the cluster usually stores multiple copies of each piece of data; that is, the cluster's management system combines SSD and HDD use sensibly.
  • A typical strategy stores one copy of the data on an SSD node and the other copies on HDD nodes.
  • Figure 4 shows the topology of an existing high-performance computing cluster based on SSD and HDD hybrid storage.
  • The current Spark task allocation strategy is based on the location of the operation data: a task is assigned to the compute node that stores its operation data, deploying task and data on the same node, which avoids remote data reads and enables local processing.
  • However, the current data-location-based strategy ignores the different storage characteristics of SSDs and HDDs and does not exploit them. For example, if a task's operation data is stored on both an SSD node and an HDD node, current Spark randomly selects either node as the compute node when assigning the task.
  • Because existing Spark does not consider the cluster's heterogeneous SSD/HDD characteristics and allocates tasks randomly, it inevitably reduces Spark's execution efficiency.
  • The technical problem to be solved by the present invention is to provide a Spark task allocation method and system that improve the execution efficiency of existing Spark applications.
  • the present invention provides a Spark task allocation method, and the method includes:
  • the data-location-based selection method pairs the currently submitted task with the nodes included in the configured resources to generate a pairing result; the node identifiers of the nodes in the pairing result are used to look up the generated correspondence between node identifiers and storage device feature identifiers, and the nodes are grouped into a solid-state drive (SSD) node group and a hard disk drive (HDD) node group accordingly;
  • if the task's operation data exists on both a node of the SSD node group and a node of the HDD node group, the task is allocated to any node of the SSD node group for execution.
  • the method further includes:
  • if the task's operation data exists only on nodes of the SSD node group, allocating the task to any node of the SSD node group for execution; if it exists only on nodes of the HDD node group, allocating the task to any node of the HDD node group for execution.
  • the method further includes: acquiring the node identifiers of all nodes in the cluster and the storage device feature identifier configured on each node, and generating a correspondence between node identifiers and storage device feature identifiers; the storage device feature identifiers include a solid-state drive identifier and a hard disk drive identifier.
  • the method further includes: configuring, in response to the resource request of the currently submitted task, a node that satisfies the resource request condition to the currently submitted task to complete resource configuration.
  • the data-location-based selection method pairs the task with the nodes included in the configured resources, and generating the pairing result specifically includes:
  • pairing the task, based on the data-location selection method, with the nodes among the configured resources that store the task's operation data; if the pairing succeeds, the corresponding pairing result is generated; if not, the task is randomly assigned to any node among the configured resources and the corresponding pairing result is generated.
  • the present invention further provides a Spark task distribution system, where the system includes:
  • a resource scheduling module configured to pair the currently submitted task and the nodes included in the configured resource according to a data location selection method to generate a pairing result
  • a resource filtering module configured to use a node identifier of a node included in the pairing result to search for a corresponding relationship between the generated node identifier and the storage device feature identifier, and obtain a corresponding storage device feature identifier;
  • the resource filtering module is further configured to group the nodes included in the pairing result into a solid state disk node group and a mechanical hard disk node group according to the storage device feature identifier;
  • the resource filtering module is further configured to: if the operation data of the task exists on both a node of the SSD node group and a node of the HDD node group, assign the task to any node of the SSD node group for execution.
  • the resource filtering module is further configured to:
  • if the task's operation data exists only on nodes of the SSD node group, allocate the task to any node of the SSD node group for execution; if it exists only on nodes of the HDD node group, allocate the task to any node of the HDD node group for execution.
  • the system further includes a storage characteristic statistic module: a node identifier for acquiring all the nodes in the cluster, and a storage device feature identifier configured by each of the nodes, and generating a correspondence between the node identifier and the storage device feature identifier;
  • the storage device feature identifier includes a solid state disk identifier and a mechanical hard disk identifier.
  • system further includes a resource configuration module configured to: configure, in response to the resource request of the currently submitted task, a node that satisfies the resource request condition to the currently submitted task to complete resource configuration.
  • the resource scheduling module is specifically configured to:
  • pair the task, based on the data-location selection method, with the nodes among the configured resources that store the task's operation data; if the pairing succeeds, generate the corresponding pairing result; if not, randomly assign the task to any node among the configured resources and generate the corresponding pairing result.
  • the invention has the following advantages:
  • The invention provides a Spark task allocation method that first pairs the currently submitted task with the nodes in the configured resources according to a data-location selection method, generating a pairing result.
  • Using the generated correspondence between node identifiers and storage device feature identifiers, the nodes in the pairing result are grouped into an SSD node group and an HDD node group; if the current task's operation data exists on both an SSD-group node and an HDD-group node, the task is preferentially
  • assigned to a node in the SSD node group, placing tasks on SSD-equipped compute nodes as far as possible, thereby exploiting the SSD's high-speed read/write and high-throughput characteristics to speed up task execution and improve the cluster's quality of service.
  • FIG. 1 is a flowchart of a Spark task allocation method according to a first embodiment of the present invention
  • FIG. 2 is a schematic diagram of a Spark task distribution system according to a second embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a Spark task distribution system according to a third embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the topology of an existing high-performance computing cluster based on SSD and HDD hybrid storage.
  • the present invention provides a Spark task allocation method, which includes the following steps:
  • Step S101 Acquire a node identifier of all nodes in the cluster, and a storage device feature identifier configured by each node, and generate a correspondence between the node identifier and the storage device feature identifier.
  • the storage device feature identifiers include a solid-state drive (SSD) identifier and a hard disk drive (HDD) identifier. In this embodiment, m compute nodes are deployed in the cluster: node dn1 is configured with an SSD, nodes dn2 and dn3 are configured with HDDs, and node dnm is configured with an SSD.
  • The method provided by the present invention first collects, through step S101, statistics on the cluster's nodes and their storage characteristics; this can be understood as system initialization. Each time a task is executed in the cluster, the following steps S102-S106 are performed.
  • Step S102 In response to the resource request of the currently submitted task, configure a node that satisfies the resource request condition to the currently submitted task to complete resource configuration.
  • Step S102 is a process of configuring a resource.
  • the resource configuration process is specifically as follows:
  • First, all nodes in the cluster periodically send "heartbeat" data to the resource configuration module (Provisioner); the "heartbeat" data includes the node's currently idle resources (such as the number of available CPUs and the amount of memory).
  • The Provisioner module saves this information after receiving it.
  • The user/terminal (Driver) generates a resource request for the currently submitted task and sends it to the Provisioner module of the cluster manager (Master); the request includes the required number of CPUs, the memory size, and so on.
  • After obtaining the Driver's resource request, the Provisioner module queries the nodes' idle-resource information and assigns to the Driver the nodes whose idle resources satisfy the request (including which nodes are allocated and how many CPUs and how much memory each node contributes), thereby completing the resource configuration.
  • Step S103 Pair the currently submitted task and the nodes included in the configured resource according to the data location selection method to generate a pairing result.
  • The data-location-based selection method assigns a task to a node that stores the task's operation data (deploying the task and its operation data on the same node avoids remote data transfer). Step S103 therefore pairs tasks with the nodes storing their operation data as far as possible. If the pairing succeeds, the corresponding pairing result is generated; if not, the currently submitted task is randomly assigned to any node in the configured resources and the corresponding pairing result is generated. For example, {t1, {dn1, dn2}} indicates that task t1 was successfully paired with nodes dn1 and dn2; tasks that cannot be paired are randomly assigned to different nodes.
  • Step S104: The node identifiers of the nodes included in the pairing result are used to look up the generated correspondence between node identifiers and storage device feature identifiers, obtaining the corresponding storage device feature identifiers.
  • Step S105 The nodes included in the pairing result are grouped into a solid state disk node group and a mechanical hard disk node group according to the corresponding storage device feature identifier acquired in step S104.
  • In this embodiment, the nodes included in the pairing result are grouped by SSD and HDD into a solid-state drive node group DNssd and a hard disk drive node group DNhdd; DNssd contains the nodes in the pairing result whose storage device identifier is SSD, and DNhdd contains those whose identifier is HDD.
  • Step S106 If the operation data of the currently submitted task exists in both the node of the SSD node group DNssd and the node of the mechanical hard disk node group DNhdd, the task is assigned to any node of the SSD node group DNssd for execution.
  • For example, if the pairing result is {t1, {dn1, dn2}}, it is refined into {t1, dn1} and {t1, dn2}; since dn1 belongs to DNssd and dn2 belongs to DNhdd, the task is preferentially assigned to node dn1, whose storage device identifier is SSD.
  • If the task's operation data exists only on nodes of the SSD node group, the task is allocated to any node of that group; if it exists only on nodes of the HDD node group, the task is allocated to any node of the HDD node group for execution.
  • Step S106 ensures that tasks are preferentially allocated to nodes whose storage device is an SSD, placing tasks on SSD-equipped compute nodes as far as possible to exploit the SSD's high-speed read/write and high-throughput characteristics, speed up task execution, and improve the cluster's quality of service.
  • If a task cannot be matched with an SSD node, the pairing result of step S103 is kept and used for allocation.
  • The Spark task allocation method targets Spark on heterogeneous storage and performs task assignment (task-node pairing) based on the data-location selection method.
  • On that basis, the task allocation strategy is optimized with precise pairing, assigning tasks to SSD-equipped nodes as far as possible, thereby exploiting the SSD's high-speed read/write and high-throughput characteristics, speeding up task execution, and improving the cluster's quality of service.
  • The present invention provides a Spark task distribution system, which includes a storage characteristic statistics module 10, a resource configuration module 20, a resource scheduling module 30, and a resource filtering module 40.
  • the storage characteristic statistic module 10 is configured to obtain a node identifier of all nodes in the cluster, and a storage device feature identifier configured by each node, and generate a correspondence between the node identifier and the storage device feature identifier; wherein the storage device feature identifier includes a solid state hard disk identifier and Mechanical hard disk identification.
  • In this embodiment, the storage device feature identifiers include a solid-state drive (SSD) identifier and a hard disk drive (HDD) identifier; m compute nodes are deployed in the cluster: node dn1 is configured with an SSD, nodes dn2 and dn3 are configured with HDDs, and node dnm is configured with an SSD.
  • The system provided by the present invention first collects statistics on the cluster's nodes and their storage characteristics through module 10, which can be understood as system initialization; each time a task is executed in the cluster, modules 20, 30, and 40 perform the operations below.
  • The resource configuration module 20 is configured to assign, in response to the resource request of the currently submitted task, nodes satisfying the request to that task, completing the resource configuration.
  • In this embodiment, the resource configuration process is as follows:
  • all nodes in the cluster periodically send "heartbeat" data to the resource configuration module 20.
  • the "heartbeat" data includes the current idle resources of the node (such as the number of available CPUs and memory size, etc.). After the resource configuration module 20 obtains the information, it saves it.
  • The user/terminal (Driver) generates a resource request for the currently submitted task according to the current task and sends it to the resource configuration module 20 of the cluster manager (Master); the request includes the required number of CPUs, the memory size, and so on.
  • After obtaining the Driver's resource request, the resource configuration module 20 queries the nodes' idle-resource information and allocates to the Driver the nodes whose idle resources satisfy the request (including which nodes are allocated and how many CPUs and how much memory each node contributes), completing the resource configuration.
  • the resource scheduling module 30 is configured to pair the currently submitted task and the nodes included in the configured resource according to the data location selection method to generate a pairing result.
  • The data-location-based selection method assigns a task to a node that stores the task's operation data (deploying the task and its operation data on the same node avoids remote data transfer); the role of module 30 is therefore to pair tasks with the nodes storing their operation data as far as possible. If the pairing succeeds, the corresponding pairing result is generated; if not, the currently submitted task is randomly assigned to any node in the configured resources and the corresponding pairing result is generated.
  • For example, {t1, {dn1, dn2}} indicates that task t1 was successfully paired with nodes dn1 and dn2; tasks that cannot be paired are randomly assigned to different nodes.
  • The resource filtering module 40 is configured to use the node identifiers of the nodes included in the pairing result of the resource scheduling module 30 to look up the correspondence between node identifiers and storage device feature identifiers generated by the storage characteristic statistics module 10, obtaining the corresponding storage device feature identifiers.
  • the resource filtering module 40 is further configured to group the nodes included in the pairing result into a solid state disk node group and a mechanical hard disk node group according to the obtained corresponding storage device feature identifier.
  • In this embodiment, the nodes included in the pairing result are grouped by SSD and HDD into a solid-state drive node group DNssd and a hard disk drive node group DNhdd; DNssd contains the nodes in the pairing result whose storage device identifier is SSD, and DNhdd contains those whose identifier is HDD.
  • The resource filtering module 40 is further configured to: if the operation data of the currently submitted task exists on both a node of the SSD node group DNssd and a node of the HDD node group DNhdd, assign the task to any node of DNssd for execution. For example, if the pairing result is {t1, {dn1, dn2}}, it is refined into {t1, dn1} and {t1, dn2}; since dn1 belongs to DNssd and dn2 to DNhdd, the task is preferentially assigned to node dn1, whose storage device identifier is SSD.
  • If the task's operation data exists only on nodes of the SSD node group, the task is allocated to any node of that group; if it exists only on nodes of the HDD node group, the task is allocated to any node of the HDD node group for execution.
  • By ensuring that tasks are preferentially assigned to nodes whose storage device is an SSD, the resource filtering module 40 places tasks on SSD-equipped compute nodes as far as possible, exploiting the SSD's high-speed read/write and high-throughput characteristics, speeding up task execution, and improving the cluster's quality of service. If a task cannot be matched with an SSD node, the pairing result of the resource scheduling module 30 is kept and used for allocation.
  • The Spark task distribution system provided by the second embodiment of the present invention optimizes the task allocation strategy on top of the data-location-based task assignment (task-node pairing).
  • Oriented to Spark on heterogeneous storage, it performs precise pairing to assign tasks to SSD-equipped nodes as far as possible, thereby exploiting the SSD's high-speed read/write and high-throughput characteristics, speeding up task execution, and improving the cluster's quality of service.
  • FIG. 3 is a schematic diagram of a Spark task distribution system provided by the present invention.
  • Spark (Master) denotes the cluster manager; Application (Driver) denotes the user or terminal; Provisioner denotes the resource configuration module; Scheduler denotes the resource scheduling module; Resource Filter denotes the resource filtering module; and Storage Monitor denotes the storage characteristic statistics module.
  • HDFS denotes the cluster, which contains two types of nodes and runs in manager-worker mode, i.e., one NameNode (manager) and multiple DataNodes (workers).
  • Figure 3 shows the working principle of the Spark task allocation framework for heterogeneous storage.
  • the specific implementation process is as follows:
  • the Storage Monitor module in the system first obtains the storage device feature information configured by each node in the cluster, and generates the corresponding relationship between the node identifier and the storage device feature identifier.
  • In addition, all nodes periodically send a "heartbeat" to the Provisioner module, including the node's currently idle resources (the number of available CPUs and the amount of memory). The Provisioner module saves this information after receiving it.
  • the Driver sends a resource request to the Master's Provisioner module, and the request information includes the required number of CPUs and the size of the memory;
  • After the Provisioner module obtains the Driver's resource request, it queries the nodes' idle-resource information and allocates to the Driver the compute nodes whose idle resources satisfy the request, specifying which nodes are allocated and how many CPUs and how much memory each node contributes, completing the resource configuration.
  • The Provisioner module sends the configured resource information to the Scheduler and ResourceFilter modules, which respectively perform the data-location-based pairing and the precise pairing for heterogeneous storage;
  • The Scheduler module matches the configured resources with tasks based on the data-location policy, pairing tasks with the nodes storing their operation data as far as possible. Tasks that cannot be matched are randomly assigned to different nodes. For example, {t1, {dn1, dn2}} indicates that task t1 was successfully matched with nodes dn1 and dn2;
  • After obtaining the configured resource information, the ResourceFilter module sends a feature query request to the StorageMonitor module, queries the storage device feature identifiers of the nodes contained in the Scheduler's pairing result, groups the nodes into DNssd and DNhdd according to those identifiers, and determines that the task is preferentially assigned to a node in DNssd for execution.
  • The Scheduler obtains the ResourceFilter's allocation result and, according to it, assigns the task to the corresponding node for execution. For example, the ResourceFilter sends the allocation result {t1, dn1} to the Scheduler; the Scheduler module sends this matching result to the Driver; after obtaining the resource information, the Driver sends task t1 to the designated node dn1.
  • After the node receives the task, it executes it.
  • The system provided by the third embodiment of the present invention makes a targeted selection of SSD nodes.
  • When a task's operation data is stored on both SSD and HDD nodes, assigning the task to the SSD node fully exploits the SSD's high-speed read/write and high-throughput characteristics, speeds up task execution, improves the cluster's quality of service, and greatly improves the execution efficiency of Spark applications.

Abstract

A Spark task allocation method and system. First, the currently submitted task is paired with the nodes contained in the configured resources based on a data-location selection method, generating a pairing result. Using the generated correspondence between node identifiers and storage device feature identifiers, the nodes in the pairing result are grouped into an SSD node group and an HDD node group. If the current task's operation data exists on both a node of the SSD node group and a node of the HDD node group, the task is preferentially assigned to a node of the SSD node group for execution; that is, tasks are assigned to SSD-equipped compute nodes as far as possible, fully exploiting the SSD's high-speed read/write and high-throughput characteristics, speeding up task execution, and improving the cluster's quality of service.

Description

A Spark task allocation method and system
The invention belongs to the field of computer technology and in particular relates to a Spark task allocation method and system.
Spark is an efficient big data computing framework widely used in industry; deploying Spark on a high-performance computing cluster can effectively improve Spark's big data processing efficiency. A high-performance computing cluster (HPC Cluster) is built by configuring different storage devices on different compute nodes, forming a distributed file system and computing cluster that mixes solid-state drives (SSDs) and hard disk drives (HDDs). Such a cluster combines the SSD's high-speed read/write and high-throughput characteristics with the HDD's large capacity and low cost, effectively improving the cluster's storage and computing capabilities while keeping storage and computing costs under control.
To achieve fault tolerance, the cluster usually stores multiple copies of each piece of data; that is, the cluster's management system combines SSD and HDD use sensibly. A typical strategy stores one copy of the data on an SSD node and the other copies on HDD nodes. Figure 4 shows the topology of an existing high-performance computing cluster based on SSD and HDD hybrid storage.
Spark's current task allocation strategy is based on the location of the operation data: a task is assigned to the compute node that stores its operation data, so that task and data are deployed on the same node, avoiding remote data reads and enabling local processing. However, this data-location-based strategy ignores the different storage characteristics of SSDs and HDDs and does not exploit them. For example, if a task's operation data is stored on both an SSD node and an HDD node, current Spark randomly selects either node as the compute node. Because existing Spark does not consider the cluster's heterogeneous SSD/HDD characteristics and allocates tasks randomly, it inevitably reduces Spark's execution efficiency.
Summary of the invention
The technical problem to be solved by the invention is to provide a Spark task allocation method and system that improve the execution efficiency of existing Spark applications.
To solve the above technical problem, the invention provides a Spark task allocation method, the method comprising:
pairing the currently submitted task with the nodes contained in the configured resources based on a data-location selection method, generating a pairing result;
using the node identifiers of the nodes contained in the pairing result to look up the generated correspondence between node identifiers and storage device feature identifiers, obtaining the corresponding storage device feature identifiers;
grouping the nodes contained in the pairing result into an SSD node group and an HDD node group according to the storage device feature identifiers;
if the task's operation data exists on both a node of the SSD node group and a node of the HDD node group, assigning the task to any node of the SSD node group for execution.
Further, the method also comprises:
if the task's operation data exists only on nodes of the SSD node group, assigning the task to any node of the SSD node group for execution;
if the task's operation data exists only on nodes of the HDD node group, assigning the task to any node of the HDD node group for execution.
Further, the method also comprises: obtaining the node identifiers of all nodes in the cluster and the storage device feature identifier configured on each node, and generating the correspondence between node identifiers and storage device feature identifiers; the storage device feature identifiers include an SSD identifier and an HDD identifier.
Further, the method also comprises: in response to the resource request of the currently submitted task, assigning nodes that satisfy the resource request to the currently submitted task, completing the resource configuration.
Further, pairing the task with the nodes contained in the configured resources based on the data-location selection method and generating the pairing result specifically comprises:
pairing the task, based on the data-location selection method, with the nodes among the configured resources that store the task's operation data;
if the pairing succeeds, generating the corresponding pairing result;
if the pairing fails, randomly assigning the task to any node among the configured resources and generating the corresponding pairing result.
To solve the above technical problem, the invention also provides a Spark task allocation system, the system comprising:
a resource scheduling module, configured to pair the currently submitted task with the nodes contained in the configured resources based on a data-location selection method, generating a pairing result;
a resource filtering module, configured to use the node identifiers of the nodes contained in the pairing result to look up the generated correspondence between node identifiers and storage device feature identifiers, obtaining the corresponding storage device feature identifiers;
the resource filtering module being further configured to group the nodes contained in the pairing result into an SSD node group and an HDD node group according to the storage device feature identifiers;
the resource filtering module being further configured to, if the task's operation data exists on both a node of the SSD node group and a node of the HDD node group, assign the task to any node of the SSD node group for execution.
Further, the resource filtering module is also configured to:
if the task's operation data exists only on nodes of the SSD node group, assign the task to any node of the SSD node group for execution;
if the task's operation data exists only on nodes of the HDD node group, assign the task to any node of the HDD node group for execution.
Further, the system also comprises a storage characteristic statistics module, configured to obtain the node identifiers of all nodes in the cluster and the storage device feature identifier configured on each node, and to generate the correspondence between node identifiers and storage device feature identifiers; the storage device feature identifiers include an SSD identifier and an HDD identifier.
Further, the system also comprises a resource configuration module, configured to assign, in response to the resource request of the currently submitted task, nodes that satisfy the resource request to the currently submitted task, completing the resource configuration.
Further, the resource scheduling module is specifically configured to:
pair the task, based on the data-location selection method, with the nodes among the configured resources that store the task's operation data;
if the pairing succeeds, generate the corresponding pairing result;
if the pairing fails, randomly assign the task to any node among the configured resources and generate the corresponding pairing result.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a Spark task allocation method that first pairs the currently submitted task with the nodes contained in the configured resources based on a data-location selection method, generating a pairing result; using the generated correspondence between node identifiers and storage device feature identifiers, the nodes in the pairing result are grouped into an SSD node group and an HDD node group; if the current task's operation data exists on both a node of the SSD node group and a node of the HDD node group, the task is preferentially assigned to a node of the SSD node group for execution. Tasks are thus assigned to SSD-equipped compute nodes as far as possible, fully exploiting the SSD's high-speed read/write and high-throughput characteristics, speeding up task execution, and improving the cluster's quality of service.
Brief description of the drawings
Figure 1 is a flowchart of a Spark task allocation method according to the first embodiment of the invention;
Figure 2 is a schematic diagram of a Spark task allocation system according to the second embodiment of the invention;
Figure 3 is a schematic diagram of a Spark task allocation system according to the third embodiment of the invention;
Figure 4 is a schematic diagram of the topology of an existing high-performance computing cluster based on SSD and HDD hybrid storage.
Detailed description
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the invention, not to limit it.
As the first embodiment of the invention, as shown in Figure 1, the Spark task allocation method provided by the invention comprises the following steps:
Step S101: Obtain the node identifiers of all nodes in the cluster and the storage device feature identifier configured on each node, and generate the correspondence between node identifiers and storage device feature identifiers; the storage device feature identifiers include an SSD identifier and an HDD identifier.
In this embodiment, the node identifiers of all nodes in the cluster are collected into a node set DNs (used to store and manage the valid compute nodes of the cluster), DNs = {dn1, dn2, dn3, …, dnm}, where dn1, dn2, dn3, …, dnm denote the node identifiers. Based on DNs, the storage device feature identifiers configured on each node are collected into a storage device set CHARs (used to store and manage the features of the storage devices configured on the corresponding nodes in DNs), for example CHARs = {ssd, hdd, hdd, …, ssd}, where ssd indicates that the corresponding node is configured with a solid-state drive (SSD) and hdd indicates a hard disk drive (HDD). Establishing the two variables DNs and CHARs realizes the correspondence between node identifiers and storage device feature identifiers. In this cluster, m compute nodes are deployed: node dn1 is configured with an SSD, nodes dn2 and dn3 with HDDs, and node dnm with an SSD.
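The DNs/CHARs correspondence described above can be sketched as a simple position-wise mapping (a minimal illustration only; the variable names follow the example, and a cluster of four nodes is assumed for brevity):

```python
# Node set DNs and storage-feature set CHARs, as in the example:
# dn1 -> ssd, dn2 -> hdd, dn3 -> hdd, ..., dnm -> ssd
DNs = ["dn1", "dn2", "dn3", "dnm"]
CHARs = ["ssd", "hdd", "hdd", "ssd"]

# The correspondence between node identifiers and storage device
# feature identifiers is the position-wise pairing of the two sets.
storage_feature = dict(zip(DNs, CHARs))

print(storage_feature["dn1"])  # ssd
print(storage_feature["dn2"])  # hdd
```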
The method provided by the invention first collects statistics on the cluster's nodes and their storage characteristics through step S101, which can be understood as a system initialization process; each time a task is executed in the cluster, the following steps S102-S106 are performed.
Step S102: In response to the resource request of the currently submitted task, assign nodes that satisfy the resource request to the currently submitted task, completing the resource configuration. Step S102 is the resource configuration process, which in this embodiment proceeds as follows:
First, all nodes in the cluster periodically send "heartbeat" data to the resource configuration module (Provisioner); the "heartbeat" data includes the node's currently idle resources (such as the number of available CPUs and the amount of memory). The Provisioner module saves this information after receiving it.
The user/terminal (Driver) generates a resource request for the currently submitted task according to the current task and sends it to the Provisioner module of the cluster manager (Master); the request includes the required number of CPUs, the memory size, and so on.
After obtaining the Driver's resource request, the Provisioner module queries the nodes' idle-resource information and assigns to the Driver the nodes whose idle resources satisfy the request (including which nodes are allocated and how many CPUs and how much memory each node contributes), thereby completing the resource configuration.
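The heartbeat-and-request flow of the Provisioner can be sketched as follows. This is a hypothetical illustration: the `Provisioner` class, the heartbeat fields, and the "return every node whose idle resources satisfy the request" selection are assumptions, since the text does not fix a concrete selection algorithm.

```python
class Provisioner:
    """Keeps the idle resources reported by node heartbeats and
    serves a Driver's resource request from them (illustrative)."""

    def __init__(self):
        self.idle = {}  # node id -> {"cpus": int, "mem_gb": int}

    def heartbeat(self, node, cpus, mem_gb):
        # Each node periodically reports its currently idle resources.
        self.idle[node] = {"cpus": cpus, "mem_gb": mem_gb}

    def configure(self, need_cpus, need_mem_gb):
        # Return the nodes whose idle resources satisfy the request;
        # the real per-node CPU/memory split is not specified here.
        return [n for n, r in self.idle.items()
                if r["cpus"] >= need_cpus and r["mem_gb"] >= need_mem_gb]

p = Provisioner()
p.heartbeat("dn1", cpus=8, mem_gb=32)
p.heartbeat("dn2", cpus=2, mem_gb=8)
p.heartbeat("dn3", cpus=4, mem_gb=16)
print(p.configure(need_cpus=4, need_mem_gb=16))  # ['dn1', 'dn3']
```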
Step S103: Pair the currently submitted task with the nodes contained in the configured resources based on the data-location selection method, generating a pairing result. The data-location selection method assigns a task to a node that stores the task's operation data (deploying the task and its operation data on the same node avoids remote data transfer); step S103 therefore pairs tasks with the nodes storing their operation data as far as possible. If the pairing succeeds, the corresponding pairing result is generated; if not, the currently submitted task is randomly assigned to any of the configured nodes and the corresponding pairing result is generated. For example, {t1, {dn1, dn2}} indicates that task t1 was successfully paired with nodes dn1 and dn2; tasks that cannot be paired are randomly assigned to different nodes.
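The data-locality pairing of step S103 can be sketched as below; the function name and data shapes are illustrative assumptions, not the patent's API.

```python
import random

def pair_by_locality(task, data_locations, configured_nodes, rng=random):
    """Pair a task with the configured nodes that store its operation
    data; fall back to a random configured node if none of them do."""
    local = [n for n in configured_nodes if n in data_locations.get(task, ())]
    if local:
        return (task, local)                  # e.g. ('t1', ['dn1', 'dn2'])
    return (task, [rng.choice(configured_nodes)])  # random fallback

pairing = pair_by_locality("t1", {"t1": {"dn1", "dn2"}}, ["dn1", "dn2", "dn3"])
print(pairing)  # ('t1', ['dn1', 'dn2'])
```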
Step S104: Use the node identifiers of the nodes contained in the pairing result to look up the generated correspondence between node identifiers and storage device feature identifiers, obtaining the corresponding storage device feature identifiers.
Step S105: Group the nodes contained in the pairing result into an SSD node group and an HDD node group according to the storage device feature identifiers obtained in step S104. In this embodiment, the nodes contained in the pairing result are grouped by SSD and HDD into an SSD node group DNssd and an HDD node group DNhdd, DNssd = {…, dni, …}, DNhdd = {…, dnj, …}; DNssd contains the nodes in the pairing result whose storage device identifier is SSD, and DNhdd contains those whose identifier is HDD.
Step S106: If the operation data of the currently submitted task exists on both a node of the SSD node group DNssd and a node of the HDD node group DNhdd, assign the task to any node of DNssd for execution. For example, if the pairing result is {t1, {dn1, dn2}}, it is refined into {t1, dn1} and {t1, dn2}; since dn1 belongs to DNssd and dn2 to DNhdd, the task is preferentially assigned to node dn1, whose storage device identifier is SSD.
If the task's operation data exists only on nodes of the SSD node group, the task is assigned to any node of the SSD node group for execution;
if the task's operation data exists only on nodes of the HDD node group, the task is assigned to any node of the HDD node group for execution.
Step S106 thus guarantees the principle of preferentially assigning tasks to nodes whose storage device is an SSD, placing tasks on SSD-equipped compute nodes as far as possible, fully exploiting the SSD's high-speed read/write and high-throughput characteristics, speeding up task execution, and improving the cluster's quality of service. If a task cannot be matched with an SSD node, the pairing result of step S103 is kept and used for allocation.
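Steps S104-S106 can be sketched together as one refinement function (an illustrative sketch; the function and variable names are assumptions):

```python
import random

def filter_resources(pairing, storage_feature, rng=random):
    """Refine a (task, nodes) pairing: group the paired nodes into
    DNssd and DNhdd by storage feature, then prefer an SSD node."""
    task, nodes = pairing
    dn_ssd = [n for n in nodes if storage_feature[n] == "ssd"]
    dn_hdd = [n for n in nodes if storage_feature[n] == "hdd"]
    if dn_ssd:                          # data on an SSD node (possibly on HDD too)
        return (task, rng.choice(dn_ssd))
    return (task, rng.choice(dn_hdd))   # data only on HDD nodes

feature = {"dn1": "ssd", "dn2": "hdd"}
print(filter_resources(("t1", ["dn1", "dn2"]), feature))  # ('t1', 'dn1')
```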
In summary, the Spark task allocation method provided by the first embodiment targets Spark on heterogeneous storage: on top of the data-location-based task assignment (task-node pairing), the allocation strategy is optimized with precise pairing, assigning tasks to SSD-equipped nodes as far as possible, thereby fully exploiting the SSD's high-speed read/write and high-throughput characteristics, speeding up task execution, and improving the cluster's quality of service.
As the second embodiment of the invention, as shown in Figure 2, the Spark task allocation system provided by the invention comprises a storage characteristic statistics module 10, a resource configuration module 20, a resource scheduling module 30, and a resource filtering module 40.
Storage characteristic statistics module 10: configured to obtain the node identifiers of all nodes in the cluster and the storage device feature identifier configured on each node, and to generate the correspondence between node identifiers and storage device feature identifiers; the storage device feature identifiers include an SSD identifier and an HDD identifier.
In this embodiment, the node identifiers of all nodes in the cluster are collected into a node set DNs (used to store and manage the valid compute nodes of the cluster), DNs = {dn1, dn2, dn3, …, dnm}, where dn1, dn2, dn3, …, dnm denote the node identifiers. Based on DNs, the storage device feature identifiers configured on each node are collected into a storage device set CHARs (used to store and manage the features of the storage devices configured on the corresponding nodes in DNs), for example CHARs = {ssd, hdd, hdd, …, ssd}, where ssd indicates that the corresponding node is configured with a solid-state drive (SSD) and hdd indicates a hard disk drive (HDD). Establishing the two variables DNs and CHARs realizes the correspondence between node identifiers and storage device feature identifiers. In this cluster, m compute nodes are deployed: node dn1 is configured with an SSD, nodes dn2 and dn3 with HDDs, and node dnm with an SSD.
The system provided by the invention first collects statistics on the cluster's nodes and their storage characteristics through module 10, which can be understood as a system initialization process; each time a task is executed in the cluster, modules 20, 30, and 40 perform the operations below.
Resource configuration module 20: configured to assign, in response to the resource request of the currently submitted task, nodes that satisfy the request to that task, completing the resource configuration. Module 20 performs the resource configuration process, which in this embodiment proceeds as follows:
First, all nodes in the cluster periodically send "heartbeat" data to the resource configuration module 20; the "heartbeat" data includes the node's currently idle resources (such as the number of available CPUs and the amount of memory). The resource configuration module 20 saves this information after receiving it.
The user/terminal (Driver) generates a resource request for the currently submitted task according to the current task and sends it to the resource configuration module 20 of the cluster manager (Master); the request includes the required number of CPUs, the memory size, and so on.
After obtaining the Driver's resource request, the resource configuration module 20 queries the nodes' idle-resource information and assigns to the Driver the nodes whose idle resources satisfy the request (including which nodes are allocated and how many CPUs and how much memory each node contributes), thereby completing the resource configuration.
Resource scheduling module 30: configured to pair the currently submitted task with the nodes contained in the configured resources based on the data-location selection method, generating a pairing result. The data-location selection method assigns a task to a node that stores the task's operation data (deploying the task and its operation data on the same node avoids remote data transfer); module 30 therefore pairs tasks with the nodes storing their operation data as far as possible. If the pairing succeeds, the corresponding pairing result is generated; if not, the currently submitted task is randomly assigned to any of the configured nodes and the corresponding pairing result is generated. For example, {t1, {dn1, dn2}} indicates that task t1 was successfully paired with nodes dn1 and dn2; tasks that cannot be paired are randomly assigned to different nodes.
Resource filtering module 40: configured to use the node identifiers of the nodes contained in the pairing result of the resource scheduling module 30 to look up the correspondence between node identifiers and storage device feature identifiers generated by the storage characteristic statistics module 10, obtaining the corresponding storage device feature identifiers.
The resource filtering module 40 is further configured to group the nodes contained in the pairing result into an SSD node group and an HDD node group according to the obtained storage device feature identifiers. In this embodiment, the nodes are grouped by SSD and HDD into an SSD node group DNssd and an HDD node group DNhdd, DNssd = {…, dni, …}, DNhdd = {…, dnj, …}; DNssd contains the nodes in the pairing result whose storage device identifier is SSD, and DNhdd contains those whose identifier is HDD.
The resource filtering module 40 is further configured to, if the operation data of the currently submitted task exists on both a node of DNssd and a node of DNhdd, assign the task to any node of DNssd for execution. For example, if the pairing result is {t1, {dn1, dn2}}, it is refined into {t1, dn1} and {t1, dn2}; since dn1 belongs to DNssd and dn2 to DNhdd, the task is preferentially assigned to node dn1, whose storage device identifier is SSD.
若所述任务的操作数据只存在于所述固态硬盘节点组的节点中,则将所述任务分配至所述固态硬盘节点组的任意一个节点上执行;
若所述任务的操作数据只存在于所述机械硬盘节点组的节点中,则将所述任务分配至所述机械硬盘节点组的任意一个节点上执行。
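The three filtering rules above can be sketched together: group the paired nodes into DNssd and DNhdd by their storage device characteristic identifier, then prefer an SSD node whenever one is available. The tie-breaking choice of the first node in each group is an assumption for illustration.

```python
# Correspondence produced by the storage characteristic statistics module
# (illustrative subset).
node_storage = {"dn1": "ssd", "dn2": "hdd"}

def filter_pairing(task, paired_nodes, node_storage):
    """Refine a pairing result, preferring SSD-equipped nodes."""
    dn_ssd = [n for n in paired_nodes if node_storage[n] == "ssd"]
    dn_hdd = [n for n in paired_nodes if node_storage[n] == "hdd"]
    if dn_ssd:                  # data on an SSD node (alone, or alongside HDD)
        return (task, dn_ssd[0])
    return (task, dn_hdd[0])    # data only on HDD nodes

print(filter_pairing("t1", ["dn1", "dn2"], node_storage))  # ('t1', 'dn1')
```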
Under the principle of preferentially assigning tasks to nodes whose storage device is an SSD, the resource filtering module 40 assigns tasks to SSD-equipped compute nodes to the greatest extent possible, thereby fully exploiting the SSD's high-speed read/write and high-throughput characteristics, accelerating task execution, and improving the quality of service of the cluster. If a task cannot be matched to an SSD node, allocation still follows the pairing result of the resource scheduling module 30.
In summary, the Spark task allocation system provided by the second embodiment of the present invention optimizes the task allocation strategy on the basis of the data-locality-based selection method for task allocation (pairing tasks with nodes), performs precise pairing oriented toward Spark with heterogeneous storage, and assigns tasks to SSD-equipped nodes to the greatest extent possible, thereby fully exploiting the SSD's high-speed read/write and high-throughput characteristics, accelerating task execution, and improving the quality of service of the cluster.
As a third embodiment of the present invention, FIG. 3 is a schematic diagram of a Spark task allocation system provided by the present invention, where Spark (Master) denotes the manager of the cluster, Application (Driver) denotes the user or terminal, Provisioner denotes the resource provisioning module, Scheduler denotes the resource scheduling module, Resource Filter denotes the resource filtering module, and Storage Monitor denotes the storage characteristic statistics module. In addition, in the computing field, HDFS denotes the cluster's distributed file system, which contains two types of nodes and runs in a manager-worker pattern: one NameNode (manager) and multiple DataNodes (workers). FIG. 3 illustrates the working principle of the heterogeneous-storage Spark task allocation framework; in this embodiment, the concrete implementation proceeds as follows:
The Storage Monitor module of the system first acquires the storage device characteristic information configured on each node of the cluster and generates the correspondence between node identifiers and storage device characteristic identifiers.
In addition, all nodes periodically send "heartbeats" to the Provisioner module, including the node's currently idle resources (number of available CPUs and amount of free memory). The Provisioner module stores this information upon receipt.
When a task needs to be executed in the system, the task is first allocated; the concrete allocation process is as follows:
First, the system completes the resource provisioning process:
The Driver sends a resource request to the Provisioner module of the Master; the request includes the required number of CPUs and amount of memory, among other things;
After receiving the Driver's resource request, the Provisioner module queries the nodes' idle-resource information and allocates compute nodes whose idle resources satisfy the request to the Driver, specifying which nodes are allocated and how many CPUs and how much memory each node contributes, thereby completing resource provisioning.
Next, the system completes the resource selection process:
The Provisioner module sends the provisioned resource information to the Scheduler and Resource Filter modules, which respectively perform data-locality-based pairing and heterogeneous-storage-oriented precise pairing;
The Scheduler module matches the provisioned resources with tasks based on the data-locality policy, pairing tasks with the nodes storing their operation data to the greatest extent possible. Tasks that cannot be matched are randomly assigned to different nodes. For example, {t1, {dn1, dn2}} indicates that task t1 was successfully matched with nodes dn1 and dn2;
After receiving the provisioned resource information, the Resource Filter module sends a characteristic query request to the Storage Monitor module to query the storage device characteristic identifiers of the nodes contained in the Scheduler's pairing result, groups the nodes into DNssd and DNhdd according to these identifiers, and determines that tasks are preferentially assigned to nodes in DNssd for execution.
The Scheduler receives the Resource Filter's allocation result and, according to it, dispatches each task to its corresponding node for execution. For example, the Resource Filter sends the allocation result {t1, dn1} to the Scheduler; the Scheduler module sends the matching result to the Driver; and after receiving the resource information, the Driver sends task t1 to the designated node dn1.
After receiving the task, the node executes it.
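The whole flow of this embodiment (Storage Monitor, Provisioner, Scheduler, Resource Filter, dispatch) can be walked through compactly as below. All data values are illustrative assumptions, and the per-step policies are simplified sketches of the control flow rather than the patented implementation.

```python
# Storage Monitor: node identifier -> storage device characteristic identifier.
node_storage = {"dn1": "ssd", "dn2": "hdd", "dn3": "hdd"}
# Provisioner's heartbeat store: currently idle CPUs per node.
heartbeats = {"dn1": {"cpus": 8}, "dn2": {"cpus": 8}, "dn3": {"cpus": 1}}
# Locations of each task's operation data (illustrative HDFS block placement).
data_locations = {"t1": {"dn1", "dn2"}}

def run_task(task, request_cpus):
    # Provisioner: grant nodes whose idle resources satisfy the request.
    granted = [n for n, f in heartbeats.items() if f["cpus"] >= request_cpus]
    # Scheduler: data-locality pairing within the granted nodes.
    local = [n for n in granted if n in data_locations.get(task, ())]
    candidates = local or granted
    # Resource Filter: prefer SSD-equipped candidates.
    ssd = [n for n in candidates if node_storage[n] == "ssd"]
    target = (ssd or candidates)[0]
    return target  # the Driver would now send the task to this node

print(run_task("t1", 4))  # dn1
```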
In summary, whereas existing Spark does not take into account the mixed heterogeneous SSD/HDD characteristics of the cluster, the system provided by the third embodiment of the present invention selects SSD nodes in a targeted manner: when a task's operation data is stored on both SSD nodes and HDD nodes, assigning the task to an SSD node fully exploits the SSD's high-speed read/write and high-throughput characteristics, accelerates task execution, improves the quality of service of the cluster, and greatly improves the execution efficiency of Spark applications.
It should be noted that, for ease of description, the foregoing method embodiments are each expressed as a series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the described order of actions, since according to the present invention certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the present invention.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the invention; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. A Spark task allocation method, characterized in that the method comprises:
    pairing a currently submitted task with nodes contained in provisioned resources using a data-locality-based selection method, and generating a pairing result;
    using node identifiers of the nodes contained in the pairing result to look up a generated correspondence between node identifiers and storage device characteristic identifiers, and obtaining the corresponding storage device characteristic identifiers;
    grouping the nodes contained in the pairing result into a solid-state drive node group and a hard disk drive node group according to the storage device characteristic identifiers;
    if the operation data of the task exists both on nodes of the solid-state drive node group and on nodes of the hard disk drive node group, assigning the task to any node of the solid-state drive node group for execution.
  2. The method according to claim 1, characterized in that the method further comprises:
    if the operation data of the task exists only on nodes of the solid-state drive node group, assigning the task to any node of the solid-state drive node group for execution;
    if the operation data of the task exists only on nodes of the hard disk drive node group, assigning the task to any node of the hard disk drive node group for execution.
  3. The method according to claim 1, characterized in that the method further comprises:
    acquiring node identifiers of all nodes in a cluster and the storage device characteristic identifier configured for each of the nodes, and generating the correspondence between node identifiers and storage device characteristic identifiers, wherein the storage device characteristic identifiers comprise a solid-state drive identifier and a hard disk drive identifier.
  4. The method according to claim 1, characterized in that, before the pairing of the currently submitted task with the nodes contained in the provisioned resources using the data-locality-based selection method and the generating of the pairing result, the method further comprises:
    in response to a resource request of the currently submitted task, allocating nodes satisfying the conditions of the resource request to the currently submitted task, thereby completing resource provisioning.
  5. The method according to claim 1, characterized in that the pairing of the task with the nodes contained in the provisioned resources using the data-locality-based selection method and the generating of the pairing result comprise:
    pairing, using the data-locality-based selection method, the task with the nodes in the provisioned resources that store the operation data of the task;
    if the pairing succeeds, generating the corresponding pairing result;
    if the pairing fails, randomly assigning the task to any node in the provisioned resources, and generating the corresponding pairing result.
  6. A Spark task allocation system, characterized in that the system comprises:
    a resource scheduling module, configured to pair a currently submitted task with nodes contained in provisioned resources using a data-locality-based selection method, and to generate a pairing result;
    a resource filtering module, configured to use node identifiers of the nodes contained in the pairing result to look up a generated correspondence between node identifiers and storage device characteristic identifiers, and to obtain the corresponding storage device characteristic identifiers;
    the resource filtering module being further configured to group the nodes contained in the pairing result into a solid-state drive node group and a hard disk drive node group according to the storage device characteristic identifiers;
    the resource filtering module being further configured to, if the operation data of the task exists both on nodes of the solid-state drive node group and on nodes of the hard disk drive node group, assign the task to any node of the solid-state drive node group for execution.
  7. The system according to claim 6, characterized in that the resource filtering module is further configured to:
    if the operation data of the task exists only on nodes of the solid-state drive node group, assign the task to any node of the solid-state drive node group for execution;
    if the operation data of the task exists only on nodes of the hard disk drive node group, assign the task to any node of the hard disk drive node group for execution.
  8. The system according to claim 6, characterized in that the system further comprises a storage characteristic statistics module:
    configured to acquire node identifiers of all nodes in a cluster and the storage device characteristic identifier configured for each of the nodes, and to generate the correspondence between node identifiers and storage device characteristic identifiers, wherein the storage device characteristic identifiers comprise a solid-state drive identifier and a hard disk drive identifier.
  9. The system according to claim 6, characterized in that the system further comprises a resource provisioning module:
    configured to, in response to a resource request of the currently submitted task, allocate nodes satisfying the conditions of the resource request to the currently submitted task, thereby completing resource provisioning.
  10. The system according to claim 6, characterized in that the resource scheduling module is specifically configured to:
    pair, using the data-locality-based selection method, the task with the nodes in the provisioned resources that store the operation data of the task;
    if the pairing succeeds, generate the corresponding pairing result;
    if the pairing fails, randomly assign the task to any node in the provisioned resources, and generate the corresponding pairing result.
PCT/CN2017/103877, filed 2017-09-28 — Spark task allocation method and system — published as WO2019061122A1


Citations (5)

* Cited by examiner, † Cited by third party

- US8949847B2 * (2012-01-31, Electronics And Telecommunications Research Institute): Apparatus and method for managing resources in cluster computing environment
- CN105740068A * (2016-01-27, Institute of Computing Technology, Chinese Academy of Sciences): Memory data locality-based scheduling method and system for big data platforms
- CN105939389A * (2016-06-29, LeEco Holdings (Beijing) Co., Ltd.): Load balancing method and apparatus
- CN106990915A * (2017-02-27, Beihang University): Storage resource management method based on storage medium type and weighted quota
- CN107153662A * (2016-03-04, Huawei Technologies Co., Ltd.): Data processing method and apparatus


