WO2020133245A1 - Cloud resource elastic scaling system for high performance computing and scheduling method

Publication number
WO2020133245A1
WO2020133245A1 PCT/CN2018/124970 CN2018124970W WO2020133245A1 WO 2020133245 A1 WO2020133245 A1 WO 2020133245A1 CN 2018124970 W CN2018124970 W CN 2018124970W WO 2020133245 A1 WO2020133245 A1 WO 2020133245A1
Authority
WO
WIPO (PCT)
Prior art keywords
resource
computing
node
cluster
task
Prior art date
Application number
PCT/CN2018/124970
Other languages
French (fr)
Chinese (zh)
Inventor
林帅康
刘阳
温书豪
马健
赖力鹏
Original Assignee
深圳晶泰科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳晶泰科技有限公司
Priority to PCT/CN2018/124970
Publication of WO2020133245A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units, using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt

Abstract

A cloud resource elastic scaling system for high performance computing, belonging to the technical field of high performance computing. The system comprises a resource expansion sub-system responsible for adding nodes to a cluster and a resource contraction sub-system responsible for deleting nodes from the computing cluster. A scheduling system accepts tasks submitted by an external user or system and distributes them to a waiting queue. The resource elastic scaling system scans the task waiting queue, applies a multi-factor expansion decision algorithm, and applies for bidding resources in a suitable region, so that the tasks finally run on the newly added computing nodes. Once the tasks have been distributed and computing nodes in the cluster gradually become idle, a contraction strategy of the resource elastic scaling system is triggered to reclaim and release the nodes. The system integrates the elastic scaling APIs of major public cloud providers to manage and control global resources, and it predicts an optimal resource usage mode through statistical learning on the running times of a large and continuously growing set of tasks of different types.

Description

Resource elastic scaling system for high-performance computing on the cloud and its scheduling method
Technical field
The invention belongs to the technical field of high-performance computing. It can be used in the computing clusters of a cloud computing platform as an elastic scaling management system for cluster resources.
Background
Elastic scaling of high-performance computing resources means that the resource scheduler dynamically adjusts the size of the resource pool according to the differing resource demands of the current computing tasks, so that each task can obtain the computing resources it needs to run.
On a public cloud, high-performance computing takes large-scale, compute-intensive tasks as its unit of work and distributes them to the cluster through an efficient job scheduling system. The elastic scaling system periodically scans the task queue, totals the resources the tasks require, and triggers an expansion so that the tasks can run on suitable nodes. After a task finishes, a node that stays idle for several consecutive cycles triggers a contraction, and the node is reclaimed and released to save cost. A computing node that fails its health check several times is likewise forcibly reclaimed and replaced with a new node. Through these mechanisms the elastic scaling system keeps the resource pool dynamically adjusted so that as many tasks as possible are scheduled and running.
Existing elastic scaling systems have the following main problems:
1. The elastic scaling system supports only a single computing-node configuration, which forces the task scheduling system to solve a complex bin-packing problem. A scaling group consists entirely of homogeneous computing nodes, yet different computing tasks need different numbers of CPU cores. For example, suppose the queue holds 8-core, 16-core, and 32-core tasks and every computing node has 32 cores. Because the number of tasks of each type differs, some 8-core or 16-core tasks eventually occupy a whole 32-core node on their own, which wastes a large amount of resources.
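The waste described in item 1 can be made concrete with a small calculation. The sketch below is only an illustration: the queue contents are invented, and first-fit-decreasing packing is an assumed stand-in, since the source does not name the packing algorithm the scheduler uses.

```python
NODE_CORES = 32  # homogeneous node size from the example in item 1


def pack_first_fit_decreasing(task_cores, node_cores=NODE_CORES):
    """Return the unused cores on each node after first-fit-decreasing packing."""
    free = []  # free[i] = cores still unused on node i
    for cores in sorted(task_cores, reverse=True):
        if cores > node_cores:
            raise ValueError(f"task needs {cores} cores, node has {node_cores}")
        for i, remaining in enumerate(free):
            if remaining >= cores:
                free[i] -= cores  # task fits on an already opened node
                break
        else:
            free.append(node_cores - cores)  # no node fits: open a new one
    return free


# Hypothetical queue: two 32-core tasks, one 16-core task, three 8-core tasks.
queue = [32, 32, 16, 8, 8, 8]
free_cores = pack_first_fit_decreasing(queue)
wasted = sum(free_cores)
print(f"nodes={len(free_cores)} wasted_cores={wasted}")
```

Even with a reasonable packing, the last 8-core task ends up alone on a 32-core node, which is the situation the item describes.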
2. The health check mechanism of the elastic scaling system is not suited to high-performance computing tasks with high CPU load. A health check usually runs as a background service on the computing node and periodically sends a heartbeat to the node controller to show that the node is healthy. High-performance computing tasks, however, perform large amounts of floating-point computation and easily drive the CPU to 100%. A CPU that busy cannot send heartbeats to the node controller in time, so the controller wrongly concludes that the node is unresponsive and triggers node reclamation. A non-interruptible task is killed by mistake and returned to the scheduling queue; on its next run the health check fails again and the task is killed again, wasting resources.
3. The computing nodes managed by an elastic scaling system are usually billed on demand. The bidding billing model that has emerged in recent years lets an enterprise obtain large amounts of elastic computing resources at a much lower cost than on-demand billing. Bidding resources are the spare computing capacity of a public cloud vendor, and their price can be as low as 10% of the on-demand price. The only difference between the two is that bidding resources may be interrupted and reclaimed when demand for on-demand resources surges, which makes them well suited to interruptible high-performance computing workloads. The price fluctuation and interruption rate of bidding resources depend on supply and demand in the current region. An elastic scaling system that manages bidding resources but cannot select a region dynamically according to supply and demand cannot find bidding resources with a lower price and a lower interruption rate.
4. The number of computing nodes added in a single expansion is usually derived from the total number of cores required by the task queue. If the queue holds 1000 32-core tasks and the resource pool has no idle resources, the elastic scaling system directly adds 1000 32-core computing nodes. Yet the running time of 32-core tasks varies widely with computational complexity: a high-complexity task may run for hours or days, while a low-complexity task may need only tens of minutes. After a task finishes, the elastic scaling system must still observe the node as idle for several consecutive scan cycles before reclaiming it. If each cycle is 5 minutes and reclamation is triggered after 2 consecutive idle cycles, 1000 computing nodes end up running idle for 10 minutes, wasting a large amount of resources. The bidding price in the currently selected region may also be relatively high, so the batch runs on expensive computing resources. Such large batches of 32-core tasks are usually not sensitive to turnaround time: as long as the tasks finish within the agreed time, the business is not affected. The causes of this one-off over-expansion are: (1) the decision conditions of the scaling system are too simplistic; (2) it is unaware of differences between task types and cannot predict task running time; (3) it is unaware of the priority and urgency of the current tasks; (4) it cannot perceive bidding price trends across resource regions and time periods.
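The idle time in item 4 can be expressed in core-hours. The following sketch simply restates the arithmetic of the example (1000 nodes, 32 cores each, 5-minute cycles, reclamation after 2 idle cycles); the function and parameter names are hypothetical, and core-hours is a unit chosen here for illustration.

```python
def idle_core_hours(nodes, cores_per_node, cycle_minutes, idle_cycles_before_reclaim):
    """Core-hours spent idle while nodes wait to be reclaimed."""
    idle_minutes = cycle_minutes * idle_cycles_before_reclaim
    return nodes * cores_per_node * idle_minutes / 60


wasted = idle_core_hours(
    nodes=1000,
    cores_per_node=32,
    cycle_minutes=5,
    idle_cycles_before_reclaim=2,
)
print(f"idle core-hours before reclamation: {wasted:.1f}")
```

Under these assumptions a single over-sized expansion leaves roughly 5,300 core-hours unused before the nodes are released.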
When the scheduling system stops distributing new tasks to the task queue, the cluster is running tasks with different CPU requirements, such as 4-core, 8-core, and 16-core tasks. At the start, the scheduling system uses an algorithm to optimize cluster bin packing so that different tasks fill each 32-core or 16-core computing node. Because task running times differ, however, if no new tasks are scheduled onto a node, a single task can end up occupying a whole 32-core node. The contraction system's periodic scan finds that a task is still running on the node and therefore does not trigger node reclamation, so cluster utilization keeps falling.
Technical problem
Technical solution
To address the above technical problems, the present invention provides a resource elastic scaling system for high-performance computing on the cloud and its scheduling method. The system supports multiple public cloud regions and multiple computing resource configurations; adapts node health checks to high-performance computing; adapts to the usage pattern of bidding instance resources; predicts task running time to avoid wasting resources by adding too many computing nodes; and dynamically adjusts the contraction mechanism to avoid waste caused by the bin-packing problem.
The specific technical solution is as follows:
The resource elastic scaling system for high-performance computing on the cloud comprises two subsystems: a resource expansion subsystem and a resource contraction subsystem. The resource expansion subsystem is responsible for adding nodes to the cluster, and the resource contraction subsystem is responsible for deleting nodes from the computing cluster.
The resource expansion subsystem includes three data collection modules:
a task running time statistics module, which collects statistics on different task types from the task database;
a bidding resource price monitoring and prediction module, which collects and monitors price trend data from the bidding resource pools of public cloud vendors;
a bidding instance interruption handling module, which collects and monitors bidding instance interruption data from the computing cluster in real time.
The resource contraction subsystem includes two data collection modules:
a computing node load monitoring module, which collects time-series data on node CPU utilization in real time;
a cluster node scanning module, which periodically scans the cluster to collect idle and health data.
The scheduling method of the resource elastic scaling system for high-performance computing on the cloud comprises the following steps: the scheduling system accepts tasks submitted by external users or systems and distributes them to a waiting queue; the resource elastic scaling system scans the task waiting queue, applies a multi-factor expansion decision algorithm, and applies for bidding resources in a suitable region, so that the tasks finally run on the newly added computing nodes; when the tasks have all been distributed and computing nodes in the cluster gradually become idle, the contraction strategy of the resource elastic scaling system is triggered and the nodes are reclaimed and released.
Specifically, the resource expansion subsystem decides to add nodes to the cluster on the basis of its three data collection modules, in the following steps:
S11: the task running time statistics module collects statistics on different task types from the task database. From the statistics on existing task data it predicts the running time required by the tasks in the current task queue; combined with the number of CPU cores each task requires, this gives the total number of cores required by all tasks in the waiting queue;
S12: the bidding resource price monitoring and prediction module collects and monitors price trend data from the bidding resource pools of public cloud vendors. From the historical fluctuation of bidding resource prices it predicts the price range of the resources in each region at different points in time;
S13: the bidding instance interruption handling module collects and monitors bidding instance interruption data from the computing cluster in real time. Using this module's real-time feedback on node interruption rates, the bidding resources in the most suitable region can be selected;
Finally, when the expansion subsystem detects waiting tasks in the task queue, it combines the resource data tables produced by the three modules above to apply, in a suitable region, for cost-effective bidding computing nodes with a low interruption rate that meet the tasks' computing requirements, and adds these nodes to the computing cluster.
The resource contraction subsystem decides to remove nodes from the cluster on the basis of its two data collection modules, in the following steps:
S14: the computing node load monitoring module collects time-series data on node CPU utilization in real time;
The computing node load monitoring module obtains the real-time CPU utilization of each computing node through the public cloud vendor's interface and writes it to the time-series database influxdb, so that external systems can obtain monitoring data for every computing node in the cluster directly through the influxdb interface.
S15: the cluster node scanning module periodically scans the cluster to collect idle and health data;
The cluster node scanning module periodically scans the whole cluster to promptly find idle nodes on which no task is running, and uses the health check mechanism to find unhealthy nodes. It stores the resulting data in the cluster node check table.
Further, for health checks of computing nodes in high-performance computing, monitoring of the node's CPU load is used as an auxiliary signal. When the CPU load reaches the 80% threshold, the check program adds the computing node to the contraction protection queue. When the task's computing load falls below 80%, health checks return to normal and the computing node is removed from the contraction protection queue. This avoids reclaiming a node by mistake because of a failed health check;
The contraction subsystem combines the data collected by its two data collection modules to make reclamation decisions, and deletes idle computing nodes from the cluster.
Beneficial effects
The resource elastic scaling system for high-performance computing on the cloud and its scheduling method provided by the present invention have the following technical effects:
(1) global resources are managed and controlled by integrating the elastic scaling APIs of the major public cloud vendors;
(2) a more flexible computing node health check mechanism is applied to high-performance computing tasks;
(3) the price and interruption rate of bidding resources at the major public cloud vendors are sensed dynamically;
(4) through statistical learning on the running times of a large and continuously growing set of tasks of different types, the resource scaling system can predict the best way to use resources.
Brief description of the drawings
FIG. 1 is a system structure diagram of the resource elastic scaling system of the present invention;
FIG. 2 is a data collection diagram of the resource expansion subsystem of the resource elastic scaling system of the present invention;
FIG. 3 is a data collection diagram of the resource contraction subsystem of the resource elastic scaling system of the present invention;
FIG. 4 is a flow chart of the scheduling method of the resource elastic scaling system of the present invention;
FIG. 5 is a schematic diagram of an implementation of the present invention.
Best mode of the invention
Embodiments of the invention
The specific technical solution of the present invention is described with reference to an embodiment.
As shown in FIG. 1, the resource elastic scaling system provided by this embodiment of the present invention comprises two subsystems: a resource expansion subsystem and a resource contraction subsystem. The resource expansion subsystem is responsible for adding nodes to the cluster, and the resource contraction subsystem is responsible for deleting nodes from the computing cluster.
The resource expansion subsystem decides to add nodes to the cluster on the basis of three data collection modules. As shown in FIG. 2, the three modules are:
S11: the task running time statistics module collects statistics on different task types from the task database;
S12: the bidding resource price monitoring and prediction module collects and monitors price trend data from the bidding resource pools of public cloud vendors;
S13: the bidding instance interruption handling module collects and monitors bidding instance interruption data from the computing cluster in real time.
First, in the task running time statistics module of step S11, a task has the following attributes:
Task name | Task category | CPU requirement | Estimated duration | Total tasks
From the statistics on existing task data, the module predicts the running time required by the tasks in the current task queue; combined with the number of CPU cores each task requires, this gives the total number of cores required by all tasks in the waiting queue.
Task name | Task category | CPU requirement (cores) | Estimated duration (hours) | Total tasks
A | X | 8 | 0.5 | 1000
B | Y | 16 | 3.0 | 500
C | Z | 32 | 12.0 | 300
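The totals that step S11 derives from this table can be sketched as follows. The rows are taken from the table above; the class and function names are hypothetical, and the core-hour total is an additional metric added here for illustration rather than one the source computes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskClass:
    name: str
    category: str
    cpu_cores: int    # cores required by one task
    est_hours: float  # estimated running time of one task
    count: int        # tasks of this class waiting in the queue


QUEUE = [
    TaskClass("A", "X", 8, 0.5, 1000),
    TaskClass("B", "Y", 16, 3.0, 500),
    TaskClass("C", "Z", 32, 12.0, 300),
]


def total_cores(queue):
    """Cores needed if every waiting task ran at the same time."""
    return sum(t.cpu_cores * t.count for t in queue)


def total_core_hours(queue):
    """Total work in core-hours, weighting each task by its estimated duration."""
    return sum(t.cpu_cores * t.est_hours * t.count for t in queue)


print(total_cores(QUEUE), total_core_hours(QUEUE))
```

Comparing the two figures shows why duration matters: category A contributes almost a third of the requested cores but only a small share of the core-hours, so sizing an expansion on core count alone would overprovision for it.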
Second, in the bidding resource price monitoring and prediction module of step S12, a bidding resource has the following attributes:
Bidding region | Bidding instance type | Bidding instance unit price | Bidding instance interruption rate
From the historical fluctuation of bidding resource prices, the module predicts the price range of the resources in each region at different points in time. Combined with the real-time feedback on node interruption rates from the bidding instance interruption handling module of step S13, the bidding resources in the most suitable region can be selected.
Bidding region | Bidding instance type | Bidding instance unit price (yuan) | Bidding instance interruption rate
AWS Zone A | A1 | 1.6 | 10%
Tencent Cloud Zone B | B1 | 2.4 | 15%
Huawei Cloud Zone C | C1 | 1.8 | 20%
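One way to combine price and interruption rate into a single ranking is sketched below. The source does not disclose its selection formula, so the score used here (unit price divided by the fraction of capacity expected to survive) is purely an assumed heuristic, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    region: str
    instance_type: str
    unit_price: float         # yuan, as listed in the table
    interruption_rate: float  # fraction between 0 and 1


def effective_price(offer):
    """Assumed heuristic: inflate the price by the expected loss to interruptions."""
    if not 0 <= offer.interruption_rate < 1:
        raise ValueError("interruption_rate must be in [0, 1)")
    return offer.unit_price / (1 - offer.interruption_rate)


OFFERS = [
    Offer("AWS Zone A", "A1", 1.6, 0.10),
    Offer("Tencent Cloud Zone B", "B1", 2.4, 0.15),
    Offer("Huawei Cloud Zone C", "C1", 1.8, 0.20),
]

best = min(OFFERS, key=effective_price)
print(best.region, round(effective_price(best), 3))
```

With the table's figures the heuristic picks the AWS row, which has both the lowest price and the lowest interruption rate; a real system would also have to check that the instance type satisfies the tasks' core requirements.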
Finally, when the expansion subsystem detects waiting tasks in the task queue, it combines the resource data tables produced by the three modules above to apply, in a suitable region, for cost-effective bidding computing nodes with a low interruption rate that meet the tasks' computing requirements, and adds these nodes to the computing cluster.
The resource contraction subsystem decides to remove nodes from the cluster on the basis of two data collection modules. As shown in FIG. 3, the two modules are:
S14: the computing node load monitoring module collects time-series data on node CPU utilization in real time;
S15: the cluster node scanning module periodically scans the cluster to collect idle and health data.
First, in S14 the computing node load monitoring module obtains the real-time CPU utilization of each computing node through the public cloud vendor's interface and writes it to the time-series database influxdb, so that external systems can obtain monitoring data for every computing node in the cluster directly through the influxdb interface.
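As an illustration of what one CPU sample might look like when written to influxdb, the sketch below formats a point in InfluxDB line protocol (measurement, tags, fields, timestamp). It does not call any client library; the measurement name `node_cpu`, the tag names, and the field name are assumptions, not taken from the source.

```python
def _escape_tag(value):
    """Escape the characters that line protocol treats specially in tag values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(node_id, region, cpu_percent, timestamp_ns):
    """Format one CPU utilization sample as a line-protocol point."""
    tags = f"node={_escape_tag(node_id)},region={_escape_tag(region)}"
    fields = f"cpu_percent={float(cpu_percent)}"  # no 'i' suffix, so stored as float
    return f"node_cpu,{tags} {fields} {int(timestamp_ns)}"


line = to_line_protocol("node-01", "AWS Zone A", 87.5, 1700000000000000000)
print(line)
```

Tagging each point with the node and region keeps per-node queries cheap, which is what the contraction subsystem needs when it looks up the recent load of a single node.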
Second, in S15 the cluster node scanning module periodically scans the whole cluster to promptly find idle nodes on which no task is running, and uses the health check mechanism to find unhealthy nodes. It stores the resulting data in the cluster node check table.
Bidding region | Bidding instance type | Idle | Healthy
AWS Zone A | A1 | TRUE | TRUE
Tencent Cloud Zone B | B1 | FALSE | FALSE
Huawei Cloud Zone C | C1 | FALSE | TRUE
At the same time, for health checks of computing nodes in high-performance computing, this method uses monitoring of the node's CPU load as an auxiliary signal. When the CPU load reaches the 80% threshold, the check program adds the computing node to the contraction protection queue. When the CPU load reaches 100%, the health check program may well be unable to keep sending heartbeats, which would normally trigger contraction; because contraction protection was set in advance, the node is not killed by mistake during this period. When the task's computing load falls below 80%, health checks return to normal and the computing node is removed from the contraction protection queue. This avoids reclaiming a node by mistake because of a failed health check.
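The protection rule can be sketched as a small piece of state keyed by node. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, and the "queue" is modeled as a set because the text only needs membership tests.

```python
CPU_PROTECT_THRESHOLD = 80.0  # percent, from the description above


class ContractionProtection:
    """Tracks which nodes are shielded from reclamation on a failed health check."""

    def __init__(self, threshold=CPU_PROTECT_THRESHOLD):
        self.threshold = threshold
        self.protected = set()

    def observe(self, node_id, cpu_percent):
        """Update protection state from the latest CPU load sample."""
        if cpu_percent >= self.threshold:
            self.protected.add(node_id)      # busy node: protect it
        else:
            self.protected.discard(node_id)  # load dropped: normal checks resume

    def may_reclaim_on_failed_health_check(self, node_id):
        return node_id not in self.protected


guard = ContractionProtection()
guard.observe("n1", 85.0)   # crosses the threshold, becomes protected
guard.observe("n1", 100.0)  # heartbeats may be missed here, node stays protected
blocked = not guard.may_reclaim_on_failed_health_check("n1")
guard.observe("n1", 40.0)   # task finished, protection lifted
released = guard.may_reclaim_on_failed_health_check("n1")
print(blocked, released)
```

Because protection is granted at 80%, before the CPU saturates, the node is already shielded by the time heartbeats start to be missed.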
Finally, the contraction subsystem combines the data collected by the two modules above to make reclamation decisions, and deletes idle computing nodes from the cluster.
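A reclamation decision that combines the node check table with the contraction protection queue might look like the sketch below. The source does not spell out the decision rule, so the rule here (reclaim idle nodes, and reclaim unhealthy nodes only when they are not protected) is an assumption; node identifiers reuse the instance types from the table for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStatus:
    node_id: str
    idle: bool
    healthy: bool


def decide(node, protected_ids):
    """Return 'reclaim' or 'keep' for one row of the node check table (assumed rule)."""
    if node.idle:
        return "reclaim"  # no task running: release to save cost
    if not node.healthy and node.node_id not in protected_ids:
        return "reclaim"  # failed health check and not shielded by high CPU load
    return "keep"


TABLE = [
    NodeStatus("A1", idle=True, healthy=True),
    NodeStatus("B1", idle=False, healthy=False),
    NodeStatus("C1", idle=False, healthy=True),
]

# Hypothetical state: B1 is under heavy load and therefore protected.
decisions = {n.node_id: decide(n, protected_ids={"B1"}) for n in TABLE}
print(decisions)
```

In this example B1 fails its health check but is kept because it is in the protection set; with an empty protection set the same row would be reclaimed.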
The elastic resource scaling system uses each module to collect and aggregate the relevant data, which supports its decisions on resource expansion and resource contraction. The overall flow is shown in FIG. 4. The scheduling system accepts tasks submitted by external users or systems and distributes them to a waiting queue. The resource elastic scaling system scans the task waiting queue, applies a multi-factor expansion decision algorithm, and applies for bidding resources in a suitable region, so that the tasks finally run on the newly added computing nodes. When the tasks have all been distributed and computing nodes in the cluster gradually become idle, the contraction strategy of the resource elastic scaling system is triggered and the nodes are reclaimed and released.
With this method, an efficient elastic scaling system can be built on any of the major public cloud vendors, such as AWS, Tencent Cloud, Huawei Cloud, and Google Cloud. It runs once a host has been requested on the cloud, the corresponding resource operation permissions have been attached, and the interfaces for querying tasks in the scheduling system have been provided, as shown in FIG. 5. After an operation node submits tasks to the scheduling system, the elastic scaling system automatically adds suitable bidding nodes and applies the node reclamation strategy once the tasks are complete.

Claims (7)

  1. 面向高性能计算在云上的资源弹性伸缩系统,其特征在于,包括两个子系统:资源扩容子系统与资源缩容子系统;所述的资源扩容子系统负责向集群内添加节点,所述的资源缩容子系统负责从计算集群中删除节点。A resource-elastic scaling system for high-performance computing on the cloud, which is characterized by including two subsystems: a resource expansion subsystem and a resource scaling subsystem; the resource expansion subsystem is responsible for adding nodes to the cluster, and the resources The scaling-down subsystem is responsible for removing nodes from the computing cluster.
  2. 根据权利要求1所述的面向高性能计算在云上的资源弹性伸缩系统,其特征在于,所述的资源扩容子系统包括三个数据采集模块,分别是:The resource-elastic scaling system for high-performance computing on the cloud according to claim 1, wherein the resource expansion subsystem includes three data collection modules, which are:
    任务运行时间统计模块,从任务数据库中采集统计不同任务类型的数据;The task running time statistics module collects statistics on different task types from the task database;
    竞价资源价格监控预测模块,从公有云厂商的竞价资源池中采集及监控价格趋势数据;The auction resource price monitoring and forecasting module collects and monitors price trend data from the bidding resource pool of public cloud vendors;
    竞价实例中断处理模块,从计算集群中实时采集及监控竞价实例中断数据。The auction instance interrupt processing module collects and monitors auction instance interrupt data in real time from the computing cluster.
  3. 根据权利要求1所述的面向高性能计算在云上的资源弹性伸缩系统,其特征在于,所述的资源缩容子系统包括两个数据采集群模块,分别是:The resource elastic scaling system for high-performance computing on the cloud according to claim 1, wherein the resource scaling subsystem includes two data collection group modules, which are:
    计算节点负载监控模块,实时采集节点的CPU使用率时序数据;Compute node load monitoring module, real-time collection of node CPU utilization rate time series data;
    集群节点扫描模块,周期性扫描采集集群节点空闲及节点健康数据。The cluster node scanning module periodically scans to collect cluster node idle and node health data.
  4. 根据权利要求1到3任一项所述的面向高性能计算在云上的资源弹性伸缩系统的调度方法,其特征在于,包括以下步骤:调度系统接受外部用户或系统提交的任务,并分发到等待队列,资源弹性伸缩系统扫描任务等待队列,结合多方面的扩容决策算法,在合适的区域内申请竞价资源,任务最终在新添加的计算节点上运行起来;当任务被分发完毕,集群中有计算节点慢慢空闲下来时,触发资源弹性伸缩系统的缩容策略,对节点进行回收释放。The scheduling method for a resource-elastic scaling system for high-performance computing on the cloud according to any one of claims 1 to 3, characterized in that it includes the following steps: the scheduling system accepts tasks submitted by external users or the system and distributes them to The waiting queue, the resource elastic scaling system scans the task waiting queue, combines multiple expansion decision algorithms, applies for bidding resources in the appropriate area, and the task finally runs on the newly added computing node; when the task is distributed, the cluster has When the computing node slowly idles, it triggers the scaling strategy of the resource elastic scaling system to recover and release the node.
  5. The scheduling method for the cloud resource elastic scaling system for high performance computing according to claim 4, wherein the resource expansion subsystem adds nodes to the cluster based on its three data collection modules, comprising the following steps:
    S11, the task running time statistics module collects statistics on different task types from the task database; from the existing task data it predicts the running time required by the tasks in the current task queue and, combined with the number of CPU cores each task requires, calculates the total number of cores required by all tasks in the waiting queue;
    S12, the bidding resource price monitoring and prediction module collects and monitors price trend data from the bidding resource pools of public cloud vendors; from the historical fluctuation of bidding resource prices it predicts the price fluctuation range of resources in each region at different points in time;
    S13, the bidding instance interruption handling module collects and monitors bidding instance interruption data from the computing cluster in real time; using this module's real-time feedback on the computing node interruption rate, the bidding resources in the most suitable region can be selected;
    finally, when the elastic expansion subsystem detects waiting tasks in the task queue, it combines the resource data tables produced by the above three modules to apply, in a suitable region, for bidding computing node resources that meet the computing requirements of the tasks at a good price-performance ratio and a low interruption rate, and adds those nodes to the computing cluster.
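The expansion decision in steps S11 to S13 can be illustrated with a minimal Python sketch. The claim does not give formulas, so everything here is an assumption made for illustration: runtime is predicted as the mean of past runtimes per task type, demand is expressed both as total cores and as core-hours, and the region is chosen as the cheapest predicted price among regions whose interruption rate is below a caller-supplied limit. The names `Task`, `RegionOffer`, `estimated_core_hours` and `choose_region` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Task:
    task_type: str
    cores: int  # CPU cores requested by the task


@dataclass
class RegionOffer:
    region: str
    predicted_price: float    # assumed unit: price per core-hour (output of S12)
    interruption_rate: float  # assumed: fraction of instances interrupted (output of S13)


def predict_runtime_hours(history, task_type):
    """S11 (assumed model): mean of historical runtimes for this task type."""
    return mean(history[task_type])


def total_cores(queue):
    """S11: total number of cores requested by all waiting tasks."""
    return sum(task.cores for task in queue)


def estimated_core_hours(queue, history):
    """Assumed way of combining predicted runtime with requested cores."""
    return sum(
        task.cores * predict_runtime_hours(history, task.task_type)
        for task in queue
    )


def choose_region(offers, max_interruption_rate):
    """Drop regions that are interrupted too often, then take the cheapest.

    Returns None when no region satisfies the interruption limit.
    """
    eligible = [o for o in offers if o.interruption_rate <= max_interruption_rate]
    if not eligible:
        return None
    return min(eligible, key=lambda o: o.predicted_price)
```

A caller would scan the waiting queue, compute `total_cores(queue)` to size the request, and pass the per-region predictions to `choose_region` before applying for nodes through the cloud vendor's API. A real system would likely weigh price against interruption rate rather than apply a hard cutoff; the cutoff is used here only to keep the sketch short.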
  6. The scheduling method for the cloud resource elastic scaling system for high performance computing according to claim 4, wherein the resource contraction subsystem removes nodes from the cluster based on its two data collection modules, comprising the following steps:
    S14, the computing node load monitoring module collects time-series data on node CPU utilization in real time;
    the computing node load monitoring module obtains the real-time CPU utilization of each computing node through the public cloud vendor's interface and writes the data to the time-series database InfluxDB, so that external consumers can obtain monitoring data for all computing nodes in the cluster directly through the InfluxDB interface;
    S15, the cluster node scanning module periodically scans the cluster to collect node idleness and node health data;
    the cluster node scanning module periodically scans the entire cluster to promptly find idle nodes on which no task is running, and at the same time identifies unhealthy nodes through a health check mechanism; the resulting data is stored in a cluster node detection table.
  7. The scheduling method for the cloud resource elastic scaling system for high performance computing according to claim 4, further comprising: for health checks of computing nodes in high performance computing, monitoring the CPU load of each computing node to assist the contraction strategy; when the CPU load reaches the 80% threshold, the detection program adds the computing node to a contraction protection queue; when the task computing load falls below 80%, the health check returns to normal and the computing node is removed from the contraction protection queue, so as to prevent nodes from being reclaimed by mistake because of failed health checks;
    the elastic contraction subsystem combines the data collected by its two data collection modules to make reclamation decisions, thereby removing idle computing nodes from the cluster.
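The contraction logic of steps S14 and S15 and the protection queue can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the 80% threshold comes from the claim, while the `NodeStatus` fields, the function names, and the rule that an unprotected node failing its health check is also released are assumptions introduced here.

```python
from dataclasses import dataclass

CPU_PROTECT_THRESHOLD = 80.0  # percent; the threshold stated in the claim


@dataclass
class NodeStatus:
    node_id: str
    cpu_percent: float     # latest sample from the load monitoring module (S14)
    running_tasks: int     # from the periodic cluster scan (S15)
    health_check_ok: bool  # from the periodic cluster scan (S15)


def update_protection_queue(protected, nodes):
    """Return the updated set of protected node IDs.

    A node at or above the CPU threshold is protected from contraction;
    once its load falls below the threshold it is removed again.
    """
    updated = set(protected)
    for node in nodes:
        if node.cpu_percent >= CPU_PROTECT_THRESHOLD:
            updated.add(node.node_id)
        else:
            updated.discard(node.node_id)
    return updated


def nodes_to_release(nodes, protected):
    """Select nodes to reclaim: idle or unhealthy, and not protected.

    Releasing unhealthy unprotected nodes is an assumption of this sketch.
    """
    return [
        node.node_id
        for node in nodes
        if node.node_id not in protected
        and (node.running_tasks == 0 or not node.health_check_ok)
    ]
```

The point of the protection queue is visible in the case of a node that is busy enough to fail its health check: because its CPU load is at or above 80%, it is placed in the queue and skipped by `nodes_to_release`, so a heavily loaded node is not mistaken for a broken one.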
PCT/CN2018/124970 2018-12-28 2018-12-28 Cloud resource elastic scaling system for high performance computing and scheduling method WO2020133245A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/124970 WO2020133245A1 (en) 2018-12-28 2018-12-28 Cloud resource elastic scaling system for high performance computing and scheduling method

Publications (1)

Publication Number Publication Date
WO2020133245A1 true WO2020133245A1 (en) 2020-07-02

Family

ID=71125974

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124970 WO2020133245A1 (en) 2018-12-28 2018-12-28 Cloud resource elastic scaling system for high performance computing and scheduling method

Country Status (1)

Country Link
WO (1) WO2020133245A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8966084B2 (en) * 2011-06-17 2015-02-24 International Business Machines Corporation Virtual machine load balancing
CN107733676A (en) * 2016-08-12 2018-02-23 中国移动通信集团浙江有限公司 A kind of method and system of flexible scheduling resource
CN109034879A (en) * 2018-07-06 2018-12-18 东华大学 A kind of cloud computing based on k neighbour's regression algorithm is bidded example price expectation method
CN109032805A (en) * 2018-08-06 2018-12-18 深圳乐信软件技术有限公司 A kind of scalable appearance method, apparatus of elasticity, server and storage medium

Similar Documents

Publication Publication Date Title
CN107734035B (en) Virtual cluster automatic scaling method in cloud computing environment
CN109213555B (en) Resource dynamic scheduling method for virtual desktop cloud
CN102770845B (en) Optimization of archive management scheduling
CN109766175A (en) Resource elastic telescopic system and its dispatching method towards high-performance calculation on cloud
US8656404B2 (en) Statistical packing of resource requirements in data centers
WO2021103790A1 (en) Container scheduling method and apparatus, and non-volatile computer-readable storage medium
CN102843418B (en) A kind of resource scheduling system
US8788864B2 (en) Coordinated approach between middleware application and sub-systems
US20110010222A1 (en) Point-in-time based energy saving recommendations
CN104142860A (en) Resource adjusting method and device of application service system
CN107851039A (en) System and method for resource management
TWI725744B (en) Method for establishing system resource prediction and resource management model through multi-layer correlations
CN103684916A (en) Method and system for intelligent monitoring and analyzing under cloud computing
CN105868004B (en) Scheduling method and scheduling device of service system based on cloud computing
CN106230986A (en) The resource adaptation dispatching patcher of a kind of electrically-based PaaS cloud platform and method
CN110727508A (en) Task scheduling system and scheduling method
WO2023125637A1 (en) Charging control method and system, electronic device, and computer-readable storage medium
Guo et al. Energy-efficient fault-tolerant scheduling algorithm for real-time tasks in cloud-based 5G networks
CN103763373A (en) Method for dispatching based on cloud computing and dispatcher
CN107203256A (en) Energy-conservation distribution method and device under a kind of network function virtualization scene
WO2020133245A1 (en) Cloud resource elastic scaling system for high performance computing and scheduling method
CN104735134B (en) A kind of method and apparatus serviced for providing calculating
JP7452668B2 (en) Task management device, task management method, and task management program
CN103973811A (en) High-availability cluster management method capable of conducting dynamic migration
Cai et al. SLO-aware colocation: Harvesting transient resources from latency-critical services

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18944487

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18944487

Country of ref document: EP

Kind code of ref document: A1