CN107315636B - Resource availability early warning method and device - Google Patents

Resource availability early warning method and device

Info

Publication number
CN107315636B
Authority
CN
China
Prior art keywords
resource
resource usage
time period
early warning
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610265261.5A
Other languages
Chinese (zh)
Other versions
CN107315636A (en)
Inventor
李湛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Hebei Co Ltd
Original Assignee
China Mobile Group Hebei Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Hebei Co Ltd
Priority to CN201610265261.5A
Publication of CN107315636A
Application granted
Publication of CN107315636B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/50: Indexing scheme relating to G06F9/50
    • G06F2209/5021: Priority
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/50: Indexing scheme relating to G06F9/50
    • G06F2209/504: Resource capping

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a resource availability early warning method comprising the following steps: establishing a resource usage estimation model; estimating the resource usage of the next time period by means of the resource usage estimation model; and issuing a resource availability early warning when the estimated resource usage of the next time period exceeds a preset threshold. The invention also provides a resource availability early warning device.

Description

Method and device for early warning of resource availability

Technical field

The invention relates to the technical field of business support, and in particular to a method and device for early warning of Hadoop resource availability.

Background

Hadoop is one of the mainstream software packages for current big data platforms. It provides a basic framework consisting of distributed mass data storage (HDFS, Hadoop Distributed File System), distributed large-scale computing (MapReduce), and a general resource management system (YARN, Yet Another Resource Negotiator). It offers high fault tolerance, ease of use, and scalability, and is widely used in data mining, online analytical processing (OLAP), business analysis, and similar work. It can be used to discover potential customer groups, support market segmentation and customer relationship management, and forecast future market trends, providing decision support for business leaders and helping to realize value from data. Hadoop is now widely applied in the Internet, communications, finance, and many other fields.

The existing YARN-based Hadoop architecture is shown in Figure 1. It mainly includes a series of modules such as the global resource manager (RM, ResourceManager), the application master (AM, ApplicationMaster), the node manager (NM, NodeManager), and the container (Container).

The overall execution flow of Hadoop based on the YARN framework is as follows:

Step 1: The user submits an application such as MapReduce through the client (JobClient) and applies to the RM for resources;

Step 2: After accepting the request, the global applications manager (ASM, ApplicationsManager) and the resource scheduler (RS, ResourceScheduler) in the RM allocate the first Container to the application, locate the corresponding NM, communicate with it, and issue the command to start the AM in that Container;

Step 3: The AM registers itself with the RM and then, over the remote procedure call (RPC) protocol, polls to apply for resources for each task, mainly CPU, memory, and so on;

Step 4: Once the AM has obtained the resources, it communicates with the NM, and the NM starts the tasks to be executed;

Step 5: Each task reports its current status to the AM over the RPC protocol. The AM monitors the running status of all tasks; when it finds that a task has failed, it re-applies for resources and restarts the task;

Step 6: When the application has finished executing, the AM deregisters from the RM and shuts itself down, and the related resources are reclaimed and released.

Although Hadoop's existing YARN architecture provides good support for resource management and for task scheduling and monitoring, it can currently only monitor task status. It cannot warn users in advance when resources are running short, so resources cannot be adjusted ahead of time. A task whose progress is close to completion may therefore hit a severe resource shortage, in which case it may have to be restarted and have resources reallocated, wasting both time and resources.

Summary of the invention

In view of this, embodiments of the present invention aim to provide a resource availability early warning method and device that can effectively reduce resource waste and improve resource utilization.

To achieve the above object, the technical solution of the present invention is implemented as follows:

An embodiment of the present invention provides a resource availability early warning method, the method comprising:

establishing a resource usage estimation model;

estimating the resource usage of the next time period by means of the resource usage estimation model;

issuing a resource availability early warning when the estimated resource usage of the next time period exceeds a preset threshold.

In the above solution, establishing the resource usage estimation model comprises:

determining the consumed-resource increment corresponding to the task progress increment in the next time period, according to the progress of all parallel tasks and the increment of consumed resources in each time period, together with the regularities and correlations of the time period;

comparing, multiple times, the determined consumed-resource increment for the next time period with the consumed-resource increment corresponding to the actual task progress increment, and dynamically adjusting the weight to reduce the error;

selecting the weight corresponding to the minimum error to establish the resource usage estimation model.

In the above solution, estimating the resource usage of the next time period by means of the resource usage estimation model includes, but is not limited to: estimating the CPU resource usage and the memory resource usage of the next time period by means of the resource usage estimation model.

In the above solution, the estimated resource usage of the next time period exceeding the first threshold comprises: the estimated resource usage of the next time period exceeding the threshold on the remaining available resources.

In the above solution, issuing a resource availability early warning when the estimated resource usage of the next time period exceeds the preset threshold comprises: when the estimated resource usage of the next time period exceeds the preset threshold, issuing resource availability early warnings of different levels according to the priority, importance, and dependencies of the tasks.

An embodiment of the present invention also provides a resource availability early warning device, the device comprising a model establishment module, a resource estimation module, and a resource early warning module, wherein:

the model establishment module is configured to establish a resource usage estimation model;

the resource estimation module is configured to estimate the resource usage of the next time period by means of the resource usage estimation model;

the resource early warning module is configured to issue a resource availability early warning when the estimated resource usage of the next time period exceeds a preset threshold.

In the above solution, the model establishment module is specifically configured to:

determine the consumed-resource increment corresponding to the task progress increment in the next time period, according to the progress of all parallel tasks and the increment of consumed resources in each time period, together with the regularities and correlations of the time period;

compare, multiple times, the determined consumed-resource increment for the next time period with the consumed-resource increment corresponding to the actual task progress increment, and dynamically adjust the weight to reduce the error;

select the weight corresponding to the minimum error to establish the resource usage estimation model.

In the above solution, the resource estimation module is specifically configured to estimate the CPU resource usage and the memory resource usage of the next time period by means of the resource usage estimation model.

In the above solution, the resource early warning module determining that the estimated resource usage of the next time period exceeds the first threshold comprises: the resource early warning module determining that the estimated resource usage of the next time period exceeds the threshold on the remaining available resources.

In the above solution, the resource early warning module is specifically configured to: when the estimated resource usage of the next time period exceeds the preset threshold, issue resource availability early warnings of different levels according to the priority, importance, and dependencies of the tasks.

The resource availability early warning method and device provided by the embodiments of the present invention first establish a resource usage estimation model, then use that model to estimate the resource usage of the next time period, and issue a resource availability early warning when the estimated resource usage of the next time period exceeds a preset threshold. This avoids large numbers of task failures caused by insufficient available resources, remedies the shortcomings of the original operating mechanism, and effectively reduces resource waste and improves resource utilization.

Description of the drawings

Figure 1 is a diagram of the YARN-based Hadoop architecture;

Figure 2 is a schematic flowchart of the resource availability early warning method according to an embodiment of the present invention;

Figure 3 is a schematic diagram of the principle of the resource availability early warning method according to an embodiment of the present invention;

Figure 4 is a schematic structural diagram of the resource availability early warning device according to an embodiment of the present invention;

Figure 5 is a schematic structural diagram of the resource availability early warning system according to an embodiment of the present invention;

Figure 6 is a schematic workflow diagram of the resource availability early warning system according to an embodiment of the present invention.

Detailed description

In the embodiments of the present invention, a resource usage estimation model is first established; the model is then used to estimate the resource usage of the next time period; and a resource availability early warning is issued when the estimated resource usage of the next time period exceeds a preset threshold.

Under normal circumstances, Hadoop cannot run properly without sufficient CPU, memory, and other resources, and the availability of each resource affects Hadoop task execution differently. For example, a shortage of CPU resources may not directly cause tasks to fail, but it markedly reduces overall running speed and processing efficiency, leading to a large backlog of queued tasks. A shortage of memory is often fatal: memory overflows occur, task execution is interrupted, and many tasks fail. It is therefore essential to monitor the availability of Hadoop resources in real time and to issue a warning before availability becomes low.

An embodiment of the present invention provides a Hadoop resource availability early warning method. First, a dynamic feedback learning model based on time series is established. From the progress of all parallel tasks and the increments of consumed resources within each time series increment, the model learns, according to preset rules, the underlying regularities and correlations in the current time period and produces the consumed-resource increment corresponding to the task progress increment in the next time period. This prediction is then compared with the actual task progress and consumed-resource increments, and the weights are adjusted dynamically and repeatedly to reduce the error and optimize the model; from the resulting series of weights, the optimal weight with the smallest error within the current time series is taken to form the model. On the basis of this model, the resources required to complete the remaining tasks are estimated. If the required resources exceed the threshold on the remaining available resources, warnings of different levels are issued according to the priority, importance, and dependencies of the tasks, indicating that low resource availability may cause a large number of tasks to fail. The warnings provide a reference for later automatic adjustment of container resource values and a basis for users configuring resource values manually. This avoids large numbers of task failures caused by insufficient available resources, remedies the shortcomings of the original operating mechanism, and effectively reduces resource waste and improves resource utilization.

The implementation of the technical solution of the present invention is described in further detail below with reference to the drawings and specific embodiments. Figure 2 is a schematic flowchart of the resource availability early warning method according to Embodiment 1 of the present invention. As shown in Figure 2, the method of this embodiment includes the following steps:

Step 201: Establish a resource usage estimation model;

In the embodiment of the present invention, establishing the resource usage estimation model comprises: determining the consumed-resource increment corresponding to the task progress increment in the next time period, according to the progress of all parallel tasks and the increment of consumed resources in each time period, together with the regularities and correlations of the time period; comparing, multiple times, the determined consumed-resource increment with the consumed-resource increment corresponding to the actual task progress increment, and dynamically adjusting the weight to reduce the error; and selecting the weight corresponding to the minimum error to establish the resource usage estimation model.

Specifically, from the progress of all parallel tasks and the increments of consumed resources within the time series increment, the underlying regularities and correlations in the current time period are learned according to preset rules, and the consumed-resource increment corresponding to the task progress increment in the next time period is produced. This is then compared with the actual task progress and consumed-resource increments, and the weights are adjusted dynamically and repeatedly to reduce the error and optimize the model. From the resulting series of weights, the optimal weight with the smallest error within the current time series is taken to form the model.

Figure 3 is a schematic diagram of the principle of the resource availability early warning method according to an embodiment of the present invention. As shown in Figure 3, the set of all tasks in Hadoop is denoted $J_{ALL} = \{j_1, j_2, \ldots, j_N\}$, where $N$ is the total number of tasks. The time series is defined as the interval from time $t_i$ to time $t_{i+1}$, so its length is $\Delta t = t_{i+1} - t_i$. The set of all tasks running within the time series is denoted $J_{RUNNING} = \{j_1, j_2, \ldots, j_n\}$, $n \le N$. The progress of every running task in Hadoop is monitored and expressed as a percentage, giving the set $JP_{RUNNING} = \{jp_1, jp_2, \ldots, jp_n\}$. The progress of any task $j_k$ at time $t_i$ is denoted by [symbol: image BDA0000975146890000065, not reproduced], so the progress increment of task $j_k$ from time $t_i$ to time $t_{i+1}$ within the time series is [expression: image BDA0000975146890000066, not reproduced]. The average increment of the progress of task $j_k$ per unit time is given by formula (1-1):

[Formula (1-1): image BDA0000975146890000061, not reproduced]

Correspondingly, the average increment of the progress of all tasks per unit time is calculated as shown in formula (1-2):

[Formula (1-2): image BDA0000975146890000062, not reproduced]
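
The images for formulas (1-1) and (1-2) are not reproduced in the text, so the following Python sketch only illustrates the quantities the surrounding prose describes: a per-task progress increment divided by the interval length, and an average over the running tasks. The function names, the dictionary layout, and the choice of a plain arithmetic mean are assumptions made for illustration, not the patent's exact formulas.

```python
from typing import Dict


def progress_rates(prev: Dict[str, float], curr: Dict[str, float], dt: float) -> Dict[str, float]:
    """Per-task progress increment per unit time between two samples.

    prev and curr map a task id to its progress in percent at t_i and t_{i+1};
    dt is the interval length t_{i+1} - t_i. All names are hypothetical.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    # Only tasks present in both samples are treated as running over the whole interval.
    return {task: (curr[task] - prev[task]) / dt for task in curr if task in prev}


def mean_progress_rate(rates: Dict[str, float]) -> float:
    """Average progress increment per unit time over all running tasks.

    Assumption: a plain arithmetic mean; the source formula may weight tasks differently.
    """
    return sum(rates.values()) / len(rates) if rates else 0.0


if __name__ == "__main__":
    sample_prev = {"j1": 10.0, "j2": 40.0}
    sample_curr = {"j1": 30.0, "j2": 50.0}
    sample_rates = progress_rates(sample_prev, sample_curr, dt=10.0)
    print(sample_rates, mean_progress_rate(sample_rates))
```

Keeping the per-task rates separate from the average makes it easy to pick out the fastest task later, which the estimation step relies on.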

At time $t_{i+1}$ within the time series, the fastest progress among all running tasks is measured by [symbol: image BDA0000975146890000067, not reproduced], so the progress that this fastest task still has to complete is [expression: image BDA0000975146890000068, not reproduced]. The average progress increment of all tasks running concurrently with it, [symbol: image BDA0000975146890000069, not reproduced], is calculated by formula (1-3):

[Formula (1-3): image BDA0000975146890000063, not reproduced]

During resource availability early warning, the resources consumed by tasks are learned through dynamic feedback, mainly in terms of CPU and memory. Suppose that, within the time series from time $t_i$ to time $t_{i+1}$, the increment in CPU resources consumed by all running tasks is expressed as a percentage, $\Delta cp = cp_{i+1} - cp_i$. The CPU resources consumed by all running tasks per unit time are then calculated by formula (1-4):

[Formula (1-4): image BDA0000975146890000064, not reproduced]

While the fastest task completes its remaining progress within the current time series, the increment of CPU resources still needed in the next time period to complete all running tasks is calculated as shown in formula (1-5):

[Formula (1-5): image BDA0000975146890000071, not reproduced]
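
Since the image for formula (1-5) is not available in the text, the sketch below shows only one plausible reading of the prose: the time the fastest task still needs is estimated from its remaining progress and an average progress rate, and the CPU consumption rate is extrapolated over that time and scaled by a weight. The function name, its parameters, the 0 to 100 percent scale, and the multiplicative use of the weight are all assumptions for illustration.

```python
def projected_cpu_increment(
    cpu_prev: float,
    cpu_curr: float,
    dt: float,
    fastest_progress: float,
    mean_rate: float,
    weight: float = 1.0,
) -> float:
    """Rough estimate of the additional CPU share needed until the fastest task finishes.

    cpu_prev, cpu_curr: CPU usage in percent at t_i and t_{i+1}.
    dt: interval length t_{i+1} - t_i.
    fastest_progress: progress in percent of the fastest running task at t_{i+1}.
    mean_rate: average progress increment per unit time of the running tasks.
    weight: hypothetical correction factor standing in for the learned weight.
    """
    if dt <= 0 or mean_rate <= 0:
        raise ValueError("dt and mean_rate must be positive")
    cpu_rate = (cpu_curr - cpu_prev) / dt          # CPU consumed per unit time
    remaining = 100.0 - fastest_progress           # progress still to complete
    time_left = remaining / mean_rate              # assumed time to completion
    return weight * cpu_rate * time_left


if __name__ == "__main__":
    print(projected_cpu_increment(40.0, 50.0, 10.0, fastest_progress=80.0, mean_rate=2.0, weight=1.2))
```

A linear extrapolation is the simplest choice here; the patent's feedback step, which tunes the weight against observed consumption, is what is meant to absorb the error of such a simplification.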

The estimate is compared with the CPU resource value actually consumed in the next time period, $cp_{i+2}$, and the weight $\sigma_c$ is adjusted to reduce the error, as shown in formula (1-6):

[Formula (1-6): image BDA0000975146890000072, not reproduced]

Within the time series this produces a set of weights [set: image BDA0000975146890000076, not reproduced]. The weight with the smallest error is taken from this set as the current value of $\sigma_c$. Within the time series, this value is repeatedly re-learned and adjusted as resource consumption changes, so that it stays at its optimum.
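
Selecting the minimum-error weight can be sketched as follows. The text does not reproduce how the weight enters the estimate or how the error is measured, so this example assumes a multiplicative weight applied to an unweighted base estimate, a mean absolute error, and a fixed list of candidate weights; the function and parameter names are hypothetical.

```python
from typing import Iterable, List, Tuple


def best_weight(
    base_estimates: List[float],
    actuals: List[float],
    candidates: Iterable[float],
) -> Tuple[float, float]:
    """Return (weight, mean absolute error) for the candidate with the smallest error.

    base_estimates: unweighted predictions of consumed-resource increments.
    actuals: the increments actually observed in the following periods.
    candidates: weights to try (assumption: a simple search over a fixed set).
    """
    if len(base_estimates) != len(actuals) or not base_estimates:
        raise ValueError("need equally sized, non-empty histories")
    best = None
    for w in candidates:
        # Mean absolute error of the weighted predictions against observations.
        err = sum(abs(w * b - a) for b, a in zip(base_estimates, actuals)) / len(actuals)
        if best is None or err < best[1]:
            best = (w, err)
    if best is None:
        raise ValueError("no candidate weights supplied")
    return best


if __name__ == "__main__":
    print(best_weight([10.0, 20.0], [12.0, 24.0], [0.8, 1.0, 1.2, 1.4]))
```

Re-running the selection whenever a new observation arrives gives the repeated, dynamic adjustment the description calls for, at the cost of rescanning the history each time.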

Similarly, within the time series from time $t_i$ to time $t_{i+1}$, the memory resources consumed by all running tasks per unit time are calculated as shown in formula (1-7):

[Formula (1-7): image BDA0000975146890000073, not reproduced]

While the fastest task completes its remaining progress in the current period, the increment of memory resources still needed in the next time period to complete all running tasks is calculated as shown in formula (1-8):

[Formula (1-8): image BDA0000975146890000074, not reproduced]

The estimate is then compared with the memory resource value actually consumed in the next time period, $mp_{i+2}$, and the weight $\sigma_m$ is adjusted to reduce the error, as shown in formula (1-9):

[Formula (1-9): image BDA0000975146890000075, not reproduced]

Likewise, a set of weights [set: image BDA0000975146890000077, not reproduced] is formed within the time series, and through repeated dynamic learning and adjustment the optimal weight with the smallest error is taken as the current value of $\sigma_m$.

Step 202: Estimate the resource usage of the next time period by means of the resource usage estimation model;

In the embodiment of the present invention, estimating the resource usage of the next time period by means of the resource usage estimation model includes, but is not limited to: estimating the CPU resource usage and the memory resource usage of the next time period by means of the resource usage estimation model.

Specifically, on the basis of the minimum-error resource usage estimation model, and taking the progress of the fastest task within the time series as the reference, the resources required to complete the remaining tasks are estimated from the resources that all parallel tasks have already consumed;

In the embodiment of the present invention, to ensure that all running tasks complete normally, the most important thing is to guarantee sufficient resources during the peak period, when the largest number of parallel tasks is running. As shown in Figure 3, the number of parallel tasks is largest between time $t_i$ and time $t_{i+1}$ within the time series. When the fastest task $jp_k$ completes, it releases some resources; the number of parallel tasks then falls, and consumption of CPU, memory, and related resources falls accordingly. Whether these parallel tasks succeed within the time series therefore depends mainly on whether enough CPU, memory, and other resources are available during the period in which the most tasks, including the fastest-progressing one, are running.

The resource estimation model established in step 201 can be used to estimate the resources required to complete the remaining tasks. If the current time is $t_T$, the required CPU resources $cp_T$ are estimated as shown in formula (1-10):

[Formula (1-10): image BDA0000975146890000081, not reproduced]

The required memory resources $mp_T$ are estimated as shown in formula (1-11):

[Formula (1-11): image BDA0000975146890000082, not reproduced]

Step 203: When the estimated resource usage of the next time period exceeds a preset threshold, issue a resource availability early warning.

In the embodiment of the present invention, the estimated resource usage of the next time period exceeding the first threshold comprises: the estimated resource usage of the next time period exceeding the threshold on the remaining available resources.

Issuing a resource availability early warning when the estimated resource usage of the next time period exceeds the preset threshold comprises: when the estimated resource usage of the next time period exceeds the preset threshold, issuing resource availability early warnings of different levels according to the priority, importance, and dependencies of the tasks.

Specifically, if the estimated resource usage required to complete the remaining tasks exceeds the threshold on the remaining available resources, warnings of different levels are issued according to the priority, importance, and dependencies of the tasks, indicating that low resource availability may cause a large number of tasks to fail;

At time $t_T$, the amount of CPU resources actually remaining available in Hadoop is [symbol: image BDA0000975146890000091, not reproduced]. Assuming the CPU resource availability warning threshold is $\mu_c$, the warning rule is as shown in formula (1-12):

[Formula (1-12): image BDA0000975146890000092, not reproduced]

At time $t_T$, the amount of memory resources actually remaining available is [symbol: image BDA0000975146890000093, not reproduced]. With the memory resource availability warning threshold set to $\mu_m$, the warning rule is as shown in formula (1-13):

[Formula (1-13): image BDA0000975146890000094, not reproduced]
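
The exact form of warning rules (1-12) and (1-13) is not reproduced in the text. The sketch below assumes one simple reading: a warning fires when the estimated requirement exceeds a threshold fraction of the resources still available. The function name and the interpretation of the threshold as a fraction between 0 and 1 are assumptions for illustration.

```python
def should_warn(estimated_need: float, remaining_available: float, threshold: float) -> bool:
    """Return True when the estimated requirement exceeds the allowed share of what remains.

    estimated_need: estimated resources (CPU or memory, in percent) needed to finish.
    remaining_available: resources actually still available at the current time.
    threshold: hypothetical warning threshold, taken here as a fraction in (0, 1].
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    return estimated_need > threshold * remaining_available


if __name__ == "__main__":
    # Needing 45% when 50% remains, with an 80% threshold, triggers a warning.
    print(should_warn(45.0, 50.0, 0.8))
    print(should_warn(30.0, 50.0, 0.8))
```

The same check can be applied separately to CPU and memory, each with its own threshold, matching the two separate rules described in the text.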

The warning thresholds $\mu_c$ and $\mu_m$ can be divided, according to actual needs, into alarms of different levels such as critical, major, and general. A critical alarm has a fatal impact on the system, platform, or application and requires immediate intervention; it is the highest level. A major alarm partially affects the system, platform, or application; it is the intermediate level. A general alarm is mainly a warning that may have no direct impact on the system, platform, or application; it is the lowest level. Finer-grained alarm levels can be defined as circumstances require. If the currently running task has a high priority, is very important, and has many subsequent tasks depending on it, a higher-level alarm is generated. In practice, priority, importance, and dependencies can be quantified and given corresponding thresholds, so that alarms of different levels can be generated in complex and changing environments.
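
One way to quantify priority, importance, and dependencies into alarm levels is sketched below. The text does not specify the scoring, so the additive score, the 0 to 10 scales, the cap on dependents, and the two cut-off values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class AlarmLevel(IntEnum):
    GENERAL = 1   # lowest level: warning only
    MAJOR = 2     # intermediate level: partial impact
    CRITICAL = 3  # highest level: immediate intervention needed


@dataclass
class TaskProfile:
    priority: int    # hypothetical scale 0..10
    importance: int  # hypothetical scale 0..10
    dependents: int  # number of downstream tasks that depend on this one


def alarm_level(task: TaskProfile, major_score: int = 10, critical_score: int = 20) -> AlarmLevel:
    """Map a task profile to an alarm level using an assumed additive score."""
    # Cap the dependency contribution so one factor cannot dominate the score.
    score = task.priority + task.importance + min(task.dependents, 10)
    if score >= critical_score:
        return AlarmLevel.CRITICAL
    if score >= major_score:
        return AlarmLevel.MAJOR
    return AlarmLevel.GENERAL


if __name__ == "__main__":
    print(alarm_level(TaskProfile(priority=9, importance=9, dependents=5)).name)
```

Exposing the cut-offs as parameters reflects the description's point that thresholds should be configurable for different environments.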

Using the above method, early warning can also be implemented for the availability of other resources. All available Hadoop resources are thus monitored in real time, and when resources are insufficient, maintenance personnel are alerted in advance to expand capacity, avoiding adverse effects such as reduced operating efficiency, task failure, and data loss caused by insufficient resources.

An embodiment of the present invention further provides a resource availability early warning device. FIG. 4 is a schematic structural diagram of the resource availability early warning device according to an embodiment of the present invention. As shown in FIG. 4, the device includes a model establishment module 41, a resource estimation module 42, and a resource early warning module 43, wherein:

The model establishment module 41 is configured to establish a resource usage estimation model.

In the embodiment of the present invention, the model establishment module is specifically configured to:

determine the resource consumption increment corresponding to the task progress increment of the next time period, according to the progress and resource consumption increments of all parallel tasks in each time period and the regularities and correlations of the time period; compare, multiple times, the determined resource consumption increment for the next time period with the resource consumption increment corresponding to the actual task progress increment, and dynamically adjust the weights to reduce the error; and select the weights corresponding to the minimum error to establish the resource usage estimation model.

Specifically, based on the progress and resource consumption increments of all parallel tasks within each time-series increment, the latent regularities and correlations in the current time period are learned according to preset rules, and the resource consumption increment corresponding to the task progress increment of the next time period is produced. This prediction is then compared with the actual task progress and resource consumption increments, and the weights are dynamically adjusted over repeated iterations to reduce the error and optimize the model. From the resulting series of weights, the optimal weight with the minimum error within the current time-series range is taken to form the model.
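The predict, compare, and adjust loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the update rule of the patent's own formulas: it assumes a single scalar weight that maps a progress increment to a resource increment and uses a simple proportional correction. The function name, the learning rate `rate`, and the initial weight `w0` are hypothetical.

```python
def fit_weight(progress_deltas, resource_deltas, w0=1.0, rate=0.5):
    """Return (best_w, best_err) after one pass of feedback updates.

    In each period the weight w predicts the resource increment as
    w * progress_delta. The prediction is compared with the observed
    increment, the weight is nudged toward the observation, and the
    weight that gave the smallest absolute error so far is remembered.
    """
    w = w0
    best_w, best_err = w0, float("inf")
    for dp, dr in zip(progress_deltas, resource_deltas):
        if dp == 0:
            continue  # no progress this period, nothing to learn from
        err = dr - w * dp
        if abs(err) < best_err:
            best_w, best_err = w, abs(err)
        w += rate * err / dp  # move part of the way toward the ratio dr/dp
    return best_w, best_err
```

For example, if every period shows a progress increment of 0.1 and a resource increment of 0.2, the weight moves from 1.0 toward 2.0 and the remembered error shrinks with each period.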

The resource estimation module 42 is configured to estimate the resource usage of the next time period by means of the resource usage estimation model.

In the embodiment of the present invention, the resource estimation module is specifically configured to estimate the CPU resource usage and the memory resource usage of the next time period by means of the resource usage estimation model.

Specifically, on the basis of the minimum-error optimal resource usage estimation model, the progress of the fastest task within the time-series range is taken as the benchmark, and the amount of resources required to complete the remaining tasks is estimated from the resources that all parallel tasks have consumed so far.
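One possible reading of the estimation step above is sketched below. It is an assumption rather than the patent's formula: the consumed-to-progress ratio of the fastest task is taken as the cost of one complete task, and every parallel task is assumed to need that cost multiplied by its unfinished fraction. The function name and the `(progress, consumed)` tuple layout are hypothetical.

```python
def estimate_remaining(tasks):
    """Estimate the resource still needed to finish all parallel tasks.

    tasks: list of (progress, consumed) pairs, progress in [0, 1].
    The fastest task is the baseline: its consumed/progress ratio is
    treated as the cost of one full task.
    """
    started = [(p, c) for p, c in tasks if p > 0]
    if not started:
        raise ValueError("no task has made progress yet")
    base_p, base_c = max(started, key=lambda t: t[0])
    cost_per_task = base_c / base_p
    return sum(cost_per_task * (1.0 - p) for p, _ in tasks)
```

The result would then be compared against the remaining available resource to decide whether a warning is needed.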

The resource early warning module 43 is configured to issue a resource availability early warning when the estimated resource usage of the next period exceeds a preset threshold.

In the embodiment of the present invention, the resource early warning module determining that the estimated resource usage of the next period exceeds the first threshold includes: the resource early warning module determining that the estimated resource usage of the next period exceeds the threshold of the remaining available resources. When the estimated resource usage of the next period exceeds the preset threshold, resource availability early warnings of different levels are issued according to the priority, importance, and dependencies of the tasks.

An embodiment of the present invention further provides a resource availability early warning system. FIG. 5 is a schematic structural diagram of the resource availability early warning system according to an embodiment of the present invention. As shown in FIG. 5, the model establishment module 41 and the resource estimation module 42 of the resource availability early warning device are located in the application task warner 51 (AW, ApplicationWarner) in FIG. 5, and the resource early warning module 43 is located in the global resource warner 52 (RW, ResourceWarner) in FIG. 5.

In terms of overall architecture, the resource availability early warning system adds, on top of Hadoop's original YARN framework architecture, the functional modules RW and AW, which provide a resource availability monitoring and early warning mechanism. RW implements availability monitoring and early warning for global resources, and AW implements it for the resources of each local Container. According to preset rules, alarm information indicating low resource availability is pushed to the user in time, so that the user can act in advance; this provides decision support for operation and maintenance.

The resource availability early warning system establishes a resource availability early warning mechanism based on a time-series dynamic feedback learning model. The progress and resource consumption increments of all parallel tasks within each time-series increment are fed back to the AW dynamically and in real time. The AW learns the latent regularities and correlations in the current time period according to preset rules and produces the resource consumption increment corresponding to the task progress increment of the next time period. It then compares this with the actual task progress and resource consumption increments and dynamically adjusts the weights over repeated iterations to reduce the error and optimize the model. From the resulting series of weights, the optimal weight with the minimum error within the current time-series range is taken to form the model, and on this basis the amount of resources required to complete the remaining tasks is estimated. If the required amount exceeds the threshold of the remaining available resources, the AW notifies the RW, and the RW issues early warnings of different levels according to the priority, importance, and dependencies of the tasks.

In the resource availability early warning system according to the embodiment of the present invention, the user first submits an application such as MapReduce, applies to the RM for resources, and applies to the RW for resource availability monitoring and early warning. Once the requests are granted, the NM issues a command to the Container to start the AM and the AW, which register for resources and for early warning respectively. The NM then starts each task, and the tasks enter the running state. The AW performs resource monitoring and early warning according to the rules defined by formulas (1-1) to (1-13); if it finds that resources are insufficient, it notifies the RW, which handles the situation and issues early warning information. When the tasks complete normally, the resources are released and the alarms are closed, and the process ends and exits.

FIG. 6 is a schematic diagram of the workflow of the resource availability early warning system according to an embodiment of the present invention. As shown in FIG. 6, the workflow includes the following steps:

Step 601: The user submits an application, applies to the RM for resources, and requests the RW to start resource availability monitoring and early warning.

In this step, the user submits the application in JobClient mode, applies to the RM for resources, and requests the RW to start resource availability monitoring and early warning.

Step 602: The RM allocates a Container and communicates with the NM to request that the AM be started; the RW communicates with the NM to request that the AW be started.

In this step, after accepting the request, the RM allocates the first Container to the application, communicates with the corresponding NM, and issues a command to start the AM in the Container. At the same time, the RW also accepts the request, communicates with the NM, and requests that the application task monitor AW be started in the Container, so that resource availability monitoring and early warning can be performed for each Container.

Step 603: The AM registers with the RM and obtains resources; the AW registers with the RW and reports resources.

In this step, the AM registers itself with the RM and applies for resources such as CPU and memory for each task. At the same time, the AW registers itself with the RW and reports information such as the type and size of the resources each task has obtained, so that the RW can classify the tasks accordingly and apply different handling strategies.

Step 604: The AM requests the NM to start the tasks, and the NM notifies the AW to monitor resource availability and issue early warnings.

In this step, the AM requests the NM to start each task, and the NM notifies the AW to monitor resource availability and issue early warnings. The NM then starts each task, and all tasks enter the formal running state.

Step 605: The AM manages and monitors task status, and the AW monitors resource availability and issues early warnings.

In this step, all running tasks are managed and status-monitored by the AM, and the AW monitors resource availability and issues early warnings.

Step 606: The AW determines whether to issue an early warning. When a severe early warning is required, step 607 is performed; when no early warning or only a non-severe early warning is required, step 608 is performed.

In this step, the progress of the tasks and the resource availability within the Container are recorded within the specified time-series range. Feedback learning is repeated according to formulas (1-1) to (1-13) to obtain the minimum-error optimal model and to estimate the amount of resources required in the next period. An alarm is issued when the remaining resource availability is found to be lower than the threshold for the estimated required resources. When the AM finds an unrecoverable task that is in a fatal warning state or has already failed, it re-applies for resources and restarts the task.

Step 607: A severe early warning is issued, and the process returns to step 602.

Step 608: The RW implements availability monitoring and early warning for global resources and notifies the user of the alarm information.

In this step, the RW aggregates the resource availability and alarm information reported by the AWs, implements availability monitoring and early warning for global resources, and pushes the alarm information to the user.

Step 609: The application ends, the AM deregisters from the RM and releases resources, the AW closes its alarms in the RW, and the process ends.

In this step, after the application completes normally, the AM deregisters itself from the RM and releases its resources, the corresponding AW closes its own alarms in the RW, and the process ends.
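The control flow of steps 601 to 609 can be traced with a small simulation. This is a simplified sketch rather than YARN code: it only records which step numbers are visited, treats the AW's decision in step 606 as a given input, and adds a hypothetical `max_restarts` guard that the source does not mention so that the loop always terminates.

```python
def run_workflow(alarm_levels, max_restarts=3):
    """Return the sequence of step numbers visited.

    alarm_levels: one entry per monitoring round, each "none", "general",
    "major", or "severe" (the AW's decision in step 606).
    """
    trace = [601]                       # user submits the application
    restarts = 0
    for level in alarm_levels:
        trace += [602, 603, 604, 605, 606]
        if level == "severe" and restarts < max_restarts:
            trace.append(607)           # severe warning: back to step 602
            restarts += 1
            continue
        trace.append(608)               # RW aggregates and notifies the user
        break
    trace.append(609)                   # AM deregisters, AW closes its alarms
    return trace
```

Calling `run_workflow(["severe", "none"])` walks through steps 602 to 606 twice: the first round ends in step 607 and the second in step 608, followed by step 609.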

The resource availability early warning method, device, and system according to the embodiments of the present invention take as their core a resource availability early warning mechanism based on a time-series dynamic feedback learning model. Functional modules with a resource availability monitoring and early warning mechanism, namely RW and AW, are added to the original Hadoop architecture; they implement availability monitoring and early warning for global resources and for the resources of each local Container respectively, and, according to preset rules, notify the user in time of alarm information indicating low resource availability, so that resources can be handled and allocated in advance. In a specific implementation, a dynamic feedback learning model based on time series is first established, and the progress and resource consumption increments of all parallel tasks within each time-series increment are fed back to the AW dynamically and in real time. The AW learns the latent regularities and correlations in the current time period according to preset rules and produces the resource consumption increment corresponding to the task progress increment of the next time period. It then compares this with the actual task progress and resource consumption increments and dynamically adjusts the weights over repeated iterations to reduce the error and optimize the model. From the resulting series of weights, the optimal weight with the minimum error within the current time-series range is taken to form the model, and on the basis of this model the amount of resources required to complete the remaining tasks is estimated. If the required amount exceeds the threshold of the remaining available resources, the AW notifies the RW, and the RW issues early warnings of different levels according to the priority, importance, and dependencies of the tasks.

The resource availability early warning method, device, and system according to the embodiments of the present invention remedy two shortcomings of existing Hadoop: it has no module that gives users data and a factual basis for allocating resources, and it can only monitor resource and task status in real time without predicting resource shortages in advance. The solution avoids wasted resources, reduces capital cost, relieves the workload of maintenance personnel, and provides decision support for operation and maintenance, and is therefore highly practical in real applications.

The functions implemented by the processing modules of the resource availability early warning device shown in FIG. 4 can be understood with reference to the foregoing description of the resource availability early warning method. Those skilled in the art should understand that the functions of the processing modules of the device shown in FIG. 4 can be implemented by a program running on a processor or by specific logic circuits, for example by a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA).

In the several embodiments provided by the present invention, it should be understood that the disclosed method and device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into modules is only a division by logical function; in an actual implementation there may be other divisions: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the communication connections between the components shown or discussed may be made through interfaces, and the indirect coupling or communication connection between devices or modules may be electrical, mechanical, or of another form.

The modules described above as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may all be integrated into one processing module, each module may stand alone as a separate module, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of hardware plus software functional modules.

Those of ordinary skill in the art can understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM, Read-Only Memory), a magnetic disk, or an optical disc.

Alternatively, if the integrated module of the embodiments of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.

The resource availability early warning method and device described in the embodiments of the present invention are illustrated by the above embodiments only and are not limited to them. Those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.

Claims (8)

1. A resource availability early warning method, characterized in that the method comprises:
establishing a resource usage estimation model;
estimating the resource usage of the next time period by means of the resource usage estimation model;
issuing a resource availability early warning when the estimated resource usage of the next period exceeds a preset threshold;
wherein establishing the resource usage estimation model comprises:
determining the resource consumption increment corresponding to the task progress increment of the next time period, according to the progress and resource consumption increments of all parallel tasks in each time period and the regularities and correlations of the time period;
comparing, multiple times, the determined resource consumption increment corresponding to the task progress increment of the next time period with the resource consumption increment corresponding to the actual task progress increment, and dynamically adjusting the weights to reduce the error;
selecting the weights corresponding to the minimum error to establish the resource usage estimation model.

2. The method according to claim 1, characterized in that estimating the resource usage of the next time period by means of the resource usage estimation model includes but is not limited to: estimating the CPU resource usage and the memory resource usage of the next time period by means of the resource usage estimation model.

3. The method according to claim 1, characterized in that the estimated resource usage of the next period exceeding the first threshold comprises: the estimated resource usage of the next period exceeding the threshold of the remaining available resources.

4. The method according to claim 1 or 3, characterized in that issuing a resource availability early warning when the estimated resource usage of the next period exceeds a preset threshold comprises: when the estimated resource usage of the next period exceeds the preset threshold, issuing resource availability early warnings of different levels according to the priority, importance, and dependencies of the tasks.

5. A resource availability early warning device, characterized in that the device comprises a model establishment module, a resource estimation module, and a resource early warning module, wherein:
the model establishment module is configured to establish a resource usage estimation model;
the resource estimation module is configured to estimate the resource usage of the next time period by means of the resource usage estimation model;
the resource early warning module is configured to issue a resource availability early warning when the estimated resource usage of the next period exceeds a preset threshold;
wherein the model establishment module is specifically configured to determine the resource consumption increment corresponding to the task progress increment of the next time period, according to the progress and resource consumption increments of all parallel tasks in each time period and the regularities and correlations of the time period;
compare, multiple times, the determined resource consumption increment corresponding to the task progress increment of the next time period with the resource consumption increment corresponding to the actual task progress increment, and dynamically adjust the weights to reduce the error;
select the weights corresponding to the minimum error to establish the resource usage estimation model.

6. The device according to claim 5, characterized in that the resource estimation module is specifically configured to estimate the CPU resource usage and the memory resource usage of the next time period by means of the resource usage estimation model.

7. The device according to claim 5, characterized in that the resource early warning module determining that the estimated resource usage of the next period exceeds the first threshold comprises: the resource early warning module determining that the estimated resource usage of the next period exceeds the threshold of the remaining available resources.

8. The device according to claim 5 or 7, characterized in that the resource early warning module is specifically configured to: when the estimated resource usage of the next period exceeds a preset threshold, issue resource availability early warnings of different levels according to the priority, importance, and dependencies of the tasks.
CN201610265261.5A 2016-04-26 2016-04-26 Resource availability early warning method and device Active CN107315636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610265261.5A CN107315636B (en) 2016-04-26 2016-04-26 Resource availability early warning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610265261.5A CN107315636B (en) 2016-04-26 2016-04-26 Resource availability early warning method and device

Publications (2)

Publication Number Publication Date
CN107315636A CN107315636A (en) 2017-11-03
CN107315636B true CN107315636B (en) 2020-06-05

Family

ID=60184366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610265261.5A Active CN107315636B (en) 2016-04-26 2016-04-26 Resource availability early warning method and device

Country Status (1)

Country Link
CN (1) CN107315636B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888733B (en) * 2018-09-11 2023-12-26 三六零科技集团有限公司 Cluster resource use condition processing method and device and electronic equipment
CN109684059A (en) * 2018-12-20 2019-04-26 北京百度网讯科技有限公司 Method and device for monitoring data
CN111858015B (en) * 2019-04-25 2024-01-12 中国移动通信集团河北有限公司 Method, device and gateway for configuring running resources of application program
CN110597634B (en) * 2019-09-12 2021-05-07 腾讯科技(深圳)有限公司 Data processing method and device and computer readable storage medium
CN112328393A (en) * 2020-11-02 2021-02-05 京东数字科技控股股份有限公司 Job processing method, device and system based on big data environment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024762A (en) * 2012-12-26 2013-04-03 北京邮电大学 Service feature based communication service forecasting method
CN103581339A (en) * 2013-11-25 2014-02-12 广东电网公司汕头供电局 Storage resource allocation monitoring and processing method based on cloud computing
CN103812911A (en) * 2012-11-14 2014-05-21 中兴通讯股份有限公司 Method and system for controlling and utilizing service resources of PaaS (platform as a service) cloud computing platform

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9495152B2 (en) * 2007-06-22 2016-11-15 Red Hat, Inc. Automatic baselining of business application service groups comprised of virtual machines
US8850450B2 (en) * 2012-01-18 2014-09-30 International Business Machines Corporation Warning track interruption facility

Also Published As

Publication number Publication date
CN107315636A (en) 2017-11-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant