WO2020248227A1 - Hadoop computing task speculative execution method based on load prediction - Google Patents

Hadoop computing task speculative execution method based on load prediction

Info

Publication number
WO2020248227A1
Authority
WO
WIPO (PCT)
Prior art keywords
tasks
backup
completion time
map
task
Prior art date
Application number
PCT/CN2019/091269
Other languages
English (en)
French (fr)
Inventor
张斌
李薇
郭军
刘晨
侯帅
周杜凯
柳波
王馨悦
张娅杰
张瀚铎
刘文凤
王嘉怡
Original Assignee
东北大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学
Publication of WO2020248227A1 publication Critical patent/WO2020248227A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5019Workload prediction

Definitions

  • The present invention relates to the fields of distributed computing, big data, and cloud computing, and in particular to a method for speculative execution of Hadoop computing tasks based on load prediction.
  • Speculative execution of computing tasks means that, in a distributed cluster environment, program bugs, load imbalance, or uneven resource distribution cause the computing tasks of the same job to run at inconsistent speeds, with some tasks running significantly slower than others; these lagging tasks slow down the overall execution progress of the job.
  • To avoid this, Hadoop trades space for time: it starts a backup task for the slow computing task, runs the backup and the original task at the same time, and uses the result of whichever finishes first.
  • When the AppMaster detects that the remaining completion time of the slowest-running Map or Reduce task is greater than the average execution time of the completed tasks, it starts a backup task for that task.
  • On the current Hadoop platform, to accelerate the completion of user jobs, this space-for-time strategy starts backups for lagging tasks to speed up job completion.
  • However, the current speculative execution mechanism in Hadoop does not consider the impact of computing-node load changes on task execution progress and treats task execution as proceeding at a constant rate; nor does it consider the impact that speculative execution has on other jobs when cluster computing resources are scarce.
  • Speculative execution has an important impact on job completion time, and optimizing it is one of the key steps in optimizing the Hadoop platform.
  • The method for speculative execution of Hadoop computing tasks based on load prediction specifically includes three parts: an adaptive adjustment algorithm for the number of backup tasks, a completion-time prediction algorithm for executing tasks, and a completion-time prediction algorithm for backup tasks.
  • In the adaptive adjustment algorithm for the number of backup tasks, the AppMaster senses the current cluster load state and adjusts the total number of backup tasks, reducing the interference of backup tasks with other jobs and avoiding the resource preemption caused by excessive backup tasks.
  • In the completion-time prediction algorithm for executing tasks, XGBoost models relating task completion time to computing-node load and task progress are built from historical data.
  • Each running task computes its completion time in real time from the XGBoost model and sends this information to the AppMaster.
  • The backup task completion-time prediction algorithm uses the load of a computing node to predict the completion time of the backup task through the XGBoost model, selects a reasonable node on which to start the backup task, and reduces the completion time of the job.
  • a method for speculative execution of Hadoop computing tasks based on load prediction includes the following steps:
  • Step 1 Calculate the total number of tasks after the job is submitted, and the resource manager adaptively adjusts the number of backup tasks to obtain the maximum number of backup tasks, including steps 1.1 to 1.3:
  • Step 1.1 Over the most recent period T, save in a linked list the time points at which idle computing nodes request computing resources from the resource manager;
  • Step 1.2 Calculate the cluster idleness ⁇ through the cluster idleness awareness method, which specifically includes:
  • Step 1.2.1 Determine whether the linked-list length List.size is greater than the threshold L_max.
  • Step 1.2.1.1 When List.size > L_max, determine whether the time difference T between the tail node and the head node of the linked list is greater than the threshold T_max.
  • Step 1.2.1.1.1 When T > T_max, remove the head node and return to step 1.2.1.
  • Step 1.2.1.1.2 When T ≤ T_max, go to step 1.2.2.
  • Step 1.2.1.2 When List.size ≤ L_max, continue storing the time points at which computing nodes request computing resources from the resource manager, and return to step 1.2.1.
  • Step 1.2.2 Calculate the cluster idleness ρ according to formula (1), where ρ represents the cluster idleness, List.size represents the length of the linked list, and T represents the time difference between the tail node and the head node of the linked list.
  • Step 1.3 Calculate the maximum number of backup tasks TotalBackup of the cluster according to formulas (2) and (3), where t_s represents one computation cycle, TotalBackup represents the maximum number of backup tasks that the entire cluster can start, τ represents an intermediate variable, and N represents the total number of tasks.
  • Step 2 Forecast the completion time of the execution task, including:
  • Step 2.1 Construct the weighted XGboost model WXG (WeightXGboost) and the XGboost model TXG (TimeXGboost) of the remaining completion time.
  • Step 2.2 The Map execution task consists of two stages, map and sort. The method for predicting the completion time of the Map execution task includes:
  • Step 2.2.1 Input the matrices x1[load information of each stage] and x2[node load prediction information, the amount of data remaining in the current stage].
  • Step 2.2.2 According to formula (4), calculate the weights w_map and w_sort of the map and sort stages through the weight XGBoost model WXG (WeightXGboost): w_map, w_sort = WXG(x1) (4), where w_map represents the weight of the map stage, w_sort represents the weight of the sort stage, x1 represents the load information of each stage, and WXG(x) represents the weight XGBoost model.
  • Step 2.2.3 According to formula (5), calculate the remaining completion time T_remain of each stage through the remaining-completion-time XGBoost model TXG (TimeXGboost): T_remain = TXG(x2) (5), where T_remain represents the remaining completion time of each stage, x2 represents the node load prediction information and the amount of data remaining in the current stage, and TXG(x) represents the remaining-completion-time XGBoost model.
  • Step 2.2.4 Calculate the progress prog_m of each stage according to formula (6), where prog_m represents the progress of each stage and runtime represents the time for which each stage has been running.
  • Step 2.2.5 Calculate the completion progress prog_Map of the Map execution task according to formula (7): prog_Map = Σ_{i∈{map,sort}} w_i * prog_i (7), where prog_Map represents the completion progress of the Map execution task, w_i represents the weight of each stage, and prog_i represents the progress of each stage.
  • Step 2.2.6 Calculate the completion time EstimatedEndTime of the Map execution task according to formula (8), where EstimatedEndTime represents the completion time of the Map execution task, now represents the current time, and starttime represents the start time of the task.
  • Step 2.3 The Reduce execution task is divided into three stages, copy, sort, and reduce. Predicting the completion time of the Reduce execution task includes:
  • Step 2.3.1 Input the matrices x1 [load information of each stage] and x2 [node load prediction information, the amount of data remaining in the current stage].
  • Step 2.3.2 According to formula (9), calculate the weights w_copy, w_sort, and w_reduce of the copy, sort, and reduce stages through the weight XGBoost model WXG (WeightXGboost), where w_copy represents the weight of the copy stage, w_sort represents the weight of the sort stage, and w_reduce represents the weight of the reduce stage.
  • Step 2.3.3 According to formula (10), calculate the remaining completion time T_remain of each stage through the remaining-completion-time XGBoost model TXG (TimeXGboost).
  • Step 2.3.4 Calculate the progress prog_m of each stage according to formula (11).
  • Step 2.3.5 Calculate the completion progress prog_Reduce of the Reduce execution task according to formula (12): prog_Reduce = Σ_{i∈{copy,sort,reduce}} w_i * prog_i (12), where prog_Reduce represents the completion progress of the Reduce execution task.
  • Step 2.3.6 Calculate the completion time EstimatedEndTime of the Reduce execution task according to formula (13).
  • Step 3 Compare the maximum number of backup tasks with the number of backup tasks set by the AppMaster, take the minimum as the backup-task-count threshold, and set the initial number of backup tasks to 0 and the initial number of tasks to 0.
  • Step 4 Determine whether the number of backup tasks is less than or equal to the backup-task-count threshold; if yes, go to step 5; if no, return the number of backup tasks to the resource manager.
  • Step 5 Determine whether the number of tasks is less than the total number of tasks; if yes, go to step 6; if no, return the number of backup tasks to the resource manager.
  • Step 6 Predict the completion time of the backup task, including:
  • Step 6.1 Calculate the failure rate of Map execution tasks and the failure rate of Reduce tasks on the node according to formula (14), where fail_Map and fail_Reduce represent the failure rates of the computing node when running Map tasks and Reduce tasks, respectively, Map_fail and Reduce_fail represent the numbers of historically failed Map tasks and Reduce tasks, respectively, and sum_Map and sum_Reduce represent the total numbers of Map tasks and Reduce tasks that have been run on the computing node.
  • Step 6.2 Calculate the weight parameter w of each stage through the weight XGBoost model WXG (WeightXGboost).
  • Step 6.3 According to formula (15), calculate the predicted completion time through the remaining-completion-time XGBoost model TXG (TimeXGboost), where runtime represents the predicted completion time of the backup task.
  • Step 6.4 Calculate the backup task completion time EstimatedEndTime_backup according to formula (16), where EstimatedEndTime_backup represents the completion time of the backup task.
  • Step 7 Compare the backup task completion time EstimatedEndTime_backup with the execution task completion time EstimatedEndTime. When EstimatedEndTime_backup ≥ EstimatedEndTime, do not start the backup, increase the number of tasks by 1, and go to step 4; when EstimatedEndTime_backup < EstimatedEndTime, start the backup, increase the number of backup tasks by 1, increase the number of tasks by 1, and go to step 4.
  • The present invention proposes a Hadoop computing task speculative execution method based on load prediction. The adaptive adjustment algorithm for the number of backup tasks adjusts the number of backup tasks in real time according to the load state of the cluster, ensuring that starting backup tasks does not affect other jobs when cluster computing resources are tight.
  • The completion-time prediction for executing tasks uses the XGBoost algorithm to predict the completion times of Map and Reduce execution tasks separately, identifying lagging tasks more accurately and effectively avoiding the waste of computing resources caused by misjudging lagging tasks.
  • The backup task completion-time prediction algorithm selects a reasonable node on which to start the backup task, saving the computing resources of computing nodes, reducing job completion time, and improving the overall performance of the cluster.
  • FIG. 1 is a flowchart of a method for speculative execution of Hadoop computing tasks based on load prediction according to an embodiment of the present invention;
  • FIG. 2 is an IPO diagram of Hadoop speculative execution based on load prediction according to an embodiment of the present invention;
  • FIG. 3 is a comparison diagram of FIFO job completion times according to an embodiment of the present invention;
  • FIG. 4 is a comparison diagram of Capacity job completion times according to an embodiment of the present invention;
  • FIG. 5 is a comparison diagram of Fair-modified job completion times according to an embodiment of the present invention;
  • FIG. 6 is a comparison diagram of Fair job completion times according to an embodiment of the present invention.
  • The present invention is a method for speculative execution of Hadoop computing tasks based on load prediction. XGBoost models relating task completion time to computing-node load and task progress are built from historical data; each running task computes its completion time in real time from the XGBoost model and sends this information to the AppMaster.
  • Based on the current cluster load, the resource manager uses the adaptive backup-task-count adjustment algorithm to compute in real time the maximum number of tasks for which speculation is currently allowed, which the AppMaster uses to decide whether to start backup tasks. Based on the predicted completion time of the backup task, the method decides whether to start a backup for the task and selects a reasonable node on which to start it, reducing job completion time.
  • The system is tested on a Hadoop platform with 20 homogeneous machines, one master and 19 slaves. Three user queues a, b, and c are configured, occupying 30%, 30%, and 40% of the cluster's computing resources, respectively.
  • The Hadoop cluster is built with Hadoop 2.6, Java 1.7, the CentOS 7 operating system, the Maven build tool, and the IntelliJ development tool; the number of slave nodes is 19 and the user queues are root.a, root.b, and root.c.
  • CPU core number is 8 cores
  • CPU frequency is 2.2GHz
  • memory type is DDR3-1333 ECC
  • memory capacity is 8GB
  • hard disk type is 15000 rpm SAS hard disk
  • hard disk capacity is 300GB
  • bandwidth is 1000Mbps.
  • a method for speculative execution of Hadoop computing tasks based on load prediction specifically includes the following steps:
  • Step 1 Calculate the total number of tasks after the job is submitted, and the resource manager adaptively adjusts the number of backup tasks to obtain the maximum number of backup tasks, including:
  • Step 1.1 Over the most recent period T, save in a linked list the time points at which idle computing nodes request computing resources from the resource manager;
  • Step 1.2 Calculate the cluster idleness ⁇ through the cluster idleness awareness method, which specifically includes:
  • Step 1.2.1 Determine whether the linked-list length List.size is greater than the threshold L_max.
  • Step 1.2.1.1 When List.size > L_max, determine whether the time difference T between the tail node and the head node of the linked list is greater than the threshold T_max.
  • Step 1.2.1.1.1 When T > T_max, remove the head node and return to step 1.2.1.
  • Step 1.2.1.1.2 When T ≤ T_max, go to step 1.2.2.
  • Step 1.2.1.2 When List.size ≤ L_max, continue storing the time points at which computing nodes request computing resources from the resource manager, and return to step 1.2.1.
  • Step 1.2.2 Calculate the cluster idleness ρ according to formula (1), where ρ represents the cluster idleness, List.size represents the length of the linked list, and T represents the time difference between the tail node and the head node of the linked list.
  • Step 1.3 Calculate the maximum number of backup tasks TotalBackup of the cluster according to formulas (2) and (3), where t_s represents one computation cycle, TotalBackup represents the maximum number of backup tasks that the entire cluster can start, τ represents an intermediate variable, and N represents the total number of tasks.
  • Step 2 Forecast the completion time of the execution task, including:
  • Step 2.1 Construct the weighted XGboost model WXG (WeightXGboost) and the XGboost model TXG (TimeXGboost) of the remaining completion time.
  • Step 2.2 The Map execution task consists of two stages, map and sort. The method for predicting the completion time of the Map execution task includes:
  • Step 2.2.1 Input the matrices x1[load information of each stage] and x2[node load prediction information, the amount of data remaining in the current stage].
  • Step 2.2.2 According to formula (4), calculate the weights w_map and w_sort of the map and sort stages through the weight XGBoost model WXG (WeightXGboost): w_map, w_sort = WXG(x1) (4), where w_map represents the weight of the map stage, w_sort represents the weight of the sort stage, x1 represents the load information of each stage, and WXG(x) represents the weight XGBoost model.
  • Step 2.2.3 According to formula (5), calculate the remaining completion time T_remain of each stage through the remaining-completion-time XGBoost model TXG (TimeXGboost): T_remain = TXG(x2) (5), where T_remain represents the remaining completion time of each stage, x2 represents the node load prediction information and the amount of data remaining in the current stage, and TXG(x) represents the remaining-completion-time XGBoost model.
  • Step 2.2.4 Calculate the progress prog_m of each stage according to formula (6), where prog_m represents the progress of each stage and runtime represents the time for which each stage has been running.
  • Step 2.2.5 Calculate the completion progress prog_Map of the Map execution task according to formula (7): prog_Map = Σ_{i∈{map,sort}} w_i * prog_i (7), where prog_Map represents the completion progress of the Map execution task, w_i represents the weight of each stage, and prog_i represents the progress of each stage.
  • Step 2.2.6 Calculate the completion time EstimatedEndTime of the Map execution task according to formula (8), where EstimatedEndTime represents the completion time of the Map execution task, now represents the current time, and starttime represents the start time of the task.
  • Step 2.3 The Reduce execution task is divided into three stages, copy, sort, and reduce. Predicting the completion time of the Reduce execution task includes:
  • Step 2.3.1 Input the matrices x1 [load information of each stage] and x2 [node load prediction information, the amount of data remaining in the current stage].
  • Step 2.3.2 According to formula (9), calculate the weights w_copy, w_sort, and w_reduce of the copy, sort, and reduce stages through the weight XGBoost model WXG (WeightXGboost), where w_copy represents the weight of the copy stage, w_sort represents the weight of the sort stage, and w_reduce represents the weight of the reduce stage.
  • Step 2.3.3 According to formula (10), calculate the remaining completion time T_remain of each stage through the remaining-completion-time XGBoost model TXG (TimeXGboost).
  • Step 2.3.4 Calculate the progress prog_m of each stage according to formula (11).
  • Step 2.3.5 Calculate the completion progress prog_Reduce of the Reduce execution task according to formula (12): prog_Reduce = Σ_{i∈{copy,sort,reduce}} w_i * prog_i (12), where prog_Reduce represents the completion progress of the Reduce execution task.
  • Step 2.3.6 Calculate the completion time EstimatedEndTime of the Reduce execution task according to formula (13).
  • Step 3 Compare the maximum number of backup tasks with the number of backup tasks set by the AppMaster, take the minimum as the backup-task-count threshold, and set the initial number of backup tasks to 0 and the initial number of tasks to 0.
  • Step 4 Determine whether the number of backup tasks is less than or equal to the backup-task-count threshold; if yes, go to step 5; if no, return the number of backup tasks to the resource manager.
  • Step 5 Determine whether the number of tasks is less than the total number of tasks; if yes, go to step 6; if no, return the number of backup tasks to the resource manager.
  • Step 6 Predict the completion time of the backup task, including:
  • Step 6.1 Calculate the failure rate of Map execution tasks and the failure rate of Reduce tasks on the node according to formula (14), where fail_Map and fail_Reduce represent the failure rates of the computing node when running Map tasks and Reduce tasks, respectively, Map_fail and Reduce_fail represent the numbers of historically failed Map tasks and Reduce tasks, respectively, and sum_Map and sum_Reduce represent the total numbers of Map tasks and Reduce tasks that have been run on the computing node.
  • Step 6.2 Calculate the weight parameter w of each stage through the weight XGBoost model WXG (WeightXGboost).
  • Step 6.3 According to formula (15), calculate the predicted completion time through the remaining-completion-time XGBoost model TXG (TimeXGboost), where runtime represents the predicted completion time of the backup task.
  • Step 6.4 Calculate the backup task completion time EstimatedEndTime_backup according to formula (16), where EstimatedEndTime_backup represents the completion time of the backup task.
  • Step 7 Compare the backup task completion time EstimatedEndTime_backup with the execution task completion time EstimatedEndTime. When EstimatedEndTime_backup ≥ EstimatedEndTime, do not start the backup, increase the number of tasks by 1, and go to step 4; when EstimatedEndTime_backup < EstimatedEndTime, start the backup, increase the number of backup tasks by 1, increase the number of tasks by 1, and go to step 4.
  • The process of the present invention is completed in the computing nodes.
  • The completion time of a task is predicted according to the load of its computing node, and the completion time is then passed to the AppMaster.
  • The AppMaster holds the number of tasks that have already been backed up, as shown in FIG. 2, and compares it with the number of backup tasks allowed by the resource manager's adaptive adjustment algorithm; if it is smaller, backing up can continue. The completion time of the backup task is then calculated, and if the predicted completion time of the running task is greater than the completion time of the backup task, the backup is started.
  • Compared with Hadoop's native LATE speculative execution method, the load-prediction-based Hadoop computing task speculative execution method proposed herein can find lagging tasks more accurately and can adjust the number of backup tasks according to the cluster load, effectively reducing job completion time while avoiding the resource competition caused by excessive backup tasks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a Hadoop computing task speculative execution method based on load prediction, comprising: the resource manager adaptively adjusts the number of backup tasks to obtain the maximum number of backup tasks; the completion time of executing tasks is predicted; the maximum number of backup tasks is compared with the number of backup tasks set by the AppMaster, and the minimum is taken as the backup-task-count threshold; whether the number of backup tasks is less than or equal to the backup-task-count threshold is determined; whether the number of tasks is less than the total number of tasks is determined; the completion time of the backup task is predicted; and the backup task completion time is compared with the execution task completion time to decide whether to start a backup. The invention ensures that, when cluster computing resources are tight, starting backup tasks does not affect other jobs; the completion-time prediction algorithm for executing tasks effectively avoids the waste of computing resources caused by misjudging lagging tasks; and the backup task completion-time prediction algorithm saves the computing resources of computing nodes, reduces job completion time, and improves the overall performance of the cluster.

Description

Hadoop computing task speculative execution method based on load prediction
Technical Field
The present invention relates to the fields of distributed computing, big data, and cloud computing, and in particular to a Hadoop computing task speculative execution method based on load prediction.
Background Art
Speculative execution of computing tasks means that, in a distributed cluster environment, program bugs, load imbalance, or uneven resource distribution cause the computing tasks of the same job to run at inconsistent speeds, with some tasks running significantly slower than others and dragging down the overall execution progress of the job. To avoid this, Hadoop applies the idea of trading space for time: it starts a backup task for such a computing task and runs the backup and the original task simultaneously, using the result of whichever finishes first. When the AppMaster detects that the remaining completion time of the slowest-running Map or Reduce task is greater than the average execution time of the completed tasks, it starts a backup task for that task.
On the current Hadoop platform, to accelerate the completion of user jobs, Hadoop adopts this space-for-time strategy and starts backups for lagging tasks to speed up job completion. However, the current speculative execution mechanism in Hadoop does not consider the impact of computing-node load changes on task execution progress and describes task execution at a constant rate; nor does it consider the impact that speculative execution has on other jobs when cluster computing resources are scarce. On the Hadoop platform, speculative execution has an important impact on job completion time, and optimizing it is one of the key steps in optimizing the platform.
Summary of the Invention
In view of the above technical problems, the Hadoop computing task speculative execution method based on load prediction according to the present invention comprises three parts: an adaptive adjustment algorithm for the number of backup tasks, a completion-time prediction algorithm for executing tasks, and a completion-time prediction algorithm for backup tasks. In the adaptive adjustment algorithm for the number of backup tasks, the AppMaster senses the current cluster load state and adjusts the total number of backup tasks, reducing the interference of backup tasks with other jobs and avoiding the resource preemption caused by excessive backup tasks. In the completion-time prediction algorithm for executing tasks, XGBoost models relating task completion time to computing-node load and task progress are built from historical data; each running task computes its completion time in real time from the XGBoost model and sends this information to the AppMaster. The completion-time prediction algorithm for backup tasks uses the load of a computing node to predict the completion time of the backup task through the XGBoost model and selects a reasonable node on which to start the backup task, reducing the completion time of the job.
A Hadoop computing task speculative execution method based on load prediction specifically includes the following steps:
Step 1: After the job is submitted, calculate the total number of tasks, and the resource manager adaptively adjusts the number of backup tasks to obtain the maximum number of backup tasks, specifically including steps 1.1 to 1.3:
Step 1.1: Over the most recent period T, save in a linked list the time points at which idle computing nodes request computing resources from the resource manager;
Step 1.2: Calculate the cluster idleness ρ through the cluster-idleness sensing method, specifically including:
Step 1.2.1: Determine whether the linked-list length List.size is greater than the threshold L_max.
Step 1.2.1.1: When List.size > L_max, determine whether the time difference T between the tail node and the head node of the linked list is greater than the threshold T_max.
Step 1.2.1.1.1: When T > T_max, remove the head node and return to step 1.2.1.
Step 1.2.1.1.2: When T ≤ T_max, go to step 1.2.2.
Step 1.2.1.2: When List.size ≤ L_max, continue storing the time points at which computing nodes request computing resources from the resource manager, and return to step 1.2.1.
Step 1.2.2: Calculate the cluster idleness ρ according to formula (1).
Figure PCTCN2019091269-appb-000001
where ρ represents the cluster idleness, List.size represents the length of the linked list, and T represents the time difference between the tail node and the head node of the linked list.
Step 1.3: Calculate the maximum number of backup tasks TotalBackup of the cluster according to formulas (2) and (3).
Figure PCTCN2019091269-appb-000002
Figure PCTCN2019091269-appb-000003
where t_s represents one computation cycle, TotalBackup represents the maximum number of backup tasks that the entire cluster can start, τ represents an intermediate variable, and N represents the total number of tasks.
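Steps 1.1 to 1.3 describe the cluster-idleness sensing and the adaptive cap on backup tasks. The following Python sketch illustrates that logic outside Hadoop and is not the patented implementation: the list maintenance mirrors steps 1.2.1 to 1.2.1.2, but because formulas (1) to (3) appear only as images in this text, the idleness value (requests per second over the observed window) and the TotalBackup expression (an idleness-scaled fraction of N) are stand-in assumptions, and names such as ClusterIdlenessSensor and max_backup_tasks are illustrative.

```python
from collections import deque

class ClusterIdlenessSensor:
    """Sketch of steps 1.1-1.3: sense cluster idleness from the time points at
    which idle nodes request resources, then cap the number of backup tasks."""

    def __init__(self, l_max, t_max):
        self.l_max = l_max        # threshold L_max on the list length
        self.t_max = t_max        # threshold T_max on the tail-to-head time span
        self.requests = deque()   # linked list of request time points (step 1.1)

    def record_request(self, timestamp):
        """Step 1.1 / step 1.2.1.2: store the time point of a resource request."""
        self.requests.append(timestamp)

    def idleness(self):
        """Steps 1.2.1-1.2.2: trim the list as described, then compute rho.
        Formula (1) is only an image in the source; requests per second over
        the observed window is used here as a stand-in assumption."""
        while len(self.requests) > self.l_max:
            span = self.requests[-1] - self.requests[0]   # T: tail minus head
            if span > self.t_max:
                self.requests.popleft()                   # step 1.2.1.1.1
            else:
                break                                     # step 1.2.1.1.2
        if len(self.requests) < 2:
            return 0.0
        span = self.requests[-1] - self.requests[0]
        return len(self.requests) / span if span > 0 else 0.0

    def max_backup_tasks(self, total_tasks, backup_fraction=0.1):
        """Step 1.3: cap TotalBackup by the total task count N and the idleness.
        Formulas (2)-(3) are not reproduced; an idleness-scaled fraction of N
        is assumed purely for illustration."""
        rho = self.idleness()
        return int(min(total_tasks, backup_fraction * total_tasks * (1.0 + rho)))
```

In such a setup the resource manager would recompute this cap once per computation cycle t_s and hand it to the AppMaster for the comparison in step 3.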
Step 2: Predict the completion time of executing tasks, specifically including:
Step 2.1: Construct the weight XGBoost model WXG (WeightXGboost) and the remaining-completion-time XGBoost model TXG (TimeXGboost).
Step 2.2: The Map execution task consists of two stages, map and sort. The method for predicting the completion time of the Map execution task specifically includes:
Step 2.2.1: Input the matrices x1 [load information of each stage] and x2 [node load prediction information, amount of data remaining in the current stage].
Step 2.2.2: According to formula (4), calculate the weights w_map and w_sort of the map and sort stages through the weight XGBoost model WXG (WeightXGboost).
w_map, w_sort = WXG(x1)   (4)
where w_map represents the weight of the map stage, w_sort represents the weight of the sort stage, x1 represents the load information of each stage, and WXG(x) represents the weight XGBoost model.
Step 2.2.3: According to formula (5), calculate the remaining completion time T_remain of each stage through the remaining-completion-time XGBoost model TXG (TimeXGboost).
T_remain = TXG(x2)   (5)
where T_remain represents the remaining completion time of each stage, x2 represents the node load prediction information and the amount of data remaining in the current stage, and TXG(x) represents the remaining-completion-time XGBoost model.
Step 2.2.4: Calculate the progress prog_m of each stage according to formula (6).
Figure PCTCN2019091269-appb-000004
where prog_m represents the progress of each stage and runtime represents the time for which each stage has been running.
Step 2.2.5: Calculate the completion progress prog_Map of the Map execution task according to formula (7).
prog_Map = Σ_{i∈{map,sort}} w_i * prog_i   (7)
where prog_Map represents the completion progress of the Map execution task, w_i represents the weight of each stage, and prog_i represents the progress of each stage.
Step 2.2.6: Calculate the completion time EstimatedEndTime of the Map execution task according to formula (8).
Figure PCTCN2019091269-appb-000005
where EstimatedEndTime represents the completion time of the Map execution task, now represents the current time, and starttime represents the start time of the task.
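Steps 2.2.1 to 2.2.6 combine two gradient-boosted regressors: the weight model WXG maps per-stage load features to stage weights, and the remaining-time model TXG maps node-load forecasts plus the remaining data volume to each stage's remaining completion time. Below is a minimal sketch using the xgboost Python package. Training one regressor per stage, the assumed forms of formulas (6) and (8) (prog_m = runtime / (runtime + T_remain) and EstimatedEndTime = starttime + (now - starttime) / prog_Map), and all helper names are assumptions, since those formulas appear only as images here. Step 2.3 below applies the same pattern to the three Reduce stages with formulas (9) to (13).

```python
import numpy as np
import xgboost as xgb

STAGES = ("map", "sort")   # a Reduce task would use ("copy", "sort", "reduce")

def train_stage_models(X_weight, y_weight, X_time, y_time):
    """Fit one weight regressor (WXG) and one remaining-time regressor (TXG)
    per stage from historical records; the feature layouts are assumptions."""
    wxg = {s: xgb.XGBRegressor(n_estimators=200, max_depth=4).fit(X_weight, y_weight[s])
           for s in STAGES}
    txg = {s: xgb.XGBRegressor(n_estimators=200, max_depth=4).fit(X_time, y_time[s])
           for s in STAGES}
    return wxg, txg

def estimate_map_end_time(wxg, txg, x1, x2, runtime, starttime, now):
    """Steps 2.2.2-2.2.6 for one Map task.  `runtime` is a dict of elapsed
    seconds per stage; `starttime` and `now` are timestamps in seconds."""
    x1 = np.asarray(x1, dtype=float).reshape(1, -1)   # per-stage load information
    x2 = np.asarray(x2, dtype=float).reshape(1, -1)   # load forecast + remaining data
    # formula (4): stage weights from WXG, normalised so they sum to 1
    weights = {s: float(wxg[s].predict(x1)[0]) for s in STAGES}
    total = sum(weights.values()) or 1.0
    weights = {s: w / total for s, w in weights.items()}
    # formula (5): remaining completion time per stage from TXG
    t_remain = {s: max(float(txg[s].predict(x2)[0]), 0.0) for s in STAGES}
    # formula (6), assumed form: progress = elapsed / (elapsed + remaining)
    prog = {s: runtime[s] / (runtime[s] + t_remain[s])
            if runtime[s] + t_remain[s] > 0 else 0.0 for s in STAGES}
    # formula (7): weighted overall Map progress
    prog_map = max(sum(weights[s] * prog[s] for s in STAGES), 1e-6)
    # formula (8), assumed form: extrapolate the elapsed wall time by progress
    return starttime + (now - starttime) / prog_map
```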
Step 2.3: The Reduce execution task is divided into three stages, copy, sort, and reduce. Predicting the completion time of the Reduce execution task specifically includes:
Step 2.3.1: Input the matrices x1 [load information of each stage] and x2 [node load prediction information, amount of data remaining in the current stage].
Step 2.3.2: According to formula (9), calculate the weights w_copy, w_sort, and w_reduce of the copy, sort, and reduce stages through the weight XGBoost model WXG (WeightXGboost).
w_copy, w_sort, w_reduce = WXG(x1)   (9)
where w_copy represents the weight of the copy stage, w_sort represents the weight of the sort stage, and w_reduce represents the weight of the reduce stage.
Step 2.3.3: According to formula (10), calculate the remaining completion time T_remain of each stage through the remaining-completion-time XGBoost model TXG (TimeXGboost).
T_remain = TXG(x2)   (10)
Step 2.3.4: Calculate the progress prog_m of each stage according to formula (11).
Figure PCTCN2019091269-appb-000006
Step 2.3.5: Calculate the completion progress prog_Reduce of the Reduce execution task according to formula (12).
prog_Reduce = Σ_{i∈{copy,sort,reduce}} w_i * prog_i   (12)
where prog_Reduce represents the completion progress of the Reduce execution task.
Step 2.3.6: Calculate the completion time EstimatedEndTime of the Reduce execution task according to formula (13).
Figure PCTCN2019091269-appb-000007
Step 3: Compare the maximum number of backup tasks with the number of backup tasks set by the AppMaster, take the minimum as the backup-task-count threshold, and set the initial number of backup tasks to 0 and the initial number of tasks to 0;
Step 4: Determine whether the number of backup tasks is less than or equal to the backup-task-count threshold; if yes, go to step 5; if no, return the number of backup tasks to the resource manager;
Step 5: Determine whether the number of tasks is less than the total number of tasks; if yes, go to step 6; if no, return the number of backup tasks to the resource manager;
Step 6: Predict the completion time of the backup task, specifically including:
Step 6.1: Calculate the failure rate of Map execution tasks and the failure rate of Reduce tasks on the node according to formula (14).
Figure PCTCN2019091269-appb-000008
where fail_Map and fail_Reduce represent the failure rates of the computing node when running Map tasks and Reduce tasks, respectively, Map_fail and Reduce_fail represent the numbers of historically failed Map tasks and Reduce tasks, respectively, and sum_Map and sum_Reduce represent the total numbers of Map tasks and Reduce tasks that have been run on the computing node.
Step 6.2: Calculate the weight parameter w of each stage through the weight XGBoost model WXG (WeightXGboost).
Step 6.3: According to formula (15), calculate the predicted completion time through the remaining-completion-time XGBoost model TXG (TimeXGboost).
Figure PCTCN2019091269-appb-000009
where runtime represents the predicted completion time of the backup task;
Step 6.4: Calculate the backup task completion time EstimatedEndTime_backup according to formula (16).
EstimatedEndTime_backup = now + runtime   (16)
where EstimatedEndTime_backup represents the completion time of the backup task.
Step 7: Compare the backup task completion time EstimatedEndTime_backup with the execution task completion time EstimatedEndTime. When EstimatedEndTime_backup ≥ EstimatedEndTime, do not start the backup, increase the number of tasks by 1, and go to step 4; when EstimatedEndTime_backup < EstimatedEndTime, start the backup, increase the number of backup tasks by 1, increase the number of tasks by 1, and go to step 4.
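Steps 3 to 7 form the decision loop that the AppMaster runs over candidate tasks: respect the backup-count threshold from step 3, predict each candidate's backup end time on a target node, and start a backup only when it is expected to finish before the original task. The sketch below mirrors that control flow; the failure-rate ratio in node_failure_rates and the two predictor callables are assumptions standing in for the image-only formulas (14) to (16), and all names are illustrative.

```python
def node_failure_rates(map_fail, sum_map, reduce_fail, sum_reduce):
    """Step 6.1, formula (14) (image-only in the source; the ratio of failed
    to total runs is assumed): per-node Map and Reduce failure rates."""
    fail_map = map_fail / sum_map if sum_map else 0.0
    fail_reduce = reduce_fail / sum_reduce if sum_reduce else 0.0
    return fail_map, fail_reduce

def speculative_execution_round(tasks, backup_threshold, predict_task_end, predict_backup_end):
    """Sketch of steps 3-7.  `tasks` is an iterable of running-task handles,
    `backup_threshold` is min(TotalBackup, AppMaster limit) from step 3,
    `predict_task_end(task)` returns EstimatedEndTime for the running task
    (step 2), and `predict_backup_end(task)` returns EstimatedEndTime_backup
    for a backup started on a candidate node (step 6)."""
    backup_count = 0   # step 3: initial number of backup tasks
    started = []       # tasks for which a backup was actually launched
    for task in tasks:                              # step 5: stop once all tasks are examined
        if backup_count > backup_threshold:         # step 4: threshold exceeded, report back
            break
        end_time = predict_task_end(task)           # step 2
        backup_end_time = predict_backup_end(task)  # step 6
        if backup_end_time < end_time:              # step 7: backup would finish earlier
            backup_count += 1
            started.append(task)
        # otherwise no backup is started and the next task is considered
    return started, backup_count
```

In a real deployment the two predictors would wrap the WXG and TXG models of steps 2 and 6, with the node failure rates of formula (14) supplied as additional features.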
Beneficial technical effects:
The present invention proposes a Hadoop computing task speculative execution method based on load prediction. The adaptive adjustment algorithm for the number of backup tasks adjusts the number of backup tasks in real time according to the load state of the cluster, ensuring that starting backup tasks does not affect other jobs when cluster computing resources are tight. The completion-time prediction for executing tasks uses the XGBoost algorithm to predict the completion times of Map and Reduce execution tasks separately, identifying lagging tasks more accurately and effectively avoiding the waste of computing resources caused by misjudging lagging tasks. The backup task completion-time prediction algorithm selects a reasonable node on which to start the backup task, saving the computing resources of computing nodes, reducing job completion time, and improving the overall performance of the cluster.
Brief Description of the Drawings
FIG. 1 is a flowchart of a Hadoop computing task speculative execution method based on load prediction according to an embodiment of the present invention;
FIG. 2 is an IPO diagram of Hadoop speculative execution based on load prediction according to an embodiment of the present invention;
FIG. 3 is a comparison diagram of FIFO job completion times according to an embodiment of the present invention;
FIG. 4 is a comparison diagram of Capacity job completion times according to an embodiment of the present invention;
FIG. 5 is a comparison diagram of Fair-modified job completion times according to an embodiment of the present invention;
FIG. 6 is a comparison diagram of Fair job completion times according to an embodiment of the present invention.
Detailed Description of the Embodiments
The invention is further described below with reference to the drawings and a specific embodiment. The present invention is a Hadoop computing task speculative execution method based on load prediction; the IPO of the load-prediction-based speculative execution is shown in FIG. 1. XGBoost models relating task completion time to computing-node load and task progress are built from historical data; each running task computes its completion time in real time from the XGBoost model and sends this information to the AppMaster. Based on the current cluster load, the resource manager uses the adaptive backup-task-count adjustment algorithm to compute in real time the maximum number of tasks for which speculation is currently allowed, which the AppMaster uses to decide whether to start backup tasks. Based on the predicted completion time of the backup task, the method decides whether to start a backup for the task and selects a reasonable node on which to start it, reducing job completion time.
The system was tested on a Hadoop platform with 20 homogeneous machines, one master and 19 slaves. Three user queues a, b, and c were configured, occupying 30%, 30%, and 40% of the cluster's computing resources, respectively. The Hadoop cluster was built with Hadoop 2.6, Java 1.7, the CentOS 7 operating system, the Maven build tool, and the IntelliJ development tool; the number of slave nodes is 19 and the user queues are root.a, root.b, and root.c.
The node configuration of this implementation: 8 CPU cores, CPU frequency 2.2 GHz, DDR3-1333 ECC memory, 8 GB memory capacity, 15,000 rpm SAS hard disks with 300 GB capacity, and 1000 Mbps bandwidth.
A Hadoop computing task speculative execution method based on load prediction, as shown in FIG. 1, specifically includes the following steps:
Step 1: After the job is submitted, calculate the total number of tasks, and the resource manager adaptively adjusts the number of backup tasks to obtain the maximum number of backup tasks, specifically including:
Step 1.1: Over the most recent period T, save in a linked list the time points at which idle computing nodes request computing resources from the resource manager;
Step 1.2: Calculate the cluster idleness ρ through the cluster-idleness sensing method, specifically including:
Step 1.2.1: Determine whether the linked-list length List.size is greater than the threshold L_max.
Step 1.2.1.1: When List.size > L_max, determine whether the time difference T between the tail node and the head node of the linked list is greater than the threshold T_max.
Step 1.2.1.1.1: When T > T_max, remove the head node and return to step 1.2.1.
Step 1.2.1.1.2: When T ≤ T_max, go to step 1.2.2.
Step 1.2.1.2: When List.size ≤ L_max, continue storing the time points at which computing nodes request computing resources from the resource manager, and return to step 1.2.1.
Step 1.2.2: Calculate the cluster idleness ρ according to formula (1).
Figure PCTCN2019091269-appb-000010
where ρ represents the cluster idleness, List.size represents the length of the linked list, and T represents the time difference between the tail node and the head node of the linked list.
Step 1.3: Calculate the maximum number of backup tasks TotalBackup of the cluster according to formulas (2) and (3).
Figure PCTCN2019091269-appb-000011
Figure PCTCN2019091269-appb-000012
where t_s represents one computation cycle, TotalBackup represents the maximum number of backup tasks that the entire cluster can start, τ represents an intermediate variable, and N represents the total number of tasks.
Step 2: Predict the completion time of executing tasks, specifically including:
Step 2.1: Construct the weight XGBoost model WXG (WeightXGboost) and the remaining-completion-time XGBoost model TXG (TimeXGboost).
Step 2.2: The Map execution task consists of two stages, map and sort. The method for predicting the completion time of the Map execution task specifically includes:
Step 2.2.1: Input the matrices x1 [load information of each stage] and x2 [node load prediction information, amount of data remaining in the current stage].
Step 2.2.2: According to formula (4), calculate the weights w_map and w_sort of the map and sort stages through the weight XGBoost model WXG (WeightXGboost).
w_map, w_sort = WXG(x1)   (4)
where w_map represents the weight of the map stage, w_sort represents the weight of the sort stage, x1 represents the load information of each stage, and WXG(x) represents the weight XGBoost model.
Step 2.2.3: According to formula (5), calculate the remaining completion time T_remain of each stage through the remaining-completion-time XGBoost model TXG (TimeXGboost).
T_remain = TXG(x2)   (5)
where T_remain represents the remaining completion time of each stage, x2 represents the node load prediction information and the amount of data remaining in the current stage, and TXG(x) represents the remaining-completion-time XGBoost model.
Step 2.2.4: Calculate the progress prog_m of each stage according to formula (6).
Figure PCTCN2019091269-appb-000013
where prog_m represents the progress of each stage and runtime represents the time for which each stage has been running.
Step 2.2.5: Calculate the completion progress prog_Map of the Map execution task according to formula (7).
prog_Map = Σ_{i∈{map,sort}} w_i * prog_i   (7)
where prog_Map represents the completion progress of the Map execution task, w_i represents the weight of each stage, and prog_i represents the progress of each stage.
Step 2.2.6: Calculate the completion time EstimatedEndTime of the Map execution task according to formula (8).
Figure PCTCN2019091269-appb-000014
where EstimatedEndTime represents the completion time of the Map execution task, now represents the current time, and starttime represents the start time of the task.
Step 2.3: The Reduce execution task is divided into three stages, copy, sort, and reduce. Predicting the completion time of the Reduce execution task specifically includes:
Step 2.3.1: Input the matrices x1 [load information of each stage] and x2 [node load prediction information, amount of data remaining in the current stage].
Step 2.3.2: According to formula (9), calculate the weights w_copy, w_sort, and w_reduce of the copy, sort, and reduce stages through the weight XGBoost model WXG (WeightXGboost).
w_copy, w_sort, w_reduce = WXG(x1)   (9)
where w_copy represents the weight of the copy stage, w_sort represents the weight of the sort stage, and w_reduce represents the weight of the reduce stage.
Step 2.3.3: According to formula (10), calculate the remaining completion time T_remain of each stage through the remaining-completion-time XGBoost model TXG (TimeXGboost).
T_remain = TXG(x2)   (10)
Step 2.3.4: Calculate the progress prog_m of each stage according to formula (11).
Figure PCTCN2019091269-appb-000015
Step 2.3.5: Calculate the completion progress prog_Reduce of the Reduce execution task according to formula (12).
prog_Reduce = Σ_{i∈{copy,sort,reduce}} w_i * prog_i   (12)
where prog_Reduce represents the completion progress of the Reduce execution task.
Step 2.3.6: Calculate the completion time EstimatedEndTime of the Reduce execution task according to formula (13).
Figure PCTCN2019091269-appb-000016
Step 3: Compare the maximum number of backup tasks with the number of backup tasks set by the AppMaster, take the minimum as the backup-task-count threshold, and set the initial number of backup tasks to 0 and the initial number of tasks to 0;
Step 4: Determine whether the number of backup tasks is less than or equal to the backup-task-count threshold; if yes, go to step 5; if no, return the number of backup tasks to the resource manager;
Step 5: Determine whether the number of tasks is less than the total number of tasks; if yes, go to step 6; if no, return the number of backup tasks to the resource manager;
Step 6: Predict the completion time of the backup task, specifically including:
Step 6.1: Calculate the failure rate of Map execution tasks and the failure rate of Reduce tasks on the node according to formula (14).
Figure PCTCN2019091269-appb-000017
where fail_Map and fail_Reduce represent the failure rates of the computing node when running Map tasks and Reduce tasks, respectively, Map_fail and Reduce_fail represent the numbers of historically failed Map tasks and Reduce tasks, respectively, and sum_Map and sum_Reduce represent the total numbers of Map tasks and Reduce tasks that have been run on the computing node.
Step 6.2: Calculate the weight parameter w of each stage through the weight XGBoost model WXG (WeightXGboost).
Step 6.3: According to formula (15), calculate the predicted completion time through the remaining-completion-time XGBoost model TXG (TimeXGboost).
Figure PCTCN2019091269-appb-000018
where runtime represents the predicted completion time of the backup task;
Step 6.4: Calculate the backup task completion time EstimatedEndTime_backup according to formula (16).
EstimatedEndTime_backup = now + runtime   (16)
where EstimatedEndTime_backup represents the completion time of the backup task.
Step 7: Compare the backup task completion time EstimatedEndTime_backup with the execution task completion time EstimatedEndTime. When EstimatedEndTime_backup ≥ EstimatedEndTime, do not start the backup, increase the number of tasks by 1, and go to step 4; when EstimatedEndTime_backup < EstimatedEndTime, start the backup, increase the number of backup tasks by 1, increase the number of tasks by 1, and go to step 4.
By comparing each job under each scheduler with speculative execution disabled, with LATE speculative execution enabled, and with the load-prediction-based Hadoop computing task speculative execution (LPS) of the present invention enabled, comparison diagrams of the completion times of each job set under each scheduler were plotted. The results are shown in FIGS. 3-6, and the following conclusions are drawn:
(1) Load changes affect task completion time. As shown in FIGS. 3, 4, 5, and 6, when the job scale is small, both LATE speculative execution and LPS speculative execution can reduce task completion time. When the job scale becomes large, task execution speed is clearly affected by computing-node performance; LPS speculative execution can predict task completion time in real time based on the load, and its accuracy in identifying lagging tasks is higher than that of LATE speculative execution, so even when the load becomes high, the LPS speculative execution mechanism can still effectively reduce job completion time.
(2) Too many backup tasks degrade cluster performance. As shown in FIGS. 3, 4, 5, and 6, when the task volume becomes large, enabling LATE speculative execution actually increases the original job completion time, whereas LPS speculative execution takes into account the change in the number of backup tasks caused by load changes, so it can still reduce the completion time of the job set when the cluster load is high.
The process of the present invention is completed in the computing nodes. The completion time of a task is predicted according to the load of its computing node and then passed to the AppMaster, which holds the number of tasks already backed up, as shown in FIG. 2. This count is compared with the number of backup tasks allowed by the resource manager's adaptive adjustment algorithm; if it is smaller, backing up can continue. The completion time of the backup task is then calculated, and if the predicted completion time of the running task is greater than the completion time of the backup task, the backup is started.
In summary, compared with Hadoop's native LATE speculative execution method, the load-prediction-based Hadoop computing task speculative execution method proposed herein can find lagging tasks more accurately and can adjust the number of backup tasks started according to the cluster load, effectively reducing job completion time while avoiding the resource competition caused by excessive backup tasks.

Claims (3)

  1. A Hadoop computing task speculative execution method based on load prediction, characterized in that the specific steps are as follows:
    Step 1: After the job is submitted, calculate the total number of tasks, and the resource manager adaptively adjusts the number of backup tasks to obtain the maximum number of backup tasks, specifically including steps 1.1 to 1.3:
    Step 1.1: Over the most recent period T, save in a linked list the time points at which idle computing nodes request computing resources from the resource manager;
    Step 1.2: Calculate the cluster idleness ρ through the cluster-idleness sensing method, specifically including:
    Step 1.2.1: Determine whether the linked-list length List.size is greater than the threshold L_max;
    Step 1.2.1.1: When List.size > L_max, determine whether the time difference T between the tail node and the head node of the linked list is greater than the threshold T_max;
    Step 1.2.1.1.1: When T > T_max, remove the head node and return to step 1.2.1;
    Step 1.2.1.1.2: When T ≤ T_max, go to step 1.2.2;
    Step 1.2.1.2: When List.size ≤ L_max, continue storing the time points at which computing nodes request computing resources from the resource manager, and return to step 1.2.1;
    Step 1.2.2: Calculate the cluster idleness ρ according to formula (1);
    Figure PCTCN2019091269-appb-100001
    where ρ represents the cluster idleness, List.size represents the length of the linked list, and T represents the time difference between the tail node and the head node of the linked list;
    Step 1.3: Calculate the maximum number of backup tasks TotalBackup of the cluster according to formulas (2) and (3);
    Figure PCTCN2019091269-appb-100002
    Figure PCTCN2019091269-appb-100003
    where t_s represents one computation cycle, TotalBackup represents the maximum number of backup tasks that the entire cluster can start, τ represents an intermediate variable, and N represents the total number of tasks;
    Step 2: Predict the completion time of executing tasks;
    Step 3: Compare the maximum number of backup tasks with the number of backup tasks set by the AppMaster, take the minimum as the backup-task-count threshold, and set the initial number of backup tasks to 0 and the initial number of tasks to 0;
    Step 4: Determine whether the number of backup tasks is less than or equal to the backup-task-count threshold; if yes, go to step 5; if no, return the number of backup tasks to the resource manager;
    Step 5: Determine whether the number of tasks is less than the total number of tasks; if yes, go to step 6; if no, return the number of backup tasks to the resource manager;
    Step 6: Predict the completion time of the backup task;
    Step 7: Compare the backup task completion time EstimatedEndTime_backup with the execution task completion time EstimatedEndTime; when EstimatedEndTime_backup ≥ EstimatedEndTime, do not start the backup, increase the number of tasks by 1, and go to step 4; when EstimatedEndTime_backup < EstimatedEndTime, start the backup, increase the number of backup tasks by 1, increase the number of tasks by 1, and go to step 4.
  2. The Hadoop computing task speculative execution method based on load prediction according to claim 1, characterized in that predicting the completion time of executing tasks specifically includes:
    Step 2.1: Construct the weight XGBoost model WXG and the remaining-completion-time XGBoost model TXG;
    Step 2.2: The Map execution task consists of two stages, map and sort. The method for predicting the completion time of the Map execution task specifically includes:
    Step 2.2.1: Input the matrices x1 [load information of each stage] and x2 [node load prediction information, amount of data remaining in the current stage];
    Step 2.2.2: According to formula (4), calculate the weights w_map and w_sort of the map and sort stages through the weight XGBoost model WXG;
    w_map, w_sort = WXG(x1)        (4)
    where w_map represents the weight of the map stage, w_sort represents the weight of the sort stage, x1 represents the load information of each stage, and WXG(x) represents the weight XGBoost model;
    Step 2.2.3: According to formula (5), calculate the remaining completion time T_remain of each stage through the remaining-completion-time XGBoost model TXG (TimeXGboost);
    T_remain = TXG(x2)         (5)
    where T_remain represents the remaining completion time of each stage, x2 represents the node load prediction information and the amount of data remaining in the current stage, and TXG(x) represents the remaining-completion-time XGBoost model;
    Step 2.2.4: Calculate the progress prog_m of each stage according to formula (6);
    Figure PCTCN2019091269-appb-100004
    where prog_m represents the progress of each stage and runtime represents the time for which each stage has been running;
    Step 2.2.5: Calculate the completion progress prog_Map of the Map execution task according to formula (7);
    prog_Map = Σ_{i∈{map,sort}} w_i * prog_i     (7)
    where prog_Map represents the completion progress of the Map execution task, w_i represents the weight of each stage, and prog_i represents the progress of each stage;
    Step 2.2.6: Calculate the completion time EstimatedEndTime of the Map execution task according to formula (8);
    Figure PCTCN2019091269-appb-100005
    where EstimatedEndTime represents the completion time of the Map execution task, now represents the current time, and starttime represents the start time of the task;
    Step 2.3: The Reduce execution task is divided into three stages, copy, sort, and reduce. Predicting the completion time of the Reduce execution task specifically includes:
    Step 2.3.1: Input the matrices x1 [load information of each stage] and x2 [node load prediction information, amount of data remaining in the current stage];
    Step 2.3.2: According to formula (9), calculate the weights w_copy, w_sort, and w_reduce of the copy, sort, and reduce stages through the weight XGBoost model WXG (WeightXGboost);
    w_copy, w_sort, w_reduce = WXG(x1)        (9)
    where w_copy represents the weight of the copy stage, w_sort represents the weight of the sort stage, and w_reduce represents the weight of the reduce stage;
    Step 2.3.3: According to formula (10), calculate the remaining completion time T_remain of each stage through the remaining-completion-time XGBoost model TXG (TimeXGboost);
    T_remain = TXG(x2)      (10)
    Step 2.3.4: Calculate the progress prog_m of each stage according to formula (11);
    Figure PCTCN2019091269-appb-100006
    Step 2.3.5: Calculate the completion progress prog_Reduce of the Reduce execution task according to formula (12);
    prog_Reduce = Σ_{i∈{copy,sort,reduce}} w_i * prog_i    (12)
    where prog_Reduce represents the completion progress of the Reduce execution task;
    Step 2.3.6: Calculate the completion time EstimatedEndTime of the Reduce execution task according to formula (13);
    Figure PCTCN2019091269-appb-100007
  3. The Hadoop computing task speculative execution method based on load prediction according to claim 1, characterized in that predicting the completion time of the backup task specifically includes:
    Step 6.1: Calculate the failure rate of Map execution tasks and the failure rate of Reduce tasks on the node according to formula (14);
    Figure PCTCN2019091269-appb-100008
    where fail_Map and fail_Reduce represent the failure rates of the computing node when running Map tasks and Reduce tasks, respectively, Map_fail and Reduce_fail represent the numbers of historically failed Map tasks and Reduce tasks, respectively, and sum_Map and sum_Reduce represent the total numbers of Map tasks and Reduce tasks that have been run on the computing node;
    Step 6.2: Calculate the weight parameter w of each stage through the weight XGBoost model WXG (WeightXGboost);
    Step 6.3: According to formula (15), calculate the predicted completion time through the remaining-completion-time XGBoost model TXG (TimeXGboost);
    Figure PCTCN2019091269-appb-100009
    where runtime represents the predicted completion time of the backup task;
    Step 6.4: Calculate the backup task completion time EstimatedEndTime_backup according to formula (16);
    EstimatedEndTime_backup = now + runtime      (16)
    where EstimatedEndTime_backup represents the completion time of the backup task.
PCT/CN2019/091269 2019-06-13 2019-06-14 一种基于负载预测的Hadoop计算任务推测执行方法 WO2020248227A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910510535.6A CN110221909B (zh) 2019-06-13 2019-06-13 一种基于负载预测的Hadoop计算任务推测执行方法
CN201910510535.6 2019-06-13

Publications (1)

Publication Number Publication Date
WO2020248227A1 true WO2020248227A1 (zh) 2020-12-17

Family

ID=67816959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/091269 WO2020248227A1 (zh) 2019-06-13 2019-06-14 一种基于负载预测的Hadoop计算任务推测执行方法

Country Status (2)

Country Link
CN (1) CN110221909B (zh)
WO (1) WO2020248227A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094155B (zh) * 2019-12-23 2023-07-18 中国移动通信集团辽宁有限公司 Hadoop平台下的任务调度方法及装置
CN111382013A (zh) * 2020-03-20 2020-07-07 平安科技(深圳)有限公司 一种自动调整定时备份作业发起时间的方法和装置
CN112328430A (zh) * 2020-11-03 2021-02-05 燕山大学 一种用于降低数据中心网络系统运行成本的方法
CN112506619B (zh) * 2020-12-18 2023-08-04 北京百度网讯科技有限公司 作业处理方法、装置、电子设备和存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440167A (zh) * 2013-09-04 2013-12-11 福州大学 Hadoop多作业环境下自学习反馈的任务调度方法
CN105138405A (zh) * 2015-08-06 2015-12-09 湖南大学 基于待释放资源列表的MapReduce任务推测执行方法和装置
CN106168912A (zh) * 2016-07-28 2016-11-30 重庆邮电大学 一种Hadoop大数据平台中基于备份任务运行时间估计的调度方法
US20180322178A1 (en) * 2017-05-08 2018-11-08 Salesforce.Com, Inc. Pseudo-synchronous processing by an analytic query and build cluster

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9857974B2 (en) * 2013-10-03 2018-01-02 International Business Machines Corporation Session execution decision
CN104239194A (zh) * 2014-09-12 2014-12-24 上海交通大学 基于bp神经网络的任务完成时间预测方法
US9672064B2 (en) * 2015-07-13 2017-06-06 Palo Alto Research Center Incorporated Dynamically adaptive, resource aware system and method for scheduling
CN105302647B (zh) * 2015-11-06 2019-04-16 南京信息工程大学 一种MapReduce中备份任务推测执行策略的优化方案
CN105487930B (zh) * 2015-12-01 2018-10-16 中国电子科技集团公司第二十八研究所 一种基于Hadoop的任务优化调度方法
US10346206B2 (en) * 2016-08-27 2019-07-09 International Business Machines Corporation System, method and computer program product for resource management in a distributed computation system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103440167A (zh) * 2013-09-04 2013-12-11 福州大学 Hadoop多作业环境下自学习反馈的任务调度方法
CN105138405A (zh) * 2015-08-06 2015-12-09 湖南大学 基于待释放资源列表的MapReduce任务推测执行方法和装置
CN106168912A (zh) * 2016-07-28 2016-11-30 重庆邮电大学 一种Hadoop大数据平台中基于备份任务运行时间估计的调度方法
US20180322178A1 (en) * 2017-05-08 2018-11-08 Salesforce.Com, Inc. Pseudo-synchronous processing by an analytic query and build cluster

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HU, DAN ET AL.: "Improved LATE Scheduling Algorithm on Hadoop Platform", Computer Engineering and Applications, vol. 50, no. 4, 15 February 2014, ISSN: 1002-8331 *
LI, DEYOU ET AL.: "The Research of a Big Data Storage System Constructed with Hadoop", Journal of Harbin University of Science and Technology, vol. 20, no. 4, 31 August 2015, ISSN: 1007-2683 *

Also Published As

Publication number Publication date
CN110221909A (zh) 2019-09-10
CN110221909B (zh) 2023-01-17

Similar Documents

Publication Publication Date Title
WO2020248227A1 (zh) 一种基于负载预测的Hadoop计算任务推测执行方法
WO2023184939A1 (zh) 基于深度强化学习的云数据中心自适应高效资源分配方法
CN109324875B (zh) 一种基于强化学习的数据中心服务器功耗管理与优化方法
KR102585591B1 (ko) 이기종 프로세서 기반 엣지 시스템에서 slo 달성을 위한 인공지능 추론 스케쥴러
JP5411587B2 (ja) マルチスレッド実行装置、マルチスレッド実行方法
CN110262897B (zh) 一种基于负载预测的Hadoop计算任务初始分配方法
US8356304B2 (en) Method and system for job scheduling
US9652027B2 (en) Thread scheduling based on performance state and idle state of processing units
CN110795238B (zh) 负载计算方法、装置、存储介质及电子设备
US20170206111A1 (en) Managing processing capacity provided to threads based upon load prediction
Huang et al. Novel heuristic speculative execution strategies in heterogeneous distributed environments
US20230127112A1 (en) Sub-idle thread priority class
TW201314433A (zh) 伺服器系統及其電源管理方法
KR101770736B1 (ko) 응용프로그램의 질의 스케쥴링을 이용한 시스템의 소모전력 절감 방법 및 그 방법을 이용하여 소모전력을 절감하는 휴대단말기
CN115878260A (zh) 一种低碳自适应云主机任务调度系统
CN105302647B (zh) 一种MapReduce中备份任务推测执行策略的优化方案
Fu et al. Optimizing speculative execution in spark heterogeneous environments
WO2024021475A1 (zh) 一种容器调度方法及装置
CN113094155B (zh) Hadoop平台下的任务调度方法及装置
WO2020249106A1 (zh) 用于处理数据组的可编程器件及处理数据组的方法
Ibrahim et al. Improving mapreduce performance with progress and feedback based speculative execution
JP2001350639A (ja) ソフトリアルタイムにおけるスケジューリング方法
CN111813512B (zh) 一种基于动态分区的高能效Spark任务调度方法
US9152451B2 (en) Method of distributing processor loading between real-time processor threads
CN117453388B (zh) 一种分布式算力智能调度系统及方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19932538

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19932538

Country of ref document: EP

Kind code of ref document: A1