WO2017005115A1 - Adaptive optimization method and apparatus for a distributed DAG system - Google Patents

Adaptive optimization method and apparatus for a distributed DAG system

Info

Publication number
WO2017005115A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
time
processing node
concurrency
Prior art date
Application number
PCT/CN2016/087461
Other languages
English (en)
French (fr)
Inventor
黄益聪
强琦
余骏
金晓军
廖新涛
Original Assignee
阿里巴巴集团控股有限公司
黄益聪
强琦
余骏
金晓军
廖新涛
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 黄益聪, 强琦, 余骏, 金晓军, 廖新涛 filed Critical 阿里巴巴集团控股有限公司
Publication of WO2017005115A1 publication Critical patent/WO2017005115A1/zh


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the present application relates to the field of computer technologies, and in particular, to an adaptive optimization method and apparatus for a distributed DAG system.
  • FIG. 1 illustrates the DAG topology of a job on the incremental computing platform Galaxy; each compute node (Model) in FIG. 1 is run concurrently by multiple execution units (Executors).
  • the types of compute nodes may include one or more of mapping processing (Mapper), reduction processing (Reduce), and aggregation processing (Merger).
  • the data stream flows in at the root nodes (i.e., the Source nodes) of the DAG graph, passes through the operator nodes at each level, and flows out at the leaf nodes (i.e., the Output nodes).
  • in FIG. 1 there are two root nodes (compute node 0 and compute node 1) and four leaf nodes: compute node 16 (Merger), compute node 17 (Merger), compute node 18 (Merger), and compute node 19 (Merger).
  • the computation performed by the compute nodes at each level of the DAG topology forms a pipeline for the data stream.
  • data flows in batches from the root nodes, and the results are output from the leaf nodes after passing through the compute nodes at each level.
  • the computing power and speed of the computing node may be adjusted by the concurrency of the computing node, that is, the number of concurrent execution units (Executor).
  • a fully loaded system requires the computing power of all compute nodes in the DAG to be consistent; if the computing power of a compute node is less than that of its upstream node, the data output by the upstream node piles up at that node awaiting processing, which degrades the throughput of the system.
  • the machine environment in which a job runs may change during the job's lifetime; for example, machines fail and the job is rescheduled from some machines onto others, or the running of other jobs in the cluster affects the job's network throughput or disk read/write speed.
  • the change of the operating environment will affect the processing speed of the computing node, that is, although the set concurrency is unchanged, the actual running speed of the computing node changes. Therefore, the parameters originally set may no longer apply.
  • the concurrency configured for one or several compute nodes may exceed the actual demand at runtime, so the cluster wastes resources running idle execution units that wait for data.
  • cluster performance falls below the source rate: the throughput of the pipeline is limited by the slowest pipeline stage or by a single compute node within it, and inappropriate parameter settings can leave the pipeline with a pronounced bottleneck, so cluster throughput falls far below the achievable theoretical value.
  • the main purpose of the present application is to provide an adaptive optimization method and apparatus for a distributed DAG system to overcome the technical problems of fixed and non-optimized node concurrency in the prior art, which leads to a reduction in system operation efficiency.
  • the distributed DAG system includes multiple compute nodes, and the method includes: acquiring performance data for each compute node; calculating, from the performance data, the time each compute node takes to process data records, and calculating the concurrency of the compute node from that time; and adjusting the current concurrency of the system according to the calculated concurrency.
  • the method further includes: performing a breadth-first traversal of the DAG system, acquiring the performance data of each traversed compute node, calculating from that data the time the node takes to process one data record, and calculating the concurrency of the node from the time it takes to process one data record.
  • the type of the computing node includes one or more of the following: a mapping processing node, a reduction processing node, and an aggregation processing node.
  • the method further includes: acquiring performance data of the computing node according to a type of the computing node.
  • the method further includes: storing the acquired performance data of the computing node to the storage system; reading the stored performance data from the storage system, and calculating a time at which the computing node processes the data record according to the read performance data.
  • the method further includes: for the aggregation processing node, acquiring a time taken by the aggregation processing node to execute the predetermined batch data, a quantity of data for executing the predetermined batch data, a time required for generating the checkpoint, and a batch of data generated by generating the checkpoint interval.
  • the method further includes: for the aggregation processing node, acquiring the time taken by the aggregation processing node to execute the predetermined batch of data, the number of records in the batch, the time required to generate a checkpoint, and the number of data batches between checkpoints; the node's processing time is computed as ppt_i = max(f_i/t_i, cpt_i/t_i*cb_i), where f is the time taken by the aggregation processing node to execute the predetermined batch of data, t is the number of records in the batch, cpt is the time required to generate a checkpoint, and cb is the number of data batches between checkpoints.
  • adjusting the current concurrency of the system according to the newly calculated concurrency includes: comparing the calculated new concurrency with the current concurrency of the system and, if the difference exceeds a preset threshold, initializing the system and running it with the new concurrency.
  • the distributed DAG system includes multiple compute nodes, and the apparatus includes: a data acquisition module, configured to acquire performance data for each compute node; a calculation module, configured to calculate, from the performance data, the time each compute node takes to process data records and to calculate the concurrency of the compute node from that time; and an adaptive optimization module, configured to adjust the current concurrency of the system according to the calculated concurrency.
  • the calculation module is further configured to perform a breadth-first traversal of the DAG system, acquire the performance data of each traversed compute node, calculate from that data the time the node takes to process one data record, and calculate the concurrency of the node from the time it takes to process one data record.
  • the types of the compute nodes include one or more of the following: a mapping processing node, a reduction processing node, and an aggregation processing node; and the data acquisition module is further configured to acquire the performance data of a compute node according to its type.
  • the device further includes: a storage module, configured to store the performance data of the compute nodes acquired by the data acquisition module, from which the calculation module reads the stored performance data so that it can calculate, from the read data, the time each compute node takes to process data records.
  • the data acquisition module includes: a second data acquisition module, configured to acquire, for the reduction processing node, a time taken by the reduction processing node to execute the predetermined batch data and a quantity of data for executing the predetermined batch data;
  • the data acquisition module includes: a third data acquisition module, configured to acquire, for the aggregation processing node, the time taken by the aggregation processing node to execute the predetermined batch of data, the number of records in the batch, the time required to generate a checkpoint, and the number of data batches between checkpoints;
  • the adaptive optimization module is further configured to compare the calculated new concurrency with the current concurrency of the system and, if the difference exceeds a preset threshold, initialize the system and run it with the new concurrency.
  • the concurrency of each compute node in the DAG topology is automatically optimized from runtime sampling data, so that the computing pipeline can run at full load without idling, waiting for data, or overloading, greatly reducing machine cost while improving system performance.
  • Figure 1 shows a prior art DAG topology diagram
  • FIG. 2 shows a flow chart of an adaptive optimization method of a distributed DAG system according to an embodiment of the present application
  • FIG. 3 is a flowchart showing an adaptive optimization method of a distributed DAG system according to another embodiment of the present application.
  • FIG. 4 is a block diagram showing the structure of an adaptive optimization apparatus of a distributed DAG system according to an embodiment of the present application
  • FIG. 5 is a block diagram showing the structure of an adaptive optimization apparatus of a distributed DAG system according to another embodiment of the present application.
  • An adaptive optimization method for a distributed DAG system is provided according to an embodiment of the present application.
  • FIG. 2 illustrates an adaptive optimization method of a distributed DAG system according to an embodiment of the present application; the flowchart is shown in FIG. 2, and the method includes:
  • Step S202 Acquire performance data of each computing node.
  • the distributed DAG system includes multiple compute nodes; after the target system starts running, the performance data of each compute node (Model) is collected in real time, the sampled data is written to a reliable storage system, and when a node's performance data is needed for calculation, the stored performance data is read from the storage system.
  • the types of compute nodes include, but are not limited to, one or more of the following: a Mapper, a Reduce, and a Merger. The performance data of a compute node therefore needs to be acquired according to its type: for the mapping processing node, the time (l) for the node to execute one data record is acquired; for the reduction processing node, the time (f) taken to execute the predetermined batch of data and the number of records (t) in the batch are acquired; for the aggregation processing node, the time (f) taken to execute the predetermined batch of data, the number of records (t) in the batch, the time (cpt) required to generate a checkpoint, and the number (cb) of data batches between checkpoints are acquired.
  • Step S204 calculating, according to the performance data, a time for each computing node to process the data record, and calculating a concurrency of the computing node according to a time when the computing node processes the data record.
  • the degree of concurrency of a computing node refers to the number of concurrent execution units (Executors) of a computing node.
  • the PPT (Pure Processing Time) of a node is computed per formula (1): ppt_i = l_i for a mapping processing node; ppt_i = f_i/t_i for a reduction processing node; ppt_i = max(f_i/t_i, cpt_i/t_i*cb_i) for an aggregation processing node, where l is the time for the mapping processing node to execute one data record, f is the time taken by the reduction or aggregation processing node to execute a batch of data, t is the number of records in the batch, cpt is the time required to generate a checkpoint, and cb is the number of data batches between checkpoints, i.e., how many batches of data pass between consecutive checkpoints.
  • for a mapping processing node, the PPT equals the time the node takes to execute one data record (i.e., l); for a reduction processing node, the PPT equals the ratio of the time taken to execute a batch of data to the number of records in the batch (i.e., f/t); for an aggregation processing node, the calculation is more involved: first compute the ratio of the batch execution time to the batch record count (i.e., f/t), then compute the ratio of the checkpoint-generation time to the batch record count multiplied by the number of batches between checkpoints (i.e., cpt/t*cb), and take the larger of f/t and cpt/t*cb as the aggregation node's PPT.
  • the concurrency is then computed per formula (2): adjV.dop = Sum(v.dop*v.ratio*adjV.ppt/v.ppt), where adjV represents the current compute node, v represents an upstream adjacent compute node of the current node, Sum() represents summation over all upstream adjacent compute nodes of the current node, and ratio represents the throughput rate of the compute node, equal to the number of data records (tuples) the node outputs divided by the number of data records (tuples) it receives.
  • Step S206 adjusting the current concurrency of the system according to the calculated concurrency.
  • the concurrency of each computing node is adaptively optimized according to the running time sampling data, so that the speed of each pipeline can be automatically aligned according to the speed of the source data source, and the cluster performance is improved while saving the machine cost.
  • FIG. 3 is a flowchart of an adaptive optimization method of a distributed DAG system according to another embodiment of the present application. Referring to FIG. 3, the method includes:
  • Step S302 after the system is initialized, start to collect performance data of each computing node (Model) in the system.
  • the sampling interval can be customized; for example, the performance data of the compute nodes may be collected every 15, 30, or 60 seconds.
  • the types of computing nodes include, but are not limited to, one or more of the following: a mapping processing node (Mapper), a reduction processing node (Reduce), and an aggregation processing node (Merger).
  • for the mapping processing node, collect the time the node takes to execute one data record; for the reduction processing node, collect the time taken to execute the predetermined batch of data and the number of records in the batch; for the aggregation processing node, collect the time taken to execute the predetermined batch of data, the number of records in the batch, the time required to generate a checkpoint, and the number of data batches between checkpoints.
  • the sampled data is stored in a reliable storage system.
  • the storage system may be, for example, a distributed storage system (HBase), or may be another reliable storage system.
  • Step S306 the sampling data of the period of time is read from the storage system every predetermined operation period (for example, 15, 30 or 60 minutes), and the new concurrency of the computing node is calculated.
  • Step S310: automatically optimize node concurrency according to the calculated new concurrency. Specifically, the concurrency of the current compute nodes is read from the system's configuration file and compared with the new concurrency; if the difference is greater than a preset threshold, the system is reinitialized and re-run with the new concurrency.
  • Table 1 shows the data comparison before and after the optimization of the main performance indicators of the task (Job) of the data computing platform (Galaxy) online resource consumption.
  • the task tcif_rp_view_taobao_app is the task that occupies the most resources in the Galaxy cluster.
  • before the performance optimization, the resources required by the task were configured through the configuration file: the task requires 300 worker processes (worker_num), each worker process bound to 4 CPUs (cpu_bind) and using 3 GB of memory (Memory), for a total resource requirement of 1200 CPUs and 900 GB of memory; each worker process includes multiple Executor threads, and the original configuration file configured 2947 Executor threads.
  • the data flows from the real-time data transmission platform into the Galaxy cluster in batches, with 1000 data records per batch of data.
  • the Galaxy cluster generates a DAG model according to the computing task.
  • the DAG includes multiple computing nodes, and the applied physical resources are allocated to the computing nodes according to the configuration file.
  • a fragment of the configuration file may specify, for example, that compute node 0 uses 256 Executor threads, that is, the concurrency of compute node 0 is 256, and that compute node 1 uses 76 execution unit threads, that is, the concurrency of compute node 1 is 76.
  • compute node 3 and compute node 5 are the upstream adjacent compute nodes of compute node 7 (compute node 7 depends on the results of compute nodes 3 and 5), compute node 0 is the upstream adjacent node of compute node 3 (compute node 3 depends on the result of compute node 0), and compute node 1 is the upstream adjacent node of compute node 5 (compute node 5 depends on the result of compute node 1); compute node 0 and compute node 1 are root nodes.
  • the types of compute nodes 3, 5, and 7 are mapping processing nodes; the l value of compute node 0 (the time to execute one data record) is 0.2 seconds, and the l value of compute node 1 is 0.1 seconds.
  • the l value of compute node 3 is 0.5 seconds, the l value of compute node 5 is 0.3 seconds, and the l value of compute node 7 is 0.6 seconds; with the throughput rates of compute nodes 3, 5, and 7 set to 1, the concurrencies of compute nodes 3, 5, and 7 are:
  • Model3.dop = Model0.dop*Model0.ratio*Model3.ppt/Model0.ppt = 256*1*0.5/0.2 = 640
  • Model5.dop = Model1.dop*Model1.ratio*Model5.ppt/Model1.ppt = 76*1*0.3/0.1 = 228
  • Model7.dop = Model3.dop*Model3.ratio*Model7.ppt/Model3.ppt + Model5.dop*Model5.ratio*Model7.ppt/Model5.ppt = 640*1*0.6/0.5 + 228*1*0.6/0.3 = 1224
  • the new concurrencies of compute nodes 3, 5, and 7 are thus obtained; the concurrency of each compute node in the system is calculated according to the same principle, and the system is then reinitialized and re-run with the new concurrency.
  • the total resource requirement of the task tcif_rp_view_taobao_app is reduced from 1200 CPUs to 300 CPUs, and the delay of the task (BatchLatency, data from the source to the output model) is reduced from 2.58 milliseconds to 1.62 milliseconds.
  • the task tcif_rp_view_taobao_app saves 900 CPU cores while its performance improves by about 60%.
  • optimizing the six tasks with the highest online resource consumption on the data computing platform can save 2040 CPU cores; system resource cost decreases by 75% while performance improves by 30%.
  • An adaptive optimization apparatus for a distributed DAG system is also provided according to an embodiment of the present application.
  • FIG. 4 is a structural block diagram of an adaptive optimization apparatus of a distributed DAG system according to an embodiment of the present application. Referring to FIG. 4, the apparatus includes:
  • the data obtaining module 410 is configured to obtain performance data of each computing node.
  • the calculating module 420 is configured to calculate, according to the performance data, a time for each computing node to process the data record, and calculate a concurrency of the computing node according to a time when the computing node processes the data record; further, the calculating module 420 is further configured to: Performing breadth-first traversal on the DAG system, obtaining performance data of the traversed computing node and calculating the concurrency of the computing node.
  • the adaptive optimization module 430 is configured to adjust the current concurrency of the system according to the calculated concurrency. Specifically, the adaptive optimization module 430 is further configured to compare the new concurrency with the current concurrency of the system, and if the difference is greater than the preset threshold, initialize the system to run the system with a new concurrency.
  • the type of the computing node includes: a mapping processing node, a reduction processing node, and an aggregation processing node; the data obtaining module 410 is further configured to acquire the computing node according to the type of the computing node. Performance data.
  • the data acquisition module 410 further includes: a first data acquisition module 512, a second data acquisition module 514, and a third data acquisition module 516.
  • the first data acquisition module 512 is configured to acquire, for the mapping processing node, the time (l) for the node to execute one data record; the second data acquisition module 514 is configured to acquire, for the reduction processing node, the time (f) taken by the node to execute the predetermined batch of data and the number of records (t) in the batch; the third data acquisition module 516 is configured to acquire, for the aggregation processing node, the time (f) taken to execute the predetermined batch of data, the number of records (t) in the batch, the time (cpt) required to generate a checkpoint, and the number (cb) of data batches between checkpoints.
  • the calculation module 420 further includes: a first calculation module 522, a second calculation module 524, a third calculation module 526, and a fourth calculation module 528.
  • the apparatus further includes: a storage module 440, configured to store the performance data of the compute nodes acquired by the data acquisition module 410; the calculation module 420 reads the stored performance data from the storage module and calculates, from the read data, the time each compute node takes to process data records.
  • the concurrency of each compute node of the DAG topology is automatically optimized according to runtime sampling data, so that the computing pipeline can run at full load (neither idling waiting for data nor overloaded), improving system performance while significantly reducing machine costs.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application may be embodied in the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses an adaptive optimization method and apparatus for a distributed DAG system. The method includes: acquiring performance data for each compute node; calculating, from the performance data, the time each compute node takes to process data records, and calculating the concurrency of the compute node from that time; and adjusting the current concurrency of the system according to the calculated concurrency. The application enables the computing pipeline to run consistently at full load, greatly reducing machine cost while improving system performance.

Description

Adaptive optimization method and apparatus for a distributed DAG system

Technical Field

The present application relates to the field of computer technology, and in particular to an adaptive optimization method and apparatus for a distributed DAG system.

Background
A DAG (Directed Acyclic Graph) can be used to describe the workflow of a distributed offline/online system. FIG. 1 illustrates the DAG topology of a job (Job) on the incremental computing platform Galaxy; each compute node (Model) in FIG. 1 is run concurrently by multiple execution units (Executors). In a typical distributed offline/online computing platform, the types of compute nodes may include one or more of mapping processing (Mapper), reduction processing (Reduce), and aggregation processing (Merger).

In a DAG system, the data stream flows in at the root nodes (i.e., the Source nodes) of the DAG graph, passes through the operator nodes at each level, and flows out at the leaf nodes (i.e., the Output nodes). A DAG may have one or more root nodes; FIG. 1 contains two, compute node 0 and compute node 1. There may likewise be one or more leaf nodes; FIG. 1 contains four: compute node 16 (Merger), compute node 17 (Merger), compute node 18 (Merger), and compute node 19 (Merger).

The computation performed by the compute nodes at each level of the DAG topology forms a pipeline for the data stream: data flows in batches from the root nodes, passes level by level through the compute nodes, and the results are output from the leaf nodes. The computing power and speed of a compute node can be adjusted through its concurrency, i.e., the number of its concurrent execution units (Executors).

A system running perfectly at full load requires the computing power of all compute nodes in the DAG to be consistent. If the computing power of some node is lower than that of its upstream node, the data output by the upstream node piles up at that node awaiting processing, which degrades the throughput of the system.

In the prior art, the widely used distributed offline/online pipeline-based computing platforms rely on manual settings for node concurrency, which remain unchanged throughout the lifetime of a job (Job). The specific steps are as follows:

(1) Before submitting a job, the user or system administrator specifies the concurrency of each compute node in a configuration file;

(2) When the job is submitted, the system reads the configuration file and sets the running concurrency of each compute node accordingly;

(3) The concurrency of each compute node remains unchanged until the job finishes.

Because the existing technique relies on manual settings that stay fixed for the lifetime of the job, it has the following drawbacks:

(1) It is hard to obtain, from manual experience alone, perfect concurrency parameters that let all pipeline stages of the DAG run in step. As noted above, the parameters must be set before the job is submitted, so the true runtime performance of each compute node is unknown when the parameters are first chosen. The user may need several iterations (set parameters, submit the job, observe performance, reset parameters, resubmit, and so on) to arrive at a reasonably suitable set of parameters, and because this depends on manual observation and tuning, the parameters are likely not optimal.

(2) It cannot adapt to changes in the cluster environment. During the lifetime of a job, the machine environment in which it runs may change: machines fail and the job is rescheduled from some machines onto others, or the running of other jobs in the cluster affects the job's network throughput or disk read/write speed. Such changes in the operating environment affect the processing speed of the compute nodes; that is, although the configured concurrency is unchanged, the actual running speed of the nodes changes, so the originally configured parameters may no longer apply.

(3) It cannot adapt to changes in the data stream. In a real-time streaming system the data stream is usually not steady; business demand can produce pronounced peak and idle traffic. Fixed parameters cannot reflect these variations, and to cope with peak traffic a relatively high concurrency is usually configured, which wastes cluster resources.

These drawbacks have the following consequences:

(1) Machine resources are wasted. The concurrency configured for one or several compute nodes may exceed the actual demand at runtime, so the cluster wastes resources running idle execution units that wait for data.

(2) Cluster performance falls below the source rate. The throughput of the pipeline is limited by the slowest pipeline stage, or by a single compute node within it. Inappropriate parameter settings can leave the pipeline with a pronounced bottleneck, and cluster throughput falls far below the achievable theoretical optimum.
Summary

The main purpose of the present application is to provide an adaptive optimization method and apparatus for a distributed DAG system, to overcome the prior-art problem that fixed, non-optimal node concurrency reduces the operating efficiency of the system.

According to an embodiment of the present application, an adaptive optimization method for a distributed directed acyclic graph (DAG) system is provided. The distributed DAG system includes multiple compute nodes, and the method includes: acquiring performance data for each compute node; calculating, from the performance data, the time each compute node takes to process data records, and calculating the concurrency of the compute node from that time; and adjusting the current concurrency of the system according to the calculated concurrency.

The method further includes: performing a breadth-first traversal of the DAG system, acquiring the performance data of each traversed compute node, calculating from that data the time the node takes to process one data record, and calculating the concurrency of the node from the time it takes to process one data record.

The types of the compute nodes include one or more of: mapping processing node, reduction processing node, aggregation processing node. The method further includes: acquiring the performance data of a compute node according to its type.

The method further includes: storing the acquired performance data of the compute nodes in a storage system; and reading the stored performance data from the storage system and calculating from the read data the time each compute node takes to process data records.

Further, for a mapping processing node, the time the node takes to execute one data record is acquired, and the node's record-processing time is calculated as ppt_i = l_i, where l is the time for the mapping processing node to execute one data record.

Further, for a reduction processing node, the time the node takes to execute a predetermined batch of data and the number of records in the batch are acquired, and the node's record-processing time is calculated as ppt_i = f_i/t_i, where f is the time taken by the reduction processing node to execute the predetermined batch and t is the number of records in the batch.

Further, for an aggregation processing node, the time the node takes to execute a predetermined batch of data, the number of records in the batch, the time required to generate a checkpoint, and the number of data batches between checkpoints are acquired, and the node's record-processing time is calculated as ppt_i = max(f_i/t_i, cpt_i/t_i*cb_i), where f is the time taken by the aggregation processing node to execute the predetermined batch, t is the number of records in the batch, cpt is the time required to generate a checkpoint, and cb is the number of data batches between checkpoints.

The concurrency of a compute node is calculated as adjV.dop = Sum(v.dop*v.ratio*adjV.ppt/v.ppt), where adjV is the current compute node, v is an upstream adjacent compute node of the current node, Sum() denotes summation over all upstream adjacent compute nodes of the current node, and ratio is the throughput rate of the compute node.

Adjusting the current concurrency of the system according to the newly calculated concurrency includes: comparing the calculated new concurrency with the current concurrency of the system and, if the difference exceeds a preset threshold, initializing the system and running it with the new concurrency.

According to an embodiment of the present application, an adaptive optimization apparatus for a distributed DAG system is also provided. The distributed DAG system includes multiple compute nodes, and the apparatus includes: a data acquisition module, configured to acquire performance data for each compute node; a calculation module, configured to calculate, from the performance data, the time each compute node takes to process data records and to calculate the concurrency of the compute node from that time; and an adaptive optimization module, configured to adjust the current concurrency of the system according to the calculated concurrency.

The calculation module is further configured to perform a breadth-first traversal of the DAG system, acquire the performance data of each traversed compute node, calculate from that data the time the node takes to process one data record, and calculate the concurrency of the node from that time.

The types of the compute nodes include one or more of: mapping processing node, reduction processing node, aggregation processing node; the data acquisition module is further configured to acquire the performance data of a compute node according to its type.

The apparatus further includes: a storage module, configured to store the performance data acquired by the data acquisition module, from which the calculation module reads the stored performance data so as to calculate, from the read data, the time each compute node takes to process data records.

The data acquisition module includes: a first data acquisition module, configured to acquire, for a mapping processing node, the time the node takes to execute one data record; the calculation module includes: a first calculation module, configured to calculate the mapping node's record-processing time as ppt_i = l_i, where l is the time for the mapping processing node to execute one data record.

The data acquisition module includes: a second data acquisition module, configured to acquire, for a reduction processing node, the time the node takes to execute a predetermined batch of data and the number of records in the batch; the calculation module includes: a second calculation module, configured to calculate the reduction node's record-processing time as ppt_i = f_i/t_i, where f is the sampled time taken by the reduction processing node to execute the predetermined batch and t is the number of records in the batch.

The data acquisition module includes: a third data acquisition module, configured to acquire, for an aggregation processing node, the time the node takes to execute a predetermined batch of data, the number of records in the batch, the time required to generate a checkpoint, and the number of data batches between checkpoints; the calculation module includes: a third calculation module, configured to calculate the aggregation node's record-processing time as ppt_i = max(f_i/t_i, cpt_i/t_i*cb_i), where f is the time taken by the aggregation processing node to execute the predetermined batch, t is the number of records in the batch, cpt is the time required to generate a checkpoint, and cb is the number of data batches between checkpoints.

The calculation module further includes: a fourth calculation module, configured to calculate the concurrency of a compute node as adjV.dop = Sum(v.dop*v.ratio*adjV.ppt/v.ppt), where adjV is the current compute node, v is an upstream adjacent compute node of the current node, Sum() denotes summation over all upstream adjacent compute nodes of the current node, and ratio is the throughput rate of the compute node.

The adaptive optimization module is further configured to compare the calculated new concurrency with the current concurrency of the system and, if the difference exceeds a preset threshold, initialize the system and run it with the new concurrency.

According to the technical solution of the present application, the concurrency of each compute node in the DAG topology is automatically optimized from runtime sampling data, so that the computing pipeline can run consistently at full load, neither idling while waiting for data nor overloaded, greatly reducing machine cost while improving system performance.
Brief Description of the Drawings

The drawings described here are provided for further understanding of the present application and constitute a part of it; the illustrative embodiments of the present application and their descriptions explain the application and do not unduly limit it. In the drawings:

FIG. 1 shows a prior-art DAG topology;

FIG. 2 shows a flowchart of an adaptive optimization method for a distributed DAG system according to one embodiment of the present application;

FIG. 3 shows a flowchart of an adaptive optimization method for a distributed DAG system according to another embodiment of the present application;

FIG. 4 shows a structural block diagram of an adaptive optimization apparatus for a distributed DAG system according to one embodiment of the present application;

FIG. 5 shows a structural block diagram of an adaptive optimization apparatus for a distributed DAG system according to another embodiment of the present application.

Detailed Description

To make the purpose, technical solution, and advantages of the present application clearer, the technical solution is described below clearly and completely with reference to specific embodiments of the application and the corresponding drawings. The described embodiments are evidently only some, not all, of the embodiments of the application; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.
An adaptive optimization method for a distributed DAG system is provided according to an embodiment of the present application.

FIG. 2 shows a flowchart of an adaptive optimization method for a distributed DAG system according to one embodiment of the present application. As shown in FIG. 2, the method includes:

Step S202: acquire performance data for each compute node.

The distributed DAG system includes multiple compute nodes. After the target system starts running, the performance data of each compute node (Model) is collected in real time; the sampled data is then written to a reliable storage system, and when a node's performance data is needed for calculation, the stored data is read from the storage system.

In one embodiment of the present application, the types of compute nodes include, but are not limited to, one or more of: mapping processing node (Mapper), reduction processing node (Reduce), aggregation processing node (Merger). A node's performance data must therefore be acquired according to its type. For example: for a mapping processing node, the time the node takes to execute one data record (l) is acquired; for a reduction processing node, the time taken to execute a predetermined batch of data (f) and the number of records in the batch (t) are acquired; and for an aggregation processing node, the time taken to execute a predetermined batch of data (f), the number of records in the batch (t), the time required to generate a checkpoint (cpt), and the number of data batches between checkpoints (cb) are acquired.

Step S204: calculate, from the performance data, the time each compute node takes to process data records, and calculate the concurrency of the compute node from that time.

Every operation period (for example, every 30 minutes), the sampled data for that interval is read from the storage system and a new concurrency is computed for each compute node. The concurrency of a compute node is the number of its concurrent execution units (Executors).

Specifically, all source compute nodes (Model Sources) are added to a visit set VisitQ, and a breadth-first traversal of the DAG is started from VisitQ. For each compute node visited, its performance data is fetched and the time for the node to process one data record (Pure Processing Time, PPT) is calculated according to formula (1):

ppt_i = l_i (mapping processing node); ppt_i = f_i/t_i (reduction processing node); ppt_i = max(f_i/t_i, cpt_i/t_i*cb_i) (aggregation processing node)          (1)

where: l is the time for the mapping processing node to execute one data record;

f is the time taken by the reduction or aggregation processing node to execute a batch of data;

t is the number of records in the batch executed by the reduction or aggregation processing node;

cpt is the time required to generate a checkpoint;

cb is the number of data batches between checkpoints, i.e., how many batches of data pass between consecutive checkpoints.

Formula (1) shows that a node's PPT is computed with a different expression for each node type. For a mapping processing node, the PPT equals the time to execute one data record (i.e., l). For a reduction processing node, the PPT equals the ratio of the time taken to execute a batch of data to the number of records in the batch (i.e., f/t). For an aggregation processing node, the calculation is more involved: first compute the ratio of batch execution time to batch record count (i.e., f/t); then compute the ratio of checkpoint-generation time to batch record count multiplied by the number of batches between checkpoints (i.e., cpt/t*cb); finally, take the larger of f/t and cpt/t*cb as the aggregation node's PPT.
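For illustration only, the per-type computation of formula (1) can be sketched in Python as follows; the PerfSample container, its field names, and the node-type tags are assumptions made for this sketch, not part of the Galaxy platform's API:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class PerfSample:
        """Sampled performance data for one compute node (Model).

        Which fields are populated depends on the node type; the symbols
        follow formula (1): l, f, t, cpt, cb.
        """
        node_type: str               # "mapper", "reducer" or "merger"
        l: Optional[float] = None    # mapper: time to execute one data record (s)
        f: Optional[float] = None    # reducer/merger: time to execute one batch (s)
        t: Optional[int] = None      # reducer/merger: records per batch
        cpt: Optional[float] = None  # merger: time to generate one checkpoint (s)
        cb: Optional[int] = None     # merger: data batches between checkpoints
        in_tuples: int = 1           # tuples received while sampling (for ratio)
        out_tuples: int = 1          # tuples emitted while sampling (for ratio)

    def compute_ppt(s: PerfSample) -> float:
        """Pure Processing Time per data record, per formula (1)."""
        if s.node_type == "mapper":
            return s.l
        if s.node_type == "reducer":
            return s.f / s.t
        if s.node_type == "merger":
            # Larger of the plain batch cost f/t and the checkpoint
            # term cpt/t*cb, exactly as formula (1) prescribes.
            return max(s.f / s.t, s.cpt / s.t * s.cb)
        raise ValueError(f"unknown node type: {s.node_type}")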
Then the degree of parallelism (DOP) of the current compute node is calculated according to formula (2):

adjV.dop = Sum(v.dop*v.ratio*adjV.ppt/v.ppt)          (2)

where adjV denotes the current compute node, v denotes an upstream adjacent compute node of the current node, Sum() denotes summation over all upstream adjacent compute nodes of the current node, and ratio denotes the throughput rate of a compute node, equal to the number of data records (tuples) the node outputs divided by the number of data records (tuples) it receives.
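Formula (2) can likewise be sketched with plain dicts standing in for the DAG's adjacency lists and per-node statistics; a Kahn-style topological pass is used so that every upstream DOP is already known when a node is reached, which is the effect the breadth-first traversal from VisitQ achieves on a level-structured DAG:

    from collections import deque

    def compute_dops(downstream, upstream, source_dop, ppt, ratio):
        """Degree of parallelism (DOP) for every node, per formula (2).

        downstream/upstream: dicts mapping node -> list of adjacent nodes
        source_dop:          configured concurrency of each root (Source) node
        ppt, ratio:          per-node pure processing time and throughput rate
        """
        dop = dict(source_dop)
        nodes = set(downstream) | {c for cs in downstream.values() for c in cs}
        pending = {n: len(upstream.get(n, [])) for n in nodes}
        queue = deque(source_dop)  # root nodes have no upstream dependencies
        while queue:
            node = queue.popleft()
            if node not in dop:
                # Formula (2): sum over all upstream adjacent nodes v.
                dop[node] = sum(dop[v] * ratio[v] * ppt[node] / ppt[v]
                                for v in upstream[node])
            for child in downstream.get(node, []):
                pending[child] -= 1
                if pending[child] == 0:
                    queue.append(child)
        return dop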
Step S206: adjust the current concurrency of the system according to the calculated concurrency.

Through the above embodiment, the concurrency of each compute node is adaptively optimized from runtime sampling data, so that the speed of each pipeline stage aligns automatically with the speed of the source data, improving cluster performance while saving machine cost.

The details of an embodiment of the present application are described below with reference to FIG. 3. FIG. 3 shows a flowchart of an adaptive optimization method for a distributed DAG system according to another embodiment; referring to FIG. 3, the method includes:

Step S302: after the system is initialized, begin collecting performance data for every compute node (Model) in the system. The sampling interval is customizable; for example, node performance data may be collected every 15, 30, or 60 seconds.

In the present application, the types of compute nodes include, but are not limited to, one or more of: mapping processing node (Mapper), reduction processing node (Reduce), aggregation processing node (Merger). For a mapping processing node, collect the time the node takes to execute one data record; for a reduction processing node, collect the time taken to execute the predetermined batch of data and the number of records in the batch; for an aggregation processing node, collect the time taken to execute the predetermined batch of data, the number of records in the batch, the time required to generate a checkpoint, and the number of data batches between checkpoints.

Step S304: store the sampled data in a reliable storage system, which may be, for example, a distributed storage system such as HBase, or any other reliable storage system.
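As a minimal sketch of steps S302 and S304 (under the assumption of hypothetical node.collect_sample(), node.name, and store.put() interfaces, with store standing for any reliable storage system such as an HBase table wrapper, and stop_event a threading.Event that ends the loop):

    import json
    import time

    SAMPLE_INTERVAL_S = 30  # customizable, e.g. every 15, 30 or 60 seconds

    def sampling_loop(nodes, store, stop_event):
        """Periodically collect per-node performance data and persist it."""
        while not stop_event.is_set():
            now = int(time.time())
            for node in nodes:
                sample = node.collect_sample()  # dict with l/f/t/cpt/cb fields
                store.put(f"{node.name}:{now}", json.dumps(sample))
            stop_event.wait(SAMPLE_INTERVAL_S)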
Step S306: every predetermined operation period (for example, 15, 30, or 60 minutes), read the sampled data for that interval from the storage system and compute new concurrency values for the compute nodes. Perform a breadth-first traversal of the DAG system, read the performance data of each traversed node from the storage system, and compute from the samples the time the node takes to process data records (PPT). Specifically:

compute the PPT of a mapping processing node as ppt_i = l_i;

compute the PPT of a reduction processing node as ppt_i = f_i/t_i;

compute the PPT of an aggregation processing node as ppt_i = max(f_i/t_i, cpt_i/t_i*cb_i).

Step S308: compute each node's new degree of parallelism (DOP) as adjV.dop = Sum(v.dop*v.ratio*adjV.ppt/v.ppt).

Step S310: automatically optimize node concurrency according to the newly computed values. Specifically, read the current node concurrency from the system's configuration file and compare the new concurrency with it; if the difference exceeds a preset threshold, reinitialize the system and re-run it with the new concurrency.
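Steps S306 to S310 then amount to the control loop sketched below, reusing compute_ppt and compute_dops from the sketches above; the system facade, its method names, and the concrete threshold value are assumptions for illustration rather than the platform's actual API:

    REOPT_PERIOD_S = 30 * 60  # predetermined operation period, e.g. 15/30/60 min
    DOP_THRESHOLD = 16        # preset threshold; the concrete value is assumed

    def optimization_cycle(system):
        """One optimization round: read samples, recompute DOPs, restart if needed."""
        samples = system.read_samples(last_seconds=REOPT_PERIOD_S)   # step S306
        ppt = {n: compute_ppt(s) for n, s in samples.items()}
        ratio = {n: s.out_tuples / s.in_tuples for n, s in samples.items()}
        new_dop = compute_dops(system.downstream, system.upstream,   # step S308
                               system.source_dop, ppt, ratio)
        current = system.read_dop_from_config()                      # step S310
        if any(abs(new_dop[n] - current.get(n, 0)) > DOP_THRESHOLD
               for n in new_dop):
            system.reinitialize(new_dop)  # re-run with the new concurrency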
The present application is described in detail below with reference to a concrete example. Table 1 shows a before/after comparison of the main performance indicators of the jobs (Job) with the highest online resource consumption on the Galaxy data computing platform.

Table 1
[Table 1 is reproduced as an image in the original publication; the key before/after figures it contains are quoted in the text below.]
In Table 1, the job tcif_rp_view_taobao_app occupies the most resources in the Galaxy cluster. Before the performance optimization, the resources the job needed were configured in its configuration file: the job required 300 worker processes (worker_num), each worker process bound to 4 CPUs (cpu_bind) and using 3 GB of memory (Memory), for a total requirement of 1200 CPUs and 900 GB of memory; each worker process in turn comprises multiple Executor threads, and the original configuration used 2947 Executor threads in total.

Data flows continuously, in batches of 1000 data records, from the real-time data transmission platform into the Galaxy cluster. The Galaxy cluster generates a DAG model from the computing task; the DAG contains multiple compute nodes, and the physical resources obtained are allocated to the compute nodes according to the configuration file, a fragment of which might be:
Model0.parallelism=256
Model1.parallelism=76
That is, compute node 0 is configured with 256 execution unit (Executor) threads, i.e., the concurrency of compute node 0 is 256; and compute node 1 is configured with 76 execution unit threads, i.e., the concurrency of compute node 1 is 76.
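Reading such settings back out of the configuration file could be sketched as follows; the ModelN.parallelism=K key-value format is inferred from the fragment above and may not match the platform's real configuration syntax:

    def read_parallelism(config_path):
        """Parse ModelN.parallelism=K entries into a {model: concurrency} dict."""
        dop = {}
        with open(config_path) as fh:
            for raw in fh:
                line = raw.strip()
                if line.startswith("#") or ".parallelism=" not in line:
                    continue
                key, value = line.split("=", 1)
                dop[key.split(".")[0]] = int(value)  # e.g. {"Model0": 256}
        return dop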
Within the DAG there are dependencies among the compute nodes. For example, compute node 3 and compute node 5 are the upstream adjacent nodes of compute node 7 (node 7 depends on the results of nodes 3 and 5), compute node 0 is the upstream adjacent node of compute node 3 (node 3 depends on the result of node 0), and compute node 1 is the upstream adjacent node of compute node 5 (node 5 depends on the result of node 1); compute node 0 and compute node 1 are root nodes.

To compute the concurrency of compute node 7, the concurrencies of compute nodes 3 and 5 must be computed first. In this example, compute nodes 3, 5, and 7 are mapping processing nodes. Sampling yields an l value (the time to execute one data record) of 0.2 seconds for compute node 0 and 0.1 seconds for compute node 1, and l values of 0.5 seconds for compute node 3, 0.3 seconds for compute node 5, and 0.6 seconds for compute node 7. With the throughput rates (ratio) of nodes 3, 5, and 7 all set to 1, the concurrencies of compute nodes 3, 5, and 7 are:
Model3.dop=Model0.dop*Model0.ratio*Model3.ppt/Model0.ppt
          =256*1*0.5/0.2=640
Model5.dop=Model1.dop*Model1.ratio*Model5.ppt/Model1.ppt
          =76*1*0.3/0.1=228
Model7.dop=Model3.dop*Model3.ratio*Model7.ppt/Model3.ppt+
Model5.dop*Model5.ratio*Model7.ppt/Model5.ppt
=640*1*0.6/0.5+228*1*0.6/0.3=1224
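Feeding these figures into the compute_dops sketch above reproduces the same numbers (all three downstream nodes are mapping nodes, so each node's PPT equals its l value):

    ppt = {"Model0": 0.2, "Model1": 0.1, "Model3": 0.5,
           "Model5": 0.3, "Model7": 0.6}
    ratio = {n: 1.0 for n in ppt}
    downstream = {"Model0": ["Model3"], "Model1": ["Model5"],
                  "Model3": ["Model7"], "Model5": ["Model7"], "Model7": []}
    upstream = {"Model3": ["Model0"], "Model5": ["Model1"],
                "Model7": ["Model3", "Model5"]}
    dops = compute_dops(downstream, upstream,
                        {"Model0": 256, "Model1": 76}, ppt, ratio)
    print(round(dops["Model3"]), round(dops["Model5"]), round(dops["Model7"]))
    # -> 640 228 1224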
The above computation yields the new concurrencies of compute nodes 3, 5, and 7; the concurrency of every compute node in the system is calculated by the same principle, the system is then reinitialized, and it is re-run with the new concurrencies. After this optimization, the total resource requirement of the job tcif_rp_view_taobao_app dropped from 1200 CPUs to 300 CPUs, and the job latency (BatchLatency, the delay of data from the source to the output model) dropped from 2.58 ms to 1.62 ms; the job thus saved 900 CPU cores while its performance improved by about 60%.

After applying the optimization of the embodiments of the present application, optimizing the six jobs with the highest online resource consumption on the data computing platform saved a total of 2040 CPU cores, cutting system resource cost by 75% while improving performance by an average of 30%.
An adaptive optimization apparatus for a distributed DAG system is also provided according to an embodiment of the present application. FIG. 4 shows a structural block diagram of the adaptive optimization apparatus according to one embodiment; referring to FIG. 4, the apparatus includes:

a data acquisition module 410, configured to acquire performance data for each compute node;

a calculation module 420, configured to calculate, from the performance data, the time each compute node takes to process data records and to calculate the concurrency of the compute node from that time; further, the calculation module 420 is also configured to perform a breadth-first traversal of the DAG system, acquire the performance data of each traversed compute node, and compute the node's concurrency;

an adaptive optimization module 430, configured to adjust the current concurrency of the system according to the calculated concurrency. Specifically, the adaptive optimization module 430 is further configured to compare the new concurrency with the system's current concurrency and, if the difference exceeds a preset threshold, initialize the system and run it with the new concurrency.

In one embodiment of the present application, the types of the compute nodes include: mapping processing node, reduction processing node, aggregation processing node; the data acquisition module 410 is further configured to acquire a node's performance data according to its type.

FIG. 5 is a structural block diagram of an adaptive optimization apparatus according to another embodiment of the present application. As shown in FIG. 5, the data acquisition module 410 further includes: a first data acquisition module 512, a second data acquisition module 514, and a third data acquisition module 516.

The first data acquisition module 512 is configured to acquire, for a mapping processing node, the time the node takes to execute one data record (l); the second data acquisition module 514 is configured to acquire, for a reduction processing node, the time the node takes to execute a predetermined batch of data (f) and the number of records in the batch (t); the third data acquisition module 516 is configured to acquire, for an aggregation processing node, the time the node takes to execute a predetermined batch of data (f), the number of records in the batch (t), the time required to generate a checkpoint (cpt), and the number of data batches between checkpoints (cb).

The calculation module 420 further includes: a first calculation module 522, a second calculation module 524, a third calculation module 526, and a fourth calculation module 528.

The first calculation module 522 is configured to calculate a mapping node's record-processing time as ppt_i = l_i, where l is the time for the mapping processing node to execute one data record.

The second calculation module 524 is configured to calculate a reduction node's record-processing time as ppt_i = f_i/t_i, where f is the sampled time taken by the reduction processing node to execute a predetermined batch of data and t is the number of records in the batch.

The third calculation module 526 is configured to calculate an aggregation node's record-processing time as ppt_i = max(f_i/t_i, cpt_i/t_i*cb_i), where f is the time taken by the aggregation processing node to execute a predetermined batch of data, t is the number of records in the batch, cpt is the time required to generate a checkpoint, and cb is the number of data batches between checkpoints.

The fourth calculation module 528 is configured to calculate a node's concurrency as adjV.dop = Sum(v.dop*v.ratio*adjV.ppt/v.ppt), where adjV is the current compute node, v is an upstream adjacent compute node of the current node, Sum() denotes summation over all upstream adjacent compute nodes, and ratio is the throughput rate of the compute node.

Still referring to FIG. 5, the apparatus further includes: a storage module 440, configured to store the performance data acquired by the data acquisition module 410; the calculation module 420 reads the stored performance data from it, so that the calculation module 420 calculates, from the read data, the time each compute node takes to process data records.

The operating steps of the method of the present application correspond to the structural features of the apparatus; they may be cross-referenced and are not repeated one by one.

In summary, according to the technical solution of the present application, the concurrency of each compute node in the DAG topology is automatically optimized from runtime sampling data, so that the computing pipeline can run consistently at full load (neither idling while waiting for data nor overloaded), greatly reducing machine cost while improving system performance.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include non-persistent storage in computer-readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media exclude transitory media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.

The above descriptions are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of its claims.

Claims (18)

  1. An adaptive optimization method for a distributed directed acyclic graph (DAG) system, the distributed DAG system comprising a plurality of compute nodes, wherein the method comprises:
    acquiring performance data for each compute node;
    calculating, from the performance data, the time each compute node takes to process data records, and calculating the concurrency of the compute node from the time it takes to process data records;
    adjusting the current concurrency of the system according to the calculated concurrency.
  2. The method according to claim 1, further comprising:
    performing a breadth-first traversal of the DAG system, acquiring the performance data of each traversed compute node, calculating from the performance data the time the node takes to process one data record, and calculating the concurrency of the node from the time it takes to process one data record.
  3. The method according to claim 1, wherein the types of the compute nodes include one or more of: a mapping processing node, a reduction processing node, an aggregation processing node;
    the method further comprising: acquiring the performance data of a compute node according to its type.
  4. The method according to claim 3, further comprising:
    storing the acquired performance data of the compute nodes in a storage system;
    reading the stored performance data from the storage system, and calculating from the read data the time each compute node takes to process data records.
  5. The method according to claim 3, further comprising:
    for a mapping processing node, acquiring the time the node takes to execute one data record;
    calculating the mapping node's record-processing time as ppt_i = l_i, where l is the time for the mapping processing node to execute one data record.
  6. The method according to claim 3, further comprising:
    for a reduction processing node, acquiring the time the node takes to execute a predetermined batch of data and the number of records in the batch;
    calculating the reduction node's record-processing time as ppt_i = f_i/t_i, where f is the time taken by the reduction processing node to execute the predetermined batch and t is the number of records in the batch.
  7. The method according to claim 3, further comprising:
    for an aggregation processing node, acquiring the time the node takes to execute a predetermined batch of data, the number of records in the batch, the time required to generate a checkpoint, and the number of data batches between checkpoints;
    calculating the aggregation node's record-processing time as ppt_i = max(f_i/t_i, cpt_i/t_i*cb_i), where f is the time taken by the aggregation processing node to execute the predetermined batch, t is the number of records in the batch, cpt is the time required to generate a checkpoint, and cb is the number of data batches between checkpoints.
  8. The method according to claim 5, 6 or 7, wherein the concurrency of a compute node is calculated as:
    adjV.dop = Sum(v.dop*v.ratio*adjV.ppt/v.ppt), where adjV is the current compute node, v is an upstream adjacent compute node of the current node, Sum() denotes summation over all upstream adjacent compute nodes of the current node, and ratio is the throughput rate of the compute node.
  9. The method according to claim 1, wherein adjusting the current concurrency of the system according to the newly calculated concurrency comprises:
    comparing the calculated new concurrency with the current concurrency of the system and, if the difference exceeds a preset threshold, initializing the system and running it with the new concurrency.
  10. An adaptive optimization apparatus for a distributed DAG system, the distributed DAG system comprising a plurality of compute nodes, wherein the apparatus comprises:
    a data acquisition module, configured to acquire performance data for each compute node;
    a calculation module, configured to calculate, from the performance data, the time each compute node takes to process data records, and to calculate the concurrency of the compute node from that time;
    an adaptive optimization module, configured to adjust the current concurrency of the system according to the calculated concurrency.
  11. The apparatus according to claim 10, wherein the calculation module is further configured to perform a breadth-first traversal of the DAG system, acquire the performance data of each traversed compute node, calculate from the performance data the time the node takes to process one data record, and calculate the concurrency of the node from that time.
  12. The apparatus according to claim 10, wherein the types of the compute nodes include one or more of: a mapping processing node, a reduction processing node, an aggregation processing node;
    the data acquisition module being further configured to acquire the performance data of a compute node according to its type.
  13. The apparatus according to claim 12, further comprising:
    a storage module, configured to store the performance data of the compute nodes acquired by the data acquisition module, from which the calculation module reads the stored performance data so as to calculate, from the read data, the time each compute node takes to process data records.
  14. The apparatus according to claim 12, wherein
    the data acquisition module comprises: a first data acquisition module, configured to acquire, for a mapping processing node, the time the node takes to execute one data record;
    the calculation module comprises: a first calculation module, configured to calculate the mapping node's record-processing time as ppt_i = l_i, where l is the time for the mapping processing node to execute one data record.
  15. The apparatus according to claim 12, wherein
    the data acquisition module comprises: a second data acquisition module, configured to acquire, for a reduction processing node, the time the node takes to execute a predetermined batch of data and the number of records in the batch;
    the calculation module comprises: a second calculation module, configured to calculate the reduction node's record-processing time as ppt_i = f_i/t_i, where f is the sampled time taken by the reduction processing node to execute the predetermined batch and t is the number of records in the batch.
  16. The apparatus according to claim 12, wherein
    the data acquisition module comprises: a third data acquisition module, configured to acquire, for an aggregation processing node, the time the node takes to execute a predetermined batch of data, the number of records in the batch, the time required to generate a checkpoint, and the number of data batches between checkpoints;
    the calculation module comprises: a third calculation module, configured to calculate the aggregation node's record-processing time as ppt_i = max(f_i/t_i, cpt_i/t_i*cb_i), where f is the time taken by the aggregation processing node to execute the predetermined batch, t is the number of records in the batch, cpt is the time required to generate a checkpoint, and cb is the number of data batches between checkpoints.
  17. The apparatus according to claim 14, 15 or 16, wherein the calculation module further comprises: a fourth calculation module, configured to calculate the concurrency of a compute node as adjV.dop = Sum(v.dop*v.ratio*adjV.ppt/v.ppt), where adjV is the current compute node, v is an upstream adjacent compute node of the current node, Sum() denotes summation over all upstream adjacent compute nodes of the current node, and ratio is the throughput rate of the compute node.
  18. The apparatus according to claim 10, wherein the adaptive optimization module is further configured to compare the calculated new concurrency with the current concurrency of the system and, if the difference exceeds a preset threshold, initialize the system and run it with the new concurrency.
PCT/CN2016/087461 2015-07-08 2016-06-28 Adaptive optimization method and apparatus for distributed DAG system WO2017005115A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510397422.1A CN106339252B (zh) 2015-07-08 2015-07-08 Adaptive optimization method and apparatus for distributed DAG system
CN201510397422.1 2015-07-08

Publications (1)

Publication Number Publication Date
WO2017005115A1 true WO2017005115A1 (zh) 2017-01-12

Family

ID=57684691

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/087461 WO2017005115A1 (zh) 2015-07-08 2016-06-28 Adaptive optimization method and apparatus for distributed DAG system

Country Status (2)

Country Link
CN (1) CN106339252B (zh)
WO (1) WO2017005115A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315834A (zh) * 2017-07-12 2017-11-03 广东奡风科技股份有限公司 An ETL job flow analysis method based on the breadth-first search algorithm
CN109725989B (zh) * 2017-10-31 2020-07-31 阿里巴巴集团控股有限公司 Task execution method and apparatus
CN107832151B (zh) * 2017-11-10 2020-09-25 东软集团股份有限公司 CPU resource allocation method, apparatus and device
CN110362387B (zh) * 2018-04-11 2023-07-25 阿里巴巴集团控股有限公司 Distributed task processing method, apparatus, system and storage medium
CN111158901B (zh) * 2019-12-09 2023-09-08 爱芯元智半导体(宁波)有限公司 Computation graph optimization method and apparatus, computer device and storage medium
CN111400008B (zh) * 2020-03-13 2023-06-02 北京旷视科技有限公司 Computing resource scheduling method and apparatus, and electronic device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090171731A1 (en) * 2007-12-28 2009-07-02 International Business Machines Corporation Use of graphs in managing computing environments
CN102012844A (zh) * 2010-11-29 2011-04-13 上海大学 Thread scheduling method for CMP systems
CN102360246A (zh) * 2011-10-14 2012-02-22 武汉理工大学 Adaptive-threshold-based energy-saving scheduling method in heterogeneous distributed systems
CN103150148A (zh) * 2013-03-06 2013-06-12 中国科学院对地观测与数字地球科学中心 Task-tree-based parallel mosaicking method for large-scale remote sensing images
CN103491024A (zh) * 2013-09-27 2014-01-01 中国科学院信息工程研究所 Job scheduling method and apparatus for streaming data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699433B (zh) * 2013-12-18 2017-07-14 中国科学院计算技术研究所 Method and system for dynamically adjusting the number of tasks on a Hadoop platform
CN107729147B (zh) * 2014-03-06 2021-09-21 华为技术有限公司 Data processing method in a stream computing system, control node, and stream computing system
CN104317658B (zh) * 2014-10-17 2018-06-12 华中科技大学 Load-adaptive task scheduling method based on MapReduce


Also Published As

Publication number Publication date
CN106339252B (zh) 2020-06-23
CN106339252A (zh) 2017-01-18

Similar Documents

Publication Publication Date Title
WO2017005115A1 (zh) Adaptive optimization method and apparatus for distributed DAG system
Ousterhout et al. Monotasks: Architecting for performance clarity in data analytics frameworks
WO2017016421A1 (zh) Task execution method and apparatus in a cluster
CA2963088C (en) Apparatus and method for scheduling distributed workflow tasks
US10049133B2 (en) Query governor across queries
WO2019128475A1 (zh) Data training method and apparatus, storage medium, and electronic apparatus
US8290937B2 (en) Estimating and monitoring query processing time
US8447776B2 (en) Estimating and managing energy consumption for query processing
US20140026147A1 (en) Varying a characteristic of a job profile relating to map and reduce tasks according to a data size
WO2015058578A1 (zh) Method, apparatus and system for optimizing parameters of a distributed computing framework
Goel et al. Complexity measures for map-reduce, and comparison to parallel computing
CN103744749A Intelligent virtual machine backup method based on a budget algorithm
Petrov et al. Adaptive performance model for dynamic scaling Apache Spark Streaming
CN103942108A Resource parameter optimization method for homogeneous Hadoop clusters
Li et al. A new speculative execution algorithm based on C4.5 decision tree for Hadoop
Dai et al. Research and implementation of big data preprocessing system based on Hadoop
CN108710640B Method for improving query efficiency of Spark SQL
CN105740249B Processing method and system for parallel scheduling of big data jobs
EP3200083B1 (en) Resource scheduling method and related apparatus
US10592473B2 (en) Method for improving energy efficiency of map-reduce system and apparatus thereof
WO2024021475A1 (zh) Container scheduling method and apparatus
Wang et al. Slo-driven task scheduling in mapreduce environments
Lei et al. Redoop: Supporting Recurring Queries in Hadoop.
CN111813512B Energy-efficient Spark task scheduling method based on dynamic partitioning
CN104184806B Dynamic IaaS virtual machine migration method balancing energy consumption and quality of service

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16820766

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16820766

Country of ref document: EP

Kind code of ref document: A1