CN102073546A - Task-dynamic dispatching method under distributed computation mode in cloud computing environment


Info

Publication number: CN102073546A
Authority: CN
Grant status: Application
Prior art keywords: task, node, information, computing, table
Application number: CN 201010583597
Other languages: Chinese (zh)
Other versions: CN102073546B (en)
Inventors: 毛宏, 祝明发, 肖利民, 胡声秋, 阮利
Original Assignee: 北京航空航天大学 (Beihang University)

Abstract

The invention provides a dynamic task scheduling method for the distributed computing model in a cloud computing environment. The method comprises four steps: (1) the master node receives and analyzes the heartbeat information of the compute nodes; (2) the master node pre-assigns tasks according to a node state table and a task state table; (3) a compute node requests a task from the master node; and (4) the master node assigns a task to the compute node. The method first considers the resource demands of the tasks and the performance information of the nodes and, provided those demands are met, dynamically controls task assignment, thereby improving job response speed and node resource utilization. The method has broad practical value and application prospects in the field of distributed computing in cloud computing environments.

Description

A dynamic task scheduling method for the distributed computing model in a cloud computing environment

(1) Technical Field

[0001] The present invention relates to a task scheduling method for a distributed computing model, and in particular to a dynamic task scheduling method for the distributed computing model in a cloud computing environment. It is a dynamic scheduling method, based on node performance, for the tasks in a task scheduling subsystem, and belongs to the field of computer technology.

(2) Background Art

[0002] The rapid growth of network applications has steadily increased the demand for computing power. Building on the development of grid computing, parallel computing, and distributed computing, cloud computing has emerged; it has been listed as a key national technology direction for future development and has become a major research topic in both the research community and industry. As cloud computing grows in popularity, more and more Web services and business applications are deployed in cloud environments. For the distributed nodes that handle application-layer computing requests in the cloud, a current research focus is how to schedule tasks so that upper-layer requests are processed efficiently, the resource utilization of nodes with heterogeneous performance is improved, and job response speed is increased.

[0003] When massive data is processed in a cloud environment, task scheduling built on distributed storage and distributed parallel processing is one of the key steps. Improving job and task scheduling is an active research area. Work in China and abroad mainly covers scheduling between jobs when multiple jobs run in parallel, scheduling of subtasks when a single job runs, and optimizing the number of subtasks that run in parallel.

[0004] For job scheduling, the most basic approach today is first-in, first-out (FIFO) scheduling. This approach has clear drawbacks: when many jobs are queued, the overall response time becomes very long. The Fair Scheduler addresses this problem well. When a single job is running, it uses the entire cluster. When other jobs are submitted, the system gives free task time slots to the new jobs so that each job receives roughly the same amount of CPU time; small tasks get a fast response while the service level of large tasks is still guaranteed. The Capacity Scheduler supports multiple queues: a submitted job enters a queue, resources are allocated per queue, and the jobs in each queue use that queue's resources. Within a queue, higher-priority jobs can use resources before lower-priority jobs, but once a job has started it is not preempted by a higher-priority job. To prevent one or more users from monopolizing all resources, a fixed proportion of the resources is enforced for each queue. The three-queue scheduler based on MR-Predict, proposed by the Institute of Computing Technology of the Chinese Academy of Sciences, divides workloads into three classes according to CPU and I/O utilization and can raise CPU and I/O resource utilization at the same time under different types of workloads.

[0005] For task scheduling, the LATE (Longest Approximate Time to End) algorithm proposed by researchers at the University of California, Berkeley focuses on optimizing the scheduling of backup tasks within a job. By estimating the time each task needs to finish, it ensures that backup tasks are launched only for the tasks estimated to finish latest, and only on fast nodes. Researchers at Purdue University proposed a method, based on historical statistics, for optimizing the configured number of tasks. Their work examines how the number of tasks running concurrently on each node of the cloud environment affects performance during job execution; the optimal configuration is derived from historical statistics and applied to new jobs of the same kind.

[0006] In most cases, however, nodes differ in performance, and the load on each node varies over time. Determining a dynamic task assignment policy from the performance heterogeneity and dynamic load of the nodes is therefore important for processing computing tasks efficiently, improving the resource utilization of distributed nodes, and increasing job response speed.

(3) Summary of the Invention

[0007] 1. Objective:

[0008] The main objective of the present invention is to provide a dynamic task scheduling method for the distributed computing model in a cloud computing environment. The method first considers the resource demands of the tasks and the performance information of the nodes and, provided those demands are met, dynamically controls task assignment, thereby improving job response speed and node resource utilization.

[0009] To achieve this objective, the present invention proposes a dynamic task scheduling method for the distributed computing model in a cloud computing environment, based on node performance and task execution status. The structure of the distributed computing nodes in the cloud environment is shown in Figure 1. It consists mainly of one master node and multiple compute nodes (child nodes). A compute node may be either a physical machine or a virtual machine; this is transparent to the master node. The nodes are interconnected by a network, and the master node and compute nodes interact through remote procedure calls (RPC). The master node is mainly responsible for receiving heartbeat information from the compute nodes, analyzing it, and providing feedback to control task scheduling and execution. Besides executing tasks, each compute node is responsible for collecting its own performance information and task execution information and sending them to the master node.

[0010] 2. Technical solution:

[0011] The technical solution of the present invention is as follows:

[0012] The present invention is a dynamic task scheduling method for the distributed computing model in a cloud computing environment. The overall flow is shown in Figure 2, and the method comprises the following steps:

[0013] Step 201. Each compute node dynamically collects its own performance information and task execution information and reports them to the master node as heartbeat information.

[0014] Step 202. The master node receives and analyzes the heartbeat information of each compute node, and creates and continually updates the node state table and the task state table. Based on these two tables, the master node pre-assigns tasks to compute nodes and updates the node prefetch table and the task pre-assignment table.

[0015] Step 203. If a compute node has a free task slot, it adds a flag requesting a task from the master node to its next heartbeat.

[0016] Step 204. After receiving a task request from a compute node, the master node assigns it a task according to the scheduling policy and updates the node prefetch table and the task pre-assignment table.

[0017] The node performance information and task execution information of step 201 are the key data sources from which the master node updates the node state table and the task state table. Node performance information may include CPU clock frequency, memory size, CPU utilization, memory utilization, I/O resource utilization, and so on. Task execution information covers tasks that have just finished and tasks that are in progress. For a task that has just finished, it includes the task's TaskID, the JobID of the job it belongs to, the time spent on I/O (copying the data to be processed), and the time spent on CPU computation; data copying occurs only when the compute node does not hold the task's input data. For a task in progress, it includes the task's TaskID, the JobID of its job, the task's execution progress, and the time it has been running. Each compute node collects both kinds of information at regular intervals, packages them as heartbeat information, and sends it to the master node.
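The heartbeat contents listed in paragraph [0017] can be sketched as a small set of record types. This is only an illustrative sketch in Python: the class names, attribute names, and units are assumptions made for the example, and the patent does not prescribe a wire format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodePerformance:
    """Node performance information (units are assumed for illustration)."""
    cpu_speed_mhz: float
    mem_size_mb: int
    cpu_usage: float   # fraction in [0, 1]
    mem_usage: float   # fraction in [0, 1]
    io_usage: float    # fraction in [0, 1]


@dataclass
class FinishedTask:
    """Execution record of a task that has just finished."""
    task_id: str
    job_id: str
    time_io: float     # seconds spent copying input data
    time_cpu: float    # seconds spent on CPU computation


@dataclass
class RunningTask:
    """Status of a task that is still in progress."""
    task_id: str
    job_id: str
    progress: float    # fraction in [0, 1]
    past_time: float   # seconds the task has been running


@dataclass
class Heartbeat:
    """One heartbeat message from a compute node to the master node."""
    node_name: str
    performance: NodePerformance
    finished: List[FinishedTask] = field(default_factory=list)
    running: List[RunningTask] = field(default_factory=list)
    request_task: bool = False  # task request flag described in step 203


hb = Heartbeat(
    node_name="node-1",
    performance=NodePerformance(2400.0, 8192, 0.35, 0.50, 0.10),
    running=[RunningTask("task-7", "job-1", 0.4, 12.0)],
)
```

Keeping finished and running tasks in separate lists mirrors the two kinds of task execution information the paragraph distinguishes.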

[0018] The node state table and task state table of step 202 are the key reference information the master node uses to decide task assignments. The node state table describes the recent performance state of each compute node, and the task state table records how each compute node has handled tasks recently. The master node creates both tables when it first receives heartbeat information from a compute node and updates them on every subsequent heartbeat. The node state table has the fields NodeName, CPU_Speed, MemSize, CPU_Usage, Mem_Usage, and IO_Usage; the task state table has the fields JobID, TaskID, NodeName, Time_IO, Time_CPU, Progress, and PastTime. The node prefetch table and the task pre-assignment table record the current pre-assignment of tasks in the cluster. The node prefetch table records which task the master node has pre-assigned to each compute node, with the fields NodeName, preFetched, and preFetchedTaskID. The task pre-assignment table records which compute node each task has been pre-assigned to, with the fields TaskID, preScheduled, and preScheduledNodeName.
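As a rough illustration of the four tables in paragraph [0018], the sketch below models each table as a dictionary of rows. The field names follow the text; the Python types, the dictionary keys, and the `MasterTables` helper with its `pre_assign` method are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeStateRow:
    NodeName: str
    CPU_Speed: float
    MemSize: int
    CPU_Usage: float
    Mem_Usage: float
    IO_Usage: float


@dataclass
class TaskStateRow:
    JobID: str
    TaskID: str
    NodeName: str
    Time_IO: float
    Time_CPU: float
    Progress: float
    PastTime: float


@dataclass
class NodePrefetchRow:
    NodeName: str
    preFetched: bool = False
    preFetchedTaskID: Optional[str] = None


@dataclass
class TaskPreAssignRow:
    TaskID: str
    preScheduled: bool = False
    preScheduledNodeName: Optional[str] = None


class MasterTables:
    """Hypothetical container for the master node's four tables."""

    def __init__(self) -> None:
        self.node_state: Dict[str, NodeStateRow] = {}         # keyed by NodeName
        self.task_state: Dict[str, TaskStateRow] = {}         # keyed by TaskID
        self.node_prefetch: Dict[str, NodePrefetchRow] = {}   # keyed by NodeName
        self.task_preassign: Dict[str, TaskPreAssignRow] = {} # keyed by TaskID

    def pre_assign(self, task_id: str, node_name: str) -> None:
        """Record a pre-assignment in both pre-assignment tables."""
        self.node_prefetch[node_name] = NodePrefetchRow(node_name, True, task_id)
        self.task_preassign[task_id] = TaskPreAssignRow(task_id, True, node_name)


tables = MasterTables()
tables.pre_assign("task-1", "node-1")
```

The two pre-assignment tables hold the same relation from opposite directions, so the sketch updates them together to keep them consistent.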

[0019] The task slot size of a compute node in step 203 is the maximum number of tasks the node can execute in parallel at any one time; it is configured before the distributed node cluster starts. A compute node requests a task from the master node only when it has a free task slot. The request is carried in the heartbeat information, which contains a task request flag; when the flag is true, the compute node has a free task slot and the master node may assign it a task to execute.
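The slot rule in paragraph [0019] amounts to a simple counter on the compute node. The following is a minimal sketch under that reading; the `TaskSlots` class and its method names are hypothetical.

```python
class TaskSlots:
    """Tracks the task slots of one compute node (illustrative only)."""

    def __init__(self, size: int) -> None:
        if size < 1:
            raise ValueError("slot size must be at least 1")
        self.size = size      # fixed before the cluster starts
        self.running = set()  # IDs of tasks currently executing

    def start(self, task_id: str) -> None:
        if len(self.running) >= self.size:
            raise RuntimeError("no free task slot")
        self.running.add(task_id)

    def finish(self, task_id: str) -> None:
        self.running.discard(task_id)

    def request_flag(self) -> bool:
        """Value of the task request flag for the next heartbeat."""
        return len(self.running) < self.size


slots = TaskSlots(size=2)
slots.start("task-1")
slots.start("task-2")
flag_when_full = slots.request_flag()
slots.finish("task-1")
flag_after_finish = slots.request_flag()
```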

[0020] In step 204, the task that the master node assigns to a requesting compute node is decided by the distributed scheduling algorithm, which is implemented in the master node's scheduler. Several compute nodes may request tasks at the same moment. The scheduler reads the node state table, the task state table, the node prefetch table, and the task pre-assignment table, combines them with the queue of remaining tasks, and uses the distributed scheduling algorithm to determine the order in which compute nodes are assigned tasks and how many tasks each receives. It then updates the node prefetch table and the task pre-assignment table.

[0021] 3. Advantages and effects: Compared with the prior art, the dynamic task scheduling method of the present invention has two main advantages. (1) By analyzing the dynamic changes in compute node performance and the history of task execution, the master node assigns tasks more sensibly and makes fuller use of the better-performing compute nodes, whereas existing task scheduling methods do not account for the dynamic variation in each compute node's performance. (2) It changes the convention in typical distributed computing models (such as MapReduce) that any compute node requesting a task from the master node is given one. Instead, the master node is given the right to choose which compute node executes a task, which avoids the bottleneck caused by poorly performing compute nodes.

(4) Brief Description of the Drawings

[0022] Figure 1 is a schematic diagram of the structure of the distributed computing nodes in the cloud computing environment of the present invention.

[0023] Figure 2 is a schematic flow diagram of distributed task scheduling in a cloud environment based on distributed node performance and task execution status.

[0024] Figure 3 is a diagram of the interaction between the three phases of the present invention (initialization, information update, and task scheduling).

[0025] Figure 4 is a detailed flowchart of the three phases of the present invention.

[0026] Figure 5 is a schematic flow diagram of the information update module of the present invention.

[0027] Figure 6 is a schematic flow diagram of the task scheduling module of the present invention.

[0028] The symbols in the figures are as follows:

[0029] 201-204: step numbers; 501-505: step numbers; 601-604: step numbers.

(5) Detailed Description

[0030] To make the objective, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and a specific embodiment.

[0031] The equipment environment required by the present invention is shown in Figure 1. The structure of the distributed computing nodes in the cloud environment consists mainly of one master node and multiple compute nodes (child nodes). A compute node may be either a physical machine or a virtual machine; this is transparent to the master node. The nodes are interconnected by a network, and the master node and compute nodes interact through remote procedure calls (RPC). The master node is mainly responsible for receiving heartbeat information from the compute nodes, analyzing it, and providing feedback to control task scheduling and execution. Within the master node, the node analyzer receives and analyzes the compute nodes' performance information and updates the node state table, and the task analyzer receives and analyzes the compute nodes' task information and updates the task state table. Besides executing tasks, each compute node is responsible for collecting its own performance information and task execution information and sending them to the master node. Within the compute node, the node performance monitor collects the node's recent performance information, and the task monitor collects records of the tasks the node has recently executed.

[0032] In terms of software, the present invention requires every node to run the Linux operating system with Java Development Kit 1.6 or later installed.

[0033] In terms of environment, the present invention requires that the nodes can reach one another over ssh without a password.

[0034] The dynamic task scheduling flow based on node performance and task execution status is shown in Figure 2 and has two main parts: (1) each compute node collects and packages its heartbeat information and sends it to the master node, which creates and updates the node state table and the task state table from the heartbeats it receives; (2) after receiving a task request from a compute node, the master node assigns it a task according to the scheduling algorithm and updates the node prefetch table and the task pre-assignment table.

[0035] The method has three phases: initialization, information update, and task scheduling. Their interaction is shown in Figure 3. In the initialization phase, the master node receives a job and creates the node state table and the task state table. In the information update phase, the master node receives heartbeat information from the compute nodes and updates the node state table, task state table, node prefetch table, and task pre-assignment table; if a compute node requests a task, the method enters the task scheduling phase. In the task scheduling phase, the master node assigns a task to the compute node based on the node and task information, and then returns to the information update phase to wait for further heartbeats.

[0036] An example follows. As shown in Figure 4, the method of the present invention comprises the following steps:

[0037] Step 401: On each compute node, the node performance monitor collects the node's performance information and the task monitor collects its task execution information; both are packaged as heartbeat information and sent to the master node. The period for information collection and heartbeat transmission is 3 seconds.

[0038] Step 402: The master node receives and analyzes the heartbeat information of each compute node. On the first heartbeat it creates the node state table and the task state table; once they exist, it updates them on every heartbeat received. Based on these two tables, the master node pre-assigns tasks to compute nodes and updates the node prefetch table and the task pre-assignment table. Details are shown in the information update module of Figure 5.

[0039] Step 403: If a compute node has a free task slot, it adds a flag requesting a task from the master node to its next heartbeat.

[0040] Step 404: After receiving a task request from a compute node, the master node assigns it a task according to the scheduling policy. Details are shown in the task scheduling module of Figure 6.

[0041] The detailed flow of the information update module is shown in Figure 5:

[0042] Step 501: The master node listens for RPC calls from compute nodes and receives the heartbeat information they send. The master node can receive heartbeat information from only one compute node at a time; if other compute nodes send heartbeats while it is receiving one, the master node places the later-arriving compute nodes in a waiting queue. On each compute node, the node performance monitor monitors and collects the node's recent performance information, while the task monitor monitors the tasks currently executing on the node and collects the records of the 3 most recently executed tasks; the compute node packages the performance information and task information as heartbeat information. If the task information has not changed recently, the heartbeat may contain only the node's performance information. The compute node sends heartbeat information to the master node by RPC at regular intervals; the heartbeat period is 3 seconds. Every heartbeat should contain the node's performance information and information about the tasks currently executing. Each time a compute node finishes a task, it adds the execution record of that task to the next heartbeat. The task information therefore contains two kinds of entries: tasks that are completed but not yet reported (possibly none) and tasks in progress.

[0043] Step 502: The master node updates the node state table and the task state table from the received heartbeat information. For the node state table, the compute node's state information in the heartbeat overwrites that node's entry in the table. The task state table holds, for each node, the 3 most recent completed tasks and the tasks in progress. Each time the master node receives new task state information, it first checks for tasks that are completed but not yet reported. For each such task, it takes the TaskID and checks whether the task already exists in the task state table: if it does, the task's entry is updated; otherwise the oldest task entry for that compute node is deleted and the completed task's entry is added. For each task in progress, it takes the TaskID; if the task already exists in the task state table, its entry is updated, and otherwise an entry for the task is added.
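The update rule of step 502 can be sketched as follows. This is one possible reading rather than the patent's implementation: records are plain dictionaries keyed by `"TaskID"`, completed and in-progress tasks are kept in separate per-node maps, a completed task is removed from the in-progress map, and the oldest completed entry is evicted only once the node already holds 3 of them. All of these choices, and the `TaskStateTable` name, are assumptions.

```python
from collections import OrderedDict

HISTORY_LIMIT = 3  # number of completed tasks kept per node, per step 502


class TaskStateTable:
    """Per-node task records held by the master node (illustrative)."""

    def __init__(self):
        self.finished = {}  # node -> OrderedDict(TaskID -> record), oldest first
        self.running = {}   # node -> dict(TaskID -> record)

    def update(self, node, finished_reports, running_reports):
        done = self.finished.setdefault(node, OrderedDict())
        live = self.running.setdefault(node, {})

        # Completed-but-unreported tasks are handled first.
        for record in finished_reports:
            task_id = record["TaskID"]
            live.pop(task_id, None)        # assumed: no longer in progress
            if task_id in done:
                done[task_id] = record     # update the existing entry
            else:
                if len(done) >= HISTORY_LIMIT:
                    done.popitem(last=False)  # drop the oldest entry
                done[task_id] = record

        # In-progress tasks: update if present, otherwise add.
        for record in running_reports:
            live[record["TaskID"]] = record


table = TaskStateTable()
table.update("node-1", [], [{"TaskID": "a", "Progress": 0.5}])
table.update("node-1", [{"TaskID": "a", "Time_IO": 1.0, "Time_CPU": 4.0}], [])
for tid in ("b", "c", "d"):
    table.update("node-1", [{"TaskID": tid}], [])
```

After these calls, task "a" has been pushed out of the history by the three later completions.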

[0044] Step 503: The node prefetch table and the task pre-assignment table are updated from the node state table and the task state table. For the m-th task in the task list, the time each node would need to execute the task is predicted from the node state table and the task state table, using the following prediction algorithm:

[0045]

Figure CN102073546AD00091

[0046] Here T_ij is the predicted time for the i-th compute node to execute its j-th task, t_s is the time at which a task slot becomes available, t_io is the time needed to copy the data, t_cpu is the data processing time, h is the reference number of tasks the compute node has already executed successfully, and n is the number of compute nodes in the cluster.

[0047] After the T_ij values are obtained, the master node selects the smallest one and pre-assigns the task to the corresponding compute node. Task m is marked as pre-assigned, and the compute node that will execute it is marked as prefetched.

[0048] The master node then continues: for the next task not yet marked as pre-assigned, it selects an executing compute node from among the compute nodes not yet marked as prefetched. Only num tasks are pre-assigned per round, where num = 5. After a round, the task queue may still contain tasks not marked as pre-assigned, and the node list may still contain nodes not marked as prefetched.
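The pre-assignment round of paragraphs [0047] and [0048] can be sketched as a greedy loop. The prediction formula itself is not reproduced in this text, so the sketch takes the predictor as a parameter: `predict(node, task)` is a hypothetical callable standing in for the predicted execution time, and the function and argument names are likewise assumptions.

```python
NUM_PER_ROUND = 5  # "num" in paragraph [0048]


def pre_assign_round(pending_tasks, nodes, predict, prefetched, pre_scheduled,
                     num=NUM_PER_ROUND):
    """Pre-assign up to `num` tasks, each to the node with the smallest
    predicted time among nodes that hold no pre-assigned task yet.

    prefetched:    dict node -> task (node prefetch table), updated in place
    pre_scheduled: dict task -> node (task pre-assignment table), updated in place
    Returns the number of tasks pre-assigned in this round.
    """
    assigned = 0
    for task in pending_tasks:
        if assigned >= num:
            break
        if task in pre_scheduled:
            continue  # already marked as pre-assigned
        candidates = [n for n in nodes if n not in prefetched]
        if not candidates:
            break     # every node already holds a pre-assigned task
        best = min(candidates, key=lambda n: predict(n, task))
        prefetched[best] = task
        pre_scheduled[task] = best
        assigned += 1
    return assigned


# Toy predictor: a fixed cost per node, ignoring the task (illustration only).
cost = {"fast": 1.0, "mid": 2.0, "slow": 3.0}
prefetched, pre_scheduled = {}, {}
count = pre_assign_round(["t1", "t2", "t3", "t4"], list(cost),
                         lambda n, t: cost[n], prefetched, pre_scheduled)
```

With three nodes and four tasks, the round ends with one task left unmarked, which matches the remark that tasks and nodes may remain unmarked after a round.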

[0049] Step 504: If the compute node has set the task request field in its heartbeat to true, control passes to the task scheduling module.

[0050] The detailed flow of the task scheduling module is shown in Figure 6:

[0051] Step 601: The master node first looks up the node prefetch table to determine whether a task has been pre-assigned to the requesting node. If so, it assigns the pre-assigned task to the node, marks the task as assigned, and marks the node as not prefetched.

[0052] Step 602: If the master node has not pre-assigned a task to the node, then by the prediction algorithm of step 503 the node, not being marked as prefetched, is one with poorer performance or weaker computing capability. The master node picks a task from the task queue for the node to execute but does not mark the task as assigned. At the next pre-assignment round, the task will be pre-assigned to a faster node, so a backup copy of the task runs on a fast node to ensure that it completes.

[0053] Step 603: After assigning the task to the compute node, the master node updates the node prefetch table and the task pre-assignment table.

[0054] Step 604: The task scheduling phase ends and control returns to the information update phase, where the master node continues to receive and process heartbeat information from the compute nodes.
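Steps 601 to 603 can be sketched as a single request handler. The sketch fills in details the text leaves open, and these are assumptions: a node without a pre-assigned task receives the first queued task that is neither pre-assigned to another node nor already handed out this way, and such hand-outs are tracked in a hypothetical `loaned` set so the task stays in the queue for a later pre-assignment round.

```python
def handle_task_request(node, queue, prefetched, loaned):
    """Return the task to run on `node`, or None if nothing is available.

    queue:      list of tasks not yet marked as assigned, in queue order
    prefetched: dict node -> pre-assigned task, updated in place
    loaned:     set of tasks handed out without being marked as assigned
    """
    task = prefetched.pop(node, None)  # node becomes "not prefetched"
    if task is not None:
        # Step 601: hand over the pre-assigned task and mark it as assigned.
        if task in queue:
            queue.remove(task)
        loaned.discard(task)
        return task

    # Step 602: no pre-assignment, so this node is treated as a slower node.
    reserved = set(prefetched.values())
    for candidate in queue:
        if candidate not in reserved and candidate not in loaned:
            loaned.add(candidate)  # stays in the queue, not marked as assigned
            return candidate
    return None


queue = ["t1", "t2", "t3"]
prefetched = {"fast": "t1"}
loaned = set()
first = handle_task_request("fast", queue, prefetched, loaned)
second = handle_task_request("slow", queue, prefetched, loaned)
```

Because "t2" remains in the queue after the slow node receives it, a later pre-assignment round can still give it to a faster node, which is the backup behavior step 602 describes.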

[0055] Finally, it should be noted that the above embodiment is intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the embodiment, those of ordinary skill in the art should understand that the invention may still be modified or equivalently substituted, and that any modification or partial substitution that does not depart from the spirit and scope of the invention falls within the scope of its claims.

Claims (5)

  1. 1. 一种云计算环境中分布式计算模式下的任务动态调度方法,通过动态获取并分析任务的资源需求及节点的性能信息和历史任务执行信息,在满足需求的情况下对任务的分配进行动态控制,从而提高作业的响应速度和节点的资源使用率,其特征在于:该方法包括以下步骤:步骤一:计算节点动态收集本节点的性能信息及任务执行信息,以心跳信息的形式报告给主控节点;主控节点接收并分析各计算节点的心跳信息,生成节点状态表和任务状态表;步骤二:主控节点根据节点状态表和任务状态表,为计算节点预分配任务,更新节点预取表和任务预分表;步骤三:如果计算节点中有空的任务槽即task slot可用,则在下次的心跳信息中加入向主控节点请求任务的标志;步骤四:主控节点接收到计算节点的任务请求后,按调度策略为其分配任务。 Dynamic task scheduling in a distributed computing environment A cloud computing model, by dynamically acquire and analyze the historical performance information and execution information of the task node and the resource requirements of the task, the task allocation performed in the needs of dynamic control to improve the response speed and operation node resource utilization, characterized in that: the method comprises the following steps: step a: collecting performance information of computing nodes dynamic task execution and the local node, to report heartbeat information in the form of master node; master node receives and analyzes each computing node heartbeat information generating node status and task status table; step II: the master node node state table and the task state table for the pre-assigned tasks compute node, update the node prefetching and task pre-separation table; step 3: If the node is calculated in the empty slot assignment i.e. task slot is available, then the task request flag is added to the master node in the next heartbeat; step four: the master node receives to request the task of computing nodes, according to a scheduling policy assigned tasks.
  2. The dynamic task scheduling method for the distributed computing mode in a cloud computing environment according to claim 1, characterized in that: the node performance information and task execution information of Step 1 are important data sources from which the master node updates the node state table and the task state table; the node performance information comprises CPU clock frequency, memory size, CPU utilization, memory utilization, and I/O resource utilization; the task execution information comprises execution information for just-finished tasks and execution information for tasks in progress; the execution information for a just-finished task comprises the task's TaskID, the JobID of the job it belongs to, the time spent on I/O (that is, copying the data to be processed), and the time spent on CPU computation, where data copying occurs only when the compute node does not hold the input data for the task; the execution information for a task in progress comprises the task's TaskID, the JobID of the job it belongs to, the task's execution progress, and its elapsed execution time; each compute node collects both kinds of information at regular intervals, packages them as heartbeat information, and sends them to the master node.
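For illustration only, the heartbeat contents listed in claim 2 could be modeled as below. This is a minimal sketch, not the patent's data format: the class names, the snake_case field names, the units, and the `request_task` default are assumptions; only the kinds of fields (TaskID, JobID, I/O time, CPU time, progress, elapsed time, and the five performance metrics) come from the claim text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodePerformance:
    cpu_speed_mhz: float   # CPU clock frequency (unit is an assumption)
    mem_size_mb: int       # memory size (unit is an assumption)
    cpu_usage: float       # utilization as a fraction in [0, 1] (assumed scale)
    mem_usage: float
    io_usage: float


@dataclass
class FinishedTask:
    task_id: str
    job_id: str
    time_io: float    # time spent copying input data; 0 if the data was local
    time_cpu: float   # time spent on CPU computation


@dataclass
class RunningTask:
    task_id: str
    job_id: str
    progress: float   # fraction completed (assumed scale)
    past_time: float  # elapsed execution time


@dataclass
class Heartbeat:
    node_name: str
    performance: NodePerformance
    finished: List[FinishedTask] = field(default_factory=list)
    running: List[RunningTask] = field(default_factory=list)
    request_task: bool = False  # task-request flag described in claims 1 and 4
```

Keeping finished and running tasks in separate lists mirrors the claim's distinction between the two kinds of task execution information.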
  3. The dynamic task scheduling method for the distributed computing mode in a cloud computing environment according to claim 1, characterized in that: the node state table and the task state table of Step 2 are important reference information from which the master node draws up its task assignment plan; the node state table describes the performance state of each compute node over the recent period, and the task state table records how each compute node has processed tasks over the recent period; the master node creates the node state table and the task state table when it first receives heartbeat information from a compute node, and updates both tables each time it subsequently receives heartbeat information from a compute node; the node state table comprises the fields NodeName, CPU_Speed, MemSize, CPU_Usage, Mem_Usage, and IO_Usage; the task state table comprises the fields JobID, TaskID, NodeName, Time_IO, Time_CPU, Progress, and PastTime; the node prefetch table records the tasks that the master node has assigned to compute nodes in advance and comprises the fields NodeName, preFetched, and preFetchedTaskID; the task pre-assignment table records which compute nodes tasks have been pre-assigned to and comprises the fields TaskID, preScheduled, and preScheduledNodeName; the specific implementation process is as follows:
     1) The master node listens for RPC calls from the compute nodes and receives the heartbeat information they send. The master node can receive heartbeat information from only one compute node at a time; if other compute nodes send heartbeat information while the master node is receiving the heartbeat of some compute node, the master node places the compute nodes whose heartbeats arrived later in a waiting queue. On each compute node, a node performance monitor monitors and collects the node's performance information for the most recent period, and a task monitor monitors the tasks currently executing on the node and collects the records of the 3 most recently executed historical tasks; the compute node packages the performance information and task information as heartbeat information. If the task information has not changed during the most recent period, the heartbeat information may contain only the node's performance information. The compute node sends heartbeat information to the master node by RPC at regular intervals. The heartbeat period is 3 seconds; every heartbeat shall contain the node's performance information and the information on the tasks currently executing, and whenever a compute node finishes a task, it adds the execution record of that just-finished task to the next heartbeat sent to the master node. The task information therefore contains two kinds of entries: tasks that have completed but have not yet been reported, and tasks in progress.
     2) The master node updates the node state table and the task state table from the heartbeat information it receives. For the node state table, the compute node state information in the heartbeat overwrites the entry for that compute node in the master node's node state table. The task state table records, for each node, the 3 most recent historical tasks and the tasks in progress. Each time the master node receives new task state information, it first checks for tasks that have completed but have not yet been reported; if there is one, it obtains the task's TaskID and checks whether the task already exists in the task state table; if it exists, the task's entry in the task state table is updated; otherwise the oldest task entry for that compute node is deleted from the task state table and the completed task's information is added. For a task in progress, the master node obtains the task's TaskID; if the task already exists in the task state table, its entry is updated; otherwise the task's information is added to the task state table.
     3) The node prefetch table and the task pre-assignment table are updated from the node state table and the task state table. For the m-th task in the task list, the time each node would need to execute the task is predicted from the node state table and the task state table. The prediction algorithm is given by formula (1) [formula not recoverable from the source text], where T_ij is the predicted time for the i-th compute node to execute its j_i-th task, t_s is the time at which a task slot becomes available, t_io is the time needed to copy the data, t_cpu is the data processing time, h is the reference number of tasks the compute node has already executed successfully, and n is the number of compute nodes in the cluster. After obtaining the values of T_ij, the master node selects the smallest one, pre-assigns the task to the corresponding compute node, marks task m as pre-assigned, and marks the compute node that is to execute the task as prefetched. The master node then continues, among the compute nodes not yet marked as prefetched, to select a compute node for the next task not yet marked as pre-assigned. Only num tasks are pre-assigned in each round, where num = 5. After a pre-assignment round completes, the task queue may still contain tasks not marked as pre-assigned, and the node list may still contain nodes not marked as prefetched.
     4) If a compute node has set the task-request field in its heartbeat information to true, the process enters the task scheduling module.
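The pre-assignment round in sub-step 3) of claim 3 can be sketched as follows. Because formula (1) is not legible in this text, `predict_time` uses an assumed estimate (the slot-available time t_s plus the mean of t_io + t_cpu over the h recorded historical tasks), and it should be read as a stand-in rather than the patented formula. The function names, the dictionary-based tables, and the treatment of nodes without history are likewise hypothetical.

```python
from typing import Dict, List, Tuple

# (time_io, time_cpu) for each recorded historical task of a node
History = List[Tuple[float, float]]


def predict_time(slot_free_at: float, history: History) -> float:
    """ASSUMED estimate, not formula (1): t_s + mean(t_io + t_cpu) over h tasks."""
    if not history:
        return float("inf")  # assumption: a node with no history ranks last
    h = len(history)
    return slot_free_at + sum(t_io + t_cpu for t_io, t_cpu in history) / h


def pre_assign(
    tasks: List[str],
    slot_free_at: Dict[str, float],
    histories: Dict[str, History],
    prefetched: Dict[str, str],    # node name -> pre-fetched task id
    prescheduled: Dict[str, str],  # task id -> pre-scheduled node name
    num: int = 5,                  # tasks pre-assigned per round (num = 5 in claim 3)
) -> Dict[str, str]:
    """Greedily pre-assign up to `num` tasks to the fastest unmarked nodes."""
    assigned = 0
    for task_id in tasks:
        if assigned == num:
            break
        if task_id in prescheduled:
            continue  # already marked as pre-assigned
        candidates = [n for n in histories if n not in prefetched]
        if not candidates:
            break  # every node is already marked as prefetched
        best = min(
            candidates,
            key=lambda n: predict_time(slot_free_at[n], histories[n]),
        )
        prescheduled[task_id] = best  # mark the task as pre-assigned
        prefetched[best] = task_id    # mark the node as prefetched
        assigned += 1
    return prescheduled
```

As the claim notes, a round can end with tasks that are still not pre-assigned and nodes that are still not prefetched; in this sketch that happens when either `num` or the set of unmarked nodes is exhausted first.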
  4. The dynamic task scheduling method for the distributed computing mode in a cloud computing environment according to claim 1, characterized in that: the size of the task slot of a compute node in Step 3 is the maximum number of tasks the compute node can execute in parallel at one time, and the task slot size is configured before the distributed node cluster starts; a compute node requests a task from the master node only when it has a free task slot; the request is conveyed through the heartbeat information, which contains a task-request flag bit; if the flag is true, the compute node has a free task slot and the master node may assign a task to that compute node for execution.
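A minimal sketch of how a compute node might derive the task-request flag of claim 4 from its slot occupancy. The `SlotTracker` class and its field names are hypothetical; the claim states only that the slot count is fixed before cluster start-up and that the flag is true when a slot is free.

```python
from dataclasses import dataclass


@dataclass
class SlotTracker:
    slot_count: int   # configured before the cluster starts
    running: int = 0  # tasks currently executing on this node

    def request_flag(self) -> bool:
        """Value of the task-request flag to place in the next heartbeat."""
        return self.running < self.slot_count
```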
  5. The dynamic task scheduling method for the distributed computing mode in a cloud computing environment according to claim 1, characterized in that: in Step 4, the assignment of tasks by the master node to the compute nodes that request them is decided by a distributed scheduling algorithm; the distributed scheduling algorithm is implemented in the scheduler of the master node; several compute nodes may request tasks at the same time; the scheduler reads the node state table, the task state table, the node prefetch table, and the task pre-assignment table, combines them with the queue of remaining tasks, determines according to the distributed scheduling algorithm the priority order in which compute nodes are assigned tasks and the number of tasks assigned, and then updates the node prefetch table and the task pre-assignment table; the specific implementation process is as follows:
     1) The master node first looks up the node prefetch table to determine whether a task has already been pre-assigned to the node; if so, it assigns the pre-assigned task to the node, marks the task as assigned, and marks the node as not prefetched.
     2) If the master node has not pre-assigned a task to the node, then, under the prediction algorithm of formula (1) described in Step 2, a node not marked as prefetched is a node with poorer performance or weaker computing capability; the master node selects a task from the task queue for that node to execute but does not mark the task as assigned; in the next pre-assignment round the task will be pre-assigned to a faster node, so the task will have a backup copy running on a fast node to ensure that it completes successfully.
     3) After task assignment to the compute node is finished, the master node updates the node prefetch table and the task pre-assignment table.
     4) The task scheduling phase ends and the information update phase begins; the master node continues to receive and process the heartbeat information sent by the compute nodes.
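The dispatch decision in sub-steps 1) and 2) of claim 5 might look like the sketch below. It is illustrative only: the function name, the set and dictionary representations of the tables, and the `speculative` set (added here so that the same task is not handed to two slow nodes) are assumptions, as is the choice of the first eligible task in the queue, which the claim does not specify.

```python
from typing import Dict, List, Optional, Set


def dispatch(
    node: str,
    queue: List[str],              # remaining task ids, in queue order
    prefetched: Dict[str, str],    # node name -> pre-fetched task id
    prescheduled: Dict[str, str],  # task id -> pre-scheduled node name
    assigned: Set[str],            # task ids marked as assigned
    speculative: Set[str],         # hypothetical: tasks lent out to slow nodes
) -> Optional[str]:
    """Pick a task for a node that has requested one, or None if none fits."""
    # Sub-step 1: the node has a pre-assigned task. Hand it over, mark the
    # task as assigned and the node as not prefetched.
    task = prefetched.pop(node, None)
    if task is not None:
        assigned.add(task)
        return task
    # Sub-step 2: no pre-assigned task, so the node is treated as a slow node.
    # Give it a pending task WITHOUT marking the task as assigned, so a later
    # pre-assignment round can still place a backup copy on a faster node.
    for candidate in queue:
        if (
            candidate not in assigned
            and candidate not in prescheduled
            and candidate not in speculative
        ):
            speculative.add(candidate)
            return candidate
    return None
```

Leaving the slow node's task unmarked is what allows the backup copy: the task remains visible to the next pre-assignment round.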
CN 201010583597 2010-12-13 2010-12-13 Task-dynamic dispatching method under distributed computation mode in cloud computing environment CN102073546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010583597 CN102073546B (en) 2010-12-13 2010-12-13 Task-dynamic dispatching method under distributed computation mode in cloud computing environment

Publications (2)

Publication Number Publication Date
CN102073546A (en) 2011-05-25
CN102073546B (en) 2013-07-10

Family

ID=44032092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010583597 CN102073546B (en) 2010-12-13 2010-12-13 Task-dynamic dispatching method under distributed computation mode in cloud computing environment

Country Status (1)

Country Link
CN (1) CN102073546B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719931A (en) * 2009-11-27 2010-06-02 南京邮电大学 Multi-intelligent body-based hierarchical cloud computing model construction method
US20100287280A1 (en) * 2009-05-08 2010-11-11 Gal Sivan System and method for cloud computing based on multiple providers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
万至臻 等: "基于MapReduce模型的并行计算平台的设计与实现", 《中国优秀硕士学位论文全文数据库》, 31 December 2008 (2008-12-31) *

Cited By (78)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102843248A (en) * 2011-06-21 2012-12-26 中兴通讯股份有限公司 Method and device for automatic standalone distributed deployment of software
CN102843248B (en) * 2011-06-21 2018-02-02 中兴通讯股份有限公司 Stand-alone software automatically distributed deployment method and apparatus
CN102209041A (en) * 2011-07-13 2011-10-05 上海红神信息技术有限公司 Scheduling method, device and system
CN102209041B (en) 2011-07-13 2014-05-07 上海红神信息技术有限公司 Scheduling method, device and system
CN102916992A (en) * 2011-08-03 2013-02-06 中兴通讯股份有限公司 Method and system for scheduling cloud computing remote resources unifiedly
WO2013016977A1 (en) * 2011-08-03 2013-02-07 中兴通讯股份有限公司 Method and system for uniformly scheduling remote resources of cloud computing
CN102916992B (en) * 2011-08-03 2016-12-28 世纪恒通科技股份有限公司 A uniform cloud remote resource scheduling method and system
CN102347989A (en) * 2011-10-25 2012-02-08 百度在线网络技术(北京)有限公司 Data distribution method and system based on resource description symbols
CN102360314A (en) * 2011-10-28 2012-02-22 中国科学院计算技术研究所 System and method for managing resources of data center
CN102404615A (en) * 2011-11-29 2012-04-04 广东威创视讯科技股份有限公司 Video processing system based on cloud computing
CN102495759A (en) * 2011-12-08 2012-06-13 曙光信息产业(北京)有限公司 Method for scheduling job in cloud computing environment
CN102541640A (en) * 2011-12-28 2012-07-04 厦门市美亚柏科信息股份有限公司 Cluster GPU (graphic processing unit) resource scheduling system and method
CN102541640B (en) 2011-12-28 2014-10-29 厦门市美亚柏科信息股份有限公司 One kind of gpu cluster resource scheduling system and method
CN103324533B (en) * 2012-03-22 2016-12-28 华为技术有限公司 Distributed data processing method, apparatus and system for
CN103324533A (en) * 2012-03-22 2013-09-25 华为技术有限公司 DDP (distributed data processing) method, device and system
CN103365713A (en) * 2012-04-01 2013-10-23 华为技术有限公司 Resource dispatch and management method and device
CN103365713B (en) * 2012-04-01 2017-06-20 华为技术有限公司 Scheduling and resource management method and apparatus
WO2013149502A1 (en) * 2012-04-01 2013-10-10 华为技术有限公司 Method and device of resource scheduling and management
CN103377087B (en) * 2012-04-27 2017-06-23 北大方正集团有限公司 Task A data processing method, apparatus and system for
CN103377087A (en) * 2012-04-27 2013-10-30 北大方正集团有限公司 Data task processing method, device and system
CN103546510A (en) * 2012-07-13 2014-01-29 云联(北京)信息技术有限公司 Management system and management method on basis of cloud service
CN103546509B (en) * 2012-07-13 2018-05-29 天津米游科技有限公司 Kind of cloud service system and resource-saving method to save resources
CN103546509A (en) * 2012-07-13 2014-01-29 云联(北京)信息技术有限公司 Resource-saving cloud service system and resource saving method
CN103546510B (en) * 2012-07-13 2018-08-28 天津米游科技有限公司 Cloud-based service management system and management methods
CN102866918A (en) * 2012-07-26 2013-01-09 中国科学院信息工程研究所 Resource management system for distributed programming framework
CN102866918B (en) * 2012-07-26 2016-02-24 中国科学院信息工程研究所 Resources management system for distributed programming framework
CN103713942A (en) * 2012-09-28 2014-04-09 腾讯科技(深圳)有限公司 Method and system for dispatching and running a distributed computing frame in cluster
CN103713942B (en) * 2012-09-28 2018-01-05 腾讯科技(深圳)有限公司 Dispatching distributed computing frameworks of the methods and systems in the cluster
CN103036946A (en) * 2012-11-21 2013-04-10 中国电信股份有限公司云计算分公司 Method and system for processing file backup on cloud platform
CN103036946B (en) * 2012-11-21 2016-08-24 中国电信股份有限公司 One kind of cloud platform approach to file backup tasks and system for
CN103092698A (en) * 2012-12-24 2013-05-08 中国科学院深圳先进技术研究院 System and method of cloud computing application automatic deployment
CN103001809A (en) * 2012-12-25 2013-03-27 曙光信息产业(北京)有限公司 Service node state monitoring method for cloud storage system
CN103064742A (en) * 2012-12-25 2013-04-24 中国科学院深圳先进技术研究院 Automatic deployment system and method of hadoop cluster
CN103064742B (en) * 2012-12-25 2016-05-11 中国科学院深圳先进技术研究院 Automatic deployment system and method for hadoop cluster
CN103001809B (en) * 2012-12-25 2016-12-28 曙光信息产业(北京)有限公司 A cloud storage service node system status monitoring method
CN103095853A (en) * 2013-02-27 2013-05-08 北京航空航天大学 Cloud data center calculation capacity management system
CN103095853B (en) * 2013-02-27 2016-08-03 北京航空航天大学 Cloud computing data center management system
CN104077188A (en) * 2013-03-29 2014-10-01 西门子公司 Method and device for scheduling tasks
CN103297499B (en) * 2013-04-19 2017-02-08 无锡成电科大科技发展有限公司 A method of scheduling method and system based on cloud platform
CN103297499A (en) * 2013-04-19 2013-09-11 无锡成电科大科技发展有限公司 Scheduling method and system based on cloud platform
CN104123214B (en) * 2013-04-26 2017-07-14 阿里巴巴集团控股有限公司 Run-time task-based methods of data measurement and display of progress in the implementation and systems
CN104166589A (en) * 2013-05-17 2014-11-26 阿里巴巴集团控股有限公司 Heartbeat package processing method and device
CN103309738A (en) * 2013-05-31 2013-09-18 中国联合网络通信集团有限公司 User job scheduling method and device
CN103309738B (en) * 2013-05-31 2016-12-28 中国联合网络通信集团有限公司 User job scheduling method and apparatus
CN103347055B (en) * 2013-06-19 2016-04-20 北京奇虎科技有限公司 Platform task processing system, apparatus and method cloud
CN103347055A (en) * 2013-06-19 2013-10-09 北京奇虎科技有限公司 System, device and method for processing tasks in cloud computing platform
CN103414771B (en) * 2013-08-05 2017-02-15 国云科技股份有限公司 One kind of cloud monitoring method between nodes in a long-tasking operating environment
CN103414771A (en) * 2013-08-05 2013-11-27 国云科技股份有限公司 Monitoring method for long task operation between nodes in cloud computing environment
CN103500119B (en) * 2013-09-06 2017-01-04 西安交通大学 One kind of task allocation method for pre-scheduling based on
CN103500119A (en) * 2013-09-06 2014-01-08 西安交通大学 Task allocation method based on pre-dispatch
CN103617305A (en) * 2013-10-22 2014-03-05 芜湖大学科技园发展有限公司 Self-adaptive electric power simulation cloud computing platform job scheduling algorithm
WO2015061976A1 (en) * 2013-10-30 2015-05-07 Nokia Technologies Oy Methods and apparatus for task management in a mobile cloud computing environment
WO2015066979A1 (en) * 2013-11-07 2015-05-14 浪潮电子信息产业股份有限公司 Machine learning method for mapreduce task resource configuration parameters
CN103761146A (en) * 2014-01-06 2014-04-30 浪潮电子信息产业股份有限公司 Method for dynamically setting quantities of slots for MapReduce
CN103761146B (en) * 2014-01-06 2017-10-31 浪潮电子信息产业股份有限公司 MapReduce slots set one kind of dynamic number of methods
CN104268007A (en) * 2014-01-07 2015-01-07 深圳市华傲数据技术有限公司 Distributed event request scheduling method and system
CN104917642A (en) * 2014-03-11 2015-09-16 深圳业拓讯通信科技有限公司 Port mirror image data transmitting method and system
CN103941662A (en) * 2014-03-19 2014-07-23 华存数据信息技术有限公司 Task scheduling system and method based on cloud computing
CN104008002A (en) * 2014-06-17 2014-08-27 电子科技大学 Target host selection method for deploying virtual machine under cloud platform environment
CN104008002B (en) * 2014-06-17 2016-11-30 电子科技大学 Target host selection method under a cloud platform to deploy a virtual machine environment
CN104102533A (en) * 2014-06-17 2014-10-15 华中科技大学 Bandwidth aware based Hadoop scheduling method and system
CN104301423A (en) * 2014-10-24 2015-01-21 北京奇虎科技有限公司 Heartbeat message sending method, device and system
CN104360909A (en) * 2014-11-04 2015-02-18 无锡天脉聚源传媒科技有限公司 Method and device for processing videos
CN104360909B (en) * 2014-11-04 2017-10-03 无锡天脉聚源传媒科技有限公司 A video processing method and apparatus
CN104461722A (en) * 2014-12-16 2015-03-25 广东石油化工学院 Job scheduling method used for cloud computing system
CN104461722B (en) * 2014-12-16 2017-11-10 广东石油化工学院 A scheduling method for a cloud computing system
CN104462581A (en) * 2014-12-30 2015-03-25 成都因纳伟盛科技股份有限公司 Micro-channel memory mapping and Smart-Slice based ultrafast file fingerprint extraction system and method
CN104462581B (en) * 2014-12-30 2018-03-06 成都因纳伟盛科技股份有限公司 Microchannels memory mapping and the Smart-Slice speed file fingerprint extraction system and method based on
CN104503845B (en) * 2015-01-14 2017-07-14 北京邮电大学 One kind of task distribution method and system
CN104503845A (en) * 2015-01-14 2015-04-08 北京邮电大学 Task distributing method and system
CN106156631A (en) * 2015-06-01 2016-11-23 上海红神信息技术有限公司 Software-hardware device with uncertain service functions and structural characterization
CN104933110B (en) * 2015-06-03 2018-02-09 电子科技大学 Prefetching data based MapReduce
CN104933110A (en) * 2015-06-03 2015-09-23 电子科技大学 MapReduce-based data pre-fetching method
CN105227488A (en) * 2015-08-25 2016-01-06 上海交通大学 Network flow group scheduling method used for distributed computer platform
CN105095008B (en) * 2015-08-25 2018-04-17 国电南瑞科技股份有限公司 Suitable for distributed cluster system task fault tolerance method
CN105227488B (en) * 2015-08-25 2018-05-08 上海交通大学 A network stream group scheduling method for a distributed computer platforms
CN105095008A (en) * 2015-08-25 2015-11-25 国电南瑞科技股份有限公司 Distributed task fault redundancy method suitable for cluster system
CN105468726A (en) * 2015-11-20 2016-04-06 广州视源电子科技股份有限公司 Local computing and distributed computing based data computing method and system

Also Published As

Publication number Publication date Type
CN102073546B (en) 2013-07-10 grant

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
C14 Grant of patent or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100191 HAIDIAN, BEIJING TO: 201401 FENGXIAN, SHANGHAI

C41 Transfer of patent application or patent right or utility model
ASS Succession or assignment of patent right

Owner name: SHANGHAI SHICONG INFORMATION TECHNOLOGY CO., LTD.

Free format text: FORMER OWNER: BEIHANG UNIVERSITY

Effective date: 20150512

C56 Change in the name or address of the patentee