WO2017016421A1 - Task execution method and apparatus in a cluster - Google Patents

Task execution method and apparatus in a cluster

Info

Publication number
WO2017016421A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
cluster
executed
cluster resource
resource set
Prior art date
Application number
PCT/CN2016/090617
Other languages
English (en)
French (fr)
Inventor
夏晨
徐常亮
张严明
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2017016421A1
Priority to US15/880,432 (US20180150326A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3889Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
    • G06F9/3891Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485Task life-cycle, e.g. stopping, restarting, resuming execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources

Definitions

  • the present application relates to the field of computer technologies, and in particular, to a task execution method and apparatus in a cluster.
  • the cluster may be a cluster for providing services such as cloud computing and big data processing.
  • In the prior art, a cluster generally executes tasks one after another, allocating cluster resources in the order in which the tasks were acquired.
  • the amount of data for each task may be different.
  • a task with a large amount of data may be referred to as a large task, and a task with a small amount of data may be referred to as a small or medium task.
  • the data volume threshold for distinguishing between large tasks and small and medium tasks can be set by the cluster.
  • when executing a large task, the cluster may occupy all the cluster resources for a long time.
  • a large number of small and medium tasks may then wait for a long time because they cannot obtain cluster resources until the cluster completes the large task.
  • only after the cluster resources occupied by the large task are released can the cluster execute the waiting small and medium tasks.
  • the embodiments of the present application provide a task execution method and apparatus in a cluster, to solve the prior-art problem that, when a task occupies all the cluster resources for a long time, the cluster cannot execute other tasks in a timely manner.
  • the task to be executed is executed by using the cluster resource included in the determined cluster resource set.
  • a determining module configured to determine, according to the specified attribute of the to-be-executed task, a cluster resource set corresponding to the to-be-executed task in each of the pre-divided cluster resource sets;
  • the execution module is configured to execute the to-be-executed task by using the cluster resource included in the determined cluster resource set.
  • with at least one of the above technical solutions, different tasks to be executed may correspond to different cluster resource sets, and any task to be executed occupies only the cluster resources in its corresponding cluster resource set rather than all the cluster resources of the cluster. Therefore, even if a task to be executed occupies all the cluster resources in its corresponding cluster resource set for a long time, the cluster can still use the cluster resources in the other cluster resource sets to execute, in a timely manner, the tasks corresponding to those sets.
  • FIG. 1 is a schematic diagram of a task execution process in a cluster according to an embodiment of the present application
  • FIG. 2 is a cluster architecture in which a task execution method in a cluster provided by the present application can be implemented in an actual application;
  • FIG. 3 is a schematic diagram of a task execution process of the cluster in FIG. 2 according to an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of a task execution apparatus in a cluster according to an embodiment of the present disclosure.
  • FIG. 1 is a flowchart of a task execution process in a cluster according to an embodiment of the present disclosure, which specifically includes the following steps:
  • the execution subject of the task execution method may be a cluster, and the cluster may be a Hadoop cluster or a cluster based on another distributed architecture. In practical applications, the cluster can be used to provide services such as cloud computing and big data processing.
  • Each of the steps of the task execution method may be specifically performed by one or more machines in the cluster, and the machines may be task schedulers and/or task performers in the cluster.
  • the user can submit the to-be-executed task to the cluster through the client corresponding to the cluster, and the cluster can obtain the to-be-executed task.
  • the task to be executed may be a request for the cluster to perform a specified operation on specified data.
  • the query task can be submitted to the cluster.
  • the query task may include keywords of the query, and related information of all the papers, such as address indexes of all the papers mentioned above.
  • the cluster may determine the amount of data of the query task according to the information contained in the query task, where the amount of data may be the size of a file in which all the papers are stored.
  • in this example, the specified data is the file in which all the papers are stored, and the specified operation is querying the total frequency of occurrence of the technical noun a.
  • the specified operation may also be deleting, modifying, creating, authorizing, and so on; the operation mode and operation content of the specified operation involved in the task to be executed are not limited.
  • the cluster may acquire multiple tasks to be executed at the same time, or may sequentially acquire each task to be executed in the task queue based on the task queue.
  • the subsequent steps may be separately performed for each of the acquired tasks to be executed.
  • the to-be-executed task mentioned in the subsequent step may refer to any one of the to-be-executed tasks acquired by the cluster.
  • S102 Determine, according to the specified attribute of the to-be-executed task, a cluster resource set corresponding to the to-be-executed task in each of the pre-divided cluster resource sets.
  • the cluster resource may be a computing resource used when executing a task to be executed.
  • the cluster resources may be measured in different units, including but not limited to the following three units:
  • the first type: the number of machines.
  • any machine in the cluster can serve as one unit of cluster resources.
  • the cluster resource set may include a set number of machines.
  • the second type: the number of Central Processing Units (CPUs).
  • any CPU of any machine in the cluster can serve as one unit of cluster resources.
  • the cluster resource set may include a first set number of CPUs.
  • the third type: the number of processes for executing tasks.
  • the cluster resource set may include a second set number of processes for executing tasks.
  • all the cluster resources of the cluster may be divided in advance into at least two cluster resource sets, and the cluster resources in each cluster resource set are used as one unit, so that the cluster uses the resources in a given set to execute the tasks corresponding to that set.
  • for example, one cluster resource set (or several of them) may be used by the cluster to execute large tasks, and another cluster resource set (or several others) may be used to execute small and medium tasks. In this case, executing a large task does not occupy the cluster resources needed for small and medium tasks, which improves the efficiency of executing small and medium tasks.
  • in step S102 above, the specified attribute may include the amount of data.
  • the amount of data of a task to be executed reflects the size of the task.
  • when the data amount of the task to be executed is not greater than the set data amount threshold, the task may be considered a small or medium task.
  • when the data amount of the task to be executed is greater than the set data amount threshold, the task may be considered a large task.
  • a plurality of data amount thresholds may also be set, dividing the range of data amounts into a plurality of data amount intervals; tasks to be executed whose data amounts fall in the same interval correspond to the same cluster resource set.
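  • the mapping from data amount intervals to cluster resource sets can be sketched as follows. This is a minimal illustration only: the threshold values, the set names, and the function name `resource_set_for` are assumptions and do not appear in the application.

```python
from bisect import bisect_left

GB = 1024 ** 3

# Hypothetical thresholds (in bytes) that split data amounts into three intervals.
THRESHOLDS = [1 * GB, 100 * GB]
# One illustrative cluster resource set name per interval.
RESOURCE_SETS = ["small_set", "medium_set", "large_set"]


def resource_set_for(data_amount: int) -> str:
    """Return the resource set whose data amount interval contains data_amount.

    bisect_left keeps a value equal to a threshold in the lower interval,
    which matches the "not greater than the threshold" wording.
    """
    return RESOURCE_SETS[bisect_left(THRESHOLDS, data_amount)]
```

  • with these assumed thresholds, a task of exactly 1 GB maps to the first set and a task of 1 GB plus one byte maps to the second.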
  • the specified attribute may also be at least one of a task execution manner, a task priority, and the like.
  • the task execution manner may be online execution or offline execution. Online execution may mean that the execution subject is connected to the Internet while executing the task, so that the execution result can be returned quickly; offline execution may mean that the execution subject is not connected to the Internet while executing the task.
  • where the user requires execution results to be returned quickly, the cluster can execute small and medium tasks online.
  • where the user's requirement on the speed of returning execution results is low, the cluster can execute large tasks offline.
  • the task execution manner may be specified by a user or may be specified by a cluster.
  • the cluster will preferentially execute the task to be executed with higher priority.
  • a cluster resource set may be allocated separately for the tasks to be executed of each task priority. In this case, tasks to be executed with different task priorities do not occupy the cluster resources allocated to one another.
  • the number of cluster resources in each divided cluster resource set may differ. Assume that the specified attribute is the data amount. Because executing a large task requires relatively more cluster resources, when the cluster resource sets are pre-divided, the cluster resource set corresponding to large tasks may contain more cluster resources, for example 80% of all cluster resources; correspondingly, the cluster resource set corresponding to small and medium tasks may contain 20% of all cluster resources. In this way, the load balancing capability of the cluster can be improved, so that the cluster has sufficient cluster resources for both large tasks and small and medium tasks.
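  • the 80%/20% pre-division can be sketched as follows, assuming machines are the unit of cluster resources. The function name `partition_machines` and the rule that each set keeps at least one machine are illustrative assumptions.

```python
def partition_machines(machines, large_share=0.8):
    """Split machines into (large_task_set, small_medium_task_set).

    large_share is the fraction given to large tasks; 0.8 is the
    example value from the text.
    """
    if len(machines) < 2:
        raise ValueError("need at least two machines to form two sets")
    cut = round(len(machines) * large_share)
    # Assumption: each set always keeps at least one machine.
    cut = min(max(cut, 1), len(machines) - 1)
    return machines[:cut], machines[cut:]
```

  • for ten machines this gives eight to the large-task set and two to the small and medium task set.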
  • S103 Perform the to-be-executed task by using the cluster resource included in the determined cluster resource set.
  • different tasks to be executed may correspond to different cluster resource sets, and any task to be executed occupies only the cluster resources in its corresponding cluster resource set rather than all the cluster resources of the cluster. Even if a task to be executed occupies all the cluster resources in its corresponding cluster resource set for a long time, the cluster can still use the cluster resources in the other cluster resource sets to execute, in a timely manner, the tasks corresponding to those sets.
  • for example, large tasks and small and medium tasks may correspond to different cluster resource sets, so that a large task occupies only the cluster resources in the set corresponding to large tasks and not those in the set corresponding to small and medium tasks. While executing the large task, the cluster can use the cluster resources in the set corresponding to small and medium tasks to execute small and medium tasks, and can therefore execute them in a timely manner.
  • for small and medium tasks, the cluster can execute them online, and for large tasks, the cluster can execute them offline.
  • the cluster resource set includes at least: a cluster resource set that provides a cluster resource for an online execution task, and a cluster resource set that provides a cluster resource for an offline execution task.
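  • when the determined cluster resource set is the one that provides cluster resources for online execution tasks, executing the to-be-executed task may include: executing the to-be-executed task online.
  • when the determined cluster resource set is the one that provides cluster resources for offline execution tasks, executing the to-be-executed task may include: executing the to-be-executed task offline.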
  • the cluster resource set that provides cluster resources for online execution tasks, together with the machines that execute tasks online in the cluster, may constitute a complete system, which may be referred to as an online Massively Parallel Processing (MPP) system.
  • the online MPP system may be a system running processes such as Impala or SQL on Spark, which can execute small and medium tasks quickly online.
  • the cluster resource set that provides cluster resources for offline execution tasks, together with the machines that execute tasks offline in the cluster, may also constitute a complete system, which may be referred to as an offline MapReduce (MR) system.
  • the offline MR system may be an offline big data processing system, such as Hadoop, that implements the MapReduce computing model.
  • determining the cluster resource set corresponding to the to-be-executed task may include: determining whether the data amount of the to-be-executed task is not greater than a data amount threshold; if so, determining the cluster resource set that provides cluster resources for online execution tasks as the cluster resource set corresponding to the to-be-executed task; otherwise, determining the cluster resource set that provides cluster resources for offline execution tasks as the cluster resource set corresponding to the to-be-executed task.
  • for example, when the cluster obtains the query task, it can determine whether the amount of data required to execute the query task is not greater than 1 GB;
  • if so, the query task can be considered a small or medium task. It can therefore be determined that the query task corresponds to the cluster resource set that provides cluster resources for online execution tasks, and the online MPP system in the cluster can then use the cluster resources in that set to execute the query task online;
  • otherwise, the query task can be considered a large task. It can therefore be determined that the query task corresponds to the cluster resource set that provides cluster resources for offline execution tasks, and the offline MR system in the cluster can then use the cluster resources in that set to execute the query task offline.
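  • the threshold check in this example can be sketched as follows. The `QueryTask` class, its `file_sizes` field, the `route` function, and the returned system labels are hypothetical names used only for illustration; the 1 GB threshold is the value from the example.

```python
from dataclasses import dataclass

GB = 1024 ** 3
DATA_AMOUNT_THRESHOLD = 1 * GB  # the 1 GB figure used in the example


@dataclass
class QueryTask:
    keyword: str
    file_sizes: list  # sizes in bytes of the files the query scans (hypothetical field)

    @property
    def data_amount(self) -> int:
        # The data amount is taken as the total size of the stored files.
        return sum(self.file_sizes)


def route(task: QueryTask) -> str:
    """Pick the system that should run the task, based on its data amount."""
    if task.data_amount <= DATA_AMOUNT_THRESHOLD:
        return "online_mpp"   # small or medium task
    return "offline_mr"       # large task
```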
  • the cluster may further decompose the task to be executed into a set number of task instances (a task instance may also be referred to as a subtask). The task instances may then be submitted to different processes in the cluster for execution; after every task instance has finished, the execution results of the task instances are summarized and merged to obtain the execution result of the to-be-executed task.
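  • the decompose, execute, and merge pattern can be sketched as follows for the term-frequency query used earlier. This is an assumption-laden illustration: threads stand in for the cluster processes, documents are plain strings, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_instances(documents, instance_count):
    """Deal the documents round-robin into instance_count task instances."""
    return [documents[i::instance_count] for i in range(instance_count)]


def count_term(documents, term):
    """One task instance: count occurrences of term in its share of documents."""
    return sum(doc.count(term) for doc in documents)


def run_query(documents, term, instance_count=4):
    """Decompose the query, run the instances concurrently, merge the results."""
    instances = split_into_instances(documents, instance_count)
    # Assumption: threads stand in for the processes of the cluster.
    with ThreadPoolExecutor(max_workers=instance_count) as pool:
        partial_counts = list(pool.map(lambda part: count_term(part, term), instances))
    # Merging step: the total frequency is the sum of the partial counts.
    return sum(partial_counts)
```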
  • the method used by the cluster to decompose the task to be executed is not limited, and may be decomposed according to the amount of data, or may be decomposed according to other attributes of the task to be executed.
  • the specified attribute may also be the number of task instances decomposed from the to-be-executed task. In this case, determining the cluster resource set corresponding to the to-be-executed task may include: determining whether the number of task instances decomposed from the to-be-executed task is not greater than an instance number threshold; if so, determining the cluster resource set that provides cluster resources for online execution tasks as the cluster resource set corresponding to the to-be-executed task; otherwise, determining the cluster resource set that provides cluster resources for offline execution tasks as the cluster resource set corresponding to the to-be-executed task.
  • for example, assume the task to be executed is a query task whose data amount is 1 GB. The cluster decomposes the query task into task instances by data amount, with the data amount of each task instance set to 256 megabytes (MB), so the query task is decomposed into four task instances. If this number is not greater than the instance number threshold, it can be determined that the query task corresponds to the cluster resource set that provides cluster resources for online execution tasks, and the online MPP system in the cluster can then use the cluster resources in that set to execute the query task online.
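  • the arithmetic of this example can be sketched as follows. The instance number threshold is passed in as a parameter because the text does not fix its value; the function names and returned labels are illustrative assumptions.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def instance_count(data_amount: int, instance_size: int = 256 * MB) -> int:
    """Number of task instances when each instance handles instance_size bytes."""
    # Integer ceiling division, so a partial last chunk still gets an instance.
    return -(-data_amount // instance_size)


def choose_by_instances(data_amount: int, instance_threshold: int,
                        instance_size: int = 256 * MB) -> str:
    """Route online when the instance count is not greater than the threshold."""
    if instance_count(data_amount, instance_size) <= instance_threshold:
        return "online_mpp"
    return "offline_mr"
```

  • for a 1 GB task and 256 MB instances, `instance_count` yields four, as in the example.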
  • the cluster resource set that provides cluster resources for offline execution tasks may contain more cluster resources than the cluster resource set that provides cluster resources for online execution tasks; correspondingly, the cluster's capacity to execute tasks offline may exceed its capacity to execute tasks online.
  • even so, the cluster may take a long time to execute certain small and medium tasks with the cluster resources in the set that provides cluster resources for online execution tasks, so that subsequent small and medium tasks cannot be executed in time.
  • the cluster resources included in the cluster resource set that provides the cluster resources for offline execution tasks can also be used to perform these small and medium tasks, thereby preventing the small and medium tasks in the cluster from being blocked.
  • the method may further include: timing the online execution of the to-be-executed task; when the timed duration is greater than a duration threshold, stopping the online execution of the to-be-executed task and releasing the cluster resources it occupies; and executing the to-be-executed task offline by using the cluster resource set that provides cluster resources for offline execution tasks.
  • the duration threshold can generally be set to 600 seconds.
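  • the timeout fallback can be sketched as follows. This is only an illustration under stated assumptions: a thread stands in for the online execution, and stopping is cooperative (the online function must poll a stop event), which stands in for stopping the online job and releasing its cluster resources. The function and parameter names are hypothetical.

```python
import threading

DURATION_THRESHOLD_S = 600.0  # the 600-second value mentioned in the text


def run_with_fallback(online_fn, offline_fn, threshold_s=DURATION_THRESHOLD_S):
    """Run online_fn(stop_event); fall back to offline_fn() on timeout.

    Assumption: online_fn polls stop_event and returns promptly once it is
    set, which models stopping the task and releasing its resources.
    """
    stop_event = threading.Event()
    result = {}

    def worker():
        result["value"] = online_fn(stop_event)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout=threshold_s)   # time the online execution
    if thread.is_alive():
        stop_event.set()               # stop the online execution
        thread.join()                  # wait until it has actually stopped
        return "offline", offline_fn() # re-run the task offline
    return "online", result["value"]
```

  • the function returns which path produced the result together with the result itself, so a caller can log how often the fallback is taken.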
  • the specific values of the foregoing data volume threshold, the instance number threshold, and the duration threshold are not limited, and the thresholds may be set according to actual application scenarios.
  • the cluster may record, in the form of logs, the execution process and the execution result of each task executed on each cluster resource set.
  • from these logs, the load balancing status in the cluster can be determined, and the cluster resources in each cluster resource set can be adjusted periodically or irregularly according to that status to optimize load balancing in the cluster.
  • for example, suppose that the cluster resources in the set that provides cluster resources for online execution tasks often time out when executing small and medium tasks, while some of the cluster resources in the set that provides cluster resources for offline execution tasks are often idle. In that case, the often-idle cluster resources can be reassigned to the cluster resource set that provides cluster resources for online execution tasks and used to execute small and medium tasks online, thereby optimizing load balancing in the cluster.
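  • this log-driven adjustment can be sketched as follows. The inputs (`timeout_rate`, `idle_fraction`) are assumed to have been derived from the logs beforehand, and the limits 0.2 and 0.8 as well as the rule that the offline set is never emptied are illustrative assumptions, not values from the application.

```python
def rebalance(online, offline, timeout_rate, idle_fraction,
              timeout_limit=0.2, idle_limit=0.8):
    """Move often-idle offline machines to the online set when online tasks time out.

    online, offline: lists of machine names in each cluster resource set.
    timeout_rate: fraction of online tasks that timed out (from the logs).
    idle_fraction: dict mapping an offline machine to its idle time fraction.
    """
    if timeout_rate <= timeout_limit:
        return list(online), list(offline)  # online set is coping; no change
    idle = [m for m in offline if idle_fraction.get(m, 0.0) >= idle_limit]
    # Assumption: keep at least one machine in the offline set.
    idle = idle[: max(len(offline) - 1, 0)]
    new_online = list(online) + idle
    new_offline = [m for m in offline if m not in idle]
    return new_online, new_offline
```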
  • the embodiments of the present application also provide a cluster architecture in which the task execution method in a cluster can be implemented in an actual application, as shown in FIG. 2.
  • FIG. 2 includes L clients and a cluster. The cluster includes a task scheduler, an online MPP system, and an offline MR system, where the online MPP system contains N task execution machines and the offline MR system contains M task execution machines.
  • the online MPP system may include a cluster resource set that provides cluster resources for performing tasks online
  • the offline MR system may include a cluster resource set that provides cluster resources for offline execution tasks.
  • the cluster resources contained in the cluster resource collection can be task execution machines.
  • the task execution process in the cluster may specifically include the following steps:
  • the task scheduler obtains a task to be executed submitted by the user through the client.
  • S302 The task scheduler determines whether the data amount of the to-be-executed task is not greater than a data amount threshold. If yes, step S303 is performed; otherwise, step S306 is performed.
  • S303 The task scheduler sends the to-be-executed task to the online MPP system.
  • S304 The online MPP system performs the to-be-executed task online by using the task execution machine included in itself, and starts timing the time for executing the to-be-executed task.
  • the task scheduler sends the to-be-executed task to the offline MR system for offline execution.
  • the embodiments of the present application further provide a corresponding task execution apparatus in a cluster, as shown in FIG. 4.
  • FIG. 4 is a schematic structural diagram of a task execution apparatus in a cluster according to an embodiment of the present disclosure, which specifically includes:
  • the obtaining module 401 is configured to obtain a task to be executed
  • a determining module 402 configured to determine, according to the specified attribute of the to-be-executed task, a cluster resource set corresponding to the to-be-executed task in each of the pre-divided cluster resource sets;
  • the executing module 403 is configured to execute the to-be-executed task by using the cluster resource included in the determined cluster resource set.
  • the set of cluster resources includes at least: a cluster resource set that provides cluster resources for online execution tasks, and a cluster resource set that provides cluster resources for offline execution tasks.
  • the determining module 402 is specifically configured to: determine whether the data amount of the to-be-executed task is not greater than a data amount threshold; if so, determine the cluster resource set that provides cluster resources for online execution tasks as the cluster resource set corresponding to the to-be-executed task; otherwise, determine the cluster resource set that provides cluster resources for offline execution tasks as the cluster resource set corresponding to the to-be-executed task.
  • the determining module 402 is specifically configured to: determine whether the number of task instances decomposed from the to-be-executed task is not greater than an instance number threshold; if so, determine the cluster resource set that provides cluster resources for online execution tasks as the cluster resource set corresponding to the to-be-executed task; otherwise, determine the cluster resource set that provides cluster resources for offline execution tasks as the cluster resource set corresponding to the to-be-executed task.
  • the execution module 403 is specifically configured to: execute the to-be-executed task online by using the cluster resources in the cluster resource set that provides cluster resources for online execution tasks;
  • the executing module 403 is specifically configured to: execute the to-be-executed task offline by using the cluster resources in the cluster resource set that provides cluster resources for offline execution tasks.
  • the device also includes:
  • the switching module 404 is configured to time the online execution of the to-be-executed task by the execution module 403; when the timed duration is greater than the duration threshold, stop the online execution of the to-be-executed task and release the cluster resources it occupies; and execute the to-be-executed task offline by using the cluster resource set that provides cluster resources for offline execution tasks.
  • the specific device shown in Figure 4 above may be located on a machine in the cluster.
  • An embodiment of the present invention provides a task execution method and apparatus in a cluster. The method obtains a task to be executed, determines, according to a specified attribute of the to-be-executed task, the cluster resource set corresponding to the to-be-executed task among the pre-divided cluster resource sets, and executes the to-be-executed task by using the cluster resources in the determined cluster resource set.
  • even if a task to be executed occupies all the cluster resources in its corresponding cluster resource set for a long time, the cluster can still use the cluster resources in the other cluster resource sets to execute, in a timely manner, the tasks corresponding to those sets.
  • Those skilled in the art should understand that embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus. The instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include non-permanent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM).
  • Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology.
  • The information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device.
  • As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
  • Those skilled in the art should understand that embodiments of the present application can be provided as a method, a system, or a computer program product.
  • Accordingly, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • Moreover, the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

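The present application discloses a method and apparatus for executing tasks in a cluster. The method obtains a to-be-executed task; determines, according to a specified attribute of the to-be-executed task, the cluster resource set corresponding to that task among the pre-divided cluster resource sets; and executes the task by using the cluster resources contained in the determined cluster resource set. With this method, different to-be-executed tasks may correspond to different cluster resource sets, and any one task occupies only the cluster resources contained in its corresponding set rather than all cluster resources of the cluster. Therefore, even if a to-be-executed task occupies all cluster resources of its corresponding set for a long time, the cluster can still use the cluster resources contained in the other sets to execute, in a timely manner, the other to-be-executed tasks corresponding to those sets.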

Description

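Method and apparatus for executing tasks in a cluster
The present application claims priority to Chinese Patent Application No. 201510455382.1, filed on July 29, 2015 and entitled "Method and apparatus for executing tasks in a cluster", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of computer technologies, and in particular to a method and apparatus for executing tasks in a cluster.
Background
A busy large-scale cluster may receive a large number of tasks every day. The cluster may be one that provides services such as cloud computing and big data processing.
In the prior art, a cluster generally uses its cluster resources to execute tasks one after another, in the order in which the tasks were obtained. Tasks may differ in data volume: tasks with a large data volume may be called large tasks, and the others small and medium tasks. The data volume threshold that separates large tasks from small and medium tasks may be set by the cluster.
However, while executing a large task, the cluster may need to occupy all cluster resources for a long time. As a result, many small and medium tasks may wait for a long time because they cannot obtain cluster resources. Only after the large task finishes and the cluster resources it occupied are released can the cluster execute the waiting small and medium tasks.
Therefore, with the prior-art way of executing tasks in a cluster, when a single task such as the large task above occupies all cluster resources for a long time, the cluster may be unable to execute other tasks in time.
Summary
Embodiments of the present application provide a method and apparatus for executing tasks in a cluster, to solve the prior-art problem that, when a single task occupies all cluster resources for a long time, the cluster cannot execute other tasks in time.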
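A method for executing tasks in a cluster provided by an embodiment of the present application includes:
obtaining a to-be-executed task;
determining, according to a specified attribute of the to-be-executed task, the cluster resource set corresponding to the to-be-executed task among the pre-divided cluster resource sets; and
executing the to-be-executed task by using the cluster resources contained in the determined cluster resource set.
An apparatus for executing tasks in a cluster provided by an embodiment of the present application includes:
an obtaining module, configured to obtain a to-be-executed task;
a determining module, configured to determine, according to a specified attribute of the to-be-executed task, the cluster resource set corresponding to the to-be-executed task among the pre-divided cluster resource sets; and
an execution module, configured to execute the to-be-executed task by using the cluster resources contained in the determined cluster resource set.
With at least one of the above technical solutions, different to-be-executed tasks may correspond to different cluster resource sets, and any one task occupies only the cluster resources contained in its corresponding set rather than all cluster resources of the cluster. Therefore, even if a to-be-executed task occupies all cluster resources of its corresponding set for a long time, the cluster can still use the cluster resources contained in the other sets to execute, in a timely manner, the other to-be-executed tasks corresponding to those sets.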
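Brief Description of the Drawings
The drawings described here are provided for further understanding of the present application and form a part of it. The exemplary embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Figure 1 is a schematic diagram of the task execution process in a cluster provided by an embodiment of the present application;
Figure 2 shows a cluster architecture that can implement, in practice, the task execution method in a cluster provided by the present application;
Figure 3 is a schematic diagram of the task execution process of the cluster in Figure 2, provided by an embodiment of the present application;
Figure 4 is a schematic structural diagram of the task execution apparatus in a cluster provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions are described below clearly and completely with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present application.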
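Figure 1 shows the task execution process in a cluster provided by an embodiment of the present application, which includes the following steps:
S101: Obtain a to-be-executed task.
The task execution method provided by the embodiments of the present application may be performed by a cluster. The cluster may be a Hadoop cluster or a cluster based on another distributed architecture; in practice, it may be used to provide services such as cloud computing and big data processing. Each step of the method may be performed by one or more machines in the cluster, and such a machine may be a task scheduling machine and/or a task execution machine of the cluster.
In the embodiments of the present application, a user may submit a to-be-executed task to the cluster through a client corresponding to the cluster, and the cluster thereby obtains the task. The to-be-executed task may be a specified operation on specified data that the cluster is requested to perform.
For example, suppose a user wants to find the total number of times a technical term (call it term a) appears in all papers in a paper database. The user may submit a query task to the cluster. The query task may contain the query keyword and information about the papers, such as their address index. From the information in the query task, the cluster can determine the task's data volume, which may be the size of the files that store the papers. In this example, the specified data is the files storing all the papers, and the specified operation is counting the total number of occurrences of term a.
Besides the query operation in this example, the specified operation may also be deletion, modification, creation, authorization, and so on. The present application does not limit the manner or content of the specified operation involved in a to-be-executed task.
In the embodiments of the present application, the cluster may obtain multiple to-be-executed tasks at once, or may obtain them one by one, for example from a task queue. In step S101, when the cluster obtains more than one to-be-executed task, it may perform the subsequent steps for each of them separately. For ease of description, the to-be-executed task mentioned in the subsequent steps may refer to any one of the tasks obtained by the cluster.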
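S102: Determine, according to a specified attribute of the to-be-executed task, the cluster resource set corresponding to the to-be-executed task among the pre-divided cluster resource sets.
In the embodiments of the present application, cluster resources may be the computing resources used to execute to-be-executed tasks. Cluster resources may be measured in different units, including but not limited to the following three:
First, the number of machines. In this case, any machine in the cluster can serve as one unit of cluster resource, and a cluster resource set may contain a set number of machines.
Second, the number of central processing units (CPUs). In this case, any CPU in any machine of the cluster (a multi-core machine may have several CPUs) can serve as one unit of cluster resource, and a cluster resource set may contain a first set number of CPUs.
Third, the number of processes used to execute tasks. In this case, any task-executing process on any machine in the cluster (the operating system allocates computing resources such as CPU time slices and memory to the process) can serve as one unit of cluster resource, and a cluster resource set may contain a second set number of such processes.
This concludes the description of the cluster resources referred to in the present application.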
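In the embodiments of the present application, all cluster resources in the cluster may be divided in advance into at least two cluster resource sets. The cluster can then use the cluster resources contained in each set to execute the to-be-executed tasks corresponding to that set.
For example, among the cluster resource sets, one set (or several) may be used by the cluster to execute large tasks, and another set (or several others) to execute small and medium tasks. Executing large tasks then does not occupy the cluster resources needed for small and medium tasks, which improves the efficiency of executing small and medium tasks.
For this example, the specified attribute in step S102 may include the data volume. In general, the data volume of a to-be-executed task reflects the size of the task. When the data volume is not greater than a set data volume threshold, the task may be regarded as a small or medium task; when the data volume is greater than the threshold, the task may be regarded as a large task. In practice, several data volume thresholds may be set, dividing the data volume range into several intervals; to-be-executed tasks whose data volumes fall in the same interval may correspond to the same cluster resource set.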
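The multi-threshold mapping described above can be illustrated with a short Python sketch. It is only an illustration under assumptions: the function name `pick_resource_set`, the set names, and the 1 GB and 100 GB thresholds are hypothetical and do not come from the application. A volume equal to a threshold falls in the lower interval, matching the "not greater than" wording.

```python
from bisect import bisect_left

GB = 1024 ** 3

def pick_resource_set(data_volume, thresholds, set_names):
    """Return the resource set whose data-volume interval contains data_volume.

    thresholds must be sorted in ascending order, and set_names must have
    exactly one more entry than thresholds (one set per interval).
    """
    if len(set_names) != len(thresholds) + 1:
        raise ValueError("need exactly one resource set per interval")
    # bisect_left counts the thresholds strictly below data_volume, so a
    # volume equal to a threshold stays in the lower interval.
    return set_names[bisect_left(thresholds, data_volume)]

# Hypothetical configuration: two thresholds give three intervals.
THRESHOLDS = [1 * GB, 100 * GB]
SETS = ["small_pool", "medium_pool", "large_pool"]
```

With a single threshold this reduces to the two-way split between small and medium tasks and large tasks.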
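Further, the specified attribute may also be at least one of the task execution mode, the task priority, and so on.
When the specified attribute is the task execution mode, the mode may be online execution or offline execution. Online execution may mean that the executing entity is connected to the Internet while executing the task, so that the execution result can be returned quickly; offline execution may mean that the executing entity is not connected to the Internet while executing the task. In practice, users expect results for small and medium tasks to be returned quickly, so the cluster may execute them online; users tolerate slower results for large tasks, so the cluster may execute them offline.
Note that the task execution mode may be specified by the user or by the cluster.
When the specified attribute is the task priority and the to-be-executed tasks submitted by users have different priorities, the cluster executes higher-priority tasks first. A separate cluster resource set may be allocated for the tasks of each priority, so that tasks of different priorities do not occupy the cluster resources allocated to each other.
In the embodiments of the present application, the cluster resource sets may contain different amounts of cluster resources. Suppose the specified attribute is the data volume. Because large tasks need relatively more cluster resources, the set for large tasks may be given more resources when the sets are divided, for example 80% of all cluster resources, and the set for small and medium tasks the remaining 20%. This improves the cluster's load balancing, so that the cluster can obtain enough cluster resources both for large tasks and for small and medium tasks.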
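As a rough illustration of the 80%/20% division, the following Python sketch splits a list of resource units (here machines, the first of the three units described earlier) into named sets. The function `partition_resources`, the machine names, and the set names are hypothetical; the application does not prescribe any particular partitioning procedure.

```python
def partition_resources(resources, shares):
    """Split a list of resource units into named sets by fractional share.

    shares maps a set name to its fraction of all resources; the fractions
    must sum to 1. The last set receives whatever remains after rounding.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    names = list(shares)
    result, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(resources)  # last set takes the remainder
        else:
            end = start + round(len(resources) * shares[name])
        result[name] = resources[start:end]
        start = end
    return result

# Ten hypothetical machines, divided 80% / 20% as in the example above.
machines = [f"machine-{i:02d}" for i in range(10)]
sets = partition_resources(
    machines, {"large_tasks": 0.8, "small_medium_tasks": 0.2}
)
```

Because the sets are disjoint slices of one list, no resource unit can belong to two sets at once, which is the property the method relies on.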
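S103: Execute the to-be-executed task by using the cluster resources contained in the determined cluster resource set.
With the above method, different to-be-executed tasks may correspond to different cluster resource sets, and any one task occupies only the cluster resources contained in its corresponding set rather than all cluster resources of the cluster. Therefore, even if a to-be-executed task occupies all cluster resources of its corresponding set for a long time, the cluster can still use the cluster resources contained in the other sets to execute, in a timely manner, the other to-be-executed tasks corresponding to those sets.
For example, when the specified attribute is the data volume, large tasks and small and medium tasks may correspond to different cluster resource sets. A large task then occupies only the cluster resources in the set for large tasks and not those in the set for small and medium tasks. While executing a large task, the cluster can therefore still use the set for small and medium tasks to execute small and medium tasks in a timely manner.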
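In the embodiments of the present application, the cluster may execute small and medium tasks online and large tasks offline. For this scenario, in one implementation of step S102, the cluster resource sets include at least: a cluster resource set that provides cluster resources for tasks executed online, and a cluster resource set that provides cluster resources for tasks executed offline.
Further, in step S103, when the determined cluster resource set is the one that provides cluster resources for tasks executed online, executing the to-be-executed task may specifically include: executing the to-be-executed task online.
When the determined cluster resource set is the one that provides cluster resources for tasks executed offline, executing the to-be-executed task may specifically include: executing the to-be-executed task offline.
In practice, the cluster resource set that provides cluster resources for tasks executed online, together with the machines in the cluster that execute tasks online, may form a complete system, which may be called an online massively parallel processing (MPP) system. Specifically, the online MPP system may be a system such as Impala or SQL on Spark whose processes are resident and which can execute small and medium tasks online quickly. Correspondingly, the cluster resource set that provides cluster resources for tasks executed offline, together with the machines in the cluster that execute tasks offline, may also form a complete system, which may be called an offline MapReduce (MR) system. Specifically, the offline MR system may be an offline big data processing system, such as Hadoop, that implements this computing model.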
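Further, in step S102, when the specified attribute includes the data volume, determining the cluster resource set corresponding to the to-be-executed task may specifically include: judging whether the data volume of the to-be-executed task is not greater than the data volume threshold; if so, determining the cluster resource set that provides cluster resources for tasks executed online as the set corresponding to the task; otherwise, determining the cluster resource set that provides cluster resources for tasks executed offline as the set corresponding to the task.
For example, suppose the data volume threshold is 1 gigabyte (GB) and the to-be-executed task is a query task. After obtaining the query task, the cluster may judge whether the amount of data to be queried is not greater than 1 GB.
If so, the query task may be regarded as a small or medium task and therefore corresponds to the cluster resource set that provides cluster resources for tasks executed online. The online MPP system in the cluster may then execute the query task online by using the cluster resources contained in that set.
Otherwise, the query task may be regarded as a large task and therefore corresponds to the cluster resource set that provides cluster resources for tasks executed offline. The offline MR system in the cluster may then execute the query task offline by using the cluster resources contained in that set.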
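A minimal Python sketch of this two-way decision follows, using the 1 GB threshold from the example. The `Task` class, the function name, and the string labels `"online"` and `"offline"` are assumptions made for illustration only.

```python
from dataclasses import dataclass

GB = 1024 ** 3
DATA_VOLUME_THRESHOLD = 1 * GB  # example value taken from the text

@dataclass
class Task:
    name: str
    data_volume: int  # size in bytes of the data the task operates on

def route_by_data_volume(task, threshold=DATA_VOLUME_THRESHOLD):
    """Choose the resource set for a task from its data volume.

    A volume not greater than the threshold selects the online set;
    anything larger selects the offline set.
    """
    return "online" if task.data_volume <= threshold else "offline"
```

A query over exactly 1 GB is therefore routed online, and a query over even one byte more is routed offline.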
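Furthermore, in practice, after obtaining a to-be-executed task, the cluster may also decompose it into a set number of task instances (a task instance may also be called a subtask). The task instances may then be submitted to different processes in the cluster for execution, and once all instances have finished, their execution results are merged to obtain the execution result of the to-be-executed task. Note that the present application does not limit the method the cluster uses to decompose a to-be-executed task; the decomposition may be based on the data volume or on other attributes of the task.
In this case, the specified attribute in step S102 may also be the number of task instances decomposed from the to-be-executed task. Determining the cluster resource set corresponding to the to-be-executed task may then specifically include: judging whether the number of task instances decomposed from the to-be-executed task is not greater than the instance count threshold; if so, determining the cluster resource set that provides cluster resources for tasks executed online as the set corresponding to the task; otherwise, determining the cluster resource set that provides cluster resources for tasks executed offline as the set corresponding to the task.
For example, suppose the instance count threshold is 4, the to-be-executed task is a query task, and the data volume of the query task is 1 GB. Suppose the cluster decomposes the query task by data volume, with each task instance covering 256 megabytes (MB); the query task is then decomposed into 4 task instances. The number of task instances is not greater than the instance count threshold, so the query task corresponds to the cluster resource set that provides cluster resources for tasks executed online. The online MPP system in the cluster may then execute the query task online by using the cluster resources contained in that set.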
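The arithmetic in this example (1 GB split into 256 MB instances gives 4 instances, which does not exceed the threshold of 4) can be checked with a small Python sketch. Splitting purely by data volume is only one of the decomposition methods the text allows, and the function names are hypothetical.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def count_instances(data_volume, instance_size=256 * MB):
    """Number of task instances when each instance covers instance_size bytes."""
    # Integer ceiling division; even an empty task yields one instance.
    return max(1, (data_volume + instance_size - 1) // instance_size)

def route_by_instance_count(data_volume, instance_threshold=4,
                            instance_size=256 * MB):
    """Select the online set when the instance count is within the threshold."""
    instances = count_instances(data_volume, instance_size)
    return "online" if instances <= instance_threshold else "offline"
```

One extra byte beyond 1 GB produces a fifth instance, which pushes the task to the offline set.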
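In the embodiments of the present application, the cluster resource set that provides cluster resources for tasks executed offline generally contains more cluster resources than the set that provides cluster resources for tasks executed online. Accordingly, the cluster's capacity for offline execution may be greater than its capacity for online execution.
In practice, executing certain small and medium tasks with the cluster resources of the online set may also take a long time, so that later small and medium tasks cannot be executed in time. In this case, the cluster resources of the offline set may be used to execute these tasks instead, which prevents small and medium tasks in the cluster from being blocked.
Specifically, in step S103, when the to-be-executed task is executed online, the method may further include: timing the online execution of the to-be-executed task; when the timed duration is greater than the duration threshold, stopping the online execution and releasing the cluster resources occupied by the task; and executing the task offline by using the cluster resource set that provides cluster resources for tasks executed offline. In practice, the duration threshold may typically be set to 600 seconds.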
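The timing and fallback behaviour can be sketched in Python as follows. This is a simplified illustration, not the application's implementation: the task is modelled as a list of zero-argument callables, the elapsed time is checked only between steps (so a single long step is not interrupted), and clearing the partial results stands in for releasing the online cluster resources. All names are hypothetical.

```python
import time

def execute_with_fallback(steps, run_offline, time_limit=600.0,
                          clock=time.monotonic):
    """Run a task online step by step, falling back to offline on timeout.

    steps       -- list of zero-argument callables making up the task
    run_offline -- callable that executes the whole task offline
    time_limit  -- duration threshold in seconds (600 in the text's example)
    clock       -- time source, injectable so the logic can be tested
    """
    start = clock()
    partial = []
    for step in steps:
        if clock() - start > time_limit:
            partial.clear()  # stands in for releasing online resources
            return "offline", run_offline(steps)
        partial.append(step())
    return "online", partial
```

The function returns which path completed the task together with its result, so a caller can log how often the fallback is taken.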
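Note that the present application does not limit the specific values of the data volume threshold, the instance count threshold, or the duration threshold; each of them may be set according to the actual application scenario.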
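In the embodiments of the present application, after the cluster resource sets have been divided, the process and results of executing the to-be-executed tasks on those sets may also be recorded in logs. By analyzing the logs, the load balancing status of the cluster can be determined, and the cluster resources contained in each set can then be adjusted, periodically or as needed, to improve load balancing in the cluster.
For example, suppose analysis of the past week's logs shows that small and medium tasks executed with the cluster resources of the online set often time out, while part of the cluster resources in the offline set is often idle. The frequently idle resources can then be reassigned to the online set and used to execute small and medium tasks online, improving load balancing in the cluster.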
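One possible shape for such a log-driven adjustment is sketched below in Python. The log record format, the 20% timeout-rate trigger, and the function name `rebalance` are all assumptions; the application only states that resources may be moved between sets based on log analysis.

```python
def rebalance(online, offline, log_records, idle_offline,
              max_timeout_rate=0.2):
    """Move idle offline units to the online set if online tasks time out often.

    log_records  -- list of dicts such as {"set": "online", "timed_out": True}
    idle_offline -- set of offline resource units observed to be mostly idle
    Returns the new (online, offline) resource lists.
    """
    online_runs = [r for r in log_records if r["set"] == "online"]
    if not online_runs:
        return online, offline
    timeout_rate = sum(r["timed_out"] for r in online_runs) / len(online_runs)
    if timeout_rate <= max_timeout_rate:
        return online, offline  # online set is coping; leave sets unchanged
    moved = [u for u in offline if u in idle_offline]
    remaining = [u for u in offline if u not in idle_offline]
    return online + moved, remaining
```

Running this periodically over a sliding window of logs (for example the past week) would correspond to the periodic adjustment described above.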
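The embodiments of the present application also provide a cluster architecture that can implement, in practice, the task execution method in a cluster provided by the present application, as shown in Figure 2.
Figure 2 includes L clients and one cluster. The cluster includes a task scheduling machine, an online MPP system, and an offline MR system; the online MPP system contains N task execution machines, and the offline MR system contains M task execution machines.
The online MPP system may include the cluster resource set that provides cluster resources for tasks executed online, and the offline MR system may include the cluster resource set that provides cluster resources for tasks executed offline. The cluster resources contained in a cluster resource set may be task execution machines.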
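Based on the cluster architecture in Figure 2, the task execution process in a cluster provided by the present application is shown in Figure 3 and may include the following steps:
S301: The task scheduling machine obtains a to-be-executed task submitted by a user through a client.
S302: The task scheduling machine judges whether the data volume of the to-be-executed task is not greater than the data volume threshold. If so, step S303 is performed; otherwise, step S306 is performed.
S303: The task scheduling machine sends the to-be-executed task to the online MPP system.
S304: The online MPP system executes the to-be-executed task online on its task execution machines and, at the same time, starts timing the execution.
S305: While the timed duration is not greater than the duration threshold, execution of the to-be-executed task continues until it completes. When the timed duration is greater than the duration threshold, execution is stopped and the to-be-executed task is sent to the offline MR system for offline execution.
S306: The task scheduling machine sends the to-be-executed task to the offline MR system for offline execution.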
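Steps S301 to S306 can be summarized as a single dispatch function. The Python sketch below is an assumption-laden outline: the online and offline systems are passed in as plain callables, and the online system is assumed to signal an exceeded duration threshold by raising a hypothetical `OnlineTimeout` exception.

```python
class OnlineTimeout(Exception):
    """Raised by the online system when the duration threshold is exceeded."""

def schedule(task, data_volume, run_online, run_offline,
             data_volume_threshold):
    """Dispatch one task as the task scheduling machine does in Figure 3."""
    # S302: compare the task's data volume with the threshold.
    if data_volume <= data_volume_threshold:
        try:
            # S303-S304: send the task to the online MPP system, which
            # executes it and times the execution.
            return "online", run_online(task)
        except OnlineTimeout:
            # S305: the online run exceeded the duration threshold, so the
            # task is handed to the offline MR system.
            return "offline", run_offline(task)
    # S306: large tasks go straight to the offline MR system.
    return "offline", run_offline(task)
```

Keeping the two systems behind callables means the scheduler does not need to know whether a resource set is made of machines, CPUs, or processes.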
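The above is the task execution method in a cluster provided by the embodiments of the present application. Based on the same idea, the embodiments of the present application also provide a corresponding task execution apparatus in a cluster, as shown in Figure 4.
Figure 4 is a schematic structural diagram of the task execution apparatus in a cluster provided by an embodiment of the present application, which includes:
an obtaining module 401, configured to obtain a to-be-executed task;
a determining module 402, configured to determine, according to a specified attribute of the to-be-executed task, the cluster resource set corresponding to the to-be-executed task among the pre-divided cluster resource sets; and
an execution module 403, configured to execute the to-be-executed task by using the cluster resources contained in the determined cluster resource set.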
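The cluster resource sets include at least: a cluster resource set that provides cluster resources for tasks executed online, and a cluster resource set that provides cluster resources for tasks executed offline.
When the specified attribute includes the data volume, the determining module 402 is specifically configured to: judge whether the data volume of the to-be-executed task is not greater than the data volume threshold; if so, determine the cluster resource set that provides cluster resources for tasks executed online as the set corresponding to the to-be-executed task; otherwise, determine the cluster resource set that provides cluster resources for tasks executed offline as the set corresponding to the to-be-executed task.
When the specified attribute includes the number of task instances decomposed from the to-be-executed task, the determining module 402 is specifically configured to: judge whether the number of task instances decomposed from the to-be-executed task is not greater than the instance count threshold; if so, determine the cluster resource set that provides cluster resources for tasks executed online as the set corresponding to the to-be-executed task; otherwise, determine the cluster resource set that provides cluster resources for tasks executed offline as the set corresponding to the to-be-executed task.
When the determined cluster resource set is the one that provides cluster resources for tasks executed online, the execution module 403 is specifically configured to execute the to-be-executed task online by using the cluster resources contained in that set;
when the determined cluster resource set is the one that provides cluster resources for tasks executed offline, the execution module 403 is specifically configured to execute the to-be-executed task offline by using the cluster resources contained in that set.
The apparatus further includes:
a switching module 404, configured to time the process in which the execution module 403 executes the to-be-executed task online and, when the timed duration is greater than the duration threshold, to stop the online execution of the to-be-executed task, release the cluster resources occupied by the to-be-executed task, and execute the to-be-executed task offline by using the cluster resource set that provides cluster resources for tasks executed offline.
Specifically, the apparatus shown in Figure 4 may be located on a machine in the cluster.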
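The embodiments of the present application provide a method and apparatus for executing tasks in a cluster. The method obtains a to-be-executed task; determines, according to a specified attribute of the to-be-executed task, the cluster resource set corresponding to that task among the pre-divided cluster resource sets; and executes the task by using the cluster resources contained in the determined cluster resource set. With this method, different to-be-executed tasks may correspond to different cluster resource sets, and any one task occupies only the cluster resources contained in its corresponding set rather than all cluster resources of the cluster. Therefore, even if a to-be-executed task occupies all cluster resources of its corresponding set for a long time, the cluster can still use the cluster resources contained in the other sets to execute, in a timely manner, the other to-be-executed tasks corresponding to those sets.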
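Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.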
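In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-permanent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.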
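It should also be noted that the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of its claims.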

Claims (12)

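  1. A method for executing tasks in a cluster, characterized by comprising:
    obtaining a to-be-executed task;
    determining, according to a specified attribute of the to-be-executed task, the cluster resource set corresponding to the to-be-executed task among the pre-divided cluster resource sets; and
    executing the to-be-executed task by using the cluster resources contained in the determined cluster resource set.
  2. The method according to claim 1, characterized in that the cluster resource sets include at least: a cluster resource set that provides cluster resources for tasks executed online, and a cluster resource set that provides cluster resources for tasks executed offline.
  3. The method according to claim 2, characterized in that, when the specified attribute includes the data volume, determining the cluster resource set corresponding to the to-be-executed task specifically comprises:
    judging whether the data volume of the to-be-executed task is not greater than a data volume threshold;
    if so, determining the cluster resource set that provides cluster resources for tasks executed online as the cluster resource set corresponding to the to-be-executed task;
    otherwise, determining the cluster resource set that provides cluster resources for tasks executed offline as the cluster resource set corresponding to the to-be-executed task.
  4. The method according to claim 2, characterized in that, when the specified attribute includes the number of task instances decomposed from the to-be-executed task, determining the cluster resource set corresponding to the to-be-executed task specifically comprises:
    judging whether the number of task instances decomposed from the to-be-executed task is not greater than an instance count threshold;
    if so, determining the cluster resource set that provides cluster resources for tasks executed online as the cluster resource set corresponding to the to-be-executed task;
    otherwise, determining the cluster resource set that provides cluster resources for tasks executed offline as the cluster resource set corresponding to the to-be-executed task.
  5. The method according to claim 2, characterized in that, when the determined cluster resource set is the cluster resource set that provides cluster resources for tasks executed online, executing the to-be-executed task specifically comprises:
    executing the to-be-executed task online;
    and when the determined cluster resource set is the cluster resource set that provides cluster resources for tasks executed offline, executing the to-be-executed task specifically comprises:
    executing the to-be-executed task offline.
  6. The method according to claim 5, characterized in that, when executing the to-be-executed task specifically comprises executing the to-be-executed task online, the method further comprises:
    timing the online execution duration of the to-be-executed task;
    when the timed duration is greater than a duration threshold, stopping the online execution of the to-be-executed task and releasing the cluster resources occupied by the to-be-executed task; and
    executing the to-be-executed task offline by using the cluster resource set that provides cluster resources for tasks executed offline.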
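  7. An apparatus for executing tasks in a cluster, characterized by comprising:
    an obtaining module, configured to obtain a to-be-executed task;
    a determining module, configured to determine, according to a specified attribute of the to-be-executed task, the cluster resource set corresponding to the to-be-executed task among the pre-divided cluster resource sets; and
    an execution module, configured to execute the to-be-executed task by using the cluster resources contained in the determined cluster resource set.
  8. The apparatus according to claim 7, characterized in that the cluster resource sets include at least: a cluster resource set that provides cluster resources for tasks executed online, and a cluster resource set that provides cluster resources for tasks executed offline.
  9. The apparatus according to claim 8, characterized in that, when the specified attribute includes the data volume, the determining module is specifically configured to: judge whether the data volume of the to-be-executed task is not greater than a data volume threshold; if so, determine the cluster resource set that provides cluster resources for tasks executed online as the cluster resource set corresponding to the to-be-executed task; otherwise, determine the cluster resource set that provides cluster resources for tasks executed offline as the cluster resource set corresponding to the to-be-executed task.
  10. The apparatus according to claim 8, characterized in that, when the specified attribute includes the number of task instances decomposed from the to-be-executed task, the determining module is specifically configured to: judge whether the number of task instances decomposed from the to-be-executed task is not greater than an instance count threshold; if so, determine the cluster resource set that provides cluster resources for tasks executed online as the cluster resource set corresponding to the to-be-executed task; otherwise, determine the cluster resource set that provides cluster resources for tasks executed offline as the cluster resource set corresponding to the to-be-executed task.
  11. The apparatus according to claim 8, characterized in that, when the determined cluster resource set is the cluster resource set that provides cluster resources for tasks executed online, the execution module is specifically configured to execute the to-be-executed task online by using the cluster resources contained in that cluster resource set;
    and when the determined cluster resource set is the cluster resource set that provides cluster resources for tasks executed offline, the execution module is specifically configured to execute the to-be-executed task offline by using the cluster resources contained in that cluster resource set.
  12. The apparatus according to claim 11, characterized in that the apparatus further comprises:
    a switching module, configured to time the online execution duration of the to-be-executed task executed online by the execution module and, when the timed duration is greater than a duration threshold, to stop the online execution of the to-be-executed task, release the cluster resources occupied by the to-be-executed task, and execute the to-be-executed task offline by using the cluster resource set that provides cluster resources for tasks executed offline.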
PCT/CN2016/090617 2015-07-29 2016-07-20 Method and apparatus for executing tasks in a cluster WO2017016421A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/880,432 US20180150326A1 (en) 2015-07-29 2018-01-25 Method and apparatus for executing task in cluster

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510455382.1A CN106406987B (zh) 2015-07-29 2015-07-29 Method and apparatus for executing tasks in a cluster
CN201510455382.1 2015-07-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/880,432 Continuation US20180150326A1 (en) 2015-07-29 2018-01-25 Method and apparatus for executing task in cluster

Publications (1)

Publication Number Publication Date
WO2017016421A1 true WO2017016421A1 (zh) 2017-02-02

Family

ID=57884110

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/090617 WO2017016421A1 (zh) 2015-07-29 2016-07-20 Method and apparatus for executing tasks in a cluster

Country Status (3)

Country Link
US (1) US20180150326A1 (zh)
CN (1) CN106406987B (zh)
WO (1) WO2017016421A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113055476A (zh) * 2021-03-12 2021-06-29 杭州网易再顾科技有限公司 一种集群式服务系统、方法、介质和计算设备

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8327185B1 (en) 2012-03-23 2012-12-04 DSSD, Inc. Method and system for multi-dimensional raid
CN108446169B (zh) * 2017-02-16 2022-04-26 阿里巴巴集团控股有限公司 一种作业调度方法及装置
US10339062B2 (en) 2017-04-28 2019-07-02 EMC IP Holding Company LLC Method and system for writing data to and read data from persistent storage
US10614019B2 (en) 2017-04-28 2020-04-07 EMC IP Holding Company LLC Method and system for fast ordered writes with target collaboration
CN110069511B (zh) * 2017-09-26 2021-10-15 北京国双科技有限公司 一种数据查询的分配方法及装置
CN107729141B (zh) * 2017-09-27 2022-06-10 华为技术有限公司 一种业务分配方法、装置和服务器
CN108632365B (zh) * 2018-04-13 2020-11-27 腾讯科技(深圳)有限公司 服务资源调整方法、相关装置和设备
KR102563648B1 (ko) * 2018-06-05 2023-08-04 삼성전자주식회사 멀티 프로세서 시스템 및 그 구동 방법
CN108920265A (zh) * 2018-06-27 2018-11-30 平安科技(深圳)有限公司 一种基于服务器集群的任务执行方法及服务器
CN109062698A (zh) * 2018-08-13 2018-12-21 郑州云海信息技术有限公司 一种任务处理方法、装置及系统
CN109582447B (zh) * 2018-10-15 2020-09-29 中盈优创资讯科技有限公司 计算资源分配方法、任务处理方法及装置
CN109766328A (zh) * 2018-12-27 2019-05-17 北京奇艺世纪科技有限公司 数据库迁移方法、系统、数据处理设备、计算机介质
CN110362404B (zh) * 2019-06-28 2022-08-23 北京淇瑀信息科技有限公司 一种基于sql的资源分配方法、装置和电子设备
CN110362410A (zh) * 2019-07-24 2019-10-22 江苏满运软件科技有限公司 基于离线应用的资源控制方法、系统、设备及存储介质
CN110659137B (zh) * 2019-09-24 2022-02-08 支付宝(杭州)信息技术有限公司 针对离线任务的处理资源分配方法及系统
CN112783635A (zh) * 2019-11-06 2021-05-11 阿里巴巴集团控股有限公司 一种资源额度调整方法和装置
CN114726869A (zh) * 2022-04-02 2022-07-08 中国建设银行股份有限公司 资源管理方法及装置、存储介质及电子设备

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101441580A (zh) * 2008-12-09 2009-05-27 华北电网有限公司 分布式并行计算平台系统及其计算任务分配方法
CN103475538A (zh) * 2013-09-02 2013-12-25 南京邮电大学 一种基于多接口的自适应的云服务测试方法

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004171234A (ja) * 2002-11-19 2004-06-17 Toshiba Corp マルチプロセッサシステムにおけるタスク割り付け方法、タスク割り付けプログラム及びマルチプロセッサシステム
US7895071B2 (en) * 2006-08-14 2011-02-22 Hrl Laboratories, Llc System and method for multi-mission prioritization using cost-based mission scheduling
CN102243598B (zh) * 2010-05-14 2015-09-16 深圳市腾讯计算机系统有限公司 分布式数据仓库中的任务调度方法及系统
CN102043675B (zh) * 2010-12-06 2012-11-14 北京华证普惠信息股份有限公司 一种基于任务处理请求任务量大小的线程池管理方法
CN102945185B (zh) * 2012-10-24 2015-04-22 深信服网络科技(深圳)有限公司 任务调度方法及装置
IN2013MU02794A (zh) * 2013-08-27 2015-07-03 Tata Consultancy Services Ltd
CN103491187B (zh) * 2013-09-30 2018-04-27 华南理工大学 一种基于云计算的大数据统一分析处理方法
US10073714B2 (en) * 2015-03-11 2018-09-11 Western Digital Technologies, Inc. Task queues

Also Published As

Publication number Publication date
CN106406987B (zh) 2020-01-03
US20180150326A1 (en) 2018-05-31
CN106406987A (zh) 2017-02-15

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16829787

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16829787

Country of ref document: EP

Kind code of ref document: A1