CN103995735A - Device and method for scheduling workflow jobs - Google Patents


Info

Publication number
CN103995735A
CN103995735A (Application No. CN201310249906.2A)
Authority
CN
China
Prior art keywords
workflow
job
resource
information
file system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310249906.2A
Other languages
Chinese (zh)
Inventor
安信荣
裵承朝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electronics and Telecommunications Research Institute ETRI
Original Assignee
Electronics and Telecommunications Research Institute ETRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electronics and Telecommunications Research Institute (ETRI)
Publication of CN103995735A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Abstract

The invention provides a device and a method for scheduling workflow jobs. The method includes: when a workflow is defined, specifying whether the output file of each job is necessary; collecting resource usage information for the individual applications of the user workflow; if consecutive jobs are I/O-intensive, specifying a data-locality dependence parameter so that the later of the consecutive jobs runs on the node that ran the earlier one; and, based on real-time workload information collected for the computing system, scheduling a group of jobs having a data-locality dependence on the same compute node.

Description

Device and method for scheduling workflow jobs
Cross-reference to related application(s)
This application claims the benefit of Korean Patent Application No. 10-2013-0015841, filed on February 14, 2013, which is incorporated herein by reference as if fully set forth herein.
Technical field
The present invention relates to technology for scheduling workflow jobs and, more particularly, to a device and method for scheduling workflow jobs, suitable for resource management and job scheduling, by which a high-performance computing system can automatically run large-scale parallel and distributed jobs to obtain results.
Background
As is well known, in computational resource environments such as supercomputers, high-performance clusters, grid systems, and web services, workflow management systems, resource management systems, and job schedulers have been used in place of human operators to process scientific computing jobs on massive data, so that complex, interdependent jobs spread across discontinuous runs can be executed in a single batch.
A workflow management system is a software system that composes, through a user-friendly interface (UI), a workflow in which a sequence of jobs is related to one another; runs the workflow on various computational resources including high-performance computers, web services, and grids; and reports the run results. Conventional workflow management systems include Taverna, Galaxy, and Kepler.
A resource management system is a software system for a cluster or high-performance computer that handles the batch execution of jobs and the management of computational resources. For example, resource management systems include the Portable Batch System (PBS) family, with OpenPBS, TORQUE, and PBS Pro, and further include SLURM and Oracle Grid Engine. Resource management systems commonly use first-come-first-served (FCFS) job scheduling.
A job scheduler is commonly used in connection with a resource management system. A job scheduler is a software system that runs the jobs in the job queue while dynamically reordering them based on comparisons of the expected amount of resources, job priorities, and whether resources are available and of what type. Conventional job schedulers include Maui, ALPS, LSF, and Moab.
Summary of the invention
As is well known, most scientific applications, including genome sequence analysis, obtain their expected results by assembling previously developed application programs. A workflow (or pipeline) is a sequence or stream of application programs (jobs) with dependencies based on temporal order and on data. Such workflows come in various sizes, from simple ones comprising one or two applications to ones comprising tens or hundreds of applications.
To obtain results efficiently by mapping a workflow onto the right computational resources, precise information is needed about the computational resources required by the jobs that make up the workflow. However, it is difficult for anyone other than the developer of an application to know how the application actually uses resources at run time. Resource usage information includes the number of CPUs the application needs, its memory capacity, disk capacity, and network bandwidth. The development of analysis tools that derive a resource usage profile from source code is still rather immature.
In the prior art, methods have been introduced that run a test job on sample files and then analyze the results of the run, or that store the results of previous runs and use them as a job profile when scheduling the next job. Isolation features for CPU and memory resources have also been introduced, so that a job that requires a large amount of computation or memory is allocated the required number of CPU cores and amount of memory and then run. As a result, because resource use is isolated, jobs do not affect one another.
However, no resource isolation is provided for input and output. For example, no input/output (I/O) bandwidth isolation feature is provided for a global file system (GFS) or for a high-performance computing network such as InfiniBand. Therefore, if multiple applications with different characteristics run simultaneously and share I/O resources, the expected level of CPU utilization cannot be achieved. In particular, when many I/O-intensive applications run at the same time, the time spent waiting for I/O becomes very long, and the CPU utilization of CPU-intensive jobs deteriorates severely.
To solve the resource-usage problems that can occur when running a workflow that includes I/O-intensive jobs, the present invention provides a method that reflects real-time workload information and the resource usage information of individual applications in scheduling, to prevent I/O-intensive jobs from degrading the operating efficiency of a high-performance computing system.
According to one aspect of the present invention, a workflow job scheduling apparatus is provided, comprising: a workflow user interface unit configured to provide an interface for user workflows; a workflow engine unit configured to transform a user workflow into a runnable workflow using the resource usage information of individual applications, execute the runnable workflow, and generate scheduling commands; a resource management unit configured to collect and manage real-time workload information for all resources of the computing system; a job scheduling unit configured to schedule, based on the real-time workload information and according to the scheduling commands, a group of jobs having a data-locality dependence on the same compute node to run in sequence; a computational resource job management unit, located at each compute node, configured to run a job at the allocated node when the job scheduling unit requests the job to run; a compute node monitoring unit configured to monitor the load information of individual computational resources while jobs are running on them and provide the load information to the resource management unit; a global file system resource monitoring unit configured to measure the utilization and total input/output (I/O) bandwidth of the global file system and provide the measurements to the resource management unit; and an application resource usage management unit configured to generate the resource usage information of individual applications based on the measurements from the global file system resource monitoring unit and the load information from the compute node monitoring unit.
The interface may include a graphical user interface (GUI) or a web interface.
A workflow management unit may provide functions for storing, changing, deleting, and checking the user workflow.
The user workflow may be a workflow at an abstraction level that does not run directly on computational resources.
When transforming the user workflow into a runnable workflow, the workflow engine unit may transform the job order of the user workflow into the data-locality dependence parameters needed at job submission, and may transform the resource usage information of the individual applications into the resource usage request parameters needed at job submission.
The workflow engine unit may generate job run request information that includes the data-locality dependence parameters and the resource usage request parameters, and may provide the job run request information to the job scheduling unit.
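As a rough illustration of this transformation, the sketch below turns a job order plus per-application usage information into submission requests carrying a data-locality dependence parameter and a resource usage request parameter. All function and field names here are assumptions for illustration, not from the patent.

```python
def build_job_requests(user_workflow, app_profiles):
    """user_workflow: job names in run order.
    app_profiles: per-application resource usage info, assumed collected earlier."""
    requests = []
    prev = None
    for job in user_workflow:
        profile = app_profiles[job]
        # consecutive I/O-intensive jobs get a data-locality dependence on
        # the earlier job, so both are placed on the same compute node
        io_pair = (prev is not None
                   and profile["io_intensive"]
                   and app_profiles[prev]["io_intensive"])
        requests.append({
            "job": job,
            "locality_depends_on": prev if io_pair else None,
            # per-application usage info becomes the resource usage request
            "resource_request": {
                "cpu_cores": profile["cpu_cores"],
                "memory_mb": profile["memory_mb"],
                "gfs_io_mbps": profile["gfs_io_mbps"],
            },
        })
        prev = job
    return requests
```

The resulting request list is what, in the patent's terms, would be delivered to the job scheduling unit as job run request information.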
The workflow engine unit may provide functions for storing, deleting, and checking the run results from the computational resources.
The real-time workload information may include configuration information about the resources of the computing system, information about the load on individual nodes and whether their resources are allocated to jobs, and the load and I/O bandwidth usage of the global file system.
The resources may include compute nodes, global file system nodes, a management network switch, a computation network switch, and the network fabric.
The job scheduling unit may allocate pending jobs in the current job queue to computational resources according to job priority and whether available resources exist, and may run the jobs.
Whether available resources exist may include real-time I/O bandwidth availability.
The load information of a computational resource may include at least one of CPU utilization, memory utilization, disk I/O bandwidth usage, disk utilization, network I/O usage, and network I/O utilization.
The computational resource monitoring unit may monitor the load information and provide it to the resource management unit or the application resource usage management unit periodically or after a job completes.
The resource usage information of the individual applications may be automatically collected and generated when the user workflow runs.
The resource usage information of the individual applications may be generated through monitoring by a job manager and manual input.
According to another aspect of the present invention, a workflow job scheduling method is provided, comprising: when defining a user workflow, specifying whether the output file of each job is necessary as a final result; collecting the resource usage information of the individual applications that make up the user workflow; if consecutive jobs are I/O-intensive, specifying a data-locality dependence parameter so that the later of the consecutive jobs runs at the node where the earlier of the consecutive jobs ran; and scheduling a group of jobs having a data-locality dependence on the same compute node to run in sequence, based on real-time workload information collected for all resources of the computing system.
The earlier job may store an interim file on the local disk of the node where it runs, and the later job's command may be changed to read the interim file stored by the earlier job from the local disk directory of the node where the earlier job ran.
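A minimal sketch of this local-disk hand-off, assuming the two jobs can be invoked as callables and using a temporary directory to stand in for the node's local disk; all names are illustrative, not from the patent:

```python
import os
import shutil
import tempfile

def run_consecutive_io_jobs(produce, consume, keep_as_result=False, gfs_dir=None):
    """The earlier job writes its interim file to the node's local disk; the
    later job is pointed at the same local path instead of the global file system."""
    local_dir = tempfile.mkdtemp()       # stands in for the node's local disk
    interim = os.path.join(local_dir, "interim.dat")
    produce(interim)                     # earlier job writes locally
    result = consume(interim)            # later job reads the same local path
    if keep_as_result and gfs_dir:       # copy back only if marked as a needed output
        shutil.copy(interim, gfs_dir)
    os.remove(interim)                   # unnecessary interim files are deleted
    return result
```

Because both callables touch only the local path, no global file system write or read occurs for the interim data.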
The scheduling may comprise: selecting a job in the job queue; checking whether the global file system I/O bandwidth requirement of the selected job satisfies the available I/O bandwidth that the global file system can provide; when the I/O bandwidth demand is satisfied, selecting a compute node that satisfies the resource usage needed by the selected job; and running the selected job at the selected compute node.
The selection of the job may be performed according to a predetermined priority policy.
The demand may include first and second demands. The first demand is that the current usage of the global file system be less than the stabilized maximum usage of the global file system, and the second demand is that the sum of the current I/O usage of the global file system and the I/O requirement of the selected job be less than the actual maximum I/O bandwidth of the global file system.
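The two demands amount to a simple admission check; a sketch, with parameter names as assumptions:

```python
def gfs_admission_ok(current_use_mbps, job_io_mbps,
                     stabilized_max_mbps, actual_max_mbps):
    """First demand: current global file system use stays under the
    stabilized maximum. Second demand: current use plus the selected job's
    I/O requirement stays under the actual maximum I/O bandwidth."""
    first = current_use_mbps < stabilized_max_mbps
    second = current_use_mbps + job_io_mbps < actual_max_mbps
    return first and second
```

A job failing either demand is passed over so that another queued job can be considered.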
According to embodiments of the present invention, when a workflow including I/O-intensive jobs runs on a high-performance computer or cluster, feeding the real-time I/O bandwidth information of the global file system and compute nodes back into job scheduling prevents I/O from severely degrading the utilization of computational resources. As a result, the use of computational resources can be maximized, the running time of the workflow reduced, and results obtained quickly. Furthermore, according to embodiments of the present invention, even a general user who has no deep knowledge of computational resources and does not know how to use the system can run his or her workflow optimally.
Brief description of the drawings
The above and other objects and features of the present invention will become apparent from the following description of the embodiments given in conjunction with the accompanying drawings, in which:
Fig. 1 illustrates a workflow in which jobs are connected to one another by file I/O;
Fig. 2 shows the configuration of a typical high-performance computer system;
Fig. 3 is a block diagram of a workflow job scheduling apparatus in accordance with an embodiment of the present invention;
Fig. 4 illustrates a method of distributing input and output between local disks and the global file system in a workflow that includes I/O-intensive jobs; and
Fig. 5 is a flowchart of the scheduling performed by the job scheduling unit for I/O-intensive jobs.
Detailed description of the embodiments
In the following description of the present invention, detailed descriptions of known structures and operations will be omitted if they would obscure the subject matter of the present invention. The following terms are defined in consideration of their functions in the embodiments of the present invention and may vary with the intention or practice of an operator. Therefore, these terms should be defined based on the content throughout this specification.
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings so that those skilled in the art can easily implement them.
First, the present invention is directed to a job scheduling method for workflows that include jobs with relatively large amounts of input and output (i.e., I/O-intensive jobs). To this end, the scheduling method of the present invention may monitor I/O bandwidth in real time, use pre-analysis information about the amount of computational resources used, obtained by analyzing the jobs of the workflow in advance, and continuously reflect the monitoring results in job scheduling.
Fig. 1 illustrates a workflow in which jobs are connected to one another by file I/O. Fig. 1 shows a workflow (or pipeline) type in which the file stored as the result of one job is provided as the input of the next job, so that the next job can run.
Here, the input data of the workflow is stored in file form or is fed to a job as standard input. The job of each stage can have various application characteristics, being a computation-intensive job, a memory-intensive job, or an I/O-intensive job. A computation-intensive job is one with a large amount of computation, and a memory-intensive job is one that uses a large amount of memory.
Fig. 2 shows the configuration of a typical high-performance computer system (or supercomputer), which includes a plurality of service nodes 202-1 to 202-3, a management network switch 204, a plurality of compute nodes 206-1 to 206-4, a computation network switch 208, a plurality of global file system server nodes 210-1 to 210-3, and a plurality of storage nodes 212-1 to 212-3. Here, there are fewer service nodes 202-1 to 202-3 than compute nodes 206-1 to 206-4.
Referring to Fig. 2, the service nodes 202-1 to 202-3 include login nodes, through which general users log in and submit jobs, and management nodes, which run the various servers that perform management functions such as cluster management, resource management, and workflow management.
Here, each of the service nodes 202-1 to 202-3, the compute nodes 206-1 to 206-4, and the global file system server nodes 210-1 to 210-3 is coupled to the others via the computation network switch 208 and the management network switch 204. The management network switch 204 operates at a relatively lower speed than the computation network switch 208.
Each of the compute nodes 206-1 to 206-4 may or may not include a local disk. Most computational jobs run by reading input files stored in the global file system, performing the necessary computation, and storing the computation results back in the global file system.
Therefore, if the total I/O bandwidth used by I/O-intensive jobs exceeds the bandwidth the global file system provides, all jobs of the high-performance computing system are affected under the prior art. The present invention provides a scheme that can solve this problem.
Fig. 3 is a block diagram of a workflow job scheduling apparatus in accordance with an embodiment of the present invention, which includes a workflow user interface unit 302, a workflow management unit 304, a workflow engine unit 306, a resource management unit 308, a job scheduling unit 310, a computational resource job management unit 312, a computational resource monitoring unit 314, a global file system resource monitoring unit 316, and an application resource usage management unit 318.
Referring to Fig. 3, the workflow user interface unit 302 runs the user interface, a graphical user interface (GUI) or web interface, through which users can easily define, execute, and analyze the workflows they need.
The workflow management unit 304 provides functions for storing, changing, deleting, and checking the user workflow specified (or selected) through the user interface. Here, the user workflow is a workflow at an abstraction level that cannot be applied directly to computational resources; it is transformed into a concrete runnable workflow whose jobs can run in sequence.
The workflow engine unit 306 transforms the user workflow, built with the resource usage information of individual applications, into a runnable workflow, executes the runnable workflow, and generates scheduling commands based on the results of running on the computational resources. The resource usage information of individual applications is provided from the application resource usage management unit 318. Here, the workflow engine unit 306 can store, delete, and check the results obtained by running on the computational resources.
Further, when transforming the user workflow into a runnable workflow, the workflow engine unit 306 transforms the job order of the user workflow into the data-locality dependence parameters required at job submission, and transforms the resource usage information of the individual applications into the resource usage request parameters required at job submission. The workflow engine unit 306 generates job run request information that includes the data-locality dependence parameters and the resource usage request parameters, and delivers the job run request information to the job scheduling unit 310.
The resource management unit 308 collects and manages real-time workload information about all resources of the computing system, including, for example, configuration information about all resources of the computing system, information about the load (usage) of each node and whether its resources are allocated, and the I/O bandwidth usage and load of the global file system. All resources include the compute nodes, the global file system nodes, the management network switch, the computation network switch, the network fabric, and so on. The real-time workload information is delivered to the job scheduling unit 310.
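One plausible shape for the aggregated real-time workload information is sketched below; the keys and units are assumptions for illustration, not from the patent:

```python
def aggregate_workload_info(node_reports, gfs_report):
    """Combine per-node monitoring reports and the global file system report
    into one workload-information snapshot for the scheduler."""
    return {
        "nodes": {
            r["node"]: {"load": r["load"], "allocated": r["allocated"]}
            for r in node_reports
        },
        "gfs": {
            "load": gfs_report["load"],
            "io_bandwidth_in_use_mbps": gfs_report["io_in_use_mbps"],
        },
    }
```

The scheduler can then consult a single snapshot instead of querying each monitor separately.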
When a scheduling command is delivered from the workflow engine unit 306, the job scheduling unit 310 schedules, based on the real-time workload information delivered from the resource management unit 308, a group of jobs to run in sequence, the jobs in the group having a data-locality dependence on the same compute node. That is, the job scheduling unit 310 allocates pending jobs in the current job queue to computational resources according to their priority and whether available resources exist, and runs the jobs. Here, whether available resources exist may include real-time I/O bandwidth availability.
For example, the job scheduling unit 310 selects a job in the job queue according to a predetermined priority policy and checks whether the global file system I/O bandwidth requirement of the selected job satisfies the demand the global file system can provide. If the demand is satisfied, the job scheduling unit 310 schedules the job by selecting a compute node that satisfies the resource usage the selected job needs, and runs the selected job. Here, the demand may include the conditions that the current usage of the global file system should be less than the stabilized maximum usage of the global file system, and that the sum of the current I/O usage of the global file system and the I/O requirement of the selected job should be less than the actual maximum I/O bandwidth of the file system.
The computational resource job management unit 312 is located at each compute node; when it receives a job run request from the job scheduling unit 310, it runs the job at the allocated node, monitors the running job, and reports the results of the job run.
The computational resource job monitoring unit 314 monitors the load information of the computational resources of individual nodes while jobs run on them, and reports the load information to the resource management unit 308 or the application resource usage management unit 318 periodically or after the selected job completes. The load information includes CPU utilization, memory utilization, disk I/O bandwidth usage, disk utilization, network I/O usage and utilization, and so on.
Here, the resource usage profile performance metrics monitored by the computational resource monitoring unit 314 may include CPU utilization (peak, average), memory usage (peak, average), disk utilization and disk I/O bandwidth usage (peak, average), network utilization and network I/O bandwidth usage (peak, average), and so on.
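The peak/average pairs above can be derived from periodically monitored samples; a small sketch with illustrative metric names:

```python
def profile_metrics(samples):
    """Reduce periodically monitored utilization samples to the (peak, average)
    pairs a resource usage profile stores, one pair per metric."""
    out = {}
    for metric, values in samples.items():
        out[metric] = {"peak": max(values), "avg": sum(values) / len(values)}
    return out
```

A monitoring agent could apply this after a job completes to produce the profile handed to the application resource usage management unit.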
The global file system resource monitoring unit 316 periodically measures the total I/O bandwidth and utilization of the global file system and delivers the measured bandwidth and utilization to the resource management unit 308.
The application resource usage management unit 318 generates the resource usage information of individual applications based on the load information delivered from the computational resource monitoring unit 314 and the measurements delivered from the global file system resource monitoring unit 316, and delivers the resource usage information to the workflow engine unit 306. Here, the resource usage information of individual applications can be collected and generated automatically when the user workflow runs, or a job manager can monitor the resources and enter the resource usage information of each application manually.
Fig. 4 illustrates a method of distributing input and output between local disks and the global file system in a workflow that includes I/O-intensive jobs.
Referring to Fig. 4, I/O-intensive jobs consume relatively large amounts of I/O bandwidth, so building a global file system that provides very large aggregate bandwidth costs a great deal. Hence, by making effective use of relatively cheap local disks, the overall I/O bandwidth can be reduced, allowing a cost-efficient system to be built and operated.
For example, as shown in Fig. 4, suppose that all or some of the jobs of a workflow (job 1, job 2, and job 3) are I/O-intensive, and that job 1, job 2, and job 3 should run in sequence. The input of job 1 is delivered from the global file system. The output of job 1 (interim file 1) is provided as the input of job 2. The output of job 2 (interim file 2) is provided as the input of job 3. The final output of job 3 is stored in the global file system.
As shown in Fig. 4, the interim files can be stored temporarily on a local disk rather than in the global file system. In this case, by making effective use of the local file system of the compute node, the operations of writing interim files 1 and 2 to the global file system and reading them back from it can be omitted. For this purpose, job 2 and job 3 are scheduled to run at the compute node where job 1 ran.
That is, in job schedulers according to the prior art, only the job order is specified as a dependence parameter. In the method according to an embodiment of the present invention, by scheduling jobs with a specified data locality in addition to the job order, the I/O of the global file system can be confined to I/O on local disks. For this purpose, the present invention provides the following processes 1) to 4).
1) When defining a workflow through the workflow user interface unit, the user specifies whether the output file of each job is necessary as a final result.
2) The application resource usage management unit collects and provides the resource usage information of the applications that make up the workflow.
3) If consecutive jobs are I/O-intensive, the workflow engine unit specifies a data-locality dependence parameter so that the job following an earlier job runs at the node where the earlier job ran. The earlier job stores its interim file on the local disk (rather than in the global file system) of the compute node where it runs. The later job's command is changed so that its I/O directory reads the interim file stored by the earlier job from the local disk of the node where the earlier job ran. This process specifies the data-locality dependence and the order dependence between the earlier and later jobs, and commands that interim files the user designated as unnecessary be deleted, or that the interim files be copied to the global file system after the job completes.
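Process 3) can be sketched as a pass over consecutive jobs that records the data-locality dependence and redirects the later job's input to the earlier job's local interim file. The field names and the `/local/...` path below are purely illustrative assumptions:

```python
def add_locality_dependence(jobs):
    """For each I/O-intensive job that follows another I/O-intensive job,
    record a data-locality dependence on the earlier job and point its input
    at the earlier job's interim file on the local disk."""
    for earlier, later in zip(jobs, jobs[1:]):
        if earlier["io_intensive"] and later["io_intensive"]:
            later["locality_depends_on"] = earlier["name"]
            # earlier job writes locally; later job reads the same local path
            earlier["output"] = "/local/%s.interim" % earlier["name"]
            later["input"] = earlier["output"]
    return jobs
```

Jobs without an I/O-intensive predecessor keep their original (global file system) paths.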
4) job scheduling unit is according to from the dispatch command of workflow engine unit, dispatch to move successively for same computing node, there is the dependent group job in data place.
Fig. 5 shows the process flow diagram for the scheduling of the job scheduling unit of the intensive operation of I/O.
With reference to figure 5, in step 502, according to predetermined priority control criterion, select the operation on job queue.
In step 504, check whether the I/O bandwidth requirement of the global file system of selected operation meets the available demand of global file system.That is, check and meet the first and second demands.The first demand is the stabilization use that the use of current global file system should be less than maximum global file system relatively.The second demand is that the I/O of the I/O use of current global file system and the global file system of selected operation requires sum should relatively be less than the actual I/O bandwidth of maximum of global file system.
As the result of the check processing of step 504, if determine and do not meet at least one in the first and second demands, process proceeds to step 502 to select another operation.The stabilization that can operate to compensate global file system by total system is used.
Result as the check processing of step 504, if determine to meet the first and second demands both, in step 506, the computing node of I/O bandwidth requirement of this domain, storer and the CPU core number of selected node needs is selected to meet in job scheduling unit among computing node, even after allowing the needed bandwidth of selected operation, may stable operation global file system.At this moment, this domain can not be used in selected operation, or with the operation that temporary file is stored in this domain, can need local dribbling wide as shown in Figure 4.Here, if computing node does not have this domain, the processing of I/O bandwidth requirement that can this domain of curtailed inspection.
Thereafter, in step 508, the job scheduling unit schedules the selected job to run at the selected node.
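As a non-authoritative sketch, the overall flow of Fig. 5 (steps 502 through 508) could be arranged as the loop below. The queue ordering, the dictionary-based job and node attributes, and all field names are assumptions made for illustration only; the patent does not prescribe a data model.

```python
def schedule_io_intensive_job(job_queue, nodes, gfs):
    """Sketch of the Fig. 5 flow: pick a job, check the global file
    system, pick a node, dispatch.

    job_queue: jobs already ordered by the predetermined priority
               control criterion (step 502).
    nodes:     candidate computing nodes with free cores, free memory,
               and optionally local-disk bandwidth.
    gfs:       current state of the global file system.
    """
    for job in job_queue:  # step 502: select jobs in priority order
        # Step 504, first condition: stable-usage ceiling.
        if gfs["usage"] >= gfs["max_stable_usage"]:
            continue  # not satisfied -> try another job
        # Step 504, second condition: bandwidth headroom for this job.
        if gfs["io_in_use"] + job["gfs_io_req"] >= gfs["max_actual_bw"]:
            continue  # not satisfied -> try another job
        # Step 506: choose a node meeting the CPU-core, memory, and
        # (where a local disk exists) local-disk bandwidth requirements.
        for node in nodes:
            if node["free_cores"] < job["cores"]:
                continue
            if node["free_mem"] < job["mem"]:
                continue
            # A node without a local disk skips the local-disk check.
            if node.get("local_bw") is not None and \
               node["local_bw"] < job.get("local_io_req", 0):
                continue
            # Step 508: dispatch the selected job to the selected node.
            return job["id"], node["id"]
    return None  # nothing schedulable at the moment
```

Under this reading, a job whose global file system requirement cannot be admitted is simply skipped in favor of the next job in priority order, mirroring the loop from step 504 back to step 502.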
While the invention has been shown and described with respect to preferred embodiments, the invention is not limited thereto. It will be understood by those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.

Claims (20)

1. A workflow job scheduling apparatus, comprising:
a workflow user interface unit configured to provide an interface for a user workflow;
a workflow engine unit configured to convert the user workflow into a job workflow using resource usage information of individual applications, to execute the job workflow, and to generate a dispatch command;
a resource management unit configured to collect and manage real-time workload information for all resources of a computing system;
a job scheduling unit configured to schedule, based on the real-time workload information and according to the dispatch command, a group of jobs having a data-locality dependency to run successively on the same computing node;
a computational resource job management unit, located at each computing node, configured to run a job at the allocated node when the job scheduling unit requests that the job be run;
a computing node monitoring unit configured to monitor load information of each computational resource while a job is running on it, and to provide the load information to the resource management unit;
a global file system resource monitoring unit configured to measure a usage rate and a total input/output (I/O) bandwidth of a global file system, and to provide the measurement information to the resource management unit; and
an application resource usage management unit configured to generate the resource usage information of the individual applications based on the measurement information from the global file system resource monitoring unit and the load information from the computing node monitoring unit.
2. The workflow job scheduling apparatus of claim 1, wherein the interface comprises a graphical user interface (GUI) or a network interface.
3. The workflow job scheduling apparatus of claim 1, further comprising a workflow management unit configured to provide functions of storing, changing, deleting, and checking the user workflow.
4. The workflow job scheduling apparatus of claim 3, wherein the user workflow is an abstraction-level workflow that is not run at the computational resources.
5. The workflow job scheduling apparatus of claim 1, wherein, when converting the user workflow into the job workflow, the workflow engine unit is configured to transform the order of the jobs of the user workflow into a data-locality dependency parameter needed at job submission, and to transform the resource usage information of the individual applications into a resource usage request parameter needed at job submission.
6. The workflow job scheduling apparatus of claim 5, wherein the workflow engine unit is configured to generate job-run request information comprising the data-locality dependency parameter and the resource usage request parameter, and to provide the job-run request information to the job scheduling unit.
7. The workflow job scheduling apparatus of claim 1, wherein the workflow engine unit is configured to provide functions of storing, deleting, and checking job results of the computational resources.
8. The workflow job scheduling apparatus of claim 1, wherein the real-time workload information comprises configuration information about the resources of the computing system, information about the load on individual nodes and about whether their resources are allocated to jobs, and the load and I/O bandwidth usage of the global file system.
9. The workflow job scheduling apparatus of claim 8, wherein the resources comprise computing nodes, global file system nodes, a management network switch, a computational network switch, and a network architecture.
10. The workflow job scheduling apparatus of claim 1, wherein the job scheduling unit is configured to allocate computational resources to pending jobs on a current job queue according to job priority and the availability of resources, and to run the jobs.
11. The workflow job scheduling apparatus of claim 10, wherein the availability of resources includes real-time I/O bandwidth.
12. The workflow job scheduling apparatus of claim 1, wherein the load information of the computational resource comprises at least one of CPU usage rate, memory usage rate, disk I/O bandwidth usage, disk usage rate, network I/O usage, and network I/O usage rate.
13. The workflow job scheduling apparatus of claim 12, wherein the computational resource monitoring unit is configured to monitor the load information and to provide the load information to the resource management unit or the application resource usage management unit, either periodically or after a job is completed.
14. The workflow job scheduling apparatus of claim 1, wherein the resource usage information of the individual applications is automatically collected and generated when the user workflow is run.
15. The workflow job scheduling apparatus of claim 1, wherein the resource usage information of the individual applications is generated via manual input under the supervision of a job manager.
16. A workflow job scheduling method, comprising:
when defining a user workflow, specifying whether the output file of each job is necessary as a final result;
collecting resource usage information of individual applications constituting the user workflow;
if consecutive jobs are I/O-intensive jobs, specifying a data-locality dependency parameter so that a later job among the consecutive jobs runs at the node where a preceding job among the consecutive jobs ran; and
scheduling a group of jobs having a data-locality dependency to the same computing node to run successively, based on real-time workload information acquired for all resources of a computing system.
17. The workflow job scheduling method of claim 16, wherein the preceding job stores a temporary file on the local disk of the node where the preceding job runs, and the later job reads the temporary file stored by the preceding job by having its I/O directory set to the local disk of the node where the later job runs.
18. The workflow job scheduling method of claim 16, wherein the scheduling comprises:
selecting a job on a job queue;
checking whether the I/O bandwidth requirement that the selected job places on a global file system can be met by the available I/O bandwidth the global file system can provide;
when the I/O bandwidth requirement is met, selecting a computing node that satisfies the resource usage needed by the selected job; and
running the selected job at the selected computing node.
19. The workflow job scheduling method of claim 18, wherein the selection of the job is performed according to a predetermined priority control criterion.
20. The workflow job scheduling method of claim 18, wherein the requirement comprises a first condition and a second condition,
wherein the first condition is that the current usage of the global file system is below a maximum stable usage of the global file system, and the second condition is that the sum of the current I/O usage of the global file system and the I/O requirement that the selected job places on the global file system is below a maximum actual I/O bandwidth of the global file system.
CN201310249906.2A 2013-02-14 2013-06-21 Device and method for scheduling working flow Pending CN103995735A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2013-0015841 2013-02-14
KR1020130015841A KR20140102478A (en) 2013-02-14 2013-02-14 Workflow job scheduling apparatus and method

Publications (1)

Publication Number Publication Date
CN103995735A true CN103995735A (en) 2014-08-20

Family

ID=51309911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310249906.2A Pending CN103995735A (en) 2013-02-14 2013-06-21 Device and method for scheduling working flow

Country Status (2)

Country Link
KR (1) KR20140102478A (en)
CN (1) CN103995735A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106155791A (en) * 2016-06-30 2016-11-23 电子科技大学 A kind of workflow task dispatching method under distributed environment
CN106294533A (en) * 2015-06-24 2017-01-04 伊姆西公司 Use the distributed work flow that data base replicates
CN108446174A (en) * 2018-03-06 2018-08-24 苏州大学 Multinuclear job scheduling method based on pre-allocation of resources and public guiding agency
CN109656699A (en) * 2018-12-14 2019-04-19 平安医疗健康管理股份有限公司 Distributed computing method, device, system, equipment and readable storage medium storing program for executing
CN112000453A (en) * 2020-08-25 2020-11-27 支付宝(杭州)信息技术有限公司 Scheduling method and system of stream computing system
CN112882817A (en) * 2021-03-24 2021-06-01 国家超级计算天津中心 Workflow processing method based on super computer
CN114385337A (en) * 2022-01-10 2022-04-22 杭州电子科技大学 Task grouping scheduling method for distributed workflow system
WO2023019408A1 (en) * 2021-08-16 2023-02-23 Huawei Cloud Computing Technologies Co., Ltd. Apparatuses and methods for scheduling computing resources

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
KR101670460B1 (en) * 2015-04-16 2016-11-01 한국과학기술원 Apparatus and method for controlling scheduling of workflow based on cloud
KR102259927B1 (en) * 2017-10-18 2021-06-03 한국전자통신연구원 Workflow engine framework
CN112799797B (en) * 2019-11-14 2024-04-16 北京沃东天骏信息技术有限公司 Task management method and device
KR102360885B1 (en) * 2019-11-26 2022-02-08 한전케이디엔주식회사 System and method for managing work flow for cloud service
KR102337271B1 (en) * 2020-07-28 2021-12-09 주식회사 이노룰스 Global manufacturing execution system based on business rule management system
CN111898908B (en) * 2020-07-30 2023-06-16 华中科技大学 Production line scheduling system and method based on multiple intelligent objects
KR102427477B1 (en) * 2020-09-29 2022-08-01 한국전자기술연구원 Apply multiple elements method for workload analysis in the micro data center
CN114217733B (en) * 2021-04-30 2023-10-13 无锡江南计算技术研究所 IO (input/output) processing framework and IO request processing method for IO forwarding system
CN113553303A (en) * 2021-06-24 2021-10-26 南方科技大学 Resource management method, device, medium and electronic equipment of super computer
CN113535326B (en) * 2021-07-09 2024-04-12 粤港澳大湾区精准医学研究院(广州) Calculation flow scheduling system based on high-throughput sequencing data
CN114138500B (en) * 2022-01-29 2022-07-08 阿里云计算有限公司 Resource scheduling system and method
CN117406979B (en) * 2023-12-14 2024-04-12 之江实验室 Interface interaction design method and system for computing workflow

Citations (5)

Publication number Priority date Publication date Assignee Title
CN1280335A (en) * 1999-07-10 2001-01-17 三星电子株式会社 Micro dispatching method and operation system inner core
US20020129082A1 (en) * 2001-03-08 2002-09-12 International Business Machines Corporation Inter-partition message passing method, system and program product for throughput measurement in a partitioned processing environment
CN1399218A (en) * 2001-07-19 2003-02-26 西门子共同研究公司 Data triggering operation flow process
CN101694709A (en) * 2009-09-27 2010-04-14 华中科技大学 Service-oriented distributed work flow management system
CN102611622A (en) * 2012-02-28 2012-07-25 清华大学 Dispatching method for working load of elastic cloud computing platform


Cited By (13)

Publication number Priority date Publication date Assignee Title
CN106294533A (en) * 2015-06-24 2017-01-04 伊姆西公司 Use the distributed work flow that data base replicates
CN106294533B (en) * 2015-06-24 2019-06-21 伊姆西公司 The distributed work flow replicated using database
CN106155791B (en) * 2016-06-30 2019-05-07 电子科技大学 A kind of workflow task dispatching method under distributed environment
CN106155791A (en) * 2016-06-30 2016-11-23 电子科技大学 A kind of workflow task dispatching method under distributed environment
CN108446174B (en) * 2018-03-06 2022-03-11 苏州大学 Multi-core job scheduling method based on resource pre-allocation and public boot agent
CN108446174A (en) * 2018-03-06 2018-08-24 苏州大学 Multinuclear job scheduling method based on pre-allocation of resources and public guiding agency
CN109656699A (en) * 2018-12-14 2019-04-19 平安医疗健康管理股份有限公司 Distributed computing method, device, system, equipment and readable storage medium storing program for executing
CN112000453A (en) * 2020-08-25 2020-11-27 支付宝(杭州)信息技术有限公司 Scheduling method and system of stream computing system
CN112882817A (en) * 2021-03-24 2021-06-01 国家超级计算天津中心 Workflow processing method based on super computer
CN112882817B (en) * 2021-03-24 2022-08-12 国家超级计算天津中心 Workflow processing method based on super computer
WO2023019408A1 (en) * 2021-08-16 2023-02-23 Huawei Cloud Computing Technologies Co., Ltd. Apparatuses and methods for scheduling computing resources
CN114385337A (en) * 2022-01-10 2022-04-22 杭州电子科技大学 Task grouping scheduling method for distributed workflow system
CN114385337B (en) * 2022-01-10 2023-10-20 杭州电子科技大学 Task grouping scheduling method for distributed workflow system

Also Published As

Publication number Publication date
KR20140102478A (en) 2014-08-22

Similar Documents

Publication Publication Date Title
CN103995735A (en) Device and method for scheduling working flow
US8095933B2 (en) Grid project modeling, simulation, display, and scheduling
US7853948B2 (en) Method and apparatus for scheduling grid jobs
US7831971B2 (en) Method and apparatus for presenting a visualization of processor capacity and network availability based on a grid computing system simulation
Selvarani et al. Improved cost-based algorithm for task scheduling in cloud computing
US7995474B2 (en) Grid network throttle and load collector
US10831633B2 (en) Methods, apparatuses, and systems for workflow run-time prediction in a distributed computing system
US9875135B2 (en) Utility-optimized scheduling of time-sensitive tasks in a resource-constrained environment
US9727383B2 (en) Predicting datacenter performance to improve provisioning
US9870269B1 (en) Job allocation in a clustered environment
US8756441B1 (en) Data center energy manager for monitoring power usage in a data storage environment having a power monitor and a monitor module for correlating associative information associated with power consumption
US10191779B2 (en) Application execution controller and application execution method
CN102541460B (en) Multiple disc management method and equipment
CN102868573B (en) Method and device for Web service load cloud test
US20070250629A1 (en) Method and a system that enables the calculation of resource requirements for a composite application
Castro et al. A joint CPU-RAM energy efficient and SLA-compliant approach for cloud data centers
CN111045911B (en) Performance test method, performance test device, storage medium and electronic equipment
CN107430526B (en) Method and node for scheduling data processing
CN103593224A (en) Virtual machine resource allocation system and method
CN109614227A (en) Task resource concocting method, device, electronic equipment and computer-readable medium
Cope et al. Robust data placement in urgent computing environments
CN111158904A (en) Task scheduling method, device, server and medium
Lu et al. VM scaling based on Hurst exponent and Markov transition with empirical cloud data
Netto et al. Deciding when and how to move HPC jobs to the cloud
Iglesias et al. Increasing task consolidation efficiency by using more accurate resource estimations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140820