KR20140102478A

KR20140102478A - Workflow job scheduling apparatus and method

Info

Publication number: KR20140102478A
Application number: KR1020130015841A
Authority: KR
Inventors: 안신영; 배승조
Original assignee: 한국전자통신연구원
Priority date: 2013-02-14
Filing date: 2013-02-14
Publication date: 2014-08-22
Also published as: CN103995735A

Abstract

A workflow job scheduling apparatus according to the present invention includes: a workflow user interface unit that provides an interface for a user workflow; a workflow engine unit that converts the user workflow into an execution workflow using the resource usage information of individual applications, executes the execution workflow on computational resources, and generates a corresponding scheduling instruction; a resource management unit that collects and manages real-time resource load information for all resources of a computing system; a job scheduler unit that, in response to the generated scheduling instruction and based on the real-time resource load information, schedules a group of jobs having a data location dependency to run sequentially on the same compute node; a computational resource job management unit, located on each compute node, that executes a job on the assigned node when the job scheduler unit requests its execution; a computational resource monitoring unit that, while jobs are running on individual computational resources, monitors the load information of the computational resources on each node where a job runs and provides it to the resource management unit; a global file system resource monitoring unit that measures the total input/output bandwidth and utilization of a global file system and provides them to the resource management unit; and an application resource usage management unit that generates the resource usage information of the individual applications based on the load information from the computational resource monitoring unit and the measurements from the global file system resource monitoring unit, and provides it to the workflow engine unit.

Description

WORKFLOW JOB SCHEDULING APPARATUS AND METHOD

The present invention relates to workflow job scheduling and, more particularly, to a workflow job scheduling apparatus and method suited to resource management and job scheduling that automatically execute large-scale parallel distributed jobs on a high-performance computing system and return their results.

As is well known, in computing environments of many kinds, such as supercomputers, high-performance clusters, grid systems, and web services, workflow management systems, resource management systems, and job schedulers have been used to run in batch, on a person's behalf, scientific computing jobs that process large-scale data and complex jobs with dependencies across multiple stages.

A workflow management system is generally a software system that lets a user compose, through a user-friendly UI (user interface), a workflow in which a series of related jobs is carried out, runs that workflow across various computing resources such as high-performance computers, grids, and web services, and reports the results. Existing workflow management systems include, for example, Taverna, Galaxy, and Kepler.

A resource management system is a software system that manages the computing resources of a high-performance computer or cluster and handles the batch execution of jobs. Examples include the PBS (Portable Batch System) family, namely OpenPBS, TORQUE, and PBS Pro, as well as SLURM and Oracle Grid Engine. These systems generally use FCFS (first-come, first-served) job scheduling.

A job scheduler is used mainly in conjunction with a resource management system. It is a software system that runs the jobs in a job queue while dynamically reordering them by comparing the type and availability of resources against each job's priority and resource requirements. Conventional examples include Maui, ALPS, LSF, and Moab.

Korean Patent Publication No. 2010-0118357 (published November 5, 2010)
Korean Patent Publication No. 2011-0060175 (published June 8, 2011)

As is well known, most scientific applications, genome sequence analysis among them, obtain the desired result by combining previously developed application programs. A workflow (or pipeline) arranges application programs (jobs) that have temporal ordering dependencies and data dependencies into an ordered flow based on those dependencies. Workflows vary widely in size, from simple ones made up of one or two applications to ones that tie together tens or hundreds of applications.

Mapping such a workflow onto suitable computational resources and obtaining results efficiently requires accurate information about the computational resources that the workflow's jobs need. However, information about the resource usage of the application program that actually performs a job (for example, how many CPUs and how much memory, disk, and network bandwidth it needs) is very hard to obtain for anyone other than the application's developer, and analysis tools that derive a resource usage profile from source code remain underdeveloped.

Earlier patents devised methods that run a test job on a sample file and analyze it, or that store the results of previous job executions as a job profile and use the profile when scheduling subsequent jobs. In addition, isolation is available for CPU and memory resources: when a compute-intensive or memory-intensive job is allocated the CPU cores and memory it needs with isolation enabled, its resource usage is isolated and jobs do not affect one another.

For input/output, however, resource isolation is generally not provided. For example, InfiniBand, which is used as a high-performance computing network, and the global file system (GFS) offer no isolation of I/O bandwidth. Consequently, when many applications with different characteristics run at the same time and share I/O resources, the expected CPU utilization efficiency cannot be achieved. In particular, when several highly I/O-intensive applications run simultaneously, the time spent waiting for I/O grows so much that CPU utilization efficiency can become very poor even for compute-intensive jobs.

To solve the resource usage problems described above, which can arise when executing a workflow containing relatively I/O-intensive jobs, the present invention provides a new technique that reflects both the resource usage information of individual applications and real-time resource load (workload) information in scheduling, thereby preventing relatively I/O-intensive jobs from degrading the operating efficiency of the entire high-performance computing system.

According to one aspect, the present invention provides a workflow job scheduling apparatus including: a workflow user interface unit that provides an interface for a user workflow; a workflow engine unit that converts the user workflow into an execution workflow using the resource usage information of individual applications, executes it on computational resources, and generates a corresponding scheduling instruction; a resource management unit that collects and manages real-time resource load information for all resources of a computing system; a job scheduler unit that, in response to the generated scheduling instruction and based on the real-time resource load information, schedules a group of jobs having a data location dependency to run sequentially on the same compute node; a computational resource job management unit, located on each compute node, that executes a job on the assigned node when the job scheduler unit requests its execution; a computational resource monitoring unit that, while jobs are running on individual computational resources, monitors the load information of the computational resources on each node where a job runs and provides it to the resource management unit; a global file system resource monitoring unit that measures the total I/O bandwidth and utilization of the global file system and provides them to the resource management unit; and an application resource usage management unit that generates the resource usage information of the individual applications based on the load information from the computational resource monitoring unit and the measurements from the global file system resource monitoring unit, and provides it to the workflow engine unit.

The interface may be a GUI or a web interface, and the apparatus may further include a workflow management unit that provides functions to store, modify, delete, and look up the user workflow. Here, the user workflow may be an abstract-level workflow that is not executed on computational resources.

When converting to the execution workflow, the workflow engine unit may convert the order among the jobs of the user workflow into the data location dependency parameters required at job submission, and convert the application resource usage information into the resource requirement parameters required at job submission.

The workflow engine unit may generate job execution request information containing the data location dependency parameters and the resource requirement parameters and provide it to the job scheduler unit, and may provide functions to store, delete, and look up the results of execution on the computational resources.
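The conversion from a user workflow to job execution requests can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the names `AppUsage`, `JobRequest`, `to_execution_workflow`, and all field names are hypothetical, and the workflow is assumed to be strictly linear.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppUsage:
    """Hypothetical per-application resource usage profile."""
    cpu_cores: int
    memory_gb: float
    gfs_io_mbps: float  # demand on global file system bandwidth


@dataclass
class JobRequest:
    """Hypothetical job execution request handed to the job scheduler."""
    name: str
    app: str
    run_on_node_of: Optional[str]  # data location dependency parameter
    cpu_cores: int                 # resource requirement parameters
    memory_gb: float
    gfs_io_mbps: float


def to_execution_workflow(steps, profiles):
    """Turn an ordered list of (job_name, app_name) into job requests.

    The order between jobs becomes a dependency on the previous job, and
    the application's usage profile becomes the resource requirement.
    """
    requests = []
    previous = None
    for job_name, app_name in steps:
        usage = profiles[app_name]
        requests.append(JobRequest(job_name, app_name, previous,
                                   usage.cpu_cores, usage.memory_gb,
                                   usage.gfs_io_mbps))
        previous = job_name
    return requests
```

A call such as `to_execution_workflow([("align", "aligner"), ("sort", "sorter")], profiles)` would yield two requests, the second depending on the first.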

The real-time resource load information may include configuration information for all resources of the computing system, the resource allocation status and load information of individual nodes, and the load and I/O bandwidth usage of the global file system.

All resources may include all compute nodes, the global file system nodes, the management network switch, the compute network switch, and the network architecture.

The job scheduler unit may allocate jobs waiting in the current job queue to computational resources and run them according to priority and resource availability, where resource availability may include the available amount of real-time I/O bandwidth.

The load information of the computational resources may include at least one of the CPU utilization, memory utilization, disk I/O bandwidth usage, disk utilization, and network I/O usage and utilization of the individual node.

The computational resource monitoring unit may monitor the load information and provide it to the resource management unit or the application resource usage management unit periodically or after a job finishes.

The resource usage information of the individual applications may be collected and generated automatically when the user workflow is executed, or generated through manual input based on monitoring by a job administrator.

According to another aspect, the present invention provides a workflow job scheduling method including: specifying, when a user workflow is defined, whether the output file of each job is needed as a final result; collecting the resource usage information of the individual applications that make up the user workflow; specifying, when consecutive jobs are relatively I/O-intensive, a data location dependency parameter so that the subsequent job runs on the node where the preceding job ran; and scheduling, based on real-time resource load information collected for all resources of the computing system, a group of jobs having a data location dependency to run sequentially on the same compute node.

The preceding job may store an intermediate result file on the local disk of the compute node on which it ran, and the I/O directory locations may be specified so that the subsequent job reads that intermediate result file from the local disk of the node on which it runs.
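One way to picture this placement of intermediate files is the sketch below. It is an illustration only: the directory roots, the job dictionary keys, and the exact rule for combining the "I/O-intensive" and "final result" flags are assumptions made for this example and are not taken from the patent.

```python
from pathlib import PurePosixPath

LOCAL_SCRATCH = PurePosixPath("/local/scratch")  # hypothetical local-disk root
GFS_ROOT = PurePosixPath("/gfs/project")         # hypothetical global file system root


def plan_io(jobs):
    """Choose input/output directories for a linear workflow.

    jobs: ordered list of dicts with keys 'name', 'io_intensive', 'final_output'.
    Assumed rule: a job's output stays on the local disk only when the job and
    its successor are both I/O-intensive and the output is not a final result;
    the successor is then pinned to the same node.
    """
    plan = []
    prev_out = GFS_ROOT / "input"
    prev_name = None
    prev_local = False
    for i, job in enumerate(jobs):
        nxt = jobs[i + 1] if i + 1 < len(jobs) else None
        keep_local = (nxt is not None and job["io_intensive"]
                      and nxt["io_intensive"] and not job["final_output"])
        out_dir = (LOCAL_SCRATCH if keep_local else GFS_ROOT) / job["name"]
        plan.append({
            "name": job["name"],
            "input_dir": str(prev_out),
            "output_dir": str(out_dir),
            # Data location dependency: run where the predecessor ran.
            "same_node_as": prev_name if prev_local else None,
        })
        prev_out, prev_name, prev_local = out_dir, job["name"], keep_local
    return plan
```

With this rule, only intermediate files exchanged between consecutive I/O-intensive jobs bypass the global file system; everything else is read from and written to it as usual.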

The scheduling may include: selecting a job from the job queue; checking whether the selected job's global file system I/O bandwidth requirement satisfies the condition under which the global file system can provide it; selecting, when the condition is satisfied, a compute node that satisfies the resource usage the selected job requires; and running the selected job on the selected node.
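The four steps above can be sketched as a single pass over the queue. This is a hedged illustration, not the flow of FIG. 5: the function name `schedule_pass`, the dictionary keys, the first-fit node choice, and the decision to leave non-admitted jobs in the queue are all assumptions. The bandwidth check is passed in as a callable `gfs_admits` so that the sketch does not commit to a particular condition.

```python
def schedule_pass(queue, nodes, gfs_admits, launch):
    """Run one scheduling pass and return the (job, node) pairs started.

    queue: list of dicts with 'name', 'priority', 'cpu', 'mem_gb'
    nodes: list of dicts with 'name', 'free_cpu', 'free_mem_gb'
    gfs_admits: callable(job) -> bool, the global file system bandwidth check
    launch: callable(job, node) that actually starts the job
    """
    started = []
    # Assumption: a larger 'priority' value means the job is considered first.
    for job in sorted(queue, key=lambda j: j["priority"], reverse=True):
        if not gfs_admits(job):
            continue  # not enough file system bandwidth: job keeps waiting
        # First-fit choice of a node with enough free CPU and memory.
        node = next(
            (n for n in nodes
             if n["free_cpu"] >= job["cpu"] and n["free_mem_gb"] >= job["mem_gb"]),
            None,
        )
        if node is None:
            continue  # no node can host the job right now
        node["free_cpu"] -= job["cpu"]
        node["free_mem_gb"] -= job["mem_gb"]
        launch(job, node)
        started.append((job["name"], node["name"]))
    # Remove the started jobs from the queue in place.
    started_names = {name for name, _ in started}
    queue[:] = [j for j in queue if j["name"] not in started_names]
    return started
```

Jobs that fail either check simply remain queued and are reconsidered on the next pass, when the real-time load information may have changed.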

The job may be selected according to a predefined priority control criterion.

The condition may be that the current utilization of the global file system is lower than the maximum stable utilization of the global file system, and that the sum of the current I/O usage of the global file system and the selected job's global file system I/O requirement is lower than the maximum measured I/O bandwidth of the global file system.
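The two-part condition can be written as a small predicate. The sketch below is illustrative: the function and parameter names are hypothetical, utilization is assumed to be a fraction between 0 and 1, and bandwidth is assumed to be in any single consistent unit.

```python
def gfs_can_admit(current_util, max_stable_util,
                  current_io, job_io, max_measured_bw):
    """Return True when the global file system can take on the job.

    Both parts of the condition must hold, using strict comparisons:
      1. current utilization < maximum stable utilization
      2. current I/O usage + job I/O requirement < maximum measured bandwidth
    """
    return (current_util < max_stable_util
            and current_io + job_io < max_measured_bw)
```

For example, with a maximum stable utilization of 0.8 and a maximum measured bandwidth of 4000, a job needing 500 is admitted while the file system is at utilization 0.6 and usage 3000, but not once usage reaches 3800.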

According to the present invention, when a workflow containing relatively (highly) I/O-intensive jobs is run on a high-performance computer or cluster, real-time I/O bandwidth information from the compute nodes and the global file system is fed back into job scheduling. This prevents I/O from sharply reducing the utilization efficiency of computational resources, maximizes the use of those resources, and shortens workflow execution time so that results are obtained sooner. In addition, the present invention lets general users who lack in-depth knowledge of computational resources or of how to use the system run their own workflows optimally.

FIG. 1 is an example of a workflow made up of jobs connected through file input/output.
FIG. 2 is a configuration diagram of a typical high-performance computer system.
FIG. 3 is a block diagram of a workflow job scheduling apparatus according to an embodiment of the present invention.
FIG. 4 is an example diagram illustrating a method of distributing I/O between local disks and the global file system in a workflow containing relatively I/O-intensive jobs.
FIG. 5 is a flowchart showing how the job scheduler unit schedules relatively I/O-intensive jobs according to the present invention.

First, the advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided by way of example only, so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains can clearly understand its scope. The technical scope of the present invention is to be defined by the claims.

In describing the present invention below, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in view of their functions in the present invention and may vary according to the intentions or customs of users and operators. Their definitions should therefore be based on the technical idea described throughout this specification.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

The present invention proposes a job scheduling method for workflows containing relatively (highly) I/O-intensive jobs. To this end, the scheduling method may combine information obtained by analyzing in advance the amount of computational resources actually used when workflow jobs run with real-time monitoring of I/O bandwidth, so that the results are continuously reflected in scheduling.

FIG. 1 is an example of a workflow made up of jobs connected through file input/output. It shows a workflow (or pipeline) in which the file stored as the result of one job is taken as the input of the next job.

Here, the input data of the workflow is either stored as files or fed to a job through standard input. The job at each stage may have any of various application characteristics: it may be a compute-intensive job, a memory-intensive job, or an I/O-intensive job.

FIG. 2 is a configuration diagram of a typical high-performance computer system (or supercomputer), which may include a plurality of service nodes (202-1 to 202-3), a management network switch (204), a plurality of compute nodes (206-1 to 206-4), a compute network switch (208), a plurality of global file system server nodes (210-1 to 210-3), and a plurality of storage nodes (212-1 to 212-3). Here, the service nodes (202-1 to 202-3) may be fewer in number than the compute nodes (206-1 to 206-4).

Referring to FIG. 2, the service nodes (202-1 to 202-3) include login nodes, where general users log in and submit jobs, and management nodes, which run the various servers that perform management functions (for example, cluster management, resource management, and workflow management).

Here, the service nodes (202-1 to 202-3), the compute nodes (206-1 to 206-4), and the global file system server nodes (210-1 to 210-3) are dual-connected through the high-speed compute network switch (208) and the management network switch (204), which is slower than the compute network switch.

Each compute node (206-1 to 206-4) may or may not have a local disk. Most computing jobs read input files stored in the global file system, perform the required computation, and store the results back in the global file system.

Accordingly, when the total I/O bandwidth used by relatively I/O-intensive jobs exceeds the bandwidth the global file system provides, every job on the entire high-performance computing system is affected under the conventional approach. The present invention proposes a way to solve this problem.

FIG. 3 is a block diagram of a workflow job scheduling apparatus according to an embodiment of the present invention, which may include a workflow user interface unit (302), a workflow management unit (304), a workflow engine unit (306), a resource management unit (308), a job scheduler unit (310), a computational resource job management unit (312), a computational resource monitoring unit (314), a global file system resource monitoring unit (316), and an application resource usage management unit (318).

Referring to FIG. 3, the workflow user interface unit (302) runs a user interface, such as a GUI or a web interface, through which the user can easily define, execute, and analyze the workflows he or she needs.

The workflow management unit (304) may provide functions to store, modify, delete, and look up the user workflow specified (selected) through the user interface. Here, the user workflow is an abstract-level workflow: it is not run directly on the computational resources but becomes executable once it is converted into a concrete execution workflow.

Next, the workflow engine unit (306) may use the resource usage information of individual applications, provided by the application resource usage management unit (318) described below, to convert the composed user workflow into an execution workflow, execute it on computational resources, and generate the corresponding scheduling instruction. The workflow engine unit (306) may also provide functions to store, delete, and look up the results obtained through execution on the computational resources.

When converting to the execution workflow, the workflow engine unit (306) may also convert the order among the jobs of the user workflow into the data location dependency parameters required at job submission, and convert the application resource usage information into the resource requirement parameters required at job submission. It may then generate job execution request information containing the data location dependency parameters and the resource requirement parameters and pass it to the job scheduler unit (310) described below.

The resource management unit (308) may collect and manage real-time resource load information for all resources of the computing system (for example, all compute nodes, the global file system nodes, the management network switch, the compute network switch, and the network architecture). This information includes, for example, configuration information for all resources of the computing system, the resource allocation status and load (utilization) information of individual nodes, and the load and I/O bandwidth usage of the global file system. The collected and managed real-time resource load information is passed to the job scheduler unit (310).

Meanwhile, when a scheduling instruction arrives from the workflow engine unit (306), the job scheduler unit (310) may, based on the real-time resource load information passed from the resource management unit (308), schedule a group of jobs having a data location dependency to run sequentially on the same compute node. That is, the job scheduler unit (310) may allocate jobs waiting in the current job queue to computational resources and run them according to priority and resource availability, where resource availability may include, for example, the available amount of real-time I/O bandwidth.
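Forming the groups that must run sequentially on one node can be sketched as below. This is an assumption-laden illustration rather than the scheduler's actual logic: `locality_groups` and the `same_node_as` key are hypothetical names, and jobs are assumed to arrive in workflow order with at most one predecessor each.

```python
def locality_groups(jobs):
    """Group jobs that must run one after another on the same compute node.

    jobs: list of dicts in workflow order, each with 'name' and an optional
    'same_node_as' naming the predecessor whose node the job must reuse.
    Returns a list of groups; each group is a list of job names in run order.
    """
    groups = []
    group_of = {}  # job name -> the group (list) it belongs to
    for job in jobs:
        pred = job.get("same_node_as")
        if pred in group_of:
            group = group_of[pred]   # extend the predecessor's group
        else:
            group = []               # no dependency: start a new group
            groups.append(group)
        group.append(job["name"])
        group_of[job["name"]] = group
    return groups
```

Each resulting group can then be assigned to a single node, while independent groups remain free to be placed on different nodes.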

For example, the job scheduler unit (310) may select a job from the job queue according to a predefined priority control criterion and check whether the selected job's global file system I/O bandwidth requirement satisfies the condition under which the global file system can provide it. When the condition is satisfied, it selects a compute node that satisfies the resource usage the selected job requires and schedules the selected job to run there. Here, the condition may be that the current utilization of the global file system is lower than the maximum stable utilization of the global file system, and that the sum of the current I/O usage of the global file system and the selected job's global file system I/O requirement is lower than the maximum measured I/O bandwidth of the global file system.

Next, the calculation resource job management unit 312 is located at each individual calculation node. When the job scheduler unit 310 requests execution of a job, it may execute the job on the assigned node, monitor the executed job, and report the job result.

While jobs are running on individual calculation resources, the calculation resource monitoring unit 314 may monitor the load information of the calculation resources on which the jobs run, such as each node's CPU utilization, memory utilization, disk input/output bandwidth usage, disk utilization, and network input/output usage and utilization, and may report (provide) it to the resource management unit 308 or the application resource usage management unit 318 periodically or after the selected job finishes.

Here, the resource usage profile performance metrics monitored by the calculation resource monitoring unit 314 may include, for example, CPU utilization (peak, avg), memory usage (peak, avg), disk utilization and I/O bandwidth usage (peak, avg), and network utilization and I/O bandwidth usage (peak, avg).
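
As an illustration of the (peak, avg) form of these metrics, the following sketch reduces periodically sampled values to one peak/average pair per metric. The names `PeakAvg` and `summarize_profile` and the metric keys are hypothetical; the specification does not define a data format.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class PeakAvg:
    """Peak and average of one monitored metric."""
    peak: float
    avg: float


def summarize_profile(samples: Dict[str, List[float]]) -> Dict[str, PeakAvg]:
    """Reduce per-metric sample lists to (peak, avg) pairs.

    Metrics with no samples are skipped rather than reported as zero.
    """
    return {
        name: PeakAvg(peak=max(values), avg=mean(values))
        for name, values in samples.items()
        if values
    }
```

A profile built this way for one application run could then be stored as that application's resource usage information.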

The global file system resource monitoring unit 316, in turn, may periodically measure the total input/output bandwidth and the utilization of the global file system and deliver them to the resource management unit 308.

Finally, the application resource usage management unit 318 may generate resource usage information for individual applications based on the load information delivered from the calculation resource monitoring unit 314 and the measurement information delivered from the global file system resource monitoring unit 316, and deliver it to the workflow engine unit 306. Here, the resource usage information of individual applications may be collected and generated automatically when the user workflow is executed, or a job administrator may monitor resource usage manually and enter it.

FIG. 4 is an exemplary diagram for explaining a method of distributing input/output between local disks and the global file system in a workflow that includes jobs with relatively heavy input/output.

Referring to FIG. 4, jobs with relatively (very) heavy input/output consume a relatively large amount of input/output bandwidth, and building a global file system that provides very large global bandwidth is costly. Therefore, making effective use of relatively inexpensive local disks reduces the global input/output bandwidth required and enables cost-effective system construction and operation.

As an example, as shown in FIG. 4, assume that all or some of the jobs of a workflow (consisting of job 1, job 2, and job 3) have very heavy input/output. Assume further that jobs 1, 2, and 3 must be executed sequentially, that the input of job 1 is read from the global file system, that the result of job 1 (intermediate file 1) is the input of job 2, that the result of job 2 (intermediate file 2) is the input of job 3, and that the final result of job 3 is stored in the global file system.

As shown in FIG. 4, the intermediate files can be stored temporarily on the local disk of the calculation node rather than in the global file system. Using the calculation node's local file system in this way eliminates writing intermediate files 1 and 2 to the global file system and reading them back. To achieve this, job 2 and job 3 are scheduled to run on the calculation node on which their preceding job, job 1, ran.
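
The saving in the three-job example can be made concrete with a small calculation. The function name `global_fs_traffic` and the file sizes used below are hypothetical and serve only to illustrate the reasoning: each intermediate file placed in the global file system is written once and read once, whereas an intermediate file kept on the local disk contributes no global traffic.

```python
from typing import Sequence


def global_fs_traffic(input_size: float,
                      intermediate_sizes: Sequence[float],
                      final_size: float,
                      intermediates_on_local_disk: bool) -> float:
    """Total data moved through the global file system for a job chain.

    The first job's input and the last job's output always use the
    global file system. Sizes may be in any consistent unit.
    """
    traffic = input_size + final_size
    if not intermediates_on_local_disk:
        # Each intermediate file is written once and read once.
        traffic += 2 * sum(intermediate_sizes)
    return traffic
```

For an assumed 10 GB input, intermediate files of 20 GB and 30 GB, and a 5 GB final result, the global file system carries 115 GB when intermediates are stored globally and 15 GB when they stay on the local disk.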

That is, a conventional job scheduler could specify only the order of jobs as a dependency parameter, whereas the method proposed by the present invention specifies both the order of the jobs and the data location when scheduling, so that input/output that would otherwise go to the global file system is confined to local disk input/output. To this end, the present invention can provide the process of 1) to 4) below.

1) When defining a workflow through the workflow user interface unit, the user specifies whether the output file of each job is needed as a final result.

2) The application resource usage management unit collects and provides resource usage information for the applications that make up the workflow.

3) When the application characteristics of consecutive jobs indicate very heavy input/output, the workflow engine unit specifies a data location dependency parameter so that a subsequent job is executed on the node on which its preceding job was executed. In addition, it specifies the input/output directory locations so that the preceding job stores its intermediate result file on the local disk of the calculation node on which it ran, rather than in the global file system, and the subsequent job reads the intermediate result file stored by the preceding job from the local disk of the node on which it runs. This specifies both a sequential dependency and a data location dependency between the preceding job and the subsequent job. The workflow engine unit can also instruct that, after the jobs complete, the intermediate files the user marked as unnecessary be deleted, or that they be copied to the global file system.

4) Following the scheduling instruction from the workflow engine unit, the job scheduler unit schedules a group of jobs having a data location dependency to be executed sequentially on the same calculation node.
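
Step 3) above can be sketched for a simple linear chain of jobs. Everything named here is an assumption made for illustration: the classes `UserJob` and `ExecJob`, the fields `depends_on` and `same_node_as`, the directory paths, and the rule that an output stays local only when a job and its successor are both I/O-heavy and the output is not a final result. The specification does not prescribe a parameter format.

```python
from dataclasses import dataclass
from typing import List, Optional

GLOBAL_DIR = "/gfs/work"      # hypothetical global file system directory
LOCAL_DIR = "/local/scratch"  # hypothetical local disk directory


@dataclass
class UserJob:
    name: str
    io_heavy: bool         # taken from application resource usage information
    output_is_final: bool  # specified by the user when defining the workflow


@dataclass
class ExecJob:
    name: str
    depends_on: Optional[str]    # sequential dependency
    same_node_as: Optional[str]  # data location dependency
    input_dir: str
    output_dir: str


def to_execution_workflow(jobs: List[UserJob]) -> List[ExecJob]:
    """Convert a linear chain of user jobs into jobs with both dependencies."""
    result: List[ExecJob] = []
    prev_name: Optional[str] = None
    prev_output_dir = GLOBAL_DIR  # the first job reads from the global file system
    for i, job in enumerate(jobs):
        successor = jobs[i + 1] if i + 1 < len(jobs) else None
        # Assumed rule: keep the output local only between two I/O-heavy jobs,
        # and only when the user does not need it as a final result.
        keep_local = (successor is not None and job.io_heavy
                      and successor.io_heavy and not job.output_is_final)
        # A job whose input sits on a local disk must run on the same node.
        colocate = prev_name is not None and prev_output_dir == LOCAL_DIR
        result.append(ExecJob(
            name=job.name,
            depends_on=prev_name,
            same_node_as=prev_name if colocate else None,
            input_dir=prev_output_dir,
            output_dir=LOCAL_DIR if keep_local else GLOBAL_DIR,
        ))
        prev_name = job.name
        prev_output_dir = result[-1].output_dir
    return result
```

For the three-job example of FIG. 4, this yields job 2 pinned to the node of job 1 and job 3 pinned to the node of job 2, so all three run on one node, with only the first input and the final output touching the global file system.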

FIG. 5 is a flowchart showing how the job scheduler unit schedules jobs with relatively heavy input/output according to the present invention.

Referring to FIG. 5, a job on the job queue is selected according to a predefined priority control criterion (step 502).

Next, step 504 checks whether the selected job's global file system input/output bandwidth requirement satisfies the condition that the global file system can provide. That is, it checks whether both of the following are satisfied: a first condition, that the current utilization of the global file system is smaller than the maximum stable utilization of the global file system; and a second condition, that the sum of the current input/output usage of the global file system and the selected job's global file system input/output requirement is smaller than the maximum measured input/output bandwidth of the global file system.

If the check in step 504 shows that either or both of the first and second conditions are not satisfied, processing returns to step 502 and another job is selected. The stable utilization of the global file system can be calibrated through operation of the overall system.

If the check in step 504 determines that both the first and second conditions are satisfied, that is, if the global file system can still be operated stably after the bandwidth required by the selected job is granted, the job scheduler unit selects, from among the calculation nodes, a calculation node that satisfies the number of CPU cores, the memory, and the local disk input/output bandwidth required by the selected job (step 506). The selected job may not use a local disk at all; jobs that use the local disk to store intermediate files, as presented in FIG. 4, may require local disk bandwidth. If the calculation nodes have no local disks, the process of checking the local disk input/output bandwidth requirement may be omitted.

Thereafter, the job scheduler unit schedules the selected job to run on the selected node (step 508).
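
Steps 502 to 508 can be sketched as a single scheduling pass. This is a minimal sketch under stated assumptions, not the patented implementation: the class and field names are hypothetical, a larger `priority` value is assumed to mean higher priority, and a job for which no node currently fits is simply skipped, a case the flowchart description does not address.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    name: str
    priority: int               # assumption: larger value means higher priority
    gfs_io_mbps: float          # global file system I/O bandwidth requirement
    cpu_cores: int
    memory_gb: float
    local_io_mbps: float = 0.0  # 0.0 when the job does not use a local disk


@dataclass
class Node:
    name: str
    free_cores: int
    free_memory_gb: float
    free_local_io_mbps: Optional[float] = None  # None: node has no local disk


@dataclass
class GlobalFs:
    utilization: float
    io_usage_mbps: float
    max_stable_utilization: float
    max_measured_bandwidth_mbps: float


def node_fits(job: Job, node: Node) -> bool:
    """Check CPU cores, memory, and (if the node has one) local disk bandwidth."""
    if node.free_cores < job.cpu_cores or node.free_memory_gb < job.memory_gb:
        return False
    if node.free_local_io_mbps is None:
        return True  # the local disk check is omitted for nodes without a local disk
    return node.free_local_io_mbps >= job.local_io_mbps


def schedule_one(queue: List[Job], nodes: List[Node],
                 fs: GlobalFs) -> Optional[Tuple[Job, Node]]:
    """Pick one (job, node) pair, removing the job from the queue, or return None."""
    for job in sorted(queue, key=lambda j: j.priority, reverse=True):  # step 502
        fs_ok = (fs.utilization < fs.max_stable_utilization
                 and fs.io_usage_mbps + job.gfs_io_mbps
                 < fs.max_measured_bandwidth_mbps)
        if not fs_ok:  # step 504 failed: go back and try another job
            continue
        node = next((n for n in nodes if node_fits(job, n)), None)  # step 506
        if node is None:
            continue  # assumption: skip jobs that no node can currently host
        queue.remove(job)
        return job, node  # step 508: the caller dispatches the job to this node
    return None
```

The sketch leaves bookkeeping, such as updating the free resources of the chosen node and the current global file system usage after dispatch, to the caller.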

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention pertains will readily see that various substitutions, modifications, and changes are possible without departing from the essential characteristics of the present invention. That is, the embodiments disclosed in the present invention are intended to explain, not to limit, the technical idea of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments.

Therefore, the scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

302: Workflow user interface
304: Workflow management unit
306: Workflow engine unit
308: Resource management unit
310: Job scheduler unit
312: Calculation resource job management unit
314: Calculation resource monitoring unit
316: Global file system resource monitoring unit
318: Application resource usage management unit

Claims

A workflow job scheduling apparatus comprising:
a workflow user interface unit for providing an interface for a user workflow;
a workflow engine unit for converting the user workflow into an execution workflow using resource usage information of individual applications, executing the execution workflow through calculation resources, and generating a scheduling instruction according to the execution workflow;
a resource management unit for collecting and managing real-time resource load information on the entire resources of a computing system;
a job scheduler unit for scheduling, according to the generated scheduling instruction and based on the real-time resource load information, a group of jobs having a data location dependency to be executed sequentially on the same calculation node;
a calculation resource job management unit located at each individual calculation node and executing a job on the assigned node when execution of the job is requested from the job scheduler unit;
a calculation resource monitoring unit for monitoring, while jobs are being executed on individual calculation resources, load information of the calculation resources on which the jobs are executed at individual nodes, and providing the load information to the resource management unit;
a global file system resource monitoring unit for measuring the total input/output bandwidth and the utilization of a global file system and providing them to the resource management unit; and
an application resource usage management unit for generating the resource usage information of the individual applications based on the load information from the calculation resource monitoring unit and the measurement information from the global file system resource monitoring unit, and providing the resource usage information to the workflow engine unit.

The workflow job scheduling apparatus according to claim 1,
wherein the interface comprises a GUI interface or a web interface.

The workflow job scheduling apparatus according to claim 1,
further comprising a workflow management unit for storing, changing, deleting, and displaying the user workflow.

The workflow job scheduling apparatus according to claim 3,
wherein the user workflow is an abstract-level workflow that is not executed on calculation resources.

The workflow job scheduling apparatus according to claim 1,
wherein the workflow engine unit, when converting to the execution workflow, converts the order of the jobs of the user workflow into a data location dependency parameter required at job submission, and converts the application resource usage information into a resource usage requirement parameter required at job submission.

The workflow job scheduling apparatus according to claim 5,
wherein the workflow engine unit generates job execution request information including the data location dependency parameter and the resource usage requirement parameter and provides the job execution request information to the job scheduler unit.

The workflow job scheduling apparatus according to claim 1,
wherein the workflow engine unit provides functions for storing, deleting, and querying the execution results of the calculation resources.

The workflow job scheduling apparatus according to claim 1,
wherein the real-time resource load information includes configuration information of the entire resources of the computing system, resource allocation and load information of individual nodes, the load of the global file system, and the input/output bandwidth usage.

The workflow job scheduling apparatus according to claim 8,
wherein the entire resources include all calculation nodes, global file system nodes, management network switches, calculation network switches, and the network structure.

The workflow job scheduling apparatus according to claim 1,
wherein the job scheduler unit assigns the jobs waiting in the current job queue to the calculation resources and executes them according to priority and the availability of resources.

The workflow job scheduling apparatus according to claim 10,
wherein the availability of resources includes the available amount of real-time input/output bandwidth.

The workflow job scheduling apparatus according to claim 1,
wherein the load information of the calculation resources includes at least one of the CPU utilization, memory utilization, disk input/output bandwidth usage, disk utilization, and network input/output usage and utilization of the individual node.

The workflow job scheduling apparatus according to claim 12,
wherein the calculation resource monitoring unit monitors the load information and provides it to the resource management unit or the application resource usage management unit periodically or after completion of the job.

The workflow job scheduling apparatus according to claim 1,
wherein the resource usage information of the individual applications is automatically collected and generated when the user workflow is executed.

The workflow job scheduling apparatus according to claim 1,
wherein the resource usage information of the individual applications is generated by manual input through monitoring by a job administrator.

A workflow job scheduling method comprising:
specifying, when a user workflow is defined, whether the output file of each job is required as a final result;
collecting resource usage information of the individual applications constituting the user workflow;
specifying a data location dependency parameter such that, when the application characteristics of consecutive jobs indicate relatively heavy input/output, a subsequent job is executed on the node on which its preceding job was executed; and
scheduling a group of jobs having a data location dependency to be executed sequentially on the same calculation node, based on real-time resource load information collected for the entire resources of a computing system.

The workflow job scheduling method according to claim 16,
further comprising specifying input/output directory locations so that the preceding job stores an intermediate result file on the local disk of the calculation node on which it is executed, and the subsequent job reads the intermediate result file stored by the preceding job from the local disk of the node on which it is executed.

The workflow job scheduling method according to claim 16,
wherein the scheduling comprises:
selecting a job on a job queue;
checking whether the global file system input/output bandwidth requirement of the selected job satisfies a condition that the global file system can provide;
selecting, when the condition is satisfied, a calculation node that satisfies the resource usage required by the selected job; and
executing the selected job on the selected node.

The workflow job scheduling method according to claim 18,
wherein the job is selected according to a predefined priority control criterion.

The workflow job scheduling method according to claim 18,
wherein the condition is that the current utilization of the global file system is smaller than the maximum stable utilization of the global file system, and that the sum of the current input/output usage of the global file system and the global file system input/output requirement of the selected job is smaller than the maximum measured input/output bandwidth of the global file system.