KR101694307B1

KR101694307B1 - Apparatus and method for maximizing disk cache effect for workflow job scheduling

Info

Publication number: KR101694307B1
Application number: KR1020120020792A
Authority: KR
Inventors: 안신영; 차규일; 김영호; 임은지; 김진미; 배승조
Original assignee: 한국전자통신연구원
Priority date: 2012-02-29
Filing date: 2012-02-29
Publication date: 2017-01-09
Also published as: CN103294535A; KR20130099351A

Abstract

The present invention relates to a resource management and job scheduling method that automatically executes large-scale, data-parallel distributed jobs organized as a pipeline on a high-performance computing system (or supercomputer) connected by a high-performance local area network, and obtains their results. The method loads a resource usage profile; selects one of the compute nodes; determines the number of jobs that can run concurrently on the selected node and the available disk cache size usable on that node without causing disk cache to be freed; determines the number and size of input files so that the total size of the files disk-cached by the concurrent jobs is smaller than the available disk cache size; and groups or splits jobs based on the determined number and size of input files to decide the compute node on which each job will run. Jobs can thereby be split so as to maximize the disk cache effect.

Description

APPARATUS AND METHOD FOR MAXIMIZING DISK CACHE EFFECT FOR WORKFLOW JOB SCHEDULING

The present invention relates to a resource management and job scheduling method that automatically executes large-scale, data-parallel distributed jobs organized as a pipeline on a high-performance computing system (or supercomputer) connected by a high-performance local area network, and obtains their results.

In computing environments of many kinds, such as conventional supercomputers and high-performance clusters, workflow management systems, resource management systems, and job schedulers have been used to batch-execute, on a person's behalf, scientific computations that process large-scale data, or complex jobs with dependencies between multiple stages.

A workflow management system is a software system that lets a user compose, typically through a user-friendly user interface, a workflow in which a series of related jobs are linked together, executes the composed workflow across a variety of computing resources such as high-performance computers, grids, and web services, and reports the results. Conventional workflow management systems include Taverna, Galaxy, and Kepler.

A resource management system is a software system that manages the computing resources of a high-performance computer or cluster and handles batch execution of jobs. Examples include the PBS (Portable Batch System) family, namely OpenPBS, TORQUE, and PBS Pro, as well as SLURM and Oracle Grid Engine. These systems generally use FCFS (First-Come First-Served) job scheduling.

A job scheduler is used mainly in conjunction with a resource management system. It is a software system that compares the priorities and requested resource amounts of the jobs in the job queue with the types and quantities of available resources, and executes the jobs while dynamically reordering them. Conventional examples include Maui, ALPS, LSF, and Moab.

Regarding workflows, the prior art includes a technique that obtains current resource information and distributes jobs accordingly (Korean Patent Publication No. 2010-0133418), which considers job distribution in terms of overall resources. However, no method had been devised for making efficient use of the disk cache present on a device, so the extent to which device performance could be exploited was limited.

For example, most scientific applications, including genome sequence analysis, obtain the desired result by combining previously developed application programs. The applications (jobs), which have temporal ordering dependencies and data dependencies, are therefore arranged into an ordered flow based on those dependencies and executed as a workflow (or pipeline). Such workflows vary widely in size, from simple ones made up of one or two applications to ones that bundle tens to hundreds of applications.
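As an illustration of arranging jobs into an ordered flow from their dependencies, the following minimal sketch uses a topological sort. The job names and the dependency table are hypothetical and are not taken from the specification.

```python
from graphlib import TopologicalSorter

# Each key is a job; the value is the set of jobs whose output it needs.
# The job names are illustrative only.
dependencies = {
    "align": {"index"},
    "convert": {"align"},
    "sort": {"convert"},
    "merge": {"sort"},
}


def execution_order(deps):
    """Return one execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Any order returned places a job after all the jobs it depends on, which is the property a workflow engine needs before it can dispatch the stages.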

Therefore, to map such a workflow onto suitable computing resources and obtain results efficiently, accurate information is needed about the computing resources required by the jobs that make up the workflow. However, information on the resource usage of the application that actually runs a job (how many CPUs, how much memory, how much disk, and how much network bandwidth it needs) is very hard to obtain for anyone but the application's developer, and analysis tools that derive resource usage profiles from source code remain underdeveloped.

Consequently, when existing workflow management systems, resource management systems, and job schedulers are used, efficient matching between workflow jobs and computing resources is very difficult. Typical users of genome analysis applications do not know how much computing resource their sequence analysis jobs require, so they request more than enough resources to run the workflow, which wastes high-performance computing resources.

The present invention has been proposed to solve the problems described above. Its object is to provide a workflow scheduling method and apparatus that, when executing a workflow made up of multi-stage jobs linked to one another through file input and output, combine the functions provided by a workflow management system, a resource management system, a job scheduler, and the like, and raise the utilization of physical computing resources by maximizing the disk cache effect.

That is, shortening the execution time and reducing the cost of a workflow made up of multi-stage jobs that must process large-scale data in a high-performance computer (or cluster) environment requires effective use and higher utilization of computing resources. A further object is therefore to provide a workflow scheduling method and apparatus that improve workflow performance by using resource information more detailed than the static resource information and job execution results provided by the prior art, actively monitoring computing resource usage during job execution, and updating a resource usage profile with the workflow's measured resource usage so that it is consulted the next time the workflow runs.

To achieve this object, a workflow scheduling apparatus comprises: a resource profiling management unit that stores a resource usage profile of a workflow, the profile including usage information on the amount of resources actually used on the compute nodes; a unit compute resource monitoring unit that, while given jobs are running on a unit compute resource, measures the usage information of the computing resources actually used by those jobs and reports the measurements to the resource profiling management unit so that the resource usage profile is updated; and a job partitioning unit that loads the resource usage profile from the resource profiling management unit, selects one of the compute nodes, determines the number of jobs that can run concurrently on the selected node and the available disk cache size usable on that node without causing disk cache to be freed, determines the number and size of input files so that the total size of the files disk-cached by the concurrent jobs is smaller than the available disk cache size, and groups or splits jobs based on the determined number and size of input files to decide the compute node on which each job will run.

A workflow scheduling method for achieving the object comprises: loading a resource usage profile from the resource profiling management unit; selecting one of the compute nodes; determining the number of jobs that can run concurrently on the selected node and the available disk cache size usable on that node without causing disk cache to be freed; determining the number and size of input files so that the total size of the files disk-cached by the concurrent jobs is smaller than the available disk cache size; and grouping or splitting jobs based on the determined number and size of input files to decide the compute node on which each job will run.
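The determining and grouping steps above could be sketched as follows. This is only an illustrative sketch under stated assumptions: the `cache_factor` parameter (bytes of disk-cached files per byte of input, assumed here to be derivable from the resource usage profile), the function names, and the greedy packing policy are hypothetical and are not specified in this description.

```python
def per_job_input_limit(available_cache_bytes, concurrent_jobs, cache_factor):
    """Exclusive upper bound on the input bytes one job may be given.

    Assumption: each job caches about cache_factor bytes (input,
    intermediate and output files) per byte of input it reads.
    """
    return available_cache_bytes / (concurrent_jobs * cache_factor)


def group_inputs(file_sizes, limit):
    """Pack input files into groups whose total size stays below limit.

    Files that alone reach the limit are returned separately so that
    they can be split before scheduling.
    """
    groups, to_split = [], []
    current, current_size = [], 0
    for name, size in file_sizes.items():
        if size >= limit:
            to_split.append(name)  # cannot fit in the cache on its own
            continue
        if current_size + size >= limit:
            groups.append(current)  # close the group before it overflows
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups, to_split


limit = per_job_input_limit(available_cache_bytes=1200, concurrent_jobs=4, cache_factor=3)
groups, to_split = group_inputs({"a": 40, "b": 50, "c": 30, "d": 120, "e": 60}, limit)
```

Because every group total is strictly below `limit`, the combined cached size of `concurrent_jobs` jobs stays below the available disk cache size under the stated assumption.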

When a workflow organized as a pipeline of multi-stage jobs runs on a high-performance computer or cluster, information gathered while jobs execute on the computing resources is fed back and used for subsequent job scheduling, which maximizes the utilization of the computing resources on which the jobs run.

In addition, splitting and scheduling jobs so as to maximize the disk cache effect shortens workflow execution time and yields results sooner, and even ordinary users without deep knowledge of the computing resources or of how to use the system can easily optimize the workflows they create.

FIG. 1 shows the basic progression of a workflow.
FIG. 2 shows the sequence of steps in a genome sequence analysis workflow.
FIG. 3 shows the configuration of a workflow scheduling apparatus according to an embodiment of the present invention.
FIG. 4 is a flowchart of a workflow scheduling method according to an embodiment of the present invention.
FIG. 5 shows the job partitioning step of FIG. 4 in detail.
FIG. 6 is a flowchart of a method for calculating the number of jobs that can run concurrently on the compute node of FIG. 5.

Various embodiments of the present invention are described in more detail below with reference to the accompanying drawings. The suffixes "unit" (부), "device" (기), and "apparatus" (장치) attached to components in the following description are given merely for ease of drafting this specification; they may be used interchangeably, and the components may be implemented in hardware or software.

Furthermore, embodiments of the present invention are described in detail below with reference to the accompanying drawings and the content set out in them, but the present invention is not restricted or limited by these embodiments.

The present invention proposes a job scheduling method for workflows organized as a pipeline, in which the output file of each stage becomes the input of the next stage. The proposed method monitors the amount of computing resources actually used while the workflow runs and feeds the results into every subsequent execution of the workflow.

FIG. 1 shows the basic progression of a workflow.

According to an embodiment, in a workflow the initial input file (101) data is stored, and the result file (104) data is produced by way of intermediate file 1 (102) and intermediate file 2 (103). That is, a workflow (or pipeline) proceeds by taking the file saved as the result of one job as the input to the next job.

The input file (101) data of the workflow is stored in file form. Splitting or merging the input data does not affect the overall workflow result, so the input data may simply be split into several files, or merged, to improve computing performance.

FIG. 2 shows the sequence of steps in a genome sequence analysis workflow.

A genome sequence analysis workflow, or pipeline, is one example of a workflow whose stages are linked by input and output files.

This genome sequence analysis pipeline reads input files containing genome fragments, compares them against a reference genome, and completes the full genome sequence by placing the fragments in order (the re-sequencing method). In particular, FIG. 2 shows one example of a genome sequence analysis workflow that uses the sequence analysis tools bwa and samtools.

According to an embodiment, the first job of the workflow, reference indexing, builds an index for fast searching of a reference genome whose sequence analysis is already complete (S201). This job is a preliminary step and is not repeated afterward.

Next, the read mapping job (S202) loads a sequence reads file (FASTQ file format) in which the genome fragments produced by a sequencer such as an Illumina sequencer (single-ended reads are assumed in this figure) are stored as text, searches for the part of the reference genome that each fragment resembles, and outputs a file with the extension sai.

Next, the SAM conversion job (S203) reads the file saved with the sai extension, the reference index file, and the original sequence reads file, and outputs a result file in SAM (Sequence Alignment/Map) format.

Next, the BAM conversion job (S204) converts the SAM file into its binary version.

Next, the BAM sorting job (S205) sorts the BAM-format file to speed up later jobs.

Jobs S202 through S205 are performed separately for each input file.

Next, the BAM merging job (S206) combines all the results of the BAM sorting jobs of S205 and saves them as a single file. The SNP calling job (S207) then refers to the reads and piles them up at their respective positions on the genome.

The genome sequence analysis pipeline described in the steps above is one example among many such pipelines. Its input data amounts to hundreds of gigabytes (GB), and the intermediate and final result files together amount to several terabytes (TB). For convenience, the reads are therefore stored across tens to hundreds (or thousands) of files, and these files are designated as the input files of the sequence analysis pipeline. FASTQ is widely used as the de facto standard data format for these reads, and FASTQ files can be simply split and merged. Workflow scheduling is described below for the genome workflow embodiment shown in FIG. 2.
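The simple splitting and merging of FASTQ data mentioned above might look like the following sketch. It assumes the common layout in which every read occupies exactly four lines (header, sequence, separator, quality); the function names are hypothetical.

```python
def split_fastq(lines, reads_per_chunk):
    """Split FASTQ lines into chunks that each hold whole 4-line records."""
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    step = 4 * reads_per_chunk
    return [lines[i:i + step] for i in range(0, len(lines), step)]


def merge_fastq(chunks):
    """Concatenate chunks back into a single list of FASTQ lines."""
    return [line for chunk in chunks for line in chunk]


# Three toy reads, four lines each.
reads = []
for n in range(3):
    reads += [f"@read{n}", "ACGT", "+", "IIII"]

chunks = split_fastq(reads, reads_per_chunk=2)
restored = merge_fastq(chunks)
```

Splitting on record boundaries and concatenating the chunks returns the original data, which is why the number and size of input files can be chosen freely by a scheduler.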

With regard to the disk cache effect in particular, the following files can be considered when running the genome workflow: the reference index input file, the sequence reads input file (e.g., 1.fastq), the output file of the read mapping job (e.g., 1.sai), the output file of the SAM conversion job (e.g., 1.sam), the output file of the BAM conversion job (e.g., 1.bam), and the output file of the BAM sorting job (e.g., 1.sorted.bam).

Thus, as shown in the figure, running the pipeline from the read mapping job (S202) through the BAM sorting job reads or writes six files on disk.

That is, after the read mapping job (S202), the reference index file, the sequence reads input file, and the output file (sai) of the read mapping job are present in the disk cache.

The SAM conversion job (S203) then reads the result file of the read mapping job (S202) from disk. Because the input and result files of the read mapping job are already in the disk cache, they are read from memory without accessing the disk, and the job runs quickly. In this way, a file written to disk in the previous stage is present in the disk cache, so work efficiency improves if the previous stage's files are kept in the disk cache for the next stage.
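The cache behaviour described above can be traced with a small model. The sketch below assumes a cache large enough that nothing is evicted, and uses `ref.index` as a hypothetical name for the reference index file; the other file names follow the examples given in the description.

```python
# (stage, files read, files written) for one input file, S202 to S205.
STAGES = [
    ("read mapping (S202)", ["ref.index", "1.fastq"], ["1.sai"]),
    ("SAM conversion (S203)", ["1.sai", "ref.index", "1.fastq"], ["1.sam"]),
    ("BAM conversion (S204)", ["1.sam"], ["1.bam"]),
    ("BAM sorting (S205)", ["1.bam"], ["1.sorted.bam"]),
]


def trace_cache(stages):
    """Report, per stage, which inputs are already cached and which are not."""
    cached = set()
    report = []
    for name, inputs, outputs in stages:
        hits = [f for f in inputs if f in cached]
        misses = [f for f in inputs if f not in cached]
        cached.update(inputs, outputs)  # reads and writes both populate the cache
        report.append((name, hits, misses))
    return report, cached


report, cached = trace_cache(STAGES)
for name, hits, misses in report:
    print(name, "hits:", hits, "misses:", misses)
```

In this model only the first stage has to go to disk for its inputs, and the set of cached files ends up holding the six files listed earlier.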

The present invention therefore discloses a way to manage the disk cache efficiently, as in the case above, and thereby improve work efficiency.

FIG. 3 shows the configuration of a workflow scheduling apparatus according to an embodiment of the present invention.

The workflow scheduling apparatus receives feedback on the resource usage of jobs while a workflow runs and can use this resource usage information for scheduling the next time the workflow is executed.

The workflow scheduling apparatus includes a workflow authoring unit, a workflow engine unit, a global resource management unit, a global job scheduler unit, and a unit compute resource job management unit, which together provide the basic core functions of existing workflow management systems, resource management systems, and job schedulers. As components implementing the means newly proposed by the present invention, it further includes a resource profiling management unit, a job partitioning unit, and a unit compute resource monitoring unit.

According to an embodiment, a user-friendly interface is needed so that users unfamiliar with the operating system of the computing resources can still use them easily. To this end, each component of the workflow scheduling apparatus operates as follows.

The workflow authoring unit (301) provides an interface through which a user can easily define and run the required workflow using a GUI (Graphical User Interface).

The workflow engine unit (304) runs the authored workflow on various computing resources and presents the results to the user in GUI form.

The global resource management unit (305) resides on the service (or login) node and manages the configuration information (the connection architecture of all compute nodes) of the entire computing resources of the high-performance (super) computer that provides computing services to users, together with the resource status and allocation state of each individual node.

The global job scheduler unit (306) resides on the service node of the high-performance computer, and allocates jobs waiting for resources in the current job queue to computing resources, and runs them, according to their priority and the availability of resources.

The unit compute resource job management unit (308) resides on each unit compute node, runs the detailed computing jobs dispatched by the global job scheduler unit (306), and reports the results.

The five components above correspond to functions already provided by the prior art.

According to an embodiment, the present invention can additionally perform functions not provided by existing technology: automatically updating the resource profile of a workflow, splitting workflow jobs based on the information in this resource profile, and monitoring resource usage while jobs run on a unit compute resource. The components that perform these additional functions operate as follows.

The unit compute resource monitoring unit (307) monitors, while jobs are running on a unit compute resource, the usage information of the computing resources the jobs actually use, such as CPU utilization, memory utilization, disk cache usage, disk utilization, and network usage and utilization, and reports it to the resource profiling management unit (303) periodically or after the job ends.
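As one possible way to sample disk cache usage on a compute node, the sketch below parses text in the format of Linux's `/proc/meminfo` and reads its `Cached` field. The description does not name an operating system, so Linux is an assumption here, and the function names and sample values are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; the unit suffix is dropped.
    """
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def disk_cache_kb(meminfo):
    """Return the page cache size in kB, or 0 if the field is absent."""
    return meminfo.get("Cached", 0)


# Sample text with made-up numbers; on a Linux node the text could
# instead be read from /proc/meminfo.
SAMPLE = (
    "MemTotal:       16384000 kB\n"
    "MemFree:         2048000 kB\n"
    "Cached:          8192000 kB\n"
)

info = parse_meminfo(SAMPLE)
print(disk_cache_kb(info))
```

A monitoring unit could take such a sample periodically and forward the values to the profile manager.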

The resource profiling management unit (303) updates the resource usage profile of the workflow. The initial values of the profile can be entered by the user. According to an embodiment, the user may enter initial values slightly higher than the expected figures in order to prevent errors.

When the user runs the workflow, the actual resource usage measured while jobs execute on the compute nodes is automatically reported to the resource profiling management unit (303), which updates the existing resource usage profile based on the amounts actually used. The profile is thus updated continuously as the workflow is run, so that jobs of the same type can be handled optimally.

The workflow resource usage profile managed by the resource profiling management unit (303) may include information managed per unit job of the workflow, the usage of each computing resource when multiple instances of a unit job run concurrently, and the usage of each computing resource when different unit jobs run together.

The resource usage metrics monitored by the unit compute resource monitoring unit (307) during each job, and the performance metrics managed by the resource profiling management unit (303), may include CPU utilization (peak, avg), memory usage (peak, avg), disk cache usage in memory, disk I/O rate (peak, avg), disk utilization (peak, avg) per node, network usage (peak, avg), and the I/O wait time and ratio of the job.
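A profile holding the peak and average of each metric could be structured as in the following sketch. The description does not state how the profile is updated, so the running-peak and running-mean rule used here is an assumption, and the class and metric names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical keys mirroring the metrics listed above.
METRIC_NAMES = (
    "cpu_util", "memory_used", "disk_cache_used",
    "disk_io_rate", "disk_util", "network_used", "io_wait",
)


@dataclass
class Metric:
    peak: float = 0.0
    avg: float = 0.0
    samples: int = 0

    def update(self, value):
        """Fold one measurement into the running peak and running mean."""
        self.peak = max(self.peak, value)
        self.samples += 1
        self.avg += (value - self.avg) / self.samples


@dataclass
class JobProfile:
    metrics: dict = field(
        default_factory=lambda: {name: Metric() for name in METRIC_NAMES}
    )

    def record(self, sample):
        """Apply one report from the monitoring unit (assumed update rule)."""
        for name, value in sample.items():
            self.metrics[name].update(value)


profile = JobProfile()
profile.record({"cpu_util": 50.0, "disk_cache_used": 3.0})
profile.record({"cpu_util": 100.0, "disk_cache_used": 5.0})
```

Keeping a running mean avoids storing every sample, which suits a profile that is updated on each run of the workflow.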

The job partitioning unit (302) consults the resource usage profile and divides the jobs that process the workflow's input data among the compute nodes. In the prior art this division was the role of the global job scheduler unit (306). In the present invention, the job partitioning unit (302) obtains information on the computing resources from the global resource management unit (305), splits the jobs based on this detailed resource information, and passes the jobs to be run to the global job scheduler unit (306) by way of the workflow engine unit (304).

Through this configuration, the proposed invention authors a workflow, splits the workflow jobs using the detailed resource information obtained so as to maximize the disk cache effect, and updates the profile after the jobs run so that it can be used for the next job split.

FIG. 4 is a flowchart of a workflow scheduling method according to an embodiment of the present invention.

According to an embodiment, the workflow scheduling apparatus first authors a workflow (or opens a previously authored one), specifies the location and list of the input files, and runs the workflow (S401). Next, the job partitioning unit of the apparatus checks the list, sizes, and other characteristics of the input files, obtains detailed information on the available computing resources, such as the number of CPUs, memory size, disk I/O (input/output) speed, and network bandwidth (S402), and splits the workflow jobs across the available computing resources (S403). The job partitioning algorithm is described in detail below with reference to FIG. 5.

Next, the jobs tailored to each available compute node are executed and checked for normal execution (S404). When the jobs running on each compute node finish, the workflow resource usage profile is updated with the resource usage information monitored during execution (S405).
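The S401 to S405 sequence can be read as a single control loop. The following is a minimal sketch of that loop, not the patented implementation: `run_workflow`, the `split` and `execute` callables, and the dictionary-based profile are all hypothetical names introduced here for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases, introduced only for this sketch.
Assignment = Tuple[str, List[str]]          # (node name, input files for that node)
Profile = Dict[str, float]                  # resource usage profile as a plain dict


def run_workflow(
    input_files: List[str],
    nodes: List[str],
    split: Callable[[List[str], List[str], Profile], List[Assignment]],
    execute: Callable[[str, List[str]], Tuple[bool, Profile]],
    profile: Profile,
) -> List[Tuple[str, bool]]:
    """One pass through S402-S405 for a workflow that was already created (S401)."""
    # S402-S403: inspect inputs and resources, then divide the jobs per node.
    assignments = split(input_files, nodes, profile)
    status = []
    for node, group in assignments:
        # S404: run the node-specific jobs and record whether they ran normally.
        ok, measured = execute(node, group)
        status.append((node, ok))
        # S405: fold the monitored usage back into the resource usage profile.
        profile.update(measured)
    return status
```

Because the profile is updated after every run, the next call to `split` sees the most recent measurements, which is the feedback the description relies on.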

Accordingly, through the above steps, the proposed invention creates a workflow, divides the workflow jobs so as to maximize the disk cache effect using the acquired detailed resource information, and updates the profile after the jobs run so that it can be used for the next job division.

FIG. 5 shows the job division step of FIG. 4 in detail.

According to an embodiment, the workflow scheduling apparatus matches the resource profile information of the jobs that make up the workflow against the unit computing resources, and performs job division scheduling by grouping the input data so that each unit computing resource can execute its unit jobs optimally.

Accordingly, once the detailed resource information of the available compute nodes has been obtained, the workflow scheduling apparatus first selects one compute node from the list of available compute nodes (S501).

Hereinafter, the index of the selected compute node is denoted 'i'.

The compute node may be selected in a round-robin manner.
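Round-robin selection in S501 can be sketched with Python's standard `itertools.cycle`. The node names and the `round_robin` helper are hypothetical and serve only as an illustration.

```python
from itertools import cycle


def round_robin(nodes):
    """Return an endless iterator that visits the nodes in order, then wraps around."""
    return cycle(nodes)


# Hypothetical node list: each call to next() plays the role of step S501.
picker = round_robin(["node0", "node1", "node2"])
first_four = [next(picker) for _ in range(4)]
```

With three nodes, the fourth selection wraps back to the first node, so every node receives job groups at the same rate while unassigned input remains.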

Next, the number of jobs that can run concurrently on the selected compute node is calculated (S502).

According to an embodiment, because no measured resource usage of actual jobs is available at first, the initial value is set to a user-specified value (the default is '1').

Once the workflow has been executed and each job's CPU utilization, disk utilization or disk I/O speed, memory utilization or memory usage, and network utilization or usage have been measured, the number of jobs that can run concurrently is calculated from these measurements. The order of this calculation is described in detail below with reference to FIG. 6.

Next, for the selected compute node, the disk cache size that can be used without triggering disk cache free (the release of the oldest disk cache in memory once physical memory usage reaches a certain ratio) is calculated using Equation 1 below (S503). This value is the usable disk cache size of node i, hereinafter 'uDCS_i'.

In Equation 1, PhyMem_i denotes the physical memory size of compute node i, and DCFSU_i denotes the memory utilization at which disk cache free begins on compute node i.

According to an embodiment, disk cache free policies differ between system implementations, but on most Linux systems in recent use, once 80% of memory is in use the system judges memory to be scarce, and disk cache free begins when applications request new memory.

sysMem_i denotes the amount of memory used by the system immediately after compute node i boots. According to an embodiment, this memory usage is recorded by the unit computing resource monitoring unit of the workflow scheduling apparatus immediately after system boot.

jobsMem denotes the amount of physical memory used by the jobs. According to an embodiment, its initial value is entered by the user, and during job execution it is replaced by the value observed by the unit computing resource monitoring unit of the workflow scheduling apparatus. When jobs run concurrently, it can be obtained by multiplying the memory usage of a unit job by the number of concurrent jobs.
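Equation 1 itself is not reproduced in this text. From the definitions of PhyMem_i, DCFSU_i, sysMem_i, and jobsMem, it presumably has the form uDCS_i = PhyMem_i × DCFSU_i − sysMem_i − jobsMem. The sketch below computes that reconstructed form; the function name, the parameter names, and the clamp to zero are assumptions made for illustration.

```python
def usable_disk_cache_size(phy_mem, dcfsu, sys_mem, mem_per_job, concurrent_jobs=1):
    """Reconstructed Equation 1 (an assumption): uDCS_i = PhyMem_i * DCFSU_i - sysMem_i - jobsMem.

    phy_mem         -- physical memory of node i (PhyMem_i), in any consistent unit
    dcfsu           -- memory utilization at which disk cache free starts (DCFSU_i), e.g. 0.8
    sys_mem         -- memory used by the system right after boot (sysMem_i)
    mem_per_job     -- physical memory used by one unit job
    concurrent_jobs -- number of jobs run at the same time on node i
    """
    # jobsMem for concurrent execution: per-job usage times the number of jobs.
    jobs_mem = mem_per_job * concurrent_jobs
    udcs = int(phy_mem * dcfsu) - sys_mem - jobs_mem
    # Clamping at zero is an illustrative choice, not stated in the source.
    return max(0, udcs)
```

For example, with 1000 units of physical memory, a threshold of 0.8, 100 units used by the system, and four concurrent jobs of 50 units each, the call `usable_disk_cache_size(1000, 0.8, 100, 50, 4)` leaves 500 units for the disk cache.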

The disk cache effect refers to the fast response time obtained by using part of memory as a disk cache so that memory is not wasted: on repeated I/O to the same file, the data is read from memory instead of from disk.

Next, to keep disk cache free from starting, the total size of the input and output files that will be disk-cached during workflow execution must not exceed uDCS_i. When jobs run concurrently, therefore, the total size of as many input files as there are jobs, plus the intermediate files generated from them, is calculated. The number and size of input files are determined so that this total is smaller than uDCS_i, and the input is grouped or divided accordingly and sent to node i for execution (S504).

According to an embodiment, determining the total size of the files to be disk-cached requires the sizes of the intermediate result files that will be generated, but these sizes cannot be known before execution.

Therefore, the user initially supplies the ratio of total file size to input file size. As the workflow is executed repeatedly, this ratio is continuously updated in the corresponding resource usage profile and thereby optimized.
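The source does not say how the ratio is updated from one run to the next. One plausible scheme, shown here purely as an assumption, is an exponential moving average of the ratio observed after each run; `update_ratio` and its `weight` parameter are hypothetical.

```python
def update_ratio(old_ratio, input_bytes, total_cached_bytes, weight=0.5):
    """Blend the stored total-to-input size ratio with the ratio observed in the last run.

    The moving-average rule is an illustrative assumption; the source only states
    that the ratio is updated in the resource usage profile after each execution.
    """
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    observed = total_cached_bytes / input_bytes
    return (1.0 - weight) * old_ratio + weight * observed
```

A smaller `weight` makes the stored ratio react more slowly to a single unusual run, while `weight=1.0` simply replaces it with the latest observation.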

The total size of the files to be disk-cached can be calculated using Equation 2 below.

Accordingly, using Equation 2, jobs are grouped so that the expected total file size is smaller than uDCS_i. Once input files satisfying this condition have been grouped, the jobs that analyze these input files are sent to node i for execution.
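Equation 2 is not reproduced in this text. Given the surrounding description, the sketch below assumes that the expected total file size is the sum of the input file sizes multiplied by the total-to-input size ratio, and it groups files greedily in list order. Both the assumed formula and the greedy policy are illustrative; `expected_total_size` and `group_inputs` are hypothetical names.

```python
def expected_total_size(input_sizes, ratio):
    """Assumed form of Equation 2: sum of input sizes times the total-to-input ratio."""
    return sum(input_sizes) * ratio


def group_inputs(files, udcs, ratio):
    """Take files in order while the expected total stays below udcs.

    files -- list of (name, size) pairs
    Returns (group, rest): the files chosen for this node and the files left over.
    """
    group, rest = [], list(files)
    sizes = []
    while rest:
        name, size = rest[0]
        # Stop before the expected cached total would reach the usable cache size.
        if expected_total_size(sizes + [size], ratio) >= udcs:
            break
        group.append(rest.pop(0))
        sizes.append(size)
    return group, rest
```

If even the first file does not fit, the returned group is empty. In that case the caller would need to divide the file or choose a node with a larger uDCS_i, which corresponds to the "dividing" half of "grouping or dividing" in the description.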

Next, it is determined whether unassigned input data remains (S505). If input files that have not yet been through job division remain, the process returns to step S501. Once jobs have been assigned for all input data, input data division ends.

Accordingly, the job groups divided through the above steps are sent to their respective nodes through the workflow engine unit of the workflow scheduling apparatus, and the jobs for each compute node can be executed.
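Steps S501 to S505 can be combined into one loop. The sketch below is one possible reading, under these assumptions: uDCS_i has already been computed for every node (S502 and S503), the expected cached size is the input size times a single ratio, and files are taken in list order. `partition_jobs` and the dictionary keys `name` and `udcs` are hypothetical.

```python
from itertools import cycle


def partition_jobs(files, nodes, ratio):
    """Assign input files to nodes until none are left (S501-S505).

    files -- list of (name, size) pairs
    nodes -- list of dicts with hypothetical keys 'name' and 'udcs'
    ratio -- assumed total-to-input file size ratio
    """
    pending = list(files)
    assignments = []
    picker = cycle(nodes)
    empty_rounds = 0
    while pending:                                   # S505: unassigned input remains
        node = next(picker)                          # S501: round-robin selection
        budget = node["udcs"]                        # S502-S503: assumed precomputed
        group, size = [], 0
        # S504: add files while the expected cached total stays below uDCS_i.
        while pending and (size + pending[0][1]) * ratio < budget:
            name, file_size = pending.pop(0)
            group.append(name)
            size += file_size
        if group:
            assignments.append((node["name"], group))
            empty_rounds = 0
        else:
            empty_rounds += 1
            if empty_rounds == len(nodes):
                # No node can cache the next file whole; it would have to be divided.
                raise ValueError(f"file {pending[0][0]!r} fits on no node")
    return assignments
```

The explicit error for a file that fits on no node is an addition made for this sketch so that the loop always terminates.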

FIG. 6 is a flowchart illustrating a method of calculating the number of jobs that can run concurrently on a compute node in FIG. 5.

According to an embodiment, the number of concurrent jobs is first calculated for each computing resource.

Accordingly, the maximum number of concurrent jobs based on the number of CPUs and CPU utilization is calculated (S601), followed by the maximum number based on disk utilization (S602). Next, the maximum number based on memory usage is calculated (S603), followed by the maximum number based on network utilization (S604).

Next, once the number of concurrent jobs has been calculated for each computing resource through the above steps, the smallest of these values is taken as the number of concurrent jobs for node i (S605). This is because the resource that yields the smallest number of concurrent jobs is the bottleneck that determines performance.
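The source names the four per-resource limits but does not give their formulas. The sketch below assumes simple capacity-over-demand ratios for S601 to S604 and then applies the minimum rule of S605. The function name, the parameters, and the floor of one job are assumptions made for illustration.

```python
def concurrent_job_limit(cpu_count, cpu_util_per_job, disk_util_per_job,
                         free_mem, mem_per_job, net_util_per_job):
    """Return (limit, bottleneck) for one node.

    Utilizations are fractions per job: cpu_util_per_job is the share of one CPU,
    disk_util_per_job and net_util_per_job are shares of the whole device or link.
    These per-resource formulas are illustrative assumptions.
    """
    limits = {
        "cpu": int(cpu_count / cpu_util_per_job),     # S601
        "disk": int(1.0 / disk_util_per_job),         # S602
        "memory": int(free_mem // mem_per_job),       # S603
        "network": int(1.0 / net_util_per_job),       # S604
    }
    # S605: the smallest per-resource limit identifies the bottleneck resource.
    bottleneck = min(limits, key=limits.get)
    # At least one job is always allowed, mirroring the default value of 1.
    return max(1, limits[bottleneck]), bottleneck
```

For a node with 8 CPUs where each job uses one full CPU, a quarter of the disk bandwidth, 4 of 32 free memory units, and an eighth of the network link, the disk is the bottleneck and four jobs can run concurrently.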

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Those of ordinary skill in the art to which the invention pertains may make various modifications without departing from the gist of the invention as claimed, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

301: Workflow creation unit
302: Job division unit
303: Resource profiling management unit
304: Workflow engine unit
305: General resource management unit
306: General job scheduler unit
307: Unit computing resource monitoring unit
308: Unit computing resource job management unit

Claims

A resource profiling management unit that stores a resource usage profile of a workflow, the profile including usage information on the amount of resources actually used on the compute nodes;
A unit computing resource monitoring unit that, while a given job is running on a unit computing resource, measures usage information on the computing resources actually used by the job and reports the measurement result to the resource profiling management unit so that the resource usage profile is updated; and
A job division unit that loads the resource usage profile from the resource profiling management unit, selects one of the compute nodes, determines the number of jobs that can run concurrently on the selected compute node and the available disk cache size that can be used on the selected compute node without triggering disk cache free, determines the number and size of input files such that the total size of the files disk-cached by the concurrent jobs is smaller than the available disk cache size, and groups or divides the jobs based on the determined number and size of input files to determine the compute node on which the jobs are to be performed.

The apparatus according to claim 1,
wherein the unit computing resource monitoring unit
reports the measurement result to the resource profiling management unit periodically or after the job completes.

The apparatus according to claim 1,
wherein the resource profiling management unit
manages at least one of information for each unit job constituting the workflow and usage information of each computing resource when unit jobs are executed concurrently, and stores the resource usage profile of the workflow.

A method for scheduling a workflow in a workflow scheduling apparatus, comprising:
Loading a resource usage profile from a resource profiling management unit;
Selecting, in a job division unit, one of the compute nodes;
Determining, in the job division unit, the number of jobs that can run concurrently on the selected compute node and the available disk cache size that can be used on the selected compute node without triggering disk cache free;
Determining, in the job division unit, the number and size of input files such that the total size of the files disk-cached by the concurrent jobs is smaller than the available disk cache size; and
Grouping or dividing jobs, in the job division unit, based on the determined number and size of input files to determine the compute node on which the jobs are to be performed.

The method of claim 4,
Wherein determining the available disk cache size comprises:
The available disk cache size (uDCS_i) is determined by Equation (1),
Equation (1)

where PhyMem_i is the physical memory size of the selected compute node, DCFSU_i is the memory utilization at which disk cache free begins on the selected compute node, sysMem_i is the amount of memory used by the system immediately after the selected compute node boots, and jobsMem is the amount of physical memory used by the jobs.

The method of claim 4,
wherein determining the number and size of input files comprises
calculating the total size of the files to be disk-cached by the concurrent jobs according to Equation (2).
Equation (2)

The method of claim 6,
wherein determining the number and size of input files comprises
receiving the ratio of total file size to input file size from the user when the resource usage profile contains no information on that ratio.

The method of claim 4,
wherein selecting one of the compute nodes comprises
selecting one of the compute nodes in a round-robin fashion.

The method of claim 4,
wherein determining the number of jobs that can run concurrently comprises:
Calculating a maximum number of concurrent jobs based on the number of CPUs and the CPU utilization information of the selected compute node;
Calculating a maximum number of concurrent jobs based on disk utilization information of the selected compute node;
Calculating a maximum number of concurrent jobs based on memory usage information of the selected compute node;
Calculating a maximum number of concurrent jobs based on network utilization information of the selected compute node; and
Determining the smallest of the calculated maximum numbers of concurrent jobs as the number of jobs that can run concurrently on the selected compute node.

The method of claim 4, further comprising:
Measuring, while given jobs are running on a unit computing resource, usage information on the computing resources actually used by the jobs of the unit computing resource; and
Storing or updating the measurement result as the resource usage profile of the workflow.

The method of claim 10,
wherein storing or updating the resource usage profile of the workflow comprises
managing at least one of information for each unit job constituting the workflow and usage information of each computing resource when unit jobs are executed concurrently, and storing or updating the resource usage profile of the workflow.