KR101881637B1

KR101881637B1 - Job process method and system for genome data analysis

Info

Publication number: KR101881637B1
Application number: KR1020160061519A
Authority: KR
Inventors: 김진식
Original assignee: 주식회사 케이티
Priority date: 2016-05-19
Filing date: 2016-05-19
Publication date: 2018-08-24
Also published as: KR20170130827A

Abstract

The present invention relates to a job processing method and system that processes genome analysis jobs using the computing resources required to analyze genome data. A system according to an embodiment of the present invention, which controls computing resources to process a genome analysis job, includes: a reception processing module that receives a genome analysis job request from a user; an image selection module that selects, from among a plurality of pipeline images, the pipeline image required for the genome analysis job; a resource management module that identifies the resources required to perform the genome analysis job and selects, from among a plurality of servers forming the computing resources, one or more servers capable of providing the required resources as job analysis servers; and an analysis processing module that loads the selected pipeline image onto the one or more servers selected as job analysis servers and controls the genome analysis job so that it is processed by the selected one or more servers.

Description

Job processing method and system for genome data analysis

The present invention relates to job processing technology and, more particularly, to a job processing method and system that processes genome analysis jobs using the computing resources required to analyze genome data.

In recent years, research to identify the genes involved in human diseases has been actively pursued. In particular, projects are under way that decode the genome, which carries a person's genetic information, to build gene maps and analyze gene sequences in order to predict the onset of human diseases.

Such a project compares a specific user's genome data with reference genome data to identify the genetic information (i.e., the base sequence) in which variation has occurred, and derives disease correlations for that user based on the identified genetic information.

Meanwhile, services have been disclosed in which a computing system built on large-scale computing resources receives genome data containing a user's base sequence data from a client terminal, analyzes the genome data, and provides the analysis result to the client terminal. In other words, the computing system performs the genome data analysis on the user's behalf and returns the result to the user. The patent document cited below discloses a genetic information management system and method.

In next-generation sequencing (NGS), which has been under active development recently, DNA (deoxyribonucleic acid) extracted from body fluid is first sequenced to generate fragmented digital gene sequence information (i.e., raw genome analysis data). A genome analysis pipeline (hereinafter, "pipeline") job is then carried out, which applies large-scale computing resources and several analysis procedures to extract the actual gene sequence information and variant information.

A pipeline job determines the final gene sequence through a kind of large-scale jigsaw puzzle, in which the raw genome analysis data is compared with and aligned against a known genome standard sequence (i.e., reference data). It consumes the most computing resources in the entire analysis process, so optimizing and accelerating it is a key technology for lowering the cost of genetic analysis from an information processing standpoint.

On the platforms and infrastructure that process such pipeline jobs, a pipeline requires different procedures and computing resources depending on the analysis target and purpose. For example, targeted sequencing, which examines only the regions of certain genes, requires analysis of FASTQ-format files several hundred megabytes (MB) in size; this calls for a specific pipeline procedure along with a certain number of CPU cores and a certain amount of memory.

Genome analysis covering the entire effective region of human genes is called whole exome sequencing (WES). It requires analysis of FASTQ files tens of gigabytes (GB) in size, which generally demands more cores and memory than targeted sequencing. Cancer analysis, moreover, analyzes DNA data from normal cells and from cancer cells at the same time, so it requires an entirely different procedure and correspondingly different computing resources (memory, cores, disk, etc.).

Accordingly, to perform large-scale analysis, the conventional approach groups the data by analysis type, deploys the required pipeline procedure for each group to the servers all at once, runs it as a batch, and repeats this until all the data has been processed. With this conventional technique, however, the more the pipeline procedures for the different analysis types differ and the more complex they become, the more the infrastructure must be set up and managed by hand for each situation, which raises cost and makes operation inefficient.

One alternative is to set up the infrastructure for each type in advance (i.e., to automate it), but this approach raises the overall idle rate of the system when analyses of a particular type arrive in bursts. To elaborate, several of the frequently occurring small analysis jobs can run concurrently on a single server; when they are processed automatically, however, the computing resources consumed at each step are irregular even for small analyses, and multiple jobs contend for the same resources, which undermines the stability of parallel processing. For the sake of stable parallel processing, the conventional pipeline processing technique therefore keeps the idle rate at a reasonably safe level even if this leaves part of each server's computing capacity unused. That is, the conventional technique controls each server so that its idle rate does not fall below a threshold.

However, because the conventional pipeline processing technique leaves the remaining server resources unused, it degrades the efficiency of the system as a whole.

Korean Patent No. 10-1188886

The present invention has been proposed to solve these conventional problems, and its object is to provide a job processing method and system for genome data analysis that minimizes the idle rate of the entire system and can automatically process various types of pipelines.

Other objects and advantages of the present invention can be understood from the following description and will become more apparent from the embodiments of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means set forth in the claims and combinations thereof.

To achieve the above object, a system according to a first aspect of the present invention, which controls computing resources to process a genome analysis job, is characterized by including: a reception processing module that receives a genome analysis job request from a user; an image selection module that selects, from among a plurality of pipeline images, the pipeline image required for the genome analysis job; a resource management module that identifies the resources required to perform the genome analysis job and selects, from among a plurality of servers forming the computing resources, one or more servers capable of providing the required resources as job analysis servers; and an analysis processing module that loads the selected pipeline image onto the one or more servers selected as job analysis servers and controls the genome analysis job so that it is processed by the selected one or more servers.

To achieve the above object, a method according to a second aspect of the present invention, by which a job processing system controls computing resources to process a genome analysis job, is characterized by including the steps of: receiving a genome analysis job request from a user; selecting, from among a plurality of pipeline images, the pipeline image required for the genome analysis job; identifying the resources required to perform the genome analysis job; selecting, from among a plurality of servers forming the computing resources, one or more servers capable of providing the required resources as job analysis servers; and loading the selected pipeline image onto the one or more servers selected as job analysis servers and controlling the genome analysis job so that it is processed by the selected one or more servers.

By loading a pipeline image onto a server according to the job type and having the job processed through that pipeline image, the present invention can process genome analysis jobs quickly and can also run the pipeline that the customer wants.

In addition, the present invention checks the resource state of each server when processing a job and selects one or more servers as analysis target servers according to that resource state and the job type, thereby minimizing the idle rate of the entire computing system. Furthermore, the present invention applies a different policy for selecting analysis target servers depending on whether the computing resources are configured for autoscaling, which further improves the resource efficiency of the entire system.

The present invention also provides an independent processing environment for each job so that an analysis server can process genome analysis jobs in isolation, which improves the stability of job processing.

Moreover, when allocating server resources for job processing, the present invention allocates resources in amounts equal to a preset base unit multiplied by 2^n (where n is a natural number), which minimizes resource fragmentation.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further the understanding of the technical idea of the present invention; the present invention should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 is a diagram showing the configuration of a job processing system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an image table, FIG. 3 is a diagram illustrating an allocation policy table, and FIG. 4 is a diagram illustrating a resource state table.
FIG. 5 is a flowchart describing a method of allocating computing resources in the job processing system and loading a pipeline image onto the computing resources to perform genome analysis, according to an embodiment of the present invention.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains will be able to readily practice the technical idea of the present invention. In describing the present invention, detailed descriptions of well-known technology related to the present invention are omitted where they would unnecessarily obscure the gist of the invention. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of a job processing system according to an embodiment of the present invention.

As shown in FIG. 1, the job processing system 300 according to an embodiment of the present invention includes a plurality of servers 310-N, a repository 320, a reception processing module 330, a resource management module 340, an image selection module 350, and an analysis processing module 360, and communicates with a user terminal 100 over a network 200. The network 200 includes an intranet, the Internet, and a mobile communication network.

The modules included in the job processing system 300 may be implemented in hardware, in software, or in a combination of the two. The job processing system 300 may also include a memory and one or more processors, and the reception processing module 330, the resource management module 340, the image selection module 350, and the analysis processing module 360 may be implemented as programs stored in the memory and executed by the one or more processors.

The user terminal 100 is a communication device, such as a server, computer, or computing system owned by a user or a company. It transmits job request information containing raw genome data and job analysis type information to the job processing system 300 and receives the result of the job once the job processing system 300 has completed the analysis. The raw data is fragmented digital gene sequence information generated by sequencing DNA extracted from body fluid. The user terminal 100 may transmit job request information containing a file of a type such as FASTQ, BAM, or VCF as the raw data. The user terminal 100 may also record cancer analysis, rare disease analysis, obesity analysis, or the like in the job request information as the job analysis type information.
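The job request information described above can be pictured as a small record. The sketch below is illustrative only: the class name `JobRequest`, its field names, the sample path, and the sample values are assumptions, since the description specifies only that a request carries the raw data file and the job analysis type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical shape of the job request information (names are assumed)."""
    raw_data_path: str    # where the uploaded raw data file is stored
    file_type: str        # e.g. "FASTQ", "BAM", "VCF"
    file_size_bytes: int  # size of the raw data file
    analysis_type: str    # e.g. "cancer", "rare_disease", "obesity"

    def __post_init__(self) -> None:
        # A negative size can never describe a real file.
        if self.file_size_bytes < 0:
            raise ValueError("file_size_bytes must be non-negative")


request = JobRequest(
    raw_data_path="/repository/uploads/sample.fastq",  # made-up path
    file_type="FASTQ",
    file_size_bytes=300 * 1024**2,  # a few hundred MB, as in targeted sequencing
    analysis_type="cancer",
)
```

The file type and size are kept alongside the analysis type because, later in the description, the first two drive the resource lookup and the third drives the pipeline image lookup.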

The plurality of servers 310-N constitute the computing resources and may be deployed in the job processing system 300 as physical or logical servers. Each server 310-N holds resources including memory, disk, and CPU cores. Under the control of the analysis processing module 360, a server 310-N loads a pipeline image, performs the analysis job according to that pipeline, and stores the analysis result in the repository 320. A server 310-N can process jobs of different types concurrently, and a plurality of servers 310-N can also process a single job in a distributed manner.

The pipeline refers to a series of analysis operations that run raw genome data through analysis procedures to identify the actual gene sequence information and variant information, and then derive disease correlations and the like from the identified information. A pipeline image, described below, is a program package that defines one or more applications required for genome analysis (i.e., for pipeline processing) together with their execution procedures and options.

Meanwhile, when processing a pipeline-based genome analysis job, each server 310-N executes the job in isolation using the resources (i.e., cores, memory, and disk) designated for that job. Isolated execution means that, when jobs are analyzed on the same server under the same operating system (OS), each job runs its processes in a separate environment with its own memory, CPU cores, disk (i.e., file system), and so on. Such isolated execution is supported at the kernel level in operating systems such as Linux.

The computing resources formed by pooling the resources of the servers 310-N may also be configured as an autoscaling environment. In an autoscaling environment, some of the servers 310-N are active and processing jobs while others are inactive and on standby; when jobs surge and resources run short, some of the inactive servers are activated so that the computing resources expand dynamically. The servers 310-N according to the present invention may be configured for autoscaling, so that the computing resources expand dynamically, or left unconfigured, so that the total computing resources are fixed.
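The scale-out behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `scale_out`, the use of spare CPU cores as the only shortage signal, and the order in which standby servers are activated are all hypothetical and are not specified in the description.

```python
from typing import Dict, List


def scale_out(active: Dict[str, int], standby: Dict[str, int],
              needed_cores: int) -> List[str]:
    """Activate standby servers until pooled spare cores cover needed_cores.

    Both dicts map a server ID to its spare core count and are updated in
    place. Returns the IDs of the servers that were activated.
    """
    activated = []
    while sum(active.values()) < needed_cores and standby:
        # Assumed order: the most recently registered standby server first.
        server_id, cores = standby.popitem()
        active[server_id] = cores
        activated.append(server_id)
    return activated


active = {"server-1": 2}
standby = {"server-2": 8, "server-3": 8}
newly_active = scale_out(active, standby, needed_cores=6)
```

In a fixed (non-autoscaling) configuration there is no standby pool to draw from, so the same call with an empty `standby` dict simply returns an empty list.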

The repository 320 is a storage means such as a storage device or database. It stores a plurality of pipeline images and, in particular, stores pipeline images of the same type by version. The repository 320 also stores job request information containing raw data and job analysis type information, as well as the output of analysis jobs. The repository 320 may be network-attached storage accessed over a communication network such as an intranet, a local area network (LAN), a wireless LAN (WLAN), or a storage area network (SAN).

In particular, the repository 320 stores an image table, an allocation policy table, and a resource state table.

FIG. 2 is a diagram illustrating an image table, FIG. 3 is a diagram illustrating an allocation policy table, and FIG. 4 is a diagram illustrating a resource state table.

Describing each table with reference to FIGS. 2 to 4, the image table stores analysis type identification information, pipeline image identification information, and the image version, mapped to one another. That is, the image table records the identification information and version of the pipeline image used for each analysis type.
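The image table amounts to a lookup from analysis type to a pipeline image and version. The sketch below shows one way to represent it; the table rows, the identifiers, and the names `ImageRef`, `IMAGE_TABLE`, and `select_image` are made up for illustration and do not come from FIG. 2.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    image_id: str
    version: str


# Hypothetical rows: analysis type ID -> (pipeline image ID, version).
IMAGE_TABLE = {
    "cancer": ImageRef("pipeline-cancer", "1.2"),
    "rare_disease": ImageRef("pipeline-rare-disease", "2.0"),
    "obesity": ImageRef("pipeline-obesity", "1.0"),
}


def select_image(analysis_type: str) -> ImageRef:
    """Return the pipeline image and version mapped to an analysis type."""
    try:
        return IMAGE_TABLE[analysis_type]
    except KeyError:
        raise LookupError(
            f"no pipeline image registered for {analysis_type!r}"
        ) from None
```

Calling `select_image("cancer")` returns the image ID and version recorded for cancer analysis, and an unregistered analysis type raises `LookupError` rather than silently falling back to a default image.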

The allocation policy table records the required resources (i.e., number of cores, memory capacity, and disk capacity) mapped to the file type and size of the raw data. That is, the required resources are recorded in the allocation policy table separately for each combination of raw data file type and file size. To minimize resource fragmentation, the number of CPU cores and the memory capacity may be recorded as a preset base unit multiplied by 2^n (where n is a natural number). For example, with a base unit of one core, the required number of cores may be recorded as 2, 4, 8, and so on; with a base unit of 1 GB, the required memory capacity may be recorded as 2 GB, 4 GB, 8 GB, 16 GB, and so on.
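The "base unit times 2^n" rule can be made concrete with a small helper that rounds an estimated requirement up to the next permitted value. This is a sketch under assumptions: the description only says that table entries take this form, so the function name `quantize` and the idea of rounding an arbitrary estimate are hypothetical, and n is taken to start at 1 to match the examples (2, 4, 8, ...).

```python
def quantize(required: int, unit: int = 1) -> int:
    """Return the smallest unit * 2**n (n = 1, 2, 3, ...) that is >= required."""
    if required < 0 or unit <= 0:
        raise ValueError("required must be >= 0 and unit must be > 0")
    n = 1
    while unit * 2**n < required:
        n += 1
    return unit * 2**n


cores = quantize(3)           # 3 cores requested  -> 4 cores allocated
memory_gb = quantize(9, 1)    # 9 GB requested     -> 16 GB allocated
```

Because every allocation is a power-of-two multiple of the same unit, any two allocation sizes divide one another evenly, which is one plausible reason such sizes leave fewer unusable remainders on a server.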

The resource state table records, mapped to one another, the server identification information, the server's total number of CPU cores, total memory capacity, total disk capacity, number of spare cores, spare memory capacity, spare disk capacity, and number of running jobs. The number of spare cores, spare memory capacity, and spare disk capacity denote the resources available on the server, and the number of running jobs denotes the number of genome analysis jobs executing on the server. The number of spare cores, spare memory capacity, spare disk capacity, and number of running jobs are updated in real time by the resource management module 340.
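One row of the resource state table, together with the updates applied when a job starts and finishes, might look like the sketch below. The field names follow the columns listed above, but the class name `ServerState` and the `reserve`/`release` methods are assumptions; the description states only that the resource management module 340 keeps the spare values and the job count current.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    """Hypothetical row of the resource state table."""
    server_id: str
    total_cores: int
    total_memory_gb: int
    total_disk_gb: int
    spare_cores: int
    spare_memory_gb: int
    spare_disk_gb: int
    running_jobs: int = 0

    def can_fit(self, cores: int, memory_gb: int, disk_gb: int) -> bool:
        return (cores <= self.spare_cores
                and memory_gb <= self.spare_memory_gb
                and disk_gb <= self.spare_disk_gb)

    def reserve(self, cores: int, memory_gb: int, disk_gb: int) -> None:
        """Record that a job has started using the given resources."""
        if not self.can_fit(cores, memory_gb, disk_gb):
            raise ValueError(f"{self.server_id} lacks the requested resources")
        self.spare_cores -= cores
        self.spare_memory_gb -= memory_gb
        self.spare_disk_gb -= disk_gb
        self.running_jobs += 1

    def release(self, cores: int, memory_gb: int, disk_gb: int) -> None:
        """Record that a job has finished and returned its resources."""
        self.spare_cores += cores
        self.spare_memory_gb += memory_gb
        self.spare_disk_gb += disk_gb
        self.running_jobs -= 1
```

For example, reserving 4 cores, 8 GB of memory, and 100 GB of disk on an idle 16-core server leaves 12 spare cores and one running job, and releasing the same amounts restores the original row.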

Referring again to FIG. 1, the reception processing module 330 receives job request information from the user terminal 100 and provides analysis results to the user terminal 100. That is, the reception processing module 330 receives job request information containing raw data and job analysis type information from the user terminal 100 and stores it in the repository 320. When an analysis result is stored in the repository 320, the reception processing module 330 notifies the user terminal 100 that the analysis job is complete, so that the result can be downloaded or viewed on the user terminal 100.

The image selection module 350 selects the pipeline image required for the requested job. Specifically, when job request information is stored in the repository 320, the image selection module 350 checks the job analysis type information in the job request information, looks up the pipeline image and version corresponding to that job analysis type in the image table of the repository 320, and selects them.

The resource management module 340 identifies the resources required for the requested job and selects the servers 310-N that will analyze the job. The resource management module 340 also monitors the resources in use on each server 310-N and updates the resource state table in the repository 320.

The resource management module 340 checks the file type and size of the raw data in the job request information and looks up the corresponding required resources (i.e., required number of cores, required memory capacity, and required disk capacity) in the allocation policy table. The resource management module 340 then selects one or more servers capable of providing the required resources as job analysis servers. In doing so, it checks whether the computing resources are configured for autoscaling and, depending on that setting, either selects the minimum number of servers as analysis target servers or selects a heavily loaded server as the analysis target server, as described later with reference to FIG. 5. Once one or more job analysis servers have been selected, the resource management module 340 determines the resources to be used for the analysis job on the selected servers 310-N.
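The two selection policies named above can be sketched as separate functions. Several things here are assumptions rather than statements from the description: the names `Server`, `fewest_servers`, and `busiest_fitting_server`, the use of cores and memory only (disk is omitted for brevity), and the definition of load as the fraction of cores in use. The sketch also deliberately does not decide which policy goes with which autoscaling setting, since that is covered by the description of FIG. 5.

```python
from itertools import combinations
from typing import NamedTuple, Optional, Sequence, Tuple


class Server(NamedTuple):
    server_id: str
    total_cores: int
    spare_cores: int
    spare_memory_gb: int


def fewest_servers(servers: Sequence[Server], cores: int,
                   memory_gb: int) -> Optional[Tuple[Server, ...]]:
    """Smallest group of servers whose pooled spare resources cover the job."""
    for k in range(1, len(servers) + 1):
        for group in combinations(servers, k):
            if (sum(s.spare_cores for s in group) >= cores
                    and sum(s.spare_memory_gb for s in group) >= memory_gb):
                return group
    return None


def busiest_fitting_server(servers: Sequence[Server], cores: int,
                           memory_gb: int) -> Optional[Server]:
    """Most heavily loaded single server that can still hold the whole job."""
    fitting = [s for s in servers
               if s.spare_cores >= cores and s.spare_memory_gb >= memory_gb]
    if not fitting:
        return None
    # Assumed load measure: fraction of CPU cores currently in use.
    return max(fitting, key=lambda s: 1 - s.spare_cores / s.total_cores)


a = Server("a", total_cores=16, spare_cores=4, spare_memory_gb=16)
b = Server("b", total_cores=16, spare_cores=12, spare_memory_gb=32)
c = Server("c", total_cores=8, spare_cores=6, spare_memory_gb=8)
```

The exhaustive search in `fewest_servers` is exponential in the number of servers and is only meant to make the "minimum number of servers" idea precise; pooling spare cores across servers relies on the earlier statement that several servers can process one job in a distributed manner. Packing a job onto the busiest server that still fits tends to leave lightly loaded servers free for larger jobs.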

The analysis processing module 360 loads the pipeline image onto the selected one or more servers 310-N and has each pipeline executed in isolation on those servers. That is, the analysis processing module 360 loads the pipeline image onto a selected server 310-N and allocates the resources needed for the job on that server so that they are dedicated exclusively to the analysis job run through the pipeline. When the genome analysis job completes on the server 310-N, the analysis processing module 360 unloads the pipeline image from the server 310-N.
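The load, run, unload sequence maps naturally onto a context manager. The sketch below uses a stand-in class and is not an implementation of the module: `FakeServer`, `loaded_pipeline`, and the image ID are hypothetical, each image is assumed to serve one job per server at a time, and unloading the image when the job fails (not only when it completes) is an added assumption.

```python
from contextlib import contextmanager


class FakeServer:
    """Stand-in for a server 310-N that only records which images are loaded."""

    def __init__(self, server_id: str) -> None:
        self.server_id = server_id
        self.loaded_images = set()

    def load_image(self, image_id: str) -> None:
        self.loaded_images.add(image_id)

    def unload_image(self, image_id: str) -> None:
        self.loaded_images.discard(image_id)


@contextmanager
def loaded_pipeline(server: FakeServer, image_id: str):
    """Load the image, hand control to the job, and always unload afterwards."""
    server.load_image(image_id)
    try:
        yield server
    finally:
        server.unload_image(image_id)


server = FakeServer("server-1")
with loaded_pipeline(server, "pipeline-cancer"):
    during = set(server.loaded_images)  # image is available while the job runs
after = set(server.loaded_images)       # image has been unloaded again
```

Tying the unload to the end of the `with` block means a server never keeps an image around after its job has finished, which matches the unloading step described above.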

FIG. 5 is a flowchart describing a method of allocating computing resources in the job processing system and loading a pipeline image onto the computing resources to perform genome analysis, according to an embodiment of the present invention.

Referring to FIG. 5, the reception processing module 330 receives job request information containing raw data and job analysis type information from the user terminal 100 and stores it in the repository 320 (S501). The raw data may be a file of a type such as FASTQ, BAM, or VCF. The job analysis type information may record the type of disease to be analyzed, such as cancer analysis, rare disease analysis, or obesity analysis.

When the job request information is stored in the storage 320, the image selection module 350 checks the job analysis type information included in the job request information (S503), looks up the pipeline image and version corresponding to this job analysis type information in the image table of the storage 320, and selects the pipeline image and version to be used for the requested genome analysis job (S505).

Next, the resource management module 340 checks the file type and size of the raw data included in the job request information, and looks up the required resources corresponding to that file type and size (i.e., the required number of cores, the required memory, and the required disk) in the allocation policy table of the storage 320 (S507). To minimize resource fragmentation, the required resources (i.e., the number of cores and the memory capacity) are recorded in the allocation policy table as 2^n times a reference unit (n being a natural number), and the resource management module 340 may identify resources amounting to 2^n times the reference unit as the computing resources required for the genome analysis job. For example, the allocation policy table may record the required number of cores as 2^n times a reference unit of one core (i.e., 2, 4, 8, ...) and the required memory capacity as 2^n times a reference unit of 1 GB (i.e., 2 GB, 4 GB, 8 GB, 16 GB, ...). Accordingly, the resource management module 340 may identify a required number of cores of 1 × 2^n (n being a natural number) and a required memory capacity of 1 GB × 2^n (n being a natural number) as the computing resources required for the genome analysis job.
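The lookup in step S507 can be sketched as follows. This is a minimal illustration, not the patented implementation: the table rows, the size thresholds, the field names, and the helper names `required_resources` and `is_power_of_two_multiple` are all hypothetical, and only the rule that core counts and memory capacities are 2^n times a reference unit comes from the description.

```python
# Hypothetical allocation policy table. Each row maps a raw-data file type and
# an upper size limit (GB) to the required resources. All values are invented
# for illustration; rows of the same type are ordered by ascending size limit.
ALLOCATION_POLICY = [
    ("FASTQ", 10, {"cores": 4, "memory_gb": 8, "disk_gb": 100}),
    ("FASTQ", 100, {"cores": 16, "memory_gb": 32, "disk_gb": 1000}),
    ("BAM", 50, {"cores": 8, "memory_gb": 16, "disk_gb": 500}),
    ("VCF", 5, {"cores": 2, "memory_gb": 2, "disk_gb": 20}),
]


def is_power_of_two_multiple(value, unit=1):
    """Return True if value == unit * 2**n for some natural number n >= 1."""
    if value % unit:
        return False
    quotient = value // unit
    return quotient >= 2 and (quotient & (quotient - 1)) == 0


def required_resources(file_type, size_gb):
    """Return the first row for file_type whose size limit covers size_gb."""
    for row_type, max_size_gb, resources in ALLOCATION_POLICY:
        if row_type == file_type and size_gb <= max_size_gb:
            return resources
    raise LookupError(f"no policy row for {file_type} of {size_gb} GB")
```

Under these assumed rows, a 30 GB FASTQ input falls into the second row. Keeping every core and memory entry at 2^n times the unit means freed blocks can be reused by later jobs of the same or smaller size classes, which is the fragmentation argument made above.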

Next, the resource management module 340 checks whether an auto-scaling environment is set for the computing resources provided by the plurality of servers 310-N (S509).

When the computing resources are set to an auto-scaling environment, the resource management module 340 checks the spare resources (i.e., spare cores, spare memory capacity, and spare disk capacity) in the resource status table of the storage 320, and selects the minimum number of servers capable of supporting the required resources (i.e., the required number of cores, the required memory capacity, and the required disk capacity) as the job analysis server (S511). In other words, in a dynamic auto-scaling environment, in which deactivated servers are automatically activated to expand the total capacity when the overall computing resources are insufficient, the resource management module 340 selects the minimum number of servers capable of supporting the required resources as the job analysis server. That is, if a single server can support all of the required resources, the resource management module 340 selects only that server as the job analysis server. If no single server can support all of the required resources, the resource management module 340 increases the number of servers one at a time and repeatedly checks whether the required resources can be supported, so that the number of servers supporting the required resources ends up being the minimum.
Selecting the minimum number of servers capable of supporting the required resources as the job analysis server minimizes fragmentation of the computing resources and also minimizes unnecessary expansion of computing resources in the auto-scaling environment, thereby minimizing the idle rate of the entire system.
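The selection in step S511 can be sketched as a search that tries one server first and then progressively larger groups. This is a brute-force illustration suitable only for small clusters. The function name `select_min_servers`, the dictionary layout, and the assumption that spare resources of several servers can simply be summed are hypothetical and not stated in the description.

```python
from itertools import combinations

RESOURCE_KEYS = ("cores", "memory_gb", "disk_gb")


def select_min_servers(spare, required):
    """Return the smallest group of server ids whose combined spare
    resources cover `required`, or None if even all servers are not enough.

    spare:    {server_id: {"cores": int, "memory_gb": int, "disk_gb": int}}
    required: {"cores": int, "memory_gb": int, "disk_gb": int}
    """
    for group_size in range(1, len(spare) + 1):
        # Try every group of the current size before growing the group.
        for group in combinations(sorted(spare), group_size):
            if all(
                sum(spare[sid][key] for sid in group) >= required[key]
                for key in RESOURCE_KEYS
            ):
                return list(group)
    return None
```

For example, with spare resources `{"a": 4 cores, "b": 8 cores, "c": 2 cores}` (and proportionate memory and disk), a job needing 8 cores is placed on `"b"` alone, while a job needing 12 cores is placed on `"a"` and `"b"` together. A `None` result is the point at which an auto-scaling environment would activate additional servers.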

On the other hand, when the computing resources are not set to an auto-scaling environment, the resource management module 340 checks the spare resources (i.e., spare cores, spare memory capacity, and spare disk capacity) in the resource status table of the storage 320, and selects the server with the highest idle rate as the job analysis server (S513). If the server with the highest idle rate cannot support all of the required resources, the resource management module 340 additionally selects one or more servers 310-N in descending order of idle rate until all of the required resources can be supported. In other words, in a static environment in which the scale of the computing resources is fixed, the resource management module 340 selects one or more servers capable of supporting the required resources as the job analysis server in descending order of idle rate. The resource management module 340 may calculate the idle rate of a server 310-N by applying a weight to each of the spare core ratio, the spare memory ratio, and the spare disk capacity ratio, and summing the weighted ratios.
In an environment where auto-scaling is not set, selecting servers with a high idle rate as the job analysis server improves the processing speed of the entire system and minimizes the idle rate of the entire system.
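The idle-rate calculation and the selection in step S513 can be sketched as follows. The weights (0.5, 0.3, 0.2), the dictionary layout, and the function names are assumptions chosen for illustration; the description only states that the three spare ratios are weighted and summed, and that servers are added in descending order of idle rate until the required resources are covered.

```python
RESOURCE_KEYS = ("cores", "memory_gb", "disk_gb")

# Hypothetical weights for the spare core, memory, and disk ratios.
DEFAULT_WEIGHTS = {"cores": 0.5, "memory_gb": 0.3, "disk_gb": 0.2}


def idle_rate(server, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the spare/total ratio of each resource."""
    return sum(
        weights[key] * server["spare"][key] / server["total"][key]
        for key in RESOURCE_KEYS
    )


def select_by_idle_rate(servers, required):
    """Pick servers in descending idle-rate order until their combined
    spare resources cover `required`. Returns None if they never do."""
    ranked = sorted(servers, key=lambda sid: idle_rate(servers[sid]), reverse=True)
    chosen = []
    covered = dict.fromkeys(RESOURCE_KEYS, 0)
    for sid in ranked:
        chosen.append(sid)
        for key in RESOURCE_KEYS:
            covered[key] += servers[sid]["spare"][key]
        if all(covered[key] >= required[key] for key in RESOURCE_KEYS):
            return chosen
    return None
```

Unlike the minimum-count search used in the auto-scaling case, this policy favors the least-loaded servers first, even if that spreads a job over more servers than strictly necessary.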

Next, once one or more job analysis servers are selected, the resource management module 340 determines the resources to be used for the analysis job in the selected one or more servers 310-N, and updates the resource status table of the storage 320 by reflecting the determined resources in the spare resources (i.e., spare cores, spare memory, and spare disk capacity) and in the number of running jobs.

When the image selection module 350 has selected the pipeline image, and the resource management module 340 has selected the servers and determined the resources to be used in each server 310-N, the analysis processing module 360 loads the pipeline image onto the selected one or more servers 310-N and controls each pipeline to run in isolation on the one or more servers 310-N (S515, S517). That is, the analysis processing module 360 loads the pipeline image onto the selected server 310-N and allocates the resources required for job processing on that server 310-N so that they are used exclusively for the analysis job performed through the pipeline. The server 310-N then runs the analysis job according to the pipeline in isolation using the allocated resources. When the pipeline image is being loaded for the first time, the server 310-N obtains the pipeline image stored in the storage 320, loads it, and runs the analysis job in isolation. When the pipeline image has already been loaded in the past, the server 310-N loads the cached pipeline image, without obtaining it from the storage 320, and runs the analysis job in isolation.
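The first-load versus cached-load behavior described above can be sketched as follows. The class name `AnalysisServer`, the in-memory dictionary standing in for the storage 320, and the `fetch_count` counter are hypothetical; the description does not specify how images are stored or cached.

```python
class AnalysisServer:
    """Toy model of a server 310-N that caches pipeline images locally."""

    def __init__(self, storage):
        # storage: mapping of (image name, version) -> image payload,
        # standing in for the pipeline images kept in the storage 320.
        self.storage = storage
        self.cache = {}
        self.fetch_count = 0  # number of times the storage was accessed

    def load_image(self, name, version):
        """Return the pipeline image, fetching from storage only on first use."""
        key = (name, version)
        if key not in self.cache:
            self.cache[key] = self.storage[key]
            self.fetch_count += 1
        return self.cache[key]
```

Loading the same image and version twice on one server touches the storage only once; a different version is treated as a separate image and fetched again.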

When the analysis job is completed, the one or more servers 310-N that ran the analysis job according to the pipeline in isolation store the result of the analysis job in the storage 320 (S519). The reception processing module 330 then notifies the user terminal 100 that the analysis job is complete, and the user can access the job processing system 300 to download or view the analysis result.

In addition, when the job according to the pipeline is completed on the server 310-N, the resource management module 340 unloads the pipeline image so that the required resources allocated to the job according to this pipeline are returned (S521). The resource management module 340 then updates the resource status table of the storage 320 so that the returned resources are recorded in the spare resources of the corresponding server.
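The two updates to the resource status table, one when resources are allocated to a job and one when they are returned in step S521, can be sketched as a pair of functions. The table layout and the names `allocate` and `release` are hypothetical.

```python
def allocate(table, server_id, resources):
    """Deduct `resources` from the server's spare resources and count the job."""
    row = table[server_id]
    if any(row["spare"][key] < amount for key, amount in resources.items()):
        raise ValueError(f"server {server_id} lacks the requested spare resources")
    for key, amount in resources.items():
        row["spare"][key] -= amount
    row["running_jobs"] += 1


def release(table, server_id, resources):
    """Return `resources` to the server's spare resources when a job finishes."""
    row = table[server_id]
    for key, amount in resources.items():
        row["spare"][key] += amount
    row["running_jobs"] -= 1
```

Calling `release` with the same resources passed to `allocate` restores the row to its earlier state, which mirrors the allocated resources being returned to the spare pool once the pipeline image is unloaded.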

As described above, the job processing system 300 according to the present invention loads a pipeline image onto one or more servers 310-N according to the job type and has the job processed through this pipeline image, so that genome analysis jobs can be processed quickly. In addition, the job processing system 300 according to the present invention checks the resource status of each server 310-N at job processing time and selects one or more servers 310-N as the job analysis server according to the resource status and the job type, thereby minimizing the idle rate of the entire computing system. In particular, the job processing system 300 according to the present invention applies a different server selection policy depending on whether the computing resources are set to auto-scaling, further improving the resource efficiency of the entire system. Furthermore, the job processing system 300 according to the present invention provides an independent processing environment for each job so that genome analysis jobs can be processed in isolation on the servers 310-N, thereby improving job processing stability.

Although this specification contains many features, such features should not be construed as limiting the scope of the invention or of the claims. Features described in separate embodiments in this specification may be combined and implemented in a single embodiment. Conversely, various features described in a single embodiment in this specification may be implemented separately in various embodiments, or in any suitable combination.

Although operations are described in a particular order in the drawings, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all described operations be performed, in order to obtain the desired result. In certain environments, multitasking and parallel processing may be advantageous. Likewise, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments. The program components and systems described above may generally be implemented as a single software product or packaged into multiple software products.

The method of the present invention as described above may be implemented as a program and stored in a computer-readable form on a recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.). Such a process can be readily carried out by a person of ordinary skill in the art to which the present invention pertains, and is therefore not described in further detail.

The present invention described above may be subject to various substitutions, modifications, and changes by a person of ordinary skill in the art to which the present invention pertains without departing from the technical spirit of the invention, and is therefore not limited by the foregoing embodiments and the accompanying drawings.

100: user terminal 200: network
300: job processing system 310: server
320: storage 330: reception processing module
340: resource management module 350: image selection module
360: analysis processing module

Claims

A job processing system for processing a genome analysis job by controlling computing resources, the system comprising:
a reception processing module for receiving a request for a genome analysis job from a user;
an image selection module for selecting, from among a plurality of pipeline images, a pipeline image required for the genome analysis job;
a resource management module for identifying the required resources needed to perform the genome analysis job and selecting, as a job analysis server, one or more servers capable of supporting the required resources from among a plurality of servers forming the computing resources; and
an analysis processing module for loading the selected pipeline image onto the one or more servers selected as the job analysis server and controlling the genome analysis job to be processed through the selected one or more servers,
wherein the resource management module
selects, when an auto-scaling environment is set for the computing resources, the minimum number of servers capable of supporting the required resources from among the plurality of servers as the job analysis server, and selects, when the auto-scaling environment is not set, one or more servers as the job analysis server in descending order of idle rate among the plurality of servers.

delete

The system according to claim 1,
wherein the resource management module
checks the file type and size of the raw data received from the user, and identifies the required resources corresponding to the file type and size of the raw data as the required resources needed to perform the genome analysis job.

The system according to claim 1,
wherein the image selection module
identifies the type of the genome analysis job, identifies the pipeline image identification information and version corresponding to the type, and selects, from among the plurality of pipeline images, the pipeline image corresponding to the pipeline image identification information and version.

The system according to claim 1,
wherein the resource management module determines, from among the spare resources of each server selected as the job analysis server, the resources to be allocated to the genome analysis job, and
wherein each server selected as the job analysis server processes the genome analysis job in isolation using the allocated resources.

The system according to claim 6,
wherein the resource management module
allocates CPU cores of the server such that the number of allocated CPU cores is 2^n times (n being a natural number) a predetermined unit number.

The system according to claim 6,
wherein the resource management module
allocates memory of the server such that the allocated memory capacity is 2^n times (n being a natural number) a predetermined unit capacity.

The system according to claim 1,
wherein the analysis processing module,
when the genome analysis job is completed on the selected one or more servers, unloads the pipeline image loaded on the selected one or more servers.

A method of processing a genome analysis job by controlling computing resources in a job processing system, the method comprising:
receiving a request for a genome analysis job from a user;
selecting, from among a plurality of pipeline images, a pipeline image required for the genome analysis job;
identifying the required resources needed to perform the genome analysis job;
selecting, as a job analysis server, one or more servers capable of supporting the required resources from among a plurality of servers forming the computing resources; and
loading the selected pipeline image onto the one or more servers selected as the job analysis server and controlling the genome analysis job to be processed through the selected one or more servers,
wherein the selecting as the job analysis server comprises
selecting, when an auto-scaling environment is set for the computing resources, the minimum number of servers capable of supporting the required resources from among the plurality of servers as the job analysis server, and selecting, when the auto-scaling environment is not set, one or more servers as the job analysis server in descending order of idle rate among the plurality of servers.

delete

The method according to claim 10,
wherein the identifying of the required resources comprises:
checking the file type and size of the raw data received from the user; and
identifying the required resources corresponding to the file type and size of the raw data as the required resources needed to perform the genome analysis job.

The method according to claim 10,
wherein the controlling comprises
unloading the pipeline image loaded on the selected one or more servers when the genome analysis job is completed on the selected one or more servers.