KR101794696B1

KR101794696B1 - Distributed processing system and task scheduling method considering heterogeneous processing type

Info

Publication number: KR101794696B1
Application number: KR1020160103071A
Authority: KR
Inventors: 이영민; 윤일로
Original assignee: 서울시립대학교 산학협력단 (University of Seoul Industry Cooperation Foundation)
Priority date: 2016-08-12
Filing date: 2016-08-12
Publication date: 2017-11-07

Abstract

The present invention relates to a task scheduling method and a distributed processing system that consider heterogeneous processing types. The method comprises the steps of: a worker node assigning a task to a first processing unit or a second processing unit based on a desired concurrency count for the second processing unit (the number of tasks it should run simultaneously); the worker node monitoring the utilization of the second processing unit; and the worker node changing the desired concurrency count according to that utilization.

Description

DISTRIBUTED PROCESSING SYSTEM AND TASK SCHEDULING METHOD CONSIDERING HETEROGENEOUS PROCESSING TYPE

The present invention relates to a task scheduling method and a distributed processing system that consider heterogeneous processing types. More specifically, it relates to a task scheduling method and distributed processing system that dynamically assign the tasks to be executed in a distributed processing system such as Hadoop to the heterogeneous processing units of a node, thereby making full use of the node's resources and improving task processing speed.

With the arrival of the big data era, distributed processing on systems such as Hadoop has become essential for handling big data efficiently. Traditionally most data was text, but video data has recently been growing rapidly because of the widespread deployment of security cameras and the emergence and spread of smart devices that record video. Video resolution also keeps rising, now reaching 4K, and 1080p is already common in CCTV.

Meanwhile, as powerful machine learning algorithms have been developed, there have been widespread attempts to apply them to video processing, and video analytics applications can be found in many fields. For example, face detection and face recognition are used in security and commercial applications as well as for marketing. Most machine learning algorithms are compute-intensive and take considerable time even for a single frame.

Efficient video data processing is therefore becoming essential. The GPU (Graphics Processing Unit) has become popular because of its enormous computing power and is particularly well suited to video processing. At present, however, a single GPU, or several GPUs on one machine, cannot reach their maximum performance in a distributed processing system.

The Hadoop framework, a distributed processing system, consists of multiple nodes, each of which includes a GPU as well as a CPU. Research on using the GPU to achieve higher processing speed in the Hadoop framework is ongoing.

A typical node in the Hadoop framework has, on average, more than 10 CPU cores plus a GPU. In existing work that uses the GPU, however, these CPU cores sit idle while the GPU is running (through a GPU mapper and GPU kernel), so the processing capacity of the CPU cores is not fully exploited.

A GPU mapper processes its task on the GPU, while the host running on the CPU merely launches the GPU kernel. Because the host does no compute-intensive work, the node's CPU is idle most of the time. What is needed is a way to use the GPU while also making full use of the CPU's resources, so that the node reaches its maximum performance.

The biggest obstacle to CPU-GPU hybrid scheduling in the Hadoop framework is that the number of mappers (map tasks) on a node is fixed and cannot be changed during job execution. This makes efficient, dynamic CPU-GPU hybrid execution difficult to achieve.

One study (Non-Patent Document 1) extends the Hadoop framework to reduce job execution time by using both the CPU and the GPU of each node. However, it simplifies scheduling by assuming that only one task can run on a GPU device at a time, regardless of GPU utilization, and therefore cannot exploit the full performance of the GPU and the CPU.

In that study, the job tracker binds and schedules tasks to the available slots on all available nodes. To make this possible, the task tracker on each node reports the node's slot availability to the job tracker through heartbeat messages. As a result, scheduling is centralized in the job tracker, and the existing Hadoop scheduling scheme has to be modified.

Another invention (Patent Document 1) likewise discloses a job tracker that schedules tasks onto a CPU or a GPU, much like Non-Patent Document 1, and therefore has the same problems.

There is therefore a need, for distributed processing systems with heterogeneous processing types, for a task scheduling method and distributed processing system that take those types into account and overcome the limitations of the known research and inventions.

Patent Document 1: KR 10-1620896 B1

Non-Patent Document 1: Shirahata, K., Sato, H., Matsuoka, S. "Hybrid map task scheduling for GPU-based heterogeneous clusters" (2010) Proceedings - 2nd IEEE International Conference on Cloud Computing Technology and Science, CloudCom 2010, art. no. 5708524, pp. 733-740.

The present invention was devised to solve the problems described above. One object of the invention is to provide a task scheduling method and distributed processing system, considering heterogeneous processing types, that improve the job performance of nodes by dynamically assigning tasks to each node's heterogeneous processing units in a distributed processing system composed of nodes with heterogeneous processing types.

Another object of the invention is to provide a task scheduling method and distributed processing system, considering heterogeneous processing types, in which task assignment to the heterogeneous processing units is performed at the node, so that the node can deliver its maximum processing performance while the distributed processing system's existing scheduling scheme continues to be used.

Another object of the invention is to provide a task scheduling method and distributed processing system, considering heterogeneous processing types, that can balance the load among heterogeneous processing units by taking their utilization and processing speed into account.

Another object of the invention is to provide a task scheduling method and distributed processing system, considering heterogeneous processing types, that use FSM (Finite State Machine) logic to maximize the utilization of the GPU, of the two heterogeneous processing units (CPU and GPU), and thereby improve task processing speed.

The technical problems addressed by the present invention are not limited to those mentioned above, and other technical problems not mentioned here will be clearly understood by those of ordinary skill in the art from the following description.

To achieve the above objects, a task scheduling method considering heterogeneous processing types includes: a worker node assigning a task to a first processing unit or a second processing unit based on a desired concurrency count on the second processing unit; the worker node monitoring the utilization of the second processing unit; and the worker node changing the desired concurrency count according to that utilization.
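The three claimed steps can be sketched as a small worker-node object. This is a minimal illustration under stated assumptions, not the patented implementation: the first processing unit is taken to be the CPU and the second the GPU, the names `WorkerNode`, `assign`, `complete`, and `adjust` are hypothetical, and the adjustment rule (raise the desired count while GPU utilization is below 50%, lower it above 90%) is an assumed placeholder, since the method only states that the count changes according to the utilization.

```python
from dataclasses import dataclass


@dataclass
class WorkerNode:
    """Hypothetical worker node: CPU = first unit, GPU = second unit."""
    desired: int          # desired concurrency count on the GPU
    max_desired: int      # assumed upper bound for `desired`
    gpu_running: int = 0  # tasks currently executing on the GPU
    cpu_running: int = 0  # tasks currently executing on the CPU

    def assign(self) -> str:
        """Step 1: use the GPU while fewer than `desired` tasks run there."""
        if self.gpu_running < self.desired:
            self.gpu_running += 1
            return "GPU"
        self.cpu_running += 1
        return "CPU"

    def complete(self, unit: str) -> None:
        """Release the slot when a task finishes on `unit`."""
        if unit == "GPU":
            self.gpu_running -= 1
        else:
            self.cpu_running -= 1

    def adjust(self, gpu_utilization: float,
               low: float = 50.0, high: float = 90.0) -> int:
        """Steps 2-3: change `desired` from the monitored GPU utilization (%).

        The thresholds are illustrative assumptions, not values from the patent.
        """
        if gpu_utilization < low and self.desired < self.max_desired:
            self.desired += 1  # GPU has headroom: allow one more GPU task
        elif gpu_utilization > high and self.desired > 1:
            self.desired -= 1  # GPU saturated: route overflow to the CPU
        return self.desired


node = WorkerNode(desired=2, max_desired=4)
print([node.assign() for _ in range(3)])  # ['GPU', 'GPU', 'CPU']
print(node.adjust(gpu_utilization=35.0))  # 3
```

Keeping the rule in a separate `adjust` step mirrors the method's split between assigning, monitoring, and adjusting; any other utilization-to-count policy could be substituted there.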

To achieve the above objects, a distributed processing system considering heterogeneous processing types includes one or more worker nodes that process tasks. Each worker node includes a CPU with multiple CPU cores, a GPU with multiple GPU cores, and a node manager that runs on the CPU or the GPU and assigns each allocated task to the CPU or the GPU. The node manager makes this assignment based on the utilization of the GPU or the desired concurrency count on the GPU.

The task scheduling method and distributed processing system considering heterogeneous processing types according to the present invention, as described above, improve the job performance of nodes by dynamically assigning tasks to each node's heterogeneous processing units in a distributed processing system composed of nodes with heterogeneous processing types.

In addition, the task scheduling method and distributed processing system according to the present invention perform task assignment to the heterogeneous processing units at the node, so that the node delivers its maximum processing performance while the distributed processing system's existing scheduling scheme remains in use.

In addition, the task scheduling method and distributed processing system according to the present invention can balance the load among heterogeneous processing units by taking their utilization and processing speed into account.

In addition, the task scheduling method and distributed processing system according to the present invention use FSM (Finite State Machine) logic to maximize the utilization of the GPU, of the two heterogeneous processing units (CPU and GPU), and thereby improve task processing speed.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned here will be clearly understood by those of ordinary skill in the art from the following description.

FIG. 1 is an exemplary system block diagram of a distributed processing system.
FIG. 2 is a hardware block diagram of a worker node.
FIG. 3 is a functional block diagram of a worker node.
FIG. 4 shows an example of the load imbalance that dynamic scheduling can cause and an example of how it is handled.
FIG. 5 shows an exemplary dynamic scheduling process for the tasks that perform a requested job.
FIG. 6 shows an example of the variables used in the dynamic scheduling process and of the FSM used for dynamic scheduling.

The above objects, features, and advantages will become clearer from the following detailed description, given with reference to the accompanying drawings, so that those of ordinary skill in the art to which the invention pertains can readily put its technical idea into practice. Where a detailed description of related known techniques might unnecessarily obscure the gist of the invention, that description is omitted. Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is an exemplary system block diagram of a distributed processing system. It shows an example of a Hadoop framework that divides a requested job into multiple tasks and has each node 10 execute the resulting tasks. The distributed processing system of FIG. 1 can process jobs over large volumes of data at high speed.

Referring to FIG. 1, a distributed processing system such as the Hadoop framework consists of multiple nodes 10 and a communication network that carries the data exchanged between the nodes 10.

A node 10 can process jobs (tasks) written in the MapReduce programming model, which a distributed processing system such as the Hadoop framework can handle. A node 10 may be any device or machine capable of executing programs, such as a personal computer, server, or workstation.

This distributed processing system can be used for a wide range of applications in many fields, such as running video processing jobs, analyzing big data, and generating statistics.

One or more of the nodes 10 that make up the distributed processing system (Hadoop framework) may be client nodes that create processing jobs in the MapReduce programming model and request their processing on the distributed processing system. To distinguish it from other node types, the client node is denoted below by reference numeral 100.

One or more other nodes 10 may be master nodes that receive a job processing request from the client node 100 and, according to that request, assign the tasks that will process the job and the split each task will process. To distinguish it from other node types, the master node is denoted below by reference numeral 200.

On receiving a job processing request from the client node 100, the master node 200 determines the worker nodes that will process the job and, over the communication network, assigns each worker node the tasks it is to execute along with their corresponding splits.

For example, the master node 200 assigns one or more worker nodes to the requested job according to the allocatable resources of the distributed processing system, and also assigns one or more tasks to each worker node. The master node 200 divides the job's input file by the total number of tasks into splits of equal size, and sends each split, or notifies it, over the communication network to the worker node that runs the corresponding task.

For example, if five worker nodes are available and each worker node can, or is determined to, run five tasks, the master node 200 can divide the input file into 25 splits of equal size, forming 25 tasks to process the requested job and one split for each task. Each split is delivered, or notified, to the task on the worker node it is mapped to. Depending on the Hadoop version, the master node 200 may also be called a job tracker node.
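The five-by-five example can be reproduced with a short sketch that cuts an input file into equal byte ranges, one per task. It is a simplification: the function name `make_splits` and the 250 MiB file size are hypothetical, splits are computed purely by byte offset (real Hadoop input formats also take HDFS block and record boundaries into account), and when the file size is not divisible by the task count the first splits receive one extra byte each.

```python
def make_splits(file_size: int, num_workers: int, tasks_per_worker: int):
    """Divide [0, file_size) into equal contiguous byte ranges, one per task.

    Returns (worker_index, task_index, start, end) tuples; `end` is exclusive.
    """
    total_tasks = num_workers * tasks_per_worker
    base, rem = divmod(file_size, total_tasks)
    splits = []
    start = 0
    for t in range(total_tasks):
        # The first `rem` splits get one extra byte so every byte is covered.
        size = base + (1 if t < rem else 0)
        worker, task = divmod(t, tasks_per_worker)
        splits.append((worker, task, start, start + size))
        start += size
    return splits


# 5 worker nodes x 5 tasks each = 25 equal splits of a 250 MiB input.
splits = make_splits(250 * 1024 * 1024, num_workers=5, tasks_per_worker=5)
print(len(splits))  # 25
print(splits[0])    # (0, 0, 0, 10485760)
```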

Multiple further nodes 10 may be worker nodes that, as directed by the master node 200, execute the tasks of the requested job (map tasks and/or reduce tasks) on the corresponding splits and produce the processing results. To distinguish them from other types of node 10, worker nodes are denoted below by reference numeral 300. Depending on the Hadoop version, a worker node 300 may also be called a task tracker node.

The master node 200 divides the input file used by the requested processing job into splits of equal size, assigns the splits to the worker nodes 300 that will perform the job, and delivers the assigned splits to each worker node 300 over the communication network. Splits assigned to different worker nodes 300 are distinct from one another, and all splits have the same size.

클라이언트 노드(100), 마스터 노드(200) 및 워커 노드(300)는 반드시 상이한 물리적 머신에 맵핑될 필요는 없다. The client node 100, the master node 200, and the worker node 300 do not necessarily have to be mapped to different physical machines.

각각의 노드(10)들은 분산 처리 시스템(하둡 프레임워크) 상에서 SSH(Secure Shell) 데이터 통신을 통해 필요한 데이터들을 송수신할 수 있고 HDFS(Hadoop Distributed File System)를 통해 파일을 공유할 수 있다. Each node 10 can send and receive necessary data through SSH (Secure Shell) data communication on a distributed processing system (Hadoop framework) and can share files through HDFS (Hadoop Distributed File System).

하둡 프레임워크 상의 통신 네트워크는 근거리 통신 네트워크이거나 광대역 통신 네트워크일 수 있다. 통신 네트워크는 예를 들어 인터넷 망을 구성할 수 있다. The communication network on the Hadoop framework may be a short-range communication network or a broadband communication network. The communication network may constitute an Internet network, for example.

도 2는 워커 노드(300)의 하드웨어 블록도를 도시한 도면이다. FIG. 2 is a hardware block diagram of the worker node 300. FIG.

도 2의 블록도는 워커 노드(300) 뿐 아니라 마스터 노드(200)나 클라이언트 노드(100)의 블록도를 나타낼 수도 있거나 도 2의 블록도로부터 간단한 변형으로 마스터 노드(200)나 클라이언트 노드(100)가 구성될 수 있다. 2 may represent a block diagram of the master node 200 or the client node 100 as well as the worker node 300 or may be a master node 200 or a client node 100 ) Can be constructed.

도 2에 따르면, 워커 노드(300)는 통신 인터페이스(301), 메모리(303), 대용량 저장 매체(305), 시스템 버스/제어 버스(307) 및 CPU(309)와 GPU(311)를 포함한다. 2, the worker node 300 includes a communication interface 301, a memory 303, a mass storage medium 305, a system bus / control bus 307, and a CPU 309 and a GPU 311 .

각 블록에 대해서 간단히 살펴보면, 통신 인터페이스(301)는 통신 네트워크를 통해 각종 데이터를 송신하거나 수신하기 위한 인터페이스이다. 통신 인터페이스(301)는 유선랜 및/또는 무선랜에 액세스할 수 있는 통신 칩을 구비하여 통신 패킷을 통신 네트워크를 통해 송신하고 수신할 수 있다. 통신 인터페이스(301)를 통해 워커 노드(300)는 HDFS를 거쳐 워커 노드(300)의 태스크에 맵핑된(할당된) 스플릿에 액세스하거나 마스터 노드(200) 등으로부터 스플릿을 수신할 수 있고 태스크 프로그램을 또한 수신하거나 액세스할 수 있다. The communication interface 301 is an interface for transmitting or receiving various data through a communication network. The communication interface 301 includes a communication chip capable of accessing a wired LAN and / or a wireless LAN, and can transmit and receive communication packets through a communication network. Through the communication interface 301, the worker node 300 can access the split (allocated) mapped to the task of the worker node 300 via the HDFS or receive the split from the master node 200 or the like, It can also receive or access.

메모리(303)는 에스디램(SDRAM) 등의 휘발성 메모리 및/또는 낸드플래시(Nand Flash) 등의 비휘발성 메모리를 포함하여 각종 데이터와 프로그램을 영구히 또는 임시로 저장한다. The memory 303 includes volatile memory such as SDRAM and / or nonvolatile memory such as NAND flash to permanently or temporarily store various data and programs.

예를 들어, 메모리(303)는 태스크가 처리할 스플릿을 임시로 저장하거나 CPU(309) 또는 GPU(311)에서 수행되는 노드 매니저(350) 프로그램 및/또는 태스크 프로그램을 영구히 또는 임시로 저장할 수 있다. 메모리(303) 등에 저장되는 프로그램(특히 노드 매니저(350) 프로그램)에 대해서는 도 3의 기능 블록도를 통해 좀 더 상세히 살펴보도록 한다. For example, the memory 303 may temporarily store the split to be handled by the task, or permanently or temporarily store the node manager 350 program and / or the task program executed by the CPU 309 or the GPU 311 . A program stored in the memory 303 or the like (particularly, the node manager 350 program) will be described in more detail with reference to the functional block diagram of FIG.

대용량 저장 매체(305)는 하나 이상의 하드 디스크 등을 포함하여 각종 데이터와 프로그램을 저장한다. 대용량 저장 매체(305)는 노드 매니저(350) 프로그램 및/또는 태스크 프로그램 등을 저장하고 나아가 스플릿 등을 저장할 수 있다. 대용량 저장 매체(305)는 설계 변형에 따라 생략될 수 있다. The mass storage medium 305 stores various data and programs including one or more hard disks and the like. The mass storage medium 305 may store a node manager 350 program and / or a task program, and may further store a split or the like. The mass storage medium 305 may be omitted according to the design variant.

시스템 버스/제어 버스(307)는 블록 간에 데이터를 송수신한다. 시스템 버스/제어 버스(307)는 병렬 버스, 시리얼 버스, GPIO(General Purpose Input Output), 유선랜/무선랜 등과 같은 근거리 통신 네트워크 등의 하나 이상의 조합으로 구성될 수 있다. The system bus / control bus 307 sends and receives data between blocks. The system bus / control bus 307 may be configured in one or more combinations of a parallel bus, a serial bus, a general purpose input output (GPIO), a local area communication network such as a wired LAN / wireless LAN,

CPU(309)는 복수의 CPU 코어(309-1)를 포함하여 메모리(303)나 대용량 저장 매체(305)에 저장되어 있는 프로그램을 수행할 수 있다. CPU(309)에 포함되는 각각의 CPU 코어(309-1)는 다른 CPU 코어(309-1)와 독립적으로 프로그램을 수행하고 메모리(303) 등을 액세스할 수 있다. CPU 코어(309-1) 각각은 CPU(309)에 할당된 태스크(이하 'CPU 태스크'라고도 함)를 수행할 수 있다. The CPU 309 can execute programs stored in the memory 303 or the mass storage medium 305, including a plurality of CPU cores 309-1. Each of the CPU cores 309-1 included in the CPU 309 can perform a program independently from the other CPU core 309-1 and access the memory 303 and the like. Each of the CPU cores 309-1 can perform a task (hereinafter, also referred to as a CPU task) assigned to the CPU 309. [

GPU(311)는 복수의 GPU 코어(311-1)를 포함하여 메모리(303)나 대용량 저장 매체(305)에 저장되어 있는 프로그램을 수행할 수 있다. GPU(311)에 포함되는 각각의 GPU 코어(311-1)는 다른 GPU 코어(311-1)와 독립적으로 프로그램을 수행하고 메모리(303) 등에 액세스할 수 있다. GPU 코어(311-1) 각각은 GPU(311)에 할당된 태스크(이하 'GPU 태스크'라고도 함)를 수행할 수 있다. The GPU 311 may execute a program stored in the memory 303 or the mass storage medium 305, including a plurality of GPU cores 311-1. Each of the GPU cores 311-1 included in the GPU 311 can independently execute programs and access to the memory 303 or the like with other GPU cores 311-1. Each of the GPU cores 311-1 can perform a task assigned to the GPU 311 (hereinafter also referred to as a GPU task).

이상에서 알 수 있는 바와 같이, 워커 노드(300)는 적어도 서로 다른 프로세싱 타입의 프로세싱 유닛을 가지고 각각의 이기종 프로세싱 유닛은 독립적으로 동일한 기능을 수행하는 태스크 프로그램을 로딩하여 수행할 수 있다. 한 유형의 프로세싱 타입의 프로세싱 유닛은 CPU(309)일 수 있고 다른 유형의 프로세싱 타입의 프로세싱 유닛은 GPU(311)일 수 있다. GPU 코어(311-1) 각각은 독립적으로 연산을 처리할 수 있는 복수의 마이크로 프로세싱 유닛을 포함하여 CPU 코어(309-1)에 비해 빠른 처리가 예상 가능하다. As can be seen from the above, the worker node 300 has at least processing units of different processing types, and each of the heterogeneous processing units can independently execute a task program that performs the same function. One type of processing type of processing unit may be the CPU 309 and another type of processing type of processing unit may be the GPU 311. Each of the GPU cores 311-1 includes a plurality of microprocessing units capable of independently performing operations, so that it is possible to expect faster processing than the CPU core 309-1.

CPU(309)나 GPU(311)는 메모리(303) 등에 포함된 각종 프로그램을 로딩하여 독립적으로 처리할 수 있고 할당된 태스크를 할당된 태스크에 맵핑되고 입력 파일로부터 분할된 스플릿에 대해 수행할 수 있다. The CPU 309 and the GPU 311 can load various programs included in the memory 303 or the like and can independently process them and perform the assigned task on the splits that are mapped to the assigned tasks and split from the input file .

워커 노드(300)에서 이루어지는 각종 처리와 흐름은 도 3 이하에서 좀 더 상세히 살펴보도록 한다. The various processes and flows in the worker node 300 will be described in more detail below with reference to FIG.

도 3은 워커 노드(300)의 기능 블록도를 도시한 도면이다. FIG. 3 is a functional block diagram of the worker node 300. FIG.

도 3에 따르면 워커 노드(300)는 도 2의 하드웨어 블록도 상에서 수행되는 노드 매니저(350)와 하나 이상의 태스크를 포함하고 특정 워커 노드는 애플리케이션 마스터를 더 포함한다. 동일한 잡을 처리하는 여러 워커 노드(300) 중 적어도 하나의 워커 노드(300)는 이 애플리케이션 마스터를 포함하여 다른 워커 노드(300)에 태스크를 할당할 수 있다. 이 애플리케이션 마스터는 마스터 노드(200)(의 리소스 매니저)로부터 할당 태스크를 수신하여 이를 자신의 또는 다른 워커 노드(300)로 분배할 수 있다. 여기서 워커 노드(300)의 기능 블록들은 하드웨어 블록의 부호와 구별하기 위해 350번 대의 번호로 표시한다. According to FIG. 3, the worker node 300 includes a node manager 350 and one or more tasks performed on the hardware block diagram of FIG. 2, and the specific worker node further includes an application master. At least one worker node 300 of the plurality of worker nodes 300 processing the same job may assign the task to another worker node 300 including this application master. This application master can receive an assignment task from (the resource manager of) the master node 200 and distribute it to its own or another worker node 300. Herein, the functional blocks of the worker node 300 are indicated by numbers of 350 to distinguish them from the codes of the hardware blocks.

도 3의 기능 블록도는 도 2의 하드웨어 블록도 상에서 수행되는 데 프로그램의 형태로 워커 노드(300)의 CPU(309)나 GPU(311) 상에서 수행된다. 이와 같이 워커 노드(300)는 도 2에 따른 하드웨어 블록 뿐 아니라 도 3에 따르는 소프트웨어(기능) 블록을 포함한다. The functional block diagram of FIG. 3 is performed on the hardware block diagram of FIG. 2 and is performed on the CPU 309 or the GPU 311 of the worker node 300 in the form of a program. Thus, the worker node 300 includes not only the hardware block according to FIG. 2 but also the software (function) block according to FIG.

도 3을 통해 워커 노드(300)의 기능을 살펴보면, 노드 매니저(350)는 워커 노드(300)를 관리한다. 노드 매니저(350)는 메모리(303)나 대용량 저장 매체(305)에 저장되어 있는 노드 매니저(350) 프로그램에 의해 구성되고 마스터 노드(200)에 의해서 워커 노드(300)에 할당된 태스크를 워커 노드(300)의 리소스 상태에 따라 CPU(309)나 GPU(311)에 할당하여 해당 태스크를 할당된 프로세싱 유닛에서 수행하도록 한다. 마스터 노드(200)에 의해서 할당된 태스크는 애플리케이션 마스터를 통해서 이 워커 노드(300)에 배당될 수 있다. Referring to FIG. 3, the node manager 350 manages the worker node 300. The node manager 350 sends a task assigned to the worker node 300 by the master node 200 to the worker node 300 constituted by the memory manager 303 or the node manager 350 program stored in the mass storage medium 305. [ To the CPU 309 or the GPU 311 in accordance with the resource status of the resource management unit 300 so that the corresponding task is executed in the assigned processing unit. Tasks assigned by the master node 200 may be assigned to this worker node 300 via the application master.

노드 매니저(350)는 GPU(311)의 사용률 및/또는 GPU(311)에서 동시 수행이 바람직하게 이루어질 수 있는 태스크의 개수를 나타내는 동시수행 희망개수에 기초하여 할당된 태스크를 CPU(309)나 GPU(311)에 할당한다. The node manager 350 may assign an assigned task to the CPU 309 or the GPU 311 based on the usage rate of the GPU 311 and / or the number of concurrently performing tasks, (311).

동적인 태스크 할당을 위해 노드 매니저(350)는 GPU 모니터(351)와 동적 스케쥴러(353)를 적어도 포함한다.For dynamic task allocation, the node manager 350 includes at least a GPU monitor 351 and a dynamic scheduler 353.

GPU 모니터(351)는 GPU(311)의 사용률(utilization)을 모니터하도록 구성된다. GPU 모니터(351)는 라이브러리 등을 통해서 GPU(311)의 사용률을 획득할 수 있다. 예를 들어 GPU 모니터(351)는 NVIDIA사의 관리 라이브러리(Management Library)를 통해 GPU 사용률을 획득할 수 있다. 라이브러리를 통해서 획득되는 GPU 사용률은 GPU(311)가 사용되는 시간에 대한 퍼센티지를 나타내거나 GPU(311) 내에서 GPU 코어(311-1)들의 사용률을 나타낼 수 있다. The GPU monitor 351 is configured to monitor the utilization of the GPU 311. The GPU monitor 351 can acquire the usage rate of the GPU 311 through a library or the like. For example, the GPU monitor 351 can acquire the GPU utilization rate through NVIDIA's Management Library. The GPU usage rate obtained through the library may represent a percentage of the time that the GPU 311 is used or may indicate the usage rate of the GPU cores 311-1 within the GPU 311. [

The GPU monitor 351 therefore obtains a GPU utilization that represents one of the following:

- the percentage of time the GPU 311 is in use,
- the percentage of the GPU cores 311-1 of the GPU 311 that are in use, or
- a combination of the two.
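The FSM compares a utilization value against its threshold; FIG. 6 calls this value avg_util. That value can be sketched as a sliding-window average of periodic samples. This is an illustration under stated assumptions. The window length and the `sample_fn` hook are hypothetical, and the description does not say how avg_util is averaged. On NVIDIA hardware the sampler could wrap NVML's `nvmlDeviceGetUtilizationRates`, whose `gpu` field reports the percentage of the last sample period during which a kernel was executing.

```python
from collections import deque
from typing import Callable, Deque


class GpuMonitor:
    """Keeps a sliding-window average (avg_util) of GPU utilization samples.

    `sample_fn` is a hypothetical callable returning the current GPU
    utilization in percent (0-100), e.g. a thin wrapper around an NVML query.
    """

    def __init__(self, sample_fn: Callable[[], float], window: int = 5) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._sample_fn = sample_fn
        self._samples: Deque[float] = deque(maxlen=window)

    def poll(self) -> float:
        """Take one sample and return the updated average."""
        util = float(self._sample_fn())
        if not 0.0 <= util <= 100.0:
            raise ValueError(f"utilization out of range: {util}")
        self._samples.append(util)  # oldest sample drops out automatically
        return self.avg_util

    @property
    def avg_util(self) -> float:
        """Average of the most recent samples; 0.0 before the first sample."""
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)
```

The GPU monitor would call `poll` on its own timer, and the scheduler would read `avg_util` whenever the FSM is triggered.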

The dynamic scheduler 353 assigns any task allocated by the master node 200 to the CPU 309 or the GPU 311. It bases the decision on the desired number of concurrent executions, which the dynamic scheduler 353 itself initializes and changes dynamically, and/or on the GPU utilization obtained by the GPU monitor 351.

The dynamic scheduler 353 includes a finite state machine (FSM), which assigns the tasks allocated to the worker node 300 to the CPU 309 or the GPU 311. The FSM is triggered in two ways:

- by an event related to an allocated task, namely the allocation of a new task through the master node 200 or the completion of an allocated task on the CPU 309 or the GPU 311;
- at a specified period, for example 100 ms, 500 ms or 1 s.

The dynamic scheduler 353 uses the FSM's variables to assign allocated tasks to the CPU 309 or the GPU 311. It first sets the maximum-concurrency variable (max_gpu) according to an assignment policy. This variable is the maximum number of tasks that can run simultaneously on the GPU 311. The scheduler then copies this maximum into the desired-concurrency variable (desired) when the FSM is initialized, when it is first configured, or in a particular state.

The maximum number of concurrent executions, which is set first, depends on two quantities: the total number of tasks allocated to the worker node 300 for the job of a given job request, and the CPU utilization of the tasks assigned to the GPU 311.

The current number of executing tasks is the number of tasks currently running on the GPU 311. The dynamic scheduler 353 initializes this variable when the FSM is initialized and increments it by 1 each time an allocated task is assigned to the GPU 311. When a task assigned to the GPU 311 finishes, the dynamic scheduler 353 decrements it by 1.

To obtain the maximum performance of the GPU 311, the dynamic scheduler 353 uses the relationship among the desired number of concurrent executions, the current number of executing tasks and/or the GPU utilization. It relies on this relationship both to assign tasks and to change or manage these variables.

For example, when the GPU utilization obtained through the GPU monitor 351 is equal to or greater than a specified (set) threshold, the FSM of the dynamic scheduler 353 changes the desired number of concurrent executions to a value based on the current number of tasks executing on the GPU 311. In one example, the dynamic scheduler 353 adds a specified number (e.g., 1) to the current number of executing tasks and uses the sum as the new desired number of concurrent executions.
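The bookkeeping for max_gpu, desired and cur_gpu described in the preceding paragraphs comes down to a few integer updates. This is a minimal sketch that assumes the variable names of FIG. 6. The `step` parameter stands for the "specified number" (1 in the example). No clamp to max_gpu is applied, because the description does not mention one. The condition that triggers `retarget` is left to the FSM.

```python
from dataclasses import dataclass


@dataclass
class GpuCounters:
    """Scheduler variables named after FIG. 6 (max_gpu, desired, cur_gpu)."""
    max_gpu: int        # maximum number of concurrent executions (from the policy)
    desired: int = 0    # desired number of concurrent executions
    cur_gpu: int = 0    # current number of tasks executing on the GPU

    def reset(self) -> None:
        """FSM initialization: desired starts at the policy maximum."""
        self.desired = self.max_gpu
        self.cur_gpu = 0

    def on_gpu_assign(self) -> None:
        """A task was placed on the GPU."""
        self.cur_gpu += 1

    def on_gpu_complete(self) -> None:
        """A GPU task finished."""
        if self.cur_gpu == 0:
            raise RuntimeError("no GPU task is running")
        self.cur_gpu -= 1

    def retarget(self, step: int = 1) -> None:
        """desired := cur_gpu + step (step = 1 in the described example)."""
        self.desired = self.cur_gpu + step
```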

This change of the desired number of concurrent executions happens during state transitions of the FSM and follows the GPU utilization and the number of tasks currently assigned to the GPU 311. The desired number of concurrent executions starts at the maximum number of concurrent executions given by the initial assignment policy. As tasks are assigned and executed, it then converges to the value that yields maximum performance.

The node manager 350 can force the currently allocated task onto the GPU 311 based on the number of tasks already executed for the requested job.

A GPU task (mapper) generally runs much faster than a CPU task. Toward the end of a requested job, the dynamic scheduling described above can therefore cause severe load imbalance.

Suppose a newly arrived split is the last split and its task is assigned to the CPU 309. The other tasks, or the GPU 311, must then wait for that task's long execution on the CPU 309 to finish.

To counter this load imbalance, the node manager 350 keeps a running count of allocated tasks (splits). Once that count exceeds a set threshold, the node manager 350 may forcibly assign every subsequently allocated task (split) to the GPU 311.
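The threshold rule can be sketched as a counter that is checked before each new split is counted. This is a minimal illustration that assumes a hypothetical `force_threshold` parameter, since the text only says "a set threshold". A value close to the worker's total split count N would be a natural choice, but that is also an assumption. The alternative criterion, Equation 1, is not modeled here.

```python
class ForcedGpuPolicy:
    """End-of-job rule: once more than `force_threshold` splits have been
    allocated to this worker, every later split is forced onto the GPU."""

    def __init__(self, force_threshold: int) -> None:
        if force_threshold < 0:
            raise ValueError("force_threshold must be non-negative")
        self.force_threshold = force_threshold
        self.allocated = 0  # splits allocated to this worker so far

    def on_new_split(self) -> bool:
        """Return True if this split must bypass the FSM and go to the GPU."""
        forced = self.allocated > self.force_threshold
        self.allocated += 1
        return forced
```

With a threshold of 3, the first four splits go through normal dynamic scheduling, and every split after that is sent straight to the GPU.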

Alternatively, when a new task (split) is allocated, the node manager 350 assigns it to the GPU 311 according to Equation 1 below.

Here, N is the total number of tasks (splits) to be processed on the worker node 300. b is the total number of tasks of the current job that have been assigned to and executed on the CPU 309. X is the number of GPU tasks configured to run simultaneously, and Y is the number of CPU tasks configured to run simultaneously.

P is the ratio of the execution times of the CPU 309 and the GPU 311, expressed for example as "CPU task execution time / GPU task execution time". There are two ways to obtain this ratio. It can be determined in advance, or the node manager 350 can measure CPU and GPU task execution times while tasks are assigned and executed and compute the ratio from those measurements.

For example, after assigning a new task to the CPU 309, the node manager 350 can record the task's start and end times to compute its execution time. It can also average the execution times of several CPU tasks. It does the same for tasks it assigns to the GPU 311, measuring each GPU task's start and end times and averaging over several GPU tasks. From these measured (average) CPU and GPU execution times, the node manager 350 can compute P and keep updating it.
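The measurement procedure just described can be sketched as a small tracker that records start and end times per task and reports P as the ratio of the two running averages. This is a sketch under stated assumptions. The task identifiers and the optional `now` argument, which makes the sketch testable without real timing, are hypothetical. A plain arithmetic mean is used because the text says the times are averaged but does not say how.

```python
import time
from typing import Dict, List, Optional, Tuple


class ExecTimeRatio:
    """Estimates P = (average CPU task time) / (average GPU task time)."""

    def __init__(self) -> None:
        self._durations: Dict[str, List[float]] = {"cpu": [], "gpu": []}
        self._running: Dict[str, Tuple[str, float]] = {}  # task id -> (unit, start)

    def start(self, task_id: str, unit: str, now: Optional[float] = None) -> None:
        """Record the start time of a task placed on `unit` ("cpu" or "gpu")."""
        if unit not in self._durations:
            raise ValueError(f"unknown unit: {unit}")
        t0 = time.monotonic() if now is None else now
        self._running[task_id] = (unit, t0)

    def finish(self, task_id: str, now: Optional[float] = None) -> None:
        """Record the end time and store the task's execution time."""
        unit, t0 = self._running.pop(task_id)
        t1 = time.monotonic() if now is None else now
        self._durations[unit].append(t1 - t0)

    def p(self) -> Optional[float]:
        """Current estimate of P, or None until both units have samples."""
        cpu, gpu = self._durations["cpu"], self._durations["gpu"]
        if not cpu or not gpu:
            return None
        avg_cpu = sum(cpu) / len(cpu)
        avg_gpu = sum(gpu) / len(gpu)
        return avg_cpu / avg_gpu if avg_gpu > 0 else None
```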

To balance the load, the node manager 350 can therefore forcibly assign a new task to the GPU 311 instead of the CPU 309 in either of two cases: the number of tasks already executed is equal to or greater than a specified threshold, or Equation 1 above, which depends on the number b of tasks already executed, is satisfied. When a new task is forced onto the GPU 311, the node manager 350 may skip running the dynamic scheduler 353 or its FSM.

The dynamic scheduler 353 is described in more detail below with reference to FIG. 5 and the subsequent figures.

The node manager 350 creates one or more tasks, and each task runs on its designated CPU 309 or GPU 311. A task (map task) assigned to the CPU 309 runs on a CPU core 309-1, and a task assigned to the GPU 311 runs on a GPU core 311-1. The worker node 300 may have several containers, each able to instantiate a fixed number of tasks. A container may be, for example, an object implemented in software. When a container holds no task, the node manager 350 can instantiate a new task in it and assign that task to the CPU 309 or the GPU 311.

A task assigned to the CPU 309 runs on a CPU core 309-1 of the CPU 309, and a task assigned to the GPU 311 runs on a GPU core 311-1 of the GPU 311.

FIG. 5 illustrates an exemplary dynamic scheduling process for the tasks that carry out a requested job.

Each step of FIG. 5 is performed by a node 10, preferably by a program running on the CPU 309 or the GPU 311 of the node 10.

First, the master node 200, which is connected to one or more client nodes 100, receives a job processing request from a client node 100 over a communication network (S101).

For example, the resource manager of the master node 200 may receive a job processing request for a particular function from a client node 100 connected over the communication network. The requested job is one that can be performed in parallel or independently over a large file. Examples include independent computation on each data item of a large file, statistical analysis and video processing.

On receiving the job processing request, the resource manager of the master node 200 determines which worker nodes 300 are available in the distributed processing system. For each selected worker node 300, it then determines the number of tasks to allocate and the splits of the input (large) file to be processed (S103).

Each task corresponds to exactly one split and processes that split. Allocating a split therefore means allocating a task and vice versa, so the two terms are used interchangeably.

The resource manager of the master node 200 distributes the tasks and their corresponding splits over the communication network to the worker nodes 300 that will process the job (S105).

For example, the resource manager may ask the application master, which runs on one of the worker nodes 300, to process the divided tasks. The application master then works with the node manager 350 of each worker node 300 to allocate tasks to that worker node.

The node manager 350 of the worker node 300 then assigns each allocated task to the first processing unit (the CPU 309) or the second processing unit (e.g., the GPU 311) (see S107 to S119).

As tasks are allocated, the worker node 300 can assign each newly arrived task to the first or the second processing unit. Before doing so, the node manager 350 initializes the variables used for dynamic scheduling (S107).

For example, the node manager 350 (or the dynamic scheduler 353) first sets the preset maximum-concurrency variable (max_gpu in FIG. 6). This variable is the maximum number of tasks that can run simultaneously on the second processing unit.

The maximum number of concurrent executions can be set differently depending on the assignment policy. Consider worker nodes 300 of a distributed processing system such as the Hadoop framework that have a first processing unit, such as the CPU 309, and a second processing unit of a different type, such as the GPU 311. On such nodes, performance is maximized only when both types of processing unit are exploited as fully as possible.

Assigning too many tasks to the second processing unit degrades performance through contention. Assigning too many tasks to the first processing unit degrades performance as well.

A major obstacle in the Hadoop framework to fully exploiting processing units of several types is that the number of tasks allocated to each worker node 300 is fixed in advance (see S103) and cannot be changed at run time. The scheduler must therefore comply with Hadoop's task partitioning policy. It does so by first setting the maximum number of concurrent executions according to the assignment policy. It then adjusts that maximum dynamically, or a variable derived from it, according to the monitored performance; this value governs how many tasks go to the second or the first processing unit.

In general, one of the two processing units delivers higher performance than the other. Hereinafter, "GPU" and "second processing unit" are used interchangeably, as are "CPU" and "first processing unit". For example, tasks on the GPU 311 have much higher throughput than tasks on the CPU 309. The dynamic scheduler 353 of the node manager 350 therefore sets the maximum number of concurrent executions so that, where possible, more tasks are initially assigned to the GPU 311 than to the CPU 309.

Four assignment policies are proposed for setting the maximum number of concurrent executions:

1) Dynamic-preProfile: The best number of GPU tasks is found before the job runs on a distributed processing system such as Hadoop. For example, the job is run beforehand while the number of GPU tasks X is increased, and the best value is kept.
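The profiling step of Dynamic-preProfile can be sketched as a sweep over candidate values of X that keeps the fastest one. This is an illustration, not the procedure used in the description. `run_trial` is a hypothetical hook that runs the job, or a sample of it, with a given number of concurrent GPU tasks and returns the elapsed time. The sweep visits every candidate instead of stopping early, which is a design choice of this sketch.

```python
from typing import Callable, Iterable, Optional, Tuple


def pre_profile(run_trial: Callable[[int], float],
                candidates: Iterable[int]) -> Tuple[int, float]:
    """Try each GPU task count X and return (best_x, best_time).

    `run_trial(x)` is a hypothetical hook returning the elapsed time in
    seconds when the job runs with x concurrent GPU tasks.
    """
    best_x: Optional[int] = None
    best_time = float("inf")
    for x in candidates:
        elapsed = run_trial(x)
        if elapsed < best_time:  # strict '<' keeps the smallest X on ties
            best_x, best_time = x, elapsed
    if best_x is None:
        raise ValueError("candidates must not be empty")
    return best_x, best_time
```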

The number of CPU tasks Y is then obtained from Equation 2 below.

Here, TC is the total CPU capacity of the worker node 300 expressed as a utilization percentage (e.g., 1200 for 12 CPU cores 309-1), and GC is the CPU utilization of one GPU task. A GPU task also consumes CPU time, because it drives the GPU 311 through a kernel launched from the CPU 309. X + Y may represent the total number N of tasks allocated to the worker node 300. The master node 200 determines and fixes N in step S103.

2) Dynamic-maxStart1: The number of simultaneously executable GPU tasks X is set to its maximum value according to Equation 3 below, and Y is set to the value obtained from Equation 2.

Here, TotalGPUmem is the total memory capacity of the GPU 311, and MapTaskmem is the memory required by one task (map task).
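Equation 3 itself is not reproduced in this text. Its definitions suggest that the maximum X is the largest whole number of map tasks whose combined memory fits in GPU memory. The sketch below assumes that reading, X = floor(TotalGPUmem / MapTaskmem). The function name is hypothetical, and the form of the equation is an assumption.

```python
def max_gpu_tasks(total_gpu_mem: int, map_task_mem: int) -> int:
    """Largest number of map tasks whose combined memory fits on the GPU.

    Assumed reading of Equation 3: X = floor(TotalGPUmem / MapTaskmem).
    Both arguments must use the same unit (e.g. MiB).
    """
    if map_task_mem <= 0:
        raise ValueError("map_task_mem must be positive")
    if total_gpu_mem < 0:
        raise ValueError("total_gpu_mem must be non-negative")
    return total_gpu_mem // map_task_mem
```

For example, a GPU with 12,288 MiB of memory and map tasks that each need 1,500 MiB gives X = 8 under this reading.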

If X is larger than the optimal value, the dynamic scheduler 353 later decreases X and increases Y until both converge to the optimum. Likewise, if X is smaller than the optimal value, the dynamic scheduler 353 later increases it until it converges to the optimum.

3) Dynamic-maxStart2: X is set in the same way as in Dynamic-maxStart1. Y, however, is set to 0 instead of following Equation 2, and increases as X later decreases. This policy performs similarly to using the GPU 311 alone but does not fully exploit the potential of hybrid execution on the CPU 309 and the GPU 311.

4) Dynamic-conservative: Instead of starting X at its maximum, X is set to the minimum value satisfying Equation 4 and then increased dynamically.

GPU utilization does not grow in proportion to X; it saturates nonlinearly. Under this policy, X starts from the value given by Equation 4 and is increased at run time while Y is decreased.

X is determined by one of the assignment policies above, preferably 2) or 3). In step S107, the resulting value of X initializes the maximum number of concurrent executions.

As these policies show, the maximum-concurrency variable (the maximum number of tasks that can run simultaneously on the GPU 311) depends on two inputs: the total number N of tasks allocated to the worker node 300, and the utilization (CPU utilization and/or GPU utilization) of the tasks assigned to the GPU 311.

During variable initialization (S107), the node manager 350, through its dynamic scheduler 353, also sets the threshold against which GPU utilization is compared (upper in FIG. 6). The node manager 350 also initializes to 0 the variable holding the current number of tasks executing on the GPU 311 (cur_gpu in FIG. 6).

After setting the maximum number of concurrent executions, the node manager 350, through its dynamic scheduler 353, copies that value into the desired-concurrency variable (desired in FIG. 6).

The desired number of concurrent executions may be set within the FSM, in its initial state (EMPTY in FIG. 6). It starts at the maximum number of concurrent executions determined by the initial assignment policy and later changes in response to GPU utilization.

After the variables are initialized, the node manager 350 watches for tasks allocated by the master node 200 through the application master (S109).

When a new task is input (allocated) through the master node 200, the node manager 350 determines whether it could cause load imbalance between the CPU 309 and the GPU 311 (S111).

For example, the node manager 350 concludes that load imbalance could occur if the number of tasks (splits) already allocated exceeds a set threshold or if Equation 1 is satisfied. In that case it skips steps S113 to S117 and proceeds directly to step S119.

Having detected a load imbalance condition, the node manager 350 forcibly assigns the current task to the GPU 311, the second processing unit (S119). Because of the forced assignment, steps S113 to S117 are not performed, and the FSM that implements steps S113 and S117 may be suspended.

In this way, the node manager 350 forcibly assigns the allocated task to the GPU 311 based on the number of tasks already executed on the worker node 300.

If it determines that no load imbalance will occur, the node manager 350 assigns the task just received through the master node 200 to the CPU 309 (the first processing unit) or the GPU 311 (the second processing unit). The choice depends on the desired number of concurrent executions for the GPU 311 (S113).

For example, the dynamic scheduler 353 of the node manager 350 has an FSM that assigns new tasks to the CPU 309 or the GPU 311. The FSM is triggered by task-related events or at a specified period (e.g., 100 ms). A task-related event may be the allocation of a new task or the completion of an assigned task.

The FSM can control and perform step S113 as well as step S117.

The FSM of the dynamic scheduler 353 (see FIG. 6) works as follows. On initialization it enters the EMPTY state, and its variables are initialized on entry to, or within, the EMPTY state (S107). For example, on entering EMPTY, the maximum number of concurrent executions, the desired number of concurrent executions and the current number of executing tasks can be reset to their configured values.

In the EMPTY state, the dynamic scheduler 353 assigns each newly allocated task to the GPU 311 rather than the CPU 309. It then updates the current number of executing tasks (the number of tasks running simultaneously on the GPU 311), for example by incrementing it by 1. After each update, the current number of executing tasks is compared with the maximum or desired number of concurrent executions. If it is equal to or greater than that value, the FSM moves to the TRANSITION1 state. On this transition, the desired number of concurrent executions may be set to the maximum number of concurrent executions, as shown in FIG. 6.

TRANSITION1 is a temporary state in which no new task can be assigned to the GPU 311 (avail = false). While avail = false, the dynamic scheduler 353 assigns new tasks to the CPU 309.

The GPU monitor 351 samples the GPU utilization (avg_util) periodically, and sampling continues after tasks have been assigned to the CPU 309 or the GPU 311 (S115).

In the TRANSITION1 state, once GPU utilization exceeds the set threshold (upper), the FSM moves to the FULL state. TRANSITION1 exists to cover the delay between assigning GPU tasks and seeing their effect in later utilization readings. In the FULL state, avail is false, and the dynamic scheduler 353 assigns every newly allocated task to the CPU 309.

In every state, whenever a task assigned to the GPU 311 completes, the current number of tasks executing on the GPU 311 is decremented by 1.

In the FULL state no tasks are sent to the GPU 311, so GPU utilization eventually falls. When it reaches or drops below the set threshold (avg_util <= upper), the FSM of the dynamic scheduler 353 changes the current desired number of concurrent executions (S117).

In the FULL state, the dynamic scheduler 353 no longer assigns new tasks to the GPU 311. As tasks already assigned to the GPU 311 complete, the current number of executing tasks is decremented by 1 and kept up to date.

In the FULL state, when GPU utilization falls to or below the set threshold, the FSM of the dynamic scheduler 353 sets the desired number of concurrent executions to the current number of executing tasks plus a specified number, e.g. 1 (set desired = cur_gpu + 1 in FIG. 6). It then moves to the TRANSITION2 state.

The move from FULL to TRANSITION2 lets the dynamic scheduler 353 assign GPU tasks again, and in doing so it changes the desired number of concurrent executions dynamically. Through this adaptation of the desired number of concurrent executions, the worker node 300 can use the GPU 311 at its maximum performance.

The new desired number of concurrent executions is the number of tasks currently running simultaneously plus 1. That is, it is one more than the current number of executing tasks at the moment GPU utilization falls to or below the upper limit (threshold).

As a result, at least one more GPU task can be assigned beyond the count at which utilization first dropped below the threshold after the FULL state, and the GPU can be used at its maximum performance.

In the TRANSITION2 state, GPU tasks can be assigned, and the FSM moves to the UNDER state or the TRANSITION1 state depending on the current number of executing tasks.
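The state machine described in the preceding paragraphs can be sketched as follows. This is one reading of the text, not the implementation behind FIG. 6. The sketch assumes that UNDER accepts GPU tasks in the same way as EMPTY. Transitions the text does not describe are omitted, for instance leaving TRANSITION1 when utilization never exceeds upper. Each public method corresponds to one FSM trigger: a new task allocation, a GPU task completion, or a periodic utilization reading. The forced-assignment path (S111, S119) is assumed to be handled by the caller before `assign` is invoked.

```python
from enum import Enum, auto


class State(Enum):
    EMPTY = auto()
    UNDER = auto()
    TRANSITION1 = auto()
    FULL = auto()
    TRANSITION2 = auto()


class DynamicScheduler:
    """Sketch of the FIG. 6 FSM as described in the text (illustrative only)."""

    # States in which a new task may be placed on the GPU (avail = true).
    _AVAIL = {State.EMPTY, State.UNDER, State.TRANSITION2}

    def __init__(self, max_gpu: int, upper: float) -> None:
        self.max_gpu = max_gpu  # maximum number of concurrent executions
        self.upper = upper      # GPU utilization threshold, in percent
        self.reset()

    def reset(self) -> None:
        """S107: entering EMPTY initializes the scheduling variables."""
        self.state = State.EMPTY
        self.desired = self.max_gpu
        self.cur_gpu = 0

    @property
    def avail(self) -> bool:
        return self.state in self._AVAIL

    def assign(self) -> str:
        """S113: place one newly allocated task; returns "gpu" or "cpu"."""
        if not self.avail:
            return "cpu"
        self.cur_gpu += 1
        if self.cur_gpu >= self.desired:
            # Stop feeding the GPU until the utilization reading catches up.
            # (desired already equals max_gpu when leaving EMPTY after reset.)
            self.state = State.TRANSITION1
        elif self.state is State.TRANSITION2:
            self.state = State.UNDER
        return "gpu"

    def complete_gpu(self) -> None:
        """A GPU task finished (in any state): cur_gpu decreases by 1."""
        if self.cur_gpu == 0:
            raise RuntimeError("no GPU task is running")
        self.cur_gpu -= 1

    def on_util(self, avg_util: float) -> None:
        """S115/S117: react to a new utilization reading from the GPU monitor."""
        if self.state is State.TRANSITION1 and avg_util > self.upper:
            self.state = State.FULL
        elif self.state is State.FULL and avg_util <= self.upper:
            self.desired = self.cur_gpu + 1  # set desired = cur_gpu + 1
            self.state = State.TRANSITION2
```

Each time utilization falls back to or below `upper`, `desired` is reset to `cur_gpu + 1`. Repeated FULL/TRANSITION2 cycles therefore move it toward the largest GPU concurrency that the threshold tolerates, instead of leaving it at the initial policy value.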

In this way, the FSM is configured at least to initialize its variables and to perform steps S113 and S117.

The node manager 350 periodically monitors the GPU utilization through the GPU monitor 351, and the dynamic scheduler 353 triggers the FSM upon a new task assignment, upon the termination of an assigned task, or at a specified period, so that a new task can be assigned to the CPU 309 or the GPU 311.
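The triggering described above can be sketched as an event handler. This is a simplified stand-in, not the patented FSM: the four states are collapsed into a comparison of the running GPU count with the desired count, the GPU monitor is modeled as a hypothetical callback `read_gpu_util`, and the event names `"new_task"`, `"task_end"`, and `"tick"` (the periodic trigger) are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class DynamicScheduler:
    """Simplified, event-driven stand-in for dynamic scheduler 353 (hypothetical names)."""
    read_gpu_util: Callable[[], float]  # latest sample from the GPU monitor, 0.0-1.0
    desired: int                        # desired number of concurrent GPU tasks
    threshold: float = 0.9              # assumed utilization threshold
    cur_gpu: int = 0                    # GPU tasks currently running
    placement: Dict[str, str] = field(default_factory=dict)  # task id -> "GPU"/"CPU"

    def _maybe_raise_target(self) -> None:
        # GPU filled to its target but at or below the threshold: allow one more task.
        if self.cur_gpu >= self.desired and self.read_gpu_util() <= self.threshold:
            self.desired = self.cur_gpu + 1

    def on_event(self, kind: str, task_id: Optional[str] = None) -> None:
        """Handle one trigger: 'new_task', 'task_end', or the periodic 'tick'."""
        self._maybe_raise_target()  # every trigger re-checks the GPU target
        if kind == "new_task" and task_id is not None:
            device = "GPU" if self.cur_gpu < self.desired else "CPU"
            self.placement[task_id] = device
            if device == "GPU":
                self.cur_gpu += 1
        elif kind == "task_end" and task_id is not None:
            if self.placement.pop(task_id, None) == "GPU":
                self.cur_gpu -= 1  # frees a GPU slot for the next new task
        # 'tick' needs no extra work: the target check above is all it triggers.

util = [0.95]  # mutable cell so the demo can change what the "monitor" reports
s = DynamicScheduler(read_gpu_util=lambda: util[0], desired=2)
for t in ("t1", "t2", "t3"):
    s.on_event("new_task", t)
print(s.placement)  # → {'t1': 'GPU', 't2': 'GPU', 't3': 'CPU'}
util[0] = 0.50      # GPU monitor now reports headroom
s.on_event("tick")
s.on_event("new_task", "t4")
print(s.desired, s.placement["t4"])  # → 3 GPU
```

While the GPU is busy (95%), the third task spills to the CPU; once a periodic tick sees utilization at 50%, the target rises to 3 and the next task goes to the GPU.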

Steps S109 to S119 are repeated multiple times as new tasks are assigned.

The present invention described above can be variously substituted, modified, and changed by those of ordinary skill in the art to which the invention pertains without departing from its technical spirit, and is therefore not limited to the above-described embodiments and the accompanying drawings.

10: Node
100: Client node
200: Master node
300: Worker node
301: Communication interface
303: Memory
305: Mass storage medium
307: System bus / control bus
309: CPU
309-1: CPU core
311: GPU
311-1: GPU core
350: Node manager
351: GPU monitor
353: Dynamic scheduler
370: Task

Claims

1. A task scheduling method considering heterogeneous processing types, comprising:
(b) dynamically allocating, by a worker node, a task assigned to the worker node by a master node to a first processing unit or a second processing unit based on a desired number of concurrent executions, which is set to indicate the number of tasks that can be performed simultaneously in the second processing unit;
(c) monitoring, by the worker node, a utilization rate of the second processing unit; and
(d) changing, by the worker node, the desired number of concurrent executions based on a comparison of the utilization rate with a specified threshold.

2. The task scheduling method of claim 1, further comprising, before step (b), setting, by the worker node, a maximum number of concurrent executions to be set for the second processing unit as the desired number of concurrent executions,
wherein step (b) updates a current number of tasks being performed in the second processing unit according to the task assignment.

3. The task scheduling method of claim 2,
wherein step (d) changes the desired number of concurrent executions to the sum of the current number of tasks being performed and a specified number when the utilization rate is equal to or greater than the specified threshold.

4. The task scheduling method of claim 2, further comprising, before the step of setting the desired number of concurrent executions, setting, by the worker node, the maximum number of concurrent executions,
wherein the maximum number of concurrent executions of the second processing unit is set based on the total number of tasks assigned to the worker node and the first-processing-unit utilization rate of a task to be allocated to the second processing unit.

5. The task scheduling method of claim 1,
wherein steps (b) and (d) are performed by a finite state machine (FSM) in a dynamic scheduler of a node manager running on the worker node, and
wherein the FSM is triggered by an event associated with the task to be assigned.

6. The task scheduling method of claim 5,
wherein steps (b) to (d) are performed a plurality of times,
the method further comprising, before step (b): assigning, by the master node, one or more tasks to the worker node; and forcibly assigning, by the worker node, the currently assigned task to the second processing unit based on the number of tasks the worker node has already performed,
wherein the FSM implementing steps (b) and (d) is performed according to whether the task has been forcibly assigned to the second processing unit.

7. The task scheduling method of claim 1,
wherein the first processing unit is a CPU comprising a plurality of CPU cores, and
wherein the second processing unit is a GPU comprising a plurality of GPU cores.

8. A distributed processing system considering heterogeneous processing types, comprising at least one worker node for processing tasks,
wherein each worker node comprises:
a CPU comprising a plurality of CPU cores; a GPU comprising a plurality of GPU cores; and
a node manager that dynamically allocates a task, which can be performed by the CPU or the GPU and is assigned to the worker node by a master node, to the CPU or the GPU based on a desired number of concurrent executions set to indicate the number of tasks that can be performed simultaneously in the GPU,
wherein the node manager changes the desired number of concurrent executions used for task assignment to the GPU based on a comparison of the utilization rate of the GPU with a specified threshold.

9. The distributed processing system of claim 8,
wherein the node manager comprises:
a GPU monitor that monitors the utilization rate of the GPU; and
a dynamic scheduler that allocates an assigned task to the CPU or the GPU based on the desired number of concurrent executions in the GPU and the utilization rate of the GPU.

10. The distributed processing system of claim 9,
wherein the dynamic scheduler sets a maximum number of concurrent executions to be set for the GPU as the desired number of concurrent executions, and updates the current number of tasks being performed on the GPU according to the task assignment.

11. The distributed processing system of claim 10,
wherein the dynamic scheduler changes the desired number of concurrent executions to the sum of the current number of tasks being performed and a specified number when the utilization rate of the GPU is equal to or greater than the specified threshold.

12. The distributed processing system of claim 10,
wherein the dynamic scheduler sets the maximum number of concurrent executions before setting the desired number of concurrent executions, and
wherein the maximum number of concurrent executions is set based on the total number of tasks assigned to the worker node and the CPU utilization rate of a task to be allocated to the GPU.

13. The distributed processing system of claim 9,
wherein the dynamic scheduler comprises an FSM for allocating tasks to the CPU or the GPU, and
wherein the FSM is triggered by an event associated with the task to be assigned.

14. The distributed processing system of claim 13, further comprising a master node that determines tasks for processing an input job and assigns the determined one or more tasks to the at least one worker node via a communication network,
wherein the node manager of the worker node forcibly assigns a currently assigned task to the GPU based on the number of tasks already performed.
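Claims 4 and 12 name the inputs for the maximum number of concurrent GPU executions (the total number of tasks assigned to the worker node and the CPU utilization of a GPU-bound task) but not the formula. The sketch below uses one plausible rule purely as a labeled assumption: each GPU task still occupies a fixed share of the worker's total CPU capacity for host-side work, so the CPU can feed at most floor(1 / share) GPU tasks, and the bound never exceeds the tasks actually assigned. The function name `max_gpu_concurrency` is hypothetical.

```python
import math

def max_gpu_concurrency(total_tasks: int, cpu_share_per_gpu_task: float) -> int:
    """Hypothetical upper bound on concurrent GPU tasks (claims 4 and 12 name only the inputs).

    Assumption: each GPU task uses `cpu_share_per_gpu_task` of the worker's total CPU
    capacity (0 < share <= 1), so the CPU can drive at most floor(1 / share) GPU tasks.
    """
    if total_tasks <= 0:
        return 0
    if not 0 < cpu_share_per_gpu_task <= 1:
        raise ValueError("cpu_share_per_gpu_task must be in (0, 1]")
    return min(total_tasks, math.floor(1 / cpu_share_per_gpu_task))

print(max_gpu_concurrency(10, 0.25))  # → 4
```

Under claims 2 and 10, a value like this would then serve as the initial desired number of concurrent executions, which the FSM later adjusts from the measured GPU utilization.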