KR101953906B1

KR101953906B1 - Apparatus for scheduling task

Info

Publication number: KR101953906B1
Application number: KR1020160044176A
Authority: KR
Inventors: 배유석; 박종열
Original assignee: 한국전자통신연구원
Priority date: 2016-04-11
Filing date: 2016-04-11
Publication date: 2019-06-12
Also published as: KR20170116439A

Abstract

A task scheduling apparatus is provided. The apparatus includes: a data storage unit that stores input data; a task management unit that generates and schedules tasks so that the data stored in the data storage unit is allocated to a CPU processor or a GPU processor according to an initial value determined through unit testing or according to workload distribution information; a profile management unit that stores and manages performance profile information, including system status information or history information of tasks processed on the CPU processor or GPU processor through the task management unit; and a workload distribution unit that analyzes the performance profile information stored and managed by the profile management unit, dynamically adjusts the ratio and workload of tasks to be allocated to the CPU processor or GPU processor, generates workload distribution information, and delivers it to the task management unit.

Description

APPARATUS FOR SCHEDULING TASK

The present invention relates to a task scheduling method and apparatus, and more specifically to a task scheduling method and apparatus that schedule tasks so that CPU resources and GPU resources are used simultaneously.

Recently, many-core systems that integrate a large number of cores on a single chip have become widespread, and many-core systems in which heterogeneous processor cores, such as CPU processors and GPU processors, are mounted together have also been developed. In addition, systems in which multiple chips rather than a single chip are connected in parallel, and systems in which several such many-core chips are installed and connected in parallel, have been developed.

In systems that use this many cores or chips, or heterogeneous processors, research is under way on using each core or processor efficiently so that the whole system delivers its maximum performance.

Among the existing approaches to efficient processing on heterogeneous processors, that is, CPUs and GPUs, one performs parallel processing of the same operation using multiple CPUs. In another, the CPU processor takes no part in the computation: it simply calls the GPU processor, requests the processing, and waits to receive the result. In the latter approach GPU utilization is high, but the CPU processor spends more time idle, which limits the overall performance gain.

Moreover, as described above, many-core systems and the like contain multiple CPU processors and GPU processors, so research is needed on raising the utilization of all of these different processors.

The present invention was created to meet this need. Its object is to provide a task scheduling apparatus and method for efficiently using the resources of each processor (CPU processor, GPU processor) in a system or cluster system that includes multiple CPU processors and multiple GPU processors.

Other objects and advantages of the present invention are described below and will be understood from the embodiments of the invention. The objects and advantages of the invention can also be realized by the means and combinations set out in the appended claims.

To achieve the above object, a task scheduling apparatus according to one aspect of the present invention includes: a data storage unit that stores input data; a task management unit that generates and schedules tasks so that the data stored in the data storage unit is allocated to a CPU processor or a GPU processor according to preset control information or workload adjustment information; a profile management unit that stores and manages performance profile information, including system status information or history information of tasks processed on the CPU processor or GPU processor through the task management unit; and a workload distribution unit that analyzes the performance profile information stored and managed by the profile management unit, dynamically adjusts the ratio and workload of tasks to be allocated to the CPU processor or GPU processor, generates workload distribution information, and delivers it to the task management unit.

A task scheduling method according to another aspect of the present invention includes: storing and managing, by a profile management unit, performance profile information including system status information or history information of tasks processed on a CPU processor or a GPU processor; analyzing, by a workload distribution unit, the performance profile information to generate workload distribution information that dynamically adjusts the ratio and workload of tasks to be allocated to the CPU processor or GPU processor, and delivering it to a task management unit; and generating, by the task management unit, tasks for allocating the data stored in a data storage unit to the CPU processor or GPU processor for processing, and scheduling the generated tasks based on the workload distribution information.

According to the present invention, the load of data processing is distributed appropriately in consideration of the characteristics of the different processors (memory size, number of cores, memory bandwidth, and so on) and their processing performance. The resources of every processor can therefore be used simultaneously and as efficiently as possible, which improves overall system performance.

In addition, the task scheduling apparatus of the present invention provides a profile management function and a workload distribution function, and schedules tasks by dynamically adjusting the ratio and workload of the tasks allocated to the CPU processor and the GPU processor. This raises the utilization of system resources and improves the performance of the whole system.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description below, serve to further explain the technical idea of the invention. The invention should therefore not be construed as limited to what is shown in the drawings.
FIG. 1 is a diagram showing the main components of a task scheduling apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of a task scheduling apparatus according to an embodiment of the present invention.

The advantages and features of the present invention, and the ways of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure is complete and so that a person of ordinary skill in the art to which the invention belongs can readily understand the scope of the invention, which is defined by the claims. The terms used in this specification are for describing the embodiments and are not intended to limit the invention. In this specification, singular forms include plural forms unless specifically stated otherwise. As used in this specification, "comprises" and "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

FIG. 1 is a diagram showing the main components of a task scheduling apparatus according to an embodiment of the present invention.

The task scheduling apparatus 100 according to an embodiment of the present invention distributes the load of data processing appropriately in a system that includes multiple processors, taking into account the characteristics of each processor (memory size, number of cores, memory bandwidth, and so on) and its processing performance. The resources of every processor can thus be used simultaneously and as efficiently as possible, improving the performance of the whole system. Examples of the processors include a CPU processor (central processing unit) and a GPU processor (graphics processing unit); various other processor units capable of data processing may also be used.

As shown in FIG. 1, the task scheduling apparatus 100 according to the present invention includes a data storage unit 110, a task management unit 120, a profile management unit 130, and a workload distribution unit 140.

The data storage unit 110 stores and manages incoming data 210 and, at the request of the task management unit 120, delivers the data it holds. The data storage unit 110 may be configured as a distributed environment in order to handle large volumes of incoming data 210. By replicating data across multiple systems, it can also keep data safe without loss even when a network error or system failure occurs.

In addition, in case multiple tasks of the task management unit 120 access it at the same time, the data storage unit 110 may be equipped to implement a mutex, a semaphore, a reader-writer lock, or a similar mechanism.

With this mechanism, even when multiple tasks attempt access at the same time, only permitted tasks are allowed to access the data storage unit 110. This guarantees data integrity and, in turn, data stability.
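The access control described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `GuardedStore` and the helper `drain` are hypothetical names, and a plain mutex (`threading.Lock`) stands in for whichever of the three mechanisms an implementation chooses.

```python
import threading
from collections import deque

class GuardedStore:
    """Hypothetical data store: a mutex lets one thread at a time take an item."""
    def __init__(self, items):
        self._items = deque(items)
        self._lock = threading.Lock()  # a semaphore or reader-writer lock could be used instead

    def take(self):
        # Only the thread holding the lock may remove an item.
        with self._lock:
            return self._items.popleft() if self._items else None

def drain(store, out):
    # Each worker keeps taking items until the store is empty.
    while (item := store.take()) is not None:
        out.append(item)

store = GuardedStore(range(1000))
taken = [[] for _ in range(4)]
threads = [threading.Thread(target=drain, args=(store, part)) for part in taken]
for t in threads:
    t.start()
for t in threads:
    t.join()
all_items = sorted(x for part in taken for x in part)
```

Because every removal happens under the lock, each item is handed to exactly one worker, which is the integrity property the paragraph describes.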

The data storage unit 110 may also allow a threshold to be set, and may be operated so that its storage space is expanded when the incoming or stored data 210 exceeds that threshold.

Alternatively, the data storage unit 110 may be organized as a circular queue with a fixed buffer size, and may reuse its storage space by adjusting the indices that identify the data as data enters or leaves.

In this way, the physical storage space of the data storage unit 110 can be used more efficiently.
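A minimal sketch of the circular-queue variant, assuming a fixed capacity and head/tail indices that wrap around. The class and method names are illustrative and do not come from the patent.

```python
class CircularQueue:
    """Fixed-size buffer whose slots are reused by wrapping the indices."""
    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._head = 0   # index of the next item to read
        self._tail = 0   # index of the next free slot
        self._size = 0

    def put(self, item):
        if self._size == len(self._buf):
            raise OverflowError("queue full")
        self._buf[self._tail] = item
        self._tail = (self._tail + 1) % len(self._buf)  # wrap around
        self._size += 1

    def get(self):
        if self._size == 0:
            raise IndexError("queue empty")
        item = self._buf[self._head]
        self._head = (self._head + 1) % len(self._buf)  # freed slot becomes reusable
        self._size -= 1
        return item

q = CircularQueue(3)
for n in (1, 2, 3):
    q.put(n)
first = q.get()      # frees slot 0
q.put(4)             # reuses slot 0
rest = [q.get(), q.get(), q.get()]
```

No memory is allocated after construction; only the two indices move.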

The task management unit 120 creates processes and threads so that tasks 220 can run on the CPU (Central Processing Unit) processor or GPU (Graphics Processing Unit) processor installed in the system, and processes the data that has entered the data storage unit 110. The task management unit 120 creates multiple threads within a process and uses them to process the input data concurrently.

Here, a process is the unit of work of a program that is loaded into main memory and executed, and a thread is a unit of execution flow within a process; threads share the resources of their process and work concurrently in parallel. In other words, one process creates multiple threads and performs parallel processing with them. The task management unit 120 can therefore run multiple threads at the same time, depending on the system or program environment, and can be implemented with a multi-threaded execution model as in this embodiment.

That is, to process tasks 220 on the CPU processor and the GPU processor at the same time, the task management unit 120 creates multiple threads for each processor and handles task scheduling, including the associated synchronization such as starting, terminating, and waiting.

In other words, when no data is entering the data storage unit 110, the threads that process tasks 220 remain in a waiting state. When data enters the data storage unit 110, the waiting threads are woken, fetch the incoming data, and process the tasks 220.
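The wait-and-wake behavior can be sketched with a condition variable. This is an illustrative assumption about one possible mechanism; `WaitingStore`, `close`, and `worker` are hypothetical names, and squaring a number stands in for a real task.

```python
import threading
from collections import deque

class WaitingStore:
    """Hypothetical store: workers sleep while it is empty and are woken on input."""
    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()
        self._closed = False

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()          # wake one waiting worker

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()      # let every worker finish

    def get(self):
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()        # waiting state: no data available
            return self._items.popleft() if self._items else None

def worker(store, results, lock):
    while (item := store.get()) is not None:
        with lock:
            results.append(item * item)  # stand-in for processing a task

store, results, lock = WaitingStore(), [], threading.Lock()
workers = [threading.Thread(target=worker, args=(store, results, lock)) for _ in range(3)]
for w in workers:
    w.start()
for n in range(10):
    store.put(n)
store.close()
for w in workers:
    w.join()
```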

Furthermore, as described above, when multiple threads try to fetch data at the same time, only the threads permitted through the mutex, semaphore, reader-writer lock, or similar mechanism defined in the data storage unit 110 fetch the data and process the tasks 220, which guarantees the integrity and safety of the data.

The task management unit 120 may also adjust the moment at which it wakes the waiting threads on data input, taking the size of the incoming data into account.

To implement task scheduling that uses the CPU processor and the GPU processor at the same time, the task management unit 120 sets the ratio and workload of the tasks 220 to be processed on the CPU processor or GPU processor to initial values determined through unit testing. During processing, it dynamically adjusts the ratio and workload of the tasks 220 for each processor with the help of the workload distribution unit 140 described later.

To set the initial values, the execution time of a data-processing task on the CPU processor and on the GPU processor of one system is measured repeatedly a set number of times, and the initial ratio and workload of tasks to allocate to the CPU processor and the GPU processor are determined from the measurements.
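One way to turn repeated timing into an initial split is sketched below. The patent does not give a formula, so sharing work in proportion to measured speed (the inverse of mean run time) is an assumption made here for illustration; `mean_runtime` and `initial_split` are hypothetical helpers.

```python
import time

def mean_runtime(task, data, repeats=5):
    """Run the task `repeats` times and return the mean wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        task(data)
    return (time.perf_counter() - start) / repeats

def initial_split(cpu_time, gpu_time):
    """Assumed rule: share work in proportion to speed (1 / time).

    Returns (cpu_share, gpu_share), which sum to 1.
    """
    cpu_speed, gpu_speed = 1.0 / cpu_time, 1.0 / gpu_time
    total = cpu_speed + gpu_speed
    return cpu_speed / total, gpu_speed / total

# Example with fixed timings: the GPU is four times faster than the CPU.
cpu_share, gpu_share = initial_split(cpu_time=0.4, gpu_time=0.1)
measured = mean_runtime(sum, list(range(1000)), repeats=3)
```

With these timings the CPU receives one fifth of the work and the GPU four fifths, so both would finish at about the same time.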

In addition, the task management unit 120 may build a thread pool made up of a fixed number of threads chosen according to the system specification. When data arrives, it takes an already created thread from the pool, uses it for processing, and then reuses it. This reduces processing time and yields task scheduling that uses resources efficiently.
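As an illustration, Python's standard `ThreadPoolExecutor` provides this kind of pre-created, reusable pool. The pool size rule and the `process` function below are assumptions for the sketch only.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process(chunk):
    return sum(chunk)  # stand-in for the real task

# Assumed sizing rule: at most 4 threads, bounded by the number of CPU cores.
POOL_SIZE = min(4, os.cpu_count() or 1)

chunks = [list(range(i, i + 10)) for i in range(0, 50, 10)]
with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    # The same worker threads are reused for all five chunks.
    totals = list(pool.map(process, chunks))
```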

The task management unit 120 may also split the data to be processed and handle the pieces as tasks 220 of the same form. The data can then be divided among multiple CPU processors or GPU processors and processed in parallel, which improves the processing performance of the whole system.
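A minimal sketch of such splitting, assuming the data is a list, the CPU share is a fraction between 0 and 1, and each task receives a fixed-size chunk. The function name and parameters are illustrative.

```python
def split_by_share(items, cpu_share, chunk_size):
    """Split items into CPU and GPU parts by share, then into equal-form chunks."""
    cut = round(len(items) * cpu_share)

    def chunks(part):
        return [part[i:i + chunk_size] for i in range(0, len(part), chunk_size)]

    return chunks(items[:cut]), chunks(items[cut:])

cpu_chunks, gpu_chunks = split_by_share(list(range(10)), cpu_share=0.3, chunk_size=2)
```

Each chunk can then be submitted as one task to the corresponding processor.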

The task management unit 120 may support both static and dynamic task scheduling. Static task scheduling decides the mapping of tasks 220 to specific processors in advance, before the program runs, using prior knowledge (system environment, program structure, task characteristics, and so on). Dynamic task scheduling decides the mapping of tasks 220 to specific processors while the program is running.

When allocating tasks 220 to processors and scheduling them, the task management unit 120 also takes the characteristics of the processors and of the tasks 220 into account so that the tasks are processed more efficiently.

To this end, the task management unit 120 schedules tasks that require sequential processing so that they are mapped to the CPU processor, and highly parallel tasks so that they are mapped to the GPU processor.

As another example, when a sequential algorithm is parallelized, the task management unit 120 may map the regular parts to the GPU processor, which offers high parallelism, and map the irregular parts, which require synchronization and data communication and therefore gain nothing from the GPU processor, to the CPU processor.
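The two mapping rules above can be sketched as a simple decision function. The `Task` fields `parallel` and `needs_sync` are hypothetical attributes introduced for this example; a real scheduler would derive such characteristics from its own task descriptions.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    parallel: bool     # True for regular, highly parallel work
    needs_sync: bool   # True if the work needs synchronization or data communication

def choose_processor(task):
    """Regular parallel work goes to the GPU; sequential or irregular work goes to the CPU."""
    if task.parallel and not task.needs_sync:
        return "GPU"
    return "CPU"

mapping = {
    t.name: choose_processor(t)
    for t in (
        Task("parse_header", parallel=False, needs_sync=False),
        Task("matrix_multiply", parallel=True, needs_sync=False),
        Task("graph_update", parallel=True, needs_sync=True),
    )
}
```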

The task management unit 120 can thus handle tasks 220 according to their various characteristics as in the examples above, and other approaches that process tasks efficiently according to their characteristics may also be used.

When the data storage unit 110 holds no data, the task management unit 120 puts the data-processing threads into a waiting state, and when data at or above the threshold described above arrives, it wakes the waiting threads to process the data. If data at or above the threshold does not arrive in the data storage unit 110 within a preset reference time, the task management unit 120 shuts down the thread pool, releases the thread and memory resources, and ends task processing.
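The idle shutdown can be sketched with a queue read that times out. This is a simplified illustration under stated assumptions: it uses a single worker, ignores the data-size threshold, and the name `idle_worker` and the 0.1 second reference time are made up for the example.

```python
import queue
import threading

def idle_worker(q, results, idle_timeout_s):
    """Process items until none arrives within the reference time, then exit."""
    while True:
        try:
            item = q.get(timeout=idle_timeout_s)   # waiting state
        except queue.Empty:
            return                                 # reference time elapsed: release the thread
        results.append(item * 2)                   # stand-in for processing a task

q = queue.Queue()
for n in range(3):
    q.put(n)
results = []
t = threading.Thread(target=idle_worker, args=(q, results, 0.1))
t.start()
t.join()
```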

If data enters the data storage unit 110 after task processing has ended in this way, the task management unit 120 creates threads again and resumes the task management function that processes the data. This makes processing more efficient.

In a distributed system environment with multiple data storage units 110, the task management unit 120 takes data locality into account and schedules tasks so that they are allocated to the CPU or GPU processor at the location where the data resides. This reduces data transfer time between processors and improves system performance.

The task management unit 120 may also apply work stealing or work sharing when processing tasks, which helps achieve good load balancing.

In work stealing, an idle processor takes over part of the work of a busy processor. In work sharing, whenever a processor creates tasks, the scheduler migrates some of them to underutilized processors in order to spread the work.
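A single-threaded simulation of work stealing is sketched below, assuming one deque per worker: an owner takes from the head of its own queue, and an idle worker steals from the tail of the longest queue. The function name and the round-based loop are illustrative simplifications.

```python
from collections import deque

def run_with_stealing(queues):
    """Simulate rounds in which each worker runs one task; idle workers steal first."""
    done = [[] for _ in queues]
    while any(queues):
        for i, q in enumerate(queues):
            if not q:
                victim = max(queues, key=len)   # busiest worker
                if victim:
                    q.append(victim.pop())      # steal from the tail
            if q:
                done[i].append(q.popleft())     # owner runs from the head
    return done

queues = [deque(range(6)), deque()]             # worker 1 starts idle
done = run_with_stealing(queues)
```

In this run the idle worker ends up executing half of the six tasks instead of waiting.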

The profile management unit 130 stores and manages performance profile information, which includes system status information or history information of tasks processed on the CPU processor or GPU processor.

The system status information includes various items such as the number of processor cores in the system, processor speed, processor utilization, memory utilization, disk utilization, and network utilization.

The task history information includes the size of the data allocated to each task by the task management unit 120 as described above, the idle state of the threads, the characteristics of the task, and the results of task execution on the CPU or GPU processor handled by the task management unit 120. The profile management unit 130 accumulates, stores, and manages performance profile information containing these items.
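A minimal sketch of how such history could be accumulated and summarized. The record fields and the throughput summary are assumptions chosen for illustration; the patent lists the kinds of information kept but not a data layout.

```python
from dataclasses import dataclass, field

@dataclass
class TaskRecord:
    processor: str     # "CPU" or "GPU"
    data_size: int     # size of the data allocated to the task
    elapsed_s: float   # measured execution time

@dataclass
class ProfileManager:
    history: list = field(default_factory=list)

    def record(self, rec):
        self.history.append(rec)   # history is accumulated, never overwritten

    def throughput(self, processor):
        """Items processed per second on one processor, over all recorded tasks."""
        recs = [r for r in self.history if r.processor == processor]
        total_time = sum(r.elapsed_s for r in recs)
        return sum(r.data_size for r in recs) / total_time if total_time else 0.0

pm = ProfileManager()
pm.record(TaskRecord("CPU", 100, 2.0))
pm.record(TaskRecord("GPU", 300, 1.0))
pm.record(TaskRecord("CPU", 100, 2.0))
```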

The workload distribution unit 140 analyzes the performance profile information produced by the profile management unit 130 and dynamically adjusts the ratio and workload of the tasks to be processed on the CPU processor and the GPU processor.

The workload distribution unit 140 generates workload distribution information from this dynamic adjustment and delivers it to the task management unit 120.

The task management unit 120 receives the workload distribution information generated by the workload distribution unit 140 and schedules the tasks by adjusting, on the basis of that information, the ratio and workload of the tasks to be processed on the CPU processor and the GPU processor.
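The patent does not specify how the distribution is computed. As one hedged possibility, the next batch could be divided in proportion to the throughput observed on each processor; the function `distribute` and its even-split fallback are assumptions for this sketch.

```python
def distribute(cpu_throughput, gpu_throughput, batch_size):
    """Assumed rule: split the next batch in proportion to observed throughput.

    Returns (cpu_items, gpu_items), which always sum to batch_size.
    """
    total = cpu_throughput + gpu_throughput
    if total == 0:
        # No history yet: fall back to an even split.
        cpu_items = batch_size // 2
    else:
        cpu_items = round(batch_size * cpu_throughput / total)
    return cpu_items, batch_size - cpu_items

plan = distribute(cpu_throughput=50, gpu_throughput=150, batch_size=1000)
cold_start = distribute(0, 0, 10)
```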

A scheduling scheme based on performance profile information allocates tasks according to the processing performance of the CPU processor and the GPU processor. If it does not take into account the time required by tasks that are already pending, CPU or GPU processor resources may be overused or underused.

To address this, the task management unit 120 adjusts the workload to be processed on the CPU or GPU processor by considering, in addition to the performance profile information, the expected processing time of the remaining tasks, the characteristics of the tasks, and so on.
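One way to account for pending work is an earliest-finish-time rule, sketched below. This rule is an assumption made for illustration and is not stated in the patent; `assign`, the processing rates, and the backlog values are hypothetical.

```python
def assign(task_sizes, cpu_rate, gpu_rate, cpu_backlog_s=0.0, gpu_backlog_s=0.0):
    """Give each task to the processor that would finish it earliest.

    Rates are items per second; backlogs are the expected time of pending tasks.
    """
    finish = {"CPU": cpu_backlog_s, "GPU": gpu_backlog_s}
    rate = {"CPU": cpu_rate, "GPU": gpu_rate}
    plan = []
    for size in task_sizes:
        best = min(finish, key=lambda p: finish[p] + size / rate[p])
        finish[best] += size / rate[best]
        plan.append(best)
    return plan, finish

# The GPU is twice as fast but already has 2.5 s of pending work.
plan, finish = assign([100, 100, 100], cpu_rate=50, gpu_rate=100, gpu_backlog_s=2.5)
```

Although the GPU is faster, its backlog means the first task goes to the CPU, which a purely performance-based split would not do.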

Although the units of the task scheduling apparatus 100 according to the embodiment of the present invention have been described as separate devices, in an actual implementation their functions may be combined as needed. The apparatus can run on a single system and can, of course, also run in a distributed system environment.

FIG. 2 is a flowchart showing the operation of the task scheduling apparatus 100 according to an embodiment of the present invention.

Referring to FIG. 2, the operation of the task scheduling apparatus 100 according to an embodiment of the present invention begins with the data storage unit 110 receiving and storing data 210 (S10).

Once data has been input, the task management unit 120 allocates tasks to each processor in order to process the stored data. It uses either the initial setting or the workload distribution information received from step S50, described below, to adjust the ratio and workload of the tasks for each processor when allocating them (S20).

Next, the task management unit 120 runs tasks on the CPU processor and the GPU processor at the same time and handles task scheduling, including the associated synchronization such as starting, terminating, and waiting (S30).

The profile management unit 130 accumulates, stores, and manages performance profile information, which includes system status information or history information of the tasks executed on the CPU processor or GPU processor (S40).

The workload distribution unit 140 then reviews the performance profile information and the current system status, distributes the workload to be processed by each processor, generates workload distribution information, and delivers it to the task management unit 120. The delivered workload distribution information is used in the task allocation of step S20, so this step acts as a repeated feedback loop (S50).
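The feedback between steps S20 and S50 can be illustrated with a small simulation. Everything here is an assumption for the sketch: processing rates are fixed, throughput is derived from simulated times, and the share is moved halfway toward the throughput-proportional target each round (`alpha=0.5`).

```python
def feedback_loop(cpu_rate, gpu_rate, rounds, cpu_share=0.5, alpha=0.5):
    """Simulate S20-S50: allocate, run, profile, then adjust the CPU share."""
    history = [cpu_share]
    for _ in range(rounds):
        # S20/S30: each processor handles its share of one unit of work.
        cpu_time = cpu_share / cpu_rate
        gpu_time = (1 - cpu_share) / gpu_rate
        # S40: the profile records observed throughput (work done / time).
        cpu_tp = cpu_share / cpu_time
        gpu_tp = (1 - cpu_share) / gpu_time
        # S50: move the share toward the throughput-proportional target.
        target = cpu_tp / (cpu_tp + gpu_tp)
        cpu_share = (1 - alpha) * cpu_share + alpha * target
        history.append(cpu_share)
    return history

# The GPU is three times faster, so the CPU share should settle near 0.25.
shares = feedback_loop(cpu_rate=1.0, gpu_rate=3.0, rounds=10)
```

Starting from an even split, the CPU share moves from 0.5 to 0.375 after one round and approaches 0.25 as the loop repeats.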

The configuration of the present invention has been described in detail above with reference to preferred embodiments and the accompanying drawings, but this is only an example, and various modifications are of course possible without departing from the technical idea of the invention. The scope of the invention should therefore not be limited to the described embodiments, but should be determined by the claims below and their equivalents.

100: Task scheduling apparatus
110: Data storage unit
120: Task management unit
130: Profile management unit
140: Workload distribution unit

Claims

A task scheduling method comprising:
storing and managing, by a profile management unit, performance profile information including system status information or history information of tasks processed on a CPU processor or a GPU processor;
analyzing, by a workload distribution unit, the performance profile information to generate workload distribution information that dynamically adjusts the ratio and workload of tasks to be allocated to the CPU processor or the GPU processor, and delivering the workload distribution information to a task management unit; and
generating, by the task management unit, tasks for allocating data stored in a data storage unit to the CPU processor or the GPU processor for processing, and scheduling the generated tasks based on the workload distribution information,
wherein the scheduling comprises:
generating multiple threads for each processor so that the CPU processor and the GPU processor process the tasks simultaneously (detailed description paragraph 0029); and
when the multiple threads attempt to fetch the data at the same time, scheduling so that only the threads permitted through any one of a mutex, a semaphore, or a reader-writer lock defined in the data storage unit fetch the data and process the tasks.

(Deleted)

The method of claim 1, wherein the scheduling comprises:
determining an initial ratio and workload of tasks to be allocated to the CPU processor and the GPU processor by repeatedly measuring, a predetermined number of times, the execution time of a task that processes data on the CPU processor and on the GPU processor; and
dynamically adjusting the ratio and workload of tasks to be processed by each of the CPU processor and the GPU processor based on the workload distribution information.

The method of claim 1, wherein the scheduling comprises:
scheduling the task based on a static task scheduling scheme or a dynamic task scheduling scheme; and
in consideration of the characteristics of the processors and the task, allocating a task requiring sequential processing to the CPU processor and a task requiring parallel processing to the GPU processor.

The method of claim 1, wherein the scheduling comprises scheduling the task based on a work stealing scheme or a work sharing scheme.

The method of claim 1, wherein the system state information includes the number of cores of a processor, a processor speed, a processor utilization rate, a memory utilization rate, a disk utilization rate, and a network utilization rate.

The method of claim 1, wherein the history information includes the size of data allocated to the task, the idle state of the multiple threads, the characteristics of the task, the expected processing time of the task, and task execution result information of the CPU processor or the GPU processor processed through the task management unit.

The method of claim 1, wherein the scheduling comprises, when the data storage unit is a plurality of data storage units in a distributed system environment, scheduling such that a task is allocated to the CPU processor or the GPU processor where the data is located, in consideration of data locality.

The method of claim 1, wherein the scheduling comprises creating a thread pool having a predetermined number of threads in consideration of the system specification, and scheduling the task such that, when data is input, a thread created in the thread pool is taken out, used for processing, and then reused.

A task scheduling apparatus comprising:
a data storage unit for storing input data;
a task management unit for generating and scheduling tasks that allocate the data stored in the data storage unit to a CPU processor or a GPU processor for processing;
a profile management unit for storing and managing performance profile information including system state information or history information of tasks processed by the CPU processor or the GPU processor through the task management unit; and
a workload distribution unit for analyzing the performance profile information stored and managed by the profile management unit, dynamically adjusting the ratio and workload of tasks to be allocated to the CPU processor or the GPU processor to generate workload distribution information, and delivering the workload distribution information to the task management unit,
wherein the task management unit generates multiple threads so that tasks are processed simultaneously on the CPU processor and the GPU processor, and, when the multiple threads simultaneously attempt to fetch the data, schedules such that only the threads granted permission by any one of a mutex, a semaphore, or a reader-writer lock defined in the data storage unit fetch the data and process the task.