KR102002246B1

KR102002246B1 - Method and apparatus for allocating resource for big data process

Info

Publication number: KR102002246B1
Application number: KR1020180024220A
Authority: KR
Inventors: 정종문; 김보배; 이진배
Original assignee: 연세대학교 산학협력단
Priority date: 2018-02-28
Filing date: 2018-02-28
Publication date: 2019-10-01

Abstract

The present invention relates to a resource distribution method and apparatus that can optimize resource distribution by predicting execution time with failures of the Hadoop Distributed File System (HDFS) and MapReduce taken into account probabilistically. According to an embodiment of the present invention, a resource distribution method for big data processing in MapReduce and HDFS comprises the steps of: obtaining a failure probability (P_m) in the map phase and a failure probability (P_r) in the reduce phase; calculating an expected value of the number of tasks in consideration of the obtained failure probabilities (P_m) and (P_r); predicting an execution time of data processing based on the calculated expected value; and determining the number of slots to be allocated by using the predicted execution time.

Description

Resource distribution method and apparatus for big data processing {METHOD AND APPARATUS FOR ALLOCATING RESOURCE FOR BIG DATA PROCESS}

The present invention relates to a resource distribution method and apparatus for big data processing and, more particularly, to a resource distribution method and apparatus that can optimize resource distribution by predicting execution time with failures of the Hadoop distributed file system and MapReduce taken into account probabilistically.

Big data is a technology drawing attention across many fields as the volume of digital information generated by the proliferation of IoT and the spread of smart devices has begun to grow explosively. Building a big data system means using large volumes of data efficiently and applying them to various convergence technologies and systems such as IoT, AI, and AR/VR. A related prior-art document is Korean Registered Patent No. 10-1432751.

Hadoop is an open source framework for big data processing. It consists of the Hadoop Distributed File System (HDFS), the MapReduce data processing function, and the resource manager YARN.

For a big data system to meet deadlines, the Hadoop distributed file system requires resource allocation, which in turn requires job execution time prediction.

When configuring a distributed file system, it is necessary to allocate the minimum resources that allow a job to finish within its deadline. Here, a resource means the map and reduce slots of each work node. When individual computers serve as nodes, a slot corresponds to a computing unit such as the CPU of each node; when the environment is built on cloud servers, it can be regarded as a computing unit of a virtual machine provided by a cloud service such as Amazon Web Services (AWS). Optimizing these resources is very important not only for efficient system operation but also economically. Because Hadoop does not support deadline requirements, allocation has relied entirely on the user's judgment. However, it is difficult for a user to make this prediction while accounting for factors such as unexpected errors and failures.

Existing prediction models for the Hadoop system leave three problems unresolved. First, they are mainly based on environments in which no failure occurs. Second, even the model that considers failure (the HP model) does not include failure probability. Third, they do not consider failures arising from the MapReduce failure rate, server crashes, the disk drive failure rate, and the age-dependent failure rate of large-capacity disk drives.

Therefore, research is needed on techniques that predict execution time and optimize resources in consideration of failure probability.

An object of the present invention is to provide a resource distribution method and apparatus for big data processing that can optimize resources by predicting Hadoop execution time with the failures that occur in MapReduce and the Hadoop distributed file system taken into account probabilistically.

To achieve the above object, an embodiment of the present invention discloses a resource distribution method in MapReduce and the Hadoop Distributed File System (HDFS), comprising: obtaining a failure probability (P_m) in the map phase and a failure probability (P_r) in the reduce phase; calculating an expected value of the number of tasks in consideration of the obtained failure probabilities (P_m) and (P_r); predicting an execution time of data processing based on the calculated expected value; and determining the number of slots to allocate using the predicted execution time.

To achieve the above object, an embodiment of the present invention also discloses a resource distribution apparatus in MapReduce and the Hadoop Distributed File System (HDFS), comprising: a failure probability acquisition unit that obtains a failure probability (P_m) in the map phase and a failure probability (P_r) in the reduce phase; an expected value calculation unit that calculates an expected value of the number of tasks in consideration of the obtained failure probabilities (P_m) and (P_r); an execution time prediction unit that predicts an execution time for data processing based on the calculated expected value; a slot number determination unit that determines the number of slots to allocate using the predicted execution time; and a control unit that controls the failure probability acquisition unit, the expected value calculation unit, the execution time prediction unit, and the slot number determination unit.

The resource distribution method and apparatus for big data processing according to an embodiment of the present invention can optimize the resources allocated to a Hadoop system by predicting execution time in a way that effectively accounts for failures occurring in MapReduce and the Hadoop distributed file system.

FIG. 1 is a diagram illustrating an example of a Hadoop distributed file system related to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example in which a MapReduce job related to an embodiment of the present invention is executed in a cluster.
FIG. 3 is a block diagram of a resource distribution apparatus related to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a resource distribution method related to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the structure of a control message related to an embodiment of the present invention and its exchange between nodes.

Hereinafter, a resource distribution method and apparatus for big data processing related to an embodiment of the present invention will be described with reference to the drawings.

As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "consists of" or "comprises" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be omitted, or additional components or steps may be included.

FIG. 1 is a diagram illustrating an example of a Hadoop distributed file system related to an embodiment of the present invention.

As shown, the Hadoop distributed file system operates with a NameNode (100) separated from WorkNodes (101). The NameNode (100) handles the storage location of each file in memory, and the WorkNodes (101) store the physical files in block units on distributed nodes. Although the system is designed around inexpensive hardware, an existing system can be kept running after a failure by adding a new WorkNode, which gives it better fault tolerance than other distributed systems.

MapReduce is a programming model for data processing that splits the work into a map step and a reduce step. Data is processed by user-defined map and reduce functions, and the model has a master-slave structure in which one master node (JobTracker) manages multiple slave nodes (TaskTrackers).

FIG. 2 is a diagram illustrating an example in which a MapReduce job related to an embodiment of the present invention is executed in a cluster.

As shown, Hadoop first splits the input data set; each data split is scheduled onto one slave node (TaskTracker) and processed by a map task (200).

Each TaskTracker has a predefined number of slots (task slots) for running map and reduce tasks; if there are more tasks than slots, the tasks run in multiple waves. When the map tasks complete, all intermediate <key, value> pairs are grouped using a sort-merge algorithm (201). The intermediate data is then shuffled, that is, transferred to the TaskTrackers reserved for reduce tasks (202). Finally, the reduce tasks process the intermediate data and produce the result (203).
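The wave behaviour described above can be illustrated with a minimal sketch. The function name `num_waves` is hypothetical and not taken from the patent; the sketch only assumes that each slot runs one task at a time and that a partially filled last wave still counts as a wave.

```python
def num_waves(num_tasks: int, num_slots: int) -> int:
    """Return how many waves are needed to run num_tasks on num_slots slots."""
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    # Integer ceiling division: a partially filled last wave still counts.
    return (num_tasks + num_slots - 1) // num_slots


if __name__ == "__main__":
    # 10 tasks on 4 slots run as waves of 4, 4, and 2 tasks.
    print(num_waves(10, 4))
```

With as many slots as tasks, or more, the function returns 1, which corresponds to the single-wave case.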

MapReduce consists of a map phase, a shuffle phase, and a reduce phase. In the map phase, the input data set is read from the Hadoop distributed file system, split into chunks, and passed to the user-defined map function, which processes the chunks to produce intermediate data.

In the shuffle phase, shuffling and sorting are performed at the same time: the intermediate data is sorted and one or more copies are made. If the number of reduce tasks is smaller than the number of reduce slots, the phase finishes in a single wave; if it is larger, it finishes in multiple waves.

In the reduce phase, the sorted intermediate data is processed by the user-defined reduce function to produce the final output, which is written to the Hadoop distributed file system.

In the failure-aware execution time prediction according to an embodiment of the present invention, the calculation depends on the phase in which the failure occurs.

Hereinafter, the execution time prediction method and the resource optimization method of the present invention will be described in detail with reference to the restart-on-failure portion of FIG. 2.

The upper arrows in FIG. 2 illustrate the restart that follows a failure in each phase of MapReduce according to an embodiment of the present invention.

When a failure occurs in the map phase, processing returns to the beginning of the map phase, and only the map tasks that have already run or are currently running are re-executed, so reduce tasks are not affected (204). When a failure occurs in the reduce phase, processing must start again from the beginning of the map phase, so a reduce-phase failure has a severe impact on the total job time (205).

The job proceeds through the phases above, and total execution time prediction and resource distribution can be performed in the following steps.

First, the number of map tasks with failure taken into account is determined and used to predict the execution time of the map phase. Next, the number of reduce tasks is determined and used to predict the execution time of the reduce phase. The total execution time (job time) is predicted from these, and resources are distributed to each node in consideration of the predicted time.

Of course, the resource distribution may also be carried out by first determining both the number of map tasks and the number of reduce tasks and then using them to predict the total execution time.

Hereinafter, a resource distribution apparatus and method according to an embodiment of the present invention, which can predict the final execution time reflecting the failure rate of each phase and thereby optimize the resources to be allocated to the Hadoop system, will be described in detail.

FIG. 3 is a block diagram of a resource distribution apparatus related to an embodiment of the present invention.

As shown, the resource distribution apparatus (300) may include a failure probability acquisition unit (310), an expected value calculation unit (320), an execution time prediction unit (330), a slot number determination unit (340), and a control unit (350).

The failure probability acquisition unit (310) may obtain the failure probability (P_m) of the map phase and the failure probability (P_r) of the reduce phase. Because failures do not occur in the shuffle phase, no failure probability is obtained for it. The failure probabilities (P_m) and (P_r) may be obtained empirically.

The expected value calculation unit (320) may calculate the expected value of the number of tasks in consideration of the obtained failure probabilities (P_m) and (P_r). For example, it may calculate the expected value of the number of map tasks, E(N_m.tf), the expected value of the number of shuffle tasks, E(N_sh.2.tf), and the expected value of the number of reduce tasks, E(N_r.tf).

The execution time prediction unit (330) may predict the total execution time based on the calculated expected values E(N_m.tf), E(N_sh.2.tf), and E(N_r.tf).

The slot number determination unit (340) may determine the optimal number of slots to allocate using the predicted total execution time.

The control unit (350) may provide overall control of the failure probability acquisition unit (310), the expected value calculation unit (320), the execution time prediction unit (330), and the slot number determination unit (340).

Hereinafter, the method of predicting execution time in consideration of failure probability and determining slots on that basis will be described in detail.

The final execution time can be modeled as the sum of the execution times of the map, shuffle, and reduce phases of MapReduce, as expressed in Equation 1.

To obtain the time of each phase, the number of tasks performed in the map and reduce phases and their respective average execution times are each assigned a symbol. For the shuffle phase, the first wave and the other waves differ in both task count and average execution time, so the first wave and the other waves are assigned separate symbols. The number of slots on the nodes that perform each task is likewise denoted, with one symbol for map slots and one shared by shuffle and reduce slots. With these definitions, the execution time of each phase and the total execution time can be expressed as Equation 2.
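The text does not spell out Equations 1 and 2, so the following is only a sketch of the structure it describes: a total that is the sum of three phase times, with shuffle split into a first-wave group and an other-waves group, and shuffle sharing the reduce slots. The names `TaskGroup`, `group_time`, and `job_time`, and the choice of "number of waves times average task time" for each group, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskGroup:
    """Hypothetical container: a task count and the average time of one task."""
    num_tasks: int
    avg_time: float


def waves(num_tasks: int, num_slots: int) -> int:
    """Ceiling division: rounds of execution needed on num_slots slots."""
    return (num_tasks + num_slots - 1) // num_slots


def group_time(group: TaskGroup, num_slots: int) -> float:
    """Assumed model: each wave takes one average task time."""
    return waves(group.num_tasks, num_slots) * group.avg_time


def job_time(map_tasks: TaskGroup, shuffle_first: TaskGroup,
             shuffle_other: TaskGroup, reduce_tasks: TaskGroup,
             map_slots: int, reduce_slots: int) -> float:
    """Total time as the sum of map, shuffle, and reduce phase times."""
    t_map = group_time(map_tasks, map_slots)
    # Shuffle uses the reduce slots; first wave and other waves are separate.
    t_shuffle = (group_time(shuffle_first, reduce_slots)
                 + group_time(shuffle_other, reduce_slots))
    t_reduce = group_time(reduce_tasks, reduce_slots)
    return t_map + t_shuffle + t_reduce


if __name__ == "__main__":
    total = job_time(TaskGroup(20, 10.0), TaskGroup(4, 6.0),
                     TaskGroup(4, 3.0), TaskGroup(8, 5.0),
                     map_slots=5, reduce_slots=4)
    print(total)  # 40.0 (map) + 9.0 (shuffle) + 10.0 (reduce)
```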

To derive expressions that account for failure, consider the time at which a failure occurs. Using Equation 2, the number of tasks performed up to that time can be obtained. If the failure occurred in the map phase, the number of tasks that were performed before the failure and must be performed again after it can be expressed as Equation 3.

Given the number of work nodes in the map phase, the number of tasks N_m.fail.m that one work node must additionally perform on re-execution can be expressed as Equation 4.

In Equation 4, the brackets denote rounding up (the ceiling): if the tasks do not divide evenly and some remain, one more round of execution is needed. The total number of tasks to perform is the number of tasks required when no failure occurs plus the number of tasks that must be re-executed because of the failure. With the original number of tasks denoted N_m.1 and the total number of tasks to perform denoted N_m.tf, N_m.tf can be expressed as Equation 5.
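The rounding-up step and the "original plus re-executed" total described above can be sketched as follows. Equations 3 to 5 are not reproduced in the text, so the exact form is an assumption: the per-node extra work is taken to be the ceiling of the tasks to redo divided by the number of work nodes, and the function and parameter names are hypothetical.

```python
def extra_map_tasks_per_node(tasks_to_redo: int, num_work_nodes: int) -> int:
    """Assumed form of N_m.fail.m: tasks to redo, spread over the work nodes."""
    if num_work_nodes <= 0:
        raise ValueError("num_work_nodes must be positive")
    if tasks_to_redo < 0:
        raise ValueError("tasks_to_redo must be non-negative")
    # Ceiling division: leftover tasks cost one more round on some node.
    return (tasks_to_redo + num_work_nodes - 1) // num_work_nodes


def total_map_tasks_with_failure(original_tasks: int, tasks_to_redo: int,
                                 num_work_nodes: int) -> int:
    """Assumed form of N_m.tf: original tasks (N_m.1) plus the extra tasks."""
    return original_tasks + extra_map_tasks_per_node(tasks_to_redo,
                                                     num_work_nodes)


if __name__ == "__main__":
    # 7 tasks to redo on 3 work nodes: the busiest node repeats 3 tasks.
    print(extra_map_tasks_per_node(7, 3))
    print(total_map_tasks_with_failure(20, 7, 3))
```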

Meanwhile, a failure in the map phase does not affect reduce tasks, so the failure-aware number of reduce tasks N_r.tf can be expressed as Equation 6.

If a failure occurs in the reduce phase and the number of reduce tasks performed up to that point is N_r.done, the time at which the failure occurred can be expressed as Equation 7.

Rearranging Equation 7, the number of reduce tasks performed before the failure, N_r.done, can be expressed as Equation 8.

In Equation 8, the outermost brackets denote rounding up, applied because N_r.done is an integer. As in the map phase, given the number of work nodes, when a failure in the reduce phase forces re-execution, the numbers of tasks that one work node must additionally perform in the map phase and the reduce phase, N_m.fail.r and N_r.fail, can be expressed as Equation 9.

In Equation 9 as well, rounding up is applied because any remainder requires one more round of execution, as described above. The map phase is divided into first-stage tasks (N_m.1), which run purely in the map phase, and second-stage tasks, which run in parallel with shuffling during the shuffle phase. No errors occur in the second-stage tasks that run in parallel with the shuffle phase, so only the first-stage tasks (N_m.1) are considered when calculating the failure-aware number of tasks for the map phase. When a failure occurs in the reduce phase, however, both the first-stage and second-stage tasks must be repeated. In Equation 9, N_m is the sum of the number of first-stage tasks and the number of second-stage tasks.
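The reduce-failure case above can be sketched in the same spirit. Equation 9 is not reproduced in the text, so this assumes that both per-node quantities are ceilings of a task count divided by the number of work nodes; `reduce_failure_rework` and its parameters are hypothetical names.

```python
from typing import Tuple


def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division for non-negative numerators."""
    return (numerator + denominator - 1) // denominator


def reduce_failure_rework(first_stage_map_tasks: int,
                          second_stage_map_tasks: int,
                          reduce_tasks_done: int,
                          num_work_nodes: int) -> Tuple[int, int]:
    """Return assumed (N_m.fail.r, N_r.fail) for a failure in the reduce phase.

    A reduce-phase failure restarts from the beginning of the map phase, so
    all map tasks (first stage plus second stage, N_m) are repeated, along
    with the reduce tasks completed before the failure (N_r.done).
    """
    if num_work_nodes <= 0:
        raise ValueError("num_work_nodes must be positive")
    all_map_tasks = first_stage_map_tasks + second_stage_map_tasks  # N_m
    extra_map = ceil_div(all_map_tasks, num_work_nodes)
    extra_reduce = ceil_div(reduce_tasks_done, num_work_nodes)
    return extra_map, extra_reduce


if __name__ == "__main__":
    # 16 first-stage and 4 second-stage map tasks, 5 reduce tasks done, 3 nodes.
    print(reduce_failure_rework(16, 4, 5, 3))
```

Compared with a map-phase failure, the map rework here always covers every map task, which is why the text calls reduce-phase failures especially costly.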

Accordingly, the total number of tasks actually performed when a failure occurs and tasks are re-executed can be expressed as Equation 10.

FIG. 4 is a flowchart illustrating a resource distribution method related to an embodiment of the present invention.

First, the failure probability acquisition unit (310) may obtain the failure probability (P_m) of the map phase and the failure probability (P_r) of the reduce phase (S410). The failure probabilities (P_m) and (P_r) may be obtained empirically.

The expected value calculation unit (320) may then calculate the expected value of the number of tasks in consideration of the failure probabilities (S420).

Given the probabilities P_m and P_r of a failure occurring in the map and reduce phases, the expected values of the numbers of tasks to be performed in the map and reduce phases, E(N_m.tf) and E(N_r.tf), can be expressed as Equation 11.
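One way to read the expected-value step is as a weighted sum over failure scenarios: four cases for map tasks (failure or no failure in each of the map and reduce phases) and two cases for reduce tasks, as the claims describe. Equation 11 is not reproduced in the text, so the sketch below is an assumption. It treats the two failures as independent, allows at most one failure per phase, and adds the rework of each failure that occurs; all function and parameter names are hypothetical.

```python
from itertools import product


def expected_map_tasks(n_map: int, extra_on_map_failure: int,
                       extra_on_reduce_failure: int,
                       p_m: float, p_r: float) -> float:
    """Expected map task count over the four failure/no-failure cases."""
    expected = 0.0
    for map_fails, reduce_fails in product((False, True), repeat=2):
        # Probability of this scenario, assuming independent failures.
        prob = ((p_m if map_fails else 1.0 - p_m)
                * (p_r if reduce_fails else 1.0 - p_r))
        tasks = n_map
        if map_fails:
            tasks += extra_on_map_failure
        if reduce_fails:
            tasks += extra_on_reduce_failure
        expected += prob * tasks
    return expected


def expected_reduce_tasks(n_reduce: int, extra_on_reduce_failure: int,
                          p_r: float) -> float:
    """Expected reduce task count over the two reduce-phase cases."""
    return ((1.0 - p_r) * n_reduce
            + p_r * (n_reduce + extra_on_reduce_failure))


if __name__ == "__main__":
    print(expected_map_tasks(20, 3, 7, p_m=0.1, p_r=0.2))
    print(expected_reduce_tasks(8, 2, p_r=0.2))
```

Under these assumptions the four-case sum collapses to the original count plus each rework term weighted by its failure probability, which makes the result easy to check by hand.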

The execution time prediction unit (330) may use Equation 10 to predict the total execution time as Equation 12 (S430).

In Equation 12, rounding up is applied because one more round of execution is needed when the number of tasks is not evenly divisible by the number of slots.

The slot number determination unit (340) may use Equation 12 to determine the number of slots for the map, shuffle, and reduce phases and allocate slots to each phase accordingly (S440). For example, it may use Equation 12 to calculate the minimum numbers of map slots and reduce slots under the condition that the predicted total execution time does not exceed the deadline t, and take the calculated minimum values as the optimal slot numbers.
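The slot determination step can be illustrated with an exhaustive search: try slot counts and keep the pair with the smallest total that still meets the deadline. This is a sketch and not the method of the patent. The time model in `predicted_time` (rounded-up waves times average task times, with shuffle sharing the reduce slots) stands in for Equation 12, which is not reproduced in the text, and every name, including the `max_slots` bound, is an assumption.

```python
import math
from itertools import product
from typing import Optional, Tuple


def predicted_time(exp_map_tasks: float, exp_reduce_tasks: float,
                   map_slots: int, reduce_slots: int,
                   avg_map: float, avg_shuffle: float,
                   avg_reduce: float) -> float:
    """Assumed stand-in for Equation 12: waves times average task times."""
    map_waves = math.ceil(exp_map_tasks / map_slots)
    reduce_waves = math.ceil(exp_reduce_tasks / reduce_slots)
    return map_waves * avg_map + reduce_waves * (avg_shuffle + avg_reduce)


def min_slots_for_deadline(exp_map_tasks: float, exp_reduce_tasks: float,
                           avg_map: float, avg_shuffle: float,
                           avg_reduce: float, deadline: float,
                           max_slots: int = 64) -> Optional[Tuple[int, int]]:
    """Return (map_slots, reduce_slots) with the smallest sum that meets
    the deadline, or None if no pair up to max_slots does."""
    best: Optional[Tuple[int, int]] = None
    for s_m, s_r in product(range(1, max_slots + 1), repeat=2):
        t = predicted_time(exp_map_tasks, exp_reduce_tasks, s_m, s_r,
                           avg_map, avg_shuffle, avg_reduce)
        if t <= deadline and (best is None or s_m + s_r < best[0] + best[1]):
            best = (s_m, s_r)
    return best


if __name__ == "__main__":
    print(min_slots_for_deadline(21.7, 8.4, avg_map=10, avg_shuffle=4,
                                 avg_reduce=6, deadline=60))
```

The search is quadratic in `max_slots`, which is acceptable for small clusters; a closed-form approach avoids the search when the rounding is ignored.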

The following embodiment describes how to use the method of Lagrange multipliers so that the number of map and reduce slots takes its minimum value.

The constraint can be expressed as Equation 13 and the objective function as Equation 14.

Forming the Lagrangian function and solving its derivatives then gives Equation 15.

Solving these expressions for the value that minimizes the number of slots shows that it is minimized when Equation 16 is satisfied.
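Equations 13 to 16 are not reproduced in the text, so the following shows only how a Lagrange-multiplier solution of this kind can look under an assumed, simplified constraint. The assumption is that rounding is ignored and the deadline constraint has the form a / S_m + b / S_r = c, where a and b are hypothetical workload terms for the map side and the reduce side and c is the time budget; the objective is S_m + S_r. Setting the derivatives of S_m + S_r + λ(a / S_m + b / S_r - c) to zero gives S_m = sqrt(λ a) and S_r = sqrt(λ b), and substituting into the constraint gives sqrt(λ) = (sqrt(a) + sqrt(b)) / c.

```python
import math
from typing import Tuple


def lagrange_slots(a: float, b: float, c: float) -> Tuple[float, float]:
    """Minimize S_m + S_r subject to a / S_m + b / S_r = c (assumed form).

    Returns real-valued slot counts; a caller would round them up to
    integers before allocating.
    """
    if a <= 0 or b <= 0 or c <= 0:
        raise ValueError("a, b, and c must be positive")
    root_sum = math.sqrt(a) + math.sqrt(b)
    s_map = math.sqrt(a) * root_sum / c
    s_reduce = math.sqrt(b) * root_sum / c
    return s_map, s_reduce


if __name__ == "__main__":
    s_map, s_reduce = lagrange_slots(a=400.0, b=100.0, c=60.0)
    print(s_map, s_reduce)
    # The constraint holds at the returned point.
    print(400.0 / s_map + 100.0 / s_reduce)
```

Because the real model rounds waves up to whole numbers, the continuous solution is best treated as a starting point that still needs checking against the deadline after rounding.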

Map and reduce slots can thus be allocated optimally. With this optimal allocation, jobs are less often dropped for missing the deadline, so retransmissions and retries are avoided. As a result, better outcomes are obtained in terms of both system efficiency and cost.

FIG. 5 is a diagram illustrating the structure of a control message related to an embodiment of the present invention and its exchange between nodes.

As shown in FIG. 5, control is carried out through messages exchanged between the HALF controller (500) and each node. The first bit of the message indicates whether an error has occurred (501); the second through fifth bits carry the number of the node that sent the message (502); and the sixth through ninth bits carry the failure time if a failure has occurred and are left empty otherwise (503). The bit widths of this message can change as the failure time varies with the number of nodes and the deadline. The message can be used on its own as a control message, or it can be carried in an option field of a header at the IP, TCP, or MAC layer and exchanged between each node and the controller (504).
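The nine-bit layout described above can be sketched as a pair of pack and unpack helpers. The bit order (error flag as the most significant bit), the encoding of an empty failure field as zero, and the function names are assumptions; the text only fixes the field positions and widths.

```python
from typing import Optional, Tuple

ERROR_SHIFT = 8   # bit 1: error flag (assumed most significant of 9 bits)
NODE_SHIFT = 4    # bits 2-5: sender node number
FIELD_MASK = 0xF  # both 4-bit fields hold values 0..15


def pack_message(error: bool, node_id: int,
                 failure_time: Optional[int] = None) -> int:
    """Pack the three fields into a 9-bit integer."""
    if not 0 <= node_id <= FIELD_MASK:
        raise ValueError("node_id must fit in 4 bits")
    # Assumption: an empty failure field is encoded as all zeros.
    time_field = 0 if failure_time is None else failure_time
    if not 0 <= time_field <= FIELD_MASK:
        raise ValueError("failure_time must fit in 4 bits")
    return (int(error) << ERROR_SHIFT) | (node_id << NODE_SHIFT) | time_field


def unpack_message(message: int) -> Tuple[bool, int, int]:
    """Split a 9-bit integer into (error, node_id, failure_time_field)."""
    if not 0 <= message < (1 << 9):
        raise ValueError("message must fit in 9 bits")
    error = bool(message >> ERROR_SHIFT)
    node_id = (message >> NODE_SHIFT) & FIELD_MASK
    time_field = message & FIELD_MASK
    return error, node_id, time_field


if __name__ == "__main__":
    msg = pack_message(True, node_id=5, failure_time=9)
    print(format(msg, "09b"))
    print(unpack_message(msg))
```

With fixed 4-bit fields this sketch addresses at most 16 nodes and 16 time values; the text notes that the widths can change with the number of nodes and the deadline.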

전술한 바와 같이, 본 발명의 일실시예에 의한 빅데이터 처리를 위한 자원 분배 방법 및 장치는 맵리듀스와 하둡 분산 파일 시스템에서 발생하는 실패를 효과적으로 실행 시간을 예측함으로써, 하둡 시스템에 할당되는 자원을 최적화 할 수 있다.As described above, the resource distribution method and apparatus for big data processing according to an embodiment of the present invention effectively predicts execution time for failures occurring in MapReduce and Hadoop distributed file system, thereby managing resources allocated to the Hadoop system. Can be optimized

상술한 빅데이터 처리를 위한 자원 분배 방법은 다양한 컴퓨터 수단을 통하여 수행될 수 있는 프로그램 명령 형태로 구현되어 컴퓨터로 판독 가능한 기록 매체에 기록될 수 있다. 이때, 컴퓨터로 판독 가능한 기록매체는 프로그램 명령, 데이터 파일, 데이터 구조 등을 단독으로 또는 조합하여 포함할 수 있다. 한편, 기록매체에 기록되는 프로그램 명령은 본 발명을 위하여 특별히 설계되고 구성된 것들이거나 컴퓨터 소프트웨어 당업자에게 공지되어 사용 가능한 것일 수도 있다.The above-described resource distribution method for big data processing may be implemented in the form of program instructions that can be executed by various computer means and recorded in a computer-readable recording medium. In this case, the computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. Meanwhile, the program instructions recorded on the recording medium may be those specially designed and configured for the present invention, or may be known and available to those skilled in computer software.

컴퓨터로 판독 가능한 기록매체에는 하드 디스크, 플로피 디스크 및 자기 테이프와 같은 자기 매체(Magnetic Media), CD-ROM, DVD와 같은 광기록 매체(Optical Media), 플롭티컬 디스크(Floptical Disk)와 같은 자기-광 매체(Magneto-Optical Media), 및 롬(ROM), 램(RAM), 플래시 메모리 등과 같은 프로그램 명령을 저장하고 수행하도록 특별히 구성된 하드웨어 장치가 포함된다. Computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape, optical media such as CD-ROMs, DVDs, and magnetic disks such as floppy disks. Magnetic-Optical Media, and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like.

한편, 이러한 기록매체는 프로그램 명령, 데이터 구조 등을 지정하는 신호를 전송하는 반송파를 포함하는 광 또는 금속선, 도파관 등의 전송 매체일 수도 있다.The recording medium may be a transmission medium such as an optical or metal wire, a waveguide, or the like including a carrier wave for transmitting a signal specifying a program command, a data structure, or the like.

또한, 프로그램 명령에는 컴파일러에 의해 만들어지는 것과 같은 기계어 코드뿐만 아니라 인터프리터 등을 사용해서 컴퓨터에 의해서 실행될 수 있는 고급 언어 코드를 포함한다. 상술한 하드웨어 장치는 본 발명의 동작을 수행하기 위해 하나 이상의 소프트웨어 모듈로서 작동하도록 구성될 수 있으며, 그 역도 마찬가지이다.In addition, program instructions include not only machine code generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The resource distribution method and apparatus for big data processing described above are not limited to the configurations and methods of the embodiments described above; all or part of the embodiments may be selectively combined so that various modifications can be made.

300: resource distribution apparatus
310: failure probability acquisition unit
420: expected value calculation unit
330: execution time prediction unit
340: slot number determination unit
350: control unit

Claims

1. A resource distribution method in MapReduce and the Hadoop Distributed File System (HDFS), the method comprising:
obtaining a failure probability P_m in the map phase and a failure probability P_r in the reduce phase;
calculating an expected value of the number of tasks in consideration of the obtained failure probabilities P_m and P_r;
predicting an execution time of data processing based on the calculated expected value; and
determining the number of slots to allocate using the predicted execution time.

2. The method of claim 1, wherein calculating the expected value comprises:
calculating an expected value of the number of map tasks in consideration of the failure probability P_m; and
calculating expected values of the numbers of map tasks and reduce tasks in consideration of the failure probability P_r.

3. The method of claim 2, wherein calculating the expected value of the number of map tasks comprises
calculating the expected value of the number of map tasks in consideration of the four cases in which a failure does or does not occur in the map phase and in the reduce phase, respectively.

4. The method of claim 2, wherein calculating the expected values of the numbers of map tasks and reduce tasks comprises
calculating the expected value of the number of reduce tasks in consideration of the two cases in which a failure does or does not occur in the reduce phase.

5. The method of claim 2, wherein calculating the expected value of the number of tasks in consideration of the obtained failure probabilities P_m and P_r comprises
calculating an expected value of the number of shuffle tasks in consideration of the two cases in which a failure does or does not occur in the reduce phase.

6. The method of claim 1, wherein calculating the expected value comprises
calculating an expected value of the number of map tasks, an expected value of the number of shuffle tasks, and an expected value of the number of reduce tasks, each of which is calculated by Equation 1 below.
[Equation 1]

(where E(N_m.tf) is the expected value of the number of map tasks; E(N_sh.2.tf) is the expected value of the number of shuffle tasks; E(N_r.tf) is the expected value of the number of reduce tasks; P_m is the failure probability in the map phase; P_r is the failure probability in the reduce phase; N_m.1 is the number of map tasks when no failure occurs in the map phase; N_m.fail.m is the number of additional map tasks to be performed on one worker node when a failure occurs in the map phase; N_m.fail.r and N_r.fail are the numbers of additional tasks to be performed on one worker node in the map phase and the reduce phase, respectively, when a failure occurs in the reduce phase; and N_sh.2 is the number of shuffle tasks when no failure occurs in the reduce phase.)
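
The case analysis recited in claims 3 and 4 can be illustrated with a minimal sketch. It is not Equation 1: it assumes that map-phase and reduce-phase failures are independent and that each failure simply adds its extra tasks to the failure-free count. The function names, parameter names, and the failure-free reduce-task count `n_r1` are hypothetical and introduced only for illustration.

```python
from itertools import product


def expected_map_tasks(p_m, p_r, n_m1, n_m_fail_m, n_m_fail_r):
    """Expected number of map tasks over the four failure/no-failure cases.

    Assumption (not taken from the patent text): failures in the map and
    reduce phases are independent, and each failure adds its extra map
    tasks on top of the failure-free count n_m1.
    """
    expected = 0.0
    for map_fails, reduce_fails in product((False, True), repeat=2):
        # Probability of this combination of outcomes.
        prob = (p_m if map_fails else 1 - p_m) * (p_r if reduce_fails else 1 - p_r)
        # Map tasks that must run in this case.
        tasks = n_m1
        if map_fails:
            tasks += n_m_fail_m
        if reduce_fails:
            tasks += n_m_fail_r
        expected += prob * tasks
    return expected


def expected_reduce_tasks(p_r, n_r1, n_r_fail):
    """Expected number of reduce tasks over the two reduce-phase cases.

    n_r1 is a hypothetical failure-free reduce-task count.
    """
    no_failure = (1 - p_r) * n_r1
    failure = p_r * (n_r1 + n_r_fail)
    return no_failure + failure
```

Under these assumptions the four-case sum collapses to `n_m1 + p_m * n_m_fail_m + p_r * n_m_fail_r`, which is a quick way to sanity-check the enumeration.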

7. The method of claim 6, wherein predicting the execution time
is performed using Equation 2 below.
[Equation 2]

(where the terms of Equation 2 denote the total execution time, the average execution time of one task in the map phase, the average execution time of one task in the shuffle phase, the average execution time of one task in shuffle phase 2, the number of map slots, and the number of reduce slots; N_sh.1 is the number of tasks in shuffle phase 1, and N_sh.2 is the number of tasks in shuffle phase 2.)

8. The method of claim 7, wherein determining the number of slots to allocate comprises
using Equation 2 to calculate the minimum values of the number of map slots and the number of reduce slots for which the total execution time does not exceed the deadline t.
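
The deadline-constrained slot selection of claim 8 can be sketched as a small search. The execution-time model below is an assumption and is not Equation 2: it treats map work as divided evenly across map slots and shuffle plus reduce work as divided evenly across reduce slots. Interpreting "minimum" as the smallest total slot count is also an assumption, and all function and parameter names are hypothetical.

```python
def predict_time(e_m, e_sh, e_r, t_m, t_sh, t_r, n_m, n_r):
    """Assumed execution-time model (not the patent's Equation 2).

    e_m, e_sh, e_r: expected numbers of map, shuffle, and reduce tasks.
    t_m, t_sh, t_r: average execution time of one task in each phase.
    n_m, n_r: numbers of map slots and reduce slots.
    """
    map_time = e_m * t_m / n_m
    reduce_side_time = (e_sh * t_sh + e_r * t_r) / n_r
    return map_time + reduce_side_time


def min_slots(e_m, e_sh, e_r, t_m, t_sh, t_r, deadline, max_slots):
    """Return (n_m, n_r) with the smallest total that meets the deadline.

    Returns None when no combination up to max_slots per phase is feasible.
    """
    best = None
    for n_m in range(1, max_slots + 1):
        for n_r in range(1, max_slots + 1):
            if predict_time(e_m, e_sh, e_r, t_m, t_sh, t_r, n_m, n_r) <= deadline:
                # Time only decreases as n_r grows, so the first feasible
                # n_r is the smallest one for this n_m.
                if best is None or n_m + n_r < best[0] + best[1]:
                    best = (n_m, n_r)
                break
    return best
```

An exhaustive search is used here only because it is easy to verify; a real scheduler would likely bound the search or solve for the slot counts directly.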

9. A resource distribution apparatus in MapReduce and the Hadoop Distributed File System (HDFS), the apparatus comprising:
a failure probability acquisition unit configured to obtain a failure probability P_m in the map phase and a failure probability P_r in the reduce phase;
an expected value calculation unit configured to calculate an expected value of the number of tasks in consideration of the obtained failure probabilities P_m and P_r;
an execution time prediction unit configured to predict an execution time of data processing based on the calculated expected value;
a slot number determination unit configured to determine the number of slots to allocate using the predicted execution time; and
a control unit configured to control the failure probability acquisition unit, the expected value calculation unit, the execution time prediction unit, and the slot number determination unit.

10. The apparatus of claim 9, wherein the expected value calculation unit
calculates an expected value of the number of map tasks in consideration of the failure probability P_m, and
calculates expected values of the numbers of map tasks and reduce tasks in consideration of the failure probability P_r.

11. The apparatus of claim 10, wherein the expected value calculation unit
calculates the expected value of the number of map tasks in consideration of the four cases in which a failure does or does not occur in the map phase and in the reduce phase, respectively.

12. The apparatus of claim 10, wherein the expected value calculation unit
calculates the expected value of the number of reduce tasks in consideration of the two cases in which a failure does or does not occur in the reduce phase.

13. The apparatus of claim 10, wherein the expected value calculation unit
calculates an expected value of the number of shuffle tasks in consideration of the two cases in which a failure does or does not occur in the reduce phase.

14. The apparatus of claim 9, wherein the expected value calculation unit
calculates an expected value of the number of map tasks, an expected value of the number of shuffle tasks, and an expected value of the number of reduce tasks, each of which is calculated by Equation 1 below.
[Equation 1]

15. The apparatus of claim 14, wherein the execution time prediction unit
predicts the execution time using Equation 2 below.
[Equation 2]

(where the terms of Equation 2 denote the total execution time, the average execution time of one task in the map phase, the average execution time of one task in the shuffle phase, the number of map slots, and the number of reduce slots.)

16. The apparatus of claim 15, wherein the slot number determination unit
uses Equation 2 to calculate the minimum values of the number of map slots and the number of reduce slots for which the total execution time does not exceed the deadline t.