KR101772108B1

KR101772108B1 - Memory management system and method based on spark

Info

Publication number: KR101772108B1
Application number: KR1020160089802A
Authority: KR
Inventors: 이재환; 최준
Original assignee: 한국항공대학교산학협력단
Priority date: 2016-07-15
Filing date: 2016-07-15
Publication date: 2017-08-28

Abstract

The present invention relates to a memory management system for a Spark environment that can resolve the severe performance degradation that occurs when memory runs short in a Spark environment. The system comprises: a master that executes a driver process, which divides a job into a plurality of tasks and allocates the tasks; and a worker node that executes an executor process, which performs the tasks allocated by the driver process and transmits the execution results to the driver process. The worker node includes a worker manager that determines whether a bottleneck has occurred in the Spark environment, performs processing to resolve the bottleneck according to the determination result, and transmits to the master the change information of the executor process that results from that processing. The master includes a driver manager that updates the driver process with the change information received from the worker manager.

Description

SPARK-BASED MEMORY MANAGEMENT SYSTEM AND METHOD

The present application relates to a Spark-based memory management system and method.

Spark, which has recently come into use as a distributed processing platform for big data, has attracted considerable attention because it stores RDD datasets in memory and improves performance by reading and writing them at memory speed.

RDD stands for Resilient Distributed Dataset. Once an RDD is created and stored in memory it cannot be modified, and because it exhibits good locality in iterative distributed big-data processing jobs, it has the advantage that data need not be read from or written to disk.

Because Spark computes in memory and uses the RDD data structure, it can resolve the performance degradation that iterative jobs caused in Hadoop.

However, a Spark cluster stores and processes large amounts of data in a limited amount of memory, so a bottleneck occurs when memory runs short, and this bottleneck causes a severe drop in Spark's performance.

The background art of the present application is disclosed in Korean Patent Laid-Open Publication No. 2015-0089538 (published 2015.08.05).

The present application is intended to solve the above-described problems of the prior art, and an object thereof is to provide a Spark-based memory management system and method that can resolve the severe performance degradation that occurs when memory runs short in a Spark environment.

The present application is intended to solve the above-described problems of the prior art, and an object thereof is to provide a Spark-based memory management system and method that can improve the performance of Spark.

However, the technical problems to be addressed by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical object, a memory management system in a Spark environment according to an embodiment of the present application may include a master that executes a driver process, which divides a job into a plurality of tasks and allocates them, and a worker node that executes an executor process, which performs the tasks allocated by the driver process and transmits the execution results to the driver process. The worker node may include a worker manager that determines whether a bottleneck has occurred in the Spark environment, performs processing to resolve the bottleneck according to the determination result, and transmits to the master the change information of the executor process changed by that processing. The master may include a driver manager that updates the driver process with the change information received from the worker manager.

In addition, the executor process may generate RDDs for performing the tasks and store them in memory or on a high-speed storage medium, and the worker manager may resolve the bottleneck by moving some of the RDDs stored in the memory to the high-speed storage medium.

In addition, the worker manager may determine whether the bottleneck has occurred by checking, at a preset interval, whether at least one of a full garbage collection and a shuffle spill has occurred.
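The periodic check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Sample` record, the cumulative counters it holds, and the `poll` helper are all hypothetical names, and the sketch assumes the analysis tools expose a running full-GC count and a running count of spilled bytes.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Sample:
    """One reading from the analysis tools (hypothetical shape)."""
    full_gc_count: int        # cumulative number of full GCs observed so far
    shuffle_spill_bytes: int  # cumulative bytes spilled during shuffles


def detect_bottlenecks(prev: Sample, curr: Sample) -> Set[str]:
    """Return the bottleneck factors that appeared between two samples."""
    found: Set[str] = set()
    if curr.full_gc_count > prev.full_gc_count:
        found.add("full_gc")
    if curr.shuffle_spill_bytes > prev.shuffle_spill_bytes:
        found.add("shuffle_spill")
    return found


def poll(read_sample: Callable[[], Sample], rounds: int,
         interval_s: float = 1.0,
         sleep: Callable[[float], None] = time.sleep) -> List[Set[str]]:
    """Sample at a preset interval and report the factors seen in each round."""
    results: List[Set[str]] = []
    prev = read_sample()
    for _ in range(rounds):
        sleep(interval_s)          # wait for the preset interval
        curr = read_sample()
        results.append(detect_bottlenecks(prev, curr))
        prev = curr
    return results
```

Comparing consecutive cumulative counters, rather than absolute values, lets each round report only the events that happened since the previous check.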

In addition, when it is determined that a full garbage collection has occurred, the worker manager may identify, based on a preset RDD replacement algorithm, a first RDD to be moved to the high-speed storage medium from among the RDDs stored in the memory as said some of the RDDs, copy the first RDD to the high-speed storage medium, and delete the first RDD stored in the memory.
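The copy-then-delete move described above can be sketched as follows. The text does not say which RDD replacement algorithm is preset, so least-recently-used (LRU) is assumed here purely for illustration; the `RddStore` class and its dictionaries standing in for memory and the high-speed storage medium are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Optional


class RddStore:
    """Toy model of one executor's RDD storage (hypothetical)."""

    def __init__(self) -> None:
        # Ordered from least recently used to most recently used.
        self.memory: "OrderedDict[str, bytes]" = OrderedDict()
        self.fast_storage: Dict[str, bytes] = {}

    def put(self, rdd_id: str, data: bytes) -> None:
        self.memory[rdd_id] = data
        self.memory.move_to_end(rdd_id)

    def get(self, rdd_id: str) -> bytes:
        if rdd_id in self.memory:
            self.memory.move_to_end(rdd_id)   # record the access for LRU
            return self.memory[rdd_id]
        return self.fast_storage[rdd_id]      # fall back to fast storage

    def evict_one(self) -> Optional[str]:
        """Move the LRU RDD out of memory; return its id, or None if empty."""
        if not self.memory:
            return None
        victim = next(iter(self.memory))                  # assumed policy: LRU
        self.fast_storage[victim] = self.memory[victim]   # copy first
        del self.memory[victim]                           # then delete from memory
        return victim
```

Copying before deleting means the RDD is always readable from at least one location, so a failure between the two steps cannot lose it.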

In addition, to prevent in advance a bottleneck caused by full garbage collection, the worker manager may determine whether a preset heap size has been exceeded and, if so, control the executor process so that RDDs are stored on the high-speed storage medium instead of in the memory.
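The preventive rule above amounts to a placement decision made before an RDD is stored. A minimal sketch, assuming heap usage is available as a byte count; the function name and the string labels for the two locations are hypothetical.

```python
def choose_storage(heap_used_bytes: int, heap_limit_bytes: int) -> str:
    """Pick where a newly generated RDD should be stored.

    Returns "fast_storage" once the preset heap size is exceeded,
    and "memory" otherwise (labels are illustrative only).
    """
    if heap_used_bytes > heap_limit_bytes:
        return "fast_storage"
    return "memory"
```

For example, with a preset limit of 1000 bytes, a heap holding 900 bytes keeps new RDDs in memory, while a heap holding 1001 bytes routes them to the high-speed storage medium.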

In addition, when it is determined that a shuffle spill has occurred, the worker manager may decide whether to increase the share of the memory allotted to the shuffle space by determining whether that share is less than a preset value.

In addition, when the share of the shuffle space is determined to be less than the preset value, the worker manager may, in order to increase that share, identify based on a preset RDD replacement algorithm a second RDD to be moved to the high-speed storage medium from among the RDDs stored in the memory as said some of the RDDs, copy the second RDD to the high-speed storage medium, and delete the second RDD stored in the memory.

In addition, the worker manager may increase the share of the shuffle space by deleting the second RDD from the memory, and the maximum amount by which the share of the shuffle space can be increased at one time may be set by user input.
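The shuffle-space adjustment described in the preceding paragraphs can be sketched as a single step function. This is an illustration under stated assumptions: shares are expressed as fractions of executor memory between 0 and 1, `freed` is the fraction released by moving the second RDD out of memory, and all parameter names are hypothetical.

```python
def next_shuffle_share(current: float, threshold: float,
                       freed: float, max_step: float) -> float:
    """Return the shuffle-space share after one adjustment.

    current   -- present share of memory given to shuffle (0..1)
    threshold -- preset value below which the share may be increased
    freed     -- share of memory released by evicting an RDD
    max_step  -- user-configured cap on a single increase
    """
    if current >= threshold:
        return current                      # already at or above the preset value
    step = max(0.0, min(freed, max_step))   # never grow by more than the cap
    return min(1.0, current + step)
```

With `current=0.25`, `threshold=0.5`, and `max_step=0.125`, freeing 0.5 of memory still raises the share only to 0.375, because the user-configured cap bounds each increase.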

In addition, the worker manager may move said some of the RDDs to the high-speed storage medium using the Spark API.

In addition, the worker manager may receive RDD information and memory information from the executor process and control the storage location of the RDDs and the memory settings of the executor process; when it determines that the storage location of an RDD or the memory settings have changed, it may transmit the change information of the RDD storage location and of the memory settings to the driver manager as the change information of the executor process.

Meanwhile, a memory management method in a Spark environment according to an embodiment of the present application may include: (a) executing a driver process included in a master to divide a job into a plurality of tasks and allocate them; and (b) executing an executor process included in a worker node to perform the tasks allocated by the driver process and transmit the execution results to the driver process. Step (b) may include: (b1) a worker manager included in the worker node determining whether a bottleneck has occurred in the Spark environment, performing processing to resolve the bottleneck according to the determination result, and transmitting to the master the change information of the executor process changed by that processing; and (b2) a driver manager included in the master updating the driver process with the change information received from the worker manager.

In addition, in step (b), the executor process may generate RDDs for performing the tasks and store them in memory or on a high-speed storage medium, and in step (b1), the worker manager may resolve the bottleneck by moving some of the RDDs stored in the memory to the high-speed storage medium.

In addition, in step (b1), the worker manager may determine whether the bottleneck has occurred by checking, at a preset interval, whether at least one of a full garbage collection and a shuffle spill has occurred.

In addition, in step (b1), when it is determined that a full garbage collection has occurred, the worker manager may identify, based on a preset RDD replacement algorithm, a first RDD to be moved to the high-speed storage medium from among the RDDs stored in the memory as said some of the RDDs, copy the first RDD to the high-speed storage medium, and delete the first RDD stored in the memory.

In addition, in step (b), to prevent in advance a bottleneck caused by full garbage collection, the worker manager may determine whether a preset heap size has been exceeded and, if so, control the executor process so that RDDs are stored on the high-speed storage medium instead of in the memory.

In addition, in step (b1), when it is determined that a shuffle spill has occurred, the worker manager may decide whether to increase the share (%) of the memory allotted to the shuffle space by determining whether that share is less than a preset value.

In addition, in step (b1), when the share of the shuffle space is determined to be less than the preset value, the worker manager may, in order to increase that share, identify based on a preset RDD replacement algorithm a second RDD to be moved to the high-speed storage medium from among the RDDs stored in the memory as said some of the RDDs, copy the second RDD to the high-speed storage medium, and delete the second RDD stored in the memory.

In addition, in step (b1), the worker manager may increase the share of the shuffle space by deleting the second RDD from the memory, and the maximum amount by which the share of the shuffle space can be increased at one time may be set by user input.

In addition, in step (b1), the worker manager may move said some of the RDDs to the high-speed storage medium using the Spark API.

In addition, in step (b1), the worker manager may receive RDD information and memory information from the executor process and control the storage location of the RDDs and the memory settings of the executor process; when it determines that the storage location of an RDD or the memory settings have changed, it may transmit the change information of the RDD storage location and of the memory settings to the driver manager as the change information of the executor process.

The above-described means for solving the problems are merely illustrative and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description.

According to the above-described means, the factors that cause a bottleneck in a Spark environment are identified and the identified bottleneck is resolved, so that the causes of the severe performance degradation that occurs when memory runs short in a Spark environment can be addressed.

According to the above-described means, using a high-speed storage medium in a Spark environment can improve the performance of an in-memory distributed processing framework.

According to the above-described means, using a real-time analysis tool that does not affect the performance of the distributed framework allows Spark's performance to be improved more effectively.

FIG. 1 is a diagram showing a schematic configuration of a Spark cluster system to which a Spark-based memory management system according to an embodiment of the present application is applied.
FIG. 2 is a diagram showing a schematic configuration of a Spark-based memory management system according to an embodiment of the present application.
FIG. 3 is a diagram showing how the worker manager resolves a bottleneck in a Spark-based memory management system according to an embodiment of the present application.
FIGS. 4A and 4B are schematic operation flowcharts of a Spark-based memory management method according to an embodiment of the present application.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can readily practice them. The present application may, however, be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted for clarity, and similar reference numerals denote similar parts throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element in between.

Throughout this specification, when a member is said to be located "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where yet another member is present between the two.

Throughout this specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a diagram showing a schematic configuration of a Spark cluster system to which a Spark-based memory management system according to an embodiment of the present application is applied.

Referring to FIG. 1, a typical Spark cluster system (100) may include a network (10), a master (20), and a plurality of worker nodes (30a, 30b, …, 30x).

The master (20) and the worker nodes (30a, 30b, …, 30x) may be connected through the network (10) and may communicate with one another through it.

Examples of the network (10) include, but are not limited to, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, an NFC (Near Field Communication) network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

The master (20) and the worker nodes (30a, 30b, …, 30x) are devices capable of computing and may be, for example, a desktop PC, a notebook, a tablet, a mobile phone, a smartphone, a mobile communication terminal, or a PDA (personal digital assistant), but are not limited thereto.

When a job for big-data analysis is run in a Spark environment, the master (20) executes a driver process (21), which is a Java process, and each of the worker nodes (30a, 30b, …, 30x) that actually perform the work may execute an executor process (31a, 31b, …, 31x), which is a Java process that handles tasks. Because the components included in each worker node and the processing performed by each are identical, the following describes, for convenience, communication between the master (20) and one representative worker node (30a). That is, what is described below for the worker node (30a) applies equally to the other worker nodes (30b, …, 30x).

Looking again at how a job is submitted and executed: first, the master (20) may execute the driver process (21), which divides the job into a plurality of tasks for distributed processing and allocates the divided tasks to the worker nodes (30a, 30b, …, 30x). In doing so, the driver process (21) may take the state of the worker nodes (30a, 30b, …, 30x) into account and allocate the divided tasks by transmitting them to the executor processes (31a, 31b, …, 31x) included in the worker nodes.
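The division and allocation step can be illustrated with a small sketch. It is not Spark's scheduler: the text says only that the driver process considers the state of the worker nodes, so this example assumes that "state" means a current load figure and always gives the next task to the least-loaded executor. The function names and the load metric (number of records) are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence


def split_job(records: Sequence[int], num_tasks: int) -> List[List[int]]:
    """Divide a job's records into num_tasks nearly equal tasks."""
    size, extra = divmod(len(records), num_tasks)
    tasks, start = [], 0
    for i in range(num_tasks):
        end = start + size + (1 if i < extra else 0)
        tasks.append(list(records[start:end]))
        start = end
    return tasks


def assign_tasks(tasks: List[List[int]],
                 load: Dict[str, int]) -> Dict[str, List[List[int]]]:
    """Give each task to the executor with the lowest current load."""
    heap = [(current, name) for name, current in load.items()]
    heapq.heapify(heap)
    assignment: Dict[str, List[List[int]]] = {name: [] for name in load}
    for task in tasks:
        current, name = heapq.heappop(heap)          # least-loaded executor
        assignment[name].append(task)
        heapq.heappush(heap, (current + len(task), name))
    return assignment
```

For instance, splitting seven records into three tasks and assigning them to executors `"31a"` (load 0) and `"31b"` (load 2) sends the first and third tasks to `"31a"` and the second to `"31b"`.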

The worker node (30a) that has been allocated tasks may execute the executor process (31a), which performs the tasks allocated by the driver process (21) and transmits the results to the driver process (21). The executor process (31a) that receives tasks from the driver process (21) performs multiple tasks; to do so, it may generate RDDs, which are the data needed to process the tasks, and store the generated RDDs in the memory (32a) or on a high-speed storage medium (for example, an SSD) (33a).

Here, RDD stands for Resilient Distributed Dataset and denotes a dataset that is created as Java objects and cannot be modified once it is stored in memory. RDDs were devised to gain the performance benefit of caching when processing jobs that are repeated, as in machine learning or on Hadoop.

After generating the RDDs, the executor process (31a) may perform all of its tasks using them and then transmit the task results to the driver process (21) of the master (20).

A Spark-based memory management system according to an embodiment of the present application, built on the Spark cluster system (100) shown in FIG. 1, is described in detail below.

FIG. 2 is a diagram showing a schematic configuration of a Spark-based memory management system according to an embodiment of the present application.

Referring to FIG. 2, a Spark-based memory management system (200) according to an embodiment of the present application may include a network (10), a master (20), and a plurality of worker nodes (30a, 30b, …, 30x). These may be the same as the network (10), the master (20), and the worker nodes (30a, 30b, …, 30x) shown in FIG. 1, except that in the system (200) shown in FIG. 2 the master (20) further includes a driver manager (22) and the worker node (30a) further includes a worker manager (34a) and an analysis tool (35a).

As in the description of FIG. 1, the description of FIG. 2 covers, for convenience, only one worker node (30a) among the worker nodes (30a, 30b, …, 30x); what is described for it applies equally to the other worker nodes (30b, …, 30x). That is, each of the worker nodes (30a, 30b, …, 30x) may include its own worker manager and analysis tool.

To summarize the overall concept of the Spark-based memory management system (200) according to an embodiment of the present application: the driver manager (22) may receive from the worker manager (34a) the information changed under the worker manager's control and update the driver process (21) with it. The driver manager (22) may communicate with the worker manager (34a) through the network (10).

The worker manager (34a) may be provided in the worker node (30a) together with the executor process (31a), which stores RDDs and processes tasks. Using a system analysis tool, that is, the analysis tool (35a), the worker manager (34a) may check at an interval preset by the user whether a bottleneck factor has arisen in the Spark environment. In the present application, the occurrence of full garbage collection (Full GC) and of shuffle spill are taken as examples of bottleneck factors in a Spark environment. A Java analysis tool may be used as the analysis tool (35a) to check whether a full garbage collection has occurred, and the Spark Web UI and a computer resource analysis tool may be used as the analysis tool (35a) to check whether a shuffle spill has occurred.

More specifically, two main factors degrade performance when a job runs in a Spark environment. The first is the occurrence of a Full GC (or Major GC), the Java garbage collection that cleans up old objects. A Full GC consumes a large share of the computer's resources, so it creates a bottleneck in Spark, the framework for distributed big-data processing. To resolve this bottleneck, when a Full GC is determined to have occurred, the present application may use a high-speed storage medium to secure sufficient memory space; more specifically, it may free memory by moving some of the RDDs stored in memory to the high-speed storage medium.

The second factor that degrades performance in a Spark environment is shuffle spill. If the memory used for shuffling runs short while a job is running, a shuffle spill occurs, in which the spilled data is written to disk. When the spilled data is moved from memory to disk, the data held as Java objects is serialized; as the amount of spilled data grows, CPU usage rises, creating a bottleneck on the CPU that is needed for the big-data analysis itself. To resolve this bottleneck, when a shuffle spill is determined to have occurred, the present application may free memory by moving some of the RDDs stored in memory to the high-speed storage medium and then use the freed space to increase the memory available for shuffling. This is described in more detail later.

When a bottleneck factor is identified, the worker manager (34a) may resolve it by exchanging information (for example, information from which the occurrence of a bottleneck can be determined) with the executor process (31a). In addition, when the storage location of an RDD, the memory settings, or the like are changed through the executor process (31a), the worker manager (34a) may receive the changed information from the executor process (31a) and transmit it to the driver manager (22).

In other words, when a task is assigned by the driver process (21), the worker manager (34a) can obtain from the system analysis tool (35a) information from which it can determine whether a bottleneck factor has arisen, such as Java GC, Java heap capacity, and CPU usage. The worker manager (34a) can then obtain the RDD information and the memory information of the execution process (31a) from the execution process (31a), so that it can control the storage location of RDDs and the memory settings of the execution process (31a). Next, to resolve the bottleneck, the worker manager (34a) can change the RDD storage location, memory settings, and the like used by the execution process (31a). The driver manager (22) can then receive from the worker manager (34a) the change information resulting from resolving the bottleneck (for example, RDD storage location change information and memory setting change information). Here, the driver manager (22) can receive change information from each of the worker managers included in the plurality of worker nodes (30a, 30b, …, 30x). Finally, the driver manager (22) can update the driver process (21) with the change information received from the worker manager (34a).

In the Spark-based memory management system (200) according to an embodiment of the present application, the driver process (21) in the master (20) and the execution process (31a) in the worker node (30a) exchange task information and task execution result information over the network (10), and the driver manager (22) in the master (20) and the worker manager (34a) in the worker node (30a) can exchange, over the network (10), RDD information and status information of the worker node (30a) (for example, information from which the occurrence of a bottleneck can be determined, such as Java GC, Java heap capacity, and CPU usage). The present application is described in more detail below.

The master (20) can execute the driver process (21), which divides a job into a plurality of tasks and assigns them to the plurality of worker nodes (30a, 30b, …, 30x). The divided tasks are transmitted to the plurality of worker nodes (30a, 30b, …, 30x) over the network (10), and in response the worker node (30a) can execute the execution process (31a), which performs the task assigned by the driver process (21).
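The division and assignment described above can be pictured with a minimal Python sketch. The function names `split_job` and `assign_tasks`, the interleaved partitioning, and the round-robin assignment are all assumptions made for illustration; the description does not state how the driver process (21) partitions a job or chooses a worker node.

```python
def split_job(records, num_tasks):
    """Divide a job's input records into num_tasks interleaved slices (hypothetical strategy)."""
    return [records[i::num_tasks] for i in range(num_tasks)]


def assign_tasks(tasks, worker_ids):
    """Assign tasks to worker nodes in round-robin order (assumed policy)."""
    assignment = {worker_id: [] for worker_id in worker_ids}
    for index, task in enumerate(tasks):
        worker_id = worker_ids[index % len(worker_ids)]
        assignment[worker_id].append(task)
    return assignment


tasks = split_job(list(range(10)), 3)
plan = assign_tasks(tasks, ["30a", "30b"])
```

In this sketch each worker id simply stands in for a worker node such as 30a or 30b; sending the tasks over the network (10) is left out.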

Here, the execution process (31a) can generate the RDD needed to perform the task and store it in the memory (32a) or the high-speed storage medium (33a). As an example, the memory (32a) may be SD, micro SD, or the like, and the high-speed storage medium (33a) may be an SSD, but they are not limited thereto.

Thereafter, the execution process (31a) can perform the task using the generated RDD and transmit the result of the task to the driver process (21). In the Spark-based memory management system (200) according to an embodiment of the present application, to improve Spark performance, the worker manager (34a) determines whether a bottleneck has occurred in the Spark environment, performs processing to resolve the bottleneck according to the determination result, and can transmit the change information of the execution process (31a), changed as a result of that processing, to the driver manager (22) of the master (20).

The worker manager (34a) can determine whether a bottleneck has occurred by checking, at a preset period, whether at least one of full garbage collection (Full GC) and shuffle spill has occurred. When it determines that a bottleneck has occurred, the worker manager (34a) can resolve it by moving some of the RDDs stored in the memory (32a) to the high-speed storage medium (33a). Meanwhile, to prevent full garbage collection in advance, the worker manager (34a) can take the heap size into account and store a generated RDD in the high-speed storage medium (33a) instead of the memory (32a). This can be more easily understood with reference to FIG. 3.

FIG. 3 is a diagram showing the process by which the worker manager (34a) resolves a bottleneck in the Spark-based memory management system (200) according to an embodiment of the present application.

Referring to FIG. 3, to determine whether a bottleneck has occurred (S310), the worker manager (34a) can check the Java virtual machine state and shuffle spill at a preset period. The period at which the Java virtual machine state and shuffle spill are checked, that is, the preset period, may be set by user input.
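A periodic check of this kind could be organized as in the following Python sketch. The `monitor` function, its parameters, and the idea of passing each check as a callable are assumptions for illustration only; the description states just that the two conditions are checked at a user-set period.

```python
import time


def monitor(checks, period_s, rounds, sleep=time.sleep):
    """Run every check once per period and record which ones fired.

    checks   -- mapping from a bottleneck name to a zero-argument callable
    period_s -- the user-configured checking period in seconds
    rounds   -- number of periods to run (a real monitor would loop until stopped)
    sleep    -- injectable so the loop can be exercised without waiting
    """
    detected = []
    for round_no in range(rounds):
        for name, check in checks.items():
            if check():
                detected.append((round_no, name))
        sleep(period_s)
    return detected


# Example with stand-in checks: a Full GC is "seen" only in the second round.
gc_readings = iter([False, True, False])
result = monitor(
    {"full_gc": lambda: next(gc_readings), "shuffle_spill": lambda: False},
    period_s=5.0,
    rounds=3,
    sleep=lambda seconds: None,
)
```

Each detected entry would then trigger the corresponding branch of FIG. 3 (the Full GC branch starting at S320 or the shuffle spill branch starting at S330).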

The worker manager (34a) can periodically monitor the Java virtual machine state to check whether full garbage collection (Full GC), one of the bottleneck factors, has occurred, and can periodically monitor shuffle spill to check whether shuffle spill, the other bottleneck factor, has occurred.
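One way to observe Full GC events from outside the JVM is to compare the FGC (full GC count) column of two successive `jstat -gcutil` readings. The description does not name `jstat`; using it here is an assumption, and the sample text below is made-up output in the tool's column layout. The sketch looks the column up by its header name, so it does not depend on the exact set of columns.

```python
def full_gc_count(jstat_output):
    """Return the FGC value from the last data row of `jstat -gcutil` text."""
    rows = [line.split() for line in jstat_output.strip().splitlines() if line.strip()]
    header, values = rows[0], rows[-1]
    return int(values[header.index("FGC")])


def full_gc_occurred(previous_output, current_output):
    """True if the full GC count grew between two readings."""
    return full_gc_count(current_output) > full_gc_count(previous_output)


# Illustrative (made-up) readings taken one monitoring period apart.
SAMPLE_BEFORE = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  96.88  12.50  40.10  95.20  90.15     12    0.340     1    0.120    0.460
"""
SAMPLE_AFTER = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00   3.10  22.40  95.30  90.15     14    0.410     3    0.980    1.390
"""
```

In a deployment the text would come from running the tool against the execution process's JVM at each period; that invocation is omitted here.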

Here, to prevent in advance a bottleneck caused by full garbage collection (Full GC), the worker manager (34a) can first check the Java virtual machine state (S320) and monitor the heap size before the generated RDD is stored in the memory (32a) or the high-speed storage medium (33a) by the execution process (31a). The heap size here is the size of the space, within the total storage space of the memory (32a), in which RDDs can be stored; in other words, it may mean the capacity of the heap that stores RDDs. The heap size (or heap capacity) may be preset by user input.

The worker manager (34a) can monitor the heap size and determine whether the volume of RDDs stored in the memory (32a) exceeds the preset heap size (or heap capacity) (S321). If the volume of RDDs stored in the memory (32a) exceeds the preset heap size (S321-Y), the worker manager (34a) can control the execution process (31a) so that the generated RDD is stored in the high-speed storage medium (33a) instead of the memory (32a) (S322). If the volume of RDDs stored in the memory (32a) does not exceed the preset heap size (S321-N), the worker manager (34a) can control the execution process (31a) so that the generated RDD is stored in the memory (32a) (S323). The worker manager (34a) then checks whether the RDD is the last of the plurality of RDDs (S324); if it is not the last RDD (S324-N), S320 is performed again, and if it is the last RDD (S324-Y), the algorithm can terminate.
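The placement decision of steps S321 to S323 can be sketched in Python as follows. The `WorkerStore` class, its field names, and the use of byte counts are illustrative assumptions; the comparison itself follows the description, which checks whether the RDD volume already held in memory exceeds the preset heap size.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerStore:
    heap_limit_bytes: int                              # preset heap size (user input)
    memory: dict = field(default_factory=dict)         # rdd_id -> size, stands in for memory 32a
    fast_storage: dict = field(default_factory=dict)   # rdd_id -> size, stands in for medium 33a

    def memory_used(self):
        return sum(self.memory.values())

    def place(self, rdd_id, size_bytes):
        """Decide where a newly generated RDD is stored."""
        # S321: does the RDD volume already in memory exceed the preset heap size?
        if self.memory_used() > self.heap_limit_bytes:
            self.fast_storage[rdd_id] = size_bytes     # S322: store on the fast medium
            return "fast_storage"
        self.memory[rdd_id] = size_bytes               # S323: store in memory
        return "memory"


store = WorkerStore(heap_limit_bytes=100)
placements = [store.place("rdd_a", 60), store.place("rdd_b", 60), store.place("rdd_c", 10)]
```

Calling `place` once per generated RDD until the last one corresponds to the loop closed by S324.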

Meanwhile, even when the heap size is not exceeded (S321-N) and the generated RDD is stored in the memory (32a) (S323), the memory (32a) may run short while the job is being executed, and full garbage collection (Full GC) may occur. When Full GC occurs, computing resources such as the CPU are heavily used, which causes a bottleneck in the Spark cluster. Therefore, to resolve the bottleneck caused by Full GC, the worker manager (34a) can check the Java virtual machine state (S320) and determine whether full garbage collection has occurred (S325).

If it is determined in step S325 that full garbage collection has not occurred (S325-N), the worker manager (34a) can return to step S320. If it is determined in step S325 that full garbage collection has occurred (S325-Y), the worker manager (34a) can identify, as the partial RDD, a first RDD to be moved to the high-speed storage medium (33a) from among the RDDs stored in the memory (32a), based on a preset RDD replacement algorithm (S326).

Here, the preset RDD replacement algorithm may be, for example, an LRU (Least Recently Used) algorithm, an LFU (Least Frequently Used) algorithm, or a FIFO (First In, First Out) algorithm, but is not limited thereto. Because each of these algorithms is a known technique, a detailed description is omitted.
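For readers unfamiliar with the three policies, the following Python sketch shows how each one would pick the RDD to move. The `RddStats` record and its fields are hypothetical bookkeeping introduced for illustration; the description does not say what metadata the worker manager (34a) keeps.

```python
from dataclasses import dataclass


@dataclass
class RddStats:
    rdd_id: str
    created_at: int     # logical time at which the RDD was cached
    last_used_at: int   # logical time of the most recent access
    use_count: int      # number of accesses so far


def pick_victim(stats, policy):
    """Return the id of the RDD that the given replacement policy would move out."""
    sort_key = {
        "LRU": lambda s: s.last_used_at,   # least recently used
        "LFU": lambda s: s.use_count,      # least frequently used
        "FIFO": lambda s: s.created_at,    # first in, first out
    }[policy]
    return min(stats, key=sort_key).rdd_id


cached = [
    RddStats("rdd_a", created_at=1, last_used_at=9, use_count=5),
    RddStats("rdd_b", created_at=2, last_used_at=3, use_count=7),
    RddStats("rdd_c", created_at=3, last_used_at=8, use_count=1),
]
```

With this sample, the three policies each select a different RDD, which illustrates why the choice of policy is left configurable.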

Thereafter, the worker manager (34a) can move the first RDD stored in the memory (32a) to the high-speed storage medium (33a) by copying the first RDD to the high-speed storage medium (33a) and then deleting the first RDD from the memory (32a) (S327). Here, the worker manager (34a) can control the movement of RDDs between the memory (32a) and the high-speed storage medium (33a) by using the Spark API (Application Programming Interface). That is, the worker manager (34a) can move the first RDD stored in the memory (32a) to the high-speed storage medium (33a) by using the Spark API.

Thereafter, the worker manager (34a) can transmit the change information resulting from the movement of the first RDD (for example, the change in the RDD's storage location) to the driver manager (22) (S328). After step S328, the worker manager (34a) can return to step S320 in order to repeat, at the preset period, the determination of whether a bottleneck caused by Full GC has occurred.
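Steps S325 to S328 can be sketched together in Python as below. The sketch assumes an LRU policy and models the memory (32a) as an `OrderedDict` whose first entry is the least recently used RDD; the function names and the shape of the returned change record are illustrative, not taken from the description.

```python
from collections import OrderedDict


def touch(memory, rdd_id):
    """Mark an RDD as just used so it becomes the last eviction candidate."""
    memory.move_to_end(rdd_id)


def evict_on_full_gc(memory, fast_storage, full_gc_detected):
    """Move one RDD out of memory when a Full GC has been detected.

    Returns the change information to report to the driver manager (S328),
    or None when nothing was moved (S325-N).
    """
    if not full_gc_detected or not memory:
        return None
    rdd_id = next(iter(memory))               # S326: first entry is least recently used
    fast_storage[rdd_id] = memory[rdd_id]     # S327: copy to the fast medium ...
    del memory[rdd_id]                        # ... then delete from memory
    return {"rdd_id": rdd_id, "location": "fast_storage"}


memory = OrderedDict([("rdd_1", b"partition-a"), ("rdd_2", b"partition-b")])
fast_storage = {}
touch(memory, "rdd_1")                        # rdd_2 is now the least recently used
no_change = evict_on_full_gc(memory, fast_storage, full_gc_detected=False)
change = evict_on_full_gc(memory, fast_storage, full_gc_detected=True)
```

Copying before deleting means the RDD is never absent from both locations, which matches the order given in S327.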

Meanwhile, in step S310, the worker manager (34a) can check for shuffle spill at the preset period (S330) to determine whether a bottleneck has occurred.

Here, shuffle spill means that intermediate data generated by a shuffle is spilled to disk because the capacity of the memory (32a) is insufficient. Shuffle spill becomes a bottleneck in a Spark cluster because the CPU that should be used for task computation is instead used to serialize RDDs as a result of the spill. During serialization, CPU usage reaches 100%, and the execution process (31a) becomes unable to process tasks. When such a bottleneck occurs, the total job execution time of the memory management system (200) is extended by the duration of the bottleneck, and Spark performance degrades. Therefore, to improve Spark performance, the worker manager (34a) can determine whether a shuffle spill has occurred and then perform processing to resolve the bottleneck caused by it. To check whether a shuffle spill has occurred, the worker manager (34a) can use the analysis tool (35a), which includes the Spark Web UI and a computer resource analysis tool, and can thereby check both the occurrence of shuffle spill and the CPU usage.
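As a rough illustration of spill detection, the Python sketch below scans per-stage metrics shaped like the stage entries returned by Spark's monitoring REST API, which report `memoryBytesSpilled` and `diskBytesSpilled`. Treat those field names as something to verify against the Spark version in use; the description itself mentions only the Spark Web UI. The sample values are made up, and fetching the data over HTTP is omitted.

```python
def stages_with_spill(stages):
    """Return the ids of stages that have written spilled data to disk."""
    return [stage["stageId"] for stage in stages if stage.get("diskBytesSpilled", 0) > 0]


def shuffle_spill_detected(stages):
    """True if any stage reports spilled bytes on disk."""
    return bool(stages_with_spill(stages))


# Made-up sample in the assumed shape of the per-stage metrics.
SAMPLE_STAGES = [
    {"stageId": 0, "memoryBytesSpilled": 0, "diskBytesSpilled": 0},
    {"stageId": 1, "memoryBytesSpilled": 734003200, "diskBytesSpilled": 52428800},
]
```

A CPU usage reading from a separate resource analysis tool would complement this check, as the description suggests.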

If checking for shuffle spill at the preset period in step S330 shows that a shuffle spill has occurred, the worker manager (34a) can decide whether to increase the share of shuffle space by determining whether the share (%) of shuffle space in the memory (32a) is at or above (or below) a preset value (T%, for example 80%) (S331). The preset value (for example, 80%) used as the criterion for this decision is the maximum share of the total space of the memory (32a) that can be used as shuffle space, and it may be set by user input.

More specifically, when it is determined that a shuffle spill has occurred, the worker manager (34a) can check how much of the total space of the memory (32a) is currently used as shuffle space (that is, what percentage of the memory (32a) the shuffle space occupies). As an example, assume that the maximum space usable as shuffle space in the memory (32a) is 80%. If the space currently used as shuffle space in the memory (32a) is 80% or more (S331-Y), that is, if it is not below the preset value (80%), the worker manager (34a) can no longer increase the shuffle space and can therefore terminate the algorithm.

On the other hand, if the space currently used as shuffle space in the memory (32a) is determined not to be at or above the preset value (80%) (S331-N), the worker manager (34a) can increase the share of shuffle space through the following procedure. More specifically, in the case of S331-N, to increase the share of shuffle space, the worker manager (34a) can identify, as the partial RDD, a second RDD to be moved to the high-speed storage medium (33a) from among the RDDs stored in the memory (32a), based on the preset RDD replacement algorithm (S332).

Here, the preset RDD replacement algorithm may be, for example, an LRU (Least Recently Used) algorithm, an LFU (Least Frequently Used) algorithm, or a FIFO (First In, First Out) algorithm, but is not limited thereto.

Thereafter, the worker manager (34a) can move the second RDD stored in the memory (32a) to the high-speed storage medium (33a) by copying the second RDD to the high-speed storage medium (33a) and then deleting the second RDD from the memory (32a) (S333). Here, the worker manager (34a) can control the movement of RDDs between the memory (32a) and the high-speed storage medium (33a) by using the Spark API (Application Programming Interface). That is, the worker manager (34a) can move the second RDD stored in the memory (32a) to the high-speed storage medium (33a) by using the Spark API.

Thereafter, based on the free space in the memory (32a) secured by deleting the second RDD from the memory (32a), the worker manager (34a) can increase the share of shuffle space so that shuffle spill does not occur (S334). When the share of shuffle space is increased in step S334, the maximum increase that can be applied at one time (that is, the maximum proportion by which the shuffle space can be enlarged at once when the second RDD is deleted) may be set by user input.

Thereafter, the worker manager (34a) can transmit the change information resulting from the movement of the second RDD (for example, the change in the RDD's storage location and the change in the settings of the memory (32a)) to the driver manager (22) (S335). After step S335, the worker manager (34a) can determine again whether the share of shuffle space in the memory (32a) is at or above (or below) the preset value (S336). If the space currently used as shuffle space is at or above the preset value in step S336 (S336-Y), the worker manager (34a) can no longer increase the shuffle space and can therefore terminate the algorithm. If the space currently used as shuffle space is not at or above the preset value in step S336 (S336-N), the worker manager (34a) can return to step S330 in order to repeat, at the preset period, the determination of whether a bottleneck caused by shuffle spill has occurred.
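The shuffle spill branch (S331 to S336) can be summarized in a short Python sketch. Everything below is an illustrative model: shares are whole percentages of the memory (32a), the victim is chosen first-in-first-out from an insertion-ordered dict, and the early exit when no RDD remains in memory is an added assumption that the description does not address.

```python
def relieve_shuffle_spill(shuffle_pct, limit_pct, max_step_pct, memory, fast_storage):
    """Run one pass of S331-S336 after a shuffle spill has been detected.

    shuffle_pct  -- current share of memory used as shuffle space
    limit_pct    -- preset maximum share (T%, e.g. 80)
    max_step_pct -- user-set cap on how much the share may grow at once
    memory       -- dict rdd_id -> share of memory the RDD occupies (oldest first)
    Returns (new_shuffle_pct, change_info, keep_monitoring).
    """
    if shuffle_pct >= limit_pct or not memory:         # S331-Y: cannot grow further
        return shuffle_pct, None, False
    rdd_id = next(iter(memory))                        # S332: pick the second RDD (FIFO assumed)
    freed_pct = memory[rdd_id]
    fast_storage[rdd_id] = freed_pct                   # S333: copy to the fast medium ...
    del memory[rdd_id]                                 # ... then delete from memory
    growth = min(freed_pct, max_step_pct, limit_pct - shuffle_pct)
    new_pct = shuffle_pct + growth                     # S334: enlarge the shuffle space
    change = {"rdd_id": rdd_id, "location": "fast_storage", "shuffle_pct": new_pct}  # S335
    return new_pct, change, new_pct < limit_pct        # S336: continue only below the limit


memory = {"rdd_1": 15, "rdd_2": 30}
fast_storage = {}
first = relieve_shuffle_spill(60, 80, 10, memory, fast_storage)
second = relieve_shuffle_spill(first[0], 80, 10, memory, fast_storage)
third = relieve_shuffle_spill(second[0], 80, 10, memory, fast_storage)
```

The growth is bounded three ways: by the space actually freed, by the user-set step, and by the headroom left under the limit.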

In this way, because the worker manager (34a) periodically checks whether either of the two bottleneck factors (for example, Full GC and shuffle spill) has occurred and then resolves each bottleneck, the Spark-based memory management system (200) according to an embodiment of the present application can improve Spark performance more effectively.

In the present application, the worker manager (34a) can receive RDD information and memory information from the execution process (31a) and, based on the received information, control the RDD storage location and memory settings used by the execution process (31a). When it is determined that the RDD storage location or the memory settings have changed, the worker manager (34a) can receive the RDD storage location change information or the memory setting change information from the execution process (31a). The worker manager (34a) can then transmit the received change information to the driver manager (22) as the change information of the execution process (31a).

The driver manager (22) of the master (20) can update the driver process (21) with the change information received from the worker manager (34a).
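On the master side, this update can be thought of as keeping a per-worker view of where each RDD lives and what memory settings are in force. The `DriverManager` class below is a hypothetical Python model of that bookkeeping; the message fields (`rdd_id`, `location`, `memory_setting`) are assumptions, and how the information is actually applied to the driver process (21) is not specified in the description.

```python
class DriverManager:
    """Collects change information reported by the worker managers."""

    def __init__(self):
        self.rdd_locations = {}    # (worker_id, rdd_id) -> "memory" or "fast_storage"
        self.memory_settings = {}  # worker_id -> dict of memory settings

    def apply_change(self, worker_id, change):
        """Record one change message so that the driver's view stays current."""
        if "rdd_id" in change:
            self.rdd_locations[(worker_id, change["rdd_id"])] = change["location"]
        if "memory_setting" in change:
            self.memory_settings.setdefault(worker_id, {}).update(change["memory_setting"])


manager = DriverManager()
manager.apply_change("30a", {"rdd_id": "rdd_2", "location": "fast_storage"})
manager.apply_change("30a", {"memory_setting": {"shuffle_pct": 70}})
manager.apply_change("30b", {"rdd_id": "rdd_7", "location": "fast_storage",
                             "memory_setting": {"shuffle_pct": 80}})
```

Keying the tables by worker id reflects the statement that change information arrives from the worker manager of each worker node.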

The Spark-based memory management system (200) according to an embodiment of the present application can improve caching efficiency by managing caching in units of Spark RDDs, and it can be applied to various big data distributed processing frameworks. In addition, by using an SSD, which is high-performance storage, the memory management system (200) can achieve high efficiency at low cost and thereby improve the performance of in-memory systems.

Hereinafter, the operation flow of the present application is briefly reviewed based on the details described above.

FIGS. 4A and 4B are schematic operation flowcharts of a Spark-based memory management method according to an embodiment of the present application.

The Spark-based memory management method shown in FIGS. 4A and 4B can be performed by the Spark-based memory management system (200) described above. Therefore, even where details are omitted below, the description given for the Spark-based memory management system (200) applies equally to FIGS. 4A and 4B.

Referring to FIGS. 4A and 4B, in step S410 the driver process included in the master can be executed to divide a job into a plurality of tasks and assign them.

Next, in step S420, the execution process included in the worker node can be executed to perform the task assigned by the driver process and transmit the result to the driver process. In step S420, the execution process can generate an RDD for performing the task and store it in the memory or the high-speed storage medium.

Also in step S420, to prevent in advance a bottleneck caused by full garbage collection, the worker manager can determine whether the preset heap size is exceeded when the generated RDD is to be stored and, if the preset heap size is exceeded, control the execution process so that the RDD is stored in the high-speed storage medium instead of the memory.

Also in step S420, the worker manager included in the worker node can determine whether a bottleneck has occurred in the Spark environment, perform processing to resolve the bottleneck according to the determination result, and transmit the change information of the execution process, changed as a result of that processing, to the master (S421).

In step S421, the worker manager can resolve the bottleneck by moving some of the RDDs stored in the memory to the high-speed storage medium. Here, the worker manager can move those RDDs to the high-speed storage medium by using the Spark API.

In step S421, the worker manager can determine whether a bottleneck has occurred by checking, at a preset period, whether at least one of full garbage collection and shuffle spill has occurred.

Also in step S421, when it is determined that full garbage collection has occurred, the worker manager can identify, as the partial RDD, a first RDD to be moved to the high-speed storage medium from among the RDDs stored in the memory based on the preset RDD replacement algorithm, copy the first RDD to the high-speed storage medium, and delete the first RDD from the memory.

Also in step S421, when it is determined that a shuffle spill has occurred, the worker manager can decide whether to increase the share of shuffle space by determining whether the share of shuffle space in the memory is below a preset value.

Here, in step S421, when the share of shuffle space is determined to be below the preset value, the worker manager can, in order to increase the share of shuffle space, identify as the partial RDD a second RDD to be moved to the high-speed storage medium from among the RDDs stored in the memory based on the preset RDD replacement algorithm, copy the second RDD to the high-speed storage medium, and delete the second RDD from the memory.

In step S421, the worker manager can increase the share of shuffle space by deleting the second RDD from the memory; the maximum increase in the shuffle space share that can be applied at one time may be set by user input.

In step S421, the worker manager can receive RDD information and memory information from the execution process and control the RDD storage location and memory settings used by the execution process.

In step S421, when it is determined that the RDD storage location and the memory settings have changed, the worker manager can receive the RDD storage location change information and the memory setting change information from the execution process, and transmit the received change information to the driver manager as the change information of the execution process.

Thereafter, in step S422, the driver manager included in the master can update the driver process with the change information received from the worker manager.

In the above description, steps S410 to S420 may, depending on the implementation of the present application, be further divided into additional steps or combined into fewer steps. In addition, some steps may be omitted as needed, and the order of the steps may be changed.

The Spark-based memory management method according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

In addition, the Spark-based memory management method described above may also be implemented in the form of a computer program or application that is stored on a recording medium and executed by a computer.

The foregoing description of the present application is for illustrative purposes, and those of ordinary skill in the art to which the present application pertains will understand that it can easily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present application is indicated by the claims set out below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

200: Spark-based memory management system
20: Master
21: Driver process
22: Driver manager
30a: Worker node
31a: Execution process
32a: Memory
33a: High-speed storage medium
34a: Worker manager
35a: Analysis tool

Claims

A memory management system in a Spark environment, comprising:
a master that executes a driver process for dividing a job into a plurality of tasks and assigning the tasks; and
a worker node that executes an execution process for performing a task assigned by the driver process and transmitting the execution result to the driver process,
wherein the worker node includes a worker manager that determines whether a bottleneck has occurred in the Spark environment, performs processing for resolving the bottleneck according to the determination result, and transmits to the master change information of the execution process changed by performing the processing,
wherein the master includes a driver manager that updates the driver process with the change information received from the worker manager,
wherein the execution process generates an RDD for performing the task and stores the RDD in a memory or a high-speed storage medium, and
wherein the worker manager resolves the bottleneck by moving a portion of the RDDs stored in the memory to the high-speed storage medium, and determines whether the bottleneck has occurred by checking, at a predetermined cycle, whether at least one of a full garbage collection and a shuffle spill has occurred.
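
By way of non-limiting illustration, the periodic check recited above can be sketched in Python as follows. The sketch assumes that the worker manager can sample cumulative full garbage collection and shuffle spill counters from the execution process once per cycle; `ExecutorSample`, `detect_bottleneck`, and `monitor` are hypothetical names, and how the counters are obtained is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass
class ExecutorSample:
    """One periodic sample of an execution process (all fields hypothetical)."""
    full_gc_count: int        # cumulative number of full garbage collections
    shuffle_spill_bytes: int  # cumulative bytes spilled during shuffle


def detect_bottleneck(prev: ExecutorSample, curr: ExecutorSample) -> Set[str]:
    """Return the bottleneck causes that appeared between two samples."""
    causes = set()
    if curr.full_gc_count > prev.full_gc_count:
        causes.add("full_gc")
    if curr.shuffle_spill_bytes > prev.shuffle_spill_bytes:
        causes.add("shuffle_spill")
    return causes


def monitor(samples: Iterable[ExecutorSample]) -> Iterator[Set[str]]:
    """Compare each sample with the previous one and yield the causes per cycle."""
    previous = None
    for current in samples:
        if previous is not None:
            yield detect_bottleneck(previous, current)
        previous = current
```

An empty set for a given cycle means that neither condition was observed, so no bottleneck is reported for that cycle.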

(Deleted)

The system according to claim 1,
wherein the worker manager, when it determines that the full garbage collection has occurred, identifies, based on a predetermined RDD replacement algorithm, a first RDD to be moved to the high-speed storage medium among the RDDs stored in the memory as the portion of the RDDs, stores the first RDD in the high-speed storage medium, and deletes the first RDD stored in the memory.
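
By way of non-limiting illustration, the store-then-delete sequence recited above can be sketched in Python as follows. The claim leaves the RDD replacement algorithm open; least-recently-used (LRU) ordering is assumed here purely as an example, and `RddStore` and its methods are hypothetical names that model the memory and the high-speed storage medium as in-process dictionaries.

```python
from collections import OrderedDict


class RddStore:
    """Toy model of a worker's memory and high-speed storage (hypothetical)."""

    def __init__(self):
        self.memory = OrderedDict()  # rdd_id -> data, least recently used first
        self.fast_storage = {}       # rdd_id -> data, e.g. on an SSD

    def put(self, rdd_id, data):
        """Store an RDD in memory and mark it as most recently used."""
        self.memory[rdd_id] = data
        self.memory.move_to_end(rdd_id)

    def get(self, rdd_id):
        """Read an RDD from memory if present, otherwise from fast storage."""
        if rdd_id in self.memory:
            self.memory.move_to_end(rdd_id)  # refresh recency on access
            return self.memory[rdd_id]
        return self.fast_storage[rdd_id]

    def evict_one(self):
        """Store the LRU RDD in fast storage, then delete it from memory."""
        if not self.memory:
            return None
        victim_id = next(iter(self.memory))                    # assumed LRU policy
        self.fast_storage[victim_id] = self.memory[victim_id]  # store first
        del self.memory[victim_id]                             # then delete
        return victim_id
```

Storing before deleting means the RDD remains readable from one of the two locations at every point in the sequence.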

The system according to claim 1,
wherein the worker manager, in order to prevent the bottleneck caused by the full garbage collection, determines whether the generated RDD exceeds a preset heap size and, if it does, controls the execution process so that the RDD is stored in the high-speed storage medium instead of the memory.

The system according to claim 1,
wherein the worker manager, when it determines that the shuffle spill has occurred, decides whether to increase the proportion of shuffle space by determining whether the proportion of shuffle space in the memory is less than a preset value.

The system according to claim 6,
wherein the worker manager, when it determines that the proportion of shuffle space is less than the preset value, identifies, based on a predetermined RDD replacement algorithm, a second RDD to be moved to the high-speed storage medium among the RDDs stored in the memory as the portion of the RDDs in order to increase the proportion of shuffle space, copies the second RDD to the high-speed storage medium, and deletes the second RDD stored in the memory.

The system according to claim 7,
wherein the worker manager increases the proportion of shuffle space by deleting the second RDD from the memory, and the maximum amount by which the proportion of shuffle space can be increased at one time is set by user input.
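
By way of non-limiting illustration, the shuffle-space decision recited in this claim and the two claims it depends on can be sketched in Python as follows. The function name, its parameters, and the clamp that keeps the proportion from exceeding 1.0 are assumptions made for illustration; proportions are expressed as fractions of the memory.

```python
def plan_shuffle_increase(current: float, threshold: float,
                          requested_step: float, max_step: float) -> float:
    """Return how much to raise the shuffle-space proportion in one adjustment.

    current        -- proportion of memory currently used as shuffle space (0..1)
    threshold      -- preset value; no increase is made at or above it
    requested_step -- increase the worker manager would like to apply
    max_step       -- user-set upper bound on a single increase
    """
    if current >= threshold:
        return 0.0  # proportion is already at or above the preset value
    # Cap the increase by the user-set maximum and by the remaining memory.
    return min(requested_step, max_step, 1.0 - current)
```

With a current proportion of 0.25, a preset value of 0.5, a requested increase of 0.25, and a user-set cap of 0.125, the sketch returns 0.125; the RDDs to evict from the memory would then be chosen to free that share.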

The system according to claim 1,
wherein the worker manager moves the portion of the RDDs to the high-speed storage medium using a Spark API.

The system according to claim 1,
wherein the worker manager receives information on the RDD and information on the memory from the execution process and controls the storage location of the RDD used by the execution process and the settings of the memory, and
when it determines that a change has occurred in the storage location of the RDD and the settings of the memory, transmits change information on the storage location of the RDD and change information on the memory settings to the driver manager as the change information of the execution process.

A method for managing memory in a Spark environment, comprising:
(a) executing a driver process included in a master to divide a job into a plurality of tasks and assign the tasks; and
(b) executing an execution process included in a worker node to perform a task assigned by the driver process and transmit the execution result to the driver process,
wherein step (b) includes:
(b1) determining, by a worker manager included in the worker node, whether a bottleneck has occurred in the Spark environment, performing processing for resolving the bottleneck according to the determination result, and transmitting to the master change information of the execution process changed by performing the processing; and
(b2) updating, by a driver manager included in the master, the driver process with the change information received from the worker manager,
wherein, in step (b), the execution process generates an RDD for performing the task and stores the RDD in a memory or a high-speed storage medium, and
wherein, in step (b1), the worker manager resolves the bottleneck by moving a portion of the RDDs stored in the memory to the high-speed storage medium, and determines whether the bottleneck has occurred by checking, at a predetermined cycle, whether at least one of a full garbage collection and a shuffle spill has occurred.

(Deleted)

The method of claim 11,
wherein, in step (b1), the worker manager, when it determines that the full garbage collection has occurred, identifies, based on a predetermined RDD replacement algorithm, a first RDD to be moved to the high-speed storage medium among the RDDs stored in the memory as the portion of the RDDs, stores the first RDD in the high-speed storage medium, and deletes the first RDD stored in the memory.

The method of claim 11,
wherein, in step (b), the worker manager, in order to prevent the bottleneck caused by the full garbage collection, determines whether the generated RDD exceeds a preset heap size and, if it does, controls the execution process so that the RDD is stored in the high-speed storage medium instead of the memory.

The method of claim 11,
wherein, in step (b1), the worker manager, when it determines that the shuffle spill has occurred, decides whether to increase the proportion of shuffle space by determining whether the proportion of shuffle space in the memory is less than a preset value.

The method of claim 16,
wherein, in step (b1), the worker manager, when it determines that the proportion of shuffle space is less than the preset value, identifies, based on a predetermined RDD replacement algorithm, a second RDD to be moved to the high-speed storage medium among the RDDs stored in the memory as the portion of the RDDs in order to increase the proportion of shuffle space, copies the second RDD to the high-speed storage medium, and deletes the second RDD stored in the memory.

The method of claim 17,
wherein, in step (b1), the worker manager increases the proportion of shuffle space by deleting the second RDD from the memory, and the maximum amount by which the proportion of shuffle space can be increased at one time is set by user input.

The method of claim 11,
wherein, in step (b1), the worker manager moves the portion of the RDDs to the high-speed storage medium using a Spark API.

The method of claim 11,
wherein, in step (b1), the worker manager receives information on the RDD and information on the memory from the execution process and controls the storage location of the RDD used by the execution process and the settings of the memory, and
when it determines that a change has occurred in the storage location of the RDD and the settings of the memory, transmits change information on the storage location of the RDD and change information on the memory settings to the driver manager as the change information of the execution process.

A computer-readable recording medium having recorded thereon a program for executing, on a computer, the method of any one of claims 11 and 14 to 20.