KR20190062778A

KR20190062778A - Method for dynamic neural network learning and apparatus for the same

Info

Publication number: KR20190062778A
Application number: KR1020170161312A
Authority: KR
Inventors: 이경용
Original assignee: 국민대학교산학협력단
Priority date: 2017-11-29
Filing date: 2017-11-29
Publication date: 2019-06-07
Also published as: KR102091481B1

Abstract

A dynamic neural network learning method is performed in a dynamic neural network learning apparatus connected to a plurality of processing elements. The method comprises: (a) setting a checkpoint on a learning task being performed through a first processing element to obtain the output of the learning task; (b) searching, independently of the performance of the learning task, for a second processing element among the processing elements that is relatively more cost-effective; and (c) if the search succeeds, moving the learning task to the second processing element together with the obtained output and continuing to perform the learning task.

Description

The present invention relates to a dynamic neural network learning technique and, more particularly, to a dynamic neural network learning method capable of performing dynamic neural network learning through a plurality of processing elements, and to a dynamic neural network learning apparatus that performs the method.

Deep learning is a machine learning technology based on artificial neural networks (ANN) that enables a computer to learn on its own from many kinds of data, much as a person does. Recently, parallel algorithms have been widely implemented and improved to raise computing performance, which supports the deep learning needed to extend application scenarios in fields such as speech recognition, computer vision, natural language processing, and recommendation systems.

Deep learning requires large amounts of computing resources, so systems that use GPUs (Graphics Processing Units) are widely used. This has the drawback of limiting how many of the available cloud computing resources can actually be used.

Korean Patent Laid-Open Publication No. 10-2015-0096286 relates to a cloud-based large-scale data analysis method using idle computers, and discloses a technique in which a user's personal computer with a specific agent application installed receives a work command from the cloud over a network, performs the work, and then returns the result over the network.

Korean Patent Laid-Open Publication No. 10-2016-0146948 relates to intelligent GPU scheduling in a virtualized environment, and discloses a technique that receives GPU commands from different virtual machines, dynamically selects a scheduling policy, and schedules the GPU commands for processing by the GPU.

Korean Patent Laid-Open Publication No. 10-2015-0096286 (2015.08.24)
Korean Patent Laid-Open Publication No. 10-2016-0146948 (2016.12.21)

An embodiment of the present invention seeks to provide a dynamic neural network learning method capable of performing dynamic neural network learning through a plurality of processing elements, and a dynamic neural network learning apparatus that performs the method.

An embodiment of the present invention seeks to provide a dynamic neural network learning method that can set a checkpoint on a learning task, store the output of the learning task through the set checkpoint, and move the learning task when an abnormal event occurs, and a dynamic neural network learning apparatus that performs the method.

An embodiment of the present invention seeks to provide a dynamic neural network learning method that can migrate a learning task through collaboration among a plurality of processing elements so that the learning task continues to be performed, and a dynamic neural network learning apparatus that performs the method.

An embodiment of the present invention seeks to provide a dynamic neural network learning method that can analyze the cost history of each of a plurality of processing elements to find a low-cost processing element, and a dynamic neural network learning apparatus that performs the method.

Among the embodiments, a dynamic neural network learning method is performed in a dynamic neural network learning apparatus connected to a plurality of processing elements. The method includes: (a) setting a checkpoint on a learning task being performed through a first processing element to obtain the output of the learning task; (b) searching, independently of the performance of the learning task, for a second processing element among the plurality of processing elements that is more cost-effective than the first processing element; and (c) if the search succeeds, moving the learning task to the second processing element together with the obtained output of the learning task and continuing to perform it.

In one embodiment, step (a) may include setting the checkpoint on the learning task for each execution unit of the learning task.

In one embodiment, step (a) may further include setting the execution unit using a portion of the learning data selected from the learning data set according to a specific criterion.

In one embodiment, step (a) may include storing the output of the learning task in a global memory accessible by other processing elements.

In one embodiment, step (a) may further include, during the storing, generating a virtual machine image for the learning task and storing it in the global memory so that the same execution environment is provided when the learning task is performed by another processing element.

In one embodiment, step (b) may include starting the search periodically or when the cost of the first processing element changes.

In one embodiment, step (b) may include analyzing the cost history of the plurality of processing elements to determine the second processing element.

In one embodiment, step (b) may further include having a separate task, performed independently of the learning task, periodically analyze the cost history and recommend the second processing element.

In one embodiment, step (c) may include, before the move, running a new learning task on the second processing element using the virtual machine image for the learning task.

In one embodiment, step (c) may further include completing the move by providing the obtained output of the learning task to the new learning task.
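Steps (a) to (c) can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than the patented implementation: the `ProcessingElement` and `LearningTask` classes, the function names, the use of a plain dictionary as the global memory, and the simplification that "more cost-effective" means a lower hourly price.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProcessingElement:
    """Hypothetical stand-in for one processing element (e.g. a cloud instance)."""
    name: str
    cost_per_hour: float


@dataclass
class LearningTask:
    """Hypothetical learning task; `state` stands in for weights, step counter, etc."""
    element: ProcessingElement
    state: Dict[str, float] = field(default_factory=dict)


def take_checkpoint(task: LearningTask, global_memory: dict) -> Dict[str, float]:
    """Step (a): copy the task's current output into shared storage."""
    output = dict(task.state)
    global_memory["checkpoint"] = output
    return output


def find_cheaper_element(
    current: ProcessingElement, candidates: List[ProcessingElement]
) -> Optional[ProcessingElement]:
    """Step (b): return the cheapest candidate that undercuts the current element."""
    cheaper = [pe for pe in candidates if pe.cost_per_hour < current.cost_per_hour]
    return min(cheaper, key=lambda pe: pe.cost_per_hour) if cheaper else None


def migrate_if_cheaper(
    task: LearningTask, candidates: List[ProcessingElement], global_memory: dict
) -> LearningTask:
    """Step (c): when the search succeeds, resume from the checkpoint on the new element."""
    output = take_checkpoint(task, global_memory)
    target = find_cheaper_element(task.element, candidates)
    if target is None:
        return task  # search failed: keep running where we are
    return LearningTask(element=target, state=dict(output))
```

In this sketch the search is a plain function call; in the method described above it runs independently of the learning task, so a real system would place it in a separate process or service.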

Among the embodiments, a dynamic neural network learning apparatus connected to a plurality of processing elements includes: a checkpoint setting unit that sets a checkpoint on a learning task being performed through a first processing element to obtain the output of the learning task; a processing element search unit that searches, independently of the performance of the learning task, for a second processing element among the plurality of processing elements that is more cost-effective than the first processing element; and a task transfer unit that, if the search succeeds, moves the learning task to the second processing element together with the obtained output of the learning task and continues to perform it.

In one embodiment, the checkpoint setting unit may set the checkpoint on the learning task for each execution unit of the learning task.

In one embodiment, the checkpoint setting unit may store the output of the learning task in a global memory accessible by other processing elements.

In one embodiment, the processing element search unit may start the search periodically or when the cost of the first processing element changes, or may analyze the cost history of the plurality of processing elements to determine the second processing element.

In one embodiment, the task transfer unit may, before the move, run a new learning task on the second processing element using the virtual machine image for the learning task.

Among the embodiments, a dynamic neural network learning apparatus connected to a plurality of processing elements runs a cost monitor agent, an instance arbitration agent, and an instance recommendation agent as independent processes. The cost monitor agent sets a checkpoint on a learning task being performed through a first processing element to obtain the output of the learning task. The instance recommendation agent searches, independently of the performance of the learning task, for a second processing element among the plurality of processing elements that is more cost-effective than the first processing element. The instance arbitration agent, if the search succeeds, moves the learning task to the second processing element together with the obtained output of the learning task and continues to perform it.

The disclosed technology may have the following effects. However, this does not mean that a particular embodiment must include all of the following effects or only the following effects, so the scope of the disclosed technology should not be understood as being limited by them.

The dynamic neural network learning method according to an embodiment of the present invention, and the dynamic neural network learning apparatus that performs it, can perform dynamic neural network learning through a plurality of processing elements.

The dynamic neural network learning method according to an embodiment of the present invention, and the dynamic neural network learning apparatus that performs it, can set a checkpoint on a learning task, store the output of the learning task through the set checkpoint, and move the learning task when an abnormal event occurs.

The dynamic neural network learning method according to an embodiment of the present invention, and the dynamic neural network learning apparatus that performs it, can migrate a learning task (that is, transfer the learning work) through collaboration among a plurality of processing elements and continue to perform that learning task.

The dynamic neural network learning method according to an embodiment of the present invention, and the dynamic neural network learning apparatus that performs it, can analyze the cost history of each of a plurality of processing elements to find a low-cost processing element.

FIG. 1 is a diagram illustrating a dynamic neural network learning system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the dynamic neural network learning apparatus of FIG. 1.
FIG. 3 is a diagram illustrating the dynamic neural network learning process performed in the dynamic neural network learning apparatus of FIG. 1.

The description of the present invention is merely a set of embodiments for structural or functional explanation, so the scope of the present invention should not be construed as limited by the embodiments described herein. That is, because the embodiments can be modified in various ways and can take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea. In addition, the objects or effects presented in the present invention do not mean that a particular embodiment must include all of them or only those effects, so the scope of the present invention should not be understood as being limited by them.

Meanwhile, the terms used in the present application should be understood as follows.

Terms such as "first" and "second" are intended to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is described as being "connected" to another component, it may be directly connected to that other component, but it should be understood that other components may exist in between. In contrast, when a component is described as being "directly connected" to another component, it should be understood that no other component exists in between. Other expressions describing the relationship between components, such as "between" and "directly between", or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify that a stated feature, number, step, operation, component, part, or combination thereof exists, and should be understood not to exclude in advance the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Identification codes for the steps (for example, a, b, c) are used for convenience of explanation and do not describe the order of the steps. Unless a specific order is clearly stated in context, the steps may occur in an order different from the one written. That is, the steps may occur in the written order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over computer systems connected by a network, so that computer-readable code is stored and executed in a distributed manner.

Unless defined otherwise, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as consistent with their meaning in the context of the related art, and cannot be interpreted as having an ideal or excessively formal meaning unless explicitly defined in the present application.

FIG. 1 is a diagram illustrating a dynamic neural network learning system according to an embodiment of the present invention.

Referring to FIG. 1, the dynamic neural network learning system 10 includes a plurality of processing elements 100, a dynamic neural network learning apparatus 200, and a global memory 300, which may be connected through a network.

The plurality of processing elements 100 may be distributed around the world and may be implemented as a first processing element 100-1, a second processing element 100-2, a third processing element 100-3, ..., and an Nth processing element 100-n. Here, the regions in which the processing elements are distributed may be geographic areas that are physically far apart. In one embodiment, the plurality of processing elements 100 may include idle cloud computing resources. Even if an unexpected abnormal event (for example, forced termination) occurs on a processing element, the dynamic neural network learning apparatus 200 can continue to perform the learning task through collaboration with the plurality of processing elements 100 distributed around the world.

The dynamic neural network learning apparatus 200 may be a computing device connected to the plurality of processing elements 100 and the global memory 300. More specifically, the dynamic neural network learning apparatus 200 can use a checkpoint to store the output of a learning task (including output for intermediate stages of the learning task) in the global memory 300. When another processing element is subsequently found, it can retrieve the learning task stored in the global memory 300 and resume that learning task on the other processing element. That is, when an abnormal event occurs on the processing element currently performing the task, the dynamic neural network learning apparatus 200 according to an embodiment of the present invention determines another processing element to which the learning task will be moved, and moves the learning task so that it can be resumed on the determined processing element.

In one embodiment, the dynamic neural network learning apparatus 200 can check the execution status of the processing element performing the learning task to decide whether to start another processing element, and can confirm whether a checkpoint has been set for the learning task. For example, the dynamic neural network learning apparatus 200 can check execution status such as abnormal events, including cost increases and forced termination, on the processing element performing the learning task, and can periodically compare the cost of that processing element with the costs of other processing elements.

In addition, the dynamic neural network learning apparatus 200 can use idle resources in a cloud computing environment to perform deep learning work at low cost, and can respond to unexpected abnormal events on a processing element. The dynamic neural network learning apparatus 200 is described in more detail below with reference to FIG. 2.

The global memory 300 may be connected to the dynamic neural network learning apparatus 200 and may be accessible by the plurality of processing elements 100. More specifically, the global memory 300 may include an auxiliary storage device that is implemented as non-volatile memory, such as an SSD (Solid State Disk) or HDD (Hard Disk Drive), and is used to store the data needed by the dynamic neural network learning apparatus 200. When implemented as non-volatile memory, the global memory 300 may be implemented so that it is connected through hyperlinks.

In one embodiment, the global memory 300 may store the output of a learning task obtained through a checkpoint, and may store a virtual machine image (VMI) for the learning task. The data stored in the global memory 300 is not limited to these and may be changed by the designer.

FIG. 2 is a diagram illustrating the dynamic neural network learning apparatus of FIG. 1.

Referring to FIG. 2, the dynamic neural network learning apparatus 200 includes a checkpoint setting unit 210, a processing element search unit 220, a task transfer unit 230, and a control unit 240.

The checkpoint setting unit 210 can obtain the output of a learning task. More specifically, the checkpoint setting unit 210 can set a checkpoint on a learning task being performed through the first processing element 100-1 to obtain the output of the learning task. Here, a checkpoint is an intermediate check point in the execution of a learning task: it fully preserves the execution state of the learning task at that point so that the learning task can later be resumed from it. When an unexpected abnormal event (for example, a cost increase or forced termination) occurs on a processing element, the dynamic neural network learning apparatus 200 according to an embodiment of the present invention can obtain the output of the learning task at the intermediate check point through the set checkpoint, and can then resume the learning task from that point.
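The save-and-resume behaviour of a checkpoint can be sketched as follows. This is a toy under stated assumptions, not the apparatus itself: the JSON file format, the function names, the two-element weight list, and the "add 1.0 per step" update are all hypothetical, and a local file path stands in for the global memory 300.

```python
import json
import os


def save_checkpoint(path, step, weights):
    """Write the execution state atomically so a forced stop cannot leave a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "weights": weights}, f)
    os.replace(tmp, path)  # atomic rename onto the final name


def load_checkpoint(path):
    """Return (step, weights) from the last checkpoint, or a fresh state if none exists."""
    if not os.path.exists(path):
        return 0, [0.0, 0.0]
    with open(path) as f:
        state = json.load(f)
    return state["step"], state["weights"]


def train(path, total_steps, stop_after=None):
    """Toy training loop: each step adds 1.0 to every weight, then checkpoints."""
    step, weights = load_checkpoint(path)
    while step < total_steps:
        if stop_after is not None and step >= stop_after:
            return step, weights  # simulate an abnormal event such as forced termination
        weights = [w + 1.0 for w in weights]
        step += 1
        save_checkpoint(path, step, weights)
    return step, weights
```

Calling `train(path, 10, stop_after=4)` and then `train(path, 10)` resumes from step 4 rather than step 0, which is the property the checkpoint is meant to guarantee.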

The checkpoint setting unit 210 can set a checkpoint on the learning task for each execution unit of the learning task. More specifically, the checkpoint setting unit 210 can set a checkpoint for the intermediate result of the learning task at each execution unit. The checkpoint setting unit 210 can set the execution unit using a portion of the learning data selected from the learning data set according to a specific criterion. Here, the specific criterion may be determined by the designer.
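One way to read "execution unit" is as a fixed-size slice of the training data, with a checkpoint recorded at each slice boundary. The sketch below assumes exactly that; the fixed `unit_size` criterion, the function names, and the `(next_unit, samples_seen)` checkpoint tuple are hypothetical choices, since the text leaves the criterion to the designer.

```python
def execution_units(dataset, unit_size):
    """Split the training data into consecutive execution units of `unit_size` samples."""
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    return [dataset[i:i + unit_size] for i in range(0, len(dataset), unit_size)]


def run_with_checkpoints(dataset, unit_size, start_unit=0):
    """Process units from `start_unit` onward, recording a checkpoint after each one."""
    units = execution_units(dataset, unit_size)
    # Progress already made by the units that finished before a resume.
    samples_seen = sum(len(u) for u in units[:start_unit])
    checkpoints = []
    for index in range(start_unit, len(units)):
        samples_seen += len(units[index])  # placeholder for the real training work
        # Each checkpoint names the next unit to run and the progress so far.
        checkpoints.append((index + 1, samples_seen))
    return checkpoints
```

A smaller unit size means less work is repeated after a migration, at the price of more frequent checkpoint writes.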

The checkpoint setting unit 210 can store the output of the learning task in the global memory 300, which is accessible by other processing elements. Here, the other processing elements may include the second to Nth processing elements 100-2, ..., 100-n, other than the first processing element 100-1 on which the learning task is being performed. In one embodiment, the checkpoint setting unit 210 can set a checkpoint on the learning task to obtain its output, and then store that output in the global memory 300.

The checkpoint setting unit 210 can provide the same execution environment when the learning task is performed by another processing element. While storing the output of the learning task in the global memory 300, the checkpoint setting unit 210 can generate a virtual machine image (VMI) for the learning task and store it in the global memory 300, so that the same execution environment is available when the task is performed by another processing element.

In one embodiment, the checkpoint setting unit 210 can generate the virtual machine image for the learning task using Ubuntu 14.04, NVIDIA CUDA SDK 7.5, the cuDNN library, TensorFlow 0.1, and the like, in order to provide the same execution environment when the task is performed by another processing element. Another processing element can then copy the virtual machine image stored in the global memory 300, retrieve the learning task through the copied image, and resume the learning task in the same execution environment.

프로세싱 엘리먼트 검색부(220)는 학습 태스크의 수행에 독립적으로 복수의 프로세싱 엘리먼트들(100) 중 제1 프로세싱 엘리먼트 보다 상대적으로 비용 효율적인 제2 프로세싱 엘리먼트(100-2)를 검색할 수 있다. 프로세싱 엘리먼트 검색부(220)는 학습 태스크의 수행 과정에서 비정상 이벤트(예를 들어, 강제 종료)가 발생하면 해당 학습 태스크를 다시 수행하기 위해 안정적이고 비용 효율적인 실행환경을 제공할 수 있는 다른 프로세싱 엘리먼트를 검색할 수 있다.The processing element searching unit 220 can search the second processing element 100-2 which is relatively more cost effective than the first processing element among the plurality of processing elements 100 independently of the execution of the learning task. The processing element search unit 220 may include other processing elements that can provide a stable and cost-effective execution environment to perform the learning task again when an abnormal event (for example, forced termination) occurs in the course of execution of the learning task You can search.

프로세싱 엘리먼트 검색부(220)는 주기적으로 또는 제1 프로세싱 엘리먼트(100-1)의 비용이 변경될 때 프로세싱 엘리먼트 검색을 시작할 수 있다. 일 실시예에서, 프로세싱 엘리먼트 검색부(220)는 주기적으로 프로세싱 엘리먼트 검색을 시작할 수 있다. 프로세싱 엘리먼트 검색부(220)는 동적 신경망 학습 장치(200)의 구현 오버헤드(Implementation Overhead)를 최소화하기 위해 복수의 프로세싱 엘리먼트들(100)의 비용을 분석할 수 있다. The processing element searching unit 220 may start searching for the processing element periodically or when the cost of the first processing element 100-1 is changed. In one embodiment, the processing element searcher 220 may periodically start processing element search. The processing element searching unit 220 may analyze the cost of the plurality of processing elements 100 to minimize the implementation overhead of the dynamic neural network learning apparatus 200. [

For example, the processing element search unit 220 may perform the processing element search periodically and, if a processing element exists whose cost is lower than that of the first processing element 100-1 on which the learning task is currently being performed, may cause the learning task to be moved to that more cost-effective processing element.

As another example, even if the cost of the first processing element 100-1 on which the learning task is being performed has not increased, the processing element search unit 220 may periodically start a search for other processing elements in preparation for a cost increase of the first processing element 100-1 or for an unexpected abnormal event.

In another embodiment, the processing element search unit 220 may start the processing element search when the cost of the first processing element 100-1 changes. When the cost of the first processing element 100-1 on which the learning task is being performed increases, the processing element search unit 220 may start a search for another processing element, among the plurality of processing elements 100, that is more cost-effective than the first processing element 100-1.
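As a non-limiting illustration, the two search triggers described above (a periodic trigger and a cost-change trigger) may be combined as in the following Python sketch. The class name `SearchTrigger`, its fields, and the use of seconds as the time unit are assumptions made for illustration only and do not appear in the present specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchTrigger:
    """Decides when a processing element search should start (hypothetical helper)."""

    period_s: float                        # assumed search period, in seconds
    last_search_s: float = float("-inf")   # time at which the previous search started
    last_cost: Optional[float] = None      # last observed cost of the first element

    def should_search(self, now_s: float, current_cost: float) -> bool:
        # Periodic trigger: the assumed period has elapsed since the last search.
        periodic = now_s - self.last_search_s >= self.period_s
        # Cost-change trigger: the cost of the first element differs from the
        # previously observed value.
        cost_changed = self.last_cost is not None and current_cost != self.last_cost
        self.last_cost = current_cost
        if periodic or cost_changed:
            self.last_search_s = now_s
            return True
        return False
```

In this sketch the very first call always starts a search, because no earlier search has been recorded; an embodiment that reacts only to cost increases could replace the inequality test with `current_cost > self.last_cost`.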

The processing element search unit 220 may determine the second processing element 100-2 by analyzing the cost history of the plurality of processing elements 100. In one embodiment, the processing element search unit 220 may analyze the cost history of each of the plurality of processing elements 100 to predict the cost of each processing element, and may determine the second processing element 100-2 based on the cost prediction result. That is, by storing the cost history of each of the plurality of processing elements 100 in the global memory 300, the dynamic neural network learning apparatus 200 can provide scalability and fault tolerance in cases where an error occurs in the separate task that periodically performs the cost analysis of the processing elements or where cost inquiry requests for the plurality of processing elements 100 surge.
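As a non-limiting illustration, the cost prediction and selection described above may be sketched in Python as follows. The specification does not state which prediction model is used; the mean over a short recent window used here is an assumption, as are the function names `predict_cost` and `choose_second_element` and the element identifiers in the usage example.

```python
from statistics import mean
from typing import Dict, List, Optional


def predict_cost(history: List[float], window: int = 3) -> float:
    """Predict the next cost as the mean of the last `window` observations.

    The moving-average model is an assumption made for this sketch.
    """
    if not history:
        raise ValueError("cost history is empty")
    return mean(history[-window:])


def choose_second_element(
    histories: Dict[str, List[float]], first: str, window: int = 3
) -> Optional[str]:
    """Return the element with the lowest predicted cost below that of `first`.

    Returns None when no element is predicted to be cheaper, i.e. when the
    search is not successful and the learning task should stay where it is.
    """
    predicted = {name: predict_cost(h, window) for name, h in histories.items()}
    candidates = {
        name: cost
        for name, cost in predicted.items()
        if name != first and cost < predicted[first]
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Usage with made-up cost histories (for example, hourly prices).
histories = {
    "pe-1": [0.30, 0.32, 0.40],
    "pe-2": [0.25, 0.24, 0.26],
    "pe-3": [0.50, 0.20, 0.45],
}
selected = choose_second_element(histories, first="pe-1")
```

Keeping the histories in a plain mapping mirrors the idea of holding them in a shared store: any process that can read the mapping can repeat the analysis if the task that normally performs it fails.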

The processing element search unit 220 may cause a separate task, which runs independently of the learning task, to periodically analyze the cost history and recommend the second processing element 100-2. For example, the processing element search unit 220 may determine the processing element with the lowest current cost among the plurality of processing elements 100 as the second processing element 100-2. More specifically, the processing element search unit 220 may cause the separate task to analyze the cost history based on the current costs of the plurality of processing elements 100, determine a cost-effective second processing element 100-2, and recommend the determined second processing element 100-2.

When the search for the second processing element 100-2 succeeds, the task transfer unit 230 may move the learning task to the second processing element 100-2 (that is, transfer the work to the second processing element) together with the output of the learning task obtained through the checkpoint setting unit 210, so that the learning task continues to be performed.

Before moving the learning task to the second processing element 100-2, the task transfer unit 230 may start a new learning task on the second processing element 100-2 from the virtual machine image (VMI) for the learning task. In one embodiment, the task transfer unit 230 may complete the move by providing the obtained output of the learning task to the new learning task.
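As a non-limiting illustration, the two-stage move described above (first starting a new learning task from the virtual machine image, then supplying the checkpointed output) may be sketched in Python as follows. The classes `GlobalMemory` and `LearningTask`, the function `migrate`, and the use of in-memory dictionaries in place of a real shared store are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class LearningTask:
    """A learning task placed on one processing element (hypothetical model)."""

    element: str                            # processing element hosting the task
    image: str                              # identifier of the VMI it was started from
    state: Optional[Dict[str, Any]] = None  # checkpointed output, once provided


@dataclass
class GlobalMemory:
    """Stand-in for the global memory 300: VMIs and checkpoints keyed by task id."""

    images: Dict[str, str] = field(default_factory=dict)
    checkpoints: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def migrate(task_id: str, second_element: str, memory: GlobalMemory) -> LearningTask:
    # Stage 1: before the move, start a new learning task on the second
    # processing element from the stored virtual machine image.
    new_task = LearningTask(element=second_element, image=memory.images[task_id])
    # Stage 2: complete the move by providing the checkpointed output of the
    # original learning task to the new one (copied so the store is not aliased).
    new_task.state = dict(memory.checkpoints[task_id])
    return new_task


# Usage with made-up identifiers and checkpoint contents.
memory = GlobalMemory(
    images={"job": "vmi-001"},
    checkpoints={"job": {"step": 1200, "weights": [0.1, 0.2]}},
)
moved = migrate("job", "pe-2", memory)
```

Starting the new task before handing over the checkpoint keeps the original task running on the first processing element until the replacement environment exists, which is one way to read the ordering described above.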

The control unit 240 may control the overall operation of the dynamic neural network learning apparatus 200 and may control the control flow or data flow among the checkpoint setting unit 210, the processing element search unit 220, and the task transfer unit 230.

FIG. 3 is a diagram illustrating the dynamic neural network learning process performed in the dynamic neural network learning apparatus of FIG. 1.

Referring to FIG. 3, the dynamic neural network learning apparatus 200 may execute a cost monitor agent, an instance arbitration agent, and an instance recommendation agent as independent processes (step S310). More specifically, the dynamic neural network learning apparatus 200 may perform the operations of the checkpoint setting unit 210, the processing element search unit 220, and the task transfer unit 230 through the cost monitor agent, the instance arbitration agent, and the instance recommendation agent.

The cost monitor agent may set a checkpoint on the learning task being performed through the first processing element 100-1 to obtain the output of the learning task (step S320).

The instance recommendation agent may search, independently of the execution of the learning task, for a second processing element 100-2 that is more cost-effective than the first processing element 100-1 among the plurality of processing elements 100 (step S330). In one embodiment, the instance recommendation agent may periodically start a search for other processing elements and, in doing so, may periodically check the costs of the plurality of processing elements 100 and store them on a local disk. In another embodiment, the instance recommendation agent may monitor whether the learning task being performed through the first processing element 100-1 should be migrated (that is, transferred) to another processing element.

When the search succeeds, the instance arbitration agent may move the learning task to the second processing element 100-2 together with the obtained output of the learning task so that the learning task continues to be performed (step S340). More specifically, the instance arbitration agent may continue the learning task on the second processing element 100-2 through the path in which the checkpoint was set. When the learning task is restarted, the instance arbitration agent may retrieve the output of the learning task from the global memory 300 and then set checkpoints for the learning task again, starting from that checkpoint.
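As a non-limiting illustration, resuming a learning task from its last checkpoint and then continuing to set checkpoints may be sketched in Python as follows. The running sum stands in for the actual learning computation, the dictionary `store` stands in for the global memory 300, and the function name `train` and the parameter `stop_after` (used only to simulate a forced termination) are assumptions made for illustration.

```python
from typing import Dict, List


def train(batches: List[float], store: Dict[str, dict], task_id: str,
          stop_after: int = -1) -> float:
    """Accumulate a running sum over `batches`, checkpointing after each batch.

    One batch plays the role of one execution unit. If `stop_after` equals a
    batch index, a forced termination is simulated before that batch runs.
    """
    # Resume from the stored checkpoint if one exists, otherwise start fresh.
    ckpt = store.get(task_id, {"next_batch": 0, "total": 0.0})
    total = ckpt["total"]
    for i in range(ckpt["next_batch"], len(batches)):
        if i == stop_after:
            raise RuntimeError("simulated forced termination")
        total += batches[i]
        # Set a checkpoint again after every execution unit.
        store[task_id] = {"next_batch": i + 1, "total": total}
    return total


# Usage: the first run is interrupted, the second resumes from the checkpoint.
store: Dict[str, dict] = {}
batches = [1.0, 2.0, 3.0, 4.0]
try:
    train(batches, store, "job", stop_after=2)
except RuntimeError:
    pass
saved = dict(store["job"])
result = train(batches, store, "job")
```

Because the checkpoint records the index of the next execution unit together with the accumulated state, the resumed run repeats no work that was already checkpointed.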

Although the present application has been described above with reference to preferred embodiments, those skilled in the art will understand that the present application may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the following claims.

10: dynamic neural network learning system
100: plurality of processing elements 200: dynamic neural network learning apparatus
210: checkpoint setting unit 220: processing element search unit
230: task transfer unit 240: control unit
300: global memory

Claims

1. A dynamic neural network learning method performed in a dynamic neural network learning apparatus connected to a plurality of processing elements, the method comprising:
(a) setting a checkpoint on a learning task being performed through a first processing element to obtain an output of the learning task;
(b) searching, independently of the performance of the learning task, for a second processing element that is more cost-effective than the first processing element among the plurality of processing elements; and
(c) if the search is successful, moving the learning task to the second processing element together with the obtained output of the learning task and continuing to perform the learning task.

2. The method of claim 1, wherein step (a) comprises setting the checkpoint in the learning task for each execution unit of the learning task.

3. The method of claim 2, wherein step (a) further comprises setting the execution unit as a portion of the learning data that is determined, according to a specific criterion, from a learning data set.

4. The method of claim 1, wherein step (a) comprises storing the output of the learning task in a global memory accessible by other processing elements.

5. The method of claim 4, wherein step (a) further comprises, in the storing, generating a virtual machine image for the learning task and storing the virtual machine image in the global memory so that the same execution environment is provided when the learning task is performed by the other processing element.

6. The method of claim 1, wherein step (b) comprises starting the search periodically or when the cost of the first processing element changes.

7. The method of claim 1, wherein step (b) comprises analyzing a cost history of the plurality of processing elements to determine the second processing element.

8. The method of claim 7, wherein step (b) further comprises causing a separate task, which runs independently of the learning task, to periodically analyze the cost history and recommend the second processing element.

9. The method of claim 1, wherein step (c) comprises, before the move, starting a new learning task on the second processing element from a virtual machine image for the learning task.

10. The method of claim 9, wherein step (c) further comprises completing the move by providing the obtained output of the learning task to the new learning task.

11. A dynamic neural network learning apparatus connected to a plurality of processing elements, the apparatus comprising:
a checkpoint setting unit that sets a checkpoint on a learning task being performed through a first processing element to obtain an output of the learning task;
a processing element search unit that searches, independently of the performance of the learning task, for a second processing element that is more cost-effective than the first processing element among the plurality of processing elements; and
a task transfer unit that, if the search is successful, moves the learning task to the second processing element together with the obtained output of the learning task so that the learning task continues to be performed.

12. The apparatus of claim 11, wherein the checkpoint setting unit sets the checkpoint in the learning task for each execution unit of the learning task.

13. The apparatus of claim 11, wherein the checkpoint setting unit stores the output of the learning task in a global memory accessible by other processing elements.

14. The apparatus of claim 11, wherein the processing element search unit starts the search periodically or when the cost of the first processing element changes, or analyzes a cost history of the plurality of processing elements to determine the second processing element.

15. The apparatus of claim 11, wherein the task transfer unit, before the move, starts a new learning task on the second processing element from a virtual machine image for the learning task.

16. A dynamic neural network learning apparatus connected to a plurality of processing elements,
wherein the dynamic neural network learning apparatus executes a cost monitor agent, an instance arbitration agent, and an instance recommendation agent as independent processes,
wherein the cost monitor agent sets a checkpoint on a learning task being performed through a first processing element to obtain an output of the learning task,
wherein the instance recommendation agent searches, independently of the performance of the learning task, for a second processing element that is more cost-effective than the first processing element among the plurality of processing elements, and
wherein the instance arbitration agent, if the search is successful, moves the learning task to the second processing element together with the obtained output of the learning task so that the learning task continues to be performed.