KR102091481B1 - Method for dynamic neural network learning and apparatus for the same

Info

Publication number: KR102091481B1
Application number: KR1020170161312A
Authority: KR
Inventors: 이경용
Original assignee: 국민대학교산학협력단
Priority date: 2017-11-29
Filing date: 2017-11-29
Publication date: 2020-03-20
Also published as: KR20190062778A

Abstract

A dynamic neural network learning method is performed in a dynamic neural network learning apparatus connected to a plurality of processing elements. The method includes (a) setting a checkpoint for a learning task being performed by a first processing element and retrieving the output of the learning task, (b) searching, independently of the execution of the learning task, for a relatively cost-efficient second processing element among the plurality of processing elements, and (c) if the search succeeds, moving the learning task to the second processing element together with the retrieved output and continuing to perform it there.

Description

A dynamic neural network learning method and a dynamic neural network learning apparatus for performing the same {METHOD FOR DYNAMIC NEURAL NETWORK LEARNING AND APPARATUS FOR THE SAME}

The present invention relates to dynamic neural network learning technology, and more particularly to a dynamic neural network learning method capable of performing dynamic neural network learning through a plurality of processing elements, and to a dynamic neural network learning apparatus that performs the method.

Deep learning is a machine learning technique based on artificial neural networks (ANNs) that enables a computer to learn from data on its own, much as a human does. In recent years, the widespread implementation and improvement of parallel algorithms has raised computing performance, making it possible to carry out the deep learning needed to extend application scenarios in fields such as speech recognition, computer vision, natural language processing, and recommendation systems.

Deep learning requires a large amount of computational resources, so systems that use GPUs (Graphics Processing Units) are widely used. This, however, limits the number of cloud computing resources that can be used.

Korean Patent Publication No. 10-2015-0096286 relates to a cloud-based method for analyzing large volumes of data using idle computers. It discloses a technique in which a user's personal computer with a specific agent application installed receives a work command from the cloud over a network, performs the work, and then returns the result over the network.

Korean Patent Publication No. 10-2016-0146948 relates to intelligent GPU scheduling in a virtualized environment. It discloses a technique for receiving GPU commands from different virtual machines, dynamically selecting a scheduling policy, and scheduling the GPU commands for processing by the GPU.

Korean Patent Publication No. 10-2015-0096286 (2015.08.24)
Korean Patent Publication No. 10-2016-0146948 (2016.12.21)

An embodiment of the present invention seeks to provide a dynamic neural network learning method capable of performing dynamic neural network learning through a plurality of processing elements, and a dynamic neural network learning apparatus that performs the method.

An embodiment of the present invention seeks to provide a dynamic neural network learning method that can set a checkpoint in a learning task, store the output of the learning task through the checkpoint, and move the learning task when an abnormal event occurs, and a dynamic neural network learning apparatus that performs the method.

An embodiment of the present invention seeks to provide a dynamic neural network learning method that can migrate a learning task through cooperation among a plurality of processing elements so that the learning task continues to be performed, and a dynamic neural network learning apparatus that performs the method.

An embodiment of the present invention seeks to provide a dynamic neural network learning method that can find a low-cost processing element by analyzing the cost history of each of a plurality of processing elements, and a dynamic neural network learning apparatus that performs the method.

Among the embodiments, a dynamic neural network learning method is performed in a dynamic neural network learning apparatus connected to a plurality of processing elements. The method includes (a) setting a checkpoint for a learning task being performed by a first processing element and retrieving the output of the learning task, (b) searching, independently of the execution of the learning task, for a second processing element that is more cost-efficient than the first processing element among the plurality of processing elements, and (c) if the search succeeds, moving the learning task to the second processing element together with the retrieved output and continuing to perform it there.

In one embodiment, step (a) may include setting the checkpoint in the learning task for each execution unit of the learning task.

In one embodiment, step (a) may further include setting the execution unit using a portion of the training data selected from the training data set according to a specific criterion.

In one embodiment, step (a) may include storing the output of the learning task in a global memory accessible by other processing elements.

In one embodiment, step (a) may further include, during the storing, generating a virtual machine image for the learning task and storing it in the global memory so that the same execution environment is provided when the task is performed by another processing element.

In one embodiment, step (b) may include starting the search periodically or when the cost of the first processing element changes.

In one embodiment, step (b) may include determining the second processing element by analyzing the cost history of the plurality of processing elements.

In one embodiment, step (b) may further include having a separate task, which runs independently of the learning task, periodically analyze the cost history and recommend the second processing element.

In one embodiment, step (c) may include, before the move, running a new learning task on the second processing element from the virtual machine image of the learning task.

In one embodiment, step (c) may further include completing the move by providing the retrieved output of the learning task to the new learning task.
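The source does not say how the cost history in step (b) is analyzed. As a minimal sketch, assuming each processing element's history is a list of recent prices, one could score each element by its mean price plus a volatility penalty and recommend the lowest-scoring element only when it beats the current one. The function name `recommend`, the `volatility_weight` parameter, and the scoring rule itself are illustrative assumptions, not part of the patent.

```python
from statistics import mean, pstdev


def recommend(cost_history, current, volatility_weight=1.0):
    """Return the id of a processing element that scores better than `current`.

    cost_history maps a (hypothetical) element id to a list of past prices.
    The score is mean price plus a penalty for price volatility; this rule
    is an assumption made for illustration only.
    """
    def score(prices):
        return mean(prices) + volatility_weight * pstdev(prices)

    scores = {pe: score(prices) for pe, prices in cost_history.items()}
    best = min(scores, key=scores.get)
    if best != current and scores[best] < scores[current]:
        return best
    return None  # no element is better than the current one


history = {
    "pe-1": [0.30, 0.32, 0.31],  # current element
    "pe-2": [0.20, 0.21, 0.19],  # cheap and stable
    "pe-3": [0.10, 0.50, 0.10],  # sometimes cheap, but volatile
}
recommended = recommend(history, current="pe-1")
```

Penalizing volatility is one way to prefer an element whose price is unlikely to spike soon after the move; setting `volatility_weight=0` reduces the rule to a plain comparison of mean prices.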

Among the embodiments, a dynamic neural network learning apparatus connected to a plurality of processing elements includes: a checkpoint setting unit that sets a checkpoint for a learning task being performed by a first processing element and retrieves the output of the learning task; a processing element search unit that searches, independently of the execution of the learning task, for a second processing element that is more cost-efficient than the first processing element among the plurality of processing elements; and a task transfer unit that, if the search succeeds, moves the learning task to the second processing element together with the retrieved output and continues to perform it there.

In one embodiment, the checkpoint setting unit may set the checkpoint in the learning task for each execution unit of the learning task.

In one embodiment, the checkpoint setting unit may store the output of the learning task in a global memory accessible by other processing elements.

In one embodiment, the processing element search unit may start the search periodically or when the cost of the first processing element changes, or may determine the second processing element by analyzing the cost history of the plurality of processing elements.

In one embodiment, the task transfer unit may, before the move, run a new learning task on the second processing element from the virtual machine image of the learning task.

Among the embodiments, a dynamic neural network learning apparatus connected to a plurality of processing elements runs a cost monitor agent, an instance arbitration agent, and an instance recommendation agent as independent processes. The cost monitor agent sets a checkpoint for a learning task being performed by a first processing element and retrieves the output of the learning task. The instance recommendation agent searches, independently of the execution of the learning task, for a second processing element that is more cost-efficient than the first processing element among the plurality of processing elements. The instance arbitration agent, if the search succeeds, moves the learning task to the second processing element together with the retrieved output and continues to perform it there.

The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, so the scope of rights of the disclosed technology should not be understood as being limited by them.

A dynamic neural network learning method according to an embodiment of the present invention, and a dynamic neural network learning apparatus that performs it, can perform dynamic neural network learning through a plurality of processing elements.

A dynamic neural network learning method according to an embodiment of the present invention, and a dynamic neural network learning apparatus that performs it, can set a checkpoint in a learning task, store the output of the learning task through the checkpoint, and move the learning task when an abnormal event occurs.

A dynamic neural network learning method according to an embodiment of the present invention, and a dynamic neural network learning apparatus that performs it, can migrate a learning task (that is, transfer the learning work) through cooperation among a plurality of processing elements so that the learning task continues to be performed.

A dynamic neural network learning method according to an embodiment of the present invention, and a dynamic neural network learning apparatus that performs it, can find a low-cost processing element by analyzing the cost history of each of the plurality of processing elements.

FIG. 1 is a diagram illustrating a dynamic neural network learning system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the dynamic neural network learning apparatus in FIG. 1.
FIG. 3 is a diagram illustrating a dynamic neural network learning process performed in the dynamic neural network learning apparatus in FIG. 1.

The description of the present invention is merely a set of embodiments for structural or functional explanation, so the scope of the present invention should not be construed as being limited by the embodiments described in the text. That is, because the embodiments can be modified in various ways and can take various forms, the scope of the present invention should be understood to include equivalents capable of realizing the technical idea. In addition, the objects or effects presented in the present invention do not mean that a specific embodiment must include all of them or only those effects, so the scope of the present invention should not be understood as being limited by them.

Meanwhile, the terms used in the present application should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is said to be "connected" to another component, it may be directly connected to that other component, but it should be understood that intervening components may also exist. In contrast, when a component is said to be "directly connected" to another component, it should be understood that no intervening component exists. Other expressions that describe relationships between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of a stated feature, number, step, operation, component, part, or combination thereof, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The identifiers attached to the steps (for example, a, b, c) are used for convenience of explanation and do not describe the order of the steps. Unless the context clearly specifies a particular order, the steps may occur in an order different from the one stated. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium can also be distributed across computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted as having ideal or excessively formal meanings unless explicitly so defined in the present application.

FIG. 1 is a diagram illustrating a dynamic neural network learning system according to an embodiment of the present invention.

Referring to FIG. 1, the dynamic neural network learning system 10 includes a plurality of processing elements 100, a dynamic neural network learning apparatus 200, and a global memory 300, which may be connected to one another through a network.

The plurality of processing elements 100 may be distributed around the world and may be implemented as a first processing element 100-1, a second processing element 100-2, a third processing element 100-3, ..., and an N-th processing element 100-n. Here, the regions in which the processing elements are located may be geographic areas that are physically far apart. In one embodiment, the plurality of processing elements 100 may include idle cloud computing resources. Even if an unexpected abnormal event (for example, forced termination) occurs in a processing element, the dynamic neural network learning apparatus 200 can continue the learning task through cooperation with the plurality of processing elements 100 distributed around the world.

The dynamic neural network learning apparatus 200 may be a computing device connected to the plurality of processing elements 100 and the global memory 300. More specifically, the dynamic neural network learning apparatus 200 may use a checkpoint to store the output of a learning task (including output for an intermediate stage of the learning task) in the global memory 300. Then, once another processing element has been found, it may retrieve the learning task stored in the global memory 300 and resume the learning task on that processing element. That is, when an abnormal event occurs in the processing element performing a learning task, the dynamic neural network learning apparatus 200 according to an embodiment of the present invention determines another processing element to which the learning task will be moved, and moves the learning task so that it can be resumed on the determined processing element.
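The store-then-resume flow described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `GlobalMemory` is a hypothetical in-process stand-in for the global memory 300, the "training" is a placeholder accumulation, and the `fail_at` parameter simulates a forced termination on the first processing element.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalMemory:
    """Hypothetical stand-in for the global memory (300): a dict keyed by task id."""
    store: dict = field(default_factory=dict)

    def save(self, task_id, state):
        self.store[task_id] = dict(state)  # copy so later mutation does not leak in

    def load(self, task_id):
        return dict(self.store[task_id])


def run_task(task_id, total_steps, memory, fail_at=None):
    """Run from the last checkpoint; return True if finished, False if interrupted."""
    if task_id in memory.store:
        state = memory.load(task_id)       # resume from the stored checkpoint
    else:
        state = {"step": 0, "acc": 0}      # fresh start
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            return False                   # simulated forced termination
        state["acc"] += state["step"]      # placeholder for one unit of training
        state["step"] += 1
        memory.save(task_id, state)        # checkpoint after every unit
    return True


memory = GlobalMemory()
finished_on_first = run_task("job", 10, memory, fail_at=6)   # first element is killed
finished_on_second = run_task("job", 10, memory)             # second element resumes
final_state = memory.load("job")
```

Because the second run starts from the stored state rather than from step 0, the final result matches what an uninterrupted run would have produced.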

In one embodiment, the dynamic neural network learning apparatus 200 may check the execution status of the processing element performing the learning task to decide whether to start another processing element, and may check whether a checkpoint has been set for the learning task. For example, the dynamic neural network learning apparatus 200 may check the execution status for abnormal events, including a cost increase or forced termination, affecting the processing element performing the learning task, and may periodically compare the cost of that processing element against the costs of other processing elements.
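A status check of this kind might be sketched as a function that classifies what happened to the running processing element. The `Event` enum, the `check_execution` function, and its inputs (a liveness flag and two cost samples) are hypothetical names chosen for illustration; the source does not specify how liveness or cost is observed.

```python
from enum import Enum


class Event(Enum):
    NONE = "none"
    COST_INCREASE = "cost_increase"
    FORCED_TERMINATION = "forced_termination"


def check_execution(alive, previous_cost, current_cost):
    """Classify the state of the element running the learning task.

    Forced termination takes priority, since a terminated element
    cannot continue regardless of its price.
    """
    if not alive:
        return Event.FORCED_TERMINATION
    if current_cost > previous_cost:
        return Event.COST_INCREASE
    return Event.NONE


status = check_execution(alive=True, previous_cost=0.20, current_cost=0.25)
```

Either abnormal event would then be a reason to look for another processing element and to confirm that a recent checkpoint exists.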

In addition, the dynamic neural network learning apparatus 200 can perform deep learning work at low cost by using idle resources in a cloud computing environment, and can respond to unexpected abnormal events in a processing element. The dynamic neural network learning apparatus 200 is described in more detail below with reference to FIG. 2.

The global memory 300 may be connected to the dynamic neural network learning apparatus 200 and may be accessible by the plurality of processing elements 100. More specifically, the global memory 300 may include an auxiliary storage device, implemented as non-volatile memory such as a solid state disk (SSD) or hard disk drive (HDD), that is used to store the data required by the dynamic neural network learning apparatus 200. When implemented as non-volatile memory, the global memory 300 may be connected through a hyperlink.

In one embodiment, the global memory 300 may store the output of a learning task obtained through a checkpoint, and may store a virtual machine image (VMI) for the learning task. The data stored in the global memory 300 is not limited to these and may be changed by the designer.

FIG. 2 is a diagram illustrating the dynamic neural network learning apparatus in FIG. 1.

Referring to FIG. 2, the dynamic neural network learning apparatus 200 includes a checkpoint setting unit 210, a processing element search unit 220, a task transfer unit 230, and a control unit 240.

The checkpoint setting unit 210 may retrieve the output of a learning task. More specifically, the checkpoint setting unit 210 may set a checkpoint for the learning task being performed by the first processing element 100-1 and retrieve the output of the learning task. Here, a checkpoint is an intermediate point in the execution of the learning task at which the execution state of the task is fully preserved so that the task can later be resumed from that point. When an unexpected abnormal event (for example, a cost increase or forced termination) occurs in a processing element, the dynamic neural network learning apparatus 200 according to an embodiment of the present invention can retrieve the output of the learning task at the checkpoint and then resume the learning task from that checkpoint.

The checkpoint setting unit 210 may set a checkpoint in the learning task for each execution unit of the learning task. More specifically, the checkpoint setting unit 210 may set a checkpoint for the intermediate result of the learning task at each execution unit. The checkpoint setting unit 210 may define the execution unit using a portion of the training data selected from the training data set according to a specific criterion. Here, the specific criterion may be determined by the designer.
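As one possible reading of "execution unit", the sketch below assumes the criterion is simply a fixed number of training samples per unit and records a checkpoint after each unit. The function names and the fixed-size criterion are assumptions for illustration; the source leaves the criterion to the designer.

```python
def execution_units(dataset, unit_size):
    """Split the training data into consecutive units of at most unit_size samples."""
    for start in range(0, len(dataset), unit_size):
        yield dataset[start:start + unit_size]


def train_with_checkpoints(dataset, unit_size, on_checkpoint):
    """Process one unit at a time and report a checkpoint after each unit."""
    samples_seen = 0
    for index, unit in enumerate(execution_units(dataset, unit_size)):
        samples_seen += len(unit)  # placeholder for training on this unit
        on_checkpoint({"unit": index, "samples_seen": samples_seen})


checkpoints = []
train_with_checkpoints(list(range(10)), unit_size=4, on_checkpoint=checkpoints.append)
```

A smaller unit size means less work is lost on an abnormal event but more checkpoint writes; that trade-off is presumably part of what the designer-chosen criterion balances.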

The checkpoint setting unit 210 may store the output of the learning task in the global memory 300, which is accessible by other processing elements. Here, the other processing elements may include the second to N-th processing elements 100-2, ..., 100-n, that is, those other than the first processing element 100-1 on which the learning task is being performed. In one embodiment, the checkpoint setting unit 210 may set a checkpoint for the learning task, retrieve the output of the learning task, and then store that output in the global memory 300.
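A minimal sketch of storing task output where another processing element can read it, assuming the global memory is exposed as a shared directory and the output is JSON-serializable. Both assumptions, and the names `save_output` and `load_output`, are illustrative only. The write goes to a temporary file that is then renamed, so a reader never sees a half-written checkpoint.

```python
import json
import os
import tempfile


def save_output(shared_dir, task_id, output):
    """Write the output to a temp file, then atomically rename it into place."""
    path = os.path.join(shared_dir, f"{task_id}.json")
    fd, tmp_path = tempfile.mkstemp(dir=shared_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(output, handle)
    os.replace(tmp_path, path)  # atomic when source and target share a filesystem
    return path


def load_output(shared_dir, task_id):
    """Read back the stored output, as another processing element would."""
    with open(os.path.join(shared_dir, f"{task_id}.json")) as handle:
        return json.load(handle)


# A temporary directory stands in for the shared storage in this example.
with tempfile.TemporaryDirectory() as shared_dir:
    save_output(shared_dir, "job-1", {"step": 3, "weights": [0.1, 0.2]})
    restored = load_output(shared_dir, "job-1")
```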

The checkpoint setting unit 210 may provide the same execution environment when the learning task is performed by another processing element. To that end, while storing the output of the learning task in the global memory 300, the checkpoint setting unit 210 may generate a virtual machine image (VMI) for the learning task and store it in the global memory 300.

In one embodiment, the checkpoint setting unit 210 may generate the virtual machine image for the learning task using Ubuntu 14.04, NVIDIA CUDA SDK 7.5, the cuDNN library, TensorFlow 0.1, and the like, so that the same execution environment is provided when the task is performed by another processing element. Another processing element can then copy the virtual machine image stored in the global memory 300, retrieve the learning task through the copied image, and resume the learning task in the same execution environment.

The processing element search unit 220 may search, independently of the execution of the learning task, for a second processing element 100-2 that is more cost-effective than the first processing element among the plurality of processing elements 100. When an abnormal event (for example, a forced termination) occurs while the learning task is being performed, the processing element search unit 220 may search for another processing element that can provide a stable and cost-effective execution environment in which to perform the learning task again.

The processing element search unit 220 may start a processing element search periodically or when the cost of the first processing element 100-1 changes. In one embodiment, the processing element search unit 220 may start the search periodically. The processing element search unit 220 may analyze the costs of the plurality of processing elements 100 in order to minimize the implementation overhead of the dynamic neural network learning apparatus 200.

For example, the processing element search unit 220 may perform the search periodically and, if a processing element exists whose cost is lower than that of the first processing element 100-1 on which the learning task is currently being performed, cause the learning task to be moved to that more cost-effective processing element.

As another example, even if the cost of the first processing element 100-1 on which the learning task is being performed has not increased, the processing element search unit 220 may periodically start a search for another processing element in preparation for a cost increase of the first processing element 100-1 or an unexpected abnormal event.

In another embodiment, the processing element search unit 220 may start the search when the cost of the first processing element 100-1 changes. When the cost of the first processing element 100-1 on which the learning task is being performed increases, the processing element search unit 220 may start searching the plurality of processing elements 100 for another processing element that is more cost-effective than the first processing element 100-1.
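The two search triggers described above (a periodic trigger and a cost-change trigger), followed by the comparison against the current cost, could be sketched as follows. This is only one possible reading: the function names, the use of seconds for `period`, and the dictionary of costs keyed by hypothetical element identifiers are assumptions and do not appear in this document.

```python
from typing import Dict, Optional


def should_start_search(now: float, last_search: float, period: float,
                        previous_cost: float, current_cost: float) -> bool:
    """Start a search when the period has elapsed or the current cost changed."""
    period_elapsed = (now - last_search) >= period
    cost_changed = current_cost != previous_cost
    return period_elapsed or cost_changed


def find_cheaper(current_id: str, costs: Dict[str, float]) -> Optional[str]:
    """Return the cheapest element that undercuts the current one, else None."""
    current_cost = costs[current_id]
    candidates = {pe: cost for pe, cost in costs.items()
                  if pe != current_id and cost < current_cost}
    if not candidates:
        return None  # the search did not succeed; the task stays where it is
    return min(candidates, key=candidates.get)
```

In this sketch a result of `None` corresponds to an unsuccessful search, in which case no move of the learning task would be triggered.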

The processing element search unit 220 may determine the second processing element 100-2 by analyzing the cost history of the plurality of processing elements 100. In one embodiment, the processing element search unit 220 may analyze the cost history of each of the plurality of processing elements 100 to predict the cost of each, and may determine the second processing element 100-2 on the basis of the predicted costs. In addition, so that operation can continue when an error occurs in the separate task that periodically performs the cost analysis, or when cost query requests for the plurality of processing elements 100 surge, the dynamic neural network learning apparatus 200 may store the cost history of each of the plurality of processing elements 100 in the global memory 300, thereby providing scalability and fault tolerance.
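This document does not specify how costs are predicted from the cost history. Purely as an assumed example, the sketch below predicts each element's next cost as the mean of its most recent observations and selects the element with the lowest prediction. The moving-average predictor, the `window` parameter, and all names are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, List, Optional


def predict_cost(history: List[float], window: int = 3) -> float:
    """Predict the next cost as the mean of the last `window` observations."""
    if not history:
        raise ValueError("cost history is empty")
    return mean(history[-window:])


def choose_second_element(histories: Dict[str, List[float]],
                          current_id: str, window: int = 3) -> Optional[str]:
    """Pick the element with the lowest predicted cost if it beats the current one."""
    predicted = {pe: predict_cost(h, window) for pe, h in histories.items()}
    best = min(predicted, key=predicted.get)
    if best == current_id or predicted[best] >= predicted[current_id]:
        return None  # no element is predicted to be more cost-effective
    return best
```

A different predictor could be substituted for `predict_cost` without changing the selection step.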

The processing element search unit 220 may have a separate task, performed independently of the learning task, periodically analyze the cost history and recommend the second processing element 100-2. For example, the processing element search unit 220 may determine the processing element with the lowest current cost among the plurality of processing elements 100 to be the second processing element 100-2. More specifically, the processing element search unit 220 may have the separate task analyze the cost history on the basis of the current costs of the plurality of processing elements 100, determine a cost-effective second processing element 100-2, and recommend the determined second processing element 100-2.

When the search for the second processing element 100-2 succeeds, the task transfer unit 230 may move the learning task to the second processing element 100-2 (that is, transfer the work to the second processing element), together with the output of the learning task obtained through the checkpoint setting unit 210, and continue performing it there.

Before moving the learning task to the second processing element 100-2, the task transfer unit 230 may start a new learning task on the second processing element 100-2 from the virtual machine image (VMI) of the learning task. In one embodiment, the task transfer unit 230 may complete the move by providing the obtained output of the learning task to the new learning task.

The control unit 240 may control the overall operation of the dynamic neural network learning apparatus 200 and may control the control flow or data flow among the checkpoint setting unit 210, the processing element search unit 220, and the task transfer unit 230.

FIG. 3 is a diagram illustrating a dynamic neural network learning process performed by the dynamic neural network learning apparatus of FIG. 1.

Referring to FIG. 3, the dynamic neural network learning apparatus 200 may execute a cost monitor agent, an instance mediation agent, and an instance recommendation agent as independent processes (step S310). More specifically, the dynamic neural network learning apparatus 200 may perform the operations of the checkpoint setting unit 210, the processing element search unit 220, and the task transfer unit 230 through the cost monitor agent, the instance mediation agent, and the instance recommendation agent.

The cost monitor agent may set a checkpoint for the learning task being performed through the first processing element 100-1 and obtain the output of the learning task (step S320).

The instance recommendation agent may search, independently of the execution of the learning task, for a second processing element 100-2 that is more cost-effective than the first processing element 100-1 among the plurality of processing elements 100 (step S330). In one embodiment, the instance recommendation agent may periodically start a search for another processing element and, in doing so, may periodically check the costs of the plurality of processing elements 100 and store them on a local disk. In another embodiment, the instance recommendation agent may monitor whether the learning task being performed through the first processing element 100-1 needs to be migrated (that is, transferred) to another processing element.

When the search succeeds, the instance mediation agent may move the learning task to the second processing element 100-2, together with the obtained output of the learning task, and continue performing it there (step S340). More specifically, the instance mediation agent may continue the learning task on the second processing element 100-2 through the path at which the checkpoint was set. When the learning task is restarted, the instance mediation agent may retrieve the output of the learning task from the global memory 300 and then set the checkpoint for the learning task again, starting from the retrieved checkpoint.
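The resume behavior described above can be illustrated with a toy example in which a dictionary stands in for the global memory 300 and a simple accumulation stands in for one execution unit of training. The function `run_task`, the `stop_after` parameter used to simulate an interruption on the first processing element, and the arithmetic itself are hypothetical and serve only to show that a run resumed from a checkpoint reaches the same result as an uninterrupted run.

```python
from typing import Dict, Optional


def run_task(store: Dict[str, dict], task_id: str, total_steps: int,
             stop_after: Optional[int] = None) -> Optional[float]:
    """Run a toy learning task, setting a checkpoint after every step.

    Returns the final value, or None if the run was interrupted.
    """
    # Resume from the stored checkpoint if one exists, else start from scratch.
    checkpoint = store.get(task_id, {"step": 0, "value": 0.0})
    step, value = checkpoint["step"], checkpoint["value"]
    executed = 0
    while step < total_steps:
        if stop_after is not None and executed >= stop_after:
            return None  # simulated interruption on this processing element
        value += 0.5 * (step + 1)  # stand-in for one execution unit of training
        step += 1
        executed += 1
        store[task_id] = {"step": step, "value": value}  # set the checkpoint
    return value


shared = {}                                   # stand-in for the global memory
run_task(shared, "job", 5, stop_after=2)      # first element stops after 2 steps
resumed = run_task(shared, "job", 5)          # second element continues from step 2
```

Because the checkpoint records both the step counter and the accumulated state, the second call repeats no completed work.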

Although the above has been described with reference to preferred embodiments of the present application, those skilled in the art will understand that the present application may be variously modified and changed without departing from the spirit and scope of the invention as set forth in the claims below.

10: dynamic neural network learning system
100: plurality of processing elements 200: dynamic neural network learning apparatus
210: checkpoint setting unit 220: processing element search unit
230: task transfer unit 240: control unit
300: global memory

Claims

A dynamic neural network learning method performed in a dynamic neural network learning apparatus connected to a plurality of processing elements, the method comprising:
(a) for a learning task being performed through a first processing element, setting a checkpoint, as an intermediate checkpoint in the execution of the learning task, that preserves the preceding execution state and allows subsequent execution to resume, and obtaining an output of the learning task;
(b) searching, independently of the execution of the learning task, for a second processing element that is more cost-effective than the first processing element among the plurality of processing elements; and
(c) when the search succeeds, moving the learning task to the second processing element together with the obtained output of the learning task and continuing to perform the learning task.

The method of claim 1, wherein step (a) comprises
setting the checkpoint in the learning task for each execution unit of the learning task.

The method of claim 2, wherein step (a) further comprises
setting the execution unit using a portion of the training data selected from the training data set according to a specific criterion.

The method of claim 1, wherein step (a) comprises
storing the output of the learning task in a global memory accessible by other processing elements.

The method of claim 4, wherein step (a) further comprises
generating, during the storing, a virtual machine image of the learning task and storing it in the global memory so as to provide the same execution environment when the learning task is performed by the other processing elements.

The method of claim 1, wherein step (b) comprises
starting the search periodically or when the cost of the first processing element changes.

The method of claim 1, wherein step (b) comprises
analyzing a cost history of the plurality of processing elements to determine the second processing element.

The method of claim 7, wherein step (b) comprises
having a separate task, performed independently of the learning task, periodically analyze the cost history and recommend the second processing element.

The method of claim 1, wherein step (c) comprises
starting, before the move, a new learning task on the second processing element from a virtual machine image of the learning task.

The method of claim 9, wherein step (c) comprises
providing the obtained output of the learning task to the new learning task to complete the move.

A dynamic neural network learning apparatus connected to a plurality of processing elements, the apparatus comprising:
a checkpoint setting unit that, for a learning task being performed through a first processing element, sets a checkpoint, as an intermediate checkpoint in the execution of the learning task, that preserves the preceding execution state and allows subsequent execution to resume, and obtains an output of the learning task;
a processing element search unit that searches, independently of the execution of the learning task, for a second processing element that is more cost-effective than the first processing element among the plurality of processing elements; and
a task transfer unit that, when the search succeeds, moves the learning task to the second processing element together with the obtained output of the learning task and continues to perform the learning task.

The apparatus of claim 11, wherein the checkpoint setting unit
sets the checkpoint in the learning task for each execution unit of the learning task.

The apparatus of claim 11, wherein the checkpoint setting unit
stores the output of the learning task in a global memory accessible by other processing elements.

The apparatus of claim 11, wherein the processing element search unit
starts the search periodically or when the cost of the first processing element changes, or determines the second processing element by analyzing a cost history of the plurality of processing elements.

The apparatus of claim 11, wherein the task transfer unit
starts, before the move, a new learning task on the second processing element from a virtual machine image of the learning task.

A dynamic neural network learning apparatus connected to a plurality of processing elements, wherein
the dynamic neural network learning apparatus executes a cost monitor agent, an instance mediation agent, and an instance recommendation agent as independent processes,
the cost monitor agent, for a learning task being performed through a first processing element, sets a checkpoint, as an intermediate checkpoint in the execution of the learning task, that preserves the preceding execution state and allows subsequent execution to resume, and obtains an output of the learning task,
the instance recommendation agent searches, independently of the execution of the learning task, for a second processing element that is more cost-effective than the first processing element among the plurality of processing elements, and
the instance mediation agent, when the search succeeds, moves the learning task to the second processing element together with the obtained output of the learning task and continues to perform the learning task.