KR102462002B1

KR102462002B1 - Method, device, and system for optimizing a neural network model to be executed on an embedded device

Info

Publication number: KR102462002B1
Application number: KR1020220023382A
Authority: KR
Inventors: 이현재; 김재우
Original assignee: 주식회사 에너자이(ENERZAi)
Priority date: 2022-02-23
Filing date: 2022-02-23
Publication date: 2022-11-03
Also published as: KR20230126631A

Abstract

According to an embodiment of the present invention, a method for optimizing a neural network model comprises the steps of: acquiring execution data of a trained neural network model, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; performing optimization on the structure of the neural network model based on the execution data and obtaining instruction information; performing optimization for the embedded device on which the neural network model is to be run based on the instruction information, and obtaining optimal code information; and transmitting the optimal code information. Accordingly, the neural network model can be executed optimally on the embedded device in consideration of the hardware information of the embedded device.

Description

A method for optimizing a neural network model to be run on an embedded device, a device for optimizing a neural network model, and a system for optimizing a neural network model

The present application relates to a method, an apparatus, and a system for optimizing a neural network model. Specifically, the present application relates to a method, apparatus, and system for optimizing a neural network model to be executed on an embedded device.

As artificial intelligence technology advances, there is growing demand to integrate it into embedded devices, that is, devices containing the embedded systems used across various industries. Lightweighting techniques have been developed in response, making it possible to apply artificial intelligence to low-performance, low-specification embedded devices. In particular, this integration has been enabled by execution engine (inference engine) technology, which is software developed to execute a pre-trained artificial intelligence model on an embedded device as efficiently as possible.

A conventional embedded AI execution engine reads the model's execution information, sets the model execution order, allocates the memory required for execution, and then executes the model, all on the embedded device itself. However, performing these preparation steps on an embedded device with limited memory space places a considerable burden on the hardware environment of the device.

In addition, conventional embedded AI execution engines require additional manual work to optimize for non-standardized hardware requirements. Specifically, the engine code had to be modified by hand, piece by piece, in order to use the special instructions of each hardware platform or to apply optimization techniques such as for-loop unrolling. In other words, a conventional embedded AI execution engine could be optimized only for specific hardware, and using it on any other hardware required manually optimizing its execution functions.

Accordingly, there is a need for a neural network model optimization method, apparatus, and system for optimally executing a neural network model on an embedded device based on the artificial intelligence model and the hardware information (computing specifications) of the embedded device.

An object of the present invention is to provide a neural network model optimization method, a neural network model optimization apparatus, and a neural network model optimization system for optimally executing a neural network model on an embedded device in consideration of the hardware information of the embedded device.

The problems to be solved by the present invention are not limited to those described above, and problems not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from this specification and the accompanying drawings.

A method for optimizing a neural network model according to an embodiment of the present application includes: acquiring execution data of a trained neural network model, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; performing optimization on the structure of the neural network model based on the execution data and obtaining instruction information; performing optimization for the embedded device on which the neural network model is to be run based on the instruction information and obtaining optimal code information; and transmitting the optimal code information. The step of performing optimization on the structure of the neural network model and obtaining instruction information may further include: generating a directed acyclic graph (DAG) based on the execution data; determining an execution order of operations based on the directed acyclic graph; detecting, based on a predetermined reference operation pattern, a target operation pattern in the directed acyclic graph that corresponds to the reference operation pattern, and merging a first target operation and a second target operation included in the target operation pattern; obtaining a first memory space map based on the determined execution order and performing optimization related to memory allocation based on the first memory space map; and generating an instruction related to a memory address based on the result of the optimization.
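The first two sub-steps above (building a directed acyclic graph from the execution data and deriving an execution order from it) can be sketched as follows. This is a minimal illustration, assuming the execution data has already been reduced to a list of `(operation name, input names)` pairs; the function names and the use of Kahn's topological sort are assumptions for illustration, not the ordering procedure specified by the source.

```python
from collections import deque


def build_dag(ops):
    """Build successor lists and in-degrees from (name, [input names]) pairs."""
    successors = {name: [] for name, _ in ops}
    indegree = {name: 0 for name, _ in ops}
    for name, inputs in ops:
        for src in inputs:
            successors[src].append(name)
            indegree[name] += 1
    return successors, indegree


def execution_order(ops):
    """Return one valid execution order (Kahn's topological sort)."""
    successors, indegree = build_dag(ops)
    ready = deque(name for name, _ in ops if indegree[name] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(ops):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


# Hypothetical two-branch model: both convolutions read the same input.
ops = [
    ("input", []),
    ("conv1", ["input"]),
    ("conv2", ["input"]),
    ("add", ["conv1", "conv2"]),
    ("relu", ["add"]),
]
order = execution_order(ops)
```

Any topological order is a legal execution order; choosing among the legal orders (for example, between `conv1` and `conv2`) is where a memory-aware rule can be applied.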

An apparatus for optimizing a neural network model according to an embodiment of the present application includes: a transceiver that acquires execution data of a trained neural network model and computing environment information of the embedded device on which the neural network model is to be run; and a processor that performs optimization on the neural network model based on the execution data and the computing environment information of the embedded device. The processor is configured to: acquire the execution data of the trained neural network model, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; perform optimization on the structure of the neural network model based on the execution data and obtain instruction information; perform optimization for the embedded device on which the neural network model is to be run based on the instruction information and obtain optimal code information; and transmit the optimal code information. The processor may be configured to obtain the instruction information by: generating a directed acyclic graph (DAG) based on the execution data; determining an execution order of operations based on the directed acyclic graph; detecting, based on a predetermined reference operation pattern, a target operation pattern in the directed acyclic graph that corresponds to the reference operation pattern; merging a first target operation and a second target operation included in the target operation pattern; obtaining a first memory space map based on the determined execution order; performing optimization related to memory allocation based on the first memory space map; and generating an instruction related to a memory address based on the result of the optimization.

A method for optimizing a neural network model according to an embodiment of the present application includes: acquiring execution data of a trained neural network model, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; performing optimization on the structure of the neural network model based on the execution data and obtaining instruction information, wherein the instruction information includes information related to at least one of a type of operation and a memory address; performing optimization for the embedded device on which the neural network model is to be run based on the instruction information and obtaining optimal code information; and transmitting the optimal code information. The step of obtaining the optimal code information may further include: acquiring computing environment information of the embedded device; obtaining an optimization parameter from the instruction information through an agent trained by reinforcement learning; and generating code information to be used on the embedded device based on the optimization parameter.

An apparatus for optimizing a neural network model according to an embodiment of the present application includes: a transceiver that acquires execution data of a trained neural network model and computing environment information of the embedded device on which the neural network model is to be run; and a processor that performs optimization on the neural network model based on the execution data and the computing environment information of the embedded device. The processor is configured to: acquire the execution data of the trained neural network model, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; perform optimization on the structure of the neural network model based on the execution data and obtain instruction information, wherein the instruction information includes information related to at least one of a type of operation and a memory address; perform optimization for the embedded device on which the neural network model is to be run based on the instruction information and obtain optimal code information; and transmit the optimal code information. The processor may be configured to obtain the optimal code information by: acquiring the computing environment information of the embedded device; obtaining an optimization parameter from the instruction information through an agent trained by reinforcement learning; and generating code information to be used on the embedded device based on the optimization parameter.

The means for solving the problems of the present invention are not limited to those described above, and means not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from this specification and the accompanying drawings.

According to the neural network model optimization method, apparatus, and system of the embodiments of the present application, the non-standardization of embedded device hardware and the structural constraints of existing AI execution engines can be addressed, and a neural network model can be optimized for the hardware platform of the embedded device.

According to the neural network model optimization method, apparatus, and system of the embodiments of the present application, the execution performance of a neural network model on an embedded device can be improved.

According to the neural network model optimization method, apparatus, and system of the embodiments of the present application, the power consumption required to execute a neural network model on an embedded device can be reduced.

The effects of the present invention are not limited to those described above, and effects not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention pertains from this specification and the accompanying drawings.

FIG. 1 is a schematic diagram of a neural network model optimization system according to an embodiment of the present application.
FIG. 2 is a flowchart illustrating a method for optimizing a neural network model according to an embodiment of the present application.
FIG. 3 is a flowchart detailing the step of performing optimization on the structure of a neural network model and obtaining instruction information according to an embodiment of the present application.
FIG. 4 is a diagram illustrating an aspect of a directed acyclic graph according to an embodiment of the present application.
FIG. 5 is a diagram illustrating an aspect of merging a first target operation and a second target operation according to an embodiment of the present application.
FIG. 6 is a diagram illustrating an aspect of a first memory space map according to an embodiment of the present application.
FIG. 7 is a diagram illustrating an aspect of optimization related to memory allocation according to an embodiment of the present application.
FIG. 8 is a flowchart detailing the step of performing optimization for an embedded device and obtaining optimal code information according to an embodiment of the present application.
FIG. 9 is a diagram illustrating an aspect of training an agent through reinforcement learning according to an embodiment of the present application.

The above objects, features, and advantages of the present application will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. However, since the present application may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail below.

Throughout the specification, like reference numerals refer in principle to like elements. In addition, elements that have the same function within the scope of the same idea shown in the drawings of each embodiment are described using the same reference numerals, and redundant descriptions of them are omitted.

Where a detailed description of a known function or configuration related to the present application is judged to unnecessarily obscure the gist of the present application, that description is omitted. In addition, the ordinals used in this specification (for example, first, second) are merely identifiers for distinguishing one element from another.

In addition, the suffixes "module" and "unit" used for elements in the following embodiments are assigned or used interchangeably solely for ease of drafting the specification, and do not in themselves have distinct meanings or roles.

In the following embodiments, singular expressions include plural expressions unless the context clearly indicates otherwise.

In the following embodiments, terms such as "include" or "have" mean that the features or elements described in the specification are present, and do not preclude the possibility that one or more other features or elements may be added.

In the drawings, the sizes of elements may be exaggerated or reduced for convenience of description. For example, the size and thickness of each element shown in the drawings are chosen arbitrarily for convenience of description, and the present invention is not necessarily limited to what is illustrated.

Where an embodiment can be implemented differently, a specific process order may differ from the order described. For example, two processes described in succession may be performed substantially simultaneously, or may be performed in the reverse of the order described.

In the following embodiments, when elements are said to be connected, this includes not only the case where the elements are directly connected but also the case where they are indirectly connected with other elements interposed between them.

For example, when elements are said in this specification to be electrically connected, this includes not only the case where the elements are directly electrically connected but also the case where they are indirectly electrically connected with other elements interposed between them.

According to an embodiment of the present application, the step of merging the first target operation and the second target operation may further include: obtaining predetermined reference operation pattern information, wherein the reference operation pattern information includes a first operation and a second operation linked to the first operation; detecting, from the directed acyclic graph and based on the reference operation pattern information, the first target operation corresponding to the first operation and the second target operation corresponding to the second operation; and merging the first target operation and the second target operation and transforming a kernel based on the result of the merging.
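The pattern detection and merging step can be illustrated with a small graph rewrite. The sketch below assumes the graph is a dictionary from operation name to its type and inputs, and that the reference operation pattern is a pair of distinct operation types such as `("conv", "relu")`; the data layout, the naming of the merged node, and the rule that the first operation must have exactly one consumer are assumptions made for illustration. The kernel transformation that the source attaches to the merge is represented only by the new `type` string.

```python
def fuse_pattern(nodes, pattern):
    """Merge every (first_type -> second_type) pair found in the graph.

    nodes:   dict name -> {"type": str, "inputs": [names]}
    pattern: (first_type, second_type), the hypothetical reference pattern
    """
    first_type, second_type = pattern
    if first_type == second_type:
        raise ValueError("this sketch assumes two distinct operation types")

    # Map each node to the nodes that consume its output.
    consumers = {}
    for name, node in nodes.items():
        for src in node["inputs"]:
            consumers.setdefault(src, []).append(name)

    # Detect target pairs: a first_type node whose only consumer is a
    # second_type node that reads nothing else.
    merged_into, first_op = {}, {}
    for name, node in nodes.items():
        users = consumers.get(name, [])
        if (node["type"] == first_type and len(users) == 1
                and nodes[users[0]]["type"] == second_type
                and nodes[users[0]]["inputs"] == [name]):
            fused_name = f"{name}+{users[0]}"
            merged_into[name] = merged_into[users[0]] = fused_name
            first_op[fused_name] = name

    # Rebuild the graph, emitting each merged pair once and rewiring inputs.
    fused = {}
    for name, node in nodes.items():
        new_name = merged_into.get(name, name)
        if new_name in fused:
            continue  # other half of a pair that was already emitted
        is_fused = new_name in first_op
        source = nodes[first_op[new_name]] if is_fused else node
        fused[new_name] = {
            "type": f"{first_type}_{second_type}" if is_fused else node["type"],
            "inputs": [merged_into.get(src, src) for src in source["inputs"]],
        }
    return fused


graph = {
    "input": {"type": "input", "inputs": []},
    "conv1": {"type": "conv", "inputs": ["input"]},
    "relu1": {"type": "relu", "inputs": ["conv1"]},
    "pool": {"type": "pool", "inputs": ["relu1"]},
}
fused_graph = fuse_pattern(graph, ("conv", "relu"))
```

Merging removes the intermediate tensor between the two operations, which is why it interacts with the memory allocation step that follows.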

According to an embodiment of the present application, the step of performing optimization related to memory allocation may further include: generating the first memory space map based on the determined execution order and the size of the data output by each operation; changing a first memory tensor, in which a value input to a third target operation is stored, into a second memory tensor, in which a value output by the third target operation is stored; and generating a second memory space map from the first memory space map based on the result of the change.
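One way to read this step is as in-place buffer reuse: the region that holds the input of the third target operation is reused to hold its output, so the second map needs less space than the first. The sketch below follows that reading under several assumptions that are not in the source: the first map places every output tensor in its own region, laid out in execution order; an operation is eligible for reuse only if it is listed in a hypothetical `inplace_ops` set, has a single input of the same size, and is that input's only consumer.

```python
def first_memory_map(order, out_size):
    """Naive map: each operation's output gets its own (offset, size) region."""
    offset, mem_map = 0, {}
    for op in order:
        mem_map[op] = (offset, out_size[op])
        offset += out_size[op]
    return mem_map


def second_memory_map(order, out_size, inputs, inplace_ops):
    """Let eligible operations write their output into their input's region."""
    uses = {}
    for op in order:
        for src in inputs[op]:
            uses[src] = uses.get(src, 0) + 1

    alias = {}  # op -> op whose region it reuses
    for op in order:
        if op in inplace_ops and len(inputs[op]) == 1:
            src = inputs[op][0]
            if uses[src] == 1 and out_size[src] == out_size[op]:
                alias[op] = alias.get(src, src)

    offset, mem_map = 0, {}
    for op in order:
        if op in alias:
            mem_map[op] = mem_map[alias[op]]
        else:
            mem_map[op] = (offset, out_size[op])
            offset += out_size[op]
    return mem_map


def footprint(mem_map):
    """Total bytes spanned by the map."""
    return max(offset + size for offset, size in mem_map.values())


order = ["conv", "relu", "fc"]
out_size = {"conv": 1024, "relu": 1024, "fc": 10}
inputs = {"conv": [], "relu": ["conv"], "fc": ["relu"]}

map1 = first_memory_map(order, out_size)
map2 = second_memory_map(order, out_size, inputs, inplace_ops={"relu"})
```

In this toy example the footprint drops from 2058 to 1034 bytes because the ReLU output shares the convolution output's region. A production allocator would also reuse regions whose tensors are no longer live, which this sketch does not attempt.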

According to an embodiment of the present application, the step of determining the execution order of the operations may further include: calculating a first memory space required for a fourth target operation included in a first branch of the directed acyclic graph and a second memory space required for a fifth target operation included in a second branch of the directed acyclic graph; comparing the first memory space with the second memory space; and determining the execution order of the fourth target operation and the fifth target operation according to the result of the comparison.

According to an embodiment of the present application, when the first memory space is larger than the second memory space, the fourth target operation may be placed later in the execution order than the fifth target operation, and when the first memory space is smaller than the second memory space, the fourth target operation may be placed earlier in the execution order than the fifth target operation.
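The comparison rule above amounts to running the operation with the smaller memory requirement first. A minimal sketch, assuming the two memory requirements have already been computed in bytes; the function name is hypothetical, and the tie-breaking behavior is an assumption because the source does not state what happens when the two sizes are equal.

```python
def order_branch_ops(op4, mem4, op5, mem5):
    """Order two operations from different branches by required memory.

    The operation that needs less memory runs first; on a tie the given
    order is kept (an assumption, the source does not cover ties).
    """
    if mem4 > mem5:
        return [op5, op4]
    return [op4, op5]


larger_first_branch = order_branch_ops("op4", 4096, "op5", 1024)
smaller_first_branch = order_branch_ops("op4", 512, "op5", 1024)
```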

According to an embodiment of the present application, the step of performing optimization on the structure of the neural network model and obtaining instruction information may further include: obtaining input data and output data related to an operation of the neural network model; and adjusting the input data and the output data to values within a predetermined integer range.
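Adjusting data to a predetermined integer range is commonly done by quantization. The sketch below uses symmetric linear quantization to the signed 8-bit range [-127, 127] purely as an illustration; the source does not state which integer range or which mapping is used, so the range, the scale computation, and the function names are all assumptions.

```python
def quantize_symmetric(values, qmax=127):
    """Map floats to integers in [-qmax, qmax] using a single scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


activations = [-1.0, 0.0, 0.5, 1.0]
q_values, q_scale = quantize_symmetric(activations)
restored = dequantize(q_values, q_scale)
```

Each restored value differs from the original by at most half of `q_scale`, which is the usual trade-off accepted in exchange for integer arithmetic on the target device.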

According to an embodiment of the present application, a computer-readable recording medium on which a program for executing the neural network model optimization method is recorded may be provided.

According to an embodiment of the present application, the step of obtaining the optimization parameter may further include: inputting, to the agent, at least one of one or more pieces of operation type information corresponding to the operation, memory state information, and the computing environment information of the embedded device; and obtaining the optimization parameter output by the agent.

According to an embodiment of the present application, the optimization parameter may relate to at least one of a parameter that selects the type of algorithm to be performed for the operation, a parameter related to the block size of the operation, and a parameter related to the length of the code.

According to an embodiment of the present application, the agent is configured to output, according to an initial rule, a predicted value related to a parameter based on target device information related to the computing environment of a target embedded device, memory state information, and one or more pieces of algorithm type information corresponding to an operation. The agent may be trained by updating the initial rule so that an evaluation value for the performance of the code generated from the predicted value is maximized.
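The training loop described above (predict parameters, evaluate the resulting code, update the rule to maximize the evaluation value) can be sketched with a very small tabular agent. Everything concrete here is an assumption for illustration: the source does not name a reinforcement learning algorithm, so an epsilon-greedy value table stands in for the agent; the state is reduced to a hypothetical `(operation type, cache size in KB)` pair; and `toy_reward` is an invented cost model that replaces actually generating and benchmarking code on a device.

```python
import random


class ParameterAgent:
    """Epsilon-greedy agent over a fixed set of optimization parameters."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {}  # (state, action) -> running mean of rewards
        self.count = {}  # (state, action) -> number of updates

    def select(self, state, explore=True):
        untried = [a for a in self.actions if (state, a) not in self.value]
        if explore and untried:
            return untried[0]  # try every action at least once
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions,
                   key=lambda a: self.value.get((state, a), float("-inf")))

    def update(self, state, action, reward):
        key = (state, action)
        n = self.count.get(key, 0) + 1
        self.count[key] = n
        old = self.value.get(key, 0.0)
        self.value[key] = old + (reward - old) / n


def toy_reward(state, action):
    """Invented evaluation value: negative latency of the generated code."""
    op_type, cache_kb = state
    algorithm, block_size = action
    latency = 100.0
    if op_type == "conv" and algorithm == "winograd":
        latency -= 30.0
    latency += abs(block_size - cache_kb // 2)  # half-cache blocks do best
    return -latency


actions = [(alg, blk) for alg in ("direct", "winograd") for blk in (8, 16, 32)]
state = ("conv", 32)
agent = ParameterAgent(actions)
for _ in range(50):
    action = agent.select(state)
    agent.update(state, action, toy_reward(state, action))
best = agent.select(state, explore=False)
```

After training on this toy cost model, the greedy choice is the Winograd algorithm with a block size of 16. In a real setting the reward would come from measuring the generated code on the target device or a model of it.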

According to an embodiment of the present application, the step of generating the code information to be used on the embedded device may further include: generating code corresponding to the instruction information based on the optimization parameter; and compiling the generated code to convert it into a binary file.
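A minimal sketch of these two steps follows, assuming the instruction describes an element-wise ReLU over a fixed number of floats and the optimization parameter is a loop unroll factor. The emitted C kernel, the function names, and the use of the host `cc` compiler are assumptions for illustration; a real flow would target the embedded device with its own cross-compiler and flags.

```python
import pathlib
import shutil
import subprocess
import tempfile


def generate_relu_kernel(length, unroll):
    """Emit C source for ReLU over `length` floats, unrolled `unroll` times."""
    if unroll <= 0 or length % unroll != 0:
        raise ValueError("length must be a positive multiple of unroll")
    body = "\n".join(
        f"        out[i + {k}] = in[i + {k}] > 0.0f ? in[i + {k}] : 0.0f;"
        for k in range(unroll)
    )
    return (
        "void relu(const float *in, float *out) {\n"
        f"    for (int i = 0; i < {length}; i += {unroll}) {{\n"
        f"{body}\n"
        "    }\n"
        "}\n"
    )


def compile_to_object(source, compiler="cc"):
    """Compile C source to an object file; returns None if no compiler exists."""
    if shutil.which(compiler) is None:
        return None
    workdir = pathlib.Path(tempfile.mkdtemp())
    src_path = workdir / "kernel.c"
    obj_path = workdir / "kernel.o"
    src_path.write_text(source)
    subprocess.run([compiler, "-c", str(src_path), "-o", str(obj_path)],
                   check=True)
    return obj_path.read_bytes()


kernel_source = generate_relu_kernel(length=8, unroll=4)
```

Calling `compile_to_object(kernel_source)` then yields the binary form of the kernel when a C compiler is available on the host.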

Hereinafter, the neural network model optimization method, neural network model optimization apparatus, and neural network model optimization system of the present application are described with reference to FIGS. 1 to 9.

FIG. 1 is a schematic diagram of a neural network model optimization system according to an embodiment of the present application.

The neural network model optimization system 10 according to an embodiment of the present application may include an embedded device 100 and a neural network model optimization apparatus 1000 (or server).

The embedded device 100 may encompass any device containing a programmable embedded system built for a specific purpose (or specific function). The embedded device 100 may include hardware comprising a processor and/or memory. The embedded device 100 may also include firmware for controlling the hardware. In addition, the embedded device 100 may be configured to execute an arbitrary artificial intelligence model by loading arbitrary software, including an AI execution engine, into the firmware. Here, the AI execution engine (inference engine) is software for executing a pre-trained neural network model on the embedded device 100 as efficiently as possible; it is a technology aimed at the practical use of artificial intelligence and serves to increase efficiency in the environment of the device on which it is installed. For example, for a mobile device, the execution engine may be implemented to suit the slow computation speed and low-power specifications of the mobile device's computing environment. As another example, for a PC server with relatively high computing performance, the execution engine may be implemented to maximize high-performance parallel processing capability.

The embedded device 100 according to an embodiment of the present application may acquire, from the neural network model optimization device 1000, optimal code information optimized for the computing environment of the embedded device 100, and may add (or input) the optimal code information to the firmware. As described later, the optimal code information may be generated by analyzing the internal structure of the neural network model. The optimal code information may also be generated in consideration of the computing environment of the embedded device 100, including its memory specification and/or processor specification. In addition, the embedded device 100 may add the optimal code information generated by the neural network model optimization device 1000 to the firmware and execute the neural network model.

The neural network model optimization device 1000 according to an embodiment of the present application may optimize the operation structure and/or memory allocation of a neural network model so that a neural network model trained on any device (or server) other than the embedded device 100 can be executed optimally in the computing environment of the embedded device 100. In addition, the neural network model optimization device 1000 may automatically generate optimal code with which the neural network model can be executed optimally in the computing environment of the embedded device 100.

The neural network model optimization device 1000 according to an embodiment of the present application may include a transceiver 1100, a memory 1200, and a processor 1300.

The transceiver 1100 of the neural network model optimization device 1000 may communicate with any external device, including the embedded device 100. For example, the neural network model optimization device 1000 may transmit, through the transceiver 1100, the optimal code information obtained by performing optimization to the embedded device 100. The neural network model optimization device 1000 may also receive, through the transceiver 1100, computing environment information of the embedded device 100 from the embedded device 100 or from any external device. In addition, the neural network model optimization device 1000 may receive, through the transceiver 1100, a trained neural network model and/or execution data for executing the neural network model.

The neural network model optimization device 1000 may connect to a network through the transceiver 1100 to transmit and receive various data. The transceiver may broadly be of a wired type or a wireless type. Because the wired and wireless types each have advantages and disadvantages, the neural network model optimization device 1000 may in some cases be provided with both. The wireless type may mainly use a WLAN (Wireless Local Area Network) communication scheme such as Wi-Fi. Alternatively, the wireless type may use cellular communication, for example an LTE or 5G communication scheme. However, the wireless communication protocol is not limited to these examples, and any suitable wireless communication scheme may be used. For the wired type, LAN (Local Area Network) and USB (Universal Serial Bus) communication are representative examples, and other schemes are also possible.

The memory 1200 of the neural network model optimization device 1000 may store various information. Various data may be stored in the memory 1200 temporarily or semi-permanently. Examples of the memory include a hard disk drive (HDD), a solid state drive (SSD), flash memory, read-only memory (ROM), and random access memory (RAM). The memory 1200 may be built into the neural network model optimization device 1000 or provided in a detachable form. The memory 1200 may store an operating system (OS) for running the neural network model optimization device 1000, programs for operating each component of the neural network model optimization device 1000, and various data required for its operation.

The processor 1300 may control the overall operation of the neural network model optimization device 1000. For example, the processor 1300 may control the operations described later: acquiring execution data of a trained neural network model, optimizing the structure of the neural network model, performing optimization for the embedded device, acquiring the optimal code information generated from the optimization result, and/or transmitting the optimal code information. Specifically, the processor 1300 may load a program for the overall operation of the neural network model optimization device 1000 from the memory 1200 and execute it. The processor 1300 may be implemented, in hardware, software, or a combination thereof, as an AP (Application Processor), a CPU (Central Processing Unit), an MCU (Microcontroller Unit), or a similar device. In hardware, it may be provided as an electronic circuit that processes electrical signals to perform a control function; in software, it may be provided as a program or code that drives the hardware circuit.

Hereinafter, the operation of the neural network model optimization device 1000 and the neural network model optimization method according to an embodiment of the present application are described in detail with reference to FIGS. 2 to 9.

FIG. 2 is a flowchart illustrating a neural network model optimization method according to an embodiment of the present application.

The neural network model optimization method according to an embodiment of the present application may include: acquiring execution data of a trained neural network model (S1000); optimizing the structure of the neural network model and acquiring instruction information (S2000); performing optimization for the embedded device and acquiring optimal code information (S3000); and transmitting the optimal code information (S4000).

In the step of acquiring execution data of a trained neural network model (S1000), the neural network model optimization device 1000 may acquire the execution data of the trained neural network model through the transceiver 1100. Here, execution data may encompass any appropriate data required to execute the neural network model, including layer data of the neural network model, operation data constituting the neural network model, and/or any weights (or parameters) associated with the neural network model.

In the step of optimizing the structure of the neural network model and acquiring instruction information (S2000), the neural network model optimization device 1000 may optimize the structure of the neural network model, for example by making the structure more lightweight, based on the execution data of the neural network model. As one example, the neural network model optimization device 1000 may be configured to detect an operation pattern contained in the operation structure of the neural network model and merge the target operations included in that operation pattern. As another example, the neural network model optimization device 1000 may determine the execution order of the operations of the neural network model and perform optimization related to memory allocation based on the determined execution order.

In step S2000, the neural network model optimization device 1000 may also acquire instruction information according to the result of the optimization. Here, the instruction information may include instructions on the types of the operations of the neural network model and/or instructions on the memory addresses associated with each operation of the neural network model. Step S2000 is described in more detail with reference to FIGS. 3 to 7.

In the step of performing optimization for the embedded device and acquiring optimal code information (S3000), the neural network model optimization device 1000 may generate code for executing the trained neural network model optimally in the computing environment of the embedded device 100. Specifically, the neural network model optimization device 1000 may be implemented to obtain optimization parameters, through an agent trained with a reinforcement learning technique, based on the instruction information acquired in step S2000 and on embedded device information, and to generate, based on the optimization parameters, the optimal code information to be used on the embedded device 100. Step S3000 is described in more detail with reference to FIGS. 8 and 9.

In the step of transmitting the optimal code information (S4000), the neural network model optimization device 1000 may be implemented to transmit, through the transceiver 1100, the acquired optimal code information to any external device (or external server), including the embedded device 100.

Hereinafter, optimization of the structure of a neural network model according to an embodiment of the present application is described in more detail with reference to FIGS. 3 to 7.

FIG. 3 is a flowchart detailing the step of optimizing the structure of a neural network model and acquiring instruction information according to an embodiment of the present application.

The step of optimizing the structure of the neural network model and acquiring instruction information (S2000) according to an embodiment of the present application may include: generating a directed acyclic graph (DAG) based on the execution data (S2100); determining the execution order of operations based on the DAG (S2200); detecting a target operation pattern of the DAG that corresponds to a predetermined reference operation pattern, and merging a first target operation and a second target operation included in the target operation pattern (S2300); obtaining a first memory space map based on the determined execution order, and performing optimization related to memory allocation based on the first memory space map (S2400); and generating instructions related to memory addresses based on the optimization result (S2500).

In the step of generating a directed acyclic graph (hereinafter, DAG) based on the execution data (S2100), the neural network model optimization device 1000 may generate the DAG based on the execution data of the neural network model (for example, layer data of the neural network model, operation data constituting the neural network model, and/or parameters of the neural network model). A DAG may refer to any finite directed graph that contains no directed cycle.

FIG. 4 is a diagram illustrating an aspect of a DAG according to an embodiment of the present application.

The DAG may include information on the data dependencies among the functions that make up the neural network model, for example information on how the functions are connected. For example, the DAG may include connection information for a first branch structure that includes a second operation (e.g., B) connected to a first operation (e.g., A) and a third operation (e.g., C) connected to the second operation. Likewise, the DAG may include connection information for a second branch structure that includes a fourth operation (e.g., D) connected to the first operation (e.g., A) and a fifth operation (e.g., E) connected to the fourth operation. The DAG may also include connection information for a sixth operation (e.g., F) that is connected to both the third operation and the fifth operation. FIG. 4 is merely an example given for convenience of describing the DAG and should not be interpreted as limiting.
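As a minimal illustrative sketch, the connection information of FIG. 4 could be held as an adjacency mapping from each operation to the operations that consume its output. The dictionary layout and the helper names `is_acyclic` and `predecessors` are assumptions made for this illustration and are not taken from the embodiment.

```python
from collections import deque

# Example DAG of FIG. 4: A feeds B and D, B feeds C, D feeds E,
# and both C and E feed F. (Layout is an illustrative assumption.)
EDGES = {
    "A": ["B", "D"],
    "B": ["C"],
    "C": ["F"],
    "D": ["E"],
    "E": ["F"],
    "F": [],
}


def is_acyclic(edges):
    """Return True if the directed graph has no directed cycle (Kahn's algorithm)."""
    indegree = {node: 0 for node in edges}
    for successors in edges.values():
        for succ in successors:
            indegree[succ] += 1
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for succ in edges[node]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    # Every node is visited exactly when no cycle blocks the traversal.
    return visited == len(edges)


def predecessors(edges, target):
    """Return the operations whose output the target operation depends on."""
    return sorted(node for node, succs in edges.items() if target in succs)
```

Here `predecessors(EDGES, "F")` yields the third and fifth operations (C and E), matching the description of the sixth operation above.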

In the step of determining the execution order of operations based on the DAG (S2200), the neural network model optimization device 1000 may use the DAG to determine the execution order of the one or more operations constituting the neural network model.

Any suitable rule may be used to determine the execution order.

As one example, the neural network model optimization device 1000 may determine the execution order of the operations of the DAG based on the memory space required by the operations constituting the neural network model. For example, the neural network model optimization device 1000 may compute a first memory space required by an operation included in the first branch of the DAG (e.g., B in FIG. 4) and a second memory space required by an operation included in the second branch of the DAG (e.g., D in FIG. 4), compare the first memory space with the second memory space, and, based on the comparison result, determine the execution order between the operation in the first branch (e.g., B in FIG. 4) and the operation in the second branch (e.g., D in FIG. 4). Specifically, when the first memory space is larger than the second memory space, the neural network model optimization device 1000 may place the operation in the first branch (e.g., B in FIG. 4) later in the execution order than the operation in the second branch (e.g., D in FIG. 4). Conversely, when the first memory space is smaller than the second memory space, the neural network model optimization device 1000 may place the operation in the first branch (e.g., B in FIG. 4) earlier in the execution order than the operation in the second branch (e.g., D in FIG. 4).
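The memory-based rule above amounts to running the operation that needs less memory first. A minimal sketch follows, assuming the required memory of each candidate operation is already known in bytes; the function name `order_by_required_memory` and the byte values in the usage note are hypothetical, and the tie-breaking behavior (keeping the given order) is an assumption, since the text does not address ties.

```python
def order_by_required_memory(required_bytes):
    """Order candidate operations so that the one needing less memory runs first.

    required_bytes maps an operation name to the memory space it requires.
    Ties keep their insertion order (an assumption; the text leaves ties open).
    """
    return sorted(required_bytes, key=lambda op: required_bytes[op])
```

For instance, with hypothetical sizes `{"B": 4096, "D": 1024}` the result is `["D", "B"]`: B requires more memory than D, so B is placed later in the execution order.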

As another example, the neural network model optimization device 1000 may determine the execution order of the operations constituting the neural network model in consideration of the branch structure of the DAG. For example, executing the operations of one branch consecutively may be more favorable in terms of memory space than alternating between an operation of the first branch and an operation of the second branch. Accordingly, the neural network model optimization device 1000 may determine the execution order so that the operations in the first branch (e.g., B and C in FIG. 4) are executed in sequence before the operations in the second branch (e.g., D and E in FIG. 4). Alternatively, the neural network model optimization device 1000 may determine the execution order so that the operations in the second branch (e.g., D and E in FIG. 4) are executed first, followed by the operations in the first branch (e.g., B and C in FIG. 4).
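One way to obtain such a branch-by-branch order is a depth-first traversal that emits an operation only once all of its inputs have been emitted. The sketch below is an assumption about how this could be done, not the procedure of the embodiment; the function name `branchwise_order` is hypothetical, and the graph is assumed to have a single source operation.

```python
def branchwise_order(edges, start):
    """Return an execution order that finishes one branch before starting the next.

    edges maps each operation to the operations consuming its output;
    start is assumed to be the only operation without inputs.
    """
    indegree = {node: 0 for node in edges}
    for successors in edges.values():
        for succ in successors:
            indegree[succ] += 1

    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push successors in reverse so the first-listed branch is followed first.
        for succ in reversed(edges[node]):
            indegree[succ] -= 1
            if indegree[succ] == 0:  # all inputs of succ are now available
                stack.append(succ)
    return order


# Example DAG of FIG. 4.
EDGES = {"A": ["B", "D"], "B": ["C"], "C": ["F"], "D": ["E"], "E": ["F"], "F": []}
```

Listing D before B in the successors of A would instead run the second branch first, which corresponds to the alternative order described above.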

In the step of detecting a target operation pattern of the DAG that corresponds to a predetermined reference operation pattern and merging the first target operation and the second target operation included in the target operation pattern (S2300), the neural network model optimization device 1000 may obtain the predetermined reference operation pattern and detect the target operation pattern of the DAG that corresponds to it. The neural network model optimization device 1000 may then be implemented to merge the operations included in the detected target operation pattern.

More specifically, the step of merging the first target operation and the second target operation according to an embodiment of the present application may include: obtaining predetermined reference operation pattern information, wherein the reference operation pattern information includes a first operation and a second operation linked to the first operation; detecting, from the DAG and based on the reference operation pattern information, a first target operation corresponding to the first operation and a second target operation corresponding to the second operation; and merging the first target operation and the second target operation and converting the kernel based on the merging result.

FIG. 5 is a diagram illustrating an aspect of merging a first target operation and a second target operation according to an embodiment of the present application.

In the step of obtaining the predetermined reference operation pattern information, the neural network model optimization device 1000 may obtain the predetermined reference operation pattern information. The reference operation pattern information relates to commonly used operation patterns and may be set in advance. For example, the reference operation pattern information may relate to an operation pattern that includes a first operation (e.g., convolution) and a second operation (e.g., Rectified Linear Unit (ReLU)) connected to the first operation. This is merely an example for convenience of description, and any suitable operation pattern may be set in advance. For example, the reference operation pattern information may relate to an operation pattern in which a convolution operation is performed and then a depthwise convolution operation and an activation operation are performed in sequence. As another example, the reference operation pattern information may include an operation pattern in which a depthwise convolution operation, which applies a filter to each color-related channel of an image to compress the data channel by channel, is performed to obtain an intermediate result, and a pointwise operation is then performed on the intermediate result.

In the step of detecting, from the DAG and based on the reference operation pattern information, the first target operation corresponding to the first operation and the second target operation corresponding to the second operation, the neural network model optimization device 1000 may use the reference operation pattern information to detect, in the DAG, a target operation pattern that includes a first target operation (e.g., the first target operation in FIG. 5) corresponding to the first operation (e.g., convolution) and a second target operation (e.g., the second target operation in FIG. 5) corresponding to the second operation (e.g., ReLU). Any pattern matching algorithm may be used to detect the target operation pattern.

In the step of merging the first target operation and the second target operation and converting the kernel based on the merging result, the neural network model optimization device 1000 may be implemented to merge the first target operation (e.g., Convolution in FIG. 5) and the second target operation (e.g., ReLU in FIG. 5) and, based on the merging result, convert the kernels included in the target operation pattern into a single kernel (e.g., a Convolution + ReLU kernel). According to this embodiment, merging the first target operation and the second target operation and executing them as one operation can provide the advantageous effects of reducing the memory space required for the operations and increasing execution speed.
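The detection and merging steps can be sketched on a simple operation list. This is an illustration under stated assumptions rather than the embodiment's pattern matcher: each operation is a dictionary with hypothetical keys `name`, `type`, and `inputs`, the function name `fuse_pattern` is invented here, and a pair is merged only when the second operation is the sole consumer of the first, a safety condition added for this sketch.

```python
def fuse_pattern(ops, first_type="Conv", second_type="ReLU"):
    """Merge each first_type -> second_type pair into one fused operation.

    ops is a list of dicts with 'name', 'type', and 'inputs', in execution order.
    A pair is merged only if the second op is the sole consumer of the first.
    """
    by_name = {op["name"]: op for op in ops}
    consumers = {}
    for op in ops:
        for src in op["inputs"]:
            consumers.setdefault(src, []).append(op["name"])

    absorbed = {}  # name of the second op -> name of the first op it merges into
    for op in ops:
        if op["type"] != second_type or len(op["inputs"]) != 1:
            continue
        producer = by_name.get(op["inputs"][0])
        if (producer is not None
                and producer["type"] == first_type
                and consumers[producer["name"]] == [op["name"]]):
            absorbed[op["name"]] = producer["name"]

    fused_first = set(absorbed.values())
    result = []
    for op in ops:
        if op["name"] in absorbed:
            continue  # folded into its producer
        new_type = (f"{first_type}+{second_type}"
                    if op["name"] in fused_first else op["type"])
        # Inputs that pointed at an absorbed op now point at the fused op.
        new_inputs = [absorbed.get(src, src) for src in op["inputs"]]
        result.append({"name": op["name"], "type": new_type, "inputs": new_inputs})
    return result
```

Because the fused operation writes only its final output, the intermediate convolution result no longer needs its own buffer, which is the memory saving the paragraph above refers to.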

Referring again to FIG. 2, in the step of optimizing the structure of the neural network model and acquiring instruction information (S2000), the neural network model optimization device 1000 may convert the input data and/or output data of the operations constituting the neural network model to integers. Specifically, the neural network model optimization device 1000 may use a quantization technique to convert floating-point input data and/or output data of an operation into integers within a specific range. Here, quantization is a technique for converting a 32-bit floating-point (float) value into an 8-bit integer (int). The neural network model optimization device 1000 may compute a scale (Scale) and a zero point (ZeroPoint) for each data tensor of the operations constituting the neural network model, and convert a 32-bit floating-point value into an 8-bit integer value (int8Value), which takes relatively little space. For example, the neural network model optimization device 1000 may be configured to convert or adjust a 32-bit floating-point value to an 8-bit integer value through the following equation.
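The equation itself is not reproduced in this text. As a hedged sketch, a commonly used affine quantization scheme that is consistent with a per-tensor Scale and ZeroPoint maps a real value to round(value / Scale) + ZeroPoint, clamped to the signed 8-bit range. The function names, the range calculation, and the rounding and clamping details below are assumptions of this sketch and may differ from the equation of the embodiment.

```python
def quantization_params(values, qmin=-128, qmax=127):
    """Compute a per-tensor (scale, zero_point) for affine int8 quantization.

    The range is widened to include 0.0 so that zero is exactly representable.
    """
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any positive scale works
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(value, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to int8: round(value / scale) + zero_point, then clamp.

    Python's round() resolves exact ties to the nearest even integer.
    """
    q = int(round(value / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate float value from its int8 representation."""
    return scale * (q - zero_point)
```

Values outside the range used to compute the parameters saturate at the ends of the 8-bit range rather than wrapping around.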

However, the 8-bit integer quantization described above is only one example. In addition to 8-bit integers, the neural network model optimization device 1000 may be implemented to optimize the structure of the neural network model by converting or adjusting arbitrary floating-point values to zero or to 8-bit natural number values.

Referring again to FIG. 3, the step of optimizing the structure of the neural network model and acquiring instruction information (S2000) according to an embodiment of the present application may include obtaining a first memory space map based on the determined execution order and performing optimization related to memory allocation based on the first memory space map (S2400).

FIG. 6 is a diagram illustrating an aspect of a first memory space map according to an embodiment of the present application.

In the step of obtaining a first memory space map based on the determined execution order and performing optimization related to memory allocation based on the first memory space map (S2400), the neural network model optimization apparatus 1000 may generate the first memory space map based on the execution order determined in step S2200 and the memory space required for each operation. Specifically, the execution order of the operations constituting the neural network model (e.g., the order A, B, C, D, E, F in FIG. 6) may be determined according to the rule described above with reference to FIG. 4. The neural network model optimization apparatus 1000 may then generate the first memory space map based on the memory space associated with the size of the data output by each operation. For example, the neural network model optimization apparatus 1000 may place the memory tensor associated with operation A (e.g., T1) in consideration of the memory space required for the data output by operation A and the execution order of the operations that require that data (e.g., operation B and operation D). Likewise, the neural network model optimization apparatus 1000 may place the memory tensor associated with operation B (e.g., T2) in consideration of the memory space required for the data output by operation B and the execution order of the operations that require that data (e.g., operation C). In this case, the T2 memory tensor, which stores the data output by operation B, may be placed adjacent to the T1 memory tensor associated with operation A, which stores the data required by operation B. In a similar manner, the neural network model optimization apparatus 1000 may generate the first memory space map in consideration of the memory space associated with the size of the data output by the remaining operations of the neural network model (e.g., operations C, D, E, and so on) and the execution order of those operations.
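The placement described above can be illustrated with a small sketch. The application does not state a concrete placement algorithm, so the greedy first-fit strategy below is an assumption, as are the tensor sizes and every dependency other than "A feeds B and D" and "B feeds C". Each output tensor is kept alive from the step that produces it until the last step that consumes it, and is given the lowest offset that does not collide with any tensor alive at the same time.

```python
def build_memory_map(order, consumers, sizes):
    """Return {op: (offset, size, first_step, last_step)} for each op's output tensor."""
    step = {op: i for i, op in enumerate(order)}
    placed = {}
    for op in order:
        first = step[op]
        # The output must stay alive until its last consumer has run.
        last = max([step[c] for c in consumers.get(op, [])], default=first)
        size = sizes[op]
        # Memory regions of already placed tensors whose lifetime overlaps this one.
        busy = sorted(
            (off, off + sz)
            for off, sz, lo_step, hi_step in placed.values()
            if not (hi_step < first or last < lo_step)
        )
        offset = 0
        for start, end in busy:
            if offset + size <= start:
                break  # the tensor fits into the gap before this region
            offset = max(offset, end)
        placed[op] = (offset, size, first, last)
    return placed


# Execution order as in FIG. 6; sizes and the C, D, E, F dependencies are hypothetical.
order = ["A", "B", "C", "D", "E", "F"]
consumers = {"A": ["B", "D"], "B": ["C"], "C": ["E"], "D": ["E"], "E": ["F"]}
sizes = {"A": 4, "B": 2, "C": 2, "D": 3, "E": 1, "F": 1}

memory_map = build_memory_map(order, consumers, sizes)
peak = max(off + size for off, size, _, _ in memory_map.values())
```

With these assumed inputs the tensor of operation B is placed directly after the tensor of operation A, and the peak footprint stays below the sum of all tensor sizes because tensors that are no longer needed free their space for later outputs.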

The operation in which the neural network model optimization apparatus 1000 places memory tensors has been described above with reference to the first memory space map shown in FIG. 6. However, the first memory space map shown in FIG. 6 is merely an example for convenience of description and should not be construed as limiting.

FIG. 7 is a diagram illustrating an aspect of optimization related to memory allocation according to an embodiment of the present application.

In the step of obtaining a first memory space map based on the determined execution order and performing optimization related to memory allocation based on the first memory space map (S2400), the neural network model optimization apparatus 1000 may obtain the first memory space map generated based on the execution order of the operations constituting the neural network model and the memory space associated with the size of the data output by those operations.

In addition, the neural network model optimization apparatus 1000 may perform optimization related to memory allocation based on the first memory space map. As an example, the neural network model optimization apparatus 1000 may use a memory in-placing technique to write the output of a specific operation (e.g., a ReLU operation, an Add operation, and/or a Sigmoid operation) over the input space of that operation. Specifically, using the memory in-placing technique, the neural network model optimization apparatus 1000 may be implemented to change the memory tensor in which the value input to a specific operation (e.g., the third target operation (operation C) of FIG. 7) is stored (e.g., the T2 memory tensor of FIG. 7) into the memory tensor in which the value output by that operation is stored (e.g., the T3 memory tensor of FIG. 7). The neural network model optimization apparatus 1000 may also generate a second memory space map from the first memory space map based on the result of changing the memory tensor. The second memory space map occupies less memory space than the first memory space map. Therefore, through this operation, the neural network model optimization method according to an embodiment of the present application can reduce the total memory space required and increase the execution speed of the operations.
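As a hedged illustration of memory in-placing, the sketch below lets the output tensor of an element-wise operation share the buffer of its input when no later operation still reads that input. For brevity both memory space maps are simple back-to-back layouts, which is a simplification and not the layout of FIG. 7. The operation list, the tensor sizes, and the `Conv` and `Concat` operation types are hypothetical.

```python
IN_PLACE_OPS = {"ReLU", "Add", "Sigmoid"}


def find_in_place_aliases(ops, sizes):
    """ops: list of (name, op_type, input_tensors, output_tensor) in execution order."""
    last_use = {}
    for i, (_, _, inputs, _) in enumerate(ops):
        for tensor in inputs:
            last_use[tensor] = i
    alias = {}  # output tensor -> buffer it overwrites
    for i, (_, op_type, inputs, output) in enumerate(ops):
        if op_type not in IN_PLACE_OPS or not inputs:
            continue
        src = inputs[0]
        # Safe only if this op is the last reader of the input and the sizes match.
        if last_use[src] == i and sizes[src] == sizes[output]:
            alias[output] = alias.get(src, src)
    return alias


def layout(sizes, alias):
    """Lay buffers out back to back; aliased tensors reuse their root's offset."""
    offsets, cursor = {}, 0
    for tensor, size in sizes.items():
        if tensor in alias:
            continue
        offsets[tensor] = cursor
        cursor += size
    for tensor, root in alias.items():
        offsets[tensor] = offsets[root]
    return offsets, cursor


ops = [
    ("A", "Conv", ["T0"], "T1"),
    ("B", "Conv", ["T1"], "T2"),
    ("C", "ReLU", ["T2"], "T3"),          # candidate for in-placing
    ("D", "Concat", ["T1", "T3"], "T4"),
]
sizes = {"T0": 4, "T1": 4, "T2": 4, "T3": 4, "T4": 8}

alias = find_in_place_aliases(ops, sizes)
first_map, first_total = layout(sizes, {})
second_map, second_total = layout(sizes, alias)
```

Here only operation C qualifies, so T3 takes over the buffer of T2 and the second map needs one tensor's worth of space less than the first.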

In the step of generating an instruction related to a memory address based on the result of the optimization (S2500), the neural network model optimization apparatus 1000 may generate, based on the result of merging the target operations included in the target operation pattern described above or of the optimization related to memory allocation, an instruction related to the memory address of the memory tensor corresponding to each operation constituting the neural network model, and/or an instruction related to the type of each operation constituting the neural network model.

Referring back to FIG. 2, the method for optimizing a neural network model according to an embodiment of the present application may include a step of performing optimization for the embedded device and obtaining optimal code information (S3000).

Hereinafter, the optimization of a neural network model for an embedded device according to an embodiment of the present application is described in more detail with reference to FIGS. 8 and 9. According to an embodiment of the present application, the optimization of the neural network model for the embedded device may be implemented using a reinforcement learning technique.

FIG. 8 is a flowchart detailing the step of performing optimization for an embedded device and obtaining optimal code information according to an embodiment of the present application.

The step of performing optimization for the embedded device and obtaining optimal code information (S3000) according to an embodiment of the present application may further include: obtaining computing environment information of the embedded device (S3100); obtaining optimization parameters based on the instruction information through an agent trained by reinforcement learning (S3200); and generating optimal code information to be used in the embedded device based on the optimization parameters (S3300).

In the step of obtaining computing environment information of the embedded device (S3100), the neural network model optimization apparatus 1000 may obtain, through the transceiver 1100, computing environment information (e.g., memory information and processor information of the embedded device 100) of the embedded device 100, which is the target device on which the neural network model is to be run, from the embedded device 100 or any external device. Although not shown in FIG. 8, in the step of obtaining computing environment information of the embedded device (S3100), the neural network model optimization apparatus 1000 may also obtain any information about variables that affect the execution of the neural network model on the embedded device 100, such as device type information or target function information of the embedded device 100.

In the step of obtaining optimization parameters based on the instruction information through an agent trained by reinforcement learning (S3200), the neural network model optimization apparatus 1000 may use an agent trained with a reinforcement learning technique to obtain the optimization parameters for generating optimal code.

In the step of obtaining optimization parameters based on the instruction information through an agent trained by reinforcement learning (S3200), the neural network model optimization apparatus 1000 may input to the agent at least one of memory state information, operation type information included in the instruction information, and embedded device information including the computing environment information of the embedded device 100, and may obtain the optimization parameters output by the agent. Here, an optimization parameter may relate to any variable associated with the code needed to run the neural network model, including a parameter that selects the type of algorithm to be performed for an operation, a parameter related to the block size of an operation, and/or a parameter related to the length of the code.

As an example, assume that the operation type input to the agent is convolution. In this case, the trained agent may output, based on the input values, a parameter that selects at least one algorithm from among the algorithms related to convolution, for example, the Im2Col algorithm, the Default algorithm, and the FFT algorithm. The agent may also output, based on the input values, a parameter related to the block size of the operation and/or a parameter related to the length of the code. The neural network model optimization apparatus 1000 may then obtain the optimization parameters output by the agent.

FIG. 9 is a diagram illustrating an aspect of training an agent through a reinforcement learning method according to an embodiment of the present application.

As described above, the neural network model optimization apparatus 1000 according to an embodiment of the present application may obtain optimization parameters through an agent trained by reinforcement learning. Specifically, the agent may be configured to receive, based on an initial rule (policy), operation type information, memory state information, and embedded device information (or target device information), and to output predicted values for the optimization parameters (e.g., a parameter that selects the algorithm type, a parameter related to the block size of the operation, and/or a parameter related to the length of the code). The code generator may obtain the optimization parameters and generate code based on them. The generated code is then executed and its performance evaluated, and based on the evaluation result the initial rule of the agent may be updated so as to maximize the evaluation value of the code's performance (that is, so that the performance of the code is maximized). Specifically, the agent may be trained such that its initial rule is updated to output the optimization parameters that maximize the evaluation value of the code's performance. The trained agent may receive operation type information, memory state information, and/or embedded device information and output optimization parameters from which code with maximized performance can be generated.
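The training loop of FIG. 9 can be sketched as follows. The application does not name a specific reinforcement learning algorithm, so this example assumes a tabular, single-step (bandit-style) value update with epsilon-greedy exploration. The function `evaluate_generated_code` is a hypothetical stand-in for generating code, running it on the target, and measuring its performance. The state layout, the block sizes, and the reward model are likewise invented for illustration.

```python
import itertools
import random

ALGORITHMS = ["Im2Col", "Default", "FFT"]
BLOCK_SIZES = [8, 16, 32]
ACTIONS = list(itertools.product(ALGORITHMS, BLOCK_SIZES))


def evaluate_generated_code(state, action):
    """Hypothetical performance score (higher is better) for one configuration."""
    _op_type, free_memory_kb = state
    algorithm, block_size = action
    base = {"Im2Col": 3.0, "Default": 1.0, "FFT": 2.0}[algorithm]
    # Assumed cost model: Im2Col needs a larger working buffer per block.
    working_set_kb = block_size * (4 if algorithm == "Im2Col" else 1)
    penalty = 5.0 if working_set_kb > free_memory_kb else 0.0
    return base + block_size / 32.0 - penalty


class Agent:
    def __init__(self, epsilon=0.2, learning_rate=0.5, seed=0):
        self.q = {}  # (state, action) -> estimated performance
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def act(self, state, explore=True):
        if explore and self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward):
        # Move the estimate toward the observed evaluation value.
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.learning_rate * (reward - old)


def train(agent, states, episodes=4000):
    for _ in range(episodes):
        state = agent.rng.choice(states)
        action = agent.act(state)                         # predicted parameters
        reward = evaluate_generated_code(state, action)   # run and evaluate
        agent.update(state, action, reward)               # update the policy


# State = (operation type, free memory in KB); both values are hypothetical.
states = [("Convolution", 64), ("Convolution", 16)]
agent = Agent()
train(agent, states)
roomy_choice = agent.act(("Convolution", 64), explore=False)
tight_choice = agent.act(("Convolution", 16), explore=False)
```

Under this assumed reward model the trained agent prefers Im2Col when memory is plentiful and falls back to FFT with a smaller working set when memory is tight, which mirrors the idea that the same operation type may call for different parameters on different devices.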

In the step of generating optimal code information to be used in the embedded device based on the optimization parameters (S3300), the neural network model optimization apparatus 1000 may generate the optimal code information to be used in the embedded device 100 based on the optimization parameters generated in step S3200 (e.g., a parameter that selects the type of algorithm to be performed for an operation, a parameter related to the block size of the operation, and/or a parameter related to the length of the code). Specifically, the neural network model optimization apparatus 1000 may generate the optimal code through the code generator, based on the optimization parameters and the memory address of the corresponding operation. For example, when the optimization parameters include at least one of a parameter that selects at least one algorithm (e.g., the Im2Col algorithm) from among the Im2Col algorithm, the Default algorithm, and the FFT algorithm, a parameter related to the block size value of the operation, and/or a parameter related to the length value of the code, the neural network model optimization apparatus 1000 may generate the optimal code through the code generator, based on the optimization parameters and the memory address of the operation included in the instruction information.
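A code generator of this kind might be sketched as a function that turns the optimization parameters and the memory addresses from the instruction information into C source text. Everything concrete below is hypothetical: the kernel names `conv_im2col`, `conv_direct`, and `conv_fft`, the single memory arena, and the parameter dictionary layout are assumptions made for illustration and are not taken from the application.

```python
# Hypothetical mapping from the selected algorithm to a C kernel name.
KERNELS = {"Im2Col": "conv_im2col", "Default": "conv_direct", "FFT": "conv_fft"}


def generate_code(op_name, params, addresses):
    """Emit C source for one operation from its parameters and memory offsets."""
    algorithm = params["algorithm"]
    block_size = params["block_size"]
    kernel = KERNELS[algorithm]
    lines = [
        f"/* {op_name}: algorithm={algorithm}, block_size={block_size} */",
        f"#define BLOCK_SIZE {block_size}",
        f"void run_{op_name}(unsigned char *arena) {{",
        # Input and output tensors are addressed as offsets into one arena.
        f"    {kernel}(arena + {addresses['input']}, "
        f"arena + {addresses['output']}, BLOCK_SIZE);",
        "}",
    ]
    return "\n".join(lines)


source = generate_code(
    "conv1",
    {"algorithm": "Im2Col", "block_size": 16},
    {"input": 0, "output": 4096},
)
print(source)
```

The resulting text could then be passed to a C compiler for the target device; that step is omitted here because it depends on the toolchain.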

Although not shown in FIG. 8, the step of generating optimal code information to be used in the embedded device based on the optimization parameters (S3300) according to an embodiment of the present application may further include a step of generating code corresponding to the instruction information based on the optimization parameters, and a step of compiling the generated code to convert it into binary file form. Specifically, the neural network model optimization apparatus 1000 may generate code (e.g., C language code) based on the optimization parameters and store the generated code at the memory address included in the instruction information. The neural network model optimization apparatus 1000 may also compile the generated code and convert it into a binary file (e.g., an API file). Further, the neural network model optimization apparatus 1000 may transmit the binary file of the generated code to the embedded device 100 or any external device through the transceiver 1100. According to this embodiment, the optimal code is transmitted to the embedded device 100 in the form of a binary file (e.g., an API file), so that the user of the embedded device 100 can visually check the optimal code.

The various operations of the neural network model optimization apparatus 1000 described above may be stored in the memory 1200 of the neural network model optimization apparatus 1000, and the processor 1300 of the neural network model optimization apparatus 1000 may be provided to perform the operations stored in the memory 1200.

The neural network model optimization method, neural network model optimization apparatus, and neural network model optimization system disclosed in the present application may be used for the efficient execution of artificial intelligence models in a variety of embedded systems, including home appliances, vehicle sensors, safety products for infants or the elderly, and smart watches.

The features, structures, effects, and the like described in the embodiments above are included in at least one embodiment of the present invention and are not necessarily limited to a single embodiment. Furthermore, the features, structures, effects, and the like illustrated in each embodiment may be combined or modified for other embodiments by those of ordinary skill in the art to which the embodiments belong. Accordingly, matters relating to such combinations and modifications should be construed as falling within the scope of the present invention.

In addition, although the description above has focused on the embodiments, these are merely examples and do not limit the present invention. Those of ordinary skill in the art to which the present invention pertains will appreciate that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the embodiments. That is, each component specifically shown in the embodiments may be modified and implemented. Differences relating to such modifications and applications should be construed as falling within the scope of the present invention as defined in the appended claims.

10: Neural network model optimization system
100: Embedded device
1000: Neural network model optimization apparatus

Claims

A method for optimizing a neural network model, performed by a neural network model optimization apparatus that optimizes the neural network model based on execution data of a trained neural network model and in consideration of the computing environment of an embedded device on which the neural network model is to be run, the method comprising:
acquiring execution data of a trained neural network model, wherein the execution data includes at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model;
optimizing the structure of the neural network model based on the execution data of the neural network model and obtaining instruction information;
performing optimization for the embedded device on which the neural network model is to be run based on the instruction information, and obtaining optimal code information; and
transmitting the optimal code information,
wherein the step of optimizing the structure of the neural network model and obtaining instruction information further includes:
generating a directed acyclic graph (DAG) based on the execution data;
determining an execution order of operations in consideration of the structure of the directed acyclic graph;
detecting, based on a predetermined reference operation pattern, a target operation pattern of the directed acyclic graph corresponding to the reference operation pattern, and merging a first target operation and a second target operation included in the target operation pattern;
obtaining a first memory space map based on the determined execution order, and performing optimization related to memory allocation based on the first memory space map; and
generating an instruction related to a memory address based on a result of the optimization, and
wherein the step of determining the execution order of the operations further includes determining the execution order such that the operations included in a second branch of the directed acyclic graph are executed after the operations included in a first branch of the directed acyclic graph have been executed sequentially.

The method of claim 1,
wherein the step of merging the first target operation and the second target operation includes:
obtaining predetermined reference operation pattern information, wherein the reference operation pattern information includes a first operation and a second operation associated with the first operation;
detecting, from the directed acyclic graph and based on the reference operation pattern information, the first target operation corresponding to the first operation and the second target operation corresponding to the second operation; and
merging the first target operation and the second target operation, and transforming a kernel based on a result of the merging.

The method of claim 1,
wherein the step of performing the optimization related to the memory allocation includes:
generating the first memory space map based on the determined execution order and the size of the data output through the operations;
changing a first memory tensor, in which a value input to a third target operation is stored, into a second memory tensor, in which a value output through the third target operation is stored; and
generating a second memory space map from the first memory space map based on a result of the change.

The method of claim 1,
wherein the step of determining the execution order of the operations further includes:
calculating a first memory space required for a fourth target operation included in the first branch of the directed acyclic graph and a second memory space required for a fifth target operation included in the second branch of the directed acyclic graph;
comparing the first memory space and the second memory space; and
determining the execution order of the fourth target operation and the fifth target operation according to a result of the comparison.

5. The method of claim 4,
wherein, when the first memory space is larger than the second memory space, the fourth target operation is assigned a lower priority in the execution order than the fifth target operation, and
when the first memory space is smaller than the second memory space, the fourth target operation is assigned a higher priority in the execution order than the fifth target operation.

The method of claim 1,
wherein the step of optimizing the structure of the neural network model and obtaining instruction information further includes:
obtaining input data and output data related to an operation of the neural network model; and
adjusting the input data and the output data to values within a predetermined integer range.

A computer-readable recording medium on which a program for causing a computer to execute the method according to any one of claims 1 to 6 is recorded.

A neural network model optimization apparatus for optimizing a neural network model based on execution data of a trained neural network model and the computing environment of an embedded device on which the neural network model is to be run, the apparatus comprising:
a transceiver for acquiring the execution data of the trained neural network model and computing environment information of the embedded device on which the neural network model is to be run; and
a processor for optimizing the neural network model based on the execution data and the computing environment information of the embedded device,
wherein the processor is configured to:
acquire the execution data of the trained neural network model, the execution data including at least one of layer data of the neural network model, operation data constituting the neural network model, and parameters of the neural network model; optimize the structure of the neural network model based on the execution data and obtain instruction information; perform optimization for the embedded device on which the neural network model is to be run based on the instruction information, and obtain optimal code information; and transmit the optimal code information,
wherein the processor is configured to obtain the instruction information by:
generating a directed acyclic graph (DAG) based on the execution data; determining an execution order of operations in consideration of the structure of the directed acyclic graph; detecting, based on a predetermined reference operation pattern, a target operation pattern of the directed acyclic graph corresponding to the reference operation pattern; merging a first target operation and a second target operation included in the target operation pattern; obtaining a first memory space map based on the determined execution order; performing optimization related to memory allocation based on the first memory space map; and generating instructions related to the execution order and a memory address based on a result of the optimization, and
wherein the processor is configured to determine the execution order such that the operations included in a second branch of the directed acyclic graph are executed after the operations included in a first branch of the directed acyclic graph have been executed sequentially.