KR20230004207A - Method of optimizing neural network model and neural network model processing system performing the same

Info

Publication number: KR20230004207A
Application number: KR1020210114779A
Authority: KR
Inventors: 이창권; 김경영; 김병수; 김재곤; 임한영; 최정민; 하상혁
Original assignee: 삼성전자주식회사
Priority date: 2021-06-30
Filing date: 2021-08-30
Publication date: 2023-01-06

Abstract

In a neural network model optimization method, first model information on a first neural network model is received. Device information on a first target device for driving the first neural network model is received. Based on at least one of a plurality of suitability determination algorithms, the first model information, and the device information, whether the first neural network model is suitable for driving in the first target device is analyzed. The result of the analysis is visualized and output so that the first model information and the result of the analysis are displayed on one screen.

Description

Optimization method of neural network model and neural network model processing system performing the same

The present invention relates to machine learning, and more particularly to a method of optimizing a neural network model and to a neural network model processing system that performs the method.

There are several ways to classify data through machine learning. Among them, data classification based on a neural network, or artificial neural network (ANN), is representative. An artificial neural network is a computational model, implemented in software or hardware, that mimics the computational capability of a biological system by using a large number of artificial neurons joined by connection lines. An artificial neural network uses artificial neurons that simplify the function of biological neurons, and it performs human-like cognition or learning by interconnecting these neurons through connection lines that each have a connection strength.

Recently, deep learning techniques have been studied to overcome the limitations of artificial neural networks, and as deep learning has advanced, various methods of analyzing and optimizing neural network models have been proposed. Conventionally, optimization techniques based on general-purpose algorithms have been used.

One object of the present invention is to provide a method of effectively optimizing a neural network model so that it is best suited to a target device.

Another object of the present invention is to provide a neural network model processing system that performs the method of optimizing a neural network model.

To achieve the above object, in a method of optimizing a neural network model according to embodiments of the present invention, first model information about a first neural network model is received. Device information about a first target device on which the first neural network model is to be run is received. Based on at least one of a plurality of suitability determination algorithms, the first model information, and the device information, an analysis is performed to determine whether the first neural network model is suitable for running on the first target device. The result of the analysis is visualized and output so that the first model information and the result of the analysis are displayed on one screen.

To achieve the other object, a computer-based neural network model processing system according to embodiments of the present invention includes an input device, a storage device, an output device, and a processor. The input device receives first model information about a first neural network model and device information about a first target device on which the first neural network model is to be run. The storage device stores information about program routines that perform an analysis, based on at least one of a plurality of suitability determination algorithms, the first model information, and the device information, to determine whether the first neural network model is suitable for running on the first target device, and that generate the result of the analysis so that the first model information and the result of the analysis are displayed on one screen. The output device visualizes and outputs the result of the analysis. The processor is connected to the input device, the storage device, and the output device, and controls execution of the program routines.

With the method of optimizing a neural network model and the neural network model processing system according to embodiments of the present invention described above, a neural network model optimized for the target device can be implemented effectively. Specifically, before training, a neural network model optimized for the target device can be designed. After training, the neural network model can be checked for suitability for the target device and, if necessary, a modification of the model and/or a more suitable new configuration can be suggested. Optimized performance can be obtained by applying suitable quantization to each component of the neural network model, and a graphical user interface for these operations can be provided. Accordingly, a user can effectively design and modify a neural network model optimized for the target device and can apply a suitable quantization technique.

FIG. 1 is a flowchart illustrating a method of optimizing a neural network model according to embodiments of the present invention.
FIGS. 2, 3, and 4 are block diagrams illustrating neural network model processing systems according to embodiments of the present invention.
FIGS. 5A, 5B, 5C, and 6 are diagrams for describing a neural network model that is the target of a method of optimizing a neural network model according to embodiments of the present invention.
FIG. 7 is a flowchart illustrating an example of performing the analysis in FIG. 1.
FIG. 8 is a flowchart illustrating an example of performing the first analysis in FIG. 7.
FIG. 9 is a flowchart illustrating another example of performing the analysis in FIG. 1.
FIG. 10 is a flowchart illustrating an example of performing the second analysis in FIG. 9.
FIG. 11 is a flowchart illustrating still another example of performing the analysis in FIG. 1.
FIGS. 12 and 13 are flowcharts illustrating examples of performing the third analysis in FIG. 11.
FIG. 14 is a flowchart illustrating still another example of performing the analysis in FIG. 1.
FIG. 15 is a flowchart illustrating a specific example of the method of optimizing a neural network model of FIG. 1.
FIGS. 16A, 16B, 16C, 16D, 16E, and 16F are diagrams for describing the operation of FIG. 15.
FIG. 17 is a flowchart illustrating a method of optimizing a neural network model according to embodiments of the present invention.
FIG. 18 is a flowchart illustrating an example of changing at least one of the layers of the first neural network model in FIG. 17.
FIG. 19 is a flowchart illustrating a specific example of the method of optimizing a neural network model of FIG. 17.
FIGS. 20A, 20B, 20C, and 20D are diagrams for describing the operation of FIG. 19.
FIG. 21 is a flowchart illustrating a method of optimizing a neural network model according to embodiments of the present invention.
FIG. 22 is a flowchart illustrating an example of applying different quantization schemes to at least some of the layers of the first neural network model in FIG. 21.
FIG. 23 is a flowchart illustrating a specific example of the method of optimizing a neural network model of FIG. 21.
FIGS. 24A, 24B, and 24C are diagrams for describing the operation of FIG. 23.
FIG. 25 is a block diagram illustrating a system in which a method of optimizing a neural network model according to embodiments of the present invention is implemented.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components are omitted.

FIG. 1 is a flowchart illustrating a method of optimizing a neural network model according to embodiments of the present invention.

Referring to FIG. 1, a method of optimizing a neural network model according to embodiments of the present invention is performed or executed by a computer-based neural network model processing system, at least part of which is implemented in hardware and/or software. The neural network model processing system is described below with reference to FIGS. 2 to 4.

In the method of optimizing a neural network model according to embodiments of the present invention, first model information about a first neural network model is received (step S100). The first neural network model may be a neural network model whose training has been completed (a pre-trained model) or a neural network model that is still being trained. In other words, the method may be performed after the training operation on the first neural network model is completed, or while that training operation is in progress. An exemplary configuration of the neural network model is described below with reference to FIG. 5.

A training operation for a neural network model refers to the process of solving a task in an optimized manner when the task and a set of functions are given, and is a process for improving the performance and/or accuracy of the neural network model. For example, the training operation may include determining the network structure of the neural network model and determining parameters such as weights. In addition, during the training operation, other parameters may be changed while the architecture and data type are maintained.
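The last point, changing parameters while the architecture and data type stay fixed, can be illustrated with a minimal Python sketch. It uses a plain gradient-descent update, which is a stand-in chosen here and not a technique named in the source; the function name `sgd_step` and the learning rate `lr` are hypothetical.

```python
from array import array


def sgd_step(weights: array, grads: array, lr: float) -> array:
    """Return updated weights with the same length and typecode as the input.

    Illustrative assumption: one gradient-descent step, w <- w - lr * g.
    The number of weights (architecture) and the typecode (data type)
    are preserved; only the parameter values change.
    """
    if len(weights) != len(grads):
        raise ValueError("weights and grads must have the same length")
    updated = (w - lr * g for w, g in zip(weights, grads))
    return array(weights.typecode, updated)
```

Calling `sgd_step(array('f', [1.0, 2.0]), array('f', [0.5, -0.5]), 0.5)` returns a two-element `'f'` array again, so neither the shape nor the storage type has changed.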

Device information about a first target device on which the first neural network model is to be run is received (step S200). The first target device may be a processing element that executes the first neural network model and/or a neural network system (or electronic system) that includes the processing element. An exemplary configuration of the neural network system is described below with reference to FIG. 6.

Based on at least one of a plurality of suitability determination algorithms, the first model information, and the device information, an analysis is performed to determine whether the first neural network model is suitable for running on the first target device (step S300). For example, the plurality of suitability determination algorithms may include a first algorithm that determines the performance efficiency of the first neural network model, a second algorithm that analyzes the complexity and capacity of the first neural network model, and a third algorithm that determines the memory efficiency of the first neural network model. Exemplary configurations of the suitability determination algorithms and of the analysis operation of step S300 that uses them are described below with reference to FIGS. 7 to 14.
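As a rough illustration of step S300, the following Python sketch runs a selected subset of suitability checks on model and device information. All class names, fields (`ops_per_layer`, `param_bytes`, `supported_ops`, `memory_bytes`), and scoring formulas are assumptions made for illustration; the text names the algorithms but this passage does not define how they compute their results. Only performance and memory proxies are sketched; the complexity and capacity analysis is omitted because nothing here suggests a formula for it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class ModelInfo:
    """Hypothetical model information (MI): one operator name per layer."""
    ops_per_layer: List[str]
    param_bytes: int


@dataclass
class DeviceInfo:
    """Hypothetical device information (DI) for the target device."""
    supported_ops: Set[str]
    memory_bytes: int


def performance_score(mi: ModelInfo, di: DeviceInfo) -> float:
    # Assumed proxy: fraction of layers whose operator the device supports.
    if not mi.ops_per_layer:
        return 0.0
    supported = sum(op in di.supported_ops for op in mi.ops_per_layer)
    return supported / len(mi.ops_per_layer)


def memory_score(mi: ModelInfo, di: DeviceInfo) -> float:
    # Assumed proxy: 1.0 when the parameters fit in device memory,
    # otherwise the ratio of available memory to required memory.
    if mi.param_bytes <= 0:
        return 1.0
    return min(1.0, di.memory_bytes / mi.param_bytes)


# Registry of suitability determination algorithms (names are illustrative).
ALGORITHMS: Dict[str, Callable[[ModelInfo, DeviceInfo], float]] = {
    "performance": performance_score,
    "memory": memory_score,
}


def analyze(mi: ModelInfo, di: DeviceInfo, selected: List[str]) -> Dict[str, float]:
    """Run at least one selected algorithm and return {name: score in [0, 1]}."""
    if not selected:
        raise ValueError("at least one suitability algorithm must be selected")
    return {name: ALGORITHMS[name](mi, di) for name in selected}
```

Keeping the algorithms in a registry mirrors the "at least one of a plurality" wording: the caller picks which checks to run, and each check sees the same model and device information.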

The result of the analysis is visualized and output so that the first model information and the result of the analysis are displayed on one screen (step S400). For example, step S400 may be performed using a graphical user interface (GUI). For example, the result of the analysis may be displayed based on at least one of a score and a color, and a graphical representation that shows the first model information together with the result of the analysis may be displayed on the graphical user interface. The graphical user interface is described below with reference to FIG. 16 and other figures.
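A minimal sketch of a score-and-color display for step S400 might look like the following. The thresholds (0.8 and 0.5), the traffic-light color names, and the text-only "screen" are assumptions for illustration; the source states only that the result may be shown as a score and/or a color alongside the model information.

```python
from typing import Dict, List


def score_to_color(score: float) -> str:
    """Map a score in [0, 1] to a traffic-light color (thresholds are assumed)."""
    if score >= 0.8:
        return "green"
    if score >= 0.5:
        return "yellow"
    return "red"


def render_screen(layer_names: List[str], layer_scores: Dict[str, float]) -> str:
    """Build one text 'screen' listing each layer with its score and color.

    Model information (the layer names) and analysis results (scores and
    colors) appear together, one layer per row.
    """
    rows = []
    for name in layer_names:
        score = layer_scores[name]
        rows.append(f"{name:<10} score={score:.2f} color={score_to_color(score)}")
    return "\n".join(rows)
```

A real implementation would draw the model graph in a GUI; the text rendering here only shows how model information and per-layer results can share a single view.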

With the method of optimizing a neural network model according to embodiments of the present invention, a neural network model optimized for the target device can be implemented effectively. Specifically, before training, a neural network model optimized for the target device can be designed. After training, the neural network model can be checked for suitability for the target device and, if necessary, a modification of the model and/or a more suitable new configuration can be suggested. Optimized performance can be obtained by applying suitable quantization to each component of the neural network model, and a graphical user interface for these operations can be provided. Accordingly, a user can effectively design and modify a neural network model optimized for the target device and can apply a suitable quantization technique.

FIGS. 2, 3, and 4 are block diagrams illustrating neural network model processing systems according to embodiments of the present invention.

Referring to FIG. 2, a neural network model processing system 1000 is a computer-based neural network model processing system and includes a processor 1100, a storage device 1200, and an input/output device 1300. The input/output device 1300 includes an input device 1310 and an output device 1320.

The processor 1100 may be used to perform computations for the method of optimizing a neural network model according to embodiments of the present invention. For example, the processor 1100 may include a microprocessor, an application processor (AP), a digital signal processor (DSP), a graphics processing unit (GPU), or the like. Although FIG. 2 shows only one processor 1100, the present invention is not limited thereto, and the neural network model processing system 1000 may include a plurality of processors. Although not shown in detail, the processor 1100 may also include a cache memory to improve computational capability.

The storage device 1200 stores a program (PR) 1210 for the method of optimizing a neural network model according to embodiments of the present invention, and may further store suitability determination algorithms (SDA) 1220, updating (or changing) algorithms (UA) 1230, and quantization schemes (QS) 1240 that are used to perform the method. The program 1210, the suitability determination algorithms 1220, the updating algorithms 1230, and the quantization schemes 1240 may be provided from the storage device 1200 to the processor 1100.

The storage device 1200 is a computer-readable storage medium and may include any storage medium that stores data and/or instructions executed by a computer. For example, the computer-readable storage medium may include volatile memory such as dynamic random access memory (DRAM), and non-volatile memory such as flash memory, magnetic random access memory (MRAM), phase-change random access memory (PRAM), and resistive random access memory (RRAM). The computer-readable storage medium may be insertable into a computer, integrated in a computer, or coupled to a computer through a communication medium such as a network and/or a wireless link.

The input device 1310 may be used to receive input for the method of optimizing a neural network model according to embodiments of the present invention. For example, the input device 1310 receives model information MI and device information DI, and may further receive user input. For example, the input device 1310 may include input means such as a keyboard, a keypad, a touchpad, a touchscreen, a mouse, and a remote controller.

The output device 1320 may be used to provide output for the method of optimizing a neural network model according to embodiments of the present invention. For example, the output device 1320 may provide a visualized output VOUT. For example, the output device 1320 may include output means that display the visualized output VOUT, such as a display device, and may further include other output means such as a speaker and a printer.

The neural network model processing system 1000 may perform the method of optimizing a neural network model according to embodiments of the present invention described above with reference to FIG. 1. Specifically, the input device 1310 receives first model information (for example, MI) about a first neural network model and device information (for example, DI) about a first target device on which the first neural network model is to be run. The storage device 1200 stores information about program routines that perform an analysis, based on at least one of a plurality of suitability determination algorithms, the first model information, and the device information, to determine whether the first neural network model is suitable for running on the first target device, and that generate the result of the analysis so that the first model information and the result of the analysis are displayed on one screen. The output device 1320 visualizes and outputs the result of the analysis. The processor 1100 is connected to the input device 1310, the storage device 1200, and the output device 1320, and may control execution of the program routines. In addition, the neural network model processing system 1000 may perform the methods of optimizing a neural network model according to embodiments of the present invention that are described below with reference to FIGS. 17 and 21.

Referring to FIG. 3, a neural network model processing system 2000 includes a processor 2100, an input/output device 2200, a network interface 2300, a random access memory (RAM) 2400, a read-only memory (ROM) 2500, and a storage device 2600.

In one embodiment, the neural network model processing system 2000 may be a computing system. It may be a stationary computing system such as a desktop computer, a workstation, or a server, or a portable computing system such as a laptop computer.

The processor 2100 may be substantially the same as the processor 1100 of FIG. 2. For example, the processor 2100 may include a core capable of executing an arbitrary instruction set (for example, Intel Architecture-32 (IA-32), 64-bit extended IA-32, x86-64, PowerPC, SPARC, MIPS, ARM, or IA-64). For example, the processor 2100 may access memory, that is, the RAM 2400 or the ROM 2500, through a bus, and may execute instructions stored in the RAM 2400 or the ROM 2500. As shown in FIG. 3, the RAM 2400 may store all or part of a program PR for the method of optimizing a neural network model according to embodiments of the present invention, and the program PR may cause the processor 2100 to perform operations for optimizing a neural network model.

In other words, the program PR may include a plurality of instructions and/or procedures executable by the processor 2100, and these instructions and/or procedures may cause the processor 2100 to perform the operations of the method of optimizing a neural network model according to embodiments of the present invention. A procedure is a sequence of instructions for performing a specific task, and may also be called a function, a routine, a subroutine, or a subprogram. Each procedure may process data provided from outside or data generated by another procedure.

The storage device 2600 may be substantially the same as the storage device 1200 of FIG. 2. The storage device 2600 may store the program PR as well as the suitability determination algorithms SDA, the updating algorithms UA, and the quantization schemes QS. Before the program PR is executed by the processor 2100, all or part of the program PR may be loaded from the storage device 2600 into the RAM 2400. The storage device 2600 may also store a file written in a programming language, in which case all or part of the program PR generated from that file by a compiler or the like may be loaded into the RAM 2400.

The storage device 2600 may also store data to be processed by the processor 2100 or data that has been processed by the processor 2100. That is, according to the program PR, the processor 2100 may generate new data by processing data stored in the storage device 2600, and may store the generated data in the storage device 2600.

The input/output device 2200 may be substantially the same as the input/output device 1300 of FIG. 2. It may include input devices such as a keyboard, a mouse, and a touchscreen, and output devices such as a display device and a printer. For example, through the input/output device 2200, a user may trigger execution of the program PR by the processor 2100, may enter the model information MI and device information DI of FIG. 2 and/or the user input UI of FIG. 4, and may check the visualized output VOUT of FIG. 2 and/or the graphical representation GR of FIG. 4.

The network interface 2300 may provide access to a network outside the neural network model processing system 2000. For example, the network may include a number of computing systems and communication links, and the communication links may include wired links, optical links, wireless links, or any other type of link. The model information MI and device information DI of FIG. 2 and/or the user input UI of FIG. 4 may be provided to the neural network model processing system 2000 through the network interface 2300, and the visualized output VOUT of FIG. 2 and/or the graphical representation GR of FIG. 4 may be provided to another computing system through the network interface 2300.

Referring to FIG. 4, a neural network model optimization module 100, which is executed and/or controlled by the neural network model processing systems 1000 and 2000 of FIGS. 2 and 3, includes a graphical user interface control module 200 and an analysis module 300, and may further include an update module 400 and a quantization module 500. The neural network model optimization module 100 may provide a graphical user interface for optimizing a neural network model.

The term "module" as used below may refer to software, to hardware such as an FPGA or an ASIC, or to a combination of software and hardware. A "module" may be stored, in the form of software, in an addressable storage medium and may be configured to be executed by one or more processors. For example, a "module" may include components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. A "module" may also be divided into a plurality of "modules" that each perform a more specific function.

The analysis module 300 may analyze whether a neural network model is suitable for running on a target device, based on the suitability determination algorithms (SDA in FIGS. 2 and 3).

The analysis module 300 may include a pre-listed table (PT) 310 for the target device, a performance estimator (PE) 320, a pre-trained deep learning model (PM) 330 for the target device, a complexity determining unit (CD) 340, a capacity measuring unit (CM) 350, and a memory estimator (ME) 360. The specific analysis operations that use each of these components are described later.

The update module 400 may perform an update operation on the neural network model (for example, a setting change or a layer change) based on the update algorithms (UA in FIGS. 2 and 3). The update operation is described later with reference to FIG. 17 and related figures.

The quantization module 500 may perform a quantization operation on the neural network model based on the quantization schemes (QS in FIGS. 2 and 3). The quantization operation is described later with reference to FIG. 21 and related figures.

The graphical user interface control module 200 may control the graphical user interface so that optimization of the neural network model is performed. For example, the graphical user interface control module 200 may control the graphical user interface to receive the user input UI and to output the graphical representation GR. For example, the user input UI may include the model information MI and the device information DI of FIG. 2, and the graphical representation GR may correspond to the visualized output VOUT of FIG. 2.

In one embodiment, FIGS. 2 and 3 illustrate a case in which the neural network model optimization module 100 is implemented in software, but the present invention is not necessarily limited thereto. For example, some or all of the components included in the neural network model optimization module 100 may be implemented in hardware and included in a computer-based electronic system.

FIGS. 5A, 5B, 5C, and 6 are diagrams for describing a neural network model that is the target of a method of optimizing a neural network model according to embodiments of the present invention.

FIGS. 5A, 5B, and 5C show examples of the network structure of a neural network model, and FIG. 6 shows an example of a neural network system used to run the neural network model. For example, the neural network model may include at least one of an artificial neural network (ANN) model, a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, and a deep neural network (DNN) model.

Referring to FIG. 5A, the network structure of a general artificial neural network may include an input layer IL, a plurality of hidden layers HL1, HL2, ..., HLn, and an output layer OL.

The input layer IL may include i input nodes x₁, x₂, ..., x_i (where i is a natural number), and vector input data IDAT of length i may be input so that each element is applied to a corresponding input node.

The plurality of hidden layers HL1, HL2, ..., HLn include n hidden layers (where n is a natural number) and may include hidden nodes h¹₁, h¹₂, h¹₃, ..., h¹_m, h²₁, h²₂, h²₃, ..., h²_m, hⁿ₁, hⁿ₂, hⁿ₃, ..., hⁿ_m. For example, the hidden layer HL1 may include m hidden nodes h¹₁, h¹₂, h¹₃, ..., h¹_m (where m is a natural number), the hidden layer HL2 may include m hidden nodes h²₁, h²₂, h²₃, ..., h²_m, and the hidden layer HLn may include m hidden nodes hⁿ₁, hⁿ₂, hⁿ₃, ..., hⁿ_m.

The output layer OL may include j output nodes y₁, y₂, ..., y_j (where j is a natural number) corresponding to the classes to be classified, and may output a result (for example, a score or class score) for each class with respect to the input data IDAT. The output layer OL may be referred to as a fully connected layer and may indicate, for example, the probability that the input data IDAT corresponds to a car as a numerical value.

The network structure shown in FIG. 5A may include connections (branches) between nodes, drawn as straight lines between two nodes, and weights used on each connection, which are not shown. Nodes within one layer may not be connected to each other, and nodes included in different layers may be fully or partially connected.

Each node in FIG. 5A (for example, h¹₁) may receive the output of a preceding node (for example, x₁), perform a computation on it, and output the result to a following node (for example, h²₁). Each node may compute its output value by applying the input value to a specific function, for example a nonlinear function.
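The per-node computation described above can be sketched as follows. This is only an illustration: the weighted-sum form, the bias term, the choice of tanh, and all names and values are assumptions, not details taken from the disclosure.

```python
import math

def node_output(inputs, weights, bias=0.0, activation=math.tanh):
    """Weighted sum of the preceding nodes' outputs, passed through a nonlinear function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values: three input nodes x1..x3 feeding one hidden node.
x = [0.5, -1.0, 2.0]
w = [0.1, 0.2, 0.3]
print(node_output(x, w))                                      # tanh of the weighted sum
print(node_output(x, w, activation=lambda z: max(0.0, z)))    # same node with a ReLU-style function
```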

In general, the network structure of a neural network is determined in advance, and the weights on the connections between nodes are set to appropriate values using data for which the correct class is already known. Such data with known answers is called "training data", and the process of determining the weights is called "training". A bundle of a structure and weights that can be trained independently is called a "model", and the process in which a model with determined weights predicts which class the input data belongs to and outputs the predicted value is called "testing".

In the general neural network shown in FIG. 5A, each node (for example, h¹₁) is connected to all nodes of the previous layer (for example, nodes x₁, x₂, ..., x_i of the input layer IL). When the input data IDAT is an image (or audio), the number of required weights therefore increases rapidly as the image size increases, so this structure may not be well suited to handling images. For this reason, convolutional neural networks, which combine a neural network with filtering techniques so that the network can learn two-dimensional images well, are being studied.

Referring to FIG. 5B, the network structure of a convolutional neural network may include a plurality of layers CONV1, RELU1, CONV2, RELU2, POOL1, CONV3, RELU3, CONV4, RELU4, POOL2, CONV5, RELU5, CONV6, RELU6, POOL3, and FC.

Unlike a general neural network, each layer of a convolutional neural network may have three dimensions: width, height, and depth. Accordingly, the data input to each layer may also be volume data having the three dimensions of width, height, and depth. For example, when the input image in FIG. 5B has a width of 32 and a height of 32 and has three color channels (R, G, B), the input data IDAT corresponding to the input image may have a size of 32*32*3. The input data IDAT of FIG. 5B may be referred to as input volume data or an input activation volume.

The convolutional layers CONV1, CONV2, CONV3, CONV4, CONV5, and CONV6 may perform a convolution operation on their input. In image processing, convolution may refer to processing data using a mask that has weights: each input value is multiplied by the corresponding mask weight, and the sum of the products is taken as the output value. The mask may be called a filter, a window, or a kernel.

Specifically, the parameters of each convolutional layer may consist of a set of learnable filters. Each filter is smaller than the full size of the layer in the width and height dimensions but spans the full depth of the layer in the depth dimension. For example, each filter may be slid (more precisely, convolved) across the width and height of the input volume while a dot product is computed between the filter and the input elements, producing a two-dimensional activation map, and these activation maps may be stacked along the depth dimension to produce the output volume. For example, if the convolutional layer CONV1 applies four filters with zero-padding to the input volume data IDAT of size 32*32*3, the output volume of the convolutional layer CONV1 may have a size of 32*32*12 (that is, the depth increases).
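The sliding dot product for a single filter can be sketched as below. This is an illustrative pure-Python version that assumes stride 1, an odd square kernel, and zero-padding that preserves width and height; the function name and data layout are hypothetical and are not part of the disclosure.

```python
def conv_single_filter(volume, kernel):
    """Slide one filter over an H x W x C volume with zero-padding and stride 1.

    volume: H rows, each a list of W pixels, each a list of C channel values.
    kernel: k x k x C nested list (k odd); it spans the full input depth.
    Returns an H x W activation map (same width and height as the input).
    """
    H, W, C = len(volume), len(volume[0]), len(volume[0][0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < H and 0 <= xx < W:  # positions outside the input count as zero
                        for c in range(C):
                            acc += volume[yy][xx][c] * kernel[dy][dx][c]
            out[y][x] = acc
    return out

# A 4x4 single-channel input of ones and a 3x3 filter of ones.
image = [[[1.0] for _ in range(4)] for _ in range(4)]
filt = [[[1.0] for _ in range(3)] for _ in range(3)]
amap = conv_single_filter(image, filt)
print(len(amap), len(amap[0]))  # 4 4
```

Each filter yields one such two-dimensional map; stacking the maps of all filters along the depth dimension gives the layer's output volume.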

The RELU layers RELU1, RELU2, RELU3, RELU4, RELU5, and RELU6 may perform a rectified linear unit operation on their input. For example, the rectified linear unit operation may be a function such as max(0, x) that maps only negative values to 0. For example, if the RELU layer RELU1 performs the rectified linear unit operation on the 32*32*12 input volume provided by the convolutional layer CONV1, the output volume of the RELU layer RELU1 may have a size of 32*32*12 (that is, the volume is preserved).

The pooling layers POOL1, POOL2, and POOL3 may perform down-sampling along the width and height dimensions of the input volume. For example, when a 2*2 filter is applied, the four inputs in a 2*2 region may be converted into one output. Specifically, the maximum of the four inputs in the 2*2 region may be selected, as in 2*2 max pooling, or the average of the four inputs in the 2*2 region may be computed, as in 2*2 average pooling. For example, if the pooling layer POOL1 applies a 2*2 filter to an input volume of size 32*32*12, the output volume of the pooling layer POOL1 may have a size of 16*16*12 (that is, the width and height decrease, the depth is preserved, and the volume decreases).
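The shape behavior of the RELU and pooling layers described above can be checked with a small sketch. The function names and the nested-list layout are assumptions made for illustration, and the pooling uses a stride of 2, which the text implies but does not state.

```python
def relu(volume):
    """Apply max(0, x) element-wise; the volume shape is unchanged."""
    return [[[max(0.0, v) for v in px] for px in row] for row in volume]

def max_pool_2x2(volume):
    """2*2 max pooling with stride 2 over width/height; depth is preserved."""
    H, W, C = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [
            [max(volume[y + dy][x + dx][c] for dy in (0, 1) for dx in (0, 1)) for c in range(C)]
            for x in range(0, W - 1, 2)
        ]
        for y in range(0, H - 1, 2)
    ]

# A 32*32*12 volume with both negative and positive values.
vol = [[[float(x - y)] * 12 for x in range(32)] for y in range(32)]
act = relu(vol)
out = max_pool_2x2(act)
print(len(act), len(act[0]), len(act[0][0]))  # 32 32 12
print(len(out), len(out[0]), len(out[0][0]))  # 16 16 12
```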

In general, in a convolutional neural network, one convolutional layer (for example, CONV1) and one RELU layer (for example, RELU1) may form a pair, and such convolution/RELU pairs may be arranged repeatedly. By inserting pooling layers at intervals between the repeated convolution/RELU pairs, the features of the image can be extracted while the image is progressively reduced in size.

The output layer, or fully connected layer FC, may output a result for each class with respect to the input volume data IDAT. For example, as convolution and sub-sampling are performed repeatedly, the input volume data IDAT corresponding to a two-dimensional image may be converted into a one-dimensional matrix (or vector). For example, the fully connected layer FC may indicate, as numerical values, the probabilities that the input volume data IDAT corresponds to a car (CAR), a truck (TRUCK), an airplane (AIRPLANE), a ship (SHIP), and a horse (HORSE).

Referring to FIG. 5C, the network structure of a recurrent neural network may include a repeating structure that uses a specific node N, or cell, shown on the left side of FIG. 5C.

The structure shown on the right side of FIG. 5C represents the recurrent connection of the recurrent neural network on the left side in unfolded (UNFOLD) form. "Unfolding" a recurrent neural network means drawing the network for the entire sequence, including all of the nodes NA, NB, and NC. For example, if the sequence of interest is a sentence made up of three words, the recurrent neural network may be unfolded into a three-layer neural network structure (with no recurrent connections, that is, no cycles), with one layer per word.

In the recurrent neural network, X denotes the input value of the recurrent neural network. For example, X_t is the input value at time step t, and X_{t-1} and X_{t+1} are the input values at time steps t-1 and t+1, respectively.

In the recurrent neural network, S denotes the hidden state. For example, S_t is the hidden state at time step t, and S_{t-1} and S_{t+1} are the hidden states at time steps t-1 and t+1, respectively. The hidden state may be computed from the hidden state of the previous time step and the input value of the current time step. For example, S_t = f(U·X_t + W·S_{t-1}), where tanh or ReLU may be used as the nonlinear function f, and S_{-1}, which is needed to compute the first hidden state, may typically be initialized to 0.

In the recurrent neural network, O denotes the output value. For example, O_t is the output value at time step t, and O_{t-1} and O_{t+1} are the output values at time steps t-1 and t+1, respectively. For example, to predict the next word in a sentence, the output would be a probability vector whose dimension equals the number of words in the vocabulary. For example, O_t = softmax(V·S_t).

In the recurrent neural network, the hidden state can be regarded as the "memory" of the network. In other words, the recurrent neural network holds "memory" information about the results computed so far. S_t carries information about what happened at past time steps, and the output value O_t depends only on the memory at the current time step t. In addition, unlike conventional neural network structures in which every layer has its own parameter values, the recurrent neural network shares the same parameter values (U, V, and W in FIG. 5C) across all time steps. This means that the recurrent neural network performs almost the same computation at every step, with only the input value differing, which reduces the number of parameters that must be learned.
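The recurrence S_t = f(U·X_t + W·S_{t-1}) and O_t = softmax(V·S_t) can be sketched as follows, with tanh chosen as f and the same U, W, and V reused at every time step. The helper names, matrix sizes, and numeric values are hypothetical and serve only as an illustration.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    """Numerically stable softmax."""
    top = max(z)
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]

def rnn_forward(xs, U, W, V):
    """Run the recurrence over a sequence, sharing U, W, V across all steps."""
    s = [0.0] * len(W)  # S_{-1} initialized to 0
    outputs = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(U, x), matvec(W, s))]
        s = [math.tanh(p) for p in pre]          # S_t = tanh(U X_t + W S_{t-1})
        outputs.append(softmax(matvec(V, s)))    # O_t = softmax(V S_t)
    return outputs

# Hypothetical sizes: 3-word vocabulary (one-hot inputs), 2 hidden units.
U = [[0.5, -0.3, 0.1], [0.2, 0.4, -0.6]]
W = [[0.1, 0.0], [0.0, 0.1]]
V = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
sentence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for o in rnn_forward(sentence, U, W, V):
    print(o)  # one probability vector over the vocabulary per time step
```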

Referring to FIG. 6, the neural network system 600 may include a plurality of heterogeneous resources for running a neural network model, and a resource manager 601 that manages and/or controls the plurality of heterogeneous resources.

The plurality of heterogeneous resources include a central processing unit (CPU) 610, a neural processing unit (NPU) 620, a graphics processing unit (GPU) 630, a digital signal processor (DSP) 640, and an image signal processor (ISP) 650, and may further include dedicated hardware (DHW) 660 for specific tasks, a memory (MEM) 670, a direct memory access (DMA) unit 680, and a connectivity unit 690. The CPU 610, the NPU 620, the GPU 630, the DSP 640, the ISP 650, and the dedicated hardware 660 may be referred to as processors, processing units, or computing resources, and the DMA unit 680 and the connectivity unit 690 may be referred to as communication resources.

The CPU 610, the NPU 620, the GPU 630, the DSP 640, the ISP 650, and the dedicated hardware 660 perform various functions, such as specific computations or tasks, and may be used to execute the neural network model. For example, the dedicated hardware 660 may include a vision processing unit (VPU), a vision intellectual property (VIP), and the like. The memory 670 stores data processed by the plurality of heterogeneous resources and may store data related to the neural network model. The DMA unit 680 may control access to the memory 670. For example, the DMA unit 680 may include a memory DMA (MDMA), a peripheral DMA (PDMA), a remote DMA (RDMA), a smart DMA (SDMA), and the like. The connectivity unit 690 may perform wired and/or wireless communication. For example, the connectivity unit 690 may support internal communication such as a system bus, peripheral component interconnect (PCI), or PCI express (PCIe), and/or external communication such as universal serial bus (USB), Ethernet, WiFi, Bluetooth, near field communication (NFC), radio frequency identification (RFID), or mobile telecommunication.

Although not shown, the computing resources may further include a microprocessor, an application processor (AP), customized hardware, compression hardware, and the like. The communication resources may further include memory-copy-capable resources.

In one embodiment, the neural network system 600 may be included in any computing device and/or mobile device.

In one embodiment, various services and/or applications, such as a computer vision service (for example, image classification, image detection, image segmentation, or image tracking), a user authentication service based on biometric information, an advanced driver assistance system (ADAS) service, a voice assistant service, and an automatic speech recognition (ASR) service, may be executed and processed by the neural network models described above with reference to FIGS. 5A, 5B, and 5C and the neural network system 600 described above with reference to FIG. 6.

FIG. 7 is a flowchart illustrating an example of the step of performing the analysis in FIG. 1.

Referring to FIGS. 1 and 7, in performing the analysis of whether the first neural network model is suitable for running on the first target device (step S100), the plurality of suitability determination algorithms used for the analysis include a first algorithm that determines the performance efficiency of the structure and layers of the first neural network model with respect to the first target device, and a first analysis may be performed on the first neural network model based on the first algorithm (step S310). For example, step S310 may be performed by the analysis module 300.

As described above with reference to FIGS. 5A, 5B, and 5C, the first neural network model includes a plurality of layers with various characteristics and may have a structure (or network structure) formed by groups of several layers. The structure and layers of the first neural network model may include structures and configurations that are not suitable for operation on the first target device. In step S310, it may be determined whether the structure and layers of the first neural network model are efficient for the first target device, and the result may be converted into scores and presented visually in step S400.

FIG. 8 is a flowchart illustrating an example of the step of performing the first analysis in FIG. 7.

Referring to FIGS. 7 and 8, in performing the first analysis on the first neural network model based on the first algorithm (step S310), first scores of the structure and layers of the first neural network model may be obtained using a pre-listed table for the first target device (for example, 310 in FIG. 4) (step S312).

Specifically, whether the structure and layers of the first neural network model are efficient for the first target device may be analyzed based on the pre-listed table 310 (step S312a), and the first scores may be obtained based on the result (step S312b). For example, the pre-listed table 310 used in step S312a may be a table or list in which structures and layers that are efficient or inefficient for inference on the first target device are defined in advance. For example, the pre-listed table 310 may be received as part of the model information (MI in FIG. 2). For example, in step S312b, scoring may be performed in order starting from the least efficient, with a higher score assigned for higher efficiency and a lower score assigned for lower efficiency.
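One way to picture steps S312a and S312b is a simple lookup against a per-device table, sketched below. The table contents, the 0-to-1 score range, the neutral default for unlisted entries, and all names are assumptions for illustration; the disclosure does not specify the table format.

```python
# Hypothetical pre-listed table for one target device: layer type -> efficiency score in [0, 1].
PRE_LISTED_TABLE = {
    "conv3x3": 1.0,
    "relu": 1.0,
    "maxpool2x2": 0.9,
    "conv7x7": 0.4,
    "lstm": 0.2,
}
DEFAULT_SCORE = 0.5  # assumption: layer types missing from the table get a neutral score

def first_scores(layers, table=PRE_LISTED_TABLE):
    """Score each (name, type) layer from the table and list the least efficient first."""
    scored = [(name, kind, table.get(kind, DEFAULT_SCORE)) for name, kind in layers]
    return sorted(scored, key=lambda item: item[2])

model = [("CONV1", "conv3x3"), ("RELU1", "relu"), ("RNN1", "lstm"), ("X1", "custom_op")]
for name, kind, score in first_scores(model):
    print(name, kind, score)
```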

In addition, second scores of the structure and layers of the first neural network model may be obtained by predicting the processing time of the structure and layers of the first neural network model using a performance estimator (for example, 320 in FIG. 4) (step S314).

Specifically, the performance of the structure and layers of the first neural network model may be analyzed using the performance estimator 320 (step S314a), and the second scores may be obtained based on the result (step S314b). For example, the performance estimator 320 used in step S314a is a tool that predicts the processing time of a neural network model and may be implemented in software or hardware. For example, in step S314b, scoring may be performed so that the structures and layers that degrade performance are indicated, with a higher score assigned for higher performance and a lower score assigned for lower performance.

Additionally, third scores of the structure and layers of the first neural network model may be obtained using a deep learning model pre-trained for the first target device (for example, 330 in FIG. 4) (step S316).

Specifically, the pre-trained deep learning model 330 used in step S316 may be a model trained using different components depending on the first target device. For example, the pre-trained deep learning model 330 may be received as part of the model information MI. For example, in step S316, scoring may be performed based on the determination output of the pre-trained deep learning model 330.

In other words, in step S312, model structures and layer configurations that are efficient or inefficient for inference on the first target device are defined in advance, inefficient layer configurations are found by means of the pre-listed table or list, and a predefined alternative configuration may be suggested. In step S314, each component is simulated using a tool that predicts processing time, and the performance of each component may be predicted and scored. In step S316, the performance obtained by running many models of various configurations on the first target device is recorded in advance and used to train a deep learning model, and the performance and suitability of each component of the first neural network model may then be measured using the trained deep learning model.

Although FIG. 8 shows steps S312, S314, and S316 as being performed substantially simultaneously, the present invention is not limited thereto, and steps S312, S314, and S316 may be performed sequentially depending on the embodiment.

Performance scores of the structure and layers of the first neural network model may be obtained based on the first scores, the second scores, and the third scores (step S318). For example, the performance scores may be obtained using a weighted summation scheme in which different weights are assigned to the first, second, and third scores and the weighted scores are summed. For example, the weights may be set differently for each target device, and the first, second, and third weights assigned to the first, second, and third scores may be included in the model information MI and received.
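The weighted summation of step S318 can be illustrated with a short Python sketch. The function names, the sample scores, and the weight values below are assumptions chosen for the example; the description does not specify concrete values.

```python
def weighted_sum(scores, weights):
    """Sum the scores after multiplying each one by its weight."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


def performance_scores(first, second, third, weights):
    """Combine the per-layer first, second, and third scores into performance scores."""
    return [weighted_sum(triple, weights) for triple in zip(first, second, third)]


# Illustrative per-layer scores for a two-layer model.
first = [1.0, 0.0]    # table-based scores (step S312)
second = [0.5, 0.5]   # simulation-based scores (step S314)
third = [1.0, 0.0]    # scores from the pre-trained model (step S316)
weights = (0.5, 0.25, 0.25)  # assumed device-specific weights

perf = performance_scores(first, second, third, weights)
```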

In an embodiment, the first scores, the second scores, the third scores, and the performance scores may be obtained for each of the structure and layers of the first neural network model.

FIG. 9 is a flowchart illustrating another example of performing the analysis of FIG. 1.

Referring to FIGS. 1 and 9, in performing the analysis of whether the first neural network model is suitable for running on the first target device (step S300), the plurality of suitability determination algorithms used for the analysis may include a second algorithm that analyzes the complexity and capacity of the structure and layers of the first neural network model, and a second analysis of the first neural network model may be performed based on the second algorithm (step S320). For example, step S320 may be performed by the analysis module 300.

In step S320, optimization points may be identified and guidance may be provided by analyzing the complexity and capacity of the structure and layers of the first neural network model, and the result may be scored and displayed visually in step S400.

FIG. 10 is a flowchart illustrating an example of performing the second analysis of FIG. 9.

Referring to FIGS. 9 and 10, in performing the second analysis of the first neural network model based on the second algorithm (step S320), fourth scores of the structure and layers of the first neural network model may be obtained by determining the complexity of the structure and layers of the first neural network model (step S322).

Specifically, the complexity of the structure and layers of the first neural network model may be analyzed using a complexity determination unit (e.g., 340 in FIG. 4) (step S322a), and the fourth scores may be obtained based on the result (step S322b). For example, the complexity determination unit 340 used in step S322a is a tool for determining the complexity of a neural network model, and may be implemented in software or hardware. For example, in step S322b, scoring may be performed based on a complexity threshold for the first target device, such that a lower score is assigned as the complexity increases and a higher score is assigned as the complexity decreases.
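The threshold-based scoring of step S322b might look like the following Python sketch. The linear mapping and the function name `complexity_to_score` are assumptions; the description only requires that higher complexity yields a lower score relative to a device-specific threshold.

```python
def complexity_to_score(complexity, threshold):
    """Map a complexity value to a score in [0, 1].

    The score is 1.0 at zero complexity, falls linearly, and is 0.0 once the
    complexity reaches or exceeds the threshold of the target device.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return max(0.0, 1.0 - complexity / threshold)


# Illustrative complexity values for three layers and an assumed threshold.
threshold = 100
fourth_scores = [complexity_to_score(c, threshold) for c in (0, 50, 200)]
```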

In an embodiment, the criteria by which the complexity determination unit 340 determines complexity may include the numbers of parameters, units, and layers included in the neural network model. In an embodiment, the method by which the complexity determination unit 340 determines complexity may include the complexity evaluation function disclosed in the paper "On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures" by Monica Bianchini and Franco Scarselli. However, the present invention is not limited thereto, and complexity may be determined using various other criteria and methods.

In addition, fifth scores of the structure and layers of the first neural network model may be obtained by measuring the capacity of the structure and layers of the first neural network model (step S324).

Specifically, the capacity of the structure and layers of the first neural network model may be analyzed using a capacity measurement unit (e.g., 350 in FIG. 4) (step S324a), and the fifth scores may be obtained based on the result (step S324b). For example, the capacity measurement unit 350 used in step S324a is a tool for measuring the capacity of a neural network model, and may be implemented in software or hardware. For example, in step S324b, scoring may be performed according to the opposite of capacity requirements, such that a higher score is assigned as the capacity increases and a lower score is assigned as the capacity decreases.

In an embodiment, the method by which the capacity measurement unit 350 measures capacity may include the algorithm disclosed in the paper "Deep Neural Network Capacity" by Aosen Wang et al. However, the present invention is not limited thereto, and capacity may be measured using various other methods.

In other words, in step S322, the degree of overhead incurred when the first neural network model runs on the first target device is measured using an algorithm capable of determining the complexity of the first neural network model, and the overhead of the first neural network model may be predicted by measuring the performance of the device according to the complexity of the model. In step S324, the capacity of the first neural network model is measured and optimization points based on the capacity are presented. The more sufficient the capacity, the easier the model may be to optimize.

Although FIG. 10 illustrates steps S322 and S324 as being performed substantially simultaneously, the present invention is not limited thereto, and steps S322 and S324 may be performed sequentially according to embodiments.

Complexity scores of the structure and layers of the first neural network model may be obtained based on the fourth scores and the fifth scores (step S326). For example, the complexity scores may be obtained using a weighted summation scheme in which different weights are assigned to the fourth and fifth scores and the weighted scores are summed. For example, the weights may be set differently for each target device, and the fourth and fifth weights assigned to the fourth and fifth scores may be included in the model information MI and received.

In an embodiment, the fourth scores, the fifth scores, and the complexity scores may be obtained for each of the structure and layers of the first neural network model.

FIG. 11 is a flowchart illustrating still another example of performing the analysis of FIG. 1.

Referring to FIGS. 1 and 11, in performing the analysis of whether the first neural network model is suitable for running on the first target device (step S300), the plurality of suitability determination algorithms used for the analysis may include a third algorithm that determines the memory efficiency of the structure and layers of the first neural network model with respect to the first target device, and a third analysis of the first neural network model may be performed based on the third algorithm (step S330). For example, step S330 may be performed by the analysis module 300.

In step S330, the memory footprint of the structure and layers of the first neural network model may be analyzed to identify optimization points according to memory utilization and to provide guidance on them, and the result may be scored and displayed visually in step S400.

FIGS. 12 and 13 are flowcharts illustrating examples of performing the third analysis of FIG. 11.

Referring to FIGS. 11 and 12, in performing the third analysis of the first neural network model based on the third algorithm (step S330), a memory limitation of the first target device may be loaded (step S332), and memory footprint scores of the structure and layers of the first neural network model may be obtained based on the memory limitation (step S334).

Specifically, the first target device has memory limitations, such as SRAM and DRAM limits, that are determined by its characteristics, and performance may vary greatly depending on the resulting memory read/save points. By computing in advance, using a memory estimator (e.g., 360 in FIG. 4), the memory usage, bottleneck points, memory sharing, and the like that occur in each operation according to the structure and configuration of the first neural network model, an optimized model may be designed based on the expected performance. For example, the memory estimator 360 used in step S334 is a tool for analyzing the memory footprint of a neural network model, and may be implemented in software or hardware.

In an embodiment, the memory footprint scores may be obtained for each of the structure and layers of the first neural network model.

Referring to FIGS. 11 and 13, in performing the third analysis of the first neural network model based on the third algorithm (step S330), steps S332 and S334 may be substantially the same as steps S332 and S334 of FIG. 12, respectively.

When the first neural network model cannot be used within the memory limitation (step S512: NO), the first neural network model may be modified or updated (step S514). For example, the first neural network model may be changed according to memory usage, bottleneck points, memory sharing, and the like. Steps S512 and S514 may correspond to step S500 of FIG. 17.

When the first neural network model can be used within the memory limitation (step S512: YES), the process may end without changing the first neural network model.
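The flow of steps S332, S334, and S512 can be sketched in Python as below. The footprint model (input activations plus output activations plus parameters, 4 bytes per value), the 64 MiB limit, the LAYER2 shapes, and all function names are assumptions made for illustration. The LAYER3 shapes reuse the input and output sizes given in the description for the layer LAYER3.

```python
from math import prod

BYTES_PER_ELEMENT = 4  # assumption: 32-bit values


def tensor_bytes(shape):
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * BYTES_PER_ELEMENT


def layer_usage(input_shape, output_shape, num_params):
    """Simplified footprint: input, output, and parameters resident at once."""
    return (tensor_bytes(input_shape) + tensor_bytes(output_shape)
            + num_params * BYTES_PER_ELEMENT)


def memory_footprint_score(usage, limit_bytes):
    """Score in [0, 1]; 0.0 when the usage reaches or exceeds the limit."""
    return max(0.0, 1.0 - usage / limit_bytes)


def fits_within_limit(layers, limit_bytes):
    """Step S512 analogue: True only if every layer fits in the limit."""
    return all(layer_usage(*spec) <= limit_bytes for spec in layers.values())


layers = {
    "LAYER2": ((1, 3, 64, 64), (1, 8, 64, 64), 216),
    "LAYER3": ((1, 64, 512, 512), (1, 137, 85, 85), 0),
}
limit = 64 * 1024 * 1024  # step S332 analogue: assumed 64 MiB limit

scores = {name: memory_footprint_score(layer_usage(*spec), limit)
          for name, spec in layers.items()}
needs_change = not fits_within_limit(layers, limit)  # True leads to step S514
```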

FIG. 14 is a flowchart illustrating still another example of performing the analysis of FIG. 1.

Referring to FIGS. 1 and 14, in performing the analysis of whether the first neural network model is suitable for running on the first target device (step S300), step S310 may be substantially the same as step S310 described above with reference to FIGS. 7 and 8, step S320 may be substantially the same as step S320 described above with reference to FIGS. 9 and 10, and step S330 may be substantially the same as step S330 described above with reference to FIGS. 11, 12, and 13.

Comprehensive scores for the first neural network model may be obtained based on the performance scores obtained in step S310, the complexity scores obtained in step S320, and the memory footprint scores obtained in step S330 (step S340). For example, the comprehensive scores may be obtained using a weighted summation scheme in which different weights are assigned to the performance scores, the complexity scores, and the memory footprint scores and the weighted scores are summed. For example, the weights may be set differently for each target device, and the weights assigned to the performance scores, the complexity scores, and the memory footprint scores may be included in the model information MI and received.
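Because the weights may differ for each target device, the same category scores can produce different comprehensive scores on different devices. The Python sketch below illustrates this for step S340. The device names, weight values, and sample scores are all assumptions.

```python
# Assumed per-device weights for the three score categories.
DEVICE_WEIGHTS = {
    "device_a": {"performance": 0.5, "complexity": 0.25, "memory": 0.25},
    "device_b": {"performance": 0.25, "complexity": 0.25, "memory": 0.5},
}


def comprehensive_scores(category_scores, device):
    """Weighted sum of the category scores of each layer for one device."""
    weights = DEVICE_WEIGHTS[device]
    return {
        layer: sum(weights[c] * scores[c] for c in weights)
        for layer, scores in category_scores.items()
    }


# Illustrative per-layer scores from steps S310, S320, and S330.
category_scores = {
    "LAYER1": {"performance": 1.0, "complexity": 1.0, "memory": 1.0},
    "LAYER3": {"performance": 1.0, "complexity": 0.5, "memory": 0.0},
}

totals_a = comprehensive_scores(category_scores, "device_a")
totals_b = comprehensive_scores(category_scores, "device_b")
```

In this example LAYER3 scores lower on the device that weights memory more heavily, since its memory footprint score is zero.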

FIG. 15 is a flowchart illustrating a specific example of the method of optimizing a neural network model of FIG. 1. Descriptions that overlap with FIG. 1 are omitted below.

Referring to FIG. 15, in the method of optimizing a neural network model according to embodiments of the present invention, a graphical user interface for optimizing the first neural network model is provided (step S1100). A specific implementation of the graphical user interface will be described in detail later with reference to FIG. 16 and other drawings.

The first model information on the first neural network model is received through the graphical user interface (step S100a), the device information on the first target device is received through the graphical user interface (step S200a), the analysis of whether the first neural network model is suitable for running on the first target device is performed (step S300), and the first model information and the result of the analysis are visualized so as to be displayed on one screen and are displayed on the graphical user interface (step S400a). Steps S100a, S200a, and S400a are similar to steps S100, S200, and S400 of FIG. 1, respectively, and step S300 may be substantially the same as step S300 of FIG. 1. For example, steps S300 and S400a may be performed by the analysis module 300 and the graphical user interface control module 200.

FIGS. 16A, 16B, 16C, 16D, 16E, and 16F are diagrams for describing the operation of FIG. 15.

Referring to FIGS. 15 and 16A, in step S400a, a graphical representation GR11 showing the structure and layers of the first neural network model may be displayed on the graphical user interface at the beginning of operation. For example, the graphical representation GR11 may show the network structure of a plurality of layers LAYER1, LAYER2, LAYER3, LAYER4, LAYER5, and LAYER6 located between the input and the output of the first neural network model. The graphical representation GR11 may include a layer box (e.g., a rectangle) corresponding to each layer and arrows indicating the connections between the layers.

Referring to FIGS. 15, 16B, 16C, 16D, 16E, and 16F, in step S400a, graphical representations GR12, GR13, GR14, GR15, and GR16 showing the structure and layers of the first neural network model together with the result of the analysis may be displayed on the graphical user interface. For example, the result of the analysis may be displayed by selecting one of the buttons 112, 114, 116, and 118 in the menu 110 included in the graphical representations GR12, GR13, GR14, GR15, and GR16.

FIGS. 16B, 16C, 16D, and 16E illustrate examples in which the result of the analysis is displayed as scores. In the example of FIG. 16B, the button 114 is selected, and a graphical representation GR12 including the plurality of layers LAYER1 to LAYER6 and a plurality of performance scores SVP1, SVP2, SVP3, SVP4, SVP5, and SVP6, which are the result of the first analysis obtained in step S310, may be displayed on the graphical user interface. In the example of FIG. 16C, the button 116 is selected, and a graphical representation GR13 including the plurality of layers LAYER1 to LAYER6 and a plurality of complexity scores SVC1, SVC2, SVC3, SVC4, SVC5, and SVC6, which are the result of the second analysis obtained in step S320, may be displayed on the graphical user interface. In the example of FIG. 16D, the button 118 is selected, and a graphical representation GR14 including the plurality of layers LAYER1 to LAYER6 and a plurality of memory footprint scores SVM1, SVM2, SVM3, SVM4, SVM5, and SVM6, which are the result of the third analysis obtained in step S330, may be displayed on the graphical user interface. In the example of FIG. 16E, the button 112 is selected, and a graphical representation GR15 including the plurality of layers LAYER1 to LAYER6 and a plurality of comprehensive scores SVT1, SVT2, SVT3, SVT4, SVT5, and SVT6 obtained in step S340 may be displayed on the graphical user interface.

In an embodiment, the graphical representations GR12, GR13, GR14, and GR15 of FIGS. 16B, 16C, 16D, and 16E may be switched from one to another.

FIG. 16F illustrates an example in which the result of the analysis is displayed in color. As in the example of FIG. 16E, in the example of FIG. 16F the button 112 is selected, and a graphical representation GR16 including the plurality of layers LAYER1 to LAYER6, with color applied to some of the layer boxes, may be displayed on the graphical user interface. For convenience of illustration, colors are represented by hatching in FIG. 16F, and narrower hatching intervals represent darker colors. For example, the colored layers LAYER2 to LAYER4 are layers with relatively low comprehensive scores, and a darker color indicates a lower comprehensive score, so the comprehensive score SVT3 corresponding to the layer LAYER3 may be the lowest. Although not illustrated, when the buttons 112, 114, and 116 are selected, the result of the analysis may likewise be displayed in color in a manner similar to the example of FIG. 16F.
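One way to derive such a color display from the scores is sketched below in Python. The cutoff value, the grayscale encoding, the sample scores, and the function name `score_to_gray` are assumptions for illustration. The description only states that low-scoring layers are colored and that a darker color indicates a lower score.

```python
def score_to_gray(score, cutoff=0.5):
    """Return a grayscale hex color for a low score, or None if not colored.

    Layers at or above the cutoff are left uncolored. Below the cutoff, a
    lower score gives a darker gray, with black for a score of zero.
    """
    if score >= cutoff:
        return None
    level = round(255 * max(score, 0.0) / cutoff)
    return f"#{level:02x}{level:02x}{level:02x}"


# Illustrative comprehensive scores for six layers.
totals = {"LAYER1": 0.9, "LAYER2": 0.4, "LAYER3": 0.0,
          "LAYER4": 0.3, "LAYER5": 0.8, "LAYER6": 0.7}
colors = {name: score_to_gray(score) for name, score in totals.items()}
```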

However, the present invention is not limited thereto, and the graphical representations may also be implemented using, for example, different shapes.

In an embodiment, one of the buttons 112, 114, 116, and 118 may be selected by receiving a user input through a mouse, a touchscreen, or the like included in the input device 1310 of the neural network model processing system 1000.

FIG. 17 is a flowchart illustrating a method of optimizing a neural network model according to embodiments of the present invention. Descriptions that overlap with FIG. 1 are omitted below.

Referring to FIG. 17, in the method of optimizing a neural network model according to embodiments of the present invention, steps S100, S200, S300, and S400 may be substantially the same as steps S100, S200, S300, and S400 of FIG. 1, respectively.

At least one of the layers of the first neural network model is changed based on the result of the analysis (step S500). For example, similarly to step S400, in step S500 the result of the model change may be visualized and output, and the step may be performed using the graphical user interface. For example, step S500 may be performed by the update module 400.

FIG. 18 is a flowchart illustrating an example of changing at least one of the layers of the first neural network model in FIG. 17.

Referring to FIGS. 17 and 18, in changing at least one of the layers of the first neural network model based on the result of the analysis (step S500), a first layer having the lowest score among the layers of the first neural network model may be selected (step S522), at least one second layer that can replace the first layer and has a higher score than the first layer may be recommended (step S524), and the first layer may be changed based on the at least one second layer (step S526). For example, steps S522 and S526 may be performed based on a user input UI. For example, the first layer may be changed to the second layer.
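Steps S522, S524, and S526 can be sketched in Python as follows. The score values, the candidate list, the hypothetical candidate `LAYER3X`, and the function names are assumptions for illustration. In the description, the selection and the change may be driven by user input rather than made automatically.

```python
def select_lowest(scores):
    """Step S522 analogue: name of the layer with the lowest score."""
    return min(scores, key=scores.get)


def recommend(target_score, candidates):
    """Step S524 analogue: candidates scoring above the target, best first."""
    better = [c for c in candidates if c["score"] > target_score]
    return sorted(better, key=lambda c: c["score"], reverse=True)


def replace_layer(layers, old, new_layers):
    """Step S526 analogue: replace one layer with one or more new layers."""
    result = []
    for name in layers:
        result.extend(new_layers if name == old else [name])
    return result


layers = ["LAYER1", "LAYER2", "LAYER3", "LAYER4", "LAYER5", "LAYER6"]
scores = {"LAYER1": 0.9, "LAYER2": 0.6, "LAYER3": 0.2,
          "LAYER4": 0.5, "LAYER5": 0.8, "LAYER6": 0.9}
candidates = [
    {"layers": ["LAYER31"], "score": 0.7},
    {"layers": ["LAYER32", "LAYER33"], "score": 0.8},
    {"layers": ["LAYER3X"], "score": 0.1},  # hypothetical, filtered out
]

worst = select_lowest(scores)
options = recommend(scores[worst], candidates)
# Simulated user choice of the single-layer recommendation.
chosen = next(o for o in options if o["layers"] == ["LAYER31"])
updated = replace_layer(layers, worst, chosen["layers"])
```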

FIG. 19 is a flowchart illustrating a specific example of the method of optimizing a neural network model of FIG. 17. Descriptions that overlap with FIGS. 15 and 17 are omitted below.

Referring to FIG. 19, in the method of optimizing a neural network model according to embodiments of the present invention, steps S1100, S100a, S200a, S300, and S400a may be substantially the same as steps S1100, S100a, S200a, S300, and S400a of FIG. 15, respectively.

The first model information and the process and result of the model change are visualized so as to be displayed on one screen and are displayed on the graphical user interface (step S500a). Step S500a may be similar to step S500 of FIG. 17. For example, step S500a may be performed by the update module 400 and the graphical user interface control module 200.

FIGS. 20A, 20B, 20C, and 20D are diagrams for describing the operation of FIG. 19. Descriptions that overlap with FIGS. 16A, 16B, 16C, 16D, 16E, and 16F are omitted below.

Referring to FIGS. 16E, 16F, 19, and 20A, in step S500a, the layer LAYER3 having the lowest comprehensive score SVT3 among the plurality of layers LAYER1 to LAYER6 is selected, and accordingly a graphical representation GR21 including information on the layer LAYER3 may be displayed on the graphical user interface. For example, the size of the input data of the layer LAYER3 is (1,64,512,512) and the size of its output data is (1,137,85,85), and this information may be displayed in a menu 120.

Referring to FIGS. 19 and 20B, in step S500a, a graphical representation GR22 including information on recommended layers LAYER31, LAYER32, and LAYER33 that can replace the layer LAYER3 may be displayed on the graphical user interface. For example, the first recommended layer LAYER31 consists of a single layer and may be displayed in a menu 122. For example, the second recommended layers LAYER32 and LAYER33 consist of two layers and may be displayed in the menu 122. For example, changing the layer LAYER3 to the second recommended layers LAYER32 and LAYER33 may improve performance more, whereas changing the layer LAYER3 to the first recommended layer LAYER31 may keep the changed model more similar to the model before the change.

Referring to FIGS. 19 and 20C, in step S500a, the first recommended layer LAYER31 is selected in order to change the layer LAYER3 to the first recommended layer LAYER31, and a corresponding graphical representation GR23 may be displayed on the graphical user interface.

Referring to FIGS. 19 and 20D, in step S500a, after the layer LAYER3 is changed to the first recommended layer LAYER31, a graphical representation GR24 including the plurality of layers LAYER1, LAYER2, LAYER31, LAYER4, LAYER5, and LAYER6 of the changed model and a plurality of comprehensive scores SVT1, SVT2, SVT31, SVT4, SVT5, and SVT6 may be displayed on the graphical user interface. For example, the comprehensive score SVT31 of the changed layer LAYER31 may be higher than the comprehensive score SVT3 of the layer LAYER3 before the change.

In an embodiment, a layer and its corresponding layer box may be selected in FIGS. 20A and 20C by receiving a user input through a mouse, a touchscreen, or the like included in the input device 1310 of the neural network model processing system 1000.

As described above, a model may be modified using a visual interface based on the suitability determination algorithms, and a model optimized for the target device may be designed by repeating this process. Suggestions ranging from simple modifications to new alternative structures may be presented, and both an automatic optimization function and a conditional optimization function that satisfies conditions input by the user may be included.

FIG. 21 is a flowchart illustrating a method of optimizing a neural network model according to embodiments of the present invention. Descriptions that overlap with FIG. 1 are omitted below.

Referring to FIG. 21, in the method of optimizing a neural network model according to embodiments of the present invention, steps S100, S200, S300, and S400 may be substantially the same as steps S100, S200, S300, and S400 of FIG. 1, respectively.

Different quantization schemes are applied to at least some of the layers of the first neural network model (step S600). For example, similarly to step S400, the result of changing the quantization scheme may be visualized and output in step S600, and step S600 may be performed using the graphical user interface. For example, step S600 may be performed by the quantization module 500.

FIG. 22 is a flowchart illustrating an example of the step of applying different quantization schemes to at least some of the layers of the first neural network model in FIG. 21.

Referring to FIGS. 21 and 22, in applying different quantization schemes to at least some of the layers of the first neural network model (step S600), second model information, for which training of the first neural network model has been completed (i.e., pre-trained), may be received (step S610), a third layer whose quantization scheme is to be changed may be selected from among the layers of the first neural network model based on the second model information (step S620), and the quantization scheme of the selected third layer may be changed (step S630). For example, steps S620 and S630 may be performed based on a user input (UI).
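By way of a non-limiting illustration, steps S610 to S630 may be sketched as a small data-structure update. The `LayerInfo` record, the `change_scheme` helper, and the string labels "QS1" and "QS2" are hypothetical names introduced only for this sketch; the description does not define how the second model information is stored.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class LayerInfo:
    """Hypothetical per-layer record inside the model information."""
    name: str
    scheme: str  # illustrative label for the quantization scheme, e.g. "QS1"


def change_scheme(
    layers: Tuple[LayerInfo, ...], selected: str, new_scheme: str
) -> Tuple[LayerInfo, ...]:
    """S620/S630: select one layer by name and change only its scheme."""
    if selected not in {layer.name for layer in layers}:
        raise KeyError(f"unknown layer: {selected}")
    return tuple(
        replace(layer, scheme=new_scheme) if layer.name == selected else layer
        for layer in layers
    )


# S610: model information of the trained model (contents are assumed).
second_model_info = tuple(
    LayerInfo(name, "QS1") for name in ("LAYER1", "LAYER2", "LAYER31", "LAYER4")
)

# S620 and S630: the selection would come from a user input in the description.
updated = change_scheme(second_model_info, "LAYER31", "QS2")
```

Returning a new tuple instead of mutating the input keeps the earlier model information available, which may be convenient when results before and after the change are shown side by side.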

Unlike steps S100 to S400, step S600 may be performed after training of the first neural network model has been completed. For example, the second model information may be obtained by changing at least a part of the first model information. For example, although not illustrated in detail, step S500 of FIG. 17 may be performed between steps S400 and S600 of FIG. 21 to obtain the second model information.

Quantization is a type of compression operation for a neural network model. Compression of a neural network model refers to a process of reducing the size and the amount of computation of a trained neural network model while maintaining its performance and/or accuracy as much as possible. Quantization refers to a technique of reducing the actual storage size of a neural network model by reducing weights, which are generally represented as floating-point values, to a specific number of bits.
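As a minimal sketch of this idea, the following example reduces floating-point weights to 8-bit integer codes using uniform affine quantization. The choice of affine quantization, the function names `quantize` and `dequantize`, and the sample weights are assumptions made for illustration; the description does not prescribe a particular quantization formula.

```python
from typing import List, Tuple


def quantize(weights: List[float], num_bits: int = 8) -> Tuple[List[int], float, int]:
    """Map float weights to unsigned num_bits integer codes (uniform affine)."""
    qmin, qmax = 0, (1 << num_bits) - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 for a constant tensor, whose range is zero.
    scale = (w_max - w_min) / (qmax - qmin) or 1.0
    zero_point = max(qmin, min(qmax, round(qmin - w_min / scale)))
    codes = [
        max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights
    ]
    return codes, scale, zero_point


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize(weights, num_bits=8)
restored = dequantize(codes, scale, zero_point)

# Storage per weight drops from 4 bytes (float32) to 1 byte (8-bit code).
float_bytes = len(weights) * 4
int_bytes = len(weights) * 1
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

In this sketch the stored codes take a quarter of the space of 32-bit floats, at the cost of a reconstruction error bounded by roughly one quantization step.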

FIG. 23 is a flowchart illustrating a specific example of the method of optimizing a neural network model of FIG. 21. Descriptions overlapping those of FIGS. 15 and 21 are omitted.

Referring to FIG. 23, in the method of optimizing a neural network model according to embodiments of the present invention, steps S1100, S100a, S200a, S300 and S400a may be substantially the same as steps S1100, S100a, S200a, S300 and S400a of FIG. 15, respectively.

The second model information and the process and result of changing the quantization scheme are visualized so as to be displayed on one screen, and are displayed on the graphical user interface (step S600a). Step S600a may be similar to step S600 of FIG. 21. For example, step S600a may be performed by the quantization module 500 and the graphical user interface control module 200.

FIGS. 24A, 24B and 24C are diagrams for describing the operation of FIG. 23. Descriptions overlapping those of FIGS. 16A, 16B, 16C, 16D, 16E, 16F, 20A, 20B, 20C and 20D are omitted.

Referring to FIGS. 23 and 24A, in step S600a, when the button 132 included in the menu 130 is selected, a graphical representation GR31 including the plurality of layers LAYER1, LAYER2, LAYER31, LAYER4, LAYER5 and LAYER6 and a plurality of quantization performances QP1, QP2, QP3, QP4, QP5 and QP6 may be displayed on the graphical user interface.

Referring to FIGS. 23 and 24B, in step S600a, the button 134 included in the menu 130 may be selected, the layer LAYER31 whose quantization scheme is to be changed may be selected, the quantization scheme of the layer LAYER31 may be changed from a first quantization scheme QS1 to a second quantization scheme QS2, and a graphical representation GR32 of this change may be displayed on the graphical user interface. The layer LAYER31 is re-quantized based on the second quantization scheme QS2, so that a quantization scheme different from that of the other layers may be applied to the layer LAYER31.

Referring to FIGS. 23 and 24C, in step S600a, when the button 132 included in the menu 130 is selected, a graphical representation GR31 including the plurality of layers LAYER1, LAYER2, LAYER31, LAYER4, LAYER5 and LAYER6 and a plurality of quantization performances QP1, QP2, QP31, QP4, QP5 and QP6 may be displayed on the graphical user interface. For example, the quantization performance QP31 of the layer LAYER31 based on the second quantization scheme QS2 may be higher than the quantization performance QP3 of the layer LAYER31 based on the first quantization scheme QS1.

As described above, the accuracy of the quantization scheme applied to each component may be checked, and the accuracy may be improved by applying different quantization schemes to the respective components depending on the loss rate given by the degree of distribution restoration. Specifically, the quantization accuracy of each layer and feature map of the floating-point model may be compared, and guidance may be provided by an algorithm that finds a suitable quantization scheme for each layer and feature map according to the degree of loss. Optimized quantization performance may be obtained by applying quantization schemes differently to each component and checking the results immediately. In particular, the user may arbitrarily set a target min/max range for one or more components, may set the quantization distribution mode for each component, and may re-quantize by applying an asymmetric scheme, a symmetric scheme, or the like differently, or by applying different bit-widths.
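One way such a per-component comparison might look is sketched below. Each layer's floating-point values are quantized and restored under a symmetric and an asymmetric scheme at a given bit-width, and the scheme with the smaller reconstruction loss is selected. The use of mean squared error as the loss, the helper names, the optional `target_range` argument, and the sample values are all assumptions made for this illustration; the description does not specify the loss measure or the selection algorithm.

```python
from typing import Callable, Dict, List, Optional, Tuple


def fake_quant_asymmetric(x: List[float], bits: int, lo: float, hi: float) -> List[float]:
    """Quantize to unsigned codes over [lo, hi], then map back to floats."""
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0
    return [lo + round((min(max(v, lo), hi) - lo) / scale) * scale for v in x]


def fake_quant_symmetric(x: List[float], bits: int, lo: float, hi: float) -> List[float]:
    """Quantize to signed codes over [-m, m] with m = max(|lo|, |hi|)."""
    if bits < 2:
        raise ValueError("symmetric scheme needs at least 2 bits")
    m = max(abs(lo), abs(hi)) or 1.0
    qmax = (1 << (bits - 1)) - 1
    scale = m / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in x]


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error, used here as an assumed stand-in for the loss rate."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


SCHEMES: Dict[str, Callable[[List[float], int, float, float], List[float]]] = {
    "asymmetric": fake_quant_asymmetric,
    "symmetric": fake_quant_symmetric,
}


def pick_scheme(
    values: List[float], bits: int, target_range: Optional[Tuple[float, float]] = None
) -> Tuple[str, Dict[str, float]]:
    """Return the lowest-loss scheme name and the loss of every scheme."""
    lo, hi = target_range if target_range else (min(values), max(values))
    losses = {name: mse(values, fn(values, bits, lo, hi)) for name, fn in SCHEMES.items()}
    return min(losses, key=losses.get), losses


layers = {
    "LAYER1": [-1.0, -0.5, 0.0, 0.5, 1.0],        # roughly zero-centred values
    "LAYER31": [0.0, 0.1, 0.4, 0.9, 2.0, 3.0],    # one-sided values
}
report = {name: pick_scheme(vals, bits=4) for name, vals in layers.items()}
```

Calling `pick_scheme` again with a larger `bits` value or a user-chosen `target_range` re-quantizes the same component under the new setting, so the resulting losses can be compared immediately.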

FIG. 25 is a block diagram illustrating a system in which a method of optimizing a neural network model according to embodiments of the present invention is implemented.

Referring to FIG. 25, a system 3000 may include a user device 3100, a network 3200 and a cloud computing environment 3300. The user device 3100 may include a neural network model optimization engine 3110, and the cloud computing environment 3300 may include a cloud storage 3210, a database 3220, a neural network model optimization engine 3230, a cloud neural network model engine 3240 and an inventory 3250. The method of optimizing a neural network model according to embodiments of the present invention may be implemented in a cloud environment and may be performed by the neural network model optimization engines 3110 and 3230.

Embodiments of the present invention may be applied to various devices and systems in which artificial neural networks and/or machine learning may be implemented. For example, embodiments of the present invention may be usefully applied to electronic systems such as a personal computer (PC), a server computer, a data center, a workstation, a laptop, a cellular phone, a smart phone, an MP3 player, a personal digital assistant (PDA), a portable multimedia player (PMP), a digital TV, a digital camera, a portable game console, a navigation device, a wearable device, an Internet of Things (IoT) device, an Internet of Everything (IoE) device, an e-book, a virtual reality (VR) device, an augmented reality (AR) device, and a drone.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the present invention may be variously modified and changed without departing from the spirit and scope of the present invention as set forth in the claims below.

Claims

A method of optimizing a neural network model, the method comprising:
receiving first model information on a first neural network model;
receiving device information on a first target device for running the first neural network model;
performing an analysis of whether the first neural network model is suitable for running on the first target device, based on the first model information, the device information, and at least one of a plurality of suitability determination algorithms; and
visualizing and outputting a result of the analysis such that the first model information and the result of the analysis are displayed on one screen.

The method of claim 1,
wherein the plurality of suitability determination algorithms include
a first algorithm for determining performance efficiency of the structure and layers of the first neural network model with respect to the first target device, and
wherein performing the analysis comprises
performing a first analysis on the first neural network model based on the first algorithm.

The method of claim 2, wherein performing the first analysis comprises:
obtaining first scores of the structure and layers of the first neural network model by using a table predefined for the first target device;
obtaining second scores of the structure and layers of the first neural network model by estimating a processing time of the structure and layers of the first neural network model using a performance estimator;
obtaining third scores of the structure and layers of the first neural network model by using a deep learning model pre-trained for the first target device; and
obtaining performance scores of the structure and layers of the first neural network model based on the first scores, the second scores, and the third scores.

The method of claim 1,
wherein the plurality of suitability determination algorithms include
a second algorithm for analyzing complexity and capacity of the structure and layers of the first neural network model, and
wherein performing the analysis comprises
performing a second analysis on the first neural network model based on the second algorithm.

The method of claim 4, wherein performing the second analysis comprises:
obtaining fourth scores of the structure and layers of the first neural network model by determining complexity of the structure and layers of the first neural network model;
obtaining fifth scores of the structure and layers of the first neural network model by measuring capacity of the structure and layers of the first neural network model; and
obtaining complexity scores of the structure and layers of the first neural network model based on the fourth scores and the fifth scores.

The method of claim 1,
wherein the plurality of suitability determination algorithms include
a third algorithm for determining memory efficiency of the structure and layers of the first neural network model with respect to the first target device, and
wherein performing the analysis comprises
performing a third analysis on the first neural network model based on the third algorithm.

The method of claim 6, wherein performing the third analysis comprises:
obtaining memory footprint scores of the structure and layers of the first neural network model based on a memory limitation of the first target device.

The method of claim 1,
further comprising changing at least one of the layers of the first neural network model based on the result of the analysis.

The method of claim 1,
further comprising applying different quantization schemes to at least some of the layers of the first neural network model.

A computer-based neural network model processing system comprising:
an input device configured to receive first model information on a first neural network model and device information on a first target device for running the first neural network model;
a storage device configured to store information on program routines, the program routines being configured to perform an analysis of whether the first neural network model is suitable for running on the first target device, based on the first model information, the device information, and at least one of a plurality of suitability determination algorithms, and to generate a result of the analysis such that the first model information and the result of the analysis are displayed on one screen;
an output device configured to visualize and output the result of the analysis; and
a processor connected to the input device, the storage device, and the output device and configured to control execution of the program routines.