KR20170034258A

KR20170034258A - Model training method and apparatus, and data recognizing method

Info

Publication number: KR20170034258A
Application number: KR1020150132679A
Authority: KR
Inventors: 강효아; 김하영
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2015-09-18
Filing date: 2015-09-18
Publication date: 2017-03-28
Also published as: EP3144859A3; US10410114B2; CN106548190A; US20170083829A1; KR102492318B1; EP3144859A2

Abstract

A model training method and apparatus, and a data recognition method, are disclosed. The disclosed model training method includes selecting at least one of a plurality of teacher models, and training a student model based on output data that the at least one selected teacher model produces for input data.

Description

MODEL TRAINING METHOD AND APPARATUS, AND DATA RECOGNIZING METHOD

The following embodiments relate to a model training method and apparatus, and to a data recognition method.

In recent years, as an approach to the problem of classifying input patterns into specific groups, research has been actively conducted on applying the efficient pattern recognition methods used by humans to actual computers. One line of this research concerns artificial neural networks, which model the characteristics of human biological neurons through mathematical expressions. To classify input patterns into specific groups, an artificial neural network uses an algorithm that imitates the human ability to learn. Through this algorithm, the artificial neural network can generate a mapping between input patterns and output patterns; this is expressed by saying that the artificial neural network has a learning capability. In addition, based on its training results, the artificial neural network has a generalization capability: it can produce relatively correct outputs for input patterns that were not used during training.

Research is also being conducted on reducing the size of such artificial neural networks while minimizing the resulting decrease in recognition rate.

A model training method according to an embodiment includes selecting at least one of a plurality of teacher models, and training a student model based on output data of the at least one selected teacher model for input data.

In the model training method according to an embodiment, the selecting may include selecting at least one of the plurality of teacher models based on the accuracies of the plurality of teacher models.

In the model training method according to an embodiment, the selecting may include selecting at least one of the plurality of teacher models based on a correlation between the output data of the plurality of teacher models for the input data.

In the model training method according to an embodiment, the selecting may include selecting at least one of the plurality of teacher models such that the correlation between the output data of the selected teacher models is lower than a threshold value.

In the model training method according to an embodiment, the training of the student model may further use output data of the student model.

In the model training method according to an embodiment, the selecting of at least one of the plurality of teacher models and the training of the student model may be performed repeatedly until the student model satisfies a predetermined condition.

In the model training method according to an embodiment, the training of the student model may include training the student model based on a first loss between output data of the student model for the input data and first output data of the at least one selected teacher model, and a second loss between output data of a classification layer derived from a hidden layer of the student model and second output data of the at least one selected teacher model. The first loss and the second loss may be determined using different methods.

In the model training method according to an embodiment, the first loss and the second loss may be determined using first output data and second output data that are output from different selected teacher models.

In the model training method according to an embodiment, the first loss and the second loss may be determined by applying different weights to the first output data and the second output data.

In the model training method according to an embodiment, an initial weight of the classification layer may be set to the initial weight of the selected teacher model, among the at least one selected teacher model, whose size is most similar to that of the data input to the classification layer.

In the model training method according to an embodiment, the training of the student model may further use correct answer data corresponding to the input data.

In the model training method according to an embodiment, the plurality of teacher models may have different initial weights, have different neural network structures, use different hyperparameters, or be composed of different ensembles.

In the model training method according to an embodiment, the structure of the student model may be determined based on the size of the data input to the at least one selected teacher model.

A data recognition method according to an embodiment includes receiving target data to be recognized, and recognizing the target data using a pre-trained model. The model is trained by selecting at least one of a plurality of teacher models and using output data of the at least one selected teacher model for input data.

A model training apparatus according to an embodiment includes a processor configured to train a student model, and a memory configured to store the trained student model. The processor selects at least one of a plurality of teacher models and trains the student model based on output data of the at least one selected teacher model for input data.

FIG. 1 is a diagram illustrating a teacher model and a student model according to an embodiment.
FIGS. 2 to 4 are diagrams illustrating a process of selecting at least one of a plurality of teacher models to train a student model according to an embodiment.
FIG. 5 is a diagram illustrating a training process using a classification layer of a student model according to an embodiment.
FIG. 6 is a diagram illustrating a model training method according to an embodiment.
FIG. 7 is a diagram illustrating a data recognition method according to an embodiment.
FIG. 8 is a diagram illustrating a model training apparatus according to an embodiment.
FIG. 9 is a diagram illustrating a data recognition apparatus according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The specific structural and functional descriptions below are provided only to illustrate the embodiments and should not be construed as limiting the scope of the embodiments to what is described herein. Those skilled in the art may make various modifications and changes based on this description. Like reference numerals in the drawings denote like elements, and descriptions of well-known functions and structures are omitted.

FIG. 1 is a diagram illustrating a teacher model and a student model according to an embodiment.

Referring to FIG. 1, a teacher model 110 and a student model 120 are shown. The teacher model 110 and the student model 120 are models trained to produce a specific output for a specific input and may include, for example, a neural network. A neural network is a recognition model that imitates the computational capability of a biological system using a large number of artificial neurons connected by connection lines.

A neural network uses artificial neurons that simplify the function of biological neurons, and the artificial neurons may be interconnected through connection lines having connection weights. A connection weight, which is a parameter of the neural network, is a specific value held by a connection line and may also be referred to as a connection strength. Through its artificial neurons, a neural network can perform human-like cognition or learning. An artificial neuron may also be referred to as a node.

A neural network may include a plurality of layers. For example, a neural network may include an input layer, a hidden layer, and an output layer. The input layer may receive an input for training and pass it to the hidden layer, and the output layer may generate an output of the neural network based on signals received from the nodes of the hidden layer. The hidden layer is located between the input layer and the output layer and may transform the training data received through the input layer into values that are easier to predict. Nodes in the input layer and the hidden layer are connected to each other through connection lines having connection weights, and nodes in the hidden layer and the output layer may likewise be connected through connection lines having connection weights. Each of the input layer, the hidden layer, and the output layer may include a plurality of nodes.

A neural network may include a plurality of hidden layers. A neural network including a plurality of hidden layers is referred to as a deep neural network, and training a deep neural network is referred to as deep learning. A node included in a hidden layer is referred to as a hidden node.

A neural network may be trained through supervised learning. Supervised learning is a method of inputting input data together with the corresponding output data into a neural network and updating the connection weights of the connection lines so that the output data corresponding to the input data is output. For example, the model training apparatus may update the connection weights between artificial neurons through the delta rule, error backpropagation learning, and the like.

Error backpropagation learning estimates a loss for given input data through forward computation, and then updates the connection weights in a direction that reduces the loss while propagating the estimated loss backward, starting from the output layer toward the hidden layer and the input layer. Processing in a neural network proceeds in the order of the input layer, the hidden layer, and the output layer, whereas in error backpropagation learning the connection weights are updated in the order of the output layer, the hidden layer, and the input layer. Hereinafter, training a neural network may be understood as training the parameters of the neural network, and a trained neural network may be understood as a neural network to which the trained parameters are applied.
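The forward computation and the output-to-input weight update described above can be sketched in plain Python for a toy network with one hidden layer. This is a minimal illustration under assumptions not stated in the description: a tanh hidden layer, a single linear output node, a squared-error loss, and a fixed learning rate. The function names `forward` and `backprop_step` are hypothetical.

```python
import math
import random

def forward(x, w1, w2):
    """Input layer -> hidden layer (tanh) -> single linear output node."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    output = sum(w * h for w, h in zip(w2, hidden))
    return hidden, output

def backprop_step(x, target, w1, w2, lr=0.1):
    """One error-backpropagation update for the loss 0.5 * (output - target) ** 2.

    Returns the loss measured before the update.
    """
    hidden, output = forward(x, w1, w2)           # forward computation
    loss = 0.5 * (output - target) ** 2
    d_out = output - target                        # dLoss/dOutput at the output layer
    # Error signal at each hidden node, computed with the current (pre-update) w2.
    d_hidden = [d_out * w2[j] * (1.0 - hidden[j] ** 2) for j in range(len(hidden))]
    # Update hidden->output weights first, then input->hidden weights.
    for j in range(len(w2)):
        w2[j] -= lr * d_out * hidden[j]
    for j, row in enumerate(w1):
        for k in range(len(row)):
            row[k] -= lr * d_hidden[j] * x[k]
    return loss

# Usage: repeatedly fit one (input, target) pair with a 2-3-1 network.
rng = random.Random(0)
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
w2 = [rng.uniform(-0.5, 0.5) for _ in range(3)]
losses = [backprop_step([1.0, -0.5], 0.8, w1, w2) for _ in range(50)]
```

All gradients are computed from the forward pass before any weight changes, so updating the output-side weights first only mirrors the update order described in the text; it does not change the result.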

The teacher model 110 and the student model 120 according to an embodiment may be neural networks of different sizes that recognize the same target.

The teacher model 110 recognizes target data with high accuracy using a sufficiently large number of features extracted from the target data to be recognized, and may be a neural network larger than the student model 120. For example, the teacher model 110 may include more hidden layers, more nodes, or a combination thereof, compared with the student model 120.

The student model 120 is a neural network smaller than the teacher model 110 and, owing to its smaller size, may recognize data faster than the teacher model 110. The student model 120 may be trained based on the teacher model 110 so that, given the input data, it outputs the output data of the teacher model 110. For example, the output data of the teacher model 110 may be a logit value output by the teacher model 110, a probability value, or an output value of a classification layer derived from a hidden layer of the teacher model 110. In this way, a student model 120 that outputs the same values as the teacher model 110 while recognizing faster than the teacher model 110 can be obtained. This process is referred to as model compression. Model compression is a technique for training the student model 120 using the output data of the teacher model 110 instead of correct answer data, that is, true labels.

According to an embodiment, a plurality of teacher models 110 may be available for training the student model 120. The student model 120 may be trained by selecting at least one of the plurality of teacher models. The process of selecting at least one of the plurality of teacher models and training the student model 120 may be repeated until the student model 120 satisfies a predetermined condition. In this case, the at least one teacher model selected to train the student model 120 may be newly selected each time the training process is repeated. For example, one teacher model, or two or more teacher models, may be selected for training the student model 120.
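The repeated select-then-train procedure can be outlined as a loop. The sketch below is illustrative only: the selection policy (one random pick per round), the scalar "student", its update rule, and the stopping condition are hypothetical stand-ins for the real model, training step, and predetermined condition, which the description does not specify at this point.

```python
import random

def train_with_teacher_selection(teachers, select, train_step, condition_met, max_rounds=100):
    """Repeat: (re)select teacher(s), train the student on them, until the condition holds."""
    history = []
    while len(history) < max_rounds and not condition_met():
        chosen = select(teachers)   # a new selection may be made in every round
        train_step(chosen)
        history.append(chosen)
    return history

# Toy stand-ins (hypothetical): each teacher's output for one fixed input,
# and a scalar "student" output that is nudged toward the selected teachers.
teacher_outputs = {"teacher_1": 0.70, "teacher_2": 0.72, "teacher_3": 0.68}
state = {"student": 0.0}
rng = random.Random(0)

def select_one_at_random(names):
    return [rng.choice(sorted(names))]

def move_toward_teachers(chosen):
    target = sum(teacher_outputs[name] for name in chosen) / len(chosen)
    state["student"] += 0.5 * (target - state["student"])

def close_enough():
    # Example predetermined condition: student output within 0.03 of 0.70.
    return abs(state["student"] - 0.70) < 0.03

history = train_with_teacher_selection(list(teacher_outputs), select_one_at_random,
                                       move_toward_teachers, close_enough)
```

Passing the selection policy in as a function keeps the loop unchanged whether one teacher or several are chosen per round.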

A process of selecting one, or two or more, of the plurality of teacher models to train the student model 120 will be described below with reference to FIGS. 2 to 4.

FIGS. 2 to 4 are diagrams illustrating a process of selecting at least one of a plurality of teacher models to train a student model according to an embodiment.

Referring to FIG. 2, a process of selecting one of a plurality of teacher models to train a student model 220 is shown. The selection may be performed by a model training apparatus.

The model training apparatus trains a neural network for data recognition and may be implemented as a single processor or as multiple processors. Alternatively, the model training apparatus may be implemented as a plurality of modules included in different devices, in which case the modules may be connected to each other through a network or the like.

The plurality of teacher models are pre-trained models that can be used in training the student model 220, and may have various structures and various accuracies. The plurality of teacher models may include teacher model 1 through teacher model N.

The plurality of teacher models according to an embodiment may have different initial weights, have different neural network structures, have different hyperparameters, or be composed of different ensembles.

The plurality of teacher models may have different initial weights. An initial weight is the initial value of a connection weight of a connection line in a neural network, and it can strongly affect the learning speed and convergence rate of error-backpropagation training. The initial weights of the plurality of teacher models may be set to different values through various methods such as random initialization or pre-training.

The plurality of teacher models may have different neural network structures. For example, they may have various structures that differ in the number of hidden layers, the number of filters, the kernel size, and the like.

The plurality of teacher models may have different hyperparameters. A hyperparameter is a training-related parameter such as the learning rate or momentum.

The plurality of teacher models may be composed of different ensembles. A teacher model may consist of a single neural network or of an ensemble of a plurality of neural networks. When a teacher model consists of an ensemble of neural networks, that ensemble may differ from the ensembles of the other teacher models.
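One way a teacher model built from an ensemble could produce a single output is by averaging its members' class-probability outputs. The description does not say how ensemble outputs are combined, so the averaging rule and the helper name `make_ensemble_teacher` below are assumptions for illustration.

```python
from typing import Callable, List, Sequence

Probabilities = List[float]
Model = Callable[[Sequence[float]], Probabilities]

def make_ensemble_teacher(members: Sequence[Model]) -> Model:
    """Wrap several networks as one teacher whose output is the mean of their outputs."""
    if not members:
        raise ValueError("an ensemble needs at least one member")

    def predict(x: Sequence[float]) -> Probabilities:
        outputs = [member(x) for member in members]
        return [sum(column) / len(outputs) for column in zip(*outputs)]

    return predict

# Two hypothetical member networks, represented by fixed outputs for brevity.
net_a: Model = lambda x: [0.5, 0.5]
net_b: Model = lambda x: [0.75, 0.25]
teacher_from_ensemble = make_ensemble_teacher([net_a, net_b])
teacher_single = make_ensemble_teacher([net_a])   # a teacher made of one network
print(teacher_from_ensemble([0.0]))  # [0.625, 0.375]
```

Because a single network is just a one-member ensemble here, both kinds of teacher expose the same interface to the selection and training steps.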

The model training apparatus according to an embodiment may select one of the plurality of teacher models based on their accuracies. For example, the model training apparatus may select the teacher model with the highest accuracy among the plurality of teacher models. Alternatively, the model training apparatus may randomly select one of the teacher models whose accuracy is greater than or equal to a threshold value.
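The two single-teacher selection rules above can be sketched as follows, assuming each teacher's accuracy has already been measured (for example, on held-out data) and stored in a dictionary. The teacher names, accuracy values, and function names are hypothetical.

```python
import random
from typing import Dict, Optional

def select_most_accurate(accuracies: Dict[str, float]) -> str:
    """Pick the teacher with the highest accuracy."""
    return max(accuracies, key=accuracies.get)

def select_random_above(accuracies: Dict[str, float], threshold: float,
                        rng: Optional[random.Random] = None) -> str:
    """Pick uniformly at random among teachers whose accuracy is >= threshold."""
    rng = rng or random.Random()
    eligible = sorted(name for name, acc in accuracies.items() if acc >= threshold)
    if not eligible:
        raise ValueError("no teacher model meets the accuracy threshold")
    return rng.choice(eligible)

# Hypothetical accuracies of teacher models 1 to 3.
accuracies = {"teacher_1": 0.91, "teacher_2": 0.95, "teacher_3": 0.88}
print(select_most_accurate(accuracies))  # teacher_2
```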

The model training apparatus may train the student model 220 using the output data of the selected teacher model 210 as a label for the student model 220. Specifically, the model training apparatus may train the student model 220 based on a loss indicating the difference between the output data of the student model 220 for the input data and the output data of the selected teacher model 210. The model training apparatus may calculate this loss and train the student model 220 so that the loss decreases, based on stochastic gradient descent (SGD).

Starting from the output layer, the model training apparatus may propagate the loss backward through the hidden layer to the input layer and update the connection weights in a direction that reduces the loss. Propagating the error in this backward direction is referred to as a backward pass.

The model training apparatus may define an objective function for measuring how close the currently set connection weights are to optimal, continuously change the connection weights based on the result of the objective function, and repeat the training. For example, the objective function may be a loss function for calculating the loss between the output data that the neural network actually outputs for the input data and the expected value it should output (for example, the output data of the selected teacher model 210). The model training apparatus may update the connection weights in a direction that reduces the value of the loss function.
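As a minimal sketch of training a student against a selected teacher's outputs with SGD, the code below fits a linear softmax student to hypothetical teacher probability vectors, using cross entropy as the loss function. The single-layer student, the learning rate, the epoch count, and the data are assumptions. The per-logit gradient p - t used in the update is the standard result for a softmax output followed by cross entropy.

```python
import math
import random

def softmax(logits):
    m = max(logits)                        # shift for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def student_probs(W, x):
    """Linear student: logits = W x, followed by a softmax output layer."""
    return softmax([sum(w * xd for w, xd in zip(row, x)) for row in W])

def cross_entropy(target, probs):
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def mean_loss(W, data):
    return sum(cross_entropy(t, student_probs(W, x)) for x, t in data) / len(data)

def sgd_epoch(W, data, lr, rng):
    """One SGD pass: per-sample updates that reduce CE(teacher output, student output)."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for i in order:
        x, teacher_out = data[i]
        p = student_probs(W, x)
        for k, row in enumerate(W):
            g = p[k] - teacher_out[k]      # dCE/dlogit_k for softmax + cross entropy
            for d in range(len(row)):
                row[d] -= lr * g * x[d]    # dlogit_k/dW[k][d] = x[d]

# (input, teacher output probabilities) pairs: hypothetical values.
data = [([1.0, 0.0], [0.9, 0.1]), ([0.0, 1.0], [0.2, 0.8]), ([1.0, 1.0], [0.6, 0.4])]
W = [[0.0, 0.0], [0.0, 0.0]]               # 2 classes x 2 input features
rng = random.Random(0)
before = mean_loss(W, data)
for _ in range(200):
    sgd_epoch(W, data, lr=0.5, rng=rng)
after = mean_loss(W, data)
```

With soft teacher targets the loss cannot reach zero; its floor is the entropy of the teacher outputs, so the useful check is that the loss decreases.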

The model training apparatus may calculate the loss as in Equation 1 below.

Equation 1:
$$L(\theta) = f\left(P_{\mathrm{Teacher}(i)},\, P_{\mathrm{Student}}\right)$$

In Equation 1, $L$ denotes the loss function and $\theta$ denotes the parameters of the student model 220 to be trained. $P_{\mathrm{Teacher}(i)}$ denotes the output data of the selected teacher model i, and $P_{\mathrm{Student}}$ denotes the output data of the student model 220 for the input data. $f$ denotes the cross entropy, a softmax function, or the Euclidean distance between the output data $P_{\mathrm{Teacher}(i)}$ of the selected teacher model i and the output data $P_{\mathrm{Student}}$ of the student model 220.
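Equation 1 leaves the choice of f open. The sketch below shows two of the listed options, cross entropy between softmax-normalized outputs and Euclidean distance between raw output vectors, applied to hypothetical logit vectors. Treating both outputs as logits and normalizing them with a softmax before the cross entropy is an assumption made for illustration.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def f_cross_entropy(teacher_logits: Sequence[float], student_logits: Sequence[float]) -> float:
    """Cross entropy H(P_Teacher(i), P_Student) between softmax outputs."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(p_t, p_s))

def f_euclidean(teacher_out: Sequence[float], student_out: Sequence[float]) -> float:
    """Euclidean distance between the two output vectors."""
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(teacher_out, student_out)))

def loss_equation_1(teacher_out, student_out, f=f_cross_entropy) -> float:
    """L(theta) = f(P_Teacher(i), P_Student) for one input."""
    return f(teacher_out, student_out)

teacher_logits = [2.0, 0.5, -1.0]   # hypothetical output of the selected teacher i
student_logits = [1.0, 1.0, 0.0]    # hypothetical output of the student
ce_loss = loss_equation_1(teacher_logits, student_logits)
l2_loss = loss_equation_1(teacher_logits, student_logits, f=f_euclidean)
```

The cross-entropy option is smallest when the student's softmax output equals the teacher's, while the Euclidean option is zero only when the raw vectors match exactly.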

FIG. 2 shows an example in which teacher model 2 is selected from among teacher models 1 to N, and the student model 220 is trained using the output data $P_{\mathrm{Teacher}(2)}$ of the selected teacher model 2.

According to another embodiment, the model training apparatus may train the student model 220 by further using correct answer data $r_{\mathrm{true}}$. Taking the correct answer data into account, the model training apparatus may calculate the loss as in Equation 2 below.

Equation 2:
$$L(\theta) = \alpha\, f\left(P_{\mathrm{Teacher}(i)},\, P_{\mathrm{Student}}\right) + \beta\, g\left(r_{\mathrm{true}},\, P_{\mathrm{Student}}\right)$$

In Equation 2, $f$ is as described for Equation 1, and $g$ denotes the cross entropy, a softmax function, or the Euclidean distance between the correct answer data $r_{\mathrm{true}}$ and the output data $P_{\mathrm{Student}}$ of the student model 220. $\alpha$ is a constant denoting the weight applied to the output data $P_{\mathrm{Teacher}(i)}$ of the selected teacher model i, and $\beta$ is a constant denoting the weight applied to the correct answer data $r_{\mathrm{true}}$.

By adjusting the values of $\alpha$ and $\beta$, the model training apparatus can determine how much the output data $P_{\mathrm{Teacher}(i)}$ of the selected teacher model i and the correct answer data $r_{\mathrm{true}}$ each influence the training of the student model 220. For example, when $\alpha$ is set to a larger value than $\beta$, the model training apparatus trains the student model 220 with greater emphasis on the output data $P_{\mathrm{Teacher}(i)}$ of the selected teacher model i than on the correct answer data $r_{\mathrm{true}}$.
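The weighting between the teacher's output and the correct answer data can be illustrated with cross entropy used for both distance terms and a softmax student output, with weight alpha on the teacher term and beta on the correct-answer term. Under that choice (a standard identity, not a statement from the description), the gradient of the loss with respect to the student logits is (alpha + beta) * p minus the weighted target alpha * P_Teacher(i) + beta * r_true, so the two weights directly set how strongly each target pulls on the student. Function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target: Sequence[float], probs: Sequence[float]) -> float:
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def loss_equation_2(teacher_probs, true_onehot, student_logits, alpha, beta):
    """alpha * f(P_Teacher(i), P_Student) + beta * g(r_true, P_Student), f = g = cross entropy."""
    p = softmax(student_logits)
    return alpha * cross_entropy(teacher_probs, p) + beta * cross_entropy(true_onehot, p)

def grad_equation_2(teacher_probs, true_onehot, student_logits, alpha, beta):
    """Gradient of loss_equation_2 with respect to the student logits."""
    p = softmax(student_logits)
    return [(alpha + beta) * pk - (alpha * tk + beta * yk)
            for pk, tk, yk in zip(p, teacher_probs, true_onehot)]

teacher_probs = [0.7, 0.2, 0.1]    # hypothetical P_Teacher(i)
true_onehot = [1.0, 0.0, 0.0]      # hypothetical r_true
student_logits = [0.2, 0.4, -0.1]
grad = grad_equation_2(teacher_probs, true_onehot, student_logits, alpha=0.7, beta=0.3)
```

Setting beta to zero recovers the teacher-only loss, and setting alpha to zero recovers ordinary training on the correct answer data.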

Referring to FIG. 3, a process of selecting two or more of a plurality of teacher models to train a student model 320 is shown. The selection may be performed by the model training apparatus. The plurality of teacher models are pre-trained models that can be used in training the student model 320, and may have various structures and various accuracies.

The model training apparatus according to an embodiment may select two or more of the plurality of teacher models based on their accuracies. The model training apparatus may select the teacher models whose accuracy is greater than or equal to a threshold value. Alternatively, the model training apparatus may rank the plurality of teacher models by accuracy and select a predetermined number k of teacher models in descending order of accuracy.
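Both multi-teacher accuracy rules can be sketched in a few lines, again assuming precomputed accuracies in a dictionary. The names and values are hypothetical.

```python
from typing import Dict, List

def select_above_threshold(accuracies: Dict[str, float], threshold: float) -> List[str]:
    """All teachers whose accuracy is >= threshold, most accurate first."""
    return sorted((name for name, acc in accuracies.items() if acc >= threshold),
                  key=lambda name: accuracies[name], reverse=True)

def select_top_k(accuracies: Dict[str, float], k: int) -> List[str]:
    """The k most accurate teachers, in descending order of accuracy."""
    return sorted(accuracies, key=accuracies.get, reverse=True)[:k]

# Hypothetical accuracies of teacher models 1 to 4.
accuracies = {"teacher_1": 0.91, "teacher_2": 0.95, "teacher_3": 0.88, "teacher_4": 0.93}
print(select_top_k(accuracies, 2))              # ['teacher_2', 'teacher_4']
print(select_above_threshold(accuracies, 0.9))  # ['teacher_2', 'teacher_4', 'teacher_1']
```

The threshold rule returns a variable number of teachers, whereas the top-k rule always returns k of them (or all, if fewer than k exist).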

In addition, the model training apparatus may select two or more of the plurality of teacher models based on a correlation between the output data of the plurality of teacher models for the input data. The model training apparatus may select two or more teacher models such that the correlation between the output data of the selected teacher models 310 is lower than a threshold value. For example, from among the teacher models whose accuracy is greater than or equal to an accuracy threshold, the model training apparatus may select two or more teacher models such that the correlation between the output data of the selected teacher models 310 is lower than a correlation threshold. Alternatively, from among the teacher models whose accuracy is greater than or equal to the accuracy threshold, the model training apparatus may select the teacher model with the highest accuracy together with a predetermined number k of teacher models whose output data are least correlated with the output data of that most accurate teacher model, taken in ascending order of correlation.
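The two correlation-based rules above can be sketched as follows. The description says only "correlation", so measuring it as the Pearson correlation between each teacher's outputs over the same inputs (flattened into one vector), and satisfying the pairwise condition with a greedy pass in order of accuracy, are assumptions. All names and numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length output vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)

def select_best_plus_least_correlated(outputs: Dict[str, List[float]],
                                      accuracies: Dict[str, float],
                                      acc_threshold: float, k: int) -> List[str]:
    """Most accurate eligible teacher plus the k eligible teachers least correlated with it."""
    eligible = [name for name in outputs if accuracies[name] >= acc_threshold]
    best = max(eligible, key=accuracies.get)
    others = sorted((name for name in eligible if name != best),
                    key=lambda name: pearson(outputs[best], outputs[name]))
    return [best] + others[:k]

def select_pairwise_decorrelated(outputs: Dict[str, List[float]],
                                 accuracies: Dict[str, float],
                                 acc_threshold: float, corr_threshold: float) -> List[str]:
    """Greedy: visit eligible teachers by accuracy; keep one only if its correlation
    with every already-kept teacher is below corr_threshold."""
    eligible = sorted((name for name in outputs if accuracies[name] >= acc_threshold),
                      key=accuracies.get, reverse=True)
    chosen: List[str] = []
    for name in eligible:
        if all(pearson(outputs[name], outputs[c]) < corr_threshold for c in chosen):
            chosen.append(name)
    return chosen

# Hypothetical outputs of each teacher over the same inputs, flattened into one vector.
outputs = {
    "teacher_1": [0.9, 0.1, 0.8, 0.2],
    "teacher_2": [0.85, 0.15, 0.75, 0.25],   # nearly a copy of teacher_1
    "teacher_3": [0.6, 0.4, 0.3, 0.7],
    "teacher_4": [0.1, 0.9, 0.2, 0.8],
}
accuracies = {"teacher_1": 0.95, "teacher_2": 0.93, "teacher_3": 0.90, "teacher_4": 0.80}
print(select_best_plus_least_correlated(outputs, accuracies, 0.85, 1))  # ['teacher_1', 'teacher_3']
print(select_pairwise_decorrelated(outputs, accuracies, 0.85, 0.9))     # ['teacher_1', 'teacher_3']
```

In the example, teacher_2 is accurate but nearly duplicates teacher_1, so both rules prefer the less correlated teacher_3; teacher_4 is excluded by the accuracy threshold.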

The model training apparatus may also receive, from a user, an input that heuristically selects two or more of the plurality of teacher models.

FIG. 3 shows an example in which teacher model 2 and teacher model 3 are selected from among teacher models 1 to N, and the student model 320 is trained based on the output data $P_{\mathrm{Teacher}(2)}$ of the selected teacher model 2 and the output data $P_{\mathrm{Teacher}(3)}$ of the selected teacher model 3.

The model learning apparatus may also train the student model 320 by additionally using correct answer data $P_{GT}$. When two or more of the plurality of teacher models are selected and the correct answer data is also taken into account, the model learning apparatus may calculate the loss as follows.

$$L = \alpha\, f\big(P_{Teacher(i)}, P_{Student}\big) + \beta\, f\big(P_{Teacher(j)}, P_{Student}\big) + \gamma\, f\big(P_{GT}, P_{Student}\big) \qquad \text{(Equation 3)}$$

In Equation 3 above, $P_{Teacher(j)}$ denotes the output data of selected teacher model j; $f(P_{Teacher(j)}, P_{Student})$ denotes the cross entropy, softmax function, or Euclidean distance between the output data $P_{Teacher(j)}$ of selected teacher model j and the output data $P_{Student}$ of the student model 320; and $f(P_{GT}, P_{Student})$ denotes the cross entropy, softmax function, or Euclidean distance between the correct answer data $P_{GT}$ and the output data $P_{Student}$ of the student model 320. $\alpha$ is the weight applied to the output data $P_{Teacher(i)}$ of selected teacher model i, $\beta$ is a constant denoting the weight applied to the output data $P_{Teacher(j)}$ of selected teacher model j, and $\gamma$ is a constant denoting the weight applied to the correct answer data $P_{GT}$. By adjusting the values of $\alpha$, $\beta$, and $\gamma$, the model learning apparatus can determine how strongly the output data of selected teacher model i, the output data of selected teacher model j, and the correct answer data each influence the training of the student model 320.

The model learning apparatus may train the student model 320 so that the loss calculated by Equation 3 decreases.
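
A minimal Python sketch of a loss in the form of Equation 3 follows, assuming the outputs are probability vectors and using cross entropy or Euclidean distance for f (the softmax-function option is not shown). The function names, argument layout, and example values are hypothetical; `teacher_weights` carries α and β in order, and `gt_weight` plays the role of γ. In an actual training loop, the gradient of this value with respect to the student's parameters would drive the update that reduces the loss.

```python
from math import log, sqrt


def cross_entropy(target, pred, eps=1e-12):
    """Cross entropy of the predicted distribution `pred` against `target`."""
    return -sum(t * log(max(p, eps)) for t, p in zip(target, pred))


def euclidean(target, pred):
    """Euclidean distance between two output vectors."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(target, pred)))


def multi_teacher_loss(p_student, p_teachers, teacher_weights,
                       p_gt=None, gt_weight=0.0, f=cross_entropy):
    """Weighted sum in the form of Equation 3: one term per selected teacher
    plus an optional term for the correct answer data."""
    loss = sum(w * f(p_t, p_student) for w, p_t in zip(teacher_weights, p_teachers))
    if p_gt is not None:
        loss += gt_weight * f(p_gt, p_student)
    return loss


p_student = [0.7, 0.2, 0.1]
p_teacher_i = [0.6, 0.3, 0.1]   # hypothetical output of selected teacher i
p_teacher_j = [0.8, 0.1, 0.1]   # hypothetical output of selected teacher j
p_gt = [1.0, 0.0, 0.0]          # one-hot correct answer data
loss = multi_teacher_loss(p_student, [p_teacher_i, p_teacher_j], [0.3, 0.3],
                          p_gt=p_gt, gt_weight=0.4)
print(round(loss, 4))
```

Raising `gt_weight` relative to the teacher weights shifts the student toward the correct answer data; raising one teacher's weight pulls it toward that teacher's output.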

Referring to FIG. 4, a process of training the student model 420 by additionally using the output data of the student model 420 itself is illustrated. The model learning apparatus may calculate a loss by using the output data of at least one teacher model 410 selected from the plurality of teacher models together with the output data of the student model 420, and may train the student model 420 so that the loss decreases.

In some cases, the student model 420 trained based on the selected at least one teacher model 410 may achieve higher accuracy than the selected at least one teacher model 410 itself. In such cases, additionally using the output data of the more accurate student model 420 to train the student model 420 can further improve its training speed or accuracy.

The model learning apparatus according to an embodiment may measure the accuracy of the student model 420 and compare it with the accuracies of the plurality of teacher models to determine whether to use the output data of the student model 420 for training.

For example, if the accuracy of the student model 420 is higher than the highest accuracy among the plurality of teacher models, the model learning apparatus may determine to use the output data of the student model 420 for training. Alternatively, the model learning apparatus may make this determination by comparing the accuracy of the student model 420 with a statistic of the teacher models' accuracies (for example, the mean, or the value at the top k%). The criterion for deciding whether to use the output data of the student model 420 for training may be modified in various ways depending on the design.
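
The decision rules above can be sketched as a small Python function. The function name and criterion labels are hypothetical, and reading "the value at the top k%" as the accuracy of the teacher at the boundary of the best k% of teachers is an assumption of this sketch.

```python
from statistics import mean


def use_student_output(student_acc, teacher_accs, criterion="max", top_percent=10.0):
    """Decide whether the student's own output data joins the training signal."""
    if criterion == "max":
        reference = max(teacher_accs)
    elif criterion == "mean":
        reference = mean(teacher_accs)
    elif criterion == "top_percent":
        # Assumed reading of "top k%": the accuracy at the boundary of the
        # best `top_percent` percent of teachers (at least one teacher).
        ranked = sorted(teacher_accs, reverse=True)
        n = max(1, int(len(ranked) * top_percent / 100))
        reference = ranked[n - 1]
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return student_acc > reference


teacher_accs = [0.81, 0.86, 0.90, 0.78]
print(use_student_output(0.88, teacher_accs, "max"))   # → False
print(use_student_output(0.88, teacher_accs, "mean"))  # → True
print(use_student_output(0.88, teacher_accs, "top_percent", top_percent=50))  # → True
```

The same student accuracy passes or fails depending on the criterion, which is why the text leaves the choice to the design.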

FIG. 4 shows an example in which the student model 420 is trained based on the output data of the selected teacher model 2, the output data of the selected teacher model 3, and the output data of the student model 420.

FIG. 5 is a diagram illustrating an example of a training process that uses a classifier layer of a student model, according to an embodiment.

Referring to FIG. 5, the student model 500 may include an input layer 510, a hidden layer 520, an output layer 530, and a classifier layer 540. The student model 500 is trained using input data 550 and output data 560, where the output data 560 may be the data that at least one teacher model, selected from the plurality of teacher models, outputs for the input data 550.

While FIGS. 2 to 4 focus on training the student model 500 using the output layer 530, the model learning apparatus according to an embodiment may train the student model 500 using not only the output layer 530 but also the classifier layer 540 derived from the hidden layer 520.

The hidden layer 520 groups the hidden nodes of the neural network by level and may be located between the input layer 510 and the output layer 530. For example, the hidden layer 520 may be the convolution filters or a fully connected layer of a convolutional neural network (CNN), or any of various types of filters or layers grouped by a particular function or characteristic.

The classifier layer 540 may be a layer derived from the hidden layer 520. Like the output layer 530, the classifier layer 540 may analyze the values received from the hidden layer 520 from which it is derived and output data corresponding to predetermined elements. For convenience of explanation, the training of the student model 500 is described below with reference to classifier layer j (540-j); the description applies equally to the remaining classifier layers.

The model learning apparatus may train the student model 500 by additionally using the loss between the output data of classifier layer j (540-j) and the output data 560 of the selected teacher model. The details described with reference to Equations 1 to 3 apply to this loss.

The calculated loss may be propagated back, through backpropagation, to hidden layer i (520-i), from which classifier layer j is derived. Hidden layer i (520-i) may update its connection weights based on the loss propagated from the derived classifier layer j (540-j) and the loss propagated from the upper hidden layer i+1, and may pass the sum of the two losses down to the lower hidden layer i-1.
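
The gradient bookkeeping at hidden layer i can be illustrated with a toy linear network in Python. Assumptions for this sketch: all layers are linear, both losses are squared errors, everything above hidden layer i on the output path is collapsed into a single matrix `W_up`, and all names are hypothetical. It shows that the gradient reaching hidden layer i is the sum of a contribution from the output path and a contribution from the derived classifier layer j, and that this sum both drives the weight update of layer i and is what gets passed on to layer i-1.

```python
def matvec(W, v):
    """Matrix-vector product W @ v for lists of lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matTvec(W, g):
    """Transposed product W^T @ g."""
    return [sum(W[r][c] * g[r] for r in range(len(W))) for c in range(len(W[0]))]


def total_loss(W_i, W_up, W_cls, x, t_out, t_cls):
    """First loss (output path) plus second loss (classifier layer j)."""
    h = matvec(W_i, x)
    first = 0.5 * sum((y - t) ** 2 for y, t in zip(matvec(W_up, h), t_out))
    second = 0.5 * sum((y - t) ** 2 for y, t in zip(matvec(W_cls, h), t_cls))
    return first + second


def backward_hidden_i(W_i, W_up, W_cls, x, t_out, t_cls):
    h = matvec(W_i, x)
    r_out = [y - t for y, t in zip(matvec(W_up, h), t_out)]
    r_cls = [y - t for y, t in zip(matvec(W_cls, h), t_cls)]
    from_upper = matTvec(W_up, r_out)        # gradient arriving from the layer i+1 side
    from_classifier = matTvec(W_cls, r_cls)  # gradient arriving from classifier layer j
    g_h = [a + b for a, b in zip(from_upper, from_classifier)]
    grad_W_i = [[g * xv for xv in x] for g in g_h]  # used to update layer i
    grad_to_lower = matTvec(W_i, g_h)               # passed down to layer i-1
    return grad_W_i, grad_to_lower


W_i = [[0.5, -0.2], [0.1, 0.3]]
W_up = [[1.0, 0.5]]
W_cls = [[-0.4, 0.8]]
x, t_out, t_cls = [1.0, 2.0], [0.2], [0.5]
grad_W_i, grad_to_lower = backward_hidden_i(W_i, W_up, W_cls, x, t_out, t_cls)
print(grad_W_i, grad_to_lower)
```

A finite-difference check on `total_loss` confirms that `grad_W_i` is the gradient of the combined first and second losses, not of either one alone.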

When the student model 500 is trained additionally using classifier layer j (540-j), the losses at the output layer 530 and at classifier layer j (540-j) may be calculated using different methods. Hereinafter, the loss at the output layer 530 is referred to as the first loss, and the loss at classifier layer j (540-j) as the second loss. The output data of the selected teacher model used to calculate the first loss is referred to as the first output data, and the output data of the selected teacher model used to calculate the second loss as the second output data.

The model learning apparatus may calculate the first loss and the second loss using different methods. For instance, the model learning apparatus may use different selected teacher models to provide the first output data and the second output data: it may calculate the first loss using first output data from selected teacher model 1, and the second loss using second output data from selected teacher model 3.

Alternatively, even when the first output data and the second output data come from the same selected teacher models, the model learning apparatus may apply different weights to the first output data and the second output data. For example, when teacher model 1 and teacher model 3 are selected to calculate the first loss and the second loss, the model learning apparatus may assign a larger weight to the output data of teacher model 1 when calculating the first loss, and a larger weight to the output data of teacher model 3 when calculating the second loss. In addition, among the output layer 530 and the classifier layers 540, the model learning apparatus may assign a higher weight to the correct answer data for layers closer to the output layer 530 when calculating the loss.

Alternatively, the model learning apparatus may set the initial weights of the output layer 530 and of classifier layer j (540-j) differently. The model learning apparatus may set the initial weights of classifier layer j (540-j) to the initial weights of the selected teacher model, among the selected at least one teacher model, whose input size is most similar to the size of the data input to classifier layer j (540-j). Likewise, the model learning apparatus may set the initial weights of the output layer 530 to the initial weights of the selected teacher model whose input size is most similar to the size of the data input to the output layer 530. For example, if the data input to classifier layer j (540-j) (e.g., an input feature map) has a size of 128, the initial weights of the selected teacher model with the most similar input size may be used as the initial weights of classifier layer j (540-j).
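
The size-matching rule for initial weights can be sketched in Python as follows. The dictionary layout (`name`, `input_size`, `init_weights`) and the function name are hypothetical, and the sketch assumes the matched teacher's weights already have a shape compatible with the student layer; reconciling mismatched shapes is not covered.

```python
import copy


def init_from_closest_teacher(layer_input_size, selected_teachers):
    """Return the name and a copy of the initial weights of the selected teacher
    whose input size is closest to `layer_input_size`."""
    closest = min(selected_teachers,
                  key=lambda t: abs(t["input_size"] - layer_input_size))
    return closest["name"], copy.deepcopy(closest["init_weights"])


# Hypothetical selected teachers; the weight lists are placeholders.
selected = [
    {"name": "teacher1", "input_size": 64, "init_weights": [[0.1, 0.2]]},
    {"name": "teacher3", "input_size": 256, "init_weights": [[0.3, 0.4]]},
    {"name": "teacher5", "input_size": 120, "init_weights": [[0.5, 0.6]]},
]
name_j, w_j = init_from_closest_teacher(128, selected)      # classifier layer j
name_out, w_out = init_from_closest_teacher(256, selected)  # output layer
print(name_j, name_out)  # → teacher5 teacher3
```

The deep copy keeps the student layer from sharing, and later overwriting, the teacher's weight objects.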

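For example, the loss at classifier layer j (540-j) may be calculated as follows.

$$L_j = \alpha\, f\big(P_{Teacher(i)}, P_{Student_j}\big) + \beta\, f\big(P_{Teacher(l)}, P_{Student_j}\big) + \gamma\, f\big(P_{GT}, P_{Student_j}\big) \qquad \text{(Equation 4)}$$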

In Equation 4 above, $L_j$ denotes the loss function for calculating the loss at classifier layer j (540-j); $P_{Teacher(l)}$ denotes the output data of selected teacher model l; and $P_{Student_j}$ denotes the output data of classifier layer j (540-j) for the input data 550. $f(P_{Teacher(i)}, P_{Student_j})$ denotes the cross entropy, softmax function, or Euclidean distance between the output data $P_{Student_j}$ of classifier layer j (540-j) and the output data $P_{Teacher(i)}$ of selected teacher model i; $f(P_{Teacher(l)}, P_{Student_j})$ denotes the same quantity between $P_{Student_j}$ and the output data $P_{Teacher(l)}$ of selected teacher model l; and $f(P_{GT}, P_{Student_j})$ denotes the same quantity between $P_{Student_j}$ and the correct answer data $P_{GT}$. $\alpha$, $\beta$, and $\gamma$ are constants denoting the weights applied to the output data $P_{Teacher(i)}$ of selected teacher model i, the output data $P_{Teacher(l)}$ of selected teacher model l, and the correct answer data $P_{GT}$, respectively.

The training of the student model 500 has been described above with reference to the output layer 530 and classifier layer j (540-j), but the same applies to the remaining classifier layers included in the classifier layer 540.

In other words, the model learning apparatus may calculate the loss at each of a plurality of classifier layers using a different method for each layer. For example, the model learning apparatus may calculate the loss at each classifier layer using output data from a different selected teacher model, may set the weights applied to the output data of the selected teacher models differently for each classifier layer, or may apply different initial weights to each classifier layer.
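
The per-layer configuration described above can be sketched in Python, with each layer's loss written in the form of Equation 4 and cross entropy used for f. This is a hypothetical illustration: the configuration keys, the layer names `output` and `classifier_j`, and the assumption that the first loss and all second losses are simply summed into one objective are not taken from the source. Following the preceding paragraphs, the output layer here weights teacher1 and the correct answer data more heavily, while classifier layer j weights teacher3 more heavily.

```python
from math import log


def cross_entropy(target, pred, eps=1e-12):
    """Cross entropy of the predicted distribution `pred` against `target`."""
    return -sum(t * log(max(p, eps)) for t, p in zip(target, pred))


def layer_loss(p_layer, teacher_outputs, teacher_weights, p_gt, gt_weight):
    """Equation 4 style: weighted teacher terms plus a correct-answer term."""
    loss = sum(w * cross_entropy(teacher_outputs[name], p_layer)
               for name, w in teacher_weights.items())
    return loss + gt_weight * cross_entropy(p_gt, p_layer)


def combined_loss(layer_outputs, configs, teacher_outputs, p_gt):
    """Sum of the first loss (output layer) and the second losses (classifier layers)."""
    return sum(layer_loss(layer_outputs[name], teacher_outputs,
                          cfg["teacher_weights"], p_gt, cfg["gt_weight"])
               for name, cfg in configs.items())


configs = {
    # The layer closest to the output gets the larger correct-answer weight.
    "output":       {"teacher_weights": {"teacher1": 0.5, "teacher3": 0.2}, "gt_weight": 0.3},
    "classifier_j": {"teacher_weights": {"teacher1": 0.2, "teacher3": 0.6}, "gt_weight": 0.2},
}
teacher_outputs = {"teacher1": [0.9, 0.1], "teacher3": [0.6, 0.4]}
p_gt = [1.0, 0.0]
layer_outputs = {"output": [0.8, 0.2], "classifier_j": [0.7, 0.3]}
print(combined_loss(layer_outputs, configs, teacher_outputs, p_gt))
```

Using a different teacher per layer is the special case where each layer's `teacher_weights` contains a single entry.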

FIG. 6 is a diagram illustrating a model learning method according to an embodiment.

The model learning method according to an embodiment may be performed by a processor included in the model learning apparatus.

In step 610, the model learning apparatus selects at least one of a plurality of teacher models. The teacher models may have different initial weights or different neural network structures, may use different hyperparameters, or may be composed of different ensembles.

The model learning apparatus may select at least one of the plurality of teacher models based on their accuracies. For example, the model learning apparatus may select the teacher model having the highest accuracy, or may select the teacher models whose accuracy is equal to or higher than a threshold value. Alternatively, the model learning apparatus may rank the teacher models by accuracy and select a predetermined number k of teacher models in descending order of accuracy.
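
The three accuracy-based rules can be sketched together in Python as follows. The function name, mode labels, and accuracy values are hypothetical; how accuracy is measured (for example, on a held-out validation set) is left open, as in the text.

```python
def select_by_accuracy(accuracy, mode="best", threshold=None, k=None):
    """accuracy: dict mapping teacher name -> measured accuracy."""
    ranked = sorted(accuracy, key=accuracy.get, reverse=True)
    if mode == "best":
        return ranked[:1]
    if mode == "threshold":
        return [t for t in ranked if accuracy[t] >= threshold]
    if mode == "top_k":
        return ranked[:k]
    raise ValueError(f"unknown mode: {mode}")


accuracy = {"teacher1": 0.81, "teacher2": 0.93, "teacher3": 0.88, "teacher4": 0.75}
print(select_by_accuracy(accuracy, "best"))                      # → ['teacher2']
print(select_by_accuracy(accuracy, "threshold", threshold=0.8))  # → ['teacher2', 'teacher3', 'teacher1']
print(select_by_accuracy(accuracy, "top_k", k=2))                # → ['teacher2', 'teacher3']
```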

In addition, the model learning apparatus may select two or more of the plurality of teacher models based on the correlation between the output data of the teacher models for the input data, such that the correlation between the output data of the selected teacher models is lower than a threshold value.

The model learning apparatus may also receive, from a user, an input that heuristically selects at least one of the plurality of teacher models.

In step 620, the model learning apparatus trains the student model based on the output data of the selected teacher model for the input data. The model learning apparatus may train the student model based on the loss between the output data of the student model and the output data of the selected teacher model for the input data.

The model learning apparatus according to an embodiment may train the student model by additionally using the output data of the student model. The model learning apparatus may also train the student model by additionally using the correct answer data corresponding to the input data.

The model learning apparatus may train the student model based on (i) a first loss between the output data of the student model for the input data and first output data of the selected at least one teacher model, and (ii) a second loss between the output data of a classifier layer derived from a hidden layer of the student model and second output data of the selected at least one teacher model.

Here, the first loss and the second loss may be determined using different methods: either by using first output data and second output data that come from different selected teacher models, or by applying different weights to the first output data and the second output data. In addition, the initial weights of the classifier layer may be set to the initial weights of the selected teacher model, among the selected at least one teacher model, whose input size is most similar to the size of the data input to the classifier layer.

The model learning apparatus according to an embodiment may determine the structure of the student model based on the size of the data input to the selected teacher model. The model learning apparatus may modify the structure of the student model based on the difference between the size of the data input to the selected teacher model and the size of the data input to the student model. For example, if the data input to the selected teacher model is 64 × 64 pixels and the data input to the student model is 32 × 32 pixels, the structure of the student model may be modified so that the student model has a receptive field identical or similar to that of the selected teacher model.

The model learning apparatus may change the number of nodes in the input layer of the student model, the number of hidden layers, the number of nodes in each hidden layer, and the like, based on the size of the data input to the selected teacher model.
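
One way to reason about this structural change is to compute receptive fields. The Python sketch below uses the standard recurrence for a stack of convolution or pooling layers, in which each layer grows the receptive field by (kernel size - 1) times the accumulated stride. The layer lists are hypothetical, and treating a "similar receptive field" as one that covers a similar fraction of the input image is an assumption of this sketch rather than a statement from the source: removing one stride-2 layer from the teacher's stack roughly halves the receptive field in pixels, which suits an input of half the resolution.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, input side first."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # growth scaled by the accumulated stride
        jump *= stride
    return rf


# Hypothetical stacks: teacher on 64x64 input, student on 32x32 input.
teacher_layers = [(3, 1), (3, 2), (3, 2), (3, 2), (3, 1)]
student_layers = [(3, 1), (3, 2), (3, 2), (3, 1)]  # one stride-2 layer removed

for name, layers, size in [("teacher", teacher_layers, 64), ("student", student_layers, 32)]:
    rf = receptive_field(layers)
    print(name, rf, round(rf / size, 3))
# → teacher 33 0.516
# → student 17 0.531
```

Keeping the teacher's stack unchanged on the 32 × 32 input would give a 33-pixel receptive field, larger than the whole image.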

In step 630, the model learning apparatus may determine whether the student model satisfies a predetermined condition.

For example, the model learning apparatus may determine whether the accuracy of the student model is equal to or higher than a threshold value. If the accuracy of the student model is lower than the threshold value, the model learning apparatus may perform steps 610 and 620 again. Each time step 610 is performed again, at least one of the plurality of teacher models may be selected variably according to a predetermined criterion. For example, when step 610 is performed again, at least one teacher model may be selected from among the teacher models other than the teacher model used to train the student model in the preceding step 620. This effectively prevents the student model from overfitting to any single teacher model. In addition, in step 620, the model learning apparatus may control the weights applied to the output data of the selected teacher model and to the correct answer data according to the training progress of the student model (e.g., its accuracy). For example, as the training of the student model progresses, the model learning apparatus may set the weight for the correct answer data larger than the weight for the output data of the selected teacher model. Conversely, if the accuracy of the student model is equal to or higher than the threshold value, the model learning apparatus may end the training of the student model.

As another example, the model learning apparatus may determine whether the number of training iterations of the student model has reached a preset number of iterations. If it has not, the model learning apparatus may perform steps 610 and 620 again; in this case as well, at least one of the plurality of teacher models may be selected variably according to a predetermined criterion each time step 610 is performed again. Once the number of training iterations reaches the preset number, the model learning apparatus may end the training of the student model.
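
Steps 610 to 630 can be put together as a loop. The Python sketch below is a hypothetical illustration: `train_step` and `evaluate` are caller-supplied callbacks standing in for step 620 and the accuracy measurement, teachers are drawn at random from those not used in the previous round, and using the student's current accuracy directly as the weight for the correct answer data is one assumed schedule among many. Both stopping criteria from the text are included: an accuracy threshold and a preset number of iterations.

```python
import random


def train_student(teachers, train_step, evaluate,
                  acc_threshold=0.9, max_iters=10, k=2, seed=0):
    rng = random.Random(seed)
    previous, history, acc = set(), [], 0.0
    for _ in range(max_iters):                       # iteration-count criterion
        # Step 610: prefer teachers not used in the previous round.
        candidates = [t for t in teachers if t not in previous] or list(teachers)
        selected = rng.sample(candidates, min(k, len(candidates)))
        # Assumed schedule: weight on the correct answer data grows with accuracy.
        gt_weight = min(max(acc, 0.0), 1.0)
        train_step(selected, 1.0 - gt_weight, gt_weight)  # step 620
        history.append(selected)
        previous = set(selected)
        acc = evaluate()                              # step 630
        if acc >= acc_threshold:                      # accuracy criterion
            break
    return history


# Toy run with stub callbacks: accuracy rises 0.5 -> 0.7 -> 0.85 -> 0.93.
accs = iter([0.5, 0.7, 0.85, 0.93])
calls = []
history = train_student(["t1", "t2", "t3", "t4"],
                        train_step=lambda sel, tw, gw: calls.append(gw),
                        evaluate=lambda: next(accs))
print(len(history), calls)  # → 4 [0.0, 0.5, 0.7, 0.85]
```

With four teachers and k = 2, consecutive rounds use disjoint pairs, so no single teacher supervises the student twice in a row.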

FIG. 7 is a diagram illustrating a data recognition method according to an embodiment.

The data recognition method according to an embodiment may be performed by a processor included in a data recognition apparatus.

In step 710, the data recognition apparatus receives target data to be recognized. The target data is data to be recognized through a previously trained model and may include, for example, image data, video data, voice data, time-series data, sensor data, or various combinations thereof.

In step 720, the data recognition apparatus recognizes the target data using the previously trained model. The model may be a neural network capable of detecting, classifying, or clustering objects in the target data.

The model is trained by selecting at least one of a plurality of teacher models and using the output data of the selected at least one teacher model for input data. For example, the model may be trained based on the loss between the output data of the student model for the input data and the output data of the selected at least one teacher model.

Here, the at least one teacher model may be selected based on the accuracies of the plurality of teacher models or on the correlation between the output data of the plurality of teacher models for the input data.

The model may also be trained based on (i) a first loss between the output data of the model for the input data and first output data of the selected at least one teacher model, and (ii) a second loss between the output data of a classifier layer derived from a hidden layer of the model and second output data of the selected at least one teacher model. The first loss and the second loss may be determined using different methods.

The details described above with reference to FIGS. 1 to 6 apply as-is to the process of training the model for recognizing the target data, so a more detailed description is omitted.

FIG. 8 is a diagram illustrating a model learning apparatus according to an embodiment.

Referring to FIG. 8, the model learning apparatus 800 includes a processor 810 and a memory 820. The model learning apparatus 800 trains a neural network for data recognition and may be implemented with a single processor or multiple processors.

The processor 810 selects at least one of a plurality of teacher models and trains a student model based on the output data of the selected at least one teacher model for input data. For example, the processor 810 may train the student model based on the loss between the output data of the student model for the input data and the output data of the selected at least one teacher model.

The processor 810 may select at least one of the plurality of teacher models based on the accuracies of the teacher models or on the correlation between their output data for the input data.

The processor 810 may train the student model based on (i) a first loss between the output data of the student model for the input data and first output data of the selected at least one teacher model, and (ii) a second loss between the output data of a classifier layer derived from a hidden layer of the student model and second output data of the selected at least one teacher model. The first loss and the second loss may be determined using different methods.

The input data is training data for the student model and may include, for example, image data, voice data, or various combinations thereof.

The memory 820 stores the student model trained by the processor 810.

FIG. 9 is a diagram illustrating a data recognition apparatus according to an embodiment.

The data recognition apparatus 900 includes a receiver 910 and a processor 920. The data recognition apparatus 900 can recognize received target data using a previously trained model, and may be mounted in various computing devices and/or systems, such as a smartphone, a tablet computer, a laptop computer, a desktop computer, a television, a wearable device, a security system, or a smart home system.

The receiver 910 receives target data to be recognized.

The processor 920 recognizes the target data using a previously trained model. The model is trained by selecting at least one of a plurality of teacher models and using the output data of the selected at least one teacher model for input data. For example, the model may be trained based on the loss between the output data of the student model for the input data and the output data of the selected at least one teacher model.

According to the embodiments, the student model is trained using at least one teacher model selected from a plurality of teacher models, which can increase the training speed of the student model while effectively improving its accuracy.

According to the embodiments, at least one of the plurality of teacher models is selected variably according to a predetermined criterion, which can prevent the student model from overfitting to a particular teacher model.

By computing the loss at the output layer of the student model and the loss at the classification layer of the student model using different methods, the embodiments can effectively update the connection weights in the neural network even when the student model is a deep neural network.
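As an illustration of two losses "based on different methods", the sketch below assumes cross-entropy for the first loss (student output layer against the first teacher output) and squared error for the second loss (raw outputs of a classification layer attached to a hidden layer, against the second teacher output). The source does not name either method; these choices, the scalar weights `w1` and `w2` on the two losses, and the function names are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over output-layer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: Sequence[float], predicted: Sequence[float],
                  eps: float = 1e-12) -> float:
    """First method (assumed): cross-entropy between two distributions."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def squared_error(target: Sequence[float], predicted: Sequence[float]) -> float:
    """Second method (assumed): sum of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(target, predicted))


def two_level_loss(output_logits: Sequence[float],
                   first_teacher_output: Sequence[float],
                   cls_outputs: Sequence[float],
                   second_teacher_output: Sequence[float],
                   w1: float = 1.0, w2: float = 1.0) -> Tuple[float, float, float]:
    """Return (first loss, second loss, weighted total)."""
    first = cross_entropy(first_teacher_output, softmax(output_logits))
    second = squared_error(second_teacher_output, cls_outputs)
    return first, second, w1 * first + w2 * second


# Made-up values: a 3-class output layer and a 3-unit classification layer.
print(two_level_loss([1.0, 0.2, -0.5], [0.6, 0.3, 0.1],
                     [0.4, 1.1, -0.2], [0.5, 1.0, 0.0], w2=0.5))
```

Because the second loss is attached to a hidden layer, its gradient reaches the earlier layers through a shorter path than the output-layer loss does, which is one plausible reading of why the text says deep student networks can still be updated effectively.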

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatuses, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the OS. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described as being used; however, those skilled in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or command the processing device independently or collectively. The software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, those skilled in the art may apply various technical modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from that of the described method, and/or the components of the described systems, structures, apparatuses, circuits, and the like are coupled or combined in a form different from that of the described method, or are replaced or substituted by other components or equivalents.

Claims

A model training method, comprising:
selecting at least one of a plurality of teacher models; and
training a student model based on output data of the selected at least one teacher model for input data.

The method according to claim 1,
wherein the selecting of at least one of the plurality of teacher models comprises:
selecting at least one of the plurality of teacher models based on accuracies of the plurality of teacher models.

The method according to claim 1,
wherein the selecting of at least one of the plurality of teacher models comprises:
selecting at least one of the plurality of teacher models based on a degree of correlation between output data of the plurality of teacher models for the input data.

The method of claim 3,
wherein the selecting of at least one of the plurality of teacher models comprises:
selecting at least one of the plurality of teacher models such that a degree of correlation between output data of the selected teacher models is lower than a threshold value.

The method according to claim 1,
wherein the training of the student model comprises:
training the student model further using output data of the student model.

The method according to claim 1,
wherein the selecting of at least one of the plurality of teacher models and the training of the student model
are repeatedly performed until a predetermined condition is satisfied.

The method according to claim 1,
wherein the training of the student model comprises training the student model based on:
a first loss between output data of the student model for the input data and first output data of the selected at least one teacher model; and
a second loss between output data of a classification layer derived from a hidden layer of the student model and second output data of the selected at least one teacher model,
wherein the first loss and the second loss are determined using different methods.

The method of claim 7,
wherein the first loss and the second loss are determined using first output data and second output data output from different teacher models among the selected at least one teacher model.

The method of claim 7,
wherein the first loss and the second loss are determined by applying different weights to the first output data and the second output data.

The method of claim 7,
wherein an initial weight of the classification layer is set to an initial weight of the teacher model, among the selected at least one teacher model, that receives input data most similar in size to the data input to the classification layer.

The method according to claim 1,
wherein the training of the student model comprises:
training the student model further using correct answer data corresponding to the input data.

The method according to claim 1,
wherein the plurality of teacher models have different initial weights, different neural network structures, different hyperparameters, or different ensembles.

The method according to claim 1,
wherein a structure of the student model is determined based on a size of data input to the selected at least one teacher model.

A data recognition method, comprising:
receiving target data to be recognized; and
recognizing the target data using a pre-trained model,
wherein the model is trained by selecting at least one of a plurality of teacher models and using output data of the selected at least one teacher model for input data.

The method of claim 14,
wherein the selected at least one teacher model is selected based on accuracies of the plurality of teacher models or a degree of correlation between output data of the plurality of teacher models for the input data.

The method of claim 14,
wherein the model is trained based on:
a first loss between output data of the model for the input data and first output data of the selected at least one teacher model; and
a second loss between output data of a classification layer derived from a hidden layer of the model and second output data of the selected at least one teacher model,
wherein the first loss and the second loss are determined using different methods.

A computer-readable recording medium having recorded thereon a program for executing the method according to any one of claims 1 to 16.

A model training apparatus, comprising:
a processor configured to train a student model; and
a memory configured to store the trained student model,
wherein the processor is configured to select at least one of a plurality of teacher models and to train the student model based on output data of the selected at least one teacher model for input data.

The apparatus of claim 18,
wherein the processor is configured to select at least one of the plurality of teacher models based on accuracies of the plurality of teacher models or a degree of correlation between output data of the plurality of teacher models for the input data.

The apparatus of claim 18,
wherein the processor is configured to train the student model based on:
a first loss between output data of the student model for the input data and first output data of the selected at least one teacher model; and
a second loss between output data of a classification layer derived from a hidden layer of the student model and second output data of the selected at least one teacher model,
wherein the first loss and the second loss are determined using different methods.