KR101947780B1

KR101947780B1 - Method and system for downsizing neural network

Info

Publication number: KR101947780B1
Application number: KR1020170053681A
Authority: KR
Inventors: 김태호; 정대웅; 이동욱; 이슬
Original assignee: 주식회사 노타
Priority date: 2017-04-26
Filing date: 2017-04-26
Publication date: 2019-02-15
Also published as: KR20180119939A

Abstract

A neural network downsizing method is disclosed. According to an embodiment of the present invention, the method includes: training a large neural network using first training data, for the purpose of downsizing the neural network; generating second training data for transferring knowledge to a small neural network; and training the small neural network using the second training data.

Description

[0001] The present invention relates to a method and system for downsizing a neural network.

More specifically, the present invention relates to a method of downsizing a neural network that performs a particular task, in which a large neural network is trained in a manner optimized for downsizing, and a small neural network is then trained using the training results of the large neural network.

A neural network is a network modeled on the way neurons are connected in the human brain. Neurons, as mathematical models, are interconnected to form the network, which may also be called an artificial neural network to distinguish it from a biological neural network.

As the range of applications of artificial intelligence grows, neural networks, one of the algorithms used to implement artificial intelligence, are also being applied in many technical fields.

Neural networks can carry out highly difficult tasks by performing very complex computations. However, implementing a neural network requires a complex structure and algorithm, and therefore at least a certain amount of memory and computing power. As the use of neural networks spreads, a method of reducing the size of a neural network for personalized data processing is proposed.

When reducing the size of a neural network, it is important to minimize the resulting performance degradation. A training method that improves the performance of the reduced-size neural network is therefore needed.

To solve the technical problem described above, a neural network downsizing method according to an embodiment of the present invention includes: training a large neural network using first training data for neural network downsizing, wherein the large neural network includes k hidden layers, each of the k hidden layers includes M_k nodes, and the first training data is a data set for training a neural network to provide answers for a specific task and includes first input data for the task and first label data indicating the correct answers for the first input data; generating second training data for knowledge transfer to a small neural network; and training the small neural network using the second training data, wherein the small neural network includes k hidden layers, each of which includes N_k nodes, where N_k is smaller than M_k.
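The structural constraint in this paragraph (same number k of hidden layers, N_k < M_k in every layer) can be checked mechanically. The sketch below is illustrative only: it assumes fully connected layers with one bias per node, uses the 8-value input and 4-value output of the FIG. 3 example, and the hidden widths and function names are hypothetical. It also reports the parameter counts, which is where the size reduction shows up.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected net with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def check_downsizing_plan(n_inputs, large_hidden, small_hidden, n_outputs):
    """Check: same number k of hidden layers, and N_k < M_k for every layer k.

    Returns (large parameter count, small parameter count).
    """
    if len(large_hidden) != len(small_hidden):
        raise ValueError("both networks must have the same number k of hidden layers")
    for k, (m_k, n_k) in enumerate(zip(large_hidden, small_hidden), start=1):
        if not n_k < m_k:
            raise ValueError(f"hidden layer {k}: N_k={n_k} must be smaller than M_k={m_k}")
    large = dense_param_count([n_inputs, *large_hidden, n_outputs])
    small = dense_param_count([n_inputs, *small_hidden, n_outputs])
    return large, small


# Hypothetical widths: three hidden layers of 9 nodes, downsized to 5 nodes each.
large_params, small_params = check_downsizing_plan(8, [9, 9, 9], [5, 5, 5], 4)
```

With these assumed widths the large network has 301 parameters and the small one 129.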

In the neural network downsizing method according to an embodiment of the present invention, training the large neural network using the first training data includes: processing the input data by selectively using N_k of the M_k nodes in each of the hidden layers; outputting first output data as the result of processing the input data; and comparing the first output data with the label data to adjust the parameters of the large neural network.

In the neural network downsizing method according to an embodiment of the present invention, the second training data is a data set for training a neural network to provide answers for a specific task, and includes a plurality of data items, each containing second input data for the task and second label data indicating the correct answer for that input data. The second input data of the second training data corresponds to the first input data of the first training data, and the second label data of the second training data corresponds to the first output data output from the large neural network.

In the neural network downsizing method according to an embodiment of the present invention, the N_k nodes used in each of the hidden layers are selected randomly for each hidden layer and/or for each data pair.
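The random selection of N_k out of M_k nodes, redrawn for every data pair, might look like the following minimal sketch. The function names, the list-of-lists weight layout, and the choice to let unselected nodes output 0 (so they contribute nothing downstream and receive no weight update) are assumptions for illustration, not details given in the source.

```python
import random


def sample_active_nodes(large_hidden, small_hidden, rng):
    """For one data pair, pick N_k of the M_k nodes in every hidden layer at random."""
    return [sorted(rng.sample(range(m_k), n_k))
            for m_k, n_k in zip(large_hidden, small_hidden)]


def masked_layer(inputs, weights, biases, active, activation):
    """Compute one hidden layer using only the active nodes; the others output 0."""
    outputs = [0.0] * len(biases)
    for j in active:
        z = sum(w * x for w, x in zip(weights[j], inputs)) + biases[j]
        outputs[j] = activation(z)
    return outputs


rng = random.Random(0)  # seeded only so the example is reproducible
# A fresh selection would be drawn like this for every data pair during training.
active_per_layer = sample_active_nodes([9, 9, 9], [5, 5, 5], rng)
```

Because an inactive node's output is exactly 0, the gradient through it is 0 as well, so in this sketch its incoming weights are left unchanged for that data pair.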

In the neural network downsizing method according to an embodiment of the present invention, training the small neural network using the second training data includes: processing the second input data to output second output data; and comparing the second output data with the second label data to adjust the parameters of the small neural network.

In the neural network downsizing method according to an embodiment of the present invention, the first label data is one-hot encoded data that includes a plurality of binary values indicating whether each answer is correct, and the second label data includes a plurality of probability values indicating the probability that each answer is correct.

To solve the technical problem described above, a computing system for performing neural network downsizing according to an embodiment of the present invention includes a memory that stores data, and a processor, connected to the memory, that operates neural networks and performs neural network downsizing. The processor trains a large neural network using first training data for neural network downsizing, generates second training data for knowledge transfer to a small neural network, and trains the small neural network using the second training data. The large neural network includes k hidden layers, each of which includes M_k nodes. The first training data is a data set for training a neural network to provide answers for a specific task, and includes first input data for the task and first label data indicating the correct answers for the first input data. The small neural network includes k hidden layers, each of which includes N_k nodes, where N_k is smaller than M_k.

According to the present invention, the system can reduce the size of a neural network by reducing the number of nodes in its hidden layers. By passing knowledge pre-trained in a large neural network to a small neural network, the system can train the small neural network quickly and efficiently and prepare it to perform the task. While training the large neural network, the present invention trains a network of the target size in an optimized manner, so that the knowledge gained from training the large neural network can be applied effectively to the small neural network. The present invention can therefore reduce the size of the neural network used to perform a task while minimizing the performance degradation caused by the reduction, and the system can train the reduced-size neural network efficiently so that it performs well.

The effects of the present invention are described further below, together with its configuration.

FIG. 1 shows a neural network model and the operation of each node according to an embodiment of the present invention.
FIG. 2 shows a neural network according to an embodiment of the present invention.
FIG. 3 shows a learning process of a neural network according to an embodiment of the present invention.
FIG. 4 shows a teacher model and a student model for transferring a neural network model according to an embodiment of the present invention.
FIG. 5 shows a method of generating training data using a teacher model according to an embodiment of the present invention.
FIG. 6 shows a method of training a target neural network using new training data according to an embodiment of the present invention.
FIG. 7 shows a method of training a neural network model according to another embodiment of the present invention.
FIG. 8 shows a neural network downsizing method according to an embodiment of the present invention.
FIG. 9 shows a learning method for a large neural network according to an embodiment of the present invention.
FIG. 10 shows a method of generating a new data set for training a small neural network according to an embodiment of the present invention.
FIG. 11 shows the learning of a small neural network according to an embodiment of the present invention.
FIG. 12 shows a neural network downsizing method according to an embodiment of the present invention.
FIG. 13 shows a neural network downsizing system according to an embodiment of the present invention.

Preferred embodiments of the present invention are described in detail below, with examples illustrated in the accompanying drawings. The following detailed description, given with reference to the accompanying drawings, is intended to explain preferred embodiments of the present invention rather than to present the only embodiments that can be implemented according to the invention. It includes specific details to provide a thorough understanding of the present invention, but the invention does not require all of these details. The embodiments described below need not each be used on their own: several or all of them may be used together, and particular embodiments may be used in combination.

Most terms used in the present invention are chosen from general terms widely used in the field, but some have been chosen arbitrarily by the applicant, and their meanings are explained in the following description where necessary. The invention should therefore be understood on the basis of the intended meaning of each term rather than its mere name or literal meaning.

The present invention relates to deep neural networks, which may be referred to simply as neural networks in this specification.

In the present invention, a computing system may implement a neural network and perform data operations. A computing system may refer to the computing and processing part of any device capable of data processing, such as a computer, a mobile phone, a smart hub, a notebook computer, or an IoT device. In this specification, a computing system may be referred to simply as a system. The computing system may carry out the present invention by executing data, such as an application, stored in a storage device.

FIG. 1 shows a neural network model and the operation of each node according to an embodiment of the present invention.

FIG. 1(a) shows a neural network according to an embodiment of the present invention. The neural network includes neurons, and each circle in FIG. 1(a) represents a neuron. A set of neurons is called a layer, and neurons may be referred to below as nodes. The layer that receives the input data is called the input layer, and the layer that outputs the result of the neural network's computation is called the output layer.

FIG. 1(b) shows the operation of a neuron in a neural network. The neuron applies weights (w_0, w_1) to its inputs (i_0, i_1) and computes an activation function f(x) to produce its output. As with a real nerve cell, the activation function may be one whose value rises sharply once the stimulus exceeds a certain level. Because a linear function can reduce the network's analytical capability, a nonlinear function may be used; for example, sigmoid, tanh, or ReLU may serve as the activation function. The activation function does not pass the sum of the input signals through unchanged, but converts it into an output signal; that is, it determines whether the neuron is activated by its weighted inputs.
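The single-neuron computation of FIG. 1(b), output = f(w_0·i_0 + w_1·i_1), can be written as the short sketch below. The function names are illustrative, and no bias term is included because FIG. 1(b) as described shows only inputs and weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def neuron(inputs, weights, activation):
    """Weighted sum of the inputs, passed through a nonlinear activation f(x)."""
    x = sum(i * w for i, w in zip(inputs, weights))
    return activation(x)


# With i = (1.0, 2.0) and w = (0.5, -0.25) the weighted sum is 0.0.
s = neuron([1.0, 2.0], [0.5, -0.25], sigmoid)  # sigmoid(0) = 0.5
t = neuron([1.0, 2.0], [0.5, -0.25], math.tanh)  # tanh(0) = 0.0
r = neuron([1.0, 2.0], [0.5, 0.5], relu)  # relu(1.5) = 1.5
```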

A neural network learns in order to perform a specific task; that is, the system can construct a neural network and train it to perform that task. For example, the task may be to determine from a photograph whether it shows a dog or a cat. Using a plurality of photographs and the correct-answer data (dog or cat) for each photograph, the system repeatedly processes the input data, compares the output data with the correct-answer data, and adjusts the weights. A neural network trained in this way is ready to perform the task. In this specification, data that includes inputs and correct answers for training a neural network may be referred to as a data set or training data.

FIG. 2 shows a neural network according to an embodiment of the present invention.

FIG. 2(a) shows a neural network according to an embodiment of the present invention, and FIG. 2(b) shows a data set for modeling the neural network.

In FIG. 2(a), the neural network includes an input layer (input), an output layer (output), and hidden layers (layer1 to layer3). The input layer passes the incoming signals/data to the next layer. The outputs of the nodes in the output layer may correspond to the final result of the neural network. The one or more layers between the input layer and the output layer are called hidden layers. A neural network with multiple hidden layers may be called a deep neural network, and the present invention is described using a deep neural network as an example.

A neural network can be modeled through learning, or training. Once a neural network has been modeled, it can be regarded as ready for the system to perform the desired task; that is, the modeled neural network can provide an inferred output for a given input.

A neural network can be modeled to perform a target task. A data set may be prepared to train the neural network to accomplish the target task, and such a data set may be referred to as training data. The data set includes a plurality of data pairs (Data Pair 1, Data Pair 2, Data Pair 3, ...), and each data pair includes input data and label data. The label data may indicate the single correct answer as one-hot encoded data; for example, the label data may be one-hot encoded data containing a plurality of binary values.

The training data may include a plurality of data pairs. Alternatively, depending on how the data is organized, the training data may consist of input data and label data stored separately. In that case the input data and the label data are matched to each other: the correct answer for the n-th input data corresponds to the n-th label data.
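A minimal sketch of the two layouts described above: a list of (input, one-hot label) pairs, and the same data split into two index-matched sequences. The input vector and label (0, 0, 1, 0) are taken from the FIG. 3 example; the variable and function names are illustrative.

```python
def one_hot(class_index, num_classes):
    """Label data: a binary vector with a single 1 marking the correct answer."""
    return [1 if c == class_index else 0 for c in range(num_classes)]


# Training data as a list of (input data, label data) pairs.
pairs = [
    ([1, 2, 0, 9, 5, 4, 7, 3], one_hot(2, 4)),  # FIG. 3 example: third of four classes
]

# The same data stored as two matched sequences: label n belongs to input n.
inputs, labels = (list(column) for column in zip(*pairs))
```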

The hidden layers of a neural network contain nodes, and the nodes have weights. The weights of the nodes can be determined and adjusted by learning during the training phase: the neural network adjusts the weights of each node so that the difference between the label data and the output data becomes smaller. The one or more weights of the nodes may be referred to as the parameters of the neural network. That is, the system/neural network can adjust the parameters of the neural network for each set of input data, output data, and label data, and this adjustment may be repeated as many times as there are input, output, and label data in the training data.

FIG. 3 shows a learning process of a neural network according to an embodiment of the present invention.

In the embodiment of FIG. 3, the system trains a neural network model by feeding a data set into it. That is, (1, 2, 0, 9, 5, 4, 7, 3) is input to the neural network, and the neural network outputs (0.1, 0.2, 0.6, 0.1). The system/neural network then compares this output with the label data (0, 0, 1, 0) for the input data, and can adjust the parameters of each layer so that the difference between the output data (0.1, 0.2, 0.6, 0.1) and the label data (0, 0, 1, 0) decreases.
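The source does not say how the output and label are compared. As one common choice, assumed here for illustration, the comparison can be a cross-entropy loss; with a softmax output layer, the gradient of that loss with respect to the output layer's pre-activation values is simply output minus label, which is the "difference" that parameter adjustment tries to shrink. The FIG. 3 numbers give:

```python
import math


def cross_entropy(label, output):
    """Penalty for assigning low probability to the answer(s) the label marks correct."""
    return -sum(t * math.log(p) for t, p in zip(label, output) if t > 0)


def output_error(label, output):
    """Per-class difference (output - label). With a softmax output layer and a
    cross-entropy loss, this is the gradient of the loss w.r.t. the logits."""
    return [p - t for t, p in zip(label, output)]


output = [0.1, 0.2, 0.6, 0.1]  # network output in FIG. 3
label = [0, 0, 1, 0]  # one-hot label in FIG. 3
loss = cross_entropy(label, output)  # -log(0.6), about 0.511
error = output_error(label, output)  # about [0.1, 0.2, -0.4, 0.1]
```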

The neural network repeatedly performs computation on the input training data, comparison of the output with the label, and parameter adjustment. Once learning has been completed for all of the training data, the neural network can be regarded as modeled for the particular task.
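The repeated process/compare/adjust cycle can be sketched with the smallest possible model: a single softmax layer trained by plain stochastic gradient descent on a cross-entropy loss. The model, loss, learning rate, epoch count, and two-example data set are all assumptions chosen for illustration; the source does not specify an optimizer or loss.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def forward(W, b, x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)])


def train(data, num_classes, epochs=200, lr=0.1):
    """Repeat over all data pairs: process input, compare with label, adjust parameters."""
    n_in = len(data[0][0])
    W = [[0.0] * n_in for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, label in data:
            out = forward(W, b, x)  # 1. process the input data
            err = [p - t for p, t in zip(out, label)]  # 2. compare output with label
            for j in range(num_classes):  # 3. adjust the parameters
                for i in range(n_in):
                    W[j][i] -= lr * err[j] * x[i]
                b[j] -= lr * err[j]
    return W, b


toy_data = [([1.0, 0.0], [1, 0]), ([0.0, 1.0], [0, 1])]  # hypothetical two-class data
W, b = train(toy_data, num_classes=2)
```

After training, the model assigns well over 90% probability to the labeled class for both toy inputs.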

However, the size of neural network that can be implemented may vary with the computing environment. In general, the more hidden layers a neural network has and the more nodes each hidden layer contains, the better it can be modeled, and a better-modeled neural network is more likely to give the correct answer for a particular task. However, many hidden layers and many nodes may require substantial computing power and storage space. Depending on the environment, it may therefore be necessary to transfer a trained neural network model to a smaller neural network. In this specification, the original neural network model that is trained for the purpose of model transfer is referred to as the large neural network model or teacher neural network model, and the target neural network model to which the model is transferred is referred to as the small neural network model or student neural network model. A neural network model may also be referred to simply as a neural network or a model.

A method of transferring a model learned by one neural network to another neural network is described below.

FIGS. 4 to 6 show a method of transferring a neural network model according to an embodiment of the present invention.

FIG. 4 shows a teacher model and a student model for transferring a neural network model according to an embodiment of the present invention.

In FIG. 4, model A is a model trained with training data as described with reference to FIG. 2, and model B is a model that has not yet been trained. Although FIG. 4 shows two models with the same number of layers, model A and model B may contain different numbers of layers or different numbers of nodes.

FIG. 5 shows a method of generating training data using a teacher model according to an embodiment of the present invention.

As described above, a neural network model is trained with training data, that is, with data pairs each containing input data and label data. As shown in FIG. 5, when the input data is (1, 2, 0, 9, 5, 4, 7, 3), the output data of model A may be (0.1, 0.2, 0.6, 0.1). In this case, the system can generate new training data by treating the output data (0.1, 0.2, 0.6, 0.1) as the label for the input data (1, 2, 0, 9, 5, 4, 7, 3). By repeating the process of FIG. 5 for a plurality of input data, treating each corresponding output as label data, the system can generate new training data containing a plurality of data pairs.
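Generating the new (second) training data amounts to keeping every input and replacing its label with the teacher's output. In the sketch below, `model_a` is a hypothetical stand-in for the trained model A: it simply returns the FIG. 5 output rather than running a real forward pass.

```python
def make_transfer_data(teacher, inputs):
    """Second training data: each input keeps its place, and its label becomes
    the teacher's output for that input (soft values, not one-hot)."""
    return [(x, teacher(x)) for x in inputs]


def model_a(x):
    # Hypothetical stand-in for trained model A; a real teacher would compute
    # its forward pass on x. Returns the FIG. 5 output for the FIG. 5 input.
    return [0.1, 0.2, 0.6, 0.1]


first_inputs = [[1, 2, 0, 9, 5, 4, 7, 3]]  # first input data, reused unchanged
second_training_data = make_transfer_data(model_a, first_inputs)
```

The original one-hot labels are not needed in this step; only the inputs and the teacher are used.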

FIG. 6 shows a method of training a target neural network using new training data according to an embodiment of the present invention.

In FIG. 6, the system trains the target model using the new training data generated in FIG. 5. That is, when training the target model, the output data produced by the original neural network model that performs the same task is used as the label data.

In FIG. 6, the target neural network model learns using the output data (0.1, 0.2, 0.6, 0.1) produced by the trained original neural network model. Because output data that reflects the original model's training results is used as the label, the target neural network model can learn in an optimized way using more information.

The label data of the new training data generated as shown in FIG. 6 may include soft decision values, or soft values. A soft decision value (soft value) may be a probability value indicating how likely each value of the output data is to be the correct answer. That is, the new training data may include a plurality of soft values, each indicating the probability of being the correct answer.

In FIG. 6, for the first data pair, the output data may contain the probability that the first value is 1 (0.3), that the second value is 1 (0.2), that the third value is 1 (0.4), and that the fourth value is 1 (0.1). The system can compare this output data with the label data (0.1, 0.2, 0.6, 0.1) to adjust the parameters of the neural network.
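Comparing the student's output with a soft label can be sketched with a soft-target cross-entropy. This loss is an assumption (the source only says "compare"); it differs from the teacher's KL divergence to the student only by the teacher's entropy, which is constant during student training. The numbers are the FIG. 6 values.

```python
import math


def soft_cross_entropy(soft_label, output):
    """Cross-entropy against a soft label: every class contributes, weighted by
    the teacher's probability for it, not only the single correct class."""
    return -sum(t * math.log(p) for t, p in zip(soft_label, output) if t > 0)


soft_label = [0.1, 0.2, 0.6, 0.1]  # model A output, used as second label data
student_out = [0.3, 0.2, 0.4, 0.1]  # model B output for the same input
loss = soft_cross_entropy(soft_label, student_out)  # about 1.222
# Assuming a softmax output layer, the gradient w.r.t. the logits is again output - label.
error = [p - t for t, p in zip(soft_label, student_out)]  # about [0.2, 0.0, -0.2, 0.0]
```

The loss is smallest when the student's output equals the teacher's soft label, so minimizing it pulls model B toward model A's full output distribution rather than only toward the top class.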

This training data contains the knowledge learned by training model A; that is, model B uses the result (knowledge) of model A's training as its training data. Model B can therefore be trained more efficiently, and to a better result, than when the original label values are used. In particular, if the data in the domain common to the task data is learned first and that knowledge is transferred to the target neural network, the training burden on the target neural network can be reduced further.

The output data of the trained original model carries information about how the classes can be distinguished and by which criteria they are best separated. By using this knowledge transfer method, the target model can learn faster and more accurately.

For example, the task of a neural network model may be to classify a photograph as showing a dog or a cat. In a conventional learning method, the neural network model learns from every dog photograph using the label data [dog, cat] = [1, 0]. Its output data may be [0.7, 0.3], while the result it reports for the task is simply the answer "dog". The knowledge transfer method uses the original neural network's output data, including its soft values, as the label data, so the target network can be trained faster and more accurately. In one embodiment, a larger original neural network is trained to accumulate knowledge, and this knowledge is used to train a smaller target neural network, improving the target network's performance.

Although the target model has fewer nodes, its performance can be further improved by training it with data that carries more information.

FIG. 7 shows a method of training a neural network model according to another embodiment of the present invention.

FIG. 7 shows a method of optimizing a neural network model, in which an additional operation on the layers is added to the neural network learning method described above. In the embodiment of FIG. 7, at least one node in each layer is excluded from the computation during training. FIG. 7 shows a neural network in which each hidden layer has nine nodes.

In the example of FIG. 7, the 1st, 3rd, 4th, and 9th nodes are excluded from the computation in layer 2; the 3rd, 5th, 6th, and 7th nodes in layer 3; and the 2nd, 4th, 7th, and 8th nodes in layer 4. The excluded nodes may be determined probabilistically, for example at random. As training continues, the excluded nodes may also be ignored when the parameters/weights are updated. However, all nodes may be used when the neural network operates after training.
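The FIG. 7 exclusion pattern can be expressed as per-layer keep masks (1 = the node takes part, 0 = excluded), with an all-ones mask standing for operation after training. The mask representation and function names are illustrative assumptions; node numbers are 1-based as in the figure description.

```python
def keep_mask(num_nodes, excluded):
    """1 = node takes part in this training step, 0 = excluded (1-based node numbers)."""
    return [0 if j in excluded else 1 for j in range(1, num_nodes + 1)]


def apply_mask(activations, mask):
    """Zero the outputs of excluded nodes so they neither contribute nor get updated."""
    return [a * m for a, m in zip(activations, mask)]


# Excluded nodes for one training step in the FIG. 7 example (9 nodes per hidden layer).
excluded = {2: {1, 3, 4, 9}, 3: {3, 5, 6, 7}, 4: {2, 4, 7, 8}}
masks = {layer: keep_mask(9, ex) for layer, ex in excluded.items()}

# After training, every node is used.
inference_mask = [1] * 9
```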

Excluding nodes with a certain probability during training has an effect similar to training a different model of similar form for each data item. In other words, it resembles training a plurality of different models that share some of their nodes. Using all nodes when the network operates is therefore akin to combining the results of those trained models, which can greatly improve the network's ability to infer the correct answer. Compared with training a single model, this embodiment can prevent the result from becoming biased toward the training set, which further improves the performance of the neural network.

FIG. 8 shows a neural network downsizing method according to an embodiment of the present invention.

Neural networks are now used in a variety of electronic devices, such as mobile devices and Internet of Things (IoT) devices. As neural network models grow in size, both training and inference require more computation, so these operations may be performed on an external server. In that case, however, personal information must be transmitted to the external server. This raises the risk of personal-information leakage and may incur server operation and communication costs. To run neural networks on individual devices, a downsizing method is needed that reduces model size while keeping the loss in performance as small as possible.

The knowledge transfer method of FIGS. 3 to 5 for transferring a neural network model can be used for this purpose. That method can improve performance by training the target neural network on training data that carries more information. However, it has difficulty avoiding the performance degradation caused by the difference in network size. When the size gap between the large and small neural networks is large, the small network has difficulty representing the knowledge transferred from the large network. The size difference may even cause the transferred knowledge to hinder the small network's learning. A method is therefore needed that reduces the size of a neural network while minimizing performance degradation.

As shown in FIG. 8, the present invention proposes a method of training a large neural network so that it generates a data set that a small neural network can learn well. That is, the invention generates a data set suited to the small neural network and trains the downsized small neural network with that data set. In the following description, the large neural network has nine nodes per hidden layer and the small neural network has three. The spirit of the present invention is not limited to this embodiment.

FIG. 9 shows a method of training a large neural network according to an embodiment of the present invention.

When training the large neural network, the present invention reduces the number of nodes used to match the node count of the small neural network that is the downsizing target. In FIG. 9, when the data set is learned, the number of nodes used in each hidden layer of the large neural network is reduced to three. Instead of using three predetermined nodes for the whole data set, a different combination of three nodes may be used for each computation. Nodes are selected at random for each data item processed during training, and independently for each hidden layer.

In FIG. 9, the large neural network computes each hidden layer using only three of its nine nodes. These three nodes are not fixed: any three nodes selected at random with a given probability may be used for the computation in each hidden layer. In FIG. 8, when the first data item (1, 2, 0, 9, 5, 4, 7, 3) is input, the first hidden layer computes with its 1st, 4th, and 8th nodes and the second hidden layer computes with its 1st, 5th, and 9th nodes. Together they produce the output data (0.1, 0.2, 0.6, 0.1). The system may then compare the output data with the label data (0, 0, 1, 0) and adjust the parameters/weights of the neural network.
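As an illustration only, suppose the comparison between output and label uses cross-entropy. This is an assumption, because the text does not name a loss function. With the label y = (0, 0, 1, 0) and the output ŷ = (0.1, 0.2, 0.6, 0.1) from the first data item, only the correct-class term survives:

```latex
L = -\sum_{i=1}^{4} y_i \log \hat{y}_i
  = -\left(0 \cdot \log 0.1 + 0 \cdot \log 0.2 + 1 \cdot \log 0.6 + 0 \cdot \log 0.1\right)
  = -\log 0.6 \approx 0.511
```

A one-hot label therefore only rewards raising the probability of the correct class. The soft outputs that the large network produces along the way are what later become the small network's labels.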

When the second data item (2, 6, 2, 8, 7, 4, 7, 2) is input, each hidden layer produces its output using three randomly determined nodes, which may be the same as or different from those used for the first data item. The output data is then compared with the label data of the second data item, and the weights or parameters of the nodes in each hidden layer are adjusted. The large neural network completes training by repeating this process over the entire data set.

In one embodiment, the hidden layers may have different numbers of nodes. In this case, for each hidden layer of the large neural network, the system randomly selects as many nodes as the corresponding hidden layer of the small neural network contains. For example, if the second hidden layer of the small neural network has three nodes, the large neural network performs training using three nodes of its second hidden layer.
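The Python sketch below illustrates the training-time forward pass of FIG. 9, combined with the per-layer widths described in the preceding paragraph. For every data item and every hidden layer k, it draws N_k of the M_k nodes uniformly at random, and the remaining nodes output zero. Uniform sampling without replacement, ReLU hidden units, a softmax output, and the random weights are all assumptions made for illustration. The backward pass and weight update are omitted.

```python
import math
import random


def select_active_nodes(layer_sizes, active_counts, rng):
    """For each hidden layer k, pick N_k of its M_k node indices uniformly at random."""
    if len(layer_sizes) != len(active_counts):
        raise ValueError("need one N_k per hidden layer")
    subsets = []
    for m_k, n_k in zip(layer_sizes, active_counts):
        if not 1 <= n_k <= m_k:
            raise ValueError(f"need 1 <= N_k <= M_k, got N_k={n_k}, M_k={m_k}")
        subsets.append(sorted(rng.sample(range(m_k), n_k)))
    return subsets


def relu(x):
    return max(0.0, x)


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def masked_forward(x, weights, biases, subsets):
    """Forward pass in which hidden layer k computes only the nodes in subsets[k].

    weights[l][j] is the weight row of node j in layer l; the last layer is the output.
    Inactive hidden nodes output 0, so they contribute nothing downstream.
    """
    h = x
    for layer, (W, b) in enumerate(zip(weights, biases)):
        z = [sum(w * v for w, v in zip(row, h)) + b_j for row, b_j in zip(W, b)]
        if layer < len(subsets):  # hidden layer: keep only the selected nodes
            active = set(subsets[layer])
            h = [relu(v) if j in active else 0.0 for j, v in enumerate(z)]
        else:  # output layer
            return softmax(z)
    raise ValueError("weights must include an output layer")


rng = random.Random(42)
sizes, counts = [9, 9], [3, 3]  # M_k and N_k as in FIG. 9 (k = 2)
dims = [8] + sizes + [4]        # 8 inputs, 4 output classes
# Hypothetical random weights; a real run would use the network being trained.
weights = [[[rng.uniform(-0.5, 0.5) for _ in range(dims[i])] for _ in range(dims[i + 1])]
           for i in range(len(dims) - 1)]
biases = [[0.0] * dims[i + 1] for i in range(len(dims) - 1)]

subsets = select_active_nodes(sizes, counts, rng)  # drawn anew for each data item
probs = masked_forward([1, 2, 0, 9, 5, 4, 7, 3], weights, biases, subsets)
print(subsets, [round(p, 3) for p in probs])
```

Inactive nodes output exactly 0, so the weights into and out of them receive zero gradient for that data item. Only the N_k selected nodes per layer therefore take part in the update. Node indices in the sketch are 0-based, whereas the figures count nodes from 1.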

FIG. 10 shows a method of generating a new data set for training a small neural network according to an embodiment of the present invention.

The large neural network generates a new data set whose label data are its learning results for the data set, that is, its output data. The generated data set may also be called the training data (data set) for downsizing. To obtain it, the original (raw) data set is fed into the large neural network, and the data the network outputs during training is taken as the label data. In other words, the label for each input in the downsizing data set is the output that the large neural network produced for that input while learning the original data set.

As described with reference to FIG. 9, the large neural network computes the input data (1, 2, 0, 9, 5, 4, 7, 3) using three nodes per hidden layer and outputs the output data (0.1, 0.2, 0.6, 0.1). Substituting this output data for the label data yields a data set for training the small neural network. In FIGS. 9 and 10, if the original data set contains x pairs of input data and label data, the downsizing data set also contains x pairs of input data and label data. The x labels in the downsizing data set are output data produced by the large neural network during training, so they contain soft values.
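Building the downsizing data set amounts to keeping every input and swapping its label for the output recorded from the large network. The sketch below assumes that the large network's outputs have already been collected in the same order as the training pairs. The function and argument names are hypothetical.

```python
def make_downsizing_dataset(first_training_data, first_outputs):
    """Build the second training data by swapping each label for the recorded output.

    first_training_data: list of (input, one_hot_label) pairs.
    first_outputs: the large network's output for each pair, in the same order.
    """
    if len(first_training_data) != len(first_outputs):
        raise ValueError("need exactly one recorded output per training pair")
    second_training_data = []
    for (x, one_hot), soft in zip(first_training_data, first_outputs):
        if len(soft) != len(one_hot):
            raise ValueError("output and label must cover the same classes")
        second_training_data.append((list(x), list(soft)))  # input kept, label replaced
    return second_training_data


first = [([1, 2, 0, 9, 5, 4, 7, 3], [0, 0, 1, 0])]  # first data item and its one-hot label
outputs = [[0.1, 0.2, 0.6, 0.1]]                    # large network's output for that item
print(make_downsizing_dataset(first, outputs))
```

The pair count does not change: x pairs go in and x pairs come out, as the text states.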

The description given above with reference to FIGS. 4 to 6 applies to the generation of the new training data.

FIG. 11 shows the training of a small neural network according to an embodiment of the present invention.

In FIG. 11, a small neural network with three nodes in each hidden layer begins training. It uses the downsizing training data generated by the process of FIGS. 9 and 10.

In the embodiment of FIG. 11, the first input data (1, 2, 0, 9, 5, 4, 7, 3) is input to the small neural network, which outputs the output data (0.2, 0.1, 0.5, 0.2). The small neural network then adjusts its parameters/weights using the training label data (0.2, 0.1, 0.5, 0.2). Once the neural network has been trained on the training data set, it is ready to perform the task.

FIG. 12 shows a neural network downsizing method according to an embodiment of the present invention.

The system trains a large neural network for neural network downsizing (S12010).

The system trains the large neural network using first training data. The large neural network includes k hidden layers, each of which may include M_k nodes. The first training data is a data set for training a neural network to provide answers for a specific task. It may include first input data and first label data.

The system may generate second training data for transferring knowledge to the small neural network (S12020).

The second training data is likewise a data set for training a neural network to provide answers for a specific task. It may include second input data for the task and second label data indicating the correct answer for the second input data.

The system may train the small neural network by using the second training data (S12030).

The small neural network includes k hidden layers, each of which may include N_k nodes, where N_k is smaller than M_k. The system may configure or determine the structure of the small neural network before training the large neural network. The system then trains the large neural network with the number of nodes in each layer reduced to match the per-layer node count of the target small neural network. In this way it can deliver training results optimized for the small neural network.

Training the large neural network may include the following steps:

- processing the first input data by selectively using N_k of the M_k nodes in each of the hidden layers;
- outputting first output data, which is the result of processing the input data; and
- comparing the first output data with the label data to adjust the parameters of the neural network.

These processing, output, and adjustment steps may be repeated for each input and its label in the training data. The N_k nodes may be randomly selected/determined for each hidden layer and/or each data pair. M_k is a natural number, and N_k is a natural number less than (or equal to) M_k.

k, M_k, and N_k may all be natural numbers. FIGS. 8 to 11 were described with k = 2, M_k = 9, and N_k = 3, that is, M_1 = M_2 = 9 and N_1 = N_2 = 3. However, M_1 and M_2 need not be equal, and neither need N_1 and N_2. The necessary condition of the present invention is satisfied whenever N_1 is less than (or equal to) M_1 and N_2 is less than (or equal to) M_2.
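The structural conditions above can be checked before the large network is trained. The hypothetical helper below takes the per-layer widths M_k and N_k. The description says "less than or equal to" while the claims say "less than", so an illustrative `strict` flag selects between the two readings.

```python
def check_downsizing_config(large_widths, small_widths, strict=False):
    """Check the layer-width conditions for downsizing.

    large_widths[k] = M_k and small_widths[k] = N_k. Both networks must have the
    same number of hidden layers k, every width must be a natural number, and
    N_k <= M_k (or N_k < M_k when strict=True, matching the wording of the claims).
    """
    if not large_widths or len(large_widths) != len(small_widths):
        return False
    for m_k, n_k in zip(large_widths, small_widths):
        if not (isinstance(m_k, int) and isinstance(n_k, int) and m_k >= 1 and n_k >= 1):
            return False
        if n_k > m_k or (strict and n_k == m_k):
            return False
    return True


print(check_downsizing_config([9, 9], [3, 3]))  # configuration of FIGS. 8 to 11
print(check_downsizing_config([9, 6], [3, 4]))  # unequal layers are allowed
```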

The second training data is generated by replacing the label data of the first training data with the first output data. In the second training data, the second input data corresponds to the first input data of the first training data, and the second label data corresponds to the first output data produced by the large neural network.

Training the small neural network includes processing the second input data to output second output data. It also includes comparing the second output data with the second label data to adjust the parameters of the small neural network.
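The sketch below shows one way to compare the second output data with the second label data and adjust parameters. It performs gradient descent on a softmax output layer only, and it assumes cross-entropy loss, which the text does not specify. The hidden activations h, the zero initialization, and the learning rate are hypothetical. A full implementation would also backpropagate into the small network's hidden layers.

```python
import math


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def output_layer_step(W, b, h, soft_label, lr=0.1):
    """One gradient-descent step on a softmax output layer z = W h + b.

    With cross-entropy against a soft label q (summing to 1), dL/dz = p - q,
    so dL/dW[i][j] = (p[i] - q[i]) * h[j] and dL/db[i] = p[i] - q[i].
    Returns the updated (W, b) and the loss measured before the update.
    """
    z = [sum(w * v for w, v in zip(row, h)) + b_i for row, b_i in zip(W, b)]
    p = softmax(z)
    loss = -sum(q * math.log(max(p_i, 1e-12)) for q, p_i in zip(soft_label, p))
    grad_z = [p_i - q for p_i, q in zip(p, soft_label)]
    new_W = [[w - lr * g * v for w, v in zip(row, h)] for row, g in zip(W, grad_z)]
    new_b = [b_i - lr * g for b_i, g in zip(b, grad_z)]
    return new_W, new_b, loss


h = [0.5, 1.0, 0.2]                # hypothetical output of a three-node last hidden layer
soft_label = [0.1, 0.2, 0.6, 0.1]  # second label data (large network's output)
W = [[0.0] * 3 for _ in range(4)]
b = [0.0] * 4
losses = []
for _ in range(200):
    W, b, loss = output_layer_step(W, b, h, soft_label)
    losses.append(loss)
print(round(losses[0], 4), round(losses[-1], 4))
```

The loss starts at ln 4 for the uniform initial prediction and decreases toward the entropy of the soft label. That entropy is the lower bound, reached only when the prediction matches the label exactly.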

Training the large neural network for downsizing (S12010) and generating the training data for knowledge transfer to the small neural network (S12020) were described with reference to FIGS. 1 to 10. Training the small neural network (S12030) was described with reference to FIG. 11. That description applies to each step.

The first label data is one-hot encoded and may include a plurality of binary values indicating whether each answer is correct. The second label data may include a plurality of probability values, each indicating the probability of that answer being correct.

According to the present invention, the system can downsize a neural network with minimal performance degradation. In addition, the amount of additional training can be reduced by reusing a neural network already trained on common data, instead of training the small neural network from scratch. Transferring the neural network model can thus substantially reduce the amount of training required to build a personalized neural network model. Depending on the embodiment, the system may apply the downsizing method described above to the common data and then train the small neural network further only on personal information.

FIG. 13 shows a neural network downsizing system according to an embodiment of the present invention.

The neural network downsizing system 13000 is a computing system and may be included in any electronic device.

The memory 13010 is connected to the processor 13020 and stores various information for driving the processor 13020. The memory 13010 may be included inside the processor 13020, or it may be installed outside the processor and connected to it by known means. The term memory 13010 covers both volatile and non-volatile memory. In the present invention, the memory 13010 may store data such as neural networks, training information for the neural networks, and applications for implementing and training neural networks.

The processor 13020 is connected to the memory 13010 and may carry out the neural network, the neural network training method, and the neural network downsizing method according to the present invention. Modules, data, programs, or software that implement the operations of the system 13000 in the various embodiments described above may be stored in the memory 13010 and executed by the processor 13020. The processor 13020 may carry out the method of the present invention by running an application/software that performs it.

The communication unit 13030 may perform wired or wireless communication with devices external to the system. Depending on the device configuration, the communication unit 13030 may be omitted. The communication unit 13030 may also consist of a plurality of communication chipsets. It includes a communication module and may communicate using various protocols such as 3G, 4G (LTE), 5G, Wi-Fi, Bluetooth, and NFC.

The system 13000, or its processor 13020, may perform the neural network downsizing method of the present invention described above with reference to FIGS. 1 to 12.

The embodiments described above combine the components and features of the present invention in predetermined forms. Each component or feature should be considered optional unless explicitly stated otherwise. Each component or feature may be implemented without being combined with other components or features, and some components and/or features may be combined to form an embodiment of the present invention. The order of the operations described in the embodiments may be changed. Some configurations or features of one embodiment may be included in another embodiment, or may replace the corresponding configurations or features of another embodiment. Claims that do not explicitly cite each other may be combined to form an embodiment, or may be added as new claims by amendment after filing.

Embodiments of the present invention may be implemented by various means, for example hardware, firmware, software, or a combination thereof. In a hardware implementation, an embodiment may be implemented by one or more of the following: application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.

In a firmware or software implementation, an embodiment of the present invention may take the form of a module, procedure, function, or the like that performs the functions or operations described above. Software code may be stored in memory and executed by a processor. The memory may be located inside or outside the processor and may exchange data with the processor by various known means.

It will be apparent to those skilled in the art that the present invention may be embodied in other specific forms without departing from its essential characteristics. The foregoing detailed description should therefore be considered illustrative in all respects and not restrictive. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all changes within the range of equivalents of the present invention are included in its scope.

13000: Neural network downsizing system
13010: Memory
13020: Processor
13030: Communication unit

Claims

A method for downsizing a neural network in a computing system, the method comprising:
training a large neural network by using first training data for neural network downsizing and outputting first output data, wherein the large neural network includes k hidden layers, each of the k hidden layers including M_k nodes, and wherein the first training data is a data set for training a neural network to provide an answer for a specific task and includes first input data for the task and first label data representing a correct answer for the first input data;
generating second training data for knowledge transfer to a small neural network, wherein the second training data is a data set including second input data for the task and second label data representing a correct answer for the second input data, the second input data corresponding to the first input data of the first training data and the second label data corresponding to the first output data; and
training the small neural network by using the second training data, wherein the small neural network includes k hidden layers, each of the k hidden layers including N_k nodes, N_k being less than M_k,
wherein the first label data is one-hot encoded data including a plurality of binary values indicating whether an answer is correct, and the second label data includes a plurality of probability values indicating a probability of a correct answer, and
wherein k, M_k, and N_k are natural numbers.

The method according to claim 1,
wherein training the large neural network by using the first training data and outputting the first output data comprises:
processing the first input data by selectively using N_k nodes among the M_k nodes of each of the k hidden layers;
outputting the first output data, which is a processing result of the first input data; and
comparing the first output data with the first label data to adjust parameters of the large neural network.

delete

The method according to claim 2,
wherein the N_k nodes used in the k hidden layers are randomly selected for each of the k hidden layers and/or for each data pair of the first input data and the first label data.

The method according to claim 1,
wherein training the small neural network by using the second training data comprises:
processing the second input data to output second output data; and
comparing the second output data with the second label data to adjust parameters of the small neural network.

delete

A computing system for performing neural network downsizing, comprising:
a memory for storing data; and
a processor coupled to the memory to perform neural network operation and neural network downsizing,
wherein the processor is configured to:
train a large neural network by using first training data for neural network downsizing to output first output data, generate second training data for knowledge transfer to a small neural network, and train the small neural network by using the second training data,
wherein the large neural network includes k hidden layers, each of the k hidden layers including M_k nodes, and the first training data is a data set for training a neural network to provide an answer for a specific task and includes first input data for the task and first label data indicating a correct answer for the first input data,
wherein the small neural network includes k hidden layers, each of the k hidden layers including N_k nodes, N_k being less than M_k,
wherein the second training data includes second input data for the task and second label data representing a correct answer for the second input data, the second input data corresponding to the first input data of the first training data and the second label data corresponding to the first output data,
wherein the first label data is one-hot encoded data including a plurality of binary values indicating whether an answer is correct, and the second label data includes a plurality of probability values indicating a probability of a correct answer, and
wherein k, M_k, and N_k are natural numbers.

The computing system according to claim 7,
wherein training the large neural network by using the first training data and outputting the first output data comprises:
processing the first input data by selectively using N_k nodes among the M_k nodes of each of the k hidden layers;
outputting the first output data, which is a processing result of the first input data; and
comparing the first output data with the first label data to adjust parameters of the large neural network.

delete

The computing system according to claim 8,
wherein the N_k nodes used in the k hidden layers are randomly selected for each hidden layer and/or for each data pair of the first input data and the first label data.

The computing system according to claim 7,
wherein training the small neural network by using the second training data comprises:
processing the second input data to output second output data; and
comparing the second output data with the second label data to adjust parameters of the small neural network.

delete