KR20220049709A

KR20220049709A - System and Method of Adaptive Batch Selection for Accelerating Deep Neural Network Learning based on Data Uncertainty

Info

Publication number: KR20220049709A
Application number: KR1020200133132A
Authority: KR
Inventors: 이재길; 송환준; 김민석
Original assignee: 한국과학기술원
Priority date: 2020-10-15
Filing date: 2020-10-15
Publication date: 2022-04-22

Abstract

Disclosed are a system and method for an adaptive batch selection based on data uncertainty for accelerating a deep neural network learning. The system comprises: a data storage module; an adaptive batch selection module; a network learning module; a model prediction history module; a sample uncertainty evaluation module; a quantization module; and a sample selection probability calculation module.

Description

System and Method of Adaptive Batch Selection for Accelerating Deep Neural Network Learning based on Data Uncertainty

As large-scale data has become available in recent years, it has become possible to build high-performance deep neural network models, and such models perform very well in fields closely tied to everyday life, such as autonomous driving. However, model training time, which grows linearly with data size, has become a major obstacle to putting high-performance models into practical use.

To reduce model training time, transfer learning, which reuses a pre-trained model, and model compression, which reduces the size of the model, have been proposed. However, both degrade the final generalization performance of the trained model.

Accordingly, as other approaches to accelerating the training of a deep neural network or improving the generalization performance of the model, methods such as Online Batch Selection and Active Bias, which use only the difficult samples in the training data for training, have been proposed.

Online Batch Selection is a method for training machine learning models more efficiently. Based on the fact that most training data consists of easy samples (a plethora of easy samples), it uses the loss of each training sample to assess how difficult that sample is to learn, and then raises the relative probability that difficult samples are selected as mini-batch samples for the next training step. However, because it emphasizes only difficult samples, the model overfits to those samples, which degrades its generalization performance and causes the final model to make erroneous predictions.

Active Bias is a method for improving the final generalization performance of a model obtained by deep neural network training. It uses the prediction accuracy of the selected mini-batch samples to evaluate the prediction uncertainty of each training sample. The variance of the prediction accuracy values obtained over the "entire" training period so far is taken as the uncertainty, and the samples with the largest variance are selected as mini-batch samples for the next training step, which greatly improves the generalization performance of the final trained model. However, this "Growing Window" approach, which uses time-series data from all periods to evaluate uncertainty, misestimates the uncertainty when training continues for a long time because of outdated values. As a result, it selects samples that are too easy or too difficult to help training much, and it actually slows down the training of the model.

The present disclosure provides an effective batch selection method and system for deep neural network training.

The present disclosure relates to a data uncertainty-based adaptive batch selection system and method that, unlike existing approaches that treat all training samples as equally important, construct each batch by searching for the training samples that are most effective for model training, thereby improving training speed and raising the final generalization performance of the trained model.

The present disclosure is an adaptive batch selection method for deep neural network training. It solves the "incorrect sample uncertainty evaluation problem" that arose in previously proposed methods for improving the generalization performance of deep neural networks, and thereby achieves "accelerated deep learning training" while ultimately achieving "high generalization of the model to test data". The present disclosure thus provides a new data uncertainty-based adaptive batch selection system and method for accelerating deep neural network training, with which a model can be trained using large-scale data more efficiently and effectively.

A system according to an embodiment includes: a data storage module that stores the entire training data; an adaptive batch selection module that adaptively selects, from the data storage module, a training mini-batch, which is the minimum unit for training; a network learning module that performs deep neural network training on the received uncertainty-based mini-batch; a model prediction recording module that stores and manages only the recent prediction results of the trained model for the mini-batch samples; a sample uncertainty evaluation module that calculates sample uncertainty based on the results accumulated in the prediction recording module; a quantization module that converts the obtained uncertainty values into quantitative indices; and a sample selection probability calculation module that determines the sample selection probability using the quantized indices.

FIG. 1 is a block diagram of a data uncertainty-based adaptive batch selection system for accelerating deep neural network training.

The Online Batch Selection method, known to accelerate deep learning training, does accelerate training, but because it overfits to difficult samples it degrades the generalization performance of the model on the test samples used in real applications. The Active Bias method, known to improve the generalization performance of deep learning, improves the final generalization performance of the model on test data, but it slows down training because of a fundamental limitation of the Growing Window technique: it depends on old values rather than recent ones.

The present disclosure provides a new adaptive batch selection system and method for deep neural network training that simultaneously achieve "fast training" and "improved generalization performance on test samples".

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can readily implement them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present disclosure clearly, and similar reference numerals denote similar parts throughout the specification.

In the description, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

In the description, "transmitting or providing" may include not only direct transmission or provision but also indirect transmission or provision through another device or via a detour path.

In the description, an expression written in the singular may be interpreted as singular or plural unless an explicit expression such as "one" or "single" is used.

In the description, the order of operations shown in a flowchart may be changed, several operations may be merged, an operation may be divided, and a specific operation may not be performed.

In the description, terms such as "… unit", "… device", and "… module" mean a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or in a combination of hardware and software.

Referring to FIG. 1, the system includes: a data storage module that stores the entire training data; an adaptive batch selection module that adaptively selects, from the data storage module, a training mini-batch, which is the minimum unit for training; a network learning module that performs deep neural network training on the received uncertainty-based mini-batch; a model prediction recording module that stores and manages only the recent prediction results of the trained model for the mini-batch samples; a sample uncertainty evaluation module that calculates sample uncertainty based on the results accumulated in the prediction recording module; a quantization module that converts the obtained uncertainty values into quantitative indices; and a sample selection probability calculation module that determines the sample selection probability using the quantized indices.

The data storage module (11) stores data for ordinary supervised learning. The data is a set of samples, and the i-th sample contains both a multidimensional feature vector x_i describing the sample and the true label y_i to which the sample belongs. For example, in an image classification task, the R, G, and B values of a picture of a cat are the feature, and the information that it is a cat is the label. Formally, data consisting of N samples is written D={(x₁,y₁), (x₂,y₂), ..., (x_N,y_N)}. The data storage module stores and manages this data, and the adaptive batch selection module can access the data in it.

The adaptive batch selection module (12) receives training samples from the data storage module and divides them into mini-batches, the minimum unit of a data sample set for deep neural network training. Typically 128 or 256 samples make up one mini-batch. For example, if the total number of samples in the data is N and the number of samples in each mini-batch is Batch Size, the total number of mini-batches is given by Equation 1.

[Equation 1]

B = N / Batch Size
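By way of illustration only, Equation 1 may be sketched in Python as follows. The function name `num_mini_batches` is hypothetical, and rounding up when N is not a multiple of the batch size is an assumption of this sketch; Equation 1 itself does not address that case.

```python
import math


def num_mini_batches(n_samples: int, batch_size: int) -> int:
    """Number of mini-batches per epoch, B = N / Batch Size.

    Rounding up is an assumption for the case where N is not a
    multiple of the batch size; Equation 1 does not cover it.
    """
    if n_samples <= 0 or batch_size <= 0:
        raise ValueError("n_samples and batch_size must be positive")
    return math.ceil(n_samples / batch_size)


# 50,000 / 128 = 390.625, so the last mini-batch is only partly filled.
print(num_mini_batches(50_000, 128))  # 391
```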

Conventionally, the samples forming a mini-batch were selected at random, whereas the present invention forms each mini-batch by adaptively selecting the samples best suited to training the current model. Samples whose prediction is certain, that is, samples the model predicts correctly with high probability or incorrectly with high probability, are too easy or too difficult and are therefore selected with low probability, while samples whose prediction is uncertain are selected with high probability for the next mini-batch. Accordingly, the adaptive batch selection module continuously maintains and updates an uncertainty probability distribution over the samples in the data, and the mini-batch u for training is selected from that distribution as in Equation 2.

[Equation 2]

The process of calculating the uncertainty probability distribution is described later for modules (14) to (16).
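As a non-limiting sketch, drawing a mini-batch from a per-sample selection distribution might look like the following Python. The function name `sample_mini_batch` and the example probabilities are hypothetical, and sampling with replacement is an assumption of this sketch rather than something the description specifies.

```python
import random


def sample_mini_batch(selection_probs, batch_size, rng=None):
    """Draw the indices of one mini-batch from per-sample selection probabilities.

    Sampling with replacement (random.choices) is an assumption of
    this sketch; the description does not state which is used.
    """
    rng = rng or random.Random()
    indices = range(len(selection_probs))
    return rng.choices(indices, weights=selection_probs, k=batch_size)


# Samples 2 and 3 are the most uncertain, so they carry most of the mass.
probs = [0.05, 0.05, 0.40, 0.40, 0.10]
batch = sample_mini_batch(probs, batch_size=3, rng=random.Random(0))
print(batch)
```

With uniform probabilities this reduces to ordinary random mini-batch construction.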

The neural network learning module (13) operates as follows.

Training of the deep neural network in the network learning module (131) proceeds in units of the mini-batches provided by the adaptive batch selection module. This is called mini-batch stochastic gradient descent. In general, training over the B mini-batches provided by the batch selection module is called one epoch, and the whole training process is repeated over multiple epochs.

In detail, at each training step the model predicts labels for the samples of one given mini-batch, that is, Batch Size samples, and computes a loss that measures how inaccurate its predictions are. A neural network model generally consists of a set of parameters that compose many complex functions. Given the loss of the model at time t for sample x_i, training proceeds by updating the parameters of the model as in Equation 3. Here, the mini-batch samples are the samples belonging to the mini-batch u selected using the uncertainty probability distribution mentioned above.

[Equation 3]
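The kind of mini-batch parameter update that the description refers to can be sketched in Python as below. This is an illustrative stand-in only: the linear model, the squared loss, and the names `sgd_step` and `learning_rate` are assumptions of this sketch and are not taken from the disclosure, which concerns deep neural networks.

```python
def sgd_step(theta, batch, learning_rate):
    """One mini-batch SGD update for a linear model with squared loss.

    theta: list of weights; batch: list of (x, y) pairs, x a list of floats.
    The gradient is averaged over the samples of the mini-batch.
    """
    grad = [0.0] * len(theta)
    for x, y in batch:
        pred = sum(w * xi for w, xi in zip(theta, x))
        err = pred - y
        for d, xi in enumerate(x):
            grad[d] += 2.0 * err * xi / len(batch)
    # Move the parameters against the averaged gradient.
    return [w - learning_rate * g for w, g in zip(theta, grad)]


theta = [0.0]
mini_batch = [([1.0], 2.0), ([2.0], 4.0)]  # samples of y = 2x
for _ in range(50):
    theta = sgd_step(theta, mini_batch, learning_rate=0.1)
print(theta)  # approaches [2.0]
```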

The model prediction recording module (132) operates as follows. Just as the neural network learning module yields a loss value for each label prediction, the model outputs a predicted label for sample x_i at each time t. The model prediction recording module stores the predicted label of each sample at time t. However, it does not keep the label values from all time points; only the prediction records of the most recent q epochs are kept. Whereas the Active Bias method described above uses the Growing Window technique, which relies on model predictions from the entire training period, the present invention uses a Sliding Window technique, which relies only on predictions from the most recent q epochs. In this way, the present invention overcomes the problem of erroneous uncertainty estimation caused by old prediction values and succeeds in raising generalization performance while also increasing training speed.

For the sample uncertainty evaluation and the sample selection probability distribution calculation described later, the Sliding Window-based model prediction record for all samples is denoted H, and the prediction record for a specific sample x_i is denoted H_i. The collected model prediction records are updated every training epoch, and prediction records older than q epochs are deleted from the module.
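A sliding-window prediction record of this kind may be sketched, purely for illustration, with a bounded deque per sample. The class name `PredictionHistory` and its methods are hypothetical, and keeping the last q predictions per sample (rather than tracking epochs explicitly) is a simplifying assumption of this sketch.

```python
from collections import defaultdict, deque


class PredictionHistory:
    """Keeps only the most recent `window_size` predicted labels per sample."""

    def __init__(self, window_size):
        self.window_size = window_size
        # deque(maxlen=q) silently drops the oldest entry when full.
        self._records = defaultdict(lambda: deque(maxlen=window_size))

    def record(self, sample_id, predicted_label):
        self._records[sample_id].append(predicted_label)

    def recent(self, sample_id):
        """Return H_i, the retained predictions for one sample, oldest first."""
        return list(self._records.get(sample_id, ()))


history = PredictionHistory(window_size=3)
for label in [0, 1, 1, 2, 2]:
    history.record(sample_id=7, predicted_label=label)
print(history.recent(7))  # [1, 2, 2]
```

A Growing Window record would correspond to an unbounded list, which is the behaviour the description attributes to Active Bias.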

The sample uncertainty evaluation module (14) calculates the sample uncertainty based on the record of the model's label predictions over the most recent q epochs.

First, given the record of the most recent q predictions for sample x_i, the probability that a specific label j has recently been predicted by the model as the label of the sample is given by Equation 4, where [·] is the Iverson bracket.

[Equation 4]
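For illustration, the recent label probability described above (the fraction of the retained predictions that equal label j) may be sketched in Python as follows. The function name `label_probabilities` is hypothetical, and labels are assumed to be the integers 0 to k-1.

```python
from collections import Counter


def label_probabilities(history, num_labels):
    """Fraction of recent predictions equal to each label j.

    history: the retained predictions H_i for one sample.
    Counting matches plays the role of the Iverson bracket.
    """
    if not history:
        raise ValueError("history must contain at least one prediction")
    counts = Counter(history)
    return [counts[j] / len(history) for j in range(num_labels)]


# Four recent predictions for one sample in a 3-class problem.
print(label_probabilities([0, 1, 1, 2], num_labels=3))  # [0.25, 0.5, 0.25]
```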

Next, the widely used empirical entropy is applied, and the resulting sample prediction uncertainty u(x_i) for sample x_i is given by Equation 5. Here k is the number of labels to be predicted, and multiplying the entropy by a normalization term guarantees that the prediction uncertainty always lies within a fixed range.

[Equation 5]
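A normalized empirical entropy of the kind described can be sketched in Python as below. Dividing by log k so that the value lies between 0 and 1 is an assumption of this sketch, chosen because it is the usual normalization for entropy over k labels; the function name `prediction_uncertainty` is hypothetical.

```python
import math


def prediction_uncertainty(label_probs):
    """Empirical entropy of the recent label probabilities, scaled by 1/log(k).

    The 1/log(k) scaling (assumed here) maps the result into [0, 1]:
    0 when one label was always predicted, 1 when all k labels were
    predicted equally often.
    """
    k = len(label_probs)
    if k < 2:
        raise ValueError("at least two labels are required")
    # Terms with p == 0 contribute nothing to the entropy.
    entropy = sum(p * math.log(1.0 / p) for p in label_probs if p > 0)
    return entropy / math.log(k)


certain = prediction_uncertainty([1.0, 0.0, 0.0])
uncertain = prediction_uncertainty([1 / 3, 1 / 3, 1 / 3])
print(certain, uncertain)
```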

The quantization module (15) receives the uncertainty values of all samples calculated by the sample uncertainty evaluation module and, so that they can be converted into a sample selection probability distribution, quantizes each uncertainty u(x_i) to compute the quantization index Q(u(x_i)) as in Equation 5.

[Equation 5]

The higher the prediction uncertainty, the lower the resulting quantization index, and the quantization index ranges from 1 to N, the number of data samples.

The resulting quantization index is used to calculate the sample selection probability, which is the probability that each sample is selected for the next mini-batch.
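One quantization consistent with the stated properties (higher uncertainty gives a lower index, and the index ranges from 1 to N) is sketched below in Python. The uniform binning with step 1/N and the ceiling are assumptions of this sketch, as is the requirement that the uncertainty lies in [0, 1]; the function name `quantization_index` is hypothetical.

```python
import math


def quantization_index(uncertainty, num_samples):
    """Map an uncertainty in [0, 1] to an integer index in 1..N.

    Uses N uniform bins of width 1/N over (1 - uncertainty), so the
    most uncertain samples receive index 1 and the most certain
    samples receive index N. The binning scheme is an assumption.
    """
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must lie in [0, 1]")
    index = math.ceil((1.0 - uncertainty) * num_samples)
    # Clamp so that uncertainty == 1.0 maps to 1 rather than 0.
    return min(max(index, 1), num_samples)


print(quantization_index(1.0, 10))  # 1
print(quantization_index(0.5, 10))  # 5
print(quantization_index(0.0, 10))  # 10
```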

The sample selection probability module (16) converts the quantization index Q(u(x_i)) obtained for each sample into a sample selection probability distribution and transmits that distribution to the adaptive batch selection module (12).

As mentioned above, samples whose prediction by the model is certain are given a lower probability of being selected, while samples whose prediction is uncertain are given a higher probability. Accordingly, the selection probability of each sample x_i is calculated according to Equation 6, which uses the quantization index to decrease exponentially the probability that a sample is selected for the next mini-batch. Here, the selection pressure s_e is a parameter that controls the difference in selection probability between the sample with the most uncertain prediction and the sample with the most certain prediction. When a high s_e value is used, uncertain samples are selected with higher probability, whereas when a low s_e value is used, even certain samples may be selected as mini-batch samples. In the extreme case s_e=1, the method behaves in the same way as conventional random sampling.

[Equation 6]
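A selection probability with the properties described above can be sketched in Python as follows. The specific form, a weight of (1 / exp(log(s_e) / N)) raised to the quantization index and then normalized, is an assumption of this sketch chosen so that s_e = 1 yields uniform sampling and the most uncertain sample is roughly s_e times more likely than the most certain one; the function name `selection_probabilities` is hypothetical.

```python
import math


def selection_probabilities(quant_indices, selection_pressure):
    """Turn quantization indices into a probability distribution.

    Each sample's weight decays exponentially with its index; the
    decay base is set so that the ratio between index 1 and index N
    is close to `selection_pressure` (assumed form).
    """
    if selection_pressure < 1.0:
        raise ValueError("selection_pressure must be at least 1")
    n = len(quant_indices)
    base = 1.0 / math.exp(math.log(selection_pressure) / n)
    weights = [base ** q for q in quant_indices]
    total = sum(weights)
    return [w / total for w in weights]


indices = [1, 2, 3]  # sample 0 is the most uncertain
print(selection_probabilities(indices, selection_pressure=100.0))
print(selection_probabilities(indices, selection_pressure=1.0))  # uniform
```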

In general, overemphasizing specific samples in the later part of training causes the model to overfit to those samples. Therefore, the initial selection pressure s_e0=100 is decayed exponentially as training proceeds, using Equation 7, until it finally reaches s_e-end=1, which avoids the overfitting problem.

[Equation 7]
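An exponential decay of the selection pressure from 100 at the start of training to 1 at the end may be sketched in Python as below. The exact schedule is an assumption of this sketch, constructed only to match the two stated endpoints; the function name `selection_pressure` and the parameters `first_epoch` and `last_epoch` are hypothetical.

```python
import math


def selection_pressure(epoch, initial_pressure=100.0, first_epoch=0, last_epoch=100):
    """Exponentially decay the selection pressure to 1 by `last_epoch`.

    The decay rate is chosen so that the value equals
    `initial_pressure` at `first_epoch` and 1 at `last_epoch`.
    """
    if last_epoch <= first_epoch:
        raise ValueError("last_epoch must be greater than first_epoch")
    rate = math.log(1.0 / initial_pressure) / (last_epoch - first_epoch)
    return initial_pressure * math.exp(rate * (epoch - first_epoch))


for e in (0, 50, 100):
    print(e, selection_pressure(e))
```

Halfway through training the pressure is the geometric mean of the two endpoints, so the emphasis on uncertain samples falls off quickly at first and then levels out toward uniform sampling.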

As described above, through data uncertainty-based adaptive batch selection for accelerating deep neural network training, the present disclosure solves the slowdown in training caused by the existing Growing Window technique and can improve both the generalization of the model to test data and the training speed of the model at the same time.

According to the present disclosure, a final model that shows higher accuracy on test data can be obtained while model training is accelerated. Accelerated deep neural network training can therefore be achieved without the degradation of generalization performance that has been a problem until now. As a result, the present disclosure can be applied to a variety of real-life applications that require accelerated training on large-scale data, such as autonomous driving and object detection, and can greatly improve their performance.

Claims

A system comprising:
a data storage module that stores the entire training data;
an adaptive batch selection module that adaptively selects, from the data storage module, a training mini-batch that is a minimum unit for training;
a network learning module that performs deep neural network training on the received uncertainty-based mini-batch;
a model prediction recording module that stores and manages only the recent prediction results of the trained model for mini-batch samples;
a sample uncertainty evaluation module that calculates sample uncertainty based on the results accumulated in the prediction recording module;
a quantization module that converts the obtained uncertainty value into a quantitative index; and
a sample selection probability calculation module that determines a sample selection probability using the quantized index.

The system of claim 1,
wherein the adaptive batch selection module
continuously maintains and updates an uncertainty probability distribution over the samples in the data and selects mini-batches for training from that distribution.