KR102097724B1

KR102097724B1 - Method and apparatus for mitigating the catastrophic forgetting problem in CNN for image recognition

Info

Publication number: KR102097724B1
Application number: KR1020180131572A
Authority: KR
Inventors: 신병석; 홍대용; 이연
Original assignee: 인하대학교 산학협력단
Priority date: 2018-10-31
Filing date: 2018-10-31
Publication date: 2020-04-06

Abstract

Disclosed are a method and an apparatus for mitigating catastrophic forgetting in a CNN for image recognition. According to an embodiment of the present invention, the method comprises the steps of: sampling, through a prediction process, the data to be learned from a task to be learned; and performing learning using the sampled new task. In the step of performing learning using the sampled new task, as the new task is delivered, gradients and Fisher information are calculated so that a weight set and a Fisher information set can be updated.

Description

METHOD AND APPARATUS FOR MITIGATING THE CATASTROPHIC FORGETTING PROBLEM IN CNN FOR IMAGE RECOGNITION

The following embodiments relate to a method and apparatus for mitigating catastrophic forgetting in a CNN for image recognition and, more specifically, to a method and apparatus that mitigate the catastrophic forgetting problem (CFP) by training only on those new images whose characteristics differ significantly from what has already been learned.

With the recent surge in deep learning research, convolutional neural networks (CNNs) are being used in a wide range of fields, including image classification, speech recognition, face identification, semantic segmentation, visual question answering (VQA), and Atari games. As a representative example in medical imaging, a fine-tuned network can detect 14 pathologies, including pneumonia.

However, the neural networks mainly used in deep learning have been reported to lose information about existing data when they learn a new data set, and this problem remains unresolved. It is called the catastrophic forgetting problem (CFP), and it occurs in CNNs as well. Abraham attributed the CFP to the stability-plasticity dilemma. Stability refers to the stability of the weights, that is, the degree to which the weight values converge slowly without changing much in response to subsequent inputs. Plasticity, or variability, refers to how quickly the weights fit to the input values. The higher a network's stability, the more accurately it can predict data it has already learned; the higher its plasticity, the better it can learn new data. Because the two properties are in a trade-off, the CFP arises, and research to solve it has been ongoing.

FIG. 1 is a diagram for explaining the occurrence of the catastrophic forgetting problem according to an embodiment.

FIG. 1(a) illustrates the occurrence of the catastrophic forgetting problem: because the weights are adjusted each time a data set is learned, the ability to classify previously learned data sets is lost. Variability and stability are determined by the speed at which the weights are updated, and the higher the variability, the more severe the CFP.

FIG. 1(b) illustrates a brute-force solution: to keep the CFP from occurring, the new data set and all previously learned image sets must be merged into a single set and learned again.

Thus, the simplest way to keep the CFP from occurring is to discard the existing weights and generate new ones whenever a new image set is input. The previous weights learned by the network are not used, and all images, including the new ones, are relearned.

However, with this method the training time multiplies with the number of tasks, which grows as image sets are added, and the multiple keeps increasing. In addition, because training starts over independently each time, the method is inefficient in terms of time and memory.
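The growth in cost can be made concrete with a small count. The sketch below assumes, purely for illustration, that every task contains the same number of images and that cost is measured as images processed per epoch; neither assumption comes from the embodiment, and both function names are hypothetical.

```python
def images_processed_from_scratch(num_tasks: int, images_per_task: int) -> int:
    """Images seen per epoch, summed over all retraining rounds.

    Round k retrains on all k tasks collected so far (brute-force retraining).
    """
    return sum(k * images_per_task for k in range(1, num_tasks + 1))


def images_processed_incrementally(num_tasks: int, images_per_task: int) -> int:
    """Images seen per epoch when each round trains only on the newly added task."""
    return num_tasks * images_per_task
```

With four tasks of 1,000 images each, brute-force retraining processes 1,000 + 2,000 + 3,000 + 4,000 images across the four rounds, whereas training only on each new task processes 4,000.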

F (2017), "On quadratic penalties in elastic weight consolidation", arXiv:1712.03847.


The embodiments describe a method and apparatus for mitigating catastrophic forgetting in a CNN for image recognition and, more specifically, provide a technique that mitigates the catastrophic forgetting problem (CFP) by training only on those new images whose characteristics differ significantly from what has already been learned.

The embodiments further aim to provide a method and apparatus for mitigating catastrophic forgetting in a CNN for image recognition by proposing Predictive EWC (Predictive Elastic Weight Consolidation), which samples and learns only those images, among the images to be additionally learned, that the network finds difficult to classify. This reduces the size of the task to be learned, and repeating the process for every task enables efficient learning.

A method for mitigating catastrophic forgetting in a CNN for image recognition according to an embodiment includes: sampling, through a prediction process, the data to be learned from a task to be learned; and performing learning using the sampled new task, wherein the performing of learning using the sampled new task may include calculating a gradient and Fisher information as the new task is delivered and updating a weight set and a Fisher information set.

The sampling of the data to be learned through the prediction process may include: sorting the images in the task using the prediction result for each image of the task to be learned and the annotation of the image; and generating a new task by extracting specific images from the sorted images.

The sorting of the images in the task may include: determining one real value per image from the difference between the predicted value, which is the current network's prediction result for the image, and the actual annotation value of the image; and sorting the images in descending order of the real value.

The generating of the new task by extracting specific images from the sorted images may generate the new task by sampling images for which the difference between the current network's predicted value and the actual annotation value is large.

An apparatus for mitigating catastrophic forgetting in a CNN for image recognition according to another embodiment includes: a prediction unit that samples, through a prediction process, the data to be learned from a task to be learned; and a learning unit that performs learning using the sampled new task, wherein the learning unit may calculate a gradient and Fisher information as the new task is delivered and update a weight set and a Fisher information set.

The prediction unit may sort the images in the task using the prediction result for each image of the task to be learned and the annotation of the image, and then generate a new task by extracting specific images from the sorted images.

The prediction unit may include: a real-value calculation unit that determines one real value per image from the difference between the predicted value, which is the current network's prediction result for the image, and the actual annotation value of the image; a sorting unit that sorts the images in descending order of the real value; and a sampling unit that generates the new task by sampling images for which the difference between the current network's predicted value and the actual annotation value is large.

According to the embodiments, it is possible to provide a method and apparatus for mitigating catastrophic forgetting in a CNN for image recognition by proposing Predictive EWC (Predictive Elastic Weight Consolidation), which samples and learns only those images, among the images to be additionally learned, that the network finds difficult to classify. This reduces the size of the task to be learned, and repeating the process for every task enables efficient learning.

According to the embodiments, training proceeds using only those new images whose characteristics differ significantly from what has already been learned, in order to mitigate the CFP. As a result, the stability of the principal weights is maintained while the variability of the remaining weights is increased, which improves the network's image classification ability.

FIG. 1 is a diagram for explaining the occurrence of the catastrophic forgetting problem according to an embodiment.
FIG. 2 is a diagram for explaining the learning structure of EWC according to an embodiment.
FIG. 3 is a flowchart illustrating a method for mitigating catastrophic forgetting in a CNN for image recognition according to an embodiment.
FIG. 4 is a block diagram illustrating an apparatus for mitigating catastrophic forgetting in a CNN for image recognition according to an embodiment.
FIG. 5 is a diagram schematically illustrating the learning structure of Predictive EWC according to an embodiment.
FIG. 6 is a diagram for explaining the prediction process that generates a new task according to an embodiment.

Hereinafter, embodiments will be described with reference to the accompanying drawings. The described embodiments may, however, be modified in various other forms, and the scope of the present invention is not limited to the embodiments described below. The embodiments are provided to explain the present invention more completely to those of ordinary skill in the art. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

In neural networks, as the process of learning new data sets is repeated, the ability to analyze the features obtained from previously learned data sets is lost. This is called the catastrophic forgetting problem (CFP). When images are classified with convolutional neural networks (CNNs), the representative features of each class strongly influence how the network determines the class of the image to be predicted, so the CFP is particularly damaging.

The Predictive EWC (Predictive Elastic Weight Consolidation) proposed in this embodiment samples and learns only those images, among the images to be additionally learned, that the network finds difficult to classify. The criterion for extracting samples is the magnitude of the difference between the network's predicted value and the actual annotation value. This reduces the size of the task to be learned, and repeating the process for every task enables efficient learning.

The method and apparatus for mitigating catastrophic forgetting in a CNN for image recognition are described in detail below.

CNNs perform well because they exploit mid-level features rather than low-level features. This increases the number of weights to be adjusted, so a large number of annotated images is required. In most cases, therefore, instead of learning a large image set every time, the network is initialized with already trained weights and then trained further. The new weights can thus begin the new training while retaining the existing weights' ability to recognize image features.

This is called transfer learning, and it is implemented either as feature extraction or as fine-tuning. Feature extraction uses the imported weights' feature recognition ability as is, without modifying them, and trains only the fully connected classifier at the final stage, which classifies the analyzed features into the desired annotations. Fine-tuning initializes the network in the same way but trains all weights, including the classifier, adapting them to the characteristics of the newly learned images. Feature extraction modifies fewer parameters and is therefore more efficient in memory and training time, but fine-tuning generally tends to fit the training images better.
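The difference between the two transfer learning modes comes down to which weights are allowed to change. The following is a minimal sketch under assumed names: the layer names, the toy weight and gradient values, and the `sgd_step` helper are hypothetical and stand in for a real CNN and optimizer.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def sgd_step(params: Params, grads: Params, lr: float, trainable: Set[str]) -> Params:
    """Apply one SGD step, leaving every layer outside `trainable` frozen."""
    updated = {}
    for name, values in params.items():
        if name in trainable:
            updated[name] = [w - lr * g for w, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)  # frozen: copied unchanged
    return updated


# Toy "pretrained" weights and gradients (illustrative values only).
pretrained = {"conv1": [0.5, -0.2], "conv2": [0.1, 0.3], "classifier": [0.0, 0.0]}
grads = {"conv1": [1.0, 1.0], "conv2": [1.0, 1.0], "classifier": [1.0, -1.0]}

# Feature extraction: only the final fully connected classifier is trained.
feature_extraction = sgd_step(pretrained, grads, lr=0.1, trainable={"classifier"})
# Fine-tuning: every layer, including the classifier, is trained.
fine_tuning = sgd_step(pretrained, grads, lr=0.1, trainable=set(pretrained))
```

In the feature extraction case the convolutional weights come back unchanged, while in the fine-tuning case every layer moves.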

Research on solving the CFP falls broadly into five categories: regularization, ensemble, rehearsal, dual-memory, and sparse-coding. Regularization adds a condition that controls variability to the loss function, the quantity minimized during training. Ensemble methods create a new network for each training set, that is, for each task, and take the combined prediction as the final result. Rehearsal provides the network with part of the previously learned sets together with the new training set. Dual-memory keeps a high-stability network and a high-variability network separately and combines their results after training. Sparse-coding keeps the numerical ranges in which individual features are represented from overlapping, reducing the chance that they affect one another. Although each technique rests on a different idea, all of them aim to preserve the features that represent the already learned tasks well while learning images.

The ensemble technique has been widely used to improve performance since the early days of deep learning. As a network's structure grows more complex and the number of weights to adjust increases, prediction accuracy rises but the amount of computation grows sharply. For the same image set, an ensemble trains several simple networks independently instead of building one complex network with many weights. The final prediction is made by combining the results of the networks, so accurate predictions are possible with less computing power. Ren used the ensemble idea to mitigate the CFP by training on the already learned tasks and the new task independently and then combining the predictions of the networks. Ren also proposed an algorithm that decides whether the newly generated weights should be reflected in the existing weights, rather than simply training for the new task. When high-resolution images are handled, however, the convolution operations are costly. The overhead incurred as the input resolution grows outweighs the computation saved by reducing the depth and width of the network, so fine-tuning multiple networks on high-resolution images is unsuitable in terms of computation.
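Combining the predictions of independently trained networks can be sketched as follows. The text does not say how the results are combined, so simple averaging of class probabilities is assumed here, and `ensemble_predict` is a hypothetical name.

```python
from typing import List, Sequence


def ensemble_predict(per_network_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class probabilities predicted by independently trained networks."""
    num_networks = len(per_network_probs)
    num_classes = len(per_network_probs[0])
    return [
        sum(probs[c] for probs in per_network_probs) / num_networks
        for c in range(num_classes)
    ]
```

For example, two networks predicting `[0.8, 0.2]` and `[0.4, 0.6]` for the same image yield an averaged prediction close to `[0.6, 0.4]`.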

The rehearsal technique does not change the structure of the network; it reuses the training data set. After one task is learned, the next task is learned, but part of the previously learned task is sampled and added to the new task before training. This review process prevents the CFP. The technique has the drawback, however, that all images used in earlier training stages must be kept in memory for sampling. To address this inefficient use of resources, Robins proposed the pseudo-rehearsal technique, which does not reuse the images themselves but adds pseudo patterns, obtained by analyzing the existing images, to the new task. Because patterns take the place of images, the memory load is reduced.
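Plain rehearsal, as described above, amounts to mixing a sample of the old task into the new one. The sketch below assumes uniform random sampling; `replay_ratio` and `seed` are hypothetical parameters, and pseudo-rehearsal is not covered.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def build_rehearsal_task(old_task: List[T], new_task: List[T],
                         replay_ratio: float, seed: int = 0) -> List[T]:
    """Return the new task extended with a random sample of the old task."""
    rng = random.Random(seed)
    # Number of old examples to replay, capped at the size of the old task.
    k = min(len(old_task), int(len(old_task) * replay_ratio))
    return new_task + rng.sample(old_task, k)
```

Note that `old_task` must remain available in memory for the sampling step, which is exactly the drawback the text points out.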

FIG. 2 is a diagram for explaining the learning structure of EWC according to an embodiment.

Referring to FIG. 2, conventional EWC (Elastic Weight Consolidation) adopts the concept of Fisher information from statistics to determine how much variability each weight value should have when a new task is learned. Here, Fisher information means the amount of information obtained by observing an event when it occurs. Because more information is gained from less probable events, the Fisher information value is larger the lower the probability of the event. Fisher information can be applied by treating one task as the context of the event and each trained weight value as an independent event. Accordingly, the lower the probability that the currently calculated weight arises from the task, the larger the Fisher information.
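In the EWC literature, the per-weight Fisher information is usually estimated as the mean squared gradient of the log-likelihood with respect to each weight. The sketch below computes that diagonal estimate for a logistic regression model, which stands in for the CNN. The model, the estimator choice, and the function names are assumptions for illustration and are not taken from the embodiment.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def diagonal_fisher(weights: Sequence[float],
                    data: Sequence[Tuple[Sequence[float], int]]) -> List[float]:
    """Empirical diagonal Fisher: mean squared gradient of log p(y | x; w)."""
    fisher = [0.0] * len(weights)
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        # For logistic regression, d/dw_i log p(y | x; w) = (y - p) * x_i.
        for i, xi in enumerate(x):
            fisher[i] += ((y - p) * xi) ** 2
    return [f / len(data) for f in fisher]
```

One value is produced per weight, which matches the structure in which every weight carries its own Fisher information.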

EWC performs loss regularization by adding a loss term that reflects the Fisher information to the loss function. As a result, the larger a weight's Fisher information value, the more that weight is updated; that is, its variability is higher.

The degree to which a weight has been influenced by already learned tasks can therefore be regarded as the weight of the weight (Weight of Weight: WoW). By changing the variability of each weight so that variability is suppressed more strongly for values that reflect the principal characteristics, a single network produces the same result as an ensemble of a high-stability network and a high-variability network. Because only one weight set is stored, the approach is more memory-efficient than an ensemble, and because each weight has a different importance, it allows a structure closer to the brain, which artificial neural networks seek to emulate.

FIG. 2 shows the learning structure of EWC, in which each weight of the network has its own Fisher information. Because the variability of a weight depends on its Fisher information, the Fisher information can be regarded as the weight of the weight. When a new task (D) arrives, the Fisher information is computed and backpropagation is performed, so that the weight set (W) and the Fisher information set (WoW) can be updated.

FIG. 3 is a flowchart illustrating a method for mitigating catastrophic forgetting in a CNN for image recognition according to an embodiment.

Referring to FIG. 3, the method for mitigating catastrophic forgetting in a CNN for image recognition according to an embodiment may include a step (S110) of sampling, through a prediction process, the data to be learned from a task to be learned, and a step (S120) of performing learning using the sampled new task.

Here, the step (S110) of sampling the data to be learned through the prediction process may include sorting the images in the task using the prediction result for each image of the task to be learned and the annotation of the image, and generating a new task by extracting specific images from the sorted images.

In addition, the sorting of the images in the task, which is part of step S110, may include determining one real value per image from the difference between the predicted value, which is the current network's prediction result for the image, and the actual annotation value of the image, and sorting the images in descending order of the real value.

The method for mitigating catastrophic forgetting in a CNN for image recognition according to an embodiment is described below by way of an example.

The method can be described more concretely in terms of the apparatus for mitigating catastrophic forgetting in a CNN for image recognition according to an embodiment.

FIG. 4 is a block diagram illustrating an apparatus for mitigating catastrophic forgetting in a CNN for image recognition according to an embodiment.

Referring to FIG. 4, the apparatus 100 for mitigating catastrophic forgetting in a CNN for image recognition according to an embodiment may include a prediction unit 110 and a learning unit 120. Depending on the embodiment, the prediction unit 110 may further include a real-value calculation unit 111, a sorting unit 112, and a sampling unit 113. The learning unit 120 may further include a gradient calculation unit 121, a Fisher information extraction unit 122, and an update unit 123.

In step S110, the prediction unit 110 may sample, through a prediction process, the data to be learned from the task to be learned.

More specifically, the prediction unit 110 may sort the images in the task using the prediction result for each image of the task to be learned and the annotation of the image, and then generate a new task by extracting specific images from the sorted images.

Here, the prediction unit 110 may further include a real-value calculation unit 111, a sorting unit 112, and a sampling unit 113.

The real-value calculation unit 111 may determine one real value per image from the difference between the predicted value, which is the current network's prediction result for the image, and the actual annotation value of the image.

The sorting unit 112 may sort the images in descending order of the real value.

The sampling unit 113 may then generate a new task by extracting and sampling the images for which the difference between the current network's predicted value and the actual annotation value is large.
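The three sub-units can be sketched as three small steps. The sketch assumes that the prediction is a vector of class probabilities, that the annotation is a one-hot vector, and that the per-image real value is their L1 distance; the embodiment specifies only that the value comes from the difference between the two. `keep_ratio` and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (image id, predicted class probabilities, one-hot annotation)
Example = Tuple[str, Sequence[float], Sequence[float]]


def prediction_gap(predicted: Sequence[float], annotation: Sequence[float]) -> float:
    """Real-value calculation (cf. unit 111): one real value per image."""
    return sum(abs(p - a) for p, a in zip(predicted, annotation))


def sort_by_gap(task: List[Example]) -> List[Tuple[float, str]]:
    """Sorting (cf. unit 112): order images by gap, largest first."""
    scored = [(prediction_gap(pred, ann), image_id) for image_id, pred, ann in task]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def sample_new_task(task: List[Example], keep_ratio: float) -> List[str]:
    """Sampling (cf. unit 113): keep the images with the largest gaps."""
    ranked = sort_by_gap(task)
    keep = math.ceil(len(ranked) * keep_ratio)
    return [image_id for _, image_id in ranked[:keep]]
```

Images the current network already predicts well receive a small gap and are dropped, so the new task contains only the images the network finds hard to classify.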

단계(S120)에서, 학습부(120)는 샘플링된 새로운 테스크를 이용하여 학습을 진행할 수 있다. In step S120, the learning unit 120 may progress learning using the new sampled task.

보다 구체적으로, 학습부(120)는 새로운 테스크가 전달됨에 따라 그래디언트(gradient) 및 피셔 정보(fisher information)를 계산하여 가중치 셋과 피셔 정보 셋을 업데이트할 수 있다. More specifically, the learning unit 120 may update the weight set and the fisher information set by calculating gradient and fisher information as new tasks are delivered.

이러한 학습부(120)는 그래디언트 계산부(121), 피셔 정보 추출부(122) 및 업데이트부(123)를 더 포함할 수 있다. The learning unit 120 may further include a gradient calculation unit 121, a Fisher information extraction unit 122, and an update unit 123.

그래디언트 계산부(121)는 새로운 테스크가 전달됨에 따라 그래디언트를 계산할 수 있다. The gradient calculator 121 may calculate the gradient as a new task is delivered.

The Fisher information extraction unit 122 may calculate the Fisher information when the new task is delivered.

The update unit 123 may receive the calculation results from the gradient calculation unit 121 and the Fisher information extraction unit 122 and update the weight set and the Fisher information set.
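The division of work among units 121 to 123 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: it assumes a single-layer sigmoid classifier trained with binary cross-entropy, a diagonal empirical Fisher estimate (the mean of squared per-sample gradients), and a plain gradient step with a quadratic EWC penalty. The names `per_sample_grads`, `ewc_step`, `lam`, and `lr` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_sample_grads(W, X, Y):
    """Per-sample gradient of binary cross-entropy w.r.t. W for p = sigmoid(X @ W).

    X: (n, d) inputs, Y: (n, C) 0/1 annotations, W: (d, C) weights.
    Returns an array of shape (n, d, C).
    """
    P = sigmoid(X @ W)                          # (n, C) predictions in (0, 1)
    return X[:, :, None] * (P - Y)[:, None, :]

def ewc_step(W, X, Y, W_old, fisher_old, lam=1.0, lr=0.1):
    """One update on the sampled task; returns new weights and a Fisher estimate."""
    g = per_sample_grads(W, X, Y)
    grad_task = g.mean(axis=0)                  # gradient calculation (role of unit 121)
    fisher_new = (g ** 2).mean(axis=0)          # diagonal Fisher estimate (role of unit 122)
    # Update (role of unit 123): the penalty gradient lam * F * (W - W_old)
    # holds back weights that had high Fisher information on earlier tasks.
    grad_total = grad_task + lam * fisher_old * (W - W_old)
    return W - lr * grad_total, fisher_new
```

With `fisher_old` set to zeros the step reduces to ordinary gradient descent; larger entries pull the corresponding weights back toward `W_old`. How the embodiment merges `fisher_new` into the stored Fisher information set is not specified in this passage, so the sketch simply returns it.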

As described above, to mitigate the catastrophic forgetting problem (CFP), the embodiments extend EWC, a transfer learning technique based on loss regularization, and provide a technique that learns using only those new images whose characteristics differ greatly from what has already been learned.

For each weight, EWC lowers its variability the more that weight was involved in recognizing representative features. Because weights that capture features important for image classification are adjusted less, the CFP is mitigated and the network's recognition ability is preserved. EWC keeps, for each weight, the degree to which it was involved in the image sets already learned, and then trains on the entire newly input image set.
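For reference, the usual EWC objective from the literature makes this lowered variability concrete. The text here does not spell out the formula, so the symbols below ($\theta$, $\theta^{*}$, $F_i$, $\lambda$) are assumptions taken from the standard formulation:

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta) + \sum_{i} \frac{\lambda}{2}\, F_i \left(\theta_i - \theta^{*}_{i}\right)^{2}
```

Here $\theta^{*}$ denotes the weights after the earlier tasks, $F_i$ the diagonal Fisher information of weight $i$, and $\lambda$ the strength of the constraint. A large $F_i$ makes moving $\theta_i$ away from $\theta^{*}_{i}$ costly, which corresponds to a weight with low variability.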

The apparatus 100 for mitigating catastrophic forgetting in a CNN for image recognition according to an embodiment (hereinafter simply "predictable EWC 100") adopts the process of adjusting variability per weight but, unlike EWC, does not train on a new image set as soon as it is received. Instead, from the images of the task to be learned, it extracts those that have many features the network finds difficult to recognize at that point, and uses this sampled set as the task to be learned. Because EWC is used, the weights heavily involved in image classification remain stable.

In addition, by sampling the images to be learned and replacing the task with the sampled set, the predictable EWC according to an embodiment trains more intensively the weights that had little involvement in image classification and are therefore highly variable. Also, unlike the existing EWC, it does not use all images, so the number of images to learn decreases and the training time is shortened.

Therefore, by learning only difficult images, the predictable EWC according to an embodiment can improve the network's image classification ability: it keeps the main weights stable while increasing the variability of the remaining weights.

In the experiment, one run of EWC training without sampling took about as long as three runs of predictable EWC training with a sampling ratio of 0.5. In the AUROC measurements of these two experiments, the lowest AUROC value was about 0.3 for EWC and about 0.6 for the predictable EWC. In addition, the highest AUROC obtained by fine-tuning on all images without dividing them into tasks was confirmed to be about 0.6. This confirms that sampling roughly doubled recognition accuracy, as measured by the lowest AUROC, while using the same amount of memory resources.

The structure of the proposed predictable EWC is described below, followed by the specific elements of the experiment and its results.

The predictable EWC according to an embodiment may use the sampling concept of rehearsal and pseudo-rehearsal together with the Fisher information of the EWC technique. Existing sampling-based techniques had to keep in memory either the images used for training themselves or pseudo-patterns generated to reflect the features of those images. Despite this memory inefficiency they do improve network performance, but they have not shown better CFP mitigation than other techniques. In contrast, the predictable EWC according to an embodiment does not sample from tasks that have already been used for training, as the rehearsal technique does.

FIG. 5 is a diagram schematically illustrating the learning structure of the predictable EWC according to an embodiment.

Referring to FIG. 5, in the predictable EWC 100 according to an embodiment, each set D denotes a task, and D' denotes the new task created from task D through network prediction and sampling. According to the embodiment, the learning unit 120 may calculate the gradient and Fisher information on the training set reduced by the prediction process in the prediction unit 110, and perform the update.

Because the predictable EWC 100 according to an embodiment samples the training data from the task that is about to be learned, it does not need to keep images in memory. To this end, before the upcoming task is learned, the prediction unit 110 may treat that task as a test data set and first run prediction on it. The images in the task are then sorted using the prediction result for each image and the image's annotation, a new task is generated, and the learning unit 120 trains on this new task in place of the original task. The sorting criterion is the L1-norm of the difference between the network's predicted value and the actual annotation value for each image, which can be expressed as follows.

[Equation 1]

[Equation 2]

[Equation 3]

[Equation 4]

[Equation 5]

Here, [Equation 2] denotes one task, each element of which is a pair consisting of one image [Equation 3] and its corresponding annotation [Equation 4]. n is the number of elements of task [Equation 2], that is, the size of the task. C is the number of classes given to the network being trained, [Equation 5]. Because a sigmoid function is applied at the last stage of the classifier, the predicted value for an image passed through the network is a vector of real values between 0 and 1. The annotation value is a vector of length C whose entries are 0 or 1 according to whether the image is negative or positive for each class.

Therefore, according to [Equation 1], the absolute values of the per-class differences between the network's predicted value and the annotation value are calculated for an image and summed, giving one positive real value per image. The larger this norm value, the greater the gap between the network's prediction and the actual annotation, so a larger norm can be interpreted as an image that is harder for the network to recognize.
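The equation images are not reproduced in this text. Based only on the verbal description, the sorting criterion can be written as follows; the symbols $x_i$, $y_i$, $f$, and $s_i$ are assumed here for illustration and may differ from those used in the original [Equation 1] to [Equation 5]:

```latex
s_i = \left\lVert f(x_i) - y_i \right\rVert_1
    = \sum_{c=1}^{C} \left| f(x_i)_c - y_{i,c} \right|,
\qquad i = 1, \dots, n,
\qquad f(x_i) \in (0,1)^{C},\; y_i \in \{0,1\}^{C}
```

Under this reading, $s_i$ is the per-image real value, and images with larger $s_i$ are the ones treated as harder for the network.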

FIG. 6 is a diagram for explaining the prediction process that generates a new task according to an embodiment.

FIG. 6 shows the process by which the prediction unit 110 generates a new task: the L1-norm is taken between the network's prediction for each image in D and the corresponding annotation, the images are sorted by it, and images are extracted in order of decreasing value.

More specifically, after the L1-norm is computed for each image, the images in the task are sorted by it in descending order, and a new task is generated by sampling from the top according to a predetermined sampling ratio. The resulting new task is smaller than the original task and, because only images with large L1-norm values are extracted, it is the set of images that are hardest for the network to classify.
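The score, sort, and sample steps above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `sample_hard_task`, the rounding rule for the number of kept images, and the toy data are assumptions, not part of the embodiment.

```python
import numpy as np

def sample_hard_task(preds, labels, ratio):
    """Return indices of the hardest images, hardest first.

    preds:  (n, C) sigmoid outputs of the current network, values in (0, 1)
    labels: (n, C) annotations, each entry 0 or 1
    ratio:  fraction of the task to keep, 0 < ratio <= 1
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    scores = np.abs(preds - labels).sum(axis=1)   # one L1 value per image
    order = np.argsort(-scores, kind="stable")    # descending sort by score
    k = max(1, int(round(ratio * len(scores))))   # rounding rule is an assumption
    return order[:k]

# Toy task with n = 4 images and C = 2 classes (made-up values).
preds = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.7], [0.5, 0.5]])
labels = np.array([[1, 0], [1, 0], [1, 1], [0, 1]])
print(sample_hard_task(preds, labels, ratio=0.5))  # prints [1 2]
```

The per-image scores here are 0.2, 1.2, 1.1, and 1.0, so a sampling ratio of 0.5 keeps images 1 and 2; the returned indices would then select the images that form the new task.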

The absolute value of the difference between the per-class predicted value and the per-class annotation value always lies between 0 and 1. If the L2-norm were used, the squaring operation would therefore make each term smaller than the absolute value of the original difference. Since the role of the norm here is to measure the distance between prediction and annotation in order to judge the difficulty of an image, the sum of absolute values is a clearer criterion than the sum of squares. Sorting and sampling may therefore be performed based on the L1-norm rather than the L2-norm.
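A small numeric check illustrates the effect of squaring on differences in the range 0 to 1. The two difference vectors are made-up values, and the helper names `l1_score` and `squared_score` are hypothetical.

```python
import numpy as np

# Hypothetical per-class absolute differences |prediction - annotation| for two images.
diff_a = np.array([0.5, 0.5, 0.5])
diff_b = np.array([0.9, 0.1, 0.1])

def l1_score(d):
    """Sum of absolute differences (the criterion used for sorting)."""
    return float(np.sum(d))

def squared_score(d):
    """Sum of squared differences (the alternative that is argued against)."""
    return float(np.sum(d ** 2))

for name, d in (("a", diff_a), ("b", diff_b)):
    # Every term lies in [0, 1], so its square never exceeds the term itself.
    assert np.all(d ** 2 <= d)
    print(name, round(l1_score(d), 2), round(squared_score(d), 2))
```

In this example image a has the larger L1 score (1.5 against 1.1) but the smaller sum of squares (0.75 against 0.83), so the two criteria can also order images differently. That observation comes from this toy example, not from the source text, which only argues that squaring shrinks the values.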

This procedure adds, at the new-task generation stage, the cost of having the network predict on the images and the cost of sorting them. However, prediction consists only of linear operations between the input image and the weights followed by activation functions, and the sort is a simple descending sort of positive real values, so neither requires complex computation. Consequently, even if the sampling ratio is set to 1 so that all images of the new task are used, the total training time does not differ greatly from that of the existing EWC.

On the other hand, when sampling is performed, the number of images decreases and the amount of gradient computation needed for weight updates drops sharply, so the savings exceed the cost added by the prediction and sorting steps. The predictable EWC according to an embodiment therefore takes less total training time than the existing EWC.

Replacing the new task input for training with a smaller task reduces the number of features the network learns. As shown in FIG. 1(a) described above, in a basic setup that does not take the CFP into account, reducing the number of learned features not only leaves the new images under-learned but also triggers the CFP, so the network loses the image classification ability it already had.

However, the predictable EWC according to an embodiment uses the existing EWC to preserve the classification ability the network already has, and uses task generation to learn further only the features that are still under-learned. The Fisher information adopted from EWC raises the stability of the weights that played a major role in recognizing the features of images already learned, and thereby slows the learning of those weights. Thus, when a network using Fisher information learns a new task, a weight with high variability is one that is relatively weakly associated with the image features the network already recognizes well at that point. The predictable EWC according to an embodiment composes the new task to be learned from the images whose predictions are farthest from their annotations.

Accordingly, the predictable EWC according to an embodiment can train images rich in features that the network recognizes poorly onto the weights that contribute little to the existing recognition ability. It thereby preserves the classification ability from before the new task as well as the existing EWC does, while learning efficiently from difficult images only.

As described above, artificial neural networks suffer from catastrophic forgetting, in which new data causes the network to lose its ability to recognize the features of previous data. Rehearsal mitigates the CFP by reusing training data, and EWC addresses it by using a loss constraint to control the update speed of each node within the weights.

The present embodiments build on the sampling approach of rehearsal and the loss constraint technique of EWC. However, because sampling and reusing training data from tasks already learned places a heavy load on memory, sampling is instead performed on the task that is about to be learned rather than on tasks already learned. By sampling images for which the difference between the current network's predicted value and the actual annotation is large, training concentrates on images rich in features that the network classifies poorly.

The experiment covered four cases: fine-tuning on the entire set, per-task transfer learning, EWC, and the predictable EWC. Excluding fine-tuning, the worst AUROC (Area Under Receiver Operating Characteristic) of the other cases was about 0.3, while the worst AUROC of the proposed technique improved to about 0.6. In particular, the proposed predictable EWC, with a worst AUROC of about 0.6, improves on the existing EWC, whose worst AUROC was about 0.3.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, apparatus, circuit, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method of mitigating catastrophic forgetting in a CNN for image recognition, using an apparatus for mitigating catastrophic forgetting in a CNN for image recognition, the method comprising:
sampling, by a prediction unit of the apparatus, data to be learned for a task to be learned through a prediction process; and
performing learning, by a learning unit of the apparatus, using the sampled new task,
wherein
the sampling of the data to be learned for the task to be learned through the prediction process comprises:
sorting the images in the task using the prediction result for each image of the task to be learned and the annotation of the image; and
generating a new task by extracting specific images from the sorted images,
and wherein
the performing of learning using the sampled new task comprises
calculating a gradient and Fisher information as the new task is delivered, and updating a weight set and a Fisher information set.

(deleted)

The method of claim 1,
wherein the sorting of the images in the task using the prediction result for each image of the task to be learned and the annotation of the image comprises:
determining one real value per image from the difference between a predicted value, which is the current network's prediction result for each image, and the actual annotation value of the image; and
sorting the images in descending order of the real value,
as a method of mitigating catastrophic forgetting in a CNN for image recognition.

The method of claim 1,
wherein the generating of a new task by extracting specific images from the sorted images comprises
generating the new task by sampling images for which the difference between a predicted value, which is the current network's prediction result for each image, and the actual annotation value of the image is large,
as a method of mitigating catastrophic forgetting in a CNN for image recognition.

An apparatus for mitigating catastrophic forgetting in a CNN for image recognition, the apparatus comprising:
a prediction unit that samples data to be learned for a task to be learned through a prediction process; and
a learning unit that performs learning using the sampled new task,
wherein
the prediction unit
sorts the images in the task using the prediction result for each image of the task to be learned and the annotation of the image, and then generates a new task by extracting specific images from the sorted images,
and the learning unit
calculates a gradient and Fisher information as the new task is delivered, and updates a weight set and a Fisher information set,
as an apparatus for mitigating catastrophic forgetting in a CNN for image recognition.

(deleted)

The apparatus of claim 5,
wherein the prediction unit comprises:
a real value calculation unit that determines one real value per image from the difference between a predicted value, which is the current network's prediction result for each image, and the actual annotation value of the image;
a sorting unit that sorts the images in descending order of the real value; and
a sampling unit that generates the new task by sampling images for which the difference between the predicted value, which is the current network's prediction result for each image, and the actual annotation value of the image is large,
as an apparatus for mitigating catastrophic forgetting in a CNN for image recognition.