KR20210029110A

KR20210029110A - Method and apparatus for few-shot image classification based on deep learning

Info

Publication number: KR20210029110A
Application number: KR1020200112520A
Authority: KR
Inventors: 이성환; 서진우; 김정준; 정홍규
Original assignee: 고려대학교 산학협력단
Priority date: 2019-09-05
Filing date: 2020-09-03
Publication date: 2021-03-15
Also published as: KR102370910B1

Abstract

According to an aspect of the present application, a deep learning-based few-shot image classification method includes: a primary learning step of performing data augmentation on an input image according to a self-mixing technique, which generates a new input image by replacing a part of the input image with another part of the same image, performing feature extraction on the augmented input images, learning the extracted features based on deep learning, and creating a classification model that classifies the input image; a secondary learning step of creating a new auxiliary classifier from a main classifier included in the classification model and allowing the independently output result of each classifier to be shared with the other classifiers according to a self-distillation technique; and a step of outputting a classification result by inputting a new input image to the image classification model that has undergone the primary and secondary learning steps, thereby providing a high level of classification accuracy.

Description

Deep learning-based few-shot image classification apparatus and method {METHOD AND APPARATUS FOR FEW-SHOT IMAGE CLASSIFICATION BASED ON DEEP LEARNING}

The present invention relates to an apparatus and method for classifying few-shot images based on deep learning.

Image processing technology has advanced greatly in recent years, driven by progress in deep learning within the field of artificial intelligence, which has accompanied the emergence of big data and improvements in computer hardware. On this basis, techniques for automatically classifying images in various fields are being developed. Unlike conventional algorithm-based or statistics-based image processing methods, deep learning-based image processing allows a computer to automatically classify even complex images that are difficult for humans to classify.

However, achieving high-performance classification in image processing requires a large amount of training data labeled with correct answers. Many industrial and academic fields still do not hold enough data for such training, so the data must be processed manually, which is costly in both time and money.

Recently, research on few-shot learning has attracted attention as a way to improve classification performance in image processing. Few-shot learning aims to correctly classify a small number of input images when new, previously unseen images are given to a model that has been trained on a large amount of data. Even a model trained on a large amount of data normally needs hundreds of training samples or more to classify a new category, whereas few-shot learning makes this possible with only a few images.

Few-shot learning broadly follows the same scheme as general image classification, namely feature extraction followed by classification, but in detail it can be divided into distance-based learning methods, optimization methods, data augmentation methods, and other methods.

Prior research on image classification extracted patterns and features of the images belonging to each class from large amounts of data. More recently, research has focused on technical improvements that raise accuracy while reducing computation, or on developing classification models specialized for a particular domain.

Although image classification technology continues to advance in this way, it has been pointed out as a limitation of deep learning-based image classification that the technology cannot be applied unless a sufficient amount of data is secured to train the classification model.

To overcome this limitation, image classification methods based on few-shot learning have been studied in recent years, and as a result few-shot learning systems have been developed to the point of showing relatively high performance within a single specific data set.

However, one study reported that classification performance drops when a data set entirely different from the one used for training is applied in the performance evaluation stage of a few-shot learning system. Another study found that deep learning models tend to memorize their input training data during training. This is connected to the overfitting problem of deep learning models, and such memorization of training data can be regarded as significantly hindering the generalization performance of deep learning.

In short, previous studies show that there have been limits to achieving high generalization performance in few-shot learning, and achieving better generalization performance with a new learning method different from earlier ones has emerged as an important task for subsequent research.

Existing few-shot learning models have concentrated on improving classification accuracy using only a purely deep learning-based image processing pipeline, and as a result appear not to have considered recognition accuracy on the entirely different data sets that can arise in practice.

Republic of Korea Registered Patent Publication No. 10-1888647 (Title of invention: Image classification apparatus and method)

The present invention is intended to solve the problems described above. Its object is to provide a few-shot image classification method and apparatus that can train a deep learning-based classifier with only a small amount of data and provide a high level of classification accuracy, by using a self-augmentation technique that applies various changes to the data distribution of the input image and to the output distribution, together with a fine-tuning technique called local feature learning.

However, the technical problem to be achieved by the present embodiment is not limited to the one described above, and other technical problems may exist.

As a technical means for solving the above technical problem, a deep learning-based few-shot image classification method according to a first aspect of the present disclosure includes: a primary learning step of performing data augmentation on an input image according to a self-mixing technique, which generates a new input image by replacing a part of the input image with another part of the same image, performing feature extraction on the augmented input images, learning the extracted features based on deep learning, and generating a classification model that classifies the input image; a secondary learning step of generating a new auxiliary classifier from a main classifier included in the classification model and allowing the independently output result of each classifier to be shared with the other classifiers according to a self-distillation technique; and a step of outputting a classification result by inputting a new input image to the image classification model that has undergone the primary and secondary learning steps.

In addition, a deep learning-based few-shot image classification apparatus according to a second aspect of the present disclosure includes: a communication module; a memory in which a few-shot image classification program is stored; and a processor that executes the program stored in the memory. The few-shot image classification program includes an image classification model generated through a primary learning step and a secondary learning step. In the primary learning step, data augmentation is performed on an input image according to a self-mixing technique, which generates a new input image by replacing a part of the input image with another part of the same image, feature extraction is performed on the augmented input images, the extracted features are learned based on deep learning, and a classification model that classifies the input image is generated. In the secondary learning step, a new auxiliary classifier is generated from a main classifier included in the classification model, and the independently output result of each classifier is shared with the other classifiers according to a self-distillation technique. The few-shot image classification program inputs a new input image to the image classification model and outputs a classification result for the new input image.

According to any one of the above means for solving the problem, the use of the self-mixing data augmentation technique can prevent a deep learning model from memorizing its training data. In addition, the self-distillation technique based on auxiliary classifiers makes it possible to vary the output distribution of the model. As a result, the system acquires the various pattern information contained in the data more sensitively instead of memorizing the training data, which improves the accuracy with which objects are correctly recognized even in a small number of new images.

These effects make it possible to build an image classification model that alleviates the data shortage faced by industry and academia, without the cumbersome work of manually labeling data sets.

FIG. 1 is a block diagram illustrating a deep learning-based few-shot image classification apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a few-shot image classification method according to another embodiment of the present invention.
FIG. 3 is a diagram for explaining the self-augmentation technique used in a few-shot image classification method according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the feature extraction process used in a few-shot image classification method according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the self-distillation technique used in a few-shot image classification method according to an embodiment of the present invention.

Hereinafter, embodiments of the present application are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application pertains can easily practice them. The present application may, however, be implemented in many different forms and is not limited to the embodiments described here. In the drawings, parts unrelated to the description are omitted for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them.

Throughout this specification, when a member is said to be located "on" another member, this includes not only the case where the member is in contact with the other member but also the case where yet another member is present between the two.

FIG. 1 is a block diagram illustrating a deep learning-based few-shot image classification apparatus according to an embodiment of the present invention.

The few-shot image classification apparatus 100 according to the present invention may include a communication module 110, a memory 120, a processor 130, a database 140, and various input/output modules 150.

The communication module 110 can transmit and receive various data between the few-shot image classification apparatus 100 and other computing devices or servers over a wired or wireless communication network. In particular, the communication module 110 may receive the image data needed for training, receive a new input image to be classified, or transmit the classification result for a new input image.

A few-shot image classification program is stored in the memory 120. The few-shot image classification program includes an image classification model generated through a primary learning step and a secondary learning step. In the primary learning step, data augmentation is performed on an input image according to a self-mixing technique, which generates a new input image by replacing a part of the input image with another part of the same image, feature extraction is performed on the augmented input images, the extracted features are learned based on deep learning, and a classification model that classifies the input image is generated. In the secondary learning step, a new auxiliary classifier is generated from a main classifier included in the classification model, and the independently output result of each classifier is shared with the other classifiers according to a self-distillation technique. The few-shot image classification program inputs a new input image to the image classification model and outputs a classification result for the new input image.

Meanwhile, the memory 120 stores an operating system for running the few-shot image classification apparatus 100 and the various kinds of data generated while the few-shot image classification program executes. Here, the memory 120 collectively refers to nonvolatile storage devices, which retain stored information without a power supply, and volatile storage devices, which require power to retain stored information.

The processor 130 executes the program stored in the memory 120 and controls the entire process involved in executing the few-shot image classification program. Each operation performed by the processor 130 is described in more detail below.

The processor 130 may include any kind of device capable of processing data. For example, it may refer to a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions contained in a program. Examples of such hardware-embedded data processing devices include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA), but the scope of the present invention is not limited thereto.

Meanwhile, the few-shot image classification apparatus 100 may further include a database 140, which stores or provides the data needed to execute the few-shot image classification program under the control of the processor 130. The database 140 may be included as a component separate from the memory 120, or may be built in a partial area of the memory 120.

In addition, the few-shot image classification apparatus 100 may further include an input/output module 150, which covers the various input/output interfaces used to operate the few-shot image classification apparatus 100.

FIG. 2 is a flowchart illustrating a few-shot image classification method according to another embodiment of the present invention.

First, when an input image is received (S210), and before feature extraction is performed on it, data augmentation is applied to the input image according to the self-mixing technique, which generates a new input image by replacing a part of the input image with another part of the same image (S220).

This can be expressed by the following equation.

[Equation 1]

Here, T is a transformation function that replaces a patch of the image with a patch taken from another part of the same image.

FIG. 3 is a diagram for explaining the self-augmentation technique used in a few-shot image classification method according to an embodiment of the present invention.

In the input image on the left, data augmentation is performed according to the self-mixing technique, in which a patch 300 of the input image is replaced with a patch from another part of the same image.
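The patch replacement described for FIG. 3 can be illustrated with a minimal sketch. This is not the patent's implementation: the function name `self_mix`, the rectangular patch shape, the uniformly random choice of patch positions, and the representation of an image as a list of rows are all assumptions made for illustration.

```python
import random


def self_mix(image, patch_h, patch_w, rng=None):
    """Return a copy of `image` in which one patch is overwritten by a patch
    copied from another randomly chosen position of the same image.

    `image` is a list of rows (each row a list of pixel values).
    The two positions are drawn independently, so they may overlap or,
    by chance, coincide (in which case the image is unchanged).
    """
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    if patch_h > height or patch_w > width:
        raise ValueError("patch must fit inside the image")

    def random_corner():
        # Top-left corner such that the whole patch stays inside the image.
        return rng.randint(0, height - patch_h), rng.randint(0, width - patch_w)

    dst_r, dst_c = random_corner()  # patch that gets replaced
    src_r, src_c = random_corner()  # patch that is copied

    mixed = [row[:] for row in image]  # leave the original image untouched
    for dr in range(patch_h):
        for dc in range(patch_w):
            # Read from the original so overlapping patches do not smear.
            mixed[dst_r + dr][dst_c + dc] = image[src_r + dr][src_c + dc]
    return mixed
```

Because the label of the image is unchanged by this operation, the augmented image can be paired with the original label during training; only the spatial arrangement of local content differs.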

With this data augmentation, the model is trained to take into account even fine-grained details of the image that were previously not treated as important when finding the correct answer for the object, which ultimately improves the performance of the feature extractor. By preventing the training-data memorization problem of deep learning, the model learns to extract features that reflect the unique patterns of the input image, and it can therefore be expected to extract meaningful patterns even from images of categories not seen during training.

In addition to this data augmentation, a preprocessing step that automatically converts the size or shape of the input image may also be performed.

Referring again to FIG. 2, the primary learning step is performed (S230): features are extracted from the augmented input images and learned based on deep learning, thereby generating a classification model that classifies the input images.

FIG. 4 is a diagram for explaining the feature extraction process used in a few-shot image classification method according to an embodiment of the present invention.

As shown in FIG. 4, the classification model may be generated through a multi-feature extraction classifier having a plurality of sub-layers. The classifier uses a classification technique based on cosine similarity. A cosine classifier encourages the feature extractor to learn in a direction that reduces the variation of the distribution among data within a class, which is advantageous in few-shot learning, where new categories must be classified.
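As a rough illustration of classification by cosine similarity, the sketch below scores a feature vector against one weight vector per class and picks the best match. The names `cosine_logits` and `predict`, the `scale` factor, and its default value are hypothetical choices for this example; the source does not specify them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_logits(feature, class_weights, scale=10.0):
    """One score per class: scaled cosine similarity between the feature
    and that class's weight vector. `scale` is an assumed hyperparameter."""
    return [scale * cosine_similarity(feature, w) for w in class_weights]


def predict(feature, class_weights):
    """Index of the class whose weight vector is most similar to the feature."""
    logits = cosine_logits(feature, class_weights)
    return max(range(len(logits)), key=logits.__getitem__)
```

Since only the direction of the feature vector matters, rescaling a feature does not change the predicted class, which is one way to read the statement that a cosine classifier reduces sensitivity to within-class variation.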

In particular, because the present invention extracts features and generates the classification model from data that has undergone self-mixing data augmentation, a classification model that additionally includes several sub-layers (3-1, 4-1, 4-2) can be generated, as illustrated.

Referring again to FIG. 2, the secondary learning step is performed to generate the final image classification model (S240): a new auxiliary classifier is generated from the main classifier included in the classification model, and the independently output result of each classifier is shared with the other classifiers according to the self-distillation technique.

FIG. 5 is a diagram for explaining the self-distillation technique used in a few-shot image classification method according to an embodiment of the present invention.

The self-distillation technique generates, from an intermediate layer of the deep learning model, a sub-model having a further auxiliary classifier, and shares knowledge between the prediction output by this auxiliary classifier and the prediction of the main classifier through the Kullback-Leibler (KL) divergence. Each classifier outputs an independent classification result for an input sample, and sharing this information regularizes the main classifier so that its output distribution carries richer information.

The related equation is as follows.

[Equation 2]

The symbols in Equation 2 denote the categories of the training data, the number of classifiers, and the learning parameters. In the self-distillation technique, the feature vectors representing the features of each image are classified so that knowledge is shared, and the per-class probability distributions output by the classifiers, which constitute the knowledge information, are regularized to become similar to one another, so that the main classification model can classify more accurately.
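The idea of sharing knowledge between classifiers through the KL divergence can be sketched as follows. Equation 2 itself is not reproduced in this text, so the exact form used here, an average of the KL divergence over all ordered pairs of classifiers, is an assumption, as are the function names.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution over classes."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; `eps` guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def self_distillation_loss(all_logits):
    """Average KL divergence over every ordered pair of classifiers.

    `all_logits` holds one list of class scores per classifier (main and
    auxiliary) for the same input sample. The pairwise-average form is an
    assumed stand-in for the patent's Equation 2.
    """
    probs = [softmax(logits) for logits in all_logits]
    count = len(probs)
    if count < 2:
        return 0.0
    total = 0.0
    for i in range(count):
        for j in range(count):
            if i != j:
                total += kl_divergence(probs[i], probs[j])
    return total / (count * (count - 1))
```

The loss is zero when all classifiers output the same class distribution and grows as their predictions diverge, so minimizing it alongside the ordinary classification loss pulls the output distributions toward one another.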

Referring again to FIG. 2, a classification result is output for a newly input image using the image classification model built in the preceding steps (S250). A step of evaluating the performance of the image classification model using the classification result may additionally be performed.

In this evaluation step, features of the input image are extracted, but unlike the primary and secondary learning steps described above, only a feature extractor with a single-layer structure is used to extract the final features. For performance evaluation, a small number of input images that were not used in the learning steps may be used, and the class to which a query image belongs is finally determined based on its similarity to those images.
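A similarity-based evaluation of this kind might look like the sketch below, which operates on feature vectors that are assumed to have already been extracted. Averaging the few labeled features of each class into a prototype and comparing by cosine similarity are assumptions for illustration; the source only states that the class is decided from similarity with the query image.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_prototypes(support_features, support_labels):
    """Average the few labeled feature vectors of each new class."""
    groups = defaultdict(list)
    for feature, label in zip(support_features, support_labels):
        groups[label].append(feature)
    return {
        label: [sum(column) / len(features) for column in zip(*features)]
        for label, features in groups.items()
    }


def classify_query(query_feature, prototypes):
    """Label of the prototype most similar to the query feature."""
    return max(
        prototypes,
        key=lambda label: cosine_similarity(query_feature, prototypes[label]),
    )
```

For example, with two labeled feature vectors per class, `class_prototypes` yields one averaged vector per class, and `classify_query` assigns a query feature to whichever class average it points closest to.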

As described above, the present invention is characterized by generating the image classification model through the primary learning step and the secondary learning step.

The deep learning-based few-shot image classification method according to an embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as a program module executed by a computer. A computer-readable medium can be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. A computer-readable medium may also include a computer storage medium. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

Although the method and system of the present invention have been described in connection with specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present application is illustrative, and those of ordinary skill in the art to which the present application pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present application is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

100: few-shot image classification apparatus
110: communication module
120: memory
130: processor
140: database
150: input/output module

Claims

A deep learning-based few-shot image classification method, comprising:
a primary learning step of performing data augmentation on an input image according to a self-mixing technique, which generates a new input image by replacing a part of the input image with another part of the same input image, performing feature extraction on the augmented input images, and generating a classification model that learns the extracted features based on deep learning and classifies the input image;
a secondary learning step of generating a new auxiliary classifier from a main classifier included in the classification model, and allowing the results independently output by each classifier to be shared with the other classifiers according to a self-distillation technique; and
a step of outputting a classification result by inputting a new input image to the image classification model that has undergone the primary and secondary learning steps.
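
For illustration only, the self-mixing augmentation recited in this claim could be sketched as follows. The function name `self_mix`, the rectangular patch shape, the patch size, and the uniform random choice of source and destination regions are assumptions made for this sketch; the claim itself states only that a part of the input image is replaced with another part of the same image.

```python
import numpy as np

def self_mix(image, patch_h, patch_w, rng):
    """Return a copy of `image` in which one rectangular patch is
    overwritten by another patch taken from the same image.

    Hypothetical helper: patch shape and sampling are assumptions.
    """
    h, w = image.shape[:2]
    if patch_h > h or patch_w > w:
        raise ValueError("patch must fit inside the image")

    # Top-left corner of the source patch (the region that is copied).
    sy = int(rng.integers(0, h - patch_h + 1))
    sx = int(rng.integers(0, w - patch_w + 1))
    # Top-left corner of the destination patch (the region that is replaced).
    dy = int(rng.integers(0, h - patch_h + 1))
    dx = int(rng.integers(0, w - patch_w + 1))

    mixed = image.copy()
    # Read from the original image so overlapping regions are handled safely.
    mixed[dy:dy + patch_h, dx:dx + patch_w] = image[sy:sy + patch_h, sx:sx + patch_w]
    return mixed

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))          # stand-in for an H x W x C input image
augmented = self_mix(image, 3, 3, rng)
```

The augmented array has the same shape as the input and contains only pixel values that already occur in the original image.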

The method of claim 1,
wherein, in the primary learning step, the classification model is generated through a multi-feature extraction classifier having a plurality of sub-layers, and the classification model performs classification using cosine similarity.
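
As a rough sketch of classification by cosine similarity, the snippet below scores a feature vector against one weight vector per class and picks the most similar class. The function name `cosine_logits`, the `scale` factor, and the toy feature and weight values are assumptions for illustration; the claim does not specify how the similarity scores are scaled or how the class vectors are obtained.

```python
import numpy as np

def cosine_logits(features, class_weights, scale=10.0, eps=1e-8):
    """Cosine similarity between each feature row and each class weight row.

    features:      (batch, dim) array of extracted features
    class_weights: (num_classes, dim) array, one vector per class
    scale:         assumed multiplier applied to the similarities
    """
    # L2-normalise both sides so the dot product equals the cosine of the angle.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    w = class_weights / (np.linalg.norm(class_weights, axis=1, keepdims=True) + eps)
    return scale * (f @ w.T)

# Toy data: two samples, two classes, two feature dimensions.
features = np.array([[1.0, 0.0], [0.0, 2.0]])
class_weights = np.array([[3.0, 0.0], [0.0, 0.5]])

logits = cosine_logits(features, class_weights)
predictions = logits.argmax(axis=1)    # index of the most similar class
```

Because both operands are normalised, the prediction depends only on the direction of the feature vector, not on its magnitude.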

The method of claim 1,
wherein, in the secondary learning step, knowledge is shared between the prediction result output by the auxiliary classifier and the result predicted by the main classifier through KL divergence.
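
One possible reading of knowledge sharing through KL divergence is sketched below: the softened predictions of the main and auxiliary classifiers are compared in both directions, so that each classifier is pulled toward the other. The symmetric form, the `temperature` parameter, and the function names are assumptions for this sketch and are not stated in the claim.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with an assumed temperature for softening."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))

def mutual_distillation_loss(main_logits, aux_logits, temperature=4.0):
    """Symmetric KL between main and auxiliary predictions (assumed form)."""
    p_main = softmax(main_logits, temperature)
    p_aux = softmax(aux_logits, temperature)
    return kl_divergence(p_main, p_aux) + kl_divergence(p_aux, p_main)

main_logits = np.array([[2.0, 0.5, -1.0]])
aux_logits = np.array([[0.1, 1.5, 0.3]])

loss_same = mutual_distillation_loss(main_logits, main_logits)
loss_diff = mutual_distillation_loss(main_logits, aux_logits)
```

The loss is zero when the two classifiers agree and grows as their predicted distributions diverge; in training, such a term would typically be added to the ordinary classification loss.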

A deep learning-based few-shot image classification apparatus, comprising:
a communication module;
a memory in which a few-shot image classification program is stored; and
a processor that executes the program stored in the memory,
wherein the few-shot image classification program includes an image classification model generated through a primary learning step and a secondary learning step, the primary learning step performing data augmentation on an input image according to a self-mixing technique, which generates a new input image by replacing a part of the input image with another part of the same input image, performing feature extraction on the augmented input images, and generating a classification model that learns the extracted features based on deep learning and classifies the input image, and the secondary learning step generating a new auxiliary classifier from a main classifier included in the classification model and allowing the results independently output by each classifier to be shared with the other classifiers according to a self-distillation technique, and
wherein the few-shot image classification program inputs a new input image to the image classification model and outputs a classification result for the new input image.

The apparatus of claim 4,
wherein, in the primary learning step, the classification model is generated through a multi-feature extraction classifier having a plurality of sub-layers, and the classification model performs classification using cosine similarity.

The apparatus of claim 4,
wherein, in the secondary learning step, knowledge is shared between the prediction result output by the auxiliary classifier and the result predicted by the main classifier through KL divergence.

A non-transitory computer-readable recording medium on which a computer program for performing the deep learning-based few-shot image classification method according to any one of claims 1 to 3 is recorded.