KR102418596B1

KR102418596B1 - A deep-learning system based small learning data-set and method of deep-learning used it

Info

Publication number: KR102418596B1
Application number: KR1020200062746A
Authority: KR
Inventors: 강대성; 이준목
Original assignee: 동아대학교 산학협력단
Priority date: 2020-05-26
Filing date: 2020-05-26
Publication date: 2022-07-06
Also published as: KR20210145923A

Abstract

The present invention relates to a deep learning system based on a small training data set and a deep learning training method using the same. Training data for deep learning is divided into small units and verified in advance, which raises the reliability of deep learning on small training data and shortens the time spent repeatedly training on raw data.
To this end, the invention provides a technology comprising: a data imbalance analysis unit (10) that checks whether class imbalance exists in the training data set input for training a deep learning model, and separates the data into a small data set and a reference data set and analyzes them; a data generation unit (20) that generates a generated data set and a synthetic data set from the small training data set and the reference data set required for data generation, and determines whether the generated data actually exists; a model initialization unit (30) that uses the data similarity clustering result and an existing pre-trained general-purpose deep learning model to create a model that transfer-learns the initialization of the weights and hyperparameters of the final deep learning model to be trained; a model training unit (40) that trains the deep learning model with an ordinary training data set and labeling information, using the training data set generated and verified from the small data set and the optimal transfer learning data produced by the meta-learning model; and a model inference unit (50) that performs inference such as classification or object detection based on the trained model.

Description

A DEEP-LEARNING SYSTEM BASED SMALL LEARNING DATA-SET AND METHOD OF DEEP-LEARNING USED IT

The present invention relates to a deep learning system based on a small training data set and a deep learning training method using the same. The training data set is augmented from a small data set so that the deep learning model reaches the accuracy of a model trained on a sufficiently large data set, and the data set is verified in advance to eliminate redundant work, shortening the time spent repeatedly training on raw data for deep learning.

In general, deep learning is a representative machine learning technique that analyzes training data, that is, raw data, through repeated training.

In such deep learning, analysis precision increases as a large amount of raw data is learned repeatedly.

However, with conventional deep learning, when the training data is a small data set containing little raw data, analysis precision drops and reliability falls with it.

In addition, when raw data for training is input, training starts from scratch. Identical or similar situations, or events occurring in the raw data, can therefore be recognized and analyzed only after learning through repeated iterations, which increases the time required for the initial stage of deep learning.

Various methods have been proposed to overcome these problems. A representative example is Korean Registered Patent No. 10-2019208 (Deep learning-based error classification method and apparatus; hereinafter the 'prior art'), which discloses a deep learning-based error classification method and apparatus. The error classification method comprises: inputting the same data to each of a plurality of recognition models having different properties; extracting a recognition result for the data through each of the plurality of recognition models; determining whether the recognition results of the plurality of recognition models are identical to one another; and, based on that determination, classifying the data into verified recognition results that do not need re-verification and unverified recognition results that do.

However, because the prior art repeatedly generates and compares a plurality of trained recognition models, it produces a large volume of training-result data for deep learning, and processing that data takes a long time.

Korean Registered Patent No. 10-2019208 (Deep learning-based error classification method and apparatus, registered September 2, 2019)

To overcome the above problems, an object of the present invention is to provide a small training data set-based deep learning system, and a deep learning training method using the same, that perform preliminary verification of raw data on the basis of a small data set before deep learning training and thereby shorten the training time for deep learning.

To achieve the object of the present invention, a deep learning system based on a small training data set is provided, characterized in that it comprises: a data imbalance analysis unit (10) that checks whether class imbalance exists in the training data set input for training a deep learning model, and separates the data into a small data set and a reference data set and analyzes them; a data generation unit (20) that generates a generated data set and a synthetic data set from the small training data set and the reference data set required for data generation, and determines whether the generated data is suitable for actual use as training data; a model initialization unit (30) that uses the data similarity clustering result and an existing pre-trained general-purpose deep learning model to transfer-learn the weight initialization and automatic hyperparameter setting of the final deep learning model to be trained; a model training unit (40) that trains the deep learning model using the training data set generated and verified from the small data set and the optimal transfer learning data produced by the meta-learning model; and a model inference unit (50) that performs inference such as classification or object detection based on the trained model.

Here, the data imbalance analysis unit (10) consists of an analysis unit (11) that analyzes the input training data and a classification unit (12) that classifies the data into a small data set and a reference data set according to the analysis result of the analysis unit (11). The data generation unit (20) consists of a generation unit (21) that generates data; a verification unit (22) that determines whether the small training data actually exists; a generation network management unit (23) that allows the training data generation unit (21) to acquire data for training; and a verification network management unit (24) that allows the verification unit (22) to acquire reference data against which it can verify data.

In addition, the data generation unit (20) further includes a probability calculation unit (25) that judges the authenticity of data and finds the corresponding probability value, and the model verification unit (22) further includes a similarity clusterer that compares the similarity between the given training data sets, clusters them, and, based on the proportion of each cluster, assigns application weights to the training data and to the meta-learning model to be applied. In determining the similarity of data, the similarity clusterer uses any one of measurement algorithms such as Euclidean distance, Mahalanobis distance, Minkowski distance, and cosine similarity.

In addition, a deep learning training method using the small training data set-based deep learning system according to the present invention is provided, characterized in that it comprises: determining data imbalance in the training data set (S10); separating the data into a small training data set and a reference data set in a data imbalance situation (S20); receiving the small training data set and the reference data set and generating a new data set (S30); determining, by comparing the generated data with the small training data, whether a data set is virtually generated (S40); selecting generated training data as the verification probability value converges to 0.5 (S50); measuring the similarity between data in the training data set (S60); clustering the data according to similarity (S70); applying an existing pre-trained deep learning model according to the similarity information (S80); specifying the weights and hyperparameters of the applied deep learning model (S90); training the deep learning model with the generated training data set and the specified initial weights and hyperparameter values (S100); and performing inference with the trained deep learning model (S110). This achieves a further object of the present invention.

By providing the small training data set-based deep learning system and the deep learning training method using it, the present invention allows a deep learning model trained from a small data set to reach the high accuracy of a model trained on a sufficiently large data set. It also shortens the training time for deep learning by verifying the data set in advance and eliminating redundant work.

FIG. 1 is a configuration diagram of the small training data set-based deep learning system of the present invention.
FIG. 2 is a detailed configuration diagram of the data generation unit (20) according to the present invention.
FIG. 3 is a schematic diagram of the deep learning training method according to the present invention.
FIG. 4 is an exemplary diagram illustrating the operation of the imbalance analysis unit (10) according to the present invention.
FIG. 5 is an exemplary diagram illustrating the operation of the training data generation unit (21) according to the present invention.
FIG. 6 is an exemplary diagram illustrating the operation of the verification unit (22) according to the present invention.
FIG. 7 is an exemplary diagram illustrating the operation of the similarity clusterer according to the present invention.
FIG. 8 is an exemplary diagram illustrating the operation of the model initialization unit (30) according to the present invention.
FIG. 9 is an exemplary diagram illustrating the operation of the model training unit (40) according to the present invention.

The small training data set-based deep learning system of the present invention and the deep learning training method using it are described in detail below with reference to the drawings.

FIG. 1 is a configuration diagram of the small training data set-based deep learning system of the present invention, and FIG. 2 is a detailed configuration diagram of the data generation unit (20) according to the present invention.

FIG. 3 is a schematic diagram of the deep learning training method according to the present invention; FIG. 4 illustrates the operation of the imbalance analysis unit (10); FIG. 5 illustrates the operation of the training data generation unit (21); FIG. 6 illustrates the operation of the verification unit (22); FIG. 7 illustrates the operation of the similarity clusterer; FIG. 8 illustrates the operation of the model initialization unit (30); and FIG. 9 illustrates the operation of the model training unit (40).

Referring to FIGS. 1 to 9, the small training data set-based deep learning system according to the present invention consists of: a data imbalance analysis unit (10) that checks whether class imbalance exists in the training data set input for training a deep learning model, and classifies the data into a small data set and a reference data set and analyzes them; a data generation unit (20) that generates a generated data set and a synthetic data set from the small training data set and the reference data set required for data generation, and determines whether the generated data actually exists; a model initialization unit (30) that uses the data similarity clustering result and an existing pre-trained general-purpose deep learning model to create a model that transfer-learns the initialization of the weights and hyperparameters of the final deep learning model to be trained; a model training unit (40) that trains the deep learning model with an ordinary training data set and labeling information, using the training data set generated and verified from the small data set and the optimal transfer learning data produced by the meta-learning model; and a model inference unit (50) that performs inference such as classification or object detection based on the trained model.

The data imbalance analysis unit (10) checks whether class imbalance exists in the training data set input for training the deep learning model, and classifies the data into a small data set and a reference data set and analyzes them.

Here, analyzing class imbalance in the training data set means analyzing and classifying whether the number of training data items in each class of the given data set is appropriate for training the deep learning model, whether the data is skewed toward one side, and whether the absolute amount of data is adequate.

For example, when developing a deep learning inference model that detects indoor fires, the required training data set consists of data corresponding to fire and data corresponding to non-fire. The fire detection training data set has size n*m: n fire and non-fire classes, and m data items for training each class. Evaluating whether the number of training data items per class is appropriate means checking whether each class, from 0 to n, holds an even m items. Next, consider a deep learning inference model that detects fire in the footage of a CCTV camera collecting information on a fixed area. The model is trained on a data set composed of ordinary CCTV footage in which no fire occurs and footage of fire situations. In such a data set, the amount of non-fire data will far exceed the amount of fire data; this is the case in which the data is skewed to one side. Finally, evaluating whether the absolute amount of data is appropriate means checking whether the whole n*m data set meets a given data set size, that is, analyzing whether the size of the training data set itself is adequate.
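The three checks described above (even per-class counts, skew toward one class, and adequate absolute size) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `analyze_imbalance` and the thresholds `min_total` and `max_ratio` are assumptions introduced here.

```python
from collections import Counter


def analyze_imbalance(labels, min_total=1000, max_ratio=1.5):
    """Summarize per-class counts, skew, and total size for a list of labels.

    min_total and max_ratio are illustrative thresholds, not values from the patent.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    # Ratio between the largest and smallest class: 1.0 means perfectly even.
    ratio = max(counts.values()) / min(counts.values())
    return {
        "counts": dict(counts),
        "imbalance_ratio": ratio,
        "balanced": ratio <= max_ratio,      # are the classes roughly even?
        "enough_data": total >= min_total,   # is the absolute size adequate?
    }


# CCTV-style example: far more non-fire frames than fire frames.
report = analyze_imbalance(["no_fire"] * 950 + ["fire"] * 50)
```

In this toy case the data set is large enough overall but heavily skewed toward the non-fire class, which is the situation the text describes.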

More specifically, the unit distinguishes whether the raw data for training is small-scale data or reference data that can be consulted during training.

The data imbalance analysis unit (10) consists of an analysis unit (11) that analyzes the input training data and a classification unit (12) that classifies the data into a small data set and a reference data set according to the analysis result of the analysis unit (11).
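One way to picture the classification unit is a split by per-class sample count. The patent does not state the exact criterion, so the threshold `min_per_class` and the function name `split_small_and_reference` below are assumptions made only for illustration.

```python
from collections import defaultdict


def split_small_and_reference(samples, labels, min_per_class=100):
    """Group samples by class, then route each class by its sample count.

    Classes with fewer than min_per_class samples go to the small data set;
    the rest go to the reference data set. The threshold is hypothetical.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    small, reference = {}, {}
    for label, items in by_class.items():
        target = reference if len(items) >= min_per_class else small
        target[label] = items
    return small, reference


samples = list(range(1000))                      # stand-ins for image frames
labels = ["no_fire"] * 950 + ["fire"] * 50
small, reference = split_small_and_reference(samples, labels)
```

Here the scarce fire class ends up in the small data set and the plentiful non-fire class in the reference data set, matching the fire-detection example used in the text.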

Here, the reference data set is the part of a data set acquired for training that is easy to collect or for which an adequate amount of data exists; it is data that can be consulted when analyzing and learning the training data.

Here, 'easy to collect' means that a large amount of data can be gathered readily and without particular restriction, through open data sets from Google, Kaggle, GitHub and the like, or through internet sharing or collection devices. For example, when video data is collected with security CCTV to detect criminal behavior, video of ordinary people's behavior with no criminal act is easy to collect, whereas video of actual robbery, injury, or violence is difficult to collect.

is designated, or it may be data that contains information on an abnormal event such as a 'fire situation' when one occurs; it is data that can be consulted when analyzing the training data.

The data generation unit (20) generates a generated data set and a synthetic data set based on the small training data set and the reference data set classified by the imbalance analysis unit (10), and determines whether the data in the generated and synthetic data sets actually exists.

The data generation unit consists of a generation unit (21) that generates data; a verification unit (22) that determines whether the small training data actually exists; a generation network management unit (23) that allows the training data generation unit (21) to acquire data for training; and a verification network management unit (24) that allows the verification unit (22) to acquire reference data against which it can verify data.

The data generation unit (20) is further provided with a probability calculation unit (25).

The training data generation unit (21) designates the small data set for training and the reference data set to be used as base data for synthesizing new data, and a generated data set having the characteristics of both data sets is then produced.

Here, the small data set is unprocessed raw data that is hard to collect or small in quantity, and the reference data set is data that is generally easy to collect or is available in sufficient quantity in a data imbalance situation.

For example, given a training data set for a deep learning model that detects 'fire accidents in buildings', the data for images in which a fire actually occurred is the small data set, and the images of building interiors with no fire are the reference data.

As described above, information on the event situation, which is the feature information of the reference data set, is overlaid on the small data set (the raw data), or part of an image or video is included in the small data set.
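The overlay step can be pictured as blending a region of one image into another. The sketch below uses plain alpha blending on grayscale images stored as nested lists; the blending rule, the function name `overlay_patch`, and the `alpha` parameter are all assumptions, since the patent does not specify how the overlay is computed.

```python
def overlay_patch(target, patch, top, left, alpha=0.7):
    """Blend patch onto a copy of target (2-D lists of grayscale values).

    alpha weights the patch; (1 - alpha) weights the underlying pixel.
    Pixels of the patch that fall outside target are ignored.
    """
    out = [row[:] for row in target]  # copy so the input image is untouched
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            r, c = top + i, left + j
            if 0 <= r < len(out) and 0 <= c < len(out[r]):
                out[r][c] = alpha * value + (1 - alpha) * out[r][c]
    return out


background = [[0.0] * 4 for _ in range(4)]   # stand-in for one image
event_patch = [[1.0, 1.0], [1.0, 1.0]]       # stand-in for an event region
synthetic = overlay_patch(background, event_patch, top=1, left=1, alpha=0.5)
```

A generative model would normally replace this hand-written blend; the sketch only shows what "a sample carrying the characteristics of both data sets" can mean at the pixel level.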

The verification unit (22) determines and verifies whether a generated or synthesized data set is a virtually generated data set by comparing it with the originally given training data set. Based on the small training data set and the reference data set transferred from the generation network management unit (23) and the verification network management unit (24), it determines the type and authenticity of the data set input to the verification network management unit (24), that is, whether it was generated or synthesized.

The probability calculation unit (25) judges the authenticity of the data and finds the corresponding probability value.

For example, when the probability value converges to 0, the probability calculation unit (25) judges all data to be false, and when it converges to 1, it judges all data to be true.

Therefore, the probability value is driven to converge to 0.5. When the probability calculation unit (25) can no longer tell real data from generated data, that is, cannot decide whether the data is false or true, the generated or synthetic data set is selected as the data set on which the deep learning model will be trained.
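The selection rule can be sketched as a filter that keeps a generated batch only when the verification score sits close to 0.5. The tolerance, the per-batch averaging, and the names `select_generated` and `toy_discriminator` are assumptions for illustration; the patent only states that data is selected as the probability converges to 0.5.

```python
def select_generated(batches, discriminator, tolerance=0.05):
    """Keep generated batches whose mean discriminator score is near 0.5.

    discriminator is any callable returning the probability that a sample
    is real. tolerance is an illustrative value, not taken from the patent.
    """
    selected = []
    for batch in batches:
        if not batch:
            continue  # nothing to score
        scores = [discriminator(sample) for sample in batch]
        mean_score = sum(scores) / len(scores)
        if abs(mean_score - 0.5) <= tolerance:
            selected.append(batch)
    return selected


def toy_discriminator(sample):
    """Toy stand-in: each sample already is its own 'realness' score."""
    return sample


batches = [[0.05, 0.10], [0.48, 0.53], [0.95, 0.90]]
kept = select_generated(batches, toy_discriminator)
```

Only the middle batch survives: the first is confidently judged fake and the third confidently judged real, while the middle one is indistinguishable to the discriminator.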

The probability calculation unit (25) is equipped with [Equation 1] below and uses it to calculate the probability value.

G: the generative model applied to the generation unit

D: the discriminative model applied to the verification unit

p_z: the probability distribution of z

z: a randomly generated noise variable

x: real data

D(x): the probability that D, given real data x, judges it to be real
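The body of Equation 1 is not reproduced in this text. The symbols defined here (G, D, p_z, z, x, D(x)) are those of the standard generative adversarial network value function, so, assuming Equation 1 is that objective, it would read as follows. The symbol p_data for the distribution of real data is an assumption, as it is not defined in the text.

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

Under this objective, when the generator's distribution matches the real data distribution, the optimal discriminator outputs D(x) = 1/2 everywhere, which is consistent with the 0.5 selection criterion in the description.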

In addition, the model verification unit (22) further includes a similarity clusterer (not shown) that compares the similarity between the given training data sets, clusters them, and, based on the proportion of each cluster, assigns application weights to the training data and to the meta-learning model to be applied.
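Weighting by cluster proportion can be sketched as the share of samples that fall into each cluster. The patent does not give the weighting formula, so the simple fraction used below and the function name `cluster_weights` are assumptions.

```python
from collections import Counter


def cluster_weights(cluster_ids):
    """Return, for each cluster id, the fraction of samples assigned to it."""
    if not cluster_ids:
        raise ValueError("cluster_ids must not be empty")
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return {cluster_id: n / total for cluster_id, n in counts.items()}


# Ten samples assigned to three clusters by some earlier clustering step.
weights = cluster_weights([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
```

The resulting weights sum to 1, so they can be used directly as mixing weights when deciding how strongly each cluster should influence the choice of model.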

The similarity of the data is measured using an algorithm such as Euclidean distance, Mahalanobis distance, Minkowski distance, or cosine similarity.
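For reference, the four measures named above can be written in a few lines of standard-library Python. This is a sketch for feature vectors given as lists of numbers; the Mahalanobis version assumes the caller supplies a precomputed inverse covariance matrix `inv_cov`.

```python
import math


def minkowski(a, b, p=2):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean(a, b):
    """Euclidean distance, the p=2 case of the Minkowski distance."""
    return minkowski(a, b, p=2)


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; undefined for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mahalanobis(a, b, inv_cov):
    """Mahalanobis distance given the inverse covariance matrix inv_cov."""
    diff = [x - y for x, y in zip(a, b)]
    n = len(diff)
    # Row vector diff^T multiplied by inv_cov.
    left = [sum(diff[i] * inv_cov[i][j] for i in range(n)) for j in range(n)]
    return math.sqrt(sum(l * d for l, d in zip(left, diff)))


identity = [[1.0, 0.0], [0.0, 1.0]]
d_euclid = euclidean([0, 0], [3, 4])
d_mahal = mahalanobis([0, 0], [3, 4], identity)
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance; it differs only when features are correlated or have different scales.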

The model initialization unit (30) clusters the feature points of the small-scale data, measures their similarity to existing pre-trained general-purpose deep learning models, selects the most similar model, and transfers its characteristics. In this way the weights of the final deep learning model are initialized and its hyperparameters are set early, reducing training time.
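Choosing the most similar pre-trained model can be sketched as a nearest-neighbor lookup. Everything concrete below is hypothetical: the `model_zoo` dictionary, its `descriptor` feature vectors, the stored learning rates, and the use of cosine similarity are illustrative assumptions, not details given in the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pick_initial_model(data_centroid, model_zoo):
    """Return the name of the zoo entry most similar to the data centroid."""
    return max(
        model_zoo,
        key=lambda name: cosine_similarity(data_centroid, model_zoo[name]["descriptor"]),
    )


# Hypothetical pre-trained models, each summarized by a feature descriptor
# and the hyperparameters that worked for it.
model_zoo = {
    "indoor_scenes": {"descriptor": [0.9, 0.1, 0.0], "lr": 1e-3},
    "street_scenes": {"descriptor": [0.1, 0.9, 0.2], "lr": 5e-4},
}

chosen = pick_initial_model([0.8, 0.2, 0.1], model_zoo)
init_hparams = {"lr": model_zoo[chosen]["lr"]}  # starting point for training
```

Starting from the chosen model's weights and hyperparameters, rather than from random initialization, is what shortens training in the scheme the text describes.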

This improves compatibility, so that various kinds of deep learning models can be applied according to the characteristics of the small-scale training data.

The model initialization unit (30) further includes a meta-learning model management unit (31). The meta-learning model is a model that uses the data similarity clustering result and existing pre-trained general-purpose deep learning models to transfer-learn the initialization of the weights and hyperparameters of the final deep learning model to be trained; given a general-purpose deep learning model and a data similarity clustering result as input, it outputs the initial weights and hyperparameter values.

The model training unit (40) uses the training data set generated and verified from the small data set and the optimal transfer learning data produced by the meta-learning model, and is applicable to any deep learning model that is trained with an ordinary training data set and labeling information.

The model inference unit (50) performs inference, such as classification or object detection, based on the trained model.

The deep learning training method using the small-training-data-set-based deep learning system of the present invention described above consists of the following steps:
determining, using the data imbalance analysis device unit (10), whether data imbalance exists in the training data set input for deep learning model training (S10);
separating the data, after the determining step (S10) and where data imbalance exists, into a small training data set and a reference data set (S20);
receiving the small training data set and the reference data set and generating a new data set through the data generation unit (20) (S30);
determining, after the generating step (S30), whether the data set is a virtually generated data set by comparing the generated data with the small-scale training data (S40);
selecting the generated training data once the verification probability value produced by the probability calculation unit (25) of the data generation unit (20) converges to 0.5 (S50);
measuring the similarity between the data in the training data set through the model verification unit (22) of the data generation unit (20) (S60);
clustering the data according to the similarity (S70);
applying a previously trained deep learning model according to the similarity information (S80);
designating the weights and hyperparameters of the applied deep learning model (S90);
training the deep learning model with the generated training data set and the designated initial weights and hyperparameters (S100); and
performing inference through the trained deep learning model (S110).
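Steps S10 and S20 can be illustrated with a short sketch. The rule used here, treating a class as under-represented when it has fewer than `ratio_threshold` times the samples of the largest class, is an assumption made for the example, as is the mapping of minority classes to the small data set and the remaining classes to the reference data set. The patent does not specify either.

```python
from collections import Counter


def split_by_imbalance(samples, labels, ratio_threshold=0.5):
    """Detect class imbalance (S10) and split the data accordingly (S20).

    A class is treated as under-represented when its sample count is below
    ratio_threshold times the count of the largest class. The threshold is
    an illustrative assumption, not a value taken from the patent.
    Returns (is_imbalanced, small_set, reference_set); each set is a list
    of (sample, label) pairs in their original order.
    """
    counts = Counter(labels)
    if not counts:
        return False, [], []
    largest = max(counts.values())
    minority = {cls for cls, n in counts.items() if n < ratio_threshold * largest}
    pairs = list(zip(samples, labels))
    small_set = [(s, y) for s, y in pairs if y in minority]
    reference_set = [(s, y) for s, y in pairs if y not in minority]
    return bool(minority), small_set, reference_set
```

When no class falls below the threshold, the function reports no imbalance and returns an empty small set, so the later generation steps can be skipped.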

By providing the system and method described above, the present invention provides a deep learning system based on a small training data set and a deep learning training method using it.

10: data imbalance analysis device unit
20: data generation unit
30: model initialization unit
40: model learning unit

Claims

A deep learning system based on a small training data set, comprising:
a data imbalance analysis device unit (10) that examines whether class imbalance exists in the training data set input for training a deep learning model, and separates the data into a small data set and a reference data set for analysis;
a data generation unit (20) that generates a generated data set and a synthetic data set from the small training data set and the reference data set required for data generation, and determines whether the generated data actually exists;
a model initialization unit (30) that creates a model which, using the data-similarity clustering result and a previously trained general-purpose deep learning model, transfer-learns the initialization of the weights and hyperparameters of the final deep learning model to be trained;
a model learning unit (40) that trains a deep learning model through an ordinary training data set and labeling information, using the training data set generated and verified from the small data set and the optimal transfer-learning data generated by the meta-learning model; and
a model inference unit (50) that performs inference such as classification or object detection based on the trained model,
wherein the data generation unit (20) comprises a generation unit (21) that generates data, a verification unit (22) that determines whether the small-scale training data actually exists, a generation network management unit (23) that enables the generation unit (21) to acquire data for training, a verification network management unit (24) that enables acquisition of reference data so that the verification unit (22) can verify data, and a probability calculation unit (25) that determines the authenticity of data and calculates a probability value.

The system of claim 1,
wherein the data imbalance analysis device unit (10) comprises an analysis unit (11) that analyzes the input training data and a classification unit (12) that classifies the data into a small data set and a reference data set according to the result of the analysis by the analysis unit (11).

delete

The system of claim 1,
wherein the probability calculation unit (25) holds Equation 1 below and uses it to calculate the probability value.

[Equation 1]

G : generative model applied to the generation unit
D : discriminative model applied to the verification unit
p_z : probability distribution of z
z : randomly generated noise variable
x : actual data
D(x) : probability that D, given real data x, judges it to be real data
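
The body of Equation 1 does not appear in this text. The symbols defined above (G, D, p_z, z, x, D(x)) match the standard minimax value function of a generative adversarial network, so the equation presumably takes the following form. This is an assumption, not a quotation from the patent, and the symbol p_data for the distribution of the actual data is introduced here for illustration:

```latex
\min_{G}\max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

Under this form, an optimal D outputs 1/2 for every input once the generated data is indistinguishable from the actual data, which is consistent with the convergence to 0.5 used in step S50.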

The system of claim 1,
wherein the verification unit (22) further comprises a similarity clusterer that compares the similarity among the given training data sets, clusters them, and, based on the clustered weights, assigns application weights to the training data and to the meta-learning model to be applied.

7. The system of claim 6,
wherein the similarity clusterer determines the similarity of data using any one of measurement algorithms such as Euclidean distance, Mahalanobis distance, Minkowski distance, and cosine similarity.
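
For reference, the four measures named in this claim can be written as below. This is a plain standard-library sketch using the textbook definitions. The function names, and the choice to pass the inverse covariance matrix directly to the Mahalanobis distance, are assumptions made for the example.

```python
import math


def minkowski_distance(a, b, p=2.0):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean_distance(a, b):
    """Euclidean distance, the Minkowski distance of order 2."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either vector is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mahalanobis_distance(a, b, inv_cov):
    """Mahalanobis distance sqrt((a-b)^T S^-1 (a-b)).

    inv_cov is the inverse covariance matrix S^-1 of the data, given as a
    list of rows; the caller is assumed to have estimated and inverted it.
    """
    diff = [x - y for x, y in zip(a, b)]
    projected = [sum(row[j] * diff[j] for j in range(len(diff))) for row in inv_cov]
    return math.sqrt(sum(d * q for d, q in zip(diff, projected)))
```

The three distances shrink as data points become more alike, while cosine similarity grows, so a clusterer that switches between them has to account for the opposite orientation.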

A deep learning training method using a deep learning system based on a small training data set, comprising:
determining, using the data imbalance analysis device unit (10), whether data imbalance exists in the training data set input for deep learning model training (S10);
separating the data, after the determining step (S10) and where data imbalance exists, into a small training data set and a reference data set (S20);
receiving the small training data set and the reference data set and generating a new data set through the data generation unit (20) (S30);
determining, after the generating step (S30), whether the data set is a virtually generated data set by comparing the generated data with the small-scale training data (S40);
selecting the generated training data once the verification probability value produced by the probability calculation unit (25) of the data generation unit (20) converges to 0.5 (S50);
measuring the similarity between the data in the training data set through the model verification unit (22) of the data generation unit (20) (S60);
clustering the data according to the similarity (S70);
applying a previously trained deep learning model according to the similarity information (S80);
designating the weights and hyperparameters of the applied deep learning model (S90);
training the deep learning model with the generated training data set and the designated initial weights and hyperparameters (S100); and
performing inference through the trained deep learning model (S110).
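
Steps S40 and S50 can be read as keeping only generated samples that the verification side can no longer tell apart from real data. The sketch below illustrates that reading under stated assumptions: `discriminator` is a hypothetical callable standing in for the verification unit, and the per-sample `tolerance` band around 0.5 is an illustrative choice, since the patent only states that the probability value converges to 0.5.

```python
def select_generated_samples(generated, discriminator, tolerance=0.05):
    """Keep generated samples whose 'real' probability is close to 0.5.

    generated: iterable of candidate samples produced by the generator.
    discriminator: hypothetical callable returning, for one sample, the
        probability in [0, 1] that the sample is real data.
    tolerance: half-width of the accepted band around 0.5 (an assumption).
    Returns the accepted samples in their original order.
    """
    selected = []
    for sample in generated:
        p_real = discriminator(sample)
        # A value near 0.5 means the discriminator cannot decide either way.
        if abs(p_real - 0.5) <= tolerance:
            selected.append(sample)
    return selected
```

A sample scored near 0 is recognisably synthetic and is dropped. A score near 1 is also rejected here because, under this reading, only the undecided region around 0.5 signals that generator and discriminator have reached equilibrium.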