KR102183672B1

KR102183672B1 - A Method of Association Learning for Domain Invariant Human Classifier with Convolutional Neural Networks and the method thereof

Info

Publication number: KR102183672B1
Application number: KR1020180059934A
Authority: KR
Inventors: 안하은; 유지상
Original assignee: 광운대학교 산학협력단
Priority date: 2018-05-25
Filing date: 2018-05-25
Publication date: 2020-11-27
Also published as: KR20190134380A

Abstract

The invention relates to an association learning system and method for a domain-invariant human classifier for a convolutional neural network, in which a loss function is designed so that the classifier learns the association between objects, in order to solve the degradation of classifier performance caused by domain change. The system comprises: a neural network model unit that stores a convolutional neural network (CNN) model; a data set storage unit that stores a training data set for training the neural network model; a loss function setting unit that sets a loss function of the convolutional neural network model; and a learning unit that trains the convolutional neural network model with the training data set.
With the above system and method, a model trained by the method can operate more robustly against domain change than models trained by conventional methods.

Description

{A Method of Association Learning for Domain Invariant Human Classifier with Convolutional Neural Networks and the method thereof}

The present invention relates to an association learning system and method for a domain-invariant human classifier for a convolutional neural network, in which a loss function is designed so that the classifier can learn the association between objects, in order to solve the degradation of classifier performance caused by domain change.

The present invention also relates to an association learning system and method for a domain-invariant human classifier for a convolutional neural network, in which the classifier is trained so that the embedding vectors of human images acquired in various domains are generated to be similar to one another, and the transition probability distribution between the embedding vectors is made to resemble a uniform distribution, so that the classifier operates robustly across various domains.

In general, human recognition is a key component technology of surveillance cameras and visitor counting systems. Various methods have been proposed for human recognition, such as building a classifier by applying machine learning to hand-designed features. Recently, human recognition methods using convolutional neural network models, which show good results in vision-based object recognition, have also been widely studied.

The input images of a surveillance camera or visitor counting system span various domains depending on the installation angle and distance of the camera and the type of lens. However, most studies either do not address the degradation of algorithm performance caused by domain change or restrict the input domain.

Therefore, a neural network model that can classify without being affected by domain change is needed, together with a method of training a domain-invariant human classifier to solve this problem.

[Non-Patent Document 1] Haeusser, Philip, Alexander Mordvintsev, and Daniel Cremers. "Learning by association-a versatile semi-supervised training method for neural networks." (CVPR), 2017.
[Non-Patent Document 2] Levi, Dan, and Shai Silberstein. "Tracking and motion cues for rear-view pedestrian detection." Intelligent Transportation Systems (ITSC), 2015.
[Non-Patent Document 3] Dollar, Piotr, et al. "Pedestrian detection: An evaluation of the state of the art." IEEE transactions on pattern analysis and machine intelligence, 2012.
[Non-Patent Document 4] Szegedy, Christian, et al. "Going deeper with convolutions." (CVPR), 2015.
[Non-Patent Document 5] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556, 2014.

An object of the present invention is to solve the problems described above by providing an association learning system and method for a domain-invariant human classifier for a convolutional neural network, in which a convolutional neural network model is trained with a loss function configured so that the embedding vectors of images acquired in several kinds of domains are generated to be similar to one another.

Another object of the present invention is to provide an association learning system and method for a domain-invariant human classifier for a convolutional neural network, in which a training batch containing images acquired in various domains is passed through the convolutional neural network model to generate embedding vectors, transition probabilities are generated from the similarities of the embedding vectors, and the loss function is configured so that the transition probabilities between all embedding vectors in the batch are equal, whereby the model learns the association between human images that exist in various domains.

The similarity of the embedding vectors is a vector similarity such as the similarity matrix defined in [Non-Patent Document 1].

To achieve the above objects, the present invention relates to an association learning system for a domain-invariant human classifier for a convolutional neural network, comprising: a neural network model unit that stores a convolutional neural network (CNN) model; a data set storage unit that stores a training data set for training the neural network model; a loss function setting unit that sets a loss function of the convolutional neural network model; and a learning unit that trains the convolutional neural network model with the training data set.

Further, in the association learning system for a domain-invariant human classifier for a convolutional neural network, the training data set includes images of people photographed from at least two viewpoints.

Further, in the association learning system for a domain-invariant human classifier for a convolutional neural network, the loss function L_total is calculated by Equation 1 below.

[Equation 1]

Here, L_classification is a classification loss expressing how closely the predicted probability of the human classifier model resembles the ground-truth probability, and L_similarity is a loss determined by the similarity between the embedding vectors generated from training batch B.

Further, in the association learning system for a domain-invariant human classifier for a convolutional neural network, L_classification is calculated by Equation 2 below.

[Equation 2]

Here, CE() denotes cross-entropy, GT(x) denotes the ground truth for sample x, and H(x) denotes the prediction of the human classifier for sample x.

Further, in the association learning system for a domain-invariant human classifier for a convolutional neural network, L_similarity is calculated by Equation 3 below.

[Equation 3]

Here, CE() denotes cross-entropy, V denotes the uniform distribution that P^transition is to resemble, and P^transition denotes the transition probabilities between the embedding vectors.

Further, in the association learning system for a domain-invariant human classifier for a convolutional neural network, V is a uniform distribution in which every probability has the value 1/(|B|-1), and |B| is the number of images contained in training batch B.

Further, in the association learning system for a domain-invariant human classifier for a convolutional neural network, P^transition is a probability obtained by computing the inner products of the embedding vectors in training batch B to generate a similarity matrix, and applying a softmax function to the generated similarity matrix so that the computed similarities form a probability distribution.

The present invention also relates to an association learning method for a domain-invariant human classifier for a convolutional neural network, comprising: (a) calculating a loss function of a convolutional neural network model; (b) extracting a training batch from a training data set; and (c) training the convolutional neural network model by applying the data of the training batch to the convolutional neural network model configured with the loss function.

Further, in the association learning method for a domain-invariant human classifier for a convolutional neural network, the loss function L_total is calculated by Equation 4 below.

[Equation 4]

Further, in the association learning method for a domain-invariant human classifier for a convolutional neural network, L_similarity is calculated by Equation 5 below.

[Equation 5]

The present invention also relates to a computer-readable recording medium on which a program for performing the association learning method for a domain-invariant human classifier for a convolutional neural network is recorded.

As described above, with the association learning system and method for a domain-invariant human classifier for a convolutional neural network according to the present invention, a model trained by the method can operate more robustly against domain change than models trained by conventional methods.

FIG. 1 is a diagram showing the configuration of the overall system for implementing the present invention.
FIG. 2 is a diagram schematically illustrating the performance degradation of a convolutional neural network model caused by domain change.
FIG. 3 is a block diagram of the configuration of an association learning system for a domain-invariant human classifier for a convolutional neural network according to an embodiment of the present invention.
FIG. 4 shows example images from the data set containing various domains used in the experiments of the present invention.
FIG. 5 is a table showing the experimental results on the data set containing various domains used in the experiments of the present invention.
FIG. 6 is a flowchart illustrating an association learning method for a domain-invariant human classifier for a convolutional neural network according to an embodiment of the present invention.

Hereinafter, specific details for implementing the present invention are described with reference to the drawings.

In describing the present invention, identical parts are given identical reference numerals, and repeated descriptions of them are omitted.

First, examples of the configuration of the overall system for implementing the present invention are described with reference to FIG. 1.

As shown in FIG. 1, the overall system for implementing the present invention may be implemented as a program system on a computer terminal (20) that receives a neural network model (11) and a data set (12) and performs association learning on the convolutional neural network (CNN) model. That is, the association learning method or system may be configured as a program, installed on the computer terminal (20), and executed. The program installed on the computer terminal (20) may operate as a single program system (30).

Here, the neural network model (11) is a convolutional neural network (CNN) model.

The data set (12) is the training data for training the neural network model (11). Preferably, the data set (12) consists of human videos or human images that exist in various domains.

In another embodiment, instead of being configured as a program running on a general-purpose computer, the association learning method or system may be implemented as a single electronic circuit such as an ASIC (application-specific integrated circuit). Alternatively, it may be developed as a dedicated terminal (30) that exclusively handles training the neural network model (11) with the data set (12). This is referred to as the association learning system. Other possible forms may also be implemented.

Next, before describing the configuration of the present invention, the performance degradation of a convolutional neural network model caused by domain change is described in more detail.

FIG. 2 schematically illustrates the performance degradation of a convolutional neural network model caused by domain change. The images on the left of FIG. 2 are human images acquired in various domains. The squares on the right of FIG. 2 represent the embedding vectors generated by passing the images on the left through the convolutional neural network model; vectors of the same color are vectors with high similarity to one another.

Embedding vectors generated from human images acquired in the same or similar domains have high similarity to one another, but their similarity to the embedding vectors of images acquired in a different domain is measured to be low. This occurs because the convolutional neural network model has not been trained on new images that it has not seen before, and it is a cause of performance degradation.

To solve this problem, the method according to the present invention trains the convolutional neural network model so that the embedding vectors of images acquired in various domains are generated to be similar.

Next, the configuration of an association learning system for a domain-invariant human classifier for a convolutional neural network according to an embodiment of the present invention is described with reference to FIG. 3.

As shown in FIG. 3, the association learning system according to the present invention consists of a neural network model unit (31) that stores a convolutional neural network (CNN) model, a data set storage unit (32) that stores a training data set for training the neural network model, a loss function setting unit (33) that sets a loss function, and a learning unit (34) that trains the neural network model with the training data set. It further includes a storage unit (38) that stores the convolutional neural network (CNN) model, the data set, and the like.

First, the neural network model unit (31) stores a convolutional neural network (CNN) model. The CNN model may be received as input and stored, or may be configured and stored in advance.

Next, the data set storage unit (32) stores a data set of training data for training the neural network model. The training data set may be received as input and stored, or may be configured and stored in advance.

FIG. 4 shows example images from the training data set.

As shown in FIG. 4, the training data set is preferably a data set of human images covering various domains. That is, it consists of videos (or images) of people captured from various viewpoints (or domains).

Next, the loss function setting unit (33) sets a loss function for training the neural network model.

Training a neural network model with training data means finding, through learning, the optimal values of the weight parameters of the model. The loss function is the indicator that allows the neural network to learn. That is, the learning process is the search for the weight parameters that minimize the value of the loss function over the training data. A smaller loss means the predictions agree more closely with the data, so the weights that minimize the loss function are the optimal weights.

In other words, the loss function computes the difference between the value predicted by the model and the value that was actually desired. The difference between the predicted and actual values is reduced by changing the parameters of the human classifier model being trained. Various kinds of loss functions can exist, depending on the purpose of the model to be created.

The loss function (L_total) is calculated by Equation 1.

[Equation 1]
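The equation itself is not reproduced in this text. A form consistent with the two terms defined around it, assuming an unweighted sum (no weighting coefficient is mentioned anywhere in the description), might be:

```latex
L_{total} = L_{classification} + L_{similarity}
```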

Here, L_classification is the classification loss commonly used in neural network training, and it expresses how closely the predicted probability of the human classifier model resembles the ground-truth probability. That is, L_classification describes mathematically how well the human classifier model classifies people, and it is calculated by the following equation.

[Equation 2]
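The equation itself is not reproduced in this text. A plausible form, built only from the symbols CE(), GT(x), and H(x) that the description defines, is sketched below; the summation over the samples x of training batch B is an assumption, since the text does not state how per-sample losses are aggregated.

```latex
L_{classification} = \sum_{x \in B} CE\big(GT(x),\, H(x)\big)
```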

Here, GT(x) denotes the ground truth for sample x, and H(x) denotes the prediction of the human classifier for sample x.

CE() is cross-entropy. Cross-entropy expresses numerically how closely two probability distributions resemble each other. The cross-entropy CE(P, Q) of probability distributions P and Q is calculated as follows.

[Equation 3]
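The equation itself is not reproduced in this text. Assuming the standard definition of cross-entropy, with P as the target distribution and Q as the predicted distribution (the argument order is an assumption), it would read:

```latex
CE(P, Q) = -\sum_{x} P(x) \log Q(x)
```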

Here, if the probability values of P and Q for a sample x are all identical, the result of CE(P, Q) is 0. Conversely, if the probability value of P for a sample x is 0 and that of Q is 1, the result of CE(P, Q) becomes infinite. In this way, the difference between the prediction of the model being trained and the ground truth can be compared and expressed numerically.
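The behaviour described above can be checked numerically with a minimal sketch. It assumes the standard convention CE(P, Q) = -Σ P(x) log Q(x) with P as the target; the function name and the sample values are illustrative and do not come from the patent. Under this assumed convention the result is 0 when a one-hot P is matched exactly by Q, and it diverges when Q assigns probability 0 to an outcome to which P assigns positive probability.

```python
import math


def cross_entropy(p, q):
    """CE(P, Q) = -sum_x P(x) * log Q(x), assuming P is the target distribution."""
    if len(p) != len(q):
        raise ValueError("P and Q must cover the same outcomes")
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue  # a term with P(x) = 0 contributes nothing
        if q_x == 0.0:
            return math.inf  # Q rules out an outcome that P considers possible
        total -= p_x * math.log(q_x)
    return total


ground_truth = [1.0, 0.0]       # GT(x): one-hot label for "person"
good_prediction = [0.99, 0.01]  # H(x) close to the label
bad_prediction = [0.01, 0.99]   # H(x) far from the label

loss_good = cross_entropy(ground_truth, good_prediction)
loss_bad = cross_entropy(ground_truth, bad_prediction)
print(loss_good < loss_bad)
```

The closer the prediction is to the label, the smaller the value, which is what allows it to serve as a training signal.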

Meanwhile, the sample x here denotes an image in training batch B. The same training batch B is applied to both L_classification and L_similarity. Training batch B is described in more detail below.

Next, L_similarity expresses how similar the embedding vectors of training batch B are to one another. That is, L_similarity is a loss determined by the similarity between the embedding vectors generated from training batch B, and it is calculated by Equation 4.

[Equation 4]
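The equation itself is not reproduced in this text. A form consistent with the definitions of CE(), V, and P^transition given in the following paragraphs, assuming the target distribution V is the first argument as in the classification loss, might be:

```latex
L_{similarity} = CE\big(V,\, P^{transition}\big),
\qquad V_{ij} = \frac{1}{|B| - 1} \quad (i \neq j)
```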

Here, CE() denotes cross-entropy.

V denotes the uniform distribution that P^transition is to resemble, and P^transition denotes the transition probabilities between the embedding vectors. V and P^transition can be obtained in a manner similar to the visit loss proposed in [Non-Patent Document 1].

In the present invention, because only human objects are trained, the uniform distribution V is set to 1/(|B|-1). |B| is a scalar value: the number of images contained in training batch B, which is used to compute the uniform distribution V.

Specifically, V is simply a uniform distribution in which every probability has the value 1/(|B|-1). From a training batch B containing |B| images, a total of |B| embedding vectors are generated. The uniform distribution V is set to 1/(|B|-1) so that each vector and the remaining |B|-1 vectors are trained to be similar to one another.

P^transition expresses the similarity of the embedding vectors in training batch B as a probability distribution. A similarity matrix is generated by computing the inner products of the |B| embedding vectors in training batch B. A softmax function is then applied to the generated similarity matrix so that the computed similarities form a probability distribution. Training proceeds so that the probability distribution P^transition computed in this way comes to resemble the probability distribution V.
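The construction of P^transition described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the patented implementation: the softmax is applied row by row, and each vector's similarity to itself is excluded so that every row is a distribution over the other |B|-1 vectors (which matches the 1/(|B|-1) target but is not spelled out in the text). The function names and the toy two-dimensional embeddings are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def transition_probabilities(embeddings):
    """Row-wise softmax over pairwise inner-product similarities.

    Assumption: the diagonal (self-similarity) is excluded and set to 0,
    so row i is a distribution over the other |B|-1 vectors.
    """
    n = len(embeddings)
    if n < 2:
        raise ValueError("a batch needs at least two embedding vectors")
    probs = []
    for i in range(n):
        sims = [dot(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        shift = max(sims)  # subtract the maximum for numerical stability
        exps = [math.exp(s - shift) for s in sims]
        norm = sum(exps)
        row = [e / norm for e in exps]
        row.insert(i, 0.0)  # put the excluded self-transition back as 0
        probs.append(row)
    return probs


# Toy batch: vectors 0 and 1 mimic one domain, vectors 2 and 3 another.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
P = transition_probabilities(batch)
print([round(p, 3) for p in P[0]])
```

In this toy batch the first vector transitions most readily to the second one, mirroring the statement that embeddings from similar domains start out with high mutual similarity.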

Because batch B contains human images acquired in various domains, early in training the values of P^transition are measured to be high between images from similar domains.

In this case, the loss function is penalized by L_similarity of Equation 4. As training progresses, the values of P^transition come to resemble the uniform distribution V, and the trained convolutional neural network model thereby operates invariantly to the domain.
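The penalty can be illustrated with a small sketch that compares a domain-clustered transition matrix with a uniform one. The assumptions are labeled here and in the comments: cross-entropy is taken as -Σ V log P^transition with V as the target, the diagonal is ignored, and the per-row values are averaged over the batch (the text does not specify the aggregation). Both matrices are made-up numbers for a batch of three images.

```python
import math


def similarity_loss(p_transition, eps=1e-12):
    """Mean over rows of CE(V_i, P_i), with V_ij = 1 / (|B| - 1) for j != i.

    Assumptions: diagonal entries are ignored and rows are averaged.
    """
    n = len(p_transition)
    target = 1.0 / (n - 1)  # every off-diagonal entry of V
    total = 0.0
    for i, row in enumerate(p_transition):
        for j, p in enumerate(row):
            if j != i:
                # eps keeps log() finite if a probability is exactly 0
                total -= target * math.log(max(p, eps))
    return total / n


# Early in training: images 0 and 1 (same domain) mostly point at each other.
clustered = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.5, 0.5, 0.0],
]
# Late in training: every other image is an equally likely transition.
uniform = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

loss_early = similarity_loss(clustered)
loss_late = similarity_loss(uniform)
print(loss_early > loss_late)
```

With these numbers the uniform matrix attains the smallest possible value, log 2, while the clustered matrix is penalized with a larger loss.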

For reference, a training batch B is created by randomly extracting |B| training images from the entire data set. For example, if training batches are created by randomly extracting 10 training images at a time from a data set of 100 images, a total of 10 training batches (B₁ to B₁₀) are created. That is, the images in the data set are randomly extracted |B| at a time and grouped together. Each such group is called a batch, and because it is used for model training it is referred to as a training batch.
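The batch extraction described above can be sketched as a shuffle followed by slicing. The function name, the `seed` parameter, and the choice to drop a final incomplete batch are assumptions made for illustration; the integers stand in for training images.

```python
import random


def make_training_batches(dataset, batch_size, seed=None):
    """Shuffle the data set and split it into batches of batch_size items.

    Assumption: each image is used once per pass, and leftover items
    that do not fill a whole batch are dropped.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    return [
        [dataset[i] for i in indices[start:start + batch_size]]
        for start in range(0, len(indices) - batch_size + 1, batch_size)
    ]


# 100 placeholder "images" split into batches of 10, as in the example above.
batches = make_training_batches(list(range(100)), 10, seed=0)
print(len(batches))
```

With 100 items and a batch size of 10, this yields the 10 training batches mentioned in the example.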

Next, once the loss function of the neural network model has been set, the learning unit (34) trains the neural network model with the training data set.

Next, an association learning method for a domain-invariant human classifier for a convolutional neural network according to an embodiment of the present invention is described with reference to FIG. 6.

As shown in FIG. 6, the loss function of the convolutional neural network model is calculated first (S10). The loss function is calculated according to Equations 1 to 4 above.

Next, a training batch is extracted from the training data set (S20).

Then, the convolutional neural network model is trained by applying the data of the training batch to the convolutional neural network model configured with the loss function (S30).

다음으로, 본 발명의 효과를 실험을 통해 설명한다.Next, the effects of the present invention will be described through experiments.

즉, 본 발명에 따른 방법이 도메인에 불변하게 동작하는 것을 보이기 위하여 다양한 도메인을 포함하는 데이터 세트를 구성하여 실험을 수행하였다. 실험에 이용된 데이터 세트는 GM-ATCI[비특허문헌 2], 칼텍 보행자(Caltech-pedestrian)[비특허문헌 3]와 자체적으로 획득한 영상들로 구성된다.That is, in order to show that the method according to the present invention operates invariably in a domain, an experiment was performed by configuring a data set including various domains. The data set used in the experiment consists of GM-ATCI [Non-Patent Literature 2], Caltech-pedestrian [Non-Patent Literature 3] and images obtained by themselves.

도 4는 본 발명에서 사용한 데이터 세트의 일부를 보여준다. 본 발명에서는 [비특허문헌 2]에서 제안한 합성곱 신경망 모델을 이용하였으며, 배치의 크기는 100으로 설정하여 훈련을 수행하였다.4 shows a part of the data set used in the present invention. In the present invention, the convolutional neural network model proposed in [Non-Patent Document 2] was used, and training was performed by setting the batch size to 100.

도 5의 표는 본 발명에서 구성한 데이터 세트에 대한 사람 분류 실험결과를 보여준다. [비특허문헌 4]와 [비특허문헌 5]의 모델 구조에 대하여 동일한 훈련 데이터와 조건을 이용하여 보행자 분류 모델을 학습하였다. 도 5의 표의 VGG16 모델의 결과는 본 발명에 따른 방법과 큰 성능차이를 보여준다. VGG16 모델 훈련의 경우 본 발명에 따른 방법과 비교할 때 손실 함수 수학식 1에서 수학식 2를 제거하여 훈련한 것과 동일하다. 이를 통하여 본 발명에 따른 방법이 다양한 도메인에 강인하게 동작하는 것을 확인할 수 있다.The table of FIG. 5 shows the results of a human classification experiment for the data set constructed in the present invention. A pedestrian classification model was trained using the same training data and conditions for the model structures of [Non-Patent Document 4] and [Non-Patent Document 5]. The results of the VGG16 model in the table of FIG. 5 show a large difference in performance from the method according to the present invention. In the case of training the VGG16 model, it is the same as training by removing Equation 2 from the loss function Equation 1 when compared to the method according to the present invention. Through this, it can be confirmed that the method according to the present invention operates robustly in various domains.

The present invention has described a method of training a domain-invariant human classifier. A new loss function was proposed to make the embedding vectors of images acquired in different domains similar to one another. With this loss, the convolutional neural network model learns the association between human images and therefore operates domain-invariantly. Experiments on data sets collected in various environments confirmed that a model trained with the proposed method is more robust to domain changes than models trained with existing methods.

The invention made by the present inventors has been described in detail above with reference to the embodiment, but the present invention is not limited to that embodiment and may of course be modified in various ways without departing from its gist.

11: neural network model 12: data set
20: computer terminal 30: program system
31: neural network model unit 32: data set storage unit
33: loss function setting unit 34: learning unit
38: storage unit

Claims

An association learning system for a domain-invariant human classifier for a convolutional neural network, comprising:
a neural network model unit that stores a convolutional neural network (CNN) model;
a data set storage unit that stores a training data set for training the neural network model;
a loss function setting unit that sets a loss function of the convolutional neural network model; and
a learning unit that trains the convolutional neural network model using the training data set,
wherein the loss function L_total is calculated by the following Equation 1,
wherein, in Equation 1, V is a uniform distribution in which every probability has the value 1/(|B|-1), |B| being the number of images in training batch B; a total of |B| embedding vectors are generated from training batch B containing |B| images, and because each one of them is trained to be similar to the remaining |B|-1 vectors, the uniform distribution V is set to 1/(|B|-1), and
wherein, in Equation 1, P^transition is a probability generated by computing the dot products of the |B| embedding vectors in training batch B to form a similarity matrix and applying a softmax function to the similarity matrix so that the computed similarities form a probability distribution.
[Equation 1]

[equation image not reproduced]

where L_classification is the classification loss, expressing the similarity between the probability predicted by the human classifier model and the ground-truth probability; L_similarity is the loss determined by the similarity between the embedding vectors generated from training batch B; CE() denotes cross-entropy; V denotes the uniform distribution that P^transition should resemble; and P^transition denotes the transition probabilities between the embedding vectors.
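The similarity term described in this claim can be illustrated with a minimal sketch. It builds the dot-product similarity matrix over the |B| embedding vectors, applies a row-wise softmax to obtain P^transition, and takes the cross-entropy against the uniform target V = 1/(|B|-1). The function names are hypothetical, and two details are assumptions not stated in the text: each vector's similarity to itself is excluded (so that every row has |B|-1 entries, matching V), and the per-row losses are averaged over the batch.

```python
import math


def transition_probabilities(embeddings):
    """Row-wise softmax over pairwise dot products (self-pairs excluded, by assumption)."""
    n = len(embeddings)
    rows = []
    for i in range(n):
        # Dot-product similarity between vector i and every other vector j != i.
        sims = [
            sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            for j in range(n)
            if j != i
        ]
        m = max(sims)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - m) for s in sims]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows


def similarity_loss(embeddings, eps=1e-12):
    """CE(V, P^transition) with V uniform at 1/(|B|-1), averaged over rows (assumption)."""
    n = len(embeddings)
    if n < 2:
        raise ValueError("a batch needs at least two embedding vectors")
    v = 1.0 / (n - 1)
    p = transition_probabilities(embeddings)
    # eps guards against log(0) when a softmax entry underflows.
    per_row = [-sum(v * math.log(max(pij, eps)) for pij in row) for row in p]
    return sum(per_row) / n
```

Under these assumptions the loss reaches its minimum, log(|B|-1), exactly when every row of P^transition is uniform, that is, when each embedding is equally similar to all the others in the batch.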

The system of claim 1,
wherein the training data set includes images of people captured from at least two viewpoints.

delete

The system of claim 1,
wherein L_classification is calculated by the following Equation 2.
[Equation 2]

[equation image not reproduced]

where CE() denotes cross-entropy, GT(x) denotes the ground-truth label for sample x, and H(x) denotes the prediction of the human classifier for sample x.
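The definitions of CE(), GT(x), and H(x) suggest a cross-entropy between the ground-truth label and the classifier output. The sketch below illustrates that reading only: the function names are hypothetical, and the one-hot two-class labels (person, non-person) and the averaging over the batch are assumptions rather than details given in the claim.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """CE(target, predicted) = -sum_k target[k] * log(predicted[k])."""
    # eps guards against log(0) for classes predicted with zero probability.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def classification_loss(gt_batch, pred_batch):
    """Mean of CE(GT(x), H(x)) over the samples x of a batch (averaging is an assumption)."""
    losses = [cross_entropy(gt, pred) for gt, pred in zip(gt_batch, pred_batch)]
    return sum(losses) / len(losses)
```

For example, a classifier that outputs [0.5, 0.5] for every sample incurs a loss of log 2 per sample, while a perfectly confident and correct prediction incurs zero loss.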

delete

An association learning method for a domain-invariant human classifier for a convolutional neural network, comprising:
(a) calculating a loss function of the convolutional neural network model;
(b) extracting a training batch from the training data set; and
(c) training the convolutional neural network model by applying the data of the training batch to the convolutional neural network model for which the loss function is set,
wherein the loss function L_total is calculated by the following Equation 4,
wherein, in Equation 4, V is a uniform distribution in which every probability has the value 1/(|B|-1), |B| being the number of images in training batch B; a total of |B| embedding vectors are generated from training batch B containing |B| images, and because each one of them is trained to be similar to the remaining |B|-1 vectors, the uniform distribution V is set to 1/(|B|-1), and
wherein, in Equation 4, P^transition is a probability generated by computing the dot products of the |B| embedding vectors in training batch B to form a similarity matrix and applying a softmax function to the similarity matrix so that the computed similarities form a probability distribution.
[Equation 4]

[equation image not reproduced]
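Step (b) of the method can be sketched as follows. The claim does not say how the training batch is extracted, so uniform sampling without replacement is an assumption here, as are the function name and the tuple layout of the data set. The batch size of 100 is the value reported in the experiments, and a batch must hold at least two images for the uniform target 1/(|B|-1) to be defined.

```python
import random


def extract_batch(dataset, batch_size, rng=None):
    """Draw a training batch B of distinct samples (uniform sampling is an assumption)."""
    rng = rng or random.Random()
    if not 2 <= batch_size <= len(dataset):
        # |B| >= 2 is needed so that the uniform target 1/(|B|-1) is defined.
        raise ValueError("batch_size must be between 2 and len(dataset)")
    return rng.sample(dataset, batch_size)


# Hypothetical data set: (domain, image id, label) tuples drawn from three sources.
dataset = [
    (domain, idx, idx % 2)
    for domain in ("GM-ATCI", "Caltech-pedestrian", "in-house")
    for idx in range(200)
]
batch = extract_batch(dataset, 100, random.Random(0))
uniform_target = 1.0 / (len(batch) - 1)  # value of every entry of V for this batch
```

Sampling across the whole pooled data set means a batch will usually mix images from several domains, which is what lets the similarity term pull embeddings from different domains together.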

delete

A computer-readable recording medium on which a program for performing the method of claim 8 is recorded.