KR20200000824A

KR20200000824A - Method for recognizing facial expression based on deep-learning model using center-dispersion loss function

Info

Publication number: KR20200000824A
Application number: KR1020190075600A
Authority: KR
Inventors: 양현승; 압힐라샤 난다
Original assignee: 한국과학기술원
Priority date: 2018-06-25
Filing date: 2019-06-25
Publication date: 2020-01-03

Abstract

The present invention relates to image processing using deep learning, and specifically to a method for recognizing facial expressions.

The method has two stages:
- A plurality of training images of facial expressions is received, and training features are extracted to train a classification model of an artificial neural network that includes a loss function.
- A target image containing a face is received and a target feature is extracted. The trained classification model then classifies the target feature to identify the facial expression in the target image.

The loss function combines a first loss function that increases intra-class variation and a second loss function that increases inter-class difference.

Description

Method for recognizing facial expression based on deep-learning model using center-dispersion loss function

The present invention relates to image processing using deep learning. More particularly, it relates to a method and apparatus for classifying facial expressions using a classification loss function and identifying the corresponding emotions.

In everyday interaction, people express emotions mainly through facial expressions, along with voice, hand gestures, and body gestures. Understanding emotions and knowing how to respond to other people's expressions makes interaction smoother. Facial expression recognition has therefore become important in computer science, philosophy, neuroscience, and other related fields.

Facial expressions have been studied for many years across several research fields. They attract attention because of their importance and their applications in Human-Computer Interaction (HCI).

Facial expression research is framed as the problem of classifying expressions into seven basic human emotions: anger, disgust, fear, happiness, neutral, sadness, and surprise.

When facial expressions are processed by computer, the main challenges are:
- scarcity of expression data, which vary with facial appearance, head pose, illumination, and occlusion;
- large dissimilarity within the same class;
- large similarity between different classes.

Many hand-crafted algorithms have been proposed over the years to classify facial expressions. However, deep learning frameworks have proven more effective than hand-crafted methods. In this approach, facial features are extracted from static images or video frames that capture changes in the subject's expression.

Because facial expression data are scarce, four facial expression datasets can be collected and combined to train a deep neural network. After training on this custom combined dataset, the network can be tested on the FER2013 test set.

For facial expression learning, traditional Convolutional Neural Networks (CNNs) have used softmax loss for classification. However, softmax loss has a known weakness: it does not make the classes easy to discriminate, and it cannot reduce intra-class variation or inter-class similarity.

Shan, C., Gong, S., & McOwan, P. W. (2009). Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing, 27(6), 803-816.

The present invention addresses two technical problems in conventional computer-vision facial expression recognition:
- Facial expression datasets by nature exhibit high intra-class variation and high inter-class similarity.
- Conventional classification loss functions make classes hard to distinguish, and they struggle to reduce inter-class similarity when clusters of different classes lie close together and overlap.

To solve the above technical problems, a facial expression recognition method according to an embodiment of the present invention includes two steps:
- The facial expression recognition apparatus receives a plurality of training images of facial expressions, extracts training features, and trains a classification model of an artificial neural network that includes a loss function.
- The apparatus receives a target image including a face, extracts a target feature, and classifies the target feature using the trained classification model, thereby identifying the facial expression of the target image.

The loss function is formed by combining a first loss function that increases intra-class variation and a second loss function that increases inter-class difference.

In the method according to an embodiment, the artificial neural network is a Convolutional Neural Network (CNN). The second loss function may be a Center Dispersion Loss function, which maximizes the distance between different class centers by computing the mean squared difference between the centers of the respective classes. The first loss function may be a softmax loss function.
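The center-dispersion term is defined above only in words: a mean squared difference between class centers that is to be maximized. The sketch below is one hedged reading of that definition, not the formula used in the invention.
- It averages the squared Euclidean distance over all pairs of class centers.
- As an assumption not stated in this text, it turns that quantity into something to minimize by taking its reciprocal.
- The function names and the `eps` parameter are hypothetical.

```python
import itertools


def mean_squared_center_distance(centers):
    """Mean squared Euclidean distance over all unordered pairs of class centers."""
    pairs = list(itertools.combinations(centers, 2))
    if not pairs:
        raise ValueError("need at least two class centers")
    total = sum(
        sum((a - b) ** 2 for a, b in zip(c1, c2))  # squared distance of one pair
        for c1, c2 in pairs
    )
    return total / len(pairs)


def center_dispersion_loss(centers, eps=1e-8):
    """Hypothetical dispersion term that shrinks as the class centers spread apart.

    Minimizing the reciprocal of the mean squared center distance pushes the
    centers away from each other; `eps` guards against division by zero.
    """
    return 1.0 / (mean_squared_center_distance(centers) + eps)
```

The reciprocal is a design choice. It keeps the term bounded below by zero, whereas simply negating the distance would reward pushing features toward infinity.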

In the method according to an embodiment, the loss function may be computed as a linear combination of the first and second loss functions. The combination uses a hyperparameter set to balance the first loss function against the second.
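Written out, the two-term objective described above takes the form below. The symbols are:
- $L_s$: the softmax loss.
- $L_{cd}$: the center-dispersion loss (this symbol is assumed here, not taken from the text).
- $\lambda \ge 0$: the balancing hyperparameter.

Setting $\lambda = 0$ recovers plain softmax training.

```latex
L = L_s + \lambda \, L_{cd}
```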

In the method according to an embodiment, training the classification model of the artificial neural network may include:
- receiving a plurality of training images of facial expressions collected from lab-controlled and in-the-wild environments, and extracting training features;
- generating detection results for the extracted training features using a landmark detector;
- training a deep network structure composed of a plurality of convolutional layers using the generated detection results;
- fine-tuning the classification model by retraining a pre-trained model through transfer learning.

To solve the above technical problems, a facial expression recognition method according to another embodiment of the present invention includes two steps:
- The facial expression recognition apparatus receives a plurality of training images of facial expressions, extracts training features, and trains a classification model of an artificial neural network that includes a loss function.
- The apparatus receives a target image including a face, extracts a target feature, and classifies the target feature using the trained classification model, thereby identifying the facial expression of the target image.

The loss function is formed by combining three loss functions:
- a first loss function that increases intra-class variation;
- a second loss function that increases inter-class difference;
- a third loss function that reduces intra-class variation.

In the method according to another embodiment, the artificial neural network is a Convolutional Neural Network (CNN).
- The second loss function is a Center Dispersion Loss function. It maximizes the distance between different class centers by computing the mean squared difference between the centers of the respective classes.
- The third loss function may be a Center Loss function. It minimizes the distance between the deep features of each class and the corresponding class center by computing the sum of squared distances between the samples and their corresponding centers in the feature space.
- The first loss function may be a softmax loss function.

In the method according to another embodiment, the loss function may be computed in two stages:
- The second and third loss functions are linearly combined into a combined loss function, using a first hyperparameter set to balance them.
- The first loss function and the combined loss function are then linearly combined, using a second hyperparameter set to balance those two.
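The two-stage combination can be written as below. Two parts of this form are assumptions:
- The text does not say which term each hyperparameter multiplies, so the placement of $\lambda_1$ and $\lambda_2$ is assumed.
- The symbol $L_{cd}$ for the center-dispersion loss is assumed.

$L_s$ and $L_c$ follow the notation used for Equation 2.

```latex
\begin{aligned}
L_{cc} &= L_c + \lambda_1 L_{cd} \\
L      &= L_s + \lambda_2 L_{cc} = L_s + \lambda_2 L_c + \lambda_1 \lambda_2 L_{cd}
\end{aligned}
```

Expanding the second line shows that the dispersion term is effectively weighted by the product $\lambda_1 \lambda_2$. This matters when tuning the two hyperparameters independently.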

In the method according to another embodiment, training the classification model of the artificial neural network may include:
- receiving a plurality of training images of facial expressions collected from lab-controlled and in-the-wild environments, and extracting training features;
- generating detection results for the extracted training features using a landmark detector;
- training a deep network structure composed of a plurality of convolutional layers using the generated detection results;
- fine-tuning the classification model by retraining a pre-trained model through transfer learning.

Also provided is a computer-readable recording medium storing a program that causes a computer to execute the facial expression recognition methods described above.

To solve the above technical problems, a facial expression recognition apparatus according to still another embodiment of the present invention includes:
- an input unit that receives a plurality of training images of facial expressions and a target image including a face;
- a memory that stores a program for learning facial expressions and recognizing the expression in the target image using the learned model;
- a processor that executes the program stored in the memory.

The program stored in the memory includes instructions to:
- extract training features from the input training images and train a classification model of an artificial neural network that includes a loss function;
- receive a target image including a face, extract a target feature, and classify it using the trained classification model, thereby identifying the facial expression of the target image.

The loss function is formed by combining a first loss function that increases intra-class variation and a second loss function that increases inter-class difference.

In the apparatus according to still another embodiment:
- The artificial neural network is a Convolutional Neural Network (CNN).
- The first loss function is a softmax loss function.
- The second loss function may be a Center Dispersion Loss function, which maximizes the distance between different class centers by computing the mean squared difference between the centers of the respective classes.

In the apparatus according to still another embodiment, the loss function may be computed as a linear combination of the first and second loss functions. The combination uses a hyperparameter set to balance the first loss function against the second.

In the apparatus according to still another embodiment, the loss function may further include a third loss function that reduces intra-class variation.

The third loss function may be a Center Loss function. It minimizes the distance between the deep features of each class and the corresponding class center by computing the sum of squared distances between the samples and their corresponding centers in the feature space.

The loss function may then be computed in two stages:
- The second and third loss functions are linearly combined into a combined loss function, using a first hyperparameter set to balance them.
- The first loss function and the combined loss function are linearly combined, using a second hyperparameter set to balance those two.

In the apparatus according to still another embodiment, the program stored in the memory may train the classification model of the artificial neural network by executing instructions to:
- receive a plurality of training images of facial expressions collected from lab-controlled and in-the-wild environments, and extract training features;
- generate detection results for the extracted training features using a landmark detector;
- train a deep network structure composed of a plurality of convolutional layers using the generated detection results;
- fine-tune the classification model by retraining a pre-trained model through transfer learning.

Embodiments of the present invention introduce a center dispersion loss function that can reduce both intra-class variation and inter-class similarity. With this loss, the embodiments maintain a more balanced dataset and achieve higher classification accuracy. Because the discriminative power of deep facial features increases, facial expression recognition performance improves as a result.

FIG. 1 is a diagram illustrating a deep neural network structure for facial expression recognition.
FIG. 2 is a diagram comparing softmax loss and center loss.
FIG. 3 is a flowchart illustrating a method of recognizing facial expressions based on a deep learning model using a center dispersion loss function according to an embodiment of the present invention.
FIG. 4A is a diagram illustrating two types of datasets: lab-controlled (collected in a laboratory) and in-the-wild (collected in real-world settings).
FIGS. 4B to 4E are diagrams illustrating datasets used for training.
FIG. 5 is a flowchart illustrating in more detail the process of training the classification model of the artificial neural network in the facial expression recognition method of FIG. 3, according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the VGGFace model as an example of a classification model.
FIG. 7 is a block diagram illustrating an apparatus for recognizing facial expressions based on a deep learning model using a center dispersion loss function according to an embodiment of the present invention.

Before describing the embodiments, this section first introduces several component technologies that can be used to recognize facial expressions with conventional deep learning models. It then presents, in order, the technical means the embodiments propose to overcome the weaknesses of the prior art discussed above.

For person-independent facial expression recognition, facial expressions can be evaluated empirically using statistical local features, namely Local Binary Patterns (LBP). Boosted-LBP can be formulated to extract the most discriminative LBP features. Strong recognition performance can then be obtained with a Support Vector Machine (SVM) classifier using Boosted-LBP features.
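As a concrete illustration of the basic LBP operator, the sketch below does two things:
- It thresholds each pixel's 3x3 neighbourhood against the center value to form an 8-bit code.
- It histograms those codes over an image.

This is only the plain 3x3 operator. The full LBP approach for expression recognition typically divides the face into regions and concatenates per-region histograms, often restricted to uniform patterns; those steps and the Boosted-LBP feature selection are omitted here. The clockwise bit order is a convention assumed for this sketch, and the function names are hypothetical.

```python
def lbp_code(patch):
    """8-bit LBP code for the center pixel of a 3x3 patch (list of 3 rows).

    Each neighbour contributes a 1-bit if it is >= the center value; bits are
    read clockwise starting from the top-left neighbour (most significant bit).
    """
    center = patch[1][1]
    # Clockwise neighbour order starting at the top-left corner.
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(coords):
        if patch[r][c] >= center:
            code |= 1 << (7 - bit)
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2-D image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            # Extract the 3x3 neighbourhood centred on (r, c).
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

The resulting histogram (or a concatenation of histograms over face regions) is the feature vector that a classifier such as an SVM would consume.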

Among the most successful hand-crafted features for facial expression recognition are:
- Gabor wavelets;
- Scale Invariant Feature Transform (SIFT) features;
- Histograms of Oriented Gradients (HOG);
- histograms of Local Binary Patterns (LBP);
- histograms of Local Phase Quantization (LPQ);
- histograms of Local Gabor Binary Patterns (LGBP).

Deep neural network architectures can also be used for facial expression recognition on a number of well-known standard face datasets. FIG. 1 illustrates such an architecture. Its deep network may consist of two convolutional layers, each followed by max pooling, and four Inception layers. The network classifies images into the seven basic expressions.

As noted above, conventional classification loss functions, in particular softmax loss, do not separate classes easily and leave intra-class variation high. Center loss can be considered to overcome this.

FIG. 2 compares softmax loss and center loss: (a) shows softmax loss and (b) shows center loss.

The center loss function simultaneously learns a center for the deep features of each class and minimizes the distance between the deep features and their class center. This makes it more effective than softmax loss at separating classes. In other words, minimizing the center loss tends to reduce the intra-class variation of deep features. The center loss function is expressed as Equation 1 below.

In Equation 1, the center loss is computed from the sum of squared distances between each sample and its corresponding class center in the feature space.
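Equation 1 itself does not survive in this text. The common formulation of center loss, which matches the description above, is $L_c = \frac{1}{2}\sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2$. Here $x_i$ is the deep feature of sample $i$, $y_i$ its label, and $c_{y_i}$ the center of that class.

The sketch below computes this loss for a mini-batch. It also includes one commonly used way of "learning" the centers: moving each center toward its batch members by a step `alpha`. That update rule and the function names are assumptions for illustration, not details taken from this document.

```python
def center_loss(features, labels, centers):
    """L_c = 1/2 * sum_i ||x_i - c_{y_i}||^2 over a mini-batch."""
    total = 0.0
    for x, y in zip(features, labels):
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return 0.5 * total


def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center toward its batch features (assumed update rule).

    delta_c_j = sum_{i: y_i = j} (c_j - x_i) / (1 + n_j);  c_j <- c_j - alpha * delta_c_j
    """
    new_centers = [list(c) for c in centers]
    for j, c_j in enumerate(centers):
        members = [x for x, y in zip(features, labels) if y == j]
        if not members:
            continue  # classes absent from the batch keep their current center
        delta = [sum(c_j[d] - x[d] for x in members) / (1 + len(members))
                 for d in range(len(c_j))]
        new_centers[j] = [c - alpha * dc for c, dc in zip(c_j, delta)]
    return new_centers
```

The `1 + n_j` denominator damps the update for classes with few samples in the batch, so a single outlier cannot drag a center far.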

When training the classification model, each loss used alone has a drawback:
- Softmax loss alone leaves intra-class variation high.
- Center loss alone causes the deeply learned features and centers to shrink toward zero.

A strategy that combines softmax loss and center loss can therefore be considered for fine-tuning the VGGFace classification model. The goal is deep features that are discriminative despite intra-class dissimilarity and inter-class similarity.

The combined total loss function for training the network can be written as Equation 2 below. The deep network is trained with it to obtain deep features with reduced intra-class variation and reduced inter-class similarity.

In Equation 2, L_s is the softmax loss function, L_c is the center loss function, and the hyperparameter λ balances the two terms.
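Equation 2 is not reproduced in this text. The standard softmax-plus-center-loss objective below is consistent with the definitions given here ($L_s$, $L_c$, $\lambda$) and is shown as a reference form.

The explicit softmax expansion is an assumption. It uses last-layer weights $W_j$ and biases $b_j$ over $n$ classes, for a mini-batch of $m$ samples with deep features $x_i$ and labels $y_i$.

```latex
L = L_s + \lambda L_c
  = -\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{\top} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{\top} x_i + b_j}}
    + \frac{\lambda}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2
```

With $\lambda = 0$ this reduces to plain softmax training. Larger $\lambda$ pulls features more tightly toward their class centers.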

In the embodiments of the present invention, a deep learning framework is applied to classify seven basic facial expressions: anger, disgust, fear, happiness, neutral, sadness, and surprise. However, Equations 1 and 2 alone were found to give insufficient classification performance for reducing both intra-class variation and inter-class similarity in facial expression classification.

For example, clusters of different classes may form in partially overlapping regions. In that case, the center loss can reduce intra-class variation but performs poorly at reducing inter-class similarity.

The embodiments described below therefore propose Center Dispersion Loss. It is a new loss function that reduces intra-class variation and inter-class similarity simultaneously.
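The toy example below illustrates why center loss alone cannot fix overlapping clusters.
- Two configurations have equally tight clusters, so they receive the same center loss, even though in one of them the clusters nearly touch.
- Only a term that looks at distances between class centers tells the two configurations apart.

The cluster geometry is made up for illustration, and the helper names are hypothetical.

```python
import itertools


def center_loss(features, labels, centers):
    """1/2 * sum of squared distances from each feature to its class center."""
    return 0.5 * sum(sum((a - b) ** 2 for a, b in zip(x, centers[y]))
                     for x, y in zip(features, labels))


def mean_sq_center_distance(centers):
    """Mean squared distance over all pairs of class centers."""
    pairs = list(itertools.combinations(centers, 2))
    return sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in pairs) / len(pairs)


def make_clusters(gap):
    """Two tight 2-D clusters whose centers lie `gap` apart on the x-axis."""
    centers = [[0.0, 0.0], [gap, 0.0]]
    offsets = [[0.1, 0.0], [-0.1, 0.0]]  # same spread for both clusters
    feats, labels = [], []
    for y, c in enumerate(centers):
        for o in offsets:
            feats.append([c[0] + o[0], c[1] + o[1]])
            labels.append(y)
    return feats, labels, centers


near = make_clusters(0.3)  # clusters almost touching
far = make_clusters(3.0)   # clusters well separated
print(center_loss(*near), center_loss(*far))                  # both about 0.02
print(mean_sq_center_distance(near[2]), mean_sq_center_distance(far[2]))
```

The center loss is identical for both layouts. The squared center distance, however, is a hundred times larger for the separated layout, which is the signal a dispersion term can exploit.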

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. Detailed descriptions of well-known functions or configurations that might obscure the gist of the invention are omitted from the following description and drawings. Throughout the specification, saying that an element "includes" a component means that it may further include other components, rather than excluding them, unless stated otherwise.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms serve only to distinguish one component from another. For example, a first component may be referred to as a second component, and likewise a second component may be referred to as a first component, without departing from the scope of the present invention.

The terms used herein describe specific embodiments only and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise.

In this application, terms such as "comprise" or "include" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof. They do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted consistently with their meaning in the context of the related art. They are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

FIG. 3 is a flowchart illustrating a method of recognizing facial expressions based on a deep learning model using a center dispersion loss function according to an embodiment of the present invention. Learning is performed with a Convolutional Neural Network (CNN).

In step S310, the facial expression recognition apparatus receives a plurality of training images of facial expressions, extracts training features, and trains a classification model of an artificial neural network that includes a loss function. The loss function includes a Center Dispersion Loss function, which increases the inter-class difference.

The total loss function may also combine the center dispersion loss with one or more other loss functions. More specific embodiments of the total loss function are described later.

A loss function computes a loss value, defined as the difference between the desired result and the computed result. The features (or vectors) passed to the loss function here come from labeled training images, so both the correct result and the actual computed result are known, and the error (loss value) between them can be obtained.

The classification loss function classifies facial expressions from the extracted training features and takes as its value the difference between the expression predicted from those features and the actual (labeled) expression. Softmax or sigmoid functions have conventionally been used as classification loss functions; as noted above, however, embodiments of the present invention derive a center dispersion loss so that inter-class differences can be distinguished more clearly.
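For reference, the conventional softmax (cross-entropy) classification loss mentioned above can be sketched in a few lines of NumPy. This is a generic illustration, not code from the patent; the 7-class setup and the logit values are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def softmax_loss(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy between predicted class probabilities and integer labels."""
    probs = softmax(logits)
    n = logits.shape[0]
    # Probability the model assigns to the labelled (correct) expression of each sample.
    correct = probs[np.arange(n), labels]
    return float(-np.log(correct + 1e-12).mean())


# Hypothetical logits for 2 training images over 7 expression classes.
logits = np.array([[2.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
                   [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]])
labels = np.array([0, 2])
print(softmax_loss(logits, labels))            # lower: labels match the top-scoring classes
print(softmax_loss(logits, np.array([1, 0])))  # higher: labels disagree with the predictions
```

The loss is small when the labelled expression receives most of the probability mass and grows as the prediction and label diverge, which is exactly the "difference between desired and computed result" described above.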

Meanwhile, because facial expression data are scarce, the training in step S310 may combine multiple facial expression datasets to improve accuracy. The benefits of transfer learning can be exploited from deep face verification models such as VGGFace or FaceNet, which were trained on large-scale datasets, and a discriminative neural network loss specialized for facial expression recognition can be derived.

In step S320, the facial expression recognition apparatus receives a target image containing a face, extracts target features, and classifies them with the trained classification model of the artificial neural network, thereby identifying the facial expression in the target image.

(1) Data Collection

FIG. 4A illustrates the two types of datasets used: lab datasets (captured in a laboratory) and wild datasets (collected in real-world settings).

Existing datasets were combined to obtain larger-scale training data, and key facial features were collected. Three landmark detectors were then applied: the dlib face detector (multi-view HOG), HOG + AdaBoost, and Haar + AdaBoost. Only the best detection result was kept, the number of images was increased by horizontal flipping, and the images were finally resized to 224×224 pixels.
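The preprocessing steps above (keep the best detection, resize to 224×224, augment by horizontal flipping) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: detector outputs are modelled as hypothetical (score, box) tuples rather than calls to the real dlib or OpenCV APIs, "best" is taken to mean highest score, and nearest-neighbour resizing stands in for whatever interpolation was actually used.

```python
import numpy as np

TARGET_SIZE = 224  # final side length stated in the text


def resize_nearest(img: np.ndarray, size: int = TARGET_SIZE) -> np.ndarray:
    """Nearest-neighbour resize of an H x W (x C) array to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]


def preprocess(img: np.ndarray, detections: list) -> list:
    """Keep the best-scoring face box, resize it, and add a horizontally flipped copy.

    `detections` is a hypothetical list of (score, (top, left, bottom, right)) tuples,
    one per detector (e.g. dlib multi-view HOG, HOG + AdaBoost, Haar + AdaBoost).
    """
    if not detections:
        return []
    _, (top, left, bottom, right) = max(detections, key=lambda d: d[0])
    face = resize_nearest(img[top:bottom, left:right])
    if face.ndim == 2:
        # Grayscale input (e.g. FER2013): replicate to 3 channels for an RGB backbone.
        face = np.stack([face] * 3, axis=-1)
    return [face, face[:, ::-1]]  # original + mirror image doubles the sample count


# Usage with a synthetic 100 x 120 grayscale image and three detector outputs.
img = np.arange(100 * 120, dtype=np.float32).reshape(100, 120)
dets = [(0.7, (10, 10, 90, 100)), (0.9, (5, 5, 95, 110)), (0.4, (0, 0, 50, 50))]
faces = preprocess(img, dets)
print(len(faces), faces[0].shape)  # 2 (224, 224, 3)
```

Replicating the grayscale channel is one common way to feed 48×48 grayscale FER2013 faces into a backbone that expects RGB input; it is an assumption here, not something the text specifies.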

Examples of training datasets are introduced below with reference to FIGS. 4B to 4E. The facial expression training set may be formed by combining the SFEW, FER2013, CK+ and KDEF datasets, among others.

FER2013 (Wild)

Referring to FIG. 4B, the FER2013 dataset contains 28,709 training images, 3,589 validation (public test) images and 3,589 test (private test) images. Each image is 48×48 pixels in grayscale.

SFEW (Wild)

Referring to FIG. 4C, the SFEW (Static Facial Expressions in the Wild) dataset consists of frames selected from AFEW (Acted Facial Expressions in the Wild) and contains 700 images taken from movies.

KDEF (Lab)

Referring to FIG. 4D, the KDEF (Karolinska Directed Emotional Faces) dataset is a set of 4,900 images of human emotional facial expressions. It contains 70 subjects showing seven emotional expressions, each expression photographed twice from five angles.

CK+ (Lab)

Referring to FIG. 4E, the CK+ (Extended Cohn-Kanade) dataset consists of 593 sequences from 123 subjects and covers eight classes: neutral, anger, contempt, disgust, fear, happiness, sadness and surprise.

(2) Transfer Learning

FIG. 5 is a flowchart illustrating in more detail the process of training the classification model of the artificial neural network (S310) in the facial expression recognition method of FIG. 3, according to an embodiment of the present invention.

In this process (S310), in step S311 a plurality of training images of facial expressions collected from lab and wild environments are received and training features are extracted, and in step S312 a landmark detector is applied to the training features to generate detection results. In step S313, a deep network composed of a plurality of convolution layers is trained using the generated detection results, and in step S314 the classification model of the artificial neural network can be fine-tuned by retraining a pre-trained model through transfer learning. Here, the landmark detector may include at least one of the dlib face detector (multi-view HOG), HOG + AdaBoost and Haar + AdaBoost; the training images may be formed by combining at least two of the SFEW, FER2013, CK+ and KDEF datasets; and the classification model may be a VGGFace model.

Many machine learning models work well under the assumption that the data they are applied to follow the same distribution as the training data. When a new problem changes the data distribution, the existing statistical model must be rebuilt with new data, which is costly. Transfer learning is therefore preferable when a well-trained model is already available, particularly when the new problem is similar to the one that model was trained on.

In transfer learning, a deep network can be used purely as a feature extractor, and the extracted features can be used to train another model. Building a new model from an existing one speeds up training and can improve prediction. Models that have already been pre-trained, such as VGG, ResNet and GoogLeNet, are fine-tuned through transfer learning for the task the user wants to learn.
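The feature-extractor form of transfer learning described above can be illustrated with a toy NumPy example: a frozen "backbone" produces features and only a new softmax classifier head is trained on them. The random backbone matrix is a hypothetical stand-in for pre-trained VGGFace weights, and the data are synthetic; full fine-tuning, as in step S314, would additionally update some or all of the backbone weights, typically with a small learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained backbone (e.g. the convolutional layers of
# VGGFace); its weights stay frozen and are only used to extract features.
W_BACKBONE = rng.normal(size=(64, 32)) / 8.0


def extract_features(x: np.ndarray) -> np.ndarray:
    """Frozen feature extractor: a fixed linear map followed by ReLU."""
    return np.maximum(x @ W_BACKBONE, 0.0)


def train_head(x, y, num_classes, lr=0.05, epochs=200):
    """Train only a new softmax classifier on top of the frozen features."""
    feats = extract_features(x)
    W = np.zeros((feats.shape[1], num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    losses = []
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(float(-np.log(probs[np.arange(len(y)), y] + 1e-12).mean()))
        grad = (probs - onehot) / len(y)  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b, losses


x = rng.normal(size=(60, 64))
y = rng.integers(0, 3, size=60)
backbone_before = W_BACKBONE.copy()
W, b, losses = train_head(x, y, num_classes=3)
print(losses[0], losses[-1])  # the head's loss falls while the backbone is untouched
```

Freezing the backbone keeps training cheap and stable when labelled expression data are scarce, which is the motivation the text gives for transfer learning.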

FIG. 6 shows a VGGFace model as an example of the classification model; the VGGFace model was fine-tuned on the custom combined training dataset using Keras in Python. Evaluated on the FER test set, it achieved 93.5% training accuracy and 63.2% test accuracy.

(3) Loss Functions

Two loss functions proposed by embodiments of the present invention are introduced below. Both are designed around the center dispersion loss, which increases the inter-class difference; they differ in how the total loss function is composed.

3-1) First Embodiment: Center Dispersion Loss

Center loss simultaneously learns a center for the deep features of each class and minimizes the distance between the deep features and their class centers. Because minimizing center loss tends to reduce the intra-class variation of deep features, it makes classes easier to discriminate than softmax loss alone. As introduced in Equation 1 above, center loss can be computed as the sum of squared distances between the samples and their corresponding class centers in the feature space.
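Equation 1 is not reproduced in this section. Assuming it follows the common center-loss form L_c = 1/2 * sum_i ||x_i - c_{y_i}||^2, where x_i is the deep feature of sample i and c_{y_i} is the center of its class, the loss can be sketched as below. Computing the centers as per-class batch means is a simplifying assumption; in practice the centers are often updated incrementally during training.

```python
import numpy as np


def class_centers(features, labels, num_classes):
    """Per-class mean of the deep features; assumes every class occurs in the batch."""
    return np.stack([features[labels == j].mean(axis=0) for j in range(num_classes)])


def center_loss(features, labels, centers):
    """L_c = 1/2 * sum_i ||x_i - c_{y_i}||^2 (assumed form of Equation 1)."""
    diffs = features - centers[labels]  # each sample minus its own class center
    return 0.5 * float((diffs ** 2).sum())


feats = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
centers = class_centers(feats, labels, num_classes=2)
print(center_loss(feats, labels, centers))  # 2.0
```

Each sample lies at distance 1 from its class center here, so the loss is 0.5 * 4 = 2.0; tighter clusters (lower intra-class variation) drive it toward zero.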

In some cases, however, the clusters of different classes may overlap. It was found that center loss reduces intra-class variation but is not very successful at reducing inter-class similarity. Embodiments of the present invention therefore propose a center dispersion loss function that maximizes the distance between class centers to increase the inter-class difference.

Equation 3 below derives the center dispersion loss by computing the mean squared difference between the centers.

In Equation 3, c_j and c_k denote centers belonging to different classes, and the mean squared difference between them allows the inter-class difference to be reflected in the loss function.
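The mean squared difference between class centers that Equation 3 is described as computing can be sketched as below, assuming the mean is taken over all pairs of distinct classes j ≠ k. Note that this quantity grows as the centers move apart. Because the text states that minimizing the center dispersion loss pushes classes apart, Equation 3 must turn it into a quantity to be minimized in some way (for example by a sign change or a reciprocal); this excerpt does not show how, so the sketch stops at the mean squared difference itself.

```python
import numpy as np


def mean_sq_center_distance(centers: np.ndarray) -> float:
    """Mean of ||c_j - c_k||^2 over all pairs j != k of class centers."""
    k = centers.shape[0]
    diff = centers[:, None, :] - centers[None, :, :]  # (k, k, d) pairwise differences
    sq = (diff ** 2).sum(axis=-1)                     # (k, k) squared distances, zero diagonal
    # The diagonal is zero, so dividing by the k*(k-1) off-diagonal entries gives the mean.
    return float(sq.sum() / (k * (k - 1)))


print(mean_sq_center_distance(np.array([[0.0, 0.0], [3.0, 4.0]])))  # 25.0
```

Broadcasting computes all pairwise differences at once, which is convenient for the small number of expression classes (7 or 8) involved here.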

Using softmax loss alone, however, results in high intra-class variation, while using center dispersion loss alone can cause the centers to collapse to 0. The softmax loss can therefore be combined with the proposed center dispersion loss to fine-tune VGGFace so that the dissimilarity between classes is made clear.

Noting that minimizing the center dispersion loss tends to reduce the inter-class similarity of deep features but does not succeed in reducing intra-class variation, a new total loss function for training the neural network was derived, as in Equation 4 below.

Here, L_s is the softmax loss, L_cD is the center dispersion loss, and the hyperparameter λ balances the two terms (loss functions).
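Equation 4 is not reproduced in this text. Given the definitions above and the later summary describing a linear combination of the two losses weighted by a hyperparameter, it presumably has the following form; this is a reconstruction, not the patent's verbatim equation.

```latex
L = L_{s} + \lambda \, L_{cD}, \qquad \text{e.g. } \lambda = 0.1
```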

For example, with λ = 0.1 the fine-tuned VGGFace achieved 97.29% training accuracy and 70.16% test accuracy on FER2013.

In summary, in the first embodiment the total loss function combines a first loss function that increases intra-class variation and a second loss function that increases the inter-class difference. Preferably, the first loss function is the softmax loss function, and the second loss function is the center dispersion loss function, which maximizes the distance between different class centers by computing the mean squared difference between the class centers. The total loss function is preferably calculated by linearly combining the first and second loss functions using a hyperparameter set to balance them.

3-2) Second Embodiment: Combined Center Dispersion Loss

Minimizing center loss tends to reduce the intra-class variation of deep features but does not reduce inter-class similarity. Conversely, minimizing center dispersion loss tends to reduce inter-class similarity but is not effective at reducing intra-class variation. This embodiment therefore proposes combining both the center loss and the center dispersion loss with the softmax loss, to reduce intra-class variation and inter-class similarity at the same time.

First, the combined center dispersion loss (L_cc) can be derived as in Equation 5 below.

Here, L_c is the center loss, L_cD is the center dispersion loss, and the hyperparameter λ₁ (which may be set to, e.g., λ₁ = 0.1) balances the two terms (loss functions).

The total loss function for training the neural network can then be derived as in Equation 6 below.

Here, L_s is the softmax loss, L_cc is the combined center dispersion loss, and the hyperparameter λ (which may be set to, e.g., λ = 0.1) balances the two terms (loss functions).
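Equations 5 and 6 are likewise not reproduced in this text. Assuming λ₁ weights the center dispersion term in Equation 5 (the text does not say which of the two terms it multiplies), the two linear combinations and their expansion would read as follows. With the example values λ = λ₁ = 0.1, the center dispersion term would then carry an effective weight of only 0.01 relative to the softmax loss.

```latex
\begin{aligned}
L_{cc} &= L_{c} + \lambda_1 L_{cD} \\
L &= L_{s} + \lambda L_{cc} = L_{s} + \lambda L_{c} + \lambda \lambda_1 L_{cD} \\
  &= L_{s} + 0.1\, L_{c} + 0.01\, L_{cD} \qquad (\lambda = \lambda_1 = 0.1)
\end{aligned}
```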

In experiments, fine-tuning VGGFace in this way achieved 97.9% training accuracy and 70.89% test accuracy on FER2013.

In summary, in the second embodiment the total loss function combines a first loss function that increases intra-class variation, a second loss function that increases the inter-class difference, and a third loss function that reduces intra-class variation. Preferably, the first loss function is the softmax loss function; the second loss function is the center dispersion loss function, which maximizes the distance between different class centers by computing the mean squared difference between the class centers; and the third loss function is the center loss function, which minimizes the distance between each class's deep features and its class center by computing the sum of squared distances between the samples and their corresponding centers in the feature space. The total loss function is preferably calculated by first linearly combining the second and third loss functions, using a first hyperparameter set to balance them, to obtain a combined loss function, and then linearly combining the first loss function and the combined loss function using a second hyperparameter set to balance those two.

FIG. 7 is a block diagram illustrating a deep-learning-model-based facial expression recognition apparatus 700 using a center dispersion loss function according to an embodiment of the present invention; it reorganizes the steps of the facial expression recognition method of FIG. 3 from a hardware perspective. To avoid repetition, only the functions and operations of each component are outlined here.

The input unit 10 receives a plurality of training images of facial expressions and a target image containing a face.

The memory 30 stores a program that learns facial expressions and recognizes the expression in a target image using the trained model, and the processor 20 executes the program stored in the memory 30.

More specifically, the program stored in the memory 30 includes instructions to extract training features from the input training images and train a classification model of an artificial neural network that includes a loss function, and to receive a target image containing a face, extract target features, and classify them with the trained classification model, thereby identifying the facial expression in the target image. The loss function combines a first loss function that increases intra-class variation and a second loss function that increases the inter-class difference.

The artificial neural network may be a convolutional neural network (CNN), the first loss function may be the softmax loss function, and the second loss function may be the center dispersion loss function, which maximizes the distance between different class centers by computing the mean squared difference between the class centers. The loss function may be calculated by linearly combining the first and second loss functions using a hyperparameter set to balance them.

The loss function may further include a third loss function that reduces intra-class variation. The third loss function may be the center loss function, which minimizes the distance between each class's deep features and its class center by computing the sum of squared distances between the samples and their corresponding centers in the feature space.

The loss function may be calculated by first linearly combining the second and third loss functions, using a first hyperparameter set to balance them, to obtain a combined loss function, and then linearly combining the first loss function and the combined loss function using a second hyperparameter set to balance those two.

Furthermore, to train the classification model of the artificial neural network, the program stored in the memory 30 may perform instructions to: receive a plurality of training images of facial expressions collected from lab and wild environments and extract training features; generate detection results for the extracted training features using a landmark detector; train a deep network composed of a plurality of convolution layers using the generated detection results; and fine-tune the classification model of the artificial neural network by retraining a pre-trained model through transfer learning.

(4) Experimental Results

Experiments on the combined dataset produced the results shown in Table 1 below on the FER2013 test set.

Next, Table 2 compares the method proposed in the embodiments of the present invention with other deep-learning-based methods on the FER2013 test set.

Table 3 below shows the best-performing results on the FER2013 test set among methods that do not use deep learning.

According to embodiments of the present invention, a new center dispersion loss function for CNNs is proposed to improve the discriminative power of learned deep facial features. Experimental results on the FER2013 dataset show that both the proposed center dispersion loss and the combined center dispersion loss improve facial expression recognition performance. Compared with using either softmax loss or center loss alone, the proposed loss functions reduce intra-class variation as well as inter-class similarity.

Embodiments of the present invention may be implemented as computer-readable code on a computer-readable recording medium. A computer-readable recording medium includes every kind of recording device that stores data readable by a computer system.

Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks and optical data storage devices. The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code and code segments for implementing the present invention can be readily inferred by programmers skilled in the art to which the invention pertains.

The present invention has been described above with a focus on various embodiments. Those of ordinary skill in the art will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered illustrative rather than restrictive. The scope of the invention is defined by the claims rather than by the foregoing description, and all differences within their equivalent scope should be construed as included in the invention.

700: facial expression recognition apparatus
10: input unit
20: processor
30: memory

Claims

Receiving, by a facial expression recognition apparatus, a plurality of training images of facial expressions, extracting training features, and training a classification model of an artificial neural network including a loss function; and
receiving, by the facial expression recognition apparatus, a target image including a face, extracting a target feature, and classifying the target feature using the trained classification model of the artificial neural network to identify the facial expression of the target image,
wherein the loss function
is configured by combining a first loss function that increases intra-class variation and a second loss function that increases inter-class difference.

The method of claim 1,
The artificial neural network is a convolutional neural network (CNN),
Wherein the second loss function is a Center Dispersion Loss function that maximizes the distance between different class centers by calculating a mean squared difference between the centers of the respective classes.

The method of claim 2,
Wherein the first loss function is a Softmax Loss function.

The method of claim 1,
Wherein the loss function
is calculated by linearly combining the first loss function and the second loss function using a hyperparameter set in consideration of the balance between the first loss function and the second loss function.

The method of claim 1,
Wherein training the classification model of the artificial neural network comprises:
extracting training features by receiving a plurality of training images of facial expressions collected from a lab environment and a wild environment;
Generating a detection result through a landmark detector on the extracted training feature;
Training a deep network structure composed of a plurality of convolution layers using the generated detection result; And
Fine-tuning the classification model of the artificial neural network by re-learning a pre-training model through transfer learning.

The method of claim 5,
The landmark detector includes at least one of a dlib face detector (multi-view HOG), HOG + AdaBoost and Haar + AdaBoost,
the training images are formed by combining at least two of the SFEW, FER2013, CK+ and KDEF datasets, and
the classification model is a VGGFace model.

Receiving, by a facial expression recognition apparatus, a plurality of training images of facial expressions, extracting training features, and training a classification model of an artificial neural network including a loss function; and
receiving, by the facial expression recognition apparatus, a target image including a face, extracting a target feature, and classifying the target feature using the trained classification model of the artificial neural network to identify the facial expression of the target image,
wherein the loss function
is configured by combining a first loss function that increases intra-class variation, a second loss function that increases inter-class difference, and a third loss function that reduces intra-class variation.

The method of claim 7, wherein
The artificial neural network is a convolutional neural network (CNN),
The second loss function is a Center Dispersion Loss function that maximizes the distance between different class centers by calculating a mean squared difference between the centers of the respective classes, and
the third loss function is a Center Loss function that minimizes the distance between the deep features of each class and the center of that class by calculating the sum of squared distances between the samples and their corresponding centers in the feature space.

The method of claim 8,
Wherein the first loss function is a Softmax Loss function.

The method of claim 7, wherein
Wherein the loss function
is calculated by linearly combining the second loss function and the third loss function, using a first hyperparameter set in consideration of the balance between the second loss function and the third loss function, to obtain a combined loss function,
and by linearly combining the first loss function and the combined loss function using a second hyperparameter set in consideration of the balance between the first loss function and the combined loss function.

The method of claim 7, wherein
Training the classification model of the artificial neural network comprises:
Extracting training features from a plurality of training images of facial expressions collected in a lab environment and in a wild environment;
Generating a detection result by applying a landmark detector to the extracted training features;
Training a deep network structure composed of a plurality of convolution layers using the generated detection result; And
Fine-tuning the classification model of the artificial neural network by retraining a pre-trained model through transfer learning.

The method of claim 11, wherein
The landmark detector includes at least one of a dlib face detector (Multi-view HOG), HOG + AdaBoost, and Haar + AdaBoost,
The training images are composed by combining at least two of the SFEW, FER2013, CK+, and KDEF datasets,
And the classification model is a VGGFace model.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 12.

A facial expression recognition apparatus, comprising:
An input unit configured to receive a plurality of training images related to facial expressions and a target image including a face;
A memory storing a program for learning facial expressions and recognizing the facial expression of the target image using the learned model; and
A processor for executing the program stored in the memory,
Wherein the program stored in the memory
Extracts training features from the input training images to train a classification model of an artificial neural network including a loss function, receives the target image including a face to extract a target feature, and classifies the target feature using the trained classification model of the artificial neural network to identify the facial expression of the target image,
And the loss function includes a first loss function that increases intra-class variation and a second loss function that increases inter-class difference.
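At inference time the apparatus classifies the extracted target feature with the trained model. The sketch below assumes the classifier's final stage is a linear layer followed by a softmax; the function name, the weights and biases, and the seven-label order in EXPRESSIONS are hypothetical placeholders (the claims do not specify a label set), and feature extraction from the target image is not shown.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label order; the claims do not fix one.
EXPRESSIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def classify_expression(feature: Sequence[float],
                        weights: List[Sequence[float]],
                        biases: Sequence[float],
                        labels: Sequence[str] = EXPRESSIONS) -> Tuple[str, List[float]]:
    """Score a target feature with a linear softmax layer and return the top label."""
    # One logit per class: dot(weight row, feature) + bias.
    logits = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs
```

For instance, a feature (1, 0) scored with weight rows (2, 0) and (0, 2) and zero biases yields logits (2, 0), so the first label is returned.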

The apparatus of claim 14, wherein
The artificial neural network is a convolutional neural network (CNN),
The first loss function is a Softmax Loss function,
And the second loss function is a Center Dispersion Loss function that maximizes the distance between different classification centers by calculating the mean squared difference between the centers of the respective classifications.

The apparatus of claim 14, wherein
The loss function is calculated by linearly combining the first loss function and the second loss function using a hyperparameter set in consideration of the balance between them.

The apparatus of claim 14, wherein
The loss function further includes a third loss function that reduces variation within each classification.

The apparatus of claim 17, wherein
The third loss function is a center loss function that minimizes the distance between the deep features of each classification and the corresponding classification center by calculating the sum of the squared distances between each sample and its corresponding center in the feature space.

The apparatus of claim 17, wherein
The loss function is calculated by:
Linearly combining the second loss function and the third loss function, using a first hyperparameter set in consideration of the balance between them, to obtain a combined loss function; and
Linearly combining the first loss function and the combined loss function using a second hyperparameter set in consideration of the balance between them.

The apparatus of claim 14, wherein
The program stored in the memory
Executes instructions for training the classification model of the artificial neural network by extracting training features from a plurality of training images of facial expressions collected in a lab environment and in a wild environment, generating a detection result by applying a landmark detector to the extracted training features, training a deep network structure composed of a plurality of convolution layers using the generated detection result, and fine-tuning the classification model of the artificial neural network by retraining a pre-trained model through transfer learning.