KR102060719B1

KR102060719B1 - System and method for face detection and emotion recognition based deep-learning

Info

Publication number: KR102060719B1
Application number: KR1020170154097A
Authority: KR
Inventors: 장인훈; 고광은
Original assignee: 한국생산기술연구원
Priority date: 2017-11-17
Filing date: 2017-11-17
Publication date: 2019-12-30
Also published as: KR20190056792A

Abstract

The present invention relates to a deep learning-based face detection and emotion recognition system and method. The system includes a deep learning operation unit that receives a plurality of images, detects faces, and recognizes the emotion expressed on each face, and a probability operation unit that derives the current emotion by performing a calculation, based on emotion transition probabilities, over a first emotion recognized in a first image, a second emotion recognized in a second image, and a result value related to a previously derived emotion.

Description

Deep learning based face detection and emotion recognition system and method {SYSTEM AND METHOD FOR FACE DETECTION AND EMOTION RECOGNITION BASED DEEP-LEARNING}

The present invention relates to a face detection and emotion recognition system and method, and more particularly to a deep learning-based face detection and emotion recognition system and method.

The conventional emotion recognition system shown in FIG. 1 can recognize the individual emotional state of each face in an input image containing multiple faces. It includes an input unit 10, a face feature extraction unit 20, a first classifier 30, an emotion feature extraction unit 50, and a second classifier 60.

The input unit 10 receives an input image containing multiple faces, and the face feature extraction unit 20 generates face feature values from the input image. Using training data, the first classifier 30 outputs a first classification result 40, that is, the faces detected in the input image. When the input image contains multiple faces and several of them are detected, a cropped image is output for each detected face.

Early face detectors used the intensity of the face in the image as their feature. Because performance then depended on race, lighting, and similar factors, features insensitive to these factors were needed, and features such as Haar-like features, Local Binary Patterns (LBP), and the Modified Census Transform were proposed.
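The following is a minimal sketch of the basic 3*3 Local Binary Pattern operator in plain Python, one of the hand-crafted features listed above. The function names are hypothetical. The clockwise bit ordering starting at the top-left neighbour is an arbitrary convention chosen here. Practical detectors usually build histograms of these codes over image regions rather than using single codes.

```python
def lbp_code(img, r, c):
    """Return the 8-bit Local Binary Pattern code of interior pixel (r, c).

    img is a list of rows of intensities. Each of the 8 neighbours, visited
    clockwise from the top-left (an arbitrary convention), sets its bit when
    its intensity is >= the centre intensity.
    """
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_image(img):
    """Compute LBP codes for every interior pixel (the 1-pixel border is skipped)."""
    rows, cols = len(img), len(img[0])
    return [[lbp_code(img, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


patch = [[10, 20, 30],
         [40, 25, 5],
         [50, 60, 70]]
print(lbp_code(patch, 1, 1))
```

Each bit only records whether a neighbour is at least as bright as the centre. Adding a constant brightness offset to the whole patch therefore leaves the code unchanged, which is the kind of lighting robustness the paragraph refers to.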

A neural network-based classifier, an AdaBoost classifier, or the like may be used as the classifier for face detection.

The emotion feature extraction unit 50 generates emotion feature values from each detected face crop. Using training data, the second classifier 60 outputs a second classification result 70, that is, the emotional state of each detected face.

Conventional emotion recognition techniques distinguish facial expressions, as shown in Table 1, using the eyebrows, eyes, nose, and mouth as feature elements. Emotion recognition methods based on these facial feature elements include optical flow analysis and holistic analysis.

[Table 1]

However, the conventional approach has the following problems and limitations.

First, feature points are selected subjectively by the classifier designer. A classifier is strongly affected by the recognition target and the surrounding environment. As a result, selecting and extracting feature points is generally very difficult, and the chosen feature points strongly affect recognition performance. Moreover, there is no measure for judging whether the feature points selected for a classifier are optimal.

Second, face detection and emotion recognition are performed as two sequential stages, so two classifiers (a face detection classifier and an emotion recognition classifier) are required. Consequently, two sets of training data, two feature selection and extraction processes, and two training processes are needed.

Third, when multiple faces are detected, recognizing the emotion of each face requires running the emotion recognition classifier once per face. Computation time can therefore increase significantly as the number of detected faces grows.

Fourth, existing emotion recognition techniques operate on still images. Real human emotion, however, needs to be recognized as a continuum, taking the passage of time into account.

Korean Laid-open Patent Publication No. 10-2016-0096460

Accordingly, the object of the present invention is to overcome these problems and limitations of the prior art. It provides a face detection and emotion recognition system and method that use a deep learning architecture containing a single classifier, together with a probability model, to perform face detection and emotion recognition simultaneously in real time. The system and method also achieve higher accuracy by taking the continuity of emotion into account.

A face detection and emotion recognition system according to an embodiment of the present invention includes two units:
- a deep learning operation unit that receives a plurality of images, detects faces, and recognizes the emotion expressed on each detected face; and
- a probability operation unit that derives the current emotion by a calculation based on emotion transition probabilities. The calculation takes as input a first emotion recognized by the deep learning operation unit in a first image, a second emotion recognized in a second image (the next input image after the first image), and a result value related to a previously derived emotion.

The emotion transition probability may be higher the closer two emotions are in a three-axis emotion model, and lower the farther apart they are.

To perform face detection and emotion recognition with a single classifier, the deep learning operation unit may hold two kinds of information for each grid cell: information on at least one bounding box for face detection, and emotion estimates for the face in that bounding box.

The probability operation unit may derive the current emotion according to the following equation:

[Equation 1]

The system may further include a second deep learning operation unit that identifies the detected face.

A face detection and emotion recognition method according to another embodiment of the present invention includes three steps:
- receiving a plurality of images;
- detecting a face in the images and recognizing the emotion expressed on the detected face; and
- deriving the current emotion by a calculation based on emotion transition probabilities. The calculation takes as input a first emotion recognized in a first image during the recognition step, a second emotion recognized in a second image (the next input image after the first image), and a result value related to a previously derived emotion.

According to the present invention, a deep learning architecture combined with a probability model lets face detection and emotion recognition be performed simultaneously with only one classifier. Multiple faces are detected and their emotions recognized at once, and computation time does not grow in proportion to the number of detected faces, so real-time operation is possible. Because the continuity of emotion is taken into account, face detection and emotion recognition are also performed with higher accuracy.

FIG. 1 is a block diagram illustrating a conventional face detection and emotion recognition system.
FIG. 2 is a diagram illustrating a face detection and emotion recognition system according to an embodiment of the present invention.
FIG. 3 is a block diagram illustrating the deep learning operation unit shown in FIG. 2.
FIG. 4 is a diagram illustrating a tensor reconstructed by the output node reconstruction unit shown in FIG. 3.
FIG. 5 is a diagram illustrating face detection and emotion recognition results for each grid cell.
FIG. 6 is a diagram illustrating the emotion recognition classification probabilities of each bounding box.
FIG. 7 is a diagram illustrating a three-axis emotion model.
FIG. 8 is a diagram illustrating a state transition model.
FIG. 9 is a flowchart illustrating a face detection and emotion recognition method according to another embodiment of the present invention.
FIG. 10 is a block diagram illustrating a face detection and emotion recognition system according to another embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can easily carry them out.

FIG. 2 is a diagram illustrating a face detection and emotion recognition system according to an embodiment of the present invention.

Referring to FIG. 2, the face detection and emotion recognition system according to an embodiment of the present invention includes an input unit 100, a deep learning operation unit 200, a probability operation unit 300, and an output unit 400.

The input unit 100 receives an input image containing multiple faces.

The deep learning operation unit 200 uses deep learning-based object recognition to detect the human faces in the input image in real time. At the same time, it recognizes the emotion of each detected face.

The probability operation unit 300 applies a probability model to the faces detected and emotions recognized by the deep learning operation unit 200. This yields emotion recognition that is continuous over time.

The output unit 400 displays or exports, in various ways, the emotion computed by the probability operation unit 300 for each detected face.

The deep learning operation unit 200 may be implemented using the Faster RCNN algorithm. Faster RCNN is based on a convolutional neural network (CNN), so features are selected and extracted by learnable convolution operations. The deep learning operation unit 200 therefore does not need the subjective, designer-driven feature point selection that was a problem in the prior art. Furthermore, the convolution parameters are learned by the backpropagation algorithm so as to minimize recognition error, which optimizes the feature selection and extraction process.

In addition, by using the Faster RCNN algorithm, the deep learning operation unit 200 can recast two problems as a single regression problem: localization (finding where objects are in the input image) and classification (determining what those objects are). Therefore, unlike the prior art, which requires two kinds of classifiers, face detection and emotion recognition can be performed simultaneously with a single classifier.

The deep learning operation unit 200 divides the input image into N*N grid cells and generates, for each cell, bounding boxes of a predefined size and number. It then selects the bounding boxes likely to contain an object to be detected and recognizes the object in each selected box, so multiple objects can be recognized.
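The grid division can be sketched as follows. The sketch assumes the 448*448 input and 7*7 grid of the embodiment described below, so each cell is 448 / 7 = 64 pixels wide. Two further assumptions are not stated in the text: that the cell containing a face's centre is the one responsible for that face (a rule borrowed from single-shot grid detectors), and the helper name `responsible_cell`.

```python
def responsible_cell(cx, cy, image_size=448, grid=7):
    """Return (row, col) of the grid cell containing pixel (cx, cy).

    Assumes a square image split into grid x grid equal cells, so each cell
    is image_size / grid pixels wide (448 / 7 = 64 px here). Points on the
    right or bottom edge are clamped into the last cell.
    """
    cell = image_size / grid
    col = min(int(cx // cell), grid - 1)
    row = min(int(cy // cell), grid - 1)
    return row, col


print(responsible_cell(100, 300))  # a face centred at x=100, y=300
```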

Therefore, even when multiple faces are detected, the emotional state can be recognized for each detected face's bounding box without adding one emotion recognition classifier per face. In other words, an existing deep learning algorithm for object localization and object recognition can be repurposed for face localization and facial expression recognition.

The deep learning operation unit 200 is now described in more detail with reference to FIGS. 3 to 6:
- FIG. 3 is a block diagram illustrating the deep learning operation unit shown in FIG. 2.
- FIG. 4 is a diagram illustrating a tensor reconstructed by the output node reconstruction unit shown in FIG. 3.
- FIG. 5 is a diagram illustrating face detection and emotion recognition results for each grid cell.
- FIG. 6 is a diagram illustrating the emotion recognition classification probabilities of each bounding box.

The deep learning operation unit 200 includes an image adjustment unit 210, a feature extraction unit 220, a classifier 230, an output node reconstruction unit 240, and a result operation unit 250.

The deep learning operation unit 200 detects faces in the input image and identifies which of seven basic emotions (happy, angry, sad, disgusted, neutral, fearful, surprised) each face expresses.

The image adjustment unit 210 resizes the input image. In this embodiment, the input image is resized to 448*448 (= 200,704 pixels). If necessary, the image adjustment unit 210 may also perform image preprocessing such as image correction or noise removal.

The feature extraction unit 220 receives the resized image and automatically extracts features as the image passes through 24 convolutional layers.

The classifier 230 passes the features through three fully connected layers and produces a total of 833 output node values at its output. Of these, 490 nodes (7*7 cells × 2 boxes × 5 values) relate to the face detection bounding boxes. The remaining 343 nodes (7*7 cells × 7 emotions) relate to the emotion recognition result.

As shown in FIG. 4, the output node reconstruction unit 240 reshapes the 833 output nodes into a 7*7*17 tensor, where a tensor means a multidimensional array. The 7*7*17 tensor is divided into 7*7 grid cells, and each grid cell consists of the following 17 values:

① For each grid cell, two candidate bounding boxes that may contain a face are generated arbitrarily. The first 5 of the 17 values describe the first bounding box, and the next 5 describe the second bounding box.

② Each bounding box's information consists of five predicted values: position (x, y), size (w, h), and the confidence c that the bounding box contains a face.

③ The last 7 values are estimates (probabilities P) for each of the 7 emotions of the face in the bounding box.
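The reshaping and per-cell layout described above can be sketched in plain Python. This layout follows the same S×S×(B·5+C) convention used by the YOLO detector. The sketch rests on three assumptions that the text does not specify:
- the flat 833-value output is stored cell by cell (row-major), with each cell's 17 values contiguous;
- the 7 emotion values follow the order listed for the deep learning operation unit 200;
- the function names are hypothetical.

```python
GRID = 7                      # 7 x 7 grid cells
BOXES = 2                     # candidate bounding boxes per cell
BOX_FIELDS = 5                # x, y, w, h and face confidence c
EMOTIONS = ["happy", "angry", "sad", "disgusted", "neutral", "fearful", "surprised"]
CELL_DEPTH = BOXES * BOX_FIELDS + len(EMOTIONS)  # 2*5 + 7 = 17


def reshape_output(flat):
    """Reshape 7*7*17 = 833 output values into a [row][col][17] nested list.

    Assumes the values are stored cell by cell in row-major order.
    """
    if len(flat) != GRID * GRID * CELL_DEPTH:
        raise ValueError(f"expected {GRID * GRID * CELL_DEPTH} values, got {len(flat)}")
    return [[list(flat[(r * GRID + c) * CELL_DEPTH:(r * GRID + c + 1) * CELL_DEPTH])
             for c in range(GRID)]
            for r in range(GRID)]


def parse_cell(cell):
    """Split one cell's 17 values into two box dicts and a dict of 7 emotion estimates."""
    boxes = []
    for b in range(BOXES):
        x, y, w, h, conf = cell[b * BOX_FIELDS:(b + 1) * BOX_FIELDS]
        boxes.append({"x": x, "y": y, "w": w, "h": h, "c": conf})
    emotion_probs = dict(zip(EMOTIONS, cell[BOXES * BOX_FIELDS:]))
    return boxes, emotion_probs
```

The constants reproduce the node counts given for the classifier 230: 7*7*2*5 = 490 box values plus 7*7*7 = 343 emotion values, for 833 in total.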

Accordingly, each grid cell's face detection and emotion recognition result, its 17*1*1 tensor, can be represented as shown in FIG. 5.

As shown in FIG. 6, the result operation unit 250 computes the emotion recognition classification probabilities of each bounding box. It does so by multiplying the confidence c that the box contains a face by the estimate P for each of the 7 emotions.

Since each of the 49 (= 7*7) grid cells has 2 bounding boxes, emotion recognition classification probabilities are obtained for a total of 98 bounding boxes. Any emotion recognition classification probability smaller than a reference value (for example, 0.2) is set to 0.
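The score computation and thresholding can be sketched as follows. The sketch assumes the box confidences and the cell's 7 emotion estimates have already been parsed out of the tensor. The function names, and the choice to keep a score exactly equal to the threshold, are assumptions; the text only says that values smaller than the reference value become 0.

```python
def class_scores(box_conf, emotion_probs, threshold=0.2):
    """Emotion recognition classification probabilities for one bounding box.

    Each score is c * P_k; scores below the reference value are set to 0.
    """
    return [s if s >= threshold else 0.0 for s in (box_conf * p for p in emotion_probs)]


def all_scores(cells, threshold=0.2):
    """Score every bounding box in the grid.

    cells: one pair (list of box confidences, list of emotion estimates) per
    grid cell. Returns one score list per box (98 lists for 49 cells x 2 boxes).
    """
    return [class_scores(conf, probs, threshold)
            for confs, probs in cells
            for conf in confs]
```

Both boxes of a cell share that cell's emotion estimates and differ only in their confidence c, which matches the 17-value layout with a single set of 7 emotion values per cell.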

Next, for each emotion, the result operation unit 250 sorts the bounding boxes in descending order of emotion recognition classification probability. It then removes overlapping bounding boxes for each emotion. It computes the IoU (Intersection over Union) of each box against the box with the largest emotion recognition classification probability. If the IoU exceeds a reference value (for example, 0.5), it sets the smaller classification probability to 0.
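This step is a per-class greedy non-maximum suppression (NMS). In the sketch below, boxes are converted from centre format (x, y, w, h) to corner format before IoU is computed. As in standard greedy NMS, every box that survives (not only the top one) suppresses lower-scoring boxes that overlap it by more than the threshold. The helper names are hypothetical.

```python
def center_to_corners(x, y, w, h):
    """Convert a (centre x, centre y, width, height) box to (x1, y1, x2, y2)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def suppress_class(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression for one emotion class.

    boxes: list of (x1, y1, x2, y2); scores: this class's score for each box.
    Returns a new score list in which boxes overlapping a higher-scoring
    surviving box by more than iou_threshold are set to 0.
    """
    scores = list(scores)
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    for pos, i in enumerate(order):
        if scores[i] == 0:
            continue  # already suppressed or below the earlier threshold
        for j in order[pos + 1:]:
            if scores[j] > 0 and iou(boxes[i], boxes[j]) > iou_threshold:
                scores[j] = 0.0
    return scores
```

To process all 7 emotions, `suppress_class` would be run once per emotion column of the 98 x 7 score table.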

After this process, the emotion class with the largest emotion recognition classification probability is the emotion recognition result. The bounding box to which that value belongs is the face detection result.

If the largest value is 0, the grid cell contains no face. If two emotion classes are detected in one grid cell, the faces overlap.
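The final decision can be sketched as an argmax over each box's 7 post-suppression scores, returning nothing when all of them are 0. The emotion order follows the list given for the deep learning operation unit 200. The function names, and reading "two classes in one cell" as "both boxes of the cell survive", are assumptions.

```python
EMOTIONS = ["happy", "angry", "sad", "disgusted", "neutral", "fearful", "surprised"]


def decide(box_scores):
    """Return (emotion, score) for one box, or None when every score is 0 (no face)."""
    best = max(range(len(box_scores)), key=lambda k: box_scores[k])
    if box_scores[best] == 0:
        return None
    return EMOTIONS[best], box_scores[best]


def faces_in_cell(cell_box_scores):
    """Decide both boxes of one grid cell.

    An empty result means the cell holds no face; two detections suggest
    overlapping faces.
    """
    return [d for d in (decide(s) for s in cell_box_scores) if d is not None]
```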

Most conventional emotion recognition techniques recognize emotion in still images. Real human emotion, however, is generally recognized as a continuum over time. Moreover, the probability of transitioning from one emotional state to another along the time axis differs depending on the emotions involved.

For example, the probability of changing from an angry state to a sad state is generally greater than the probability of changing from a happy state to a sad state.

FIG. 7 shows a three-axis emotion model. It expresses a direct relationship between specific combinations of the neurotransmitters dopamine, noradrenaline, and serotonin and eight basic emotions. As the level of the neurotransmitter on each axis increases, the emotions associated with that axis grow stronger. For example, fear is strongest when dopamine is at its maximum and noradrenaline and serotonin are zero. Conversely, surprise is strongest when dopamine is zero and noradrenaline and serotonin are at their maximum.

Neurotransmitter levels are unlikely to change abruptly in the human body. In this model, therefore, the emotion transition probability is higher for emotions that are closer together and lower for emotions that are farther apart.
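The text states only that transition probability decreases with distance in the three-axis model. The sketch below turns that into a row-stochastic transition matrix, using an exponential distance kernel with a temperature `tau`; both are assumptions. The emotion coordinates are also hypothetical:
- fear and surprise follow the description of FIG. 7;
- the other corners follow Lövheim's cube, with "sad" placed at the distress corner;
- "neutral" is placed at the centre.

```python
import math

# Hypothetical (dopamine, noradrenaline, serotonin) coordinates, each in [0, 1].
COORDS = {
    "happy":     (1.0, 0.0, 1.0),
    "angry":     (1.0, 1.0, 0.0),
    "sad":       (0.0, 1.0, 0.0),
    "disgusted": (0.0, 0.0, 1.0),
    "neutral":   (0.5, 0.5, 0.5),
    "fearful":   (1.0, 0.0, 0.0),
    "surprised": (0.0, 1.0, 1.0),
}
EMOTIONS = list(COORDS)


def transition_matrix(tau=1.0):
    """Return T with T[i][j] proportional to exp(-distance(i, j) / tau), rows summing to 1.

    Emotions that are closer in the three-axis model get a higher transition
    probability, matching the qualitative rule in the text.
    """
    matrix = []
    for a in EMOTIONS:
        weights = [math.exp(-math.dist(COORDS[a], COORDS[b]) / tau) for b in EMOTIONS]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix
```

With these coordinates, angry and sad are 1 apart while happy and sad are about 1.73 apart. The matrix therefore gives anger-to-sadness a higher probability than happiness-to-sadness, consistent with the example given earlier.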

The probability operation unit 300 of the face detection and emotion recognition system according to an embodiment of the present invention derives the current emotion using two elements: the state transition model of FIG. 8, whose states are the seven emotions, and the following [Equation 1].

[Equation 1]

That is, the probability operation unit 300 derives the current emotion by a calculation based on emotion transition probabilities. The calculation combines the per-frame emotions recognized by the deep learning operation unit 200 with the result value related to the previous emotion that the probability operation unit 300 derived earlier.

Thus, the deep learning operation unit 200 and the probability operation unit 300 according to this embodiment can recognize emotion continuously over time. Consider face detection and emotion recognition on a continuous image sequence such as video. Even if the deep learning operation unit 200 misrecognizes the emotion in some frames, the probability operation unit 300 can correct it to the right emotion.
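Equation 1 is not reproduced in this text. As a stand-in, the sketch below uses a standard hidden-Markov-model forward (Bayes filter) update. It is not the patent's equation. It combines the previously derived belief, the transition probabilities, and the current frame's per-emotion scores, treating those scores as likelihoods (an assumption). The function names are hypothetical.

```python
def forward_step(prev_belief, transition, frame_probs):
    """One Bayes-filter update over emotion states.

    prev_belief: probability of each emotion after the previous frame.
    transition:  transition[i][j] = P(emotion j now | emotion i before).
    frame_probs: per-emotion scores from the detector for the current frame.
    Returns the normalised belief for the current frame.
    """
    n = len(prev_belief)
    # Predict: propagate the previous belief through the transition model.
    predicted = [sum(prev_belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: weight the prediction by the current frame's evidence.
    unnorm = [predicted[j] * frame_probs[j] for j in range(n)]
    total = sum(unnorm)
    if total == 0:
        return list(prev_belief)  # no usable evidence; keep the previous belief
    return [u / total for u in unnorm]


def filter_sequence(initial_belief, transition, frames):
    """Apply forward_step frame by frame and return the most likely emotion index per frame."""
    belief, labels = list(initial_belief), []
    for probs in frames:
        belief = forward_step(belief, transition, probs)
        labels.append(max(range(len(belief)), key=belief.__getitem__))
    return labels


T = [[0.9, 0.1], [0.1, 0.9]]
frames = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1]]
print(filter_sequence([0.5, 0.5], T, frames))
```

In this two-state example the third frame on its own favours state 1. The filtered sequence nevertheless stays at state 0, which illustrates how a single misrecognized frame can be corrected by the temporal model.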

For convenience, emotion recognition has been described for seven emotions, but it may also be performed for eight or more, or six or fewer, emotions.

As described above, the deep learning-based face detection and emotion recognition system according to the embodiment maintains a high recognition rate of 97% or more. When it detects multiple faces and recognizes their emotions, its computation time does not increase in proportion to the number of faces. Real-time operation is therefore possible, and the potential for commercialization is high.

A face detection and emotion recognition method according to an embodiment of the present invention is now described with reference to FIG. 9. FIG. 9 is a flowchart illustrating a face detection and emotion recognition method according to another embodiment of the present invention.

The method shown in FIG. 9 is performed by the face detection and emotion recognition system shown in FIGS. 2 to 8. The deep learning operation unit 200 performs deep learning-based face detection and emotion recognition (S810). The probability operation unit 300 then derives the current emotion from the emotion transition probabilities, taking the continuity of emotional states into account (S820).

Steps S810 and S820 were described in detail in the preceding embodiment and apply here unchanged, so their description is omitted to avoid repetition.

Next, a face detection and emotion recognition system that can also identify the detected faces is described with reference to FIG. 10. FIG. 10 is a block diagram illustrating a face detection and emotion recognition system according to another embodiment of the present invention.

The face detection and emotion recognition system according to another embodiment of the present invention includes an input unit 910, a first deep learning operation unit 920, a probability operation unit 930, a second deep learning operation unit 940, and an output unit 950.

Four units of this embodiment are substantially the same as their counterparts in the preceding embodiment:
- the input unit 910 corresponds to the input unit 100;
- the first deep learning operation unit 920 corresponds to the deep learning operation unit 200;
- the probability operation unit 930 corresponds to the probability operation unit 300;
- the output unit 950 corresponds to the output unit 400.

Their detailed description is therefore omitted, and only the differences are described.

The first deep learning operation unit 920 provides a cropped image or a combined image of each detected face to the second deep learning operation unit 940.
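A cropped face image could be produced from a detected bounding box roughly as follows. The sketch assumes the image is a list of pixel rows, and that the box centre and size are normalised to [0, 1] relative to the image size. Neither convention is specified in the text, `crop_face` is a hypothetical name, and the "combined image" alternative is not covered.

```python
def crop_face(image, box):
    """Crop the region of a (x, y, w, h) box from an image given as a list of pixel rows.

    (x, y) is the box centre and (w, h) its size, all normalised to [0, 1]
    relative to the image (an assumed convention). The crop is clamped to the
    image bounds.
    """
    height, width = len(image), len(image[0])
    x, y, w, h = box
    x1 = max(0, int(round((x - w / 2) * width)))
    y1 = max(0, int(round((y - h / 2) * height)))
    x2 = min(width, int(round((x + w / 2) * width)))
    y2 = min(height, int(round((y + h / 2) * height)))
    return [row[x1:x2] for row in image[y1:y2]]


image = [[r * 10 + c for c in range(10)] for r in range(10)]
print(crop_face(image, (0.5, 0.5, 0.4, 0.2)))
```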

Like the first deep learning operation unit 920, the second deep learning operation unit 940 may use the Faster RCNN algorithm, or a general CNN algorithm, to determine the identity of each detected face.

Therefore, the face detection and emotion recognition system of this embodiment does more than detect faces in the input image. It also determines the identity and individual emotion of each detected face, all simultaneously and in real time.

Embodiments of the present invention include a computer-readable medium containing program instructions for performing various computer-implemented operations. The medium records a program for executing the face detection and emotion recognition method described above. It may contain program instructions, data files, data structures, and the like, alone or in combination.

Examples of such media include:
- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical recording media such as CDs and DVDs;
- magneto-optical media such as floptical disks;
- hardware devices configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Alternatively, the medium may be a transmission medium, such as an optical or metal line or a waveguide, including a carrier wave that transmits signals specifying program instructions, data structures, and the like. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

Preferred embodiments of the present invention have been described in detail above, but the scope of the present invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

100, 910: input unit, 200: deep learning operation unit,
300, 930: probability operation unit, 400, 950: output unit,
920: first deep learning operation unit, 940: second deep learning operation unit,
210: image adjustment unit, 220: feature extraction unit,
230: classifier, 240: output node reconstruction unit,
250: result operation unit

Claims

delete

A face detection and emotion recognition system comprising:
a deep learning operation unit that receives a plurality of images, detects a face, and recognizes the emotion expressed on the detected face; and
a probability operation unit that derives the current emotion by performing a calculation, based on emotion transition probabilities, over a first emotion recognized by the deep learning operation unit in a first image, a second emotion recognized in a second image that is the next input image temporally following the first image, and a result value related to a previously derived emotion,
wherein the probability operation unit derives the current emotion according to the following equation:

[Equation 1]

delete

A face detection and emotion recognition method comprising:
receiving a plurality of images;
detecting a face in the images and recognizing the emotion expressed on the detected face; and
deriving the current emotion by performing a calculation, based on emotion transition probabilities, over a first emotion recognized in a first image in the recognizing step, a second emotion recognized in a second image that is the next input image temporally following the first image, and a result value related to a previously derived emotion,
wherein the current emotion is derived according to the following equation:

[Equation 1]

delete