KR102147052B1

KR102147052B1 - Emotional recognition system and method based on face images

Info

Publication number: KR102147052B1
Application number: KR1020180142146A
Authority: KR
Inventors: 장주용
Original assignee: 광운대학교 산학협력단
Priority date: 2018-11-16
Filing date: 2018-11-16
Publication date: 2020-08-21
Also published as: KR20200063292A

Abstract

A facial image-based emotion recognition system and method are disclosed. The system includes: a face recognition DB and face recognition system that store per-person face photos for face recognition together with, obtained by machine learning, the facial feature points (including the facial contour, eyebrows and eyes, nose and mouth, and chin) for each emotional state of those photos and the associated image patch-based data; and an emotion recognition system that is linked to the face recognition DB, receives a face image I of a target person, extracts N facial feature points, and provides a feature point-based emotion recognition result and an image-based emotion recognition result obtained from the image patches near the feature points, in order to output the final emotion recognition result for the target person's face. The face image of the target person is received and captured, facial landmarks are extracted using artificial neural network technology (CNN), and the feature point-based and image-based emotion recognition results obtained from the extracted landmarks and the nearby image patches are used to recognize subtle changes in facial expression and estimate the emotional state (joy, sadness, fear, anger), which is output as the person's emotional state.

Description

Emotional recognition system and method based on face images

The present invention relates to a facial image-based emotion recognition system and method. More specifically, it relates to a system and method that receive a face image of a target person, capture the face image, extract facial landmarks such as the facial contour, eyebrows and eyes, nose and mouth, and chin using artificial neural network technology, recognize subtle changes in facial expression from a feature point-based emotion recognition result and an image-based emotion recognition result obtained from the extracted landmarks and the image patches near them, estimate the emotional state (joy, sadness, fear, anger), and output the emotion recognition result for the person's face.

Face recognition technology mainly relies on appearance-based matching methods, introduced in the early 1990s, and on feature-based face recognition. However, recognition results vary with the camera's shooting angle, the direction of lighting, pose, changes in facial expression, and changes in the face over time.

Feature-based face recognition processes image data captured by a digital camera, an IoT device camera, or a smartphone camera using either a detection method based on Haar-like features or a detection method based on MCT (Modified Census Transform) images. In the input image from a smartphone camera, face and eye detectors trained on Haar-like features detect the facial contour and the forehead, eyes, nose, and mouth. To detect the circular pupil, the eye region set as the region of interest (ROI) is converted to grayscale. A histogram of the eye image (x-axis: pixel value; y-axis: number of pixels with that value) is computed using an experimentally determined statistical threshold at which the pupil and the eye outline are extracted from the eye region, the eye image is binarized, and the eye-region image is then preprocessed by histogram equalization. Feature data for the eyebrows, eyes, nose, mouth, and chin are detected in the face region, texture features and shape features are extracted, and the face is recognized by comparing their similarity with the feature points of the face photos stored in the face recognition DB.
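The eye-region preprocessing described above (histogram, binarization, histogram equalization) can be sketched as follows. This is a minimal illustration in plain Python, not the detector used in the cited systems: the function names, the 4x4 sample region, and the threshold of 100 are assumptions made only for this example.

```python
def histogram(gray, levels=256):
    """Count pixels per gray value (x-axis: pixel value, y-axis: count)."""
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1
    return hist


def binarize(gray, threshold):
    """Map dark pixels (pupil, eye outline) to 0 and all others to 255."""
    return [[0 if value <= threshold else 255 for value in row] for row in gray]


def equalize(gray, levels=256):
    """Histogram equalization based on the cumulative distribution of gray values."""
    hist = histogram(gray, levels)
    total = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # single gray value: nothing to stretch
        return [row[:] for row in gray]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[value] - cdf_min) * scale) for value in row] for row in gray]


# Hypothetical 4x4 grayscale eye ROI: a dark pupil surrounded by brighter skin.
eye_roi = [
    [200, 190, 180, 190],
    [180,  40,  30, 185],
    [185,  35,  45, 190],
    [200, 195, 185, 200],
]
binary = binarize(eye_roi, threshold=100)  # threshold value is an assumption
equalized = equalize(eye_roi)
```

In practice the threshold would come from the experimentally determined statistics mentioned above rather than a fixed constant.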

The feature values of the eyebrows, eyes, nose, mouth, and chin in the face region are expressed as the difference between the sum of the pixels in the white region of a Haar-like feature and the sum of the pixels in its black region.
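As an illustration of the white-minus-black rule, the sketch below evaluates a two-rectangle Haar-like feature with an integral image. The choice of a white upper half over a black lower half, the function names, and the sample image are assumptions for this example only.

```python
def integral_image(gray):
    """Summed-area table with an extra zero row and column at the top and left."""
    height, width = len(gray), len(gray[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect(ii, top, left, height, width):
    """White (upper half) pixel sum minus black (lower half) pixel sum."""
    half = height // 2
    white = rect_sum(ii, top, left, half, width)
    black = rect_sum(ii, top + half, left, half, width)
    return white - black


# Hypothetical 4x4 region: bright band (e.g. forehead) above a dark band (e.g. eyes).
region = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
]
ii = integral_image(region)
feature_value = haar_two_rect(ii, top=0, left=0, height=4, width=4)
```

A large positive value indicates a bright-over-dark edge at that position; a value near zero indicates no such contrast.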

For example, in a standard-size face-region photo, the distances between the two end points of the right and left eyes within the detected eye region, and the size of the pupil (iris) extracted with the Hough circle transform algorithm, are used as feature values.

As related prior art 1, Patent Publication No. 10-2017-0050465 discloses a "face recognition apparatus and method".

According to that embodiment, when a face is recognized from an input image using machine learning, the face pose and perspective are normalized to improve the face recognition rate, and virtual face images are generated automatically as face training data, which saves the cost and time of acquiring such data.

FIG. 1 is a block diagram of a conventional face recognition device. FIG. 2 is a conceptual diagram illustrating the normalization unit of the face recognition device.

The face recognition device 100 may be any one of an image display device, an image capture device, a face recognition server, a tablet PC, a laptop, a personal computer, a smartphone, a personal digital assistant (PDA), a mobile communication terminal, an intelligent robot, and the like.

The face recognition device 100 includes: an input image acquisition unit 112 that acquires an input image from a camera; a normalization unit 114 that detects a face region in the input image and normalizes the face pose to generate a frontal-pose image, and then normalizes the perspective of the frontal-pose image to remove the perspective distortion caused by the distance between the camera and the subject, generating a normalized image; a feature vector extraction unit 116 that extracts, from the normalized image, a feature vector representing the subject's face; and a face recognition unit 118 that recognizes the subject's face in the input image by applying the feature vector to a pre-trained classification model.

The input image acquisition unit 112 acquires an input image from the camera. The camera may be a depth camera, a stereo camera, or a color camera (for example, a Kinect camera). The input image is an image containing the face of the subject to be recognized and may be a two-dimensional still image or a video. It may be a color image, a depth image, or a color-depth (RGB-D) image.

The normalization unit 114 detects the face region in the input image and normalizes the face pose and perspective to generate a normalized image. When the face pose changes, the gray scale, shape, and feature point positions change, so the face recognition rate drops. In addition, when the distance between the camera and the subject changes, the perspective distortion (for example, warping) differs for each shooting position even for the same subject, so the images can look as if different subjects were photographed. To improve the face recognition rate, the face pose and perspective of the input image therefore need to be normalized.

The normalization unit 114 includes: a face pose normalization learning unit that feeds training face images in various poses to the input layer of a first artificial neural network and trains that network so that frontal-pose training face images are produced at its output layer; and a perspective normalization learning unit that feeds the data produced at the output layer of the first artificial neural network to the input layer of a second artificial neural network and trains the second network so that training face images free of perspective distortion are produced at its output layer.

The normalization unit then feeds training face images in various poses and with various perspective distortions to the input layer of an integrated artificial neural network, which combines the trained first and second artificial neural networks, and trains the integrated network so that frontal-pose training face images free of perspective distortion are produced at its output layer.

The feature vector extraction unit 116 is determined through machine learning and extracts, from the normalized image, a feature vector representing the subject's face.

A feature vector is a vector whose elements are the feature values used for face recognition. Filters used to extract feature vectors include the Gabor filter, the Haar filter, and LBP (Local Binary Pattern) variants such as DLBP (Discriminative LBP), ULBP (Uniform LBP), and NLBP (Number LBP). The choice is not limited to these, and other filters may be used.
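To make the idea of a filter-derived feature vector concrete, the sketch below computes the basic 3x3 LBP code and collects the codes of an image into a 256-bin histogram that could serve as a feature vector. It shows only the basic LBP operator, not the DLBP, ULBP, or NLBP variants; the bit ordering and function names are assumptions for this example.

```python
# Neighbor offsets (dy, dx), clockwise from the top-left pixel; bit 0 is top-left.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(gray, y, x):
    """8-bit code: a bit is set when the neighbor is at least as bright as the center."""
    center = gray[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(NEIGHBORS):
        if gray[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(gray):
    """256-bin histogram of LBP codes over all interior pixels (a feature vector)."""
    hist = [0] * 256
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            hist[lbp_code(gray, y, x)] += 1
    return hist


# Hypothetical 3x3 patch with a bright top row.
patch = [
    [9, 9, 9],
    [1, 5, 1],
    [1, 1, 1],
]
code = lbp_code(patch, 1, 1)           # only the three top neighbors set their bits
feature_vector = lbp_histogram(patch)  # one interior pixel, so a single count
```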

The face recognition unit 118 recognizes the subject's face in the input image by applying the feature vector extracted by the feature vector extraction unit 116 to a pre-trained classification model. The pre-trained classification model may include, but is not limited to, a support vector machine (SVM), linear discriminant analysis (LDA), and Softmax.
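Of the classification models listed, Softmax is the simplest to show in isolation. The sketch below converts per-identity scores into a probability distribution and picks the most likely identity. The scores and identity labels are hypothetical; in a real system the scores would come from the trained classification model.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1 (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three enrolled identities.
identities = ["person_a", "person_b", "person_c"]
scores = [2.0, 0.5, -1.0]
probabilities = softmax(scores)
best = identities[max(range(len(probabilities)), key=probabilities.__getitem__)]
```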

The virtual face image generation unit 124 may generate a plurality of virtual face images used to train the normalization unit 114, the feature vector extraction unit 116, and the face recognition unit 118.

The plurality of virtual face images are face images generated by deforming a 3D face model that the virtual face image generation unit 124 synthesizes from one or more 2D reference images obtained from the camera.

However, existing face recognition systems have not provided facial image-based emotion recognition, that is, recognizing and outputting an emotion based on the facial feature points extracted from an input image using face recognition technology.

Patent Publication No. 10-2017-0050465 (published May 11, 2017), "Face Recognition Apparatus and Method", SK Telecom Co., Ltd.

To solve the above problem, an object of the present invention is to provide a facial image-based emotion recognition system that receives a face image of a target person, captures the face image, extracts facial landmarks such as the facial contour, eyebrows and eyes, nose and mouth, and chin using artificial neural network technology, recognizes subtle changes in the person's facial expression from a feature point-based emotion recognition result and an image-based emotion recognition result obtained from the extracted landmarks and the image patches near them, estimates the emotional state (joy, sadness, fear, anger), and outputs the emotion recognition result for the person's face.

Another object of the present invention is to provide a facial image-based emotion recognition method.

To achieve the object of the present invention, a facial image-based emotion recognition system includes: a face recognition DB and face recognition system that store per-person face photos for face recognition together with, obtained by machine learning, the facial feature points (including the facial contour, eyebrows and eyes, nose and mouth, and chin) for each emotional state of those photos and the associated image patch-based data; and an emotion recognition system that is linked to the face recognition DB, receives a face image I of a target person, and outputs the final emotion recognition result for the target person's face.

The emotion recognition system includes: a facial feature point extraction unit that receives the face image I and outputs the coordinates of N facial feature points; a feature point-based emotion recognition unit that receives the coordinates of the N facial feature points and compares them with the feature point data of facial expressions for each emotional state, which are stored statistically in the face recognition DB, to provide a feature point-based emotion recognition result; an image patch extraction unit that receives the input face image and the N facial feature point coordinates and extracts from the face image a square patch, W pixels wide and W pixels high, centered on each feature point coordinate, providing N image patches in total; an image patch-based emotion recognition unit that receives the N image patches from the image patch extraction unit and provides an image patch-based emotion recognition result; and an emotion recognition result fusion unit that receives the feature point-based emotion recognition result and the image patch-based emotion recognition result from the feature point-based emotion recognition unit and the image patch-based emotion recognition unit, respectively, and outputs the final emotion recognition result.

In the emotion recognition result fusion unit, the feature point-based result and the image patch-based result are both M-dimensional vectors representing probability distributions over M emotion categories. The two estimated result vectors are input to the fusion unit, and the final emotion recognition result is calculated. It can be calculated as the weighted average of the feature point-based emotion recognition result vector and the image patch-based emotion recognition result vector, where α is the weight on the feature point-based emotion recognition result. The recognized emotion category is calculated as the index of the emotion with the highest probability, and this is output as the final emotion recognition result of the emotion recognition system.
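The fusion step can be sketched in a few lines. The sketch assumes that α weights the feature point-based vector and (1 - α) weights the image patch-based vector, which is the usual reading of a weighted average with a single weight; the function names, the probability values, and α = 0.25 are illustrative only, and the category order follows the four emotions named in the text.

```python
EMOTIONS = ["joy", "sadness", "fear", "anger"]  # M = 4 emotion categories


def fuse(landmark_probs, patch_probs, alpha):
    """Weighted average of two M-dimensional probability vectors."""
    if len(landmark_probs) != len(patch_probs):
        raise ValueError("both result vectors must have M entries")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return [alpha * g + (1.0 - alpha) * p for g, p in zip(landmark_probs, patch_probs)]


def recognized_category(fused):
    """Index of the emotion with the highest fused probability."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical outputs of the two recognizers.
landmark_probs = [0.5, 0.25, 0.125, 0.125]
patch_probs = [0.25, 0.5, 0.125, 0.125]
fused = fuse(landmark_probs, patch_probs, alpha=0.25)
label = EMOTIONS[recognized_category(fused)]
```

With α = 0.25 the image patch-based result dominates, so the fused vector favors the second category even though the feature point-based recognizer alone would have chosen the first.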

The image patch-based data are face recognition data in which color images, reconstructed as windows centered on each feature point coordinate of the face photo for each emotional state, are each stored in the face recognition DB. The window is a 3x3 window or a 5x5 window.
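Cutting a W x W window around each feature point can be sketched as follows. The text does not say what happens when a window extends past the image border, so this sketch assumes edge replication (coordinates are clamped to the image); the function names and the sample image are also assumptions. Pixels may be gray values or color tuples.

```python
def extract_patch(image, cx, cy, w):
    """W x W patch centered on (cx, cy); out-of-range coordinates are clamped."""
    half = w // 2
    height, width = len(image), len(image[0])
    patch = []
    for y in range(cy - half, cy - half + w):
        yy = min(max(y, 0), height - 1)   # clamp row index (assumed border rule)
        row = []
        for x in range(cx - half, cx - half + w):
            xx = min(max(x, 0), width - 1)  # clamp column index
            row.append(image[yy][xx])
        patch.append(row)
    return patch


def extract_patches(image, landmarks, w):
    """One patch per (x, y) feature point, giving N patches in total."""
    return [extract_patch(image, x, y, w) for (x, y) in landmarks]


# Hypothetical 5x5 image whose pixel value encodes its position as 10 * y + x.
image = [[10 * y + x for x in range(5)] for y in range(5)]
landmarks = [(2, 2), (0, 0)]  # (x, y) coordinates of two feature points
patches = extract_patches(image, landmarks, w=3)
```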

The facial feature point extraction unit, the feature point-based emotion recognition unit, and the image patch-based emotion recognition unit each use, on the input image I, a convolutional neural network (CNN) with a multilayer structure of input, hidden, and output layers. The facial feature point extraction unit extracts N facial feature points including the facial contour, eyebrows and eyes, nose and mouth, and chin.

The emotion recognition system is built, using face recognition, on a server that stores the face recognition DB and the facial feature point data for each emotional state recorded during access control. In a client/server arrangement, when a face in a camera image is recognized, the face recognition result and the emotion recognition result are provided through a client program on a PC or smartphone.

To achieve the other object of the present invention, a facial image-based emotion recognition method includes: (a) in an emotion recognition system, receiving a face image I at a facial feature point extraction unit, which outputs the coordinates of N facial feature points; (b) receiving the coordinates of the N facial feature points from the facial feature point extraction unit at a feature point-based emotion recognition unit, which compares them with the feature point data of facial expressions for each emotional state and provides a feature point-based emotion recognition result; (c) receiving the input face image and the N facial feature point coordinates at an image patch extraction unit, which extracts from the face image a square patch, W pixels wide and W pixels high, centered on each feature point coordinate, providing N image patches in total; (d) receiving the N image patches from the image patch extraction unit at an image patch-based emotion recognition unit, which provides an image patch-based emotion recognition result; and (e) receiving, at an emotion recognition result fusion unit, the feature point-based emotion recognition result and the image patch-based emotion recognition result for the individual face photo, each obtained by the feature point-based emotion recognition unit and the image patch-based emotion recognition unit in conjunction with the face recognition DB by comparison with the machine learning data, the fusion unit outputting the final emotion recognition result.

In the emotion recognition result fusion unit of step (e), the feature point-based result and the image patch-based result are both M-dimensional vectors representing probability distributions over M emotion categories. The two estimated result vectors are input to the emotion recognition result fusion unit (770), and the final emotion recognition result is calculated. It can be calculated as the weighted average of the feature point-based emotion recognition result vector and the image patch-based emotion recognition result vector, where α is the weight on the feature point-based emotion recognition result. The recognized emotion category is calculated as the index of the emotion with the highest probability, and this is output as the final emotion recognition result of the emotion recognition system.
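The fusion relation in the last step is given only in words here, so one way to write it out is shown below. The symbols are placeholders chosen for this sketch and are not taken from the source: p_lm is the feature point-based result vector, p_patch is the image patch-based result vector, p is the fused result, and c-hat is the recognized category. The range 0 ≤ α ≤ 1 is likewise an assumption.

```latex
\[
\mathbf{p} \;=\; \alpha\,\mathbf{p}_{\mathrm{lm}} + (1-\alpha)\,\mathbf{p}_{\mathrm{patch}},
\qquad 0 \le \alpha \le 1,
\]
\[
\hat{c} \;=\; \operatorname*{arg\,max}_{m \in \{1,\dots,M\}} \; p_m .
\]
```

Because both inputs are probability distributions over the same M categories, the fused vector also sums to 1 for any α in that range.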

In step (a), the facial feature point extraction unit, the feature point-based emotion recognition unit, and the image patch-based emotion recognition unit each use, on the input image I, a convolutional neural network (CNN) with a multilayer structure of input, hidden, and output layers. The facial feature point extraction unit extracts N facial feature points including the facial contour, eyebrows and eyes, nose and mouth, and chin.

The face recognition DB stores, obtained by machine learning, the feature point data of face photos for each person's emotional state and the image patch-based data reconstructed as windows centered on each feature point coordinate of each person's face photo.

The image patch-based data are face recognition data in which color images, cropped with a window centered on each feature point coordinate of the face photo for each emotional state, are each stored in the face recognition DB. The window is a 3x3 window or a 5x5 window.

The emotion recognition system is built, using face recognition, on a server that stores the face recognition DB and the facial feature point data for each emotional state recorded during access control. In a client/server arrangement, when a face in a camera image is recognized, the face recognition result and the emotion recognition result are provided through a client program on a PC or smartphone.

The facial image-based emotion recognition system and method of the present invention have the effect of receiving a face image of a target person, capturing the face image, extracting facial landmarks such as the facial contour, eyebrows and eyes, nose and mouth, and chin using artificial neural network technology, recognizing subtle changes in facial expression from a feature point-based emotion recognition result and an image-based emotion recognition result obtained from the extracted landmarks and the image patches near them, estimating the emotional state (joy, sadness, fear, anger), and outputting the person's emotional state.

Face recognition technology uses image data captured by a camera and is applied to airport immigration control, face recognition-based access control, face recognition video conferencing, face recognition interactive TV media services, and face recognition-based identity verification and criminal investigation with CCTV cameras. It has also come to be used to estimate a person's emotional state.

FIG. 1 is a block diagram of a conventional face recognition device.
FIG. 2 is a conceptual diagram illustrating the normalization unit of the face recognition device.
FIG. 3 is a diagram showing an example of facial landmarks including the facial contour, eyebrows and eyes, the line under the nose, mouth, and chin.
FIG. 4 is a diagram showing an overview of the emotion recognition system according to the present invention.
FIG. 5 is a block diagram of the emotion recognition system according to an embodiment of the present invention.
FIG. 6 is a block diagram of the artificial neural network-based facial feature point extraction unit, feature point-based emotion recognition unit, and image patch-based emotion recognition unit.

Hereinafter, the configuration and operation of preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

The facial image-based emotion recognition system of the present invention receives a face image of a target person, captures the face image, extracts facial landmarks such as the facial contour, eyebrows and eyes, nose and mouth, and chin using artificial neural network technology, recognizes subtle changes in the person's facial expression from the extracted landmarks and the image patches near them, estimates the emotional state (joy, sadness, fear, anger), and outputs the person's emotional state.

FIG. 3 shows an example of facial landmarks including the outline of the face, eyebrows and eyes, the line under the nose, mouth, and chin.

Facial feature points are points on the face that have distinguishable characteristics; the illustrated embodiment shows 68 of them.

실제 사람의 얼굴 영상으로부터 추출된 특징점들은 그 사람의 얼굴의 형태, 상태에 대한 정보를 제공한다. 이러한 얼굴의 특징점들을 활용하여 사람의 표정, 그리고 더 나아가 감정 상태까지 인식을 할 수 있다. 하지만, 얼굴의 특징점들은 거시적인 정보만을 제공할 수 있을 뿐 사람의 미세한 표정 변화에 따른 얼굴 영상의 미세한 변화, 예를들면 화난 얼굴에 나타나는 컬러 또는 밝기의 변화를 나타내지는 못한다. 이를 보완하기 위해 얼굴 특징점들 주위의 영상 정보 또한 함께 사용하여 감정 인식에 활용한다. 요약하면, 본 발명에서는 얼굴 영상으로부터 추출된 특징점들과 그러한 특징점들 근처의 영상 패치로부터 사람 얼굴의 미세한 표정 변화를 인식하고, 감정 상태를 추정하는 시스템을 제안하였다.Feature points extracted from an actual human face image provide information on the shape and state of the person's face. Using these facial features, it is possible to recognize a person's expression and even an emotional state. However, the feature points of the face can only provide macro information, but do not represent a subtle change in a face image according to a person's subtle facial expression change, for example, a change in color or brightness that appears on an angry face. To compensate for this, image information around facial feature points is also used to recognize emotions. In summary, in the present invention, a system for recognizing minute facial expression changes of a human face from feature points extracted from a face image and an image patch near the feature points and estimating an emotional state is proposed.

FIG. 4 shows an overview of the emotion recognition system according to the present invention.

The facial image-based emotion recognition system receives a face image of the target person and uses artificial neural network technology to extract the coordinates of facial feature points such as the face contour, eyebrows and eyes, nose and mouth, and chin. From the extracted feature points and the image patches near them, it obtains a feature point-based emotion recognition result and an image-based emotion recognition result, through which it recognizes subtle changes in facial expression, estimates the emotional state (joy, sadness, fear, or anger), and outputs the final emotion recognition result.

To this end, the facial image-based emotion recognition system includes:

a face recognition DB and a face recognition system that store individual face photographs for face recognition and, obtained by machine learning, the facial feature points of each face photograph according to emotional state (covering the face contour, eyebrows and eyes, nose and mouth, and chin) together with image patch-based data for the emotional state associated with that photograph; and

an emotion recognition system that is linked with the face recognition DB, receives the face image I of the target person, extracts the coordinates of N facial feature points covering the face contour, eyebrows and eyes, nose and mouth, and chin, obtains a feature point-based emotion recognition result from those coordinates and an image patch-based emotion recognition result from the image patches, and calculates the final emotion recognition result from the two. The recognized emotion category is calculated as the index of the emotion with the highest probability and is output as the final emotion recognition result of the emotion recognition system.

The face recognition DB stores, obtained by machine learning (ML), facial feature point data according to emotional state and image patch-based data cut out with a window centered on the coordinates of each feature point of each individual's face photograph.

The image patch-based data are color images cut out with a window (for example, a 3x3 or 5x5 window) centered on the coordinates of each feature point of a face photograph for a given emotional state, and are stored in the face recognition DB as face recognition data.

FIG. 5 is a block diagram of an emotion recognition system according to an embodiment of the present invention.

The emotion recognition system 700 consists of a facial feature point extraction unit 710, a feature point-based emotion recognition unit 720, an image patch extraction unit 730, an image patch-based emotion recognition unit 740, and an emotion recognition result fusion unit 770.

The emotion recognition system 700 of the present invention includes:

a facial feature point extraction unit 710 that receives the face image I and outputs the coordinates of N facial feature points covering the face contour, eyebrows and eyes, nose and mouth, and chin;

a feature point-based emotion recognition unit 720 that receives the coordinates of the N facial feature points and compares them with the feature point data of facial expressions according to emotional state, which is stored statistically in the face recognition DB, to provide a feature point-based emotion recognition result;

an image patch extraction unit 730 that receives the input face image and the N facial feature point coordinates, extracts from the face image a square patch W pixels wide and W pixels high centered on each feature point coordinate, and outputs a total of N image patches;

an image patch-based emotion recognition unit 740 that receives the N image patches from the image patch extraction unit 730 and provides an image patch-based emotion recognition result; and

an emotion recognition result fusion unit 770 that receives the feature point-based emotion recognition result and the image patch-based emotion recognition result from the feature point-based emotion recognition unit 720 and the image patch-based emotion recognition unit 740, respectively, and outputs the final emotion recognition result.
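The data flow among units 710 to 770 can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `recognize_emotion`, its parameters, and the choice to pass each unit in as a callable are assumptions introduced here. Only the order of the units and what each one receives and produces are taken from the description.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) coordinate of one facial feature point
Probs = List[float]          # probability distribution over emotion categories


def recognize_emotion(
    image,
    extract_landmarks: Callable[[object], Sequence[Point]],      # role of unit 710
    classify_landmarks: Callable[[Sequence[Point]], Probs],      # role of unit 720
    extract_patches: Callable[[object, Sequence[Point]], list],  # role of unit 730
    classify_patches: Callable[[list], Probs],                   # role of unit 740
    fuse: Callable[[Probs, Probs], Probs],                       # role of unit 770
) -> Probs:
    """Hypothetical wiring of the five units; the caller supplies each callable."""
    landmarks = extract_landmarks(image)            # N feature point coordinates
    landmark_probs = classify_landmarks(landmarks)  # feature point-based result
    patches = extract_patches(image, landmarks)     # N image patches
    patch_probs = classify_patches(patches)         # image patch-based result
    return fuse(landmark_probs, patch_probs)        # final emotion recognition result
```

Passing the units as callables keeps the sketch independent of any particular network; each one can be swapped for a trained model without changing the wiring.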

First, the face image I is input to the facial feature point extraction unit 710, which outputs the coordinates of the N facial feature points.

The facial feature point extraction unit 710 applies a deep learning convolutional neural network (CNN) with a multilayer input layer/hidden layer/output layer structure to the input image I and extracts N facial feature points covering the face contour, eyebrows and eyes, nose and mouth, and chin.

The facial feature point extraction unit 710, the feature point-based emotion recognition unit 720, and the image patch-based emotion recognition unit 740 each use a different kind of artificial neural network.

The feature point-based emotion recognition unit 720 receives the coordinates of the N facial feature points from the facial feature point extraction unit 710 and compares them with the feature point data of facial expressions according to emotional state stored in the face recognition DB to provide a feature point-based emotion recognition result.

The image patch extraction unit 730 receives the input face image and the facial feature point coordinates, extracts from the face image a square patch W pixels wide and W pixels high centered on each feature point coordinate, and outputs a total of N image patches.
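The patch extraction step can be illustrated with a short sketch. Several details here are assumptions not stated in the description: the function name `extract_patches`, the use of a grayscale image stored as nested lists (the description refers to color images), an odd patch width so that the feature point sits on the center pixel, integer pixel coordinates, and zero padding where a patch extends past the image border.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values (a simplification)


def extract_patches(image: Image, landmarks: Sequence[Tuple[int, int]], w: int) -> List[Image]:
    """Cut one w-by-w patch centered on each (x, y) feature point.

    Assumed here, not taken from the source: w is odd, coordinates are
    integer pixel positions, and pixels outside the image are filled with 0.
    """
    if w % 2 == 0:
        raise ValueError("w must be odd so the patch has a center pixel")
    height, width = len(image), len(image[0])
    half = w // 2
    patches = []
    for x, y in landmarks:
        patch = []
        for row in range(y - half, y + half + 1):
            patch.append([
                image[row][col] if 0 <= row < height and 0 <= col < width else 0
                for col in range(x - half, x + half + 1)
            ])
        patches.append(patch)
    return patches
```

With N feature points the function returns N patches, matching the count given in the description; a color version would carry one such patch per channel.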

Working in conjunction with the face recognition DB, the feature point-based emotion recognition unit 720 processes the facial feature point coordinates and the image patch-based emotion recognition unit 740 processes the N image patches. The emotion recognition result fusion unit 770 receives the feature point-based emotion recognition result and the image patch-based emotion recognition result from these two units, respectively, and outputs one of the classified emotions (joy, sadness, fear, or anger) as the final emotion recognition result.

Here, the feature point-based emotion recognition result and the image patch-based emotion recognition result are both M-dimensional vectors, each representing a probability distribution over M emotion categories. The two estimated emotion recognition result vectors are input to the emotion recognition result fusion unit 770, which calculates the final emotion recognition result.

The final result can be calculated as the weighted average of the feature point-based emotion recognition result vector and the image patch-based emotion recognition result vector, where α denotes the weight given to the feature point-based emotion recognition result. Finally, the recognized emotion category is calculated as the index of the emotion with the highest probability and is output as the final emotion recognition result of the emotion recognition system.
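The fusion step can be sketched as a weighted average followed by an argmax. The names `fuse_results`, `p_landmark`, and `p_patch` are hypothetical, and giving the image patch-based result the complementary weight 1 - α (with α in [0, 1]) is an assumption; the description states only that α weights the feature point-based result.

```python
from typing import List, Sequence, Tuple


def fuse_results(
    p_landmark: Sequence[float], p_patch: Sequence[float], alpha: float
) -> Tuple[List[float], int]:
    """Weighted average of two M-dimensional probability vectors, then argmax."""
    if len(p_landmark) != len(p_patch):
        raise ValueError("both vectors must cover the same M emotion categories")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    # alpha weights the feature point-based result; 1 - alpha (assumed) weights the patch-based one
    fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(p_landmark, p_patch)]
    best = max(range(len(fused)), key=fused.__getitem__)  # index of the most probable emotion
    return fused, best
```

For example, with a hypothetical category order of joy, sadness, fear, anger, calling `fuse_results([0.5, 0.25, 0.25, 0.0], [0.0, 0.5, 0.25, 0.25], 0.5)` gives the fused vector `[0.25, 0.375, 0.25, 0.125]` and index 1, that is, sadness.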

FIG. 6 is a block diagram of the artificial neural network-based facial feature point extraction unit, feature point-based emotion recognition unit, and image patch-based emotion recognition unit.

The facial feature point extraction unit, which extracts facial feature points from the input face image, is implemented with an artificial neural network.

A convolutional neural network (CNN) with a multilayer input layer/hidden layer/output layer structure is applied to the camera's input image I. That is, all pixel values of the input image are first arranged in a row to form one large vector, and the layer function is then applied repeatedly to calculate the output vector, which holds the N facial feature points. At the i-th layer, the function takes the (i-1)-th hidden feature vector $h_{i-1}$ and produces the i-th hidden feature vector $h_i$, where $W_i$ is the weight parameter (a constant value) of the neural network and $b_i$ is its bias value.
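For reference, a layer that uses the symbols defined above is conventionally written in the following form. The activation function $\sigma$ is an assumption here, since the text does not name one, and the indexing from $h_0$ (the input vector) to $h_L$ (the output vector) follows the layer count L used in the description.

```latex
h_i = \sigma\left(W_i \, h_{i-1} + b_i\right), \qquad i = 1, \dots, L
```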

That is, the vector representing the input image is set as $h_0$, and $h_1$, $h_2$, ..., $h_{L-1}$ are calculated in turn through a total of L layers, with the final output vector given by $h_L$. The vectors $h_1$, $h_2$, ..., $h_{L-1}$ are not inputs or outputs of the system but internal quantities, and are called hidden feature vectors. The final output vector has 2N dimensions, which represent the two-dimensional image coordinates of the N facial feature points.
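The layer-by-layer computation can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the network of the patent: the function names, the tanh activation on hidden layers, the linear output layer, and the interleaved (x, y) ordering of the 2N outputs are all choices made here for the example.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def layer(weights: Matrix, bias: Vector, h_prev: Vector, activate: bool) -> Vector:
    """Compute W_i * h_{i-1} + b_i, optionally followed by tanh (an assumed activation)."""
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * h for w, h in zip(row, h_prev)) + b
        out.append(math.tanh(z) if activate else z)
    return out


def predict_landmarks(
    pixels: Vector, params: Sequence[Tuple[Matrix, Vector]]
) -> List[Tuple[float, float]]:
    """Run h_0 -> h_1 -> ... -> h_L and read h_L as N (x, y) pairs."""
    h = list(pixels)  # h_0: the flattened input image
    last = len(params) - 1
    for i, (weights, bias) in enumerate(params):
        # hidden layers use the activation; the output layer is left linear (assumed)
        h = layer(weights, bias, h, activate=(i != last))
    if len(h) % 2 != 0:
        raise ValueError("the output vector must have 2N entries")
    # assumed ordering: (x_1, y_1, x_2, y_2, ...)
    return [(h[k], h[k + 1]) for k in range(0, len(h), 2)]
```

Here `params` holds one (W_i, b_i) pair per layer, so its length plays the role of L, and an output of length 2N is regrouped into N coordinate pairs.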

The facial feature point extraction unit 710, the feature point-based emotion recognition unit 720, and the image patch-based emotion recognition unit 740 are likewise implemented with artificial neural networks. Each network may be a convolutional neural network (CNN) with a multilayer input layer/hidden layer/output layer structure.

The feature point-based emotion recognition unit 720 receives as input the 2N-dimensional vector representing the N facial feature point coordinates output by the facial feature point extraction unit 710, and outputs an M-dimensional emotion probability vector. Similarly, the image patch-based emotion recognition unit 740 arranges the pixel values of the N image patches output by the image patch extraction unit 730 in a row to form one large vector, passes it through an artificial neural network, and outputs an M-dimensional emotion probability vector. The structure of the artificial neural network is shown in FIG. 5.
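Two pieces of this step lend themselves to a short sketch: arranging the patch pixels into one long vector, and turning M raw network outputs into an M-dimensional probability vector. The helper names are hypothetical, and the use of softmax is an assumption; the description says only that the output is a probability distribution over M emotion categories.

```python
import math
from typing import List, Sequence


def flatten_patches(patches: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Concatenate the pixel values of all N patches into one long vector."""
    return [float(v) for patch in patches for row in patch for v in row]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn M raw scores into an M-dimensional probability distribution (assumed output layer)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For N single-channel patches of W by W pixels the flattened vector has N·W·W entries, whereas the feature point-based unit receives only the 2N coordinates.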

In summary, the facial feature point extraction unit 710, the feature point-based emotion recognition unit 720, and the image patch-based emotion recognition unit 740 use a convolutional neural network (CNN) with a multilayer input layer/hidden layer/output layer structure on the input image I, and the facial feature point extraction unit 710 extracts N facial feature points covering the face contour, eyebrows and eyes, nose and mouth, and chin.

In the embodiment, the emotion recognition system using face recognition is built on a server that stores the face recognition DB and the facial feature point data according to emotional state at the time of access control. In a client/server arrangement, when a face is recognized in the camera image, the face recognition result and its emotion recognition result can be viewed in a client program on a PC or smartphone.

In addition, the facial image-based emotion recognition method of the present invention includes: (a) in the emotion recognition system, receiving the face image I and outputting the coordinates of N facial feature points from the facial feature point extraction unit; (b) with the face recognition DB storing, obtained by machine learning, feature point data of face photographs according to each individual's emotional state and image patch-based data cut out with a window (a 3x3 or 5x5 window) centered on the coordinates of each feature point of each individual's face photograph, receiving the coordinates of the N facial feature points from the facial feature point extraction unit into the feature point-based emotion recognition unit, which compares them with the feature point data of facial expressions according to emotional state to provide a feature point-based emotion recognition result; and (c) receiving the input face image and the N facial feature point coordinates into the image patch extraction unit, which extracts from the face image a square patch W pixels wide and W pixels high centered on each feature point coordinate, yielding a total of N image patches, and outputting the image patch-based emotion recognition result.

In step (e), the emotion recognition result fusion unit calculates the final emotion recognition result from the feature point-based emotion recognition result and the image patch-based emotion recognition result; the recognized emotion category is calculated as the index of the emotion with the highest probability and is output as the final emotion recognition result of the emotion recognition system.

In step (a), the facial feature point extraction unit 710, the feature point-based emotion recognition unit 720, and the image patch-based emotion recognition unit 740 use a convolutional neural network (CNN) with a multilayer input layer/hidden layer/output layer structure on the input image I, and the facial feature point extraction unit 710 extracts N facial feature points covering the face contour, eyebrows and eyes, nose and mouth, and chin.

The face recognition DB stores, obtained by machine learning, feature point data of face photographs according to each individual's emotional state and image patch-based data cut out with a window centered on the coordinates of each feature point of each individual's face photograph.

The emotion recognition result fusion unit 770 of step (e) calculates the final emotion recognition result from the feature point-based emotion recognition result and the image patch-based emotion recognition result. The final result can be calculated as the weighted average of the feature point-based emotion recognition result vector and the image patch-based emotion recognition result vector, where α denotes the weight given to the feature point-based emotion recognition result. The recognized emotion category is calculated as the index of the emotion with the highest probability and is output as the final emotion recognition result of the emotion recognition system.

The emotion recognition system using face recognition is built on a server that stores the face recognition DB and the facial feature point data according to emotional state at the time of access control. In a client/server arrangement, when a face is recognized in the camera image, it provides the face recognition result and the emotion recognition result through a client program on a PC or smartphone.

The embodiments according to the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, and data structures, alone or in combination. Computer-readable recording media include storage; magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter. The hardware device may be configured to operate as one or more software modules to perform the operations of the present invention.

As described above, the method of the present invention may be implemented as a program and stored on a recording medium (CD-ROM, RAM, ROM, memory card, hard disk, magneto-optical disk, storage device, etc.) in a form readable by computer software.

Although the present invention has been described with reference to preferred embodiments, those of ordinary skill in the art will understand that the invention can be modified or varied in many ways without departing from the technical spirit and scope of the invention set forth in the claims below.

700: emotion recognition system 710: facial feature point extraction unit
720: feature point-based emotion recognition unit 730: image patch extraction unit
740: image patch-based emotion recognition unit 770: emotion recognition result fusion unit

Claims

A facial image-based emotion recognition system comprising:
a face recognition DB and a face recognition system that store individual face photographs for face recognition and, obtained by machine learning, the facial feature points of each face photograph according to emotional state, covering the face contour, eyebrows and eyes, nose and mouth, and chin, together with image patch-based data for the emotional state associated with that photograph; and
an emotion recognition system that is linked with the face recognition DB, receives the face image I of the target person, obtains the coordinates of N facial feature points, obtains a feature point-based emotion recognition result and an image patch-based emotion recognition result, and calculates the final emotion recognition result from the two results, wherein the recognized emotion category is calculated as the index of the emotion with the highest probability and is output as the final emotion recognition result of the emotion recognition system.

The system of claim 1,
wherein the image patch-based data are color images cut out with a window centered on the coordinates of each feature point of a face photograph according to emotional state and stored in the face recognition DB as face recognition data, and the window is a 3x3 window or a 5x5 window.

delete

The system of claim 1,
wherein the facial feature point extraction unit, the feature point-based emotion recognition unit, and the image patch-based emotion recognition unit use a convolutional neural network (CNN) with a multilayer input layer/hidden layer/output layer structure on the input image I, and the facial feature point extraction unit extracts N facial feature points covering the face contour, eyebrows and eyes, nose and mouth, and chin.

delete

The system of claim 1,
wherein the emotion recognition system using face recognition is built on a server that stores the face recognition DB and the facial feature point data according to emotional state at the time of access control, and, in a client/server arrangement, provides the face recognition result and the emotion recognition result through a client program on a PC or smartphone when a face is recognized in the camera image.

A facial image-based emotion recognition method comprising:
(a) in the emotion recognition system, providing the face image I; and
(e) in linkage with the face recognition DB, which holds machine learning data for each individual's face photographs according to emotional state, comparing against that data to obtain the feature point-based emotion recognition result and the image patch-based emotion recognition result from the feature point-based emotion recognition unit and the image patch-based emotion recognition unit, and calculating the final emotion recognition result from the two, wherein the recognized emotion category is calculated as the index of the emotion with the highest probability and is output as the final emotion recognition result of the emotion recognition system.

The method of claim 7,
wherein, in step (a), the facial feature point extraction unit, the feature point-based emotion recognition unit, and the image patch-based emotion recognition unit use a convolutional neural network (CNN) with a multilayer input layer/hidden layer/output layer structure on the input image I, and the facial feature point extraction unit extracts N facial feature points covering the face contour, eyebrows and eyes, nose and mouth, and chin.

The method of claim 7,
wherein the face recognition DB stores, obtained by machine learning, feature point data of face photographs according to each individual's emotional state and image patch-based data cut out with a window centered on the coordinates of each feature point of each individual's face photograph.

The method of claim 7,
wherein the image patch-based data are color images cut out with a window centered on the coordinates of each feature point of a face photograph according to emotional state and stored in the face recognition DB as face recognition data, and the window is a 3x3 window or a 5x5 window.

delete

The method of claim 7,
wherein the emotion recognition system using face recognition is built on a server that stores the face recognition DB and the facial feature point data according to emotional state at the time of access control, and, in a client/server arrangement, provides the face recognition result and the emotion recognition result through a client program on a PC or smartphone when a face is recognized in the camera image.