KR20200010680A

KR20200010680A - Automated Facial Expression Recognizing Systems on N frames, Methods, and Computer-Readable Mediums thereof

Info

Publication number: KR20200010680A
Application number: KR1020180080304A
Authority: KR
Inventors: 노용만; 위삼 자랄 알하즈 바다르; 김성태; 유대훈; 이영복
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST); 주식회사 제네시스랩 (Genesis Lab)
Priority date: 2018-07-11
Filing date: 2018-07-11
Publication date: 2020-01-31
Also published as: KR102152120B1

Abstract

The present invention relates to an emotion recognition system that uses a machine learning model to perform emotion recognition of a target based on N frames, a method thereof, and a computer-readable medium thereof. According to one embodiment of the present invention, an emotion recognition system comprises: a reference frame extraction unit that derives a reference frame of a target; a feature information extraction unit that extracts N pieces of first feature information for the N frames and second feature information for the reference frame; and an emotion determination unit that derives the emotion result of the target with a trained recurrent neural network model, based on the N pieces of first feature information and the second feature information.

Description

Technical Field

The present invention relates to an emotion recognition system, method, and computer-readable medium that use a machine learning model to perform emotion recognition of a target based on N frames. More particularly, it relates to a technique that uses a machine learning model to limit the influence of a user's individual characteristics and to recognize the user's emotion from images that are input continuously over time.

Techniques already exist that use a trained artificial neural network to recognize the emotion of a person appearing in image information from changes in that person's facial expression.

In such conventional techniques, however, the inherent characteristics and appearance of each individual in the image information can interfere with recognizing that person's emotion. For example, because facial expressions and the way they change differ with race and gender, the basic or neutral expression can differ from person to person; and because expression intensity also varies between individuals, the neural network's judgment can become inaccurate.

In addition, the pose can change with the angle of the face in the image information, and the effect of lighting at capture time can differ from image to image, so it can be difficult to determine the target's emotion from only a small number of frames.

Consequently, recognizing a target's emotion with a trained artificial neural network has generally required a large amount of image information for training, and a large amount of image information about the target to be classified as input as well.

There has therefore been a need for a new technique that can recognize a target's emotion from a small amount of image information while minimizing the influence of individual characteristics, such as the target's appearance, and of environmental factors, such as lighting.

An object of the present invention is to provide an emotion recognition system, method, and computer-readable medium that use a machine learning model to perform emotion recognition of a target based on N frames, limiting the influence of a user's individual characteristics and recognizing the user's emotion from images input continuously over time.

To achieve the above object, one embodiment of the present invention provides an emotion recognition system that uses a machine learning model to perform emotion recognition of a target based on N frames, the system comprising: a reference frame extraction unit that derives a reference frame of the target; a feature information extraction unit that extracts N pieces of first feature information for the N frames and second feature information for the reference frame; and an emotion discrimination unit that derives an emotion result of the target with a trained recurrent neural network model, based on the N pieces of first feature information and the second feature information.
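The three units can be pictured as a composition of functions. The sketch below is a hypothetical wiring in plain Python, not the patented implementation: the function names, the toy stand-ins, and the choice of the first frame as the reference frame (assumed to be near-neutral) are illustrative assumptions, and a real system would plug in trained CNN and RNN models instead.

```python
from typing import Callable, List, Sequence

Frame = List[List[float]]  # one grayscale frame as rows of pixel values
Vector = List[float]


def recognize_emotion(
    frames: Sequence[Frame],
    extract_reference: Callable[[Sequence[Frame]], Frame],
    extract_features: Callable[[Frame], Vector],
    discriminate: Callable[[List[Vector], Vector], str],
) -> str:
    """Hypothetical wiring of the reference frame, feature, and discrimination units."""
    if len(frames) < 2:
        raise ValueError("image information needs at least two frames")
    reference = extract_reference(frames)                    # reference frame extraction unit
    first_features = [extract_features(f) for f in frames]  # N pieces of first feature information
    second_feature = extract_features(reference)            # second feature information
    return discriminate(first_features, second_feature)     # emotion discrimination unit


# Toy stand-ins (hypothetical); a real system would use trained CNN and RNN models.
def first_frame(frames: Sequence[Frame]) -> Frame:
    return frames[0]  # assumption: the sequence starts at a near-neutral expression


def mean_brightness(frame: Frame) -> Vector:
    return [sum(map(sum, frame)) / (len(frame) * len(frame[0]))]


def toy_discriminate(first_features: List[Vector], second_feature: Vector) -> str:
    # Judge the last frame relative to the reference rather than in isolation.
    change = abs(first_features[-1][0] - second_feature[0])
    return "changed" if change > 0.1 else "neutral"


frames = [[[0.0, 0.0], [0.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]
print(recognize_emotion(frames, first_frame, mean_brightness, toy_discriminate))  # changed
```

Passing the units in as callables keeps the sketch neutral about how each one is realized; only the data flow from frames to a result is fixed.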

In one embodiment of the present invention, the reference frame extraction unit may derive any one of the N frames as the reference frame.

In one embodiment of the present invention, the feature information extraction unit may include: a first feature information extraction unit that extracts the N pieces of first feature information from the N frames; and a second feature information extraction unit that extracts the second feature information from the reference frame.

In one embodiment of the present invention, the first feature information extraction unit may extract the N pieces of first feature information from the N frames using a trained convolutional neural network model, and the second feature information extraction unit may extract the second feature information from the reference frame using a trained convolutional neural network model.
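A trained CNN is far larger than anything that fits here, but its data flow (convolution, nonlinearity, pooling into a feature vector) can be sketched in plain Python. The kernels below are hand-picked toy values, not trained weights, and applying the same kernels to the N frames and to the reference frame is an assumption; the text says only that each extraction unit uses a trained convolutional neural network model.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2-D cross-correlation with 'valid' padding, as computed by a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def global_avg_pool(m: Matrix) -> float:
    return sum(map(sum, m)) / (len(m) * len(m[0]))


def extract_features(frame: Matrix, kernels: List[Matrix]) -> List[float]:
    """One feature per kernel: convolution, then ReLU, then global average pooling."""
    return [global_avg_pool(relu(conv2d_valid(frame, k))) for k in kernels]


kernels = [
    [[1.0, -1.0], [1.0, -1.0]],  # responds to a bright-to-dark vertical edge (toy weights)
    [[1.0, 1.0], [1.0, 1.0]],    # responds to overall brightness (toy weights)
]
frame = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
reference = [[0.0, 0.0, 0.0]] * 3
first_feature = extract_features(frame, kernels)       # [1.0, 1.0]
second_feature = extract_features(reference, kernels)  # [0.0, 0.0]
```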

In one embodiment of the present invention, the emotion discrimination unit may include: an emotion sequence information recurrent neural network module that extracts emotion sequence information from the N pieces of first feature information using a trained recurrent neural network model; a reference sequence information recurrent neural network module that extracts reference sequence information from the second feature information using a trained recurrent neural network model; and an emotion result derivation unit that derives the emotion result of the target based on the emotion sequence information and the reference sequence information.

In one embodiment of the present invention, the emotion sequence information recurrent neural network module may use a trained LSTM-based recurrent neural network model in which the N pieces of first feature information are input sequentially to a plurality of LSTM units, and may extract N pieces of emotion sequence information based on the first feature information.

In one embodiment of the present invention, the reference sequence information recurrent neural network module may use a trained LSTM-based recurrent neural network model in which the second feature information is input to each of a plurality of LSTM units, and may extract reference sequence information based on the second feature information.

In one embodiment of the present invention, the emotion sequence information recurrent neural network module may input the N pieces of first feature information sequentially to a plurality of LSTM units of a trained LSTM-based recurrent neural network model and extract N pieces of emotion sequence information; the reference sequence information recurrent neural network module may input the second feature information to each of a plurality of LSTM units of a trained LSTM-based recurrent neural network model and extract N pieces of reference sequence information; and the emotion result derivation unit may derive the emotion result of the target based on the N pieces of emotion sequence information and the N pieces of reference sequence information.
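One plausible reading of this paragraph, sketched under stated assumptions: an LSTM is unrolled over N time steps; the emotion stream receives the N first feature vectors in order, while the reference stream receives the same second feature vector at every step, so both streams yield N outputs. The minimal `LSTMCell` class, its random initial weights, and the use of one shared cell for both streams are illustrative assumptions (sharing weights is stronger than the common training data the text mentions for the two modules), not the patent's trained model.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with input, forget, candidate, and output gates."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # untrained toy weights
        self.hidden_size = hidden_size
        rows = 4 * hidden_size
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(rows)]
        self.U = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(rows)]
        self.b = [0.0] * rows

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        H = self.hidden_size
        z = [
            sum(w * xi for w, xi in zip(self.W[r], x))
            + sum(u * hi for u, hi in zip(self.U[r], h))
            + self.b[r]
            for r in range(4 * H)
        ]
        i = [sigmoid(v) for v in z[0:H]]           # input gate
        f = [sigmoid(v) for v in z[H:2 * H]]       # forget gate
        g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell state
        o = [sigmoid(v) for v in z[3 * H:4 * H]]   # output gate
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_sequence(cell: LSTMCell, inputs: List[Vector]) -> List[Vector]:
    """Feed inputs one step at a time and return the hidden state at every step."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in inputs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


cell = LSTMCell(input_size=2, hidden_size=3)  # assumption: one cell shared by both streams
first_features = [[0.1, 0.0], [0.4, 0.2], [0.9, 0.5]]  # N = 3 frame features (toy values)
second_feature = [0.1, 0.0]                             # reference-frame feature (toy value)
emotion_seq = run_sequence(cell, first_features)        # N emotion sequence outputs
reference_seq = run_sequence(cell, [second_feature] * len(first_features))  # N reference outputs
```

Because the toy reference feature equals the first frame's feature, the two streams agree at the first step and diverge only as the expression changes.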

In one embodiment of the present invention, the emotion result derivation unit may derive the emotion result of the target by subtracting, from each of the N pieces of reference sequence information, the corresponding piece of emotion sequence information.
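The subtraction can be sketched element-wise, one time step at a time. This assumes each piece of sequence information is a plain vector and that both sequences have N entries of equal length; how the N difference vectors are then turned into a final emotion result is not specified in this paragraph, so the sketch stops at the differences.

```python
from typing import List

Vector = List[float]


def subtract_sequences(reference_seq: List[Vector], emotion_seq: List[Vector]) -> List[Vector]:
    """Subtract each emotion output from the reference output at the same time step."""
    if len(reference_seq) != len(emotion_seq):
        raise ValueError("both sequences must contain N entries")
    differences = []
    for ref_t, emo_t in zip(reference_seq, emotion_seq):
        if len(ref_t) != len(emo_t):
            raise ValueError("vectors at the same time step must have the same length")
        differences.append([r - e for r, e in zip(ref_t, emo_t)])
    return differences


reference_seq = [[0.5, 0.5], [0.5, 0.5]]  # toy values
emotion_seq = [[0.5, 0.25], [0.0, 1.0]]   # toy values
print(subtract_sequences(reference_seq, emotion_seq))  # [[0.0, 0.25], [0.5, -0.5]]
```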

In one embodiment of the present invention, the plurality of LSTM units of the emotion sequence information recurrent neural network module and the corresponding plurality of LSTM units of the reference sequence information recurrent neural network module may be trained on common training data.

To achieve the above object, one embodiment of the present invention provides an emotion recognition method that performs emotion recognition of a target based on N frames using a machine learning model implemented on a computing device including one or more processors and one or more memories, the method comprising: a reference frame extraction step of deriving a reference frame of the target; a feature information extraction step of extracting N pieces of first feature information for the N frames and second feature information for the reference frame; and an emotion discrimination step of deriving an emotion result of the target with a trained recurrent neural network model, based on the N pieces of first feature information and the second feature information.

In one embodiment of the present invention, the feature information extraction step may include a first feature information extraction step of extracting the N pieces of first feature information from the N frames, and a second feature information extraction step of extracting the second feature information from the reference frame. The first feature information extraction step may extract the N pieces of first feature information from the N frames using a trained convolutional neural network model, and the second feature information extraction step may extract the second feature information from the reference frame using a trained convolutional neural network model.

In one embodiment of the present invention, the emotion discrimination step may include: an emotion sequence information extraction step of extracting emotion sequence information from the N pieces of first feature information using a trained recurrent neural network model; a reference sequence information extraction step of extracting reference sequence information from the second feature information using a trained recurrent neural network model; and an emotion result derivation step of deriving the emotion result of the target based on the emotion sequence information and the reference sequence information.

In one embodiment of the present invention, in the emotion sequence information extraction step, the N pieces of first feature information may be input sequentially to a plurality of LSTM units of a trained LSTM-based recurrent neural network model to extract N pieces of emotion sequence information; in the reference sequence information extraction step, the second feature information may be input to each of a plurality of LSTM units of a trained LSTM-based recurrent neural network model to extract N pieces of reference sequence information; and in the emotion result derivation step, the emotion result of the target may be derived based on the N pieces of emotion sequence information and the N pieces of reference sequence information.

To achieve the above object, one embodiment of the present invention provides a computer-readable medium for implementing an emotion recognition method that performs emotion recognition of a target based on N frames using a machine learning model. The computer-readable medium stores instructions that cause a computing device to perform the following steps: a reference frame extraction step of deriving a reference frame of the target; a feature information extraction step of extracting N pieces of first feature information for the N frames and second feature information for the reference frame; and an emotion discrimination step of deriving an emotion result of the target with a trained recurrent neural network model, based on the N pieces of first feature information and the second feature information.

According to an embodiment of the present invention, a trained artificial neural network model can be used to recognize the emotion of a target contained in image information.

According to an embodiment of the present invention, a trained artificial neural network model can be used to classify the emotion of a target contained in image information into preset classes.

According to an embodiment of the present invention, a convolutional neural network model can be used to extract feature information from each of a series of consecutive frames.

According to an embodiment of the present invention, a convolutional neural network model can be used to extract, from the extracted reference frame, feature information about the target's inherent characteristics.

According to an embodiment of the present invention, running the trained artificial neural network model with feature information about the target's inherent characteristics reduces the amount of image information needed to recognize the target's emotion.

According to an embodiment of the present invention, running the trained artificial neural network model with feature information about the target's inherent characteristics reduces the number of steps needed to recognize the target's emotion.

According to an embodiment of the present invention, running the trained artificial neural network model with feature information about the target's inherent characteristics minimizes the influence of individual characteristics, such as the target's appearance, and of environmental factors, such as lighting.

FIG. 1 schematically illustrates an operating environment of an emotion recognition system that performs emotion recognition of a target according to an embodiment of the present invention.
FIG. 2 schematically illustrates the internal configuration of an emotion recognition system according to an embodiment of the present invention.
FIG. 3 illustrates an example operation of the reference frame extraction unit according to an embodiment of the present invention.
FIG. 4 schematically illustrates a process of deriving an emotion result based on the first feature information and the second feature information according to an embodiment of the present invention.
FIG. 5 illustrates example operations of the first feature information extraction unit and the second feature information extraction unit using a convolutional neural network model according to an embodiment of the present invention.
FIG. 6 illustrates an example of the detailed operation of the convolutional neural network model according to an embodiment of the present invention.
FIG. 7 illustrates an example operation of the emotion discrimination unit according to an embodiment of the present invention.
FIG. 8 illustrates example feature information according to the inherent characteristics of a target according to an embodiment of the present invention.
FIG. 9 illustrates an example performance comparison of an emotion recognition system that performs emotion recognition of a target based on N frames using a machine learning model according to an embodiment of the present invention.
FIG. 10 illustrates an example internal configuration of a computing device according to an embodiment of the present invention.

Various embodiments and/or aspects are now disclosed with reference to the drawings. In the following description, numerous specific details are set forth for purposes of explanation, to aid an overall understanding of one or more aspects. However, a person of ordinary skill in the art will recognize that such aspect(s) may be practiced without these specific details. The following description and the accompanying drawings set forth in detail certain illustrative aspects of the one or more aspects. These aspects are, however, exemplary: only some of the various ways of applying the principles of the various aspects may be used, and the description is intended to include all such aspects and their equivalents.

In addition, various aspects and features will be presented in terms of systems that may include a number of devices, components, and/or modules. It should be understood and appreciated that the various systems may include additional devices, components, and/or modules, and/or may not include all of the devices, components, and modules discussed in connection with the drawings.

As used herein, "embodiment," "example," "aspect," "illustration," and similar terms are not necessarily to be construed as meaning that any aspect or design described is better than, or advantageous over, other aspects or designs. The terms "unit," "component," "module," "system," "interface," and the like, as used below, generally refer to a computer-related entity and may mean, for example, hardware, a combination of hardware and software, or software.

The terms "comprises" and/or "comprising" mean that the stated features and/or components are present, but should be understood not to exclude the presence or addition of one or more other features, components, and/or groups thereof.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component could be termed a second component, and similarly a second component could be termed a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

Unless defined otherwise, all terms used in the embodiments of the present invention, including technical and scientific terms, have the same meanings as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless explicitly so defined in the embodiments of the present invention.

The present invention relates to an emotion recognition system that uses a machine learning model to perform emotion recognition of a target based on N frames. Using the machine learning model, the system limits the influence of a user's individual characteristics and can recognize the user's emotion from images input continuously over time.

To perform such emotion recognition, an emotion recognition system, method, and computer-readable medium that perform emotion recognition of a target using a machine learning model are described below.

FIG. 1 schematically illustrates an operating environment of an emotion recognition system that performs emotion recognition of a target according to an embodiment of the present invention.

According to an embodiment of the present invention, as shown in FIG. 1(A), the emotion recognition system 100 can derive an emotion result for a target whose emotion can be recognized, based on image information of that target.

Specifically, the image information may be a video of the target that includes the changes in facial expression from which the target's emotion can be recognized, or a plurality of still images of the target recorded consecutively at a fixed interval. The image information should be interpreted in the broadest sense, as covering anything with two or more image frames. It may consist of images alone or of images together with sound.

Based on such image information, the emotion recognition system 100 can derive an emotion result for the target.

The emotion result for the target may include a data set from which emotion information can be extracted, for example a data set in vector form, or the result of recognizing the target's emotion and classifying it into one or more of a plurality of preset classes. In another embodiment of the present invention, an additional error compensation algorithm may be applied to the data set from which emotion information can be extracted. In other words, the emotion result should be interpreted as a concept that includes both the emotion result itself and intermediate results used to derive it.

According to an embodiment of the present invention, the plurality of preset classes may include one or more of anger, displeasure, fear, happiness, sadness, and surprise.
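When the emotion result is a classification into these classes, a common approach (assumed here, not stated in the source) is a softmax over one score per class followed by an argmax. The class order, the `classify` helper, and the scores below are hypothetical; in practice the scores would come from whatever final layer sits on top of the recurrent outputs.

```python
import math
from typing import List, Tuple

EMOTION_CLASSES = ["anger", "displeasure", "fear", "happiness", "sadness", "surprise"]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: List[float]) -> Tuple[str, List[float]]:
    """Return the most probable class and the full probability vector."""
    if len(scores) != len(EMOTION_CLASSES):
        raise ValueError("expected one score per emotion class")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTION_CLASSES[best], probs


label, probs = classify([0.1, -0.3, 0.0, 2.4, -1.0, 0.5])  # hypothetical scores
print(label)  # happiness
```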

However, the emotion result for the target may include not only the plurality of preset classes but also an intensity for each of one or more emotions included in those classes.

Specifically, according to an embodiment of the present invention, the intensity of an emotion extracted from the target's image information can be expressed even within a single class, such as happiness.

For example, when the emotion result extracted from the target's image information is detected as happiness, that happiness can itself have an intensity (magnitude). The intensity can be expressed on a scale where 1 denotes the lowest intensity of happiness derived from the target's image information and 10 denotes the highest. This makes it possible to derive, from the target's image information, an emotion result that reflects fine differences in facial expression.

To this end, as described below, the various types of machine learning models that derive the emotion result from the image information are trained on image information that includes images or frames labeled with the target's emotion.

That is, for an image or frame labeled with the target's emotion, the cosine similarity between it and each of the other images or frames in the image information can be calculated to obtain each one's emotion intensity, and training proceeds on that basis.
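A minimal sketch of this labeling step, assuming each image or frame is represented by a feature vector (or its flattened pixels) and that cosine similarity to the labeled frame is mapped linearly onto the 1 to 10 intensity scale, with negative similarities clipped to 0. The linear mapping and the function names are assumptions; the text states only that cosine similarity is used to obtain each intensity.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as unrelated
    return dot / (norm_a * norm_b)


def intensity_labels(labeled_feature: List[float], frame_features: List[List[float]],
                     levels: int = 10) -> List[int]:
    """Map similarity to the labeled frame onto an integer intensity in 1..levels (assumed linear)."""
    labels = []
    for feature in frame_features:
        s = max(0.0, cosine_similarity(labeled_feature, feature))  # clip negatives to 0
        labels.append(1 + round((levels - 1) * s))
    return labels


labeled_feature = [1.0, 0.0]  # feature of the frame labeled with the emotion (toy value)
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(intensity_labels(labeled_feature, frames))  # [10, 1, 7, 1]
```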

As a result, the various types of machine learning models trained on images or frames annotated with such emotion intensities can express not only the plurality of preset classes but also the intensity of each of one or more emotions included in those classes.

On this basis, and unlike conventional techniques that need image information covering every image or frame from the neutral expression until the target's change of expression is complete, the emotion result can be derived from fewer frames, using image information that runs only from the neutral expression to an intermediate change of expression.

In addition, the emotion recognition system 100 can derive the emotion result from the target's image information using the various types of machine learning models that make up the system.

These machine learning models are trained on training data and can then be used to implement the automated emotion recognition system 100.

However, because the emotion recognition system 100 uses its machine learning models in a way that differs from the prior art, it can derive the emotion result for the target more efficiently than the prior art.

In FIG. 1(A), the emotion result is derived from the image information. Specifically, the emotion recognition system 100 derives a reference frame from the image information and uses it to compensate for intrinsic characteristics of the target, such as appearance, and for external characteristics, such as the environment, when deriving the emotion result. This makes emotion recognition more accurate.

Alternatively, as shown in FIG. 1(B), reference frame information may be input to the emotion recognition system together with the image information.

In this case, a reference frame for compensating for intrinsic characteristics of the target, such as appearance, or external characteristics, such as the environment, is extracted from the reference frame information and/or the image information. That compensation is then applied when deriving the emotion result, which makes emotion recognition more accurate.

As the reference frame, for example, a frame with a neutral expression, one of the frames included in the image information, or a frame among those in the image information that is estimated to have a neutral expression may be selected.

Meanwhile, the reference frame information may contain the reference frame itself, information from which a reference frame can be derived, information for deriving a reference frame from among the frames of the image information, or information about the appearance and environment of the target. Here, appearance information may include race, sex, age, face color, face shape, emotion intensity, neutral expression, and the like, and environment information may include light quantity, image-related information, pose, and the like.

By implementing the emotion recognition system 100 so that it uses this feature information to limit the influence of the target's appearance, environment, and the like, the emotion result of the target can be derived accurately from less image information than the prior art requires.

FIG. 2 schematically illustrates the internal configuration of an emotion recognition system according to an embodiment of the present invention.

The emotion recognition system according to this embodiment may be implemented by a computing device that includes one or more processors and one or more memories.

Such a computing device may include a processor A, a bus (corresponding to the double-headed arrows between the processor, the memory, and the network interface), a network interface B, and a memory C. The memory C may store an operating system (OS) and artificial neural network training data, that is, the data learned while building the artificial neural networks and used by the inference or prediction modules of the present invention described below. Alternatively, the artificial neural network training data may refer to the modeling information produced by deep learning itself. The processor A may execute the reference frame extraction unit 1000, the feature information extraction unit 2000, and the emotion determination unit 3000. In other embodiments, the emotion recognition system may include more components than those shown in FIG. 2.

The memory is a computer-readable recording medium and may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive. Software components may be loaded from a computer-readable recording medium separate from the memory by means of a drive mechanism (not shown). Such a separate computer-readable recording medium may include a floppy drive, disk, tape, DVD/CD-ROM drive, memory card, and the like (not shown). In other embodiments, software components may be loaded into the memory through the network interface B rather than from a computer-readable recording medium.

The bus enables communication and data transfer between the components of the computing device. It may be built using a high-speed serial bus, a parallel bus, a storage area network (SAN), and/or other suitable communication technology.

The network interface B may be a computer hardware component that connects the computing device implementing the emotion recognition system to a computer network, over either a wireless or a wired connection.

The processor A may be configured to process the instructions of a computer program by performing the basic arithmetic, logic, and input/output operations that implement the emotion recognition system 100. Instructions may be provided to the processor from the memory C or the network interface B via the bus. The processor may be configured to execute program code for the reference frame extraction unit 1000, the feature information extraction unit 2000, and the emotion determination unit 3000. Such program code may be stored in a recording device such as the memory.

The reference frame extraction unit 1000, the feature information extraction unit 2000, and the emotion determination unit 3000 may be configured to perform the emotion recognition method described below. Depending on the emotion recognition method, some components of the processor may be omitted, additional components not shown may be included, or two or more components may be combined.

Meanwhile, the computing device preferably corresponds to a personal computer or a server, and in some cases may be a smartphone, tablet, mobile phone, video phone, e-book reader, desktop PC, laptop PC, netbook PC, personal digital assistant (PDA), portable multimedia player (PMP), MP3 player, mobile medical device, camera, or wearable device (for example, a head-mounted device (HMD), electronic clothing, an electronic bracelet, an electronic necklace, an electronic appcessory, an electronic tattoo, or a smart watch).

That is, the emotion recognition system 100 according to an embodiment of the present invention, which performs emotion recognition of a target based on N frames using a machine learning model, may include: a reference frame extraction unit 1000 that derives a reference frame of the target; a feature information extraction unit 2000 that extracts N pieces of first feature information for the N frames and second feature information for the reference frame; and an emotion determination unit 3000 that derives the emotion result of the target with a trained recurrent neural network (RNN) model, based on the N pieces of first feature information and the second feature information.
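The three-unit composition above can be expressed as a small pipeline in which each unit is a pluggable callable. This is an illustrative sketch only: the class and field names are hypothetical, the units are passed in rather than defined, and a single feature extractor is reused for both the N frames and the reference frame because unit 2000 is described as producing both kinds of feature information.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

Feature = List[float]


@dataclass
class EmotionRecognitionPipeline:
    """Hypothetical wiring of units 1000, 2000 and 3000 as pluggable callables."""

    extract_reference: Callable[[Sequence[Any]], Any]           # unit 1000
    extract_features: Callable[[Any], Feature]                  # unit 2000
    determine_emotion: Callable[[List[Feature], Feature], str]  # unit 3000

    def __call__(self, frames: Sequence[Any]) -> str:
        if not frames:
            raise ValueError("at least one frame is required")
        reference = self.extract_reference(frames)
        first_features = [self.extract_features(f) for f in frames]  # N first feature vectors
        second_feature = self.extract_features(reference)            # second feature vector
        return self.determine_emotion(first_features, second_feature)
```

In use, each field would be bound to a trained model; toy lambdas are enough to exercise the data flow, for example taking `frames[0]` as the reference and comparing the last frame's feature against it.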

The reference frame extraction unit 1000 may derive the reference frame of the target. As described above, the reference frame derived by the reference frame extraction unit 1000 is input to other components of the emotion recognition system 100 and serves as the basis for extracting feature information about the target's intrinsic characteristics, such as appearance, or its environment. The extracted feature information is then used to compensate the result derived by the artificial neural network. The reference frame may be a frame with a neutral expression, one of the frames included in the image information, or a frame among those in the image information that is estimated to have a neutral expression. Alternatively, a frame reconstructed from reference frame information provided by the user may be selected.

The feature information extraction unit 2000 may extract N pieces of first feature information for the N frames and second feature information for the reference frame. That is, the N pieces of first feature information may be derived by the feature information extraction unit 2000 from the N frames of the target, and the second feature information may be derived by the same unit from the reference frame.

The N pieces of first feature information and the second feature information may be input to the emotion determination unit 3000.

The emotion determination unit 3000 may derive the emotion result of the target with a trained recurrent neural network model, based on the N pieces of first feature information and the second feature information.

FIG. 3 illustrates, by way of example, the operation of the reference frame extraction unit according to an embodiment of the present invention.

According to this embodiment, as shown in FIG. 3(a), the reference frame extraction unit 1000 may derive the reference frame of the target from the entire image information. For example, the frame at a specific position in the image information may be extracted as the reference frame, or the expression intensity of each frame in the image information may be estimated and a frame judged to have low expression intensity selected as the reference frame. The expression intensity may be estimated by a trained artificial neural network model or by rule-based evaluation of predetermined feature information.

Preferably, as shown in FIG. 3(b), the reference frame extraction unit 1000 may derive one of the N frames to be analyzed as the reference frame.

That is, in deriving the reference frame of the target from the image information, the reference frame extraction unit 1000 may take one of the N frames of the target as the reference frame, or may apply image preprocessing or the like to one of the N frames and use the result as the reference frame. Preferably, the reference frame extraction unit 1000 takes the first of the N frames as the reference frame.
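The selection strategies of FIG. 3(a) and 3(b) can be summarized in one helper, sketched here under the assumption that an expression-intensity estimator, when available, is supplied as a function returning a number per frame. The function and parameter names are hypothetical; the default `index=0` corresponds to the preferred choice of the first of the N frames.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_reference_frame(
    frames: Sequence[T],
    index: int = 0,
    intensity_fn: Optional[Callable[[T], float]] = None,
) -> T:
    """Pick a reference frame.

    - If intensity_fn is given, return the frame with the lowest estimated
      expression intensity (the most neutral-looking frame); index is ignored.
    - Otherwise return frames[index]; the default index=0 is the first frame.
    """
    if not frames:
        raise ValueError("no frames to choose from")
    if intensity_fn is not None:
        return min(frames, key=intensity_fn)
    return frames[index]
```

The intensity estimator could be a trained network or a rule over predetermined features, as the text allows; the helper does not depend on which.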

In addition, as shown in FIG. 3(c), the reference frame extraction unit 1000 may extract the reference frame from reference frame information input by a user of the emotion recognition system.

The reference frame information may be the reference frame itself. For example, the emotion recognition system may prompt the user to make a neutral expression, and the image captured while the user does so may serve as the reference frame.

Alternatively, from image information containing a plurality of frames provided by the user, a neutral-expression frame or a frame meeting a preset criterion may be set as the reference frame.

Alternatively, the reference frame information may be information about the user's appearance, environment, or the like rather than image information, and the reference frame may be derived from this reference frame information alone or in combination with other data (for example, image information).

That is, as described above, the reference frame derived by the reference frame extraction unit 1000 is input to other components of the emotion recognition system 100 and serves as the basis for extracting feature information about the target's intrinsic characteristics, such as appearance, and/or its environment, and the extracted feature information is used to compensate the result obtained through the recurrent neural network.

In other words, the second feature information describes the target's intrinsic characteristics. Because it is extracted from the reference frame, it can be used to counter the loss of performance that conventional machine-learning emotion recognition suffers because of those intrinsic characteristics.

FIG. 4 schematically illustrates the process of deriving the emotion result based on the first feature information and the second feature information according to an embodiment of the present invention.

According to this embodiment, the feature information extraction unit 2000 may include: a first feature information extraction unit 2100 that extracts the N pieces of first feature information from the N frames; and a second feature information extraction unit 2200 that extracts the second feature information from the reference frame.

The first feature information extraction unit 2100 may extract the N pieces of first feature information from the N frames. That is, as shown in FIG. 4, frames #1 to #N are input to the first feature information extraction unit 2100, which extracts one piece of first feature information from each of them, giving N pieces of first feature information for frames #1 to #N.

Preferably, the first feature information extraction unit 2100 may extract the N pieces of first feature information from the N frames using a trained artificial neural network model.

That is, the first feature information extraction unit 2100 may be implemented as a trained artificial neural network model: the N frames are input to that model, which extracts the N pieces of first feature information from the N frames, one per frame. Alternatively, instead of an artificial neural network model, the first feature information extraction unit 2100 may use a rule-based model that extracts predetermined information related to emotions, facial expressions, and the like from each input frame.

The second feature information extraction unit 2200 may extract the second feature information from the reference frame. That is, as shown in FIG. 4, the reference frame derived by the reference frame extraction unit 1000 is input to the second feature information extraction unit 2200, which extracts the second feature information from it. Alternatively, instead of an artificial neural network model, the second feature information extraction unit 2200 may use a rule-based model that extracts predetermined information related to emotions, facial expressions, and the like from the input frame to obtain the second feature information.

Preferably, the first and second feature information extraction units are models trained on common training data, or models that output the same or corresponding results for the same input.
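One simple way to satisfy the condition above is to let both extraction units share a single model instance, so that identical inputs necessarily yield identical features. In the sketch below, a toy linear projection with made-up weights stands in for a trained network; the class name, weights, and frame values are all hypothetical.

```python
from typing import List, Sequence


class LinearFeatureExtractor:
    """Toy stand-in for a trained feature extractor: y = W x (hypothetical weights)."""

    def __init__(self, weights: List[List[float]]):
        self.weights = weights

    def __call__(self, frame: Sequence[float]) -> List[float]:
        return [sum(w * x for w, x in zip(row, frame)) for row in self.weights]


# One shared instance serves as both unit 2100 and unit 2200, so the same
# input always produces the same feature vector in either unit.
shared = LinearFeatureExtractor([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])
first_unit = shared   # unit 2100: applied to each of the N frames
second_unit = shared  # unit 2200: applied to the reference frame

frames = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # flattened toy "frames"
reference = frames[0]                        # first frame as reference
first_features = [first_unit(f) for f in frames]
second_feature = second_unit(reference)
```

Training two separate models on common data is the other option the text mentions; weight sharing is simply the most direct way to guarantee matching outputs.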

Preferably, the second feature information extraction unit 2200 may extract the second feature information from the reference frame using a trained artificial neural network model.

That is, the second feature information extraction unit 2200 may be implemented as a trained artificial neural network model: the reference frame is input to that model, which extracts the second feature information from it.

FIG. 5 illustrates, by way of example, the operation of the first and second feature information extraction units using machine-learned models according to an embodiment of the present invention.

According to this embodiment, more preferably, the first feature information extraction unit 2100 extracts the N pieces of first feature information from the N frames using a machine-learned model, and the second feature information extraction unit 2200 extracts the second feature information from the reference frame using a machine-learned model.

That is, the first feature information extraction unit 2100 may use its machine-learned model to extract, from each of the N frames, the N pieces of first feature information that characterize the target.

As shown in FIG. 5(a), frame #1 is input to the machine-learned model of the first feature information extraction unit 2100, which extracts the first feature information for frame #1. Likewise, frame #2 is input and its first feature information is extracted, and so on through frame #N, whose first feature information is extracted in the same way.

In addition, the second feature information extraction unit 2200 may use a machine-learned model to extract, from the reference frame, the second feature information that characterizes the target.

As shown in FIG. 5(b), the reference frame derived by the reference frame extraction unit 1000 is input to the machine-learned model of the second feature information extraction unit 2200, which extracts the corresponding second feature information.

The machine-learned models constituting the first feature information extraction unit 2100 and the second feature information extraction unit 2200 may be implemented as artificial neural network models such as a convolutional neural network (CNN) or a capsule network (CapsNet). Such models can extract feature information about a target from input image information.

The operation of a convolutional neural network (CNN) model that can constitute the first feature information extraction unit 2100 and the second feature information extraction unit 2200 is described below.

FIG. 6 illustrates, by way of example, the detailed operation of the convolutional neural network model according to an embodiment of the present invention.

As shown in FIG. 6, when image information of the target is input, the convolutional neural network model constituting the first and second feature information extraction units repeats convolution and pooling operations and then passes the result through a fully connected layer, thereby extracting the first feature information and the second feature information.
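The conv, pool, and fully connected sequence just described can be illustrated without any framework. This minimal sketch assumes a single-channel input, one 2-D kernel per stage, ReLU activations, and 2x2 max pooling; the layer count, activation, and weights are assumptions for illustration rather than the network of FIG. 6, which would in practice be built with a deep-learning framework and trained weights. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding), stride-1 2-D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def max_pool2x2(m: Matrix) -> Matrix:
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [
        [max(m[r][c], m[r][c + 1], m[r + 1][c], m[r + 1][c + 1]) for c in range(0, len(m[0]) - 1, 2)]
        for r in range(0, len(m) - 1, 2)
    ]


def fully_connected(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def cnn_features(image: Matrix, kernels: List[Matrix], fc_w: Matrix, fc_b: List[float]) -> List[float]:
    """Repeat conv -> ReLU -> pool once per kernel, then flatten and apply one FC layer."""
    x = image
    for k in kernels:
        x = max_pool2x2(relu(conv2d(x, k)))
    flat = [v for row in x for v in row]
    return fully_connected(flat, fc_w, fc_b)
```

With a 10x10 input and two 3x3 kernels, the maps shrink 10 -> 8 -> 4 -> 2 -> 1, so the fully connected layer sees a single value; the same function would produce the first feature information for each of the N frames and the second feature information for the reference frame.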

The first feature information and second feature information extracted through such a CNN model therefore capture the features of each input frame of the target, and they serve as the basis for the operation of the emotion determination unit described below.

FIG. 7 illustrates, by way of example, the operation of the emotion determination unit according to an embodiment of the present invention.

According to this embodiment, the emotion determination unit 3000 may include: an emotion sequence information RNN module 3100 that extracts emotion sequence information from the N pieces of first feature information using a trained recurrent neural network model; a reference sequence information RNN module 3200 that extracts reference sequence information from the second feature information using a trained recurrent neural network model; and an emotion result derivation unit 3300 that derives the emotion result of the target based on the emotion sequence information and the reference sequence information.

The emotion sequence information RNN module 3100 may extract the emotion sequence information from the N pieces of first feature information using the trained recurrent neural network model.

The emotion sequence information is extracted from the N pieces of first feature information and is what the emotion recognition system 100 uses to recognize the target's emotion or to classify it into one of the preset classes.

The N pieces of first feature information are input one after another into the units of the trained recurrent neural network model, and the output for each input is retained and fed in together with the next input, so that each piece of emotion sequence information influences the next output.
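The recurrence just described, in which each step's output is fed back in with the next input, can be illustrated with a plain Elman RNN. The text does not say which recurrent cell is used (an LSTM or GRU would be common choices), so this tanh cell, its zero initial state, and the parameter names are assumptions.

```python
import math
from typing import List, Sequence


def rnn_sequence(
    inputs: Sequence[Sequence[float]],
    w_x: List[List[float]],
    w_h: List[List[float]],
    b: List[float],
) -> List[List[float]]:
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0.

    Returns one hidden state per input step, i.e. the emotion sequence h_1 ... h_N.
    """
    hidden_size = len(b)
    h = [0.0] * hidden_size
    outputs: List[List[float]] = []
    for x in inputs:
        # The new state depends on the current input and on the previous state h.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_x[k], x))
                + sum(w * hi for w, hi in zip(w_h[k], h))
                + b[k]
            )
            for k in range(hidden_size)
        ]
        outputs.append(h)
    return outputs
```

With one-dimensional weights of 1.0 and inputs `[1.0]` then `[0.0]`, the second state is tanh(tanh(1)) rather than 0, showing how the first frame still influences the second output.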

As shown in FIG. 7(a), the first feature information extracted from frames #1 to #N is input in order into the trained recurrent neural network of the emotion sequence information RNN module 3100, and based on the resulting emotion sequence information for steps 1 to N, the emotion result derivation unit 3300 may recognize the target's emotion or classify it into one of the preset classes.

The reference sequence information RNN module 3200 may extract reference sequence information from the second feature information using a trained recurrent neural network model.

The reference sequence information is extracted from the reference frame and, as described above, can be used to limit the influence of the target's intrinsic characteristics when the emotion recognition system 100 recognizes the target's emotion or classifies it into one of the preset classes.

This reference sequence information is obtained by inputting the second feature information one step at a time into the units of the trained recurrent neural network model, where the output for each input is retained and fed in together with the next input, so that it influences the next output.

As shown in FIG. 7(a), the second feature information is input into the reference sequence information RNN module 3200 repeatedly, once at each step, and based on the resulting reference sequence information, the emotion result derivation unit 3300 can limit the influence of the target's intrinsic characteristics.

Based on the emotion sequence information and the reference sequence information, the emotion result derivation unit 3300 may derive an emotion result for the target in which the influence of the target's intrinsic characteristics is limited.

That is, when recognizing the target's emotion, or classifying it into one of the preset classes, from the emotion sequence information extracted by the emotion sequence information RNN module 3100, the emotion result derivation unit 3300 uses the reference sequence information to limit the influence of the target's intrinsic characteristics, and thereby derives the emotion result of the target.

Consequently, because the emotion result deriving unit 3300 recognizes or classifies the target's emotion using both the emotion sequence information and the reference sequence information, an emotion recognition system that improves on the prior art can be implemented, as described above.

In addition, the emotion sequence information recurrent neural network module 3100 can use a trained recurrent neural network model in which the N pieces of first feature information are input sequentially into one or more recurrent neural network units, and extract N pieces of emotion sequence information based on the first feature information.

Likewise, the reference sequence information recurrent neural network module 3200 can use a trained recurrent neural network model in which the second feature information is input into each of one or more recurrent neural network units, and extract the reference sequence information based on the second feature information.

That is, the N pieces of first feature information and the second feature information are input in parallel into the recurrent neural network units of the trained models that make up modules 3100 and 3200, and the emotion sequence information and the reference sequence information are extracted from them.

As described above, the emotion sequence information and the reference sequence information are then input to the emotion result deriving unit 3300, which derives the target's emotion result with the influence of the target's unique characteristics limited.
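The two branches can be sketched as two passes of the same kind of recurrent unit: one over the N first feature vectors (module 3100) and one over the reference feature repeated N times (module 3200). The per-dimension tanh RNN and the names `run_branch` and `extract_sequences` are hypothetical simplifications; the passes are written one after the other here, but they are independent and could run in parallel.

```python
import math

def run_branch(features, w_x, w_h):
    """Run a per-dimension tanh RNN over a list of feature vectors."""
    h = [0.0] * len(features[0])
    outputs = []
    for x in features:
        h = [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]
        outputs.append(h)
    return outputs

def extract_sequences(first_features, second_feature, w_x=0.8, w_h=0.5):
    """Return (emotion_seq, reference_seq), one vector per frame in each."""
    n = len(first_features)
    emotion_seq = run_branch(first_features, w_x, w_h)          # module 3100
    reference_seq = run_branch([second_feature] * n, w_x, w_h)  # module 3200
    return emotion_seq, reference_seq

frames = [[0.2, 0.1], [0.6, 0.3], [0.9, 0.4]]  # hypothetical first feature information
ref = [0.2, 0.1]                               # hypothetical second feature information
emo, ref_seq = extract_sequences(frames, ref)
```

If the reference frame is one of the N frames (here the first), the first emotion and reference outputs coincide, since both branches start from the same zero state and see the same input.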

More preferably, the emotion sequence information recurrent neural network module 3100 uses a trained recurrent neural network model in which the N pieces of first feature information are input sequentially into one or more recurrent neural network units, and extracts N pieces of emotion sequence information based on them; the reference sequence information recurrent neural network module 3200 uses a trained recurrent neural network model in which the second feature information is input into each of one or more recurrent neural network units, and extracts N pieces of reference sequence information based on it; and the emotion result deriving unit 3300 derives the target's emotion result based on the N pieces of reference sequence information and the N pieces of emotion sequence information.

More preferably, the emotion result deriving unit 3300 derives the target's emotion result based on data that include the N pieces of reference sequence information and the N pieces of emotion sequence information.

The trained recurrent neural network models that make up modules 3100 and 3200 may be implemented as, for example, a plain recurrent neural network (RNN), a long short-term memory (LSTM) network, or a gated recurrent unit (GRU) network.
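For reference, one common formulation of an LSTM unit is given below; the text does not state which LSTM variant is used, so this is an assumption. Here x_t is the input feature at step t, h_t the output (the sequence information), c_t the cell state, σ the logistic sigmoid, and ⊙ element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

A GRU merges the cell and hidden states and uses update and reset gates instead, while a plain RNN drops the gates and computes h_t directly from x_t and h_{t-1}.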

As shown in FIG. 4, more preferably, the emotion sequence information recurrent neural network module 3100 uses an LSTM-based trained recurrent neural network model: the N pieces of first feature information are input into the model's one or more LSTM units, and N pieces of emotion sequence information are extracted based on the first feature information.

In addition, the reference sequence information recurrent neural network module 3200 uses an LSTM-based trained recurrent neural network model: the second feature information is input into each of the model's one or more LSTM units, and the reference sequence information is extracted based on the second feature information.

That is, the N pieces of first feature information and the second feature information are input in parallel into the LSTM units of modules 3100 and 3200, and the emotion sequence information and the reference sequence information are extracted from them.

More preferably, the emotion sequence information recurrent neural network module 3100 uses an LSTM-based trained recurrent neural network model in which the N pieces of first feature information are input sequentially into one or more LSTM units, and extracts N pieces of emotion sequence information based on them; the reference sequence information recurrent neural network module 3200 uses an LSTM-based trained recurrent neural network model in which the second feature information is input into each of one or more LSTM units, and extracts N pieces of reference sequence information based on it; and the emotion result deriving unit 3300 derives the target's emotion result based on the N pieces of reference sequence information and the N pieces of emotion sequence information.

More preferably, the emotion result deriving unit 3300 derives the target's emotion result based on data that include the N pieces of reference sequence information and the N pieces of emotion sequence information.

More preferably, the emotion result deriving unit derives the emotion result through an operation that removes the influence of the N pieces of reference sequence information, taken as a whole or individually, from the N pieces of emotion sequence information, taken as a whole or individually.
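One possible reading of removing the influence of the reference sequence information "as a whole or individually" is sketched below: `mode="each"` subtracts the reference output of the same time step, and `mode="whole"` subtracts the mean of all N reference outputs from every step. Both modes and the function name `remove_reference` are assumptions for illustration, not the operation claimed in the patent.

```python
def remove_reference(emotion_seq, reference_seq, mode="each"):
    """Remove reference-sequence influence from N emotion-sequence vectors.

    mode="each":  subtract the reference output of the same time step.
    mode="whole": subtract the mean of all N reference outputs from every step.
    Both modes are hypothetical readings of the text.
    """
    if len(emotion_seq) != len(reference_seq):
        raise ValueError("expected N emotion vectors and N reference vectors")
    if mode == "each":
        return [[e - r for e, r in zip(ev, rv)]
                for ev, rv in zip(emotion_seq, reference_seq)]
    if mode == "whole":
        n = len(reference_seq)
        mean_ref = [sum(col) / n for col in zip(*reference_seq)]
        return [[e - m for e, m in zip(ev, mean_ref)] for ev in emotion_seq]
    raise ValueError(f"unknown mode: {mode}")

emo = [[1.0, 2.0], [3.0, 4.0]]   # hypothetical emotion sequence information
ref = [[0.5, 0.5], [1.5, 1.5]]   # hypothetical reference sequence information
per_step = remove_reference(emo, ref, mode="each")
overall = remove_reference(emo, ref, mode="whole")
```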

The emotion result may correspond to one or more pieces of emotion result data for the N frames: it may be derived separately for each of the N frames, or it may be an overall emotion result for all N frames under analysis.

As described above, the emotion result for the target may include a data set from which emotion information can be extracted, for example a vector, or the result of recognizing the target's emotion and classifying it into one or more of a plurality of predetermined classes.
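When the emotion result is a class decision, a common approach (assumed here, not specified in the text) is a linear layer followed by a softmax over the predetermined classes. The sketch returns per-frame probabilities and one overall label obtained by averaging over the N frames; the class names, weights, and function names are hypothetical.

```python
import math

CLASSES = ["anger", "happiness", "sadness"]  # hypothetical predetermined classes

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(vector, weights, bias):
    """Linear layer + softmax: one probability per predetermined class."""
    scores = [sum(w * v for w, v in zip(row, vector)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)

def classify_frames(vectors, weights, bias):
    """Per-frame probabilities plus one overall label for all N frames."""
    per_frame = [classify(v, weights, bias) for v in vectors]
    n = len(per_frame)
    mean_probs = [sum(col) / n for col in zip(*per_frame)]
    best = max(range(len(CLASSES)), key=lambda k: mean_probs[k])
    return per_frame, CLASSES[best]

weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical, untrained
bias = [0.0, 0.0, 0.0]
per_frame, overall = classify_frames([[2.0, 0.0], [1.5, 0.5]], weights, bias)
```

Returning both values covers the two cases in the text: a result for each of the N frames, and a single overall result for the analysed frames.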

Preferably, as shown in FIG. 7(b), the emotion result deriving unit 3300 subtracts each of the N pieces of emotion sequence information from the corresponding piece of reference sequence information to derive F₁ to F_N as the emotion results for the target's N frames. The subtraction may be a plain difference, or it may include an additional operation such as multiplication by a coefficient.
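The per-frame results F₁ to F_N can be sketched as an element-wise difference, following the subtraction order stated in the text (reference minus emotion); reversing the order would only negate each F_i. The optional `coeff` models the "additional operation such as a coefficient", and its form here is an assumption; `frame_results` is a hypothetical name.

```python
def frame_results(reference_seq, emotion_seq, coeff=1.0):
    """Compute F_1..F_N as reference_i - coeff * emotion_i, element-wise.

    coeff=1.0 is plain subtraction; other values are a hypothetical
    weighted variant.
    """
    if len(reference_seq) != len(emotion_seq):
        raise ValueError("both sequences must have N entries")
    return [
        [r - coeff * e for r, e in zip(rv, ev)]
        for rv, ev in zip(reference_seq, emotion_seq)
    ]

ref = [[1.0, 1.0], [2.0, 2.0]]    # hypothetical reference sequence information
emo = [[0.5, 0.25], [1.0, 3.0]]   # hypothetical emotion sequence information
F = frame_results(ref, emo)               # plain subtraction
F_weighted = frame_results(ref, emo, 2.0) # with a coefficient
```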

As a result, compensating each of the N pieces of reference sequence information with the corresponding piece of emotion sequence information limits the target's unique characteristics, such as appearance and environment, and yields better performance than the prior art.

In another embodiment of the present invention, the emotion result deriving unit may derive the emotion result by applying an additional error compensation algorithm to the results derived from the N pieces of reference sequence information and the N pieces of emotion sequence information, for example F₁ to F_N in FIG. 7(b).

Preferably, modules 3100 and 3200 are implemented as LSTM-based recurrent neural network models of the same form, and each pair of corresponding LSTM units is trained on common training data.

Consequently, because they are trained on common training data, each pair of corresponding LSTM units in modules 3100 and 3200 computes with the same weights when extracting the emotion sequence information and the reference sequence information.

That is, each of the N pieces of first feature information and the corresponding second feature information are processed with the same weights to extract the emotion sequence information and the reference sequence information, respectively.

Preferably, the LSTM units of module 3100 and module 3200 have the same or corresponding forms.

However, although modules 3100 and 3200 use the same weights to extract the emotion sequence information and the reference sequence information from the first and second feature information, respectively, the state information of their respective LSTM units is not shared between them.
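A minimal pure-Python sketch of this arrangement: a single LSTM cell object holds the weights used by both modules, while each module's pass starts from, and updates, its own (h, c) state. The randomly initialised weights, the sizes, and all names are hypothetical; in the described system the weights would come from training on common data.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

class LSTMCell:
    """Minimal LSTM cell; weights are random placeholders, not trained values."""

    GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size  # each gate sees [x_t; h_{t-1}]
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                      for _ in range(hidden_size)] for k in self.GATES}
        self.b = {k: [0.0] * hidden_size for k in self.GATES}

    def zero_state(self):
        return [0.0] * self.hidden_size, [0.0] * self.hidden_size

    def step(self, x, state):
        h_prev, c_prev = state
        z = list(x) + list(h_prev)
        pre = {k: [s + bias for s, bias in zip(matvec(self.W[k], z), self.b[k])]
               for k in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

def run(cell, inputs):
    """Run one branch from its own zero state; state is never shared between calls."""
    state = cell.zero_state()
    outputs = []
    for x in inputs:
        state = cell.step(x, state)
        outputs.append(state[0])
    return outputs

cell = LSTMCell(input_size=2, hidden_size=3, seed=0)  # one shared set of weights
frames = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.3]]        # hypothetical first feature information
ref = [0.1, 0.2]                                     # hypothetical second feature information
emotion_seq = run(cell, frames)                      # module 3100
reference_seq = run(cell, [ref] * len(frames))       # module 3200
```

Calling `run` twice with the same inputs gives identical outputs, so no state leaks between the two branches even though they share every weight.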

The emotion result deriving unit 3300 then derives the target's emotion result based on the emotion sequence information and the reference sequence information extracted in this way by modules 3100 and 3200.

In this way, by subtracting the corresponding N pieces of emotion sequence information from the N pieces of reference sequence information, both extracted with the same weights applied to the LSTM units of an LSTM-based trained recurrent neural network model, the target's emotion can be recognized, or classified into predetermined classes, with the target's unique characteristics limited.

FIG. 8 exemplarily shows feature information reflecting the unique characteristics of a target according to an embodiment of the present invention.

In this embodiment, FIG. 8 shows, for a conventional emotion recognition system using a trained recurrent neural network module, the feature information extracted for different targets of the image information.

Specifically, each frame of a target is copied several times and the copies are input into an LSTM-based trained recurrent neural network model. When one frame from the set of copies generated in this way is input into the model, each value in the first column of FIG. 8 gives the sequence information for the target, expressed as a 15-dimensional vector with values from 0 to 1.0. As the number of input frames (time steps) increases, each value of this sequence information converges to a single value. The multidimensional sequence information that converges when such a frame set is input thus represents the feature information of the target's unique characteristics.
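The convergence described for FIG. 8 can be reproduced qualitatively with a simplified recurrent unit. This sketch uses a per-dimension tanh RNN in place of the LSTM (an assumption made for brevity) and random 15-dimensional feature vectors standing in for two subjects; with |w_h| < 1 each step is a contraction, so repeatedly feeding the same frame drives the output to a fixed point that depends on the input. The function name `run_repeated` and all values are hypothetical.

```python
import math
import random

def run_repeated(feature, steps, w_x=0.8, w_h=0.5):
    """Feed one frame's feature vector repeatedly; return the output at each step.

    A per-dimension tanh RNN stands in for the LSTM (hypothetical simplification).
    With |w_h| < 1 each step is a contraction, so the outputs converge.
    """
    h = [0.0] * len(feature)
    history = []
    for _ in range(steps):
        h = [math.tanh(w_x * x + w_h * hi) for x, hi in zip(feature, h)]
        history.append(h)
    return history

rng = random.Random(1)
subject_a = [rng.uniform(0.0, 1.0) for _ in range(15)]  # hypothetical 15-D features
subject_b = [rng.uniform(0.0, 1.0) for _ in range(15)]
hist_a = run_repeated(subject_a, steps=60)
hist_b = run_repeated(subject_b, steps=60)
step_change = max(abs(p - q) for p, q in zip(hist_a[-1], hist_a[-2]))
gap = max(abs(p - q) for p, q in zip(hist_a[-1], hist_b[-1]))
print(f"last-step change: {step_change:.2e}, gap between subjects: {gap:.3f}")
```

The last-step change shrinks toward floating-point precision while the two subjects settle at different values, mirroring the subject-dependent values described for FIG. 8.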

This multidimensional sequence information, which expresses the feature information of a target's unique characteristics, takes different values for different targets. As shown in FIG. 8, when frame sets of different targets, a white male and an East Asian female, are input into the LSTM-based trained recurrent neural network model, the sequence information converges to different values.

In other words, the multidimensional sequence information is feature information reflecting the target's unique characteristics: its value differs depending on the target of the image information input into the trained recurrent neural network module.

Because this feature information about the target's unique characteristics degrades the performance of conventional emotion recognition systems, the method of the present invention described above has been proposed.

FIG. 9 exemplarily shows a performance comparison of emotion recognition systems that perform emotion recognition of a target based on N frames using a machine learning model according to an embodiment of the present invention.

As shown in FIG. 9, compared with the prior art (conventional LSTM), the emotion recognition systems according to embodiments of the present invention, ASDF(…) and ASDF(… + …), reach the same accuracy with fewer frames.

In the table of FIG. 9, ASDF(…) denotes the result obtained when the neural network model of the emotion recognition system is trained with one subject, and ASDF(… + …) denotes the result obtained when it is trained with three subjects.

Panels (a) to (f) show the results when the image information shows the target's emotion as (a) anger, (b) displeasure, (c) fear, (d) happiness, (e) sadness, and (f) anger, respectively.

FIG. 10 exemplarily shows the internal configuration of a computing device according to an embodiment of the present invention.

As shown in FIG. 10, the computing device 11000 may include at least a processor 11100, a memory 11200, a peripheral interface 11300, an input/output (I/O) subsystem 11400, a power circuit 11500, and a communication circuit 11600. The computing device 11000 may correspond to the user terminal A connected to the emotion recognition system 100, or to the computing device B described above.

The memory 11200 may include, for example, high-speed random access memory, a magnetic disk, SRAM, DRAM, ROM, flash memory, or other nonvolatile memory. The memory 11200 may store software modules, instruction sets, and various other data required for the operation of the computing device 11000, including a trained embedding model.

Access to the memory 11200 by other components, such as the processor 11100 or the peripheral interface 11300, may be controlled by the processor 11100.

The peripheral interface 11300 may couple the input and/or output peripherals of the computing device 11000 to the processor 11100 and the memory 11200. The processor 11100 may execute the software modules or instruction sets stored in the memory 11200 to perform various functions for the computing device 11000 and to process data.

The I/O subsystem 11400 may couple various input/output peripherals to the peripheral interface 11300. For example, it may include controllers for coupling peripherals such as a monitor, keyboard, mouse, or printer, or, as needed, a touch screen or sensors, to the peripheral interface 11300. In another aspect, input/output peripherals may be coupled to the peripheral interface 11300 without passing through the I/O subsystem 11400.

The power circuit 11500 may supply power to all or some of the terminal's components. For example, it may include a power management system, one or more power sources such as a battery or alternating current (AC), a charging system, a power failure detection circuit, a power converter or inverter, a power status indicator, or any other components for generating, managing, and distributing power.

The communication circuit 11600 may enable communication with other computing devices through at least one external port.

Alternatively, as described above, the communication circuit 11600 may include an RF circuit as needed and communicate with other computing devices by transmitting and receiving RF signals, also known as electromagnetic signals.

The embodiment of FIG. 10 is only one example of the computing device 11000. The computing device 11000 may omit some of the components shown in FIG. 10, include additional components not shown there, or combine two or more components in a different configuration or arrangement. For example, a computing device for a mobile communication terminal may further include a touch screen or sensors in addition to the components shown in FIG. 10, and the communication circuit 11600 may include circuits for RF communication using various schemes (WiFi, 3G, LTE, Bluetooth, NFC, Zigbee, etc.). The components that may be included in the computing device 11000 may be implemented in hardware, including one or more signal-processing or application-specific integrated circuits, in software, or in a combination of hardware and software.

Methods according to embodiments of the present invention may be implemented as program instructions executable by various computing devices and recorded on a computer-readable medium. In particular, the program according to this embodiment may be a PC-based program or an application dedicated to mobile terminals. An application to which the present invention is applied may be installed on a user terminal through a file provided by a file distribution system; for example, the file distribution system may include a file transmission unit (not shown) that transmits the file at the request of the user terminal.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on it, and may access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those of ordinary skill in the art will recognize that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to it. Software may also be distributed over networked computing devices and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code generated by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to limited embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

An emotion recognition system that performs emotion recognition of a target based on N frames using a machine learning model, comprising:
a reference frame extraction unit for deriving a reference frame of the target;
a feature information extraction unit for extracting N pieces of first feature information for the N frames and second feature information for the reference frame; and
an emotion determination unit for deriving an emotion result of the target using a trained recurrent neural network model, based on the N pieces of first feature information and the second feature information.

The emotion recognition system according to claim 1, wherein
the reference frame extraction unit
derives any one of the N frames as the reference frame.

The emotion recognition system according to claim 1, wherein
the feature information extraction unit comprises:
a first feature information extraction unit for extracting the N pieces of first feature information from the N frames; and
a second feature information extraction unit for extracting the second feature information from the reference frame.

The emotion recognition system according to claim 3, wherein
the first feature information extraction unit extracts the N pieces of first feature information from the N frames by using a machine-learned model, and
the second feature information extraction unit extracts the second feature information from the reference frame by using a machine-learned model.

The emotion recognition system according to claim 1, wherein
the emotion discrimination unit comprises:
an emotion sequence information recurrent neural network module configured to extract emotion sequence information based on the N pieces of first feature information by using the trained recurrent neural network model;
a reference sequence information recurrent neural network module configured to extract reference sequence information based on the second feature information by using the trained recurrent neural network model; and
an emotion result derivation unit configured to derive the emotion result of the target based on the emotion sequence information and the reference sequence information.

The emotion recognition system according to claim 5, wherein
the emotion sequence information recurrent neural network module
sequentially inputs the N pieces of first feature information into at least one recurrent neural network unit by using the trained recurrent neural network model, and extracts N pieces of emotion sequence information based on the first feature information.

The emotion recognition system according to claim 5, wherein
the reference sequence information recurrent neural network module
inputs the second feature information into each of at least one recurrent neural network unit by using the trained recurrent neural network model, and extracts the reference sequence information based on the second feature information.

The emotion recognition system according to claim 5, wherein
the emotion sequence information recurrent neural network module
sequentially inputs the N pieces of first feature information into at least one recurrent neural network unit by using the trained recurrent neural network model, and extracts N pieces of emotion sequence information based on the first feature information;
the reference sequence information recurrent neural network module
inputs the second feature information into at least one recurrent neural network unit by using the trained recurrent neural network model, and extracts N pieces of reference sequence information based on the second feature information; and
the emotion result derivation unit
derives the emotion result of the target based on the N pieces of emotion sequence information and the N pieces of reference sequence information.
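The claims above describe two recurrent branches that each emit N states: the emotion branch consumes the N first features one per time step, and the reference branch turns the single second feature into N reference states. A minimal sketch, assuming a plain tanh (Elman) cell as the recurrent unit; the text does not name the cell type (LSTM, GRU, or otherwise), and `ElmanCell`, `run_sequence`, and `two_branch` are hypothetical names. The text also does not spell out how one reference feature yields N states, so repeating it at every time step is an assumption.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ElmanCell:
    """Minimal tanh RNN cell: h_t = tanh(W_in x_t + W_rec h_{t-1} + b).

    Stands in for a trained recurrent unit; the weights are assumed given.
    """

    def __init__(self, w_in: Matrix, w_rec: Matrix, bias: Vector) -> None:
        self.w_in, self.w_rec, self.bias = w_in, w_rec, bias

    def step(self, x: Sequence[float], h: Sequence[float]) -> Vector:
        pre = [a + b + c for a, b, c in
               zip(matvec(self.w_in, x), matvec(self.w_rec, h), self.bias)]
        return [math.tanh(p) for p in pre]


def run_sequence(cell: ElmanCell, inputs: Sequence[Sequence[float]]) -> List[Vector]:
    """Feed one input per time step from a zero state; keep every hidden state."""
    h: Vector = [0.0] * len(cell.bias)
    states: List[Vector] = []
    for x in inputs:
        h = cell.step(x, h)
        states.append(h)
    return states


def two_branch(cell: ElmanCell, first_features: Sequence[Vector],
               second_feature: Vector) -> Tuple[List[Vector], List[Vector]]:
    # Emotion branch: the N first features enter one per time step.
    emotion_seq = run_sequence(cell, first_features)
    # Reference branch (assumption): the single second feature is repeated
    # at each of the N steps so that this branch also yields N states.
    reference_seq = run_sequence(cell, [second_feature] * len(first_features))
    return emotion_seq, reference_seq
```

Both branches here run through one shared `cell`; that is a design choice of the sketch, and it is one way to satisfy the common-training requirement of claim 10.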

The emotion recognition system according to claim 8, wherein
the emotion result derivation unit
derives the emotion result of the target by subtracting, from each of the N pieces of reference sequence information, the corresponding one of the N pieces of emotion sequence information.
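The claim above derives the result from a per-step difference between corresponding reference and emotion states. A minimal sketch that follows the order stated in the claim (reference minus emotion). The claim does not say how the N difference vectors become a label, so the read-out here (mean over time, one linear score per label, argmax) is a hypothetical choice, as are the names `subtract_sequences` and `classify`.

```python
from typing import List, Sequence

Vector = List[float]


def subtract_sequences(reference_seq: Sequence[Vector],
                       emotion_seq: Sequence[Vector]) -> List[Vector]:
    """Per-step difference in the order the claim states:
    each reference state minus the corresponding emotion state."""
    if len(reference_seq) != len(emotion_seq):
        raise ValueError("both branches must provide N states")
    return [[r - e for r, e in zip(ref, emo)]
            for ref, emo in zip(reference_seq, emotion_seq)]


def classify(diffs: Sequence[Vector], weights: Sequence[Vector],
             labels: Sequence[str]) -> str:
    """Hypothetical read-out: mean-pool the N differences over time,
    score each label linearly, and return the best-scoring label."""
    n = len(diffs)
    pooled = [sum(column) / n for column in zip(*diffs)]
    scores = [sum(w * p for w, p in zip(row, pooled)) for row in weights]
    return labels[scores.index(max(scores))]
```

The subtraction order only flips the sign of every difference vector, so a trained read-out can absorb either convention.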

The emotion recognition system according to claim 8, wherein
the at least one recurrent neural network unit of the emotion sequence information recurrent neural network module and the corresponding one or more recurrent neural network units of the reference sequence information recurrent neural network module are trained with common training data.
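The claim above requires corresponding recurrent units in the two branches to be trained with common training data. One simple way to guarantee this, assumed here and not required by the claim, is to tie the weights so that both branches use literally the same parameters. The toy scalar recurrence below (hypothetical names `run` and `branch_difference`) shows why this matters for the per-step subtraction: with tied weights, a clip whose frames never differ from the reference yields a zero difference at every step, while untied weights leave a residual even though the input never changed.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[float, float]  # (input weight, recurrent weight) of a scalar RNN


def run(params: Params, xs: Sequence[float]) -> List[float]:
    """Scalar tanh recurrence h_t = tanh(w_in * x_t + w_rec * h_{t-1}), h_0 = 0."""
    w_in, w_rec = params
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def branch_difference(emotion_params: Params, reference_params: Params,
                      first_features: Sequence[float],
                      second_feature: float) -> List[float]:
    emotion_seq = run(emotion_params, first_features)
    # Assumption: the reference feature is repeated once per time step.
    reference_seq = run(reference_params, [second_feature] * len(first_features))
    # Reference minus emotion, step by step.
    return [r - e for r, e in zip(reference_seq, emotion_seq)]


if __name__ == "__main__":
    still = [0.3, 0.3, 0.3]  # no expression change: every frame equals the reference
    shared = (0.8, 0.5)
    print(branch_difference(shared, shared, still, 0.3))      # [0.0, 0.0, 0.0]
    print(branch_difference(shared, (0.6, 0.5), still, 0.3))  # non-zero residual
```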

An emotion recognition method, implemented by a computing device including at least one processor and at least one memory, of performing emotion recognition of a target based on N frames by using a machine learning model, the method comprising:
a reference frame extraction step of deriving a reference frame of the target;
a feature information extraction step of extracting N pieces of first feature information for the N frames and second feature information for the reference frame; and
an emotion discrimination step of deriving an emotion result of the target from the N pieces of first feature information and the second feature information by using a trained recurrent neural network model.

The emotion recognition method according to claim 11, wherein
the feature information extraction step comprises:
a first feature information extraction step of extracting the N pieces of first feature information from the N frames; and
a second feature information extraction step of extracting the second feature information from the reference frame,
wherein, in the first feature information extraction step, the N pieces of first feature information are extracted from the N frames by using a trained convolutional neural network model, and
in the second feature information extraction step, the second feature information is extracted from the reference frame by using a trained convolutional neural network model.
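The claim above names a trained convolutional neural network as the extractor for both the N frames and the reference frame. A minimal pure-Python sketch of a single convolutional layer, assuming grayscale frames given as nested lists: "valid" cross-correlation (which is what deep learning libraries compute under the name convolution), ReLU, and global average pooling, giving one scalar per filter. The kernel values in the tests are hand-picked rather than trained, the names `conv2d_valid`, `cnn_feature`, and `extract_all` are hypothetical, and a real system would use a deep trained network instead.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]
Kernel = List[List[float]]


def conv2d_valid(img: Image, kernel: Kernel) -> Image:
    """'Valid' 2-D cross-correlation: no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def cnn_feature(img: Image, kernels: Sequence[Kernel]) -> List[float]:
    """Conv -> ReLU -> global average pooling, one value per kernel."""
    feature = []
    for kernel in kernels:
        activations = [max(0.0, v) for row in conv2d_valid(img, kernel) for v in row]
        feature.append(sum(activations) / len(activations))
    return feature


def extract_all(frames: Sequence[Image], reference: Image,
                kernels: Sequence[Kernel]) -> Tuple[List[List[float]], List[float]]:
    # One shared extractor yields the N first features and the second feature.
    first = [cnn_feature(f, kernels) for f in frames]
    second = cnn_feature(reference, kernels)
    return first, second
```

Applying the same extractor to every frame and to the reference frame keeps all features in one representation space, which the later per-step comparison relies on.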

The emotion recognition method according to claim 11, wherein
the emotion discrimination step comprises:
an emotion sequence information extraction step of extracting emotion sequence information based on the N pieces of first feature information by using the trained recurrent neural network model;
a reference sequence information extraction step of extracting reference sequence information based on the second feature information by using the trained recurrent neural network model; and
an emotion result derivation step of deriving the emotion result of the target based on the emotion sequence information and the reference sequence information.

The emotion recognition method according to claim 13, wherein
in the emotion sequence information extraction step,
the N pieces of first feature information are sequentially input into at least one recurrent neural network unit by using the trained recurrent neural network model, and N pieces of emotion sequence information are extracted based on the first feature information;
in the reference sequence information extraction step,
the second feature information is input into at least one recurrent neural network unit by using the trained recurrent neural network model, and N pieces of reference sequence information are extracted based on the second feature information; and
in the emotion result derivation step,
the emotion result of the target is derived based on the N pieces of emotion sequence information and the N pieces of reference sequence information.

A computer-readable medium for implementing an emotion recognition method of performing emotion recognition of a target based on N frames by using a machine learning model, the computer-readable medium storing instructions that cause a computing device to perform the following steps:
a reference frame extraction step of deriving a reference frame of the target;
a feature information extraction step of extracting N pieces of first feature information for the N frames and second feature information for the reference frame; and
an emotion discrimination step of deriving an emotion result of the target from the N pieces of first feature information and the second feature information by using a trained recurrent neural network model.