KR20200062843A

KR20200062843A - Apparatus for recommending contents based on facial expression, method thereof and computer recordable medium storing program to perform the method

Info

Publication number: KR20200062843A
Application number: KR1020180148743A
Authority: KR
Inventors: 이혜정
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2018-11-27
Filing date: 2018-11-27
Publication date: 2020-06-04
Also published as: KR102276216B1; KR20210087923A; KR102372017B1

Abstract

The present invention relates to a facial expression-based content recommendation apparatus, a method therefor, and a computer-readable recording medium on which a program for performing the method is recorded. The apparatus includes a content processing unit that, when an image search term is input, extracts an expression vector from the facial image given as the search term, recognizes the search term as the specific expression corresponding to the expression vector group to which the extracted vector belongs on a vector domain, and recommends content classified under the same specific expression.

Description

Apparatus for recommending contents based on facial expression, method thereof and computer recordable medium storing program to perform the method

The present invention relates to content search technology, and more specifically to a facial expression-based content recommendation apparatus that defines a specific expression, that is, an expression unique to a particular individual, on the basis of basic expressions and searches for image or video content showing an expression similar to the defined specific expression, as well as to a method therefor and a computer-readable recording medium on which a program for performing the method is recorded.

Many studies have long been carried out to identify human facial expressions (emotions), and the most widely used classification to date is the set of seven basic expressions defined by Professor Ekman around 1970. It broadly groups human expressions into Fear, Contempt, Sadness, Happiness, Surprise, Anger, and Disgust.

Since the advent of deep learning, image recognition technology has advanced rapidly: face recognition can identify who appears in a photo or video, and object recognition can identify what objects appear in it. As a result, image recognition technologies have improved greatly in performance compared with the past and have reached a level at which they can be applied to commercial services in various forms.

Face and object recognition have clear correct answers. That is, whoever grades the result, the correct answer is unambiguous. For example, 'Jung Woo-sung' must be recognized as Jung Woo-sung, and an 'elephant' as an elephant. Facial expression recognition, by contrast, faces several issues: 1) viewers often cannot agree on which expression a given face shows; 2) many human expressions are compound emotions that are hard to reduce to a single emotion, as when a person sheds tears of joy, or is startled out of fear in one case and out of joy and excitement in another; and 3) although some people use their facial muscles heavily and express themselves in an exaggerated and precise way, most people's expressions do not change much.

In other words, unlike other areas of image recognition, facial expression recognition has no clear correct answer. It is therefore difficult to collect training data for each category, and it is equally difficult to define which classification scheme should be used for training and for presenting an answer.

Korean Patent Laid-Open Publication No. 10-2005-0007688, published January 21, 2005 (Title: Face Recognition/Facial Expression Recognition System and Method)

An object of the present invention is to provide a facial expression-based content recommendation apparatus that defines, on the basis of basic expressions, a specific expression representing the unique expression of a particular individual, a method therefor, and a computer-readable recording medium on which a program for performing the method is recorded.

Another object of the present invention is to provide a facial expression-based content recommendation apparatus that searches for image or video content showing an expression similar to the defined specific expression of a particular individual, a method therefor, and a computer-readable recording medium on which a program for performing the method is recorded.

The present invention can specify, recognize, and classify the unique expressions of a particular person rather than the universal expressions of people in general. It can therefore provide a service that classifies, searches for, and recommends content on the basis of a particular person's unique expressions. Such a service can offer users a new user experience (UX).

FIG. 1 is a block diagram illustrating a facial expression-based content recommendation apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of a basic expression recognizer according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a vector domain for defining specific expressions according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a facial expression-based content recommendation method according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a procedure for learning basic expressions according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a method of defining a specific expression of a particular person according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a facial expression-based content classification method according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a facial expression-based content recommendation method according to one embodiment of the present invention.
FIG. 9 is a flowchart illustrating a method of classifying video content according to specific expressions according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating a method of classifying video content according to specific expressions according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the following description and the accompanying drawings, detailed descriptions of well-known functions or configurations that could obscure the subject matter of the present invention are omitted. Note also that the same components are denoted by the same reference numerals throughout the drawings wherever possible.

The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical idea, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing this application.

In addition, terms including ordinal numbers, such as first and second, are used to describe various components. They serve only to distinguish one component from another and are not used to limit those components. For example, a second component may be referred to as a first component without departing from the scope of the present invention, and similarly a first component may be referred to as a second component.

Furthermore, when a component is said to be "connected" or "coupled" to another component, this means that it may be connected or coupled logically or physically. In other words, the component may be directly connected or coupled to the other component, but it should be understood that another component may exist in between and that the connection or coupling may be indirect.

The terms used in this specification are used only to describe particular embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" used in this specification are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, embodiments within the scope of the present invention include computer-readable media that carry or hold computer-executable instructions or data structures stored on them. Such computer-readable media may be any available media accessible by a general-purpose or special-purpose computer system. By way of example, such computer-readable media may include, but are not limited to, physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store or carry desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and that can be accessed by a general-purpose or special-purpose computer system.

First, a facial expression-based content recommendation apparatus according to an embodiment of the present invention will be described. FIG. 1 is a block diagram illustrating the facial expression-based content recommendation apparatus according to an embodiment of the present invention. FIG. 2 is a diagram illustrating the configuration of a basic expression recognizer according to an embodiment of the present invention. FIG. 3 is a diagram illustrating a vector domain for defining specific expressions according to an embodiment of the present invention.

Referring to FIG. 1, the content recommendation device 10 according to an embodiment of the present invention includes a recognition unit 100 and a control unit 200.

The recognition unit 100 includes a plurality of recognizers, among them at least a basic expression recognizer 110. The recognition unit 100 may further include at least one of a facial expression muscle recognizer 120 and a landmark recognizer 130. Examples of such recognizers include artificial neural networks (ANN) such as a feedforward neural network (FNN), a recurrent neural network (RNN), a convolutional neural network (CNN), and a Kohonen self-organizing network (KSN). Besides artificial neural networks, a recognizer may also be, for example, a tree recognizer, a support vector machine (SVM), a perceptron, or a radial basis function (RBF).

Once trained by machine learning, the recognizers of the recognition unit 100 are capable of extracting features from a face image and of recognizing, that is, determining, which expression the face shows from the extracted features. In the present invention, however, the recognizers of the recognition unit 100 are trained and features are extracted through the recognition unit 100 as trained, but the function of recognizing, that is, determining, which expression the face image shows is not performed. Instead, the features of the face image extracted through the recognition unit 100 are vectorized and mapped into a vector space, and a specific expression, which is the unique expression of a particular person, is then defined. The operation of the recognition unit 100 is described in more detail below.
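
The vectorization step above can be sketched as follows. This is only one plausible reading, assuming the expression vector is the concatenation of whatever each recognizer outputs; the function name `build_expression_vector`, the feature lengths, and the sample values are all hypothetical and do not come from the specification.

```python
import numpy as np

def build_expression_vector(basic_probs, muscle_feats=None, landmark_feats=None):
    """Concatenate recognizer outputs into a single expression vector.

    Assumption: the vector is a plain concatenation of the outputs of the
    basic expression recognizer and, if present, the muscle and landmark
    recognizers. The specification does not fix this layout.
    """
    parts = [np.asarray(basic_probs, dtype=float).ravel()]
    for extra in (muscle_feats, landmark_feats):
        if extra is not None:
            parts.append(np.asarray(extra, dtype=float).ravel())
    return np.concatenate(parts)

# Hypothetical outputs: seven basic-expression probabilities and three
# made-up muscle features.
basic = [0.02, 0.01, 0.79, 0.03, 0.02, 0.02, 0.11]
muscle = [0.4, 0.1, 0.9]
vec = build_expression_vector(basic, muscle_feats=muscle)
print(vec.shape)  # (10,)
```

The point of the sketch is that no argmax is taken: the full vector is kept and becomes a point in the vector domain.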

As described above, the recognition unit 100 according to an embodiment of the present invention includes a plurality of recognizers. Among them, the basic expression recognizer 110 learns and recognizes a plurality of predefined basic expressions.

An example of the basic expression recognizer 110 will now be described. In the embodiment below, the basic expression recognizer 110 is described using a convolutional neural network (CNN) as a representative example. The present invention is not limited thereto, however, and those skilled in the art will understand that any kind of recognizer that recognizes the basic expressions of a face through machine learning may be used.

Referring to FIG. 2, the basic expression recognizer 110 includes a plurality of layers. The output of one layer passes through a plurality of weighted operations to form the next layer. Here, the weights determine the strength of the connections between layers.

The basic expression recognizer 110 includes an input layer (IL), convolution layers (CL), pooling layers (PL), a fully connected layer (FL), and an output layer (OL).

The input layer IL is a matrix of a predetermined size. Each element of the input layer IL matrix corresponds to one pixel of the input image.

Although FIG. 2 shows two convolution layers (CL: CL1, CL2) and two pooling layers (PL: PL1, PL2) alternating, the present invention is not limited thereto, and those skilled in the art will understand that the number and order of the convolution layers CL and pooling layers PL may vary with the design of the artificial neural network. Each convolution layer CL and each pooling layer PL consists of a plurality of feature maps, and each feature map is a matrix of a predetermined size. The value of each element of a feature map is computed by applying a convolution operation or a pooling (subsampling) operation, using a kernel, to the previous layer. Here, the kernel is a matrix of a predetermined size, and the value of each of its elements is a weight w.
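
As a rough illustration of the two kernel operations just described, the sketch below applies a weighted convolution and a weighted subsampling to a single feature map with NumPy. The stride, the absence of padding, the non-overlapping pooling windows, and the sample kernel values are assumptions chosen for brevity; the specification does not state them.

```python
import numpy as np

def conv2d_valid(feature_map, kernel):
    """Slide the kernel over the map (stride 1, no padding).

    Each output element is the weighted sum of the patch under the kernel.
    As in most CNN libraries, the kernel is not flipped.
    """
    kh, kw = kernel.shape
    oh = feature_map.shape[0] - kh + 1
    ow = feature_map.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(feature_map, kernel):
    """Weighted pooling over non-overlapping blocks the size of the kernel.

    Assumes a square kernel; rows and columns that do not fill a whole
    block are dropped.
    """
    s = kernel.shape[0]
    h = feature_map.shape[0] // s * s
    w = feature_map.shape[1] // s * s
    blocks = feature_map[:h, :w].reshape(h // s, s, w // s, s)
    # blocks[a, p, b, q] is feature_map[a*s + p, b*s + q]
    return np.einsum("apbq,pq->ab", blocks, kernel)

image = np.arange(36, dtype=float).reshape(6, 6)    # stand-in input matrix
conv_kernel = np.array([[1.0, 0.0], [0.0, -1.0]])   # hypothetical weights w
pool_kernel = np.full((2, 2), 0.25)                 # equal weights: average pooling

conv_out = conv2d_valid(image, conv_kernel)  # 6x6 input becomes 5x5
pooled = subsample(conv_out, pool_kernel)    # 5x5 becomes 2x2
print(conv_out.shape, pooled.shape)
```

With equal weights the subsampling kernel reduces to average pooling; a trained network would learn other values for w.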

The fully connected layer FL includes a plurality of nodes (or sigmoids: f1, f2, f3, ..., fn), and the output layer OL includes a plurality of output nodes (O1, O2, O3, ..., O7). The operations of the fully connected layer FL are likewise weighted with weights w, and their results are input to the output nodes (O1, O2, O3, ..., O7) of the output layer OL. Each of the output nodes (O1, O2, O3, ..., O7) corresponds to a predetermined expression. For example, these expressions include Fear, Contempt, Sadness, Happiness, Surprise, Anger, and Disgust.

For example, the first output node O1 corresponds to Fear among the basic expressions, and the first output value, which is the output of the first output node O1, represents the probability that the basic expression of the face image is Fear. If the first output value is 0.02, the probability that the basic expression of the face image is Fear is 2%.

As another example, the second output node O2 corresponds to Contempt among the basic expressions, and the second output value, which is the output of the second output node O2, represents the probability that the basic expression of the face image is Contempt. If the second output value is 0.01, the probability that the basic expression of the face image is Contempt is 1%.

As another example, the third output node O3 corresponds to Sadness among the basic expressions, and the third output value, which is the output of the third output node O3, represents the probability that the basic expression of the face image is Sadness. If the third output value is 0.79, the probability that the basic expression of the face image is Sadness is 79%.

As another example, the seventh output node O7 corresponds to Disgust among the basic expressions, and the seventh output value, which is the output of the seventh output node O7, represents the probability that the basic expression of the face image is Disgust. If the seventh output value is 0.11, the probability that the basic expression of the face image is Disgust is 11%.
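
The mapping from output nodes to basic expressions in these examples can be written out as a small lookup. The values for O1, O2, O3, and O7 are the ones given above; the values for O4 to O6 and their assignment to Happiness, Surprise, and Anger in that order are assumptions made only so the example is complete.

```python
# Order of nodes O1..O7. O1, O2, O3 and O7 follow the examples in the text;
# the order of O4..O6 is assumed from the order in which the expressions
# are listed.
BASIC_EXPRESSIONS = [
    "Fear", "Contempt", "Sadness", "Happiness", "Surprise", "Anger", "Disgust",
]

def describe_outputs(output_values):
    """Pair each output node value with the basic expression it stands for."""
    if len(output_values) != len(BASIC_EXPRESSIONS):
        raise ValueError("expected one value per basic expression")
    return {name: float(v) for name, v in zip(BASIC_EXPRESSIONS, output_values)}

# 0.02, 0.01, 0.79 and 0.11 come from the text; the other three values are
# hypothetical fillers chosen so that the probabilities sum to 1.
outputs = [0.02, 0.01, 0.79, 0.03, 0.02, 0.02, 0.11]
probs = describe_outputs(outputs)
top = max(probs, key=probs.get)
print(top, probs[top])  # Sadness 0.79
```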

Each of the layers (IL, CL, PL, FL, OL) involves a plurality of operations. A weight w is applied to each of these operations, and the weighted result is passed to the next layer. That is, the result of the operations of one layer becomes the input of the next layer. The operations of each layer and their weights w are described in more detail below, taking FIG. 2 as an example.

As described above, the input layer IL is a feature map in the form of a matrix of a predetermined size. The elements of the input layer IL matrix are in units of pixels. Each element may hold, for example, the pixel value of the corresponding pixel of the face image, and the pixel value may be input to the element as binary data.

A convolution operation using each of a plurality of kernels K is then performed on the input layer matrix, and the results are input to the feature maps of the first convolution layer CL1. Here, each of the kernels K1 may be a matrix of a predetermined size whose elements are weights w. Each of the feature maps of the first convolution layer CL1 is also a matrix of a predetermined size.

Next, a pooling (subsampling) operation using a plurality of kernels K is performed on the feature maps of the first convolution layer CL1. Each of these kernels K is likewise a matrix of a predetermined size whose elements are weights w. The results of the pooling operation are input to the feature maps of the first pooling layer PL1. Each feature map of the first pooling layer PL1 is also a matrix of a predetermined size.

Subsequently, a convolution operation is performed on the feature maps of the first pooling layer PL1 using kernels K, each a matrix of a predetermined size whose elements are weights w, to form the second convolution layer CL2, which consists of a plurality of feature maps. Next, a pooling (subsampling) operation is performed on the feature maps of the second convolution layer CL2 using kernels K, each a matrix of weights w, to form the second pooling layer PL2, which consists of a plurality of feature maps. Each feature map of the second pooling layer PL2 is also a matrix of a predetermined size.

Then, a convolution operation using a plurality of kernels K is performed on the feature maps of the second pooling layer PL2. Each of these kernels K is likewise a matrix of a predetermined size whose elements are weights w. The fully connected layer FL is generated from the result of this convolution. In other words, the result of the convolution using the plurality of kernels K5 is input to the plurality of nodes f1 to fn.

Each of the nodes f1 to fn of the fully connected layer FL performs a predetermined operation, using a transfer function or the like, on the input from the second pooling layer PL2, applies weights w to the result, and passes it to each node of the output layer OL. The nodes O1 to O7 of the output layer OL then perform a predetermined operation on the values received from the fully connected layer FL and output the resulting output values. As described above, each of the output nodes (O1, O2, O3, ..., O7) corresponds to a predetermined basic expression, and the output value of each output node is the probability of the corresponding basic expression.
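
The last two stages can be sketched as below, assuming a sigmoid transfer function at the nodes f1 to fn and a softmax at the output nodes so that the seven values behave as probabilities. The specification names neither choice explicitly, so both are stand-ins, and the layer sizes and random weights are placeholders rather than trained values. The weighted sum over all pooled values is written as a matrix product, which is equivalent to a convolution whose kernel covers the whole feature map.

```python
import numpy as np

def sigmoid(x):
    """Transfer function assumed for the nodes f1..fn."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    """Normalize scores into probabilities (assumed output operation)."""
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(pooled_maps, w_fc, b_fc, w_out, b_out):
    """PL2 feature maps -> fully connected nodes -> seven output nodes."""
    x = np.concatenate([m.ravel() for m in pooled_maps])
    hidden = sigmoid(w_fc @ x + b_fc)        # nodes f1..fn
    return softmax(w_out @ hidden + b_out)   # nodes O1..O7

rng = np.random.default_rng(0)
pooled_maps = [rng.random((4, 4)) for _ in range(3)]  # stand-in for PL2
n_in, n_hidden, n_out = 3 * 4 * 4, 10, 7
w_fc = rng.normal(scale=0.1, size=(n_hidden, n_in))   # placeholder weights w
b_fc = np.zeros(n_hidden)
w_out = rng.normal(scale=0.1, size=(n_out, n_hidden))
b_out = np.zeros(n_out)

probs = classify(pooled_maps, w_fc, b_fc, w_out, b_out)
print(probs.shape, probs.sum())
```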

As described above, each of the layers of the basic expression recognizer 110 consists of a plurality of operations, and the result of each operation in a layer is weighted with a weight w and input to the subsequent layer. Thus, when a face image is input, the basic expression recognizer 110 performs a plurality of weighted operations on the pixels of the face image and outputs the result. As a result of these operations, the output value of each of the output nodes (O1, O2, O3, ..., O7) finally becomes the probability of the corresponding basic expression. For example, the output values of the output nodes (O1, O2, O3, ..., O7) are the probabilities of Fear, Contempt, Sadness, Happiness, Surprise, Anger, and Disgust, respectively.

Referring again to FIG. 1, the control unit 200 controls the overall operation of the content recommendation device 10 and the signal flow between its internal blocks, and may perform a data processing function. The control unit 200 basically serves to control the various functions of the content recommendation device 10. Examples of the control unit 200 include a central processing unit (CPU) and a digital signal processor (DSP).

Using the recognizers of the recognition unit 100, the control unit 200 defines specific expressions, which are the unique expressions of a particular person, from face images, and either classifies and provides content according to the defined specific expressions or searches for and recommends content similar to a specific expression. The control unit 200 includes an expression processing unit 210 and a content processing unit 220.

The expression processing unit 210 defines the specific expressions of a specific person. Here, a specific expression of a specific person means an expression unique to that person. The specific person is preferably a celebrity such as an entertainer, a famous athlete, or a famous writer. The expression processing unit 210 defines at least one specific expression of the specific person on a vector domain (VD) based on a plurality of predefined basic expressions. FIG. 3 shows such a vector domain (VD) and a plurality of specific expressions (G1 to G5) defined on it. The expression processing unit 210 extracts, from face images, expression vectors based on the basic expressions, and clusters the extracted expression vectors by similarity on the vector domain (VD) to generate expression vector groups (for example, G1, G2, G3, G4, and G5). It then defines the specific expressions by mapping each expression vector group to a specific expression.

The content processing unit 220 recommends content according to the specific expressions defined above. According to one embodiment, when a face image is input as an image search term, the content processing unit 220 may refer to the defined specific expressions, search for at least one content item containing a face image whose specific expression is most similar to that of the input face image, and recommend the retrieved content. According to another embodiment, upon a content recommendation request, the content processing unit 220 may refer to the defined specific expressions, classify content by specific expression, and recommend the classified content. The operation of the control unit 200, including the expression processing unit 210 and the content processing unit 220, is described in more detail below.

Next, an expression-based content recommendation method according to an embodiment of the present invention will be described. FIG. 4 is a flowchart illustrating the expression-based content recommendation method according to an embodiment of the present invention.

Referring to FIGS. 3 and 4, the expression processing unit 210 first defines the specific expressions of a specific person in step S10. Here, a specific expression of a specific person means an expression unique to that person. The specific person is preferably a celebrity such as an entertainer, a famous athlete, or a famous writer. The expression processing unit 210 defines at least one specific expression of the specific person on the vector domain (VD) based on a plurality of predefined basic expressions.

Once the specific expressions are defined, the content processing unit 220 recommends content according to the defined specific expressions in step S20. According to one embodiment of step S20, when a face image is input as an image search term, the content processing unit 220 may refer to the defined specific expressions, search for at least one content item containing a face image whose specific expression is most similar to that of the input face image, and recommend the retrieved content. According to another embodiment of step S20, upon a content recommendation request, the content processing unit 220 may refer to the defined specific expressions, classify content by specific expression, and recommend the classified content.

The method of defining a specific expression in step S10 and the method of recommending content in step S20 are each described in more detail below. First, the method of defining a specific expression is described. Defining a specific expression requires a procedure for learning the basic expressions and a procedure for defining the specific expressions of a specific person based on the basic expressions. The procedure for learning the basic expressions according to an embodiment of the present invention is therefore described first. FIG. 5 is a flowchart illustrating the procedure for learning the basic expressions according to an embodiment of the present invention.

Referring to FIG. 5, the expression processing unit 210 receives a basic expression learning image in step S110 and, in step S120, derives a face image from the input basic expression learning image using face recognition technology. A basic expression learning image contains a face image whose basic expression is known. A basic expression is a predefined expression that can be described in a single word, such as Fear, Contempt, Sadness, Happiness, Surprise, Anger, or Disgust. For example, if the basic expressions to be learned are Fear, Contempt, Sadness, Happiness, Surprise, Anger, and Disgust, a basic expression learning image is an image for which it is known which of these seven expressions the face in the image shows.

Accordingly, in step S130 the expression processing unit 210 sets an expected value (target value) according to the known expression of the basic expression learning image. For example, if the known expression is Happiness, the expected value may be set to Fear = "0.000", Contempt = "0.000", Sadness = "0.000", Happiness = "0.800", Surprise = "0.200", Anger = "0.000", and Disgust = "0.000".

The expected value is the minimum output value expected when a face image with a known basic expression is input to the basic expression recognizer 110 and the recognizer identifies the expression of that face image as the known basic expression. For example, the expression Happiness mainly shows a strong "Happiness" component accompanied by a weaker "Surprise" component. Therefore, when a face image whose basic expression is known to be "Happiness" is input to the basic expression recognizer 110, the output must be at least Happiness = "0.700" and Surprise = "0.200", and the remaining expressions must be below "1.000", for the recognizer to identify the expression as "Happiness". The expression processing unit 210 may therefore set the expected value to, for example, Happiness = "0.700" and Surprise = "0.200".

Next, in step S140 the expression processing unit 210 inputs the face image to the basic expression recognizer 110 and obtains its output value. The basic expression recognizer 110 performs a plurality of weighted operations on the input face image across a plurality of layers and outputs the result. In particular, the output value of the basic expression recognizer 110 may be a probability value for each basic expression to be learned. For example, the output value may be Fear = "0.005", Contempt = "0.015", Sadness = "0.304", Happiness = "0.321", Surprise = "0.311", Anger = "0.031", and Disgust = "0.013". Thus, before learning is complete, the output value differs from the expected value set earlier.

Accordingly, in step S150 the expression processing unit 210 may update the weights of the basic expression recognizer 110 using a predetermined algorithm, for example the backpropagation algorithm, so that the difference between the output value of the basic expression recognizer 110 and the expected value is minimized.
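The weight update of step S150 can be illustrated with a minimal sketch. The description does not specify the network architecture, the loss function, or the learning rate, so the single-layer softmax model, the cross-entropy gradient, the `features` input, and `lr` below are all assumptions chosen for brevity. Only the seven expression classes and the example expected value come from the text.

```python
import math

EXPRESSIONS = ["fear", "contempt", "sadness", "happiness",
               "surprise", "anger", "disgust"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, features):
    """One weighted layer followed by softmax (one row of weights per output node)."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)

def train_step(weights, features, target, lr=0.5):
    """One gradient step; for softmax with cross-entropy the score gradient is output - target."""
    output = forward(weights, features)
    for k, row in enumerate(weights):
        error = output[k] - target[k]
        for j in range(len(row)):
            row[j] -= lr * error * features[j]
    return output

def gap(output, target):
    """Largest per-expression difference between output and expected value."""
    return max(abs(o - t) for o, t in zip(output, target))

# Hypothetical stand-in for features derived from one face image.
features = [1.0, 0.5, -0.3]
weights = [[0.0] * len(features) for _ in EXPRESSIONS]
# Expected value for a face known to show Happiness (example from the text).
target = [0.0, 0.0, 0.0, 0.8, 0.2, 0.0, 0.0]

before = gap(forward(weights, features), target)
for _ in range(200):
    train_step(weights, features, target)
after = gap(forward(weights, features), target)
```

After the repeated updates, `after` is smaller than `before`, which is the behaviour step S150 aims for. A real recognizer would have many layers and would train on many images rather than one.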

Subsequently, in step S160 the expression processing unit 210 determines whether learning is complete. That is, it determines whether, for the basic expression learning images of all basic expressions, the difference between the output value of the basic expression recognizer 110 and the expected value is within a predetermined range and the variation of the output value is within a predetermined range.

If the determination in step S160 shows that the difference between the output value of the basic expression recognizer 110 and the expected value is not within the predetermined range, or that the variation of the output value is not within the predetermined range, the expression processing unit 210 returns to step S110, receives a new basic expression learning image, and repeats steps S110 to S160.

On the other hand, if the determination in step S160 shows that the difference between the output value of the basic expression recognizer 110 and the expected value is within the predetermined range and the variation of the output value is within the predetermined range, the expression processing unit 210 proceeds to step S170 and ends the basic expression learning.
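The completion test of step S160 combines two conditions, which the following sketch makes explicit. The tolerances `diff_tol` and `change_tol` are hypothetical, because the description only refers to "a predetermined range". Measuring the variation of the output against the previous iteration's output is likewise an assumption.

```python
def learning_complete(prev_output, output, expected,
                      diff_tol=0.05, change_tol=0.01):
    """Return True when both S160 conditions hold for one learning image."""
    # Condition 1: output is close enough to the expected value.
    close_to_expected = all(
        abs(o - e) <= diff_tol for o, e in zip(output, expected))
    # Condition 2: output has stopped changing between iterations.
    stable = all(
        abs(o - p) <= change_tol for o, p in zip(output, prev_output))
    return close_to_expected and stable

expected = [0.0, 0.0, 0.0, 0.8, 0.2, 0.0, 0.0]

# Output before learning is complete (example values from the text).
early = [0.005, 0.015, 0.304, 0.321, 0.311, 0.031, 0.013]
# Hypothetical outputs from two consecutive late iterations.
late_prev = [0.0, 0.0, 0.012, 0.788, 0.2, 0.0, 0.0]
late = [0.0, 0.0, 0.010, 0.790, 0.2, 0.0, 0.0]

keep_training = not learning_complete(early, early, expected)
finished = learning_complete(late_prev, late, expected)
```

In the flow of FIG. 5, a `False` result corresponds to returning to step S110 and a `True` result to ending at step S170. The text requires the conditions to hold for the learning images of all basic expressions, so a full check would apply this function to every such image.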

Next, the method of defining the specific expressions of a specific person based on the basic expressions is described. FIG. 6 is a flowchart illustrating the method of defining the specific expressions of a specific person according to an embodiment of the present invention.

Referring to FIG. 6, the expression processing unit 210 receives a plurality of specific expression learning images in step S210. In step S220, it uses face recognition technology to identify the face region of the specific person in each of the input specific expression learning images and detects the face image of the specific person. A specific expression learning image is an image or video that contains a face image of the specific person. If a face image of the specific person is itself input as the specific expression learning image, this detection step may be omitted.

In step S230, the expression processing unit 210 extracts a plurality of expression vectors from the face images of the specific person detected earlier in step S220. According to one embodiment, an expression vector includes at least one of a basic expression vector, an expression muscle vector, and a landmark vector.

The basic expression vector is the complete set of probability values for a predetermined number of basic expressions. The expression processing unit 210 may input a face image of the specific person to the basic expression recognizer 110, for which learning of the basic expressions has been completed, and obtain the output value of the basic expression recognizer 110. As described above, when a face image is input, the trained basic expression recognizer 110 outputs the probability of each basic expression. For example, assume that the basic expression recognizer 110 has learned Fear, Contempt, Sadness, Happiness, Surprise, Anger, and Disgust as the basic expressions. When the expression processing unit 210 inputs the face region detected from a learning image to the basic expression recognizer 110, the recognizer outputs the probabilities corresponding to Fear, Contempt, Sadness, Happiness, Surprise, Anger, and Disgust. For example, the output value may be Fear = "0.005", Contempt = "0.015", Sadness = "0.304", Happiness = "0.321", Surprise = "0.311", Anger = "0.031", and Disgust = "0.013".

A conventional classifier would, in this case, recognize the expression of the face image as the most probable one, Happiness = "0.321". The present invention, however, does not recognize the facial expression as one of the basic expressions. Instead, it uses the example output values, Sadness = "0.304", Happiness = "0.321", and Surprise = "0.311", as they are, to represent an expression in which Sadness, Happiness, and Surprise are mixed in proportion to their probabilities. The present invention therefore uses the output value of the basic expression recognizer 110, that is, all of the output probability values of the plurality of expressions, as the basic expression vector. For example, if the output values of the output nodes (O1 to O7) of the basic expression recognizer 110 are Fear = "0.005", Contempt = "0.015", Sadness = "0.304", Happiness = "0.321", Surprise = "0.311", Anger = "0.031", and Disgust = "0.013", the basic expression vector may be "0.005O1 + 0.015O2 + 0.304O3 + 0.321O4 + 0.311O5 + 0.031O6 + 0.013O7".

As in this example, even when several emotions are mixed, the basic expression vector is not forced onto a single emotion such as joy or sadness. By using the output value of the basic expression recognizer 110, that is, all of the output probability values, it can represent the characteristics of an expression unique to a specific person.
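The contrast between a conventional single-label result and the basic expression vector can be shown with a short sketch. The `mixed` values are the example output from the text, while the `plain` values and all function names are hypothetical and introduced only for illustration.

```python
import math

EXPRESSIONS = ("fear", "contempt", "sadness", "happiness",
               "surprise", "anger", "disgust")

def conventional_label(probabilities):
    """Conventional classifier: keep only the most probable basic expression."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return EXPRESSIONS[best]

def basic_expression_vector(probabilities):
    """Approach described above: keep every probability as one vector."""
    return tuple(probabilities)

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Example output from the text: sadness, happiness, and surprise are mixed.
mixed = [0.005, 0.015, 0.304, 0.321, 0.311, 0.031, 0.013]
# Hypothetical output for a plainly happy face.
plain = [0.000, 0.000, 0.000, 0.900, 0.100, 0.000, 0.000]

same_label = conventional_label(mixed) == conventional_label(plain)
vector_gap = distance(basic_expression_vector(mixed),
                      basic_expression_vector(plain))
```

Both faces receive the label "happiness" from the conventional approach, yet their basic expression vectors are far apart, which is the information the single label discards.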

The expression muscle vector is a feature vector that represents the movement of predetermined facial muscles. For example, it distinguishes movement features such as how wide the eyes are open, how wide the mouth is open, winking (one eye closed), and how far the eyebrows are raised (the distance between eyebrow and eye). The expression muscle recognizer 120 recognizes, from a face image, the degree of eye opening, the degree of mouth opening, whether the face is winking, and the degree of eyebrow raising, and outputs these values. Accordingly, when the face region detected from a learning image is input, the expression processing unit 210 can derive the expression muscle vector by measuring the movement values of the predetermined facial muscles through the expression muscle recognizer 120.

The landmark vector is a feature vector that represents the positions of predetermined landmarks in the face region. The landmarks may be, for example, five main coordinates such as the eyes, nose, and mouth, or 68 preset detailed facial points. The landmark recognizer 130 recognizes the coordinates of the landmarks from a face image. Accordingly, when the face region detected from a learning image is input, the expression processing unit 210 can derive the landmark vector by identifying the coordinates of the predetermined landmarks in the face region through the landmark recognizer 130.

In step S240, the expression processing unit 210 maps the expression vectors derived from each of the plurality of face images of the specific person onto a predetermined vector domain. Then, in step S250, it clusters (groups) the expression vectors on the vector domain to generate expression vector groups. The plurality of expression vectors corresponding to the plurality of face images are clustered so that a plurality of expression vector groups are formed. For example, the expression processing unit 210 forms an expression vector group by clustering the expression vectors that lie within a predetermined distance of a center value. Clustering and similarity algorithms such as k-nearest neighbor, k-means, or cosine similarity may be used for this purpose.

After the expression vector groups have been formed by clustering the plurality of expression vectors as described above, the expression processing unit 210 defines each expression vector group as a specific expression in step S260.
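Steps S240 to S260 can be sketched with k-means, one of the algorithms the description names. The sketch below is a plain-Python illustration and not the patented implementation. The farthest-point initialization, the iteration count, and the two-dimensional toy vectors are assumptions, and real expression vectors would have many more dimensions.

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def k_means(vectors, k, iterations=20):
    """Cluster vectors into k expression vector groups."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(
            max(vectors, key=lambda v: min(dist(v, c) for c in centers)))
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign every vector to its nearest center.
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: dist(v, centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups

# Hypothetical 2-D expression vectors from six face images of one person.
vectors = [(0.10, 0.10), (0.12, 0.08), (0.09, 0.11),
           (0.90, 0.90), (0.88, 0.92), (0.91, 0.89)]
centers, groups = k_means(vectors, k=2)
```

Each resulting group plays the role of one expression vector group such as G1 or G2 in FIG. 3 and could then be assigned a name as a specific expression in step S260.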

FIG. 3 shows five expression vector groups (G1, G2, G3, G4, G5) on the vector domain (VD). Each dot is the expression vector of one face image, and each circle is the extent of an expression vector group formed by clustering. A specific expression is an expression unique to a specific person that cannot be described by a single word naming a basic expression, such as "happiness". Examples of such specific expressions include a cute smile with one eye winking, or a sexy look with the mouth slightly open. Conventionally, in order to describe an expression with a single word, the recognition of a basic expression kept one word and discarded everything else. The present invention, in contrast, extracts the expression vector using all features without discarding any, for example "0.005O1 + 0.015O2 + 0.304O3 + 0.321O4 + 0.311O5 + 0.031O6 + 0.013O7", maps it onto the vector domain, and through clustering defines a specific expression not as a single word but as a cluster of expression vectors. A specific expression may also be given a name, for example "the specific person's own cute smile with one eye winking" or "sexy look with the mouth slightly open".

To distinguish specific expressions clearly, the expression processing unit 210 may assign and adjust a weight for each of the basic expression vector, the expression muscle vector, and the landmark vector within the expression vector. The distribution of landmarks differs from person to person (eye spacing, eye-to-nose spacing, the size of each landmark, and so on), and so does the degree to which the facial muscles move whenever the expression changes. As a result, the expression vectors obtained from the characteristic expressions that a specific person often makes are mapped to neighboring regions of the vector domain. Such clustering therefore makes it possible to distinguish the expressions unique to a specific person, that is, the specific expressions. Conversely, a face image whose expression vector is not mapped to a neighboring region but lies apart can be regarded as showing an expression different from the specific expression.
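One way to picture the per-component weighting is to scale each sub-vector and concatenate the results into a single expression vector. The description only states that weights may be assigned and adjusted, so the concatenation scheme, the function name, and every numeric value below except the basic expression probabilities are assumptions.

```python
def composite_expression_vector(basic, muscle, landmark,
                                w_basic=1.0, w_muscle=1.0, w_landmark=1.0):
    """Scale each sub-vector by its weight and concatenate them."""
    scaled = [w_basic * v for v in basic]
    scaled += [w_muscle * v for v in muscle]
    scaled += [w_landmark * v for v in landmark]
    return scaled

# Basic expression vector (example probabilities from the text).
basic = [0.005, 0.015, 0.304, 0.321, 0.311, 0.031, 0.013]
# Hypothetical muscle features: eye opening, mouth opening, wink, eyebrow raise.
muscle = [0.5, 0.25, 1.0, 0.75]
# Hypothetical landmark features: five (x, y) points, flattened.
landmark = [0.30, 0.40, 0.70, 0.40, 0.50, 0.55, 0.35, 0.75, 0.65, 0.75]

# Emphasize muscle movement and de-emphasize landmark positions.
vector = composite_expression_vector(
    basic, muscle, landmark, w_muscle=2.0, w_landmark=0.5)
```

Raising `w_muscle` makes differences such as a wink count for more in any distance computed on the vector domain, which is one plausible reading of adjusting weights to separate specific expressions more clearly.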

In this way, the expression processing unit 210 can define a specific expression, that is, an expression characteristic of a specific person. This is not a basic expression such as simple joy or sadness but an expression unique to the specific person. The expression processing unit 210 may also define a classification scheme by giving each specific expression of a specific person a unique name. As explained above, the specific expressions of a specific person cannot be described with one word such as "joy" or "sadness", so they may be defined by assigning an arbitrary naming scheme. For example, the specific person Hong Gil-dong roaring with laughter may be named 홍길동_웃음_01 (Hong Gil-dong_laugh_01), and Hong Gil-dong with a faint smile may be named 홍길동_웃음_02 (Hong Gil-dong_laugh_02). In addition, so that a specific expression of a specific person can be recognized intuitively, the expression processing unit 210 may select a representative image from among the plurality of face images that show that specific expression.

Next, the method of recommending content in step S20 is described in more detail. After the specific expressions have been defined, recommending expression-based content according to an embodiment of the present invention requires that the content to be recommended be classified according to the previously defined specific expressions of the specific person. This method is described below. FIG. 7 is a flowchart illustrating an expression-based content classification method according to an embodiment of the present invention.

Referring to FIG. 7, the content processing unit 220 derives a face image from content in step S310. The content may be either video content or image content. In particular, for video content, the content processing unit 220 may divide the content into scenes according to a predetermined criterion and extract face images scene by scene.

Subsequently, in step S320 the content processing unit 220 extracts an expression vector from the face image of the content. Then, in step S330, the content processing unit 220 maps the extracted expression vector onto the vector domain. Once mapped onto the vector domain, the expression vector may fall within the region of an expression vector group, for example within any one of the first to fifth expression vector groups (G1 to G5) of FIG. 3. In step S340, the content processing unit 220 then classifies the content as the specific expression that corresponds to the expression vector group in whose region the expression vector was mapped. For example, if the expression vector of the content is mapped inside the region of the first expression vector group (G1) on the vector domain (inside the circle in the drawing), the content can be classified as the specific expression corresponding to the first expression vector group (G1).
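The classification of steps S330 and S340 amounts to testing which group region contains the mapped expression vector. The sketch below models each region as a center and a radius, in line with the circles of FIG. 3. The group centers, the radii, the two-dimensional vectors, and the tie-breaking rule for overlapping regions are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(expression_vector, groups):
    """Return the name of the group whose region contains the vector, else None."""
    best_name, best_distance = None, None
    for name, (center, radius) in groups.items():
        d = distance(expression_vector, center)
        # Inside this region; if regions overlap, prefer the nearest center.
        if d <= radius and (best_distance is None or d < best_distance):
            best_name, best_distance = name, d
    return best_name

# Hypothetical group regions on a 2-D vector domain: name -> (center, radius).
groups = {
    "G1": ((0.20, 0.20), 0.15),
    "G2": ((0.80, 0.30), 0.10),
}

label_a = classify((0.25, 0.22), groups)   # inside the G1 circle
label_b = classify((0.78, 0.33), groups)   # inside the G2 circle
label_c = classify((0.50, 0.90), groups)   # outside every circle
```

A result of `None` corresponds to a face image whose expression vector lies apart from every group, which the description treats as an expression different from the defined specific expressions.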

Once a plurality of content items have been classified in this manner, content can be recommended. This method is described below. FIG. 8 is a flowchart illustrating an expression-based content recommendation method according to an embodiment of the present invention.

Referring to FIG. 8, the content processing unit 220 may receive an image search term containing a face image in step S410. The face image may show a specific person making a specific expression.

In step S420, the content processing unit 220 extracts the face image from the image search term. Then, in step S430, it extracts from the extracted face image an expression vector that includes a basic expression vector, an expression muscle vector, and a landmark vector. Subsequently, in step S440, the content processing unit 220 maps the extracted expression vector onto the vector domain. Once mapped onto the vector domain, the expression vector may fall within the region of one of the expression vector groups, for example within any one of the first to fifth expression vector groups (G1 to G5) of FIG. 3.

In step S450, the content processing unit 220 then classifies the image search term as the specific expression that corresponds to the expression vector group in whose region the expression vector was mapped. For example, if the expression vector of the image search term is mapped inside the region of the second expression vector group (G2) on the vector domain (inside the circle in the drawing), the content processing unit 220 can classify the image search term as the specific expression corresponding to the second expression vector group (G2).

In step S460, the content processing unit 220 recommends, from among the content classified earlier (step S340), the content classified as the same specific expression. According to a further embodiment of the present invention, the content processing unit 220 may list the content belonging to the same expression vector group as the image search term in order of proximity (that is, similarity) to the position at which the expression vector of the image search term was mapped, or may extract and provide only as many items as the service requires, in that order.
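The proximity ordering of the further embodiment can be sketched as follows. The sketch assumes Euclidean distance as the measure of closeness and a catalog of `(content_id, group, vector)` tuples; both are assumptions for illustration, since the source specifies neither a distance metric nor a storage format. The function name `recommend` is hypothetical.

```python
import math

def recommend(query_vector, query_group, catalog, limit=None):
    """Rank content of the query's group by distance to the query vector.

    catalog: iterable of (content_id, group, vector) tuples (hypothetical format).
    limit:   optional number of items the service wants returned.
    """
    # Keep only content classified as the same specific expression (same group).
    same_group = [(cid, vec) for cid, group, vec in catalog if group == query_group]
    # Closest vectors first, i.e. most similar expressions first.
    ranked = sorted(same_group, key=lambda item: math.dist(query_vector, item[1]))
    ids = [cid for cid, _ in ranked]
    return ids if limit is None else ids[:limit]

catalog = [
    ("a", "G2", (3.5, 0.0)),
    ("b", "G2", (3.0, 0.1)),
    ("c", "G1", (0.0, 0.0)),
]
print(recommend((3.0, 0.0), "G2", catalog))
```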

As described above, the present invention does not deliver the result of recognizing or classifying a face, an object, or an expression as a single word, that is, a category value. Instead, it delivers the result as a vector value that cannot be expressed in one word but carries feature points. In other words, the result of expression recognition or classification according to an embodiment of the present invention is not a single selected word but is given in the form of similarity among, or clustering of, vector values. The present invention is therefore a service in which the user does not enter a word as a search keyword (such as "happy expression", "Jung Woo-sung", "beach", or "elephant") but enters an image, and the recognition or classification result is presented as a list of results having similar features.

A method of recommending content according to another embodiment of the present invention, performed after a plurality of content items have been classified as described above, will now be described. First, a method of classifying video content by specific expression according to an embodiment of the present invention is described. FIG. 9 is a flowchart illustrating a method of classifying video content by specific expression according to an embodiment of the present invention. FIG. 10 is a diagram illustrating a method of classifying video content by specific expression according to an embodiment of the present invention.

Referring to FIG. 9, in step S510 the content processing unit 220 divides the portions of the video content in which a specific person appears into a plurality of appearance scene sections. The present invention aims to match and provide the particular section of video content whose expression is most similar to the expression shown in the face image of the image search term. In video content, a specific person appears in scenes that span continuous time. A scene therefore has a continuous time span and contains various changes of expression, and these must be matched against the expression of the image search term. Accordingly, the present invention first defines, for each person, the portions of the video content in which that person appears as per-person appearance scene sections. Moreover, even within a single scene, when footage shot by several cameras at different angles of view is edited together, a specific person will frequently appear on and disappear from the screen. The appearance scene section of a specific person should therefore disregard temporary disappearances from the screen: if the person reappears within a given time threshold, the footage is treated as the same appearance scene section, and sections are delimited as scenes in story units. For example, the span that starts at person A's first appearance and in which A keeps appearing without being absent from the screen for a preset minimum time or longer is grouped and defined as an appearance scene section of person A. Appearance scene sections are delimited for each person in this way.
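The grouping rule above (short disappearances are ignored, a long absence starts a new section) can be sketched as a gap-tolerant merge of detection times. This is a minimal sketch under assumptions: the input is a list of timestamps in seconds at which one person was detected, and `max_gap` stands in for the preset time threshold. Both the input format and the names `appearance_sections` and `max_gap` are hypothetical.

```python
def appearance_sections(timestamps, max_gap):
    """Group one person's detection times (seconds) into (start, end) sections.

    Consecutive detections no more than max_gap seconds apart stay in the
    same section; a longer absence starts a new section.
    """
    sections = []
    for t in sorted(timestamps):
        if sections and t - sections[-1][1] <= max_gap:
            sections[-1][1] = t          # person reappeared soon enough: extend
        else:
            sections.append([t, t])      # absent too long (or first detection): new section
    return [tuple(s) for s in sections]

print(appearance_sections([0, 1, 2, 4, 20, 21], max_gap=3))
```

Delimiting sections by story-level scenes, which the text also calls for, would need a separate scene-boundary signal and is not covered by this sketch.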

In step S520, the content processing unit 220 maps the expression vector of each of the face images extracted from each appearance scene section onto the vector domain to derive the specific expression of each face image. In step S530, it may determine the specific expression derived most often in each appearance scene section to be the specific expression of that section.
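Step S530 amounts to a majority vote over the per-image results of step S520. A minimal sketch, assuming each face image has already been assigned a group label (or `None` when no group matched); the function name `section_expression` is hypothetical:

```python
from collections import Counter

def section_expression(frame_expressions):
    """Return the most frequent expression label in one appearance scene section."""
    labels = [e for e in frame_expressions if e is not None]
    if not labels:
        return None  # no face image in this section matched any group
    # most_common(1) returns the label with the highest count.
    return Counter(labels).most_common(1)[0][0]

print(section_expression(["G2", "G1", "G2", None, "G2"]))
```

How ties are broken is not specified in the source; in this sketch the label encountered first wins.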

Meanwhile, according to a further embodiment, to improve the accuracy of this determination, the background music, dialogue, and the like of the appearance scene section may be analyzed to obtain cues about the emotional content of the section. The similarity between these cues and each candidate specific expression is then calculated and summed with a predetermined weight to determine the specific expression of the section. For example, if the background music, dialogue, or story suggests a parting scene, and the candidates derived on the vector domain include a tearful expression and a surprised expression, the tearful expression will have the highest similarity, so it can be determined to be the specific expression of the section.
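The weighted combination can be illustrated as below. The source does not give the formula, so this sketch assumes a simple convex combination of a visual score per candidate (for example, its share of face images in the section) and a contextual similarity score derived from music or dialogue. The function name `pick_expression`, the parameter `visual_weight`, and all scores are hypothetical.

```python
def pick_expression(visual_scores, context_scores, visual_weight=0.5):
    """Combine visual and contextual scores and return the best candidate label.

    visual_scores:  label -> score from the vector domain (assumed in [0, 1]).
    context_scores: label -> similarity to the emotional cue (assumed in [0, 1]).
    """
    combined = {
        label: visual_weight * score
        + (1.0 - visual_weight) * context_scores.get(label, 0.0)
        for label, score in visual_scores.items()
    }
    return max(combined, key=combined.get)

# Parting scene: the context favours "tearful" even though "surprised" is more frequent.
visual = {"tearful": 0.4, "surprised": 0.6}
context = {"tearful": 0.9, "surprised": 0.1}
print(pick_expression(visual, context))
```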

Accordingly, in step S540 the content processing unit 220 divides the video content into a plurality of appearance scene sections by person and by that person's expression.

Media services that provide video content have conventionally classified content into genre categories such as movies, dramas, and entertainment shows. Viewers could thus search for, select, and watch video content by genre. As described above, the present invention can divide video content into a plurality of appearance scene sections by person and by that person's expression. Accordingly, a media service that provides video content can offer the content divided into such appearance scene sections so that the user can select and watch them.

For example, the menu may be offered to the viewer in a tree structure such as: Actor A -> Collection of Actor A's scenes -> 1. Actor A's sexy expression, 2. Actor A's winking expression, 3. Actor A's wistful expression, and so on. The menu may be presented with images so that each expression can be selected intuitively. When the viewer selects Actor A in the menu, the content processing unit 220 may gather, from the various content in which Actor A appears, only the appearance scene sections featuring Actor A and provide them for viewing like a highlight reel. If the viewer additionally selects an expression, only the appearance scene sections of that actor classified as that specific expression may be gathered and provided. The viewer can then watch video content showing Actor A's characteristic expressions. Alternatively, if the viewer does not navigate the menu step by step but instead inputs an image of some expression of Actor A, the content processing unit 220 may search for and provide content in which a similar expression appears.
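The two menu levels (actor only, then actor plus expression) can be sketched as a filter over classified appearance scene sections. The record layout with `person`, `expression`, `content`, `start`, and `end` keys is an assumption made for illustration, as is the function name `select_sections`.

```python
def select_sections(sections, person, expression=None):
    """Return sections for a person, optionally narrowed to one expression."""
    return [
        s for s in sections
        if s["person"] == person
        and (expression is None or s["expression"] == expression)
    ]

# Hypothetical classified sections from several pieces of content.
sections = [
    {"content": "drama-1", "person": "A", "expression": "wink", "start": 10, "end": 25},
    {"content": "movie-2", "person": "A", "expression": "wistful", "start": 300, "end": 340},
    {"content": "drama-1", "person": "B", "expression": "wink", "start": 40, "end": 55},
]
print(len(select_sections(sections, "A")))          # all of Actor A's sections
print(select_sections(sections, "A", "wink"))       # only Actor A's winking sections
```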

According to yet another embodiment: conventionally, text hashtags such as #Superman #Beach #Wedding have been used to describe or search for video content. According to a further embodiment of the present invention, the content processing unit 220 may set and provide an expression image hashtag, in which one of the face images belonging to a specific expression serves as the hashtag. The content processing unit 220 may also map the expression image hashtag to video content and provide it.

Accordingly, by intuitively selecting a favorite expression image, the viewer can easily search for and watch content in which that expression appears. A service may also be provided that aggregates the expression images selected by many viewers and reports, as statistics, the popularity of each expression of each celebrity.
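The popularity statistic can be sketched as a simple count of selections. The input format, a list of `(celebrity, expression)` pairs with one pair per viewer selection, and the function name `expression_popularity` are assumptions for illustration only.

```python
from collections import Counter

def expression_popularity(selections):
    """Count viewer selections per (celebrity, expression), most popular first."""
    return Counter(selections).most_common()

selections = [("A", "wink"), ("A", "wink"), ("A", "wistful"), ("B", "smile")]
for (celebrity, expression), count in expression_popularity(selections):
    print(celebrity, expression, count)
```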

Meanwhile, the facial expression-based content recommendation method according to the embodiments of the present invention described above may be implemented as a program readable by various computer means and recorded on a computer-readable recording medium. The recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. Such a hardware device may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

This specification contains many specific implementation details, but these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular embodiments of the subject matter described in this specification have been described. Other embodiments are within the scope of the following claims. For example, the operations recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

This description sets forth the best mode of the present invention and provides examples to describe the invention and to enable a person skilled in the art to make and use it. The specification thus written does not limit the invention to the specific terms presented. Therefore, although the present invention has been described in detail with reference to the above examples, a person skilled in the art may make alterations, changes, and modifications to these examples without departing from the scope of the invention. The scope of the present invention should accordingly be determined not by the described embodiments but by the claims.

The present invention can specify, recognize, and classify not the universal expressions of people in general but the expressions unique to a specific person. Accordingly, a service that classifies, searches for, and recommends content based on a specific person's unique expressions can be provided. Such a service can offer users a new user experience (UX). The present invention therefore has industrial applicability, since it has sufficient potential for marketing or sale and can clearly be practiced in reality.

100: recognition unit 110: basic expression recognizer
120: expression muscle recognizer 130: landmark recognizer
200: control unit 210: expression processing unit
220: content processing unit

Claims

A facial expression-based content recommendation device comprising a content processing unit that,
when an image search term is input, extracts an expression vector from a face image of the input image search term,
recognizes the image search term, on a vector domain, as the specific expression corresponding to the expression vector group to which the extracted expression vector belongs, and
recommends content classified as the same specific expression as the recognized specific expression.

The device of claim 1,
further comprising an expression processing unit that, before the image search term is input, defines, on the vector domain and based on a plurality of basic expressions, a specific expression representing at least one unique expression of a specific person.

The device of claim 2,
wherein the expression processing unit
trains a basic expression recognizer to recognize the plurality of basic expressions,
extracts, through the basic expression recognizer, an expression vector from a face image of the specific person based on the plurality of basic expressions,
maps the extracted expression vector onto the vector domain,
clusters the expression vectors mapped on the vector domain to generate an expression vector group, and
defines the expression vector group as the specific expression.

The device of claim 3,
wherein the expression processing unit
inputs a plurality of face images of a plurality of specific persons into the basic expression recognizer and extracts, as the expression vector, the probability value of each of the plurality of basic expressions output by the basic expression recognizer.

The device of claim 2,
wherein the content processing unit
extracts an expression vector from a face image of content and classifies the content as the specific expression corresponding to the expression vector group to which the extracted expression vector belongs on the vector domain.

The device of claim 5,
wherein the content processing unit,
when video content is input, divides the portions of the input video content in which a specific person appears into a plurality of appearance scene sections,
maps the expression vector of each of a plurality of face images extracted from each appearance scene section onto the vector domain to derive the specific expression of each face image,
determines the specific expression derived most often in each appearance scene section to be the specific expression of that section,
divides the plurality of appearance scene sections of the video content by specific person and by that person's expression, and
provides the video content with the plurality of appearance scene sections divided by specific person and by that person's expression.

A facial expression-based content recommendation method comprising:
receiving, by a content processing unit, an image search term;
extracting, by the content processing unit, an expression vector from a face image of the image search term;
recognizing the image search term, on a vector domain, as the specific expression corresponding to the expression vector group to which the extracted expression vector belongs; and
recommending, by the content processing unit, content classified as the same specific expression as the recognized specific expression.

The method of claim 7,
further comprising, before the step of receiving the image search term,
defining, by an expression processing unit, on the vector domain and based on a plurality of basic expressions, a specific expression representing at least one unique expression of a specific person.

The method of claim 8,
wherein the step of defining the specific expression comprises:
training, by the expression processing unit, a basic expression recognizer to recognize the plurality of basic expressions;
extracting, by the expression processing unit through the basic expression recognizer, an expression vector from a face image of the specific person based on the plurality of basic expressions;
mapping the extracted expression vector onto the vector domain;
generating an expression vector group by clustering the expression vectors mapped on the vector domain; and
defining the expression vector group as the specific expression.

The method of claim 9,
wherein the step of extracting the expression vector comprises
inputting, by the expression processing unit, a plurality of face images of a plurality of specific persons into the basic expression recognizer and extracting, as the expression vector, the probability value of each of the plurality of basic expressions output by the basic expression recognizer.

The method of claim 8,
further comprising, before the step of receiving the image search term
and after the step of defining the specific expression,
extracting, by the content processing unit, an expression vector from a face image of content and classifying the content as the specific expression corresponding to the expression vector group to which the extracted expression vector belongs on the vector domain.

The method of claim 11,
wherein the step of classifying the content comprises:
dividing, by the content processing unit when video content is input, the portions of the input video content in which a specific person appears into a plurality of appearance scene sections;
deriving the specific expression of each of a plurality of face images extracted from each appearance scene section by mapping the expression vector of each face image onto the vector domain;
determining, by the content processing unit, the specific expression derived most often in each appearance scene section to be the specific expression of that section;
dividing, by the content processing unit, the plurality of appearance scene sections of the video content by specific person and by that person's expression; and
providing, by the content processing unit, the video content with the plurality of appearance scene sections divided by specific person and by that person's expression.

A computer-readable recording medium in which a program for performing a method for recommending facial expression-based content according to any one of claims 7 to 12 is recorded.