KR20120130627A

KR20120130627A - Apparatus and method for generating animation using avatar

Info

Publication number: KR20120130627A
Application number: KR1020110048714A
Authority: KR
Inventors: 김재환
Original assignee: 한국전자통신연구원
Priority date: 2011-05-23
Filing date: 2011-05-23
Publication date: 2012-12-03
Also published as: KR101558202B1

Abstract

PURPOSE: An apparatus and method for generating animation using an avatar are provided to synthesize animations through avatars that stand in for users in a virtual space, using a 2D image and text. CONSTITUTION: A model generator (120) generates a parametric model using feature points extracted from a 2D facial image. The model generator searches a pre-stored database for expression parameters matching the feature points and the feature words extracted from text data. An animation generator (130) generates a vector indicating the position changes of the feature points when the text data is spoken. The animation generator applies the expression parameters and the vector to the parametric model. [Reference numerals] (110) Input unit; (120) Model generator; (130) Animation generator; (140) Output unit

Description

Apparatus and method for generating animation using avatars {APPARATUS AND METHOD FOR GENERATING ANIMATION USING AVATAR}

The present invention relates to an apparatus and method for generating animation using an avatar. More specifically, the present invention relates to an apparatus and method for generating animation using an avatar generated from a two-dimensional face image.

The face is a very important element of interaction and communication between people, but it is a very complex object to define and represent in engineering terms. Research on detecting, tracking, recognizing, modeling, synthesizing, and rendering faces is actively under way in fields such as computer graphics, computer vision, and computer animation.

In particular, facial animation aims to express facial movements and expressions realistically. Because it must realistically express the anatomical structure of the face and its subtle expressions, it is regarded as the most difficult area of computer animation.

Nevertheless, the technical demand for facial animation is growing, because in the digital content field it can be used for real-time facial animation, character animation, creation of conversational photorealistic avatars, video communication, generation of video animations of people in photographs, human interfaces, and the like.

Facial animation requires facial motion representation and facial motion synthesis techniques built on computer graphics and on computer vision, including image processing. It also requires artificial intelligence techniques such as machine learning for text-to-speech conversion and speech synthesis.

Facial animation methods can be broadly divided into three types: keyframe-based methods, muscle-model-based methods, and parametric-model-based methods.

The keyframe-based method interpolates the control points of a face model, linearly or nonlinearly, to generate the intermediate frames of a facial animation.
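The linear case of keyframe interpolation can be sketched as follows. This is only an illustration under stated assumptions: control points are taken to be 2D (x, y) pairs, intermediate frames are evenly spaced, and the name `lerp_keyframes` is hypothetical rather than taken from the patent.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lerp_keyframes(start: List[Point], end: List[Point], n_frames: int) -> List[List[Point]]:
    """Blend linearly from the start pose to the end pose (both included)."""
    if n_frames < 2:
        raise ValueError("n_frames must be at least 2")
    if len(start) != len(end):
        raise ValueError("both keyframes need the same number of control points")
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)  # 0.0 at the first keyframe, 1.0 at the last
        frames.append([
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(start, end)
        ])
    return frames


if __name__ == "__main__":
    closed = [(0.0, 0.0), (2.0, 4.0)]
    opened = [(4.0, 0.0), (2.0, 8.0)]
    for frame in lerp_keyframes(closed, opened, 3):
        print(frame)
```

A nonlinear variant would replace `t` with an easing function of `t`; the rest of the loop stays the same.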

The muscle-model-based method applies human facial muscles to the animation model. A widely used variant is the vector-based muscle model, which defines the region influenced by each muscle and then deforms the face model according to the movement of that muscle. Because the shape of the skin mesh is driven by the muscles beneath the skin, this method is anatomically plausible, but it has the drawback that the muscle points are difficult to track automatically.

FIG. 1 is a diagram illustrating a muscle model according to the prior art. Waters' muscle model is as shown in FIG. 1.

The parametric-model-based method uses the Facial Action Coding System (hereinafter also 'FACS'), which is based on the Action Units (hereinafter also 'AUs') first defined by Carl-Herman Hjortsjö. It subdivides facial expressions into AUs and uses them together with parameters.

FIG. 2 is a diagram illustrating an example list of action units according to the prior art. An example list of AUs is as shown in FIG. 2.

FACS is widely used in computer graphics and computer vision, and an extended form of FACS was adopted in the MPEG-4 facial animation standard. The MPEG-4 facial animation standard defines a three-dimensional mesh, the facial points (hereinafter also 'FPs') that are its vertices, 66 low-level facial animation parameters (hereinafter also 'FAPs'), and 2 high-level FAPs.

Facial motion representation can be broadly divided into image-based methods and model-based methods. Image-based methods must build a probabilistic model of how the feature points within the face change, and therefore require a large amount of face image training data. Model-based methods, in contrast, represent the face data as a two- or three-dimensional mesh and express facial motion as a deformation of that mesh. To date, the term facial animation has generally referred to face images produced by adjusting the parameters of a model-based mesh.

Once a parameterized representation of facial motion has been defined, facial motion synthesis is performed by generating the trajectories of the parameters over time and interpolating the corresponding key motion images. When generating the trajectories of the face parameters, probabilistic model learning methods such as the hidden Markov model (hereinafter also 'HMM') are mainly used to synthesize speech and mouth shapes. In addition, when synthesizing facial expression images, facial motion can be synthesized by image warping.

In subtle facial expression and component extraction, the rendering of facial wrinkles, pupil movement, and varied mouth shapes is very important for more realistic facial expressions and animation. Facial wrinkles can be made more realistic using an expressive ratio image (hereinafter also 'ERI'), but this is computationally expensive.

To express varied pupil and mouth movements with an image-based method, the detailed parts must be separated and extracted from the image. Expressive elements such as eye blinks, the independent movement of the two lips, and fine wrinkles are very important for increasing the realism of facial animation.

FIG. 3 is a diagram illustrating an example image in which detailed parts are extracted from an image according to the prior art. The pupil 11 extracted from the eye image 10 and the remaining texture portion 13 are as shown in FIG. 3.

FIG. 4 is a diagram illustrating example images of an image segmentation process according to the prior art. As shown in FIG. 4, a gradient vector map 21 and a key point extraction image 22 are generated from the mouth region 20, and the segmented lips 24 are then generated using the gradient vector map 21, the key point extraction image 22, and a polynomial model 23.

Conventionally, conversations between users in a virtual space have taken place through text, voice, or video. However, because video conversations involve a larger data volume and a higher transmission cost than text or voice, conversations between users in a virtual space have mainly been limited to text or voice.

Accordingly, there is a need for a way to provide users with realistic facial animation in a virtual space at low data volume and cost.

An object of the present invention is to provide an apparatus and method for generating animation using an avatar in order to provide realistic facial animation in a virtual space.

An apparatus for generating animation using an avatar according to an aspect of the present invention includes a model generator, an animation generator, and an output unit. The model generator generates a parametric model whose expression can be deformed, using a plurality of feature points extracted from a two-dimensional face image, and searches a pre-stored database for an expression parameter corresponding to a feature word extracted from text data and to the plurality of feature points. The animation generator uses the phonetic characteristics of the words in the text data to generate a vector representing the change in position of the feature points when the text data is spoken, and applies the expression parameter and the vector to the parametric model to generate an animation reflecting the change in expression while the text data is spoken. The output unit outputs the animation in a virtual space using an avatar corresponding to the two-dimensional face image.

Here, the model generator extracts facial elements from the two-dimensional face image using the plurality of feature points, and generates the parametric model, whose expression can be deformed by deforming the facial elements.

In the apparatus, the vector represents the trajectories of the facial elements when the text data is spoken.

The animation generator deforms the facial elements according to the expression parameter to generate a facial expression image, and applies the vector to the facial expression image to generate a facial animation reflecting the movement of the facial elements while the text data is spoken.

The animation generator includes a speech converter that converts the text data into waveform form to generate speech data corresponding to the text data, and a speech synthesizer that combines the facial animation with the speech data to generate a dialogue animation.

The model generator extracts a plurality of templates from the two-dimensional face image using the plurality of feature points, generates matting images corresponding to the templates through digital image matting, and generates the facial elements using the matting images.

The model generator extracts the plurality of templates from the two-dimensional face image through template matching.

The model generator includes a search unit that normalizes the plurality of feature points through an affine transformation to generate a normalized face model, and searches for the expression parameter corresponding to the normalized face model and the feature word.

In the apparatus, the database classifies pre-stored sample images and sample words into different expressions and stores a plurality of expression parameters corresponding to the respective expressions.

A method for generating animation using an avatar according to an aspect of the present invention includes: extracting feature data, including a plurality of feature points and a feature word, from input data including a two-dimensional face image and text data; generating, using the plurality of feature points, a parametric model whose expression can be deformed through parameter control; searching a pre-stored database for an expression parameter corresponding to the feature data; generating a vector representing the change in position of the plurality of feature points while the text data is spoken; applying the expression parameter and the vector to the parametric model to generate a facial animation reflecting the change in position of the plurality of feature points while the text data is spoken; and combining speech data, obtained by converting the text data into waveform form, with the facial animation to generate a dialogue animation.

Here, searching for the expression parameter includes normalizing the plurality of feature points to generate a normalized face model, and searching the database for the expression parameter corresponding to the normalized face model and the feature word.

In the method, the database classifies pre-stored sample images and sample words into different expressions and stores the expression parameter corresponding to each expression.

Generating the parametric model includes detecting facial elements in the two-dimensional face image using the plurality of feature points, and generating the parametric model, which is composed of the facial elements and whose expression can be deformed by deforming the facial elements.

Generating the vector includes generating a vector representing the trajectories of the facial elements while the text data is spoken.

Generating the facial animation includes deforming the expression of the parametric model according to the expression parameter to generate a facial expression model, and applying the vector to the facial expression model to generate a facial animation reflecting the movement of the facial elements while the text data is spoken.

The method further includes outputting the dialogue animation in a virtual space using an avatar corresponding to the two-dimensional face image.

According to an aspect of the present invention, an animation can be synthesized through an avatar that stands in for a user in a virtual space, using a single two-dimensional image and text.

In addition, according to an aspect of the present invention, an animation reflecting the expression implied by the image and the text can be synthesized using facial elements extracted from the two-dimensional image.

In addition, according to an aspect of the present invention, when speech corresponding to text is synthesized and output, an animation reflecting the change in facial expression that accompanies the speech output can be synthesized using the phonetic characteristics of the text.

FIG. 1 is a diagram illustrating a muscle model according to the prior art.
FIG. 2 is a diagram illustrating an example list of action units according to the prior art.
FIG. 3 is a diagram illustrating an example image in which detailed parts are extracted from an image according to the prior art.
FIG. 4 is a diagram illustrating example images of an image segmentation process according to the prior art.
FIG. 5 is a diagram illustrating the configuration of an animation generating apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the configuration of a model generator according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a facial element extraction method according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating a lower lip extraction process according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating a data classification method according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating a method of normalizing a sample image according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating a feature point detection method according to an embodiment of the present invention.
FIG. 12 is a diagram illustrating a feature point coordinate calculation method according to an embodiment of the present invention.
FIG. 13 is a diagram illustrating the configuration of an animation generator according to an embodiment of the present invention.
FIG. 14 is a diagram illustrating coordinates of a facial element according to a first embodiment of the present invention.
FIG. 15 is a diagram illustrating coordinates of a pupil according to the first embodiment of the present invention.
FIG. 16 is a diagram illustrating coordinates of a pupil according to a second embodiment of the present invention.
FIG. 17 is a diagram illustrating coordinates of a facial element according to the second embodiment of the present invention.
FIG. 18 is a diagram illustrating coordinates of the lips according to the first embodiment of the present invention.
FIG. 19 is a diagram illustrating coordinates of the lips according to the second embodiment of the present invention.
FIG. 20 is a diagram illustrating a first facial expression image according to an embodiment of the present invention.
FIG. 21 is a diagram illustrating a second facial expression image according to an embodiment of the present invention.
FIG. 22 is a diagram illustrating a third facial expression image according to an embodiment of the present invention.
FIG. 23 is a diagram illustrating a fourth facial expression image according to an embodiment of the present invention.
FIG. 24 is a diagram illustrating an animation generation method according to an embodiment of the present invention.
FIG. 25 is a diagram illustrating an output image of a dialogue animation between avatars according to an embodiment of the present invention.

The present invention will now be described in detail with reference to the accompanying drawings. Repeated descriptions, and detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention, are omitted. The embodiments are provided so that the present invention is explained more completely to those of ordinary skill in the art. Accordingly, the shapes and sizes of elements in the drawings may be exaggerated for clarity.

An apparatus and method for generating animation using an avatar according to an embodiment of the present invention will now be described with reference to the drawings.

First, an apparatus for generating animation using an avatar according to an embodiment of the present invention will be described with reference to FIG. 5.

FIG. 5 is a diagram illustrating the configuration of an animation generating apparatus according to an embodiment of the present invention.

As shown in FIG. 5, the animation generating apparatus 100 according to an embodiment of the present invention generates an avatar corresponding to an input image, and includes an input unit 110, a model generator 120, an animation generator 130, and an output unit 140.

The input unit 110 receives a two-dimensional (2D) facial image of the subject to be represented as an avatar in the virtual space, and text data containing sentence elements, such as words or sentences, to be output as speech through the avatar in the virtual space.

The model generator 120 extracts a plurality of feature points from the input two-dimensional face image, uses the extracted feature points to extract the facial elements corresponding to the action units (hereinafter also 'AUs') that determine facial expression, and generates a parametric model whose expression can be deformed by deforming the extracted facial elements. The model generator 120 also detects an expression parameter for determining the expression of the avatar, using the input text data and the extracted feature points.

The animation generator 130 deforms the expression of the parametric model according to the detected expression parameter to generate a facial expression image. It calculates the coordinates of the feature points for the sentence elements in the text data and generates an eigenvector representing the change in the feature point coordinates when the text data is spoken. It converts the text data into an audible waveform to generate speech data. Finally, it applies the eigenvector to the facial expression image to generate a facial animation, and combines the facial animation with the speech data to generate a dialogue animation.
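One way to picture how an expression parameter and a speech-driven displacement vector might be applied to the feature points of the parametric model is sketched below. Everything concrete here is an assumption made for illustration: the expression parameter is modeled as a fixed per-point offset, the vector as a per-point displacement, and `weights` as a per-frame scalar (for example, mouth openness for each sound). The patent does not specify these encodings, and `animate` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def animate(model: Sequence[Point],
            expression_offset: Sequence[Point],
            speech_vector: Sequence[Point],
            weights: Sequence[float]) -> List[List[Point]]:
    """Return one list of feature-point positions per frame.

    model             -- neutral feature points of the parametric model
    expression_offset -- assumed encoding of the expression parameter
    speech_vector     -- assumed per-point displacement while speaking
    weights           -- assumed per-frame strength of that displacement
    """
    # Step 1: deform the neutral model into the target expression.
    expressive = [(x + dx, y + dy)
                  for (x, y), (dx, dy) in zip(model, expression_offset)]
    # Step 2: move the expressive points along the speech vector, frame by frame.
    return [
        [(x + w * vx, y + w * vy)
         for (x, y), (vx, vy) in zip(expressive, speech_vector)]
        for w in weights
    ]


if __name__ == "__main__":
    lower_lip_and_mouth_corner = [(0.0, 0.0), (1.0, 0.0)]
    smile = [(0.0, 0.5), (0.0, 0.5)]           # raise both points
    jaw_drop = [(0.0, -1.0), (0.0, 0.0)]       # only the lower lip moves
    for frame in animate(lower_lip_and_mouth_corner, smile, jaw_drop, [0.0, 0.5, 1.0]):
        print(frame)
```

Keeping the expression step separate from the speech step mirrors the order described above: the facial expression image is produced first, and the speech-related motion is layered on top of it.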

The output unit 140 outputs the dialogue animation in the virtual space using an avatar corresponding to the two-dimensional face image.

Next, the configuration of the model generator according to an embodiment of the present invention will be described with reference to FIG. 6.

FIG. 6 is a diagram illustrating the configuration of a model generator according to an embodiment of the present invention.

As shown in FIG. 6, the model generator 120 according to an embodiment of the present invention includes a feature extractor 121, a search unit 123, and a learning unit 125.

The feature extractor 121 extracts a plurality of feature points from the input two-dimensional face image and extracts a feature word from the input text data. The feature extractor 121 may detect each feature point using a feature point detection algorithm such as AdaBoost, a support vector machine (SVM), or an active appearance model (AAM). It may extract predefined feature points such as the end points of the eyebrows, the upper midpoint of the eye, and the midpoint of the lower lip.

The search unit 123 searches the learning unit 125 for the expression parameter corresponding to the feature points and the feature word extracted by the feature extractor 121, and extracts it. Specifically, the search unit 123 normalizes the extracted feature points to generate a prototype face model, and searches the learning unit 125 for the expression parameter corresponding to the generated prototype face model and the extracted feature word. The search unit 123 may generate the prototype face model by normalizing the extracted feature points through an affine transformation that accounts for translation, rotation, and scaling.
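A minimal sketch of such a normalization is given below. It uses a similarity transform (translation, rotation, and uniform scaling, a special case of an affine transformation) that maps two reference points onto fixed positions. Choosing the two eye centers as the reference points is an assumption for illustration, as is the function name `normalize_points`; the patent does not state which anchors are used.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_points(points: Sequence[Point],
                     left_eye: Point,
                     right_eye: Point) -> List[Point]:
    """Map left_eye to (0, 0) and right_eye to (1, 0), carrying all points along.

    Treating (x, y) as the complex number x + iy, the map
    z -> (z - a) / (b - a) removes translation (subtract a) and removes
    rotation and scale in one step (divide by b - a).
    """
    a = complex(*left_eye)
    b = complex(*right_eye)
    if a == b:
        raise ValueError("the two reference points must differ")
    normalized = []
    for x, y in points:
        w = (complex(x, y) - a) / (b - a)
        normalized.append((w.real, w.imag))
    return normalized


if __name__ == "__main__":
    face = [(2.0, 2.0), (4.0, 2.0), (3.0, 4.0)]  # two eyes and a third landmark
    print(normalize_points(face, left_eye=face[0], right_eye=face[1]))
```

After this step, faces photographed at different positions, angles, and sizes share one coordinate frame, so their feature points can be compared directly with stored prototypes.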

The learning unit 125 uses machine learning to classify pre-stored sample images and sample words into different expressions, and stores an expression parameter for each expression. The learning unit 125 may normalize a sample image through an affine transformation, extract feature points from the normalized sample image, and calculate feature point coordinates from the extracted feature points.
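The lookup of an expression parameter from a feature word and normalized feature points could be sketched as follows. The data layout (`ExpressionEntry`), the matching rule (filter by word, then pick the nearest stored prototype), and all names and values are hypothetical; the patent only states that sample images and sample words are classified by expression and that a parameter is stored per expression.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]


@dataclass
class ExpressionEntry:
    label: str                 # expression class, e.g. "happy"
    prototype: List[Point]     # normalized feature points for this class
    words: Set[str]            # sample words classified under this class
    params: Dict[str, float]   # stored expression parameters (made-up keys)


def find_expression(db: Sequence[ExpressionEntry],
                    points: Sequence[Point],
                    feature_word: str) -> ExpressionEntry:
    """Prefer entries that list the feature word, then take the nearest prototype."""
    if not db:
        raise ValueError("the expression database is empty")
    by_word = [entry for entry in db if feature_word in entry.words]
    candidates = by_word if by_word else list(db)  # fall back to shape only

    def distance(entry: ExpressionEntry) -> float:
        return sum(math.dist(p, q) for p, q in zip(points, entry.prototype))

    return min(candidates, key=distance)


if __name__ == "__main__":
    db = [
        ExpressionEntry("happy", [(0.0, 0.0), (1.0, 0.0), (0.5, 1.2)],
                        {"glad", "great"}, {"mouth_corner_raise": 0.8}),
        ExpressionEntry("sad", [(0.0, 0.0), (1.0, 0.0), (0.5, 0.8)],
                        {"sorry"}, {"mouth_corner_raise": -0.6}),
    ]
    hit = find_expression(db, [(0.0, 0.0), (1.0, 0.0), (0.5, 0.85)], "sorry")
    print(hit.label, hit.params)
```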

Next, a method by which the model generator extracts facial elements according to an embodiment of the present invention will be described with reference to FIG. 7.

FIG. 7 is a diagram illustrating a facial element extraction method according to an embodiment of the present invention.

도 7에 도시된 바와 같이, 먼저, 모델 생성부(120)는 이차원 얼굴 영상으로부터 복수 개의 특징점들을 추출한다(S100).As shown in FIG. 7, first, the model generator 120 extracts a plurality of feature points from the two-dimensional face image (S100).

다음, 모델 생성부(120)는 추출된 특징점들을 이용하여 템플릿 매칭(Template Matching)을 통해 얼굴 요소에 대한 템플릿들을 추출한다(S110). 여기서, 추출된 템플릿들은 얼굴 템플릿, 눈썹 템플릿, 눈 템플릿, 눈동자 템플릿, 입술 템플릿 등을 포함한다.Next, the model generator 120 extracts templates for face elements through template matching using the extracted feature points (S110). Here, the extracted templates include a face template, an eyebrow template, an eye template, a pupil template, a lip template, and the like.

이후, 모델 생성부(120)는 추출된 템플릿들을 이용하여 디지털 이미지 매팅(Digital Image Matting)을 통해 얼굴 요소에 대한 매팅 영상들을 생성한다(S120).Thereafter, the model generator 120 generates matting images for the face element through digital image matting using the extracted templates (S120).

다음, 모델 생성부(120)는 생성된 매팅 영상들을 이용하여 이차원 얼굴 영상에 대한 얼굴 요소들을 추출한다(S130). 여기서, 추출된 얼굴 요소들은 전체 얼굴, 눈썹, 눈, 눈동자, 입술 등을 포함한다.
Next, the model generator 120 extracts face elements for the two-dimensional face image by using the generated matting images (S130). Here, the extracted facial elements include the entire face, eyebrows, eyes, pupils, lips, and the like.

Next, a method by which the model generator according to an embodiment of the present invention extracts the lower lip from a face image will be described with reference to FIG. 8.

FIG. 8 is a diagram illustrating a lower lip extraction process according to an embodiment of the present invention.

As shown in FIG. 8, the model generator 120 first extracts a template image 220 of the lower lip from the lip image 210 through template matching.

Next, the model generator 120 extracts an alpha map 230 of the lower lip from the template image 220 of the lower lip through digital image matting.

Thereafter, the model generator 120 extracts the lower lip image 240, which corresponds to a face element, from the alpha map 230 of the lower lip.
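
The description does not state how the alpha map 230 is turned into the lower lip image 240. A minimal sketch, assuming the usual matting convention in which each alpha value in [0, 1] weights the corresponding pixel, might look as follows. The function name `apply_alpha_map` and the sample values are hypothetical and serve only as an illustration.

```python
def apply_alpha_map(image, alpha_map):
    """Weight each pixel by its alpha value (0.0 = background, 1.0 = face element).

    `image` and `alpha_map` are equally sized lists of rows; this per-pixel
    product is an assumed convention, not a step taken from the patent.
    """
    if len(image) != len(alpha_map):
        raise ValueError("image and alpha map must have the same height")
    extracted = []
    for pixel_row, alpha_row in zip(image, alpha_map):
        if len(pixel_row) != len(alpha_row):
            raise ValueError("image and alpha map must have the same width")
        extracted.append([pixel * alpha for pixel, alpha in zip(pixel_row, alpha_row)])
    return extracted


# Hypothetical 2x3 grayscale lip region and its alpha map.
lip_region = [[200, 180, 90], [210, 190, 80]]
alpha_map = [[1.0, 0.5, 0.0], [1.0, 1.0, 0.0]]
lower_lip = apply_alpha_map(lip_region, alpha_map)
```

Pixels with an alpha of zero drop out entirely, while fractional values keep soft edges along the lip contour.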

Next, a method by which the learning unit according to an embodiment of the present invention classifies sample data into different expressions will be described with reference to FIG. 9.

FIG. 9 is a diagram illustrating a data classification method according to an embodiment of the present invention.

The learning unit 125 may classify a plurality of sample images and a plurality of sample words according to the type of expression and cluster them.

As shown in FIG. 9, for example, the learning unit 125 may classify the first image (Image 1), the second image (Image 2), and the first word (Word 1) into a first expression representing "happy". In particular, the first word (Word 1) may correspond to the expression of the first image (Image 1) and the expression of the second image (Image 2).

In addition, the learning unit 125 may classify the third image (Image 3), the fourth image (Image 4), the second word (Word 2), the third word (Word 3), and the fourth word (Word 4) into a second expression representing "surprise". In particular, the second word (Word 2) may correspond to the expression of the third image (Image 3) and the expression of the fourth image (Image 4), the third word (Word 3) may correspond to the expression of the third image (Image 3), and the fourth word (Word 4) may correspond to the expression of the fourth image (Image 4).

In addition, the learning unit 125 may classify the (n-2)-th image (Image n-2), the (n-1)-th image (Image n-1), the n-th image (Image n), the (p-1)-th word (Word p-1), and the p-th word (Word p) into a third expression representing "fear". In particular, the (p-1)-th word (Word p-1) may correspond to the expression of the (n-2)-th image (Image n-2) and the expression of the n-th image (Image n), and the p-th word (Word p) may correspond to the (n-1)-th image (Image n-1).

In addition, the learning unit 125 may classify the (m-2)-th image (Image m-2), the (m-1)-th image (Image m-1), the m-th image (Image m), the (q-1)-th word (Word q-1), and the q-th word (Word q) into a fourth expression representing "disgust". In particular, the (q-1)-th word (Word q-1) may correspond to the expression of the (m-2)-th image (Image m-2) and the expression of the (m-1)-th image (Image m-1), and the q-th word (Word q) may correspond to the expression of the (m-2)-th image (Image m-2), the expression of the (m-1)-th image (Image m-1), and the expression of the m-th image (Image m).

By clustering the sample images and sample words in this way, the learning unit 125 may reduce the amount of computation required to calculate the expression parameters and increase the accuracy of the expressions.
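
The grouping of FIG. 9 can be pictured as a small lookup table. The sketch below is illustrative only: the dictionary layout and the names `CLUSTERS`, `build_word_index`, and `candidate_images` are assumptions, and only the "happy" and "surprise" clusters from the example are reproduced. It shows why clustering cuts the search space: a feature word selects one expression cluster, so only that cluster's images need to be compared.

```python
# Hypothetical cluster table mirroring the first two clusters of the FIG. 9 example.
CLUSTERS = {
    "happy": {
        "images": ["Image 1", "Image 2"],
        "words": {"Word 1": ["Image 1", "Image 2"]},
    },
    "surprise": {
        "images": ["Image 3", "Image 4"],
        "words": {
            "Word 2": ["Image 3", "Image 4"],
            "Word 3": ["Image 3"],
            "Word 4": ["Image 4"],
        },
    },
}


def build_word_index(clusters):
    """Map each sample word to the expression cluster that contains it."""
    index = {}
    for expression, cluster in clusters.items():
        for word in cluster["words"]:
            index[word] = expression
    return index


def candidate_images(clusters, word_index, word):
    """Return the expression for `word` and the sample images linked to it."""
    expression = word_index.get(word)
    if expression is None:
        return None, []
    return expression, clusters[expression]["words"][word]


WORD_INDEX = build_word_index(CLUSTERS)
expression, images = candidate_images(CLUSTERS, WORD_INDEX, "Word 3")
```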

Next, a method by which the learning unit according to an embodiment of the present invention calculates feature point coordinates from a sample image will be described with reference to FIGS. 10 to 12.

FIG. 10 is a diagram illustrating a method of normalizing a sample image according to an embodiment of the present invention.

As shown in FIG. 10, the learning unit 125 normalizes the first sample image 311 through an affine transformation that accounts for translation, rotation, and scaling, using the two pupils of the first sample image 311 as the reference.

In addition, the learning unit 125 normalizes the second sample image 312 through an affine transformation that accounts for translation, rotation, and scaling, using the two pupils of the second sample image 312 as the reference.

FIG. 11 is a diagram illustrating a feature point detection method according to an embodiment of the present invention.

As shown in FIG. 11, the learning unit 125 detects feature points of a predetermined type and number in the first normalized image 331 of FIG. 11, which is generated by normalizing the first sample image 311.

In addition, the learning unit 125 detects feature points of a predetermined type and number in the second normalized image 332 of FIG. 11, which is generated by normalizing the second sample image 312.

FIG. 12 is a diagram illustrating a feature point coordinate calculation method according to an embodiment of the present invention.

As shown in FIG. 12, the learning unit 125 calculates the coordinates of each of the feature points detected in the first normalized image 331.

In addition, the learning unit 125 calculates the coordinates of each of the feature points detected in the second normalized image 332.
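
The pupil-based normalization can be sketched as a transform that moves the two pupils onto fixed canonical positions. The example below is a minimal sketch under stated assumptions: it restricts the affine transformation to translation, rotation, and uniform scaling (a similarity transform), the canonical pupil positions (-1, 0) and (1, 0) are arbitrary, and the name `pupil_alignment` is hypothetical. Complex arithmetic is used only as a compact way to write the 2D rotation and scaling.

```python
def pupil_alignment(right_pupil, left_pupil,
                    target_right=(-1.0, 0.0), target_left=(1.0, 0.0)):
    """Build a point transform that maps the two pupils onto the target positions.

    The mapping z -> a*z + b on complex numbers combines rotation and uniform
    scaling (a) with translation (b). The target positions are assumptions.
    """
    p1, p2 = complex(*right_pupil), complex(*left_pupil)
    if p1 == p2:
        raise ValueError("the two pupils must be distinct points")
    q1, q2 = complex(*target_right), complex(*target_left)
    a = (q2 - q1) / (p2 - p1)  # rotation and scale
    b = q1 - a * p1            # translation

    def transform(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return transform


# Hypothetical pupils of a face that is rotated by 90 degrees in the image.
normalize = pupil_alignment(right_pupil=(10.0, 10.0), left_pupil=(10.0, 30.0))
between_pupils = normalize((10.0, 20.0))
```

Applying the same `normalize` function to every detected feature point yields coordinates that are comparable across sample images.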

Next, the configuration of the animation generator according to an embodiment of the present invention will be described with reference to FIG. 13.

FIG. 13 is a diagram illustrating the configuration of the animation generator according to an embodiment of the present invention.

As shown in FIG. 13, the animation generator 130 includes an expression modifying unit 131, an eigenvector generator 133, an image corrector 135, a voice converter 137, and a voice synthesizer 139.

The expression modifying unit 131 generates a facial expression image by modifying the face elements that constitute the parameter model according to the expression parameter. Here, the expression modifying unit 131 may modify the expression of the parameter model by changing the positions of the feature points of the face elements according to the expression parameter.

The eigenvector generator 133 calculates the feature point coordinates corresponding to each sentence element included in the text data using the phonetic characteristics of that sentence element, and uses the calculated feature point coordinates to generate an eigenvector representing the change in the feature point coordinates when the text data is expressed by voice. Here, the phonetic characteristics may include the movement of the articulatory organs, the movement of the face elements, and the like when the corresponding sentence element is expressed by voice.

The image corrector 135 corrects image distortion in the face animation or the dialogue animation through image warping. In this case, the image corrector 135 may express facial wrinkles and the like using an expressive ratio image (hereinafter also referred to as "ERI").

The voice converter 137 generates voice data by converting the text data into an audible waveform.

The voice synthesizer 139 generates a dialogue animation by synthesizing the face animation, which is generated by applying the eigenvector to the facial expression image, with the voice data generated by the voice converter 137.

Next, a method by which the animation generator according to an embodiment of the present invention determines the coordinates of the pupils will be described with reference to FIGS. 14 to 16.

FIG. 14 is a diagram illustrating coordinates of face elements according to a first embodiment of the present invention.

As shown in FIG. 14, the animation generator 130 may express the coordinates of the right pupil 410, the left pupil 420, and the nose 430 among the face elements as x-axis, y-axis, and z-axis coordinates.

Here, the pupil reference point 400 is the reference point for the right pupil 410 and the left pupil 420.

FIG. 15 is a diagram illustrating coordinates of a pupil according to the first embodiment of the present invention.

As shown in FIG. 15, when the right pupil 410 moves downward while the right eyelid 440 remains still, the animation generator 130 may determine the angle through which the right pupil 410 has moved from the change in the z-axis coordinate of the right pupil 410 about the pupil reference point 400. Here, the z-axis coordinates of the right pupil 410, the left pupil 420, and the right eyelid 440 follow Equation 1.

In Equation 1, "z_eye" denotes the z-axis coordinate of a pupil, "a" denotes a first weight, "x_p" denotes the x-axis coordinate of the reference point 400, "x_r" denotes the x-axis coordinate of the right pupil, and "x_l" denotes the x-axis coordinate of the left pupil. In addition, in Equation 1, "z_eye′" denotes the z-axis coordinate of the eyelid, and "b" denotes a second weight. Here, the first weight is greater than the second weight.

FIG. 16 is a diagram illustrating coordinates of the pupils according to a second embodiment of the present invention.

As shown in FIG. 16, when the right pupil 410 and the left pupil 420 move to the left while the right eyelid 440 and the left eyelid 450 remain still, the animation generator 130 may determine the angle through which the right pupil 410 has moved from the change in the x-axis coordinate of the right pupil 410 about the reference point 400. Here, the x-axis coordinates of the right pupil 410 and the left pupil 420 follow Equation 2.

In Equation 2, "x" denotes the x-axis coordinate of a pupil, and "g(·)" denotes the template function of the pupil.

Here, the animation generator 130 may express the movement of a pupil as a rotation matrix operation. The coordinates of the pupil after the movement follow Equation 3.

In Equation 3, "x′", "y′", and "z′" denote the x-axis, y-axis, and z-axis coordinates of the pupil after it has moved, "x", "y", and "z" denote the x-axis, y-axis, and z-axis coordinates of the pupil before it moves, "R_xy" denotes the rotation matrix for the x-y axes, and "R_yz" denotes the rotation matrix for the y-z axes.
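
Equation 3 itself is not reproduced in this text, so the following is only a sketch of a rotation-matrix update that is consistent with the symbol definitions above. It assumes that R_xy rotates within the x-y plane, that R_yz rotates within the y-z plane, that R_yz is applied first, and that the point is expressed relative to the pupil reference point 400. The function names are hypothetical.

```python
import math


def rotation_xy(angle):
    """Rotation by `angle` radians within the x-y plane (z unchanged)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_yz(angle):
    """Rotation by `angle` radians within the y-z plane (x unchanged)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]


def move_pupil(point, angle_xy, angle_yz):
    """Return (x', y', z') = R_xy * R_yz * (x, y, z).

    The multiplication order is an assumption; the original Equation 3
    may compose the two rotations differently.
    """
    return mat_vec(rotation_xy(angle_xy), mat_vec(rotation_yz(angle_yz), point))


# Hypothetical pupil offset from the reference point, rotated in the y-z plane.
moved = move_pupil([0.0, 1.0, 0.0], angle_xy=0.0, angle_yz=math.pi / 2)
```

Because both matrices are pure rotations, the distance of the pupil from the reference point is preserved, which matches the idea of an eyeball turning about a fixed center.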

Next, a method by which the animation generator according to an embodiment of the present invention determines the coordinates of the lips will be described with reference to FIGS. 17 to 19.

FIG. 17 is a diagram illustrating coordinates of face elements according to a second embodiment of the present invention.

As shown in FIG. 17, the animation generator 130 may express the coordinates of the upper lip 510 and the lower lip 520 among the face elements as x-axis, y-axis, and z-axis coordinates.

FIG. 18 is a diagram illustrating coordinates of the lips according to the first embodiment of the present invention.

As shown in FIG. 18, when the mouth is closed, the animation generator 130 may determine the coordinates of the lower lip 520 using the angle from the jaw reference point 500 to the lower lip 520. Here, the coordinates of the lower lip 520 follow Equation 4.

In Equation 4, "x" denotes the x-axis coordinate in the image, "y" denotes the y-axis coordinate in the image, "f(·)" denotes the template parabola function of the lower lip, "z" denotes the z-axis coordinate at (x, y) in the image, and a further symbol (not reproduced here) denotes the y-axis coordinate of an arbitrarily defined feature point.

FIG. 19 is a diagram illustrating coordinates of the lips according to the second embodiment of the present invention.

As shown in FIG. 19, when the mouth is open, the animation generator 130 may determine the coordinates of the lower lip 520 using the angle by which the mouth is opened. Here, the coordinates of the lower lip 520 follow Equation 5.

In Equation 5, "z′" denotes the z-axis coordinate after the mouth is opened, "y′" denotes the y-axis coordinate after the mouth is opened, "θ" denotes the angle by which the mouth is opened, "z" denotes the z-axis coordinate before the mouth is opened, and "y" denotes the y-axis coordinate before the mouth is opened.

According to Equation 5, when the mouth is opened, every coordinate of the lower lip is converted from (x, y) to (x, y′).
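
Equation 5 is likewise missing from this text. One reading that agrees with the listed symbols (y, z, θ in, y′, z′ out, x unchanged) is a planar rotation of each lower-lip point about the jaw reference point 500. The sketch below illustrates that reading only; the pivot choice, the sign convention of θ, and the name `open_mouth` are assumptions.

```python
import math


def open_mouth(lip_points, jaw_reference, theta):
    """Rotate the (y, z) part of each lower-lip point about the jaw reference.

    The x coordinate is left untouched, matching the statement that (x, y)
    becomes (x, y'). The rotation itself is an assumed form of Equation 5.
    """
    _, y0, z0 = jaw_reference
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y, z in lip_points:
        dy, dz = y - y0, z - z0
        rotated.append((x, y0 + c * dy - s * dz, z0 + s * dy + c * dz))
    return rotated


# Hypothetical lower-lip point one unit above the jaw reference point.
opened = open_mouth([(0.5, 1.0, 0.0)], jaw_reference=(0.0, 0.0, 0.0), theta=math.pi / 2)
```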

Next, a method by which the animation generator according to an embodiment of the present invention generates facial expression images will be described with reference to FIGS. 20 to 23.

FIG. 20 is a diagram illustrating a first facial expression image according to an embodiment of the present invention.

As shown in FIG. 20, the expression modifying unit 131 of the animation generator 130 may generate various facial expression images by controlling the expression parameters through the parameter controller 600. Here, the parameter controller 600 may control the expression parameters corresponding respectively to "Happiness", "Surprise", "Sadness", "Fear", "Disgust", and "Anger".

In particular, the expression modifying unit 131 of the animation generator 130 may generate the first facial expression image 610, which is a face image with no expression, by leaving the expression parameters uncontrolled.

FIG. 21 is a diagram illustrating a second facial expression image according to an embodiment of the present invention.

As shown in FIG. 21, the expression modifying unit 131 of the animation generator 130 may generate various facial expression images by controlling the expression parameters through the parameter controller 600. Here, the parameter controller 600 may control the expression parameters corresponding respectively to "Happiness", "Surprise", "Sadness", "Fear", "Disgust", and "Anger".

In particular, the expression modifying unit 131 of the animation generator 130 may generate the second facial expression image 620, which is a face image showing an angry expression, by controlling the expression parameter corresponding to "Anger".

FIG. 22 is a diagram illustrating a third facial expression image according to an embodiment of the present invention.

As shown in FIG. 22, the expression modifying unit 131 of the animation generator 130 may generate various facial expression images by controlling the expression parameters through the parameter controller 600. Here, the parameter controller 600 may control the expression parameters corresponding respectively to "Happiness", "Surprise", "Sadness", "Fear", "Disgust", and "Anger".

In particular, the expression modifying unit 131 of the animation generator 130 may generate the third facial expression image 630, which is a face image showing a happy expression, by controlling the expression parameter corresponding to "Happiness".

FIG. 23 is a diagram illustrating a fourth facial expression image according to an embodiment of the present invention.

As shown in FIG. 23, the expression modifying unit 131 of the animation generator 130 may generate various facial expression images by controlling the expression parameters through the parameter controller 600. Here, the parameter controller 600 may control the expression parameters corresponding respectively to "Happiness", "Surprise", "Sadness", "Fear", "Disgust", and "Anger".

In particular, the expression modifying unit 131 of the animation generator 130 may generate the fourth facial expression image 640, which is a face image showing a sad expression, by controlling the expression parameter corresponding to "Surprise".
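
How the six expression parameters act on the feature points is not spelled out. A minimal sketch, assuming each parameter is a weight in [0, 1] that linearly scales a per-expression displacement of each feature point, is given below. The linear blend, the data layout, and the names `EXPRESSIONS` and `apply_expression` are all assumptions made for illustration.

```python
EXPRESSIONS = ("happiness", "surprise", "sadness", "fear", "disgust", "anger")


def apply_expression(neutral_points, offsets, weights):
    """Shift each feature point by the weighted sum of per-expression offsets.

    neutral_points: {point_name: (x, y)} for the expressionless face
    offsets:        {expression: {point_name: (dx, dy)}} at full intensity
    weights:        {expression: value in [0, 1]}; missing entries count as 0
    """
    result = {}
    for name, (x, y) in neutral_points.items():
        dx = dy = 0.0
        for expression in EXPRESSIONS:
            weight = weights.get(expression, 0.0)
            ox, oy = offsets.get(expression, {}).get(name, (0.0, 0.0))
            dx += weight * ox
            dy += weight * oy
        result[name] = (x + dx, y + dy)
    return result


# Hypothetical single feature point and its displacement for a full smile.
neutral = {"mouth_corner_left": (40.0, 80.0)}
offsets = {"happiness": {"mouth_corner_left": (-2.0, -4.0)}}
expressionless = apply_expression(neutral, offsets, {})
half_smile = apply_expression(neutral, offsets, {"happiness": 0.5})
```

With every weight left at zero the function returns the neutral points, which corresponds to the expressionless first facial expression image 610.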

Next, a method for generating animation using an avatar according to an embodiment of the present invention will be described with reference to FIG. 24.

FIG. 24 is a diagram illustrating an animation generation method according to an embodiment of the present invention.

As shown in FIG. 24, the input unit 110 first receives input data including a two-dimensional face image and text data (S200).

Next, the model generator 120 extracts a plurality of feature points from the two-dimensional face image included in the input data and extracts a feature word from the text data included in the input data, thereby extracting from the input data feature data that includes the plurality of feature points and the feature word (S210).

Thereafter, the model generator 120 extracts face elements from the input two-dimensional face image using the extracted feature points (S220).

Next, the model generator 120 generates a parameter model that is composed of the extracted face elements and whose expression can be modified by modifying the face elements (S230).

Thereafter, the model generator 120 searches a pre-stored database for the expression parameter corresponding to the extracted feature data (S240).

Next, the animation generator 130 modifies the face elements constituting the parameter model according to the retrieved expression parameter to generate a facial expression image that reflects the expression corresponding to the expression parameter (S250).

Thereafter, the animation generator 130 generates an eigenvector representing the trajectory of the face elements of the facial expression image when the text data is expressed by voice (S260). Here, the animation generator 130 may calculate the feature point coordinates corresponding to each sentence element included in the text data using the phonetic characteristics of that sentence element, and generate the eigenvector using the calculated feature point coordinates.

Next, the animation generator 130 applies the eigenvector to the facial expression image to generate a face animation that reflects the movement of the face elements of the facial expression image while the text data is expressed by voice (S270).

Thereafter, the animation generator 130 generates a dialogue animation by synthesizing the face animation with voice data obtained by converting the text data into a waveform (S280).

Next, the output unit 140 outputs the dialogue animation in a virtual space using an avatar corresponding to the two-dimensional face image included in the input data (S290).
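
The order of steps S210 to S280 can be summarized as a small driver function. The sketch below is only an outline under stated assumptions: each stage is supplied by the caller as a callable, and the stage names in the `stages` dictionary are hypothetical labels rather than components named in the patent. The function fixes nothing except the data flow between the steps.

```python
def generate_dialogue_animation(face_image, text, stages):
    """Run steps S210 to S280 in order; `stages` maps step names to callables."""
    feature_points = stages["extract_feature_points"](face_image)                  # S210
    feature_word = stages["extract_feature_word"](text)                            # S210
    face_elements = stages["extract_face_elements"](face_image, feature_points)    # S220
    model = stages["build_parameter_model"](face_elements)                         # S230
    expression = stages["lookup_expression"](feature_points, feature_word)         # S240
    expression_image = stages["deform_model"](model, expression)                   # S250
    eigenvectors = stages["build_eigenvectors"](text, expression_image)            # S260
    face_animation = stages["apply_eigenvectors"](expression_image, eigenvectors)  # S270
    voice_data = stages["text_to_speech"](text)                                    # S280
    return stages["combine"](face_animation, voice_data)                           # S280
```

Passing the stages in as callables keeps the outline independent of any particular detector, matting method, or speech engine; the output step S290 would consume the returned dialogue animation.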

Next, an image in which a dialogue animation between avatars is output in a virtual space according to an embodiment of the present invention will be described with reference to FIG. 25.

FIG. 25 is a diagram illustrating an output image of a dialogue animation between avatars according to an embodiment of the present invention.

As shown in FIG. 25, the animation generating apparatus 100 first receives first input data 710 and second input data 720. Here, the first input data 710 includes a first two-dimensional face image 711 and a first dialogue sentence 713, and the second input data 720 includes a second two-dimensional face image 721 and a second dialogue sentence 723.

Next, the animation generating apparatus 100 generates a first dialogue animation using the first input data 710 and generates a second dialogue animation using the second input data 720.

Thereafter, in the virtual environment 730, the animation generating apparatus 100 outputs the first dialogue animation through the first avatar 740 corresponding to the first two-dimensional face image 711, and outputs the second dialogue animation through the second avatar 750 corresponding to the second two-dimensional face image 721.

As described above, an optimal embodiment has been disclosed in the drawings and the specification. Although specific terms are used herein, they are used only for the purpose of describing the present invention and are not used to limit their meaning or the scope of the present invention set forth in the claims. Therefore, those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical idea of the appended claims.

100: animation generating apparatus
110: input unit
120: model generator
121: feature extractor
123: searcher
125: learning unit
130: animation generator
131: expression modifying unit
133: eigenvector generator
135: image corrector
137: voice converter
139: voice synthesizer
140: output unit

Claims

An apparatus for generating animation using an avatar, comprising:
a model generator configured to generate a parameter model whose facial expression can be modified, using a plurality of feature points extracted from a two-dimensional face image, and to search a pre-stored database for an expression parameter corresponding to a feature word extracted from text data and to the plurality of feature points;
an animation generator configured to generate, using phonetic characteristics of words included in the text data, a vector representing changes in the positions of the plurality of feature points when the text data is expressed by voice, and to apply the expression parameter and the vector to the parameter model to generate an animation reflecting changes in facial expression while the text data is expressed by voice; and
an output unit configured to output the animation in a virtual space using an avatar corresponding to the two-dimensional face image.

The method according to claim 1,
The model generation unit
And extracting facial elements from the two-dimensional face image using the plurality of feature points, and generating the parametric model capable of modifying an expression by modifying the facial elements.

The apparatus according to claim 2,
wherein the vector represents a trajectory of the facial elements when the text data is expressed by voice.

The apparatus according to claim 3,
wherein the animation generator generates a facial expression image by deforming the facial elements according to the expression parameters, and applies the vector to the facial expression image to generate a facial animation reflecting the movement of the facial elements while the text data is expressed by voice.

The apparatus according to claim 4,
wherein the animation generator comprises:
a voice converter that converts the text data into a waveform to generate voice data corresponding to the text data; and
a voice synthesizer that generates a dialogue animation by synthesizing the facial animation with the voice data.

The apparatus according to claim 2,
wherein the model generator extracts a plurality of templates from the two-dimensional face image using the plurality of feature points, generates matting images corresponding to the plurality of templates through digital image matting, and generates the facial elements using the matting images.

The apparatus according to claim 6,
wherein the model generator extracts the plurality of templates from the two-dimensional face image through template matching.

The apparatus according to claim 1,
wherein the model generator normalizes the plurality of feature points through an affine transformation to generate a normalized face model, and searches for the expression parameters corresponding to the normalized face model and the feature words.

The apparatus according to claim 1,
wherein the database classifies pre-stored sample images and sample words into different expressions and stores a plurality of expression parameters corresponding to each expression.

A method for generating animation using an avatar, comprising:
extracting feature data, including a plurality of feature points and feature words, from input data including a two-dimensional face image and text data;
generating a parametric model whose expression can be modified through parameter control, using the plurality of feature points;
retrieving an expression parameter corresponding to the feature data from a pre-stored database;
generating a vector representing the change in position of the plurality of feature points while the text data is spoken;
generating a facial animation reflecting the change in position of the plurality of feature points while the text data is spoken, by applying the expression parameter and the vector to the parametric model; and
generating a dialogue animation by synthesizing voice data, obtained by converting the text data into a waveform, with the facial animation.

The method of claim 10,
wherein retrieving the expression parameter comprises:
normalizing the plurality of feature points to generate a normalized face model; and
searching the database for the expression parameter corresponding to the normalized face model and the feature words.

The method of claim 11,
wherein the database classifies pre-stored sample images and sample words into different expressions and stores expression parameters corresponding to each expression.

The method of claim 10,
wherein generating the parametric model comprises:
detecting facial elements in the two-dimensional face image using the plurality of feature points; and
generating the parametric model, which is composed of the facial elements and whose expression can be modified by deforming the facial elements.

The method of claim 13,
wherein generating the vector comprises generating a vector representing a trajectory of the facial elements while the text data is expressed by voice.

The method of claim 14,
wherein generating the facial animation comprises:
generating a facial expression model by modifying the expression of the parametric model according to the expression parameter; and
generating a facial animation reflecting the movement of the facial elements while the text data is expressed by voice, by applying the vector to the facial expression model.

The method of claim 10,
further comprising outputting the dialogue animation in a virtual space using an avatar corresponding to the two-dimensional face image.
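
Illustrative sketch (not part of the claims). The claims recite normalizing the plurality of feature points through an affine transformation to obtain a normalized face model, but do not specify which transformation is used. The following minimal Python sketch shows one possible choice, a similarity transform (a special case of an affine transform) that translates, rotates, and scales the points so that two reference points land at fixed positions. The function name `normalize_feature_points`, the use of two eye centers as reference points, and the target positions (0, 0) and (1, 0) are assumptions made for illustration only and are not taken from the specification.

```python
import math


def normalize_feature_points(points, left_eye=0, right_eye=1):
    """Map 2D feature points into a canonical frame.

    Assumption: points[left_eye] and points[right_eye] are eye centers.
    After normalization the left eye is at (0, 0) and the right eye at (1, 0),
    so faces of different size, position, and in-plane rotation become comparable.
    """
    lx, ly = points[left_eye]
    rx, ry = points[right_eye]
    dx, dy = rx - lx, ry - ly
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("reference points coincide; cannot normalize")

    # Cosine and sine of the angle between the eye line and the x-axis.
    cos_a, sin_a = dx / dist, dy / dist

    normalized = []
    for x, y in points:
        # Translate so the left eye becomes the origin.
        tx, ty = x - lx, y - ly
        # Rotate by the negative eye-line angle, then scale by 1 / dist.
        nx = (tx * cos_a + ty * sin_a) / dist
        ny = (-tx * sin_a + ty * cos_a) / dist
        normalized.append((nx, ny))
    return normalized


if __name__ == "__main__":
    # Hypothetical feature points: two eye centers and one mouth point.
    face = [(120.0, 80.0), (180.0, 80.0), (150.0, 140.0)]
    print(normalize_feature_points(face))
```

With points normalized this way, a database search for matching expression parameters could compare coordinates directly, independent of where the face appears in the input image or how large it is.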