KR101436730B1

KR101436730B1 - 3D face fitting method of unseen face using active appearance model

Info

Publication number: KR101436730B1
Application number: KR1020130031950A
Authority: KR
Inventors: 강행봉; 주명호
Original assignee: 가톨릭대학교 산학협력단
Priority date: 2013-03-26
Filing date: 2013-03-26
Publication date: 2014-09-02

Abstract

The present invention provides a method for 3D face fitting of an unlearned face using an active appearance model. The method comprises the steps of: (a) inputting an image; (b) extracting a face region including an internal region and an external region adjacent to the internal region by warping the input image to an average image; (c) applying a single scale retinex (SSR) to the face region, and extracting face texture from the face region by mapping the application result to the face region within a range from a predetermined maximum result value to a predetermined minimum result value; (d) extracting a texture error by applying the extracted face texture to an active appearance model; (e) extracting a face shape from the extracted face texture; (f) extracting a shape error by applying the extracted face shape to the active appearance model; and (g) giving a predetermined weight to the texture error and the shape error, aggregating the weighted errors, and fitting a face image in the input image to a model face by applying the aggregate result to the active appearance model. Therefore, the method can minimize a generalized error by considering the texture and shape of a face.

Description

TECHNICAL FIELD

The present invention relates to a 3D face fitting method for unseen faces (faces not used in training) using an active appearance model, and more particularly to such a method that minimizes a generalized error taking both the texture and the shape of the face into account.

With the recent growth of image processing applications, face model fitting can be used effectively in fields such as face recognition, photo retouching, and robot interaction. Although face model fitting has been studied actively, fitting unseen faces remains a major open problem and is a main reason why the technique is difficult to use in practical applications.

Accurate face model fitting requires finding the face shape in the input face while minimizing the error between the input face and the fitted face model. Face model fitting methods fall broadly into two categories.

The first category is local feature-based methods, which locate each major facial feature point using a learned local template or a regression function. The variation of each facial feature is learned from training faces, and the full set of facial feature points is assembled using a parametric model such as the Active Appearance Model (AAM).

However, large errors can occur at individual feature points, which can lead to inaccurate face fitting. In addition, because the fitted face model varies depending on the parametric model used, these methods are not well suited to fitting unseen faces.

The second category is model-based methods, which minimize the error between the input face and the fitted face model. Their fitting accuracy depends on the error function used and on how well that function is optimized.

Model-based methods give more accurate fitting results than local feature-based methods, but because the variation in the training faces is limited, the variation the face model can fit is also limited, which makes them ineffective for unseen faces. Model-based methods are also sensitive to the initial parameters and prone to falling into local minima during optimization.

To fit a wider variety of faces effectively with a model-based method, the error function must reflect general facial variation rather than the limited variation of the training faces, and a mechanism is needed that keeps the optimization from falling into local minima.

To build a generalized error model for unseen-face fitting, the texture and shape characteristics of the face are considered together. The main causes of texture differences between faces are differences in facial texture between different people and differences in illumination on the face. The first arises from skin color, facial contours, the presence or absence of hair or a beard, and so on; because these vary greatly from person to person, they are difficult to minimize. The texture difference caused by illumination, on the other hand, is very large because illumination strongly changes the face pixels in the input image, so reducing it reduces the overall texture difference.

Because face shape is similar even across different people, it can be a key factor in face fitting. Previous work that incorporates face shape trained the face model jointly with the x- and y-direction gradients of each pixel. However, the gradients learned this way are limited to the variation present in the training faces, so they do not properly capture the similarity of face shape across people.

Meanwhile, various face fitting methods using face models have been proposed to reflect the general characteristics of faces. For example, T. F. Cootes, G. J. Edwards, and C. J. Taylor, in "Active Appearance Models" (IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681-685, 2001), proposed the Active Appearance Model (AAM), which combines face shape and texture.

The shape and texture spaces of the face are built by applying Principal Component Analysis (PCA) to the training faces, and the AAM adjusts the texture and shape of the face together through a model parameter c. To fit the face model to an input face, the AAM estimates the model parameters together with transformation parameters t, such as the scale and translation of the model, by minimizing the error between the input face and the face model as in [Equation 1].

[Equation 1]

Here, W(I) is the function that warps the input face image I to the mean face shape, A is the texture parameter, g_i is a texture eigenvector, and the mean face texture vector is that of the training images.
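As a minimal sketch of the kind of error that [Equation 1] minimizes, the following pure-Python code assumes the standard linear AAM texture model (a mean texture plus a weighted sum of orthonormal eigenvectors) and measures the squared residual between a warped input texture and its reconstruction. The function names, the projection step, and the toy vectors are illustrative assumptions and are not taken from the patent.

```python
def project(texture, mean, eigvecs):
    """Texture parameters: projection of (texture - mean) onto each eigenvector."""
    centered = [t - m for t, m in zip(texture, mean)]
    return [sum(c * g for c, g in zip(centered, vec)) for vec in eigvecs]


def reconstruct(params, mean, eigvecs):
    """Model texture: mean + sum_i params[i] * eigvecs[i]."""
    out = list(mean)
    for p, vec in zip(params, eigvecs):
        out = [o + p * g for o, g in zip(out, vec)]
    return out


def texture_error(texture, mean, eigvecs):
    """Squared L2 residual between the warped texture and its reconstruction."""
    params = project(texture, mean, eigvecs)
    model = reconstruct(params, mean, eigvecs)
    return sum((t - m) ** 2 for t, m in zip(texture, model))


# Toy example: 3-pixel textures and one orthonormal eigenvector (assumed values).
mean = [0.5, 0.5, 0.5]
eigvecs = [[1.0, 0.0, 0.0]]
warped = [0.9, 0.6, 0.5]  # stands in for the warped input W(I)
err = texture_error(warped, mean, eigvecs)
```

Plain lists are used so that the sketch has no dependencies; a practical implementation would normally use an array library and would also optimize the transformation parameters t, which this sketch leaves out.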

By learning facial expression changes from the training faces, the AAM can fit faces fairly accurately across a variety of expressions. However, because it is difficult to learn every variation caused by 3D pose changes, the AAM is generally useful only for frontal face images.

To address this problem, T. F. Cootes, G. V. Wheeler, K. N. Walker, and C. J. Taylor, in "View-based active appearance models" (Image and Vision Computing, vol. 20, 657-664, 2002), proposed the view-based AAM, which partitions face poses and builds an independent AAM for each pose. To fit an input face with a 3D pose change, this method selects the AAM that has the smallest error against the input image. However, because it is difficult to learn every 3D pose variation, accurate fitting results are hard to obtain.

M. Zhou, L. Liang, J. Sun, and Y. Wang, in "AAM based Face Tracking with Temporal Matching and Face Segmentation" (CVPR, 2010), proposed a face fitting method for video that combines the view-based AAM with additional techniques such as face region weighting and temporal matching between key face regions. However, this method has difficulty fitting faces when the difference between the model and the input face is large, and the additional computation requires substantial processing time.

Because a real human face is three-dimensional, a 3D model is more effective than a 2D model for face fitting. Accordingly, V. Blanz and T. Vetter, in "A Morphable Model for the Synthesis of 3D Faces" (SIGGRAPH '99 Conference Proceedings, 1999), proposed the 3D Morphable Model (3DMM), a method based on a 3D face model.

The 3DMM uses 3D face information acquired directly with a 3D scanner or depth camera and represents the face as a statistical model of 3D shape and texture. The shape and texture spaces are built in the same way as in the AAM. However, methods of this kind are costly because they must acquire 3D face information directly, so they are not easy to apply in practice.

Recently, 3DAAM methods have been proposed that build on the AAM and estimate 3D face information from 2D images. C.-W. Chen and C.-C. Wang, in "3D Active Appearance Model for Aligning Faces in 2D Images" (Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 3133-3139, 2008), proposed a 3DAAM method that combines the AAM with depth information estimated from stereo.

However, building a 3D face model from stereo requires a large amount of computation to estimate face depth, which makes real-time fitting difficult. Moreover, because a face contains many similar-looking regions, owing for example to uniform skin color, stereo-based depth estimation can yield inaccurate 3D information.

Because an unseen face can differ greatly in shape and texture from the training faces, the model-based methods described above are not well suited to unseen-face fitting. To handle this difference, C. Zhao, W.-K. Cham, and X. Wang, in "Face Alignment with a Generic Deformable Face Model" (CVPR, 2011), proposed a model-based method that aligns multiple images of an unseen face at once. While using the AAM, the method fits the model to each face by minimizing the rank of a matrix formed from the face vectors.

This rank minimization approach was proposed by Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma in "Robust Alignment by Sparse and Low-rank Decomposition for Linearly Correlated Images" (CVPR, 2010) and has the advantage that the input face does not depend on the training faces. However, because it needs a large number of mutually correlated face vectors to build the matrix whose rank is minimized, it is not suited to fitting a single face image or a small number of face images.

To fit a single face image effectively, methods have been proposed that combine the local variation of each facial feature point with a model-based method. J. Heo and M. Savvides, in "In Between 3D Active Appearance Models and 3D Morphable Models" (IEEE Conf. on Computer Vision and Pattern Recognition Workshops, 20-26, 2009), built a 3DAAM by computing the projection matrix of the face from a 2D face image and proposed CASAAM, which combines it with the Active Shape Model (ASM).

Because the ASM considers local variation at each facial feature point, it is more effective for unseen faces. CASAAM fits unseen faces effectively by selecting, according to the reconstruction error, whichever of the AAM and the ASM is more effective.

Methods that instead incorporate regression functions have also been proposed. These detect facial feature points with a regression function rather than an optimization procedure. The regression function is learned from a very large dataset of samples labeled true or false and, through boosting, detects facial feature points with less error than the ASM. However, methods using the ASM or regression functions rely on the parametric model of a model-based method and are therefore limited in the shape variation they can represent.

Therefore, effective fitting of unseen faces must consider a general error over the whole face, as model-based methods do, and the error model must take both local and global facial feature variation into account.

SUMMARY OF THE INVENTION

Accordingly, the present invention has been devised to solve the problems described above. Its object is to provide a 3D face fitting method for unseen faces using an active appearance model that, unlike existing model-based methods, does not depend on the shape and texture variation within the training faces and that, by combining texture and shape errors reflecting the general characteristics of faces, can fit a variety of unseen faces after learning only a limited number of face variations.

According to the present invention, this object is achieved by a 3D face fitting method for unseen faces using an Active Appearance Model, the method comprising: (a) receiving an input image; (b) extracting a face region, including an inner region and an outer region adjacent to the inner region, by warping the input image to a mean image; (c) applying Single Scale Retinex (SSR) to the face region and extracting a face texture from the face region by mapping the result within the range between a preset result maximum and a preset result minimum; (d) extracting a texture error by applying the extracted face texture to the Active Appearance Model; (e) extracting a face shape from the extracted face texture; (f) extracting a shape error by applying the extracted face shape to the Active Appearance Model; and (g) applying preset weights to the texture error and the shape error, summing them, and fitting the face in the input image to the model face by applying the sum to the Active Appearance Model.
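Step (g) can be sketched as a weighted sum of the two error terms. The sketch below is a minimal illustration: the weight values, the function name, and the candidate-selection usage are assumptions, since this section states only that the weights are preset.

```python
def combined_error(texture_err, shape_err, w_texture=0.5, w_shape=0.5):
    """Weighted sum of texture and shape errors (weights are placeholder values)."""
    return w_texture * texture_err + w_shape * shape_err


# Hypothetical usage: pick the candidate parameter update with the lowest
# combined error. Each tuple is (label, texture error, shape error).
candidates = [("update_a", 2.0, 4.0), ("update_b", 3.0, 1.0)]
best = min(candidates, key=lambda c: combined_error(c[1], c[2], 0.25, 0.75))
```

In an iterative fitting loop, a score of this kind would be evaluated at each step to decide whether a parameter update improves the fit.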

In step (b), the face region may be extracted as a rectangular region that contains the inner face region.

In step (d), the texture error may be computed over the inner face region.

In step (f), the shape error may be computed with an adjustment weight obtained by expanding the edges of the mean face shape.

Here, the adjustment weight may be calculated by the equation

(where w_d(x, y) is the adjustment weight for each pixel, α is a preset magnitude weight parameter, S_Intensified(x, y) is the edge-expanded image, S_mean(x, y) is the mean face shape of the training face images, and G is a Gaussian function).

In step (f), the shape error may be extracted over the inner face region expanded by a predetermined number of pixels.

With the above configuration, the present invention provides a 3D face fitting method for unseen faces using an active appearance model that minimizes a generalized error taking the texture and shape of the face into account.

FIG. 1 schematically illustrates the process of extracting the texture error and the shape error in the 3D face fitting method for unseen faces according to the present invention.
FIG. 2 illustrates the fitting process of the 3D face fitting method for unseen faces according to the present invention.
FIG. 3 illustrates an example of generating the weight function for the shape error using the mean shape of the training face images.
FIGS. 4 to 6 illustrate the effect of the 3D face fitting method for unseen faces according to the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

To reflect general facial characteristics, the 3D face fitting method for unseen faces according to the present invention modifies the error function of the Active Appearance Model (hereinafter "AAM") described above.

The error function modified by the method according to the present invention includes both a texture error and a shape error of the face. FIG. 1 schematically illustrates the process of extracting the texture error and the shape error, and FIG. 2 illustrates the fitting process of the method.

Referring to FIGS. 1 and 2, an input image is first received (S10). When the input image is received, feature points of the face region are extracted from it, as shown in FIG. 1, in order to learn the face model.

Here, the input image and the training images are distinct. The training images are first annotated by hand with feature points and are used to learn the variation of the appearance model. For the input image, the final face fitting is obtained by starting from the initial position of the appearance model and iteratively optimizing through steps S20 to S80.

In the method according to the present invention, the face region from which feature points are extracted includes the inner face region and the outer face region adjacent to it. FIG. 2 shows an example in which feature points are extracted from a rectangular face region. For the training images, feature extraction is based on the annotated feature points; for the input image, it is performed on the region of the input image at the model's current position in the fitting process.

Learning the inner and outer face regions together in this way removes an error that arises when only the texture of the inner face region is used, namely that the face shape extracted from the texture does not appear correctly in the jaw region.

Meanwhile, the input image from which the feature points were extracted is warped to the mean image to extract the face region (S20). Single Scale Retinex (SSR) is then applied to the extracted face region to extract the face texture (S30). In applying SSR to the face region, the method according to the present invention maps the result within the range between a preset result maximum and a preset result minimum to extract the face texture from the face region. The extracted face texture is then applied to the Active Appearance Model to extract the texture error (S60).

Next, the face shape is extracted from the extracted face texture (S40), and the extracted face shape is applied to the Active Appearance Model to extract the shape error (S70).

In more detail, the error function of the method according to the present invention is formed by combining the texture error and the shape error. To minimize the texture difference caused by illumination, the texture error is computed by applying Single Scale Retinex (SSR) to the face image in the form of a Ranged Single Scale Retinex (RSSR), in which the output range is fixed by a maximum result value and a minimum result value.

In general, Single Scale Retinex (SSR) estimates the reflectance of an image from the ratio of each pixel to its local average. SSR can be defined as in [Equation 2].

[Equation 2]

Here, F(x, y) is the input face image, R(x, y) is the result of applying Single Scale Retinex (SSR), G(x, y) is a Gaussian filter, and the operator in the equation denotes convolution.
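For reference, Single Scale Retinex is commonly written in the literature as the log ratio of each pixel to its Gaussian-weighted neighborhood. The form below is that standard formulation expressed with the symbols defined above, with * as an assumed symbol for convolution; the patent's own rendering of [Equation 2] may differ in notation.

```latex
R(x, y) = \log F(x, y) - \log\bigl[ G(x, y) * F(x, y) \bigr]
```

Under this form, a result range of log(1.0) to log(1.6) corresponds to pixels that are between 1.0 and 1.6 times as bright as their local average.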

To render the result R(x, y) as an image, Single Scale Retinex (SSR) maps it to pixel values in the range [0, 255], using the maximum and minimum values found within the result as the bounds. Because of this mapping, the SSR output for the same pixel differs depending on those maximum and minimum values.

Accordingly, as described above, the method according to the present invention replaces the maximum and minimum of the SSR result with a preset result maximum and a preset result minimum. This makes it possible to strengthen or weaken the shape extracted from the result. In the present embodiment, log(1.6) is used as the result maximum and log(1.0) as the result minimum, by way of example.
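The ranged mapping described above can be sketched as follows, assuming the standard log-ratio form of SSR. The Gaussian width, the kernel radius, the border clamping, and the +1 offset inside the logarithms are illustrative assumptions; only the bounds log(1.0) and log(1.6) come from the text.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, sigma=2.0, radius=4):
    """Separable Gaussian blur on a 2-D list, clamping indices at the borders."""
    k = gaussian_kernel(sigma, radius)
    h, w = len(image), len(image[0])

    def clamp(v, size):
        return max(0, min(size - 1, v))

    rows = [[sum(k[j + radius] * image[y][clamp(x + j, w)]
                 for j in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + radius] * rows[clamp(y + j, h)][x]
                 for j in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]


def ranged_ssr(image, r_min=math.log(1.0), r_max=math.log(1.6),
               sigma=2.0, radius=4):
    """SSR result clipped to [r_min, r_max] and mapped linearly to [0, 255]."""
    smooth = blur(image, sigma, radius)
    out = []
    for row, srow in zip(image, smooth):
        out_row = []
        for f, s in zip(row, srow):
            r = math.log(f + 1.0) - math.log(s + 1.0)  # +1 avoids log(0) (assumption)
            r = max(r_min, min(r_max, r))               # fixed range instead of data min/max
            out_row.append(255.0 * (r - r_min) / (r_max - r_min))
        out.append(out_row)
    return out


# A flat image has a pixel-to-neighborhood ratio of about 1 everywhere,
# so every pixel maps to approximately 0.
flat = [[100.0] * 5 for _ in range(5)]
result = ranged_ssr(flat)
```

Because the bounds are fixed rather than taken from each image, the same local contrast maps to the same output value regardless of what else appears in the image.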

Meanwhile, in computing the shape error, the face shape is not obtained from a separately learned shape model but is extracted directly from the face texture, as described above. Because the shape information extracted from the face texture varies with facial expression and 3D face pose, the present invention extracts it from the image warped to the mean face, as in step S20 of FIG. 2.

The face shape F_s is the magnitude of the pixel change along the x and y axes and can be expressed as in [Equation 3].

[Equation 3]
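The idea of taking the face shape as the magnitude of the pixel change along x and y can be sketched as a gradient-magnitude image. The operator used in [Equation 3] is not reproduced in this text, so the central differences and border clamping below are assumptions, as is the function name.

```python
import math


def face_shape(texture):
    """Gradient magnitude of a 2-D texture using central differences."""
    h, w = len(texture), len(texture[0])
    shape = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, with neighbor indices clamped at the border.
            gx = (texture[y][min(x + 1, w - 1)] - texture[y][max(x - 1, 0)]) / 2.0
            gy = (texture[min(y + 1, h - 1)][x] - texture[max(y - 1, 0)][x]) / 2.0
            shape[y][x] = math.hypot(gx, gy)
    return shape


# A vertical step edge: the left half is dark and the right half is bright.
step = [[0.0, 0.0, 10.0, 10.0] for _ in range(3)]
edges = face_shape(step)
```

The response is nonzero only in the two columns that straddle the step, which is the behavior that lets contours such as the jaw line show up in the extracted shape.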

As described above, the error model of the method according to the present invention reflects the texture and shape variation of the face. For the texture error, because a rectangular face region is used as described above, both the inside and the outside of the face are learned.

In the outer face region, the training images and the input image differ greatly, so using the learned texture of that region directly in the error inevitably gives incorrect error values. To resolve this, the method according to the present invention computes the texture error only over the inner face region.

For the shape error, parts of the face shape such as the jaw are formed adjacent to the inner face region, so the error is computed over the inner region extended by a predetermined number of pixels.
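The two regions described above can be sketched as boolean masks: the texture error is summed over the inner face mask, and the shape error over the same mask grown by a few pixels. This is only a sketch under stated assumptions: the sum of squared differences, the 4-neighbour dilation, and the names `dilate` and `masked_sq_error` are illustrative choices, since [Equation 4] and [Equation 5] are not reproduced here.

```python
import numpy as np

def dilate(mask, pixels):
    """Grow a boolean mask by `pixels` steps of 4-neighbour dilation."""
    out = np.asarray(mask, dtype=bool).copy()
    for _ in range(pixels):
        # Pad with False so the mask does not wrap around the borders.
        p = np.pad(out, 1, mode="constant", constant_values=False)
        out = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
               | p[1:-1, :-2] | p[1:-1, 2:])
    return out

def masked_sq_error(a, b, mask):
    """Sum of squared differences restricted to `mask` (assumed error form)."""
    diff = (np.asarray(a, dtype=float) - np.asarray(b, dtype=float))[mask]
    return float(np.sum(diff ** 2))

# Toy 7x7 region with a 3x3 "inner face".
inner = np.zeros((7, 7), dtype=bool)
inner[2:5, 2:5] = True
extended = dilate(inner, 1)           # inner region plus a 1-pixel band

model = np.zeros((7, 7))
observed = np.ones((7, 7))
texture_error = masked_sq_error(observed, model, inner)
shape_error = masked_sq_error(observed, model, extended)
```

The extended mask lets contour features that sit just outside the inner region, such as the jawline, contribute to the shape error without admitting the whole background.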

The face shape information of the face image warped to the mean face in step S20 is expressed similarly regardless of facial expression or identity, because the positions of the eyes, nose, and mouth are the same. FIG. 3 shows examples of face textures generated for various faces and the face shapes extracted from them. As shown in FIG. 3, the face shapes extracted by the 3D face fitting method for unlearned faces according to the present invention are more similar to one another than the varied input faces are. Therefore, combining the texture error and the shape error makes face fitting possible for a variety of unlearned faces.

Meanwhile, from [Equation 1] for face fitting of the Active Appearance Model, the texture error arising from the texture difference between the input image and the face model can be defined as in [Equation 4].

[Equation 4]

Here, the RSSR(·) function applies the Ranged Single Scale Retinex (RSSR) according to the present invention. Similarly, the shape error between the input face and the face model can be defined as in [Equation 5].

[Equation 5]

The Shape(·) function is the shape extraction function based on [Equation 3]. As described above, the shape information of the input image and of the face model is not learned but is extracted directly from the texture information. Accordingly, to fit the face model to the input face, the texture error and the shape error are given preset weights and summed, as shown in FIG. 2 (S70). [Equation 6] gives the weighted, summed error.

[Equation 6]

Here, w_e is a weight that balances the difference in magnitude between the texture error and the shape error; as an example, it is set to 1 so that the two errors are weighted equally.

Once the texture error and the shape error have been summed as described above, the sum is applied to the Active Appearance Model to fit the face image in the input image to the face model, as shown in FIG. 2 (S80).

Meanwhile, as described above, the shape error is computed as the difference between the face shapes extracted with [Equation 3] from the input face and from the face model being fitted. For effective face fitting with the shape error, the shape error should be small when an edge region of the input face lies on an edge region of the face model.

Conversely, when an edge region lies where the face model has no edge, the shape error should be large. In addition, for effective error optimization, the shape error should decrease gradually as the edges of the input face approach the edges of the face model during fitting. However, because face edges appear as sharp peaks, the shape error does not decrease gradually in this way.

Accordingly, as an example, the 3D face fitting method for unlearned faces according to the present invention computes the shape error by applying adjustment weights derived from the mean face shape of the training face images, so that a shape error reflecting the general shape of a face can be used to fit a variety of unlearned faces.

The mean face shape of the training faces represents the general form of a face shape. FIG. 3 illustrates the process of generating the adjustment weights in the 3D face fitting method for unlearned faces according to the present invention.

The adjustment weight is defined as a function generated by widening the edge pixels of the mean face shape. The edge-widened image S_Intensified(x,y) is defined by taking, at each pixel, the larger of the mean face shape image and its Gaussian-filtered version, and can be expressed as in [Equation 7].

[Equation 7]

Here, S_mean(x,y) is the mean face shape of the training face images and G is a Gaussian function. From the edge-widened mean face image, the adjustment weight for each pixel is computed by [Equation 8].

[Equation 8]

Here, w_d(x,y) is the adjustment weight for each pixel and α is a preset magnitude weight parameter. The summed error incorporating these adjustment weights can be expressed as in [Equation 9].

[Equation 9]
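The edge-widening step described for [Equation 7], taking at each pixel the larger of the mean face shape and its Gaussian-filtered version, can be sketched as below. The kernel width (`sigma`, `radius`), the separable convolution, and the function names are assumptions for illustration. The mapping from the widened image to the adjustment weight w_d in [Equation 8] is not reproduced in this text, so it is deliberately left out.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma=1.0, radius=3):
    """Separable Gaussian blur with zero padding at the borders.

    Each image side must be at least 2 * radius + 1 pixels long.
    """
    k = gaussian_kernel(sigma, radius)
    blur_1d = lambda v: np.convolve(v, k, mode="same")
    rows = np.apply_along_axis(blur_1d, 1, np.asarray(image, dtype=float))
    return np.apply_along_axis(blur_1d, 0, rows)

def intensify_edges(s_mean, sigma=1.0, radius=3):
    """Pixel-wise maximum of the shape map and its blurred copy."""
    s_mean = np.asarray(s_mean, dtype=float)
    return np.maximum(s_mean, gaussian_blur(s_mean, sigma, radius))

# A single edge pixel keeps its peak value and gains a soft halo.
s_mean = np.zeros((9, 9))
s_mean[4, 4] = 1.0
s_intensified = intensify_edges(s_mean)
```

Taking the maximum rather than the blurred image alone keeps the original edge strength at the edge itself while giving neighbouring pixels a non-zero value, which is what lets an error based on this map fall off gradually as two edges approach each other.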

Meanwhile, one of the main problems in fitting unlearned faces is limited shape variation. This problem arises because the face model can change shape only within the variation of the training faces, whereas an input face may take a shape very different from any training face. As a result, model-based methods cannot fit unlearned faces accurately.

However, the facial feature points corresponding to each part of the face lie at similar positions regardless of changes in face shape. For example, the nose is at the center of the face, the mouth is below the nose, and the eyes are above the nose. Therefore, to allow the variation in facial features needed to express varied faces, the 3D face fitting method for unlearned faces according to the present invention adjusts the features of each component of the face shape, as an example.

More specifically, additional shape parameters s = {s₁, s₂, …, s_n} are defined to adjust the variation of the feature points corresponding to each part of the face shape. Each shape parameter independently adjusts the feature points of one part of the face shape. The fitting error produced by the shape parameters uses the texture error of [Equation 9].

However, if the shape parameters change greatly in order to minimize the texture error, it becomes difficult for the face model to fit the shape of the input face correctly. For example, if a parameter controlling eye size makes the eyes too large, the eye region can extend past the face outline. A constraint on the change of the shape parameters is therefore needed. [Equation 10] incorporates the shape parameters into [Equation 9].

[Equation 10]

Here, w_s and α_i are weight parameters for adjusting the change error of the shape parameters and the change error of each part of the face shape.

One way to optimize such shape parameters is to learn, from the training faces, how the error changes with the parameters, as in the conventional Active Appearance Model (AAM). However, because the training faces cannot cover every face that may be input, the optimization result is inaccurate and easily falls into local minima, so correct face fitting cannot be achieved.

To solve this, the 3D face fitting method for unlearned faces according to the present invention computes the Jacobian matrix with respect to shape parameter changes at each step of face fitting, and from it computes accurate optimal shape parameters. This requires more computation than learning in advance, but enables accurate fitting to a variety of unlearned face shapes.
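The idea of recomputing the Jacobian at every fitting step, instead of learning a fixed parameter-to-error mapping in advance, can be sketched as follows. The text does not state how the Jacobian is evaluated or which update rule is used, so the forward finite differences, the Gauss-Newton least-squares update, and the names `numeric_jacobian` and `gauss_newton_step` are all assumptions; `residual_fn` is a hypothetical stand-in for the per-pixel error of [Equation 10].

```python
import numpy as np

def numeric_jacobian(residual_fn, params, eps=1e-6):
    """Forward-difference Jacobian of a residual vector w.r.t. params."""
    params = np.asarray(params, dtype=float)
    r0 = np.asarray(residual_fn(params), dtype=float)
    jac = np.empty((r0.size, params.size))
    for i in range(params.size):
        shifted = params.copy()
        shifted[i] += eps
        jac[:, i] = (np.asarray(residual_fn(shifted), dtype=float) - r0) / eps
    return jac

def gauss_newton_step(residual_fn, params):
    """One update: solve J * delta = -r in the least-squares sense."""
    params = np.asarray(params, dtype=float)
    jac = numeric_jacobian(residual_fn, params)   # rebuilt at every step
    r = np.asarray(residual_fn(params), dtype=float)
    delta, *_ = np.linalg.lstsq(jac, -r, rcond=None)
    return params + delta

# Toy residual: linear in two "shape parameters", so one step suffices.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
target = np.array([0.5, -0.25])
residual = lambda s: A @ s - A @ target
s_new = gauss_newton_step(residual, np.zeros(2))
```

Each call to `numeric_jacobian` evaluates the residual once per parameter, which is the extra computation the text refers to; in exchange, the update direction always reflects the current input face rather than the statistics of the training set.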

The following describes in detail the experimental results verifying the effectiveness of the 3D face fitting method for unlearned faces according to the present invention.

Four public databases were used for verification. The first is the IMM DB, which contains 240 images of 40 people: for each person, two frontal face images (neutral and happy), two images rotated ±30 degrees to the left and right, one frontal image under spot lighting, and one frontal image with an arbitrary expression. FIG. 4 shows examples from the IMM DB.

The second is the BioID DB, which consists of 1,521 near-frontal face images of 23 people. The third is the FGNet Talking Face video, which consists of 5,000 frames of one person being interviewed. Finally, the Labeled Faces in the Wild (LFW) DB is a collection of arbitrary face images gathered from the Web with varied backgrounds, lighting, face poses, and expressions; it has no ground-truth facial feature point information.

Two evaluations were performed. First, the accuracy of the 3D face fitting method for unlearned faces according to the present invention was compared with conventional methods using the four DBs. To evaluate unlearned faces, all methods were trained on the IMM DB before evaluation. Second, the proposed method and the conventional methods, all trained on the IMM DB, were applied to the LFW DB to evaluate fitting on a variety of unlearned faces.

The conventional fitting methods compared with the 3D face fitting method for unlearned faces according to the present invention were 2D AAM, Approximated 3D AAM, and multi-band AAM. In addition, to evaluate each component technique of the method, results were compared with RSSR, the adjustment function, and the shape parameters applied selectively.

To evaluate the accuracy of the fitting results, [Equation 11] was used, based on the distances between the fitted feature points and the ground-truth feature points. The distances were normalized by the inter-eye distance so that the result does not depend on the size of the face in the input image.

[Equation 11]

Here, the two position symbols in [Equation 11] denote the feature point locations of the ground truth and of the fitting result, respectively.
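A point-to-point error normalized by the inter-eye distance, as described for [Equation 11], might look like the sketch below. Averaging over the feature points and measuring the eye distance between two given eye positions are assumptions, since the equation itself is not reproduced here; the function name and arguments are hypothetical.

```python
import math

def normalized_fitting_error(ground_truth, fitted, left_eye, right_eye):
    """Mean landmark distance divided by the inter-eye distance.

    ground_truth, fitted: equal-length sequences of (x, y) points.
    left_eye, right_eye: (x, y) eye positions used for normalization.
    """
    if len(ground_truth) != len(fitted) or not ground_truth:
        raise ValueError("point lists must be non-empty and equal in length")
    eye_distance = math.dist(left_eye, right_eye)
    if eye_distance == 0:
        raise ValueError("eye positions must be distinct")
    total = sum(math.dist(g, f) for g, f in zip(ground_truth, fitted))
    return total / (len(ground_truth) * eye_distance)

# Two landmarks: one is off by 5 pixels, the other is exact.
gt_points = [(0.0, 0.0), (10.0, 0.0)]
fit_points = [(3.0, 4.0), (10.0, 0.0)]
error = normalized_fitting_error(gt_points, fit_points, (0.0, 0.0), (10.0, 0.0))
```

Because both the landmark distances and the eye distance scale with image size, the same fit on an enlarged or reduced copy of the image gives the same value.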

First, the IMM DB was evaluated. For this evaluation, the DB was divided into two groups, each containing images from 20 different people.

All methods were trained on the neutral and smiling frontal face images of the first group. The IMM DB provides ground truth for each image as 58 facial feature points. However, it provides only the feature point positions, so a face mesh was generated as shown in FIG. 4 in order to apply the AAM of each fitting method. To build the 3D AAM, the 3D shape of the face was reconstructed from the two side-view images of each person in the DB.

The IMM DB evaluation had two parts. First, the fitting accuracy of each method was evaluated on the spot-lit faces and arbitrary-expression faces of the 20 people used for training. Second, fitting accuracy was evaluated on the face images of the 20 people not included in training. Each person's images were divided into three categories for evaluation.

The first category comprised the two frontal images with neutral and smiling expressions, the same expressions as the training faces; the second was the spot-lit image; and the third was the frontal image with an arbitrary expression.

[Table 1] shows the fitting error of each method on the IMM DB. The 3D face fitting method for unlearned faces according to the present invention fitted faces with the smallest error, whereas the other existing fitting methods fitted less accurately because of image noise and texture differences from the training faces.

[Table 1]

To evaluate unlearned faces, the methods were trained on the IMM DB and then evaluated on face fitting for the BioID DB. All evaluated methods were trained on 80 face images, two for each of the 40 people in the IMM DB. The BioID DB provides fewer ground-truth feature points than the IMM DB, and the feature points common to the two DBs differ slightly in position. Errors were therefore computed over the principal feature points located at the eyes, nose, mouth, and jawline, as shown in FIG. 5.

[Table 2] compares the fitting errors of each method. Because the positions of the common feature points differ slightly between the two DBs, the fitting errors are slightly larger than in the evaluation using the IMM DB alone. The BioID face images were captured in a different environment and with different poses and expressions from the IMM training images, so 2D AAM and 3D AAM produced badly wrong fits on some images.

In contrast, the 3D face fitting method for unlearned faces according to the present invention fitted accurately with the smallest fitting error. The results also show that the RSSR and the shape parameters of the method improved fitting accuracy on unlearned faces.

[Table 2]

Meanwhile, for the Talking Face video DB, all methods were again trained on the IMM DB before face fitting. To compute the fitting error, 40 facial feature points that are similar between the two DBs were used, as shown in FIG. 6. Because consecutive frames of the video show a continuously changing face, the fitting result of the previous frame was used as the initial position for fitting each frame.
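Initializing each frame from the previous frame's result amounts to a simple loop in which the fitted parameters are carried forward. In the sketch below, `fit_frame` is a hypothetical placeholder for any single-frame fitting routine (it is not an API from the patent), and the toy fitter that moves halfway toward a scalar target exists only to make the example runnable.

```python
def track_sequence(frames, fit_frame, initial_params):
    """Fit every frame, starting each fit from the previous frame's result.

    fit_frame(frame, start_params) -> fitted_params is a hypothetical
    single-frame fitter supplied by the caller.
    """
    results = []
    params = initial_params
    for frame in frames:
        params = fit_frame(frame, params)   # warm start from previous result
        results.append(params)
    return results

# Toy fitter: each "frame" is a target value and one fitting pass moves
# the estimate halfway from its starting point toward that target.
def toy_fit(frame, start):
    return start + 0.5 * (frame - start)

tracked = track_sequence([10.0, 10.0, 10.0], toy_fit, 0.0)
```

With a warm start the estimate keeps improving across frames, whereas restarting every frame from the same initial value would give the same partial result each time.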

[Table 3] compares the fitting errors of the methods. Because the video DB has a plain single-color background and no special lighting on the face, most methods fitted relatively well. However, the existing methods fitted some facial feature points incorrectly because of shape constraints and similar limitations, whereas the 3D face fitting method for unlearned faces according to the present invention fitted accurately with the smallest error.

[Table 3]

Although several embodiments of the present invention have been shown and described, those of ordinary skill in the art to which the invention pertains will appreciate that these embodiments may be modified without departing from the principles or spirit of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims

A 3D face fitting method for an unlearned face using an Active Appearance Model, the method comprising:
(a) receiving an input image;
(b) extracting a face region including an inner region and an outer region adjacent to the inner region by warping the input image to a mean image;
(c) applying a single scale retinex (SSR) to the face region, and extracting a face texture from the face region by mapping the result into a range from a preset maximum result value to a preset minimum result value;
(d) extracting a texture error by applying the extracted face texture to the Active Appearance Model;
(e) extracting a face shape from the extracted face texture;
(f) extracting a shape error by applying the extracted face shape to the Active Appearance Model; and
(g) assigning predetermined weights to the texture error and the shape error, summing the weighted errors, and applying the sum to the Active Appearance Model to fit the face image in the input image to the model face.

delete

The method according to claim 1,
Wherein the face region is extracted as a rectangular region including an inner region of the face in the step (b).

The method according to claim 1,
wherein in step (d) the texture error is computed over the inner region of the face.

The method according to claim 1,
wherein the shape error in step (f) is calculated by applying an adjustment weight obtained by widening the edges of the mean face shape.

6. The method of claim 5,
wherein the adjustment weight is calculated by the equation

(where w_d(x,y) is the adjustment weight for each pixel, α is a preset magnitude weight parameter, S_Intensified(x,y) is the edge-widened image, S_mean(x,y) is the mean face shape of the training face images, and G is a Gaussian function).

The method according to claim 1,
wherein in step (f) the shape error is extracted over the inner region of the face extended by a predetermined number of pixels.