KR101877711B1

KR101877711B1 - Apparatus and method of face tracking

Info

Publication number: KR101877711B1
Application number: KR1020140118874A
Authority: KR
Inventors: 펑 쉬에타오; 센 샤오루; 장 후이; 김지연; 김정배
Original assignee: 삼성전자주식회사
Priority date: 2013-10-22
Filing date: 2014-09-05
Publication date: 2018-07-13
Also published as: CN104573614A; CN104573614B; KR20150046724A; KR20150046718A

Abstract

A face tracking technique is disclosed. According to one embodiment, a face region detected in an input image is divided into sub-regions, and a face included in the input image is tracked based on the occlusion probabilities of the sub-regions.

Description

APPARATUS AND METHOD OF FACE TRACKING

The following embodiments relate to a face tracking apparatus and method, and more particularly to an apparatus and method for tracking a face and the key points of the face.

Because a face is effectively identified by its main components, such as the eyes, nose, and mouth, it can be tracked using feature points that correspond to those components. However, part of the face may be occluded for various reasons. For example, when the user wears sunglasses, the feature points corresponding to the eyes are difficult to track accurately. Likewise, when the user wears a mask, the feature points corresponding to the mouth are difficult to track accurately. Furthermore, when uneven lighting casts a shadow on the face, the size and shape of the shadowed area may change with the facial expression. In such cases, the face is difficult to track accurately.

According to one aspect, a face tracking method includes: detecting a face region in an input image; dividing the face region into sub-regions; calculating occlusion probabilities of the sub-regions; and tracking a face included in the input image based on the occlusion probabilities. Here, the input image may include a face that is at least partially occluded.

Detecting the face region may include: extracting first feature points from a current frame of the input image; selecting at least one key frame from a database; estimating a pose of the face included in the input image based on the first feature points and second feature points of the at least one key frame; and estimating third feature points of the face included in the input image based on the estimated pose.

Estimating the pose of the face may include: generating matching relation information between the first feature points and the second feature points based on the similarity between the feature vectors of the first feature points and the feature vectors of the second feature points; and estimating a pose parameter indicating the pose of the face based on the distance between the coordinates of a matched first feature point and the projected coordinates of the matched second feature point.

Dividing the face region into sub-regions may include: generating patches based on the positions and colors of the pixels included in the face region; and generating sections based on the feature points estimated in the face region.

Calculating the occlusion probabilities of the sub-regions may include: calculating first occlusion probabilities of the patches based on first probability models of the patches; calculating second occlusion probabilities of the sections based on second probability models of the sections; and generating an occlusion weight map based on the first occlusion probabilities and the second occlusion probabilities.

Tracking the face included in the input image may include adjusting a parameter of a face model that represents the face included in the input image, using the occlusion weight map.

The face tracking method may further include: evaluating a tracking result using a pre-trained classifier; and updating a key frame when the tracking result is evaluated as successful.

According to another aspect, a face tracking apparatus includes: a face region detection unit that detects a face region in an input image; a division unit that divides the face region into sub-regions; an occlusion probability calculation unit that calculates occlusion probabilities of the sub-regions; and a tracking unit that tracks a face included in the input image based on the occlusion probabilities.

FIG. 1 is a flowchart illustrating a face tracking method according to an embodiment.
FIG. 2 is a diagram illustrating occlusion according to an embodiment.
FIG. 3 is a flowchart illustrating a method of detecting a face region according to an embodiment.
FIG. 4 is a flowchart illustrating a method of dividing a face region according to an embodiment.
FIGS. 5A to 5C are diagrams illustrating sub-regions according to an embodiment.
FIG. 6 is a flowchart illustrating a method of calculating occlusion probabilities according to an embodiment.
FIG. 7 is a diagram illustrating a template shape according to an embodiment.
FIG. 8 is a diagram illustrating a probability model according to an embodiment.
FIG. 9 is a diagram illustrating occlusion probabilities according to an embodiment.
FIG. 10 is a flowchart illustrating a method of adjusting the parameters of a face model according to an embodiment.
FIG. 11 is a flowchart illustrating post-processing of a face tracking method according to an embodiment.
FIG. 12 is a flowchart illustrating the overall face tracking process according to an embodiment.
FIG. 13 is a flowchart illustrating a face tracking algorithm according to an embodiment.
FIG. 14 is a block diagram illustrating a face tracking apparatus according to an embodiment.
FIG. 15 is a block diagram illustrating a face region detection unit according to an embodiment.
FIG. 16 is a block diagram illustrating an occlusion probability calculation unit according to an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Like reference numerals in the drawings denote like elements.

FIG. 1 is a flowchart illustrating a face tracking method according to an embodiment. Before the face tracking method is described with reference to FIG. 1, the face model used by the embodiments is described. The face model is a model that represents the face included in the input image. It may include a two-dimensional (2D) shape model, a three-dimensional (3D) shape model, and/or a texture model. The face model may be deformable.

The 2D shape model expresses the geometric positions of the feature points of the face as 2D coordinates. The feature points may be points located on characteristic outlines of the face, such as the eyes, nose, mouth, eyebrows, or facial contour. For example, the 2D shape model may be expressed by Equation 1.
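The equation itself is not reproduced in this text. Judging from the symbol definitions given for Equation 1, it presumably has the form of a linear shape model followed by a similarity transformation. The block below is a reconstruction under that assumption, not the published formula:

```latex
% Equation 1 (reconstructed from the symbol definitions; assumed form)
s(p, q) = N\!\left( s_0 + \sum_{i} p_i \, s_i ;\; q \right)
```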

Here, s is a vector representing the 2D shape model and may consist of the 2D coordinates of the feature points of the face. p is a 2D shape parameter, and q is a 2D similarity transformation parameter. s₀ is the 2D mean shape, and s_i is a 2D shape primitive. p_i is a component of the 2D shape parameter, and N() is a function that applies a 2D similarity transformation to a 2D shape. Different 2D face shapes can be generated depending on the 2D shape parameter p. The pose of the 2D face can be changed depending on the 2D similarity transformation parameter q. Together, p and q determine the 2D coordinates of the feature points that make up the 2D shape model s.
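As a minimal sketch of how such a shape model could be evaluated, the following Python function deforms a mean shape with weighted primitives and then applies a similarity transformation. The function name `shape_2d` and the parameterisation of q as (scale, rotation angle, x shift, y shift) are assumptions made for illustration; the text does not specify how q is encoded.

```python
import math


def shape_2d(s0, primitives, p, q):
    """Evaluate a shape of the form N(s0 + sum_i p_i * s_i; q).

    s0         -- mean shape, a list of (x, y) feature point coordinates
    primitives -- list of shape primitives s_i, each shaped like s0
    p          -- shape parameters, one weight p_i per primitive
    q          -- (scale, angle_rad, tx, ty); an assumed encoding of the
                  2D similarity transformation N()
    """
    # Linear deformation: mean shape plus the weighted primitives.
    deformed = []
    for k, (x0, y0) in enumerate(s0):
        x = x0 + sum(w * prim[k][0] for w, prim in zip(p, primitives))
        y = y0 + sum(w * prim[k][1] for w, prim in zip(p, primitives))
        deformed.append((x, y))

    # Similarity transformation: rotate, scale, then translate.
    scale, angle, tx, ty = q
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
            for x, y in deformed]
```

Changing only `p` alters the face shape, while changing only `q` moves, rotates, or resizes the same shape, which mirrors the roles the text assigns to the two parameters.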

The 3D shape model expresses the geometric positions of the feature points of the face as 3D coordinates. For example, the 3D shape model may be expressed by Equation 2.
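Equation 2 is likewise not reproduced in this text. From the symbol definitions given for it, it presumably mirrors the 2D model with primed symbols. The block below is a reconstruction under that assumption:

```latex
% Equation 2 (reconstructed from the symbol definitions; assumed form)
s'(p', q') = N'\!\left( s'_0 + \sum_{i} p'_i \, s'_i ;\; q' \right)
```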

Here, s' is a vector representing the 3D shape model and may consist of the 3D coordinates of the feature points of the face. p' is a 3D shape parameter, and q' is a 3D similarity transformation parameter. s'₀ is the 3D mean shape, and s'_i is a 3D shape primitive. p'_i is a component of the 3D shape parameter, and N'() is a function that applies a 3D similarity transformation to a 3D shape.

Different 3D face shapes can be generated depending on the 3D shape parameter p'. The position or pose of the 3D face within the 3D coordinate system can be changed depending on the 3D similarity transformation parameter q'. The 3D shape parameter p' may correspond to the facial expression. For example, p' may capture not only the face shape unique to an individual but also that individual's expression. The similarity transformation parameter q' may correspond to the pose of the face. Together, p' and q' determine the 3D coordinates of the feature points that make up the 3D shape model s'.

The texture model is a model that represents the texture of the face. For example, the texture model may be expressed by Equation 3. The texture model may also be referred to as an appearance model.

Here, a is a texture vector representing the texture model, and b is a texture parameter. a₀ is the mean texture, a_i is a texture primitive, and b_i is a component of the texture parameter. The texture model a can be deformed according to the texture parameter b.
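Equation 3 is not reproduced in this text. Since the definitions mention a mean texture, texture primitives, and parameter components but no transformation function, it presumably is a plain linear model. The block below is a reconstruction under that assumption:

```latex
% Equation 3 (reconstructed from the symbol definitions; assumed form)
a(b) = a_0 + \sum_{i} b_i \, a_i
```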

Referring to FIG. 1, a face tracking method according to an embodiment includes detecting a face region in an input image (110), dividing the face region into sub-regions (120), calculating occlusion probabilities of the sub-regions (130), and tracking the face based on the occlusion probabilities (140). The face tracking method may be implemented as a software module and executed by a processor. Alternatively, it may be implemented as a hardware module, or as a combination of a software module and a hardware module. Hereinafter, a software module, a hardware module, or a combination thereof in which the face tracking method is implemented is referred to as a 'face tracking apparatus'.
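The four steps of FIG. 1 can be sketched as a simple per-frame pipeline. This is only an illustrative skeleton: the function `track_face` and its four stage arguments are hypothetical names, and each stage is supplied by the caller rather than implemented here.

```python
def track_face(frame, detect_region, split_region, occlusion_probs, fit_model):
    """Run steps 110 to 140 on one frame using caller-supplied stages."""
    region = detect_region(frame)                 # step 110: detect the face region
    sub_regions = split_region(region)            # step 120: divide into sub-regions
    probabilities = occlusion_probs(sub_regions)  # step 130: occlusion probabilities
    return fit_model(frame, probabilities)        # step 140: track using them
```

Passing the stages in as functions keeps the sketch neutral about whether each stage is realised in software, hardware, or a mix of both.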

A face tracking apparatus according to an embodiment may track a face in an input image. The input image may be a plurality of images or a video stream. The face tracking apparatus may track the face in each of the plurality of images, or in each of the frames included in the video stream.

At least part of the face included in the input image may be occluded by another object. For example, referring to FIG. 2, in the first face image 210, the whole right eyebrow, part of the left eyebrow, and part of the left eye of the user's face are covered by hair. In the second face image 220, the whole right eyebrow and part of the left eyebrow are covered by hair, and part of the lips is covered by a microphone.

When at least part of the face included in the input image is occluded by another object in this way, the accuracy of the tracking result may decrease. The embodiments can provide a technique that yields accurate tracking results even when at least part of the face included in the input image is occluded by another object.

In step 110, the face tracking apparatus may receive an input image and detect a face region from it. The face region may be the region of a single image or a single frame that contains the face. It may include the main components of the face, such as the eyes, nose, mouth, and eyebrows, as well as the facial contour. For example, a single image or frame may contain a person's full body, in which case the face tracking apparatus may detect the face region corresponding to the face within the full body.

The face tracking apparatus may detect the face region of the current frame based on the tracking result of the previous frame. Because the speed at which a face moves is bounded, the face tracking apparatus may determine the face region of the current frame by enlarging the region that contained the face in the previous frame outward.
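A minimal sketch of this enlargement, assuming an axis-aligned bounding box and a relative margin, is shown below. The function name `expand_region`, the box layout `(x0, y0, x1, y1)`, and the `margin_ratio` parameter are illustrative assumptions; the text does not say how far the region is enlarged.

```python
def expand_region(prev_box, margin_ratio, frame_size):
    """Enlarge the previous face box on every side and clip it to the frame.

    prev_box     -- (x0, y0, x1, y1) of the face in the previous frame
    margin_ratio -- fraction of the box width/height added on each side
    frame_size   -- (width, height) of the current frame
    """
    x0, y0, x1, y1 = prev_box
    width, height = frame_size
    dx = (x1 - x0) * margin_ratio
    dy = (y1 - y0) * margin_ratio
    return (max(0, x0 - dx), max(0, y0 - dy),
            min(width, x1 + dx), min(height, y1 + dy))
```

In practice the margin would be chosen from the frame rate and the largest face motion expected between consecutive frames.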

For example, referring to FIG. 3, in step 111 the face tracking apparatus may extract feature points from the current frame. It may extract feature points representing the eyes, nose, mouth, eyebrows, facial contour, and so on. For example, the face tracking apparatus may detect feature points using at least one of the scale-invariant feature transform (SIFT) algorithm, the speeded-up robust features (SURF) algorithm, and the features from accelerated segment test (FAST) algorithm. Hereinafter, the feature points extracted from the current frame are referred to as 'first feature points'.

In an occluded part, feature points unrelated to the face may be extracted. For example, in the second face image 220 of FIG. 2, part of the lips is covered by the microphone, so feature points corresponding to the microphone may be extracted instead of feature points corresponding to the lips.

In step 112, the face tracking apparatus may select key frames from a database. The database stores a plurality of key frames. Each key frame may be indexed by at least one of a pose parameter and an expression parameter. For example, each key frame stores the feature points corresponding to a combination of a particular pose and a particular expression. Each key frame may store its feature points in the form of 3D coordinates.

The face tracking apparatus may select, from the key frames stored in the database, those associated with the tracking result of the previous frame. For example, suppose tracking the previous frame yields p1 as the pose parameter and e1 as the expression parameter. In this case, the face tracking apparatus may select the key frame indexed by (p1, e1). It may also select the key frames indexed by (p1, *) or (*, e1), where (p1, *) denotes all indices that contain p1 and (*, e1) denotes all indices that contain e1. Alternatively, the face tracking apparatus may determine which indices are similar to (p1, e1) and select the key frames indexed by them. This selection scheme is only an example and may be modified in various ways.
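The index-based selection can be sketched as a dictionary lookup. In the example below the database is assumed to be a dict keyed by `(pose, expression)` labels, and the policy of trying the exact index first and falling back to partial matches is one possible reading, not something the text prescribes. The name `select_key_frames` is hypothetical.

```python
def select_key_frames(database, p1, e1):
    """Select key frames related to the previous tracking result (p1, e1).

    database -- dict mapping (pose, expression) index tuples to key frames
    Returns the exact (p1, e1) match if one exists; otherwise every key
    frame whose index matches (p1, *) or (*, e1). This fallback order is
    an assumption made for illustration.
    """
    exact = [kf for (p, e), kf in database.items() if p == p1 and e == e1]
    if exact:
        return exact
    return [kf for (p, e), kf in database.items() if p == p1 or e == e1]
```

Real pose and expression parameters are continuous, so an actual index would probably quantise them or use a similarity search, which corresponds to the "similar indices" variant mentioned in the text.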

In step 113, the face tracking apparatus may estimate the pose of the face included in the current frame based on the feature points included in the selected key frames and the first feature points extracted from the current frame. Each selected key frame may store its feature points in the form of 3D coordinates. Hereinafter, the feature points included in each selected key frame are referred to as 'second feature points'.

Pose estimation may be performed in two main stages. In the first stage, the face tracking apparatus may generate matching relation information between the first feature points and the second feature points included in each of the selected key frames. Each key frame may store feature points that were previously matched successfully, in the form of 3D coordinates, and may further store the feature vectors of those feature points.

The 3D coordinates stored in a key frame may be obtained by projecting the 2D coordinates of previously matched feature points onto the 3D shape model. The stored 3D coordinates may lie on the triangular surfaces that make up the 3D shape model, and may be expressed by the vertex coordinates and the orthocenter coordinates of the corresponding triangle. Here, the orthocenter of a triangle is the point where the three altitudes, each dropped from a vertex to the opposite side, intersect.

The feature vector stored in a key frame may be computed from the colors of the area surrounding a previously matched feature point. For example, the feature vector may be computed based on a color histogram and/or a SIFT histogram. The feature vector stored in a key frame may thus reflect the texture characteristics of the previously matched feature points.

The face tracking apparatus may generate the matching relation information based on whether the texture vectors, which serve as the feature vectors, are similar. For example, it may compare the feature vectors of the first feature points extracted from the current frame with the feature vectors of the second feature points included in the key frames selected from the database, and match a first feature point with a second feature point when their feature vectors are similar. More specifically, the face tracking apparatus may calculate the distance between feature vectors. For each first feature point extracted from the current frame, it may find, among the second feature points included in the key frames, the one whose feature vector is closest, and select that second feature point as the matching point of the first feature point.
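The nearest-neighbour matching described above can be sketched as follows. The function name `match_features` is hypothetical, and Euclidean distance is an assumption, since the text only speaks of "the distance between feature vectors".

```python
import math


def match_features(first_vectors, second_vectors):
    """Match each first feature point to its closest second feature point.

    first_vectors  -- feature vectors of points extracted from the current frame
    second_vectors -- feature vectors of points stored in the key frames
    Returns a list of (first_index, second_index, distance) triples.
    """
    matches = []
    for i, f in enumerate(first_vectors):
        # Distance from this first feature vector to every second one.
        distances = [math.dist(f, g) for g in second_vectors]
        j = min(range(len(distances)), key=distances.__getitem__)
        matches.append((i, j, distances[j]))
    return matches
```

The returned distance lets a caller discard weak matches, for instance those coming from an occluded part where the extracted point does not belong to the face.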

The face tracking apparatus may select one key frame from among the key frames selected from the database. For example, it may select the key frame whose second feature points best match the first feature points. Because the matching relation information is generated from the similarity between feature vectors, the key frame with the most similar texture information may be selected.

In the second stage of pose estimation, the face tracking apparatus may estimate the pose of the face included in the current frame based on the matching relation information. For example, it may use the matching relation information to adjust the 3D similarity transformation parameter of a suitable 3D shape model. The face tracking apparatus may obtain the 3D face model corresponding to the selected key frame and determine its 3D similarity transformation parameter based on the matching relation information. By determining the 3D similarity transformation parameter, the face tracking apparatus estimates the pose of the face included in the current frame.

More specifically, by adjusting the 3D similarity transformation parameter, the face tracking apparatus may change the 3D position and orientation of the matching feature points included in the key frame. To compare the key frame's matching feature points, transformed by the 3D similarity transformation parameter, with the matching feature points of the current frame, the face tracking apparatus may project the key frame's matching feature points onto the current frame. This projection is needed because each matching feature point of the key frame has 3D coordinates, whereas the current frame is a 2D image. By projecting the key frame's matching feature points onto the current frame, the face tracking apparatus obtains projection points. Both the matching feature points of the current frame and the projection points have 2D coordinates.

The face tracking apparatus may calculate the distance between the projection points and the matching feature points of the current frame. For example, it may calculate this distance using Equation 4.
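Equation 4 is not reproduced in this text. From the symbols defined for it (a pair index i, frame points v_i, key frame points u_i, the projection Proj(), the transformation N'() with parameter q', and a robust error function ρ), a plausible form is a sum of robust reprojection errors. The block below is a reconstruction under that assumption; in particular, whether the norm is squared is not stated and the name E is introduced here only for illustration:

```latex
% Equation 4 (reconstructed; assumed form, squared norm is a guess)
E(q') = \sum_{i} \rho\!\left( \left\lVert v_i - \mathrm{Proj}\!\left( N'(u_i ;\, q') \right) \right\rVert^{2} \right)
```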

Here, i is the index of a matched pair, v_i is a matching feature point of the current frame, and u_i is the corresponding matching feature point of the key frame. Proj() is a function that projects a matching feature point of the key frame onto the current frame, and N'() is a function that applies a similarity transformation (for example, 3D translation and rotation) to the 3D shape model. q' is the 3D similarity transformation parameter.

ρ() is a robust error function. A robust error function is a function whose output increases with the input while the input is below a threshold, and whose output increases more slowly, or stops increasing, once the input exceeds the threshold. By using a robust error function, the face tracking apparatus can reduce the influence that feature point matching errors have on the estimation of the face pose.
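Two common functions that fit this description are sketched below: a truncated loss, which stops increasing above the threshold, and a Huber-style loss, which keeps increasing but only linearly. Both are illustrative choices; the text does not say which robust error function is used, and the function names are hypothetical.

```python
def truncated_loss(residual, threshold):
    """Grow with the residual up to the threshold, then stay constant."""
    return min(residual, threshold)


def huber_loss(residual, threshold):
    """Grow quadratically below the threshold and only linearly above it."""
    if residual <= threshold:
        return 0.5 * residual * residual
    return threshold * (residual - 0.5 * threshold)
```

With either function, a single badly mismatched pair contributes a bounded or slowly growing amount to the total, so it cannot dominate the pose estimate.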

The face tracking apparatus may determine the 3D similarity transformation parameter of the 3D face model so that the distance between the matching feature points of the current frame and the projection points is minimized. By determining the 3D similarity transformation parameter, the face tracking apparatus estimates the pose of the face included in the current frame.
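The minimization can be illustrated with a deliberately simplified sketch. The code below assumes an orthographic projection, restricts the similarity transformation to scale, yaw rotation, and x/y translation, uses a truncated squared error as the robust function, and searches a list of candidate yaw angles instead of running a proper optimizer. All of these are assumptions made for illustration, and every function name is hypothetical.

```python
import math


def transform_3d(point, q):
    """N'(u; q'): assumed here to be scale, yaw rotation, and x/y translation."""
    scale, yaw, tx, ty = q
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (scale * (c * x + s * z) + tx,
            scale * y + ty,
            scale * (-s * x + c * z))


def project(point):
    """Proj(): orthographic projection (an assumption) that drops the depth."""
    return point[0], point[1]


def pose_cost(q, frame_points, key_points, threshold=25.0):
    """Sum of truncated squared distances between v_i and the projected u_i."""
    total = 0.0
    for v, u in zip(frame_points, key_points):
        px, py = project(transform_3d(u, q))
        squared = (v[0] - px) ** 2 + (v[1] - py) ** 2
        total += min(squared, threshold)  # truncated robust error
    return total


def estimate_yaw(frame_points, key_points, candidates):
    """Return the candidate yaw (radians) with the lowest cost.

    Scale is fixed to 1 and translation to 0 to keep the sketch short.
    """
    return min(candidates,
               key=lambda yaw: pose_cost((1.0, yaw, 0.0, 0.0),
                                         frame_points, key_points))
```

A full implementation would optimize all components of q' jointly, typically with an iterative nonlinear least-squares or gradient-based method rather than a candidate search.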

얼굴 트랙킹 장치는 단계(114)에서 추정된 포즈에 기초하여 현재 프레임에 포함된 얼굴의 특징점들을 추정할 수 있다. 이하, 현재 프레임의 얼굴을 위하여 추정된 특징점들은 '제3 특징점들'이라고 지칭될 수 있다. 제1 특징점은 현재 프레임에서 직접 추출되므로, 가려진 부위에서 추출된 제1 특징점들은 얼굴과 무관한 특징점들을 포함할 수 있다. 반면, 제3 특징점은 이전 프레임의 트랙킹 결과와 유사하면서 제1 특징점들과의 연관 관계가 높은 포즈에 기초하여 추정되므로, 가려진 부위에서도 얼굴과 관련된 제3 특징점들이 추정될 수 있다. The face tracking apparatus can estimate the feature points of the face included in the current frame based on the estimated pose in step 114. [ Hereinafter, the feature points estimated for the face of the current frame may be referred to as 'third feature points'. Since the first feature point is directly extracted from the current frame, the first feature points extracted from the obscured region may include feature points irrelevant to the face. On the other hand, since the third feature point is estimated based on a pose having a high correlation with the first feature points, similar to the tracking result of the previous frame, third feature points related to the face can be estimated even in the hidden region.

보다 구체적으로, 얼굴 트랙킹 장치는 3차원 형상 모델에 기초하여 현재 프레임을 위한 2차원 형상 모델의 파라미터들을 결정할 수 있다. 얼굴 트랙킹 장치는 결정된 2차원 형상 모델의 특징점들을 현재 프레임에 포함된 얼굴의 특징점들로 추정할 수 있다.More specifically, the face tracking apparatus can determine the parameters of the two-dimensional shape model for the current frame based on the three-dimensional shape model. The face tracking apparatus can estimate the feature points of the determined two-dimensional shape model as feature points of the face included in the current frame.

예를 들어, 얼굴 트랙킹 장치는 수학식 5로 표현되는 비용 함수(cost function)를 최소화시키는 2차원 형상 모델의 파라미터들을 결정할 수 있다.For example, the face tracking apparatus can determine parameters of a two-dimensional shape model that minimizes a cost function expressed by Equation (5).

얼굴 트랙킹 장치는 기울기 하강 알고리즘(gradient descent algorithm)을 이용하여 수학식 5의 비용 함수를 최소화시킴으로써, 2차원 형상 모델의 파라미터들 p, q를 결정할 수 있다. 이 때, 2차원 형상 모델을 구성하는 특징점들과 3차원 형상 모델을 구성하는 특징점들은 각각 대응하지 않을 수 있다. 얼굴 트랙킹 장치는 서로 대응하는 특징점들에 한하여 수학식 5의 비용 함수를 최소화시킬 수 있다.The face tracking apparatus can determine the parameters p , q of the two-dimensional shape model by minimizing the cost function of Equation (5) using a gradient descent algorithm. At this time, the feature points constituting the two-dimensional shape model and the feature points constituting the three-dimensional shape model may not correspond to each other. The face tracking apparatus can minimize the cost function of Equation (5) with respect to mutually corresponding feature points.

The face tracking apparatus may detect the face region in the current frame based on the position coordinates of the feature points of the two-dimensional shape model for the current frame. In this way, the face tracking apparatus according to an embodiment may detect the face region of the current frame based on the tracking result of the previous frame.

다른 실시예에 따른 얼굴 트랙킹 장치는 일반적인 얼굴 검출 알고리즘(face detection algorithm)을 이용하여 얼굴 영역을 검출할 수 있다. 예를 들어, 비디오 스트리밍 내 첫 번째 프레임이나, 복수의 영상들 중 최초로 입력된 영상의 경우, 이전 프레임 정보가 없을 수 있다. 이 경우, 얼굴 트랙킹 장치는 일반적인 얼굴 검출 알고리즘을 통하여 얼굴 영역을 검출할 수 있다.The face tracking apparatus according to another embodiment can detect a face region using a general face detection algorithm. For example, there may be no previous frame information in the first frame in the video streaming or in the first input image of the plurality of images. In this case, the face tracking apparatus can detect the face region through a general face detection algorithm.

Although not shown in the drawings, the face tracking apparatus may store a valid matching result. The valid matching result is information indicating whether the matching between the first feature points and the second feature points is valid. For example, among the (first feature point, second feature point) pairs matched by the matching relation information, the valid matching result may include only the pairs in which the difference between the first feature point and the second feature point is less than a predetermined threshold. The valid matching result may indicate feature points located in unoccluded regions.

The face tracking apparatus may generate the valid matching result based on the distances between the feature points extracted from the current frame and the projection points projected from the three-dimensional shape model onto the current frame. For example, the apparatus may classify the feature points extracted from the current frame into a valid matching group and an invalid matching group. It may calculate the distance between each extracted feature point and its projection point. If the distance is less than a threshold, the feature point is classified into the valid matching group; if the distance is equal to or greater than the threshold, it is classified into the invalid matching group. The apparatus may then generate the valid matching result from the feature points classified into the valid matching group.
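The classification described above can be sketched as follows. The function name `split_matches` and the representation of points as (x, y) tuples are assumptions for illustration, and the Euclidean distance is assumed as the distance measure.

```python
import math


def split_matches(extracted, projected, threshold):
    """Return (valid, invalid) index lists for paired 2D points.

    extracted[i] is a feature point from the current frame and projected[i]
    is the matching projection of the 3D shape model onto that frame.
    """
    valid, invalid = [], []
    for idx, (p, q) in enumerate(zip(extracted, projected)):
        dist = math.hypot(p[0] - q[0], p[1] - q[1])
        # Strictly below the threshold is valid; equal or above is invalid.
        (valid if dist < threshold else invalid).append(idx)
    return valid, invalid
```

The valid index list plays the role of the valid matching result and can later be counted per section.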

The valid matching result may be used to generate sections when the face region is divided into sub-regions. How the valid matching result is used is described in detail below.

In step 120, the face tracking apparatus may divide the face region into sub-regions. The sub-regions are the regions into which the face region is divided and may include patches and sections. Patches may be generated by clustering the pixels in the face region based on their positions and colors. Sections may be generated by merging patches based on the feature points in the face region.

Referring to FIG. 4, the face tracking apparatus may generate patches in step 121. The apparatus may generate patches by clustering pixels that have similar colors at similar positions. Pixels with different colors within the face region are thereby assigned to different patches, so that occluded and unoccluded parts of the face region are separated into different patches. For example, the patches of FIG. 5A may be generated from the face images of FIG. 2.

The face tracking apparatus may generate patches by iteratively applying a K-means clustering algorithm to the pixels of the face image using a position-color descriptor. The position-color descriptor may be expressed as [x, y, r, g, b], where x and y are the x and y coordinates of the pixel, and r, g, and b are its red, green, and blue components.
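A minimal sketch of K-means over [x, y, r, g, b] descriptors is given below. The function name `kmeans_patches`, the evenly spaced initialization, and the fixed iteration count are assumptions for illustration; the text does not specify the initialization, the stopping rule, or any relative scaling between position and color.

```python
def kmeans_patches(pixels, k, iterations=10):
    """Cluster (x, y, r, g, b) descriptors; returns one patch label per pixel."""
    # Assumed initialization: take k evenly spaced pixels as starting centers.
    step = max(1, len(pixels) // k)
    centers = [list(pixels[min(i * step, len(pixels) - 1)]) for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: nearest center in the 5-D position-color space.
        for n, d in enumerate(pixels):
            labels[n] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(d, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [pixels[n] for n in range(len(pixels)) if labels[n] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Pixels that are close in both position and color end up with the same label, which is what lets an occluding object of a different color fall into its own patch.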

In step 122, the face tracking apparatus may generate sections by merging adjacent patches according to the feature points in the face region. These feature points may be the third feature points. Because the feature points in the face region lie on the main components of the face, such as the eyes, nose, mouth, and eyebrows, the sections may correspond to the respective main components of the face. The sections may differ in size; for example, the section corresponding to an eye and the section corresponding to the nose may have different sizes. In addition, a cheek area, where facial features are not distinctive, may be included in a single section.

As one example, referring to FIG. 5B, the first section 510 may correspond to the right eye in the first face image, the second section 520 to the left eye, the third section 530 to the mouth, and the fourth section 540 to the nose.

As another example, referring to FIG. 5C, the fifth section 550 may correspond to the right eye in the second face image, the sixth section 560 to the left eye, the seventh section 570 to the mouth, and the eighth section 580 to the nose.

In step 130, the face tracking apparatus may calculate occlusion probabilities of the sub-regions. The occlusion probability of a sub-region is the probability that the sub-region is occluded. It takes a value from 0 to 1, and 1 minus the occlusion probability is the exposure probability, that is, the probability that the sub-region is exposed rather than occluded. For convenience, the following description uses occlusion probabilities, but the embodiments may be modified to use exposure probabilities.

The face tracking apparatus may calculate the occlusion probabilities of the sub-regions using probability models. For example, referring to FIG. 6, the apparatus may calculate the occlusion probabilities of the patches in step 131. The occlusion probability of a patch may be calculated using the probability model of the part of a template shape that corresponds to the patch. Here, the template shape is a predetermined face shape and, as shown in FIG. 7, may include a plurality of parts that make up that face shape. A probability model may be assigned to each part of the template shape.

The probability models assigned to the parts of the template shape may be random tree cluster based adaptive multivariate Gaussian models. Referring to FIG. 8, a random tree 810 may be adapted to cluster patch data. Each leaf node of the random tree may cluster patch data based on a multivariate Gaussian distribution. For example, the first leaf node 811 may cluster first patch data based on a first multivariate Gaussian distribution 821, and the second leaf node 812 may cluster second patch data based on a second multivariate Gaussian distribution 822.

The face tracking apparatus may determine which part of the template shape a patch corresponds to and calculate the occlusion probability of the patch using the probability model assigned to that part. The apparatus may use statistics of the pixels in the patch as a feature descriptor. These may be color statistics, for example at least one of a color histogram, a color mean, and a color variance. Based on the probability model for the patch, the apparatus may calculate the occlusion probability corresponding to the feature descriptor of the patch. The occlusion probability of the i-th patch P_i is denoted O(P_i).

Because the probability model is generated and updated from unoccluded patches, it can be used to describe what an unoccluded patch looks like. For example, suppose that color is used as the feature descriptor for the probability model and that the occlusion probability is calculated for a patch on the cheek. For any patch that is located on the cheek and matches the skin color, the probability model predicts a high probability that the patch belongs to an unoccluded part of the face (equivalently, a low occlusion probability). In practice, appearance changes with pose, illumination, and facial expression, so an appropriate probability model and feature descriptor must be selected.

As one example, when a Gaussian mixture model is used, the probability that a vector x is derived from an unoccluded patch may be calculated as in Equation 6.

Here, M is the number of components, w_t is the weight of the t-th component, and g(x | μ_t, Σ_t) is the Gaussian density of the t-th component. g(x | μ_t, Σ_t) may be calculated as in Equation 7.

Here, μ_t is the mean vector, Σ_t is the covariance matrix, and D is the dimension of the vector x.
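The images for Equations 6 and 7 are not reproduced in this text. The standard Gaussian mixture form that is consistent with the definitions given (number of components M, weights w_t, mean vector, covariance matrix, dimension D) would be as follows. The symbol names g, μ_t, and Σ_t are assumed notation, not taken from the original equations.

```latex
% Equation 6 (assumed standard form): mixture density
p(x) = \sum_{t=1}^{M} w_t \, g(x \mid \mu_t, \Sigma_t)

% Equation 7 (assumed standard form): component Gaussian density
g(x \mid \mu_t, \Sigma_t) =
  \frac{1}{(2\pi)^{D/2} \, |\Sigma_t|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (x - \mu_t)^{\top} \Sigma_t^{-1} (x - \mu_t) \right)
```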

As another example, when a random tree based Gaussian model is used, the probability that a vector x is derived from an unoccluded patch may be the average of all tree densities. In this case, the probability may be calculated as in Equation 8.

Here, T is the number of trees and p_t(x) is the density of the t-th tree. p_t(x) may be calculated as in Equation 9.

Here, l(x) is the leaf node into which the vector x falls, π_l(x) is the proportion of all training samples that reach leaf node l(x), Z_t is a coefficient for probability normalization, and g(x | μ_l(x), Σ_l(x)) is the single Gaussian model for leaf node l(x). g(x | μ_l(x), Σ_l(x)) may be calculated as in Equation 10.

Here, μ_l(x) is the mean vector of all training samples in leaf node l(x), Σ_l(x) is the covariance matrix of all training samples in leaf node l(x), and D is the dimension of the vector x.
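The images for Equations 8 to 10 are likewise not reproduced in this text. A form consistent with the description (an average over T tree densities, each tree density being a normalized, proportion-weighted Gaussian at the leaf reached by x) would be as follows. The symbols π, g, μ, and Σ are assumed notation, and the exact placement of Z_t is an assumption.

```latex
% Equation 8 (assumed form): forest density as the mean of the tree densities
p(x) = \frac{1}{T} \sum_{t=1}^{T} p_t(x)

% Equation 9 (assumed form): density of the t-th tree at the leaf l(x)
p_t(x) = \frac{\pi_{l(x)}}{Z_t} \, g\!\left(x \mid \mu_{l(x)}, \Sigma_{l(x)}\right)

% Equation 10 (assumed standard form): leaf Gaussian
g\!\left(x \mid \mu_{l(x)}, \Sigma_{l(x)}\right) =
  \frac{1}{(2\pi)^{D/2} \, |\Sigma_{l(x)}|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (x - \mu_{l(x)})^{\top} \Sigma_{l(x)}^{-1} (x - \mu_{l(x)}) \right)
```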

The feature descriptor vector x may be extracted from a patch, from a patch and its neighboring patches, or from a patch and its neighboring pixels. For example, the color histogram of the pixels in a patch may be extracted as the feature descriptor vector x.
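As a minimal sketch of the color histogram descriptor mentioned above, the following function concatenates per-channel histograms of a patch. The function name `color_histogram`, the number of bins, and the normalization by pixel count are assumptions for illustration; the text does not specify the histogram layout.

```python
def color_histogram(pixels, bins=4):
    """pixels: list of (r, g, b) with values in 0..255.

    Returns a vector of length 3 * bins: the R, G and B histograms
    concatenated, each normalized by the number of pixels.
    """
    hist = [0.0] * (3 * bins)
    width = 256 / bins
    for rgb in pixels:
        for channel, value in enumerate(rgb):
            bin_index = min(int(value // width), bins - 1)
            hist[channel * bins + bin_index] += 1
    n = len(pixels)
    return [h / n for h in hist] if n else hist
```

The resulting vector can serve as x when evaluating a probability model for the patch.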

The feature descriptor vector x may include any vector that can represent a property of a region. Examples include: a gradient histogram of the pixels in the patch; a color histogram of the pixels in the neighborhood of the patch; a gradient histogram of the pixels in the neighborhood of the patch; geometry features such as the height-to-width ratio of the bounding rectangle, the perimeter-to-area ratio, the major-to-minor axis length ratio of the ellipse that has the same normalized second central moments as the patch, and the proportion of pixels in the convex hull that are also in the patch; texture features such as local binary features and the elements of a co-occurrence matrix; and feature vectors transformed by normalization, principal component analysis, and the like.

A patch may correspond to a plurality of probability models. As an example, suppose the template shape spans 100 x 100 pixels and is divided into 100 parts, so that the probability model assigned to each part corresponds to a 10 x 10 pixel area. If patch P_i is larger than 10 x 10 pixels, it may correspond to a plurality of parts of the template shape. The face tracking apparatus may obtain the probability models assigned to those parts, all of which correspond to patch P_i, and calculate the occlusion probability for the feature descriptor of the patch based on these models. When the m-th probability model corresponding to the i-th patch P_i is used, the occlusion probability of the patch is denoted O_m(P_i). In this case, the occlusion probability of the i-th patch may be calculated as O(P_i) = min(O_m(P_i)), where the minimum is taken over m.

The occlusion probability of a patch may be predicted from the corresponding probability models. In practice the number of patches is not fixed, so a one-to-one correspondence between probability models and patches may not exist.

If there are N_i probability models within the area of patch i, N_i occlusion probabilities may be calculated, one for each model evaluated on x_i, where x_i is the feature descriptor of patch i.

The occlusion probability of the patch may be the result of combining all of these occlusion probabilities. If patch i is unoccluded, the N_i probability models describe the appearance of the face region around patch i. However, some of the N_i probability models may describe the appearance of neighboring patches. Therefore, the reliability of the individual occlusion probabilities may vary.

As one example of combining the probabilities, the hypothesis may be used that scores indicating a low occlusion probability are more reliable than scores indicating a high occlusion probability. A low occlusion probability score means that the observed appearance matches the model well. A high occlusion probability score can arise from various causes, for example when training or adaptation is insufficient, when the probability model is closer to a patch other than patch i, or when the appearance is unseen owing to an illumination change. Therefore, the lowest occlusion probability may be taken as the occlusion probability of patch i.

As another example of combining the probabilities, the distance between the position at which a probability model is defined and the center of the patch may be used. For example, the probability that a patch is unoccluded may be calculated as in Equation 11.

Here, p_i is the probability that patch i is unoccluded, and w_j is a weight coefficient. w_j may be calculated as in Equation 12.

Here, d_j is the distance between the position at which probability model j is defined and the center of the patch.
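The images for Equations 11 and 12 are not reproduced in this text, so their exact form is unknown. The sketch below only illustrates the general idea of a distance-weighted combination: models defined closer to the patch center receive larger weights. The normalized inverse-distance weight, the smoothing constant `eps`, and the function name `combine_by_distance` are all assumptions and should not be read as the patent's formula.

```python
def combine_by_distance(unoccluded_probs, distances, eps=1.0):
    """Weighted average of per-model unoccluded probabilities.

    unoccluded_probs[j] is the probability from model j and distances[j] is
    the distance d_j from that model's position to the patch center.
    Assumed weight: w_j proportional to 1 / (d_j + eps), normalized to sum to 1.
    """
    raw = [1.0 / (d + eps) for d in distances]
    total = sum(raw)
    return sum((w / total) * p for w, p in zip(raw, unoccluded_probs))
```

Any decreasing function of d_j could be substituted for the assumed weight without changing the structure of the calculation.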

In step 132, the face tracking apparatus may calculate the occlusion probabilities of the sections and thereby estimate the probability that each main component of the face is occluded. For example, referring to FIG. 9, in case 1 the occlusion probability of the section 910 corresponding to the mouth may be higher than those of the other sections; the apparatus may then determine that the mouth is likely occluded in the input image. In case 2, the occlusion probability of the section 920 corresponding to the right eye may be higher than those of the other sections, and the apparatus may determine that the right eye is likely occluded. In case 3, the occlusion probability of the section 930 corresponding to part of the left cheek may be higher than those of the other sections, and the apparatus may determine that part of the left cheek is likely occluded.

The face tracking apparatus may calculate the occlusion probability of a section using an adaptive Gaussian model corresponding to that section. The adaptive Gaussian models for the sections may reflect the main components of the face (eyes, nose, mouth, eyebrows, and so on). The apparatus may use the number of valid matching results in a section as the feature descriptor; for example, it may count them using the valid matching result stored during face region detection. Based on the probability model of the section, the apparatus may calculate the occlusion probability corresponding to the feature descriptor of the section. The occlusion probability of the j-th section R_j is denoted O(R_j).

In step 133, the face tracking apparatus may generate an occlusion weight map. The occlusion weight map may contain the occlusion probability of each pixel in the face region. The apparatus may generate the map based on the occlusion probabilities of the patches and of the sections. Patches and sections may differ in precision, and the apparatus may estimate the occlusion probability of a pixel by jointly considering the patch-level and section-level occlusion probabilities.

For example, the face tracking apparatus may generate the occlusion weight map using O(X_k) = max(O(P_i), O(R_j)). Here, O(X_k) is the occlusion weight of the k-th pixel in the face region, O(P_i) is the occlusion probability of the patch P_i to which the k-th pixel belongs, and O(R_j) is the occlusion probability of the section R_j to which the k-th pixel belongs. The way the patch and section occlusion probabilities are combined to generate the occlusion weight map may be varied.
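The two combination rules stated in the text, O(P_i) = min(O_m(P_i)) for a patch covered by several probability models and O(X_k) = max(O(P_i), O(R_j)) for a pixel, can be sketched together as follows. The function name and the data layout (per-pixel patch and section indices, dictionaries of scores) are assumptions for illustration.

```python
def occlusion_weight_map(patch_of_pixel, section_of_pixel,
                         patch_model_scores, section_prob):
    """Return one occlusion weight per pixel.

    patch_of_pixel[k] / section_of_pixel[k]: patch and section index of pixel k.
    patch_model_scores[i]: list of O_m(P_i) from the models covering patch i.
    section_prob[j]: O(R_j) for section j.
    """
    weights = []
    for i, j in zip(patch_of_pixel, section_of_pixel):
        o_patch = min(patch_model_scores[i])         # O(P_i) = min over m
        weights.append(max(o_patch, section_prob[j]))  # O(X_k) = max(...)
    return weights
```

Taking the maximum means a pixel is treated as occluded if either its patch or its section suggests occlusion.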

In step 140, the face tracking apparatus may track the face based on the occlusion probabilities. For example, referring to FIG. 10, in step 141 the apparatus may adjust the parameters of a face model using the occlusion weight map. The face model may include a two-dimensional shape model, a three-dimensional shape model, and/or a texture model. The face model may be a deformable shape model.

The face tracking apparatus may fit the deformable shape model to the input face by adjusting predetermined parameters of the model. The input face is the face in the current frame. The apparatus may adjust the parameters of the deformable shape model so that the output of a cost function defined using the occlusion weight map is minimized. By using the occlusion weight map, the embodiments can reduce errors arising in occluded regions. In addition, by using the deformation energy of the deformable shape model, the embodiments can prevent the positions of feature points in occluded regions from deviating greatly from their typical positions.

The cost function calculates the matching error between the face model and the input face using the occlusion weight map. Hereinafter, the output of the cost function is referred to as matching error information. The face tracking apparatus may change at least one of the two-dimensional shape parameter, the two-dimensional similarity transformation parameter, the three-dimensional shape parameter, the three-dimensional similarity transformation parameter, and the texture parameter so that the matching error information is minimized.

일 실시예에 따른 비용 함수는 수학식 13으로 정의될 수 있다.The cost function according to one embodiment may be defined by Equation (13).

Here, E(p, q, b) is the cost function and O_a is the occlusion probability. A(p, q) is the texture vector obtained from the current frame, and a(b) is the texture vector corresponding to the texture model. The higher the occlusion probability of a pixel, the smaller the weight applied to the difference between A(p, q) and a(b). Accordingly, the higher the occlusion probability of a pixel, the less the occlusion affects the result.
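The image for Equation 13 is not reproduced in this text. One form consistent with the description is a squared texture difference in which each element is weighted by (1 - occlusion probability), so that heavily occluded pixels contribute little. The sketch below assumes that form; the weighting (1 - O) and the function name `weighted_matching_error` are assumptions, not the patent's stated equation.

```python
def weighted_matching_error(occlusion, frame_texture, model_texture):
    """Sum of squared, occlusion-down-weighted texture differences.

    occlusion[k]: occlusion probability for element k (0 = visible, 1 = occluded).
    frame_texture[k]: element of A(p, q) sampled from the current frame.
    model_texture[k]: element of a(b) produced by the texture model.
    """
    total = 0.0
    for o, frame_value, model_value in zip(occlusion, frame_texture, model_texture):
        total += ((1.0 - o) * (frame_value - model_value)) ** 2
    return total
```

An optimizer would vary p, q, and b to reduce this value; fully occluded elements (probability 1) then have no influence on the fit.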

A(p, q) may be calculated based on the two-dimensional shape parameter p and the two-dimensional similarity transformation parameter q. The face tracking apparatus may place the feature points of the two-dimensional shape model represented by p and q inside an image I of a preset size. For example, the apparatus may set p to 0 and q to an appropriate value so that the feature points of the two-dimensional shape model fall inside image I.

The face tracking apparatus may define triangles whose vertices are feature points of the two-dimensional shape model. The triangles may be adjacent to one another through a common edge or a common vertex without overlapping. Each triangle may be made up of pixels X_k (k being an index) of image I.

The face tracking apparatus may calculate the orthocenter coordinates (수심 좌표) of the triangle corresponding to a pixel X_k of the image I. Based on the orthocenter coordinates and the vertex coordinates of the triangle corresponding to the pixel X_k, the face tracking apparatus may calculate the coordinates of the corresponding point for the pixel X_k. The coordinates of the corresponding point may indicate a pixel of the current frame. For example, the face tracking apparatus may calculate the coordinates of the corresponding point for the pixel X_k using a nearest neighbor method and/or a linear interpolation method.
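One common way to map a pixel inside a triangle to its corresponding point is a piecewise affine warp based on barycentric weights. The sketch below assumes that the per-triangle coordinates described above play that role; this is an interpretation and is not confirmed by the text. The function names are hypothetical.

```python
import math


def barycentric(point, tri):
    """Return the barycentric weights of `point` with respect to triangle `tri`."""
    x, y = point
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def corresponding_point(point, model_tri, frame_tri):
    """Map a pixel of image I (inside model_tri) to coordinates in the current frame."""
    l1, l2, l3 = barycentric(point, model_tri)
    x = l1 * frame_tri[0][0] + l2 * frame_tri[1][0] + l3 * frame_tri[2][0]
    y = l1 * frame_tri[0][1] + l2 * frame_tri[1][1] + l3 * frame_tri[2][1]
    return x, y


def nearest_pixel(coords):
    """Nearest neighbor rounding of real-valued coordinates to a pixel index."""
    return math.floor(coords[0] + 0.5), math.floor(coords[1] + 0.5)
```

For example, with `model_tri = ((0, 0), (4, 0), (0, 4))` and `frame_tri = ((10, 10), (18, 10), (10, 18))`, the pixel `(1, 1)` has weights `(0.5, 0.25, 0.25)` and maps to `(12.0, 12.0)` in the current frame.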

The face tracking apparatus may obtain a color from the pixel of the current frame indicated by the coordinates of the corresponding point. By assigning the obtained color to the pixel X_k, the face tracking apparatus may convert the image I into a texture image I'. The texture image I' may then be independent of the shape of the face in the current frame.

The face tracking apparatus may transform the pixels of the texture image I'. For example, the face tracking apparatus may obtain the texture vector A(p, q) by combining, into a single vector, the result of applying grayscale normalization to the texture image I' and/or the result of applying a gradient transform to the texture image I'.
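As a rough illustration of building such a vector, the sketch below concatenates a normalized grayscale image with a gradient magnitude image. The text does not define either transform, so zero-mean, unit-variance normalization and central-difference gradients are assumptions, and all function names are hypothetical.

```python
import math


def normalize_grayscale(img):
    """Flatten a 2-D grayscale image and normalize it to zero mean, unit variance."""
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    var = sum((v - mean) ** 2 for v in pixels) / len(pixels)
    std = math.sqrt(var) or 1.0  # avoid division by zero on a constant image
    return [(v - mean) / std for v in pixels]


def gradient_magnitude(img):
    """Flattened gradient magnitude using differences clamped at the border."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            out.append(math.hypot(gx, gy))
    return out


def texture_vector(img):
    """Combine both transformed images into a single texture vector."""
    return normalize_grayscale(img) + gradient_magnitude(img)
```

The resulting vector has twice as many elements as the texture image has pixels, so a texture model compared against it would need the same layout.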

The face tracking apparatus may calculate the p, q, and b that minimize E(p, q, b). For example, the face tracking apparatus may minimize E(p, q, b) by updating p, q, and b with a gradient descent algorithm. The face tracking apparatus may obtain the feature points of the face in the current frame by applying the calculated p and q to Equation 1.

A cost function according to another embodiment may be defined, as in Equation 14, based on the bias between a two-dimensional shape model and the two-dimensional projection of a three-dimensional shape model.

Here, s(p, q) is the two-dimensional shape model, and Proj(s'(p', q')) is the two-dimensional projection of the three-dimensional shape model. The face tracking apparatus may calculate the p, q, p', and q' that minimize the output of the cost function defined by Equation 14. The face tracking apparatus may obtain the feature points of the face in the current frame by applying the calculated p and q to Equation 1. Alternatively, the face tracking apparatus may obtain the feature points of the face in the current frame by applying the calculated p' and q' to Equation 2.
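The bias between the two shape models can be sketched as a sum of squared point distances. Equation 14 is not reproduced in this text, so the squared-distance form and the orthographic projection (dropping the z coordinate) used below are assumptions made for illustration; the names are hypothetical.

```python
def project_orthographic(points3d):
    """Assumed projection: keep x and y, discard z."""
    return [(x, y) for x, y, _z in points3d]


def shape_bias_cost(shape2d, shape3d, project=project_orthographic):
    """Squared distance between 2-D feature points and projected 3-D feature points."""
    projected = project(shape3d)
    if len(projected) != len(shape2d):
        raise ValueError("both shapes must contain the same number of feature points")
    return sum(
        (x2 - xp) ** 2 + (y2 - yp) ** 2
        for (x2, y2), (xp, yp) in zip(shape2d, projected)
    )
```

A different camera model can be substituted through the `project` argument without changing the cost itself.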

The face tracking apparatus may output a tracking result for each image or each frame. The tracking result may be expressed in various ways.

As one example, the tracking result may be expressed using the face model described above. The tracking result may be expressed as a two-dimensional shape model composed of the two-dimensional coordinates of the feature points of the face. Alternatively, the tracking result may be expressed as a three-dimensional shape model composed of the three-dimensional coordinates of the feature points of the face. Alternatively, the tracking result may be expressed as a texture model that includes texture information of the face.

As another example, the tracking result may be expressed as parameters of the face model. The tracking result may be expressed as the two-dimensional shape parameter and the two-dimensional similarity transformation parameter, which are the parameters of the two-dimensional shape model. Alternatively, the tracking result may be expressed as the three-dimensional shape parameter and the three-dimensional similarity transformation parameter, which are the parameters of the three-dimensional shape model. Alternatively, the tracking result may be expressed as the texture parameter, which is the parameter of the texture model.

As yet another example, the tracking result may be expressed as pose information and facial expression information of the face. The pose information indicates the pose of the face and may include, for example, a frontal pose or a side pose. The facial expression information indicates the expression of the face and may include, for example, a smiling expression or a crying expression.

The ways of expressing the tracking result described above are merely examples, and the way the tracking result is expressed may be modified in various ways. For example, the tracking result may be expressed as various combinations of the items described above.

The face tracking apparatus may evaluate the tracking result and update the key frames or the probability models. For example, referring to FIG. 11, the face tracking apparatus may evaluate the tracking result in step 150 using a pre-trained classifier. By evaluating the tracking result, the face tracking apparatus may determine whether the tracking result derived from the current frame is reliable. If the tracking result is evaluated as successful, the tracking result derived from the current frame may be used for face tracking in the next frame. In this case, the key frames and/or the probability models may be updated. If the tracking result is evaluated as unsuccessful, the tracking result derived from the current frame may not be used for face tracking in the next frame. In this case, the key frames and/or the probability models may not be updated.

The classifier used to evaluate the tracking result may be trained in advance using training samples. The training samples may consist of a plurality of images or streaming video, and the positions of the feature points in each image or each frame may be labeled.

The classifier may be a random tree classifier that classifies the tracking result as a success or a failure. For example, the classifier may be one of a support vector machine (SVM) and a random forest. The classifier may classify successfully tracked samples as positive samples and unsuccessfully tracked samples as negative samples. The information input to the classifier may include at least one of the various parameters included in the tracking result and the outputs of the cost function.

To increase the number of negative samples used to train the classifier, disturbances may be added to the training samples. For example, noise may be added, occlusion may be added, brightness may be changed, or contrast may be changed in the training samples.

If the tracking result is evaluated as successful, the face tracking apparatus may update the key frames in step 160. For example, the face tracking apparatus may obtain the three-dimensional coordinates of the feature points, a pose parameter, and a facial expression parameter from the tracking result. The face tracking apparatus may add a key frame indexed by the pose parameter and the facial expression parameter to the database. The key frame may include the three-dimensional coordinates of the feature points. If a key frame indexed by the same pose parameter and facial expression parameter is already stored in the database, whether to replace it may be decided based on the score assigned by the classifier. For example, if the newly generated key frame receives a higher score from the classifier than the stored key frame, the stored key frame may be replaced with the newly generated key frame.
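The replacement rule above can be sketched with a small in-memory store. The class name `KeyFrameDatabase`, its methods, and the use of discrete labels for the pose and facial expression parameters are all assumptions for illustration; the text does not say how the parameters are quantized or how the database is implemented.

```python
class KeyFrameDatabase:
    """Key frames indexed by (pose, expression); the higher classifier score wins."""

    def __init__(self):
        self._frames = {}

    def update(self, pose, expression, points3d, score):
        """Store the key frame if the slot is empty or the new score is higher.

        Returns True when the key frame was added or replaced an existing one.
        """
        key = (pose, expression)
        stored = self._frames.get(key)
        if stored is None or score > stored["score"]:
            self._frames[key] = {"points3d": points3d, "score": score}
            return True
        return False

    def get(self, pose, expression):
        """Return the stored key frame for this index, or None if absent."""
        return self._frames.get((pose, expression))
```

Keeping a single key frame per index keeps the database bounded by the number of distinct pose and expression combinations.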

If the tracking result is evaluated as successful, the face tracking apparatus may update the probability models in step 170. The face tracking apparatus may update the probability models for calculating the occlusion probabilities of the patches and/or the probability models for calculating the occlusion probabilities of the sections. For example, the face tracking apparatus may use a node split to update the probability model for calculating the occlusion probabilities of a patch.

As described above with reference to FIG. 8, the probability model for calculating the occlusion probabilities of a patch may have a tree structure. The face tracking apparatus may update the probability model by splitting a leaf node when (i) the number of data items in the leaf node is greater than or equal to a first threshold, (ii) the information gain from the node split is greater than or equal to a second threshold, and (iii) the depth of the tree is less than a third threshold.
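The three split conditions can be written as a single predicate. The function and parameter names below are hypothetical, and the threshold values are left to the caller because the text does not specify them.

```python
def should_split_leaf(sample_count, information_gain, tree_depth,
                      min_samples, min_gain, max_depth):
    """Return True only when all three split conditions hold.

    (i)   the leaf holds at least `min_samples` data items (first threshold)
    (ii)  the split yields at least `min_gain` information gain (second threshold)
    (iii) the tree depth is below `max_depth` (third threshold)
    """
    return (
        sample_count >= min_samples
        and information_gain >= min_gain
        and tree_depth < max_depth
    )
```

Note that the first two comparisons are inclusive while the depth comparison is strict, matching the wording of the conditions.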

FIG. 12 is an operational flowchart showing the entire face tracking process according to an embodiment. Referring to FIG. 12, the face tracking apparatus according to an embodiment may perform initialization in step 910, where the probability models may be initialized. The face tracking apparatus may receive an input image in step 920. The face tracking apparatus may detect a face in step 930, where the face region may be detected through a general face detection algorithm. The face tracking apparatus may calculate the parameters of the face model in step 940.

The face tracking apparatus may evaluate the face model in step 950, where the face model may be evaluated through the pre-trained classifier. If the face model is evaluated as unsuccessful, the face tracking apparatus may track the face in the next frame using steps 910 through 950.

If the face model is evaluated as successful, the face tracking apparatus may track the face in the next frame using steps 960 through 980. The face tracking apparatus may receive the next frame of the input image in step 960. The face tracking apparatus may perform the face tracking algorithm in step 970. The face tracking algorithm is described below with reference to FIG. 13. By performing the face tracking algorithm, the face tracking apparatus may receive a face model evaluation result. In step 980, the face tracking apparatus may determine whether the face model evaluation result is successful. If the face model is evaluated as unsuccessful, the face tracking apparatus may track the face in the following frame using steps 910 through 950. If the face model is evaluated as successful, the face tracking apparatus may track the face in the following frame using steps 960 through 980.
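The switching between the two paths of FIG. 12 can be sketched as a simple loop. The callables `detect_and_fit` (standing in for steps 910 through 950) and `track` (standing in for steps 960 through 980) are hypothetical placeholders that return True when the face model is evaluated as successful.

```python
def run_tracker(frames, detect_and_fit, track):
    """Process frames, choosing the path for each frame from the previous evaluation.

    Returns a list of (path, success) pairs, where path is "detect" for the
    initialization and detection path and "track" for the tracking path.
    """
    results = []
    tracking = False  # start on the initialization and detection path
    for frame in frames:
        if tracking:
            success = track(frame)
            path = "track"
        else:
            success = detect_and_fit(frame)
            path = "detect"
        results.append((path, success))
        # A successful evaluation selects the tracking path for the next frame;
        # a failed evaluation falls back to initialization and detection.
        tracking = success
    return results
```

In this sketch a single failed evaluation is enough to send the next frame back through detection, which mirrors the flow described above.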

FIG. 13 is an operational flowchart showing the face tracking algorithm according to an embodiment. The descriptions given with reference to FIGS. 1 through 11 apply as-is to each of the steps shown in FIG. 13. For example, step 971 corresponds to step 110 of FIG. 1. Step 972 corresponds to step 120 of FIG. 1. Step 973 corresponds to step 130 of FIG. 1. Step 974 corresponds to step 140 of FIG. 1. Step 975 corresponds to step 150 of FIG. 11. Step 976 corresponds to step 160 of FIG. 11.

In step 977, the face tracking apparatus may determine whether occlusion is present in the input image. The face tracking apparatus may make this determination based on the occlusion probabilities of the sub-regions. For example, if the occlusion probabilities of all patches and the occlusion probabilities of all sections are smaller than a predetermined threshold, the face tracking apparatus may determine that no occlusion is present in the input image.
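The occlusion check can be sketched as a threshold test over all sub-region probabilities. The function name and the default threshold of 0.5 are assumptions; the text only states that a predetermined threshold is used.

```python
def occlusion_present(patch_probs, section_probs, threshold=0.5):
    """Return False only if every patch and section probability is below threshold."""
    all_probs = list(patch_probs) + list(section_probs)
    return any(p >= threshold for p in all_probs)
```

A caller could then update the probability models only when `occlusion_present(...)` returns False.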

If it is determined that no occlusion is present in the input image, the face tracking apparatus may update the probability models in step 978. Step 978 corresponds to step 170 of FIG. 11. If it is determined that occlusion is present in the input image, the face tracking apparatus may not update the probability models. The face tracking apparatus may return the face model evaluation result in step 979.

FIG. 14 is a block diagram showing a face tracking apparatus according to an embodiment. Referring to FIG. 14, the face tracking apparatus 1400 according to an embodiment includes a face region detection unit 1410, a division unit 1420, an occlusion probability calculation unit 1430, and a tracking unit 1440. The face region detection unit 1410 may detect a face region in an input image. The division unit 1420 may divide the face region into sub-regions. The occlusion probability calculation unit 1430 may calculate occlusion probabilities of the sub-regions. The tracking unit 1440 may track the face included in the input image based on the occlusion probabilities. Here, the input image may include a face that is at least partially occluded.
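The data flow between the four units of FIG. 14 can be sketched as a class that chains four injected callables. The class name, the constructor arguments, and the callable signatures are hypothetical; each callable stands in for the corresponding unit without implementing it.

```python
class FaceTrackingApparatus:
    """Chains the four units: detection, division, occlusion estimation, tracking."""

    def __init__(self, detect_face_region, divide, compute_occlusion, track):
        self.detect_face_region = detect_face_region  # stands in for unit 1410
        self.divide = divide                          # stands in for unit 1420
        self.compute_occlusion = compute_occlusion    # stands in for unit 1430
        self.track = track                            # stands in for unit 1440

    def process(self, image):
        """Run one input image through all four units and return the tracking result."""
        region = self.detect_face_region(image)
        sub_regions = self.divide(region)
        probabilities = self.compute_occlusion(sub_regions)
        return self.track(image, probabilities)
```

Injecting the units as callables keeps the sketch independent of any particular detector or probability model.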

Referring to FIG. 15, the face region detection unit 1410 includes a feature point extraction unit 1411, a selection unit 1412, a pose estimation unit 1413, and a feature point estimation unit 1414. The feature point extraction unit 1411 may extract first feature points from the current frame of the input image. The selection unit 1412 may select at least one key frame from the database. The pose estimation unit 1413 may estimate the pose of the face included in the input image based on the first feature points and second feature points of the at least one key frame. The feature point estimation unit 1414 may estimate third feature points of the face included in the input image based on the estimated pose.

Referring to FIG. 16, the occlusion probability calculation unit 1430 includes a probability model determination unit 1431, a first occlusion probability calculation unit 1432, a second occlusion probability calculation unit 1433, and an occlusion weight map generation unit 1434. The probability model determination unit 1431 may determine first probability models of the patches and second probability models of the sections. The first occlusion probability calculation unit 1432 may calculate first occlusion probabilities of the patches based on the first probability models. The second occlusion probability calculation unit 1433 may calculate second occlusion probabilities of the sections based on the second probability models. The occlusion weight map generation unit 1434 may generate an occlusion weight map based on the first occlusion probabilities and the second occlusion probabilities.

The descriptions given with reference to FIGS. 1 through 13 apply as-is to each of the modules shown in FIGS. 14 through 16, so a more detailed description is omitted.

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware components and software components. For example, the apparatuses, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may also be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents. Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A face tracking method comprising:
detecting a face region in an input image;
dividing the face region into sub-regions;
calculating occlusion probabilities of the sub-regions; and
tracking a face included in the input image based on the occlusion probabilities,
wherein the occlusion probabilities are calculated based on probability models of a plurality of parts constituting a template shape, the plurality of parts corresponding to the sub-regions.

The method according to claim 1,
wherein the input image includes a face at least a portion of which is occluded.

The method according to claim 1,
wherein detecting the face region comprises
detecting the face region based on a previous tracking result.

The method according to claim 1,
wherein detecting the face region comprises:
extracting first feature points from a current frame of the input image;
selecting at least one key frame from a database;
estimating a pose of the face included in the input image based on the first feature points and second feature points of the at least one key frame; and
estimating third feature points of the face included in the input image based on the estimated pose.

The method of claim 4,
wherein the at least one key frame is indexed by a pose parameter and a facial expression parameter.

The method of claim 4,
wherein selecting the at least one key frame comprises
selecting the at least one key frame using a pose parameter and a facial expression parameter of a previous frame of the input image.

The method of claim 4,
wherein the at least one key frame includes
three-dimensional coordinates of previously successfully matched feature points and feature vectors of the previously successfully matched feature points.

The method of claim 4,
wherein estimating the pose of the face comprises:
generating matching relationship information between the first feature points and the second feature points based on the similarity between feature vectors of the first feature points and feature vectors of the second feature points; and
estimating a pose parameter indicating the pose of the face based on the distance between the coordinates of a matched first feature point and the projection coordinates of a matched second feature point.

The method of claim 8,
wherein detecting the face region further comprises
generating a valid matching result based on the distance between the coordinates of the matched first feature point and the projection coordinates of the matched second feature point.

The method according to claim 1,
wherein dividing the face region into the sub-regions comprises:
generating patches based on the positions and colors of pixels included in the face region; and
generating sections based on feature points estimated in the face region.

The method of claim 10,
wherein generating the sections comprises
generating the sections by merging patches adjacent to each of the estimated feature points.

A face tracking method comprising:
detecting a face region in an input image;
dividing the face region into sub-regions;
calculating occlusion probabilities of the sub-regions; and
tracking a face included in the input image based on the occlusion probabilities,
wherein calculating the occlusion probabilities of the sub-regions comprises:
calculating first occlusion probabilities of patches based on first probability models of the patches;
calculating second occlusion probabilities of sections based on second probability models of the sections; and
generating an occlusion weight map based on the first occlusion probabilities and the second occlusion probabilities.

The method of claim 12,
wherein the first probability models are
probability models assigned to those parts, among a plurality of parts constituting a template shape, that correspond to the patches.

The method of claim 12,
wherein feature descriptors of the patches include
a feature associated with the colors of the pixels included in the patches.

The method of claim 12,
wherein the second probability models are
probability models associated with those components, among the major components of the face, that correspond to the sections.

The method of claim 12,
wherein feature descriptors of the sections include
a feature associated with the number of valid matching results included in the sections.

The method of claim 12,
wherein the occlusion weight map includes
an occlusion probability for each of the pixels included in the face region.

The method according to claim 1,
wherein tracking the face included in the input image comprises
adjusting a parameter of a face model representing the face included in the input image using an occlusion weight map.

The method of claim 18,
wherein the face model includes
at least one of a two-dimensional shape model, a three-dimensional shape model, and a texture model.

The method of claim 18,
wherein adjusting the parameter of the face model comprises
adjusting the parameter such that a cost function defined using the occlusion weight map is minimized.

The method according to claim 1, further comprising:
evaluating a tracking result using a pre-trained classifier; and
updating a key frame if the tracking result is evaluated as successful.

22. The method of claim 21, further comprising:
determining whether occlusion is present in the input image if the tracking result is evaluated to be successful; and
if it is determined that no occlusion is present in the input image, updating a probability model.
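
The update logic of the two preceding claims can be sketched as a small control-flow function. The function name `post_tracking_update`, the callables `is_success` and `has_occlusion`, and the use of plain lists to stand in for the key-frame database and the probability-model training data are all hypothetical and chosen only to make the order of the checks explicit.

```python
def post_tracking_update(tracking_result, is_success, has_occlusion,
                         key_frames, model_samples):
    """Apply the post-tracking updates and report which ones were made.

    is_success:    callable standing in for the trained classifier.
    has_occlusion: callable standing in for the occlusion check.
    key_frames:    list standing in for the key-frame database.
    model_samples: list standing in for data used to update the probability model.
    """
    if not is_success(tracking_result):
        # Failed tracking results update nothing.
        return "rejected"

    # A successful result always updates the key frames.
    key_frames.append(tracking_result)

    if has_occlusion(tracking_result):
        # Occluded frames are kept out of the probability model.
        return "key_frame_only"

    # Only successful, unoccluded results update the probability model.
    model_samples.append(tracking_result)
    return "key_frame_and_model"
```

Keeping occluded frames out of the probability model in this way prevents the model of an unoccluded face from being contaminated by sunglasses, masks, or shadows.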

23. A computer-readable recording medium having recorded thereon a program for executing the method according to any one of claims 1 to 22.

24. A face tracking device comprising:
a face region detection unit for detecting a face region in an input image;
a dividing unit for dividing the face region into sub-regions;
an occlusion probability calculation unit for calculating occlusion probabilities of the sub-regions; and
a tracking unit for tracking a face included in the input image based on the occlusion probabilities,
wherein the occlusion probabilities are calculated based on probability models of those parts, among a plurality of parts constituting a template shape, that correspond to the sub-regions.

25. The device of claim 24,
wherein the input image includes a face at least a portion of which is occluded.

26. The device of claim 24,
wherein the face region detection unit comprises:
a feature point extraction unit for extracting first feature points from a current frame of the input image;
a selection unit for selecting at least one key frame from a database;
a pose estimation unit for estimating a pose of the face included in the input image based on the first feature points and second feature points of the at least one key frame; and
a feature point estimation unit for estimating third feature points of the face included in the input image based on the estimated pose.

27. The device of claim 26,
wherein the at least one key frame stores information related to feature points and is indexed by a pose parameter and a facial expression parameter.

28. The device of claim 24,
wherein the dividing unit comprises:
a patch generation unit for generating patches based on the positions and colors of the pixels included in the face region; and
a section generation unit for generating sections based on feature points estimated in the face region.

29. The device of claim 28,
wherein the section generation unit generates the sections by merging patches adjacent to each of the estimated feature points.
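
One way to picture the merging of patches around feature points is sketched below. The claim does not define "adjacent", so this example assumes that a patch is adjacent to a feature point when at least one of its pixels lies within a square window of a given radius around that point. The function name `build_sections`, the `radius` parameter, and the grid-of-patch-ids layout are hypothetical.

```python
def build_sections(patch_of_pixel, feature_points, radius=1):
    """Group patches into sections around estimated feature points.

    patch_of_pixel: 2-D list with the patch id of each pixel
                    (None for pixels outside the face region).
    feature_points: dict mapping a feature name to its (row, col) position.
    radius:         half-size of the square window treated as "adjacent"
                    (an assumption of this sketch).
    Returns a dict mapping each feature name to the set of merged patch ids.
    """
    height = len(patch_of_pixel)
    width = len(patch_of_pixel[0]) if height else 0
    sections = {}
    for name, (row0, col0) in feature_points.items():
        merged = set()
        # Scan the window around the feature point, clipped to the image.
        for row in range(max(0, row0 - radius), min(height, row0 + radius + 1)):
            for col in range(max(0, col0 - radius), min(width, col0 + radius + 1)):
                patch_id = patch_of_pixel[row][col]
                if patch_id is not None:
                    merged.add(patch_id)
        sections[name] = merged
    return sections
```

With this definition a single patch may end up in more than one section when two feature points lie close together; whether that is desirable depends on how the sections are used downstream.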

30. A face tracking device comprising:
a face region detection unit for detecting a face region in an input image;
a dividing unit for dividing the face region into sub-regions;
an occlusion probability calculation unit for calculating occlusion probabilities of the sub-regions; and
a tracking unit for tracking a face included in the input image based on the occlusion probabilities,
wherein the occlusion probability calculation unit comprises:
a probability model determination unit for determining first probability models of patches and second probability models of sections;
a first occlusion probability calculation unit for calculating first occlusion probabilities of the patches based on the first probability models;
a second occlusion probability calculation unit for calculating second occlusion probabilities of the sections based on the second probability models; and
an occlusion weight map generation unit for generating an occlusion weight map based on the first occlusion probabilities and the second occlusion probabilities.