KR101702878B1

KR101702878B1 - Head pose estimation with accumulated histogram and random forest

Info

Publication number: KR101702878B1
Application number: KR1020160049494A
Authority: KR
Inventors: 이칠우; 문성희
Original assignee: 전남대학교산학협력단
Priority date: 2016-04-22
Filing date: 2016-04-22
Publication date: 2017-02-06

Abstract

The present invention relates to a head pose estimation method. More specifically, the head pose estimation method using an accumulated histogram and a random forest calculates the accumulated histogram of a face image and uses the random forest to classify that histogram into one of a set of predetermined angles, thereby accurately estimating the rotation direction of the head with little computation.

Description

[0001] The present invention relates to a method of estimating head direction using a cumulative histogram and a random forest.

More specifically, the present invention relates to a head direction estimation method using a cumulative histogram and a random forest that can accurately estimate the rotation direction of the head with little computation, by calculating a cumulative histogram of a face image and classifying it into one of a set of predetermined angles using a random forest.

In human-computer interaction, the user's head direction provides a great deal of information. In particular, the gaze direction, which is related to the head direction, carries information essential not only for identifying the user but also for inferring the person's intention, estimating the content of a conversation, and distinguishing objects of interest. Head direction recognition is therefore regarded as an important prerequisite in implementing most intelligent interfaces.

Research on head direction estimation falls broadly into local approaches and global approaches. Most local approaches estimate head direction from facial elements such as the eyes, nose, and mouth, as shown in Figure 1. These methods start from the assumption that the positions of the facial organs do not change much even when the facial expression changes. Estimating head direction from the structural feature information of such low-degree-of-freedom facial elements is a representative example of the local approach.

Gee [3] proposed a method that estimates head direction from the positions of the eyes and mouth within the face. Horprasert [4] estimated head direction more precisely by defining a feature vector from the end points of both eyes and the position of the nose. Wang [5] likewise estimated head direction from the positions of the end points of both eyes and the end points of the mouth. Fadda [6] estimated head direction from the positions of the eyes and nose by applying Leonardo da Vinci's golden ratio of human proportions.

Relationships between local features other than facial elements are also used.

Maurer [7] estimated head direction from feature vectors extracted from the image with Gabor-based jets, without using structural information about the facial elements.

Other methods instead build a 3D face model and obtain the head direction from the correspondence between feature points and the model. Kong [8] built a 3D face model and estimated head direction from the correspondence between the model and the end points of both eyes, the nose tip, and the end points of the mouth.

Local approaches estimate direction on the premise that the facial feature vectors are detected accurately. However, individual differences between faces, changes in facial shading caused by lighting, and other factors make accurate feature vector detection difficult, and the results are sensitive to the detected feature vectors.

The global approach uses the entire face image for head direction estimation. The most representative method estimates head direction by comparison with template images.

Niyogi [9] built a system that estimates head direction using TSVQ (Tree Structured Vector Quantization) and low-resolution face image templates. Lanitis [10] used a contour-based Flexible Model as a template to obtain the positions of facial elements and estimate head direction. Sumi [11] located facial elements using an individual template for each element and estimated the approximate face direction by matching against templates that record their positional information. Because template-based methods are affected by individual physical differences, background, lighting, and similar factors, the precision of the features they can obtain may be limited.

There are also methods that estimate head direction using multiple detectors. Huang [12] estimated head direction with a boosting method that combines several classifiers into a tree structure. Other methods estimate head direction using nonlinear regression. Gourier [13] detected and normalized faces using chrominance-based features to obtain imagettes, and estimated head direction with linear auto-associative memories trained for each pose using the Widrow-Hoff learning rule.

If feature vectors are extracted well, the local approach can estimate direction more precisely than template-based methods.

[Prior Art Documents]

[Non-Patent Documents]

[1] http://www.jdl.ac.cn/peal/JDL-PEAL-Release.htm

[2] L. Breiman, "Random Forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.

[3] A. Gee and R. Cipolla, "Non-intrusive Gaze Tracking of Human Computer Interaction," Cambridge University, 1995.

[4] T. Horprasert, Y. Yacoob and L.S. Davis, "Computing 3-D Head Orientation from a Monocular Image Sequence," Proc. 2nd Int. Conf. on Automatic Face and Gesture Recognition, pp. 242-247, 1996.

[5] J.G. Wang and E. Sung, "EM Enhancement of 3D Head Pose Estimated by Point at Infinity," Image and Vision Computing, vol. 25, no. 12, pp. 1864-1874, 2007.

[6] G. Fadda, G.L. Marcialis, F. Roli and L. Ghiani, "Exploiting the Golden Ratio on Human Faces for Head-Pose Estimation," in Image Analysis and Processing - ICIAP 2013, Springer Berlin Heidelberg, pp. 280-289, 2013.

[7] T. Maurer and C. von der Malsburg, "Tracking and Learning Graphs and Pose on Image Sequences of Faces," Proc. 2nd Int. Conf. on Automatic Face and Gesture Recognition, pp. 176-181, 1996.

[8] S.G. Kong and R.O. Mbouna, "Head Pose Estimation From a 2D Face Image Using 3D Face Morphing With Depth Parameters," IEEE Trans. on Image Processing, vol. 24, no. 6, pp. 1801-1808, 2015.

[9] S. Niyogi and W. Freeman, "Example-Based Head Tracking," Proc. 2nd Int. Conf. on Automatic Face and Gesture Recognition, pp. 374-377, 1996.

[10] A. Lanitis, C.J. Taylor, T.F. Cootes and T. Ahmed, "Automatic Interpretation of Human Faces and Hand Gestures Using Flexible Models," Proc. IEEE Int. Conf. on Automatic Face and Gesture Recognition, pp. 98-103, 1995.

[11] Y. Sumi and Y. Ohta, "Detection of face orientation and facial components using distributed appearance modeling," Proc. IEEE Int. Conf. on Automatic Face and Gesture Recognition, pp. 254-259, 1995.

[12] C. Huang, H. Ai, Y. Li and S. Lao, "High-performance rotation invariant multiview face detection," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 671-686, 2007.

[13] N. Gourier, J. Maisonnasse, D. Hall and J.L. Crowley, "Head pose estimation on low resolution images," Lecture Notes in Computer Science 4122, pp. 270-280, 2007.

An object of the present invention is to provide a head direction estimation method that improves estimation accuracy by effectively reflecting the local features of a face image while requiring little computation when estimating the head direction in an image.

To achieve the above object, the present invention provides a head direction estimation method comprising: designing a classifier capable of classifying face direction using face image training data; preprocessing and normalizing an input image to be classified to obtain a face image; converting the face image into a binary edge image (hereinafter, the 'target edge image') and calculating a difference image between an average frontal edge image and the target edge image; generating a cumulative histogram (accumulated histogram) of the difference image; and inputting the cumulative histogram to the classifier to estimate the y-axis rotation angle and the x-axis rotation angle of the face image.

In a preferred embodiment, the preprocessing extracts a head region by removing noise and background from the input image.

In a preferred embodiment, the noise is removed by Gaussian filtering.

In a preferred embodiment, the normalization enlarges or reduces the head region to a specific size.

In a preferred embodiment, the target edge image is obtained by Canny edge detection.

In a preferred embodiment, the classifier is a decision tree trained by the random forest method.

In a preferred embodiment, the cumulative histogram includes an x-axis cumulative histogram, in which the edge distribution of the difference image is accumulated onto the x-axis, and a y-axis cumulative histogram, in which it is accumulated onto the y-axis.

In a preferred embodiment, the classifier receives the x-axis cumulative histogram and the y-axis cumulative histogram and estimates the y-axis rotation angle and the x-axis rotation angle of the face image, respectively.

The present invention further provides a computer program stored on a computer-readable medium for performing the head direction estimation method in combination with a computer.

The present invention further provides a computer that includes a memory, a central processing unit, and an input/output device, and that stores the computer program and performs the head direction estimation method.

The present invention further provides a server computer that includes a memory, a central processing unit, an input/output device, and a communication device, in which the computer program is stored in the memory and can be transmitted to a client computer over a communication network.

The present invention has the following advantageous effects.

According to the head direction estimation method of the present invention, using the cumulative histogram of the face image as the facial feature reflects local features well even with little computation, so the head rotation angle can be estimated accurately.

In addition, according to the head direction estimation method of the present invention, using the random forest algorithm improves the reliability of the estimation.

FIG. 1 is a flowchart of a head direction estimation method according to an embodiment of the present invention.
FIG. 2 shows an example of the face image training data used to train the classifier in the head direction estimation method according to an embodiment of the present invention.
FIG. 3 shows an example of decision trees trained by the random forest method in the head direction estimation method according to an embodiment of the present invention.
FIG. 4 is a diagram explaining the process of extracting a normalized face image from an input image in the head direction estimation method according to an embodiment of the present invention.
FIG. 5 is a diagram explaining the process of calculating the cumulative histogram in the head direction estimation method according to an embodiment of the present invention.
FIG. 6 is a diagram showing head direction estimation results obtained by the head direction estimation method according to an embodiment of the present invention.

The terms used in the present invention were chosen, as far as possible, from general terms in wide current use. In certain cases, however, terms were chosen arbitrarily by the applicant, and in those cases their meaning should be understood from the meaning described or used in the detailed description of the invention rather than from the bare name of the term.

Hereinafter, the technical configuration of the present invention is described in detail with reference to the preferred embodiments shown in the accompanying drawings.

However, the present invention is not limited to the embodiments described herein and may be embodied in other forms. The same reference numerals denote the same elements throughout the specification.

The head direction estimation method according to an embodiment of the present invention estimates how far the head of a person in an input image has rotated about the x-axis and the y-axis within the image, and can be applied to identifying a person's intention or object of interest from the image.

The head direction estimation method according to an embodiment of the present invention is, in practice, performed by a computer storing a computer program for head direction estimation.

The computer is a broad concept that includes not only general personal computers but also embedded systems such as security cameras and smart devices such as smartphones and tablet PCs. Any device that includes a memory storing the computer program, a central processing unit, and an input/output device, and that can perform image processing, may be used.

The computer program may also be provided stored on a recording medium separate from the computer. The recording medium may be specially designed and constructed for the present invention, or may be one known and available to those of ordinary skill in the field of computer software. Examples include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CDs and DVDs; magneto-optical recording media capable of both magnetic and optical recording; and hardware devices specially configured to store and execute program instructions, alone or in combination, such as ROM, RAM, and flash memory.

The computer program may consist of program instructions, local data files, local data structures, and the like, alone or in combination. It may be written in machine code such as that produced by a compiler, or in high-level language code that a computer can execute using an interpreter or the like.

The computer program may also be stored on a server computer and transmitted to a client computer over a communication network for download.

The server computer may include a memory capable of storing the computer program, a central processing unit, an input/output device, and a communication device.

Hereinafter, the head direction estimation method according to an embodiment of the present invention is described in detail with reference to FIG. 2.

Referring to FIG. 2, the head direction estimation method according to an embodiment of the present invention first designs a classifier capable of classifying a given input vector using face image training data (S1000).

In the present invention, decision trees trained by the random forest method are used as the classifier.

Random forest [2] is an algorithm proposed by L. Breiman in 2001. It is a decision-tree meta-learning technique that extends the traditional decision tree to an ensemble of many trees. The trees that make up a random forest skip the pruning step, which is hard to handle in ordinary decision trees, and are produced by growth alone, which makes the classification model easier to design. This works because, by the law of large numbers, combining the results of sufficiently many different random decision trees yields accuracy that exceeds that of a single pruned decision tree.
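The idea of combining the results of many trees can be sketched as follows. This is a minimal illustration that assumes majority voting as the combination rule, which the text does not spell out; `forest_predict` and the toy trees are hypothetical names, not part of the invention.

```python
from collections import Counter

def forest_predict(trees, x):
    """Combine the votes of several trees; each tree is a callable x -> label."""
    votes = Counter(tree(x) for tree in trees)
    # most_common(1) returns the (label, count) pair with the highest count
    return votes.most_common(1)[0][0]

# Three toy "trees" (hypothetical stand-ins for trained decision trees)
toy_trees = [lambda x: 0, lambda x: 15, lambda x: 15]
predicted_angle = forest_predict(toy_trees, x=None)
```

Here two of the three toy trees vote for 15°, so the ensemble returns 15 even though one tree disagrees.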

The more randomness is introduced into a random forest, the lower the correlation between the decision trees; this directly reduces variance and so helps improve the accuracy of the algorithm. Randomness can be introduced through the selection of training data for each tree and through the optimization at each node. The random forest training procedure that takes this into account is as follows.

First, as many bootstraps as the number of decision trees to be generated are drawn from the training set. A bootstrap is a set, equal in size to the original training set, produced by repeated random sampling with replacement from the training set. Because trees built from different bootstraps see different training images, the split function changes from the very first node of each tree, so trees of diverse shapes are obtained. In other words, the bootstrap is the mechanism that injects randomness into the selection of training images.
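Bootstrap construction as described above can be sketched in a few lines. The function name `make_bootstraps` and the fixed seed are assumptions made for illustration only.

```python
import random

def make_bootstraps(training_set, n_trees, seed=0):
    """Draw one bootstrap per tree: same size as the training set, sampled with replacement."""
    rng = random.Random(seed)
    n = len(training_set)
    # rng.choices samples with replacement, so items can repeat within a bootstrap
    return [rng.choices(training_set, k=n) for _ in range(n_trees)]

samples = list(range(10))  # stand-ins for ten training images
bootstraps = make_bootstraps(samples, n_trees=3)
```

Each bootstrap has the same length as the original set but typically omits some items and repeats others, which is what makes the trees differ.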

After the bootstraps are generated, a decision tree is grown from each bootstrap. Unlike an ordinary decision tree, a tree in a random forest does not consider every feature value of the training images, or every dimension of those feature values, at each node; it builds the split function from only a pre-specified number of randomly selected ones. As a result, trees of different shapes can be grown even from the same training images. In other words, randomizing the number or dimensions of the feature values is the mechanism that injects randomness into node optimization. The precision of each individual random decision tree may fall, but, as explained above, the accuracy and stability of the random forest that combines them for prediction increase.
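The randomized node optimization can be illustrated with a small sketch. It assumes threshold splits scored by entropy-based information gain; the text does not fix the form of the split function, and `best_random_split` is a hypothetical helper.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_random_split(X, y, n_candidates, rng):
    """Pick the best (feature, threshold) among a random subset of feature dimensions."""
    candidates = rng.sample(range(len(X[0])), n_candidates)
    best = (None, None, 0.0)  # (feature index, threshold, information gain)
    for f in candidates:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send data to both children
            gain = entropy(y) - (len(left) * entropy(left)
                                 + len(right) * entropy(right)) / len(y)
            if gain > best[2]:
                best = (f, t, gain)
    return best

X = [[0, 5, 1], [1, 5, 1], [2, 5, 0], [3, 5, 0]]  # toy 3-dimensional features
y = [-15, -15, 15, 15]                            # toy angle labels
feature, threshold, gain = best_random_split(X, y, n_candidates=2,
                                             rng=random.Random(0))
```

Only two of the three dimensions are examined at this node; a different random draw may select a different dimension, which is how trees grown from the same data end up with different shapes.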

The criteria for stopping the growth of each decision tree are as follows. First, when the number of training samples reaching a node is below a certain value, splitting further would handle too little data and could lose generality, so growth stops and a leaf node is created. Second, when the depth of the decision tree reaches a certain value, growth stops and a leaf node is created, because for a tree grown fully to a given depth the total computation increases exponentially with depth and time efficiency drops sharply. Third, when the information gain obtainable by splitting the data is at or below a certain value, the node has very little influence on the accuracy of the tree, so growth stops and a leaf node is created.
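The three stopping criteria map directly onto a single predicate. The threshold values below (`min_samples`, `max_depth`, `min_gain`) are placeholders chosen for illustration; the text only says "a certain value" for each.

```python
def should_stop(n_samples, depth, info_gain,
                min_samples=5, max_depth=10, min_gain=1e-3):
    """Return True when a node should become a leaf instead of being split."""
    if n_samples < min_samples:   # criterion 1: too few samples reach the node
        return True
    if depth >= max_depth:        # criterion 2: depth limit reached
        return True
    if info_gain <= min_gain:     # criterion 3: the split gains too little information
        return True
    return False
```

For example, `should_stop(100, 3, 0.2)` is false with these placeholder thresholds, while reducing the sample count to 3 makes it true.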

FIG. 3 shows an example in which decision trees (T₁, T₂) are generated using the random forest, with image features present in the image used as arbitrary dimension values.

In a random forest, the more dimensions are considered when optimizing a node, the lower the randomness and the higher the correlation with the other decision trees, so most trees take similar shapes; in that case performance is similar to that of a single decision tree. Conversely, specifying a small number of dimensions increases randomness and reduces the correlation between trees, so trees of diverse shapes are generated, and combining them gives more accurate results than a single decision tree.

In the present invention, cumulative histograms (accumulated histograms, H1 and H2) are used as the image features. The cumulative histogram is described in detail with reference to FIG. 4 and FIG. 5.

FIG. 2 shows the face image training data used to train the decision trees; the present invention uses the CAS-PEAL dataset [1]. The dataset consists of images of 1,040 individuals, 595 male and 445 female, with various sources of variation including pose, expression, accessories, and lighting conditions.

The database for head direction recognition consists of 21 (7 × 3) image groups per individual. For 101 of the individuals the rotations about the vertical axis are (-67°, -45°, -22°, 0°, 22°, 45°, 67°), and for the other 939 they are (-45°, -30°, -15°, 0°, 15°, 30°, 45°); the rotations about the horizontal axis are (-30°, 0°, 30°) for both groups. The present invention uses only the pose images of the latter 939 individuals.

That is, the classifier is designed to classify the rotation angle of the face about the vertical axis (y-axis) into seven angles, -45°, -30°, -15°, 0°, 15°, 30°, and 45°, and the rotation angle about the horizontal axis (x-axis) into three angles, -30°, 0°, and 30°.
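The class layout can be written down as two small lookup tables. This sketch assumes the first vertical-axis value is -45° (seven symmetric angles); the names `YAW_ANGLES`, `PITCH_ANGLES`, and `decode_pose` are illustrative and do not appear in the source.

```python
# Rotation about the vertical (y) axis: seven classes, in degrees
YAW_ANGLES = (-45, -30, -15, 0, 15, 30, 45)
# Rotation about the horizontal (x) axis: three classes, in degrees
PITCH_ANGLES = (-30, 0, 30)

def decode_pose(yaw_class, pitch_class):
    """Map a pair of class indices to the corresponding pair of angles."""
    return YAW_ANGLES[yaw_class], PITCH_ANGLES[pitch_class]

# 7 x 3 = 21 pose groups per individual, matching the dataset description
N_POSES = len(YAW_ANGLES) * len(PITCH_ANGLES)
```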

Next, an input image, which is the target image to be classified, is received (S2000).

Referring to FIG. 4, the input image (100) contains not only a person's head region (110) but also a background (120).

Next, preprocessing and normalization of the input image (100) are performed (S3000).

The preprocessing includes removing the background (120) from the input image (100) to extract a face image (111) containing the head region (110), and removing unnecessary noise.

The face image (111) may be extracted using a known head region extraction method [4], but there is no particular restriction.

In the present invention, a Gaussian filter is used to remove noise arising under different lighting conditions.
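As an illustration of Gaussian noise removal, the following sketch applies a fixed 3×3 binomial approximation of a Gaussian kernel to a grayscale image stored as a list of rows. The kernel size and the border handling are assumptions; the source does not give filter parameters.

```python
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]  # weights sum to 16

def gaussian_blur3(img):
    """Smooth a grayscale image with a 3x3 Gaussian-like kernel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)  # replicate border pixels
                    cc = min(max(c + dc, 0), w - 1)
                    acc += KERNEL[dr + 1][dc + 1] * img[rr][cc]
            out[r][c] = acc / 16.0
    return out

noisy = [[10, 10, 10],
         [10, 90, 10],
         [10, 10, 10]]
smoothed = gaussian_blur3(noisy)
```

The isolated bright pixel in the center is pulled toward its neighbors, while a uniform image is left unchanged.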

The normalization converts the face image (111) to a standard, comparable size and brightness to obtain a normalized face image (200).

In the present invention, the face image (111) is normalized to a 160×90 image, but this size may be changed.
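Size normalization can be sketched with nearest-neighbor resampling. The source does not say which of 160 and 90 is the height, nor which interpolation is used; the sketch below assumes 160 rows by 90 columns, and `resize_nearest` is a hypothetical helper.

```python
def resize_nearest(img, out_h, out_w):
    """Enlarge or shrink an image (list of rows) by nearest-neighbor sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

face = [[1, 2],
        [3, 4]]  # toy 2x2 "face image"
normalized = resize_nearest(face, out_h=160, out_w=90)
```

Whatever the input size, the output always has the same dimensions, so histograms computed later have a fixed number of bins.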

다음, 상기 정규화된 얼굴 영상(200)을 이진 에지 영상(binary edge image, 이하, '대상 에지 영상'이라 함)으로 변환한다(S4000).Next, the normalized facial image 200 is converted into a binary edge image (S4000).

또한, 도 5를 참조하면, 상기 대상 에지 영상(210)은 상기 얼굴 영상(110)의 에지 부분을 이진화하여 획득한 영상으로 본 발명에서는 캐니 에지 검출(Canny Edge Detection)을 통해 획득하였다.5, the target edge image 210 is an image obtained by binarizing an edge portion of the face image 110. In the present invention, the edge image 210 is acquired through Canny Edge Detection.

Next, a difference image between the target edge image 210 and an average frontal edge image 300 is computed (S5000).

The average frontal edge image 300 is obtained by preprocessing and normalizing the face image training data shown in FIG. 2, averaging the images whose rotation about the x-axis or y-axis is 0, and then applying a threshold, chosen so that the features stand out, to convert the result into a binary edge image.

Because the difference image between the target edge image 210 and the average frontal edge image 300 exposes edges whose positions differ from those of the facial elements (mouth, eyes, nose, eyebrows, etc.) in the average frontal edge image 300, the cumulative histogram described below captures how the edges change as the face rotates.
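The two operations described here can be sketched as follows. The patent does not specify the threshold value or the exact difference operator, and the order of averaging and edge extraction is open to interpretation. This sketch assumes binary (0/1) edge maps as input, a threshold of 0.5 on the per-pixel mean, and an absolute pixel-wise difference. Both function names are hypothetical.

```python
def average_binary_image(images, threshold=0.5):
    """Average same-sized 0/1 images pixel-wise and re-binarize the mean.

    The threshold of 0.5 is an assumed value; the patent only says a
    threshold is applied so that the features stand out.
    """
    count = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [
        [1 if sum(img[y][x] for img in images) / count >= threshold else 0
         for x in range(width)]
        for y in range(height)
    ]


def difference_image(target_edges, reference_edges):
    """Pixel-wise absolute difference of two 0/1 edge images (assumed operator).

    A pixel is 1 where exactly one of the two images has an edge.
    """
    return [
        [abs(t - r) for t, r in zip(t_row, r_row)]
        for t_row, r_row in zip(target_edges, reference_edges)
    ]
```

For a frontal input the difference image is nearly empty, while a rotated face leaves edges on the side toward which the facial elements have shifted.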

Next, a cumulative histogram (accumulated histogram) of the difference image is generated (S6000).

Referring to FIG. 5, the cumulative histogram 400 may include a y-axis cumulative histogram 410, in which the edges of the difference image are accumulated along the y-axis, and an x-axis cumulative histogram 420, in which they are accumulated along the x-axis; these are used to estimate the rotation angle in the y-axis direction and the rotation angle in the x-axis direction, respectively.

The cumulative histogram may also undergo histogram smoothing and normalization.
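A minimal sketch of the histogram step follows. It reads "accumulated along the y-axis" as summing each column (one bin per x position) and "accumulated along the x-axis" as summing each row; that reading is an interpretation of the text. The moving-average smoothing and sum-to-one normalization are likewise assumptions, since the patent does not name the specific methods.

```python
def projection_histograms(diff):
    """Return (hist_y, hist_x) for a 0/1 difference image given as rows.

    hist_y: edge pixels accumulated along the y-axis, one bin per column.
    hist_x: edge pixels accumulated along the x-axis, one bin per row.
    """
    hist_y = [sum(column) for column in zip(*diff)]
    hist_x = [sum(row) for row in diff]
    return hist_y, hist_x


def smooth(hist, radius=1):
    """Moving-average smoothing; the window shrinks at both ends."""
    smoothed = []
    for i in range(len(hist)):
        window = hist[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def normalize(hist):
    """Scale a histogram so its bins sum to 1 (all zeros stay zeros)."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [v / total for v in hist]
```

Only additions are needed to build the histograms, which is consistent with the low computational cost the description attributes to the method.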

Estimating the rotation angle of the face from the cumulative histograms 410 and 420 is the core of the present invention. Conventional methods extract the positions of facial features (eyes, nose, mouth) and analyze their positional relationships, so their head direction estimation accuracy drops sharply when the position of any feature cannot be extracted accurately. The present invention instead represents the facial features by edge information and histograms accumulated from it, which can improve estimation accuracy; and because only simple additions and subtractions are required, processing speed is greatly improved, making the method well suited to real-time detection.

The decision tree shown in FIG. 3 is likewise designed by passing the face image training data of FIG. 2 through preprocessing, normalization, and edge image conversion to generate cumulative histograms, and then using the generated cumulative histograms as training data.

Next, the cumulative histograms 410 and 420 are input to the classifier as feature vectors (S7000); the classifier classifies the cumulative histograms 410 and 420 by rotation angle about the x-axis and the y-axis, thereby estimating the head rotation angle (S8000), and the process ends.
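To illustrate how a trained forest could map a histogram feature vector to one of the predetermined angles, the sketch below shows only the inference step: each tree is traversed to a leaf, and the forest returns the majority vote. The tuple-based tree encoding, the thresholds, and the function names are hypothetical; the patent does not disclose the tree structure or training parameters.

```python
from collections import Counter


def predict_tree(node, features):
    """Walk one decision tree down to a leaf.

    An internal node is a tuple (feature_index, threshold, left, right);
    a leaf is an angle class in degrees. This encoding is hypothetical.
    """
    while isinstance(node, tuple):
        index, threshold, left, right = node
        node = left if features[index] <= threshold else right
    return node


def predict_forest(trees, features):
    """Return the angle class chosen by the most trees (majority vote)."""
    votes = Counter(predict_tree(tree, features) for tree in trees)
    return votes.most_common(1)[0][0]


# Toy forest over a two-bin feature vector, for illustration only.
toy_trees = [
    (0, 0.5, -30, (1, 0.2, 0, 30)),
    (1, 0.2, 0, 30),
    (0, 0.5, 0, 30),
]
print(predict_forest(toy_trees, [0.7, 0.1]))
```

One plausible arrangement, not confirmed by the source, is to use one forest per axis: the y-axis cumulative histogram feeds a forest that outputs the y-axis rotation class, and the x-axis cumulative histogram feeds a second forest for the x-axis rotation class.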

FIG. 6 shows head direction estimation results obtained by the head direction estimation method according to an embodiment of the present invention. From the CAS-PEAL dataset shown in FIG. 2, images of 939 subjects were used, with rotations about the vertical axis of (-45°, -30°, -15°, 0°, 15°, 30°, 45°) and rotations about the horizontal axis of (-30°, 0°, 30°); 700 subjects were used as training data for designing the decision tree, and 239 subjects were used as estimation targets.

According to the estimation results shown in FIG. 6, the direction was recognized almost perfectly when the head faced the front. The recognition rate fell when the head was rotated far to the left or right, or up or down, but on average it was higher than that of conventional head direction estimation methods.

As described above, the present invention has been shown and described with reference to preferred embodiments, but it is not limited to those embodiments; various changes and modifications may be made by a person of ordinary skill in the art to which the invention pertains without departing from the spirit of the invention.

100: Input image 110: Head region
111: Face image 120: Background
200: Normalized face image 210: Target edge image
300: Average frontal edge image 400: Cumulative histogram

Claims

1. A head direction estimation method, comprising:
designing a classifier capable of classifying a face direction using face image training data;
preprocessing and normalizing an input image to be classified to obtain a face image;
converting the face image into a binary edge image (hereinafter referred to as a 'target edge image'), and calculating a difference image between an average frontal edge image and the target edge image;
generating a cumulative histogram of the difference image; and
inputting the cumulative histogram to the classifier to estimate a y-axis rotation angle and an x-axis rotation angle of the face image.

2. The method of claim 1,
wherein the preprocessing extracts a head region by removing noise and the background from the input image.

3. The method of claim 2,
wherein the noise is removed by Gaussian filtering.

4. The method of claim 2,
wherein the normalization enlarges or reduces the head region to a specific size.

5. The method of claim 1,
wherein the target edge image is obtained through Canny edge detection.

6. The method of claim 1,
wherein the classifier is a decision tree trained by a random forest method.

7. The method of any one of claims 1 to 6,
wherein the cumulative histogram includes an x-axis cumulative histogram, in which the edge distribution of the difference image is accumulated along the x-axis, and a y-axis cumulative histogram, in which it is accumulated along the y-axis.

8. The method of claim 7,
wherein the classifier receives the x-axis cumulative histogram and the y-axis cumulative histogram and estimates the y-axis rotation angle and the x-axis rotation angle of the face image, respectively.

9. A computer program stored in a computer-readable medium for performing, in combination with a computer, the head direction estimation method of claim 7.

10. A computer comprising a memory, a central processing unit, and an input/output device, in which the computer program of claim 9 is stored to perform the head direction estimation method.

11. A server computer comprising a memory, a central processing unit, an input/output device, and a communication device, wherein the computer program of claim 9 is stored in the memory and can be transmitted to a client computer through a communication network.