KR101534776B1

KR101534776B1 - A Template-Matching-Based High-Speed Face Tracking Method Using Depth Information

Info

Publication number: KR101534776B1
Application number: KR1020130110805A
Authority: KR
Inventors: 김동욱; 서영호; 김우열
Original assignee: 광운대학교 산학협력단
Priority date: 2013-09-16
Filing date: 2013-09-16
Publication date: 2015-07-09
Also published as: KR20150031522A

Abstract

The invention relates to a template-matching-based high-speed face tracking method that receives a depth image and a color image, each consisting of temporally consecutive frames, from a camera photographing a person's face, detects the face in one frame to generate a template, and then tracks the face with that template from the next frame onward. The method comprises: (a) comparing the depth of the template with the depth of the current frame and updating the size of the template; (b) setting, as a search region, the area of the current frame obtained by expanding the updated template about its center by a predetermined ratio of its size (hereinafter, the expansion ratio); (c) performing template matching with the updated template in the search region and selecting the matching position; and (d) updating the position of the updated template to the matching position, taking the frame following the current frame as the new current frame, and repeating steps (a) to (c).
With this method, early termination and sparse search are applied and the search region is reduced by adjusting the template size, so that face tracking is fast enough for real-time processing, while matching correction and similar steps also improve tracking accuracy.

Description

{Template-Matching-Based High-Speed Face Tracking Method Using Depth Information}

The present invention relates to a method for tracking a face at high speed using only depth information, and more particularly to a template-matching-based high-speed face tracking method that uses template matching, applies an early termination method and a sparse search method, and performs matching correction on neighboring pixels to correct the resulting tracking errors.

In particular, the present invention relates to a template-matching-based high-speed face tracking method that estimates the depth value of the face being tracked in order to compensate for depth changes caused by face movement, adjusts the size of the template according to the estimate, and adjusts the search region for template matching according to the adjusted template size.

Methods for tracking a part of the human body have long been studied in computer vision and many other fields, and are widely used in security systems, video conferencing, robot vision, interactive systems based on HCI (human-computer interface), smart homes, and the like [Non-Patent Document 1][Non-Patent Document 2]. Among these, the face has been studied most actively, and the goal is fast and accurate tracking.

Face tracking detects the face of a moving person in an image sequence input as video and follows its path of movement; research has focused on fast execution in real-time environments.

The simplest way to track a face is to treat the face as a single object and match the block corresponding to that object [Non-Patent Document 3]. Other methods separate moving objects from a dynamic background using preprocessing or modeling based on mathematical or physical phenomena, and then track the most similar object in the image [Non-Patent Documents 4-6]. [Non-Patent Document 4] is a feature-based method that uses combined color and motion information for robust face analysis, and [Non-Patent Document 5] combines machine learning with a probabilistic framework to track face pose in a video sequence. [Non-Patent Document 6] proposes a stable and robust technique for 3D head tracking under varying illumination conditions.

In contrast to earlier methods that used two-dimensional images, methods that use three-dimensional motion [Non-Patent Document 7] have been studied in order to exploit three-dimensional information.

[Non-Patent Document 7] proposed a method for tracking a person with 3-D transformation or rotation information by integrating optical flow and depth. Beyond three-dimensional motion, methods that use disparity values obtained by stereo matching [Non-Patent Document 8][Non-Patent Document 9][Non-Patent Document 10] have also been studied in order to exploit three-dimensional information.

[Non-Patent Document 8] proposed 3D face tracking based on a stereo vision camera system that does not need the preprocessing (calibration) step normally required when acquiring 3D images with a stereo camera. [Non-Patent Document 9] proposed a method that combines the advantages of color distribution and depth, and [Non-Patent Document 10] proposed a PCA-based face recognition algorithm using disparity information extracted from a stereo camera.

Object tracking is very sensitive to illumination changes, which cause tracking problems. Using three-dimensional information, that is, depth information, as described above makes tracking robust to a wide range of illumination changes and resolves the problems they cause.

More recently, depth information can be acquired in real time with depth cameras such as the TOF (time-of-flight) SR4000 [Non-Patent Document 11][Non-Patent Document 12] or the structured-light Kinect from Microsoft [Non-Patent Document 13], and research that uses the depth information obtained from them directly for face tracking is also under way.

[Non-Patent Document 14] proposed a feature-based method for tracking a person's nose using the SR3000. [Non-Patent Document 15] used a template matching method that tracks the face using the depth of the template, with the Kinect and the SR4000. In addition, a method that uses a depth camera to reduce the complexity of motion estimation [Non-Patent Document 18] and a method that accurately estimates zoom motion using distance information between the current block and the reference block [Non-Patent Document 19] have been proposed.

Kinect-based methods also need a more accurate, faster, and more robust implementation for real-time face detection and tracking. In particular, this calls for a method to which early termination and sparse search can be applied effectively.

[Non-Patent Document 1] G. Q. Zhao, et al., "A Simple 3D Face Tracking Method Based on Depth Information," Int'l Conf. on Machine Learning and Cybernetics, pp. 5022-5027, Aug. 2005.
[Non-Patent Document 2] C. X. Wang and Z. Y. Li, "A New Face Tracking Algorithm Based on Local Binary Pattern and Skin Color Information," ISCSCT, Vol. 2, pp. 20-22, Dec. 2008.
[Non-Patent Document 3] K. Hariharakrishnan and D. Schonfeld, "Fast Object Tracking Using Adaptive Block Matching," IEEE Trans. Multimedia, Vol. 7, No. 5, 2005.
[Non-Patent Document 4] M. Lievin and F. Luthon, "Nonlinear Color Space and Spatiotemporal MRF for Hierarchical Segmentation of Face Features in Video," IEEE Trans. Image Processing, Vol. 13, No. 1, Jan. 2004.
[Non-Patent Document 5] Y. Lin et al., "Real-time Face Tracking and Pose Estimation with Partitioned Sampling and Relevance Vector Machine," IEEE Intl. Conf. Robotics and Automation, pp. 453-458, 2009.
[Non-Patent Document 6] A. An and M. Chung, "Robust Real-time 3D Head Tracking Based on Online Illumination Modeling and its Application to Face Recognition," IEEE Intl. Conf. Intelligent Robots and Systems, pp. 1466-1471, 2009.
[Non-Patent Document 7] R. Okada, Y. Shirai, and J. Miura, "Tracking a Person with 3-D Motion by Integrating Optical Flow and Depth," Proc. Fourth IEEE Int'l Conf. Automatic Face and Gesture Recognition, pp. 336-341, 2000.
[Non-Patent Document 8] G. Zhao, et al., "A Simple 3D Face Tracking Method Based on Depth Information," Intl. Conf. Machine Learning and Cybernetics, pp. 5022-5027, 2005.
[Non-Patent Document 9] Y. H. Lee et al., "A Robust Face Tracking using Stereo Camera," SICE Annual Conf., pp. 1985-1989, Sept. 2007.
[Non-Patent Document 10] S. Kosov et al., "Rapid Stereo-vision Enhanced Face Recognition," IEEE Intl. Conf. Image Processing, pp. 2437-2440, Sept. 2010.
[Non-Patent Document 11] Mesa Imaging, SR4000 User Manual v2.0, May 2011.
[Non-Patent Document 12] M. Hacker, et al., "Geometric Invariants for Facial Feature Tracking with 3D TOF Cameras," Int'l Symposium on Signals, Circuits and Systems, Vol. 1, pp. 1-4, 2007.
[Non-Patent Document 13] J. L. Wilson, Microsoft Kinect for Xbox 360, PCMag.com, Nov. 10, 2010.
[Non-Patent Document 14] M. Hacker, et al., "Geometric Invariants for Facial Feature Tracking with 3D TOF Cameras," Int'l Symposium on Signals, Circuits and Systems, Vol. 1, pp. 1-4, 2007.
[Non-Patent Document 15] X. Suau, J. Ruiz-Hidalgo and J. Casas, "Real-Time Head and Hand Tracking Based on 2.5D Data," IEEE Trans. Multimedia, Vol. 14, No. 3, pp. 575-585, June 2012.
[Non-Patent Document 16] Y.-J. Bae, H.-J. Choi, Y.-H. Seo and D.-W. Kim, "A Fast and Accurate Face Detection and Tracking Method by using Depth Information," J. Korean Institute of Communications and Information Science (KICS), Vol. 37A, No. 07, pp. 586-599.
[Non-Patent Document 17] P. Viola and M. J. Jones, "Robust Real-Time Face Detection," Computer Vision, Vol. 52, No. 2, pp. 137-154, 2004.
[Non-Patent Document 18] S.-K. Kwon and S.-W. Kim, "Motion Estimation Method by Using Depth Camera," J. Broadcast Engineering, Vol. 17, No. 4, pp. 676-683, Jul. 2012.
[Non-Patent Document 19] S.-K. Kwon, Y.-H. Park and K.-R. Kwon, "Zoom Motion Estimation Method by Using Depth Information," J. Korea Multimedia Society, Vol. 16, No. 2, pp. 131-137, Feb. 2013.

An object of the present invention is to solve the problems described above by providing a template-matching-based high-speed face tracking method that tracks a face at high speed using only depth information, using template matching with an early termination method and a sparse search method, and performing matching correction on neighboring pixels to correct the resulting tracking errors.

Another object of the present invention is to provide a template-matching-based high-speed face tracking method that estimates the depth value of the face being tracked in order to compensate for depth changes caused by face movement, adjusts the size of the template according to the estimate, and adjusts the search region for template matching according to the adjusted template size.

To achieve these objects, the present invention relates to a template-matching-based high-speed face tracking method that receives a depth image and a color image, each consisting of temporally consecutive frames, from a camera photographing a person's face, detects the face in one frame to generate a template, and then tracks the face with that template from the next frame onward. The method is characterized by comprising: (a) comparing the depth of the template with the depth of the current frame and updating the size of the template; (b) setting, as a search region, the area of the current frame obtained by expanding the updated template about its center by a predetermined ratio of its size (hereinafter, the expansion ratio); (c) performing template matching with the updated template in the search region and selecting the matching position; and (d) updating the position of the updated template to the matching position, taking the frame following the current frame as the new current frame, and repeating steps (a) to (c).
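Step (b) can be illustrated with a short sketch. It is not taken from the patent: the function name `search_region`, the top-left/size box convention, and the reading of the expansion ratio as a margin of `alpha` times the template size added on every side are assumptions made only for illustration.

```python
def search_region(x, y, w, h, alpha, frame_w, frame_h):
    """Return (x0, y0, x1, y1), the search region in the current frame.

    (x, y) is the top-left corner of the updated template and (w, h) its size.
    alpha is the expansion ratio. Assumption: the region grows by
    alpha * size on every side and is clipped to the frame borders.
    """
    margin_x = int(round(alpha * w))
    margin_y = int(round(alpha * h))
    x0 = max(0, x - margin_x)
    y0 = max(0, y - margin_y)
    x1 = min(frame_w, x + w + margin_x)   # exclusive right edge
    y1 = min(frame_h, y + h + margin_y)   # exclusive bottom edge
    return x0, y0, x1, y1


# A 40x50 template at (100, 80) in a 640x480 frame, expanded by 50%.
region = search_region(100, 80, 40, 50, 0.5, 640, 480)
```

Because the margin scales with the template size, a template that shrinks in step (a) also shrinks the search region, which is where the claimed speed gain would come from.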

Further, in the template-matching-based high-speed face tracking method of the present invention, in step (a), the template and the region of the current frame corresponding to the template (hereinafter, the sampling region of the current frame) are each divided into sub-blocks, the sub-block with the largest depth is taken as the depth of each, and the size of the template is adjusted and updated in proportion to the depth difference between the template and the sampling region.
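A minimal sketch of step (a) follows, under stated assumptions. The text only says that the sub-block with the largest depth represents each region and that the size changes in proportion to the depth difference; the use of the sub-block mean, the `n` x `n` grid, the linear form of the scaling, and the `gain` constant (whose sign depends on whether larger depth values mean nearer or farther) are all hypothetical.

```python
def block_mean(depth, x0, y0, x1, y1):
    """Mean depth over the rectangle [x0, x1) x [y0, y1) of a 2D list."""
    total = 0
    for row in depth[y0:y1]:
        total += sum(row[x0:x1])
    return total / ((x1 - x0) * (y1 - y0))


def region_depth(depth, x, y, w, h, n=2):
    """Depth of a region: the largest mean depth among its n x n sub-blocks."""
    best = None
    for by in range(n):
        for bx in range(n):
            x0, x1 = x + bx * w // n, x + (bx + 1) * w // n
            y0, y1 = y + by * h // n, y + (by + 1) * h // n
            mean = block_mean(depth, x0, y0, x1, y1)
            if best is None or mean > best:
                best = mean
    return best


def resize_template(w, h, d_template, d_current, gain):
    """Scale the template size linearly with the depth difference.

    gain is a hypothetical calibration constant, not a value from the patent.
    """
    scale = 1.0 + gain * (d_current - d_template)
    return max(1, round(w * scale)), max(1, round(h * scale))


depth = [[10, 10, 20, 20],
         [10, 10, 20, 20],
         [5, 5, 30, 30],
         [5, 5, 30, 30]]
d = region_depth(depth, 0, 0, 4, 4)          # bottom-right sub-block wins
new_size = resize_template(40, 50, 100.0, 110.0, 0.01)
```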

Further, in the template-matching-based high-speed face tracking method of the present invention, in step (c), the per-pixel SAD (sum of absolute differences; hereinafter PSAD) between the updated template and each position in the search region is computed, and the position with the minimum value is selected as the matched position.
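The PSAD criterion can be sketched as below. This is an illustrative sketch, not the patented code: the function names, the 2D-list depth maps, and the exhaustive scan are assumptions. Dividing the SAD by the pixel count presumably keeps scores comparable when the template size changes between frames.

```python
def psad(template, frame, px, py):
    """Per-pixel SAD between the template and the frame patch at (px, py)."""
    h, w = len(template), len(template[0])
    total = 0
    for ty in range(h):
        for tx in range(w):
            total += abs(template[ty][tx] - frame[py + ty][px + tx])
    return total / (w * h)


def best_match(template, frame, x0, y0, x1, y1):
    """Scan every top-left position in [x0, x1) x [y0, y1); return (pos, score)."""
    h, w = len(template), len(template[0])
    best_pos, best_val = None, float("inf")
    for py in range(y0, y1 - h + 1):
        for px in range(x0, x1 - w + 1):
            val = psad(template, frame, px, py)
            if val < best_val:
                best_pos, best_val = (px, py), val
    return best_pos, best_val


# Toy depth frame: a 2x2 "face" patch placed at x=2, y=1.
frame = [[0] * 5 for _ in range(5)]
frame[1][2], frame[1][3] = 7, 8
frame[2][2], frame[2][3] = 9, 6
template = [[7, 8], [9, 6]]
pos, score = best_match(template, frame, 0, 0, 5, 5)
```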

Further, in the template-matching-based high-speed face tracking method of the present invention, in step (c), the search is terminated early when the per-pixel SAD (PSAD) between the updated template and the search region falls below a predetermined threshold.
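Early termination can be sketched as a loop that stops at the first candidate whose PSAD is below the threshold. The names `match_with_early_exit` and `candidates`, and the threshold value used in the example, are assumptions; the text gives no concrete threshold here.

```python
def psad(template, frame, px, py):
    """Per-pixel SAD between the template and the frame patch at (px, py)."""
    h, w = len(template), len(template[0])
    total = sum(abs(template[ty][tx] - frame[py + ty][px + tx])
                for ty in range(h) for tx in range(w))
    return total / (w * h)


def match_with_early_exit(template, frame, candidates, threshold):
    """Visit candidates in order; stop once a PSAD falls below threshold.

    Returns (best position, best PSAD, number of positions evaluated).
    """
    best_pos, best_val, evaluated = None, float("inf"), 0
    for px, py in candidates:
        val = psad(template, frame, px, py)
        evaluated += 1
        if val < best_val:
            best_pos, best_val = (px, py), val
        if val < threshold:
            break  # good enough: skip the remaining candidates
    return best_pos, best_val, evaluated


frame = [[0] * 5 for _ in range(5)]
frame[1][2], frame[1][3] = 7, 8
frame[2][2], frame[2][3] = 9, 6
template = [[7, 8], [9, 6]]
result = match_with_early_exit(template, frame, [(0, 0), (2, 1), (3, 3)], 1.0)
```

The saving depends on the visiting order: the sooner a good candidate is reached, the fewer positions are evaluated.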

Further, in the template-matching-based high-speed face tracking method of the present invention, in step (c), the search starts at the position in the search region corresponding to the position of the template and proceeds outward in a spiral.
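One way to generate such a visiting order is a square spiral of offsets around the previous template position. This is a sketch under assumptions: the generator name `spiral_offsets`, the square shape, the clockwise direction in image coordinates (y grows downward), and the `radius` limit are illustrative choices, not details from the patent.

```python
def spiral_offsets(radius):
    """Yield (dx, dy) offsets from (0, 0) outward in a square spiral.

    Stops once the spiral would leave the square |dx|, |dy| <= radius.
    """
    x = y = 0
    yield (0, 0)
    directions = ((1, 0), (0, 1), (-1, 0), (0, -1))  # right, down, left, up
    d = 0
    run = 1  # current straight-run length; grows after every two turns
    while True:
        for _turn in range(2):
            dx, dy = directions[d % 4]
            for _step in range(run):
                x += dx
                y += dy
                if max(abs(x), abs(y)) > radius:
                    return
                yield (x, y)
            d += 1
        run += 1


order = list(spiral_offsets(1))
```

Since a face usually moves little between consecutive frames, visiting the nearest offsets first tends to reach a low-PSAD position early.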

Further, in the template-matching-based high-speed face tracking method of the present invention, in step (c), template matching with the updated template is performed while moving sequentially to neighboring positions in the search region, and when moving from one searched pixel position to the next, pixels are skipped at a predetermined interval to select the next pixel to search.
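The sparse search amounts to sampling candidate positions on a grid with a fixed stride. The sketch below only enumerates the candidates; the function name `sparse_candidates` and the raster ordering are assumptions for illustration.

```python
def sparse_candidates(x0, y0, x1, y1, w, h, step):
    """Top-left candidate positions for a w x h template in the region
    [x0, x1) x [y0, y1), keeping only every step-th pixel in x and y."""
    return [(px, py)
            for py in range(y0, y1 - h + 1, step)
            for px in range(x0, x1 - w + 1, step)]


dense = sparse_candidates(0, 0, 10, 10, 4, 4, 1)
sparse = sparse_candidates(0, 0, 10, 10, 4, 4, 3)
```

With a stride of `step`, the number of evaluated positions drops by roughly a factor of `step` squared, at the cost of possibly missing the exact best position by up to `step - 1` pixels.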

Further, in the template-matching-based high-speed face tracking method of the present invention, in step (c), once the pixel at the matching position has been selected, additional matching is performed on the pixels neighboring the selected position to correct the result.

Further, in the template-matching-based high-speed face tracking method of the present invention, in step (c), the correction is performed on the pixels above and below, or above, below, left, and right of, the selected position.
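The matching correction can be sketched as one extra PSAD comparison against the immediate neighbors of the selected position. The function name `refine`, the `bounds` tuple, and the `four_way` switch between the up/down and up/down/left/right variants are hypothetical names introduced for this sketch.

```python
def psad(template, frame, px, py):
    """Per-pixel SAD between the template and the frame patch at (px, py)."""
    h, w = len(template), len(template[0])
    total = sum(abs(template[ty][tx] - frame[py + ty][px + tx])
                for ty in range(h) for tx in range(w))
    return total / (w * h)


def refine(template, frame, pos, bounds, four_way=True):
    """Re-check the neighbors of pos and keep the position with the lowest PSAD.

    bounds = (x0, y0, x1, y1) is the search region (exclusive upper edges).
    """
    x0, y0, x1, y1 = bounds
    h, w = len(template), len(template[0])
    offsets = [(0, -1), (0, 1)]              # above, below
    if four_way:
        offsets += [(-1, 0), (1, 0)]         # left, right
    best_pos, best_val = pos, psad(template, frame, pos[0], pos[1])
    for dx, dy in offsets:
        nx, ny = pos[0] + dx, pos[1] + dy
        if x0 <= nx <= x1 - w and y0 <= ny <= y1 - h:
            val = psad(template, frame, nx, ny)
            if val < best_val:
                best_pos, best_val = (nx, ny), val
    return best_pos, best_val


frame = [[0] * 5 for _ in range(5)]
frame[1][2], frame[1][3] = 7, 8
frame[2][2], frame[2][3] = 9, 6
template = [[7, 8], [9, 6]]
corrected = refine(template, frame, (1, 1), (0, 0, 5, 5))
```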

The present invention also relates to a computer-readable recording medium on which a program for performing the template-matching-based high-speed face tracking method is recorded.

As described above, the template-matching-based high-speed face tracking method according to the present invention applies early termination and sparse search and reduces the search region by adjusting the template size, which speeds up face tracking enough for real-time processing, while matching correction and similar steps also improve tracking accuracy.

The method according to the present invention was implemented and tested on self-produced test sequences and a multi-view test sequence provided by MPEG. On the self-produced sequences captured with the Kinect (640×480), the method showed a tracking error of about 3% and an execution time of 2.45 ms; on the Lovebird1 sequence (1024×768), it showed a tracking error of about 1% and an execution time of 7.46 ms.

FIG. 1 is a diagram showing the configuration of the overall system for carrying out the present invention.
FIG. 2 is a flowchart explaining the operational relationship between face detection and face tracking according to an embodiment of the present invention.
FIG. 3 is a flowchart explaining the template-matching-based high-speed face tracking method according to an embodiment of the present invention.
FIG. 4 is pseudocode expressing the face tracking method according to the present invention.
FIG. 5 is a graph showing the relationship between object size and depth according to the present invention.
FIG. 6 is an example of setting the search region according to the present invention.
FIG. 7 is an example of the spiral sparse search method according to the present invention.
FIG. 8 is an example of the matching correction method according to the present invention.
FIG. 9 is a table of the test sequences used for parameter determination according to the present invention.
FIG. 10 shows images from each test sequence according to the present invention: LR: (a) 64th, (b) 79th, (c) 186th; UD: (d) 38th, (e) 96th, (f) 107th; BF: (g) 32nd, (h) 65th, (i) 149th.
FIG. 11 is a graph showing the template resizing error for template partitioning according to the present invention.
FIG. 12 is a graph showing the tracking error (%) versus the search range α (%) according to the present invention.
FIG. 13 is a graph showing experimental results for determining the early termination threshold according to the present invention.
FIG. 14 is a table showing the tracking error and execution time versus the sparse search interval according to the present invention.
FIG. 15 is a table showing information on the sequences used for tracking in the experiments of the present invention.
FIG. 16 is a table showing the experimental results of the face tracking method of the present invention.
FIG. 17 shows tracked result images from the experiments of the present invention: WL: (a) 38th, (b) 186th, (c) 225th; S&J: (d) 51st, (e) 188th, (f) 292nd; Lovebird1: (g) 26th, (h) 96th, (i) 135th.
FIG. 18 is a table comparing the performance of the present invention with conventional methods in the experiments of the present invention.

Hereinafter, specific details for carrying out the present invention will be described with reference to the drawings.

In describing the present invention, identical parts are given identical reference numerals, and repeated descriptions of them are omitted.

First, examples of the configuration of the overall system for carrying out the present invention will be described with reference to FIG. 1.

As shown in FIG. 1, the template-matching-based high-speed face tracking method according to the present invention may be implemented as a program system on a computer terminal (30) that tracks a face from a depth image (61) captured by the depth camera (21) of a Kinect (20) and a color image (62) captured by the color camera (or RGB camera) (22) of the Kinect (20). That is, the face tracking method may be written as a program, installed on the computer terminal (30), and executed there. The program installed on the computer terminal (30) may operate as a single program system (40).

In another embodiment, instead of being written as a program running on a general-purpose computer, the face tracking method may be implemented as a single electronic circuit such as an ASIC (application-specific integrated circuit). It may also be developed as a dedicated computer terminal (30) that exclusively detects and tracks face images in video. This is referred to as the face tracking device (40). Other possible forms may also be implemented.

The Kinect (20) includes a depth camera (21) and a color camera (22).

The depth camera (21) is a camera that measures the depth of an object (10); it measures depth information and outputs a depth image.

Preferably, the depth camera (21) is the depth camera built into the Kinect, which measures depth information with an infrared pattern. The depth camera (21) consists of an infrared emitter and a receiver: when infrared light sent out by the emitter strikes the object (10) and is reflected, the receiver picks up the reflected infrared light and the depth of the object (10) is measured.

The captured depth image (61) is the depth image taken by the depth camera (21).

The color camera (22) is an ordinary RGB camera that acquires the color of the object (10). Preferably, the color camera (22) is the RGB camera built into the Kinect. The captured color image (62) is the RGB image taken by the color camera (22).

The subject captured by the cameras (21, 22) is mainly a person's face. That is, the cameras (21, 22) are mainly installed in front of a person to capture the person's face. For example, to detect the face of a viewer watching a 3D display device and process the 3D image according to the viewer's viewpoint, the cameras (21, 22) may be installed facing outward from the front of the display device.

The depth image (61) and the color image (62) are input directly to the computer terminal (30), stored, and processed by the face tracking device (40). Alternatively, the depth image (61) and the color image (62) may be stored in advance on a storage medium of the computer terminal (30), and the face tracking device (40) may read the stored depth image (60) as its input.

A video consists of temporally consecutive frames. For example, if the frame at the current time t is called the current frame, the frame at the immediately preceding time t-1 is called the previous frame, and the frame at time t+1 is called the next frame. Each frame has a color image and a depth image (or depth information).

That is, the depth image (61) and the color image (62) each consist of temporally consecutive frames, and one frame holds one image. The images (61, 62) may also consist of a single frame (or image); that is, the method also applies when each of the images (61, 62) is a single image.

Detecting a face in the depth image and the color image means detecting it in each depth/color frame (or image), but the term "image" is used below unless a distinction is specifically needed.

First, the overall face detection and tracking method used in the present invention will be described with reference to FIG. 2.

In general, face tracking requires that the target face to be tracked be determined first, and a face detection method [Non-Patent Document 17] is mainly used for this purpose. Face detection usually takes a long time because, given an input image, the entire image must be searched for a face. Applying face detection to every video frame in order to track a face therefore makes real-time processing difficult. For this reason, it is common to detect the face at the start of tracking and then track the detected face as the target.

FIG. 2 shows the relationship between face detection and face tracking considered in the present invention. Existing face detection methods mainly use texture images, but because the tracking method presented in the present invention uses only depth information, depth information has been added as an input to face detection in the drawing.

깊이정보를 사용하던 안하던 검출단계에서 얼굴이 검출되면 그 얼굴의 깊이 맵, 위치, 및 크기를 추출하고, 이를 얼굴추적에서 템플릿으로 사용한다. 일단 얼굴이 검출되면 그 다음 프레임부터는 얼굴추적만 수행하며, 매 프레임 추적한 얼굴의 템플릿은 갱신(update)한다.When a face is detected in the detection step without using depth information, the depth map, position, and size of the face are extracted and used as a template in face tracking. Once a face is detected, only the face tracking is performed from the next frame, and the template of the face tracked every frame is updated.

만약 추적에 실패하면 더 이상 추적을 진행할 수 없으므로, 이때는 다시 얼굴검출 단계로 가서 얼굴을 재 검출한다. 일반적으로 추적이 실패하는 경우는 추적하고 있는 사람이 영상의 바깥으로 나가거나 장면이 바뀌는 경우이다.If the tracking fails, the tracking process can not proceed anymore. In this case, go to the face detection step again and detect the face again. Generally, if the tracking fails, the person who is tracking moves out of the image or changes the scene.
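The detect-once, track-thereafter, re-detect-on-failure control flow described above can be sketched as follows. This is only an illustration: `detect` and `track` are hypothetical callables (not from the source), both assumed to return a template or `None` on failure, and re-detecting on the same frame in which tracking failed is an assumption.

```python
def track_sequence(frames, detect, track):
    """Yield one template per frame, re-detecting whenever tracking fails.

    `detect(frame)` and `track(frame, template)` are hypothetical callables;
    both are assumed to return a template, or None on failure.
    """
    template = None
    for frame in frames:
        if template is not None:
            # Normal case: a template exists, so only tracking is performed.
            template = track(frame, template)
        if template is None:
            # First frame, or tracking just failed: fall back to detection.
            template = detect(frame)
        yield template
```

The expensive full-image detection therefore runs only on the first frame and after a tracking failure.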

Because the present invention presents a face tracking method, the face detection method is not discussed further, and an existing method [Non-Patent Document 16] is assumed to be used.

The depth image is used for tracking so that the face can be tracked regardless of changes in illumination and the like. Conventional methods track using RGB, or one of the RGB components, in which case an error may occur in template matching if the illumination changes between the current frame and the next frame. If the depth image is used, however, tracking is unaffected by illumination changes, and only the depth values need to be adjusted for forward and backward movement.

Next, a template-matching-based high-speed face tracking method according to an embodiment of the present invention is described with reference to FIGS. 3 and 4.

As shown in FIG. 3, the template-matching-based high-speed face tracking method according to the present invention comprises: (a) updating the template size (S10); (b) setting a search area in the current frame (S20); (c) performing template matching (S30); and (d) repeating template matching with the template whose size and position have been updated (S40).

FIG. 4 expresses the face tracking method of the present invention in pseudocode. As shown in FIG. 4, the method takes a depth image sequence and a template as input and outputs the updated template for each frame of the input sequence. The template consists of the depth image, position, and size of the face to be tracked. Each step is described in detail below.

First, the template size and the search range are reset (S10, S20).

That is, the change in size of the tracked face from the previous frame to the next (current) frame is predicted, the template size is adjusted accordingly, and the search area in which template matching will be performed is adjusted as a result.

The template size resetting step (S10) is described in detail.

If a person moves up and down or left and right, the size of the face hardly changes, but if the person moves forward or backward, the size changes. The template size must therefore be updated every frame to match the size of the face in that frame.

Because the relative size of an object in an image depends on its depth, the relationship between depth and object size is described first. A depth camera generally provides the actual depth value, that is, the distance between the camera and the object. When the maximum and minimum distances that the depth camera can measure are defined as z_R,max and z_R,min, respectively, and the distance is expressed in n bits as Z', the actual distance z and Z' are related as in Equation 1.

[Equation 1]

However, the depth images commonly encountered have larger values for nearer objects. A depth map Z of this kind is obtained by inverting Equation 1 as in Equation 2.

[Equation 2]

Accordingly, when the actual size of the object is s, the actual distance is z, and the focal length of the camera is f, the relationship between the object size S_sensor on the camera image sensor and z or Z is given by Equation 3.

[Equation 3]

If the pixel size of the image sensor is P_sensor, the number of pixels N in the actual image is given by Equation 4.

[Equation 4]
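The relationship between distance and on-screen size can be illustrated with standard pinhole-camera geometry. The sketch below assumes S_sensor = f·s/z and N = S_sensor/P_sensor, which is the usual similar-triangles form for the quantities named above; the exact form of Equations 3 and 4 is not reproduced in this text, and the numeric values in the usage note are illustrative, not Kinect specifications.

```python
def pixels_on_sensor(s, z, f, p_sensor):
    """Approximate number of pixels spanned by an object of real size s.

    Assumes an ideal pinhole camera (an assumption of this sketch):
      s        real object size, in metres
      z        real distance from the camera, in metres
      f        focal length, in metres
      p_sensor size of one sensor pixel, in metres
    """
    s_sensor = f * s / z          # size of the projection on the sensor
    return s_sensor / p_sensor    # same size expressed in pixels
```

For example, `pixels_on_sensor(0.16, 1.0, 0.004, 1e-5)` gives about 64 pixels, and doubling the distance to 2.0 halves it to about 32, which is why the template must shrink or grow as the person moves back and forth.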

When the size of an object at depth Z is N, the graph of FIG. 5 is obtained. The dotted line plots Equation 4, the points are measured values, and the solid line is the trend line of the points. The difference between the dotted line and the measured values or the trend line is due to measurement error and to the fractional (rounding) errors that arise in Equations 1 to 4.

To reset the template size for the current frame according to FIG. 5, the depth of the face in the previous frame, that is, the depth of the template generated in the previous frame, is compared with the depth of the corresponding region of the current frame, and the template size is updated.

First, the template (the template of the previous frame) is divided into m×n sub-blocks (SB) as in FIG. 6. The average depth of each sub-block (of a×b resolution) is computed, and the largest of these averages is defined as the template depth Z_T, as expressed in Equation 5. SB^T_i,j denotes the average depth of the (i,j)-th sub-block of a×b resolution.

[Equation 5]

From the frame to be tracked (the current frame), the partial depth image at the position and size of the template is extracted, and the largest average depth value is obtained by the same process as Equation 5. This value is called Z_T'.
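The sub-block depth estimate described above (split the patch into m×n sub-blocks, average each, keep the largest average) can be sketched with NumPy. The function name, the use of `np.array_split` for patches whose size is not an exact multiple of m or n, and the default of 3×3 blocks are choices of this sketch rather than details from the source.

```python
import numpy as np

def template_depth(depth_patch, m=3, n=3):
    """Largest sub-block mean depth of a 2-D depth patch.

    The patch is split into m row bands and n column bands; the mean depth
    of each sub-block is computed and the maximum of those means returned.
    """
    block_means = [
        block.mean()
        for band in np.array_split(depth_patch, m, axis=0)
        for block in np.array_split(band, n, axis=1)
    ]
    return max(block_means)
```

Applying the same function to the template of the previous frame and to the co-located patch of the current frame gives the two depths that are compared. Averaging per block keeps a single outlier pixel from dominating the estimate.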

The template size S^X_ZT' to be used in the current frame is then obtained from the template size S^X_ZT of the previous frame as in Equation 6, where X denotes the width (hor) or the height (ver).

[Equation 6]

The number of blocks (m×n) into which the template is divided is determined in advance through experiments or the like.

Next, the search area adjustment step (S20) is described.

The area that must be searched for template matching is related to the speed of face movement. The present invention assumes that the tracked person is looking forward, as when watching TV or a movie. The size of the search area is therefore set in consideration of the maximum speed at which a person can move while watching.

The speed of movement, that is, the amount of movement, varies with the depth of the object. In other words, the amount of movement on the screen is determined relative to the distance (depth) of the object from the camera. Therefore, as in FIG. 6, the present invention sets as the search area a region whose size is relative to the distance-adjusted template size, that is, a region obtained by expanding the template by a fixed ratio of its size.

The template here uses the adjusted template size (S^X_ZT') described above, and in the present invention the width and height are both expanded by the same ratio α (the expansion ratio). The value of α is also a coefficient obtained through experiments or the like and is fixed in advance.
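A minimal sketch of the search-area computation follows, assuming the template is an axis-aligned rectangle given by its top-left corner and size, that α is a percentage applied on each side (so each dimension becomes (100+2α)% of the template), and that the result is clipped to the frame. The rectangle convention, the rounding, and the clipping are assumptions of this sketch.

```python
def search_area(x, y, w, h, alpha, frame_w, frame_h):
    """Expand a template rectangle by alpha percent on every side.

    (x, y) is the top-left corner and (w, h) the size of the template.
    Returns (left, top, width, height) of the search area, clipped so that
    it stays inside a frame of frame_w x frame_h pixels.
    """
    dx = round(w * alpha / 100)   # horizontal margin on each side
    dy = round(h * alpha / 100)   # vertical margin on each side
    left, top = max(0, x - dx), max(0, y - dy)
    right = min(frame_w, x + w + dx)
    bottom = min(frame_h, y + h + dy)
    return left, top, right - left, bottom - top
```

With a 100×100 template at (200, 150) in a 640×480 frame and α = 41, this returns `(159, 109, 182, 182)`. Because the margin scales with the template, a nearer (larger) face automatically gets a larger search area.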

Next, the template matching step (S30) is described.

This is the template matching process, which searches each position in the search area with the readjusted template to find the exact face position. When template matching is performed on RGB images as in [Non-Patent Document 16], illumination changes may cause the face to be missed and a different region to be found instead. The present invention therefore performs template matching using depth information, which is robust to illumination changes.

The cost function used for template matching is the per-pixel sum of absolute differences (SAD), PSAD, of Equation 7. That is, every position (i,j) in the search range is evaluated using Equation 7, and the position with the smallest PSAD_i,j is selected as the position where the template matches (SP_Opt in Equation 8).

[Equation 7]

[Equation 8]

In Equation 7, k is the number of pixels in the template, and T_w and T_h are the width and height of the template, respectively. D_SA(i+h,j+v) and D_T(h,v) are the pixel values of the search area and the template, respectively.
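The exhaustive search over the search area can be sketched as below. The sketch assumes that PSAD is the sum of |D_SA(i+h, j+v) − D_T(h, v)| over the template divided by k, and that the matched position is the (i, j) that minimises it, which is how the text defines the symbols. Treating i as the column offset and j as the row offset, and casting to a signed type to avoid unsigned wrap-around, are choices of this sketch.

```python
import numpy as np

def psad(search_area, template, i, j):
    """Per-pixel SAD between the template and the patch at offset (i, j)."""
    t_h, t_w = template.shape
    patch = search_area[j:j + t_h, i:i + t_w].astype(np.int64)
    # Signed arithmetic: depth maps are often unsigned and would wrap.
    return np.abs(patch - template.astype(np.int64)).sum() / template.size

def full_search(search_area, template):
    """Evaluate every offset and return (best_offset, best_psad)."""
    t_h, t_w = template.shape
    s_h, s_w = search_area.shape
    best_pos, best_cost = None, float("inf")
    for j in range(s_h - t_h + 1):
        for i in range(s_w - t_w + 1):
            cost = psad(search_area, template, i, j)
            if cost < best_cost:
                best_pos, best_cost = (i, j), cost
    return best_pos, best_cost
```

Dividing by the number of template pixels makes the cost comparable across frames even though the template size changes, which is what allows a fixed threshold to be used later.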

Finally, the matched position is set as the new position of the template, the next frame becomes the current frame, and the preceding steps (S10 to S30) are repeated (S40). As explained in the overall description of face detection and tracking, if no matching position is found, the face is detected again, a new template is extracted, and the above process is repeated.

Next, detailed methods for speeding up the template matching step (S30) are described more specifically.

Preferably, during template matching, the PSAD is computed for each position in the search area, and template matching is terminated early if the computed PSAD satisfies a specific condition.

Equation 7 evaluates the PSAD at every position in the search area, so its run time depends on the size of the search area. For tracking accuracy, a large search area is advantageous; in fact, up to a point, the larger the search area, the smaller the tracking error (explained in detail in the experiments below).

However, with a sufficiently large search area the search time may become so long that real-time tracking is impossible. One object of the present invention is to track faces at high speed, so an early termination method is used to reduce the search time. The rationale is that position tracking more often tolerates some degree of approximation than it demands zero error.

To use early termination, the method must decide that the position has been found approximately. Because PSAD is the cost function, the search is terminated early when the PSAD falls below a predetermined PSAD threshold (T_ET). This is expressed as Equation 9.

[Equation 9]
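Early termination can be sketched as a loop that stops at the first candidate whose cost falls below the threshold. Here `candidates` is any ordered sequence of positions and `cost` any callable returning the PSAD of a position; both names, and the returned evaluation count, are assumptions of this sketch.

```python
def search_with_early_termination(candidates, cost, threshold):
    """Scan candidates in order; stop as soon as a cost is below threshold.

    Returns (best_position, best_cost, number_of_positions_evaluated).
    If no candidate is below the threshold, every candidate is evaluated
    and the overall minimum is returned.
    """
    best_pos, best_cost, evaluated = None, float("inf"), 0
    for pos in candidates:
        c = cost(pos)
        evaluated += 1
        if c < best_cost:
            best_pos, best_cost = pos, c
        if c < threshold:
            break   # close enough: accept this position and stop searching
    return best_pos, best_cost, evaluated
```

With a threshold of 0 the condition can never hold for a non-negative cost, so the function degenerates to a full search; larger thresholds trade accuracy for fewer evaluations.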

Also preferably, during template matching, the search proceeds in a spiral over the search area.

In the environment assumed above, a person is usually nearly still more often than moving, and even when moving, small movements are more common than large ones. With early termination in mind, it is therefore more efficient to search first around the previous template position than to use a conventional order (raster scan, zigzag scan, or the like).

The present invention therefore uses a spiral search that starts at the previous template position and proceeds outward in a spiral. FIG. 7 shows the spiral search for a search area of 5×5 [pixel²]. Each block in FIG. 7 represents a pixel, the number in each block is its order in the spiral search, and position '0' is the previous template position.
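One way to generate such a spiral order is ring by ring around the centre, as in the sketch below. The starting side and the direction of rotation are assumptions, since FIG. 7 is not reproduced here; any rotation or mirror image visits the same rings in the same order.

```python
def spiral_offsets(radius):
    """Return (dx, dy) offsets in spiral order, starting at the centre.

    Index 0 is (0, 0), i.e. the previous template position; ring r
    contributes 8*r offsets, so radius=2 covers a 5x5 area (25 offsets).
    """
    offsets = [(0, 0)]
    for r in range(1, radius + 1):
        offsets += [(r, y) for y in range(-r + 1, r + 1)]        # right side
        offsets += [(x, r) for x in range(r - 1, -r - 1, -1)]    # bottom side
        offsets += [(-r, y) for y in range(r - 1, -r - 1, -1)]   # left side
        offsets += [(x, -r) for x in range(-r + 1, r + 1)]       # top side
    return offsets
```

`spiral_offsets(2)` yields 25 distinct offsets numbered 0 to 24 by list index, with the nearest positions first, so a match near the previous position is found after very few evaluations.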

Also preferably, a sparse search method is applied during template matching.

To reduce the search time beyond what early termination achieves, the present invention applies a sparse search. In this method, after the currently searched pixel, the next pixel to search is not the adjacent one (in a spiral search, the next pixel in the search sequence); instead, a fixed interval is skipped and the pixel after the interval is selected. The pixels in between are not searched.

FIG. 7 shows an example with an interval of '3'. The pixels searched in this example are {0, 3, 6, 9, 12, 15, 18, 21, 24}. The sequence searched in the sparse search is given by Equation 10, where I denotes the interval. I=1 means that every position is searched. The value of I is also determined in advance through experiments or the like.

[Equation 10]
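The sparse sequence amounts to taking every I-th index of the search order. The helper below is a minimal sketch (the function name is hypothetical) that reproduces the 5×5, I=3 example given in the text.

```python
def sparse_indices(num_positions, interval):
    """Indices of the search sequence visited by a sparse search.

    Visits index 0, then every `interval`-th index after it;
    interval=1 visits every position.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return list(range(0, num_positions, interval))
```

`sparse_indices(25, 3)` returns `[0, 3, 6, 9, 12, 15, 18, 21, 24]`, so only 9 of the 25 positions are evaluated.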

In another embodiment, the cells to be searched are fixed in advance in a regular pattern and the sparse search is performed in spiral order. The searched pixels may also be made sparser as the distance from the center increases. Because small movements are more likely, it is preferable to use a small sparse-search interval near the previous position and a larger interval farther away. In this case, the search speed can be improved considerably.

Next, preferably, when the sparse search or early termination is applied, matching refinement is performed to reduce tracking errors.

To reduce the tracking errors caused by the sparse search and early termination, a refinement step performs an additional search over the pixels surrounding the pixel at which the search terminated early.

Two variants are considered according to the number of additional pixels searched. The first is 2-pixel refinement, which additionally searches the pixels immediately before and after the early-terminated pixel in the search sequence.

The second is 4-pixel refinement, which additionally searches the four directly adjacent pixels of the early-terminated pixel. FIG. 8 shows an example based on the spiral sparse search of FIG. 7, for the case of early termination at pixel 6. Here, 2-pixel refinement additionally searches pixels {5, 21}, and 4-pixel refinement additionally searches pixels {5, 7, 19, 21}.

In both variants, the pixel with the smallest PSAD among the early-terminated pixel and the pixels searched for refinement is chosen as the final SP_Opt.
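The 4-pixel variant can be sketched as a comparison between the early-terminated position and its four direct neighbours. `cost` is a hypothetical callable returning the PSAD of a position; boundary handling is omitted, so a real implementation would first discard neighbours that fall outside the search area.

```python
def refine_4_pixel(pos, cost):
    """Return the lowest-cost position among pos and its 4 direct neighbours.

    `pos` is the (x, y) position at which the search terminated early.
    Ties are resolved in favour of `pos`, then left, right, up, down.
    """
    x, y = pos
    candidates = [pos, (x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return min(candidates, key=cost)
```

At most four extra PSAD evaluations are added per frame, which is consistent with the small difference in execution time reported between the refinement variants.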

Next, the method of determining the parameters for face tracking is described.

The parameters used in the face tracking method described above are determined experimentally. The table of FIG. 9 shows the characteristics of the test sequences used for parameter determination. The sequences were produced in-house with the Kinect from Microsoft and therefore have depth and color information at 640×480 resolution.

In the table of FIG. 9, 'LR' was produced to measure left-right movement, 'UD' up-down movement, and 'BF' back-and-forth movement. The movement in each direction was made fast enough to fully cover the environment assumed above. FIG. 10 shows three representative frames from each sequence.

First, the experiment for determining the template partition size used in template readjustment is described.

Above, the template was divided into m×n blocks in order to estimate the template size from the face size in the current frame. Several partitions were tested. Because there are too many combinations of m and n, m=n was assumed and partitions from 3×3 to 11×11 blocks were tested.

The experiment compares the size of the face in the reference image (RI₃) with the size of the readjusted template (RT₃) and measures the error (TRE (%)). Equation 11 shows how the error is calculated. Face images detected with the AdaBoost algorithm [Non-Patent Document 17] are used as the reference images.

[Equation 11]

FIG. 11 shows the results: the error increases steadily from 3×3 blocks to 11×11 blocks. The present invention therefore uses the 3×3 partition, which has the smallest error.

Next, the experiment for determining the extent of the search area, in particular the expansion ratio (α), is described.

The search area was defined above as a size relative to the template, 100+2α (%); here α is determined experimentally.

For each test sequence, the tracking error is measured while α is varied from 15% to 50%. The tracking error is computed with Equation 12.

[Equation 12]

Here, T_SP (Template Search Position) is the face position tracked with expansion ratio α (%), T_BP (Template Best Position) is the face position found when the whole image is used as the search range, and T_DS (Template Diagonal Size) is the diagonal size of the template.
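A plausible reading of these definitions is that the tracking error is the distance between T_SP and T_BP expressed as a percentage of the template diagonal T_DS. The sketch below implements that reading; the Euclidean distance and the exact normalisation are assumptions, since Equation 12 itself is not reproduced in this text.

```python
import math

def tracking_error(t_sp, t_bp, template_w, template_h):
    """Tracking error in percent of the template diagonal (assumed form).

    t_sp: (x, y) position found within the restricted search area
    t_bp: (x, y) position found when searching the whole image
    """
    distance = math.hypot(t_sp[0] - t_bp[0], t_sp[1] - t_bp[1])
    diagonal = math.hypot(template_w, template_h)   # T_DS
    return 100.0 * distance / diagonal
```

Normalising by the template diagonal makes the error independent of how large the face appears, so errors from near and far faces can be averaged together.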

FIG. 12 shows the experimental results of one embodiment. The 'UD' sequence shows a high tracking error of 20% or more for α<21, because the search range is so small that the face falls outside it. 'BF' shows a relatively low error over the whole range, because it is a back-and-forth sequence and is therefore less affected by the search area. Considering all three sequences, the search area is set with α=41 (%), at which the tracking error of every sequence is nearly 0%. Preferably, the expansion ratio (α) is chosen so that the tracking error is 1% or less.

Next, the experiment for determining the PSAD threshold for early termination is described.

The early termination technique was described above as a way to reduce execution time. Here, the threshold it uses is determined experimentally. FIG. 13 shows the tracking error and execution time measured while the threshold (T_ET) is varied from '0' to '5'.

As FIG. 13 shows, the tracking error increases and the execution time decreases as the threshold increases. Because the present invention targets high-speed tracking, the threshold was set to T_ET=2, at which the execution time is below 40 ms; according to the figure, the tracking error at this threshold was below 4%.

That is, preferably, a threshold at which the execution time falls at or below the target execution time is chosen as the PSAD threshold.

Next, the experiment for determining the interval for the sparse search is described.

The sparse search was presented above as a way to reduce execution time further, beyond early termination. Preferably, its interval is determined experimentally.

The experiment measures the tracking error and execution time while the interval I is increased from '1' to '5'. FIG. 14 shows the results of one embodiment. The experiment uses T_ET=2 as set above. As FIG. 14 shows, and as expected, the tracking error increases and the execution time decreases as the interval increases. The interval was therefore set to I=3, at which the tracking error is below 5% and the average execution time is below 30 ms.

Next, the effect of the present invention as shown by experiments is described in more detail with reference to FIGS. 12 to 20.

Experiments were performed in which the face tracking method of the present invention, with the experimentally determined parameters, was applied to several test sequences. The implementation used Microsoft Visual Studio 2010 and OpenCV 2.4.3 on Microsoft Windows 7, and the PC used had a 3.40 GHz Intel Core i7 CPU and 16 GB of RAM.

The test sequences used above to determine the parameters were produced with deliberately extreme movements for parameter extraction and so differ somewhat from realistic conditions. More realistic test sequences were therefore produced in-house and used.

The table of FIG. 15 shows the characteristics of the sequences used. 'WL' and 'S&J' are sequences produced in-house, and Lovebird1 is a multi-view test sequence provided by MPEG. 'WL' is a sequence in which one seated person moves freely back and forth, left and right, and up and down; 'S&J' is a sequence in which two people move freely until one leaves the screen and only one remains.

The results of applying the face tracking method of the present invention with the extracted parameter values to these sequences are given in the table of FIG. 16. The table shows the average tracking error (%) and average execution time (ms) for FS (full search), which uses neither early termination nor sparse search; ET (early termination), which uses early termination only; and 2-PR (2-pixel refinement) and 4-PR (4-pixel refinement), which use early termination, sparse search, and the respective matching refinement. As the table shows, 4-PR has the lowest tracking error (%), while the execution time is shortest when only early termination is applied. However, 2-PR and 4-PR differ little in speed and both are fast enough for real-time operation, so 4-PR may be chosen as the final algorithm.

FIG. 17 shows the tracking results for each sequence. As the figure shows, when two or more people are present, the method tracks only the person nearest the camera (FIG. 17 (e) and (f)). It also tracks faces that are turned to the side rather than facing forward, because it operates on depth images rather than color images.

The table of FIG. 18 compares the method with existing methods. Because [Non-Patent Document 15] and [Non-Patent Document 19] do not provide experimental results suitable for a like-for-like comparison with the present invention, the results of the present invention were adapted here to make the comparison equivalent. Of the three variants in the table of FIG. 16 (ET, 2-PR, 4-PR), 2-PR, whose error and speed are intermediate, was selected for the comparison. Although the sequences contain depth and RGB, the algorithm of the present invention used only the depth information. As Table 4 shows, the method of the present invention outperforms [Non-Patent Document 15] in both accuracy and speed. [Non-Patent Document 19] reports no speed, so speed could not be compared; in accuracy, the method of the present invention is worse on 640×480 images and better on 1024×768 images. The lower accuracy at 640×480 results from implementing an algorithm that tracks at a high speed of 404 frames per second; adjusting the parameters (T_ET=2, I=3) to favor accuracy is expected to yield higher accuracy.

The present invention has described a method of tracking a face at high speed using only depth information. Template matching is used as the basic tracking method. To reduce the excessive computation that is the main drawback of template matching, an early-termination method and a sparse search method are used, and matching correction over the neighboring pixels is performed to compensate for the tracking errors these introduce. To compensate for depth changes caused by forward and backward movement, the depth value of the face being tracked is estimated and the size of the template is adjusted accordingly. The search region in which template matching is performed is in turn adjusted according to the adjusted template size.
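The template resizing and search-region steps summarized above can be sketched as follows. This is a minimal illustration, not the patented formula: the `Box` type, the function names, the pinhole-style scale factor `template_depth / current_depth`, and the rule of growing the box by `expansion_ratio` times its size on each side are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box: top-left corner (x, y) plus width and height, in pixels."""
    x: int
    y: int
    w: int
    h: int


def rescale_template(box: Box, template_depth: float, current_depth: float) -> Box:
    """Resize the template box about its center (assumed pinhole-style scaling).

    Apparent size is assumed to scale with template_depth / current_depth,
    so a face that moves closer (smaller depth) gets a larger template.
    """
    scale = template_depth / current_depth
    new_w = max(1, round(box.w * scale))
    new_h = max(1, round(box.h * scale))
    cx = box.x + box.w / 2
    cy = box.y + box.h / 2
    return Box(round(cx - new_w / 2), round(cy - new_h / 2), new_w, new_h)


def search_region(box: Box, expansion_ratio: float, frame_w: int, frame_h: int) -> Box:
    """Grow the box by expansion_ratio * size on every side, clipped to the frame."""
    mx = round(box.w * expansion_ratio)
    my = round(box.h * expansion_ratio)
    x0 = max(0, box.x - mx)
    y0 = max(0, box.y - my)
    x1 = min(frame_w, box.x + box.w + mx)
    y1 = min(frame_h, box.y + box.h + my)
    return Box(x0, y0, x1 - x0, y1 - y0)
```

Because the search region is derived from the resized template, a smaller (more distant) face automatically yields a smaller region to scan, which is where the speed benefit described above would come from.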

In the present invention, the parameter values of the proposed face tracking method were derived using test sequences. The derived parameters were then evaluated on sequences other than those test sequences: separately self-produced sequences and an MPEG sequence. In these experiments the tracking rate was 100%. The self-produced Kinect sequences 'WL' and 'S&J' showed a tracking error of about 3% and an execution time of 2.45 ms, and the multi-view sequence Lovebird1 provided by MPEG showed a tracking error of about 1% and an execution time of 7.46 ms.
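The reported execution times can be converted to frame rates with simple arithmetic. The short sketch below assumes the times are per-frame tracking times and ignores capture and display overhead; the function name is introduced here for illustration only. Under that assumption the two figures correspond to roughly 408 and 134 frames per second.

```python
def frames_per_second(ms_per_frame: float) -> float:
    """Convert a per-frame processing time in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame


# Execution times reported in the text (assumed to be per-frame tracking times).
for name, ms in [("WL / S&J (Kinect)", 2.45), ("Lovebird1 (MPEG)", 7.46)]:
    fps = frames_per_second(ms)
    print(f"{name}: {ms} ms/frame -> {fps:.0f} fps, at least 30 fps: {fps >= 30}")
```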

For the method according to the present invention, it was confirmed that, as expected, tracking error and execution time are in a complementary (trade-off) relationship as the early-termination threshold and the sparse-search interval are varied. The face tracking method according to the present invention is therefore suitable for real-time face tracking systems running at 30 frames per second or more, and exploiting the trade-off between execution time and tracking error should make it usable in a variety of fields.

The invention made by the present inventors has been described above in detail with reference to the embodiments; however, the present invention is not limited to those embodiments and may of course be modified in various ways without departing from its gist.

10: face 20: Kinect
21: depth camera 22: color camera
30: computer terminal 40: program system
61: depth image 62: color image

Claims

A template-matching-based high-speed face tracking method that receives, from a camera photographing a person's face, a depth image and a color image having temporally consecutive frames, detects the face in one frame to generate a template, and then tracks the face from the next frame onward using the template, the method comprising:
(a) comparing the depth of the template with the depth of a current frame to update the size of the template;
(b) setting, as a search region, a region of the current frame centered on the updated template and expanded by a predetermined ratio (hereinafter, the expansion ratio) of the size of the updated template;
(c) performing template matching with the updated template in the search region and selecting a matching position; and
(d) updating the position of the updated template to the matching position and repeating steps (a) to (c) with the frame following the current frame as the current frame,
wherein, in step (c), the search proceeds in a spiral direction starting from the position in the search region that corresponds to the position of the template,
wherein, in step (c), template matching with the updated template is performed while moving sequentially to neighboring positions in the search region, and, when moving from the position of a searched pixel to the position of the next pixel to be searched, the next pixel is selected by skipping pixels at a predetermined interval, and
wherein, in step (c), once the pixel at the matching position is selected, additional matching is performed on the pixel at the selected position and its neighboring pixels, and correction is performed with respect to the upper, lower, left, and right neighboring pixels.
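The spiral, sparse search with neighbor correction recited in this claim can be illustrated with a short sketch. It is not the claimed implementation: the square ring-by-ring visiting order, the names `spiral_offsets` and `sparse_search`, the generic `cost` callable, and the use of the full 3x3 neighborhood (including diagonals) for the correction step are assumptions made here for illustration.

```python
from typing import Callable, Iterator, Tuple

Offset = Tuple[int, int]


def spiral_offsets(radius: int, step: int) -> Iterator[Offset]:
    """Yield (dx, dy) offsets ring by ring outward from (0, 0), `step` pixels apart."""
    yield (0, 0)
    for r in range(step, radius + 1, step):
        # Walk the square ring at distance r, visiting every step-th pixel.
        for dx in range(-r, r, step):
            yield (dx, -r)   # top edge, left to right
        for dy in range(-r, r, step):
            yield (r, dy)    # right edge, top to bottom
        for dx in range(r, -r, -step):
            yield (dx, r)    # bottom edge, right to left
        for dy in range(r, -r, -step):
            yield (-r, dy)   # left edge, bottom to top


def sparse_search(cost: Callable[[int, int], float], radius: int, step: int) -> Offset:
    """Pick the best sparse-grid offset, then correct it using its neighbors."""
    best = min(spiral_offsets(radius, step), key=lambda o: cost(*o))
    # Matching correction (assumed 3x3 neighborhood): re-test pixels the
    # sparse grid skipped around the selected position.
    bx, by = best
    neighbours = [(bx + dx, by + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    return min(neighbours, key=lambda o: cost(*o))
```

With an interval of 3, as in the parameter setting I=3 mentioned in the description, only about one in nine positions is evaluated in the first pass, and the correction step recovers a match that lies between grid points.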

The method according to claim 1,
wherein, in step (a), the template and the region of the current frame corresponding to the template (hereinafter, the sampling region of the current frame) are each divided into sub-blocks, and the size of the template is adjusted and updated in proportion to the depth difference between the template and the sampling region.

The method according to claim 1,
wherein, in step (c), the per-pixel sum of absolute differences (SAD) between the updated template and each position in the search region is computed, and the position with the minimum value is selected as the matched position.
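A per-pixel SAD match of the kind recited in this claim can be sketched as below. The sketch assumes depth images held as nested lists of integers and scans every valid position exhaustively; the names `psad` and `best_match` are introduced here and do not come from the source.

```python
from typing import Sequence, Tuple

Image = Sequence[Sequence[int]]


def psad(frame: Image, template: Image, x: int, y: int) -> float:
    """Sum of absolute differences at (x, y), divided by the template's pixel count."""
    h, w = len(template), len(template[0])
    total = sum(
        abs(frame[y + j][x + i] - template[j][i])
        for j in range(h)
        for i in range(w)
    )
    return total / (h * w)


def best_match(frame: Image, template: Image) -> Tuple[int, int]:
    """Return the top-left (x, y) with the minimum per-pixel SAD over all valid positions."""
    h, w = len(template), len(template[0])
    positions = [
        (x, y)
        for y in range(len(frame) - h + 1)
        for x in range(len(frame[0]) - w + 1)
    ]
    return min(positions, key=lambda p: psad(frame, template, *p))
```

Dividing by the pixel count keeps the score comparable when the template is resized between frames, so a single threshold can be applied to templates of different sizes.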

The method according to claim 1,
wherein, in step (c), the search is terminated early when the per-pixel SAD (PSAD) between the updated template and the search region is smaller than a predetermined threshold.
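Early termination against a threshold can be sketched as a loop that stops at the first candidate whose cost falls below the threshold. The function name, the generic `cost` callable, and the returned evaluation count are assumptions added here to make the speed and accuracy effect visible.

```python
from typing import Callable, Iterable, Optional, Tuple

Position = Tuple[int, int]


def search_with_early_termination(
    candidates: Iterable[Position],
    cost: Callable[[Position], float],
    threshold: float,
) -> Tuple[Optional[Position], int]:
    """Return (best position seen, number of candidates evaluated)."""
    best_pos, best_cost, evaluated = None, float("inf"), 0
    for pos in candidates:
        c = cost(pos)
        evaluated += 1
        if c < best_cost:
            best_pos, best_cost = pos, c
        if c < threshold:
            break  # good enough: stop without scanning the remaining candidates
    return best_pos, evaluated
```

A higher threshold ends the scan sooner but may accept a position that is not the global minimum, which matches the trade-off between execution time and tracking error noted in the description.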

(Deleted)

A computer-readable recording medium having recorded thereon a program for performing the template matching based high-speed face tracking method according to any one of claims 1 to 4.