KR100474771B1

KR100474771B1 - Face detection method based on templet

Info

Publication number: KR100474771B1
Application number: KR10-2002-0051617A
Authority: KR
Inventors: 이진수; 유재신
Original assignee: 엘지전자 주식회사
Priority date: 2002-08-29
Filing date: 2002-08-29
Publication date: 2005-03-10
Also published as: KR20040020159A

Abstract

The template-based face region extraction method of the present invention comprises: an image input step; a face candidate region extraction step using a non-transform template, which does not transform the image by a matrix operation; and a face region extraction step that, for the extracted face candidate regions, uses a transform template, which transforms the input image by a matrix operation.

Here, the non-transform template used in the face candidate region extraction step is faster but less accurate than the transform template; for example, it is a template formed as an average face image from the face images used for training.

In the face candidate region extraction step, template matching with the non-transform template is performed with weights set differently for each region of the template; in setting these weights, high weights are assigned to the eye and mouth regions of the template.

The transform template used in the face region extraction step is slower but more accurate than the non-transform template, and is constructed by applying a wavelet transform to the training face images. Of the four sub-band regions produced by the wavelet transform, the region that is low-frequency both horizontally and vertically, and the region that is high-frequency along the vertical axis and therefore represents horizontal edges, are selected as the template.

Description

Template based face region extraction method {Face detection method based on templet}

The present invention relates to a face region extraction method, and more particularly to a method of automatically extracting a face region based on a template. The method is applicable both to moving-image environments such as video communication and video indexing and to still-image environments such as digital cameras.

Techniques for automatically extracting face regions from moving or still images have long been applied in a variety of fields. In terms of processing time, face region extraction can be divided into cases that require real-time extraction and cases that do not.

In general, real-time processing is not required when face region extraction is used for automatic video indexing. Once indexing has been performed offline, extraction need not be repeated, so the time taken by extraction may not matter much. When face region extraction is used in a real-time application such as video communication, on the other hand, face regions must be processed in real time, so processing time is very important.

As one scheme in which this importance of processing time is reflected, a method has been proposed for video communication, where the face region matters more than other regions: the face region is coded at relatively high image quality and the other regions at low image quality, so that the face region is transmitted at higher quality while the amount of data stays the same.

Because face region extraction is important for so many purposes, it has been studied extensively, and the approaches are diverse. Face region extraction methods can be broadly divided into knowledge-based methods, feature-based methods, template matching methods, and learning-based methods.

A knowledge-based method defines rules from prior knowledge of facial features and uses whether those rules are satisfied as the criterion for deciding that a region is a face. Among feature-based methods, the most widely used relies on a skin color condition; methods that locate the eyes, nose, and mouth from edge information have also been proposed. The skin color approach is very widely used because it is fast and performs relatively well on restricted data. When lighting and other conditions vary, however, the skin color condition itself becomes very broad, which makes skin difficult to distinguish from the background.

A template matching method represents the characteristics of face images as a template and extracts the face region by matching the template against parts of the input image. The template may use shape information, or it may be a texture template that represents the texture characteristics of a face. Template matching approaches generally use gray images rather than color, so they have the advantage of not depending on lighting. Their disadvantage is that the large number of template matches requires a great deal of processing time; in particular, when the input image must be transformed, for example by a wavelet transform, before being compared with the template, processing can take a very long time.

Nevertheless, because a transformed template such as a wavelet template carries more useful information than the input image used as is, many studies report results obtained with such templates. Despite the processing-time problem, template-based face region extraction is widely used because it is independent of lighting and skin color, both of which cause difficulty in practical applications.

An object of the present invention is to provide a template-based face region extraction method that effectively shortens processing time, the greatest problem of template-based face region extraction, and extracts face regions independently of lighting.

To achieve the above object, the template-based face region extraction method according to the present invention is characterized by comprising:

an image input step;

a face candidate region extraction step using a non-transform template, which does not transform the input image by a matrix operation; and

a face region extraction step that, for the extracted face candidate regions, uses a transform template, which transforms the input image by a matrix operation.

According to the present invention, the non-transform template used in the face candidate region extraction step is a template formed as an average face image from the face images used for training.

Also according to the present invention, in the face candidate region extraction step, template matching with the non-transform template is performed with weights set differently for each region of the template.

Also according to the present invention, in setting the weights differently for each template region, high weights are assigned to the eye and mouth regions of the template.

Also according to the present invention, the transform template used in the face region extraction step is a template constructed by applying a wavelet transform to the training face images.

Also according to the present invention, of the four sub-band regions produced by the wavelet transform, the region that is low-frequency both horizontally and vertically, and the region that is high-frequency along the vertical axis and therefore represents horizontal edges, are selected as the transform template.

Also according to the present invention, the face candidate region extraction step using the non-transform template and the face region extraction step using the transform template each comprise: setting the size of the sub-region of the input image to be matched against the template; resizing the template or the sub-region to fit the set sub-region size; moving the template to the sub-region to be matched; and, after matching, deciding from the matching score whether the sub-region is a face region.

Also according to the present invention, in the step of moving the template to the sub-region to be matched, the movement unit (stride) used when extracting face candidate regions with the non-transform template is larger than the movement unit used when extracting face regions with the transform template.

The present invention thus provides a face region extraction method that is independent of the surrounding environment, such as lighting. By combining a non-transform template, which is fast but inaccurate, with a transform template, which is accurate but slow, it provides face region extraction that is both fast and accurate.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

The present invention retains the advantages of the template-based face region extraction described in the prior art and, to reduce processing time, its greatest disadvantage, proposes extracting the face region with a combination of a transform template and a non-transform template. Before the method of the invention is described, template-based face region extraction in general, and extraction using a transform template and a non-transform template in particular, are briefly reviewed.

<Template-based face region extraction method>

Template-based face region extraction prepares a training set of faces in advance, uses it to construct templates of facial appearance, and extracts face regions with those templates. That is, a training set consisting only of face images is used to construct the most basic frame of a face, and this frame is compared with sub-regions of the input image to extract the face region. Such a frame is called a template. One or several face templates may be constructed as needed. Because grouping diverse faces and representing each group of similar faces by one frame gives better performance, several templates are generally constructed.

FIG. 1 shows an example of the templates used by a general template-based face region extraction method; in this example ten templates are constructed. As the number of templates grows, however, so does the number of comparisons with the input image, and more processing time is required. Because it is not known in advance where and in what form a face will appear in the input image, the templates must be compared while the entire input image is scanned and the template size is varied. Because of this large number of comparisons, template methods generally require more processing time than other face region extraction methods.

<Face region extraction method using a transform template>

In general, an image is a two-dimensional array of pixels, each represented by three color component values. The component values depend on the color space in which the color is expressed: in the RGB color space used by monitors they are R, G, and B, while standard video formats of the MPEG and H.26x families use the YCrCb color space, so the components are Y, Cr, and Cb. These values are the basic information that represents a face, but template-based face region extraction generally uses gray information rather than color information. This is because facial features can be expressed by light and dark contrast, and gray information is also robust against the color distortion caused by lighting.

In the YCrCb color space, for example, the Y component represents gray. A face image made up of such gray pixels not only contains many pixels but also does not by itself express facial features well. For this reason a template is generally constructed in a lower-dimensional representation that can describe facial features. A representative example is a template constructed by transforming the image into a space different from the one in which the image is expressed.

Among such transforms, one of the most widely used is the wavelet transform. As shown in FIG. 2, the wavelet transform separates the input image into high-frequency and low-frequency components in the horizontal and vertical directions. FIG. 2 shows an example of the wavelet transform used to extract a face region in a general transform-template method.

In FIG. 2, 'LL' is the region containing low-frequency components both horizontally and vertically; it has the same effect as shrinking the image by downsampling. 'LH' contains the high-frequency components that appear when the image is scanned vertically, and so mainly holds horizontal edge components. 'HL' contains the high-frequency components that appear when the image is scanned horizontally, and so mainly holds vertical edge components. 'HH' contains high-frequency components both horizontally and vertically. FIG. 2 shows one step of the wavelet transform, after which each of the four regions is 1/4 the size of the original image.

If one more wavelet step is applied to the 'LL' region, it is again divided into four regions, each 1/4 the size of the 'LL' region of FIG. 2 and therefore 1/16 the size of the original image. Using only some of these reduced regions can be very useful. For example, if facial features are concentrated in the 'LL' and 'LH' regions, those two regions alone can serve as the template. If the 'LL' and 'LH' regions are used after a two-step wavelet transform, the total size falls to 1/8 of the original image (two regions of 1/16 each).
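The sub-band sizes described above can be checked with a short sketch. It assumes an unnormalized Haar wavelet, which the text does not specify, and a hypothetical helper `haar_step` that averages and differences 2x2 pixel blocks; the band names follow the convention of FIG. 2.

```python
def haar_step(img):
    """One 2-D Haar step on a list-of-rows image with even width and height.

    Returns (LL, LH, HL, HH); each band has half the width and half the
    height of the input. LH is high-frequency along the vertical axis
    (horizontal edges); HL is high-frequency along the horizontal axis.
    The Haar wavelet is an assumption made for this illustration.
    """
    ll, lh, hl, hh = [], [], [], []
    for y in range(0, len(img), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            r_ll.append((a + b + c + d) / 4.0)  # local average
            r_lh.append((a + b - c - d) / 4.0)  # difference between rows
            r_hl.append((a - b + c - d) / 4.0)  # difference between columns
            r_hh.append((a - b - c + d) / 4.0)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def area(band):
    """Number of coefficients in a band."""
    return len(band) * len(band[0])


# A 16x16 synthetic gray image (illustrative values only).
image = [[float((x + y) % 7) for x in range(16)] for y in range(16)]
ll1, lh1, hl1, hh1 = haar_step(image)   # each band: 1/4 of the original
ll2, lh2, hl2, hh2 = haar_step(ll1)     # each band: 1/16 of the original
template_area = area(ll2) + area(lh2)   # LL + LH after two steps: 1/8
```

Only `LL` and `LH` from the second step are kept here, which mirrors the 1/8 size stated in the text.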

Constructing a template with a wavelet or similar transform thus has the advantage of yielding a small template that still contains many facial features. To be compared with the template, however, the input image must also be wavelet-transformed, and the transform must be performed once per comparison. When the number of comparisons is large, a great deal of processing time is therefore required even though the template is small.

<Face region extraction method using a non-transform template>

As described above, the main purpose of a template constructed through a transform is to pack a large amount of information into a small template. A non-transform template of the same size, by contrast, may contain less detailed information than a transform template. Because no wavelet or similar transform is applied to the input image, however, a non-transform template is generally faster than a transform template of the same size. The most basic type of non-transform template is an average face image, obtained by averaging face images at a size of N*M. Because an average face image does not represent well the features of people whose eye, nose, and mouth positions vary, a template expressing the relative positions of the eyes, nose, and mouth can be constructed in place of an average face image expressing their absolute positions. Such a template records that dark areas produced by the eyes appear at the upper left and upper right of the region and that a dark area produced by the mouth appears in the lower part. An embodiment of a template that describes features in advance in this way is explained in detail below.

An input image is generally a two-dimensional image expressed in one of several color spaces, but gray images are mainly used when template images are built and compared. This is because facial features can be expressed by the light and dark of a gray image, whereas color can be severely distorted by lighting. As shown in FIG. 3, a non-transform template is made by classifying many input images into groups of similar images, such as frontal and profile views, and forming an N*M average image for each group. FIG. 3 shows an example of constructing a non-transform template for a general non-transform-template face region extraction method.

Such an average image is built from an arbitrary input data set and represents the general characteristics of face images. The N*M average image obtained in this way is then reduced to a smaller size, N'*M', because a small image represents the characteristics of the image better than a large one. A large average image contains not only facial features but also much other information, so deciding whether an input is a face requires more comparisons and may be less accurate. Reducing the average image therefore lets the template represent the features of the face region well and also reduces the number of comparisons with the input image. FIG. 3 shows the N*M average image and, enlarged for clarity, the template made by reducing it to N'*M' (8*10).
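A minimal sketch of this template construction follows. It assumes gray images stored as lists of rows, reads "8*10" as 8 columns by 10 rows, and uses block averaging for the size reduction; the text fixes none of these, and the helper names `average_face` and `shrink` are hypothetical.

```python
def average_face(faces):
    """Pixel-wise mean of equally sized gray face images (lists of rows)."""
    n = len(faces)
    height, width = len(faces[0]), len(faces[0][0])
    return [[sum(f[y][x] for f in faces) / n for x in range(width)]
            for y in range(height)]


def shrink(img, out_w, out_h):
    """Reduce an image by averaging equal blocks.

    Assumes the input width and height are integer multiples of
    out_w and out_h (an assumption made to keep the sketch short).
    """
    block_h = len(img) // out_h
    block_w = len(img[0]) // out_w
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            block = [img[y][x]
                     for y in range(oy * block_h, (oy + 1) * block_h)
                     for x in range(ox * block_w, (ox + 1) * block_w)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Three synthetic 16x20 "faces" of constant brightness (illustrative only).
faces = [[[float(level) for _ in range(16)] for _ in range(20)]
         for level in (100, 120, 140)]
mean_image = average_face(faces)      # the N*M average image
template = shrink(mean_image, 8, 10)  # the reduced N'*M' template
```

In practice one such template would be built per group (frontal, profile, and so on), as the text describes.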

To detect face regions, template matching is performed over the entire image while the size and position of the sub-region taken from the input image are varied, from small sizes to large. A sub-region at an arbitrary position and of arbitrary size is resized to the size of the template and compared with it, for example by 1:1 (pixel-by-pixel) matching; if the similarity to the template is at or above a threshold, the sub-region can be judged to be a face region.
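The scan over positions and sizes can be sketched as a window generator. The growth factor, the stride as a fraction of the window width, and the 8:10 aspect ratio are assumptions chosen for illustration, and `windows` is a hypothetical helper.

```python
def windows(img_w, img_h, min_w, aspect=(8, 10), scale=1.5, stride_frac=0.25):
    """Yield (x, y, w, h) sub-regions from small to large.

    aspect is (width, height) of the template; scale and stride_frac
    are illustrative parameters, not values taken from the text.
    """
    w = min_w
    while True:
        h = w * aspect[1] // aspect[0]
        if w > img_w or h > img_h:
            break
        step = max(1, int(w * stride_frac))
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                yield x, y, w, h
        # Always grow by at least one pixel so the loop terminates.
        w = max(w + 1, int(w * scale))


all_windows = list(windows(32, 40, 8))
```

Each yielded sub-region would then be resized to the template size and scored; the length of `all_windows` gives a direct sense of why the number of comparisons dominates processing time.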

When template matching is performed pixel by pixel, the template may simply be matched 1:1 against the sub-region of the input image that has been resized to the template's size, but the matching may also give different importance to each pixel position of the template. For example, the darker pixels where the eyes lie and the darker pixels where the mouth lies express the character of a face better than other parts, so pixels matched at those positions may be weighted to contribute more to the similarity score.

FIG. 4 shows an example of calculating the similarity between the input image and the template in a general non-transform-template face region extraction method.

As in FIG. 4, the positions of the eyes and mouth, which best express a person's features, can be marked in the template and weighted when the similarity to the input image is computed. Alternatively, similarity can be compared using, for example, a gray value histogram formed by summing the template's gray values in the horizontal direction. A representative template can be generated for each group classified in advance by the user, and by matching against each group's representative template, the template with the highest similarity indicates the pose or orientation of the face more precisely. The example in FIG. 4 is template matching with a frontal image. Templates are also generated for the other poses and orientations that commonly appear in images, and matching against all of them reveals, through the most similar template, whether a face is present in the input image and what its pose and orientation are.
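The two similarity measures mentioned above can be sketched as follows. The similarity formula (one minus a weighted mean absolute difference), the weight values, and the 3x3 toy template are assumptions for illustration; the text does not give a formula.

```python
def weighted_similarity(patch, template, weights):
    """Similarity in [0, 1]: 1 minus the weighted mean absolute difference.

    patch, template and weights are equally sized lists of rows;
    pixel values are assumed to lie in [0, 255].
    """
    num = 0.0
    den = 0.0
    for p_row, t_row, w_row in zip(patch, template, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            num += w * abs(p - t)
            den += w
    return 1.0 - num / (255.0 * den)


def row_profile(img):
    """Gray values summed along each row (horizontal direction)."""
    return [sum(row) for row in img]


# Toy 3x3 template: dark "eyes" in the top corners, dark "mouth" at bottom centre.
template = [[50, 200, 50],
            [200, 200, 200],
            [200, 50, 200]]
weights = [[3, 1, 3],   # eye positions weighted higher (illustrative values)
           [1, 1, 1],
           [1, 3, 1]]   # mouth position weighted higher

eye_off = [[150, 200, 50], [200, 200, 200], [200, 50, 200]]    # error at an eye
cheek_off = [[50, 200, 50], [200, 100, 200], [200, 50, 200]]   # same error elsewhere

score_eye = weighted_similarity(eye_off, template, weights)
score_cheek = weighted_similarity(cheek_off, template, weights)
```

With these weights an error of the same magnitude lowers the score more at an eye position than at an unweighted position, which is the effect the weighting is meant to produce.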

Because a non-transform template applies no transform to the input image, it allows fast processing for a template of a given size. As noted above, however, it describes a face image using only limited information, such as spatial position information or a histogram, and so cannot express the most significant features of the face image; its performance may therefore be lower than that of a transform template.

<Face region extraction method using transform and non-transform templates>

The present invention combines the advantages of the two methods described above: it achieves the fast processing of the non-transform template together with the high performance of the transform template. The face region extraction method proposed by the invention is shown in FIG. 5, which is a flowchart of the template-based face region extraction method according to the present invention.

First, when an input image arrives, approximate face candidate regions are extracted with the non-transform template. Applying the non-transform template basically requires resizing to match the template's size and simple processing to express the same features that the template expresses. In this stage, the threshold score for accepting a candidate is set lower than it would be if the non-transform template were used on its own, so that 'recall' is favored over 'precision'. Here, 'precision' is the proportion of the regions recognized as faces that are actually faces; the higher the candidate threshold, the higher the precision. 'Recall' is the proportion of all the faces actually present that are successfully recognized; the lower the candidate threshold, the higher the recall.

Because 'precision' and 'recall' trade off against each other, the two must normally be balanced; in the present invention the threshold is set so as to raise 'recall'. This reduces the number of existing faces that are missed, and any non-face region accepted as a candidate can still be rejected in the next stage.
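The trade-off can be made concrete with a small calculation. The scores and ground-truth labels below are invented for illustration, and `precision_recall` is a hypothetical helper that follows the definitions given in the text.

```python
def precision_recall(scores, is_face, threshold):
    """Precision and recall when regions scoring >= threshold are accepted.

    scores: matching scores of candidate regions.
    is_face: ground-truth flags (True where the region really is a face).
    """
    accepted = [truth for s, truth in zip(scores, is_face) if s >= threshold]
    true_positives = sum(accepted)
    total_faces = sum(is_face)
    precision = true_positives / len(accepted) if accepted else 1.0
    recall = true_positives / total_faces if total_faces else 1.0
    return precision, recall


# Illustrative data only: six regions, three of which are real faces.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50]
is_face = [True, True, False, True, False, False]

strict = precision_recall(scores, is_face, 0.85)   # high threshold
lenient = precision_recall(scores, is_face, 0.65)  # low threshold
```

On this data the strict threshold accepts only true faces but misses one, while the lenient threshold finds every face at the cost of one false candidate, which is the behavior wanted from the first stage.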

Once the face candidate regions have been determined by the non-transform template, they are verified with the transform template. To apply the transform template, each candidate region is first resized to the size of the template and then transformed into the space that the transform template represents. Matching against the transform template then decides whether the region is a face region.

In this way, the present invention first applies the non-transform template to extract approximate face candidate regions and then applies the transform template to extract the exact face region. This shortens the processing time, which is the drawback of the transform template, while retaining its advantage of accurate face region extraction.
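The two-stage flow can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the similarity measure, the thresholds, and the 2x2 toy template are all assumptions, and the second-stage score is a stand-in that compares a single Haar-style horizontal-edge coefficient rather than a full transform template.

```python
def crop(image, top, left, size):
    """Return the size x size window whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def similarity(a, b):
    """Score in [0, 1] from the mean absolute difference of 8-bit pixel values."""
    diffs = [abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return 1.0 - sum(diffs) / (255.0 * len(diffs))


def detect_faces(image, size, fast_score, slow_score,
                 low_threshold, high_threshold, step=1):
    """Score every window cheaply, then run the costly score on survivors only."""
    candidates = []
    for top in range(0, len(image) - size + 1, step):
        for left in range(0, len(image[0]) - size + 1, step):
            if fast_score(crop(image, top, left, size)) >= low_threshold:
                candidates.append((top, left))
    faces = [(top, left) for top, left in candidates
             if slow_score(crop(image, top, left, size)) >= high_threshold]
    return candidates, faces


# Toy 2x2 "average face" template: bright top row, dark bottom row.
template = [[200, 200], [60, 60]]


def edge(window):
    """Horizontal-edge coefficient of a 2x2 window (top sum minus bottom sum)."""
    return (window[0][0] + window[0][1] - window[1][0] - window[1][1]) / 4.0


def fast(window):
    return similarity(window, template)              # stage 1: pixel domain


def slow(window):
    return 1.0 - abs(edge(window) - edge(template)) / 255.0  # stage 2 stand-in


image = [[50, 50, 50, 50],
         [50, 200, 200, 50],
         [50, 60, 60, 50],
         [50, 50, 50, 50]]

candidates, faces = detect_faces(image, 2, fast, slow,
                                 low_threshold=0.8, high_threshold=0.95)
```

Of the nine windows, only three pass the lenient first stage, so the second-stage score is evaluated three times instead of nine, and only the window at (1, 1) survives it.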

As described above, the present invention proposes a method that first extracts face candidate regions using a template that is fast but less accurate, and then extracts the face region using a transform template that requires more processing time but is more accurate.

The concept of the invention can be generalized as follows. Candidate regions are first extracted using a template with low accuracy and high processing speed, and the face region is then extracted from those candidates using a template with high accuracy and low processing speed. The latter is generally a transform template that requires matrix operations. Besides the wavelet-transform template described above, the approach can be extended to templates based on the DCT (Discrete Cosine Transform), to transform templates based on PCA (Principal Component Analysis), and to others. These transforms are well-known algorithms, so their description is omitted here.
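To show what "requires matrix operations" means for one of the alternatives named here, the sketch below computes a 2D DCT of a square block as the product C · X · Cᵀ, where C is the orthonormal DCT-II matrix. The helper names are hypothetical, and how the patent would select coefficients to build a DCT-based template is not stated, so the sketch stops at the transform itself.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """2D DCT of a square block: two matrix products per candidate window."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


# A flat block has all of its energy in the DC coefficient.
coeffs = dct2([[10.0] * 4 for _ in range(4)])
```

Each window needs two n x n matrix products before matching can begin, which is the per-window cost the first stage avoids for most of the image.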

The present invention thus provides a face region extraction method that is independent of ambient conditions such as lighting. By combining a non-transform template that is fast but inaccurate with a transform template that is accurate but slow, it provides a face region extraction method that is both fast and accurate. The invention can be usefully applied to extracting face regions from the transmitted images during video calls in mobile communication environments, where conditions change frequently, and it can effectively extract face regions from material such as video in which faces appear under varied conditions.

As described above, the template-based face region extraction method according to the present invention provides face region extraction that is independent of ambient conditions such as lighting, and, by combining a non-transform template that is fast but inaccurate with a transform template that is accurate but slow, has the advantage of providing face region extraction that is both fast and accurate.

The template-based face region extraction method according to the present invention also has the advantage that it can be usefully applied to extracting face regions from the transmitted images during video calls in mobile communication environments, where conditions change frequently, and that it can effectively extract face regions from material such as video in which faces appear under varied conditions.

FIG. 1 is a diagram showing an example of a template used to extract a face region by a general template-based face region extraction method.

FIG. 2 is a diagram showing an example of the wavelet transform used to extract a face region by a general face region extraction method using a transform template.

FIG. 3 is a diagram showing an example of how a non-transform template is constructed for extracting a face region by a general face region extraction method using a non-transform template.

FIG. 4 is a diagram showing an example of calculating the similarity between an input image and a template by a general face region extraction method using a non-transform template.

FIG. 5 is a flowchart showing the processing steps of the template-based face region extraction method according to the present invention.

Claims

A template-based face region extraction method comprising:

inputting an image;

extracting a face candidate region using a non-transform template, which is applied without transforming the input image by a matrix operation; and

extracting a face region from the extracted face candidate region using a transform template, which is applied after transforming the input image by a matrix operation.

The method of claim 1, wherein the non-transform template used in the face candidate region extraction step is a template formed as an average face image from the face images used for training.

The method of claim 1, wherein, in the face candidate region extraction step, template matching with the non-transform template is performed with different weights set for different regions of the template.

The method of claim 3, wherein, in setting different weights for different regions of the template, higher weights are assigned to the eye and mouth regions of the template.

The method of claim 1, wherein the transform template used in the face region extraction step is a template constructed by applying a wavelet transform to the face images used for training.

The method of claim 5, wherein, of the four partial frequency regions produced by the wavelet transform, the region that is low-frequency both horizontally and vertically and the region that is high-frequency along the vertical axis, which represents horizontal edges, are selected as the transform template.

The method of claim 1, wherein the step of extracting a face candidate region using the non-transform template and the step of extracting a face region using the transform template each comprise: setting the size of a partial region of the input image to be matched against the template; adjusting the size of the template or of the partial region according to the set partial region size; moving the template and matching it against the corresponding partial region; and determining the face region using the matching score obtained after matching.

The method of claim 7, wherein, in the step of moving the template and matching it against the corresponding partial region, the movement unit used when extracting the face candidate region with the non-transform template is set differently from the movement unit used when extracting the face region with the transform template.