KR100464079B1

KR100464079B1 - Face detection and tracking of video communication system

Info

Publication number: KR100464079B1
Application number: KR10-2002-0002441A
Authority: KR
Inventors: 이지은
Original assignee: 엘지전자 주식회사
Priority date: 2002-01-16
Filing date: 2002-01-16
Publication date: 2004-12-30
Also published as: KR20030062043A

Abstract

The present invention relates to a real-time video communication system based on face extraction and tracking.

The invention extracts a face region from an input image and encodes and transmits an image that includes the extracted face region. In doing so, it can automatically control the bit rate based on the extracted face region, or edit the image, for example by compositing the extracted face region onto a different background, and then encode and transmit the edited image. Face region extraction consists of detecting the face region in order to separate the face from the background in the input image, and then tracking the face over subsequent frames based on the detected region, which realizes a real-time video communication environment. Face detection is based on skin color and applies a mask for face shape matching; whether a detected region is an actual face is then confirmed by detecting the eyes, nose, and mouth. Face tracking limits the change in position and size of the face in the next frame relative to the face in the previous frame.

Description

Face detection and tracking system in video communication {FACE DETECTION AND TRACKING OF VIDEO COMMUNICATION SYSTEM}

The present invention relates to a system that, during video communication, automatically detects the user's face region in the communication image data and tracks the face over successive frames.

Communication that used to carry only voice is now shifting to video communication in a multimedia environment, in which the image of the communicating party is transmitted as well. Until now, video communication has mainly meant video calls and video chat using PC cameras, and video calls using video phones. With the launch of IMT-2000 services, however, video communication on mobile terminals is expected to arrive soon.

Video standards such as H.263 and MPEG-4 have already been designated as the video standards for mobile video communication on terminals, and many terminal manufacturers are developing terminals capable of high-quality video communication.

Although video-based communication services are thus becoming a basic function, they still have many limitations in meeting users' varied demands for higher-quality service. Those demands fall into two broad categories: a quality aspect, where natural-looking video should be displayed even in a poor network environment, and a service aspect, where the user can hide an unwanted background or express their own personality.

Addressing both requires a technique that separates important objects, such as people, from the communication image in real time. At present this is a very difficult problem and has not been achieved in real time. Existing techniques for separating objects from a given image can be summarized as follows.

One technique separates partial regions from an image based on color groups. It separates regions relatively well, but the regions are only color-based rather than semantically meaningful object regions, and the processing time is very long.

Another technique separates objects using difference images and edge images. Because the feature information is simple, processing is fast, but the method fails on complex scenes. It also assumes that the background is stationary and that whatever moves is the object, so it cannot be applied with a moving camera. Techniques that use color information as the feature have also been proposed, but they first separate regions of uniform color and then use those regions, which takes considerable processing time and makes real-time use difficult. Algorithms that separate moving objects are, for the same reason as above, difficult to apply with a moving camera.

Thus, no algorithm has so far been introduced that extracts objects such as people in real time in a mobile communication environment. Instead, some studies have been reported that restrict the object to the face and try to separate the face. Face region separation can be done in something close to real time, but outside of specific purposes such as security systems it has not yet been put to effective use in applications such as real-time video communication on mobile terminals.

An object of the present invention is to provide a video communication system that automatically extracts and tracks a face in the video communication image so as to enable real-time video communication, together with a face extraction method and a face tracking method for that system.

To achieve this object, the video communication system of the present invention comprises: means for inputting an image for video communication; means for extracting a face region from the input image; and means for encoding an image that includes the extracted face region. The invention is further characterized in that it can add editing effects, such as compositing the extracted face region with an image different from the original background, or encode the extracted face region and the separated background region at different bit rates for transmission.

To achieve the above object, a face detection method in the video communication system of the present invention comprises: detecting a face in an image acquired for video communication; treating the detected face as a face candidate region and verifying whether it is a face; and encoding and transmitting an image that includes the verified face region.

Also to achieve the above object, a face detection method in the video communication system of the present invention comprises: extracting face region candidates from the image using the skin color feature and the elliptical shape feature of a face; and selecting the face region from among those candidates by detecting the eyes, nose, and mouth.

Also to achieve the above object, a face tracking method in the video communication system of the present invention comprises: detecting a face in an input image frame; and, in the continuing video communication images, detecting the face in the current frame using the information from the face detection in the previous frame.

FIG. 1 is a diagram illustrating the change in image quality under bit rate control in a video communication system.

FIG. 2 is a diagram illustrating editing of the face region and the background region.

FIG. 3 is a flowchart illustrating the face detection method of the present invention.

FIG. 4 is a diagram illustrating skin color detection in the present invention.

FIG. 5 is a diagram illustrating the mask and region matching used for face detection in the present invention.

FIG. 6 is a diagram showing an example of mask and region matching in the present invention.

FIG. 7 is a diagram showing examples of mask types in the present invention.

FIG. 8 is a diagram illustrating face matching according to mask type in the present invention.

FIG. 9 is a diagram illustrating the face candidate region detection process in the present invention.

FIG. 10 is a diagram illustrating mask matching when only part of the face appears on screen.

FIG. 11 is a diagram illustrating the effect of mask size (width/height ratio) on real-time processing in the present invention.

FIG. 12 is a diagram illustrating the need for face confirmation (verification) in the present invention.

FIG. 13 is a diagram illustrating a method of detecting the eye, nose, and mouth regions for face confirmation (verification) in the present invention.

FIG. 14 is a diagram showing an example of detecting the eye, nose, and mouth regions for face confirmation (verification) in the present invention.

FIG. 15 is a diagram illustrating the relationship between initial face detection and tracking in the present invention.

FIG. 16 is a diagram illustrating a method of restricting the comparison area of the skin color distribution image for face tracking in the present invention.

FIG. 17 is a diagram showing the configuration of an embodiment of a video communication system to which the face detection and tracking technique of the present invention is applied.

The video communication system of the present invention, and the face detection and face tracking methods for video communication, are described below as preferred embodiments with reference to the accompanying drawings.

First, bit rate control and image editing are examined as the background to the technique by which the invention realizes a real-time video communication environment.

(1) Bit rate control of video communication data using face separation

In a video communication environment, a large amount of image data has to be transmitted in real time over a limited network. Image quality and data volume are proportional to each other, so guaranteeing high quality requires a correspondingly high data rate. In general, the quality difference a user perceives is more pronounced in the user region than over the frame as a whole, because in video communication the viewer's attention is concentrated on the person (the other party in the call).

Therefore, transmitting only the user region at higher quality achieves high perceived quality with a small amount of data. This requires a technique that automatically separates the user from the background.

Examples of bit-rate control and the background switching service are shown in FIG. 1 (a), (b), and (c). FIG. 1(a) is the original frame, and FIG. 1(b) is the low-quality image that can be transmitted when the bit rate is low. With a technique that detects the user's face, however, the quality of the user region and the background region can be set differently, so that, as in FIG. 1(c), more bits are allocated to the person region, which is transmitted at high quality, and fewer bits are allocated to the background region, which is transmitted at low quality.

In FIG. 1, (b) and (c) use a similar total amount of data, but because the caller's region of interest is the other party (mainly the face), the caller perceives (c) as better quality than (b). Current H.263 and MPEG-4 based encoding systems can adjust the quantization parameter per macroblock, so the quality and data volume described above can be controlled macroblock by macroblock.
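The per-macroblock control described above can be sketched as follows. This is a minimal illustration, not the patent's encoder: the function name `build_qp_map`, the 16-pixel macroblock size, and the two quantizer values are assumptions chosen for the example, and a lower quantization parameter is taken to mean finer quantization (more bits).

```python
def build_qp_map(face_mask, qp_face=8, qp_background=24, mb_size=16):
    """Return one quantization parameter per macroblock.

    face_mask is a 2D list of 0/1 values, one per pixel (1 = face region).
    A macroblock that contains any face pixel gets qp_face, otherwise
    qp_background. The default values are illustrative assumptions.
    """
    height = len(face_mask)
    width = len(face_mask[0])
    rows = (height + mb_size - 1) // mb_size   # round up to cover the frame
    cols = (width + mb_size - 1) // mb_size
    qp_map = []
    for r in range(rows):
        qp_row = []
        for c in range(cols):
            ys = range(r * mb_size, min((r + 1) * mb_size, height))
            xs = range(c * mb_size, min((c + 1) * mb_size, width))
            has_face = any(face_mask[y][x] for y in ys for x in xs)
            qp_row.append(qp_face if has_face else qp_background)
        qp_map.append(qp_row)
    return qp_map
```

An encoder would then consume such a map when choosing the quantizer for each macroblock; how that is wired in depends on the codec implementation and is not shown here.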

Face extraction also makes it possible, when a caller stays in one place throughout a call, to transmit the image of the background space only once at the start and thereafter transmit only the caller region. The receiving terminal then composites the background image received at the start with the sender's image transmitted in real time and displays the result. The background image sent at the start need not be the real space the caller is in; it may be any background image the caller prefers. This amounts to a kind of image editing function, which is described in the next item.

(2) Image editing of video communication data using face separation

This function is a service that edits the communication image so that the user can hide an unwanted background or express their own personality.

FIG. 2 shows examples of communicating with the background switched to another one, such as a background the user wants or an advertising background. The background is switched automatically so that the user appears to be communicating from a different place. Besides background switching, the face can also be composited onto a particular character, or graphic effects can be applied.

In FIG. 2, (a) shows the background switched to another place, (b) shows the face region composited onto a different background image, (c) shows the face region composited onto a particular character, (d) shows the face region composited into a picture-frame image, and (e) shows a smoothing effect applied only to the area outside the face region.

From this standpoint, the invention consists of two parts, a face detection method and a face tracking method, which are described separately below.

[Face detection method]

As shown in FIG. 3, the face detection method consists of three processes: 1. extracting skin color regions, 2. performing ellipse verification on the skin color regions, and 3. confirming the face region by detecting the eyes, nose, and mouth. Each process is described in detail below.

[1]. Skin color region extraction

Skin color regions can be extracted in many different ways; the present invention detects skin color as follows.

When an image is input, a skin color region extraction step first detects the pixels in the image whose color falls within the skin color range. Human skin color varies with lighting and capture device as well as ethnicity, so the range is broad. As a result, the skin color regions may include similarly colored areas that are not part of a person. To address this, a skin color grouping step then splits the skin color regions into sub-regions of similar color.
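The pixel-level detection step can be sketched as a simple color-range test. The text does not state which color space or thresholds are used, so everything specific here is an assumption: the sketch converts RGB to YCbCr (full-range JFIF-style conversion) and uses Cb and Cr ranges that are commonly quoted in the skin-detection literature. The grouping step that follows in the text is not shown.

```python
def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Return True if an 8-bit RGB pixel falls inside the assumed skin range.

    The Cb/Cr ranges are illustrative defaults, not values from the patent.
    """
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_mask(image):
    """Map a 2D list of (r, g, b) tuples to a 2D list of 0/1 skin flags."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]
```

In practice the ranges would need tuning for the camera and lighting, which is consistent with the text's remark that the skin color range is broad.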

An example of the result is shown in FIG. 4. The white parts of the black-and-white images in FIG. 4 are the extracted skin color regions. As FIG. 4 (a), (b), and (c) show, however, the extracted regions may include clothing or background as well as the face, and parts of the face itself may be left out depending on the skin color extraction conditions.

[2]. Ellipse verification of skin color regions

[2.1]. Mask construction and matching value calculation for ellipse matching

As FIG. 4 shows, the extracted skin color region alone is not enough to decide that something is a face; an additional means of detecting the face is needed. The present invention exploits the fact that the human face is roughly elliptical: among the skin color regions extracted above, it looks for regions whose spatial distribution is close to an ellipse and takes them as face candidates. Even when the face is partly covered by hair or a hat, the elliptical character still shows in the cheeks and jawline. The invention matches an elliptical mask such as that in FIG. 5 against the skin color regions obtained in the extraction process.

The mask of FIG. 5 matches two kinds of information, the face region and the region outside the face, as follows.

The white part of the mask is the face region. When the mask is overlaid on the skin color region image, the Inner Rate is the fraction of that part that holds matching (skin color) values. If every pixel of the skin color image under the white part of the mask matches, the Inner Rate is 100%. The region outside the face is matched in a similar way. It is the black part of the mask, and when the mask is overlaid on the skin color region image, the Outer Rate is the fraction of that part that holds non-matching (non-skin) values. In the example of FIG. 6, the Inner Rate is calculated as 95% and the Outer Rate as 98%.
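The two rates can be sketched as pixel counts over a binary skin image and a binary mask of the same size. This is a minimal sketch under assumptions: the name `match_rates` and the 0/1 list-of-lists representation are hypothetical, and the rates are returned as fractions in [0, 1] rather than percentages.

```python
def match_rates(skin, mask):
    """Return (inner_rate, outer_rate) for a skin image overlaid by a mask.

    skin: 2D list, 1 = skin color pixel, 0 = non-skin pixel.
    mask: 2D list of the same size, 1 = face (white), 0 = outside (black).
    """
    inner_total = inner_hits = 0
    outer_total = outer_hits = 0
    for skin_row, mask_row in zip(skin, mask):
        for s, m in zip(skin_row, mask_row):
            if m == 1:
                inner_total += 1
                inner_hits += 1 if s == 1 else 0   # skin where a face is expected
            else:
                outer_total += 1
                outer_hits += 1 if s == 0 else 0   # no skin where none is expected
    inner_rate = inner_hits / inner_total if inner_total else 0.0
    outer_rate = outer_hits / outer_total if outer_total else 0.0
    return inner_rate, outer_rate
```

For example, with a plus-shaped 3x3 mask and a skin patch that misses one face pixel and has one stray skin pixel outside, the function reports an Inner Rate of 0.8 and an Outer Rate of 0.75.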

How the final matching value is computed can be defined according to whether matching the face region or matching the region outside the face matters more. For example, if the two are equally important, the final matching value can be defined as:

Matching_value = 0.5 × Inner_Rate + 0.5 × Outer_Rate

Alternatively, if matching the face region is about twice as important as matching the region outside the face, the final matching value can be defined as:

Matching_value = 0.67 × Inner_Rate + 0.33 × Outer_Rate
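Both formulas are instances of one weighted sum whose weights add up to 1. A minimal sketch, assuming a hypothetical helper `matching_value` that takes relative importances and normalizes them (so a 2:1 importance gives weights of 2/3 and 1/3, which the text rounds to 0.67 and 0.33):

```python
def matching_value(inner_rate, outer_rate, inner_importance=1.0, outer_importance=1.0):
    """Combine the two rates into one score using normalized weights."""
    total = inner_importance + outer_importance
    w_inner = inner_importance / total
    w_outer = outer_importance / total
    return w_inner * inner_rate + w_outer * outer_rate
```

With the rates from the FIG. 6 example (0.95 and 0.98), equal importance gives 0.965 and a 2:1 importance gives 0.96.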

For reference, matching the face region matters more than matching the outside region when skin color is poorly detected inside the face, as in FIG. 4(c). Conversely, matching the outside region matters more when a lot of skin color is detected outside the face, as in FIG. 4(a).

Because the elliptical mask matching method of the invention computes the degree of match for the face region and for the outside region separately, it has the advantage that a formula suited to the application environment can be derived by combining the two.

[2.2]. Characteristics of the mask itself

The elliptical matching mask of the invention expresses not only that the face region is elliptical in shape but also that no skin color region should appear outside the face. To eliminate errors caused by the skin color region of the neck, it uses a mask from which the lower part of the ellipse is omitted.

The following compares performance with respect to the characteristics of the mask itself.

Suppose there are type A and type B masks as shown in FIG. 7. FIG. 8(a) shows the result of finding the region with the highest matching value (InnerRate + OuterRate) using the type A mask of FIG. 7.

FIG. 8(b) shows the result of finding the region with the highest matching value (InnerRate + OuterRate) in the same images using the type B mask of FIG. 7.

Analyzing the results in FIG. 8: for the woman on the left, the type B mask of FIG. 7 finds the face region more accurately because it encodes the information that no skin color region should appear above the forehead. For the man on the right, the type A mask compares down to the lower tip of the ellipse and so takes in the area below the neck, down to where the skin color region ends, as part of the face. The type B mask does not use the lower tip of the ellipse and compares only down to the boundary between face and neck, so it finds only the face region more accurately.

[2.3]. Ellipse matching method

Once the elliptical mask is obtained, the face region is found by matching the mask against the skin color region image described above. To match, a rectangular part of the skin color region image is cut out, resized to the size of the mask, and then matched against the mask to compute the face region match (Inner Rate) and the outside region match (Outer Rate).

The position of the rectangle is a matter of two variables, the horizontal and vertical position on the skin color region image, so the rectangle to cut out is chosen while varying these two variables.

The size of the rectangle is likewise a matter of two variables, width and height, so the rectangle to cut out is chosen while varying the width and height as well.

After the rectangle is cut out, it is resized to the mask size and matched 1:1 against each pixel of the mask, yielding the matching values described above (InnerRate, OuterRate). Finally, regions whose matching value (the combination of InnerRate and OuterRate) is at or above a threshold become face candidates.

FIG. 9 illustrates this process.

In FIG. 9, PosX is the left starting position of the rectangle to cut out, PosY is its lower starting position, Width is its horizontal length, and Height is its vertical length.

First, the skin color region image is obtained and an ellipse matching mask is selected (S11). In the next step (S12), the rectangular region 'Rectangle' to be cut from the skin color image is set: Left = PosX, Right = PosX + Width, Top = PosY, Bottom = PosY + Height.

In the next step (S13), the rectangular region (Rectangle) of the skin color image is resized to the same size as the mask. In the next step (S14), the mask matching value (Matching_value) is calculated by the method described with reference to FIGS. 5 and 6.

In the next step (S15), the region is registered as a face candidate if the matching value is greater than the threshold for accepting a face candidate, and in the next step (S16) the PosX, PosY, Width, and Height values are updated. The updated values are checked against the termination condition (S17); if it is not met, processing continues from step S12. When all iterations are finished, the result is the set of face candidates whose ellipse matching value is at or above the threshold.
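Steps S11 to S17 amount to a sliding-window search over position and size. The sketch below is one possible reading under stated assumptions, not the patent's implementation: the function names, the nearest-neighbour resize, the equal 0.5/0.5 weighting, the `>=` threshold test, the explicit list of candidate sizes, and the treatment of PosY as the top edge of the rectangle (as in step S12) are all choices made for illustration.

```python
def crop_and_resize(skin, pos_x, pos_y, width, height, out_w, out_h):
    """S12-S13: cut out a rectangle and resize it (nearest neighbour)."""
    return [[skin[pos_y + (y * height) // out_h][pos_x + (x * width) // out_w]
             for x in range(out_w)]
            for y in range(out_h)]


def score(patch, mask, w_inner=0.5):
    """S14: weighted sum of Inner Rate and Outer Rate (fractions in [0, 1])."""
    inner_total = inner_hits = outer_total = outer_hits = 0
    for patch_row, mask_row in zip(patch, mask):
        for p, m in zip(patch_row, mask_row):
            if m == 1:
                inner_total += 1
                inner_hits += 1 if p == 1 else 0
            else:
                outer_total += 1
                outer_hits += 1 if p == 0 else 0
    inner = inner_hits / inner_total if inner_total else 0.0
    outer = outer_hits / outer_total if outer_total else 0.0
    return w_inner * inner + (1.0 - w_inner) * outer


def find_face_candidates(skin, mask, sizes, step, threshold):
    """S11-S17: return (pos_x, pos_y, width, height, value) for each candidate."""
    img_h, img_w = len(skin), len(skin[0])
    mask_h, mask_w = len(mask), len(mask[0])
    candidates = []
    for width, height in sizes:                       # S16: vary the size
        for pos_y in range(0, img_h - height + 1, step):
            for pos_x in range(0, img_w - width + 1, step):   # S16: vary the position
                patch = crop_and_resize(skin, pos_x, pos_y, width, height,
                                        mask_w, mask_h)
                value = score(patch, mask)
                if value >= threshold:                # S15: register the candidate
                    candidates.append((pos_x, pos_y, width, height, value))
    return candidates                                 # S17: loops exhausted
```

A larger `step` trades localization accuracy for fewer iterations, which is the same trade-off the text makes when it discusses real-time processing.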

[2.4]. Mask matching when only part of the face appears on screen

The target of the invention is video communication images. These are small, and the terminal may move as the user moves, so the user's face may not appear in full on the screen.

The ellipse matching method of the invention is designed so that, even when only the left side of the user's face is shown as in FIG. 10, the face region is found accurately by taking the hidden part of the face into account. The region drawn as an ellipse in the actual image on the left of the figure is the user's face region found by the system.

이렇게 하기 위해서는 본 발명의 타원형 매칭에서는 가려진 부분을 고려하지 않는 조건(Don't care condition)으로 두고 가려지지 않은 부분의 타원 매칭값을가려진 부분과 가려지지 않은 부분을 합한 전체 영역에 대한 대표값으로 이용하면 된다.In order to do this, in the elliptic matching of the present invention, the elliptic matching value of the unhidden portion is set as a Don't care condition, and is represented as a representative value of the entire area including the hidden portion and the unhidden portion. You can use
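A minimal sketch of the don't-care idea is given below. The scoring rule is an assumption made for illustration: the score is the fraction of visible pixels that agree with an elliptical mask (skin inside, non-skin outside), whereas the patent's actual matching value is the one described with FIGS. 5 and 6. Off-screen pixels are represented by `None`, and the names `ellipse_mask` and `matching_value` are hypothetical.

```python
def ellipse_mask(width, height):
    """Return a height x width grid that is True inside the inscribed ellipse."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    rx, ry = width / 2, height / 2
    return [[((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
             for x in range(width)]
            for y in range(height)]


def matching_value(patch, mask):
    """Score a skin-map patch against the mask, ignoring don't-care pixels.

    patch holds 1 (skin), 0 (non-skin), or None (hidden / off-screen).
    Hidden pixels are left out of both the agreement count and the total,
    so the visible part alone represents the whole region.
    """
    agree = counted = 0
    for patch_row, mask_row in zip(patch, mask):
        for pixel, inside in zip(patch_row, mask_row):
            if pixel is None:          # don't-care condition
                continue
            counted += 1
            if bool(pixel) == inside:
                agree += 1
    return agree / counted if counted else 0.0
```

With this rule, a perfectly elliptical skin region scores the same whether it is fully visible or half hidden, which is the behavior the paragraph above describes.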

[2.5]. Adaptation methods for real-time processing

(1). To perform the ellipse matching for finding face candidates in real time, the present invention performs the resizing step of FIG. 9 as few times as possible.

Resizing is determined by the size variables 'Width' and 'Height', so resizing is not repeated for rectangular regions that have the same 'Width' and 'Height' but different position variables 'PosX' and 'PosY'. That is, resizing is performed only once over the entire area referenced as 'PosX' and 'PosY' vary, and then, for each 'PosX' and 'PosY' value, the portion of the resized result that corresponds to the referenced region is selected.
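The resize-once idea can be illustrated as follows. This is a sketch under assumptions: nearest-neighbor scaling is used only for simplicity (the patent does not name an interpolation method), and the helper names `resize_nearest` and `windows_at_scale` are hypothetical.

```python
def resize_nearest(img, new_w, new_h):
    """Nearest-neighbor resize of a 2D list (rows of pixel values)."""
    old_h, old_w = len(img), len(img[0])
    return [[img[y * old_h // new_h][x * old_w // new_w]
             for x in range(new_w)]
            for y in range(new_h)]


def windows_at_scale(img, width, height, mask_w, mask_h, pos_step):
    """Yield ((PosX, PosY), crop) for every rectangle of one Width/Height.

    The whole image is resized once so that a width x height rectangle maps
    to mask_w x mask_h; each position then only needs a crop, not a resize.
    """
    old_h, old_w = len(img), len(img[0])
    scaled = resize_nearest(img, old_w * mask_w // width,
                            old_h * mask_h // height)
    for pos_y in range(0, old_h - height + 1, pos_step):
        for pos_x in range(0, old_w - width + 1, pos_step):
            sx = pos_x * mask_w // width     # position in the scaled image
            sy = pos_y * mask_h // height
            crop = [row[sx:sx + mask_w] for row in scaled[sy:sy + mask_h]]
            yield (pos_x, pos_y), crop
```

Each crop has exactly the mask size, so it can be compared with the mask pixel by pixel.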

(2). To perform the ellipse matching for finding face candidates in real time, the present invention changes the size variables 'Width' and 'Height' and the position variables 'PosX' and 'PosY', which determine the number of matches, in relatively large step sizes. This enlarges the increment by which these variables are readjusted (updated) in step S16 of FIG. 9, which reduces the number of matching iterations.

(3). To perform the ellipse matching for finding face candidates in real time, the present invention removes the height variable (Height) from the four size and position variables (Width, Height, PosX, PosY) and uses a height predetermined relative to the width (for example, a height of 1.2 for a width of 1, or a height of 1.3 for a width of 1). This has the drawback that faces with a different width-to-height ratio are not matched exactly, but as FIG. 11 shows, the difference is not significant for detecting a face.

That is, in FIG. 11, a width-to-height ratio of 1:1.2 gives a more accurate face boundary for the first and third persons, and a ratio of 1:1.3 gives a more accurate boundary for the second person, but the difference between the results is not significant for the purpose of detecting a face.
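To give a feel for how much methods (2) and (3) reduce the work, the following sketch counts the rectangles that would be matched. The function `count_rectangles` and the example numbers are hypothetical; only the 1:1.2 ratio comes from the text.

```python
def count_rectangles(img_w, img_h, widths, heights_for, pos_step):
    """Count the (PosX, PosY, Width, Height) rectangles a search visits.

    heights_for(width) returns the heights tried for a given width:
    several values when Height is a free variable, or a single value
    when it is tied to the width by a fixed ratio.
    """
    total = 0
    for width in widths:
        for height in heights_for(width):
            if width > img_w or height > img_h:
                continue
            nx = (img_w - width) // pos_step + 1
            ny = (img_h - height) // pos_step + 1
            total += nx * ny
    return total


# 10 x 10 image, one width of 4 pixels.
free_height = count_rectangles(10, 10, [4], lambda w: [4, 5, 6], 1)
fixed_ratio = count_rectangles(10, 10, [4], lambda w: [round(w * 1.2)], 1)
big_steps = count_rectangles(10, 10, [4], lambda w: [round(w * 1.2)], 2)
```

In this toy case the count falls from 126 rectangles to 42 when the height is fixed at 1.2 times the width, and to 12 when the position step is also doubled.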

(4). To perform the ellipse matching for finding face candidates in real time, the present invention uses a 32 × 33 elliptical mask, which is much smaller than the size of a person's face in a typical video communication image. Because face candidate regions are searched with a small elliptical mask, the candidate search is faster and involves less computation than with a large elliptical mask.

[3]. Confirming the face region through eye, nose, and mouth detection

After an elliptical region that is a face candidate has been found by the method described above, a step is needed to detect the eyes, nose, and mouth and confirm that the elliptical skin color region is actually a face. For example, in FIG. 12, not only the face but also the hand has a high ellipse matching value in the skin color distribution image, so the hand was also extracted as a face candidate region. In such a case, the presence or absence of eyes, a nose, and a mouth can serve as the cue that distinguishes an actual face from a non-face region.

For this purpose, the method of detecting the eyes, nose, and mouth consists of two stages: the first stage finds, inside the elliptical region, parts that are dark in the center and bright above and below; the second stage determines the actual eyes, nose, and mouth from among the parts found in the first stage.

(1). Finding parts inside the elliptical region that are dark in the center and bright above and below

To find dark parts within the elliptical region, a basic pattern such as that in FIG. 13(a) is matched against the gray scale image of the original image. The basic pattern has a dark middle band and bright upper and lower bands, because the eyes, nose, and mouth of a face are surrounded by skin-colored areas. Because the pattern requires brightness above and below, regions that are dark overall, such as hair, can be rejected. The size of the pattern shown in FIG. 13(a) is set in proportion to the width (Width) of the ellipse found by ellipse matching.
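One way to express the three-band pattern numerically is sketched below. The response formula (the darker of the two outer bands minus the middle band) is an assumption chosen for illustration, since the text does not give the exact comparison; `band_mean` and `band_response` are hypothetical names.

```python
def band_mean(gray, x, y, w, h):
    """Mean gray level of the w x h block whose top-left corner is (x, y)."""
    total = sum(sum(row[x:x + w]) for row in gray[y:y + h])
    return total / (w * h)


def band_response(gray, x, y, band_w, band_h):
    """Response of a bright / dark / bright pattern placed at (x, y).

    Large when the middle band is dark and BOTH outer bands are bright;
    near zero for uniformly dark regions such as hair.
    """
    top = band_mean(gray, x, y, band_w, band_h)
    mid = band_mean(gray, x, y + band_h, band_w, band_h)
    bottom = band_mean(gray, x, y + 2 * band_h, band_w, band_h)
    return min(top, bottom) - mid
```

In practice `band_w` and `band_h` would be derived from the ellipse width, as the paragraph above states, and a threshold on the response would mark a location as a dark part.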

FIGS. 13(b) and 13(c) show the results of finding the dark parts of the face using the pattern of (a). As the black-and-white skin color distribution image below each original image shows, the parts marked with rectangles around the eyes, nose, and mouth are the locations detected as dark parts of the face. As the results show, the eyes, eyebrows, nose, mouth, and the area under the chin were detected as dark parts of the face.

(2). Determining the actual eyes, nose, and mouth among the dark parts

The method of determining the eyes, nose, and mouth among the dark parts of the face found with the basic pattern, as described above, uses the spatial relationships that constrain where the eyes, nose, and mouth can be. That is, the eyes are in the upper part of the face and at similar heights, the nose is below and between the two eyes, the mouth is below the nose, and so on.
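The spatial rules above might be coded along the following lines. This is a simplified sketch: the coordinates are assumed to be dark-part centers normalized to the ellipse bounding box (0 to 1 in each axis), and the tolerances `max_eye_dy` and `min_eye_dx` are invented for illustration. A practical implementation would need further rules, for example to tell eyebrows from eyes.

```python
from itertools import combinations


def pick_eyes_nose_mouth(regions, max_eye_dy=0.08, min_eye_dx=0.2):
    """Pick (left_eye, right_eye, nose, mouth) from dark-part centers.

    regions: list of (x, y) centers, y growing downward.
    Returns None when no combination satisfies the spatial rules.
    """
    upper = [r for r in regions if r[1] < 0.5]        # eyes: upper half
    for p, q in combinations(upper, 2):
        left, right = sorted((p, q))                  # order by x
        if abs(left[1] - right[1]) > max_eye_dy:      # similar height
            continue
        if right[0] - left[0] < min_eye_dx:           # clearly separated
            continue
        eye_y = max(left[1], right[1])
        # Nose and mouth: below the eyes and horizontally between them.
        below = sorted((r for r in regions
                        if r[1] > eye_y and left[0] < r[0] < right[0]),
                       key=lambda r: r[1])
        if len(below) >= 2:
            return left, right, below[0], below[1]    # nose above mouth
    return None
```

A skin-colored hand would normally yield no such arrangement of dark parts, so it would be rejected as a face candidate.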

FIG. 14 shows an example of the eyes and mouth finally selected from among the dark parts of the face.

[Face tracking method]

Face tracking detects the face region more quickly in the successive video communication frames that follow the initial detection of the face by the method described above, making real-time face detection possible during video communication.

The relationship between initial face detection and face tracking is shown in FIG. 15. An initial face is detected (S21), and the detected face region is checked against the initial face condition (S22). If the initial face condition is not satisfied, initial face detection is performed on the next frame; if it is satisfied, face tracking is performed (S23). The tracked face is then checked against the face tracking condition (S24). If the face tracking condition is not satisfied, initial face detection is performed again; if it is satisfied, face tracking continues on the next frame.

That is, the initial face region is found using the first frame (or several consecutive frames, for higher accuracy) (S21, S22), and the face tracking method is applied to the frames that follow (S23, S24). Because the face must be detected in real time during video communication, the face tracking method is implemented as a simpler, faster method than initial face detection. In addition, the tracker can be reinitialized with the initial face detection method when an error occurs in the face tracking step, or periodically.
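The control flow of FIG. 15 can be sketched as a small loop. The callbacks `detect` and `track` are hypothetical: each is assumed to return a face region, or `None` when the initial face condition (S22) or the face tracking condition (S24) is not satisfied. Falling back to detection on the following frame after a tracking failure is one possible reading of the flow, not something the text specifies.

```python
def run_face_extraction(frames, detect, track, redetect_every=None):
    """Alternate between initial detection and tracking, frame by frame.

    detect(frame)       -> face or None   (S21, S22)
    track(frame, face)  -> face or None   (S23, S24)
    redetect_every      -> optional period for forced re-initialization
    """
    face = None
    tracked_frames = 0
    results = []
    for frame in frames:
        periodic = redetect_every and tracked_frames >= redetect_every
        if face is None or periodic:
            face = detect(frame)          # (re)initialize
            tracked_frames = 0
        else:
            face = track(frame, face)     # None sends us back to detect
            tracked_frames += 1
        results.append(face)
    return results
```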

The face tracking method is basically the same as the initial face detection method, but reduces the number of ellipse matching comparisons. There are two ways to do so: the first restricts the comparison area of the skin color distribution image, and the second limits the degree of change in face size.

(1). Restricting the comparison area of the skin color distribution image

In initial face detection, ellipse matching searches the entire area of the skin color distribution image in which skin color is present, whereas the face tracking step searches only a portion centered on the area where the face was in the previous frame (large enough to contain the face if it has moved). In FIG. 16, for example, the initial face detection step searches all areas where skin color is present (the outer rectangle), including the face, the neck, and the skin-colored vest, whereas the face tracking step searches only the part marked by the small rectangle around the face.
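A restricted search window of this kind could be computed as below. The `margin` parameter, which expresses how far the face is allowed to move as a fraction of its size, is a hypothetical value; the text only says that the window should be large enough to contain the moved face.

```python
def tracking_search_window(prev_rect, img_w, img_h, margin=0.5):
    """Return the (x, y, w, h) area searched during tracking.

    prev_rect is the face rectangle (x, y, w, h) from the previous frame.
    The window extends it by margin * size on every side and is clipped
    to the image.
    """
    x, y, w, h = prev_rect
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return (x0, y0, x1 - x0, y1 - y0)
```

Ellipse matching then iterates over positions inside this window only, instead of over the whole skin color area.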

(2). Limiting the degree of change in face size

In the face tracking step, the size of the ellipse to be compared is allowed to change only within a specified range relative to the previous face size. For example, the ellipse size S(t) allowed in the current frame is limited, relative to the ellipse size S(t-1) of the previous frame, to:

0.9 × S(t-1) ≤ S(t) ≤ 1.1 × S(t-1)
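As a worked example of this constraint, the sketch below filters candidate sizes against the previous size. Integer arithmetic (9/10 and 11/10) is used to avoid floating-point rounding at the boundaries; the function names and the example values are illustrative only.

```python
def is_allowed_size(size, prev_size):
    """True when 0.9 * prev_size <= size <= 1.1 * prev_size."""
    return 9 * prev_size <= 10 * size <= 11 * prev_size


def allowed_sizes(prev_size, candidate_sizes):
    """Keep only the ellipse sizes that tracking may try in this frame."""
    return [s for s in candidate_sizes if is_allowed_size(s, prev_size)]


# Previous ellipse size 40, candidate sizes 30, 32, ..., 50.
tracking_sizes = allowed_sizes(40, range(30, 51, 2))
```

For a previous size of 40 the bounds are 36 and 44, so `tracking_sizes` is `[36, 38, 40, 42, 44]`: five sizes instead of the eleven that the full candidate list would require.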

For reference, in the initial face detection step, thresholds for the maximum and minimum face sizes possible in a video communication image are set in advance, and all possible face sizes within the minimum/maximum threshold range are compared.

[Video communication system]

FIG. 17 shows an example configuration of a video communication system to which the face detection and tracking methods described so far are applied. The image input means (100) acquires the user image for video communication and may be, for example, a PC camera. The face region extraction means (101) extracts the face region from the image acquired by the image input means (100). As described above, the face region extraction means (101) may include face detection means (101a), which separates and detects the background and the object (the face) in the input image, and face tracking means (101b), which performs the face tracking described above on the detected face.

The image signal, consisting of the face region extracted (or tracked) by the face region extraction means (101) and its background, is encoded and transmitted by the encoding means (102). The encoding means (102) may include bit rate control means (102a), which automatically controls the bit rate based on the extracted face region. For example, as described with reference to FIG. 1, the bit rate control means (102a) encodes the face region at a high bit rate and transmits the background region at a lower bit rate. This secures high image quality for the caller's face, which the user watches, and processes the background, which is of relatively less interest, at low quality, providing a video call environment that is perceived as good quality while reducing the overall transmission load.
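One common way to realize region-dependent bit rates in a block-based encoder is to assign a finer quantizer to the blocks that overlap the face. The sketch below illustrates that idea only; the patent does not say how the bit rate control means is implemented, and the 16-pixel block size and the quantizer values 20 and 35 are hypothetical.

```python
def macroblock_qp_map(img_w, img_h, face_rect,
                      face_qp=20, background_qp=35, mb=16):
    """Return a grid of quantizer values, one per mb x mb block.

    Blocks overlapping face_rect (x, y, w, h) get face_qp (finer
    quantization, more bits); all other blocks get background_qp.
    """
    fx, fy, fw, fh = face_rect
    cols = (img_w + mb - 1) // mb
    rows = (img_h + mb - 1) // mb
    qp_map = []
    for r in range(rows):
        row = []
        for c in range(cols):
            x0, y0 = c * mb, r * mb
            overlaps = (x0 < fx + fw and x0 + mb > fx and
                        y0 < fy + fh and y0 + mb > fy)
            row.append(face_qp if overlaps else background_qp)
        qp_map.append(row)
    return qp_map
```

The face rectangle would come from the face region extraction means, updated every frame by detection or tracking.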

In addition, the editing means (103) performs image editing, as described with reference to FIG. 2, based on the extracted face region. For image editing, the editing means (103) may have a user interface, and may also have a memory in which information such as background images for various editing effects is stored in advance.

The configuration of the video communication system of the present invention shown in FIG. 17 can be modified and implemented in various ways based on the image input means, face extraction means, encoding means, and editing means, and is not limited to the configuration shown in FIG. 17. Although not shown in FIG. 17, a video communication system transmits audio and video in both directions, so it is evident that an element for decoding and displaying the video signal transmitted from the other party is also included.

The present invention provides a system that extracts the face region from video communication data in real time through face detection and tracking, and that uses the extracted face region information to control the bit rate of video communication effectively or to edit the video communication image automatically.

The present invention extracts the caller's face region, which is the area of primary interest to the user in a video communication system, and on that basis transmits the face region at higher quality than the background region, so that the information that matters in video communication is delivered at high quality. Where needed, the background can also be edited, so that a wider variety of additional functions can be provided during video communication.

Claims

A video communication system comprising:

image input means for inputting an image for video communication;

face region extraction means for extracting a face region from an initial frame of the image input through the image input means and, in subsequent frames, extracting the face region through elliptic verification of the skin color region of the current frame based on the position of the face region extracted from the previous frame, while allowing the face size and position to change from those of the previous frame only within a specified range;

encoding means for encoding an image including the face region extracted by the face region extraction means; and

bit rate control means for automatically controlling the bit rate based on the face region extracted by the face region extraction means.

The video communication system according to claim 1, further comprising editing means for editing an image based on the extracted face region.

(Deleted)

A face detection method in a video communication system, comprising: detecting, in an image acquired for video communication, an elliptical face region represented by a parametric expression in an initial frame of the acquired image and, in subsequent frames, detecting the face region through elliptic verification of the skin color region of the current frame based on the position of the face region detected in the previous frame, while allowing the face size and position to change from those of the previous frame only within a specified range; confirming whether the face candidate region is a face; and encoding and transmitting an image including the confirmed face region.

The method of claim 4, wherein detecting and confirming the face comprises: extracting a skin color region; performing face shape verification by comparing the degree of ellipticity of the skin color region using a mask on the extracted skin color distribution image; and confirming the face region through eye, nose, and mouth detection on the result of the face shape verification.

The method of claim 5, wherein the face shape verification using the mask uses two pieces of information: that the skin color distribution of the face region is close to an ellipse, and that there is no skin color distribution outside the face.

The method of claim 5, wherein performing the face shape verification comprises quantifying the degree of ellipticity by combining two pieces of information, namely that the skin color distribution of the face region is close to an ellipse and that there is no skin color distribution outside the face, and selecting a region as a face candidate based on the quantified ellipticity value.

The method of claim 5, wherein comparing the degree of ellipticity of the skin color region using a mask on the extracted skin color distribution image comprises: separating out a portion of the skin color region while adjusting its position and size in steps larger than one pixel; resizing the separated portion to the mask size, which is smaller than the actual face size; and comparing the resized result with the mask pixel by pixel.

The method of claim 5, wherein, when only part of the caller's face appears on the screen, the face shape verification of the skin color region treats the part that does not appear as a don't-care condition and uses the quantified ellipticity value of the part that appears as the representative value for the entire region comprising both the part that appears and the part that does not.

The method of claim 8, wherein, for faster computation, the resizing is performed only once over the entire area for regions resized at the same size ratio.

(Deleted)

A face tracking method in a video communication system, comprising: detecting a face region in an initial frame of an input image; and, in subsequent frames, extracting the face region through elliptic verification of the skin color region of the current frame based on the position of the face region detected in the previous frame, while allowing the face size and position to change from those of the previous frame only within a specified range.

(Deleted)

The method of claim 13, wherein limiting the change to within a specified range adjusts the size or position only within a range specified relative to the face size or position of the previous frame.

An encoding control method of a video communication system, comprising: a face extraction step of extracting a face region from an input image by detecting a skin color region and performing elliptical mask matching; a face tracking step of, in subsequent frames, detecting the face region through elliptic verification of the skin color region of the current frame based on the position of the face region extracted from the previous frame, while allowing the face size and position to change from those of the previous frame only within a specified range; and separating the detected face region from the background and transmitting the face region and the background at different bit rates.

The method of claim 16, wherein the eyes, nose, and mouth are detected by matching a predetermined reference pattern for eye, nose, and mouth detection within the detected face region, and the face region is verified from their spatial positional relationship.

(Deleted)

The method of claim 16, wherein a video editing effect is provided by compositing the separated face region with a new background image that replaces the original background region and encoding the result.