KR20120021103A

KR20120021103A - A method of detecting a speaker's face area in a communication terminal, a method of measuring the distance between the speaker and the communication terminal using the same, and a communication terminal using the same

Info

Publication number: KR20120021103A
Application number: KR1020100085156A
Authority: KR
Inventors: 이재원; 홍성훈
Original assignee: 전남대학교산학협력단
Priority date: 2010-08-31
Filing date: 2010-08-31
Publication date: 2012-03-08

Abstract

PURPOSE: A method for detecting a speaker's face area in a communication terminal, a method for measuring the distance between the speaker and the communication terminal using the same, and a communication terminal using the same are provided, so that the speaker can be given an optimal distance that guarantees high call quality. CONSTITUTION: A transceiver unit of a communication terminal receives the k-th image frame (S210). A conversion unit converts the k-th image frame into color signal components corresponding to a plurality of color coordinate systems (S230). A probability mask assigning unit classifies the color signal components (S240). A detection unit calculates horizontal and vertical histogram values (S270). The detection unit detects a face area using the histogram values (S280).

Description

A method of detecting a speaker's face area in a communication terminal, a method of measuring the distance between the speaker and the communication terminal using the same, and a communication terminal using the same

The present invention relates to a method for detecting a speaker's face area in a communication terminal, a method for measuring the distance between the communication terminal and the speaker using the same, and a communication terminal to which these methods are applied. More specifically, it relates to detecting the face area of a speaker during a call in real time and measuring the distance between the communication terminal and the speaker based on the size of the detected face area.

Face area detection is the technical field concerned with determining whether a face is present in an image and, if one is present, finding its position and size.

Conventional face area detection techniques include knowledge-based methods, feature-based methods, template matching methods, and appearance-based methods.

The knowledge-based method develops a face area detection algorithm from rules based on the knowledge of the researcher or developer; face candidates in a target image are detected according to rules written in advance. That is, the face area is detected using the knowledge that a face contains characteristic components such as the eyes, nose, and mouth, and that these components lie at certain distances from and in certain positional relationships to one another. The problem with this approach is that the algorithm that finds the image features can be seriously affected by noise, by brightness, or by occlusion of the face by other objects.

The feature-based method recognizes a face using geometric information about the face, or using facial feature components such as the eyes, nose, mouth, and chin together with their size, shape, mutual relationships, or a combination of these. Variants include detection using features such as the eyes, nose, and mouth, detection using the characteristic texture of the face, and detection using the color of the face. This approach has the same problem: the algorithm that finds the image features can be seriously affected by noise, by brightness, or by occlusion of the face by other objects.

The template matching method creates a standard face template computed by a specific function and then attempts detection by measuring the degree of correlation between the template and an input face image. This approach has the advantage of being relatively easy to implement, but it cannot efficiently handle variations in the size, shape, and pose of the face image.

Finally, the appearance-based method detects faces through statistical analysis, using facial image features learned from many face images and non-face images. To detect a face in an input image, the whole image is scanned and a probability function is used to distinguish face regions from non-face regions. Well-known face detection methods of this kind use eigenfaces generated by Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), neural networks (NN), and Support Vector Machines (SVM).

With these methods, face area detection can be inaccurate under changes in illumination, size, and position. In particular, their computational cost is high, which makes real-time processing difficult on devices with limited computing capacity such as mobile phones.

To perform face area detection with low computation for real-time operation, a threshold is applied to color information. Depending on the color coordinate system, color information may be expressed in RGB, YCbCr, YIQ, HSV, and so on.

However, face color differs from person to person, and the color of the skin region captured by a camera changes with the lighting environment. Existing methods that apply only fixed thresholds to a single color coordinate system are therefore sensitive to environmental changes such as illumination, and the reliability of their face area detection results is low.

There is therefore a need for a method of detecting a speaker's face area in a communication terminal, a method of measuring the distance between the communication terminal and the speaker using the same, and a communication terminal applying these methods, which receive in real time the image frames of the speaker's face captured during a video call on a mobile phone, detect the speaker's face area with low computation while being little affected by environmental changes such as illumination, and measure the distance between the mobile phone and the speaker from the detected face area.

The present invention has been devised to solve the problems of the prior art described above. Its object is to provide a method for detecting a speaker's face area in a communication terminal, and a communication terminal applying the method, in which the image frame of a speaker captured during a video call is converted in real time into color signal components corresponding to a plurality of color coordinate systems; each color signal component is subjected to low-computation multi-level thresholding to divide it into a skin color region, an intermediate region, and a non-skin color region; a probability mask is assigned to each region and the masks are summed to generate an integrated probability mask; and the integrated probability mask is filtered along the time axis and projected in the horizontal and vertical directions to detect the speaker's face area in real time.

A further object of the present invention is to provide a method for detecting a speaker's face area in a communication terminal, a method for measuring the distance between the communication terminal and the speaker using the same, and a communication terminal applying these methods, which measure the distance between the communication terminal and the speaker using the detected face area and can thereby offer the speaker an optimal distance at which call quality can be guaranteed.

To achieve these objects, the method for detecting a speaker's face area in a communication terminal according to the present invention may include: a first step of converting a captured image frame of the speaker into color signal components corresponding to a plurality of color coordinate systems; a second step of applying multi-level thresholding to each converted color signal component to divide it into a skin color region, an intermediate region, and a non-skin color region; a third step of assigning to each region a probability mask expressing the probability of skin color; a fourth step of summing the probability masks assigned to the regions to calculate an integrated probability mask; a fifth step of filtering the integrated probability mask along the time axis; and a sixth step of projecting the filtered integrated probability mask in the horizontal and vertical directions to calculate horizontal and vertical histogram values, and detecting, as the speaker's face area, the region in which the calculated histogram values are smaller than a preset value.

In the first step, the speaker's image frame, input in the RGB color coordinate system, is converted into color signal components of the YCbCr color coordinate system through the following equation:

[Equation: RGB to YCbCr conversion; not reproduced in this text]

The communication terminal may also convert the speaker's image frame, input in RGB, into color signal components corresponding to the YIQ color coordinate system through the following equation:

[Equation: RGB to YIQ conversion; not reproduced in this text]

In the second step, two-level thresholding may be applied: for each converted color signal component c, first and second lower thresholds ($TL_c^{1}$, $TL_c^{2}$) and first and second upper thresholds ($TH_c^{1}$, $TH_c^{2}$) are determined. The region between the first lower threshold and the first upper threshold is defined as the skin color region; the regions between the second lower threshold and the first lower threshold, and between the first upper threshold and the second upper threshold, are defined as the intermediate region; and the regions at or below the second lower threshold and at or above the second upper threshold are defined as the non-skin color region.

The first step may further include a preprocessing step of applying white balance correction and backlight correction to the received image frame of the speaker.

In addition, the method for measuring the distance between a communication terminal and a speaker according to the present invention may include: a first step of converting a captured image frame of the speaker into color signal components corresponding to a plurality of color coordinate systems; a second step of applying multi-level thresholding to each converted color signal component to divide it into a skin color region, an intermediate region, and a non-skin color region; a third step of assigning to each region a probability mask expressing the probability of skin color; a fourth step of summing the probability masks assigned to the regions to calculate an integrated probability mask; a fifth step of filtering the integrated probability mask along the time axis; a sixth step of projecting the filtered integrated probability mask in the horizontal and vertical directions to calculate horizontal and vertical histogram values, and detecting, as the speaker's face area, the region in which the calculated histogram values are smaller than a preset value; and a seventh step of measuring the distance between the communication terminal and the speaker using the area, vertical length, or horizontal length of the detected face area.

In the seventh step, the distance (d) between the communication terminal and the speaker is measured through the following equation:

[Equation: distance d as a function of x with coefficients a and b; not reproduced in this text]

Here, x denotes the area, vertical length, or horizontal length of the detected face area, and a and b are coefficients determined by the type of camera provided in the communication terminal.

In addition, the communication terminal according to the present invention may include: a processing unit that converts the received image frame of the speaker into color signal components corresponding to a plurality of color coordinate systems, applies multi-level thresholding to each color signal component to divide it into a skin color region, an intermediate region, and a non-skin color region, and assigns to each region a probability mask expressing the probability of skin color; a summing unit that sums the probability masks assigned to the regions to calculate an integrated probability mask; a filter unit that filters the integrated probability mask along the time axis; a detection unit that projects the filtered integrated probability mask in the horizontal and vertical directions to calculate horizontal and vertical histogram values, and detects, as the speaker's face area, the region in which the calculated histogram values are smaller than a preset value; and a control unit that transmits the speaker's image frame to the processing unit, transmits the probability masks assigned by the processing unit to the summing unit, transmits the integrated probability mask calculated by the summing unit to the filter unit for filtering, and then transmits the result to the detection unit to detect the speaker's face area.

The communication terminal according to the present invention may further include a measuring unit that measures the distance between the communication terminal and the speaker using the area, vertical length, or horizontal length of the face area detected by the detection unit.

The processing unit determines first and second lower thresholds ($TL_c^{1}$, $TL_c^{2}$) and first and second upper thresholds ($TH_c^{1}$, $TH_c^{2}$) corresponding to each color signal component c, and the thresholds may satisfy the following relation:

[Equation: relation among the thresholds; not reproduced in this text]

The processing unit may define the region between the first lower threshold and the first upper threshold as the skin color region; the regions between the second lower threshold and the first lower threshold, and between the first upper threshold and the second upper threshold, as the intermediate region; and the regions at or below the second lower threshold and at or above the second upper threshold as the non-skin color region.

The processing unit may assign a probability mask of 2 points to the skin color region, 1 point to the intermediate region, and 0 points to the non-skin color region.

The processing unit may convert an image frame input as RGB color signal components into color signal components corresponding to the YCbCr color coordinate system or the YIQ color coordinate system.

The filter unit applies an IIR (Infinite Impulse Response) filter along the time axis to the probability mask of the current frame and the probability mask of the previous frame through the following equation:

[Equation: temporal IIR filter on M(k) with weight α; not reproduced in this text]

Here, M(k) is the probability mask for the k-th image frame, and α is a weight with a value between 0 and 1.
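Since the filter equation itself is not reproduced in this text, the following is only a minimal sketch of one common first-order IIR form, assuming the filtered mask is α times the current mask plus (1 − α) times the previous filtered mask. The function name `iir_update`, the choice of which term carries α, and the sample values are assumptions for illustration, not the formula of the source.

```python
def iir_update(current_mask, previous_filtered, alpha):
    """Blend the current frame's mask with the previous filtered mask.

    Assumed form: F(k) = alpha * M(k) + (1 - alpha) * F(k - 1),
    applied pixel by pixel to two 2-D lists of equal shape.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return [
        [alpha * cur + (1.0 - alpha) * prev for cur, prev in zip(row_cur, row_prev)]
        for row_cur, row_prev in zip(current_mask, previous_filtered)
    ]


# Hypothetical 1x2 masks: a pixel that flickers on in the current frame
# is damped, and a pixel that flickers off is partly retained.
filtered = iir_update([[4, 0]], [[0, 2]], alpha=0.5)
print(filtered)  # [[2.0, 1.0]]
```

Under this assumed form, a smaller α gives more weight to history and suppresses frame-to-frame flicker more strongly, at the cost of slower response when the face moves.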

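The measuring unit measures the distance (d) between the communication terminal and the speaker through the following equation:

[Equation: distance d as a function of x with coefficients a and b; not reproduced in this text]

Here, x denotes the area, vertical length, or horizontal length of the face area detected by the detection unit, and a and b are coefficients determined by the type of camera provided in the communication terminal.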
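The distance equation is not reproduced in this text, so its exact form is unknown. Purely as an illustration of how two camera-dependent coefficients could be obtained, the sketch below assumes a pinhole-style inverse model d = a / x + b, where x is the face width in pixels, and calibrates a and b from two measurements. The model, the function names, and the calibration numbers are all hypothetical and are not taken from the source.

```python
def calibrate(sample_1, sample_2):
    """Solve d = a / x + b for (a, b) from two (x, d) calibration samples."""
    (x1, d1), (x2, d2) = sample_1, sample_2
    if x1 == x2:
        raise ValueError("calibration samples need different face sizes")
    a = (d1 - d2) / (1.0 / x1 - 1.0 / x2)
    b = d1 - a / x1
    return a, b


def estimate_distance(x, a, b):
    """Estimate the distance for a detected face size x (assumed inverse model)."""
    if x <= 0:
        raise ValueError("face size must be positive")
    return a / x + b


# Hypothetical calibration: a face 200 px wide at 30 cm and 100 px wide at 60 cm.
a, b = calibrate((200, 30.0), (100, 60.0))
print(round(estimate_distance(150, a, b), 3))  # 40.0
```

A different model (for example one based on face area rather than width) would need its own calibration; only the idea of fitting a and b per camera type comes from the text.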

The communication terminal according to the present invention may further include a preprocessing unit that applies white balance correction and backlight correction to the received image frame of the speaker.

According to the present invention, the image frame of a speaker captured during a video call is converted in real time into color signal components corresponding to a plurality of color coordinate systems; each color signal component is subjected to low-computation multi-level thresholding to divide it into a skin color region, an intermediate region, and a non-skin color region; a probability mask is assigned to each region and the masks are summed to generate an integrated probability mask; and the integrated probability mask is filtered along the time axis and projected in the horizontal and vertical directions. As a result, the speaker's face area can be detected in real time.

In addition, the distance between the communication terminal and the speaker is measured using the detected face area, so the speaker can be offered an optimal distance at which call quality can be guaranteed.

Furthermore, the face area can be detected in real time and with high reliability in an environment with limited computing capacity, such as a communication terminal that provides video calls, including a smartphone.

FIG. 1 is a block diagram showing a communication terminal according to an embodiment of the present invention.
FIGS. 2 and 3 are diagrams showing the relationship between thresholds and regions for embodiments of the present invention.
FIG. 4 is a diagram showing a method for detecting a speaker's face area and measuring the distance between the communication terminal and the speaker in a communication terminal according to an embodiment of the present invention.
FIG. 5 is a diagram showing a speaker's image frame converted into color signal components corresponding to a plurality of color coordinate systems according to an embodiment of the present invention.
FIGS. 6A to 6D are graphs showing examples of multi-level thresholding of a plurality of color signal components according to an embodiment of the present invention.
FIG. 7 is a diagram showing a probability mask obtained by applying multi-level thresholding according to an embodiment of the present invention.
FIG. 8 is a diagram showing an integrated probability mask according to an embodiment of the present invention.
FIG. 9 is a diagram showing an IIR-filtered integrated probability mask according to an embodiment of the present invention.
FIG. 10 is a diagram showing histograms obtained by projecting the integrated probability mask in the horizontal and vertical directions according to an embodiment of the present invention.
FIG. 11 is a diagram showing a face area detected according to an embodiment of the present invention.
FIG. 12 is a graph showing the relationship between the number of horizontal pixels in the face area and the distance between the communication terminal and the speaker according to an embodiment of the present invention.
FIG. 13 is a diagram showing face detection and distance measurement results according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing a communication terminal according to an embodiment of the present invention.

Referring to FIG. 1, the communication terminal according to an embodiment of the present invention is a terminal capable of video calls and may be a mobile phone, including a smartphone.

The communication terminal 100 according to an embodiment of the present invention includes a processing unit 110, a summing unit 120, a filter unit 130, a detection unit 140, a measuring unit 150, and a control unit 170, and may further include a camera 160 for photographing the speaker's face.

The processing unit 110 includes a transceiver unit 111, a preprocessing unit 113, a conversion unit 115, and a probability mask assigning unit 117.

The transceiver unit 111 receives the image frame of the speaker captured by the camera 160.

The preprocessing unit 113 preprocesses the speaker's image frame to prevent face area detection performance from degrading as the illumination changes. Preprocessing includes white balance correction and backlight correction. White balance correction adjusts the color values so that white objects are reproduced as accurately white. Backlight correction automatically brightens, as far as possible, an image captured with the light source behind the subject.

The conversion unit 115 converts the preprocessed image frame into color signal components corresponding to a plurality of color coordinate systems. For example, when the image obtained from the camera 160 is in the RGB color coordinate system, the conversion unit 115 converts it into color signal components corresponding to the YCbCr color coordinate system and to the YIQ color coordinate system, respectively.

Equations 1 and 2 convert the RGB color coordinate system into the YCbCr and YIQ color coordinate systems. That is, substituting the RGB image obtained from the camera 160 into the first expression of Equation 1 yields the Cb color signal component of the YCbCr color coordinate system, and substituting it into the second expression yields the Cr color signal component.

Likewise, substituting the RGB image into the first expression of Equation 2 yields the I color signal component of the YIQ color coordinate system, and substituting it into the second expression yields the Q color signal component.
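Equations 1 and 2 are not reproduced in this text. As a rough illustration of the kind of conversion they describe, the sketch below uses the widely quoted full-range BT.601 coefficients for Cb and Cr and the commonly cited NTSC coefficients for I and Q. These coefficients, the function names, and the sample pixel are assumptions; the values used in the source may differ.

```python
def rgb_to_cbcr(r, g, b):
    """Return (Cb, Cr) for 8-bit R, G, B (assumed BT.601 full-range coefficients)."""
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return cb, cr


def rgb_to_iq(r, g, b):
    """Return (I, Q) for R, G, B (assumed NTSC YIQ coefficients)."""
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return i, q


# Hypothetical skin-like pixel: only the chrominance components are computed,
# since the text describes thresholding Cb, Cr, I, and Q.
cb, cr = rgb_to_cbcr(200, 150, 120)
i, q = rgb_to_iq(200, 150, 120)
print(cb < 128.0, cr > 128.0, i > 0.0)  # True True True
```

Each component is a weighted sum of R, G, and B, so the per-pixel cost is a handful of multiplications and additions, which is consistent with the low-computation goal stated in the text.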

The image frame may also be converted into the HSV color coordinate system by Equation 3, but this conversion is computationally expensive and may be too heavy for the communication terminal 100, so it is excluded from this embodiment. Nevertheless, various other color coordinate systems exist, such as HSV and CMYK, and the target color coordinate systems are not limited to YCbCr and YIQ.

The probability mask assigning unit 117 applies multi-level thresholding to each color signal component converted by the conversion unit 115, divides it into a skin color region, an intermediate region, and a non-skin color region, and assigns a probability mask to each region. Here, the intermediate region is an ambiguous region that can be called neither clearly skin-colored nor clearly not skin-colored.

Specifically, the probability mask assigning unit 117 may apply two-level thresholding, determining for each converted color signal component c first and second lower thresholds ($TL_c^{1}$, $TL_c^{2}$) and first and second upper thresholds ($TH_c^{1}$, $TH_c^{2}$). Here, TL stands for Threshold Low and TH for Threshold High.

Equation 4 expresses the relationship among the thresholds corresponding to each color signal component. For i = 1, the first expression gives the range of the first-level thresholds, and the second expression gives the relationship between the first-level and second-level thresholds.

FIG. 2 shows the relationship between thresholds and regions for an embodiment of the present invention. Referring to FIG. 2 and Equation 4, the region between the first lower threshold and the first upper threshold may be defined as the skin color region; the regions between the second lower threshold and the first lower threshold, and between the first upper threshold and the second upper threshold, as the intermediate region; and the regions at or below the second lower threshold and at or above the second upper threshold as the non-skin color region.

그리고 확률 마스크는 피부색의 확률을 표현하는 값으로 예를 들어 피부색 영역에는 2점, 중간 영역에는 1점, 피부색이 아닌 영역에는 0점을 부여할 수 있다.The probability mask is a value expressing the probability of the skin color. For example, the skin mask may be given 2 points in the skin color area, 1 point in the middle area, and 0 points in the non-skin area.
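The two-level scheme described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the strict-inequality handling of values that fall exactly on a threshold, and the numeric thresholds in the usage lines are all assumptions made for the example.

```python
def score_value(value, lows, highs):
    """Score one color-component value against nested threshold pairs.

    lows  = [TL1, TL2, ...]  (TL1 > TL2 > ...), innermost pair first
    highs = [TH1, TH2, ...]  (TH1 < TH2 < ...), innermost pair first
    With two pairs the result is 2 (skin), 1 (intermediate), or 0 (non-skin).
    """
    levels = len(lows)
    for i, (low, high) in enumerate(zip(lows, highs)):
        # Assumption: a value exactly on a threshold belongs to the outer region.
        if low < value < high:
            return levels - i
    return 0


def probability_mask(plane, lows, highs):
    """Apply score_value to every pixel of a 2-D list of component values."""
    return [[score_value(v, lows, highs) for v in row] for row in plane]


# Hypothetical thresholds for one component (not taken from the patent).
LOWS, HIGHS = [100, 90], [120, 130]
mask = probability_mask([[110, 95], [125, 140]], LOWS, HIGHS)
```

Because the threshold pairs are passed as lists, the same function also accepts three pairs for a three-level variant.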

Meanwhile, FIG. 3 shows the relationship between thresholds and regions according to an embodiment of the present invention for the case of three-level thresholding. In this case, for each converted color signal component c, the thresholds may be determined as first to third lower thresholds (TL_c^1, TL_c^2, TL_c^3) and first to third upper thresholds (TH_c^1, TH_c^2, TH_c^3). The regions may then be divided and probability masks assigned as shown in FIG. 3.

The adder 120 calculates an integrated probability mask by summing the probability masks that the probability mask assigning unit 117 of the processing unit 110 assigned to each region, that is, to the skin color region, the intermediate region, and the non-skin-color region.

For example, suppose that multi-level thresholding of the color signal components of the YCbCr color coordinate system assigns 2, 1, and 0 points to the skin color, intermediate, and non-skin-color regions of the speaker's image frame, and that multi-level thresholding of the color signal components of the YIQ color coordinate system likewise assigns 2, 1, and 0 points. The adder 120 adds these together, so the integrated probability mask can take values from 0 to 4 points.
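The summation can be illustrated with a short sketch. The function name and the sample masks are hypothetical; the only behavior taken from the text is the pixel-wise addition of the per-coordinate-system masks.

```python
def integrate_masks(masks):
    """Add several equally sized 2-D probability masks pixel by pixel."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[sum(m[r][c] for m in masks) for c in range(cols)]
            for r in range(rows)]


# Hypothetical 2x2 masks, each holding scores in the range 0 to 2.
ycbcr_mask = [[2, 1], [0, 2]]
yiq_mask = [[2, 0], [1, 2]]
integrated = integrate_masks([ycbcr_mask, yiq_mask])
```

With two masks scored 0 to 2, every entry of `integrated` lies between 0 and 4, matching the range stated above.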

The filter unit 130 filters the integrated probability mask calculated by the adder 120 along the time axis through the operation of Equation 5. This keeps the selected face region from changing abruptly from frame to frame and preserves its correlation with the previous frame, which improves the reliability of face region detection.

Here, M(k) is the probability mask for the k-th image frame and α is a weight between 0 and 1. If α = 1, the output of the temporal IIR (Infinite Impulse Response) filter is identical to the input mask M(k).
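Equation 5 is not reproduced in this text. A common first-order recursive form that is consistent with the stated property (α = 1 returns M(k) unchanged) is M'(k) = α·M(k) + (1 - α)·M'(k - 1). The sketch below assumes that form; it may differ from the patent's actual equation, and the name `temporal_filter` is illustrative.

```python
def temporal_filter(current, previous_filtered, alpha):
    """First-order IIR filter along the time axis (assumed form).

    current            -- integrated mask M(k) of the current frame
    previous_filtered  -- filtered mask of frame k-1
    alpha              -- weight in [0, 1]; alpha = 1 disables smoothing
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [[alpha * c + (1.0 - alpha) * p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, previous_filtered)]


# Hypothetical masks for two consecutive frames.
smoothed = temporal_filter([[4, 0]], [[0, 2]], alpha=0.5)
unchanged = temporal_filter([[4, 0]], [[0, 2]], alpha=1.0)
```

A smaller α gives more weight to the history and therefore a smoother, slower-reacting mask.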

The detector 140 projects the integrated probability mask filtered by the filter unit 130 in the horizontal and vertical directions to compute horizontal and vertical histogram values. That is, because pixel positions with high scores in the integrated probability mask are likely to belong to the face region, the scores are projected in the horizontal and vertical directions and accumulated in order to select the final face region.

In particular, because regions containing noise, such as a background or clothing similar in color to skin, can also receive high probability mask scores, the skin color probabilities of the mask are projected in the horizontal and vertical directions to compute horizontal and vertical histogram values, so that the noise is removed and only the face region is extracted.

The detector 140 detects, as the speaker's face region, the region where the computed histogram values exceed a preset value. That is, the positions where the horizontal and vertical histogram values exceed a threshold are taken as the face region boundaries; experiments confirmed that one quarter of the maximum of the projection histogram is a suitable threshold.
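The projection step can be sketched as below. This is an illustrative reading rather than the patented implementation: the function names are hypothetical, and taking the first and last index above the threshold as the boundary is an assumption. The 1/4-of-maximum threshold comes from the text.

```python
def project(mask):
    """Return (row_sums, col_sums) of a 2-D mask."""
    row_sums = [sum(row) for row in mask]        # one value per y position
    col_sums = [sum(col) for col in zip(*mask)]  # one value per x position
    return row_sums, col_sums


def span_above(hist, ratio=0.25):
    """First and last index whose value exceeds ratio * max(hist)."""
    limit = ratio * max(hist)
    idx = [i for i, v in enumerate(hist) if v > limit]
    return (idx[0], idx[-1]) if idx else None


def detect_face(mask):
    """Bounding box (x0, y0, x1, y1) of the face region, or None."""
    row_sums, col_sums = project(mask)
    ys, xs = span_above(row_sums), span_above(col_sums)
    if ys is None or xs is None:
        return None
    return xs[0], ys[0], xs[1], ys[1]


# Hypothetical mask: a 2x2 block of high scores plus one noisy pixel.
sample = [
    [0, 0, 0, 0, 0],
    [0, 4, 4, 0, 1],
    [0, 4, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
box = detect_face(sample)
```

In this example the isolated score of 1 in the last column stays below one quarter of the column maximum, so it does not widen the detected box.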

In this way, the communication terminal 100 can detect the speaker's face region in real time with low computation during a video call. A measuring unit 150 may further be included to measure the distance between the communication terminal 100 and the speaker in order to guarantee call quality. The measuring unit 150 may measure this distance using the area, the vertical length, or the horizontal length of the face region detected by the detector 140.

Specifically, a face region parameter x, which may be the area, the vertical length, or the horizontal length of the detected face region (each in pixels), is inversely proportional to the distance d between the communication terminal 100 and the speaker.

This can be expressed by Equation 6.

Here, a and b are coefficients determined by the type of the camera 160 provided in the communication terminal 100.

Meanwhile, the controller 170 is a microprocessor that performs the overall control of the communication terminal 100.

The controller 170 transmits the image frame captured by the camera 160 to the processing unit 110, transmits the probability masks assigned by the processing unit 110 to the adder 120, and transmits the integrated probability mask calculated by the adder 120 to the filter unit 130 for filtering and then to the detector 140, so that the speaker's face region is detected.

The controller 170 also controls the measuring unit 150 to measure the distance between the communication terminal and the speaker using the face region detected by the detector 140.

A method of detecting a speaker's face region in a communication terminal, and a method of measuring the distance between the communication terminal and the speaker using the same, according to an embodiment of the present invention are described below with reference to FIGS. 1 to 13. FIG. 4 shows the method of detecting the speaker's face region and measuring the distance between the communication terminal and the speaker according to an embodiment of the present invention.

Referring to FIGS. 1 and 4, in step S210 the transceiver 111 of the communication terminal 100 receives the k-th image frame F(k) of the speaker captured by the camera 160. F(k) may be captured with color signal components corresponding to the RGB color coordinate system.

Next, in step S220, the preprocessor 113 preprocesses the received F(k). The preprocessing includes white balance correction and backlight correction using histogram normalization; these two steps are performed so that the face region can be detected even in very dark lighting.

Next, in step S230, the converting unit 115 converts the preprocessed F(k) into color signal components corresponding to a plurality of color coordinate systems. FIG. 5 shows the speaker's image frame converted into such color signal components according to an embodiment of the present invention: FIG. 5(a) shows the frame converted to the Cb component of the YCbCr color coordinate system, FIG. 5(b) the frame converted to the Cr component of the YCbCr color coordinate system, and FIGS. 5(c) and 5(d) the frames converted to the I and Q components of the YIQ color coordinate system, respectively.

Here, step S231 converts the preprocessed F(k) in the RGB color coordinate system into color signal components of the YCbCr color coordinate system, as shown in FIGS. 5(a) and 5(b), and step S232 converts it into color signal components of the YIQ color coordinate system, as shown in FIGS. 5(c) and 5(d). If more color coordinate systems are used, a step S233 (not shown) may convert the frame into color signal components of the HSV color coordinate system.

Beyond these, the conversion may target various other color coordinate systems, such as HSV and CMYK, and the choice is not limited to these.
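The conversion equations of the patent are not reproduced in this text. As a hedged illustration, the sketch below uses the widely used full-range BT.601 coefficients for Cb and Cr and the usual rounded NTSC coefficients for I and Q; the patent's own coefficients may differ, and the function names are hypothetical.

```python
def rgb_to_cbcr(r, g, b):
    """Chrominance of an 8-bit RGB pixel (full-range BT.601 coefficients)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def rgb_to_iq(r, g, b):
    """I and Q components of an RGB pixel (rounded NTSC coefficients)."""
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return i, q


# A neutral gray pixel carries no chrominance in either system.
gray_cbcr = rgb_to_cbcr(128, 128, 128)
gray_iq = rgb_to_iq(128, 128, 128)
```

Only the chrominance components are computed because the thresholding described in this document operates on Cb, Cr, I, and Q rather than on the luminance Y.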

Next, in step S240, the probability mask assigning unit 117 applies multi-level thresholding to each converted color signal component, dividing it into a skin color region, an intermediate region, and a non-skin-color region. This division is made so that a probability mask can be assigned to each region in a later step.

FIGS. 6A to 6D are graphs showing examples of multi-level thresholding of a plurality of color signal components according to an embodiment of the present invention, and FIG. 7 shows the probability masks obtained by applying the multi-level thresholding according to an embodiment of the present invention.

FIGS. 6A and 6B show examples of multi-level thresholding of the Cb and Cr color signal components, respectively, and FIGS. 6C and 6D show examples of multi-level thresholding of the I and Q color signal components, respectively.

The multi-level thresholding according to an embodiment of the present invention is two-level thresholding, in which a total of four thresholds are determined: first and second lower thresholds (TL_c^1, TL_c^2) and first and second upper thresholds (TH_c^1, TH_c^2).

Specifically, step S241a applies two-level thresholding to the Cb and Cr color signal components, as shown in FIGS. 6A and 6B. The thresholds are the first and second lower thresholds TL_Cb^1, TL_Cb^2, TL_Cr^1, TL_Cr^2 and the first and second upper thresholds TH_Cb^1, TH_Cb^2, TH_Cr^1, TH_Cr^2.

Step S242a applies two-level thresholding to the I and Q color signal components, as shown in FIGS. 6C and 6D. The thresholds are the first and second lower thresholds TL_I^1, TL_I^2, TL_Q^1, TL_Q^2 and the first and second upper thresholds TH_I^1, TH_I^2, TH_Q^1, TH_Q^2.

As shown in FIGS. 6A to 6D, for each color signal component the region between the first lower threshold and the first upper threshold may be defined as the skin color region (a); the regions between the second lower threshold and the first lower threshold and between the first upper threshold and the second upper threshold as the intermediate region (b); and the regions at or below the second lower threshold and at or above the second upper threshold as the non-skin-color region (c).

In step S241b, the probability mask assigning unit 117 assigns a probability mask to each of the divided regions, as shown in FIG. 7. FIG. 7(a) shows the probability mask assigned for the Cb and Cr color signal components, and FIG. 7(b) shows the probability mask assigned for the I and Q color signal components. For example, 2 points are assigned to the skin color region (a), 1 point to the intermediate region (b), and 0 points to the non-skin-color region (c).

In general, with two-level thresholding the probability mask takes values from 0 to 4, where a larger value means a higher probability of skin color. For example, the probability mask for the Cb and Cr color signal components shown in FIG. 7(a) can be expressed as the set in Equation 7.

Next, in step S250, the adder 120 adds the probability masks for all color coordinate systems to calculate the integrated probability mask. FIG. 8 shows an integrated probability mask according to an embodiment of the present invention, obtained by adding the individual probability masks for the Cb and Cr color signal components and for the I and Q color signal components shown in FIGS. 6A to 6D.

When the multi-level thresholding according to an embodiment of the present invention is two-level thresholding, each individual probability mask can take scores from 0 to 2, so the integrated probability mask for the two color coordinate systems CbCr and IQ can take values from 0 to 4.

Next, in step S260, the filter unit 130 filters the integrated probability mask calculated by the adder 120 along the time axis. FIG. 9 shows an IIR-filtered integrated probability mask according to an embodiment of the present invention.

The filter unit 130 performs temporal IIR (Infinite Impulse Response) filtering through the operation of Equation 5 above, so that the selected face region does not change abruptly from frame to frame and its correlation with the previous frame is preserved, which improves the reliability of face region detection.

Compared with the integrated probability mask of FIG. 8, the filtered integrated probability mask of FIG. 9 shows lower values at the corners and edges where high probability values had appeared, and is smoother overall.

That is, by performing IIR filtering, the filter unit 130 according to an embodiment of the present invention avoids evaluating a probability mask that is wrong because of errors. Because the filtering operates along the time axis, under the assumption that probability mask values cannot change greatly within a unit of time, detection becomes more robust even to sudden light interference.

Next, in step S270, the detector 140 projects the filtered integrated probability mask in the horizontal and vertical directions to compute horizontal and vertical histogram values. FIG. 10 shows histograms of the integrated probability mask projected in the horizontal and vertical directions according to an embodiment of the present invention.

Because pixel positions with high scores in the integrated probability mask filtered by the filter unit 130 are likely to belong to the face region, the scores are projected in the horizontal and vertical directions and accumulated in order to detect the final face region.

Next, in step S280, the detector 140 detects the face region using the horizontal and vertical projection histograms shown in FIG. 10. FIG. 11 shows a face region detected according to an embodiment of the present invention, in which the positions where the histogram values along the horizontal and vertical coordinates exceed a threshold are taken as the boundaries of the face region.

Next, in step S290, the measuring unit 150 measures the distance between the speaker and the communication terminal based on the face region detected by the detector 140. FIG. 12 is a graph of the relationship between the horizontal pixel count of the face region and the distance between the communication terminal and the speaker according to an embodiment of the present invention. FIG. 12 fits the data obtained from three experiments to Equation 6, giving a = 4630 and b = 2.549 at a 95% confidence interval.
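Equation 6 is not reproduced in this text, so its exact functional form is unknown here. Purely to illustrate an inverse relationship with two camera-dependent coefficients, the sketch below assumes d = a / x + b. That form, the function name, and the lack of stated units are assumptions; only the values a = 4630 and b = 2.549 come from the fit reported above.

```python
def estimate_distance(x, a=4630.0, b=2.549):
    """Distance estimate from a face-region parameter x in pixels.

    ASSUMED model: d = a / x + b. The real Equation 6 may differ.
    a, b -- camera-dependent coefficients (defaults: values fitted in FIG. 12)
    """
    if x <= 0:
        raise ValueError("x must be a positive pixel count")
    return a / x + b


# A wider detected face (more pixels) yields a smaller distance estimate.
near = estimate_distance(200)
far = estimate_distance(100)
```

Keeping a and b as parameters reflects the statement that the coefficients depend on the camera, so a different terminal would need its own fit.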

Meanwhile, FIG. 13 shows face detection and distance measurement results according to an embodiment of the present invention, obtained from experiments with various lighting conditions, backgrounds, and skin colors. Referring to FIG. 13, face region selection and distance measurement can be performed even against a background similar in color to skin.

As a result, the method of detecting a speaker's face region in a communication terminal, the method of measuring the distance between the communication terminal and the speaker using the same, and the communication terminal applying the same according to the present invention convert the speaker's image frame captured during a video call into color signal components corresponding to a plurality of color coordinate systems in real time; apply low-computation multi-level thresholding to each color signal component to divide it into a skin color region, an intermediate region, and a non-skin-color region; sum the probability masks assigned to the regions to generate an integrated probability mask; and filter the integrated probability mask along the time axis and project it in the horizontal and vertical directions, so that the speaker's face region can be detected in real time.

Although the configuration and operation of the present invention have been described above with reference to the description and drawings, these are merely examples, and various changes and modifications are of course possible without departing from the technical spirit and scope of the present invention.

100: communication terminal 110: processing unit
111: transceiver 113: preprocessor
115: converting unit 117: probability mask assigning unit
120: adder 130: filter unit
140: detector 150: measuring unit
160: camera 170: controller

Claims

A method of detecting a speaker's face region in a communication terminal, the method comprising:
A first step of converting a captured image frame of the speaker into color signal components corresponding to a plurality of color coordinate systems;
A second step of dividing each of the converted color signal components into a skin color region, an intermediate region, and a non-skin-color region by multi-level thresholding;
A third step of assigning a probability mask representing the probability of skin color to each of the divided regions;
A fourth step of calculating an integrated probability mask by summing the probability masks assigned to the regions;
A fifth step of filtering the integrated probability mask along the time axis; and
A sixth step of projecting the filtered integrated probability mask in the horizontal and vertical directions to calculate horizontal and vertical histogram values, and detecting a region where the calculated histogram values are larger than a preset value as the face region of the speaker.

The method of claim 1, wherein the first step
converts the image frame of the speaker, input in the RGB color coordinate system, into color signal components of the YCbCr color coordinate system through the following equation,

,
and converts the image frame of the speaker, input in the RGB color coordinate system, into color signal components of the YIQ color coordinate system through the following equation.

The method of claim 2, wherein the second step
performs the multi-level thresholding in two levels and determines, for each of the converted color signal components c, first and second lower thresholds (TL_c^1, TL_c^2) and first and second upper thresholds (TH_c^1, TH_c^2), and defines
the region between the first lower threshold and the first upper threshold as the skin color region,
the regions between the second lower threshold and the first lower threshold and between the first upper threshold and the second upper threshold as the intermediate region,
and the regions at or below the second lower threshold and at or above the second upper threshold as the non-skin-color region.

The method of claim 3, wherein the first step
further comprises a preprocessing step of performing white balance correction and backlight correction on the received image frame of the speaker.

A method of measuring the distance between a communication terminal and a speaker, the method comprising:
A first step of converting a captured image frame of the speaker into color signal components corresponding to a plurality of color coordinate systems;
A second step of dividing each of the converted color signal components into a skin color region, an intermediate region, and a non-skin-color region by multi-level thresholding;
A third step of assigning a probability mask representing the probability of skin color to each of the divided regions;
A fourth step of calculating an integrated probability mask by summing the probability masks assigned to the regions;
A fifth step of filtering the integrated probability mask along the time axis;
A sixth step of projecting the filtered integrated probability mask in the horizontal and vertical directions to calculate horizontal and vertical histogram values, and detecting a region where the calculated histogram values are larger than a preset value as the face region of the speaker; and
A seventh step of measuring the distance between the communication terminal and the speaker using the area, the vertical length, or the horizontal length of the detected face region.

The method of claim 5, wherein the first step
converts the image frame of the speaker, input in the RGB color coordinate system, into color signal components of the YCbCr color coordinate system through the following equation,

,
and converts the image frame of the speaker, input in the RGB color coordinate system, into color signal components of the YIQ color coordinate system through the following equation.

The method of claim 6, wherein the second step
performs the multi-level thresholding in two levels and determines, for each of the converted color signal components c, first and second lower thresholds (TL_c^1, TL_c^2) and first and second upper thresholds (TH_c^1, TH_c^2), and defines
the region between the first lower threshold and the first upper threshold as the skin color region,
the regions between the second lower threshold and the first lower threshold and between the first upper threshold and the second upper threshold as the intermediate region,
and the regions at or below the second lower threshold and at or above the second upper threshold as the non-skin-color region.

The method of claim 7, wherein the first step
further comprises a preprocessing step of performing white balance correction and backlight correction on the received image frame of the speaker.

The method of any one of claims 5 to 8, wherein the seventh step
measures the distance d between the communication terminal and the speaker through the following equation,

,
where x is the area, the vertical length, or the horizontal length of the detected face region, and a and b are coefficients determined by the type of camera provided in the communication terminal.

A communication terminal comprising:
A processing unit that converts a received image frame of a speaker into color signal components corresponding to a plurality of color coordinate systems, applies multi-level thresholding to each color signal component to divide it into a skin color region, an intermediate region, and a non-skin-color region, and assigns to each region a probability mask expressing the probability of skin color;
An adder that calculates an integrated probability mask by summing the probability masks assigned to the regions;
A filter unit that filters the integrated probability mask along the time axis;
A detector that projects the filtered integrated probability mask in the horizontal and vertical directions to calculate horizontal and vertical histogram values and detects a region where the calculated histogram values are larger than a preset value as the face region of the speaker; and
A controller that transmits the image frame of the speaker to the processing unit, transmits the probability masks assigned by the processing unit to the adder, and transmits the integrated probability mask calculated by the adder to the filter unit for filtering and then to the detector, so that the face region of the speaker is detected.

The communication terminal of claim 10,
further comprising a measuring unit that measures the distance between the communication terminal and the speaker using the area, the vertical length, or the horizontal length of the face region detected by the detector.

The communication terminal of claim 10 or 11, wherein the processing unit
determines first and second lower thresholds (TL_c^1, TL_c^2) and first and second upper thresholds (TH_c^1, TH_c^2) corresponding to each color signal component c according to the following equation.

The communication terminal of claim 12, wherein the processing unit defines
the region between the first lower threshold and the first upper threshold as the skin color region,
the regions between the second lower threshold and the first lower threshold and between the first upper threshold and the second upper threshold as the intermediate region,
and the regions at or below the second lower threshold and at or above the second upper threshold as the non-skin-color region.

The communication terminal of claim 13, wherein the processing unit
assigns a probability mask of 2 points to the skin color region, 1 point to the intermediate region, and 0 points to the non-skin-color region.

The communication terminal of claim 14, wherein the processing unit
converts an image frame input as RGB color signal components into color signal components corresponding to the YCbCr color coordinate system or the YIQ color coordinate system.

The communication terminal of claim 15, wherein the filter unit applies an Infinite Impulse Response (IIR) filter along the time axis to the probability mask of the current frame and the probability mask of the previous frame through the following equation,

,
where M(k) is the probability mask for the k-th image frame and α is a weight between 0 and 1.

The communication terminal of claim 11, wherein the measuring unit measures the distance d between the communication terminal and the speaker through the following equation,

,
where x is the area, the vertical length, or the horizontal length of the face region detected by the detector, and a and b are coefficients determined by the type of camera provided in the communication terminal.

The communication terminal of claim 17,
further comprising a preprocessor that performs white balance correction and backlight correction on the received image frame of the speaker.