KR20050114817A

KR20050114817A - Automatic detection method of human facial objects for the digital video surveillance

Info

Publication number: KR20050114817A
Application number: KR1020040039868A
Authority: KR
Inventors: 김상훈
Original assignee: 김상훈
Priority date: 2004-06-02
Filing date: 2004-06-02
Publication date: 2005-12-07
Also published as: KR100572768B1

Abstract

The present invention relates to a human face object detection system applicable to digital video security cameras for indoor surveillance. It takes the human face as the tracking target and provides an interactive vision system capable of automatically extracting and tracking the face region. Conventional object tracking systems often rely on active sensors, whose function is simple and unintelligent and whose cost is high. To extract the face region automatically, the invention uses an object-centered detection method that fuses information of several kinds (distance, color, and motion) so as to cope with image variation caused by changes in lighting and surroundings. To separate the face object from a complex background, it uses a disparity map and a disparity histogram representing distance information, and it uses MPC (Matching Pixel Count) to improve matching accuracy. A color transform is used to find the face region within the areas segmented by distance information: color is modeled by two factors, the intensity of the light and the color of the light source, and both components are normalized to define a generalized skin color distribution in the feature space. Motion detection is defined in the color-transformed domain, where the threshold adapts to the skin color probability, and the associated operation converts the input color image into a domain that expresses the probability of color and motion information. Finally, the component distribution of the face region detected by fusing skin-color motion information with distance information is defined in the feature space and used as model data for tracking. A newly input face image (the target) goes through the same process to build its distribution in the feature space, and mean shift based tracking is performed by measuring its similarity to the stored model data.

Description

Automatic detection method of human facial objects for the digital video surveillance

The present invention relates to a human object tracking system applicable to indoor surveillance cameras or videoconferencing cameras. It takes the human face as the tracking target and provides an intelligent, interactive vision system capable of automatically extracting and automatically tracking the face region. Conventional object tracking systems often rely on active sensors, whose function is simple and unintelligent and whose cost is high.

The present invention has been made in view of the above background, and its object is to provide a system that can detect and track the face region of a moving person, for use in building surveillance cameras or video cameras.

To achieve this object, the present invention consists of five main parts: an object-background separation unit for detecting the face object; a face color information detection unit that detects the final face region using face color and motion; a motion color information extraction unit for extracting regions that have both face color and weighted motion; a final face object discrimination unit that defines the region detected from the multiple cues as the final face object region; and a region tracking unit that tracks the moving object using the detected data as the target.

To detect the face region accurately in real environments with various noise sources, the present invention adopts a multimodal approach that fuses information of different kinds, such as distance, skin color, and object motion. After the regions segmented by distance and by face color and the regions in which motion is detected have been extracted, the pixel values of these result images are all combined and converted into a grayscale image with values between 0 and 255. This image represents the probability of the face region; areas with high probability values are then grouped and segmented again to define the final face region.
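The fusion step can be illustrated with a minimal sketch. The text does not state how the pixel values are combined, so the pixel-wise product used here, the function names `fuse_cues` and `high_probability_mask`, and the cut-off of 128 are all assumptions made for illustration only.

```python
def fuse_cues(distance_mask, skin_prob, motion_prob):
    """Combine three same-sized 2D maps with values in [0, 1] into one 0..255 map.

    Assumption: the cues are fused by a pixel-wise product; the source only
    says that the pixel values are "combined".
    """
    fused = []
    for d_row, s_row, m_row in zip(distance_mask, skin_prob, motion_prob):
        fused.append([round(255 * d * s * m)
                      for d, s, m in zip(d_row, s_row, m_row)])
    return fused


def high_probability_mask(fused, threshold=128):
    """Mark pixels whose fused value reaches a (hypothetical) cut-off."""
    return [[1 if v >= threshold else 0 for v in row] for row in fused]


if __name__ == "__main__":
    distance_mask = [[1, 1], [0, 1]]          # 1 = near object, 0 = background
    skin_prob = [[0.8, 0.2], [0.8, 0.8]]      # skin color probability
    motion_prob = [[1.0, 1.0], [1.0, 0.5]]    # weighted motion
    fused = fuse_cues(distance_mask, skin_prob, motion_prob)
    print(fused)
    print(high_probability_mask(fused))
```

The resulting mask would still need to be grouped into connected regions and segmented again, as the paragraph above describes; that step is omitted here.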

The object-background separation unit (10) of FIG. 1 operates by extracting distance information from stereo images.

FIG. 2 shows the method of segmenting the image by distance information in more detail; this step is important for segmenting a person object that is separated from a complex background region. For the distance information, multiple images are acquired from the stereo camera and, before the disparity map is computed, edge enhancement is applied to minimize noise and raise the matching probability (101). The MPC (Matching Pixel Count) similarity measure is used to obtain the disparity map by stereo matching (102).
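As a rough illustration of step 102, the sketch below scores each candidate disparity by counting how many pixels in a small window agree between the left and right images, and keeps the disparity with the highest count. The source does not give the MPC formula, so the gray-level tolerance `tol`, the window size, and the convention that a left pixel at column x corresponds to a right pixel at column x - d are assumptions.

```python
def mpc_disparity(left, right, max_disp, window=1, tol=10):
    """Disparity map from a Matching Pixel Count score (illustrative sketch).

    left, right: rectified gray images as 2D lists of equal size.
    A pixel pair "matches" when its gray-level difference is within tol.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_d, best_count = 0, -1
            for d in range(max_disp + 1):
                count = 0
                for dy in range(-window, window + 1):
                    for dx in range(-window, window + 1):
                        yy, xl = y + dy, x + dx
                        xr = xl - d  # assumed left-to-right correspondence
                        if 0 <= yy < h and 0 <= xl < w and 0 <= xr < w:
                            if abs(left[yy][xl] - right[yy][xr]) <= tol:
                                count += 1
                if count > best_count:  # ties keep the smaller disparity
                    best_d, best_count = d, count
            disp[y][x] = best_d
    return disp


if __name__ == "__main__":
    right_row = [10, 50, 90, 130, 170, 210, 250, 30]
    left_row = [0, 0] + right_row[:-2]  # same scene shifted by 2 pixels
    disp = mpc_disparity([left_row] * 3, [right_row] * 3, max_disp=3)
    print(disp[1])
```

Counting matches instead of summing differences makes the score less sensitive to a few outlier pixels inside the window, which is consistent with the stated aim of improving matching accuracy.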

FIG. 3 shows the disparity map obtained with MPC for an input image containing several people. The disparity histogram (DH) used here is defined as the number of occurrences of each disparity value (FIG. 4), and it provides the basic information from which the distance of each object from the camera and the number of objects can be obtained. The distance-based segmentation using the DH (103) proceeds as follows. First, the DH envelope is smoothed with an averaging filter. Bins whose count is at or below a set threshold are treated as having zero count. Consecutive disparity values whose counts exceed the threshold are taken to represent a single object at the same distance from the camera. Pixels with the same disparity values are extracted from the image, and each connected region is given its own label. The region with the smallest disparity is regarded as the background.
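The histogram part of step 103 can be sketched as follows: build the disparity (displacement) histogram, smooth it with a moving average, suppress low bins, and group consecutive surviving bins into one disparity range per object. The function names, the filter radius, and the threshold value are illustrative assumptions; connected-component labeling of the extracted pixels is left out.

```python
from collections import Counter


def disparity_histogram(disp, max_disp):
    """Count how often each disparity value 0..max_disp occurs in the map."""
    counts = Counter(v for row in disp for v in row)
    return [counts.get(d, 0) for d in range(max_disp + 1)]


def smooth(hist, radius=1):
    """Moving-average filter over the histogram (window shrinks at the ends)."""
    out = []
    for i in range(len(hist)):
        lo, hi = max(0, i - radius), min(len(hist), i + radius + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out


def disparity_ranges(hist, min_count):
    """Group consecutive bins whose count exceeds min_count into (first, last)."""
    ranges, start = [], None
    for d, c in enumerate(hist):
        if c > min_count and start is None:
            start = d
        elif c <= min_count and start is not None:
            ranges.append((start, d - 1))
            start = None
    if start is not None:
        ranges.append((start, len(hist) - 1))
    return ranges


def split_background(ranges):
    """The range with the smallest disparity is treated as background."""
    return (ranges[0], ranges[1:]) if ranges else (None, [])


if __name__ == "__main__":
    hist = [40, 35, 0, 0, 12, 15, 1, 0, 20]
    background, objects = split_background(disparity_ranges(hist, min_count=5))
    print(background, objects)
```

Each returned range corresponds to one candidate object at a roughly constant distance; the number of ranges after the first gives the number of foreground objects.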

The face color information detection unit (20) of FIG. 1 uses color information to make a first estimate of the person object region. The normalized RGB color space has been widely used to remove the effect of light source intensity, but it is limited in that it assumes fixed brightness, a single illumination color, a simple background, and so on. The present invention proposes CSN (Color Synthetic Normalization), an improved color normalization method, to remove the effects of both the intensity and the color of the light source. When the camera response is linear, the image can be expressed with a common scale factor S. Using this simple model, many studies have applied RGB normalization to obtain pure chromatic components, which minimizes the effect of changes in light intensity. In addition to this intensity normalization, CSN applies per-channel normalization to remove the dependence on the color of the light source. Taking both factors of color variation into account, the resulting color components can be expressed in terms of the original color components as follows, which serves as a general model of the color components.

Equation 1

The last term on the right-hand side of the above equation is defined as an initial variable arising from the camera environment, including nonlinear noise and the like, and is assumed to be zero in the present invention.

FIG. 5 illustrates the CSN normalization process proposed in the present invention, which follows a simple procedure. The input image is first normalized with respect to light source intensity (201) and then normalized per color component (202). These two successive operations are repeated until the normalized color components converge to a small region. When the pixel values in the normalized color space are viewed in a color histogram, pixels with similar colors concentrate in a small area whose distribution resembles a Gaussian (203). The present invention therefore defines a general skin color distribution expressed as a two-dimensional Gaussian and, using this definition as the parameters, transforms the input image into the domain of the predefined face color distribution.
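The alternating procedure of steps 201 and 202 can be sketched as below. Because Equation 1 is not reproduced in the text, the concrete operations are assumptions: the intensity step divides each pixel by its R+G+B sum, and the color step divides each channel by three times its image-wide mean, in the style of comprehensive color normalization. The function names and the Gaussian parameters passed to `gaussian_skin_likelihood` are hypothetical.

```python
import math


def normalize_intensity(pixels):
    """Step 201 (assumed form): divide each pixel by its channel sum."""
    out = []
    for r, g, b in pixels:
        s = r + g + b
        out.append((r / s, g / s, b / s) if s > 0 else (1 / 3, 1 / 3, 1 / 3))
    return out


def normalize_channels(pixels):
    """Step 202 (assumed form): scale each channel so its mean becomes 1/3."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    return [tuple(p[c] / (3 * means[c]) if means[c] > 0 else 0.0
                  for c in range(3)) for p in pixels]


def csn_like(pixels, max_iter=50, eps=1e-6):
    """Repeat both steps until the largest per-value change drops below eps."""
    cur = [tuple(float(v) for v in p) for p in pixels]
    for _ in range(max_iter):
        nxt = normalize_channels(normalize_intensity(cur))
        change = max(abs(a - b) for p, q in zip(cur, nxt) for a, b in zip(p, q))
        cur = nxt
        if change < eps:
            break
    return cur


def gaussian_skin_likelihood(r, g, mean, cov):
    """Unnormalized 2D Gaussian over two normalized color components.

    mean and cov are hypothetical model parameters; the peak value is 1.0.
    """
    dr, dg = r - mean[0], g - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    m = (d * dr * dr - (b + c) * dr * dg + a * dg * dg) / det
    return math.exp(-0.5 * m)


if __name__ == "__main__":
    skin = [(200, 120, 90), (180, 100, 80), (60, 40, 30), (220, 150, 130)]
    for p in csn_like(skin):
        print(tuple(round(v, 4) for v in p))
```

Under these assumptions, scaling every pixel by a different brightness and every channel by a different illuminant gain leaves the converged result essentially unchanged, which is the property the paragraph above attributes to CSN.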

FIG. 6 shows the step in which the motion components in the image are applied as weights to the transform (301) of FIG. 5, so that the distribution probability of the face object components is defined by more reliable parameters and the final face object extraction becomes more reliable. Face motion is one of the important pieces of information in video data for interactive communication, because most regions of interest are in motion. The UPC (Unmatched Pixel Count) motion measure is used to find the motion of objects (302). UPC is a simple block-based operation. The AWUPC (Adaptive Weighted Unmatched Pixel Count) operation proposed in the present invention (303) is defined by Equations 2, 3, and 4, where Z(x,y,t) is the image resulting from the transform to the general face color distribution and U(i,j,t) is the UPC motion detection result. The AWUPC operation emphasizes the moving components within the skin-color-transformed region.

Equation 2

Equation 3

Equation 4

The threshold in Equation 4 is determined adaptively according to the skin color similarity of the input color image, using the sigmoid function shown in FIG. 7; the detailed computation is given in Equation 4. Here Z(x,y,t) is the input pixel value at time t, and Q is the coefficient that determines the slope of the sigmoid curve. The adaptive threshold is used for the following reason. The pixel value of the input image represents the probability of face color. A pixel that already has a high face probability therefore needs a low threshold so that it is detected as part of the face object even for small motion; conversely, a region whose face probability is low after the color transform is usually not a region of interest such as a face, so a high threshold is used and it is detected only when there is large motion.
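The behaviour described above can be sketched in code. Equations 2 to 4 are not reproduced in the text, so everything concrete here is an assumption: the threshold is taken as a decreasing sigmoid of Z centred at the mid gray level, the UPC counts pixels in a small window whose inter-frame difference exceeds that threshold, and the AWUPC value is the product of Z and that count. The names `adaptive_threshold` and `awupc` and the default Q are hypothetical.

```python
import math


def adaptive_threshold(z, q=20.0, z_mid=127.5, th_max=255.0):
    """Assumed sigmoid: high skin probability z gives a low motion threshold."""
    return th_max / (1.0 + math.exp((z - z_mid) / q))


def awupc(z_map, cur, prev, window=1, q=20.0):
    """Weight the unmatched-pixel count around each pixel by its skin value.

    z_map: skin-color-transformed image (0..255), cur/prev: gray frames.
    """
    h, w = len(cur), len(cur[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            th = adaptive_threshold(z_map[y][x], q)
            unmatched = 0
            for yy in range(max(0, y - window), min(h, y + window + 1)):
                for xx in range(max(0, x - window), min(w, x + window + 1)):
                    if abs(cur[yy][xx] - prev[yy][xx]) > th:
                        unmatched += 1
            out[y][x] = z_map[y][x] * unmatched
    return out


if __name__ == "__main__":
    prev = [[100] * 3 for _ in range(3)]
    cur = [[110] * 3 for _ in range(3)]       # small motion everywhere
    skin = [[255] * 3 for _ in range(3)]      # very likely skin
    not_skin = [[0] * 3 for _ in range(3)]    # very unlikely skin
    print(awupc(skin, cur, prev)[1][1], awupc(not_skin, cur, prev)[1][1])
```

With these assumptions, the same small frame difference produces a strong response where the skin probability is high and no response where it is low, matching the stated reason for using an adaptive threshold.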

FIG. 8 illustrates the method for defining the final face object components from the multiple cues. In step 1, the region obtained by disparity-map segmentation, which separates the target region of interest from the background region containing the most noise, is extracted; in step 2, the region in which face color is emphasized and the region reflecting motion color information are each extracted independently. In step 3, a pixel-wise transform builds a probability distribution in a space that contains all of the above components, the feature space is arranged so that the components of the most probable region lie at the center of a Gaussian distribution, and the result is defined as the face object region (401).

The region tracking unit of FIG. 1 defines the pixel-wise component distribution from the final face object component definition step (401) as the probability density function of the model image. For each newly input image it builds the feature space and defines a probability density function, and it tracks the region of interest in the new image by measuring the similarity between the distributions of the model image and the target image. Using the component distribution of the model image, a mean shift based weight adaptation method iteratively refines the center point used to track the region of interest.
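A generic histogram-based mean shift step, of the kind this paragraph refers to, can be sketched as follows. The source does not specify the similarity measure or the weighting, so the Bhattacharyya coefficient, the sqrt(model/target) pixel weights, the square window, and the quantized feature image `bins_img` are assumptions taken from the standard kernel-tracking formulation rather than from the patent.

```python
import math


def window_histogram(bins_img, cx, cy, half, n_bins):
    """Normalized histogram of feature-bin indices in a square window."""
    h, w = len(bins_img), len(bins_img[0])
    hist = [0.0] * n_bins
    for y in range(max(0, cy - half), min(h, cy + half + 1)):
        for x in range(max(0, cx - half), min(w, cx + half + 1)):
            hist[bins_img[y][x]] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def bhattacharyya(p, q):
    """Similarity between two normalized histograms (1.0 means identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def mean_shift(bins_img, model, cx, cy, half, n_bins, max_iter=20):
    """Move the window center toward pixels that the model favors."""
    h, w = len(bins_img), len(bins_img[0])
    for _ in range(max_iter):
        target = window_histogram(bins_img, cx, cy, half, n_bins)
        sw = sx = sy = 0.0
        for y in range(max(0, cy - half), min(h, cy + half + 1)):
            for x in range(max(0, cx - half), min(w, cx + half + 1)):
                b = bins_img[y][x]
                wgt = math.sqrt(model[b] / target[b])  # target[b] > 0 here
                sw += wgt
                sx += wgt * x
                sy += wgt * y
        if sw == 0:
            break
        nx, ny = round(sx / sw), round(sy / sw)
        if (nx, ny) == (cx, cy):
            break
        cx, cy = nx, ny
    return cx, cy


if __name__ == "__main__":
    img = [[0] * 9 for _ in range(9)]
    for y in range(4, 7):
        for x in range(4, 7):
            img[y][x] = 1                     # 3x3 "face" blob centered at (5, 5)
    print(mean_shift(img, model=[0.0, 1.0], cx=3, cy=3, half=2, n_bins=2))
```

In a tracker built this way, the model histogram would come from the detected face region (401), and the search in each new frame would start from the center found in the previous frame.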

The present invention presents an object-centered face region detection technique that uses fused information of several kinds, such as distance, color, and motion. A stereo disparity histogram representing distance information is used to separate the face object from a complex background, the MPC-based disparity measurement improves matching accuracy, and a color transform is used to find the face region within the areas segmented by distance. The CSN normalization method, which removes the color factor of the light, stabilizes the distribution of intrinsic colors, and a generalized face object distribution is defined in the color space using a two-dimensional Gaussian function. The AWUPC motion detection technique defines a color-transformed domain in which the threshold adapts to the skin color probability, and the AWUPC operation converts the input color image into probability values that reflect both color and motion information.

As described above, the automatic human face object detection method of the present invention fuses multiple kinds of information to separate the face object effectively from a complex background and to increase the probability of the face region. It is applicable to technical fields such as digital video surveillance devices and video communication, which require human-computer interaction through detection of information in images.

FIG. 1 is an overall flowchart depicting the entire processing of the present invention.

FIG. 2 is a detailed flowchart of the object-background separation unit within the overall processing of the present invention.

FIG. 3 is an example of the disparity map generated in step 102 of the object-background separation unit of FIG. 2.

FIG. 4 is an example of the disparity histogram generated to explain the distance-based image segmentation in step 103 of the object-background separation unit of FIG. 2.

FIG. 5 is a detailed flowchart of the face color information detection unit of FIG. 1.

FIG. 6 is a detailed flowchart of the motion color information detection unit of FIG. 1.

FIG. 7 shows the function used to apply the adaptive threshold (303) when detecting inter-frame motion in the motion color information detection unit of FIG. 6.

FIG. 8 is a flowchart showing the detailed process for determining the final face object region.

Claims

An image processing / monitoring system for detecting a region of interest in an input image for digital video security, comprising: an object-background separation unit (10) that separates object from background by extracting distance information with a stereo camera; a face color information detection unit (20) that defines and extracts the face color region through color normalization of the input object; a motion color information extraction unit (30) that, on the same input image as the face color information detection unit (20), extracts regions weighted by both motion information and the face color region; a final face object discrimination unit (40); and a region tracking unit (50).

The image processing / monitoring system of claim 1, wherein the face color information detection unit is defined in a feature space obtained by applying both the light source intensity normalization (201) and the light source color normalization (202) in the normalization process, and transforms the region of interest by means of a Gaussian distribution.

The image processing / monitoring system of claim 1, wherein the motion color information extraction unit uses a feature space that reflects both the face color component distribution of the region of interest and the motion between image frames, by performing the AWUPC (Adaptive Weighted Unmatched Pixel Count) operation when estimating pixel-wise motion.

The image processing / monitoring system of claim 3, wherein a separate function is applied to the threshold used in computing the inter-frame motion difference, so that the threshold varies with the pixel values of the input image being compared.

4. The image processing / monitoring system according to claim 3, wherein the sigmoid function as shown in Fig. 7 is used as a transform function to determine a threshold value used for calculating a difference between motions between image frames.

The image processing / monitoring system of claim 1, wherein the face color information detection unit (20) and the motion color information extraction unit (30) process only the region of interest that contains the near-range objects already separated by the object-background separation unit (10).

The image processing / monitoring system of claim 1, wherein the processes (10, 20, 30, 40) for detecting the human face object are performed only on the initial input image, and thereafter the region of interest is tracked by the region tracking unit (50) based on the mean shift algorithm, without repeating the object detection processes (10, 20, 30, 40) during tracking.