KR100776880B1

KR100776880B1 - System for estimating attitude at specific part, method for estimating attitude at specific part, and program for estimating attitude at specific part

Info

Publication number: KR100776880B1
Application number: KR1020067017711A
Authority: KR
Inventors: 쇼지 다나카
Original assignee: 미쓰비시덴키 가부시키가이샤
Priority date: 2006-08-31
Filing date: 2004-03-24
Publication date: 2007-11-19
Also published as: KR20070029666A

Abstract

A posture is detected in a short time, unaffected by the brightness of the input image, even on small-scale hardware with low computing power. The apparatus comprises a matching image generation unit 3, which binarizes each pixel of an input image from which a specific part has been extracted, using a threshold set from the image values within a predetermined coordinate range, and which identifies components whose size is within a set range to obtain a binary image; and a pattern matching unit 4, which detects the posture by comparing the binary image thus obtained with a predetermined template.

Description

Specific part posture estimation apparatus, specific part posture estimation method, and specific part posture estimation program

The present invention relates to a posture estimation apparatus that can easily estimate a posture in a short time, even from images obtained by hardware with low recognition capability such as a mobile phone or an electronic organizer, and even with the simple image processing circuit fitted to such a device.

For example, in driver monitoring in automobiles, or in authentication such as room entry control or identification of a person, a human face is extracted together with its posture, and the image is processed and used for monitoring or authentication. In such monitoring or authentication, the posture must be detected before the subsequent image processing, or posture detection is otherwise important.

For such cases there is a conventional image processing apparatus for posture extraction, whose configuration is shown in Fig. 14 and which is disclosed in, for example, a patent document (Japanese Unexamined Patent Publication No. Hei 7-200774). In the figure, the image processing apparatus 21 comprises skin color extraction means 22 for extracting skin color from an input image, binarization means 23 for extracting regions that are candidates for the eyes or mouth from that result, and eye/mouth region detection and tracking means 24 for detecting the eye and mouth regions from the result of the binarization means and tracking them through the moving image.

Next, the operation will be described.

First, to identify the face region in a moving image, the skin color extraction means 22 detects skin color pixels using a three-dimensional color histogram or the like, and the binarization means 23 binarizes the image into skin color pixels and all other pixels. Next, the eye/mouth region detection and tracking means 24 extracts the hole regions within the skin color region as candidate regions for the eyes and mouth. From the extracted candidate regions, the eye and mouth positions are detected based on, for example, heuristics about the positions of the eyes and mouth relative to the face region. Finally, posture information of the head is extracted from the detected eye and mouth positions.

The conventional posture extraction apparatus is configured as described above: it extracts skin color from an image of the subject, then detects and tracks regions, and finally detects the posture. Images captured by a video camera, however, vary with the shooting location, including the lighting, and good image quality is not guaranteed, so reliable detection of skin color is difficult. In addition, the processing load, including region detection, is heavy and calls for a large-scale circuit, so processing takes a long time on the small-scale hardware installed in a mobile phone.

The present invention has been made to solve the problems described above, and extracts a posture in a short time without being affected by the image quality of an input image from a video camera or the like, even on small-scale hardware with low computing power.

Disclosure of the Invention

The specific part posture estimation apparatus according to the present invention comprises: a matching image generation unit that binarizes each pixel of an input image from which a specific part has been extracted, using a threshold set from the image values within a predetermined coordinate range, and that identifies components whose size is within a set range to obtain a binary image; and a pattern matching unit that detects a posture by comparing the binary image thus obtained with a predetermined template.

The matching image generation unit converts the input image into a grayscale image and binarizes the luminance of that grayscale image using, as the threshold, the mean or the median of the luminance of the pixels within a predetermined range centered on the target pixel.

The pattern matching unit generates matching images in advance from a plurality of specific input images and uses the resulting binarized images as the elements of the template.

The pattern matching unit obtains the comparison with each element of the template as the pixelwise logical AND (logical product).

The pattern matching unit finds the pixels having a specific value in the binary image obtained by the matching image generation unit and estimates the inclination of the specific part to be detected from how those pixels are distributed in the image.

The specific part posture estimation method according to the present invention comprises: binarizing each pixel of an input image from which a specific part has been extracted, using a threshold set from the image values within a predetermined coordinate range; deleting from the resulting binarized image the portions outside a set range and labeling the remainder as image components within the set range; and comparing the luminance of the binarized image consisting of the labeled components remaining after the deletion with that of the images in a predetermined template.

In the binarization, the input image is first converted into a grayscale image, and the luminance of the converted grayscale image is binarized by selecting, as the binarization threshold, either the mean or the median luminance of the pixels within a predetermined range centered on the target pixel.

In the comparison, the matching value is obtained as the logical AND of the luminance of each pair of pixels at corresponding coordinates in the binarized image and the template image.

The specific part posture estimation program according to the present invention is readable and executable by a computer, and comprises: binarizing each pixel of an input image from which a specific part has been extracted, using a threshold set from the image values within a predetermined coordinate range; deleting from the resulting binarized image the portions outside a set range and labeling the remainder as image components within the set range; and comparing the luminance of the binarized image consisting of the labeled components remaining after the deletion with that of the images in a predetermined template.

Brief Description of the Drawings

Fig. 1 is a diagram showing the configuration of the specific part posture estimation apparatus according to Embodiment 1 of the present invention.

Fig. 2 is a flowchart showing the operation of the specific part posture estimation apparatus according to Embodiment 1.

Fig. 3 is a flowchart showing the binarization operation performed by the matching image generation unit in Embodiment 1.

Fig. 4 is a diagram showing the internal hardware configuration of the binary image generation unit in Embodiment 1.

Fig. 5 is a diagram explaining the range over which the set threshold is obtained in Embodiment 1.

Fig. 6 is a diagram explaining how the set threshold is obtained in Embodiment 1.

Fig. 7 is a diagram explaining the binarization operation performed by the matching image generation unit in Embodiment 1.

Fig. 8 is a flowchart of the matching operation performed by the matching unit in Embodiment 1.

Fig. 9 is a diagram explaining the matching operation performed by the pattern matching unit in Embodiment 1.

Fig. 10 is a diagram showing the configuration of another specific part posture estimation apparatus in Embodiment 1.

Fig. 11 is a diagram explaining the template creation operation performed by another pattern matching unit in Embodiment 1.

Fig. 12 is a diagram showing the configuration of the specific part posture estimation apparatus in Embodiment 2.

Fig. 13 is a diagram explaining posture extraction from the pixel distribution, performed by the pattern matching unit in Embodiment 2.

Fig. 14 is a diagram showing the configuration for posture extraction in a conventional image processing apparatus.

Best Mode for Carrying Out the Invention

(Embodiment 1)

Fig. 1 is a diagram showing the configuration of the specific part posture estimation apparatus according to this embodiment of the present invention.

In the figure, the specific part posture estimation apparatus 1 comprises a video capture unit 2 for capturing a video signal shot by a video camera or the like; a matching image generation unit 3 for filtering the captured image to generate an image to be compared with the posture patterns described later; and a pattern matching unit 4 for comparing the image generated by the matching image generation unit 3 with posture patterns stored in advance to obtain the posture of a part such as the head. Internally, these units comprise a color space conversion unit 5 for converting the color image captured by the video capture unit 2 into a grayscale image; a binary image generation unit 6 for converting the grayscale image into a binary image; a component candidate extraction unit 7 that obtains regions by merging adjacent pixels in the image binarized by the binary image generation unit 6 and extracts only the candidate regions that could be eyes or a mouth; a matching unit 8 for comparing the matching image generated by the matching image generation unit 3 with the posture pattern images stored in advance; and a matching pattern DB 9 for storing the patterns used by the matching unit 8.

Next, the operation of the specific part posture estimation apparatus of the present invention will be described with reference to Figs. 2 to 9.

Fig. 2 is a flowchart explaining the operation. Fig. 3 is a flowchart explaining the operation of the binary image generation unit 6. Fig. 4 shows the internal hardware configuration of the binary image generation unit 6, which is the same for the other elements, and Figs. 5 and 6 explain the flow of the processing executed by the matching image generation unit 3. Fig. 7 explains the comparison processing executed by the pattern matching unit 4. Fig. 8 is a flowchart showing the comparison processing executed by the pattern matching unit 4. Fig. 9 explains the operation of the pattern matching unit 4.

As shown in Fig. 4, the binary image generation unit 6 comprises a processor 61, a memory 62, an input/output interface 64, and a binarization program 63 that carries out the operation shown in Fig. 3. The processor 61 first reads into the memory 62 the grayscale image produced from the captured image obtained via the input/output interface 64. It then binarizes the loaded grayscale image according to luminance in step S1-3 of Fig. 2, following the steps recorded in the binarization program 63, as described later.

First, the video capture unit 2 captures a video signal (step S1-1), and the color space conversion unit 5 converts the captured color image into a grayscale image (step S1-2).

The conversion from a color image to a grayscale image is performed using, for example, the following (Eq. 1).

(Eq. 1)

Here, Gy(x, y) is the luminance value at coordinates (x, y), and R(x, y), G(x, y), and B(x, y) are the pixel values of the color image at coordinates (x, y). Coefficient values other than those above may be used when converting from a color image to a grayscale image.
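
As a minimal sketch of the conversion described by (Eq. 1), the following Python function computes Gy(x, y) as a weighted sum of R, G, and B. The equation itself is not reproduced in this text, so the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) used as defaults are an assumption, as are the function name `to_grayscale` and the image layout (rows of (R, G, B) tuples).

```python
def to_grayscale(rgb, weights=(0.299, 0.587, 0.114)):
    """Convert rows of (R, G, B) tuples into rows of integer luminance values.

    The default weights are the ITU-R BT.601 luma coefficients. They are an
    assumption: the coefficients of (Eq. 1) are not available in the text,
    which also notes that other coefficient values may be used.
    """
    wr, wg, wb = weights
    return [[int(round(wr * r + wg * g + wb * b)) for (r, g, b) in row]
            for row in rgb]
```

Passing a different `weights` tuple covers the case, mentioned above, in which other coefficient values are used.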

Alternatively, the color-to-grayscale conversion in the color space conversion unit 5 may be performed by first normalizing the RGB values using the following (Eq. 2) and then applying (Eq. 1) above.

(Eq. 2)

Next, the binary image generation unit 6 binarizes the grayscale image according to luminance (step S1-3). Carrying out the processing shown in Fig. 3 produces a binary image adapted to the input image. The threshold used as the binarization criterion is obtained by setting the window 31 of a predetermined range shown in Fig. 5 and computing the mean or the median luminance of all the pixels within that coordinate range (25 pixels in the case of Fig. 5). The processing shown in Fig. 6 is carried out, and the luminance of the target pixel 32 is compared with the threshold.

In this way, the input image is scanned and the processing of steps S2-1 to S2-8 is repeated for every pixel to binarize the image.

That is, a pixel whose brightness level is lower than a preset threshold has its pixel value set to 1 (steps S2-2, S2-7). If the brightness level of the target pixel is higher than the threshold, a block of size K×K centered on the pixel of interest is set (step S2-3; K = 5 in the case of Fig. 5, as noted above). Next, the mean luminance within that block is computed (step S2-4). It is then determined whether the pixel value of the pixel of interest satisfies the condition of the following (Eq. 3) (step S2-5).

(Eq. 3)

Here, C is a preset constant.

If the condition of (Eq. 3) is satisfied, the pixel value is set to 0 (step S2-6); otherwise, it is set to 1 (step S2-7).
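
The per-pixel procedure of steps S2-2 to S2-7 can be sketched in Python as follows. Because (Eq. 3) is not reproduced in this text, the sketch assumes the condition has the common adaptive-threshold form "pixel value > block mean − C". The function name, the default values of `dark_threshold`, `k`, and `c`, and the clipping of the block at the image border are likewise assumptions made for illustration.

```python
def binarize_adaptive(gray, dark_threshold=40, k=5, c=10):
    """Binarize a grayscale image (rows of ints), following steps S2-2 to S2-7.

    Assumed form of (Eq. 3): the pixel becomes 0 when
    gray[y][x] > block_mean - c. All default values are illustrative.
    """
    h, w = len(gray), len(gray[0])
    half = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            v = gray[y][x]
            if v < dark_threshold:      # S2-2: darker than the preset threshold
                out[y][x] = 1           # S2-7
                continue
            # S2-3: K x K block centered on the pixel (clipped at the border,
            # which is an assumption; the text does not discuss borders)
            block = [gray[j][i]
                     for j in range(max(0, y - half), min(h, y + half + 1))
                     for i in range(max(0, x - half), min(w, x + half + 1))]
            mean = sum(block) / len(block)          # S2-4
            # S2-5 to S2-7: assumed condition of (Eq. 3)
            out[y][x] = 0 if v > mean - c else 1
    return out
```

With this convention, pixels that are clearly darker than their surroundings (such as eye and mouth regions) end up with value 1, and everything else with value 0.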

Because the binarization is performed according to the state of the surrounding pixels over a predetermined coordinate range, it can be carried out adaptively, for example from the mean within that range, even for an image whose contrast is low because of the degraded image quality of a video camera.

Although the mean was computed in step S2-4, the median obtained by sorting the pixel values in the block may be computed instead, and the condition of the following (Eq. 4) may be used.

(Eq. 4)
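
A small sketch of the median variant follows. (Eq. 4) is not reproduced in this text, so the sketch assumes it mirrors the mean-based condition with the block median in place of the mean, that is, "pixel value > block median − C". The helper name `is_background_median` and the default `c` are hypothetical.

```python
from statistics import median

def is_background_median(block_values, pixel_value, c=10):
    """Return True when the pixel would be set to 0 under the assumed (Eq. 4).

    block_values holds the luminance values of the K x K block; the median is
    the middle value after sorting, as described in the text.
    """
    return pixel_value > median(block_values) - c
```

Compared with the mean, the median is less affected by a few very dark or very bright pixels inside the block, at the cost of a sort per pixel.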

Alternatively, binarization may be performed with a fixed threshold after applying contrast enhancement such as histogram equalization.
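
The fixed-threshold alternative can be sketched as below, assuming 8-bit luminance values and the standard cumulative-histogram form of histogram equalization. The function name, the default threshold of 128, and the choice to map dark pixels to 1 (to match the convention of the adaptive procedure) are assumptions.

```python
def equalize_and_threshold(gray, threshold=128, levels=256):
    """Histogram-equalize a grayscale image, then binarize with a fixed threshold.

    Pixels whose equalized luminance is below `threshold` become 1, others 0.
    """
    h, w = len(gray), len(gray[0])
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    # Cumulative histogram
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = h * w
    if n == cdf_min:
        lut = list(range(levels))       # flat image: nothing to equalize
    else:
        lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min)))
               for c in cdf]
    return [[1 if lut[v] < threshold else 0 for v in row] for row in gray]
```

In the low-contrast example `[[100, 101], [102, 103]]`, equalization stretches the four values to 0, 85, 170, and 255, so the fixed threshold separates the darker half from the brighter half.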

Through the above processing, converting the color image captured by the video capture unit 2 to grayscale yields image 10 in Fig. 7, and binarizing that image yields image 11 in Fig. 7.

Next, the component candidate extraction unit 7 examines whether pixels of the same binary value in the binary image 11 are 4-connected (up, down, left, right) or 8-connected (including diagonals), merges related adjacent pixels into regions, and labels them as individual regions such as 11-a and 11-b in Fig. 7 (step S1-4). It then extracts only those regions whose circumscribed rectangle has a size within a preset range (step S1-5). That is, region 11-a in Fig. 7 is excluded as a region whose size is outside the estimation target.
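
Steps S1-4 and S1-5 can be sketched as a flood-fill labeling followed by a bounding-box size test. This is an illustrative sketch rather than the patent's implementation: the function name, the `(width, height)` form of the size limits, and their default values are assumptions.

```python
from collections import deque

def extract_candidates(binary, min_size=(2, 2), max_size=(4, 4), connectivity=8):
    """Keep only connected regions of 1-pixels whose circumscribed rectangle
    (width, height) lies between min_size and max_size; return a new image."""
    h, w = len(binary), len(binary[0])
    if connectivity == 4:
        steps = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    else:
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy, dx) != (0, 0)]
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            # S1-4: collect one connected region by breadth-first flood fill
            region, queue = [], deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # S1-5: size test on the circumscribed rectangle
            ys = [p[0] for p in region]
            xs = [p[1] for p in region]
            bw = max(xs) - min(xs) + 1
            bh = max(ys) - min(ys) + 1
            if (min_size[0] <= bw <= max_size[0]
                    and min_size[1] <= bh <= max_size[1]):
                for y, x in region:
                    out[y][x] = 1
    return out
```

Regions that are too small (isolated noise) or too large (hair, shadows) are dropped, which corresponds to the exclusion of region 11-a described above.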

Here, when the videophone function of a mobile phone or electronic organizer is used, the user shoots his or her own face so that it fits within the angle of view and appears large, so the approximate size of the eyes or mouth can be predicted in advance. The size thresholding described above is therefore effective.

The result extracted in step S1-5 is image 12 in Fig. 7.

Using the matching image 13 obtained above, the pattern matching unit 4 estimates the head posture (step S1-6).

The operation is described following the flowchart of Fig. 8 executed by the pattern matching unit 4. The hardware configuration of the pattern matching unit 4 is the same as that of Fig. 4, except that a matching program that carries out the operation of Fig. 8 takes the place of the binarization program 63.

Here, the matching pattern DB 9 is assumed to store the template 14 shown in Fig. 9. As noted above, when a person's face is shot using the videophone function of a mobile phone or electronic organizer, the angle of view can be assumed in advance, so the state of the eye and mouth regions for each face direction can be predicted in advance. The matching pattern DB 9 stores binary mask images of the eye and mouth regions for each assumed head posture.

In S3-1, the binarized matching image 13 of Fig. 9, denoted P, is read into memory via the input/output interface. In S3-2, the first reference mask image T1 is read from the template 14 of the matching pattern DB 9. In S3-3, over the region y = 0 to B, x = 0 to A of the captured image P and the mask image T1, the binarized value P(x, y) and the binarized value T1(x, y) are combined by logical AND at each coordinate (x, y), and the AND results are summed from coordinates (0, 0) to (A, B). In S3-4, steps S3-2 and S3-3 are repeated until no untried mask image remains in the template 14. When the AND sums for all mask images have been computed, S3-5 selects the mask image with the largest sum, image 15 in this example.

That is, the logical AND of the matching image 13 and each mask image of the template is computed, the pixels with value 1 (the matched pixels) in each resulting image are counted, and the mask with the highest count is selected, giving the matching result 15. Because binarization means pattern matching needs no analog comparison, it can be carried out very simply.
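
The AND-and-count matching of steps S3-1 to S3-5 can be sketched as follows. The function name `match_template` and the representation of the template as a list of binary mask images of the same size as P are assumptions made for illustration.

```python
def match_template(p, templates):
    """Return (best_index, scores) for a binary image p and a list of masks.

    Each score is the sum over all (x, y) of P(x, y) AND T(x, y) (step S3-3);
    the index of the largest score is selected (step S3-5).
    """
    scores = []
    for t in templates:                 # S3-2 and S3-4: try every mask image
        score = sum(pv & tv
                    for p_row, t_row in zip(p, t)
                    for pv, tv in zip(p_row, t_row))
        scores.append(score)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

For example, `match_template([[1, 0], [0, 1]], [[[0, 1], [1, 0]], [[1, 0], [0, 1]], [[1, 1], [0, 0]]])` gives the scores 0, 2, and 1 and selects index 1. Note that a plain AND sum does not penalize mask pixels that have no counterpart in P; the text does not say whether any normalization is applied.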

As described above, only as many postures as there are patterns stored in the matching pattern DB can be estimated, but the processing is so simple that it can run in real time even on hardware with low computing power.

In the configuration of Fig. 1, predefined patterns are stored in the matching pattern DB, but template images specific to the individual user may instead be generated from an image first acquired from the video camera.

Fig. 10 is a configuration diagram of specific part posture extraction in which the matching patterns are created from the output of the video capture unit 2. In the figure, a matching pattern generation unit 16 is provided to generate posture pattern template images from the captured image.

Next, the operation will be described with reference to Fig. 11. Fig. 11 shows the result of binarizing an image captured in the normal posture by the video capture unit 2 and generating template images from it.

Here, the image first captured by the video capture unit 2 is regarded as being in the normal posture (facing the camera), or the user is asked to pose in the normal posture, and an image of the normal posture is acquired.

The image acquired in this way is binarized by the matching image generation unit 3 into image 17, and the matching pattern generation unit 16 applies affine transformations to it to generate, for example, images of the head tilted to the left and right, turned to the left and right, and nodded up and down.

The affine transformation can be expressed by the matrix given in the following equation.

(Eq. 5)

An affine transformation matrix of the form (Eq. 5) is prepared for each posture. The coordinates at which the binary image 17 has pixel value 1 are transformed by the following (Eq. 6), and setting pixel value 1 at the transformed coordinates and pixel value 0 elsewhere generates the image for each posture shown as 18 in Fig. 11.

(Eq. 6)

In (Eq. 6), the original coordinates are X, Y and the transformed coordinates are x, y. The binary image 17 is treated as a plane.
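
The template generation described above can be sketched as a forward mapping of every 1-pixel through a 2×3 affine matrix. Since (Eq. 5) and (Eq. 6) are not reproduced in this text, the sketch assumes the usual form x = a·X + b·Y + tx, y = c·X + d·Y + ty. The function names and the helper that builds an in-plane rotation (a head tilt) are hypothetical.

```python
import math

def affine_warp_binary(binary, matrix):
    """Map each 1-pixel (X, Y) to (x, y) = (a*X + b*Y + tx, c*X + d*Y + ty).

    `matrix` is ((a, b, tx), (c, d, ty)), the assumed form of (Eq. 5).
    Transformed coordinates get value 1, all others 0.
    """
    (a, b, tx), (c, d, ty) = matrix
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for Y in range(h):
        for X in range(w):
            if binary[Y][X] == 1:
                x = int(round(a * X + b * Y + tx))
                y = int(round(c * X + d * Y + ty))
                if 0 <= x < w and 0 <= y < h:   # drop pixels leaving the frame
                    out[y][x] = 1
    return out

def rotation_about(cx, cy, degrees):
    """Affine matrix for a rotation by `degrees` about the point (cx, cy)."""
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return ((cos_t, -sin_t, cx - cos_t * cx + sin_t * cy),
            (sin_t, cos_t, cy - sin_t * cx - cos_t * cy))
```

Forward mapping matches the description (only coordinates with value 1 are transformed), but it can leave small gaps when the transformation enlarges the image; whether the original compensates for this is not stated.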

Once the matching patterns 18 have been generated as described above, the posture can be estimated by the same processing as in the configuration of Fig. 1.

Alternatively, although it takes more time, the user may be asked to assume each of the postures shown, for example, as 18 in Fig. 11, and each image may be binarized. In that case the templates can be generated without affine transformation.

Configuring the short-time or real-time head posture estimation apparatus as described above generates templates suited to the features of the user's face, which improves matching accuracy.

Although this and the following embodiments describe estimating the posture of the head or face as the specific part, the specific part is not limited to this and may be another part such as a hand, arm, foot, or upper body.

(Embodiment 2)

In Embodiment 1 above, the head posture was estimated by comparing the matching image with the matching patterns. This embodiment describes a variant in which the matching means is changed to estimate the head posture from the distribution of the pixels with value 1 in the matching image.

Fig. 12 shows the configuration of the specific part posture estimation apparatus according to this embodiment. In the figure, a pixel distribution measurement unit 19 is provided, which obtains the pixel distribution of the matching image and estimates the head posture from the state of that distribution.

Next, the operation will be described with reference to Fig. 13. Fig. 13 shows a map for estimating the head posture from the pixel distribution.

To determine in which regions, such as those of the map (20), the pixels with pixel value 1 in the matching image are concentrated, the number of such pixels falling within each region is counted, and the head posture corresponding to the region with the largest count is taken as the estimation result.
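The counting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the pixel distribution measuring unit 19: the layout of the map in FIG. 13 is not reproduced here, so `region_map` is a hypothetical grid of region labels with the same size as the matching image, and `pose_of_region`, the function name, and the pose labels are likewise invented for the example.

```python
from collections import Counter


def estimate_pose_from_distribution(matching_image, region_map, pose_of_region):
    """Return the pose whose map region holds the most value-1 pixels.

    matching_image: list of rows of 0/1 values (the binary matching image).
    region_map:     same-sized list of rows of region labels (assumed layout);
                    None marks a position that belongs to no region.
    pose_of_region: dict mapping a region label to a pose label (assumed).
    """
    counts = Counter()
    for image_row, map_row in zip(matching_image, region_map):
        for pixel, region in zip(image_row, map_row):
            if pixel == 1 and region is not None:
                counts[region] += 1
    if not counts:
        return None  # no value-1 pixel fell inside any region
    # Ties are resolved in favour of the region encountered first.
    best_region = max(counts, key=counts.get)
    return pose_of_region[best_region]


# Hypothetical 3x5 example: left, centre, and right regions.
region_map = [["L", "L", "C", "R", "R"]] * 3
pose_of_region = {"L": "facing left", "C": "frontal", "R": "facing right"}
matching_image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 1],
]
result = estimate_pose_from_distribution(matching_image, region_map, pose_of_region)
```

Because the only work per pixel is one comparison and one counter increment, the cost is a single pass over the image, which fits the low computing power assumed in the text.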

Using the pixel distribution in this way simplifies the processing further, so the processing time can be shortened further even on hardware with low computing power.

In the above examples, the specific-part posture estimation apparatus was described as being implemented in hardware. In practice, however, as shown in FIG. 4, a program may be prepared and executed by a processor. Alternatively, the invention may be implemented as a method consisting of the steps shown in the flows of FIG. 2, FIG. 3, and FIG. 8.

As described above, the present invention provides a matching image generation unit that binarizes the input image on the basis of the average of the image within a predetermined range and identifies components, and a pattern matching unit that detects the posture by collating the resulting binary image with a predetermined template. This has the effect of keeping the scale of the apparatus small while making it easy to estimate the posture of the part.
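The two units summarized above can be illustrated with a short sketch. It is an approximation under several assumptions rather than the patented implementation: a pixel is set to 1 when it is darker than the mean of its neighbourhood (the polarity is assumed), the window radius is arbitrary, the component labeling and size filtering step is omitted, and all function names and pose labels are hypothetical. Collation uses the pixel-wise logical AND named in the claims.

```python
def binarize_local_mean(gray, radius=1):
    """Binarize a grayscale image against the mean of each pixel's window.

    gray:   list of rows of luminance values.
    radius: half-width of the square window centred on the target pixel
            (assumed value; the window is clipped at the image border).
    """
    height, width = len(gray), len(gray[0])
    binary = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            threshold = sum(window) / len(window)
            # Assumption: darker-than-neighbourhood pixels become 1.
            binary[y][x] = 1 if gray[y][x] < threshold else 0
    return binary


def and_score(binary, template):
    """Count positions where both the binary image and the template are 1."""
    return sum(
        b & t
        for binary_row, template_row in zip(binary, template)
        for b, t in zip(binary_row, template_row)
    )


def best_template(binary, templates):
    """Return the pose label of the template with the highest AND score."""
    return max(templates, key=lambda pose: and_score(binary, templates[pose]))


# Hypothetical 3x3 example: one dark pixel in the centre.
gray = [[200, 200, 200], [200, 50, 200], [200, 200, 200]]
binary = binarize_local_mean(gray)
templates = {
    "frontal": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    "left": [[0, 0, 0], [1, 0, 0], [0, 0, 0]],
}
pose = best_template(binary, templates)
```

Because the threshold is recomputed from each pixel's own neighbourhood, adding a constant brightness offset to the whole image leaves the binary result unchanged, which is consistent with the stated aim of not being affected by the brightness of the input image.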

Claims

A specific-part posture estimation apparatus comprising:

a matching image generation unit that binarizes each pixel of an input image from which a specific part has been extracted, using a threshold set on the basis of image values within a predetermined coordinate range, and that identifies components whose size is within a set range to obtain a binary image; and

a pattern matching unit that detects a posture by collating the binary image obtained by the identification with a predetermined template.

The apparatus of claim 1,

wherein the matching image generation unit converts the input image into a grayscale image and binarizes the luminance of the grayscale image using, as the threshold, the mean or the median of the luminance of the pixels within a predetermined range centered on the target pixel.

The apparatus of claim 1,

wherein the pattern matching unit generates matching images in advance from a plurality of specific input images and uses the generated matching images as elements of the template.

The apparatus of claim 1,

wherein the pattern matching unit performs the collation with each element of the template by a logical AND of the corresponding pixels.

The apparatus of claim 1,

wherein the pattern matching unit obtains the pixels having a specific value from the binary image obtained by the matching image generation unit and estimates the inclination of the specific part to be detected from the distribution of those pixels in the image.

A specific-part posture estimation method comprising:

binarizing each pixel of an input image from which a specific part has been extracted, using a threshold set on the basis of image values within a predetermined coordinate range;

removing, from the binarized image obtained by the binarizing, portions outside a set range, and labeling the remaining portions as image components within the set range; and

collating the binarized image, which consists of the group of labeled components remaining after the removal, with the luminance of the image in a predetermined template.

The method of claim 6,

wherein, in the binarizing step, the input image is first converted into a grayscale image, and the luminance of the converted grayscale image is binarized using, as the binarization threshold, either the mean or the median of the luminance of the pixels within a predetermined range centered on the target pixel.

The method of claim 6,

wherein the collating step obtains a collation value by a logical AND of the pixels at corresponding coordinates in the binarized image and the template image.

A computer-readable recording medium on which is recorded a specific-part posture estimation program comprising:

binarizing each pixel of an input image from which a specific part has been extracted, using a threshold set on the basis of image values within a predetermined coordinate range.