KR100680278B1

KR100680278B1 - Method for lip shape extraction and apparatus thereof

Info

Publication number: KR100680278B1
Application number: KR1020060014215A
Authority: KR
Inventors: 고한석; 전창원
Original assignee: 고려대학교 산학협력단
Priority date: 2005-12-28
Filing date: 2006-02-14
Publication date: 2007-02-07

Abstract

A method and an apparatus for extracting a lip shape are provided. They detect a lip region in an image based on hue and saturation and detect the lip contour within that region, which yields accurate coordinates for the lip contour, allows the lip shape to be tracked accurately, and enables accurate speech recognition based on the speaker's speech information together with the lip shape change information. A hue/saturation calculator (200) calculates the hue and saturation at each coordinate of the input image. A lip region detector (210) detects a lip region containing the lips from the input image based on the calculated per-coordinate hue and saturation. A contour detector (220) extracts a predetermined number of points forming the lip shape from the detected lip region. A change information recorder (230) records change information of the lip shape using the extracted points. The lip region detector includes a histogram generator (211), which assigns a histogram value corresponding to the hue value of a coordinate to each coordinate whose hue and saturation values exceed reference values, and a region detector (212), which detects the lip region.

Description

Lip shape extraction method and apparatus therefor {Method for lip shape extraction and Apparatus}

FIG. 1 is a block diagram of the present invention.

FIG. 2 is a detailed block diagram of FIG. 1.

FIG. 3 is a flowchart of the present invention.

FIG. 4 is a detailed flowchart of FIG. 3.

FIG. 5 shows an example of the characteristics of the HSV color model applied in the present invention.

FIG. 6 illustrates the process of detecting the lip region in FIG. 3.

FIG. 7 illustrates an example of a lip shape defined for the Active Shape Model.

FIG. 8 illustrates an example of the process of recording lip shape change information in FIG. 3.

FIG. 9 illustrates an example in which the present invention is applied.

The present invention relates to image signal processing, and more particularly to a method and an apparatus for extracting a lip shape.

Conventional speech recognition uses only the audio signal, so ambient noise degrades recognition performance. When environmental noise makes speech recognition difficult, using image information is therefore regarded as advantageous. Here, the image information covers the speaker's lips, tongue, teeth, and other parts involved in pronunciation. Speech recognition that uses image information is generally called lip reading or visual speech recognition (VSR).

Image signal processing for lip reading consists of two stages: lip region detection and lip shape extraction. Lip region detection uses color information to locate the area around the speaker's lips within the full input image.

Conventional methods based on the RGB or YCbCr color model detect lips well under restricted conditions. However, they are affected by the camera and by ambient lighting, so they must be adjusted for each experimental environment.

Meanwhile, some methods obtain lip shape information from the characteristic partial image needed for recognition. Such methods require only a small amount of feature information to recognize the lip shape.

However, conventional lip shape extraction methods are sensitive to changes in ambient lighting and to noise, so the lip shape may not be detected accurately in some environments. Because they use little feature information, they can neither detect the lip shape accurately nor track its changes accurately.

A technical object of the present invention is to provide a lip shape extraction method that detects the region of an image in which lips are present, based on hue and saturation, and detects the contour of the lips within that region.

Another technical object of the present invention is to provide a lip shape extraction apparatus to which the above lip shape extraction method is applied.

To achieve the above technical object, the present invention includes: a hue/saturation calculator that calculates a per-coordinate hue and a per-coordinate saturation of an input image; a lip region detector that detects a lip region containing the lips from the input image using the calculated per-coordinate hue and per-coordinate saturation; a contour detector that extracts a predetermined number of points forming the contour of the lips from the detected lip region; and a change information recorder that records change information of the lip shape using the extracted points.

To achieve the other technical object, the present invention includes: calculating a per-coordinate hue and a per-coordinate saturation of an input image; detecting a lip region containing the lips from the input image using the calculated per-coordinate hue and per-coordinate saturation; extracting a predetermined number of points forming the contour of the lips from the detected lip region; and recording change information of the lip shape using the extracted points.

Preferred embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram of the present invention.

The hue/saturation calculator 100 calculates the per-coordinate hue and per-coordinate saturation of the input image, where the input image is the image received at an arbitrary time t. Calculating the per-coordinate hue means computing a hue value at every coordinate of the image received at time t. Likewise, calculating the per-coordinate saturation means computing a saturation value at every coordinate of that image.

The lip region detector 110 detects a lip region containing the lips from the input image, using the per-coordinate hue and per-coordinate saturation calculated by the hue/saturation calculator 100. To do so, it compares the hue value and the saturation value at each coordinate with predetermined reference values and, based on the result, determines the lip region from which the lip shape will be extracted.

The contour detector 120 extracts a predetermined number of points forming the contour of the lips from the lip region detected by the lip region detector 110.

The change information recorder 130 records change information of the lip shape using the points extracted by the contour detector. Preferably, the change information recorder 130 may include a storage element such as a volatile memory or a nonvolatile memory. The volatile memory may include SDRAM, and the nonvolatile memory may include flash memory.

Preferably, the input image in the present invention may be an image showing the speaker's entire face. Preferably, the input image contains only one pair of lips, and the speech to be recognized is uttered by the speaker from those lips.

Referring to the detailed block diagram of FIG. 2, the hue/saturation calculator 200 calculates the per-coordinate hue and per-coordinate saturation of the input image. Preferably, the hue/saturation calculator 200 may convert the RGB values of the input image into the corresponding HSV values and extract the per-coordinate hue and per-coordinate saturation from the converted HSV values.
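The conversion performed by the hue/saturation calculator can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `hue_saturation_maps`, the nested-list image layout, and the 0 to 1 hue scale of Python's standard `colorsys` module are all assumptions made for the example.

```python
import colorsys


def hue_saturation_maps(rgb_image):
    """Return (hue_map, sat_map) for an image given as rows of (r, g, b) tuples.

    Assumption: components are 8-bit integers in 0..255. colorsys returns hue
    and saturation as floats in [0, 1], with pure red at hue 0.0.
    """
    hue_map, sat_map = [], []
    for row in rgb_image:
        hue_row, sat_row = [], []
        for r, g, b in row:
            # The value (brightness) channel is discarded: only hue and
            # saturation are used by the later stages.
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            hue_row.append(h)
            sat_row.append(s)
        hue_map.append(hue_row)
        sat_map.append(sat_row)
    return hue_map, sat_map
```

Because this hue scale wraps around at red, a real system would likely need to choose the hue origin or reference value with that wrap-around in mind.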

The lip region detector 210 detects a lip region containing the lips from the input image, using the per-coordinate hue and per-coordinate saturation calculated by the hue/saturation calculator 200.

The lip region detector 210 includes a histogram generator 211 and a region detector 212. For each coordinate whose hue value is greater than a predetermined reference hue value and whose saturation value is greater than a predetermined reference saturation value, the histogram generator 211 assigns a histogram value corresponding to the hue value of that coordinate. Once a histogram value has been assigned to every coordinate, the histogram for the input image is complete. The region detector 212 detects, as the lip region, the region containing the coordinates at which the histogram value assigned by the histogram generator 211 is at its maximum.

In addition, the histogram generator 211 assigns a predetermined minimum histogram value to each coordinate whose hue value is smaller than the predetermined reference hue value or whose saturation value is smaller than the predetermined reference saturation value.
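The assignment rule described above can be sketched as a simple two-threshold map. The sketch is illustrative only and rests on several assumptions: the "histogram value corresponding to the hue value" is taken to be the hue value itself, the minimum histogram value defaults to 0.0, and coordinates exactly equal to a reference value (a case the text does not address) receive the minimum value. The names `assign_histogram_values`, `hue_ref`, `sat_ref`, and `min_value` are hypothetical.

```python
def assign_histogram_values(hue_map, sat_map, hue_ref, sat_ref, min_value=0.0):
    """Assign a per-coordinate histogram value from hue and saturation maps.

    Assumptions (not from the source): the assigned value equals the hue
    itself, and min_value is 0.0 unless overridden.
    """
    hist_map = []
    for hue_row, sat_row in zip(hue_map, sat_map):
        hist_row = []
        for h, s in zip(hue_row, sat_row):
            if h > hue_ref and s > sat_ref:
                hist_row.append(h)          # lip-like coordinate
            else:
                hist_row.append(min_value)  # everything else
        hist_map.append(hist_row)
    return hist_map
```

For example, with `hue_ref=0.5` and `sat_ref=0.5`, a coordinate with hue 0.9 and saturation 0.8 keeps the value 0.9, while a coordinate with hue 0.95 but saturation 0.1 is set to the minimum value.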

Preferably, the histogram generator 211 may assign a histogram value to a specific coordinate using Equation 1.

[Equation 1: not reproduced in this text]

The quantities appearing in Equation 1 are the histogram value of a specific coordinate (x, y), the hue of that coordinate, the saturation of that coordinate, the predetermined reference hue value, and the predetermined reference saturation value.

The contour detector 220 extracts a predetermined number of points forming the contour of the lips from the lip region detected by the lip region detector 210.

Preferably, the contour detector 220 may extract the points forming the lip contour using an Active Shape Model of the lips.

Preferably, the number of points extracted by the contour detector 220 may be 16.

The contour detector 220 includes an utterance detector 221 and a point extractor 222.

The utterance detector 221 determines whether the lips in the lip region detected by the region detector 212 are in an uttering state. Preferably, the utterance detector 221 may judge the lips to be uttering when the lip region contains two regions whose histogram values are greater than or equal to a predetermined first reference value and, between those two regions, a region whose histogram values are less than a predetermined second reference value.
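The two-threshold test can be illustrated on a one-dimensional profile of histogram values taken across the lip region. This is a sketch under assumptions: the text does not say how the regions are scanned, so the input here is a hypothetical top-to-bottom list of values, and the names `is_uttering`, `first_ref`, and `second_ref` are invented for the example.

```python
def is_uttering(profile, first_ref, second_ref):
    """Return True if the profile shows high, then low, then high again.

    'high' means value >= first_ref and 'low' means value < second_ref;
    values in between are ignored. Assumes second_ref <= first_ref.
    """
    seen_high = False
    seen_low_after_high = False
    for value in profile:
        if value >= first_ref:
            if seen_low_after_high:
                return True  # second high region found after a low gap
            seen_high = True
        elif value < second_ref and seen_high:
            seen_low_after_high = True
    return False
```

A profile such as `[0, 5, 6, 1, 0, 1, 6, 5, 0]` with `first_ref=4` and `second_ref=2` contains two high bands (the lips) separated by a low band (the mouth opening), so it is judged as uttering; a single band such as `[0, 3, 6, 7, 6, 3, 0]` is not.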

When the utterance detector 221 determines that the lips are uttering, the point extractor 222 extracts the predetermined number of points forming the lip contour. When the utterance detector 221 determines that the lips are not uttering, the point extractor 222 extracts the points from an image received after a predetermined time has elapsed. Preferably, that image may be the frame immediately following the frame of the current input image.

The change information recorder 230 records change information of the lip shape using the points extracted by the contour detector.

The change information recorder 230 records the degree of change in the lip contour using the difference between the lip contour extracted from the input image and the lip contour extracted from an image received after a predetermined time has elapsed. Preferably, that image may be the frame immediately following the frame of the current input image.

The change information recorder 230 includes a vector generator 231 and a recorder 232.

The vector generator 231 generates vectors connecting the points extracted from the input image to the points extracted from the image received after the predetermined time has elapsed. The recorder 232 records the degree of change in the lip contour using the magnitude and direction of the vectors generated by the vector generator 231.
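The vector generation and recording steps can be sketched as below. It is a minimal illustration that assumes the points of the two frames are given in the same order, so that the i-th point of one frame corresponds to the i-th point of the next; the function name `displacement_vectors` and the (magnitude, angle) record format are assumptions, not details from the source.

```python
import math


def displacement_vectors(points_now, points_next):
    """Return one (magnitude, direction) pair per contour point.

    direction is the angle in radians from math.atan2(dy, dx).
    Assumes both lists hold (x, y) tuples in matching order.
    """
    if len(points_now) != len(points_next):
        raise ValueError("both frames must provide the same number of points")
    records = []
    for (x0, y0), (x1, y1) in zip(points_now, points_next):
        dx, dy = x1 - x0, y1 - y0
        records.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return records
```

With 16 contour points per frame, each frame pair would yield 16 such records.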

Referring to the flowchart of FIG. 3, the per-coordinate hue and per-coordinate saturation of the input image are calculated first (step 300).

Here, the input image is the image received at an arbitrary time t. Calculating the per-coordinate hue means computing a hue value at every coordinate of the image received at time t, and calculating the per-coordinate saturation means computing a saturation value at every coordinate of that image.

Next, a lip region containing the lips is detected from the input image using the calculated per-coordinate hue and per-coordinate saturation (step 310). In step 310, the hue value and saturation value at each coordinate are compared with predetermined reference values, and the lip region from which the lip shape will be extracted is determined from the result.

Once the lip region has been detected, a predetermined number of points forming the lip contour are extracted from it (step 320). The number of points may be chosen freely by a person skilled in the art as needed. Increasing the number allows the lip contour to be tracked more accurately but increases the amount of computation required.

Finally, change information of the lip shape is recorded using the extracted points (step 330).

FIG. 4 is a detailed flowchart of FIG. 3.

First, the RGB values of the input image are converted into HSV values (step 400).

Next, the per-coordinate hue and saturation values are extracted from the converted HSV values (step 401).

Once the per-coordinate hue and saturation values have been extracted, a histogram for the input image is generated (step 410). In step 410, an appropriate histogram value is assigned to each coordinate using its hue and saturation values. Preferably, step 410 may include assigning a histogram value corresponding to the hue value of a coordinate to each coordinate whose hue value is greater than a predetermined reference hue value and whose saturation value is greater than a predetermined reference saturation value. The reference hue value and the reference saturation value may be chosen freely by a person skilled in the art, for example based on the lowest hue value and lowest saturation value observed when a lip shape is present in a color image. Preferably, step 410 may be performed using Equation 1.

Next, the lip region is detected using the generated histogram (step 411). Preferably, step 411 detects, as the lip region, the region containing the coordinates at which the histogram value is at its maximum. Preferably, the lip region extends from the coordinate at which the histogram value is at its maximum to the coordinates at which the histogram value becomes discontinuous.
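One way to read "from the maximum to the point of discontinuity" is to start at the peak of a one-dimensional profile and grow outward until the values fall to the minimum histogram value. The sketch below follows that reading; treating "discontinuous" as "drops to the floor value" is an assumption, as are the names `span_around_peak` and `floor`.

```python
def span_around_peak(profile, floor=0.0):
    """Return (start, end) indices of the run around the largest value.

    The run grows left and right from the peak while neighbouring values
    stay strictly above floor. Assumes a non-empty profile; ties for the
    maximum resolve to the first occurrence.
    """
    peak = max(range(len(profile)), key=lambda i: profile[i])
    start = end = peak
    while start > 0 and profile[start - 1] > floor:
        start -= 1
    while end < len(profile) - 1 and profile[end + 1] > floor:
        end += 1
    return start, end
```

Applying the function once to a per-column profile and once to a per-row profile would give the horizontal and vertical extents of a bounding box. Note that an open mouth can leave a near-empty band between the lips, so a practical version might need to tolerate short gaps.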

Once the lip region has been detected, it is determined whether the lips in the lip region are in the uttering (open) state (step 420). If the lips are not uttering, they are closed, and changes in the lip shape are not easy to track. In that case, steps 400 to 411 are repeated for an image received after a predetermined time has elapsed. Preferably, that image may be the frame immediately following the frame of the current input image.

If the lips are uttering, the points forming the lip contour are extracted (step 421). Preferably, step 421 may extract the points forming the lip contour using an Active Shape Model of the lips. Preferably, the number of points extracted may be 16.

Next, vectors are generated that connect the points extracted above to the points extracted from an image received after a predetermined time has elapsed (step 430).

Finally, the degree of change in the lip contour is recorded using the magnitude and direction of the generated vectors (step 431).

From the top row downward, FIG. 5 shows the results of conversion to the RGB color model, the YCbCr color model, and the HSV color model. Compared with the RGB and YCbCr color models used in conventional methods, the HSV color model represents the lip region more clearly through its hue and saturation variables.

FIG. 6 illustrates the process of detecting the lip region in FIG. 3 (step 310).

As shown in FIG. 6, a histogram is computed along each of the horizontal and vertical axes. When the lips are open during utterance, the histogram along the vertical axis has two local maxima, as in FIG. 6. When the lips are closed and not uttering, the histogram along the vertical axis has a single local maximum. The shape of the distribution of local maxima in the vertical-axis histogram therefore indicates whether the lips are in the open (uttering) state or the closed state.
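The axis histograms and the peak count can be sketched as follows. The sketch assumes that the histogram along an axis is the sum of the per-coordinate histogram values in each row or column, which the text does not state explicitly; `axis_profiles`, `count_peaks`, and `min_height` are hypothetical names.

```python
def axis_profiles(hist_map):
    """Sum a 2-D map of histogram values per row and per column."""
    row_profile = [sum(row) for row in hist_map]         # along the vertical axis
    col_profile = [sum(col) for col in zip(*hist_map)]   # along the horizontal axis
    return row_profile, col_profile


def count_peaks(profile, min_height=0.0):
    """Count local maxima above min_height; a flat plateau counts once."""
    peaks = 0
    rising = False
    prev = float("-inf")
    for value in profile:
        if value > prev:
            rising = True
        elif value < prev:
            if rising and prev > min_height:
                peaks += 1
            rising = False
        prev = value
    if rising and prev > min_height:  # profile ends while still at a peak
        peaks += 1
    return peaks
```

Under this reading, a row profile with two peaks would correspond to open lips and a row profile with one peak to closed lips. Real profiles are noisy, so some smoothing or a larger `min_height` would probably be needed before counting.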

If the lips are judged to be open, processing proceeds to the lip shape extraction stage. If they are judged to be closed, processing of the current image stops and moves on to the next image.

The lip shape defined for the Active Shape Model, shown in FIG. 7, consists of 16 two-dimensional coordinates that represent the state of the lips during utterance. The Active Shape Model is an image processing method commonly used to detect the contour of an object.

FIG. 8 illustrates an example of the process of recording lip shape change information in FIG. 3 (step 330).

As shown in FIG. 8, a motion vector (dx) at each point is used to compute the difference between the predefined lip model and the image received after a predetermined time has elapsed.

In the application example of FIG. 9, the lips were recognized as being in the open (uttering) state and processing continued to the next stage. The rectangular box shows the output of the lip region detection stage. The 16 two-dimensional coordinates marked with crosses (+) are the output of the lip shape tracking process.

Instead of the RGB or YCbCr color models used in existing lip region detection methods, the present invention adopts a different color model to overcome their sensitivity to the surrounding environment. In the HSV (Hue-Saturation-Value) color model, hue and saturation are obtained as variables that are independent of the camera device. Using these variables, the minimal region of the input image that contains the lips can be obtained.

In addition, to obtain more information about the lip shape, an Active Shape Model may be applied to the lips. With this approach, accurate two-dimensional image coordinates of the lip contour can be obtained, and these coordinates can be used to achieve more accurate speech recognition in noisy environments.

Preferably, a program for executing the lip shape extraction method of the present invention on a computer may be recorded on a computer-readable recording medium.

The present invention can be implemented in software. When implemented in software, the constituent means of the present invention are code segments that perform the necessary tasks. The program or code segments may be stored on a processor-readable medium or transmitted as a computer data signal combined with a carrier wave over a transmission medium or communication network.

Although the present invention has been described with reference to the embodiment shown in the drawings, that embodiment is merely illustrative, and a person of ordinary skill in the art will understand that various modifications and variations of the embodiment are possible. Such modifications should be regarded as falling within the technical scope of protection of the present invention. The true technical scope of protection of the present invention should therefore be determined by the technical spirit of the appended claims.

As described above, the present invention detects the region of an image in which lips are present based on hue and saturation and detects the lip contour within that region. This yields accurate coordinates for the lip contour, allows the lip shape to be tracked accurately, and enables accurate speech recognition based on the lip shape change information together with the speaker's speech information.

Claims

A lip shape extraction method comprising:

calculating a per-coordinate hue and a per-coordinate saturation of an input image;

detecting a lip region containing lips from the input image using the calculated per-coordinate hue and per-coordinate saturation;

extracting a predetermined number of points forming the contour of the lips from the detected lip region; and

recording change information of the lip shape using the extracted points.

The method of claim 1,

wherein calculating the per-coordinate hue and the per-coordinate saturation comprises:

converting an RGB value of the input image into a corresponding HSV value; and

extracting the per-coordinate hue and the per-coordinate saturation from the converted HSV value.

The method of claim 1,

wherein detecting the lip region comprises:

assigning a histogram value corresponding to the hue value of a coordinate to each coordinate whose per-coordinate hue value is greater than a predetermined reference hue value and whose per-coordinate saturation value is greater than a predetermined reference saturation value; and

detecting, as the lip region, the region containing the coordinates at which the histogram value is at its maximum.

The method of claim 3,

wherein assigning the histogram value corresponding to the hue value of the coordinate

further comprises assigning a predetermined minimum histogram value to each coordinate whose per-coordinate hue value is smaller than the predetermined reference hue value or whose per-coordinate saturation value is smaller than the predetermined reference saturation value.

The method of claim 4,

wherein assigning the histogram value corresponding to the hue value of the coordinate

comprises assigning the histogram value to the coordinate using the equation, the quantities of the equation being the histogram value of the coordinate, the hue of the coordinate, the saturation of the coordinate, the predetermined reference hue value, and the predetermined reference saturation value.

The method of claim 1,

wherein extracting the points comprises:

determining whether the lips in the detected lip region are in an uttering state; and

extracting the predetermined number of points when the lips are in the uttering state.

The method of claim 6,

wherein extracting the points

further comprises, when the lips are not in the uttering state, extracting the predetermined number of points forming the contour of the lips from an image input after a predetermined time has elapsed.

The method of claim 6,

wherein determining whether the lips are in the uttering state

comprises determining that the lips are in the uttering state when the lip region contains two regions whose histogram values are greater than or equal to a predetermined first reference value and, between the two regions, a region whose histogram values are less than a predetermined second reference value.

The method of claim 1, wherein

the step of extracting the points

extracts the points using an active shape model of the lips.

The method of claim 1, wherein

the step of extracting the points

extracts 16 points constituting the contour of the lips.

The method of claim 1, wherein

the step of recording the change information of the lip shape

records the degree of change of the contour of the lips by using the difference between the contour of the lips extracted from the input image and the contour of the lips extracted from the input image after a predetermined time has elapsed.

The method of claim 11, wherein

the step of extracting the points comprises:

determining whether the lips are in an uttering state in the detected lip region; and

extracting a predetermined number of points constituting the contour of the lips if the lips are in an uttering state, and wherein

the step of recording the degree of change of the contour of the lips comprises:

generating vectors connecting the points extracted from the input image to the points extracted from the input image after a predetermined time has elapsed; and

recording the degree of change of the contour of the lips by using the magnitude and direction of the generated vectors.
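The vector-based recording in the claim above can be sketched as a per-point displacement computation. The sketch assumes that the two point sets are in the same order, so that the i-th point in one frame corresponds to the i-th point in the later frame, and that direction is expressed as an angle in radians; the function name `contour_change` is hypothetical.

```python
import math


def contour_change(points_before, points_after):
    """Return a (magnitude, direction) pair for each contour point.

    points_before and points_after are equal-length lists of (x, y) tuples.
    The direction is the angle of the displacement vector in radians,
    computed with atan2(dy, dx).
    """
    if len(points_before) != len(points_after):
        raise ValueError("both contours must have the same number of points")
    changes = []
    for (x0, y0), (x1, y1) in zip(points_before, points_after):
        dx, dy = x1 - x0, y1 - y0
        changes.append((math.hypot(dx, dy), math.atan2(dy, dx)))
    return changes
```

For the 16-point contour recited elsewhere in the claims, the result would be a list of 16 magnitude and direction pairs per frame pair.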

A computer-readable recording medium having recorded thereon a program for executing the method of claim 1 on a computer.

A lip shape extraction apparatus comprising:

a hue/saturation calculator configured to calculate the hue and the saturation of each coordinate of an input image;

a lip region detector configured to detect a lip region including the lips from the input image by using the calculated hue and saturation of each coordinate;

a contour detector configured to extract a predetermined number of points constituting the contour of the lips from the detected lip region; and

a change information recording unit configured to record change information of the lip shape by using the extracted points.

The apparatus of claim 14, wherein

the hue/saturation calculator

converts the RGB values of the input image into corresponding HSV values and extracts the hue and the saturation of each coordinate from the converted HSV values.
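The RGB-to-HSV conversion in the claim above can be sketched with Python's standard `colorsys` module. The sketch assumes 8-bit RGB input and returns hue and saturation scaled to [0, 1]; the function name `hue_saturation_maps` and the nested-list image layout are hypothetical choices for illustration.

```python
import colorsys


def hue_saturation_maps(rgb_image):
    """Convert a 2-D list of 8-bit (r, g, b) tuples into hue and saturation maps.

    Returns two 2-D lists of floats in [0, 1], one for hue and one for
    saturation. The value (V) channel is computed but discarded.
    """
    hue, sat = [], []
    for row in rgb_image:
        hue_row, sat_row = [], []
        for r, g, b in row:
            # colorsys expects and returns floats in [0, 1].
            h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            hue_row.append(h)
            sat_row.append(s)
        hue.append(hue_row)
        sat.append(sat_row)
    return hue, sat
```

Other libraries scale hue differently (for example to degrees), so any reference hue value has to be expressed in the same scale as the conversion that produced the hue map.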

The apparatus of claim 14, wherein

the lip region detector comprises:

a histogram generator that assigns, to each coordinate whose hue is greater than a predetermined reference hue value and whose saturation is greater than a predetermined reference saturation value, a histogram value corresponding to the hue value of that coordinate; and

a region detecting unit that detects, as the lip region, a region including the coordinates whose histogram value is maximum.

The apparatus of claim 16, wherein

the histogram generator

assigns a predetermined minimum histogram value to a coordinate whose hue is smaller than the predetermined reference hue value or whose saturation is smaller than the predetermined reference saturation value.

The apparatus of claim 17, wherein

the histogram generator

assigns a histogram value to the coordinates using the equation, which is defined in terms of the histogram value of the coordinates, the hue of the coordinates, the saturation of the coordinates, the predetermined reference hue value, and the predetermined reference saturation value.

The apparatus of claim 14, wherein

the contour detector comprises:

an utterance detector that determines whether the lips are in an uttering state in the detected lip region; and

a point extraction unit that extracts the predetermined number of points if the lips are in an uttering state.

The apparatus of claim 19, wherein

the point extraction unit

extracts, if the lips are not in an uttering state, a predetermined number of points constituting the contour of the lips by using the input image after a predetermined time has elapsed.

The apparatus of claim 19, wherein

the utterance detector

determines that the lips are in an uttering state when the lip region includes two regions whose histogram values are greater than or equal to a predetermined first reference value and a region whose histogram value is greater than or equal to a predetermined second reference value lies between the two regions.

The apparatus of claim 14, wherein

the contour detector

extracts the points by using an active shape model of the lips.

The apparatus of claim 14, wherein

the contour detector

extracts 16 points constituting the contour of the lips.

The apparatus of claim 14, wherein

the change information recording unit

records the degree of change of the contour of the lips by using the difference between the contour of the lips extracted from the input image and the contour of the lips extracted from the input image after a predetermined time has elapsed.

The apparatus of claim 24, wherein

the contour detector

comprises a point extraction unit that extracts a predetermined number of points constituting the contour of the lips if the lips are in an uttering state, and wherein

the change information recording unit comprises:

a vector generator that generates vectors connecting the points extracted from the input image to the points extracted from the input image after a predetermined time has elapsed; and

a recording unit that records the degree of change of the contour of the lips by using the magnitude and direction of the generated vectors.

The apparatus of claim 25, wherein

the recording unit

comprises an SDRAM as a storage space for recording the degree of change of the contour of the lips.