KR100220699B1

KR100220699B1 - Apparatus for coding lip shape information in a 3 dimension model-based coding system

Info

Publication number: KR100220699B1
Application number: KR1019960078101A
Authority: KR
Inventors: 이민섭
Original assignee: 전주범; 대우전자주식회사 (Daewoo Electronics Co., Ltd.)
Priority date: 1996-12-30
Filing date: 1996-12-30
Publication date: 1999-09-15
Also published as: KR19980058765A

Abstract

The present invention relates to an apparatus for extracting lip shape information from a lip image and speech in a three-dimensional model-based coding system. The apparatus comprises:

- a sound detection unit that detects whether sound is present in the speech and outputs a switching control signal;
- a speech recognition unit that outputs deformation parameter values of the lip pattern corresponding to the recognized speech;
- a first coding unit that codes the lip image by a first coding scheme;
- a second coding unit that codes, by a second coding scheme, the deformation parameter values of the lip pattern provided by the speech recognition unit; and
- a switching unit that, in response to the switching control signal from the sound detection unit, selectively outputs the information coded by the first and second coding units.

Description

Lip Shape Information Extraction Apparatus for a 3D Model-Based Coding System

The present invention relates to a system for coding changes in facial expression using a three-dimensional model, in particular a human shape model. More specifically, it relates to an apparatus and method for extracting lip shape information.

Techniques based on three-dimensional object models, especially human shape models, are applied in video coding and computer graphics. In video coding, research continues into transmitting images of satisfactory quality with ever smaller amounts of data.

Research efforts can be grouped by their data rate. They range from HDTV at the high end to MPEG-4, which is currently under active study, at the low end. Because MPEG-4 transmits video at very low data rates, it does not rely only on information-theoretic transforms such as cosine transform coefficient coding. It also applies techniques drawn from computer vision and computer graphics.

In a videophone call or video conference, typically only the upper body above the waist is visible. The speaker's face in particular accounts for most of the visual information transmitted. If information other than the face, such as the background, could be ignored and only characteristic, rule-based information about the face shape were sent, the transmission volume could be greatly reduced.

That is, the sender would find the rules governing the common facial expressions of joy, anger, sorrow, and pleasure, together with the mouth shapes formed when speaking, and send them to the other party as codes. The receiver would then deform its own 3D model according to the received codes and display the resulting face. In this way, the sender's expression could be conveyed with a small amount of data.

More specifically, a 3D model-based coding system targets applications such as a videophone. In these, two people talk live in real time, and both the sound and the changing face or body are transmitted in real time.

Unlike conventional video coding, however, the transmitted data is not an image made of pixels that change from moment to moment. Instead, specific motion parameters are extracted and transmitted. At the receiving side, these parameters are combined with data the receiver already holds, such as the sender's face image, a generic 3D head shape model, and a body shape model. The combination is used to reconstruct and display the sender's changing face or body.

Such a system is suited to cases where the transmission bandwidth severely limits the amount of data that can be sent, yet image quality sufficient for both parties to recognize each other accurately is still required. If the bandwidth were large enough to send ample data, the advantage of this system would diminish accordingly, because rule-based codes fall short of conveying natural expressions.

In the face images exchanged in such a 3D model-based coding system, the shape of the mouth, and of the lips in particular, carries very important visual information. The lip shape indicates both what the sender is saying and what emotion the sender is expressing.

There has therefore been a need for a system that extracts, using only a small amount of data, enough information for the receiving side to reconstruct a satisfactory mouth shape.

It is therefore an object of the present invention to provide a method and apparatus for extracting lip shape information in a 3D model-based facial expression coding system.

To achieve this object, the present invention provides an apparatus for extracting lip shape information from a lip image and speech in a 3D model-based coding system. The apparatus comprises:

- a sound detection unit that detects whether sound is present in the speech and outputs a switching control signal;
- a speech recognition unit that outputs deformation parameter values of the lip pattern corresponding to the recognized speech;
- a first coding unit that codes the lip image by a first coding scheme;
- a second coding unit that codes, by a second coding scheme, the deformation parameter values of the lip pattern provided by the speech recognition unit; and
- a switching unit that, in response to the switching control signal from the sound detection unit, selectively outputs the information coded by the first and second coding units.

FIG. 1 is a block diagram of an apparatus for extracting lip shape information in a 3D model-based coding system, constructed according to a preferred embodiment of the present invention.

FIG. 2 shows the lip shape patterns used by the speech recognition unit shown in FIG. 1.

FIG. 3 shows the basic lip shape used to generate the variables describing lip movement.

* Explanation of reference numerals for the main parts of the drawings

10: voice processing unit
20: image processing unit
30: sound measuring unit
40: speech recognition unit
50: switching unit
60: first coding unit
70: second coding unit

The above and other objects and advantages of the present invention will become more apparent from the preferred embodiments described below with reference to the accompanying drawings.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of an apparatus according to the present invention for extracting lip shape information from lip image data and speech data in a 3D model-based coding system. The apparatus comprises a voice processing unit 10, an image processing unit 20, a sound measuring unit 30, a speech recognition unit 40, a switching unit 50, a first coding unit 60, and a second coding unit 70.

The voice processing unit 10 quantizes the speech captured by a microphone (not shown). It provides the quantized speech to the sound measuring unit 30 and to the speech recognition unit 40.

The image processing unit 20 receives the sender's mouth image from a camera (not shown). It extracts the lip image by segmenting the mouth image into regions of similar pixel values. The extracted lip image is provided to both the first coding unit 60 and the second coding unit 70.
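The text says only that the mouth image is segmented into regions of similar pixel value; it does not name an algorithm. The sketch below uses seeded region growing with 4-connectivity as one way to do this. It assumes a grayscale image stored as a 2-D list, a seed pixel known to lie on the lips, and a tolerance `tol`. The function name `grow_region` and all three inputs are hypothetical.

```python
from collections import deque

def grow_region(image, seed, tol):
    """Seeded region growing (illustrative, not the patent's stated method).

    Returns the set of (row, col) pixels that are 4-connected to `seed` and
    whose value differs from the seed pixel's value by at most `tol`.
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    ref = image[r0][c0]                 # reference value taken from the seed
    region = {seed}
    queue = deque([seed])
    while queue:                        # breadth-first expansion of the region
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - ref) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

Comparing every candidate with the seed value, rather than with its immediate neighbour, stops the region from drifting slowly across a gradual brightness ramp.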

Extracted lip images fall into two kinds of shape:

- shapes without a fixed pattern, such as those of emotional expressions; and
- shapes with a fixed pattern, such as those formed when pronouncing words.

Emotional expressions vary from person to person, and mouth shapes also vary with subtle differences in emotion. It is therefore impossible to describe these mouth shapes by any reasonably fixed rule. The first coding unit 60 codes them using a contour coding technique.

For lip shapes that do follow a fixed pattern, however, the mouth shape can be represented by a suitable set of rules. For these shapes, the second coding unit 70 uses a parameter coding technique. It codes the lip deformation parameter values provided by the speech recognition unit 40.

The first coding unit 60 extracts the edges of the lip image using a conventional edge detection method. It then codes two things separately: the contour representing the boundary of the extracted lips, and the content, i.e., the information about the interior of that region.

Edges are located from the brightness distribution of the original image, at places where brightness changes sharply. When the original image is input, the level difference of the luminance signal is detected. Where the detected level difference exceeds a predetermined threshold, that part of the image is judged to be a boundary, and boundary data is output.
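Below is a minimal sketch of the threshold test just described. It assumes a 2-D list of luminance values and compares each pixel only with its right and lower neighbours. The patent specifies neither which neighbours are compared nor the threshold value, and the name `detect_edges` is hypothetical.

```python
def detect_edges(luma, threshold):
    """Mark pixel (r, c) as a boundary pixel when the absolute luminance
    difference to its right or lower neighbour exceeds `threshold`."""
    rows, cols = len(luma), len(luma[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            diffs = []
            if c + 1 < cols:                       # horizontal level difference
                diffs.append(abs(luma[r][c + 1] - luma[r][c]))
            if r + 1 < rows:                       # vertical level difference
                diffs.append(abs(luma[r + 1][c] - luma[r][c]))
            edges[r][c] = any(d > threshold for d in diffs)
    return edges
```

In practice, a gradient operator such as Sobel is often used instead of raw neighbour differences because it is less sensitive to noise. The simple difference is kept here because it matches the description above.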

Contour information is essential for analyzing and synthesizing the shape of an object. Chain coding is one conventional technique for coding such contour information. It starts from an arbitrary point on the contour and follows the boundary in a fixed direction. Each step is recorded as a direction vector determined by how neighbouring pixels are connected, so the contour becomes a sequence of direction codes. Because this method is well known in the art, a detailed description is omitted. The lip information chain-coded by the first coding unit 60 is provided to the switching unit 50.
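The sketch below shows the chain coding step using Freeman 8-direction codes. It assumes the contour has already been traced into an ordered list of 8-connected (row, column) pixels. The direction numbering (0 = east, counting counter-clockwise, with rows growing downward) is one common convention, not one taken from the patent. The helper names are hypothetical.

```python
# Freeman 8-direction codes keyed by (d_row, d_col); rows grow downward.
DIRECTIONS = {
    (0, 1): 0,    # east
    (-1, 1): 1,   # north-east
    (-1, 0): 2,   # north
    (-1, -1): 3,  # north-west
    (0, -1): 4,   # west
    (1, -1): 5,   # south-west
    (1, 0): 6,    # south
    (1, 1): 7,    # south-east
}

def chain_code(contour):
    """Encode an ordered, 8-connected contour as (start_point, [codes])."""
    codes = []
    for (r0, c0), (r1, c1) in zip(contour, contour[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-connected")
        codes.append(DIRECTIONS[step])
    return contour[0], codes

def decode_chain(start, codes):
    """Rebuild the pixel list from a start point and its direction codes."""
    inverse = {code: step for step, code in DIRECTIONS.items()}
    points = [start]
    for code in codes:
        dr, dc = inverse[code]
        r, c = points[-1]
        points.append((r + dr, c + dc))
    return points
```

Each step costs only 3 bits regardless of image size. This is why chain codes are compact for contours such as the lip outline.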

Meanwhile, the sound measuring unit 30 detects from the input speech data whether the sender is producing sound. It provides a sound presence signal to the multiplexer 50 as the switching control signal. The presence of sound is determined by checking whether the level of the input speech is above or below a preset reference value.

The resulting sound presence signal tells the multiplexer 50 which coded signal to output:

- for lip deformation accompanying speech, the signal processed by the second coding unit 70;
- for emotional-expression lip shapes without speech, the signal processed by the first coding unit 60.
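The level comparison can be sketched as follows. The sketch makes two assumptions: the quantized samples are grouped into frames, and the speech "level" is measured as root-mean-square amplitude. The patent states neither; it only says that the speech level is compared with a preset reference value. The name `sound_present` and the threshold value are hypothetical.

```python
import math

def sound_present(frame, threshold):
    """Return True when the RMS level of one frame of samples exceeds
    `threshold` (the preset reference value); an empty frame counts as silence."""
    if not frame:
        return False
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return rms > threshold
```

A practical detector would usually add hysteresis or a hangover time so that short pauses between syllables do not toggle the switching unit on every frame. That refinement is not described in the patent.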

As described in detail below, the speech recognition unit 40 classifies the initial consonant, medial vowel, and final consonant of the input speech. It then outputs the predefined lip deformation variable values associated with the combination of these three.

In every language, speech sounds divide into consonants and vowels, and the mouth shape depends far more on the vowels than on the consonants. Hangul, the Korean script, was built on speech sounds, so it is highly systematic and analytical. A system relating speech sounds to mouth shapes that is built on Korean is therefore expected to visually represent more than 90% of the speech sounds of all languages. The present invention accordingly establishes visual rules for mouth shapes based on Korean and uses them as mouth shape information.

Each representative mouth shape has a visual pattern determined by the vowel that forms the medial of the syllable. Consonants, which can serve as the initial and final of a syllable, determine the momentary start and end states of the lip change, i.e., the closed-mouth and open-mouth states.
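The patent does not describe how the recognizer itself works. Assuming its output is already available as precomposed Hangul syllables, each syllable can be split into initial, medial, and final using the standard Unicode layout. The syllables occupy U+AC00 to U+D7A3, arranged as 19 initials × 21 medials × 28 finals, where the last count includes "no final". The medial returned here is what would select one of the eight lip patterns of FIG. 2. That vowel-to-pattern table is not given in the text, so it is left out. The function name `decompose` is hypothetical.

```python
# Compatibility jamo, in Unicode syllable-composition order.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ("", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
          "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
          "ㅋ", "ㅌ", "ㅍ", "ㅎ")  # index 0 means "no final consonant"

SYLLABLE_BASE = 0xAC00
SYLLABLE_COUNT = 19 * 21 * 28  # 11172 precomposed syllables

def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    index = ord(syllable) - SYLLABLE_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    initial, rest = divmod(index, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIALS[initial], MEDIALS[medial], FINALS[final]
```

For example, `decompose("한")` returns `("ㅎ", "ㅏ", "ㄴ")`. Under the scheme above, the `"ㅏ"` would pick the lip pattern, while `"ㅎ"` and `"ㄴ"` would only mark the start and end states of the lip movement.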

Referring to FIG. 2, the mouth shapes for the medial vowels are organized, according to the present invention, into eight patterns that define the shape of the lips.

In the speech recognition unit 40, the lip pattern corresponding to the medial vowel extracted from the sender's speech is selected from the patterns of FIG. 2. The selected lip pattern is then expressed as variable values representing its displacement from the basic lip shape shown in FIG. 3.

As shown in FIG. 3, deformation of the basic lip shape comes mostly from changes in the lips themselves and can take very complex forms. The variables are:

- up-down movement variables for the middle of the lips (L3, L4);
- lip edge movement variables (L7, L8);
- left-right movement variables (L1, L2); and
- front-back movement variables (L5, L6).

Each lip variable value is chosen within a preset range, for example -1 to 1. The middle value 0 indicates no deformation, i.e., the position of the basic lip shape. The variable values selected by the speech recognition unit 40 are provided to the second coding unit 70 and coded. The lip information coded by the second coding unit 70 is then provided to the switching unit 50.
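The text fixes the eight variables (L1 to L8) and their range (for example -1 to 1, with 0 meaning the basic lip shape). It does not say how the second coding unit codes them. The sketch below assumes uniform quantization of each clamped value to a signed 8-bit integer. The bit depth, the L1-to-L8 ordering of the output, and the function names are all assumptions.

```python
PARAM_NAMES = ("L1", "L2", "L3", "L4", "L5", "L6", "L7", "L8")

def encode_lip_params(params, bits=8):
    """Clamp each lip deformation variable to [-1, 1] and quantize it to a
    signed integer in [-(2**(bits-1) - 1), 2**(bits-1) - 1].
    Missing variables default to 0, i.e. the basic lip shape."""
    scale = 2 ** (bits - 1) - 1
    codes = []
    for name in PARAM_NAMES:
        v = max(-1.0, min(1.0, params.get(name, 0.0)))
        codes.append(round(v * scale))
    return codes

def decode_lip_params(codes, bits=8):
    """Receiver side: map the integer codes back to values in [-1, 1]."""
    scale = 2 ** (bits - 1) - 1
    return {name: code / scale for name, code in zip(PARAM_NAMES, codes)}
```

With 8 bits per variable, one lip pose costs 8 bytes. Even a short chain code for a lip contour usually needs more than that, which fits the data-reduction argument made in the description.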

The switching unit 50, which may be implemented as a multiplexer, selectively outputs the coded data from either the first coding unit 60 or the second coding unit 70, under the switching control provided by the sound measuring unit 30:

- when the sound measuring unit 30 provides a sound detection signal, the switching unit 50 transmits the lip information output by the second coding unit 70 to the receiving side;
- when no sound detection signal is provided, the switching unit 50 transmits the lip information output by the first coding unit 60 to the receiving side.
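The per-frame selection rule can be sketched as below. Each output is tagged `"param"` or `"contour"` so that a receiver would know which decoder to apply. That tag is a hypothetical mode flag: the patent does not say how the receiving side distinguishes the two code types. The function name `switch_outputs` is also hypothetical.

```python
def switch_outputs(frames):
    """Model of the switching unit.

    `frames` is an iterable of (sound_present, contour_code, param_code)
    tuples, one per frame. While sound is present, the second coding unit's
    parameter code is passed on; otherwise the first coding unit's contour
    code is. The mode tag on each output is an illustrative assumption.
    """
    out = []
    for sound_present, contour_code, param_code in frames:
        if sound_present:
            out.append(("param", param_code))
        else:
            out.append(("contour", contour_code))
    return out
```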

As described above, the system produces two kinds of lip information when transmitting the sender's lip shape in a 3D object model-based coding system. One is lip shape information generated by general (contour) coding. The other is lip information generated by parameter coding. The system transmits one or the other depending on whether sound is present. This reduces the amount of data transmitted while still allowing the receiver to perceive the sender's facial expression, and in particular the shape of the sender's lips.

Claims

1. An apparatus for extracting lip shape information from a lip image and speech in a 3D model-based coding system, comprising:

sound detection means for detecting the presence of sound in the speech and outputting a switching control signal;

speech recognition means for outputting deformation parameter values of a lip pattern corresponding to speech recognized from the input speech;

first coding means for coding the lip image by a first coding scheme;

second coding means for coding, by a second coding scheme, the deformation parameter values of the lip pattern provided by the speech recognition means; and

switching means for selectively outputting the information coded by the first and second coding means in response to the switching control signal from the sound detection means.

2. The lip shape information extraction apparatus of claim 1, wherein the speech recognition means classifies the initial consonant, medial vowel, and final consonant of the input speech, selects the one lip pattern corresponding to the classified medial vowel, and outputs variables indicating the amount of displacement by which the selected lip pattern is deformed from the basic lip pattern.

3. The lip shape information extraction apparatus of claim 2, wherein the lip deformation variables comprise:

- left-right movement variables (L1, L2) indicating the displacement due to left-right movement of the lips;
- middle up-down movement variables (L3, L4) and edge up-down movement variables (L7, L8), indicating the displacement due to vertical movement of the middle and of the edges of the lips, respectively; and
- front-back movement variables (L5, L6) indicating the displacement due to front-back movement of the lips.

4. The lip shape information extraction apparatus of claim 1, wherein the first coding scheme performed by the first coding means extracts the edges of the lip image and codes a contour representing the boundary of the extracted lips and content representing information about the interior of that region.