KR100897149B1

KR100897149B1 - Apparatus and method for synchronizing text analysis-based lip shape

Info

Publication number: KR100897149B1
Application number: KR1020070105563A
Authority: KR
Inventors: Oh Hyun-jun (오현준); Oh Seok-pyo (오석표)
Original assignee: SK Telecom Co., Ltd. (에스케이 텔레콤주식회사)
Priority date: 2007-10-19
Filing date: 2007-10-19
Publication date: 2009-05-14
Also published as: KR20090040014A

Abstract

The present invention discloses an apparatus and method for text analysis-based mouth shape synchronization. The apparatus executes a face detection and recognition algorithm on an image of an object to extract the face region, identifies the feature points that make up the facial features of that region, and on that basis extracts the image of the mouth from among the facial features. It then analyzes the pronunciation of the input text in preset sound units to classify one or more sound patterns, identifies the mouth shape matching each sound pattern through a preset text-based mouth shape synchronization table, loads the corresponding synchronization image, and synchronizes the mouth image on the basis of that synchronization image. By converting the mouth image in a face region extracted from an image of a given object according to the analysis of the corresponding text, the invention forms a synchronized face image. The mouth in a recognized face therefore changes automatically to match the text entered by the user, which makes possible a variety of application services that offer users a novel kind of entertainment.

Keywords: mouth, text, face recognition, synchronization

Description

Apparatus and method for synchronizing mouth shape based on text analysis {APPARATUS AND METHOD FOR SYNCHRONIZING TEXT ANALYSIS-BASED LIP SHAPE}

The present invention relates to technology for implementing application services based on face recognition and, more particularly, to an apparatus and method for text analysis-based mouth shape synchronization that forms a synchronized face image by converting the image of the mouth in a face region, extracted from an image of a given object, according to the result of analyzing the corresponding text.

In general, accurately recognizing a human face is very difficult, because the same face produces very different images depending on lighting, facial expression, age, when the image was captured, noise, and so on. The core of a face recognition system is to deliver stable recognition performance that is unaffected by such image variations. A face recognition system with stable performance can be applied to great effect in fields such as security systems for companies, research institutes, and banks, and human-computer interfaces.

Because face recognition systems can be applied in so many fields, research on face recognition is being pursued from many angles, for example techniques for converting 2D images into 3D images and techniques for efficiently identifying the features that make up a face image so that it can be recognized more clearly.

Alongside this, face recognition technology has made remarkable progress in areas such as 3D face recognition imaging and improving the recognition rate for face images themselves. By contrast, progress has admittedly been slow in face recognition technology that combines recognized face images with supplementary service data to implement diverse application services and thereby interact directly with users.

In this regard, techniques for synchronizing the main data related to face recognition with the ancillary data related to supplementary services are also important. Speech synthesis and caption synchronization techniques can be regarded as prior art that is related to this to some degree.

First, conventional speech synthesis technology is briefly described as follows.

Speech synthesis is a technology developed so that machines can convey information to humans by voice. Academically it can be divided into several fields, but its most common form is the text-to-speech (TTS) conversion system, which, as the name suggests, converts information in text form into speech.

Speech synthesis is divided into partial-sentence synthesis with a limited vocabulary and full-sentence synthesis with an unlimited vocabulary. Limited-vocabulary partial synthesis builds one complete sentence by combining prerecorded sentence-form speech with synthesized speech for specific words. Although the sentence as a whole sounds highly natural, this approach makes it hard to vary the sentence form, and the joins with the synthesized parts sound awkward. Recent research and commercialization have therefore focused mainly on unlimited-vocabulary full-sentence synthesis, and speech synthesis methods based on emotion (surprise, fear, joy, anger, boredom, and so on) are also being studied.

A conventional speech synthesis method consists of a linguistic processing stage, a prosody processing stage, and a speech signal processing stage.

The linguistic processing stage consists of a text preprocessor that consults dictionaries of numbers, abbreviations, and symbols; a sentence structure analyzer that consults a part-of-speech dictionary; and a phonetic transcription converter that converts spelling to pronunciation using, for example, a dictionary of exceptional pronunciations.

The prosody processing stage inserts acoustic parameters expressing the grammatical context analyzed by the sentence structure analyzer and the intended emotion, and determines phoneme stress, accent, pitch and length, intonation, duration, pause length, and boundaries.

The speech signal processing stage consists of a synthesis unit selector that uses a synthesis unit database and a synthesis unit concatenator that joins synthesis units to produce sound. It finds the speech data that best fits the phoneme stress, accent, pitch and length, intonation, duration, pause length, and boundaries determined in the prosody processing stage, and synthesizes the speech.

The prosody processing stage can be configured to extend to the synthesis of emotional speech (surprise, anger, joy, fear, and so on), which has been studied recently.

In addition, the prosody processing stage determines accent, intonation, boundaries, final lengthening, and phoneme stress, duration, and pause length from the speaking-rate control input, the emotional acoustic parameter input, and the information analyzed and converted in the linguistic processing stage.

Intonation varies with the sentence type (the sentence-final ending). Pitch follows a falling contour in declarative sentences; in yes/no questions it falls until just before the last syllable and then rises on the last syllable; and in wh-questions it follows a falling contour.
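The intonation rules above can be illustrated with a small sketch. It assumes the sentence type has already been determined and represents pitch as relative targets on an arbitrary scale; `SentenceType`, `pitch_contour`, and the default `step` are hypothetical choices, not part of the source.

```python
from enum import Enum


class SentenceType(Enum):
    DECLARATIVE = "declarative"  # statement
    YES_NO = "yes_no"            # yes/no question
    WH = "wh"                    # wh-question (contains an interrogative word)


def pitch_contour(kind: SentenceType, n_syllables: int,
                  start: float = 1.0, step: float = 0.1) -> list[float]:
    """Return one relative pitch target per syllable for the given sentence type."""
    if n_syllables <= 0:
        return []
    # Declaratives and wh-questions: a steady fall across the sentence.
    contour = [start - step * i for i in range(n_syllables)]
    if kind is SentenceType.YES_NO and n_syllables >= 2:
        # Yes/no questions: fall up to the penultimate syllable, then rise on the last.
        contour[-1] = contour[-2] + 2 * step
    return contour
```

For a four-syllable yes/no question the last target ends up above the penultimate one, while declaratives and wh-questions decline monotonically.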

Accent expresses the stress within a syllable as it appears in pronunciation.

Duration is the length of time for which a phoneme is pronounced, and it can be divided into a transition section and a steady-state section. Factors that affect duration include the intrinsic or average values of consonants and vowels, syllable type, manner of articulation and phoneme position, the number of syllables in a word (eojeol), the position of the syllable within the word, adjacent phonemes, sentence ends, intonation phrases, final lengthening at boundaries, and the effect of parts of speech such as particles and endings. The duration model guarantees a minimum duration for each phoneme and adjusts duration nonlinearly, mainly for vowels rather than consonants, covering vowel duration, final-consonant duration, and the transition and steady-state sections.
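The minimum-duration guarantee and the vowel-weighted adjustment can be sketched as below. This is a simplified illustration, assuming a single speaking-rate factor; the `consonant_weight` of 0.5 and the 30 ms floor are hypothetical values, and the only nonlinearity modeled is the clamp at the minimum rather than the full nonlinear model the text describes.

```python
VOWELS = set("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")


def adjust_durations(phonemes: list[str], base_ms: list[float], rate: float,
                     min_ms: float = 30.0, consonant_weight: float = 0.5) -> list[float]:
    """Rescale phoneme durations for a speaking-rate factor (rate > 1 means faster)."""
    out = []
    for phoneme, duration in zip(phonemes, base_ms):
        if phoneme in VOWELS:
            scaled = duration / rate  # vowels absorb the full rate change
        else:
            # Consonants absorb only a fraction of it (hypothetical weighting).
            scaled = duration / (1 + (rate - 1) * consonant_weight)
        out.append(max(min_ms, scaled))  # never go below the minimum duration
    return out
```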

Boundaries are needed for phrasing (pausing between word groups), breath control, and better comprehension of context. The prosodic phenomena at a boundary are a sharp drop in pitch (F0), final lengthening in the syllable before the boundary, and a pause at the boundary; the length of the boundary changes with speaking rate. Boundaries in a sentence are detected by morphological analysis using a lexical dictionary and a dictionary of morphemes (particles and endings).
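A toy version of dictionary-based boundary detection is shown below. It assumes words (eojeol) are separated by spaces and uses naive suffix matching against two tiny, hypothetical lists of topic particles and connective endings; a real system would run a full morphological analyzer instead.

```python
# Hypothetical mini-dictionaries; a real system would use a morphological analyzer.
TOPIC_PARTICLES = ("은", "는")                                      # particles
CONNECTIVE_ENDINGS = ("고", "며", "지만", "어서", "아서", "니까")    # endings


def find_boundaries(sentence: str) -> list[int]:
    """Return indices of the words (eojeol) that are followed by a prosodic boundary."""
    words = sentence.split()
    boundaries = []
    for i, word in enumerate(words[:-1]):  # the last word ends the sentence anyway
        if word.endswith(TOPIC_PARTICLES) or word.endswith(CONNECTIVE_ENDINGS):
            boundaries.append(i)
    return boundaries
```

For "나는 학교에 가고 친구는 집에 간다" this places boundaries after 나는, 가고, and 친구는.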

As described above, a conventional speech synthesizer provides speech synthesis using text and a speech database and has mainly performed simple speech data synthesis for telephone switching or language learning. As a system that converts text information into speech information, it is broadly divided into a linguistic processing stage, a prosody processing stage, and a speech signal processing stage, and it is used only to synthesize speech from input text.

That is, a conventional speech synthesizer only synthesizes speech from text information; it cannot finely color the source text character by character in step with the speech signal, the way a karaoke machine does. It is even harder to automatically synchronize still images or video contained in a document with the synthesized speech.

Moreover, the speech data produced by a speech synthesizer does not support effects such as the fine caption coloring of a computer karaoke machine, and synchronization work consumes a great deal of manpower, time, and money. In addition, although a large amount of foreign-language learning material exists on the network, language content consisting of text, or of text and still images, cannot be automatically synchronized with speech data in the manner of a computer karaoke machine, so even a single document requires heavy investment of manpower, time, and money for synchronization.

Thus further advances are needed even in conventional techniques for synchronizing data, and in face recognition a technique is needed for synchronizing images formed by face recognition with the ancillary data used to implement related application services.

Such technology is needed all the more to implement, among the application services that can be built on face recognition, a service in which the mouth in a face-recognized image is converted in synchronization with a given text input.

The present invention was therefore devised to solve the above problems. An object of the present invention is to provide an apparatus and method for text analysis-based mouth shape synchronization that converts the image of the mouth in a face region, extracted from an image of a given object, according to the result of analyzing the corresponding text.

Another object of the present invention is to provide an apparatus and method for text analysis-based mouth shape synchronization that forms a synchronized face image by converting the image of the mouth in a face region, extracted from an image of a given object, according to the result of analyzing the corresponding text.

To achieve the above objects, a text analysis-based mouth shape synchronization apparatus according to a first aspect of the present invention executes a face detection and recognition algorithm on an image of an object to extract the face region; extracts the image of the mouth from among the facial features on the basis of identifying the feature points that make up the facial features of the face region; analyzes the pronunciation of input text in preset sound units to classify one or more sound patterns; identifies the mouth shape matching each sound pattern through a preset text-based mouth shape synchronization table and loads the corresponding synchronization image; and then synchronizes the mouth image on the basis of the synchronization image.

Preferably, the apparatus includes: an image preprocessor that converts the image of the object into an image suitable for face recognition; a face region detector that executes a face detection and recognition algorithm on the preprocessed image to extract the face region; a feature point identifier that identifies the feature points making up the facial features of the face region and extracts the mouth image; a text input unit that receives the text; a pattern classifier that analyzes the pronunciation of the text in preset sound units to classify the sound patterns; a database unit that stores the text-based mouth shape synchronization table; an image loader that identifies, on the basis of the table, the mouth shape matching each sound pattern and loads the synchronization image; and a synchronization executor that synchronizes the mouth image using the synchronization image.

Preferably, the apparatus further includes an image capture unit that captures and supplies the image of the object.

Preferably, the text input unit receives the text through user control input, or loads and supplies previously stored text.

Preferably, the pattern classifier performs classification by setting the sound units to the initial consonant (choseong), medial vowel (jungseong), and final consonant (jongseong) of each syllable.
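Splitting a syllable into its initial consonant, medial vowel, and final consonant is mechanical for precomposed Hangul, because Unicode lays out the 11,172 syllables starting at U+AC00 in initial, medial, final order (19 × 21 × 28). A minimal sketch, assuming the input is already in precomposed (NFC) form; the function and constant names are illustrative.

```python
# Compatibility jamo in Unicode's composition order.
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")                      # 19
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")                   # 21
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28, "" = no final

SYLLABLE_BASE = 0xAC00   # '가'
SYLLABLE_COUNT = 11172   # 19 * 21 * 28


def decompose(text: str) -> list[tuple[str, str, str]]:
    """Split each precomposed Hangul syllable into (initial, medial, final)."""
    result = []
    for ch in text:
        index = ord(ch) - SYLLABLE_BASE
        if 0 <= index < SYLLABLE_COUNT:
            result.append((INITIALS[index // 588],          # 588 = 21 * 28
                           MEDIALS[(index % 588) // 28],
                           FINALS[index % 28]))
        # Characters outside the Hangul syllable block are skipped in this sketch.
    return result
```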

Preferably, the pattern classifier assigns a synchronization time to each sound pattern, thereby setting the time over which synchronization is executed by the synchronization executor.

Preferably, the synchronization time is set differently for each type of sound pattern.
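Assigning type-dependent synchronization times amounts to laying the sound patterns on a timeline. In this sketch the pattern type is taken to be the syllable position, and the millisecond values in `DURATION_MS` are hypothetical; the source states only that the times differ by type.

```python
# Hypothetical per-type durations in milliseconds.
DURATION_MS = {"initial": 60, "medial": 120, "final": 80}


def allocate_times(patterns: list[tuple[str, str]]) -> list[tuple[str, int, int]]:
    """Assign back-to-back (start_ms, end_ms) slots to (jamo, position) sound patterns."""
    timeline = []
    t = 0
    for jamo, position in patterns:
        duration = DURATION_MS[position]
        timeline.append((jamo, t, t + duration))
        t += duration
    return timeline
```

Because each slot starts where the previous one ends, the synchronization executor can look up which mouth shape is active at any moment.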

Preferably, the text-based mouth shape synchronization table is formed by matching a preset mouth shape to each vowel when a sound pattern corresponds to one or more vowels, and by matching a preset mouth shape to each consonant sound type when a sound pattern corresponds to one or more consonants.
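One way to realize such a table, assuming Korean jamo as keys: vowels map one-to-one to mouth shapes, while consonants inherit a shape from their sound type, grouped here by place of articulation (bilabial, alveolar or tongue-tip, palatal, velar, glottal). The shape labels are hypothetical placeholders, not the contents of FIG. 2, and compound vowels such as ㅘ are left out for brevity.

```python
# Vowel -> mouth shape (hypothetical labels).
VOWEL_SHAPES = {
    "ㅏ": "open_wide", "ㅑ": "open_wide",
    "ㅓ": "open_mid", "ㅕ": "open_mid",
    "ㅐ": "open_spread", "ㅔ": "open_spread",
    "ㅗ": "rounded_small", "ㅛ": "rounded_small",
    "ㅜ": "rounded_protruded", "ㅠ": "rounded_protruded",
    "ㅡ": "spread_narrow", "ㅣ": "spread_narrow",
}

# Consonants grouped by sound type (place of articulation).
CONSONANT_TYPES = {
    "bilabial": "ㅁㅂㅃㅍ",        # lip sounds
    "alveolar": "ㄴㄷㄸㅌㄹㅅㅆ",  # tongue-tip sounds
    "palatal": "ㅈㅉㅊ",
    "velar": "ㄱㄲㅋㅇ",
    "glottal": "ㅎ",
}

# One preset mouth shape per consonant type (hypothetical labels).
TYPE_SHAPES = {
    "bilabial": "closed", "alveolar": "slightly_open", "palatal": "protruded",
    "velar": "slightly_open", "glottal": "relaxed_open",
}


def build_sync_table() -> dict[str, str]:
    """Combine per-vowel entries with per-consonant-type entries into one lookup table."""
    table = dict(VOWEL_SHAPES)
    for sound_type, letters in CONSONANT_TYPES.items():
        for consonant in letters:
            table[consonant] = TYPE_SHAPES[sound_type]
    return table
```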

Preferably, the image loader receives and loads the synchronization image from the database unit using indication information that specifies, on the basis of the text-based mouth shape synchronization table, the mouth shape matching the sound pattern.

Preferably, the synchronization image includes either a still image or a video corresponding to the specified mouth shape.

Preferably, the synchronization executor performs synchronization by repeatedly replacing the area occupied by the mouth image with the synchronization image.
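The repeated region replacement can be sketched with plain nested lists standing in for grayscale images. This assumes the mouth box position `(top, left)` comes from the feature point step and that every synchronization image already has the box's size; `paste_region` and `synchronize` are hypothetical names.

```python
Image = list[list[int]]  # grayscale pixels, row-major


def paste_region(frame: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of frame with patch written over the box starting at (top, left)."""
    out = [row[:] for row in frame]
    for r, patch_row in enumerate(patch):
        out[top + r][left:left + len(patch_row)] = patch_row
    return out


def synchronize(face: Image, sync_images: list[Image], top: int, left: int) -> list[Image]:
    """Repeat the replacement once per synchronization image, yielding one frame each."""
    return [paste_region(face, mouth, top, left) for mouth in sync_images]
```

Copying the frame before each paste keeps the source face untouched, so the same face can be reused for every mouth shape.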

Preferably, when no text is input, the synchronization executor synchronizes the mouth image by replacing it with another, corresponding mouth image taken from the next image of the object.

Preferably, the synchronization executor forms face animation data by compositing the image of the facial features other than the mouth with the image produced by synchronizing the mouth image.
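Compositing the unchanged non-mouth features with the synchronized mouth over time can be sketched as fixed-rate sampling of a mouth-shape timeline. The sketch assumes a contiguous timeline of `(shape, start_ms, end_ms)` slots, mouth images small enough to fit at `(top, left)`, and a 10 fps output rate; all of these are illustrative choices.

```python
import math

Image = list[list[int]]  # grayscale pixels, row-major


def render_animation(face: Image, timeline: list[tuple[str, int, int]],
                     mouth_images: dict[str, Image], top: int, left: int,
                     fps: int = 10) -> list[Image]:
    """Sample a contiguous mouth-shape timeline at fps and composite each mouth into the face."""
    total_ms = timeline[-1][2] if timeline else 0
    frames = []
    for i in range(math.ceil(total_ms * fps / 1000)):
        t = i * 1000 / fps
        # The mouth shape whose time slot contains t.
        shape = next(s for s, start, end in timeline if start <= t < end)
        frame = [row[:] for row in face]  # non-mouth features stay unchanged
        for r, row in enumerate(mouth_images[shape]):
            frame[top + r][left:left + len(row)] = row
        frames.append(frame)
    return frames
```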

To achieve the above objects, a text analysis-based mouth shape synchronization method according to a second aspect of the present invention includes: (a) receiving an image of an object, executing a face detection and recognition algorithm to extract the face region, and extracting the mouth image from among the facial features of the face region; (b) determining whether text has been input; (c) depending on the result of the determination, analyzing the pronunciation of the text in preset sound units to classify one or more sound patterns; (d) identifying the mouth shape matching each sound pattern on the basis of a preset text-based mouth shape synchronization table; and (e) loading the synchronization image corresponding to the identified mouth shape and synchronizing the mouth image.
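The control flow of steps (a) to (e) can be summarized as a small pipeline whose stages are injected as callables. This is only a structural sketch: the class and field names are hypothetical, and the no-text branch simply returns `None` instead of implementing the next-frame replacement described elsewhere in the text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class LipSyncMethod:
    """Steps (a) to (e) wired together; each stage is a pluggable callable."""
    extract_mouth: Callable[[Any], Any]   # (a) face detection -> mouth image
    classify: Callable[[str], list]       # (c) text -> sound patterns
    lookup_shape: Callable[[Any], Any]    # (d) sound pattern -> mouth shape
    load_image: Callable[[Any], Any]      # (e) mouth shape -> synchronization image
    apply: Callable[[Any, list], Any]     # (e) synchronize the mouth image

    def run(self, frame: Any, text: Optional[str]) -> Any:
        mouth = self.extract_mouth(frame)                   # (a)
        if not text:                                        # (b) no text input
            return None  # text-free branch is out of scope for this sketch
        patterns = self.classify(text)                      # (c)
        shapes = [self.lookup_shape(p) for p in patterns]   # (d)
        images = [self.load_image(s) for s in shapes]       # (e) load
        return self.apply(mouth, images)                    # (e) synchronize
```

Injecting the stages keeps the sequencing testable with stubs before any real face detector or image database exists.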

Preferably, the method further includes (f) forming face animation data by compositing the image of the facial features other than the mouth with the image produced by synchronizing the mouth image.

Preferably, step (c) includes (c-1) performing a primary classification by setting the sound units to the initial consonant, medial vowel, and final consonant; and (c-2) performing a secondary classification that sets the sound patterns by extracting, from the primary classification result, the patterns listed in the preset text-based mouth shape synchronization table.
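Step (c-2) can be sketched as a filter over the output of step (c-1). Keying the table by `(position, jamo)` is an assumption made here so that, for example, a silent initial ㅇ can be dropped while a final ㅇ (pronounced /ŋ/) is kept; the names are illustrative.

```python
POSITIONS = ("initial", "medial", "final")


def secondary_classify(primary: list[tuple[str, str, str]],
                       table: dict[tuple[str, str], str]) -> list[tuple[str, str]]:
    """Step (c-2): keep only (position, jamo) pairs that the sync table lists."""
    patterns = []
    for syllable in primary:                          # output of step (c-1)
        for position, jamo in zip(POSITIONS, syllable):
            if jamo and (position, jamo) in table:    # "" means no final consonant
                patterns.append((position, jamo))
    return patterns
```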

Preferably, step (c) further includes (c-3) assigning a synchronization time to each sound pattern to set the time over which synchronization is executed.

Preferably, step (e) includes (e-1) loading the synchronization image using indication information that specifies, on the basis of the text-based mouth shape synchronization table, the mouth shape matching the sound pattern; and (e-2) performing synchronization by repeatedly replacing the area occupied by the mouth image with the synchronization image.

Preferably, in step (e), when no text has been input, the mouth image is synchronized by replacing it with another, corresponding mouth image taken from the next image of the object.

Accordingly, the present invention forms a synchronized face image by converting the mouth image in a face region, extracted from an image of a given object, according to the analysis of the corresponding text. The mouth in a face-recognized image thus changes automatically in response to text entered by the user, which has the advantage of enabling a variety of application services that offer users a novel kind of entertainment.

Hereinafter, a preferred embodiment of the text analysis-based mouth shape synchronization apparatus 100 according to the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a text analysis-based mouth shape synchronization apparatus 100 according to an embodiment of the present invention. As shown, by way of example only, in FIG. 1, the apparatus 100 includes: an image preprocessor 120 that receives an image of an object and converts it into an image suitable for face recognition; a face region detector 130 that executes a face detection and recognition algorithm on the preprocessed image to extract the face region; a feature point identifier 140 that identifies, by at least one of various methods, the feature points making up the facial features of the face region (eyes, forehead, nose, mouth, and so on) and extracts the mouth image from the result; a text input unit 150 that receives text through user control input or from previously stored text; a pattern classifier 160 that analyzes the pronunciation of the input text in preset sound units (for example, initial consonant, medial vowel, and final consonant) to classify one or more sound patterns; a database unit 170 that stores a text-based mouth shape synchronization table, formed by presetting the mouth shape matching each sound pattern; an image loader 180 that identifies, on the basis of the table, the mouth shape matching each sound pattern and loads the corresponding synchronization image (for example, an image formed from at least one of a still image and a video corresponding to the matched mouth shape); and a synchronization executor 190 that performs synchronization by repeatedly replacing the area occupied by the mouth image with the synchronization image.

The apparatus 100 preferably further includes an image capture unit 110 that can not only capture and supply an image of the object but also track the object and supply images of it.

The pattern classifier 160 assigns a synchronization time to each classified sound pattern, thereby setting the time over which the synchronization executor 190 synchronizes the mouth image corresponding to that sound pattern. The synchronization time is preferably set differently for each type of sound pattern according to preset values.

The image loader 180 checks the indication information that specifies, on the basis of the text-based mouth shape synchronization table, the mouth shape matching each sound pattern, and then, working with the database unit 170, receives and loads the synchronization image corresponding to that indication information.

Furthermore, when no text is input through the text input unit 150, the synchronization executor 190 performs synchronization by taking another mouth image from the next image supplied after the image of the object and using it to replace the previous mouth image.

The synchronization executor 190 also forms face animation data by compositing the image of the facial features other than the mouth image formed by the feature point identifier 140 with the image produced by synchronizing the mouth image.

FIG. 2 shows a text-based mouth shape synchronization table applied to the apparatus 100 of FIG. 1. As shown, by way of example only, in FIG. 2, the table can be formed in various ways. In one of them, when a sound pattern corresponds to one or more vowels, a mouth shape is preset for each vowel and matched to it in the table; when a sound pattern corresponds to one or more consonants, a preset mouth shape is matched to each consonant sound type (for example, lip sounds or tongue-tip sounds).

As an example of applying this table, suppose the input text is '안녕하세요' ("hello"). Each syllable is divided into its initial consonant, medial vowel, and final consonant, and each part is set as a sound tone pattern. Each pattern is then looked up in the table to find its mouth shape, and the mouth shapes found this way are combined to synchronize the mouth-shaped image.
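For precomposed Hangul, splitting a syllable into its initial consonant, medial vowel, and final consonant needs no dictionary. Unicode encodes each syllable arithmetically from U+AC00 as (initial × 21 + medial) × 28 + final. The sketch below uses that standard arithmetic. The function names are illustrative, and the patent does not say how it performs the split.

```python
from typing import List, Tuple

# Jamo in Unicode index order (compatibility jamo used for readability).
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medial vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28; index 0 = no final

SYLLABLE_BASE = 0xAC00          # '가', the first precomposed syllable
SYLLABLE_COUNT = 19 * 21 * 28   # 11172 precomposed syllables


def decompose(syllable: str) -> Tuple[str, str, str]:
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    index = ord(syllable) - SYLLABLE_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(index, 21 * 28)
    medial, final = divmod(rest, 28)
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


def sound_units(text: str) -> List[Tuple[str, str, str]]:
    """Classify every Hangul syllable in text; other characters are skipped."""
    return [decompose(ch) for ch in text
            if 0 <= ord(ch) - SYLLABLE_BASE < SYLLABLE_COUNT]
```

For the example text, `sound_units("안녕하세요")` returns `[('ㅇ', 'ㅏ', 'ㄴ'), ('ㄴ', 'ㅕ', 'ㅇ'), ('ㅎ', 'ㅏ', ''), ('ㅅ', 'ㅔ', ''), ('ㅇ', 'ㅛ', '')]`.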

The operation of the text analysis-based mouth shape synchronization device 100 according to the present invention is described in more detail below with reference to the accompanying drawings.

FIG. 3 is a flowchart illustrating the operation of the text analysis-based mouth shape synchronization device 100 shown in FIG. 1. As shown by way of example only in FIG. 3, the method begins by receiving an image of an object and executing a face detection recognition algorithm to extract the face region. It then determines the feature points that form the facial feature elements of the extracted face region (S100 to S104).

Next, the mouth-shaped image is extracted separately from the facial feature elements (S106). The image of the remaining facial feature elements, such as the forehead, eyes, and nose, is extracted as well (S108).
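Steps S106 and S108 can be pictured as cutting the mouth region out of the face and keeping the remainder. The sketch makes several assumptions. It assumes the mouth feature points have already been found by some detector (the patent does not name one). It represents the image as a nested list of grayscale pixels and uses a padded bounding box around the points. `mouth_box`, `split_face`, `margin`, and `fill` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # hypothetical representation: rows of grayscale pixel values
Point = Tuple[int, int]  # (x, y) pixel coordinates of a mouth feature point


def mouth_box(points: Sequence[Point], margin: int,
              width: int, height: int) -> Tuple[int, int, int, int]:
    """Padded bounding box (x0, y0, x1, y1) around the points, end-exclusive and clipped."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0 = max(min(xs) - margin, 0)
    y0 = max(min(ys) - margin, 0)
    x1 = min(max(xs) + margin + 1, width)
    y1 = min(max(ys) + margin + 1, height)
    return x0, y0, x1, y1


def split_face(face: Image, mouth_points: Sequence[Point], margin: int = 1,
               fill: int = 0) -> Tuple[Image, Image, Tuple[int, int]]:
    """Return (mouth patch, face with mouth blanked out, (row, col) of the patch)."""
    height, width = len(face), len(face[0])
    x0, y0, x1, y1 = mouth_box(mouth_points, margin, width, height)
    mouth = [row[x0:x1] for row in face[y0:y1]]  # S106: mouth-shaped image
    rest = [row[:] for row in face]              # S108: everything except the mouth
    for y in range(y0, y1):
        for x in range(x0, x1):
            rest[y][x] = fill
    return mouth, rest, (y0, x0)
```

The returned corner can later be used to paste a synchronized mouth image back into the same region.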

When the user inputs text (S110), pronunciation analysis is performed on it in preset sound units to classify at least one sound tone pattern (S112). The mouth shape corresponding to each classified sound tone pattern is then checked and selected through the preset text-based mouth shape synchronization table (S114).

The corresponding synchronization image is loaded using indication information for the mouth shape checked and selected in step S114 (S116). The device then checks whether text has been input. If it has, the mouth-shaped image is synchronized using the loaded synchronization image (S118).

If step S118 determines that no text has been input, it is preferable to skip loading a text-based synchronization image. Instead, the conversion replaces the mouth-shaped image with another mouth-shaped image obtained from the next image of the object.
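The branch between S118 and its fallback comes down to choosing where the mouth images come from. In this sketch, `extract_mouth` stands in for the mouth extraction of S106 and is passed in as a function. All names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")  # an image of the object
Mouth = TypeVar("Mouth")  # a mouth-shaped image


def mouth_images_for_step(text: Optional[str],
                          loaded_sync_images: Sequence[Mouth],
                          next_object_frames: Sequence[Frame],
                          extract_mouth: Callable[[Frame], Mouth]) -> List[Mouth]:
    """Pick the mouth images that will replace the current mouth region."""
    if text:
        # Text was input: use the synchronization images loaded via the table (S116, S118).
        return list(loaded_sync_images)
    # No text: skip the table and take the mouth from the object's next images instead.
    return [extract_mouth(frame) for frame in next_object_frames]
```

Either way, the caller receives a list of mouth images to composite into the face, so the later synthesis step (S120) does not need to know which branch was taken.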

Finally, facial animation data is formed by combining the image of the facial feature elements other than the mouth with the image produced by synchronizing the mouth-shaped image (S120).

FIG. 4 is a flowchart showing in more detail the part of the FIG. 3 process that synchronizes the mouth shape through text analysis, that is, steps S110 to S116. As shown by way of example only in FIG. 4, the process begins when text such as '안녕하세요' is input (S200).

Next, each syllable of '안녕하세요' is classified into its initial consonant, medial vowel, and final consonant (S202). The sound tone patterns obtained in step S202 are then classified further based on the text-based mouth shape synchronization table (S204).
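One plausible reading of the secondary classification (S204) is a filter that keeps only the jamo the table actually lists. In this sketch the table is keyed by syllable position, so that a silent initial ㅇ can be dropped while a final ㅇ is kept. The listed entries are hypothetical, and the input is the primary classification of '안녕하세요', written out by hand.

```python
from typing import List, Set, Tuple

Unit = Tuple[str, str, str]  # (initial, medial, final) from the primary classification
Pattern = Tuple[str, str]    # (position, jamo)

# Hypothetical set of (position, jamo) entries listed in the synchronization table.
LISTED: Set[Pattern] = (
    {("initial", c) for c in "ㄴㅁㅂㅅㅎ"}
    | {("medial", v) for v in "ㅏㅔㅕㅛ"}
    | {("final", c) for c in "ㄴㅇ"}
)


def secondary_classify(units: List[Unit], listed: Set[Pattern]) -> List[Pattern]:
    """Flatten syllables into sound tone patterns, keeping only those the table lists."""
    result: List[Pattern] = []
    for initial, medial, final in units:
        for position, jamo in (("initial", initial), ("medial", medial), ("final", final)):
            if (position, jamo) in listed:
                result.append((position, jamo))
    return result


primary = [("ㅇ", "ㅏ", "ㄴ"), ("ㄴ", "ㅕ", "ㅇ"), ("ㅎ", "ㅏ", ""), ("ㅅ", "ㅔ", ""), ("ㅇ", "ㅛ", "")]
patterns = secondary_classify(primary, LISTED)
```

With these assumed entries, the silent initial ㅇ of 안 and 요, along with the empty finals, are dropped, leaving ten patterns.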

A synchronization time is assigned to each classified sound tone pattern. This sets how long the mouth shape type corresponding to each pattern is synchronized and shown on the display (S206).
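A simple way to assign synchronization times is to give each pattern type its own duration and lay the patterns end to end. Claim 6 states only that the time is set differently for each type. The millisecond values below are placeholders, not figures from the patent.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, str]  # (position, jamo), e.g. ("medial", "ㅏ")

# Placeholder durations per pattern type, in milliseconds (assumed values).
DURATIONS_MS: Dict[str, int] = {"initial": 60, "medial": 120, "final": 80}


def assign_times(patterns: List[Pattern],
                 durations: Dict[str, int] = DURATIONS_MS) -> List[Tuple[Pattern, int, int]]:
    """Return (pattern, start_ms, end_ms) entries laid out back to back."""
    timeline: List[Tuple[Pattern, int, int]] = []
    t = 0
    for pattern in patterns:
        d = durations[pattern[0]]  # the duration depends on the pattern's type
        timeline.append((pattern, t, t + d))
        t += d
    return timeline
```

For example, `assign_times([("initial", "ㄴ"), ("medial", "ㅕ"), ("final", "ㅇ")])` places the three patterns at 0 to 60, 60 to 180, and 180 to 260 ms.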

The sound tone patterns produced by the classification in step S204 are then applied to the preset text-based mouth shape synchronization table, and the mouth shape matching each pattern is checked and selected (S208). The synchronization image for each selected mouth shape is loaded and combined with the image excluding the mouth to form facial animation data (S210).
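Steps S208 and S210 can be sketched as sampling the timeline at the video frame rate. For each frame, the code looks up the mouth shape of the pattern active at that moment and the stored image for that shape. `SHAPE_TABLE`, `IMAGE_DB`, the file names, and the `closed` fallback are all hypothetical. A real system would load the image data rather than return a file name.

```python
import math
from typing import Dict, List, Tuple

Pattern = Tuple[str, str]  # (position, jamo)

# Hypothetical table (pattern -> mouth shape) and image store (mouth shape -> stored image).
SHAPE_TABLE: Dict[Pattern, str] = {
    ("initial", "ㅁ"): "closed",
    ("medial", "ㅏ"): "open_wide",
    ("medial", "ㅕ"): "open_mid",
}
IMAGE_DB: Dict[str, str] = {
    "closed": "mouth_closed.png",
    "open_wide": "mouth_open_wide.png",
    "open_mid": "mouth_open_mid.png",
}


def frames_for(timeline: List[Tuple[Pattern, int, int]], fps: int = 30,
               default_shape: str = "closed") -> List[str]:
    """Return the synchronization image to show in each video frame."""
    if not timeline:
        return []
    end_ms = timeline[-1][2]
    n_frames = math.ceil(end_ms * fps / 1000)
    frames: List[str] = []
    i = 0
    for k in range(n_frames):
        t = k * 1000 / fps  # start time of frame k in ms
        while i < len(timeline) - 1 and t >= timeline[i][2]:
            i += 1  # advance to the pattern active at time t
        shape = SHAPE_TABLE.get(timeline[i][0], default_shape)
        frames.append(IMAGE_DB[shape])
    return frames
```

Patterns missing from the table fall back to `default_shape`, so every frame always gets an image.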

Although the present invention has been described with reference to preferred embodiments, those skilled in the art will understand that it can be modified and changed in various ways without departing from the spirit and scope of the invention set forth in the following claims.

In addition, the present invention converts the mouth-shaped image in a face region, extracted from an image of a given object, according to the analysis of the corresponding text. It therefore has sufficient commercial potential and can clearly be put into practice, which makes it industrially applicable.

FIG. 1 is a block diagram of a text analysis-based mouth shape synchronization device according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating the text-based mouth shape synchronization table applied to the text analysis-based mouth shape synchronization device shown in FIG. 1;

FIG. 3 is a flowchart illustrating the operation of the text analysis-based mouth shape synchronization device shown in FIG. 1; and

FIG. 4 is a flowchart showing in more detail the part of the FIG. 3 process that synchronizes the mouth shape through text analysis.

<Description of Reference Numerals for Main Parts of the Drawings>

100: Text analysis-based mouth shape synchronization device
110: Image capturing unit
120: Image preprocessing unit
130: Face region detection unit
140: Feature point discrimination unit
150: Text input unit
160: Pattern classification unit
170: Database unit
180: Image loading unit
190: Synchronization execution unit

Claims

An image preprocessor configured to convert an image of the object into an image for face recognition;

A face region detector for extracting a corresponding face region by executing a face detection recognition algorithm based on the preprocessed image;

A feature point discrimination unit for extracting an image of a mouth shape among the face feature elements by determining feature points forming a face feature element of the face region;

A text input unit for receiving text;

A pattern classifier configured to classify at least one sound tone pattern by performing pronunciation analysis on the text in preset sound units;

A database unit for storing a preset text-based mouth shape synchronization table;

An image loading unit that checks the mouth shape matching the sound tone pattern based on the text-based mouth shape synchronization table and loads the corresponding synchronization image; and

A synchronization execution unit that synchronizes the mouth-shaped image using the synchronization image.

The apparatus of claim 1, wherein the text analysis-based mouth shape synchronization device further comprises

an image capturing unit that captures and provides the image of the object.

The apparatus of claim 1 or 2, wherein the text input unit

receives the text through a user control input or by loading previously stored text.

The apparatus of claim 1 or 2, wherein the pattern classification unit

classifies the sound units into initial consonant, medial vowel, and final consonant.

The apparatus of claim 1 or 2, wherein the pattern classification unit

assigns a synchronization time to the sound tone pattern, thereby setting the time during which synchronization is executed by the synchronization execution unit.

The apparatus of claim 5, wherein the synchronization time

is set differently for each type of sound tone pattern.

The apparatus of claim 1 or 2, wherein the text-based mouth shape synchronization table

presets a mouth shape for each vowel when the sound tone pattern corresponds to at least one vowel,

and matches a preset mouth shape to each sound type of the consonant when the sound tone pattern corresponds to at least one consonant.

The apparatus of claim 1, wherein the image loading unit

receives the synchronization image from the database unit using indication information that specifies the mouth shape matching the sound tone pattern based on the text-based mouth shape synchronization table.

The apparatus of claim 8, wherein the synchronization image

includes a still image and a moving picture corresponding to the specified mouth shape.

The apparatus of claim 1, wherein the synchronization execution unit

repeatedly replaces the region formed by the mouth-shaped image with the synchronization image.

The apparatus of claim 1, wherein the synchronization execution unit

synchronizes the mouth-shaped image, when no text is input, by converting it to another mouth-shaped image obtained from the next image of the object.

The apparatus of claim 1, wherein the synchronization execution unit

forms facial animation data by combining the image of the part other than the mouth-shaped image with the image produced by synchronizing the mouth-shaped image.

(a) receiving an image of an object, performing a face detection recognition algorithm to extract the corresponding face region, and extracting a mouth-shaped image from the facial feature elements of the face region;

(b) determining whether text has been input;

(c) when text is determined to have been input, classifying at least one sound tone pattern by performing pronunciation analysis on the text in preset sound units;

(d) checking the mouth shape matching the sound tone pattern based on a preset text-based mouth shape synchronization table; and

(e) loading the synchronization image corresponding to the checked mouth shape and synchronizing the mouth-shaped image with it.

The method of claim 13, wherein step (e) further comprises

combining the image of the facial feature elements other than the mouth-shaped image with the image produced by synchronizing the mouth-shaped image to form facial animation data.

The method of claim 13 or 14, wherein step (c) comprises:

(c-1) performing a primary classification by dividing the sound units into initial consonant, medial vowel, and final consonant; and

(c-2) performing a secondary classification that sets the sound tone pattern by extracting, from the result of the primary classification, the patterns listed in the preset text-based mouth shape synchronization table.

The method of claim 15, wherein step (c) further comprises

(c-3) assigning a synchronization time to the sound tone pattern to set the time during which synchronization is executed.

The method of claim 13 or 14, wherein step (e) comprises:

(e-1) loading the synchronization image using indication information that specifies the mouth shape matching the sound tone pattern based on the text-based mouth shape synchronization table; and

(e-2) repeatedly replacing the region formed by the mouth-shaped image with the synchronization image.

The method of claim 13 or 14, wherein in step (e),

if no text is input, the mouth-shaped image is synchronized by converting it to another mouth-shaped image obtained from the next image of the object.