KR20120069969A

KR20120069969A - Real time talking reality method and apparatus

Info

Publication number: KR20120069969A
Application number: KR1020100131325A
Authority: KR
Inventors: 한성호
Original assignee: 뷰모션 (주); 동국대학교 산학협력단
Priority date: 2010-12-21
Filing date: 2010-12-21
Publication date: 2012-06-29
Also published as: KR101196116B1

Abstract

PURPOSE: A real time talking reality method and apparatus are provided to convey a user's emotional state through a character in a virtual space. CONSTITUTION: An expression forming unit (19) forms three-dimensional expression data for expressing the character's emotional state from a word composed of at least one character. An expression signal generating unit (26) mixes three-dimensional mouth shape data with the formed three-dimensional expression data to generate an expression signal. A voice conversion unit (27) converts the converted text into a voice signal through speech synthesis technology.

Description

REAL TIME TALKING REALITY METHOD AND APPARATUS

The present invention relates to a real time talking reality method and apparatus, and more particularly to a technique for generating a natural mouth shape and facial expression for a character in accordance with voice output.

At present, conversation through characters in a virtual space takes the form of text chat conducted while looking at a screen on which the characters are displayed. Because the conversation is carried on in text while watching the characters, a user may be unable to fully express his or her own emotional state and, likewise, may be unable to fully perceive the emotional state of the other party.

A real time talking reality method and apparatus are proposed that allow a user, when conversing with another party through a character in a virtual space using his or her terminal, to fully convey his or her own emotional state and to fully perceive the emotional state of the other party.

According to one aspect of the present invention, a real time talking reality method includes: converting speech into text using speech recognition technology; forming at least one piece of 3D mouth shape data for the character's pronunciation of the converted text, using the vowels in the converted text; forming at least one piece of 3D facial expression data for displaying the character's emotion, using a word composed of at least one converted character; generating an expression signal by mixing the formed 3D mouth shape data with the 3D facial expression data; converting the converted text into a voice signal using speech synthesis technology; and synchronizing and outputting the generated expression signal and the converted voice signal.

The step of forming the 3D mouth shape data for the character's pronunciation of the converted text, using the vowels in the converted text, may include: searching a first database, which stores a phonetic symbol pointer for each of a plurality of vowels, for the phonetic symbol pointer corresponding to a vowel in the input text; searching a third database, which stores for each phonetic symbol pointer the inertia coefficient value of at least one specific vowel set for pronouncing the corresponding vowel, for the inertia coefficient value of the at least one specific vowel corresponding to the retrieved phonetic symbol pointer; and searching a fifth database, which stores 3D mouth shape data for each inertia coefficient value of the at least one specific vowel, for the 3D mouth shape data corresponding to each retrieved inertia coefficient value, and outputting the retrieved 3D mouth shape data as the formed at least one piece of 3D mouth shape data.

The step of forming the at least one piece of 3D facial expression data for displaying the character's emotion, using a word composed of at least one converted character, may include: searching a second database, which stores an expression pointer for each of a plurality of expression groups, for the expression pointer corresponding to the expression group to which the word composed of at least one input character belongs; searching a fourth database, which stores for each expression pointer the inertia coefficient value of at least one specific expression set for forming the expression of the corresponding expression group, for the inertia coefficient value of the at least one specific expression corresponding to the retrieved expression pointer; and searching a sixth database, which stores 3D facial expression data for each inertia coefficient value of the at least one specific expression, for the 3D facial expression data corresponding to each retrieved inertia coefficient value, and outputting the retrieved 3D facial expression data as the formed 3D facial expression data.

The specific vowel may be any one of a, e, i, o, and u.

The plurality of expression groups may include at least one of: a joy group containing words that express joy; an anger group containing words that express anger; a sadness group containing words that express sadness; a pleasure group containing words that express pleasure; and a neutral (expressionless) group containing words other than those expressing joy, anger, sadness, or pleasure.

According to another aspect of the present invention, a real time talking reality apparatus includes: a text conversion unit that converts speech into text using speech recognition technology; a mouth shape forming unit that forms at least one piece of 3D mouth shape data for the character's pronunciation of the converted text, using the vowels in the converted text; an expression forming unit that forms at least one piece of 3D facial expression data for displaying the character's emotion, using a word composed of at least one converted character; an expression signal generating unit that generates an expression signal by mixing the formed 3D mouth shape data with the 3D facial expression data; a voice conversion unit that converts the converted text into a voice signal using speech synthesis technology; and an output unit that synchronizes and outputs the generated expression signal and the converted voice signal.

The mouth shape forming unit may include: a first database storing a phonetic symbol pointer for each of a plurality of vowels; a first search unit that searches the first database for the phonetic symbol pointer corresponding to a vowel in the input text; a third database storing, for each phonetic symbol pointer, the inertia coefficient value of at least one specific vowel set for pronouncing the corresponding vowel; a third search unit that searches the third database for the inertia coefficient value of the at least one specific vowel corresponding to the retrieved phonetic symbol pointer; a fifth database storing 3D mouth shape data for each inertia coefficient value of the at least one specific vowel; and a fifth search unit that searches the fifth database for the 3D mouth shape data corresponding to each retrieved inertia coefficient value and outputs the retrieved 3D mouth shape data as the formed at least one piece of 3D mouth shape data.

The expression forming unit may include: a second database storing an expression pointer for each of a plurality of expression groups; a second search unit that searches the second database for the expression pointer corresponding to the expression group to which the word composed of at least one input character belongs; a fourth database storing, for each expression pointer, the inertia coefficient value of at least one specific expression set for forming the expression of the corresponding expression group; a fourth search unit that searches the fourth database for the inertia coefficient value of the at least one specific expression corresponding to the retrieved expression pointer; a sixth database storing 3D facial expression data for each inertia coefficient value of the at least one specific expression; and a sixth search unit that searches the sixth database for the 3D facial expression data corresponding to each retrieved inertia coefficient value and outputs the retrieved 3D facial expression data as the formed 3D facial expression data.

According to the real time talking reality method and apparatus of an embodiment of the present invention, speech is converted into text using speech recognition technology; at least one piece of 3D mouth shape data for the character's pronunciation of the converted text is formed using the vowels in the converted text; at least one piece of 3D facial expression data for displaying the character's emotion is formed using a word composed of at least one converted character; an expression signal is generated by mixing the formed 3D mouth shape data with the 3D facial expression data; the converted text is converted into a voice signal using speech synthesis technology; and the generated expression signal and the converted voice signal are synchronized and output. As a result, when conversing with another party through a character in a virtual space, the user can fully convey his or her own emotional state and fully perceive the emotional state of the other party.

FIG. 1 is a flowchart of a real time talking reality method according to an embodiment of the present invention.
FIG. 2 is a diagram showing the storage structure of a third database that stores the inertia coefficient value of at least one specific vowel for pronouncing the vowel corresponding to each phonetic symbol pointer.
FIG. 3 is a diagram showing the configuration of a real time talking reality apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. In describing the embodiments, detailed descriptions of related known functions or configurations are omitted where they would obscure the subject matter of the present invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 1 is a flowchart of a real time talking reality method according to an embodiment of the present invention.

The real time talking reality method according to an embodiment of the present invention may be performed by the real time talking reality apparatus described later. The apparatus may be implemented in a wired terminal such as a personal computer or in a mobile terminal such as a smartphone.

The real time talking reality apparatus converts speech into text using speech recognition technology (S1). Speech recognition here refers to technology that maps an acoustic speech signal to text.

The apparatus then forms at least one piece of 3D mouth shape data for the character's pronunciation of the converted text, using the vowels in the converted text (S2).

That is, the apparatus searches a first database, which stores a phonetic symbol pointer for each of a plurality of vowels, for the phonetic symbol pointer corresponding to a vowel in the input text.

The apparatus searches a third database, which stores for each phonetic symbol pointer the inertia coefficient value of at least one specific vowel set for pronouncing the corresponding vowel, for the inertia coefficient value of the at least one specific vowel corresponding to the retrieved phonetic symbol pointer. The at least one specific vowel may include at least one of a, e, i, o, and u. FIG. 2 shows the storage structure of the third database, which stores, for each phonetic symbol pointer (1), the inertia coefficient values (2) of the at least one specific vowel used to pronounce the corresponding vowel. As FIG. 2 shows, the inertia coefficient values are set differently for each phonetic symbol pointer. For example, when the vowel corresponding to phonetic symbol pointer "1" is "ah", the inertia coefficient values of the specific vowels a, e, i, o, and u are set to a = 1, e = 0, i = 0, o = 0, and u = 0.
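The third-database lookup can be pictured as a table keyed by phonetic symbol pointer. The following is only a minimal sketch under stated assumptions: the names `THIRD_DB`, `BASIS_VOWELS`, and `lookup_coefficients` are hypothetical, the in-memory dictionary stands in for whatever storage the apparatus actually uses, and only the row for pointer 1 ("ah") is taken from the description. The second row is invented purely for illustration.

```python
from typing import Dict

# The five "specific vowels" named in the description.
BASIS_VOWELS = ("a", "e", "i", "o", "u")

# Hypothetical stand-in for the third database of FIG. 2:
# phonetic symbol pointer -> inertia coefficient value per specific vowel.
THIRD_DB: Dict[int, Dict[str, float]] = {
    # Pointer 1 ("ah"): values stated in the description.
    1: {"a": 1.0, "e": 0.0, "i": 0.0, "o": 0.0, "u": 0.0},
    # Pointer 2: invented row, NOT from the source, for illustration only.
    2: {"a": 0.5, "e": 0.0, "i": 0.0, "o": 0.5, "u": 0.0},
}


def lookup_coefficients(pointer: int) -> Dict[str, float]:
    """Return a copy of the coefficient row stored for a pointer."""
    if pointer not in THIRD_DB:
        raise KeyError(f"no coefficients stored for pointer {pointer}")
    return dict(THIRD_DB[pointer])
```

Returning a copy keeps callers from accidentally mutating the stored table.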

The apparatus searches a fifth database, which stores 3D mouth shape data for each inertia coefficient value of the at least one specific vowel set for pronouncing the vowel corresponding to each phonetic symbol pointer, for the 3D mouth shape data corresponding to each retrieved inertia coefficient value, and outputs the retrieved 3D mouth shape data as the formed at least one piece of 3D mouth shape data. In this way, the apparatus can form the mouth shape for pronouncing the converted text from the vowel information in that text.
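The three lookups of step S2 can be chained as sketched below. This is an illustration, not the patented implementation: the description says only that mouth shape data is retrieved for each coefficient value, so treating the coefficients as blend weights over one basis shape per specific vowel is an assumption. The table contents, the flat-list shape format, the vowel label "aw", and the function name `form_mouth_shape` are all hypothetical.

```python
from typing import Dict, List

Shape = List[float]  # assumed format: flattened vertex offsets

# Hypothetical first database: vowel -> phonetic symbol pointer.
FIRST_DB: Dict[str, int] = {"ah": 1, "aw": 2}

# Hypothetical third database: pointer -> coefficient per specific vowel.
THIRD_DB: Dict[int, Dict[str, float]] = {
    1: {"a": 1.0, "e": 0.0, "i": 0.0, "o": 0.0, "u": 0.0},  # from the text
    2: {"a": 0.5, "e": 0.0, "i": 0.0, "o": 0.5, "u": 0.0},  # invented
}

# Hypothetical fifth database: one basis mouth shape per specific vowel.
FIFTH_DB: Dict[str, Shape] = {
    "a": [0.0, 1.0, 0.0],
    "e": [1.0, 0.5, 0.0],
    "i": [1.0, 0.0, 0.0],
    "o": [0.5, 0.5, 0.0],
    "u": [0.0, 0.0, 1.0],
}


def form_mouth_shape(vowel: str) -> Shape:
    """Look up pointer, then coefficients, then blend the basis shapes."""
    pointer = FIRST_DB[vowel]
    coefficients = THIRD_DB[pointer]
    blended = [0.0] * len(FIFTH_DB["a"])
    for basis_vowel, weight in coefficients.items():
        if weight == 0.0:
            continue  # this basis shape contributes nothing
        for i, value in enumerate(FIFTH_DB[basis_vowel]):
            blended[i] += weight * value  # assumed weighted sum
    return blended
```

With the coefficients given for "ah" (a = 1, all others 0), the result is simply the basis shape stored for "a".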

Returning to FIG. 1, the apparatus forms at least one piece of 3D facial expression data for displaying the character's emotion, using a word composed of at least one converted character (S3).

That is, the apparatus searches a second database, which stores an expression pointer for each of a plurality of expression groups, for the expression pointer corresponding to the expression group to which the word composed of at least one input character belongs. The expression groups may include at least one of: a joy group containing words that express joy; an anger group containing words that express anger; a sadness group containing words that express sadness; a pleasure group containing words that express pleasure; and a neutral (expressionless) group containing words other than those expressing joy, anger, sadness, or pleasure. Words in the joy group may include happy, pleased, and hopeful; words in the anger group may include angry, indignant, envious, and irritated; words in the sadness group may include sad, blue, dismal, and gloomy; and words in the pleasure group may include satisfied, optimistic, and generous.
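The second-database search amounts to classifying a word into an expression group and returning that group's pointer. A minimal sketch follows, using the example words from the description. The group keys, the pointer numbering, the case-insensitive matching, and the name `find_expression_pointer` are assumptions made for illustration.

```python
from typing import Dict, Set

# Example words per group, taken from the description.
EXPRESSION_GROUPS: Dict[str, Set[str]] = {
    "joy": {"happy", "pleased", "hopeful"},
    "anger": {"angry", "indignant", "envious", "irritated"},
    "sadness": {"sad", "blue", "dismal", "gloomy"},
    "pleasure": {"satisfied", "optimistic", "generous"},
}

# Hypothetical pointer numbering; the source does not give actual values.
EXPRESSION_POINTERS: Dict[str, int] = {
    "neutral": 0,
    "joy": 1,
    "anger": 2,
    "sadness": 3,
    "pleasure": 4,
}


def find_expression_pointer(word: str) -> int:
    """Return the pointer of the group containing the word, else neutral."""
    key = word.strip().lower()  # assumed normalization
    for group, words in EXPRESSION_GROUPS.items():
        if key in words:
            return EXPRESSION_POINTERS[group]
    # Any word outside the four emotion groups is treated as expressionless.
    return EXPRESSION_POINTERS["neutral"]
```

Falling back to the neutral pointer mirrors the description's fifth group, which covers every word not listed in the other four.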

The apparatus searches a fourth database, which stores for each expression pointer the inertia coefficient value of at least one specific expression set for forming the expression of the corresponding expression group, for the inertia coefficient value of the at least one specific expression corresponding to the retrieved expression pointer.

The apparatus searches a sixth database, which stores 3D facial expression data for each inertia coefficient value of the at least one specific expression, for the 3D facial expression data corresponding to each retrieved inertia coefficient value, and outputs the retrieved 3D facial expression data as the formed 3D facial expression data.

In this way, the apparatus can form the at least one piece of 3D facial expression data needed to display the character's emotion, using a word composed of at least one converted character.

The apparatus then generates an expression signal by mixing the at least one piece of 3D mouth shape data and the 3D facial expression data formed in this way (S4).
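The description does not say how the two data sets are mixed in step S4. One plausible reading, shown here purely as an assumption, is to add expression offsets to mouth offsets vertex by vertex. The function name `mix_expression_signal`, the flat-list data format, and the `expression_weight` parameter are hypothetical.

```python
from typing import List


def mix_expression_signal(
    mouth_shape: List[float],
    expression: List[float],
    expression_weight: float = 1.0,
) -> List[float]:
    """Combine mouth and expression offsets element by element.

    Assumes both inputs are offsets over the same vertex layout.
    """
    if len(mouth_shape) != len(expression):
        raise ValueError("mouth shape and expression must have equal length")
    return [m + expression_weight * e for m, e in zip(mouth_shape, expression)]
```

A weight below 1.0 would soften the emotional expression relative to the lip movement; whether the actual apparatus exposes such a control is not stated.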

The apparatus converts the converted text into a voice signal using speech synthesis technology (S5). Speech synthesis here refers to technology that artificially synthesizes human speech: software or hardware that stores a human voice divided into speech units and reuses only the units needed to produce the output. The most common form converts text into speech and is called TTS (Text To Speech).

The apparatus synchronizes and outputs the expression signal generated in this way and the converted voice signal (S6). The apparatus may either output the voice signal at the moment the character's mouth shape and facial expression begin to change in response to the expression signal, or begin changing the character's mouth shape and facial expression according to the expression signal at the moment the voice signal begins to be output.
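Either synchronization option in step S6 comes down to giving the voice signal and the first expression frame a common start time. The sketch below builds such a timeline. It assumes the expression signal is a list of frames with time offsets, which the description does not specify, and the names `ExpressionFrame` and `schedule_output` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExpressionFrame:
    offset_s: float     # seconds since the start of the utterance (assumed)
    shape: List[float]  # mixed mouth/expression data for this frame


def schedule_output(
    frames: List[ExpressionFrame], start_s: float
) -> List[Tuple[float, str]]:
    """Return (absolute time, event) pairs sharing one start time."""
    # The voice signal starts at start_s ...
    events: List[Tuple[float, str]] = [(start_s, "audio_start")]
    # ... and every frame is placed relative to that same instant.
    for frame in sorted(frames, key=lambda f: f.offset_s):
        events.append((start_s + frame.offset_s, "expression_frame"))
    return events
```

A real player would hand these timestamps to its audio and rendering clocks; that part is outside what the description covers.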

FIG. 3 is a diagram showing the configuration of a real time talking reality apparatus according to an embodiment of the present invention.

As shown, the real time talking reality apparatus according to an embodiment of the present invention includes a text conversion unit (10), a mouth shape forming unit (12), an expression forming unit (19), an expression signal generating unit (26), a voice conversion unit (27), and an output unit (28).
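The way these units hand data to one another can be sketched as a small class whose fields are pluggable callables. This is an illustrative composition only: the class name `TalkingRealityDevice`, the callable signatures, and the byte and list data types are assumptions, and real speech recognition and synthesis engines would be supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TalkingRealityDevice:
    text_converter: Callable[[bytes], str]            # unit 10 (S1)
    mouth_shape_former: Callable[[str], List[float]]  # unit 12 (S2)
    expression_former: Callable[[str], List[float]]   # unit 19 (S3)
    signal_generator: Callable[[List[float], List[float]], List[float]]  # unit 26 (S4)
    voice_converter: Callable[[str], bytes]           # unit 27 (S5)

    def process(self, speech: bytes) -> Tuple[List[float], bytes]:
        """Run S1 to S5 and return the pair that unit 28 outputs in sync."""
        text = self.text_converter(speech)
        mouth = self.mouth_shape_former(text)
        expression = self.expression_former(text)
        signal = self.signal_generator(mouth, expression)
        voice = self.voice_converter(text)
        return signal, voice
```

Injecting each unit as a callable keeps the sketch independent of any particular recognition or synthesis library.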

The text conversion unit (10) converts speech into text using speech recognition technology.

The mouth shape forming unit (12) forms at least one piece of 3D mouth shape data for the character's pronunciation of the converted text, using the vowels in the text converted by the text conversion unit (10). The mouth shape forming unit (12) includes a first database (13), a first search unit (14), a third database (15), a third search unit (16), a fifth database (17), and a fifth search unit (18).

The first database (13) stores a phonetic symbol pointer for each of a plurality of vowels.

The first search unit (14) searches the first database (13) for the phonetic symbol pointer corresponding to a vowel in the input text.

The third database (15) stores, for each phonetic symbol pointer, the inertia coefficient value of at least one specific vowel set for pronouncing the corresponding vowel.

The third search unit (16) searches the third database (15) for the inertia coefficient value of the at least one specific vowel corresponding to the retrieved phonetic symbol pointer.

The fifth database (17) stores 3D mouth shape data for each inertia coefficient value of the at least one specific vowel set for pronouncing the vowel corresponding to each phonetic symbol pointer.

The fifth search unit (18) searches the fifth database (17) for the 3D mouth shape data corresponding to each retrieved inertia coefficient value of the at least one specific vowel, and outputs the retrieved 3D mouth shape data as the formed at least one piece of 3D mouth shape data.

The expression forming unit (19), in turn, forms at least one piece of 3D facial expression data for displaying the character's emotion, using a word composed of at least one character converted by the text conversion unit (10). The expression forming unit (19) includes a second database (20), a second search unit (21), a fourth database (22), a fourth search unit (23), a sixth database (24), and a sixth search unit (25).

The second database (20) stores an expression pointer for each of a plurality of expression groups.

The second search unit (21) searches the second database (20) for the expression pointer corresponding to the expression group to which the word composed of at least one input character belongs.

The fourth database (22) stores, for each expression pointer, the inertia coefficient value of at least one specific expression set for forming the expression of the corresponding expression group.

The fourth search unit (23) searches the fourth database (22) for the inertia coefficient value of the at least one specific expression corresponding to the retrieved expression pointer.

The sixth database (24) stores 3D facial expression data for each inertia coefficient value of the at least one specific expression set for forming the expression of the expression group corresponding to each expression pointer.

The sixth search unit (25) searches the sixth database (24) for the 3D facial expression data corresponding to each retrieved inertia coefficient value of the at least one specific expression, and outputs the retrieved 3D facial expression data as the formed 3D facial expression data.

The expression signal generating unit (26) generates an expression signal by mixing the at least one piece of 3D mouth shape data formed by the mouth shape forming unit (12) with the 3D facial expression data formed by the expression forming unit (19).

The voice conversion unit (27) converts the text converted by the text conversion unit (10) into a voice signal using speech synthesis technology.

The output unit (28) synchronizes and outputs the expression signal generated by the expression signal generating unit (26) and the voice signal converted by the voice conversion unit (27).

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that it may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. Accordingly, the scope of the present invention is not limited to the embodiments described above and should be construed to include the various embodiments within the scope of the claims and their equivalents.

Claims

A real time talking reality method comprising:
converting speech into text using speech recognition technology;
forming, using a vowel in the converted text, at least one 3D mouth shape data for a character's pronunciation of the converted text;
forming, using a word composed of at least one letter of the converted text, at least one 3D facial expression data for displaying the character's emotion;
generating a facial expression signal by mixing the formed at least one 3D mouth shape data and the formed 3D facial expression data;
converting the converted text into a voice signal using speech synthesis technology; and
synchronizing and outputting the generated facial expression signal and the converted voice signal.

The method of claim 1,
wherein forming the at least one 3D mouth shape data for the character's pronunciation of the converted text, using a vowel in the converted text, comprises:
retrieving a phonetic symbol pointer corresponding to the input vowel from a first database storing a phonetic symbol pointer corresponding to each of a plurality of vowels;
retrieving, from a third database storing, for each phonetic symbol pointer, inertia coefficient values of at least one specific vowel set for pronouncing the corresponding vowel, the inertia coefficient value of the at least one specific vowel for pronouncing the vowel corresponding to the retrieved phonetic symbol pointer; and
retrieving, from a fifth database storing, for each phonetic symbol pointer, 3D mouth shape data for each inertia coefficient value of the at least one specific vowel set for pronouncing the corresponding vowel, the 3D mouth shape data corresponding to each of the retrieved inertia coefficient values of the at least one specific vowel, and outputting the retrieved at least one 3D mouth shape data as the formed at least one 3D mouth shape data.

The method of claim 1,
wherein forming the at least one 3D facial expression data for displaying the character's emotion, using a word composed of at least one letter of the converted text, comprises:
retrieving, from a second database storing a facial expression pointer corresponding to each of a plurality of facial expression groups, the facial expression pointer corresponding to the facial expression group to which the input word composed of at least one letter belongs;
retrieving, from a fourth database storing, for each facial expression pointer, inertia coefficient values of at least one specific expression set for forming the expression of the corresponding facial expression group, the inertia coefficient value of the at least one specific expression set for forming the expression corresponding to the retrieved facial expression pointer; and
retrieving, from a sixth database storing, for each facial expression pointer, 3D facial expression data for each inertia coefficient value of the at least one specific expression set for forming the expression of the corresponding facial expression group, the 3D facial expression data corresponding to each of the retrieved inertia coefficient values of the at least one specific expression, and outputting the retrieved at least one 3D facial expression data as the formed 3D facial expression data.

The method according to any one of claims 1 to 3,
wherein the at least one specific vowel
comprises at least one of a, e, i, o, and u.

The method according to any one of claims 1 to 3,
wherein the plurality of facial expression groups
comprise at least one of a joy group containing words expressing joy, an anger group containing words expressing anger, a sadness group containing words expressing sadness, a pleasure group containing words expressing pleasure, and an expressionless group containing words other than those expressing joy, anger, sadness, and pleasure.

A real time talking reality device comprising:
a text converter for converting speech into text using speech recognition technology;
a mouth shape forming unit for forming, using a vowel in the converted text, at least one 3D mouth shape data for a character's pronunciation of the converted text;
a facial expression forming unit for forming, using a word composed of at least one letter of the converted text, at least one 3D facial expression data for displaying the character's emotion;
a facial expression signal generator for generating a facial expression signal by mixing the formed at least one 3D mouth shape data and the formed 3D facial expression data;
a voice converter for converting the converted text into a voice signal using speech synthesis technology; and
an output unit for synchronizing and outputting the generated facial expression signal and the converted voice signal.

The device according to claim 6,
wherein the mouth shape forming unit comprises:
a first database storing a phonetic symbol pointer corresponding to each of a plurality of vowels;
a first search unit for retrieving, from the first database, the phonetic symbol pointer corresponding to the input vowel;
a third database storing, for each phonetic symbol pointer, inertia coefficient values of at least one specific vowel set for pronouncing the corresponding vowel;
a third search unit for retrieving, from the third database, the inertia coefficient value of the at least one specific vowel for pronouncing the vowel corresponding to the retrieved phonetic symbol pointer;
a fifth database storing, for each phonetic symbol pointer, 3D mouth shape data for each inertia coefficient value of the at least one specific vowel set for pronouncing the corresponding vowel; and
a fifth search unit for retrieving, from the fifth database, the 3D mouth shape data corresponding to each of the retrieved inertia coefficient values of the at least one specific vowel, and outputting the retrieved at least one 3D mouth shape data as the formed at least one 3D mouth shape data.

The device according to claim 6,
wherein the facial expression forming unit comprises:
a second database storing a facial expression pointer corresponding to each of a plurality of facial expression groups;
a second search unit for retrieving, from the second database, the facial expression pointer corresponding to the facial expression group to which the input word composed of at least one letter belongs;
a fourth database storing, for each facial expression pointer, inertia coefficient values of at least one specific expression set for forming the expression of the corresponding facial expression group;
a fourth search unit for retrieving, from the fourth database, the inertia coefficient value of the at least one specific expression set for forming the expression corresponding to the retrieved facial expression pointer;
a sixth database storing, for each facial expression pointer, 3D facial expression data for each inertia coefficient value of the at least one specific expression set for forming the expression of the corresponding facial expression group; and
a sixth search unit for retrieving, from the sixth database, the 3D facial expression data corresponding to each of the retrieved inertia coefficient values of the at least one specific expression, and outputting the retrieved at least one 3D facial expression data as the formed 3D facial expression data.

The device according to any one of claims 6 to 8,
wherein the at least one specific vowel
comprises at least one of a, e, i, o, and u.

The device according to any one of claims 6 to 8,
wherein the plurality of facial expression groups
comprise at least one of a joy group containing words expressing joy, an anger group containing words expressing anger, a sadness group containing words expressing sadness, a pleasure group containing words expressing pleasure, and an expressionless group containing words other than those expressing joy, anger, sadness, and pleasure.