US20150073772A1 - Multilingual speech system and method of character - Google Patents

Multilingual speech system and method of character

Info

Publication number
US20150073772A1
Authority
US
United States
Prior art keywords
spoken words
unit
expression
multilingual
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/349,274
Inventor
Young Jin JUN
Se Kyung Song
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FUTURE ROBOT CO Ltd
Original Assignee
FUTURE ROBOT CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUTURE ROBOT CO Ltd filed Critical FUTURE ROBOT CO Ltd
Assigned to FUTURE ROBOT CO., LTD. Assignors: JUN, YOUNG JIN; SONG, SE KYUNG
Publication of US20150073772A1

Classifications

    • G06F 17/20
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/263 Language identification
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L 13/086 Detection of language


Abstract

The present invention relates to a multilingual speech system and method in which a two-dimensional or three-dimensional character speaks to express messages in multiple languages according to its surroundings, whereby messages such as consultations or guide services can be delivered precisely through the character. To accomplish this objective, the multilingual speech system of the character according to the present invention includes a context-aware unit to recognize the surroundings; a conversation selection unit to select spoken words in accordance with the recognized surroundings; a Unicode multilingual database in which the spoken words are stored in Unicode-based multiple languages according to the languages of respective nations; a behavior expression unit to express behaviors in accordance with the selected spoken words; and a work processing unit to synchronize and express the selected spoken words and the behaviors according to the spoken words.

Description

    TECHNICAL FIELD
  • The present invention relates to a system and method for providing a multilingual speech motion of a character, and more specifically, to a multilingual speech system and method in which a two-dimensional or three-dimensional character speaks to express messages in many languages according to its surroundings, so that messages such as consultations or guide services can be delivered precisely through the character.
  • BACKGROUND ART
  • With the growth of international exchange in recent years, the number of foreign visitors has increased rapidly around the world. Foreign visitors who have no geographical or cultural knowledge of the nation they are visiting need consultation or guide services in their own languages, so the need for experts who can use many languages is increasing.
  • This need grows further when global events such as the Olympic Games, the Asian Games, or the World Cup are held. Consultation and guide systems using guide robots have therefore been developed in recent years in place of such experts, and the guide robots provide consultation or guide services to foreign visitors in their own languages when necessary.
  • The guide robots display a two-dimensional or three-dimensional character on a screen to deliver consultation or guide services naturally, reproduce face looks and lip shapes like those of a real human, and can provide various kinds of information to foreign visitors as voices rendered in the language of each nation.
  • In a speech motion, a two-dimensional or three-dimensional character delivers various kinds of information to a user as voice: the data corresponding to the spoken words is prepared as text, and the text is output as voice. The speech system used for the character's speech motion linguistically interprets the input text and converts it into natural synthetic speech by synthesizing the interpreted text into voice; that is, it applies TTS (Text-To-Speech) technology.
  • TTS (Text-To-Speech) technology converts encoded character information into audible voice information. The symbolic character information differs according to the language or nation in which it is used, and character encoding maps it into sequences of bits with the binary values 0 and 1 that a computer can process.
  • The ASCII encoding system can represent a total of up to 128 characters using 7 bits. The ISO-8859-1 encoding system, a newer character set that adds the characters used in Western European nations to the existing ASCII character set, uses an 8-bit (1-byte) code system because the extended repertoire can no longer be accommodated by the 7-bit code system adopted for ASCII.
  • Representative character codes used in each region are as follows: Europe uses the ISO 8859 series and ISO 6937, the Middle East uses the ISO 8859 series, China uses GB2312-80, GBK, and BIG5, Japan uses native character codes such as JIS, and Korea uses native character codes such as KS X 1001.
  • When character information is encoded differently for each language as described above, separate sentences must be prepared per language in order to output the data corresponding to the spoken words, that is, the text, as voice. The language is first determined, for example by the user's explicit selection in the given situation; the sentences for that language are then fetched from a database that stores texts per language and are output as speech, that is, as voice.
  • In the prior multilingual speech system described above, there is a problem in that the character information is encoded differently for each language, so sentences written in different language codes cannot be spoken at the same time; after speaking in one language, another language must be designated and the corresponding sentences spoken again.
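  • As an illustration of this encoding mismatch (the short Python sketch below is added for explanation and is not part of the patent disclosure; the sample phrases and codec choices are assumptions), a Hangul sentence cannot be carried by a Western European legacy code, while a single Unicode encoding holds sentences of different languages together:

      korean = "안녕하세요."   # a Korean greeting
      english = "Hello."

      english.encode("ascii")    # ASCII covers English text
      korean.encode("euc_kr")    # the Korean national code (based on KS X 1001) covers Hangul

      try:
          korean.encode("iso8859-1")   # a Western European legacy code cannot hold Hangul
      except UnicodeEncodeError as err:
          print("legacy encoding fails:", err)

      # One Unicode encoding carries both sentences in a single stream.
      both = (korean + " " + english).encode("utf-16")
      print(len(both), "bytes when the two sentences share one UTF-16 stream")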
  • In addition, in the prior multilingual speech system, there is a problem in that separate rules must be made for the schemes that select among the many languages and for the order in which the sentences of each language are spoken, so the programs implementing the system become complex. As a result, there is a problem in that, once one language is selected, the system must be configured to speak in that single language until the specific situation has passed, rather than being structured so that the language can change from sentence to sentence.
  • Further, when feeling expression and speech motions formed in many languages are applied to a two-dimensional or three-dimensional character, the feeling expression and the speech motion have so far been performed sequentially and separately. For example, the lip-moving speech motion follows a feeling expression motion such as the character's laughing appearance, or a feeling expression motion such as a crying appearance is performed after the speech motion. Therefore, in order to strengthen the delivery of messages or stories through the motions of the two-dimensional or three-dimensional character, a technology is needed that can perform the speech motion simultaneously with feeling expression motions such as crying or laughing.
  • DISCLOSURE Technical Problem
  • An advantage of some aspects of the invention is that it provides a multilingual speech system and method of a character that solve the problem that, because the methods of encoding character information differ for each language, sentences in languages having different codes cannot be spoken at the same time when a two-dimensional or three-dimensional character performs speech motions expressing messages in many languages according to its surroundings.
  • Technical Solution
  • According to an aspect of the invention, there is provided a multilingual speech system of a character including a context-aware unit to recognize surroundings; a conversation selection unit to select spoken words in accordance with the recognized surroundings; a Unicode multilingual database in which the spoken words are stored in Unicode-based multiple languages according to the languages for each nation; a behavior expression unit to express behaviors in accordance with the selected spoken words; and a work processing unit to synchronize and express the selected spoken words and the behaviors according to the spoken words.
  • Preferably, the multilingual speech system of the character further includes a feeling production unit to select feelings in accordance with the recognized surroundings, wherein the work processing unit synchronizes and expresses the selected feelings and the behaviors according to the spoken words.
  • Further, the Unicode multilingual database additionally stores language identification information identifying the languages for each nation, and the conversation selection unit selects the spoken words in accordance with the corresponding languages by the language identification information.
  • The behavior expression unit includes a voice synthesis unit to output the selected spoken words into voices, and a face expression unit to display faces according to the selected spoken words on the screen.
  • Further, the voice synthesis unit includes a syntax analysis unit, which extracts from the spoken words the consonant and vowel information necessary for producing lip shapes and produces timing information for pronouncing the consonants and vowels that change the lip shapes, and a sound source production unit to produce sound sources corresponding to the selected spoken words and to output the produced sound sources as voices; and the face expression unit includes a feeling expression unit to select face looks corresponding to the feeling expression for the recognized surroundings and to display the selected face looks on the screen, and a speech expression unit to select the lip shapes necessary for representing the selected spoken words and to display the selected lip shapes on the screen.
  • The face expression unit further includes a look database to store the face looks into images, and a lip shape database to store the lip shapes into the images.
  • The work processing unit adds the selected feeling information to the produced sound sources, changes tones thereof, and outputs the changed tones into the voices.
  • According to another aspect of the invention, there is provided a multilingual speech method of a character including a context-aware step to recognize surroundings; a conversation selection step to select the spoken words in the corresponding language, by language identification information and in accordance with the recognized surroundings, from a Unicode multilingual database that stores, in Unicode-based multiple languages, the language identification information identifying the language of each nation and the spoken words corresponding to each language; and a behavior expression step to synchronize and express the selected spoken words and the behaviors according to the spoken words.
  • Preferably, the multilingual speech method of the character further includes a feeling production step to select feelings in accordance with the recognized surroundings, wherein the behavior expression step synchronizes and expresses the selected feeling and the behaviors according to the spoken words.
  • The behavior expression step includes a voice synthesis step to output the selected spoken words as voices, and a face expression step to display the faces according to the selected spoken words on the screen.
  • The voice synthesis step includes a syntax analysis step, which extracts from the spoken words the consonant and vowel information necessary for producing lip shapes and produces timing information for pronouncing the consonants and vowels that change the lip shapes, and a sound source production step to produce sound sources corresponding to the selected spoken words and to output the produced sound sources as voices; and the face expression step includes a feeling expression step to select face looks corresponding to the feeling expression for the recognized surroundings and to display the selected face looks on the screen, and a speech expression step to select the lip shapes necessary for representing the selected spoken words and to display the selected lip shapes on the screen.
  • Further, the sound source production step adds the selected feeling information to the produced sound sources, changes tones thereof, and outputs the changed tones into the voices.
  • Advantageous Effects
  • According to an embodiment of the present invention, there is a remarkable effect in that, by selecting the spoken words in the corresponding language from a Unicode multilingual database in which the spoken words of each nation's language are stored in Unicode-based multiple languages, and by outputting the selected spoken words as voices, the languages of multiple nations can be represented simultaneously and the function of speaking multiple languages at the same time in a specific situation can be processed easily.
  • According to another embodiment of the present invention, there is a remarkable effect in that, by including language identification information identifying each nation's language in the spoken words, a speech engine for the corresponding language can be used with the language identification information alone, without complex processing logic such as separate language-selection rules, and speech that simultaneously represents the languages of multiple nations can be composed flexibly.
  • According to yet another embodiment of the present invention, there is a remarkable effect in that a two-dimensional or three-dimensional character can simultaneously express face looks enriched with various feeling expressions and speak messages in many languages.
  • DESCRIPTION OF DRAWINGS
  • FIG. 1 is a block view representing configurations of a multilingual speech system of a character according to the present invention.
  • FIG. 2 is a flow view describing a multilingual speech method of a character according to the present invention.
  • FIG. 3 is a flow view describing steps synchronizing and then expressing feelings and behaviors, when a character speaks in many languages, according to the present invention.
  • MODE FOR INVENTION
  • A multilingual speech system and method of a character according to the present invention provide a technique that can represent the languages of various nations simultaneously and can easily speak many languages at the same time in a specific situation, when a two-dimensional or three-dimensional character performs speech motions expressing messages in many languages according to its surroundings.
  • Hereinafter, exemplary embodiments, advantages, and characteristics of the present invention will be described with reference to the enclosed drawings.
  • FIG. 1 is a block view representing the configuration of a multilingual speech system of a character according to the present invention. Referring to FIG. 1, a multilingual speech system 100 of a character according to the present invention includes a context-aware unit 110 to recognize surroundings, a conversation selection unit 130 to select spoken words in accordance with the recognized surroundings, a Unicode multilingual database 135 in which the spoken words are stored in Unicode-based multiple languages according to the languages for each nation, a behavior expression unit 140 to express behaviors in accordance with the selected spoken words, and a work processing unit 150 to synchronize and express the selected spoken words and the behaviors according to the spoken words.
  • Preferably, the multilingual speech system 100 of the character according to the present invention further includes a feeling production unit 120 to select feelings in accordance with the recognized surroundings, wherein the work processing unit 150 may synchronize and express the selected feelings and the behaviors according to the spoken words.
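  • The following Python skeleton (added for illustration; the class and method names are assumptions, not the patent's implementation) shows one way the units of FIG. 1 could be composed, with context awareness feeding feeling production and conversation selection, and the work processing unit synchronizing the output:

      class MultilingualSpeechSystem:                              # system 100
          def __init__(self, context_aware, feeling_production,
                       conversation_selection, behavior_expression, work_processing):
              self.context_aware = context_aware                   # unit 110
              self.feeling_production = feeling_production         # unit 120
              self.conversation_selection = conversation_selection # unit 130 with DB 135
              self.behavior_expression = behavior_expression       # unit 140
              self.work_processing = work_processing               # unit 150

          def run_once(self):
              situation = self.context_aware.recognize()               # e.g. a customer approaches
              feeling = self.feeling_production.select(situation)      # e.g. a laughing feeling
              words = self.conversation_selection.select(situation)    # Unicode spoken words
              behavior = self.behavior_expression.plan(words)          # voice and face plan
              self.work_processing.synchronize(words, feeling, behavior)  # express them together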
  • The context-aware unit 110 recognizes the surroundings of the character; for example, it recognizes that customers have approached when they come within a certain distance of the character. The context-aware unit 110 may be implemented by a system that captures the surroundings with a camera and analyzes the captured images, or by any kind of sensor capable of recognizing the surroundings.
  • The feeling production unit 120 selects the feelings in accordance with the surroundings recognized by the context-aware unit 110. For example, when the context-aware unit 110 recognizes that a customer has approached, the feeling production unit 120 selects a feeling expression such as laughter. The feelings may be selected according to the surroundings recognized by the context-aware unit 110, or feelings and spoken words chosen arbitrarily by a user may be input.
  • The conversation selection unit 130 selects the spoken words in accordance with the surroundings recognized by the context-aware unit 110. That is, when the context-aware unit 110 recognizes that a customer has approached, it selects, for example, the spoken words "Hi, come on in." The spoken words may be selected according to the recognized surroundings, or spoken words chosen arbitrarily by the user may be input.
  • Further, the conversation selection unit 130 selects the spoken words per nation's language and can thus select the spoken words in the corresponding language. The language may be determined either when the context-aware unit 110 recognizes the relevant language or arbitrarily by the user's explicit selection, and the spoken words corresponding to that language are then selected.
  • The Unicode multilingual database 135 stores data corresponding to the spoken words in the Unicode-based multiple languages in accordance with the languages for each nation.
  • Unicode, also called the UCS (Universal Character Set), is a 2-byte code established as an international standard and in universal common use. Unicode standardizes the value given to one character at 16 bits to smooth the exchange of data: in the prior art the value per character was 7 bits for English, 8 bits for other non-English alphabets, and 16 bits for Korean or Japanese, whereas under Unicode all of these values are standardized at 16 bits. The standard, as ISO/IEC 10646-1, assigns code values one by one to the characters and special symbols of 26 languages used around the world.
  • Unicode, which overcomes the limits of earlier ASCII and is intended to be compatible with all of the world's languages, is a character code designed internationally to represent every language used by humans, and a single large character set that subsumes the encoding structures of the existing languages.
  • Therefore, because the Unicode multilingual database 135 stores the spoken words of each nation's language in Unicode-based multiple languages, the conversation selection unit 130 can select the spoken words per language from the Unicode multilingual database 135, represent the languages of many nations simultaneously without collisions, and speak many languages at the same time in a specific situation.
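  • A minimal sketch of such a database (added for illustration; the situation keys and the phrases other than "Hi, come on in." are assumptions) shows how Unicode lets spoken words in several scripts live side by side and be selected together:

      # Unicode multilingual database of spoken words, keyed by recognized situation
      # and by language. Python str values are Unicode, so all scripts coexist.
      UNICODE_MULTILINGUAL_DB = {
          "customer_approach": {
              "korean":   "안녕하세요. 어서 오세요.",
              "english":  "Hi, come on in.",
              "chinese":  "你好，欢迎光临。",
              "japanese": "こんにちは。いらっしゃいませ。",
          },
      }

      def select_spoken_words(situation, languages):
          """Select the spoken words for a recognized situation in one or more languages."""
          entry = UNICODE_MULTILINGUAL_DB[situation]
          return [entry[lang] for lang in languages]

      # The same situation can be spoken in several languages at the same time.
      print(" / ".join(select_spoken_words("customer_approach", ["korean", "english"])))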
  • Preferably, the Unicode multilingual database 135 additionally stores language identification information identifying the language of each nation, and the conversation selection unit 130 may select the spoken words in the corresponding language by the language identification information. This makes it possible to select the spoken words written in the corresponding language using the language identification information alone, without complex processing logic such as separate language-selection rules, and to flexibly compose speech motions that represent the languages of multiple nations at the same time.
  • When the conversation selection unit 130 is to select, for example, the spoken words "안녕하세요." ("Hello."), it can select the spoken words written in Hangul by language identification information of the form {<lang type=korean>안녕하세요.</lang>}, and can select the spoken words written in English by language identification information of the form {<lang type=english>Hello.</lang>}, so that the spoken words of the corresponding language are selected flexibly by the language identification information alone.
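  • A short sketch of how such tagged spoken words might be processed (added for illustration; the regular expression and function name are assumptions based only on the tag scheme shown above) splits the text by language identification information so that each segment can be handed to the speech engine of its language:

      import re

      # Matches segments of the form <lang type=...>...</lang> in the order they appear.
      LANG_TAG = re.compile(r"<lang type=(\w+)>(.*?)</lang>", re.DOTALL)

      def split_by_language(tagged_text):
          """Return (language, spoken words) pairs in their original order."""
          return LANG_TAG.findall(tagged_text)

      tagged = "<lang type=korean>안녕하세요.</lang><lang type=english>Hello.</lang>"
      for language, words in split_by_language(tagged):
          # A full system would route `words` to the TTS voice for `language`.
          print(language, "->", words)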
  • The behavior expression unit 140 expresses the behaviors according to the spoken words selected by the conversation selection unit 130. Preferably, the behavior expression unit 140 includes a voice synthesis unit 142 to output the selected spoken words as voices, and a face expression unit 145 to display the faces according to the selected spoken words on the screen.
  • The voice synthesis unit 142 includes a syntax analysis unit 143, which extracts from the spoken words selected by the conversation selection unit 130 the consonant and vowel information necessary for producing lip shapes and produces timing information for pronouncing the consonants and vowels that change the lip shapes, and a sound source production unit 144, which produces sound sources corresponding to the spoken words selected by the conversation selection unit 130 and outputs the produced sound sources as voices.
  • In addition, the face expression unit 145 includes a feeling expression unit 146 to select face looks corresponding to the feeling expression for the surroundings recognized by the context-aware unit 110 and to display the selected face looks on the screen, and a speech expression unit 148 to select the lip shapes necessary for representing the spoken words selected by the conversation selection unit 130 and to display the selected lip shapes on the screen.
  • The face expression unit 145 further includes a look database 147 that stores the face looks as images, and the feeling expression unit 146 selects the face looks corresponding to the feeling expression for the surroundings from the face look images stored in the look database 147 and displays the selected face looks on the screen.
  • Further, the face expression unit 145 includes a lip shape database 149 that stores the lip shapes as images, and the speech expression unit 148 selects the lip shapes necessary for representing the spoken words from the lip shape images stored in the lip shape database 149 and displays the selected lip shapes on the screen.
  • The work processing unit 150 synchronizes and expresses the selected spoken words and the behaviors according to the spoken words. When the system further includes the feeling production unit 120, the work processing unit 150 also synchronizes and expresses the selected feelings with the behaviors according to the spoken words. Using the syntax analysis unit 143, the work processing unit 150 analyzes the consonants and vowels of the spoken words, selects the lip shapes mainly from the vowels, which change the lip shapes most significantly, and, for consonants that close the lips, may select a closed-lip shape before the next vowel is selected.
  • The work processing unit 150 adds the selected feeling information to the produced sound sources, changes their tone, and outputs the result as voice. In this way, the work processing unit 150 matches feeling information such as laughter with face looks such as a laughing face, outputs the spoken words as voice with the changed tone, and provides the user with a character whose lip shapes follow the consonants and vowels of the spoken words.
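  • The sketch below (added for illustration) turns a phoneme sequence into timed lip-shape events in the spirit of the description above; the vowel-to-lip-shape table, the lip-closing consonant set, and the fixed duration are assumptions, since the patent only specifies that vowels drive the lip shapes and that lip-closing consonants insert a closed-lip shape before the next vowel:

      # Assumed mapping from vowels to lip shapes and a fixed duration per shape.
      VOWEL_SHAPES = {"a": "open_wide", "e": "spread", "eo": "open_mid",
                      "i": "spread", "o": "round", "u": "round"}
      LIP_CLOSING_CONSONANTS = {"m", "b", "p"}
      SHAPE_MS = 120  # assumed duration of one lip shape, in milliseconds

      def lip_shape_timeline(phonemes):
          """Turn a phoneme sequence into (lip shape, duration) events for the face."""
          timeline = []
          for ph in phonemes:
              if ph in LIP_CLOSING_CONSONANTS:
                  timeline.append(("closed", SHAPE_MS))          # close the lips before the next vowel
              elif ph in VOWEL_SHAPES:
                  timeline.append((VOWEL_SHAPES[ph], SHAPE_MS))  # vowels change the lip shape the most
          return timeline

      # A rough romanized phoneme list for "안녕하세요" (annyeonghaseyo).
      print(lip_shape_timeline(["a", "n", "ny", "eo", "ng", "h", "a", "s", "e", "y", "o"]))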
  • FIG. 2 is a flow view describing a multilingual speech method of the character according to the present invention. FIG. 3 is a flow view describing steps synchronizing and then expressing the feelings and behaviors, when a character speaks in many languages, according to the present invention.
  • Referring to FIG. 2 and FIG. 3, a multilingual speech method 200 of a character according to the present invention includes a context-aware step S210 to recognize the surroundings; a conversation selection step S230 to select the spoken words in the corresponding language, by the language identification information and in accordance with the recognized surroundings, from the Unicode multilingual database 135, which stores, in Unicode-based multiple languages, the language identification information identifying each nation's language and the spoken words corresponding to each language; and a behavior expression step S240 to synchronize and express the selected spoken words and the behaviors according to the spoken words.
  • Preferably, the multilingual speech method 200 of the character further includes a feeling production step S220 to select the feelings according to the surroundings recognized in the context-aware step S210, wherein the behavior expression step S240 synchronizes and expresses the selected feelings and the behaviors according to the spoken words.
  • The context-aware step S210 recognizes the surroundings through the context-aware unit 110. The context-aware unit 110 may be implemented by a system that captures the surroundings with a camera and analyzes the captured images, or by any kind of sensor capable of recognizing the surroundings.
  • Next, the feeling production step S220 selects the feelings according to the recognized surroundings through the feeling production unit 120. The feelings may be selected according to the surroundings recognized by the context-aware unit 110, or feelings and spoken words chosen arbitrarily by the user may be input.
  • In the conversation selection step S230, the conversation selection unit 130 selects the spoken words in the corresponding language, by the language identification information and in accordance with the recognized surroundings, from the Unicode multilingual database 135, which stores, in Unicode-based multiple languages, the language identification information identifying each nation's language and the spoken words corresponding to each language.
  • That is, the conversation selection step S230 determines the corresponding language by the language identification information stored in the Unicode multilingual database 135 and selects the spoken words in that language. This makes it possible to select the spoken words written in the corresponding language using the language identification information alone, without complex processing logic such as separate language-selection rules, and to flexibly compose speech motions that represent the languages of multiple nations at the same time.
  • The behavior expression step S240 synchronizes the spoken words selected by the conversation selection unit 130 with the behaviors according to the spoken words and expresses them through the behavior expression unit 140. Further, when the method includes the feeling production step S220, the behavior expression step S240 synchronizes the feelings selected by the feeling production unit 120 with the behaviors according to the spoken words selected by the conversation selection unit 130 and expresses them through the behavior expression unit 140. Preferably, the behavior expression step S240 includes a voice synthesis step S242 to output the selected spoken words as voices by the voice synthesis unit 142, and a face expression step S245 to display the faces according to the selected spoken words on the screen by the face expression unit 145.
  • The voice synthesis step S242 includes a syntax analysis step S243, in which the syntax analysis unit 143 extracts from the spoken words the consonant and vowel information necessary for producing lip shapes and produces timing information for pronouncing the consonants and vowels that change the lip shapes, and a sound source production step S244, in which the sound source production unit 144 produces the sound sources corresponding to the selected spoken words and outputs the produced sound sources as voices.
  • In the sound source production step S244, the work processing unit 150 adds the selected feeling information to the produced sound sources and changes their tone, and the sound source production unit 144 outputs the result as voice.
  • In addition, the face expression step S245 includes a feeling expression step S246 to select the face looks corresponding to the feeling expression according to the recognized surroundings and to display the selected face looks on the screen, and a speech expression step S248 to select the lip shapes necessary for representing the selected spoken words and to display the selected lip shapes on the screen.
  • The feeling expression step S246 selects the face looks from the look database 147, which stores the face looks as images, and displays the selected face looks on the screen through the feeling expression unit 146; and the speech expression step S248 selects the lip shapes necessary for representing the selected spoken words from the lip shape database 149, which stores the lip shapes as images, and displays the selected lip shapes on the screen through the speech expression unit 148.
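  • An end-to-end sketch of the method of FIG. 2 and FIG. 3 (added for illustration; the helper objects and their method names are assumptions standing in for the units described above, and only the step ordering follows the text) ties the steps together:

      def multilingual_speech_method(context_aware, feeling_production,
                                     conversation_selection, behavior_expression,
                                     work_processing, languages):
          situation = context_aware.recognize()                        # S210 context-aware step
          feeling = feeling_production.select(situation)               # S220 feeling production step
          words = conversation_selection.select(situation, languages)  # S230 selection from Unicode DB 135

          timing = behavior_expression.analyze_syntax(words)           # S243 consonant/vowel timing
          sound = behavior_expression.produce_sound(words, feeling)    # S244 sound source with changed tone
          face = behavior_expression.select_face_look(feeling)         # S246 feeling expression
          lips = behavior_expression.select_lip_shapes(timing)         # S248 speech expression

          work_processing.synchronize(sound, face, lips)               # S240/S242/S245: express together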
  • Although embodiments have been described with reference to a number of illustrative embodiments thereof, it should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure. More particularly, various variations and modifications are possible in the component parts and/or arrangements of the subject combination arrangement within the scope of the disclosure, the drawings and the appended claims. In addition to variations and modifications in the component parts and/or arrangements, alternative uses will also be apparent to those skilled in the art.

Claims (17)

1. A multilingual speech system of a character, comprising:
a context-aware unit to recognize surroundings;
a conversation selection unit to select spoken words in accordance with the recognized surroundings;
a Unicode multilingual database in which the spoken words are stored in Unicode-based multiple languages according to languages for each nation;
a behavior expression unit to express behaviors in accordance with the selected spoken words; and
a work processing unit to synchronize and express the selected spoken words and the behaviors according to the spoken words.
2. The multilingual speech system of the character according to claim 1, further comprising a feeling production unit to select feelings in accordance with the recognized surroundings,
wherein the work processing unit synchronizes and expresses the selected feelings and the behaviors according to the spoken words.
3. The multilingual speech system of the character according to claim 1, wherein the Unicode multilingual database additionally stores language identification information identifying the languages for each nation, and the conversation selection unit selects the spoken words in accordance with the corresponding language by the language identification information.
4. The multilingual speech system of the character according to claim 3, wherein the behavior expression unit includes a voice synthesis unit to output the selected spoken words into voices, and a face expression unit to display the faces according to the selected spoken words on the screen.
5. The multilingual speech system of the character according to claim 4, wherein the voice synthesis unit extracts consonant and vowel information necessary for producing lip shapes from the spoken words, includes a syntax analysis unit to produce time information pronouncing consonants and vowels changing the lip shapes, and a sound source production unit to produce sound sources corresponding to the selected spoken words and to output the produced sound sources into the voices, and the face expression unit includes a feeling expression unit to select face looks corresponding to feeling expression according to the recognized surroundings and to display the selected face looks on the screen, and a speech expression unit to select the lip shapes necessary for representing the selected spoken words and to display the selected lip shapes on the screen.
6. The multilingual speech system of the character according to claim 5, wherein the face expression unit further includes a look database to store the face looks as images, and a lip shape database to store the lip shapes as images.
7. The multilingual speech system of the character according to claim 5, wherein the work processing unit adds the selected feeling information to the produced sound sources, changes the tones thereof, and outputs the tone-changed sound sources as voices.
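Claim 7 describes modifying the produced sound source according to the selected feeling before it is voiced. One purely illustrative way to express that is to map each feeling to pitch and rate scaling factors that a synthesizer could apply; the factor values and the dictionary interface below are assumptions, not the disclosed method.

```python
# Hypothetical mapping from a selected feeling to tone-change parameters.

FEELING_TO_TONE = {
    "joy":     {"pitch_scale": 1.15, "rate_scale": 1.10},
    "sadness": {"pitch_scale": 0.90, "rate_scale": 0.85},
    "neutral": {"pitch_scale": 1.00, "rate_scale": 1.00},
}

def apply_feeling(sound_source: dict, feeling: str) -> dict:
    """Return a copy of the sound source with the feeling's tone parameters attached."""
    tone = FEELING_TO_TONE.get(feeling, FEELING_TO_TONE["neutral"])
    return {**sound_source, **tone}

source = {"text": "Welcome to our store", "voice": "female_1"}
print(apply_feeling(source, "joy"))
# -> {'text': 'Welcome to our store', 'voice': 'female_1', 'pitch_scale': 1.15, 'rate_scale': 1.1}
```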
8. A multilingual speech method of a character, comprising:
a context-aware step to recognize surroundings;
a conversation selection step to select, in accordance with the recognized surroundings, the spoken words in the corresponding language by means of language identification information from a Unicode multilingual database that stores, in Unicode-based multiple languages, the language identification information identifying the language of each nation and the spoken words corresponding to the language of each nation; and
a behavior expression step to synchronize and express the selected spoken words and the behaviors according to the spoken words.
9. The multilingual speech method of the character according to claim 8, further comprising a feeling production step to select the feelings in accordance with the recognized surroundings,
wherein the behavior expression step synchronizes and expresses the selected feelings and the behaviors according to the spoken words.
10. The multilingual speech method of the character according to claim 9, wherein the behavior expression step includes a voice synthesis step to output the selected spoken words as voices, and a face expression step to display, on the screen, faces corresponding to the selected spoken words.
11. The multilingual speech method of the character according to claim 10, wherein the voice synthesis step extracts consonant and vowel information necessary for producing lip shapes from the spoken words and includes a syntax analysis step to produce timing information for pronouncing the consonants and vowels that change the lip shapes, and a sound source production step to produce sound sources corresponding to the selected spoken words and to output the produced sound sources as voices, and wherein the face expression step includes a feeling expression step to select face looks corresponding to the feeling expression according to the recognized surroundings and to display the selected face looks on the screen, and a speech expression step to select the lip shapes necessary for representing the selected spoken words and to display the selected lip shapes on the screen.
12. The multilingual speech method of the character according to claim 11, wherein the sound source production step adds the selected feeling information to the produced sound sources, changes the tones thereof, and outputs the tone-changed sound sources as voices.
13. The multilingual speech system of the character according to claim 2, wherein the Unicode multilingual database additionally stores language identification information identifying the language of each nation, and the conversation selection unit selects the spoken words in the corresponding language by means of the language identification information.
14. The multilingual speech system of the character according to claim 13, wherein the behavior expression unit includes a voice synthesis unit to output the selected spoken words as voices, and a face expression unit to display, on the screen, faces corresponding to the selected spoken words.
15. The multilingual speech system of the character according to claim 14, wherein the voice synthesis unit extracts consonant and vowel information necessary for producing lip shapes from the spoken words and includes a syntax analysis unit to produce timing information for pronouncing the consonants and vowels that change the lip shapes, and a sound source production unit to produce sound sources corresponding to the selected spoken words and to output the produced sound sources as voices, and wherein the face expression unit includes a feeling expression unit to select face looks corresponding to the feeling expression according to the recognized surroundings and to display the selected face looks on the screen, and a speech expression unit to select the lip shapes necessary for representing the selected spoken words and to display the selected lip shapes on the screen.
16. The multilingual speech system of the character according to claim 15, wherein the face expression unit further includes a look database to store the face looks as images, and a lip shape database to store the lip shapes as images.
17. The multilingual speech system of the character according to claim 15, wherein the work processing unit adds the selected feeling information to the produced sound sources, changes the tones thereof, and outputs the tone-changed sound sources as voices.
US14/349,274 2011-11-21 2012-07-18 Multilingual speech system and method of character Abandoned US20150073772A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
KR1020110121825A KR101358999B1 (en) 2011-11-21 2011-11-21 method and system for multi language speech in charactor
KR10-2011-0121825 2011-11-21
PCT/KR2012/005722 WO2013077527A1 (en) 2011-11-21 2012-07-18 Multilingual speech system and method of character

Publications (1)

Publication Number Publication Date
US20150073772A1 (en) 2015-03-12

Family ID=48469940

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/349,274 Abandoned US20150073772A1 (en) 2011-11-21 2012-07-18 Multilingual speech system and method of character

Country Status (5)

Country Link
US (1) US20150073772A1 (en)
EP (1) EP2772906A4 (en)
KR (1) KR101358999B1 (en)
CN (1) CN104011791A (en)
WO (1) WO2013077527A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101388633B1 (en) * 2014-02-10 2014-04-24 대한민국 System and method of virtual interactive interrogation training
US10339916B2 (en) * 2015-08-31 2019-07-02 Microsoft Technology Licensing, Llc Generation and application of universal hypothesis ranking model
CN108475503B (en) * 2015-10-15 2023-09-22 交互智能集团有限公司 System and method for multilingual communication sequencing
JP6901992B2 (en) * 2018-04-17 2021-07-14 株式会社日立ビルシステム Guidance robot system and language selection method
US20230032760A1 (en) * 2021-08-02 2023-02-02 Bear Robotics, Inc. Method, system, and non-transitory computer-readable recording medium for controlling a serving robot


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100240637B1 (en) * 1997-05-08 2000-01-15 정선종 Syntax for tts input data to synchronize with multimedia
KR20020022504A (en) * 2000-09-20 2002-03-27 박종만 System and method for 3D animation authoring with motion control, facial animation, lip synchronizing and lip synchronized voice
KR100706967B1 (en) * 2005-02-15 2007-04-11 에스케이 텔레콤주식회사 Method and System for Providing News Information by Using Three Dimensional Character for Use in Wireless Communication Network
KR100945495B1 (en) * 2008-05-16 2010-03-09 한국과학기술정보연구원 System and Method for providing terminology resource
KR101089184B1 (en) * 2010-01-06 2011-12-02 (주) 퓨처로봇 Method and system for providing a speech and expression of emotion in 3D charactor

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060074689A1 (en) * 2002-05-16 2006-04-06 At&T Corp. System and method of providing conversational visual prosody for talking heads
US20050275558A1 (en) * 2004-06-14 2005-12-15 Papadimitriou Wanda G Voice interaction with and control of inspection equipment
US20080269958A1 (en) * 2007-04-26 2008-10-30 Ford Global Technologies, Llc Emotive advisory system and method
US20140277735A1 (en) * 2013-03-15 2014-09-18 JIBO, Inc. Apparatus and methods for providing a persistent companion device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu, Zhiyong, et al. "A unified framework for multilingual text-to-speech synthesis with SSML specification as interface." Tsinghua Science & Technology 14.5 (2009): 623-630. *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170357636A1 (en) * 2016-06-13 2017-12-14 Sap Se Real time animation generator for voice content representation
US10304013B2 (en) * 2016-06-13 2019-05-28 Sap Se Real time animation generator for voice content representation
JP2019162714A (en) * 2016-08-29 2019-09-26 Groove X株式会社 Robot recognizing direction of sound source
US11376740B2 (en) 2016-08-29 2022-07-05 Groove X, Inc. Autonomously acting robot that recognizes direction of sound source

Also Published As

Publication number Publication date
EP2772906A4 (en) 2015-06-17
WO2013077527A1 (en) 2013-05-30
EP2772906A1 (en) 2014-09-03
CN104011791A (en) 2014-08-27
KR101358999B1 (en) 2014-02-07
KR20130056078A (en) 2013-05-29

Similar Documents

Publication Publication Date Title
CN106653052B (en) Virtual human face animation generation method and device
US20150073772A1 (en) Multilingual speech system and method of character
US8224652B2 (en) Speech and text driven HMM-based body animation synthesis
US11482134B2 (en) Method, apparatus, and terminal for providing sign language video reflecting appearance of conversation partner
WO2017112813A1 (en) Multi-lingual virtual personal assistant
CN114401438B (en) Video generation method and device for virtual digital person, storage medium and terminal
CN112086086A (en) Speech synthesis method, device, equipment and computer readable storage medium
CN111145720A (en) Method, system, device and storage medium for converting text into voice
US20220335079A1 (en) Method for generating virtual image, device and storage medium
CN110517668B (en) Chinese and English mixed speech recognition system and method
CN114495927A (en) Multi-modal interactive virtual digital person generation method and device, storage medium and terminal
KR101089184B1 (en) Method and system for providing a speech and expression of emotion in 3D charactor
KR20170062089A (en) Method and program for making the real-time face of 3d avatar
CN112463942A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN114401431A (en) Virtual human explanation video generation method and related device
Fellbaum et al. Principles of electronic speech processing with applications for people with disabilities
KR100897149B1 (en) Apparatus and method for synchronizing text analysis-based lip shape
CN102970618A (en) Video on demand method based on syllable identification
Dreuw et al. The signspeak project-bridging the gap between signers and speakers
Karpov et al. Multimodal synthesizer for Russian and Czech sign languages and audio-visual speech
CN110781327B (en) Image searching method and device, terminal equipment and storage medium
Bear et al. Some observations on computer lip-reading: moving from the dream to the reality
CN114694633A (en) Speech synthesis method, apparatus, device and storage medium
CN115171673A (en) Role portrait based communication auxiliary method and device and storage medium
JP2010102564A (en) Emotion specifying device, emotion specification method, program, and recording medium

Legal Events

Date Code Title Description
AS Assignment

Owner name: FUTURE ROBOT CO., LTD., KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JUN, YOUNG JIN;SONG, SE KYUNG;REEL/FRAME:032594/0982

Effective date: 20140325

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION