WO2007136208A1 - Method and system of video phone calling using talker sensitive avatar - Google Patents

Method and system of video phone calling using talker sensitive avatar

Info

Publication number
WO2007136208A1
Authority
WO
WIPO (PCT)
Prior art keywords
avatar
speaker
shape
counterpart
user
Prior art date
Application number
PCT/KR2007/002449
Other languages
French (fr)
Inventor
Seongho Lim
Original Assignee
Seongho Lim
Priority date
Filing date
Publication date
Priority claimed from KR20060052287A external-priority patent/KR100768666B1/en
Application filed by Seongho Lim filed Critical Seongho Lim
Publication of WO2007136208A1 publication Critical patent/WO2007136208A1/en

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/131 - Protocols for games, networked simulations or virtual reality


Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention relates to a video communication method using an avatar naturally moving in accordance with a speaker, and a system therefor. There is provided a method of transmitting an avatar while performing video communication, the method comprising: a speaker determining step of monitoring the user's and the counterpart's voices and determining the current speaker; and an avatar motion step of retrieving listening-shape or speaking-shape avatar data and transmitting the retrieved avatar data to the counterpart's videophone once a speaker is determined. According to the present invention, video communication using an avatar acting naturally according to the speaker is enabled.

Description

METHOD AND SYSTEM OF VIDEO PHONE CALLING USING TALKER SENSITIVE AVATAR
Technical Field
[1] The present invention relates to a video communication method using an avatar naturally moving in accordance with a speaker, and a system therefor. More specifically, the present invention relates to a method and system in which a user transmits an avatar instead of the user's real image while communicating on a telephone capable of video communication, and the user's and the counterpart's voices are monitored so that the avatar moves naturally in accordance with the current speaker.
Background Art
[2] A conventional video communication method using an avatar is a technique that retrieves and transmits data for a speaking-shape avatar when the user speaks, while a non-speaking-shape avatar is transmitted otherwise.
[3] However, since only a simple non-speaking-shape avatar of the user is transmitted while the counterpart speaks, and the avatar is thus controlled inappropriately, communication using the avatar is unnatural and inconvenient.
Disclosure of Invention
Technical Problem
[4] Accordingly, the present invention has been made in order to solve the above problems, and it is an object of the present invention to provide a video communication method and a system therefor, in which, while a user communicates using a videophone, a listening-shape or speaking-shape user avatar is displayed on the counterpart's videophone in accordance with the current speaker.
Technical Solution
[5] In order to accomplish the above object of the invention, according to one aspect of the invention, there is provided a video communication method using an avatar naturally moving in accordance with a speaker, the video communication method comprising the steps of: a speaker determining step for monitoring user's and counterpart's voices and determining a current speaker; and an avatar motion step for retrieving listening-shape avatar data or speaking-shape avatar data and transmitting the retrieved avatar data to a counterpart's videophone, if a speaker is determined, whereby the avatar is transmitted while video communicating.
[6] Here, an energy sensing technique that simply detects changes in amplitude or frequency, a more specialized voice activity detection (VAD) technique, or the like can be used in the speaker determining step.
[7] Here, in the avatar motion step, either a speaking-shape avatar or a listening-shape avatar is selected and transmitted once a speaker is determined.
[8] At this point, if the user's voice and the counterpart's voice are detected simultaneously, the user is determined as the current speaker, and the speaking-shape avatar is transmitted.
[9] In addition, once a speaker is determined, the speaker's voice can be analyzed using a variety of voice analysis techniques, so that lip movements, speaking patterns, or listening patterns can be distinguished in further detail.
[10] Examples of such voice analysis techniques include voice magnitude and pattern analysis, phoneme recognition, speech recognition, and the like.
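To make the speaker determining step concrete, the following is a minimal sketch of the energy sensing approach named in paragraph [6], including the tie-breaking rule of paragraph [8]; the frame size, threshold value, and function names are assumptions for illustration, not details given in the patent.

```python
import numpy as np

SPEAKING, LISTENING, SILENT = "speaking", "listening", "silent"

def frame_energy(samples: np.ndarray) -> float:
    """Mean squared amplitude of one short audio frame."""
    return float(np.mean(samples.astype(np.float64) ** 2))

def determine_speaker(user_frame: np.ndarray,
                      counterpart_frame: np.ndarray,
                      threshold: float = 1e-4) -> str:
    """Decide who is currently speaking from one pair of audio frames.

    Implements paragraphs [6]-[8]: simple energy sensing on both channels,
    and if both voices are detected at the same time the user is treated as
    the current speaker, so the speaking-shape avatar wins.
    """
    user_active = frame_energy(user_frame) > threshold
    counterpart_active = frame_energy(counterpart_frame) > threshold

    if user_active:              # covers the simultaneous-speech case
        return SPEAKING
    if counterpart_active:
        return LISTENING
    return SILENT                # neither side talks: keep the non-speaking shape
```

A production speaker determining unit would smooth these per-frame decisions (for example with hangover frames or hysteresis) or substitute a proper VAD, as paragraph [6] allows.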
Advantageous Effects
[11] According to the present invention, when a user speaks while performing avatar communication, a counterpart will see a speaking-shape avatar, and when the counterpart speaks, the counterpart will see a listening-shape avatar.
[12] That is, by determining the speaker, the user's avatar is controlled in real time to clearly distinguish listening from speaking forms or motions, and thus the user can enjoy video communication with an avatar that responds more naturally to the speaker.
Brief Description of the Drawings
[13] Further objects and advantages of the invention can be more fully understood from the following detailed description taken in conjunction with the accompanying drawings in which:
[14] FIG. 1 is a conceptual view showing the connectivity of a video communication system using an avatar moving in accordance with a speaker according to an embodiment of the invention;
[15] FIG. 2 is a view showing the configuration of a videophone used for a video communication system using an avatar moving in accordance with a speaker;
[16] FIG. 3 is a flowchart illustrating a method of operating an avatar of a video communication system using an avatar moving in accordance with a speaker; and
[17] FIG. 4 is a view showing the configuration of an avatar server connected to a video communication system using an avatar moving in accordance with a speaker according to another embodiment of the invention.
[18]
Mode for the Invention
[19] Hereinafter, a video communication method using an avatar naturally moving in accordance with a speaker and a system therefor according to a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
[20] Embodiment 1
[21] FIG. 1 is a conceptual view showing the connectivity of a video communication system using an avatar naturally moving in accordance with a speaker (hereinafter, referred to as a video communication system) according to an embodiment of the invention, and the embodiment is described referring to FIG. 1.
[22] The video communication system comprises a plurality of videophones 100 and 200, an avatar server 300, and a telephone network 400.
[23] The videophones 100 and 200 are divided into a transmitter videophone 100 and a receiver videophone 200, i.e., any kind of telephone that can transfer a user's image to a counterpart through a camera. That is, the videophone can be any kind of terminal, including a general wired telephone, a mobile communication terminal, a personal digital assistant (PDA), and the like.
[24] The avatar server 300 stores a plurality of avatar data and transfers an avatar if a user requests to transmit a specific avatar.
[25] The telephone network 400 comprises all kinds of general wired or wireless telephone communication networks, i.e., a communication network capable of voice, video, or data communication.
[26] FIG. 2 is a view showing the configuration of a videophone used for a video communication system, and the embodiment is described referring to FIG. 2.
[27] The videophone 100 or 200 refers to both the transmitter-side and the receiver-side terminals. Here, mainly the transmitter-side terminal is described for convenience.
[28] The videophone 100 comprises a camera 102, a microphone 104, a keypad 106, a video and audio mixing unit 108, an avatar output unit 110, a speaker determining unit 112, an avatar database (DB) 114, a control unit 116, a speaker 151, a video and audio processing unit 153, and a display unit 154.
[29] The camera 102 is an apparatus for photographing an image of a user and converting the image into an electrical signal, i.e., a camera that is generally used for a videophone or a digital camera.
[30] The microphone 104 is an apparatus for converting a user's voice into an electrical signal, i.e., a microphone that is generally used for a telephone, a cellular phone, or the like.
[31] The keypad 106 is an apparatus for inputting numerals or special characters, i.e., an apparatus that generates an electrical signal corresponding to a numeral or character when the user presses a specific button, such as the general keypad used to input numerals in a telephone, a cellular phone, or the like.
[32] The video and audio mixing unit 108 appropriately mixes the user's image and voice inputted from the camera 102 and the microphone 104 with an avatar image and a sound or voice outputted from the avatar output unit 110, in response to a request of the control unit 116 or in accordance with a characteristic of the avatar data, or selects an image and a voice, and converts the mixed or selected image and voice into a data form that can be transmitted through the telephone network 400.
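The role of the video and audio mixing unit 108 in paragraph [32] can be sketched as a select-or-mix step before encoding; the `MediaFrame` type and the rule that the user's own voice is kept in avatar mode are illustrative assumptions, since the patent does not fix data formats.

```python
from dataclasses import dataclass

@dataclass
class MediaFrame:
    video: bytes   # one encoded video frame (camera image or avatar image)
    audio: bytes   # one block of encoded audio (voice or avatar sound)

def mix_or_select(user: MediaFrame, avatar: MediaFrame, use_avatar: bool) -> MediaFrame:
    """Choose what to hand to the encoder for the telephone network 400.

    In avatar mode the avatar image replaces the camera image while the
    user's own voice is kept; real pixel compositing and audio mixing
    (e.g. overlaying an avatar sound) are beyond this sketch.
    """
    if use_avatar:
        return MediaFrame(video=avatar.video, audio=user.audio)
    return MediaFrame(video=user.video, audio=user.audio)
```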
[33] The avatar output unit 110 retrieves an avatar requested by the control unit 116 from the avatar DB and outputs the retrieved avatar to the video and audio mixing unit 108.
[34] The speaker determining unit 112 monitors a user's voice inputted through the microphone 104 or a counterpart's voice received through the telephone network and determines a current speaker, thereby allowing the control unit 116 to transmit an avatar corresponding to the speaker.
[35] On the other hand, once a speaker is determined, the speaker's voice can be analyzed using a variety of voice analysis techniques so that the control unit 116 can transmit an avatar of a more detailed form.
[36] The avatar DB 114 stores data on an avatar that a user desires to transmit. The avatar data stored in the avatar DB includes images of animals or things, icons, pictorial characters, and the like, covering all kinds of still avatars and moving avatars.
[37] Here, the avatar data can be general pictorial data that can be animated, such as a graphic interchange format (GIF) file, or vector or coordinate data that can contain motions, shapes, and the like.
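As an illustration of how the avatar DB 114 of paragraphs [36] and [37] might be organized, the sketch below keys each avatar to one asset per shape; the field names, shape keys, and the GIF/vector distinction are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class AvatarAsset:
    """One renderable asset for a single avatar shape."""
    kind: str                      # "gif" for pictorial data, "vector" for coordinate data
    payload: bytes                 # encoded animation or serialized motion/shape coordinates
    sound: Optional[bytes] = None  # optional sound clip, e.g. an agreement sound (see paragraph [38])

@dataclass
class AvatarEntry:
    """One record in the avatar DB, holding every shape of a single avatar."""
    name: str
    shapes: Dict[str, AvatarAsset] = field(default_factory=dict)
    # expected keys: "non_speaking", "speaking", "listening"

    def asset_for(self, shape: str) -> AvatarAsset:
        # fall back to the non-speaking shape if a specific shape is missing
        return self.shapes.get(shape, self.shapes["non_speaking"])
```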
[38] On the other hand, the avatar data can include a sound or a voice. For example, if a sound expressing agreement, such as "indeed" or "oh", is outputted while a listening-shape avatar is displayed, the user can feel more comfortable than when communicating by viewing an avatar alone.
[39] Here, the avatar data can be downloaded in a variety of ways, such as from a website, through a personal computer (PC), or the like.
[40] The control unit 116 requests the avatar output unit 110 to output data for displaying an avatar of a specific form based on a signal inputted from the speaker determining unit 112 and a signal inputted through the keypad 106.
[41] On the other hand, a user can switch from video communication to avatar communication, or vice versa. That is, the user can switch to transmitting the user's own image by operating the keypad 106 while an avatar is being displayed in a communication, or can switch to transmitting an avatar while the user's image is being displayed.
[42] In addition, a user can communicate while simultaneously displaying the user's image and an avatar on the counterpart's videophone.
[43] Such switching can be implemented by the control unit 116 recognizing a designated command inputted through the keypad 106.
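The switching of paragraphs [41] to [43] amounts to the control unit 116 mapping a designated keypad command to a transmission mode, roughly as sketched below; the key sequences and mode names are hypothetical, since the patent only states that a designated command is recognized.

```python
# Hypothetical keypad commands for switching transmission modes (paragraphs [41]-[43]).
MODE_COMMANDS = {
    "*1": "camera",   # transmit the user's real image
    "*2": "avatar",   # transmit the selected avatar instead of the image
    "*3": "both",     # display the user's image and the avatar simultaneously
}

class ControlUnit:
    """Tiny stand-in for the mode-switching part of the control unit 116."""

    def __init__(self) -> None:
        self.mode = "avatar"   # avatar communication chosen at call setup (S100)

    def on_keypad_input(self, keys: str) -> None:
        """Switch the transmission mode when a designated command is recognized."""
        if keys in MODE_COMMANDS:
            self.mode = MODE_COMMANDS[keys]
```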
[44] The speaker 151 is an apparatus for converting an electrical signal into a sound so that people can hear the sound, which means a general speaker of a wired or a wireless telephone.
[45] The video and audio processing unit 153 converts data received from the telephone network into video and audio data.
[46] The display unit 154 is an apparatus for converting an electrical signal into an image so that people can see the image, which means a liquid crystal display screen or the like of a general wired or a wireless telephone.
[47] FIG. 3 is a flowchart illustrating a method of operating an avatar of a video communication system, and the embodiment is described referring to FIG. 3.
[48] When a transmitter places a phone call using the videophone 100, the transmitter can select whether to communicate with a video, to communicate with an avatar, or to communicate without a video function.
[49] If the transmitter selects avatar communication through the keypad 106 (S100), the control unit 116 provides a list of avatars that the transmitter can select from among the avatar data stored in the avatar DB 114.
[50] If the transmitter selects an avatar to transmit to a receiver's videophone 200 (S102) and inputs a phone number of a specific receiver, a video communication using an avatar is started (S104).
[51] If the communication is started, the control unit 116 requests the avatar output unit 110 to retrieve avatar data for displaying a non-speaking-shape avatar among a plurality of avatar data stored in the avatar DB 114, and the video and audio mixing unit 108 transmits the non-speaking-shape avatar data outputted from the avatar output unit 110 to the receiver's videophone 200 so that the non-speaking-shape avatar is displayed on the receiver's videophone 200 (S106).
[52] The non-speaking-shape avatar is an avatar that makes a tiny gesture, such as lifting a finger or slightly nodding, while remaining in a non-speaking state, even though it is essentially still and does not move its lips.
[53] That is, the non-speaking-shape avatar makes a motion just large enough to indicate that the call is still connected even though it is not speaking. Through the non-speaking-shape avatar, the counterpart can recognize that the communication is still in progress.
[54] The speaker determining unit 112 monitors whether the transmitter's voice is received through the microphone 104 or the counterpart's voice is received through the telephone network while the communication continues, and determines the current speaker (S107).
[55] If the user is determined as a current speaker by the speaker determining unit 112, the control unit 116 retrieves avatar data for displaying a speaking-shape avatar among a plurality of avatar data stored in the avatar DB 114, and the video and audio mixing unit 108 transmits the retrieved speaking-shape avatar data to the receiver's videophone 200 so that the speaking-shape avatar is displayed on the receiver's videophone 200 (S108).
[56] The speaking-shape avatar is an avatar that moves its lips or shows a motion expressing that it is speaking.
[57] If the counterpart is determined as a current speaker by the speaker determining unit 112, the control unit 116 retrieves avatar data for displaying a listening-shape avatar among a plurality of avatar data stored in the avatar DB 114, and the video and audio mixing unit 108 transmits the retrieved listening-shape avatar data to the receiver's videophone 200 so that the listening-shape avatar is displayed on the receiver's videophone 200 (S109).
[58] The listening-shape avatar is an avatar that expresses, through a motion such as nodding occasionally or cocking an ear, that it is listening to what the counterpart says.
[59] The speaking-shape or listening-shape avatar can be created in a variety of forms. Once a speaker is determined by the speaker determining unit 112, the shape of the avatar can be further controlled by analyzing the speaker's voice and classifying its magnitude or changes into a number of levels.
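Putting steps S106 to S109 together, the shape selection on the transmitter side reduces to a small loop; the sketch below reuses determine_speaker() from the earlier energy-sensing sketch and yields only the shape name, leaving retrieval from the avatar DB 114 and transmission by the mixing unit 108 to the surrounding code.

```python
import numpy as np
from typing import Iterable, Iterator, Tuple

# Assumes determine_speaker() from the earlier energy-sensing sketch.

def avatar_shapes(frames: Iterable[Tuple[np.ndarray, np.ndarray]]) -> Iterator[str]:
    """Yield the avatar shape to transmit for each (user, counterpart) audio frame pair.

    Mirrors steps S106-S109: start with the non-speaking shape, switch to the
    speaking shape while the user talks, to the listening shape while the
    counterpart talks, and back to the non-speaking shape when neither talks.
    """
    yield "non_speaking"                                            # S106: initial avatar
    for user_frame, counterpart_frame in frames:
        state = determine_speaker(user_frame, counterpart_frame)    # S107
        if state == "speaking":
            yield "speaking"                                        # S108
        elif state == "listening":
            yield "listening"                                       # S109
        else:
            yield "non_speaking"
```

For each yielded shape, the control unit 116 would retrieve the matching avatar data from the avatar DB 114 and pass it to the video and audio mixing unit 108 for transmission to the receiver's videophone 200.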
[60] Embodiment 2
[61] FIG. 4 is a view showing the configuration of an avatar server connected to a video communication system using an avatar moving in accordance with a speaker according to another embodiment of the invention, and the embodiment is described referring to FIG. 4.
[62] The videophone 100 described above comprises the avatar DB for storing avatar data, the speaker determining unit, and the like in a single body, and therefore video communication using such an avatar can be implemented without an additional server or apparatus.
[63] However, the service of the present invention can also be implemented by installing such constitutional components in a separate server connected to the telephone network, rather than in the videophone 100.
[64] For this purpose, an additional avatar server 300 is required, and the avatar server 300 comprises a web server module 310, an avatar control module 320, a speaker determining module 330, and an avatar DB 360.
[65] The web server module 310 is connected to the telephone network 400 and provides a browser that enables transmitting and receiving data with the videophone 100 or 200.
[66] The avatar control module 320 functions the same as the control unit 116 of the embodiment described above. The avatar control module retrieves predetermined avatar data in response to a signal inputted from the speaker determining module 330 and allows the retrieved avatar data to be transmitted to a counterpart's videophone.
[67] The speaker determining module 330 functions the same as the speaker determining unit 112 of the embodiment described above. The speaker determining module determines a current speaker and allows an avatar of a predetermined form to be transmitted, which can be implemented using an energy sensing technique, VAD technique, or the like described above.
[68] The avatar DB 360 functions the same as the avatar DB 114 of the embodiment described above, which stores data on an avatar desired to be transmitted by a user.
[69] The avatar server 300 transmits and displays a non-speaking-shape avatar on the counterpart's videophone when a video communication is commenced. If a speaker is determined, the avatar server detects the determination, appropriately mixes a naturally listening or speaking avatar with the user's image or voice, or selects an avatar, and transmits and displays the mixed or selected avatar on the counterpart's videophone.
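A rough sketch of this server-hosted variant, assuming the avatar server 300 sits in the media path between the two videophones; the class layout is illustrative, and it reuses determine_speaker() and AvatarEntry from the earlier sketches rather than any interface defined by the patent.

```python
class AvatarServer:
    """Server-side variant (Embodiment 2): the avatar DB 360, the speaker
    determining module 330, and the avatar control module 320 live in the
    avatar server 300 instead of in the videophone."""

    def __init__(self, avatar_db):
        self.avatar_db = avatar_db   # mapping of avatar name -> AvatarEntry (avatar DB 360)

    def process_frame(self, selected_avatar, user_frame, counterpart_frame):
        """Return the avatar asset to forward to the counterpart for one audio frame."""
        state = determine_speaker(user_frame, counterpart_frame)    # speaker determining module 330
        shape = {"speaking": "speaking", "listening": "listening"}.get(state, "non_speaking")
        entry = self.avatar_db[selected_avatar]                     # avatar control module 320
        return entry.asset_for(shape)   # mixed/selected downstream and sent to the counterpart
```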
[70] Accordingly, a video communication system using an avatar can be constructed without adding, to the videophones 100 and 200, the separate constitutional components required for such video communication.
[71] A video communication method using an avatar naturally moving in accordance with a speaker and a system therefor according to the embodiments of the present invention have been described above. However, the scope of rights of the present invention is not limited to the embodiments.
[72] For example, even when the movements or expressions of the user's avatar to be transmitted are controlled by analyzing the user's real movements or expressions captured by a camera, the invention can still be applied: if the counterpart's voice is detected, an ear of the user's avatar may be slightly moved or enlarged, or the color of the avatar may be changed, to add an effect or change showing that the user is listening to what the counterpart says; and if the user's voice is detected while the avatar is transmitted, the previous effect or change is discarded or another one is added. Therefore, it is apparent that communication using an avatar naturally moving in accordance with a speaker can be implemented in this way as well.
Industrial Applicability
[73] Although messenger programs that provide a video communication function on a PC, such as the Microsoft Network (MSN) Messenger, are already widely distributed, the number of video communication users is in reality still small.
[74] One of the most important reasons for this is the reluctance of users to show their camera images to a counterpart, or worries about privacy infringement.
[75] That is, in order to promote video communication, a technique that can ease this reluctance and these privacy worries should be provided to users, and strong demand is expected for an avatar video communication technique as the technique most appropriate for this purpose.
[76] At that point, a technique for identifying the current speaker and transmitting an avatar that naturally listens or speaks accordingly is expected to be essential.
[77]

Claims

[1] A video communication method using an avatar naturally moving in accordance with a speaker, the video communication method comprising the steps of: a speaker determining step for monitoring user's and counterpart's voices and determining a current speaker; and an avatar motion step for retrieving listening-shape avatar data or speaking-shape avatar data and transmitting the retrieved avatar data to a counterpart's videophone, if a speaker is determined, whereby the avatar is transmitted while video communicating.
[2] The method according to claim 1, wherein if a user's voice and a counterpart's voice are simultaneously detected, the speaking-shape avatar data is retrieved and transmitted to the counterpart's videophone.
[3] A video communication system using an avatar naturally moving in accordance with a speaker, the video communication system comprising: a speaker determining unit for monitoring user's and counterpart's voices and determining a current speaker; and a control unit for controlling to retrieve listening-shape avatar data or speaking-shape avatar data and transmitting the retrieved avatar data to a counterpart's videophone, if a speaker is determined by the speaker determining unit, whereby the avatar is transmitted while video communicating.
[4] The system according to claim 3, wherein if a user's voice and a counterpart's voice are simultaneously detected, the control unit controls to retrieve the speaking-shape avatar data.
PCT/KR2007/002449 2006-05-24 2007-05-21 Method and system of video phone calling using talker sensitive avatar WO2007136208A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2006-0046811 2006-05-24
KR20060046811 2006-05-24
KR20060052287A KR100768666B1 (en) 2006-05-24 2006-06-12 Method and system of video phone calling using talker sensitive avata
KR10-2006-0052287 2006-06-12

Publications (1)

Publication Number Publication Date
WO2007136208A1 (en)

Family

ID=38723503

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2007/002449 WO2007136208A1 (en) 2006-05-24 2007-05-21 Method and system of video phone calling using talker sensitive avatar

Country Status (1)

Country Link
WO (1) WO2007136208A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1326445A2 (en) * 2001-12-20 2003-07-09 Matsushita Electric Industrial Co., Ltd. Virtual television phone apparatus
KR20040016778A (en) * 2002-08-19 2004-02-25 임성호 Method and System of Video Phone Calling using Avata
US20040097221A1 (en) * 2002-11-20 2004-05-20 Lg Electronics Inc. System and method for remotely controlling character avatar image using mobile phone
US20040235531A1 (en) * 2003-05-20 2004-11-25 Ntt Docomo, Inc. Portable terminal, and image communication program
KR20040103047A (en) * 2003-05-30 2004-12-08 에스케이 텔레콤주식회사 Method and System for Providing Avatar Image Service during Talking over the Phone
US20060046699A1 (en) * 2001-07-26 2006-03-02 Olivier Guyot Method for changing graphical data like avatars by mobile telecommunication terminals


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07746597

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07746597

Country of ref document: EP

Kind code of ref document: A1