WO2009098671A2 - Content generation and communication in a video mail system - Google Patents

Publication number
WO2009098671A2
Authority
WO
WIPO (PCT)
Prior art keywords
avatar
messaging server
server
message
content
Prior art date
Application number
PCT/IE2009/000003
Other languages
French (fr)
Other versions
WO2009098671A3 (en)
Inventor
David Harmony
Original Assignee
Markport Limited
Priority date
Filing date
Publication date
Application filed by Markport Limited
Publication of WO2009098671A2
Publication of WO2009098671A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10 Multimedia information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/401 Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/157 Conference systems defining a virtual conference space and using avatars or agents

Definitions

  • the invention relates to generation of content, particularly avatars.
  • Avatars provide a way to allow users to create greetings and messages without providing their own images. This can be an action that is done for fun or because of personal preferences.
  • EP 1814294 describes creating personalized multimedia content with video independently of a messaging system.
  • the invention is directed towards providing an improved method for generation of avatars so that they can be used for communication in a more convenient manner to users.
  • a messaging method comprising the steps of:
  • a messaging server receiving from a user device a request for sending a message
  • the messaging server receiving content for inclusion in the message, said content including at least audio content;
  • an avatar-generation server automatically synchronising the content with an avatar skin to provide an avatar
  • the messaging server incorporating the avatar in a message body to provide an avatar message, and transmitting the avatar message or storing it for later use
  • the avatar message is generated in real time during a single session between the user device and the messaging server, the user device either uploading the content during the session or indicating during the session a location for it to be found;
  • the messaging server maintains a cache of avatar skins, and retrieves avatar skins from said cache to provide options to the user device for choice of an avatar skin during the session.
  • the avatar generation server is separate from the messaging server, and said servers communicate in real time to perform the method.
  • the messaging server communicates with the avatar-generation server at intervals to refresh the cache.
  • the messaging server receives notifications from the avatar-generation server to refresh the cache.
  • the messaging server maintains mark-up language documents related to the avatar skins, and executes mark-up of said documents for play-out of the avatar skins to the user device to provide user options for choice of an avatar skin.
  • said documents are grouped by category and subcategory of avatar skin.
  • each document relates to a menu for user choice of an avatar skin and there are selectable avatar skins per menu.
  • the documents relate to different user device displays for viewing previews.
  • the documents control device displays through standard DTMF, the mark-up language including DTMF control commands in menu options, the menus being generated and stored in the cache to create a display, and the menus being interpreted by a mark-up language interpreter for DTMF interfacing.
  • the documents control device displays through automatic speech recognition, the mark-up language including automatic speech recognition commands for menu options, the menus being generated and stored in the cache to create a display, and the menus being interpreted by a mark-up language interpreter for automatic speech recognition interfacing.
  • the messaging server constructs the avatar skins in the cache by limiting differences of I-Frames to not exceed a pre-determined maximum frame size.
  • the messaging server applies a de-blurring mechanism to the I-Frames, in which an original I-Frame is copied N times, and the first N-1 copies are blurred with a Gaussian blur radius.
  • the messaging server caches avatar skins indexed with user profiles and automatically selects skins which are indexed with the user.
  • the messaging server breaks the recorded content into segments, and the avatar generation server synchronizes the content with the avatar skin segment-by-segment.
  • the method comprises the further step of the messaging server transmitting a preview of the avatar message to the user device and only transmitting the avatar message after receiving user approval.
  • the method comprises the further steps of the messaging server playing out animation content while the avatar generation server generates the avatar.
  • the animation content is retrieved from said cache and is related to the selected avatar skin.
  • the messaging server transmits the avatar message according to recipient preferences. In one embodiment, the messaging server multi-casts the avatar message.
  • the content includes video or image content.
  • the invention provides a messaging system comprising:
  • a messaging server comprising means for performing messaging server operations of any method defined above, and
  • an avatar generation server comprising means for performing avatar generation server operations of any method defined above.
  • the invention provides a computer program product comprising software code for performing operations of any method defined above when executing on a digital processor.
  • Figs. 1 and 2 are block diagrams illustrating a messaging system of the invention.
  • Figs. 3 and 4 are message sequence diagrams for operation of the system.
  • Fig. 5 is a set of plots illustrating frame processing for a message.
  • a video mail system 1 of the invention comprises a messaging server 2 (for example a video mail server), a storage device 3, and a cache 4.
  • the system 1 also comprises an avatar generation server 5 and mobile devices 6.
  • Fig. 2 is a flow diagram showing how the messaging server 2 populates the cache 4. It retrieves avatar skins from the server 5 and transfers them to the cache 4 for later retrieval in real time. A listing of available avatar skins triggers the generation of a local cache of mark-up language (VXML in this embodiment) documents, the list being sub-divided by category and sub-category and each grouping spanning a plurality of documents.
  • Each document has commands for playing out an avatar skin selection menu and there are selectable avatar skins per menu.
  • the required avatar skin video construction to be used for a messaging session is carried out, as described in more detail below. There is dynamic refresh of this cache.
  • the invention enhances a video mail service by providing dynamic creation of an avatar message during real time interaction with a user.
  • the video mail system 1 records user audio (and possibly also video) content, presents to the user available avatar "skins", synchronizes the recorded content with a selected avatar skin to provide the body of an avatar message (an "avatar"), receives user preview results, and sends the avatar message to the recipient. This is all carried out within a video call session.
  • a mobile device 6 calls in to the messaging server 2 and records content for a message or greeting.
  • This content is typically audio-only, but may include video.
  • the messaging server 2 retrieves avatar skins from the cache 4.
  • the messaging server 2 plays out the available avatar skins to the user device and the user selects an avatar skin.
  • the messaging server uses pre-created VXML documents and video files to control and present the available options with minimal delay for the end user. This is described in more detail below.
  • the messaging server 2, still during the messaging session, submits the audio received from the user and the selected avatar skin to the avatar generation server 5.
  • the avatar generation server 5 generates an avatar by synchronizing the received content and the avatar skin. Simultaneously, the messaging server 2 plays out information concerning the selected avatar skin in order to avoid the session being quiet while the avatar is being generated.
  • the generated avatar is sent by the server 5 to the server 2, and is stored in permanent storage 3.
  • the messaging server 2 generates a message incorporating the avatar in the body to provide an avatar message. It sends it for preview to the user device.
  • upon receiving a user confirmation, the avatar message is sent by the messaging server 2 according to recipient preferences. As Fig. 3 shows, the messaging server 2 may perform video editing before sending the preview avatar message.
  • because the video messaging server 2 retrieves the available avatar skins from the locally generated cache 4, it can respond quickly to the caller, who therefore avoids long pauses due to network latency while previewing the available avatar skins.
  • the avatar message could be an audio greeting or any other type of message for another subscriber.
  • the user is automatically offered the service of creating an avatar message.
  • the user is then able to view the available avatar skins through the VXML-driven menu system, and previews specific avatar skins as still images or as video previews.
  • the invention allows flexibility by providing for dynamic updating of the cache 4, which stores data including the avatar skins and the VXML documents.
  • the updating may be either by the messaging server 2 regularly polling an external service or by the external service pushing notifications to the messaging server 2.
  • the VXML documents handle the controlled play-out to the user devices by controlling prompts for user confirmation or rejection of the preview avatar message.
  • the avatar skins stored in the cache are generated in a manner which achieves very efficient use of bandwidth for transmission to the user devices (sender for preview and recipient for delivery), particularly for the UMTS standard.
  • This generation of the avatar skins involves transforming from JPEG to a 3GPP or MPEG video file which is adapted for optimum delivery on the bearer channel.
  • the messaging server 2 may modify the generated avatar message for the appropriate transport medium. For example, for UMTS transmission, the avatar message may be processed by the messaging server 2 to reduce the maximum frame size. Or, the server 2 may be adapted to truncate the message for compliance with MMS size requirements.
  • a locally stored video clip i.e. a sample animation for the selected avatar skin
  • This clip is retrieved from the cache 4.
  • the system 1 can allow different types of message and consequently different avatar skins for each type of message. There will for example be different types of messages in response to different call situations such as recipient busy, recipient not answering, or in response to caller identification.
  • the messaging server allows message delivery via a variety of different delivery mechanisms, which can be according to recipient preferences. For instance, if the recipient desires, they may enable MMS Push of their messages or alternatively they could log on and retrieve the message through a video call.
  • the messaging server 2 using the cache 4, advantageously limits the amount of 'dead-air' users will experience and reserves processing power for handling of the call rather than content generation.
  • the messaging system avoids dead air with the use of the cache. Because of the cache of VXML documents and avatar skin video files there are no delays which would otherwise arise if the messaging server 2 needed to interface with the remote avatar server 5 for the available avatar skins and selection menu play-out.
  • the candidate avatar skins are generated and stored in the cache, allowing them to be played out at high speed.
  • the server 2 is configurable to support different numbers of avatar skins per page.
  • a network interface e.g. HTTP, Web Service.
  • the video messaging server 2 queries the avatar-generation server 5 off-line to determine what avatar skins are available to be created by the system 1.
  • the listing of available avatar skins triggers the creation of a local cache of VXML documents; the list is subdivided by category/sub-category and each grouping can span multiple VXML documents (for example, there may be four selectable avatars per VXML menu).
  • the multiple documents address the limited displays available on mobile phones for viewing the available previews.
  • the VXML documents handle the control of the user experience through standard DTMF or ASR via generation of the VXML menus.
  • the messaging server 2 can initiate cache update on a periodic basis or as directed by push notifications from the avatar generation server 5.
  • the messaging server 2 constructs the cache using the icons of the avatar skins for dynamic generation of the video prompts for preview and selection of skins.
  • the generation of an I-Frame is done to match existing background layouts of the video messaging server 2.
  • the system accomplishes this in the simplest of forms with imaging layers where the background image is provided and a layout is selected that would not conflict with logos or other artefacts of the background image.
  • the layout selected during the generation of the video has an impact on the generation of the VXML documents. If the template selected allows three images, the DTMF keys or ASR will be constrained by that count.
  • the system flattens the layered image to construct a single I-Frame for final conversion to a 3GPP standard video.
  • the messaging server 2 generates the avatar skins as motion videos by constructing a series of I-Frames to play over time. These motion videos support a set of transitions (e.g. dissolve-in, fly-in, and appear). Also, the videos are generated in multiple formats, sizes, and bit rates to remove any unnecessary need to transcode the content depending on phone capabilities, e.g. if the server 2 supports CIF and QCIF resolution, both prompt sets will be generated.
  • the initial I-Frame is copied 3 times, the first is blurred with a Gaussian blur radius of 4, the second is blurred with a Gaussian blur radius of 2, and the final I-Frame is the original I-Frame.
  • This is illustrated in Fig. 5, in which the horizontal axis represents number of frames for up to a 1 second period, and the vertical axis represents frame size in kbits.
  • the system also generates transition prompts ("animation play-out" in Figs. 3 and 4) to be run while the avatar is being generated; these prompts eliminate the dead-air on the call during creation to let the user know that the request is being processed.
  • This cache of information can be refreshed at a predefined rate to keep the cache synchronized with the remote server.
  • the invention allows the message to be sent to the recipient based on the recipient's preferences, which may be MMS, Email, retrieved through Webmail, or calling into the video mail server.
  • the avatar skins can be classified into groups for easy browsing and each avatar skin can be previewed for selection.
  • the preview may be a video based on the avatar skin so that the user may be able to see how the avatar moves.
  • the system 1 in real time generates the full avatar message with the audio and/or video recorded by the user. After the synchronization of the audio and the avatar skin the user can preview the synchronized message prior to sending it to the recipient.
  • the mail server 2 may also provide entries in a subscriber profile (e.g. LDAP or an RDBMS) that allows the subscribers to select a preset series of avatar skins. The user can then select these pre-selected avatar skins as a short-cut, bypassing the available avatar skins or selected during an audio only call.
  • the invention provides the user with an improved experience compared to the prior art.
  • management of the avatar functionality is available entirely via mobile video calling in a single session.
  • the user may control creation and review of avatars using: Mobile using UMTS video calling; IMS Mobile; PDA (which is SIP-enabled)
  • the invention is used to enhance a video blogging service.
  • An end user calls into the messaging server 2 and records a selected blog and then has their video replaced by a selected avatar.
  • This allows the user to select a video skin for their message, achieving anonymity of the sender's appearance, location, and environment.
  • the creator of the video blogging service can create an avatar that is themed to the topic of the delivery. This can provide an additional reinforcement of the message delivery.
  • a user creating video blog content describing the importance of business professionalism could select a business-based avatar skin and record the content without concern for their current appearance or location.
  • the messaging server could multi-cast the message to multiple destinations, for example to members of a closed user group.
  • the system may take the resultant avatar message and transcode it to a more suitable format for the Web and upload it to a personal site or to a social networking site for anyone to view.
  • This is similar to the previous example with the slight difference being that rather than a 1-to-1 messaging system it is a one-to-many system.
  • the subscriber also is subscribed to a set of blogs over an RSS feed.
  • when the RSS service signals that a blog has been updated, the service fetches the content of the blog, parses the resultant HTML document, and creates an audio file from the resultant text with a text-to-speech engine; this may be done as multiple segments.
  • the user may have pre-selected a set of avatar skins stored in their profile.
  • the entire audio is converted to an avatar message, with automatic selection of an avatar skin.
  • the audio may be broken into segments and each section individually synchronized with an avatar skin.
  • Each audio segment may be combined with video to provide the content which is synchronized with an avatar skin.
  • the resultant avatar messages may then be stored in a video mail server and retrieved over a video call as typically video messages are stored in a video mail server.
  • the created avatar messages may be available for preview during a video session to the end user, selectably navigable via dynamically created menu structures with visual descriptors of the content therein.
  • system 1 seamlessly performs creation of an avatar message in real time, integrated within a video mail session.
  • seamless and transparent creation of avatar messages makes the use of multi-media content much easier for the user and it is more likely for the end user to exercise the service.
  • the avatar generation service is not limited to being a remote service and could be integrated with the messaging service.
  • the recorded audio and/or video content is applied to all avatar skins, and the subsequently created avatar messages can be previewed before accepting a given avatar message.
  • the media server part of the messaging server 2 captures the DTMF.
  • the audio is sent to a speech recognition engine which quantifies the speech and returns to the video mail solution the reliability of audio as it maps to the available grammar created by the construction of the VXML documents.
  • the content is recorded in real time during a session.
  • the content may have been recorded previously and synchronised with a selected avatar skin during the session.
  • the user device may retrieve the previously recorded content from the messaging server. This may be done using a menu system.

Abstract

A video mail system (1) comprises a messaging server (2), a storage device (3), a cache (4), and an avatar generation server (5). The messaging server (2) populates the cache (4) by retrieving avatar skins from the server (5) and transferring them to the cache (4) for later retrieval in real time. The invention enhances a video mail service by providing dynamic creation of an avatar message during real time interaction with a user. The video mail system (1) records user audio (and possibly also video) content, presents to the user available avatar 'skins', synchronizes the recorded content with a selected avatar skin to provide the body of an avatar message (an 'avatar'), receives user preview results, and sends the avatar message to the recipient. This is all carried out within a video call session.

Description

"Content Generation and Communication in a Video Mail System"
INTRODUCTION
Field of the Invention
The invention relates to generation of content, particularly avatars.
Prior Art Discussion
Some subscribers are hesitant to provide their own image to be left for greetings and messages. Avatars provide a way to allow users to create greetings and messages without providing their own images. This can be an action that is done for fun or because of personal preferences.
US7176956 describes video enhancement of an avatar.
EP 1814294 describes creating personalized multimedia content with video independently of a messaging system.
The invention is directed towards providing an improved method for generation of avatars so that they can be used for communication in a more convenient manner to users.
SUMMARY OF THE INVENTION
According to the invention, there is provided a messaging method comprising the steps of:
a messaging server receiving from a user device a request for sending a message,
the messaging server receiving content for inclusion in the message, said content including at least audio content;
an avatar-generation server automatically synchronising the content with an avatar skin to provide an avatar, and the messaging server incorporating the avatar in a message body to provide an avatar message, and transmitting the avatar message or storing it for later use,
wherein the avatar message is generated in real time during a single session between the user device and the messaging server, the user device either uploading the content during the session or indicating during the session a location for it to be found; and
wherein the messaging server maintains a cache of avatar skins, and retrieves avatar skins from said cache to provide options to the user device for choice of an avatar skin during the session.
In one embodiment, the avatar generation server is separate from the messaging server, and said servers communicate in real time to perform the method.
In one embodiment, the messaging server communicates with the avatar-generation server at intervals to refresh the cache.
In another embodiment, the messaging server receives notifications from the avatar-generation server to refresh the cache.
In one embodiment, the messaging server maintains mark-up language documents related to the avatar skins, and executes mark-up of said documents for play-out of the avatar skins to the user device to provide user options for choice of an avatar skin.
In another embodiment, said documents are grouped by category and subcategory of avatar skin.
In one embodiment, each document relates to a menu for user choice of an avatar skin and there are selectable avatar skins per menu.
In a further embodiment, the documents relate to different user device displays for viewing previews.
In one embodiment, the documents control device displays through standard DTMF, the mark-up language including DTMF control commands in menu options, the menus being generated and stored in the cache to create a display, and the menus being interpreted by a mark-up language interpreter for DTMF interfacing.
In one embodiment, the documents control device displays through automatic speech recognition, the mark-up language including automatic speech recognition commands for menu options, the menus being generated and stored in the cache to create a display, and the menus being interpreted by a mark-up language interpreter for automatic speech recognition interfacing.
In another embodiment, the messaging server constructs the avatar skins in the cache by limiting differences of I-Frames to not exceed a pre-determined maximum frame size.
In one embodiment, the messaging server applies a de-blurring mechanism to the I-Frames, in which an original I-Frame is copied N times, and the first N-1 copies are blurred with a Gaussian blur radius. Preferably, the blur radius generally reduces from copy 1 to copy N-1, and the final copy is not blurred.
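By way of illustration only, the following minimal Python sketch computes one possible blur-radius schedule for the N copies of an I-Frame. The halving rule, the default radius, and the name `blur_radius_schedule` are assumptions made for this sketch; the text only requires that the radius generally reduces across the blurred copies and that the final copy is not blurred.

```python
def blur_radius_schedule(n_copies: int, last_blurred_radius: int = 2) -> list:
    """Return one Gaussian blur radius per I-Frame copy, first copy first.

    Assumption (not from the specification): each radius is half of the
    previous one, ending at `last_blurred_radius` for copy N-1.
    The final copy gets radius 0, meaning it is the unblurred original.
    """
    if n_copies < 1:
        raise ValueError("at least one copy is required")
    blurred = n_copies - 1  # number of copies that receive a blur
    radii = [last_blurred_radius * 2 ** (blurred - 1 - i) for i in range(blurred)]
    return radii + [0]


# Three copies: two blurred copies followed by the original frame.
print(blur_radius_schedule(3))
```

Applying each radius to actual image data would be done with an imaging library of choice and is omitted here.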
In one embodiment, the messaging server caches avatar skins indexed with user profiles and automatically selects skins which are indexed with the user.
In one embodiment, the messaging server breaks the recorded content into segments, and the avatar generation server synchronizes the content with the avatar skin segment-by-segment.
In one embodiment, the method comprises the further step of the messaging server transmitting a preview of the avatar message to the user device and only transmitting the avatar message after receiving user approval.
In one embodiment, the method comprises the further steps of the messaging server playing out animation content while the avatar generation server generates the avatar.
In one embodiment, the animation content is retrieved from said cache and is related to the selected avatar skin.
In one embodiment, the messaging server transmits the avatar message according to recipient preferences. In one embodiment, the messaging server multi-casts the avatar message.
In one embodiment, the content includes video or image content.
In another aspect, the invention provides a messaging system comprising:
a messaging server comprising means for performing messaging server operations of any method defined above, and
an avatar generation server comprising means for performing avatar generation server operations of any method defined above.
In a further aspect, the invention provides a computer program product comprising software code for performing operations of any method defined above when executing on a digital processor.
DETAILED DESCRIPTION OF THE INVENTION
Brief Description of the Drawings
The invention will be more clearly understood from the following description of some embodiments thereof, given by way of example only with reference to the accompanying drawings in which:-
Figs. 1 and 2 are block diagrams illustrating a messaging system of the invention;
Figs. 3 and 4 are message sequence diagrams for operation of the system; and
Fig. 5 is a set of plots illustrating frame processing for a message.
Description of the Embodiments
The following acronyms are used in this specification.
MMS Multimedia Messaging Service
TUI Telephone User Interface
UMTS Universal Mobile Telecommunications System
VXML Voice XML
VUI Video User Interface
DTMF Dual-tone multi-frequency
ASR Automatic Speech Recognition
Avatar Selection and Creation
Referring to Fig. 1, a video mail system 1 of the invention comprises a messaging server 2 (for example a video mail server), a storage device 3, and a cache 4. The system 1 also comprises an avatar generation server 5 and mobile devices 6.
Fig. 2 is a flow diagram showing how the messaging server 2 populates the cache 4. It retrieves avatar skins from the server 5 and transfers them to the cache 4 for later retrieval in real time. A listing of available avatar skins triggers the generation of a local cache of mark-up language (VXML in this embodiment) documents, the list being sub-divided by category and sub-category and each grouping spanning a plurality of documents. Each document has commands for playing out an avatar skin selection menu and there are selectable avatar skins per menu. The required avatar skin video construction to be used for a messaging session is carried out, as described in more detail below. There is dynamic refresh of this cache.
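The grouping of skins into menu documents can be sketched as follows. This is a minimal Python illustration under stated assumptions: the catalogue layout, the function names, the `#select_...` targets, and the figure of four skins per menu are hypothetical choices, and the generated mark-up uses only the basic VoiceXML `menu` and `choice` elements rather than the documents the system actually produces.

```python
from xml.sax.saxutils import escape, quoteattr


def paginate(skins, per_menu=4):
    """Split a list of skin names into pages of at most `per_menu` entries."""
    return [skins[i:i + per_menu] for i in range(0, len(skins), per_menu)]


def render_menu(category, page_no, skins):
    """Render one menu document; DTMF key k selects the k-th skin on the page."""
    menu_id = quoteattr(category + "_" + str(page_no))
    lines = [
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">',
        "  <menu id=" + menu_id + ">",
    ]
    for key, skin in enumerate(skins, start=1):
        target = quoteattr("#select_" + skin)  # hypothetical link target
        lines.append(
            '    <choice dtmf="' + str(key) + '" next=' + target + ">"
            + escape(skin) + "</choice>"
        )
    lines += ["  </menu>", "</vxml>"]
    return "\n".join(lines)


def build_document_cache(catalogue, per_menu=4):
    """Map (category, page number) to menu text for every category in the catalogue."""
    cache = {}
    for category, skins in catalogue.items():
        for page_no, page in enumerate(paginate(skins, per_menu), start=1):
            cache[(category, page_no)] = render_menu(category, page_no, page)
    return cache


catalogue = {"animals": ["cat", "dog", "fox", "owl", "bear"]}
documents = build_document_cache(catalogue)
print(sorted(documents))
```

Because every page is rendered ahead of time, a menu can be served during a call by a dictionary lookup rather than a request to the remote server.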
The invention enhances a video mail service by providing dynamic creation of an avatar message during real time interaction with a user. The video mail system 1 records user audio (and possibly also video) content, presents to the user available avatar "skins", synchronizes the recorded content with a selected avatar skin to provide the body of an avatar message (an "avatar"), receives user preview results, and sends the avatar message to the recipient. This is all carried out within a video call session.
Referring to Figs. 1 and 3, the following are the steps:
A mobile device 6 calls in to the messaging server 2 and records content for a message or greeting. This content is typically audio-only, but may include video. The messaging server 2 retrieves avatar skins from the cache 4.
The messaging server 2 plays out the available avatar skins to the user device and the user selects an avatar skin. The messaging server uses pre-created VXML documents and video files to control and present the available options with minimal delay for the end user. This is described in more detail below.
The messaging server 2, still during the messaging session, submits the audio received from the user and the selected avatar skin to the avatar generation server 5.
The avatar generation server 5 generates an avatar by synchronizing the received content and the avatar skin. Simultaneously, the messaging server 2 plays out information concerning the selected avatar skin in order to avoid the session being quiet while the avatar is being generated.
The generated avatar is sent by the server 5 to the server 2, and is stored in permanent storage 3. The messaging server 2 generates a message incorporating the avatar in the body to provide an avatar message. It sends it for preview to the user device.
Upon receiving a user confirmation, the avatar message is sent by the messaging server 2 according to recipient preferences. As Fig. 3 shows, the messaging server 2 may perform video editing before sending the preview avatar message.
As shown in Fig. 4, in another method the message is not played out to the recipient, rather being stored for later use. Otherwise, the steps are the same.
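The sequence of steps above can be sketched in Python as a single function. This is an illustrative outline only: `generate_avatar` is a stand-in for the avatar generation server 5 with an artificial delay, and the callback names `choose`, `approve`, and `play` are hypothetical. The point of the sketch is that animation play-out continues while generation runs in the background, and that nothing is sent without user approval.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_avatar(audio: bytes, skin: str) -> dict:
    """Stand-in for the remote avatar generation server (assumed interface)."""
    time.sleep(0.05)  # simulate synchronisation work
    return {"skin": skin, "audio_bytes": len(audio)}


def run_session(audio, skin_choices, choose, approve, play=lambda clip: None):
    """Run one messaging session and return the avatar message, or None if rejected."""
    skin = choose(skin_choices)  # user picks a skin from the cached menu
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(generate_avatar, audio, skin)
        # Play filler animation for the selected skin so the call is never silent.
        while True:
            play("animation:" + skin)
            if pending.done():
                break
            time.sleep(0.01)
        avatar = pending.result()
    message = {"body": avatar}
    # The preview goes to the user; the message is released only on approval.
    return message if approve(message) else None


played = []
result = run_session(
    b"recorded audio",
    ["robot", "cat"],
    choose=lambda options: options[0],
    approve=lambda message: True,
    play=played.append,
)
print(result["body"]["skin"], len(played) >= 1)
```

In the variant of Fig. 4 the approved message would be written to storage instead of being delivered; the control flow is otherwise the same.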
Because the video messaging server 2 retrieves the available avatar skins from the (locally-generated) cache 4, it can respond quickly to the caller, who advantageously avoids long pauses due to network latency while previewing the available avatar skins.
The avatar message could be an audio greeting or any other type of message for another subscriber. In more detail, after recording the audio and/or video content, the user is automatically offered the service of creating an avatar message. The user is then able to view the available avatar skins through the VXML-driven menu system, and previews specific avatar skins as still images or as video previews. The invention allows flexibility by providing for dynamic updating of the cache 4, which stores data including the avatar skins and the VXML documents. The updating may be either by the messaging server 2 regularly polling an external service or by the external service pushing notifications to the messaging server 2. The VXML documents handle the controlled play-out to the user devices by controlling prompts for user confirmation or rejection of the preview avatar message. Also, the avatar skins stored in the cache are generated in a manner which achieves very efficient use of bandwidth for transmission to the user devices (sender for preview and recipient for delivery), particularly for the UMTS standard. This generation of the avatar skins involves transforming from JPEG to a 3GPP or MPEG video file which is adapted for optimum delivery on the bearer channel.
Once the user has selected their preferred avatar skin their recorded content is sent in real time by the messaging server 2 over a network connection to the avatar generation server 5. The content is synchronized by the server 5 with the subscriber's selected avatar skin and returned to the messaging server 2 for permanent storage or delivery in the session. The messaging server 2 may modify the generated avatar message for the appropriate transport medium. For example, for UMTS transmission, the avatar message may be processed by the messaging server 2 to reduce the maximum frame size. Or, the server 2 may be adapted to truncate the message for compliance with MMS size requirements.
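The two transport adaptations mentioned above (capping frame size for UMTS and truncating for MMS) can be illustrated with a small sketch. The function names and the byte limits passed to them are assumptions for illustration; the description does not give concrete limits or say how the truncation point is chosen.

```python
from typing import List


def oversized_frames(frame_sizes: List[int], max_frame_bytes: int) -> List[int]:
    """Return the indexes of frames that exceed a per-frame size cap."""
    return [i for i, size in enumerate(frame_sizes) if size > max_frame_bytes]


def frames_that_fit(frame_sizes: List[int], max_message_bytes: int) -> int:
    """Return how many leading frames fit inside a total message size budget."""
    total = 0
    for count, size in enumerate(frame_sizes):
        if total + size > max_message_bytes:
            return count  # truncate before the frame that would overflow
        total += size
    return len(frame_sizes)
```

A server could use the first function to find frames that need re-encoding before UMTS delivery, and the second to decide where to cut a message for MMS.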
While the content/avatar skin synchronization is performed on the server 5, a locally stored video clip, i.e. a sample animation for the selected avatar skin, is played to the user device for an improved experience. This clip is retrieved from the cache 4.
The system 1 can allow different types of message and consequently different avatar skins for each type of message. There will for example be different types of messages in response to different call situations such as recipient busy, recipient not answering, or in response to caller identification.
The messaging server allows message delivery via a variety of different delivery mechanisms, which can be according to recipient preferences. For instance, if the recipient desires, they may enable MMS Push of their messages or alternatively they could log on and retrieve the message through a video call.
The messaging server 2, using the cache 4, advantageously limits the amount of 'dead-air' users will experience and reserves processing power for handling of the call rather than content generation. The messaging system avoids dead air with the use of the cache. Because of the cache of VXML documents and avatar skin video files there are no delays which would otherwise arise if the messaging server 2 needed to interface with the remote avatar server 5 for the available avatar skins and selection menu play-out. The candidate avatar skins are generated and stored in the cache, allowing them to be played out at high speed. The server 2 is configurable to support different numbers of avatar skins per page. The avatar generation server 5 communicates via a network interface (e.g. HTTP, Web Service).
The video messaging server 2 queries the avatar-generation server 5 off-line to determine what avatar skins are available to be created by the system 1. The listing of available avatar skins triggers the creation of a local cache of VXML documents; the list is subdivided by category/sub-category and each grouping can span multiple VXML documents (for example, there may be four selectable avatars per VXML menu). The multiple documents address the limited displays available on mobile phones for viewing the available previews. The VXML documents handle the control of the user experience through standard DTMF or ASR via generation of the VXML menus.
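As a rough sketch of how a skin listing might be split into per-category menu documents with a fixed number of choices each, consider the following. It emits VoiceXML-style `menu` and `choice` elements with `dtmf` attributes; the file naming, the `#select_...` targets, and the use of key 9 for "more" are assumptions made here, and the video play-out part of the real documents is not modelled.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def paginate(items: List[str], per_page: int) -> List[List[str]]:
    """Split a list into consecutive pages of at most per_page items."""
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]


def build_menu_documents(
    skins_by_category: Dict[str, List[str]], per_page: int = 4
) -> Dict[str, str]:
    """Build one menu document per page of skins in each category."""
    if not 1 <= per_page <= 8:
        # Keys 1..8 select skins; key 9 is reserved here for "more".
        raise ValueError("per_page must be between 1 and 8")
    documents: Dict[str, str] = {}
    for category, skins in skins_by_category.items():
        pages = paginate(skins, per_page)
        for page_no, page in enumerate(pages, start=1):
            root = ET.Element("vxml", {"version": "2.0"})
            menu = ET.SubElement(root, "menu", {"id": f"{category}_{page_no}"})
            for key, skin in enumerate(page, start=1):
                choice = ET.SubElement(
                    menu, "choice", {"dtmf": str(key), "next": f"#select_{skin}"}
                )
                choice.text = skin
            if page_no < len(pages):
                # Link to the next page of the same category.
                more = ET.SubElement(
                    menu,
                    "choice",
                    {"dtmf": "9", "next": f"{category}_{page_no + 1}.vxml"},
                )
                more.text = "more"
            name = f"{category}_{page_no}.vxml"
            documents[name] = ET.tostring(root, encoding="unicode")
    return documents
```

Because the documents are built ahead of time and held in the cache, the menu shown during a call needs no round trip to the avatar generation server.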
The messaging server 2 can initiate cache update on a periodic basis or as directed by push notifications from the avatar generation server 5. The messaging server 2 constructs the cache using the icons of the avatar skins for dynamic generation of the video prompts for preview and selection of skins. The generation of an I-Frame is done to match existing background layouts of the video messaging server 2. The system accomplishes this in the simplest of forms with imaging layers where the background image is provided and a layout is selected that would not conflict with logos or other artefacts of the background image. The layout selected during the generation of the video has an impact on the generation of the VXML documents. If the template selected allows three images, the DTMF keys or ASR will be constrained by that count. The system flattens the layered image to construct a single I-Frame for final conversion to a 3GPP standard video. The messaging server 2 generates the avatar skins as motion videos by constructing a series of I-Frames to play over time. These motion videos support a set of transitions (e.g. dissolve-in, fly-in, and appear). Also, the videos are generated in multiple formats, sizes, and bit rates to avoid any unnecessary transcoding of the content depending on phone capabilities; e.g. if the server 2 supports CIF and QCIF resolution, both prompt sets will be generated.
Large frame sizes are a problem for video transmission over UMTS because UMTS has a transmission limit of 64 kb/s, and a single frame can only use a portion of this bandwidth. The maximum frame size of the video message is limited so that only a portion of the frame is received and available for display. While constructing the avatar skin video prompts the messaging server 2 limits the differences of the I-Frames to not exceed a pre-determined maximum frame size. This applies to the complete construction of the video but is especially important for the construction of the initial I-Frame against which the subsequent Frames are referenced. For the initial I-Frame of the video a de-blurring mechanism is applied. The initial I-Frame is copied 3 times, the first is blurred with a Gaussian blur radius of 4, the second is blurred with a Gaussian blur radius of 2, and the final I-Frame is the original I-Frame. This is illustrated in Fig. 5, in which the horizontal axis represents number of frames for up to a 1 second period, and the vertical axis represents frame size in kbits.
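The three-copy sequence (blur radius 4, then 2, then the unblurred original) can be sketched on a small grayscale frame. This is an illustration under stated assumptions: the description does not say how "radius" maps to the Gaussian, so the sketch assumes a kernel spanning the radius with sigma equal to half the radius, and it clamps at the frame edges. The names `gaussian_blur` and `deblur_sequence` are hypothetical.

```python
import math
from typing import List, Sequence

Frame = List[List[float]]


def gaussian_kernel(radius: int) -> List[float]:
    """Normalised 1-D Gaussian weights; sigma = radius / 2 is an assumption."""
    if radius <= 0:
        return [1.0]  # identity kernel: no blur
    sigma = radius / 2.0
    weights = [
        math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(frame: Frame, kernel: List[float]) -> Frame:
    """Convolve every row with the kernel, clamping reads at the row edges."""
    radius = len(kernel) // 2
    width = len(frame[0])
    return [
        [
            sum(
                kernel[k + radius] * row[min(max(x + k, 0), width - 1)]
                for k in range(-radius, radius + 1)
            )
            for x in range(width)
        ]
        for row in frame
    ]


def transpose(frame: Frame) -> Frame:
    return [list(column) for column in zip(*frame)]


def gaussian_blur(frame: Frame, radius: int) -> Frame:
    """Separable blur: rows first, then columns via a transpose."""
    kernel = gaussian_kernel(radius)
    horizontal = blur_rows(frame, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def deblur_sequence(frame: Frame, radii: Sequence[int] = (4, 2, 0)) -> List[Frame]:
    """Return one copy per radius; radius 0 leaves the frame unblurred."""
    return [gaussian_blur(frame, radius) for radius in radii]
```

With the default radii the output starts heavily blurred and ends with the original frame, so each successive frame adds detail rather than all of it arriving in the first frame.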
The system also generates transition prompts ("animation play-out" in Figs. 3 and 4) to be run while the avatar is being generated; these prompts eliminate the dead-air on the call during creation to let the user know that the request is being processed. This cache of information can be refreshed at a predefined rate to keep the cache synchronized with the remote server.
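The two refresh triggers described for the cache (a predefined refresh rate and push notifications from the remote server) can be sketched as follows. The class `SkinCache`, its method names, and the injected `fetch` and `clock` callables are hypothetical; the description does not specify the interface between the servers beyond HTTP or a Web Service.

```python
import time
from typing import Callable, Dict, Optional


class SkinCache:
    """Local copy of the skin listing, refreshed by age or by notification."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, bytes]],
        max_age_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # stands in for the call to the remote server
        self._max_age_s = max_age_s  # the predefined refresh interval
        self._clock = clock
        self._skins: Dict[str, bytes] = {}
        self._loaded_at: Optional[float] = None
        self._stale = True

    def notify_changed(self) -> None:
        """Push path: the remote server reports that its listing changed."""
        self._stale = True

    def refresh_if_needed(self) -> bool:
        """Polling path: reload when stale or older than max_age_s."""
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._max_age_s
        if self._stale or expired:
            self._skins = dict(self._fetch())
            self._loaded_at = now
            self._stale = False
            return True
        return False

    def skins(self) -> Dict[str, bytes]:
        """Read path used during a call; never contacts the remote server."""
        return dict(self._skins)
```

Keeping `refresh_if_needed` separate from `skins` reflects the off-line update described above: call handling only reads the cache, while a background task decides when to reload it.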
By creating the avatar message through the video messaging server 2, the invention allows the message to be sent to the recipient based on the recipient's preferences, which may be MMS, Email, retrieval through Webmail, or calling into the video mail server. The avatar skins can be classified into groups for easy browsing and each avatar skin can be previewed for selection. The preview may be a video based on the avatar skin so that the user may be able to see how the avatar moves. Once the user has selected their desired avatar skin the system 1 in real time generates the full avatar message with the audio and/or video recorded by the user. After the synchronization of the audio and the avatar skin the user can preview the synchronized message prior to sending it to the recipient. The mail server 2 may also provide entries in a subscriber profile (e.g. LDAP or an RDBMS) that allow the subscribers to select a preset series of avatar skins. The user can then select these pre-selected avatar skins as a short-cut, bypassing the full list of available avatar skins, or the pre-selected skins can be selected during an audio-only call.
It will be appreciated that the invention provides the user with an improved experience compared to the prior art. With the invention, management of the avatar functionality is available entirely via mobile video calling in a single session. The user may control creation and review of avatars using:
- Mobile using UMTS video calling
- IMS Mobile PDA (which is SIP-enabled)
- Fixed video phone
- Fixed voice phone in conjunction with television being the video aspect (multimodal)
In a use case example, the invention is used to enhance a video blogging service. An end user calls into the messaging server 2 and records a selected blog and then has their video replaced by a selected avatar. This allows the user to select a video skin for their message, achieving anonymity of the sender's appearance, location, and environment. Another benefit is that the creator of the video blogging service can create an avatar that is themed to the topic of the delivery. This can provide an additional reinforcement of the message delivery. Thus a user creating video blog content describing the importance of business professionalism could select a business-based avatar skin and record the content without concern for their current appearance or location. The messaging server could multi-cast the message to multiple destinations, for example to members of a closed user group. Alternatively, the system may take the resultant avatar message and transcode it to a more suitable format for the Web and upload it to a personal site or to a social networking site for anyone to view. This is similar to the previous example with the slight difference being that rather than a 1-to-1 messaging system it is a one-to-many system.
The subscriber also is subscribed to a set of blogs over an RSS feed. As the RSS service signals a new blog has been updated the service will fetch the content of the blog and parse the resultant HTML document and create an audio file from the resultant text with a text to speech engine; this may be done as multiple segments. The user may have pre-selected a set of avatar skins stored in their profile.
In the above the entire audio is converted to an avatar message, with automatic selection of an avatar skin. However, the audio may be broken into segments and each section individually synchronized with an avatar skin. Each audio segment may be combined with video to provide the content which is synchronized with an avatar skin. The resultant avatar messages may then be stored in a video mail server and retrieved over a video call as typically video messages are stored in a video mail server.
The created avatar messages may be available for preview during a video session to the end user, selectably navigable via dynamically created menu structures with visual descriptors of the content therein.
It will be appreciated that the system 1 seamlessly performs creation of an avatar message in real time, integrated within a video mail session. The seamless and transparent creation of avatar messages makes the use of multi-media content much easier for the user and it is more likely for the end user to exercise the service.
The invention is not limited to the embodiments described herein but may be varied in construction and detail. Thus, for example, advantageously the avatar generation service is not limited to being a remote service and could be integrated with the messaging service. In one embodiment the recorded audio and/or video content is applied to all avatar skins, and the subsequently created avatar messages can be previewed before accepting a given avatar message.
The media server part of the messaging server 2 captures the DTMF. In the case of the ASR the audio is sent to a speech recognition engine which quantifies the speech and returns to the video mail solution the reliability of audio as it maps to the available grammar created by the construction of the VXML documents.
In the above description the content is recorded in real time during a session. However, in another embodiment the content may have been recorded previously and synchronised with a selected avatar skin during the session. In this case the user device may retrieve the previously recorded content from the messaging server. This may be done using a menu system.

Claims

1. A messaging method comprising the steps of:
a messaging server receiving from a user device a request for sending a message,
the messaging server receiving content for inclusion in the message, said content including at least audio content;
an avatar-generation server automatically synchronising the content with an avatar skin to provide an avatar, and
the messaging server incorporating the avatar in a message body to provide an avatar message, and transmitting the avatar message or storing it for later use,
wherein the avatar message is generated in real time during a single session between the user device and the messaging server, the user device either uploading the content during the session or indicating during the session a location for it to be found; and
wherein the messaging server maintains a cache of avatar skins, and retrieves avatar skins from said cache to provide options to the user device for choice of an avatar skin during the session.
2. A method as claimed in claim 1, wherein the avatar generation server is separate from the messaging server, and said servers communicate in real time to perform the method.
3. A method as claimed in claims 1 or 2, wherein the messaging server communicates with the avatar-generation server at intervals to refresh the cache.
4. A method as claimed in claims 1 or 2, wherein the messaging server receives notifications from the avatar-generation server to refresh the cache.
5. A method as claimed in any preceding claim, wherein the messaging server maintains mark-up language documents related to the avatar skins, and executes mark-up of said documents for play-out of the avatar skins to the user device to provide user options for choice of an avatar skin.
6. A method as claimed in claim 5, wherein said documents are grouped by category and subcategory of avatar skin.
7. A method as claimed in claims 5 or 6, wherein each document relates to a menu for user choice of an avatar skin and there are selectable avatar skins per menu.
8. A method as claimed in any of claims 5 to 7, wherein the documents relate to different user device displays for viewing previews.
9. A method as claimed in any of claims 5 to 8, wherein the documents control device displays through standard DTMF, the mark-up language including DTMF control commands in menu options, the menus being generated and stored in the cache to create a display, and the menus being interpreted by a mark-up language interpreter for DTMF interfacing.
10. A method as claimed in any of claims 5 to 8, wherein the documents control device displays through automatic speech recognition, the mark-up language including automatic speech recognition commands for menu options, the menus being generated and stored in the cache to create a display, and the menus being interpreted by a mark-up language interpreter for automatic speech recognition interfacing.
11. A method as claimed in any preceding claim, wherein the messaging server constructs the avatar skins in the cache by limiting differences of I-Frames to not exceed a predetermined maximum frame size.
12. A method as claimed in claim 11, wherein the messaging server applies a de-blurring mechanism to the I-Frames, in which an original I-Frame is copied N times, and the first N-1 copies are blurred with a Gaussian blur radius.
13. A method as claimed in claim 12, wherein the blur radius generally reduces from N=1 to N-1; and the final copy is not blurred.
14. A method as claimed in any preceding claim, wherein the messaging server caches avatar skins indexed with user profiles and automatically selects skins which are indexed with the user.
15. A method as claimed in any preceding claim, wherein the messaging server breaks the recorded content into segments, and the avatar generation server synchronizes the content with the avatar skin segment-by-segment.
16. A method as claimed in any preceding claim, comprising the further step of the messaging server transmitting a preview of the avatar message to the user device and only transmitting the avatar message after receiving user approval.
17. A method as claimed in any preceding claim, comprising the further steps of the messaging server playing out animation content while the avatar generation server generates the avatar.
18. A method as claimed in claim 17, wherein the animation content is retrieved from said cache and is related to the selected avatar skin.
19. A method as claimed in any preceding claim, wherein the messaging server transmits the avatar message according to recipient preferences.
20. A method as claimed in any preceding claim, wherein the messaging server multi-casts the avatar message.
21. A method as claimed in any preceding claim, wherein the content includes video or image content.
22. A messaging system comprising:
a messaging server comprising means for performing messaging server operations of a method of any preceding claim, and an avatar generation server comprising means for performing avatar generation server operations of a method of any preceding claim.
23. A computer program product comprising software code for performing operations of a method of any of claims 1 to 21 when executing on a digital processor.
PCT/IE2009/000003 2008-02-07 2009-02-06 Content generation and communication in a video mail system WO2009098671A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US694708P 2008-02-07 2008-02-07
US61/006,947 2008-02-07

Publications (2)

Publication Number Publication Date
WO2009098671A2 true WO2009098671A2 (en) 2009-08-13
WO2009098671A3 WO2009098671A3 (en) 2009-11-05

Family

ID=40786643

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IE2009/000003 WO2009098671A2 (en) 2008-02-07 2009-02-06 Content generation and communication in a video mail system

Country Status (1)

Country Link
WO (1) WO2009098671A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014139142A1 (en) * 2013-03-15 2014-09-18 Intel Corporation Scalable avatar messaging
US9886622B2 (en) 2013-03-14 2018-02-06 Intel Corporation Adaptive facial expression calibration
CN108563481A (en) * 2018-04-09 2018-09-21 广州阿里巴巴文学信息技术有限公司 Method, equipment and the device of skin real time modifying preview
CN109918214A (en) * 2019-02-26 2019-06-21 深圳知云网络科技有限公司 A kind of group message server-side storage method
US10412029B2 (en) 2015-12-11 2019-09-10 Microsoft Technology Licensing, Llc Providing rich preview of communication in communication summary

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001236290A (en) * 2000-02-22 2001-08-31 Toshinao Komuro Communication system using avatar
WO2001084334A1 (en) * 2000-05-04 2001-11-08 Quarterview Co., Ltd. System and method for message transmission by recording and reproducing avatar action
WO2003094072A1 (en) * 2002-05-03 2003-11-13 Hyun-Gi An System and method for providing avatar mail

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9886622B2 (en) 2013-03-14 2018-02-06 Intel Corporation Adaptive facial expression calibration
WO2014139142A1 (en) * 2013-03-15 2014-09-18 Intel Corporation Scalable avatar messaging
US10044849B2 (en) 2013-03-15 2018-08-07 Intel Corporation Scalable avatar messaging
US10412029B2 (en) 2015-12-11 2019-09-10 Microsoft Technology Licensing, Llc Providing rich preview of communication in communication summary
CN108563481A (en) * 2018-04-09 2018-09-21 广州阿里巴巴文学信息技术有限公司 Method, equipment and the device of skin real time modifying preview
CN108563481B (en) * 2018-04-09 2021-12-24 阿里巴巴(中国)有限公司 Method, equipment and device for modifying preview of skin in real time
CN109918214A (en) * 2019-02-26 2019-06-21 深圳知云网络科技有限公司 A kind of group message server-side storage method
CN109918214B (en) * 2019-02-26 2022-11-18 维正科技服务有限公司 Group message server storage method

Also Published As

Publication number Publication date
WO2009098671A3 (en) 2009-11-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09708420

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09708420

Country of ref document: EP

Kind code of ref document: A2