WO2009098671A2

WO2009098671A2 - Content generation and communication in a video mail system

Info

Publication number: WO2009098671A2
Application number: PCT/IE2009/000003
Authority: WO
Inventors: David Harmony
Original assignee: Markport Limited
Priority date: 2008-02-07
Filing date: 2009-02-06
Publication date: 2009-08-13
Also published as: WO2009098671A3

Abstract

A video mail system (1) comprises a messaging server (2), a storage device (3), a cache (4), and an avatar generation server (5). The messaging server (2) retrieves avatar skins from the avatar generation server (5) and transfers them to the cache (4) for later retrieval in real time. The invention enhances a video mail service by providing dynamic creation of an avatar message during real-time interaction with a user. The video mail system (1) records user audio (and possibly also video) content, presents the available avatar 'skins' to the user, synchronizes the recorded content with a selected avatar skin to provide the body of an avatar message (an 'avatar'), receives the user's preview results, and sends the avatar message to the recipient. All of this is carried out within a video call session.

Description

"Content Generation and Communication in a Video Mail System"

INTRODUCTION

Field of the Invention

The invention relates to generation of content, particularly avatars.

Prior Art Discussion

Some subscribers are hesitant to provide their own image for greetings and messages. Avatars allow users to create greetings and messages without providing their own images, whether for fun or out of personal preference.

US7176956 describes video enhancement of an avatar.

EP 1814294 describes creating personalized multimedia content with video independently of a messaging system.

The invention is directed towards providing an improved method for generating avatars so that users can communicate with them more conveniently.

SUMMARY OF THE INVENTION

According to the invention, there is provided a messaging method comprising the steps of:

a messaging server receiving from a user device a request for sending a message,

the messaging server receiving content for inclusion in the message, said content including at least audio content;

an avatar-generation server automatically synchronising the content with an avatar skin to provide an avatar, and the messaging server incorporating the avatar in a message body to provide an avatar message, and transmitting the avatar message or storing it for later use,

wherein the avatar message is generated in real time during a single session between the user device and the messaging server, the user device either uploading the content during the session or indicating during the session a location for it to be found; and

wherein the messaging server maintains a cache of avatar skins, and retrieves avatar skins from said cache to provide options to the user device for choice of an avatar skin during the session.

In one embodiment, the avatar generation server is separate from the messaging server, and said servers communicate in real time to perform the method.

In one embodiment, the messaging server communicates with the avatar-generation server at intervals to refresh the cache.

In another embodiment, the messaging server receives notifications from the avatar-generation server to refresh the cache.

In one embodiment, the messaging server maintains mark-up language documents related to the avatar skins, and executes mark-up of said documents for play-out of the avatar skins to the user device to provide user options for choice of an avatar skin.

In another embodiment, said documents are grouped by category and subcategory of avatar skin.

In one embodiment, each document relates to a menu for user choice of an avatar skin, and there are several selectable avatar skins per menu.

In a further embodiment, the documents relate to different user device displays for viewing previews.

In one embodiment, the documents control device displays through standard DTMF, the mark-up language including DTMF control commands in menu options, the menus being generated and stored in the cache to create a display, and the menus being interpreted by a mark-up language interpreter for DTMF interfacing.

In one embodiment, the documents control device displays through automatic speech recognition, the mark-up language including automatic speech recognition commands for menu options, the menus being generated and stored in the cache to create a display, and the menus being interpreted by a mark-up language interpreter for automatic speech recognition interfacing.

In another embodiment, the messaging server constructs the avatar skins in the cache by limiting differences of I-Frames to not exceed a pre-determined maximum frame size.

In one embodiment, the messaging server applies a de-blurring mechanism to the I-Frames, in which an original I-Frame is copied N times and the first N-1 copies are blurred with a Gaussian blur radius. Preferably, the blur radius generally reduces from N=1 to N-1, and the final copy is not blurred.

In one embodiment, the messaging server caches avatar skins indexed with user profiles and automatically selects skins which are indexed with the user.

In one embodiment, the messaging server breaks the recorded content into segments, and the avatar generation server synchronizes the content with the avatar skin segment-by-segment.

In one embodiment, the method comprises the further step of the messaging server transmitting a preview of the avatar message to the user device and only transmitting the avatar message after receiving user approval.

In one embodiment, the method comprises the further steps of the messaging server playing out animation content while the avatar generation server generates the avatar.

In one embodiment, the animation content is retrieved from said cache and is related to the selected avatar skin.

In one embodiment, the messaging server transmits the avatar message according to recipient preferences. In one embodiment, the messaging server multi-casts the avatar message.

In one embodiment, the content includes video or image content.

In another aspect, the invention provides a messaging system comprising:

a messaging server comprising means for performing messaging server operations of any method defined above, and

an avatar generation server comprising means for performing avatar generation server operations of any method defined above.

In a further aspect, the invention provides a computer program product comprising software code for performing operations of any method defined above when executing on a digital processor.

DETAILED DESCRIPTION OF THE INVENTION

Brief Description of the Drawings

The invention will be more clearly understood from the following description of some embodiments thereof, given by way of example only with reference to the accompanying drawings in which:

Figs. 1 and 2 are block diagrams illustrating a messaging system of the invention;

Figs. 3 and 4 are message sequence diagrams for operation of the system; and

Fig. 5 is a set of plots illustrating frame processing for a message.

Description of the Embodiments

The following acronyms are used in this specification:

MMS Multimedia Messaging Service

TUI Telephone User Interface

UMTS Universal Mobile Telecommunications System

VXML Voice XML

VUI Video User Interface

DTMF Dual-tone multi-frequency

ASR Automatic Speech Recognition

Avatar Selection and Creation

Referring to Fig. 1, a video mail system 1 of the invention comprises a messaging server 2 (for example a video mail server), a storage device 3, and a cache 4. The system 1 also comprises an avatar generation server 5 and mobile devices 6.

Fig. 2 is a flow diagram showing how the messaging server 2 populates the cache 4. It retrieves avatar skins from the server 5 and transfers them to the cache 4 for later retrieval in real time. A listing of available avatar skins triggers the generation of a local cache of mark-up language (VXML in this embodiment) documents; the list is sub-divided by category and sub-category, and each grouping may span a plurality of documents. Each document has commands for playing out an avatar skin selection menu, and each menu offers several selectable avatar skins. The avatar skin videos required for a messaging session are also constructed, as described in more detail below. The cache is refreshed dynamically.
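The cache population and refresh described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: `SkinEntry`, `SkinCache`, and `fetch_listing` are hypothetical names, `fetch_listing` stands in for whatever network call (e.g. HTTP or a Web Service) returns the avatar generation server's skin listing, and the refresh interval is only an example value. It shows both refresh paths, periodic polling and push notification, and the category/sub-category grouping from which the selection menus are built.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class SkinEntry:
    skin_id: str
    category: str
    subcategory: str


@dataclass
class SkinCache:
    """Local copy of the avatar skin listing, refreshed by polling or by push."""
    fetch_listing: Callable[[], List[SkinEntry]]   # hypothetical call to avatar server 5
    refresh_interval_s: float = 3600.0             # example value only
    skins: Dict[str, SkinEntry] = field(default_factory=dict)
    last_refresh: Optional[float] = None

    def refresh(self, now: float) -> None:
        # Build the new mapping first, then swap it in, so readers never see a partial listing.
        self.skins = {s.skin_id: s for s in self.fetch_listing()}
        self.last_refresh = now

    def poll(self, now: float) -> bool:
        """Periodic path: refresh only when the interval has elapsed."""
        if self.last_refresh is None or now - self.last_refresh >= self.refresh_interval_s:
            self.refresh(now)
            return True
        return False

    def on_push_notification(self, now: float) -> None:
        """Push path: the avatar generation server reports that its listing changed."""
        self.refresh(now)

    def by_category(self) -> Dict[str, Dict[str, List[str]]]:
        """Group skin ids by category and sub-category, as used to build the menus."""
        tree: Dict[str, Dict[str, List[str]]] = {}
        for s in sorted(self.skins.values(), key=lambda e: e.skin_id):
            tree.setdefault(s.category, {}).setdefault(s.subcategory, []).append(s.skin_id)
        return tree
```

Polling and push share the same `refresh` method, so either trigger leaves the cache in the same state.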

The invention enhances a video mail service by providing dynamic creation of an avatar message during real-time interaction with a user. The video mail system 1 records user audio (and possibly also video) content, presents the available avatar "skins" to the user, synchronizes the recorded content with a selected avatar skin to provide the body of an avatar message (an "avatar"), receives the user's preview results, and sends the avatar message to the recipient. All of this is carried out within a video call session.

Referring to Figs. 1 and 3, the steps are as follows:

A mobile device 6 calls in to the messaging server 2 and records content for a message or greeting. This content is typically audio-only, but may include video. The messaging server 2 retrieves avatar skins from the cache 4.

The messaging server 2 plays out the available avatar skins to the user device and the user selects an avatar skin. The messaging server uses pre-created VXML documents and video files to control and present the available options with minimal delay for the end user. This is described in more detail below.

The messaging server 2, still during the messaging session, submits the audio received from the user and the selected avatar skin to the avatar generation server 5.

The avatar generation server 5 generates an avatar by synchronizing the received content and the avatar skin. Simultaneously, the messaging server 2 plays out information concerning the selected avatar skin in order to avoid the session being quiet while the avatar is being generated.

The server 5 sends the generated avatar to the server 2, where it is stored in the permanent storage device 3. The messaging server 2 generates a message incorporating the avatar in its body to provide an avatar message, and sends this message to the user device for preview.

Upon receiving user confirmation, the messaging server 2 sends the avatar message according to recipient preferences. As Fig. 3 shows, the messaging server 2 may perform video editing before sending the preview avatar message.

As shown in Fig. 4, in another method the avatar message is not sent to the recipient but is instead stored for later use. Otherwise, the steps are the same.
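The message sequence of Figs. 3 and 4 can be sketched as a single function. In this minimal Python sketch, every callable (`record`, `choose_skin`, `generate_avatar`, `play_animation`, `user_approves`, `deliver`, `store`) is a hypothetical stand-in for the real media-server or network operation. The point is the ordering within one session, and the fact that the sample animation keeps playing while avatar generation is still pending.

```python
import concurrent.futures
import time
from typing import Callable, List, Optional


def avatar_session(
    record: Callable[[], bytes],
    choose_skin: Callable[[List[str]], str],
    cached_skins: List[str],
    generate_avatar: Callable[[bytes, str], bytes],  # hypothetical call to avatar server 5
    play_animation: Callable[[str], None],           # hypothetical cached sample clip play-out
    user_approves: Callable[[bytes], bool],
    deliver: Optional[Callable[[bytes], None]] = None,
    store: Optional[Callable[[bytes], None]] = None,
    poll_s: float = 0.01,
) -> Optional[bytes]:
    """Run one avatar-message session; returns the avatar, or None if rejected."""
    content = record()                          # audio (and possibly video) from the caller
    skin = choose_skin(cached_skins)            # options come from the local cache 4
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(generate_avatar, content, skin)
        # Avoid dead air: keep playing the skin's sample animation until generation ends.
        while not future.done():
            play_animation(skin)
            time.sleep(poll_s)
        avatar = future.result()
    if not user_approves(avatar):               # preview step
        return None
    if deliver is not None:
        deliver(avatar)                          # Fig. 3: send per recipient preferences
    elif store is not None:
        store(avatar)                            # Fig. 4: store for later use
    return avatar
```

A thread pool is used only to keep the sketch self-contained; any asynchronous client for the avatar generation server would serve the same purpose.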

Because the video messaging server 2 retrieves the available avatar skins from the locally generated cache 4, it can respond quickly to the caller, who therefore avoids long pauses caused by network latency while previewing the available avatar skins.

The avatar message could be an audio greeting or any other type of message for another subscriber. In more detail, after recording the audio and/or video content, the user is automatically offered the service of creating an avatar message. The user is then able to view the available avatar skins through the VXML-driven menu system and to preview specific avatar skins as still images or as video previews. The invention allows flexibility by providing for dynamic updating of the cache 4, which stores data including the avatar skins and the VXML documents. The updating may be performed either by the messaging server 2 regularly polling an external service or by the external service pushing notifications to the messaging server 2. The VXML documents handle the controlled play-out to the user devices, including the prompts for user confirmation or rejection of the preview avatar message. Also, the avatar skins stored in the cache are generated in a manner that makes very efficient use of bandwidth for transmission to the user devices (the sender for preview and the recipient for delivery), particularly for the UMTS standard. This generation of the avatar skins involves transforming from JPEG to a 3GPP or MPEG video file that is adapted for optimum delivery on the bearer channel.

Once the user has selected their preferred avatar skin, their recorded content is sent in real time by the messaging server 2 over a network connection to the avatar generation server 5. The server 5 synchronizes the content with the subscriber's selected avatar skin and returns it to the messaging server 2 for permanent storage or delivery in the session. The messaging server 2 may modify the generated avatar message for the appropriate transport medium. For example, for UMTS transmission, the avatar message may be processed by the messaging server 2 to reduce the maximum frame size. Alternatively, the server 2 may be adapted to truncate the message to comply with MMS size requirements.
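The transport adaptation mentioned above can be illustrated with two small checks. In this Python sketch, the 75% video share of the 64 kbit/s bearer and the 0.2 s delivery window are illustrative assumptions, not values from this specification, and the MMS size limit is passed in because it varies between operators. Truncation keeps a prefix of frames, which assumes a stream without B-frames, so that no retained frame references a dropped one.

```python
from itertools import accumulate
from typing import List


def frame_budget_bits(bearer_bps: int = 64_000, video_share: float = 0.75, window_s: float = 0.2) -> int:
    """Largest frame (bits) deliverable in window_s using the video share of the bearer.
    video_share and window_s are illustrative assumptions."""
    return int(bearer_bps * video_share * window_s)


def oversized_frames(frame_bits: List[int], budget_bits: int) -> List[int]:
    """Indices of frames that would need re-encoding to respect the budget."""
    return [i for i, bits in enumerate(frame_bits) if bits > budget_bits]


def frames_to_keep_for_mms(frame_bytes: List[int], limit_bytes: int, overhead_bytes: int = 0) -> int:
    """Number of leading frames that fit under an MMS size limit.
    Keeping a prefix leaves every retained frame decodable when later frames
    only reference earlier ones (no B-frames assumed)."""
    room = limit_bytes - overhead_bytes
    kept = 0
    for total in accumulate(frame_bytes):
        if total > room:
            break
        kept += 1
    return kept
```

With the defaults, `frame_budget_bits()` gives 9600 bits per frame; a different share or window changes only the arithmetic, not the check.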

While the content/avatar skin synchronization is performed on the server 5, a locally stored video clip, i.e. a sample animation for the selected avatar skin, is played to the user device for an improved experience. This clip is retrieved from the cache 4.

The system 1 can allow different types of message and consequently different avatar skins for each type of message. There will for example be different types of messages in response to different call situations such as recipient busy, recipient not answering, or in response to caller identification.

The messaging server allows message delivery via a variety of different delivery mechanisms, which can be according to recipient preferences. For instance, if the recipient desires, they may enable MMS Push of their messages or alternatively they could log on and retrieve the message through a video call.
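Delivery according to recipient preferences, including sending the same avatar message to several recipients, could look like the following Python sketch. The channel names and handlers are hypothetical; a real deployment would hand off to an MMSC, an email relay, and so on. Falling back to the video mailbox when a preference is missing or unsupported is an assumption made for the example.

```python
from typing import Callable, Dict, Iterable, List

# A channel handler takes (recipient, message) and returns a delivery receipt string.
Channel = Callable[[str, bytes], str]


def deliver_avatar_message(
    message: bytes,
    recipients: Iterable[str],
    preferences: Dict[str, str],
    channels: Dict[str, Channel],
    default_channel: str = "videomail",
) -> List[str]:
    """Send one avatar message to each recipient over that recipient's preferred channel."""
    receipts = []
    for recipient in recipients:                       # several recipients = multi-cast
        wanted = preferences.get(recipient, default_channel)
        handler = channels.get(wanted, channels[default_channel])  # assumed fallback
        receipts.append(handler(recipient, message))
    return receipts
```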

The messaging server 2, using the cache 4, advantageously limits the amount of 'dead air' users experience and reserves processing power for handling the call rather than for content generation. Because the VXML documents and avatar skin video files are cached, there are none of the delays that would otherwise arise if the messaging server 2 had to interface with the remote avatar server 5 for the available avatar skins and for selection menu play-out. The candidate avatar skins are generated in advance and stored in the cache, allowing them to be played out at high speed. The server 2 is configurable to support different numbers of avatar skins per page. The avatar generation server 5 communicates via a network interface (e.g. HTTP, Web Service).

The video messaging server 2 queries the avatar-generation server 5 off-line to determine what avatar skins are available to be created by the system 1. The listing of available avatar skins triggers the creation of a local cache of VXML documents; the list is subdivided by category/sub-category and each grouping can span multiple VXML documents (for example, there may be four selectable avatars per VXML menu). The multiple documents address the limited displays available on mobile phones for viewing the available previews. The VXML documents handle the control of the user experience through standard DTMF or ASR via generation of the VXML menus.
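The paging of a sub-category listing into VXML menus can be sketched with Python's standard `xml.etree.ElementTree`. The document structure (`<vxml>`, `<menu dtmf="true">`, `<prompt>`, `<choice next=...>`) follows VoiceXML 2.x, where `dtmf="true"` binds keys 1 to 9 to the first nine choices and the text of each `<choice>` can serve as its speech grammar, so one document covers both DTMF and ASR selection. The file names, the text prompt (a real video mail platform would play a generated video prompt), and the extra "more" choice are hypothetical. With one key reserved for "more", `per_menu` should stay at 8 or below.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_menu_documents(category: str, subcategory: str, skins: List[str], per_menu: int = 4) -> List[str]:
    """Page one sub-category's skins into VoiceXML menu documents, per_menu skins each."""
    pages = [skins[i:i + per_menu] for i in range(0, len(skins), per_menu)]
    docs = []
    for n, page in enumerate(pages):
        vxml = ET.Element("vxml", version="2.1", xmlns="http://www.w3.org/2001/vxml")
        # dtmf="true": the interpreter assigns keys 1, 2, 3... to the choices in order.
        menu = ET.SubElement(vxml, "menu", id=f"{category}_{subcategory}_{n}", dtmf="true")
        ET.SubElement(menu, "prompt").text = f"{category}, {subcategory}: page {n + 1} of {len(pages)}"
        for skin in page:
            # The choice text doubles as the ASR phrase for this skin.
            choice = ET.SubElement(menu, "choice", next=f"preview_{skin}.vxml")
            choice.text = skin
        if n + 1 < len(pages):
            more = ET.SubElement(menu, "choice", next=f"{category}_{subcategory}_{n + 1}.vxml")
            more.text = "more"
        docs.append(ET.tostring(vxml, encoding="unicode"))
    return docs
```

Because these documents are generated offline and stored in the cache, the only work at call time is handing a ready-made document to the VXML interpreter.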

The messaging server 2 can initiate a cache update on a periodic basis or as directed by push notifications from the avatar generation server 5. The messaging server 2 constructs the cache using the icons of the avatar skins for dynamic generation of the video prompts used to preview and select skins. Each I-Frame is generated to match the existing background layouts of the video messaging server 2. In its simplest form, the system accomplishes this with image layers: the background image is provided, and a layout is selected that does not conflict with logos or other artefacts of the background image. The layout selected during generation of the video affects the generation of the VXML documents: if the selected template allows three images, the DTMF keys or ASR options are constrained to that count. The system flattens the layered image into a single I-Frame for final conversion to a 3GPP standard video. The messaging server 2 generates the avatar skins as motion videos by constructing a series of I-Frames to play over time. These motion videos support a set of transitions (e.g. dissolve-in, fly-in, and appear). The videos are also generated in multiple formats, sizes, and bit rates, so that the content need not be transcoded to suit phone capabilities; for example, if the server 2 supports CIF and QCIF resolutions, both prompt sets are generated.
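Pre-generating every prompt variant, so that nothing needs transcoding during a call, amounts to enumerating one encode job per page, resolution, and bit rate. In this Python sketch, `PromptJob` and `plan_prompt_jobs` are hypothetical names and the bit rates are caller-supplied examples; the CIF (352x288) and QCIF (176x144) picture sizes are the standard ones. The page size follows the number of image slots in the chosen layout template, which is also the number of DTMF keys or ASR options the corresponding menu can offer.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Standard picture sizes (width, height) for the formats named in the text.
RESOLUTIONS = {"QCIF": (176, 144), "CIF": (352, 288)}


@dataclass(frozen=True)
class PromptJob:
    page: Tuple[str, ...]      # skins shown together in one prompt, one per image slot
    resolution: str
    size: Tuple[int, int]
    bitrate_bps: int
    transition: str


def plan_prompt_jobs(
    skins: Sequence[str],
    template_slots: int,
    resolutions: Sequence[str],
    bitrates_bps: Sequence[int],
    transition: str = "dissolve-in",
) -> List[PromptJob]:
    """One encode job per (page, resolution, bit rate), so nothing is transcoded at call time."""
    if template_slots < 1:
        raise ValueError("layout template must have at least one image slot")
    pages = [tuple(skins[i:i + template_slots]) for i in range(0, len(skins), template_slots)]
    return [
        PromptJob(page, res, RESOLUTIONS[res], rate, transition)
        for page in pages
        for res in resolutions
        for rate in bitrates_bps
    ]
```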

Large frame sizes are a problem for video transmission over UMTS because UMTS video calling has a transmission limit of 64 kbit/s, and a single frame can use only a portion of this bandwidth. If a frame is too large, only a portion of it may be received and available for display, so the maximum frame size of the video message is limited. While constructing the avatar skin video prompts, the messaging server 2 limits the differences of the I-Frames so as not to exceed a pre-determined maximum frame size. This applies to the construction of the complete video, but is especially important for the initial I-Frame, against which the subsequent frames are referenced. A de-blurring mechanism is applied to the initial I-Frame of the video: the initial I-Frame is copied 3 times; the first copy is blurred with a Gaussian blur radius of 4, the second with a Gaussian blur radius of 2, and the final copy is the original I-Frame. This is illustrated in Fig. 5, in which the horizontal axis represents the number of frames over a period of up to 1 second, and the vertical axis represents frame size in kbits.
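The de-blurring mechanism for the initial I-Frame can be sketched in pure Python on a grayscale frame. Assumptions: the Gaussian sigma is taken equal to the stated blur radius, edges are handled by clamping, and the radius halves at each copy, which reproduces the 4, 2, unblurred sequence above for N = 3 but is only one way to make the radius "generally reduce". Encoding the copies into the 3GPP video is not shown; the idea is that the heavily blurred first copy carries little high-frequency detail and therefore encodes to a small frame, with later copies adding detail.

```python
import math
from typing import List

Frame = List[List[float]]  # grayscale luma samples, one list per row


def gaussian_kernel(radius: float) -> List[float]:
    """Normalized 1-D Gaussian taps; sigma is taken equal to the radius (assumption)."""
    half = max(1, math.ceil(3 * radius))
    weights = [math.exp(-(x * x) / (2 * radius * radius)) for x in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _clamp(v: int, hi: int) -> int:
    return min(max(v, 0), hi)


def gaussian_blur(frame: Frame, radius: float) -> Frame:
    """Separable blur (horizontal pass, then vertical pass) with edge clamping."""
    if radius <= 0:
        return [row[:] for row in frame]       # radius 0: unblurred copy
    k = gaussian_kernel(radius)
    half = len(k) // 2
    h, w = len(frame), len(frame[0])
    tmp = [[sum(k[j] * frame[y][_clamp(x + j - half, w - 1)] for j in range(len(k)))
            for x in range(w)] for y in range(h)]
    return [[sum(k[j] * tmp[_clamp(y + j - half, h - 1)][x] for j in range(len(k)))
             for x in range(w)] for y in range(h)]


def blur_schedule(n_copies: int, max_radius: float) -> List[float]:
    """Radius for each copy: halves per step (assumption); the last copy is unblurred."""
    return [max_radius / 2 ** i for i in range(n_copies - 1)] + [0.0]


def progressive_iframes(frame: Frame, n_copies: int = 3, max_radius: float = 4.0) -> List[Frame]:
    """N copies of the initial I-Frame, from most blurred to the original."""
    return [gaussian_blur(frame, r) for r in blur_schedule(n_copies, max_radius)]
```

For example, `progressive_iframes(frame)` returns three frames: blurred with radius 4.0, blurred with radius 2.0, and an unmodified copy.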

The system also generates transition prompts ("animation play-out" in Figs. 3 and 4) to be run while the avatar is being generated; these prompts fill what would otherwise be dead air on the call during creation and let the user know that the request is being processed. This cache of information can be refreshed at a predefined rate to keep it synchronized with the remote server.

By creating the avatar message through the video messaging server 2, the invention allows the message to be sent to the recipient according to the recipient's preferences, which may be MMS, email, retrieval through Webmail, or calling into the video mail server. The avatar skins can be classified into groups for easy browsing, and each avatar skin can be previewed for selection. The preview may be a video based on the avatar skin, so that the user can see how the avatar moves. Once the user has selected the desired avatar skin, the system 1 generates in real time the full avatar message with the audio and/or video recorded by the user. After the audio has been synchronized with the avatar skin, the user can preview the synchronized message before sending it to the recipient. The mail server 2 may also provide entries in a subscriber profile (e.g. in LDAP or an RDBMS) that allow subscribers to select a preset series of avatar skins. The user can then select these pre-selected avatar skins as a short-cut, bypassing the full listing of available avatar skins; the presets may also be selected during an audio-only call.
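The profile short-cut can be sketched as a lookup that offers the subscriber's preset skins first. Here the profile is a plain dictionary standing in for an LDAP or RDBMS entry, and the names `skins_to_offer` and `auto_select_skin` are hypothetical. Presets that are no longer in the cache are ignored, and an empty result falls back to the full listing; both behaviours are assumptions for the example.

```python
from typing import Dict, List, Optional


def skins_to_offer(profile: Dict[str, List[str]], user: str, cached: List[str]) -> List[str]:
    """Preset skins still present in the cache, or the full cached listing if none remain."""
    available = set(cached)
    presets = [skin for skin in profile.get(user, []) if skin in available]
    return presets if presets else list(cached)


def auto_select_skin(profile: Dict[str, List[str]], user: str, cached: List[str]) -> Optional[str]:
    """Automatic choice for unattended flows: the first offered skin, if any."""
    offered = skins_to_offer(profile, user, cached)
    return offered[0] if offered else None
```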

It will be appreciated that the invention provides the user with an improved experience compared to the prior art. With the invention, management of the avatar functionality is available entirely via mobile video calling in a single session. The user may control creation and review of avatars using:

- Mobile using UMTS video calling

- IMS Mobile PDA (which is SIP-enabled)

- Fixed video phone

- Fixed voice phone in conjunction with a television providing the video aspect (multimodal)

In a use case example, the invention is used to enhance a video blogging service. An end user calls into the messaging server 2, records content for a selected blog, and then has their video replaced by a selected avatar. This allows the user to select a video skin for their message, keeping the sender's appearance, location, and environment anonymous. Another benefit is that the creator of the video blogging service can create an avatar that is themed to the topic of the content, which can further reinforce the message. Thus a user creating video blog content about the importance of business professionalism could select a business-based avatar skin and record the content without concern for their current appearance or location. The messaging server could multi-cast the message to multiple destinations, for example to members of a closed user group. Alternatively, the system may transcode the resulting avatar message to a format more suitable for the Web and upload it to a personal site or to a social networking site for anyone to view. This is similar to the previous example, except that rather than a one-to-one messaging system it is a one-to-many system.

The subscriber is also subscribed to a set of blogs over an RSS feed. When the RSS service signals that a blog has been updated, the service fetches the content of the blog, parses the resulting HTML document, and creates an audio file from the resulting text with a text-to-speech engine; this may be done in multiple segments. The user may have pre-selected a set of avatar skins stored in their profile.
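The text-extraction step of the RSS use case can be sketched with Python's standard `html.parser`. This sketch only prepares input for a text-to-speech engine, which is not shown since the specification names no particular engine: it keeps visible text, skips `<script>` and `<style>` contents, and groups sentences into segments of at most `max_chars` characters (a single sentence longer than `max_chars` is kept whole). The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def blog_text(html: str) -> str:
    """Plain text of a blog page with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def tts_segments(text: str, max_chars: int = 200) -> List[str]:
    """Group sentences into segments that can be synthesized separately."""
    segments: List[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if current and len(current) + 1 + len(sentence) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        segments.append(current)
    return segments
```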

In the above, the entire audio is converted to an avatar message, with automatic selection of an avatar skin. However, the audio may instead be broken into segments, and each segment individually synchronized with an avatar skin. Each audio segment may be combined with video to provide the content that is synchronized with an avatar skin. The resulting avatar messages may then be stored in a video mail server and retrieved over a video call, in the same way as ordinary video messages stored in a video mail server.
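Segment-by-segment synchronization can be sketched as splitting the audio into fixed-length chunks and pairing each chunk with a skin. In this Python sketch, audio is a plain sequence of samples, `synchronize` is a hypothetical stand-in for the avatar generation server's synchronization call, and cycling through the user's pre-selected skins is an assumed policy, not one stated in the specification.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_audio(samples: Sequence[int], sample_rate: int, max_seconds: float) -> List[Sequence[int]]:
    """Cut the audio into consecutive chunks of at most max_seconds each."""
    step = max(1, int(sample_rate * max_seconds))
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def synchronize_segments(
    segments: List[Sequence[int]],
    skins: List[str],
    synchronize: Callable[[Sequence[int], str], T],  # hypothetical avatar server call
) -> List[T]:
    """Synchronize each segment individually, cycling through the preset skins."""
    if not skins:
        raise ValueError("at least one avatar skin is required")
    return [synchronize(seg, skins[i % len(skins)]) for i, seg in enumerate(segments)]
```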

The created avatar messages may be available for preview during a video session to the end user, selectably navigable via dynamically created menu structures with visual descriptors of the content therein.

It will be appreciated that the system 1 seamlessly performs creation of an avatar message in real time, integrated within a video mail session. This seamless and transparent creation of avatar messages makes multimedia content much easier to use and makes end users more likely to use the service.

The invention is not limited to the embodiments described herein but may be varied in construction and detail. Thus, for example, advantageously the avatar generation service is not limited to being a remote service and could be integrated with the messaging service. In one embodiment the recorded audio and/or video content is applied to all avatar skins, and the subsequently created avatar messages can be previewed before accepting a given avatar message.

The media server part of the messaging server 2 captures the DTMF input. In the case of ASR, the audio is sent to a speech recognition engine, which analyses the speech and returns to the video mail solution a measure of how reliably the audio maps to the available grammar created by the construction of the VXML documents.
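Resolving a menu selection from either DTMF or ASR input can be sketched as follows. The 0.0 to 1.0 confidence scale and the 0.5 default threshold mirror VoiceXML's `confidence` shadow variable and the default of its `confidencelevel` property; the `AsrResult` type and `resolve_choice` function are hypothetical. Returning `None` corresponds to a no-match case, in which the VXML document would re-prompt the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AsrResult:
    utterance: str
    confidence: float  # 0.0 to 1.0, as in VoiceXML's confidence shadow variable


def resolve_choice(
    choices: List[str],
    dtmf: Optional[str] = None,
    asr: Optional[AsrResult] = None,
    min_confidence: float = 0.5,
) -> Optional[str]:
    """Map a DTMF key or a recognized utterance to one of the menu choices."""
    if dtmf is not None:
        # Key "1" selects the first choice, "2" the second, and so on.
        keys = {str(n): choice for n, choice in enumerate(choices, start=1)}
        return keys.get(dtmf)
    if asr is not None and asr.confidence >= min_confidence:
        spoken = asr.utterance.strip().lower()
        for choice in choices:
            if choice.lower() == spoken:
                return choice
    return None  # no match: the VXML document would re-prompt the caller
```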

In the above description the content is recorded in real time during a session. However, in another embodiment the content may have been recorded previously and synchronised with a selected avatar skin during the session. In this case the user device may retrieve the previously recorded content from the messaging server. This may be done using a menu system.

Claims

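1. A messaging method comprising the steps of:

a messaging server receiving from a user device a request for sending a message,

the messaging server receiving content for inclusion in the message, said content including at least audio content;

an avatar-generation server automatically synchronising the content with an avatar skin to provide an avatar, and

the messaging server incorporating the avatar in a message body to provide an avatar message, and transmitting the avatar message or storing it for later use,

wherein the avatar message is generated in real time during a single session between the user device and the messaging server, the user device either uploading the content during the session or indicating during the session a location for it to be found; and

wherein the messaging server maintains a cache of avatar skins, and retrieves avatar skins from said cache to provide options to the user device for choice of an avatar skin during the session.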

2. A method as claimed in claim 1, wherein the avatar generation server is separate from the messaging server, and said servers communicate in real time to perform the method.

3. A method as claimed in claims 1 or 2, wherein the messaging server communicates with the avatar-generation server at intervals to refresh the cache.

4. A method as claimed in claims 1 or 2, wherein the messaging server receives notifications from the avatar-generation server to refresh the cache.

5. A method as claimed in any preceding claim, wherein the messaging server maintains mark-up language documents related to the avatar skins, and executes mark-up of said documents for play-out of the avatar skins to the user device to provide user options for choice of an avatar skin.

6. A method as claimed in claim 5, wherein said documents are grouped by category and subcategory of avatar skin.

7. A method as claimed in claims 5 or 6, wherein each document relates to a menu for user choice of an avatar skin and there are selectable avatar skins per menu.

8. A method as claimed in any of claims 5 to 7, wherein the documents relate to different user device displays for viewing previews.

9. A method as claimed in any of claims 5 to 8, wherein the documents control device displays through standard DTMF, the mark-up language including DTMF control commands in menu options, the menus being generated and stored in the cache to create a display, and the menus being interpreted by a mark-up language interpreter for DTMF interfacing.

10. A method as claimed in any of claims 5 to 8, wherein the documents control device displays through automatic speech recognition, the mark-up language including automatic speech recognition commands for menu options, the menus being generated and stored in the cache to create a display, and the menus being interpreted by a mark-up language interpreter for automatic speech recognition interfacing.

11. A method as claimed in any preceding claim, wherein the messaging server constructs the avatar skins in the cache by limiting differences of I-Frames to not exceed a predetermined maximum frame size.

12. A method as claimed in claim 11, wherein the messaging server applies a de-blurring mechanism to the I-Frames, in which an original I-Frame is copied N times, and the first N-1 copies are blurred with a Gaussian blur radius.

13. A method as claimed in claim 12, wherein the blur radius generally reduces from N=1 to N-1, and the final copy is not blurred.

14. A method as claimed in any preceding claim, wherein the messaging server caches avatar skins indexed with user profiles and automatically selects skins which are indexed with the user.

15. A method as claimed in any preceding claim, wherein the messaging server breaks the recorded content into segments, and the avatar generation server synchronizes the content with the avatar skin segment-by-segment.

16. A method as claimed in any preceding claim, comprising the further step of the messaging server transmitting a preview of the avatar message to the user device and only transmitting the avatar message after receiving user approval.

17. A method as claimed in any preceding claim, comprising the further steps of the messaging server playing out animation content while the avatar generation server generates the avatar.

18. A method as claimed in claim 17, wherein the animation content is retrieved from said cache and is related to the selected avatar skin.

19. A method as claimed in any preceding claim, wherein the messaging server transmits the avatar message according to recipient preferences.

20. A method as claimed in any preceding claim, wherein the messaging server multi-casts the avatar message.

21. A method as claimed in any preceding claim, wherein the content includes video or image content.

22. A messaging system comprising:

a messaging server comprising means for performing messaging server operations of a method of any preceding claim, and an avatar generation server comprising means for performing avatar generation server operations of a method of any preceding claim.

23. A computer program product comprising software code for performing operations of a method of any of claims 1 to 21 when executing on a digital processor.