WO2007002448A1 - Apparatus, system, method, and article of manufacture for automatic context-based media transformation and generation - Google Patents

Apparatus, system, method, and article of manufacture for automatic context-based media transformation and generation

Info

Publication number
WO2007002448A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
media
transformed
context
information
Prior art date
Application number
PCT/US2006/024552
Other languages
French (fr)
Inventor
Lalit S. Sarna
David M. Westwood
Gregory L. Lutter
Jianyu Deng
Daniel F. Gies
Samuel Trychin
Original Assignee
Vidiator Enterprises Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vidiator Enterprises Inc.
Priority to EP06785469A (published as EP1908255A1)
Priority to JP2008518455A (published as JP2008544412A)
Publication of WO2007002448A1

Classifications

All classifications fall under section H (Electricity), in class H04L (Transmission of digital information, e.g. telegraphic communication) and class H04W (Wireless communication networks):

    • H04L 65/80 Responding to QoS (network arrangements for supporting real-time applications in data packet communication)
    • H04L 65/756 Media network packet handling adapting media to device capabilities
    • H04L 65/762 Media network packet handling at the source
    • H04L 65/764 Media network packet handling at the destination
    • H04L 67/565 Conversion or adaptation of application format or content (provisioning of proxy services)
    • H04W 4/02 Services making use of location information
    • H04L 67/303 Terminal profiles
    • H04L 67/306 User profiles
    • H04L 67/52 Network services specially adapted for the location of the user terminal
    • H04L 67/53 Network services using third party service providers
    • H04L 67/561 Adding application-functional data or data for application control, e.g. adding metadata


Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments provide automatic media generation or other transformation (including enhancement) of multiple media inputs for delivery to user devices in a device-agnostic manner. The transformation process is further automatically enhanced based on contextual information. The contextual information provides customized content enhancement to the media delivered to a user device. The contextual information is automatically integrated with the transformed and generated media to provide highly customized rich-media content delivery in a device-agnostic manner.

Description

APPARATUS, SYSTEM, METHOD, AND ARTICLE OF MANUFACTURE FOR AUTOMATIC CONTEXT-BASED MEDIA TRANSFORMATION AND GENERATION
CROSS-REFERENCE TO RELATED APPLICATION
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application Serial No. 60/693,381, entitled "APPARATUS, SYSTEM, METHOD, AND ARTICLE OF MANUFACTURE FOR AUTOMATIC CONTEXT-BASED MEDIA TRANSFORMATION AND GENERATION," filed June 23, 2005, assigned to the same assignee as the present application, and incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present disclosure generally relates to delivery of media over a communications network, and in particular but not exclusively, relates to media generation and transformation for delivery over a communication network to client devices.
BACKGROUND INFORMATION
Various forms of media format conversion are presently available in the market. For example, many companies routinely use a voice recognition system for telephones to input data to customer service server-side applications that convert the voice data to text data for processing. Media transformation techniques include, but are not limited to: text to speech, image to 3D renders and image to video. Also known in the prior art are various techniques for customizing data for delivery to various user devices depending on device capabilities. For example, it is known to modify data at a server-side application for varying display and audio capabilities before delivery to a client device, usually based on some information provided by the requesting device. Examples include modifying web content for a cell-phone or PDA-sized display versus a full-size PC display. However, such basic media format conversion and media transformation techniques leave much to be desired, particularly in the present environments where users have an ever-growing demand for many different types of media content that can be easily delivered to many different types of client devices. Such media format conversion and media transformation techniques are simply too basic and primitive to satisfy the needs of technologically sophisticated users who wish to maximize the benefits of their varying user devices.
BRIEF SUMMARY
According to one aspect, a method includes obtaining first data, analyzing the first data to obtain context information therefrom, using the obtained context information to determine second data, supplementing the first data with the determined second data to obtain transformed data, and delivering the transformed data to at least one client device.
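As an informal aid to reading the claimed steps, the sketch below lines them up as a small pipeline: obtain first data, analyze it for context, determine second data from that context, supplement the first data, and deliver the result. It is only an illustration; every function and field name (analyze_context, determine_second_data, TransformedData, and the sample cues) is hypothetical and not drawn from the application.

```python
from dataclasses import dataclass, field

@dataclass
class TransformedData:
    original: str                                      # the first data as obtained
    supplements: list = field(default_factory=list)    # second data derived from context

def analyze_context(first_data: str) -> dict:
    """Extract simple context cues (here: keyword spotting) from the first data."""
    context = {}
    if "rain" in first_data.lower():
        context["weather"] = "rainy"
    if ":)" in first_data:
        context["mood"] = "happy"
    return context

def determine_second_data(context: dict) -> list:
    """Map context information to supplemental media elements."""
    supplements = []
    if context.get("weather") == "rainy":
        supplements.append("rain_sound.wav")
    if context.get("mood") == "happy":
        supplements.append("smiling_avatar.3d")
    return supplements

def transform_and_deliver(first_data: str, deliver) -> None:
    context = analyze_context(first_data)
    second_data = determine_second_data(context)
    transformed = TransformedData(original=first_data, supplements=second_data)
    deliver(transformed)  # e.g., stream, MMS, WAP PUSH, or download link to a client device

# Example usage
transform_and_deliver("Rain expected all day :)", deliver=print)
```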
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Non-limiting and non-exhaustive embodiments are described with reference to the following figures, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified.
Figure 1 is a block diagram of a system according to one embodiment.
Figure 2 is a block diagram of a system according to another embodiment.
Figure 3 is a block diagram of a system according to still another embodiment.
Figure 4 is a block diagram of a system according to yet another embodiment.
DETAILED DESCRIPTION
Embodiments for automatic context-based media transformation and generation are described herein. In the following description, numerous specific details are given to provide a thorough understanding of embodiments. One skilled in the relevant art will recognize, however, that the embodiments can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the embodiments.
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
As explained above, some prior art techniques exist for media format conversion and media transformation. In the prior art, however, no systems or methods have been introduced to provide automated media generation or transformation for simultaneous multiple media inputs and device outputs. In particular, no systems or methods are available to provide media transformation and media enhancement based on contextual parameters. As an overview, an embodiment provides a system for automatic media generation or other transformation (including enhancement) of multiple media input formats for delivery to a user device in a device-agnostic manner. In one embodiment, the transformation process is further automatically enhanced based on contextual information. The contextual information provides customized content enhancement to whatever media is delivered to a user device. The contextual information is automatically integrated with the transformed and generated media to provide highly customized rich-media content delivery in a device-agnostic manner.
Fig. 1 illustrates a first system 10 for implementing an embodiment. Fig. 1 shows an automated media customization system 100 having a plurality of media inputs (101, 102, 103, etc.) and customized media outputs (111, 121, 131, etc.) that feed one or more user devices (110, 120, 130, etc.). The input media (101, 102, 103, etc.) can take any form such as audio, video, 3D renders, graphics, Internet/web content, live audio or video, files, animation, or text feeds such as stock quotes, news and weather broadcasts, satellite images, and sports feeds, for example.
The various media inputs (101, 102, 103, etc.) to the media customization system 100 can include content created by end users, such as text messages, audio messages, images, and audio clips. The media inputs can also include preset list information such as fixed text messages, avatars or other pictorial representations, images, and themes. The media inputs can be supplied by third-party information services such as news feeds, weather reports, sports headlines, and stock quotes, for example. It is appreciated that media types/forms other than those specifically mentioned herein may be contained in the media inputs (101, 102, 103, etc.). An embodiment of the automatic media customization system 100 provides on-the-fly format conversion for the various media inputs (101, 102, 103, etc.) and can also generate media based on contextual parameters to combine with the input media to form respective customized media outputs (111, 121, 131, etc.) for each user device; one or more of the devices (110, 120, 130, etc.) can also receive more than one of the media outputs.
According to an embodiment, a user of the client device(s) can select a media delivery preference for one or more destination client devices (110, 120, 130, etc.). The system 100 may be deployed as stand-alone client technologies or a combination of server-client technologies. In one embodiment, the customized media is generated on a server (e.g., the media customization system 100 is located at the server) and delivered to the devices (110, 120, 130, etc.). It is also possible to generate some or all of the media at the device(s) (110, 120, 130, etc.), based on instructions and data from the media customization system 100. Media generation is optimized to use the resources available for individual user accounts and/or devices and the components available within the media customization system.
A context component 107 of one embodiment determines contextual information and/or integrates contextual elements into an enriched media format for delivery to a client device. The contextual information may be preset or deduced based on several factors including application types, media types, location of the client device, end-user input, and end-user demographics, as examples. The resulting context-enriched media and/or other enriched media may be generated by a media generation component 106 in order to match the client device capabilities or factors listed above. The media generation component 106 and the context component 107 may comprise the same component in one embodiment.
In one embodiment, the context component 107 includes an input for context, such as preferences entered by the user and sent from a client device, or context data that is deduced from user account data such as service request history, location data, billing data, applied demographic data, user geographic/location data, etc., stored in a customer information database 109 or other storage unit, which may be separate from the media customization system 100 and/or integrated therein. In one embodiment, other contextual elements can be determined by the context component 107 by decomposing a media input and parsing the content, and then generating new customized media using any determined contextual elements, such as emotive elements in text and speech (like vocal inflections and emoticons) and location-based context (such as rain sounds for a rainy weather forecast).
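One way to picture how explicit preferences and deduced account data can be merged into a single context is sketched below. The field names (city, age, request_history) and the deduction rules are assumptions made purely for illustration, not the patent's own logic.

```python
def deduce_context(preferences: dict, account: dict) -> dict:
    """Merge explicit user preferences with context deduced from stored account data."""
    context = dict(preferences)  # explicit preferences take effect directly
    # Location-based context deduced from stored geographic data
    if "city" in account:
        context.setdefault("scenery", account["city"])
    # Demographic context deduced from applied demographic data
    if account.get("age", 0) < 25:
        context.setdefault("style", "edgy")
    else:
        context.setdefault("style", "conservative")
    # Behavioral context deduced from the service request history
    history = account.get("request_history", [])
    if history.count("weather") > 3:
        context.setdefault("lead_story", "weather")
    return context

print(deduce_context({"voice": "female"},
                     {"city": "San Francisco", "age": 40,
                      "request_history": ["weather"] * 5}))
```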
In an embodiment, the media customization system 100 includes a transcoding component 105 to transform the media to be delivered to the devices (110, 120, 130, etc.) in a manner that is optimized for these devices and/or channel conditions. For example, the transcoding component 105 can dynamically change bit rate, frame rate, resolution, encoding format, color format, or other parameter associated with the media to be delivered in a manner that optimizes the media based on characteristics of the client device to receive that media and/or based on channel conditions to that client device, which may change throughout a communication session. Example techniques and modules that can be used by the transcoding component 105 are disclosed in U.S. Patent Application Serial No. 09/502,390, entitled "COMPUTER PROGRAM PRODUCT FOR TRANSFORMING STREAMING VIDEO DATA," filed February 10, 2000, assigned to the same assignee as the present application, and incorporated herein by reference in its entirety.
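A minimal sketch of the kind of parameter selection described here follows, assuming a simple device profile and a measured channel rate; the profile fields, codec names, and thresholds are all invented for the example.

```python
def pick_transcode_params(device: dict, channel_kbps: float) -> dict:
    """Choose encoding parameters from device capabilities and the current channel rate."""
    supported = device.get("codecs", [])
    return {
        # Prefer H.264 when the device supports it, otherwise fall back to H.263
        "codec": "H.264" if "H.264" in supported else "H.263",
        "resolution": device.get("max_resolution", (176, 144)),
        # Never request more bandwidth than the channel currently offers
        "bit_rate_kbps": min(device.get("max_bit_rate_kbps", 384), channel_kbps * 0.9),
        "frame_rate": 15 if channel_kbps < 200 else 25,
    }

# Re-evaluated whenever channel conditions change during the session
phone = {"codecs": ["H.263"], "max_resolution": (176, 144), "max_bit_rate_kbps": 128}
print(pick_transcode_params(phone, channel_kbps=100))
```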
The system 100 takes the various media inputs and then, based on the context information, transforms or otherwise generates customized media in a different or the same media type. The customized media can then be delivered to a client device that has video display and/or audio capabilities. The delivery mechanisms include, but are not limited to, streaming, email, MMS, WAP PUSH, and download link. The delivery channel includes, but is not limited to, wireless and wired networks, and cable-assisted upload channels.
Figs. 2 and 3 show other embodiments for the system 100 of Fig. 1. For example, Fig. 2 shows a media transcoding system (or other transformation system) 200 in which the context component 107 includes the media generation component 106. Further in an embodiment, the user information database 109 can include information pertaining to device capabilities, alternatively or additionally to user information or other information. As shown by way of example in Fig. 2, the various client devices can include a cellular telephone 210, a wireless device (such as a PDA or Blackberry) 220, a laptop 230, and so forth.
The user information database 109 of one embodiment can contain preferences and user histories. A user can select an avatar or customize an avatar or other pictorial representation, and that information can be stored either at the client device or in the user information database 109. For example, a user might want a predetermined avatar such as a "Maria Bartiromo" avatar or a "Larry Kudlow" avatar to deliver stock news on their client device. Users can also provide a digital picture for use as a basis for a new custom avatar. As explained above, the user information database 109 can also contain a device capabilities database that correlates media capabilities with a particular user account. If a user has more than one device, a MIN, SID or other device identification code can be cross-referenced and determined on-the-fly based on a request for service from the device.
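The cross-referencing of a device identification code against a user account might look roughly like the following sketch; the MIN/SID keys, capability fields, and account layout are hypothetical placeholders.

```python
# Hypothetical capability lookup: a device identification code received with a
# service request is cross-referenced against the user account's device list.
DEVICE_CAPABILITIES = {
    "MIN:2065550101": {"display": (176, 220), "video": ["3GPP"], "audio": ["AMR"]},
    "SID:ABC123":     {"display": (1024, 768), "video": ["3GPP", "MP4"], "audio": ["AAC"]},
}

USER_DEVICES = {"alice": ["MIN:2065550101", "SID:ABC123"]}

def capabilities_for_request(user: str, device_id: str) -> dict:
    """Resolve capabilities on-the-fly from the identifier sent with the request."""
    if device_id not in USER_DEVICES.get(user, []):
        raise KeyError(f"{device_id} is not registered to {user}")
    return DEVICE_CAPABILITIES[device_id]

print(capabilities_for_request("alice", "MIN:2065550101"))
```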
According to various embodiments, there are one or more processes involved. A first process is analyzing the input media by applying the media generation component 106 and the context component 107 to deduce various context characteristics and combining the contextual information with other information that is available, for example from the database 109. The media customization system of Figs. 1-3 is configured to provide audio analysis, text analysis, video analysis, image analysis, location-based info analysis, user demographics analysis, and user behavior history analysis.
A second process comprises using the characteristics determined by analysis, together with other information such as user preferences and device capabilities, to generate a new and/or modified media type that delivers this information in an appealing and exciting fashion to a client device. The resulting customized output media can include images, 3D animation, 2D animation, video plus audio, video-only, and audio-only output, and various other media output types. The content of the customized media can be adapted and updated as new features become available. A feature of an embodiment is the restructuring of customized media from the raw media inputs based on intelligent automated decomposition (if appropriate), analysis, and generation of user/device-customized media based on contextual parameters.
One feature of the system is in the application of "intelligent" media generation and transformation or other intelligent transformation. The intelligent generation and transformation elements of the system provide services such as taking a raw media data source and converting it automatically into a 3D avatar animation or other pictorial representation. Each application of the system shares a certain amount of common functionality, for example the underlying enabling technology components such as text to speech, speech to animation, 3D server-side rendering, 3D client-side rendering, and encoding of 3D to video.
In order to assemble a customized media output, several context components, such as templates, are applied by an embodiment (see, e.g., the context component 307 in the media customization system 300 of Fig. 3). A context-specific media placement template 337 can be employed to define the positioning of media in an avatar animation clip; for example, the template 337 can place the weather map in the background and place the avatar on the left-hand side of the screen.
A context-specific media animation template 347 can also be employed to define the flow of the animation in time; for example, audio-to-animation conversion and the general flow of the application screens. The template 347 can be configured to direct the composition of an animation according to a sequence in time; for example, to show a splash screen, show a weather map in the center of the device screen introducing the weather, have the weather man move to the left side, and have the weather map correlate to an audio feed of a 24-hour forecast. A context-specific database 317 referenced to the user information database can also be employed for media objects and text-to-audio conversion to match events to media objects in the scene.
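To make the two templates concrete, the sketch below pairs a hypothetical placement template (where assets sit in the scene) with a hypothetical animation template (when they appear and move); all asset names, layers, and timings are invented for illustration.

```python
# A placement template positions media assets in the scene; an animation template
# orders the clip in time. Field names and timings are illustrative only.
WEATHER_PLACEMENT_TEMPLATE = {
    "weather_map": {"layer": "background", "anchor": "center"},
    "avatar":      {"layer": "foreground", "anchor": "left"},
}

WEATHER_ANIMATION_TEMPLATE = [
    {"t": 0.0, "action": "show", "asset": "splash_screen"},
    {"t": 2.0, "action": "show", "asset": "weather_map"},
    {"t": 4.0, "action": "move", "asset": "avatar", "to": "left"},
    {"t": 4.0, "action": "play", "asset": "forecast_audio_24h"},
]

def compose(placement: dict, timeline: list) -> list:
    """Attach placement data to each timeline step so a renderer can act on it."""
    return [dict(step, placement=placement.get(step["asset"], {})) for step in timeline]

for step in compose(WEATHER_PLACEMENT_TEMPLATE, WEATHER_ANIMATION_TEMPLATE):
    print(step)
```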
The context component 307 can include a context media generation engine 327 that has the role of scene generation and animation generation, for example. The "intelligent" context media generation engine 327 of one embodiment first takes the input media, and optionally user preferences, and uses information stored in the user database 109 to determine what media assets should change in the scene, and further uses the placement template 337 to place those media assets in the scene in the correct location, at the correct time in the media clip. For example, for a weather forecast, the input might be the text string "60 F, 80% precipitation". The text can be parsed by the context component 307 to determine key contextual parameters, such as "precipitation", and then compared to context information from the context database 317 in order to fetch media objects that present the appearance of a wet day with a high chance of rain; e.g., the weather girl might change her appearance to wear waterproof clothing, and the location of the forecast might be outside, where an animated 3D scene shows rain coming down.
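A minimal sketch of this parsing-and-lookup step is shown below, assuming a toy context database keyed by contextual parameters; the regular expressions, thresholds, and media object names are illustrative assumptions rather than the engine's actual rules.

```python
import re

# Hypothetical context database mapping contextual parameters to media objects.
CONTEXT_DB = {
    "high_precipitation": ["raincoat_outfit", "outdoor_rain_scene", "rain_sound"],
    "freezing":           ["winter_outfit", "snow_scene"],
}

def media_objects_for_forecast(text: str) -> list:
    """Parse a forecast string such as '60 F, 80% precipitation' into scene assets."""
    objects = []
    temp = re.search(r"(-?\d+)\s*F", text)
    precip = re.search(r"(\d+)%\s*precipitation", text)
    if precip and int(precip.group(1)) >= 70:
        objects += CONTEXT_DB["high_precipitation"]
    if temp and int(temp.group(1)) <= 32:
        objects += CONTEXT_DB["freezing"]
    return objects

print(media_objects_for_forecast("60 F, 80% precipitation"))
# ['raincoat_outfit', 'outdoor_rain_scene', 'rain_sound']
```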
Animation is generated in a similar manner by the media generation engine component 306 in Fig. 3 using an animation template and contextual information stored in the customer information database 109. A text weather forecast can be converted to an audio forecast where the audio is scripted based on the animation template. An animation engine 346 can be used in an embodiment to provide animation for media objects. The animation is then either rendered to a video for delivery by a rendering engine 336, delivered as 3D data to the client device, or delivered in some other media form. An example of the rendering engine 336 is a 3D batch processor that renders frames of a 3D animation using a hardware-accelerated OpenGL engine that can be used to render video content.
The frames can then be passed to a proprietary video encoding component (VeeCoder) or other video engine 316. The VeeCoder component or other video engine 316 can encode video to a number of different video formats that a client device might support. An audio engine 326 may also be used to generate or otherwise provide the appropriate audio for the video. If the clip is to be delivered as 3D, the underlying avatar model (such as from avatar templates 356), textures, and all animation data are packaged and sent to the device, where they are played back with a client-side rendering engine.
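The render-to-video versus package-as-3D decision described here could be sketched as follows, under the assumption of a simple device profile; the field names and formats are placeholders, not the system's actual interfaces.

```python
def deliver_clip(frames, audio, device: dict) -> dict:
    """Render to video for most devices; package raw 3D data when the client can render it."""
    if device.get("client_side_3d"):
        # Package the model, textures, and animation data for client-side playback
        return {"type": "3d_package", "payload": {"frames": len(frames), "audio": audio}}
    # Otherwise encode the rendered frames to a format the device supports
    target = "MP4" if "MP4" in device.get("video", []) else "3GPP"
    return {"type": "video", "format": target, "frames": len(frames), "audio": audio}

print(deliver_clip(frames=list(range(250)), audio="forecast.amr",
                   device={"video": ["3GPP"], "client_side_3d": False}))
```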
The client-side rendering engine is typically part of an application - for example a weather application. The media generation component 306 can include image recognition components that can be used for a variety of image tweaks and character generation. An image input can be analyzed and then used to generate an altered image or a video output with modifications based on the templates, user inputs, text to speech or other pre-specified image enhancement effects.
In an embodiment of the media customization system 300 of Figure 3, device capabilities information may be stored in a database 350. Furthermore, the customer information database 109 may store personal avatars 309 that are preferred or otherwise selected by the user to be used in conjunction with the delivered media.
A number of example and non-limiting applications of the media customization system will now be described in order to further illustrate the utility of one or more embodiments.
One example use for the customized media system involves automatically generated and distributed videos based on text input. In this application, various text inputs such as weather reports, stock tickers, and news are transformed into much richer media types such as a 3D or a video weather report, for example. The visual scene may be composed and enhanced based on the physical location of the client device using a location based template or visual cue.
For example, if a client device is associated with San Francisco, a background scene showing a picture of the Golden Gate Bridge can appear behind other content, such as text, 3D renders, animation, or video. In another example, user preferences can be applied to customize the media for a user. For example, a user can pick and choose a voice or a video avatar, or the user may provide an image to generate a personalized avatar or other pictorial representation. User demographics can also be applied to customize the media. For example, a preference can be deduced based on a particular age/gender/income group for content that is jazzy, edgy, or directed to more conservative professional-type content. Device capabilities can also be used to customize media for a client device. For example, scenes may be created specifically to meet the device capabilities, including supported video, 3D, and audio formats.
One example embodiment includes intelligent text input recognition. In this version, the customized media is generated based on data contained in a text input. Using weather information as an example, the GPS location of a client device could be used as the basis for a query to a weather database. The returned weather description could be analyzed and then used to create a voice for a virtual character, while the chance of precipitation may be depicted by rain clouds appearing on the map, and a temperature below 32 degrees may be depicted by a shivering avatar. In this manner, a set of media inputs is customized for a particular client device based on contextual parameters such as location.
In another related example, a message such as an instant message may be transformed by the media customization system in order to convert a text message into an animation having an audio track. Specifically, a text birthday greeting in an instant message can be transformed and media-enhanced to provide an animated birthday greeting with an audio track that replays the birthday song or the text of the original message.
Another related example application for an embodiment includes the generation of an intelligent traffic report or intelligent travel information. Location information can be acquired by using a GPS system. For instance, by taking two GPS measurements a few seconds apart, a current road and direction can be determined, and information about the traffic further down the road can be obtained and customized for a client device. Generally, a GPS-enabled client device will send GPS data to the media customization system or a connected server several times in the course of a regular commute. Therefore, the system can determine whether the user has one or more regular routes. Then, in response to a query and based on the deduced route information, the system can report traffic conditions along the remainder of the route that the user appears to be taking. Time data can also be used to determine whether it is a morning or evening commute. If real-time traffic events (accidents) are available to the server, the media customization system can send alerts to the client device if a crash happens further along the usual route. Once the user route information is deduced, it can be used to generate animated information with road maps, pictures of traffic data, or a 3D avatar relaying the report, or other customized parameters based on other contextual information.
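As a rough illustration of the two-fix idea, the sketch below computes a compass heading from two GPS measurements using the standard great-circle bearing formula and guesses the commute leg from the time of day; the route-matching and traffic lookup that would follow are omitted, and the function names are assumptions.

```python
import math
from datetime import datetime

def heading_deg(lat1, lon1, lat2, lon2) -> float:
    """Approximate compass heading between two GPS fixes taken a few seconds apart."""
    d_lon = math.radians(lon2 - lon1)
    lat1, lat2 = math.radians(lat1), math.radians(lat2)
    y = math.sin(d_lon) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(d_lon)
    return (math.degrees(math.atan2(y, x)) + 360) % 360

def commute_leg(now: datetime) -> str:
    """Use the time of day to guess morning vs. evening commute."""
    return "morning" if now.hour < 12 else "evening"

# Two fixes a few seconds apart give a travel direction the system can match to a route.
print(round(heading_deg(47.6062, -122.3321, 47.6070, -122.3321)))  # 0, i.e. due north
print(commute_leg(datetime(2006, 6, 23, 8, 15)))                   # 'morning'
```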
Travel-related services can be offered in much the same customized manner, but offering localized information on restaurants, hotels, clubs, and entertainment options, again based on the location of the client device, prior user preferences and demographics, and available input media channels such as restaurant guides (Zagat) and fee-based partnership agreements with hotel and entertainment promotion services.
Other applications for media customization using the various embodiments of methods and systems of the invention include content transformation and generation for messaging between users in communities such as Friendster, Tribe, MySpace, Dodgeball, and Match.com, for example. Messages can originate in lowest-common-denominator format (text only: instant messaging, SMS, email); however, other inputs such as audio and video are supported too. The input media can then be transcoded according to a destination device capability, and additional contextual parameters such as user-selected or user-created avatars, playback voices, location-based scenery, etc. can be used during the transformation and generation process. The messages originate from one user and get sent to another, but pass through the community's central "profile" engine, which does the end-to-end routing and hides the endpoints of the two users (phone number, email address, etc.). Only the profile identities of the two users are revealed to each other, not their "real world" identities. This allows the use of "textimation" (text-to-animation) transformation to "upgrade" text or SMS messages into higher-end formats for users with devices/players capable of playing them. Examples include SMS-to-avatar transformation and generation with text-to-speech, using the avatar of the sender (extracted from their profile), and using an SMS-specific context engine to expand compacted and shortened SMS shorthand into more complete spoken sentences with emotional nuance extracted from emoticon cues and punctuation.
In one embodiment, the various media elements need not be "deconstructed." For example, if the element(s) being fed into this system is an audio track, the audio track need not be deconstructed before it is incorporated into other media element(s) and/or otherwise transformed. In various embodiments, more than contextual-based transformation can be occurring. For example, combining the various artistic elements (e.g., graphical 3D rendering plus audio) creates "something else"; this is a transformative process that creates a derivative work or other resultant work. More than just taking different inputs in differing formats and transforming them all into the same format (e.g., 3GPP), such an embodiment involves a situation where the whole is greater than the sum of the parts. Further, in an embodiment, one element informs or otherwise influences the other element(s); e.g., the audio track "drives" the animation (movement) of the lips, eyes, etc.
According to one embodiment, at least some of the features described herein can be embodied in software or other machine-readable instructions stored on one or more machine-readable media. The machine-readable instructions can be executed by one or more processors to provide the features and other functionality described herein. For instance, the various engines, templates, components, etc. shown in Figs. 1-3 can be embodied as software modules that are executable by one or more processors.
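Returning to the SMS-specific context engine mentioned above, a minimal sketch of shorthand expansion with emoticon-derived emotional cues might look like this; the shorthand and emoticon tables are tiny, invented examples rather than any real deployment's data.

```python
# Hypothetical SMS-specific context tables; real deployments would be far larger.
SHORTHAND = {"gr8": "great", "cu": "see you", "l8r": "later", "brb": "be right back"}
EMOTICONS = {":)": "cheerful", ":(": "sad", ":D": "excited"}

def expand_sms(message: str) -> dict:
    """Expand shorthand into full words and derive an emotional cue for the spoken voice."""
    mood = "neutral"
    words = []
    for token in message.split():
        if token in EMOTICONS:
            mood = EMOTICONS[token]
            continue  # emoticons shape delivery rather than the spoken text itself
        words.append(SHORTHAND.get(token.lower(), token))
    return {"spoken_text": " ".join(words), "mood": mood}

print(expand_sms("gr8 news cu l8r :)"))
# {'spoken_text': 'great news see you later', 'mood': 'cheerful'}
```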
Another embodiment is shown in Fig. 4. This embodiment provides the context-based media generation component (blocks 106 and 107) with the ability to generate supplemental information that may enhance video encoding speed and quality. The supplemental information is used by the transcoding component 105 (or some other transformation component) to produce video (for example, a 3GPP file from the 3D frames). Video compression used by the transcoding component 105 entails estimation of motion between frames and encoding of the prediction error. The 3D media generator 106 can provide the motion information between frames. Having the exact motion information reduces the video encoding time (e.g., no motion estimation is required) and reduces the prediction error, since the motion that has occurred and what has changed from one frame to another is known. This motion information or other supplemental information is passed to the transcoding component 105 as a hint track 151 for encoding. Reduction in prediction error increases the compression quality and enables higher video quality for a fixed bit rate, or a reduced file size for a fixed-quality video file. It is appreciated that motion data is just one possible example of the supplemental information that can be provided by the 3D media generator 106 to increase video encoding quality and speed. As additional non-limiting examples, the media generator 106 can also provide information about luminance or other lighting changes between frames, or about texture changes between frames. An embodiment of the 3D media generator 106 can also send information about the composition of the frame in case there is a text overlay, or about which part of the scene should receive more bits, as yet additional examples.

In an embodiment, the media may be delivered to the end user device as a 3GPP video file (for example) or as a 3D video file, based on the device capabilities. This information is derived with the help of the device capabilities detection engine 109. Furthermore, the video or the 3D media can be custom-created to match the device capabilities. Different devices may have different capabilities when it comes to rendering 3D files and hence may need 3D content optimized for that device. The context-based media generation engine is capable of producing files that match the device capability. In a similar manner, the transcoding component 105 is capable of producing video files with attributes that are suitable for the end user device. In case the media has been pre-encoded into multiple 3D files and multiple 3GPP files, all with different features, a delivery engine can use the device information and file properties to select the optimal file for the end user device.
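Returning to the hint track 151, the following sketch illustrates, under stated assumptions, the kind of supplemental data the 3D media generator 106 might hand to the transcoding component 105. The BlockHint and HintTrack structures and the lookup function are hypothetical; an actual transcoder would translate such hints into its codec's motion-compensation and rate-control decisions rather than consume these objects directly.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class BlockHint:
    block_x: int              # macroblock column
    block_y: int              # macroblock row
    dx: float                 # horizontal motion in pixels, known from the 3D scene transforms
    dy: float                 # vertical motion in pixels
    luma_delta: float = 0.0   # optional lighting-change hint between frames

@dataclass
class HintTrack:
    frame_index: int
    blocks: list = field(default_factory=list)
    focus_region: Optional[tuple] = None   # (x, y, w, h) region that should receive more bits

def motion_vector_for_block(hints: HintTrack, bx: int, by: int):
    """Return the renderer-supplied motion vector for a macroblock, if any."""
    for h in hints.blocks:
        if h.block_x == bx and h.block_y == by:
            return (h.dx, h.dy)
    return None   # the encoder would fall back to a conventional motion search

# Example: the 3D media generator knows the camera panned 4 px to the right this
# frame, so every 16x16 block of a QCIF (176x144) frame carries the same vector
# and no motion search is required.
hints = HintTrack(frame_index=42,
                  blocks=[BlockHint(bx, by, dx=4.0, dy=0.0)
                          for bx in range(11) for by in range(9)],
                  focus_region=(0, 0, 176, 32))   # e.g., a text overlay band
print(motion_vector_for_block(hints, 3, 5))       # -> (4.0, 0.0)
```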
The delivery engine of one embodiment uses device capabilities, rather than network conditions or device conditions, to select files. Once the session is in place, the capabilities of the delivery engine can be coupled with dynamic bandwidth adaptation technology, such as that disclosed in U.S. Application Serial No. 10/452,035, entitled "METHOD AND APPARATUS FOR DYNAMIC BANDWIDTH ADAPTATION," filed May 30, 2003, assigned to the same assignee as the present application, and incorporated herein by reference in its entirety. The transcoding component 105 of one embodiment can also utilize the adaptation techniques disclosed in this commonly owned application.
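A minimal sketch of capability-driven file selection of this kind is given below. The variant table, its field names, and the preference for 3D content when the device reports 3D support are assumptions for illustration; they do not represent the delivery engine or the dynamic bandwidth adaptation technique of the referenced application.

```python
# Pre-encoded variants of the same media item, as described above.
VARIANTS = [
    {"file": "clip_3d.m3g", "kind": "3d",   "max_width": 320, "bitrate_kbps": 200},
    {"file": "clip_hi.3gp", "kind": "3gpp", "max_width": 320, "bitrate_kbps": 192},
    {"file": "clip_lo.3gp", "kind": "3gpp", "max_width": 176, "bitrate_kbps": 64},
]

def select_variant(device: dict) -> dict:
    """Prefer 3D content when the device can render it; otherwise pick the
    highest-bitrate 3GPP file that fits the device's screen width."""
    if device.get("supports_3d"):
        for v in VARIANTS:
            if v["kind"] == "3d" and v["max_width"] <= device["screen_width"]:
                return v
    candidates = [v for v in VARIANTS
                  if v["kind"] == "3gpp" and v["max_width"] <= device["screen_width"]]
    return max(candidates, key=lambda v: v["bitrate_kbps"])

print(select_variant({"supports_3d": False, "screen_width": 176}))
# -> {'file': 'clip_lo.3gp', 'kind': '3gpp', 'max_width': 176, 'bitrate_kbps': 64}
```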
All of the above U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications, and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference in their entirety. The above description of illustrated embodiments, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments and examples are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention and can be made without deviating from the spirit and scope of the invention.
These and other modifications can be made to the invention in light of the above detailed description. The terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification and the claims. Rather, the scope of the invention is to be determined entirely by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.

Claims

What is claimed is:
1. A method, comprising: obtaining first data; analyzing the first data to obtain context information therefrom; using the obtained context information to determine second data; supplementing the first data with the determined second data to obtain transformed data; and delivering the transformed data to at least one client device.
2. The method of claim 1 wherein obtaining the first data includes obtaining text data, wherein analyzing the first data includes parsing the text data to obtain the context information therefrom, and wherein supplementing the first data with the determined second data includes supplementing or replacing the text data with audio, video, a pictorial representation, one or more images, or other rich media element that is in context with the text data.
3. The method of claim 1 wherein obtaining the first data includes obtaining a plurality of first data, wherein determining the second data includes using at least some of the plurality of first data as the second data based on context information of the plurality of first data, and wherein supplementing the first data with the determined second data includes combining at least some second data corresponding to some of the plurality of first data to obtain the transformed data.
4. The method of claim 1 wherein obtaining the context information includes determining an emotional nuance from the first data.
5. The method of claim 1, further comprising: transcoding the first data, the second data, or the transformed data from a first form to another form based on characteristics of the client device or based on conditions of a communication channel used to deliver the transformed data; and dynamically updating, during a communication session, the transcoded data in response to changes in the characteristics of the client device or the channel conditions.
6. The method of claim 1 wherein supplementing the first data with the determined second data includes using templates to provide rich media enhancements to the first data in selected locations and according to a sequence in time.
7. The method of claim 1, further comprising: deriving hint track information based on the determined second data or based on the context information; and using the hint track information to optimize transcoding of the transformed data to a form that is optimum for the client device or for a communication channel to the client device.
8. A media customization system, comprising: means for obtaining first data; means for analyzing the first data to obtain context information therefrom; means for using the obtained context information to determine second data; means for supplementing the first data with the determined second data to generate transformed data; and means for delivering the transformed data to at least one client device.
9. The system of claim 8, further comprising: means for transforming the first, second, or transformed data; and means for storing client device data, customer information, or pictorial information that can be used to generate the transformed data.
10. The system of claim 8 wherein the means for analyzing the first data include at least one of a context database, a context engine, a placement template, and an animation template.
11. The system of claim 8 wherein the means for supplementing includes at least one of a video engine, an audio engine, a rendering engine, an animation engine, and pictorial templates that are usable to generate media for the transformed data.
12. The system of claim 8, further comprising means for enhancing video encoding based on supplemental information derived from the context information or from the second data.
13. The system of claim 8, further comprising means for delivering the transformed data as a video file that has been derived based on device capabilities.
14. An article of manufacture, comprising: a machine-readable medium having instructions stored thereon that are executable by at least one processor to: analyze obtained first data to obtain context information therefrom; use the obtained context information to determine second data; supplement the first data with the determined second data to obtain transformed data; and deliver the transformed data to at least one client device.
15. The article of manufacture of claim 14 wherein the machine-readable medium further includes instructions stored thereon to: derive hint track information based on the determined second data or based on the context information; and use the hint track information to optimize transcoding of the transformed data to a form that is optimum for the client device or for a communication channel to the client device.
16. The article of manufacture of claim 14 wherein the instructions to analyze the obtained first data to obtain context information include instructions to derive emotional nuances from the first data.
17. The article of manufacture of claim 14 wherein the instructions to use the obtained context information to determine second data include instructions to identify second data that is in context with the first data based on at least one of user preferences, device characteristics, or media content that can be associated with the first data.
18. An apparatus, comprising: an input terminal to receive a media input; a context component coupled to the terminal to analyze the media input to obtain context information therefrom and to use the obtained context information to determine a media supplement for the media input that is in context with the media input; a media generation component coupled to the context component to supplement the media input with the determined media supplement to obtain transformed media; and an output terminal coupled to the media generation component to deliver the transformed media to at least one client device.
19. The apparatus of claim 18, further comprising: a transcoding component coupled to the media generation component to convert the transformed media into a form that is optimum for the client device or for a communication channel to the client device that is to be used to deliver the transformed media; and at least one storage unit to store information related to the client device, including device capability information, user information, preference information, or avatar templates.
20. The apparatus of claim 18, further comprising a transcoding component coupled to the media generation component to convert the transformed media into a form that is optimum for the client device or for a communication channel to the client device that is to be used to deliver the transformed media, the transcoding component further being coupled to receive hint track information from the media generation component or from the context component that is usable to optimize conversion of the transformed media.
21. The apparatus of claim 18 wherein the context component includes a context database, a context engine, a placement template, and an animation template.
22. The apparatus of claim 18 wherein the media generation component includes a video engine, an audio engine, a rendering engine, an animation engine, and pictorial templates that are usable to generate objects for the transformed media.
PCT/US2006/024552 2005-06-23 2006-06-23 Apparatus, system, method, and article of manufacture for automatic context-based media transformation and generation WO2007002448A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP06785469A EP1908255A1 (en) 2005-06-23 2006-06-23 Apparatus, system, method, and article of manufacture for automatic context-based media transformation and generation
JP2008518455A JP2008544412A (en) 2005-06-23 2006-06-23 Apparatus, system, method, and product for automatic media conversion and generation based on context

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US69338105P 2005-06-23 2005-06-23
US60/693,381 2005-06-23

Publications (1)

Publication Number Publication Date
WO2007002448A1 true WO2007002448A1 (en) 2007-01-04

Family

ID=37000058

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/024552 WO2007002448A1 (en) 2005-06-23 2006-06-23 Apparatus, system, method, and article of manufacture for automatic context-based media transformation and generation

Country Status (5)

Country Link
EP (1) EP1908255A1 (en)
JP (1) JP2008544412A (en)
CN (1) CN101208929A (en)
TW (1) TW200718134A (en)
WO (1) WO2007002448A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101640627B (en) * 2008-07-29 2011-12-28 优必达科技有限公司 System and method for transmitting multimedia data stream
JP2011239141A (en) * 2010-05-10 2011-11-24 Sony Corp Information processing method, information processor, scenery metadata extraction device, lack complementary information generating device and program
US20140171191A1 (en) * 2012-12-19 2014-06-19 Microsoft Corporation Computationally generating turn-based game cinematics

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003071487A1 (en) * 2002-02-25 2003-08-28 Koninlijke Philips Electronics N.V. Method and system for generating caricaturized talking heads

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ALBRECHT I ET AL: "May I talk to you? : -) - facial animation from text", COMPUTER GRAPHICS AND APPLICATIONS, 2002. PROCEEDINGS. 10TH PACIFIC CONFERENCE ON BEIJING, CHINA 9-11 OCT. 2002, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 9 October 2002 (2002-10-09), pages 77 - 86, XP010624891, ISBN: 0-7695-1784-6 *
CRANLEY N ET AL: "Perceptual quality adaptation (PQA) algorithm for 3GP and multi-tracked MPEG-4 content over wireless IP networks", PERSONAL, INDOOR AND MOBILE RADIO COMMUNICATIONS, 2004. PIMRC 2004. 15TH IEEE INTERNATIONAL SYMPOSIUM ON BARCELONA, SPAIN 5-8 SEPT. 2004, PISCATAWAY, NJ, USA,IEEE, vol. 3, 5 September 2004 (2004-09-05), pages 2107 - 2112, XP010754304, ISBN: 0-7803-8523-3 *
DOENGES P K ET AL: "Audio/video and synthetic graphics/audio for mixed media", SIGNAL PROCESSING. IMAGE COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 9, no. 4, May 1997 (1997-05-01), pages 433 - 463, XP004075338, ISSN: 0923-5965 *
MAGALHAES J ET AL: "Using MPEG standards for multimedia customization", SIGNAL PROCESSING. IMAGE COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 19, no. 5, May 2004 (2004-05-01), pages 437 - 456, XP004504930, ISSN: 0923-5965 *
OSTERMANN J ET AL: "Talking heads and synthetic speech: an architecture for supporting electronic commerce", MULTIMEDIA AND EXPO, 2000. ICME 2000. 2000 IEEE INTERNATIONAL CONFERENCE ON NEW YORK, NY, USA 30 JULY-2 AUG. 2000, PISCATAWAY, NJ, USA,IEEE, US, vol. 1, 30 July 2000 (2000-07-30), pages 71 - 74, XP010511405, ISBN: 0-7803-6536-4 *
YAMADA A ET AL: "Visual text reader for virtual image communication on networks", MULTIMEDIA SIGNAL PROCESSING, 1997., IEEE FIRST WORKSHOP ON PRINCETON, NJ, USA 23-25 JUNE 1997, NEW YORK, NY, USA,IEEE, US, 23 June 1997 (1997-06-23), pages 495 - 500, XP010233871, ISBN: 0-7803-3780-8 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100133961A (en) * 2008-02-29 2010-12-22 마이크로소프트 코포레이션 Delivering composite media to a client application
JP2011513836A (en) * 2008-02-29 2011-04-28 マイクロソフト コーポレーション Delivering composite media to client applications
KR101635226B1 (en) * 2008-02-29 2016-06-30 마이크로소프트 테크놀로지 라이센싱, 엘엘씨 Delivering composite media to a client application
EP2099198A1 (en) * 2008-03-05 2009-09-09 Sony Corporation Method and device for personalizing a multimedia application
US9491256B2 (en) 2008-03-05 2016-11-08 Sony Corporation Method and device for personalizing a multimedia application
US8578259B2 (en) 2008-12-31 2013-11-05 Microsoft Corporation Media portability and compatibility for different destination platforms
EP2641408B1 (en) * 2010-11-19 2018-02-21 QUALCOMM Incorporated Selectively formatting media during a group communication session
WO2015177799A3 (en) * 2014-05-22 2016-01-14 Idomoo Ltd A system and method to generate a video on the fly
US10631070B2 (en) 2014-05-22 2020-04-21 Idomoo Ltd System and method to generate a video on-the-fly
US10038757B2 (en) 2015-04-29 2018-07-31 Microsoft Technology Licensing, Llc Providing personalized greetings on a digital assistant
US10123090B2 (en) 2016-08-24 2018-11-06 International Business Machines Corporation Visually representing speech and motion

Also Published As

Publication number Publication date
CN101208929A (en) 2008-06-25
JP2008544412A (en) 2008-12-04
EP1908255A1 (en) 2008-04-09
TW200718134A (en) 2007-05-01

Similar Documents

Publication Publication Date Title
WO2007002448A1 (en) Apparatus, system, method, and article of manufacture for automatic context-based media transformation and generation
US7142645B2 (en) System and method for generating and distributing personalized media
US7761293B2 (en) Spoken mobile engine
US7813724B2 (en) System and method for multimedia-to-video conversion to enhance real-time mobile video services
KR100706967B1 (en) Method and System for Providing News Information by Using Three Dimensional Character for Use in Wireless Communication Network
US20160307240A1 (en) System and method for interactive communications with animation, game dynamics, and integrated brand advertising
US8677428B2 (en) System and method for rule based dynamic server side streaming manifest files
US10346878B1 (en) System and method of marketing using a multi-media communication system
US20080141175A1 (en) System and Method For Mobile 3D Graphical Messaging
WO2006078246A1 (en) System and method for generating and distributing personalized media
US20070005795A1 (en) Object oriented video system
TW200820216A (en) Improved methods and apparatus for delivering audio information
JP2005535986A (en) Method and system for sending messages over a telecommunications network and associated caller terminal
CN110012059B (en) Electronic red packet implementation method and device
KR101982853B1 (en) Method for providing interaction based mobile streaming service using realtime customized platform
WO2021192732A1 (en) Moving image editing device, moving image editing method, and program
WO2007147334A1 (en) Method for converting text information to stream media or multimedia to be received by terminal
JP2006039705A (en) Content creation device, content creation method, and content creation system
KR20240067174A (en) System for providing bar management service connecting to streaming channel
CN115967824A (en) Video preview method, device, medium, equipment and computer program product
Acx et al. IST-2003-507581 WINNER D1. 3 version 1.0 Final usage scenarios
CA2793726A1 (en) System and method for on-demand delivery of audio content for use with entertainment creatives

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200680022788.X

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
ENP Entry into the national phase

Ref document number: 2008518455

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2006785469

Country of ref document: EP