JP2008544412A

JP2008544412A - Apparatus, system, method, and product for automatic media conversion and generation based on context

Info

Publication number: JP2008544412A
Application number: JP2008518455A
Authority: JP
Inventors: ラリトエス．サルナ，; デイビッドエム．ウエストウッド，; グレゴリーエル．ルター，; チアンユデン，; ダニエルエフ．ギース，; サミュエルトリチン，
Original assignee: ビディアトアーエンタープライジズインコーポレイテッド
Priority date: 2005-06-23
Filing date: 2006-06-23
Publication date: 2008-12-04
Also published as: CN101208929A; EP1908255A1; WO2007002448A1; TW200718134A

Abstract

Embodiments provide automatic media generation, or other transformation (including enhancement) of multiple media inputs, for delivery to user devices in a device-agnostic manner. The transformation process is further enhanced automatically based on context information. The context information provides customized content enhancement for the media delivered to the user devices. The context information is automatically integrated with the transformed and generated media, providing delivery of highly customized, high-quality media content in a device-agnostic manner.

Description

(Cross-reference to related application)
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Patent Application No. 60/693,381, filed June 23, 2005, entitled "APPARATUS, SYSTEM, METHOD, AND ARTICLE OF MANUFACTURE FOR AUTOMATIC CONTEXT-BASED MEDIA TRANSFORMATION AND GENERATION," which is assigned to the same assignee as the present application and is incorporated herein by reference in its entirety.

(Technical field)
The present application relates generally to the delivery of media over a communication network, and in particular, but not exclusively, to the generation and transformation of media for delivery to client devices over a communication network.

Various forms of media format conversion are currently available on the market. For example, many companies routinely use telephone speech recognition systems to enter data into server-side customer service applications, which convert the speech data into text data for processing. Media transformation techniques include, but are not limited to, text-to-speech, image-to-3D-rendering, and image-to-video transformation. Various techniques for customizing data according to device capabilities, for delivery to a variety of user devices, are also known in the prior art. For example, it is known to adapt data in a server-side application to different display and audio capabilities before delivery to a client device, typically based on some information supplied by the requesting device. One example is adapting web content for a display the size of a mobile phone or PDA rather than a full-size PC display.

However, such basic media format conversion and media transformation techniques leave much to be desired, particularly in the current environment of ever-increasing user demand for many different types of media content that can be readily delivered to many different types of client devices. These techniques are too basic and rudimentary to meet the needs of technically sophisticated users who want to get the most out of their various user devices.

According to one aspect, a method includes: obtaining first data; analyzing the first data to obtain context information from it; using the obtained context information to determine second data; supplementing the first data with the determined second data to obtain transformed data; and delivering the transformed data to at least one client device.
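The five steps of this method can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names, the keyword-based analysis, and the `rain_sound.wav` asset are all hypothetical placeholders chosen only to make the flow concrete.

```python
from dataclasses import dataclass, field


@dataclass
class TransformedMedia:
    first_data: str                                   # the original input
    context: dict                                     # context derived from it
    second_data: list = field(default_factory=list)   # supplementary media


def analyze(first_data: str) -> dict:
    """Hypothetical analysis step: derive context from the input itself."""
    text = first_data.lower()
    return {"rain": "rain" in text or "precipitation" in text}


def determine_second_data(context: dict) -> list:
    """Hypothetical lookup: pick supplementary media based on the context."""
    return ["rain_sound.wav"] if context.get("rain") else []


def transform(first_data: str) -> TransformedMedia:
    """Analyze the first data, then supplement it with the second data."""
    context = analyze(first_data)
    return TransformedMedia(first_data, context, determine_second_data(context))


def deliver(media: TransformedMedia, devices, send) -> None:
    """Deliver the transformed data to each client device via a send callback."""
    for device in devices:
        send(device, media)
```

In this sketch the delivery transport is passed in as a callback, so the same transformed object could be handed to any of the delivery mechanisms the description mentions.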

Non-limiting and non-exhaustive embodiments are described with reference to the following figures, in which like reference numerals refer to like parts throughout the various views unless otherwise specified.

Embodiments of automatic context-based media transformation and generation are described herein. In the following description, numerous specific details are given to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the embodiments can be practiced without one or more of the specific details, or with other methods, components, materials, and so on. In other instances, well-known structures, materials, or operations are not shown or described in detail, to avoid obscuring aspects of the embodiments.

Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

As mentioned above, some prior art approaches to media format conversion and media transformation exist. However, the prior art offers no system or method that provides automated media generation or transformation for multiple simultaneous media inputs and device outputs. In particular, no systems or methods are available that provide media transformation and media enhancement based on context parameters.

In overview, embodiments provide a system for automatic media generation, or other transformation (including enhancement) of the formats of multiple media inputs, for delivery to user devices in a device-agnostic manner. In one embodiment, the transformation process is further enhanced automatically based on context information. The context information provides customized content enhancement for all media delivered to the user device. The context information is automatically integrated with the transformed and generated media, providing delivery of highly customized, high-quality media content in a device-agnostic manner.

FIG. 1 illustrates a first system 10 for implementing an embodiment. FIG. 1 shows an automated media customization system 100 that has multiple media inputs (101, 102, 103, etc.) and customized media outputs (111, 121, 131, etc.) supplied to one or more user devices (110, 120, 130, etc.). The input media (101, 102, 103, etc.) can take any form, for example audio, video, 3D renderings, graphics, Internet/web content, live audio or video, files, animation or text feeds such as stock quotes, news, and weather forecasts, satellite images, and sports feeds.

The various media inputs (101, 102, 103, etc.) to the media customization system 100 may include content such as media created by end users, for example text messages, voice messages, images, and audio clips. The media inputs may include preset list information such as fixed text messages, avatars or other graphical representations, images, and themes. Media inputs are also supplied by third-party information services, for example news feeds, weather forecasts, sports headlines, and stock quotes. It is understood that media types and formats other than those explicitly mentioned herein may be included in the media inputs (101, 102, 103).

Embodiments of the automated media customization system 100 can provide on-the-fly format conversion for the various media inputs (101, 102, 103, etc.) and can also generate media based on context parameters. The generated media is combined with the input media to form a respective customized media output (111, 121, 131, etc.) for each user device, and/or one or more of the devices (110, 120, 130, etc.) may receive more than one media output.

According to an embodiment, the user of a client device may select media delivery preferences for more than one destination client device (110, 120, 130, etc.). The system 100 may be deployed as a standalone client technology or as a combination of server and client technologies. In one embodiment, the customized media is generated on a server (for example, the media customization system 100 is located at the server) and delivered to the devices (110, 120, 130, etc.). It is also possible to generate some or all of the media at the devices (110, 120, 130, etc.), based on instructions and data from the media customization system 100. Media generation is optimized to use the resources available to the individual user's account and the components available within the media customization system.

The context component 107 of one embodiment determines context information and/or integrates context elements into an enhanced media format for delivery to client devices. Context information may be preset or derived based on several factors including, for example, the type of application, the type of media, the location of the client device, end-user input, and end-user demographics. The resulting context-enhanced media and/or other enhanced media can be generated by the media generation component 106 to suit the capabilities of the client device or the factors listed above. In one embodiment, the media generation component 106 and the context component 107 may comprise the same component.

In one embodiment, the context component 107 receives inputs relating to context, such as user preferences sent from the client device, or context data derived from user account data such as service request history, location data, billing data, applicable demographic data, and user geography/location data. The user account data is stored in a customer information database 109 or other storage unit, which may be separate from or integrated into the media customization system 100. In one embodiment, other context elements can be determined by having the context component 107 decompose the media input and parse its content. New customized media is then generated using any context elements so determined, such as elements that express emotion in text and speech (for example voice inflection and emoticons), together with location-based context such as the sound of rain for a rainy weather forecast.

In an embodiment, the media customization system 100 includes a transcoding component 105 that transforms media for delivery to the devices (110, 120, 130, etc.) in a manner optimized for those devices and/or for channel conditions. For example, the transcoding component 105 can dynamically change the bit rate, frame rate, resolution, encoding format, color format, or other parameters associated with the media being delivered, so as to optimize the media based on the characteristics of the client device that receives it and/or the conditions of that client device's channel. These parameters can be changed throughout a communication session. Examples of techniques and modules that can be used by the transcoding component 105 are disclosed in U.S. Patent Application No. 09/502,390, filed February 10, 2000, entitled "COMPUTER PROGRAM PRODUCT FOR TRANSFORMING STREAMING VIDEO DATA," which is assigned to the same assignee as the present application and is incorporated herein by reference in its entirety.
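As a rough illustration of choosing delivery parameters from device characteristics and channel conditions, the following Python sketch fits the source resolution into the device's display without upscaling, caps the frame rate, and budgets the bit rate against the measured channel. The `DeviceProfile` fields, the 80% bit-rate headroom, and the rule of taking the device's first listed format are assumptions made for the example; they do not come from the specification or from the referenced application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    max_width: int
    max_height: int
    max_fps: int
    formats: tuple  # supported formats, in assumed order of preference


@dataclass(frozen=True)
class EncodeParams:
    width: int
    height: int
    fps: int
    bitrate_kbps: int
    fmt: str


def choose_params(device, channel_kbps, src_w, src_h, src_fps):
    """Pick encoding parameters for one device under current channel conditions."""
    # Fit inside the device display, preserving aspect ratio, never upscaling.
    # Integer cross-multiplication avoids floating-point rounding.
    if src_w * device.max_height > src_h * device.max_width:
        width = min(src_w, device.max_width)      # width is the binding limit
        height = src_h * width // src_w
    else:
        height = min(src_h, device.max_height)    # height is the binding limit
        width = src_w * height // src_h
    width -= width % 2    # many encoders expect even dimensions
    height -= height % 2
    return EncodeParams(
        width=width,
        height=height,
        fps=min(src_fps, device.max_fps),
        bitrate_kbps=channel_kbps * 8 // 10,      # assumed 20% headroom
        fmt=device.formats[0],
    )
```

Because the parameters can change throughout a session, a caller would re-run `choose_params` whenever a new channel measurement arrives and reconfigure the encoder if the result differs.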

The system 100 takes the various media inputs and then, based on context information, transforms them or generates customized media of a different or the same media type. The customized data can then be delivered to client devices that have video display and/or audio capabilities. Delivery mechanisms include, but are not limited to, streaming, email, MMS, WAP, PUSH, and download links. Delivery channels include, but are not limited to, wireless and wired networks and cable-assisted upload channels.

FIGS. 2 and 3 show other embodiments of the system 100 of FIG. 1. For example, FIG. 2 shows a media transcoding system (or other transformation system) 200 in which the context component 107 includes the media generation component 106. Further, in an embodiment, the user information database 109 may include information on device capabilities and, alternatively or additionally, user information or other information. As shown by way of example in FIG. 2, the various client devices may include a cellular telephone 210, a wireless device 220 (such as a PDA or BlackBerry), a laptop 230, and so on.

The user information database 109 of one embodiment may include preferences and user history. A user can select an avatar, or customize an avatar or other graphical representation, and that information can be stored either on the client device or in the user information database 109. For example, a user may want a particular avatar, such as a "Maria Bartiromo" avatar or a "Larry Kudlow" avatar, to deliver stock news on the user's client device. Users can also provide digital photographs to use as the basis for new custom avatars. As mentioned above, the user information database 109 may also include a device capability database that associates media capabilities with a particular user's account. If a user has more than one device, the MIN, SID, or other device identification code can be cross-referenced and determined on the fly, based on the request for service from the device.

According to various embodiments, one or more processes are involved:
The first process analyzes the input media by applying the media generation component 106 and the context component 107, derives various context features, and combines the context information with other information available, for example, from the database 109. The media customization system of FIGS. 1 to 3 is configured to provide audio analysis, text analysis, video analysis, image analysis, analysis of location-based information, user demographic analysis, and analysis of user behavior history.

The second process involves using the features determined by the analysis, together with other information such as user preferences and device capabilities, to generate new and/or modified media types that deliver this information to the client device in an appealing and stimulating way. The resulting customized output media may include images, 3D animation, 2D animation, video with audio, video-only and audio-only output, and various other media output types. As new features become available, the customized media content can be adapted and updated. A feature of the embodiments is the reconstruction of customized media from raw media input, based on intelligent, automatic decomposition, analysis, and (where appropriate) generation of media customized to the user or device according to context parameters.

One feature of the system is its application of "intelligent" media generation and transformation, or other intelligent transformation. The intelligent generation and transformation elements of the system provide services such as acquiring a raw media data source and automatically converting it into a 3D avatar animation or other graphical representation. The applications of the system share a certain amount of common functionality, for example components of the underlying enabling technology such as text-to-speech conversion, speech-to-animation conversion, 3D server-side rendering, 3D client-side rendering, and 3D-to-video encoding.

To assemble the customized media output, embodiments apply several context components, for example templates (see, for example, the context component 307 in the media customization system 300 of FIG. 3). A context-specific media placement template 337 can be used to define the placement of media within an avatar's animation clip. For example, the template 337 can place a weather map in the background and an avatar on the left side of the screen.

A context-specific media animation template 347 can also be used to define, for example, the flow of the animation over time, such as speech-to-animation conversion, and the general flow of the application's screens. The template 347 can be configured to direct the composition of the animation in time order, for example: show a splash screen, show a weather map in the center of the device screen, move the weather forecaster to the left, and associate the weather map with the audio feed of the 24-hour forecast. A context-specific database 317, cross-referenced to the user information database, can also be used for media objects and for text-to-speech conversion, to match events to the media objects in the scene.
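One way to picture a time-ordered animation template is as a list of timed steps that a player consults on every tick. The Python sketch below is only an illustration of that idea: the `Step` structure, the action names, and the start times are invented for the example and are not defined in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    start_s: float   # when the step begins, in seconds from clip start
    action: str      # hypothetical action name
    target: str      # hypothetical media object name


# The weather sequence described above, with assumed timings.
WEATHER_TEMPLATE = [
    Step(0.0, "show", "splash_screen"),
    Step(2.0, "show_center", "weather_map"),
    Step(3.0, "move_left", "forecaster"),
    Step(4.0, "sync_audio", "forecast_24h"),
]


def steps_due(template, previous_s, now_s):
    """Return the steps that start in [previous_s, now_s), in time order."""
    due = [s for s in template if previous_s <= s.start_s < now_s]
    return sorted(due, key=lambda s: s.start_s)
```

Calling `steps_due(WEATHER_TEMPLATE, 0.0, 2.5)` yields the splash screen and weather map steps; the next tick picks up where the previous one ended.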

The context component 307 may include, for example, a contextual media generation engine 327 responsible for scene generation and animation generation. The "intelligent" contextual media generation engine 327 of one embodiment first obtains the input media and, optionally, the user's preferences. Using information stored in the user database 109, it decides which media assets in the scene should be changed, and it further uses the placement template 337 to place those media assets at the correct position in the scene at the correct time in the media clip. For a weather forecast, for example, the input may be the text string "60 degrees Fahrenheit, 80% chance of precipitation". The text can be parsed by the context component 307 to determine key context parameters such as "chance of precipitation", which can be compared with context information from the context database 317 to retrieve media objects that convey a rainy look when the probability of rain is high. For example, the weather presenter may change to an appearance wearing rain gear, and the setting of the forecast may move outdoors, with 3D animation showing rain falling.
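The parse-then-look-up flow in the weather example can be sketched in Python as follows. This is a hedged illustration only: the regular expressions assume the forecast text contains a Fahrenheit temperature and a single percentage, `CONTEXT_DB` is a hypothetical stand-in for the context database 317, and the 50% threshold for "rainy" is an assumption.

```python
import re

# Hypothetical stand-in for context database 317: context key -> media objects.
CONTEXT_DB = {
    "rainy": {"outfit": "rain_gear", "scene": "outdoor_rain_3d"},
    "clear": {"outfit": "default", "scene": "studio"},
}


def parse_forecast(text):
    """Extract key context parameters from a forecast string."""
    temp = re.search(r"(-?\d+)\s*degrees\s*Fahrenheit", text, re.IGNORECASE)
    rain = re.search(r"(\d+)\s*%", text)  # assumes the only percentage is precipitation
    return {
        "temperature_f": int(temp.group(1)) if temp else None,
        "precipitation_pct": int(rain.group(1)) if rain else None,
    }


def select_media_objects(params, rain_threshold=50):
    """Compare parsed parameters with the context table and pick media objects."""
    pct = params["precipitation_pct"]
    key = "rainy" if pct is not None and pct >= rain_threshold else "clear"
    return CONTEXT_DB[key]
```

A real system would need a more robust parser; the point of the sketch is only the separation between extracting parameters and mapping them to scene assets.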

Animation is generated in a similar way by the media generation engine component 306 of FIG. 3, using the animation template and the context information stored in the customer information database 109. A text weather forecast can be converted into a spoken forecast, and the speech is scripted based on the animation template. In one embodiment, an animation engine 346 can be used to provide animation for the media objects. The animation is then either rendered by a rendering engine 336 into video for delivery, delivered to the client device as 3D data, or delivered in some other media form. An example of the rendering engine 336 is a 3D batch processor that renders frames of 3D animation using a hardware-accelerated OpenGL engine, which can be used to render video content.

The frames can then be passed to a proprietary video encoding component (VeeCoder) or other video engine 316. The VeeCoder component or other video engine 316 can encode the video into a number of different video formats that client devices may support. An audio engine 326 can also be used to generate or provide appropriate audio for the video. If the clip is delivered as 3D, the underlying avatar model and textures (such as from an avatar template 356) and all of the animation data are packed and sent to the device, where the data is played back using a client-side rendering engine.

The client-side rendering engine is generally part of an application, for example a weather forecast application. The media generation component 306 may include an image recognition component that can be used for various image fine-tuning and character generation tasks. Image input is analyzed and can then be used to generate video output with substitute images or modifications, based on templates, user input, text-to-speech conversion, or other pre-specified image enhancement effects.

In the embodiment of the media customization system 300 of FIG. 3, device capability information can be stored in a database 350. In addition, the customer information database 109 can store a personal avatar 309, preferred or selected by the user, for use with the delivered media.
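The choice between sending packed 3D data and sending encoded video can be driven by a device capability lookup. The Python sketch below assumes a hypothetical in-memory table standing in for the database 350; the device identifiers, field names, and format list are invented for illustration and do not reflect any real schema.

```python
# Hypothetical stand-in for device capability database 350.
DEVICE_DB = {
    "phone-a": {"video_formats": ["3gp", "mp4"], "client_3d_renderer": False},
    "phone-b": {"video_formats": ["mp4"], "client_3d_renderer": True},
}


def plan_delivery(device_id, encoder_formats=("mp4", "3gp"), device_db=DEVICE_DB):
    """Decide how a clip should be delivered to one device."""
    caps = device_db[device_id]
    if caps["client_3d_renderer"]:
        # Device can render locally: send model, textures, and animation data.
        return {
            "mode": "3d_package",
            "contents": ["avatar_model", "textures", "animation_data"],
        }
    # Otherwise render server-side and encode to the first format, in the
    # encoder's order of preference, that the device also supports.
    for fmt in encoder_formats:
        if fmt in caps["video_formats"]:
            return {"mode": "video", "format": fmt}
    raise ValueError(f"no common video format for device {device_id!r}")
```

Preferring the 3D package whenever a client-side renderer exists is a design choice of this sketch; a deployment might instead weigh bandwidth or user preference.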

A number of examples and non-limiting applications of the media customization system are now described, to further illustrate the usefulness of one or more embodiments.

One exemplary use of the customized media system involves video that is automatically generated and delivered based on text input. In this application, various text inputs, such as weather forecasts, stock tickers, and news, are transformed into a much richer media type, such as a 3D or video weather forecast. The visual scene can be composed and enhanced based on the physical location of the client device, using location-based templates or visual cues.

For example, if the client device is associated with San Francisco, a background showing a photograph of the Golden Gate Bridge may appear behind other content such as text, 3D renderings, animation, or video. In another example, user preferences can be applied to customize media for the user. For example, the user can choose among audio and video avatars, or the user can provide an image from which a personal avatar or other graphical representation is generated. User demographics can also be applied to customize media. For example, based on a particular age, gender, or income group, a preference can be inferred for flashy, cutting-edge content or for content of a more conservative, professional type. Device capabilities can also be used to customize media for the client device. For example, a scene can be created to specifically match the capabilities of the device, including the video, 3D, and audio formats it supports.

One exemplary embodiment involves intelligent text input recognition. In this version, customized media is generated based on the data contained in the text input. Using weather information as an example, the GPS location of the client device can be used as the basis for a query to a weather database. The returned weather description is analyzed and can then be used to create speech for a virtual character, while the chance of precipitation can be depicted by rain clouds appearing on a map, and a temperature below 32 degrees can be depicted by a shivering avatar. In this way, a set of media inputs is customized for a particular client device based on context parameters such as location. In another related example, a message such as an instant message can be transformed by the media customization system to convert the text message into an animation with an audio track. In particular, a text birthday greeting in an instant message can be transformed and enhanced to provide an animated birthday greeting with an audio track that plays a birthday song or the text of the original message.
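The weather example can be sketched in Python as a mapping from a location-based query result to narration text and visual cues. Everything named here is an assumption for illustration: `query_weather` is a hypothetical callback standing in for the weather database, the report field names are invented, the 32-degree threshold is taken to be Fahrenheit, and the 50% precipitation threshold is arbitrary.

```python
def visual_cues(temperature_f, precipitation_pct, rain_threshold=50):
    """Map numeric weather values to visual cues for the scene."""
    cues = []
    if precipitation_pct >= rain_threshold:
        cues.append("rain_clouds_on_map")
    if temperature_f < 32:           # assumed to mean 32 degrees Fahrenheit
        cues.append("shivering_avatar")
    return cues


def build_weather_clip(lat, lon, query_weather):
    """Query weather for a device's GPS position and describe the clip to build."""
    report = query_weather(lat, lon)  # hypothetical weather-database query
    return {
        # The description would be passed to text-to-speech for the character.
        "narration_text": report["description"],
        "cues": visual_cues(report["temperature_f"], report["precipitation_pct"]),
    }
```

Keeping the cue rules in one small function makes it straightforward to add further mappings, such as a cue for high wind, without touching the query step.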

Another related exemplary application of an embodiment includes the generation of intelligent traffic reports or intelligent travel information. Location information may be obtained using a GPS system. For example, by taking two GPS measurements a few seconds apart, the current road and direction of travel may be determined, and information about traffic farther along that road may be obtained and customized for the client device. Typically, a GPS-capable client device sends GPS data to the media customization system or to a connected server several times during a normal commute. The system may therefore determine whether the user has one or more usual routes. Then, in response to a query and based on the derived route information, the system may report traffic conditions along the remainder of the route the user is expected to take. Time data may also be used to determine whether the trip is a morning commute or an evening commute. If real-time traffic events (accidents) are available to the server, the media customization system may send an alert to the client device when an accident has occurred farther along the usual route. Once the user's route information has been derived, it may be used to generate road maps, pictures of traffic data, animated information with a 3D avatar delivering the report, or other customized parameters based on other contextual information.
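As a rough illustration of deriving a direction of travel from two GPS measurements taken a few seconds apart, the sketch below computes the initial great-circle bearing between two fixes and names the nearest compass direction. The function names are hypothetical, and matching the fixes to a specific road (map matching) is assumed to happen elsewhere and is not shown.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from fix 1 to fix 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def compass_direction(bearing_deg):
    """Name the 45-degree compass sector that contains the bearing."""
    names = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return names[int((bearing_deg + 22.5) // 45) % 8]


# Two fixes taken a few seconds apart while moving north along a meridian.
bearing = initial_bearing_deg(37.0000, -122.0000, 37.0010, -122.0000)
print(compass_direction(bearing))
```

Over a span of a few seconds the two fixes are very close together, so GPS noise can dominate the result at low speed; a practical system would likely smooth over several fixes.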

Travel-related services may be provided in much the same customized manner, supplying region-specific information about restaurants, hotels, clubs, and entertainment options based, again, on the location of the client device, prior user preferences and demographics, available input media channels such as restaurant guides (Zagat), and paid partner agreements with hotel and entertainment promotion services.

Other applications of media customization using various embodiments of the methods and systems of the present invention include the transformation and generation of content for sending messages between users within a community such as Friendster, Tribe, MySpace, Dodgeball, or Match.com. A message may originate in a lowest-common-denominator format (text only: instant message, SMS, email). However, other inputs such as audio and video are also supported. The input media may be transcoded according to the capabilities of the destination device, and additional contextual parameters, such as user-selected or user-created avatars, playback voices, and location-based scenery, may be used during the transformation and generation process. A message originates from one user and is sent to another user, but passes through the community's central "profile" engine, which performs the end-to-end routing and hides the endpoints (phone numbers, email addresses, and so on) of the two users. Only the profile identities of the two users are revealed to each other; their "real world" identities are not. This allows "textimation" (text-to-animation) transformation to be used to "upgrade" a text or SMS message to a richer format for users whose devices or players can handle richer formats. Examples include SMS-to-avatar transformation and generation using text-to-speech conversion, use of the sender's avatar (extracted from the sender's profile), and use of an SMS-specific context engine to expand compressed and abbreviated SMS shorthand into fuller conversational sentences, with emotional nuances extracted from emoticon cues and punctuation.
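A minimal sketch of the SMS-specific expansion step described above might look as follows. The shorthand table, the emoticon table, the mood labels, and the function name `expand_sms` are all invented for illustration; a real context engine would presumably use far richer vocabularies and rules.

```python
import re

# Hypothetical lookup tables, for illustration only.
SHORTHAND = {"u": "you", "r": "are", "gr8": "great", "l8r": "later", "thx": "thanks"}
EMOTICON_MOODS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad"}


def expand_sms(text):
    """Expand SMS shorthand and derive a coarse mood from emoticons and punctuation.

    Returns a tuple (expanded_text, mood). If several emoticons with different
    moods appear, the last table entry that matches wins.
    """
    mood = "neutral"
    for emoticon, label in EMOTICON_MOODS.items():
        if emoticon in text:
            mood = label
            text = text.replace(emoticon, " ")  # emoticons become a cue, not spoken text
    # Punctuation is used as a weaker cue when no emoticon is present.
    if mood == "neutral" and "!" in text:
        mood = "excited"
    expanded = re.sub(
        r"[A-Za-z0-9]+",
        lambda m: SHORTHAND.get(m.group(0).lower(), m.group(0)),
        text,
    )
    return " ".join(expanded.split()), mood


print(expand_sms("u r gr8 :)"))
```

The expanded sentence could then be passed to text-to-speech, while the mood label could select the avatar's expression or tone of voice.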

In one embodiment, the various media elements need not be "disassembled." For example, if the element supplied to the system is an audio track, the audio track need not be disassembled before it is incorporated into and/or transformed with other media elements.

In various embodiments, more than a context-based transformation may occur. For example, combining various artistic elements (e.g., adding audio to a graphical 3D rendering) creates something with a "wow" factor: this is a transformation process that creates a derivative work or other resulting work. Beyond merely taking different inputs in different formats and converting them all to the same format (e.g., 3GPP), such embodiments include situations in which the whole is greater than the sum of its parts. Further, in embodiments, one element characterizes or influences another element; for example, an audio track "drives" the animation (movement) of lips, eyes, and so on.

According to one embodiment, at least some of the features described herein may be implemented in software or other machine-readable instructions stored on one or more machine-readable media. The machine-readable instructions may be executed by one or more processors to provide the features and other functionality described herein. For example, the various engines, templates, components, and so on shown in FIGS. 1-3 may be implemented as software modules executable by one or more processors.

Another embodiment is shown in FIG. 4. This embodiment provides a context-based media generation component (blocks 106 and 107) that is capable of generating supplemental information that can improve the speed and quality of video encoding. The supplemental information is used by the transcoding component 105 (or some other transformation component) to generate video (e.g., a 3GPP file from 3D frames). The video compression used by the transcoding component 105 involves motion estimation between frames and encoding of the prediction error. The 3D media generator 106 may provide the inter-frame motion information. Because the motion that occurred, and what changed between one frame and the next, are already known, having accurate motion information reduces video encoding time (e.g., no motion estimation is needed) and reduces the prediction error.

This motion information or other supplemental information is passed to the transcoding component 105 as a hint track 151 for encoding. Reducing the prediction error increases the quality of the compression, allowing higher video quality at a given bit rate or a smaller file size for a video file of a given quality. It is understood that motion data is just one possible example of the supplemental information that the 3D media generator 106 may provide to increase the quality and speed of video encoding. As additional non-limiting examples, the media generator 106 may also provide information about changes in luminance or other lighting between frames, or about texture changes between frames. As still further examples, embodiments of the 3D media generator 106 may also send information about the composition of a frame when there is a text overlay, or information about which parts of the scene should be the focus of more bits.
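To make the idea of a hint track more concrete, the sketch below shows one way per-frame hints could be represented and consulted by an encoder. It is a sketch under stated assumptions only: the specification does not define a format for hint track 151, so the `FrameHint` fields, the block-coordinate keys, and the `motion_for_block` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FrameHint:
    """Hypothetical per-frame entry of a hint track supplied by the media generator."""
    frame_index: int
    # Known motion per block: (block_x, block_y) -> (dx, dy), taken from the 3D scene.
    motion_vectors: dict = field(default_factory=dict)
    luminance_delta: float = 0.0          # lighting change relative to the previous frame
    priority_regions: list = field(default_factory=list)  # areas deserving more bits


def motion_for_block(hint, block, search_fn):
    """Return (vector, searched) for a block.

    If the hint already carries the block's motion, motion estimation is skipped;
    otherwise the encoder's own search function is used as a fallback.
    """
    if hint is not None and block in hint.motion_vectors:
        return hint.motion_vectors[block], False
    return search_fn(block), True


hint = FrameHint(frame_index=1, motion_vectors={(0, 0): (2, -1)})
fallback_search = lambda block: (0, 0)   # stand-in for a real motion search
print(motion_for_block(hint, (0, 0), fallback_search))
print(motion_for_block(hint, (1, 0), fallback_search))
```

Keeping the fallback path means the encoder still works when hints are missing or partial, which fits the description of the hint track as supplemental rather than required information.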

In embodiments, the media may be delivered to the end user's device as (for example) a 3GPP video file or a 3D file, depending on device capabilities. This information is derived using the device capability detection engine 109. Further, the video or 3D media may be custom-made to suit the capabilities of the device. Different devices may have different capabilities for rendering 3D files and may therefore require 3D content optimized for that device. The context-based media generation engine can generate files that match the capabilities of the device. In a similar manner, the transcoding component 105 can generate a video file with attributes suited to the end user's device. If the media has been pre-encoded into multiple 3D files and multiple 3GPP files, all with different characteristics, the delivery engine may use the device information and the file properties to select the file that is best for the end user's device.
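The file selection performed by the delivery engine can be illustrated with a short sketch. The capability fields, the file properties, and the ranking rule (largest frame area, then highest bit rate among playable files) are assumptions chosen for the example; the specification does not state which properties are compared or how ties are resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceCapabilities:
    """Hypothetical output of a device capability detection step."""
    formats: frozenset
    max_width: int
    max_height: int
    max_bitrate_kbps: int


@dataclass(frozen=True)
class MediaFile:
    """Hypothetical properties recorded for each pre-encoded file."""
    name: str
    format: str
    width: int
    height: int
    bitrate_kbps: int


def select_best_file(device, files):
    """Pick the richest pre-encoded file the device can play, or None if none fits."""
    playable = [
        f for f in files
        if f.format in device.formats
        and f.width <= device.max_width
        and f.height <= device.max_height
        and f.bitrate_kbps <= device.max_bitrate_kbps
    ]
    if not playable:
        return None
    # Assumed ranking: prefer the largest frame area, then the highest bit rate.
    return max(playable, key=lambda f: (f.width * f.height, f.bitrate_kbps))


phone = DeviceCapabilities(frozenset({"3gpp"}), 176, 144, 128)
catalog = [
    MediaFile("clip_small.3gp", "3gpp", 128, 96, 64),
    MediaFile("clip_qcif.3gp", "3gpp", 176, 144, 128),
    MediaFile("clip_cif.3gp", "3gpp", 352, 288, 384),
]
print(select_best_file(phone, catalog).name)
```

Note that only static device capabilities are consulted here; adapting to changing network conditions during a session is a separate concern.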

The delivery engine of one embodiment uses device capabilities, rather than network conditions or device conditions, to select a file. Once a session has started, the capabilities of the delivery engine may be combined with dynamic bandwidth adaptation techniques, such as those disclosed in U.S. Patent Application No. 10/452,035, entitled "METHOD AND APPARATUS FOR DYNAMIC BANDWIDTH ADAPTATION," filed May 30, 2003, which is assigned to the same assignee as the present application and is incorporated herein by reference in its entirety. The transcoding component (105) of one embodiment may also make use of the adaptation techniques disclosed in that commonly owned application.

All of the above U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications, and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference in their entirety.

The above description of illustrated embodiments, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed. While specific embodiments and examples are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the invention and can be made without departing from its spirit and scope.

These and other modifications can be made to the invention in light of the above detailed description. The terms used in the claims should not be construed to limit the invention to the specific embodiments disclosed in the specification and the claims. Rather, the scope of the invention is to be determined entirely by the claims, which are to be construed in accordance with established doctrines of claim interpretation.

FIG. 1 is a block diagram of a system according to one embodiment.
FIG. 2 is a block diagram of a system according to another embodiment.
FIG. 3 is a block diagram of a system according to yet another embodiment.
FIG. 4 is a block diagram of a system according to still another embodiment.

Claims

A method comprising:
obtaining first data;
analyzing the first data and obtaining context information therefrom;
using the obtained context information to determine second data;
supplementing the first data with the determined second data to obtain transformed data; and
delivering the transformed data to at least one client device.

The method of claim 1, wherein obtaining the first data includes obtaining text data, wherein analyzing the first data includes parsing the text data and obtaining the context information therefrom, and wherein supplementing the first data with the determined second data includes supplementing the text data with, or replacing the text data with, audio, video, a graphical representation, one or more images, or other high-quality media elements associated with the text data.

The method of claim 1, wherein obtaining the first data includes obtaining a plurality of first data, wherein determining the second data includes using at least some of the plurality of first data as the second data based on context information of the plurality of first data, and wherein supplementing the first data with the determined second data includes combining at least some second data corresponding to some of the plurality of first data to obtain the transformed data.

The method of claim 1, wherein obtaining the context information includes determining emotional nuances from the first data.

The method of claim 1, further comprising:
transcoding the transformed data or the second data from a first format into another format, based on characteristics of the client device or based on the condition of a communication channel for delivering the transformed data; and
dynamically updating the transcoded data in response to changes in characteristics of the client device or in channel conditions during a communication session.

The method of claim 1, wherein supplementing the first data with the determined second data includes using a template to provide the first data with high-quality media enhancements at selected locations according to a temporal order.

The method of claim 1, further comprising:
deriving hint track information based on the determined second data or based on the context information; and
using the hint track information to optimize transcoding of the transformed data into a format that is optimal for the client device or for a communication channel to the client device.

A system comprising:
means for obtaining first data;
means for analyzing the first data and obtaining context information therefrom;
means for determining second data using the obtained context information;
means for supplementing the first data with the determined second data to generate transformed data; and
means for delivering the transformed data to at least one client device.

The system of claim 8, further comprising:
means for converting the first data, the second data, or the transformed data; and
means for storing client device data, customer information, or image information that can be used to generate the transformed data.

The system of claim 8, wherein the means for analyzing the first data includes at least one of a context database, a context engine, a placement template, and an animation template.

The system of claim 8, wherein the means for supplementing includes at least one of an animation engine, an audio engine, a rendering engine, a video engine, and an image template that can be used to generate media for the transformed data.

The system of claim 8, further comprising means for enhancing video encoding based on supplemental information derived from the context information or from the second data.

The system of claim 7, further comprising means for delivering the transformed data as a video file derived based on device capabilities.

A product comprising a machine-readable medium having instructions stored thereon that are executable by at least one processor for:
analyzing obtained first data and obtaining context information therefrom;
using the obtained context information to determine second data;
supplementing the first data with the determined second data to obtain transformed data; and
delivering the transformed data to at least one client device.

The product of claim 14, wherein the machine-readable medium further has instructions stored thereon for:
deriving hint track information based on the determined second data or based on the context information; and
using the hint track information to optimize transcoding of the transformed data into a format that is optimal for the client device or for a communication channel to the client device.

The product of claim 14, wherein the instructions for analyzing the obtained first data to obtain context information include instructions for deriving emotional nuances from the first data.

The product of claim 14, wherein the instructions for using the obtained context information to determine second data include instructions for identifying second data associated with the first data based on at least one of user preferences, device characteristics, or media content that may be associated with the first data.

An apparatus comprising:
an input terminal to receive media input;
a context component coupled to the input terminal to analyze the media input, obtain context information therefrom, and use the obtained context information to determine a media supplement associated with the media input;
a media generation component coupled to the context component to supplement the media input with the determined media supplement to obtain transformed media; and
an output terminal coupled to the media generation component to deliver the transformed media to at least one client device.

The apparatus of claim 18, further comprising:
a transcoding component coupled to the media generation component to convert the transformed media into a format that is optimal for the client device or for a communication channel to the client device used to deliver the transformed media; and
at least one storage unit to store information related to the client device, including device capability information, user information, preference information, or avatar templates.

The apparatus of claim 18, further comprising a transcoding component coupled to the media generation component to convert the transformed media into a format that is optimal for the client device or for a communication channel to the client device used to deliver the transformed media, wherein the transcoding component is further coupled to receive, from the media generation component or from the context component, hint track information usable to optimize the conversion of the transformed media.

The apparatus of claim 18, wherein the context component includes a context database, a context engine, a placement template, and an animation template.

The apparatus of claim 18, wherein the media generation component includes an animation engine, an audio engine, a rendering engine, a video engine, and an image template that can be used to generate objects for the transformed media.