JP7317189B2

JP7317189B2 - automated media publishing

Info

Publication number: JP7317189B2
Application number: JP2022117811A
Authority: JP
Inventors: Shailendra Mathur
Original assignee: Avid Technology, Inc.
Priority date: 2018-07-02
Filing date: 2022-07-25
Publication date: 2023-07-28
Anticipated expiration: 2039-06-28
Also published as: EP3591595A1; JP7194367B2; US10771863B2; JP2022137306A; JP2020017945A; EP4170568A1; EP3591595B1; US20200007956A1

Description

Cross-reference to related applications
[001] This application claims priority to and the benefit of, under 35 U.S.C. § 120, and is a continuation of, pending U.S. patent application Ser. No. 10/025,043, filed July 2, 2018, which is incorporated herein by reference.

Prior art

[002] Media is consumed around the world in theaters, offices, homes, and many other venues. It is viewed on large fixed screens, on handheld mobile devices, and on a broad range of intermediate platforms. As the number of media consumption settings and platforms grows, there is increasing demand for a given item of media content to be delivered in versions and formats appropriate to each of the venues and platforms on which it is intended to reach consumers. Furthermore, the means of publishing and distribution have also proliferated. As a result, a given media project may require dozens, or even more than a hundred, different deliverables.

[003] Today's media publishing workflows are time-consuming, complex, error-prone, and ad hoc. Those involved in creating media were able to cope with such inefficiencies when only a small number of versions, renditions, and consumption vehicles were required, but these methods have become unwieldy and too costly as the deliverables for a given media project have proliferated. In addition, the creative editors involved in the story-telling stage of media creation have no ability to express their creative choices in the downstream rendering process. Media creation and publishing pipelines therefore need new approaches that broaden content, raise the technical quality of media deliverables, boost productivity, and control costs.

[0004] In general, the methods, systems, and computer program products described herein enable rationalization and automation of parts of the media creation and publishing workflow. At any stage of the editing process, editors are provided with a means of tagging compositional objects with rules. The rules specify what rendering decisions are to be made and how input sources are to be treated in order to generate an output rendering, as specified by a rendition profile, at the best possible quality while adhering to the editor's rendition-related choices. These tags provide rules that are interpreted downstream by a rendition engine when the composition is rendered. The rules specify how renditions are to be generated in response to a set of profiles that specify rendition requirements.

[0005] In general, in one aspect, a method of editing a media composition comprises: enabling an editor, using a media composition application executing on a media composition host system, to edit the media composition, wherein the media composition includes a compositional object and the compositional object references a source media asset, to specify an editorial rendition rule for the compositional object, and to associate the editorial rendition rule with the compositional object of the media composition; and, when a rendition of a given media composition is to be generated, a rendition engine: receiving the given media composition and the editorial rendition rule that the editor has specified and associated with the compositional object of the given media composition; receiving a rendition profile that specifies media essence encoding parameters for the rendition of the given media composition; inputting the source media asset of the given composition; and generating the rendition of the given media composition from the source media asset of the given media composition in accordance with the editorial rendition rule of the given composition and the media essence encoding parameters for the rendition of the given composition.
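As an illustration of the data flow described in paragraph [0005], the following Python sketch models a composition whose compositional objects carry editorial rendition rules, together with a rendition engine step that combines them with a rendition profile. All class names, field names, and the `plan_rendition` helper are hypothetical and are not taken from the specification. The function only assembles a render plan; it does not decode or encode any media.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EditorialRenditionRule:
    # Hypothetical rule record; "kind" might be "spatial", "temporal", "color", etc.
    kind: str
    params: Dict[str, object] = field(default_factory=dict)


@dataclass
class CompositionalObject:
    # References a source media asset by an identifier (assumed to be a string).
    source_asset_id: str
    rules: List[EditorialRenditionRule] = field(default_factory=list)


@dataclass
class MediaComposition:
    name: str
    objects: List[CompositionalObject] = field(default_factory=list)


@dataclass
class RenditionProfile:
    # Media essence encoding parameters for one rendition (assumed key/value form).
    name: str
    encoding_params: Dict[str, object] = field(default_factory=dict)


def plan_rendition(composition: MediaComposition, profile: RenditionProfile) -> List[dict]:
    """Pair each object's editor-specified rules with the profile's encoding parameters."""
    plan = []
    for obj in composition.objects:
        plan.append({
            "source": obj.source_asset_id,
            "rules": [(rule.kind, rule.params) for rule in obj.rules],
            "encoding": dict(profile.encoding_params),  # copy so each entry is independent
        })
    return plan
```

A caller would build one `MediaComposition`, then call `plan_rendition` once per `RenditionProfile`, which mirrors the idea that the same tagged composition can feed many renditions.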

[0006] Various embodiments include one or more of the following features. The media composition application includes the rendition engine. The rendition engine is hosted on a platform external to the media composition host. The rendition profile specifies a version of the given media composition that is to be used to generate the rendition of the given composition. The editorial rendition rule that the editor has specified and associated with the compositional object of the given media composition is a spatial editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a spatial rendition parameter. The spatial rendition parameter is a framing box that defines a portion of a video frame of the compositional object of the given media composition, and the rendition engine prioritizes for inclusion within the rendition a first portion of the source media asset of the given composition that lies within the framing box. The aspect ratio of a target display for the rendition is larger than the aspect ratio of the framing box, and the rendition engine further prioritizes for inclusion within the rendition a second portion of the source media asset of the given composition that lies outside the framing box, wherein the second portion of the source media asset of the given media composition includes media essence data. The editorial rendition rule that the editor has specified and associated with the compositional object of the given media composition is a temporal editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a temporal rendition parameter. The temporal rendition parameter includes a temporal framing range that defines a temporal range within the source media asset of the given composition, and logic of the rendition engine prioritizes for inclusion within the rendition a temporal portion of the source media asset of the given media composition that lies within the temporal framing range. The editorial rendition rule that the editor has specified and associated with the compositional object of the given media composition is a dynamic range editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a dynamic range rendition parameter. The source media asset of the given composition is a video asset, the dynamic range rendition parameter is a legal dynamic range that defines a range of pixel brightness values within the source media asset of the given composition, and the rendition engine prioritizes for inclusion within the rendition pixels of the source media asset of the given composition that have brightness values lying within the legal range. The source media asset of the given composition is an audio asset, the dynamic range rendition parameter is a legal dynamic range that defines a range of audio sample intensity values within the source media asset of the given composition, and the rendition engine prioritizes for inclusion within the rendition audio samples of the source media asset of the given composition that have intensity values lying within the legal range. The editorial rendition rule that the editor has specified and associated with the compositional object of the given media composition is a color editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a color gamut rendition parameter. The source media asset of the given composition is one of a graphics asset and a video asset, the color gamut rendition parameter includes a legal color range that defines a multi-dimensional region within a multi-dimensional color space, and the rendition engine prioritizes for inclusion within the rendition pixels of the source media asset of the given composition that have color values lying within the legal color range. The editorial rendition rule that the editor has specified and associated with the compositional object of the given media composition is a spatial editorial rendition rule for the layout of title text on a frame of the rendition, at least one of the media essence encoding parameters is a spatial framing box, the source media asset of the given composition is a text titling effect, and the rendition engine renders the text of the text titling effect within the framing box on the frame of the rendition. The editorial rendition rule that the editor has specified and associated with the compositional object of the given media composition is a composition editorial rendition rule, at least one of the media essence encoding parameters is a spatial framing box, the source media asset of the given composition is a compositing effect, and the rendition engine renders a frame of the rendition with a composited image within the framing box. The rendition profile and the editorial rendition rule that the editor has specified and associated with the compositional object of the given media composition are received by the rendition engine in a standardized form, and the rendition engine includes generic logic for executing the rules. The rendition engine includes logic that is learned, using deep learning methods, from editorial rendition decisions captured from a cohort of editors working with a wide range of media content, source media types, and rendition requirements.
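One possible reading of the spatial framing feature in paragraph [0006] is sketched below in Python: the framing box is always kept, and when the target display is wider than the box, additional source pixels from outside the box are pulled in to fill the frame. The function name `crop_for_target`, the `(x, y, w, h)` tuple layout, and the policy of centering on the box and clamping to the source frame are assumptions made for illustration only; the specification does not prescribe this particular geometry.

```python
def crop_for_target(src_w, src_h, box, target_aspect):
    """Return a crop (x, y, w, h) in source pixels that contains the framing box.

    box is (x, y, w, h) in source pixels; target_aspect is width / height.
    Assumed policy: keep the whole box, grow the crop along one axis to reach
    the target aspect ratio, centre it on the box, and clamp it to the source.
    If the source is too small to reach the target aspect, the crop is limited
    to the source extent (padding or scaling would be needed and is not shown).
    """
    bx, by, bw, bh = box
    box_aspect = bw / bh
    if target_aspect >= box_aspect:
        # Target is wider than the box: keep the box height and widen the crop
        # into source material that lies outside the box.
        crop_h = bh
        crop_w = min(src_w, bh * target_aspect)
    else:
        # Target is narrower than the box: keep the box width and grow vertically.
        crop_w = bw
        crop_h = min(src_h, bw / target_aspect)
    # Centre the crop on the framing box, then clamp it inside the source frame.
    cx = bx + bw / 2
    cy = by + bh / 2
    x = min(max(cx - crop_w / 2, 0), src_w - crop_w)
    y = min(max(cy - crop_h / 2, 0), src_h - crop_h)
    return (x, y, crop_w, crop_h)
```

For example, with a 1920x1080 source and a square 720x720 framing box at (600, 180), a 2:1 target widens the crop to 1440x720 while the box stays fully inside it.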

[0007] In general, in another aspect, a computer program product comprises non-transitory computer-readable storage with computer program instructions stored thereon, wherein the computer program instructions, when executed by a computer, instruct the computer to perform a method of editing a media composition. The method comprises: enabling an editor, using a media composition application executing on a media composition host system, to edit the media composition, wherein the media composition includes a compositional object and the compositional object references a source media asset, to specify an editorial rendition rule for the compositional object, and to associate the editorial rendition rule with the compositional object of the media composition; and, when a rendition of a given media composition is to be generated, a rendition engine: receiving the given media composition and the editorial rendition rule that the editor has specified and associated with the compositional object of the given media composition; receiving a rendition profile that specifies media essence encoding parameters for the rendition of the given media composition; inputting the source media asset of the given composition; and generating the rendition of the given media composition from the source media asset of the given media composition in accordance with the editorial rendition rule of the given composition and the media essence encoding parameters for the rendition of the given composition.

[0008] In general, in another aspect, a media composition editing system comprises a memory for storing computer-readable instructions and a processor connected to the memory, wherein the processor, when executing the computer-readable instructions, causes the system to: enable an editor, using a media composition application executing on the system, to edit a media composition, wherein the media composition includes a compositional object and the compositional object references a source media asset, to specify an editorial rendition rule for the compositional object, and to associate the editorial rendition rule with the compositional object of the media composition; and, when a rendition of a given media composition is to be generated, cause a rendition engine to: receive the given media composition and the editorial rendition rule that the editor has specified and associated with the compositional object of the given media composition; receive a rendition profile that specifies media essence encoding parameters for the rendition of the given media composition; input the source media asset of the given composition; and generate the rendition of the given media composition from the source media asset of the given media composition in accordance with the editorial rendition rule of the given composition and the media essence encoding parameters for the rendition of the given composition.

[0009] In general, in another aspect, a method of generating a rendition of a media composition comprises: receiving the media composition, wherein the media composition includes a compositional object that references a source media asset, and the compositional object is associated with an editorial rendition rule specified by an editor of the media composition using a media composition application; receiving a rendition profile that specifies media essence encoding parameters for the rendition of the media composition; and generating the rendition of the media composition from the source media asset in accordance with the editorial rendition rule and the media essence encoding parameters for the rendition of the media composition.

[0010] Various embodiments include one or more of the following features. The media composition has a plurality of versions, and the rendition profile further specifies which of the plurality of versions of the media composition is to be used to generate the rendition of the composition. The editorial rendition rule is a spatial editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a spatial rendition parameter. The spatial rendition parameter is a framing box that defines a portion of a video frame of the compositional object of the media composition, and a first portion of the source media asset of the composition that lies within the framing box is prioritized for inclusion within the rendition. The aspect ratio of a target display for the rendition is larger than the aspect ratio of the framing box, and a second portion of the source media asset of the composition that lies outside the framing box is also prioritized for inclusion within the rendition, wherein the second portion of the source media asset of the media composition includes media essence data. The editorial rendition rule is a temporal editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a temporal rendition parameter. The temporal rendition parameter includes a temporal framing range that defines a temporal range within the source media asset of the composition, and a temporal portion of the source media asset of the media composition that lies within the temporal framing range is prioritized for inclusion within the rendition. The editorial rendition rule is a dynamic range editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a dynamic range rendition parameter. The source media asset of the composition is a video asset, the dynamic range rendition parameter is a legal dynamic range that defines a range of pixel brightness values within the source media asset of the composition, and pixels of the source media asset of the composition that have brightness values lying within the legal range are prioritized for inclusion within the rendition. The source media asset of the composition is an audio asset, the dynamic range rendition parameter is a legal dynamic range that defines a range of audio sample intensity values within the source media asset of the composition, and audio samples of the source media asset of the composition that have intensity values lying within the legal range are prioritized for inclusion within the rendition. The editorial rendition rule is a color editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a color gamut rendition parameter. The source media asset of the composition is one of a graphics asset and a video asset, the color gamut rendition parameter includes a legal color range that defines a multi-dimensional region within a multi-dimensional color space, and pixels of the source media asset of the composition that have color values lying within the legal color range are prioritized for inclusion within the rendition. The editorial rendition rule is a spatial editorial rendition rule for the layout of title text on a frame of the rendition, at least one of the media essence encoding parameters is a spatial framing box, the source media asset of the composition is a text titling effect, and the text of the text titling effect is rendered within the framing box on the frame of the rendition. The editorial rendition rule is a composition editorial rendition rule, at least one of the media essence encoding parameters is a spatial framing box, the source media asset of the composition is a compositing effect, and a frame of the rendition is rendered with a composited image within the framing box. The generating step includes interpreting the editorial rendition rule using generic logic for executing rules. The generating step includes executing logic that is learned, using deep learning methods, from editorial rendition decisions captured from a cohort of editors working with a wide range of media content, source media types, and rendition requirements.
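The temporal framing range of paragraph [0010] can be illustrated with a small Python sketch that picks which span of a source asset goes into a rendition of a given duration. The function name `select_time_span`, the use of seconds as plain numbers, and the specific policy (keep the head of the framing range when the rendition is shorter, extend symmetrically into surrounding material when it is longer) are assumptions for illustration; the specification states only that material within the temporal framing range is prioritized for inclusion.

```python
def select_time_span(asset_start, asset_end, framing_start, framing_end, target_duration):
    """Choose the (start, end) span of the source asset to include in a rendition.

    Assumed policy, not taken from the specification:
      * if the rendition is no longer than the framing range, take the head
        of the framing range;
      * otherwise include the whole framing range and extend equally on both
        sides, shifting the span when it would run past either asset boundary.
    """
    framing_len = framing_end - framing_start
    if target_duration <= framing_len:
        return (framing_start, framing_start + target_duration)
    extra = target_duration - framing_len
    # Extend symmetrically, then clamp to the available asset material.
    start = max(asset_start, framing_start - extra / 2)
    end = min(asset_end, start + target_duration)
    # If the end was clamped, pull the start back to preserve the duration.
    start = max(asset_start, end - target_duration)
    return (start, end)
```

With an asset spanning 0 to 100 seconds and a framing range of 40 to 60 seconds, a 30-second rendition would take 35 to 65 seconds under this policy, so the entire framing range is retained.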

[0011] In general, in another aspect, a computer program product comprises non-transitory computer-readable storage with computer program instructions stored thereon, wherein the computer program instructions, when processed by a computing system, instruct the computing system to perform a method of generating a rendition of a media composition. The method comprises: receiving the media composition, wherein the media composition includes a compositional object that references a source media asset, and the compositional object is associated with an editorial rendition rule specified by an editor of the media composition using a media composition application; receiving a rendition profile that specifies media essence encoding parameters for the rendition of the media composition; and generating the rendition of the media composition from the source media asset in accordance with the editorial rendition rule and the media essence encoding parameters for the rendition of the media composition.

[0012] Various embodiments include one or more of the following features. The computing system is remote from the location from which the media composition is received. The computing system is implemented in the cloud.

[0013] In general, in another aspect, a media composition rendition engine comprises a memory for storing computer-readable instructions and a processor connected to the memory. The processor, when executing the computer-readable instructions, causes the media composition rendition engine to: receive a media composition, wherein the media composition includes a compositional object that references a source media asset, and the compositional object is associated with an editorial rendition rule specified by an editor of the media composition using a media composition application; receive a rendition profile that specifies media essence encoding parameters for a rendition of the media composition; and generate the rendition of the media composition from the source media asset in accordance with the editorial rendition rule and the media essence encoding parameters for the rendition of the media composition.

[0014] Various embodiments include one or more of the following features. The media composition rendition engine is located remotely from the location from which the media composition is received. The media composition rendition engine is implemented in the cloud.

FIG. 1 is a high-level block diagram illustrating the described media publishing workflow.
FIG. 2 shows a media composition tagged with editorial rendition rules.
FIG. 3 shows a set of rendition profiles for a given composition.
FIG. 4A shows the generation of multiple renditions from a single composition.
FIG. 4B shows the generation of renditions in a workflow that includes multiple versions of a composition.
FIG. 5 shows three boxes that define spatial rendition parameters.
FIG. 6 shows how two different spatial rendition rules cause different renditions to be generated for two renditions that have the same spatial parameters in their rendition profiles.
FIG. 7 shows an idealized timeline illustrating temporal rendition parameters.
FIG. 8 illustrates the distinction between temporal range and temporal resolution.
FIG. 9 illustrates dynamic range rendition parameters.
FIG. 10 is a CIE chromaticity diagram illustrating color space rendition parameters.
FIG. 11 is a high-level block diagram of a system for automated media publishing.
FIG. 12 is a high-level block diagram of an automated media publishing system with multiple rendition engines.

[0028] The systems and methods described herein provide a new paradigm for generating a set of media deliverables that stem from a given composition or set of related compositions. A workflow is described in which compositions are tagged with rendering rules that are inserted at the editing stage. The rules are interpreted downstream, automatically or semi-automatically, when deliverable media-based products are rendered. A typical end-to-end media workflow includes: (i) creating a composition; (ii) optionally, creating a family of versions based on one or more compositions, with variations in content catering to various target geographies, consumer demographics, and consumption venues; (iii) generating renditions from the family of versions or, if no versions are produced, directly from the composition; and (iv) generating and distributing deliverable media packages from the renditions. The focus here is on optimizing and automating the rendering of the published media products, i.e., the generation of renditions in stage (iii) of the workflow. As used herein, a rendition is a "flattened" rendering of a media composition in a form that can be consumed, i.e., played back, by a target platform. Common rendition formats include Audio Interchange File Format (.AIFF), WAVE audio file (.WAV), MP3 audio file (.MP3), MPEG movie (.MPG), Apple QuickTime Movie (.MOV), Windows Media Video file (.WMV), Material Exchange Format (MXF), Interoperable Master Format (IMF), and Digital Cinema Package (DCP).

[0029] In general, a given media composition needs to be adapted and optimized for the multiple contexts in which it is to be consumed. Traditionally, this has been done by a transcoding process, which involves spatial, temporal, and color reformatting and compression of a high-quality flattened file, commonly referred to as a "master". The master represents the final mix, in which multiple tracks and effects have been mixed down into a single media stream. When transcoding from a master, the quality of the resulting program is limited to that of the master because, at best, the transcoding process only preserves the quality of the master and, in many cases, degrades it.

[0030] The term "master" is also used when referring to a distribution master. A distribution master is a package assembled for distribution to various media outlets and businesses. The package includes multiple alternative options for aspects such as language, program length, and target display parameters. The media (i.e., video and audio) options included within a distribution master are mixed-down files generated by transcoding an original master into the required formats. Thus, although such a distribution master may include multiple alternative tracks, each of these is included as a "flattened" alternative, not as a compositional element of the program defined in terms of the original source assets. Indeed, the source assets are not even included within the distribution package. As an example, a distribution master created for Eastern Canada would include both English and French dialog tracks, and video files formatted for a smartphone and for an HD television. Digital Cinema Package (DCP) and Interoperable Mastering Format (IMF) are two examples of distribution master types.

[0031] In contrast, the methods and systems described herein defer the irreversible application of editorial decisions relating to the publication of a media deliverable until the specific requirements of the deliverable are known and the deliverable is generated. Because the original edited composition and the original source assets are made available, rendition choices such as track mixing, compositing, and effects can be applied to the original sources of media rather than to mixed-down versions. For example, to generate a 4K-resolution deliverable after editing 4K source media in an HD mezzanine format, traditional methods resample the edited HD master to generate the 4K deliverable, so that the resulting quality is lower than that of the original. By contrast, using the methods described herein, the deliverable is generated directly from the 4K source material after editing is complete. The layout of text, such as titles or closed captions, provides another example. With existing methods, if the text does not fit on the required destination screen size, the only available option is to scale or crop the entire image, which may render the text illegible or crop it out of the deliverable. With the approach described herein, the original media and the titling effect are still available when the destination screen size is specified, so if the text does not fit, perhaps because of the aspect ratio of the destination screen, the titling effect is rerun and the text is laid out so that it fits the destination screen optimally. A further example involves color correction. If final output to a high dynamic range (HDR) capable monitor is specified, an 8-bit or 10-bit master cannot supply the full luminance range available for display. However, because the original media composition and source media assets are still available, the full dynamic range of the original can be included in the distributed rendition.

[0032] To implement such a workflow, additional functionality is added to the editing environment that enables the editor to specify rules and corresponding parameters defining how renditions are to be generated, and to associate these rules with the media composition, and more specifically with compositional objects of the media composition. The rules are exposed to automated rendering software when renditions are generated from the composition. The methods described herein provide editors with a mechanism for ensuring that they can embody the story they want to tell in the media. That is, their full creative intent is captured automatically downstream when renditions are generated, rather than being recreated through laborious manual sub-editing. Such rules are referred to as "editorial rendition rules", since they are essentially editorial in nature and are to be distinguished from media format rendition rules. The latter serve to optimize the structure and format of the media files, as well as technical media quality, by specifying the conditions under which various codec parameters are applied when renditions are generated. In existing workflows, the common format decisions are specified by personnel downstream of the creative editing and assembly finishing process. In some cases, format decisions and encoding are performed manually. The methods described herein support automation of this stage of the workflow as well. However, media format rules and parameters are supplied as part of the rendition specification, within a rendition profile (described below), rather than within the media composition.

[0033] The methods described herein serve to enhance the ability of media publishers and distributors to monetize their assets by providing the ability to repurpose archived assets. Because creative decisions are made against the original source media assets and are stored in the archive as metadata in association with those assets, the full range of media scope and quality remains available for generating renditions at any time. This applies to the spatial domain (original large raster), the temporal domain (original frame rate), and the color domain (original dynamic range).

[0034] FIG. 1 is a high-level diagram showing the described media publishing workflow and the data objects involved in the process. Source media assets 102, such as video clips, audio clips, computer-generated imagery, and special effects, provide the source material used by editor 104, who uses media editing application 106 to create composition 108. Media editing application 106 may be a non-linear video editor, a digital audio workstation (DAW), or a special effects application. Examples of such applications include MEDIA COMPOSER®, a non-linear video editor, and PRO TOOLS®, a DAW, both from Avid® Technology, Inc. of Burlington, Massachusetts, and AFTER EFFECTS®, a visual effects, motion graphics, and compositing application from Adobe® Systems, Inc. Composition 108 is represented by multiple tracks that include clips of source media assets 102 arranged in a sequence, together with rich metadata specifying editing decisions, parameters, and settings. The composition includes references to all of the source media assets and effects, and the metadata required to play back or render the composition. Such an "unflattened" composition is in a form that permits further editing, and is to be distinguished from the traditional flattened-file "master" referred to above and from distribution masters such as DCP and IMF. The structure of a composition varies from application to application. For example, MEDIA COMPOSER refers to a video composition as a "sequence", which includes settings, bin files, and clips, with the sequence metadata including references to the sources. PRO TOOLS refers to an audio composition as a "session", which includes metadata specifying audio editing and mixing decisions and settings, as well as references to source assets such as audio files and effects. Graphics and compositing compositions are represented as scene graphs that specify the editing decisions applied to the graphics source assets.

[0035] As indicated above, the editing phase is extended beyond its traditional scope by enabling the editor to specify editorial rendition rules 110. When an editor specifies an editorial rendition rule, the media composition application attaches the rule, together with values for any parameters it may require, to the appropriate compositional object in the composition. FIG. 2 shows a compositional data model for composition 202, which includes three tracks, each containing one or more clips. Tracks and clips are examples of compositional objects. Track 1 204 is associated with editorial rendition rule A 206, and clip 1 208 is associated with editorial rendition rule 210. Associating an editorial rendition rule with a compositional object is also referred to herein as tagging the compositional object with the rule. The media composition application provides a user interface that enables the editor to create and enter editorial rendition rules and parameters for selected compositional objects of the media composition. Once a rule has been specified, the editor can tag a compositional object with it, and the rule becomes part of the media composition, as shown in FIG. 2. Examples of editorial rendition rules are described below.
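By way of non-limiting illustration, the compositional data model of FIG. 2 might be sketched as follows. The class names, the `tag` method, the name "Rule B" for rule 210, the placement of clip 1 on track 1, and the asset reference string are all hypothetical and are introduced only for this sketch; the specification does not prescribe any particular representation.

```python
from dataclasses import dataclass, field


@dataclass
class EditorialRenditionRule:
    name: str
    params: dict = field(default_factory=dict)  # parameter values the rule requires


@dataclass
class CompositionalObject:
    """Anything in a composition that can be tagged with rules."""
    name: str
    rules: list = field(default_factory=list)

    def tag(self, rule: EditorialRenditionRule) -> None:
        # Tagging attaches the rule to the object, so it travels with the composition.
        self.rules.append(rule)


@dataclass
class Clip(CompositionalObject):
    source_asset: str = ""  # a reference to source media, not the media itself


@dataclass
class Track(CompositionalObject):
    clips: list = field(default_factory=list)


@dataclass
class Composition:
    name: str
    tracks: list = field(default_factory=list)

    def tagged_objects(self):
        """Yield (object, rule) pairs for every tagged track and clip."""
        for track in self.tracks:
            for rule in track.rules:
                yield track, rule
            for clip in track.clips:
                for rule in clip.rules:
                    yield clip, rule


# Example loosely following FIG. 2: three tracks, with track 1 and clip 1 tagged.
clip1 = Clip("Clip 1", source_asset="hypothetical-asset-ref")
track1 = Track("Track 1", clips=[clip1])
comp = Composition("Composition 202", tracks=[track1, Track("Track 2"), Track("Track 3")])
track1.tag(EditorialRenditionRule("Rule A"))
clip1.tag(EditorialRenditionRule("Rule B", {"framing_box": (1000, 540, 2920, 1620)}))

for obj, rule in comp.tagged_objects():
    print(obj.name, "->", rule.name)
```

Keeping the rules on the compositional objects themselves, rather than in a separate table, reflects the statement above that a rule becomes part of the media composition once the object is tagged.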

[0036] When one or more renditions of the composition are to be generated, editorial rendition rules 110 are interpreted by rendition engine 112 in accordance with the media deliverable requirements specified by rendition profiles 114 in order to generate renditions 116. The rendition engine logic attempts to generate each rendition at a quality commensurate with its corresponding rendition profile. Because the rendition engine starts from the original source assets, without relying on intermediate files, this quality may be equal to or less than that of those assets. Access to the source effects and source media also provides a high degree of flexibility as to how effects are applied in view of the specified parameters.

[0037] Rendition profiles 114 include one or more individual rendition profiles, one for each rendition to be generated. Each rendition profile specifies: (i) for workflows that include a version stage, the version to be used; (ii) a structural description of the media package to be generated from that version, e.g., Interoperable Master Format (IMF), or Material Exchange Format (MXF) OP-Atom and OP1a; (iii) essence encoding parameters, including but not limited to spatial parameters such as a spatial framing box, temporal parameters such as a temporal framing range, and color parameters such as dynamic range and color legal ranges; and (iv) the distribution method to be deployed. In addition, a rendition profile may include one or more editorial rendition rules, perhaps already specified by the editor when the rendition requirements were first defined. These may supplement the editorial rendition rules embedded within the composition. Furthermore, certain rendition parameters, such as valid boxes or framing boxes (see below), may be present within source media assets 102 and, in accordance with editorial rendition rules 110, may be given priority over the corresponding parameters present within the rendition profile. However, the logic used by the rendition engine to interpret the parameters and generate the renditions is the same irrespective of the source of the rendition parameters. FIG. 3 is an exemplary illustration of a set of rendition profiles for a given composition.
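As a hedged sketch only, the four items of a rendition profile and the precedence between asset-supplied and profile-supplied parameters might be modeled as shown below. All field names, the `resolve_param` helper, and the example values are assumptions made for illustration. In particular, falling back to the asset value when the profile supplies none is a choice made for this sketch, not something stated in the description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EssenceEncodingParams:
    framing_box: Optional[tuple] = None    # spatial: (x0, y0, x1, y1)
    framing_range: Optional[tuple] = None  # temporal: (start, end)
    dynamic_range: Optional[str] = None    # color


@dataclass
class RenditionProfile:
    version: Optional[str]           # (i) version to use, if the workflow has versions
    package_structure: str           # (ii) e.g. "IMF" or "MXF OP1a"
    essence: EssenceEncodingParams   # (iii) essence encoding parameters
    distribution_method: str         # (iv) how the rendition is distributed
    editorial_rules: list = field(default_factory=list)  # optional supplementary rules


def resolve_param(name, profile, asset_params, asset_has_priority):
    """Pick a parameter value from the source asset or from the profile.

    asset_has_priority stands in for an editorial rendition rule that gives
    the asset's own parameter precedence over the profile's.
    """
    from_asset = asset_params.get(name)
    from_profile = getattr(profile.essence, name)
    if asset_has_priority and from_asset is not None:
        return from_asset
    # Assumption for this sketch: use the asset value if the profile has none.
    return from_profile if from_profile is not None else from_asset


profile = RenditionProfile(
    version="A",
    package_structure="IMF",
    essence=EssenceEncodingParams(framing_box=(0, 0, 1920, 1080)),
    distribution_method="hypothetical-file-delivery",
)
asset_params = {"framing_box": (1000, 540, 2920, 1620)}

print(resolve_param("framing_box", profile, asset_params, asset_has_priority=True))
print(resolve_param("framing_box", profile, asset_params, asset_has_priority=False))
```

The same `resolve_param` logic is used whichever source wins, in keeping with the statement that the engine's interpretation logic does not depend on where a parameter came from.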

[0038] In some workflows, renditions may be generated from media assets that include one or more partially mixed-down media files. Such an intermediate file is then combined with other original media and/or effects during the rendition phase. In other examples, a background track is flattened but a title track is left intact for the rendition phase, or a mixed-down mono audio track is used in the rendition phase to mix in dialog in a language specified just prior to rendition. Thus, some renditioning decisions remain deferred, retaining flexibility in the final renditioning, while other decisions are committed in an earlier phase of the workflow.

[0039] In the final phase of the workflow, the one or more renditions are packaged into distribution masters, which are then distributed to the locations where they are to be consumed.

[0040] FIG. 4A illustrates the generation of multiple renditions 406 from a single composition 402. Each rendition is specified by a corresponding member of a set of rendition profiles 404. FIG. 4B illustrates the generation of renditions in a workflow in which multiple versions of a composition are generated. Each version can then be used as the basis for a set of renditions. A version refers to a variation of a composition that includes a variation in content and/or in the timing of certain elements. Other sources or compositions, in the form of a submaster, an encapsulated clip, or a partial mix-down, may also provide content for the various versions. For example, a set of versions representing episodes of a serial program may include content from a composition representing the individual content of the episode, and content from a second master containing an opening sequence that is reproduced in each episode. Multiple versions of a composition may each include a dialog track in a different language. In addition to the dialog track, the variations required to implement different language versions may also involve titling, subtitling, and scene selection. In other version types, one or more scenes are added or removed based on the intended consumption venue. For example, scenes involving an aircraft accident are removed from an in-flight airline version. A version to be shown to a general audience may contain different scenes from one intended for an adult audience. Other types of versioning involve the insertion of advertisements within a program, or a range of place-holder lengths for advertisements to be added at the time of distribution. The set of versions to be generated may be specified by a set of version profiles, which may be supplied while the composition is being edited or after editing is complete. Referring to FIG. 4B, composition 408 is used as the basis for version A 410 and version B 412. In accordance with rendition profiles A 414, version A is used as the compositional basis from which renditions A1, A2, A3, and A4 416 are generated, and in accordance with rendition profiles B 418, version B is used as the compositional basis from which renditions B1 and B2 420 are generated.

[0041] When generating a rendition in accordance with a given member of a set of supplied rendition profiles, the rendition engine executes the logic required to read the media composition and its compositional objects (or, if versions exist, the appropriate version of the composition), and uses it to convert the source media assets and effects into the specified rendition. In doing so, it invokes one or more of the spatial, temporal, or color-related rendition logic in accordance with the editorial rendition rules as they apply to the essence encoding parameters supplied by the given rendition profile. In addition, rendition rules may enable the editor to specify rules for optimizing the choice of source media assets when multiple alternative assets are available. The alternative assets may differ from each other in one or more of their spatial, temporal, or color resolution or range. The rendition engine receives the editorial rendition rules from the tagged composition itself, and receives the rules' corresponding parameters from one or more of the compositional rules themselves, the rendition profile, and the source media assets.

[0042] Three categories of rendition rules and their parameters are now described: spatial, color, and temporal. Spatial parameters relate to the x, y range within a video frame. As shown in FIG. 5, three boxes define the spatial parameters. For video frame 502, essence box 504 defines the extent of the image for which samples exist. It may include black samples that were previously introduced as padding. Valid box 506 defines the extent of the pixels that contain image content, as opposed to pixels previously introduced for padding. One or more framing boxes defined by the editor constitute the editor's assessment of where the most significant portion of the image lies. The editor may use a framing box to define a safe action region or a safe title region, or to tell the system where an actor is in the image. Framing boxes are invoked in accordance with spatial-type editorial rendition rules to determine which portion of a frame should be given priority for inclusion in a rendition. Other rules constrain the geometry of the framing box to be used, depending, for example, on the aspect ratio of the target playout display specified by the rendition profile. FIG. 5 shows framing box 508 and illustrates an example in which the aspect ratios of framing box 508 and target screen 510 are the same, so that the entire range defined by the framing box is included within rendition frame 512. Unlike the essence box and the valid box, which are defined within the source media asset, the framing box is editorial in nature. The media composition application used by the editor provides a user interface that enables the editor to specify framing boxes for individual frames, clips, or other portions of the composition. In an exemplary user interface, the application enables the editor to draw a framing box on top of a video frame displayed in a playback window within the application's user interface.

[0043] Spatial rendition parameters are resolution-independent. Thus, the framing box defines the portion of the image that the editor wishes to call out in terms of an x, y range of pixels. FIG. 6 illustrates how two different spatial rendition rules cause different renditions to be generated for two renditions that have the same spatial parameters (in this case, resolution parameters) in their rendition profiles. Source image 602 has a resolution of 3840 × 2160. Framing box 604 defines an x, y range of (1000-2920), (540-1620), which results in HD-resolution image 606, showing the cropped portion of source image 602 that spans the range defined by the framing box, without any change in resolution. By contrast, the full range of source image 602 may be scaled to HD-resolution image 608 simply by changing the resolution, without changing the range. The decision whether to scale or crop (or, more generally, to select a combination of cropping and scaling) in order to generate a 1920 × 1080 HD image is essentially editorial in nature. It may depend on a qualitative assessment of whether the overall quality of the rendered image is more important, in which case the source image should be scaled, or whether the possible loss of quality is less important than cropping the image in order to focus the consumer's attention on its crucial content. This example illustrates the ambiguity that can arise without the guidance of spatial editorial rendition rules, when a sub-editor or an automated renderer makes a poor choice, such as inadvertently cropping out the most important part of the frame.
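The arithmetic of the FIG. 6 example can be checked with a short sketch. The function name `plan_rendition` and the rule labels "crop" and "scale" are hypothetical; the sketch assumes the selected source region already has the target aspect ratio, as it does in the figure.

```python
def plan_rendition(rule, source_size, framing_box, target_size):
    """Return (source_region, resample_factor) for a 'crop' or a 'scale' rule.

    Boxes are (x0, y0, x1, y1) in source pixels. A factor of 1.0 means the
    selected pixels are used as they are; 0.5 means down-sampling by two.
    """
    if rule == "crop":
        region = framing_box
    elif rule == "scale":
        region = (0, 0, source_size[0], source_size[1])
    else:
        raise ValueError(f"unknown rule: {rule}")
    region_width = region[2] - region[0]
    return region, target_size[0] / region_width


source_size = (3840, 2160)               # source image 602
framing_box = (1000, 540, 2920, 1620)    # framing box 604
hd = (1920, 1080)                        # target for images 606 and 608

crop_region, crop_factor = plan_rendition("crop", source_size, framing_box, hd)
scale_region, scale_factor = plan_rendition("scale", source_size, framing_box, hd)

print(crop_region, crop_factor)    # framing box extent, no resampling
print(scale_region, scale_factor)  # full frame, down-sampled
```

The framing box spans 2920 - 1000 = 1920 pixels horizontally and 1620 - 540 = 1080 vertically, so the crop yields an HD image with no resampling, whereas scaling the full 3840 × 2160 frame requires a factor of 0.5. Both outputs satisfy the same resolution parameter, which is why the choice between them has to come from an editorial rule.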

[0044] Hinging rules are another type of spatial rendition rule. A hinging rule specifies how an interactive element, typically a graphical object, is to be laid out on the rendered display. These rules involve a hinge location parameter. For example, a bar chart that can be adjusted by a television presenter may be hinged at its base, so that its vertical extent changes with respect to the fixed hinge location. Other spatial rules and parameters define the placement of displays within a set of displays, such as a video wall, and compositing rules, for example for picture-within-picture compositing.
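A minimal sketch of the bar-chart example follows, assuming raster coordinates in which y increases downward and a hinge placed at the center of the bar's base. The function `hinged_bar` and these conventions are illustrative assumptions, not part of the described rules.

```python
def hinged_bar(hinge_x, hinge_y, width, height):
    """Return the rectangle (x0, y0, x1, y1) of a bar hinged at its base.

    The hinge is the center of the base edge. Because y increases downward,
    a taller bar extends toward smaller y while the base stays at hinge_y.
    """
    return (hinge_x - width / 2, hinge_y - height, hinge_x + width / 2, hinge_y)


short_bar = hinged_bar(100, 900, 40, 300)
tall_bar = hinged_bar(100, 900, 40, 600)

print(short_bar)
print(tall_bar)
```

Whatever height the presenter selects, the last coordinate (the base) is unchanged, which is the behavior the hinge location parameter is meant to guarantee.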

[0045] The following is an example of rendition engine logic for using the spatial framing box. If the rendition target display has the same aspect ratio as the framing box, the following rules apply. If the resolution of the target display is the same as the resolution within the framing box, the rendition engine simply extracts the range of the framing box within the source media asset for the rendition. If the target display has a lower resolution than the framing box, the range of pixels within the framing box is down-sampled to the target resolution. If the target display has a higher resolution than that within the framing box, the range of pixels included in the rendition is expanded symmetrically about the framing box in the source media asset. If this expansion causes the rendition box to reach the valid box, then either the additional resolution required beyond that point is obtained by up-sampling the range bounded by the valid box, or the target display is permitted to include non-essence portions that lie outside the valid box but inside the essence box. In the latter case, if the limit of the essence box is also reached before the target resolution is attained, the remaining required resolution is obtained by up-sampling the range bounded by the essence box.
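The equal-aspect-ratio logic above may be sketched as follows. This is one possible reading, not the engine's actual implementation: the function names, the `allow_padding` flag, and the example boxes are hypothetical. The sketch also assumes that expansion stays centered on the framing box and stops as soon as any edge touches the limiting box.

```python
def expand_about_center(box, scale):
    """Scale a box (x0, y0, x1, y1) about its own center."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * scale / 2, (y1 - y0) * scale / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def max_symmetric_scale(box, limit):
    """Largest scale at which the centered, expanded box still fits in limit."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    room_x = min(cx - limit[0], limit[2] - cx)
    room_y = min(cy - limit[1], limit[3] - cy)
    return min(2 * room_x / (x1 - x0), 2 * room_y / (y1 - y0))


def same_aspect_rendition(framing, valid, essence, target_width, allow_padding=False):
    """Return (source_region, resample_factor).

    A factor below 1 means down-sampling, above 1 means up-sampling.
    allow_padding lets the region grow past the valid box into the essence box.
    """
    needed = target_width / (framing[2] - framing[0])
    if needed <= 1:
        # Same or lower target resolution: use the framing box, down-sample if needed.
        return framing, needed
    limit = essence if allow_padding else valid
    scale = min(needed, max_symmetric_scale(framing, limit))
    # Whatever resolution expansion cannot supply is made up by up-sampling.
    return expand_about_center(framing, scale), needed / scale


framing = (960, 540, 2880, 1620)   # 1920 x 1080, centered in the frame
valid = (480, 270, 3360, 1890)     # image content
essence = (0, 0, 3840, 2160)       # all samples, including padding

print(same_aspect_rendition(framing, valid, essence, 1920))
print(same_aspect_rendition(framing, valid, essence, 960))
print(same_aspect_rendition(framing, valid, essence, 2880))
print(same_aspect_rendition(framing, valid, essence, 3840, allow_padding=True))
print(same_aspect_rendition(framing, valid, essence, 7680, allow_padding=True))
```

Returning the source region together with a resampling factor keeps the editorial part of the decision (which pixels to include) separate from the signal-processing part (how much to resample).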

[0046] If the rendition target display has a higher aspect ratio than the framing box (i.e., a higher ratio of x resolution to y resolution), the following rules apply. The x-range of the rendition box is expanded symmetrically on each side of the framing box, without changing the y resolution, until the target aspect ratio is reached. If the resulting resolution matches that of the target display, this becomes the rendition box. If the target display has a lower resolution, the box is downsampled to match the target. If the valid box is reached before the target aspect ratio is attained, and it is reached at only one end of the range, the range is expanded at the other end, within the valid box. If both ends of the x-range reach the valid box before the target x resolution is attained, the further increase in aspect ratio still required is obtained by including pixels that lie outside the valid box but inside the essence box. If even the essence box contains insufficient media samples to accommodate the desired aspect ratio, either additional padding is inserted at the lower and upper ends of the x-range, or the y-range is cropped. Once the desired aspect ratio is reached, the resulting box becomes the rendition box if its resolution meets the requirement, and is downsampled if it exceeds the target resolution. Similar logic applies if the target aspect ratio is lower than that of the framing box.
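The widening step for a wider target can be illustrated with a one-dimensional sketch. The function names `target_width` and `widen_x_range` are hypothetical, ranges are half-open pixel intervals, and the sketch stops at the valid box; the caller is assumed to handle the further fallbacks (essence box, padding, or cropping the y-range) when the returned range is still too narrow.

```python
def target_width(height: int, aspect_w: int, aspect_h: int) -> int:
    """Width needed to reach the target aspect ratio at an unchanged height."""
    return round(height * aspect_w / aspect_h)


def widen_x_range(fx0: int, fx1: int, vx0: int, vx1: int, needed_w: int):
    """Widen the framing x-range [fx0, fx1) to needed_w pixels inside the valid x-range [vx0, vx1).

    Returns the widened (x0, x1). The result may be narrower than needed_w
    if the valid box is too small.
    """
    extra = needed_w - (fx1 - fx0)
    if extra <= 0:
        return fx0, fx1
    left = extra // 2
    right = extra - left
    x0, x1 = fx0 - left, fx1 + right
    if x0 < vx0:
        # Left edge reached the valid box: move the shortfall to the right side.
        x1 += vx0 - x0
        x0 = vx0
    if x1 > vx1:
        # Right edge reached the valid box: move the shortfall to the left side.
        x0 -= x1 - vx1
        x1 = vx1
    # Both edges may now be at the valid box; never go outside it.
    x0 = max(x0, vx0)
    return x0, x1
```

As a usage example, a 1440-pixel-wide framing range at 240..1680 inside a 1920-pixel-wide valid range widens to 0..1920 for a 16:9 target at a height of 1080.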

[0047] Rendition engine logic may be used to lay out text so as to take the target aspect ratio into account. To lay out closed captions or subtitles, the line length of the text may be specified as a percentage, e.g., 75% of the horizontal extent of the target display. Similarly, the font size may be scaled to a resolution commensurate with the resolution available on the target display. Because text can interfere with the imagery it overlaps, the rendition engine logic may try to avoid or minimize placing subtitles over the area defined by the framing box if area is available on the target display outside the framing box. The renditioning engine then lays out the text afresh for each rendition, without needing to introduce text scaling artifacts that could impair readability. Similar rules for the aesthetic layout of text also apply to titles, such as those displayed at the beginning and end of a video composition or those that demarcate individual scenes.
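A small sketch of the per-rendition caption layout arithmetic follows. The 75% line fraction comes from the text; the function name `caption_layout` and the reference values (a 48-pixel font at a 1080-line reference height) are assumptions chosen purely for illustration.

```python
def caption_layout(target_w: int, target_h: int,
                   line_fraction: float = 0.75,
                   ref_height: int = 1080,
                   ref_font_px: int = 48) -> dict:
    """Compute caption line length and font size for one target display.

    line_fraction is the share of the horizontal extent a caption line may use;
    the font is scaled in proportion to the target's vertical resolution
    (ref_height and ref_font_px are hypothetical reference values).
    """
    return {
        "max_line_px": round(target_w * line_fraction),
        "font_px": round(ref_font_px * target_h / ref_height),
    }
```

Because the text is laid out again at each target size rather than scaled as pixels, a 1280x720 rendition simply receives a shorter line limit and a smaller font.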

[0048] Temporal rendition parameters are analogous to the spatial parameters described above. As shown in the idealized timeline of FIG. 7, in which time runs along the horizontal axis, a full clip corresponding to temporal essence range 702 may include header 704 and footer 706, such as a countdown clock and color bars, which are not part of the original footage. The full clip spanning the entire essence range is equivalent to the essence box defined above. Valid range 708 excludes the non-original header and footer but includes head 710 and tail 712 of the clip, which are typically designed for use in transitions, e.g., in a cross-fade to a preceding or succeeding clip. Typically, such heads and tails comprise 520 video frames. Framing range 714 is the portion of the clip within the essence range that excludes both non-original header 704 and footer 706 and also head 710 and tail 712. The framing range indicates the temporal range in which the most important content resides. Additional temporal framing ranges may be inserted to give guidance on the hierarchy of content within the clip. For example, second framing range 716 is designated by the editor within the clip and indicates the temporal portion to be used when cutting the original clip length down to about 50% during downstream on-the-fly editing. This enables automated downstream editing that produces compositions of different lengths without risking the loss of the most important content. As with the spatial rendition parameters, the temporal essence range and valid range are properties of the source media asset, whereas the framing range may be specified by the editor. The user interface of the media composition application enables the editor to specify the framing range, for example by inserting markers 718 and 720 on the displayed timeline representation of the clip to indicate the start and end of the framing range.

[0049] The rendition engine logic for temporal rendition parameters is similar to that for the spatial framing box. If the target duration of a clip specified in the rendition profile is the same as that specified by the framing range, that portion of the clip is included in the rendition. If the rendition calls for a duration longer than that of the framing box, additional content beyond the framing range but within the valid range is added. In a specific use case, the duration of a broadcast rendition may depend on the number of inserts, such as advertisements, to be aired, because the combined duration of the rendition and the inserts must fit within a predetermined overall allotted time. This may be determined shortly before air time, which makes automated rendering according to previously authored editorial rendition rules especially valuable. Supplemental framing ranges (e.g., 716 in FIG. 7) enable the rendition engine to cut content in accordance with various anticipated insert durations, represented by editorially specified reductions in the length of the framing range, such as reductions of 30 seconds, 1 minute, or 2 minutes. In another use case, new effects that require a longer duration, such as a fade transition, are applied as part of the final edit. In this situation, additional content extends beyond the framing box into the valid range. If the extended duration cannot be supplied entirely by the valid range and no additional material exists in the source media, other techniques for extending the duration may be used, such as frame duplication or padding the tail of the clip with graphics or black frames.
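The duration-extension case can be sketched as follows, with ranges expressed as half-open frame intervals. The function name `extend_to_duration` is hypothetical, and the even split of extra frames between head and tail is an assumption of this sketch; the text does not say how the extension is divided.

```python
def extend_to_duration(framing, valid, target_frames):
    """Extend a framing range to target_frames using head/tail frames of the valid range.

    framing and valid are (start, end) half-open frame ranges, with framing inside valid.
    Returns (start, end, pad_frames), where pad_frames is what remains to be filled
    by other means (e.g. frame duplication or black frames at the tail).
    """
    f0, f1 = framing
    v0, v1 = valid
    extra = target_frames - (f1 - f0)
    if extra < 0:
        raise ValueError("shorter targets need a supplemental framing range")
    head_avail = f0 - v0
    tail_avail = v1 - f1
    # Start from an even split, then let each side borrow what the other cannot supply.
    head = min(extra // 2, head_avail)
    tail = min(extra - head, tail_avail)
    head = min(extra - tail, head_avail)
    pad = extra - head - tail
    return f0 - head, f1 + tail, pad
```

When the valid range is exhausted on both sides, the function reports the shortfall as `pad_frames` rather than choosing a padding technique itself.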

[0050] The distinction between temporal range and temporal resolution, i.e., the number of video frames per second (FPS) or audio samples per second, is illustrated in FIG. 8. FIG. 8 shows two video clips 802 and 804 that have the same temporal range but different temporal resolutions. Clip 802 has twice the temporal resolution of clip 804, e.g., 50 FPS versus 25 FPS.

[0051] For some renditions, the required frame rate differs from that of the source media asset. For example, a film frame rate of 24 frames per second (FPS) may need to be rendered at a video frame rate of 25 FPS (for PAL encoding) or 29.97 FPS (for NTSC encoding). When such a conversion is performed, the editorial rendition rules determine whether the duration of the clip is to be preserved or is allowed to change. When the clip duration is to be preserved in a conversion to a higher frame rate, additional frames are obtained either by interpolation methods, e.g., 3:2 pull-down, or by adding extra content to the clip. When the clip duration is to be preserved in a conversion to a lower frame rate, the clip is scaled to fewer frames by removing frames from it, optionally using the removed frames to adjust or blend the frames that are retained. If the clip duration is allowed to vary, the number of frames in the clip content can be retained while the speed of the video is changed, i.e., slowed down or sped up somewhat. When a 24 FPS source is rendered to a 25 FPS rendition, simple tempo scaling is normally considered acceptable and produces a correspondingly slight shortening of the clip duration; when a 24 FPS source is rendered to a 29.97 FPS rendition, however, simple scaling would result in excessive speed-up. In that situation, new interpolated frames are therefore inserted to preserve tempo and duration.
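The arithmetic behind this trade-off can be made concrete with a short sketch. The function names are hypothetical; the frame rates are those given in the text.

```python
def speed_change_percent(src_fps: float, dst_fps: float) -> float:
    """Speed-up (positive) or slow-down (negative) when every source frame
    is simply played back at the destination frame rate."""
    return (dst_fps / src_fps - 1) * 100


def tempo_scaled_seconds(frame_count: int, dst_fps: float) -> float:
    """Clip duration after tempo scaling: same frames, new playback rate."""
    return frame_count / dst_fps


def frames_for_same_duration(frame_count: int, src_fps: float, dst_fps: float) -> int:
    """Frame count needed at dst_fps to keep the original duration
    (the difference must be interpolated, added, or removed)."""
    return round(frame_count * dst_fps / src_fps)
```

Tempo scaling from 24 to 25 FPS speeds playback up by roughly 4.2%, so a 10-second clip of 240 frames runs for 9.6 seconds. Going from 24 to 29.97 FPS the same way would be a speed-up of roughly 24.9%, which is why the text turns to inserting interpolated frames in that case, taking the 240 frames to about 300.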

[0052] Color rendition rules and their associated parameters direct the rendition engine to select the portion of the black-and-white dynamic range, or the portion of the multi-dimensional color range within the video encoding, that is to be mapped to the target display. For example, one editorially specified legal range may correspond to the color range that can be displayed on an SD or HD television set. Another editorially specified legal range may correspond to the wider color range that can be displayed on an HDR display.

[0053] A first set of color parameters defines dynamic range. The essence range corresponds to the range of values that the encoding of an image's pixel values can take. It is a function of the bit depth, i.e., the number of bits allocated to represent each pixel. The valid range corresponds to the range of values over which media samples exist. The range of actual encoded values in an image spans the legal range, which is typically a subset of the essence range. The legal range, which corresponds to the framing range defined above, is the range of encoded values that the editor maps to the range between the reference black and white points, often denoted 0 to 100% IRE.

[0054] FIG. 9 illustrates the dynamic range parameters on histogram plot 902 of the pixel values of a given video frame. Essence range 904 spans the entire range that can be represented by the encoding scheme in which the frame is represented. Valid range 906 spans the extent to which the essence range is populated by the pixel values of the frame, while legal range 908 indicates the range of pixel values that are mapped to the range spanning the reference black and reference white points, i.e., the IRE range from 0 to 100%. The legal range may correspond to the broadcast safe range, i.e., the range that can be represented on a standard definition (SD) or high definition (HD) television monitor, as indicated in FIG. 9 by the broadcast of legal range 908 to television monitor 910. When authoring color editorial rendition rules, the editor may specify the legal range to indicate which portion of the source dynamic range is to be shown. For example, if for a video composition the editor wishes to show subtle detail in the shadows of a frame rather than in the more brightly lit areas, the editor slides the legal range toward the lower end of the valid range. Additional legal ranges corresponding to high dynamic range monitors may also be specified. Such monitors may be able to display imagery with a dynamic range of up to 4,000% IRE or even 10,000% IRE. To specify what is to be mapped onto such displays, a correspondingly wider legal range is specified by the editor from within the valid range. The user interface of the media composition application enables the editor to specify the legal range, for example by sliding lines 912 and 914, which denote the lower and upper limits of the legal range respectively, to the left or right on pixel value histogram plot 902. A standard dynamic range corresponding to a display device standard may be selected, in which case the separation of lines 912 and 914 may be constrained to correspond to that standard, e.g., 0-100% IRE (SD, HD) or 0-4,000% IRE (HDR).
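The mapping from an editor-chosen legal range to the reference black and white points can be sketched as a simple linear map. The function name `code_to_ire`, the linear mapping, and the clamping of values outside the legal range are all assumptions of this sketch; the text specifies only which range is mapped, not the transfer curve.

```python
def code_to_ire(code: float, legal_lo: float, legal_hi: float,
                peak_ire: float = 100.0) -> float:
    """Map an encoded pixel value to percent IRE for a given legal range.

    legal_lo maps to 0% IRE (reference black) and legal_hi to peak_ire
    (100.0 for SD/HD, or e.g. 4000.0 for an HDR target). Values outside the
    legal range are clamped, which is an assumption of this sketch.
    """
    if legal_hi <= legal_lo:
        raise ValueError("legal_hi must exceed legal_lo")
    clamped = min(max(code, legal_lo), legal_hi)
    return (clamped - legal_lo) * peak_ire / (legal_hi - legal_lo)
```

Sliding the legal range toward the low end of the valid range, as in the shadow-detail example above, corresponds to lowering both `legal_lo` and `legal_hi`, so that more of the dark code values fall between 0% and 100% IRE.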

[0055] A second set of rendition-related color parameters defines the color gamut in a multi-dimensional space. As with dynamic range, the essence range corresponds to the range of values that can be represented by the encoding used to represent the image data, in this case in a multi-dimensional color space such as RGB or YCbCr. The valid range corresponds to the range of color space values for which media samples exist in the represented color image. Thus the essence range may include values that were not sampled from the original scene. For example, when a 16-bit encoding is used, not all of the possible 16-bit code words are captured by the camera.

[0056] The legal range is an editorially determined range that is a subset of the valid range and reflects the editor's choice as to which portion of the color space is to be selected for representation in a rendition when the full valid range cannot be displayed. FIG. 10 is an International Commission on Illumination (CIE) chromaticity diagram showing an example of the color space parameter selections used by color editorial rendition rules. The essence range is contained within outer envelope 1002, which spans all the color values representable by the image encoding scheme and depends on the multi-dimensional bit depth of the pixel representation. Valid range 1004 is a subset of the essence range and corresponds to the International Telecommunication Union (ITU-R) BT.2020 color gamut, and the editor-specified legal range corresponds to ITU-709 color gamut 1006. The user interface of the media composition application enables the editor to specify the legal color gamut range, for example by allowing the editor to select a standard color gamut (such as ITU-709 (1006)) or to draw the envelope of the gamut on a two-dimensional color space plot, such as the CIE chromaticity diagram of FIG. 10, or on a perspective three-dimensional color space plot. In this way, the editor can specify where color resolution is to be concentrated on an output display that cannot faithfully display the full BT.2020 color gamut.

[0057] In a manner analogous to that described above for spatial and temporal editorial rendition rules, color rendition rules and their parameters determine how the automated rendition engine generates a rendition in accordance with a rendition profile. For example, when the target color space legal range specified for a rendition occupies a smaller color space volume than the source media essence range, the editorial rendition rules determine whether to crop the essence range volume to the legal range volume or to scale down the pixel values in the source media so that they fit within the legal range volume. Conversely, when the target color space legal range occupies a larger volume than the source media essence range, the rendition rules specify whether to expand the essence range volume, using interpolated values to avoid aliasing artifacts, or whether to use the value range.
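The two options for a smaller target range, cropping versus scaling down, can be contrasted in a one-dimensional sketch. Real color volumes are multi-dimensional; the single-channel simplification, the linear scaling, and the function names are assumptions made for illustration only.

```python
def crop_to_legal(values, legal_lo, legal_hi):
    """Crop: values outside the legal range are clipped to its limits;
    values inside it are left unchanged."""
    return [min(max(v, legal_lo), legal_hi) for v in values]


def compress_to_legal(values, src_lo, src_hi, legal_lo, legal_hi):
    """Scale down: the whole source range is mapped linearly into the legal
    range, so every value moves but relative differences are kept."""
    if src_hi <= src_lo:
        raise ValueError("src_hi must exceed src_lo")
    span = legal_hi - legal_lo
    return [legal_lo + (v - src_lo) * span / (src_hi - src_lo) for v in values]
```

With source values 0, 10, 50, 90, and 100 and a legal range of 20 to 80, cropping leaves 50 untouched but collapses 0 and 10 to the same value, while compression keeps all five values distinct at the cost of shifting each of them.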

[0058] In addition to specifying spatial, temporal, and color parameters, a rendition profile may also include auxiliary parameters required to specify a rendition completely. Auxiliary parameters include the codec to be used, the bandwidth intended for streaming distribution, the file wrapper type, digital rights management information, and slates. For example, a rendition may be specified as 1080P in an MXF file wrapper with distribution rights for the US only. In general, such auxiliary rendition parameters are required for a complete specification of the rendition to be generated but are not editorial in nature, and therefore there is no need to provide an upstream editing interface for tagging media with editorially specified values for these parameters.
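One way to picture a rendition profile is as a plain record that combines the spatial, temporal, and color targets with the auxiliary parameters listed above. The class and field names below are hypothetical, as are the frame rate, gamut label, and codec string in the example; only the 1080P, MXF, and US-only values come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RenditionProfile:
    """Hypothetical record describing one deliverable."""
    # Spatial, temporal, and color targets.
    width: int
    height: int
    frames_per_second: float
    color_gamut: str
    # Auxiliary, non-editorial parameters.
    codec: str
    file_wrapper: str
    streaming_bandwidth_kbps: Optional[int] = None
    distribution_territories: Tuple[str, ...] = ()
    slate: Optional[str] = None


# The example from the text: 1080P in an MXF wrapper, US-only distribution rights.
us_hd = RenditionProfile(
    width=1920,
    height=1080,
    frames_per_second=29.97,   # illustrative value, not from the text
    color_gamut="ITU-709",     # illustrative value, not from the text
    codec="example-codec",     # placeholder; the text names no codec
    file_wrapper="MXF",
    distribution_territories=("US",),
)
```

Keeping the auxiliary fields separate from the editorial rules mirrors the point made above: they complete the specification of a deliverable without requiring any tagging by the editor.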

[0059] Distribution channels for consumption of the renditions described herein include, but are not limited to, cinema, over-the-air broadcast, cable, satellite television and radio, over-the-top programming streamed to computers, tablets, and smartphones, and live streams. Consumption environments include entertainment and educational settings, in public spaces or in private institutional and corporate settings. These environments may be located in one or more geographic regions. Each channel, environment, and geography may give rise to specific requirements for media consumption, which are encapsulated in a correspondingly tailored rendition profile.

[0060] Referring again to FIG. 1, rendition engine 112 may include custom software and, optionally, hardware logic that is part of media composition application 106. In other embodiments, the rendition engine comprises a generic logic engine that takes editorial rendition rules 110 as input and, in accordance with rendition profile 114, executes the rules to generate renditions and effects from source media assets 102 (not shown in FIG. 1). In various embodiments, the rendition engine includes artificial intelligence techniques that involve learning from the editorial rendition decisions of a population of editors for a wide range of compositional content, source media types, and rendition requirements. Deep learning methods may deploy multi-level neural networks to determine how to select rules and parameters and how to select sources that optimize the rendered output with respect to the available source media.

[0061] FIG. 11 is a high-level block diagram illustrating a system for automated media publishing. Media composition host system 1102 hosts media composition application 1104, which includes composition 1106 and rendition engine 1108. One or more editors 1110 edit the composition, which includes specifying editorial rendition rules 1112 and tagging one or more compositional objects of composition 1106 with the rules. Rendition profile 1114 is supplied to the media composition host system. When a rendition specified by rendition profile 1114 is to be generated, rendition engine 1108 accesses source media asset 1116 from storage 1118, generates output rendition 1120, and outputs it to storage 1122. In the illustrated embodiment, rendition engine 1108 is implemented as part of the media composition application and has access to the media player software of the media composition application. The media composition application can read the composition in its original, unflattened form.

[0062] In other embodiments, illustrated in FIG. 12, the rendition engine is implemented on one or more platforms external to the media composition host system. In these embodiments, the rendition engine includes functionality similar to that of the media player software used by the media composition application, so that it can read the rich composition in its original, unflattened form. Referring to FIG. 12, which shows the use of two rendition engines external to the media composition host system, media composition host system 1202 hosts media composition application 1204, which contains composition 1206 tagged with editorial rendition rules 1208. When a rendition is to be generated, rendition engine 1 1210 and rendition engine 2 1212 receive rendition profile 1214 and composition 1206 and, starting from source media asset 1218, generate rendition 1216 as specified by the rendition profile and editorial rendition rules 1208. The rendition engines output the rendition for storage in storage 1220. The external platforms on which the rendition engines are implemented may be in the same location as host 1202, e.g., connected by a local area network, or may be hosted by one or more remote servers or by a private or public cloud.

[0063] The various components of the system described herein may be implemented as a computer program using a general-purpose computer system. Such a computer system typically includes a main unit connected both to an output device that displays information to a user and to an input device that receives input from the user. The main unit generally includes a processor connected to a memory system via an interconnection mechanism. The input device and output device are also connected to the processor and memory system via the interconnection mechanism.

[0064] One or more output devices may be connected to the computer system. Examples of output devices include, but are not limited to, liquid crystal displays (LCDs), plasma displays, various stereoscopic displays including displays that require viewer glasses and glasses-free displays, cathode ray tubes, video projection systems and other video output devices, printers, devices for communicating over a low or high bandwidth network, including network interface devices and cable modems, and storage devices such as disk or tape. One or more input devices may be connected to the computer system. Examples of input devices include, but are not limited to, keyboards, keypads, trackballs, mice, pens and tablets, touchscreens, cameras, communication devices, and data input devices. The invention is not limited to the particular input or output devices used in combination with the computer system or to those described herein.

[0065] The computer system may be a general-purpose computer system that is programmable using a computer programming language, a scripting language, or even assembly language. The computer system may also be specially programmed, special-purpose hardware. In a general-purpose computer system, the processor is typically a commercially available processor. A general-purpose computer also typically has an operating system, which controls the execution of other computer programs and provides scheduling, debugging, input/output control, account management, editing, storage allocation, data management and memory management, and communication control and related services. The computer system may be connected to a local network and/or to a wide area network, such as the Internet. The connected network may transfer to and from the computer system program instructions for execution on the computer, media data such as video data, still image data, or audio data, metadata, comments and approval information for media compositions, media annotations, and other data.

[0066] A memory system typically includes a computer-readable medium. The medium may be volatile or nonvolatile, writeable or nonwriteable, and/or rewriteable or not rewriteable. A memory system typically stores data in binary form. Such data may define an application program to be executed by the microprocessor, or information stored on the disk to be processed by the application program. The invention is not limited to a particular memory system. Time-based media may be stored on and input from magnetic, optical, or solid-state drives, which may include an array of local or network-attached disks.

[0067] A system such as described herein may be implemented in software, hardware, firmware, or a combination of the three. The various elements of the system, either individually or in combination, may be implemented as one or more computer program products in which computer program instructions are stored on a non-transitory computer-readable medium for execution by a computer, or are transferred to a computer system via a connected local area or wide area network. Various steps of a process may be performed by a computer executing such computer program instructions. The computer system may be a multiprocessor computer system, may include multiple computers connected over a computer network, or may be implemented in the cloud. The components described herein may be separate modules of a computer program, or may be separate computer programs, which may be operable on separate computers. The data produced by these components may be stored in a memory system or transmitted between computer systems by means of various communication media such as carrier signals.

[0068] Having now described an example embodiment, it should be apparent to those skilled in the art that the foregoing is merely illustrative and not limiting, having been presented by way of example only. Numerous modifications and other embodiments are within the scope of one of ordinary skill in the art and are contemplated as falling within the scope of the invention.

Claims

1. A method of generating a rendition of a media composition, the method comprising:
receiving the media composition, wherein the media composition includes a composition object that references a source media asset, and the composition object is associated with an editorial rendition rule specified by an editor of the media composition using a media composition application;
receiving a rendition profile that specifies media essence encoding parameters for the rendition of the media composition; and
generating the rendition of the media composition from the source media asset in accordance with the editorial rendition rule and the media essence encoding parameters for the rendition of the media composition.
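
As an illustration only, the three steps of the method above can be sketched in Python. Every name here (`CompositionObject`, `RenditionProfile`, `generate_rendition`, the rule handlers) is hypothetical and not taken from the specification, and the sketch returns a textual render plan rather than encoded media.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All types and names below are hypothetical; the claims do not prescribe
# a data model.


@dataclass
class CompositionObject:
    source_asset: str  # reference to a source media asset
    # Names of editorial rendition rules attached by the editor.
    rendition_rules: List[str] = field(default_factory=list)


@dataclass
class MediaComposition:
    objects: List[CompositionObject]


@dataclass
class RenditionProfile:
    name: str
    encoding_params: Dict[str, object]  # media essence encoding parameters


# A handler receives an asset reference and the profile's encoding
# parameters, and returns a description of the rendering step.
RuleHandler = Callable[[str, Dict[str, object]], str]


def generate_rendition(
    composition: MediaComposition,
    profile: RenditionProfile,
    handlers: Dict[str, RuleHandler],
) -> List[str]:
    """Apply each object's editorial rules under the given profile."""
    plan = []
    for obj in composition.objects:
        for rule in obj.rendition_rules:
            plan.append(handlers[rule](obj.source_asset, profile.encoding_params))
    return plan
```

Under these assumptions, one composition can be paired with many profiles, and each pairing yields a separate plan without the editor revisiting the composition.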

2. The method of claim 1, wherein the media composition has a plurality of versions, and the rendition profile further specifies which version of the plurality of versions of the media composition is to be used to generate the rendition.

3. The method of claim 1, wherein the editorial rendition rule is a spatial editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a spatial rendition parameter.

4. The method of claim 3, wherein the spatial rendition parameter is a framing box that defines a portion of a video frame of the composition object of the media composition, and a first portion of the source media asset of the media composition that lies within the framing box is prioritized for inclusion within the rendition.

5. The method of claim 4, wherein an aspect ratio of a target display for the rendition is larger than an aspect ratio of the framing box, and a second portion of the source media asset of the media composition, outside the framing box, is also prioritized for inclusion within the rendition, the second portion of the source media asset of the media composition including media essence data.
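
The framing-box behavior recited in claims 4 and 5 can be illustrated with a short geometric sketch. It assumes that "prioritized for inclusion" means the crop always contains the framing box and then grows symmetrically into the surrounding source pixels until it matches the target aspect ratio or reaches the edge of the source frame. The `Box` type and `crop_for_target` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in source-frame pixels (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float


def crop_for_target(frame: Box, framing: Box, target_aspect: float) -> Box:
    """Return a crop that contains the framing box and matches
    target_aspect wherever the source frame is large enough.

    Assumes the framing box lies entirely inside the source frame.
    """
    w, h = framing.w, framing.h
    if target_aspect > w / h:
        # Target is wider than the framing box: add pixels left and right.
        w = min(h * target_aspect, frame.w)
    else:
        # Target is narrower or equal: add pixels above and below.
        h = min(w / target_aspect, frame.h)
    # Center the crop on the framing box, then clamp it inside the frame.
    cx = framing.x + framing.w / 2
    cy = framing.y + framing.h / 2
    x = min(max(cx - w / 2, frame.x), frame.x + frame.w - w)
    y = min(max(cy - h / 2, frame.y), frame.y + frame.h - h)
    return Box(x, y, w, h)
```

For a 1920x1080 source, a square 400x400 framing box at (600, 200), and a 2:1 target, this sketch yields an 800x400 crop at (400, 200): the framing box plus 200 pixels of source material on each side.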

6. The method of claim 1, wherein the editorial rendition rule is a temporal editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a temporal rendition parameter.

7. The method of claim 6, wherein the temporal rendition parameter includes a temporal framing range that defines a temporal range within the source media asset of the media composition, and a temporal portion of the source media asset of the media composition that lies within the temporal framing range is prioritized for inclusion within the rendition.
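
A one-dimensional analogue applies to the temporal framing range of claim 7. The sketch below is an assumption about how such a range might be honored: the selected span always covers the framing range when the target duration allows it, and any remaining duration is drawn evenly from before and after the range. The name `select_time_span` and the choice to keep the leading part of the framing range when the target is shorter are both hypothetical.

```python
from typing import Tuple


def select_time_span(
    source_len: float,
    framing: Tuple[float, float],
    target_len: float,
) -> Tuple[float, float]:
    """Pick a (start, end) span of the source, in seconds, that gives
    priority to the temporal framing range."""
    start, end = framing
    framing_len = end - start
    if target_len <= framing_len:
        # Target is too short for the whole framing range; keep its
        # leading part (an arbitrary choice made for this sketch).
        return (start, start + target_len)
    # Spread the extra duration on both sides of the framing range.
    extra = target_len - framing_len
    new_start = max(0.0, start - extra / 2)
    new_end = min(source_len, new_start + target_len)
    # If the end was clamped, shift the start back to recover duration.
    new_start = max(0.0, new_end - target_len)
    return (new_start, new_end)
```

With a 100-second source and a framing range of 40 to 60 seconds, a 30-second target selects 35 to 65 seconds under these assumptions.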

8. The method of claim 1, wherein the editorial rendition rule is a dynamic range editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a dynamic range rendition parameter.

9. The method of claim 8, wherein the source media asset of the media composition is a video asset, the dynamic range rendition parameter includes a legal dynamic range that defines a range of pixel brightness values within the source media asset of the media composition, and pixels of the source media asset of the media composition having brightness values that lie within the legal dynamic range are prioritized for inclusion within the rendition.

10. The method of claim 8, wherein the source media asset of the media composition is an audio asset, the dynamic range rendition parameter includes a legal dynamic range that defines a range of audio sample intensity values within the source media asset of the media composition, and audio samples of the source media asset of the media composition having intensity values that lie within the legal dynamic range are prioritized for inclusion within the rendition.
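
The legal dynamic range of claims 9 and 10 can be sketched as a clip-and-rescale step that works equally on pixel brightness values and audio sample intensities. This is an assumed interpretation, not the claimed implementation: values inside the legal range are mapped linearly onto the output range of the rendition, and values outside it are clipped to the nearest legal bound. The function name `fit_to_legal_range` is hypothetical.

```python
from typing import List, Sequence


def fit_to_legal_range(
    values: Sequence[float],
    legal_lo: float,
    legal_hi: float,
    out_lo: float,
    out_hi: float,
) -> List[float]:
    """Map [legal_lo, legal_hi] linearly onto [out_lo, out_hi],
    clipping any value that falls outside the legal range."""
    if legal_hi <= legal_lo:
        raise ValueError("legal range must have positive width")
    scale = (out_hi - out_lo) / (legal_hi - legal_lo)
    fitted = []
    for v in values:
        clipped = min(max(v, legal_lo), legal_hi)
        fitted.append(out_lo + (clipped - legal_lo) * scale)
    return fitted
```

A linear mapping is only the simplest choice; a production renderer would more likely apply a perceptual tone curve for video or a compressor for audio.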

11. The method of claim 1, wherein the editorial rendition rule is a color editorial rendition rule, and at least one of the media essence encoding parameters specified by the rendition profile is a color gamut rendition parameter.

12. The method of claim 11, wherein:
the source media asset of the media composition is one of a graphics asset and a video asset;
the color gamut rendition parameter includes a legal color range that defines a multidimensional region within a multidimensional color space; and
pixels of the source media asset of the media composition having color values that lie within the legal color range are prioritized for inclusion within the rendition.

13. The method of claim 1, wherein:
the editorial rendition rule is a spatial editorial rendition rule for layout of title text on a frame of the rendition;
at least one of the media essence encoding parameters is a spatial framing box;
the source media asset of the media composition is a text titling effect; and
text of the text titling effect is rendered within the spatial framing box on the frame of the rendition.

14. The method of claim 1, wherein:
the editorial rendition rule is a compositing editorial rendition rule;
at least one of the media essence encoding parameters is a spatial framing box;
the source media asset of the media composition is a compositing effect; and
a frame of the rendition is rendered with a composited image within the spatial framing box.

15. The method of claim 1, wherein the generating step includes interpreting the editorial rendition rule using generic logic for executing rules.

16. The method of claim 1, wherein the generating step includes executing logic that is learned, using deep learning methods, from editorial rendition decisions captured from a pool of editors working with a range of media content, source media types, and rendition requirements.

17. A computer program comprising:
non-transitory computer-readable storage having computer program instructions stored thereon, wherein the computer program instructions, when processed by a computing system, instruct the computing system to perform a method of generating a rendition of a media composition, the method comprising:
receiving the media composition, wherein the media composition includes a composition object that references a source media asset, and the composition object is associated with an editorial rendition rule specified by an editor of the media composition using a media composition application;
receiving a rendition profile that specifies media essence encoding parameters for the rendition of the media composition; and
generating the rendition of the media composition from the source media asset in accordance with the editorial rendition rule and the media essence encoding parameters for the rendition of the media composition.

18. A media composition rendition engine comprising:
a memory for storing computer-readable instructions; and
a processor connected to the memory, wherein the processor, when executing the computer-readable instructions, causes the media composition rendition engine to:
receive a media composition, wherein the media composition includes a composition object that references a source media asset, and the composition object is associated with an editorial rendition rule specified by an editor of the media composition using a media composition application;
receive a rendition profile that specifies media essence encoding parameters for a rendition of the media composition; and
generate the rendition of the media composition from the source media asset in accordance with the editorial rendition rule and the media essence encoding parameters for the rendition of the media composition.