JP2022028896A

JP2022028896A - Screen-relative rendering of audio and encoding and decoding of audio for such rendering

Info

Publication number: JP2022028896A
Application number: JP2021194090A
Authority: JP
Inventors: Charles Q. Robinson; Nicolas R. Tsingos; Freddie Sanchez
Original assignee: Dolby Laboratories Licensing Corp
Current assignee: Dolby Laboratories Licensing Corp
Priority date: 2013-11-14
Filing date: 2021-11-30
Publication date: 2022-02-16
Anticipated expiration: 2034-11-11
Also published as: JP6987198B2; JP2017503375A; CN105723740A; JP2019165494A; WO2015073454A3; EP3069528A2; JP2020205630A; EP3069528B1; JP6197115B2; JP2023120268A; CN105723740B; US9813837B2; JP2017225175A; US20160286333A1; JP6765476B2; JP7297036B2; WO2015073454A2; JP6688264B2

Abstract

PROBLEM TO BE SOLVED: To provide screen-relative rendering of audio, and encoding and decoding of audio for such rendering.

SOLUTION: An object processing subsystem 9 receives speaker channels, object channels, and screen-related metadata of an audio program delivered from a decoder 7. The object processing subsystem 9 performs warping on the object channels and outputs the resulting object channels and/or mixes to a rendering subsystem 11. The warping is determined by a warping degree parameter indicated by the screen-related metadata and/or a warping degree parameter provided by a controller 10. The rendering subsystem 11 maps the object channels and/or mixes to the available speaker channels.

SELECTED DRAWING: Figure 3

Description

Cross-Reference to Related Applications
This application claims the benefit of priority of US Provisional Patent Application No. 61/904,233, filed on November 14, 2013, the content of which is incorporated herein by reference in its entirety.

Technical Field of the Invention
The present invention relates to the encoding, decoding, and rendering of audio programs having corresponding video content (e.g., soundtracks of movies or other audiovisual programs). In some embodiments, the program is an object-based audio program that includes at least one audio object channel, screen-related metadata, and typically also speaker channels. The screen-related metadata supports screen-relative rendering, in which sound sources indicated by the program (e.g., objects indicated by object channels) are rendered at locations, determined at least in part by the screen-related metadata, relative to the display screen of the playback system.

Embodiments of the invention relate to one or more aspects of an audio content generation and delivery pipeline (e.g., a pipeline for generating and delivering the audio content of an audiovisual program).

Such a pipeline implements the generation of an audio program (typically an encoded audio program indicative of audio content and of metadata corresponding to the audio content). Generation of the audio program may include audio production activities (capture and recording of audio) and optionally also "post-production" activities (manipulation of the recorded audio). Live broadcast necessarily requires that all authoring decisions be made during audio production. In the generation of movies and other non-real-time programs, many authoring decisions may be made during post-production.

The audio content generation and delivery pipeline optionally implements remixing and/or remastering of the program. In some cases, a program may require additional processing after content generation in order to repurpose the content for an alternative use case. For example, a program originally generated for playback in a movie theater may be modified (e.g., remixed) to be more suitable for playback in a home environment.

The audio content generation and delivery pipeline typically includes an encoding stage. An audio program may require encoding to enable delivery. For example, a program intended for playback in the home is typically data-compressed to allow more efficient delivery. The encoding process may include steps of reducing the complexity of the spatial audio scene, and/or reducing the data rate of individual audio streams of the program, and/or packaging multiple channels of audio content (e.g., compressed audio content) and corresponding metadata into a bitstream having a desired format.

The audio content generation and delivery pipeline includes decoding and rendering stages (typically implemented by a playback system that includes a decoder). Ultimately, the program is presented to the end consumer by rendering the audio description into loudspeaker signals based on the playback equipment and environment.

Typical embodiments of the invention allow an audio program (e.g., the soundtrack of a movie or of another program having audio and image content) to be played back such that the locations of auditory images are presented reliably, in a manner consistent with the locations of the corresponding visual images.

Traditionally, in a cinema mixing room (or other audiovisual program authoring environment), the position and size of the display screen (referred to herein as the "reference" screen, to distinguish it from an audiovisual program playback screen) coincide with the front wall of the mixing environment, and the left and right edges of the reference screen coincide with the positions of the left and right main screen loudspeakers. An additional center screen channel is generally located at the center of the reference screen/wall. Thus, the extent of the front wall, the front loudspeaker positions, and the screen position are consistently co-located. Typically, the reference screen is about as wide as the room, and the left, center, and right loudspeakers are near the left edge, center, and right edge of the reference screen. This arrangement is similar to the typical arrangement of the screen and front speakers in the expected movie theater playback venue. For example, FIG. 1 is a diagram of the front wall of such a movie theater, with a display screen S, left and right front speakers (L and R), and a front center speaker (C) mounted on (or near) the front wall. During playback of a movie, a visual image B is displayed on screen S while an accompanying sound "A" is emitted from the speakers of the playback system (including speakers L, R, and C). For example, image B may be an image of a sound source (e.g., a bird or a helicopter), and sound "A" may be a sound intended to be perceived as emitting from that sound source. The movie is assumed to have been authored and rendered so that, when the front speakers are positioned coplanar with screen S, with the left front and right front speakers (L and R) at the left and right edges of screen S and the center front speaker near the center of screen S, sound A is perceived as emitting from a source location that coincides (or nearly coincides) with the location on screen S at which image B is displayed. FIG. 1 assumes that screen S is at least substantially acoustically transparent and that speakers L, C, and R are installed behind screen S (but at least substantially in the plane of screen S).

However, during playback in a consumer's home (or by a mobile user's portable playback device), the sizes and positions of the front speakers (or headset speakers) of the playback system, relative to each other and relative to the display screen of the playback system, need not match those of the front speakers and display screen of the program authoring environment (e.g., a cinema mixing room). In such playback cases, the width of the playback screen is typically significantly less than the distance separating the left and right main speakers (the left and right front speakers, or the speakers of a headset, e.g., a pair of headphones). The screen may not be centered with respect to the main speakers, or may not even be at a fixed position (e.g., in the case of a mobile user wearing headphones and holding a display device). This can create noticeable discrepancies between the perceived audio and the visuals.

For example, FIG. 2 is a diagram of the front wall (W') of a room, with the display screen (S'), left and right front speakers (L' and R'), and front center speaker (C') of a home theater system mounted on (or near) the front wall. During playback (by the system of FIG. 2) of the same movie described in the example of FIG. 1, visual image B is displayed on screen S' while the accompanying sound A is emitted from the speakers of the playback system (including speakers L', R', and C'). The movie was assumed to have been authored for rendering and playback (by a movie theater playback system) such that sound A is perceived as emitting from a source location that coincides (or nearly coincides) with the location on the movie theater screen at which image B is displayed. However, when the movie is played by the home theater system of FIG. 2, sound A will be perceived as emitting from a source location near the left front speaker L'. That source location neither coincides nor nearly coincides with the location on the home theater screen S' at which image B is displayed. This is because the front speakers L', C', and R' of the home theater system have sizes and positions relative to screen S' that differ from those that the front speakers of the program authoring system have relative to the reference screen of the program authoring system.

In the examples of FIGS. 1 and 2, the expected cinema playback system is assumed to have a well-defined relationship between its loudspeakers and its screen. Thus, the content creator's desired relative locations of displayed images and corresponding audio sources can be reproduced reliably (during playback in the cinema). For playback in other environments (e.g., in a home audio-video room), the assumed relationship between loudspeakers and screen is typically not preserved, and so the relative locations (desired by the content creator) of displayed images and corresponding audio sources are typically not well reproduced. The relative locations of displayed images and corresponding audio sources actually achieved during playback (other than in a cinema having the assumed relationship between loudspeakers and screen) depend on the actual relative positions and sizes of the loudspeakers and display screen of the playback system.

During playback of an audiovisual program, for sounds rendered to be perceived at on-screen locations, the optimal auditory image location is independent of listener position. For sounds rendered to be perceived at off-screen locations (at a non-zero distance in the direction perpendicular to the plane of the screen), there is a potential for parallax errors in the auditorily perceived location of the sound source, depending on listener position. Methods have been proposed that attempt to minimize or eliminate such parallax errors based on a known or assumed listener position.

It is known to employ high-end playback systems (e.g., in movie theaters) to render object-based audio programs (e.g., object-based programs indicative of movie soundtracks). For example, an object-based audio program that is a movie soundtrack may be indicative of many different sound elements (audio objects) corresponding to images on a screen, dialog, noises, and sound effects that emanate from different places on (or relative to) the screen, as well as background music and ambient effects (which may be indicated by speaker channels of the program), to create the intended overall auditory experience. Accurate playback of such a program requires that sounds be reproduced in a way that corresponds as closely as possible to what the content creator intended with respect to audio object size, position, intensity, movement, and depth.

Object-based audio programs represent a significant improvement over traditional speaker channel-based audio programs, because speaker channel-based audio is more limited than object channel-based audio with respect to the spatial playback of specific audio objects. The audio channels of a speaker channel-based audio program consist of speaker channels only (no object channels), and each speaker channel typically determines a speaker feed for a specific, individual speaker in the listening environment.

Various methods and systems have been proposed for generating and rendering object-based audio programs. During generation of an object-based audio program, it is typically assumed that an arbitrary number of loudspeakers will be employed for playback of the program, and that the loudspeakers employed for playback (typically in a movie theater) will be located at arbitrary positions in the playback environment, not necessarily in a (nominally) horizontal plane or in any other predetermined arrangement known at the time of program generation. Typically, object-related metadata included in the program indicates rendering parameters for rendering at least one object of the program at an apparent spatial location or along a trajectory (in a three-dimensional volume), e.g., using a three-dimensional array of speakers. For example, an object channel of the program may have corresponding metadata indicating a three-dimensional trajectory of apparent spatial locations at which the object (indicated by the object channel) is to be rendered. The trajectory may include a sequence of "floor" locations (in the plane of a subset of speakers assumed to be located on the floor of the playback environment, or in another horizontal plane) and a sequence of "above-floor" locations (each determined by driving a subset of speakers assumed to be located in at least one other horizontal plane of the playback environment). Examples of the rendering of object-based audio programs are described, for example, in Patent Document 1, which is assigned to the assignee of the present application.
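The trajectory metadata described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a trajectory is stored as time-stamped (x, y, z) keyframes and that intermediate locations are obtained by linear interpolation; the function name `position_at` and the keyframe layout are hypothetical and are not taken from this publication.

```python
from bisect import bisect_right

def position_at(trajectory, t):
    """Return the apparent (x, y, z) location of an object at time t.

    trajectory: list of (time, (x, y, z)) keyframes sorted by strictly
    increasing time (a hypothetical layout for object-related metadata).
    Locations between keyframes are linearly interpolated; times outside
    the keyframe range clamp to the first or last keyframe.
    """
    times = [time for time, _ in trajectory]
    i = bisect_right(times, t)
    if i == 0:
        return trajectory[0][1]
    if i == len(trajectory):
        return trajectory[-1][1]
    (t0, p0), (t1, p1) = trajectory[i - 1], trajectory[i]
    a = (t - t0) / (t1 - t0)
    return tuple(u + a * (v - u) for u, v in zip(p0, p1))

# A "floor" keyframe (z = 0.0) followed by an "above-floor" keyframe (z = 0.5).
trajectory = [(0.0, (0.0, 0.0, 0.0)), (2.0, (1.0, 0.0, 0.5))]
midpoint = position_at(trajectory, 1.0)
```

A renderer would then turn each interpolated location into speaker feeds for whatever loudspeaker arrangement is actually present; that step is outside this sketch.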

The advent of object-based audio program rendering has significantly increased the amount of audio data processed and the complexity of the rendering that must be performed by rendering systems. This is in part because an object-based audio program may be indicative of many objects (each with corresponding metadata) and may be rendered for playback by a system that includes many loudspeakers. It has been proposed to limit the number of object channels included in an object-based audio program so that the intended rendering system is capable of rendering the program. For example, US Provisional Patent Application No. 61/745,401, entitled "Scene Simplification and Object Clustering for Rendering Object-Based Audio Content," filed on December 21, 2012, naming Brett Crockett, Alan Seefeldt, Nicolas Tsingos, Rhonda Wilson, and Jeroen Breebaart as inventors, and assigned to the assignee of the present invention, describes methods and apparatus for so limiting the number of object channels of an object-based audio program, by clustering input object channels to generate clustered object channels that are included in the program, and/or by mixing the audio content of input object channels with speaker channels to generate mixed speaker channels that are included in the program. It is contemplated that some embodiments of the present invention may be performed in conjunction with such clustering (e.g., in a mixing or remixing facility), to generate an object-based program for delivery (together with screen-related metadata) to a playback system, or for use in generating a speaker channel-based program for delivery to a playback system.

Patent Document 1: PCT International Application No. PCT/US2011/028783, International Publication No. 2011/119041A2, published September 29, 2011

Throughout this disclosure, "warping" of at least one channel (e.g., an object channel or speaker channel) of an audio program assumes that the program has corresponding video content (e.g., the program may be the soundtrack of a movie or other audiovisual program), and denotes processing the audio content (audio data) of each such channel to generate warped audio content (or replacing each such channel with at least one other audio channel indicative of warped audio content). When the warped audio content is rendered to generate speaker feeds, and the speaker feeds are used to drive playback speakers, the sound emitted from the speakers is indicative of at least one audio element (which the content creator intended to be perceived at at least one predetermined location relative to a reference screen, e.g., a movie theater screen) having a perceived warped location (which may be fixed or may vary with time). A warped location is "warped" in the sense that it is a predetermined location relative to the display screen of the playback system (rather than relative to the reference screen assumed by the content creator). Typically, each warped location is determined (at least in part) relative to the display screen of the playback system (sometimes referred to as the "playback screen") by metadata (referred to herein as "screen-related" metadata) provided with the audio program (e.g., included in the audio program). Each warped location may be determined by the screen-related metadata together with other data indicative of the playback system configuration (e.g., data indicative of the positions, or positions and sizes, of the speakers and display screen of the playback system, and/or of the relationship(s) between those sizes and/or positions). The warped location(s) may, but need not, coincide with the actual playback screen. Some embodiments of the invention allow smooth transitions between warped locations that vary during playback and that are on-screen and/or off-screen (relative to the playback screen).

Herein, the expression "off-screen warping" of at least one channel of a program denotes "warping" of the at least one channel of a type in which the warped location of at least one corresponding audio element (determined by the audio content of the at least one channel) is at a non-zero depth relative to the playback screen (i.e., has a non-zero distance from the playback screen in a direction at least substantially perpendicular to the plane of the playback screen).

In a first class of embodiments, the invention is a method for rendering an audio program (e.g., an object-based audio program), including the steps of: (a) determining at least one warping degree parameter (e.g., by parsing the program to identify at least one warping degree parameter indicated by screen-related metadata of the program, or by configuring the playback system to perform the rendering, including by specifying at least one warping degree parameter to the playback system); and (b) performing warping on the audio content of at least one channel of the program to a degree determined at least in part by the warping degree parameter corresponding to that channel, where each warping degree parameter is indicative of (e.g., is a non-binary value indicative of) a maximum degree of warping to be performed by the playback system on the corresponding audio content of the program. In some embodiments in the first class, step (a) includes a step of determining at least one off-screen warping parameter (e.g., by parsing the program to identify at least one off-screen warping parameter indicated by screen-related metadata of the program), where the off-screen warping parameter is indicative of at least one characteristic of off-screen warping to be performed by the playback system on the corresponding audio content of the program, and the warping performed in step (b) includes off-screen warping determined at least in part by at least one such off-screen warping parameter. For example, an off-screen warping parameter may control the manner, degree, or maximum amount of warping of the warped location of an audio element (in a direction at least substantially parallel to the plane of the playback screen) as a function of depth (distance from the playback screen in a direction at least substantially perpendicular to the plane of the playback screen). In some embodiments, the warping degree parameter determined in step (a) is indicative of a maximum degree of warping to be performed on the corresponding audio content of the program in a plane at least substantially parallel to the plane of the playback screen (at some depth, measured at least substantially perpendicular to the playback screen), and is thus an off-screen warping parameter. In other embodiments, step (a) includes determining at least one warping degree parameter and at least one off-screen warping parameter that is not a warping degree parameter. In some embodiments, the program is indicative of at least two objects, step (a) includes a step of independently determining at least one warping degree parameter for each of at least two of the objects, and step (b) includes a step of independently performing warping on the audio content indicative of each of those objects, to a degree determined at least in part by the at least one warping degree parameter corresponding to that object.
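Steps (a) and (b), including off-screen warping controlled as a function of depth, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of the publication: positions are normalized so that 0.0 and 1.0 are the left and right edges of the reference screen (taken to span the room width), depth runs from 0.0 at the screen plane to 1.0 at the back of the room, full warping is a linear remapping onto the playback screen, and the off-screen parameter is a hypothetical `offscreen_falloff` exponent that reduces the warping as depth increases.

```python
from dataclasses import dataclass

@dataclass
class ScreenMetadata:
    """Hypothetical container for parsed screen-related metadata."""
    warp_degree: float        # maximum degree of warping, 0.0 (none) to 1.0 (full)
    offscreen_falloff: float  # assumed off-screen warping parameter (exponent)

def effective_degree(meta, depth):
    """Step (a): derive the degree of warping to apply at a given depth.

    The degree never exceeds meta.warp_degree and, for a positive
    falloff exponent, shrinks to zero at the back of the room.
    """
    depth = min(max(depth, 0.0), 1.0)
    return meta.warp_degree * (1.0 - depth) ** meta.offscreen_falloff

def render_position(x, depth, meta, screen_left, screen_right):
    """Step (b): warp the horizontal position x; depth is left unchanged."""
    degree = effective_degree(meta, depth)
    fully_warped = screen_left + x * (screen_right - screen_left)
    return (1.0 - degree) * x + degree * fully_warped, depth

meta = ScreenMetadata(warp_degree=1.0, offscreen_falloff=1.0)
on_screen = render_position(1.0, 0.0, meta, 0.25, 0.75)   # warped onto the screen edge
mid_room = render_position(1.0, 0.5, meta, 0.25, 0.75)    # partially warped
back_wall = render_position(1.0, 1.0, meta, 0.25, 0.75)   # left unwarped
```

The linear blend and the power-law falloff are design choices made for brevity; any monotonic function of depth bounded by the warping degree parameter would fit the description equally well.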

In a second class of embodiments, the invention is a method of generating (or decoding) an object-based audio program. The method includes steps of determining at least one warping degree parameter for at least one audio object, and including in the program an object channel (indicative of the object) and screen-related metadata indicative of each said warping degree parameter for the object. Each said warping degree parameter is indicative of a maximum degree of warping to be performed by a playback system on the corresponding object (e.g., in a plane parallel to the plane of the playback screen); for example, it is a non-binary value (e.g., a scalar value having any of many values in a predetermined range) indicative of the maximum degree. For example, the warping degree parameter may be a floating point value in a range from a minimum value (indicating that no warping should be performed) to a maximum value indicating that full warping should be performed (e.g., to warp an audio element position, defined by the program to be at the right edge of a reference screen, to a warped position at the right edge of the playback screen). The range includes at least one intermediate value (greater than the minimum value but less than the maximum value) indicating that an intermediate degree of warping (e.g., 50% of full warping) should be performed (e.g., to warp an audio element position, defined by the program to be at the right edge of the reference screen, to a warped position midway between the right edge of the playback room and the right edge of the playback screen). In this context, full warping may denote warping of the perceived positions of audio elements in the plane of the playback screen such that the warped positions coincide with the playback screen, and an intermediate degree of warping (i.e., less than full warping) may denote warping of the perceived positions of audio elements in the plane of the playback screen such that the warped positions coincide with an area larger than (and including) the playback screen.
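The role of a scalar warping degree parameter can be illustrated with a minimal sketch. It assumes, purely for illustration, that positions are normalized so that 0 is the left wall and 1 is the right wall of the playback room, that the reference screen spans the full width, and that intermediate degrees blend linearly between the unwarped and fully warped positions. The function name and the linear blend are assumptions, not taken from the text above.

```python
def warp_position(x: float, screen_left: float, screen_right: float,
                  degree: float) -> float:
    """Warp a normalized horizontal position x (0 = left wall, 1 = right wall).

    Assumption: the reference screen spans [0, 1] and the playback screen
    spans [screen_left, screen_right] in the same room coordinates.
    degree = 0 means no warping, degree = 1 means full warping.
    """
    if not 0.0 <= degree <= 1.0:
        raise ValueError("degree must lie in [0, 1]")
    # Full warping: map the reference-screen span onto the playback screen.
    full = screen_left + x * (screen_right - screen_left)
    # Intermediate degrees blend linearly between unwarped and fully warped.
    return (1.0 - degree) * x + degree * full
```

With a playback screen spanning 0.25 to 0.75, an element at the right edge of the reference screen (x = 1.0) stays at the room edge for degree 0, lands on the screen edge for degree 1, and falls midway between the two for degree 0.5.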

In some embodiments in the second class, the screen-related metadata is indicative of at least one said warping degree parameter for each of at least two objects of the program, and each said warping degree parameter is indicative of a maximum degree of warping to be performed on the corresponding object. For example, the warping degree parameters may indicate a different maximum degree of warping, in or parallel to the plane of the playback screen, for each object indicated by a different object channel. As another example, the warping degree parameters may indicate, for each object indicated by a different object channel, a different maximum degree of warping in the vertical direction and a different maximum degree of warping in the horizontal direction, each in or parallel to the plane of the playback screen.
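As a hedged sketch of per-object metadata, the following example carries separate horizontal and vertical warping degrees for each object and applies them independently. The class names, field names, normalized coordinates, and the linear blend are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Screen:
    """Playback screen extent in normalized room coordinates (assumed)."""
    left: float
    right: float
    bottom: float
    top: float


@dataclass(frozen=True)
class ObjectWarpMetadata:
    """Hypothetical per-object screen-related metadata."""
    degree_x: float  # maximum horizontal warping degree, 0..1
    degree_z: float  # maximum vertical warping degree, 0..1


def _blend(p: float, lo: float, hi: float, degree: float) -> float:
    # Linear blend between the unwarped position and the fully warped one.
    return (1.0 - degree) * p + degree * (lo + p * (hi - lo))


def warp_object(x: float, z: float, meta: ObjectWarpMetadata,
                screen: Screen) -> tuple:
    """Warp (x, z) independently along each axis using the object's degrees."""
    return (_blend(x, screen.left, screen.right, meta.degree_x),
            _blend(z, screen.bottom, screen.top, meta.degree_z))
```

For instance, a dialog object could carry degrees of 1.0 on both axes so that it follows the screen, while a music object carries 0.0 and is left where the program placed it.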

In some embodiments in the second class, the screen-related metadata is also indicative of at least one off-screen warping parameter, which is indicative of at least one characteristic of off-screen warping to be performed by a playback system on corresponding audio content of the program (e.g., it indicates the manner in which and/or the degree to which warping is performed in planes at least substantially parallel to the plane of the playback screen, as a function of the distance of each such plane from the playback screen, measured at least substantially perpendicular to the plane of the playback screen). In some such embodiments, the screen-related metadata is indicative of one said off-screen warping parameter for each of at least two objects indicated by the program, and each said off-screen warping parameter is indicative of at least one characteristic of the off-screen warping to be performed on the corresponding object. For example, the program may include an off-screen warping parameter for each object indicated by a different object channel, indicating the type of off-screen warping to be performed on the corresponding object (i.e., the metadata may specify a different type of off-screen warping for the object or objects corresponding to each object channel). In some embodiments, at least one off-screen warping parameter is indicative of a maximum degree of warping to be performed on corresponding audio content of the program in a plane at least substantially parallel to the plane of the playback screen (at a depth measured at least substantially perpendicular to the playback screen), so that the off-screen warping parameter is itself a warping degree parameter.
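One way to picture off-screen warping as a function of depth is the sketch below. It assumes that depth y is normalized (0 at the screen plane, 1 at the rear of the room) and that the amount of warping falls off as a power of y controlled by a single off-screen warping parameter. The power-law form and the name `exponent` are hypothetical choices made for illustration only.

```python
def offscreen_warp_x(x: float, y: float, screen_left: float,
                     screen_right: float, exponent: float) -> float:
    """Warp horizontal position x with a strength that depends on depth y.

    Assumptions: x and y are normalized to [0, 1]; y = 0 is the plane of the
    playback screen and y = 1 is the rear of the room. `exponent` is a
    hypothetical off-screen warping parameter shaping the falloff.
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie in [0, 1]")
    if exponent <= 0.0:
        raise ValueError("exponent must be positive")
    # Fully warped (on-screen) position for this x.
    on_screen = screen_left + x * (screen_right - screen_left)
    # fade is 0 at the screen (full warping) and 1 at the rear (no warping).
    fade = y ** exponent
    return fade * x + (1.0 - fade) * on_screen
```

Under these assumptions, a larger exponent keeps positions warped toward the screen further into the room, which gives a smooth transition as an object moves from on-screen to off-screen.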

In a third class of embodiments, the invention is a method including steps of:
(a) generating an object-based audio program; and
(b) in response to the object-based audio program, generating a speaker channel-based program including at least one set of speaker channels intended for playback by loudspeakers positioned at predetermined locations relative to a playback screen, wherein generation of the set of speaker channels includes a step of warping audio content of the object-based audio program to a degree determined at least in part by at least one warping degree parameter, and wherein each said warping degree parameter is indicative of a maximum degree of warping to be performed by a playback system on corresponding audio content of the object-based audio program (e.g., in a plane parallel to the plane of the playback screen); for example, it is a non-binary value (e.g., a scalar value having any of many values in a predetermined range) indicative of the maximum degree.

In some embodiments in the third class, step (b) includes a step of generating the speaker channel-based audio program to include two or more selectable sets of speaker channels. At least one of the sets is indicative of unwarped audio content of the object-based audio program, and generation of at least one other of the sets includes a step of warping audio content of the object-based audio program (using the warping degree parameter), said other set being intended for playback by loudspeakers positioned at predetermined locations relative to the playback screen. In some embodiments in the third class, step (b) includes a step of determining at least one off-screen warping parameter (e.g., by parsing the object-based program to identify at least one said off-screen warping parameter indicated by screen-related metadata of the object-based program), where the off-screen warping parameter is indicative of at least one characteristic of off-screen warping by a playback system of corresponding audio content of the object-based audio program, and the warping performed in step (b) includes off-screen warping determined at least in part by at least one said off-screen warping parameter.

In some embodiments in the third class, the object-based audio program includes screen-related metadata indicative of at least one said warping degree parameter (or of at least one said warping degree parameter and at least one off-screen warping parameter), and step (b) includes a step of parsing the object-based audio program to identify the at least one said warping degree parameter (or the at least one said warping degree parameter and the off-screen warping parameter).

Generation of the speaker channel-based program (in accordance with the third class of embodiments) supports screen-relative rendering by a playback system that is not configured to decode and render an object-based audio program (but is capable of decoding and rendering a speaker channel-based program). Typically, the speaker channel-based program is generated by a remixing system that has knowledge of (or assumes) a specific playback system speaker and screen configuration. Typically, the object-based program (in response to which the speaker channel-based program is generated) includes screen-related metadata that supports screen-relative rendering of the object-based program by a suitably configured playback system (one capable of decoding and rendering object-based programs).

In a fourth class of embodiments, the invention is a method of rendering a speaker channel-based program that includes at least one set of speaker channels indicative of warped content, where the speaker channel-based program has been generated by processing an object-based audio program. The processing includes warping audio content of the object-based audio program, to a degree determined at least in part by at least one warping degree parameter, to generate the set of speaker channels indicative of warped content. Each said warping degree parameter is indicative of a maximum degree of warping to be performed by a playback system on corresponding audio content of the object-based audio program (e.g., in a plane parallel to the plane of the playback screen); for example, it is a non-binary value (e.g., a scalar value having any of many values in a predetermined range) indicative of the maximum degree. The rendering method includes steps of:
(a) parsing the speaker channel-based program to identify speaker channels of the speaker channel-based program, including each said set of speaker channels indicative of warped content; and
(b) generating speaker feeds for driving loudspeakers positioned at predetermined locations relative to a playback screen, in response to at least some of the speaker channels of the speaker channel-based program, including at least one said set of speaker channels indicative of warped content.

In some embodiments in the fourth class, the speaker channel-based program has been generated by processing the object-based audio program, including by performing off-screen warping of audio content of the object-based audio program, to a degree determined at least in part by the at least one warping degree parameter, using at least one off-screen warping parameter indicative of at least one characteristic of off-screen warping of corresponding audio content of the object-based program.

In some embodiments in the fourth class, the speaker channel-based audio program includes two or more selectable sets of speaker channels. At least one of the sets is indicative of unwarped audio content of the object-based audio program, another of the sets is one said set of speaker channels indicative of warped content, and step (b) includes a step of selecting the one of the sets that is a said set of speaker channels indicative of warped content.
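The selection among selectable sets of speaker channels can be sketched as follows. The container type, the `warped` flag, and the lookup function are hypothetical; the text above does not specify how sets are labeled or stored in the program.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeakerChannelSet:
    """One selectable set of speaker channels (hypothetical representation)."""
    warped: bool                       # True if the set carries warped content
    channels: Dict[str, List[float]]   # speaker name -> audio samples


def select_set(sets: List[SpeakerChannelSet],
               want_warped: bool) -> SpeakerChannelSet:
    """Return the first set whose warped flag matches the request."""
    for candidate in sets:
        if candidate.warped == want_warped:
            return candidate
    raise LookupError("no speaker channel set matches the request")
```

A playback system with a screen would request the warped set, while an audio-only system could fall back to the unwarped set.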

In some embodiments, the inventive method includes steps of generating (e.g., in an encoder), decoding (e.g., in a decoder), and/or rendering an object-based audio program that includes screen-related metadata. The object-based program has corresponding video content (e.g., it may be the soundtrack of a movie or other audiovisual program) and includes at least one audio object channel, screen-related metadata, and typically also speaker channels. The screen-related metadata includes metadata corresponding to each of at least one of the object channels (and optionally also metadata corresponding to each of at least one of the speaker channels). During rendering and playback of the object-based program, processing of the screen-related metadata (typically together with data indicative of the relationship or relationships between the speakers and the screen of the playback system) allows dynamic warping of the perceived positions of on-screen audio elements (e.g., audio elements intended by the content creator to be perceived at predetermined locations on a movie screen during playback in a movie theater), so that the warped positions have a predetermined size and location relative to the actual size and location of the display screen of the playback system. The warped positions need not coincide with the actual display screen of the playback system, and typical embodiments of the invention allow smooth transitions between on-screen and off-screen perceived positions of audio elements whose positions change during playback of the program.

In some embodiments, an object-based audio program is generated, decoded, and/or rendered. The program includes at least one audio object channel and optionally also at least one speaker channel (e.g., a set or "bed" of speaker channels). Each object channel is indicative of an audio object or a set of audio objects (e.g., a mix or cluster), and at least one object channel has (e.g., includes) corresponding screen-related metadata. The bed of speaker channels may be a conventional mix (e.g., a 5.1 channel mix) of speaker channels of a type that might be included in a conventional speaker channel-based broadcast program that does not include an object channel. The method may include a step of encoding audio data indicative of each said object channel (and optionally also the set of speaker channels) to generate the object-based audio program. In response to an object-based audio program generated by typical embodiments in this class, the rendering step may generate speaker feeds indicative of a mix of the audio content of each speaker channel and each object channel.
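The structure of such a program and the final mixing step can be sketched as below. This is an illustrative data model only: the class names, the per-speaker `gains` (assumed to have been computed by some renderer from the object's position and metadata), and the assumption that all channels have the same length are not taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectChannel:
    samples: List[float]
    # Hypothetical per-speaker gains, assumed to come from a renderer.
    gains: Dict[str, float]
    # Hypothetical screen-related metadata, e.g. {"warp_degree": 0.5}.
    screen_metadata: Dict[str, float] = field(default_factory=dict)


@dataclass
class ObjectBasedProgram:
    bed: Dict[str, List[float]]   # speaker name -> bed channel samples
    objects: List[ObjectChannel]


def render_feeds(program: ObjectBasedProgram) -> Dict[str, List[float]]:
    """Mix bed channels and object channels into one feed per speaker.

    Assumes every channel in the program has the same number of samples.
    """
    feeds = {name: list(samples) for name, samples in program.bed.items()}
    for obj in program.objects:
        for name, gain in obj.gains.items():
            feed = feeds.setdefault(name, [0.0] * len(obj.samples))
            for i, sample in enumerate(obj.samples):
                feed[i] += gain * sample
    return feeds
```

Each resulting feed is the sum of the bed channel for that speaker and the gain-weighted contribution of every object channel.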

Aspects of the invention include a system or device configured (e.g., programmed) to implement any embodiment of the inventive method, and a computer readable medium (e.g., a disc) that stores code (e.g., in a non-transitory manner) for implementing any embodiment of the inventive method or steps thereof. For example, the inventive system can be or include a programmable general purpose processor, digital signal processor, or microprocessor, programmed with software or firmware and/or otherwise configured to perform any of a variety of operations on data, including an embodiment of the inventive method or steps thereof. Such a general purpose processor may be or include a computer system including an input device, a memory, and processing circuitry programmed (and/or otherwise configured) to perform an embodiment of the inventive method (or steps thereof) in response to data asserted thereto.

In a class of embodiments, the invention is a system configured to generate an object-based audio program indicative of at least one audio object channel (typically a set of object channels) and at least one speaker channel (typically a set of speaker channels). Each audio object channel is indicative of an object or a set of objects (e.g., a mix or cluster), and typically includes corresponding object-related metadata. The set of speaker channels may be a conventional mix (e.g., a 5.1 channel mix) of speaker channels of a type that might be included in a conventional speaker channel-based broadcast program that does not include an object channel. In response to an object-based audio program generated by typical embodiments of the system, a spatial rendering subsystem may generate speaker feeds indicative of a mix of the audio content of the speaker channels and each object channel.

In a class of embodiments, the invention is an audio processing unit (APU) including a buffer memory (buffer) that stores (e.g., in a non-transitory manner) at least one frame or other segment (including audio content) of an audio program generated by any embodiment of the inventive method. When the program is an object-based audio program, the stored segment typically includes audio content of a bed of speaker channels and of object channels, and corresponding screen-related metadata. In another class of embodiments, the invention is an APU including a buffer memory (buffer) that stores (e.g., in a non-transitory manner) at least one frame or other segment of a speaker channel-based audio program, where the segment includes audio content of at least one set of speaker channels generated as a result of warping audio content of an object-based audio program in accordance with an embodiment of the invention. The segment may include audio content of at least two selectable sets of speaker channels of the speaker channel-based program, where at least one of the sets is generated as a result of warping in accordance with an embodiment of the invention.

Typical embodiments of the invention are configured to implement real-time generation of an encoded, object-based audio bitstream for transmission (or delivery in another manner) to an external rendering system (e.g., device).

A diagram of the front wall (W) of a movie theater, with a display screen (S), left and right front speakers (L and R), and a front center speaker (C) mounted on (or near) the front wall.

A diagram of the front wall (W') of a room, with the display screen (S'), left and right front speakers (L' and R'), and front center speaker (C') of a home theater system mounted on (or near) the front wall.

A block diagram of an embodiment of a system configured to perform an embodiment of the inventive method.

A diagram of a playback environment including the display screen (playback screen S') and speakers (L', C', R', Ls, and Rs) of a playback system.

A diagram of the playback environment of FIG. 4, showing an embodiment in which the parameter EXP has a value different from that in the embodiment described with reference to FIG. 4.

A diagram of the playback environment of FIG. 4, showing an embodiment in which the parameter EXP has a value different from that in the embodiments described with reference to FIGS. 4 and 4A.

A block diagram of elements of a system configured to perform another embodiment of the invention.

<Notation and nomenclature>
Throughout this disclosure, including in the claims, the expression performing an operation "on" a signal or data (e.g., filtering, scaling, transforming, or applying gain to the signal or data) is used in a broad sense to denote performing the operation directly on the signal or data, or on a processed version of the signal or data (e.g., on a version of the signal that has undergone preliminary filtering or pre-processing prior to performance of the operation).

Throughout this disclosure, including in the claims, the expression "system" is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements a decoder may be referred to as a decoder system, and a system including such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X − M inputs are received from an external source) may also be referred to as a decoder system.

Throughout this disclosure, including in the claims, the term "processor" is used in a broad sense to denote a system or device programmable or otherwise configurable (e.g., with software or firmware) to perform operations on data (e.g., audio, video, or other image data). Examples of processors include a field-programmable gate array (or other configurable integrated circuit or chip set), a digital signal processor programmed and/or otherwise configured to perform pipelined processing on audio or other sound data, a programmable general purpose processor or computer, and a programmable microprocessor chip or chip set.

Throughout this disclosure, including in the claims, the expressions "audio processor" and "audio processing unit" are used interchangeably, and in a broad sense, to denote a system configured to process audio data. Examples of audio processing units include, but are not limited to, encoders (e.g., transcoders), decoders, codecs, pre-processing systems, post-processing systems, and bitstream processing systems (sometimes referred to as bitstream processing tools).

Throughout this disclosure, including in the claims, the expression "metadata" (e.g., as in the expression "screen-related metadata") refers to data that is separate and different from the corresponding audio data (the audio content of a bitstream that also includes the metadata). Metadata is associated with audio data and indicates at least one feature or characteristic of the audio data (e.g., what type or types of processing have already been performed, or should be performed, on the audio data, or the trajectory of an object indicated by the audio data). The association of the metadata with the audio data is time-synchronous. Thus, present (most recently received or updated) metadata may indicate that the corresponding audio data contemporaneously has an indicated feature and/or comprises the results of an indicated type of audio data processing.
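The time-synchronous association, in which the most recently received or updated metadata applies to the current audio, can be sketched as a simple lookup. The class and method names are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
import bisect
from typing import Any, List, Optional


class MetadataTrack:
    """Keeps timestamped metadata and returns the value in force at a time."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._values: List[Any] = []

    def update(self, time: float, value: Any) -> None:
        """Record metadata taking effect at `time` (non-decreasing order)."""
        if self._times and time < self._times[-1]:
            raise ValueError("updates must arrive in time order")
        self._times.append(time)
        self._values.append(value)

    def current(self, time: float) -> Optional[Any]:
        """Return the most recent metadata at or before `time`, if any."""
        index = bisect.bisect_right(self._times, time)
        if index == 0:
            return None  # no metadata has been received yet
        return self._values[index - 1]
```

A renderer processing the audio frame at time t would call `current(t)` to obtain the metadata that applies to that frame.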

Throughout this disclosure, including in the claims, the term "couples" or "coupled" is used to mean either a direct or an indirect connection. Thus, if a first device couples to a second device, that connection may be through a direct connection, or through an indirect connection via other devices and connections.

Throughout this disclosure, including in the claims, the following expressions have the following definitions:

Speaker and loudspeaker: used synonymously to denote any sound-emitting transducer. This definition includes loudspeakers implemented as multiple transducers (e.g., woofer and tweeter).

Speaker feed: an audio signal to be applied directly to a loudspeaker, or an audio signal to be applied to an amplifier and loudspeaker in series.

Channel (or "audio channel"): a monophonic audio signal. Such a signal can typically be rendered in such a way as to be equivalent to application of the signal directly to a loudspeaker at a desired or nominal position. The desired position can be static, as is typically the case with physical loudspeakers, or dynamic.

Audio program: a set of one or more audio channels (at least one speaker channel and/or at least one object channel) and optionally also associated metadata (e.g., metadata that describes a desired spatial audio presentation).

Speaker channel (or "speaker-feed channel"): an audio channel that is associated with a named loudspeaker (at a desired or nominal position), or with a named speaker zone within a defined speaker configuration. A speaker channel is rendered in such a way as to be equivalent to application of the audio signal directly to the named loudspeaker (at the desired or nominal position) or to a speaker in the named speaker zone.

Object channel: an audio channel indicative of sound emitted by an audio source (sometimes referred to as an audio "object"). Typically, an object channel determines a parametric audio source description (e.g., metadata indicative of the parametric audio source description is included in or provided with the object channel). The source description may determine the sound emitted by the source (as a function of time), the apparent position (e.g., 3D spatial coordinates) of the source as a function of time, and optionally at least one additional parameter (e.g., apparent source size or width) characterizing the source.

Object-based audio program: an audio program comprising a set of one or more object channels (and optionally also at least one speaker channel) and, optionally, associated metadata (e.g., metadata indicative of a trajectory of an audio object that emits the sound indicated by an object channel, metadata otherwise indicative of a desired spatial audio presentation of the sound indicated by an object channel, or metadata identifying at least one audio object that is a source of the sound indicated by an object channel).
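As a rough illustration of the definitions above, an object-based audio program can be modeled as a bed of speaker channels plus object channels that carry time-varying position metadata. The following Python sketch is illustrative only; the class and field names (SpeakerChannel, ObjectChannel, ObjectBasedProgram, positions, size) are hypothetical and do not correspond to any bitstream syntax defined in this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Position = Tuple[float, float, float]  # (x, y, z) apparent source position


@dataclass
class SpeakerChannel:
    """Audio tied to a named loudspeaker or speaker zone."""
    speaker_name: str      # e.g. "L", "C", "R", "Ls", "Rs"
    samples: List[float]   # PCM samples


@dataclass
class ObjectChannel:
    """Audio emitted by one audio object, plus its parametric description."""
    samples: List[float]          # sound emitted by the source over time
    positions: List[Position]     # apparent position as a function of time
    size: Optional[float] = None  # optional extra parameter (apparent size)


@dataclass
class ObjectBasedProgram:
    """One or more object channels, optionally with a bed of speaker channels."""
    objects: List[ObjectChannel]
    bed: List[SpeakerChannel] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.objects:
            raise ValueError("an object-based program needs at least one object channel")


# Hypothetical example: one object in front of the listener, two bed channels.
program = ObjectBasedProgram(
    objects=[ObjectChannel(samples=[0.0, 0.1], positions=[(0.5, 0.0, 0.5)])],
    bed=[SpeakerChannel("L", [0.0, 0.0]), SpeakerChannel("R", [0.0, 0.0])],
)
```

The bed is optional in this model, while an empty object list is rejected, mirroring the wording of the definition.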

Rendering: the process of converting an audio program into one or more speaker feeds, or the process of converting an audio program into one or more speaker feeds and converting the speaker feed(s) to sound using one or more loudspeakers. (In the latter case, the rendering is sometimes referred to herein as rendering "by" the loudspeaker(s).) An audio channel can be trivially rendered ("at" a desired position) by applying the signal directly to a physical loudspeaker at the desired position. Alternatively, one or more audio channels can be rendered using one of a variety of virtualization techniques designed to be substantially equivalent (for the listener) to such trivial rendering. In this latter case, each audio channel may be converted to one or more speaker feeds to be applied to loudspeaker(s) in known locations, which are in general different from the desired position, such that sound emitted by the loudspeaker(s) in response to the feed(s) will be perceived as emitting from the desired position. Examples of such virtualization techniques include binaural rendering via headphones (e.g., using Dolby Headphone processing, which simulates up to 7.1 channels of surround sound for the headphone wearer) and wave field synthesis.

〈Detailed Description of Embodiments of the Invention〉
Examples of embodiments of the inventive system (and of methods performed by the system) are described with reference to FIGS. 3, 4, and 5.

FIG. 3 is a block diagram of an example of an audio processing pipeline (an audio data processing system) in which one or more of the elements of the system are configured in accordance with an embodiment of the present invention. The system includes the following elements, coupled together as shown: capture unit 1, production unit 3 (which includes an encoding subsystem), delivery subsystem 5, decoder 7, object processing subsystem 9, controller 10, and rendering subsystem 11. In variations on the system shown, one or more of the elements are omitted, or additional audio data processing units are included. Typically, elements 7, 9, 10, and 11 are included in a playback system (e.g., an end user's home theater system).

Capture unit 1 is typically configured to generate PCM (time-domain) samples comprising audio content and to output the PCM samples. The samples are indicative of multiple streams of audio captured by microphones. Production unit 3 is configured to accept the PCM samples as input and to output an object-based audio program indicative of the audio content. The program typically is, or includes, an encoded (e.g., compressed) audio bitstream. The data of the encoded bitstream that are indicative of the audio content are sometimes referred to herein as "audio data." If the encoding subsystem of production unit 3 is configured in accordance with a typical embodiment of the present invention, the object-based audio program output from unit 3 is indicative of (i.e., includes) multiple speaker channels (a "bed" of speaker channels) of audio data, multiple object channels of audio data, and metadata (including screen-related metadata corresponding to each of the object channels and, optionally, screen-related metadata corresponding to each of the speaker channels).

In a typical implementation, unit 3 is configured to output the object-based audio program that it generates.

In another implementation, unit 3 includes a remixing subsystem coupled and configured to generate a speaker channel-based audio program (comprising speaker channels but no object channels) in response to the object-based audio program, and unit 3 is configured to output the speaker channel-based audio program. Remixing subsystem 6 of the system of FIG. 5 is another example of a remixing subsystem that is coupled and configured to generate, in accordance with an embodiment of the present invention, a speaker channel-based audio program (program "SP," comprising speaker channels but no object channels) in response to an object-based audio program ("OP") generated by encoder 4 (of FIG. 5) in accordance with an embodiment of the present invention.

Delivery subsystem 5 of FIG. 3 is configured to store and/or transmit (e.g., broadcast) the program generated by and output from unit 3 (e.g., an object-based audio program, or a speaker channel-based audio program generated in response to the object-based audio program). For simplicity, we describe (and refer to) the system of FIG. 3 on the assumption that the program generated by and output from unit 3 is an object-based audio program, unless it is clear from the context, description, or reference that it is a speaker channel-based audio program.

In typical embodiments of the FIG. 3 system, subsystem 5 implements delivery of the object-based audio program to decoder 7. For example, subsystem 5 may be configured to store the program (e.g., on a disc) and to provide the stored program to decoder 7. Alternatively, subsystem 5 may be configured to transmit the program (e.g., over a broadcast system, or an Internet Protocol or other network) to decoder 7.

Decoder 7 is coupled and configured to accept (receive or read) the program delivered by delivery subsystem 5 and to decode the program. If the program is an object-based program and decoder 7 is configured in accordance with a typical embodiment of the present invention, the output of decoder 7 in typical operation includes the following:
a stream of audio samples indicative of the program's bed of speaker channels (and, optionally, a corresponding stream of screen-related metadata); and
a stream of audio samples indicative of the program's object channels, together with a corresponding stream of screen-related metadata.

Object processing subsystem 9 is coupled to receive (from decoder 7) the decoded speaker channels, object channels, and object-related metadata of the delivered program. Subsystem 9 is coupled and configured to use the screen-related metadata to perform warping on the object channels (or on a selected subset of the object channels, or on at least one mix (e.g., cluster) of some or all of the object channels), and to output the resulting object channels and/or mixes to rendering subsystem 11. Subsystem 9 typically also outputs to rendering subsystem 11 the object-related metadata corresponding to the object channels and/or mixes that it outputs to subsystem 11 (this metadata having been parsed by decoder 7 from the program delivered by subsystem 5 and asserted from decoder 7 to subsystem 9). Subsystem 9 is typically also configured to pass the decoded speaker channels from decoder 7 through unchanged (to subsystem 11).

If the program delivered to decoder 7 is a speaker channel-based audio program (generated from an object-based program in accordance with an embodiment of the present invention), subsystem 9 may be implemented as (or replaced by) a simple speaker channel selection system configured to implement warping in accordance with the invention by selecting some of the speaker channels of the program (in a manner described in more detail below) and asserting the selected channels to rendering subsystem 11.

The warping performed by subsystem 9 may be controlled, at least in part, by data asserted to subsystem 9 from controller 10 (e.g., in response to user manipulation of controller 10 during setup of the system). Such data may be indicative of characteristics of the playback system speakers and display screen (e.g., the relative size and position of the playback system screen and the playback system speakers), and/or may include at least one warping degree parameter and/or at least one off-screen warping parameter. The warping performed by subsystem 9 is typically determined by at least one warping degree parameter and/or at least one off-screen warping parameter indicated by the screen-related metadata of the program (delivered to decoder 7), and/or by at least one warping degree parameter and/or at least one off-screen warping parameter asserted to subsystem 9 from controller 10.

Rendering subsystem 11 of FIG. 3 is configured to render the audio content determined by the output of subsystem 9 for playback by the speakers (not shown) of the playback system. Subsystem 11 is configured to map the audio objects determined by the object channels (or mixes) output from subsystem 9 to the available speaker channels, using rendering parameters output from subsystem 9 (e.g., values of spatial position and level indicated by the object-related metadata output from subsystem 9). Rendering subsystem 11 also receives any bed of speaker channels passed through by subsystem 9. Typically, subsystem 11 is an intelligent mixer configured to determine speaker feeds for the available speakers, including by mapping one or more objects (or mixes) to each of a number of individual speaker channels and mixing those objects (or mixes) with the "bed" audio content indicated by each corresponding speaker channel of the program's speaker channel bed.
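The mixing step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the renderer of subsystem 11: it assumes only two speaker channels (L and R), takes the object's x coordinate in [0, 1] as its only spatial rendering parameter, and uses a constant-power pan law, which is a common choice that this document does not prescribe. The function names pan_gains and render are hypothetical.

```python
import math
from typing import Dict, List


def pan_gains(x: float) -> Dict[str, float]:
    """Constant-power L/R gains for an object at horizontal position x in [0, 1]."""
    x = min(max(x, 0.0), 1.0)  # clamp to the room width
    angle = x * math.pi / 2.0
    return {"L": math.cos(angle), "R": math.sin(angle)}


def render(bed: Dict[str, List[float]],
           obj_samples: List[float],
           obj_x: float,
           obj_level: float = 1.0) -> Dict[str, List[float]]:
    """Mix one object into the bed and return one speaker feed per channel."""
    gains = pan_gains(obj_x)
    feeds = {}
    for name, bed_samples in bed.items():
        # Bed channels without a pan gain receive none of the object.
        g = gains.get(name, 0.0) * obj_level
        feeds[name] = [b + g * s for b, s in zip(bed_samples, obj_samples)]
    return feeds


# Hypothetical example: an object hard left, mixed over a two-channel bed.
bed = {"L": [0.1, 0.1], "R": [0.0, 0.0]}
feeds = render(bed, obj_samples=[1.0, 0.5], obj_x=0.0)
```

Each feed is the bed content of that channel plus the object scaled by its level and pan gain, which is the "map, then mix with the bed" structure the paragraph describes.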

Typically, the output of subsystem 11 is a set of speaker feeds that are asserted to the playback system loudspeakers (e.g., the speakers shown in FIG. 4) to drive those speakers.

One aspect of the invention is an audio processing unit (APU) configured to perform any embodiment of the inventive method. Examples of APUs include, but are not limited to, encoders (e.g., transcoders), decoders, codecs, pre-processing systems (pre-processors), post-processing systems (post-processors), audio bitstream processing systems, and combinations of such elements. Examples of APUs are production unit 3, decoder 7, object processing subsystem 9, and rendering subsystem 11 of FIG. 3. Implementations of all of these exemplary APUs that are configured to perform an embodiment of the inventive method are contemplated and described herein.

In one class of embodiments, the invention is an APU that includes a buffer memory (buffer) which stores (e.g., in a non-transitory manner) at least one frame or other segment of an audio program (comprising audio content) generated by any embodiment of the inventive method. If the program is an object-based audio program, the stored segment typically includes audio content of a bed of speaker channels and of object channels, and corresponding screen-related metadata. An example of such an APU is an implementation of production unit 3 of FIG. 3 that includes encoding subsystem 3B (configured to generate an object-based audio program in accordance with an embodiment of the invention) and buffer 3A coupled to subsystem 3B, where buffer 3A stores (e.g., in a non-transitory manner) at least one frame or other segment of the object-based audio program (including audio content of a bed of speaker channels and of object channels, and corresponding screen-related metadata). Another example of such an APU is an implementation of decoder 7 of FIG. 3 that includes buffer 7A and decoding subsystem 7B (coupled to buffer 7A), where buffer 7A stores (e.g., in a non-transitory manner) at least one frame or other segment of the object-based audio program delivered from subsystem 5 to decoder 7 (including audio content of a bed of speaker channels and of object channels, and corresponding screen-related metadata). Decoding subsystem 7B is configured to parse the program and to perform any necessary decoding on it.

In another class of embodiments, the invention is an APU that includes a buffer memory (buffer) which stores (e.g., in a non-transitory manner) at least one frame or other segment of a speaker channel-based audio program, where the segment includes audio content of at least one set of speaker channels generated as a result of performing warping, in accordance with an embodiment of the invention, on audio content of an object-based audio program. The segment may include audio content of at least two selectable sets of speaker channels of the speaker channel-based program, where at least one of the sets is generated as a result of warping in accordance with an embodiment of the invention. An example of such an APU is an implementation of production unit 3 of FIG. 3 that includes encoding subsystem 3B (configured to generate a speaker channel-based audio program in accordance with an embodiment of the invention, including by performing warping on audio content of an object-based audio program that is also generated in unit 3) and buffer 3A coupled to subsystem 3B, where buffer 3A stores (e.g., in a non-transitory manner) at least one frame or other segment of the speaker channel-based audio program (including audio content of at least two selectable sets of speaker channels, where at least one of the sets is generated as a result of performing warping, in accordance with an embodiment of the invention, on audio content of the object-based audio program). Another example of such an APU is an implementation of decoder 7 of FIG. 3 that includes buffer 7A and decoding subsystem 7B (coupled to buffer 7A), where buffer 7A stores (e.g., in a non-transitory manner) at least one frame or other segment of a speaker channel-based audio program that was generated by the exemplary embodiment of unit 3 and delivered from unit 3 via subsystem 5 to decoder 7. Decoding subsystem 7B is configured to parse the program and to perform any necessary decoding on it. Another example of such an APU is an implementation of remixing subsystem 6 of FIG. 5 that includes audio processing subsystem 6B (configured to generate a speaker channel-based audio program in accordance with an embodiment of the invention, including by performing warping on audio content of an object-based audio program, typically including screen-related metadata, generated by encoder 4 of FIG. 5) and buffer 6A coupled to audio processing subsystem 6B, where buffer 6A stores (e.g., in a non-transitory manner) at least one frame or other segment of the speaker channel-based audio program generated by subsystem 6B (including audio content of at least two selectable sets of speaker channels, where at least one of the sets is generated as a result of warping in accordance with an embodiment of the invention).

Typical embodiments of the invention assume that the playback environment is a unit cube with width along an x-axis, depth along a y-axis (perpendicular to the x-axis), and height along a z-axis (perpendicular to each of the x-axis and the y-axis). The positions at which the audio elements (sound sources) indicated by an audio program are rendered (i.e., audio objects indicated by object channels, or sound sources indicated by speaker channels) are identified in this unit cube using Cartesian coordinates (x, y, z), with each of the x and y coordinates having a range in the interval [0, 1]. For example, FIG. 4 is a diagram of a playback environment (a room) including a display screen (playback screen S') and speakers (L', C', R', Ls, and Rs) of a playback system. Playback screen S' of FIG. 4 has width W1 along the x-axis and is centered along the central vertical axis of the room's front wall (the plane in which y = 0). The rear wall of the room (which has width W2) is the plane in which y = 1. Front speakers L', C', and R' are positioned near the front wall of the room, left surround speaker Ls is positioned near the left wall of the room (the plane in which x = 0), and right surround speaker Rs is positioned near the right wall of the room (the plane in which x = 1).
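Under this coordinate convention, the horizontal extent of a screen on the front wall follows from its width and center. The helper below is a small sketch; the function name screen_edges is hypothetical, and the example width of 0.6 is an arbitrary illustrative value rather than one taken from FIG. 4.

```python
from typing import Tuple


def screen_edges(width: float, center_x: float = 0.5) -> Tuple[float, float]:
    """Left and right x coordinates of a screen on the front wall (y = 0).

    Width and center are expressed as fractions of the room width, so the
    room itself spans x in [0, 1].
    """
    left = center_x - width / 2.0
    right = center_x + width / 2.0
    if left < 0.0 or right > 1.0:
        raise ValueError("screen must fit inside the unit-width room")
    return left, right


# A screen centered on the front wall, occupying 60% of the room width.
left, right = screen_edges(0.6)
```

A screen whose width equals the room width spans the whole interval [0, 1] along x.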

Typically, the z coordinate of the playback environment is assumed to have a fixed value (nominally corresponding to the ear level of a user of the playback system). Alternatively, to render objects (or other sound sources) at positions perceived to be below or above ear level, the z coordinate of the rendering position can be allowed to vary (e.g., over the interval [-1, 1], if the room is assumed to have width equal to 1, depth equal to 1, and height equal to 2).

In some embodiments, screen parameterization and/or warping is accomplished using all or some of the following parameters (which may be determined during authoring and/or encoding, and may be indicated by the screen-related metadata of the delivered program):
• audio element (e.g., object) position relative to a reference screen;
• degree of on-screen warping (e.g., a parameter indicating the maximum degree of warping to be performed in, or parallel to, the plane of the playback screen). It is contemplated that authoring may typically specify warping as a binary decision, and that the encoding process may modify the binary decision into a continuous (or nearly continuous) variable ranging from no warping to full (maximum) warping;
• desired off-screen warping (e.g., one or more parameters indicating the manner in which, or the degree to which, warping in planes at least substantially parallel to the plane of the playback screen is to be performed as a function of distance at least substantially perpendicular to the plane of the playback screen). Authoring may define parameter(s) indicating the manner or degree of warping to be performed as the perceived warped position of an audio element moves away from the playback screen in a direction perpendicular to the plane of the playback screen. In some cases, such parameters are not delivered with the program (and can instead be determined by the playback system);
• reference screen width relative to the reference room (or relative to the reference L/R speakers used during authoring). Typically, this parameter is equal to 1.0 for cinema (i.e., for audiovisual programs authored for playback in a cinema); and
• reference screen center position relative to the reference room (or relative to the reference L/R speakers used during authoring). Typically, this parameter is equal to (0.5, 0, 0.5) for cinema.
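The authoring-side parameters listed above could be carried in a record like the following. This is a hedged sketch: the class, field, and function names are hypothetical, the choice to default the warping degree to 0.0 is an assumption, and only the cinema values for screen width (1.0) and screen center ((0.5, 0, 0.5)) come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReferenceScreenMetadata:
    """Screen-related parameters fixed during authoring and/or encoding."""
    # Reference screen width relative to the reference room (1.0 for cinema).
    ref_screen_width: float = 1.0
    # Reference screen center relative to the reference room.
    ref_screen_center: Tuple[float, float, float] = (0.5, 0.0, 0.5)
    # Degree of on-screen warping: 0.0 = none, 1.0 = full (assumed default 0.0).
    warp_degree: float = 0.0
    # Off-screen warping parameter; None means "left to the playback system".
    off_screen_warp: Optional[float] = None

    def __post_init__(self) -> None:
        if not 0.0 < self.ref_screen_width <= 1.0:
            raise ValueError("ref_screen_width must be in (0, 1]")
        if not 0.0 <= self.warp_degree <= 1.0:
            raise ValueError("warp_degree must be in [0, 1]")


def degree_from_binary(authoring_flag: bool) -> float:
    """Map an authored on/off warping decision onto the continuous range."""
    return 1.0 if authoring_flag else 0.0


# Hypothetical cinema program authored with warping switched on.
cinema = ReferenceScreenMetadata(warp_degree=degree_from_binary(True))
```

An encoder could later replace the value returned by degree_from_binary with any intermediate degree, which is the binary-to-continuous refinement the second list item contemplates.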

In some embodiments, screen parameterization and/or warping is accomplished using all or some of the following parameters (which are typically determined by the playback system, e.g., during home theater setup):
• playback screen width relative to the playback room (or relative to the playback system L/R speakers). For example, this parameter may have a default value of 1.0 (e.g., if the end user does not specify the playback screen size, the playback system assumes that the playback screen matches the width of the playback room, which effectively disables warping);
• desired off-screen warping (e.g., one or more parameters indicating the manner in which, or the degree to which, warping in planes at least substantially parallel to the plane of the playback screen is to be performed as a function of distance at least substantially perpendicular to the plane of the playback screen). In some embodiments, the playback system (e.g., controller 10 of the FIG. 3 embodiment) is configured to allow custom settings indicating the manner or degree of warping to be performed as a function of the distance of the perceived warped position of an audio element from the plane of the playback screen (in a direction at least substantially perpendicular to that plane). In typical embodiments, the program's screen-related metadata is expected to indicate (e.g., to include at least one off-screen warping parameter indicating) a fixed or default function (which can be replaced by an alternative, user-specified function, e.g., during playback system setup) that determines, at least in part, the manner in which warping is to be performed as a function of the distance of the perceived warped position of an audio element from the plane of the playback screen;
• playback screen aspect ratio (e.g., with a default value of 1.0); and
• playback screen center position (e.g., with a default value of (0.5, 0, 0.5)).
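To see why a default playback screen width of 1.0 effectively disables warping, consider the following sketch. It assumes a piecewise-linear horizontal mapping that carries the reference screen onto the playback screen and rescales the regions to either side; this mapping is an illustrative assumption, not the warping function of the embodiments, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PlaybackScreen:
    """Playback-side screen parameters, with the defaults named in the text."""
    width: float = 1.0
    aspect_ratio: float = 1.0
    center: Tuple[float, float, float] = (0.5, 0.0, 0.5)

    def edges(self) -> Tuple[float, float]:
        """Left and right x coordinates of the screen."""
        return self.center[0] - self.width / 2.0, self.center[0] + self.width / 2.0


def _lerp(x: float, a0: float, a1: float, b0: float, b1: float) -> float:
    """Map x from [a0, a1] to [b0, b1]; a zero-length source maps to b0."""
    if a1 == a0:
        return b0
    return b0 + (x - a0) * (b1 - b0) / (a1 - a0)


def warp_x(x: float, ref: Tuple[float, float], play: Tuple[float, float]) -> float:
    """Assumed piecewise-linear warp of x from reference to playback screen."""
    (rl, rr), (pl, pr) = ref, play
    if x < rl:                                  # left of the reference screen
        return _lerp(x, 0.0, rl, 0.0, pl)
    if x > rr:                                  # right of the reference screen
        return _lerp(x, rr, 1.0, pr, 1.0)
    return _lerp(x, rl, rr, pl, pr)             # on the reference screen


reference = (0.0, 1.0)                    # cinema: screen spans the room width
default_screen = PlaybackScreen()         # user gave no screen size
small_screen = PlaybackScreen(width=0.5)  # screen half as wide as the room
```

With default_screen the playback edges coincide with the reference edges, so warp_x returns every x unchanged; with small_screen, positions authored at the screen edges move inward to the edges of the smaller screen.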

In some embodiments, warping is accomplished using other parameters (instead of, or in addition to, some or all of those described above), which may be indicated by the screen-related metadata of the delivered program. For example, for each channel (object channel or speaker channel) of the program (or for each of some of the channels of the program), one or more of the following parameters may be provided:

1. Warping enable. This parameter indicates whether processing should be performed to warp the perceived position of at least one audio element determined by the channel. It is typically a binary value indicating whether or not warping is to be performed. An example is the "apply_screen_warping" value described below.

2. Degree of warping (e.g., one or more floating-point values, each having any of many different values in the range [0,1] or another predetermined range, or one or more other non-binary parameters). Such warping degree parameter(s) typically modify a function that controls warping from a position in (or parallel to) the plane of the reference screen to a position in (or parallel to) the plane of the playback screen, thereby determining the maximum degree of warping to be performed in (or parallel to) the plane of the playback screen. The warping degree parameter (or parameter set) can differ for warping along (or parallel to) the axis of the playback screen's width (e.g., the x-axis) and for warping along (or parallel to) the axis of the playback screen's height (e.g., the z-axis).

3. Depth warping (e.g., one or more parameters, each having any floating-point value in a predetermined range [1,N], e.g., N = 2). Such parameter(s) (sometimes referred to herein as "off-screen warping parameters") typically modify a function that controls the warping of off-screen audio elements, thereby controlling the degree of warping, or the maximum warping, of audio element rendering positions as a function of distance (depth) from the plane of the playback screen. For example, such a parameter can control the degree of warping (at least substantially parallel to the plane of the playback screen) of a sequence of rendering positions of an audio element intended to be perceived as "flying" from the playback screen (at the front of the playback room) to the rear of the playback room, or vice versa.
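The three per-channel parameters listed above could be carried in a small record such as the following. This is only a minimal sketch: the class name, the field names, and the validation rules are hypothetical and are not taken from the program's metadata syntax. In particular, the text gives [1,N] as one example range for the depth parameter, while the sketch merely assumes it is positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelWarpMetadata:
    """Hypothetical per-channel container; all field names are illustrative."""
    apply_warping: bool = True   # 1. warping enable (binary)
    x_degree: float = 1.0        # 2. degree of warping along the width (x) axis
    z_degree: float = 1.0        # 2. degree of warping along the height (z) axis
    depth_param: float = 1.0     # 3. depth ("off-screen") warping parameter

    def __post_init__(self):
        # Assumed range check: the text gives [0, 1] as an example range.
        for name in ("x_degree", "z_degree"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")
        # Assumption: only positivity is enforced for the depth parameter.
        if self.depth_param <= 0.0:
            raise ValueError("depth_param must be positive")


# Example: half warping horizontally, full warping vertically.
meta = ChannelWarpMetadata(x_degree=0.5)
print(meta)
```

Keeping separate horizontal and vertical degrees mirrors the statement that the warping degree parameter can differ between the width and height axes.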

For example, in one class of embodiments, warping is accomplished using screen-related metadata included in an audio program (e.g., an object-based audio program), where the screen-related metadata indicates at least one non-binary value (e.g., a scalar value that is continuously variable or has any of many values within a predetermined range) indicating the maximum degree of warping to be performed by the playback system (e.g., the maximum degree of warping to be performed in, or parallel to, the plane of the playback screen). For example, the non-binary value may be a floating-point value in a range from a maximum value (indicating that full warping should be performed, e.g., to warp an audio element position defined by the program to be at the right edge of the reference screen to a warped position at the right edge of the playback screen) to a minimum value (indicating that no warping should be performed). In one example, a non-binary value at the midpoint of the range may indicate that half warping (50% warping) should be performed (e.g., to warp an audio element position defined by the program to be at the right edge of the reference screen to a warped position midway between the right edge of the playback room and the right edge of the playback screen).

In some embodiments in this class, the program is an object-based audio program that includes such metadata for each object channel of the program, the metadata indicating the maximum degree of warping to be performed on each corresponding object. For example, the metadata can indicate a different maximum degree of warping in, or parallel to, the plane of the playback screen for each object indicated by a different object channel. As another example, the metadata can indicate, for each object indicated by a different object channel, a different maximum degree of warping in the vertical direction (e.g., parallel to the z-axis of FIG. 4) and a different maximum degree of warping in the horizontal direction (e.g., parallel to the x-axis of FIG. 4), in each case in or parallel to the plane of the playback screen.

In some embodiments in this class, the audio program also includes screen-related metadata indicating at least one characteristic of off-screen warping (and warping is accomplished using that metadata). (The characteristic indicates, for example, the manner or degree to which warping is performed in a plane at least substantially parallel to the plane of the playback screen, as a function of distance at least substantially perpendicular to the plane of the playback screen.) In some such embodiments, the program is an object-based audio program that includes such metadata for each object channel of the program, the metadata indicating at least one characteristic of the off-screen warping to be performed on each corresponding object. For example, the program can include, for each object channel, metadata indicating the type of off-screen warping to be performed on the corresponding object (i.e., the metadata can specify a different type of off-screen warping for the object corresponding to each object channel).

Next, an example of a method for processing an audio program to implement warping in accordance with an embodiment of the invention is described.

In the exemplary method, the screen-related metadata of the audio program includes (for each channel whose audio content is to be warped) at least one warping degree parameter having a non-binary value indicating the maximum degree of warping to be performed by the playback system, in or parallel to the plane of the playback screen, on at least one audio element indicated by the channel. As a result, an audio element which the program indicates should be rendered at positions relative to (and in the plane of) the reference screen is rendered at warped positions relative to (and in the plane of) the playback screen. Preferably, one or two such warping degree parameters are included for each channel: one indicating a warping factor (e.g., the value XFACTOR described below) that controls how much warping should be applied (i.e., the maximum degree of warping to be applied) in the horizontal direction (e.g., along the x-axis of FIG. 4) to at least one audio element indicated by the channel, and/or one indicating a warping factor that controls how much warping should be applied (i.e., the maximum degree of warping to be applied) in the vertical direction (e.g., along the z-axis of FIG. 4) to at least one audio element indicated by the channel. The program's screen-related metadata also indicates an off-screen warping parameter for each channel (e.g., the value EXP described below), which controls at least one characteristic of the off-screen warping to be performed as a function of the distance (of the warped position of the corresponding audio element) perpendicular to the plane of the playback screen. For example, the off-screen warping parameter may control the manner or degree of warping, or the maximum warping, of an audio element's warped position as a function of depth perpendicular to the plane of the playback screen (distance along the y-axis of FIG. 4).

In the exemplary embodiment, the program's screen-related metadata also includes a binary value (referred to herein as "apply_screen_warping") for the program (or for each segment of a sequence of segments of the program). If the value of apply_screen_warping (for the program or a segment thereof) indicates "off", the playback system applies no warping to the corresponding audio content. Warping can be disabled in this way, for example, for audio content (e.g., music or ambience) that should be rendered with perceived positions in the plane of (or coinciding with) the playback screen but need not be tightly coupled to the visuals. If the value of apply_screen_warping (for the program or a segment thereof) indicates "on", the playback system applies warping to the corresponding audio content in the manner described in the following paragraphs. The parameter apply_screen_warping is not an example of a "warping degree" parameter of the type used and/or generated in accordance with the invention.
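The on/off behavior of apply_screen_warping can be pictured as a simple gate in front of whatever warping function the playback system uses. The sketch below is illustrative only: the helper name, the segment list, and the placeholder `toy_warp` function are assumptions made for this example and do not come from the program syntax.

```python
def warp_if_enabled(position, apply_screen_warping, warp):
    """Return warp(position) when the flag is on, else the position unchanged."""
    return warp(position) if apply_screen_warping else position


def toy_warp(x):
    # Placeholder warp (an assumption): squeeze positions toward 0.5 by half.
    return 0.5 + 0.5 * (x - 0.5)


# Hypothetical segments: (apply_screen_warping flag, unwarped x position).
segments = [(True, 1.0), (False, 1.0)]
print([warp_if_enabled(x, flag, toy_warp) for flag, x in segments])
```

The second segment keeps its unwarped position, as would be expected for content such as music or ambience for which warping has been switched off.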

The following description assumes that the program is an object-based program and that each channel subject to warping is an object channel indicating an audio object having an unwarped position (which may be time-varying) determined by the program. It will be apparent to those of ordinary skill in the art how to modify the description to implement warping of a speaker channel of the program, where the speaker channel indicates at least one audio element having an unwarped position (which may be time-varying) determined by the program. The description also assumes that the playback environment is as shown in FIG. 4 and that the playback system is configured to generate five speaker feeds (for the speakers L', C', R', Ls, and Rs shown in FIG. 4) in response to the program.

In the exemplary embodiment, the playback system (e.g., subsystem 9 of the FIG. 3 system) determines from the program (e.g., from the program's screen-related metadata) the following value, which indicates the unwarped position of an object (to be rendered at a warped position to be determined by the playback system):
Xs = (x - RefSXcenterpos) / RefSWidth
where x is the unwarped object position along the horizontal ("x" or "width") axis relative to the left edge of the reference screen, RefSXcenterpos is the position of the center point of the reference screen along the horizontal axis, and RefSWidth is the width of the reference screen (along the horizontal axis).

The playback system (e.g., subsystem 9 of the FIG. 3 system) is configured to use the program's screen-related metadata (and other data indicating the playback system configuration) to generate the following values:
Xwarp = Xs * SWidth + SXcenterpos,
YFACTOR = y^EXP, and
X' = x * YFACTOR + (1 - YFACTOR) * [XFACTOR * Xwarp + (1 - XFACTOR) * x],
where Xwarp is the raw (unscaled) warped object position along the horizontal ("x" or "width") axis relative to the left edge of the playback system display screen (the "playback screen"), Xs is the warped object position along the horizontal axis relative to the center point of the playback screen, SXcenterpos is the position of the center point of the playback screen along the horizontal axis, and SWidth is the width of the playback screen (along the horizontal axis);
YFACTOR is a depth warping factor indicating the degree of warping along the horizontal (width) axis as a function of position along the depth axis (the y-axis of FIG. 4) perpendicular to the plane of the playback screen, y is the warped object position along the depth axis, and EXP is a predetermined (e.g., user-selected) constant that is an example of an "off-screen warping" parameter as that term is used herein; and
X' is the warped object position along the horizontal axis relative to the left edge of the playback screen (a scaled version of the raw warped object position Xwarp), so that the warped object position in the horizontal plane of the playback environment is the point with coordinates (X', y), and XFACTOR is a width-axis warping parameter indicated by the program's screen-related metadata (it may be determined during authoring, mixing, remixing, or encoding of the program). XFACTOR is an example of a "warping degree" parameter as that term is used herein.
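The computation of Xs, Xwarp, YFACTOR, and X' described above can be sketched in a few lines. The function and argument names below are illustrative choices, and the example values assume a normalized room in which x runs from 0 (left wall) to 1 (right wall) and y from 0 (screen plane) to 1 (rear wall). That normalization is an assumption made for the example rather than something the text requires.

```python
def warp_x(x, y, *, ref_center, ref_width, scr_center, scr_width, xfactor, exp):
    """Return X', the warped horizontal position, from the example equations."""
    xs = (x - ref_center) / ref_width    # Xs: position relative to the reference screen
    xwarp = xs * scr_width + scr_center  # Xwarp: raw warped position on the playback screen
    yfactor = y ** exp                   # YFACTOR: depth warping factor
    return x * yfactor + (1.0 - yfactor) * (xfactor * xwarp + (1.0 - xfactor) * x)


# Assumed geometry: the reference screen spans the full room width, and the
# playback screen is centered and half as wide.
geometry = dict(ref_center=0.5, ref_width=1.0, scr_center=0.5, scr_width=0.5)

# Object at the right edge of the reference screen, at three depths.
for depth in (0.0, 0.5, 1.0):
    print(depth, warp_x(1.0, depth, xfactor=1.0, exp=1.0, **geometry))
```

With these assumed values the object lands on the right edge of the playback screen (0.75) at y = 0, stays at the room edge (1.0) at y = 1, and falls in between at intermediate depths.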

Warping of the unwarped object position (determined by the program) along the vertical ("z" or "height") axis to a warped position along the vertical axis relative to the playback screen can be performed in a manner determined by a trivial modification of the above equations (replacing references to the horizontal or x-axis with references to the vertical or z-axis), taking into account the aspect ratio of the reference screen and the aspect ratio of the playback screen.
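One way to write out that substitution is shown below. All of the vertical-axis symbols (RefSZcenterpos, RefSHeight, SZcenterpos, SHeight, ZFACTOR) are hypothetical names introduced here by analogy with the horizontal ones. The screen heights would presumably be obtained from the screen widths and aspect ratios, but the exact convention is not given in this passage, so this is a sketch rather than the stated method.

```latex
\begin{align*}
Z_s &= (z - \mathrm{RefSZcenterpos}) / \mathrm{RefSHeight} \\
Z_{\mathrm{warp}} &= Z_s \cdot \mathrm{SHeight} + \mathrm{SZcenterpos} \\
Z' &= z \cdot \mathrm{YFACTOR}
      + (1 - \mathrm{YFACTOR})\,\bigl[\mathrm{ZFACTOR} \cdot Z_{\mathrm{warp}}
      + (1 - \mathrm{ZFACTOR}) \cdot z\bigr]
\end{align*}
```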

The parameter XFACTOR has a value in the range from 0 to 1, inclusive (i.e., it has one of at least three values, and typically one of many values, in this range). The value of XFACTOR controls the degree to which warping is applied along the horizontal axis. If XFACTOR = 1, full warping is performed along the horizontal axis (so that, even if the object's unwarped position is off the playback screen, the warped position is on the playback screen). If XFACTOR = 1/2 (or another value less than 1), a reduced amount of warping is performed along the x-axis (so that, if the object's unwarped position is far off the playback screen, e.g., at the position of the left front playback speaker, the warped position may also be off the playback screen, e.g., midway between the left front speaker and the left edge of the playback screen). Setting XFACTOR to a value less than 1 but greater than 0 can be useful for various reasons, for example where warping is desired but full warping to a small playback screen is deemed undesirable, or where the audio object position is only loosely tied to the display screen size (e.g., for diffuse sources).
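For an object in the screen plane (y = 0, so YFACTOR = 0), the X' formula reduces to a linear blend between the unwarped position x and the raw warped position Xwarp, with XFACTOR as the blend weight. The sketch below works through the left-front-speaker example. The helper name and the numeric positions (speaker at 0.0, left screen edge at 0.25) are assumptions chosen for illustration.

```python
def blend_toward_screen(x, xwarp, xfactor):
    """On-screen (y = 0) case of the X' formula: XFACTOR*Xwarp + (1 - XFACTOR)*x."""
    if not 0.0 <= xfactor <= 1.0:
        raise ValueError("XFACTOR is expected to lie in [0, 1]")
    return xfactor * xwarp + (1.0 - xfactor) * x


# Assumed positions: unwarped object at the left front speaker (x = 0.0);
# full warping would move it to the left edge of the playback screen (0.25).
x, xwarp = 0.0, 0.25
for xfactor in (0.0, 0.5, 1.0):
    print(xfactor, blend_toward_screen(x, xwarp, xfactor))
```

With XFACTOR = 1/2 the result is 0.125, midway between the speaker and the screen edge, which matches the half-warping case described above.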

The parameter YFACTOR is used to control the degree of warping (along the horizontal and/or vertical axis) as a function of the audio object's warped position along the depth axis; the value of YFACTOR is a function of the object's warped position along the depth axis. In the example above, this function is an exponential function (y raised to the power EXP). In alternative embodiments, other functions, which are variations on or otherwise differ from this exemplary exponential function, are used to determine YFACTOR (e.g., YFACTOR may be the cosine, or a power of the cosine, of the warped object position y along the depth axis). In the example above, in which YFACTOR = y^EXP, when EXP is greater than 0 (as is expected to be the typical choice), the degree of warping (in the x and/or z direction, perpendicular to the depth axis) of sound having an unwarped position at the front of the playback room (i.e., on the playback screen) is greater than the degree of warping (in directions perpendicular to the depth axis) of sound having an unwarped position far from the front of the room (i.e., at the rear wall of the playback room). If EXP is greater than 0 and y = 0 (i.e., the object's warped and unwarped positions are in the plane of the playback screen at the front of the playback room), then YFACTOR = 0 and the warped position (X') along the horizontal "width" axis is determined by the unwarped position (x) along the width axis and the parameters XFACTOR and Xwarp. If EXP is greater than 0 and y = 1 (i.e., the object's warped and unwarped positions are at the rear of the playback room), then YFACTOR = 1 and the warped position (X') along the horizontal "width" axis equals the unwarped position (x) along the width axis, so that effectively no warping is performed on the object (along the width axis) in this case.
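The two boundary cases follow from substituting YFACTOR into the expression for X'. A short worked check, assuming EXP > 0 and using only the symbols already defined:

```latex
\begin{align*}
y = 0:&\quad \mathrm{YFACTOR} = 0^{\mathrm{EXP}} = 0
  \;\Rightarrow\;
  X' = \mathrm{XFACTOR} \cdot X_{\mathrm{warp}} + (1 - \mathrm{XFACTOR}) \cdot x \\
y = 1:&\quad \mathrm{YFACTOR} = 1^{\mathrm{EXP}} = 1
  \;\Rightarrow\;
  X' = x
\end{align*}
```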

As a more specific example, audio object A1 of FIG. 4 has an unwarped position (and thus a warped position) in the plane of the playback screen S' at the front of the playback room (i.e., y = y1 = 0). If EXP is greater than 0, then YFACTOR = 0 for performing horizontal-axis warping on object A1, and the warping places the warped position of object A1 at some position X' = x1, y = 0 coinciding with the playback screen S' (e.g., as shown in FIG. 4). Audio object A2 of FIG. 4 has an unwarped position (and thus a warped position) between the front and rear walls of the playback room (at 0 < y2 < 1). If EXP is greater than 0, then YFACTOR is greater than 0 for performing horizontal-axis warping on object A2, and the warping places the warped position of object A2 at some position X' = x2, y = y2 along the line segment between points T1 and T2 (e.g., as shown in FIG. 4). The separation between points T1 and T2 is W3 (as shown in FIG. 4), and since EXP is greater than 0, W3 satisfies W1 < W3 < W2, where W1 is the width of screen S' and W2 is the width of the playback room. The specific value of EXP determines the value of W3, which is the width range to which warping can map an object at depth y = y2 relative to the playback screen S'. If EXP is greater than 1, the warping places the warped position of object A2 between curves C1 and C2 (shown in FIG. 4), where the separation (W3) between curves C1 and C2 is an exponentially increasing function of the depth parameter y (as shown in FIG. 4): the separation W3 increases more rapidly (with increasing y) when y has larger values and more slowly (with increasing y) when y has smaller values.

In other embodiments (described below with reference to FIG. 4A), which are variations on the exemplary embodiment described with reference to curves C1 and C2 of FIG. 4, EXP is equal to 1, so that the warping places the warped position of object A2 between two curves (e.g., curves C3 and C4 of FIG. 4A) whose separation is a linearly increasing function of the depth parameter y. In still other embodiments (described below with reference to FIG. 4B), which are also variations on that exemplary embodiment, EXP is greater than 0 but less than 1, so that the warping places the warped position of object A2 between two curves (e.g., curves C5 and C6 of FIG. 4B) whose separation is a logarithmically increasing function of the depth parameter y: the separation increases more rapidly (with increasing y) when y has smaller values and more slowly (with increasing y) when y has larger values. Embodiments in which EXP is less than or equal to 1 are expected to be typical, because in such embodiments the effect of warping decreases more rapidly with increasing values of y (i.e., with increasing distance of the warped position from the screen) than when EXP is greater than 1. When EXP is less than 1, the effect of warping decreases rapidly as the warped position begins to move away from the screen and then decreases progressively more slowly as the warped position moves farther from the screen, until the warped position reaches the rear wall, where no warping is performed.
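The dependence of the width W3 on depth for the three EXP regimes can be checked numerically. The relation used below, W3 = W2 * y^EXP + W1 * (1 - y^EXP), is not stated in the text. It is derived here from the X' equation under two assumptions: full warping (XFACTOR = 1), and unwarped positions that span the room width W2 and map onto the screen width W1 at y = 0. The function name is illustrative.

```python
def mapped_width(y, exp, screen_width, room_width):
    """W3 at depth y under full warping: blend of W1 and W2 weighted by y**exp."""
    yfactor = y ** exp
    return room_width * yfactor + screen_width * (1.0 - yfactor)


W1, W2 = 0.5, 1.0  # assumed screen width and room width (normalized units)
for exp in (0.5, 1.0, 2.0):
    row = [round(mapped_width(y, exp, W1, W2), 3) for y in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(f"EXP={exp}: {row}")
```

Under these assumptions W3 runs from W1 at the screen to W2 at the rear wall in every case. For EXP < 1 it widens quickly near the screen, for EXP = 1 it widens linearly, and for EXP > 1 it widens mostly near the rear wall, in line with the curve pairs described for FIGS. 4B, 4A, and 4.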

Next, another class of embodiments is described, in which a speaker channel-based audio program (comprising speaker channels but no object channels) is generated in response to an object-based program in a manner that includes a warping step (e.g., using screen-related metadata). The speaker channel-based audio program includes at least one set of speaker channels that is generated as a result of warping audio content of the object-based program to a degree determined at least in part by a warping degree parameter (and/or using an off-screen warping parameter), and that is intended for playback by loudspeakers positioned at predetermined locations relative to the playback system display screen. In some embodiments in this class, the speaker channel-based audio program is generated to include two or more selectable sets of speaker channels, at least one of which is generated as a result of warping and is intended for playback by loudspeakers positioned at predetermined locations relative to the playback system display screen. Generation of the speaker channel-based program supports screen-relative rendering by a playback system that is not configured to decode and render object-based audio programs (but can decode and render speaker channel-based programs). Typically, the speaker channel-based program is generated by a remixing system that has knowledge of (or assumes) a particular playback system speaker and screen configuration. Typically, the object-based program (in response to which the speaker channel-based program is generated) includes screen-related metadata that supports screen-relative rendering of the object-based program by a suitably configured playback system (one capable of decoding and rendering object-based programs).

Embodiments in this class are particularly useful where screen-relative rendering is desired but the available playback system(s) are not configured to render object-based programs. To implement screen-relative rendering of an audio program comprising only speaker channels (and no object channels), an object-based program that supports screen-relative rendering is first generated in accordance with an embodiment of the invention. A speaker channel-based audio program (which supports screen-relative rendering) is then generated in response to the object-based program. The speaker channel-based audio program may include at least two selectable sets of speaker channels, and the playback system may be configured to render a selected one of those sets to implement screen-relative rendering.

Common speaker channel configurations assumed by speaker channel-based programs include stereo (for playback using two speakers) and 5.1 surround sound (for playback using five full-range speakers). In such channel configurations, the speaker channels (audio signals) are by definition associated with loudspeaker positions, and the perceived positions at which audio elements (indicated by the audio content of the channels) are rendered are typically determined on the basis of assumed speaker positions in the playback environment, or assumed speaker positions relative to a reference listening position.

In some embodiments in which a speaker channel-based audio program is generated (in response to an object-based program), the screen-related warping (scaling) capability enabled by the object-based program's screen-related metadata is used to generate speaker channels (of the speaker channel-based program) associated with loudspeakers having predetermined positions relative to the playback screen. Typically, the size, shape, and position of a particular playback screen are assumed by the system that generates the speaker channel-based program. For example, in response to an object-based program, the speaker channel-based program can be generated to include the following two sets of speaker channels (and optionally other speaker channels as well):
a first set of conventional left ("L") and right ("R") front speaker channels, for rendering audio elements at perceived positions determined (e.g., in a cinema mixing facility) relative to a reference screen; and
a second set of left and right front speaker channels, which may be referred to as "left screen" (Lsc) and "right screen" (Rsc), for rendering the same audio elements at perceived positions determined (e.g., at a remixing facility or in a remixing stage of a mixing facility) relative to the left and right edges of an assumed playback display screen (where the playback screen and the playback system front speakers are assumed to have predetermined relative sizes, shapes, and positions).

Typically, the channels of the speaker channel-based program that are generated as a result of warping (e.g., the Lsc and Rsc channels) can be rendered so as to allow a closer match between the images displayed on the playback screen and the corresponding rendered sounds.

By selecting and rendering the conventional left (L) and right (R) front speaker channels, the playback system can render the selected channels such that the audio elements they determine are perceived as having unwarped positions. By selecting and rendering the "left screen" (Lsc) and "right screen" (Rsc) speaker channels, the playback system can render the selected channels such that the audio elements they determine are perceived as having warped positions (relative to the playback screen). The warping, however, is performed not by the playback system but at the time the speaker channel-based program is generated (typically in response to an object-based program that includes screen-related metadata).

Some embodiments in this class include the steps of: generating an object-based program having screen-related metadata (at the time and place of mixing); then using the screen-related metadata to generate a speaker channel-based program from the object-based program, including by performing screen-related warping (at the time and place of a "remix," which can be the same location where the original mixing was done, e.g., to generate a recording for home use); and then delivering the speaker channel-based program to a playback system. The speaker channel-based program can include multiple selectable sets of channels, including a first set of speaker channels (e.g., L and R channels generated in the conventional manner) that is generated without performing warping and that indicates (when rendered) at least one audio element perceived at at least one unwarped position, and at least one additional set of speaker channels (e.g., Lsc and Rsc channels) that is generated as a result of warping content of the object-based program and that indicates (when rendered) the same audio element, but perceived at at least one different (i.e., warped) position. Alternatively, the speaker channel-based program includes only one set of channels (e.g., Lsc and Rsc channels), generated as a result of warping and indicating (when rendered) at least one audio element perceived at at least one warped position, and does not include another set of channels (e.g., L and R channels) indicating (when rendered) the same audio element perceived at an unwarped position.
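The generation step described above can be illustrated with a minimal sketch. It is not the remix procedure of the specification: it assumes mono object signals, width-axis positions normalized to [0, 1], a constant-power (sine/cosine) panning law, and a caller-supplied `warp` function. The names `pan_gains` and `remix` are hypothetical. The sketch only shows that the unwarped (L, R) set and the warped (Lsc, Rsc) set are both produced at generation time, so the playback system never performs the warping itself.

```python
import math


def pan_gains(x):
    """Constant-power (left, right) gains for a position x in [0, 1].

    The panning law is an assumption made for illustration only.
    """
    theta = x * math.pi / 2.0
    return math.cos(theta), math.sin(theta)


def remix(objects, warp):
    """Build two selectable channel sets from object audio.

    objects: list of (samples, x) pairs, where samples is a list of floats
             and x is the unwarped width-axis position in [0, 1].
    warp:    callable mapping an unwarped x to a warped x (hypothetical).
    """
    n = len(objects[0][0])
    program = {ch: [0.0] * n for ch in ("L", "R", "Lsc", "Rsc")}
    for samples, x in objects:
        # (L, R) uses the unwarped position; (Lsc, Rsc) uses the warped one.
        for (left, right), pos in ((("L", "R"), x), (("Lsc", "Rsc"), warp(x))):
            g_left, g_right = pan_gains(pos)
            for i, s in enumerate(samples):
                program[left][i] += g_left * s
                program[right][i] += g_right * s
    return program
```

A playback system receiving the returned dictionary would simply pick one of the two pairs and feed it to its front speakers.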

A speaker channel-based program generated from an object-based program in accordance with an exemplary embodiment includes five front channels: left (L), left screen (Lsc), center (C), right screen (Rsc), and right (R). The Lsc and Rsc channels are generated by performing warping using the screen-related metadata of the object-based program. To render and play back the speaker channel-based program, the playback system may select and render the L and R channels to drive front speakers at the left and right edges of the playback screen, or may select and render the Lsc and Rsc channels to drive front speakers spaced from the left and right edges of the playback screen. For example, the Lsc and Rsc channels may be generated on the assumption that they will be used to render audio elements using front speakers at azimuth angles of +30 and -30 degrees relative to an assumed user position, and the L and R channels may be generated on the assumption that they will be used to render audio elements using front speakers at azimuth angles of +15 and -15 degrees (at the left and right edges of the playback screen) relative to the assumed user position.
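The choice between the two front pairs can be sketched as follows. The text states only that the playback system may select either pair; the rule used here (pick the pair whose assumed speaker azimuth is nearest to the actual front speaker azimuth) is an assumption made for illustration, and `ASSUMED_AZIMUTH_DEG` and `select_front_pair` are hypothetical names. The azimuth values are the ±15 and ±30 degree figures from the example above.

```python
# Assumed front speaker azimuths (degrees) for each selectable pair,
# taken from the +/-15 and +/-30 degree example in the text.
ASSUMED_AZIMUTH_DEG = {("L", "R"): 15.0, ("Lsc", "Rsc"): 30.0}


def select_front_pair(speaker_azimuth_deg):
    """Return the channel pair generated for the closest assumed azimuth.

    speaker_azimuth_deg: azimuth of the playback system's front speakers
    relative to the listener; the sign is ignored (symmetric layout assumed).
    """
    target = abs(speaker_azimuth_deg)
    return min(
        ASSUMED_AZIMUTH_DEG,
        key=lambda pair: abs(ASSUMED_AZIMUTH_DEG[pair] - target),
    )
```

Under this rule, a system with front speakers close to the screen edges would render L and R, while a system with wider-spaced front speakers would render Lsc and Rsc.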

For example, the system of FIG. 5 includes an encoder 4 configured to generate, in accordance with an embodiment of the invention, an object-based audio program ("OP") that includes screen-related metadata. Encoder 4 may be implemented in or at a mixing facility. The system of FIG. 5 also includes a remix subsystem 6, coupled and configured to generate (in accordance with an embodiment of the invention), in response to the object-based audio program generated by encoder 4, a speaker channel-based audio program ("SP") that includes speaker channels but no object channels. Subsystem 6 may be implemented in or at a remix facility, or as the remix stage of a mixing facility (e.g., a mixing facility in which encoder 4 is also implemented). The audio content of speaker channel-based program SP includes at least two selectable sets of speaker channels (e.g., one set including channels L and R discussed above, and another set including channels Lsc and Rsc discussed above), and subsystem 6 is configured to generate at least one of those sets (e.g., channels Lsc and Rsc) as a result of warping, in accordance with an embodiment of the invention, the audio content of object-based program OP (generated by encoder 4) using the screen-related metadata of program OP (and typically also other control data, not indicated by the screen-related metadata, that indicate the type and/or degree of warping). Speaker channel-based program SP is output from subsystem 6 to delivery subsystem 5. Subsystem 5 can be identical to subsystem 5, discussed above, of the system of FIG. 3.

Embodiments of the invention may be implemented in hardware, firmware, or software, or a combination thereof (e.g., as a programmable logic array). For example, the system of FIG. 3 (or its subsystem 3, or subsystems 7, 9, 10, and 11) may be implemented in appropriately programmed (or otherwise configured) hardware or firmware, e.g., as a programmed general-purpose processor, digital signal processor, or microprocessor. Unless otherwise specified, the algorithms or processes included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems (e.g., a computer system that implements the system of FIG. 3, or its subsystem 3, or subsystems 7, 9, 10, and 11), each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and to generate output information. The output information is applied to one or more output devices in known fashion.

Each such program may be implemented in any desired computer language (including machine, assembly, or high-level procedural, logical, or object-oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or an interpreted language.

For example, when implemented by sequences of computer software instructions, the various functions and steps of embodiments of the invention may be implemented by multithreaded software instruction sequences running in suitable digital signal processing hardware, in which case the various devices, steps, and functions of the embodiments may correspond to portions of the software instructions.

Each such computer program is preferably stored on or downloaded to a storage medium or device (e.g., solid-state memory or media, or magnetic or optical media) readable by a general-purpose or special-purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer system to perform the procedures described herein. The inventive system may also be implemented as a computer-readable storage medium configured with (i.e., storing) a computer program, where the storage medium so configured causes a computer system to operate in a specific, predefined manner to perform the functions described herein.

While implementations have been described by way of example and in terms of exemplary specific embodiments, it is to be understood that implementations of the invention are not limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. The scope of the appended claims should therefore be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.

Some aspects are described below.
[Aspect 1]
A method of rendering an audio program, comprising:
(a) determining at least one warping degree parameter; and
(b) performing warping on audio content of at least one channel of the program to a degree determined at least in part by the warping degree parameter corresponding to the channel, wherein each said warping degree parameter indicates a maximum degree of warping to be performed on corresponding audio content of the program by a playback system.
[Aspect 2]
The method of aspect 1, wherein step (a) includes determining at least one off-screen warping parameter, the off-screen warping parameter indicates at least one characteristic of off-screen warping of corresponding audio content of the program by the playback system, and the warping performed in step (b) includes off-screen warping determined at least in part by at least one said off-screen warping parameter.
[Aspect 3]
The method of aspect 2, wherein the off-screen warping parameter controls the degree of warping of an unwarped position of an audio element along a width axis at least substantially parallel to the plane of a playback screen, as a function of the distance, at least substantially perpendicular to the plane of the playback screen, of the warped position at which the audio element is to be rendered.
[Aspect 4]
The method according to any one of aspects 1 to 3, wherein the distortion includes determining a value Xs indicating the undistorted position, along a width axis, of an audio element to be rendered at a distorted position along the width axis, and determining the values
Xwarp,
YFACTOR = y^EXP, and
X' = x * YFACTOR + (1 - YFACTOR) * [XFACTOR * Xwarp + (1 - XFACTOR) * x],
where
Xwarp denotes the raw distorted position of the audio element along the width axis relative to an edge of the playback screen,
EXP is an off-screen distortion parameter,
YFACTOR indicates the degree of distortion along the width axis as a function of the distorted position y of the audio element along a depth axis at least substantially perpendicular to the plane of the playback screen,
X' denotes the distorted object position of the audio element along the width axis relative to the edge of the playback screen, and
XFACTOR is one of the distortion degree parameters.
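By way of non-limiting illustration, the blend recited in aspect 4 may be sketched in Python as follows. The function name `warp_x`, the normalization of y to the range 0 to 1 with y = 0 at the screen plane, and the range 0 to 1 for XFACTOR are assumptions made for this sketch; they are not taken from the aspect itself. Xwarp is treated as an input because the aspect does not state how it is computed.

```python
def warp_x(x, y, x_warp, xfactor, exp):
    """Return X', the distorted width-axis position of an audio element.

    x       -- undistorted position along the width axis
    y       -- position along the depth axis (assumed normalized to [0, 1],
               with 0 at the screen plane)
    x_warp  -- raw distorted position Xwarp relative to the screen edge
    xfactor -- distortion degree parameter XFACTOR (assumed in [0, 1])
    exp     -- off-screen distortion parameter EXP
    """
    yfactor = y ** exp  # YFACTOR = y^EXP
    # X' = x*YFACTOR + (1 - YFACTOR) * [XFACTOR*Xwarp + (1 - XFACTOR)*x]
    return x * yfactor + (1.0 - yfactor) * (xfactor * x_warp + (1.0 - xfactor) * x)
```

Under these assumptions, an element at y = 0 with XFACTOR = 1 is moved fully to Xwarp, an element at y = 1 keeps its undistorted position x, and XFACTOR = 0 disables the distortion at every depth.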
[Aspect 5]
The method according to any one of aspects 1 to 4, wherein the program is an object-based audio program, and step (a) includes parsing the program to identify at least one distortion degree parameter indicated by screen-related metadata of the program.
[Aspect 6]
The method according to aspect 5, wherein the program is indicative of at least two objects, step (a) includes independently determining at least one distortion degree parameter for each of the objects, and step (b) includes performing distortion independently on the audio content indicative of each of the objects, to a degree determined, at least in part, by the at least one distortion degree parameter corresponding to that object.
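By way of non-limiting illustration, the per-object processing of aspect 6 may be sketched in Python as follows. The `AudioObject` class, the function `warp_objects`, and the linear mapping of x onto the screen extent used to obtain Xwarp are hypothetical and are introduced only for this sketch; positions are assumed to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    x: float        # undistorted width-axis position, assumed in [0, 1]
    y: float        # depth-axis position, assumed 0 at the screen plane
    xfactor: float  # per-object distortion degree parameter, assumed in [0, 1]


def warp_objects(objects, screen_left, screen_right, exp=1.0):
    """Distort each object independently, using its own degree parameter."""
    result = {}
    for obj in objects:
        # Assumed mapping: scale x linearly into the screen's horizontal extent.
        x_warp = screen_left + obj.x * (screen_right - screen_left)
        yfactor = obj.y ** exp
        result[obj.name] = obj.x * yfactor + (1.0 - yfactor) * (
            obj.xfactor * x_warp + (1.0 - obj.xfactor) * obj.x
        )
    return result
```

In this sketch, a dialog object with a degree parameter of 1 follows the screen, while a music object with a degree parameter of 0 keeps its authored position, even though both are processed by the same routine.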
[Aspect 7]
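The method according to any one of aspects 1 to 6, wherein each of the distortion degree parameters is a non-binary value indicating the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the program.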
[Aspect 8]
A method of generating an object-based audio program, including:
(A) determining at least one distortion degree parameter for at least one object; and
(B) including in the program an object channel indicative of the object and screen-related metadata indicative of each distortion degree parameter for the object, wherein each distortion degree parameter indicates the maximum degree of distortion to be performed on the object by the playback system.
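By way of non-limiting illustration, the generation of a program that carries an object channel together with its screen-related metadata may be sketched in Python as follows. The container classes, the field names, and the range 0 to 1 for the degree parameter are hypothetical; the aspect does not prescribe any particular program or bitstream syntax.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectChannel:
    object_id: int
    samples: list


@dataclass
class ScreenRelatedMetadata:
    # Maps object id to the maximum degree of distortion for that object
    # (assumed range: 0.0 = none, 1.0 = full).
    distortion_degree: dict = field(default_factory=dict)


@dataclass
class ObjectBasedProgram:
    object_channels: list = field(default_factory=list)
    metadata: ScreenRelatedMetadata = field(default_factory=ScreenRelatedMetadata)


def add_object(program, object_id, samples, degree):
    """Step (A): accept a degree parameter; step (B): include channel and metadata."""
    if not 0.0 <= degree <= 1.0:
        raise ValueError("degree must lie in [0, 1] in this sketch")
    program.object_channels.append(ObjectChannel(object_id, list(samples)))
    program.metadata.distortion_degree[object_id] = degree
    return program
```

Keeping the degree parameter in metadata rather than baking it into the audio leaves the final amount of distortion to the playback system, which is the arrangement the aspect describes.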
[Aspect 9]
The method according to aspect 8, wherein the program is indicative of at least two objects, the screen-related metadata indicates at least one distortion degree parameter for each of at least two of the objects, and each distortion degree parameter indicates the maximum degree of distortion to be performed on the corresponding object.
[Aspect 10]
The method according to aspect 8 or 9, wherein step (a) includes determining at least one off-screen distortion parameter for the at least one object, the off-screen distortion parameter indicates at least one characteristic of off-screen distortion to be performed on the object by the playback system, and the screen-related metadata included in the program indicates each off-screen distortion parameter.
[Aspect 11]
The method according to aspect 10, wherein the off-screen distortion parameter controls the degree to which the undistorted position of the object along a width axis, at least substantially parallel to the plane of the playback screen, is distorted, as a function of the distance, at least substantially perpendicular to the plane of the playback screen, of the distorted position at which the object is to be rendered.
[Aspect 12]
The method according to any one of aspects 8 to 11, wherein the distortion includes determining a value Xs indicating the undistorted position, along a width axis, of the object to be rendered at a distorted position along the width axis, and determining the values
Xwarp,
YFACTOR = y^EXP, and
X' = x * YFACTOR + (1 - YFACTOR) * [XFACTOR * Xwarp + (1 - XFACTOR) * x],
where
Xwarp denotes the raw distorted position of the audio element along the width axis relative to an edge of the playback screen,
EXP is an off-screen distortion parameter,
YFACTOR indicates the degree of distortion along the width axis as a function of the distorted position y of the audio element along a depth axis at least substantially perpendicular to the plane of the playback screen,
X' denotes the distorted object position of the audio element along the width axis relative to the edge of the playback screen, and
XFACTOR is one of the distortion degree parameters.
[Aspect 13]
The method according to any one of aspects 8 to 12, wherein each of the distortion degree parameters is a non-binary value indicating the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the program.
[Aspect 14]
A method including:
(A) generating an object-based audio program; and
(B) generating, in response to the object-based audio program, a speaker channel-based program including at least one set of speaker channels intended for playback by loudspeakers positioned at predetermined locations relative to a playback screen, wherein generating the set of speaker channels includes distorting audio content of the object-based audio program to a degree determined, at least in part, by at least one distortion degree parameter, and each distortion degree parameter indicates the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the object-based audio program.
[Aspect 15]
The method according to aspect 14, wherein step (b) includes generating the speaker channel-based program so that it includes two or more selectable sets of speaker channels, at least one of the sets is indicative of undistorted audio content of the object-based audio program, generating at least one other of the sets includes distorting audio content of the object-based audio program to a degree determined, at least in part, by the distortion degree parameter, and said other of the sets is intended for playback by loudspeakers positioned at the predetermined locations relative to the playback screen.
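By way of non-limiting illustration, the generation of two selectable sets of speaker channels may be sketched in Python as follows. The two-loudspeaker layout, the linear panning law, and the names `pan_gains` and `render_sets` are assumptions made for this sketch; an actual renderer would typically drive more loudspeakers and use a different panning law.

```python
def pan_gains(x):
    """Linear pan between a left loudspeaker (x = 0) and a right one (x = 1)."""
    return (1.0 - x, x)


def render_sets(samples, x, x_distorted):
    """Return two selectable speaker-channel sets for one audio object.

    'unwarped' is rendered at the authored position x; 'warped' is rendered
    at the distorted position x_distorted computed elsewhere.
    """
    def render(position):
        gain_left, gain_right = pan_gains(position)
        return {
            "L": [gain_left * s for s in samples],
            "R": [gain_right * s for s in samples],
        }

    return {"unwarped": render(x), "warped": render(x_distorted)}
```

A playback system without object rendering capability could then select either set, for example the warped set when a screen is present and the unwarped set otherwise.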
[Aspect 16]
The method according to aspect 14 or 15, wherein step (b) includes determining at least one off-screen distortion parameter, the off-screen distortion parameter indicates at least one characteristic of off-screen distortion to be performed by the playback system on the corresponding audio content of the object-based audio program, and the distortion performed in step (b) includes off-screen distortion determined, at least in part, by the at least one off-screen distortion parameter.
[Aspect 17]
The method according to aspect 16, wherein the off-screen distortion includes distorting the undistorted position of an audio element along a width axis, at least substantially parallel to the plane of the playback screen, to a degree controlled by the off-screen distortion parameter as a function of the distance, at least substantially perpendicular to the plane of the playback screen, of the distorted position at which the audio element is to be rendered.
[Aspect 18]
The method according to any one of aspects 14 to 17, wherein the distorting step includes determining a value Xs indicating the undistorted position, along a width axis at least substantially parallel to the plane of the playback screen, of an audio object to be rendered at a distorted position along the width axis, and determining the values
Xwarp,
YFACTOR = y^EXP, and
X' = x * YFACTOR + (1 - YFACTOR) * [XFACTOR * Xwarp + (1 - XFACTOR) * x],
where
Xwarp denotes the raw distorted position of the object along the width axis relative to an edge of the playback screen,
EXP is an off-screen distortion parameter,
YFACTOR indicates the degree of distortion along the width axis as a function of the distorted position y of the object along a depth axis at least substantially perpendicular to the plane of the playback screen,
X' denotes the distorted object position of the object along the width axis relative to the edge of the playback screen, and
XFACTOR is one of the distortion degree parameters.
[Aspect 19]
The method according to any one of aspects 14 to 18, wherein the object-based audio program includes screen-related metadata indicative of the at least one distortion degree parameter, and step (b) includes parsing the object-based audio program to identify each distortion degree parameter indicated by the screen-related metadata.
[Aspect 20]
The method according to any one of aspects 14 to 19, wherein each of the distortion degree parameters is a non-binary value indicating the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the program.
[Aspect 21]
A method of rendering a speaker channel-based program including at least one set of speaker channels indicative of distorted content, wherein the speaker channel-based program has been generated by processing an object-based audio program, including by distorting audio content of the object-based audio program to a degree determined, at least in part, by at least one distortion degree parameter and thereby generating the set of speaker channels indicative of distorted content, and each distortion degree parameter indicates the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the object-based audio program, the method including:
(A) parsing the speaker channel-based program to identify the speaker channels of the speaker channel-based program, including each said set of speaker channels indicative of distorted content; and
(B) generating, in response to at least some of the speaker channels of the speaker channel-based program, including at least one said set of speaker channels indicative of distorted content, speaker feeds for driving loudspeakers positioned at predetermined locations relative to a playback screen.
[Aspect 22]
The method according to aspect 21, wherein the speaker channel-based program has been generated by processing the object-based audio program, including by performing off-screen distortion of audio content of the object-based audio program to a degree determined, at least in part, by the at least one distortion degree parameter and by at least one off-screen distortion parameter indicative of at least one characteristic of off-screen distortion of the corresponding audio content of the object-based program.
[Aspect 23]
The method according to aspect 21 or 22, wherein the speaker channel-based program includes two or more selectable sets of speaker channels, at least one of the sets is indicative of undistorted audio content of the object-based audio program, another one of the sets is said set of speaker channels indicative of distorted content, and step (b) includes selecting the one of the sets that is said set of speaker channels indicative of distorted content.
[Aspect 24]
The method according to any one of aspects 21 to 23, wherein each of the distortion degree parameters is a non-binary value indicating the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the program.
[Aspect 25]
A system including:
a first subsystem configured to parse a multi-channel audio program to identify the channels of the program; and
a processing subsystem, coupled to the first subsystem and configured to perform distortion on the audio content of at least one channel of the program to a degree determined, at least in part, by at least one distortion degree parameter corresponding to the channel, wherein each distortion degree parameter indicates the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the program.
[Aspect 26]
The system according to aspect 25, wherein the distortion includes off-screen distortion determined, at least in part, by at least one off-screen distortion parameter, and the off-screen distortion parameter indicates at least one characteristic of off-screen distortion to be performed by the playback system on the corresponding audio content of the program.
[Aspect 27]
The system according to aspect 26, wherein the off-screen distortion includes distorting the undistorted position of an audio element along a width axis, at least substantially parallel to the plane of the playback screen, to a degree controlled by the off-screen distortion parameter as a function of the distance, at least substantially perpendicular to the plane of the playback screen, of the distorted position at which the audio element is to be rendered.
[Aspect 28]
The system according to any one of aspects 25 to 27, wherein the distortion includes determining a value Xs indicating the undistorted position, along a width axis, of an audio element to be rendered at a distorted position along the width axis, and determining the values
Xwarp,
YFACTOR = y^EXP, and
X' = x * YFACTOR + (1 - YFACTOR) * [XFACTOR * Xwarp + (1 - XFACTOR) * x],
where
Xwarp denotes the raw distorted position of the audio element along the width axis relative to an edge of the playback screen,
EXP is an off-screen distortion parameter,
YFACTOR indicates the degree of distortion along the width axis as a function of the distorted position y of the audio element along a depth axis at least substantially perpendicular to the plane of the playback screen,
X' denotes the distorted object position of the audio element along the width axis relative to the edge of the playback screen, and
XFACTOR is one of the distortion degree parameters.
[Aspect 29]
The system according to any one of aspects 25 to 28, wherein the program is an object-based audio program, and the first subsystem is configured to parse the program to identify at least one distortion degree parameter indicated by screen-related metadata of the program.
[Aspect 30]
The system according to aspect 29, wherein the program is indicative of at least two objects, the first subsystem is configured to independently determine at least one distortion degree parameter for each of the objects, and the processing subsystem is configured to perform distortion independently on the audio content indicative of each of the objects, to a degree determined, at least in part, by the at least one distortion degree parameter corresponding to that object.
[Aspect 31]
The system according to any one of aspects 25 to 30, wherein each of the distortion degree parameters is a non-binary value indicating the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the program.
[Aspect 32]
A system including:
a first subsystem configured to generate an object-based audio program; and
a second subsystem, coupled to the first subsystem and configured to generate, in response to the object-based audio program, a speaker channel-based program including at least one set of speaker channels intended for playback by loudspeakers positioned at predetermined locations relative to a playback screen, wherein the second subsystem is configured to generate the set of speaker channels including by distorting audio content of the object-based audio program to a degree determined, at least in part, by at least one distortion degree parameter, and each distortion degree parameter indicates the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the object-based audio program.
[Aspect 33]
The system according to aspect 32, wherein the second subsystem is configured to generate the speaker channel-based program so that it includes two or more selectable sets of speaker channels, at least one of the sets is indicative of undistorted audio content of the object-based audio program, generating at least one other of the sets includes distorting audio content of the object-based audio program to a degree determined, at least in part, by the distortion degree parameter, and said other of the sets is intended for playback by loudspeakers positioned at the predetermined locations relative to the playback screen.
[Aspect 34]
The system according to aspect 32 or 33, wherein the second subsystem is configured to generate the set of speaker channels including by performing off-screen distortion of audio content of the object-based audio program, and the off-screen distortion is determined, at least in part, by at least one off-screen distortion parameter indicative of at least one characteristic of the off-screen distortion.
[Aspect 35]
The system according to aspect 34, wherein the second subsystem is configured to control, in response to the off-screen distortion parameter, the degree to which the undistorted position of an object along a width axis, at least substantially parallel to the plane of the playback screen, is distorted, as a function of the distance, at least substantially perpendicular to the plane of the playback screen, of the distorted position at which the object is to be rendered.
[Aspect 36]
The system according to any one of aspects 32 to 35, wherein the second subsystem is configured to perform the distortion including by determining a value Xs indicating the undistorted position, along a width axis at least substantially parallel to the plane of the playback screen, of an audio element to be rendered at a distorted position along the width axis, and determining the values
Xwarp,
YFACTOR = y^EXP, and
X' = x * YFACTOR + (1 - YFACTOR) * [XFACTOR * Xwarp + (1 - XFACTOR) * x],
where
Xwarp denotes the raw distorted position of the object along the width axis relative to an edge of the playback screen,
EXP is an off-screen distortion parameter,
YFACTOR indicates the degree of distortion along the width axis as a function of the distorted position y of the object along a depth axis at least substantially perpendicular to the plane of the playback screen,
X' denotes the distorted object position of the object along the width axis relative to the edge of the playback screen, and
XFACTOR is one of the distortion degree parameters.
[Aspect 37]
The system according to any one of aspects 32 to 36, wherein the object-based audio program includes screen-related metadata indicative of the at least one distortion degree parameter, and the second subsystem is configured to parse the object-based audio program to identify each distortion degree parameter indicated by the screen-related metadata.
[Aspect 38]
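The system according to any one of aspects 32 to 37, wherein each of the distortion degree parameters is a non-binary value indicating the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the program.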
[Aspect 39]
A system for rendering a speaker channel-based program including at least one set of speaker channels indicative of distorted content, wherein the speaker channel-based program has been generated by processing an object-based audio program, including by distorting audio content of the object-based audio program to a degree determined, at least in part, by at least one distortion degree parameter and thereby generating the set of speaker channels indicative of distorted content, and each distortion degree parameter indicates the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the object-based audio program, the system including:
a first subsystem configured to parse the speaker channel-based program to identify the speaker channels of the speaker channel-based program, including each said set of speaker channels indicative of distorted content; and
a rendering subsystem, coupled to the first subsystem and configured to generate, in response to at least some of the speaker channels of the speaker channel-based program, including at least one said set of speaker channels indicative of distorted content, speaker feeds for driving loudspeakers positioned at predetermined locations relative to a playback screen.
[Aspect 40]
The system according to aspect 39, wherein the speaker channel-based program includes two or more selectable sets of speaker channels, at least one of the sets is indicative of undistorted audio content of the object-based audio program, another one of the sets is said set of speaker channels indicative of distorted content, and the first subsystem is configured to select, for rendering by the rendering subsystem, the one of the sets that is said set of speaker channels indicative of distorted content.
[Aspect 41]
The system of aspect 39 or 40, wherein each of the distortion parameters is a non-binary value indicating the maximum degree of distortion performed by the reproduction system on the corresponding audio content of the program.
[Aspect 42]
An audio processing unit including:
a buffer memory; and
at least one processing subsystem coupled to the buffer memory,
wherein the buffer memory stores at least one segment of an object-based audio program, the segment including audio content of at least one object channel indicative of at least one object, and screen-related metadata indicative of at least one distortion degree parameter for the at least one object, each distortion degree parameter indicating the maximum degree of distortion to be performed on the object by the playback system, and
the processing subsystem is coupled and configured to perform, using at least some of the screen-related metadata, at least one of rendering the object-based audio program, generating the object-based audio program, or decoding the object-based audio program.
[Aspect 43]
The audio processing unit according to aspect 42, wherein the program is indicative of at least two objects, the screen-related metadata indicates at least one distortion degree parameter for each of at least two of the objects, and each distortion degree parameter indicates the maximum degree of distortion to be performed on the corresponding object.
[Aspect 44]
The audio processing unit according to aspect 42 or 43, wherein the segment of the object-based audio program stored in the buffer memory is indicative of at least one off-screen distortion parameter for the at least one object, the off-screen distortion parameter indicates at least one characteristic of off-screen distortion to be performed on the object by the playback system, and the screen-related metadata included in the program indicates each off-screen distortion parameter.
[Aspect 45]
The audio processing unit according to aspect 44, wherein the off-screen distortion parameter controls the degree to which the undistorted position of the object along a width axis, at least substantially parallel to the plane of the playback screen, is distorted, as a function of the distance, at least substantially perpendicular to the plane of the playback screen, of the distorted position at which the object is to be rendered.
[Aspect 46]
The audio processing unit according to any one of aspects 42 to 45, wherein the distortion includes determining a value Xs indicating the undistorted position, along a width axis, of the object to be rendered at a distorted position along the width axis, and determining the values
Xwarp,
YFACTOR = y^EXP, and
X' = x * YFACTOR + (1 - YFACTOR) * [XFACTOR * Xwarp + (1 - XFACTOR) * x],
where
Xwarp denotes the raw distorted position of the audio element along the width axis relative to an edge of the playback screen,
EXP is an off-screen distortion parameter,
YFACTOR indicates the degree of distortion along the width axis as a function of the distorted position y of the audio element along a depth axis at least substantially perpendicular to the plane of the playback screen,
X' denotes the distorted object position of the audio element along the width axis relative to the edge of the playback screen, and
XFACTOR is one of the distortion degree parameters.
[Aspect 47]
The audio processing unit according to any one of aspects 42 to 46, wherein the audio processing unit is an encoder and the processing subsystem is configured to generate the object-based audio program.
[Aspect 48]
The audio processing unit according to any one of aspects 42 to 46, wherein the audio processing unit is a decoder and the processing subsystem is configured to decode the object-based audio program.
[Aspect 49]
The audio processing unit according to any one of aspects 42 to 48, wherein each of the distortion degree parameters is a non-binary value indicating the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the program.
[Aspect 50]
An audio processing unit including:
a buffer memory; and
at least one processing subsystem coupled to the buffer memory,
wherein the buffer memory stores at least one segment of a speaker channel-based audio program, the segment including audio content of at least one set of speaker channels of the speaker channel-based program intended for playback by loudspeakers positioned at predetermined locations relative to a playback screen, the set of speaker channels has been generated in response to an object-based audio program, including by distorting audio content of the object-based audio program to a degree determined, at least in part, by at least one distortion degree parameter, and each distortion degree parameter indicates the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the object-based audio program, and
the processing subsystem is configured to perform at least one of rendering the speaker channel-based audio program or decoding the speaker channel-based audio program.
[Aspect 51]
The audio processing unit according to aspect 50, wherein the at least one segment of the speaker channel-based audio program stored in the buffer memory includes audio content of two or more selectable sets of speaker channels, at least one of the sets is indicative of undistorted audio content of the object-based audio program, and at least one other of the sets has been generated in response to the object-based audio program, including by distorting audio content of the object-based audio program to a degree determined, at least in part, by the at least one distortion degree parameter.
[Aspect 52]
The audio processing unit according to aspect 50 or 51, wherein the set of speaker channels has been generated by a process including performing off-screen distortion determined, at least in part, by at least one off-screen distortion parameter.
[Aspect 53]
The audio processing unit according to aspect 52, wherein the off-screen distortion includes distorting the undistorted position of an audio element along a width axis, at least substantially parallel to the plane of the playback screen, to a degree controlled by the off-screen distortion parameter as a function of the distance, at least substantially perpendicular to the plane of the playback screen, of the distorted position at which the audio element is to be rendered.
[Aspect 54]
The audio processing unit according to any one of aspects 50 to 53, wherein the set of speaker channels has been generated by a process including performing distortion that includes determining a value Xs indicating the undistorted position, along a width axis at least substantially parallel to the plane of the playback screen, of an audio object to be rendered at a distorted position along the width axis, and determining the values
Xwarp,
YFACTOR = y^EXP, and
X' = x * YFACTOR + (1 - YFACTOR) * [XFACTOR * Xwarp + (1 - XFACTOR) * x],
where
Xwarp denotes the raw distorted position of the object along the width axis relative to an edge of the playback screen,
EXP is an off-screen distortion parameter,
YFACTOR indicates the degree of distortion along the width axis as a function of the distorted position y of the object along a depth axis at least substantially perpendicular to the plane of the playback screen,
X' denotes the distorted object position of the object along the width axis relative to the edge of the playback screen, and
XFACTOR is one of the distortion degree parameters.
[Aspect 55]
The audio processing unit according to any one of aspects 50 to 54, wherein the object-based audio program includes screen-related metadata indicative of the at least one distortion degree parameter, and the set of speaker channels has been generated by a process including parsing the object-based audio program to identify each distortion degree parameter indicated by the screen-related metadata.
[Aspect 56]
The audio processing unit according to any one of aspects 50 to 55, wherein the audio processing unit is a decoder.
[Aspect 57]
The audio processing unit according to any one of aspects 50 to 56, wherein each of the distortion degree parameters is a non-binary value indicating the maximum degree of distortion to be performed by the playback system on the corresponding audio content of the program.

Claims

A processing method for rendering an audio program, including:
receiving a distortion degree parameter; and
performing distortion on audio content of at least one audio object of the program to a degree determined, at least in part, by the distortion degree parameter and by the position of at least one playback speaker, wherein the distortion degree parameter indicates the maximum degree of distortion to be performed by a playback system on the audio object, and the possible values of the degree determined, at least in part, by the distortion degree parameter include no distortion, said maximum degree of distortion, and one or more intermediate values between no distortion and said maximum degree of distortion,
wherein performance of the distortion on the audio object is based on an index parameter indicating whether or not distortion is to be performed on the audio object.

The method of claim 1, wherein the distortion parameter is a non-binary value indicating the maximum degree of distortion performed by the reproduction system on the audio object.

A system for rendering an audio program, comprising one or more processors configured to perform the method of claim 1 or 2.

A storage medium having a software program adapted for execution on a processor for performing the method step of claim 1 or 2 when executed on a computing device.

A computer program product having executable instructions for performing the method according to claim 1 or 2 when executed on a computer.
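By way of non-limiting illustration, the selection of the degree of distortion recited in claim 1 may be sketched in Python as follows. The function name `effective_degree`, the playback-side `requested` setting, and the range 0 to 1 are assumptions made for this sketch. The dependence of the degree on the position of a playback speaker is not modeled here.

```python
def effective_degree(max_degree, requested, apply_flag):
    """Return the degree of distortion actually applied to an audio object.

    max_degree -- distortion degree parameter: the maximum degree permitted
                  for the object (assumed in [0, 1])
    requested  -- degree requested on the playback side (hypothetical input)
    apply_flag -- index parameter: whether distortion is performed at all
    """
    if not apply_flag:
        return 0.0  # no distortion for this object
    # Clamp to [0, max_degree]: none, maximum, or an intermediate value.
    return min(max(requested, 0.0), max_degree)
```

Because the result is clamped rather than switched, the applied degree can take the value of no distortion, the maximum degree, or any intermediate value, which is the non-binary behavior recited in claim 2.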