JP2007501553A

JP2007501553A - Apparatus and method for generating, storing and editing audio representations in an audio scene

Info

Publication number: JP2007501553A
Application number: JP2006522307A
Authority: JP
Inventors: フランクメルキオル; ヤンラングハマー; トーマスレダー; カトリーンムエニッヒ; サンドラブリックス
Original assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date: 2003-08-04
Filing date: 2004-08-02
Publication date: 2007-01-25
Anticipated expiration: 2024-08-02
Also published as: WO2005017877A3; JP4263217B2; EP1652405A2; CN1849845A; US20050105442A1; CN100508650C; WO2005017877A2; EP1652405B1; DE10344638A1; ATE390824T1; US7680288B2

Abstract

The audio signal processing circuit (12) has a plurality of input channels (16) carrying separate signals (EK1-EKm). The circuit may be a WFS (wave field synthesis) rendering unit. It has a plurality of output channels (14) carrying loudspeaker signals (LS1-LSn). A mapping device (18) provides the inputs for the audio signal processing circuit, allocating temporally non-overlapping audio objects to the same input channel. The mapping device receives its input from the device (10) that provides the audio scene with separate audio objects.

Description

The present invention is in the field of wave field synthesis and relates in particular to an apparatus and a method for generating, storing and editing an audio representation of an audio scene.

There is a growing need for new technologies and innovative products in the field of consumer electronics. An important prerequisite for the success of new multimedia systems is that they offer optimal functionality and performance. This is achieved through digital technology, and computer technology in particular. Examples are applications that offer a more realistic audiovisual impression. Prior-art audio systems have substantial weaknesses in the quality of spatial sound reproduction in both real and virtual environments.

Methods for multichannel loudspeaker reproduction of audio signals are well known and have been standardized for many years. All conventional techniques share the disadvantage that both the loudspeaker positions and the listener's position are already fixed by the transmission format. If the loudspeakers are placed incorrectly relative to the listener, audio quality degrades significantly. Optimal sound is achievable only within a very small area of the reproduction space, the so-called sweet spot.

A better, more natural spatial impression and greater envelopment during audio reproduction can be achieved with the help of a new technology. The principles of this technology, so-called wave field synthesis (WFS), were researched at TU Delft and first presented in the late 1980s (Berkhout, A. J.; de Vries, D.; Vogel, P.: Acoustic control by wave-field synthesis. JASA 93, 1993).

Because of the method's enormous demands on computing power and transmission rates, wave field synthesis has so far rarely been used in practice. Advances in microprocessor technology and audio coding, however, now make it possible to use the technique in specific applications. The first products for the professional sector are expected next year, and the first wave field synthesis applications for the consumer sector are expected to reach the market within a few years.

The basic idea of WFS is the application of Huygens' principle from wave theory:

Every point reached by a wave is the starting point of an elementary wave that propagates in a spherical or circular manner.

Applied to acoustics, any shape of incoming wavefront can be reproduced by a large number of loudspeakers arranged next to one another (a so-called loudspeaker array). In the simplest case, a single point source reproduced by a linear array of loudspeakers, the audio signal of each loudspeaker must be fed with a time delay and amplitude scaling such that the sound fields radiated by the individual loudspeakers superimpose correctly. With several sound sources, the contribution to each loudspeaker is computed separately for each source and the resulting signals are added. If the sources to be reproduced are located in a room with reflecting walls, the reflections can also be reproduced via the loudspeaker array as additional sources. The computational effort therefore depends strongly on the number of sound sources, the reflection properties of the recording room and the number of loudspeakers.
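The delay-and-scale principle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the renderer's actual algorithm: it assumes a 2-D geometry with the array on the line y = 0, a speed of sound of 343 m/s, whole-sample delays, and a plain 1/r amplitude weight instead of a complete WFS driving function (no pre-filter, no tapering window). The names `Speaker`, `delay_and_gain` and `render` are hypothetical.

```python
import math
from dataclasses import dataclass

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


@dataclass(frozen=True)
class Speaker:
    x: float  # metres along the array
    y: float  # metres; the array is assumed to lie on y = 0


def delay_and_gain(source, speaker, c=SPEED_OF_SOUND):
    """Delay in seconds and amplitude weight for feeding one virtual point
    source at `source` = (x, y) to one loudspeaker.

    Simplification: pure propagation delay r / c and a 1/r spreading weight,
    not the complete WFS driving function.
    """
    r = math.hypot(speaker.x - source[0], speaker.y - source[1])
    return r / c, 1.0 / max(r, 1e-3)  # clamp r to avoid division by zero


def render(sources, speakers, fs, n):
    """Loudspeaker signals of length n for several point sources.

    Each source's contribution is computed separately for every loudspeaker
    (a delayed, scaled copy of its signal) and the contributions are summed.
    """
    out = [[0.0] * n for _ in speakers]
    for position, signal in sources:
        for i, spk in enumerate(speakers):
            delay, gain = delay_and_gain(position, spk)
            d = round(delay * fs)  # nearest whole-sample delay
            for t in range(d, min(n, d + len(signal))):
                out[i][t] += gain * signal[t - d]
    return out
```

For a single impulse at (0.7, -2.0) m fed to eight loudspeakers spaced 0.2 m apart, the two loudspeakers nearest the source (x = 0.6 m and x = 0.8 m) fire first and loudest. Rendering two sources together gives the same result as rendering each one separately and summing, which is the superposition step mentioned in the text.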

A particular advantage of this technique is that a natural spatial sound impression is possible over a large part of the reproduction space. In contrast to known techniques, the direction and distance of sound sources are reproduced very accurately. To a limited extent, virtual sound sources can even be positioned between the real loudspeaker array and the listener.

Although wave field synthesis works well in environments whose properties are known, irregularities occur when those properties change or when wave field synthesis is performed on the basis of environmental properties that do not match the actual properties of the environment.

The wave field synthesis technique can, however, also be used advantageously to complement a visual perception with a corresponding spatial audio perception. Previously, production in virtual studios focused on conveying an authentic visual impression of the virtual scene. The acoustic impression matching the image is usually added to the audio signal afterwards by manual steps in so-called post-production, or it is neglected altogether because it is considered too costly and time-consuming to realize. This usually results in a contradiction between the individual sensory impressions, which makes the designed space, i.e. the designed scene, seem less authentic.

Generally speaking, audio material such as a film consists of a number of audio objects. An audio object is a sound source in the film setting. Consider, for example, a film scene in which two people stand facing each other in conversation while, at the same time, a rider and a train approach. Over a certain period of time there are then four sound sources in this scene: the two people, the approaching rider and the approaching train. Assuming the two people do not speak at the same time, at least two audio objects are active at any given instant, namely the rider and the train if both people are silent at that instant. If one person is speaking at another instant, three audio objects are active: the rider, the train and that person. If both people actually speak at the same time, four audio objects are active at that instant: the rider, the train, the first person and the second person.

Generally speaking, an audio object is a sound source in a film setting that is active, or "alive", at a certain time. This means that an audio object is further characterized by a start time and an end time. In the previous example, the rider and the train are active throughout the entire setting. As both approach, the listener perceives this by the rider's and the train's sounds becoming louder and, in an optimal wave field synthesis setting, by the positions of these sound sources changing accordingly. The two people in conversation, on the other hand, constantly create new audio objects: whenever one speaker stops talking, the current audio object ends, and when the other speaker starts talking, a new audio object starts, which ends again when that speaker stops. When the first speaker starts talking again, yet another audio object starts.
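To make the notion of an object's lifetime concrete, here is a minimal Python sketch of an audio object that carries a start time and an end time. The half-open interval [start, end), the class and function names, and the scene timings are assumptions chosen for illustration; the text only states that each object has a defined start and end time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioObject:
    name: str
    start: float  # seconds
    end: float    # seconds; the object is active on the half-open interval [start, end)

    def active_at(self, t: float) -> bool:
        return self.start <= t < self.end


def active_objects(scene, t):
    """Names of all audio objects that are 'alive' at time t."""
    return [obj.name for obj in scene if obj.active_at(t)]


# Hypothetical timings for the rider/train/dialogue scene.
scene = [
    AudioObject("rider", 0.0, 60.0),
    AudioObject("train", 0.0, 60.0),
    AudioObject("person A, line 1", 5.0, 12.0),
    AudioObject("person B, line 1", 12.0, 20.0),
    AudioObject("person A, line 2", 20.0, 26.0),
]

for t in (2.0, 8.0, 12.0):
    print(t, active_objects(scene, t))
# 2.0 ['rider', 'train']
# 8.0 ['rider', 'train', 'person A, line 1']
# 12.0 ['rider', 'train', 'person B, line 1']
```

With exclusive end times, one speaker's line ending at 12.0 s and the other's starting at 12.0 s are never counted as simultaneous, which matches the turn-taking described above.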

There are existing wave field synthesis renderers that can generate a certain number of loudspeaker signals from a certain number of input channels, given the individual positions of the loudspeakers in the wave field synthesis loudspeaker array.

The wave field synthesis renderer is, in a sense, the "heart" of a wave field synthesis system. It computes the loudspeaker signals for the many loudspeakers of the array with correct amplitude and phase, so that the user receives not only an optimal visual impression but also an optimal acoustic impression.

Since multichannel audio was introduced into film in the late 1960s, sound engineers have always aimed to give listeners the impression of actually being in the scene. Adding surround channels to the reproduction system was a further milestone. In the 1990s, new digital systems were introduced that increased the number of audio channels. Today, 5.1 and 7.1 systems are the standard for film reproduction.

In many cases these systems have proven to be a good means of creatively supporting the film experience, offering good possibilities for sound effects, ambient sounds and surround-mixed music. Wave field synthesis, on the other hand, is very flexible and offers maximum freedom in this respect.

However, the use of 5.1 and 7.1 systems has led to a "standardized" way of handling the mixing of film soundtracks.

Reproduction systems usually have fixed loudspeaker positions: in the case of 5.1, the left channel ("left"), center channel ("center"), right channel ("right"), surround left channel ("surround left") and surround right channel ("surround right"). Because of these few fixed positions, the ideal sound image sought by the sound engineer is limited to a small number of seats, the so-called sweet spot. Using virtual sound sources between the 5.1 positions mentioned above can improve matters in some cases, but does not always lead to satisfactory results.

Film sound usually consists of dialogue, sound effects, ambient sounds and music. Each of these elements is mixed taking the limitations of 5.1 and 7.1 systems into account. Dialogue is typically mixed into the center channel (in 7.1 systems, also into the half-left and half-right positions). This means that when an actor moves across the screen, the sound does not follow. Effects for moving sound objects can only be realized when the objects move quickly, so that the listener does not notice the sound jumping from one loudspeaker to the other.

Lateral sources cannot be positioned either, because of the large audible gap between the front and surround loudspeakers; as a result, objects cannot move slowly from back to front or vice versa.

Furthermore, surround loudspeakers are arranged as a diffuse array and therefore produce a sound image that forms a kind of envelope around the listener. Precisely positioned sources behind the listener are consequently avoided, so as to prevent the unpleasant interference sound field that accompanies such sources.

Wave field synthesis, a completely new way of building the sound field perceived by the listener, overcomes these fundamental shortcomings. What matters for cinema applications is that an accurate sound image can be achieved without restrictions on the two-dimensional positioning of objects. This opens up a wide range of possibilities for cinema sound design and mixing. Because wave field synthesis reproduces the complete sound image, sound sources can be positioned freely. Sources can also be placed as focused sources inside the listening space as well as outside it.

Furthermore, stable source directions and stable source positions can be generated using point-shaped radiating sources or plane waves. Finally, sources can be moved freely within, out of, or through the listening space.

This offers great potential for creative realism and makes it possible to position sources accurately in line with the picture on the screen, for example for all dialogue. As a result, the viewer can really be drawn into the film, both visually and acoustically.

For historical reasons, sound design, i.e. the work of the sound mixer, is based on the channel or track paradigm. This means that the encoding format or the number of loudspeakers, i.e. a 5.1 or 7.1 system, determines the reproduction setup. In particular, a particular sound system requires a particular encoding format. As a result, the master file cannot be changed at all without redoing the entire mix. For example, the dialogue track cannot be changed selectively in the final master file, i.e. without also changing all the other sounds in the scene.

The viewer/listener, on the other hand, does not care about channels: neither which sound system produces the sound nor whether the original sound content was represented in an object-oriented or a channel-oriented manner. Nor does the listener care whether, or how, the audio setting was mixed. All that matters to the listener is the sound impression, i.e. whether they like the sound setting of the film or not.

On the other hand, it is essential that a new concept be accepted by the people who will work with it. Sound mixers are responsible for the sound mix, and because of the channel paradigm they are "trained" to work in a channel-oriented manner. For them, the goal is, for example, to mix the six channels of a 5.1 cinema sound system. This is not about audio objects but about channels. In this case, the "objects" generally have neither a start time nor an end time; instead, each loudspeaker signal is active from the first second of the film to the last. This is because some sound is always being produced through one of the (few) loudspeakers of a typical cinema sound system: even if there is only background music, there must always be a source playing through a particular loudspeaker.

For this reason, existing wave field synthesis renderers are also used in a channel-oriented manner: they have a certain number of input channels and, when audio signals with associated information are fed into these input channels, they generate the loudspeaker signals for the individual loudspeakers or loudspeaker groups of a wave field synthesis loudspeaker array.

Wave field synthesis, however, makes an audio scene fundamentally "more transparent", in that, in principle, an unlimited number of audio objects can exist and be observed in a film, i.e. in an audio scene. Channel-oriented wave field synthesis renderers can become problematic when the number of audio objects in an audio scene exceeds the default maximum number of input channels of the audio processing means, which is typically the case. In addition, for the user, i.e. the sound mixer, creating an audio representation of an audio scene consisting of a large number of audio objects, each of which exists at one time but not at another, i.e. has a defined start time and a defined end time, would be confusing. This again raises a psychological barrier between sound mixers and wave field synthesis, even though wave field synthesis is expected to offer sound mixers great creative potential.

Berkhout, A. J.; de Vries, D.; Vogel, P.: "Acoustic control by wave-field synthesis", JASA 93, 1993.

It is the object of the present invention to provide a concept for generating, storing and editing an audio representation of an audio scene that will be widely accepted by users of the corresponding tools.

This object is achieved by an apparatus for generating, storing and editing an audio representation of an audio scene according to claim 1, a method for generating, storing and editing an audio representation of an audio scene according to claim 15, or a computer program according to claim 16.

The present invention is based on the finding that, for audio objects as they occur in a typical film setting, only an object-oriented description can be processed in a clear and efficient manner. An object-oriented description of the audio scene, with objects that have an audio signal and are associated with a defined start time and a defined end time, corresponds to the typical real-world situation, in which a sound rarely exists for the entire duration anyway. Instead, in a dialogue, for example, one partner typically starts talking and then stops again, and sounds in general usually have a beginning and an end. In this respect, an object-oriented audio scene description that assigns each real-world sound source its own object matches the natural situation and is therefore optimal in terms of transparency, clarity, efficiency and comprehensibility.

Sound mixers, on the other hand, who want to create an audio representation of an audio scene, i.e. who want to bring their creative potential to bear on the audio representation of an audio scene in a cinema, possibly also taking special sound effects into account, usually work with either a hardware or a software mixing desk because of the channel paradigm. Such a mixing desk is the consistent implementation of the channel-oriented way of working: each channel has faders, buttons, etc., with which the audio signal in that channel is manipulated, i.e. "mixed".

According to the present invention, a balance between the object-oriented audio representation, which does justice to real life, and the channel-oriented representation, which does justice to the sound mixer, is achieved by a mapping means that maps the object-oriented description of the audio scene onto a plurality of input channels of an audio processing means such as a wave field synthesis renderer. According to the invention, the mapping means is configured to assign a first audio object to an input channel, to assign a second audio object, whose start time lies after the end time of the first audio object, to the same input channel, and to assign a third audio object, whose start time lies after the start time of the first audio object and before its end time, to another of the plurality of input channels.
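The three assignment rules can be met by a simple greedy procedure, sketched below in Python. The text does not specify how the mapping means chooses a channel, so this is only one possible realization: objects are visited in order of start time and each goes to the lowest-numbered channel that is free again (classic greedy interval partitioning). Treating end times as exclusive, the optional `max_channels` limit, and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AudioObject:
    name: str
    start: float  # seconds
    end: float    # seconds; exclusive, so an object may start exactly when another ends


def map_to_channels(objects: List[AudioObject],
                    max_channels: Optional[int] = None) -> Dict[str, int]:
    """Assign audio objects to renderer input channels 0, 1, 2, ...

    Objects are visited in order of start time. Each object is placed on the
    lowest-numbered channel whose previous object has already ended; only if
    every open channel is still busy is a new channel opened.
    """
    free_at: List[float] = []  # free_at[k]: end time of the last object on channel k
    assignment: Dict[str, int] = {}
    for obj in sorted(objects, key=lambda o: (o.start, o.end)):
        k = next((i for i, t in enumerate(free_at) if t <= obj.start), None)
        if k is None:  # overlaps in time with the current object on every open channel
            if max_channels is not None and len(free_at) >= max_channels:
                raise ValueError(f"{obj.name!r} would need more than {max_channels} input channels")
            k = len(free_at)
            free_at.append(obj.end)
        else:  # channel k is free again: reuse it
            free_at[k] = obj.end
        assignment[obj.name] = k
    return assignment


# The second object starts after the first ends; the third starts while the first is running.
first = AudioObject("first", 0.0, 10.0)
second = AudioObject("second", 12.0, 20.0)
third = AudioObject("third", 5.0, 15.0)
print(map_to_channels([first, second, third]))  # {'first': 0, 'third': 1, 'second': 0}
```

Choosing the lowest-numbered free channel is a design choice, not a requirement: any free channel would satisfy the rules and, because objects are visited in start-time order, the number of channels opened equals the largest number of objects active at once. Preferring low indices simply keeps the occupied channels packed together, which may make a channel display easier to read.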

This temporal assignment, in which simultaneously occurring audio objects are assigned to different input channels of the wave field synthesis renderer while successively occurring audio objects are assigned to the same input channel, has proven to be highly channel-efficient. On average, only a relatively small number of the renderer's input channels is occupied, which on the one hand aids clarity and on the other hand benefits the computational efficiency of the computation-intensive wave field synthesis renderer. Because the number of simultaneously occupied channels is relatively small on average, the user, for example the sound mixer, can quickly get an overview of the complexity of the audio scene at a particular time without having to search through a large number of input channels to find out which objects are active at that moment and which are not. On the other hand, the user can easily manipulate the audio objects of the object-oriented representation using the channel controls they are used to.
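A short calculation illustrates the channel efficiency claimed above. With a 1:1 conversion, every audio object occupies its own input channel; when non-overlapping objects share channels, the number needed drops to the largest number of objects active at the same instant. The sweep-line function below computes that peak. The scene timings and the function name are hypothetical, and end times are assumed to be exclusive.

```python
from typing import Iterable, Tuple


def peak_concurrency(intervals: Iterable[Tuple[float, float]]) -> int:
    """Largest number of half-open intervals [start, end) active at one instant."""
    events = []
    for start, end in intervals:
        events.append((start, +1))
        events.append((end, -1))
    # Sorting puts (t, -1) before (t, +1), so an object ending at t and one
    # starting at t are not counted as simultaneous.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Hypothetical scene: rider and train throughout, alternating dialogue lines.
scene = {
    "rider": (0.0, 60.0),
    "train": (0.0, 60.0),
    "person A, line 1": (5.0, 12.0),
    "person B, line 1": (12.0, 20.0),
    "person A, line 2": (20.0, 26.0),
    "person B, line 2": (26.0, 30.0),
}
print("1:1 conversion:", len(scene), "channels")                    # 6 channels
print("with channel reuse:", peak_concurrency(scene.values()), "channels")  # 3 channels
```

In this example the 1:1 conversion needs six input channels, while reusing channels for non-overlapping dialogue lines needs only three, and the saving grows with every further line of dialogue.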

The reason the inventive concept can be expected to be widely accepted is that it offers users a familiar working environment while at the same time containing considerably greater innovative potential. The inventive concept, based on mapping an object-oriented audio approach onto a channel-oriented rendering approach, therefore does justice to all requirements. On the one hand, as already mentioned, the object-oriented description of an audio scene matches the natural situation best and is therefore efficient and clear. On the other hand, the user's habits and needs are taken into account, in that the technology is adapted to the user rather than the other way round.

Preferred embodiments of the present invention are explained in more detail below with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of an inventive apparatus for generating an audio representation.
FIG. 2 is a schematic illustration of a user interface for the concept shown in FIG. 1.
FIG. 3a is a schematic illustration of the user interface of FIG. 2 according to an embodiment of the present invention.
FIG. 3b is a schematic illustration of the user interface of FIG. 2 according to another embodiment of the present invention.
FIG. 4 is a block diagram of an inventive apparatus according to a preferred embodiment.
FIG. 5 is a time diagram of an audio scene with various audio objects.
FIG. 6 compares a 1:1 conversion between objects and channels with the object-to-channel assignment according to the present invention for the audio scene shown in FIG. 5.

FIG. 1 shows a block diagram of an inventive apparatus for generating an audio representation of an audio scene. The inventive apparatus comprises means 10 for providing an object-oriented description of the audio scene. The object-oriented description of the audio scene comprises a plurality of audio objects, each of which is associated with at least an audio signal, a start time and an end time. The inventive apparatus further comprises audio processing means 12 for generating a plurality of loudspeaker signals LSi 14. The audio processing means is channel-oriented and generates the plurality of loudspeaker signals 14 from a plurality of input channels EKi. Between the providing means 10 and the channel-oriented audio signal processing means, which may be formed, for example, as a WFS renderer, there is a mapping means 18 that maps the object-oriented description of the audio scene onto the plurality of input channels 16 of the channel-oriented audio signal processing means 12. The mapping means 18 is configured to assign a first audio object to an input channel, such as EK1, to assign a second audio object, whose start time lies after the end time of the first audio object, to the same input channel, such as EK1, and to assign a third audio object, whose start time lies after the start time of the first audio object and before its end time, to another of the plurality of input channels, such as EK2. The mapping means 18 is thus configured to assign temporally non-overlapping audio objects to the same input channel and temporally overlapping audio objects to different parallel input channels.

In a preferred embodiment, the channel-oriented audio signal processing means 12 includes a wave field synthesis (WFS) rendering device. Each audio object is then additionally associated with a virtual position, which may change during the object's lifetime. This covers, for example, a rider approaching the center of the scene: the gallop becomes louder and, in particular, moves closer and closer to the listening space. In this case an audio object comprises not only its audio signal, start time and end time, but also the time-varying position of the virtual source and, where applicable, further properties, such as whether it should have point-source characteristics or should emit a plane wave, which corresponds to a virtual position at an infinite distance from the listener. Further properties of the source, i.e. of the audio object, may also be known, and the channel-oriented audio signal processing means 12 of FIG. 1 may take them into account.
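An audio object with a moving virtual position can be sketched as a small data structure. The keyframe format `(time, x, y)`, linear interpolation between keyframes, and the strings `"point"` and `"plane"` for the source type are assumptions made for illustration; the patent only states that the position may vary over time and that point-source or plane-wave behaviour may be specified.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class PositionedAudioObject:
    name: str
    start: float
    end: float
    # Hypothetical keyframes (time_s, x_m, y_m), sorted by strictly increasing time.
    keyframes: list = field(default_factory=list)
    # "point" for a point source, "plane" for a plane wave (source at infinity).
    source_type: str = "point"

    def position_at(self, t):
        """Virtual source position at time t, linearly interpolated between keyframes."""
        if not self.start <= t <= self.end:
            raise ValueError(f"{self.name} is not alive at t={t}")
        times = [k[0] for k in self.keyframes]
        i = bisect_right(times, t)
        if i == 0:                       # before the first keyframe: hold it
            return tuple(self.keyframes[0][1:])
        if i == len(self.keyframes):     # after the last keyframe: hold it
            return tuple(self.keyframes[-1][1:])
        (t0, x0, y0), (t1, x1, y1) = self.keyframes[i - 1], self.keyframes[i]
        a = (t - t0) / (t1 - t0)
        return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


# A rider approaching from 20 m to 2 m in front of the listening area over 10 s.
rider = PositionedAudioObject("rider", 0.0, 10.0, [(0.0, 0.0, 20.0), (10.0, 0.0, 2.0)])
print(rider.position_at(5.0))
```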

According to the invention, the device is structured hierarchically. The channel-oriented audio signal processing means, which receives the audio objects, is not connected directly to the providing means but via the mapping means. As a result, information about the entire audio scene is held only in the providing means, and neither the mapping means nor the channel-oriented audio signal processing means has to store knowledge of the complete audio setup. Instead, both the mapping means 18 and the audio signal processing means 12 operate according to the audio scene instructions supplied by the providing means 10.

In a preferred embodiment of the invention, the device of FIG. 1 further comprises a user interface, shown as 20 in FIG. 2. The user interface 20 has one user interface channel per input channel and preferably a control element for each user interface channel. Because the user interface 20 displays the occupancy of the input channels EK1 to EKm, it is connected to the mapping means 18 via its user interface input 22 in order to obtain the assignment information. On the output side, if the user interface channels have control functions, the user interface 20 is connected to the providing means 10. In particular, the user interface 20 is configured to supply the providing means 10 with audio objects that have been manipulated relative to their original version. The providing means thus obtains a modified audio scene and passes it to the mapping means 18, which distributes it accordingly over the input channels of the channel-oriented audio signal processing means 12.
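The occupancy that the user interface 20 displays can be derived from the mapping result. The sketch below assumes that "occupancy" means the fraction of the scene duration during which a channel carries an object; that definition, the function name `channel_occupancy` and the example timings are hypothetical.

```python
from collections import defaultdict


def channel_occupancy(assignment, spans, scene_length):
    """Fraction of the scene during which each used input channel carries an object.

    assignment: object name -> input channel number (output of the mapping step)
    spans: object name -> (start, end) in seconds
    """
    busy = defaultdict(float)
    for name, channel in assignment.items():
        start, end = spans[name]
        busy[channel] += end - start   # objects on one channel never overlap
    return {channel: busy[channel] / scene_length for channel in sorted(busy)}


spans = {"A": (0, 10), "B": (10, 25), "C": (5, 15)}
assignment = {"A": 1, "B": 1, "C": 2}
print(channel_occupancy(assignment, spans, scene_length=40))
```

Summing durations per channel is valid only because the mapping guarantees that objects on the same channel never overlap.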

Depending on the embodiment, the user interface 20 is configured as shown in FIG. 3a, i.e. it always shows only the current objects, or as shown in FIG. 3b, i.e. it always shows all objects of an input channel. In both FIG. 3a and FIG. 3b, a timeline 30 contains objects A, B and C in chronological order. Object A has a start time 31a and an end time 31b. In FIG. 3a, the end time 31b of the first object A happens to coincide with the start time of the second object B. Object B has an end time 32b, which in turn happens to coincide with the start time of the third object C, and object C has an end time 33b. The start times 32a and 33a coincide with the end times 31b and 32b and are not shown in FIGS. 3a and 3b for simplicity.

In the mode shown in FIG. 3a, only the current object is displayed as a user interface channel. The right-hand side of FIG. 3a shows a mixing desk channel symbol 34 with a slider 35 and a button 36, with which the properties of the audio signal of object B, its virtual position, and so on can be changed. As soon as the time mark 37 in FIG. 3a reaches the end time 32b of object B, the channel symbol 34 shows object C instead of object B. If, for example, an object D occurred at the same time as object B, the user interface of FIG. 3a would additionally show a further channel, such as input channel i+1. The representation of FIG. 3a gives the sound engineer a clear overview of the number of parallel audio objects at any given time, i.e. of the number of channels actually active. Input channels that are not active are not displayed at all in this embodiment of the user interface 20 of FIG. 2.
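The FIG. 3a display mode amounts to asking, at the current time mark, which object each channel is carrying and hiding channels that carry nothing. In this sketch the function name, the tuple layout and the timings (including object D on a second channel) are hypothetical; half-open intervals make the view switch from B to C exactly at B's end time, as described.

```python
def active_strips(channel_objects, t):
    """FIG. 3a-style view: one strip per input channel that is active at time t.

    channel_objects maps an input channel number to its objects as
    (name, start, end) tuples. Intervals are half-open, so at a boundary the
    next object is shown; inactive channels are left out entirely.
    """
    strips = []
    for channel in sorted(channel_objects):
        for name, start, end in channel_objects[channel]:
            if start <= t < end:
                strips.append((channel, name))
                break
    return strips


channels = {
    1: [("A", 0, 10), ("B", 10, 20), ("C", 20, 30)],
    2: [("D", 12, 18)],
}
print(active_strips(channels, 15))
```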

In the embodiment shown in FIG. 3b, all objects of an input channel are displayed side by side. Here, too, unused input channels are not displayed. However, input channel i, to which objects are assigned one after another in time, is represented three times: once as object channel A, once as object channel B and once as object channel C. The channel currently in use, for example input channel i for object B (reference numeral 38 in FIG. 3b), is preferably highlighted, for example by color or brightness. This gives the sound engineer a clear overview of which object is currently fed to channel i and which further objects will run on this channel sooner or later, so that he can anticipate that manipulating the channel via its software or hardware regulator or channel switch will affect the audio signal of that object. The user interface 20 of FIG. 2, and in particular its embodiments of FIGS. 3a and 3b, is thus configured to provide, if desired, a visual representation of the "occupancy" of the input channels of the channel-oriented audio signal processing means as generated by the mapping means 18.
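The FIG. 3b mode can be mimicked as a text rendering in which every object of each used channel appears and the current one is marked. The `*` marker standing in for color or brightness highlighting, the `EK` row labels and the example timings are hypothetical choices for this sketch.

```python
def channel_overview(channel_objects, t):
    """FIG. 3b-style view: all objects of every used input channel side by side.

    The object that is current at time t is marked with '*', standing in for
    the color or brightness highlighting (reference numeral 38).
    """
    rows = []
    for channel in sorted(channel_objects):
        cells = []
        for name, start, end in channel_objects[channel]:
            marker = "*" if start <= t < end else " "
            cells.append(marker + name)
        rows.append(f"EK{channel}: " + " | ".join(cells))
    return "\n".join(rows)


channels = {
    1: [("A", 0, 10), ("B", 10, 20), ("C", 20, 30)],
    2: [("D", 12, 18)],
}
print(channel_overview(channels, 15))
```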

FIG. 5 gives a simple example of how the mapping means 18 of FIG. 1 works. It shows an audio scene with audio objects A, B, C, D, E, F and G. Objects A, B, C and D overlap in time: all four are active at a particular time 50. Object E, by contrast, does not overlap objects A and B; as can be seen at time 52, it overlaps only objects C and D. Objects F and D also overlap, as can be seen at time 54, and so do objects F and G, for example at time 56, whereas object G overlaps none of objects A, B, C, D and E.
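The overlap relations of FIG. 5 can be checked mechanically. The patent gives no numeric times, so the start and end times below are hypothetical values chosen to reproduce the overlaps described; the pairwise test is the standard one for intervals.

```python
from itertools import combinations

# Hypothetical start/end times (seconds) chosen to reproduce the overlaps
# described for FIG. 5; the patent gives no numeric times.
SCENE = {
    "A": (0, 10), "B": (1, 11), "C": (2, 20), "D": (3, 30),
    "E": (12, 25), "F": (26, 35), "G": (32, 40),
}


def overlaps(a, b):
    """Two (start, end) spans overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def overlapping_pairs(scene):
    """All pairs of objects that are active at the same time at least once."""
    return sorted(
        (x, y) for x, y in combinations(sorted(scene), 2)
        if overlaps(scene[x], scene[y])
    )


for pair in overlapping_pairs(SCENE):
    print(*pair)
```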

A simple channel assignment, which is disadvantageous in many cases, would give each audio object of the example of FIG. 5 its own input channel. This yields the 1:1 conversion shown in the left column of the table of FIG. 6. The drawback of this concept is that many input channels are needed: when there are many audio objects, which happens very quickly in a film, the number of input channels of the WFS rendering device limits the number of virtual sources that can be processed in a real film setting. This is undesirable, since technical limitations should not restrict creative possibilities. The 1:1 conversion is also hard to follow: in principle every input channel may receive an audio object, but in a particular audio scene only relatively few input channels are usually active at any one time, while the overview always shows all of them, so the user cannot easily keep track of the scene.

Furthermore, with a 1:1 assignment of audio objects to the input channels of the audio processing means, restricting the number of audio objects as little as possible, or not at all, requires audio processing means with a very large number of input channels. This immediately increases the computational load: computing the individual loudspeaker signals then demands more processing power and memory in the audio processing means, which directly raises the price of such a system.

The inventive object-to-channel assignment for the example of FIG. 5, as performed by the mapping means 18, is shown in the right column of the table of FIG. 6. The parallel audio objects A, B, C and D are assigned in turn to the input channels EK1, EK2, EK3 and EK4. Object E, however, need not be assigned to input channel EK5, as in the 1:1 conversion in the left column of FIG. 6, but can be assigned to a free channel, for example input channel EK1 or, as indicated in parentheses, input channel EK2. The same applies to object F, which in principle can be assigned to any channel except input channel EK4, and to object G, which can be assigned to any channel except the one to which object F was previously assigned (input channel EK1 in this example).
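The freedom described for FIG. 6 can be reproduced with a small sketch that lists, for each object, the channels it could occupy given the objects already placed. The start and end times are hypothetical values consistent with the overlaps described for FIG. 5, and `free_channels` and `lowest_free_assignment` are illustrative names, not taken from the patent.

```python
# Hypothetical start/end times (seconds) consistent with the overlaps
# described for FIG. 5; the patent gives no numeric times.
SCENE = {
    "A": (0, 10), "B": (1, 11), "C": (2, 20), "D": (3, 30),
    "E": (12, 25), "F": (26, 35), "G": (32, 40),
}


def free_channels(scene, assignment, name, n_channels):
    """Channels 1..n_channels on which `name` would not overlap an assigned object."""
    start, end = scene[name]
    busy = {
        channel for other, channel in assignment.items()
        if scene[other][0] < end and start < scene[other][1]
    }
    return [ch for ch in range(1, n_channels + 1) if ch not in busy]


def lowest_free_assignment(scene):
    """Greedy mapping: lowest-numbered free channel, opening a new one only if needed."""
    assignment, n_channels = {}, 0
    for name in sorted(scene, key=lambda n: scene[n]):
        options = free_channels(scene, assignment, name, n_channels)
        if options:
            assignment[name] = options[0]
        else:
            n_channels += 1
            assignment[name] = n_channels
    return assignment, n_channels


print(free_channels(SCENE, {"A": 1, "B": 2, "C": 3, "D": 4}, "E", 4))
assignment, n_channels = lowest_free_assignment(SCENE)
print(assignment)
print(f"{n_channels} input channels instead of {len(SCENE)} for a 1:1 mapping")
```

With these timings, E may go to EK1 or EK2, F to any channel except EK4, and G to any channel except EK1 once F has been placed there, which matches the options listed for FIG. 6. The greedy choice of the lowest free channel needs four input channels where the 1:1 conversion needs seven.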

In a preferred embodiment of the invention, the mapping means 18 is configured always to occupy the channel with the lowest possible ordinal number and, whenever possible, adjacent input channels EKi and EKi+1, so that no holes arise. This "neighborhood feature" is not essential, however. It does not matter to the user of the audio authoring system according to the invention whether he is currently operating the first, the seventh or any other input channel of the audio processing means, as long as the inventive user interface lets him operate exactly this channel, for example with the regulator 35 or the button 36 of the mixing desk channel symbol 34 of the current channel. User interface channel i therefore need not correspond to input channel i; the channels can be assigned so that user interface channel i corresponds, for example, to input channel EKm, user interface channel i+1 to input channel EKk, and so on.

This user interface channel remapping also avoids channel holes: the sound engineer always sees the currently active user interface channels immediately and clearly next to one another.
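The remapping of user interface channels onto scattered input channels can be as simple as numbering the active input channels consecutively. The function names `ui_strip_map` and `route_fader` and the example channel numbers are hypothetical; the point is only that console strip numbers and input channel numbers are decoupled.

```python
def ui_strip_map(active_input_channels):
    """Give the currently active input channels consecutive console strips 1..n.

    The strips have no holes even when the active input channels are
    scattered, e.g. EK2, EK7 and EK41.
    """
    return {strip: channel
            for strip, channel in enumerate(sorted(active_input_channels), start=1)}


def route_fader(strip_map, strip, value):
    """Translate a fader move on a UI strip into (input channel, value)."""
    return strip_map[strip], value


strips = ui_strip_map({7, 2, 41})
print(strips)                        # console strip -> input channel
print(route_fader(strips, 2, -6.0))  # strip 2 controls EK7
```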

The inventive user interface concept can, of course, also be transferred to an existing hardware mixing console with real hardware faders and buttons, which the sound engineer operates by hand to achieve the best audio mix. An advantage of the invention is that such a hardware console, which sound engineers generally know well and rely on, can always be used for exactly the current channels. The current channel is clearly indicated to the sound engineer, for example by LEDs or similar indicators that are usually present on a mixing console.

The invention is also more flexible in that it decouples the wave field synthesis loudspeaker setup used for production from the playback setup, for example in a cinema. According to the invention, audio content is encoded in a format that can be rendered by various systems. This format is an audio scene, i.e. an object-oriented audio representation rather than a loudspeaker-signal representation. Rendering is thus understood as adapting the content to the playback system. In wave field synthesis playback, not just a few master channels but the entire object-oriented scene description is processed. The scene is rendered anew for each playback, generally in real time, so that it is adapted to the current situation. This adaptation typically takes into account the number and positions of the loudspeakers, characteristics of the playback system such as frequency response and sound pressure level, the room acoustics, or even the picture playback conditions.
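The benefit of rendering at playback time can be illustrated by feeding the same object-oriented source to two different loudspeaker arrays. This is a greatly simplified delay-and-gain sketch, not the WFS driving function of the rendering device: a real driving function adds a spectral pre-filter, directional weighting and loudspeaker selection. The speed of sound, the array geometries and the 1/sqrt(r) gain law are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def linear_array(n, spacing):
    """n loudspeakers on the x axis, centred on the origin, `spacing` metres apart."""
    return [((i - (n - 1) / 2) * spacing, 0.0) for i in range(n)]


def point_source_feeds(source, speakers):
    """(delay_s, gain) per loudspeaker for a virtual point source at `source`.

    Greatly simplified: the delay is the propagation time from the virtual
    source to each loudspeaker and the gain falls off as 1/sqrt(distance).
    """
    feeds = []
    for sx, sy in speakers:
        r = math.hypot(sx - source[0], sy - source[1])
        # max() guards against division by zero for a source on a loudspeaker
        feeds.append((r / SPEED_OF_SOUND, 1.0 / math.sqrt(max(r, 0.1))))
    return feeds


# The same object-oriented source rendered for two different playback setups.
source = (0.0, -3.0)   # virtual source 3 m behind the array
small_room = point_source_feeds(source, linear_array(8, 0.5))
large_room = point_source_feeds(source, linear_array(16, 0.25))
print(len(small_room), len(large_room))
```

Only the source position travels with the content; the number of loudspeaker feeds is decided by the playback array at render time.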

Compared with the channel-based approach of current systems, a major difference of wave field synthesis mixing is that sound objects can be positioned freely. In conventional playback systems based on the stereophonic principle, source positions are encoded relatively. This matters for mixing concepts that belong to visual content such as films, because the placement of sources relative to the picture is approximated through a correct system setup.

A wave field synthesis system, however, requires absolute positions for the sound objects. The position is supplied with the audio object, together with its start and end times, as additional information accompanying the object's audio signal.

In conventional channel-oriented approaches, the basic idea was to reduce the number of tracks in several premixing stages. These premixes are organized by category, such as dialog, music, sound and effects. During mixing, all required audio signals are fed to the mixing console and mixed simultaneously by different sound engineers. Each premix reduces the number of tracks until one track per playback loudspeaker remains. These final tracks form the final master file (final master).

All relevant mixing tasks, such as equalization, dynamics processing and positioning, are performed on the mixing desk or with additional dedicated equipment.

The aim of re-engineering the post-production process is to minimize user training and to integrate the new inventive system into the users' existing knowledge. In the wave field synthesis application of the invention, all tracks or objects to be rendered at different positions exist in the master file/distribution format, in contrast to conventional production facilities, which are optimized to reduce the number of tracks during production. On the other hand, for practical reasons, re-recording engineers must be able to use their existing mixing consoles for wave field synthesis productions.

Therefore, according to the invention, existing mixing consoles are used for the conventional mixing tasks. Their outputs are then fed into the inventive system, which performs the spatial mixing to produce the audio representation of the audio scene. The wave field synthesis authoring tool according to the invention is implemented as a workstation, i.e. it can record the audio signals of the final mix and convert them into a distribution format in a separate step. Two aspects are therefore taken into account. First, all audio objects or tracks still exist in the final master. Second, positioning is not performed on the mixing console. This means that the so-called authoring, i.e. the sound engineer's post-processing, is one of the last steps of the production chain. The wave field synthesis system according to the invention, i.e. the inventive device for generating an audio representation, is implemented as a standalone workstation. It can be integrated into different production environments by feeding the audio outputs of the mixing desk into it. In this respect, the mixing desk represents a user interface connected to the device for generating an audio representation of an audio scene.

A system according to a preferred embodiment of the invention is shown in FIG. 4. Reference numerals that also appear in FIG. 1 or FIG. 2 denote the same elements. The basic system design aims at modularity and at the ability to integrate an existing mixing console as a user interface into the inventive wave field synthesis authoring system.

For this reason, a central controller 120 that communicates with the other modules is provided within the audio processing means 12. This makes it possible to use alternative implementations of individual modules, as long as they all use the same communication protocol. Viewed as a black box, the system of FIG. 4 has many inputs (from the providing means 10) and many outputs (the loudspeaker signals 14), as well as the user interface 20. Inside the black box, next to the user interface, is the actual WFS rendering device 122, which performs the actual wave field synthesis computation of the loudspeaker signals from various input information. A room simulation module 124 is also provided; it performs room simulations used to generate or manipulate the room characteristics of the recording room.

An audio recording means and a playback means (both designated 126) are also provided. The means 126 preferably has an external input. In this case, all audio signals are provided and fed in either already in object-oriented form or still in channel-oriented form. The audio signals then do not come from the scene protocol, which then performs only control tasks. If necessary, the means 126 converts the supplied audio data into an object-based representation and feeds it internally to the mapping means 18, which then performs the object/channel mapping.

All audio connections between the modules can be switched by a matrix module 128, which connects the corresponding channels as requested by the central controller 120. In the preferred embodiment, the user can feed virtual source signals into the audio processing means 12 on 64 input channels, so this embodiment has 64 input channels EK1 to EKm. Existing consoles can be used as a user interface for premixing the virtual source signals. The spatial mixing is then performed by the wave field synthesis authoring system and, in particular, by its core, the WFS rendering device 122.
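The switchable routing of the matrix module 128 can be sketched as a crosspoint table in which each renderer input channel has at most one source. The class `RoutingMatrix`, the `(module, output channel)` source tuples and the module label `"playback"` are hypothetical; only the 64 input channels come from the preferred embodiment.

```python
class RoutingMatrix:
    """Crosspoint matrix sketch: each renderer input channel has at most one source."""

    def __init__(self, n_inputs=64):      # 64 input channels, as in the embodiment
        self.n_inputs = n_inputs
        self.routes = {}                  # input channel -> (module, output channel)

    def connect(self, source, input_channel):
        """Connect a (module, output channel) source; replaces any previous route."""
        if not 1 <= input_channel <= self.n_inputs:
            raise ValueError(f"input channel must be in 1..{self.n_inputs}")
        self.routes[input_channel] = source

    def disconnect(self, input_channel):
        self.routes.pop(input_channel, None)

    def source_of(self, input_channel):
        return self.routes.get(input_channel)


# The central controller asks the matrix to feed playback output 3 into EK1.
matrix = RoutingMatrix()
matrix.connect(("playback", 3), 1)
print(matrix.source_of(1))
```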

The complete scene description, also called the scene protocol, is stored in the providing means 10. The main communication and the required data traffic, however, are handled by the central controller 120. Changes to the scene description can be made, for example, via the user interface 20, in particular via the hardware mixing console 200 or the software GUI (graphical user interface) 202, and are supplied to the providing means 10 as a modified scene protocol via the user interface controller 204. The modified scene protocol uniquely represents the complete logical structure of the scene.

To implement the object-oriented approach, the mapping means 18 associates each sound object with a rendering channel (input channel). An object exists for a certain time. Usually, as shown in FIGS. 3a, 3b and 6, many objects occur one after another on a given channel. The inventive authoring system supports this object orientation, but the WFS rendering device itself knows nothing about objects. It simply receives the signals in the audio channels and a description of how to render these channels. The providing means, which knows the scene protocol, i.e. the objects and their associated channels, can convert object-related metadata (for example the source position) into channel-related metadata and send it to the WFS rendering device 122. Communication between the other modules uses dedicated protocols, shown schematically as functional protocol 129 in FIG. 4, so that each module receives only the information it needs.
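The conversion from object-related to channel-related metadata can be sketched as turning each object's lifetime and position into time-stamped events per input channel. The event dictionary layout and the `"render"`/`"mute"` actions are hypothetical; what the sketch does reflect from the description is that the renderer sees channels, times and positions but no object identities.

```python
def channel_metadata(assignment, objects):
    """Convert object-related metadata into channel-related render events.

    assignment: object name -> input channel (result of the mapping step)
    objects: object name -> {"start", "end", "position"} (hypothetical schema)
    The returned events carry only times, channels and positions, so the
    renderer never needs to know which object a channel is carrying.
    """
    events = []
    for name, channel in assignment.items():
        obj = objects[name]
        events.append({"time": obj["start"], "channel": channel,
                       "action": "render", "position": obj["position"]})
        events.append({"time": obj["end"], "channel": channel,
                       "action": "mute", "position": None})
    # At equal times, "mute" sorts before "render" so a reused channel is
    # released before the next object on it starts.
    return sorted(events, key=lambda e: (e["time"], e["action"] == "render", e["channel"]))


assignment = {"A": 1, "B": 1}
objects = {
    "A": {"start": 0.0, "end": 10.0, "position": (1.0, 2.0)},
    "B": {"start": 10.0, "end": 20.0, "position": (-3.0, 4.0)},
}
for event in channel_metadata(assignment, objects):
    print(event)
```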

The inventive control module also supports storing scene descriptions on hard disk. Two file formats are preferably distinguished. One is an authoring format, in which the audio data are stored as PCM data, while session-related information, such as the grouping of audio objects (sound sources) and layer information, is stored in a proprietary XML-based file format.
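The XML side of the authoring format can be sketched with Python's standard `xml.etree.ElementTree`. The patent says only that the format is proprietary and XML-based, so every element and attribute name here (`session`, `object`, `group`, `member`, `layer`, and so on) and the example file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def session_xml(objects, groups):
    """Serialize authoring-session data to an XML string.

    objects: name -> {"file", "start", "end", "layer"} (PCM file per object)
    groups:  group name -> list of member object names
    All element and attribute names are hypothetical.
    """
    root = ET.Element("session")
    objects_el = ET.SubElement(root, "objects")
    for name, info in objects.items():
        ET.SubElement(objects_el, "object", name=name, file=info["file"],
                      start=str(info["start"]), end=str(info["end"]),
                      layer=str(info["layer"]))
    groups_el = ET.SubElement(root, "groups")
    for group_name, members in groups.items():
        group_el = ET.SubElement(groups_el, "group", name=group_name)
        for member in members:
            ET.SubElement(group_el, "member", ref=member)
    return ET.tostring(root, encoding="unicode")


objects = {
    "rider": {"file": "rider.wav", "start": 0.0, "end": 12.5, "layer": 1},
    "hooves": {"file": "hooves.wav", "start": 0.0, "end": 12.5, "layer": 2},
}
print(session_xml(objects, {"horse": ["rider", "hooves"]}))
```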

The other type is the distribution file format. In this format the audio data may be stored in compressed form, and session-related data need not be stored. Note that audio objects still exist in this format and that it may be distributed using the MPEG-4 standard. According to the invention, wave field synthesis rendering is preferably always performed in real time. As a result, no pre-rendered audio information, i.e. finished loudspeaker signals, needs to be stored in either file format. This is a considerable advantage, since loudspeaker signals would occupy a very large amount of data, not least because of the large number of loudspeakers used in wave field synthesis environments.
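A back-of-the-envelope calculation shows why not storing pre-rendered loudspeaker signals matters. The 48 kHz / 24-bit PCM format, the two-hour film length and the 256-loudspeaker array are assumed values for illustration; only the 64 object channels come from the preferred embodiment.

```python
def pcm_bytes(channels, seconds, sample_rate=48_000, bytes_per_sample=3):
    """Size of uncompressed PCM audio (48 kHz, 24-bit by default; assumed values)."""
    return channels * seconds * sample_rate * bytes_per_sample


FILM_SECONDS = 2 * 60 * 60                    # a two-hour film (hypothetical)
objects_only = pcm_bytes(64, FILM_SECONDS)    # 64 input channels of object signals
prerendered = pcm_bytes(256, FILM_SECONDS)    # hypothetical 256-loudspeaker array
print(f"object signals: {objects_only / 1e9:.1f} GB")
print(f"pre-rendered loudspeaker signals: {prerendered / 1e9:.1f} GB")
```

Under these assumptions the object signals take about 66 GB, while pre-rendered loudspeaker feeds would take about 265 GB, and the gap grows with every loudspeaker added to the array.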

１つ以上の波面合成レンダリング装置モジュール１２２には通常、仮想音源信号およびチャンネル指向シーン記述が供給される。波面合成レンダリング装置は、各スピーカ、すなわち図４のスピーカ信号１４のスピーカ信号の波面合成理論に従って、駆動信号を算出する。波面合成レンダリング装置はさらに、サブウーファースピーカの信号を算出する。これは、低周波数で波面合成システムをサポートするために必要なものでもある。ルームシミュレーションモジュール１２４からのルームシミュレーション信号を、多数の（通常８から１２）の静止平面波を用いてレンダリングする。このコンセプトに基づいて、異なるソリューションアプローチを統合してルームシミュレーションを行うことが可能になる。ルームシミュレーションモジュール１２４を用いない場合は、波面合成システムはすでに、聴取範囲の音源方向を安定して認識する、許容できるサウンドイメージを生成している。しかしながら、音源深度の認識に関してある特定の欠陥があるのは、通常、初期空間反射または残響をまったく音源信号に加えないからである。本発明によれば、ルームシミュレーションモジュールを用いることが好ましい。これは、壁面反射を再生する。例えば、ミラー音源モデルを用いて初期反射を生成して、これをモデル化する。これらのミラー音源が、シーンプロトコルのオーディオオブジェクトとして再び処理されても良いし、あるいは、実際、オーディオ処理手段それ自体により追加されても良い。録音／再生装置１２６は、有益な補足を表す。空間ミキシングだけを行うように、プレミキシングの間に従来のようにミキシングが終了したサウンドオブジェクトが、従来のミキシングデスクからオーディオオブジェクト再生装置へ供給されても良い。また、ミキシングデスクの出力チャンネルをタイムコード制御で録音して、オーディオデータを再生モジュールに保存する、オーディオ録音モジュールを備えることが好ましい。再生モジュールは、開始タイムコードを受信して、すなわち、表示手段１８から再生装置１２６へ供給した個別の出力チャンネルに関連した、ある特定のオーディオオブジェクトを再生する。録音／再生装置は、オーディオオブジェクトに対応付けられている開始時刻および停止時刻の記述に従って、個別のオーディオオブジェクトの再生を互いに別々に開始して停止することもできる。ミキシング手順が終了するとすぐに、オーディオコンテンツは、再生装置モジュールから取り出されて、配信ファイルフォーマットにエクスポートする。従って、配信ファイルフォーマットは、ミキシングの準備が整ったシーンの終了したシーンプロトコルを含む。進歩性のあるユーザインターフェースコンセプトの目的は、階層構造を実行することである。これは、映画館ミキシング処理作業に適用される。ここで、オーディオオブジェクトは、任意の時間、個別のオーディオオブジェクトの表現として存在する音源として録音される。開始時間および停止／終了時間は、音源、すなわち、オーディオオブジェクトにとって典型的なものである。音源またはオーディオオブジェクトは、オブジェクトまたは音源が“生きている”時間の間は、システムリソースを必要とする。 One or more wavefront synthesis rendering device modules 122 are typically supplied with virtual sound source signals and channel-oriented scene descriptions. The wavefront synthesis rendering apparatus calculates a drive signal according to the wavefront synthesis theory of each speaker, that is, the speaker signal of the speaker signal 14 in FIG. The wavefront synthesis rendering apparatus further calculates a subwoofer speaker signal. This is also necessary to support wavefront synthesis systems at low frequencies. 

The room simulation signals from the room simulation module 124 are rendered using a number of static plane waves, typically 8 to 12. This concept allows different solution approaches to room simulation to be integrated. Even without the room simulation module 124, the wave field synthesis system produces an acceptable sound image in which source directions are perceived stably throughout the listening area. Depth perception, however, has certain shortcomings, because usually no early reflections or reverberation are added to the source signals. According to the present invention, a room simulation module that reproduces wall reflections is therefore preferably used. The reflections are modeled, for example, by generating early reflections with a mirror-source model. These mirror sources may in turn be treated as audio objects of the scene protocol, or they may instead be added by the audio processing means itself.

The recording/playback device 126 is a useful addition. Sound objects that were mixed in the conventional way during premixing, so that only the spatial mixing remains to be done, may be fed from a conventional mixing desk to the audio object playback device. An audio recording module is also preferably provided. It records the output channels of the mixing desk under time-code control and stores the audio data in the playback module. On receiving a start time code, the playback module plays a particular audio object, namely the one associated with the individual output channel supplied to the playback device 126 by the mapping means 18. The recording/playback device can also start and stop the playback of individual audio objects independently of one another, according to the start and stop times associated with each audio object.
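The mirror-source modeling of early reflections mentioned above can be sketched with the textbook image-source method for a rectangular ("shoebox") room. This generic technique is chosen for illustration only. The sketch assumes a 2-D room, first-order reflections and no wall absorption, and it is not presented as the specific algorithm of module 124. As the text notes, each returned image position could be passed to the renderer as an additional audio object. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def first_order_images(src, room):
    """First-order mirror (image) sources of `src` in a 2-D rectangular room.

    The room spans 0 <= x <= width and 0 <= y <= depth; each wall mirrors the
    source once. Higher orders and wall absorption are left out of this sketch.
    """
    x, y = src
    width, depth = room
    return {
        "wall x=0": (-x, y),
        "wall x=W": (2 * width - x, y),
        "wall y=0": (x, -y),
        "wall y=D": (x, 2 * depth - y),
    }

def reflection_delays_ms(src, listener, room):
    """Arrival time (ms) of the direct sound and of each first-order reflection."""
    paths = {"direct": src, **first_order_images(src, room)}
    return {name: 1000 * math.dist(pos, listener) / SPEED_OF_SOUND
            for name, pos in paths.items()}

# Hypothetical 10 m x 8 m room; source and listener positions in metres.
for name, t in reflection_delays_ms((3.0, 2.0), (6.0, 5.0), (10.0, 8.0)).items():
    print(f"{name:9s} {t:6.2f} ms")
```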

As soon as the mixing procedure is finished, the audio content is taken from the playback module and exported into the distribution file format. The distribution file format thus contains the finished scene protocol of a fully mixed scene. The aim of the inventive user interface concept is to implement a hierarchical structure suited to cinema mixing work. In this structure, an audio object is treated as a sound source that exists for a given period of time as the representation of an individual audio object. A start time and a stop or end time are characteristic of a sound source, i.e., an audio object. A sound source or audio object requires system resources during the time in which the object or source is “alive”.
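Because every object has a start and an end time, objects that are never “alive” at the same moment can share a renderer input channel, as claim 1 describes. The following is a minimal sketch under stated assumptions. Objects are plain (name, start, end) records, and an object that starts exactly when another ends is treated as not overlapping it. The sketch uses greedy interval partitioning, a standard technique that needs only as many channels as the maximum number of simultaneously alive objects. `AudioObject` and `map_to_channels` are hypothetical names.

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class AudioObject:
    name: str
    start: float  # seconds
    end: float    # seconds

def map_to_channels(objects, max_channels):
    """Assign audio objects to renderer input channels.

    Non-overlapping objects may share a channel; overlapping objects get
    different channels. Greedy interval partitioning: objects are processed
    by start time, and a channel is reused once its previous object has ended.
    """
    busy = []          # min-heap of (end_time, channel) for occupied channels
    free = []          # min-heap of released channel numbers
    next_channel = 0
    assignment = {}
    for obj in sorted(objects, key=lambda o: o.start):
        # Release every channel whose object has ended by this start time.
        while busy and busy[0][0] <= obj.start:
            heapq.heappush(free, heapq.heappop(busy)[1])
        if free:
            channel = heapq.heappop(free)
        elif next_channel < max_channels:
            channel = next_channel
            next_channel += 1
        else:
            raise ValueError(f"{obj.name}: all {max_channels} input channels busy")
        assignment[obj.name] = channel
        heapq.heappush(busy, (obj.end, channel))
    return assignment

scene = [
    AudioObject("dialogue", 0.0, 12.0),
    AudioObject("door", 2.0, 3.0),
    AudioObject("car", 5.0, 9.0),      # starts after "door" ends -> reuses its channel
    AudioObject("music", 10.0, 30.0),  # starts after "car" ends -> same channel again
]
print(map_to_channels(scene, max_channels=4))
# {'dialogue': 0, 'door': 1, 'car': 1, 'music': 1}
```

A 1:1 object-to-channel conversion would occupy four input channels for this hypothetical scene, whereas the mapping needs only two.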

Preferably, in addition to its start time and stop time, each sound source also carries metadata. These metadata are “type” (plane wave or point source at a given time), “direction”, “volume”, “muting”, and “flags” for direction-dependent loudness and direction-dependent delay. All of these metadata can be automated.
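One way to hold a sound source together with its metadata and automation is the minimal sketch below. The class, field and flag names are hypothetical, and automation is modeled as one lane of (time, value) points per parameter with simple step (hold) behavior. Step behavior suits discrete parameters such as type or muting, and continuous parameters could be interpolated instead.

```python
import bisect
from dataclasses import dataclass, field
from enum import Enum

class SourceType(Enum):
    POINT = "point source"
    PLANE_WAVE = "plane wave"

@dataclass
class SoundSource:
    """Hypothetical container for one sound source and its metadata."""
    name: str
    start: float                      # seconds
    stop: float                       # seconds
    source_type: SourceType = SourceType.POINT
    direction_deg: float = 0.0
    volume_db: float = 0.0
    muted: bool = False
    flags: set = field(default_factory=set)  # e.g. {"direction_dependent_delay"}
    # Automation lanes: parameter name -> time-sorted list of (time, value).
    automation: dict = field(default_factory=dict)

    def value_at(self, param, t):
        """Value of `param` at time t: the last automation point at or before t,
        otherwise the static attribute (step/hold automation, no interpolation)."""
        lane = self.automation.get(param, [])
        i = bisect.bisect_right([time for time, _ in lane], t)
        return lane[i - 1][1] if i else getattr(self, param)

rain = SoundSource("rain", start=0.0, stop=60.0, source_type=SourceType.PLANE_WAVE)
rain.automation["muted"] = [(20.0, True), (25.0, False)]
print(rain.value_at("muted", 10.0), rain.value_at("muted", 22.0), rain.value_at("muted", 30.0))
# False True False
```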

Notwithstanding the object-oriented approach, it is also preferable to use the inventive authoring system with the conventional channel concept. Objects that are “alive” throughout the entire film, or generally throughout the entire scene, then receive a channel of their own. In principle, these objects thus represent simple channels in the 1:1 conversion described with reference to FIG. 6.

In a preferred embodiment of the invention, two or more objects may be grouped. For each group, the user can select which parameters are grouped and how they are calculated from the group's master. A sound source group exists for an arbitrary period determined by the start and end times of its members.
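The following minimal sketch shows how a group could link selected parameters to its master and derive its lifetime from its members. Two choices here are assumptions rather than details from this document. First, linked parameters follow the master relatively, so each member keeps its offset; copying the master's absolute value would be an equally valid rule. Second, the group's lifetime runs from the earliest member start to the latest member end. `Source`, `Group` and `set_master` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    name: str
    start: float
    end: float
    volume_db: float = 0.0
    x: float = 0.0

@dataclass
class Group:
    """Hypothetical group whose `linked` parameters follow the master."""
    master: Source
    members: list
    linked: set = field(default_factory=set)

    @property
    def lifetime(self):
        """(earliest start, latest end) over the master and all members."""
        everyone = [self.master, *self.members]
        return min(s.start for s in everyone), max(s.end for s in everyone)

    def set_master(self, param, value):
        """Change a master parameter; linked members keep their offset to it."""
        delta = value - getattr(self.master, param)
        setattr(self.master, param, value)
        if param in self.linked:
            for member in self.members:
                setattr(member, param, getattr(member, param) + delta)

lead = Source("lead", 0.0, 10.0)
echo = Source("echo", 2.0, 15.0, volume_db=-6.0)
group = Group(lead, [echo], linked={"volume_db"})
group.set_master("volume_db", -3.0)   # echo follows: -6 dB -> -9 dB
group.set_master("x", 2.0)            # position is not linked; echo stays put
print(echo.volume_db, echo.x, group.lifetime)  # -9.0 0.0 (0.0, 15.0)
```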

Groups can be used, for example, to build a virtual standard surround setup. They can also be used for a virtual fade-out of a scene or a virtual zoom into a scene. Grouping can further be used to incorporate surround reverberation and to record a WFS mix.
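As an illustration of the virtual-surround and fade-out uses, the sketch below builds a group of five plane-wave sources at nominal 5.0 loudspeaker angles and applies one fade to the whole group. The 110° surround angle lies within the 100° to 120° range recommended in ITU-R BS.775. The azimuth sign convention, the linear-in-dB fade and the -60 dB floor are assumptions, and all function names are hypothetical.

```python
# Nominal 5.0 azimuths in degrees (0 = straight ahead, positive = to the left).
# The surrounds use 110 deg, inside the 100-120 deg range of ITU-R BS.775.
SURROUND_5_0 = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}

def virtual_surround(gain_db=0.0):
    """One plane-wave source per surround channel; together they form one group."""
    return [{"channel": ch, "type": "plane wave", "azimuth_deg": az, "gain_db": gain_db}
            for ch, az in SURROUND_5_0.items()]

def group_fade_db(t, duration, floor_db=-60.0):
    """Common gain offset (dB) for a linear-in-dB fade-out of the whole group."""
    progress = min(max(t / duration, 0.0), 1.0)
    return progress * floor_db

def member_gains(group, t, duration):
    """Effective gain of each member at time t during the group fade."""
    offset = group_fade_db(t, duration)
    return {src["channel"]: src["gain_db"] + offset for src in group}

surround = virtual_surround()
print(member_gains(surround, t=2.0, duration=4.0))
# {'L': -30.0, 'R': -30.0, 'C': -30.0, 'Ls': -30.0, 'Rs': -30.0}
```

Keeping the fade as a separate group offset leaves each member's own gain untouched, so the fade can be removed or re-timed without losing the underlying mix.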

It is furthermore preferable to introduce logical entities, namely layers. To build a mix or a scene, the preferred embodiment of the invention arranges groups and sound sources in different layers. Layers make it possible to simulate pre-dubs on the audio workstation. They can also be used to change display attributes during authoring, for example to show or hide different parts of the current mix.

A scene consists of the aforementioned components over an arbitrary duration. This duration may be a film reel, the entire film, or only a particular portion of the film, for example five minutes. A scene in turn consists of a number of layers, groups and sound sources that belong to it.
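The scene hierarchy and the use of layers to show or hide parts of a mix can be sketched as nested containers. This is a minimal sketch that assumes layers simply hold the names of their sources or groups and that visibility affects only the editor display, not playback. `Scene`, `Layer` and `visible_sources` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    name: str
    sources: list = field(default_factory=list)  # names of sources or groups
    visible: bool = True                         # display attribute only

@dataclass
class Scene:
    name: str
    duration_s: float
    layers: list = field(default_factory=list)

    def visible_sources(self):
        """Names of sources on layers currently shown in the editor."""
        return [s for layer in self.layers if layer.visible for s in layer.sources]

reel = Scene("reel 1", duration_s=20 * 60, layers=[
    Layer("dialogue pre-dub", ["actor A", "actor B"]),
    Layer("effects pre-dub", ["door", "car"]),
    Layer("music", ["score"]),
])
reel.layers[1].visible = False  # hide the effects layer while editing
print(reel.visible_sources())   # ['actor A', 'actor B', 'score']
```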

Preferably, the complete user interface 20 comprises both a graphical software part and a hardware part to allow haptic control. Although this is preferred, the user interface can also be implemented entirely as a software module for cost reasons.

The graphical system is designed around the concept of so-called “spaces”. The user interface contains a few different spaces. Each space is a dedicated editing environment that presents the project from a particular perspective and provides all the tools needed in that space. The user therefore no longer has to keep track of numerous windows, because all the tools required for a given environment are located in the corresponding space.

To give the sound engineer an overview of all audio signals at any given time, the adaptive mixing space already described with reference to FIGS. 3a and 3b is used. It can be compared to a conventional mixing desk that displays only the active channels. Instead of mere channel information, however, the adaptive mixing space also presents audio object information. As already explained, these objects are assigned to the input channels of the WFS renderer by the mapping means 18 of FIG. 1. In addition to the adaptive mixing space, there is a so-called timeline space, which provides an overview of all input channels. Each channel is shown together with its assigned objects. Automatic channel assignment is preferred for simplicity, but the user can also carry out the object-to-channel assignment.
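The “show only active channels” behavior of the adaptive mixing space comes down to a lookup at the current time. In this minimal sketch, objects are given as name -> (start, end) in seconds, lifetimes are half-open intervals [start, end), and the object-to-channel assignment comes from some earlier mapping step. The function name and the scene data are hypothetical.

```python
def active_channels(objects, assignment, t):
    """Channel -> name of the object playing on it at time t.

    `objects` maps object name to (start, end) in seconds; `assignment` maps
    object name to a renderer input channel. Only channels with an object
    alive at t appear, mirroring a desk view that hides idle channels.
    """
    return {assignment[name]: name
            for name, (start, end) in objects.items()
            if start <= t < end}

# Hypothetical scene and channel assignment.
objects = {"dialogue": (0.0, 12.0), "door": (2.0, 3.0), "car": (5.0, 9.0), "music": (10.0, 30.0)}
assignment = {"dialogue": 0, "door": 1, "car": 1, "music": 1}
print(active_channels(objects, assignment, 6.0))  # {0: 'dialogue', 1: 'car'}
```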

Another space is the positioning and editing space, which shows the scene in a three-dimensional view. In this space the user can record and edit the movements of sound-source objects. Movements can be generated with a joystick or with other input/display devices, for example those known from graphical user interfaces.
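A recorded movement, for example one captured from a joystick while the scene plays, can be stored as time-stamped positions and replayed by interpolation. The sketch below is a minimal version. It assumes a time-sorted track of (time, (x, y, z)) samples, linear interpolation between samples and a held position outside the recorded range. The function name and the sample track are hypothetical.

```python
import bisect

def position_at(track, t):
    """Linearly interpolate a recorded movement track at time t.

    `track` is a time-sorted list of (time_s, (x, y, z)) samples. Before the
    first or after the last sample, the nearest recorded position is held.
    """
    times = [time for time, _ in track]
    i = bisect.bisect_right(times, t)
    if i == 0:
        return track[0][1]
    if i == len(track):
        return track[-1][1]
    (t0, p0), (t1, p1) = track[i - 1], track[i]  # t0 <= t < t1, so t1 > t0
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))

# Hypothetical fly-over: the source moves 10 m from left to right in 4 s.
flyover = [(0.0, (-5.0, 2.0, 0.0)), (4.0, (5.0, 2.0, 0.0))]
print(position_at(flyover, 1.0))  # (-2.5, 2.0, 0.0)
```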

Finally, there is the room space, which supports the room simulation module 124 of FIG. 4 by providing room-editing functions. Each room is described by a particular parameter set stored in a library of room presets. Depending on the room model, different graphical user interfaces may be used together with different types of parameter sets.

Depending on the circumstances, the inventive method for generating an audio representation may be implemented in hardware or in software. The implementation may be on a digital storage medium, in particular a floppy disk or CD with electronically readable control signals, which can cooperate with a programmable computer system so that the inventive method is carried out. In general, the invention thus also consists of a computer program product with program code, stored on a machine-readable carrier, for carrying out the inventive method when the computer program product runs on a computer. In other words, the invention can be realized as a computer program with program code for carrying out the method when the computer program runs on a computer.

FIG. 1 is a block circuit diagram of an inventive device for generating an audio representation.

FIG. 2 is a schematic illustration of a user interface for the concept shown in FIG. 1.

FIG. 3a is a schematic illustration of the user interface of FIG. 2 according to one embodiment of the present invention.

FIG. 3b is a schematic illustration of the user interface of FIG. 2 according to another embodiment of the present invention.

FIG. 4 is a block circuit diagram of an inventive device according to a preferred embodiment.

FIG. 5 is a time diagram of an audio scene having various audio objects.

FIG. 6 compares a 1:1 conversion between objects and channels with the object-to-channel assignment according to the invention, for the audio scene shown in FIG. 5.

Claims

An apparatus for generating, storing and editing an audio representation of an audio scene, comprising:
audio processing means (12) for generating a plurality of loudspeaker signals from a plurality of input channels (EK1, EK2, ..., EKm) (16);
means (10) for providing an object-oriented description of the audio scene, the object-oriented description comprising a plurality of audio objects, each audio object being associated with an audio signal, a start time and an end time; and
mapping means (18) for mapping the object-oriented description of the audio scene onto the plurality of input channels of the audio processing means. The mapping means is configured to assign a first audio object to an input channel, to assign a second audio object, whose start time lies after the end time of the first audio object, to the same input channel, and to assign a third audio object, whose start time lies after the start time and before the end time of the first audio object, to another of the plurality of input channels.

The apparatus according to claim 1, wherein the audio processing means (12) comprises wave field synthesis means (122) configured to calculate the loudspeaker signals for a plurality of loudspeakers whose positions are known.

The apparatus according to claim 1 or 2, wherein the audio object is further associated with a virtual position, and the audio processing means (12) is configured to take the virtual position of the audio object into account when generating the plurality of loudspeaker signals.

The apparatus according to claim 1, wherein the audio processing means is connected to the providing means only via the mapping means and receives from it the audio object data to be processed.

The apparatus according to any one of claims 1 to 4, wherein the number of input channels of the audio processing means is preset to be smaller than the permitted number of audio objects in the audio scene, and at least two audio objects are present that do not overlap in time.

The apparatus according to any one of the preceding claims, further comprising a user interface (20) having a number of separate user interface channels, each user interface channel being associated with an input channel of the audio processing means, wherein the user interface (20) is configured to identify the audio object that the mapping means (18) has assigned to a user interface channel at a given time.

The apparatus according to claim 6, wherein the user interface (20) is configured to identify a user interface channel that is associated with an input channel of the audio processing means to which an audio object is currently assigned.

The apparatus according to claim 7, wherein the user interface is configured as a hardware mixing console having hardware operating means for each user interface channel, and each hardware operating means is associated with an indicator that identifies the currently active user interface channel.

The apparatus according to claim 7 or 8, wherein the user interface comprises a graphical user interface configured to display on an electronic display device only those user interface channels that are associated with an input channel of the audio processing means to which an audio object is currently assigned.

The apparatus according to any one of claims 6 to 9, wherein the user interface (20) further comprises means for manipulating a user interface channel, configured to manipulate the audio object assigned to the input channel of the audio processing means (12) that corresponds to that user interface channel. The user interface is connected to the providing means (10) so as to replace the audio object with its manipulated version, and the mapping means (18) is configured to assign the manipulated version, instead of the audio object, to the input channel of the audio processing means (12).

The apparatus according to claim 10, wherein the manipulating means is configured to change the position, the type or the audio signal of the audio object.

The apparatus according to any one of claims 6 to 9, wherein the user interface is configured to indicate a temporal occupancy of a user interface channel, the temporal occupancy representing the temporal sequence of the audio objects assigned to that user interface channel, and the user interface is further configured to mark a current time (37) within the temporal occupancy.

The apparatus according to claim 12, wherein the user interface (20) is configured to show the temporal occupancy as a time axis on which the assigned audio objects are displayed with lengths proportional to their durations, together with an indicator (37) that moves as time progresses.

The apparatus according to any one of claims 1 to 13, wherein the providing means (10) is configured to allow audio objects to be grouped, so that grouped audio objects are marked by group information identifying their group membership,
and the mapping means (18) is configured to store the group information such that a manipulation of a group property affects all members of the group, regardless of which input channels of the audio processing means the audio objects of the group are assigned to.

A method for generating, storing and editing an audio representation of an audio scene, comprising:
generating (12) a plurality of loudspeaker signals from a plurality of input channels (EK1, EK2, ..., EKm) (16);
providing an object-oriented description of the audio scene, the object-oriented description comprising a plurality of audio objects, each audio object being associated with an audio signal, a start time and an end time; and
mapping (18) the object-oriented description of the audio scene onto the plurality of input channels by assigning a first audio object to an input channel, assigning a second audio object, whose start time lies after the end time of the first audio object, to the same input channel, and assigning a third audio object, whose start time lies after the start time and before the end time of the first audio object, to another of the plurality of input channels.

A computer program having program code for executing the method of claim 15 when the program is executed on a computer.