JP5533282B2 - Sound playback device

Info

Publication number: JP5533282B2
Application number: JP2010127741A
Authority: JP
Inventors: 広臣四童子; 進澤米
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2010-06-03
Filing date: 2010-06-03
Publication date: 2014-06-25
Anticipated expiration: 2030-06-03
Also published as: JP2011254359A

Description

The present invention relates to a sound reproduction device that localizes a sound image at an arbitrary position.

Conventionally, as shown in FIG. 1(A), there have been three-dimensional sound image control devices that localize a sound image around a listener by emitting sound based on an audio signal from five speakers L, C, R, SL, and SR installed around a viewer U (see, for example, Patent Document 1).

Patent Document 1: JP 2002-44799 A

As shown in FIG. 1(B), the viewer U typically installs the five speakers L, C, R, SL, and SR, connected to an audio amplifier, near the walls K of a room R. Because the viewer U generally listens from some distance away from the speakers, the viewer U hears not only the direct sound from the speakers but also the sound reflected from the ceiling, floor, and walls of the room (indirect sound). Humans generally judge the distance to a sound source mainly from the level ratio of direct to indirect sound, so when sound is emitted from each speaker, the viewer U perceives the distance to the sound source as the actual distance to that speaker.

Consequently, as shown in FIG. 1(B), even if a conventional audio amplifier attempts to simulate a sound source S localized inside the region T enclosed by the five speakers L, C, R, SL, and SR (for example, to the front left of the viewer U), it cannot reproduce for the viewer U the ratio of direct to indirect sound that would occur if a sound source were actually placed at that position. It has therefore been difficult to make the viewer U perceive the distance as if the sound source were really at that location.

Accordingly, an object of the present invention is to provide a sound reproduction device that can localize a sound image at an arbitrary position around a viewer.

The sound reproduction device of the present invention comprises coordinate information acquisition means, sound emission means, supply means, and sound image control means. The coordinate information acquisition means acquires coordinate information, in the viewing environment, of the video displayed by a display device. The sound emission means places a first sound source around the viewing position and places a second sound source closer to the viewing position than the first sound source. The supply means supplies an audio signal corresponding to the video to both the first and second sound sources, so that both emit the same sound. The sound image control means adjusts the gain ratio of the audio signals supplied to the first and second sound sources according to the coordinate information.

Suppose the sound reproduction device is installed in, for example, a cube-shaped room with the viewing position at the center of the room, and the second sound source is placed closer to the viewing position than the first sound source. The level of the direct sound falls off rapidly with distance, following the inverse-square law, whereas the level of the indirect sound changes little as the distance from the source changes. Using this as a cue, the viewer perceives sound emitted by the first sound source as having a high proportion of indirect sound, that is, as coming from far away. The second sound source, by contrast, is closer to the viewer than the first, so direct sound makes up a larger share of what is heard from it. The viewer at the viewing position therefore perceives sound from the second sound source as having a high proportion of direct sound, that is, as coming from nearby. Consequently, by emitting the same sound from both sources and adjusting the gains of the two audio signals, the ratio of direct to indirect sound heard at the viewing position can be adjusted, and the sound image can be localized at any distance between the first and second sound sources. For example, if the second sound source is placed right next to the viewing position, its sound reaches the viewer almost entirely as direct sound, so a sound image can be localized even in the immediate vicinity of the viewing position.
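The reasoning above can be illustrated numerically. The following is a minimal sketch, assuming a simple diffuse-field room model that is not part of the patent text: direct energy falls as 1/r², reverberant energy is independent of distance, and the two are equal at a hypothetical `critical_distance`. The function name, the parameters, and the incoherent (energy) summation of the two sources are all illustrative assumptions.

```python
import math

def mixed_direct_to_reverberant_db(g_far, g_near, d_far, d_near,
                                   critical_distance=1.5):
    """Direct-to-reverberant ratio (dB) at the viewing position when the same
    signal is emitted by a far source (gain g_far, distance d_far) and a near
    source (gain g_near, distance d_near).

    Assumed model (not from the patent): direct energy ~ (gain / distance)^2,
    reverberant energy ~ gain^2 / critical_distance^2, energies add.
    At least one gain must be non-zero.
    """
    direct = (g_far / d_far) ** 2 + (g_near / d_near) ** 2
    reverberant = (g_far ** 2 + g_near ** 2) / critical_distance ** 2
    return 10.0 * math.log10(direct / reverberant)

# Far speaker 3 m away, near speaker 0.3 m away (hypothetical values).
far_only = mixed_direct_to_reverberant_db(1.0, 0.0, 3.0, 0.3)
equal_mix = mixed_direct_to_reverberant_db(0.7, 0.7, 3.0, 0.3)
near_only = mixed_direct_to_reverberant_db(0.0, 1.0, 3.0, 0.3)
print(far_only, equal_mix, near_only)
```

Under these assumptions, shifting gain from the far source to the near source raises the direct-to-reverberant ratio at the viewing position, which is the cue the text relies on for perceived distance.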

Specifically, the first and second sound sources can be realized as follows. For example, the sound emission means may comprise a first speaker and a second speaker, with the first speaker emitting the audio signal of the first sound source and the second speaker, installed closer to the viewing position than the first speaker, emitting the audio signal of the second sound source.

In practice, in a 5.1 surround system for example, the six speakers of the conventional 5.1 channels serve as the first speakers, and a speaker newly added to this configuration for installation near the viewer serves as the second speaker. The second speaker may take the form of one or more speakers placed in front of the viewer, placed behind the viewer (for example, in the headrest of a sofa), or mounted on the frame of the glasses worn for viewing stereoscopic video. Such a configuration places a sound source close to the viewer's ears and allows the ratio of direct to indirect sound to be controlled in combination with the first speakers.

The first and second sound sources can also be realized as follows. For example, the sound emission means may comprise a directivity-controlled speaker, which combines a speaker array with signal processing means, and an ordinary speaker whose directivity is controlled neither by such signal processing nor by a special structure.

In this configuration, the audio signal of the second sound source is emitted as a sound beam by the speaker array, so the proportion of direct sound at the viewing position is higher than with a speaker lacking directivity control. The second sound source can therefore be localized near the viewing position without actually installing a speaker there. This increases freedom in placement and configuration: the second-sound-source speaker can be installed at the same position as the first-sound-source speaker, or the two can be integrated into one unit. In this configuration, a speaker whose directivity is controlled structurally, such as a flat-panel speaker, may be used as the directivity-controlled speaker instead of a speaker array.

Some environments in which these speakers are installed, such as studios, have deliberately controlled room acoustics. In such an environment, sound-absorbing walls and the like produce less reverberation than in an ordinary room, so the sound reaching the viewing position from the first sound source has a high proportion of direct sound. As a result, it can be difficult to control the sense of distance through the direct-to-indirect sound ratio relative to the second sound source. To obtain a sufficient effect even in such special environments, the present invention can add a reverberation component to the audio signal emitted by the first sound source and increase or decrease the added amount according to the coordinate information of the video.

In this configuration, adding a reverberation component to the audio signal emitted by the first sound source makes the viewer hear the sound from the first sound source as if its proportion of indirect sound were higher, even in an environment with little reverberation. When the viewer hears sound from the first sound source that (apparently) has a high proportion of indirect sound, the sound seems to come from farther away than it would without added reverberation, so a difference in perceived distance relative to the second sound source can be reproduced. The amount of reverberation to add may be determined as follows. The reverberation time of the room is measured in advance; if it is shorter than a predetermined time, the amount of added reverberation is increased, and if it is longer, the amount is decreased. Determining the amount of reverberation added to the first sound source from the measured value in this way yields a consistent effect across different environments.

Adding more reverberation than the appropriate amount determined from the measurement deliberately increases the perceived distance of the first sound source. When the coordinates of the stereoscopic video lie in the depth direction behind the TV screen, that sense of distance can thus be reproduced in step with the video, making control of the localization position more effective. When the stereoscopic video combines a depth effect with a pop-out effect, depth is expressed by adding reverberation to the first sound source while localization in the pop-out direction is expressed by the second sound source. In this case, the added reverberation changes the direct-to-indirect ratio of the first sound source as heard by the viewer, so the gain ratio between the first and second sound sources is varied according to the amount of added reverberation: as the amount increases, the gain ratio of the second sound source for the same localization position is increased, and as it decreases, the gain ratio of the first sound source for the same localization position is increased. This keeps the sound image correctly localized at the coordinate position of the stereoscopic video.
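The rule for choosing the amount of added reverberation can be sketched as follows. This is only an illustration: the linear mapping, the function name `reverb_send_level`, and every parameter (`target_rt60_s`, `base_level`, `sensitivity`, `depth_boost`) are assumptions introduced here, since the text specifies only the direction of the adjustment.

```python
def reverb_send_level(measured_rt60_s, target_rt60_s=0.5, base_level=0.3,
                      sensitivity=0.5, depth_boost=0.0):
    """Amount of reverberation (0.0 to 1.0) to add to the first sound source.

    A room that is drier than the target (shorter measured reverberation
    time) gets more added reverberation; a livelier room gets less.
    depth_boost deliberately exceeds the measured-appropriate amount when
    the stereoscopic image lies behind the screen.
    All constants are hypothetical tuning values.
    """
    deficit = target_rt60_s - measured_rt60_s
    level = base_level + sensitivity * deficit + depth_boost
    return min(1.0, max(0.0, level))

dry_room = reverb_send_level(0.2)    # less reverberant than the target
on_target = reverb_send_level(0.5)
live_room = reverb_send_level(0.8)   # more reverberant than the target
deep_image = reverb_send_level(0.5, depth_boost=0.2)
print(dry_room, on_target, live_room, deep_image)
```

The clamp to the 0.0 to 1.0 range is a design choice for the sketch; a real device would map this value onto whatever control its reverberation unit exposes.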

In other words, some stereoscopic video presents a background that extends in the depth direction while the part to be emphasized, such as a person, pops out toward the viewer. In such cases, reverberation must be added to the first sound source to express depth, and the gain of the second sound source must be raised to express the pop-out. If a fixed relationship between distance and inter-source gain ratio were used, the added reverberation would change the direct-to-indirect ratio of the first sound source, and the sound image could no longer be localized correctly at the calculated position of the video. The sound image control means therefore varies the relationship between distance and inter-source gain ratio according to the amount of the reverberation component, so that the correct localization position is always maintained. Specifically, the gain ratio of the second sound source for a given localization position should be raised as the amount of added reverberation increases and lowered as it decreases.
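The compensation described here can be sketched as a correction applied to a baseline gain ratio. The additive form, the function name, and the constants `reference_reverb` and `compensation` are assumptions made for illustration; the text states only that the second sound source's share rises with the amount of added reverberation and falls as it decreases.

```python
def second_source_gain_ratio(base_ratio, reverb_amount,
                             reference_reverb=0.3, compensation=0.5):
    """Share of the signal (0.0 to 1.0) sent to the second (near) source.

    base_ratio    : ratio that would localize the image correctly when the
                    added reverberation equals reference_reverb.
    reverb_amount : reverberation currently added to the first sound source.
    More added reverberation makes the first source sound farther away, so
    the near source's share is raised to hold the same localization position.
    The additive correction and its constants are hypothetical.
    """
    adjusted = base_ratio + compensation * (reverb_amount - reference_reverb)
    return min(1.0, max(0.0, adjusted))

less_reverb = second_source_gain_ratio(0.5, 0.1)
reference = second_source_gain_ratio(0.5, 0.3)
more_reverb = second_source_gain_ratio(0.5, 0.6)
print(less_reverb, reference, more_reverb)
```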

In the sound reproduction device of the present invention, the sound image control means can also adjust the frequency characteristics of the audio signals of the first and second sound sources so as to suppress changes in the frequency characteristics of a predetermined band in the audio signal corresponding to the video.

One sound-effect technique expresses a sound image approaching from afar by gradually increasing the high-frequency components of its sound, and expresses a sound image receding by gradually decreasing them. The sound reproduction device of the present invention, on the other hand, actually moves the sound image: it places the first sound source around the viewing position and the second sound source closer to the viewing position, and adjusts the gains of their audio signals according to the coordinate information. If the audio signal of the 3D content already carries the above sound effect, the sound from the second sound source, which is actually reproduced near the viewing position, may contain excessive high-frequency components and be unsuitable for listening. The present invention therefore suppresses the sound effect according to the front-back coordinate of the video, preserving the quality of the sound reproduced from the second sound source. That is, as the video approaches the viewing position, the frequency components of a predetermined band (for example, the high-frequency components) in the audio signals of the first and second sound sources are reduced, and as the video moves away from the viewing position, those components are restored. The quality of the sound reproduced from the second sound source can thus be kept appropriate even when the content itself applies a sound effect that emphasizes video popping out toward the viewing position.
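A depth-dependent reduction of the high band might be sketched as below. This is an illustrative assumption rather than the patent's filter design: the signal is split with a one-pole low-pass filter, and the remaining high band is scaled by a gain that falls as the image approaches. The names `proximity`, `max_cut_db`, and `corner_hz`, and their default values, are hypothetical.

```python
import math

def high_band_gain(proximity, max_cut_db=6.0):
    """Linear gain for the high band; proximity 0.0 = far, 1.0 = at the viewer."""
    p = min(1.0, max(0.0, proximity))
    return 10.0 ** (-max_cut_db * p / 20.0)

def apply_depth_shelf(samples, proximity, sample_rate=48000.0, corner_hz=4000.0):
    """Attenuate content above roughly corner_hz according to proximity.

    The low band comes from a one-pole low-pass filter; the high band is the
    remainder (input minus low band) and is scaled by high_band_gain().
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * corner_hz / sample_rate)
    gain = high_band_gain(proximity)
    low = 0.0
    out = []
    for x in samples:
        low += alpha * (x - low)            # one-pole low-pass state update
        out.append(low + gain * (x - low))  # low band + scaled high band
    return out

# A signal alternating at the Nyquist rate is almost entirely "high band".
nyquist_tone = [1.0 if n % 2 == 0 else -1.0 for n in range(200)]
far_output = apply_depth_shelf(nyquist_tone, proximity=0.0)
near_output = apply_depth_shelf(nyquist_tone, proximity=1.0)
print(max(abs(v) for v in far_output[100:]), max(abs(v) for v in near_output[100:]))
```

With `proximity=0.0` the high-band gain is 1 and the signal passes through unchanged, which corresponds to the high band being "restored" as the image moves away.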

According to the present invention, a sound source added in the immediate vicinity of the viewing position of a conventional surround system outputs sound according to the coordinate information of the video in the viewing environment, so that a sound field with a greater sense of presence can be reproduced.

FIG. 1(A) is a layout diagram of speakers connected to a conventional audio amplifier, and FIG. 1(B) is a diagram illustrating the case where speakers connected to a conventional audio amplifier are installed in a room and a sound source is to be localized near the viewer.

FIG. 2 is a block diagram showing the schematic configuration of a 3D system including the sound reproduction device according to the first embodiment of the present invention.

FIG. 3 is a diagram showing the relationship between the images displayed on the monitor screen and the stereoscopic image perceived by the viewer. FIG. 3(A) shows the case where the stereoscopic image is perceived behind the monitor screen, FIG. 3(B) the case where it is perceived on the screen, and FIG. 3(C) the case where it is perceived in front of the screen.

FIG. 4 is an XY plan view showing the distance relationships among the viewer, the images on the monitor, and the stereoscopic image. FIG. 4(A) shows the case where the stereoscopic image is located behind the monitor screen, FIG. 4(B) the case where it is located on the screen, and FIG. 4(C) the case where it is located in front of the screen.

FIG. 5 is a diagram for explaining the localization position of the sound image. FIG. 5(A) shows the case where the sound image is localized at the viewing position, FIG. 5(B) the case where it is localized between the attach speaker and the surround speakers, FIG. 5(C) the case where it is localized at the position of the surround speaker (Cch), FIG. 5(D) the case where it is localized behind the surround speaker (Cch), and FIG. 5(E) the case where it is localized by a plurality of speakers.

A diagram showing a presence speaker installed in a room.

A graph showing frequency characteristics according to the distance of the sound source.

A block diagram showing the schematic configuration of a 3D system including the sound reproduction device according to the second embodiment.

A diagram showing the sound emission state of the sound beams of a speaker array.

The sound reproduction device is described in detail below.

[First Embodiment]
FIG. 2 is a block diagram showing the schematic configuration of a 3D system including the sound reproduction device according to the first embodiment of the present invention. As shown in FIG. 2, the 3D system 1 includes a video playback device 2 and a sound reproduction device 3. The video playback device 2 includes a content signal receiving unit 21, a content playback unit 22, a video processing unit 23, a video signal output unit 25, and an audio signal output unit 26.

The sound reproduction device 3 includes a video signal input unit 31, a video processing unit 32 (corresponding to the coordinate information acquisition means), an audio signal input unit 33, a sound image processing unit 34, a first sound source signal output unit 35, a second sound source signal output unit 36, and a video signal output unit 37. The sound image processing unit 34 includes a mix level calculation unit 341 (corresponding to the sound image control means), a waveform adjustment unit 343 (corresponding to the supply means), a waveform adjustment unit 345 (corresponding to the supply means), a reverberation adding unit 347, and a monitoring unit 348. A monitor 38 (corresponding to the display device) is connected to the video signal output unit 37. As an example, five speakers 41 to 45 are connected to the first sound source signal output unit 35. The speakers 41 to 45 correspond to the sound emission means (first speaker) and are referred to below as the L (front left) ch speaker 41, the C (front center) ch speaker 42, the R (front right) ch speaker 43, the SL (rear left) ch speaker 44, and the SR (rear right) ch speaker 45. The Lch speaker 41, Cch speaker 42, Rch speaker 43, SLch speaker 44, and SRch speaker 45 are also referred to collectively as the surround speakers 40. FIG. 2 shows an example in which the surround speakers 40 are arranged in accordance with ITU-R BS.775-1 and the monitor 38 is placed near the Lch speaker 41, the Cch speaker 42, and the Rch speaker 43. A 5.1ch surround system also includes a subwoofer, but the processing of the subwoofer's audio signal is omitted from the following description.

An attach speaker 47 is connected to the second sound source signal output unit 36. The attach speaker 47 corresponds to the sound emission means (second speaker) and emits the sound output by the second sound source signal output unit 36.

In the first embodiment, the attach speaker 47 is installed in the immediate vicinity of the viewer U in order to localize the second sound source of the present invention near the viewer U. The surround speakers 40 are installed to localize the first sound source around the viewer U, and the sound image corresponding to the stereoscopic image is localized at an arbitrary position between the surround speakers 40 and the attach speaker 47. Based on the three-dimensional coordinate information of the stereoscopic image output by the video processing unit 32, the sound image processing unit 34 of the sound reproduction device 3 calculates the ratio (mix level) of the audio signals to be output to the surround speakers 40 and the attach speaker 47, and adjusts the amplitude (gain) of those audio signals according to this mix level. The sound image processing unit 34 then outputs the audio signals to the first sound source signal output unit 35 and the second sound source signal output unit 36, so that the sound is emitted from the surround speakers 40 and the attach speaker 47.
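The step from coordinate information to speaker gains can be sketched as follows. This is only an illustration under stated assumptions: the linear mapping from image distance to mix level and the equal-power (cosine/sine) gain law are choices made here, not details given in this part of the text, and the function and parameter names are hypothetical.

```python
import math

def mix_level(image_distance, near_distance, far_distance):
    """Mix level in [0, 1]: 0.0 = surround speakers only, 1.0 = attach speaker only.

    image_distance : distance from the viewing position to the stereoscopic image
    near_distance  : distance to the attach speaker (second sound source)
    far_distance   : distance to the surround speaker (first sound source)
    Assumes a linear relation between distance and mix level.
    """
    t = (far_distance - image_distance) / (far_distance - near_distance)
    return min(1.0, max(0.0, t))

def speaker_gains(level):
    """Return (surround_gain, attach_gain) using an equal-power crossfade."""
    theta = level * math.pi / 2.0
    return math.cos(theta), math.sin(theta)

# Image halfway between an attach speaker at 0.2 m and a surround speaker at 2.6 m.
level = mix_level(1.4, near_distance=0.2, far_distance=2.6)
surround_gain, attach_gain = speaker_gains(level)
print(level, surround_gain, attach_gain)
```

The equal-power law keeps the summed signal energy constant as the image moves, which is a common choice for crossfades; a device could equally use a measured lookup table.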

The video playback device 2 outputs the video signal and audio signal of 3D content to the sound reproduction device 3. In this example, the content signal receiving unit 21 of the video playback device 2 receives a broadcast wave, outputs the video signal of the 3D content to the video processing unit 23, and outputs the audio signal of the 3D content to the audio signal output unit 26.

The content playback unit 22 plays back a storage medium (not shown), outputs the video signal of the 3D content to the video processing unit 23, and outputs the audio signal of the 3D content to the audio signal output unit 26.

The video processing unit 23 outputs the video signal of the 3D content received from the content signal receiving unit 21 or the content playback unit 22 to the video signal output unit 25. The video signal output unit 25 outputs the video signal of the 3D content to the sound reproduction device 3, and the audio signal output unit 26 outputs the multi-channel audio signals of the 3D content to the sound reproduction device 3.

The video signal input unit 31 of the sound reproduction device 3 outputs the input video signal of the 3D content to the video processing unit 32 and the video signal output unit 37. The video processing unit 32 analyzes the input video signal, obtains the three-dimensional coordinates of the stereoscopic image relative to the viewing position 90, and outputs them to the sound image processing unit 34. The video signal output unit 37 outputs the video signal received from the video signal input unit 31 to the monitor 38. The monitor 38 displays the stereoscopic video of the 3D content.

The video playback device 2 plays back, for example, 3D content in which two images with parallax (a horizontal offset between the left and right images) are presented separately to the viewer's left and right eyes through special glasses, causing the viewer to perceive the video as three-dimensional.

To localize the sound image at the position corresponding to the stereoscopic image, the video processing unit 32 of the sound reproduction device 3 analyzes the offset between the left-eye image (hereinafter, the left image) and the right-eye image (hereinafter, the right image) of the 3D content. From this it calculates coordinates indicating how far the stereoscopic image pops out of (or recedes behind) the screen and where it is located horizontally and vertically on the screen.

FIG. 3 is a diagram showing the relationship between the images displayed on the monitor screen and the stereoscopic image perceived by the viewer. FIG. 3(A) shows the case where the stereoscopic image is perceived behind the monitor screen, FIG. 3(B) the case where it is perceived on the screen, and FIG. 3(C) the case where it is perceived in front of the screen.

Human eyes have horizontal parallax (binocular disparity): because the left and right eyes view an object from different positions, they see it differently, and the brain perceives a three-dimensional object at the point where the lines of sight intersect. Humans also perceive an object as nearer when the parallax is larger and as farther when it is smaller. 3D content exploits this characteristic: two images with parallax (an offset between the left and right images) are presented separately to the left and right eyes, for example through special glasses, so that the viewer perceives the video as three-dimensional. By varying the difference (parallax) between the left and right images, 3D content makes the stereoscopic image appear to pop out of or recede behind the screen.

As shown in FIG. 3(A), when the left image 131L is displayed shifted toward the left side of the monitor 38 and the right image 131R is displayed shifted toward the right side, the lines of sight intersect behind the screen, so the viewer U perceives the stereoscopic image 131 as receding behind the screen. As shown in FIG. 3(B), when the left image 132L and the right image 132R are displayed at the same position on the monitor 38 with no offset, the lines of sight intersect on the screen, so the viewer U perceives the stereoscopic image 132 as being at the screen. As shown in FIG. 3(C), when the left image 133L is displayed shifted toward the right side of the monitor 38 and the right image 133R is displayed shifted toward the left side, the lines of sight intersect in front of the screen, so the viewer U perceives the stereoscopic image 133 as popping out in front of the screen.
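The three cases in FIG. 3 follow from where the two lines of sight cross. The sketch below works this out with ordinary similar-triangle geometry; it is not necessarily the calculation the patent itself uses. The eyes are assumed to sit at x = ±`eye_separation`/2 on a line parallel to the screen, and the function name, the parameter names, and the 0.065 m default eye separation are all assumptions.

```python
def perceived_position(x_left, x_right, screen_distance, eye_separation=0.065):
    """Return (x, y) of the point where the two lines of sight intersect.

    x_left, x_right : horizontal screen positions (m) of the left-eye and
                      right-eye images, measured from the screen center
    screen_distance : distance (m) from the viewing position to the screen
    The left eye is at (-eye_separation/2, 0) and looks at (x_left, screen_distance);
    the right eye is at (+eye_separation/2, 0) and looks at (x_right, screen_distance).
    """
    disparity = x_right - x_left
    if disparity >= eye_separation:
        raise ValueError("lines of sight do not converge in front of the viewer")
    t = eye_separation / (eye_separation - disparity)
    x = -eye_separation / 2.0 + t * (x_left + eye_separation / 2.0)
    return x, t * screen_distance

behind = perceived_position(-0.015, 0.015, 2.0)        # left image left, right image right
on_screen = perceived_position(0.0, 0.0, 2.0)          # no offset
in_front = perceived_position(0.0325, -0.0325, 2.0)    # left image right, right image left
print(behind, on_screen, in_front)
```

In this geometry an offset of zero places the image exactly at the screen distance, an uncrossed offset places it farther away, and a crossed offset places it nearer, matching FIG. 3(B), 3(A), and 3(C) respectively.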

Because the viewer U thus perceives the position of the stereoscopic image differently depending on the difference (parallax) between the left video and the right video, a proportional relationship holds between the parallax and the position of the stereoscopic image in the direction perpendicular to the screen of the monitor 38. The video processing unit 32 therefore analyzes the offset (parallax) of the stereoscopic image between the left video and the right video of the 3D content, together with the position on the screen at which that offset occurs, calculates the position of the stereoscopic image relative to the screen, that is, the three-dimensional coordinates of the stereoscopic image relative to the viewing position 90, and outputs them to the sound image processing unit 34. The coordinates of the stereoscopic image are calculated, for example, as follows.

FIG. 4 is an XY plan view showing the distance relationship among the viewer, the video on the monitor, and the stereoscopic image. FIG. 4A shows the case where the stereoscopic image is located behind the monitor screen, FIG. 4B the case where it is located on the monitor screen, and FIG. 4C the case where it is located in front of the monitor screen. In FIG. 4, the viewing position 90 of the viewer U (the midpoint between the eyes) is taken as the origin O (0, 0), the distance in the Y-axis direction from the viewing position 90 (origin O) to the screen of the monitor 38 is denoted Ls, and the distance in the Y-axis direction from the viewing position 90 (origin O) to the stereoscopic image 131 is denoted Ld.

The parallax value is obtained, for example, by calculating the difference between the coordinates of the matching points in the left video 131L and the right video 131R. As shown in FIG. 4A, when the stereoscopic image 131 (a cube in the figure, as an example) is located at a point P (0, Ld) behind the screen of the monitor 38, this point P appears as the matching point P_L (x_L, Ls) in the left video 131L and as the matching point P_R (x_R, Ls) in the right video 131R. In this case, the video processing unit 32 performs, for example, template matching on the left video 131L and the right video 131R to find the matching points of the object in the two videos. For example, the left video is divided into an arbitrary number of blocks, each block is regarded as a template, and a block similar to the template is searched for in the right video while shifting one pixel at a time. Because the left video and the right video have a parallax, they are never exactly identical, so in practice the search finds the block most similar to the template. The video processing unit 32 determines the block in the right video whose correlation value with the template from the left video is largest to be the matching point.
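The block search described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the images are grayscale and stored as lists of rows, the search is horizontal only, and normalized cross-correlation is used as the correlation value. The function names `ncc`, `block`, and `find_match_x` and the `max_shift` search range are hypothetical and do not appear in the original description.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # flat block: correlation undefined, treated as no match
    return sum(x * y for x, y in zip(da, db)) / denom

def block(img, x, y, w, h):
    """Flatten the w-by-h block whose top-left corner is (x, y)."""
    return [img[r][c] for r in range(y, y + h) for c in range(x, x + w)]

def find_match_x(left, right, x, y, w, h, max_shift):
    """Search `right` one pixel at a time along the row band starting at y and
    return (best_x, best_score) for the block that correlates best with the
    template taken from `left` at (x, y)."""
    template = block(left, x, y, w, h)
    width = len(right[0])
    best_x, best_score = x, -2.0  # any real correlation exceeds -2.0
    first = max(0, x - max_shift)
    last = min(width - w, x + max_shift)
    for cand in range(first, last + 1):
        score = ncc(template, block(right, cand, y, w, h))
        if score > best_score:
            best_x, best_score = cand, score
    return best_x, best_score
```

The difference between the returned position and the template position gives the offset of the matching point in pixels. Converting it to a length on the screen requires the pixel pitch of the monitor, which is not specified here.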

When the correlation is obtained as described above, as shown in FIGS. 4A and 4C, the stereoscopic image is located behind the monitor screen if the offset of the matching point in the right video relative to the left video is positive, and in front of the monitor screen if the offset is negative.

When the video processing unit 32 determines that the point P_L in the left video 131L and the point P_R in the right video 131R are matching points, it calculates the coordinates of the stereoscopic image 131. It uses the offset between the points P_L and P_R, that is, the parallax, to calculate the distance from the viewing position 90 to the stereoscopic image 131, and it uses the position of the matched block to calculate the position coordinates of the stereoscopic image on the screen plane.

The video processing unit 32 calculates the distance from the viewing position 90 to the stereoscopic image 131 (the coordinate in the Y-axis direction), for example, as follows. As shown in FIG. 4A, let E be the distance between the viewer's left eye UL and right eye UR, and let D (= x_R − x_L) be the offset (parallax) between the left video 131L and the right video 131R displayed on the screen of the monitor 38. The following proportional relationship then holds, from which Equation 1 is obtained.

$$L_d : (L_d - L_s) = E : D$$

$$L_d = Y_p = \frac{E}{E - D}\,L_s \qquad \text{(Equation 1)}$$

Similarly, the proportional relationship also holds for the stereoscopic images 132 and 133 shown in FIGS. 4B and 4C, so the distance to the stereoscopic image can likewise be calculated by Equation 1.
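As a numerical check of Equation 1, the following sketch evaluates Ld for a given parallax. The function name `distance_from_parallax`, the example values, and the decision to reject D ≥ E (where the lines of sight would no longer converge) are assumptions made for illustration only.

```python
def distance_from_parallax(parallax_d, eye_distance_e, screen_distance_ls):
    """Equation 1: Ld = E / (E - D) * Ls, with all lengths in the same unit.

    A positive D places the image behind the screen (Ld > Ls), D = 0 places it
    on the screen (Ld = Ls), and a negative D places it in front (Ld < Ls).
    """
    if parallax_d >= eye_distance_e:
        # Assumption: treat non-converging lines of sight as invalid input.
        raise ValueError("parallax must be smaller than the eye distance")
    return eye_distance_e / (eye_distance_e - parallax_d) * screen_distance_ls

# Example with assumed values: E = 0.065 m, Ls = 2.0 m
on_screen = distance_from_parallax(0.0, 0.065, 2.0)
behind = distance_from_parallax(0.02, 0.065, 2.0)
in_front = distance_from_parallax(-0.02, 0.065, 2.0)
```

With these inputs the three results are ordered `in_front < on_screen < behind`, which agrees with the sign convention stated for FIGS. 4A and 4C.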

The constant E in Equation 1 is preferably set by inputting a typical human interocular distance into the sound reproducing device 3 in advance. The constant Ls in Equation 1 is preferably input into the sound reproducing device 3 in advance according to the viewing position 90.

Alternatively, without performing the calculation of Equation 1 using the constants (E, Ls), the localization position of the stereoscopic image may be estimated from the parallax D (= x_R − x_L) between the left video 38L and the right video 38R by associating the parallax D directly with coordinates. For example, D = 3 (cm) may be associated with Ld = 1 (m).

The above description covers the calculation of the coordinates of the stereoscopic image in the front-rear direction relative to the screen of the monitor 38 (the Y-axis direction, that is, the receding and pop-out directions). The coordinates of the stereoscopic image can be calculated in the same way in the vertical direction (Z-axis direction) and the left-right direction (X-axis direction) relative to the screen of the monitor 38.

The mix level calculation unit 341 of the sound reproducing device 3 shown in FIG. 2 calculates the distribution (mix level information) of the audio signals to be output to the surround speakers 40 and the attach speaker 47, based on the three-dimensional position information of the stereoscopic image relative to the viewing position 90 output by the video processing unit 32. The mix level calculation unit 341 then outputs the mix level information to the waveform adjustment unit 343 and the waveform adjustment unit 345. The mix level information includes information on the gain ratio with which an audio signal from the same source is distributed between the surround speakers 40 and the attach speaker 47 in order to localize the sound image at the position corresponding to the stereoscopic image, and information indicating which channel's audio signal is the source. When the coordinate P of the stereoscopic image lies, for example, midway between the Cch speaker 42 directly in front of the screen and the attach speaker 47, the audio signal corresponding to this stereoscopic image (in this case the Cch audio signal) is reproduced from the Cch speaker 42 and the attach speaker 47 with a gain distribution that depends on the coordinate position. When the coordinate P of the stereoscopic image lies on the left side of the screen and between the screen and the viewing position, the signal is reproduced, for example, from the Lch speaker 41, the Cch speaker 42, and the attach speaker 47 with a gain distribution that depends on the coordinate position.

The gain distribution method described above assumes that the coordinate P of the stereoscopic image lies between the surround speakers 40 and the attach speaker 47. Depending on where the attach speaker is placed, however, the calculated coordinate may be closer to the viewing position than the attach speaker is. In that case, the position of the attach speaker may be treated as the limit and the attach speaker set to its maximum volume. Alternatively, the actual coordinate may be scaled so that it can be expressed within the distance between the attach speaker and the surround speakers. In other words, the coordinate position is scaled (converted) so that it always falls between the attach speaker and the surround speakers. For example, when the calculated coordinate position is midway between the screen and the viewer, the sound image is localized midway between the surround speakers and the attach speaker.
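The two alternatives described above (limiting at the attach speaker position, or scaling the coordinate onto the span between the speakers) might be sketched as follows. The function names and the convention that 0.0 means "at the attach speaker" and 1.0 means "at the surround speaker" are assumptions introduced for this illustration.

```python
def clamped_fraction(ld, attach_distance, surround_distance):
    """Position of Ld between the attach speaker (0.0) and the surround
    speaker (1.0); coordinates outside that span are limited to its ends."""
    span = surround_distance - attach_distance
    if span <= 0:
        raise ValueError("surround speaker must be farther than the attach speaker")
    t = (ld - attach_distance) / span
    return min(1.0, max(0.0, t))

def scaled_fraction(ld, screen_distance):
    """Alternative: scale the viewer-to-screen range [0, Ls] onto the speaker
    span, so an image halfway to the screen lands halfway between the speakers."""
    return min(1.0, max(0.0, ld / screen_distance))
```

For example, with the attach speaker at 0.5 m and the surround speaker at 2.5 m, a computed distance of 0.2 m is limited to 0.0 (attach speaker only) by `clamped_fraction`, whereas `scaled_fraction(1.0, 2.0)` returns 0.5.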

Based on the input mix level information, the waveform adjustment unit 343 selects the audio signal corresponding to the stereoscopic image from among the Lch, Cch, Rch, SLch, and SRch audio signals input from the audio signal input unit 33, adjusts the gain of the selected audio signal, and outputs it to the first sound source signal output unit 35.

Like the waveform adjustment unit 343, the waveform adjustment unit 345 selects the audio signal corresponding to the stereoscopic image (the same audio signal as that selected by the waveform adjustment unit 343) based on the input mix level information, adjusts the gain of the selected audio signal, and outputs it to the second sound source signal output unit 36.

The waveform adjustment unit 343 and the waveform adjustment unit 345 also perform phase adjustment (adjustment of the delay amount) so that the sound emitted from the surround speakers 40 and the sound emitted from the attach speaker 47 reach the viewing position 90 at the same time.

For example, when a sound image is localized between the Cch speaker 42 and the attach speaker 47, the waveform adjustment unit 343 and the waveform adjustment unit 345 perform the following processing. As shown in FIG. 2, the attach speaker 47 is installed in the immediate vicinity of the viewer U and the Cch speaker 42 is installed near the monitor 38. The waveform adjustment unit 345 therefore delays, for example, the emission timing of the attach speaker 47 relative to the emission timing of the Cch speaker 42 by the time it takes sound to travel the distance from the monitor 38 to the viewing position (for example, 1 to 2 m). This allows the sounds to arrive at the viewing position 90 almost simultaneously. If the distances from the viewing position 90 to the Cch speaker 42 and to the attach speaker 47 are measured in advance, the waveform adjustment unit 343 and the waveform adjustment unit 345 can adjust the emission timing of both speakers according to these distances, so that the sounds arrive at the viewing position 90 simultaneously with greater accuracy. This prevents the audio signals emitted from the surround speakers 40 and the attach speaker 47 from cancelling each other at the viewing position 90.
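The delay described above follows directly from the path-length difference and the speed of sound. The sketch below assumes a speed of sound of 343 m/s and a 48 kHz sample rate; both values and the function names are illustrative assumptions, not values given in the description.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C

def attach_delay_seconds(cch_distance_m, attach_distance_m):
    """Delay applied to the nearer attach speaker so that its sound reaches
    the viewing position together with the sound from the Cch speaker."""
    extra_path = cch_distance_m - attach_distance_m
    return max(0.0, extra_path) / SPEED_OF_SOUND_M_S

def delay_in_samples(delay_s, sample_rate_hz=48000):
    """Round the delay to a whole number of samples."""
    return round(delay_s * sample_rate_hz)
```

For a Cch speaker 2 m from the viewing position and an attach speaker placed at the viewing position, `attach_delay_seconds(2.0, 0.0)` is about 5.8 ms, which `delay_in_samples` rounds to 280 samples at 48 kHz.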

When the calculated coordinates of the stereoscopic image lie in the depth direction of the screen, the reverberation adding unit 347 adds to the signal for the surround speakers 40 an amount of reverberation corresponding to the distance Ld, thereby expressing depth. This relies on the phenomenon that when reverberation (reverb) is added to a sound, the proportion of indirect sound rises, so a listener perceives the sound as coming from farther away than the actual speaker position. The reverberation adding unit 347 also adds reverberation to create a difference in the direct-to-indirect sound ratio in cases where that difference between the speakers is small and localization in the depth direction is difficult to express, such as when the room in which the speakers are installed has very little reverberation or when the attach speaker and the surround speakers are installed close together, so that a consistent effect is obtained regardless of the installation environment. The amount of reverberation added in this case is determined from the result of measuring in advance the reverberation characteristics of the room in which the speakers are installed.

In some stereoscopic videos, the background extends in the depth direction while the part of the image to be emphasized, such as a person, pops out. In such a case, reverberation must be added to the surround speakers 40 to express depth, and the gain to the attach speaker must be raised to express the pop-out. If a fixed relationship between distance and the gain ratio between the sound sources were used here, the added reverberation would change the direct-to-indirect sound ratio of the surround speakers 40, and the sound image could no longer be localized correctly at the calculated position of the stereoscopic image. The mix level calculation unit 341 therefore varies this distance-to-gain-ratio relationship according to the amount of reverberation added to the audio signal, so that the correct sound localization position is always maintained. Specifically, as the amount of added reverberation increases, the gain ratio of the attach speaker for the same localization position is raised, and conversely, as the amount of added reverberation decreases, the gain ratio of the attach speaker for the same localization position is lowered.
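The description states only the direction of this compensation (more reverberation, higher attach-speaker gain ratio), not a formula. The following is therefore a purely hypothetical sketch: the linear form, the `compensation` coefficient, and the 0.0 to 1.0 normalization of all quantities are assumptions chosen for illustration.

```python
def attach_gain_ratio(base_ratio, reverb_amount, compensation=0.3):
    """Raise the attach-speaker share of the signal as more reverberation is
    added to the surround speakers. All quantities are normalized to 0.0-1.0.

    base_ratio:    attach-speaker share for this position with no reverb added
    reverb_amount: normalized amount of reverberation added to the surround side
    compensation:  hypothetical strength of the correction (assumed value)
    """
    ratio = base_ratio + compensation * reverb_amount * (1.0 - base_ratio)
    return min(1.0, max(0.0, ratio))
```

With no added reverberation the function returns `base_ratio` unchanged, and for a fixed localization position the returned ratio grows monotonically with `reverb_amount`, matching the behavior described above.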

FIG. 5 is a diagram for explaining the localization position of a sound image. FIG. 5A shows the case where the sound image is localized at the viewing position, FIG. 5B the case where it is localized between the attach speaker and the surround speaker, FIG. 5C the case where it is localized at the position of the surround speaker (Cch), FIG. 5D the case where it is localized behind the surround speaker (Cch), and FIG. 5E the case where it is localized using a plurality of speakers. FIG. 6 is a diagram showing presence speakers installed in a room. In the following description, the viewing position is taken as the origin O, as in FIG. 4A.

(1) As shown in FIG. 5A, when the coordinate P of the stereoscopic image is at the origin O (Ld = 0), the waveform adjustment unit 343 and the waveform adjustment unit 345 adjust the gain of the audio signal corresponding to the stereoscopic image so that sound is emitted only from the attach speaker 47 and no sound is emitted from the Cch speaker 42. Because the attach speaker is installed in the immediate vicinity of the viewer, the sound perceived from it is mainly direct sound, and as a result the viewer perceives the sound as coming from very close by.

(2) As shown in FIG. 5B, when the coordinate P of the stereoscopic image is between the origin O and the Cch speaker 42 (0 < Ld < Ls), the waveform adjustment unit 343 and the waveform adjustment unit 345 adjust the gain of the audio signal corresponding to the stereoscopic image and thereby adjust the volume of the sound emitted from the attach speaker 47 and the Cch speaker 42. This makes it possible to reproduce sound with a direct-to-indirect sound ratio at which the viewer perceives the sound as localized at the midpoint 55.

(3) As shown in FIG. 5C, when the coordinate P of the stereoscopic image is at the sound emitting surface of the Cch speaker 42 (Ld = Ls), the waveform adjustment unit 343 and the waveform adjustment unit 345 adjust the gain of the audio signal corresponding to the stereoscopic image so that no sound is emitted from the attach speaker 47 and sound is reproduced only from the Cch speaker 42. The viewer U therefore perceives a sense of distance determined by the direct-to-indirect sound ratio of the sound from the Cch speaker 42.

(4) As shown in FIG. 5D, when the coordinate P of the stereoscopic image is behind the Cch speaker 42 (Ld > Ls), sound is reproduced, as in (3), from the Cch speaker 42 alone or from a plurality of surround speakers including the Cch speaker, and at the same time the reverberation adding unit 347 adds to the surround speaker sound a reverberation component corresponding to the distance from the screen to the coordinate P. The reverberation adding unit 347 increases or decreases the reverberation component added to the surround speaker sound according to the distance from the screen to the coordinate P (the coordinate information of the stereoscopic image). The viewer U thereby perceives a sound field that extends into the depth of the screen.
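Cases (1) to (4) above can be summarized in a single function. This is a sketch under stated assumptions: the linear crossfade in case (2), the linear growth of reverberation in case (4), the `max_reverb_distance` parameter, and the name `plan_playback` are all hypothetical, since the description does not specify the exact gain or reverberation laws.

```python
def plan_playback(ld, ls, max_reverb_distance=5.0):
    """Return (attach_gain, cch_gain, reverb_amount) for an image at distance
    Ld from the viewing position, with the Cch speaker and screen at Ls."""
    if ld <= 0.0:
        # Case (1): image at the viewing position, attach speaker only.
        return 1.0, 0.0, 0.0
    if ld < ls:
        # Case (2): between viewer and Cch speaker, linear crossfade (assumed).
        t = ld / ls
        return 1.0 - t, t, 0.0
    if ld == ls:
        # Case (3): image on the screen, Cch speaker only.
        return 0.0, 1.0, 0.0
    # Case (4): image behind the screen; reverb grows with the distance
    # beyond the screen and saturates at 1.0 (assumed normalization).
    reverb = min(1.0, (ld - ls) / max_reverb_distance)
    return 0.0, 1.0, reverb
```

For instance, with Ls = 2.0 m, an image at Ld = 1.0 m yields equal gains for the two speakers and no reverberation, while an image at Ld = 4.5 m is reproduced from the Cch speaker alone with half of the maximum reverberation.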

In this way, changing the volume of the attach speaker 47 and the amount of reverberation added to the surround speakers 40 changes the balance between the sound from the attach speaker 47 and that from the surround speakers 40, and thus changes the ratio of direct sound to indirect sound heard by the viewer U. A person senses the localization of a sound image from the ratio of direct to indirect sound, the frequency characteristics, the phase difference, and so on. In the present invention, the ratio of direct to indirect sound is controlled to adjust mainly the perceived distance of the sound image. The sound image can thereby be moved to follow a stereoscopic image that moves by popping out of or receding into the screen of the monitor 38.

The above description mainly covers the case where the sound image is localized using only the Cch speaker 42 and the attach speaker 47. In actual content, however, sound that is highly correlated with the Cch signal is often also contained in other channels (for example, Lch and Rch), so extracting only the Cch signal and applying the effect of the present invention to it may produce an unnatural sound quality. For this reason, in practice a plurality of channels including Cch may be selected and reproduced. Furthermore, in the present invention, the sound image can also be localized at positions other than directly in front by adjusting the audio gain of each channel of the surround speakers 40 and the attach speaker 47 according to the left-right position of the stereoscopic image on the screen.

For example, by adjusting the audio gains of the Lch speaker 41, the SLch speaker 44, and the attach speaker 47 according to the left-right position of the stereoscopic image on the screen and emitting sound from them, the sound image 57 can be localized within the triangular region formed by the lines connecting the Lch speaker 41, the SLch speaker 44, and the attach speaker 47, as shown in FIG. 5E.
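The description does not state how the three gains are derived. One possible approach, shown here only as an assumption, is to use the barycentric coordinates of the target position within the speaker triangle as relative gains, so that each speaker's weight grows as the target approaches it. The function name `triangle_gains` and the 2-D coordinate convention are hypothetical.

```python
def triangle_gains(p, a, b, c):
    """Barycentric weights of point p inside triangle (a, b, c), used here as
    relative gains for the speakers located at a, b and c.

    All points are (x, y) tuples in the same plane. The three weights sum to 1.
    """
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("speaker positions are collinear")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    wc = 1.0 - wa - wb
    if min(wa, wb, wc) < 0:
        raise ValueError("target lies outside the speaker triangle")
    return wa, wb, wc
```

A target that coincides with one speaker receives weight 1 for that speaker and 0 for the others. Whether the weights are applied as amplitude or power gains is a further design choice that this sketch leaves open.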

As shown in FIG. 6, a speaker 51 for the presence left channel (PLch) and a speaker 52 for the presence right channel (PRch) may be installed above the front left and front right of the viewing position 90. By adjusting, for example, the audio gains of the Rch speaker 43, the PRch speaker 52, and the attach speaker 47 according to the vertical position of the stereoscopic image on the screen and emitting sound from them, the sound image can also be localized in the vertical direction of the room 91. That is, as shown in FIG. 6, the sound image 58 can be localized within the triangular region formed by the lines connecting the Rch speaker 43, the PRch speaker 52, and the attach speaker 47.

In the sound image processing unit 34, the monitoring unit 348 monitors the movement of the coordinates of the stereoscopic image relative to the viewing position 90 as calculated by the video processing unit 32. The monitoring unit 348 sends a control signal to the waveform adjustment units 343 and 345 so that the frequency characteristics of the audio signal for the attach speaker 47 are adjusted according to the coordinate information of the stereoscopic image and the installation position of the attach speaker.

FIG. 7 is a graph showing frequency characteristics as a function of the distance to the sound source. One sound effect technique gradually increases the high-frequency components of a sound to express a sound image approaching from far away, and gradually decreases them to express a sound image moving away. As shown in FIG. 7, this technique imitates the following phenomenon: when sound of a constant level from low to high frequencies is emitted, the high-frequency components arrive without attenuation if the sound source is near the viewing position, but they are increasingly attenuated as the sound source moves farther from the viewing position. This technique is hereinafter referred to as the approach/recede effect. A general sound reproducing device can let the viewer experience the approach and recession of a sound image by reproducing an audio signal to which such an approach/recede effect has been applied.

The sound reproducing device 3 of the present invention, by contrast, can actually move the sound image closer and farther away. Consequently, if the approach/recede effect has already been applied to the audio signal of the 3D content and that signal is reproduced unchanged from the attach speaker, the high-frequency components may sound excessive to the viewer and impair comfortable viewing, or the sound image may fail to be localized accurately at the position of the stereoscopic image intended by the sound reproducing device 3. In the present invention, therefore, the monitoring unit 348 monitors the movement of the coordinates of the stereoscopic image relative to the viewing position 90 as calculated by the video processing unit 32, and the frequency characteristics are adjusted according to that movement and the installation position of the attach speaker (its distance from the viewer), so as to suppress the excess of high-frequency components and the approach/recede effect. That is, when the stereoscopic image moves from the screen toward the viewer, the reduction of the high-frequency components of the audio signal is increased the closer the image comes to the viewing position. When the stereoscopic image moves from the front back toward the screen, the high-frequency components of the audio signal are increased as the image moves away, and the original frequency characteristics are fully restored at the position of the Cch speaker. When the stereoscopic image is behind the screen, this adjustment of the frequency characteristics is not performed.
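The distance-dependent adjustment described above could be expressed, for example, as a high-frequency cut that is largest at the viewing position and falls to zero at the Cch speaker position. The linear interpolation, the 6 dB maximum, and the name `high_shelf_cut_db` are assumptions for illustration, and the dependence on the attach speaker installation position mentioned in the description is omitted from this sketch for brevity.

```python
def high_shelf_cut_db(ld, ls, max_cut_db=6.0):
    """High-frequency attenuation (dB, non-negative) applied to the signal
    for the attach speaker as a function of the image distance Ld.

    ld: distance from the viewing position to the stereoscopic image
    ls: distance from the viewing position to the screen / Cch speaker
    """
    if ld >= ls:
        # On or behind the screen: original frequency characteristics kept.
        return 0.0
    nearness = 1.0 - max(0.0, ld) / ls  # 0.0 at the screen, 1.0 at the viewer
    return max_cut_db * nearness
```

With Ls = 2.0 m, an image at 1.0 m would receive a 3 dB cut and an image at the viewing position the full 6 dB. The returned value would then drive a shelving filter, whose design is outside the scope of this sketch.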

The attach speaker 47 described so far need not be a speaker capable of reproducing the entire frequency band. It is sufficient that it can emit sound in the frequency band in which the viewer U can perceive the localization and movement of a sound image. For example, a small attach speaker with a lowest resonance frequency f0 of 500 Hz or higher may be used. Such a small attach speaker offers great freedom in where and how it is installed. It can, for example, be attached to the glasses used for viewing 3D video, which places the speaker near the ears of the viewer U, so that the viewer hears sound with a very high proportion of direct sound and the effect of the present invention is maximized.

[Second Embodiment]
Next, a sound reproducing device according to a second embodiment of the present invention will be described. FIG. 8 is a block diagram showing the schematic configuration of a 3D system including the sound reproducing device according to the second embodiment. FIG. 9 is a diagram showing how the speaker array emits a sound beam. In the following description, components that are the same as in the first embodiment are given the same reference numerals and their description is omitted.

The 3D system 6 shown in FIG. 8 includes a video playback device 2 and a sound reproducing device 8. Unlike the 3D system 1 shown in FIG. 1, the 3D system 6 does not install the attach speaker 47 of the sound reproducing device 3 in the immediate vicinity of the viewing position 90. Instead, a speaker array 49 emits a sound beam (the audio signal of the second sound source) that is focused at a point 50 in the immediate vicinity of the viewing position 90. The rest of the configuration is the same as in the 3D system 1: the audio signal of the first sound source is emitted from the surround speakers 40, and the sound image is localized (placed) at an arbitrary position relative to the viewing position 90. As an example, the speaker array 49 is installed near the monitor 38 in the room 91 (at the front of the room 91). The speaker array 49 has a plurality of speaker units 49-1 to 49-N arranged in a predetermined layout.

The sound reproducing device 8 is a partial modification of the sound reproducing device 3: its sound image processing unit 34B includes a phase adjustment unit 349 downstream of the waveform adjustment unit 345. The phase adjustment unit 349 corresponds to the signal processing means. As shown in FIG. 9, it adjusts the delay of the audio signal (the audio signal of the second sound source) output to each of the speaker units 49-1 to 49-N of the speaker array 49, so that the speaker array 49 emits a sound beam focused at the point 50 in the immediate vicinity of the viewing position 90 (for example, in front of it). When the speaker array 49 emits such a sound beam, the viewer U perceives the sound source (the second sound source) as being localized near the point 50. This is because, compared with sound from a speaker located at the same position as the speaker array that performs no directivity control by signal processing or by a special structure, the sound beam has a higher proportion of direct sound owing to its sharp directivity. The viewer U therefore has the impression that the sound source is closer than the place where the speaker array is installed. As a result, the attached speaker no longer has to be installed in the immediate vicinity of the viewing position 90, which increases the freedom of configuration and placement as well as convenience. The same effect may also be achieved by means other than a speaker array. The attached speaker only needs to output sound with a higher direct-sound ratio than an ordinary speaker that performs no directivity control, so, for example, a parametric array speaker that uses ultrasonic waves or a flat baffle speaker that can reproduce highly directional sound may be used.
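The delay adjustment performed by the phase adjustment unit 349 can be illustrated with a minimal sketch. The description does not give a delay formula, so the following assumes the common delay-and-sum focusing rule: each unit is delayed so that all wavefronts reach the focal point at the same time. The function name `focus_delays`, the speed-of-sound constant, and the eight-unit layout are hypothetical values chosen for illustration only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def focus_delays(unit_positions, focal_point, c=SPEED_OF_SOUND):
    """Return per-unit delays (seconds) so all wavefronts reach focal_point together."""
    distances = [math.dist(p, focal_point) for p in unit_positions]
    farthest = max(distances)
    # The farthest unit fires first (zero delay); nearer units wait for the difference.
    return [(farthest - d) / c for d in distances]


# Hypothetical layout: 8 units spaced 5 cm apart along the x axis,
# focal point 2 m in front of the centre of the array.
units = [(0.05 * (i - 3.5), 0.0) for i in range(8)]
delays = focus_delays(units, (0.0, 2.0))
```

With this layout the outermost units receive zero delay and the central units the largest delay, which is what bends the emitted wavefront toward the focal point.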

As described above, in the present invention a plurality of speakers (the surround speakers 40) are arranged around the viewing position 90 (at positions away from the viewing position 90). Sound emitted by these speakers has a high proportion of indirect sound at the viewing position, so the viewer perceives it as coming from far away. In addition, the attached speaker 47 is arranged in the immediate vicinity of the viewing position 90 of the viewer U. Sound emitted by this speaker has a high proportion of direct sound at the viewing position, so the viewer perceives it as coming from nearby. Furthermore, if the second sound source is placed in the immediate vicinity of the viewing position, the viewer hears the sound emitted from it almost entirely as direct sound, so the sound source can be localized at the viewer's viewing position. Therefore, by adjusting the mix level between the surround speakers and the attached speaker, a sound image can be localized at an arbitrary position.
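As a rough illustration of the mix-level adjustment, the sketch below maps the distance of the desired sound image to a pair of gains. The equal-power crossfade law, the function name `mix_gains`, and the use of the surround-speaker distance as the far limit are assumptions made for this example; the description only states that the mix level is adjusted.

```python
import math


def mix_gains(source_distance, surround_distance):
    """Return (surround_gain, attached_gain) for a sound image at source_distance metres."""
    # Clamp to [0, 1]: 0 = at the viewing position, 1 = at (or beyond) the surround speakers.
    x = min(max(source_distance / surround_distance, 0.0), 1.0)
    # Equal-power crossfade (an assumption): the squared gains always sum to 1.
    return math.sin(x * math.pi / 2), math.cos(x * math.pi / 2)
```

A sound image at the viewing position is fed only to the attached speaker, one at the surround-speaker distance only to the surround speakers, and intermediate positions receive a blend of both.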

The first embodiment showed an example in which five surround speakers 40 are arranged around the viewing position 90 of the viewer U, but the invention is not limited to this, and other schemes may be used. For example, a speaker array may be installed in place of the surround speakers. Five-channel audio signals are input to the speaker array, the sound emission timing of each speaker unit is controlled to generate five sound beams, and the sound beams are reflected off the walls of the room. This produces a sound field around the viewing position 90 similar to that obtained with five speakers. It is therefore also possible to install the attached speaker in the immediate vicinity of the viewing position 90 and have a speaker array emit sound that simulates the surround speakers 40.

It is also possible to install two speaker arrays around the viewing position 90, have one speaker array emit a sound beam focused in the immediate vicinity of the viewing position 90 to simulate the attached speaker, and have the other speaker array emit sound that simulates the surround speakers 40.

FIG. 2 shows an example in which one attached speaker is arranged in front of (in the immediate vicinity of) the viewer U, but the present invention is not limited to this. For example, two attached speakers may be arranged in front of or beside the viewer U and set to emit stereo sound. Alternatively, taking advantage of the characteristic of human hearing that front and back localization is easily confused for sound in the median plane, an attached speaker may be installed on a sofa immediately behind the viewer's head or in the headrest of a movie theater seat.

In either case, a sound image can be localized at an arbitrary position around the viewer by adjusting the volume balance between the attached speaker and the surround speakers.

FIG. 8 shows an example in which one sound beam is focused in front of (in the immediate vicinity of) the viewer U, but the present invention is not limited to this. For example, the speaker array may output (emit) two sound beams (stereo sound beams) set to focus at the front left and front right of the viewer U, respectively. This provides the same effect as arranging two attached speakers.

The above description covered a configuration in which the sound reproducing device 3 analyzes the video signal output by the video playback device and calculates the coordinates of the video. However, the coordinate information of the video may instead be supplied from outside the sound reproducing device 3, for example by using coordinate information written in advance into the signal stream of the content. In that case, the device may be configured to calculate the mix level from this coordinate information without analyzing the video signal, which simplifies the configuration of the sound reproducing device 3. The coordinate information also need not be obtained from 3D content. For example, video data obtained after a playback device converts 2D content into 3D video may be used, or coordinate information created by predicting three-dimensional coordinates from an analysis of the time-series correlation of 2D video images, or from an analysis of how the image and the audio or volume change together, may be used.

Specifically, the following methods are conceivable for obtaining the coordinates of a moving object from 2D images. By analyzing time-series correlation, an object image is extracted from consecutive frames on the time axis, and an object whose image becomes larger in the next frame (as a similar figure) is presumed to be approaching. Coordinate information (or its change) may also be obtained from the relationship between the sizes of the background and the object of interest (perspective), or by using the fact that the outline of an object approaching out of fog gradually becomes clearer. These examples mainly yield the relative relationship of whether an extracted object is approaching or receding. As for absolute distance, a rough value can be estimated with scene-discrimination techniques, such as determining whether the video is an indoor or an outdoor scene, and combined with the near-far movement described above. There may also be cases where it is hard to tell whether an object that grows larger on the screen is approaching or is itself expanding, as when a balloon is being inflated. In such cases the audio signal may be used as an aid. For example, if there is an audio signal component whose volume increases as the apparent size (angle of view) of the object increases, the object is judged to be approaching. Alternatively, the panning of the sound image in a multichannel signal, described later, can also be taken into account. Another conceivable method uses the face detection now common in digital cameras and the like to estimate forward and backward movement from changes in the distance between parts of the face (for example, the spacing of the eyes).
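The combination of image-size change and volume change described above can be sketched as a simple decision rule. Everything concrete in this example is an assumption: the function name `classify_motion`, the 2% tolerance, the comparison of only the first and last frames, and the labels returned. The description itself only states that a growing image accompanied by rising volume is judged to be approaching.

```python
def classify_motion(sizes, levels, tol=0.02):
    """Classify object motion from apparent sizes and linear audio levels per frame."""
    size_change = sizes[-1] / sizes[0] - 1.0
    level_change = levels[-1] / levels[0] - 1.0
    if abs(size_change) <= tol:
        return "static"
    if size_change > 0:
        # Growing image: treat it as approaching only if the sound gets louder too.
        return "approaching" if level_change > tol else "possibly expanding"
    # Shrinking image: treat it as receding only if the sound gets quieter too.
    return "receding" if level_change < -tol else "possibly shrinking"
```

For instance, an object whose apparent height grows from 40 to 50 pixels while its sound level stays constant would be labelled "possibly expanding" rather than "approaching", which corresponds to the balloon case.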

The present invention can also be implemented without image processing, by extracting the coordinate information of the sound image from the audio of the source. Some content audio tracks are mixed using multiple audio channels, such as 5.1ch or 7.1ch, with the intention of moving the sound image. Consider, for example, a sound source such as a helicopter passing overhead (directly above) and moving from front to back. In this case, the (monaural) signal of the sound image is first emitted from the center speaker only, then distributed to the center plus L and R at equal volume, then to L and R at equal volume plus SR and SL at equal volume (at this stage the front-rear volume ratio moves the sound image from front to rear), and then to SR and SL at equal volume (if there are back speakers, the volume is subsequently distributed to them as well). This gives the impression of the sound image moving from front to rear. For such a source, the sound reproducing device can work in the opposite direction: it analyzes the mixed audio track to determine how the in-phase (monaural) component is distributed among the channels, and from this infers the sound image movement intended by the producer. Furthermore, content producers commonly vary the ratio of direct to indirect sound to convey a sense of distance, exploiting the characteristics of human perception of distance and sound described in [Problems to be Solved by the Invention]. As above, the ratio of direct to indirect sound can be calculated from the audio track, and the coordinates of the sound image intended by the producer can be inferred from the calculated value. The position information of the sound image obtained in these ways can be used for the sound image movement of the present invention.
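The panning analysis can be illustrated with an energy-weighted centroid over the channel positions. This is a minimal sketch, not the method of the description: it assumes the in-phase component level of each channel has already been extracted, and the speaker azimuths in `AZIMUTH`, the function name `pan_centroid`, and the four example frames (modelled on the helicopter sequence) are hypothetical.

```python
import math

# Assumed speaker azimuths in degrees (0 = straight ahead, positive = right).
AZIMUTH = {"C": 0, "L": -30, "R": 30, "SL": -110, "SR": 110}


def pan_centroid(levels):
    """levels: channel name -> linear amplitude of the shared (in-phase) component."""
    total = sum(a * a for a in levels.values())
    if total == 0:
        return None  # silent frame: no position estimate
    x = sum(a * a * math.sin(math.radians(AZIMUTH[ch])) for ch, a in levels.items()) / total
    y = sum(a * a * math.cos(math.radians(AZIMUTH[ch])) for ch, a in levels.items()) / total
    return x, y  # x: left(-)/right(+), y: rear(-)/front(+)


# Four mixing stages of a front-to-rear pass, each channel at equal level.
frames = [
    {"C": 1.0},
    {"C": 1.0, "L": 1.0, "R": 1.0},
    {"L": 1.0, "R": 1.0, "SL": 1.0, "SR": 1.0},
    {"SL": 1.0, "SR": 1.0},
]
ys = [pan_centroid(f)[1] for f in frames]
```

For this sequence the front-rear coordinate in `ys` decreases from stage to stage while the left-right coordinate stays at the centre, which matches the intended front-to-rear movement.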

As described above, the coordinate information of the sound image can be obtained by various methods. Any of these methods may be used, and several may be combined to improve the accuracy with which the coordinate information is obtained.

DESCRIPTION OF SYMBOLS
1, 6 ... 3D system
2 ... Video playback device
3, 8 ... Sound reproducing device
21 ... Content signal receiving unit
22 ... Content playback unit
23 ... Video processing unit
25 ... Video signal output unit
26 ... Audio signal output unit
31 ... Video signal input unit
32 ... Video processing unit
33 ... Audio signal input unit
34 ... Sound image processing unit
35 ... First sound source signal output unit
36 ... Second sound source signal output unit
37 ... Video signal output unit
38 ... Monitor
41 ... Lch speaker
42 ... Cch speaker
43 ... Rch speaker
44 ... SLch speaker
45 ... SRch speaker
47 ... Attached speaker

Claims

A sound reproducing device comprising:
coordinate information acquisition means for acquiring coordinate information, in the viewing environment, of video displayed by a display device;
sound emitting means for arranging a first sound source around a viewing position and arranging a second sound source closer to the viewing position than the first sound source;
supply means for supplying an audio signal corresponding to the video to the first sound source and the second sound source; and
sound image control means for adjusting a gain ratio of the audio signals supplied to the first sound source and the second sound source in accordance with the coordinate information.

The sound reproducing device according to claim 1, wherein the sound emitting means comprises a first speaker that emits the audio signal of the first sound source, and a second speaker that is installed closer to the viewing position than the first speaker and emits the audio signal of the second sound source.

The sound reproducing device according to claim 1, wherein the sound emitting means comprises:
a speaker array in which a plurality of speaker units are arranged;
a signal processing unit that delays the audio signal of the second sound source and distributes it to the plurality of speaker units, so that an audio beam focused at the position where the second sound source is localized is emitted; and
a speaker that emits the audio signal of the first sound source.

The sound reproducing device according to claim 1, wherein the sound image control means adds a reverberation component to the audio signal emitted by the first sound source, and increases or decreases the amount added in accordance with the coordinate information of the video.

5. The sound reproducing device according to claim 4, wherein the sound image control means varies the gain ratio of the audio signals supplied to the first sound source and the second sound source in accordance with the amount of the reverberation component added to the audio signal emitted by the first sound source.

The sound image control means adjusts the frequency characteristic of the audio signal of the second sound source in accordance with the coordinate information of the video so that a change in frequency in a predetermined band of the audio signal is suppressed. The sound reproducing device described in 1.