JP2007274061A

JP2007274061A - Sound image localizer and AV system

Info

Publication number: JP2007274061A
Application number: JP2006093943A
Authority: JP
Inventors: Makoto Tanaka; 田中　　良
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-03-30
Filing date: 2006-03-30
Publication date: 2007-10-18

Abstract

PROBLEM TO BE SOLVED: To represent sound image localization of a video image in a content accurately in a device for reproducing the sound field of sound synchronized with the video image, specifically an object in the video image, by outputting to a speaker array. SOLUTION: A sound image localizer comprises an input section 4, a parameter calculator 5, and a signal processor 6. The input section 4 receives a sound signal 41 which accompanies one or more objects appearing in a video image, and the positional information 42 in the video image of the object synchronized with the video image for every object. The signal processor 6 processes the input signal of that sound, and outputs a digital sound signal to each channel of a plurality of speakers. Based on the positional information of the object, a virtual sound source is set in front of a display for displaying the video image, and various parameters 51, 52 and 53 for forming the sound field of the virtual sound source are calculated for every channel. The signal processor 6 outputs the digital sound signal to each channel of each speaker based on the various parameters. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an apparatus that uses a speaker array to produce sound synchronized with video, in particular with an object in the video.

Audio systems such as 2-channel stereo and 5.1-channel surround have long been in practical use for reproducing the sound that accompanies video with a spatial spread matched to the picture.

Meanwhile, for recent PC games and video game consoles, a technique has been proposed that more accurately localizes the sound image of a sound emitted by an object (a character or other object) at the position where that object appears in the video (see, for example, Patent Document 1). The device of Patent Document 1 uses stereo headphones with speakers placed above and below each of the left and right ears so that a sound image can be localized in the vertical direction.
Patent Document 1: JP-A-6-165877

However, although the device of Patent Document 1 can localize a sound image in the vertical direction, unlike conventional stereo headphones, it cannot give the sound image a sense of distance or depth. The device also requires headphones and cannot localize a sound image with a speaker set installed in a room.

Accordingly, an object of the present invention is to provide a device that uses a speaker array to localize, in three dimensions, sound synchronized with video, in particular with the movement of an object in the video.

To solve the above problems, the present invention is configured as follows.

(1) The present invention is a sound image localization apparatus comprising:
an input unit that receives an audio signal associated with an object included in a video, and position information of the object within the video; and
a localization control unit that controls a speaker array having a plurality of speakers arranged in an array, by controlling the timing at which the audio signal is input to each speaker so as to form an audio beam focused on the position indicated by the position information.

In the present invention, the input unit receives the position information of the object within the video, and sound is output to the speakers arranged in an array. Based on the position indicated by the position information and the placement of the speaker array, the timing at which the audio signal is input to each speaker is controlled so that the output forms an audio beam focused on the position indicated by the position information. When the speaker array then outputs the sound, it reproduces the sound that would have spread from a virtual sound source at that position and reached the speaker array, so the sound image can be localized at the position indicated by the position information of the object in the video.

Further, according to the present invention, the sound image can be localized at an arbitrary position without relying on physical elements such as a ceiling or walls. Moreover, rather than synthesizing a sound effect that imitates the Doppler effect, the apparatus reproduces the Doppler effect accurately as a consequence of the sound image localization itself.

The timing at which the audio signal is input can be controlled, for example, as follows: for the output to each speaker, the position of the virtual sound source is calculated from the position information of the object in the video, and the audio signal is output after being delayed by the distance from the virtual sound source to that speaker divided by the speed of sound.
In the present invention, as long as the audio signal and the position information pass through the "input unit" and are processed by the signal processing unit, they may be data stored in an external storage unit (hard disk, magneto-optical disk, etc.) or streaming data supplied over a communication line. The "input unit" may be, for example, a temporary storage device (memory).
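The delay rule described above (distance from the virtual sound source to each speaker, divided by the speed of sound) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 2-D coordinate convention, the function name `speaker_delays`, and the value 343 m/s for the speed of sound are all assumptions introduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, the source does not state one


def speaker_delays(source_xy, speaker_xs, speed=SPEED_OF_SOUND):
    """Return one delay in seconds per speaker for a virtual sound source.

    source_xy:  (x, depth) of the virtual source in metres, with the
                speaker array lying on the line depth = 0 (assumed layout).
    speaker_xs: x coordinates of the speakers in metres.
    """
    sx, depth = source_xy
    # Delay = straight-line distance from source to speaker / speed of sound.
    return [math.hypot(sx - x, depth) / speed for x in speaker_xs]


# Hypothetical four-speaker array, source 2 m behind its centre.
delays = speaker_delays((0.0, 2.0), [-0.3, -0.1, 0.1, 0.3])
```

With a centred source the delays are symmetric about the middle of the array, and the outer speakers are delayed slightly more than the inner ones.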

(2) In the present invention, the localization control unit further controls the level at which the audio signal is input to each speaker.

Because the level at which the audio signal is input to each speaker is also controlled, the sound image can be localized more accurately.

(3) In the present invention, the position information includes distance information in the depth direction of the display screen that displays the video.

Because the position information includes distance information in the depth direction, the invention can localize a sound image with a sense of depth that was not previously obtainable. In addition, the frequency shift due to the Doppler effect described above is stronger when the object moves in the depth direction, which makes the sound image localization more three-dimensional.

(4) In the present invention, the input unit receives audio signals respectively associated with a plurality of objects included in a video, together with position information of each object within the video, and
the localization control unit generates an audio beam for each of the plurality of objects based on the position indicated by its position information, and combines these beams.

Because an audio beam is generated from the position information of each of the objects in the video, emitting the combined output of these beams from the speaker array makes it possible to emit, simultaneously, sounds localized at the virtual sound source positions of the respective objects. The virtual sound sources at which the sounds associated with these objects are localized then differ in relative position, which makes the sound image localization more three-dimensional.

(5) The present invention is a content reproduction apparatus comprising:
a video output unit that outputs a video signal including an object and outputs position information of the object within the video signal;
an audio output unit that outputs an audio signal associated with the object in synchronization with the video signal;
a display unit that receives the video signal and displays the video;
a speaker array that is installed alongside the display unit and has a plurality of speakers arranged in an array; and
the sound image localization apparatus according to any one of (1) to (4).

In the content reproduction apparatus of the present invention, the audio signal that the audio output unit outputs for the object in synchronization with the video signal from the video output unit is output to the speaker array as an audio beam focused on the position given by the position information, so the sound image can be localized to match the position on the screen.

According to the present invention, a sound image can be localized accurately based on the position of an object in the video. This localization can be set at an arbitrary position without relying on physical elements such as a ceiling or walls. Moreover, rather than synthesizing a sound effect that imitates the Doppler effect, the apparatus reproduces the Doppler effect accurately as a result of accurate sound image localization based on the delay-setting parameters, the volume-attenuation parameters, and so on.

<Overview of AV System of First Embodiment>
An outline of the AV system 1, which includes the sound image localization apparatus 10 of this embodiment, is given with reference to FIG. 1. FIG. 1 shows the appearance and configuration of the AV system 1 according to this embodiment. FIG. 1(A) is a front view of the AV system 1, and FIG. 1(B) is a plan view of the AV system 1.
The AV system 1 includes a content reproduction device 2, a sound image localization device 10, a speaker array 11 made up of a plurality of speakers SPi (i = 1 to N), and a display 12. A viewer 100 watches and listens from in front of the speaker array 11 and the display 12.

The content reproduction device 2 outputs a video signal to the display 12 and outputs position information to the sound image localization device 10. Examples of a content reproduction device 2 that outputs video and audio together include a PC or TV that outputs content containing video and audio, such as a game program or a movie, and a remote conference device. The sound image localization device 10 outputs to the speaker array 11 audio beams whose focal points are the virtual sound source positions calculated from the positions of the objects 105A and 105B shown on the display 12. The speaker array 11 reproduces the sound that would be emitted from these virtual sound sources and reach the speaker array 11, and thereby localizes the sound images of these virtual sound sources. The display 12 shows an image 104 containing one or more objects such as 105A and 105B.

<Description of Configuration of AV System of First Embodiment>
The AV system of the first embodiment is described with reference to FIGS. 1 and 2. As noted above, FIG. 1 shows the appearance and configuration of this AV system. FIG. 2 is an internal block diagram of the sound image localization device 10 included in the AV system 1 shown in FIG. 1.

First, the external appearance of the AV system 1 is described with reference to FIG. 1. As shown in FIG. 1(A), the AV system 1 includes the content reproduction device 2, the sound image localization device 10, the speaker array 11, and the display 12. The speaker array 11 has a plurality of speakers SPi (i = 1 to N), and the sound image localization device 10 has an audio output channel for each of the speakers SPi (i = 1 to N).
The display 12 is, for example, a plasma display or a liquid crystal display, and shows the video, in particular moving images, output from the content reproduction device.
In FIG. 1 the sound image localization device 10 is shown in a housing separate from the speaker array 11 and the display 12, but it may instead be built into the same housing as the speaker array 11 and the display 12.

Next, the method of controlling the audio output channel of each speaker SPi (i = 1 to N) of the speaker array 11 is outlined with reference to the plan view of FIG. 1(B). The objects 105A and 105B shown in FIG. 1(A) are contained in the image 104 output from the content reproduction device 2, and the content reproduction device 2 outputs information on the positions of these objects (hereinafter "position information") to the sound image localization device 10. Based on the position information of the objects 105A and 105B, the sound image localization device 10 sets virtual sound sources 101A and 101B, respectively. The sound image localization device 10 then localizes the sound image so that the sound the viewer 100 hears from the speaker array 11 (shown as the actual wavefront 103) seems, as shown in FIG. 1(B), to be emitted from the virtual sound sources 101A and 101B located behind the display 12 (shown as the virtual wavefront 102). To achieve this localization, the sound image localization device 10 controls the timing of the audio signal input to each of the speakers SPi (i = 1 to N) of the speaker array 11. In this control, the time-delay (phase-delay) characteristic (hereinafter "delay"), the volume, and the frequency characteristic are adjusted before the sound 103 is output from the speakers SPi. The sound image can thereby be localized at the positions of the virtual sound sources 101A and 101B.

Next, the internal configuration of the sound image localization device 10 is described with reference to FIG. 2. The sound image localization device 10 includes an input unit 4, a parameter calculation unit 5, a signal processing unit 6, D/A converters DACi (i = 1 to N), amplifiers AMPi (i = 1 to N), and a control unit 7 that supervises the operation of all of these. The content reproduction device 2, which is external to the sound image localization device 10, includes an external program 111, an external storage unit 31, and a CPU (not shown) that runs the external program 111.

The input unit 4 has an interface for receiving each signal and a buffer for storing the received signals. The Internet, a content reproduction device, and the like are connected to the input unit 4. From these, the input unit 4 receives the audio signal 41 and the position information 42 and stores the data temporarily in the buffer.

The audio signal 41 is an audio signal received from the external storage unit 31, the external program 111, or the Internet, and is digital audio data output in synchronization with the image 104 shown in FIG. 1(A). An audio signal 41 is received for each of the objects 105A and 105B in the image 104 shown in FIG. 1(A).
The position information 42 is the position information output by the content reproduction device 2 shown in FIG. 1(A).

The parameter calculation unit 5 in FIG. 2 is implemented as a computation unit. Based on the position information 42 received by the input unit 4, the parameter calculation unit 5 sets the positions of the virtual sound sources 101A and 101B as shown in FIG. 1(A), calculates a delay parameter 51, a volume adjustment parameter 52, and a high-frequency attenuation parameter 53, and either stores these parameters in the input unit 4 or outputs them to the signal processing unit 6.

The delay parameter 51 calculated in the parameter calculation unit 5 of FIG. 2 is set for each object and for each speaker SPi. For the object 105A shown in FIG. 1(B), the distance from the virtual sound source 101A shown in FIG. 1(A) to each speaker SPi is calculated, the propagation delay of sound over that distance is calculated, and a delay is set for each speaker SPi. In the example of FIG. 1(A), the same calculation is performed for the object 105B appearing in the image 104, using the distance from the virtual sound source 101B of FIG. 1(B) to each speaker SPi.
The volume adjustment parameter 52 gives, for each speaker SPi, the amount by which the volume is reduced to account for the loss over the sound's propagation distance. This reduction is calculated for every object.
The high-frequency attenuation parameter 53 attenuates the high-frequency range based on the distance between the virtual sound source 101A and the speaker SPi, or on the angle between the line of the speaker array 11 and the virtual sound source 101A. It is set for each object and for each speaker SPi.

Because the distance between a virtual sound source and the speakers SPi (i = 1 to N) changes as the corresponding object moves, the parameter calculation unit 5 updates the values of the delay parameter 51, the volume adjustment parameter 52, and the high-frequency attenuation parameter 53.

The signal processing unit 6 processes the digital audio data of the audio signal 41 and calculates the data to be output to each speaker SPi. The audio signal is input to the signal processing unit 6 in synchronization with the video signal. From the audio signal 41 received for each of the objects 105A and 105B, the signal processing unit 6 generates the digital audio data sent to the audio output channel of each speaker SPi, based on the delay parameter 51, the volume adjustment parameter 52, and the high-frequency attenuation parameter 53 from the parameter calculation unit 5. The signal processing unit 6 includes digital filters DFi (i = 1 to N) that apply the delay based on the delay parameter 51, and gain adjusters Gi (i = 1 to N) based on the volume adjustment parameter 52 and the high-frequency attenuation parameter 53 (both described later with reference to FIG. 3). When there are multiple objects, the digital audio data generated in this way for each of the objects 105A and 105B is summed and output, as described later with reference to FIG. 4.
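The per-object processing and summation described above can be sketched for a single speaker channel as follows. This is an illustrative sketch only: it uses whole-sample delays and a single scalar gain per object, omits the high-frequency filtering, and the function names `delay_and_scale` and `mix_channel` are hypothetical.

```python
def delay_and_scale(signal, delay_samples, gain):
    """Delay a list of samples by a whole number of samples and scale it."""
    return [0.0] * delay_samples + [gain * s for s in signal]


def mix_channel(object_signals, delays, gains):
    """Sum the delayed, scaled signals of all objects for one speaker channel."""
    parts = [delay_and_scale(s, d, g)
             for s, d, g in zip(object_signals, delays, gains)]
    length = max(len(p) for p in parts)
    out = [0.0] * length
    for p in parts:
        for n, v in enumerate(p):
            out[n] += v
    return out


# Two hypothetical objects feeding the same speaker channel.
channel = mix_channel([[1.0, 1.0], [2.0]], delays=[1, 0], gains=[0.5, 1.0])
# channel == [2.0, 0.5, 0.5]
```

Running the same mix once per speaker, each time with that speaker's delays and gains, would give the N output channels.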

The D/A converters DACi (i = 1 to N) can be implemented with D/A conversion IC chips. They convert the digital audio data generated by the signal processing unit 6 into analog audio signals and output these to the respective amplifiers AMPi (i = 1 to N).
The amplifiers AMPi (i = 1 to N) can be built from, for example, amplification stages using FETs, or may be an external AV amplifier. They amplify the analog audio signals output from the D/A converters DACi and send them to the speakers SPi.
The speakers SPi (i = 1 to N) must form a speaker array or a unit of three or more speakers arranged in a row, and each receives an independent audio signal. They convert the analog audio signals amplified by the amplifiers AMPi (i = 1 to N) into sound.
The control unit 7 is implemented with, for example, a CPU, and controls each part of the internal configuration of the sound image localization device 10.

The external program 111 runs in the content reproduction device 2. When the external program 111 is executed, then at the same time as content is played back by this program or by another program, it reads the audio signal 41 from the external storage unit 31 and outputs it to the input unit 4, and it calculates the position information 42 and outputs it to the input unit 4.

The external storage unit 31 consists of a hard disk or an optical disk drive and supplies the audio signal 41 to the input unit 4.

Here, of the configuration shown in FIG. 2, the external program 111, the audio signal 41, and the position information 42 are described concretely using two examples of how the AV system 1 may be used.
<<Running a game program on a PC>>
The content reproduction device 2 can be implemented, for example, as a sound board connected to a PC, and a game program may be run on that PC. The game program corresponds to the external program 111 shown in FIG. 2. It outputs to the input unit 4 the position information 42 of on-screen objects (including characters and other objects), which it computes internally, and the input unit 4 stores the position information 42 temporarily. The game program also outputs audio signals 41A and 41B as the sound effects corresponding to the objects 105A and 105B in the image 104 shown in FIG. 1(A). When the running game program issues a command to produce a sound effect, the corresponding audio signal 41 is stored in the input unit 4.

<<Playing back digital content with a content playback program on a PC>>
As described above, the content reproduction device 2 can be implemented, for example, as a sound board connected to a PC (personal computer), and a content playback program on the PC may play back images and sound. When digital content is played back with this program, dialogue and the sounds of objects are input as the audio signal 41. The content playback program corresponds to the external program 111. The content playback program also supplies to the input unit 4 position information 42 that corresponds to the image associated with the audio signal 41 and is synchronized over time with each object's actions, movements, and stops.

<Description of operation of signal processing unit>
Next, the operation of the signal processing unit 6 is described in more detail with reference to FIG. 3. FIG. 3 is a block diagram of the parameter calculation unit 5 and the signal processing unit 6 inside the sound image localization device of this embodiment, shown for the virtual sound source 101A corresponding to the object 105A of FIG. 1(B). The D/A converters DACi (i = 1 to N) and amplifiers AMPi (i = 1 to N) shown in FIG. 2 are omitted, and the number N of speaker output channels shown in FIGS. 1 and 2 is set to eight in FIG. 3. The AV system 1 of this embodiment is not, however, limited to eight speaker output channels.
As shown in FIG. 3, the signal processing unit 6 is provided with a plurality of digital filters DFi (i = 1 to 8) and gain adjusters Gi (i = 1 to 8).

Each digital filter DFi (i = 1 to 8) is set with a delay time equal to the propagation time over the distance Li ([m], i = 1 to 8) from the virtual sound source 101A of FIG. 1(A) (also shown in FIG. 3) to the speaker SPi (i = 1 to 8), and outputs the audio signal it receives after applying that delay. The delay is calculated by the parameter calculation unit 5 by dividing Li by the speed of sound. As noted above, the video signal and the audio signal input to the signal processing unit 6 are synchronized, so delaying the audio signal before output makes its arrival time vary with the depth of the object appearing in the video, which gives the sound a sense of depth.
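As a rough illustration of how the delay Li / (speed of sound) might be realized in a digital filter, the sketch below converts a distance to a whole number of samples and prepends that many zeros. The 48 kHz sample rate, the 343 m/s speed of sound, the rounding to whole samples, and the function names are assumptions made here; the source does not specify how DFi is implemented.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed
SAMPLE_RATE = 48000     # Hz; assumed


def delay_in_samples(distance_m, sample_rate=SAMPLE_RATE, speed=SPEED_OF_SOUND):
    """Propagation time over distance_m, rounded to whole samples."""
    return round(distance_m / speed * sample_rate)


def apply_delay(signal, n_samples):
    """Delay a signal by prepending n_samples zeros."""
    return [0.0] * n_samples + list(signal)


n = delay_in_samples(3.43)             # 0.01 s at 48 kHz -> 480 samples
delayed = apply_delay([1.0, 0.5], 2)   # [0.0, 0.0, 1.0, 0.5]
```

A practical filter would likely need fractional-sample interpolation so that the delay can change smoothly as the object moves; that is outside this sketch.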

物体１０５Ａが映像内で移動した場合、図２のパラメータ算出部５は、その物体１０５Ａの位置から逐次的に仮想音源１０１Ａの位置を決定しているから、結果的に仮想音源１０１Ａの位置が変動し、これにより距離Ｌｉは、時間的に変動することになる。したがって時間と共にディレイが変動することになれば、音の粗密が変化するから、音像定位の結果として、ドップラー効果を正確に再現できる。
また、ゲイン調整Ｇｉ（ｉ＝１〜８）は、パラメータ算出部５が音量調整パラメータ５２を計算したものを信号処理部６が設定する。音量は、距離Ｌｉ（ｉ＝１〜８）の２乗に反比例するから、基準音量を定めて、
基準音量×１［ｍ］／距離Ｌｉ^２［ｍ］
When the object 105A moves in the video, the parameter calculation unit 5 in FIG. 2 successively determines the position of the virtual sound source 101A from the position of the object 105A. The position of the virtual sound source 101A therefore changes, and the distance Li varies with time. Because the delay then also varies with time, the density of the sound along the time axis changes, and the Doppler effect is reproduced accurately as a direct result of the sound image localization.
The gain adjustment Gi (i = 1 to 8) is set in the signal processing unit 6 according to the volume adjustment parameter 52 calculated by the parameter calculation unit 5. Since the volume is inversely proportional to the square of the distance Li (i = 1 to 8), a reference volume is determined and the gain is calculated as
reference volume × 1 [m] / distance Li² [m].
Further, a frequency characteristic filter that attenuates the high range is convolved according to the high-frequency attenuation parameter 53 of FIG. 2. As described above, the parameter calculation unit 5 calculates the high-frequency attenuation parameter 53, which attenuates the high range based on the distance Li between the virtual sound source 101A and the speaker SPi, or on the angle between the arrangement of the speaker array 11 and the virtual sound source 101A. The amount of high-frequency attenuation is calculated, for example, using the angle αi (i = 1 to 8) between the surface SF on which the speakers SPi are arranged and the direction of the virtual sound source 101A as seen from each speaker SPi. Relative to the flat part of the frequency response, it is
(cos(αi) × 1 [m] / distance Li² [m]) times.
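The per-speaker parameters described above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the speakers are assumed to lie on a straight line (the surface SF) along the x axis, the delay is assumed to be the distance Li divided by an assumed speed of sound of 343 m/s, and the angle αi is taken literally as the angle between the array line and the direction to the virtual sound source. The function and key names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not stated in the text


def localization_parameters(speaker_xs, source, reference_volume=1.0):
    """Return one parameter dict per speaker SPi.

    speaker_xs: x coordinates [m] of speakers placed on the line y = 0 (surface SF).
    source: (x, y) position [m] of the virtual sound source; must not coincide
            with a speaker, otherwise the distance Li would be zero.
    """
    sx, sy = source
    params = []
    for x in speaker_xs:
        dx, dy = sx - x, sy
        dist = math.hypot(dx, dy)             # distance Li
        alpha = math.atan2(abs(dy), abs(dx))  # angle between SF and source direction
        params.append({
            "distance_m": dist,
            "delay_s": dist / SPEED_OF_SOUND,                # delay (assumed Li / c)
            "gain": reference_volume * 1.0 / dist ** 2,      # reference volume x 1 / Li^2
            "hf_factor": math.cos(alpha) * 1.0 / dist ** 2,  # cos(alpha_i) x 1 / Li^2
        })
    return params


if __name__ == "__main__":
    for p in localization_parameters([-1.0, 0.0, 1.0], (0.0, 2.0)):
        print(p)
```

If FIG. 3 measures αi from the array normal rather than from the array surface, only the `alpha` line would need to change.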

Note that the speaker array 11 and the display 12 need not be integrated as shown in FIG. 1.

As an application of the AV system 1 including the sound image localization apparatus 10 of the present embodiment, the speaker array 11 may be installed in front of the display 12. In that case, the input unit 4 may additionally be configured to accept an input of the distance between the speakers and the display in the viewer's listening direction. For example, this distance is entered into the sound image localization apparatus 10 from an operation unit (not shown) that gives instructions to the control unit 7, and is set to be added to the distances Li (i = 1 to N). The parameter calculation unit 5 then takes the delay corresponding to this distance, adjusts its scale to match the position of the virtual sound source, adds it to the delay corresponding to the distance between the position of the virtual sound source and each speaker, and calculates the parameters for the delay setting.
Because the delay takes the distance between the display 12 and the speakers SPi into account, the display 12 and the speakers SPi can be installed more freely. For example, when the object 105A appears large on the display 12, the position of its virtual sound source is near, and when it appears small, the position is far. This sense of perspective, however, does not always match the scale of the distance between the speakers and the display in the viewer's listening direction. In such a case, adding a delay obtained by dividing that distance by the speed of sound makes it possible to reproduce a sound image in the depth direction relative to the display.
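The extra delay for a speaker array placed in front of the display can be sketched as below. The text only says that the speaker-to-display distance is divided by the speed of sound and added after a scale adjustment; the `depth_scale` factor, the function name, and the 343 m/s value are assumptions introduced for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value, not stated in the text


def total_delay_s(source_to_speaker_m, speaker_to_display_m, depth_scale=1.0):
    """Delay for one speaker, including the speaker-to-display offset.

    depth_scale is a hypothetical factor standing in for the scale adjustment
    that matches the offset to the virtual sound source position.
    """
    base_delay = source_to_speaker_m / SPEED_OF_SOUND
    offset_delay = depth_scale * speaker_to_display_m / SPEED_OF_SOUND
    return base_delay + offset_delay


if __name__ == "__main__":
    # Speaker 3.43 m from the virtual source, array 0.343 m in front of the display.
    print(total_delay_s(3.43, 0.343))
```

With no offset the function reduces to the plain distance-based delay, so an integrated display and speaker array is just the special case `speaker_to_display_m = 0`.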

<Description of operation of parameter calculation unit and signal processing unit when multiple objects appear>
Next, the operations of the parameter calculation unit 5 and the signal processing unit 6 when a plurality of objects appear in the video will be described with reference to FIG. 4. FIG. 4 is a configuration diagram of the parameter calculation unit 5 and the signal processing unit 6 inside the sound image localization apparatus of the present embodiment when the objects 105A and 105B shown in FIG. 1 both appear.
As shown in FIG. 4, the right side includes, as in FIG. 3, the virtual sound source 101A for the object 105A, the audio signal 41A, and the digital filters DFAi (i = 1 to 8) and gain adjustments GAi (i = 1 to 8) needed to process the audio signal 41A and localize the virtual sound source 101A.

In addition, as shown in FIG. 4, for the object 105B the configuration of FIG. 3 is extended with the virtual sound source 101B for the object 105B, the audio signal 41B, and the digital filters DFBi (i = 1 to 8) and gain adjustments GBi (i = 1 to 8) needed to process the audio signal 41B and localize the virtual sound source 101B.
The reference signs in FIG. 4 follow the "A" and "B" of the objects 105A and 105B: the digital filters DFi are labeled DFAi and DFBi, and the gain adjustments Gi are labeled GAi and GBi.
In the configuration shown in FIG. 4, as described for FIG. 3, the digital audio outputs needed to localize the virtual sound sources 101A and 101B are calculated for each speaker SPi by DFAi, DFBi, GAi, and GBi. The adders ADDi (i = 1 to 8) then sum these calculated outputs. As a result, even when the objects 105A and 105B move independently, their sound images can be localized independently.

When three or more objects (including 105A and 105B) appear, three or more instances of the configuration shown in FIG. 3 are provided, and their output signals are summed by the adders ADDi.
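The per-object processing followed by per-speaker summation can be sketched as follows. This is a simplified illustration, not the patent's circuit: delays are whole numbers of samples, the high-frequency filter is omitted, and the function names are hypothetical.

```python
def render_object(signal, delays, gains):
    """Delay and scale one object's signal for every speaker.

    delays[i] is the delay in samples and gains[i] the gain for speaker SPi
    (the roles of DFi and Gi). Returns one output list per speaker.
    """
    length = len(signal) + max(delays)
    channels = []
    for delay, gain in zip(delays, gains):
        channel = [0.0] * length
        for k, sample in enumerate(signal):
            channel[k + delay] = gain * sample
        channels.append(channel)
    return channels


def mix_objects(rendered):
    """Role of the adders ADDi: sum the per-object outputs speaker by speaker."""
    n_speakers = len(rendered[0])
    length = max(len(ch) for obj in rendered for ch in obj)
    mixed = [[0.0] * length for _ in range(n_speakers)]
    for obj in rendered:
        for i, channel in enumerate(obj):
            for k, value in enumerate(channel):
                mixed[i][k] += value
    return mixed


if __name__ == "__main__":
    obj_a = render_object([1.0, 0.0], delays=[0, 1], gains=[1.0, 0.5])
    obj_b = render_object([2.0], delays=[1, 0], gains=[1.0, 1.0])
    print(mix_objects([obj_a, obj_b]))
```

Because each object keeps its own delays and gains, a third object is handled by appending one more rendered entry to the list passed to `mix_objects`.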

<Operation flow of parameter calculation unit and signal processing unit when object moves>
Next, the operation flow of the parameter calculation unit 5 and the signal processing unit 6 when an object moves will be described with reference to FIG. 5. FIG. 5 is a diagram showing this flow.
In S1 of FIG. 5, it is determined whether the positions of the virtual sound sources 101A and 101B shown in FIG. 1(A), FIG. 3, and FIG. 4 have changed. If a position has changed, the result is Y and the flow proceeds to S2. If not, the result is N and the current parameters 51, 52, and 53, such as the delay, are maintained.
Regarding the determination in S1, the coordinates of the virtual sound sources 101A and 101B are set by the parameter calculation unit 5 as shown in FIG. 2, but any change in them originates in a change in the position information 42.

In S2 of FIG. 5, the distances Li (i = 1 to N) between the virtual sound source 101A and the speakers SPi (i = 1 to N) are recalculated as shown in FIGS. 3 and 4 (the same applies to the virtual sound source 101B).
In S3, as described for FIG. 3, the parameter calculation unit 5 calculates parameters such as the delay parameter 51, the volume adjustment parameter 52, and the high-frequency attenuation parameter 53 from the obtained distances Li.
In S4, the parameters obtained in S3 are set again.

Through the operation flow shown in FIG. 5, the sound image of each of the objects 105A and 105B can be localized according to its movement even when the objects move.
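Steps S1 to S4 can be sketched as a small update routine. This is an assumed illustration only: the class and attribute names are hypothetical, the delay is taken as distance divided by an assumed 343 m/s, the gain as 1 / Li² with a reference volume of 1, and the high-frequency parameter is left out for brevity.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not stated in the text


class ParameterUpdater:
    """Keeps the current parameters and recomputes them only when the source moves."""

    def __init__(self, speaker_positions):
        self.speaker_positions = speaker_positions  # list of (x, y) in metres
        self.source = None
        self.delays_s = None
        self.gains = None

    def update(self, source):
        # S1: has the virtual sound source position changed?
        if source == self.source:
            return False  # N: keep the current parameters
        # S2: recalculate the distances Li
        distances = [math.dist(source, sp) for sp in self.speaker_positions]
        # S3: calculate the parameters from the distances
        delays = [d / SPEED_OF_SOUND for d in distances]
        gains = [1.0 / d ** 2 for d in distances]
        # S4: set the new parameters
        self.source, self.delays_s, self.gains = source, delays, gains
        return True


if __name__ == "__main__":
    updater = ParameterUpdater([(0.0, 0.0), (1.0, 0.0)])
    print(updater.update((0.0, 2.0)), updater.gains)
    print(updater.update((0.0, 2.0)))  # unchanged position, nothing recomputed
```

One updater per virtual sound source would cover the case of 101A and 101B moving independently.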

<Specific configuration of digital filter>
Next, a specific configuration of the digital filter DFi (i = 1 to N) will be described with reference to FIG. 6. FIG. 6 is a specific configuration diagram of the digital filter of the sound image localization apparatus of the present embodiment. The digital filters DFi shown in FIGS. 3 and 4 are in practice implemented by writing to and reading from the ring buffer 62.

As shown in FIG. 6, the ring buffer 62 is a buffer set up in memory and divided into a plurality of data sections 63. This buffer must operate fast enough to keep up with the input and output of digital audio signals. One sample is written to and read from each data section 63.

To write sample data into the ring buffer 62 of FIG. 6, the audio signal input to the input unit 4 of FIG. 2 is written starting at the write position 64 and proceeding along the data write path 65, cycling through the data sections 63 so that writing repeats in a ring.

Sample data is read from the ring buffer 62 of FIG. 6 by providing read positions TAPi (i = 1 to N), each of which reads the sample data written in the ring buffer 62. The read path 66 of the read positions TAPi runs in the same direction as the data write path 65. The sample data read at each read position TAPi is sent to the gain adjustment Gi (i = 1 to N) shown in FIG. 3.

Reading at each read position TAPi in FIG. 6 is performed with the read position TAPi delayed by a predetermined amount relative to the write position 64. This delay amount is the value of the delay parameter 51 calculated by the parameter calculation unit 5, as described for FIGS. 2 and 3, and it changes as the position of the object 105A changes. That is, when the position of the object 105A changes as described for FIGS. 1 and 2, the parameter calculation unit 5 changes the position of the virtual sound source 101A as described for S1 of FIG. 5. Because this movement is not instantaneous but takes place at some speed, the delay parameter 51 set by the parameter calculation unit 5 changes from moment to moment. As the delay parameter 51 changes, the read position TAPi moves faster or slower than the normal sampling rate at which it moves while the object 105A is stationary. This variation in the moving speed of the read position TAPi modulates the frequency and produces the Doppler effect. This Doppler effect is not an added-on sound effect of the kind used conventionally. It arises from performing the sound image localization accurately, and the frequency also follows changes in the object's speed.

When the moving speed of the read position TAPi is varied as described for FIG. 6, the read position is no longer an integer multiple of the sampling rate of the D/A converter DACi, and a fractional part arises. In this case, a digital filter can interpolate so as to correspond to a finer sampling rate. Alternatively, the fractional part can simply be rounded to the nearest integer for output.
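The ring buffer with a write position and delayed read taps can be sketched as follows. This is a minimal sketch under assumptions, not the implementation in FIG. 6: the class and method names are hypothetical, and plain linear interpolation stands in for the unspecified interpolating digital filter when the delay has a fractional part.

```python
import math


class RingBuffer:
    """Delay line: one write position, any number of delayed read taps."""

    def __init__(self, size):
        self.size = size
        self.data = [0.0] * size  # data sections, one sample each
        self.write_pos = 0        # next section to be written

    def write(self, sample):
        self.data[self.write_pos] = sample
        self.write_pos = (self.write_pos + 1) % self.size

    def read_tap(self, delay_samples):
        """Read the signal delay_samples behind the newest written sample.

        A fractional delay is resolved by linear interpolation between the
        two neighbouring samples (an assumed, simple choice of interpolator).
        """
        if not 0 <= delay_samples <= self.size - 2:
            raise ValueError("delay does not fit in the buffer")
        pos = self.write_pos - 1 - delay_samples  # may be negative or fractional
        older_index = math.floor(pos)
        frac = pos - older_index
        older = self.data[older_index % self.size]
        newer = self.data[(older_index + 1) % self.size]
        return (1.0 - frac) * older + frac * newer


if __name__ == "__main__":
    buf = RingBuffer(8)
    for k in range(4):
        buf.write(float(k))
    print(buf.read_tap(0), buf.read_tap(1), buf.read_tap(0.5))
```

If the delay passed to `read_tap` grows a little on every new sample, the tap effectively advances more slowly than the write position and the pitch drops; a shrinking delay raises it. That is the mechanism by which a time-varying delay yields the Doppler shift described above. Replacing the interpolation with `round(delay_samples)` would correspond to the simple rounding alternative.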

When three or more objects such as 105A and 105B appear, three or more of the ring buffers shown in FIG. 6 are provided, and their output signal paths are summed for each speaker by the adders ADDi.

<Description of AV system according to application of this embodiment>
Next, with reference to FIG. 7, digital content such as a movie with a predetermined story line, in which a sound effect is output for each object, and a sound image localization apparatus used for this digital content will be described as an application of the AV system 1 of the present embodiment. FIG. 7 is a diagram showing a sound image localization apparatus 10A according to an application of the sound image localization apparatus 10 of the present embodiment. Parts identical to those shown in FIG. 2 are given the same reference signs, and duplicate description is omitted.
As shown in FIG. 7, for the sound image localization apparatus 10A, the delay parameter 51, the volume adjustment parameter 52, and the high-frequency attenuation parameter 53, which in FIG. 2 are calculated one by one by the parameter calculation unit 5, have already been calculated for the digital content recorded on a DVD or the like and are recorded in the external storage unit 31. The external program 111 inputs this recorded data as the sound image localization parameters 45, and the sound image localization apparatus 10A shown in FIG. 7 reads these parameters and outputs to the speakers SPi (i = 1 to N). As a result, as in the AV system 1 shown in FIG. 2, the audio signal output from the content playback system can be localized at the position of the virtual sound source corresponding to the object shown in the video.
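Reading precomputed parameters during playback, instead of calculating them from position information, might look like the sketch below. The storage layout is entirely hypothetical: the text does not say how the sound image localization parameters 45 are recorded, so a time-sorted list of `(time_s, params)` pairs and the function name are assumptions made for illustration.

```python
import bisect


def parameters_at(track, t):
    """Return the most recent recorded parameter set at playback time t.

    track: list of (time_s, params) pairs sorted by time_s (assumed format).
    Raises ValueError if t lies before the first recorded entry.
    """
    times = [time_s for time_s, _ in track]
    index = bisect.bisect_right(times, t) - 1
    if index < 0:
        raise ValueError("no parameters recorded at or before this time")
    return track[index][1]


if __name__ == "__main__":
    track = [
        (0.0, {"delays_s": [0.010, 0.011], "gains": [0.25, 0.20]}),
        (0.5, {"delays_s": [0.009, 0.010], "gains": [0.30, 0.24]}),
    ]
    print(parameters_at(track, 0.7))
```

The returned values would then be applied directly as the delay, gain, and high-frequency settings for each speaker.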

<Other inventions>
The following inventions are also conceivable.
(A) The present invention is characterized in that
the localization control unit
calculates a parameter for setting high-frequency attenuation based on the distance or angle between the position of the virtual sound source and each speaker.

With this configuration, high-frequency attenuation is taken into account for the sound image of each object, so the sense of perspective among objects can be expressed more accurately in sound.

(B) The present invention is
a sound image localization program that causes a computer to execute a localization control step which, using an audio signal associated with an object included in a video and position information of the object in the video, both stored in an input unit,
controls, for a control unit that controls a speaker array having a plurality of speakers arranged in an array, the timing at which the audio signal is input to each speaker so as to form a sound beam focused on the position indicated by the position information.

When the program of the present invention is configured in this way and executed, the same effect as (1) is obtained.

(C) The present invention is
the sound image localization program according to (B) above, characterized in that
the localization control step further
uses input data, stored in memory, of the distance between the speakers and the display in the viewer's listening direction,
calculates a parameter for delay setting by adding a delay corresponding to that distance to the delay corresponding to the distance between the position of the virtual sound source and each speaker, and controls the timing at which the audio signal is input to each speaker based on this parameter.

The present invention uses input data, stored in memory, of the distance between the speakers and the display in the viewer's listening direction, and calculates the parameter for delay setting by adding a delay corresponding to that distance to the delay corresponding to the distance between the position of the virtual sound source and each speaker. Because the timing at which the audio signal is input to each speaker is controlled based on this parameter, the effect of (B) can be obtained even when there is a difference in distance between the display and the speakers.

FIG. 1: Conceptual diagram of the AV equipment of the present embodiment
FIG. 2: Internal configuration diagram of the sound image localization apparatus of the present embodiment
FIG. 3: Configuration diagram of the parameter calculation unit and signal processing unit inside the sound image localization apparatus of the present embodiment
FIG. 4: Configuration diagram of the parameter calculation unit and signal processing unit inside the sound image localization apparatus of the present embodiment when a plurality of objects appear in the video
FIG. 5: Operation flow diagram of the parameter calculation unit and signal processing unit of the sound image localization apparatus of the present embodiment when an object moves
FIG. 6: Specific configuration diagram of the digital filter of the sound image localization apparatus of the present embodiment
FIG. 7: Diagram showing a sound image localization apparatus according to an application of the sound image localization apparatus of the present embodiment

Explanation of symbols

1 - AV system, 10 - sound image localization apparatus, 10A - sound image localization apparatus
11 - speaker array, 12 - display
2 - content playback device, 31 - external storage unit, 4 - input unit
41 - audio signal, 41A - audio signal, 41B - audio signal, 42 - position information input
45 - sound image localization parameters, 5 - parameter calculation unit, 51 - delay parameter
52 - volume adjustment parameter, 53 - high-frequency attenuation parameter
6 - signal processing unit, 62 - ring buffer
63 - data section, 64 - write position
65 - data write path, 66 - read path, 7 - control unit
100 - viewer, 101A - virtual sound source, 101B - virtual sound source
102 - virtual wavefront, 103 - actual wavefront, 104 - image, 105A - object
105B - object, 111 - external program
SPi (i = 1 to N) - speaker, Gi (i = 1 to N) - gain adjustment
GAi (i = 1 to N) - gain adjustment, GBi (i = 1 to N) - gain adjustment
AMPi (i = 1 to N) - amplifier, DFi (i = 1 to N) - digital filter
DFAi (i = 1 to N) - digital filter
DFBi (i = 1 to N) - digital filter
DACi (i = 1 to N) - D/A converter
TAPi (i = 1 to N) - read position, ADDi (i = 1 to N) - adder

Claims

A sound image localization apparatus comprising:
an input unit that receives an audio signal associated with an object included in a video, and position information of the object in the video; and
a localization control unit that controls a speaker array having a plurality of speakers arranged in an array, and that controls the timing at which the audio signal is input to each speaker so as to form a sound beam focused on a virtual sound source position set based on the position indicated by the position information and the arrangement position of the speaker array.

The sound image localization apparatus according to claim 1, wherein the localization control unit further controls a level at which the audio signal is input to each speaker.

The sound image localization apparatus according to claim 1, wherein the position information includes distance information in a depth direction of a display screen that displays the video.

The sound image localization apparatus according to claim 1, wherein the input unit receives, for each of a plurality of objects included in a video, an audio signal associated with the object and position information of the object in the video, and
the localization control unit generates a sound beam for each of the plurality of objects based on the position indicated by its position information, and combines the sound beams.

An AV system comprising:
a video output unit that outputs a video signal including an object and outputs position information of the object in the video signal;
an audio output unit that outputs an audio signal associated with the object in synchronization with the video signal;
a display unit that receives the video signal and displays the video;
a speaker array installed alongside the display unit and having a plurality of speakers arranged in an array; and
the sound image localization apparatus according to any one of claims 1 to 4.