JP2024007669A - Sound field reproduction program using sound source and position information of sound-receiving medium, device, and method

Info

Publication number: JP2024007669A
Application number: JP2022108891A
Authority: JP
Inventors: Shota Okubo; Toshiharu Horiuchi
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2022-07-06
Filing date: 2022-07-06
Publication date: 2024-01-19

Abstract

PROBLEM TO BE SOLVED: To provide a sound field reproduction program that can reproduce the sound field of a sound collection space for a sound-receiving medium even when the sound source and/or the sound-receiving medium is located at an arbitrary position or is movable.

SOLUTION: The program causes a computer to function as: position information acquisition means for acquiring measured position information of a sound source and a sound-receiving medium; input acoustic signal determination means for identifying, on the basis of the acquired position information, a microphone located, as seen from the sound source in the sound collection space, in the direction toward the corresponding position of the sound-receiving medium or in a direction close enough to that direction to satisfy a predetermined condition, and acquiring an acoustic signal of the sound collected by the identified microphone to determine an input acoustic signal; and output acoustic signal generation means for generating, on the basis of the acquired position information and using the input acoustic signal, an output acoustic signal that causes a sound equivalent to the collected sound to be output toward the sound-receiving medium from the direction, as seen from the sound-receiving medium in the reproduction space, toward the corresponding position of the sound source or from a direction close enough to that direction to satisfy a predetermined condition.

SELECTED DRAWING: Figure 1

Description

The present invention relates to sound field reproduction technology, including sound field re-creation.

In recent years, with the spread of web conferencing applications, remote session applications, and the like, services that use microphones and speakers to provide a suitable or desired acoustic environment have become widely used. Highly immersive remote conference systems and online concerts are also being actively adopted. In providing such services, an important issue is improving sound field reproduction (sound field re-creation) technology, which reproduces (re-creates) in a reproduction field the sound field associated with a sound source.

As a basis for this sound field reproduction (sound field re-creation) technology, Non-Patent Document 1, for example, introduces a sound field control technique using wave field synthesis, which spatially generates wavefronts. Specifically, it explains, for example, the principle of boundary sound field control, which applies inverse system theory to the Kirchhoff-Helmholtz integral equation, a mathematical expression of Huygens' principle.

In addition, as a sound field reproduction method based on sound pressure gradients, impulse responses, and the like, such as that disclosed in Non-Patent Document 1, Patent Document 1, for example, discloses a highly realistic sound field reproduction information transmitting device that enables, on the receiving side, highly realistic sound field reproduction in which recorded sound and reflected sound are well balanced. Specifically, this device transmits a dry source signal, which is a signal acquired from a sound source in an anechoic environment, together with sound field reproduction information, which is information for reproducing that dry source signal in a manner suited to the environment on the receiving side.

Sound sources have conventionally been assumed to lie outside the control region in which the sound is to be reproduced. In contrast, Non-Patent Document 2, for example, attempts sound field reproduction when the sound source lies inside the control region. Specifically, it proposes a sound field reproduction method that uses an inverse filter to estimate a virtual sound source signal from acoustic signals picked up by a microphone array made up of multiple microphones, thereby building a virtual reality system with fewer constraints on the sound source.

Furthermore, Non-Patent Document 3, for example, discloses a sound field reproduction method that is highly robust against disturbances in situations where they inevitably occur, such as when the listener is present in the reproduction field where the sound is to be reproduced. In general, sound field reproduction is considered to be achieved by matching the way sound propagates in the sound collection field and in the reproduction field. However, even if sound propagation can be matched between the two fields when no one is present, the propagation changes once a person enters the reproduction field. In other words, the disturbance caused by a person's presence in the reproduction field is unavoidable.

Non-Patent Document 3 also proposes a sound collection method using 24 sharply directional microphones and a signal processing method for reproducing the signals generated by such sound collection. According to this signal processing method, even when the sound is simply output as it is collected, measuring the impulse response and removing the direct sound component reportedly makes it possible to express spatial acoustic characteristics such that, for example, the sound source is perceived to be moving.

Furthermore, Patent Document 2, for example, discloses a sound field reproduction device that adopts a rigid sphere model comprising a sound source and a rigid sphere whose position is known, and that reproduces the sound field at a listening position on the opposite side of the rigid sphere from the sound source. Specifically, this device includes: a transfer function storage unit for supplying transfer functions obtained in advance at multiple points on a predetermined line that has a predetermined relationship to the line connecting the rigid sphere and the sound source in the space; a position estimation unit for determining information on the position at which the sound field is to be reproduced; and an output signal synthesis unit that, based on the determined position information, receives the corresponding transfer function from the transfer function storage unit, converts the input sound source signal according to that transfer function, and synthesizes and outputs an acoustic signal. This functional configuration is said to allow the sound field to be reproduced in real time in response to changes in the arrangement of objects in the space.

Patent Document 1: JP 2005-086537 A
Patent Document 2: JP 2001-142471 A

Non-Patent Document 1: Institute of Electronics, Information and Communication Engineers, "Forest of Knowledge", Group 2 (Image, Sound, Language), Volume 6 (Acoustic Signal Processing), Chapter 7: Sound Field Reproduction, [online], [retrieved June 27, 2022], Internet <URL: https://www.ieice-hbkb.org/files/ad_base/view_pdf.html?p=/files/02/02gun_06hen_07.pdf>
Non-Patent Document 2: Wataru Mizuno et al., "Theoretical study on free apex sound field reproduction system using microphone array," IEICE Technical Report, Vol. 104, No. 614, pp. 7-12, 2005.
Non-Patent Document 3: Akira Omoto, "Recording and playback methods for creative reproduction of sound fields," Proceedings of the 2022 Acoustical Society of Japan Spring Conference, 1-12-10, March 2022.

The sound field reproduction methods disclosed in, for example, Non-Patent Document 1 and Patent Document 1, which rely on sound pressure gradients, impulse responses, and the like measured in advance, are basic methods aimed at improving reproduction accuracy. However, sound pressure gradients and impulse responses are easily affected by the presence of a listener and change significantly when the listener moves. With such methods, it is therefore difficult to increase robustness against the disturbance caused by a person's presence in the reproduction field.

The sound field reproduction technique disclosed in Non-Patent Document 2, which uses acoustic signals picked up by a microphone array, can handle a sound source placed inside the control region, but it does not generate the virtual sound source signal in a way that also follows the movement of the listener. That is, as with the techniques disclosed in Non-Patent Document 1 and Patent Document 1, sound field reproduction that accommodates the presence and movement of a listener remains difficult.

The sound field reproduction technique disclosed in Non-Patent Document 3, on the other hand, collects sound with 24 sharply directional microphones and is highly robust to the presence of a listener. However, the listening position in the reproduction field is limited to the position corresponding to where the group of sharply directional microphones was installed in the sound collection field. Because the auditory characteristics differ at other positions in the reproduction field, sound field reproduction that follows the movement of the listener must again be considered difficult.

In contrast, the sound field reproduction device disclosed in Patent Document 2 does reproduce the changes in auditory characteristics caused by changes in the listening position. However, this device rests on the major premise that the input sound source is a dry source recorded in advance and that its position is known.

Accordingly, an object of the present invention is to provide a sound field reproduction program, device, and method capable of reproducing the sound field of a sound collection space for a sound receiver even when the sound source is located at an arbitrary position within the sound collection space or is movable within it, and/or the sound receiver is located at an arbitrary position within the reproduction space or is movable within it.

According to the present invention, there is provided a sound field reproduction program for reproducing a sound field associated with a sound source in a sound collection space for a sound receiver in a reproduction space that corresponds to the sound collection space with respect to positions within the space, the program causing a computer to function as:
position information acquisition means for acquiring information on the measured or set positions of the sound source and the sound receiver;
input acoustic signal determination means for identifying, based on the acquired position information, at least one microphone located, as seen from the sound source in the sound collection space, in the direction toward the corresponding position of the sound receiver or in a direction close enough to that direction to satisfy a predetermined condition, and for acquiring an acoustic signal of the sound picked up by the identified microphone to determine an input acoustic signal; and
output acoustic signal generation means for generating, based on the acquired position information and using the input acoustic signal, an output acoustic signal that causes a sound equivalent or corresponding to the collected sound to be output toward the sound receiver from the direction, as seen from the sound receiver in the reproduction space, toward the corresponding position of the sound source or from a direction close enough to that direction to satisfy a predetermined condition.
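The microphone identification step described above can be illustrated with a minimal Python sketch. It assumes 2-D coordinates, takes an angular threshold as the "predetermined condition", and falls back to the single closest microphone when none meets the threshold; the function names, the 15-degree default, and the fallback are all illustrative assumptions, not details given in this publication.

```python
import math

def bearing(origin, target):
    """Angle in radians of the vector from origin to target (2-D)."""
    return math.atan2(target[1] - origin[1], target[0] - origin[0])

def angular_gap(a, b):
    """Smallest absolute difference between two angles, in radians."""
    diff = (a - b + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff)

def select_microphones(source, receiver_pos, mics, max_gap_deg=15.0):
    """Return ids of microphones lying, as seen from the source, within
    max_gap_deg of the direction toward the receiver's corresponding position.

    mics maps a microphone id to its (x, y) position in the collection space.
    The angular threshold stands in for the 'predetermined condition' (assumed).
    """
    if not mics:
        raise ValueError("at least one microphone is required")
    target = bearing(source, receiver_pos)
    gaps = sorted(
        (angular_gap(bearing(source, pos), target), mic_id)
        for mic_id, pos in mics.items()
    )
    chosen = [mic_id for gap, mic_id in gaps if gap <= math.radians(max_gap_deg)]
    # Assumed fallback: always return at least the closest microphone.
    return chosen or [gaps[0][1]]

if __name__ == "__main__":
    mics = {"east": (2.0, 0.0), "north": (0.0, 2.0), "ne": (2.0, 0.2)}
    print(select_microphones((0.0, 0.0), (1.0, 0.0), mics))
```

Results are ordered by angular closeness, so a caller that wants exactly one microphone can take the first entry.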

As a preferred embodiment of the sound field reproduction program according to the present invention, it is also preferable that:
the position information acquisition means also acquire information on the measured or set sound output direction of the sound source;
the input acoustic signal determination means identify, based on the acquired position information and the output direction information, at least one other microphone located, as seen from the sound source in the sound collection space, in the output direction or in a direction close enough to the output direction to satisfy a predetermined condition, and acquire another acoustic signal of the sound picked up by the identified other microphone to determine another input acoustic signal; and
the output acoustic signal generation means generate the output acoustic signal by combining the other input acoustic signal with the input acoustic signal of the sound picked up by the microphone.

In the above embodiment, it is also preferable that the other microphone have higher directivity than the microphone.

Furthermore, as another embodiment of the sound field reproduction program according to the present invention, it is also preferable that the reproduction space be a real space, and that
the output acoustic signal generation means identify, based on the acquired position information, at least one speaker located, as seen from the sound receiver in the reproduction space, in the direction toward the corresponding position of the sound source or in a direction close enough to that direction to satisfy a predetermined condition, and generate the output acoustic signal to be supplied to the identified speaker.

In the other embodiment described above, it is also preferable that the output acoustic signal generation means also identify at least one further speaker located closer to the position of the sound receiver than the identified speaker, or located further inward as seen from the surrounding boundary, and also generate the output acoustic signal to be supplied to the identified further speaker.

Furthermore, in the other embodiment described above, it is also preferable that the sound source and the sound receiver be movable within the sound collection space and the reproduction space, respectively, both of which are real spaces, and that:
the input acoustic signal determination means identify the microphone at a given point in time, based on the position information measured and acquired at that point in time by the position information acquisition means, and determine the input acoustic signal for that point in time; and
the output acoustic signal generation means identify the speaker at that point in time, based on the position information acquired at that point in time, and generate the output acoustic signal for that point in time.

Furthermore, as yet another embodiment of the sound field reproduction program according to the present invention, it is also preferable that the reproduction space be a virtual space for sound image localization in a stereophone worn by the sound receiver, and that
the output acoustic signal generation means generate the output acoustic signal to be supplied to the stereophone, the output acoustic signal causing a sound equivalent or corresponding to the collected sound to be output toward the sound receiver from the direction, as seen from the sound receiver in the reproduction space, toward the corresponding position of the sound source or from a direction close enough to that direction to satisfy a predetermined condition.

In the embodiment described above that uses the "other microphone", it is also preferable that the output acoustic signal generation means apply, to the determined other input acoustic signal, amplitude adjustment processing that attenuates the amplitude according to how far apart the position of the sound source and the position of the sound receiver are, and generate the output acoustic signal using the other input acoustic signal thus processed.
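As a rough illustration of this amplitude adjustment, the following Python sketch attenuates the other input acoustic signal with an inverse-distance gain. The inverse-distance law, the 1 m reference distance, and the function names are assumptions chosen for the example; the text above only requires that the amplitude decrease as the source and the sound receiver get farther apart.

```python
import math

def distance_gain(source, receiver, ref_distance=1.0):
    """Gain that falls off as 1/d beyond ref_distance (assumed attenuation law)."""
    d = math.dist(source, receiver)
    # Inside the reference distance the signal is passed through unchanged.
    return 1.0 if d <= ref_distance else ref_distance / d

def attenuate(samples, source, receiver, ref_distance=1.0):
    """Scale every sample of the other input signal by the distance-dependent gain."""
    gain = distance_gain(source, receiver, ref_distance)
    return [gain * s for s in samples]

if __name__ == "__main__":
    print(attenuate([1.0, -0.5], (0.0, 0.0), (0.0, 2.0)))
```

Clamping the gain at 1.0 for short distances avoids amplifying the signal when the receiver's corresponding position is very close to the source.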

Furthermore, in the embodiment described above that uses the "other microphone", it is also preferable that the output acoustic signal generation means either apply, to the determined other input acoustic signal, filtering that emphasizes a predetermined high-frequency band according to how close the position of the sound source and the position of the sound receiver are, or determine an impulse response between the position of the sound source and the position of the sound receiver and apply convolution processing in the frequency domain using that impulse response, and then generate the output acoustic signal using the other input acoustic signal thus processed.
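The frequency-domain convolution option can be sketched as follows, using only the Python standard library. The naive O(n^2) DFT is for illustration only (a real system would use an FFT), and the impulse response passed in is a hypothetical placeholder, since how the impulse response is determined is not specified at this point in the text.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform; illustration only, O(n^2)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def convolve_freq(signal, impulse_response):
    """Linear convolution computed as a product of spectra.

    Both inputs are zero-padded to len(signal) + len(impulse_response) - 1
    so that the circular convolution implied by the DFT equals the linear one.
    """
    if not signal or not impulse_response:
        return []
    n = len(signal) + len(impulse_response) - 1
    spec_x = dft(list(signal) + [0.0] * (n - len(signal)))
    spec_h = dft(list(impulse_response) + [0.0] * (n - len(impulse_response)))
    y = dft([a * b for a, b in zip(spec_x, spec_h)], inverse=True)
    return [v.real for v in y]

if __name__ == "__main__":
    # Hypothetical two-tap impulse response: direct sound plus one weaker echo.
    print([round(v, 6) for v in convolve_freq([1.0, 2.0, 3.0], [1.0, 0.5])])
```

Multiplying spectra is equivalent to convolving in the time domain, which is why the padded length matters: without it, the tail of the response would wrap around onto the start of the output.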

Furthermore, in the embodiment described above that uses the "other microphone", it is also preferable that the output acoustic signal generation means determine, based on the position information, the phase difference between the other input acoustic signal and the input acoustic signal, delay or advance the phase of one of them so as to cancel the determined phase difference, and then combine the other input acoustic signal and the input acoustic signal to generate the output acoustic signal.
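One way to picture this phase alignment is the Python sketch below. It assumes that the phase difference comes from the difference in path length from the source to each microphone, that sound travels at 343 m/s, and that an integer-sample delay followed by an equal-weight mix is sufficient; these are illustrative assumptions rather than details stated above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def lag_samples(source, mic_a, mic_b, sample_rate):
    """Samples by which mic_b's signal lags mic_a's (negative if it leads),
    estimated from the difference in source-to-microphone path length."""
    path_diff = math.dist(source, mic_b) - math.dist(source, mic_a)
    return round(path_diff / SPEED_OF_SOUND * sample_rate)

def align_and_mix(sig_a, sig_b, lag):
    """Delay the earlier signal by |lag| samples, then average the two."""
    a, b = list(sig_a), list(sig_b)
    if lag > 0:
        a = [0.0] * lag + a      # mic_b is late, so hold mic_a back
    elif lag < 0:
        b = [0.0] * (-lag) + b   # mic_a is late, so hold mic_b back
    length = max(len(a), len(b))
    a += [0.0] * (length - len(a))
    b += [0.0] * (length - len(b))
    return [0.5 * (x + y) for x, y in zip(a, b)]

if __name__ == "__main__":
    lag = lag_samples((0.0, 0.0), (3.43, 0.0), (6.86, 0.0), 1000)
    print(lag, align_and_mix([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], 2))
```

Delaying the earlier signal instead of advancing the later one keeps the processing causal, which matters if the signals are streamed in real time.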

According to the present invention, there is also provided a sound field reproduction program for reproducing sound associated with a sound source in a sound collection space for a sound receiver in a reproduction space that corresponds to the sound collection space with respect to positions within the space, the program causing a computer to function as:
position information acquisition means for acquiring information on the measured or set positions of the sound source and the sound receiver, and information on the measured or set sound output direction of the sound source;
input acoustic signal determination means for (a) identifying, based on the acquired position information, at least one microphone located, as seen from the sound source in the sound collection space, in the direction toward the corresponding position of the sound receiver or in a direction close enough to that direction to satisfy a predetermined condition, and acquiring an acoustic signal of the sound picked up by the identified microphone to determine an input acoustic signal, and (b) identifying, based on the acquired position information and the output direction information, at least one other microphone located, as seen from the sound source in the sound collection space, in the output direction or in a direction close enough to the output direction to satisfy a predetermined condition, and acquiring another acoustic signal of the sound picked up by the identified other microphone to determine another input acoustic signal; and
output acoustic signal generation means for generating, by combining the input acoustic signal and the other input acoustic signal, an output acoustic signal that causes a sound equivalent or corresponding to the collected sound to be output toward the sound receiver.

According to the present invention, there is further provided a sound field reproduction program for reproducing a sound field associated with a sound source in a sound collection space for a sound receiver in a reproduction space that corresponds to the sound collection space with respect to positions within the space, the program causing a computer to function as:
position information acquisition means for acquiring information on the measured or set positions of the sound source and the sound receiver;
input acoustic signal determination means for acquiring an acoustic signal of the sound picked up by at least one microphone capable of picking up sound from the sound source to determine an input acoustic signal; and
output acoustic signal generation means for identifying, based on the acquired position information, at least one speaker located, as seen from the sound receiver in the reproduction space, in the direction toward the corresponding position of the sound source or in a direction close enough to that direction to satisfy a predetermined condition, and for generating an output acoustic signal to be supplied to the identified speaker.

According to the present invention, there is still further provided a sound field reproduction device for reproducing a sound field associated with a sound source in a sound collection space for a sound receiver in a reproduction space that corresponds to the sound collection space with respect to positions within the space, the device comprising:
position information acquisition means for acquiring information on the measured or set positions of the sound source and the sound receiver;
input acoustic signal determination means for identifying, based on the acquired position information, at least one microphone located, as seen from the sound source in the sound collection space, in the direction toward the corresponding position of the sound receiver or in a direction close enough to that direction to satisfy a predetermined condition, and for acquiring an acoustic signal of the sound picked up by the identified microphone to determine an input acoustic signal; and
output acoustic signal generation means for generating, based on the acquired position information and using the input acoustic signal, an output acoustic signal that causes a sound equivalent or corresponding to the collected sound to be output toward the sound receiver from the direction, as seen from the sound receiver in the reproduction space, toward the corresponding position of the sound source or from a direction close enough to that direction to satisfy a predetermined condition.

According to the present invention, there is also provided a sound field reproduction system for reproducing a sound field associated with a sound source in a sound collection space for a sound receiver in a reproduction space that corresponds to the sound collection space with respect to positions within the space, the system comprising:
position information acquisition means for acquiring information on the measured or set positions of the sound source and the sound receiver;
input acoustic signal determination means for identifying, based on the acquired position information, at least one microphone located, as seen from the sound source in the sound collection space, in the direction toward the corresponding position of the sound receiver or in a direction close enough to that direction to satisfy a predetermined condition, and for acquiring an acoustic signal of the sound picked up by the identified microphone to determine an input acoustic signal; and
output acoustic signal generation means for generating, based on the acquired position information and using the input acoustic signal, an output acoustic signal that causes a sound equivalent or corresponding to the collected sound to be output toward the sound receiver from the direction, as seen from the sound receiver in the reproduction space, toward the corresponding position of the sound source or from a direction close enough to that direction to satisfy a predetermined condition.

According to the present invention, there is further provided a computer-implemented sound field reproduction method for reproducing a sound field associated with a sound source in a sound collection space for a sound receiver in a reproduction space that corresponds to the sound collection space with respect to positions within the space, the method comprising the steps of:
acquiring information on the measured or set positions of the sound source and the sound receiver;
identifying, based on the acquired position information, at least one microphone located, as seen from the sound source in the sound collection space, in the direction toward the corresponding position of the sound receiver or in a direction close enough to that direction to satisfy a predetermined condition, and acquiring an acoustic signal of the sound picked up by the identified microphone to determine an input acoustic signal; and
generating, based on the acquired position information and using the input acoustic signal, an output acoustic signal that causes a sound equivalent or corresponding to the collected sound to be output toward the sound receiver from the direction, as seen from the sound receiver in the reproduction space, toward the corresponding position of the sound source or from a direction close enough to that direction to satisfy a predetermined condition.

本発明の音場再生プログラム、装置及び方法によれば、音源が収音空間内の任意の位置に存在する又は収音空間内を移動可能となっている場合でも、及び／又は、受音体が再生空間内の任意の位置に存在する又は再生空間内を移動可能となっている場合でも、当該受音体に対し収音空間の音場を再生することが可能となる。 According to the sound field reproduction program, device, and method of the present invention, the sound field of the sound collection space can be reproduced for the sound receiver even when the sound source is located at an arbitrary position within the sound collection space or is movable within it, and/or when the sound receiver is located at an arbitrary position within the reproduction space or is movable within it.

本発明による音場再生装置の一実施形態における機能構成を示す機能ブロック図である。 FIG. 1 is a functional block diagram showing the functional configuration of an embodiment of a sound field reproduction device according to the present invention.
本発明に係るマイク特定処理の一実施形態を説明するための、収音空間に係る模式図である。 FIG. 2 is a schematic diagram of a sound collection space for explaining an embodiment of microphone identification processing according to the present invention.
本発明に係る入力音響信号決定処理（音響信号混合処理）の一実施形態を説明するための、収音空間に係る模式図である。 FIG. 3 is a schematic diagram of a sound collection space for explaining an embodiment of input acoustic signal determination processing (acoustic signal mixing processing) according to the present invention.
本発明に係るスピーカ特定処理及びスピーカパニング処理の一実施形態を説明するための、再生空間に係る模式図である。 FIG. 4 is a schematic diagram of a reproduction space for explaining an embodiment of loudspeaker identification processing and loudspeaker panning processing according to the present invention.
本発明に係る天井スピーカ特定処理及び天井スピーカパニング処理の一実施形態を説明するための、再生空間に係る模式図である。 FIG. 5 is a schematic diagram of a reproduction space for explaining an embodiment of ceiling loudspeaker identification processing and ceiling loudspeaker panning processing according to the present invention.

以下、本発明の実施形態について、図面を用いて詳細に説明する。 Hereinafter, embodiments of the present invention will be described in detail using the drawings.

［音場再生装置］
図１は、本発明による音場再生装置の一実施形態における機能構成を示す機能ブロック図である。なお同図には、本装置に係る収音空間及び再生空間の一実施形態も示されている。 [Sound field reproduction device]
FIG. 1 is a functional block diagram showing the functional configuration of an embodiment of a sound field reproduction device according to the present invention. Note that the same figure also shows an embodiment of a sound collection space and a reproduction space according to the present device.

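図１に示した、本発明の一実施形態としての音場再生装置１は、収音空間内の音源（本実施形態では発話者）に係る音場を、再生空間内の受音体（本実施形態では聴取者）に対し再生することのできる装置である。 The sound field reproduction device 1 shown in FIG. 1 as an embodiment of the present invention is a device capable of reproducing the sound field related to a sound source (a speaker in this embodiment) in a sound collection space for a sound receiver (a listener in this embodiment) in a reproduction space.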

ここで、収音空間と再生空間とは、空間内位置に関し対応関係にある（互いに一対一に対応する空間内位置を有する）空間同士となっている。したがって、
（ａ）収音空間内における、受音体（聴取者）の位置、並びに音源（発話者）から見た、受音体（聴取者）の相対位置及び位置する方向（方位，向き）と、
（ｂ）再生空間内における、音源（発話者）の位置、並びに受音体（聴取者）から見た、音源（発話者）の相対位置及び位置する方向（方位，向き）と
が決定可能となっているのである。 Here, the sound collection space and the reproduction space correspond to each other with respect to positions in space (each position in one space corresponds one-to-one to a position in the other). Therefore, the following can be determined:
(a) the position of the sound receiver (listener) in the sound collection space, together with the relative position and direction (azimuth, orientation) of the sound receiver (listener) as seen from the sound source (speaker); and
(b) the position of the sound source (speaker) in the reproduction space, together with the relative position and direction (azimuth, orientation) of the sound source (speaker) as seen from the sound receiver (listener).

ここで、本実施形態ではさらに、音場の再生精度を向上させるべく、収音空間と再生空間との間で、音源（発話者）と受音体（聴取者）との距離や、音源（発話者）及び受音体（聴取者）のなす方向（方位，向き）も一致するように設定されている。ただし、（この後述べるマイク群によって仕切られた領域としての）収音空間の大きさ及び形状と、（この後述べるスピーカ群によって仕切られた領域としての）再生空間の大きさ及び形状とは、互いに異なっていてもよい。 In this embodiment, to further improve the reproduction accuracy of the sound field, the settings are also made so that the distance between the sound source (speaker) and the sound receiver (listener), and the direction (azimuth, orientation) defined by the sound source (speaker) and the sound receiver (listener), match between the sound collection space and the reproduction space. However, the size and shape of the sound collection space (as the region bounded by the group of microphones described below) and the size and shape of the reproduction space (as the region bounded by the group of loudspeakers described below) may differ from each other.
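
The correspondence described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that both rooms share a single xy coordinate system, which is one simple way to obtain a one-to-one correspondence in which distances and directions match; the function name `relative_position` and the example coordinates are hypothetical and do not appear in the original text.

```python
import math

def relative_position(origin, target):
    """Return (distance, azimuth in radians) of `target` as seen from `origin`."""
    dx = target[0] - origin[0]
    dy = target[1] - origin[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)

# Assumed example positions in a shared xy coordinate system (metres).
speaker_pos = (1.0, 1.0)   # measured in the sound collection space
listener_pos = (3.0, 1.0)  # measured in the reproduction space

# (a) listener as seen from the speaker, used in the sound collection space
dist_sr, az_sr = relative_position(speaker_pos, listener_pos)
# (b) speaker as seen from the listener, used in the reproduction space
dist_rs, az_rs = relative_position(listener_pos, speaker_pos)
```

Under this assumption the two distances are equal and the two azimuths differ by 180 degrees, which is the matching of distance and direction that the paragraph above describes.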

また本実施形態では、収音空間及び再生空間はそれぞれ、収音室及び再生室であって、室の（水平面内での）境界に複数のマイクロホン（以下、マイクと略称）（２，３）及び複数のスピーカ（５）が設置されている。また本実施形態において、収音空間及び再生空間はいずれも、室の境界に少なくとも１つの（本実施形態では互いに反対方向の深度測定が可能な２つの）深度センサ４を備えており、収音空間における音源（発話者）の位置、及び再生空間における受音体（聴取者）の位置が測定可能となっている。 In this embodiment, the sound collection space and the reproduction space are a sound collection room and a reproduction room, respectively, and a plurality of microphones (2, 3) and a plurality of loudspeakers (5) are installed at the boundaries (in the horizontal plane) of the respective rooms. Also in this embodiment, both the sound collection space and the reproduction space have at least one depth sensor 4 at the room boundary (in this embodiment, two sensors capable of depth measurement in mutually opposite directions), so that the position of the sound source (speaker) in the sound collection space and the position of the sound receiver (listener) in the reproduction space can be measured.

さらに本実施形態では、音源（発話者）及び受音体（聴取者）はいずれも、移動（例えば歩行）可能であって、深度センサ４は、それらの位置をリアルタイムで測定可能となっている。ただし勿論、音源（発話者）及び受音体（聴取者）の一方又は両方が移動することなく、設定された一定の位置に存在する（例えば所定位置に座っている）形態をとることも可能である。この場合、本装置１は、この設定された一定の位置に係る情報を（例えば設定者による装置入力によって）取得することが可能となっている。 Furthermore, in this embodiment, both the sound source (speaker) and the sound receiver (listener) can move (for example, walk), and the depth sensors 4 can measure their positions in real time. Of course, one or both of the sound source (speaker) and the sound receiver (listener) may instead stay at a set, fixed position without moving (for example, seated at a predetermined position). In that case, the device 1 can acquire information on this fixed position (for example, through input to the device by the person who configures it).

このような状況の下、音場再生装置１は、音場再生処理を実施すべく、
（Ａ）音源（発話者）及び受音体（聴取者）における測定された（若しくは設定された）「位置に係る情報」を取得する位置情報取得部１１１と、
（Ｂ）収音空間において音源（発話者）から見て受音体（聴取者）の対応位置へ向かう方向又はこの方向に所定条件を満たすまでに近い方向に位置する少なくとも１つのマイクを、取得された「位置に係る情報」に基づき特定し、特定したマイクで「収音された音」に係る音響信号を取得して「入力音響信号」を決定する入力音響信号決定部１１２と、
（Ｃ）再生空間において受音体（聴取者）から見て音源（発話者）の対応位置へ向かう方向又はこの方向に所定条件を満たすまでに近い方向から、受音体（聴取者）に向けて、「収音された音」に相当若しくは対応する音が出力されることになる「出力音響信号」を、取得された「位置に係る情報」に基づき、「入力音響信号」を用いて生成する出力音響信号生成部１１３と
を有することを特徴としている。 Under these circumstances, the sound field reproduction device 1 is characterized by having the following in order to carry out sound field reproduction processing:
(A) a position information acquisition unit 111 that acquires "information regarding the position" measured (or set) for the sound source (speaker) and the sound receiver (listener);
(B) an input acoustic signal determination unit 112 that identifies, based on the acquired "information regarding the position", at least one microphone located, in the sound collection space, in the direction toward the corresponding position of the sound receiver (listener) as seen from the sound source (speaker) or in a direction close enough to this direction to satisfy a predetermined condition, acquires an acoustic signal of the "collected sound" from the identified microphone, and determines an "input acoustic signal"; and
(C) an output acoustic signal generation unit 113 that generates, based on the acquired "information regarding the position" and using the "input acoustic signal", an "output acoustic signal" that causes a sound equivalent or corresponding to the "collected sound" to be output toward the sound receiver (listener), in the reproduction space, from the direction toward the corresponding position of the sound source (speaker) as seen from the sound receiver (listener) or from a direction close enough to this direction to satisfy a predetermined condition.

ここで上記（Ｂ）の特定されたマイクは、図１に示した本実施形態の収音空間において、発話者から見て聴取者の対応位置へ向かう方向に所定条件を満たすまでに近い方向（図１では最も近い方向及び２番目に近い方向）に位置する２つのマイク３ａ及び３ｂとなっている。すなわち本実施形態では、発話者を取り囲むように設置された複数のマイクから、「発話者の位置に係る情報」及び「聴取者の位置に係る情報」に基づき、２つのマイク３ａ及び３ｂが選定（特定）され、これらのマイク３ａ及び３ｂで収音された音に係る音響信号から「入力音響信号」が決定されるのである。 Here, in the sound collection space of this embodiment shown in FIG. 1, the microphones identified in (B) above are the two microphones 3a and 3b located in directions that are close enough, to satisfy a predetermined condition, to the direction toward the listener's corresponding position as seen from the speaker (in FIG. 1, the closest and second-closest directions). That is, in this embodiment, the two microphones 3a and 3b are selected (identified) from the plurality of microphones installed around the speaker, based on the "information regarding the speaker's position" and the "information regarding the listener's position", and the "input acoustic signal" is determined from the acoustic signals of the sounds collected by these microphones 3a and 3b.

またさらに上記（Ｃ）の「出力音響信号」は、後述するように、図１に示した本実施形態の再生空間においては、聴取者を取り囲むように設置された複数のスピーカ（５）から、「聴取者の位置に係る情報」及び「発話者の位置に係る情報」に基づき選定（特定）された２つのスピーカ５ｆ及び５ｇへ供給されるべき音響信号となっている。 Furthermore, as described later, in the reproduction space of this embodiment shown in FIG. 1, the "output acoustic signal" of (C) above is the acoustic signal to be supplied to the two loudspeakers 5f and 5g that are selected (identified), based on the "information regarding the listener's position" and the "information regarding the speaker's position", from the plurality of loudspeakers (5) installed around the listener.

ここで、これら２つのスピーカ５ｆ及び５ｇは、後に詳細に説明するが、聴取者から見て発話者の対応位置へ向かう方向に所定条件を満たすまでに近い方向に位置するスピーカであり、いわば特定された２つのマイク３ａ及び３ｂとは（発話者及び聴取者を間に挟んで）互いに対向する位置関係になっている。したがって、「出力音響信号」を受け取ったこれらのスピーカ５ｆ及び５ｇからは、特定されたマイク３ａ及び３ｂで「収音された音」に相当若しくは対応する音が出力されるようにすることができるのである。 As will be explained in detail later, these two loudspeakers 5f and 5g are located in directions that are close enough, to satisfy a predetermined condition, to the direction toward the speaker's corresponding position as seen from the listener; they stand, so to speak, opposite the two identified microphones 3a and 3b (with the speaker and the listener in between). Therefore, the loudspeakers 5f and 5g that receive the "output acoustic signal" can be made to output a sound equivalent or corresponding to the "collected sound" picked up by the identified microphones 3a and 3b.

なお、これも後に詳述する実施形態とはなるが、再生空間を、聴取者に装着されたステレオホン（例えばヘッドホン）における音像定位に係る仮想空間として、上記（Ｃ）の出力音響信号生成部１１３は、この再生空間において聴取者から見て発話者の対応位置へ向かう方向又はこの方向に所定条件を満たすまでに近い方向から、聴取者に向けて、「収音された音」に相当若しくは対応する音が出力されることになる、装着されたステレオホンに供給すべき「出力音響信号」を生成してもよい。 Note that, in an embodiment also detailed later, the reproduction space may be a virtual space for sound image localization in a stereophone (for example, headphones) worn by the listener. In that case, the output acoustic signal generation unit 113 of (C) above may generate an "output acoustic signal" to be supplied to the worn stereophone such that a sound equivalent or corresponding to the "collected sound" is output toward the listener, in this reproduction space, from the direction toward the speaker's corresponding position as seen from the listener or from a direction close enough to this direction to satisfy a predetermined condition.

いずれにしても、音場再生装置１によれば、取得した「発話者の位置に係る情報」及び「聴取者の位置に係る情報」に基づき、マイクを特定（選定）して「入力音響信号」を決定し、さらに「出力音響信号」を生成するので、受音体（聴取者）が再生空間内の任意の位置に存在する又は再生空間内を移動可能となっている場合でも、この受音体（聴取者）に対し収音空間の音場を好適に再生する、例えば（高い臨場感をもって）再現することができるのである。また、音源（発話者）が収音空間内の任意の位置に存在する又は収音空間内を移動可能となっている場合においても、同じく受音体（聴取者）に対し収音空間の音場を好適に再生する、例えば（高い臨場感をもって）再現することができることも理解される。 In any case, the sound field reproduction device 1 identifies (selects) microphones based on the acquired "information regarding the speaker's position" and "information regarding the listener's position", determines the "input acoustic signal", and then generates the "output acoustic signal". Therefore, even when the sound receiver (listener) is located at an arbitrary position within the reproduction space or can move within it, the sound field of the sound collection space can be suitably reproduced for this sound receiver (listener), for example recreated with a high sense of presence. It will also be understood that, even when the sound source (speaker) is located at an arbitrary position within the sound collection space or can move within it, the sound field of the sound collection space can likewise be suitably reproduced for the sound receiver (listener), for example recreated with a high sense of presence.

ちなみに上述したように、音源（発話者）及び受音体（聴取者）がそれぞれ、実空間である収音空間内及び再生空間内を移動可能である場合、
（ａ）上記（Ｂ）の入力音響信号決定部１１２は、上記（Ａ）の位置情報取得部１１１において１つの時点で測定され取得された「位置に係る情報」に基づき、この１つの時点におけるマイクを特定してこの１つの時点に係る「入力音響信号」を決定し、
（ｂ）上記（Ｃ）の出力音響信号生成部１１３は、この１つの時点で取得された「位置に係る情報」に基づき、この１つの時点におけるスピーカを特定してこの１つの時点に係る「出力音響信号」を生成することも好ましい。
この場合例えば、音源（発話者）の移動によって刻々と変化する収音空間の音場を、再生空間内を移動する受音体（聴取者）に対し、概ねリアルタイムで再生することも可能となるのである。 Incidentally, as mentioned above, when the sound source (speaker) and the sound receiver (listener) can move within the sound collection space and the reproduction space, respectively, which are real spaces, it is also preferable that:
(a) the input acoustic signal determination unit 112 of (B) above identifies the microphones at one point in time, based on the "information regarding the position" measured and acquired at that point in time by the position information acquisition unit 111 of (A) above, and determines the "input acoustic signal" for that point in time; and
(b) the output acoustic signal generation unit 113 of (C) above identifies the loudspeakers at that point in time, based on the "information regarding the position" acquired at that point in time, and generates the "output acoustic signal" for that point in time.
In this case, for example, the sound field of the sound collection space, which changes from moment to moment as the sound source (speaker) moves, can be reproduced in roughly real time for the sound receiver (listener) moving within the reproduction space.
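
The per-time-point processing in (a) and (b) can be sketched as follows. This is only an illustration under stated assumptions: both rooms are taken to share one xy coordinate system, the device layout and the trajectory are invented, "close enough to satisfy a predetermined condition" is taken to mean the two smallest angular deviations, and the names `nearest_by_direction` and `process_time_step` are hypothetical.

```python
import math

def nearest_by_direction(origin, toward, devices, count=2):
    """Names of the `count` devices whose bearing from `origin` deviates
    least from the bearing origin -> toward."""
    ref = math.atan2(toward[1] - origin[1], toward[0] - origin[0])

    def deviation(item):
        pos = item[1]
        ang = math.atan2(pos[1] - origin[1], pos[0] - origin[0])
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        return abs((ang - ref + math.pi) % (2 * math.pi) - math.pi)

    return [name for name, _ in sorted(devices.items(), key=deviation)[:count]]

def process_time_step(speaker_pos, listener_pos, mics, loudspeakers):
    """Select devices for one time point from the positions measured then."""
    selected_mics = nearest_by_direction(speaker_pos, listener_pos, mics)
    selected_spk = nearest_by_direction(listener_pos, speaker_pos, loudspeakers)
    return selected_mics, selected_spk

# Assumed layout: eight devices on the boundary of a 4 m x 4 m room.
positions = [(4, 4), (4, 2), (4, 0), (2, 0), (0, 0), (0, 2), (0, 4), (2, 4)]
mics = {f"m{i}": p for i, p in enumerate(positions)}
loudspeakers = {f"s{i}": p for i, p in enumerate(positions)}

# Assumed measurements (speaker position, listener position) at two time points.
track = [((1.0, 1.0), (3.0, 1.5)), ((1.0, 3.0), (3.0, 2.5))]
results = [process_time_step(sp, lp, mics, loudspeakers) for sp, lp in track]
```

Because the selection is recomputed from the positions measured at each time point, the chosen microphones and loudspeakers follow the speaker and the listener as they move.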

また本実施形態では、マイク（２，３）及びスピーカ（５）はそれぞれ、収音空間及び再生空間の境界に位置を固定して設置されているが、固定されておらずその位置が変化するものであってもよい。すなわち、複数のマイク（２，３）の少なくとも１つは例えば、収音空間の境界上を移動可能となっていてもよく、または、収音空間内を移動可能なロボットに取り付けられたマイクであってもよい。さらに複数のスピーカ（５）の少なくとも１つも例えば、再生空間の境界上を移動可能となっていてもよく、または、再生空間内を移動可能なロボットに取り付けられたスピーカであってもよいのである。 In this embodiment, the microphones (2, 3) and the loudspeakers (5) are installed at fixed positions on the boundaries of the sound collection space and the reproduction space, respectively, but they need not be fixed and their positions may change. That is, at least one of the plurality of microphones (2, 3) may, for example, be movable along the boundary of the sound collection space, or may be a microphone mounted on a robot that can move within the sound collection space. Likewise, at least one of the plurality of loudspeakers (5) may, for example, be movable along the boundary of the reproduction space, or may be a loudspeaker mounted on a robot that can move within the reproduction space.

なおこの場合、上記（Ｂ）の入力音響信号決定部１１２は、移動しているマイクを含むマイク群の中から、例えば１つの時点において音源（発話者）から見て受音体（聴取者）の対応位置へ向かう方向又はこの方向に所定条件を満たすまでに近い方向に位置するマイクを特定することとなる。ここで、例えば上記の条件に該当する位置の近傍にいるマイクを、この該当する位置へ移動させた上で特定してもよい。さらに上記（Ｃ）の出力音響信号生成部１１３は、例えばこの１つの時点において受音体（聴取者）から見て音源（発話者）の対応位置へ向かう方向又はこの方向に所定条件を満たすまでに近い方向に位置するスピーカを特定することとなる。ここで、例えば上記の条件に該当する位置の近傍にいるスピーカを、この該当する位置へ移動させた上で特定してもよいのである。 In this case, the input acoustic signal determination unit 112 of (B) above identifies, from the group of microphones including the moving microphones, a microphone that at one point in time, for example, is located in the direction toward the corresponding position of the sound receiver (listener) as seen from the sound source (speaker) or in a direction close enough to this direction to satisfy a predetermined condition. Here, for example, a microphone near a position that meets this condition may first be moved to that position and then identified. Similarly, the output acoustic signal generation unit 113 of (C) above identifies a loudspeaker that at this point in time, for example, is located in the direction toward the corresponding position of the sound source (speaker) as seen from the sound receiver (listener) or in a direction close enough to this direction to satisfy a predetermined condition. Here too, a loudspeaker near a position that meets this condition may first be moved to that position and then identified.

さらに本発明においては、音場再生装置１の構成要素である上記（Ｂ）の入力音響信号決定部１１２と、上記（Ｃ）の出力音響信号生成部１１３とが別の装置に具備された形態をとることも可能である。例えば、
（ａ）収音空間の中に又は近くに設置された、位置情報取得部１１１及び入力音響信号決定部１１２を備えた装置と、
（ｂ）再生空間の中に又は近くに設置された、位置情報取得部１１１及び出力音響信号生成部１１３を備えた装置であって、上記（ａ）の装置との間で通信によって情報のやり取りが可能な装置と
を含む、本発明による音場再生システムを構成してもよい。 Furthermore, in the present invention, the input acoustic signal determination unit 112 of (B) above and the output acoustic signal generation unit 113 of (C) above, which are components of the sound field reproduction device 1, may be provided in separate devices. For example, a sound field reproduction system according to the present invention may be configured to include:
(a) a device that is installed in or near the sound collection space and has a position information acquisition unit 111 and an input acoustic signal determination unit 112; and
(b) a device that is installed in or near the reproduction space, has a position information acquisition unit 111 and an output acoustic signal generation unit 113, and can exchange information with the device of (a) above by communication.

［装置構成，音場再生プログラム・方法］
以下、本発明の一実施形態としての音場再生装置１の機能構成について、より詳細に説明を行う。同じく図１の機能ブロック図において、音場再生装置１は、通信インタフェース１０１と、キーボード（ＫＢ）・ディスプレイ（ＤＰ）１０２と、プロセッサ・メモリ（メモリ機能を備えた演算処理系）とを有する。ここで、プロセッサ・メモリは、本発明による音場再生プログラムを保存しており、また、コンピュータ機能を有していて、この音場再生プログラムを実行することによって音場再生処理を実施する。 [Device configuration, sound field reproduction program/method]
Hereinafter, the functional configuration of the sound field reproduction device 1 as an embodiment of the present invention will be explained in more detail. Similarly, in the functional block diagram of FIG. 1, the sound field reproduction device 1 includes a communication interface 101, a keyboard (KB)/display (DP) 102, and a processor/memory (arithmetic processing system with memory function). Here, the processor memory stores the sound field reproduction program according to the present invention, has a computer function, and executes the sound field reproduction process by executing the sound field reproduction program.

またこのことから音場再生装置１は、音場再生処理専用の装置であってもよいが、本発明による音場再生プログラムを搭載した、汎用のクラウドサーバや非クラウド型サーバであってもよく、さらにはパーソナルコンピュータ（ＰＣ）、ノート型若しくはタブレット型コンピュータや、スマートフォン、さらにはヘッドマウントディスプレイ（ＨＭＤ）といったようなウェアラブルデバイス等とすることも可能である。 Further, from this, the sound field reproduction device 1 may be a device dedicated to sound field reproduction processing, but may also be a general-purpose cloud server or a non-cloud server equipped with the sound field reproduction program according to the present invention. Furthermore, it is also possible to use a personal computer (PC), a notebook or tablet computer, a smartphone, or even a wearable device such as a head mounted display (HMD).

また、プロセッサ・メモリは、機能構成部として、位置情報取得部１１１と、全指向性マイク特定部１１２ａ、入力混合部１１２ｂ、鋭指向性マイク特定部１１２ｃ、及び入力混合部１１２ｄを含む入力音響信号決定部１１２と、振幅調整部１１３ａ、音色調整部１１３ｂ、出力混合部１１３ｃ、スピーカ（ＳＰ）特定パニング部１１３ｄ、及び近スピーカ（ＳＰ）特定パニング部１１３ｅを含む出力音響信号生成部１１３と、通信制御部１２１と、入出力制御部１２２とを有する。なお、これらの機能構成部は、プロセッサ・メモリに保存された、本発明による音場再生プログラムの実行によって具現する機能と捉えることができる。また、図１の機能ブロック図における音場再生装置１の機能構成部間を矢印で接続して示した処理の流れは、本発明による音場再生方法の一実施形態としても理解される。 The processor/memory has, as functional components: a position information acquisition unit 111; an input acoustic signal determination unit 112 including an omnidirectional microphone identification unit 112a, an input mixing unit 112b, a sharply directional microphone identification unit 112c, and an input mixing unit 112d; an output acoustic signal generation unit 113 including an amplitude adjustment unit 113a, a timbre adjustment unit 113b, an output mixing unit 113c, a loudspeaker (SP) identification and panning unit 113d, and a near-loudspeaker (SP) identification and panning unit 113e; a communication control unit 121; and an input/output control unit 122. These functional components can be regarded as functions realized by executing the sound field reproduction program according to the present invention stored in the processor/memory. Furthermore, the process flow shown by the arrows connecting the functional components of the sound field reproduction device 1 in the functional block diagram of FIG. 1 can also be understood as an embodiment of the sound field reproduction method according to the present invention.

＜位置情報取得手段＞
同じく図１の機能ブロック図において、位置情報取得部１１１は本実施形態において、
（ａ）音源である発話者及び受音体である聴取者における、測定された若しくは設定された位置に係る情報、本実施形態では「位置座標情報」、及び
（ｂ）発話者における測定された若しくは設定された音の出力方向に係る情報、本実施形態では発話者の顔の向き（方位）に係る情報である「顔方位角情報」
を取得する。 <Position information acquisition means>
Similarly, in the functional block diagram of FIG. 1, the position information acquisition unit 111 in this embodiment acquires:
(a) information on the measured or set positions of the speaker, who is the sound source, and the listener, who is the sound receiver ("position coordinate information" in this embodiment); and
(b) information on the measured or set sound output direction of the speaker, which in this embodiment is "face azimuth information", that is, information on the direction (azimuth) in which the speaker's face is oriented.

具体的に本実施形態において、収音室（収音空間）及び再生室（再生空間）はいずれも、自身の水平面内での境界に、（室内における上記（ａ）及び（ｂ）の情報を確実に取得すべく）互いに反対方向の深度測定が可能な２つの深度センサ４を備えている。ここで、これらの深度センサ４によって得られた深度画像に対し、人及び顔の認識が可能な画像認識処理を施すことによって、収音室における発話者の位置座標情報及び顔方位角情報と、再生室における聴取者の位置座標情報とが生成される。位置情報取得部１１１は、深度センサ４に接続された画像認識処理装置からこれらの情報を取得してもよく、または、深度センサ４から得られた深度画像に対し自らこのような画像認識処理を施し、これらの情報を生成・取得するものとしてもよい。 Specifically, in this embodiment, both the sound collection room (sound collection space) and the reproduction room (reproduction space) have, at their boundaries in the horizontal plane, two depth sensors 4 capable of depth measurement in mutually opposite directions (so that the information of (a) and (b) above can be reliably acquired inside the room). Image recognition processing capable of recognizing people and faces is applied to the depth images obtained by these depth sensors 4, thereby generating the position coordinate information and face azimuth information of the speaker in the sound collection room and the position coordinate information of the listener in the reproduction room. The position information acquisition unit 111 may acquire this information from an image recognition processing device connected to the depth sensors 4, or may itself apply such image recognition processing to the depth images obtained from the depth sensors 4 to generate and acquire the information.

さらに変更態様として、深度センサ４の代わりに又は深度センサ４とともに、可視光（及び赤外線）カメラを用い、これらのカメラによって得られた画像に対し、人及び顔の認識が可能な画像認識処理を施すことによって、収音室における発話者の位置座標情報及び顔方位角情報と、再生室における聴取者の位置座標情報とを生成してもよい。また例えば、深度センサ４として、Microsoft社のAzure Kinect DKを採用することも可能である。Azure Kinect DKは、ＴｏＦ方式の深度センサ及びＲＧＢカメラを備えていて、画像認識クラウドサービスと連携させることにより、これらの位置座標情報及び顔方位角情報を生成可能となっている。また更なる変更態様として、室内における位置が決定可能な磁気センサ、加速度センサや静電センサを搭載した端末を発話者や聴取者が携帯していて、この端末から位置に係る情報を取得することも可能である。 Furthermore, as a modification, a visible light (and infrared) camera is used instead of the depth sensor 4 or in addition to the depth sensor 4, and image recognition processing that can recognize people and faces is performed on images obtained by these cameras. By doing so, position coordinate information and face azimuth angle information of the speaker in the sound collection room and position coordinate information of the listener in the reproduction room may be generated. For example, it is also possible to employ Microsoft's Azure Kinect DK as the depth sensor 4. The Azure Kinect DK is equipped with a ToF depth sensor and an RGB camera, and can generate position coordinate information and face azimuth information by linking with an image recognition cloud service. A further modification is that the speaker or listener is carrying a terminal equipped with a magnetic sensor, acceleration sensor, or electrostatic sensor that can determine the position in the room, and information regarding the position can be obtained from this terminal. is also possible.

なお本実施形態において、収音室（収音空間）には、自身の水平面内での境界において空間の内部を取り囲むように、複数の（図１では８個の）全指向性マイク２ａ～２ｈ、及び複数の（図１では８個の）鋭指向性マイク３ａ～３ｈが設置されている。ここで、鋭指向性マイク３ａ～３ｈは、全指向性マイク２ａ～２ｈとは「別のマイク」であって、当然ながら全指向性マイク２ａ～２ｈよりも指向性の高いマイクとなっている。例えば、全指向性マイク２ａ～２ｈは無指向性のアンビエントマイクであって、鋭指向性マイク３ａ～３ｈはガンマイクであってもよい。 In this embodiment, a plurality of (eight in FIG. 1) omnidirectional microphones 2a to 2h are installed in the sound collection room (sound collection space) so as to surround the inside of the space at the boundary within its own horizontal plane. , and a plurality of (eight in FIG. 1) sharply directional microphones 3a to 3h are installed. Here, the sharply directional microphones 3a to 3h are "different microphones" from the omnidirectional microphones 2a to 2h, and naturally have higher directivity than the omnidirectional microphones 2a to 2h. . For example, the omnidirectional microphones 2a to 2h may be omnidirectional ambient microphones, and the sharply directional microphones 3a to 3h may be gun microphones.

また本実施形態においては、１つの全指向性マイク２と１つの鋭指向性マイク３とがペアとなって１つの位置に設置されていて、また、これらのペアの設置位置が、全体で四角形の境界を形成している。ただし勿論、このようなマイクの配置に限定されるものではなく、例えば全指向性マイク２と鋭指向性マイク３とは互いに異なる位置、例えば互い違いの位置、に設置されてもよく、また、マイクの設置位置は、全体として例えば円形、楕円形や、（三角形を含む）多角形の境界を形成していてもよい。 Further, in this embodiment, one omnidirectional microphone 2 and one sharply directional microphone 3 are installed as a pair at each position, and the installation positions of these pairs together form a rectangular boundary. Of course, the arrangement is not limited to this: for example, the omnidirectional microphones 2 and the sharply directional microphones 3 may be installed at mutually different positions, such as alternating positions, and the installation positions may together form a boundary that is, for example, circular, elliptical, or polygonal (including triangular).

さらに本実施形態において、これらのマイクで収音された音に係る音響信号や、上述した深度センサ４を用いて生成された位置座標情報及び顔方位角情報（又は深度画像情報）は、例えば収音室（収音空間）の中に又は近傍に設置された（図示されていない）音響信号・位置情報管理装置から、通信ネットワークを介し、音場再生装置１の通信インタフェース１０１によって受信され、同装置１の通信制御部１２１によって、入力音響信号決定部１１２や位置情報取得部１１１へ提供されることとなっている。またさらに、位置情報取得部１１１は、全指向性マイク２及び鋭指向性マイク３の設定された（又は測定された）位置に係る情報や、スピーカ５の設定された（又は測定された）位置に係る情報も、収音室側や再生室側から（又は装置１への直接の入力によって）受け取り、これらの情報を入力音響信号決定部１１２や出力音響信号生成部１１３へ提供する。 Furthermore, in this embodiment, the acoustic signals of the sounds collected by these microphones and the position coordinate information and face azimuth information (or depth image information) generated using the depth sensors 4 described above are received by the communication interface 101 of the sound field reproduction device 1 via a communication network, for example from an acoustic signal and position information management device (not shown) installed in or near the sound collection room (sound collection space), and are provided by the communication control unit 121 of the device 1 to the input acoustic signal determination unit 112 and the position information acquisition unit 111. In addition, the position information acquisition unit 111 also receives information on the set (or measured) positions of the omnidirectional microphones 2 and the sharply directional microphones 3, and information on the set (or measured) positions of the loudspeakers 5, from the sound collection room side and the reproduction room side (or through direct input to the device 1), and provides this information to the input acoustic signal determination unit 112 and the output acoustic signal generation unit 113.

＜入力音響信号決定手段＞
同じく図１の機能ブロック図において、入力音響信号決定部１１２の全指向性マイク特定部１１２ａは、
（ａ）収音室（収音空間）において発話者（音源）から見て、聴取者（受音体）の対応位置へ向かう方向、又はこの方向に所定条件を満たすまでに近い方向に位置する少なくとも１つの、本実施形態では２つの全指向性マイク２を、取得された位置座標情報に基づき特定（選定）し、特定した２つの全指向性マイク２（図１では２ａ及び２ｂ）で収音された音に係る音響信号を取得する。 <Input acoustic signal determination means>
Similarly, in the functional block diagram of FIG. 1, the omnidirectional microphone identification unit 112a of the input acoustic signal determination unit 112:
(a) identifies (selects), based on the acquired position coordinate information, at least one omnidirectional microphone 2 (two in this embodiment) located, in the sound collection room (sound collection space), in the direction toward the corresponding position of the listener (sound receiver) as seen from the speaker (sound source) or in a direction close enough to this direction to satisfy a predetermined condition, and acquires the acoustic signals of the sounds collected by the two identified omnidirectional microphones 2 (2a and 2b in FIG. 1).

ここで全指向性マイク２は、収音空間そのものの音響（音場）を捉えるマイクとなっており、したがって、上記のように特定した全指向性マイク２（２ａ及び２ｂ）は、まさに聴取者（受音体）が聴覚で捉える音響（音場）に相当する音を収音することができるのである。 Here, the omnidirectional microphones 2 capture the acoustics (sound field) of the sound collection space itself. The omnidirectional microphones 2 identified as above (2a and 2b) can therefore collect sound that corresponds precisely to the acoustics (sound field) the listener (sound receiver) would perceive by ear.

また、入力音響信号決定部１１２の鋭指向性マイク特定部１１２ｃは、
（ｂ）収音室（収音空間）において発話者（音源）から見て、発話者の顔の向いた方向、又はこの方向に所定条件を満たすまでに近い方向に位置する少なくとも１つの、本実施形態では２つの鋭指向性マイク３を、取得された位置座標情報及び顔方位角情報に基づき特定（選定）し、特定した鋭指向性マイク３（図１では３ｃ及び３ｄ）で収音された音に係る音響信号を取得する。 Further, the sharply directional microphone identification unit 112c of the input acoustic signal determination unit 112:
(b) identifies (selects), based on the acquired position coordinate information and face azimuth information, at least one sharply directional microphone 3 (two in this embodiment) located, in the sound collection room (sound collection space), in the direction in which the speaker's face is oriented as seen from the speaker (sound source) or in a direction close enough to this direction to satisfy a predetermined condition, and acquires the acoustic signals of the sounds collected by the identified sharply directional microphones 3 (3c and 3d in FIG. 1).

ここで鋭指向性マイク３は、音源からの音そのものを捉えるマイクとなっており、したがって、上記のように特定した鋭指向性マイク３（３ｃ及び３ｄ）は、聴取者（受音体）に届けるべき発話者（音源）からの音（音声）を、確実に収音することができるのである。 Here, the sharply directional microphones 3 capture the sound from the sound source itself. The sharply directional microphones 3 identified as above (3c and 3d) can therefore reliably collect the sound (voice) from the speaker (sound source) that is to be delivered to the listener (sound receiver).

ちなみに、特定したマイクを指定する情報を収音室側に通知し、特定した全指向性マイク２や鋭指向性マイク３によって収音された音に係る音響信号だけを、収音室側から受信してもよく、または、これらのマイクによって収音された音に係る音響信号を収音室側から全て受信した上で、そこから特定したマイクに係る音響信号を選択することも可能である。 By the way, the information specifying the identified microphone is notified to the sound collection room side, and only the acoustic signal related to the sound collected by the identified omnidirectional microphone 2 or acute directional microphone 3 is received from the sound collection room side. Alternatively, it is also possible to receive all the acoustic signals related to the sounds picked up by these microphones from the sound collection room side, and then select the acoustic signals related to the identified microphone from there.

図２は、本発明に係るマイク特定処理の一実施形態を説明するための、収音空間に係る模式図である。ここで、収音空間である収音室には、水平面内の位置を規定するxy位置座標系が設定されている。 FIG. 2 is a schematic diagram of a sound collection space for explaining one embodiment of the microphone identification process according to the present invention. Here, an xy position coordinate system that defines a position in a horizontal plane is set in the sound collection room, which is a sound collection space.

図２に示したように、全指向性マイク特定部１１２ａ（図１）は本実施形態において、（ａ）発話者の位置S(x_s, y_s)から聴取者の位置R(x_r、y_r)までを結ぶ線分をさらに延長した先における当該線分と境界との交点AM(x_amb, y_amb)を算出し、
（ｂ）位置S(x_s, y_s)から自らの設置位置へ向かう方向が、位置S(x_s, y_s)から位置AM(x_amb, y_amb)へ向かう方向に最も近くなる（互いになす角が最も小さくなる）全指向性マイク２ｂと、２番目に近くなる（互いになす角が２番目に小さくなる）全指向性マイク２ａとを特定（選定）する。 As shown in FIG. 2, in this embodiment the omnidirectional microphone identification unit 112a (FIG. 1):
(a) calculates the intersection point AM(x_amb, y_amb) of the boundary with the line obtained by extending the segment from the speaker's position S(x_s, y_s) to the listener's position R(x_r, y_r) beyond R; and
(b) identifies (selects) the omnidirectional microphone 2b, for which the direction from position S(x_s, y_s) to the microphone's installation position is closest to the direction from S(x_s, y_s) to AM(x_amb, y_amb) (the angle between the two directions is smallest), and the omnidirectional microphone 2a, for which it is second closest (the angle is second smallest).

Further, in this embodiment the sharply directional microphone identification unit 112c (FIG. 1)
(c) calculates the intersection point GM(x_gun, y_gun) between the boundary and the line extended from the speaker's position S(x_s, y_s) in the direction of the face azimuth angle θ_s, and
(d) identifies (selects) the sharply directional microphone 3d, for which the direction from position S(x_s, y_s) to the microphone's own installation position is closest to the direction from S(x_s, y_s) to GM(x_gun, y_gun) (the angle between the two directions is smallest), and the sharply directional microphone 3c, for which it is second closest (the angle is second smallest).
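
The selection steps (a) to (d) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the room is assumed to be an axis-aligned rectangle with one corner at the origin, the ray origin is assumed to lie inside the room with a non-zero direction, and the function names, room size and microphone coordinates are all hypothetical.

```python
import math

def ray_boundary_point(origin, direction, width, depth):
    """Point where a ray from `origin` along `direction` reaches the wall of
    the rectangle [0, width] x [0, depth] (rectangular room: an assumption)."""
    ox, oy = origin
    dx, dy = direction
    hits = []
    if dx > 0:
        hits.append((width - ox) / dx)
    elif dx < 0:
        hits.append(-ox / dx)
    if dy > 0:
        hits.append((depth - oy) / dy)
    elif dy < 0:
        hits.append(-oy / dy)
    t = min(hits)  # the first wall reached along the ray
    return (ox + t * dx, oy + t * dy)

def angle_between(origin, a, b):
    """Unsigned angle (radians) at `origin` between the directions to a and b."""
    ang_a = math.atan2(a[1] - origin[1], a[0] - origin[0])
    ang_b = math.atan2(b[1] - origin[1], b[0] - origin[0])
    diff = abs(ang_a - ang_b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)

def two_nearest_by_angle(origin, target, devices):
    """Names of the two devices whose direction from `origin` is closest to
    the direction toward `target`; `devices` maps name -> (x, y)."""
    ranked = sorted(devices, key=lambda n: angle_between(origin, target, devices[n]))
    return ranked[:2]

# Hypothetical 6 m x 4 m room with four wall-mounted omnidirectional microphones.
S, R = (1.0, 2.0), (3.0, 2.0)
AM = ray_boundary_point(S, (R[0] - S[0], R[1] - S[1]), 6.0, 4.0)
mics = {"2a": (6.0, 3.0), "2b": (6.0, 1.5), "2c": (0.0, 2.0), "2d": (3.0, 4.0)}
nearest = two_nearest_by_angle(S, AM, mics)
```

For the sharply directional microphones, the same helpers would apply with the ray direction taken as (cos θ_s, sin θ_s), which yields GM in place of AM.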

Here, for both the omnidirectional microphones 2 and the sharply directional microphones 3, it is also possible to identify (select) only one, for example the closest. In this embodiment, however, two are identified (selected) as described above so that a natural sound field is reproduced even when the speaker's position changes from moment to moment, that is, so that the change in the sound field caused by the speaker's movement is reproduced naturally. For the same reason, three or more may of course also be identified (selected).

Returning to the functional block diagram of FIG. 1, the input mixing unit 112b of the input acoustic signal determination unit 112 combines (for example, mixes) the acoustic signals of the omnidirectional microphones identified (selected) by the omnidirectional microphone identification unit 112a (2a and 2b in FIG. 1) to generate a first input acoustic signal. Likewise, the input mixing unit 112d of the input acoustic signal determination unit 112 combines (for example, mixes) the acoustic signals of the sharply directional microphones identified (selected) by the sharply directional microphone identification unit 112c (3c and 3d in FIG. 1) to generate a second input acoustic signal.

FIG. 3 is a schematic diagram of the sound collection space for explaining one embodiment of the input acoustic signal determination process (acoustic signal mixing process) according to the present invention. The microphone arrangement and the states of the speaker and the listener in the sound collection room (sound collection space) shown in FIG. 3 are the same as those in the sound collection room shown in FIG. 2.

First, with reference to FIG. 3(A), the process performed by the input mixing unit 112b (FIG. 1) is described, in which the acoustic signal (its amplitude intensity) t_2a(p) of the identified (selected) omnidirectional microphone 2a and the acoustic signal (its amplitude intensity) t_2b(p) of the identified (selected) omnidirectional microphone 2b are mixed to generate the first input acoustic signal (its amplitude intensity) t_amb(p). Here, p is a parameter representing a time point or the sampling index corresponding to that time point.

As shown in FIG. 3(A), let
(a) α be the angle between the line segment connecting the speaker's position S and the position P2a of the omnidirectional microphone 2a and the line segment connecting the speaker's position S and the position P2b of the omnidirectional microphone 2b, and
(b) β be the angle between the line segment connecting the speaker's position S and the position P2a of the omnidirectional microphone 2a and the line segment connecting the speaker's position S and the position AM (described with FIG. 2).
Once α and β have been calculated, the first input acoustic signal t_amb(p) is derived by the following expression:
(1) t_amb(p) = t_2a(p) × (sin(α/2) + sin(α/2 − β))
             + t_2b(p) × (sin(α/2) − sin(α/2 − β))

In expression (1) above, the coefficients of t_2a(p) and t_2b(p) on the right-hand side are, respectively, the proportion in which the acoustic signal of the omnidirectional microphone 2a is to be mixed and the proportion in which the acoustic signal of the omnidirectional microphone 2b is to be mixed. When β = 0 and when β = α, the first input acoustic signal t_amb(p) becomes a constant multiple of t_2a(p) and a constant multiple of t_2b(p), respectively: although two microphones were identified (selected) beforehand, the acoustic signal of only one of them is in effect used as the input acoustic signal. In other words, in this embodiment the setting may be such that, when the speaker's position S, the listener's position R and a microphone position lie on one straight line in this order, exactly that one microphone is identified (selected).
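
The mixing weights of expression (1) and their behaviour at β = 0 and β = α can be checked with a short sketch. The function names and the 60-degree example angle are hypothetical; only the weight formula itself is taken from expression (1).

```python
import math

def pair_weights(span, offset):
    """Weights of expression (1): `span` is the angle between the two
    microphones (alpha) and `offset` is the angle from the first microphone
    to the target direction (beta), both in radians."""
    half = math.sin(span / 2)
    shift = math.sin(span / 2 - offset)
    return half + shift, half - shift

def mix_pair(sig_a, sig_b, span, offset):
    """Sample-wise weighted sum of two equal-length signals."""
    w_a, w_b = pair_weights(span, offset)
    return [w_a * a + w_b * b for a, b in zip(sig_a, sig_b)]

alpha = math.radians(60)                # hypothetical angle between 2a and 2b
w_start = pair_weights(alpha, 0.0)      # target direction on microphone 2a
w_mid = pair_weights(alpha, alpha / 2)  # target direction midway between them
w_end = pair_weights(alpha, alpha)      # target direction on microphone 2b
```

At β = 0 the weight of t_2b(p) vanishes and at β = α the weight of t_2a(p) vanishes, while the sum of the two weights stays at 2 × sin(α/2) for every β. The same helper applies unchanged to expression (2) with δ and γ in place of α and β.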

Next, again with reference to FIG. 3(A), the process performed by the input mixing unit 112d (FIG. 1) is described, in which the acoustic signal t_3c(p) of the identified (selected) sharply directional microphone 3c and the acoustic signal t_3d(p) of the identified (selected) sharply directional microphone 3d are mixed to generate the second input acoustic signal t_gun(p).

As shown in FIG. 3(A), let
(a) δ be the angle between the line segment connecting the speaker's position S and the position P3c of the sharply directional microphone 3c and the line segment connecting the speaker's position S and the position P3d of the sharply directional microphone 3d, and
(b) γ be the angle between the line segment connecting the speaker's position S and the position P3c of the sharply directional microphone 3c and the line segment connecting the speaker's position S and the position GM (described with FIG. 2).
Once δ and γ have been calculated, the second input acoustic signal t_gun(p) is derived by the following expression:
(2) t_gun(p) = t_3c(p) × (sin(δ/2) + sin(δ/2 − γ))
             + t_3d(p) × (sin(δ/2) − sin(δ/2 − γ))
Regarding the number of microphones identified (selected) when generating the second input acoustic signal t_gun(p), the same considerations apply as described for expression (1) above.

<Output acoustic signal generation means>
Returning to the functional block diagram of FIG. 1, in this embodiment the amplitude adjustment unit 113a of the output acoustic signal generation unit 113 applies, to the second input acoustic signal t_gun(p), an amplitude adjustment process that attenuates the amplitude according to how far apart the speaker (sound source) and the listener (sound receiver) are. The second input acoustic signal t_gun(p) (from the sharply directional microphones 3) is chosen as the target of this process because the sound (voice) from the speaker that is picked up by the sharply directional microphones 3 (3c and 3d) and is to be delivered to the listener is, in reality, louder the closer the speaker and the listener are to each other and quieter the farther apart they are. Adjusting the amplitude of the input acoustic signal accordingly improves the reproduction accuracy of the sound field and enhances the sense of presence. This amplitude adjustment process may of course be omitted in order to shorten the computation time.

In general, the distance attenuation y (dB) of an acoustic signal at a distance d from the sound source is given by y = 20 log₁₀(d_0/d), where d_0 is the reference distance. Also in general, the amplitude of an acoustic signal and its level in decibels (dB) are related by (amplitude) = 10^(dB/20). Specifically, therefore, the amplitude adjustment unit 113a can generate the amplitude-adjusted second input acoustic signal tj_gun(p) by the following expression:
(3) tj_gun(p) = t_gun(p) × amp
    amp = 10^(y/20)
    y = 20 log₁₀(d_SR/d_SG)
In expression (3) above, the (reference) distance d_SR and the distance d_SG are, as shown in FIG. 3(B), the distance between the speaker's position S and the listener's position R and the distance between the speaker's position S and the position GM, respectively.
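
As a minimal sketch, expression (3) can be written out directly. The function names are hypothetical, and the signal is assumed to be a plain list of samples.

```python
import math

def distance_gain(d_sr, d_sg):
    """amp of expression (3): y = 20*log10(d_SR/d_SG), amp = 10**(y/20)."""
    y_db = 20.0 * math.log10(d_sr / d_sg)
    return 10.0 ** (y_db / 20.0)

def adjust_amplitude(t_gun, d_sr, d_sg):
    """tj_gun(p) = t_gun(p) * amp, applied sample by sample."""
    amp = distance_gain(d_sr, d_sg)
    return [amp * x for x in t_gun]
```

Since 10^(20 log₁₀(x)/20) = x, amp as defined in expression (3) reduces algebraically to the ratio d_SR/d_SG, so the decibel round trip could be skipped in practice.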

Also in the functional block diagram of FIG. 1, in this embodiment the timbre adjustment unit 113b of the output acoustic signal generation unit 113 applies a timbre adjustment process to the second input acoustic signal from the sharply directional microphones 3, which pick up the sound from the speaker (sound source) that is to be delivered to the listener (sound receiver), so that the signal takes on the timbre the listener (sound receiver) would actually receive. It is generally known that the high-frequency components of sound attenuate as the sound propagates (as it travels away from the sound source). Therefore, for example, by emphasizing a predetermined high-frequency band of the input acoustic signal more strongly the closer the listener is to the speaker, a sense of distance (such as the source being nearer) can be expressed, which improves the reproduction accuracy of the sound field and enhances the sense of presence.

As this timbre adjustment process, the timbre adjustment unit 113b applies one of the following to the amplitude-adjusted second input acoustic signal tj_gun(p):
(A: biquad filtering in the time domain) a biquad filtering process that emphasizes a predetermined high-frequency band according to how close the speaker (sound source) and the listener (sound receiver) are, generating the filtered second input acoustic signal tf_gun(p); or
(B: convolution processing in the frequency domain) determination of the impulse response between the position of the speaker (sound source) and the position of the listener (sound receiver), followed by convolution processing in the frequency domain with this impulse response, generating the convolved second input acoustic signal tc_gun(p).
This timbre adjustment process may of course also be omitted in order to shorten the computation time.

(Biquad filtering in the time domain)
First, (A) above is described in detail. A high shelf filter, for example, is adopted as the biquad filter, the sound output by the speaker is treated as speech containing consonant components in the 4 kHz band, and the filter is set, for example, as follows:
・Freq (threshold above which the frequency band is boosted) = 4000 (Hz)
・Q (width of the frequency-band peak) = 1.0
・Gain (filter gain) = amp, where amp = 10^(y/20) and y = 20 log₁₀(d_SR/d_SG)
From these settings, the denominator and numerator coefficients a0, a1, a2, b0, b1 and b2 of the biquad transfer function
(4) H(z) = (b0 + b1×z^-1 + b2×z^-2) / (a0 + a1×z^-1 + a2×z^-2)
are calculated, and the biquad-filtered second input acoustic signal tf_gun(p) is generated using the following expression:
(5) tf_gun(p) = (1/a0) × (b0×tj_gun(p) + b1×tj_gun(p-1) + b2×tj_gun(p-2)
                − a1×tf_gun(p-1) − a2×tf_gun(p-2))

Methods of calculating the coefficients a0, a1, a2, b0, b1 and b2 for various filters are disclosed, for example, in the non-patent document "Cookbook formulae for audio equalizer biquad filter coefficients", [online], [retrieved June 27, 2022], Internet <URL: https://webaudio.github.io/Audio-EQ-Cookbook/audio-eq-cookbook.html>, and in the non-patent document "++C++; // 未確認飛行 C", [online], [retrieved June 27, 2022], Internet <URL: https://ufcpp.net/study/sp/digital_filter/biquad/>. The choice of filter described above and filter settings such as Freq, Q and Gain are preferably made as appropriate, depending on what kind of sound field is to be reproduced and on the situation in the sound collection space and the reproduction space.
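
The following sketch shows one way the coefficients and the recursion could be computed, using the high-shelf formulas of the Audio EQ Cookbook cited above. Several points are assumptions of this sketch rather than statements of the source: the gain is passed in decibels (for example, the value y), the sampling rate is a free parameter, and the recursion is the standard direct-form I biquad, in which both feedback terms use past output samples. The function names are hypothetical.

```python
import math

def high_shelf_coefficients(fs, freq, q, gain_db):
    """High-shelf biquad coefficients following the Audio EQ Cookbook.
    gain_db is the shelf gain in decibels (an assumption of this sketch)."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    k = 2.0 * math.sqrt(a) * alpha
    b0 = a * ((a + 1) + (a - 1) * cos_w0 + k)
    b1 = -2.0 * a * ((a - 1) + (a + 1) * cos_w0)
    b2 = a * ((a + 1) + (a - 1) * cos_w0 - k)
    a0 = (a + 1) - (a - 1) * cos_w0 + k
    a1 = 2.0 * ((a - 1) - (a + 1) * cos_w0)
    a2 = (a + 1) - (a - 1) * cos_w0 - k
    return (b0, b1, b2), (a0, a1, a2)

def biquad_filter(x, b, a):
    """Direct-form I recursion: feedforward on past inputs, feedback on
    past outputs, with zero initial state."""
    b0, b1, b2 = b
    a0, a1, a2 = a
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

With these formulas the gain at 0 Hz is 1 and the gain at the Nyquist frequency is 10^(gain_db/20), so only the band above Freq is raised; a gain of 0 dB leaves the signal unchanged.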

(Convolution processing in the frequency domain)
Next, (B) above is described in detail. First, an impulse response between the speaker's position S and the listener's position R that reflects the absorption characteristics of air, or the impulse response of a filter given multiple peaks that imitate the absorption characteristics of air, is prepared and denoted i(p). This impulse response i(p) is a quantity that depends on the distance d_SR (FIG. 3(B)). The time length N of the impulse response i(p) is preferably set as appropriate according to the capability of the computer. For example, N = 1024 is typical, but N = 2048 or N = 4096 may be used when a PC whose processing speed is not high is used.

Next, the impulse response i(p) is Fourier transformed to derive the transfer function I(ω) (ω is the frequency), and the second input acoustic signal tj_gun(p), cut out with the set time length N, is Fourier transformed to calculate the transformed second input acoustic signal tj_gun(ω). The second input acoustic signal tc_gun(p) that has undergone convolution processing in the frequency domain can then be generated by the following expression:
(6) tc_gun(p) = IF[tc_gun(ω)]
    tc_gun(ω) = I(ω) * tj_gun(ω)
Here, IF[] is the inverse Fourier transform operator and * is the convolution integral operator.
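
One common way to realize this kind of processing is sketched below. It rests on an interpretation that the source does not spell out: by the convolution theorem, convolving the signal with the impulse response in the time domain corresponds to multiplying their spectra element by element, so the sketch multiplies I(ω) and tj_gun(ω) before the inverse transform. The small radix-2 FFT, the function names and the zero-padding policy are all assumptions made to keep the example self-contained; block-wise handling of a continuous signal is left out.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.
    The inverse transform is left unscaled (the caller divides by N)."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_convolve(signal, impulse):
    """Linear convolution of two real sequences via a product of spectra."""
    length = len(signal) + len(impulse) - 1
    size = 1
    while size < length:  # zero-pad to a power of two to avoid wrap-around
        size *= 2
    spec_signal = fft(list(signal) + [0.0] * (size - len(signal)))
    spec_impulse = fft(list(impulse) + [0.0] * (size - len(impulse)))
    product = [s * i for s, i in zip(spec_signal, spec_impulse)]
    time = fft(product, inverse=True)
    return [v.real / size for v in time[:length]]
```

For example, `fft_convolve(block, i_p)` with a block of tj_gun samples and a measured or designed impulse response `i_p` (both hypothetical names) returns the filtered block including its tail.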

Two processes, (A) and (B), have been described above as the timbre adjustment process. In situations where real-time sound field reproduction is important, (A: biquad filtering in the time domain), which requires comparatively little computation time, may be adopted. In situations where complex filtering (for example, emphasizing several specified, different high-frequency bands) is desired in order to improve the sound field reproduction and enhance the sense of presence, (B: convolution processing in the frequency domain) may be adopted. That is, it is also preferable that the setting allows either timbre adjustment process to be selected and executed according to the situation.

Also in the functional block diagram of FIG. 1, the output mixing unit 113c of the output acoustic signal generation unit 113 combines (mixes) the generated first input acoustic signal (t_amb(p)) and the generated second input acoustic signal (tf_gun(p) or tc_gun(p)) to generate the output acoustic signal t_out(p). That is, the output acoustic signal t_out(p) is derived by the following expression:
(7) t_out(p) = t_amb(p) + tf_gun(p), or
    t_out(p) = t_amb(p) + tc_gun(p)

In one aspect, it is also preferable that the output mixing unit 113c determines the phase difference between the first input acoustic signal (t_amb(p)) and the second input acoustic signal (tf_gun(p) or tc_gun(p)) based on the speaker's position S and the listener's position R, delays or advances the phase of one of them so as to cancel the determined phase difference, and then combines (mixes) the first input acoustic signal (t_amb(p)) and the second input acoustic signal (tf_gun(p) or tc_gun(p)) to generate the output acoustic signal t_out(p).

This phase difference arises when (a) the distance d_SA (FIG. 3(B)) between the speaker's position S and the position AM, which relates to the first input acoustic signal, differs from (b) the distance d_SG (FIG. 3(B)) between the speaker's position S and the position GM, which relates to the second input acoustic signal, and it causes interference when the two signals are mixed. This phase difference adjustment process, which takes effort to tune properly, may of course be omitted, shortening the computation time accordingly.

Specifically, the output mixing unit 113c first calculates the phase difference p_th by the following expression:
(8) p_th = fs × (d_SA − d_SG) / c
Here, c (m/s) is the speed of sound and fs (Hz) is the sampling frequency.

Next, when d_SA ≥ d_SG, the output mixing unit 113c adds a delay to the second input acoustic signal (tf_gun(p) or tc_gun(p)), which relates to the distance d_SG (position GM), so as to cancel the phase difference, and then generates the output acoustic signal t_out(p). That is, the output acoustic signal t_out(p) is derived by the following expression:
(when d_SA ≥ d_SG)
(9) t_out(p) = t_amb(p) + tf_gun(p + p_th), or
    t_out(p) = t_amb(p) + tc_gun(p + p_th)
When d_SA < d_SG, on the other hand, a delay is added to the first input acoustic signal (t_amb(p)), which relates to the distance d_SA (position AM), so as to cancel the phase difference, and the output acoustic signal t_out(p) is then generated. That is, the output acoustic signal t_out(p) is derived by the following expression:
(when d_SA < d_SG)
(9') t_out(p) = t_amb(p + p_th) + tf_gun(p), or
     t_out(p) = t_amb(p + p_th) + tc_gun(p)
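
Expressions (8), (9) and (9') can be sketched as an index shift followed by a sum. The sketch follows the index arithmetic exactly as written in the expressions; rounding p_th to a whole number of samples, filling samples outside the buffer with zeros, and the default speed of sound of 343 m/s are assumptions of this sketch, and the function names are hypothetical.

```python
def sample_offset(d_sa, d_sg, fs, c=343.0):
    """p_th of expression (8), rounded to whole samples (the rounding and
    the default speed of sound are assumptions of this sketch)."""
    return round(fs * (d_sa - d_sg) / c)

def shifted(signal, k):
    """Sequence s with s[p] = signal[p + k]; samples outside the input are 0."""
    n = len(signal)
    return [signal[p + k] if 0 <= p + k < n else 0.0 for p in range(n)]

def mix_aligned(t_amb, t_gun, d_sa, d_sg, fs, c=343.0):
    """Expressions (9) and (9'): shift one signal by p_th, then add."""
    p_th = sample_offset(d_sa, d_sg, fs, c)
    if d_sa >= d_sg:
        t_gun = shifted(t_gun, p_th)
    else:
        t_amb = shifted(t_amb, p_th)
    return [a + g for a, g in zip(t_amb, t_gun)]
```

When d_SA equals d_SG, p_th is 0 and the result coincides with the plain sum of expression (7).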

Also in the functional block diagram of FIG. 1, the loudspeaker identification and panning unit 113d of the output acoustic signal generation unit 113
(A) identifies, based on the acquired position coordinate information (information on the positions) of the listener (sound receiver) and the speaker (sound source), at least one loudspeaker 5 (in FIG. 1, the two loudspeakers 5f and 5g) located in the reproduction room (reproduction space), as seen from the listener (sound receiver), in the direction toward the corresponding position of the speaker (sound source) or in a direction close enough to that direction to satisfy a predetermined condition, and
(B) generates the output acoustic signals to be supplied to the identified loudspeakers 5 (5f and 5g).
The loudspeaker identification process of (A) above and the loudspeaker panning process of (B) above are described in detail below with reference to FIG. 4.

FIG. 4 is a schematic diagram of the reproduction space for explaining one embodiment of the loudspeaker identification process and the loudspeaker panning process according to the present invention. An xy position coordinate system that defines positions in the horizontal plane is set in the reproduction room, which is the reproduction space.

As shown in FIG. 4, in this embodiment the loudspeaker identification and panning unit 113d (FIG. 1) first
(A1) calculates the intersection point SP(x_sp, y_sp) between the boundary and the extension of the line segment running from the listener's position R to the speaker's position S, and
(A2) identifies (selects) the loudspeaker 5g, for which the direction from the loudspeaker's own installation position toward the listener's position R is closest to the direction from position SP(x_sp, y_sp) toward the listener's position R (the angle between the two directions is smallest), and the loudspeaker 5f, for which it is second closest (the angle is second smallest).

Next, in this embodiment the loudspeaker identification and panning unit 113d (FIG. 1)
(B) generates, based on the acquired position coordinate information of the listener and the speaker and using the output acoustic signal t_out(p) (generated from the first and second input acoustic signals), the output acoustic signals for the loudspeakers 5f and 5g, such that sound equivalent or corresponding to the "sound picked up (by the microphones)" is output from the identified (selected) loudspeakers 5f and 5g toward the listener.

Specifically, as shown in FIG. 4, let
(a) ρ be the angle between the line segment connecting the position P5f of the loudspeaker 5f and the listener's position R and the line segment connecting the position P5g of the loudspeaker 5g and the listener's position R, and
(b) σ be the angle between the line segment connecting the position P5f of the loudspeaker 5f and the listener's position R and the line segment connecting the position SP and the listener's position R.
Once ρ and σ have been calculated, the output acoustic signal t5f_out(p) for the loudspeaker 5f (to be supplied to the loudspeaker 5f) and the output acoustic signal t5g_out(p) for the loudspeaker 5g (to be supplied to the loudspeaker 5g) are derived by the following expression:
(10) t5f_out(p) = t_out(p) × (sin(ρ/2) + sin(ρ/2 − σ))
     t5g_out(p) = t_out(p) × (sin(ρ/2) − sin(ρ/2 − σ))

Expression (10) above implements so-called (amplitude) panning, in which the loudspeakers 5f and 5g are used and sound image localization is achieved by the difference in volume (difference in amplitude intensity) between the two loudspeakers (channels).
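
As an illustrative sketch, the angles ρ and σ can be obtained from the position coordinates and inserted into expression (10). The function names and the example geometry are hypothetical, and the use of unsigned angles assumes that the point SP lies between the two selected loudspeakers as seen from the listener.

```python
import math

def angle_at(vertex, a, b):
    """Unsigned angle (radians) at `vertex` between the directions to a and b."""
    ang_a = math.atan2(a[1] - vertex[1], a[0] - vertex[0])
    ang_b = math.atan2(b[1] - vertex[1], b[0] - vertex[0])
    diff = abs(ang_a - ang_b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)

def pan_to_pair(t_out, listener, pos_f, pos_g, pos_sp):
    """Feeds for the loudspeakers 5f and 5g following expression (10)."""
    rho = angle_at(listener, pos_f, pos_g)     # angle between the two loudspeakers
    sigma = angle_at(listener, pos_f, pos_sp)  # angle from loudspeaker 5f to SP
    g_f = math.sin(rho / 2) + math.sin(rho / 2 - sigma)
    g_g = math.sin(rho / 2) - math.sin(rho / 2 - sigma)
    return [g_f * x for x in t_out], [g_g * x for x in t_out]

# Hypothetical layout: listener at the origin, SP midway between 5f and 5g.
feed_f, feed_g = pan_to_pair([1.0], (0.0, 0.0), (1.0, 1.0), (1.0, -1.0), (1.0, 0.0))
```

In this layout both loudspeakers receive the same gain, sin(ρ/2); when SP coincides with the position of the loudspeaker 5f, the gain of the loudspeaker 5g becomes 0.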

The process of reproducing, for the sound receiver (listener) present in the reproduction space (reproduction room), the sound field of the sound collection space (sound collection room) containing the sound source (speaker) has been described above with reference to FIGS. 1 to 4. The process in this embodiment is briefly summarized here. In this embodiment,
(a) two channels of omnidirectional microphones 2 are selected from the multi-channel omnidirectional microphones 2 based on the position coordinate information of the sound source (speaker) and the sound receiver (listener), and a one-channel (monaural) first input acoustic signal is determined,
(b) two channels of sharply directional microphones 3 are selected from the multi-channel sharply directional microphones 3 based on the position coordinate information and the face azimuth angle information of the sound source (speaker), and a one-channel (monaural) second input acoustic signal is determined, and
(c) these monaural input acoustic signals (two channels in total) are mixed to generate a one-channel (monaural) output acoustic signal, two channels of loudspeakers 5 are then selected from the multi-channel loudspeakers 5 based on the position coordinate information of the sound source (speaker) and the sound receiver (listener), and output acoustic signals that enable sound image localization by the selected loudspeakers 5 are generated and provided.

Returning to the functional block diagram of FIG. 1, the near-loudspeaker identification and panning unit 113e of the output acoustic signal generation unit 113 also identifies at least one "other loudspeaker" that is closer to the position of the listener (sound receiver), or further inward (as seen from the room walls), than the identified loudspeakers 5 (5f and 5g in FIG. 1), and also generates the output acoustic signal to be supplied to the identified "other loudspeaker" (the loudspeaker 5_cell in FIG. 1). This makes it possible to reproduce a sound field (sound) that cannot be achieved with the surround loudspeakers 5a to 5h alone, for example a sound field in which the sound source is felt to be nearby, that is, in which the sound image is localized closer to the listener.

In the embodiment shown in FIG. 1, the loudspeaker 5_cell installed at the center of the ceiling of the reproduction room is an "other loudspeaker", distinct from the surround loudspeakers 5a to 5h arranged in the reproduction room, and it enables sound to be presented further inward in the reproduction room. In this case, the near-loudspeaker identification and panning unit 113e can determine the output acoustic signal tcell_out(p) for the loudspeaker 5_cell by any one of the following expressions:
(11) tcell_out(p) = tf_gun(p)
(12) tcell_out(p) = tc_gun(p)
(13) tcell_out(p) = t_out(p)
An embodiment in which a plurality of such "other loudspeakers" exist is described below with reference to FIG. 5.

FIG. 5 is a schematic diagram of the reproduction space for explaining one embodiment of the ceiling loudspeaker identification process and the ceiling loudspeaker panning process according to the present invention.

According to FIG. 5, in the reproduction space, ceiling speakers (nine in total in FIG. 5) are installed as "other speakers", separate from the surround speakers 5a to 5h (omitted in FIG. 5), at a plurality of positions along the ceiling boundary (surrounding the interior of the space) and at the center of the ceiling. Here, the near-speaker identification panning unit 113e (FIG. 1) identifies (selects) a predetermined number of ceiling speakers (three in FIG. 5) based on the position coordinate information (information regarding the position) of the listener, and generates the output acoustic signal to be supplied to each of these ceiling speakers so that panning can be performed with them.

Specifically, in FIG. 5, the three ceiling speakers 5_cell1, 5_cell2, and 5_cell3 are identified (selected). They are located at the vertices of the smallest "triangle" that has ceiling speakers as its vertices and contains the ceiling position above the listener's head (the cross mark in FIG. 5). The near-speaker identification panning unit 113e (FIG. 1) generates the output acoustic signals for the identified ceiling speakers 5_cell1, 5_cell2, and 5_cell3, for example by the VBAP (Vector Based Amplitude Panning) method, so that the listener hears the sound as coming from the ceiling position overhead (the cross mark).
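The selection of the three ceiling speakers can be illustrated with the following minimal Python sketch. It is not taken from the specification: it assumes that "smallest" means smallest area, that speaker positions are given as 2D coordinates in the ceiling plane, and the function names (`barycentric`, `triangle_area`, `select_ceiling_triangle`) are hypothetical.

```python
from itertools import combinations


def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle abc (None if degenerate)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < 1e-12:
        return None  # the three speakers are collinear
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def triangle_area(a, b, c):
    """Area of the 2D triangle abc."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def select_ceiling_triangle(speakers, overhead, eps=1e-9):
    """Return the ids of the smallest-area speaker triangle containing `overhead`.

    speakers: dict mapping a speaker id to its (x, y) position on the ceiling.
    overhead: (x, y) ceiling position directly above the listener's head.
    Returns None when no triangle of speakers contains the point.
    """
    best = None
    for ids in combinations(sorted(speakers), 3):
        a, b, c = (speakers[i] for i in ids)
        weights = barycentric(overhead, a, b, c)
        if weights is None or min(weights) < -eps:
            continue  # degenerate triangle, or the point lies outside it
        area = triangle_area(a, b, c)
        if best is None or area < best[0]:
            best = (area, ids)
    return None if best is None else best[1]
```

The exhaustive search over all triples is adequate for a handful of ceiling speakers such as the nine in FIG. 5. When several triangles tie for the smallest area, this sketch simply keeps the first one found.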

The VBAP method is a representative technique for controlling sound image localization in three-dimensional multichannel audio (sound fields). It is explained in detail in, for example, the non-patent literature: Akio Ando, "音響の高臨場感技術" (High-presence audio technology), 映像情報メディア学会誌 (The Journal of the Institute of Image Information and Television Engineers), Vol. 66, No. 8, pp. 671-677, 2012.
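As an illustration of three-speaker VBAP, the sketch below computes the gains that place a virtual source in a target direction. It follows the standard VBAP formulation (the target direction expressed as a non-negative combination of the unit vectors toward the three speakers, followed by power normalization) and is not taken from the specification. The function name `vbap_gains` and its arguments are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def vbap_gains(listener, speakers, target):
    """Gains for three speakers so that the sound image appears toward `target`.

    listener: (x, y, z) position of the listener's head.
    speakers: three (x, y, z) speaker positions, e.g. 5_cell1 to 5_cell3.
    target:   (x, y, z) position where the sound image should be localized,
              e.g. the ceiling point above the listener's head.
    """
    listener = np.asarray(listener, dtype=float)
    # Unit vectors from the listener toward each speaker (one per row).
    units = np.asarray(speakers, dtype=float) - listener
    units /= np.linalg.norm(units, axis=1, keepdims=True)
    # Unit vector from the listener toward the target position.
    p = np.asarray(target, dtype=float) - listener
    p /= np.linalg.norm(p)
    # Solve p = g1*l1 + g2*l2 + g3*l3 for the gain vector g.
    g = np.linalg.solve(units.T, p)
    if np.any(g < -1e-9):
        raise ValueError("target direction lies outside the speaker triangle")
    g = np.clip(g, 0.0, None)
    # Normalize so that the total radiated power stays constant.
    return g / np.linalg.norm(g)
```

Each speaker's output signal would then be the source signal multiplied by its gain. A negative gain means the target direction is outside the chosen triangle, which is why the sketch rejects it instead of clipping silently.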

In this embodiment, the ceiling speakers are fixed to the ceiling of the reproduction room (reproduction space), but they need not be fixed and their positions may change. That is, at least one of the ceiling speakers may, for example, be movable along the ceiling of the reproduction room. A speaker attached to a robot that can move within the reproduction room may also be adopted as the "another speaker". In this case, for example, the robot equipped with the speaker may move to the vicinity of the listener as appropriate in order to localize the sound image near the listener. Furthermore, a dodecahedral speaker 5p installed at the center of the floor of the reproduction room, or movable on the floor, may also be used as the "another speaker".

The sound field reproduction processing performed by the output acoustic signal generation unit 113 (FIG. 1) using speakers installed in the reproduction room (reproduction space) has been described above. The following describes sound field reproduction processing for a stereophone worn by the listener (sound receiver), which in this embodiment is a pair of headphones. In this case, the reproduction space is not a physical space but a virtual space (sound image space) related to sound image localization in the headphones worn by the listener.

Returning to the functional block diagram of FIG. 1, the output acoustic signal generation unit 113 uses the output acoustic signal t_out(p) generated by the output mixing unit 113c to generate the output acoustic signal th_out(p) to be supplied to the headphones. This signal causes a sound equivalent or corresponding to the "collected sound" to be output toward the listener from the direction toward the corresponding position of the talker (sound source) as seen from the listener (sound receiver) in the reproduction space (sound image space) of the headphones, or from a direction close enough to that direction to satisfy a predetermined condition.

Specifically, the output acoustic signal th_out(p) can be generated by convolving the output acoustic signal t_out(p), in the frequency domain (that is, performing convolution integral processing), with a head-related transfer function (HRTF) for the headphones that reflects the position of the talker as seen from the listener and the direction of the talker with respect to the listener's two ears. HRTFs are explained in detail in, for example, the non-patent literature: Takanori Nishino, "頭部伝達関数データベース" (Head-related transfer function database), [online], [retrieved June 27, 2022], Internet <URL: https://sites.google.com/site/takanorinishinomu/research/hrtf/database-j>.
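The frequency-domain convolution described here can be sketched in Python as follows. This is only an illustration under stated assumptions: the left and right head-related impulse responses (the time-domain counterparts of the HRTF) for the talker's direction are assumed to have been selected already, for example from a database such as the one cited; the function name `binauralize` and its arguments are hypothetical; and NumPy is assumed to be available.

```python
import numpy as np


def binauralize(t_out, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair in the frequency domain.

    t_out:      mono output acoustic signal (1D array).
    hrir_left:  impulse response for the left ear (assumed already selected
                for the direction of the sound source).
    hrir_right: impulse response for the right ear.
    Returns a 2 x N array: row 0 is the left channel, row 1 the right channel.
    """
    # Length of the full linear convolution.
    n = len(t_out) + max(len(hrir_left), len(hrir_right)) - 1
    # Zero-pad to a power of two so the circular FFT product equals
    # the linear convolution.
    nfft = 1 << (n - 1).bit_length()
    spectrum = np.fft.rfft(t_out, nfft)
    left = np.fft.irfft(spectrum * np.fft.rfft(hrir_left, nfft), nfft)[:n]
    right = np.fft.irfft(spectrum * np.fft.rfft(hrir_right, nfft), nfft)[:n]
    return np.stack([left, right])
```

Multiplying spectra is equivalent to time-domain convolution once the signals are zero-padded to at least the full convolution length. For a moving talker or listener, a practical system would process the signal in blocks and switch or interpolate the impulse responses over time, which this sketch does not cover.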

In addition, when generating the output acoustic signal th_out(p) for headphones as described above, the following may be used in place of the output acoustic signal t_out(p):
・an output acoustic signal tan_out(p) generated by convolving the output acoustic signal t_out(p) (performing convolution integral processing) with an inverse filter generated from the room transfer function (RTF) of the sound collection room (sound collection space).
This allows the listener to enjoy, through the headphones, clear sound from which, for example, the reverberation of the sound collection room has been removed. Of course, the output acoustic signal tan_out(p) may also be used in acoustic output systems other than headphones, such as speakers in the sound collection room.
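One possible way to build and apply such an inverse filter is sketched below in Python. The specification does not state how the inverse filter is generated, so the regularized frequency-domain inversion used here is an assumption chosen for illustration. The function names, the regularization constant `beta`, and the use of a measured room impulse response `rir` as the time-domain form of the RTF are likewise hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def regularized_inverse_filter(rir, nfft, beta=1e-3):
    """Frequency response of a regularized inverse of the room response.

    rir:  room impulse response of the sound collection room (1D array).
    beta: small positive constant that limits the gain at frequencies
          where the room response is close to zero.
    """
    room = np.fft.rfft(rir, nfft)
    return np.conj(room) / (np.abs(room) ** 2 + beta)


def dereverberate(t_out, rir, beta=1e-3):
    """Apply the inverse filter to t_out and return a signal of the same length."""
    n = len(t_out)
    # FFT size covering the linear convolution of the signal with the filter.
    nfft = 1 << (n + len(rir) - 2).bit_length()
    inverse = regularized_inverse_filter(rir, nfft, beta)
    return np.fft.irfft(np.fft.rfft(t_out, nfft) * inverse, nfft)[:n]
```

The constant `beta` trades dereverberation accuracy against noise amplification: a larger value is more robust to deep notches in the room response but leaves more residual reverberation. Room responses that are not minimum phase yield a non-causal inverse, which this simple sketch does not handle.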

As yet another embodiment, the output acoustic signal generation unit 113 may provide the output acoustic signal t_out(p) output from the output mixing unit 113d to the outside of the device 1 as it is (for example, without applying panning processing). Even in this case, the output acoustic signal t_out(p) can be used outside the device 1 as an acoustic signal that causes a sound equivalent or corresponding to the "collected sound" to be output toward the sound receiver.

As a further embodiment, the input acoustic signal determination unit 112 may determine the input acoustic signal by acquiring an acoustic signal related to sound picked up by at least one microphone designated in advance, without performing the omnidirectional microphone identification processing or the sharp-directivity microphone identification processing described above. Even in this case, the output acoustic signal generation unit 113 identifies, based on the acquired information regarding the positions of the sound receiver (listener) and the sound source (talker), at least one speaker 5 located, as seen from the sound receiver (listener) in the reproduction space (reproduction room), in the direction toward the corresponding position of the sound source (talker) or in a direction close enough to that direction to satisfy a predetermined condition, and generates the output acoustic signal to be supplied to the identified speaker 5.

Also in the functional block diagram of FIG. 1, the output acoustic signal generated by the output acoustic signal generation unit 113 as described above is transmitted from the communication interface 101, via the communication control unit 121 and over the communication network, to each (identified) speaker installed in the reproduction space, or to an acoustic signal distribution control device (not shown) installed in or near the reproduction room (reproduction space). Here, the transmitted output acoustic signal is preferably accompanied by identification information of the (identified) speaker to which the signal is to be supplied.

The sound field reproduction processing described above may be carried out, for example, by the input/output control unit 122 outputting a processing execution instruction entered from the keyboard (KB) 102 to the relevant functional component. Also, for example, the input/output control unit 122 may acquire, as appropriate, information on the identified microphones and speakers and information regarding the positions of the talker and the listener from the relevant functional components, and cause the display (DP) 102 to display an image showing the moment-to-moment situation in the sound collection room and the reproduction room, such as that shown in the upper part of FIG. 1.

As described in detail above, according to the present invention, a microphone is identified and the input acoustic signal is determined based on the acquired information regarding the positions of the sound source and the sound receiver, and the output acoustic signal is then generated. Therefore, even when the sound receiver is at an arbitrary position in the reproduction space or can move within it, the sound field of the sound collection space can be reproduced for that sound receiver. Likewise, even when the sound source is at an arbitrary position in the sound collection space or can move within it, the sound field of the sound collection space can be reproduced for the sound receiver.

Furthermore, the present invention can contribute greatly to faithfully reproducing, for the recipient of the sound field, a sound field equivalent or corresponding to the sound field in the sound collection space, for example in web conferences, remote sessions, remote conferences, and online concerts, which have become widely used in recent years.

According to the present invention, it is also possible to create, with high sound field reproducibility, teaching materials such as (a) music teaching materials featuring singers and performers who sing and play while moving around a stage, (b) physical education teaching materials that record the voices and movement sounds of athletes in motion, (c) language teaching materials involving conversation with a native speaker while walking side by side or carrying out various collaborative tasks, and (d) social studies field trip materials that record the various sounds arising at different locations in facilities such as factories and exhibition halls. It thereby becomes possible to offer children and learners, not only in urban areas but also in rural areas, the opportunity to receive excellent music, physical, language, and social education using such high-quality materials, for example online and in some cases in real time. In other words, the present invention can also contribute to Goal 4 of the United Nations-led Sustainable Development Goals (SDGs): "Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all."

With respect to the various embodiments of the present invention described above, various changes, modifications, and omissions within the scope of the technical idea and viewpoint of the present invention can be readily made by those skilled in the art. The foregoing description is merely an example and is not intended to be limiting in any way. The present invention is limited only as defined by the claims and their equivalents.

1 Sound field reproduction device
101 Communication interface
102 Keyboard (KB) / display (DP)
111 Position information acquisition unit
112 Input acoustic signal determination unit
112a Omnidirectional microphone identification unit
112b, 112d Input mixing unit
112c Sharp-directivity microphone identification unit
113 Output acoustic signal generation unit
113a Amplitude adjustment unit
113b Timbre adjustment unit
113c Output mixing unit
113d Speaker (SP) identification panning unit
113e Near-speaker (SP) identification panning unit
121 Communication control unit
122 Input/output control unit
2, 2a, 2b, 2c, 2d, 2e, 2f, 2g, 2h Omnidirectional microphone
3, 3a, 3b, 3c, 3d, 3e, 3f, 3g, 3h Sharp-directivity microphone
4 Depth sensor
5, 5a, 5b, 5c, 5d, 5e, 5f, 5g, 5h Speaker
5_cell1, 5_cell2, 5_cell3 Ceiling speaker
5p Dodecahedral speaker

Claims

1. A sound field reproduction program for reproducing, for a sound receiver in a reproduction space, a sound field related to a sound source in a sound collection space, the reproduction space corresponding to the sound collection space in terms of positions within the space, the program causing a computer to function as:
position information acquisition means for acquiring information regarding measured or set positions of the sound source and the sound receiver;
input acoustic signal determination means for identifying, based on the acquired information regarding the positions, at least one microphone located, as seen from the sound source in the sound collection space, in a direction toward the corresponding position of the sound receiver or in a direction close enough to that direction to satisfy a predetermined condition, and for determining an input acoustic signal by acquiring an acoustic signal related to the sound picked up by the identified microphone; and
output acoustic signal generation means for generating, using the input acoustic signal and based on the acquired information regarding the positions, an output acoustic signal that causes a sound equivalent or corresponding to the collected sound to be output toward the sound receiver from a direction toward the corresponding position of the sound source as seen from the sound receiver in the reproduction space, or from a direction close enough to that direction to satisfy a predetermined condition.

2. The sound field reproduction program according to claim 1, wherein
the position information acquisition means also acquires information regarding a measured or set sound output direction of the sound source,
the input acoustic signal determination means identifies, based on the acquired information, at least one other microphone located, as seen from the sound source in the sound collection space, in the output direction or in a direction close to the output direction, and determines another input acoustic signal by acquiring another acoustic signal related to the sound picked up by the identified other microphone, and
the output acoustic signal generation means generates the output acoustic signal by combining the other input acoustic signal with the input acoustic signal related to the sound picked up by the microphone.

3. The sound field reproduction program according to claim 2, wherein the other microphone has higher directivity than the microphone.

4. The sound field reproduction program according to claim 1, wherein
the reproduction space is a real space, and
the output acoustic signal generation means identifies, based on the acquired information regarding the positions, at least one speaker located, as seen from the sound receiver in the reproduction space, in a direction toward the corresponding position of the sound source or in a direction close enough to that direction to satisfy a predetermined condition, and generates the output acoustic signal to be supplied to the identified speaker.

5. The sound field reproduction program according to claim 4, wherein the output acoustic signal generation means also identifies at least one other speaker that is closer to the position of the sound receiver than the identified speaker, or is located further inward from the surrounding boundary, and also generates the output acoustic signal to be supplied to the identified other speaker.

6. The sound field reproduction program according to claim 4, wherein
the sound source and the sound receiver are movable within the sound collection space and the reproduction space, respectively, each of which is a real space,
the input acoustic signal determination means identifies the microphone at one point in time based on the information regarding the positions measured and acquired at that point in time by the position information acquisition means, and determines the input acoustic signal at that point in time, and
the output acoustic signal generation means identifies the speaker at that point in time based on the information regarding the positions acquired at that point in time, and generates the output acoustic signal at that point in time.

7. The sound field reproduction program according to any one of claims 1 to 3, wherein
the reproduction space is a virtual space related to sound image localization in a stereophone worn by the sound receiver, and
the output acoustic signal generation means generates the output acoustic signal to be supplied to the stereophone, the signal causing a sound equivalent or corresponding to the collected sound to be output toward the sound receiver from a direction toward the corresponding position of the sound source as seen from the sound receiver in the reproduction space, or from a direction close enough to that direction to satisfy a predetermined condition.

8. The sound field reproduction program according to claim 2, wherein the output acoustic signal generation means applies, to the determined other input acoustic signal, amplitude adjustment processing that attenuates the amplitude according to the distance between the position of the sound source and the position of the sound receiver, and generates the output acoustic signal using the other input acoustic signal that has undergone this processing.

9. The sound field reproduction program according to claim 2, wherein the output acoustic signal generation means applies, to the determined other input acoustic signal, either filter processing that emphasizes a predetermined high-frequency band according to the proximity between the position of the sound source and the position of the sound receiver, or processing that determines an impulse response between the position of the sound source and the position of the sound receiver and performs frequency-domain convolution with that impulse response, and generates the output acoustic signal using the other input acoustic signal that has undergone this processing.

10. The sound field reproduction program according to claim 2, wherein the output acoustic signal generation means determines a phase difference between the other input acoustic signal and the input acoustic signal based on the information regarding the positions, applies processing that delays the phase of one of them so as to eliminate the determined phase difference, and then combines the other input acoustic signal and the input acoustic signal to generate the output acoustic signal.

11. A sound field reproduction program for reproducing, for a sound receiver in a reproduction space, a sound field related to a sound source in a sound collection space, the reproduction space corresponding to the sound collection space in terms of positions within the space, the program causing a computer to function as:
position information acquisition means for acquiring information regarding measured or set positions of the sound source and the sound receiver, and information regarding a measured or set sound output direction of the sound source;
input acoustic signal determination means for (a) identifying, based on the acquired information regarding the positions, at least one microphone located, as seen from the sound source in the sound collection space, in a direction toward the corresponding position of the sound receiver or in a direction close enough to that direction to satisfy a predetermined condition, and determining an input acoustic signal by acquiring an acoustic signal related to the sound picked up by the identified microphone, and (b) identifying, based on the acquired information regarding the positions and the information regarding the output direction, at least one other microphone located, as seen from the sound source in the sound collection space, in the output direction or in a direction close enough to the output direction to satisfy a predetermined condition, and determining another input acoustic signal by acquiring another acoustic signal related to the sound picked up by the identified other microphone; and
output acoustic signal generation means for combining the input acoustic signal and the other input acoustic signal to generate an output acoustic signal that causes a sound equivalent or corresponding to the collected sound to be output toward the sound receiver.

12. A sound field reproduction program for reproducing, for a sound receiver in a reproduction space, a sound field related to a sound source in a sound collection space, the reproduction space corresponding to the sound collection space in terms of positions within the space, the program causing a computer to function as:
position information acquisition means for acquiring information regarding measured or set positions of the sound source and the sound receiver;
input acoustic signal determination means for determining an input acoustic signal by acquiring an acoustic signal related to sound picked up by at least one microphone capable of picking up sound from the sound source; and
output acoustic signal generation means for identifying, based on the acquired information regarding the positions, at least one speaker located, as seen from the sound receiver in the reproduction space, in a direction toward the corresponding position of the sound source or in a direction close enough to that direction to satisfy a predetermined condition, and for generating an output acoustic signal to be supplied to the identified speaker.

13. A sound field reproduction device for reproducing, for a sound receiver in a reproduction space, a sound field related to a sound source in a sound collection space, the reproduction space corresponding to the sound collection space in terms of positions within the space, the device comprising:
position information acquisition means for acquiring information regarding measured or set positions of the sound source and the sound receiver;
input acoustic signal determination means for identifying, based on the acquired information regarding the positions, at least one microphone located, as seen from the sound source in the sound collection space, in a direction toward the corresponding position of the sound receiver or in a direction close enough to that direction to satisfy a predetermined condition, and for determining an input acoustic signal by acquiring an acoustic signal related to the sound picked up by the identified microphone; and
output acoustic signal generation means for generating, using the input acoustic signal and based on the acquired information regarding the positions, an output acoustic signal that causes a sound equivalent or corresponding to the collected sound to be output toward the sound receiver from a direction toward the corresponding position of the sound source as seen from the sound receiver in the reproduction space, or from a direction close enough to that direction to satisfy a predetermined condition.

14. A sound field reproduction system for reproducing, for a sound receiver in a reproduction space, a sound field related to a sound source in a sound collection space, the reproduction space corresponding to the sound collection space in terms of positions within the space, the system comprising:
position information acquisition means for acquiring information regarding measured or set positions of the sound source and the sound receiver;
input acoustic signal determination means for identifying, based on the acquired information regarding the positions, at least one microphone located, as seen from the sound source in the sound collection space, in a direction toward the corresponding position of the sound receiver or in a direction close enough to that direction to satisfy a predetermined condition, and for determining an input acoustic signal by acquiring an acoustic signal related to the sound picked up by the identified microphone; and
output acoustic signal generation means for generating, using the input acoustic signal and based on the acquired information regarding the positions, an output acoustic signal that causes a sound equivalent or corresponding to the collected sound to be output toward the sound receiver from a direction toward the corresponding position of the sound source as seen from the sound receiver in the reproduction space, or from a direction close enough to that direction to satisfy a predetermined condition.

15. A computer-implemented sound field reproduction method for reproducing, for a sound receiver in a reproduction space, a sound field related to a sound source in a sound collection space, the reproduction space corresponding to the sound collection space in terms of positions within the space, the method comprising the steps of:
acquiring information regarding measured or set positions of the sound source and the sound receiver;
identifying, based on the acquired information regarding the positions, at least one microphone located, as seen from the sound source in the sound collection space, in a direction toward the corresponding position of the sound receiver or in a direction close enough to that direction to satisfy a predetermined condition, and determining an input acoustic signal by acquiring an acoustic signal related to the sound picked up by the identified microphone; and
generating, using the input acoustic signal and based on the acquired information regarding the positions, an output acoustic signal that causes a sound equivalent or corresponding to the collected sound to be output toward the sound receiver from a direction toward the corresponding position of the sound source as seen from the sound receiver in the reproduction space, or from a direction close enough to that direction to satisfy a predetermined condition.