JP5550019B2

JP5550019B2 - Sound field sharing system and optimization method

Info

Publication number: JP5550019B2
Application number: JP2010228392A
Authority: JP
Inventors: 成悟榎本; 雄介池田; 哲中村; 史郎伊勢
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2010-10-08
Filing date: 2010-10-08
Publication date: 2014-07-16
Anticipated expiration: 2030-10-08
Also published as: JP2012085035A

Description

The present invention relates to a sound field sharing system and an optimization method, and more particularly to a sound field sharing system and an optimization method that use, for example, a sound field control reproduction system that records and reproduces an original sound field with physical fidelity.

An example of a conventional sound field sharing system of this type is disclosed in Non-Patent Document 1. The three-dimensional sound field communication system disclosed in Non-Patent Document 1 uses a sound field control (Boundary Surface Control: BoSC) reproduction system, which reproduces sound data recorded by a 70-ch (channel) microphone array with 62-ch loudspeakers, so that users at remote locations can converse while sharing an acoustic space. Specifically, 62-ch sound field data that has been recorded in advance and convolved with an inverse filter is stored on a server. Two client machines (PCs) placed at different locations are connected to this server via a network such as the Internet or a LAN. A three-dimensional sound field reproduction system is connected to each client machine. The server simultaneously transmits the reproduction sound field selected by the users to both sound field reproduction systems. Voice data corresponding to the voice of the user of each sound field reproduction system is transmitted to the other client machine via the network. In each client machine, the voice data (1 ch) corresponding to the voice of the other user is convolved in real time, then superimposed on the sound field data (62 ch) and output. Users at different locations can therefore share the sound field data output from the server and hold a conversation.

Seigo Enomoto, "1. Numerical Analysis Technology and Visualization/Auralization, 1.7 Three-Dimensional Sound Field Communication System," Acoustic Technology, No. 148, Dec. 2009, pp. 37-42

However, in the three-dimensional sound field communication system of Non-Patent Document 1, sound field data recorded by the 70-ch microphone array is reproduced by the 62-ch speaker array (sound field reproduction system), so the amount of sound field data is enormous. In addition, because the number of channels is large, the convolution processing load is also heavy. For this reason, the sound field data is recorded in advance and convolved in advance before being transmitted to each client machine. It has therefore been difficult to share sound field data recorded in real time.

Therefore, the main object of the present invention is to provide a novel sound field sharing system and optimization method.

Another object of the present invention is to provide a sound field sharing system and an optimization method that allow users at different locations to share sound field data recorded in real time.

To solve the above problems, the present invention adopts the following configuration. The reference numerals in parentheses and the supplementary explanations indicate correspondence with the embodiments described later in order to aid understanding of the present invention, and do not limit the present invention in any way.

According to the first invention, a sound field sharing system includes a microphone array (14) that is placed in a certain sound field and has a first predetermined number of microphones, a server (12) that records sound field data detected by the microphone array and transmits the sound field data to a plurality of reproduction systems, and reproduction systems (22, 26) that reproduce the sound field data from the server with a speaker array having a second predetermined number of loudspeakers. The sound field sharing system further includes initial speaker selection means, first evaluation value calculation means, reference speaker selection means, first execution means, initial microphone selection means, second evaluation value calculation means, reference microphone selection means, and second execution means. These means are realized by, for example, a computer (12, 18, 20, etc.).

The initial speaker selection means selects one loudspeaker of the speaker array as the first reference loudspeaker. The first evaluation value calculation means calculates a Gram-Schmidt orthogonalization evaluation value between the selected reference loudspeakers and each of the evaluation target loudspeakers, that is, every loudspeaker in the speaker array other than the reference loudspeakers. The reference speaker selection means selects, as a reference loudspeaker, the evaluation target loudspeaker with the highest Gram-Schmidt orthogonalization evaluation value calculated by the first evaluation value calculation means. The first execution means causes the first evaluation value calculation means and the reference speaker selection means to be executed repeatedly until, as a result of selection by the reference speaker selection means, the number of reference loudspeakers reaches a third predetermined number that is smaller than the second predetermined number.

The initial microphone selection means selects one microphone of the microphone array as the first reference microphone. The second evaluation value calculation means calculates a Gram-Schmidt orthogonalization evaluation value between the selected reference microphones and each of the evaluation target microphones, that is, every microphone in the microphone array other than the reference microphones. The reference microphone selection means selects, as a reference microphone, the evaluation target microphone with the highest Gram-Schmidt orthogonalization evaluation value calculated by the second evaluation value calculation means. The second execution means causes the second evaluation value calculation means and the reference microphone selection means to be executed repeatedly until, as a result of selection by the reference microphone selection means, the number of reference microphones reaches a fourth predetermined number that is smaller than the first predetermined number. The server then transmits the sound field data detected by the fourth predetermined number of reference microphones to the plurality of reproduction systems. Accordingly, each of the reproduction systems reproduces the sound field data transmitted from the server using the third predetermined number of reference loudspeakers.
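The greedy selection described above, applied to either loudspeakers or microphones, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented procedure: the actual evaluation index is not defined in this passage, so the stand-in score used here (the share of a candidate's vector that remains after projecting out the already-selected vectors) is an assumption, as are the names `greedy_select` and `residual_after_projection` and the use of real-valued vectors to represent each transducer.

```python
import math


def norm(vec):
    """Euclidean length of a real-valued vector."""
    return math.sqrt(sum(x * x for x in vec))


def residual_after_projection(candidate, basis):
    """Remove from `candidate` its components along each orthonormal basis vector."""
    residual = list(candidate)
    for q in basis:
        coeff = sum(r * b for r, b in zip(residual, q))
        residual = [r - coeff * b for r, b in zip(residual, q)]
    return residual


def greedy_select(vectors, first, count):
    """Pick up to `count` indices, starting from `first` (assumed non-zero vector).

    Each round adds the candidate with the highest stand-in score, i.e. the one
    that is most linearly independent of the vectors selected so far.
    """
    selected = [first]
    start_norm = norm(vectors[first])
    basis = [[x / start_norm for x in vectors[first]]]
    while len(selected) < count:
        best_score, best_index, best_residual = -1.0, None, None
        for i, vec in enumerate(vectors):
            if i in selected:
                continue
            residual = residual_after_projection(vec, basis)
            vec_norm = norm(vec)
            score = norm(residual) / vec_norm if vec_norm > 0 else 0.0
            if score > best_score:
                best_score, best_index, best_residual = score, i, residual
        if best_index is None or best_score == 0.0:
            break  # every remaining candidate is linearly dependent
        selected.append(best_index)
        res_norm = norm(best_residual)
        basis.append([x / res_norm for x in best_residual])
    return selected
```

In a real system each vector would presumably hold measured transfer characteristics, which are complex-valued and frequency-dependent; the real-valued form here only illustrates the order in which candidates are added.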

According to the first invention, the second predetermined number of loudspeakers is reduced to the third predetermined number and the first predetermined number of microphones is reduced to the fourth predetermined number, so the convolution processing load and the amount of data can both be reduced. Convolution processing and data transmission can therefore be performed in real time, and the sound field can be shared.

The second invention depends on the first invention, and the sound field sharing system further includes initial speaker changing means, first set storage means, and first set selection means. These means are also realized by a computer (12, 18, 20, etc.). The initial speaker changing means sequentially changes the first reference loudspeaker selected by the initial speaker selection means. Accordingly, for each loudspeaker chosen as the first reference loudspeaker, the first execution means repeatedly executes the first evaluation value calculation means and the reference speaker selection means, so that a plurality of sets of the third predetermined number of reference loudspeakers are obtained (as many sets as the total number of loudspeakers). The first set storage means stores the set of the third predetermined number of reference loudspeakers selected in each case as the first reference loudspeaker is sequentially changed by the initial speaker changing means. For example, the sets are stored in computer memory (a hard disk, RAM, or the like). The first set selection means selects, from among the sets stored by the first set storage means, the one set of the third predetermined number of reference loudspeakers whose Gram-Schmidt orthogonalization evaluation values calculated by the first evaluation value calculation means satisfy a predetermined condition. Specifically, the set with the largest average value of the evaluation index obtained by the Gram-Schmidt orthogonalization method is selected. However, if the minimum value of the evaluation index for that set is extremely low, a frequency with low linear independence exists, so selecting the set is not appropriate even though its average is the largest, because the sound field is unlikely to be reproduced correctly. In that case, the set with the next largest average value is selected. If the minimum value for that set is also extremely low, the set with the next largest average value after it is selected, and so on. In this way, the set of the third predetermined number of reference loudspeakers considered optimal is selected. Accordingly, each of the reproduction systems reproduces the sound field data transmitted from the server using the set of the third predetermined number of reference loudspeakers selected by the first set selection means.
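The set-selection rule above (largest average, skipping any set whose minimum is extremely low) can be illustrated with a short sketch. The text does not say what counts as "extremely low", so the `min_floor` threshold below is a hypothetical parameter, and the function name `choose_set` and the `(indices, scores)` data layout are likewise assumptions made for illustration.

```python
def choose_set(candidates, min_floor):
    """Return the indices of the best candidate set, or None if none qualifies.

    `candidates` is a list of (indices, scores) pairs, where `scores` holds the
    evaluation index values recorded for that set. `min_floor` is a hypothetical
    threshold below which a minimum is treated as "extremely low".
    """
    # Rank the sets from the largest average evaluation index to the smallest.
    ranked = sorted(candidates, key=lambda c: sum(c[1]) / len(c[1]), reverse=True)
    for indices, scores in ranked:
        # Accept the first set whose worst value is still acceptable.
        if min(scores) >= min_floor:
            return indices
    return None
```

With a threshold of 0.1, a set scoring `[0.95, 0.9, 0.01]` would be passed over in favour of a set scoring `[0.6, 0.6, 0.5]`, even though the first has the larger average.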

According to the second invention, the loudspeakers considered optimal can be selected, so the sound field can be reproduced correctly.

The third invention depends on the second invention, and the sound field sharing system further includes initial microphone changing means, second set storage means, and second set selection means. These means are also realized by a computer (12, 18, 20, etc.). The initial microphone changing means sequentially changes the first reference microphone selected by the initial microphone selection means. The second set storage means stores the set of the fourth predetermined number of reference microphones selected in each case as the first reference microphone is sequentially changed by the initial microphone changing means. The second set selection means then selects, from among the sets stored by the second set storage means, the one set of the fourth predetermined number of reference microphones whose Gram-Schmidt orthogonalization evaluation values satisfy a predetermined condition. Thus, as with the loudspeakers, the set of the fourth predetermined number of microphones considered optimal is selected. The server then transmits the sound field data detected by the set of the fourth predetermined number of microphones selected by the second set selection means to the plurality of reproduction systems.

According to the third invention, the microphones considered optimal are selected, so the sound field can be reproduced correctly, as in the second invention.

The fourth invention depends on any of the first to third inventions, and the fourth predetermined number is determined according to the third predetermined number. Specifically, the total number of elements of the inverse system matrix is fixed; therefore, once the number of loudspeakers is set to the fourth predetermined number, the third predetermined number is determined as the total number of elements divided by the fourth predetermined number.

According to the fourth invention, the third predetermined number is determined according to the fourth predetermined number, so the third predetermined number can be determined easily.

The fifth invention depends on any of the first to fourth inventions, and the third predetermined number and the fourth predetermined number are determined according to at least the processing capabilities of the server and the reproduction systems. In other words, the total number of elements of the inverse system matrix is determined by the processing capabilities of the server and the reproduction systems.

According to the fifth invention, the third predetermined number and the fourth predetermined number are determined according to the processing capabilities of the server and the reproduction systems, so convolution processing, transmission, and reproduction of the sound field data can be reliably executed in real time.

The sixth invention depends on any of the first to fifth inventions, wherein the second predetermined number is 62 and the third predetermined number is a value not exceeding 24. That is, at most 24 loudspeakers are selected.

According to the sixth invention, the 62 loudspeakers can be reduced to 24, so the convolution processing load and the amount of data can be reduced.

The seventh invention depends on the sixth invention, wherein the first predetermined number is 70 and the third predetermined number is a value not exceeding 8.

According to the seventh invention, for example, when the number of elements of the inverse matrix is set to 192 and 24 loudspeakers are used, at most 8 microphones can be selected.
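The arithmetic behind this example can be written out as a small sketch. It assumes, as the text states, that the inverse system matrix has one element per microphone-loudspeaker pair and a fixed budget of elements; the function name `max_microphones` is hypothetical.

```python
def max_microphones(total_elements, num_loudspeakers):
    """Largest microphone count whose matrix (microphones x loudspeakers) fits the budget."""
    return total_elements // num_loudspeakers


# Example from the text: a budget of 192 elements with 24 loudspeakers.
reduced_mics = max_microphones(192, 24)   # 8
reduced_elements = reduced_mics * 24      # 192

# For comparison, the full arrays (70 microphones, 62 loudspeakers).
full_elements = 70 * 62                   # 4340
```

Under these assumptions the reduced configuration needs 192 filter elements instead of 4340, roughly a 22-fold reduction in the number of convolutions.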

The eighth invention is an optimization method for optimizing the number and arrangement of the microphone array and the speaker array of a sound field sharing system that includes a microphone array placed in a certain sound field and having a first predetermined number of microphones, a server that records sound field data detected by the microphone array and transmits the sound field data to a plurality of reproduction systems, and reproduction systems that reproduce the sound field data from the server with a speaker array having a second predetermined number of loudspeakers, the method comprising: (a) selecting one loudspeaker of the speaker array as the first reference loudspeaker; (b) calculating a Gram-Schmidt orthogonalization evaluation value between the selected reference loudspeakers and each of the evaluation target loudspeakers, that is, every loudspeaker in the speaker array other than the reference loudspeakers; (c) selecting, as a reference loudspeaker, the evaluation target loudspeaker with the highest Gram-Schmidt orthogonalization evaluation value calculated in step (b); (d) repeating steps (b) and (c) until, as a result of the selection in step (c), the number of reference loudspeakers reaches a third predetermined number that is smaller than the second predetermined number; (e) selecting one microphone of the microphone array as the first reference microphone; (f) calculating a Gram-Schmidt orthogonalization evaluation value between the selected reference microphones and each of the evaluation target microphones, that is, every microphone in the microphone array other than the reference microphones; (g) selecting, as a reference microphone, the evaluation target microphone with the highest Gram-Schmidt orthogonalization evaluation value calculated in step (f); and (h) repeating steps (f) and (g) until, as a result of the selection in step (g), the number of reference microphones reaches a fourth predetermined number that is smaller than the first predetermined number.

According to the eighth invention, reducing the number of loudspeakers and the number of microphones makes it possible to provide a sound field sharing system that can perform convolution processing and transmit data in real time.

According to the present invention, the second predetermined number of loudspeakers is reduced to the third predetermined number and the first predetermined number of microphones is reduced to the fourth predetermined number, so the convolution processing load and the amount of data can both be reduced. Convolution processing and data transmission can therefore be performed in real time, and the sound field can be shared.

The above object and other objects, features, and advantages of the present invention will become more apparent from the following detailed description of the embodiments given with reference to the drawings.

FIG. 1 is an illustrative view showing an example of the sound field sharing system of the present invention.
FIG. 2 is an illustrative view showing an example of the microphone array shown in FIG. 1.
FIG. 3 is an illustrative view showing an example of the speaker array system shown in FIG. 1.
FIG. 4 is an illustrative view for explaining the principle of sound field reproduction.
FIG. 5 is an illustrative view for explaining the Gram-Schmidt orthogonalization method.
FIG. 6 is a graph showing changes in the average and minimum values of the evaluation index when 24 loudspeakers are selected for 62 microphones, with each loudspeaker in turn taken as the first selection.
FIG. 7 is a graph showing changes in the average and minimum values of the evaluation index over the selection process when loudspeaker No. 60 is selected first.
FIG. 8 is a graph showing changes in the average and minimum values of the evaluation index when 8 microphones are selected for 24 loudspeakers, with each microphone in turn taken as the first selection.
FIG. 9 is a graph showing changes in the value of the evaluation index as the number of selection steps increases when microphone No. 65 is selected first.
FIG. 10 is an illustrative view showing the distribution of the positions of the 24 selected loudspeakers when loudspeaker No. 60 is selected first.
FIG. 11 is an illustrative view showing the distribution of the positions of the 8 selected microphones when microphone No. 65 is selected first.
FIG. 12 is an illustrative view showing a table of experimental conditions.
FIG. 13 is a graph showing the RMS value of the difference between the presented sound source direction and the angle perceived by the subjects, and the rate at which the subjects perceived the correct angle.
FIG. 14 shows a table of the results of Tukey's multiple comparison test for the RMS value of the difference and a table of the results of Tukey's multiple comparison test for the correct answer rate.

Referring to FIG. 1, the sound field sharing system 10 of this embodiment includes a server 12, to which a microphone array 14 is connected. The server 12 is a general-purpose server and is connected to a computer 18 and a computer 20 via a network 16 such as the Internet, a LAN, or both. The computers 18 and 20 are general-purpose PCs or workstations. A speaker array system 22 and a microphone 24 are connected to the computer 18. Similarly, a speaker array system 26 and a microphone 28 are connected to the computer 20.

The sound field sharing system 10 shown in FIG. 1 includes two sound field control (BoSC) reproduction systems 10a and 10b. As enclosed by the dotted frame in FIG. 1, the BoSC reproduction system 10a consists of the server 12, the microphone array 14, the network 16, the computer 18, the speaker array system 22, and the microphone 24. As enclosed by the dash-dot frame in FIG. 1, the BoSC reproduction system 10b consists of the server 12, the microphone array 14, the network 16, the computer 20, the speaker array system 26, and the microphone 28.

As shown in FIG. 2, the microphone array 14 includes a nearly spherical skeleton 14a and a stand 14b that supports the skeleton 14a. The skeleton 14a is based on the structure of C80 fullerene with the 10 vertices at the bottom cut away, leaving 70 vertices. Although not shown, one omnidirectional microphone is attached at each of the 70 vertices on the surface (outer face) of the skeleton 14a. For example, the DPA 4060-BM can be used as the microphone. The stand 14b consists of a support shaft 140 and a tripod 142, and the support shaft 140 passes through the cut-away bottom of the skeleton 14a and supports the top of the skeleton 14a from the inside.

Except where it overlaps the front side, the back side of the skeleton 14a is also visible from the front; for clarity, however, the portion corresponding to the back side is drawn with dotted lines in FIG. 2.

As shown in FIG. 3, each of the speaker array systems 22 and 26 includes an elliptical dome portion 220 and four pillar portions 222 that support it. The elliptical dome portion 220 consists of four layers of frames 220a, 220b, 220c, and 220d, made of wood, for example. FIG. 3 shows the inside of the dome portion 220 as viewed obliquely from below, and only part of the frame 220d and the pillar portions 222 is shown. Although not shown, the dome portion 220 and the pillar portions 222 are hollow, and the frames (220a-220d) themselves serve as sealed enclosures.

Each of the speaker array systems 22 and 26 is fitted with 70 loudspeakers 230. Specifically, 6 full-range units (Fostex FE83E), that is, loudspeakers 230, are installed on the frame 220a, 16 loudspeakers 230 on the frame 220b, 24 loudspeakers 230 on the frame 220c, and 16 loudspeakers 230 on the frame 220d. In addition, 2 subwoofer units (Fostex FW108N), which are also loudspeakers 230, are installed in each of the four pillar portions 222 to reinforce the low-frequency range.

Each of the speaker array systems 22 and 26 is installed in a sound field reproduction room (not shown). The sound field reproduction room is a 1.5-jo (1.5 tatami mat) soundproof room, for which a YAMAHA Woody Box (sound insulation performance Dr-30) is used. A chair with a lift (not shown) is provided in the sound field reproduction room. The lift is used to set the position (height) of the ears of the user seated on the chair, inside the dome portion 220 of the speaker array system 22 or 26, to the height of the frame 220c, which carries the largest number of loudspeakers 230.

The microphone array 14 and the sound field reproduction room (sound field reproduction system) including the computers (18, 20) and the speaker array systems (22, 26) are disclosed in Seigo Enomoto, "1. Numerical Analysis Technology and Visualization/Auralization, 1.7 Three-Dimensional Sound Field Communication System," Acoustic Technology, No. 148, Dec. 2009, pp. 37-42, so further detailed description is omitted.

For example, in the sound field sharing system 10 shown in FIG. 1, the microphone array 14 is placed in a sound field such as an orchestra performance hall. The server 12 converts the audio signal (sound field signal) input from the microphone array 14 via an amplifier (not shown) into digital audio data (sound field data) and convolves the sound field data with the inverse system. The server 12 transmits the convolved sound field data to the computers 18 and 20 via the network 16.

The computers 18 and 20 each convert the sound field data from the server 12 into an analog sound field signal and output it to the speaker array systems 22 and 26, respectively. The sound field described above is thus reproduced in the speaker array systems 22 and 26. Users (not shown) of the speaker array systems 22 and 26 can therefore enjoy, for example, a live orchestra captured in the performance hall through the speaker array systems 22 and 26, even when they are at remote locations.

Each user can also input speech through the microphones 24 and 28. The audio signal detected by the microphone 24 is converted into digital audio data by the computer 18 and transmitted to the computer 20 via the network 16. The computer 20 convolves the received audio data with an audio filter, superimposes the result on the sound field data, and outputs it to the speaker array system 26. The sound field is thus reproduced together with the other user's voice. Similarly, the audio signal detected by the microphone 28 is converted into digital audio data by the computer 20 and transmitted to the computer 18 via the network 16. The computer 18 convolves the received audio data with the audio filter, superimposes the result on the sound field data, and outputs it to the speaker array system 22.
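The voice path described above can be sketched as follows. This is a minimal offline illustration, assuming NumPy, one received voice channel, and one FIR filter per loudspeaker channel. The name `mix_voice`, the array shapes, and the filter contents are hypothetical; a real-time system would process the audio block by block.

```python
import numpy as np

def mix_voice(field, voice, voice_filters):
    """Convolve a 1-channel voice with per-channel filters and add it to the sound field.

    field:         (C, T) sound field data, one row per loudspeaker channel
    voice:         (T,)   voice data received from the other user
    voice_filters: (C, L) one FIR filter per loudspeaker channel (hypothetical contents)
    """
    out = field.astype(float)  # astype returns a copy, so the input is left untouched
    for ch, h in enumerate(voice_filters):
        # Full convolution has length T + L - 1; keep the first T samples.
        out[ch] += np.convolve(voice, h)[: field.shape[1]]
    return out
```

The loop runs one convolution per loudspeaker channel, which is why the channel count drives the client-side load.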

The user of the speaker array system 22 and the user of the speaker array system 26 can therefore share a sound field and hold a conversation.

Although a detailed description is omitted, headset microphones, for example, can be used as the microphones 24 and 28.

Here, the principle of boundary surface control (BoSC) and a sound field reproduction system using BoSC are briefly described. In boundary surface control, based on the Kirchhoff-Helmholtz integral equation (KHIE), the sound field in a region V within the original sound field, shown on the left side of FIG. 4, is reproduced in a region V′ within the reproduced sound field, shown on the right side of FIG. 4. It is assumed that the recording points r on the boundary S enclosing the region V and the control points r′ on the boundary S′ enclosing the region V′ have the same relative positions; that is, Equation 1 is assumed to hold, where s and s′ are arbitrary points inside the respective regions.

[Equation 1]
|r − s| = |r′ − s′|, s ∈ V, s′ ∈ V′

In this case, the sound pressures p(s) and p(s′) inside regions that contain no sound source are given by the KHIE as Equations 2 and 3, respectively.

Here, ω is the angular frequency, ρ₀ is the density of the medium, p(r) and v_n(r) are the sound pressure and the particle velocity in the direction of the normal n at a point r on the boundary, and G(r|s) is the free-space Green's function.

From Equation 1, the relationship shown in Equation 4 holds. Equation 5 then follows from Equation 4.

Equation 5 shows that if signals are output from the secondary sources such that the sound pressure and particle velocity recorded on the boundary S in the original sound field are reproduced equally in the reproduced sound field, the sound field in the region V is reproduced in the region V′.
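Equations 2 to 5 are not reproduced in this text. As a hedged sketch only, the standard form of the KHIE that is consistent with the symbols defined above can be written as follows. The sign of each term depends on the time convention and on the direction of the normal, both of which are assumptions here.

```latex
% Eq. 2 and Eq. 3 (assumed form): KHIE for a source-free region
p(s)  = \oint_{S}  \left[ -j\omega\rho_0\, G(r|s)\, v_n(r)
        - p(r)\, \frac{\partial G(r|s)}{\partial n} \right] dS, \quad s \in V

p(s') = \oint_{S'} \left[ -j\omega\rho_0\, G(r'|s')\, v_{n'}(r')
        - p(r')\, \frac{\partial G(r'|s')}{\partial n'} \right] dS', \quad s' \in V'

% Eq. 4 (assumed form): equal relative geometry gives equal Green's functions
G(r|s) = G(r'|s'), \qquad
\frac{\partial G(r|s)}{\partial n} = \frac{\partial G(r'|s')}{\partial n'}

% Eq. 5 (assumed form): matching boundary values give matching interior pressure
p(r) = p(r'),\; v_n(r) = v_{n'}(r') \;\; \forall r \in S
\;\Longrightarrow\; p(s) = p(s')
```

Under these assumptions the two integrands coincide term by term, which is the step that lets boundary measurements alone determine the interior field.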

The outputs of the secondary sources are determined by convolving the signals observed at the recording points with inverse filters that cancel the transfer characteristics from all secondary sources to all control points. Therefore, to realize a BoSC sound field reproduction system such as that shown in FIG. 4, it is important to design a stable and robust inverse filter (pinv(H)).

The inverse filter design method is disclosed in detail in the literature (S. Enomoto et al., "Three-dimensional sound field reproduction and recording systems based on boundary surface control principle", Proc. of 14th ICAD, Presentation o 16, 2008 Jun.), so only a brief description is given here.

A method for designing, in the frequency domain, a multichannel multipoint-control inverse system (hereinafter simply "inverse system") with M secondary sources and N control points, as shown in FIG. 4, is briefly described. The inverse system is the collective name for the group of M × N inverse filters.

Let H_ji(ω) be the transfer function from secondary source i to control point j, X_j(ω) the input signal, and P_j(ω) the observed signal. Their relationship can be expressed by Equation 6, where i is the secondary source number (1, 2, …, M), j is the control point number (1, 2, …, N), and [W(ω)] is the inverse system.

[Equation 6]
P(ω) = [H(ω)][W(ω)]X(ω)

For P(ω) = X(ω) to hold, Equation 7 must be satisfied, where ⁺ denotes the pseudoinverse. [W(ω)] is thereby defined as the inverse system of [H(ω)].

[Equation 7]
[W(ω)] = [H(ω)]⁺

It is well known that regularization is a reasonable method for solving inverse problems, and it has already been applied to sound reproduction systems (e.g., Tokuno et al., "Inverse Filter of Sound Reproduction Systems Using Regularization", IEICE Trans. Fundamentals, Vol. E80-A, No. 5, May 1997). Using regularization, the estimated inverse matrix [Ŵ(ω)] for rank([H(ω)]) = N is given by Equation 8, where # denotes the conjugate transpose, β(ω) is the regularization parameter, and I_M is the M × M identity matrix. The same notation applies hereinafter.

[Equation 8]

On the other hand, the inverse matrix [H(ω)]⁺ for rank([H(ω)]) = M, shown on the right-hand side of Equation 7, is derived as Equation 9.

[Equation 9]

Equations 8 and 9 are interpreted as the least-squares solution and the minimum-norm solution (minimum-norm generalized inverse), respectively. When rank([H(ω)]) = N = M and [H(ω)] is not a singular matrix, [W(ω)] is given by [W(ω)] = [H(ω)]⁻¹, where ⁻¹ denotes the matrix inverse. Finally, the time-domain inverse filter coefficients are obtained from the inverse discrete Fourier transform of [Ŵ(ω)].
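The frequency-domain design and the final inverse DFT can be illustrated with a short NumPy sketch. Equation 8 is not reproduced in this text; the code assumes the usual Tikhonov form (H^# H + β I_M)⁻¹ H^#, which matches the stated symbols (#, β(ω), I_M) but is an assumption. The function names, array layouts, and the use of non-negative DFT bins are likewise hypothetical.

```python
import numpy as np

def regularized_inverse(H, beta):
    """Return (H^# H + beta * I_M)^-1 H^# for an N x M matrix H at one frequency."""
    M = H.shape[1]
    Hh = H.conj().T  # conjugate transpose, written '#' in the text
    return np.linalg.solve(Hh @ H + beta * np.eye(M), Hh)

def time_domain_filters(H_bins, betas, n_taps):
    """Design inverse filters bin by bin, then transform to the time domain.

    H_bins: (K, N, M) transfer matrices on the K = n_taps // 2 + 1
            non-negative DFT bins
    betas:  K regularization values, one per bin
    Returns real filter taps with shape (n_taps, M, N).
    """
    W_bins = np.stack([regularized_inverse(H, b) for H, b in zip(H_bins, betas)])
    return np.fft.irfft(W_bins, n=n_taps, axis=0)
```

With β = 0 and a square, well-conditioned [H(ω)], `regularized_inverse` reduces to the ordinary matrix inverse, in line with the N = M case described above.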

In the BoSC reproduction system, the arrangement of the loudspeakers 230 and the microphones affects spatial sampling.

In Equations 8 and 9, the instability of the inverse system can be mitigated (removed) by selecting an appropriate regularization parameter β(ω). In this embodiment, the regularization parameter β(ω) is defined heuristically for each octave band. The inverse filters were calculated from impulse responses measured in advance, in a soundproof room, between each pair of loudspeaker 230 and microphone. Because measured impulse responses were used, the filters do not track variations caused by changes in the environment. In an actual environment that varies, however, an adaptive MIMO (multiple-input multiple-output) inverse filter can be applied to the BoSC reproduction system.

If the microphone array 14 and the speaker array systems 22 and 26 shown in FIGS. 1 to 3 are used as they are, the processing load on the server 12 is very large. Specifically, since the microphone array 14 has 70 channels and the speaker array system 22 has 62 channels, the server 12 must perform 62 × 70 convolutions between the 70-channel microphone audio signals (sound field data) and the inverse system, and each convolution must be carried out over the number of taps of the inverse system (inverse filters), which is 4096 in this embodiment.
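To give a feel for the scale, the figures above can be multiplied out. This is only a back-of-the-envelope sketch that assumes direct time-domain FIR convolution (one multiply-accumulate per tap); the function name is hypothetical, and an FFT-based implementation would have a different cost.

```python
def macs_per_output_frame(n_speakers, n_mics, n_taps):
    """Multiply-accumulates needed to produce one new sample on every loudspeaker channel."""
    return n_speakers * n_mics * n_taps

n_filters = 62 * 70                                  # one FIR filter per (loudspeaker, microphone) pair
full_load = macs_per_output_frame(62, 70, 4096)
print(n_filters, full_load)                          # 4340 17776640
```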

Moreover, because the amount of sound field data to be transmitted is enormous, each client (computers 18 and 20) requires a bandwidth of about 45 Mbps.

Furthermore, when the computers 18 and 20 convolve the audio data corresponding to a user's voice with the audio filter, the processing load is also relatively large if all 70 channels are used.

It is therefore difficult to transmit the sound field data from the server 12 to the computers 18 and 20 in real time, and consequently difficult for the users of the speaker array systems 22 and 26 to enjoy an orchestra or the like in real time. In other words, the sound field cannot be shared in real time.

One way to avoid this is to reduce the number of microphones in the microphone array 14 and the number of loudspeakers 230 in the speaker array systems 22 and 26, thereby reducing both the convolution load and the amount of data to be transmitted. However, simply reducing the number of microphones and loudspeakers 230 is not sufficient; the sense of presence of the reproduced sound field must not be impaired.

In this embodiment, therefore, the number of microphones and loudspeakers 230 is reduced without impairing the sense of presence, and appropriate numbers of microphones and loudspeakers 230 are determined.

In this embodiment, the loudspeakers 230 to be used in the speaker array system 22 are first extracted (selected) with the Gram-Schmidt orthogonalization method, assuming that the 70-channel microphone array 14 is used. Then, assuming that the selected loudspeakers 230 are used, the microphones to be used in the microphone array 14 are extracted (selected) with the same method.

Although a detailed description is omitted, the extraction (selection) of the loudspeakers 230 and microphones to be used can be executed on the server 12, the computers 18 and 20, or another computer (not shown).

Here, the basic algorithm for selecting loudspeakers 230 with the Gram-Schmidt orthogonalization method is described for a single frequency. If the linear independence among the N-dimensional column vectors contained in an N × M matrix is low, the matrix is said to be ill-conditioned. Degradation of linear independence in [H(ω)] causes instability of the BoSC reproduction system. [H(ω)] in Equation 6 can be written as in Equation 10.

[Equation 10]
P(ω) = [H(ω)]Y(ω)
     = {h₁(ω), …, h_M(ω)}Y(ω)

Here, Y(ω) = [W(ω)]X(ω), and h_i(ω) is an N-dimensional column vector contained in [H(ω)]. This column vector h_i(ω) holds the transfer functions between one loudspeaker 230 and each of the microphones at frequency ω. Selecting loudspeakers 230 with the Gram-Schmidt orthogonalization method therefore means selecting from [H(ω)] a set of column vectors h_i(ω) with high linear independence. The algorithm of the Gram-Schmidt orthogonalization method is briefly described below.

At the n-th step of loudspeaker selection, n−1 loudspeakers 230 have already been selected. The set of column vectors contained in [H] is denoted by τ = {h₁, …, h_M}. S_{n−1} denotes the subset of vectors selected up to step n−1, and τ_{n−1} denotes the subset of vectors not yet used up to step n−1. V_{n−1} = {v₁, …, v_{n−1}} denotes an orthonormal basis of the plane (subspace) spanned by the subset S_{n−1}.

For example, in the first step, one of all the loudspeakers 230 is selected as the reference loudspeaker 230, and all loudspeakers 230 other than the reference loudspeaker 230 are treated as loudspeakers 230 to be evaluated (evaluation target loudspeakers 230). As described later, the Gram-Schmidt orthogonalization method selects one of the evaluation target loudspeakers 230 in relation to the reference loudspeaker 230. In the next step, the same method selects one of the remaining evaluation target loudspeakers 230 in relation to both the initially selected reference loudspeaker 230 and the evaluation target loudspeaker 230 selected in the preceding step. In other words, in this step the loudspeaker 230 selected in the preceding step also serves as a reference loudspeaker 230. This procedure is repeated.
The eight loudspeakers 230 that supplement the low-frequency range, however, are excluded from both the reference loudspeakers 230 and the evaluation target loudspeakers 230.

FIG. 5 shows an example of the plane spanned by the subset S_{n−1}. At the n-th step, ĥ_n is selected so that its component perpendicular to the plane spanned by the subset S_{n−1} is maximized. The perpendicular component r_i of an arbitrary vector h_i contained in the subset τ_{n−1} is given by Equation 11.

[Equation 11]
r_i = h_i − p

Here, p denotes the projection of h_i onto the plane spanned by the subset S_{n−1}. The n-th loudspeaker 230 is determined so that the norm of the perpendicular component r_i is maximized, as expressed for example by Equation 12.

[Equation 12]

Here, J(h_i), the value of the evaluation index, is defined by Equation 13.

[Equation 13]
J(h_i) = ||r_i||

When the perpendicular component of ĥ_n is denoted r̂_n, the n-th orthonormal vector v_n is determined according to Equation 14.

[Equation 14]

The evaluation index value maximized at the n-th step, Ĵ_n, is given by Equation 15.

[Equation 15]

The processing according to Equations 11 to 15 is repeated until the evaluation index value Ĵ_n falls below a preset threshold Ĵ_thr. For a frequency band [ω_l, ω_h], two evaluation index values are obtained according to Equation 16.
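The single-frequency procedure can be sketched in NumPy as a greedy loop. Equations 12, 14, and 15 are not reproduced in this text, so the sketch assumes the natural forms: pick the candidate with the largest ||r_i||, set v_n = r̂_n / ||r̂_n||, and record Ĵ_n = ||r̂_n||. The function name, the zero-based indices, and the default threshold are hypothetical, and the columns are used without normalization, which is also an assumption.

```python
import numpy as np

def select_columns(H, first, n_select, j_thr=1e-12):
    """Greedy Gram-Schmidt selection of columns of an N x M matrix H.

    first:    index of the initially chosen (reference) column
    n_select: maximum number of columns to select
    j_thr:    stop when the best perpendicular norm falls below this value
    Returns (selected indices, list of maximized index values J_n).
    """
    selected = [first]
    basis = [H[:, first] / np.linalg.norm(H[:, first])]  # v_1
    scores = []
    remaining = [i for i in range(H.shape[1]) if i != first]
    while len(selected) < n_select and remaining:
        V = np.stack(basis, axis=1)          # orthonormal basis of the selected span
        cand = H[:, remaining]
        R = cand - V @ (V.conj().T @ cand)   # r_i = h_i - p for every candidate
        J = np.linalg.norm(R, axis=0)        # J(h_i) = ||r_i||
        best = int(np.argmax(J))
        if J[best] < j_thr:                  # termination test against the threshold
            break
        selected.append(remaining.pop(best))
        scores.append(float(J[best]))
        basis.append(R[:, best] / J[best])   # v_n, the next orthonormal vector
    return selected, scores
```

A column that is parallel to an already selected one has a zero perpendicular component, so it is never picked while independent candidates remain.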

[Equation 16]

Here, h̄_i = {h_i(ω_l), …, h_i(ω_h)}, K is the number of discrete frequencies ω_k, and a_k is an arbitrary weighting coefficient for the discrete frequency ω_k. The perpendicular component r_i(ω_k) and the orthonormal vector v_i(ω_k) are obtained separately for each discrete frequency, in the same way as for a single frequency. In the optimization, the evaluation index value J_avg is maximized, whereas the evaluation index value J_min is used to decide when to terminate the optimization. That is, the selection of loudspeakers 230 ends when Ĵ_min < Ĵ_thr.
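The two band-wide indices can be sketched as follows. Equation 16 is not reproduced in this text, so the exact definitions are assumptions: here J_avg is taken as the weighted mean of ||r_i(ω_k)|| over the K frequencies and J_min as the smallest ||r_i(ω_k)|| in the band. The function names and array layout are hypothetical.

```python
import numpy as np

def band_scores(residual_norms, weights=None):
    """Combine per-frequency perpendicular norms into J_avg and J_min.

    residual_norms: (K, C) array of ||r_i(w_k)|| for K frequencies and C candidates
    weights:        optional length-K weights a_k (defaults to all ones)
    """
    r = np.asarray(residual_norms, dtype=float)
    a = np.ones(r.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    j_avg = (a[:, None] * r).sum(axis=0) / r.shape[0]  # assumed: weighted mean over the band
    j_min = r.min(axis=0)                              # assumed: worst frequency in the band
    return j_avg, j_min

def pick_next(residual_norms, j_thr, weights=None):
    """Return the candidate maximizing J_avg, or None if its J_min is below j_thr."""
    j_avg, j_min = band_scores(residual_norms, weights)
    best = int(np.argmax(j_avg))
    return best if j_min[best] >= j_thr else None
```

Separating the two roles mirrors the text: J_avg ranks the candidates, while J_min only decides whether selection should stop.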

The optimization process is disclosed in the literature (Asano, Suzuki, and Swanson, "Optimization of control source configuration in active control systems using Gram-Schmidt orthogonalization", IEEE Transactions on Speech and Audio Processing, Mar. 1999).

In that reference, the selection of loudspeakers 230 continues as long as the evaluation index value is at or above the threshold (Ĵ_min ≥ Ĵ_thr). However, no method for determining an appropriate threshold has been established. In this embodiment, therefore, the maximum number of loudspeakers 230 and the maximum number of microphones with which the sound field sharing system 10 can share a sound field in real time were verified. The Gram-Schmidt orthogonalization method was then used to determine the numbers (positions) of the loudspeakers 230 up to that maximum number.

As described above, in the Gram-Schmidt orthogonalization method each speaker position is determined on the basis of the speaker positions selected before it, so the selection result is strongly influenced by the first selected speaker position.

For example, reducing the number of loudspeakers 230 used to about one half (32), about one third (24), and about one quarter (16) was examined. FIG. 6 shows how the evaluation index values J_avg and J_min vary when 24 loudspeakers 230 are selected (24 steps of the selection process are executed). In FIG. 6, the horizontal axis indicates the speaker position (see FIG. 10) of the first selected loudspeaker 230 (reference loudspeaker 230), and the vertical axis indicates the evaluation value (dB). Of the two solid lines, one indicates the evaluation index value J_avg and the other indicates the evaluation index value J_min.

Although a detailed description is omitted, the first selected reference loudspeaker 230 is, for example, varied in sequence starting from number "1" (see FIG. 10) through 2, 3, …, 62. For each case, a set of 24 speaker positions (loudspeaker 230 numbers) is selected and the evaluation index values J_avg and J_min are calculated for that set. The selected sets of 24 speaker positions and the evaluation index values J_avg and J_min calculated for each set are stored in the memory of the above-mentioned computer (not shown; a hard disk or RAM). Then, as described later, the one set whose evaluation index values J_avg and J_min satisfy a predetermined condition is chosen from among the sets, and the sound field is reproduced using the 24 loudspeakers 230 of the chosen set.

The free-space Green's function was used to obtain the transfer function between each loudspeaker 230 and each microphone. The upper frequency limit for the stimuli described later was not restricted here. The configuration (setting) of the loudspeakers 230, however, was determined over the range from 20 Hz to 1 kHz at 20 Hz intervals. Although not illustrated, when the upper frequency limit was not restricted, many of the selected loudspeakers 230 were those arranged in the upper layers (frames 220a and 220b). In a three-dimensional sound reproduction system, it is difficult to synthesize a wavefront arriving from a direction in which there is no loudspeaker 230. The loudspeakers 230 should therefore be positioned in every possible direction around the microphone array.
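As an illustration of how such transfer functions might be generated, the sketch below evaluates the free-space Green's function exp(−jkd)/(4πd) for every loudspeaker and microphone pair. The sign of the exponent (time convention), the speed of sound of 343 m/s, the coordinate arrays, and the function name are all assumptions introduced for this example.

```python
import numpy as np

def green_transfer_matrix(mic_pos, spk_pos, freq_hz, c=343.0):
    """Free-space transfer matrix [H] with shape (N microphones, M loudspeakers).

    mic_pos: (N, 3) microphone coordinates in metres
    spk_pos: (M, 3) loudspeaker coordinates in metres
    """
    d = np.linalg.norm(mic_pos[:, None, :] - spk_pos[None, :, :], axis=-1)  # pairwise distances
    k = 2.0 * np.pi * freq_hz / c                                           # wavenumber
    return np.exp(-1j * k * d) / (4.0 * np.pi * d)

# Frequencies from 20 Hz to 1 kHz in 20 Hz steps, as stated in the text.
freqs = np.arange(20, 1001, 20)
```

Evaluating `green_transfer_matrix` once per entry of `freqs` yields the per-frequency matrices needed by a band-wide selection.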

As described above, FIG. 6 is a line graph of the evaluation index values J_avg and J_min obtained when 24 steps of the selection process are executed for the loudspeakers 230. As FIG. 6 shows, the evaluation index values J_avg and J_min are largest when the loudspeaker 230 at speaker position "60" (see FIG. 10) is selected first and 24 loudspeakers 230 are selected in total.

In this embodiment, one set of 24 loudspeakers 230 whose evaluation index values J_avg and J_min satisfy a predetermined condition is chosen from among the sets (62 sets in this embodiment). Specifically, the set with the largest evaluation index value J_avg is chosen. However, if the evaluation index value J_min of that set is extremely low, a frequency with low linear independence exists, and choosing the set is not appropriate even though its J_avg is the largest, because the sound field is unlikely to be reproduced correctly. In that case, the set with the next largest J_avg is chosen; if the J_min of that set is also extremely low, the set with the next largest J_avg after it is chosen, and so on. Whether J_min is extremely low is judged by the computer against a preset threshold, for example. This threshold is set by the developer or user of the sound field sharing system 10. As shown in FIG. 7, described later, the evaluation index values J_avg and J_min decrease gradually as the number of selected loudspeakers 230 increases, so the threshold must also be set variably according to the number of loudspeakers 230 to be selected.
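The choice among the stored sets amounts to a ranked search with a floor on J_min. The following sketch shows one way to express it; the tuple layout, the function name, and the numeric values in the usage example are hypothetical and are not taken from FIG. 6.

```python
def choose_set(candidates, j_min_floor):
    """Pick the set with the largest J_avg whose J_min is not below the floor.

    candidates:  iterable of (first_speaker, j_avg, j_min) tuples, one per stored set
    j_min_floor: threshold below which J_min counts as 'extremely low'
    Returns the chosen tuple, or None if every set fails the J_min test.
    """
    for cand in sorted(candidates, key=lambda c: c[1], reverse=True):
        if cand[2] >= j_min_floor:
            return cand
    return None

# Hypothetical values in dB, for illustration only.
stored = [(1, -10.0, -40.0), (60, -8.0, -12.0), (7, -7.5, -90.0)]
print(choose_set(stored, j_min_floor=-30.0))   # (60, -8.0, -12.0)
```

In this example the set starting from speaker 7 has the largest J_avg but is skipped because its J_min is below the floor, so the next-best set is returned.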

FIG. 7 is a graph showing how the evaluation index values J_avg and J_min vary when the loudspeaker 230 at speaker position "60" is selected first and the selection process is then repeated. As FIG. 7 shows, the evaluation index values J_avg and J_min decrease gradually.

For simplicity the figures are omitted, but, as mentioned above, the evaluation index values J_avg and J_min varied in a manner similar to FIG. 6 when the number of loudspeakers 230 was reduced to 16 or 32. As described later, however, the maximum number of loudspeakers 230 was set to 24 on the basis of the results of a sound source localization test.

The results of a preliminary test showed that, owing to the performance of the server 12 and the computers 18 and 20 and to constraints on communication speed including the network 16, the number of loudspeakers 230 (M) and the number of microphones (N) should be determined such that the number of elements in [W(ω)] satisfies M × N ≤ 192. In this embodiment, the CPUs (not shown) of the server 12 and the computers 18 and 20 are Xeon (registered trademark) QuadCore × 2, and the memory (not shown) is 4 GB. Windows (registered trademark) XP 64bit was used as the operating system of the server 12. The network 16 connecting the server 12 with the computers 18 and 20 consisted of an ultra-high-speed, high-function research and development testbed network (JGN2plus: 1 Gbps) and a LAN (100 Mbps).

Although not illustrated, in the preliminary experiment the server 12 and the computer 18 were connected via the LAN, and the server 12 and the computer 20 were connected via JGN2plus and the LAN.

Accordingly, since the number of loudspeakers 230 (M) was set to 24 as described above, the number of microphones selected (N) is at most 8. FIG. 8 shows how the evaluation index values J_avg and J_min vary when 8 steps of the selection process are executed for the microphones. As FIG. 8 shows, the evaluation index values J_avg and J_min for a total of eight selected microphones are largest when the microphone at microphone position "65" (the reference microphone) is selected first.
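The bound on N follows directly from the M × N ≤ 192 budget. The short sketch below works it out for the three loudspeaker counts examined and compares the resulting filter count with the full 62 × 70 configuration; the function name is hypothetical.

```python
def max_microphones(n_speakers, max_elements=192):
    """Largest N such that n_speakers * N stays within the element budget of [W]."""
    return max_elements // n_speakers

for m in (16, 24, 32):
    print(m, max_microphones(m))        # prints 16 12, then 24 8, then 32 6

# Ratio of inverse filters in the full system to the reduced 24 x 8 system.
reduction = (62 * 70) / (24 * 8)
```

With M = 24 the budget gives N = 8, matching the text, and the number of inverse filters drops by a factor of roughly 22.6.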

FIG. 9 shows how the evaluation values J_n and J_min vary when the microphone at microphone position "65" is selected first and the selection process is then repeated. As FIG. 9 shows, the evaluation index values J_n and J_min decrease gradually as the selection process is repeated, and drop markedly at the 25th repetition, that is, when 25 microphones have been selected. It is therefore considered desirable to keep the maximum number of microphones at 24 or fewer. Since eight microphones are selected here, as described above, this requirement is satisfied.

FIG. 10 shows the distribution of the speaker positions of the 24 loudspeakers 230 obtained when, as described above, the loudspeaker 230 at speaker position "60" is selected first and 24 loudspeakers 230 are selected in total. Although not indicated in FIG. 10, the value in the height direction (Z direction) increases as the speaker position approaches the center. Thus, the loudspeakers 230 mounted on the frame 220a are at speaker positions "1" to "6", those on the frame 220b are at "7" to "22", those on the frame 220c are at "23" to "46", and those on the frame 220d are at "47" to "62".

Note that the eight loudspeakers 230 mounted on the four pillars 222 to supplement the low-frequency range are not candidates for selection and are therefore not shown in FIG. 10.

In FIG. 10, the circle marking the speaker position of the first-selected loudspeaker 230 (the circle labeled "60") is shaded. The circles marking the loudspeakers 230 subsequently chosen by the repeated Gram-Schmidt orthogonalization procedure (here, the circles labeled "1" to "6", "7", "9", "11", "13", "15", "17", "19", "21", "23", "31", "35", "48", "51", "54", "56", "58" and "62") are hatched. Unpatterned circles mark the loudspeakers 230 that were not selected. FIG. 10 shows that the selected loudspeakers 230 are distributed regularly across directions and heights. In the plan view of FIG. 10, the selected loudspeakers 230 are distributed roughly symmetrically both top-to-bottom and left-to-right.

The microphones were selected by applying the Gram-Schmidt orthogonalization method described above with the roles of the loudspeakers 230 and the microphones interchanged. Because the selection method has already been described, it is not repeated here.

FIG. 11 shows the arrangement of the eight microphones selected for the 24-loudspeaker arrangement of FIG. 10. Although not illustrated, microphone positions are numbered in the same way as the speaker positions of the loudspeakers 230. It is somewhat hard to see in FIG. 11, but when the XY plane is viewed from directly above, the selected microphones are distributed evenly in all directions.

The numbers of microphones and loudspeakers 230 were thus reduced using the Gram-Schmidt orthogonalization method. To evaluate the effect of this reduction, a sound source localization test in the horizontal plane was carried out.

In the sound source localization test, the stimuli reproduced by the BoSC reproduction system were generated by convolving pink noise with impulse responses. The impulse responses were simulated from the free-space Green's function, with the sound source assumed to be located 1 m from the center of the microphone array 14. The inverse filters for sound field reproduction (three-dimensional sound reproduction) in the BoSC reproduction system were computed from impulse responses measured in advance at a sampling frequency of 48 kHz, and each inverse filter was 4096 points long. The sound pressure level of the stimuli was adjusted to L_{A,Fmax} = 55 dB at the center of the microphone array 14 so as to eliminate level differences between experimental conditions and directions.
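The stimulus generation described above can be illustrated with a minimal sketch. It is not the procedure used in the test: it treats the free-space Green's function as a single delayed tap of amplitude 1/(4πr), and the speed of sound (343 m/s), the response length (512 samples), and the function names `free_field_ir` and `convolve` are assumptions introduced for illustration. Pink-noise generation is left out; any noise sequence can be passed in as `x`.

```python
import math

def free_field_ir(distance_m, fs=48000, c=343.0, length=512):
    """Sampled free-space Green's function (assumed form): one tap at the
    propagation delay, scaled by 1 / (4 * pi * r)."""
    delay = round(distance_m / c * fs)  # delay in samples
    if delay >= length:
        raise ValueError("length is too short for this distance")
    ir = [0.0] * length
    ir[delay] = 1.0 / (4.0 * math.pi * distance_m)
    return ir

def convolve(x, h):
    """Direct-form linear convolution of two sample sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        if xn == 0.0:
            continue  # skip samples that contribute nothing
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

# A source 1 m away, sampled at 48 kHz as in the text.
ir_1m = free_field_ir(1.0)
stimulus = convolve([1.0, 2.0], ir_1m)  # stand-in for a pink-noise burst
```

In a fuller simulation each microphone would have its own distance to the source and therefore its own response; the single-distance case is shown only to keep the sketch short.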

The experimental conditions for the numbers of loudspeakers 230 and microphones are summarized in the table of FIG. 12. Condition 5 used all loudspeakers 230 and all microphones. In condition 4, the number of loudspeakers 230 was reduced to 24 while all microphones were used. In conditions 1, 2 and 3, the numbers of loudspeakers were 24, 32 and 16, respectively, as described above, and the numbers of microphones were set to 8, 6 and 12, respectively, so that the number of elements of the inverse matrix [W(ω)] was the same (192) in each case.
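The pairing of loudspeaker and microphone counts in conditions 1 to 3 follows from holding the element count of [W(ω)] at 192. The short sketch below reproduces that arithmetic; the function name `microphones_for` is hypothetical.

```python
def microphones_for(loudspeakers, budget=192):
    """Microphone count for which loudspeakers x microphones equals the budget."""
    if budget % loudspeakers != 0:
        raise ValueError("budget is not divisible by the loudspeaker count")
    return budget // loudspeakers

# Conditions 1, 2 and 3 use 24, 32 and 16 loudspeakers respectively.
for speakers in (24, 32, 16):
    print(speakers, "loudspeakers ->", microphones_for(speakers), "microphones")
```

For comparison, the full configuration of condition 5 (62 loudspeakers and 70 microphones) gives 62 x 70 = 4340 elements.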

In this test, 13 subjects in their twenties to fifties (5 men and 8 women) listened to each reproduced stimulus and then reported the angle they perceived. Stimuli were presented in the horizontal plane from 0 to 330 degrees in 30-degree steps. Each stimulus lasted 2 seconds and was repeated twice for each angle. The presentation order was determined using a Latin square design. Subjects were allowed to move their head and body while listening.
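The text does not state which Latin square was used or how its rows were assigned to subjects. As an illustration only, the sketch below builds a cyclic Latin square over the twelve presentation angles, so that every angle appears once in each row (one presentation order) and once in each column (one position in the order). The name `cyclic_latin_square` is hypothetical.

```python
def cyclic_latin_square(items):
    """Return an n x n Latin square in which row i is `items` rotated by i."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

# Twelve presentation angles: 0 to 330 degrees in 30-degree steps.
ANGLES = list(range(0, 360, 30))
orders = cyclic_latin_square(ANGLES)
```

Each row of `orders` can serve as one presentation order; a cyclic square balances position but not the sequence of neighbouring angles, which a real experiment might also want to control.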

The results of the sound source localization test are shown in FIGS. 13(A) and 13(B). FIG. 13(A) shows, for each condition, the across-subject mean of the RMS (root-mean-square) value of the difference between the presented angle and the perceived angle. The error bars in FIG. 13(A) indicate 95% confidence intervals (CI). As FIG. 13(A) shows, condition 5 yielded the lowest RMS value and hence the highest localization accuracy. In condition 4, which as described above used 24 loudspeakers 230 and 70 microphones, the RMS value was about 5 degrees larger than in condition 5. The RMS value in condition 1 was about 15 degrees larger than in condition 5. The RMS values in conditions 2 and 3 were nearly the same as each other and about 25 degrees larger than in condition 5.
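An RMS localization error of this kind can be computed as sketched below. The text does not say how angular differences were handled at the 0/360-degree boundary; wrapping each difference into the range [-180, 180) is an assumption of this sketch, and the function names are hypothetical.

```python
import math

def angular_error(presented_deg, perceived_deg):
    """Signed difference (perceived - presented), wrapped into [-180, 180)."""
    return (perceived_deg - presented_deg + 180.0) % 360.0 - 180.0

def rms_error(trials):
    """RMS of the wrapped angular errors over (presented, perceived) pairs."""
    errors = [angular_error(p, q) for p, q in trials]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Two trials, each off by 30 degrees, one of them across the 0/360 boundary.
example = rms_error([(0, 30), (330, 0)])
```

Without the wrap, a response of 0 degrees to a stimulus at 330 degrees would count as a 330-degree error rather than a 30-degree one.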

FIG. 13(B) shows the subjects' correct-answer rate for each condition, with error bars indicating 95% confidence intervals (CI). As FIG. 13(B) shows, condition 5 yielded the highest correct-answer rate, consistent with the result in FIG. 13(A). The correct-answer rate in condition 5 was about 5% higher than in condition 4 and about 10% higher than in condition 1.

Although the details are omitted, Hartley's test confirmed that the variation was similar across all conditions, so Tukey's multiple comparison method was applied to find systems (conditions) that differ significantly from the others. The results are given in the tables of FIGS. 14(A) and 14(B), in which "*" denotes a significant difference at the 1% level and "**" denotes a significant difference at the 5% level. Tukey's multiple comparison found no significant difference between conditions 5 and 4 or between conditions 5 and 1. It can therefore be said that the numbers of loudspeakers 230 and microphones in the BoSC reproduction system can be reduced using the Gram-Schmidt orthogonalization method. In contrast, significant differences were found between conditions 5 and 2 and between conditions 5 and 3. It can therefore be said that reproduction by the BoSC reproduction system under condition 2 or condition 3 impairs the sense of presence.

As described above, this embodiment presents a method of reducing the numbers of loudspeakers 230 and microphones using the Gram-Schmidt orthogonalization method. The method selects a group of column vectors with high linear independence from the transfer function matrix between the loudspeakers 230 and the microphones. The selected vectors correspond to the loudspeaker 230 and microphone configuration of the BoSC reproduction systems 10a and 10b. A system configured in this way is therefore considered more robust to changes in the acoustic environment than a system whose loudspeakers 230 and microphones were selected using other evaluation criteria.
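The selection of highly linearly independent column vectors can be sketched as a greedy Gram-Schmidt procedure. The sketch makes several assumptions that this passage does not confirm: each column of a single complex transfer matrix stands for one loudspeaker, the evaluation value is taken to be the norm of a column's component orthogonal to the columns already selected, and the combination over the 20 Hz to 1 kHz band is ignored. The names `inner` and `greedy_select` are hypothetical.

```python
import math

def inner(u, v):
    """Hermitian inner product <u, v> of two complex vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def greedy_select(columns, first, count):
    """Pick `count` column indices, starting from index `first`.

    Each round removes from every remaining column its projection onto the
    most recently selected direction, then selects the column with the
    largest residual norm (an assumed stand-in for the evaluation value).
    """
    residual = {i: [complex(x) for x in col] for i, col in enumerate(columns)}
    selected = [first]
    current = residual.pop(first)
    while len(selected) < count and residual:
        norm = math.sqrt(inner(current, current).real)
        if norm < 1e-12:
            break  # remaining columns lie in the span already selected
        q = [x / norm for x in current]
        for i in list(residual):
            c = inner(q, residual[i])
            residual[i] = [x - c * y for x, y in zip(residual[i], q)]
        best = max(residual, key=lambda i: inner(residual[i], residual[i]).real)
        selected.append(best)
        current = residual.pop(best)
    return selected

# Toy 3 x 4 matrix given as four columns; column 1 is nearly parallel to column 0.
toy_columns = [[1, 0, 0], [1, 0.1, 0], [0, 1, 0], [0, 0, 2]]
picked = greedy_select(toy_columns, first=0, count=3)
```

In the toy example the nearly parallel column 1 is passed over in favour of columns 3 and 2, which is the behaviour the text attributes to choosing vectors with high linear independence. Selecting microphones instead would apply the same routine to the rows, that is, to the columns of the transposed matrix.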

In the selection procedure, restricting the frequency band to 20 Hz to 1 kHz yielded, in simulation, a configuration in which the selected loudspeakers 230 are regularly distributed. Likewise, selecting microphones with the Gram-Schmidt orthogonalization method yielded a reduced number of microphones distributed regularly in all horizontal directions. The microphones were selected in the same way as the loudspeakers 230, but only after the number of loudspeakers 230 had already been reduced by the Gram-Schmidt orthogonalization method.

A sound source localization test in the horizontal plane was carried out to evaluate the degradation caused by reducing the numbers of loudspeakers 230 and microphones. According to the subjective evaluation, there was no statistically significant difference between the BoSC reproduction system with 62 loudspeakers 230 and the BoSC reproduction system with 24 loudspeakers 230. Furthermore, with 24 loudspeakers 230, there was no statistically significant difference between the system using 8 microphones and the system using 70 microphones.

It is therefore considered acceptable to adopt the 24-loudspeaker configuration. In this embodiment, the number of elements (192) of the inverse matrix [W(ω)] was determined from the performance of the server 12 and the computers 18 and 20 and from the constraints of the network 16, and the configuration of 8 microphones for 24 loudspeakers 230 was adopted accordingly.

Although the details are omitted, the sound field signals detected by the selected microphones are supplied from the microphone array 14 to the server 12. The unselected microphones are disabled; that is, the server 12 does not detect sound field signals from them. The computers 18 and 20, for their part, output sound field data and voice data only to the selected loudspeakers 230.

According to this embodiment, the Gram-Schmidt orthogonalization method is used to reduce both the number of loudspeakers, which serve as secondary sound sources, and the number of microphones that record the sound of the primary sound source. This reduces the convolution processing load and the amount of data to be transmitted. Sound field data corresponding to the sound recorded in the sound field can therefore be transmitted in real time and reproduced on the client side. In other words, a person present in the sound field and a user of the speaker system can share the sound field in real time.

Although two client computers are shown in this embodiment, three or more client computers may be connected to the network. In that case, each client computer convolves the voice data from each of the two or more other client computers individually and superimposes the results on the sound field data.
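The per-client processing for three or more clients can be sketched as follows. This is an illustration under assumptions, not the embodiment's implementation: every remote voice is a mono frame of the same length as the sound field frame, each remote client has one FIR filter per selected loudspeaker, and the names `fir`, `mix_output`, `remote_voices` and `voice_filters` are hypothetical.

```python
def fir(signal, taps):
    """Convolve `signal` with `taps`, truncated to the length of `signal`."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k < 0:
                break  # no earlier samples in this frame
            acc += t * signal[n - k]
        out[n] = acc
    return out

def mix_output(sound_field, remote_voices, voice_filters):
    """Superimpose every filtered remote voice on the sound field data.

    sound_field:   list of per-loudspeaker sample lists
    remote_voices: {client_id: mono sample list}
    voice_filters: {client_id: list of per-loudspeaker FIR tap lists}
    """
    mixed = [list(channel) for channel in sound_field]
    for client_id, voice in remote_voices.items():
        for ch, taps in enumerate(voice_filters[client_id]):
            for n, v in enumerate(fir(voice, taps)):
                mixed[ch][n] += v
    return mixed

# Two loudspeaker channels and two remote clients "b" and "c".
field = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
voices = {"b": [1.0, 0.0, 0.0], "c": [0.0, 2.0, 0.0]}
filters = {"b": [[0.5], [0.0, 1.0]], "c": [[1.0], [1.0]]}
mixed = mix_output(field, voices, filters)
```

A real-time implementation would process successive frames and carry filter state across frame boundaries; the truncated single-frame convolution here is only meant to show the convolve-then-superimpose structure.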

DESCRIPTION OF SYMBOLS
10 ... Sound field sharing system
12 ... Server
14 ... Microphone array
18, 20 ... Computer
22, 26 ... Speaker array system

Claims

A sound field sharing system comprising: a microphone array disposed in a certain sound field and having a first predetermined number of microphones;
a server that records sound field data detected by the microphone array and transmits the sound field data to a plurality of reproduction systems; and the reproduction systems, each of which reproduces the sound field data from the server using a speaker array having a second predetermined number of loudspeakers, the sound field sharing system further comprising:
initial speaker selection means for selecting one loudspeaker of the speaker array as an initial reference loudspeaker;
first evaluation value calculation means for calculating a Gram-Schmidt orthogonalization evaluation value between the selected reference loudspeaker and each of the evaluation target loudspeakers other than the reference loudspeaker in the speaker array;
reference speaker selection means for selecting, as a reference loudspeaker, the evaluation target loudspeaker having the highest Gram-Schmidt orthogonalization evaluation value calculated by the first evaluation value calculation means;
first execution means for repeatedly executing the first evaluation value calculation means and the reference speaker selection means until, as a result of the selection by the reference speaker selection means, the number of reference loudspeakers reaches a third predetermined number smaller than the second predetermined number;
initial microphone selection means for selecting one microphone of the microphone array as an initial reference microphone;
second evaluation value calculation means for calculating a Gram-Schmidt orthogonalization evaluation value between the selected reference microphone and each of the evaluation target microphones other than the reference microphone in the microphone array;
reference microphone selection means for selecting, as a reference microphone, the evaluation target microphone having the highest Gram-Schmidt orthogonalization evaluation value calculated by the second evaluation value calculation means; and
second execution means for repeatedly executing the second evaluation value calculation means and the reference microphone selection means until, as a result of the selection by the reference microphone selection means, the number of reference microphones reaches a fourth predetermined number smaller than the first predetermined number,
wherein the server transmits sound field data detected by the fourth predetermined number of reference microphones to the plurality of reproduction systems, and
each of the plurality of reproduction systems reproduces the sound field data transmitted from the server using the third predetermined number of reference loudspeakers.

The sound field sharing system according to claim 1, further comprising:
initial speaker changing means for sequentially changing the initial reference loudspeaker selected by the initial speaker selection means;
first set storage means for storing, for each initial reference loudspeaker set by the initial speaker changing means, the resulting set of the third predetermined number of reference loudspeakers, so that a plurality of sets is stored; and
first set selection means for selecting, from among the plurality of sets stored by the first set storage means, one set of the third predetermined number of reference loudspeakers whose Gram-Schmidt orthogonalization evaluation value calculated by the first evaluation value calculation means satisfies a predetermined condition,
wherein each of the plurality of reproduction systems reproduces the sound field data transmitted from the server using the set of the third predetermined number of reference loudspeakers selected by the first set selection means.

The sound field sharing system according to claim 2, further comprising:
initial microphone changing means for sequentially changing the initial reference microphone selected by the initial microphone selection means;
second set storage means for storing, for each initial reference microphone set by the initial microphone changing means, the resulting set of the fourth predetermined number of reference microphones, so that a plurality of sets is stored; and
second set selection means for selecting, from among the plurality of sets stored by the second set storage means, one set of the fourth predetermined number of reference microphones whose Gram-Schmidt orthogonalization evaluation value satisfies a predetermined condition,
wherein the server transmits sound field data detected by the set of the fourth predetermined number of reference microphones selected by the second set selection means to the plurality of reproduction systems.

The sound field sharing system according to claim 1, wherein the fourth predetermined number is determined according to the third predetermined number.

The sound field sharing system according to claim 1, wherein the third predetermined number and the fourth predetermined number are determined according to at least the processing capability of the server and the reproduction system.

The sound field sharing system according to any one of claims 1 to 5, wherein the second predetermined number is 62 and the third predetermined number is a value not exceeding 24.

The sound field sharing system according to claim 6, wherein the first predetermined number is 70 and the fourth predetermined number is a value not exceeding eight.

An optimization method for optimizing the number and arrangement of the microphones of the microphone array and the loudspeakers of the speaker array in a sound field sharing system comprising: a microphone array disposed in a certain sound field and having a first predetermined number of microphones;
a server that records sound field data detected by the microphone array and transmits the sound field data to a plurality of reproduction systems; and the reproduction systems, each of which reproduces the sound field data from the server using a speaker array having a second predetermined number of loudspeakers, the optimization method comprising the steps of:
(a) selecting one loudspeaker of the speaker array as an initial reference loudspeaker;
(b) calculating a Gram-Schmidt orthogonalization evaluation value between the selected reference loudspeaker and each of the evaluation target loudspeakers other than the reference loudspeaker in the speaker array;
(c) selecting, as a reference loudspeaker, the evaluation target loudspeaker having the highest Gram-Schmidt orthogonalization evaluation value calculated in step (b);
(d) repeatedly executing steps (b) and (c) until, as a result of the selection in step (c), the number of reference loudspeakers reaches a third predetermined number smaller than the second predetermined number;
(e) selecting one microphone of the microphone array as an initial reference microphone;
(f) calculating a Gram-Schmidt orthogonalization evaluation value between the selected reference microphone and each of the evaluation target microphones other than the reference microphone in the microphone array;
(g) selecting, as a reference microphone, the evaluation target microphone having the highest Gram-Schmidt orthogonalization evaluation value calculated in step (f); and
(h) repeatedly executing steps (f) and (g) until, as a result of the selection in step (g), the number of reference microphones reaches a fourth predetermined number smaller than the first predetermined number.