JP2024043429A

JP2024043429A - Realistic sound field reproduction device and realistic sound field reproduction method

Info

Publication number: JP2024043429A
Application number: JP2022148617A
Authority: JP
Inventors: 邦昭大澤; 宏正大橋; 仁岩泉; 古賀　淳一; 義悟正清; 章悟三橋
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2024-03-29

Abstract

[Problem] To provide a realistic sound field reproduction device and method that reproduce, with high sensitivity in a sound field reproduction space, a sense of realism including sound images from various sound sources in a sound field recording space. [Solution] In a realistic sound field reproduction system 100, a satellite venue includes: a satellite venue side communication unit that acquires, from a live venue, performer sound signals recorded by a plurality of headset microphones worn by persons, zone sound signals recorded by a plurality of zone microphones, and position information of the persons; a sound image localization zone determination unit that determines a main peripheral microphone among the plurality of zone microphones based on the position information; and a sound field reproduction unit (audience seat side realism parameter calculation unit 16, mixing/level balance adjustment unit, and sound field reproduction processing unit) that executes sound field reproduction processing for reproducing the sound field of the live venue using a plurality of satellite speakers, based on a speech sound signal from a headset microphone, a first peripheral sound signal from the main peripheral microphone, and a second peripheral sound signal from the zone microphones other than the main peripheral microphone. [Selected Figure] FIG. 1

Description

The present disclosure relates to a realistic sound field reproduction device and a realistic sound field reproduction method.

In recent years, scene-based stereophonic sound reproduction technology has attracted attention as a means of reproducing sound fields in real time. In this technology, signal processing is applied to multichannel signals recorded (picked up) with an Ambisonics microphone, in which a plurality of directional microphone elements are arranged on a rigid sphere or a hollow sphere. Speakers arranged so as to surround the listening environment (space) then reproduce, in real time, a three-dimensional sound field as if the listener were present at the location where the Ambisonics microphone is installed.

As prior art related to sound field reproduction, Patent Document 1, for example, is known. Patent Document 1 discloses an audio recording device that receives a sound pickup signal from a wireless microphone attached to a subject and generates a multichannel audio signal based on the audio signals picked up by a plurality of microphones. This audio recording device assigns the sound pickup signal of the wireless microphone to one or more arbitrary channels of the multichannel audio signal, mixes it into each channel at an arbitrary mixing ratio, and records the result on a recording medium together with a captured image signal.

Patent Document 1: Japanese Patent Application Publication No. 2006-314078

Here, assume that the scene-based stereophonic sound reproduction technology described above is used to reproduce, in one or more satellite venues different from a live venue, the sound field produced by various sound sources (for example, performers such as actors on stage, and sound effects) recorded at a live venue such as a large concert hall. Patent Document 1 does not disclose in detail the relationship between the atmosphere of the sound field picked up by the microphones and the atmosphere of the sound field picked up by the wireless microphone, so it is considered difficult to apply the technology of Patent Document 1 to this scenario. In addition, to reproduce the sense of realism of the live venue with high sensitivity in the satellite venue, the speech of a performer on the stage of the live venue (in other words, the sound image) alone may be insufficient, from the viewpoint of sound field reproduction, for listeners in the satellite venue. However, Patent Document 1 does not present a path toward reproducing a highly sensitive sense of realism using the various sound sources described above.

The present disclosure has been devised in view of the conventional circumstances described above, and aims to provide a realistic sound field reproduction device and a realistic sound field reproduction method that reproduce, with high sensitivity in a sound field reproduction space, a sense of realism including sound images from various sound sources in a sound field recording space.

The present disclosure provides a realistic sound field reproduction device including: an acquisition unit that acquires at least a speech sound signal recorded by a person microphone worn by at least one person who can move within an activity area in a sound field recording space, peripheral sound signals recorded by a plurality of peripheral microphones arranged around the activity area, and position information of the person; a determination unit that determines, based on the position information of the person, a main peripheral microphone that is one of the plurality of peripheral microphones and whose recording region is the location in the activity area where the person is positioned; and a sound field reproduction unit that executes sound field reproduction processing for reproducing the sound field in the sound field recording space using a plurality of speakers arranged in a sound field reproduction space different from the sound field recording space, based on at least the speech sound signal from the person microphone, a first peripheral sound signal from the main peripheral microphone, and a second peripheral sound signal from the peripheral microphones other than the main peripheral microphone.

The present disclosure also provides a realistic sound field reproduction method performed by a realistic sound field reproduction device, the method including: a step of acquiring at least a speech sound signal recorded by a person microphone worn by at least one person who can move within an activity area in a sound field recording space, peripheral sound signals recorded by a plurality of peripheral microphones arranged around the activity area, and position information of the person; a step of determining, based on the position information of the person, a main peripheral microphone that is one of the plurality of peripheral microphones and whose recording region is the location in the activity area where the person is positioned; and a step of executing sound field reproduction processing for reproducing the sound field in the sound field recording space using a plurality of speakers arranged in a sound field reproduction space different from the sound field recording space, based on at least the speech sound signal from the person microphone, a first peripheral sound signal from the main peripheral microphone, and a second peripheral sound signal from the peripheral microphones other than the main peripheral microphone.
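As a non-limiting illustration, the determining step could be sketched in Python as follows. The rectangular recording regions, the names ZoneMic and select_main_mic, the nearest-center fallback, and all coordinate values are assumptions introduced for this sketch and are not taken from the present disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneMic:
    """Hypothetical peripheral microphone with a rectangular recording region."""
    mic_id: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def covers(self, x, y):
        # True when the position lies inside this microphone's recording region.
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def select_main_mic(mics, x, y):
    """Return (main, others) for a person located at (x, y).

    The main microphone is one whose region covers the position. If several
    regions cover it, or none does (an assumed fallback), the microphone
    whose region center is closest to the position is chosen.
    """
    covering = [m for m in mics if m.covers(x, y)]
    candidates = covering or list(mics)

    def dist2(m):
        cx, cy = m.center()
        return (cx - x) ** 2 + (cy - y) ** 2

    main = min(candidates, key=dist2)
    others = [m for m in mics if m is not main]
    return main, others


# Example with three assumed regions laid side by side along the stage.
mics = [
    ZoneMic("ZM1", 0.0, 2.0, 0.0, 2.0),
    ZoneMic("ZM2", 2.0, 4.0, 0.0, 2.0),
    ZoneMic("ZM3", 4.0, 6.0, 0.0, 2.0),
]
main, others = select_main_mic(mics, 2.5, 1.0)
```

In this sketch the signal of `main` would play the role of the first peripheral sound signal and the signals of `others` the role of the second peripheral sound signal.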

Note that these general or specific aspects may be implemented as a system, a device, a method, an integrated circuit, a computer program, or a recording medium, or as any combination of a system, a device, a method, an integrated circuit, a computer program, and a recording medium.

According to the present disclosure, a sense of realism including sound images from various sound sources in a sound field recording space can be reproduced with high sensitivity in a sound field reproduction space.

FIG. 1: A block diagram showing an example of the system configuration of the realistic sound field reproduction system according to Embodiment 1.
FIG. 2: A diagram showing an example of the arrangement of stage-edge zone microphones and ceiling-suspended zone microphones, viewing the stage of the live venue from the ceiling.
FIG. 3: A diagram showing an example of the recording ranges of the stage-edge zone microphones and the ceiling-suspended zone microphones.
FIG. 4: A diagram showing an example of the directivity of a ceiling-suspended zone microphone.
FIG. 5: A diagram showing an example of the arrangement of the stage-edge zone microphones and the ceiling-suspended zone microphones, viewing the stage of the live venue from the left (right) stage edge toward the right (left) stage edge.
FIG. 6: A transition diagram showing an example of periodic updating of the matching table.
FIG. 7: A diagram schematically showing an example of the operation of selecting a zone sound signal according to the position information of a performer.
FIG. 8: A diagram showing an example of the arrangement of satellite speakers in the satellite venue.
FIG. 9: A sequence diagram showing, in time series, an example of the operation procedure of the realistic sound field reproduction system according to Embodiment 1.
FIG. 10: A diagram schematically showing the concept from sound field pickup to sound field reproduction in scene-based stereophonic sound reproduction technology using an Ambisonics microphone.
FIG. 11: A diagram showing an example of the basis of Ambisonics components based on a spherical harmonic expansion for order n and degree m.
FIG. 12: A diagram schematically showing an example of the operation of a sound field realism reproduction system.
FIG. 13: A block diagram showing an example of the system configuration of the sound field realism reproduction system according to Embodiment 2.
FIG. 14: A diagram showing an example of the operation from sound field realism pickup to sound field realism reproduction in the sound field realism reproduction system of FIG. 13.
FIG. 15: A flowchart showing, in time series, an example of the operation procedure for sound field realism reproduction by the sound field realism reproduction device according to Embodiment 2.
FIG. 16: A block diagram showing an example of the system configuration of the sound field realism reproduction system according to a modification of Embodiment 2.
FIG. 17: A diagram showing an example of the operation from sound field realism pickup to sound field realism reproduction in the sound field realism reproduction system of FIG. 16.
FIG. 18: A flowchart showing, in time series, an example of the operation procedure for sound field realism reproduction by the sound field realism reproduction device according to the modification of Embodiment 2.
FIG. 19: A block diagram showing an example of the system configuration of the sound field realism reproduction system according to Embodiment 3.
FIG. 20: A diagram showing an example of the operation from sound field realism pickup to sound field realism reproduction in the sound field realism reproduction system of FIG. 19.
FIG. 21: A flowchart showing, in time series, an example of the operation procedure for sound field realism reproduction by the sound field realism reproduction device according to Embodiment 3.
FIG. 22: A block diagram showing an example of the system configuration of the sound field realism reproduction system according to a modification of Embodiment 3.
FIG. 23: A diagram showing an example of the operation from sound field realism pickup to sound field realism reproduction in the sound field realism reproduction system of FIG. 22.
FIG. 24: A flowchart showing, in time series, an example of the operation procedure for sound field realism reproduction by the sound field realism reproduction device according to the modification of Embodiment 3.

Hereinafter, embodiments that specifically disclose the realistic sound field reproduction device and the realistic sound field reproduction method according to the present disclosure will be described in detail with reference to the drawings as appropriate. However, unnecessarily detailed description may be omitted. For example, detailed descriptions of well-known matters and redundant descriptions of substantially identical configurations may be omitted. This is to keep the following description from becoming unnecessarily redundant and to facilitate understanding by those skilled in the art. The accompanying drawings and the following description are provided so that those skilled in the art can fully understand the present disclosure, and are not intended to limit the subject matter recited in the claims.

(Embodiment 1)
Embodiment 1 describes an example in which, while a play or the like is being performed on a stage provided in a sound field recording space (for example, a live venue such as a large concert hall), sound source signals (sound signals) from various sound sources are recorded, such as the speech of a performer such as an actor (an example of a person), sounds near the performer's feet, and applause, cheers, and murmurs from the audience seats away from the stage, and the sense of realism including the sound field formed by those sound source signals is reproduced in a sound field reproduction space (for example, a satellite venue such as a movie theater) different from the sound field recording space.

First, an example of the system configuration of the realistic sound field reproduction system according to Embodiment 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing an example of the system configuration of a realistic sound field reproduction system 1000 according to Embodiment 1.

The realistic sound field reproduction system 1000 includes various devices arranged on the live venue LV1 side and various devices arranged on the satellite venue STL1 side. The live venue side communication unit 6, which is one of the devices on the live venue LV1 side, and the satellite venue side communication unit 11, which is one of the devices on the satellite venue STL1 side, are connected via a network NW1 so that they can exchange data with each other. The network NW1 may be a wired network or a wireless network. The wired network is, for example, at least one of a wired LAN (Local Area Network), a wired WAN (Wide Area Network), and power line communication (PLC), and may be any other network configuration capable of wired communication. The wireless network is at least one of a wireless LAN such as Wi-Fi (registered trademark), a wireless WAN, short-range wireless communication such as Bluetooth (registered trademark), and a mobile communication network such as 4G or 5G, and may be any other network configuration capable of wireless communication.

The devices arranged on the live venue LV1 side include, for example, headset microphones HM1, ..., HM4, zone microphones ZM1, ..., ZM7, an Ambisonics microphone AMB1, a mixer 1, a performer tracking system 2, a face database 3, an encoder 4, a matching table database 5, and the live venue side communication unit 6. To simplify the description, FIG. 1 illustrates four headset microphones and seven zone microphones, but the number of headset microphones is not limited to four and the number of zone microphones is not limited to seven.

Each of the headset microphones HM1 to HM4 is an example of a person microphone. It is a wireless microphone worn by at least one person (for example, a performer such as an actor) who can move about the stage STG1 (see FIG. 2) of the live venue LV1, and it can exchange various control signals or data signals with the mixer 1 by wireless communication. The microphone element of each of the headset microphones HM1 to HM4 is configured using, for example, an omnidirectional microphone or a directional microphone. Each of the headset microphones HM1 to HM4 picks up (records) the speech of the wearer with its microphone element. Each of the headset microphones HM1 to HM4 may also include a speaker and output a sound signal input to that speaker. Each of the headset microphones HM1 to HM4 is worn so as to face the wearer's mouth. For example, the headset microphone HM1 is worn by performer A, the headset microphone HM2 by performer B, the headset microphone HM3 by performer C, and the headset microphone HM4 by performer D. The speech signal picked up (recorded) by each of the headset microphones HM1 to HM4 is wirelessly transmitted to the mixer 1 as a performer sound signal, together with identification information of the corresponding headset microphone (for example, a sound source ID, which is the identification number of the performer wearing it and is described later).

Each of the zone microphones ZM1 to ZM7 is an example of a peripheral microphone. It is arranged around the stage STG1 (see FIG. 2) of the live venue LV1 and can exchange various control signals or data signals with the mixer 1 by wired or wireless communication. The microphone element of each of the zone microphones ZM1 to ZM7 is configured using, for example, an omnidirectional microphone or a directional microphone. Each of the zone microphones ZM1 to ZM7 picks up (records), with its microphone element, sounds occurring near the feet or above the head of a person wearing a headset microphone. The sound signal picked up (recorded) by each of the zone microphones ZM1 to ZM7 is wirelessly transmitted to the mixer 1 as a zone sound signal. Details of arrangement examples of the zone microphones ZM1 to ZM7 will be described later with reference to FIGS. 2 to 5.

The Ambisonics microphone AMB1 is an example of a sound recording device. It is arranged in a peripheral area away from the stage STG1 (see FIG. 2) of the live venue LV1 (for example, between seats on the audience side) and can exchange various control signals or data signals with the mixer 1 by wired or wireless communication. The Ambisonics microphone AMB1 includes four microphone elements Mc1, Mc2, Mc3, and Mc4 (see FIG. 10). The microphone element Mc1 records sound from the front upper left direction of the housing of the Ambisonics microphone AMB1, the microphone element Mc2 records sound from the front lower right direction of the housing, the microphone element Mc3 records sound from the rear lower left direction of the housing, and the microphone element Mc4 records sound from the rear upper right direction of the housing (see FIG. 10 for each direction). The Ambisonics microphone AMB1 records, for example, sound in an area of the live venue LV1 other than the stage STG1 (see FIG. 2), specifically sounds such as applause, cheers, and murmurs on the audience seat side (in other words, the audience seat side sense of realism). The Ambisonics microphone AMB1 may include more unidirectional microphone elements than the four hollow-arranged microphone elements Mc1, Mc2, Mc3, and Mc4, or may include omnidirectional microphone elements arranged on a rigid sphere. Using an Ambisonics microphone with a large number of microphone elements makes it possible for the audience seat side realism parameter calculation unit 16, described later, to synthesize Ambisonics signals of second or higher order. The audience seat side sound signal picked up (recorded) by the Ambisonics microphone AMB1 is wirelessly transmitted to the mixer 1 as an audience seat side realism sound signal.
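For reference, the four capsule directions described above (front upper left, front lower right, rear lower left, rear upper right) match the common tetrahedral "A-format" layout, for which a widely used sum-and-difference conversion to first-order "B-format" components W, X, Y, Z exists. The following Python sketch shows only that generic conversion. It is not stated in the present disclosure that the system uses it, and the function name a_to_b_format and the 0.5 scaling are assumptions (normalization conventions differ, and practical converters also apply frequency-dependent equalization).

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert four tetrahedral capsule signals to first-order W, X, Y, Z.

    flu: front-left-up capsule samples   (corresponds to Mc1)
    frd: front-right-down capsule samples (corresponds to Mc2)
    bld: back-left-down capsule samples   (corresponds to Mc3)
    bru: back-right-up capsule samples    (corresponds to Mc4)
    All inputs are equal-length sequences of floats.
    """
    w, x, y, z = [], [], [], []
    for a, b, c, d in zip(flu, frd, bld, bru):
        w.append(0.5 * (a + b + c + d))  # omnidirectional component
        x.append(0.5 * (a + b - c - d))  # front minus back
        y.append(0.5 * (a - b + c - d))  # left minus right
        z.append(0.5 * (a - b - c + d))  # up minus down
    return w, x, y, z


# Example: a signal present only on the two front capsules yields
# energy in W and X, and none in Y or Z.
w, x, y, z = a_to_b_format([1.0], [1.0], [0.0], [0.0])
```

A first-order representation has four components; an Ambisonics representation of order N has (N + 1)^2 components, which is consistent with the statement above that more microphone elements are needed for second or higher order.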

The mixer 1 is configured using a processor such as a CPU (Central Processing Unit) included in a computer device P1. The mixer 1 receives the performer sound signals, each tagged with identification information (see above), from the headset microphones HM1 to HM4, the zone sound signals from the zone microphones ZM1 to ZM7, and the audience seat side realism sound signal from the Ambisonics microphone AMB1, mixes them into a live venue sound signal, and outputs the result to the encoder 4. Alternatively, the mixer 1 may mix the zone sound signals from the zone microphones ZM1 to ZM7 with the audience seat side realism sound signal from the Ambisonics microphone AMB1 and output the mixture to the encoder 4, while outputting the performer sound signals from the headset microphones HM1 to HM4 individually to the encoder 4. The mixer 1, the encoder 4, and the live venue side communication unit 6 may be implemented by one or more computer devices P1 (for example, a personal computer or a server computer). For example, the computer device P1 may be referred to as a sound field recording device.

The performer tracking system 2 can exchange data signals with each of the face database 3, the encoder 4, and the matching table database 5 by wired or wireless communication. While a play or the like is being performed, the performer tracking system 2 tracks performers who are moving or standing still on the stage STG1 (see FIG. 2) of the live venue LV1 by referring to the face database 3. In doing so, it recognizes each performer, continuously issues or reconfirms the performer's identification information, and acquires the performer's position information. As a result, while the play or the like is being performed, the performer tracking system 2 generates matching table data in which records, each containing a performer's identification information (sometimes referred to as a "sound source ID") and the performer's position information, are listed for each performer, and stores the data in the matching table database 5. The performer tracking system 2 includes, for example, an AI camera equipped with artificial intelligence (AI), and acquires the identification information and position information of each performer by comparing video of the performer captured by the AI camera with the performer's face photograph registered in advance in the face database 3. The performer tracking system 2 is not limited to acquiring the identification information and position information of performers using the AI camera and the face database 3, and may do so based on information detected by other devices (for example, a sensor worn individually by each performer). The performer tracking system 2 sends the identification information and position information of each performer, together with the data signal of the captured video of that performer, to the encoder 4.
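As a hedged illustration of the matching table described above, the following Python sketch keeps one record per performer, keyed by sound source ID, and overwrites the position whenever the tracker reports a new observation. The class names MatchingRecord and MatchingTable, the two-dimensional coordinates, and the sample values are assumptions made for this sketch; the actual table layout is the one described with reference to FIG. 6.

```python
from dataclasses import dataclass


@dataclass
class MatchingRecord:
    """One row of a hypothetical matching table."""
    source_id: str  # performer identification information ("sound source ID")
    x: float        # assumed stage coordinates of the performer
    y: float


class MatchingTable:
    """Holds the latest known position for each sound source ID."""

    def __init__(self):
        self._records = {}

    def update(self, source_id, x, y):
        # Issue a new record or refresh an existing one with the latest position.
        self._records[source_id] = MatchingRecord(source_id, x, y)

    def position_of(self, source_id):
        # Return (x, y) for a known performer, or None if not yet recognized.
        rec = self._records.get(source_id)
        return None if rec is None else (rec.x, rec.y)

    def rows(self):
        # Records listed per performer, ordered by sound source ID.
        return [self._records[k] for k in sorted(self._records)]


table = MatchingTable()
table.update("A", 1.0, 0.5)   # performer A first recognized
table.update("B", 3.0, 1.5)   # performer B first recognized
table.update("A", 1.4, 0.6)   # performer A has moved
```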

The face database 3 is configured using, for example, a flash memory or a hard disk drive, and stores face photograph data of each performer appearing in a play or the like on the stage STG1 (see FIG. 2) of the live venue LV1 in association with the performer's identification information. The face database 3 is connected so that the performer tracking system 2 can access it, and is referred to by the performer tracking system 2 when it recognizes a performer and acquires position information. In FIG. 1, a database is denoted "DB" for convenience.

The encoder 4 is configured using a processor such as a CPU included in the computer device P1. The encoder 4 receives the data signal of the live venue sound signal from the mixer 1, and the identification information, position information, and captured video data of each recognized performer from the performer tracking system 2. The encoder 4 may refer to the matching table database 5 to obtain the data from the performer tracking system 2 (for example, a performer's identification information and position information) whose performer identification information matches the performer identification information (for example, the sound source ID) attached to the performer sound signal data in the live venue sound signal. The encoder 4 associates the performer sound signal data with the identification information, position information, and captured video data of the performer having the same identification information (for example, by IP (Internet Protocol) packetization), thereby generating IP packets for transmission and reception via the network NW1. The encoder 4 also generates IP packets for transmission and reception via the network NW1 using the zone sound signals from the zone microphones ZM1 to ZM7 and the audience seat side realism sound signal from the Ambisonics microphone AMB1.
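The association step performed by the encoder 4 can be pictured as a join on the sound source ID followed by serialization of the joined record. The Python sketch below is a minimal illustration under stated assumptions: the function name build_payloads, the JSON message layout, and the choice to skip performers without tracking data are hypothetical and not taken from the present disclosure, and the sketch only builds payload bytes (actual IP packetization would be handled by a network stack and a media transport format).

```python
import json


def build_payloads(performer_audio, tracking):
    """Join performer audio with tracking data by sound source ID.

    performer_audio: dict mapping source_id -> list of audio samples
    tracking:        dict mapping source_id -> (x, y) position
    Returns a list of UTF-8 encoded JSON payloads, one per matched performer.
    """
    payloads = []
    for source_id in sorted(performer_audio):
        if source_id not in tracking:
            # Assumed policy: no position available yet, so nothing to associate.
            continue
        x, y = tracking[source_id]
        message = {
            "source_id": source_id,
            "position": [x, y],
            "samples": performer_audio[source_id],
        }
        payloads.append(json.dumps(message).encode("utf-8"))
    return payloads


payloads = build_payloads(
    {"A": [0.1, 0.2], "B": [0.3, 0.4]},
    {"A": (1.0, 2.0)},  # performer B has not been located yet
)
```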

The matching table database 5 is configured using, for example, a flash memory or a hard disk drive, and stores the data of the matching tables MaTL1, MaTL2, ... (see FIG. 6) generated or updated by the performer tracking system 2. Details of the data in the matching tables MaTL1, MaTL2, ... will be described later with reference to FIG. 6.

ライブ会場側通信部６は、例えば有線あるいは無線を用いた通信を実現可能な通信回路を用いて構成され、サテライト会場側通信部１１との間でネットワークＮＷ１を介した各種の制御信号或いはデータ信号の有線通信若しくは無線通信を行う。ライブ会場側通信部６は、エンコーダ４により生成された各種のＩＰパケットを、ネットワークＮＷ１を介してサテライト会場側通信部１１に送信する。 The live venue communication unit 6 is configured using a communication circuit capable of realizing wired or wireless communication, for example, and performs wired or wireless communication of various control signals or data signals with the satellite venue communication unit 11 via the network NW1. The live venue communication unit 6 transmits various IP packets generated by the encoder 4 to the satellite venue communication unit 11 via the network NW1.

サテライト会場ＳＴＬ１側に配置された各種の機器は、例えば、サテライト会場側通信部１１と、デコーダ１２と、記憶部１３と、映像出力部１４と、音像定位ゾーン決定部１５と、客席側臨場感パラメータ算出部１６と、ミキシング／レベルバランス調整部１７と、音場再生処理部１８と、サテライトスピーカＳＰｋ１、…、ＳＰｋｐと、を含む。ｐはサテライトスピーカの配置数を示し、２以上の整数であって定数である。 The various devices arranged on the satellite venue STL1 side include, for example, a satellite venue side communication unit 11, a decoder 12, a memory unit 13, a video output unit 14, a sound image localization zone determination unit 15, an audience side realism parameter calculation unit 16, a mixing/level balance adjustment unit 17, a sound field reproduction processing unit 18, and satellite speakers SPk1, ..., SPkp. p indicates the number of satellite speakers arranged and is a constant integer equal to or greater than 2.

The satellite venue side communication unit 11 is configured using, for example, a communication circuit capable of wired or wireless communication, and performs wired or wireless communication of various control signals or data signals with the live venue side communication unit 6 via the network NW1. The satellite venue side communication unit 11 receives the various IP packets sent from the live venue side communication unit 6 via the network NW1 and outputs them to the decoder 12. The satellite venue side communication unit 11, the decoder 12, the storage unit 13, the video output unit 14, the sound image localization zone determination unit 15, the audience-side realism parameter calculation unit 16, the mixing/level balance adjustment unit 17, and the sound field reproduction processing unit 18 may be configured by one or more computer devices P2 (for example, a personal computer or a server computer). For example, the computer device P2 may be called a sound field reproduction device.

デコーダ１２は、コンピュータ装置Ｐ２が備えるＣＰＵ等のプロセッサを用いて構成される。デコーダ１２は、サテライト会場側通信部１１からのＩＰパケットをデコード処理することにより、ヘッドセットマイクＨＭ１～ＨＭ４のそれぞれにより収音（収録）された演者ごとの演者音信号、識別情報、位置情報及び撮像映像のデータと、ゾーンマイクＺＭ１～ＺＭ７のそれぞれからのゾーン音信号と、アンビソニックスマイクＡＭＢ１からの客席側臨場感音信号とに分離する。デコーダ１２は、演者ごとの撮像映像のデータを映像出力部１４に送る。デコーダ１２は、ヘッドセットマイクＨＭ１～ＨＭ４のそれぞれにより収音（収録）された演者ごとの演者音信号、識別情報、位置情報及び撮像映像のデータと、ゾーンマイクＺＭ１～ＺＭ７のそれぞれからのゾーン音信号と、を音像定位ゾーン決定部１５に出力する。デコーダ１２は、アンビソニックスマイクＡＭＢ１からの客席側臨場感音信号を客席側臨場感パラメータ算出部１６に出力する。 The decoder 12 is configured using a processor such as a CPU provided in the computer device P2. The decoder 12 decodes the IP packets from the satellite venue communication unit 11 to separate the performer sound signals, identification information, position information, and captured video data for each performer picked up (recorded) by each of the headset microphones HM1 to HM4, the zone sound signals from each of the zone microphones ZM1 to ZM7, and the audience-side realistic sound signal from the Ambisonics microphone AMB1. The decoder 12 sends the captured video data for each performer to the video output unit 14. The decoder 12 outputs the performer sound signals, identification information, position information, and captured video data for each performer picked up (recorded) by each of the headset microphones HM1 to HM4, and the zone sound signals from each of the zone microphones ZM1 to ZM7 to the sound image localization zone determination unit 15. The decoder 12 outputs the audience-side realism sound signal from the Ambisonics microphone AMB1 to the audience-side realism parameter calculation unit 16.

The storage unit 13 is configured using, for example, a flash memory or a hard disk drive, and stores the sound image localization parameters calculated in advance for each of the satellite speakers SPk1 to SPkp, or the various data acquired by the decoder 12. A method for calculating the sound image localization parameters for each satellite speaker is disclosed in, for example, Japanese Patent Application Laid-Open No. 09-149500, so its description is omitted here. In short, the sound image localization parameters are, for each satellite speaker, the delay of each other satellite speaker (volume auxiliary speaker) that supplements the volume from that satellite speaker (localization speaker), and the attenuation level of that volume auxiliary speaker. These sound image localization parameters are used when the sound field reproduction processing unit 18 executes sound image localization processing. The various data acquired by the decoder 12 include, for example, the performer sound signal, identification information, position information, and captured video data for each performer collected (recorded) by each of the headset microphones HM1 to HM4, the zone sound signals from each of the zone microphones ZM1 to ZM7, and the audience-side realistic sound signal from the ambisonics microphone AMB1. The storage unit 13 also stores the position information (for example, coordinate information in three-dimensional space) of each of the satellite speakers SPk1 to SPkp within the satellite venue STL1.
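As a rough illustration of how a delay and an attenuation level per volume auxiliary speaker might be applied to a signal, consider the sketch below. The parameter table, the delays expressed in samples, and the attenuation values in dB are all hypothetical; the actual calculation method is the one in the cited publication and is not reproduced here.

```python
def apply_delay_and_attenuation(samples, delay_samples, attenuation_db):
    """Return the samples delayed by delay_samples and attenuated by attenuation_db."""
    gain = 10.0 ** (-attenuation_db / 20.0)   # convert dB of attenuation to a linear gain
    return [0.0] * delay_samples + [gain * s for s in samples]


# Hypothetical table: localization speaker -> {auxiliary speaker: (delay, attenuation in dB)}
params = {"SPk1": {"SPk2": (2, 6.0), "SPk3": (4, 12.0)}}

signal = [1.0, 0.5, -0.5]
feeds = {"SPk1": list(signal)}                # the localization speaker plays the signal as is
for aux, (delay, att_db) in params["SPk1"].items():
    feeds[aux] = apply_delay_and_attenuation(signal, delay, att_db)
```

The localization speaker emits the signal first and at full level, while each auxiliary speaker adds volume later and more quietly, which is the role the description assigns to the delay and the attenuation level.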

映像出力部１４は、例えばプロジェクタを用いて構成され、デコーダ１２からの演者ごとの撮像映像のデータを、サテライト会場ＳＴＬ１に配置されたスクリーンＳＣＲ１（図８参照）に投影して映し出す。これにより、サテライト会場ＳＴＬ１内で着席している複数人の観客は、ライブ会場ＬＶ１で演者ごとに注目するように撮像された撮像映像をスクリーンＳＣＲ１を介して視聴することができる。 The video output unit 14 is configured using, for example, a projector, and projects and displays the captured video data for each performer from the decoder 12 on a screen SCR1 (see FIG. 8) arranged in the satellite venue STL1. Thereby, a plurality of audience members seated in the satellite venue STL1 can view the captured video imaged at the live venue LV1 so as to focus on each performer via the screen SCR1.

The sound image localization zone determination unit 15 is configured using a processor such as a CPU included in the computer device P2. The sound image localization zone determination unit 15 is an example of a determination unit. Based on the data from the decoder 12 (that is, the identification information and position information for each performer), it determines a sound image localization zone for reproducing, in the satellite venue STL1, the sound field (atmosphere) formed by the voices uttered by the performers on the stage STG1 (see FIG. 2) of the live venue LV1 and by the foot sounds they produce when standing still or moving. The sound image localization zone referred to here is one of the sound collection zones ZN1 to ZN7 shown in FIG. 3. An example of determining the sound image localization zone will be described in detail later with reference to FIG. 7.

The audience-side realism parameter calculation unit 16 is configured using a processor such as a CPU included in the computer device P2. The audience-side realism parameter calculation unit 16 is an example of a sound field reproduction unit. Based on the data from the decoder 12 (that is, the audience-side realistic sound signal from the ambisonics microphone AMB1) and the position information of each of the satellite speakers SPk1 to SPkp stored in the storage unit 13, it calculates stereoscopic reproduction parameters for three-dimensionally reproducing, in the satellite venue STL1, the audience-side sound field of the live venue LV1. Using these stereoscopic reproduction parameters, the audience-side realism parameter calculation unit 16 generates a speaker drive signal for each of the satellite speakers SPk1 to SPkp and outputs it to the mixing/level balance adjustment unit 17. Examples of calculating the stereoscopic reproduction parameters and of generating the speaker drive signals will be described in detail in Embodiment 2.

The mixing/level balance adjustment unit 17 is configured using a processor such as a CPU included in the computer device P2. The mixing/level balance adjustment unit 17 is an example of a sound field reproduction unit, and adjusts the balance among the signal levels of the performer sound signal for each performer collected (recorded) by each of the headset microphones HM1 to HM4, the zone sound signals from each of the zone microphones ZM1 to ZM7, and the speaker drive signal for each satellite speaker. The mixing/level balance adjustment unit 17 then mixes (that is, combines) the signals whose level balance has been adjusted. An example of adjusting the balance of the signal levels will be described in detail later with reference to FIG. 7.

The sound field reproduction processing unit 18 is configured using a processor such as a CPU included in the computer device P2. The sound field reproduction processing unit 18 is an example of a sound field reproduction unit. Using the sound image localization parameters for each satellite speaker read from the storage unit 13 and the signals output from the mixing/level balance adjustment unit 17, it executes sound field reproduction processing for reproducing, in the satellite venue STL1, both the sound field (atmosphere) formed by the voices uttered by the performers in the live venue LV1 and the foot sounds they produce when standing still or moving, and the sound field (atmosphere) formed by the audience-side realism in the live venue LV1. Specifically, to reproduce in the satellite venue STL1 the sound field formed by the performers' voices and foot sounds, the sound field reproduction processing unit 18 uses the signals output from the mixing/level balance adjustment unit 17 to localize the sound images of the performers' speech and foot sounds via the satellite speakers. Furthermore, to reproduce in the satellite venue STL1 the sound field formed by the audience-side realism in the live venue LV1, the sound field reproduction processing unit 18 outputs the signals from the mixing/level balance adjustment unit 17 (in particular, the speaker drive signal for each satellite speaker) via the corresponding satellite speakers.

Each of the satellite speakers SPk1, ..., SPkp is arranged, for example, at positions spaced a fixed distance apart within the satellite venue STL1, and reproduces (outputs) the sound field based on the speaker drive signal from the sound field reproduction processing unit 18. The number of satellite speakers installed may be varied according to the sound field to be reproduced. When reproduction toward a specific direction is not performed, or when generally known virtual sound image generation methods such as a transaural system or the VBAP (Vector Based Amplitude Panning) method are combined, the sound field may be reproduced using fewer than p satellite speakers. Conversely, the sound field may be reproduced using more than p speakers. The installation positions of the satellite speakers are not limited as long as the speakers surround a reference position of the satellite venue STL1 (for example, the central position in the satellite venue STL1 where the listener PS1 is seated).
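For readers unfamiliar with VBAP, the following is a minimal two-dimensional sketch of the general technique, not of this system's implementation. It solves for the two gains whose weighted sum of speaker direction vectors points at the virtual source, then normalizes them to constant power. The function name `vbap_2d` and the angles used are hypothetical.

```python
import math


def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Gains for a speaker pair that place a virtual source at source_deg (2-D VBAP)."""
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    x1, y1 = math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg))
    x2, y2 = math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg))
    det = x1 * y2 - x2 * y1
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be collinear")
    # Solve g1 * l1 + g2 * l2 = p by Cramer's rule.
    g1 = (px * y2 - py * x2) / det
    g2 = (py * x1 - px * y1) / det
    norm = math.hypot(g1, g2)          # normalize so that g1^2 + g2^2 = 1
    return g1 / norm, g2 / norm


# A virtual source midway between speakers at +30 and -30 degrees.
g_mid = vbap_2d(0.0, 30.0, -30.0)
# A virtual source exactly in the direction of the first speaker.
g_edge = vbap_2d(30.0, 30.0, -30.0)
```

A source midway between the pair receives equal gains, and a source in the direction of one speaker is driven by that speaker alone, which is how a virtual sound image can stand in for a speaker that is not installed.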

Next, details of the zone microphones shown in FIG. 1 will be described with reference to FIGS. 2 to 5. FIG. 2 is a diagram showing an example of the arrangement of the zone microphones when the stage of the live venue is viewed from the ceiling. FIG. 3 is a diagram showing an example of the recording ranges of the zone microphones. FIG. 4 is a diagram showing an example of the directivity of a zone microphone. FIG. 5 is a diagram showing an example of the arrangement of the zone microphones when the stage of the live venue is viewed from the left (right) stage edge toward the right (left) stage edge. In FIGS. 2 to 5, the Z axis is perpendicular to both the X axis and the Y axis and indicates the height direction, parallel to the direction of gravity. The X axis is perpendicular to both the Z axis and the Y axis and indicates the width direction of the stage STG1. The Y axis is perpendicular to both the Z axis and the X axis and indicates the depth direction of the stage STG1. The seats for the audience of the live venue LV1 are provided in the positive Y direction, and the backyard of the stage STG1 is provided in the negative Y direction.

図２に示すように、ライブ会場ＬＶ１の舞台ＳＴＧ１の舞台端には４つのゾーンマイクＺＭ１、ＺＭ２、ＺＭ３、ＺＭ４が配置される。一方、舞台ＳＴＧ１の上部（例えば天井）から３つのゾーンマイクＺＭ５、ＺＭ６、ＺＭ７が天吊りして配置されている。 As shown in FIG. 2, four zone microphones ZM1, ZM2, ZM3, and ZM4 are arranged at the edge of the stage STG1 of the live venue LV1. On the other hand, three zone microphones ZM5, ZM6, and ZM7 are suspended from the upper part (for example, the ceiling) of the stage STG1.

As shown in FIG. 3, each of the zone microphones ZM1, ZM2, ZM3, and ZM4 collects (records) sounds (including voices) generated within, for example, the approximately elliptical (including circular) sound collection zones ZN1, ZN2, ZN3, and ZN4 (recording areas or sound image localization zones), respectively. The length of the sound collection zones ZN1 to ZN4 in the longitudinal direction is 4 [m]. Therefore, the foot sounds produced when a performer such as an actor stops near any of the zone microphones ZM1 to ZM4 and takes steps or dances are mainly collected (recorded) by the zone microphone corresponding to the sound collection zone that contains the performer's position. However, this does not exclude the foot sounds also being collected (recorded) by a zone microphone corresponding to a sound collection zone adjacent to the one containing the performer's position.

Note that the length of the stage STG1 in the depth direction is 12 [m]. Each of the zone microphones ZM1, ZM2, ZM3, and ZM4 is arranged at the audience-side edge of the stage STG1, and each of the zone microphones ZM5, ZM6, and ZM7 is suspended from the ceiling at a position about 6 [m] from the audience-side edge of the stage STG1 and at a height of 4 [m]. These dimensions are merely examples and are not limiting.

As shown in FIG. 3, each of the zone microphones ZM5, ZM6, and ZM7 collects (records) sounds (including voices) generated within, for example, the approximately circular sound collection zones ZN5, ZN6, and ZN7 (recording areas), respectively. The diameter of the sound collection zones ZN5 to ZN7 is 4 [m]. Therefore, when a performer such as an actor stops directly below or near any of the zone microphones ZM5 to ZM7 and makes a movement that produces sound near the head, that sound is mainly collected (recorded) by the zone microphone corresponding to the sound collection zone that contains the position above the performer's head. However, this does not exclude the sound also being collected (recorded) by a zone microphone corresponding to a sound collection zone adjacent to the one containing the position above the performer's head.

As shown in FIG. 4, each of the zone microphones ZM5 to ZM7 can collect (record) sound generated in its sound collection zone ZN5 to ZN7 (that is, the approximately circular area with a diameter of 4 [m] shown in FIG. 3), which lies up to 2.5 [m] ahead within an 80-degree beam range (directivity) as viewed from the center of the microphone housing. The region from above the head to around the mouth of a performer ACT1, such as an actor who can move about on the stage STG1, is assumed to be located 2.5 [m] from the housing of each of the zone microphones ZM5 to ZM7.
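The stated dimensions can be cross-checked with simple trigonometry: a cone with a full beam angle of 80 degrees has, at a distance of 2.5 m, a circular footprint of diameter 2 × 2.5 × tan(40°). The short sketch below (the function name `footprint_diameter` is hypothetical) computes this value, together with the microphone-to-head distance implied by the example mounting height of 4 m and the example performer height of 1.5 m.

```python
import math


def footprint_diameter(beam_angle_deg, distance_m):
    """Diameter of the circle covered by a conical beam at the given distance."""
    half_angle = math.radians(beam_angle_deg / 2.0)
    return 2.0 * distance_m * math.tan(half_angle)


mic_height_m = 4.0        # example mounting height of ZM5 to ZM7
performer_height_m = 1.5  # example height of performer ACT1
distance_m = mic_height_m - performer_height_m   # 2.5 m from microphone to head

diameter_m = footprint_diameter(80.0, distance_m)
```

The result is about 4.2 m, which agrees with the approximately 4 m diameter given for the sound collection zones ZN5 to ZN7.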

Therefore, as shown in FIG. 5, when the performer ACT1 wearing the headset microphone HM1 speaks lines or the like on the stage STG1, not only the performer sound signal collected by the headset microphone HM1 but also the zone sound signal collected by at least one zone microphone whose sound collection zone contains the position of the performer ACT1 is recorded and input to the mixer 1. For the sake of explanation, FIG. 5 illustrates the performer ACT1 with a height of 1.5 [m], but the height is not limited to 1.5 [m].

次に、図６を参照して、演者トラッキングシステム２が生成或いは更新するマッチング表について説明する。図６は、マッチング表の定期更新例を示す遷移図である。 Next, with reference to FIG. 6, the matching table generated or updated by the performer tracking system 2 will be described. FIG. 6 is a transition diagram showing an example of regular updating of the matching table.

The matching tables MaTL1, MaTL2, ... are data that define, for each performer, the identification information (sound source ID) and position information (for example, two-dimensional (X, Y) coordinates) obtained when each performer (for example, performer A, performer B, performer C, performer D, ...) such as an actor performing a play or the like on the stage STG1 (see FIG. 2) is recognized by the performer tracking system 2. The position information of a performer is expressed as two-dimensional coordinates with a specific position on the stage STG1 (see FIG. 2) as the origin. The matching table is updated periodically, for example at 1-second intervals. As shown in FIG. 6, the matching table MaTL1 is generated at time t = t1, and the matching table MaTL2 is generated one second later at t = t2. The update interval is not limited to 1 second.

例えば図６のマッチング表ＭａＴＬ１では、時刻ｔ＝ｔ１において、演者トラッキングシステム２により演者Ａ、演者Ｂ、演者Ｃが少なくとも認識されている。つまり、演者Ａの音源ＩＤ（１）及び位置情報（４、１）と、演者Ｂの音源ＩＤ（２）及び位置情報（－３、２）と、演者Ｃの音源ＩＤ（３）及び位置情報（１、０）と、がマッチング表ＭａＴＬ１において少なくとも関連付けされている。 For example, in the matching table MaTL1 of FIG. 6, at least performer A, performer B, and performer C are recognized by the performer tracking system 2 at time t=t1. In other words, performer A's sound source ID (1) and position information (4, 1), performer B's sound source ID (2) and position information (-3, 2), and performer C's sound source ID (3) and position information. (1, 0) are at least associated in the matching table MaTL1.

一方、マッチング表ＭａＴＬ２では、時刻ｔ＝ｔ２において、演者トラッキングシステム２により演者Ａ、演者Ｂは時刻ｔ＝ｔ１と同様に認識されているが、演者Ｃが認識されず、演者Ｄが新たに認識されている。つまり、演者Ａの音源ＩＤ（１）及び位置情報（１、５）と、演者Ｂの音源ＩＤ（２）及び位置情報（－３、０）と、演者Ｄの音源ＩＤ（４）及び位置情報（－２、３）と、がマッチング表ＭａＴＬ２において少なくとも関連付けされている。例えば、演者Ｃは時刻ｔ＝ｔ２では舞台ＳＴＧ１（図２参照）から姿を消し、演者Ｄは時刻ｔ＝ｔ２において新たに舞台ＳＴＧ１（図２参照）に登場したことが示唆される。 Meanwhile, in matching table MaTL2, at time t=t2, performer A and performer B are recognized by performer tracking system 2 in the same way as at time t=t1, but performer C is not recognized and performer D is newly recognized. In other words, performer A's sound source ID (1) and position information (1, 5), performer B's sound source ID (2) and position information (-3, 0), and performer D's sound source ID (4) and position information (-2, 3) are at least associated in matching table MaTL2. For example, it is suggested that performer C has disappeared from stage STG1 (see Figure 2) at time t=t2, and performer D has newly appeared in stage STG1 (see Figure 2) at time t=t2.

このように、演者が演劇等の間に舞台ＳＴＧ１（図２参照）上を移動する動きがあったとしても、演者トラッキングシステム２により各演者の識別情報及び位置情報が認識、追跡された結果、演者の位置情報を示すマッチング表のデータが随時更新される。 In this way, even if the performers move on the stage STG 1 (see FIG. 2) during a play, etc., the performer tracking system 2 recognizes and tracks each performer's identification information and location information. The data of the matching table indicating the location information of the performers is updated as needed.
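The two matching tables of FIG. 6 can be represented as dictionaries keyed by sound source ID, and the change between successive updates obtained by comparing their keys and positions. The dictionary layout and the function `diff_tables` are illustrative assumptions; only the IDs and coordinates are taken from the description.

```python
# Matching tables from FIG. 6: sound source ID -> (performer, (X, Y) position)
matl1 = {1: ("A", (4, 1)), 2: ("B", (-3, 2)), 3: ("C", (1, 0))}    # time t = t1
matl2 = {1: ("A", (1, 5)), 2: ("B", (-3, 0)), 4: ("D", (-2, 3))}   # time t = t2


def diff_tables(previous, current):
    """Report which source IDs left, entered, or moved between two updates."""
    left = sorted(previous.keys() - current.keys())
    entered = sorted(current.keys() - previous.keys())
    moved = sorted(k for k in previous.keys() & current.keys()
                   if previous[k][1] != current[k][1])
    return {"left": left, "entered": entered, "moved": moved}


changes = diff_tables(matl1, matl2)
```

Here `changes` reports that source ID 3 (performer C) is no longer recognized, source ID 4 (performer D) is newly recognized, and source IDs 1 and 2 (performers A and B) have moved, matching the transition described for FIG. 6.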

Next, the selection of zone sound signals according to the position information of the performers will be described with reference to FIG. 7. FIG. 7 is a diagram schematically showing an example of the outline of the operation of selecting zone sound signals according to the position information of the performers. To simplify the explanation of FIG. 7, it is assumed that a total of four zone microphones are arranged in the live venue LV1 and a total of four satellite speakers are arranged in the satellite venue STL1. Note that p, which indicates the number of satellite speakers arranged, is not limited to 4.

Suppose that the performer tracking system 2 acquires the position information of performer A at a certain time, and that the position indicated by that information lies within the sound collection zone ZN2 of the zone microphone ZM2. In the description of FIG. 7, this "certain time" may be referred to as the "specific time". In this case, based on the position information of performer A at the specific time, the sound image localization zone determination unit 15 determines the zone microphone ZM2 as the main peripheral microphone, that is, the microphone that mainly collects (records) sound near performer A's head or feet. As audio signals for performer A, not only the performer sound signal collected (recorded) by performer A's headset microphone and the zone sound signal of the zone microphone ZM2, but also the zone sound signals collected (recorded) by the other zone microphones ZM1, ZM3, and ZM4 are input to the mixing/level balance adjustment unit 17.

Suppose also that the performer tracking system 2 acquires the position information of performer B at the same specific time (see above), and that the position indicated by that information lies within the sound collection zone ZN3 of the zone microphone ZM3. In this case, based on the position information of performer B at the specific time, the sound image localization zone determination unit 15 determines the zone microphone ZM3 as the main peripheral microphone that mainly collects (records) sound near performer B's head or feet. As audio signals for performer B, not only the performer sound signal collected (recorded) by performer B's headset microphone and the zone sound signal of the zone microphone ZM3, but also the zone sound signals collected (recorded) by the other zone microphones ZM1, ZM2, and ZM4 are input to the mixing/level balance adjustment unit 17.
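One simple way to model the determination of the main peripheral microphone is to pick the zone whose center is nearest to the performer's position. This is a stand-in for the containment test implied by the description, and the zone center coordinates, performer positions, and function name below are hypothetical values chosen only so that performer A falls in ZN2 and performer B in ZN3.

```python
import math

# Hypothetical zone centers (X, Y) for the four zone microphones of the FIG. 7 example.
ZONE_CENTERS = {"ZM1": (-6.0, 0.0), "ZM2": (-2.0, 0.0),
                "ZM3": (2.0, 0.0), "ZM4": (6.0, 0.0)}


def select_main_peripheral_mic(position, zone_centers=ZONE_CENTERS):
    """Return (main microphone, other microphones) using the nearest zone center."""
    main = min(zone_centers,
               key=lambda mic: math.dist(position, zone_centers[mic]))
    others = [mic for mic in zone_centers if mic != main]
    return main, others


main_a, others_a = select_main_peripheral_mic((-1.5, 0.5))   # performer A (hypothetical position)
main_b, others_b = select_main_peripheral_mic((2.5, 1.0))    # performer B (hypothetical position)
```

With these values, performer A is assigned ZM2 with ZM1, ZM3, and ZM4 as the other zone microphones, and performer B is assigned ZM3 with ZM1, ZM2, and ZM4 as the others, mirroring the example of FIG. 7.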

ミキシング／レベルバランス調整部１７は、サテライト会場ＳＴＬ１において演者Ａ、演者Ｂの音場（例えば発話音声、頭上付近若しくは足元付近で演者Ａ、演者Ｂの動き等に起因して生じた音）の雰囲気を再生するために、演者Ａ、演者Ｂの演者音信号及びゾーンマイクＺＭ２、ＺＭ３のゾーン音信号のレベルを他の演者音信号及びゾーン音信号に比べて相対的に高くするようにレベル調整した上で各信号をミキシングする。このミキシングの処理は、サテライトスピーカＳＰｋ１～ＳＰｋ４のそれぞれ用に実行される。 The mixing/level balance adjustment unit 17 adjusts the levels of the performer sound signals of performers A and B and the zone sound signals of zone microphones ZM2 and ZM3 to be relatively higher than the other performer sound signals and zone sound signals, and then mixes each signal, in order to reproduce the atmosphere of the sound field for performers A and B (for example, sounds generated by speech, or sounds generated overhead or near the feet due to the movements of performers A and B) in the satellite venue STL1. This mixing process is performed for each of the satellite speakers SPk1 to SPk4.

That is, in the example of FIG. 7, the mixing/level balance adjustment unit 17 mixes, for each of the satellite speakers SPk1, SPk2, SPk3, and SPk4, the following signals: the performer sound signal of performer A, the performer sound signal of performer B, and the zone sound signals of the zone microphones ZM1, ZM2, ZM3, and ZM4. The signals mixed for each satellite speaker are input to the sound field reproduction processing unit 18.
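The level adjustment followed by mixing can be sketched as a weighted sum in which the performer sound signals and the zone sound signals of the main peripheral microphones receive a higher gain than the remaining zone sound signals. The gain values, the sample data, and the function name `mix_for_speaker` are hypothetical; the description states only that the levels are made relatively higher, not by how much.

```python
def mix_for_speaker(performer_signals, zone_signals, main_mics,
                    main_gain=1.0, other_gain=0.25):
    """Sum all signals sample by sample, weighting main peripheral microphones higher."""
    length = len(next(iter(zone_signals.values())))
    mixed = [0.0] * length
    for samples in performer_signals.values():
        for i, s in enumerate(samples):
            mixed[i] += main_gain * s
    for mic, samples in zone_signals.items():
        gain = main_gain if mic in main_mics else other_gain
        for i, s in enumerate(samples):
            mixed[i] += gain * s
    return mixed


performers = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
zones = {"ZM1": [0.4, 0.4], "ZM2": [0.8, 0.0],
         "ZM3": [0.0, 0.8], "ZM4": [0.4, 0.4]}
mixed = mix_for_speaker(performers, zones, main_mics={"ZM2", "ZM3"})
```

In this sketch the function would be called once per satellite speaker; any speaker-specific weighting is left to the subsequent sound image localization processing.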

Using the input signals for the respective satellite speakers, the sound field reproduction processing unit 18 acquires from the storage unit 13 the sound image localization parameters for the satellite speaker corresponding to the sound collection zone ZN2, and performs sound image localization (reproduction) processing so that, in the satellite venue STL1, performer A is perceived at the position corresponding to the sound collection zone ZN2 of the live venue LV1. Likewise, using the input signals for the respective satellite speakers, the sound field reproduction processing unit 18 acquires from the storage unit 13 the sound image localization parameters for the satellite speaker corresponding to the sound collection zone ZN3, and performs sound image localization (reproduction) processing so that, in the satellite venue STL1, performer B is perceived at the position corresponding to the sound collection zone ZN3 of the live venue LV1.

Next, the arrangement of satellite speakers in the satellite venue is described with reference to FIG. 8. FIG. 8 is a diagram showing an example arrangement of satellite speakers in the satellite venue. To keep the description of FIG. 8 simple, a total of 12 satellite speakers are assumed to be arranged in satellite venue STL1. Note that p, the number of satellite speakers arranged, need not be limited to 12.

As shown in FIG. 8, satellite venue STL1 has, at its center, a large number of seats for audience members, and a screen SCR1 is provided in front of the seats. The video output unit 14 is, for example, a projector, and projects onto screen SCR1 the data signal of the captured video of the performers sent from the live venue side communication unit 6 of live venue LV1. Visitors to satellite venue STL1 can thus watch the captured video projected on screen SCR1 and experience the performance, such as a play, taking place on stage STG1 (see FIG. 2) of live venue LV1.

Behind screen SCR1, for example, four satellite speakers SPk1, SPk2, SPk3, and SPk4 are arranged at regular intervals. Because satellite speakers SPk1 to SPk4 are arranged behind screen SCR1, the sound field reproduction processing unit 18 preferably uses them mainly for sound image localization of the sound field formed by the speech of the performers on stage STG1 (see FIG. 2) of live venue LV1 and by sounds generated near their heads or feet. By using the sound image localization parameters for satellite speakers SPk1 and SPk2 read from the storage unit 13, the sound field reproduction processing unit 18 can also localize the sound field (sound image SIm1) of a performer in live venue LV1 at an intermediate position between satellite speakers SPk1 and SPk2. Similarly, by using the sound image localization parameters for satellite speakers SPk2 and SPk3 read from the storage unit 13, it can localize the sound field (sound image SIm2) of a performer at an intermediate position between satellite speakers SPk2 and SPk3. Similarly, by using the sound image localization parameters for satellite speakers SPk3 and SPk4 read from the storage unit 13, it can localize the sound field (sound image SIm3) of a performer at an intermediate position between satellite speakers SPk3 and SPk4.
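The specification does not state what form the sound image localization parameters take. One common way to place a sound image between two adjacent speakers is constant-power amplitude panning, and the sketch below uses that technique purely as an illustrative stand-in. The function name `pan_gains` and the 0-to-1 position convention are assumptions.

```python
import math
from typing import Tuple


def pan_gains(position: float) -> Tuple[float, float]:
    """Constant-power gains for a speaker pair.

    position = 0.0 places the image at the first speaker,
    position = 1.0 at the second, and 0.5 midway between them.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    theta = position * math.pi / 2.0
    return math.cos(theta), math.sin(theta)


# Image midway between two adjacent speakers, e.g. SPk1 and SPk2 (sound image SIm1).
g_first, g_second = pan_gains(0.5)
```

With this rule the squared gains always sum to 1, so the perceived loudness stays roughly constant as the image moves between the two speakers.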

Here, adjacent satellite speakers SPk1 and SPk2 are preferably arranged so that the angle between the vector from listener PS1, seated in the audience, to satellite speaker SPk1 and the vector from listener PS1 to satellite speaker SPk2 is within 20 degrees. This allows the sound field reproduction processing unit 18 to perform sound image localization with high accuracy. The same condition applies not only to satellite speakers SPk1 and SPk2 but to every pair of adjacent satellite speakers arranged behind screen SCR1, that is, to the angle subtended at listener PS1 by satellite speakers SPk2 and SPk3 and by satellite speakers SPk3 and SPk4.
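The 20-degree condition can be checked for a candidate layout by computing the angle between the two listener-to-speaker vectors. The sketch below assumes a two-dimensional floor plan. The coordinates and the function name `subtended_angle_deg` are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def subtended_angle_deg(listener: Point, spk_a: Point, spk_b: Point) -> float:
    """Angle in degrees between the vectors listener->spk_a and listener->spk_b."""
    ax, ay = spk_a[0] - listener[0], spk_a[1] - listener[1]
    bx, by = spk_b[0] - listener[0], spk_b[1] - listener[1]
    dot = ax * bx + ay * by
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))


# Hypothetical layout in metres: listener PS1 at the origin, screen 10 m ahead.
listener_ps1 = (0.0, 0.0)
angle = subtended_angle_deg(listener_ps1, (-1.0, 10.0), (1.0, 10.0))
within_limit = angle <= 20.0
```

Running the same check for each adjacent pair (SPk2 and SPk3, SPk3 and SPk4) and for several seats gives a quick view of where the layout meets the preferred condition.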

Furthermore, satellite speakers SPk5, SPk6, SPk7, SPk8, SPk9, SPk10, SPk11, and SPk12 are arranged a certain distance apart so as to cover the sides of the wide seating area in satellite venue STL1. Because satellite speakers SPk5 to SPk12 are arranged so as to cover the sides of the seating area in satellite venue STL1, the sound field reproduction processing unit 18 preferably uses them mainly to output the audience-side sense of presence of live venue LV1, such as applause, cheers, and murmurs. Details of outputting the audience-side sense of presence using satellite speakers SPk5 to SPk12 (in other words, reproducing the speaker drive signal for each of satellite speakers SPk5 to SPk12) are described in Embodiment 2.

Next, the operation procedure of the realistic sound field reproduction system 1000 according to Embodiment 1 is described with reference to FIG. 9. FIG. 9 is a sequence diagram showing, in time series, an example of the operation procedure of the realistic sound field reproduction system 1000 according to Embodiment 1. The series of processes from step St1 to step St7 is executed in live venue LV1 and is repeated, as one processing unit, while a performance such as a play is in progress. To simplify the description of FIG. 9, a state in which one performer A (see FIG. 6), such as an actor, is on stage STG1 (see FIG. 2) is used as an example, but there may be two or more performers.

In FIG. 9, the performer tracking system 2 recognizes performer A on stage STG1 (see FIG. 2) of live venue LV1, and sends the recognition result, namely the identification information (sound source ID) of performer A, the position information, and the data signal of the captured video, to the encoder 4 via the mixer 1 (step St1). Having recognized performer A, the performer tracking system 2 also generates or updates the data of a matching table (see FIG. 6) containing the identification information (sound source ID) and position information of performer A, and stores the data in the matching table database 5 (step St2).

The headset microphone HM1 worn by performer A picks up (records) the speech of performer A on stage STG1 (see FIG. 2) and sends the resulting performer sound signal of performer A to the encoder 4 via the mixer 1 (step St3). Each of the zone microphones ZM1 to ZM7 arranged around stage STG1 (see FIG. 2) picks up (records) sound near the head or feet of performer A and sends the resulting zone sound signal to the encoder 4 via the mixer 1 (step St4). The Ambisonics microphone AMB1, arranged in a peripheral area away from stage STG1 (for example, between seats on the audience side), picks up (records) an audience-side presence sound signal, which is the recorded sound of the sense of presence in the audience-side area away from stage STG1 (specifically, applause, cheers, murmurs, and the like on the audience side), and sends it to the encoder 4 via the mixer 1 (step St5).

The encoder 4 acquires the data from the performer tracking system 2 (for example, the identification information and position information of the performer) whose identification information for performer A matches the identification information (for example, the sound source ID) attached to the performer sound signal data sent in step St3, either from the data sent in step St1 or by referring to the matching table database 5 (step St6). The encoder 4 associates the performer sound signal data of performer A with the identification information, position information, and captured video data (see step St1 or step St6) that share the same identification information (for example, by IP packetization), thereby generating IP packets for transmission and reception via network NW1 (step St6). The encoder 4 also generates IP packets for transmission and reception via network NW1 using the zone sound signals from each of zone microphones ZM1 to ZM7 and the audience-side presence sound signal from the Ambisonics microphone AMB1 (step St6). The encoder 4 sends each IP packet generated in step St6 to the satellite venue side communication unit 11 via the live venue side communication unit 6 and network NW1 (step St7).
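The association performed in step St6 can be viewed as a lookup keyed by the sound source ID. The following is a minimal sketch under stated assumptions: the class names `TrackingRecord` and `PerformerPayload`, the function `associate`, and the use of plain in-memory dictionaries in place of the matching table database 5 are all hypothetical, and actual IP packetization is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TrackingRecord:
    source_id: str                  # identification information (sound source ID)
    position: Tuple[float, float]   # position information on the stage


@dataclass
class PerformerPayload:
    source_id: str
    position: Tuple[float, float]
    audio: List[float]              # performer sound signal samples


def associate(source_id: str,
              audio: List[float],
              latest_tracking: Dict[str, TrackingRecord],
              matching_table: Dict[str, TrackingRecord]) -> Optional[PerformerPayload]:
    """Attach tracking data to audio that carries the same sound source ID.

    The data sent in step St1 is tried first; the matching table is the fallback.
    """
    record = latest_tracking.get(source_id)
    if record is None:
        record = matching_table.get(source_id)
    if record is None:
        return None  # no tracking data with a matching ID
    return PerformerPayload(source_id, record.position, audio)


# Hypothetical data: performer "A" is known only through the matching table.
matching_table = {"A": TrackingRecord("A", (2.0, 1.0))}
payload = associate("A", [0.1, 0.2], {}, matching_table)
```

Keying both lookups on the same ID is what lets the satellite side later pair each performer sound signal with the correct position.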

The decoder 12 acquires each IP packet sent in step St7 via the satellite venue side communication unit 11, decodes it, and extracts the various data signals (step St8).

Based on the data from the decoder 12 (that is, the identification information and position information of performer A), the sound image localization zone determination unit 15 determines a sound image localization zone for reproducing, in satellite venue STL1, the sound field (atmosphere) formed by the speech of performer A on stage STG1 (see FIG. 2) of live venue LV1 and by the footstep sounds produced as the performer stands still or moves (step St9). The sound image localization zone here is one of the sound collection zones ZN1 to ZN7 shown in FIG. 3. Based on the data from the decoder 12 (that is, the audience-side presence sound signal from the Ambisonics microphone AMB1) and the position information of each of satellite speakers SPk1 to SPkp stored in the storage unit 13, the audience-side presence parameter calculation unit 16 calculates stereophonic reproduction parameters for three-dimensionally reproducing, in satellite venue STL1, the sound field of the audience-side sense of presence in live venue LV1 (step St10). Using these stereophonic reproduction parameters, the audience-side presence parameter calculation unit 16 generates a speaker drive signal (an example of stereophonic reproduction sound) for each of satellite speakers SPk1 to SPkp and outputs it to the mixing/level balance adjustment unit 17 (step St11).
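The zone determination in step St9 maps a performer position to one of ZN1 to ZN7. The sketch below assumes, purely for illustration, that each zone is represented by a centre point and that the nearest centre wins. The coordinates, the function names, and the one-to-one naming link between zone ZNn and zone microphone ZMn are assumptions rather than details taken from the specification.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical zone centres on the stage, in metres.
ZONE_CENTERS: Dict[str, Point] = {
    "ZN1": (0.0, 0.0), "ZN2": (2.0, 0.0), "ZN3": (4.0, 0.0), "ZN4": (6.0, 0.0),
    "ZN5": (1.0, 2.0), "ZN6": (3.0, 2.0), "ZN7": (5.0, 2.0),
}


def determine_zone(position: Point) -> str:
    """Return the sound collection zone whose centre is nearest to the position."""
    def distance(zone: str) -> float:
        cx, cy = ZONE_CENTERS[zone]
        return math.hypot(position[0] - cx, position[1] - cy)
    return min(ZONE_CENTERS, key=distance)


def main_zone_mic(zone: str) -> str:
    """Assumed naming link: zone ZNn is covered by zone microphone ZMn."""
    return "ZM" + zone[2:]


# Re-evaluated for every performer whenever new position information arrives.
positions = {"A": (2.2, 0.1), "B": (3.9, 0.3)}
zones = {performer: determine_zone(pos) for performer, pos in positions.items()}
```

Calling `determine_zone` again on each position update gives the per-performer, per-update behaviour that the summary of Embodiment 1 describes for the determination unit.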

The sound image localization zone determination unit 15 reads the sound image localization parameters for each of satellite speakers SPk1 to SPkp from the storage unit 13 (step St12). The sound image localization zone determination unit 15 sends the information on the sound image localization zone determined in step St9 and the information on the sound image localization parameters for each of satellite speakers SPk1 to SPkp acquired in step St12 to the mixing/level balance adjustment unit 17 (step St13).

Using the information sent in step St13, the mixing/level balance adjustment unit 17 adjusts the balance of the signal levels among the performer sound signal of each performer picked up (recorded) by each of headset microphones HM1 to HM4, the zone sound signals from each of zone microphones ZM1 to ZM7, and the speaker drive signal for each satellite speaker (step St14). The mixing/level balance adjustment unit 17 mixes (that is, combines) the level-balanced signals (step St14) and sends the adjusted and mixed signals to the sound field reproduction processing unit 18 (step St15).

Using the sound image localization parameters for each satellite speaker and the signals output from the mixing/level balance adjustment unit 17, the sound field reproduction processing unit 18 executes sound field reproduction processing for reproducing, in satellite venue STL1, both the sound field (atmosphere) formed by the speech of the performers in live venue LV1 and the footstep sounds produced as they stand still or move, and the sound field (atmosphere) formed by the audience-side sense of presence in live venue LV1 (step St16). Specifically, to reproduce in satellite venue STL1 the sound field (atmosphere) formed by the performers' speech and footstep sounds, the sound field reproduction processing unit 18 uses the signals output from the mixing/level balance adjustment unit 17 to localize, via the satellite speakers, the sound images of the performers' speech and footstep sounds (step St16). Furthermore, to reproduce in satellite venue STL1 the sound field (atmosphere) formed by the audience-side sense of presence in live venue LV1, the sound field reproduction processing unit 18 outputs the signals from the mixing/level balance adjustment unit 17 (in particular, the speaker drive signal for each satellite speaker) via the corresponding satellite speakers (step St16).

As described above, in the realistic sound field reproduction system 1000 according to Embodiment 1, the realistic sound field reproduction device includes: an acquisition unit (satellite venue side communication unit 11) that acquires at least a speech sound signal (performer sound signal) recorded by a person microphone (headset microphone HM1 or the like) worn by at least one person (performer ACT1) who can move within an activity area (stage STG1) in a sound field recording space (live venue LV1), peripheral sound signals recorded by a plurality of peripheral microphones (zone microphones ZM1 to ZM7) arranged around the activity area, and position information of the person; a determination unit (sound image localization zone determination unit 15) that determines, based on the position information of the person, a main peripheral microphone (for example, zone microphone ZM2), which is one of the plurality of peripheral microphones and whose recording area covers the location of the person within the activity area; and a sound field reproduction unit (audience-side presence parameter calculation unit 16, mixing/level balance adjustment unit 17, and sound field reproduction processing unit 18) that executes sound field reproduction processing for reproducing the sound field of the sound field recording space using a plurality of speakers (satellite speakers SPk1 to SPkp) arranged in a sound field reproduction space (satellite venue STL1) different from the sound field recording space, based on at least the speech sound signal from the person microphone, a first peripheral sound signal from the main peripheral microphone (for example, the zone sound signal of zone microphone ZM2), and second peripheral sound signals from the peripheral microphones other than the main peripheral microphone (for example, the zone sound signals of zone microphones ZM1 and ZM3 to ZM7). The realistic sound field reproduction device can thereby reproduce, with high sensitivity, in the sound field reproduction space (satellite venue STL1) different from the sound field recording space, a sense of presence that includes the sound images from the various sound sources (for example, at least one performer) in the sound field recording space (live venue LV1).

The sound field reproduction unit (sound field reproduction processing unit 18) also executes the sound field reproduction processing so that, in the sound field reproduction space (satellite venue STL1), the sound field of the sound field recording space (live venue LV1) is reproduced with the first peripheral sound signal (the zone sound signal of zone microphone ZM2) emphasized over the second peripheral sound signals (for example, the zone sound signals of zone microphones ZM1 and ZM3 to ZM7). By relatively emphasizing the speech of a performer who is on the activity area (stage STG1) speaking lines or the like, together with the sounds near the performer's head or feet, the realistic sound field reproduction device can reproduce the sound field so that these sounds also stand out in satellite venue STL1, which improves the fidelity of the sound field reproduction.
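One simple way to emphasize the first peripheral sound signal over the second peripheral sound signals is to leave the main peripheral microphone at unity gain and attenuate the others by a fixed amount. The specification does not give the amount or the method, so the sketch below is an assumption: the 6 dB default and the function name `emphasis_gains` are hypothetical.

```python
from typing import Dict, List


def emphasis_gains(main_mic: str,
                   zone_mics: List[str],
                   attenuation_db: float = 6.0) -> Dict[str, float]:
    """Unity gain for the main peripheral microphone, attenuated gain for the rest."""
    other_gain = 10.0 ** (-attenuation_db / 20.0)  # convert dB to a linear factor
    return {mic: (1.0 if mic == main_mic else other_gain) for mic in zone_mics}


zone_mics = [f"ZM{i}" for i in range(1, 8)]   # ZM1 .. ZM7
gains = emphasis_gains("ZM2", zone_mics)      # performer currently in zone ZN2
```

The resulting gains could then be applied to the zone sound signals before mixing, so that the zone where the performer stands dominates the reproduced stage sound.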

The acquisition unit (satellite venue side communication unit 11) further acquires a peripheral area sound signal (audience-side presence sound signal) recorded by a sound recording device (Ambisonics microphone AMB1) arranged in a peripheral area (audience side) different from the activity area (stage STG1) in the sound field recording space (live venue LV1). The sound field reproduction unit (sound field reproduction processing unit 18) executes signal processing for reproducing the sound field of the peripheral area sound signal in the area of the sound field reproduction space that corresponds to the peripheral area of the sound field recording space. The realistic sound field reproduction device can thereby additionally reproduce the sounds picked up (recorded) by the sound recording device on the audience side of live venue LV1, such as applause, cheers, and murmurs (the audience-side sense of presence), as a sound field in the area of satellite venue STL1 that corresponds to the audience side.

The sound field reproduction unit (mixing/level balance adjustment unit 17) also mixes the speech sound signal (the performer sound signal of the performer), the first peripheral sound signal (the zone sound signal of zone microphone ZM2), and the second peripheral sound signals (for example, the zone sound signals of zone microphones ZM1 and ZM3 to ZM7), adjusts the balance of their signal levels, and outputs the result via each of the plurality of speakers (satellite speakers SPk1 to SPkp). The realistic sound field reproduction device can thereby reproduce in satellite venue STL1, with high accuracy, the atmosphere of the sound field created by the performer, including not only speech such as the performer's lines on the activity area (stage STG1) in live venue LV1 but also the sounds near the performer's head or feet.

The sound field reproduction unit (mixing/level balance adjustment unit 17) also mixes the speech sound signal (the performer sound signal of the performer), the first peripheral sound signal (the zone sound signal of zone microphone ZM2), the second peripheral sound signals (for example, the zone sound signals of zone microphones ZM1 and ZM3 to ZM7), and the peripheral area sound signal (audience-side presence sound signal), adjusts the balance of their signal levels, and outputs the result via each of the plurality of speakers (satellite speakers SPk1 to SPkp). The realistic sound field reproduction device can thereby reproduce in satellite venue STL1, with high accuracy, not only speech such as the performer's lines on the activity area (stage STG1) in live venue LV1 and the sounds near the performer's head or feet, but also the atmosphere of the audience-side sense of presence in live venue LV1.

The acquisition unit (satellite venue side communication unit 11) also repeatedly acquires the position information of the person, which may be updated periodically. The determination unit (sound image localization zone determination unit 15) determines the main peripheral microphone each time the position information of the person is updated. Each time the person (performer) moves on stage STG1 in live venue LV1, the realistic sound field reproduction device can thereby obtain the zone sound signal from the main peripheral microphone that matches the person's position information, so the sound field can be reproduced adaptively in satellite venue STL1 even when the performer's position information changes.

In addition, when there are a plurality of persons (performers such as actors), the determination unit (sound image localization zone determination unit 15) determines the main peripheral microphone for each person based on that person's position information. The realistic sound field reproduction device can thereby reproduce the sound image localization of each performer with high accuracy even when a plurality of performers are on stage STG1 in live venue LV1.

The sound field reproduction unit (for example, the sound field reproduction processing unit 18) also executes, as the signal processing, cancellation processing that uses the speech sound signal (the performer sound signal of the performer), the first peripheral sound signal (the zone sound signal of zone microphone ZM2), and the second peripheral sound signals (for example, the zone sound signals of zone microphones ZM1 and ZM3 to ZM7) as reference signals and cancels the components of the reference signals contained in the peripheral area sound signal (audience-side presence sound signal). The realistic sound field reproduction device can thereby effectively cancel the reference signal components considered to have leaked into the peripheral area sound signal and can generate a high-quality peripheral area sound signal.
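This passage does not name the algorithm used for the cancellation processing. A normalized least-mean-squares (NLMS) adaptive filter is one widely used way to remove a known reference from a signal that contains a filtered copy of it, and the sketch below uses it only as an illustrative stand-in. For brevity it handles a single reference signal, whereas the text describes several. The function name `nlms_cancel`, the tap count, the step size, and the synthetic signals are assumptions.

```python
import random
from typing import List


def nlms_cancel(primary: List[float],
                reference: List[float],
                taps: int = 4,
                mu: float = 0.1,
                eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively filtered copy of `reference` from `primary`.

    Returns the residual, i.e. the primary signal with the estimated
    reference leakage removed.
    """
    weights = [0.0] * taps
    history = [0.0] * taps            # most recent reference samples, newest first
    residual = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate          # what remains after cancellation
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual


# Synthetic illustration: stage speech leaks into the audience-side signal
# with one sample of delay, on top of a weaker audience ambience.
rng = random.Random(0)
speech = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ambience = [rng.gauss(0.0, 0.1) for _ in range(2000)]
leak = [0.0] + [0.6 * s for s in speech[:-1]]
audience_signal = [a + l for a, l in zip(ambience, leak)]
cleaned = nlms_cancel(audience_signal, speech)
```

After the filter has adapted, `cleaned` should consist mostly of the ambience component. With several reference signals, one option would be to run one such filter per reference and subtract all of the estimates.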

The sound field reproduction unit (for example, the sound field reproduction processing unit 18) also encodes the signal obtained after the cancellation processing and, based on the encoded signal, generates for each of the plurality of speakers (satellite speakers SPk1 to SPkp) a speaker drive signal for reproducing, in the sound field reproduction space, the sense of presence of the sound field in the sound field recording space. The realistic sound field reproduction device can thereby generate speaker drive signals capable of reproducing the audience-side sense of presence in live venue LV1 while taking into account the position information of each of the satellite speakers SPk1 to SPkp arranged in the space of satellite venue STL1.

The sound field reproduction unit (for example, the sound field reproduction processing unit 18) also outputs the speaker drive signal generated for each speaker (satellite speakers SPk1 to SPkp) from the corresponding speaker (satellite speaker). For example, the sound field reproduction processing unit 18 outputs the speaker drive signal generated for satellite speaker SPk1 from satellite speaker SPk1. The speaker drive signals generated for the other satellite speakers are likewise output from their corresponding satellite speakers. By outputting the speaker drive signal for each of the satellite speakers SPk1 to SPkp arranged in the space of satellite venue STL1, the realistic sound field reproduction device can reproduce the audience-side sense of presence in live venue LV1 with high accuracy in satellite venue STL1.

(Background leading to Embodiment 2 and later)
In recent years, scene-based stereophonic sound reproduction technology has attracted attention as a way to reproduce a sound field in real time. Scene-based stereophonic sound reproduction is a method in which signal processing is applied to a multichannel signal picked up with an Ambisonics microphone, in which a plurality of omnidirectional microphone elements are arranged on a rigid sphere or a plurality of directional microphones are arranged on a hollow spherical surface, so that speakers arranged to surround the listening environment (space) reproduce in real time a three-dimensional sound field as if the listener were present at the location where the Ambisonics microphone is installed.

As prior art relating to sound field reproduction, Reference Patent Document 1, for example, is known. Reference Patent Document 1 discloses an audio recording device that receives the sound signal picked up by a wireless microphone attached to a subject and generates a multichannel audio signal based on the audio signals picked up by a plurality of microphones. This audio recording device assigns the sound signal picked up by the wireless microphone to one or more arbitrary channels of the multichannel audio signal, combines them at arbitrary mixing ratios, and records the result on a recording medium together with the captured image signal.

(Reference Patent Document 1) Japanese Patent Application Laid-Open No. 2006-314078

Consider using the scene-based stereophonic sound reproduction technology described above as follows: an Ambisonics microphone (see above) is placed on the audience side of a live venue such as a large concert hall, the sense of presence on the audience side during a performance such as a play on the main stage (applause, roars, murmurs, cheers, and the like; hereinafter sometimes referred to as the "audience-side sense of presence") is picked up, and that audience-side sense of presence is reproduced in one or more satellite venues different from the live venue. Reference Patent Document 1 does not disclose in detail the relationship between the atmosphere of the sound field picked up by the microphones and the atmosphere of the sound field picked up by the wireless microphone, so applying the technology of Reference Patent Document 1 to this scenario is considered difficult. Moreover, even when placed on the audience side, an Ambisonics microphone is likely to pick up not only the audience-side sense of presence but also sound that has propagated through the space of the live venue, such as speech (for example, lines) of performers such as actors on the stage, sound effects, BGM (background music), and performance sounds from original sound sources. In that case, sound components other than the audience-side sense of presence in the live venue are mixed in, which makes it difficult to reproduce the sound field of the audience-side sense of presence with high accuracy for listeners in the satellite venue. Reference Patent Document 1 does not present a way to reproduce, with high sensitivity in a satellite venue, the sound field of the audience-side sense of presence picked up in the live venue.

Therefore, Embodiment 2 and the subsequent embodiments describe examples of a sound field presence reproduction device and a sound field presence reproduction method that reproduce, with high accuracy in at least one satellite venue, the atmosphere of the audience-side sense of presence in a sound collection space picked up using an ambisonics microphone. Note that "realistic sound field reproduction" in Embodiment 1 and "sound field presence reproduction" in the following embodiments are synonymous (in other words, the terms may be used interchangeably).

The following embodiments are described using, as an example, scene-based stereophonic sound reproduction technology in which an ambisonics microphone serves as the sound collection device that picks up sound source signals such as sounds, music, and human voices in a sound collection space (for example, a live venue). In this technology, the signals picked up by the multiple microphone elements of the ambisonics microphone (sound pickup signals), or a point source that can be expressed as a monaural signal, are expressed (encoded) as an intermediate representation ITMR1 using spherical harmonics (see FIG. 1) or as a B-format signal, so that the sound field arriving from all directions is handled uniformly in the ambisonics signal domain (described later). Decoding this intermediate representation then generates speaker drive signals and achieves the desired sound field reproduction in a reproduction space (for example, a satellite venue).

Hereinafter, a "sound field" is defined as a space (including a place) in which sound spreads. The sound propagating in a sound field includes sound from one or more sound sources propagating in the target space. Here, sound sources include not only the sources of the various performances (for example, a band performance or a musical) taking place on the main stage of a sound collection space such as the live venue LV1, but also sounds that convey a sense of presence, such as cheers, murmurs, roars, and applause arising on the audience seat side, away from the main stage in the live venue LV1.

(Embodiment 2)
First, the concept of scene-based stereophonic sound reproduction technology is described with reference to FIG. 10. FIG. 10 schematically shows the concept, from sound field pickup to sound field reproduction, of scene-based stereophonic sound reproduction technology using the ambisonics microphone AMB1. The ambisonics microphone AMB1 is placed at a predetermined position on the audience seat side of the sound collection space (for example, the live venue LV1). In the live venue LV1, the sound signals propagating through the space are picked up by the ambisonics microphone AMB1. For example, if a band of several people is performing on the main stage of the live venue LV1, sound signals from various sound sources such as vocals (singing voice), bass, guitar, and drums are picked up. If a musical is being performed, audio signals of the speech of one or more actors or other performers (sound sources) are picked up. At the same time, sound signals that convey the audience-side sense of presence, such as cheers, murmurs, roars, and applause arising on the audience seat side, are also picked up by the ambisonics microphone AMB1.

The ambisonics microphone AMB1, an example of a sound collection device, includes four microphone elements Mc1, Mc2, Mc3, and Mc4. With the direction Dr1 taken as the front, the microphone elements Mc1 to Mc4 are arranged in a hollow configuration so that they point from the center of the cube CB1 in FIG. 10 toward four of its vertices, and each is unidirectional toward its vertex. The microphone element Mc1 faces the front left up (FLU) direction of the ambisonics microphone AMB1 and mainly picks up sound from that direction. The microphone element Mc2 faces the front right down (FRD) direction and mainly picks up sound from that direction. The microphone element Mc3 faces the back left down (BLD) direction and mainly picks up sound from that direction. The microphone element Mc4 faces the back right up (BRU) direction and mainly picks up sound from that direction.

The sound pickup signals from these four directions (that is, FLU, FRD, BLD, and BRU) are called A-format signals. A-format signals cannot be used as they are and are converted into B-format signals, which serve as the intermediate representation ITMR1 and have directional characteristics (directivity). The B-format signals comprise, for example, a signal W for sound from all directions (omnidirectional), a signal X for sound in the front-back direction, a signal Y for sound in the left-right direction, and a signal Z for sound in the up-down direction. The A-format signals are converted into B-format signals by the following conversion formulas:

W = FLU + FRD + BLD + BRU
X = FLU + FRD - BLD - BRU
Y = FLU - FRD + BLD - BRU
Z = FLU - FRD - BLD + BRU
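As a minimal illustration of the conversion formulas, the following Python sketch applies them sample by sample to four capsule signals. The function name `a_to_b_format` and the use of plain lists of floats are assumptions made for this example; the publication does not specify an implementation, and no gain normalization or capsule equalization is applied here.

```python
from typing import List, Sequence, Tuple

Signal = Sequence[float]


def a_to_b_format(
    flu: Signal, frd: Signal, bld: Signal, bru: Signal
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Convert A-format capsule signals to B-format (W, X, Y, Z).

    Hypothetical helper: applies the four sum/difference formulas
    from the text to each sample, without any normalization.
    """
    if not (len(flu) == len(frd) == len(bld) == len(bru)):
        raise ValueError("all four capsule signals must have the same length")
    caps = list(zip(flu, frd, bld, bru))
    w = [a + b + c + d for a, b, c, d in caps]  # omnidirectional
    x = [a + b - c - d for a, b, c, d in caps]  # front minus back
    y = [a - b + c - d for a, b, c, d in caps]  # left minus right
    z = [a - b - c + d for a, b, c, d in caps]  # up minus down
    return w, x, y, z
```

A sound arriving only at the FLU capsule contributes positively to all four outputs, while one arriving only at the FRD capsule contributes positively to W and X and negatively to Y and Z, which matches the front/right/down orientation of that capsule.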

Combining the B-format signals W, X, Y, and Z yields a sound signal covering all directions: front-back, left-right, and up-down. By changing the signal level of each of W, X, Y, and Z before combining them, a sound signal with any desired directivity among those directions can be generated. For example, as shown in FIG. 10, consider a reproduction space modeled as a cube (for example, the satellite venue STL1) with a total of eight satellite speakers SPk1, SPk2, SPk3, SPk4, SPk5, SPk6, SPk7, and SPk8 placed at its vertices, and a three-dimensional coordinate system matching that of the sound collection space (for example, the live venue LV1), that is, with the front-back, left-right, and up-down directions parallel or identical. Eight speakers are used here only to keep the explanation simple; the number is of course not limited to eight.
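One common way to picture "changing the levels of W, X, Y, and Z before combining them" is a first-order virtual microphone. The sketch below is only an illustration under stated assumptions: the names `steer_weights` and `virtual_mic`, the `pattern` parameter, the angle convention (azimuth measured from the front toward the left, elevation upward), and the relative scaling between W and X/Y/Z are all choices made for this example and are not taken from the publication.

```python
import math
from typing import List, Sequence, Tuple


def steer_weights(
    azimuth_deg: float, elevation_deg: float, pattern: float = 0.5
) -> Tuple[float, float, float, float]:
    """Return gains (gW, gX, gY, gZ) for a first-order virtual microphone.

    pattern = 1.0 gives an omnidirectional pickup, 0.5 a cardioid-like
    pickup, and 0.0 a figure-eight (assumed convention for this sketch).
    """
    if not 0.0 <= pattern <= 1.0:
        raise ValueError("pattern must be between 0 and 1")
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector of the look direction: x = front, y = left, z = up.
    ux = math.cos(az) * math.cos(el)
    uy = math.sin(az) * math.cos(el)
    uz = math.sin(el)
    d = 1.0 - pattern
    return (pattern, d * ux, d * uy, d * uz)


def virtual_mic(
    w: Sequence[float],
    x: Sequence[float],
    y: Sequence[float],
    z: Sequence[float],
    weights: Tuple[float, float, float, float],
) -> List[float]:
    """Combine W, X, Y, Z sample by sample using the given gains."""
    gw, gx, gy, gz = weights
    return [gw * a + gx * b + gy * c + gz * e for a, b, c, e in zip(w, x, y, z)]
```

With these assumptions, a signal that is present equally in W and X (a source straight ahead) passes through a front-facing cardioid at full level and is suppressed by a rear-facing one.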

The position of each of the satellite speakers SPk1 to SPk8 can be specified by a predetermined distance and angles (azimuth θ_i and elevation φ_i) from a reference position (for example, the center position LSP1) of the reproduction space (for example, the satellite venue STL1). In FIG. 10, i is a variable identifying a satellite speaker placed in the reproduction space and, in the example of FIG. 10, takes an integer value from 1 to 8.

Suppose that a listener (the user) is at the center position LSP1 of the reproduction space (for example, the satellite venue STL1) and faces the front direction (Front). In this situation, the sound field of the sound collection space (for example, the live venue LV1) can be freely reproduced in the reproduction space based on the data of the B-format signals W, X, Y, and Z, obtained by encoding the A-format signals picked up in the sound collection space, and on the direction of each of the satellite speakers SPk1 to SPk8 in the reproduction space. In other words, when a listener is present in the reproduction space, sound from any three-dimensional direction can be reproduced and output, with the listener's front direction as the reference direction.
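To make the role of the speaker directions concrete, the following sketch derives one drive signal per speaker from W, X, Y, and Z by projecting the first-order components onto each speaker's direction as seen from the listening position. It is a simplified projection decoder written for illustration only: the function names, the ordering of the eight cube-vertex speakers, the angle convention (azimuth from the front toward the left, elevation upward), and the 1/p scaling are assumptions and do not describe the decoding unit of the publication.

```python
import math
from typing import List, Sequence, Tuple


def cube_speaker_directions() -> List[Tuple[float, float]]:
    """(azimuth, elevation) in degrees of the 8 cube vertices seen from the center.

    Hypothetical ordering: upper layer first, starting at front-left.
    """
    elev = math.degrees(math.atan(1.0 / math.sqrt(2.0)))  # about 35.26 degrees
    return [(az, el) for el in (elev, -elev) for az in (45.0, 135.0, -135.0, -45.0)]


def decode_to_speakers(
    w: Sequence[float],
    x: Sequence[float],
    y: Sequence[float],
    z: Sequence[float],
    speakers: Sequence[Tuple[float, float]],
) -> List[List[float]]:
    """Return one drive signal per speaker using a basic projection decode."""
    feeds: List[List[float]] = []
    count = len(speakers)
    for az_deg, el_deg in speakers:
        az, el = math.radians(az_deg), math.radians(el_deg)
        # Unit vector pointing from the listener toward this speaker.
        ux = math.cos(az) * math.cos(el)
        uy = math.sin(az) * math.cos(el)
        uz = math.sin(el)
        feeds.append(
            [(a + ux * b + uy * c + uz * e) / count for a, b, c, e in zip(w, x, y, z)]
        )
    return feeds
```

For an input whose X component is positive (energy arriving from the front), the front speakers receive a larger drive level than the rear ones, which is the qualitative behavior the paragraph describes.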

Next, the basis of the ambisonics components based on the spherical harmonic expansion for order n and degree m is described with reference to FIG. 11. FIG. 11 shows an example of that basis.

The horizontal axis (m) of FIG. 11 indicates the degree, and the vertical axis (n) indicates the order. The degree m takes values from -n to +n. The spherical harmonics up to order n = N comprise a total of (N+1)^2 basis functions. For example, when n = N = 0, one basis function (the omnidirectional B-format signal W) is obtained. When n = N = 1, four basis functions are obtained: the omnidirectional B-format signal W corresponding to (n, m) = (0, 0), the front-back B-format signal X corresponding to (n, m) = (1, -1), the up-down B-format signal Z corresponding to (n, m) = (1, 0), and the left-right B-format signal Y corresponding to (n, m) = (1, 1). The same applies for n = N = 2 and above, so the description is omitted.
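The count of (N+1)^2 basis functions follows from listing every pair (n, m) with 0 <= n <= N and -n <= m <= n. The short sketch below enumerates those pairs; the function name `basis_indices` is hypothetical and used only for this illustration.

```python
from typing import List, Tuple


def basis_indices(max_order: int) -> List[Tuple[int, int]]:
    """List all (order n, degree m) pairs up to and including max_order."""
    if max_order < 0:
        raise ValueError("max_order must be non-negative")
    return [(n, m) for n in range(max_order + 1) for m in range(-n, n + 1)]
```

Each order n adds 2n + 1 pairs, and summing 2n + 1 for n from 0 to N gives (N+1)^2, so order 0 yields one pair, order 1 yields four, and order 2 yields nine.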

Spherical harmonics are known to show increasing spatial periodicity as n and m increase. Different combinations of n and m therefore make it possible to express B-format signals with different directional patterns (directivities). If the index for order n and degree m is defined as K = n(n+1) + m based on Ambisonics Channel Numbering (ACN), the spherical harmonics can be expressed in vector form as in Equation (1). In Equation (1), the superscript T denotes transposition.
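The ACN index K = n(n+1) + m maps each (n, m) pair to a unique position in the vector. A small sketch of the mapping and its inverse is given here for illustration; the function names are hypothetical, and the inverse relies on the fact that n is the integer square root of K.

```python
import math
from typing import Tuple


def acn_index(n: int, m: int) -> int:
    """Return the ACN index K = n(n+1) + m for order n and degree m."""
    if n < 0 or abs(m) > n:
        raise ValueError("require n >= 0 and -n <= m <= n")
    return n * (n + 1) + m


def acn_to_order_degree(k: int) -> Tuple[int, int]:
    """Recover (n, m) from an ACN index K."""
    if k < 0:
        raise ValueError("K must be non-negative")
    n = math.isqrt(k)      # n*n <= K <= n*n + 2n for every valid K
    m = k - n * (n + 1)
    return n, m
```

Because order n occupies indices n^2 through n^2 + 2n, a truncation at order N uses exactly the indices 0 through (N+1)^2 - 1.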

Next, an outline of the operation of the sound field presence reproduction system is described with reference to FIG. 12. FIG. 12 schematically shows an example of that outline. In FIG. 12, the sound collection space in which the ambisonics microphone AMB1 is placed is illustrated as a live venue LV1 where a band performs with various sound sources such as vocals, bass, guitar, and drums. As noted above, however, the event in the live venue LV1 is not limited to a band performance; it may be a musical performed by one or more actors or other performers, or a concert or orchestral performance with multiple instruments. The same applies hereinafter.

As shown in FIG. 12, the live venue LV1 has a main stage STG1 on which a band is performing. During the performance, for example, the audio signal SS2 of the singing voice of the vocalist (an example of a sound source), the sound signal SS1 of the bass (an example of a sound source), and the sound signal SS3 of the guitar (an example of a sound source) propagate widely through the space of the live venue LV1 and reach the audience seats. These signals may travel through the space directly from each source position to the audience seats, or may be reproduced through a public address device such as a loudspeaker installed in the live venue LV1 before reaching them. The ambisonics microphone AMB1 is placed at a predetermined position on the audience seat side of the live venue LV1 (for example, the center of the audience seats), mainly to pick up the audience-side sense of presence. It therefore mainly picks up sounds that convey that sense of presence, such as cheers, roars, murmurs, and applause from the audience during the band performance.

However, as described above, the bass sound signal SS1, the vocal audio signal SS2, and the guitar sound signal SS3 propagate through the space of the live venue LV1 during the band performance. As a result, the components DS1, DS2, and DS3 of the diffuse sound (including reverberation; the same applies hereinafter) of the sound signal SS1, the audio signal SS2, and the sound signal SS3 are picked up as sound signals by the ambisonics microphone AMB1. Because the ambisonics microphone AMB1 thus picks up the unwanted diffuse sound components DS1, DS2, and DS3, it has been difficult with conventional scene-based stereophonic sound reproduction technology to reproduce the audience-side sense of presence of the live venue LV1 with high accuracy at the satellite venue STL1.

Therefore, the following embodiments describe an example of a sound field presence reproduction system that reproduces, with high accuracy in at least one satellite venue, the atmosphere of the audience-side sense of presence in a sound collection space picked up using an ambisonics microphone.

Next, the system configuration and an outline of the operation of the sound field presence reproduction system 100 according to Embodiment 2 are described with reference to FIGS. 13 and 14. FIG. 13 is a block diagram showing an example of the system configuration of the sound field presence reproduction system 100 according to Embodiment 2. FIG. 14 shows an example of the operation of the sound field presence reproduction system 100 of FIG. 13, from picking up the sound field presence to reproducing it.

The sound field presence reproduction system 100 includes a sound field presence sound pickup device 10 and a sound field presence reproduction device 20, which are connected via a network NW1 so that they can exchange data. The network NW1 may be a wired network or a wireless network. The wired network is, for example, at least one of a wired LAN (Local Area Network), a wired WAN (Wide Area Network), and power line communication (PLC: Power Line Communication), and may be any other network configuration capable of wired communication. The wireless network is at least one of a wireless LAN such as Wi-Fi (registered trademark), a wireless WAN, short-range wireless communication such as Bluetooth (registered trademark), and a mobile communication network such as 4G or 5G, and may be any other network configuration capable of wireless communication.

The sound field presence sound pickup device 10 is placed in the sound collection space (for example, the live venue LV1) and includes the ambisonics microphone AMB1, an A/D converter 7, and individual sound pickup microphones M1, ..., Mn. Here, n is the number of individual sound sources in the live venue LV1 (for example, independent sources such as vocals, bass, and guitar in a band performance) and is an integer of 2 or more. The sound field presence sound pickup device 10 only needs to include at least the ambisonics microphone AMB1; the A/D converter 7 may instead be provided in the sound field presence reproduction device 20.

The ambisonics microphone AMB1 includes four microphone elements Mc1, Mc2, Mc3, and Mc4. The microphone element Mc1 picks up sound from the front upper left, the microphone element Mc2 from the front lower right, the microphone element Mc3 from the rear lower left, and the microphone element Mc4 from the rear upper right (see FIG. 10 for each direction). The ambisonics microphone AMB1 may include more unidirectional microphone elements than the four elements Mc1, Mc2, Mc3, and Mc4 in the hollow configuration, or may include omnidirectional microphone elements arranged on a rigid sphere. Using an ambisonics microphone with a larger number of microphone elements allows the encoding unit 22 of the sound field presence reproduction device 20 to synthesize ambisonics signals of second order or higher. The signals picked up by the microphone elements of the ambisonics microphone AMB1 (sound pickup signals) are input to the A/D converter 7.

At least the A/D converter 7 is implemented by a semiconductor chip on which at least one electronic device such as a CPU (Central Processing Unit), a DSP (Digital Signal Processor), a GPU (Graphics Processing Unit), or an FPGA (Field Programmable Gate Array) is mounted, or by dedicated hardware.

The A/D converter 7 converts the analog sound pickup signal from each microphone element of the ambisonics microphone AMB1 into a digital sound pickup signal. The converted sound pickup signal is transmitted to the sound field presence reproduction device 20 via a communication interface (not shown) of the sound field presence sound pickup device 10 and the network NW1.

The individual sound pickup microphone M1 picks up the individual source sound (first sound source) produced by a distinct sound source (for example, the vocalist of a band or a performer in a musical) during an event such as a band performance or a musical on the main stage STG1 (see FIG. 12) of the live venue LV1. The individual sound pickup microphone M1 may be, for example, a headset microphone worn by the vocalist of the band or by a performer in the musical. The individual source sound signal picked up by the individual sound pickup microphone M1 is transmitted to the sound field presence reproduction device 20 via a communication interface (not shown) of the sound field presence sound pickup device 10 and the network NW1.

Similarly, the individual sound pickup microphone Mn picks up the individual source sound (n-th sound source) produced by a distinct sound source (for example, the guitar of a band, or sound effects or BGM (Background Music) in a musical) during an event such as a band performance or a musical on the main stage STG1 (see FIG. 12) of the live venue LV1. The individual sound pickup microphone Mn may be, for example, a headset microphone worn by the guitarist of the band, or a microphone capable of picking up the sound effects or BGM of the musical. The individual source sound signal picked up by the individual sound pickup microphone Mn is transmitted to the sound field presence reproduction device 20 via a communication interface (not shown) of the sound field presence sound pickup device 10 and the network NW1.

The sound field presence reproduction device 20 is placed in the reproduction space (for example, the satellite venue STL1) and includes echo cancellation units 21, ..., 2n, an encoding unit 22, a microphone element direction designation unit 23, a speaker direction designation unit 24, a decoding unit 25, a sound field reproduction unit 26, and satellite speakers SPk1, ..., SPkp. Here, p is the number of satellite speakers placed in the satellite venue STL1 and is an integer of 2 or more. The n that gives the number of echo cancellation units 21 to 2n is the same as the n that gives the number of individual sound pickup microphones M1 to Mn. In other words, the sound field presence reproduction device 20 has as many echo cancellation units as there are kinds of sound sources picked up by the individual sound pickup microphones. The configuration of the echo cancellation units 21 to 2n, the encoding unit 22, the microphone element direction designation unit 23, the speaker direction designation unit 24, the decoding unit 25, and the sound field reproduction unit 26 in FIG. 13 corresponds to the configuration of the audience-side presence parameter calculation unit 16 and the sound field reproduction processing unit 18 according to Embodiment 1 (see FIG. 1).

The echo cancellation unit 21 receives the sound pickup signal of each microphone element of the ambisonics microphone AMB1 sent from the sound field presence sound pickup device 10 (A/D converter 7 side), and also receives, as a first reference signal M1S, the individual source sound signal of the first sound source (see above) sent from the sound field presence sound pickup device 10 (individual sound pickup microphone M1 side). The echo cancellation unit 21 performs cancellation processing (for example, echo cancellation processing) to remove the component of the first reference signal M1S (that is, the individual source sound signal picked up by the individual sound pickup microphone M1) contained in the sound pickup signal of each microphone element of the ambisonics microphone AMB1. The echo cancellation unit 21 outputs the signal after cancellation (first sound pickup signal) to the encoding unit 22.

Similarly, the echo cancellation unit 2n receives the sound pickup signal of each microphone element of the ambisonics microphone AMB1 sent from the sound field presence sound pickup device 10 (A/D converter 7 side), and also receives, as an n-th reference signal MnS, the individual source sound signal of the n-th sound source (see above) sent from the sound field presence sound pickup device 10 (individual sound pickup microphone Mn side). The echo cancellation unit 2n performs cancellation processing (for example, echo cancellation processing) to remove the component of the n-th reference signal MnS (that is, the individual source sound signal picked up by the individual sound pickup microphone Mn) contained in the sound pickup signal of each microphone element of the ambisonics microphone AMB1. The echo cancellation unit 2n outputs the signal after cancellation (n-th sound pickup signal) to the encoding unit 22.

Here, each of the echo cancellation units 21 to 2n may be configured, for example, as an echo canceller using an adaptive filter that operates in the time domain. Such an echo canceller can be configured, for example, as a Single Channel Echo Canceller. Accordingly, as shown in FIG. 14, each of the echo cancellation units 21 to 2n can be built from as many Single Channel Echo Cancellers as there are microphone elements in the ambisonics microphone AMB1 (for example, four). The Single Channel Echo Canceller may have, for example, the configuration disclosed in the Reference Non-Patent Document. With this configuration, particularly when the reference signals are uncorrelated with one another (in other words, when crosstalk components fall below a predetermined threshold at which they can be regarded as substantially absent), the component of the reference signal (that is, the individual source sound signal picked up by an individual sound pickup microphone) contained in the sound pickup signal of each microphone element of the ambisonics microphone AMB1 can be cancelled (suppressed) with high accuracy. Alternatively, the echo cancellation units 21 to 2n may forward-transform the time-domain signal using a DFT (Discrete Fourier Transform) or the like and perform echo cancellation with an adaptive filter in the frequency domain or subband domain; the cancelled signal may then be inverse-transformed back to the time domain using an IFFT (Inverse Fast Fourier Transform) or the like before the subsequent processing.
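As one possible illustration of a time-domain adaptive-filter echo canceller, the sketch below uses a normalized LMS (NLMS) update, a widely used adaptive algorithm. The publication does not state which adaptive algorithm is used, so NLMS, the function names `nlms_cancel` and `cancel_all_elements`, and the parameters `taps`, `mu`, and `eps` are all assumptions made for this example. One canceller instance is run per microphone element, all sharing the same reference signal.

```python
from typing import List, Sequence


def nlms_cancel(
    mic: Sequence[float],
    ref: Sequence[float],
    taps: int = 8,
    mu: float = 0.5,
    eps: float = 1e-8,
) -> List[float]:
    """Suppress the component of `ref` contained in `mic` (single channel).

    Returns the residual (mic minus the adaptively estimated reference
    component), which stands in for the signal after cancellation.
    """
    weights = [0.0] * taps   # adaptive FIR coefficients
    history = [0.0] * taps   # latest reference samples, newest first
    residual: List[float] = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        estimate = sum(wi * hi for wi, hi in zip(weights, history))
        err = d - estimate
        norm = sum(h * h for h in history) + eps
        # NLMS update: step size normalized by the reference energy.
        weights = [wi + mu * err * hi / norm for wi, hi in zip(weights, history)]
        residual.append(err)
    return residual


def cancel_all_elements(
    mic_channels: Sequence[Sequence[float]], ref: Sequence[float], **kwargs
) -> List[List[float]]:
    """Run one single-channel canceller per microphone element."""
    return [nlms_cancel(channel, ref, **kwargs) for channel in mic_channels]
```

In this sketch, handling n reference signals would mean applying the canceller once per reference, for example in cascade on each microphone element signal; whether the publication's units are arranged that way is not specified in the text here.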

<Reference Non-Patent Document> Chapter 5, Acoustic Echo Canceller, "Configuration example of an adaptive filter" (see Fig. 5.2), p. 4/(17), The Institute of Electronics, Information and Communication Engineers, 2012, [retrieved September 2, 2022], Internet <URL: https://www.ieice-hbkb.org/files/02/02gun_06hen_05.pdf>

Note that each of the echo cancellation units 21 to 2n is provided in order to cancel (suppress) the individual source sounds that have propagated through the sound collection space (for example, the live venue LV1). The echo cancellation units 21 to 2n may therefore be placed, together with the encoding unit 22, either on the sound collection space side (for example, the live venue LV1) or on the reproduction space side (for example, the satellite venue STL1). When they are placed on the sound collection space side, only the first-order Ambisonics signal output by the encoding unit 22 (that is, the audience-side presence component) is sent to the sound field presence reproduction device 20. When they are placed on the reproduction space side, both the first-order Ambisonics signal output by the encoding unit 22 (that is, the audience-side presence component) and the individual source signals are sent to the sound field presence reproduction device 20. Alternatively, only the echo cancellation units 21 to 2n may be placed on the sound collection space side, with the encoding unit 22 placed in the reproduction space. In that case, only the output signal of the echo cancellation unit 2n is sent to the sound field presence reproduction device 20.

On the reproduction space side (for example, the satellite venue STL1), the sound field presence reproduction device 20 may also output the individual source signals picked up by the individual pickup microphones M1 to Mn of the sound field presence pickup device 10, for the purpose of reproducing the sound field presence, either from the satellite speakers SPk1 to SPkp or from other satellite speakers (not shown) provided for the individual source signals.

The encoding process performed by the encoding unit 22 is described in detail below.

In general, the sound pressure p observed (picked up) at radius r in an arbitrary direction (θ, φ) on a sphere is known to be expanded, for wavenumber k, as in Equation (4) with the spherical harmonics of Equation (2) as the basis; this expansion is the solution of the interior problem of the wave equation in the spherical harmonic domain. In Equation (4), A_n^m is an expansion coefficient and R_n(kr) is the radial function term. The infinite sum over the order n is approximated by truncating it at a finite order N, and the accuracy of the sound field reproduction varies with this truncation order N. Hereinafter, the truncation order is denoted N.
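The numbered equations are not reproduced in this text. For orientation only, a commonly used form of the truncated expansion that matches the symbols named above is sketched below; the notation Y_n^m for the spherical harmonic basis and the exact arrangement of the terms are assumptions, and the published Equation (4) may be written differently.

```latex
p(k, r, \theta, \phi) \;\approx\; \sum_{n=0}^{N} \sum_{m=-n}^{n} A_n^m \, R_n(kr) \, Y_n^m(\theta, \phi)
```

Truncating at order N leaves (N + 1)^2 expansion coefficients, so N = 1 gives four coefficients, which is consistent with the first-order Ambisonics signal and the four microphone elements used in the examples of this disclosure.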

In Equation (6), i is the imaginary unit, j_n(kr) is the spherical Bessel function of order n, and j'_n(kr) is its derivative. In the present disclosure, the expansion coefficient vector γ_n^m for this plane wave is treated as the B-format signal (intermediate representation) output by the encoding process of the encoding unit 22. Hereinafter, this expansion coefficient vector may be referred to as an Ambisonics-domain signal, that is, a signal in the Ambisonics domain as distinct from the time domain, or simply as an Ambisonics signal.

More specifically, the encoding process of the encoding unit 22 converts the pickup signals output from the echo cancellation units 21 to 2n, which are time-domain signals from which the reference signal components have been cancelled, into an Ambisonics signal (for example, a first-order Ambisonics signal). This Ambisonics signal is then decoded by the decoding unit 25 and converted into speaker drive signals.

The sound field reproduction unit 26 converts the digital speaker drive signal for each satellite speaker, output from the decoding unit 25, into an analog speaker drive signal, amplifies it, and outputs (plays) it from the corresponding satellite speaker.

The satellite speakers SPk1, ..., SPkp are placed at the vertices (see FIG. 10) of the reproduction space (for example, the satellite venue STL1), which is modeled as a cube, and reproduce the sound field based on the speaker drive signals from the sound field reproduction unit 26. The number of speakers may be varied according to the sound field to be reproduced. Fewer than p satellite speakers (eight in the example of FIG. 10) may be used, for instance when a particular direction is not reproduced, or by combining a generally known virtual sound image generation method such as a transaural system or VBAP (Vector Based Amplitude Panning). Conversely, more than p satellite speakers may be used. The speakers may also be placed somewhere other than the vertices of the reproduction space, provided that they surround the reference position (for example, the center position LSP1) of the satellite venue STL1.
Instead of the satellite speakers, the sound field reproduction unit 26 may output signals to a binaural playback device, such as headphones or earphones, worn by the listener (user). When supplying signals to such a binaural playback device, the sound field reproduction unit 28 may generate playback signals corresponding to azimuths of ±90° by the decoding process described later. Alternatively, it may generate virtual sound images in a plurality of directions surrounding the head and produce the playback signals by applying, to the virtual sound image in each direction, the transfer characteristic for that direction that lets the user perceive a three-dimensional sound image, such as an HRTF (Head Related Transfer Function), either by multiplication in the frequency domain or by convolution in the time domain. As a result, sound field reproduction is not limited to the satellite speakers SPk1 to SPkp placed in the satellite venue STL1; the sound field can also be reproduced on a playback device (for example, the headphones or earphones mentioned above) worn by a listener (user) in the satellite venue STL1.
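The time-domain convolution variant of the binaural rendering described above might look like the following sketch. The function name `binaural_render`, the array layout, and the toy impulse responses are hypothetical; a real system would use measured head-related impulse responses for each virtual direction.

```python
import numpy as np

def binaural_render(virtual_signals, hrirs):
    """Mix virtual sound images into two ear signals by time-domain convolution.

    virtual_signals: (directions, samples) array, one signal per virtual direction.
    hrirs: (directions, 2, length) array of impulse responses (left ear, right ear).
    """
    n_dirs, n_samples = virtual_signals.shape
    length = hrirs.shape[2]
    ears = np.zeros((2, n_samples + length - 1))
    for d in range(n_dirs):
        for ear in range(2):
            ears[ear] += np.convolve(virtual_signals[d], hrirs[d, ear])
    return ears

# Toy example: two virtual images at the left and right of the head.
rng = np.random.default_rng(3)
T = 1000
sig = rng.standard_normal((2, T))
hrirs = np.zeros((2, 2, 8))
hrirs[0, 0, 0] = 1.0   # left image, left ear: direct
hrirs[0, 1, 3] = 0.6   # left image, right ear: delayed and attenuated
hrirs[1, 1, 0] = 1.0   # right image, right ear: direct
hrirs[1, 0, 3] = 0.6   # right image, left ear: delayed and attenuated
binaural = binaural_render(sig, hrirs)
```

The frequency-domain variant mentioned in the text multiplies the spectra instead; for equal zero-padded lengths the two give the same result.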

The processing performed by the decoding unit 25 is described in detail below.

Next, the operating procedure by which the sound field presence reproduction device 20 reproduces the sound field presence is described with reference to FIG. 15. FIG. 15 is a flowchart showing, in time series, an example of the operating procedure for sound field presence reproduction by the sound field presence reproduction device 20 according to Embodiment 2.

In FIG. 15, the Ambisonics microphone AMB1 of the sound field presence pickup device 10 picks up the sound occurring around the predetermined audience-side position at which it is placed in the sound collection space (for example, the live venue LV1), such as sound that conveys the audience-side presence (step St21). The pickup signal of each microphone element of the Ambisonics microphone AMB1 obtained in step St21 is transmitted to the sound field presence reproduction device 20B. As noted above, however, the sound picked up in step St21 includes not only the sound that conveys the audience-side presence but also sound from one or more sources, such as a musical or theatrical performance on the main stage STG1 (see FIG. 12) of the sound collection space. The individual source sounds from those one or more sources, picked up by the individual pickup microphones M1 to Mn (in other words, the first reference signal M1S to the n-th reference signal MnS), are also transmitted to the sound field presence reproduction device 20 (step St21).

In each of the echo cancellation units 21 to 2n, the sound field presence reproduction device 20 repeatedly executes, for each reference signal and on the time axis, the echo cancellation process described above, taking the pickup signal of each microphone element of the Ambisonics microphone AMB1 as the main signal and the pickup signal of each of the individual pickup microphones M1 to Mn as a reference signal (step St22). More specifically, the echo cancellation unit 21 of the sound field presence reproduction device 20 receives the signals sent in step St21 (specifically, the pickup signal of each microphone element of the Ambisonics microphone AMB1 and the corresponding first reference signal M1S) and executes, on the time axis, a cancellation process (for example, echo cancellation) that removes the component of the first reference signal M1S (that is, the individual source signal picked up by the individual pickup microphone M1) contained in the pickup signal of each microphone element (step St22).
Similarly, the echo cancellation unit 2n of the sound field presence reproduction device 20 receives the signals sent in step St21 (specifically, the pickup signal of each microphone element of the Ambisonics microphone AMB1 and the corresponding n-th reference signal MnS) and executes, on the time axis, a cancellation process (for example, echo cancellation) that removes the component of the n-th reference signal MnS (that is, the individual source signal picked up by the individual pickup microphone Mn) contained in the pickup signal of each microphone element (step St22).
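Step St22 can be pictured as a bank of single-channel cancellers, one per combination of reference signal and microphone element. The sketch below assumes that the units are applied one after another to the same element signals and that each canceller uses an NLMS update; neither choice is stated in the disclosure, and the names `nlms_cancel` and `cancel_all_references` and the synthetic signals are hypothetical.

```python
import numpy as np

def nlms_cancel(main, ref, taps=32, mu=0.1, eps=1e-8):
    """Single-channel NLMS canceller: returns `main` with the `ref` component removed."""
    w = np.zeros(taps)
    buf = np.zeros(taps)
    out = np.empty(len(main))
    for t in range(len(main)):
        buf = np.roll(buf, 1)
        buf[0] = ref[t]
        e = main[t] - w @ buf
        w += mu * e * buf / (buf @ buf + eps)
        out[t] = e
    return out

def cancel_all_references(mic_signals, refs):
    """Apply one canceller per (reference, microphone element) pair, reference by reference."""
    cleaned = np.array(mic_signals, dtype=float)
    for ref in refs:                         # plays the role of one echo cancellation unit
        for k in range(cleaned.shape[0]):    # one single-channel canceller per element
            cleaned[k] = nlms_cancel(cleaned[k], ref)
    return cleaned

# Synthetic scene: 4 microphone elements, 2 mutually uncorrelated stage sources.
rng = np.random.default_rng(1)
T, n_elements, n_refs = 6000, 4, 2
refs = rng.standard_normal((n_refs, T))
ambience = 0.05 * rng.standard_normal((n_elements, T))
mics = ambience.copy()
for k in range(n_elements):
    for i in range(n_refs):
        path = 0.3 * rng.standard_normal(8)  # hypothetical path from source i to element k
        mics[k] += np.convolve(refs[i], path)[:T]
cleaned = cancel_all_references(mics, refs)
n_cancellers = n_elements * n_refs           # 8 single-channel cancellers in this example
```

A small step size is used here because, while one reference is being cancelled, the remaining references act as interference on the adaptation.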

As described above, the sound field presence reproduction device 20 according to Embodiment 2 includes: an acquisition unit (echo cancellation units 21 to 2n) that acquires a pickup signal (diffuse sound components DS1 to DS3) picked up by a sound pickup device (Ambisonics microphone AMB1) placed in the sound collection space (live venue LV1) and the source signals (sound signal SS1, voice signal SS2, sound signal SS3) of one or more sound sources in the sound collection space; a cancellation unit (echo cancellation units 21 to 2n) that takes the source signals as reference signals and executes, on the time axis, a cancellation process that removes the reference signal components contained in the pickup signal; the encoding unit 22, which encodes the signal after the cancellation process; a generation unit (decoding unit 25) that generates, based on the encoded signal and for each of a plurality of speakers (satellite speakers SPk1 to SPkp) placed in a reproduction space (satellite venue STL1) different from the sound collection space, a speaker drive signal for reproducing the sound field presence of the sound collection space in the reproduction space; and the sound field reproduction unit 26, which outputs the speaker drive signal of each speaker from that speaker.
By cancelling on the time axis the components of the one or more individual sound sources (reference signals) in the live venue LV1 that are picked up by the Ambisonics microphone AMB1, the sound field presence reproduction device 20 according to Embodiment 2 can thus reproduce, with high accuracy and in at least one satellite venue, the audience-side atmosphere of the sound collection space (live venue LV1) that the Ambisonics microphone AMB1 mainly picks up.

The sound field presence reproduction device 20 further includes a microphone element direction specifying unit 23 that specifies direction information for the plurality of microphone elements Mc1 to Mc4 of the sound pickup device (Ambisonics microphone AMB1). The encoding unit 22 executes the encoding process using the direction information of each of the microphone elements Mc1 to Mc4 and the signal after the cancellation process. By taking the direction information of each of the microphone elements Mc1 to Mc4 into account, the sound field presence reproduction device 20 can thereby generate an Ambisonics signal having a plurality of directional resolutions (see the B-format signal in FIG. 10).
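One simple way to use the element directions in the encoding step is a least-squares fit of four first-order channels to the element signals, sketched below. This is an assumption-laden illustration: it treats each element signal as the sound field sampled in that element's direction, ignores the radial function term and capsule directivity, and uses a channel order and scaling (ACN order with N3D scaling) and a tetrahedral layout that are not taken from the disclosure. The function names are hypothetical.

```python
import numpy as np

def sh_first_order(azimuth, elevation):
    """Real first-order spherical harmonics, channel order W, Y, Z, X (N3D scaling)."""
    az = np.asarray(azimuth, dtype=float)
    el = np.asarray(elevation, dtype=float)
    x = np.cos(el) * np.cos(az)
    y = np.cos(el) * np.sin(az)
    z = np.sin(el)
    return np.stack([np.ones_like(x), np.sqrt(3) * y, np.sqrt(3) * z, np.sqrt(3) * x], axis=-1)

def encode_first_order(element_signals, element_az, element_el):
    """Least-squares fit of four Ambisonics channels to the element signals."""
    Y = sh_first_order(element_az, element_el)     # shape (elements, 4)
    return np.linalg.pinv(Y) @ element_signals     # shape (4, samples)

# Hypothetical tetrahedral directions for the elements Mc1 to Mc4.
el0 = np.arcsin(1 / np.sqrt(3))
mic_az = np.radians([45.0, -45.0, 135.0, -135.0])
mic_el = np.array([el0, -el0, -el0, el0])

# Round-trip check on synthetic data: sample a known field, then encode it back.
rng = np.random.default_rng(4)
b_true = rng.standard_normal((4, 100))
element_signals = sh_first_order(mic_az, mic_el) @ b_true
b_est = encode_first_order(element_signals, mic_az, mic_el)
```

With four elements in a regular tetrahedron the direction matrix is square and invertible, so `b_est` recovers `b_true` exactly in this idealized setting.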

The sound field presence reproduction device 20 further includes a speaker direction specifying unit 24 that specifies direction information for the plurality of speakers (satellite speakers SPk1 to SPkp) in the reproduction space (satellite venue STL1). The generation unit (decoding unit 25) generates the speaker drive signal for each of the speakers in the Ambisonics domain, using the direction information of each speaker and the encoded signal. The sound field presence reproduction device 20 can thereby generate speaker drive signals capable of reproducing the audience-side presence of the live venue LV1, taking into account the direction of each of the satellite speakers SPk1 to SPkp placed in the satellite venue STL1 as seen from the reference position (for example, the center position LSP1, which corresponds to the listener's position).
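The use of speaker directions in decoding can be illustrated with a mode-matching (pseudo-inverse) decoder for eight speakers at the vertices of a cube. The decoder type, the channel convention (ACN order with N3D scaling), and the function names are assumptions made for this sketch; the decoding equations of the disclosure are not reproduced in this text.

```python
import numpy as np

def sh_first_order(azimuth, elevation):
    """Real first-order spherical harmonics, channel order W, Y, Z, X (N3D scaling)."""
    az = np.asarray(azimuth, dtype=float)
    el = np.asarray(elevation, dtype=float)
    x = np.cos(el) * np.cos(az)
    y = np.cos(el) * np.sin(az)
    z = np.sin(el)
    return np.stack([np.ones_like(x), np.sqrt(3) * y, np.sqrt(3) * z, np.sqrt(3) * x], axis=-1)

def decode_to_speakers(b_format, spk_az, spk_el):
    """Mode-matching decode: find drive signals whose re-encoding equals b_format."""
    Y = sh_first_order(spk_az, spk_el)    # shape (speakers, 4)
    D = np.linalg.pinv(Y.T)               # decoding matrix, shape (speakers, 4)
    return D @ b_format                   # shape (speakers, samples)

# Eight speakers at the vertices of a cube centered on the reference position.
el0 = np.arcsin(1 / np.sqrt(3))
spk_az = np.radians([45.0, -45.0, 135.0, -135.0, 45.0, -45.0, 135.0, -135.0])
spk_el = np.array([el0] * 4 + [-el0] * 4)

rng = np.random.default_rng(5)
b = rng.standard_normal((4, 50))          # stand-in first-order Ambisonics signal
drive = decode_to_speakers(b, spk_az, spk_el)
reencoded = sh_first_order(spk_az, spk_el).T @ drive
```

Because the cube layout is regular, re-encoding the drive signals returns the original four channels; irregular layouts generally need a more careful decoder design.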

The cancellation unit (echo cancellation units 21 to 2n) of the sound field presence reproduction device 20 is made up of single echo cancellers (see FIG. 14) whose number is determined from the number of microphone elements Mc1 to Mc4 of the sound pickup device (Ambisonics microphone AMB1) and the number of sound sources. Each single echo canceller receives the source signal of its corresponding sound source (for example, a sound signal or voice signal from an individual pickup microphone) and executes the cancellation process (echo cancellation) on the time axis. Thus, particularly when the reference signals are uncorrelated with one another (in other words, when any crosstalk is below a predetermined threshold at which crosstalk components can be regarded as substantially absent), the component of the reference signal (that is, the individual source signal picked up by the individual pickup microphone Mn) contained in the pickup signal of each microphone element of the Ambisonics microphone AMB1 can be cancelled (suppressed) with high accuracy. Crosstalk components are regarded as substantially absent when, for example, the voice of a vocalist singing on the main stage STG1 (see FIG. 12) is not picked up by the other individual pickup microphones, or is picked up only at a sound pressure level below the predetermined threshold mentioned above.

(Modification of Embodiment 2)
Embodiment 2 described an example in which each of the echo cancellation units 21 to 2n of the sound field presence reproduction device is configured as a single echo canceller. This modification of Embodiment 2 describes an example in which the echo cancellation units 21 to 2n are replaced by a multi-channel echo canceller that handles a plurality of audio channels. In this modification, configurations and content that overlap with Embodiment 2 are given the same reference signs and their description is simplified or omitted; only the differences are described.

First, the system configuration and an outline of the operation of the sound field presence reproduction system 100A according to the modification of Embodiment 2 are described with reference to FIGS. 16 and 17. FIG. 16 is a block diagram showing an example system configuration of the sound field presence reproduction system 100A according to the modification of Embodiment 2. FIG. 17 is a diagram outlining an example of the operation of the sound field presence reproduction system 100A of FIG. 16, from sound field presence pickup to sound field presence reproduction.

The sound field presence reproduction system 100A includes the sound field presence pickup device 10 and a sound field presence reproduction device 20A, which are connected via a network NW1 so that they can exchange data with each other.

The sound field presence reproduction device 20A is placed in the reproduction space (for example, the satellite venue STL1) and includes a multi-channel echo cancellation unit 21A, the encoding unit 22, the microphone element direction specifying unit 23, the speaker direction specifying unit 24, the decoding unit 25, the sound field reproduction unit 26, and the satellite speakers SPk1, ..., SPkp. That is, in this modification, the sound field presence reproduction device 20A is provided with the multi-channel echo cancellation unit 21A, which receives the source signals of the sound sources picked up by each of the n individual pickup microphones, in place of the echo cancellation units 21 to 2n of Embodiment 2. The configuration formed by the multi-channel echo cancellation unit 21A, the encoding unit 22, the microphone element direction specifying unit 23, the speaker direction specifying unit 24, the decoding unit 25, and the sound field reproduction unit 26 in FIG. 16 corresponds to the configuration of the audience-side presence parameter calculation unit 16 and the sound field reproduction processing unit 18 according to Embodiment 1 (see FIG. 1).

The multi-channel echo cancellation unit 21A receives the pickup signal of each microphone element of the Ambisonics microphone AMB1 sent from the sound field presence pickup device 10 (A/D conversion unit 7 side), and also receives the individual source signals of the first to n-th sound sources (see above) sent from the sound field presence pickup device 10 (individual pickup microphone M1 side) as the first reference signal M1S to the n-th reference signal MnS. The multi-channel echo cancellation unit 21A executes, in the time domain, a cancellation process (for example, multi-channel echo cancellation) that removes from the pickup signal of the Ambisonics microphone AMB1 the components of each of the first reference signal M1S (that is, the individual source signal picked up by the individual pickup microphone M1) to the n-th reference signal MnS (that is, the individual source signal picked up by the individual pickup microphone Mn). The multi-channel echo cancellation unit 21A outputs the signals after the cancellation process (the first to n-th pickup signals) to the encoding unit 22.

Here, the multi-channel echo cancellation unit 21A may be configured as an echo canceller that uses an adaptive filter operating in the time domain, for example a multi-channel echo canceller. The configuration of the multi-channel echo canceller may be based on the stereo echo canceller shown in the reference non-patent document below, which corresponds to a multi-channel echo canceller with two reference signal inputs. Accordingly, as shown in FIG. 17, the multi-channel echo cancellation unit 21A can be built from as many multi-channel echo cancellers or stereo echo cancellers as the Ambisonics microphone AMB1 has microphone elements (for example, four). The multi-channel echo canceller or stereo echo canceller may have, for example, the configuration disclosed in the reference non-patent document or a configuration derived from it.
With this configuration, even when the reference signals are correlated with one another (in other words, when crosstalk is at or above the predetermined threshold below which crosstalk components could be regarded as absent), the components of the reference signals (that is, the individual source signals picked up by the individual pickup microphones Mn) contained in the pickup signal of the Ambisonics microphone AMB1 can be cancelled (suppressed) with high accuracy. Alternatively, the multi-channel echo cancellation unit 21A may forward-transform the time-domain signals by a DFT or the like and perform the echo cancellation with adaptive filters in the frequency domain or a subband domain; the cancelled signals may then be inverse-transformed to the time domain by an IFFT or the like before the subsequent processing.
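For contrast with the single-channel case, a multi-channel canceller adapts one filter per reference signal jointly from a single shared error, so that correlated references are handled together. The sketch below uses a joint NLMS update, which is an assumption rather than the configuration of the cited stereo echo canceller; the name `multichannel_nlms` and the synthetic crosstalk scenario are hypothetical.

```python
import numpy as np

def multichannel_nlms(main, refs, taps=16, mu=0.5, eps=1e-8):
    """Jointly adapt one FIR filter per reference and subtract all estimates from `main`."""
    main = np.asarray(main, dtype=float)
    refs = np.atleast_2d(np.asarray(refs, dtype=float))
    n_refs, n_samples = refs.shape
    w = np.zeros((n_refs, taps))      # one adaptive filter per reference signal
    buf = np.zeros((n_refs, taps))    # recent samples of every reference, newest first
    out = np.empty(n_samples)
    for t in range(n_samples):
        buf = np.roll(buf, 1, axis=1)
        buf[:, 0] = refs[:, t]
        e = main[t] - np.sum(w * buf)                   # shared residual
        w += mu * e * buf / (np.sum(buf * buf) + eps)   # joint normalized update
        out[t] = e
    return out

# Synthetic scene with crosstalk: the vocal leaks into the guitar microphone.
rng = np.random.default_rng(2)
T = 8000
vocal = rng.standard_normal(T)
guitar = rng.standard_normal(T)
refs = np.stack([vocal, guitar + 0.5 * vocal])   # correlated reference signals
ambience = 0.05 * rng.standard_normal(T)
main = (np.convolve(refs[0], [0.5, 0.2, -0.1])[:T]
        + np.convolve(refs[1], [0.4, -0.3, 0.1])[:T]
        + ambience)
residual = multichannel_nlms(main, refs)
```

One such canceller would be run per microphone element. If the references were fully coherent with each other, the filter solution would no longer be unique, which is a known difficulty of stereo echo cancellation and is outside the scope of this sketch.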

<Reference non-patent document> Chapter 5, "Acoustic Echo Canceller", "Configuration example of a stereo echo canceller" (see Fig. 5.8), The Institute of Electronics, Information and Communication Engineers, 2012 [retrieved September 2, 2022], Internet <URL: https://www.ieice-hbkb.org/files/02/02gun_06hen_05.pdf>

Next, the operating procedure by which the sound field presence reproduction device 20A reproduces the sound field presence is described with reference to FIG. 18. FIG. 18 is a flowchart showing, in time series, an example of the operating procedure for sound field presence reproduction by the sound field presence reproduction device according to the modification of Embodiment 2. In the description of FIG. 18, steps that duplicate the description of FIG. 15 are given the same step numbers and their description is simplified or omitted; only the differences are described.

In FIG. 18, the sound field presence reproduction device 20A executes, in the multi-channel echo cancellation unit 21A and in the time domain, the multi-channel echo cancellation process described above, taking the pickup signal of the Ambisonics microphone AMB1 as the main signal and the pickup signals of the individual pickup microphones M1 to Mn as reference signals (step St22A). More specifically, the multi-channel echo cancellation unit 21A of the sound field presence reproduction device 20A receives the signals sent in step St21 (specifically, the pickup signal of each microphone element of the Ambisonics microphone AMB1 and the first reference signal M1S to the n-th reference signal MnS) and executes, on the time axis, a cancellation process (for example, multi-channel echo cancellation) that removes the components of the first reference signal M1S (that is, the individual source signal picked up by the individual pickup microphone M1) to the n-th reference signal MnS (that is, the individual source signal picked up by the individual pickup microphone Mn) contained in the pickup signal of each microphone element (step St22A). The processing from step St22A onward is the same as in FIG. 15, so its description is omitted.

As described above, in the sound field presence reproduction system 100A according to the modification of Embodiment 2, the cancellation unit (multi-channel echo cancellation unit 21A) of the sound field presence reproduction device 20A is made up of multi-channel echo cancellers whose number is determined from the number of microphone elements Mc1 to Mc4 of the sound pickup device (Ambisonics microphone AMB1) (see FIG. 17). Each multi-channel echo canceller receives the source signals corresponding to each of the plurality of sound sources and executes the cancellation process (multi-channel echo cancellation). Thus, even when the reference signals are correlated with one another (in other words, when crosstalk is at or above the predetermined threshold below which crosstalk components could be regarded as absent), the components of the reference signals (that is, the individual source signals picked up by the individual pickup microphones) contained in the pickup signal of each microphone element of the Ambisonics microphone AMB1 can be cancelled (suppressed) with high accuracy.

(Embodiment 3)
Embodiment 2 described an example in which the sound field presence reproduction device executes echo cancellation processing in the time domain, before the encoding processing that generates the first-order Ambisonics signal, using the picked-up signal of each microphone element of the Ambisonics microphone AMB1 and the sound source signal (reference signal) of each sound source in the sound collection space (for example, the live venue LV1). Embodiment 3 describes an example in which the sound field presence reproduction device executes echo cancellation processing in the Ambisonics domain, using the first-order Ambisonics signal and the direction-specified sound source signal of each sound source in the sound collection space (for example, the live venue LV1). In Embodiment 3, configurations and contents that overlap with Embodiment 2 are given the same reference signs and their description is simplified or omitted; only the differences are described.

First, the system configuration and an outline of the operation of the sound field presence reproduction system 100B according to Embodiment 3 are described with reference to FIGS. 19 and 20. FIG. 19 is a block diagram showing a system configuration example of the sound field presence reproduction system 100B according to Embodiment 3. FIG. 20 is a diagram showing an example outline of the operation, from sound pickup to reproduction of the sound field presence, in the sound field presence reproduction system 100B of FIG. 19.

The sound field presence reproduction system 100B includes a sound field presence sound pickup device 10B and a sound field presence reproduction device 20B. The two devices are connected via a network NW1 so that they can exchange data.

The sound field presence sound pickup device 10B is placed in the sound collection space (for example, the live venue LV1) and includes the Ambisonics microphone AMB1, an A/D conversion unit 7, an encoding unit 8, a microphone element direction designation unit 9, and individual sound pickup microphones M1, ..., Mn. The sound field presence sound pickup device 10B only needs to include at least the Ambisonics microphone AMB1; the A/D conversion unit 7, the encoding unit 8, and the microphone element direction designation unit 9 may instead be provided in the sound field presence reproduction device 20B.

The sound field presence reproduction device 20B is placed in the reproduction space (for example, the satellite venue STL1) and includes echo cancellation units 21B, ..., 2nB, encoding units 31, ..., 3n, sound source position designation units 41, ..., 4n, a speaker direction designation unit 24, a decoding unit 25B, a sound field reproduction unit 26, and satellite speakers SPk1, ..., SPkp. The value n is the same for the echo cancellation units 21B to 2nB, the encoding units 31 to 3n, the sound source position designation units 41 to 4n, and the individual sound pickup microphones M1 to Mn. In other words, the sound field presence reproduction device 20B contains as many echo cancellation units, encoding units, and sound source position designation units as there are types of sound sources picked up by the individual sound pickup microphones. The configuration of the echo cancellation units 21B to 2nB, the encoding units 31 to 3n, the sound source position designation units 41 to 4n, the speaker direction designation unit 24, the decoding unit 25B, and the sound field reproduction unit 26 in FIG. 19 corresponds to the configuration of the audience seat side presence parameter calculation unit 16 and the sound field reproduction processing unit 18 according to Embodiment 1 (see FIG. 1).

Here, each of the echo cancellation units 21B to 2nB may be configured, for example, as an echo canceller using an adaptive filter that operates in the Ambisonics domain. This echo canceller can be configured, for example, as a Single Channel Echo Canceller. Accordingly, as shown in FIG. 20, each of the echo cancellation units 21B to 2nB can be built from as many Single Channel Echo Cancellers as the Ambisonics microphone AMB1 has microphone elements (for example, four). The Single Channel Echo Canceller may have, for example, the configuration disclosed in the reference non-patent document. With this configuration, when the reference signals are uncorrelated (in other words, when the correlation is below the predetermined threshold at which crosstalk components can be regarded as substantially absent), the components of the reference signals (that is, the individual sound source signals picked up by the individual sound pickup microphones) contained in the signal components derived from the picked-up signals of the microphone elements of the Ambisonics microphone AMB1 (for example, the signals W, X, Y, and Z shown in FIG. 10, each with its own directional resolution) can be cancelled (suppressed) with high accuracy.

Next, the operation procedure by which the sound field presence reproduction device 20B reproduces the sound field presence is described with reference to FIG. 21. FIG. 21 is a flowchart showing, in time series, an example of this operation procedure according to Embodiment 3. In the description of FIG. 21, steps that overlap with the description of FIG. 15 are given the same step numbers and their description is simplified or omitted; only the differences are described.

As described above, in the sound field presence reproduction system 100B according to Embodiment 3, the sound field presence reproduction device 20B includes: an acquisition unit (echo cancellation units 21B to 2nB) that acquires at least the picked-up signal (diffuse sound components DS1 to DS3) picked up by the sound pickup device (Ambisonics microphone AMB1) placed in the sound collection space (live venue LV1); encoding units 31 to 3n that encode the sound source signals (sound signal SS1, voice signal SS2, sound signal SS3) of one or more sound sources in the sound collection space; a cancellation unit (echo cancellation units 21B to 2nB) that uses the encoded sound source signals as reference signals and executes cancellation processing to remove the reference signal components contained in the picked-up signal; a generation unit (decoding unit 25B) that, based on the signal after the cancellation processing, generates for each of a plurality of speakers (satellite speakers SPk1 to SPkp) placed in a reproduction space (satellite venue STL1) different from the sound collection space a speaker drive signal for reproducing, in the reproduction space, the sound field presence of the sound collection space; and a sound field reproduction unit 26 that outputs the speaker drive signal of each speaker from that speaker. By cancelling the components of the one or more individual sound sources (reference signals) in the live venue LV1 that are picked up by the Ambisonics microphone AMB1 in the Ambisonics domain rather than in the time domain, the sound field presence reproduction device 20B according to Embodiment 3 can reproduce, with high directional resolution and high accuracy in at least one satellite venue, the audience-seat-side atmosphere of the sound collection space (live venue LV1) that the Ambisonics microphone AMB1 mainly picks up.

The picked-up signal acquired by the acquisition unit (echo cancellation units 21B to 2nB) is a signal that has been encoded using the direction information of each of the microphone elements Mc1 to Mc4 of the sound pickup device (Ambisonics microphone AMB1). The sound field presence reproduction device 20B can therefore obtain a first-order Ambisonics signal with high directional resolution as the input to the cancellation processing in the cancellation unit (each of the echo cancellation units 21B to 2nB).
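As a rough illustration of encoding the element signals with their direction information, the sketch below projects one sample per microphone element onto W, X, Y, and Z. The source states only that the direction information of Mc1 to Mc4 is used; the tetrahedral capsule layout, the scale factors (chosen to match the classic A-format to B-format sums and differences), and every name in the code are assumptions.

```python
import math

# Assumed capsule directions for a tetrahedral microphone (hypothetical layout):
# front-left-up, front-right-down, back-left-down, back-right-up.
_S = 1.0 / math.sqrt(3.0)
ELEMENT_DIRECTIONS = {
    "Mc1": (_S, _S, _S),
    "Mc2": (_S, -_S, -_S),
    "Mc3": (-_S, _S, -_S),
    "Mc4": (-_S, -_S, _S),
}


def encode_elements_to_foa(element_samples, directions=ELEMENT_DIRECTIONS):
    """Combine one sample per microphone element into first-order W, X, Y, Z.

    W is the scaled sum of all elements; X, Y, Z weight each element by the
    matching component of its unit direction vector.
    """
    scale = math.sqrt(3.0) / 2.0  # makes X equal 0.5 * (Mc1 + Mc2 - Mc3 - Mc4)
    w = 0.5 * sum(element_samples[m] for m in directions)
    x = scale * sum(element_samples[m] * directions[m][0] for m in directions)
    y = scale * sum(element_samples[m] * directions[m][1] for m in directions)
    z = scale * sum(element_samples[m] * directions[m][2] for m in directions)
    return {"W": w, "X": x, "Y": y, "Z": z}
```

With this layout, equal pressure on all four elements yields only a W component, while pressure on the two front-facing elements alone yields a positive X component.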

The sound field presence reproduction device 20B further includes a speaker direction designation unit 24 that designates direction information for the plurality of speakers (satellite speakers SPk1 to SPkp) in the reproduction space (satellite venue STL1). The generation unit (decoding unit 25B) generates the speaker drive signal for each speaker using the direction information of each speaker and the signal after the cancellation processing. By decoding the signal obtained after cancellation in the Ambisonics domain, the sound field presence reproduction device 20B can accurately generate speaker drive signals that have high directional resolution and can reproduce the audience-seat-side presence of the live venue LV1, taking into account the direction of each of the satellite speakers SPk1 to SPkp from a reference position (for example, the center position LSP1, which corresponds to the listener's position).
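The decoding step can be sketched with a basic projection decoder, which weights W, X, Y, and Z by each speaker's direction as seen from the reference position. The source does not specify the decoder design, so the projection rule, the 1/P scaling, and the function name here are assumptions made for illustration.

```python
import math


def decode_foa(foa, speaker_directions_deg):
    """Compute one drive sample per speaker from one first-order sample.

    foa: dict with keys "W", "X", "Y", "Z" (one sample each).
    speaker_directions_deg: list of (azimuth, elevation) pairs in degrees,
    measured from the reference (listener) position.
    """
    count = len(speaker_directions_deg)
    drive = []
    for az_deg, el_deg in speaker_directions_deg:
        az, el = math.radians(az_deg), math.radians(el_deg)
        # Unit vector pointing from the reference position to the speaker.
        ux = math.cos(az) * math.cos(el)
        uy = math.sin(az) * math.cos(el)
        uz = math.sin(el)
        sample = foa["W"] + foa["X"] * ux + foa["Y"] * uy + foa["Z"] * uz
        drive.append(sample / count)
    return drive
```

For a sound arriving from straight ahead (W = 1, X = 1) and four speakers at 0, 90, 180, and 270 degrees, the front speaker receives the largest drive signal and the rear speaker receives approximately none.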

The sound field presence reproduction device 20B further includes sound source position designation units 41 to 4n that designate position information for the one or more sound sources in the sound collection space (live venue LV1). Each of the encoding units 31 to 3n executes the encoding processing using the sound source signal and position information of its corresponding sound source. The sound field presence reproduction device 20B can thus generate the accurate reference signals needed for the cancellation processing in the cancellation unit (echo cancellation units 21B to 2nB), taking into account the direction in which each individual sound source is located in the live venue LV1.
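A minimal sketch of how an encoding unit might turn a sound source signal and its position into a first-order Ambisonics reference signal is given below. It assumes the position is converted to a direction relative to the Ambisonics microphone and that the W channel carries the signal at unit gain; the source does not state the normalization convention, and the function names are hypothetical.

```python
import math


def direction_from_position(source_xyz, mic_xyz=(0.0, 0.0, 0.0)):
    """Return (azimuth, elevation) in radians of the source as seen from the microphone."""
    dx, dy, dz = (s - m for s, m in zip(source_xyz, mic_xyz))
    azimuth = math.atan2(dy, dx)
    elevation = math.atan2(dz, math.hypot(dx, dy))
    return azimuth, elevation


def encode_source_to_foa(samples, source_xyz, mic_xyz=(0.0, 0.0, 0.0)):
    """Encode a mono sound source signal into W, X, Y, Z using its direction."""
    az, el = direction_from_position(source_xyz, mic_xyz)
    gains = {
        "W": 1.0,  # assumed unit gain on the omnidirectional channel
        "X": math.cos(az) * math.cos(el),
        "Y": math.sin(az) * math.cos(el),
        "Z": math.sin(el),
    }
    return {ch: [g * s for s in samples] for ch, g in gains.items()}
```

A source directly in front of the microphone appears only in W and X, and a source directly to the left appears only in W and Y, which is the directional information the cancellers then use.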

The cancellation unit (echo cancellation units 21B to 2nB) consists of a number of single echo cancellers determined by the number of microphone elements Mc1 to Mc4 of the sound pickup device (Ambisonics microphone AMB1) (for example, four) and the number of sound sources (for example, n), that is, for example, 4n (= 4 × n) single echo cancellers. Each single echo canceller receives the encoded sound source signal of its corresponding sound source and executes the cancellation processing. As a result, particularly when the reference signals are uncorrelated (in other words, when the correlation is below the predetermined threshold at which crosstalk components can be regarded as substantially absent), the sound field presence reproduction device 20B can cancel (suppress) with high accuracy the components of the reference signals (that is, the first-order Ambisonics signals based on the individual sound source signals picked up by the individual sound pickup microphones M1 to Mn and on the individual sound source directions) contained in the signal components derived from the picked-up signals of the microphone elements of the Ambisonics microphone AMB1 (for example, the signals W, X, Y, and Z shown in FIG. 10, each with its own directional resolution).
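The 4 × n arrangement can be sketched as one single-reference canceller per pair of Ambisonics channel and sound source. The sketch assumes an NLMS adaptive filter and assumes that the cancellers for one channel are cascaded; the source specifies neither, and the names, tap count, step size, and synthetic signals are hypothetical.

```python
import random

CHANNELS = ("W", "X", "Y", "Z")


def single_nlms(main, ref, taps=8, mu=0.05, eps=1e-8):
    """Single-reference NLMS canceller (an assumed algorithm)."""
    weights = [0.0] * taps
    residual = []
    for t in range(len(main)):
        buf = [ref[t - i] if t >= i else 0.0 for i in range(taps)]
        err = main[t] - sum(w * x for w, x in zip(weights, buf))
        power = sum(x * x for x in buf) + eps
        weights = [w + mu * err * x / power for w, x in zip(weights, buf)]
        residual.append(err)
    return residual


def cancel_in_ambisonics_domain(mixture, encoded_refs):
    """Run one single canceller per (Ambisonics channel, sound source) pair.

    mixture: {"W": [...], "X": [...], "Y": [...], "Z": [...]}.
    encoded_refs: list of dicts of the same shape, one per sound source.
    Returns the cleaned channels and the number of cancellers used.
    """
    cleaned, used = {}, 0
    for ch in CHANNELS:
        signal = mixture[ch]
        for ref in encoded_refs:  # cascade over the n sources (assumption)
            signal = single_nlms(signal, ref[ch])
            used += 1
        cleaned[ch] = signal
    return cleaned, used


def demo(length=4000, seed=1):
    """Two uncorrelated sources leak into the first-order mixture."""
    rng = random.Random(seed)
    sources = [[rng.uniform(-1, 1) for _ in range(length)] for _ in range(2)]
    gains = [
        {"W": 1.0, "X": 1.0, "Y": 0.0, "Z": 0.0},  # source straight ahead
        {"W": 1.0, "X": 0.0, "Y": 1.0, "Z": 0.0},  # source to the left
    ]
    encoded_refs = [
        {ch: [g[ch] * s for s in src] for ch in CHANNELS}
        for g, src in zip(gains, sources)
    ]
    # Each encoded reference leaks into the mixture with a one-sample delay.
    mixture = {
        ch: [
            sum(0.5 * ref[ch][t - 1] for ref in encoded_refs) if t >= 1 else 0.0
            for t in range(length)
        ]
        for ch in CHANNELS
    }
    cleaned, used = cancel_in_ambisonics_domain(mixture, encoded_refs)
    tail = slice(length - 500, length)
    energy = lambda xs: sum(x * x for x in xs)
    return used, energy(cleaned["W"][tail]) / energy(mixture["W"][tail])
```

With two sources, `demo()` uses 4 × 2 = 8 cancellers and returns the fraction of W-channel energy left after convergence. The small step size is a deliberate choice: in a cascade, each single canceller sees the other source as noise, so a larger step would leave more residual.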

(Modification of Embodiment 3)
Embodiment 3 described an example in which each of the echo cancellation units 21B to 2nB of the sound field presence reproduction device is configured as a Single Echo Canceller. The modification of Embodiment 3 describes an example in which, in place of the echo cancellation units 21B to 2nB, the sound field presence reproduction device uses a multi-channel echo canceller that handles multiple audio channels in the Ambisonics domain rather than in the time domain. In the modification of Embodiment 3, configurations and contents that overlap with Embodiments 1 and 2 are given the same reference signs and their description is simplified or omitted; only the differences are described.

First, the system configuration and an outline of the operation of the sound field presence reproduction system 100C according to the modification of Embodiment 3 are described with reference to FIGS. 22 and 23. FIG. 22 is a block diagram showing a system configuration example of the sound field presence reproduction system 100C according to the modification of Embodiment 3. FIG. 23 is a diagram showing an example outline of the operation, from sound pickup to reproduction of the sound field presence, in the sound field presence reproduction system 100C of FIG. 22.

The sound field presence reproduction system 100C includes the sound field presence sound pickup device 10B (see FIG. 19) and a sound field presence reproduction device 20C. The two devices are connected via the network NW1 so that they can exchange data.

Here, the multi-channel echo cancellation unit 21C may be configured, for example, as an echo canceller using a plurality of adaptive filters that operate in the Ambisonics domain. This echo canceller can be configured, for example, as a Multi Channel Echo Canceller. The stereo echo canceller shown in the reference non-patent document may serve as a model for the Multi Channel Echo Canceller; the stereo echo canceller is the case in which the Multi Channel Echo Canceller receives two reference signals. Accordingly, as shown in FIG. 23, the multi-channel echo cancellation unit 21C can be built from as many Multi Channel Echo Cancellers or stereo echo cancellers as the Ambisonics microphone AMB1 has microphone elements (for example, four). The Multi Channel Echo Canceller or stereo echo canceller may have, for example, the configuration disclosed in the reference non-patent document or a configuration derived from it. With this configuration, even when the reference signals are correlated with one another (in other words, when the correlation is at or above the predetermined threshold below which crosstalk components could be regarded as absent), the components of the reference signals (that is, the individual sound source signals picked up by the individual sound pickup microphones M1 to Mn) contained in the signal components derived from the picked-up signal of the Ambisonics microphone AMB1 (for example, the signals W, X, Y, and Z shown in FIG. 10, each with its own directional resolution) can be cancelled (suppressed) with high accuracy.

Next, the operation procedure by which the sound field presence reproduction device 20C reproduces the sound field presence is described with reference to FIG. 24. FIG. 24 is a flowchart showing, in time series, an example of this operation procedure according to the modification of Embodiment 3. In the description of FIG. 24, steps that overlap with the description of FIG. 15, FIG. 18, or FIG. 21 are given the same step numbers and their description is simplified or omitted; only the differences are described.

As described above, in the sound field presence reproduction device 20C according to the modification of Embodiment 3, the cancellation unit (multi-channel echo cancellation unit 21C) consists of a number of multi-channel echo cancellers determined by the number of microphone elements Mc1 to Mc4 of the sound pickup device (Ambisonics microphone AMB1). Each multi-channel echo canceller receives the encoded sound source signals corresponding to the respective sound sources and executes the cancellation processing (multi-channel echo cancellation processing). As a result, even when the reference signals are correlated with one another (in other words, when the correlation is at or above the predetermined threshold below which crosstalk components could be regarded as absent), the sound field presence reproduction device 20C can cancel (suppress) with high accuracy, in the Ambisonics domain, the components of the reference signals (that is, the individual sound source signals picked up by the individual sound pickup microphones) contained in the signal components derived from the picked-up signals of the microphone elements of the Ambisonics microphone AMB1 (for example, the signals W, X, Y, and Z shown in FIG. 10, each with its own directional resolution).

The embodiments have been described above with reference to the accompanying drawings, but the present disclosure is not limited to these examples. It is clear that a person skilled in the art could conceive of various changes, modifications, substitutions, additions, deletions, and equivalents within the scope of the claims, and these are understood to fall within the technical scope of the present disclosure. The components of the embodiments described above may also be combined in any way that does not depart from the spirit of the invention.

In Embodiment 1 described above, the realistic sound field reproduction device performs sound image localization and reproduction of presence in the satellite venue STL1 in real time, using the performer sound signals, zone sound signals, and audience-seat-side presence sound signals picked up (recorded) in the live venue LV1. However, the realistic sound field reproduction device may instead store performer sound signals, zone sound signals, and audience-seat-side presence sound signals recorded in advance at the live venue LV1 in the storage unit 13 in the satellite venue STL1, and reproduce the atmosphere of the sound field of the live venue LV1 (sound image localization and reproduction of presence) not in real time but on a date after the recording.

In the system configuration example shown in FIG. 1, at least one of the storage unit 13, the sound image localization zone determination unit 15, the audience seat side presence parameter calculation unit 16, and the mixing/level balance adjustment unit 17 may be provided on the live venue LV1 side. In other words, the placement of the components of the realistic sound field reproduction system 1000 shown in FIG. 1 may be decided as appropriate in view of, for example, the performance of the computer device P1 on the live venue LV1 side and the performance of the computer device P2 on the satellite venue STL1 side.

The present disclosure is useful as a realistic sound field reproduction device and a realistic sound field reproduction method that reproduce, with high sensitivity in a sound field reproduction space, a sense of presence including the sound images of the various sound sources in a sound field recording space.

1 Mixer
2 Performer tracking system
3 Face database
4 Encoder
5 Matching table database
6 Live venue side communication unit
7 A/D conversion unit
8, 22, 31, 3n Encoding unit
9, 23 Microphone element direction designation unit
10, 10B Sound field presence sound pickup device
11 Satellite venue side communication unit
12 Decoder
13 Storage unit
14 Video output unit
15 Sound image localization zone determination unit
16 Audience seat side presence parameter calculation unit
17 Mixing/level balance adjustment unit
18 Sound field reproduction processing unit
20, 20A, 20B, 20C Sound field presence reproduction device
21, 2n, 21B, 2nB Echo cancellation unit
21A, 21C Multi-channel echo cancellation unit
24 Speaker direction designation unit
25, 25B, 25C Decoding unit
26 Sound field reproduction unit
41, 4n Sound source position designation unit
100, 100A, 100B, 100C Sound field presence reproduction system
1000 Realistic sound field reproduction system
AMB1 Ambisonics microphone
HM1, HM4 Headset microphone
M1, Mn Individual sound pickup microphone
SPk1, SPk2, SPk3, SPk4, SPk5, SPk6, SPk7, SPk8, SPk9, SPk10, SPk11, SPk12, SPkp Satellite speaker
ZM1, ZM2, ZM3, ZM4, ZM5, ZM6, ZM7 Zone microphone

Claims

A realistic sound field reproduction device comprising:
an acquisition unit that acquires at least a speech audio signal recorded by a person microphone worn by at least one person who can move within an activity area in a sound field recording space, ambient sound signals recorded by a plurality of peripheral microphones arranged around the activity area, and position information of the person;
a determination unit that determines, based on the position information of the person, a main peripheral microphone, which is the one of the plurality of peripheral microphones whose recording area covers the location of the person within the activity area; and
a sound field reproduction unit that, based on at least the speech audio signal from the person microphone, a first ambient sound signal from the main peripheral microphone, and a second ambient sound signal from the peripheral microphones other than the main peripheral microphone, executes sound field reproduction processing for reproducing the sound field of the sound field recording space using a plurality of speakers arranged in a sound field reproduction space different from the sound field recording space.

The realistic sound field reproduction device according to claim 1, wherein
the sound field reproduction unit executes the sound field reproduction processing so as to reproduce the sound field of the sound field recording space with the first ambient sound signal emphasized over the second ambient sound signal in the sound field reproduction space.

The realistic sound field reproduction device according to claim 1, wherein
the acquisition unit further acquires a peripheral area sound signal recorded by a sound recording device placed in a peripheral area different from the activity area in the sound field recording space, and
the sound field reproduction unit executes signal processing for reproducing a sound field based on the peripheral area sound signal in an area of the sound field reproduction space corresponding to the peripheral area of the sound field recording space.

The realistic sound field reproduction device according to claim 1, wherein
the sound field reproduction unit mixes the speech audio signal, the first ambient sound signal, and the second ambient sound signal, adjusts the balance of their signal levels, and outputs the result via each of the plurality of speakers.

The realistic sound field reproduction device according to claim 3, wherein
the sound field reproduction unit mixes the speech audio signal, the first ambient sound signal, the second ambient sound signal, and the peripheral area sound signal, adjusts the balance of their signal levels, and outputs the result via each of the plurality of speakers.

The realistic sound field reproduction device according to claim 1, wherein
the acquisition unit repeatedly acquires the position information of the person, which may be updated periodically, and
the determination unit determines the main peripheral microphone each time the position information of the person is updated.

The realistic sound field reproduction device according to claim 1, wherein
there are a plurality of the persons, and
the determination unit determines the main peripheral microphone for each person based on the position information of that person.

A speech audio signal recorded by a person microphone worn by at least one person who can move in an activity area in a sound field recording space, and an ambient sound signal recorded by a plurality of peripheral microphones arranged around the activity area. , and an acquisition unit that acquires at least the location information of the person;
a determining unit that determines a main peripheral microphone, which is one of the plurality of peripheral microphones and whose recording area is a location where the person is located within the activity area, based on position information of the person;
Based on at least the uttered audio signal from the person microphone, a first surrounding sound signal from the main surrounding microphone, and a second surrounding sound signal from the surrounding microphone other than the main surrounding microphone, the sound field recording space is a sound field reproduction unit that executes a sound field reproduction process for reproducing the sound field in the sound field recording space using a plurality of speakers arranged in a sound field reproduction space different from the sound field reproduction space,
The acquisition unit further acquires a peripheral area sound signal recorded by a sound recording device located in a peripheral area different from the activity area in the sound field recording space,
The sound field reproduction unit
performs signal processing for reproducing a sound field based on the peripheral area sound signal in an area of the sound field reproduction space that corresponds to the peripheral area in the sound field recording space, and
performs, as the signal processing, an erasure process that uses the speech audio signal, the first ambient sound signal, and the second ambient sound signal as reference signals and erases the components of the reference signals contained in the peripheral area sound signal;
Realistic sound field reproduction device.
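
The claim does not name an algorithm for the erasure process. One common way to remove reference-signal components from a recorded signal is a normalized least-mean-squares (NLMS) adaptive filter, sketched below purely as an assumed example; the function name `nlms_erase`, the filter length, and the step size are hypothetical choices, not values from the source.

```python
import numpy as np

def nlms_erase(target, references, taps=16, mu=0.5, eps=1e-8):
    """Erase reference-signal components from `target` with an NLMS filter.

    target:     peripheral area sound signal, shape (num_samples,)
    references: sequence of reference signals of the same length (e.g. the
                speech, first ambient, and second ambient sound signals)
    Returns the residual signal after the estimated leakage is subtracted.
    """
    target = np.asarray(target, dtype=float)
    refs = [np.asarray(r, dtype=float) for r in references]
    weights = np.zeros((len(refs), taps))   # one FIR filter per reference
    history = np.zeros((len(refs), taps))   # most recent samples per reference
    residual = np.zeros_like(target)
    for n in range(len(target)):
        history = np.roll(history, 1, axis=1)
        history[:, 0] = [r[n] for r in refs]
        estimate = np.sum(weights * history)   # estimated leaked component
        error = target[n] - estimate           # what remains after erasure
        residual[n] = error
        norm = np.sum(history * history) + eps
        weights += mu * error * history / norm  # normalized LMS update
    return residual
```

The filter needs some samples to converge, so the first part of the residual still contains leakage; a real system would also have to handle reference signals that are correlated with one another.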

The sound field reproduction unit encodes the signal resulting from the erasure process and, based on the encoded signal, generates for each of the plurality of speakers a speaker drive signal for reproducing, in the sound field reproduction space, the sense of presence of the sound field in the sound field recording space,
The realistic sound field reproducing device according to claim 8.
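
The claim does not specify the encoding scheme or how the drive signals are derived from it. As one assumed illustration, the sketch below encodes a signal into first-order horizontal ambisonics (components W, X, Y) and decodes it to drive signals for speakers spaced evenly on a circle. The function names, the source azimuth parameter, and the choice of ambisonics itself are assumptions, not the method of the source.

```python
import numpy as np

def encode_first_order(signal, azimuth_rad):
    """Encode a mono signal arriving from `azimuth_rad` into W, X, Y."""
    s = np.asarray(signal, dtype=float)
    return np.stack([s, s * np.cos(azimuth_rad), s * np.sin(azimuth_rad)])

def speaker_drive_signals(encoded, speaker_azimuths_rad):
    """Decode W, X, Y into one drive signal per speaker.

    Assumes at least three speakers evenly spaced on a circle; this is the
    basic mode-matching decoder for that regular layout.
    Returns an array of shape (num_speakers, num_samples).
    """
    phi = np.asarray(speaker_azimuths_rad, dtype=float)
    decoder = np.stack(
        [np.ones_like(phi), 2.0 * np.cos(phi), 2.0 * np.sin(phi)], axis=1
    ) / len(phi)
    return decoder @ np.asarray(encoded, dtype=float)
```

In this sketch the speaker nearest the encoded direction receives the largest drive signal, and the drive signals of a regular layout sum back to the original signal.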

The sound field reproduction unit outputs each generated speaker drive signal from the corresponding speaker,
The realistic sound field reproducing device according to claim 9.

A method for reproducing a realistic sound field using a realistic sound field reproducing device, the method comprising:
acquiring at least a speech audio signal recorded by a person microphone worn by at least one person who can move within an activity area in a sound field recording space, ambient sound signals recorded by a plurality of peripheral microphones arranged around the activity area, and position information of the person;
determining, based on the position information of the person, a main peripheral microphone, which is one of the plurality of peripheral microphones and whose recording area includes the location of the person within the activity area;
executing, based on at least the speech audio signal from the person microphone, a first ambient sound signal from the main peripheral microphone, and a second ambient sound signal from the peripheral microphones other than the main peripheral microphone, a sound field reproduction process for reproducing the sound field of the sound field recording space using a plurality of speakers arranged in a sound field reproduction space different from the sound field recording space,
A realistic sound field reproduction method.