JP7403392B2

JP7403392B2 - Sound collection device, system, program, and method for transmitting environmental sound signals collected by multiple microphones to a playback device

Info

Publication number: JP7403392B2
Application number: JP2020101320A
Authority: JP
Inventors: 正樹内藤; 俊治堀内
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-06-11
Filing date: 2020-06-11
Publication date: 2023-12-22
Anticipated expiration: 2040-06-11
Also published as: JP2021196433A

Description

The present invention relates to technology for collecting and reproducing the acoustic signals contained in environmental sound. It is particularly suited to telepresence (video conferencing) systems that operate over a network.

Members of a group, whether colleagues or family such as parents and children living apart, often work or live at different sites. In such cases they try to keep track of each other's situation through the telephone and communication functions of smartphones, personal computers, and the like. Communication then has to be initiated deliberately, and problems such as remote members feeling isolated have been pointed out (see, for example, Non-Patent Document 1).

To address this, telepresence systems that continuously stream video and audio between multiple sites have come into use (see, for example, Non-Patent Documents 2 and 3). Such a system resembles a video conferencing system, but it stays connected at all times rather than only during a call or meeting. By constantly transmitting the environmental sounds and images around each party, it lets members staying at different remote sites (parents and children, families, employees) share each other's situation, and it provides an environment in which they feel as if they were in the same place. A telepresence system can easily be connected over a network not only between domestic and overseas sites but also between an office and a home or shared office.
For example, Skype (registered trademark), a typical videophone service, can show a "present/away" status on the display of the other party's terminal so that the activity of members at remote sites can be shared.

As another conventional technique, there is a technique that, when a video is displayed enlarged, adjusts the direction and width of the sound field of acoustic signals picked up by multiple microphones and reproduces a sound field matching the video range specified by the user (see, for example, Patent Document 1). With this technique, the direction and width of the sound field of acoustic signals picked up by multiple microphones can be adjusted for playback.
There is also a technique that conveys the situation of a member at a remote site by synthesized speech, so that the user can follow it without looking at a display while working (see, for example, Non-Patent Document 4).
Furthermore, there is a technique for an environmental sound recognition device with which remote sites recognize each other's various environmental sounds (see, for example, Patent Document 2).

Patent Document 1: JP2019-068210A
Patent Document 2: Japanese Patent No. 6085538

Non-Patent Document 1: Ministry of Internal Affairs and Communications (ed.), "Research Report on Telework Trends and Productivity", International Strategy Bureau of Information and Communications, Ministry of Internal Affairs and Communications (2010), [online], [retrieved March 10, 2020], Internet <URL: https://www.soumu.go.jp/johotsusintokei/linkdata/h22_06_houkoku.pdf>
Non-Patent Document 2: W. Buxton, "Telepresence: Integrating shared task and person spaces", Proceedings of Graphics Interface, 1992, [online], [retrieved March 10, 2020], Internet <URL: https://www.billbuxton.com/TelepShrdSpce.pdf>
Non-Patent Document 3: Japan Ergonomics Society Conference Proceedings 406-407, 2009: "Always-on audio conferencing system for telework"
Non-Patent Document 4: HRI 2018: "Fribo: A Social Networking Robot for Increasing Social Connectedness through Sharing Daily Home Activities from Living Noise Data", [online], [retrieved March 10, 2020], Internet <URL: https://yonsei.pure.elsevier.com/en/publications/fribo-a-social-networking-robot-for-increasing-social-connectedne>
Non-Patent Document 5: Institute of Electronics, Information and Communication Engineers, "Forest of Knowledge", Group 2 (Image/Sound/Language), Volume 6 (Acoustic Signal Processing), Chapter 2 (Sound Source Separation), [online], [retrieved April 20, 2020], Internet <URL: http://www.ieice-hbkb.org/files/02/02gun_06hen_02.pdf>
Non-Patent Document 6: Kazuho Ono, "Multichannel Audio", [online], [retrieved March 10, 2020], Internet <URL: https://www.jstage.jst.go.jp/article/itej/68/8/68_604/_pdf/-char/ja>

With existing telepresence systems, an acoustic signal picked up by a sound collection device at a first site can be reproduced by a playback device at a second site. However, the environmental sound at the first site is simply recorded, and the second site simply plays it back as is.

In contrast, the inventors of the present application considered that users at the second site would better sense the presence of users at the first site if the environmental sound of the first site were played back at the second site according to the sound source positions at the first site. For example, if the sound of a running water tap is picked up at the first site, could it be reproduced at the second site so that it arrives from the same source position?

Accordingly, an object of the present invention is to provide a sound collection device, system, program, and method that transmit environmental sound signals picked up by a plurality of microphones so that a playback device can reproduce them so as to be heard from a predetermined direction of arrival.

According to the present invention, a sound collection device that transmits environmental sound signals picked up by a plurality of microphones to a playback device that reproduces them through a plurality of speakers
is connected to an environmental sensor with which an acoustic tag and a direction of arrival are associated, and comprises:
a first acoustic database that stores an acoustic signal for each acoustic tag;
sound source separation means for detecting, from the environmental sound signal, one or more acoustic signals contained in it and estimating the direction of arrival of each acoustic signal;
acoustic tag estimation means for estimating the acoustic tag of each acoustic signal using the first acoustic database; and
acoustic tag transmission means for transmitting, to the playback device, the acoustic tag estimated by the acoustic tag estimation means together with the direction of arrival estimated by the sound source separation means, and, when a predetermined signal is received from the environmental sensor, the acoustic tag and direction of arrival associated with that environmental sensor,
wherein the playback device is caused to reproduce the acoustic signal associated with the acoustic tag from that direction of arrival.

According to another embodiment of the sound collection device of the present invention,
it is also preferable that the sound source separation means estimates the direction of arrival of each acoustic signal by blind source separation or beamforming using the plurality of microphones.

According to another embodiment of the sound collection device of the present invention, it is also preferable that
the sound collection device is connected to a camera and further comprises:
an image database that stores image objects with which acoustic tags are associated; and
an image object detection engine that uses the image database to detect one or more image objects contained in the video captured by the camera and identifies the acoustic tag of each image object,
wherein the acoustic tag transmission means transmits the direction of arrival of the acoustic signal associated with the acoustic tag identified by the image object detection engine.

According to the present invention, in a system in which the above sound collection device and a playback device equipped with a plurality of speakers are connected via a network,
the playback device comprises:
a second acoustic database that stores an acoustic signal for each acoustic tag;
acoustic tag reception means for receiving the acoustic tag and the direction of arrival from the sound collection device; and
environmental sound reproduction means for outputting, from the plurality of speakers, environmental sound synthesized using the second acoustic database so that the acoustic signal associated with the acoustic tag is heard from the direction of arrival of that acoustic tag.

According to another embodiment of the system of the present invention, it is also preferable that
the acoustic tag transmission means of the sound collection device further transmits the acoustic signal together with the acoustic tag and the direction of arrival, and
the environmental sound reproduction means of the playback device reproduces the acoustic signal received from the sound collection device instead of the acoustic signal associated with the acoustic tag.

According to another embodiment of the system of the present invention, it is also preferable that
the acoustic tags and acoustic signals stored in the second acoustic database of the playback device are some or all of the acoustic tags and acoustic signals stored in the first acoustic database of the sound collection device, and
even where an acoustic tag stored in the second acoustic database of the playback device is identical to an acoustic tag stored in the first acoustic database of the sound collection device, the acoustic signals associated with it are based on different acoustic signals.

According to the present invention, a program causes a computer mounted on a sound collection device to function, the sound collection device transmitting environmental sound signals picked up by a plurality of microphones to a playback device that reproduces them through a plurality of speakers, wherein
the sound collection device is connected to an environmental sensor with which an acoustic tag and a direction of arrival are associated, and
the program causes the computer to function as:
a first acoustic database that stores an acoustic signal for each acoustic tag;
sound source separation means for detecting, from the environmental sound signal, one or more acoustic signals contained in it and estimating the direction of arrival of each acoustic signal;
acoustic tag estimation means for estimating the acoustic tag of each acoustic signal using the first acoustic database; and
acoustic tag transmission means for transmitting, to the playback device, the acoustic tag estimated by the acoustic tag estimation means together with the direction of arrival estimated by the sound source separation means, and, when a predetermined signal is received from the environmental sensor, the acoustic tag and direction of arrival associated with that environmental sensor,
and causes the playback device to reproduce the acoustic signal associated with the acoustic tag from that direction of arrival.

According to the present invention, a program causes a computer mounted on a playback device to function, the playback device being equipped with a plurality of speakers and receiving an acoustic tag and a direction of arrival from the above sound collection device, wherein the program causes the computer to function as:
a second acoustic database that stores an acoustic signal for each acoustic tag;
acoustic tag reception means for receiving the acoustic tag and the direction of arrival from the sound collection device; and
environmental sound reproduction means for outputting, from the plurality of speakers, environmental sound synthesized using the second acoustic database so that the acoustic signal associated with the acoustic tag is heard from the direction of arrival of that acoustic tag.

According to the present invention, in a sound collection method of a sound collection device that transmits environmental sound signals picked up by a plurality of microphones to a playback device that reproduces them through a plurality of speakers,
the sound collection device
is connected to an environmental sensor with which an acoustic tag and a direction of arrival are associated,
has a first acoustic database that stores an acoustic signal for each acoustic tag, and executes:
a first step of detecting, from the environmental sound signal, one or more acoustic signals contained in it and estimating the direction of arrival of each acoustic signal;
a second step of estimating the acoustic tag of each acoustic signal using the first acoustic database; and
a third step of transmitting, to the playback device, the acoustic tag estimated in the second step together with the direction of arrival estimated in the first step, and, when a predetermined signal is received from the environmental sensor, the acoustic tag and direction of arrival associated with that environmental sensor,
thereby causing the playback device to reproduce the acoustic signal associated with the acoustic tag from that direction of arrival.

According to the present invention, in a reproduction method of a playback device that is equipped with a plurality of speakers and receives an acoustic tag and a direction of arrival from the above sound collection device,
the playback device
has a second acoustic database that stores an acoustic signal for each acoustic tag, and executes:
a first step of receiving the acoustic tag and the direction of arrival from the sound collection device; and
a second step of outputting, from the plurality of speakers, environmental sound synthesized using the second acoustic database so that the acoustic signal associated with the acoustic tag is heard from the direction of arrival of that acoustic tag.

According to the sound collection device, system, program, and method of the present invention, environmental sound signals picked up by a plurality of microphones can be transmitted so that a playback device can reproduce them so as to be heard from a predetermined direction of arrival.
Specifically, the acoustic signal of each sound source on the collection side can be reproduced on the playback side according to the position of that sound source. Even when the sound collection device and the playback device are placed at different sites, members staying at remote locations can share each other's environmental sounds.

FIG. 1 is an external view showing the environmental sound at site A, where the sound collection device is placed.
FIG. 2 is a functional configuration diagram of the sound collection device according to the present invention.
FIG. 3 is an explanatory diagram of the sound source separation unit and the acoustic tag estimation unit of the sound collection device.
FIG. 4 is an explanatory diagram showing detection of the direction of arrival using blind source separation.
FIG. 5 is an explanatory diagram showing detection of the direction of arrival using beamforming.
FIG. 6 is a functional configuration diagram of the playback device according to the present invention.
FIG. 7 is an explanatory diagram showing the data transmitted from the sound collection device to the playback device.
FIG. 8 is an explanatory diagram of the environmental sound reproduction unit of the playback device in pattern 1.
FIG. 9 is an external view showing the acoustic signals reproduced in pattern 1.
FIG. 10 is an explanatory diagram of the environmental sound reproduction unit in pattern 2.
FIG. 11 is an external view showing the acoustic signals reproduced in pattern 2.
FIG. 12 is an explanatory diagram of a sound collection device that transmits an acoustic tag corresponding to an environmental sensor.
FIG. 13 is an explanatory diagram of a sound collection device that transmits an acoustic tag estimated from camera video.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Sound collection device 1>
FIG. 1 is an external view showing the environmental sound at site A, where the sound collection device is placed.

The present invention comprises at least a sound collection device 1 placed within a first site.
Within the first site, the sound collection device 1 detects the directions of arrival of a plurality of acoustic signals from the environmental sound heard by the users, and transmits their acoustic tags and directions of arrival to the playback device (pattern 1). It may also transmit the collected acoustic signals themselves to the playback device (pattern 2).

FIG. 1 shows the surroundings of users a1 and a2, in which a water tap, a window, and a washing machine are located. Users a1 and a2 hear environmental sound in which acoustic signals such as the following are mixed:
Running water tap: "jaa"
Window opening/closing: "bata"
Washing machine running: "gurun-gurun"
The sound collection device 1 is equipped with a plurality of microphones for collecting the environmental sound and communicates with a remote playback device 2 via a network.

FIG. 2 is a functional configuration diagram of the sound collection device according to the present invention.

As shown in FIG. 2, the sound collection device 1 has a plurality of microphones 101, a first acoustic database 11, a sound source separation unit 12, an acoustic tag estimation unit 13, an acoustic tag transmission unit 14, and a video transmission unit 15. These functional components can be realized by executing a program that causes a computer mounted on the device to function. The processing flow of these functional components can also be understood as a sound collection and transmission method.

[Microphones 101]
The microphones 101 consist of a plurality of microphones that pick up environmental sound, and may be, for example, a microphone array. A microphone array can obtain spatial information about sound by applying signal processing to the environmental sound picked up by its multiple microphones.

[First acoustic database 11]
The first acoustic database 11 stores an acoustic signal for each acoustic tag.
Acoustic tag <-> acoustic signal
An "acoustic tag" is an identifier that specifies an acoustic signal.
An "acoustic signal" here is not limited to the signal itself; it may also be a reference pattern of acoustic features, such as a time series of frequency spectra.

[Sound source separation unit 12]
The sound source separation unit 12 detects one or more acoustic signals contained in the environmental sound and estimates the direction of arrival of each.

FIG. 3 is an explanatory diagram of the sound source separation unit and the acoustic tag estimation unit of the sound collection device.
As shown in FIG. 3, the environmental sound picked up by the microphones 101 is input to the sound source separation unit 12. This environmental sound contains various acoustic signals, for example:
"bata"
"gurun-gurun"
"jaa"
The sound source separation unit 12 outputs the acoustic signals separated and detected for each sound source together with their directions of arrival.

To estimate the directions of arrival of the acoustic signals, the sound source separation unit 12 can employ blind source separation or beamforming. With either method, the acoustic signals mixed in the environmental sound can be detected along with the direction of arrival of each.

(Blind source separation)
FIG. 4 is an explanatory diagram showing detection of the direction of arrival using blind source separation.
Blind source separation, when based on independent component analysis for example (see Non-Patent Document 5), constructs filters that make the separated signals mutually independent, under the assumption that the multiple sound sources, although unknown, are statistically independent of one another. Up to (number of microphones - 1) acoustic signals can be detected. In principle, blind source separation requires no information such as knowledge of the type or spatial position of the sound sources, segmentation of the target sound interval, or mixing conditions, nor does it assume a harmonic structure in the source signals.

(Beamforming)
FIG. 5 is an explanatory diagram showing detection of the direction of arrival using beamforming.
Beamforming is a method in which the microphones detect the acoustic signal of a sound source in a target direction (see, for example, Non-Patent Document 5). Because sound propagates differently from the source to each microphone, phase and amplitude are controlled with delays and filters. This lowers the sensitivity to acoustic signals from other directions and secures the sensitivity (S/N ratio) for the target direction.
Specifically, the angle around the microphone array is divided into multiple ranges (eight in FIG. 5), and an acoustic signal is picked up with each angular range as the target direction.
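The sector scan described above can be illustrated with a minimal delay-and-sum sketch. This is not the implementation of the sound source separation unit 12. It assumes a hypothetical circular array, a speed of sound of 343 m/s, whole-sample delays, and invented function names, and it simply steers one beam at the centre of each of the eight sectors and picks the strongest.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def circular_array(num_mics, radius_m):
    """Return (x, y) positions of microphones evenly spaced on a circle."""
    return [
        (radius_m * math.cos(2 * math.pi * m / num_mics),
         radius_m * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def sample_delays(positions, angle_deg, fs):
    """Integer sample delay at each microphone, relative to the array centre,
    for a plane wave arriving from angle_deg (microphones nearer the source
    get negative delays)."""
    ux = math.cos(math.radians(angle_deg))
    uy = math.sin(math.radians(angle_deg))
    return [round(-(x * ux + y * uy) / SPEED_OF_SOUND * fs) for x, y in positions]


def steered_power(channels, delays, margin):
    """Output power of a delay-and-sum beam steered with the given delays.
    margin must be at least the largest absolute delay."""
    power = 0.0
    for t in range(margin, len(channels[0]) - margin):
        y = sum(ch[t + d] for ch, d in zip(channels, delays))
        power += y * y
    return power


def estimate_sector(channels, positions, fs, num_sectors=8, margin=32):
    """Steer one beam at the centre of each angular sector and return
    (sector index, sector centre in degrees) of the strongest beam."""
    width = 360.0 / num_sectors
    centres = [width * (k + 0.5) for k in range(num_sectors)]
    powers = [
        steered_power(channels, sample_delays(positions, c, fs), margin)
        for c in centres
    ]
    best = max(range(num_sectors), key=lambda k: powers[k])
    return best, centres[best]


def simulate_plane_wave(positions, angle_deg, fs, num_samples):
    """Synthetic test input: a two-tone source seen by every microphone
    with the delay predicted by sample_delays()."""
    def source(n):
        return (math.sin(2 * math.pi * 500 * n / fs)
                + 0.5 * math.sin(2 * math.pi * 1300 * n / fs))
    return [[source(n - d) for n in range(num_samples)]
            for d in sample_delays(positions, angle_deg, fs)]
```

A practical beamformer would use fractional delays and filters rather than whole-sample shifts. The rounding here only keeps the sketch short.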

[Acoustic tag estimation unit 13]
The acoustic tag estimation unit 13 estimates the acoustic tag of each acoustic signal using the first acoustic database 11.

The acoustic tag estimation unit 13 extracts mel-frequency cepstral coefficients (MFCCs) as features and identifies acoustic signals using a deep-learning neural network (see, for example, Non-Patent Documents 3 and 4). Hidden layers pre-trained by autoencoders based on restricted Boltzmann machines (RBMs) are stacked to build a multilayer hierarchical network, a classification network that uses the output of the final layer is added, and the network as a whole detects acoustic tags through supervised learning.
In the training stage, the acoustic tag estimation unit 13 learns from training data in which the acoustic tags and acoustic signals stored in the first acoustic database 11 are associated with each other. In the estimation stage, it takes an acoustic signal from the sound source separation unit 12 as input and outputs the acoustic tag corresponding to that signal.
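The input/output contract of the estimation stage (features in, acoustic tag out) can be sketched as follows. Note that this deliberately swaps in a much simpler technique: nearest-reference matching by cosine similarity instead of the MFCC extraction and RBM-pretrained neural network described above. The database contents, feature vectors, threshold, and function names are all invented for illustration; only the tag numbers follow the example given for FIG. 3.

```python
import math

# Hypothetical reference patterns: acoustic tag -> feature vector.
# The vectors are invented; a real system would store learned features.
FIRST_ACOUSTIC_DB = {
    101: [0.9, 0.1, 0.2],  # running water tap
    143: [0.1, 0.8, 0.3],  # window opening/closing
    167: [0.2, 0.3, 0.9],  # washing machine running
}


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def estimate_tag(features, database, threshold=0.8):
    """Return the tag whose reference pattern is most similar to features,
    or None if no pattern reaches the similarity threshold."""
    best_tag, best_score = None, threshold
    for tag, reference in database.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_tag, best_score = tag, score
    return best_tag
```

Returning None for an unmatched signal is a design choice of this sketch, so that unknown sounds are not forced onto the nearest tag.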

As shown in FIG. 3, the acoustic tag estimation unit 13 estimates acoustic tags as follows, for example:
Acoustic tag 101 (running water tap)
Acoustic tag 167 (washing machine running)
Acoustic tag 143 (window opening/closing)

[Acoustic tag transmission unit 14]
The acoustic tag transmission unit 14 transmits the acoustic tag and the direction of arrival to the playback device 2 (pattern 1).
In another embodiment, where the acoustic signal picked up by the sound collection device 1 is to be reproduced as is by the playback device 2, the acoustic tag transmission unit 14 also transmits the acoustic signal itself to the playback device 2 (pattern 2).
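The difference between the two patterns can be made concrete with a small serialization sketch. The patent text here does not specify a wire format, so the use of JSON and the field names `acoustic_tag`, `direction_deg`, and `signal` are assumptions made purely for illustration.

```python
import json


def build_message(tag, direction_deg, samples=None):
    """Pattern 1 carries only the tag and direction of arrival;
    pattern 2 additionally carries the collected signal samples."""
    message = {"acoustic_tag": tag, "direction_deg": direction_deg}
    if samples is not None:
        message["signal"] = list(samples)
    return json.dumps(message)


def parse_message(payload):
    """Return (tag, direction, samples); samples is None for pattern 1."""
    message = json.loads(payload)
    return (message["acoustic_tag"],
            message["direction_deg"],
            message.get("signal"))
```

For example, `build_message(101, 45.0)` produces a pattern 1 message, while passing a list of samples as the third argument produces a pattern 2 message.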

[Video transmission unit 15]
The video transmission unit 15 transmits video captured by a camera to the playback device 2. As part of the telepresence system, it sends the video of site A to site B.

<Playback device 2>
The playback device 2 is placed within the second site and receives the acoustic tags and directions of arrival from the sound collection device. It then synthesizes the plurality of acoustic signals received from the sound collection device at the first site and reproduces them as environmental sound within the second site. Each acoustic signal can be reproduced so that it is heard from a predetermined direction of arrival. Multichannel audio technology can be used for this (see, for example, Non-Patent Document 6).
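One simple way to make a signal appear to come from a given direction with several speakers is constant-power amplitude panning between the two nearest speakers. The sketch below is only an illustration of that general technique and is not taken from the patent or from Non-Patent Document 6. It assumes a hypothetical ring of at least two evenly spaced speakers, with speaker k at 360*k/N degrees, and the function names are invented.

```python
import math


def panning_gains(direction_deg, num_speakers):
    """Constant-power gains that place a source between the two nearest
    speakers of an evenly spaced ring (speaker k sits at 360*k/num_speakers
    degrees). Assumes num_speakers >= 2."""
    spacing = 360.0 / num_speakers
    position = (direction_deg % 360.0) / spacing  # in units of speaker spacing
    lower = int(position) % num_speakers
    upper = (lower + 1) % num_speakers
    fraction = position - int(position)           # 0 at lower, 1 at upper
    gains = [0.0] * num_speakers
    gains[lower] = math.cos(fraction * math.pi / 2)
    gains[upper] += math.sin(fraction * math.pi / 2)
    return gains


def render(signal, direction_deg, num_speakers):
    """Return one output channel per speaker for a mono input signal."""
    gains = panning_gains(direction_deg, num_speakers)
    return [[g * s for s in signal] for g in gains]
```

The cosine and sine weights keep the sum of squared gains at 1, so the perceived loudness stays roughly constant as the direction changes.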

図６は、本発明における再生装置の機能構成図である。
図６によれば、再生装置２は、スピーカ２０１と、ディスプレイ２０２と、第２の音響データベース２１と、音響タグ受信部２２と、環境音再生部２３と、映像再生部２４とを有する。これら機能構成部は、装置に搭載されたコンピュータを機能させるプログラムを実行することによって実現できる。また、これら機能構成部の処理の流れは、環境音再生方法としても理解できる。 FIG. 6 is a functional configuration diagram of the playback device according to the present invention.
According to FIG. 6, the playback device 2 includes a speaker 201, a display 202, a second acoustic database 21, an acoustic tag receiving section 22, an environmental sound playback section 23, and a video playback section 24. These functional components can be realized by executing a program that causes a computer installed in the device to function. Further, the processing flow of these functional components can also be understood as an environmental sound reproduction method.

［第２の音響データベース２１］
第２の音響データベース２１は、音響タグ毎に、音響信号を蓄積する。基本的に、収音装置１の第１の音響データベース１１は、再生装置２の第２の音響データベースと同じ機能のものである。音響タグに紐付く音響信号は、できる限り、原音に近い音響信号であることが好ましい。 [Second acoustic database 21]
The second acoustic database 21 accumulates acoustic signals for each acoustic tag. Basically, the first acoustic database 11 of the sound collection device 1 has the same function as the second acoustic database 21 of the playback device 2. The acoustic signal associated with each acoustic tag is preferably as close to the original sound as possible.

ここで、再生装置２の第２の音響データベース２１に蓄積された音響タグ及び音響信号は、収音装置１の第１の音響データベース１１に蓄積された音響タグ及び音響信号の一部又は全部であってもよい。
例えば、収音装置１の第１の音響データベース１１に蓄積された音響タグ及び音響信号が、再生装置２の第２の音響データベース２１に蓄積されていない場合、その音響信号は再生されないだけである。
一方で、収音装置１の第１の音響データベース１１に蓄積された音響タグ及び音響信号が、再生装置２の第２の音響データベース２１にも蓄積されている場合、再生装置２における第２の音響データベース２１の音響信号によって合成された環境音が再生される。即ち、再生装置２の第２の音響データベース２１に蓄積された音響タグと、収音装置１の第１の音響データベース１１に蓄積された音響タグとが、異なる音響信号である場合、第２の音響データベース２１の音響信号によって変換された環境音が再生されることとなる。 Here, the acoustic tags and acoustic signals accumulated in the second acoustic database 21 of the playback device 2 may be part or all of the acoustic tags and acoustic signals accumulated in the first acoustic database 11 of the sound collection device 1.
For example, if an acoustic tag and acoustic signal accumulated in the first acoustic database 11 of the sound collection device 1 are not accumulated in the second acoustic database 21 of the playback device 2, that acoustic signal is simply not reproduced.
On the other hand, if an acoustic tag and acoustic signal accumulated in the first acoustic database 11 of the sound collection device 1 are also accumulated in the second acoustic database 21 of the playback device 2, environmental sound synthesized from the acoustic signals of the second acoustic database 21 of the playback device 2 is reproduced. That is, if the same acoustic tag is associated with different acoustic signals in the second acoustic database 21 of the playback device 2 and in the first acoustic database 11 of the sound collection device 1, environmental sound converted to the acoustic signals of the second acoustic database 21 is reproduced.
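As a non-limiting illustration, the lookup behavior described above can be sketched as follows. The dictionary `SECOND_DB`, the function name `resolve_signal`, and the use of short strings in place of audio waveforms are assumptions made for this example. Tag 143 is deliberately left out of the hypothetical database to show the unregistered case.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the second acoustic database 21:
# acoustic tag -> stored acoustic signal (strings stand in for waveforms).
SECOND_DB: Dict[int, str] = {
    101: "sha",       # water faucet sound
    167: "krunkrun",  # washing machine driving sound
}

def resolve_signal(tag: int, db: Dict[int, str]) -> Optional[str]:
    """Return the playback-side signal for a tag, or None if the tag is unregistered."""
    # A tag that is missing from the playback-side database is simply not reproduced.
    return db.get(tag)

received_tags = [101, 167, 143]
resolved = {tag: resolve_signal(tag, SECOND_DB) for tag in received_tags}
to_play = {tag: sig for tag, sig in resolved.items() if sig is not None}
```

Because the lookup uses only the playback-side database, the sound that is reproduced is whatever that database holds for the tag, which may differ from the sound stored on the collection side.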

［音響タグ受信部２２］
音響タグ受信部２２は、収音装置１から、音響タグ及び到来方向（及び音響信号）を受信する。受信した音響タグ及び到来方向（及び音響信号）は、環境音再生部２３へ出力される。 [Acoustic tag receiving unit 22]
The acoustic tag receiving unit 22 receives the acoustic tag and direction of arrival (and acoustic signal) from the sound collection device 1. The received acoustic tag and direction of arrival (and acoustic signal) are output to the environmental sound reproduction unit 23.

［環境音再生部２３］
環境音再生部２３は、受信した音響タグに紐付く音響信号を、受信した到来方向から聞こえるように合成し、環境音を再生する。環境音は、複数のスピーカ２０１へ出力される。
スピーカ２０１は、複数のスピーカからなり、ユーザに対して、収音装置１が配置された拠点Ａにおける音源位置の到来方向から聞こえるように環境音を再生する。 [Environmental sound reproduction section 23]
The environmental sound reproduction unit 23 synthesizes the acoustic signals associated with the received acoustic tags so that they can be heard from the received direction of arrival, and reproduces the environmental sounds. The environmental sound is output to multiple speakers 201.
The speaker 201 is composed of a plurality of speakers, and reproduces environmental sound so that the user can hear it from the direction of arrival of the sound source position at the base A where the sound collection device 1 is placed.
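As a non-limiting illustration, one way to make a signal appear to come from a given direction over a plurality of speakers is pairwise constant-power amplitude panning, sketched below. The text only refers to multi-channel audio technology and does not specify this technique. The equally spaced speaker ring, the use of an azimuth in degrees (the mapping from a direction-of-arrival index to an angle is not given in the text), and the names `panning_gains` and `render` are assumptions.

```python
import math
from typing import List, Sequence, Tuple

def panning_gains(azimuth_deg: float, num_speakers: int) -> List[float]:
    """Constant-power gains for a ring of equally spaced speakers (speaker 0 at 0 degrees)."""
    if num_speakers < 2:
        raise ValueError("at least two speakers are required")
    spacing = 360.0 / num_speakers
    position = (azimuth_deg % 360.0) / spacing   # fractional speaker index
    lower = int(position) % num_speakers         # speaker just before the source
    upper = (lower + 1) % num_speakers           # speaker just after the source
    frac = position - int(position)              # 0 at `lower`, approaching 1 at `upper`
    gains = [0.0] * num_speakers
    gains[lower] = math.cos(frac * math.pi / 2)
    gains[upper] = math.sin(frac * math.pi / 2)
    return gains

def render(sources: Sequence[Tuple[Sequence[float], float]], num_speakers: int) -> List[List[float]]:
    """Mix (samples, azimuth_deg) pairs into one list of samples per speaker."""
    length = max(len(samples) for samples, _ in sources)
    out = [[0.0] * length for _ in range(num_speakers)]
    for samples, azimuth in sources:
        for ch, gain in enumerate(panning_gains(azimuth, num_speakers)):
            for i, x in enumerate(samples):
                out[ch][i] += gain * x
    return out

# Hypothetical example: one two-sample signal arriving from 90 degrees, four speakers.
channels = render([([1.0, 1.0], 90.0)], 4)
```

The squared gains always sum to one, so the perceived loudness stays roughly constant as a source moves between adjacent speakers.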

［映像再生部２４］
映像再生部は、収音装置１から映像を受信し、その映像をディスプレイ２０２へ出力する。
ディスプレイ２０２は、その映像を再生し、ユーザに対して視認させる。 [Video playback section 24]
The video playback unit receives video from the sound collection device 1 and outputs the video to the display 202.
The display 202 plays back the video and allows the user to view it.

図７は、収音装置から再生装置へ送信されるデータを表す説明図である。 FIG. 7 is an explanatory diagram showing data transmitted from the sound collection device to the playback device.

図７によれば、例えば以下の２つのパターンの実施例がある。
［パターン１］
収音装置１の音響タグ送信部１４は、「音響タグ」「到来方向」を、再生装置２の音響タグ受信部２２へ送信する。
（収音装置）
音響タグ101（水道の蛇口音）：到来方向１ ->
音響タグ167（洗濯機の駆動音）：到来方向３ ->
音響タグ143（窓の開閉音）：到来方向８ ->
［パターン２］
収音装置１の音響タグ送信部１４は、「音響タグ」「到来方向」「音響信号」を、再生装置２の音響タグ受信部２２へ送信する。
（収音装置）
音響タグ101（水道の蛇口音）：到来方向１：音響信号「ジャー」 ->
音響タグ167（洗濯機の駆動音）：到来方向３：音響信号「グルングルン」 ->
音響タグ143（窓の開閉音）：到来方向８：音響信号「バタッ」 -> According to FIG. 7, there are, for example, the following two patterns.
[Pattern 1]
The acoustic tag transmitter 14 of the sound collection device 1 transmits the "acoustic tag" and "direction of arrival" to the acoustic tag receiving unit 22 of the playback device 2.
(sound collection device)
Acoustic tag 101 (water faucet sound): Direction of arrival 1 ->
Acoustic tag 167 (washing machine driving sound): Direction of arrival 3 ->
Acoustic tag 143 (window opening/closing sound): Direction of arrival 8 ->
[Pattern 2]
The acoustic tag transmitter 14 of the sound collection device 1 transmits the "acoustic tag", "direction of arrival", and "acoustic signal" to the acoustic tag receiving unit 22 of the playback device 2.
(sound collection device)
Acoustic tag 101 (water faucet sound): Direction of arrival 1: Acoustic signal "jar" ->
Acoustic tag 167 (washing machine driving sound): Direction of arrival 3: Acoustic signal "Grungurun" ->
Acoustic tag 143 (window opening/closing sound): Direction of arrival 8: Acoustic signal "bap" ->
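As a non-limiting illustration, the data transmitted in the two patterns above can be modeled as a single record whose acoustic signal field is optional. The class name `TagMessage`, its field names, and the placeholder byte strings are assumptions; the text does not define a wire format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TagMessage:
    """Hypothetical record sent from the sound collection device to the playback device."""
    tag: int                        # acoustic tag, e.g. 101
    direction: int                  # direction-of-arrival index, e.g. 1
    signal: Optional[bytes] = None  # collected acoustic signal, present only in pattern 2

    @property
    def pattern(self) -> int:
        """Pattern 2 when the collected acoustic signal accompanies the tag, else pattern 1."""
        return 2 if self.signal is not None else 1

# Pattern 1: acoustic tag and direction of arrival only.
pattern1 = [TagMessage(101, 1), TagMessage(167, 3), TagMessage(143, 8)]

# Pattern 2: the collected acoustic signal is sent as well (placeholder bytes).
pattern2 = [
    TagMessage(101, 1, b"<faucet audio>"),
    TagMessage(167, 3, b"<washing machine audio>"),
    TagMessage(143, 8, b"<window audio>"),
]
```

Pattern 1 keeps each message to a few bytes, whereas pattern 2 also carries the audio payload itself.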

図８は、パターン１における再生装置の環境音再生部の説明図である。 FIG. 8 is an explanatory diagram of the environmental sound reproduction section of the reproduction device in pattern 1.

図８によれば、環境音再生部２３は、環境音に内在する音響信号について、音響タグ受信部２２から音響タグ及び到来方向を入力する。
音響タグ101（水道の蛇口音）：到来方向１
音響タグ167（洗濯機の駆動音）：到来方向３
音響タグ143（窓の開閉音）：到来方向８
また、第２の音響データベース２１によれば、音響タグ毎に、以下のような音響信号が対応付けられている。
音響タグ101（水道の蛇口音）：音響信号「シャー」
音響タグ167（洗濯機の駆動音）：音響信号「クルンクルン」
音響タグ143（窓の開閉音）：音響信号「キーッ」
環境音再生部２３は、第２の音響データベース２１に登録された音響信号「シャー」「クルンクルン」「キーッ」を、拠点Ａにおける各到来方向の音源から聞こえるように合成し、環境音を再生する。
音響信号「シャー」：到来方向１
音響信号「クルンクルン」：到来方向３
音響信号「キーッ」：到来方向８ According to FIG. 8, the environmental sound reproducing unit 23 inputs the acoustic tag and arrival direction from the acoustic tag receiving unit 22 regarding the acoustic signal inherent in the environmental sound.
Acoustic tag 101 (water faucet sound): Direction of arrival 1
Acoustic tag 167 (washing machine driving sound): Direction of arrival 3
Acoustic tag 143 (window opening/closing sound): Direction of arrival 8
Further, according to the second acoustic database 21, the following acoustic signals are associated with each acoustic tag.
Acoustic tag 101 (water faucet sound): Acoustic signal “sha”
Acoustic tag 167 (washing machine driving sound): Acoustic signal "Krunkrun"
Acoustic tag 143 (window opening/closing sound): Acoustic signal “squeak”
The environmental sound reproduction unit 23 synthesizes the acoustic signals "sha", "Krunkrun", and "squeak" registered in the second acoustic database 21 so that they are heard from the sound sources in their respective directions of arrival at base A, and reproduces the environmental sound.
Acoustic signal "sha": Direction of arrival 1
Acoustic signal "Krunkrun": Direction of arrival 3
Acoustic signal "squeak": Direction of arrival 8

図９は、パターン１によって再生された音響信号を表す外観図である。 FIG. 9 is an external view showing an acoustic signal reproduced by pattern 1.

図９によれば、再生装置２が配置された拠点Ｂにおけるユーザｂには、拠点Ａの音源となる水道や窓、洗濯機の配置位置から、各音響信号が聞こえるようになる。
例えば拠点Ｂの環境音として、拠点Ａの窓の方向から音響信号「キーッ」が再生されている。これは、拠点Ａの環境音として、窓の開閉音「バタッ」を検出した際に、拠点Ａの窓と同じ方向から到来するように再生対象の音響信号「キーッ」が再生されている。できる限り、原音に近い音を再生することが好ましい。
このように、第２の音響データベース２１に登録された音響信号を、収音装置１で収音された音響信号の到来方向に応じた位置の音源から聞こえるような環境音として、再生することができる。 According to FIG. 9, user b at base B, where the playback device 2 is located, can hear various acoustic signals from the locations of the water supply, windows, and washing machines that serve as sound sources at base A.
For example, as environmental sound at base B, the acoustic signal "squeak" is reproduced from the direction of the window at base A. That is, when the window opening/closing sound "bap" is detected as environmental sound at base A, the acoustic signal "squeak" selected for playback is reproduced so that it arrives from the same direction as the window at base A. It is preferable to reproduce a sound as close to the original sound as possible.
In this way, the acoustic signals registered in the second acoustic database 21 can be reproduced as environmental sound that is heard from sound sources at positions corresponding to the directions of arrival of the acoustic signals collected by the sound collection device 1.

図１０は、パターン２における環境音再生部の説明図である。 FIG. 10 is an explanatory diagram of the environmental sound reproduction section in pattern 2.

図１０によれば、環境音再生部２３は、環境音に内在する音響信号について、音響タグ受信部２２から、音響タグ及び到来方向と音響信号とを入力する。
音響タグ101（水道の蛇口音）：到来方向１：音響信号「ジャー」
音響タグ167（洗濯機の駆動音）：到来方向３：音響信号「グルングルン」
音響タグ143（窓の開閉音）：到来方向８：音響信号「バタッ」
環境音再生部２３は、受信した音響信号「ジャー」「グルングルン」「バタッ」を、拠点Ａにおける各到来方向の音源から聞こえるように合成し、環境音を再生する。
音響信号「ジャー」：到来方向１
音響信号「グルングルン」：到来方向３
音響信号「バタッ」：到来方向８ According to FIG. 10, the environmental sound reproducing unit 23 inputs an acoustic tag, an arrival direction, and an acoustic signal from the acoustic tag receiving unit 22 regarding the acoustic signal inherent in the environmental sound.
Acoustic tag 101 (water faucet sound): Direction of arrival 1: Acoustic signal "jar"
Acoustic tag 167 (washing machine driving sound): Direction of arrival 3: Acoustic signal "Grungurun"
Acoustic tag 143 (window opening/closing sound): Direction of arrival 8: Acoustic signal "bap"
The environmental sound reproduction unit 23 synthesizes the received acoustic signals "jar", "Grungurun", and "bap" so that they are heard from the sound sources in their respective directions of arrival at base A, and reproduces the environmental sound.
Acoustic signal "jar": Direction of arrival 1
Acoustic signal "Grungurun": Direction of arrival 3
Acoustic signal "bap": Direction of arrival 8
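As a non-limiting illustration, the difference between pattern 1 and pattern 2 on the playback side can be sketched as a single selection rule: a received acoustic signal, if any, is used in place of the signal registered for the tag. The function names `choose_signal` and `plan_playback`, the tuple layout, and the strings standing in for audio are assumptions.

```python
from typing import Dict, List, Optional, Tuple

Message = Tuple[int, int, Optional[str]]  # (acoustic tag, direction of arrival, received signal)

def choose_signal(tag: int, received: Optional[str], db: Dict[int, str]) -> Optional[str]:
    """Prefer the signal received from the collection side; otherwise use the database entry."""
    if received is not None:
        return received          # pattern 2: reproduce the collected sound as it is
    return db.get(tag)           # pattern 1: reproduce the sound registered for the tag

def plan_playback(messages: List[Message], db: Dict[int, str]) -> List[Tuple[str, int]]:
    """Return (signal, direction of arrival) pairs to hand to the speaker rendering stage."""
    plan = []
    for tag, direction, received in messages:
        signal = choose_signal(tag, received, db)
        if signal is not None:   # unregistered tags without a received signal are skipped
            plan.append((signal, direction))
    return plan

# Hypothetical contents of the second acoustic database 21.
second_db = {101: "sha", 167: "krunkrun", 143: "squeak"}

plan_pattern1 = plan_playback([(101, 1, None), (167, 3, None), (143, 8, None)], second_db)
plan_pattern2 = plan_playback(
    [(101, 1, "jar"), (167, 3, "grungurun"), (143, 8, "bap")], second_db
)
```

In both patterns the directions of arrival are unchanged; only the source of the reproduced signal differs.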

図１１は、パターン２によって再生された音響信号を表す外観図である。 FIG. 11 is an external view showing the acoustic signal reproduced by pattern 2.

図１１によれば、再生装置２が配置された拠点Ｂにおけるユーザｂには、拠点Ａの音源となる水道や窓、洗濯機の配置位置から、各音響信号が聞こえるようになる。
また、図１１によれば、拠点Ｂには、ディスプレイ２０２が配置されており、収音装置１によって撮影された拠点Ａの映像が再生されている。このとき、映像に「窓」が映り込んでいる。例えば拠点Ｂの環境音として、拠点Ａの窓の方向から音響信号「バタッ」が再生されている。これは、拠点Ａの映像における窓と同じ方向から到来するように再生対象の音響信号「バタッ」が再生されている。
このように、拠点Ａの収音装置１によって収音された音響信号を、拠点Ｂではその到来方向に応じた位置の音源から聞こえるような環境音として、再生することができる。 According to FIG. 11, the user b at the base B where the playback device 2 is located can hear each acoustic signal from the location of the water supply, window, and washing machine that are the sound sources of the base A.
Further, according to FIG. 11, a display 202 is arranged at base B and reproduces the video of base A captured by the sound collection device 1. A "window" appears in this video. For example, as environmental sound at base B, the acoustic signal "bap" is reproduced from the direction of the window at base A. That is, the acoustic signal "bap" selected for playback is reproduced so that it arrives from the same direction as the window in the video of base A.
In this way, the sound signal picked up by the sound collection device 1 at base A can be reproduced at base B as an environmental sound that can be heard from a sound source located at a position corresponding to the direction of arrival.

図１２は、環境センサに対応する音響タグを送信する収音装置の説明図である。 FIG. 12 is an explanatory diagram of a sound collection device that transmits an acoustic tag corresponding to an environmental sensor.

図１２によれば、収音装置１は、環境センサ１７に接続されており、ON/OFF信号を受信する。環境センサとしては、例えば窓開閉センサのようなものであってもよい。環境センサは、いずれか１つの音響タグに紐付いている。環境センサのON/OFF信号は、音響タグ送信部１４へ入力される。音響タグ送信部１４は、環境センサ１７から所定信号を受信した際に、その環境音信号に対応する音響タグ及び到来方向を再生装置２へ送信する。これによって、例えば窓開閉音のみを再生装置２へ送信することができる。 According to FIG. 12, the sound collection device 1 is connected to an environmental sensor 17 and receives ON/OFF signals. The environmental sensor may be, for example, a window opening/closing sensor. The environmental sensor is tied to one of the acoustic tags. The ON/OFF signal of the environmental sensor is input to the acoustic tag transmitter 14. When the acoustic tag transmitter 14 receives a predetermined signal from the environmental sensor 17, the acoustic tag transmitter 14 transmits the acoustic tag and arrival direction corresponding to the environmental sound signal to the playback device 2. Thereby, for example, only the window opening/closing sound can be transmitted to the playback device 2.
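As a non-limiting illustration, the sensor-triggered transmission described above can be sketched as follows. The class name `SensorBinding`, the callback `send`, and the choice of the ON signal as the "predetermined signal" are assumptions; the text does not state which signal value triggers transmission.

```python
from typing import Callable, List, Tuple

class SensorBinding:
    """Hypothetical binding of one environmental sensor to one acoustic tag and direction."""

    def __init__(self, tag: int, direction: int, send: Callable[[Tuple[int, int]], None]):
        self.tag = tag              # acoustic tag tied to the sensor, e.g. 143 (window)
        self.direction = direction  # direction of arrival tied to the sensor
        self.send = send            # stands in for the acoustic tag transmitter 14

    def on_signal(self, is_on: bool) -> None:
        """Transmit the bound tag and direction when the assumed trigger (ON) is received."""
        if is_on:
            self.send((self.tag, self.direction))

# Hypothetical usage: a window opening/closing sensor bound to tag 143, direction 8.
sent: List[Tuple[int, int]] = []
window_sensor = SensorBinding(143, 8, sent.append)
window_sensor.on_signal(True)
window_sensor.on_signal(False)
```

Because the tag and direction are fixed per sensor, this path needs neither sound source separation nor tag estimation for the bound sound.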

図１３は、カメラの映像から推定した音響タグを送信する収音装置の説明図である。 FIG. 13 is an explanatory diagram of a sound collection device that transmits an acoustic tag estimated from a camera image.

図１３によれば、収音装置１は、カメラ１０２によって撮影された画像を入力する。
また、図１３によれば、収音装置１は、画像データベース１８０及び画像オブジェクト検出エンジン１８１を更に有する。
画像データベース１８０は、音響タグが紐付けられた画像オブジェクトを蓄積する。
画像オブジェクト検出エンジン１８１は、画像データベース１８０を用いて、カメラ１０２によって撮影された映像に内在する１つ以上の画像オブジェクトを検出し、当該画像オブジェクトの音響タグを特定する。特定された音響タグは、音響タグ送信部１４へ出力される。 According to FIG. 13, the sound collection device 1 receives an image captured by the camera 102.
Further, according to FIG. 13, the sound collection device 1 further includes an image database 180 and an image object detection engine 181.
The image database 180 stores image objects with associated acoustic tags.
The image object detection engine 181 uses the image database 180 to detect one or more image objects inherent in the video captured by the camera 102 and identifies the acoustic tag of the image object. The identified acoustic tag is output to the acoustic tag transmitter 14.

具体的には、画像オブジェクト検出エンジン１８１は、入力された画像又は映像から、物体（画像オブジェクト）を枠（バウンディングボックス）で囲み、その物体の種別（カテゴリ）を識別する。これは、例えばＳＳＤ(Single Shot Multibox Detector)のようなものであってもよい。ＳＳＤは、画像をグリッドで分割し、各グリッドに対して固定された複数のバウンディングボックスの当てはまり具合から、その位置のバウンディングボックスを検知する。そのバウンディングボックスには、１つの画像オブジェクトが収まる。
また、画像オブジェクト検出エンジン１８１としては、例えばＲＧＢ認識に基づくＣＮＮ(Convolutional Neural Network)のようなニューラルネットワークであって、ＹＯＬＯ(You Only Look Once)（登録商標）のようなものであってもよい。 Specifically, the image object detection engine 181 surrounds an object (image object) with a frame (bounding box) from the input image or video, and identifies the type (category) of the object. This may be, for example, an SSD (Single Shot Multibox Detector). SSD divides an image into grids, and detects the bounding box at that position based on how well a plurality of bounding boxes fixed to each grid fit. One image object fits within that bounding box.
Further, the image object detection engine 181 may be a neural network such as a CNN (Convolutional Neural Network) based on RGB recognition, for example YOLO (You Only Look Once) (registered trademark).
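As a non-limiting illustration, the step from detected image objects to acoustic tags can be sketched as follows. The detector itself (for example SSD or YOLO) is not implemented here; its output is represented by hypothetical `Detection` records. The dictionary `IMAGE_DB`, the function name `tags_for_detections`, and the score threshold are assumptions.

```python
from typing import Dict, List, NamedTuple, Tuple

class Detection(NamedTuple):
    """Hypothetical output of an object detector for one image object."""
    category: str                    # object type identified by the detector
    box: Tuple[int, int, int, int]   # bounding box (x_min, y_min, x_max, y_max)
    score: float                     # detection confidence in [0, 1]

# Hypothetical stand-in for the image database 180: object category -> acoustic tag.
IMAGE_DB: Dict[str, int] = {"faucet": 101, "washing machine": 167, "window": 143}

def tags_for_detections(
    detections: List[Detection], image_db: Dict[str, int], min_score: float = 0.5
) -> List[int]:
    """Return the acoustic tags of confidently detected objects, without duplicates."""
    tags: List[int] = []
    for det in detections:
        tag = image_db.get(det.category)
        if tag is not None and det.score >= min_score and tag not in tags:
            tags.append(tag)
    return tags

detections = [
    Detection("window", (10, 20, 110, 220), 0.9),
    Detection("sofa", (150, 200, 400, 380), 0.8),  # no acoustic tag registered
    Detection("faucet", (300, 40, 340, 90), 0.3),  # below the assumed threshold
]
identified_tags = tags_for_detections(detections, IMAGE_DB)
```

The identified tags would then be passed to the acoustic tag transmitter 14, as described above.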

以上、詳細に説明したように、本発明の収音装置、システム、プログラム及び方法によれば、複数のマイクによって収音した環境音信号を、再生装置によって所定の到来方向から聞こえるべく再生できるように送信することができる。
本発明によれば、具体的には、収音側における各音源の音響信号を、再生側でも、収音側での音源位置に応じてその音響信号を再生することができる。収音装置と再生装置とが異なる拠点に配置された場合であっても、遠隔に滞在するメンバ同士で、互いの環境音を共有することができる。 As described above in detail, according to the sound collection device, system, program, and method of the present invention, environmental sound signals collected by a plurality of microphones can be transmitted so that the playback device can reproduce them to be heard from a predetermined direction of arrival.
According to the present invention, specifically, the acoustic signal of each sound source on the sound collection side can be reproduced on the reproduction side according to the position of the sound source on the sound collection side. Even if the sound collection device and the playback device are located at different bases, members staying remotely can share each other's environmental sounds.

前述した本発明の種々の実施形態について、本発明の技術思想及び見地の範囲の種々の変更、修正及び省略は、当業者によれば容易に行うことができる。前述の説明はあくまで例であって、何ら制約しようとするものではない。本発明は、特許請求の範囲及びその均等物として限定するものにのみ制約される。 Regarding the various embodiments of the present invention described above, various changes, modifications, and omissions within the scope of the technical idea and viewpoint of the present invention can be easily made by those skilled in the art. The above description is merely an example and is not intended to be limiting in any way. The invention is limited only by the claims and their equivalents.

１収音装置
１０１マイク
１０２カメラ
１１第１の音響データベース
１２音源分離部
１３音響タグ推定部
１４音響タグ送信部
１５映像送信部
１７環境センサ
１８０画像データベース
１８１画像オブジェクト検出エンジン
２再生装置
２０１スピーカ
２０２ディスプレイ
２１第２の音響データベース
２２音響タグ受信部
２３環境音再生部
２４映像再生部

1 sound collection device
101 microphone
102 camera
11 first acoustic database
12 sound source separation section
13 acoustic tag estimation section
14 acoustic tag transmission section
15 video transmission section
17 environment sensor
180 image database
181 image object detection engine
2 playback device
201 speaker
202 display
21 second acoustic database
22 acoustic tag receiving section
23 environmental sound reproduction section
24 video reproduction section

Claims

1. A sound collection device that transmits environmental sound signals collected by a plurality of microphones to a playback device that reproduces them using a plurality of speakers, the sound collection device being connected to an environmental sensor associated with an acoustic tag and a direction of arrival, and comprising:
a first acoustic database that accumulates acoustic signals for each acoustic tag;
sound source separation means for detecting, from the environmental sound signal, one or more acoustic signals inherent in the environmental sound signal and estimating the direction of arrival of each acoustic signal;
acoustic tag estimating means for estimating an acoustic tag of each acoustic signal using the first acoustic database; and
acoustic tag transmitting means for transmitting, to the playback device, the acoustic tag estimated by the acoustic tag estimating means and the direction of arrival estimated by the sound source separation means and, upon receiving a predetermined signal from the environmental sensor, the acoustic tag and direction of arrival associated with the environmental sensor,
thereby causing the playback device to reproduce the acoustic signal associated with the acoustic tag from the direction of arrival.

2. The sound collection device according to claim 1, wherein the sound source separation means estimates the arrival direction of the acoustic signal by a blind sound source separation method using a plurality of microphones or by beamforming.

3. The sound collection device according to claim 1, which is connected to a camera and further comprises:
an image database that stores image objects associated with acoustic tags; and
an image object detection engine that uses the image database to detect one or more image objects within the video captured by the camera and identifies the acoustic tag of each image object,
wherein the acoustic tag transmitting means transmits the direction of arrival of the acoustic signal associated with the acoustic tag identified by the image object detection engine.

4. A system in which the sound collection device according to any one of claims 1 to 3 and a playback device equipped with a plurality of speakers are connected via a network, wherein the playback device comprises:
a second acoustic database that accumulates acoustic signals for each acoustic tag;
acoustic tag receiving means for receiving the acoustic tag and the direction of arrival from the sound collection device; and
environmental sound reproducing means for outputting, from the plurality of speakers, environmental sound synthesized using the second acoustic database so that the acoustic signal associated with the acoustic tag is heard from the direction of arrival of that acoustic tag.

5. The system according to claim 4, wherein
the acoustic tag transmitting means in the sound collection device further transmits the acoustic signal together with the acoustic tag and the direction of arrival, and
the environmental sound reproducing means in the playback device reproduces the acoustic signal received from the sound collection device instead of the acoustic signal associated with the acoustic tag.

6. The system according to claim 4, wherein
the acoustic tags and acoustic signals accumulated in the second acoustic database of the playback device are part or all of the acoustic tags and acoustic signals accumulated in the first acoustic database of the sound collection device, and
even when an acoustic tag accumulated in the second acoustic database of the playback device is the same as an acoustic tag accumulated in the first acoustic database of the sound collection device, the two may be based on different acoustic signals.

7. A program for a sound collection device, the program causing a computer installed in a sound collection device, which transmits environmental sound signals collected by a plurality of microphones to a playback device that reproduces them through a plurality of speakers, to function, wherein
the sound collection device is connected to an environmental sensor associated with an acoustic tag and a direction of arrival, and
the program causes the computer to function as:
a first acoustic database that accumulates acoustic signals for each acoustic tag;
sound source separation means for detecting one or more acoustic signals inherent in the environmental sound signal and estimating the direction of arrival of each acoustic signal;
acoustic tag estimating means for estimating an acoustic tag of each acoustic signal using the first acoustic database; and
acoustic tag transmitting means for transmitting, to the playback device, the acoustic tag estimated by the acoustic tag estimating means and the direction of arrival estimated by the sound source separation means and, upon receiving a predetermined signal from the environmental sensor, the acoustic tag and direction of arrival associated with the environmental sensor,
thereby causing the playback device to reproduce the acoustic signal associated with the acoustic tag from the direction of arrival.

8. A program for a playback device, the program causing a computer installed in a playback device, which receives an acoustic tag and a direction of arrival from the sound collection device according to any one of claims 1 to 3 and is equipped with a plurality of speakers, to function as:
a second acoustic database that accumulates acoustic signals for each acoustic tag;
acoustic tag receiving means for receiving the acoustic tag and the direction of arrival from the sound collection device; and
environmental sound reproducing means for outputting, from the plurality of speakers, environmental sound synthesized using the second acoustic database so that the acoustic signal associated with the acoustic tag is heard from the direction of arrival of that acoustic tag.

9. A sound collection method for a sound collection device that transmits environmental sound signals collected by a plurality of microphones to a playback device that reproduces them using a plurality of speakers, wherein
the sound collection device is connected to an environmental sensor associated with an acoustic tag and a direction of arrival and has a first acoustic database that accumulates acoustic signals for each acoustic tag, and
the sound collection device executes:
a first step of detecting, from the environmental sound signal, one or more acoustic signals inherent in the environmental sound signal and estimating the direction of arrival of each acoustic signal;
a second step of estimating an acoustic tag of each acoustic signal using the first acoustic database; and
a third step of transmitting, to the playback device, the acoustic tag estimated in the second step and the direction of arrival estimated in the first step and, upon receiving a predetermined signal from the environmental sensor, the acoustic tag and direction of arrival associated with the environmental sensor,
thereby causing the playback device to reproduce the acoustic signal associated with the acoustic tag from the direction of arrival.

10. A playback method for a playback device that receives an acoustic tag and a direction of arrival from the sound collection device according to any one of claims 1 to 3 and is equipped with a plurality of speakers, wherein
the playback device has a second acoustic database that accumulates acoustic signals for each acoustic tag, and
the playback device executes:
a first step of receiving the acoustic tag and the direction of arrival from the sound collection device; and
a second step of outputting, from the plurality of speakers, environmental sound synthesized using the second acoustic database so that the acoustic signal associated with the acoustic tag is heard from the direction of arrival of that acoustic tag.