JP7285690B2

JP7285690B2 - Information processing method, information processing device and program

Info

Publication number: JP7285690B2
Application number: JP2019092403A
Authority: JP
Inventors: 耕水野
Original assignee: Panasonic Intellectual Property Corp of America
Current assignee: Panasonic Intellectual Property Corp of America
Priority date: 2018-08-28
Filing date: 2019-05-15
Publication date: 2023-06-02
Anticipated expiration: 2039-05-15
Also published as: JP2020034542A

Description

The present disclosure relates to an information processing method, an information processing device, and a program.

Techniques for estimating the position of a sound source (in particular, the distance between a microphone and the sound source) have been disclosed. For example, Patent Literature 1 discloses a technique that estimates the position of a sound source by triangulation using a plurality of microphones installed at mutually different positions. Patent Literature 2 discloses a technique that estimates the position of a sound source by building, in advance, a database of the sounds a microphone picks up when a sound source is placed at each position in the space. Estimating the position of a sound source with such techniques makes it possible, for example, to reproduce the sound environment of the space in which the microphones are installed (the original sound field) at another location. In a remote teleconference, for instance, an environment can be created in which the other party sounds as if they were talking in the same room. The technology is also useful in, for example, public viewing and online games.

Patent Literature 1: Japanese Patent No. 4926091
Patent Literature 2: Japanese Patent No. 5079761

For example, the technique disclosed in Patent Literature 1 requires sound collectors (microphones) to be installed at a plurality of positions. The technique disclosed in Patent Literature 2 requires the database to be created in advance; moreover, because the database is specific to each space, a separate database must be created for every space. In other words, with the techniques disclosed in Patent Literatures 1 and 2, estimating the distance between the sound collector and the sound source takes considerable time and effort.

Accordingly, an object of the present disclosure is to provide an information processing method and the like that can easily estimate the distance between a sound collector and a sound source.

An information processing method according to an aspect of the present disclosure includes: acquiring a sound signal indicating sound collected by a sound collector; calculating, from the acquired sound signal, the volume of the sound collected by the sound collector; identifying, from the acquired sound signal, the type of the sound source of the collected sound; estimating the distance between the sound collector and the sound source of the collected sound based on the calculated volume and on the standard volume corresponding to the identified type in a database in which sound-source types are associated in advance with standard volumes, the standard volume being the volume of the sound from a sound source at a predetermined distance from that source; and outputting the estimation result.

These general or specific aspects may be implemented as a system, a device, a method, a recording medium, or a computer program, or as any combination of systems, devices, methods, recording media, and computer programs.

The information processing method and the like according to the present disclosure make it possible to easily estimate the distance between a sound collector and a sound source.

FIG. 1 is a block diagram showing an example of the configuration of an information processing device according to Embodiment 1.
FIG. 2 is a flowchart showing an example of the operation of the information processing device according to Embodiment 1.
FIG. 3 is a table showing an example of the database.
FIG. 4 is a block diagram showing an example of the configuration of an information processing device according to Embodiment 2.
FIG. 5 is a flowchart showing an example of the operation of the information processing device according to Embodiment 2.
FIG. 6 is a block diagram showing an example of the configuration of an information processing device according to Embodiment 3.
FIG. 7 is a block diagram showing an example of the configuration of an information processing device according to Embodiment 4.
FIG. 8 is a block diagram showing an example of the configuration of an information processing device according to Embodiment 5.
FIG. 9 is a block diagram showing an example of the configuration of an information processing device according to Embodiment 6.

With this method, once standard volumes have been measured for the sounds of various types of sound sources and the database has been generated in advance, the distance between the sound collector and a sound source can be estimated easily: in the space where the estimation is performed, it suffices to calculate the volume of the sound collected by the sound collector and to identify the type of its sound source.

The distance may also be estimated based on a predetermined relational expression between the distance and the attenuation of the calculated volume relative to the standard volume.

Because the attenuation of the volume is related to the distance, using a predetermined relational expression that expresses this relationship makes it even easier to estimate the distance between the sound collector and the sound source.

The type of the sound source of the collected sound may also be identified from the acquired sound signal using a learning model trained with sound signals representing arbitrary sounds as input data and the types of the sources of those sounds as correct (ground-truth) data.

Using a machine-learned model in this way makes it easy to identify the type of the sound source of the sound collected by the sound collector.

Further, when the type of the sound source of the collected sound cannot be identified from the acquired sound signal, a feature amount of the sound signal may be calculated, distance information indicating the distance between the sound collector and the sound source of the collected sound may be acquired, and the calculated feature amount may be registered in the database in association with, as the standard volume, the volume calculated at the distance indicated by the acquired distance information.

In this way, for a sound source of a type that could not be identified, the feature amount of its sound and its standard volume are stored in the database, so that from then on the distance between the sound collector and a sound source whose sound has that feature amount can be estimated.
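The registration flow described in the two paragraphs above can be sketched as a small in-memory database keyed by feature vectors. This is a minimal sketch under stated assumptions: feature amounts are fixed-length lists of floats, matching uses Euclidean distance with a hypothetical `match_threshold`, and the class and method names (`SoundSourceDatabase`, `register`, `lookup`) are illustrative rather than taken from the source. The volume measured at the known distance is stored as the standard volume together with that distance, so that a point-source relation can later use it as the reference.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical record: a feature vector plus the volume measured at a known distance.
    feature: list
    standard_volume_db: float
    reference_distance_m: float


@dataclass
class SoundSourceDatabase:
    # Maximum Euclidean distance at which a stored feature counts as a match (assumed value).
    match_threshold: float = 1.0
    entries: list = field(default_factory=list)

    def register(self, feature, measured_volume_db, known_distance_m):
        """Store an unidentified source: the volume measured at the known distance
        becomes its standard volume, with that distance as the reference."""
        self.entries.append(Entry(list(feature), measured_volume_db, known_distance_m))

    def lookup(self, feature):
        """Return the closest stored entry, or None if nothing is close enough."""
        best, best_dist = None, math.inf
        for entry in self.entries:
            d = math.dist(entry.feature, feature)
            if d < best_dist:
                best, best_dist = entry, d
        return best if best_dist <= self.match_threshold else None
```

A first `lookup` for a new sound returns `None`; after `register`, a later sound with a similar feature vector resolves to the stored standard volume and reference distance.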

When the acquired sound signal contains sounds from a plurality of sound sources, the sound signal may be separated by sound source, and the volume calculation, type identification, distance estimation, and output of the estimation result may be performed for each separated sound signal.

This makes it possible to easily estimate the distance from the sound collector to each of the plurality of sound sources.

It may also be determined whether the identified type of sound source outputs audible sound or inaudible sound, and the distance may be estimated in accordance with that determination result as well. Specifically, the distance may be estimated based on a predetermined relational expression between the distance and the attenuation of the calculated volume relative to the standard volume, with separate relational expressions predetermined for the case where the identified type outputs audible sound and the case where it outputs inaudible sound.

This makes it possible to estimate both the distance between the sound collector and a sound source that outputs audible sound and the distance between the sound collector and a sound source that outputs inaudible sound.
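The source does not state the two relational expressions, so the following is only a sketch of how separate expressions for audible and inaudible sources might be selected and solved. Assumptions: the audible case uses spherical spreading alone (20·log10(d/r) dB), and the inaudible case (for example, ultrasound) adds a hypothetical linear air-absorption term `alpha_db_per_m`, whose default value is a placeholder, not a measured figure. Because the combined expression is awkward to invert in closed form, the distance is found by bisection, which works since both models increase monotonically with distance.

```python
import math


def attenuation_audible(d, r):
    # Spherical spreading only: the level falls by 20*log10(d/r) dB.
    return 20.0 * math.log10(d / r)


def attenuation_inaudible(d, r, alpha_db_per_m):
    # Spreading plus a hypothetical linear air-absorption term of alpha dB per metre.
    return 20.0 * math.log10(d / r) + alpha_db_per_m * (d - r)


def estimate_distance(attenuation_db, r, audible, alpha_db_per_m=1.0,
                      d_max=1000.0, tol=1e-9):
    """Solve model(d) = attenuation_db for d by bisection.

    Both models increase monotonically with d (for alpha >= 0), so bisection on
    [r/1000, d_max] converges; results are confined to that interval.
    """
    def model(d):
        if audible:
            return attenuation_audible(d, r)
        return attenuation_inaudible(d, r, alpha_db_per_m)

    lo, hi = r / 1000.0, d_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if model(mid) < attenuation_db:
            lo = mid  # not attenuated enough yet: the source is farther away
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the same 12 dB attenuation, the inaudible model yields a shorter distance than the audible one, because part of the loss is attributed to absorption rather than spreading.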

The sound collector may be a microphone array made up of a plurality of microphones, and the direction of the sound source relative to the sound collector may additionally be estimated based on the differences in the times at which the individual microphones pick up the sound.

This makes it possible to estimate the accurate position of the sound source from the distance between the sound collector and the source of the collected sound and from the direction of that source relative to the sound collector.
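A minimal sketch of the direction and position step above, assuming the simplest array: two microphones a known distance apart, a far-field source (plane-wave arrival), and a speed of sound of 343 m/s (dry air at about 20 °C). The function names are hypothetical. Under these assumptions the arrival-time difference Δt satisfies sin θ = c·Δt / D, where θ is measured from the broadside of the pair; combining θ with the estimated distance gives a position, although a two-microphone line array cannot tell front from back.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C


def direction_from_tdoa(delta_t_s, mic_spacing_m, c=SPEED_OF_SOUND_M_S):
    """Far-field angle of arrival (radians from broadside) for a two-microphone pair.

    delta_t_s = t1 - t2; a positive value means the sound reached microphone 1
    later than microphone 2, i.e. the source lies on the microphone-2 side (+x).
    """
    s = c * delta_t_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp: timing noise can push |s| slightly above 1
    return math.asin(s)


def position_from_distance_and_direction(distance_m, angle_rad):
    # x along the microphone axis (towards mic 2), y along broadside; front/back is ambiguous.
    return distance_m * math.sin(angle_rad), distance_m * math.cos(angle_rad)
```

For example, with microphones 0.1 m apart, a delay of 0.05/343 s corresponds to 30 degrees from broadside, and a source 4 m away then sits about 2 m along the microphone axis.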

Further, the correspondence between the identified type of sound source and the estimated distance between the sound collector and that sound source may be accumulated each time a distance is estimated; the accuracy of the sound-source type identification may be determined based on the accumulated correspondences and on the estimated distance between the sound collector and the sound source of the collected sound; and the result of that determination may be fed back and used in identifying sound-source types.

This can improve the accuracy of sound-source type identification.
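The source does not specify how accuracy is judged from the accumulated type/distance correspondences, so the rule below is purely an illustrative assumption: if a new distance estimate for a type differs from the median of that type's past estimates by more than a hypothetical factor `ratio_limit`, the identification is flagged as doubtful. The class name and parameters are hypothetical, distances are assumed positive, and every estimate is recorded whether or not it is flagged. The returned flag is one possible signal to feed back into identification, for example to trigger re-identification.

```python
import statistics
from collections import defaultdict


class IdentificationAccuracyMonitor:
    """Hypothetical feedback rule: flag an identification whose estimated distance
    is far from the distances previously estimated for the same source type."""

    def __init__(self, ratio_limit=3.0, min_history=5):
        self.history = defaultdict(list)  # source type -> past distance estimates [m]
        self.ratio_limit = ratio_limit
        self.min_history = min_history

    def check_and_record(self, source_type, distance_m):
        """Return True if the identification looks plausible, then store the estimate."""
        past = self.history[source_type]
        plausible = True
        if len(past) >= self.min_history:
            typical = statistics.median(past)
            ratio = max(distance_m / typical, typical / distance_m)
            plausible = ratio <= self.ratio_limit
        past.append(distance_m)
        return plausible
```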

An information processing device according to an aspect of the present disclosure includes: a sound signal acquisition unit that acquires a sound signal indicating sound collected by a sound collector; a calculation unit that calculates, from the acquired sound signal, the volume of the sound collected by the sound collector; an identification unit that identifies, from the acquired sound signal, the type of the sound source of the collected sound; an estimation unit that estimates the distance between the sound collector and the sound source of the collected sound based on the calculated volume and on the standard volume corresponding to the identified type in a database in which sound-source types are associated in advance with standard volumes, the standard volume being the volume of the sound from a sound source at a predetermined distance from that source; and an output unit that outputs the estimation result.

This provides an information processing device that can easily estimate the distance between a sound collector and a sound source.

A program according to an aspect of the present disclosure causes a computer to execute the above information processing method.

This provides a program that can easily estimate the distance between a sound collector and a sound source.

Embodiments will now be described in detail with reference to the drawings.

Each of the embodiments described below shows a general or specific example. The numerical values, shapes, components, arrangement and connection of components, steps, order of steps, and so on shown in the following embodiments are examples and are not intended to limit the present disclosure. Among the components in the following embodiments, those not recited in the independent claim representing the broadest concept are described as optional components.

(Embodiment 1)
Embodiment 1 will be described below with reference to FIGS. 1 to 3.

FIG. 1 is a block diagram showing an example of the configuration of an information processing device 1 according to Embodiment 1. FIG. 1 also shows a sound collector 100 and a sound source 200 of a sound generated, in the space where the sound collector 100 is installed, at a position d [m] away from the sound collector 100.

The information processing device 1 estimates the distance between the sound collector 100 and the sound source 200 of the sound collected by the sound collector 100. The information processing device 1 is, for example, a computer located near the space in which the sound collector 100 is installed, or a server device located somewhere other than that space. The information processing device 1 may also be integrated with the sound collector 100, for example as a portable device. In that case, the distance between the sound source 200 and the position of the sound collector 100 (that is, the current position) may be estimated while the information processing device 1 with the integrated sound collector 100 is carried around.

The sound collector 100 is, for example, a single microphone. The sound collector 100 converts the collected sound into a sound signal (electrical signal) and outputs it to the information processing device 1. The sound collector 100 is installed in the space (a room or the like) in which the distance to sound sources is estimated. The sound collector 100 and the information processing device 1 are connected by wire or wirelessly.

As shown in FIG. 1, the information processing device 1 includes a sound signal acquisition unit 10, a calculation unit 20, an identification unit 30, a learning model 31, an estimation unit 40, a database 41, and an output unit 50. The information processing device 1 is a computer including a processor (microprocessor), a user interface, a communication interface (a communication circuit or the like, not shown), memory, and so on. The user interface includes, for example, a display such as an LCD (Liquid Crystal Display) and input devices such as a keyboard or a touch panel. The memory is ROM, RAM, or the like and can store a control program (computer program) executed by the processor. The information processing device 1 may have a single memory or a plurality of memories. The learning model 31 and the database 41, described later, are stored in the one or more memories.

The sound signal acquisition unit 10, the calculation unit 20, the identification unit 30, the estimation unit 40, and the output unit 50, which are functional components of the processor, are implemented by the processor operating in accordance with the control program. By operating in accordance with the control program, the processor also controls the communication interface, the user interface, and so on. The sound signal acquisition unit 10a, the identification unit 30a, the estimation units 40a and 40b, the calculation unit 60, the distance information acquisition unit 70, the registration unit 80, and the determination unit 90 in Embodiments 2 to 5 described later are likewise implemented by the processor operating in accordance with the control program.

The sound signal acquisition unit 10, the calculation unit 20, the identification unit 30, the learning model 31, the estimation unit 40, the database 41, and the output unit 50 are described below with reference to FIG. 2.

FIG. 2 is a flowchart showing an example of the operation of the information processing device 1 according to Embodiment 1.

First, the sound signal acquisition unit 10 acquires a sound signal indicating the sound collected by the sound collector 100 (step S11). Specifically, the sound signal acquisition unit 10 acquires the sound signal when the communication interface of the information processing device 1 receives the sound signal output (transmitted) from the sound collector 100. The sound collected by the sound collector 100 is assumed to be the sound from the sound source 200.

Next, the calculation unit 20 calculates, from the sound signal acquired by the sound signal acquisition unit 10, the volume of the sound collected by the sound collector 100 (step S12). Volume calculation is common practice, so a detailed description is omitted; the volume can be calculated, for example, from the amplitude of the sound signal (electrical signal).
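One common way to compute a volume from signal amplitude is the RMS level in decibels; the sketch below assumes that approach, since the source leaves the method open. With float samples in [-1, 1] and `reference=1.0` the result is in dBFS. Comparing it with standard volumes measured in absolute terms would require a microphone-specific `calibration_offset_db`, a hypothetical parameter assumed to be measured separately.

```python
import math


def volume_db(samples, reference=1.0, calibration_offset_db=0.0):
    """RMS level of a block of samples in dB relative to `reference`.

    With reference=1.0 and float samples in [-1, 1] this is dBFS; adding a
    calibration offset (assumed, measured separately for the microphone) maps
    it to an absolute level comparable with the standard volumes in the database.
    """
    if not samples:
        raise ValueError("need at least one sample")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        return -math.inf  # digital silence has no finite level
    return 20.0 * math.log10(rms / reference) + calibration_offset_db
```

A full-scale sine wave has an RMS of 1/√2 and therefore a level of about -3.01 dBFS.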

Next, the identification unit 30 identifies, from the sound signal acquired by the sound signal acquisition unit 10, the type of the sound source 200 of the sound collected by the sound collector 100 (step S13). For example, the identification unit 30 identifies the type of the sound source 200 using the acquired sound signal together with the learning model 31, which has been trained with sound signals representing arbitrary sounds as input data and the types of the sources of those sounds as correct data. The sound signal may be converted into a frequency spectrum or the like before being used as input data. The learning model 31 is, for example, a neural network. The following describes, as an example, training the model on "male voice" as a sound-source type.

First, a large number of sound signals known to represent male voices are prepared as the arbitrary sounds. These sound signals are input to the learning model 31 as input data, and the learning model 31 is trained so that the correct answer is "male voice". As a result, when a sound signal of unknown type is input to the trained learning model 31 and that type is in fact "male voice", the learning model 31 outputs "male voice" as the answer.
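The training contract described above (labelled input data in, a type label out) can be illustrated without a neural network. The sketch below deliberately swaps in a much simpler nearest-centroid classifier as a stand-in for the learning model 31; it is not the model the source describes. It assumes features have already been extracted as fixed-length vectors (for example, band energies of a spectrum); the class name is hypothetical.

```python
import math


class NearestCentroidClassifier:
    """Stand-in for learning model 31: learns one mean feature vector per label."""

    def __init__(self):
        self.centroids = {}

    def fit(self, features, labels):
        # Accumulate per-label sums and counts, then average them into centroids.
        sums, counts = {}, {}
        for x, y in zip(features, labels):
            if y not in sums:
                sums[y] = [0.0] * len(x)
                counts[y] = 0
            sums[y] = [s + v for s, v in zip(sums[y], x)]
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
        return self

    def predict(self, x):
        # Return the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda y: math.dist(self.centroids[y], x))
```

Training on a few "male voice" and "vacuum cleaner" feature vectors and then predicting an unseen vector mirrors the training-then-identification flow in the text, only with a far simpler model.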

Although in FIG. 2 the type of the sound source 200 is identified after the volume is calculated, the volume may instead be calculated after the type is identified. That is, steps S12 and S13 may be performed in the reverse order. The volume calculation and the identification of the type of the sound source 200 may also be performed in parallel.

Next, the estimation unit 40 estimates the distance between the sound collector 100 and the sound source 200 of the collected sound (step S14) based on the volume calculated by the calculation unit 20 and on the standard volume corresponding to the type identified by the identification unit 30 in the database 41, in which sound-source types are associated in advance with standard volumes (the volume of the sound from a sound source at a predetermined distance from that source). The database 41 is described here with reference to FIG. 3.

FIG. 3 is a table showing an example of the database 41.

In general, the volume of a sound is determined to some extent by the type of its source. Accordingly, for various types of sound sources, the volume of the sound from each source at a predetermined distance from it is measured as the standard volume. This makes it possible to create the database 41, in which various sound-source types are associated with standard volumes. FIG. 3 shows example standard volumes for "male voice", "female voice", "car noise", "vacuum cleaner sound", and "running water sound", with the predetermined distance set to 1 m. Because the loudness of human voices varies from person to person, the standard volume may, for example, be measured for the voice of each person who uses the space in which the sound collector 100 is installed, so that each person's voice is stored in the database in association with that person's standard volume.

Specifically, the estimation unit 40 estimates the distance between the sound collector 100 and the sound source 200 based on a predetermined relational expression between (i) the attenuation of the volume calculated by the calculation unit 20 relative to the standard volume and (ii) the distance between the sound collector 100 and the source of the collected sound. An example of this relational expression is shown in Equation 1 below.

In Equation 1, d [m] is the distance between the sound collector 100 and the sound source 200, r [m] is the predetermined distance, A0 [dB] is the standard volume at the predetermined distance, and A [dB] is the volume calculated by the calculation unit 20. Equation 1 treats the sound source 200 as a point source.
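Equation 1 itself did not survive extraction. A reconstruction consistent with the definitions above and with the point-source assumption is the inverse-square (spherical spreading) law, in which the level falls by 20·log10(d/r) dB; it also reproduces the numerical example in the text (55 dB at 1 m and 43 dB measured give about 4 m). Treat it as a reconstruction, not the patent's verbatim formula:

```latex
A_0 - A = 20 \log_{10} \frac{d}{r}
\quad\Longleftrightarrow\quad
d = r \cdot 10^{(A_0 - A)/20}
```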

For example, if the identification unit 30 identifies the type of the sound source 200 of the collected sound as "male voice", the database 41 gives a standard volume of 55 dB at the predetermined distance of 1 m for that type. Suppose the volume calculated by the calculation unit 20 at this time is 43 dB. The estimation unit 40 then substitutes r = 1, A0 = 55, and A = 43 into Equation 1 and estimates the distance between the sound collector 100 and the sound source 200 to be about 4 m.
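The worked example above can be checked with a short sketch, assuming Equation 1 is the point-source inverse-square relation d = r · 10^((A0 − A)/20). Only the "male voice" entry of FIG. 3 is given numerically in the text (55 dB at 1 m), so the dictionary below holds just that entry; other types would be added from their own measured values. `STANDARD_VOLUMES` and `estimate_distance_m` are hypothetical names.

```python
# Only the "male voice" value from FIG. 3 is stated in the text; others are not reproduced.
STANDARD_VOLUMES = {
    "male voice": (55.0, 1.0),  # (standard volume A0 [dB], predetermined distance r [m])
}


def estimate_distance_m(source_type, measured_db, database=STANDARD_VOLUMES):
    """Point-source inverse-square estimate: d = r * 10 ** ((A0 - A) / 20)."""
    a0, r = database[source_type]
    return r * 10 ** ((a0 - measured_db) / 20.0)
```

`estimate_distance_m("male voice", 43.0)` evaluates 10^0.6, roughly 3.98 m, matching the "about 4 m" in the text; every further 6 dB of attenuation roughly doubles the estimated distance.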

The output unit 50 then outputs the estimation result of the estimation unit 40 (step S15). For example, the output unit 50 outputs the estimation result to the user interface of the information processing device 1, or to a device such as a mobile terminal or server device that can communicate with the information processing device 1. As the estimation result, the output unit 50 may output not only the distance between the sound collector 100 and the sound source 200 but also the type of the sound source and the like.

As described above, once standard volumes have been measured for the sounds of various types of sound sources and the database 41 has been generated in advance, the distance between the sound collector 100 and the sound source 200 can be estimated easily: in the space where the estimation is performed, it suffices to calculate the volume of the sound collected by the sound collector 100 and to identify the type of its sound source 200. Specifically, estimating the distance requires neither installing sound collectors 100 at multiple positions nor creating a space-specific database. Distances can therefore be estimated easily even in spaces where, because of the effort involved, the distance between the sound collector 100 and the sound source 200 has not previously been estimated, which increases the number of spaces in which distance estimation is performed.

Because the attenuation of the volume is related to the distance, using a predetermined relational expression that expresses this relationship (for example, Equation 1) makes it even easier to estimate the distance between the sound collector 100 and the sound source 200.

Using the machine-learned learning model 31 also makes it easy to identify the type of the sound source 200 of the sound collected by the sound collector 100.

(Embodiment 2)
Next, Embodiment 2 will be described with reference to FIGS. 4 and 5.

FIG. 4 is a block diagram showing an example of the configuration of an information processing device 1a according to Embodiment 2.

The information processing device 1a according to Embodiment 2 differs from the information processing device 1 according to Embodiment 1 in that it includes an identification unit 30a instead of the identification unit 30 and additionally includes a calculation unit 60, a distance information acquisition unit 70, and a registration unit 80. The remaining points are the same as in the information processing device 1 according to Embodiment 1 and are not described again; the description below focuses on these differences, with reference to FIG. 5.

In Embodiment 2, it is assumed that the learning model 31 has not been trained on the type of a sound source 201 of a sound generated in the space where the sound collector 100 is installed, and that the database 41 contains neither the type of the sound source 201 nor a corresponding standard volume.

FIG. 5 is a flowchart showing an example of the operation of the information processing device 1a according to Embodiment 2.

As in step S13 of FIG. 2, the identification unit 30a identifies the type of the sound source 201 of the sound collected by the sound collector 100. It does so from the sound signal acquired by the sound signal acquisition unit 10. At this time, the identification unit 30a determines, for example, whether the type of the sound source 201 can be identified (step S21). If the learning model 31 has been trained on the type of the sound source 201 of sounds generated in the space where the sound collector 100 is installed, the identification unit 30a can identify that type. If it has not been trained on that type, the identification unit 30a cannot identify it. When the identification unit 30a determines that the type of the sound source 201 can be identified (Yes in step S21), steps S14 and S15 are performed as in Embodiment 1.

When the identification unit 30a determines that the type of the sound source 201 cannot be identified (No in step S21), the calculation unit 60 calculates a feature amount of the sound signal acquired by the sound signal acquisition unit 10. The feature amount represents characteristics of the sound signal, for example its frequency spectrum. It is stored, for example, in the memory or similar storage in which the learning model 31 is stored.
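The following is a rough illustration of the kind of feature amount the calculation unit 60 might compute: a magnitude spectrum obtained with a naive DFT in plain Python. The function name `spectral_feature` and the unit-norm normalization are assumptions, not taken from the source. The normalization makes the feature depend on spectral shape rather than loudness, because loudness changes with distance.

```python
import cmath
import math


def spectral_feature(samples: list[float]) -> list[float]:
    """Hypothetical feature: unit-norm magnitude spectrum for bins 0..N/2.

    Uses a naive O(N^2) DFT for clarity; a real system would use an FFT.
    """
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        # DFT coefficient X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(coeff))
    norm = math.sqrt(sum(m * m for m in mags))
    # Normalize so a louder or quieter copy of the same sound gives the same
    # feature; an all-zero (silent) spectrum is returned unchanged.
    return [m / norm for m in mags] if norm > 0 else mags
```

With this normalization, a tone recorded at 2 m and the same tone recorded at 4 m map to the same feature. That property is what an equivalence check between feature amounts would rely on.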

Next, the distance information acquisition unit 70 acquires distance information indicating the distance between the sound collector 100 and the sound source 201 of the sound collected by the sound collector 100 (step S23). For example, in the space where the sound collector 100 is installed, a user physically measures the distance between the sound collector 100 and the sound source 201. The user then enters the result through a user interface of the information processing device 1, or through a mobile terminal or similar device that can communicate with the information processing device 1. This is how the distance information acquisition unit 70 acquires the distance information.

The registration unit 80 then registers the feature amount calculated by the calculation unit 60 in the database 41 (step S24). It associates that feature amount with a standard volume, namely the volume calculated by the calculation unit 20 at the distance indicated by the distance information acquired by the distance information acquisition unit 70. As described above, the standard volume is the volume of sound from a sound source at a predetermined distance from that source. For this feature amount, therefore, the predetermined distance is the distance indicated by the acquired distance information, and the standard volume is the volume calculated by the calculation unit 20. For example, suppose the acquired distance information indicates 2 m and the calculation unit 20 calculates a volume of 49 dB. The database 41 then associates the feature amount calculated by the calculation unit 60 with a standard volume of 49 dB at a predetermined distance of 2 m.

From then on, the device can estimate the distance between the sound collector 100 and a sound source of the same type as the sound source 201. Such a source generates sound with an equivalent feature amount and is hereinafter called an equivalent sound source. Specifically, suppose the sound signal acquisition unit 10 acquires a sound signal representing sound from an equivalent sound source. The identification unit 30a confirms that a feature amount equivalent to that signal's feature amount is stored in the memory, and therefore determines that the type of the sound source of the collected sound can be identified. In other words, even if the learning model 31 has not been trained on the type of the equivalent sound source, the type is no longer judged unidentifiable (No in step S21), and processing can proceed to distance estimation.

The estimation unit 40 estimates the distance between the sound collector 100 and the sound source 201 of the sound collected by the sound collector 100 from two inputs. The first is the standard volume corresponding to the feature amount identified by the identification unit 30, looked up in the database 41. In that database, the feature amount registered by the registration unit 80 is associated with the standard volume at the distance indicated by the distance information acquired by the distance information acquisition unit 70. The second is the volume of the sound from the equivalent sound source calculated by the calculation unit 20. For example, suppose the calculation unit 20 calculates a volume of 43 dB. As described above, the standard volume at the predetermined distance of 2 m is 49 dB. The estimation unit 40 therefore substitutes r = 2, A0 = 49, and A = 43 into Equation 1 and estimates the distance between the sound collector 100 and the equivalent sound source to be about 4 m.
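Equation 1 itself is not reproduced in this section. The sketch below assumes it has the free-field inverse-distance form A0 − A = 20·log10(d / r). That form reproduces the worked example: 6 dB of attenuation relative to the 2 m reference gives about 4 m. The function name is hypothetical.

```python
import math


def estimate_distance(standard_db: float, measured_db: float, reference_m: float) -> float:
    """Invert the assumed form of Equation 1: A0 - A = 20*log10(d / r).

    standard_db: A0, the standard volume at the reference distance r
    measured_db: A, the volume calculated from the collected sound
    reference_m: r, the predetermined distance in meters
    """
    attenuation_db = standard_db - measured_db
    return reference_m * 10 ** (attenuation_db / 20)


print(round(estimate_distance(49.0, 43.0, 2.0), 2))  # 3.99, i.e. "about 4 m"
```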

As described above, for a sound source 201 of a type that could not be identified, the feature amount of its sound and its standard volume are stored in the database. From then on, the distance between the sound collector 100 and any sound source producing sound with that feature amount can be estimated.

The learning model 31 may also be updated so that it can handle the previously unknown sound source 201. This is done by training it with the sound signal representing sound from the sound source 201 as input data and the type of the sound source 201 as correct (label) data.

(Embodiment 3)
Next, Embodiment 3 will be described with reference to FIG. 6.

FIG. 6 is a block diagram showing an example of the configuration of an information processing device 1b according to Embodiment 3.

The information processing device 1b according to Embodiment 3 differs from the information processing device 1 according to Embodiment 1 in that it includes a sound signal acquisition unit 10a instead of the sound signal acquisition unit 10. All other points are the same as in the information processing device 1 according to Embodiment 1, so their description is omitted and the following focuses on the difference.

When the acquired sound signal represents sounds from a plurality of sound sources 202 and 203, the sound signal acquisition unit 10a separates it into one signal per source. Separating per-source signals from a sound signal that contains components from multiple sources is a common technique, so a detailed description is omitted here. The separation method is not particularly limited. For each separated sound signal, the calculation unit 20 calculates the volume, the identification unit 30 identifies the type, the estimation unit 40 estimates the distance, and the output unit 50 outputs the estimation result. As a result, a distance d1 is estimated for the sound source 202 and a distance d2 for the sound source 203.

The information processing device 1b may include multiple sets of the calculation unit 20, the identification unit 30, and the estimation unit 40, so that the above processes run in parallel on the separated sound signals. This allows the distances d1 and d2 to be estimated in real time for both sound sources 202 and 203.
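One way to mimic the "multiple sets" of processing units in software is to run one copy of the per-source pipeline per separated signal on a thread pool. This is a minimal sketch. The `pipeline` callable is a hypothetical stand-in for the chain formed by the calculation unit 20, identification unit 30, and estimation unit 40, and the function name is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def estimate_all(
    separated: Sequence[Sequence[float]],
    pipeline: Callable[[Sequence[float]], float],
) -> list[float]:
    """Run the per-source pipeline (volume -> type -> distance) on each
    separated signal concurrently. Results keep the input order, so
    result[0] is d1 for the first source, result[1] is d2, and so on."""
    if not separated:
        return []
    with ThreadPoolExecutor(max_workers=len(separated)) as pool:
        return list(pool.map(pipeline, separated))
```

Threads only help when the pipeline releases the GIL, for example in native DSP or model-inference code. For a pure-Python, CPU-bound pipeline, `ProcessPoolExecutor` with a module-level pipeline function is the usual alternative.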

As described above, when the acquired sound signal represents sounds from the plurality of sound sources 202 and 203, separating it per source makes it possible to easily estimate the distance from the sound collector 100 to each of the sound sources 202 and 203.

(Embodiment 4)
Next, Embodiment 4 will be described with reference to FIG. 7.

FIG. 7 is a block diagram showing an example of the configuration of an information processing device 1c according to Embodiment 4.

The information processing device 1c according to Embodiment 4 differs from the information processing device 1 according to Embodiment 1 in that it includes an estimation unit 40a instead of the estimation unit 40 and further includes a determination unit 90. All other points are the same as in the information processing device 1 according to Embodiment 1, so their description is omitted and the following focuses on the differences.

The determination unit 90 determines whether the type of the sound source 200 identified by the identification unit 30 outputs audible sound or inaudible sound. For example, it can make this determination from the frequency spectrum or similar characteristics of the sound signal acquired by the sound signal acquisition unit 10.
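The sketch below shows one way such a determination could be made from the spectrum: pick the dominant non-DC frequency and compare it with the conventional 20 Hz to 20 kHz limits of human hearing. The thresholds are standard textbook values. The function names and the single-peak rule are assumptions, not the source's method. Detecting ultrasound also requires a sample rate above 40 kHz.

```python
import cmath
import math

AUDIBLE_LOW_HZ = 20.0       # conventional lower limit of human hearing
AUDIBLE_HIGH_HZ = 20_000.0  # conventional upper limit of human hearing


def dominant_frequency(samples: list[float], sample_rate: float) -> float:
    """Frequency of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):  # skip k = 0 (DC offset)
        mag = abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate / n  # bin spacing is sample_rate / n


def classify_band(samples: list[float], sample_rate: float) -> str:
    f = dominant_frequency(samples, sample_rate)
    if f < AUDIBLE_LOW_HZ:
        return "infrasound"
    if f > AUDIBLE_HIGH_HZ:
        return "ultrasound"
    return "audible"
```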

The estimation unit 40a then estimates the distance between the sound collector 100 and the sound source 200, also taking the determination result of the determination unit 90 into account. Like the estimation unit 40, it uses a predetermined relational expression between two quantities: the attenuation, relative to the standard volume, of the volume calculated by the calculation unit 20, and the distance between the sound collector 100 and the sound source. However, separate relational expressions are predetermined for sources that output audible sound and for sources that output inaudible sound. This is because audible and inaudible sound attenuate by different amounts over distance. For example, when the sound source 200 is of a type that outputs audible sound, the relational expression is Equation 1. When it is of a type that outputs inaudible sound, the expression differs from Equation 1. Furthermore, inaudible sound includes both ultrasound and infrasound, and the expressions for ultrasound and infrasound also differ from each other.
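The source states only that the audible, ultrasonic, and infrasonic expressions differ; it does not give them. Purely as an assumption, the sketch below models each band as spherical spreading plus a band-specific linear absorption term α in dB per meter. The α values are placeholders, not measured coefficients. With α = 0 the model reduces to the assumed inverse-distance form of Equation 1. Because the modeled attenuation increases monotonically with distance, bisection finds the distance that explains the observed attenuation.

```python
import math

# Hypothetical placeholder absorption coefficients in dB per meter, per band.
ALPHA_DB_PER_M = {"audible": 0.0, "ultrasound": 1.0, "infrasound": 1e-4}


def attenuation_db(d: float, r: float, alpha: float) -> float:
    """Assumed model: spherical spreading plus linear absorption beyond r."""
    return 20 * math.log10(d / r) + alpha * (d - r)


def estimate_distance_by_band(band: str, standard_db: float, measured_db: float,
                              reference_m: float, max_m: float = 1e6) -> float:
    """Solve attenuation_db(d) == standard_db - measured_db by bisection.

    Returns a value close to max_m if even max_m cannot explain the attenuation.
    """
    target = standard_db - measured_db
    alpha = ALPHA_DB_PER_M[band]
    lo, hi = 1e-6, max_m
    for _ in range(200):  # far more iterations than float precision needs
        mid = (lo + hi) / 2
        if attenuation_db(mid, reference_m, alpha) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For the 49 dB / 43 dB / 2 m example, the audible band gives about 3.99 m. The ultrasound placeholder gives a shorter distance, because absorption accounts for part of the 6 dB.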

As described above, the device can estimate both the distance between the sound collector 100 and a sound source 200 that outputs audible sound, and the distance between the sound collector 100 and a sound source 200 that outputs inaudible sound.

For example, the distance between the sound collector 100 and a sound source 200 that outputs ultrasound as inaudible sound can be estimated. When a fault occurs in a piece of equipment, ultrasound may be emitted from the faulty part, which makes it possible to locate the faulty equipment. Moreover, because the nature of a fault may correspond to the frequency of the ultrasound, the nature of the fault may be output together with the estimation result.

Also, for example, the distance between the sound collector 100 and a sound source 200 that outputs infrasound as inaudible sound can be estimated. A tornado emits infrasound, so the distance to the place where a tornado has formed can be estimated. In this case, the space in which the sound collector 100 is installed spans several tens of kilometers and includes outdoor areas.

(Embodiment 5)
Next, Embodiment 5 will be described with reference to FIG. 8.

FIG. 8 is a block diagram showing an example of the configuration of an information processing device 1d according to Embodiment 5.

The information processing device 1d according to Embodiment 5 differs from the information processing device 1 according to Embodiment 1 in that it includes an estimation unit 40b instead of the estimation unit 40. Embodiment 5 also differs from Embodiment 1 in that a sound collector 100a is installed, instead of the sound collector 100, in the space where distance estimation is performed. All other points are the same as in the information processing device 1 according to Embodiment 1, so their description is omitted and the following focuses on the differences.

In Embodiment 5, the sound collector 100a is a microphone array made up of a plurality of microphones placed at different positions within the sound collector 100a. As a result, sound from the sound source 200 reaches each microphone at a slightly different time.

Like the estimation unit 40, the estimation unit 40b estimates the distance between the sound collector 100a and the sound source 200. It does so using the sound signal of the sound collected by any one of the microphones in the sound collector 100a. In addition, the estimation unit 40b estimates the direction of the sound source 200 relative to the sound collector 100a. This estimate is based on the positional relationship among the microphones and on the differences between the times at which each microphone picked up the sound from the sound source 200 (sound collection time differences).
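For the simplest case of two microphones, the direction can be estimated from the time difference with the far-field relation sin θ = c·τ / D. Here θ is measured from broadside, τ is the arrival-time difference, D is the microphone spacing, and c is the speed of sound. The sketch below finds τ by brute-force cross-correlation over integer sample lags. The names, the 343 m/s speed of sound, and the two-microphone geometry are assumptions, not the source's implementation. Practical systems often use GCC-PHAT with sub-sample interpolation instead.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # air at about 20 degrees C (assumption)


def estimate_delay(x: list[float], y: list[float], max_lag: int) -> int:
    """Lag in samples by which y trails x, found by brute-force cross-correlation."""
    n = len(x)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[i] * y[i + lag] for i in range(n) if 0 <= i + lag < len(y))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_deg(x: list[float], y: list[float], sample_rate: float,
                  spacing_m: float) -> float:
    """Far-field angle of arrival from broadside for one microphone pair.

    Positive angles mean the sound reached the microphone recording x first.
    """
    # Physically possible delays are bounded by spacing / speed of sound.
    max_lag = math.ceil(spacing_m / SPEED_OF_SOUND_M_S * sample_rate)
    tau_s = estimate_delay(x, y, max_lag) / sample_rate
    # Clamp so rounding cannot push asin outside its domain.
    s = max(-1.0, min(1.0, SPEED_OF_SOUND_M_S * tau_s / spacing_m))
    return math.degrees(math.asin(s))
```

For example, with 10 cm spacing at 48 kHz, a 7-sample lag corresponds to about 30 degrees off broadside.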

As described above, the direction of the sound source 200 relative to the sound collector 100a can also be estimated. Combining the distance between the sound source 200 and the sound collector 100a with this direction allows the position of the sound source 200 to be estimated accurately.
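Combining the two estimates into a position is a polar-to-Cartesian conversion. This is a minimal sketch under an assumed planar convention: the array axis is the x-axis, broadside is +y, and the angle is measured from broadside, positive toward +x. A linear array cannot distinguish front from back, so the source is assumed to lie at y ≥ 0. The function name is hypothetical.

```python
import math


def source_position(distance_m: float, angle_deg: float) -> tuple[float, float]:
    """Planar (x, y) of the source relative to the array center.

    distance_m: estimated distance to the source
    angle_deg: direction from broadside, positive toward +x (assumed convention)
    """
    theta = math.radians(angle_deg)
    return distance_m * math.sin(theta), distance_m * math.cos(theta)
```

For instance, a source estimated at 4 m and 30 degrees off broadside lands at roughly (2.0, 3.46) m.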

(Embodiment 6)
Next, Embodiment 6 will be described with reference to FIG. 9.

FIG. 9 is a block diagram showing an example of the configuration of an information processing device 1e according to Embodiment 6.

The information processing device 1e according to Embodiment 6 differs from the information processing device 1 according to Embodiment 1 in three ways. It further includes an identification accuracy determination unit 42, it includes an estimation unit 40c instead of the estimation unit 40, and it includes a database 41a instead of the database 41. All other points are the same as in the information processing device 1 according to Embodiment 1, so their description is omitted and the following focuses on the differences.

Each time it estimates a distance, the estimation unit 40c records the correspondence between two items: the type of sound source identified by the identification unit 30, and the estimated distance between the sound collector 100 and that source. These records accumulate, for example, in the database 41a. The database 41a is the same as the database 41 except that it accumulates these correspondences. Alternatively, a separate database for accumulating them may be provided apart from the database 41.

For example, when a particular type of sound source emits sound, repeated distance estimates for that source may follow a consistent tendency. If that type of source is fixed in place in a facility or similar location, its sound tends to have a similar volume each time. The distances estimated from that volume therefore tend to be similar as well. For such a source type, its actual distance from the sound collector 100 can be determined to some extent from the tendency of the estimates.

The identification accuracy determination unit 42 determines how accurately the type of the sound source 200 was identified. It bases this on the accumulated correspondences and on the estimated distance between the sound collector 100 and the sound source 200 of the sound it collected. Suppose the estimated distance for the sound type identified by the identification unit 30 closely matches the distance tendency recorded for that type in the database 41a. In that case, the estimation unit 40c can be judged to have estimated accurately, which means the identification unit 30 identified the sound type accurately. Conversely, suppose the estimated distance deviates from that tendency. In that case, the estimation unit 40c can be judged not to have estimated accurately, which means the identification unit 30 did not identify the sound type accurately.

The identification accuracy determination unit 42 then feeds the determination result back to the identification unit 30, which uses it when identifying sound-source types. This feedback can improve the accuracy of sound-source type identification.

(Other embodiments)
The information processing device of the present disclosure has been described above on the basis of the embodiments, but the present disclosure is not limited to these embodiments. The scope of the present disclosure also includes forms obtained by applying various modifications conceivable by those skilled in the art to the embodiments. It likewise includes forms constructed by combining components of different embodiments, as long as they do not depart from the spirit of the present disclosure.

For example, in Embodiments 3 to 6, each information processing device may have functions corresponding to the identification unit 30a, the calculation unit 60, the distance information acquisition unit 70, and the registration unit 80 of Embodiment 2. That is, for a sound source whose type cannot be identified, each device may register the feature amount of its sound and its standard volume in the database 41 or 41a.

Also, for example, in Embodiments 2 and 4 to 6, each information processing device may have a function corresponding to the sound signal acquisition unit 10a of Embodiment 3. That is, when there are multiple sound sources, each device may separate the sound signal per source. For each separated signal, it may then calculate the volume, identify the type, estimate the distance, and output the estimation result.

Also, for example, in Embodiments 2, 3, 5, and 6, each information processing device may have functions corresponding to the estimation unit 40a and the determination unit 90 of Embodiment 4. That is, each device may determine whether a sound source's type outputs audible or inaudible sound, and estimate the distance taking that determination into account.

Also, for example, in Embodiments 2 to 4 and 6, the sound collector 100 may be the sound collector 100a, a microphone array made up of a plurality of microphones. Each information processing device may then have a function corresponding to the estimation unit 40b of Embodiment 5, that is, a function of estimating the direction of a sound source relative to the sound collector 100a.

Also, for example, in Embodiments 2 to 5, each information processing device may have functions corresponding to the identification accuracy determination unit 42 and the estimation unit 40c of Embodiment 6. That is, each device may have a function for improving the accuracy of sound-source type identification.

Also, in the above embodiments, the identification units 30 and 30a identify the type of sound source using the learning model 31, but the learning model 31 need not be used. The characteristics of a sound signal's frequency spectrum, for example, differ depending on the type of sound source. The identification units 30 and 30a may therefore identify the type of sound source by inferring it from the frequency spectrum or similar characteristics.

Also, for example, when the information processing device is implemented by a server device or the like, its functional components may be distributed across multiple server devices.

Furthermore, the present disclosure can be implemented not only as an information processing device but also as an information processing method comprising the processing steps performed by the device's components.

Specifically, as shown in FIG. 2, the information processing method includes the following steps. It acquires a sound signal representing the sound collected by the sound collector 100 (step S11). It calculates the volume of the collected sound from the acquired sound signal (step S12). It identifies the type of the sound source 200 of the collected sound from the acquired sound signal (step S13). It then estimates the distance between the sound collector 100 and the sound source 200 (step S14), based on the calculated volume and on the standard volume corresponding to the identified type. That standard volume comes from the database 41, which associates sound-source types in advance with standard volumes; each standard volume is the volume of sound from a source at a predetermined distance from it. Finally, it outputs the estimation result (step S15).
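Steps S11 to S15 can be sketched end to end as follows. Several parts are assumptions rather than the source's method. The database contents (a hypothetical "human_voice" entry reusing the 49 dB / 2 m figures from Embodiment 2) are illustrative. The 94 dB calibration offset that converts an RMS level to a sound-pressure level is a placeholder for real microphone calibration. The `identify` callable stands in for the identification unit 30. The distance step uses the assumed inverse-distance form of Equation 1.

```python
import math
from typing import Callable, Sequence

# Hypothetical contents of database 41: type -> (standard volume dB, distance m).
DATABASE = {"human_voice": (49.0, 2.0)}


def volume_db(samples: Sequence[float], calibration_db: float = 94.0) -> float:
    """Step S12: RMS level in dB, shifted by a hypothetical calibration offset."""
    if not samples:
        raise ValueError("empty signal")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("silent signal has no finite level")
    return 20 * math.log10(rms) + calibration_db


def estimate_source_distance(
    samples: Sequence[float],                    # step S11: acquired signal
    identify: Callable[[Sequence[float]], str],  # stands in for unit 30
    database: dict = DATABASE,
) -> tuple[str, float]:
    level = volume_db(samples)                   # S12
    source_type = identify(samples)              # S13
    standard_db, reference_m = database[source_type]
    # S14, assumed form of Equation 1: A0 - A = 20*log10(d / r)
    distance_m = reference_m * 10 ** ((standard_db - level) / 20)
    return source_type, distance_m               # S15: the caller outputs this
```

A signal whose calibrated level is 43 dB and which is identified as "human_voice" yields about 3.99 m. This matches the worked example in Embodiment 2.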

These steps may be executed, for example, by a computer (computer system). The present disclosure can also be implemented as a program that causes a computer to execute the steps of the method. It can further be implemented as a non-transitory computer-readable recording medium, such as a CD-ROM, on which that program is recorded.

For example, when the present disclosure is implemented as a program, each step is executed by running the program on a computer's hardware resources, such as its CPU, memory, and input/output circuits. That is, the CPU executes each step by obtaining data from the memory, the input/output circuits, or similar sources and performing computations on it. It may also output the computation results to the memory, the input/output circuits, or similar destinations.

Each of the components of the information processing device in the above embodiments may be implemented as a dedicated or general-purpose circuit. These components may be implemented as a single circuit or as multiple circuits.

The components of the information processing device in the above embodiments may also be implemented as an LSI (Large Scale Integration), which is an integrated circuit (IC). Each component may be made into its own chip, or some or all of them may be integrated into a single chip. Depending on the degree of integration, an LSI may be called a system LSI, super LSI, or ultra LSI.

The integrated circuit is not limited to an LSI and may be implemented by a dedicated circuit or a general-purpose processor. Other options include a programmable FPGA (Field Programmable Gate Array), or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured.

Furthermore, if an integrated-circuit technology that replaces LSI emerges from advances in semiconductor technology or a derived technology, that technology may of course be used to integrate the components of the information processing device.

The present disclosure also includes forms obtained by applying to the embodiments various modifications conceivable by those skilled in the art, as well as forms realized by arbitrarily combining the components and functions of the embodiments without departing from the spirit of the present disclosure.

One aspect of the present disclosure can be used, for example, in a device that identifies the position of a sound source.

1, 1a, 1b, 1c, 1d, 1e Information processing device
10, 10a Sound signal acquisition unit
20 Calculation unit
30, 30a Identification unit
31 Learning model
40, 40a, 40b, 40c Estimation unit
41, 41a Database
42 Identification accuracy determination unit
50 Output unit
60 Computation unit
70 Distance information acquisition unit
80 Registration unit
90 Determination unit
100, 100a Sound collecting device
200, 201, 202, 203 Sound source

Claims

1. An information processing method executed by an information processing device, the method comprising:
acquiring a sound signal indicating a sound collected by a sound collecting device;
calculating, from the acquired sound signal, the volume of the sound collected by the sound collecting device;
identifying, from the acquired sound signal, the type of the sound source of the sound collected by the sound collecting device;
estimating the distance between the sound collecting device and the sound source of the collected sound, based on the calculated volume and the standard volume corresponding to the identified type in a database in which types of sound sources are associated in advance with standard volumes, each standard volume being the volume of the sound from the sound source at a predetermined distance from that sound source; and
outputting the estimation result.
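As a rough illustration of claim 1, the following Python sketch chains the calculate, identify, look up, and estimate steps. Several details are assumptions not taken from this disclosure: volume is measured as an RMS level in dB relative to full scale, the database is a plain dictionary (`STANDARD_VOLUME_DB`) with made-up entries on that same scale, the predetermined distance is 1 m, the attenuation-to-distance relation is the free-field inverse-square law, and the identification unit is passed in as an arbitrary callable.

```python
import math
from typing import Callable, Dict, Sequence

# Assumed "predetermined distance" at which every standard volume was measured.
REFERENCE_DISTANCE_M = 1.0

# Hypothetical database: source type -> standard volume in dB (same relative
# scale as volume_db below) at REFERENCE_DISTANCE_M. Values are illustrative.
STANDARD_VOLUME_DB: Dict[str, float] = {
    "speech": -20.0,
    "door_knock": -6.0,
}


def volume_db(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (amplitude 1.0); assumes a non-silent signal."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms)


def estimate_distance_m(
    samples: Sequence[float],
    identify_type: Callable[[Sequence[float]], str],
) -> float:
    """Estimate source distance from the volume drop relative to the standard volume."""
    measured_db = volume_db(samples)               # calculate the volume
    source_type = identify_type(samples)           # identify the sound-source type
    standard_db = STANDARD_VOLUME_DB[source_type]   # look up the standard volume
    attenuation_db = standard_db - measured_db
    # Assumption: inverse-square law, i.e. 20*log10(d / d0) dB of attenuation.
    return REFERENCE_DISTANCE_M * 10.0 ** (attenuation_db / 20.0)
```

For example, a constant-magnitude signal of amplitude 0.05 identified as `"speech"` is about 6 dB below the assumed -20 dB standard, so `estimate_distance_m([0.05, -0.05] * 100, lambda s: "speech")` returns approximately 2.0 (meters). The comparison is only meaningful if the measured and standard volumes share one calibrated scale.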

2. The information processing method according to claim 1, wherein the distance is estimated based on a predetermined relational expression between distance and the amount by which the calculated volume is attenuated relative to the standard volume.
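Claim 2 leaves the relational expression unspecified. One common choice, assumed here purely for illustration, is free-field spherical spreading from a point source, under which the level falls by 20 log10 of the distance ratio. With $L_0$ the standard volume (dB) at the predetermined distance $d_0$, $L$ the calculated volume, and $\Delta L$ the attenuation:

```latex
\Delta L = L_0 - L = 20\log_{10}\frac{d}{d_0}
\qquad\Longrightarrow\qquad
d = d_0 \cdot 10^{\Delta L / 20}
```

With $d_0 = 1\,\mathrm{m}$, an attenuation of 20 dB gives $d = 10\,\mathrm{m}$, and an attenuation of about 6.02 dB gives $d = 2\,\mathrm{m}$. Reverberation, obstacles, and source directivity make real rooms deviate from this idealization.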

3. The information processing method according to claim 1 or 2, wherein the type of the sound source of the sound collected by the sound collecting device is identified from the acquired sound signal by using a learning model trained with sound signals representing arbitrary sounds as input data and the types of the sound sources of those sounds as correct-answer data.
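Claim 3 does not fix the model architecture. The sketch below uses a nearest-centroid classifier, a deliberately simple stand-in for whatever learning model an implementation might train, to show the shape of the training data (sound signals as input, source types as correct answers) and the identification step. The feature pair (zero-crossing rate and crest factor), the class name, and the labels are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, float]


def extract_feature(samples: Sequence[float]) -> Feature:
    """Hypothetical feature: (zero-crossing rate, crest factor).

    Both are independent of overall level; assumes >= 2 samples and a non-silent signal.
    """
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return (crossings / (len(samples) - 1), max(abs(s) for s in samples) / rms)


class NearestCentroidSourceClassifier:
    """Toy stand-in for the learning model: one mean feature vector per source type."""

    def __init__(self) -> None:
        self.centroids: Dict[str, Feature] = {}

    def fit(self, signals: List[Sequence[float]], labels: List[str]) -> "NearestCentroidSourceClassifier":
        # signals are the input data; labels are the correct-answer source types
        grouped: Dict[str, List[Feature]] = defaultdict(list)
        for signal, label in zip(signals, labels):
            grouped[label].append(extract_feature(signal))
        self.centroids = {
            label: (
                sum(f[0] for f in feats) / len(feats),
                sum(f[1] for f in feats) / len(feats),
            )
            for label, feats in grouped.items()
        }
        return self

    def predict(self, signal: Sequence[float]) -> str:
        """Return the source type whose centroid is nearest to the signal's feature."""
        feature = extract_feature(signal)
        return min(self.centroids, key=lambda label: math.dist(self.centroids[label], feature))
```

A practical system would more plausibly train a neural network on spectrogram features; the nearest-centroid choice only keeps the example short and free of third-party dependencies.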

4. The information processing method according to any one of claims 1 to 3, further comprising, when the type of the sound source of the sound collected by the sound collecting device cannot be identified from the acquired sound signal:
calculating a feature amount of the sound signal;
acquiring distance information indicating the distance between the sound collecting device and the sound source of the collected sound; and
registering, in the database, the calculated feature amount in association with a standard volume, the standard volume being the volume calculated at the distance indicated by the acquired distance information.
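The registration path of claim 4 can be sketched as follows, assuming a hypothetical `StandardVolumeDatabase` keyed by feature vectors rather than named types. The feature (zero-crossing rate and crest factor), the 1 m reference distance, and the matching tolerance are illustrative assumptions. The claim registers the volume calculated at the acquired distance as the standard volume; this sketch additionally normalizes that volume to the reference distance with the inverse-square law, which is one possible reading rather than the disclosed method.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

REFERENCE_DISTANCE_M = 1.0  # assumed "predetermined distance" d0
Feature = Tuple[float, float]


def volume_db(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale; assumes a non-silent signal."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms)


def extract_feature(samples: Sequence[float]) -> Feature:
    """Hypothetical level-independent feature: (zero-crossing rate, crest factor)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return (crossings / (len(samples) - 1), max(abs(s) for s in samples) / rms)


@dataclass
class StandardVolumeDatabase:
    # Each entry pairs a feature with a standard volume (dB at REFERENCE_DISTANCE_M).
    entries: List[Tuple[Feature, float]] = field(default_factory=list)

    def register_unknown(self, samples: Sequence[float], measured_distance_m: float) -> float:
        """Register an unidentified sound whose distance is known from another source."""
        feature = extract_feature(samples)
        measured_db = volume_db(samples)
        # Assumption: normalize to d0 with the inverse-square law so that all
        # entries share one reference distance.
        standard_db = measured_db + 20.0 * math.log10(measured_distance_m / REFERENCE_DISTANCE_M)
        self.entries.append((feature, standard_db))
        return standard_db

    def lookup(self, samples: Sequence[float], tolerance: float = 0.05) -> Optional[float]:
        """Return the standard volume of the nearest registered feature, if close enough."""
        feature = extract_feature(samples)
        best = min(self.entries, key=lambda e: math.dist(e[0], feature), default=None)
        if best is None or math.dist(best[0], feature) > tolerance:
            return None
        return best[1]
```

Registering a sound measured at 2 m with a level of about -26 dB stores a standard volume of about -20 dB, so a later `lookup` on a signal with the same feature returns that value for the distance estimate of claim 1.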

5. The information processing method according to any one of claims 1 to 4, wherein, when the acquired sound signal represents sounds from sound sources of different types, the sound signal is separated by sound source, and
the calculating of the volume, the identifying of the type, the estimating of the distance, and the outputting of the estimation result are performed for each of the separated sound signals.

6. The information processing method according to any one of claims 1 to 5, wherein the sound collecting device is a microphone array including a plurality of microphones, the method further comprising
estimating the direction of the sound source of the sound relative to the sound collecting device based on the differences between the times at which the sound is collected by the respective microphones.
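For the direction estimate of claim 6, a minimal two-microphone sketch follows. It assumes a far-field source, a speed of sound of 343 m/s (roughly that of air at 20 °C), and a brute-force cross-correlation to find the arrival-time difference; these choices and the function names are illustrative rather than taken from this disclosure. The angle is measured from the broadside of the microphone pair, and a positive delay means the sound reached the first microphone first.

```python
import math
from typing import Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed: approximately air at 20 degrees C


def estimate_delay_samples(x: Sequence[float], y: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) by which y trails x, chosen to maximize cross-correlation."""

    def corr(lag: int) -> float:
        return sum(x[i] * y[i + lag] for i in range(len(x)) if 0 <= i + lag < len(y))

    return max(range(-max_lag, max_lag + 1), key=corr)


def direction_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field arrival angle (degrees from broadside) for a two-microphone pair."""
    s = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # clamp overshoot caused by noise or rounding
    return math.degrees(math.asin(s))
```

With a sampling rate `fs`, the delay in seconds is `estimate_delay_samples(x, y, max_lag) / fs`, and `max_lag` need not exceed about `mic_spacing_m / SPEED_OF_SOUND_M_S * fs`. A delay of 0.05/343 s across a 0.1 m spacing gives about 30°. Arrays with more microphones can combine several pairwise estimates, and weighted correlations such as GCC-PHAT are commonly used for more robust delays.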

7. The information processing method according to any one of claims 1 to 6, further comprising:
accumulating, each time the distance is estimated, the correspondence between the identified type of sound source and the estimated distance between the sound collecting device and the sound source;
determining the accuracy of identifying the type of sound source based on the accumulated correspondence and the estimated distance between the sound collecting device and the sound source of the collected sound; and
feeding back the determination result for use in identifying the type of sound source.
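Claim 7 does not specify how accuracy is judged. One possible interpretation, sketched below as an assumption, treats a newly estimated distance that is a statistical outlier for the identified type as evidence of misidentification: for example, a type that has always been heard within a few meters suddenly appearing to be 15 m away. The class name, z-score threshold, minimum history length, and standard-deviation floor are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import DefaultDict, List


class IdentificationAccuracyChecker:
    """Hypothetical plausibility check over accumulated (type, distance) pairs."""

    def __init__(self, z_threshold: float = 3.0, min_history: int = 5, min_sd_m: float = 0.1) -> None:
        self.history: DefaultDict[str, List[float]] = defaultdict(list)
        self.z_threshold = z_threshold
        self.min_history = min_history
        self.min_sd_m = min_sd_m  # floor so a very tight history does not flag tiny deviations

    def record(self, source_type: str, distance_m: float) -> None:
        """Accumulate the type/distance correspondence each time a distance is estimated."""
        self.history[source_type].append(distance_m)

    def is_plausible(self, source_type: str, distance_m: float) -> bool:
        """False if the distance is an outlier for this type, i.e. identification is suspect."""
        past = self.history.get(source_type, [])
        if len(past) < self.min_history:
            return True  # not enough evidence to judge yet
        mean = statistics.fmean(past)
        sd = max(statistics.pstdev(past), self.min_sd_m)
        return abs(distance_m - mean) / sd <= self.z_threshold
```

A caller would check `is_plausible` before `record` and could feed a `False` result back to the identification step, for instance by re-running identification or lowering the confidence assigned to that type.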

8. An information processing device comprising:
a sound signal acquisition unit that acquires a sound signal indicating a sound collected by a sound collecting device;
a calculation unit that calculates, from the acquired sound signal, the volume of the sound collected by the sound collecting device;
an identification unit that identifies, from the acquired sound signal, the type of the sound source of the sound collected by the sound collecting device;
an estimation unit that estimates the distance between the sound collecting device and the sound source of the collected sound, based on the calculated volume and the standard volume corresponding to the identified type in a database in which types of sound sources are associated in advance with standard volumes, each standard volume being the volume of the sound from the sound source at a predetermined distance from that sound source; and
an output unit that outputs the estimation result.

9. A program that causes a computer to execute the information processing method according to any one of claims 1 to 7.