JP2014137226A

JP2014137226A - Mobile object, and system and method for creating acoustic source map

Info

Publication number: JP2014137226A
Application number: JP2013004482A
Authority: JP
Inventors: Yani Evan; ヤニ・エヴァン; Kallakuri Nagasrikanth; ナガスリカン・カラクリ; Saiki Luis Yoichi Morales; モラレス・サイキ・ルイス・ヨウイチ; Carlos Toshinori Ishii; イシイ・カルロス・トシノリ; Norihiro Hagita; 紀博萩田
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2013-01-15
Filing date: 2013-01-15
Publication date: 2014-07-28
Anticipated expiration: 2033-01-15
Also published as: JP6240995B2

Abstract

PROBLEM TO BE SOLVED: To provide a mobile object that can identify the position of a sound source and can collect prior information on acoustic sources that is usable for real-world voice communication. SOLUTION: In a robot 1000, a robot position identifying section 1040 estimates the position of the robot 1000 using a particle filter including a plurality of particles. Using the direction of arrival of a sound estimated from values measured by a microphone array 1052, a sound source position estimating section 1070 casts a ray from the position of each particle in the direction of arrival, accumulates a likelihood at each hit position where the ray hits an object identified on a geometric map, and estimates the position of the acoustic source from the accumulated likelihoods.

Description

The present invention relates to sound source localization in a real environment, and more particularly to a technique for estimating the position of a sound source in a real environment by combining detection of the direction of a sound with a sound sensor array and a technique for identifying the position of a moving body.

Different environments such as homes, offices, and shopping streets have diverse noise characteristics that depend on location and time. Applications that target a specific sound such as speech therefore have the problem that the expected performance cannot be obtained, depending on the type and level of noise in the environment where they are used.

For example, in voice communication between a person and a robot, the microphone attached to the robot is usually at a distance (1 m or more) from the speaker. The signal-to-noise ratio (SNR) is therefore lower than in a case such as telephone speech, where the distance between the microphone and the mouth is several centimeters. As a result, the voices of other people nearby and environmental noise become interfering sounds and make it difficult for the robot to recognize the target speech. Sound source localization and sound source separation are therefore important for robot applications.

Regarding sound source localization, conventional techniques that assume a real environment are described in Patent Document 1 and Patent Document 2. The techniques described in Patent Document 1 and Patent Document 2 use a known high-resolution sound source localization method called the MUSIC method.

In the inventions described in Patent Document 1 and Patent Document 2, a microphone array is used, and the current correlation matrix is calculated from the received signal vector obtained by Fourier-transforming the signals from the microphone array and from the past correlation matrix. The correlation matrix obtained in this way is subjected to eigenvalue decomposition to obtain the maximum eigenvalue and the noise space, which is spanned by the eigenvectors corresponding to the eigenvalues other than the maximum eigenvalue. Then, taking one microphone of the array as a reference, the direction of the sound source is estimated by the MUSIC method based on the phase difference of each microphone's output, the noise space, and the maximum eigenvalue.
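The steps in the preceding paragraph (recursive correlation matrix update, eigenvalue decomposition, noise space, direction scan) can be illustrated with a minimal sketch. It is not the implementation of Patent Document 1 or 2: the uniform linear array geometry, the forgetting factor `alpha`, and all function names are assumptions introduced here for illustration only.

```python
import numpy as np

def update_correlation(R_prev, x, alpha=0.9):
    """Blend the past correlation matrix with the current snapshot x (assumed forgetting factor alpha)."""
    return alpha * R_prev + (1.0 - alpha) * np.outer(x, x.conj())

def steering_vector(theta, n_mics, spacing, wavelength):
    """Phase differences relative to microphone 0 (assumed uniform linear array)."""
    k = 2.0 * np.pi / wavelength
    return np.exp(-1j * k * spacing * np.arange(n_mics) * np.sin(theta))

def music_spectrum(R, n_sources, thetas, spacing, wavelength):
    """Return the MUSIC pseudo-spectrum over the candidate directions thetas (radians)."""
    n_mics = R.shape[0]
    # eigh returns eigenvalues in ascending order, so the first columns span the noise space.
    _, eigvecs = np.linalg.eigh(R)
    noise_space = eigvecs[:, : n_mics - n_sources]
    spectrum = np.empty(len(thetas))
    for i, theta in enumerate(thetas):
        a = steering_vector(theta, n_mics, spacing, wavelength)
        proj = noise_space.conj().T @ a
        # The steering vector is nearly orthogonal to the noise space at a true direction.
        spectrum[i] = np.real(a.conj() @ a) / max(np.real(proj.conj() @ proj), 1e-12)
    return spectrum
```

With `n_sources=1`, the noise space consists of all eigenvectors other than the one for the maximum eigenvalue, which matches the description above. The direction estimate is the angle at which the pseudo-spectrum peaks.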

On the other hand, despite several major breakthroughs in the field of robot audition over recent decades, accurately localizing and mapping acoustic sources remains a challenging task because of the complexity and diversity of environments.

For example, localization of acoustic sources using mobile platforms has been reported as part of robotics research (see, for example, Non-Patent Documents 1 to 4).

Patent Document 1: Japanese Patent Application Laid-Open No. 2008-175733
Patent Document 2: Japanese Patent Application Laid-Open No. 2011-220701

Non-Patent Document 1: J.-M. Valin, J. Rouat, and F. Michaud, "Enhanced robot audition based on microphone array source separation with post-filter," in Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004), vol. 3, Sept.-2 Oct. 2004, pp. 2123-2128.
Non-Patent Document 2: E. Martinson and A. C. Schultz, "Auditory evidence grids," in IROS. IEEE, 2006, pp. 1139-1144.
Non-Patent Document 3: K. Nakadai, H. Okuno, H. Nakajima, Y. Hasegawa, and H. Tsujino, "An open source software system for robot audition HARK and its evaluation," in IEEE-RAS International Conference on Humanoid Robots, 2008, pp. 561-566.
Non-Patent Document 4: Y. Sasaki, S. Thompson, M. Kaneyoshi, and S. Kagami, "Map-generation and identification of multiple sound sources from robot in motion," in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2010, 2010, pp. 437-443.

When considering application to a system such as a robot, for which the influence of ambient noise cannot be ignored, it is necessary to pick out from such a complex mixture the sounds that contain information to be processed and to discard the rest.

Gathering prior knowledge of the acoustic environment at a given location is therefore beneficial for any sound-related task in that environment. Sound sources, however, are not necessarily stationary in either location or characteristics.

On the other hand, some acoustic sources, such as a television, an air conditioner, or a loudspeaker in a shopping mall, are active over relatively long periods and can be fully localized within the space.

However, in an environment where such fixed acoustic sources and dynamically changing acoustic sources coexist, it has not necessarily been clear how to collect prior information on acoustic sources that is usable for real-world voice communication and how to compile it into a database.

The present invention has been made to solve these problems, and its object is to provide a moving body that can identify the position of a sound source and can collect prior information on acoustic sources that is usable for real-world voice communication.

Another object of the present invention is to provide an acoustic source map creation system and an acoustic source map creation method capable of collecting prior information on acoustic sources that is usable for real-world voice communication.

According to one aspect of the present invention, a moving body includes: moving means for driving the moving body; storage means for storing a geometric map for specifying the geometric positions of objects in the space in which the moving body can move; position measuring means for measuring the current position of the moving body and outputting a position measurement result; ranging means for acquiring the distance from the moving body to an object specified by the geometric map in the space as ranging data that follows a predetermined probability distribution; and position estimation means for estimating the position of the moving body with a particle filter including a plurality of particles. Each particle has information on a position and an orientation in the space as its attributes. The position estimation means includes: means for moving each particle based on the position measurement result, calculating a first likelihood for the current attributes of each particle based on the ranging data and the probability distribution, and calculating the weight of the corresponding particle; and moving body position estimating means for estimating the current position of the moving body as a weighted average of the particles. The moving body further includes: a sound sensor array; direction-of-arrival estimation means for executing processing to specify, based on signals from the sound sensor array, the direction from which sound arrives at the sound sensor array; and sound source localization means for estimating the position of an acoustic source based on the attributes of each particle and the direction of arrival. The sound source localization means accumulates a second likelihood for a hit position in response to a ray cast from the position of each particle in the direction of arrival hitting an object specified by the geometric map, and estimates the position of the acoustic source based on the accumulated second likelihood.
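The ray-cast accumulation described in this aspect can be sketched roughly as follows. This is an illustrative sketch only: the 2D occupancy grid representation, the fixed-step ray marching, the convention that the direction of arrival `doa` is measured relative to each particle's heading, and all names (`raycast`, `accumulate_hits`, `estimate_source`) are assumptions not taken from the source.

```python
import math
from collections import defaultdict

def raycast(grid, x, y, angle, cell=1.0, max_range=50.0):
    """March along a ray and return the (row, col) of the first occupied cell, or None."""
    step = cell * 0.5
    dist = 0.0
    while dist <= max_range:
        col = int(math.floor((x + dist * math.cos(angle)) / cell))
        row = int(math.floor((y + dist * math.sin(angle)) / cell))
        if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
            return None  # the ray left the map without hitting anything
        if grid[row][col]:
            return (row, col)
        dist += step
    return None

def accumulate_hits(grid, particles, doa, likelihood, acc=None):
    """Cast one ray per particle (x, y, heading) and add the likelihood at each hit cell."""
    acc = defaultdict(float) if acc is None else acc
    for x, y, heading in particles:
        hit = raycast(grid, x, y, heading + doa)
        if hit is not None:
            acc[hit] += likelihood
    return acc

def estimate_source(acc):
    """Return the cell with the largest accumulated likelihood, or None if empty."""
    return max(acc, key=acc.get) if acc else None
```

Because every particle casts its own ray, the uncertainty in the robot pose spreads the hits over neighboring cells, and cells that are hit consistently across scans accumulate the largest values.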

Preferably, the ranging means is a laser range finder, and the first likelihood is obtained as follows: in the probability distribution based on the measured distance from the current position of the moving body to an object, the probability corresponding to the distance at which a ray cast from the position of each particle in the direction of the particle hits an object in the geometric map is accumulated over rays taken at predetermined angular intervals.
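A minimal sketch of this first likelihood and of the weighted-average pose estimate is given below. The Gaussian range model, the value of `sigma`, the use of a plain sum over beams, and the function names are assumptions made for illustration; the source only states that the ranging data follows a predetermined probability distribution.

```python
import math

def range_probability(measured, expected, sigma):
    """Assumed Gaussian density of the expected range around the measured range."""
    z = (measured - expected) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def first_likelihood(measured_ranges, expected_ranges, sigma=0.1):
    """Accumulate the per-beam probabilities over all scan angles (one entry per angle)."""
    return sum(range_probability(m, e, sigma)
               for m, e in zip(measured_ranges, expected_ranges))

def normalize(likelihoods):
    """Turn likelihoods into particle weights that sum to one."""
    total = sum(likelihoods)
    if total <= 0.0:
        return [1.0 / len(likelihoods)] * len(likelihoods)
    return [l / total for l in likelihoods]

def weighted_mean_pose(particles, weights):
    """Weighted average of (x, y, heading); the heading uses a circular mean."""
    x = sum(w * p[0] for p, w in zip(particles, weights))
    y = sum(w * p[1] for p, w in zip(particles, weights))
    s = sum(w * math.sin(p[2]) for p, w in zip(particles, weights))
    c = sum(w * math.cos(p[2]) for p, w in zip(particles, weights))
    return (x, y, math.atan2(s, c))
```

Here `expected_ranges` stands for the distances obtained by ray casting from a particle into the geometric map, and `measured_ranges` for the laser range finder readings at the same angles.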

Preferably, the direction-of-arrival estimation means executes a plurality of sound scans, and, in calculating the second likelihood, the sound source localization means selects the sound scans in which at least one of the maximum value of the sound power and the amount of change in the sound power exceeds a predetermined threshold, and executes the accumulation processing on the selected scans.

Preferably, the second likelihood is defined by a function that increases with the sound power and takes values scaled to a predetermined range, and the sound source localization means executes the accumulation processing so that the second likelihood increases for hit positions that belong to a selected sound scan and whose number of hits exceeds a predetermined threshold.
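The scan selection and the power-dependent second likelihood might be sketched as below. The linear scaling, the interpretation of "amount of change" as the change in peak power relative to the previous scan, and every name and default value are assumptions introduced here; the source does not give the actual function or thresholds in this passage.

```python
def select_scan(powers, prev_max, power_thresh, delta_thresh):
    """Keep a scan if its peak power, or the change in peak power, exceeds a threshold."""
    peak = max(powers)
    return peak > power_thresh or abs(peak - prev_max) > delta_thresh

def second_likelihood(power, p_min, p_max, lo=0.0, hi=1.0):
    """Monotonically increasing in power, clipped and scaled to [lo, hi] (assumed linear)."""
    frac = (power - p_min) / (p_max - p_min)
    frac = min(1.0, max(0.0, frac))
    return lo + (hi - lo) * frac

def update_cells(acc, hit_counts, likelihood, min_hits):
    """Add the likelihood only to cells whose hit count exceeds min_hits."""
    for cell, n_hits in hit_counts.items():
        if n_hits > min_hits:
            acc[cell] = acc.get(cell, 0.0) + likelihood
    return acc
```

Requiring a minimum number of hits per cell suppresses cells that are reached by only a few stray particles.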

Preferably, the moving body creates an acoustic map indicating the positions of the localized acoustic sources within the geometric map and stores the acoustic map in a storage device.

Preferably, the moving body is an autonomous mobile robot.

Preferably, the moving body updates the acoustic map as it moves autonomously.

Preferably, the position measuring means is a path length measurement (odometry) sensor that detects the moving distance and moving direction of the moving body based on the motion of the moving means.
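As a rough illustration of how such a measurement could be used to move each particle, the following sketch applies a measured travel distance and heading change to a particle pose with added noise. The motion model, the noise parameters `sigma_d` and `sigma_t`, and the function name are assumptions, not details given in the source.

```python
import math
import random

def move_particle(pose, distance, dtheta, rng, sigma_d=0.02, sigma_t=0.01):
    """Advance a particle (x, y, heading) by a noisy odometry increment."""
    x, y, heading = pose
    d = distance + rng.gauss(0.0, sigma_d)   # perturbed travel distance
    t = dtheta + rng.gauss(0.0, sigma_t)     # perturbed heading change
    mid = heading + 0.5 * t                  # move along the mid-step heading
    return (x + d * math.cos(mid), y + d * math.sin(mid), heading + t)

rng = random.Random(0)
particles = [(0.0, 0.0, 0.0)] * 100
particles = [move_particle(p, 1.0, 0.1, rng) for p in particles]
```

Sampling a different perturbation for each particle lets the particle set represent the growth of pose uncertainty between range measurements.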

According to another aspect of the present invention, an acoustic source map creation system includes: moving means for driving a moving body; storage means for storing a geometric map for specifying the geometric positions of objects in the space in which the moving body can move; position measuring means for measuring the current position of the moving body and outputting a position measurement result; ranging means for acquiring the distance from the moving body to an object specified by the geometric map in the space as ranging data that follows a predetermined probability distribution; and position estimation means for estimating the position of the moving body with a particle filter including a plurality of particles. Each particle has information on a position and an orientation in the space as its attributes. The position estimation means includes: means for moving each particle based on the position measurement result, calculating a first likelihood for the current attributes of each particle based on the ranging data and the probability distribution, and calculating the weight of the corresponding particle; and moving body position estimating means for estimating the current position of the moving body as a weighted average of the particles. The system further includes: a sound sensor array mounted on the moving body; direction-of-arrival estimation means for executing processing to specify, based on signals from the sound sensor array, the direction from which sound arrives at the sound sensor array; and sound source localization means for estimating the position of an acoustic source based on the attributes of each particle and the direction of arrival. The sound source localization means accumulates a second likelihood for a hit position in response to a ray cast from the position of each particle in the direction of arrival hitting an object specified by the geometric map, and estimates the position of the acoustic source based on the accumulated second likelihood. The system further includes an acoustic source map database that stores the estimated position of the acoustic source as an acoustic source map.

According to still another aspect of the present invention, an acoustic source map creation method includes the steps of: driving a moving body to move it; storing, in a storage device, a geometric map for specifying the geometric positions of objects in the space in which the moving body can move; measuring the current position of the moving body and outputting a position measurement result; acquiring the distance from the moving body to an object specified by the geometric map in the space as ranging data that follows a predetermined probability distribution; and estimating the position of the moving body with a particle filter including a plurality of particles. Each particle has information on a position and an orientation in the space as its attributes. The step of estimating the position of the moving body includes the steps of: moving each particle based on the position measurement result, calculating a first likelihood for the current attributes of each particle based on the ranging data and the probability distribution, and calculating the weight of the corresponding particle; and estimating the current position of the moving body as a weighted average of the particles. The method further includes the steps of: executing processing to specify, based on signals from a sound sensor array mounted on the moving body, the direction from which sound arrives at the sound sensor array; and estimating the position of an acoustic source based on the attributes of each particle and the direction of arrival. In the step of estimating the position of the acoustic source, a second likelihood is accumulated for a hit position in response to a ray cast from the position of each particle in the direction of arrival hitting an object specified by the geometric map, and the position of the acoustic source is estimated based on the accumulated second likelihood. The method further includes the step of storing the estimated position of the acoustic source in the storage device as an acoustic source map.

According to the present invention, it is possible to identify the position of a sound source and to collect prior information on acoustic sources that is usable for real-world voice communication.

A diagram for explaining the situation addressed by an acoustic source map.
Another diagram for explaining the situation addressed by an acoustic source map.
A diagram showing, of the configuration of the robot 1000, the configuration for creating an acoustic source map.
A diagram showing functional blocks for sound source localization using the MUSIC algorithm.
A diagram showing an example of a space in which the robot 1000 and acoustic sources S1 and S2 are arranged.
A diagram showing the external appearance of the robot 1000.
A diagram for explaining the concept of ranging with a laser range finder.
A diagram for explaining the concept of the process of identifying the position of the robot 1000.
A flowchart for explaining the process of identifying the position of the robot 1000 using a particle filter.
A diagram for explaining the concept of calculating the particle state vector s^m[t].
A diagram showing the likelihood for each particle based on the distance to an object.
A conceptual diagram for explaining the procedure for calculating the likelihood of a particle based on the measurement results (ranging data) of the laser range finder.
A conceptual diagram showing ranging with the laser range finder and the ray-casting procedure from a particle.
A diagram for explaining the concept of ray casting from a particle to identify the position of a sound source.
A flowchart for explaining the accumulation of likelihood for occupied cells and the acoustic source mapping process.
A diagram showing the map obtained for one of the test environments.
A diagram showing the parameters used to identify the acoustic source positions.
A diagram showing a raw acoustic source map obtained with a flat initial setting.
A diagram showing a raw acoustic source map obtained with a flat initial setting.
A diagram showing a raw acoustic source map obtained with a flat initial setting.
A table showing the activation pattern of the acoustic sources for each trial.

Hereinafter, the configurations of a moving body, an acoustic source map creation system, and an acoustic source map creation method according to an embodiment of the present invention will be described with reference to the drawings. In the following embodiment, components and processing steps given the same reference numerals are the same or equivalent, and their description is not repeated unless necessary.

In the following description, a microphone, more specifically an electret condenser microphone, is used as an example of the sound sensor, but any other sound sensor may be used as long as it can detect sound as an electric signal.

In the present embodiment, a map that includes the positions and characteristics of acoustic sources, such as power, directivity, frequency, and harmonic distortion, is called an "acoustic source map".

In a real environment, multiple sounds generated at different locations are observed as a mixture, so a simple method such as scanning the space with a sound level meter is insufficient for generating an acoustic source map. Generating a sound environment map that characterizes the positions and types of sound sources, which is considered useful as prior knowledge of the sound environment, requires at least localization and separation of the sound sources in addition to spatial information (an ordinary map), and it is desirable that the sound sources also be classified.

Information obtained from an acoustic source map can complement visual data or ranging sensor data and enhance the robot's localization, interaction, and monitoring capabilities.

FIG. 1 and FIG. 2 are diagrams for explaining the situation addressed by such an acoustic source map.

For example, considering the example in FIG. 1, the robot perceives two main sounds:
i) the sound of a television from the front, slightly to the left
ii) the voice of a user from the right
The robot is an autonomous mobile robot that occupies a given position on the map and is assumed to know its own current pose. In FIG. 1, this autonomous mobile robot is a humanoid robot driven by two wheels capable of differential operation.

As shown in FIG. 2, if an acoustic source map is available to the robot, the robot has prior knowledge of the fixed acoustic sources in the environment. It therefore knows that the data coming from its left is sound from a fixed source (the television) and is less important than the user's voice coming from its right.
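One way such prior knowledge could be used is sketched below: the observed direction of arrival is compared with the bearing from the robot to each source stored in the map. The map format (a dictionary from a label to an (x, y) position), the angular tolerance, and the function names are hypothetical and are not specified in the source.

```python
import math

def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi

def match_known_source(robot_pose, doa, source_map, tolerance=0.2):
    """Return the label of the mapped source closest in bearing to doa, or None."""
    x, y, heading = robot_pose
    best_name, best_err = None, tolerance
    for name, (sx, sy) in source_map.items():
        # Bearing to the mapped source, expressed in the robot frame.
        bearing = wrap_angle(math.atan2(sy - y, sx - x) - heading)
        err = abs(wrap_angle(doa - bearing))
        if err <= best_err:
            best_name, best_err = name, err
    return best_name

source_map = {"tv": (2.0, 1.0)}  # hypothetical fixed source, front-left of the robot
print(match_known_source((0.0, 0.0, 0.0), 0.45, source_map))   # tv
print(match_known_source((0.0, 0.0, 0.0), -1.2, source_map))   # None
```

A direction that matches no mapped source, such as the user's voice from the right in this example, can then be treated as a new sound that deserves attention.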

Then, based on the characteristics of the new sound in the environment (in this case, human speech), the robot can take appropriate action, such as positioning itself (turning to the right) so that it can listen to what the user says, and can communicate with the user more effectively.

An acoustic source map of an environment resembles a geometric map, but it is not constant and keeps changing very rapidly. Creating a new acoustic source map every time a change occurs in the environment is therefore a tedious task.

As described below, the present embodiment provides a framework in which the robot, while performing its normal tasks, can also observe environmental sounds and update the acoustic source map as it changes.

FIG. 3 is a diagram showing, of the configuration of the robot 1000, the configuration for creating an acoustic source map. In FIG. 3, the moving mechanism for moving the robot (for example, a drive mechanism using two wheels that can operate differentially), sensors for visually detecting the external environment such as a camera, and the like are not shown. In the following example, a humanoid autonomous mobile robot driven by two wheels is described as an example of the "moving body" for creating an acoustic source map, but the moving body of the present invention is not limited to this and may be a robot of another configuration as long as it can move autonomously in the space.

Referring to FIG. 3, the robot 1000 includes: a path length measurement sensor 10 for detecting the moving distance, speed, angular velocity, and the like of the drive mechanism that moves the robot with two wheels; a front laser range finder (LRF) 20, mounted on the front of the robot 1000, for measuring the distance to objects ahead by scanning a laser beam; a rear laser range finder (LRF) 30, mounted on the rear of the robot 1000, for measuring the distance to objects behind by scanning a laser beam; and a sensor input/output board 40 for outputting signals from the path length measurement sensor 10, the front LRF 20, and the rear LRF 30 as data via a bus 60 and for transmitting commands from the bus 60 to these sensors.

The robot 1000 further includes a memory 50, which functions as working memory and is composed of RAM (Random Access Memory) or the like, and a nonvolatile storage device 1100 for storing a program (not shown) for operating the robot 1000, a geometric map 1102, an acoustic source map 1104, and the like. The nonvolatile storage device 1100 may be a hard disk, an SSD (Solid State Drive), or the like, as long as it is a randomly accessible storage device. The "geometric map" is data that geometrically represents, on a map, the positions of walls and of fixed or semi-fixed objects that are constantly present in the space in which the robot 1000 moves. The geometric map may be two-dimensional information, but is preferably three-dimensional information.

The robot 1000 further includes: a microphone array 1052 for measuring the direction of a sound source (the direction of arrival of the sound) and the acoustic power from that direction; an audio input/output board 1054 for converting signals from the microphone array 1052 into signals to be transmitted on the bus 60; and a processor 1010, which is an arithmetic unit that controls the operation of the robot 1000 and executes the creation of the geometric map and the creation and updating of the acoustic map. Although not shown in FIG. 3, a speaker that plays back speech generated by the processor 1010 for communication with the user is also connected to the audio input/output board.

Operating according to the program stored in the nonvolatile storage device 1100, the processor 1010 includes: a geometric map creation unit 1030 that creates the geometric map based on signals from the path length measurement sensor 10, the front LRF 20, and the rear LRF 30; a sound source power spectrum acquisition unit 1050 that executes processing to acquire the sound source power spectrum based on signals from the microphone array 1052; a sound source direction estimation unit 1060 for estimating the sound source direction (the direction of arrival of the sound) in robot coordinates based on the sound source power spectrum; a robot position specifying processing unit 1040 for specifying the position of the robot based on signals from the path length measurement sensor 10, the front LRF 20, and the rear LRF 30; and a sound source position estimation unit 1070 that estimates the sound source position from the specified robot position and the sound source direction.

Here, the sound source direction is estimated from the time delay of arrival at the microphone array, from the steered response power (SRP), or by a spectral decomposition technique such as the MUSIC algorithm.

For sound source direction estimation using phase differences, methods such as SRP-PHAT (Steered Response Power - Phase Alignment Transform) and SPIRE (Stepwise Phase Difference Restoration) can be used.

このような音源方向の推定については、上述した特許文献２や、以下の文献にも開示がある。 Such estimation of the sound source direction is also disclosed in the above-mentioned Patent Document 2 and the following documents.

Reference 1: JP 2012-242597 A
FIG. 4 is a diagram showing functional blocks for sound source localization using the MUSIC algorithm, one of the sound source localization methods mentioned above.

Here, sound source localization means continuously determining the direction of a sound source, and sound source position estimation means estimating the three-dimensional position of the sound source within a given space from the direction determined by sound source localization.

一例として、音源の位置の推定のために、音源の方位を推定するための手法の具体例として、ＭＵＳＩＣ（Multiple Signal Classification）法を例にとって説明する。ただし、音源の方位を推定できる方法であれば、他の手法を用いてもよい。 As an example, a MUSIC (Multiple Signal Classification) method will be described as an example of a method for estimating the direction of a sound source in order to estimate the position of the sound source. However, other methods may be used as long as the direction of the sound source can be estimated.

In outline, the MUSIC method proceeds as follows. First, the multi-channel spectrum X(k, t) is obtained for each frame by a fast Fourier transform, and the spatial correlation matrix R_k between channels is computed in the spectral domain for each block. Eigenvalue decomposition of the correlation matrix separates the subspace of directional components from the subspace of non-directional components. Using the eigenvectors E_k^n that span the non-directional subspace and the direction vectors (steering vectors) a_k prepared in advance for the target search space, a (narrowband) MUSIC spatial spectrum P(k) is obtained for each frequency bin. Finally, the per-bin MUSIC spatial spectra within a specific frequency band are combined into a broadband MUSIC spatial spectrum.

以下では、広帯域ＭＵＳＩＣ空間スペクトルを単に「ＭＵＳＩＣ空間スペクトル」と呼び、ＭＵＳＩＣ空間スペクトルの時系列を「ＭＵＳＩＣスペクトログラム」を呼ぶ。 In the following, the broadband MUSIC spatial spectrum is simply referred to as “MUSIC spatial spectrum”, and the time series of the MUSIC spatial spectrum is referred to as “MUSIC spectrogram”.

音源定位においては、ＭＵＳＩＣ空間スペクトルのピークを探索することにより、音源の方向が求まる。 In sound source localization, the direction of the sound source is obtained by searching for the peak of the MUSIC spatial spectrum.
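As a minimal sketch of this peak search, assuming the MUSIC spatial spectrum has been sampled on a circular grid of azimuths, the local maxima above a threshold can be picked as follows. The function name `find_peaks_circular` and the threshold value are illustrative assumptions and do not appear in the original disclosure.

```python
def find_peaks_circular(spectrum, threshold):
    """Return indices of local maxima above `threshold` on a circular grid.

    spectrum: list of MUSIC spatial spectrum values, one per azimuth,
              where the last azimuth is adjacent to the first.
    """
    n = len(spectrum)
    peaks = []
    for i, value in enumerate(spectrum):
        left = spectrum[i - 1]          # index -1 wraps to the last element
        right = spectrum[(i + 1) % n]   # wrap around at the end of the grid
        if value > threshold and value > left and value >= right:
            peaks.append(i)
    return peaks


# Two sources stand out above a threshold of 4.
print(find_peaks_circular([1, 5, 2, 1, 7, 3], threshold=4))  # [1, 4]
```

Each returned index corresponds to one candidate direction on the azimuth grid.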

なお、以下では、マイクロホンアレイが１つである場合を例にとって説明するが、マイクロホンアレイの個数はより多くてもよい。 In the following, a case where there is one microphone array will be described as an example, but the number of microphone arrays may be larger.

Referring to FIG. 4, the sound source power spectrum acquisition unit 1050 includes: an A/D converter 1054 that receives p analog sound source signals from a microphone array MC1 comprising microphones 1052.1 to 1052.p (p: a natural number) and converts them into p digital sound source signals; an eigenvector calculation unit 61 that receives the p digital sound source signals from the A/D converter 1054 and outputs, block by block, the correlation matrix required by the MUSIC method together with its eigenvalues and eigenvectors, one block being a predetermined duration such as 100 milliseconds; and a MUSIC processing unit 62 that uses the eigenvectors output for each block by the eigenvector calculation unit 61 to output a MUSIC spatial spectrum by the MUSIC method. The sound source direction estimation unit 1060 estimates the direction of the sound source (in this embodiment, the two angles φ and θ of three-dimensional polar coordinates) from the MUSIC spatial spectrum output by the MUSIC processing unit 62. In this specification, the "MUSIC response" is the MUSIC spatial spectrum obtained by the MUSIC algorithm, averaged according to a predetermined formula.

特に限定されないが、本実施の形態では、Ａ／Ｄ変換器１０５４は、一般的な１６ｋＨｚ／１６ビットで各マイクロホンの出力をＡ／Ｄ変換する。 Although not particularly limited, in this embodiment, the A / D converter 1054 A / D converts the output of each microphone at a general 16 kHz / 16 bits.

The eigenvector calculation unit 61 includes: a framing processing unit 80 that divides the p digital sound source signals output by the A/D converter 1054 for the microphone array MC1 into frames of, for example, 4 milliseconds; an FFT processing unit 82 that applies an FFT (fast Fourier transform) to each of the p channels of framed sound source signals output by the framing processing unit 80 and outputs a predetermined number of frequency-domain components (hereinafter each frequency region is called a "bin" and the number of frequency regions the "number of bins"); a blocking processing unit 84 that groups the per-channel, per-bin values output by the FFT processing unit 82 every 4 milliseconds into blocks of 100 milliseconds; a correlation matrix calculation unit 86 that computes and outputs, at every predetermined interval (every 100 milliseconds), a correlation matrix whose elements are the correlations between the bin values output by the blocking processing unit 84; and an eigenvalue decomposition unit 88 that performs eigenvalue decomposition of the correlation matrix output by the correlation matrix calculation unit 86 and outputs eigenvectors 92 to the MUSIC processing unit 62.

通常、ＦＦＴでは５１２〜１０２４点を使用する（１６ｋＨｚのサンプリングレートで３２〜６４ミリ秒に相当）が、ここでは１フレームを４ミリ秒（ＦＦＴでは６４〜１２８点に相当）とした。このようにフレーム長を短くすることにより、ＦＦＴの計算量が少なくてすむだけでなく、後の相関行列の算出、固有値分解、およびＭＵＳＩＣ応答の算出における計算量も少なくて済む。その結果、性能を落とすことなく、比較的非力なコンピュータを用いても十分にリアルタイムで音源定位を行なうことができる。 Usually, 512 to 1024 points are used in FFT (corresponding to 32 to 64 milliseconds at a sampling rate of 16 kHz), but here one frame is set to 4 milliseconds (corresponding to 64 to 128 points in FFT). By reducing the frame length in this way, not only the amount of calculation of FFT is reduced, but also the amount of calculation in later calculation of correlation matrix, eigenvalue decomposition, and calculation of MUSIC response is reduced. As a result, sound source localization can be performed sufficiently in real time even if a relatively weak computer is used without degrading performance.
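The frame-length figures quoted above can be checked with a few lines of arithmetic. The helper names below are illustrative assumptions; only the 16 kHz sampling rate, the 4-millisecond frame, and the 100-millisecond block come from the text.

```python
def samples_per_frame(sample_rate_hz, frame_ms):
    """Number of samples covered by one frame of the given duration."""
    return round(sample_rate_hz * frame_ms / 1000.0)


def frame_duration_ms(sample_rate_hz, n_points):
    """Duration in milliseconds spanned by n_points samples."""
    return 1000.0 * n_points / sample_rate_hz


SAMPLE_RATE_HZ = 16_000

print(samples_per_frame(SAMPLE_RATE_HZ, 4))     # 64 samples in a 4 ms frame
print(frame_duration_ms(SAMPLE_RATE_HZ, 512))   # 32.0 ms for a 512-point FFT
print(frame_duration_ms(SAMPLE_RATE_HZ, 1024))  # 64.0 ms for a 1024-point FFT
print(100 // 4)                                 # 25 frames per 100 ms block
```

A 4-millisecond frame holds 64 samples, so a 128-point FFT would presumably involve zero padding; the text does not say which size is used.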

The MUSIC processing unit 62 includes a microphone arrangement storage unit 100 that stores position vectors giving the position of each microphone of the microphone array MC1 in a predetermined coordinate system, and a MUSIC spatial spectrum calculation unit 104 that calculates and outputs the MUSIC spatial spectrum by the MUSIC method, assuming a fixed number of sound sources, from the microphone position vectors stored in the microphone arrangement storage unit 100 and the eigenvectors output by the eigenvalue decomposition unit 88. That the eigenvalues of the correlation matrix obtained for each block are related to the number of sound sources is already known and is described, for example, in Reference 2: F. Asano, M. Goto, K. Itou, and H. Asoh, "Real-time sound source localization and separation system and its application on automatic speech recognition," in Eurospeech 2001, Aalborg, Denmark, 2001, pp. 1013-1016.

In this embodiment, not only the two-dimensional azimuth of each sound source but also its elevation is estimated, so a MUSIC algorithm capable of three-dimensional computation is implemented. The pair of azimuth and elevation is hereinafter called the direction of arrival (DOA). The algorithm executed by the MUSIC processing unit 62 does not estimate the distance to the sound source; estimating only the DOA greatly reduces the processing time.

ＭＵＳＩＣ処理部６２はさらに、ＭＵＳＩＣ空間スペクトル算出部１０４により算出されたＭＵＳＩＣ空間スペクトルに基づいて、ＭＵＳＩＣ法にしたがいＭＵＳＩＣ応答と呼ばれる値を各方位について算出し出力するためのＭＵＳＩＣ応答算出部１０６を含む。 The MUSIC processing unit 62 further includes a MUSIC response calculation unit 106 for calculating and outputting a value called a MUSIC response for each direction according to the MUSIC method based on the MUSIC spatial spectrum calculated by the MUSIC spatial spectrum calculation unit 104. .

The sound source direction estimation unit 1060 includes a buffer 108 that temporarily stores, in FIFO order and as a time series, a predetermined number of the peaks of the MUSIC response calculated by the MUSIC response calculation unit 106. A sound source direction estimation processing unit 110 then estimates the direction of the sound source (the two angles φ and θ mentioned above) from the MUSIC responses at each search point of each block stored in the buffer 108.

In the MUSIC method, estimating the narrowband MUSIC spatial spectrum requires the number of directional sound sources (NOS) active at that time. In the following description, a fixed number is supplied, and only peaks of the MUSIC spatial spectrum that exceed a specific threshold are treated as directional sound sources.
(MUSIC method)
The MUSIC method for calculating the three-dimensional direction described above is briefly summarized below.

たとえば、Ｍ個のマイク入力のフーリエ変換Ｘｍ（ｋ、ｔ）は、式（Ｍ１）のようにモデル化される。 For example, the Fourier transform Xm (k, t) of M microphone inputs is modeled as in equation (M1).

Here, the vector s(k, t) consists of the spectra S_n(k, t) of the N sound sources (n = 1, ..., N).

That is, s(k, t) = [S_1(k, t), ..., S_N(k, t)]^T, where k and t are the frequency and time-frame indices, respectively. The vector n(k, t) represents background noise. The matrix A_k is the transfer function matrix, whose (m, n) element is the transfer function of the direct path from the n-th sound source to the m-th microphone. The n-th column vector of A_k is called the position vector (steering vector) of the n-th sound source.
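Equation (M1) is not reproduced in this text. A form consistent with the definitions above, assuming the standard linear mixing model used with MUSIC, would be the following, where x(k, t) stacks the M microphone spectra:

```latex
\mathbf{x}(k,t) = \mathbf{A}_k\,\mathbf{s}(k,t) + \mathbf{n}(k,t),
\qquad
\mathbf{x}(k,t) = \bigl[X_1(k,t), \ldots, X_M(k,t)\bigr]^{T}
\tag{M1}
```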

First, the spatial correlation matrix R_k defined by equation (M2) is computed; eigenvalue decomposition of R_k as in equation (M3) then yields the diagonal matrix of eigenvalues Λ_k and the matrix of eigenvectors E_k.

The eigenvectors can be partitioned as E_k = [E_k^s | E_k^n], where E_k^s contains the eigenvectors corresponding to the N dominant eigenvalues and E_k^n contains the remaining eigenvectors.
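Equations (M2) and (M3) are likewise not reproduced here. Assuming the usual definitions, they would read as below; in practice the expectation in (M2) would be replaced by an average over the frames of one block, and because R_k is Hermitian the inverse of E_k equals its conjugate transpose.

```latex
\begin{align}
\mathbf{R}_k &= \mathrm{E}\!\left[\mathbf{x}(k,t)\,\mathbf{x}^{H}(k,t)\right] \tag{M2}\\
\mathbf{R}_k &= \mathbf{E}_k\,\boldsymbol{\Lambda}_k\,\mathbf{E}_k^{-1} \tag{M3}
\end{align}
```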

ＭＵＳＩＣ空間スペクトルは式（Ｍ４）と（Ｍ５）とで求める。ｒは距離、θとφとはそれぞれ方位角と仰角とを示す。式（Ｍ５）は、スキャンされる点（ｒ、θ、φ）における正規化した位置ベクトルである。 The MUSIC spatial spectrum is obtained by equations (M4) and (M5). r is a distance, and θ and φ are an azimuth angle and an elevation angle, respectively. Equation (M5) is a normalized position vector at the scanned point (r, θ, φ).
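Equations (M4) and (M5) are not reproduced in this text. A reconstruction consistent with the surrounding definitions, assuming the standard MUSIC pseudo-spectrum with E_k^n the non-directional (noise) subspace and a_k the position vector of the scanned point, would be:

```latex
\begin{align}
P(r,\theta,\phi,k) &=
  \frac{1}{\left\| \left(\mathbf{E}_k^{n}\right)^{H}
  \tilde{\mathbf{a}}_k(r,\theta,\phi) \right\|^{2}} \tag{M4}\\
\tilde{\mathbf{a}}_k(r,\theta,\phi) &=
  \frac{\mathbf{a}_k(r,\theta,\phi)}{\left\| \mathbf{a}_k(r,\theta,\phi) \right\|} \tag{M5}
\end{align}
```

The spectrum becomes large where the scanned position vector is nearly orthogonal to the noise subspace, that is, in the direction of a source.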

ＭＵＳＩＣ応答（パワーに相当）は、ＭＵＳＩＣ空間スペクトルを式（Ｍ６）のように平均化したものである。 The MUSIC response (corresponding to power) is obtained by averaging the MUSIC spatial spectrum as shown in Equation (M6).

In equation (M6), k_L and k_H are the indices of the lower and upper bounds of the frequency band, respectively, and K = k_H - k_L + 1. The direction of the sound arriving at the microphone array is obtained by searching for the peaks of the MUSIC response.
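The steps summarized above (correlation matrix, eigenvalue decomposition, narrowband spectrum, band average) can be sketched in NumPy as follows. This is a simplified illustration rather than the implementation of the embodiment: it assumes a planar array, a far-field source, and an azimuth-only scan, whereas the text scans azimuth and elevation. The speed of sound, the function names, and the synthetic-data helper `simulate_block` are all assumptions introduced for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the text


def steering_vectors(mic_xy, freq_hz, azimuths):
    """Unit-norm far-field steering vectors, shape (n_azimuths, n_mics)."""
    directions = np.stack([np.cos(azimuths), np.sin(azimuths)], axis=1)
    # A microphone displaced towards the source receives the wavefront earlier.
    delays = -(directions @ mic_xy.T) / SPEED_OF_SOUND
    a = np.exp(-2j * np.pi * freq_hz * delays)
    return a / np.linalg.norm(a, axis=1, keepdims=True)


def music_response(block, freqs_hz, mic_xy, azimuths, n_sources):
    """Broadband MUSIC response for one block.

    block: complex array (n_bins, n_mics, n_frames) of per-frame spectra.
    """
    response = np.zeros(len(azimuths))
    for X, freq in zip(block, freqs_hz):
        R = X @ X.conj().T / X.shape[1]       # spatial correlation matrix
        _, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
        noise = eigvecs[:, : R.shape[0] - n_sources]  # non-directional subspace
        a = steering_vectors(mic_xy, freq, azimuths)
        proj = a.conj() @ noise               # a^H E_n for every azimuth
        power = np.sum(np.abs(proj) ** 2, axis=1)
        response += 1.0 / np.maximum(power, 1e-12)    # narrowband spectrum
    return response / len(freqs_hz)           # average over the band


def simulate_block(mic_xy, freqs_hz, azimuth, n_frames=25, noise=0.01, seed=0):
    """Synthetic single-source block, used only to exercise the sketch."""
    rng = np.random.default_rng(seed)
    n_mics = len(mic_xy)
    block = np.empty((len(freqs_hz), n_mics, n_frames), dtype=complex)
    for i, freq in enumerate(freqs_hz):
        a = steering_vectors(mic_xy, freq, np.array([azimuth]))[0]
        s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
        n = (rng.standard_normal((n_mics, n_frames))
             + 1j * rng.standard_normal((n_mics, n_frames)))
        block[i] = np.outer(a, s) + noise * n
    return block
```

With 4-millisecond frames and 100-millisecond blocks, each block holds 25 frames, which is why `n_frames` defaults to 25. The final division by the number of bins plays the role of the 1/K average in equation (M6).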

As described later, the acoustic map creation system of this embodiment has the robot run a conventional sound source localization algorithm at different locations and creates the acoustic map by combining the results from all of those locations.

さて、以上説明したとおり、音の到来方向の推定アルゴリズムとしては、ＭＵＳＩＣ法を用いることも、一方で、他の方法、たとえば、ステアード応答パワー法を用いることも可能である。 As described above, the MUSIC method can be used as an algorithm for estimating the direction of arrival of sound, while other methods such as the steered response power method can be used.

そこで、特に限定されないが、以下では、他のアルゴリズムの例として、ステアード応答パワー法に基づいて、音の到来方向を推定する場合について説明する。 Thus, although not particularly limited, a case where the direction of arrival of sound is estimated based on the steered response power method will be described below as an example of another algorithm.

このようなステアード応答パワー法については、以下の文献に開示がある。 Such a steered response power method is disclosed in the following document.

Reference 3: M. Brandstein and H. Silverman, "A robust method for speech signal time-delay estimation in reverberant rooms," in IEEE Conference on Acoustics, Speech, and Signal Processing, ICASSP 1997, 1997, pp. 375-378.
Reference 4: A. Badali, J.-M. Valin, F. Michaud, and P. Aarabi, "Evaluating realtime audio localization algorithms for artificial audition on mobile robots," in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2009, 2009, pp. 2033-2038.
Steered response power with the phase transform (SRP-PHAT) is an efficient sound source localization algorithm well suited to mobile robot applications. To present this framework in more detail, the steered response power approach to sound source localization is outlined below.

The goal of sound source localization is to estimate, from audio measurements, the positions of the acoustic sources within a search region.

Let v_1(k), ..., v_Q(k) denote the signals measured by the Q microphones of the array at sampling time k.

Because the spatial arrangement of the microphone array is accurately known in advance, the array can be focused by spatial beamforming to evaluate the sound coming from a given spatial position.

ビーム形成の出力は、以下のように表される： The beamforming output is expressed as:

Here, [x, y, z] are the coordinates of the focal position relative to the microphone array.
The steered response power (SRP) is obtained by computing the output power over T samples, as follows:

ここで、以下の座標は、探索空間におけるＮ個の位置の組を表す： Here, the following coordinates represent a set of N positions in the search space:
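The three expressions referred to above (beamformer output, steered response power, and search positions) are not reproduced in this text. One generic form consistent with the definitions, offered only as an assumption, is given below. Here f stands for an unspecified beamformer (for example delay-and-sum), the beamformer output is written with the subscript BF to avoid confusion with the coordinate y, and the summation limits are one possible choice of a T-sample window.

```latex
\begin{align}
y_{\mathrm{BF}}\bigl(k,[x,y,z]\bigr) &= f\bigl(v_1(k),\ldots,v_Q(k),[x,y,z]\bigr)\\
J\bigl(k,[x,y,z]\bigr) &= \frac{1}{T}\sum_{\tau=k-T+1}^{k}
  \bigl| y_{\mathrm{BF}}\bigl(\tau,[x,y,z]\bigr) \bigr|^{2}\\
&\bigl\{[x_n,y_n,z_n]\bigr\}_{n=1}^{N}
\end{align}
```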

ステアード応答パワーのピークに対応する場所が、音響源の位置を与える。ビーム形成の出力を取得して、パワーを計算し、音響源の場所の組を選択するためには、いくつかの方法がある。 The location corresponding to the peak of the steered response power gives the position of the acoustic source. There are several ways to obtain the beamforming output, calculate power, and select a set of acoustic source locations.

以下では、Ｎ個の場所からのパワーを含む、時間ｋにおいて取得されたステアード応答パワーは、ｋ番目の「オーディオスキャン」と呼ぶことにする。 In the following, the steered response power acquired at time k, including the power from N locations, will be referred to as the kth “audio scan”.

従って、ロボットの移動に伴って音源定位を行なうことは、異なる時に異なる場所から得られたオーディオスキャンを組み合わせることを意味する。 Therefore, performing sound source localization as the robot moves means combining audio scans obtained from different locations at different times.

特に、これらのスキャンは、マイクロホンアレイを基準とする座標（ロボット座標）中での探索空間で計算される(フォーカスされる位置は、アレイを基準とする座標中で知る必要がある)。 In particular, these scans are calculated in a search space in coordinates (robot coordinates) relative to the microphone array (the focused position needs to be known in the coordinates relative to the array).

Therefore, to combine them, they must be transformed from the robot coordinate system into the global coordinate system in which the acoustic source map is built.
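For a planar robot pose, this change of coordinates amounts to a rotation by the robot heading followed by a translation. The sketch below assumes a pose given as (x, y, heading) in the global frame; the function names are illustrative and not taken from the original.

```python
import math


def scan_angles_to_global(robot_heading, scan_angles):
    """Rotate robot-frame scan angles into the global frame, wrapped to [0, 2*pi)."""
    return [(robot_heading + angle) % (2.0 * math.pi) for angle in scan_angles]


def robot_point_to_global(robot_pose, point_xy):
    """Map a point from robot coordinates to global coordinates.

    robot_pose: (x, y, heading) of the robot in the global frame.
    point_xy:   (x, y) of the point in the robot frame.
    """
    x, y, heading = robot_pose
    px, py = point_xy
    gx = x + px * math.cos(heading) - py * math.sin(heading)
    gy = y + px * math.sin(heading) + py * math.cos(heading)
    return gx, gy
```

Since an audio scan holds directions rather than positions, only the angle rotation is needed for the scan itself, while the robot position supplies the origin of each direction in the global frame.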

これらの定位結果が異なる時間で取得されるので、位置が固定された音響源および動いている音響源を区別して、音響源マップを作成することが可能となる。 Since these localization results are acquired at different times, it is possible to create an acoustic source map by distinguishing between a fixed acoustic source and a moving acoustic source.

In this embodiment, as described with reference to FIGS. 1 and 2, the interest is in mapping the environmental noise of fixed acoustic sources. Having an acoustic source map of the fixed sources is useful prior knowledge for detecting moving ones.

なお、以上の説明では、ロボットに装着されたマイクロホンアレイにより音源の定位が実行されるものとして説明しているが、たとえば、空間内には、ロボットとは異なる定位置にマイクロホンアレイが設置され、このような定位置のマイクロホンアレイからのデータも併せて、音響源マップが作成される構成としてもよい。 In the above description, the sound source is localized by the microphone array attached to the robot. For example, in the space, the microphone array is installed at a different position from the robot. An acoustic source map may be created together with data from such a fixed-position microphone array.

(Acquisition of audio scans by the SRP-PHAT method)
In this embodiment, SRP-PHAT processing is performed in the frequency domain after applying a short-time Fourier transform (STFT) to the observed signals, which are sampled at, for example, 48 kHz (the analysis window is 25 milliseconds long and the window shift is 10 milliseconds).

その後、ステアード応答パワーは、パワーの平均のために５つのSTFTフレームを使用して、周波数帯域[1000，6000]Hzで計算される。 The steered response power is then calculated in the frequency band [1000, 6000] Hz using 5 STFT frames for power averaging.

Consequently, a new audio scan is obtained every 50 milliseconds (five frames at a 10-millisecond shift).

A characteristic of sound source localization algorithms is that their direction estimates are accurate while their range estimates are not. For this reason, as with the MUSIC method, polar coordinates [ρ, θ, φ] are often used to describe the search region.

In this embodiment, the far field is assumed (ρ is large compared with the array aperture), and only the direction is scanned.

さらに、以下では、説明の簡単のために、音響源の方位は、２次元の方向についてのみ考えることにする。もちろん、以下の説明に基づけば、音響源の方位の推定を３次元に拡張することが可能である。 Further, in the following, for the sake of simplicity of explanation, only the two-dimensional direction is considered as the direction of the acoustic source. Of course, based on the following description, it is possible to extend the estimation of the orientation of the acoustic source to three dimensions.

That is, the k-th scan is a set of N angles q_n(k) (= θ_n) in the range [0, 2π] together with the associated powers J_n(k); these angles are obtained in robot-referenced coordinates.
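A two-dimensional audio scan of this kind can be sketched in NumPy as below, under the far-field assumption stated above. This is an illustrative frequency-domain SRP-PHAT, not the embodiment's code: the speed of sound, the 72-angle grid, and the function name are assumptions, while the [1000, 6000] Hz band and the averaging over STFT frames follow the text.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the text


def srp_phat_scan(stft, freqs_hz, mic_xy, n_angles=72, band=(1000.0, 6000.0)):
    """Return (angles, powers) for one audio scan in robot coordinates.

    stft:     complex array (n_frames, n_bins, n_mics), e.g. five STFT frames.
    freqs_hz: centre frequency of each bin, shape (n_bins,).
    mic_xy:   microphone positions relative to the array, shape (n_mics, 2).
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    in_band = (freqs_hz >= band[0]) & (freqs_hz <= band[1])
    X = stft[:, in_band, :]
    X = X / np.maximum(np.abs(X), 1e-12)            # PHAT: keep the phase only
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    delays = -(directions @ mic_xy.T) / SPEED_OF_SOUND   # (n_angles, n_mics)
    phase = -2j * np.pi * freqs_hz[in_band][None, :, None] * delays[:, None, :]
    steer = np.exp(phase)                           # (n_angles, n_bins, n_mics)
    # Align the channels for each candidate angle and sum over microphones.
    y = np.einsum("abq,fbq->fab", steer.conj(), X)
    powers = np.mean(np.abs(y) ** 2, axis=(0, 2))   # average frames and bins
    return angles, powers
```

The returned pair corresponds to the angles q_n(k) and powers J_n(k) of one scan; the angle with the largest power is the estimated direction of the dominant source.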

(Construction of the geometric map)
The following describes the construction of the geometric map, which is a prerequisite for determining the position of a sound source in space from the direction estimates described above.

That is, a map prepared in advance as a design drawing of the space in which the robot 1000 moves, giving for example the positions of walls (a design map), is taken as given and is assumed to be held by the robot 1000 in the nonvolatile storage device 1100.

However, the space in which the robot actually moves contains fixed or semi-fixed objects (for example, the television in FIG. 1), and a geometric map that includes such objects must be constructed as a prerequisite for determining sound source positions.

Mapping techniques based on simultaneous localization and mapping (SLAM) for creating such geometric maps have been extensively studied, are mature, and have been applied in practice.

この実施の形態では、グリッド地図の生成のために、その技術を使用する。 In this embodiment, the technique is used for generating a grid map.

For example, to create such a geometric map in advance, a human operator can drive the robot with a joystick and collect odometry and laser sensor data in the environment to be measured.

その後、ＩＣＰ（iterative closest point）ベースのＳＬＡＭを使用し、３ＤToolkitライブラリ・フレームワークを用いて、ロボットの軌道を修正し、かつレーザレンジファインダのスキャンを、地図に対して整合させる。 Then, using the iterative closest point (ICP) based SLAM, the robot trajectory is corrected using the 3D Toolkit library framework, and the laser range finder scan is aligned with the map.

その結果得られる整合されたスキャンで、占有グリッド地図が作成される。 With the resulting aligned scan, an occupancy grid map is created.
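As a rough illustration of how aligned scans become a grid map, the sketch below marks every cell that contains a beam endpoint. It is a deliberate simplification: it counts hits instead of applying the probabilistic occupancy update of the occupancy-grid literature, and it does not trace free space along each beam. The data layout and function name are assumptions.

```python
import math


def build_occupancy_grid(scans, resolution, width, height, min_hits=1):
    """Mark grid cells hit by laser beam endpoints.

    scans: iterable of (pose, beams); pose = (x, y, heading) in the map frame
           after alignment, beams = list of (bearing, distance) pairs.
    Returns a height x width list of booleans (True = occupied).
    """
    hits = [[0] * width for _ in range(height)]
    for (x, y, heading), beams in scans:
        for bearing, distance in beams:
            end_x = x + distance * math.cos(heading + bearing)
            end_y = y + distance * math.sin(heading + bearing)
            col = math.floor(end_x / resolution)
            row = math.floor(end_y / resolution)
            if 0 <= row < height and 0 <= col < width:
                hits[row][col] += 1
    return [[count >= min_hits for count in row] for row in hits]
```

Raising `min_hits` is one simple way to suppress cells that were hit only once, for instance by a person walking past during data collection.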

Such occupancy grid maps are disclosed, for example, in the following document.

Reference 5: A. Elfes, "Using occupancy grids for mobile robot perception and navigation," Computer, vol. 22, no. 6, pp. 46-57, June 1989.
Accordingly, in this embodiment, "aligning the laser range finder scans with the map" means determining which direction and position in the design drawing correspond to the scan direction and the measurement position of the laser range finder. An "occupied grid" is a grid cell of the geometric map occupied by a fixed object, meaning either a permanently fixed object such as a wall or a semi-fixed object such as a television that can be regarded as stationary over the time interval being measured.

後に説明するように、本実施の形態では、音響源の方向と「占有グリッド」に関する情報とを組み合わせることで、空間内の音響源の位置を特定する。 As will be described later, in the present embodiment, the position of the acoustic source in the space is specified by combining the direction of the acoustic source and information related to the “occupied grid”.

(Approach to determining the robot position)
The processing performed by the robot position specifying processing unit 1040 is described in detail below.

図５は、実験として、ロボット１０００と音響源Ｓ１およびＳ２とを配置した空間の例を示す図である。 FIG. 5 is a diagram illustrating an example of a space in which the robot 1000 and the sound sources S1 and S2 are arranged as an experiment.

In FIG. 5, the robot 1000 is configured, for the experiment, with only the moving mechanism and the components for sound source localization.

図６は、このようなロボット１０００の外観を示す図である。 FIG. 6 is a view showing the appearance of such a robot 1000.

A front laser range finder (LRF) 20 and a rear laser range finder (LRF) 30 are provided on the front and rear of the robot 1000, respectively. The odometry sensor 10 is attached to the wheels that form the moving mechanism.

図７は、レーザレンジファインダによる測距処理の概念を説明するための図である。 FIG. 7 is a diagram for explaining the concept of distance measurement processing by the laser range finder.

レーザレンジファインダは、ロボット１０００から角度ｂでレーザ光を照射して、その角度ｂについて対象物までの距離ｚを測定する装置である。角度ｂを変化させてスキャンすることで、一定の角度範囲内の対象物までの距離を取得できる。 The laser range finder is a device that irradiates laser light from the robot 1000 at an angle b and measures the distance z to the target object at the angle b. By scanning while changing the angle b, the distance to the object within a certain angle range can be acquired.

As shown in FIG. 7, the front LRF 20 and the rear LRF 30 together can acquire, with one scan each, the distances to objects over the full 360 degrees around the robot.

図８は、ロボット１０００の位置の特定の処理の概念を説明するための図である。 FIG. 8 is a diagram for explaining the concept of the process for specifying the position of the robot 1000.

As shown in FIG. 8(a), when only the odometry sensor 10 is used to determine the position of the robot 1000, the position estimated inside the robot drifts: even if the robot 1000 actually travels so that its start and end points coincide, the estimated start and end points do not.

In contrast, as shown in FIG. 8(b), when the odometry sensor 10 is combined with the range data of the front LRF 20 and the rear LRF 30, the start and end points coincide in the internally estimated position as well.

To combine the odometry sensor 10 with the range data of the front LRF 20 and the rear LRF 30 in this way, this embodiment estimates the robot position with a particle filter approach using M weighted particles, as described below.

Each of the M particles is associated with a state vector s^m[t] = {x_m(k), y_m(k), q_m(k), w_m(k)} whose attributes are a candidate position {x_m(k), y_m(k)} and a candidate heading {q_m(k)} of the robot 1000 together with a particle weight {w_m(k)}.

While the robot 1000 moves, each particle is also moved according to the odometry and a probabilistic motion model, which describes the uncertainty in the robot's motion.

さらに、パーティクルに対して重みを算出する過程で、パーティクルフィルタは、レーザレンジファインダによる測定に基づく尤度を考慮して、事後確率を推定する。 Further, in the process of calculating the weight for the particle, the particle filter estimates the posterior probability in consideration of the likelihood based on the measurement by the laser range finder.

図９は、パーティクルフィルタを用いたロボット１０００の位置の特定処理を説明するためのフローチャートである。 FIG. 9 is a flowchart for explaining the process of specifying the position of the robot 1000 using the particle filter.

図９を参照して、処理が開始されると、まず、ロボット位置特定処理部１０４０は、各パーティクルの属性の初期化を行う（Ｓ１００）。 Referring to FIG. 9, when the process is started, first, the robot position specification processing unit 1040 initializes the attribute of each particle (S100).

Next, the robot position specifying processing unit 1040 increments the variable m from 1 up to its maximum value m_max and, for each particle, computes the state vector s^m[t] from the odometry result, the motion model, and the state vector s^m[t-1] of the previous time step (loop of S102, S104, S106, and S108).

図１０は、パーティクルの状態ベクトルｓ^m［ｔ］の算出の概念を説明するための図である。 FIG. 10 is a diagram for explaining the concept of calculating the particle state vector s ^m [t].

As shown in FIG. 10, the odometry sensor 10 measures the speed v_r of the right wheel and the speed v_l of the left wheel, from which the robot position specifying processing unit 1040 can calculate the velocity V and the angular velocity R of the robot 1000. The state vector s^m[t] of a particle after a time Δt is then calculated from the state vector s^m[t-1] of the previous time step. Note that the particles differ not only in position but also in heading.
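A minimal sketch of this prediction step, assuming a differential-drive robot, is given below. The wheel separation `wheel_base`, the Gaussian noise parameters, and all names are assumptions introduced for illustration; the text specifies only that V and the angular velocity R are derived from the two wheel speeds and that the motion model is probabilistic.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Particle:
    x: float        # candidate position
    y: float
    heading: float  # candidate heading, radians
    weight: float


def body_velocities(v_r, v_l, wheel_base):
    """Linear velocity V and angular velocity R from the two wheel speeds."""
    return 0.5 * (v_r + v_l), (v_r - v_l) / wheel_base


def propagate(particle, v_r, v_l, dt, wheel_base, rng, sigma_v=0.0, sigma_r=0.0):
    """Move one particle by dt, perturbing V and R with Gaussian noise."""
    v, r = body_velocities(v_r, v_l, wheel_base)
    v += rng.gauss(0.0, sigma_v)
    r += rng.gauss(0.0, sigma_r)
    return Particle(
        x=particle.x + v * dt * math.cos(particle.heading),
        y=particle.y + v * dt * math.sin(particle.heading),
        heading=(particle.heading + r * dt) % (2.0 * math.pi),
        weight=particle.weight,
    )
```

Calling `propagate` once per particle with a shared `random.Random` instance spreads the particle set according to the assumed motion noise, which is what lets the later range-based weighting select the particles that best match the map.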

Returning to FIG. 9, the robot 1000, under the control of the processor 1010, then initializes the variable m for counting particles to 1 (S108), initializes the measurement angle and the variable k (S110), and measures the distance to the object with the laser range finders 20 and 30 at every predetermined angle Δθ starting from the angle θ₀ (S112).

Subsequently, for each particle, the robot position specification processing unit 1040 calculates the likelihood of the particle at time t from the distance to the object in the particle's orientation (S114).

図１１は、このような対象物までの距離に基づく各パーティクルについての尤度を示す図である。 FIG. 11 is a diagram showing the likelihood of each particle based on the distance to such an object.

A likelihood of this kind, based on laser range finder distance measurements, is described in the following reference, although not specifically for particles.

Reference 6: S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics (Intelligent Robotics and Autonomous Agents), The MIT Press, 2005.

As shown in FIG. 11(a), when the laser range finder measures the distance to the object as z_t^*, this likelihood (posterior probability distribution) is a superposition of the following probability distributions.

i) A Gaussian distribution centered on the measured distance z_t^*, accounting for measurement noise
ii) The probability that an unexpected object is detected as the robot moves, a distribution that decreases exponentially with distance
iii) The probability that the ranging data takes its maximum value when no object exists within the maximum measurable range
iv) A probability distribution corresponding to white noise due to random factors

For simplicity, however, the following description assumes that the likelihood is expressed as shown in FIG. 11(b).
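The four components i) to iv) can be combined as a weighted mixture. The following is a rough sketch in the spirit of the beam model in Reference 6, not the embodiment's implementation: the mixture weights `w_hit`, `w_short`, `w_max`, `w_rand`, the noise parameter `sigma`, and the decay rate `lam` are all assumed values, and the normalizing constants of the truncated distributions are omitted for brevity.

```python
import math


def beam_likelihood(z, z_expected, z_max, sigma=0.1, lam=1.0,
                    w_hit=0.7, w_short=0.1, w_max=0.1, w_rand=0.1):
    """Unnormalized mixture likelihood of a range reading z (all parameters assumed)."""
    # i) Gaussian around the expected distance (measurement noise).
    p_hit = math.exp(-0.5 * ((z - z_expected) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    # ii) Unexpected objects in front of the expected one: exponential decay.
    p_short = lam * math.exp(-lam * z) if 0.0 <= z <= z_expected else 0.0
    # iii) No object within range: the reading saturates at the maximum value.
    p_max = 1.0 if z >= z_max else 0.0
    # iv) Random noise: uniform over the measurable range.
    p_rand = 1.0 / z_max if 0.0 <= z < z_max else 0.0
    return w_hit * p_hit + w_short * p_short + w_max * p_max + w_rand * p_rand
```

A reading close to the expected distance scores much higher than one far from it, which is the property the particle weighting relies on.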

図１２は、レーザレンジファインダの測定結果（測距データ）に基づいて、パーティクルの尤度を算出する手続きを説明するための概念図である。 FIG. 12 is a conceptual diagram for explaining a procedure for calculating the likelihood of particles based on the measurement result (ranging data) of the laser range finder.

As shown in FIG. 12, consider the case where the laser range finder, from the actual position of the robot 1000, measures the distance to the object at the angle θk as z*.

At this time, suppose that for particle 1, a ray emitted from its position and orientation gives a distance Z1* to the object at the angle θk in the geometric map, and that for particle 2, a ray emitted from its position and orientation gives a distance Z2* to the object at the angle θk in the geometric map.

Here, just as emitting a beam at a predetermined angle in an actual laser scan is called "ray casting", the process of virtually emitting a ray from such a particle toward the object is also called "ray casting".

At this time, based on the probability distribution shown in FIG. 11, the robot position specification processing unit 1040 calculates the likelihood of particle 1 as L1 and the likelihood of particle 2 as L2, as shown in FIG. 12 (S114).

If the angle θk has not reached its maximum value (S116), the robot position specification processing unit 1040 updates the angle θk by Δθ (S118). Then, from the laser range finder measurement of the distance to the object at the angle θk and the distance to the object at the angle θk calculated for each particle, it calculates the likelihood using the probability distribution shown in FIG. 11 and accumulates it (S114).

図１３は、レーザレンジファインダによる測距と、あるパーティクルからのレイキャストの手続きを示す概念図である。 FIG. 13 is a conceptual diagram showing a procedure of distance measurement by a laser range finder and ray casting from a certain particle.

As shown in FIG. 13, while scanning over the angle, the likelihood of each particle at each angle is calculated from the distance measured from the robot 1000 by the laser range finder and the distance to the object calculated on the geometric map by ray casting from that particle, and the likelihoods obtained over the angles are accumulated.

Returning again to FIG. 9, by repeating this processing until the angle θk reaches its maximum value, the robot position specification processing unit 1040 determines the weight of each particle in proportion to its likelihood (S120).
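The weighting loop over scan angles can be sketched as follows. This is an illustration under several assumptions: the geometric map is a list-of-lists occupancy grid with 1 marking occupied cells, the ray is advanced in fixed steps (`step`), the per-angle likelihood is a plain Gaussian standing in for the distribution of FIG. 11(b), and the per-angle likelihoods are accumulated as a sum. The function names and parameters are hypothetical.

```python
import math


def raycast_distance(grid, x, y, angle, max_range, cell=0.05, step=0.01):
    """Distance from (x, y) along `angle` to the first occupied cell.

    grid[i][j] == 1 marks an occupied cell; returns max_range if nothing is hit.
    """
    for n in range(int(max_range / step) + 1):
        d = n * step
        i = math.floor((x + d * math.cos(angle)) / cell)
        j = math.floor((y + d * math.sin(angle)) / cell)
        if not (0 <= i < len(grid) and 0 <= j < len(grid[0])):
            break  # the ray left the map
        if grid[i][j] == 1:
            return d
    return max_range


def particle_weights(grid, particles, scan, max_range, sigma=0.1):
    """particles: [(x, y, theta)]; scan: [(relative_angle, measured_distance)]."""
    scores = []
    for x, y, theta in particles:
        total = 0.0
        for rel_angle, z in scan:
            z_expected = raycast_distance(grid, x, y, theta + rel_angle, max_range)
            # Simplified Gaussian likelihood (assumption), accumulated over angles.
            total += math.exp(-0.5 * ((z - z_expected) / sigma) ** 2)
        scores.append(total)
    norm = sum(scores)
    if norm == 0.0:
        return [1.0 / len(scores)] * len(scores)
    return [s / norm for s in scores]  # weights proportional to likelihood
```

A particle whose ray-cast distances agree with the measured scan receives most of the weight, while a displaced particle receives very little.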

以上のようなステップＳ１１０からステップＳ１２０までの処理を、変数ｍの値をインクリメントしながら、パーティクルの最大個数ｍmaxとなるまで繰り返す（Ｓ１１０〜Ｓ１２４までのループ）。 The processes from step S110 to step S120 as described above are repeated until the maximum number mmax of particles is reached while incrementing the value of the variable m (loop from S110 to S124).

さらに、ロボット位置特定処理部１０４０は、パーティクルの表す現在の状態とその重みとから、ロボット位置の確率密度分布を推定する（Ｓ１２６）。さらに、ロボット位置特定処理部１０４０は、推定された確率密度分布を用いて、ロボットの現在の状態（位置および向き）を推定する（Ｓ１２８）。 Further, the robot position specification processing unit 1040 estimates the probability density distribution of the robot position from the current state represented by the particle and its weight (S126). Further, the robot position specification processing unit 1040 estimates the current state (position and orientation) of the robot using the estimated probability density distribution (S128).
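One simple way to turn the weighted particles into a single state estimate is a weighted average, as the claims also describe. The sketch below is an assumption-laden illustration: it averages the heading through its sine and cosine (a circular mean) so that angles near ±π do not cancel, a detail the description does not specify.

```python
import math


def estimate_pose(particles, weights):
    """Weighted mean pose of particles given as (x, y, theta) tuples."""
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(particles, weights)) / total
    y = sum(w * p[1] for p, w in zip(particles, weights)) / total
    # Circular mean of the heading (assumed; avoids wrap-around artifacts).
    s = sum(w * math.sin(p[2]) for p, w in zip(particles, weights))
    c = sum(w * math.cos(p[2]) for p, w in zip(particles, weights))
    return x, y, math.atan2(s, c)
```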

Then, the robot position specification processing unit 1040 performs resampling of the particle filter using the estimated probability density distribution (S130). That is, for example, particles are sampled with replacement with a probability proportional to the probability density distribution. In other words, particles with high probability are drawn more often and particles with low probability are eliminated, yielding the particle set at time t.
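Sampling with replacement in proportion to the weights can be written in a few lines. This sketch uses plain multinomial resampling via the standard library, which is one common choice and only an assumption about the embodiment; resetting the weights to uniform after resampling is likewise an assumed convention.

```python
import random


def resample(particles, weights, rng=random):
    """Draw len(particles) particles with replacement, proportional to weight."""
    n = len(particles)
    new_particles = rng.choices(particles, weights=weights, k=n)
    # After resampling, every surviving particle carries the same weight (assumed).
    return new_particles, [1.0 / n] * n
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible.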

その後、処理は、ステップＳ１０２に復帰する。 Thereafter, the process returns to step S102.

As a result of the above processing, after ray casting, particles that more accurately reflect the state of the robot have a higher likelihood score and are assigned more weight.

The geometric map creation unit 1030 may be configured to update the geometric map based on the state of the particle dispersion and on laser scan matching.

(Accumulation of likelihood for occupied cells and acoustic source mapping)

The following describes the process of specifying the position of a sound source in space, in conjunction with the estimation of the sound source direction and the localization of the robot 1000 by the particle filter shown in FIG. 9.

During autonomous movement, the k-th audio scan {qn(k), Jn(k)} is expressed in robot coordinates. In the processing of this embodiment, the audio scan is therefore combined with information on the robot's pose in the global coordinate system in order to evaluate the position of the sound source in the geometric map, which is in global coordinates.

That is, rays are cast from the robot pose (position and orientation) represented by each particle in the directions identified as sound source directions, and the occupied cells on the geometric map hit in those directions are found. In other words, this amounts to assuming that an acoustic source is represented (visible) as an occupied cell in the geometric map.

図１４は、このようなパーティクルからの音源の位置を特定するためのレイキャストの概念を説明するための図である。 FIG. 14 is a diagram for explaining the concept of ray casting for specifying the position of a sound source from such particles.

図９で説明したように、ロボット姿勢に関する知識は、パーティクルの分布に含まれている。 As described with reference to FIG. 9, the knowledge about the robot posture is included in the particle distribution.

When the k-th audio scan is performed, as shown in FIG. 14, a ray is cast from each of the M (= mmax) particles in order to propagate the knowledge of the audio direction qn(k).

When such a ray hits the occupied cell {i, j}, the sound source position estimation unit 1070 increases the likelihood cij(k) for that occupied cell by the particle weight wm.

したがって、ある占有セルが音源位置である尤度が、このようなレイキャストにより算出される。 Therefore, the likelihood that a certain occupied cell is a sound source position is calculated by such ray casting.

図１５は、占有セルについての尤度の累積および音響源マッピングの処理を説明するためのフローチャートである。 FIG. 15 is a flowchart for explaining processing of accumulating likelihoods and acoustic source mapping for occupied cells.

FIG. 16 shows a map obtained for one of the test environments. In the following, FIG. 16 is used as a concrete example for explaining FIG. 15. FIG. 16(a) shows a geometric map of a corridor with the acoustic sources and the robot trajectory, and FIG. 16(b) is a partially enlarged view showing the likelihood Cij(k) for the occupied cells obtained after ray casting (black X marks represent occupied cells).

Referring to FIG. 15, after initializing the variable k (S200), the sound source position estimation unit 1070 acquires from the sound source direction estimation unit 1060 the direction of arrival qn(k) and the power Jn(k) of the sound in the robot coordinate system, obtained as the result of the k-th audio scan (S202).

上述した占有セルについての尤度Ｃij（ｋ）は、「有益な」スキャンに対してのみ、有意な値を有するように生成される。 The likelihood Cij (k) for the above-mentioned occupied cell is generated to have a significant value only for “useful” scans.

Therefore, as shown in S204, likelihood accumulation is performed for scans that have a prominent peak and whose variation in power is larger than a predetermined value. Conversely, scans whose power is about the same in all directions are discarded, and no accumulation is performed for them.

そこで、Ｓ２０４では、ｋ番目のオーディオスキャンは、以下の場合に選択される。 Therefore, in S204, the kth audio scan is selected in the following case.

ここで、εmaxとεrangeは、所定のしきい値である。 Here, εmax and εrange are predetermined threshold values.

When the audio scan satisfies the above condition (Y in S204), the sound source position estimation unit 1070 initializes the variable n (S206), then initializes the variables m and pmax (S208), and then, for the m-th particle, casts a ray from the position [xm(k), ym(k)] in the direction [qm(k) + qn(k)] (S210).

The sound source position estimation unit 1070 terminates the ray (S212) when it hits an occupied cell in the geometric map, when it reaches an unexplored region of the map (the gray region in FIG. 4(a)), or when it reaches the maximum range d.

セル｛ｉ，ｊ｝については、ｋ番目のスキャンに対するヒット率は、Ｃij（ｋ）と表すこととする。 For the cell {i, j}, the hit rate for the k-th scan is represented as Cij (k).

After the hit counts are initially set to zero, when the cell {i, j} is hit by the ray cast from the m-th particle for the k-th scan, the sound source position estimation unit 1070 updates Cij(k) as follows.

Cij(k) ← Cij(k) + wm

Here, wm is the weight of the m-th particle.

The hit cell is added to the list of hit cells, and the list size pmax is incremented (S214).

音源位置推定部１０７０は、このような処理をｍmax個のパーティクルについて繰り返す（Ｓ２１０からＳ２１８までのループ）。 The sound source position estimation unit 1070 repeats such processing for mmax particles (loop from S210 to S218).

In the sound source position estimation unit 1070, after a ray has been cast from each of the mmax particles, cij(k) represents the probability that the cell {i, j} was hit.
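Steps S210 to S218 can be illustrated with the following sketch. It assumes a list-of-lists grid with hypothetical cell states `FREE`, `OCCUPIED`, and `UNKNOWN`, a fixed-step ray march, and particle weights that sum to 1; only the three termination conditions and the weight accumulation on a hit are taken from the description.

```python
import math
from collections import defaultdict

FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # hypothetical cell states


def audio_raycast(grid, x, y, angle, d_max, cell=0.05, step=0.01):
    """Return the occupied cell (i, j) hit by the ray, or None if it ends without a hit."""
    for n in range(int(d_max / step) + 1):
        d = n * step
        i = math.floor((x + d * math.cos(angle)) / cell)
        j = math.floor((y + d * math.sin(angle)) / cell)
        if not (0 <= i < len(grid) and 0 <= j < len(grid[0])):
            return None          # left the map
        if grid[i][j] == OCCUPIED:
            return (i, j)        # hit an occupied cell
        if grid[i][j] == UNKNOWN:
            return None          # reached an unexplored region
    return None                  # reached the maximum range d


def accumulate_hits(grid, particles, weights, q_n, d_max):
    """Cast one ray per particle in direction q_m + q_n and add w_m to each hit cell."""
    c = defaultdict(float)
    for (x, y, q_m), w in zip(particles, weights):
        hit = audio_raycast(grid, x, y, q_m + q_n, d_max)
        if hit is not None:
            c[hit] += w
    return dict(c)
```

With normalized weights, the value stored for a cell is the fraction of the particle weight whose rays ended on that cell, which matches the reading of cij(k) as a hit probability.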

したがって、オーディオ方向ｑn(k)は、いくつかのセルと関連付けられ、このような各セルに対する確率が評価されることになる。 Thus, the audio direction qn (k) is associated with several cells and the probability for each such cell will be evaluated.

図１６（ｂ）は、スキャンによってヒットしたセルと、そのヒット確率とを示す図である。 FIG. 16B shows a cell hit by scanning and its hit probability.

図１５に戻って、レイキャストの間、セル｛ｉ，ｊ｝には、オーディオスキャンパワーの関数である尤度が、以下のようにして割り当てられる。 Returning to FIG. 15, during raycast, cells {i, j} are assigned a likelihood that is a function of audio scan power as follows.

音源位置推定部１０７０において、変数ｐが初期化される（Ｓ２２０）。 In the sound source position estimation unit 1070, the variable p is initialized (S220).

そして、選択されたスキャンについては、尤度関数は、以下のように与えられる。 And for the selected scan, the likelihood function is given as:

ここで、αは、尤度のピークの鋭さを制御するための値であり、以下のとおりの関係がある。 Here, α is a value for controlling the sharpness of the likelihood peak, and has the following relationship.

パラメーターεnormは、低いパワーのスキャンに対する尤度の範囲をコントロールするものである。 The parameter εnorm controls the range of likelihood for low power scans.

尤度は、以下の範囲にスケールされる。 The likelihood is scaled to the following range:

セル｛ｉ，ｊ｝には、ｃij（ｋ）<εprobaであるとき（Ｓ２２４でＹ）には、対数尤度が、以下のように割り当てられる（Ｓ２２８）。 When cij (k) <εproba (Y in S224), log likelihood is assigned to the cell {i, j} as follows (S228).

それ以外の場合（Ｓ２２４でＮ）は、対数尤度は、以下のようになる（Ｓ２２６）。 In other cases (N in S224), the log likelihood is as follows (S226).

なお、対数尤度の初期値は、以下のようになる： Note that the initial value of the log likelihood is as follows:

累積中においては、対数尤度は、［−ＬＲmax，ＬＲmax］の範囲になるようにしきい値がかけられる。 During accumulation, the logarithmic likelihood is thresholded so as to be in the range of [−LRmax, LRmax].

音源位置推定部１０７０は、このような処理を、ヒットセルのリスト中のすべてのセルについて繰り返す（Ｓ２２２からＳ２３２までのループ）。 The sound source position estimation unit 1070 repeats such processing for all the cells in the hit cell list (loop from S222 to S232).

Subsequently, if the variable n (which indicates the index of the sound source) is not greater than nmax−1 (N in S234), the sound source position estimation unit 1070 increments n (S238) and repeats the processing from step S208 to S234.

If the variable n has reached the maximum number nmax of sound sources (Y in S234) and the variable k indicating the audio scan is not greater than its maximum value kmax−1 (N in S236), the sound source position estimation unit 1070 increments k (S205) and repeats the processing from step S202 to S234.

Further, when the processing for all audio scans is complete (Y in S236), the sound source position estimation unit 1070 finds the cells with higher log likelihood, groups these cells into clusters (S240), and performs sound source localization in the geometric map by calculating the weighted average position within each cluster, where the weights are the log likelihoods Lij(k) of the selected cells (S242).
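Steps S240 and S242 can be sketched as below. The description does not state how the clusters are formed, so this example assumes that selected cells touching each other (8-connectivity) belong to the same cluster and that cells are selected by a positive log likelihood; the function name, the `threshold` parameter, and the use of cell centers as positions are all hypothetical.

```python
def localize_sources(log_lik, cell=0.05, threshold=0.0):
    """log_lik: {(i, j): L_ij}. Returns [(x, y, sum_of_L)] per cluster, strongest first."""
    selected = {c for c, value in log_lik.items() if value > threshold}
    sources = []
    while selected:
        seed = selected.pop()
        cluster, stack = [seed], [seed]
        while stack:  # flood fill over 8-connected neighbors (assumed clustering rule)
            i, j = stack.pop()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    nb = (i + di, j + dj)
                    if nb in selected:
                        selected.remove(nb)
                        cluster.append(nb)
                        stack.append(nb)
        total = sum(log_lik[c] for c in cluster)
        # Weighted average of the cell centers, weighted by log likelihood.
        x = sum(log_lik[c] * (c[0] + 0.5) * cell for c in cluster) / total
        y = sum(log_lik[c] * (c[1] + 0.5) * cell for c in cluster) / total
        sources.append((x, y, total))
    return sorted(sources, key=lambda s: -s[2])
```

Returning the summed log likelihood alongside each position makes it possible to rank clusters and to reject weak ones as likely false detections.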

Note that by performing this acoustic source position estimation and acoustic source map update periodically as the robot 1000 moves, or irregularly when a predetermined event occurs, the acoustic source map can always be kept up to date.

Further, the sound source position estimation unit 1070 may count, as Kij(k), the number of audio scans in which the cell {i, j} is within line of sight. In that case, this count is used to exclude occupied cells that are rarely within line of sight.

When there is no prior information on the sound source positions, p_{ij,init} = 0.5 is used for all occupied cells {i, j} (flat initialization).

(Experimental results)

The following describes experimental results of acoustic source mapping using the robot 1000 described above.

The experiment was performed in the corridor whose geometric map is shown in FIG. 16. In FIG. 16, the size of each cell in the map is 5 cm × 5 cm.

Because it holds the geometric map data of this environment, the robot 1000 can move autonomously in the corridor.

At the start of the experiment, the robot 1000 is given information for creating the geometric map for each of a set of positions that can be toured in a loop so as to cover all parts of the corridor.

In the following, "one trial" corresponds to the robot travelling one loop of the corridor.

音響源マップが、以下に説明するように、これらの試行の間に生成された。 An acoustic source map was generated during these trials, as described below.

Several (up to four) acoustic sources S1 to S4 were placed in the environment, at locations in the scanning plane of the laser range finder.

These acoustic sources are loudspeakers playing back recorded sounds. The sounds from the sources placed in the environment were, respectively, the sound of air conditioning equipment S1 (a sound pressure of 78.5 dBA measured at a distance of 5 cm), the sound of a desktop computer fan S2 (77.5 dBA), the sound of a server rack S3 (77 dBA), and a popular song S4 (73 dBA).

静かな廊下の音圧は約４２ｄＢＡであった。 The sound pressure in the quiet corridor was about 42 dBA.

図１７は、音響源位置の特定に使用したパラメータを表す図である。 FIG. 17 is a diagram illustrating parameters used for specifying the acoustic source position.

FIG. 17 shows the set values of the parameters for audio scan selection (εmax and εrange), the shape of the likelihood function in an audio scan (εnorm, α), likelihood accumulation (Lmax, Lmin, εproba, LRmax), and the maximum distance d for ray casting. The maximum distance d used for ray casting was set to 3 meters, because beyond 3 meters the sound from an acoustic source falls below the noise level of the space. For ray casting, a longer distance makes the results more scattered and also increases the computational load.

In the experimental trials, the activation pattern of the acoustic sources was varied.

図２１は、各試行のための音響源の活性化のパターンを示すテーブルである。 FIG. 21 is a table showing the activation pattern of the acoustic source for each trial.

In four of the trials performed, people were walking and talking in the environment.

During autonomous movement, the log likelihoods of the occupied cells were updated whenever an audio scan was available.

The latest raw acoustic source map is generated and becomes available after each audio scan.

FIGS. 18 to 20 show the raw acoustic source maps obtained with the flat initialization.

That is, FIGS. 18(a) and (b), 19(c) and (d), and 20(e) and (f) each show the raw acoustic source map obtained at the end of several trials with the flat initialization, under the following conditions.

(o) marks the ground truth and (x) marks a detected source. FIG. 18(a) shows trial 1 with no active source, and FIG. 18(b) shows trial 2 with sources S1, S2, and S3 active.

FIG. 19(c) shows trial 3 with sources S1, S2, and S3 active, and FIG. 19(d) shows trial 4 with sources S1, S2, and S3 active while people walked around and talked.

図２０（ｅ）は、ソースS1およびS3が活性な試行５を、図２０（ｆ）は、ソースS1、S2、S3およびS4を活性とした試行６を示す。 FIG. 20 (e) shows trial 5 in which the sources S1 and S3 are active, and FIG. 20 (f) shows trial 6 in which the sources S1, S2, S3 and S4 are active.

これらの生の音響源マップについては、正の尤度を有するセルが、クラスター化される。 For these raw acoustic source maps, cells with positive likelihood are clustered.

クラスター重心の場所が黒い×として示される。音響源の実際の位置は黒丸として与えられる。 The location of the cluster centroid is shown as a black x. The actual position of the acoustic source is given as a black circle.

For each of the clusters, the number of cells and the sum of the log likelihoods are shown in FIG. 21.

In FIG. 21, S* indicates a source erroneously detected where no source exists.

FIG. 21 shows that the approach of this embodiment estimated the positions of the acoustic sources well.

With the flat initialization, the average localization error was 12.88 cm.

Considering that the cell size is 5 cm × 5 cm and that a loudspeaker is not a point source and may span several cells, an error within a few cells indicates that the localization was accurate.

The highest log likelihood sum obtained for an erroneously detected sound source was 687 (FIG. 20(f)), which is small compared with the lowest log likelihood sum obtained for an active sound source, 2116 (FIG. 19(d)).

Moreover, this lowest value corresponds to trial 4, in which two people were moving around the environment while talking during mapping.

なお、以上の説明では、移動体であるロボット１０００は、自律的に移動する機構を有しており、ｉ）自身の位置を特定する処理、ｉｉ）音響源の方向を特定する処理、ｉｉｉ）音響源の位置を特定する処理、ｉｖ）幾何学マップを保持しまたは更新する処理、ｖ）音響源マップを保持しまたは更新する処理のすべてをロボット１０００の単体の中で行うものとして説明した。 In the above description, the robot 1000 that is a moving body has a mechanism that moves autonomously, i) a process that specifies its own position, ii) a process that specifies the direction of the acoustic source, and iii) It has been described that the process of specifying the position of the acoustic source, iv) the process of holding or updating the geometric map, and v) the process of holding or updating the acoustic source map are all performed in the robot 1000 alone.

しかしながら、音響源マップ作成システムとして考えた場合、ｉ）からｖ）の処理のうちの一部または全部は、ロボット１０００と無線通信などにより通信可能な別のコンピュータの処理として実行してもよい。 However, when considered as an acoustic source map creation system, part or all of the processes from i) to v) may be executed as a process of another computer that can communicate with the robot 1000 by wireless communication or the like.

また、ｉ）自身の位置を特定する処理において、パーティクルフィルタにおける各パーティクルの運動は、路程測定センサの測定値により算出するものとして説明したが、このようなパーティクルの運動は、ロボット１０００の外部の環境内に配置されたセンサ群により、ロボットの位置を計測した測定値により算出する構成としてもよい。 In addition, i) in the process of specifying its own position, the movement of each particle in the particle filter has been described as being calculated by the measured value of the path length measurement sensor. A configuration may be used in which a sensor group disposed in the environment is used to calculate a measured value of the position of the robot.

また、ロボットが外部の対象物までの距離を測定する装置としてレーザレンジファインダを用いるものとして説明したが、対象物までの距離を測定でき、その測定結果が所定の確率分布としてモデル化できる測距センサであれば、他のセンサを用いてもよい。 In addition, the robot has been described as using a laser range finder as a device for measuring the distance to an external object, but the distance measurement that can measure the distance to the object and model the measurement result as a predetermined probability distribution If it is a sensor, you may use another sensor.

以上説明したように、本実施の形態の移動体およびこれを用いた音響源マップの作成方法によれば、音源の位置を特定できる機能を有し、現実の音声コミュニケーションに利用可能な音響源に対する事前情報を収集することが可能である。 As described above, according to the moving body of the present embodiment and the method of creating the acoustic source map using the moving body, the acoustic source that has the function of specifying the position of the sound source and can be used for actual voice communication is used. Prior information can be collected.

特に、幾何学マップとオーディオスキャンの組合せることにより、計算の負荷を大きくすることなく、より正確に音響源マップの作成を実行することができる。これは、このアプローチでは、幾何学マップの占有セルだけが、レイキャストにおいて考慮されるからである。この占有セルに対するレイキャストにおいて、ロボット姿勢に関する知識（パーティクルの属性に反映される）は、占有セルの尤度の更新に利用される。 In particular, by combining a geometric map and an audio scan, it is possible to create an acoustic source map more accurately without increasing the calculation load. This is because with this approach, only the occupied cells of the geometric map are considered in the raycast. In ray casting for this occupied cell, the knowledge about the robot posture (reflected in the particle attributes) is used to update the likelihood of the occupied cell.

したがって、たとえば、このような音響源マップは、ロボットが環境騒音に関するよりよい知識を獲得するのをアシストすることになる。 Thus, for example, such an acoustic source map will assist the robot in acquiring better knowledge about environmental noise.

今回開示された実施の形態はすべての点で例示であって制限的なものではないと考えられるべきである。本発明の範囲は上記した説明ではなくて特許請求の範囲によって示され、特許請求の範囲と均等の意味および範囲内でのすべての変更が含まれることが意図される。 The embodiment disclosed this time should be considered as illustrative in all points and not restrictive. The scope of the present invention is defined by the terms of the claims, rather than the description above, and is intended to include any modifications within the scope and meaning equivalent to the terms of the claims.

10 path length measurement sensor, 20 front laser range finder, 30 rear laser range finder, 40 sensor input/output board, 50 memory, 60 bus, 61 eigenvector calculation unit, 62 MUSIC processing unit, 86 correlation matrix calculation unit, 88 eigenvalue decomposition unit, 104 MUSIC spatial spectrum calculation unit, 106 MUSIC response calculation unit, 110 sound source direction estimation processing unit, MC1 microphone array, 1000 robot, 1010 processor, 1030 geometric map creation unit, 1040 robot position specification processing unit, 1050 sound source power spectrum acquisition unit, 1052 microphone array, 1054 audio input/output board, 1060 sound source direction estimation unit, 1070 sound source position estimation unit, 1100 nonvolatile storage device, 1102 geometric map, 1104 acoustic source map.

Claims

A moving object,
Moving means for driving the moving body;
Storage means for storing a geometric map for specifying a geometric position of an object in a movable space of the moving body;
Position measuring means for measuring the current position of the moving body and outputting a position measurement result;
Ranging means for obtaining a distance from the moving body to an object specified by the geometric map in the space as ranging data according to a predetermined probability distribution;
Position estimation means for estimating the position of the moving body by a particle filter including a plurality of particles, each of the particles has position and orientation information in the space as attributes,
The position estimating means includes
Means for moving each of the particles based on the position measurement result, calculating a first likelihood for the current attributes of each of the particles based on the distance measurement data and the probability distribution, and calculating a weight of the corresponding particle; and
Moving body position estimating means for estimating the current position of the moving body as a weighted average of the particles,
A sound sensor array;
A direction-of-arrival estimation means for executing processing for specifying a direction of arrival of sound in the sound sensor array based on a signal from the sound sensor array;
Sound source localization means for estimating the position of the acoustic source based on the attribute of each particle and the direction of arrival;
The sound source localization means obtains a second likelihood for the hit position in response to a ray cast from the position of each particle in the arrival direction hitting an object specified by the geometric map. A moving object that accumulates and estimates the position of the acoustic source based on the accumulated second likelihood.

The moving body according to claim 1, wherein the distance measuring means is a laser range finder, and
the first likelihood is obtained by accumulating, at predetermined angular intervals, the probabilities that the probability distribution, which is based on the measured distance from the current position of the moving body to the object, assigns to the distance at which a ray cast from the position of each particle in the direction of that particle hits the object in the geometric map.
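
One possible reading of this first-likelihood calculation is sketched below. The sketch assumes a Gaussian range-error model with standard deviation `sigma`, an occupancy-grid map, and a product over beams (computed as a sum of logarithms); the claim itself states only that the per-angle probabilities are accumulated. All function names and parameter values are hypothetical.

```python
import math

def expected_range(grid, x, y, theta, step=0.1, max_range=20.0):
    """Distance along a ray from (x, y) until it hits an occupied cell."""
    dist = 0.0
    while dist < max_range:
        col = int(math.floor(x + dist * math.cos(theta)))
        row = int(math.floor(y + dist * math.sin(theta)))
        if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
            return max_range
        if grid[row][col] == 1:
            return dist
        dist += step
    return max_range

def gaussian(z, mean, sigma):
    """Gaussian probability density of z (assumed range-error model)."""
    return math.exp(-0.5 * ((z - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def first_likelihood(grid, particle, scan, sigma=0.2):
    """Likelihood of one particle given a laser scan.

    particle: (x, y, heading); scan: list of (beam_angle, measured_range),
    with beam angles sampled at a fixed angular interval.
    """
    x, y, heading = particle
    log_l = 0.0
    for beam_angle, measured in scan:
        expected = expected_range(grid, x, y, heading + beam_angle)
        # A small floor keeps log() finite for very unlikely beams.
        log_l += math.log(gaussian(measured, expected, sigma) + 1e-12)
    return math.exp(log_l)
```

In this sketch a particle whose ray-cast distances match the measured ranges receives a higher likelihood, and therefore a larger weight after normalization, than a particle placed elsewhere.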

The moving body according to claim 1 or 2, wherein the direction-of-arrival estimation means performs a plurality of sound scans, and
in calculating the second likelihood, the sound source localization means selects the sound scans in which at least one of the maximum value and the amount of change of the sound power exceeds a predetermined threshold, and executes the accumulation process on the selected sound scans.
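
The scan selection step can be sketched as follows. The sketch assumes that each sound scan is a list of power values over candidate directions and that the "amount of change" is the absolute difference between the peak power of consecutive scans; that definition, the function name `select_scans`, and both threshold parameters are assumptions made for illustration.

```python
def select_scans(scans, max_threshold, change_threshold):
    """Return the indices of scans worth accumulating.

    A scan is selected when its peak power exceeds max_threshold, or when
    its peak differs from the previous scan's peak by more than
    change_threshold (assumed definition of the change amount).
    """
    selected = []
    prev_peak = None
    for index, powers in enumerate(scans):
        peak = max(powers)
        change = 0.0 if prev_peak is None else abs(peak - prev_peak)
        if peak > max_threshold or change > change_threshold:
            selected.append(index)
        prev_peak = peak
    return selected
```

Scans that fail both tests are skipped, so quiet and stationary background noise would not contribute to the accumulated likelihood.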

The moving body according to claim 3, wherein the second likelihood is defined by a function that increases with the sound power and whose value is scaled to a predetermined range, and
the sound source localization means performs the accumulation process so that the second likelihood increases for hit positions at which the sound scan has been selected and the number of hits exceeds a predetermined threshold.
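
A function of this kind can be sketched with a logistic curve, which increases monotonically with the sound power and is bounded to the range (0, 1). The choice of a logistic function, the `midpoint` and `slope` parameters, and the names `second_likelihood` and `accumulate_selected` are assumptions for illustration; the claim fixes only that the function is increasing and scaled to a predetermined range.

```python
import math
from collections import Counter, defaultdict

def second_likelihood(power_db, midpoint=20.0, slope=0.3):
    """Map sound power to a value in (0, 1) that grows with the power."""
    return 1.0 / (1.0 + math.exp(-slope * (power_db - midpoint)))

def accumulate_selected(hits, power_db, min_hits, acc=None):
    """Accumulate the second likelihood for one selected sound scan.

    hits: list of hit cells, one per particle whose ray hit the map.
    Only cells hit more than min_hits times are updated.
    """
    if acc is None:
        acc = defaultdict(float)
    value = second_likelihood(power_db)
    for cell, count in Counter(hits).items():
        if count > min_hits:
            acc[cell] += value
    return acc
```

Requiring a minimum number of hits per cell means that a position supported by only a few stray particles is not reinforced.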

The moving body according to claim 1, wherein the moving body creates an acoustic map indicating the position of the localized acoustic source in the geometric map and stores the acoustic map in a storage device.

The moving body according to claim 5, wherein the moving body is an autonomous mobile robot.

The moving body according to claim 6, wherein the moving body updates the acoustic map as it moves autonomously.

The moving body according to any one of claims 5 to 7, wherein the position measuring means is a path length measurement sensor that detects the moving distance and the moving direction of the moving body based on the motion of the moving means.
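
How a moving distance and a change of direction reported by such a sensor might drive the particles, and how the weighted average of the particles yields a pose estimate, can be sketched as follows. The Gaussian noise model, its standard deviations, the circular mean used for the heading, and the names `move_particles` and `weighted_pose` are assumptions of this sketch rather than details taken from the specification.

```python
import math
import random

def move_particles(particles, distance, turn, rng, dist_sigma=0.02, turn_sigma=0.01):
    """Apply one odometry reading (distance, turn) to every particle.

    particles: list of (x, y, heading). Gaussian noise (assumed model) is
    added per particle so that the set keeps representing uncertainty.
    """
    moved = []
    for x, y, heading in particles:
        d = distance + rng.gauss(0.0, dist_sigma)
        h = heading + turn + rng.gauss(0.0, turn_sigma)
        moved.append((x + d * math.cos(h), y + d * math.sin(h), h))
    return moved

def weighted_pose(particles, weights):
    """Weighted average pose; the heading uses a circular mean."""
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(particles, weights)) / total
    y = sum(w * p[1] for p, w in zip(particles, weights)) / total
    s = sum(w * math.sin(p[2]) for p, w in zip(particles, weights))
    c = sum(w * math.cos(p[2]) for p, w in zip(particles, weights))
    return x, y, math.atan2(s, c)
```

A typical cycle under these assumptions would call `move_particles` with a `random.Random` instance, recompute the weights from the first likelihood, and then call `weighted_pose` to obtain the current position estimate.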

An acoustic source map creation system comprising:
moving means for driving a moving body;
storage means for storing a geometric map that specifies the geometric positions of objects in a space in which the moving body can move;
position measuring means for measuring the current position of the moving body and outputting a position measurement result;
distance measuring means for obtaining, as distance measurement data following a predetermined probability distribution, the distance from the moving body to an object specified by the geometric map in the space; and
position estimation means for estimating the position of the moving body by a particle filter including a plurality of particles, each of the particles having position and orientation information in the space as attributes,
wherein the position estimation means includes:
means for moving each of the particles based on the position measurement result, calculating a first likelihood for the current attributes of each of the particles based on the distance measurement data and the probability distribution, and calculating a weight for the corresponding particle; and
moving body position estimating means for estimating the current position of the moving body as a weighted average of the particles,
the system further comprising:
a sound sensor array mounted on the moving body;
direction-of-arrival estimation means for executing processing that specifies the direction of arrival of sound at the sound sensor array based on a signal from the sound sensor array; and
sound source localization means for estimating the position of an acoustic source based on the attributes of each particle and the direction of arrival,
wherein the sound source localization means accumulates a second likelihood for a hit position at which a ray cast from the position of each particle in the direction of arrival hits an object specified by the geometric map, and estimates the position of the acoustic source based on the accumulated second likelihood,
the system further comprising an acoustic source map database that stores the estimated position of the acoustic source as an acoustic source map.

An acoustic source map creation method comprising the steps of:
driving and moving a moving body;
storing, in a storage device, a geometric map that specifies the geometric positions of objects in a space in which the moving body can move;
measuring the current position of the moving body and outputting a position measurement result;
obtaining, as distance measurement data following a predetermined probability distribution, the distance from the moving body to an object specified by the geometric map in the space; and
estimating the position of the moving body by a particle filter including a plurality of particles, each of the particles having position and orientation information in the space as attributes,
wherein the step of estimating the position of the moving body includes the steps of:
moving each of the particles based on the position measurement result, calculating a first likelihood for the current attributes of each of the particles based on the distance measurement data and the probability distribution, and calculating a weight for the corresponding particle; and
estimating the current position of the moving body as a weighted average of the particles,
the method further comprising the steps of:
executing processing that specifies the direction of arrival of sound at a sound sensor array mounted on the moving body, based on a signal from the sound sensor array; and
estimating the position of an acoustic source based on the attributes of each particle and the direction of arrival,
wherein the step of estimating the position of the acoustic source includes the step of accumulating a second likelihood for a hit position at which a ray cast from the position of each particle in the direction of arrival hits an object specified by the geometric map, and estimating the position of the acoustic source based on the accumulated second likelihood,
the method further comprising the step of storing the estimated position of the acoustic source in a storage device as an acoustic source map.