JP7403778B2

JP7403778B2 - Sound source direction identification device

Info

Publication number: JP7403778B2
Application number: JP2022153296A
Authority: JP
Inventors: 昌浩和田; 慶介高橋
Original assignee: 株式会社ユピテル; 株式会社ユピテル鹿児島
Priority date: 2018-01-16
Filing date: 2022-09-27
Publication date: 2023-12-25
Anticipated expiration: 2038-01-16
Also published as: JP7154530B2; JP2022180571A; JP2019124570A

Description

The present invention relates to a sound source direction identification device and the like.

Patent Document 1 describes a sound source direction estimation device that has three microphones and estimates the direction of a sound source. Specifically, it discloses a method of calculating the horizontal angle of the sound source direction from three arrival time differences.

Patent Document 1: Japanese Patent Application Publication No. 2015-161659

When three microphones are used to locate a sound source, the method of Patent Document 1 may fail to identify the direction accurately.

The present application has been made in view of various problems, including the one above, and aims, among other things, to provide a sound source direction identification device and the like that can accurately identify the direction of a sound source using, for example, three microphones, by a method different from the prior art.
The object of the invention of the present application is not limited to this. The applicant intends to seek rights, through divisional applications, amendments, and the like, to configurations aimed at obtaining the effects of any part of the configurations disclosed in this specification and the drawings. For example, this specification discloses the problems obtained by reading each passage stating that something "can be done" as stating that achieving it "is a problem to be solved." Each problem is described as independent of the others, and the applicant intends to seek rights to the configuration that solves each problem on its own through divisional applications, amendments, and the like. Even where a problem is only implicitly understood from the description, the applicant intends to claim part of the configurations described herein by amendment or in a divisional application.

(1) The sound source direction identification device of the present application comprises three microphones arranged at the vertices of a triangle, and an identification unit that, based on the differences in the arrival times of sound from a sound source to each of the three microphones, identifies the sound source direction: the direction from the position obtained by projecting the position of the sound source onto the plane containing the triangle, along the direction perpendicular to that plane, toward a reference position located inside the region of that plane enclosed by the triangle. With this configuration, the sound source direction can be identified using three microphones.

For example, suppose the reference position is the midpoint of the line segment connecting two microphones, and consider a plane containing those two microphones. With only two microphones, identification of the sound source direction is inevitably limited.
As shown in FIG. 1, consider a plane containing a sound source and the microphones MIC (Ach) (hereinafter MICa) and MIC (Bch) (hereinafter MICb), which are spaced apart by an interval Dab. In FIG. 1, an arrow indicates the sound source direction, i.e., the direction from the position of the sound source (hereinafter, the sound source position) toward the reference position. The sound source direction is expressed by the angle θ (hereinafter, the sound source angle) between the sound source direction and the perpendicular to the line segment connecting MICa and MICb that passes through its midpoint (labeled 0° in FIG. 1). In the following description, this perpendicular may be called the 0° line.
Assuming the sound is a plane wave, the distance difference Ddiff, i.e., the difference between the distance from the sound source position to MICa and the distance from the sound source position to MICb, is the length of the remaining leg of a right triangle whose hypotenuse is the line segment connecting MICa and MICb and one of whose legs is perpendicular to the sound source direction. Ddiff is therefore given by (Formula 1). Here, MICa is farther from the sound source position than MICb by Ddiff.
Ddiff = Dab × sin θ ... (Formula 1)
Rearranging (Formula 1) gives the angle θ:
θ = arcsin(Ddiff / Dab) ... (Formula 2)
With the speed of sound Vs, the distance difference Ddiff is related to the arrival time difference Tdiff, i.e., the difference between the arrival time of the sound at MICa and that at MICb, by (Formula 3):
Tdiff = Ddiff / Vs ... (Formula 3)
Rearranging (Formula 3) gives the distance difference Ddiff:
Ddiff = Vs × Tdiff ... (Formula 4)
Substituting (Formula 4) into (Formula 2) gives (Formula 5):
θ = arcsin(Vs × Tdiff / Dab) ... (Formula 5)
In (Formula 5), if the interval Dab and the speed of sound Vs are known, the angle θ can be calculated by obtaining the arrival time difference Tdiff, for example by measurement.
However, as shown in FIG. 2, two sound source directions yield the same arrival time difference Tdiff at MICa and MICb: the direction at angle θ and the direction at angle θ', where θ' = 180° − θ. Tdiff alone therefore cannot determine whether the sound source angle is θ or θ'. If the two sides of the line through MICa and MICb are called the front direction and the back direction, a sound source in the front direction produces a real image in the front direction and a virtual image in the back direction, but Tdiff alone cannot tell which of the two is the real image.
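A minimal Python sketch of Formula 5 follows. It assumes SI units (seconds, metres), a speed of sound of 343 m/s (air at roughly 20 °C, an assumed value not given in the text), and a hypothetical function name `candidate_angles_deg`. It returns both candidates θ and θ' = 180° − θ, because a single microphone pair cannot tell them apart.

```python
import math
from typing import Tuple

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 °C


def candidate_angles_deg(t_diff_s: float, d_ab_m: float,
                         vs_m_s: float = SPEED_OF_SOUND_M_S) -> Tuple[float, float]:
    """Return (theta, theta_prime) in degrees for one microphone pair.

    theta follows Formula 5: theta = arcsin(Vs * Tdiff / Dab), measured from
    the 0° line. theta_prime = 180° - theta is the mirror candidate on the
    other side of the line through the pair, which the same Tdiff cannot exclude.
    """
    ratio = vs_m_s * t_diff_s / d_ab_m
    # Timing noise can push |ratio| slightly above 1; clamp so asin is defined.
    ratio = max(-1.0, min(1.0, ratio))
    theta = math.degrees(math.asin(ratio))
    return theta, 180.0 - theta


# Example: 0.1 m spacing and a 100 microsecond arrival time difference.
theta, theta_prime = candidate_angles_deg(100e-6, 0.1)
print(f"theta = {theta:.1f} deg, theta' = {theta_prime:.1f} deg")
```

The clamp is a design choice for noisy measurements; without it, a measured Tdiff slightly larger than Dab / Vs would raise a `ValueError` in `math.asin`.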

In contrast, with the configuration of the present application, in which three microphones are placed at the vertices of a triangle (that is, not on a single straight line), the sound source direction can be identified from the differences in arrival time among the three microphones. For example, suppose that a third microphone, MIC (Cch) (hereinafter MICc), is placed in the plane containing MICa, MICb, and the sound source, on the side away from sound 1, whose sound source direction with respect to MICa and MICb is at angle θ. If the sound reaches MICc earlier than it reaches MICa and MICb, the source is sound 2, at angle θ'; if the sound reaches MICc later than it reaches MICa and MICb, the source is sound 1, at angle θ.
In other words, of the angle θ calculated by substituting into (Formula 5) the arrival time difference Tab between MICa and MICb, and the angle θ' = 180° − θ, one can be identified as the sound source angle on the basis of the arrival time difference Tca between MICc and MICa or the arrival time difference Tbc between MICb and MICc. In the following description, the arrival time differences are written Tab, Tbc, and Tca when it is necessary to distinguish which pair of microphones among MICa to MICc they belong to, and Tdiff when referred to collectively.
Here, the arrival time difference Tdiff is taken to be the longer of the two times required for the sound to reach the two microphones minus the shorter one. Comparing those two times also shows which of the two microphones is farther from the sound source.
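The third-microphone rule can be sketched as follows. This is a simplified illustration that assumes ideal plane-wave arrival times, a hypothetical layout with MICc on the side of line MICa-MICb away from the candidate at angle θ (as in the example above), and hypothetical helper names. When MICc's arrival time falls between those of MICa and MICb, the simple rule described here does not decide, and the sketch returns None.

```python
import math
from typing import List, Optional, Tuple

SPEED_OF_SOUND_M_S = 343.0  # assumed value


def plane_wave_arrivals(angle_deg: float, mics: List[Tuple[float, float]],
                        vs_m_s: float = SPEED_OF_SOUND_M_S) -> List[float]:
    """Ideal arrival times (s, relative to the origin) of a plane wave.

    angle_deg is the direction from the origin toward the source, measured
    from the +y axis (the 0° line) toward +x. MICa and MICb lie on the x axis.
    """
    ux = math.sin(math.radians(angle_deg))
    uy = math.cos(math.radians(angle_deg))
    # A microphone lying further toward the source hears the wavefront earlier.
    return [-(ux * x + uy * y) / vs_m_s for x, y in mics]


def resolve_front_back(theta_deg: float, t_a: float, t_b: float,
                       t_c: float) -> Optional[float]:
    """Choose theta or theta' = 180 - theta from MICc's arrival time."""
    if t_c < min(t_a, t_b):
        return 180.0 - theta_deg  # MICc heard it first: source on MICc's side
    if t_c > max(t_a, t_b):
        return theta_deg          # MICc heard it last: source on the far side
    return None                   # not decided by this simple rule


# Hypothetical layout (m): MICa, MICb on the x axis, MICc behind them (y < 0).
MICS = [(-0.05, 0.0), (0.05, 0.0), (0.0, -0.08)]
print(resolve_front_back(20.0, *plane_wave_arrivals(20.0, MICS)))   # sound 1
print(resolve_front_back(20.0, *plane_wave_arrivals(160.0, MICS)))  # sound 2
```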

In the following description, to distinguish between the angles θ and θ' shown in FIG. 2, for any pair of two of the three microphones, the side of the line through the pair on which the remaining microphone does not lie is called the "front," and the side on which the remaining microphone lies is called the "back." For example, for the pair MICa and MICb in FIG. 2, with MICc at the position shown, the side containing sound 1 is the "front" and the side containing sound 2 is the "back." In the following description, a pair of microphones may be called a microphone pair; for example, the pair MICa and MICb may be written as microphone pair MICa⇔MICb.

The front and back of the sound source direction for each of the three microphone pairs can be distinguished efficiently by identifying which of six regions contains the sound source direction, where the six regions are bounded by the three lines that pass through the orthocenter of triangle ABC (whose vertices are the positions of the three microphones) parallel to the sides of the triangle.
As shown in FIG. 3, the six regions bounded by the three lines PLab, PLbc, and PLca, which pass through the orthocenter of triangle ABC (whose vertices are the positions of MICa to MICc) and are parallel to its sides, are called regions 1 to 6. Here, the orthocenter of triangle ABC is taken as the reference position. The direction from the sound source position toward the reference position is the sound source direction; an example is shown by an arrow in FIG. 3. Because the sound is treated as a plane wave, only the direction matters, and the sound source direction may be translated freely within the plane containing MICa to MICc. For microphone pair MICa⇔MICb, regions 1 to 3 lie on the "front" and regions 4 to 6 on the "back." For microphone pair MICb⇔MICc, regions 3 to 5 lie on the "front" and regions 1, 2, and 6 on the "back." For microphone pair MICc⇔MICa, regions 1, 5, and 6 lie on the "front" and regions 2 to 4 on the "back." Therefore, if the sound source direction is identified as lying in region 3, it follows at once that it is on the "front" for MICa⇔MICb, on the "front" for MICb⇔MICc, and on the "back" for MICc⇔MICa.
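The region assignments above can be encoded directly. The sketch below (hypothetical names; data transcribed from the text for regions 1 to 6 of FIG. 3) shows that every region gives a different front/back combination across the three pairs, which is why identifying the region settles front and back for all pairs at once.

```python
from typing import Dict, Set, Tuple

# Regions of FIG. 3 lying on the "front" of each microphone pair (from the text).
FRONT_REGIONS: Dict[str, Set[int]] = {
    "MICa-MICb": {1, 2, 3},
    "MICb-MICc": {3, 4, 5},
    "MICc-MICa": {1, 5, 6},
}


def front_back(region: int) -> Dict[str, str]:
    """Front/back side of the sound source direction for every pair."""
    return {pair: "front" if region in regions else "back"
            for pair, regions in FRONT_REGIONS.items()}


# Map each front/back combination (in pair order) back to its region.
SIGNATURE_TO_REGION: Dict[Tuple[str, ...], int] = {
    tuple(front_back(r).values()): r for r in range(1, 7)
}

print(front_back(3))
```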

As shown in FIG. 4, consider twelve regions bounded by six lines in total: the lines PLab, PLbc, and PLca, which pass through the orthocenter O of triangle ABC parallel to its sides, and the perpendiculars PLa, PLb, and PLc dropped from vertices A to C onto their respective opposite sides. Within each of these twelve regions, the ranking of the arrival time differences Tdiff by magnitude is fixed. For the purpose of explanation, the twelve regions are named as follows. Taking the orthocenter O as the origin, the region extending clockwise from the part of perpendicular PLa on the vertex A side to the parallel line PLca is called region R1a, and the following regions, proceeding clockwise, are called R1b, R2a, R2b, R3a, R3b, R4a, R4b, R5a, R5b, R6a, and R6b. The orthocenter O is the intersection of the perpendiculars PLa, PLb, and PLc.

Here, for simplicity, the case where triangle ABC is an equilateral triangle is described as an example.
For example, when the sound source position lies on the perpendicular PLa, the arrival time difference Tab of microphone pair MICa⇔MICb equals the arrival time difference Tca of microphone pair MICc⇔MICa, and the arrival time difference Tbc of microphone pair MICb⇔MICc is 0. That is, Tab and Tca are the largest and Tbc is the smallest. The ranking of Tdiff is likewise determined when the sound source position lies on the perpendicular PLb or PLc.
Similarly, when the sound source position lies on the parallel line PLab, Tbc equals Tca, and Tab is twice Tbc (and twice Tca). That is, Tab is the largest and Tbc and Tca are the smallest. This is because the distance difference DDab between the distances from the sound source position to MICa and to MICb equals the length of side AB; the distance difference DDbc between the distances to MICb and to MICc equals the distance from vertex B to the midpoint of side AB; and the distance difference DDca between the distances to MICc and to MICa equals the distance from vertex A to the midpoint of side AB. The ranking of Tdiff is likewise determined when the sound source position lies on the parallel line PLbc or PLca. In the following description, the distance differences are written DDab, DDbc, and DDca when it is necessary to distinguish which pair of microphones among MICa to MICc they belong to, and Ddiff when referred to collectively.
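Under the plane-wave assumption, each distance difference is the projection of a triangle side onto the sound source direction, so the two special cases above can be checked numerically. The sketch assumes a hypothetical equilateral layout with unit side and the orthocenter at the origin (A at the top, B lower right, C lower left) and measures directions counterclockwise from the +x axis; these conventions are illustrative and not taken from the text.

```python
import math
from typing import List, Tuple

R = 1.0 / math.sqrt(3.0)  # circumradius of an equilateral triangle with side 1
# Hypothetical layout: orthocenter O at the origin, A on top, B lower right, C lower left.
MICS: List[Tuple[float, float]] = [
    (R * math.cos(math.radians(a)), R * math.sin(math.radians(a)))
    for a in (90.0, -30.0, 210.0)
]


def distance_differences(direction_deg: float) -> Tuple[float, float, float]:
    """(DDab, DDbc, DDca) for a plane wave from direction_deg.

    direction_deg points from O toward the sound source position. A larger
    projection onto that direction means a shorter path from the source.
    """
    ux = math.cos(math.radians(direction_deg))
    uy = math.sin(math.radians(direction_deg))
    a, b, c = (ux * x + uy * y for x, y in MICS)
    return abs(a - b), abs(b - c), abs(c - a)


# Source on perpendicular PLa (toward vertex A): DDab == DDca, DDbc == 0.
print(distance_differences(90.0))
# Source on parallel line PLab (parallel to side AB): DDab == 2 * DDbc == 2 * DDca.
print(distance_differences(120.0))
```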
Next, the case where the sound source position does not lie on any of these lines is described with reference to FIG. 5, taking as an example a sound source position in region R2a, where the angle between the sound source direction and the perpendicular PLc is θ. Because the sound source position is in region R2a, θ is less than 30°.

FIG. 5 shows triangle ABC, whose vertices are the positions of MICa to MICc, with a right triangle drawn on each side as its hypotenuse, as in FIG. 1, in order to calculate the distance difference Ddiff of each pair. Specifically, right triangle ABD has side AB as its hypotenuse and side BD, perpendicular to the sound source direction, as one leg. Right triangle BCF has side BC as its hypotenuse and side FB, perpendicular to the sound source direction, as one leg. Right triangle CAE has side CA as its hypotenuse and side AE, perpendicular to the sound source direction, as one leg.
Here, angle DBA is θ, angle FBC is (60° + θ), and angle CAE is (60° − θ). The distance difference DDab is the length of side AD of right triangle ABD, DDbc is the length of side CF of right triangle BCF, and DDca is the length of side EC of right triangle CAE.
Letting Dbc be the interval between MICb and MICc and Dca the interval between MICc and MICa, DDab = Dab × sin θ, DDbc = Dbc × sin(60° + θ), and DDca = Dca × sin(60° − θ). Since θ < 30°, sin θ < sin(60° − θ) < sin(60° + θ), and since Dab = Dbc = Dca, Dab × sin θ < Dca × sin(60° − θ) < Dbc × sin(60° + θ). That is, DDab < DDca < DDbc.
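The inequality chain for region R2a can be checked numerically. This sketch simply evaluates the three expressions above for an equilateral triangle (hypothetical unit side) over a sweep of θ between 0° and 30° and reports the ranking; it is an illustration, not part of the disclosed device.

```python
import math
from typing import Dict, List


def r2a_distance_differences(theta_deg: float, side: float = 1.0) -> Dict[str, float]:
    """DDab, DDbc, DDca for an equilateral triangle, source in region R2a.

    theta_deg is the angle between the sound source direction and the
    perpendicular PLc (0 < theta < 30), as in the text.
    """
    t = math.radians(theta_deg)
    sixty = math.radians(60.0)
    return {
        "DDab": side * math.sin(t),
        "DDbc": side * math.sin(sixty + t),
        "DDca": side * math.sin(sixty - t),
    }


def ranking(diffs: Dict[str, float]) -> List[str]:
    """Names sorted from largest to smallest distance difference."""
    return sorted(diffs, key=diffs.get, reverse=True)


print(ranking(r2a_distance_differences(10.0)))  # ['DDbc', 'DDca', 'DDab']
```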
The ranking of the distance differences Ddiff is determined in the same way for the other regions. Because the ranking of Ddiff is the same as the ranking of the arrival time differences Tdiff, the ranking of Tdiff in each region is as shown in FIG. 6, which lists the three arrival time differences of each region in descending order. As noted above, comparing the times required for the sound to arrive shows which of the two microphones in a pair is farther from the sound source; in FIG. 6, the microphone farther from the sound source is shown in parentheses. For example, in region R1a the largest arrival time difference is Tca, and the microphone farther from the sound source position is MICc.
The ranking of Ddiff in each region has been explained above for an equilateral triangle ABC. The ranking is likewise fixed for a triangle ABC that is not equilateral, provided that its orthocenter lies inside the region enclosed by the triangle and all of its angles are 90° or less, as in, for example, a right triangle or an acute triangle. For a triangle that does not satisfy this condition, i.e., an obtuse triangle, the ranking of Ddiff does not follow FIG. 6.

As (Formula 1) shows, the closer θ is to 90°, i.e., the closer the source is to lying directly beside the pair, the smaller the change in the distance difference Ddiff for a given change in θ. This is clear from (Formula 6), the derivative of (Formula 1).
dDdiff/dθ = (Dab × sin θ)' = Dab × cos θ ... (Formula 6)
Therefore, as shown in FIG. 7, the difference between the distance difference Ddiff at sound source angle θ and that at angle (θ + Δθ) becomes smaller as θ approaches 90°. Consequently, the closer θ is to 90°, the more strongly the calculated θ is affected by measurement errors in Ddiff, and the lower its accuracy. FIG. 7 is similar to FIG. 1: the sound source direction is expressed by the angle θ between the sound source direction and the perpendicular to the line segment connecting MICa and MICb that passes through its midpoint (labeled 0° in FIG. 7). Here, θ approaching 90° means that, for the microphone pair MICa⇔MICb concerned, the distance difference Ddiff and the arrival time difference Tdiff approach their maximum values.
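Formula 6 also gives a first-order estimate of how a timing error turns into an angle error: since Ddiff = Vs × Tdiff, a timing error δT changes θ by roughly Vs × δT / (Dab × cos θ) radians. The sketch below evaluates that estimate; the 10 µs timing error, 0.1 m spacing, and 343 m/s speed of sound are illustrative assumptions, and `angle_error_deg` is a hypothetical name.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value


def angle_error_deg(theta_deg: float, timing_error_s: float, d_ab_m: float,
                    vs_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """First-order angle error (degrees) from a timing error, via Formula 6.

    dDdiff/dtheta = Dab * cos(theta), so dtheta ~= Vs * dT / (Dab * cos(theta)).
    The estimate grows without bound as theta approaches 90 degrees.
    """
    cos_t = math.cos(math.radians(theta_deg))
    return math.degrees(vs_m_s * timing_error_s / (d_ab_m * cos_t))


for theta in (0.0, 30.0, 60.0, 80.0):
    print(f"theta = {theta:4.1f} deg -> error ~ {angle_error_deg(theta, 10e-6, 0.1):.2f} deg")
```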
In view of the above, although three angles θ can be calculated in total, one from each of the three arrival time differences Tdiff obtained by taking two of the three microphones as a pair, the inventors found that excluding the largest of the three arrival time differences from the calculation of θ improves the accuracy of the angle θ indicating the sound source direction.

If the arrival time differences Tdiff of the two pairs other than the pair with the largest Tdiff are used to calculate θ, it is not necessary to identify which of the twelve regions (see FIG. 6) contains the sound source direction; it suffices to identify which of the six regions determined by the largest arrival time difference Tdiff contains it.
As shown in FIG. 8, the six regions determined by the largest arrival time difference Tdiff are region R1 (comprising R1a and R1b), region R2 (R2a and R2b), region R3 (R3a and R3b), region R4 (R4a and R4b), region R5 (R5a and R5b), and region R6 (R6a and R6b). The largest arrival time difference in regions R1 to R6 is Tca, Tbc, Tab, Tca, Tbc, and Tab, respectively.
For example, if the microphone pair with the largest arrival time difference Tdiff is MICa⇔MICb and the farther microphone is MICa, the sound source direction is identified as lying in region R3. Region R3 in FIG. 8 spans regions 3 and 4 in FIG. 3. Therefore, once the sound source direction is identified as lying in region R3, it cannot be determined whether it is on the front or the back for microphone pair MICa⇔MICb, but it can be determined that it is on the "front" for microphone pair MICb⇔MICc and on the "back" for microphone pair MICc⇔MICa. By contrast, knowing which pair has the second or third largest Tdiff does not allow the front and back to be identified for the remaining two pairs.
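The lookup from the largest arrival time difference to regions R1 to R6 can be sketched as below. Only two entries come directly from the text (largest Tca with MICc farther lies in R1a, hence R1; largest Tab with MICa farther gives R3). The other four are an assumption, derived for an equilateral layout with A at the top, B at the lower right, and C at the lower left, which reproduces those two examples; function and key names are hypothetical.

```python
from typing import Dict, List, Tuple

# (pair with the largest Tdiff, farther microphone of that pair) -> region (FIG. 8).
# ("ca", "c") -> R1 and ("ab", "a") -> R3 follow the text; the rest assume the
# layout A top, B lower right, C lower left with regions numbered clockwise.
REGION_TABLE: Dict[Tuple[str, str], str] = {
    ("ca", "c"): "R1", ("bc", "c"): "R2", ("ab", "a"): "R3",
    ("ca", "a"): "R4", ("bc", "b"): "R5", ("ab", "b"): "R6",
}
PAIRS: Dict[str, Tuple[str, str]] = {"ab": ("a", "b"), "bc": ("b", "c"), "ca": ("c", "a")}


def classify(arrival_s: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return (region, pairs kept for the angle calculation).

    arrival_s maps "a", "b", "c" to arrival times at MICa, MICb, MICc.
    Exact ties (a source on a region boundary) go to the first pair listed.
    """
    diffs = {name: abs(arrival_s[i] - arrival_s[j]) for name, (i, j) in PAIRS.items()}
    largest = max(diffs, key=diffs.get)
    i, j = PAIRS[largest]
    farther = i if arrival_s[i] > arrival_s[j] else j  # later arrival = farther
    kept = [name for name in PAIRS if name != largest]  # exclude the largest Tdiff
    return REGION_TABLE[(largest, farther)], kept


print(classify({"a": 3.0e-4, "b": 1.0e-4, "c": 2.0e-4}))  # ('R3', ['bc', 'ca'])
```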

以上、音源がマイクロフォンMICａ，MICｂ，MICｃを含む平面にあると仮定して説明した。しかしながら、（１）の構成は、音源がマイクロフォンMICａ，MICｂ，MICｃを含む平面にある場合に限定されるものではない。図９は、音源がマイクロフォンMICａ，MICｂ，MICｃを含む平面にない場合を示している。ここでは、マイクロフォンMICａ，MICｂ，MICｃを含む平面をＸＹ平面と称し、音源からＸＹ平面までの距離を距離Ｄｚと称し、音源の位置を、ＸＹ平面に垂直な方向に沿ってＸＹ平面に投影した位置を投影位置と称し、投影位置から基準位置までの距離を距離Ｄｘｙと称し、音源から基準位置までの距離を距離Ｄｄと称する。また、音源、基準位置、および投影位置を頂点とする三角形における、音源と基準位置とを結ぶ線分と基準位置と投影位置とを結ぶ線分とのなす角の角度を角度θｚと称する。距離Ｄｚに対し距離Ｄｘｙが十分長ければ、角度θｚは小さくなるため、距離Ｄｘｙを距離Ｄｄに近似することができる。同様に、投影位置から３つのマイクロフォンMICａ，MICｂ，MICｃの各々までの距離は、夫々、音源から３つのマイクロフォンMICａ，MICｂ，MICｃの各々までの距離に近似することができる。従って、例えば、投影位置からマイクロフォンMICａまでの距離と投影位置からマイクロフォンMICｂまでの距離との差は、距離差ＤＤａｂに近似することができる。そのため、音源から３つのマイクロフォンMICａ，MICｂ，MICｃの各々までの音の到達時間の差に基づいて、投影位置から３つのマイクロフォンMICａ，MICｂ，MICｃの各々までの距離の差を求めることができる。つまり、音源がＸＹ平面にない場合においても、音源の位置をＸＹ平面に対して垂直な方向に沿ってＸＹ平面に投影した投影位置から基準位置へ向かう音源方向を、音源から３つのマイクロフォンMICａ，MICｂ，MICｃの各々までの音の到達時間の差に基づいて特定することができる。
以上を鑑み、発明者らは、３つのマイクロフォンを用いた音源方向の特定において、次の（２）の構成が良いことを見出した。 The explanation above has been made on the assumption that the sound source is on a plane that includes the microphones MICa, MICb, and MICc. However, the configuration (1) is not limited to the case where the sound source is on a plane including the microphones MICa, MICb, and MICc. FIG. 9 shows a case where the sound source is not on the plane including the microphones MICa, MICb, and MICc. Here, the plane containing the microphones MICa, MICb, and MICc is called the XY plane, the distance from the sound source to the XY plane is called the distance Dz, and the position of the sound source is projected onto the XY plane along the direction perpendicular to the XY plane. The position is referred to as a projection position, the distance from the projection position to the reference position is referred to as distance Dxy, and the distance from the sound source to the reference position is referred to as distance Dd. Further, in a triangle whose vertices are the sound source, the reference position, and the projection position, the angle formed by the line segment connecting the sound source and the reference position and the line segment connecting the reference position and the projection position is referred to as an angle θz. If the distance Dxy is sufficiently long with respect to the distance Dz, the angle θz will be small, so the distance Dxy can be approximated to the distance Dd. Similarly, the distances from the projection position to each of the three microphones MICa, MICb, and MICc can be approximated to the distances from the sound source to each of the three microphones MICa, MICb, and MICc, respectively. Therefore, for example, the difference between the distance from the projection position to the microphone MICa and the distance from the projection position to the microphone MICb can be approximated to the distance difference DDab. 
Therefore, based on the difference in arrival time of sound from the sound source to each of the three microphones MICa, MICb, and MICc, the difference in distance from the projection position to each of the three microphones MICa, MICb, and MICc can be determined. In other words, even when the sound source is not on the XY plane, the direction of the sound source from the projection position, which is the projection position of the sound source onto the XY plane along the direction perpendicular to the XY plane, toward the reference position is from the sound source to the three microphones MICa, It can be specified based on the difference in arrival time of sound to each of MICb and MICc.
In view of the above, the inventors found that the following configuration (2) is advantageous for identifying the direction of a sound source using three microphones.

(2) In the sound source direction identification device of the present application, the reference position is the orthocenter of the triangle. Each set consists of two of the three microphones, and an arrival time difference is calculated for each of the three sets. Based on the set with the largest of these three arrival time differences, the identification unit determines to which of six regions surrounding the reference position the sound source direction belongs. The six regions are delimited by three perpendicular lines that pass through the reference position, each drawn to one of the lines passing through the two microphones of a set. Based on the arrival time differences of the remaining two sets, excluding the set with the largest difference, the identification unit then calculates a sound source angle. The sound source angle is the angle between the sound source direction lying in the determined region and one of the three perpendicular lines. In this way, the direction of the sound source can be identified with high accuracy.

In other words, the set with the maximum arrival time difference Tdiff determines which of the six regions the sound source direction belongs to. The arrival time differences Tdiff of the remaining two sets give two candidate sound source angles, θ and θ′. Of these, the candidate lying in the determined region is used to calculate the sound source angle. The set with the maximum Tdiff thus resolves, for each of the remaining two sets, the front/back ambiguity of the sound source direction, that is, whether the angle is θ or θ′. Excluding the set with the maximum Tdiff also improves the accuracy of the sound source angle. That set is the one whose microphone axis lies closest to the sound source direction, and the closer the sound source is to the straight line through two microphones, the larger the measurement error of the distance difference becomes (FIG. 7).
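The selection logic of configuration (2) can be sketched in Python for the equilateral case. This is a minimal sketch under stated assumptions and does not reproduce the patent's own formula (5). It assumes a far-field plane wave and a speed of sound of 340 m/s. Bearings are measured counterclockwise from the +X axis, with the orthocenter at the origin. The sketch returns a global bearing rather than an angle measured from one of the perpendiculars, and it combines the two remaining estimates with a circular mean, which is an illustrative choice. All function names are hypothetical.

Each set yields two mirror-image candidates, corresponding to θ and θ′. The set with the largest |Tdiff| fixes the 60° sector bounded by the three perpendiculars, and the candidate inside that sector is kept.

```python
import math

C = 340.0  # speed of sound [m/s], the value used in the text
PAIRS = ((0, 1), (1, 2), (2, 0))  # the three sets of two microphones

def triangle_mics(d):
    """Equilateral triangle of side d centred on its orthocenter (the origin)."""
    r = d / math.sqrt(3)
    return [(r * math.cos(math.radians(a)), r * math.sin(math.radians(a)))
            for a in (90.0, 210.0, 330.0)]

def wrap(deg):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0

def simulate_tdoas(bearing_deg, mics, c=C):
    """Far-field Tdiff = t_i - t_j for each set; a microphone farther along
    the source direction hears the plane wave earlier."""
    ux, uy = math.cos(math.radians(bearing_deg)), math.sin(math.radians(bearing_deg))
    t = [-(x * ux + y * uy) / c for x, y in mics]
    return {(i, j): t[i] - t[j] for i, j in PAIRS}

def baseline(mics, i, j):
    """Direction (degrees) and length of the vector from microphone i to j."""
    bx, by = mics[j][0] - mics[i][0], mics[j][1] - mics[i][1]
    return math.degrees(math.atan2(by, bx)), math.hypot(bx, by)

def estimate_bearing(tdoas, mics, c=C):
    # 1. The set with the largest |Tdiff| selects one of the six sectors:
    #    the source lies within 30 degrees of that set's axis, on the side
    #    of the microphone that heard the sound first.
    ref = max(PAIRS, key=lambda p: abs(tdoas[p]))
    axis, _ = baseline(mics, *ref)
    centre = axis if tdoas[ref] > 0 else axis + 180.0
    # 2. Each remaining set gives two candidates, axis +/- alpha (theta and theta').
    picks = []
    for pair in PAIRS:
        if pair == ref:
            continue
        axis, length = baseline(mics, *pair)
        cos_alpha = max(-1.0, min(1.0, c * tdoas[pair] / length))
        alpha = math.degrees(math.acos(cos_alpha))
        candidates = (axis + alpha, axis - alpha)
        # Keep the candidate that falls in the sector chosen in step 1.
        picks.append(min(candidates, key=lambda a: abs(wrap(a - centre))))
    # 3. Combine the two remaining estimates (circular mean; illustrative choice).
    s = sum(math.sin(math.radians(a)) for a in picks)
    co = sum(math.cos(math.radians(a)) for a in picks)
    return wrap(math.degrees(math.atan2(s, co)))

# Usage (illustrative): recover a bearing from simulated, noise-free Tdiff values.
mics = triangle_mics(0.1)
print(estimate_bearing(simulate_tdoas(47.0, mics), mics))
```

With an equilateral triangle, the two kept sets always meet the source at 30° to 90° from their axes. In that range the arccosine is well conditioned, which reflects the reason given above for excluding the set with the maximum Tdiff.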

(3) Preferably, the triangle is an equilateral triangle. This simplifies the calculation for deriving the angle θ and reduces the computational load on the identification unit.

(4) Preferably, the identification unit calculates the arrival time differences from phase differences. Each set consists of two of the three electrical signals output by the three microphones, and a phase difference is calculated for each set.

The electrical signals contain errors caused by variations in the frequency characteristics of the microphones, by the environment, and so on. For this reason, calculating the arrival time difference from, for example, the time at which the level of an electrical signal exceeds a threshold may degrade the accuracy of the sound source direction. Calculating the arrival time difference from the phase difference allows the direction of the sound source to be identified with high accuracy. When the sound contains multiple frequency components, as with the human voice in (5), it is preferable to frequency-analyze the electrical signals and calculate a phase difference for each frequency component.
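As a minimal sketch of configuration (4), the arrival time difference can be read from the phase of the cross-spectrum at one frequency component. The code is illustrative, and several parts of it are assumptions chosen for the demonstration. These are the single-bin DFT, the function names, and the test signal: a 500 Hz tone sampled at 16 kHz over 320 samples, with a 100 µs delay. The frame holds a whole number of cycles, so the bin contains no spectral leakage. For a voice, the same function would be applied to each analysed frequency component. `cmath.phase` returns a value in (−π, π], which is why the microphone spacing in (5) matters.

```python
import cmath
import math

def dft_bin(x, f, fs):
    """Single DFT coefficient of signal x at frequency f (Hz), sample rate fs (Hz)."""
    return sum(v * cmath.exp(-2j * math.pi * f * n / fs) for n, v in enumerate(x))

def arrival_time_difference(xa, xb, f, fs):
    """Estimate t_b - t_a (seconds) from the phase difference at frequency f.
    The phase is only known modulo 2*pi, so the result is limited to
    +/- half a period of f."""
    cross = dft_bin(xa, f, fs) * dft_bin(xb, f, fs).conjugate()
    return cmath.phase(cross) / (2 * math.pi * f)

# Usage: microphone b hears the same 500 Hz tone 100 microseconds after a.
fs, f, n_samples, delay = 16000.0, 500.0, 320, 100e-6
xa = [math.sin(2 * math.pi * f * n / fs) for n in range(n_samples)]
xb = [math.sin(2 * math.pi * f * (n / fs - delay)) for n in range(n_samples)]
tdiff = arrival_time_difference(xa, xb, f, fs)
```

A production implementation would more likely take one FFT per frame and evaluate several bins. The single-bin form above only keeps the sketch dependency-free.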

(5) Preferably, the sound from the sound source is a human voice, and the distance between each pair of the three microphones is 57 mm or more and 170 mm or less. In this way, the direction of the person who is the sound source can be identified with high accuracy.

In configuration (4), where the arrival time difference Tdiff is calculated from the phase difference, the inter-microphone distance is preferably determined from the frequency of the components used to calculate the phase difference. Suppose, for example, that the inter-microphone distance were set to one full wavelength of that frequency. The wave reaching the other microphone could then be anywhere from one period ahead of to one period behind the wave reaching the first microphone, and the phase difference could not be determined.
For example, suppose the wave reaching the other microphone has a phase difference of −π/2 relative to the wave reaching the first microphone, meaning it is actually delayed by a quarter period. A wave three quarters of a period ahead would produce the same signal, so the electrical signals alone cannot tell whether the phase difference is −π/2 or +3π/2.
If instead the inter-microphone distance is no more than half a wavelength of the frequency used, the wave reaching the other microphone is limited to the range from half a period ahead of to half a period behind the wave reaching the first microphone. In the example above, the wave reaching the other microphone can then be identified as the one delayed by a quarter period, with a phase difference of −π/2.
As a numerical example, take a speed of sound of 340 m/s. With an inter-microphone distance of 57 mm, the phase difference can be determined for frequencies up to about 3 kHz (340 / (2 × 0.057) ≈ 2.98 kHz). With an inter-microphone distance of 170 mm, it can be determined for frequencies up to 1 kHz (340 / (2 × 0.17) = 1 kHz).
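The half-wavelength condition and the wrap-around it prevents can be shown in a few lines of Python. The function names are illustrative. The 0.25 m spacing used in the usage note below is a hypothetical value chosen to exceed half a wavelength at 1 kHz.

```python
import math

def max_unambiguous_frequency(spacing_m, c=340.0):
    """Highest frequency whose phase difference between two microphones
    spacing_m apart stays within +/- pi (spacing <= half a wavelength)."""
    return c / (2.0 * spacing_m)

def wrapped_delay(true_delay_s, f):
    """Delay recovered from a phase measurement that is only known modulo 2*pi."""
    phase = 2.0 * math.pi * f * true_delay_s
    wrapped = (phase + math.pi) % (2.0 * math.pi) - math.pi  # map into [-pi, pi)
    return wrapped / (2.0 * math.pi * f)

print(max_unambiguous_frequency(0.057))  # about 2982 Hz, i.e. roughly 3 kHz
print(max_unambiguous_frequency(0.170))  # about 1000 Hz
```

With a 0.25 m spacing, delays of up to about 0.74 ms are possible at 1 kHz. A true delay of 0.6 ms corresponds to a phase of 1.2π, and `wrapped_delay(0.0006, 1000.0)` reports it as −0.4 ms. This is the same ambiguity as the −π/2 versus +3π/2 example above.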

Thus, the shorter the inter-microphone distance, the higher the frequency up to which the phase difference can be determined, and the wider the usable frequency range becomes. However, a short inter-microphone distance also makes the phase difference small, particularly at low frequencies, which may lead to errors in the phase difference.

The upper limits for the human voice are known to be about 200 Hz for the fundamental frequency, about 1 kHz for the first formant frequency, and about 3 kHz for the second formant frequency. A formant frequency is a frequency at which the sound pressure level peaks and which characterizes a vowel. Even for a short utterance such as the vowel "a", the electrical signal output by the microphone contains two frequency components at or below 1 kHz: the fundamental frequency and the first formant frequency. At or below 3 kHz it contains three: the fundamental frequency, the first formant frequency, and the second formant frequency.
The inventors found that, to identify the direction of a sound source accurately, the frequency range used to locate the sound source preferably extends up to 1 kHz, and particularly preferably up to 3 kHz. As noted above, a range up to 1 kHz contains at least two frequency components, the fundamental frequency and the first formant frequency. Widening it to 3 kHz adds the second formant frequency, for three components in total. Moreover, the sound source can be identified with high accuracy without using frequencies above 3 kHz.
As noted above, an inter-microphone distance of 170 mm allows the phase difference to be calculated for frequencies up to 1 kHz, and a distance of 57 mm allows it up to 3 kHz. With the inter-microphone distance in the range of 57 mm to 170 mm inclusive, the upper frequency limit for determining the phase difference therefore lies between 1 kHz and 3 kHz. At least the fundamental frequency and the first formant frequency can therefore be used to calculate the phase difference. In addition, keeping the upper frequency limit at about the second formant frequency, rather than shortening the spacing further, keeps the phase difference large enough at low frequencies to be measured accurately. In this way, the direction of a human voice can be identified with high accuracy.
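The trade-off can be tabulated with a short Python sketch. It uses the approximate upper limits quoted above (200 Hz, 1 kHz, 3 kHz) and the half-wavelength limit c/(2d). The names are illustrative. The comparison is strict against rounded limits, so results for spacings right at 57 mm or 170 mm depend on that rounding. The example input of 100 mm is the spacing given for the embodiment.

```python
# Approximate upper limits quoted in the text for a human voice (Hz).
VOICE_UPPER_LIMITS_HZ = {
    "fundamental": 200.0,
    "first formant": 1000.0,
    "second formant": 3000.0,
}

def usable_voice_components(spacing_m, c=340.0):
    """Components whose quoted upper limit does not exceed the half-wavelength
    frequency limit c / (2 * spacing_m), i.e. whose phase difference is unambiguous."""
    f_max = c / (2.0 * spacing_m)
    return [name for name, f in VOICE_UPPER_LIMITS_HZ.items() if f <= f_max]

print(usable_voice_components(0.100))  # ['fundamental', 'first formant']
```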

(6) Preferably, the identification unit repeatedly executes two processes. The first is a sampling process that converts the electrical signals output from each of the three microphones into digital values over a predetermined period. The second is an identification process that identifies the direction based on the digital values obtained by the sampling process.

In this way, even when it is not known when a sound will be emitted, the direction of the sound source can be identified whenever a sound occurs. The configuration is well suited to devices that operate in response to sound, such as communication robots that respond to human voices and surveillance cameras that record where sounds occur. When applied to such a device, it is preferable to control the device so that it moves toward the sound source direction. The device may also have a function for identifying the position from which a person spoke and a voice recognition function. A function for turning a predetermined part of the device toward the identified direction of the person's speech is also preferable. A communication robot is a particularly suitable application. Further, the time from the start of one sampling process to the start of the next is preferably 200 ms or less. In this way, a communication robot, for example, can locate the sound source and respond reliably even to a short call such as "hey". Moreover, with the configuration of the present application, particularly (5), the inter-microphone distance can be a size suitable for a communication robot.
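Configuration (6) amounts to a fixed-period loop. The Python sketch below is hypothetical. `capture` stands for whatever performs the sampling process, and `identify` stands for the identification process, for example a function returning the sound source angle. Neither name comes from the text. The loop starts each cycle `period_s` seconds after the previous one, defaulting to the preferred upper bound of 200 ms. It accepts an injectable clock and sleep function so that it can be tested without real waiting.

```python
import time
from typing import Callable, List, Sequence

def run_direction_loop(capture: Callable[[], Sequence[Sequence[float]]],
                       identify: Callable[[Sequence[Sequence[float]]], float],
                       iterations: int,
                       period_s: float = 0.2,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Repeat sampling + identification, starting a new cycle every period_s seconds."""
    results = []
    next_start = clock()
    for _ in range(iterations):
        frames = capture()                # sampling process: one block of samples per microphone
        results.append(identify(frames))  # identification process on that block
        next_start += period_s
        delay = next_start - clock()
        if delay > 0:                     # wait for the next cycle start (no wait if we overran)
            sleep(delay)
    return results
```

Scheduling against an absolute `next_start`, rather than sleeping a fixed amount after each cycle, keeps the cycle start times from drifting when identification takes a variable amount of time.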

The inventions shown in (1) to (6) above can be combined arbitrarily. For example, at least part of the configuration of at least one of the inventions (2) onward may be added to all or part of the configuration of the invention shown in (1). In particular, an invention obtained by adding at least part of the configuration of at least one of the inventions (2) onward to the invention shown in (1) is preferable. Arbitrary configurations may also be extracted from the inventions shown in (1) to (6) and combined. The applicant of this application intends to acquire rights to inventions containing these configurations.

Moreover, the inventions shown in (A) to (I) described below may be combined arbitrarily. For example, at least part of the configuration of at least one of the inventions (B) onward may be added to all or part of the configuration of the invention shown in (A). In particular, an invention obtained by adding at least part of the configuration of at least one of the inventions (B) onward to the invention shown in (A) is preferable. The inventions shown in (1) to (6) above and those shown in (A) to (I) below can also be combined arbitrarily. Arbitrary configurations may be extracted from the inventions shown in (A) to (I) and combined. Arbitrary configurations extracted from the inventions shown in (1) to (6) may also be combined with arbitrary configurations extracted from the inventions shown in (A) to (I). The applicant of this application intends to acquire rights to inventions containing these configurations.

(A) Preferably, a device comprising a plurality of microphones has an angle calculation function and a sound source direction specifying function. The angle calculation function obtains the angle between a predetermined reference direction and the direction of a sound source. It does so based on the order in which sound arrives at two of the plurality of microphones and on the difference between the arrival times of the sound at those two microphones. The sound source direction specifying function uses a microphone other than the two microphones. It specifies on which side the sound source lies of the two regions separated by a plane that is perpendicular to the plane containing the positions of these microphones and that contains the positions of the two microphones.
In this way, it can be determined on which side of these two regions the sound source lies, and the angle between the predetermined reference direction and the direction of the sound source can be obtained.
The predetermined reference direction may be, for example, a predetermined direction within the plane containing the positions of the three microphones. The direction of the sound source may be a direction within that plane, for example the in-plane component of a three-dimensional vector.
The other microphone used by the sound source direction specifying function may be a single microphone or a plurality of microphones.

(B) The phrase "using a microphone other than the two microphones" is preferably "using the order of sound arrival times between two other microphones arranged on a line that is not parallel to the line through the two microphones".
This makes it possible to determine more reliably and more accurately on which side of the two regions the sound source lies. The two regions are those separated by the plane that is perpendicular to the plane containing the microphone positions and that contains the positions of the two microphones. For example, microphones A to E may be placed at the vertices of a regular pentagon ABCDE. Microphones A and B are then the two microphones used by the angle calculation function, and microphones C and D are the other microphones used by the sound source direction specifying function.

(C) The phrase "using a microphone other than the two microphones" is preferably "using the order of sound arrival times between one microphone other than the two microphones and one of the two microphones".
In this way, adding as few as one microphone is enough to determine more reliably and accurately on which side of the two regions the sound source lies. The two regions are those separated by the plane that is perpendicular to the plane containing the microphone positions and that contains the positions of the two microphones. For example, microphones X to Z may be placed at the vertices of an equilateral triangle XYZ. Microphones X and Y are then the two microphones used by the angle calculation function, microphone Z is "the one other microphone", and microphone X is "one of the two microphones".

(D) Preferably, a function is provided that determines, according to a predetermined rule, which microphones among the plurality play which role. It selects the pair that serves as the two microphones used by the angle calculation function. It also selects the microphone that serves as the other microphone used by the sound source direction specifying function.
In this way, even if the position of the sound source changes, it can be determined more reliably and accurately on which side of the two regions the sound source lies. These are the two regions separated by the plane perpendicular to the plane containing the positions of the three microphones and containing the positions of the two microphones. The angle between the predetermined reference direction and the direction of the sound source can also be obtained.
In particular, the predetermined rule is preferably based on the sounds detected by each of the plurality of microphones, for example on a comparison of those sounds. It may, for example, be based on arrival time differences, such as the phase shifts between the sounds detected by the microphones.

(E) The reference pair is the pair of microphones with the largest difference in sound arrival time among the plurality of microphones. The predetermined rule is preferably a rule under which at least one of the two microphones used by the angle calculation function is a microphone other than the two microphones of the reference pair.
In this way, wherever the sound source is located, the accuracy with which the angle calculation function calculates the angle between the reference direction and the direction of the sound source does not drop significantly.
(F) The predetermined rule is preferably a rule under which at least one of the two microphones of the reference pair serves as the other microphone used by the sound source direction specifying function.
(G) Preferably, the plurality of microphones are placed at the vertices of a polygon, and among the pairs forming its sides, the pair with the largest difference in sound arrival time is taken as the reference pair. Consider following the side formed by the reference pair from the microphone the sound reaches first to the microphone it reaches later. This path crosses each of the other sides of the polygon either from the inside to the outside or from the outside to the inside. Based on this property, the sound source direction specifying function specifies on which of two regions the sound source lies, for at least one of those other sides. The two regions are those separated by the plane containing the positions of the two microphones that form that side.
In this way, wherever the sound source is located, it can be specified more reliably in which of the two regions the sound source lies. For example, suppose microphones are placed at the vertices of triangle ABC, microphones B and C form the pair with the largest difference in sound arrival time, and the sound reaches B first. Following side BC from B toward C then crosses side AB from the outside of triangle ABC to the inside, and this geometric property can be used.

(H) Preferably, the plurality of microphones comprise a first microphone, a second microphone, and a third microphone placed at the vertices of a triangle. The three pairs forming the three sides of the triangle are the first and second microphones, the second and third microphones, and the third and first microphones. The pair with the largest difference in sound arrival time among them is taken as the reference pair. The sound source direction specifying function then performs at least one of the following:
- It treats the sound as arriving from the region on the outer side of the triangle. That region is one of the two regions separated by the plane containing the microphone of the reference pair that the sound reached first and the microphone not in the reference pair.
- It treats the sound as arriving from the region on the inner side of the triangle. That region is one of the two regions separated by the plane containing the microphone of the reference pair that the sound reached later and the microphone not in the reference pair.
In this way, the region in which the sound source lies can be identified more reliably with three microphones.
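The rule in (H) can be checked with a short Python sketch. The sketch makes several assumptions: a distant source, microphones at the vertices of an equilateral triangle with 0.1 m sides, and arrival times computed from straight-line distances at 340 m/s. The function names are illustrative. The function first picks the reference pair. It then reports two lines: the line through the first-reached reference microphone and the third microphone, whose outer side should contain the source, and the line through the later-reached reference microphone and the third microphone, whose inner side should contain it.

```python
import math
from itertools import combinations

def cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign tells which side of line o-a point b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def side_rule(arrival):
    """Configuration (H): return (outer_line, inner_line) as microphone index pairs."""
    i, j = max(combinations(range(3), 2),
               key=lambda p: abs(arrival[p[0]] - arrival[p[1]]))  # reference pair
    first, later = (i, j) if arrival[i] < arrival[j] else (j, i)
    other = 3 - i - j  # the microphone not in the reference pair
    return (first, other), (later, other)

def on_inner_side(mics, line, point):
    """True if point lies on the same side of the line as the triangle's third vertex."""
    a, b = line
    c = 3 - a - b
    return cross(mics[a], mics[b], point) * cross(mics[a], mics[b], mics[c]) > 0

# Usage: microphones on an equilateral triangle, a distant source at 50 degrees.
r = 0.1 / math.sqrt(3)
mics = [(r * math.cos(math.radians(a)), r * math.sin(math.radians(a))) for a in (90, 210, 330)]
src = (1000 * math.cos(math.radians(50)), 1000 * math.sin(math.radians(50)))
arrival = [math.dist(src, m) / 340.0 for m in mics]
outer, inner = side_rule(arrival)
```

For the equilateral case, the reference pair is the one whose axis lies within 30° of the source direction. For such a source, both half-plane statements hold, as the checks below exercise at several bearings away from the sector boundaries.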

(I) The triangle is preferably an equilateral triangle.
In this way, the three pairs have equal accuracy, and the full 360° can be covered with little direction-dependent bias. This is therefore highly effective in a device that detects from which direction around it a sound has arrived.

According to the present application, it is possible to provide a sound source direction identification device and the like that accurately identifies the direction of a sound source by a method different from the conventional technology, using, for example, three microphones. The effects of the invention of the present application are not limited to these. Effects obtained from parts of the configurations disclosed in the specification, drawings, and so on are also disclosed. The applicant intends to acquire rights, through divisional applications, amendments, and the like, to the configurations that produce those effects. For example, passages in this specification stating that something "can be done" explicitly indicate an effect, and some passages indicate an effect even without such wording. Furthermore, even without such a statement, there are effects that can be understood from the configuration.

FIG. 1 is a diagram illustrating the relationship between the difference in distance from a sound source to each of two microphones and the direction of the sound source.
FIG. 2 is a diagram illustrating that there are two sound source positions for which the difference in arrival time of sound at two microphones is the same.
FIG. 3 is a diagram showing the relationship between the six regions surrounding the orthocenter of triangle ABC and the front and back of the sound source direction for each set of the three microphones.
FIG. 4 is a diagram showing the arrival time differences on the boundaries of the 12 regions surrounding the orthocenter O of triangle ABC.
FIG. 5 is a diagram for deriving the differences in distance from the sound source to each of the three microphones when the sound source is in the region R2a shown in FIG. 4.
FIG. 6 is a diagram showing the order of the arrival time differences, from largest, in each of the 12 regions surrounding the orthocenter O of triangle ABC.
FIG. 7 is a diagram illustrating that the closer the sound source is to the straight line through two microphones, the larger the measurement error of the distance difference becomes.
FIG. 8 is a diagram showing the six regions surrounding the orthocenter O of triangle ABC that are identified by the set with the maximum difference in arrival time of sound.
FIG. 9 is a diagram illustrating a case where the sound source is not on the plane containing the three microphones.
FIG. 10 is a perspective view of the robot according to the embodiment.
FIG. 11 is a perspective view of the sound source direction identification device shown together with the fixed-part lower housing.
FIG. 12 is an electrical configuration diagram of the sound source direction identification device.
FIG. 13 is a diagram illustrating the polarity of the sound source angle for one set of microphones.
FIG. 14 is a table showing which of the front and back sound source angles of each set is adopted for calculating the sound source angle. The table covers each of the six cases specified by the set with the largest distance difference and by the farther of the two microphones in that set.
FIG. 15 is a diagram illustrating the relationship between the sound source angle of each set and the overall sound source angle.

The robot 1 shown in FIG. 10 is a communication robot that operates in response to human voices. The robot 1 includes a fixed part 2 and a movable part 3 that moves relative to the fixed part 2. The directions shown in FIGS. 10 and 11 are used in the following description. The fixed part 2 includes a fixed-part lower housing 21, a fixed-part upper housing 22, the sound source direction identification device 10, and so on. The fixed-part lower housing 21 is bowl-shaped with a bottom surface 23 and houses the sound source direction identification device 10 and other components. A plane parallel to the bottom surface 23 is the XY plane. The fixed-part upper housing 22 is cylindrical, sits on top of the fixed-part lower housing 21, and covers the lower part of the movable part 3. There is a small gap between the fixed-part lower housing 21 and the fixed-part upper housing 22. The microphones MICa to MICc (see FIG. 11) of the sound source direction identification device 10 are arranged in this gap. The interior of the fixed-part upper housing 22 is not densely packed with components and is not sound-insulated. In practice, therefore, sound passes into the fixed-part upper housing 22, and the microphones MICa to MICc can also pick up sound from behind the daughter boards 12a to 12c (FIG. 11), respectively.
The movable part 3 includes a movable-part housing 31, a display device 32, and so on. The display device 32 is implemented by, for example, a touch panel or a liquid crystal display. The movable-part housing 31 is spherical with a portion cut away to form a flat face, and the display device 32 is attached to this flat portion. Driven by a motor (not shown), the movable part 3 can rotate 360° about a Z axis perpendicular to the bottom surface 23 of the fixed-part lower housing 21. When a sound is emitted, the robot 1 rotates the movable part 3 so that the display device 32 faces the source of the sound, such as a person. The sound source direction identification device 10 identifies the direction of the sound source, which is used to rotate the movable part 3.

As shown in FIG. 11, the sound source direction identification device 10 includes a disk-shaped board 11, the microphones MICa to MICc, and the like. The board 11 has daughter boards 12a to 12c on which the microphones MICa to MICc are mounted: each daughter board 12a to 12c carries one of the microphones MICa to MICc on one face and is attached to the board 11 so as to stand perpendicular to it. The microphones MICa to MICc are omnidirectional condenser microphones fixed to the board 11. The board 11 is approximately parallel to the bottom surface 23 of the fixed-part lower housing 21, so the microphones MICa to MICc are at approximately the same height (Z position) above the bottom surface 23. The microphones MICa to MICc are placed at the vertices of an equilateral triangle ABC (see FIG. 3) drawn in the XY plane. Because the distance between microphones is then the same for all three pairs, the same derivation, such as (Equation 5), can be used for all three pairs, which simplifies the calculation of the sound source angle θ.
The length of one side of the equilateral triangle ABC, that is, the distance between any two of the microphones MICa to MICc, is, for example, about 100 mm. With this spacing, the frequencies used to calculate the phase difference in (Process 5), described later, include at least the fundamental frequency and the first formant frequency, and the phase difference is obtained accurately at low frequencies, so the microcomputer 41 (described later) can accurately identify the sound source angle θ for human voices.
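The benefit of the equilateral layout can be checked numerically. The minimal Python sketch below assumes the 100 mm spacing and the 340 m/s speed of sound used in the text, places the three microphones at the vertices of an equilateral triangle centred on the reference position, confirms that all three pair distances are equal, and computes the largest arrival time difference a single pair can observe (d / c, when sound arrives along the pair's axis). The function names and the assignment of vertices to MICa, MICb and MICc are illustrative assumptions.

```python
import math

SIDE_M = 0.100          # about 100 mm, the example spacing in the text
SPEED_OF_SOUND = 340.0  # m/s, the value the text uses elsewhere

def mic_positions(side=SIDE_M):
    """Vertices of an equilateral triangle in the XY plane, shifted so that
    its centroid (which coincides with the orthocentre) is at the origin.
    Which vertex is MICa, MICb or MICc is an illustrative assumption."""
    h = side * math.sqrt(3) / 2
    pts = [(0.0, 0.0), (side, 0.0), (side / 2, h)]
    cx = sum(x for x, _ in pts) / 3
    cy = sum(y for _, y in pts) / 3
    return [(x - cx, y - cy) for x, y in pts]

def pair_distances(pts):
    """Distances for the pairs (MICa, MICb), (MICb, MICc), (MICc, MICa)."""
    a, b, c = pts
    return [math.dist(a, b), math.dist(b, c), math.dist(c, a)]

mics = mic_positions()
print(pair_distances(mics))          # three equal values (up to rounding)
max_tdiff = SIDE_M / SPEED_OF_SOUND  # largest possible |Tdiff| for one pair
print(f"{max_tdiff * 1e6:.0f} us")   # 294 us
```

At the 20 to 40 kHz sampling frequency described below, 294 µs corresponds to only about 6 to 12 sampling periods.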

Below the board 11, the sound source direction identification device 10 further includes amplifiers AMPa to AMPc, sample-and-hold circuits SHa to SHc, a microcomputer 41, and the like, as shown in FIG. 12. The microphone MICa, the amplifier AMPa, and the sample-and-hold circuit SHa are connected in series in this order. Likewise, the microphone MICb, the amplifier AMPb, and the sample-and-hold circuit SHb are connected in series, as are the microphone MICc, the amplifier AMPc, and the sample-and-hold circuit SHc. The sound source direction identification device 10 thus has three channels, each running from one of the microphones MICa to MICc to the corresponding sample-and-hold circuit SHa to SHc; these are referred to as channels Ach to Cch. Each amplifier AMPa to AMPc amplifies the electrical signal output from the microphone MICa to MICc connected to it and outputs the result to the sample-and-hold circuit SHa to SHc connected to it. The sample-and-hold circuits SHa to SHc hold their input signals in synchronization with a sampling clock signal output from the microcomputer 41 and output the held signals to the microcomputer 41. The frequency of the sampling clock signal, that is, the sampling frequency, is approximately 20 to 40 kHz.

When the robot 1 is powered on and starts up, the microcomputer 41 starts (Process 1), described later, and then repeatedly executes (Process 1) to (Process 8) to identify the sound source direction during a predetermined period. The sound source direction can thus be identified whenever a sound occurs, even though it is not known in advance when a sound will be emitted. The cycle at which (Process 1) is executed is 200 ms or less, so the direction can be identified in response even to a short utterance such as "hey". When the robot 1 is powered off, the microcomputer 41 terminates whichever of (Process 1) to (Process 8) it is executing.

(Process 1) The microcomputer 41 AD-converts the electrical signals output from the sample-and-hold circuits SHa to SHc and stores the results in a separate array for each channel. Specifically, the microcomputer 41 sequentially stores the AD-converted data of each channel, arranged as an array, in its built-in memory.

(Process 2) Once a fixed amount of data has been acquired, the microcomputer 41 performs a fast Fourier transform (FFT) on each of the three channels. Specifically, when a predetermined time has elapsed, long enough for the number of AD-converted samples from each of the sample-and-hold circuits SHa to SHc to reach a predetermined count, the microcomputer 41 applies a fast Fourier transform to the data stored in memory for each channel. The predetermined time is approximately 50 ms to 100 ms. If it were longer than, for example, 200 ms, there would be a noticeable lag between being spoken to and responding, which makes the robot seem less natural. If it were shorter than 50 ms, there would be fewer samples and the accuracy of the direction would drop. Keeping the predetermined time within this range allows smooth communication while ensuring the accuracy of the sound source angle. A frequency analysis by fast Fourier transform is used because a voice contains a mixture of many wavelengths. For the fast Fourier transform, the number of samples is preferably a power of 2, for example about 2^8, 2^9, or 2^10.
The microcomputer 41 stores the complex data of each frequency component obtained by the fast Fourier transform in memory in association with the frequency index assigned to that component. So that the absolute phase calculated in (Process 4), described later, can be resolved to a single phase, the subsequent processing handles only frequencies below 1.7 kHz, the frequency at which the half wavelength equals the distance between microphones; below it, the half wavelength is longer than that distance. This value assumes a speed of sound of 340 m/s.
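The 1.7 kHz limit follows from f < c / (2d) = 340 / (2 × 0.1) = 1700 Hz. The minimal Python sketch below, assuming the 100 mm spacing and 340 m/s given in the text, computes this limit and the frequency indices that fall below it for one example frame (1024 samples at 20 kHz, picked from the ranges stated above). To stay dependency-free it evaluates a direct DFT at only those indices instead of calling an FFT library; the helper names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, as assumed in the text
MIC_SPACING = 0.100     # m, about 100 mm

def cutoff_hz(spacing=MIC_SPACING, c=SPEED_OF_SOUND):
    """Frequency at which half a wavelength, c / (2 f), equals the spacing."""
    return c / (2.0 * spacing)

def retained_bins(n, fs, f_max):
    """Indices k (bin frequency k * fs / n) strictly below f_max.
    Skipping the DC bin k = 0 is an assumption of this sketch."""
    return [k for k in range(1, n // 2 + 1) if k * fs / n < f_max]

def dft_at(samples, bins):
    """Complex spectrum at the requested bins only. A direct DFT stands in
    for the FFT here; both give the same values at these bins."""
    n = len(samples)
    return {k: sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(samples))
            for k in bins}

bins = retained_bins(1024, 20_000, cutoff_hz())
print(round(cutoff_hz()), len(bins))  # 1700 87
```

With 1024 samples at 20 kHz the bin spacing is about 19.5 Hz, so indices 1 to 87 (up to about 1699 Hz) are kept.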

(Process 3) Next, for the frequency components below 1.7 kHz, the microcomputer 41 calculates the power from the complex data for each frequency index obtained by the fast Fourier transform. The power is the square of the real part plus the square of the imaginary part. The microcomputer 41 then stores in memory the frequency indices whose power exceeds a preset threshold; these are hereinafter called voiced frequency indices. A frequency index whose power does not exceed the threshold indicates that there is no voice component at that frequency, so in the subsequent processing the microcomputer 41 processes only the voiced frequency indices.
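A minimal Python sketch of the power test in (Process 3) follows. The text does not say whether the threshold is applied to one channel or to a combination of channels; this sketch assumes a single channel's spectrum, given as a mapping from frequency index to complex value. The function names and the example threshold are hypothetical.

```python
def power(z: complex) -> float:
    """Power as defined in (Process 3): real part squared plus imaginary part squared."""
    return z.real ** 2 + z.imag ** 2

def voiced_indices(spectrum: dict, threshold: float) -> list:
    """Frequency indices whose power exceeds the preset threshold.
    `spectrum` maps frequency index -> complex value for one channel
    (assumption: the text does not say which channel is tested)."""
    return sorted(k for k, z in spectrum.items() if power(z) > threshold)

print(voiced_indices({3: 1 + 1j, 4: 0.1 + 0j, 5: 3 + 4j}, threshold=1.5))  # [3, 5]
```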

(Process 4) Next, the microcomputer 41 calculates the absolute phase from the complex data for each voiced frequency index. As the following formula shows, the calculation covers all four quadrants:
Absolute phase = atan2(imaginary part, real part)
The absolute phase here is referenced to the start time, that is, the time at which the sample-and-hold circuits SHa to SHc first held the real-time data they sample. Because the complex data can lie in any of the four quadrants of the complex plane, the calculated absolute phase ranges from -π to +π.
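In Python the four-quadrant arctangent is `math.atan2`, with the imaginary part as the first argument. The minimal sketch below also checks the "referenced to the start time" remark with an assumed test signal: for a cosine that completes a whole number of cycles in the frame, the DFT bin at that frequency has a phase equal to the cosine's phase at the first sample. Helper names and signal parameters are illustrative.

```python
import cmath
import math

def absolute_phase(z: complex) -> float:
    """Four-quadrant arctangent of (imaginary part, real part), in [-pi, +pi]."""
    return math.atan2(z.imag, z.real)

def bin_value(samples, k):
    """Single DFT bin; its phase is referenced to the first sample (the start time)."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))

# A cosine that is exactly periodic in the frame, with a known starting phase
n, k, phi0 = 64, 5, 0.7
frame = [math.cos(2 * math.pi * k * i / n + phi0) for i in range(n)]
print(round(absolute_phase(bin_value(frame, k)), 6))  # 0.7
```

`cmath.phase(z)` returns the same value as `math.atan2(z.imag, z.real)`.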

(Process 5) Next, for each voiced frequency index, the microcomputer 41 takes the channels two at a time and obtains the phase differences of the three resulting pairs from the absolute phases of the three channels: the phase difference of channel Ach versus channel Bch, of channel Bch versus channel Cch, and of channel Cch versus channel Ach. The phase difference of Ach versus Bch is obtained by subtracting the absolute phase of channel Ach from that of channel Bch; the phase difference of Bch versus Cch, by subtracting the absolute phase of channel Bch from that of channel Cch; and the phase difference of Cch versus Ach, by subtracting the absolute phase of channel Cch from that of channel Ach.
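Because each absolute phase lies in [-π, +π], a raw difference of two of them can lie anywhere in [-2π, +2π]. The text does not state how such values are handled. A common choice, assumed in the minimal Python sketch below, is to wrap each difference back onto (-π, π]; this recovers the true difference uniquely when its magnitude is below π, which is what the 1.7 kHz limit is meant to guarantee for a 100 mm spacing. The dictionary keys are hypothetical labels.

```python
import math

def wrap_pi(x: float) -> float:
    """Map an angle onto (-pi, pi]. This wrapping step is an addition of the sketch."""
    y = math.remainder(x, 2 * math.pi)  # result lies in [-pi, pi]
    return math.pi if y == -math.pi else y

def pair_phase_differences(phi_a: float, phi_b: float, phi_c: float) -> dict:
    """Phase differences for one voiced frequency index, in the subtraction
    order of (Process 5)."""
    return {
        "Ach-Bch": wrap_pi(phi_b - phi_a),  # Bch minus Ach
        "Bch-Cch": wrap_pi(phi_c - phi_b),  # Cch minus Bch
        "Cch-Ach": wrap_pi(phi_a - phi_c),  # Ach minus Cch
    }

print(pair_phase_differences(3.0, -3.0, 0.0))
```

In the example, the raw Ach-Bch difference of -6.0 rad is wrapped to 2π - 6 ≈ 0.283 rad.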

The plus and minus polarities of the sound source angle for one pair of microphones MIC are defined as shown in FIG. 11, which illustrates the pair MICa, MICb out of the three pairs. The sound source angle, the front and back sides, and so on are defined as above. That is, the perpendicular to the line segment connecting the microphones MICa and MICb that passes through its midpoint is called the 0° line. The sound source direction is the direction from the projection position, obtained by projecting the position of the sound source onto the plane containing the triangle ABC along the direction perpendicular to that plane, toward the reference position, which is the orthocenter of the triangle ABC. The sound is treated as a plane wave, and the sound source angle θ is the angle between the 0° line and the direction from the projection position of the sound source to the midpoint of the line segment connecting the microphones MICa and MICb. Relative to the line through the microphones MICa and MICb, the side without the microphone MICc is the front, and the side with the microphone MICc is the back.
For the pair MICa, MICb, the side of the 0° line on which the microphone MICa of channel Ach (the channel subtracted when calculating the phase difference) does not lie is defined as plus, and the side on which the microphone MICa lies is defined as minus. In other words, a positive phase difference means that the microphone MICa is farther from the sound source than the microphone MICb, and a negative phase difference means that the microphone MICb is farther from the sound source than the microphone MICa.
The other pairs are defined in the same way. For the pair MICb, MICc, the side of the 0° line on which the microphone MICb of channel Bch (the subtracted channel) does not lie is plus, and the side on which the microphone MICb lies is minus. For the pair MICc, MICa, the side of the 0° line on which the microphone MICc of channel Cch (the subtracted channel) does not lie is plus, and the side on which the microphone MICc lies is minus. In the following description, the sound source angle θ may be called a direction value.

(Process 6) Next, for each voiced frequency index, the microcomputer 41 calculates the arrival time difference Tdiff from the phase difference and the frequency of that voiced frequency index. Deriving the arrival time difference Tdiff from the phase difference in this way yields an accurate value. An alternative would be to take, as the delay time difference, the time difference between the instants at which the two real-time waveforms (before the fast Fourier transform) each exceed a preset volume threshold. However, such a real-time-waveform method is easily affected by the frequency characteristics of the microphones and by differences in frequency characteristics between the two microphones. For example, if one microphone has poor sensitivity in some frequency band, so that the level in that band drops, its real-time waveform differs from that of the other microphone. The delay time difference then deviates from the actual one, and the accuracy of the arrival time difference Tdiff deteriorates.
In the real-time-waveform method, the choice of threshold also strongly affects the delay time difference: since the two real-time waveforms differ from each other, as described above, the delay time difference varies with the volume threshold. The method is also easily affected by the surrounding environment, for example by sound reflected from walls. By contrast, with the phase-difference method of this embodiment, the frequency characteristics of the microphones and reflected sound are less likely to be reflected in the arrival time difference Tdiff than with the real-time-waveform method, so the arrival time difference Tdiff can be obtained accurately. As described later, the arrival time difference Tdiff is obtained for each frequency component of the voice and the sound source angle θ is obtained from these arrival time differences, so no correlation between frequency components is involved, and the result is less affected by the frequency characteristics of the microphones and the environment.
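The conversion in (Process 6) is Tdiff = Δφ / (2πf), because one full cycle of 2π radians at frequency f lasts 1/f seconds. A minimal Python sketch follows; the function name is hypothetical and the example numbers are illustrative. With a DFT whose kernel is e^(-j2πki/N), a delayed signal has a more negative phase, so with the subtraction order of (Process 5) a positive result means the microphone of the subtracted channel received the sound later, matching the polarity definition above.

```python
import math

def arrival_time_difference(phase_diff_rad: float, freq_hz: float) -> float:
    """Tdiff = delta_phi / (2*pi*f): one full cycle (2*pi rad) lasts 1/f seconds.
    With the subtraction order of (Process 5), a positive value means the
    subtracted channel's microphone received the sound later."""
    return phase_diff_rad / (2.0 * math.pi * freq_hz)

# A 500 Hz component with a phase difference of pi/4 rad
print(round(arrival_time_difference(math.pi / 4, 500.0) * 1e6, 3), "us")  # 250.0 us
```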

After calculating the arrival time difference Tdiff for each voiced frequency index, the microcomputer 41 calculates the weighted average of the arrival time differences Tdiff over all voiced frequency indices. The weight (level) used is √(real part^2 + imaginary part^2) of the frequency index concerned. In the subsequent processing, the microcomputer 41 uses this single weighted-average value rather than the per-index values.
After obtaining one arrival time difference Tdiff for each pair, the microcomputer 41 calculates the distance difference Ddiff from the arrival time difference Tdiff and the speed of sound using (Equation 4). The speed of sound is again taken as 340 m/s. The plus/minus polarity of the phase difference is carried over to the arrival time difference Tdiff and the distance difference Ddiff. Thus, for example, for the microphones MICa and MICb, a positive distance difference Ddiff indicates that the microphone MICa is the farther one, and a negative distance difference Ddiff indicates that the microphone MICb is the farther one.
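A minimal Python sketch of the weighted average and the distance difference for one channel pair follows. Two points are assumptions because this section does not state them: which channel's spectrum supplies the weight √(re² + im²), and the exact form of (Equation 4), taken here as Ddiff = c × Tdiff. Function names and example values are hypothetical.

```python
SPEED_OF_SOUND = 340.0  # m/s, as in the text

def weighted_tdiff(tdiffs: dict, spectrum: dict) -> float:
    """Weighted average of per-index Tdiff values for one channel pair.
    Weight = sqrt(re^2 + im^2) = abs(z) of each voiced index; which channel's
    spectrum supplies z is an assumption (the text does not say)."""
    weights = {k: abs(spectrum[k]) for k in tdiffs}
    return sum(weights[k] * tdiffs[k] for k in tdiffs) / sum(weights.values())

def distance_difference(tdiff: float, c: float = SPEED_OF_SOUND) -> float:
    """Assumed form of (Equation 4): Ddiff = c * Tdiff, keeping the sign of Tdiff."""
    return c * tdiff

t = weighted_tdiff({1: 1e-4, 2: 3e-4}, {1: 3 + 4j, 2: 5j})
print(round(t * 1e6, 3), round(distance_difference(t), 6))  # 200.0 0.068
```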

(Process 7-1) Next, based on which of the three calculated distance differences Ddiff has the largest absolute value and on the plus/minus polarities of the three distance differences, the microcomputer 41 selects the matching row from the six rows of Table 51 shown in FIG. 14.

Table 51 shown in FIG. 14 summarizes FIG. 8 in tabular form. For each pair of channels Ach to Cch, Table 51 indicates which side, front or back, should be used to calculate the sound source angle. Each row of Table 51 corresponds to one of the regions R1 to R6 shown in FIG. 8, and the columns correspond to the front and back of each of the three pairs of channels Ach to Cch. A side that should be used to calculate the sound source angle is marked "○", and a side that should not be used is marked "-".
For example, the first row of Table 51 covers the case in which the pair with the largest distance difference Ddiff is the pair of channels Ach and Cch, and the microphone MICa of channel Ach is the one farther from the sound source. In this case the sound source direction belongs to the region R4 in FIG. 8 and lies on the front of the pair MICb, MICc and on the back of the pair MICa, MICb; accordingly, Table 51 marks "○" at "Bch-Cch front" and "Ach-Bch back". In this case, as described above, the sound source angle calculated from the distance difference DDca of the pair of channels Ach and Cch has poor accuracy, so the microcomputer 41 uses neither the front nor the back side of the pair of channels Ach and Cch to calculate the sound source angle. Table 51 therefore marks "-" at both "Ach-Cch front" and "Ach-Cch back".

Based on the pair of channels with the largest distance difference Ddiff and its polarity, the microcomputer 41 refers to Table 51 and selects, for each pair other than the one with the largest distance difference Ddiff, which side (front or back) is marked "○". For example, if the pair with the largest distance difference Ddiff is channels Ach and Cch and its distance difference Ddiff is positive, the first row of Table 51 matches, so the microcomputer 41 selects the front of channels Bch, Cch and the back of channels Ach, Bch and stores them in memory. The microcomputer 41 then calculates, for each voiced frequency index, the sound source angle θ of each of these two pairs using (Equation 5). The plus/minus polarity of the distance difference Ddiff is carried over to the sound source angle θ. As described above, the sound source angle θ calculated with (Equation 5) indicates the sound source direction from the projection position toward the reference position, under the assumption that the distance from the projection position (the position of the sound source projected onto the XY plane containing the microphones MICa to MICc, along the direction perpendicular to that plane) to the reference position can be approximated by the distance from the sound source to the reference position.
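(Equation 5) itself is not reproduced in this section. Under the plane-wave model stated above, the path difference for a pair with spacing d is d·sinθ, so a natural reading is θ = arcsin(Ddiff / d); the minimal Python sketch below assumes that form. It also illustrates why the pair with the largest |Ddiff| is set aside: the slope of arcsin, 1/√(1 - x²), grows without bound as x approaches ±1, so small errors in Ddiff there produce large angle errors, which is consistent with the poor accuracy noted for that pair. The Table 51 front/back lookup is omitted because only its first row is described here; names and inputs are hypothetical.

```python
import math

MIC_SPACING = 0.100  # m, about 100 mm

def pair_angles(ddiffs: dict, spacing: float = MIC_SPACING):
    """Drop the pair with the largest |Ddiff| and return signed angles (degrees)
    for the other two, assuming (Equation 5) is theta = asin(Ddiff / d).
    Front/back selection via Table 51 is not reproduced here."""
    worst = max(ddiffs, key=lambda pair: abs(ddiffs[pair]))
    angles = {}
    for pair, dd in ddiffs.items():
        if pair == worst:
            continue
        ratio = max(-1.0, min(1.0, dd / spacing))  # clamp noise outside [-1, 1]
        angles[pair] = math.degrees(math.asin(ratio))  # sign follows Ddiff
    return worst, angles

worst, angles = pair_angles({"Ach-Bch": 0.05, "Bch-Cch": -0.02, "Cch-Ach": -0.09})
print(worst, {p: round(a, 2) for p, a in angles.items()})
# Cch-Ach {'Ach-Bch': 30.0, 'Bch-Cch': -11.54}
```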

(Process 7-2) Next, the microcomputer 41 converts the sound source angle θ calculated for each pair into an overall sound source angle whose reference direction is common to all three pairs. Here, as shown in FIG. 15, the 0° line on the front side of the pair MICa, MICc serves as the overall reference direction, and the overall sound source angle is expressed from 0° to 360°, measured clockwise about the midpoint of the line segment connecting the microphones MICa and MICc. FIG. 15 shows the case in which the pair with the largest distance difference Ddiff is the pair of channels Bch and Cch, and the microphone MICc of channel Cch is farther from the sound source than the microphone MICb. Let the sound source angle for the microphones MICa, MICc be +θca and that for the microphones MICa, MICb be +θab. Since +θca lies on the back side and +θab on the front side, the overall sound source angles are 180° − θca and 120° + θab, respectively.
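The two conversions given for FIG. 15 can be expressed as a lookup of an offset and a sign for each pair and side. The minimal Python sketch below encodes only those two entries, because the offsets for the other pairs and sides are not stated in this section; the pair labels are hypothetical names that reuse the channel order of (Process 5). The example inputs are chosen so that both conversions give the same overall angle.

```python
# (pair, side) -> (offset in degrees, sign). Only the two combinations
# stated for the FIG. 15 example are filled in; other entries are unknown here.
CONVERSIONS = {
    ("Cch-Ach", "back"): (180.0, -1.0),   # pair MICa, MICc: 180 deg - theta_ca
    ("Ach-Bch", "front"): (120.0, +1.0),  # pair MICa, MICb: 120 deg + theta_ab
}

def to_overall(pair: str, side: str, theta_deg: float) -> float:
    """Convert a per-pair angle into the overall 0-360 deg angle, measured
    clockwise from the front 0-deg line of the pair MICa, MICc."""
    try:
        offset, sign = CONVERSIONS[(pair, side)]
    except KeyError:
        raise KeyError(f"conversion for {pair}/{side} is not given in the text") from None
    return (offset + sign * theta_deg) % 360.0

print(to_overall("Cch-Ach", "back", 32.0), to_overall("Ach-Bch", "front", 28.0))  # 148.0 148.0
```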

(Process 8) Next, the microcomputer 41 statistically calculates the final sound source direction from the overall sound source angles calculated in (Process 7-2). Specifically, it averages the overall sound source angles calculated in (Process 7-2) to obtain a single sound source direction.
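(Process 8) averages the overall angles. The minimal Python sketch below shows that plain average, together with a circular mean as an alternative that is not described in the text: when overall angles straddle the 0°/360° boundary, the arithmetic mean of, for example, 359° and 1° is 180°, whereas averaging unit vectors gives approximately 0°. Whether such a case matters depends on details not given in this section; the function names are hypothetical.

```python
import math

def arithmetic_mean_deg(angles):
    """Plain average of the overall angles, as (Process 8) describes."""
    return sum(angles) / len(angles)

def circular_mean_deg(angles):
    """Alternative, not from the text: average unit vectors so that angles
    straddling 0/360 deg (e.g. 359 and 1) average to about 0, not 180."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    r = math.degrees(math.atan2(s, c)) % 360.0
    return 0.0 if r == 360.0 else r

print(arithmetic_mean_deg([148.0, 152.0]), round(circular_mean_deg([148.0, 152.0]), 6))  # 150.0 150.0
print(arithmetic_mean_deg([359.0, 1.0]), circular_mean_deg([359.0, 1.0]))
```

For angles well away from the boundary both functions agree; they differ only when the angles wrap around 0°/360°.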

The microcomputer 41 controls the motor that rotates the movable part 3 so that the display device 32 faces the calculated sound source direction. The display device 32 of the robot 1 thus faces the sound source.

The advantages of this embodiment over other methods of identifying the sound source direction are as follows.
One other method uses a plurality of directional microphones and determines the sound source direction from the difference or ratio of their volumes. In that method, the accuracy of detecting the sound source position depends on the directivity of the microphones, whereas this embodiment uses omnidirectional microphones and does not depend on directivity. That method also requires, for example, about 10 directional microphones, whereas this embodiment can identify the sound source direction with three microphones. Furthermore, that method is susceptible to the surrounding environment. For example, if there are walls nearby, sound reflects off them and the microphones pick up indirect sound, so the level differences between the sounds picked up by the microphones become small. Because this embodiment looks at phase rather than volume, the sound source angle can be obtained with high resolution and accuracy.

Here, the sound source direction identification device 10 is an example of a sound source direction identification device, the microcomputer 41 is an example of an identification unit, (Process 1) is an example of sampling processing, and (Process 2) to (Process 8) are examples of identification processing. The overall sound source angle calculated in (Process 7-2) is an example of "a sound source angle, which is the angle between the sound source direction lying in the region determined among the six regions and one of the three perpendicular lines".

The embodiment described above provides the following effects.
The sound source direction identification device 10 includes the three microphones MICa to MICc placed at the vertices of the equilateral triangle ABC, and the microcomputer 41, which, based on the arrival time differences Tdiff of sound from the sound source to the three microphones, identifies the sound source direction from the position obtained by projecting the position of the sound source onto the plane containing the equilateral triangle ABC, along the direction perpendicular to that plane, toward a reference position inside the region enclosed by the equilateral triangle ABC in that plane. The sound source direction identification device 10 can thus identify the sound source direction with the three microphones MICa to MICc. Furthermore, since the three microphones MICa to MICc are placed at the vertices of the equilateral triangle ABC, the calculation for deriving the sound source angle θ can be simplified.

In (Process 7-1), the microcomputer 41 takes two of the microphones MICa to MICc at a time as a pair and, based on the largest of the three arrival time differences Tdiff calculated for the pairs, determines which row of Table 51 the sound source direction matches. The rows correspond to the regions R1 to R6 (FIG. 8), the six regions surrounding the reference position that are delimited by the three perpendicular lines PLa to PLc (FIG. 8), each drawn through the reference position perpendicular to the line through one pair of the microphones MICa to MICc. Based on the arrival time differences Tdiff of the two remaining pairs, excluding the pair with the largest arrival time difference Tdiff, the microcomputer 41 then calculates the sound source angle θ, with front/back and polarity information attached, that places the direction in the determined one of the regions R1 to R6. In the embodiment, the sound source angle θ ranges from 0° to 90°, and the full 360° is covered by attaching the plus/minus polarity and the front/back information to it. In (Process 7-2), the microcomputer 41 converts the sound source angle θ calculated in (Process 7-1) into an overall sound source angle whose reference direction is common to all three pairs. The sound source direction identification device 10 can thus identify the sound source angle θ accurately.

Furthermore, in (Process 6), the microcomputer 41 calculates the arrival time difference Tdiff based on the phase difference. This allows the sound source direction identification device 10 to identify the sound source angle θ accurately.
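For a single frequency component f, a phase difference Δφ maps to a time difference through Tdiff = Δφ / (2πf). The sketch below applies this to one DFT bin using only the standard library. The sampling rate, frame length, bin index and 150 µs delay are hypothetical test values, not figures from the embodiment.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT coefficient X[k] of a real sequence x."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))


def wrap_pi(a):
    """Wrap a phase into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def tdiff_from_phase(x_i, x_j, k, fs):
    """Tdiff = t_i - t_j estimated from the phase difference at bin k."""
    f = k * fs / len(x_i)
    dphi = wrap_pi(cmath.phase(dft_bin(x_j, k)) - cmath.phase(dft_bin(x_i, k)))
    return dphi / (2.0 * math.pi * f)


# Hypothetical test signal: a 1 kHz tone that reaches mic j 150 us after mic i.
fs, n, k = 16000.0, 256, 16          # bin 16 of 256 at 16 kHz is 1000 Hz
f = k * fs / n
t_i, t_j = 0.0, 150e-6
x_i = [math.sin(2 * math.pi * f * (m / fs - t_i)) for m in range(n)]
x_j = [math.sin(2 * math.pi * f * (m / fs - t_j)) for m in range(n)]
print(tdiff_from_phase(x_i, x_j, k, fs))  # close to t_i - t_j = -150e-6
```

The phase measurement is not limited to whole sample periods, so Tdiff can be resolved more finely than the sample spacing. This holds only while |Δφ| stays below π.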

Further, the distance between each pair of the microphones MICa to MICc is approximately 100 mm. This allows the sound source direction identification device 10 to accurately identify the direction of a person who is the sound source.
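A quick calculation shows what a spacing of about 100 mm implies, assuming a speed of sound of 343 m/s (not stated in the source).

- The largest possible Tdiff occurs when the source lies on a pair's axis, giving d / c ≈ 291.5 µs.
- A phase difference stays unambiguous while |Δφ| = 2πf·Tdiff < π. This holds for every direction only when f < c / (2d), about 1715 Hz.
- That range covers a large share of the energy in speech, which is one possible reason such a spacing suits locating a talker.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed value
SPACING = 0.100          # m, microphone spacing (about 100 mm in the text)

# Largest arrival time difference: source on the line through the two mics.
max_tdiff = SPACING / SPEED_OF_SOUND

# Highest frequency whose inter-mic phase difference cannot exceed pi.
max_unambiguous_freq = SPEED_OF_SOUND / (2.0 * SPACING)

print(f"max Tdiff = {max_tdiff * 1e6:.1f} us, "
      f"phase unambiguous below {max_unambiguous_freq:.0f} Hz")
```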

In addition, over a predetermined period, the microcomputer 41 repeatedly executes two steps. The first is (Process 1), which converts the electrical signals output from each of the microphones MICa to MICc into digital values. The second is (Process 2) to (Process 8), which identify the direction based on the digital values converted in (Process 1). This allows the sound source direction identification device 10 to locate the sound source even from a short call and to act on it reliably.

It goes without saying that the present invention is not limited to the embodiment described above, and that various improvements and changes can be made without departing from the spirit of the present invention.
For example, although the microphones MICa to MICc have been described above as being arranged at the vertices of the equilateral triangle ABC, the present invention is not limited to this. Instead of an equilateral triangle, any triangle whose angles are all 90° or less may be used.

Furthermore, although the sound source direction identification device 10 has been described above as including the sample-and-hold circuits SHa to SHc, the present invention is not limited to this. For example, if the microcomputer 41 can sample the output signals of the amplifiers AMPa to AMPc simultaneously, the sample-and-hold circuits SHa to SHc may be omitted.

Further, in the above description, in (Process 4) and (Process 5), the phase difference is calculated by subtracting the absolute phase of the other channel of a pair from the absolute phase of one channel. The calculation is not limited to this. Instead, one channel of the pair may be chosen as the reference, and the other channel may be rotated in the coordinate plane so that the absolute phase of the reference channel becomes 0°. The absolute phase of the other channel after this coordinate rotation is then the phase difference.
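The two ways of obtaining the phase difference can be compared on a pair of complex frequency-bin values. In the sketch, coordinate rotation means multiplying the other channel by the unit-magnitude conjugate of the reference, which rotates the reference onto 0°. Both functions use the same sign convention (other minus reference). The subtraction described first in the text uses the opposite order, which only flips the sign. The rotated form needs no explicit wrapping step, because `cmath.phase` already returns a value in (-π, π].

```python
import cmath
import math


def wrap(a):
    """Wrap a phase into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def phase_diff_subtract(z_ref, z_other):
    """Difference of absolute phases, wrapped back into range."""
    return wrap(cmath.phase(z_other) - cmath.phase(z_ref))


def phase_diff_rotate(z_ref, z_other):
    """Rotate so the reference lies at 0 deg; the other's phase is the difference."""
    rotation = z_ref.conjugate() / abs(z_ref)   # unit-magnitude rotation
    return cmath.phase(z_other * rotation)


# Hypothetical bin values whose raw phase difference needs wrapping.
z_ref = cmath.rect(2.0, 3.0)
z_other = cmath.rect(0.5, -2.9)
print(phase_diff_subtract(z_ref, z_other), phase_diff_rotate(z_ref, z_other))
```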

Further, although it has been described above that the final sound source direction is calculated statistically in (Process 7-2), the present invention is not limited to this. Instead of taking the arithmetic mean of a plurality of sound source directions, it is preferable to use a weighted average that gives each direction a larger weight the closer its angle is to 0°.
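The source does not specify the weighting function. The sketch below assumes a hypothetical weight of cos θ, where θ is a pair's local angle measured from its broadside (0°). One rationale comes from the far-field model sin θ = c·Tdiff/d: there, dθ/dTdiff = c / (d cos θ), so a given timing error moves θ more as θ approaches 90°. Overall angles are combined as a weighted circular mean so that values such as 350° and 10° average to 0° rather than 180°.

```python
import math


def weighted_direction(estimates):
    """Weighted circular mean of (overall_angle_deg, local_angle_deg) pairs."""
    xs = ys = 0.0
    for overall, local in estimates:
        # Hypothetical weight: 1 at a local angle of 0 deg, falling to 0 at 90 deg.
        w = math.cos(math.radians(local))
        xs += w * math.cos(math.radians(overall))
        ys += w * math.sin(math.radians(overall))
    return math.degrees(math.atan2(ys, xs)) % 360.0


# A broadside estimate (local 0 deg) outweighs one taken at a local 60 deg.
print(weighted_direction([(10.0, 0.0), (20.0, 60.0)]))
```

Squaring the weight (cos² θ) would correspond to inverse-variance weighting under the same small-error assumption. Which choice works better in practice is not addressed in the source.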

Further, although the microphones MICa to MICc have been described above as omnidirectional microphones, they are not limited to this and may be unidirectional microphones. Even a unidirectional condenser microphone is not sharply directional. If someone speaks from behind it, that is, from the side opposite the pickup side, it can still pick up the sound, so the difference between omnidirectional and unidirectional microphones is small. In this embodiment, however, each of the microphones MICa to MICc should preferably pick up sound equally well wherever the sound source lies within 360° in the plane containing the microphones. The microphones MICa to MICc are therefore preferably omnidirectional microphones.

Further, the definitions given above, such as those for the front and back of the sound source direction and for the polarity of the sound source angle, are not limiting. They can be defined arbitrarily, as long as the overall sound source direction remains consistent with the calculated phase differences.

Furthermore, although (Process 1) to (Process 8) have been described above as being executed repeatedly while the robot 1 is powered on and the microcomputer 41 is running, the present invention is not limited to this. For example, (Process 1) may be triggered when the volume level exceeds a threshold value. With this configuration, the device can reliably capture sound and reliably act in response to it.
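A level-triggered start can be sketched as a gate on per-frame RMS. The source says only that (Process 1) starts when the volume level exceeds a threshold. The RMS measure, the threshold value and the hypothetical `hold` parameter, which keeps processing for a few frames after the trigger so that a short call is not cut off, are assumptions of this sketch.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(v * v for v in frame) / len(frame))


def frames_to_process(frames, threshold, hold):
    """Indices of frames that would be handed to (Process 1) onward.

    A frame whose RMS exceeds `threshold` (re)starts a window of `hold`
    frames; frames outside any window are skipped.
    """
    selected = []
    remaining = 0
    for idx, frame in enumerate(frames):
        if rms(frame) > threshold:
            remaining = hold
        if remaining > 0:
            selected.append(idx)
            remaining -= 1
    return selected


# Silence, one loud frame, then silence: the loud frame and one more are kept.
frames = [[0.0] * 4, [0.5] * 4, [0.0] * 4, [0.0] * 4, [0.0] * 4]
print(frames_to_process(frames, threshold=0.1, hold=2))
```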

Moreover, although the sound source direction identification device 10 has been described above as being provided in the robot 1, the present invention is not limited to this. For example, the sound source direction identification device 10 may be provided in a movable surveillance camera. With this configuration, the camera can be pointed in the direction identified by the sound source direction identification device 10. The device may also record the sound source direction it has determined, or output that direction to a recording device. Furthermore, the sound source direction identification device 10 may record the sound collected by the microphones MICa to MICc, or output that sound to a processing device such as a PC.

Furthermore, in the above description, (Process 6) obtains a single arrival time difference Tdiff for each set by weighted averaging, and the subsequent processing is performed on it. The present invention is not limited to this. Instead of computing a single Tdiff, the processing from (Process 7-1) onward may be performed for each voiced frequency index in each set. In this configuration, in (Process 8), the microcomputer 41 statistically calculates the final sound source direction from the "(number of voiced indices) × 2" overall sound source angles calculated in (Process 7-2). Specifically, the microcomputer 41 excludes outliers from these overall sound source angles and averages the remainder to obtain a single sound source direction.

Because a sound source angle is obtained for each frequency component and the final angle is derived from their statistics, the result is less affected by errors from frequency components with poor conditions. For example, the frequency characteristics of microphones, amplifiers and similar components vary. The calculated phase differences may therefore include some frequency components with relatively large errors and others with relatively small errors. Determining the final sound source angle from a plurality of frequency components thus yields a more accurate sound source angle θ than determining it from a single frequency component.
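The outlier-excluding average of (Process 8) might look like the following. The source does not name an outlier test. This sketch assumes a median-absolute-deviation rule with a hypothetical factor `k = 3`. Because the inputs are angles, deviations are measured from a circular mean and wrapped to ±180°, so estimates on either side of 0° are handled correctly.

```python
import math
import statistics


def circ_mean(angles):
    """Circular mean of angles in degrees, in [0, 360)."""
    x = sum(math.cos(math.radians(a)) for a in angles)
    y = sum(math.sin(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(y, x)) % 360.0


def robust_direction(angles, k=3.0):
    """Drop outliers with a MAD rule (assumed), then average the rest."""
    ref = circ_mean(angles)
    dev = [(a - ref + 180.0) % 360.0 - 180.0 for a in angles]
    med = statistics.median(dev)
    mad = statistics.median([abs(d - med) for d in dev])
    tol = k * mad if mad > 0.0 else 1e-9   # guard against identical inputs
    inliers = [d for d in dev if abs(d - med) <= tol]
    return (ref + statistics.mean(inliers)) % 360.0


# Five consistent per-frequency estimates and one wild one (hypothetical data).
print(robust_direction([40.0, 41.0, 39.0, 42.0, 38.0, 200.0]))
```

Without the exclusion step, the single 200° estimate would pull the circular mean of these inputs to roughly 45°.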

The scope of the present invention is not limited to the configurations explicitly described in the specification, but also includes combinations of the various aspects of the invention disclosed herein. The configurations for which a patent is sought are specified in the attached claims. However, the applicant intends to claim in the future any configuration disclosed in this specification, even one not currently specified in the claims.
The present invention is not limited to the configurations described in the embodiments above. The components of the embodiments and modifications described above may be selected and combined arbitrarily. Any component of an embodiment or modification may also be combined arbitrarily with any component described in the means for solving the problem, or with a component that embodies any such component. The applicant intends to obtain rights to these matters through amendments or divisional applications.
In addition, the applicant intends to acquire rights to the overall design or to partial designs by filing a conversion application to a design application. Although the drawings depict the entire device in solid lines, they encompass not only the overall design but also partial designs claiming some portions of the device. For example, they encompass as a partial design not only some members of the device but also some portions of the device regardless of member boundaries. A portion of the device may be a member of the device or a part of such a member. The applicant intends to obtain rights not only to the overall design but also to partial designs in which any part of the solid-line portions of the drawings is shown in broken lines.

1 Robot
10 Sound source direction identification device
41 Microcomputer
MICa, MICb, MICc Microphones

Claims

A camera having a function of recording audio collected by a plurality of microphones, and
having a function of identifying a sound source direction based on the sounds collected by the plurality of microphones and recording the identified sound source direction, wherein:
the plurality of microphones are three microphones;
the three microphones are placed at the vertices of a triangle;
the sound source direction is identified based on the differences in arrival time of sound from the sound source to each of the three microphones; and
the interior of the casing in which the three microphones are housed has a structure that allows sound to escape, and each of the three microphones is configured to also pick up sound from behind.

The camera according to claim 1, wherein the three microphones arranged at the vertices of the triangle are all omnidirectional microphones.

The camera according to claim 1 or 2, having a function of calculating the arrival time difference based on a phase difference calculated for each set of two of the three electrical signals output by the three microphones.

The camera according to any one of claims 1 to 3, further having a function of identifying a sound source direction based on the sounds collected by the plurality of microphones and pointing the camera in the identified sound source direction.