JP2007329702A

JP2007329702A - Sound-receiving device and voice-recognition device, and movable object mounted with them

Info

Publication number: JP2007329702A
Application number: JP2006159365A
Authority: JP
Inventors: Seisho Watabe; 生聖渡部; Ryo Murakami; 涼村上; Makoto Kawarada; 誠河原田; Yusuke Nakano; 雄介中野
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2006-06-08
Filing date: 2006-06-08
Publication date: 2007-12-20

Abstract

PROBLEM TO BE SOLVED: To provide a sound receiving device equipped with a display unit that notifies surrounding people of the direction in which the source of received sound exists. SOLUTION: The sound receiving device comprises a microphone 20 whose direction of high directivity (sound receiving direction) can be adjusted; a sound source direction detector 101 that detects the direction in which the source of sound received by the microphone 20 exists; and a display unit 10 that shows the detected sound source direction and the sound receiving direction of the microphone 20 so that they are visible from around a robot 1. Surrounding people can understand which sound source the sound receiving device is receiving. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a sound receiving device that receives sound emitted from a sound source.

Sound receiving devices that can detect the propagation direction of sound received by microphones are known. One known method for this purpose is to arrange a plurality of microphones on a straight line at equal intervals. For example, as shown in FIG. 11, a plurality of microphones 20a to 20f are arranged on a straight line L at an interval d.
If the sound source M is sufficiently far from the microphone group, the sound emitted by the sound source M propagates as a substantially plane wave in the vicinity of the microphone group. (For convenience of illustration, FIG. 11 shows the sound source M close to the microphone group, but in reality it is sufficiently far away. Put another way, the interval d is drawn larger than it actually is.) For example, the sound reaching the microphone 20c has propagated along a path 610, and the sound reaching the microphone 20d has propagated along a path 612. Sound propagating as a plane wave has the same phase everywhere on a wavefront 616 or a wavefront 618. Accordingly, when the sound source M lies in the direction of an angle θ as seen from the microphones 20c and 20d, sound that has travelled along the path 610 and reached the microphone 20c at a certain moment has, at that moment, reached only a point 620 on the path 612; it reaches the microphone 20d after propagating a further distance d · cos θ. The sound reception time difference Δt between the adjacent microphones 20c and 20d is therefore d · cos θ / λ, where λ is the propagation speed of sound. Consequently, detecting the sound reception time difference Δt makes it possible to calculate the angle θ that indicates the direction in which the sound source M exists. The angle θ is measured from the straight line L, and the straight line L is fixed to the sound receiving device. Using the microphone pair 20c, 20d of FIG. 11, the propagation direction of the received sound can thus be detected with reference to the sound receiving device. The above description is not limited to the pair of microphones 20c and 20d and holds for any microphone pair.
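The relation Δt = d · cos θ / λ can be inverted to recover θ from a measured time difference. The following is only an illustrative sketch, not part of the disclosure: the function names, the 0.05 m spacing used in the example, and the default propagation speed of 343 m/s (roughly the speed of sound in air at room temperature) are assumptions.

```python
import math


def time_difference(theta_deg, spacing, speed=343.0):
    """Arrival-time difference between adjacent microphones: d * cos(theta) / lambda.

    `speed` plays the role of lambda (propagation speed of sound) in the text;
    343 m/s is an assumed value for air at room temperature.
    """
    return spacing * math.cos(math.radians(theta_deg)) / speed


def arrival_angle_deg(delta_t, spacing, speed=343.0):
    """Invert delta_t = d * cos(theta) / lambda to obtain theta in degrees."""
    cos_theta = delta_t * speed / spacing
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Hypothetical example: microphones 5 cm apart, source at 60 degrees from line L.
dt = time_difference(60.0, spacing=0.05)
theta = arrival_angle_deg(dt, spacing=0.05)
```

A time difference of zero corresponds to θ = 90 degrees, that is, a source directly in front of the microphone line.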

Some sound receiving devices can switch the direction in which they have good directivity (high sound receiving sensitivity). One known method is to delay the sound signal received by each microphone by a time determined from the direction in which good directivity is desired, and then superimpose the delayed signals.

As shown in FIG. 12, relative to the sound signal received by the microphone 20a, the sound signal received by the microphone 20b is delayed by d · cos θ / λ, that received by the microphone 20c by 2 · d · cos θ / λ, that received by the microphone 20d by 3 · d · cos θ / λ, that received by the microphone 20e by 4 · d · cos θ / λ, and that received by the microphone 20f by 5 · d · cos θ / λ.

Therefore, if the sound signal received by the microphone 20a is delayed by [5 · d · cos θ / λ], that received by the microphone 20b by [4 · d · cos θ / λ], that received by the microphone 20c by [3 · d · cos θ / λ], that received by the microphone 20d by [2 · d · cos θ / λ], and that received by the microphone 20e by [d · cos θ / λ], while the sound signal received by the microphone 20f is not delayed, the phases of those sound signals coincide.
That is, the portions of the sound signals a1, b1, c1, d1, e1, f1 of the sound emitted by the sound source M shown in FIG. 12 at which the received volume is greatest (the portions at the maximum amplitude value [A]) coincide once the above delay processing is executed.
When the sound signals delayed by the above processing are superimposed, a sound signal of high intensity is obtained for sound propagating from the direction of the angle θ, because the phases of the superimposed sound signals coincide.
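The delay-and-superimpose processing described above can be sketched on sampled signals. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the signals are idealized single-sample pulses, the per-microphone delay d · cos θ / λ is assumed to equal a whole number of samples (`step`), and all function names are hypothetical.

```python
def delay(signal, n):
    """Delay a sampled signal by n samples, zero-padding and keeping its length."""
    n = min(n, len(signal))
    return [0.0] * n + list(signal[:len(signal) - n])


def delay_and_sum(channels, step):
    """Superimpose channels after aligning them for one steering direction.

    channels[0] is microphone 20a and channels[-1] is microphone 20f.
    `step` is d * cos(theta) / lambda expressed in whole samples (assumption).
    """
    raw = [-i * step for i in range(len(channels))]
    base = min(raw)  # shift so that every applied delay is non-negative
    delayed = [delay(ch, r - base) for ch, r in zip(channels, raw)]
    return [sum(samples) for samples in zip(*delayed)]


def pulse_at(index, length=64):
    """Idealized sound signal: a unit pulse at one sample position."""
    signal = [0.0] * length
    signal[index] = 1.0
    return signal


step = 3  # assumed inter-microphone delay, in samples, for source M
from_m = [pulse_at(10 + i * step) for i in range(6)]  # arrives later at each mic
from_n = [pulse_at(10) for _ in range(6)]             # frontal source, simultaneous

peak_m = max(delay_and_sum(from_m, step))  # 6.0: the six pulses line up
peak_n = max(delay_and_sum(from_n, step))  # 1.0: the pulses are spread apart
```

With the delays chosen for the direction of the source M, the six pulses from M add up to a single peak, whereas the pulses from the frontal source N are spread over different sample positions and do not reinforce one another.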

Suppose, as shown in FIG. 11, that a sound source N also exists directly in front of the microphones 20a to 20f (angle θ = 90 degrees). In this case, as shown in FIG. 12, the sound emitted by the sound source N reaches the microphones 20a to 20f simultaneously.
If the same delays as above are then applied, that is, [5 · d · cos θ / λ] for the microphone 20a, [4 · d · cos θ / λ] for the microphone 20b, [3 · d · cos θ / λ] for the microphone 20c, [2 · d · cos θ / λ] for the microphone 20d, [d · cos θ / λ] for the microphone 20e, and no delay for the microphone 20f, the phases of the sound signals from the sound source N become misaligned.
That is, the portions of the sound signals a2, b2, c2, d2, e2, f2 of the sound emitted by the sound source N shown in FIG. 12 at which the received volume is greatest (the portions at the maximum amplitude value [B]) are shifted relative to one another by the above delay processing.
When the sound signals delayed by the above processing are superimposed, a sound signal of high intensity is not obtained, because the phases of the superimposed sound signals do not match.
In other words, when the signals are superimposed after the above delay processing, a high-intensity sound signal is obtained from sound propagating from the direction of the angle θ, and no high-intensity sound signal is obtained from sound propagating from other directions. A sound receiving device with strong directivity in the direction of the angle θ is thus realized.

Changing the value of the angle θ used to determine the delay times switches the direction of strong directivity. If the delay times are determined with the angle θ set to 90 degrees, a sound receiving device with strong directivity at 90 degrees is obtained. In this case, sound from the sound source N located in the 90-degree direction is received with high sensitivity, while sound from the sound source M located at another angle is hardly received. Similarly, if the angle θ used to determine the delay times is changed to the angle at which the sound source M exists, sound from the sound source M is received with high sensitivity, while sound from the sound source N located at another angle is hardly received.
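As a rough sketch of how the delay times follow from the chosen angle θ, the helper below returns the delay, in seconds, applied to each of the microphones 20a to 20f. The function name, the 0.05 m spacing in the example, and the 343 m/s propagation speed are assumptions made for illustration only.

```python
import math


def steering_delays(theta_deg, spacing, speed=343.0, mics=6):
    """Delay (seconds) applied to each microphone, 20a first, to steer toward theta.

    `speed` stands for lambda in the text; 343 m/s is an assumed value.
    """
    unit = spacing * math.cos(math.radians(theta_deg)) / speed
    raw = [-i * unit for i in range(mics)]
    base = min(raw)  # shift so that the smallest applied delay is zero
    return [r - base for r in raw]


# theta = 60 degrees: 20a gets five units of delay, 20f gets none.
toward_m = steering_delays(60.0, spacing=0.05)
# theta = 90 degrees: all delays are (numerically almost) zero.
toward_n = steering_delays(90.0, spacing=0.05)
```

For angles above 90 degrees cos θ is negative, so the ordering reverses and the microphone 20f receives the largest delay; the shift by `base` handles both cases.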

When the sound sources M and N are people speaking at the same time, it is difficult to receive both voices simultaneously and recognize them simultaneously. In this case, switching the direction of strong directivity is useful. If the directivity is aimed in the direction of the person N, the voice of the person M is hardly received, and the voice of the person N can be recognized. If the directivity is aimed in the direction of the person M, the voice of the person N is hardly received, and the voice of the person M can be recognized.

Combining the technique for detecting the propagation direction of sound received by microphones with the technique for switching the direction of good directivity makes it possible to find the direction in which a sound source exists and aim the directivity in that direction.
For example, if the direction detection technique reveals that a sound source exists in the 90-degree direction and the directivity is set to 90 degrees, the voice of the person N can be received selectively and recognized. If it reveals that a sound source exists in the θ direction and the directivity is set to θ, the voice of the person M can be received selectively and recognized.

With this technique, however, the direction of the sound receiving device's directivity cannot be seen from outside. For example, if the direction detection technique and the directivity switching technique are incorporated into a navigation device mounted on a vehicle, the device can selectively receive and recognize the driver's voice when the driver speaks, and selectively receive and recognize the front passenger's voice when the front passenger speaks. However, when the driver and the front passenger speak at the same time, neither can tell whose voice is being selectively received and recognized.
A technique is needed that lets people nearby know the sound receiving direction (the direction of strong directivity) in which the sound receiving device is aimed.

When the sound receiving device is mounted on a rotatable object such as a robot, the orientation of the robot's face can let people nearby know the sound receiving direction in which the sound receiving device is aimed.
Patent Document 1 describes a technique that lets people nearby know the direction of high directivity of a sound receiving device by aligning that direction with the orientation of the robot's face.

Japanese Unexamined Patent Application Publication No. 2002-366191 (JP 2002-366191 A)

When the sound receiving device is mounted on a movable body such as a robot, the orientation of the face or the like can let people nearby know the direction of high directivity. With a sound receiving device that does not move, such as a navigation device, however, people nearby cannot be shown the direction of high directivity.

Even the technique of indicating the direction of high directivity by the orientation of the face has a problem. For example, a user may want a robot to operate a group of switches on a switchboard while giving it voice instructions. In this case, the robot must keep its face toward the switchboard while working and cannot turn its face in the direction of high directivity. A technique is needed that shows, independently of the orientation of the face, whether the directivity of the sound receiving device mounted on the robot is aimed at the instructor or elsewhere.

(Invention of Claim 1)
The sound receiving device of the present invention comprises a microphone, sound source direction detecting means for detecting, with reference to the sound receiving device, the propagation direction of sound received by the microphone, and display means for displaying the direction detected by the sound source direction detecting means so that it is visible from around the sound receiving device.
The sound source direction detecting means determines the propagation direction of sound when, for example, sound at or above a threshold volume is received by the microphone. It detects the propagation direction with respect to a reference direction fixed to the sound receiving device, and thereby determines the direction in which the sound source exists relative to the sound receiving device. The sound source direction detecting means may be implemented in software or in hardware. When two or more sound sources exist, it detects the direction of each.
The display means displays the direction of the sound source relative to the reference direction fixed to the sound receiving device. When two or more sound sources exist, it displays the direction of each.

With the sound receiving device of the present invention, people nearby can know the direction of the sound source that the device is receiving. When the driver and the front passenger speak to a navigation device at the same time, they can tell whether both voices are being received or only one. In the latter case, they can tell which voice is being received.
When giving voice instructions to a robot that keeps working with its face turned in a particular direction, the instructor can tell whether the robot has aimed its directivity at the instructor or at another sound source. If the instructor's voice is not being recognized because the directivity is aimed at a nearby noise source, the instructor can see this and can see that a countermeasure such as reducing the noise would be effective. It also prevents the instructor from continuing to give voice instructions to a robot whose directivity is not aimed at the instructor.
Furthermore, when several people surround the robot and speak at the same time, it is clear whose voice the robot is recognizing, and confusion is avoided.

(Invention of Claim 2)
The display means may display the volume received by the microphone for each sound source.
In this case, people nearby can know the directions of the sound sources detected by the sound receiving device and the received volume for each direction.

(Invention of Claim 3)
Preferably, sound source type discriminating means is added that discriminates the type of the source of the received sound based on the frequency components of the sound received by the microphone, and the display means displays the sound source type so discriminated.
The sound source type discriminating means need only distinguish human voices from other sounds, and may be implemented in software or in hardware.
With this sound receiving device, people nearby can know the direction and the type of each sound source recognized by the device. For example, when giving voice instructions to a robot in an environment where a television or radio is producing sound, one can tell whether the robot is receiving the sound of the television or radio or only the live voice.

(Invention of Claim 4)
A plurality of cameras and sound source distance calculating means may be added to the sound receiving device. The cameras image the range from which the microphone receives sound. The sound source distance calculating means calculates the distance between the sound receiving device and the sound source based on the group of images captured by the cameras, and may be implemented in software or in hardware. In this case, the display means preferably also displays the distance calculated by the sound source distance calculating means.
When the sound receiving device and the sound source are shown as marks on the display means, the distance between them may be indicated by the spacing between the marks. The distance may also be shown as text, such as "○○ cm".
With this sound receiving device, people nearby can know the direction of and the distance to each sound source recognized by the device, which makes the sound reception status of the device even easier to grasp.

(Invention of Claim 5)
The present invention can also be embodied as a voice recognition device that uses the sound receiving device of any one of claims 1 to 4. The voice recognition device in this case comprises the sound receiving device of any one of claims 1 to 4, sound source direction fixing means, and voice recognition means. When the sound source direction detecting means detects a plurality of directions, the sound source direction fixing means fixes the sound receiving direction of the microphone to the direction in which the received volume is greatest.
The "sound receiving direction" of the microphone means the direction of strong directivity. "Fixing the sound receiving direction" can take several forms. When the direction of directivity is changed by physically rotating a directional microphone, the sound receiving direction can be fixed by physically holding the microphone in place. When directivity is achieved by processing the outputs of a plurality of stationary microphones, the sound receiving direction can be fixed by fixing the processing parameters.
With conventional techniques, a speaker could not know the direction of the sound source recognized by the voice recognition device, and when a voice instruction did not produce the expected result it was difficult to find out why. With the present device, the sound receiving direction of the sound receiving device is displayed, so that when a voice instruction does not produce the expected result the reason is easier to understand.
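The selection performed by the sound source direction fixing means can be sketched as picking the detected direction with the greatest received volume. The data layout (a mapping from angle in degrees to received volume), the function name, and the example values are hypothetical and serve only to illustrate the rule stated above.

```python
def fix_receiving_direction(detected):
    """Return the detected direction (degrees) with the greatest received volume.

    `detected` maps each detected sound source direction to its received
    volume; this layout is an assumption made for illustration.
    Returns None when no direction has been detected.
    """
    if not detected:
        return None
    return max(detected, key=detected.get)


# Hypothetical detection result: three sources at 30, 90 and 150 degrees.
fixed = fix_receiving_direction({30: 0.4, 90: 0.9, 150: 0.6})  # 90
```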

(Invention of Claim 6)
Preferably, the display means also displays the sound receiving direction of the microphone fixed by the sound receiving direction fixing means.
It suffices that the display means show the sound receiving direction at least while the sound receiving direction of the microphone is fixed; the sound receiving direction may also continue to be displayed while it is not fixed. For example, while the microphone is receiving sound as it switches its sound receiving direction over time, a display that changes with time may be shown.
This lets the user know the sound receiving direction of strong directivity. If the displayed sound receiving direction is not toward the user, the user can recognize that the user's voice is not being received. Necessary measures, such as removing a noise source located in the sound receiving direction of the microphone, are then easy to take.

(Invention of Claim 7)
The present invention is particularly effective when mounted on a movable body. The movable body referred to here can rotate at least around a vertical axis and carries the sound receiving device of any one of claims 1 to 4. The movable body is provided with display means, which displays the angle between a reference direction fixed to the movable body and the direction in which the sound source exists.
With the movable body of the present invention, people nearby can know the direction of the sound source recognized by the sound receiving device mounted on the movable body, and can easily grasp the sound reception status of the device.
Moreover, even when the movable body rotates around the vertical axis, the display means can display the angle between the reference direction and the direction in which the sound source exists, that is, the direction of the sound source relative to the movable body.

(Invention of Claim 8)
Preferably, the following are added: a rotation mechanism that rotates the movable body around a vertical axis relative to the floor; means for fixing the sound receiving direction of the microphone to the direction in which the received volume is greatest when the sound source direction detecting means detects a plurality of directions; voice recognition means for recognizing sound received by the microphone whose sound receiving direction has been fixed; and control means for controlling the rotation mechanism based on the information recognized by the voice recognition means.
The user can know whether the sound receiving device mounted on the movable body has aimed its directivity in the direction in which the user is located. If the directivity is aimed at a sound source other than the user, the user can see this. The user can then work out what to do to keep the directivity of the microphone aimed at the user.

(Invention of Claim 9)
Preferably, the display means of the movable body also displays the sound receiving direction of the microphone fixed by the sound receiving direction fixing means.
This lets the user know the sound receiving direction of the microphone. If the displayed sound receiving direction is not toward the user, the user can tell that the user's voice is not being received. Obstacles to voice input become easier to deal with, for example by removing a noise source located in the sound receiving direction of the microphone.

With the sound receiving device of the present invention, people nearby can know the direction of the sound source that the device is receiving, and can easily grasp the sound reception status of the device.

The main features of the embodiments described below are listed here.
(First form) The sound source direction detecting means recognizes, as a direction in which a sound source exists, each direction at which the volume received by the microphone shows a local maximum as the microphone switches its sound receiving direction.
(Second form) Means is added for determining, from among the sound source directions recognized by the sound source direction detecting means, the sound source direction whose received volume is the greatest. The microphone fixes its sound receiving direction to the determined sound source direction. The display device also displays the fixed sound receiving direction.
(Third form) Means is added for recognizing the language content of speech received by the microphone from the sound source direction detected by the sound source direction detecting means, and the display means also displays the language content recognized by the language content recognizing means.
(Fourth form) Means is added for recognizing the language content of speech received by the microphone from the sound source direction detected by the sound source direction detecting means, and the display means also displays a result indicating whether the language content recognizing means was able to recognize the language content.
(Fifth form) Means is added for generating dialogue speech corresponding to the language content recognized by the language content recognizing means, and the display means also displays the dialogue speech generated by the dialogue generating means.
(Sixth form) Means is added for generating dialogue speech corresponding to the language content recognized by the language content recognizing means, and the display means also displays a result indicating whether the dialogue generating means was able to generate dialogue speech.
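The first form, in which directions where the scanned volume peaks are taken as sound source directions, can be sketched as a local-maximum search over a sweep of the sound receiving direction. The sweep format (a list of angle and volume pairs ordered by angle), the optional threshold, the function name, and the sample values are assumptions for illustration; the two end points of the sweep are not considered in this sketch.

```python
def source_directions(scan, threshold=0.0):
    """Return the angles at which the scanned volume has a local maximum.

    `scan` is a list of (angle_deg, volume) pairs ordered by angle, obtained
    while the sound receiving direction is switched (assumed data layout).
    Peaks at or below `threshold` are ignored.
    """
    found = []
    for k in range(1, len(scan) - 1):
        angle, volume = scan[k]
        higher_than_previous = volume > scan[k - 1][1]
        not_lower_than_next = volume >= scan[k + 1][1]
        if volume > threshold and higher_than_previous and not_lower_than_next:
            found.append(angle)
    return found


# Hypothetical sweep from 0 to 180 degrees in 30-degree steps.
sweep = [(0, 0.1), (30, 0.5), (60, 0.2), (90, 0.8),
         (120, 0.3), (150, 0.6), (180, 0.1)]
directions = source_directions(sweep)              # [30, 90, 150]
loud_only = source_directions(sweep, threshold=0.55)  # [90, 150]
```

The second form would then choose, among the returned directions, the one with the greatest volume (90 degrees in this hypothetical sweep).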

(First embodiment)
A first embodiment, an interactive robot incorporating a voice input device, is described with reference to FIGS. 1 to 8. The front of the interactive robot carries a display that is visible to people around the robot. The display shows the direction of the sound source whose sound the robot is receiving, together with a reference direction fixed to the robot. By looking at the display, people around the robot can know the direction of the sound source as seen from the robot. In the first embodiment, in addition to the direction of the sound source as seen from the robot, the display shows the type of the sound source, the received volume level, and the direction from which the voice input device is receiving sound (the direction in which its directivity is aimed).
FIG. 1 shows an outline of the robot and the sound sources present within the area in which the robot can converse. FIG. 2 is a block diagram showing the configuration of the robot. FIG. 3 illustrates the process of identifying the direction in which a sound source exists. FIG. 4 is a flowchart of the program executed by the control means of the robot. FIGS. 5 to 8 show examples of screens displayed on the display.

As shown in FIG. 1, sound sources such as humans 2 to 4, a television 5, and a mobile phone 6 exist around the humanoid interactive robot 1. Taking the direction directly to one side of the robot 1 as 0 degrees and the direction directly to the opposite side as 180 degrees, the area in front of the robot 1 bounded by 0 degrees and 180 degrees is the angle range in which the robot 1 can respond (the available angle range). Here, human 2 is asking the robot 1, "Where is the toilet?" Humans 3 and 4 are talking loudly. The television 5 is playing at high volume. The mobile phone 6 is ringing.
A display 10 is provided on the front of the robot 1. Above the display 10 is a microphone array 20 (an embodiment of the "microphone" recited in the claims) comprising six microphones 20a to 20f. The microphones 20a to 20f are arranged on a straight line at an interval d on the front of the robot 1. A speaker 30 is provided at the part corresponding to the mouth of the robot 1. The robot 1 is provided with wheel-type moving means 40, which allow it to rotate about a vertical axis on the floor and to move across the floor. The robot 1 also has a built-in controller 300, which controls the operation of the robot 1.
The robot 1 recognizes the language content spoken to it by an interlocutor and responds. When the interlocutor asks, "Where is the toilet?", the robot 1 can move to the toilet and guide the person there.

As shown in FIG. 2, the controller 300 includes control means 100, a voice input interface 21, a display output interface 11, a voice output interface 31, and a moving means drive unit 41.
The control means 100 is connected to the microphone array 20 via the voice input interface 21, to the display 10 via the display output interface 11, to the speaker 30 via the voice output interface 31, and to the moving means 40 via the moving means drive unit 41.
The control means 100 includes a sound source direction detection unit 101, a sound source type determination unit 102, a directivity control unit 103, a language content recognition unit 104, and a dialogue generation processing unit 105.
The voice input interface 21 is connected to the input sides of the sound source direction detection unit 101, the sound source type determination unit 102, and the language content recognition unit 104. The output side of the sound source direction detection unit 101 is connected to the input side of the directivity control unit 103 and to the display output interface 11. The output side of the sound source type determination unit 102 is connected to the input side of the directivity control unit 103 and to the display output interface 11. The output side of the directivity control unit 103 is connected to the display output interface 11 and to the input side of the language content recognition unit 104. The output side of the language content recognition unit 104 is connected to the input side of the dialogue generation processing unit 105 and to the moving means drive unit 41. The output side of the dialogue generation processing unit 105 is connected to the voice output interface 31.

The sound source direction detection unit 101 detects, based on the sound signals obtained by the microphone array 20, the sound source direction, that is, the direction in which a sound source is located relative to the robot 1. The conventional method described with reference to FIG. 11 can be used for this detection: the sound source direction can be obtained by measuring the sound reception time difference Δt for any pair of microphones.
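The relation between the reception time difference and the source angle can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the plane-wave relation Δt = d·cosθ / c from the discussion of FIG. 11, where c is the propagation speed of sound. The function name, the default speed of 343 m/s, and the clamping step are assumptions added for the example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees Celsius


def direction_from_delay(delta_t_s, spacing_m, speed=SPEED_OF_SOUND_M_S):
    """Return the source angle theta in degrees from delta_t = d * cos(theta) / c.

    delta_t_s : reception time difference between two adjacent microphones (s)
    spacing_m : microphone interval d (m)
    """
    ratio = speed * delta_t_s / spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it
    # so that acos stays defined.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))


if __name__ == "__main__":
    d = 0.05  # hypothetical 5 cm spacing
    dt = d * math.cos(math.radians(60.0)) / SPEED_OF_SOUND_M_S
    print(direction_from_delay(dt, d))
```

A zero time difference corresponds to a source directly in front of the array (90 degrees), and the largest possible difference, d / c, corresponds to a source on the line of the array (0 degrees).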

When only one sound source exists, the sound source direction can be detected from the sound reception time difference Δt of any microphone pair. When a plurality of sound sources exist, however, it becomes difficult to determine the sound reception time difference Δt.
The direction of each sound source is therefore detected using the directivity described with reference to FIGS. 11 and 12. The directivity θ is swept over time between 0 degrees and 180 degrees while the received volume is measured continuously. FIG. 3 illustrates the resulting change in received volume. The horizontal axis is the directivity angle θ, and the vertical axis is the volume level received at that directivity. In the graph of FIG. 3, local maxima x1 to x4 (dB) of the volume are observed at 5 degrees, 45 degrees, 90 degrees, and 135 degrees as seen from the robot 1. If the threshold volume level above which the sound source direction detection unit 101 can recognize a sound source is A (dB), then x1, x3, and x4 exceed A while x2 is below A, so the robot 1 detects that some sound source exists in each of the 5-degree, 90-degree, and 135-degree directions. In the case of FIG. 3, the mobile phone 6 located at 45 degrees is too quiet, and the robot 1 does not recognize it as a sound source.
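The thresholded peak search described above can be sketched as follows. This is an illustrative reading under stated assumptions: the scan is taken to be a list of (angle, level) samples at a fixed step, a local maximum is a sample higher than its left neighbour and not lower than its right neighbour, and the end points of the scan are not considered. The function name and the sample values are hypothetical, not taken from FIG. 3.

```python
def find_source_directions(angles_deg, levels_db, threshold_db):
    """Return (angle, level) pairs for local maxima of the scan that exceed the threshold."""
    peaks = []
    for i in range(1, len(levels_db) - 1):
        is_local_max = levels_db[i - 1] < levels_db[i] >= levels_db[i + 1]
        if is_local_max and levels_db[i] > threshold_db:
            peaks.append((angles_deg[i], levels_db[i]))
    return peaks


if __name__ == "__main__":
    # Hypothetical scan from 0 to 180 degrees in 5-degree steps.
    angles = list(range(0, 181, 5))
    levels = [30.0] * len(angles)
    levels[angles.index(5)] = 70.0    # x1: loud talker
    levels[angles.index(45)] = 40.0   # x2: quiet ring tone, below the threshold
    levels[angles.index(90)] = 60.0   # x3
    levels[angles.index(135)] = 65.0  # x4
    print(find_source_directions(angles, levels, threshold_db=50.0))
```

With a threshold A of 50 dB, the peak at 45 degrees is discarded, which mirrors the way the mobile phone 6 is not recognized as a sound source.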

The sound source type determination unit 102 identifies the type of each sound source based on the frequency components of the received sound signal. The type is identified by pattern matching using, for example, an HMM (Hidden Markov Model). The sound source type determination unit 102 determines whether the sound is a human voice, sound from a television, or sound from a CD.

The directivity control unit 103 adjusts the directivity of the microphone array 20 based on the information indicating the direction of each sound source identified by the sound source direction detection unit 101 and the information indicating the type of each sound source identified by the sound source type determination unit 102. Here, from among the sound sources that exist in the directions identified by the sound source direction detection unit 101 and whose type is human voice, the source with the highest volume is selected, and the directivity is set to that source. When the sound source direction detection unit 101 obtains the angle versus volume level characteristic illustrated in FIG. 3, the directivity is set to the 5-degree direction, where human 3 is located.
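The selection rule of the directivity control unit 103 (loudest source among those classified as human voice) can be sketched as follows. The `Source` record, its field names, and the volume figures are assumptions introduced for the example; the patent does not specify a data structure.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Source:
    angle_deg: float  # direction as seen from the robot
    kind: str         # hypothetical type label, e.g. "human" or "tv"
    level_db: float   # received volume level


def pick_loudest_human(sources: Sequence[Source]) -> Optional[Source]:
    """Return the loudest source whose type is human voice, or None if there is none."""
    humans = [s for s in sources if s.kind == "human"]
    return max(humans, key=lambda s: s.level_db, default=None)


if __name__ == "__main__":
    detected = [
        Source(5.0, "human", 70.0),   # human 3, talking loudly
        Source(90.0, "human", 60.0),  # human 2
        Source(135.0, "tv", 65.0),    # television 5
    ]
    target = pick_loudest_human(detected)
    print(target.angle_deg if target else None)
```

With these hypothetical levels the directivity would be set to 5 degrees, which matches the situation described for FIG. 3.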

The language content recognition unit 104 determines the language content of the voice emitted by the sound source to which the directivity has been set. For example, it recognizes the language content of the utterance by human 3, "Did you watch that TV program just now?" Methods for recognizing the language content of speech are common techniques, so a detailed description is omitted.
The dialogue generation processing unit 105 assembles a sound signal representing the voice (reply) to be output from the speaker 30 in response to the language content recognized by the language content recognition unit 104. For example, it assembles a sound signal that outputs the utterance "I'm sorry, but I did not watch it."
The voice output interface 31 causes the speaker 30 to utter the voice according to the sound signal input from the dialogue generation processing unit 105.
The display output interface 11 outputs to the display 10 the direction of each sound source input from the sound source direction detection unit 101, the sound source type input from the sound source type determination unit 102, and the sound receiving direction to which the directivity is set, input from the directivity control unit 103. The display 10 displays this information.
The moving means drive unit 41 drives the moving means 40 in accordance with the content of the voice recognized by the language content recognition unit 104. If the recognized utterance is "Where is the toilet?", the moving means 40 move the robot 1 to the toilet, and the robot 1 guides the questioner there.

Next, the procedure by which the controller 300 controls the robot 1 will be described with reference to FIG. 4. The flowchart of FIG. 4 shows the processing of the program executed by the controller 300. This program is stored in storage means such as a ROM (not shown) provided in the control means 100 of the controller 300, and is read out and executed as needed by a control device such as a CPU (not shown) provided in the control means 100.

In step S10, the controller 300 receives sound while switching (scanning) the directivity (sound receiving direction) of the microphone array 20 over time within the angle range of 0 degrees to 180 degrees. During this step, the display 10 shows a search screen 200 such as that in FIG. 5. The search screen 200 shows the available angle range of the robot 1. Within the available angle range, a position display 201 indicates the position of the robot 1. Auxiliary lines 202 to 206 indicate the 0-degree direction (directly to one side of the robot 1), the 180-degree direction (directly to the opposite side), and the 45-degree, 90-degree, and 135-degree directions. A search display 207 indicates that the direction of high directivity is being scanned. In the search display 207, an arrow centered on the position display 201 of the robot 1 sweeps back and forth, clockwise and counterclockwise, between the auxiliary line 202 and the auxiliary line 206, indicating that sound sources within this range are being searched for.

This search detects the direction in which each sound source exists. The direction is detected as an angle in a frame fixed to the robot 1 (for example, the 0-degree and 180-degree directions are fixed to the robot 1; they do not change unless the robot 1 rotates, and they rotate with the robot 1 when it does). If the robot 1 rotates, the detected direction of a sound source therefore changes even if the sound source itself has not moved. Executing step S10 of FIG. 4 realizes the sound source direction detection unit 101 shown in FIG. 2.

In step S12 of FIG. 4, the controller 300 identifies the type of the sound source in each sound source direction based on the sound signals input from the microphone array 20. Once the type of each sound source has been identified, the display 10 shows the direction, type, and received volume of each sound source, as illustrated in FIG. 6. On the screen 210 shown in FIG. 6, a person mark 218 is displayed in the 5-degree direction as seen from the robot 1, indicating that human 3 has been detected there. A person mark 217 is displayed in the 90-degree direction, indicating that human 2 has been detected there. A mark 216 is displayed in the 135-degree direction, indicating that a television has been detected there. Each mark is displayed at a size proportional to the volume. On the screen 210 shown in FIG. 6, the volume of human 3, indicated by the mark 218, is greater than the volume of human 2, indicated by the mark 217. The marks corresponding to the sound source types are stored in the storage means (not shown) of the controller 300. Once the type of a sound source has been determined, the corresponding mark is read from the storage means, output to the display 10 via the display output interface 11, and displayed on each screen. Executing step S12 of FIG. 4 realizes the sound source type determination unit 102 shown in FIG. 2.

In step S14 of FIG. 4, the controller 300 selects an interlocutor and sets the sound receiving direction (directivity) to that person. On the screen 220 shown in FIG. 7, a directional antenna 221 is displayed to indicate the direction of high directivity. The directional antenna 221 points in the 5-degree direction, showing that the directivity is directed at human 3 (see also FIG. 1), who is located in that direction. The screen 230 shown in FIG. 8 shows a state in which the directional antenna 231 points at human 2, who is located in the 90-degree direction.
The controller 300 delays the sound signals output from the microphones 20a to 20f by the time differences calculated from the direction to which the directivity is set, and then superimposes them, so that sound propagating from that direction is received selectively. The language content recognition unit recognizes the content of the spoken language based on the sound signal received while the directivity is set to the person. Because sound propagating from other directions is largely suppressed, the language content can be recognized from a voice signal with little noise. Executing step S14 of FIG. 4 realizes the directivity control unit 103 of FIG. 2.
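The delay-and-superimpose operation described above is commonly called delay-and-sum beamforming, and a minimal sketch of it follows. It is an illustration under stated assumptions, not the controller's actual code: the microphones are taken to be equally spaced on a line, the source is far enough away for the plane-wave relation of FIG. 11 to hold, delays are rounded to whole samples, and the function names, the sample rate, and the speed of sound are all chosen for the example.

```python
import math


def steering_delays(num_mics, spacing_m, angle_deg, sample_rate_hz, speed=343.0):
    """Delay (in whole samples) for each microphone that aligns a plane wave from angle_deg."""
    cos_theta = math.cos(math.radians(angle_deg))
    # Relative arrival time of the wavefront at microphone i.
    arrival = [i * spacing_m * cos_theta / speed for i in range(num_mics)]
    latest = max(arrival)
    # Delay the early channels so that every channel lines up with the latest one.
    return [round((latest - t) * sample_rate_hz) for t in arrival]


def delay_and_sum(channels, delays):
    """Shift each channel by its delay and average the shifted channels."""
    length = len(channels[0])
    out = [0.0] * length
    for signal, delay in zip(channels, delays):
        for n in range(delay, length):
            out[n] += signal[n - delay]
    return [value / len(channels) for value in out]


if __name__ == "__main__":
    fs, d, mics = 34300, 0.1, 6  # chosen so that d / c is exactly 10 samples
    # Simulated impulse from the 0-degree direction: microphone i hears it 10*i samples later.
    channels = [[1.0 if n == 100 + 10 * i else 0.0 for n in range(300)] for i in range(mics)]
    on_target = delay_and_sum(channels, steering_delays(mics, d, 0.0, fs))
    off_target = delay_and_sum(channels, steering_delays(mics, d, 90.0, fs))
    print(max(on_target), max(off_target))
```

When the array is steered at the true direction, the six impulses add coherently; steered elsewhere, they stay spread out and the averaged peak is much smaller, which is why sound from other directions contributes little to the output.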

Next, the controller 300 proceeds to step S18 of FIG. 4. In step S18, the language content expressed by the voice is recognized. Because the voice signal used here is the one received with the directivity fixed in step S14, the likelihood that recognition is disturbed by noise from sound sources in directions other than the one being recognized is kept low. Executing step S18 realizes the language content recognition unit 104 of FIG. 2.

Next, the controller 300 proceeds to step S20 of FIG. 4. In step S20, the controller 300 determines the language content to be output from the speaker 30 in response to the language content recognized in step S18, and outputs the information for uttering the determined voice to the voice output interface 31. The controller 300 also controls the moving means drive unit 41 in accordance with the language content recognized in step S18. Executing step S20 of FIG. 4 realizes the dialogue generation processing unit 105 of FIG. 2.

Next, the controller 300 proceeds to step S22 of FIG. 4. In step S22, the controller 300 determines whether the power of the robot 1 has been turned off. If the power is off (Yes in step S22), the processing ends. If the power is not off (No in step S22), the processing returns to step S10 and the subsequent steps are repeated.

The description so far has covered the case in which humans 3 and 4 (see FIG. 1) are talking loudly and the robot 1 has set its directivity to human 3, located in the 5-degree direction. In this case, human 3 is not talking to the robot 1, but because human 4 is behind the robot 1, the robot 1 perceives that it is being spoken to and sets its directivity to the direction of human 3. As a result, in response to human 3 saying to human 4, "Did you watch that TV program just now?", the robot 1 outputs the reply "I'm sorry, but I did not watch it."
Suppose, however, that the person who actually wants to talk to the robot 1 is human 2, who is asking the robot 1, "Where is the toilet?" Because the robot 1 replies "I'm sorry, but I did not watch it," no dialogue is established. When human 2 looks at the display 10 on the front of the robot 1, the screen 220 shown in FIG. 7 is displayed, and human 2 can see that the robot 1 has set its directivity toward human 3 because human 3 is speaking loudly. Even though the robot 1 has given a reply unrelated to what human 2 said, human 2 understands the cause and is less likely to feel uncomfortable. Human 2 can also recognize that the robot 1 is not malfunctioning, and can take measures, such as asking human 3 to speak more quietly, so that the robot 1 will recognize human 2's utterance.

The series of steps described above is executed repeatedly. When steps S10 and S12 of FIG. 4 are repeated, the controller 300 again scans the available angle range of the robot 1 from 0 degrees to 180 degrees and again detects the direction, type, and received volume of each sound source. As a result, the screen of the display 10 changes, for example, as shown in FIG. 8. From the screen 230 it can be seen that the volume of human 3 has dropped and that the directional antenna 231 points toward human 2, so the robot 1 has set its directivity to the direction of human 2.
When the directivity is fixed to the direction of greatest volume, it is preferable to fix it to the sound source direction whose average volume over a predetermined past period is greatest. Fixing the directivity according to the average volume makes it possible to keep the directivity on a single speaker.
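One way to realize the preference for average volume is a sliding window of recent scans per direction, as sketched below. The class name, the window length, and the use of a plain arithmetic mean of dB values are assumptions made for illustration; the patent only states that the average over a predetermined past period is used. A direction missing from a scan is simply not updated in this sketch.

```python
from collections import defaultdict, deque


class AverageLoudnessTracker:
    """Keep the last `window` volume readings per direction and pick the highest average."""

    def __init__(self, window):
        # One bounded history per direction; old readings fall out automatically.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, scan):
        """scan maps a direction in degrees to the volume level (dB) measured in one sweep."""
        for angle, level in scan.items():
            self.history[angle].append(level)

    def best_direction(self):
        """Return the direction with the greatest average level, or None before any scan."""
        averages = {a: sum(h) / len(h) for a, h in self.history.items() if h}
        return max(averages, key=averages.get) if averages else None


if __name__ == "__main__":
    tracker = AverageLoudnessTracker(window=3)
    tracker.update({90: 60.0, 5: 50.0})
    tracker.update({90: 60.0, 5: 50.0})
    tracker.update({90: 58.0, 5: 65.0})  # brief loud burst from the 5-degree direction
    print(tracker.best_direction())
```

In this hypothetical sequence the burst at 5 degrees is louder in the latest scan, but the average over the window still favours 90 degrees, so the directivity would stay on the same speaker.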

With the robot 1 of this embodiment, human 2 can learn from the display 10 the direction of each sound source that the robot 1 has recognized, the type of each recognized sound source, and the volume that the robot 1 is receiving. Human 2 can clearly see that the sound receiving direction (directivity) of the microphone has been directed somewhere else, and which sound source caused this. Even when the sound receiving direction is currently directed at human 2, it is possible to predict which sound source might later become a noise source and block human 2's voice input. For example, human 2 can see that another person is within the available angle range of the robot 1; that this person's voice is currently quiet, so the microphone is not directed at them, but the robot 1 has recognized their direction as a sound source direction; and that if their voice becomes louder, the robot 1 is likely to switch the sound receiving direction and human 2's voice input will be blocked. Human 2 can therefore clearly see which sound source must be removed to keep the sound receiving direction of the robot 1's microphone reliably on themselves.

(Second embodiment)
In the display of this embodiment, the distance between the robot and each sound source is displayed in addition to the direction, type, and received volume of the sound source. This embodiment will be described with reference to FIGS. 9 and 10.
FIG. 9 is a block diagram showing the configuration of the robot 1a. FIG. 10 shows an example of a screen displayed on the display.

As shown in FIG. 9, the robot 1a is provided with a stereo camera 50 in addition to the components of the robot 1 shown in FIG. 2. The controller 300a built into the robot 1a is provided with an image input interface 51 in addition to the components of the controller 300 of the robot 1. The control means 100a of the controller 300a is provided with a sound source distance calculation unit 106 in addition to the components of the control means 100 of the robot 1.
The stereo camera 50 is connected to the input side of the sound source distance calculation unit 106 of the control means 100a via the image input interface 51 of the controller 300a. The output side of the sound source distance calculation unit 106 is connected to the input side of the directivity control unit 103 and to the display output interface 11. The other components of the controller 300a and their connections are the same as those of the controller 300, so their description is omitted.

The stereo camera 50 consists of two cameras, one placed in the right eye and one in the left eye of the robot 1a. The images captured by the two cameras show the object from slightly different angles. Each captured image is converted into an image signal and output to the image input interface 51. The sound source distance calculation unit 106, connected to the image input interface 51, calculates the distance between the robot 1a and a sound source based on the input image signals. For example, it focuses on a specific point and calculates the "shift" in that point's position between the image from one camera and the image from the other. The distance between the robot 1a and the sound source is then calculated from the difference in the angles at which the two cameras see the object, which causes the shift, and from the (fixed) spacing between the two cameras. This way of calculating distance is a publicly known technique, so a detailed description is omitted.
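The triangulation from the angle difference and the camera spacing can be sketched as follows. The geometry is an assumption made for the example: both cameras look straight ahead with parallel optical axes, the left camera sits at x = -B/2 and the right at x = +B/2, and each bearing angle is measured from the optical axis, positive toward the right. Under these assumptions tan(θL) - tan(θR) = B / Z, where Z is the distance measured along the forward axis. The function name and the numeric values are hypothetical.

```python
import math


def stereo_depth(baseline_m, angle_left_deg, angle_right_deg):
    """Forward distance Z to a point seen at the given bearings from the two cameras.

    baseline_m      : spacing B between the two cameras (m)
    angle_left_deg  : bearing of the point from the left camera's optical axis
    angle_right_deg : bearing of the point from the right camera's optical axis
    """
    shift = math.tan(math.radians(angle_left_deg)) - math.tan(math.radians(angle_right_deg))
    if shift <= 0.0:
        # No measurable shift means the point is effectively at infinity (or the
        # two bearings were swapped), so no finite distance can be computed.
        raise ValueError("bearings do not describe a point in front of the cameras")
    return baseline_m / shift


if __name__ == "__main__":
    # Hypothetical point 1.0 m ahead and 0.2 m to the right, cameras 0.1 m apart.
    left = math.degrees(math.atan((0.2 + 0.05) / 1.0))
    right = math.degrees(math.atan((0.2 - 0.05) / 1.0))
    print(stereo_depth(0.1, left, right))
```

The smaller the shift between the two views, the larger the computed distance, so small angle errors matter most for distant sources.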

The directivity control unit 103a determines the direction to which the directivity is set (the sound receiving direction) based on the direction of each sound source identified by the sound source direction detection unit 101, the information indicating the type of each sound source identified by the sound source type determination unit 102, the information indicating the received volume, and the information indicating the distance between the robot and each sound source obtained by the sound source distance calculation unit 106. Here, from among the sound source directions recognized by the sound source direction detection unit 101, the sources whose type is human voice are selected, and from these the source closest to the robot 1a is chosen. In this way, human 2, who is talking to the robot 1a, is identified as the interlocutor from among humans 2 to 4, and the directivity is set to the direction of human 2. When distances are equal, the source with the greater received volume is selected, and the directivity is set to its direction.
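The selection rule of the second embodiment (nearest human voice, with the louder source winning a tie in distance) can be sketched as follows. The `Source` record and its field names are assumptions for the example, and the volume figures are hypothetical; the angles and distances follow the values given for the screen 240.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Source:
    angle_deg: float   # direction as seen from the robot
    kind: str          # hypothetical type label, e.g. "human" or "tv"
    level_db: float    # received volume level
    distance_m: float  # distance from the robot


def pick_nearest_human(sources: Sequence[Source]) -> Optional[Source]:
    """Return the closest human-voice source; on equal distance, prefer the louder one."""
    humans = [s for s in sources if s.kind == "human"]
    # Sort key: smaller distance first, then larger volume (hence the negated level).
    return min(humans, key=lambda s: (s.distance_m, -s.level_db), default=None)


if __name__ == "__main__":
    detected = [
        Source(5.0, "human", 70.0, 1.5),   # human 3
        Source(90.0, "human", 60.0, 1.0),  # human 2
        Source(135.0, "tv", 65.0, 1.5),    # television 5
    ]
    target = pick_nearest_human(detected)
    print(target.angle_deg if target else None)
```

With these inputs the directivity goes to human 2 at 90 degrees even though human 3 is louder, which is the behaviour the second embodiment aims for.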

The display output interface 11 receives information on the direction of each sound source, the type of each sound source, the received sound volume, the distance between the robot and each sound source, and the direction in which the directivity of the sound receiving device is pointed, and this information is displayed on the display 10. For example, the screen 240 illustrated in FIG. 10 is displayed. The screen 240 is drawn so that a person at each sound source can recognize how far that sound source is from the robot. The outermost semicircle 241 shown in FIG. 10 represents a distance of 1.5 m from the robot, the semicircle 242 represents a distance of 1.0 m, and the semicircle 243 represents a distance of 0.5 m.
On the screen 240, a human mark indicating the presence of the human 3 (see FIG. 1) is displayed in the direction of 5 degrees from the robot 1 at a distance of 1.5 m (d1 = 1.5). A human mark indicating the presence of the human 2 is displayed in the direction of 90 degrees from the robot 1 at a distance of 1 m (d2 = 1). A mark indicating the presence of the television is displayed in the direction of 135 degrees from the robot 1 at a distance of 1.5 m (d3 = 1.5). Because the directivity is matched to the direction of the human 2, whose sound source type is human and who is closest to the robot 1a, the directional antenna is displayed pointing toward the human 2.
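As an illustration of how a mark could be placed on such a screen, the following sketch converts a detected angle and distance into drawing coordinates relative to the robot at the origin. The coordinate convention (0 degrees along the positive x axis, 90 degrees straight ahead), the `pixels_per_m` scale, and the function names are assumptions made for the example; the specification does not prescribe them.

```python
import math
from typing import Optional, Tuple

# Radii of the three distance semicircles on the screen, in metres.
RING_RADII_M = (0.5, 1.0, 1.5)


def mark_position(angle_deg: float, distance_m: float,
                  pixels_per_m: float = 100.0) -> Tuple[float, float]:
    """Convert (angle, distance) to (x, y) pixels with the robot at (0, 0).

    Assumed convention: 0 degrees lies along +x and 90 degrees along +y.
    """
    theta = math.radians(angle_deg)
    x = distance_m * math.cos(theta) * pixels_per_m
    y = distance_m * math.sin(theta) * pixels_per_m
    return (x, y)


def enclosing_ring(distance_m: float) -> Optional[float]:
    """Return the radius of the smallest semicircle that contains the source,
    or None if the source lies beyond the outermost semicircle."""
    for radius in RING_RADII_M:
        if distance_m <= radius:
            return radius
    return None
```

Under these assumptions, a source at 90 degrees and 1 m is drawn directly ahead of the robot on the 1.0 m semicircle.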

According to the robot 1a of this embodiment, the distance between the robot and each sound source can be displayed on the display 10, so a person who looks at the display can easily grasp the position of each sound source.

The robot 1a may be configured to extract the features of a human face from an image captured by the camera 50 and to respond differently to different people. In this case, the camera 50 is also connected to the input side of the sound source type determination unit 102 via the image input interface 51, and the output side of the sound source type determination unit 102 is also connected to the input side of the dialogue generation unit 105. The storage means (not shown) of the control means 100a stores a personal information database in which personal information is associated with facial features. The sound source type determination unit 102 extracts the facial features from the captured image and reads out the personal information stored in the personal information database for those features. The dialogue generation unit 105 generates an appropriate dialogue based on the personal information that was read out. For example, when a human says "Good morning" to the robot 1a, the robot recognizes who the speaker is and replies "Good morning, Mr. XX".
According to this configuration, even when the same question is asked, the robot 1a can give a finely tailored response that differs depending on the speaker.
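The lookup step described above can be sketched as follows. Facial feature extraction itself is outside the scope of this sketch and is represented by an opaque identifier; the identifier `"face-0001"`, the database layout, the name `"Sato"`, and the function name are all hypothetical and introduced only for illustration.

```python
from typing import Mapping, Optional

# Hypothetical personal information database keyed by a face-feature identifier.
PERSONAL_INFO_DB: Mapping[str, Mapping[str, str]] = {
    "face-0001": {"name": "Sato"},
}


def generate_greeting(utterance: str,
                      face_feature_id: Optional[str],
                      db: Mapping[str, Mapping[str, str]] = PERSONAL_INFO_DB
                      ) -> Optional[str]:
    """Return a reply to "Good morning", personalized when the speaker is known.

    Returns None for any other utterance (this sketch handles one phrase only).
    """
    if utterance != "Good morning":
        return None
    person = db.get(face_feature_id) if face_feature_id is not None else None
    if person is None:
        # Unknown speaker: fall back to an unpersonalized reply (assumed behavior).
        return "Good morning"
    return f"Good morning, {person['name']}"
```

The same utterance therefore yields a different reply depending on which entry, if any, the extracted facial features match.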

In the second embodiment, the direction of the sound source is recognized by processing, in the sound source direction detection unit 101, the sound signal output from the audio input interface 21. However, the direction of the sound source may instead be recognized based on the image signal output from the image input interface 51. In this case, the image input interface 51 is connected to the input side of the sound source direction detection unit 101, and the sound source direction detection unit 101 specifies from the image the direction of the object serving as the sound source.

In the first and second embodiments, the type of the sound source is determined by processing, in the sound source type determination unit 102, the sound signal output from the audio input interface 21. However, the type of the sound source may instead be determined from the image signal output from the image input interface 51. In this case, the image input interface 51 is connected to the input side of the sound source type determination unit 102, and the sound source type determination unit 102 specifies from the image the type of the object serving as the sound source.

In the second embodiment, the distance to the sound source is calculated by processing, in the sound source distance calculation unit 106, the image signal input from the stereo camera 50. However, the distance may instead be calculated from the sound signal received by the microphone 20. In this case, the input side of the sound source distance calculation unit 106 is connected to the audio input interface 21, and the camera 50 and the image input interface 51 are not required.

In the first and second embodiments, the voice input device of the present invention is applied to an interactive robot, but the present invention may be applied to other equipment. For example, it may be applied to a car navigation system (hereinafter abbreviated as "car navigation"). A car navigation system recognizes the voice of a sound source within the limited space of a vehicle cabin. When several people are on board, it is easily affected by sound from noise sources, so it is prone to misrecognizing the spoken content and malfunctioning as a result. If the present invention is applied, the sound source direction recognized by the car navigation system can be displayed on the display, and the dialogue partner can grasp it. Consequently, when the dialogue partner has spoken to the car navigation system but the voice was not input properly (for example, when an inappropriate response is given), the dialogue partner can identify the noise source responsible and deal with it. For example, if the display shows the sound source position at the rear seat and the people sitting in the rear seat are talking loudly, the dialogue partner can ask those people, who are the noise source, to lower their voices.

The display of the first and second embodiments may be configured to display, as text, the result of the language content recognition unit 104 recognizing the voice uttered by the human 2 (for example, "Where is the toilet?"). The dialogue partner can then visually confirm the speech that was actually recognized, which increases the dialogue partner's sense of reassurance.
The display may also be configured to display, as text, a result indicating whether or not the language content recognition unit 104 was able to recognize the content of the voice uttered by the human 2 (recognition OK, recognition NG, and so on). The dialogue partner can then visually confirm whether or not the voice was actually recognized, which increases the dialogue partner's sense of reassurance.
The display may also be configured to display, as text, a result indicating whether or not the dialogue generation processing unit 105 was able to determine the content of the voice (reply) to be output from the speaker 30 in response to the recognized voice content (dialogue generation OK, dialogue generation NG, and so on). The dialogue partner can then visually confirm whether or not a reply was generated, which increases the dialogue partner's sense of reassurance.
The display of the first and second embodiments may also be configured to display, as text, the voice to be output from the speaker 30 as determined by the dialogue generation processing unit 105 (for example, "I will guide you to the toilet."). The dialogue partner can then visually confirm the reply, which increases the dialogue partner's sense of reassurance.

In the first and second embodiments, the sound receiving direction of the microphone is switched by processing the outputs of a plurality of stationary microphones. However, a microphone whose sound receiving direction is switched by physically rotating the microphone may be used instead.
The sound source direction detection unit 101, the sound source type determination unit 102, the directivity control unit 103, the language content recognition unit 104, the dialogue generation processing unit 105, and the sound source distance calculation unit 106 may be implemented in hardware or in software. When implemented in software, the units 101 to 106 correspond to the steps that execute the respective functions of the program (the sound source direction detection function, sound source type determination function, directivity control function, language content recognition processing function, dialogue generation processing function, and sound source distance calculation function).

Specific examples of the present invention have been described in detail above, but these are merely illustrations and do not limit the scope of the claims. The technology described in the claims includes various modifications and alterations of the specific examples illustrated above.
The technical elements described in this specification or the drawings exhibit technical usefulness either alone or in various combinations, and are not limited to the combinations recited in the claims at the time of filing. The technology illustrated in this specification or the drawings achieves a plurality of objects simultaneously, and achieving any one of those objects has technical usefulness in itself.

FIG. 1 shows an outline of the robot and the sound sources present in the area in which the robot 1 can hold a dialogue.
FIG. 2 is a block diagram showing the configuration of the robot 1.
FIG. 3 is a diagram explaining the process by which the robot 1 recognizes the direction of a sound source.
FIG. 4 is a flowchart of the program executed by the control means of the robot 1.
FIG. 5 shows an example of a screen displayed on the display 10.
FIG. 6 shows an example of a screen displayed on the display 10.
FIG. 7 shows an example of a screen displayed on the display 10.
FIG. 8 shows an example of a screen displayed on the display 10.
FIG. 9 is a block diagram showing the configuration of the robot 1a.
FIG. 10 shows an example of a screen displayed on the display 10.
FIG. 11 is a diagram explaining the process of detecting the direction, relative to the sound receiving device, in which the source of the sound received by the microphones exists.
FIG. 12 shows the sound signals of the sound received by each of the microphones 20a to 20f.

Explanation of symbols

1, 1a Robot
2, 3, 4 Human
5 Television
6 Mobile phone
10 Display
20 Microphone
30 Speaker
40 Moving means
50 Camera
100, 100a Control means
300, 300a Controller

Claims

1. A sound receiving device comprising:
a microphone;
sound source direction detecting means for detecting, with reference to the sound receiving device, the propagation direction of the sound received by the microphone; and
display means for displaying the direction detected by the sound source direction detecting means so as to be visible from around the sound receiving device.

2. The sound receiving device according to claim 1, wherein the display means displays the sound receiving volume of the microphone for each sound source.

3. The sound receiving device according to claim 1 or 2, further comprising sound source type discriminating means for discriminating the type of the sound source of the received sound based on the frequency components of the sound received by the microphone,
wherein the display means also displays the type of the sound source discriminated by the sound source type discriminating means.

4. The sound receiving device according to claim 1, further comprising:
a plurality of cameras that capture the range in which the microphone receives sound; and
sound source distance calculating means for calculating the distance between the sound receiving device and the sound source based on the group of images captured by the plurality of cameras,
wherein the display means also displays the distance calculated by the sound source distance calculating means.

5. A speech recognition apparatus comprising:
the sound receiving device according to any one of claims 1 to 4;
means for fixing the sound receiving direction of the microphone to the direction in which the received sound volume is maximized when the sound source direction detecting means detects a plurality of directions; and
speech recognition means for recognizing the sound received by the microphone whose sound receiving direction has been fixed.

6. The speech recognition apparatus according to claim 5, wherein the display means also displays the sound receiving direction of the microphone fixed by the sound receiving direction fixing means.

7. A movable body that is rotatable at least around a vertical axis and on which the sound receiving device according to claim 1 is mounted,
wherein the display means displays the angle formed by a reference direction fixed to the movable body and the direction in which the sound source exists.

8. The movable body according to claim 7, further comprising:
a rotation mechanism that rotates the movable body around the vertical axis with respect to the floor surface;
means for fixing the sound receiving direction of the microphone to the direction in which the received sound volume is maximized when the sound source direction detecting means detects a plurality of directions;
speech recognition means for recognizing the sound received by the microphone whose sound receiving direction has been fixed; and
control means for controlling the rotation mechanism based on information recognized by the speech recognition means.

9. The movable body according to claim 8, wherein the display means also displays the sound receiving direction of the microphone fixed by the sound receiving direction fixing means.