JP2008040075A

JP2008040075A - Robot apparatus and control method of robot apparatus

Info

Publication number: JP2008040075A
Application number: JP2006213195A
Authority: JP
Inventors: Susumu Shimizu; 奨清水
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2006-08-04
Filing date: 2006-08-04
Publication date: 2008-02-21
Anticipated expiration: 2026-08-04
Also published as: JP4821489B2

Abstract

PROBLEM TO BE SOLVED: To provide a robot apparatus, and an operation control method therefor, that improve the voice recognition rate.

SOLUTION: The robot apparatus includes: voice input/output paths A1 to A4, each assigned a priority; a voice signal switching device 11 that switches among and selects the voice input/output paths A1 to A4 based on the priority; a voice recognition module 12 that recognizes the input voice; and a voice generation module 13 that generates voice based on the voice recognition result. The voice recognition module 12 includes an S/N ratio determination means that determines whether the S/N ratio of the voice from a voice input/output path is equal to or greater than a predetermined threshold, and a recognition ratio determination means that determines whether the voice recognition ratio of that voice is equal to or greater than a predetermined threshold. The voice signal switching device 11 includes a priority updating means that, based on the determination results of the S/N ratio determination means and the recognition ratio determination means, updates the priorities of the variable-priority paths A2 to A4 among the voice input/output paths A1 to A4.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a robot apparatus that operates autonomously based on a voice recognition result, and to a method for controlling the robot apparatus.

Some conventional autonomous mobile robots use speech as a teaching signal: the robot recognizes the speech and performs operations such as movement based on the recognition result. With current speech recognition technology, however, recognition accuracy can suffer depending on the surrounding environment, so that speech is not recognized at all or is misrecognized. Background noise must therefore be sufficiently suppressed to improve the recognition rate. Techniques that suppress noise by signal processing, for example with a microphone array, have been developed, but even with these the recognition rate drops once the speaker is only a few meters away from the microphone. Relying solely on microphones built into the robot therefore has limits when the goal is to give the robot appropriate voice instructions regardless of the environment.

When the robot's built-in microphone cannot recognize speech, one option is to install a separate microphone (hereinafter, an external microphone) around the robot, either to collect background noise or to pick up sound at a position away from the robot. If the external microphone is installed at a fixed location, however, the robot's range of movement is restricted, so the operator's voice is instead transmitted to the robot wirelessly.

If voice instructions to the robot come only from the operator, the operator's voice can simply be fed directly to the robot by radio or similar means. With this approach, however, the robot cannot accept voice input from anyone other than the operator. It is therefore desirable for the robot to recognize the voices of both the operator and people near the robot. Conventionally, however, no means had been established for switching between the voice signal transmitted wirelessly to the robot and the built-in microphone, and the switching was done manually.

In response, Patent Document 1 discloses a robot apparatus that performs voice recognition based on input from a built-in microphone and on voice information received via wireless communication. The robot apparatus of Patent Document 1 includes receiving means that receives the content of a user's utterance transmitted via wireless communication, and voice recognition means that recognizes the utterance content based on the voice information received by the receiving means, either instead of or together with the voice content input from a built-in microphone device incorporated in the robot apparatus. This configuration yields a robot apparatus that can interact with a user smoothly and naturally.

Patent Document 1: Japanese Patent Laying-Open No. 2005-202075

However, when the robot apparatus of Patent Document 1 has a plurality of built-in microphones, it cannot determine which of them should be used for voice recognition. One conceivable approach is to try the microphones one after another and use the first one through which speech can be recognized; the voice recognition rate could be improved, however, if the optimum microphone could be identified and selected.

The present invention has been made in view of these circumstances, and its object is to provide a robot apparatus, and a control method therefor, that can improve the speech recognition rate.

A robot apparatus according to the present invention operates autonomously based on a voice recognition result and comprises: first and second voice input means, each assigned a priority; voice signal switching means that switches between and selects the input from the first or second voice input means based on the priorities; voice recognition means that recognizes the voice input from whichever of the first and second voice input means the voice signal switching means has selected; and action selection means that selects an action based on the voice recognition result of the voice recognition means. The voice signal switching means updates the priority of the first and/or second voice input means based on a determination of whether the signal-to-noise ratio of the voice from the voice input means is equal to or greater than a predetermined threshold.

In the present invention, because the priority used to select a voice input means is updated according to whether its signal-to-noise ratio (S/N ratio) is good, a voice input means with a good S/N ratio can be selected automatically.
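As a rough illustration of this idea, the Python sketch below keeps the variable-priority inputs in an ordered list (index 0 = highest priority), moves an input one place down when its measured S/N ratio falls below a threshold, and selects the highest-priority input that currently carries voice. The list representation, the one-step demotion rule, the path labels, and the 10 dB threshold are assumptions made for illustration; the document specifies none of them.

```python
from typing import Iterable, List, Optional

# Assumed threshold; the document does not give a numeric value.
SNR_THRESHOLD_DB = 10.0


def demote(order: List[str], path: str) -> List[str]:
    """Return a copy of `order` with `path` moved one place down (index 0 = highest)."""
    new_order = list(order)
    i = new_order.index(path)
    if i < len(new_order) - 1:
        new_order[i], new_order[i + 1] = new_order[i + 1], new_order[i]
    return new_order


def update_by_snr(order: List[str], path: str, snr_db: float,
                  threshold_db: float = SNR_THRESHOLD_DB) -> List[str]:
    """Demote `path` if its S/N ratio is below the threshold; otherwise keep the order."""
    if snr_db < threshold_db:
        return demote(order, path)
    return list(order)


def select(order: List[str], active: Iterable[str]) -> Optional[str]:
    """Return the highest-priority path that currently carries input, or None."""
    active_set = set(active)
    for path in order:
        if path in active_set:
            return path
    return None


if __name__ == "__main__":
    order = ["A2", "A3", "A4"]               # hypothetical initial order
    order = update_by_snr(order, "A2", 4.0)  # A2 measured at 4 dB, so it is demoted
    print(order)                             # ['A3', 'A2', 'A4']
    print(select(order, {"A2", "A3"}))       # A3
```

Swapping with the next lower entry, rather than decrementing a number, is one way to avoid ties between paths; a numeric scheme would work equally well.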

A robot apparatus according to the present invention moves autonomously based on a voice recognition result and comprises: first and second voice input means, each assigned a priority; voice signal switching means that switches between and selects the input from the first or second voice input means based on the priorities; voice recognition means that recognizes the voice input from whichever of the first and second voice input means the voice signal switching means has selected; and action selection means that selects an action based on the voice recognition result of the voice recognition means. The voice signal switching means updates the priority of the first and/or second voice input means based on a determination of whether the voice recognition rate of the voice from the voice input means is equal to or greater than a predetermined threshold.

In the present invention, because the priority used to select a voice input means is updated according to whether the voice recognition rate is good, a voice input means that delivers voice with a good recognition rate can be selected automatically.

A robot apparatus according to the present invention moves autonomously based on a voice recognition result and comprises: first and second voice input means, each assigned a priority; voice signal switching means that switches between and selects the input from the first or second voice input means based on the priorities; voice recognition means that recognizes the voice input from whichever of the first and second voice input means the voice signal switching means has selected; action selection means that selects an action based on the voice recognition result of the voice recognition means; S/N ratio determination means that determines whether the signal-to-noise ratio of the voice from the voice input means is equal to or greater than a predetermined threshold; recognition rate determination means that determines whether the voice recognition rate of the voice from the voice input means is equal to or greater than a predetermined threshold; and priority updating means that updates the priority of the first and/or second voice input means based on the determination results of the S/N ratio determination means and the recognition rate determination means.

In the present invention, because the priority used to select a voice input means is updated according to both the S/N ratio and the voice recognition rate, a voice input means with a high voice recognition rate can be selected automatically.

The priority updating means may update the priority when the S/N ratio is below a predetermined threshold and when the voice recognition rate is below a predetermined threshold. This lowers the priority of a voice input means whose voice is unlikely to be recognized.

Further, the first voice input means may be voice receiving means that receives a radio signal, and the second voice input means may be one or more microphones, with the initial priority of the first voice input means set higher than that of the second voice input means. In the initial setting, input carried by the radio signal is then always selected.

Furthermore, the priority of the voice receiving means may be a fixed value, and the priority updating means may update the priorities of the plurality of microphones, based on the signal-to-noise ratio and/or voice recognition rate of the voice signals they deliver, within a range that does not exceed that fixed value. Because the radio signal is then always selected whenever it carries input, it can be used, for example, to send important instructions.
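The fixed-versus-variable arrangement can be sketched with numeric priorities, where larger numbers win. In this hypothetical `PriorityTable`, fixed entries are never modified and variable entries are clamped so that they stay strictly below the lowest fixed value. Keeping them strictly below, rather than merely not above, is an assumption made here so that a fixed path can never tie with a microphone path. The labels and numbers are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class PriorityTable:
    fixed: Dict[str, int]     # e.g. the wireless path; never updated
    variable: Dict[str, int]  # e.g. the built-in microphones; updated at run time

    def ceiling(self) -> int:
        # Assumption: variable priorities stay strictly below the lowest fixed
        # priority, so a fixed path always wins when it carries input.
        return min(self.fixed.values()) - 1

    def adjust(self, path: str, delta: int) -> None:
        """Change a variable priority by `delta`, clamped to the ceiling."""
        if path in self.fixed:
            return  # fixed priorities are never modified
        self.variable[path] = min(self.variable[path] + delta, self.ceiling())

    def priority(self, path: str) -> int:
        return self.fixed[path] if path in self.fixed else self.variable[path]

    def select(self, active: Iterable[str]) -> Optional[str]:
        """Highest-priority active path; ties are broken by label for determinism."""
        candidates = sorted(p for p in active if p in self.fixed or p in self.variable)
        if not candidates:
            return None
        return max(candidates, key=self.priority)
```

For example, with `PriorityTable(fixed={"A1": 10}, variable={"A2": 3, "A3": 2, "A4": 1})`, raising A2 by 20 leaves it at 9, so `select({"A1", "A2"})` still returns `"A1"`.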

The priorities set for the voice input means may be reset to their initial values when a reset signal is input or at a predetermined timing.

A control method according to the present invention controls a robot apparatus that moves autonomously based on a voice recognition result, and comprises: a voice input selection step of switching between and selecting the input from first and second voice input means based on the priorities set for them; a voice recognition step of recognizing the voice input from the selected first or second voice input means; and an action selection step of selecting an action based on the result of the voice recognition step. The voice input selection step includes an S/N ratio determination step of determining whether the signal-to-noise ratio of the voice from the voice input means is equal to or greater than a predetermined threshold and/or a recognition rate determination step of determining whether the voice recognition rate of that voice is equal to or greater than a predetermined threshold, together with a priority updating step of updating the priority of the first and/or second voice input means based on the results of the S/N ratio determination step and/or the recognition rate determination step.

In the present invention, the priority order used to select one of a plurality of voice input means can be changed according to the S/N ratio or the recognition rate, so that an input with a good S/N ratio or recognition rate is selected automatically.

According to the present invention, a robot apparatus capable of improving the speech recognition rate, and an operation control method therefor, can be provided.

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings. FIG. 1 is a perspective view of a biped walking robot according to an embodiment of the present invention. As shown in FIG. 1, the robot 1 has a head unit 1a, left and right arm units 1b, and left and right leg units 1d connected at predetermined positions to a trunk unit 1c. Although this embodiment is described as a biped walking robot, the robot may instead walk on four legs or the like, or its legs may be replaced by wheels or the like.

FIG. 2 is a block diagram of the robot according to this embodiment. The robot 1 includes a control unit 101, an input/output unit 102, a drive unit 103, a power supply unit 104, an external storage unit 105, and so on.

The input/output unit 102 includes a camera 121, such as a CCD (Charge Coupled Device) camera, for capturing images of the surroundings; one or more built-in microphones 122 for collecting surrounding sound; a speaker 123 for outputting voice to converse with the user; LEDs 124 for expressing responses, emotions, and the like to the user; and a sensor unit 125 including a touch sensor and the like.

The drive unit 103 includes motors 131 and drivers 132 that drive the motors, and moves the leg units 1d and arm units 1b according to user instructions and the like. The power supply unit 104 includes a battery 141 and a battery control unit 142 that controls its charging and discharging, and supplies power to each unit.

The external storage unit 105 is a removable HDD, optical disk, magneto-optical disk, or the like. It stores various programs, control parameters, and so on, and supplies these programs and data as needed to a memory (not shown) in the control unit 101.

The control unit 101 includes a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), a wireless communication interface, and the like, and controls the various operations of the robot 1. Running a control program stored, for example, in the ROM, the control unit 101 provides a voice recognition module 12 that performs voice recognition, a voice utterance module 13 that speaks to the user based on the recognition result, an image recognition module 14 that analyzes images captured by the camera 121, an action determination module 15 that selects the action to take based on the various recognition results, and so on.

The robot 1 according to this embodiment recognizes speech, such as a call from a user or operator, and operates based on the recognition result. The operation control method of the robot 1 is described in detail next.

FIG. 3 is a block diagram showing only the parts of the robot according to this embodiment that relate to voice recognition. As shown in FIG. 3, the robot 1 has, as part of its input/output unit, voice input/output paths made up of a plurality of built-in microphones and speakers. Specifically, this embodiment has three voice input/output paths A2 to A4: path A2, which uses the front built-in microphone 21 and speaker 22; path A3, which uses the left built-in microphone 41 and a speaker; and path A4, which uses the right built-in microphone 51 and a speaker. The number of such paths is not limited to three. The robot 1 also has a wireless LAN 31, and thus a voice input/output path A1 through which an operator can input voice from outside via the wireless LAN 31.

The control unit 101 of the robot 1 also has a voice signal switch 11 that selects which of the voice input/output paths A1 to A4 to use. The voice signal switch 11 selects one of the paths A1 to A4 and connects it to the voice recognition module 12.

This embodiment describes the case in which customers input voice through the built-in microphones 21, 41, and 51, and an operator inputs voice via the wireless LAN 31. The operator controls the wireless LAN 73 using, for example, a remote-operation PC (Personal Computer) 71 equipped with a microphone 74 and a speaker 75, and inputs teaching signals to the robot 1. For instance, the operator can enter voice information through a headset, a tie-pin microphone and earphone, or similar input means connected to the remote-operation PC 71, or enter text through input means such as a keyboard. Meanwhile, the robot 1 can converse with customers using the built-in microphones 21, 41, and 51 and the speakers 22, 42, and 52.

The voice signal switch 11 in the robot 1 selects one of the voice input/output paths A1 to A4 based on the priority set for each path. In this embodiment, the priority of path A1, through which voice arrives via the wireless LAN 31, is fixed at the highest level, whereas the priorities of the built-in-microphone paths A2 to A4 are variable.

FIG. 4 is a block diagram showing the voice signal switch 11 and the voice recognition module 12 in detail. As shown in FIG. 4, the voice signal switch 11 has a priority update unit 201, which updates the priorities of the built-in-microphone paths A2 to A4 by the method described below.

The voice recognition module 12 has an S/N ratio determination unit 211 and a voice recognition rate determination unit 212. The S/N ratio determination unit 211 judges whether the S/N ratio of the input voice is good, and the voice recognition rate determination unit 212 judges whether the recognition rate of the input voice is good. The priority update unit 201 updates the priorities of paths A2 to A4 based on the S/N ratio determination result from unit 211 and the recognition rate determination result from unit 212.

With priorities set this way, whenever the operator inputs voice, the high-priority path A1 is always selected, even if a customer is speaking at the same time. When there is no voice input from the operator and a customer speaks to the robot 1, whichever of paths A2 to A4 has the highest priority at that moment is selected. In this example, the initial setting is: priority of A2 > priority of A3 > priority of A4.

Next, the priority update methods are described. The first method lowers a path's priority when the signal-to-noise ratio (S/N ratio) of its voice is below a predetermined value: a low S/N ratio means a low probability of recognizing the voice correctly, so accurate voice instructions may not get through. The S/N ratio determination unit 211 of the voice recognition module 12 therefore determines whether the S/N ratio of the input voice signal is below a predetermined threshold and notifies the priority update unit 201 of the result, and the priority update unit 201 resets the priorities based on that result.
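The document does not say how the S/N ratio is measured. One common estimate, assumed here, compares the mean power of the utterance with that of a noise-only stretch of the same input: SNR_dB = 10 log10(P_signal / P_noise), with the noise power subtracted from the utterance power because the utterance segment contains speech plus noise. How the noise-only segment is obtained (for example, just before speech onset) is also an assumption.

```python
import math
from typing import Sequence

_EPS = 1e-12  # guards against log(0) and division by zero


def mean_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Estimate the S/N ratio in dB.

    `speech` is assumed to hold speech plus background noise, and `noise` a
    noise-only stretch of the same input, so the noise power is subtracted
    from the speech-segment power before taking the ratio.
    """
    p_noise = mean_power(noise)
    p_signal = max(mean_power(speech) - p_noise, _EPS)
    return 10.0 * math.log10(p_signal / max(p_noise, _EPS))


def snr_ok(speech: Sequence[float], noise: Sequence[float], threshold_db: float) -> bool:
    """True if the estimated S/N ratio is at or above the threshold."""
    return snr_db(speech, noise) >= threshold_db
```

A segment whose total power is 1.01 measured against noise of power 0.01 gives 10 log10(1.0 / 0.01) = 20 dB.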

The second method lowers a path's priority when the voice recognition rate is at or below a predetermined threshold. Known voice recognition techniques can be used, such as DP (Dynamic Programming) matching or an HMM (Hidden Markov Model), and the likelihood computed by such a probability model can serve as the voice recognition rate. In this case, the voice recognition rate determination unit 212 of the voice recognition module 12 determines whether the recognition rate is below a predetermined threshold and notifies the priority update unit 201 of the result, and the priority update unit 201 resets the priorities based on that result.
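For the HMM case, the likelihood of an observation sequence is typically computed with the forward algorithm. The sketch below implements the scaled forward algorithm for a discrete-observation HMM and compares the per-frame log-likelihood with a threshold. Discrete symbols keep the example short; practical recognizers usually use continuous (for example Gaussian-mixture) emission densities over acoustic feature vectors. Dividing by the number of frames, which makes the score less dependent on utterance length, and the threshold itself are assumptions, not details from the document.

```python
import math
from typing import List, Sequence, Tuple

# (start, trans, emit) for a discrete-observation HMM.
Model = Tuple[List[float], List[List[float]], List[List[float]]]


def forward_log_likelihood(obs: Sequence[int], model: Model) -> float:
    """log P(obs | model), computed with the scaled forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    start, trans, emit = model
    n = len(start)
    log_lik = 0.0
    alpha: List[float] = []
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[i] * emit[i][symbol] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik


def recognition_ok(obs: Sequence[int], model: Model, threshold_per_frame: float) -> bool:
    """Compare the per-frame log-likelihood with a threshold (assumed normalization)."""
    return forward_log_likelihood(obs, model) / len(obs) >= threshold_per_frame
```

Summing the logarithms of the per-step scale factors yields exactly log P(obs | model), while keeping every intermediate value in a safe numeric range.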

In this embodiment, the priority update unit 201 is located in the voice signal switch 11, and the S/N ratio determination unit 211 and the voice recognition rate determination unit 212 are located in the voice recognition module 12. The priority update unit, S/N ratio determination unit, and voice recognition rate determination unit may, however, be provided as separate units.

Next, the priority update method of the voice signal switch 11 is described in detail with reference to FIG. 5, a flowchart of the priority update method according to this embodiment. First, initial priorities for paths A1 to A4 are set in the voice signal switch 11 (step S1). As described above, these are set so that path A1 is selected preferentially when voice arrives on several paths at the same time. Because this embodiment has several built-in-microphone paths, their priorities are set individually.

When there is voice input (step S2: Yes), the input with the highest priority at that time is connected to the voice recognition module 12 (step S3). In this embodiment, path A1 via the wireless LAN 31 always has the highest priority, and the priorities of the built-in-microphone paths A2 to A4 are updated only within a range that does not exceed that of A1. Hence, when voice arrives via the wireless LAN 31, the input from path A1 is supplied to the voice recognition module 12 even if the built-in microphone 21 or another microphone is also receiving voice. In other words, the operator can intervene at any time, even while the robot 1 is conversing with a customer. For example, if during such a conversation the robot 1 misrecognizes a customer's instruction and starts a dangerous motion, the operator can order it to stop with a voice command. Likewise, if background noise in the conversation environment is so loud that voice recognition fails, the operator can give instructions on the customer's behalf.

After voice is input, the S/N ratio determination unit 211 determines whether the S/N ratio of the input voice is equal to or greater than a predetermined threshold (step S4). If the S/N ratio is below the threshold, recognition of that input may, for example, be abandoned, and the voice utterance module 13 may ask the speaker to repeat the command.

If the S/N ratio is at or above the threshold (step S4: Yes), the voice recognition rate determination unit 212 determines whether the voice recognition rate is at or above a predetermined threshold (step S5). If the recognition rate, for example the likelihood computed by the probability model, is at or above the threshold, the robot acts on the recognition result, for example by speaking through the voice utterance module 13 or by performing a motion such as waving its hand (step S6).

If there is no voice input (step S2: No) but a reset signal has been input (step S7: Yes), the voice signal switch 11 resets the priorities of paths A1 to A4 to their initial values. If there is voice input but the S/N ratio is below the threshold (step S4: No) or the recognition rate is below the threshold (step S5: No), the priority of that path is lowered if it is variable (step S9: Yes); if it is fixed, processing returns to step S2. Making the priorities of the built-in-microphone paths A2 to A4 variable means that the path of the built-in microphone closest to the customer, which gives the best S/N ratio, is selected automatically. When the robot 1 is surrounded by customers who each address it with a good S/N ratio, the path of the built-in microphone on the side of the customer whose speech yields the highest probability-model likelihood, that is, whose pronunciation is easiest to recognize, is selected automatically.
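One pass through the flowchart can be sketched as a single function. It assumes that A1 is the only fixed path, that the variable paths are held as an ordered list (index 0 = highest) with initial order A2 > A3 > A4, that "lowering the priority" means moving a path one place down, and that a caller-supplied `measure` function (hypothetical) returns the S/N ratio and recognition likelihood of the selected input. The step labels in the comments follow the flowchart as described in the text.

```python
from typing import Callable, List, Optional, Set, Tuple

FIXED = ["A1"]                          # wireless LAN path: fixed, always highest
INITIAL_VARIABLE = ["A2", "A3", "A4"]   # built-in microphones, initially A2 > A3 > A4

# measure(path) -> (S/N ratio in dB, recognition likelihood) for the selected input
Measure = Callable[[str], Tuple[float, float]]


def control_step(variable: List[str], active: Set[str], reset: bool,
                 measure: Measure, snr_th: float,
                 rec_th: float) -> Tuple[List[str], Optional[str]]:
    """One pass of the S2-S9 loop.

    Returns (new priority order of the variable paths, path whose recognition
    result is acted on, or None).
    """
    if not active:                                         # S2: No
        if reset:                                          # S7: Yes, restore initial values
            return list(INITIAL_VARIABLE), None
        return list(variable), None
    path = next((p for p in FIXED + variable if p in active), None)  # S3
    if path is None:
        return list(variable), None                        # input on an unknown path
    snr, likelihood = measure(path)
    if snr >= snr_th and likelihood >= rec_th:             # S4: Yes and S5: Yes
        return list(variable), path                        # S6: act on the result
    if path in variable:                                   # S9: Yes, lower the priority
        new_order = list(variable)
        i = new_order.index(path)
        if i < len(new_order) - 1:
            new_order[i], new_order[i + 1] = new_order[i + 1], new_order[i]
        return new_order, None
    return list(variable), None                            # fixed priority: back to S2
```

Because the function returns a new order instead of mutating its argument, a caller can keep the previous order for logging or for undoing a demotion.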

In this way, the priority update unit 201 of the voice signal switcher 11 lowers the priority of an input line when the S/N ratio of its input voice is low or its recognition rate is poor, so that low-quality voice input that is prone to misrecognition is not selected. Although the updated priorities are described as returning to their initial values when a reset signal is input, the priorities can also be controlled to return to their initial values when there has been no voice input for a predetermined time.
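The alternative reset mentioned here, reverting to the initial priorities when no voice input arrives for a predetermined time, can be sketched as below. The `PriorityTable` class, the 30-second default and the injectable clock (used so the behaviour can be tested without waiting) are assumptions, not details from the patent. The table holds only the variable-priority paths.

```python
import time


class PriorityTable:
    """Variable path priorities that revert to their initial values on a reset
    signal or after `idle_timeout` seconds without voice input (hypothetical)."""

    def __init__(self, initial, idle_timeout=30.0, clock=time.monotonic):
        self.initial = dict(initial)
        self.priority = dict(initial)
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.last_input = clock()

    def note_input(self):
        """Record that voice input was just received on some path."""
        self.last_input = self.clock()

    def lower(self, path, step=1):
        """Demote a path after a failed S/N or recognition check."""
        self.priority[path] = max(0, self.priority[path] - step)

    def reset(self):
        """Explicit reset signal: restore all initial priorities."""
        self.priority = dict(self.initial)

    def poll(self):
        """Call periodically; resets priorities if input has been idle too long."""
        if self.clock() - self.last_input >= self.idle_timeout:
            self.reset()
            self.last_input = self.clock()
```

Injecting the clock keeps the timeout logic independent of real time; in a robot it would simply default to a monotonic system clock.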

Next, a modification of the embodiment of the present invention will be described. For the robot 1 shown in FIG. 3, the operator was described as sending voice instructions over a wireless LAN, but voice instructions can also be sent using a mobile phone.

FIG. 6 is a block diagram showing the main parts related to voice recognition in the robot according to this modification. As shown in FIG. 6, the robot 1 includes a telephone 61 as its communication means. The remaining configuration is the same as in the first embodiment; identical components bear the same reference numerals, and their detailed description is omitted.

The operator can input a voice signal to the robot 1 via a mobile phone 81. As in the first embodiment, the voice signal switcher 11 has, as the fixed-priority voice input/output path A1, a voice input path from the operator via the telephone 61. As described above, it also has the variable-priority voice input/output paths A2 to A4 formed by the built-in microphones and speakers, and the voice signal switcher 11 switches the connection to the voice recognition module 12 according to the priorities. In this modification as well, the priorities are set so that input from the telephone 61, when present, is selected preferentially. The voice signal switcher 11 then updates the priorities of the voice input/output paths A2 to A4 based on the S/N ratio and the voice recognition rate.
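Selection among the paths, with the telephone path A1 always preferred when it carries input, might look like the sketch below. The dictionary representation, the function names and the tie-breaking rule are assumptions. The clamp in `update_priority` follows the claimed arrangement in which the microphone priorities are updated within a range that does not exceed the fixed priority of the voice receiving means.

```python
def select_path(priorities, active, fixed_path="A1"):
    """Return the highest-priority path among those currently carrying voice input.

    Ties are broken in favour of the fixed-priority path; returns None when no
    path is active.
    """
    candidates = [p for p in active if p in priorities]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (priorities[p], p == fixed_path))


def update_priority(priorities, path, new_value, fixed_path="A1"):
    """Set a variable path's priority, never exceeding the fixed path's value."""
    if path == fixed_path:
        return  # the operator path keeps its fixed priority
    priorities[path] = min(new_value, priorities[fixed_path])
```

With priorities `{"A1": 10, "A2": 3, "A3": 5}`, input on A2 and A3 selects A3, while any input on A1 selects A1. Because ties go to A1 and no microphone can climb above A1's fixed value, an operator instruction received over the telephone is always chosen whenever it is present.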

In the present embodiment, voice input via wireless communication is given a fixed, highest priority, and voice input from the multiple built-in microphones is selected as appropriate according to their priorities. Therefore, if the robot misrecognizes a customer's instruction and malfunctions, the operator can immediately stop or correct that action by voice input over wireless communication. In addition, because the built-in microphones with a good S/N ratio and a good recognition rate are selected automatically according to their priorities, the recognition rate can be improved.

The present invention is not limited to the embodiments described above, and various modifications can of course be made without departing from the gist of the present invention. For example, although the present embodiment has been described as having a speaker for each built-in microphone, a single speaker may be used instead. Multiple front, right, and left built-in microphones may also be provided, with priorities set and updated for each of them. Furthermore, although the priority of the input/output path for voice signals received via wireless communication is fixed in the present embodiment, it may be variable.

FIG. 1 is a perspective view showing a biped walking robot according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the robot according to the embodiment of the present invention.
FIG. 3 is a block diagram showing only the parts of the robot according to the embodiment of the present invention that relate to voice recognition.
FIG. 4 is a block diagram showing details of the voice signal switcher and the voice recognition module in the robot apparatus according to the embodiment of the present invention.
FIG. 5 is a flowchart showing the priority update method according to the embodiment of the present invention.
FIG. 6 is a block diagram showing the main parts related to voice recognition in a robot according to a modification of the embodiment of the present invention.

Explanation of symbols

1 Robot
1a Head unit
1b Arm unit
1c Trunk unit
1d Leg unit
11 Voice signal switcher
12 Voice recognition module
13 Voice utterance module
14 Image recognition module
15 Action determination module
21, 41, 51 Built-in microphones
22, 42, 52, 75 Speakers
61 Telephone
74 Microphone
81 Mobile phone
101 Control unit
102 Input/output unit
103 Drive unit
104 Power supply unit
105 External storage unit
121 Camera
122 Built-in microphone
125 Sensor unit
131 Motor
132 Driver
141 Battery
142 Battery control unit
201 Priority update unit
211 S/N ratio determination unit
212 Voice recognition rate determination unit

Claims

A robot apparatus that operates autonomously based on a voice recognition result, comprising:
first and second voice input means for which priorities are set;
voice signal switching means for switching between and selecting the input from the first or second voice input means based on the priorities;
voice recognition means for recognizing the voice input from the first or second voice input means selected by the voice signal switching means; and
action selection means for selecting an action based on the voice recognition result of the voice recognition means,
wherein the voice signal switching means updates the priority of the first voice input means and/or the second voice input means based on a determination of whether the signal-to-noise ratio of the voice from the voice input means is equal to or higher than a predetermined threshold.

A robot apparatus that operates autonomously based on a voice recognition result, comprising:
first and second voice input means for which priorities are set;
voice signal switching means for switching between and selecting the input from the first or second voice input means based on the priorities;
voice recognition means for recognizing the voice input from the first or second voice input means selected by the voice signal switching means; and
action selection means for selecting an action based on the voice recognition result of the voice recognition means,
wherein the voice signal switching means updates the priority of the first voice input means and/or the second voice input means based on a determination of whether the voice recognition rate of the voice from the voice input means is equal to or higher than a predetermined threshold.

A robot apparatus that operates autonomously based on a voice recognition result, comprising:
first and second voice input means for which priorities are set;
voice signal switching means for switching between and selecting the input from the first or second voice input means based on the priorities;
voice recognition means for recognizing the voice input from the first or second voice input means selected by the voice signal switching means;
action selection means for selecting an action based on the voice recognition result of the voice recognition means;
S/N ratio determination means for determining whether the signal-to-noise ratio of the voice from the voice input means is equal to or higher than a predetermined threshold;
recognition rate determination means for determining whether the voice recognition rate of the voice from the voice input means is equal to or higher than a predetermined threshold; and
priority update means for updating the priority of the first voice input means and/or the second voice input means based on the determination results of the S/N ratio determination means and the recognition rate determination means.

The robot apparatus as described, wherein the priority update means updates the priority when the S/N ratio is less than a predetermined threshold and when the voice recognition rate is less than a predetermined threshold.

The robot apparatus according to claim 1, wherein the first voice input means is voice receiving means for receiving a wireless signal, and the second voice input means is one or more microphones,
and wherein the initial value of the priority of the first voice input means is set higher than the initial value of the priority of the second voice input means.

The robot apparatus according to claim 5, wherein the priority set for the voice receiving means is a fixed value, and
the priority update means updates the priorities set for the plurality of microphones, based on the signal-to-noise ratio and/or the voice recognition rate of the voice signals input from the plurality of microphones, within a range that does not exceed the fixed value.

The robot apparatus according to claim 5 or 6, wherein the priorities set for the voice input means are reset to their initial values when a reset signal is input or at a predetermined timing.

A method for controlling a robot apparatus that operates autonomously based on a voice recognition result, comprising:
a voice input selection step of switching between and selecting the input from first or second voice input means based on priorities set for the first and second voice input means;
a voice recognition step of recognizing the voice input from the first or second voice input means selected by the switching; and
an action selection step of selecting an action based on the voice recognition result of the voice recognition step,
wherein the voice input selection step includes:
an S/N ratio determination step of determining whether the signal-to-noise ratio of the voice from the voice input means is equal to or greater than a predetermined threshold, and/or a recognition rate determination step of determining whether the voice recognition rate of the voice from the voice input means is equal to or greater than a predetermined threshold; and
a priority update step of updating the priority of the first voice input means and/or the second voice input means based on the determination result of the S/N ratio determination step and/or the recognition rate determination step.