JP4915665B2

JP4915665B2 - Controller with voice recognition function

Info

Publication number: JP4915665B2
Application number: JP2007109809A
Authority: JP
Inventors: 賢二中北; 清隆竹原; 健治奥野; 朗馬場; 新平日比谷
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2007-04-18
Filing date: 2007-04-18
Publication date: 2012-04-11
Anticipated expiration: 2027-04-18
Also published as: JP2008268450A

Description

The present invention relates to an operating device with a voice recognition function.

Conventionally, voice recognition devices have been provided that recognize speech uttered by a user and switch a load on and off based on the recognition result.

As a voice dialogue device used in such a load control device, one has been proposed that comprises voice input means through which the user inputs speech, voice recognition dictionary storage means that stores acoustic models of the recognizable vocabulary used by the voice recognition means, and voice recognition means that recognizes input speech by comparing it with the acoustic models of the recognizable vocabulary stored in advance in the recognition dictionary (see, for example, Patent Document 1).
Patent Document 1: JP-A-11-212594

In the voice dialogue device described above, speech recognition is performed by comparing the input speech with acoustic models of a recognizable vocabulary stored in advance, so a user who wants to operate a load by voice must memorize the recognizable vocabulary that can be used for voice operation. If only a few recognizable words are used for voice operation, memorizing them is not much of a burden. However, when finer control is attempted in addition to on/off switching (for example, setting the temperature or air volume of an air conditioner), the number of recognizable words needed for control increases, and so does the burden on the user of memorizing them.

To reduce the burden of memorizing the recognizable vocabulary, the vocabulary could be written on a medium such as paper and placed next to the voice recognition device. However, this adds the effort of looking up the vocabulary usable for voice operation, so the burden on the user is greater than when operating a switch by hand, and the goal of improving operability through voice operation is not achieved.

The present invention has been made in view of the above problems, and its object is to provide an operating device with a voice recognition function that reduces the effort required for a user to memorize the recognizable vocabulary.

To achieve the above object, the invention of claim 1 comprises: a manual operation unit for operating a controlled device; a voice input unit that receives speech uttered by a user; a storage unit that stores acoustic models of a plurality of operation vocabulary items for operating the controlled device; a voice recognition unit that performs speech recognition by comparing the speech input to the voice input unit with the acoustic models stored in the storage unit; a voice output unit that outputs voice guidance; a control unit that controls the controlled device based on either an operation input from the manual operation unit or a recognition result from the voice recognition unit; a mode switching unit that switches the operation mode of the control unit between a learning mode, in which voice guidance is output from the voice output unit in response to an operation input from the manual operation unit, and a normal mode, in which output of the voice guidance is stopped; and a learning status estimation unit that estimates, for each individual task that causes the controlled device to perform a desired operation, how far the user has progressed in learning the operation vocabulary. When an operation that can be performed by voice recognition is performed with the manual operation unit, the control unit causes the voice output unit to output voice guidance announcing that the operation can be performed by voice recognition. For each task, the mode switching unit switches the operation mode to the learning mode if the estimate from the learning status estimation unit is at or below a predetermined level, and to the normal mode if the estimate exceeds the predetermined level. For each task, the learning status estimation unit accumulates the number of utterances made until the task is completed, divides this accumulated value by the number of times the task has been executed to obtain the average number of utterances per execution, and estimates the degree of learning progress for each task from this average. The voice guidance may simply announce that the operation performed with the manual operation unit can also be performed by voice recognition, or it may state the operation vocabulary to be used when performing that operation by voice.
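The averaging rule in claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and function names, the threshold value `MAX_AVG_UTTERANCES`, and the choice to treat a task with no history as unlearned are all assumptions, since the text only speaks of a "predetermined level".

```python
from dataclasses import dataclass
from typing import Optional

# Assumed threshold: the patent only refers to a "predetermined level".
MAX_AVG_UTTERANCES = 1.5


@dataclass
class TaskStats:
    """Running totals for one task (illustrative names, not from the patent)."""
    total_utterances: int = 0  # utterances accumulated over all completed runs
    executions: int = 0        # number of times the task was completed

    def record_completion(self, utterances_needed: int) -> None:
        """Add the utterances it took to complete the task once."""
        self.total_utterances += utterances_needed
        self.executions += 1

    def average_utterances(self) -> Optional[float]:
        """Accumulated utterances divided by executions; None without history."""
        if self.executions == 0:
            return None
        return self.total_utterances / self.executions


def select_mode(stats: TaskStats) -> str:
    """More utterances per execution is read as lower proficiency."""
    avg = stats.average_utterances()
    if avg is None or avg > MAX_AVG_UTTERANCES:  # no history: assume unlearned
        return "learning"
    return "normal"
```

A task that took three utterances once and one utterance once averages two utterances per execution, which under the assumed threshold keeps that task in the learning mode.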

The invention of claim 2 comprises: a manual operation unit for operating a controlled device; a voice input unit that receives speech uttered by a user; a storage unit that stores acoustic models of a plurality of operation vocabulary items for operating the controlled device; a voice recognition unit that performs speech recognition by comparing the speech input to the voice input unit with the acoustic models stored in the storage unit; a voice output unit that outputs voice guidance; a control unit that controls the controlled device based on either an operation input from the manual operation unit or a recognition result from the voice recognition unit; a mode switching unit that switches the operation mode of the control unit between a learning mode, in which voice guidance is output from the voice output unit in response to an operation input from the manual operation unit, and a normal mode, in which output of the voice guidance is stopped; and a learning status estimation unit that estimates, for each individual task that causes the controlled device to perform a desired operation, how far the user has progressed in learning the operation vocabulary. When an operation that can be performed by voice recognition is performed with the manual operation unit, the control unit causes the voice output unit to output voice guidance announcing that the operation can be performed by voice recognition. For each task, the mode switching unit switches the operation mode to the learning mode if the estimate from the learning status estimation unit is at or below a predetermined level, and to the normal mode if the estimate exceeds the predetermined level. For each task, the learning status estimation unit estimates the degree of learning progress based on at least one of the proportion of voice operations and the proportion of manual operations in the total number of operations, where the total includes both voice operations by voice recognition and manual operations with the manual operation unit.

The invention of claim 3 comprises: a manual operation unit for operating a controlled device; a voice input unit that receives speech uttered by a user; a storage unit that stores acoustic models of a plurality of operation vocabulary items for operating the controlled device; a voice recognition unit that performs speech recognition by comparing the speech input to the voice input unit with the acoustic models stored in the storage unit; a voice output unit that outputs voice guidance; a control unit that controls the controlled device based on either an operation input from the manual operation unit or a recognition result from the voice recognition unit; a mode switching unit that switches the operation mode of the control unit between a learning mode, in which voice guidance is output from the voice output unit in response to an operation input from the manual operation unit, and a normal mode, in which output of the voice guidance is stopped; and a learning status estimation unit that estimates how far the user has progressed in learning the operation vocabulary. When an operation that can be performed by voice recognition is performed with the manual operation unit, the control unit causes the voice output unit to output voice guidance announcing that the operation can be performed by voice recognition. The mode switching unit switches the operation mode to the learning mode if the estimate from the learning status estimation unit is at or below a predetermined level, and to the normal mode if the estimate exceeds the predetermined level. The learning status estimation unit estimates the degree of learning progress over all tasks that cause the controlled device to perform a desired operation, based on at least one of the proportion of voice operations and the proportion of manual operations in the total number of operations, where the total includes both voice operations by voice recognition and manual operations with the manual operation unit.
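The difference between the per-task ratio of claim 2 and the all-task ratio of claim 3 is easiest to see in code. The sketch below pools the counts of every task before computing one voice-operation ratio, as claim 3 describes. The function names, the data layout, the threshold value, and the handling of an empty history are assumptions made for illustration.

```python
# Assumed threshold: the patent only refers to a "predetermined level".
VOICE_RATIO_THRESHOLD = 0.5


def overall_voice_ratio(counts: dict) -> float:
    """Share of voice operations among all operations, pooled over all tasks.

    `counts` maps a task name to a (voice_operations, manual_operations) pair.
    """
    voice = sum(v for v, _ in counts.values())
    manual = sum(m for _, m in counts.values())
    total = voice + manual
    return voice / total if total else 0.0  # no history: treat as unlearned


def select_mode_all_tasks(counts: dict) -> str:
    """One mode decision for the whole device instead of one per task."""
    if overall_voice_ratio(counts) > VOICE_RATIO_THRESHOLD:
        return "normal"
    return "learning"
```

With this pooling, a user who operates one task mostly by voice but another mostly by hand gets a single combined verdict, which is simpler to compute but less targeted than the per-task estimate.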

The invention of claim 4 is characterized in that, in the invention of any one of claims 1 to 3, a user identification unit is provided that individually identifies users from the speech input to the voice input unit, the learning status estimation unit estimates the degree of learning progress for each individual user identified by the user identification unit, and the control unit switches the operation mode for each individual user based on the estimate from the learning status estimation unit.
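A per-user variant of the estimate in claim 4 might keep separate counters for each identified user. The sketch below assumes a `user_id` is already available from some speaker identification step, which is not modeled here. The class name, the use of the voice-operation ratio as the progress measure, and the default threshold are all illustrative assumptions.

```python
from collections import defaultdict


class PerUserProgress:
    """Tracks voice/manual operation counts per (user, task) pair."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # assumed "predetermined level"
        # (user_id, task_id) -> [voice_operations, manual_operations]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, user_id: str, task_id: str, by_voice: bool) -> None:
        """Count one completed operation for this user and task."""
        self._counts[(user_id, task_id)][0 if by_voice else 1] += 1

    def mode_for(self, user_id: str, task_id: str) -> str:
        """Return the operation mode to use for this user and task."""
        voice, manual = self._counts.get((user_id, task_id), (0, 0))
        total = voice + manual
        ratio = voice / total if total else 0.0  # unknown user: unlearned
        return "normal" if ratio > self.threshold else "learning"
```

Keeping the counters keyed by user means a proficient user is not interrupted by guidance that another household member still needs.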

According to the invention of claim 1, when an operation that can be performed by voice recognition is performed with the manual operation unit, the control unit causes the voice output unit to output voice guidance announcing that the operation can be performed by voice recognition. A user who did not know that voice operation was possible is thereby informed of it. This reduces the burden of learning the operation vocabulary by reading a manual or the like, and lets the user pick up new operation vocabulary naturally while using the operating device.

In addition, voice guidance is output in response to an operation input from the manual operation unit only when the device is switched to the learning mode, so when voice guidance is not needed its output can be stopped by switching to the normal mode.

Furthermore, voice guidance is output only for tasks whose degree of learning progress is at or below the predetermined level, which highlights the operation vocabulary the user has not yet mastered and further improves the learning effect.

Furthermore, the more utterances it takes to have a desired task performed, the lower the user's proficiency can be judged to be. Outputting voice guidance in that case lets the user learn the operation vocabulary for the desired task efficiently.

According to the invention of claim 2, when an operation that can be performed by voice recognition is performed with the manual operation unit, the control unit causes the voice output unit to output voice guidance announcing that the operation can be performed by voice recognition. A user who did not know that voice operation was possible is thereby informed of it. This reduces the burden of learning the operation vocabulary by reading a manual or the like, and lets the user pick up new operation vocabulary naturally while using the operating device. In addition, voice guidance is output in response to an operation input from the manual operation unit only when the device is switched to the learning mode, so when voice guidance is not needed its output can be stopped by switching to the normal mode. Furthermore, only tasks whose degree of learning progress is at or below the predetermined level are switched to the learning mode, in which voice guidance is output in response to an operation input from the manual operation unit. Voice guidance is therefore output only when the user has not yet mastered voice operation, which improves the learning effect. Furthermore, the lower the proportion of voice operations, or the higher the proportion of manual operations, the less accustomed to voice operation the user can be judged to be. Outputting voice guidance in that case lets the user learn the operation vocabulary for the desired task efficiently.

According to the invention of claim 3, when an operation that can be performed by voice recognition is performed with the manual operation unit, the control unit causes the voice output unit to output voice guidance announcing that the operation can be performed by voice recognition. A user who did not know that voice operation was possible is thereby informed of it. This reduces the burden of learning the operation vocabulary by reading a manual or the like, and lets the user pick up new operation vocabulary naturally while using the operating device. In addition, voice guidance is output in response to an operation input from the manual operation unit only when the device is switched to the learning mode, so when voice guidance is not needed its output can be stopped by switching to the normal mode. Furthermore, when the degree of learning progress is at or below the predetermined level, the device is switched to the learning mode, in which voice guidance is output in response to an operation input from the manual operation unit. Voice guidance is therefore output only when the user has not yet mastered voice operation, which improves the learning effect. Furthermore, the degree of learning progress is judged over all tasks rather than for each task separately, which simplifies the process of determining it.

According to the invention of claim 4, evaluating the degree of learning progress for each individual user makes it possible to provide voice guidance only to users who need it. For users who do not need voice guidance, its output is stopped, which spares them the annoyance of hearing unnecessary guidance.

Embodiments of the present invention will be described below with reference to the drawings.

(Embodiment 1)
Embodiment 1 of the present invention will be described with reference to FIGS. 1 to 3. The operating device 1 of this embodiment is used to operate in-home equipment (for example, lighting fixtures, water heaters, security equipment, disaster prevention equipment, and audio/video equipment) as controlled devices. As shown in FIG. 2, the operating device 1 has a device body 2 that is, for example, embedded in a wall surface. Arranged on the front of the device body 2 are an LED 3 for indicating operation, a liquid crystal display (LCD) 4 with a backlight, a microphone (voice input unit) 5, a speaker 6, and operation buttons 7a to 7d for performing various operations.

As shown in the block diagram of FIG. 1, the operating device 1 includes a manual operation unit 7 that has the operation buttons 7a to 7d and is used to operate the controlled device, a lighting control unit 19 that controls the lighting state of the lighting fixture 20 serving as the controlled device, and a manual control unit 18 that controls the lighting control unit 19 so that it performs the action corresponding to the operation of the manual operation unit 7. It also includes a display control unit 14 that controls the display of the LCD 4; a sensor control unit 15 that controls the operation of a security sensor 21 and a disaster prevention sensor 22 installed outside the device body 2 (within the home) and receives alarm signals from each sensor; a communication control unit 16 to which an external communication network 23 is connected; and an emergency reporting unit 17 that, when an alarm signal from the security sensor 21 or the disaster prevention sensor 22 is input via the sensor control unit 15, uses the communication control unit 16 to send an alarm to a preset external report destination.

In addition to controlling the lighting fixture 20 in response to operation of the manual operation unit 7, the operating device 1 can control the lighting fixture 20 in response to operation commands input by voice. That is, an operation (task) that causes the controlled device to perform a desired action can be carried out either as a voice operation, by uttering the corresponding operation vocabulary, or as a manual operation, by operating the manual operation unit 7 directly.

To perform voice recognition processing, the operating device 1 includes the microphone 5, which converts speech uttered by the user into an electric signal; an amplifier (not shown) that amplifies the electric signal (voice signal) input from the microphone 5; the speaker 6, which outputs response sounds to operations, alarm sounds, notification sounds, and voice guidance; and an echo canceller 11 that prevents sound output from the speaker 6 from feeding back into the microphone 5 and causing echo or howling. It also includes an acoustic model storage unit 8 that stores acoustic models in which the features of the operation vocabulary, as uttered by many speakers, are modeled using, for example, an HMM (hidden Markov model) for the plurality of operation vocabulary items used to operate the controlled device; a voice recognition unit 9 that compares the voice signal input from the microphone 5 and amplified by the amplifier with the acoustic models stored in the storage unit 8 and outputs, as the recognition result, the sound (operation vocabulary) corresponding to the acoustic model most similar to the input voice signal; a database 10 containing, among others, a database that stores the operation vocabulary and a database that associates operations of the manual operation unit 7 with operation vocabulary; a voice generation unit 12 that synthesizes human speech; and a dialogue control unit 13 that controls the display control unit 14 and the lighting control unit 19 based on the voice recognition result and, based on a manual operation reported by the manual control unit 18, tells the voice generation unit 12 which speech to synthesize. The speaker 6 outputs the speech synthesized by the voice generation unit 12. The dialogue control unit 13, the voice generation unit 12, the speaker 6, and related parts together constitute the voice output unit that outputs voice guidance. The dialogue control unit 13, the manual control unit 18, the display control unit 14, and so on are implemented by the computing functions of a control unit 30 consisting of a microcomputer.

The operation of the operating device 1 will be described with reference to the flowchart of FIG. 3. When the operating device 1 starts operating in step S1, the control unit 30 determines whether there is an operation input (step S2). If there is, it determines whether the input was made by voice or by manual operation (step S3). If the input was made by voice, the voice recognition unit 9 captures the speech uttered by the user from the microphone 5 (step S4), performs speech recognition by comparing the input speech with the acoustic models stored in the acoustic model storage unit 8, and outputs the recognition result to the dialogue control unit 13 (step S5). The dialogue control unit 13 determines whether the recognition result input from the voice recognition unit 9 is speech (step S6). If it is not, the process returns to step S2. If it is, the dialogue control unit 13 refers to the vocabulary-operation association database built in the database 10 (step S7). A recognized vocabulary-to-operation correspondence table such as the one shown in Table 1 is registered in this database. For each controlled device, it stores in association with one another a home ID indicating the installation location, a device ID, the operation vocabulary for operating the controlled device, and data indicating the operation performed by that vocabulary. The dialogue control unit 13 consults the recognized vocabulary-to-operation table (step S8). If the recognition result is an operation vocabulary item registered in the vocabulary-operation association database, the dialogue control unit 13 has the lighting control unit 19 perform the processing corresponding to that vocabulary (step S9), ends the task (step S10), and returns to step S2. If, in step S8, the operation vocabulary input from the voice recognition unit 9 is not registered in the vocabulary-operation association database, the dialogue control unit 13 returns to step S2 without controlling the lighting fixture 20. If an operation to stop the operating device 1 is performed in step S2, operation of the operating device 1 stops (step S17).
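The voice branch of the flowchart (steps S6 to S9) amounts to a table lookup. The sketch below shows one way to model it. Table 1 is not reproduced in this text, so every entry, field value, and name here is hypothetical and serves only to illustrate the shape of the lookup.

```python
from typing import NamedTuple, Optional


class Operation(NamedTuple):
    """One row of a vocabulary-to-operation table (fields follow the text)."""
    home_id: str    # installation location within the home
    device_id: str  # which controlled device
    action: str     # what the vocabulary item makes the device do


# Hypothetical entries; the actual contents of Table 1 are not given here.
VOCAB_TO_OPERATION = {
    "light on": Operation("living_room", "light_01", "on"),
    "light off": Operation("living_room", "light_01", "off"),
}


def handle_recognition(result: Optional[str]) -> Optional[Operation]:
    """Return the operation to execute, or None to go back to waiting (S2)."""
    if result is None:  # S6: the input was not recognized as speech
        return None
    # S7/S8: look the recognized vocabulary up; unregistered words yield None
    return VOCAB_TO_OPERATION.get(result)
```

A caller would pass a non-`None` return value to whatever plays the role of the lighting control unit (step S9) and otherwise simply resume waiting for input.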

If, on the other hand, the input in step S3 was made by manual operation, the manual control unit 18 controls the lighting control unit 19 according to the operation of the manual operation unit 7 so that the desired action is performed (step S11). The dialogue control unit 13 then refers to the operation-vocabulary association database built in the database 10, based on the control action of the manual control unit 18 (step S12), and determines whether there is operation vocabulary corresponding to the operation of the manual operation unit 7 (step S13). An operation-to-recognized-vocabulary correspondence table such as the one shown in Table 2 is registered in this database. For each task, it stores the operation ID assigned to the operation of the manual operation unit 7 in association with the operation vocabulary.

If the operation-to-recognized-vocabulary table contains no operation vocabulary corresponding to the operation of the manual operation unit 7, the dialogue control unit 13 ends the task (step S10) and returns to step S2. If it does contain corresponding operation vocabulary, the control unit 30, acting as the learning status estimation unit, determines the user's learning status for that vocabulary (step S14). Depending on the result, the mode switching unit, which is implemented by the computing functions of the control unit 30, switches the operation mode of the control unit 30 to either the learning mode or the normal mode (step S15). That is, if the user has not yet mastered the vocabulary, the operation mode is switched to the learning mode, in which voice guidance is output in response to an operation input from the manual operation unit 7. If the user has mastered it, the operation mode is switched to the normal mode, in which output of the voice guidance is stopped.

For example, for the task corresponding to this operation vocabulary, the control unit 30 determines the degree of learning progress from the proportion of voice operations in the total number of operations, where the total includes both voice operations by voice recognition and manual operations with the manual operation unit. If the proportion of voice operations is at or below a predetermined threshold, the control unit 30 judges that the user has not mastered this operation vocabulary. It has the voice generation unit 12 synthesize voice guidance that tells the user the operation vocabulary corresponding to the manual operation, outputs it from the speaker 6 (step S16), ends the task (step S10), and returns to step S2. If the proportion of voice operations is higher than the predetermined threshold, the control unit 30 judges that the user has mastered this operation vocabulary, ends the task (step S10), and returns to step S2.
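The manual branch (steps S12 to S16) can be sketched as a lookup followed by a threshold test. Table 2 is not reproduced in this text, so the table entry, the function name, the threshold value, and the wording of the guidance message are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical operation-ID-to-vocabulary entry (Table 2 is not given here).
OPERATION_TO_VOCAB = {"op_light_on": "light on"}

# Assumed threshold: the patent only refers to a "predetermined threshold".
VOICE_RATIO_THRESHOLD = 0.5


def guidance_after_manual_operation(
    operation_id: str, voice_ops: int, manual_ops: int
) -> Optional[str]:
    """Return the guidance text to speak, or None if none should be output."""
    vocab = OPERATION_TO_VOCAB.get(operation_id)
    if vocab is None:  # S13: this operation has no voice equivalent
        return None
    total = voice_ops + manual_ops
    ratio = voice_ops / total if total else 0.0  # S14: per-task voice share
    if ratio > VOICE_RATIO_THRESHOLD:  # S15: mastered, normal mode
        return None
    # S15/S16: learning mode, tell the user the matching vocabulary
    return f'You can also do this by saying "{vocab}".'
```

For instance, a user with one voice operation and nine manual operations for the task has a ratio of 0.1 and would hear the guidance, while a user with nine voice operations and one manual operation would not.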

In this way, when an operation that can be performed by voice recognition is performed with the manual operation unit 7, the control unit 30 has the voice generation unit 12 synthesize voice guidance announcing that the operation can be performed by voice recognition, and outputs it from the speaker 6. A user who did not know that the operation can be performed by voice is thereby informed of it, the burden of memorizing operation vocabulary by reading a manual or the like is reduced, and the user can learn new operation vocabulary naturally while using the operation device. In addition, because operating the manual operation unit 7 causes the voice guidance to announce the operation vocabulary corresponding to that operation, the user can learn the operation vocabulary without reading a manual, which reduces the burden of learning it.

Furthermore, in this embodiment, for the task corresponding to a given operation vocabulary, when the proportion of voice operations in the total number of operations (voice operations by voice recognition plus manual operations with the manual operation unit) exceeds a predetermined threshold, the user is judged to have mastered that operation vocabulary and no voice guidance is output; voice guidance is output only when the proportion is at or below the threshold. This makes the operation vocabulary that the user has not yet mastered stand out and further improves the learning effect. Here, the voice generation unit 12 and the speaker 6 constitute the voice output unit that outputs voice guidance.

Although the control unit 30 determines the progress of learning from the proportion of voice operations in the total number of operations, it may instead determine the progress from the proportion of manual operations. For example, if the proportion of manual operations in the total number of operations is higher than a predetermined threshold, the control unit 30 judges that the user has not mastered the corresponding operation vocabulary and switches the operation mode to the learning mode; if the proportion of manual operations is at or below the threshold, it judges that the user has mastered the vocabulary and switches the operation mode to the normal mode. In this case as well, outputting voice guidance only for operation vocabulary that has not been mastered makes that vocabulary stand out and further improves the learning effect.
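The relationship between the two criteria can be made explicit. The symbols below are introduced only for illustration and do not appear in the original: $n_v$ and $n_m$ are the numbers of voice and manual operations, $r_v$ and $r_m$ the corresponding ratios, and $\theta_m$ the manual-ratio threshold. Assuming at least one operation has been recorded:

```latex
r_v = \frac{n_v}{n_v + n_m}, \qquad
r_m = \frac{n_m}{n_v + n_m} = 1 - r_v, \qquad
r_m > \theta_m \iff r_v < 1 - \theta_m
```

A manual-ratio threshold $\theta_m$ therefore corresponds to a voice-ratio threshold of $1 - \theta_m$; the two tests differ only in how the boundary case of exact equality is classified.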

(Embodiment 2)
In Embodiment 1 described above, when the control unit 30, acting as the learning status estimation unit, determines the progress of learning of the operation vocabulary, it does so for the task corresponding to a given operation vocabulary based on either the proportion of voice operations or the proportion of manual operations in the total number of operations. Alternatively, the progress of learning of the operation vocabulary may be determined from the average number of utterances needed to complete a task. Since the configuration of the operation device 1 is the same as in Embodiment 1, common components are given the same reference numerals and their description is omitted.

For example, when the controlled device controlled by the operation device 1 is a television, the task of switching the television on and tuning it to channel n can be performed in two ways: by turning the television on with the operation vocabulary "TV on" and then switching to channel n with the operation vocabulary "TV channel n", or by using the operation vocabulary "TV channel n" alone to turn the television on and switch to channel n in a single step. Similarly, when the controlled device is a lighting fixture with a stepped dimming function, the task of lighting the fixture in a dimmed state can be performed either by turning on the unlit fixture (light A) with the operation vocabulary "Light A on" and then dimming it with the operation vocabulary "Light A dim", or by using the operation vocabulary "Light A dim" alone to light the fixture (light A) in a dimmed state in a single step.

When the same task can be accomplished with several different sequences of operation vocabulary in this way, a user who completes the task with fewer utterances can be considered more proficient in the operation vocabulary. The control unit 30, acting as the learning status estimation unit, therefore accumulates the number of utterances made before each task is completed and obtains the average number of utterances per execution of the task. Based on this average, it estimates the progress of learning for each task. If the average number of utterances is greater than or equal to a threshold preset for that task, the control unit 30 judges that the user has not mastered the corresponding operation vocabulary and switches the operation mode to the learning mode; if the average is below the threshold, it judges that the user has mastered the vocabulary and switches the operation mode to the normal mode.
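The average-utterance criterion can be sketched as below. The class name `UtteranceTracker`, the threshold of 2.0, and the choice to stay in the learning mode before any execution has been recorded are assumptions for illustration; the description only specifies accumulating the utterance counts, averaging per execution, and comparing against a per-task threshold.

```python
class UtteranceTracker:
    """Tracks utterances per completed execution of one task (hypothetical)."""

    def __init__(self, threshold: float):
        self.threshold = threshold      # per-task threshold, preset
        self.total_utterances = 0       # accumulated over all executions
        self.executions = 0             # number of completed executions

    def record_task(self, utterances: int) -> None:
        self.total_utterances += utterances
        self.executions += 1

    def average(self) -> float:
        return self.total_utterances / self.executions

    def mode(self) -> str:
        # Assumption: no history yet means the vocabulary is not mastered.
        if self.executions == 0:
            return "learning"
        return "learning" if self.average() >= self.threshold else "normal"


# Task: "turn the TV on and tune to channel n".
tv_task = UtteranceTracker(threshold=2.0)
tv_task.record_task(2)          # "TV on" then "TV channel n"
mode_after_first = tv_task.mode()
for _ in range(3):
    tv_task.record_task(1)      # "TV channel n" alone
mode_after_practice = tv_task.mode()
```

After the first two-utterance execution the average equals the threshold, so the mode is still learning; three single-utterance executions bring the average down to 1.25 and the mode becomes normal.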

When the operation mode has been switched to the learning mode and an operation that can be performed by voice recognition is performed with the manual operation unit 7, the control unit 30 has the voice generation unit 12 synthesize voice guidance announcing that the operation can be performed by voice recognition, and outputs it from the speaker 6. A user who did not know that the operation can be performed by voice is thereby informed of it. When the operation mode has been switched to the normal mode, the control unit 30 does not output this voice guidance even if a manual operation is performed with the manual operation unit 7. Stopping the voice guidance for users who have mastered the operation vocabulary prevents them from being annoyed by unnecessary guidance.

(Embodiment 3)
In Embodiment 1 described above, the control unit 30, acting as the learning status estimation unit, determines the learning status of the corresponding operation vocabulary for each individual task that causes the controlled device to perform a desired operation. In this embodiment, by contrast, the learning status of the operation vocabulary is determined over all tasks that cause the controlled device to perform a desired operation. Since the configuration of the operation device 1 is the same as in Embodiment 1, common components are given the same reference numerals and their description is omitted.

That is, for all tasks that cause the controlled device controlled by the operation device 1 to perform a desired operation, the control unit 30, acting as the learning status estimation unit, accumulates the total number of operations, including both voice operations by voice recognition and manual operations with the manual operation unit, and obtains the proportion of voice operations or the proportion of manual operations in that total. It determines the learning status of the operation vocabulary based on either of these proportions.

In other words, if the proportion of voice operations in the total number of operations is at or below a predetermined threshold, or the proportion of manual operations is higher than a predetermined threshold, the control unit 30 judges that the user has not mastered voice operation and switches the operation mode to the learning mode. If the proportion of voice operations is higher than the predetermined threshold, or the proportion of manual operations is at or below the predetermined threshold, it judges that the user has mastered voice operation and switches the operation mode to the normal mode.
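The difference from the per-task judgment of Embodiment 1 is that the counts are pooled before the ratio is taken. The following sketch assumes a hypothetical dictionary mapping task names to `(voice, manual)` count pairs and a threshold of 0.5; neither the task names nor the threshold value comes from the original.

```python
from typing import Dict, Tuple


def overall_mode(counts_by_task: Dict[str, Tuple[int, int]],
                 threshold: float = 0.5) -> str:
    """Pool (voice, manual) counts over all tasks, then apply one threshold."""
    voice = sum(v for v, _ in counts_by_task.values())
    manual = sum(m for _, m in counts_by_task.values())
    total = voice + manual
    # Assumption: with no recorded operations, stay in the learning mode.
    ratio = voice / total if total else 0.0
    return "normal" if ratio > threshold else "learning"


# Hypothetical history: mostly manual for one task, mostly voice for another.
history = {"light_a_on": (1, 4), "light_a_dim": (6, 1)}
pooled = overall_mode(history)
```

Here the pooled voice ratio is 7/12, so the device as a whole is in the normal mode even though the first task alone (1/5) would have selected the learning mode under the per-task rule.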

(Embodiment 4)
In the embodiments described above, users are not identified when the progress of learning is estimated. However, users may be identified so that the progress of learning is determined for each individual user, and, when an operation that can be performed by voice is performed with the manual operation unit, whether or not to output voice guidance is switched according to the learning status of the user who performed the operation. Since the basic configuration of the operation device 1 with a voice recognition function is the same as in Embodiment 1, common components are given the same reference numerals and their description is omitted.

The operation device 1 of this embodiment is provided with a camera (not shown) that photographs the person operating the operation unit. Face images of all users who operate the operation device 1 are registered in the database 10 in advance, and acoustic models of all users are stored in the acoustic model storage unit 8. When a voice operation is performed, the input voice is compared with each user's acoustic model stored in the acoustic model storage unit 8 to identify the user who performed the voice operation, and the number of voice operations by that user is counted. When an operation is performed with the manual operation unit 7, the camera image of the person who operated the manual operation unit 7 is compared with the face images registered in the database 10 to identify the user who performed the manual operation, and the number of manual operations by that user is counted.

Next, the operation of this operation device 1 is described with reference to the flowchart of FIG. 4. Since the operation shown in the flowchart of FIG. 4 is substantially the same as that of the flowchart of FIG. 3, description of the common parts is omitted.

When the operation input method is voice operation, in step S4 the voice recognition unit 9 takes in the voice spoken by the user from the microphone 5, performs voice recognition by comparing the input voice with the acoustic models stored in the acoustic model storage unit 8, and outputs the recognition result to the dialogue control unit 13 (step S5). The dialogue control unit 13 determines whether the recognition result input from the voice recognition unit 9 is speech (step S6). If it is not speech, the process returns to step S2; if it is speech, the dialogue control unit 13 refers to the vocabulary-operation association database built in the database 10 (step S7). The vocabulary-operation association database holds a recognition vocabulary-operation correspondence table such as the one shown in Table 1 above, in which, for each controlled device, the in-home ID indicating the installation location, the device ID, the operation vocabulary for operating the controlled device, and data indicating the operation performed by that operation vocabulary are stored in association with one another. At this point, the control unit 30 compares the input voice with each user's acoustic model to identify the user who performed the voice operation, and counts the number of voice operations by that user (step S18). The dialogue control unit 13 then refers to the recognition vocabulary-operation correspondence table (step S8) and, if the recognition result is an operation vocabulary registered in the vocabulary-operation association database, has the lighting control unit 19 perform the processing corresponding to that operation vocabulary (step S9), ends the task (step S10), and returns to step S2.

When the operation input method is manual operation, the manual control unit 18 controls the lighting control unit 19 according to the operation of the manual operation unit 7 so that the desired operation is performed (step S11). The control unit 30 then compares the face image of the operator captured by the camera with the face images of the users registered in the database 10 to identify the user who performed the manual operation, and counts the number of manual operations by that user (step S19). Next, based on the control operation performed by the manual control unit 18, the dialogue control unit 13 refers to the operation-vocabulary association database built in the database 10 (step S12) and determines whether there is an operation vocabulary corresponding to the operation of the manual operation unit 7 (step S13). The operation-vocabulary association database holds an operation-recognition vocabulary correspondence table such as the one shown in Table 2 above, in which, for each task, the operation ID assigned to the operation of the manual operation unit 7 and the operation vocabulary are stored in association with each other.

If the operation-recognition vocabulary correspondence table contains an operation vocabulary corresponding to the operation, the control unit 30, acting as the learning status estimation unit, determines the learning status of that operation vocabulary for the user who performed the manual operation (step S20). That is, based on the identification results of steps S18 and S19, the control unit 30 accumulates the total number of operations for this operation vocabulary, including both voice operations and manual operations, obtains the proportion of voice operations or the proportion of manual operations in that total, and switches the operation mode to either the learning mode or the normal mode based on that proportion (step S15). Here, if the proportion of voice operations in the total number of operations is at or below a predetermined threshold, or the proportion of manual operations is higher than a predetermined threshold, the control unit 30 judges that the user has not mastered voice operation and switches the operation mode to the learning mode. If the proportion of voice operations is higher than the predetermined threshold, or the proportion of manual operations is at or below the predetermined threshold, it judges that the user has mastered voice operation and switches the operation mode to the normal mode.

When the operation mode has been switched to the learning mode, the control unit 30 has the voice generation unit 12 synthesize voice guidance that tells the user the operation vocabulary corresponding to the manual operation and outputs it from the speaker 6 (step S16); it then ends the task (step S10) and returns to step S2. When the operation mode has been switched to the normal mode, the control unit 30 judges that the user has mastered this operation vocabulary, ends the task (step S10), and returns to step S2.
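The manual-operation branch of the flowchart (steps S19, S12, S13, S20, S15 and S16) can be sketched as follows. User identification itself (speaker and face matching) is outside the sketch and is represented by a plain `user` string. The `Controller` class, the `OPERATION_VOCABULARY` table standing in for Table 2, the guidance wording, and the threshold of 0.5 are all hypothetical.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical stand-in for the operation-recognition vocabulary table (Table 2).
OPERATION_VOCABULARY = {"light_a_on": "Light A on", "light_a_dim": "Light A dim"}


class Controller:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        # Counters keyed by (user, operation ID): per user and per task.
        self.counts = defaultdict(lambda: {"voice": 0, "manual": 0})

    def on_voice(self, user: str, op_id: str) -> None:
        self.counts[(user, op_id)]["voice"] += 1            # step S18

    def on_manual(self, user: str, op_id: str) -> Optional[str]:
        """Return the guidance text to speak, or None if no guidance is due."""
        c = self.counts[(user, op_id)]
        c["manual"] += 1                                    # step S19
        word = OPERATION_VOCABULARY.get(op_id)              # steps S12, S13
        if word is None:
            return None                                     # no voice equivalent
        ratio = c["voice"] / (c["voice"] + c["manual"])     # step S20
        if ratio <= self.threshold:                         # step S15: learning mode
            return f'You can also say "{word}".'            # step S16
        return None                                         # normal mode
```

Because the manual operation is counted before the ratio is computed, the denominator is never zero. Each user's first manual operation always triggers guidance, and guidance stops for that user once voice operations make up more than the threshold share of that user's operations for the task.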

Thus, in this embodiment, the control unit 30 identifies the user who performed a voice operation or a manual operation and determines, for each user and for each task, the progress of learning (proficiency) of the operation vocabulary, so the operation mode can be switched to either the learning mode or the normal mode according to each user's proficiency. When several users use the operation device 1, their proficiency in the operation vocabulary varies, but identifying each user individually allows proficiency in voice operation to be determined per user. Providing voice guidance only to users who have not mastered voice operation improves the learning effect, and stopping the voice guidance for users who have mastered it prevents them from being annoyed by unnecessary guidance. Here, the user identification unit that identifies users individually is constituted by the acoustic model storage unit 8, in which an acoustic model is registered for each user, the voice recognition unit 9, the database storing the users' face images, the camera, the control unit 30 that performs face authentication, and so on.

In this embodiment, the control unit 30 determines the progress of learning for each user from either the proportion of voice operations or the proportion of manual operations in the total number of operations. However, as described in Embodiment 2, the learning status for each task may instead be estimated from the average number of utterances needed to complete the task. In that case, the user identification unit is constituted by the acoustic model storage unit 8, in which an acoustic model is registered for each user, and the voice recognition unit 9.

(Embodiment 5)
In each of the embodiments described above, the control unit 30 determines the user's progress in learning the operation vocabulary based on either the proportion of voice operations or the proportion of manual operations in the total number of operations, and automatically switches the operation mode to either the learning mode or the normal mode based on the result. Alternatively, the user may directly switch the operation mode of the control unit 30 to either the learning mode or the normal mode. Since the configuration of the operation device 1 is the same as in Embodiment 1, common components are given the same reference numerals and their description is omitted.

In this embodiment, the manual operation unit 7 is provided with a changeover switch (not shown) for selectively switching the operation mode of the control unit 30 to either the learning mode or the normal mode. Because the operation mode of the control unit 30 is switched to the learning mode or the normal mode according to the user's operation of the changeover switch, the user can set the operation mode at will.

When the operation mode has been switched to the learning mode with the changeover switch and an operation that can be performed by voice recognition is performed with the manual operation unit 7, the control unit 30 has the voice generation unit 12 synthesize voice guidance announcing that the operation can be performed by voice recognition, and outputs it from the speaker 6. A user who did not know that the operation can be performed by voice is thereby informed of it, the burden of memorizing operation vocabulary by reading a manual or the like is reduced, and the user can learn new operation vocabulary naturally while using the operation device.

When the operation mode has been switched to the normal mode with the changeover switch, the control unit 30 does not output this voice guidance even if a manual operation is performed with the manual operation unit 7. Stopping the voice guidance for users who do not need it prevents them from being annoyed by unnecessary guidance.
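In contrast to the automatic estimation of the earlier embodiments, the mode here is simply whatever the user last selected. A minimal sketch, in which the class `ManualModeSwitch`, its method names, the default mode, and the guidance wording are assumptions for illustration:

```python
from typing import Optional


class ManualModeSwitch:
    """Operation mode set directly by the user's changeover switch."""

    MODES = ("learning", "normal")

    def __init__(self, mode: str = "learning"):
        self.set_mode(mode)

    def set_mode(self, mode: str) -> None:
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode

    def guidance_for(self, operation_word: Optional[str]) -> Optional[str]:
        """Guidance text for a manual operation, or None if none is due.

        operation_word is the voice equivalent of the manual operation,
        or None when the operation has no voice equivalent.
        """
        if self.mode == "learning" and operation_word is not None:
            return f'You can also say "{operation_word}".'
        return None
```

No counters are needed in this variant, since the switch position alone decides whether guidance is spoken.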

A block diagram of the operation device with a voice recognition function of Embodiment 1.
A front view of the same.
A flowchart explaining the operation of the same.
A flowchart explaining the operation of Embodiment 4.

Explanation of symbols

1 Operation device with voice recognition function
5 Microphone (voice input unit)
6 Speaker (voice output unit)
7 Manual operation unit
8 Acoustic model storage unit (storage unit)
9 Voice recognition unit
20 Lighting fixture (controlled device)
30 Control unit

Claims

An operation device with a voice recognition function, comprising: a manual operation unit for operating a controlled device; a voice input unit to which a voice spoken by a user is input; a storage unit that stores acoustic models of a plurality of operation vocabularies for operating the controlled device; a voice recognition unit that performs voice recognition by comparing the voice input to the voice input unit with the acoustic models stored in the storage unit; a voice output unit that outputs voice guidance; a control unit that controls the controlled device based on either an operation input from the manual operation unit or a recognition result of the voice recognition unit; a mode switching unit that switches the operation mode of the control unit to either a learning mode, in which voice guidance is output from the voice output unit in response to an operation input from the manual operation unit, or a normal mode, in which output of the voice guidance is stopped; and a learning status estimation unit that estimates the user's progress in learning the operation vocabulary for each individual task that causes the controlled device to perform a desired operation,
wherein, when an operation that can be performed by voice recognition is performed with the manual operation unit, the control unit causes the voice output unit to output voice guidance announcing that the operation can be performed by voice recognition,
wherein, for each task, the mode switching unit switches the operation mode to the learning mode if the estimation result of the learning status estimation unit is at or below a predetermined level, and switches the operation mode to the normal mode if the estimation result exceeds the predetermined level, and
wherein, for each task, the learning status estimation unit accumulates the number of utterances made before the task is completed, divides the accumulated value by the number of times the task has been executed to obtain the average number of utterances per execution of the task, and estimates the progress of learning for each task based on the average number of utterances.

An operation device with a voice recognition function, comprising: a manual operation unit for operating a controlled device; a voice input unit to which a voice spoken by a user is input; a storage unit that stores acoustic models of a plurality of operation vocabularies for operating the controlled device; a voice recognition unit that performs voice recognition by comparing the voice input to the voice input unit with the acoustic models stored in the storage unit; a voice output unit that outputs voice guidance; a control unit that controls the controlled device based on either an operation input from the manual operation unit or a recognition result of the voice recognition unit; a mode switching unit that switches the operation mode of the control unit to either a learning mode, in which voice guidance is output from the voice output unit in response to an operation input from the manual operation unit, or a normal mode, in which output of the voice guidance is stopped; and a learning status estimation unit that estimates the user's progress in learning the operation vocabulary for each individual task that causes the controlled device to perform a desired operation,
wherein, when an operation that can be performed by voice recognition is performed with the manual operation unit, the control unit causes the voice output unit to output voice guidance announcing that the operation can be performed by voice recognition,
wherein, for each task, the mode switching unit switches the operation mode to the learning mode if the estimation result of the learning status estimation unit is at or below a predetermined level, and switches the operation mode to the normal mode if the estimation result exceeds the predetermined level, and
wherein, for each task, the learning status estimation unit estimates the progress of learning based on at least one of the proportion of voice operations and the proportion of manual operations in the total number of operations, the total including voice operations by voice recognition and manual operations with the manual operation unit.

An operation device with a voice recognition function, comprising: a manual operation unit for operating a controlled device; a voice input unit to which a voice spoken by a user is input; a storage unit that stores acoustic models of a plurality of operation vocabularies for operating the controlled device; a voice recognition unit that performs voice recognition by comparing the voice input to the voice input unit with the acoustic models stored in the storage unit; a voice output unit that outputs voice guidance; a control unit that controls the controlled device based on either an operation input from the manual operation unit or a recognition result of the voice recognition unit; a mode switching unit that switches the operation mode of the control unit to either a learning mode, in which voice guidance is output from the voice output unit in response to an operation input from the manual operation unit, or a normal mode, in which output of the voice guidance is stopped; and a learning status estimation unit that estimates the user's progress in learning the operation vocabulary,
wherein, when an operation that can be performed by voice recognition is performed with the manual operation unit, the control unit causes the voice output unit to output voice guidance announcing that the operation can be performed by voice recognition,
wherein the mode switching unit switches the operation mode to the learning mode if the estimation result of the learning status estimation unit is at or below a predetermined level, and switches the operation mode to the normal mode if the estimation result exceeds the predetermined level, and
wherein the learning status estimation unit estimates the progress of learning, over all tasks that cause the controlled device to perform a desired operation, based on at least one of the proportion of voice operations and the proportion of manual operations in the total number of operations, the total including voice operations by voice recognition and manual operations with the manual operation unit.

The operation device with a voice recognition function according to any one of claims 1 to 3, further comprising a user identification unit that individually identifies a user from the voice input to the voice input unit, wherein the learning status estimation unit estimates the progress of learning for each individual user identified by the user identification unit, and the control unit switches the operation mode for each user based on the estimation result of the learning status estimation unit.