JP2009109586A

JP2009109586A - Voice recognition control device

Info

Publication number: JP2009109586A
Application number: JP2007279456A
Authority: JP
Inventors: Shinpei Hibiya; Kiyotaka Takehara; Kenji Okuno; Akira Baba; Kenji Nakakita
Original assignee: Panasonic Electric Works Co Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2007-10-26
Filing date: 2007-10-26
Publication date: 2009-05-21

Abstract

PROBLEM TO BE SOLVED: To allow a user to easily learn a correct way of speaking by identifying the user's lazy speech tendency and teaching it to the user.

SOLUTION: In this voice recognition control device 10, at least when it is recognized that a voice spoken by a user corresponds to a predetermined voice command, the control corresponding to the recognized voice command is performed. The device includes: a voice input unit 40 to which the voice spoken by the user is input; a voice recognition unit 12 that determines whether the input voice corresponds to the predetermined voice command or to a lazy command, which corresponds to the voice produced when the voice command is spoken lazily; a lazy tendency identifying unit 16 that identifies the user's lazy speech tendency when the voice recognition unit 12 determines that the input voice corresponds to a lazy command; and a user teaching unit 60 that teaches the user the lazy speech tendency identified by the lazy tendency identifying unit 16.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a voice recognition control device that receives voice uttered by a user and, when it recognizes that the input voice corresponds to a predetermined voice command, executes the control corresponding to the recognized voice command.

Depending on how a person vocalizes and on individual differences, an utterance may become lazy: for example, the beginning or end of a spoken word may be weakened or dropped, or an r consonant may be omitted. When such lazily spoken voice is input to a voice recognition control device in which only one voice command is registered for each control action, the device cannot recognize a voice command from the input voice and therefore cannot execute the control.

Conventionally, therefore, commands corresponding to the voice produced when a voice command is spoken lazily (hereinafter, "lazy commands") are managed separately from the voice commands themselves, so that even when the utterance is lazy, the voice recognition control device recognizes the lazy command and can execute the control corresponding to it (see Patent Document 1).

Patent Document 1: JP-A-62-111295

However, merely enabling a voice recognition device to recognize lazily spoken voice does not let the user know whether his or her own speech is lazy and, if so, what kind of laziness it tends toward. The user therefore cannot correct the laziness of his or her speech, and it remains difficult to learn the correct way of speaking.

The present invention has been made to solve the above problem, and its object is to provide a voice recognition control device that allows a user to easily learn a correct way of speaking.

A feature of the present invention is a voice recognition control device that, at least when it recognizes that voice uttered by a user corresponds to a predetermined voice command, executes the control corresponding to the recognized voice command. The device comprises: a voice input unit that receives the voice uttered by the user; a voice recognition unit that determines whether the input voice corresponds to a predetermined voice command or to a lazy command, that is, a command corresponding to the voice produced when that voice command is spoken lazily; a lazy tendency identifying unit that, when the voice recognition unit determines that the input voice corresponds to a lazy command, identifies the user's lazy speech tendency; and user teaching means for teaching the user the lazy speech tendency identified by the lazy tendency identifying unit.

When the input voice is determined to correspond to a lazy command, the user's lazy speech tendency is identified and taught to the user, so the user can easily grasp his or her own lazy speech tendency and thus easily learn the correct way of speaking. For example, the user can grasp, without any burden, lazy speech tendencies such as a single sound at the beginning or end of a word becoming weak or dropped, or a consonant such as "r" being omitted.

The lazy tendency identifying unit may include a storage unit that stores a history of previously recognized lazy commands, or of the lazy speech tendencies to which those lazy commands belong. Referring to the data in this storage unit, when a lazy command has been recognized a predetermined number of times or at a predetermined frequency, the unit may identify the tendency to which that lazy command belongs as the "lazy speech tendency." Alternatively, the tendency to which a lazy command belongs may be identified as the "lazy speech tendency" as soon as that lazy command has been recognized once.
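The count-based identification described above can be pictured as a per-tendency counter. This is a minimal Python sketch, not the patent's implementation: the class name `LazyTendencyTracker`, the tendency labels, and the default threshold of 3 are hypothetical, and only the count-based variant is shown (a frequency-based variant would instead compare the count against the total number of recognitions). Setting `threshold=1` corresponds to the alternative of identifying a tendency after a single recognition.

```python
from collections import Counter


class LazyTendencyTracker:
    """Hypothetical storage-unit sketch: counts recognized lazy commands per tendency."""

    def __init__(self, threshold: int = 3) -> None:
        # Number of recognitions needed before a tendency is reported (assumed value).
        self.threshold = threshold
        self.history: Counter = Counter()

    def record(self, tendency: str) -> bool:
        """Log one recognized lazy command; return True once its tendency is identified."""
        self.history[tendency] += 1
        return self.history[tendency] >= self.threshold

    def identified(self) -> list:
        """All tendencies whose recognition count has reached the threshold."""
        return sorted(t for t, n in self.history.items() if n >= self.threshold)


if __name__ == "__main__":
    tracker = LazyTendencyTracker(threshold=2)
    tracker.record("word_final_dropped")
    tracker.record("r_consonant_dropped")
    tracker.record("word_final_dropped")
    print(tracker.identified())  # ['word_final_dropped']
```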

In the feature of the present invention, the user teaching means may notify the user of the pronunciation based on the correct voice command. The user can then grasp the correct pronunciation of the voice command without any burden.

In the feature of the present invention, the lazy tendency identifying unit may include a missing-portion identifying unit that, when the voice recognition unit determines that the input voice corresponds to a lazy command, identifies the portion where the input voice falls short of the voice command corresponding to the voice produced when that lazy command is spoken without laziness. The user teaching means may then teach the user the missing portion identified by the missing-portion identifying unit.

By identifying the portion where the input voice falls short of the voice command corresponding to the non-lazy utterance of the lazy command, and teaching this missing portion to the user, the device can show which part of the user's utterance is weak or missing, so the user can easily learn the correct way of speaking. For example, when the user tends to weaken or drop the beginning or end of a word, it suffices to teach the user the word-initial or word-final sound that the input voice lacks relative to the voice command.
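One way to locate a missing portion is to align the correct voice command and the recognized lazy input as sequences of morae and keep the spans that the input leaves out. The sketch below uses Python's standard `difflib.SequenceMatcher` for the alignment; the source does not say how the missing-portion identifying unit performs this comparison, and the function names and romanized mora lists are hypothetical.

```python
from difflib import SequenceMatcher


def find_missing_portions(command, spoken):
    """Return (start, end, morae) for each span of `command` absent from `spoken`."""
    matcher = SequenceMatcher(a=command, b=spoken, autojunk=False)
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":  # present in the command, absent from the input
            missing.append((i1, i2, command[i1:i2]))
    return missing


def describe_position(start, end, length):
    """Label a missing span as word-initial, word-final, or word-internal."""
    if start == 0:
        return "word-initial"
    if end == length:
        return "word-final"
    return "word-internal"


if __name__ == "__main__":
    command = ["o", "fu", "ro"]  # hypothetical correct voice command
    spoken = ["fu", "ro"]        # hypothetical lazy input with the first mora dropped
    for start, end, morae in find_missing_portions(command, spoken):
        print(describe_position(start, end, len(command)), morae)
    # prints: word-initial ['o']
```

Because it compares symbols only, this sketch finds dropped sounds; a sound that was merely weakened but still recognized would need acoustic scores, which are not modeled here.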

In the feature of the present invention, the lazy tendency identifying unit may include a changed-portion identifying unit that, when the voice recognition unit determines that the input voice corresponds to a lazy command, identifies the portion where the input voice differs from the voice command corresponding to the voice produced when that lazy command is spoken without laziness. The teaching means may then teach the user the changed portion identified by the changed-portion identifying unit.

By identifying the portion where the input voice has been changed relative to the voice command corresponding to the non-lazy utterance of the lazy command, and teaching this changed portion to the user, the device can show which part of the user's utterance is wrong, so the user can easily learn the correct way of speaking. For example, when the user tends to drop a consonant such as "r", it suffices to teach the user the location of that consonant, where the input voice differs from the voice command.
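A changed portion can be located by aligning the correct voice command and the lazy input as mora sequences and keeping the positions where a mora was replaced rather than dropped. This minimal Python sketch assumes romanized mora lists and uses `difflib.SequenceMatcher` for the alignment; neither is specified in the source, and `find_changed_portions` and `is_dropped_r` are hypothetical helpers for the "r" example.

```python
from difflib import SequenceMatcher


def find_changed_portions(command, spoken):
    """Return (index, expected, actual) for each mora of `command` replaced in `spoken`."""
    matcher = SequenceMatcher(a=command, b=spoken, autojunk=False)
    changed = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # zip pairs morae position by position; if the replaced spans differ in
            # length, the extra morae are not reported in this simplified sketch
            for k, (expected, actual) in enumerate(zip(command[i1:i2], spoken[j1:j2])):
                changed.append((i1 + k, expected, actual))
    return changed


def is_dropped_r(expected, actual):
    """True if `actual` equals `expected` with its leading "r" consonant removed."""
    return expected.startswith("r") and expected[1:] == actual


if __name__ == "__main__":
    for index, expected, actual in find_changed_portions(["te", "re", "bi"], ["te", "e", "bi"]):
        print(index, expected, actual, is_dropped_r(expected, actual))
    # prints: 1 re e True
```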

In the feature of the present invention, the voice recognition unit may include a matching-target deletion unit that removes, from the matching targets, lazy commands that do not belong to the lazy speech tendency identified by the lazy tendency identifying unit.

Removing from the matching targets the lazy commands that do not belong to the lazy speech tendency identified by the lazy tendency identifying unit eliminates lazy commands the user does not need. The amount of matching data to be compared is therefore reduced, the voice recognition rate improves, and misrecognition and the malfunctions it causes are suppressed.

In the feature of the present invention, the voice recognition control device may include a control unit that, when it is recognized that the voice uttered by the user corresponds to a predetermined voice command, executes the control corresponding to the recognized voice command, and, when it is recognized that the voice uttered by the user corresponds to a lazy command (the voice produced when that voice command is spoken lazily), executes the control corresponding to that voice command.

According to the voice recognition control device of the present invention, the user's lazy speech is identified and taught to the user, so the user can easily learn the correct way of speaking.

Embodiments of the present invention will be described below with reference to the drawings. In the description of the drawings, identical parts are denoted by identical reference numerals, and redundant description is omitted.

(First embodiment)
With reference to FIG. 1, the specific configuration of the voice recognition control device 10 and the controlled devices 20 according to the first embodiment of the present invention will be described. The voice recognition control device 10 recognizes instructions (commands) given by voice uttered by the user and controls the controlled devices 20 in accordance with those voice instructions. The voice recognition control device 10 is not limited to controlling the controlled devices 20; it also controls each of its own components based on these voice instructions. In this embodiment, a case where the voice recognition control device 10 controls various controlled devices 20 installed in a bathroom is described as an example.

Specifically, the voice recognition control device 10 comprises: a controller 11 that forms a user interface; a voice recognition unit 12 that determines whether a voice instruction from the user, input via the controller 11, corresponds to a predetermined voice command or a lazy command; a control execution unit 13 that outputs control signals for executing the control of the controlled devices 20 corresponding to the voice command or lazy command recognized by the voice recognition unit 12; a control IF unit 15 that transmits the control signals output from the control execution unit 13 to the controlled devices 20; a voice synthesis unit that synthesizes the voice output to the user; a matching data storage unit 14 that stores matching data for the voice commands and lazy commands that the voice recognition control device 10 can recognize; and a lazy tendency identifying unit 16 that identifies the user's lazy speech tendency based on the determination results of the voice recognition unit 12.

In general, depending on how a person vocalizes and on individual differences, lazy speech may occur: for example, the beginning or end of a spoken word may become weak or be dropped, or an r consonant may be omitted. In this embodiment, a "lazy command" is a command corresponding to the voice produced when a voice command is spoken lazily, and a "voice command" is a command corresponding to the voice produced when the instruction is spoken correctly, without laziness. Details of the lazy commands are described later with reference to FIGS. 4 and 5.

The controller 11 comprises: a voice input unit 40 that receives the voice uttered by the user and outputs it as an electrical signal (voice signal); a voice output unit 50 that outputs the voice synthesized by the voice synthesis unit; and a display unit 70 that displays predetermined screens and the like to the user. Although not shown in FIG. 1, in this embodiment the controller 11 further comprises an operation button unit that accepts button operations by the user. Details of the controller 11 are described later with reference to FIG. 3.

The controller 11 also comprises a user teaching unit 60 (user teaching means) that teaches the user the lazy speech tendency identified by the lazy tendency identifying unit 16. The user teaching unit 60 includes the display unit 70, which displays the lazy speech tendency on a screen, and the voice output unit 50, which announces the lazy speech tendency by voice.

The voice input unit 40 comprises a microphone that picks up the user's speech and converts it into a voice signal, an amplifier that amplifies the voice signal, an A/D converter that converts the amplified voice signal into a digital signal, and a noise subtraction unit that removes noise components from the digitized voice signal. The voice signal from which noise has been removed by the noise subtraction unit is sent to the voice recognition unit 12.

The voice output unit 50 comprises a D/A converter that converts the voice signal synthesized by the voice synthesis unit into an analog signal, an amplifier that amplifies the analog voice signal, and a speaker that converts the amplified voice signal into sound and outputs it.

The display unit 70 has an LED that indicates the operating status of the controlled devices 20 to the user by lighting, turning off, or blinking, and a liquid crystal display device that shows the operating status of the controlled devices 20 to the user with images such as text and pictures.

The operation button unit consists of various buttons with which the user manually sets the operation of the controlled devices 20 and the like; these include a controller on/off switch that switches the controller 11 on and off.

The voice recognition unit 12 compares the voice signal (voice data) output from the voice input unit 40 with the matching data stored in the matching data storage unit 14 to determine whether the voice input to the voice input unit 40 corresponds to a predetermined voice command or a lazy command. When it determines that the voice corresponds to a voice command or lazy command, it outputs a predetermined signal corresponding to that command to the control execution unit 13 and the lazy tendency identifying unit 16.

Lazy commands are matched as follows. The matching data storage unit 14 stores, for each voice command, matching data for the lazy commands anticipated for it. The voice recognition unit 12 identifies the voice signal input from the voice input unit 40 as a voice signal for each phoneme and recognizes that the input characters are, for example, (te)(e)(bi). It then checks whether (te)(e)(bi) is present in the lazy command matching data. If the matching data storage unit 14 holds voice data for the "word" (te)(e)(bi) as lazy command matching data, the voice signal input from the voice input unit 40 can be matched directly against the lazy command matching data. In this way, the voice recognition unit 12 may match the voice signal from the voice input unit 40 either phoneme by phoneme or command by command.
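The phoneme-by-phoneme path can be pictured as a lookup of the recognized mora sequence in a table that holds each voice command together with its anticipated lazy variants. In this minimal sketch the table contents and names are hypothetical; reading (te)(e)(bi) as a lazy form of "terebi" (television) with the r consonant dropped is an assumption, and direct acoustic matching of whole-word voice data is not modeled.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical matching data: mora sequence -> (voice command, is_lazy_command).
MATCHING_DATA = {
    ("te", "re", "bi"): ("terebi", False),  # correct voice command
    ("te", "e", "bi"): ("terebi", True),    # anticipated lazy command (r dropped)
}


def match_morae(morae: Sequence[str]) -> Optional[Tuple[str, bool]]:
    """Return (voice command, is_lazy) for an exact match, or None if unregistered."""
    return MATCHING_DATA.get(tuple(morae))


if __name__ == "__main__":
    print(match_morae(["te", "e", "bi"]))  # ('terebi', True)
```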

The voice recognition unit 12 includes a matching-target deletion unit 19 that removes, from the matching targets, lazy commands that do not belong to the lazy speech tendency identified by the lazy tendency identifying unit 16.

The lazy tendency identifying unit 16 identifies the user's lazy speech tendency based on the result of determining whether the input voice corresponds to a predetermined voice command or a lazy command. Lazy speech generally falls into several tendencies, such as the beginning or end of a spoken word becoming weak or dropped, or an r consonant being omitted; based on whether the voice corresponds to a lazy command, the lazy tendency identifying unit 16 identifies which of these tendencies the user has. The lazy speech tendencies are described later with reference to FIG. 4.
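Sorting a recognized lazy command into one of the tendencies named above might be sketched by aligning it with its correct voice command and inspecting the differences. Everything here is an assumption for illustration: the tendency labels, the romanized mora representation, and the use of Python's `difflib.SequenceMatcher`. Because it compares symbols only, it detects dropped sounds and a dropped "r", not sounds that were merely weakened.

```python
from difflib import SequenceMatcher


def classify_tendency(command, lazy):
    """Map a (voice command, lazy command) pair of mora lists to a tendency label."""
    matcher = SequenceMatcher(a=command, b=lazy, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete" and i1 == 0:
            return "word_initial_dropped"
        if tag == "delete" and i2 == len(command):
            return "word_final_dropped"
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            expected, actual = command[i1], lazy[j1]
            # e.g. "re" -> "e": the mora survives but its leading r consonant is gone
            if expected.startswith("r") and expected[1:] == actual:
                return "r_consonant_dropped"
    return "other"


if __name__ == "__main__":
    print(classify_tendency(["te", "re", "bi"], ["te", "e", "bi"]))  # r_consonant_dropped
```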

The lazy tendency identifying unit 16 has a missing-portion identifying unit 17 and a changed-portion identifying unit 18. When the voice recognition unit 12 determines that the voice input to the voice input unit 40 corresponds to a lazy command, the missing-portion identifying unit 17 identifies the portion where the input voice falls short of the voice command corresponding to the voice produced when that lazy command is spoken without laziness, and the changed-portion identifying unit 18 identifies the portion where the input voice differs from that voice command.

When the missing-portion identifying unit 17 identifies a missing portion, the user teaching unit 60 teaches the user the missing portion and the correct way of speaking. When the changed-portion identifying unit 18 identifies a changed portion, the user teaching unit 60 teaches the user the changed portion and the correct way of speaking. Teaching the user any missing or changed portion found in the input voice, together with the correct way of speaking, shows which part of the user's utterance is weak or missing, or which part is wrong, so the user can easily learn the correct way of speaking. For example, when the user tends to weaken or drop the beginning or end of a word, it suffices to teach the user the word-initial or word-final sound that the input voice lacks relative to the voice command. Alternatively, when the user tends to drop a consonant such as "r", it suffices to teach the user the location of that consonant, where the input voice differs from the voice command. Specific examples of teaching the user a missing or changed portion are described later with reference to FIGS. 6(a) and 6(b).

The matching-target deletion unit 19 does not delete the matching data of such lazy commands from the matching data storage unit 14. Instead, it removes the matching data of lazy commands the user does not need from the matching data that the voice recognition unit 12 compares with the voice signal (voice data) output from the voice input unit 40. Thus, even after the matching data of a lazy command has been removed from the matching targets of the voice recognition unit 12, it remains stored in the matching data storage unit 14. Removing lazy commands that do not belong to the lazy speech tendency identified by the lazy tendency identifying unit 16 from the matching targets of the voice recognition unit 12 in this way eliminates lazy commands the user does not need, so the amount of matching data to be compared is reduced, the voice recognition rate improves, and misrecognition and the malfunctions it causes are suppressed.
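The distinction between the stored matching data and the active matching targets can be sketched as a filter that builds a new list and leaves the stored entries untouched. This is a minimal Python sketch under stated assumptions: `MatchingEntry`, its fields, the tendency labels, and the readings are hypothetical, and correct voice commands are always kept as targets.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class MatchingEntry:
    reading: str             # romanized reading of the registered phrase
    command: str             # voice command whose control it triggers
    tendency: Optional[str]  # None for a correct voice command, else its lazy tendency


def active_matching_targets(stored: Iterable[MatchingEntry], identified: Set[str]) -> List[MatchingEntry]:
    """Build the list the recognizer compares against; `stored` itself is left unchanged."""
    return [e for e in stored if e.tendency is None or e.tendency in identified]


if __name__ == "__main__":
    storage = [  # hypothetical contents of the matching data storage unit
        MatchingEntry("terebi", "terebi", None),
        MatchingEntry("teebi", "terebi", "r_consonant_dropped"),
        MatchingEntry("erebi", "terebi", "word_initial_dropped"),
    ]
    targets = active_matching_targets(storage, {"r_consonant_dropped"})
    print([e.reading for e in targets])  # ['terebi', 'teebi']
    print(len(storage))                  # 3 (nothing deleted from storage)
```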

When the voice recognition unit 12 recognizes that the voice uttered by the user corresponds to a predetermined voice command, the control execution unit (control unit) 13 executes the control corresponding to the recognized voice command. When it is recognized that the voice corresponds to a lazy command, that is, the voice produced when that voice command is spoken lazily, the control execution unit 13 executes the control corresponding to that voice command.
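The rule that a lazy command triggers the same control as its voice command amounts to resolving the lazy command to its voice command before looking up the control. In this minimal Python sketch the table names, readings, and the (device, action) pairs are hypothetical placeholders, not values from the source.

```python
from typing import Optional, Tuple

# Hypothetical tables: lazy command -> voice command, voice command -> (device, action).
LAZY_TO_COMMAND = {"teebi": "terebi", "erebi": "terebi"}
COMMAND_TO_CONTROL = {"terebi": ("television_24", "power_toggle")}


def resolve_control(recognized: str) -> Optional[Tuple[str, str]]:
    """Return the control for a recognized voice or lazy command, or None if unknown."""
    command = LAZY_TO_COMMAND.get(recognized, recognized)  # lazy -> its voice command
    return COMMAND_TO_CONTROL.get(command)
```

For example, `resolve_control("teebi")` and `resolve_control("terebi")` both yield `("television_24", "power_toggle")`, so the user's lazy utterance still operates the device while the tendency is taught separately.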

The control execution unit 13 has a microcomputer and a predetermined storage area (RAM) and controls the operation of the controlled devices 20 and of each component of the voice recognition control device 10 according to a predetermined program. Specifically, when the control execution unit 13 receives a predetermined signal corresponding to the voice command or lazy command recognized by the voice recognition unit 12, or a predetermined signal produced by a button operation on the operation button unit, it transmits a control signal corresponding to that voice command, lazy command, or button operation to the controlled devices 20 or to the relevant component of the voice recognition control device 10.

The controlled devices 20 include a lighting device 21, an air conditioner 22, a water heater 23, a television 24, a jet bath device 25, and a mist sauna device 26. The lighting device 21 illuminates the bathroom with artificial light and includes main lighting that brightens the whole bathroom and indirect lighting that illuminates the room indirectly with light from a light source. The air conditioner 22 is mounted on a wall, window, or the like of the bathroom; it uses a motor-driven fan to blow air whose temperature, humidity, cleanliness, and so on have been adjusted into the bathroom, keeping the bathroom comfortable. The jet bath device 25 sprays hot water mixed with air bubbles from outlets installed at several places on the bathtub wall, directing it at the bather's back, legs, hips, and so on. The mist sauna device 26 delivers warmed, mist-like water vapor into the bathroom and is used for a bathing method in which the user warms his or her body by bathing in the delivered vapor.

FIG. 2 is a view of the bathroom interior showing an example arrangement of the voice recognition control device 10 and the controlled devices 20 shown in FIG. 1. Of the controlled devices 20, the main lighting 21a and indirect lighting 21b belonging to the lighting device 21, and the air conditioner 22, are installed on the bathroom ceiling; the television 24 and the mist sauna device 26 are installed on the wall near the bathtub 3; and the outlet 25a and suction port 25b of the jet bath device 25 are installed inside the bathtub 3. The controller 11 is also installed on the wall near the bathtub 3. The water heater 23 (one of the controlled devices 20), the pump of the jet bath device 25, the heat source unit of the mist sauna device 26, and the components of the voice recognition control device 10 other than the controller 11 are installed outside the bathroom.

The arrangement shown in FIG. 2 is only an example; the voice recognition control device 10 and the controlled devices 20 may be laid out differently. Likewise, although FIGS. 1 and 2 list the lighting device 21, air conditioner 22, water heater 23, television 24, jet bath device 25, and mist sauna device 26 as examples of the controlled devices 20, the controlled devices 20 are not limited to these and include any electrical appliances the user uses in the bathroom, such as appliances that play music or video stored on recording media like cassette tapes, CDs, MDs, and DVDs, heaters, and personal computers.

Next, the layout of the operation surface of the controller 11 shown in FIGS. 1 and 2 is described with reference to FIG. 3. Arranged on the operation surface of the controller 11 are the microphone 41 of the voice input unit 40, the speaker 53 of the voice output unit 50, various operation buttons 60a to 60i, and an LED 71 and a liquid crystal display device 72 serving as the display unit 70.

The operation buttons 60a to 60i consist of a menu button 60a, an enter button 60b, a back button 60c, a cross key 60d, a priority button 60e, a reheat button 60f, an automatic bath button 60g, a call button 60h, and a controller on/off switch 60i. Of these, the priority button 60e, reheat button 60f, automatic bath button 60g, and call button 60h are used to control the water heater 23. The other buttons and switches are used to control not only the water heater 23 but also the other controlled devices 20 and each component of the voice recognition control device 10. The controller 11 thus serves both as a bathroom remote control that operates the controlled devices 20 by switch operation and as the control panel of the voice recognition control device 10.

Specifically, the priority button 60e is used when the user wants to set the hot water supply temperature or the shower temperature from the bathroom. Water and hot water are generally drawn not only in the bathroom but also in the kitchen and elsewhere. Even after the hot water supply temperature or shower temperature of the water heater 23 has been set, the actual temperature may therefore deviate from the setting if water or hot water is used at another location. Pressing the priority button 60e gives the bathroom priority over other locations and makes such deviations less likely. When the priority button 60e is pressed, a priority mark (not shown) is displayed on the display unit 70, for example by lighting the LED 71 or by showing the priority state on the liquid crystal display device 72.

The reheat button 60f is used to raise the temperature of the water in the bathtub. When the reheat button 60f is pressed, a reheat mark (not shown) is displayed on the display unit 70 in the same manner as the priority mark. The automatic bath button 60g is used to fill the bathtub with hot water at a preset volume and temperature. When the automatic bath button 60g is pressed, an automatic mark (not shown) is displayed on the display unit 70 in the same manner as the priority mark.

The call button 60h is used to talk with a kitchen remote control installed outside the bathroom, for example in the kitchen. When the call button 60h is pressed, a call mark (not shown) is displayed on the display unit 70 in the same manner as the priority mark.

The menu button 60a is used to set the operations of the controlled devices 20 and the voice recognition control device 10 by manual input. When the menu button 60a is pressed, several operation items for the controlled devices 20 and the voice recognition control device 10 (for example, air conditioner off, TV power on, TV channel +1, mist sauna on, voice recognition unit on) are displayed on the liquid crystal display device 72. The user operates the cross key 60d to select one of these items.

The confirm button 60b is pressed to have the controlled devices 20 and the voice recognition control device 10 execute the operation item selected with the cross key 60d. The back button 60c is used, for example, to return the screen on the liquid crystal display device 72 to its previous state. When only some of the operation items fit on the liquid crystal display device 72, operating the cross key 60d moves to the next screen and displays the remaining items. Pressing the back button 60c then returns to the previous screen and displays its operation items again. The cross key 60d is used to set the hot water supply temperature and shower temperature, set the hot water volume, select operation items, choose on or off, and so on.

The controller on/off switch 60i turns the power of the controller 11 on or off, and each press toggles the power. When the controller 11 is turned off with the controller on/off switch 60i, the display on the liquid crystal display device 72 is cleared. Control of the controlled devices 20 and the voice recognition control device 10 is then disabled, both through switch operations on the controller 11 and through voice recognition.

The control performed with the operation buttons 60a to 60i described above can equally be performed by voice recognition of the user's speech. The collation data storage unit 14 stores collation data for voice commands and lazy commands corresponding to control equivalent to the button operations of the controller 11. The voice recognition unit 12 compares the voice signal (voice data) output from the voice input unit 40 with the collation data stored in the collation data storage unit 14. In this way it recognizes voice commands and lazy commands that correspond to control equivalent to the button operations of the operation buttons 60a to 60i.
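
The following Python sketch shows collation data that holds voice commands and lazy commands side by side. It assumes an upstream recognizer has already turned the voice signal into a kana transcript. It also replaces the acoustic comparison performed by the voice recognition unit 12 with exact string matching. `CollationEntry`, `COLLATION_DATA`, and `recognize` are hypothetical names, and the rows reuse the FIG. 5 television examples.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CollationEntry:
    """One row of hypothetical collation data (a stand-in for unit 14)."""
    text: str                # kana transcript this entry matches (assumed input form)
    control: str             # control content, e.g. "TV power on"
    canonical: str           # the correct voice command
    tendency: Optional[int]  # None for a voice command; 1, 2 or 3 for a lazy command


COLLATION_DATA: List[CollationEntry] = [
    CollationEntry("てれびをつけて", "TV power on", "てれびをつけて", None),
    CollationEntry("れびをつけて", "TV power on", "てれびをつけて", 1),
    CollationEntry("てれびをつけ", "TV power on", "てれびをつけて", 2),
    CollationEntry("てえびをつけて", "TV power on", "てれびをつけて", 3),
]


def recognize(transcript: str,
              data: List[CollationEntry] = COLLATION_DATA) -> Optional[CollationEntry]:
    """Return the entry whose text matches the transcript, or None on failure.

    Exact string equality stands in for the real acoustic comparison.
    """
    for entry in data:
        if entry.text == transcript:
            return entry
    return None
```

A lazy command resolves to the same control content as its voice command, and its `tendency` field carries the information the lazy tendency specifying unit 16 needs.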

The liquid crystal display device 72 displays the time, the volume and temperature of the water in the bathtub, the hot water supply temperature, the shower temperature, and so on. It also displays the speech laziness tendency identified by the lazy tendency specifying unit 16. The same content may also be announced by voice from the speaker 53, either in addition to or instead of the screen display. Presenting the identified tendency through the screen and voice guidance lets the user easily grasp his or her own lazy speech habits.

Next, the speech laziness tendencies identified by the lazy tendency specifying unit 16 are described with reference to FIGS. 4 and 5. The lazy tendency specifying unit 16 identifies which tendency the user has. It does so from the result of determining whether the input voice corresponds to a voice command or a lazy command. As shown in FIG. 4, the tendencies include, for example:
- “weakened beginning” (lazy tendency 1), in which the beginning of the utterance becomes weak or is dropped;
- “weakened ending” (lazy tendency 2), in which the end of the utterance becomes weak or is dropped;
- “dropped r” (lazy tendency 3), in which the r consonant is lost.

As shown in FIG. 5, suppose the voice command for the control content “turn on the power of the television 24” is “てれびをつけて” (terebi o tsukete, “turn on the TV”). The lazy commands corresponding to this voice command are:
- “れびをつけて” (rebi o tsukete) for “weakened beginning” (lazy tendency 1);
- “てれびをつけ” (terebi o tsuke) for “weakened ending” (lazy tendency 2);
- “てえびをつけて” (teebi o tsukete) for “dropped r” (lazy tendency 3).

Similarly, suppose the voice command for the control content “turn on the lighting device 21” is “あかりをつけて” (akari o tsukete, “turn on the light”). The corresponding lazy commands are:
- “かりをつけて” (kari o tsukete) for “weakened beginning” (lazy tendency 1);
- “あかりをつけ” (akari o tsuke) for “weakened ending” (lazy tendency 2);
- “あかいをつけて” (akai o tsukete) for “dropped r” (lazy tendency 3).
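
The three FIG. 5 patterns are regular enough to generate mechanically. The sketch below shows one hypothetical way to derive lazy commands from a voice command. It rests on two assumptions:
- each kana character is one sound (small kana and long vowels are ignored);
- dropping the r consonant leaves the bare vowel.

`lazy_variants` and `R_DROP` are illustrative names, not part of the described device.

```python
from typing import Dict

# Assumption: losing the r consonant leaves the bare vowel (ら→あ, り→い, る→う, れ→え, ろ→お).
R_DROP = str.maketrans({"ら": "あ", "り": "い", "る": "う", "れ": "え", "ろ": "お"})


def lazy_variants(command: str, n: int = 1) -> Dict[int, str]:
    """Derive lazy commands for tendencies 1-3 from a kana voice command.

    Assumes one kana character per sound (no small kana such as ゃ or っ).
    n is the number of sounds weakened at the beginning or the end.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    variants: Dict[int, str] = {}
    if len(command) > n:
        variants[1] = command[n:]    # tendency 1: weakened beginning
        variants[2] = command[:-n]   # tendency 2: weakened ending
    dropped = command.translate(R_DROP)
    if dropped != command:
        variants[3] = dropped        # tendency 3: dropped r
    return variants
```

With the default `n=1`, the function reproduces both rows of FIG. 5. With `n=3`, the weakened-ending variant of “てれびをつけて” becomes “てれびを”, the three-sound case shown in FIG. 6(a).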

Although not illustrated, the lazy tendency specifying unit 16 also identifies the following tendencies:
(a) “ei” changes to “ee”; for example, “teishi” (停止, stop) becomes “teeshi”.
(b) “ou” changes to “oo”; for example, “bokou” (母校, alma mater) becomes “bokoo”.
(c) “shi” and “hi” are swapped; for example, “hitsuji” (羊, sheep) becomes “shitsuji”.
(d) Other changes that cannot be expressed as rules; for example, “zen'in” (全員, everyone) becomes “zeein”, and “baai” (場合, case) becomes “bawai”.
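
Rules (a) to (c) can likewise be written as string rewrites. The following sketch works on Hepburn romanization, which is an assumption made for readability; the device itself works on speech. Rule (d) has no rule by definition, so its cases would have to be listed entry by entry. `RULES` and `rule_variants` are hypothetical names.

```python
import re
from typing import Dict


def _swap_shi_hi(match):
    # Swap "shi" and "hi" in a single pass so each occurrence is replaced once.
    return "hi" if match.group(0) == "shi" else "shi"


# Rules (a)-(c) on romanized text (an assumption); rule (d) cannot be written as a rule.
RULES = {
    "a": lambda s: s.replace("ei", "ee"),
    "b": lambda s: s.replace("ou", "oo"),
    # The lookbehind (?<!c) keeps the "hi" inside "chi" from being swapped.
    "c": lambda s: re.sub(r"shi|(?<!c)hi", _swap_shi_hi, s),
}


def rule_variants(romaji: str) -> Dict[str, str]:
    """Return the output of every rule that actually changes the input."""
    out: Dict[str, str] = {}
    for name, rule in RULES.items():
        variant = rule(romaji)
        if variant != romaji:
            out[name] = variant
    return out
```

Rules can overlap. For example, `rule_variants("teishi")` returns both “teeshi” (rule a) and “teihi” (rule c), so a real candidate set would likely need curating.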

FIG. 5 illustrates the weakened beginning and ending with examples in which one sound at the beginning or end is weakened. “Lazy commands” in which two or more sounds at the beginning or end are weakened may also be prepared. FIG. 6(a), described next, shows an example of lazy speech in which the last three sounds are weakened.

FIGS. 6(a) and 6(b) show example screens on the liquid crystal display device 72 that present the user's speech laziness tendency and the missing or changed part.

The screen in FIG. 6(a) is displayed when two conditions hold. First, the voice recognition unit 12 recognizes that the input voice corresponds to the lazy command “てれびを” (terebi o). Second, the missing part specifying unit 17 identifies where “てれびを” falls short of the voice command the user would have said without lazy speech (for example, “てれびをつけて”, “turn on the TV”, or “てれびをけして”, “turn off the TV”).

The screen states explicitly that the user has lazy tendency 2 (weakened ending). The full voice command intended by “てれびを” cannot be narrowed down to one, so the missing part is not pinpointed. Instead, the screen shows the speech the device 10 did recognize, “てれびを”, and instructs the user in the correct way of speaking: pronounce the following words through to the end. The screen in FIG. 6(a) may additionally display candidates for the following words (for example, “つけて” tsukete or “けして” keshite).
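
The missing-part search behind FIG. 6(a) amounts to prefix matching, or suffix matching in the case of a weakened beginning. The sketch below assumes the voice commands are available as kana strings. `missing_parts` and `teaching_text` are hypothetical names, and the message wording is illustrative only.

```python
from typing import List


def missing_parts(heard: str, commands: List[str], tendency: int = 2) -> List[str]:
    """List what each candidate voice command has beyond the recognized input.

    tendency 2 (weakened ending): candidates that start with `heard`.
    tendency 1 (weakened beginning): candidates that end with `heard`.
    """
    if tendency == 2:
        return [c[len(heard):] for c in commands if c != heard and c.startswith(heard)]
    if tendency == 1:
        return [c[:len(c) - len(heard)] for c in commands if c != heard and c.endswith(heard)]
    raise ValueError("only tendencies 1 and 2 drop sounds")


def teaching_text(heard: str, commands: List[str]) -> str:
    """Mimic FIG. 6(a): pinpoint the gap if it is unique, otherwise list candidates."""
    tails = missing_parts(heard, commands)
    if not tails:
        return f"Recognized '{heard}'."
    if len(tails) == 1:
        return f"Recognized '{heard}'. You dropped '{tails[0]}' at the end."
    return f"Recognized '{heard}'. Please say the rest, e.g. " + " / ".join(tails)
```

For “てれびを”, two commands remain, so the message lists both endings rather than pinpointing one. This mirrors the behavior described for FIG. 6(a).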

The screen in FIG. 6(b) is displayed when two conditions hold. First, the voice recognition unit 12 recognizes that the input voice corresponds to the lazy command “てえびをつけて” (teebi o tsukete). Second, the changed part specifying unit 18 identifies the part that differs from the voice command “てれびをつけて” (terebi o tsukete) that the user would have said without lazy speech. The screen states explicitly that the user has lazy tendency 3 (dropped r). It also shows that the input voice was recognized as “てえびをつけて” and instructs the user in the correct way of speaking: pronounce the r clearly.
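
For the changed part in FIG. 6(b), a standard sequence alignment is enough to locate the substituted sound. This sketch uses Python's `difflib.SequenceMatcher` as a stand-in for whatever comparison the changed part specifying unit 18 actually performs, again on kana strings. `changed_parts` is a hypothetical name.

```python
import difflib
from typing import List, Tuple


def changed_parts(canonical: str, heard: str) -> List[Tuple[int, str, str]]:
    """Return (position, expected, heard) for each substituted span.

    difflib.SequenceMatcher is an assumed stand-in for unit 18's comparison.
    """
    matcher = difflib.SequenceMatcher(None, canonical, heard)
    return [
        (i1, canonical[i1:i2], heard[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "replace"  # insertions and deletions are not substitutions
    ]
```

Dropped sounds appear as `delete` opcodes rather than `replace`, so they are left to the missing-part check.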

Next, an example of the operating procedure of the voice recognition control device 10 of FIG. 1 is described with reference to FIG. 8.

(A) First, in step S101, the user is prompted to speak into the voice input unit 40. The prompt is either a screen on the liquid crystal display device 72 or voice guidance from the speaker 53.

(B) In step S103, the device determines whether the user's speech has been input to the voice input unit 40. If speech has been input and a voice signal has been output to the voice recognition unit 12 (YES in S103), the process proceeds to step S105. If no speech has been input (NO in S103), the process returns to step S101 and prompts the user for voice input again.

(C) In step S105, the voice recognition unit 12 collates the voice signal output from the voice input unit 40 with the collation data stored in the collation data storage unit 14. It then determines whether the input voice corresponds to a predetermined voice command or lazy command.
- If recognition succeeds (YES in S105), the voice recognition unit 12 outputs a predetermined signal corresponding to that command to the lazy tendency specifying unit 16, and the process proceeds to step S107.
- If recognition fails (NO in S105), a message stating that the voice could not be recognized is displayed on the liquid crystal display device 72, and the process returns to S101.

(D) In step S107, the lazy tendency specifying unit 16 identifies the user's speech laziness tendency based on the result of step S105, and the process proceeds to step S109.
- If the input voice shows a lazy tendency (YES in S109), the process proceeds to step S111. There the user teaching unit 60 teaches the user the tendency identified by the lazy tendency specifying unit 16 (lazy tendency 1, 2, 3, ...), and the flowchart of FIG. 8 ends.
- If the input voice shows no lazy tendency (NO in S109), the flowchart of FIG. 8 ends without performing step S111.
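
Steps S101 to S111 can be sketched as a small loop. The sketch makes these substitutions:
- hypothetical callables replace the microphone, recognizer, and display (`get_speech`, a `lookup` dictionary, and `show`);
- the retry cap `max_attempts` is an added assumption so the loop terminates, whereas the flowchart simply returns to S101.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical collation lookup: transcript -> (correct command, tendency or None).
Lookup = Dict[str, Tuple[str, Optional[int]]]


def first_embodiment_flow(get_speech: Callable[[], Optional[str]],
                          lookup: Lookup,
                          show: Callable[[str], None],
                          max_attempts: int = 3) -> Optional[int]:
    """Sketch of S101-S111.

    Returns the identified lazy tendency, or None if the speech had no lazy
    tendency or was never recognized within max_attempts (an assumed cap).
    """
    for _ in range(max_attempts):
        show("Please say a command.")                      # S101: prompt the user
        heard = get_speech()                               # S103: was speech input?
        if heard is None:
            continue                                       # NO in S103: back to S101
        match = lookup.get(heard)                          # S105: collate
        if match is None:
            show("Your voice could not be recognized.")    # NO in S105
            continue                                       # back to S101
        canonical, tendency = match                        # S107: identify tendency
        if tendency is not None:                           # S109: lazy tendency?
            show(f"Lazy tendency {tendency}: please say '{canonical}'.")  # S111
        return tendency
    return None
```

Passing the user-interface pieces in as callables keeps the sketch testable without hardware.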

As described above, the first embodiment of the present invention provides the following effects.

The lazy tendency specifying unit 16 identifies the user's speech laziness tendency from whether the voice input to the voice input unit 40 corresponds to a predetermined voice command or to a lazy command. The user teaching unit 60 then teaches this tendency to the user. The user can therefore easily grasp his or her own lazy speech habits and easily acquire the correct way of speaking. For example, the user becomes aware, without any burden, of tendencies such as weakening or dropping a sound at the beginning or end of a word, or dropping a consonant such as “r”.

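The voice recognition unit 12 includes the collation target deletion unit 19. This unit deletes from the collation targets the lazy commands that do not belong to the tendency identified by the lazy tendency specifying unit 16, so lazy commands the user does not need are no longer compared. The amount of collation data is reduced, the recognition rate improves, and misrecognition and the resulting malfunctions are suppressed.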
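
A minimal sketch of the collation target deletion unit 19 follows. It rests on two assumptions the text does not state:
- collation entries are tuples of (text, correct command, tendency);
- a tendency counts as the user's once it has been observed `min_count` times.

`observed_tendencies` and `prune_collation` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple

# (text, correct command, tendency); tendency None marks a proper voice command.
Entry = Tuple[str, str, Optional[int]]


def observed_tendencies(history: Iterable[Optional[int]], min_count: int = 1) -> Set[int]:
    """Tendencies seen at least min_count times (the threshold is an assumption)."""
    counts = Counter(t for t in history if t is not None)
    return {t for t, n in counts.items() if n >= min_count}


def prune_collation(entries: List[Entry], keep: Set[int]) -> List[Entry]:
    """Keep every voice command and only lazy commands of tendencies in `keep`."""
    return [e for e in entries if e[2] is None or e[2] in keep]
```

Voice commands are never pruned, so correct speech is always recognized.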

(Second Embodiment)

An example of the operating procedure of the voice recognition control device 10 according to the second embodiment is described with reference to FIG. 9. The procedure applies when a lazy tendency is found. The device determines whether there is a missing part or a changed part, then teaches the user the missing part, the changed part, and the correct way of speaking.

(A) The processing in steps S201 to S209 of FIG. 9 is the same as that in steps S101 to S109 of FIG. 8, and its description is omitted.

(B) In step S211, the missing part specifying unit 17 compares the input voice with the voice command corresponding to the recognized lazy command spoken without lazy speech. It identifies the part where the input voice falls short.
- If a missing part is identified (YES in S211), the process proceeds to step S213. There the user teaching unit 60 teaches the user the missing part and the correct way of speaking, and the process then proceeds to step S215.
- If no missing part is identified (NO in S211), the process proceeds directly to step S215.

(C) In step S215, the changed part specifying unit 18 compares the input voice with the voice command corresponding to the recognized lazy command spoken without lazy speech. It identifies the part where the input voice has been altered.
- If a changed part is identified (YES in S215), the process proceeds to step S217. There the user teaching unit 60 teaches the user the changed part and the correct way of speaking, and the flowchart of FIG. 9 ends.
- If no changed part is identified (NO in S215), the flowchart of FIG. 9 ends without performing step S217.
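
Steps S211 to S217 combine the two checks illustrated in FIG. 6. The sketch below handles one recognized lazy command against its correct voice command, with these assumptions:
- a truncated prefix or suffix counts as the missing part;
- substitutions reported by `difflib` count as changed parts.

The function name `second_embodiment_teach` and its messages are hypothetical.

```python
import difflib
from typing import Callable, List, Optional, Tuple


def second_embodiment_teach(heard: str, canonical: str,
                            show: Callable[[str], None]
                            ) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Sketch of S211-S217 for one recognized lazy command (kana strings assumed)."""
    # S211: missing part, when the input is a truncated prefix or suffix.
    missing: Optional[str] = None
    if heard != canonical and canonical.startswith(heard):
        missing = canonical[len(heard):]
    elif heard != canonical and canonical.endswith(heard):
        missing = canonical[:len(canonical) - len(heard)]
    if missing:
        show(f"'{missing}' was missing; please say '{canonical}'.")   # S213
    # S215: changed part, taken from substitution ("replace") opcodes only.
    ops = difflib.SequenceMatcher(None, canonical, heard).get_opcodes()
    changed = [(canonical[i1:i2], heard[j1:j2])
               for tag, i1, i2, j1, j2 in ops if tag == "replace"]
    for expected, got in changed:
        show(f"You said '{got}' instead of '{expected}'.")            # S217
    return missing, changed
```

A weakened ending yields only a missing part, while a dropped r yields only a changed part. Each branch can be taught independently, as in FIG. 9.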

As described above, the second embodiment of the present invention provides the following effects.

The device identifies where the input voice falls short of the voice command corresponding to the lazy command spoken without lazy speech, and teaches this missing part to the user. It can therefore show which sounds in the user's speech are weak or dropped, and the user can easily learn the correct way of speaking.

The device identifies where the input voice has been altered relative to the voice command corresponding to the lazy command spoken without lazy speech, and teaches this changed part to the user. It can therefore show which part of the user's speech is wrong, and the user can easily learn the correct way of speaking.

Although the present invention has been described above by way of two embodiments, the statements and drawings forming part of this disclosure should not be understood as limiting the invention. Various alternative embodiments, examples, and operational techniques will be apparent to those skilled in the art from this disclosure.

The embodiments describe a voice recognition control device 10 that executes the control corresponding to a lazy command whenever it recognizes the user's speech as that lazy command. The voice recognition control device of the present invention is not limited to this; it need not execute the corresponding device control when it recognizes a lazy command. It suffices that the device executes the corresponding control at least when it recognizes a voice command. When it recognizes a lazy command, it may or may not execute the corresponding control.

The voice recognition control device according to the present invention is not limited to bathrooms. It can be applied to other locations such as bedrooms, living rooms, the area around an office desk, and conference rooms, and can control the electrical appliances installed there. It can also be applied to devices operable through a voice recognition function, such as car navigation systems, mobile phones, and personal computers.

In addition, as shown in FIG. 7, a table summarizing the correspondence between voice commands and control contents for each controlled device 20 is displayed on the liquid crystal display device 72 or included in the instruction manual of the voice recognition control device 10. This lets the user see at once which voice command corresponds to the desired control, and so easily learn the correct way of speaking voice commands.

The present invention should therefore be understood to encompass various embodiments and the like not described here. Accordingly, the present invention is limited only by the matters specifying the invention in the claims, as reasonably interpreted in light of this disclosure.

FIG. 1 is a block diagram showing the specific configuration of the voice recognition control device 10 and the controlled devices 20 according to the first embodiment of the present invention.
FIG. 2 is an external view of a bathroom showing an example arrangement of the voice recognition control device 10 and the controlled devices 20 shown in FIG. 1.
FIG. 3 is a plan view showing the layout of the operation surface of the controller 11 shown in FIGS. 1 and 2.
FIG. 4 is a table illustrating the speech laziness tendencies identified by the lazy tendency specifying unit 16.
FIG. 5 is a table showing examples of voice commands and lazy commands.
FIGS. 6(a) and 6(b) show example screens of the liquid crystal display device 72 that present the user's speech laziness tendency and the missing or changed part.
FIG. 7 is a table summarizing the correspondence between voice commands and control contents for each controlled device 20.
FIG. 8 is a flowchart showing an example of the operating procedure of the voice recognition control device 10 of FIG. 1.
FIG. 9 is a flowchart showing an example of the operating procedure of the voice recognition control device 10 according to the second embodiment.

Explanation of symbols

3 ... Bathtub
10 ... Voice recognition control device
11 ... Controller
12 ... Voice recognition unit
13 ... Control execution unit
14 ... Collation data storage unit
15 ... Control IF unit
16 ... Lazy tendency specifying unit
17 ... Missing part specifying unit
18 ... Changed part specifying unit
19 ... Collation target deletion unit
20 ... Controlled device
21 ... Lighting device
21a ... Main lighting
21b ... Indirect lighting
22 ... Air conditioner
23 ... Water heater
24 ... Television
25 ... Jet bath device
25a ... Jet outlet
25b ... Suction port
26 ... Mist sauna device
40 ... Voice input unit
41 ... Microphone
50 ... Voice output unit
53 ... Speaker
60 ... User teaching unit (user display means)
60a ... Menu button
60b ... Confirm button
60c ... Back button
60d ... Cross key
60e ... Priority button
60f ... Reheat button
60g ... Automatic button
60h ... Call button
60i ... Controller on/off switch
70 ... Display unit
71 ... LED
72 ... Liquid crystal display device

Claims

1. A voice recognition control device that executes control corresponding to a recognized voice command at least when it recognizes that a voice uttered by a user corresponds to a predetermined voice command, the device comprising:
a voice input unit to which the voice uttered by the user is input;
a voice recognition unit that determines whether the input voice corresponds to the predetermined voice command or to a lazy command, the lazy command corresponding to the voice produced when the voice command is uttered lazily;
a lazy tendency specifying unit that specifies the user's speech laziness tendency when the voice recognition unit determines that the input voice corresponds to the lazy command; and
user teaching means for teaching the user the speech laziness tendency specified by the lazy tendency specifying unit.

2. The voice recognition control device according to claim 1, wherein the user teaching means presents the pronunciation based on the correct voice command.

3. The voice recognition control device according to claim 1, wherein the lazy tendency specifying unit includes a missing part specifying unit that operates when the voice recognition unit determines that the input voice corresponds to the lazy command, and that specifies the part where the input voice falls short of the voice command corresponding to the voice that would be produced if the lazy command were uttered without lazy speech, and
the user teaching means teaches the user the missing part specified by the missing part specifying unit.

4. The voice recognition control device according to claim 1, wherein the lazy tendency specifying unit includes a changed part specifying unit that operates when the voice recognition unit determines that the input voice corresponds to the lazy command, and that specifies the part where the input voice has been altered relative to the voice command corresponding to the voice that would be produced if the lazy command were uttered without lazy speech, and
the user teaching means teaches the user the changed part specified by the changed part specifying unit.

5. The voice recognition control device according to claim 1, wherein the voice recognition unit includes a collation target deletion unit that deletes from the collation targets the lazy commands that do not belong to the speech laziness tendency specified by the lazy tendency specifying unit.

6. The voice recognition control device according to claim 1, further comprising a control unit that:
executes control corresponding to the recognized voice command when it recognizes that the voice uttered by the user corresponds to the predetermined voice command; and
also executes control corresponding to the voice command when it recognizes that the voice uttered by the user corresponds to the lazy command, that is, the voice produced when the voice command is uttered lazily.