JP2009109585A - Voice recognition control device

Info

Publication number: JP2009109585A
Application number: JP2007279455A
Authority: JP
Inventors: Shinpei Hibiya; 新平日比谷; Kiyotaka Takehara; 清隆竹原; Kenji Okuno; 健治奥野; Akira Baba; 朗馬場; Kenji Nakakita; 賢二中北
Original assignee: Panasonic Electric Works Co Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2007-10-26
Filing date: 2007-10-26
Publication date: 2009-05-21

Abstract

PROBLEM TO BE SOLVED: To improve the voice recognition rate and to suppress erroneous recognition and the malfunctions it causes.
SOLUTION: A voice recognition control device 10 performs the control corresponding to a recognized voice command or lazy command when it recognizes that a voice spoken by a user corresponds to a predetermined voice command, or to a lazy command, that is, the voice produced when a voice command is spoken lazily. The device includes: a collating data storage part 14 for storing collating data of the voice commands and the lazy commands; a voice input part 40 to which the voice spoken by the user is input and which converts the voice into a predetermined voice signal; a voice recognition part 12 for determining whether or not the input voice corresponds to a predetermined voice command or lazy command by collating the converted voice signal with the collating data stored in the collating data storage part 14; and a collating object deleting part 16 that, based on the determination result of the voice recognition part 12, identifies lazy commands unnecessary for the user and deletes them from the collating objects of the voice recognition part 12.
COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a voice recognition control device that receives a voice uttered by a user and, when it recognizes that the input voice corresponds to a predetermined voice command, executes the control corresponding to the recognized voice command.

It is generally known that, depending on the manner of speaking and on individual differences, utterances can become lazy: for example, the beginning or end of a spoken word becomes weak or is dropped, or the consonant r is omitted. When such a lazily spoken voice is input to a voice recognition control device in which only one voice command is registered for each control content, the device cannot recognize a voice command from the input voice and therefore cannot execute the control.

Conventionally, therefore, commands corresponding to the voice produced when a voice command is uttered lazily (hereinafter referred to as "lazy commands") have been managed in advance separately from the voice commands, so that even when an utterance is lazy, the lazy command is recognized and the control corresponding to it can be executed (see Patent Document 1).
Patent Document 1: JP-A-62-111295

However, newly adding lazy commands to the target vocabulary for voice recognition increases the number of target words, which lowers the voice recognition rate. In addition, because the number of similar target words increases, erroneous recognition increases, and so do the device malfunctions that accompany it.

The present invention has been made to solve the above problems, and its object is to improve the voice recognition rate and to suppress erroneous recognition and the malfunctions caused by it.

A feature of the present invention is a voice recognition control device that, when it recognizes that a voice uttered by a user corresponds to a predetermined voice command, executes the control corresponding to the recognized voice command, and that, when it recognizes that the voice uttered by the user corresponds to a lazy command (the voice produced when that voice command is uttered lazily), executes the control corresponding to that voice command. The device comprises: a collation data storage unit that stores collation data for the voice commands and collation data for the lazy commands that the voice recognition control device can recognize; a voice input unit that receives the voice uttered by the user and converts it into a predetermined voice signal; a voice recognition unit that collates the voice signal converted by the voice input unit with the collation data stored in the collation data storage unit and determines whether or not the input voice corresponds to a predetermined voice command or lazy command; and a collation target deletion unit that, based on the result of the determination by the voice recognition unit, identifies lazy commands unnecessary for the user and deletes them from the collation targets of the voice recognition unit.

Identifying lazy commands unnecessary for the user based on the result of the determination by the voice recognition unit, and deleting them from the collation targets, reduces the amount of collation data to be collated. As a result, the voice recognition rate improves, and erroneous recognition and the resulting malfunctions are suppressed.

Here, it is desirable that the collation target deletion unit does not delete the collation data of the lazy command from the collation data storage unit, but instead removes the collation data of lazy commands unnecessary for the user from the collation data that the voice recognition unit compares with the voice signal (voice data) output from the voice input unit. However, the collation data of the lazy command may also be deleted from the collation data storage unit itself.

In this feature of the present invention, the voice recognition control device may further comprise utterance prompting means that prompts the user to make a predetermined initial utterance; the voice recognition unit may determine whether or not the voice of the initial utterance corresponds to a predetermined voice command or lazy command; and the collation target deletion unit may delete lazy commands unnecessary for the user from the collation targets of the voice recognition unit based on the result of the determination on the initial utterance.

By prompting the user to make a predetermined initial utterance and deleting lazy commands unnecessary for the user from the collation targets based on the determination result for that utterance, the amount of collation data to be collated can be reduced from the moment the user starts using the voice recognition control device.

The utterance prompting means may be display means that displays a screen prompting the user to make the predetermined initial utterance.

In this feature of the present invention, the collation target deletion unit may identify lazy commands that the voice recognition unit recognizes infrequently and delete them from the collation targets of the voice recognition unit.

For example, the voice recognition control device may further comprise a first frequency counting unit that counts the frequency with which the voice recognition unit recognizes lazy commands for each control content indicated by the lazy command, and the collation target deletion unit may identify, for each control content, the lazy commands to be deleted from the collation targets of the voice recognition unit according to the frequency counted by the first frequency counting unit.

Alternatively, the voice recognition control device may further comprise a second frequency counting unit that counts the frequency with which the voice recognition unit recognizes lazy commands for each lazy-speech tendency to which the lazy command belongs, and the collation target deletion unit may identify, for each lazy-speech tendency, the lazy commands to be deleted from the collation targets of the voice recognition unit according to the frequency counted by the second frequency counting unit. By identifying the lazy commands to be deleted from the collation targets according to the frequency with which each lazy command is recognized, lazy commands unnecessary for the user can be deleted.
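The tendency-based selection described above can be sketched as follows. This is only an illustration under stated assumptions: the text says the deletion is decided "according to the frequency" but gives no concrete rule here, so the threshold `min_share`, the function name `lazy_commands_to_delete`, and the tendency labels are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def lazy_commands_to_delete(
    recognized: Iterable[str],
    tendency_of: Dict[str, str],
    min_share: float = 0.1,  # hypothetical threshold, not taken from the source
) -> Set[str]:
    """Return lazy commands whose tendency is rarely observed for this user.

    recognized  -- lazy commands recognized so far (one entry per recognition)
    tendency_of -- maps every registered lazy command to its tendency label
    """
    # Count recognitions per tendency (the role of the second frequency counter).
    counts = Counter(tendency_of[c] for c in recognized if c in tendency_of)
    total = sum(counts.values())
    if total == 0:
        return set()  # no evidence yet, so delete nothing
    # A tendency is "rare" when its share of lazy recognitions is below the threshold.
    rare = {t for t in set(tendency_of.values()) if counts[t] / total < min_share}
    return {c for c, t in tendency_of.items() if t in rare}
```

A user who almost always drops the consonant r would, under this rule, keep the r-dropping variants and lose the variants that drop the beginning or end of the word.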

In this feature of the present invention, the voice recognition control device may further comprise a speaker identification unit that identifies the speaker of a voice based on the voice input to the voice input unit, and the collation target deletion unit may change the lazy commands to be deleted from the collation targets of the voice recognition unit according to the speaker identified by the speaker identification unit. Because the lazy-speech tendency is largely determined by the individual speaker, changing the lazy commands to be deleted according to the identified speaker allows appropriate lazy commands to be selected for each user (speaker).
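One way to picture the speaker-dependent behaviour is a separate exclusion set per identified speaker. The sketch below is an assumption-laden illustration, not the patent's implementation: the class `PerSpeakerExclusions`, the speaker IDs, and the command strings are hypothetical.

```python
from typing import Dict, Set


class PerSpeakerExclusions:
    """Hypothetical per-speaker record of lazy commands removed from matching."""

    def __init__(self) -> None:
        self._excluded: Dict[str, Set[str]] = {}

    def exclude(self, speaker_id: str, lazy_command: str) -> None:
        """Mark a lazy command as unnecessary for one speaker only."""
        self._excluded.setdefault(speaker_id, set()).add(lazy_command)

    def active_targets(self, speaker_id: str, all_commands: Set[str]) -> Set[str]:
        """Collation targets to use once the speaker has been identified."""
        return all_commands - self._excluded.get(speaker_id, set())
```

With this layout, a lazy command removed for one family member remains available to another whose speech actually exhibits that tendency.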

According to the voice recognition control device of the present invention, deleting lazy commands unnecessary for the user from the collation targets reduces the amount of collation data to be collated, so the voice recognition rate improves and erroneous recognition and the resulting malfunctions can be suppressed.

Embodiments of the present invention will be described below with reference to the drawings. In the drawings, identical parts are denoted by identical reference numerals, and their description is not repeated.
(First embodiment)
With reference to FIG. 1, the specific configuration of the voice recognition control device 10 and the controlled devices 20 according to the first embodiment of the present invention will be described. The voice recognition control device 10 recognizes instructions (commands) spoken by the user and controls the controlled devices 20 according to those spoken instructions. The voice recognition control device 10 is not limited to controlling the controlled devices 20; it also controls its own components based on the spoken instructions. In the embodiments of the present invention, a case in which the voice recognition control device 10 controls various devices installed in a bathroom as the controlled devices 20 is described as an example.

Specifically, the voice recognition control device 10 includes: a controller 11 that forms the user interface; a voice recognition unit 12 that determines whether or not a spoken instruction input by the user via the controller 11 corresponds to a predetermined voice command; a control execution unit 13 that outputs a control signal for controlling the controlled devices 20 in accordance with the voice command recognized by the voice recognition unit 12; a control IF unit 15 that transmits the control signal output from the control execution unit 13 to the controlled devices 20; a voice synthesis unit that synthesizes the voice to be output to the user; a collation data storage unit 14 that stores collation data for the voice commands and collation data for the lazy commands that the voice recognition control device 10 can recognize; a collation target deletion unit 16 that, based on the result of the determination by the voice recognition unit 12, deletes lazy commands unnecessary for the user from the collation targets of the voice recognition unit 12; a first frequency counting unit 17 that counts the frequency with which the voice recognition unit 12 recognizes lazy commands for each control content indicated by the lazy command; a second frequency counting unit 18 that counts the frequency with which the voice recognition unit 12 recognizes lazy commands for each lazy-speech tendency to which the lazy command belongs; and a speaker identification unit 19 that identifies the speaker of a voice based on the voice input via the controller 11.

Depending on the manner of speaking and on individual differences, lazy speech may occur: for example, the beginning or end of a spoken word becomes weak or is dropped, or the consonant r is omitted. In the embodiments of the present invention, a "lazy command" is a command corresponding to the voice produced when a voice command is uttered lazily, and a "voice command" is a command corresponding to the voice produced when the spoken instruction is uttered correctly, without laziness. Details of the lazy commands will be described later with reference to FIGS. 4 and 5.

When the voice recognition control device 10 recognizes that the voice uttered by the user corresponds to a predetermined voice command, it executes the control corresponding to the recognized voice command. When it recognizes that the voice uttered by the user corresponds to a lazy command, that is, the voice produced when that voice command is uttered lazily, it executes the control corresponding to that same voice command.
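The behaviour in the preceding paragraph amounts to mapping each lazy command back to its voice command before looking up the control. A minimal sketch follows; the tables, the romanized command strings, and the control names are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical tables: voice command -> control, and lazy form -> voice command.
CONTROL_TABLE = {"terebi": "TV_POWER_TOGGLE", "shoumei": "LIGHT_TOGGLE"}
LAZY_TO_CANONICAL = {"teebi": "terebi", "shoume": "shoumei"}


def resolve_control(recognized: str) -> Optional[str]:
    """Return the control for a recognized voice command or lazy command.

    A lazy command is first mapped to its voice command, so both forms
    trigger the same control. Unknown input yields None.
    """
    canonical = LAZY_TO_CANONICAL.get(recognized, recognized)
    return CONTROL_TABLE.get(canonical)
```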

The controller 11 includes a voice input unit 40 that receives the voice uttered by the user and outputs it as an electrical signal (voice signal), a voice output unit 50 that outputs the voice synthesized by the voice synthesis unit, and a display unit 70 that displays predetermined screens and the like to the user. Although not shown in FIG. 1, the controller 11 in this embodiment further includes an operation button unit that accepts the user's button operations. Details of the controller 11 will be described later with reference to FIG. 3.

The voice input unit 40 includes a microphone that receives the user's uttered voice and converts it into a voice signal, an amplification unit that amplifies this voice signal, an A/D conversion unit that converts the amplified voice signal into a digital signal, and a noise subtraction unit that removes noise components from the digitized voice signal. The voice signal from which noise has been removed by the noise subtraction unit is sent to the voice recognition unit 12 and the speaker identification unit 19.

The voice output unit 50 includes a D/A conversion unit that converts the voice signal synthesized by the voice synthesis unit into an analog signal, an amplification unit that amplifies the analog voice signal, and a speaker that converts the amplified voice signal into sound and outputs it.

The display unit 70 has an LED that indicates the operating status of the controlled devices 20 to the user by being lit, unlit, or blinking, and a liquid crystal display device that shows the operating status of the controlled devices 20 to the user with images such as text and pictures.

The operation button unit consists of various buttons with which the user manually sets the operation of the controlled devices 20 and the like. These include a controller on/off switch that switches the controller 11 on and off.

The voice recognition unit 12 collates the voice signal output from the voice input unit 40 with the collation data stored in the collation data storage unit 14 and determines whether or not the voice signal corresponds to a predetermined voice command or lazy command. Specifically, the voice recognition unit 12 compares the voice signal (voice data) output from the voice input unit 40 with the collation data stored in the collation database to determine whether or not the voice uttered by the user corresponds to a predetermined voice command or lazy command. When it determines that the voice corresponds to a voice command or lazy command, it outputs a predetermined signal corresponding to that voice command or lazy command to the control execution unit 13.

The method of collating lazy commands is as follows. The collation data storage unit 14 stores collation data for the lazy commands anticipated for each voice command. The voice recognition unit 12 identifies the voice signal input from the voice input unit 40 as a voice signal for each phoneme and recognizes that the input characters are, for example, (te)(e)(bi). It then checks whether (te)(e)(bi) exists in the collation data for the lazy commands. If the collation data storage unit 14 holds, as collation data for a lazy command, voice data for the "word" (te)(e)(bi), the voice signal input from the voice input unit 40 can be collated directly with the collation data for the lazy command. Thus, the voice recognition unit 12 may collate the voice signal input from the voice input unit 40 phoneme by phoneme, or it may collate it in units of whole commands.
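The phoneme-by-phoneme variant of this check can be sketched as a lookup on a sequence of recognized units. The sketch assumes the acoustic front end has already segmented the signal into units such as "te", "e", "bi"; that segmentation is outside its scope. The table `LAZY_COLLATION`, and the assumption that (te)(e)(bi) is a lazy form of "terebi" (television) with the r dropped, are illustrative guesses rather than statements from the source.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical collation data: unit sequences of lazy commands -> voice command.
LAZY_COLLATION: Dict[Tuple[str, ...], str] = {
    ("te", "e", "bi"): "terebi",
}


def match_lazy_command(phonemes: List[str]) -> Optional[str]:
    """Look up an already-segmented unit sequence in the lazy-command data.

    Returns the voice command the lazy form stands for, or None if the
    sequence is not registered as a lazy command.
    """
    return LAZY_COLLATION.get(tuple(phonemes))
```

Collating in units of whole commands would instead compare the input signal against stored voice data for each lazy command, which this lookup does not model.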

The voice recognition unit 12 also identifies the user's lazy-speech tendency based on the result of determining whether or not the voice corresponds to a voice command or a lazy command. Lazy speech generally shows several tendencies, such as the beginning or end of a spoken word becoming weak or being dropped, or the consonant r being omitted. Based on the result of determining whether or not the voice corresponds to a lazy command, the voice recognition unit 12 identifies which lazy-speech tendency the user has. Lazy-speech tendencies will be described later with reference to FIG. 4.

The control execution unit 13 includes a microcomputer and a predetermined storage area (RAM), and controls the operation of the controlled devices 20 and of each component of the voice recognition control device 10 according to a predetermined program. Specifically, when the control execution unit 13 receives a predetermined signal corresponding to a voice command or lazy command recognized by the voice recognition unit 12, or a predetermined signal generated by a button operation on the operation button unit, it transmits a control signal corresponding to that voice command, lazy command, or button operation to the controlled devices 20 or to the relevant component of the voice recognition control device 10.

The collation target deletion unit 16 deletes lazy commands unnecessary for the user from the collation targets of the voice recognition unit 12, based on the result of the determination by the voice recognition unit 12 as to whether or not the input voice corresponds to a predetermined voice command or lazy command. Here, the collation target deletion unit 16 does not delete the collation data of the lazy command from the collation data storage unit 14. Instead, it removes the collation data of lazy commands unnecessary for the user from the collation data that the voice recognition unit 12 compares with the voice signal (voice data) output from the voice input unit 40. Therefore, even after the collation data of a lazy command has been deleted from the collation targets of the voice recognition unit 12, it remains stored in the collation data storage unit 14. Deleting lazy commands unnecessary for the user from the collation targets in this way reduces the amount of collation data that the voice recognition unit 12 must collate, so the voice recognition rate improves, and erroneous recognition and the resulting malfunctions are suppressed. The collation target deletion unit 16 may, however, delete the collation data of the lazy command from the collation data storage unit 14 itself.
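The distinction between removing an entry from the collation targets and erasing it from storage can be sketched as an exclusion set layered over the stored data. This is a minimal illustration under assumptions: `CollationStore`, its fields, and the sample entries are hypothetical names, not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CollationStore:
    """Hypothetical stand-in for the collation data storage unit 14."""

    entries: Dict[str, str]                           # command text -> control name
    excluded: Set[str] = field(default_factory=set)   # removed from matching only

    def exclude(self, lazy_command: str) -> None:
        """Remove a lazy command from the collation targets without erasing it."""
        if lazy_command in self.entries:
            self.excluded.add(lazy_command)

    def active_targets(self) -> Dict[str, str]:
        """Entries the recognizer actually compares against."""
        return {k: v for k, v in self.entries.items() if k not in self.excluded}
```

Because the stored entries are never modified, an excluded lazy command could be reinstated later, for example for a different speaker, which the hard-delete alternative mentioned at the end of the paragraph would not allow.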

The collation target deletion unit 16 may identify the lazy commands to be deleted from the collation targets of the voice recognition unit 12 by referring directly to the determination result of the voice recognition unit 12 as to whether or not the input voice corresponds to a predetermined voice command or lazy command, or it may do so by way of the user's lazy-speech tendency identified by the voice recognition unit 12. In other words, the collation target deletion unit 16 may refer to the determination result of the voice recognition unit 12 either directly or indirectly when identifying the lazy commands to be deleted from the collation targets of the voice recognition unit 12.

The first frequency counting unit 17 and the second frequency counting unit 18 count the frequency with which the voice recognition unit 12 recognizes lazy commands. This frequency includes, for example: (1) the ratio of the number of recognitions of each lazy command to the number of times the voice recognition unit 12 successfully recognized any voice command or lazy command; (2) the ratio of the number of recognitions of each lazy command to the number of times a voice was input to the voice input unit 40; (3) the ratio of the number of recognitions of each lazy command indicating a given control content to the number of times the voice recognition unit 12 successfully recognized the voice command and all lazy commands indicating that same control content; and (4) the ratio of the number of recognitions of each lazy command indicating a given control content to the number of times the voice recognition unit 12 successfully recognized all lazy commands indicating that same control content. The first frequency counting unit 17 counts this frequency for each control content indicated by the lazy command, and the second frequency counting unit 18 counts it for each lazy-speech tendency to which the lazy command belongs. Details of the first frequency counting unit 17 and the second frequency counting unit 18 will be described later with reference to FIGS. 4 and 5.
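The four example ratios named in the paragraph above differ only in their denominators, which a short calculation makes concrete. The sketch below assumes a hypothetical recognition log of `(recognized text, control content, is_lazy)` tuples and a separately tracked count of all voice inputs; neither format comes from the source.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical log entry: (recognized text, control content, is_lazy).
Event = Tuple[str, str, bool]


def lazy_ratios(events: List[Event], total_inputs: int) -> Dict[str, Dict[str, float]]:
    """Compute, per lazy command, the four example ratios from the text.

    events       -- successful recognitions of voice commands and lazy commands
    total_inputs -- number of times any voice was input (successful or not)
    """
    lazy_counts = Counter(text for text, _, is_lazy in events if is_lazy)
    per_control_all = Counter(control for _, control, _ in events)
    per_control_lazy = Counter(control for _, control, is_lazy in events if is_lazy)
    control_of = {text: control for text, control, is_lazy in events if is_lazy}

    ratios: Dict[str, Dict[str, float]] = {}
    for text, n in lazy_counts.items():
        control = control_of[text]
        ratios[text] = {
            "of_all_successes": n / len(events),                   # all recognitions
            "of_all_inputs": n / total_inputs,                     # all voice inputs
            "of_same_control": n / per_control_all[control],       # same control, any form
            "of_same_control_lazy": n / per_control_lazy[control], # same control, lazy only
        }
    return ratios
```

Which of these ratios is used changes how aggressively rarely used lazy commands are flagged; the source lists them as examples without preferring one.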

The speaker identification unit 19 includes a memory (not shown) in which voice signals (voice data) previously converted by the voice input unit 40 are stored for each user. It identifies the speaker of a voice by comparing the voice data stored in this memory with the voice input to the voice input unit 40.

The controlled devices 20 include a lighting device 21, an air conditioning device 22, a water heater 23, a television 24, a jet bath device 25, and a mist sauna device 26. The lighting device 21 illuminates the bathroom with artificial light and includes main lighting that brightens the whole bathroom and indirect lighting that illuminates it indirectly with light from a light source. The air conditioning device 22 is mounted on a wall, window, or similar part of the bathroom. It uses a motor-driven fan to send air whose temperature, humidity, cleanliness, and the like have been adjusted into the bathroom in order to keep the bathroom comfortable. The jet bath device 25 ejects hot water mixed with air bubbles from outlets installed at several places on the wall of the bathtub and directs it at the bather's back, legs, lower back, and so on. The mist sauna device 26 sends warmed, mist-like water vapor into the bathroom and is used for a bathing method in which the user's body is warmed by exposure to that vapor.

FIG. 2 is an external view of the inside of a bathroom showing an example arrangement of the voice recognition control device 10 and the controlled devices 20 shown in FIG. 1. As controlled devices 20, the main lighting 21a and indirect lighting 21b belonging to the lighting device 21 and the air conditioning device 22 are installed on the bathroom ceiling; the television 24 and the mist sauna device 26 are installed on the wall near the bathtub 3; and the jet outlet 25a and suction port 25b of the jet bath device 25 are installed inside the bathtub 3. The controller 11 is also installed on the wall near the bathtub 3. The water heater 23, which is one of the controlled devices 20, the pump unit of the jet bath device 25, the heat source unit of the mist sauna device 26, and the components of the voice recognition control device 10 other than the controller 11 are installed outside the bathroom.

The arrangement shown in FIG. 2 is only an example, and the voice recognition control device 10 and the controlled devices 20 may be laid out differently. In FIGS. 1 and 2, the lighting device 21, air conditioning device 22, water heater 23, television 24, jet bath device 25, and mist sauna device 26 are given as examples of the controlled devices 20, but the controlled devices 20 are not limited to these. They also include other electrical appliances that the user uses in the bathroom, such as appliances that play music or video stored on recording media such as cassette tapes, CDs, MDs, and DVDs, heating appliances, and personal computers.

Next, the layout of the operation panel of the controller 11 shown in FIGS. 1 and 2 is described with reference to FIG. 3. The operation panel of the controller 11 carries the microphone 41 of the voice input unit 40, the speaker 53 of the voice output unit 50, operation buttons 60a to 60i, and an LED 71 and a liquid crystal display 72 that together serve as the display unit 70.

The operation buttons 60a to 60i consist of a menu button 60a, a confirm button 60b, a back button 60c, a cross key 60d, a priority button 60e, a reheat button 60f, an auto-bath button 60g, a call button 60h, and a controller on/off switch 60i. Of these, the priority button 60e, the reheat button 60f, the auto-bath button 60g, and the call button 60h are used to control the water heater 23. The remaining buttons and the switch are used to control not only the water heater 23 but also the other controlled devices 20 and the components of the voice recognition control device 10. The controller 11 thus doubles as a bathroom remote control that controls the controlled devices 20 by switch operation and as the control panel of the voice recognition control device 10.

Specifically, the priority button 60e is used to set the hot water supply temperature or the shower temperature from the bathroom. Water and hot water are generally used in places other than the bathroom, such as the kitchen. Consequently, even after the hot water supply temperature or shower temperature of the water heater 23 has been set, the actual temperature may deviate from the setting if water or hot water is used elsewhere. Pressing the priority button 60e gives the bathroom priority over other locations and makes such deviation less likely. When the priority button 60e is pressed, a priority mark (not shown) is presented on the display unit 70, for example by lighting the LED 71 or by showing the priority state on the liquid crystal display 72.

The reheat button 60f is used to raise the temperature of the water in the bathtub. When the reheat button 60f is pressed, a reheat mark (not shown) is presented on the display unit 70 in the same manner as the priority mark. The auto-bath button 60g is used to fill the bathtub with hot water at a preset volume and temperature. When the auto-bath button 60g is pressed, an auto mark (not shown) is presented on the display unit 70 in the same manner as the priority mark.

The call button 60h is used to talk with a remote control installed outside the bathroom, for example a kitchen remote control installed in the kitchen. When the call button 60h is pressed, a call mark (not shown) is presented on the display unit 70 in the same manner as the priority mark.

The menu button 60a is used to set the operation of the controlled devices 20 and the voice recognition control device 10 by manual input. When the menu button 60a is pressed, several operation items for the controlled devices 20 and the voice recognition control device 10 (for example, air conditioner off, TV power on, TV channel +1, mist sauna device on, voice recognition unit on) are shown on the liquid crystal display 72. The user selects one of these operation items with the cross key 60d.

The confirm button 60b is pressed to have the controlled devices 20 or the voice recognition control device 10 carry out the operation item selected with the cross key 60d. The back button 60c is used, for example, to return the screen shown on the liquid crystal display 72 to its previous state. For instance, when the liquid crystal display 72 can show only some of the operation items, operating the cross key 60d moves to the next screen and shows the remaining items. Pressing the back button 60c then returns to the previous screen and shows its operation items on the liquid crystal display 72 again. The cross key 60d is used to set the hot water supply temperature and the shower temperature, to set the water volume, to select operation items, to select on or off, and so on.

The controller on/off switch 60i turns the power of the controller 11 on or off; each press toggles the power of the controller 11 between on and off. When the controller 11 is turned off with the controller on/off switch 60i, the liquid crystal display 72 goes blank, control of the controlled devices 20 and the voice recognition control device 10 through switch operation on the controller 11 is disabled, and control of the controlled devices 20 and the voice recognition control device 10 by voice recognition is also disabled.

The control performed with the operation buttons 60a to 60i can equally be performed through voice recognition of the user's speech. That is, the collation data storage unit 14 stores collation data for voice commands and lazy commands that correspond to control equivalent to the button operations on the controller 11 described above. The voice recognition unit 12 compares the voice signal (voice data) output from the voice input unit 40 with the collation data stored in the collation database, and can thereby recognize the voice commands and lazy commands that correspond to control equivalent to operating the buttons 60a to 60i.

The liquid crystal display 72 shows the time, the volume and temperature of the water in the bathtub, the hot water supply temperature, the shower temperature, and so on. The liquid crystal display 72 also shows a screen that prompts the user to make a predetermined initial utterance. For example, it shows the content (words) of the initial utterance together with text guidance prompting the user to speak the displayed words toward the microphone 41 of the controller 11. When the user makes the initial utterance as guided, the voice input unit 40 converts the speech into a voice signal, the voice recognition unit 12 determines whether the speech corresponds to a predetermined voice command or lazy command, and the collation target deletion unit 16 can, based on that determination, delete lazy commands that the user does not need from the collation targets of the voice recognition unit 12.

The initial-utterance screen is preferably shown before use of the voice recognition control device 10 begins, so that the amount of collation data that the voice recognition unit 12 must collate is reduced from the moment use begins. The initial-utterance screen may be shown to every user each time the controller 11 is turned on with the controller on/off switch 60i. Alternatively, if a user registration function is provided, it may be shown only to users who are using the voice recognition control device 10 for the first time.

The means of prompting the user for the initial utterance is not limited to a screen display on the liquid crystal display 72 or the like; voice guidance from the voice output unit 50 may be used instead. Specifically, voice guidance prompting the user to speak the content (words) of the initial utterance toward the microphone 41 of the controller 11 may be output from the speaker 53.

Accordingly, as shown in FIG. 1, the utterance prompting unit 60 (utterance prompting means), which prompts the user to make the predetermined initial utterance, includes the display unit 70 for on-screen guidance and the voice output unit 50 for voice guidance.

Next, the lazy-speech tendencies identified by the voice recognition unit 12 are described with reference to FIGS. 4 and 5. Based on its determination of whether input speech corresponds to a voice command or a lazy command, the voice recognition unit 12 identifies which lazy-speech tendencies the user has. As shown in FIG. 4, the lazy-speech tendencies include, for example, "word-initial weakening (lazy tendency 1)", in which the beginning of the utterance is weakened or dropped; "word-final weakening (lazy tendency 2)", in which the end of the utterance is weakened or dropped; and "r-dropping (lazy tendency 3)", in which the consonant r is dropped.

As shown in FIG. 5, for example, when the voice command corresponding to the control content "turn on the television 24" is "terebi wo tsukete" (てれびをつけて, "turn on the TV"), the corresponding lazy commands are "rebi wo tsukete" (れびをつけて) for word-initial weakening (lazy tendency 1), "terebi wo tsuke" (てれびをつけ) for word-final weakening (lazy tendency 2), and "teebi wo tsukete" (てえびをつけて) for r-dropping (lazy tendency 3).

Similarly, when the voice command corresponding to the control content "turn on the lighting equipment 21" is "akari wo tsukete" (あかりをつけて, "turn on the light"), the corresponding lazy commands are "kari wo tsukete" (かりをつけて) for word-initial weakening (lazy tendency 1), "akari wo tsuke" (あかりをつけ) for word-final weakening (lazy tendency 2), and "akai wo tsukete" (あかいをつけて) for r-dropping (lazy tendency 3).
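The correspondence in FIG. 5 can be pictured as a lookup table. The following is a minimal sketch only: the names `Tendency`, `MATCH_TABLE`, `lookup`, and the control keys are hypothetical, and plain string comparison stands in for the acoustic collation that the voice recognition unit 12 actually performs on voice signals.

```python
from enum import Enum


class Tendency(Enum):
    INITIAL_WEAKENING = 1  # lazy tendency 1
    FINAL_WEAKENING = 2    # lazy tendency 2
    R_DROPPING = 3         # lazy tendency 3


# Hypothetical table: control content -> (voice command, lazy commands by tendency).
# The command strings are the examples given for FIG. 5.
MATCH_TABLE = {
    "tv_power_on": ("てれびをつけて", {
        Tendency.INITIAL_WEAKENING: "れびをつけて",
        Tendency.FINAL_WEAKENING: "てれびをつけ",
        Tendency.R_DROPPING: "てえびをつけて",
    }),
    "light_power_on": ("あかりをつけて", {
        Tendency.INITIAL_WEAKENING: "かりをつけて",
        Tendency.FINAL_WEAKENING: "あかりをつけ",
        Tendency.R_DROPPING: "あかいをつけて",
    }),
}


def lookup(utterance):
    """Return (control, tendency) for a known utterance, or None.

    tendency is None when the utterance is the full voice command.
    """
    for control, (command, lazy_commands) in MATCH_TABLE.items():
        if utterance == command:
            return control, None
        for tendency, variant in lazy_commands.items():
            if utterance == variant:
                return control, tendency
    return None
```

Keeping the tendency next to each lazy command is what later allows deletion per control content or per tendency.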

Although not illustrated, the voice recognition unit 12 also identifies the following lazy-speech tendencies.
(a) "ei" changes to "ee". Example: "teishi" (ていし, stop) becomes "teeshi" (てえし).
(b) "ou" changes to "oo". Example: "bokou" (ぼこう, alma mater) becomes "bokoo" (ぼこお).
(c) "shi" and "hi" are interchanged. Example: "hitsuji" (ひつじ, sheep) and "shitsuji" (しつじ).
(d) Other changes that cannot be expressed as rules. Examples: "zen'in" (ぜんいん, everyone) becomes "zeein" (ぜえいん); "baai" (ばあい, case) becomes "bawai" (ばわい).
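Rules (a) to (c) are regular enough to be sketched as string rewrites on romanized text. This is an illustration under assumptions, not the method of the specification: the function name `lazy_variants` is hypothetical, romaji input is assumed, and the "shi"/"hi" swap is applied only at the start of the word as a simplification. Rule (d) is by definition not rule-based and would need an explicit list.

```python
def lazy_variants(romaji):
    """Return the set of lazy variants of a romanized word under rules (a)-(c)."""
    variants = set()
    if "ei" in romaji:                      # rule (a): "ei" -> "ee"
        variants.add(romaji.replace("ei", "ee"))
    if "ou" in romaji:                      # rule (b): "ou" -> "oo"
        variants.add(romaji.replace("ou", "oo"))
    if romaji.startswith("shi"):            # rule (c): "shi" -> "hi" (word-initial only)
        variants.add("hi" + romaji[3:])
    elif romaji.startswith("hi"):           # rule (c): "hi" -> "shi" (word-initial only)
        variants.add("shi" + romaji[2:])
    return variants
```

For example, `lazy_variants("teishi")` yields `{"teeshi"}` and `lazy_variants("hitsuji")` yields `{"shitsuji"}`.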

The first frequency counting unit 17 counts how often the voice recognition unit 12 recognizes lazy commands, separately for each control content indicated by the lazy command. For example, when the voice recognition unit 12 recognizes "rebi wo tsukete" (れびをつけて), which belongs to lazy tendency 1, the unit increments the count of lazy tendency 1 for the control content "turn on the television 24".

The second frequency counting unit 18, by contrast, counts how often the voice recognition unit 12 recognizes lazy commands, separately for each lazy-speech tendency to which the lazy command belongs. For example, when the voice recognition unit 12 recognizes "rebi wo tsukete" (れびをつけて), which belongs to lazy tendency 1, the unit increments the count of lazy tendency 1 across all control contents.

The collation target deletion unit 16 may identify the lazy commands to delete from the collation targets of the voice recognition unit 12, per control content or per lazy-speech tendency, according to the frequencies counted by the first frequency counting unit 17 or the second frequency counting unit 18.
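The difference between the two counting units is only the key under which a recognition is tallied. A minimal sketch, assuming a hypothetical class `FrequencyCounters` and integer tendency numbers:

```python
from collections import Counter


class FrequencyCounters:
    """Hypothetical stand-in for frequency counting units 17 and 18."""

    def __init__(self):
        # Unit 17: counts keyed by (control content, tendency).
        self.per_control = Counter()
        # Unit 18: counts keyed by tendency only, across all control contents.
        self.per_tendency = Counter()

    def record(self, control, tendency):
        """Tally one recognized lazy command in both counters."""
        self.per_control[(control, tendency)] += 1
        self.per_tendency[tendency] += 1
```

Recognizing "rebi wo tsukete" would then be recorded as `record("tv_power_on", 1)`, which raises both the television-specific count and the overall count of lazy tendency 1.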

Next, an example of the operating procedure of the voice recognition control device 10 of FIG. 1 is described with reference to FIG. 9.

(A) First, when power is supplied to the voice recognition control device 10 and the device becomes operational, the voice recognition control device 10 determines whether the operation switch 60i of the controller 11 is on (S101). If it is on (YES in S101), the procedure proceeds to step S103.

(B) In step S103, it is determined whether the user's speech has been input to the voice input unit 40. If speech has been input and a voice signal has been output to the voice recognition unit 12 (YES in S103), the procedure proceeds to step S105; if no speech has been input (NO in S103), it proceeds to step S117.

(C) In step S105, the voice recognition unit 12 collates the collation data stored in the collation data storage unit 14 with the voice signal output from the voice input unit 40 and determines whether the user's speech corresponds to a predetermined voice command or lazy command. If it determines that the speech corresponds to a voice command or lazy command, that is, if voice recognition succeeds (YES in S105), it outputs a predetermined signal corresponding to that voice command or lazy command to the control execution unit 13, and the procedure proceeds to step S107. If it does not, that is, if voice recognition fails (NO in S105), a message that the speech could not be recognized is shown on the liquid crystal display 72, and the procedure proceeds to step S117.

(D) In step S107, the control execution unit 13 executes the control of the controlled devices 20 that corresponds to the voice command. The procedure then proceeds to step S109.

(E) In step S109, the voice recognition unit 12 identifies the user's lazy-speech tendencies based on its determination of whether the input speech corresponds to a voice command or a lazy command. For example, if in step S105 the voice recognition unit 12 has recognized both "rebi wo tsukete" (れびをつけて), which belongs to lazy tendency 1 in FIG. 6, and "teebi wo tsukete" (てえびをつけて), which belongs to lazy tendency 3 in FIG. 6, it identifies the user as having lazy tendency 1 and lazy tendency 3. The procedure then proceeds to step S111.

(F) In step S111, the first frequency counting unit 17 and the second frequency counting unit 18 count how often the voice recognition unit 12 recognizes lazy commands. For example, when "rebi wo tsukete" and "teebi wo tsukete" have been recognized, the first frequency counting unit 17 increments the counts of lazy tendency 1 and lazy tendency 3 for the control content "turn on the television 24", and the second frequency counting unit 18 increments the counts of lazy tendency 1 and lazy tendency 3 across all control contents. The procedure then proceeds to step S113.

(G) In step S113, the collation target deletion unit 16 decides, according to the frequencies counted by the first frequency counting unit 17 or the second frequency counting unit 18, whether to delete lazy commands from the collation targets of the voice recognition unit 12. If it decides to delete (YES in S113), the procedure proceeds to step S115, where the collation target deletion unit 16 identifies the lazy commands to delete, per control content or per lazy-speech tendency, and deletes them. The procedure then proceeds to step S117. If it decides not to delete (NO in S113), the procedure proceeds directly to step S117.

(H) In step S117, it is determined whether the operation switch 60i of the controller 11 has been turned off. If it has (YES in S117), the flowchart of FIG. 9 ends; if the switch remains on (NO in S117), the procedure returns to step S103.
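Steps S103 to S117 can be sketched as a single loop. The sketch below makes several assumptions that are not in the specification: the function name `run_session` and all parameter names are hypothetical, utterances arrive as romanized strings (with `None` meaning no voice input in that cycle), exhausting the input sequence stands in for the switch being turned off, and the deletion decision of S113 is delegated to a caller-supplied policy.

```python
def run_session(utterances, commands, lazy_commands, should_delete):
    """Simulate one pass through the FIG. 9 loop.

    commands:      dict utterance -> control content
    lazy_commands: dict utterance -> (control content, tendency); mutated on deletion
    should_delete: callable(counts) -> set of tendencies whose lazy commands to delete
    Returns (log of executed controls, per-tendency counts).
    """
    log = []
    counts = {}
    for utterance in utterances:                 # leaving the loop models S117: switch off
        if utterance is None:                    # S103: NO, no speech input
            continue
        if utterance in commands:                # S105: YES, full voice command
            control, tendency = commands[utterance], None
        elif utterance in lazy_commands:         # S105: YES, lazy command
            control, tendency = lazy_commands[utterance]
        else:                                    # S105: NO, recognition failed
            log.append("unrecognized")
            continue
        log.append(control)                      # S107: execute the control
        if tendency is not None:                 # S109 and S111: identify and count
            counts[tendency] = counts.get(tendency, 0) + 1
        for unneeded in should_delete(counts):   # S113: decide whether to delete
            doomed = [u for u, (_, t) in lazy_commands.items() if t == unneeded]
            for u in doomed:                     # S115: delete from collation targets
                del lazy_commands[u]
    return log, counts
```

Once a tendency's lazy commands have been deleted, an utterance that matches one of them falls through to the "unrecognized" branch, which is the intended narrowing of the collation targets.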

FIG. 6 is a table showing an example of the result of deleting lazy commands from the collation targets of the voice recognition unit 12 by the operating procedure shown in the flowchart of FIG. 9. In this example, all lazy commands belonging to lazy tendency 2 have been deleted from the collation targets of the voice recognition unit 12 according to the per-tendency frequencies counted by the second frequency counting unit 18. In step S105 of FIG. 9, the voice recognition unit 12 recognizes "rebi wo tsukete" (れびをつけて) and "teebi wo tsukete" (てえびをつけて); in step S109, it identifies the user as having lazy tendency 1 and lazy tendency 3; and in step S111, the second frequency counting unit 18 counts the overall frequencies of lazy tendency 1 and lazy tendency 3. This counting is repeated, and once it has been repeated often enough to yield meaningful statistics, the collation target deletion unit 16 in step S115 deletes from the collation targets of the voice recognition unit 12, according to the frequencies counted by the second frequency counting unit 18, all lazy commands belonging to lazy tendency 2, which this user does not need. That is, all lazy commands belonging to word-final weakening (lazy tendency 2), such as "terebi wo tsuke" (てれびをつけ) and "akari wo tsuke" (あかりをつけ) in FIG. 6, are deleted from the collation targets.
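One way to read this example is as a filter that keeps only the lazy commands of tendencies the user has actually exhibited, applied once enough recognitions have accumulated. The sketch below is an assumption-laden illustration: the function name `prune_unused_tendencies` is hypothetical, and the `min_samples` threshold of 20 is an arbitrary placeholder, since the specification says only that counting is repeated until statistics can be obtained.

```python
def prune_unused_tendencies(lazy_commands, tendency_counts, min_samples=20):
    """Return the lazy commands that remain collation targets.

    lazy_commands:   dict utterance -> (control content, tendency)
    tendency_counts: dict tendency -> number of recognitions (as from unit 18)
    min_samples:     hypothetical minimum total before any deletion happens
    """
    if sum(tendency_counts.values()) < min_samples:
        return dict(lazy_commands)  # too little data: delete nothing yet
    return {
        utterance: (control, tendency)
        for utterance, (control, tendency) in lazy_commands.items()
        if tendency_counts.get(tendency, 0) > 0  # keep only tendencies actually observed
    }
```

With counts recorded only for tendencies 1 and 3, every tendency 2 entry is dropped, matching the outcome described for FIG. 6.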

As described above, according to the first embodiment of the present invention, the lazy commands to delete from the collation targets of the voice recognition unit 12 are identified according to how often lazy commands are recognized, so lazy commands that the user does not need can be deleted. This reduces the amount of collation data to be collated, which improves the voice recognition rate and suppresses misrecognition and the malfunctions it causes.

The flowchart of FIG. 9 shows the case in which frequencies are counted in step S111 and lazy commands are deleted according to the frequencies counted by the first frequency counting unit 17 or the second frequency counting unit 18, but the operating procedure of the first embodiment is not limited to this. For example, based on the user's lazy-speech tendencies identified in step S109, the procedure may proceed to S113 without counting recognition frequencies (S111), and the deletion of lazy commands may be decided and executed there. That is, the collation target deletion unit 16 may identify the lazy commands to delete from the collation targets of the voice recognition unit 12 by referring to the user's lazy-speech tendencies identified by the voice recognition unit 12.

Alternatively, based on the result of determining whether the input speech corresponds to a predetermined voice command or lazy command (S105), the procedure may proceed to S113 without identifying lazy-speech tendencies (S109) or counting recognition frequencies (S111), and the deletion of lazy commands may be decided and executed there. That is, the collation target deletion unit 16 may identify the lazy commands to delete from the collation targets of the voice recognition unit 12 based on the result of the determination by the voice recognition unit 12 of whether the input speech corresponds to a predetermined voice command or lazy command.
(Second Embodiment)
This embodiment describes an "adjustment mode" that prompts the user to make a predetermined initial utterance before use of the voice recognition control device 10 begins. In the second embodiment, the adjustment mode is run for every user each time the controller 11 is turned on with the controller on/off switch 60i.

FIG. 10 is a flowchart showing an example of the operating procedure of the voice recognition control device 10 of FIG. 1 according to the second embodiment.

(A) First, when power is supplied to the voice recognition control device 10 and the device becomes operational, the voice recognition control device 10 determines whether the operation switch 60i of the controller 11 is on (S201). If it is on (YES in S201), the procedure proceeds to step S203.

(B) In step S203, the liquid crystal display 72 shows a screen prompting the user to make the predetermined initial utterance. If the user makes the initial utterance and speech is input to the voice input unit 40 (YES in S205), the procedure proceeds to step S207; if no speech is input to the voice input unit 40 (NO in S205), it returns to step S203 and the prompt screen is shown again.

(C) In step S207, the voice recognition unit 12 collates the collation data stored in the collation data storage unit 14 with the voice signal output from the voice input unit 40 and determines whether the user's speech corresponds to a predetermined voice command or lazy command. If it determines that the speech corresponds to a voice command or lazy command, that is, if voice recognition succeeds (YES in S207), the procedure proceeds to step S211. If it does not, that is, if voice recognition fails (NO in S207), a message that the speech could not be recognized is shown on the liquid crystal display 72 (S209), and the procedure returns to step S203.

(D) In step S211, the voice recognition unit 12 identifies the user's lazy-speech tendencies based on its determination of whether the input speech corresponds to a voice command or a lazy command.

(E) The procedure proceeds to step S213, where the collation target deletion unit 16 refers to the user's lazy-speech tendencies identified by the voice recognition unit 12 and decides whether to delete lazy commands from the collation targets of the voice recognition unit 12. If it decides to delete (YES in S213), the procedure proceeds to step S215, where the collation target deletion unit 16 identifies the lazy commands to delete from the collation targets of the voice recognition unit 12 and deletes them. The procedure then proceeds to step S217. If it decides not to delete (NO in S213), the procedure proceeds directly to step S217.

(F) In step S217, it is determined whether the voice recognition unit 12 has made a determination for every lazy-speech tendency. If it has (YES in S217), the flowchart of FIG. 10 ends; if it has not (NO in S217), the procedure returns to step S203.

As described above, according to the second embodiment of the present invention, the "adjustment mode" prompts the user to make a predetermined initial utterance before use of the voice recognition control device 10 begins and identifies the user's lazy-speech tendencies from that utterance, so the amount of collation data to be collated is reduced from the moment use of the voice recognition control device 10 begins.

The means of prompting the user for the predetermined initial utterance is not limited to a screen display on the liquid crystal display 72 or the like; voice guidance from the voice output unit 50 may be used instead.
(Third Embodiment)
The third embodiment describes the case in which the lazy-speech tendencies of each user are registered in advance and the speaker is identified before use of the voice recognition control device 10 begins.

FIG. 7 is a table showing an example of the lazy-speech tendencies of each user. The collation target deletion unit 16 includes a second memory that stores the lazy-speech tendencies of each user. The "adjustment mode" described above is run for each user who is using the voice recognition control device 10 for the first time, the lazy-speech tendencies are identified for each user, and they are stored in the second memory in advance. The "speaker identification mode" described below is then run before use of the voice recognition control device 10 begins.
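The second memory can be thought of as a per-user table of tendencies. A minimal sketch follows, with every name (`TendencyMemory`, `register`, `unneeded`) and every user label hypothetical; the actual contents of FIG. 7 are not reproduced here. Treating an unregistered speaker as needing every lazy command is also an assumption of this sketch.

```python
class TendencyMemory:
    """Hypothetical stand-in for the second memory of the deletion unit 16."""

    def __init__(self):
        self._by_user = {}

    def register(self, user, tendencies):
        """Store the tendencies identified for a user, e.g. after adjustment mode."""
        self._by_user[user] = frozenset(tendencies)

    def unneeded(self, user, all_tendencies):
        """Return the tendencies whose lazy commands this user does not need.

        An unregistered user gets an empty result, so nothing is deleted.
        """
        known = self._by_user.get(user)
        if known is None:
            return frozenset()
        return frozenset(all_tendencies) - known
```

After `register("user_a", {1, 3})`, the call `unneeded("user_a", {1, 2, 3})` returns the set containing only tendency 2, so only that tendency's lazy commands would be removed for that speaker.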

FIG. 11 is a flowchart showing an example of the operation procedure of the voice recognition control device 10 of FIG. 1 according to the third embodiment.

(A) First, when power is supplied to the voice recognition control device 10 and the device enters an operating state, the voice recognition control device 10 determines whether the operation switch 60i of the controller 11 is on (S301). If it is on (YES in S301), the process proceeds to step S303.

(B) In step S303, the liquid crystal display device 72 displays a screen prompting the user to make a predetermined utterance. If the user makes the predetermined utterance and a voice is input to the voice input unit 40 (YES in S305), the process proceeds to step S307. If no voice is input to the voice input unit 40 (NO in S305), the process returns to step S303 and the screen prompting the predetermined utterance is displayed again.

(C) In step S307, the speaker identification unit 19 compares the per-user voice data stored in its own memory with the voice input to the voice input unit 40 and identifies the speaker who uttered that voice. If the speaker is identified successfully (YES in S307), the process proceeds to step S309. If speaker identification fails (NO in S307), the process returns to step S303.

(D) In step S309, the collation target deletion unit 16 refers to the second memory, which stores the lazy-utterance tendency of each user, and identifies the lazy-utterance tendency of the identified speaker.

(E) The process proceeds to step S311, where the collation target deletion unit 16 refers to the identified lazy-utterance tendency of the user and determines whether to delete any lazy commands from the collation targets of the voice recognition unit 12. If lazy commands are to be deleted (YES in S311), the process proceeds to step S313, where the collation target deletion unit 16 specifies the lazy commands to be deleted from the collation targets of the voice recognition unit 12 and deletes them, and the flowchart of FIG. 11 ends. If no lazy command is to be deleted (NO in S311), the flowchart of FIG. 11 ends without any lazy command being deleted.
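
Steps S301 to S313 above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the implementation of the device: the names `LazyCommand` and `speaker_identification_mode`, the string labels for tendencies, and the example data are hypothetical. The sketch also assumes that a lazy command is "unnecessary" when its tendency is not among those registered for the identified speaker, that nothing is deleted when the switch is off, and that prompting stops after `max_prompts` attempts so the function always returns (the flowchart itself loops back to S303 without a stated limit).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class LazyCommand:
    text: str      # lazily uttered form (hypothetical example data)
    command: str   # voice command that the lazy form stands for
    tendency: str  # lazy-utterance tendency the form belongs to (assumed label)


def speaker_identification_mode(
    switch_on: bool,
    prompt: Callable[[], Optional[str]],
    identify: Callable[[str], Optional[str]],
    tendencies_by_user: Dict[str, Set[str]],
    lazy_commands: List[LazyCommand],
    max_prompts: int = 10,
) -> List[LazyCommand]:
    """Return the lazy commands that remain collation targets."""
    if not switch_on:                       # S301: NO (assumed: leave targets unchanged)
        return list(lazy_commands)
    for _ in range(max_prompts):
        voice = prompt()                    # S303: prompt the user; S305: voice input?
        if voice is None:
            continue                        # S305: NO, prompt again
        user = identify(voice)              # S307: identify the speaker
        if user is None:
            continue                        # S307: NO, back to S303
        tendencies = tendencies_by_user.get(user)   # S309: look up the second memory
        if tendencies is None:
            return list(lazy_commands)      # assumed: unregistered user, delete nothing
        unneeded = [c for c in lazy_commands if c.tendency not in tendencies]
        if not unneeded:                    # S311: NO, end without deleting
            return list(lazy_commands)
        return [c for c in lazy_commands if c.tendency in tendencies]  # S313: delete
    return list(lazy_commands)              # assumed: give up after max_prompts
```

Passing `prompt` and `identify` as callbacks keeps the control flow separate from the display, voice input, and speaker identification hardware, so the flow can be exercised with simple stand-ins.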

As described above, according to the third embodiment of the present invention, the lazy-utterance tendency is largely determined by the individual speaker. By changing the lazy commands deleted from the collation targets according to the speaker identified by the speaker identification unit 19, lazy commands appropriate to each user (speaker) can be selected, and the voice recognition rate improves.

Note that the means for prompting the user to make the predetermined utterance is not limited to a screen display on the liquid crystal display device 72 or the like, and may instead be voice guidance from the voice output unit 50.

As described above, the present invention has been described by way of three embodiments, but the description and drawings that form part of this disclosure should not be understood as limiting the invention. Various alternative embodiments, examples, and operational techniques will be apparent to those skilled in the art from this disclosure.

The voice recognition control device according to the present invention is not limited to bathrooms. It can also be applied to other locations such as bedrooms, living rooms, the area around an office desk, and conference rooms, where it can control the electrical appliances installed in those rooms. The voice recognition control device according to the present invention can also be applied to equipment that is operable through a voice recognition function, such as navigation devices for automobiles and the like, mobile phones, and personal computers.

In addition, as shown in FIG. 8, a table summarizing the correspondence between voice commands and control contents for each controlled device 20 may be displayed on the liquid crystal display device 72 or printed in the instruction manual of the voice recognition control device 10. This lets the user immediately find the voice command corresponding to the desired control content and thus easily learn how to utter voice instructions correctly.
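
A per-device correspondence table of this kind could be held and rendered as in the sketch below. The function name `render_command_table`, the nested-dictionary layout, and every voice command and control content shown are hypothetical placeholders; the actual contents of FIG. 8 are not reproduced here. Only the device names are taken from the explanation of symbols.

```python
from typing import Dict, List

# Hypothetical example data: device -> {voice command: control content}.
COMMAND_TABLE: Dict[str, Dict[str, str]] = {
    "Lighting device 21": {"lights on": "turn on the main lighting 21a"},
    "Television 24": {"TV on": "switch on the television"},
}


def render_command_table(table: Dict[str, Dict[str, str]]) -> List[str]:
    """Return display lines listing, per controlled device, each voice command and its control."""
    lines: List[str] = []
    for device, commands in table.items():
        lines.append(device)
        for voice_command, control in commands.items():
            lines.append(f"  {voice_command} -> {control}")
    return lines
```

The same lines could be sent to a display or printed in a manual; grouping by device keeps the commands for one appliance together, as the per-device table in the text suggests.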

Thus, it should be understood that the present invention encompasses various embodiments and the like not described here. Accordingly, the present invention is limited only by the matters specifying the invention in the claims that are reasonable in view of this disclosure.

FIG. 1 is a block diagram showing the specific configuration of the voice recognition control device 10 and the controlled devices 20 according to the first embodiment of the present invention.
FIG. 2 is an external view of the inside of a bathroom showing an example arrangement of the voice recognition control device 10 and the controlled devices 20 shown in FIG. 1.
FIG. 3 is a plan view showing the layout of the operation surface of the controller 11 shown in FIGS. 1 and 2.
FIG. 4 is a table illustrating lazy-utterance tendencies identified by the voice recognition unit 12.
FIG. 5 is a table showing examples of voice commands and lazy commands.
FIG. 6 is a table showing an example of the result of deleting lazy commands from the collation targets of the voice recognition unit 12 by the operation procedure shown in the flowchart of FIG. 9.
FIG. 7 is a table showing examples of the lazy-utterance tendency of each user.
FIG. 8 is a table summarizing the correspondence between voice commands and control contents for each controlled device 20.
FIG. 9 is a flowchart showing an example of the operation procedure of the voice recognition control device 10 of FIG. 1.
FIG. 10 is a flowchart showing an example of the operation procedure of the voice recognition control device 10 of FIG. 1 according to the second embodiment.
FIG. 11 is a flowchart showing an example of the operation procedure of the voice recognition control device 10 of FIG. 1 according to the third embodiment.

Explanation of symbols

3 ... Bathtub
10 ... Voice recognition control device
11 ... Controller
12 ... Voice recognition unit
13 ... Control execution unit
14 ... Collation data storage unit
15 ... Control IF unit
16 ... Collation target deletion unit
17 ... First frequency counting unit
18 ... Second frequency counting unit
19 ... Speaker identification unit
20 ... Controlled device
21 ... Lighting device
21a ... Main lighting
21b ... Indirect lighting
22 ... Air conditioner
23 ... Water heater
24 ... Television
25 ... Jet bath device
25a ... Jet outlet
25b ... Suction port
26 ... Mist sauna device
40 ... Voice input unit
41 ... Microphone
50 ... Voice output unit
53 ... Speaker
60 ... Utterance promoting unit (utterance promoting means)
60a ... Menu button
60b ... Confirm button
60c ... Back button
60d ... Cross key
60e ... Priority button
60f ... Reheat button
60g ... Automatic bath button
60h ... Call button
60i ... Controller on/off switch
70 ... Display unit
71 ... LED
72 ... Liquid crystal display device

Claims

1. A voice recognition control device that executes control corresponding to a recognized voice command when a voice uttered by a user is recognized as corresponding to a predetermined voice command, and that executes control corresponding to a lazy command when the voice uttered by the user is recognized as corresponding to the lazy command, the lazy command corresponding to the voice produced when a voice command is uttered lazily, the voice recognition control device comprising:
a collation data storage unit that stores collation data for the voice commands recognizable by the voice recognition control device and collation data for the lazy commands;
a voice input unit that receives the voice uttered by the user and converts the voice into a predetermined voice signal;
a voice recognition unit that collates the voice signal converted by the voice input unit with the collation data stored in the collation data storage unit to determine whether the input voice corresponds to a predetermined voice command or a lazy command; and
a collation target deletion unit that identifies, based on the determination result of the voice recognition unit, a lazy command that is unnecessary for the user and deletes it from the collation targets of the voice recognition unit.

2. The voice recognition control device according to claim 1, further comprising utterance promoting means for prompting the user to make a predetermined initial utterance,
wherein the voice recognition unit determines whether the voice of the initial utterance corresponds to a predetermined voice command or a lazy command, and the collation target deletion unit deletes, based on the determination result for the initial utterance, a lazy command that is unnecessary for the user from the collation targets of the voice recognition unit.

3. The voice recognition control device according to claim 2, wherein the utterance promoting means is display means that displays a screen prompting the user to make the predetermined initial utterance.

4. The voice recognition control device according to claim 1, wherein the collation target deletion unit identifies a lazy command that the voice recognition unit recognizes at a low frequency and deletes it from the collation targets of the voice recognition unit.

5. The voice recognition control device according to claim 4, further comprising a first frequency counting unit that counts, for each control content indicated by a lazy command, the frequency at which the voice recognition unit recognizes the lazy command,
wherein the collation target deletion unit specifies, for each control content, the lazy command to be deleted from the collation targets of the voice recognition unit according to the frequency counted by the first frequency counting unit.

6. The voice recognition control device according to claim 4, further comprising a second frequency counting unit that counts, for each lazy-utterance tendency to which a lazy command belongs, the frequency at which the voice recognition unit recognizes the lazy command,
wherein the collation target deletion unit specifies, for each lazy-utterance tendency, the lazy command to be deleted from the collation targets of the voice recognition unit according to the frequency counted by the second frequency counting unit.

7. The voice recognition control device according to any one of claims 1 to 3, further comprising a speaker identification unit that identifies, based on the voice input to the voice input unit, the speaker who uttered the voice,
wherein the collation target deletion unit changes the lazy commands deleted from the collation targets of the voice recognition unit according to the speaker identified by the speaker identification unit.