JP2009109523A - Voice recognition system and voice recognizer

Info

Publication number: JP2009109523A
Application number: JP2007278312A
Authority: JP
Inventors: Akira Baba (馬場 朗); Kiyotaka Takehara (竹原 清隆); Kenji Okuno (奥野 健治); Kenji Nakakita (中北 賢二); Shinpei Hibiya (日比谷 新平)
Original assignee: Panasonic Electric Works Co Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2007-10-26
Filing date: 2007-10-26
Publication date: 2009-05-21

Abstract

PROBLEM TO BE SOLVED: To provide a voice recognition system and a voice recognizer capable of improving convenience.

SOLUTION: In the voice recognition system 1, a mixer part 12 receives reproduction signals Sb, Sc output from a television set 22. In a specific mode, the mixer part 12 selects and outputs a command signal Sh that is contained in advance in the reproduction signals Sb, Sc for controlling control equipment 20. In the specific mode, an echo canceling part 13 receives a voice signal Sa from a voice input part 11a and the command signal Sh from the mixer part 12, and outputs only the command signal Sh of the signals Sa, Sh to a recognition collation processing part 14. The recognition collation processing part 14 collates the command signal Sh with a reference pattern signal and, when the two match, causes the control equipment 20 to execute control based on the command signal Sh.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a voice recognition system and a voice recognition device.

In recent years, voice recognition systems have become known that receive a user's uttered voice and control a control device based on that voice. Such a system stores standard pattern signals in advance. It converts the input uttered voice into a voice signal and collates the voice signal with the standard pattern signals; if the degree of coincidence between the two is at or above a certain value, the system judges that the vocabulary item corresponding to that standard pattern (for example, "lighting off") was uttered. The system then controls the control device as instructed by that vocabulary item.
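The threshold-based collation described above can be illustrated with a minimal sketch. The source does not say how the degree of coincidence is computed, so the cosine similarity, the feature vectors, the threshold value, and the names `cosine_similarity` and `recognize` are all assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Assumed stand-in for the 'degree of coincidence' between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(features, standard_patterns, threshold=0.9):
    """Return the best-matching vocabulary item at or above the threshold, else None."""
    best_word, best_score = None, threshold
    for word, pattern in standard_patterns.items():
        score = cosine_similarity(features, pattern)
        if score >= best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical standard patterns; real systems would use acoustic features.
patterns = {"lighting off": [1.0, 0.0, 0.0], "tv power on": [0.0, 1.0, 0.0]}
print(recognize([0.9, 0.1, 0.0], patterns))
print(recognize([1.0, 1.0, 1.0], patterns))
```

An input that is close to one stored pattern is recognized as that vocabulary item, while an input that resembles no pattern closely enough yields no recognition and therefore no control action.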

Some such voice recognition systems improve convenience for the user by executing auxiliary processing, such as playing guidance voice that explains the operating method (see, for example, Patent Document 1).
Patent Document 1: Japanese Patent Laid-Open No. 2001-154689

However, conventional voice recognition systems still leave room for improvement in terms of convenience.

The present invention has been made to solve the above problem, and its object is to provide a voice recognition system and a voice recognition device capable of further improving convenience.

A voice recognition system according to the present invention receives a user's uttered voice and controls a control device based on the input uttered voice. The system comprises: a playback device that reproduces content based on a reproduction signal; voice input means that receives the sound output by the playback device and the sound of the user's utterance and outputs a voice signal based on them; recognition/collation processing means that collates the voice signal from the voice input means with standard pattern signals stored in advance; control means that controls the control device based on the collation result of the recognition/collation processing means; mode selection means capable of selecting one mode from a plurality of control modes that determine the processing method; first signal processing means that, when a specific mode is selected by the mode selection means, receives the reproduction signal from the playback device and selects and outputs, from the input reproduction signal, a command signal included in the reproduction signal in advance in order to control the control device; and second signal processing means that, when the specific mode is selected by the mode selection means, receives the voice signal from the voice input means and the command signal from the first signal processing means and outputs only the command signal of these signals to the recognition/collation processing means. When the specific mode is selected by the mode selection means, the recognition/collation processing means collates the command signal from the second signal processing means with the standard patterns.

According to this voice recognition system, when the specific mode is selected, the voice signal and the command signal are input, only the command signal of these is output to the recognition/collation processing means, and the recognition/collation processing means collates the command signal with the standard pattern signals. Consequently, when the command signal matches a standard pattern signal, the voice recognition system controls the control device based on the command signal. In other words, if a command signal is embedded in the data of the content to be reproduced, the control device can be controlled as the playback device reproduces that content. For example, simply playing a CD or DVD containing music data for a simulated forest-bathing experience makes it possible to play quiet music resembling a forest environment while changing the lighting device to soft, forest-like lighting. Likewise, while a CD or DVD for explaining operation is played and guidance voice is output, the control device can actually be operated, which makes the spoken explanation easier to follow. Convenience can therefore be further improved.

In the voice recognition system according to the present invention, it is preferable that, when a control mode other than the specific mode is selected by the mode selection means, the first signal processing means outputs the content reproduction signal from the playback device as it is, rather than selecting and outputting only the command signal, and that the second signal processing means removes an echo component from the voice signal input by the voice input means, based on the reproduction signal from the first signal processing means.

According to this voice recognition system, the second signal processing means removes an echo component from the voice signal input by the voice input means when a control mode other than the specific mode is selected by the mode selection means. The second signal processing means thus has an echo cancellation function, so when a control mode other than the specific mode is selected and the user tries to control the control device by uttered voice, the recognition rate of the uttered voice can be improved.
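The two behaviors of the second signal processing means can be sketched as a single function that switches on the selected mode. This is only an illustration: the function name, the representation of signals as lists of samples, and the sample-by-sample subtraction used as a placeholder for echo removal are assumptions, not details given in the source.

```python
def second_signal_processing(specific_mode, voice_signal, first_stage_output):
    """Route signals toward the recognizer depending on the control mode.

    specific_mode: True when the specific mode is selected.
    voice_signal: samples from the voice input means (user speech plus echo).
    first_stage_output: the command signal in the specific mode, otherwise the
        reproduction signal used as the echo reference.
    """
    if specific_mode:
        # Only the command signal is forwarded; the microphone signal is dropped.
        return list(first_stage_output)
    # Placeholder echo removal (assumption): subtract the reference directly.
    return [v - r for v, r in zip(voice_signal, first_stage_output)]


print(second_signal_processing(True, [9.0, 9.0], [1.0, 2.0]))
print(second_signal_processing(False, [5.0, 5.0], [2.0, 3.0]))
```

A practical echo canceller would estimate the acoustic path adaptively rather than subtract the reference directly; the direct subtraction here only marks where that processing would occur.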

In the voice recognition system according to the present invention, it is preferable that the recognition/collation processing means has a first standard pattern signal for collation with the voice signal from the voice input means and a second standard pattern signal for collation with the command signal from the first signal processing means.

According to this voice recognition system, the recognition/collation processing means has a first standard pattern signal for collation with the voice signal and a second standard pattern signal for collation with the command signal. Because a dedicated standard pattern signal is used for each instead of one shared by both, voice recognition performance can be improved.

In the voice recognition system according to the present invention, it is preferable that the system further comprises volume adjustment means capable of adjusting only the volume of the sound based on the command signal output from the playback device, and that the volume adjustment means prohibits output of the sound based on the command signal when the specific mode is selected by the mode selection means.

According to this voice recognition system, output of the sound based on the command signal is prohibited when the specific mode is selected by the mode selection means. Because the command signal is part of the reproduction signal, it would otherwise be output as sound from the playback device. Prohibiting output of the sound based on the command signal therefore suppresses the discomfort the user would feel on hearing it.

In the voice recognition system according to the present invention, it is preferable that the playback device reproduces and outputs video, that the reproduction signal includes command signals corresponding to scenes of the output video, and that the control content of the control device is changed as the recognition/collation processing means collates the command signal from the second signal processing means with the standard patterns.

According to this voice recognition system, the reproduction signal includes command signals corresponding to scenes of the output video, and the control content of the control device is changed as the recognition/collation processing means collates the command signal from the second signal processing means with the standard patterns. Therefore, when a movie DVD or the like is being played, the brightness of the lighting can be changed for each scene of the movie, for example, which enhances the dramatic effect of the video output.
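One way to picture scene-linked commands is as a cue list carried alongside the content, from which the commands falling inside each playback interval are emitted. The sketch below is hypothetical: the source does not describe how commands are positioned within the reproduction signal, so the `(time, command)` cue format and the function name `commands_due` are assumptions.

```python
def commands_due(cue_list, previous_time, current_time):
    """Return commands whose cue time lies in (previous_time, current_time]."""
    return [cmd for t, cmd in cue_list if previous_time < t <= current_time]


# Hypothetical cues: seconds from the start of the movie and the command to issue.
cues = [(0.0, "lighting off"), (12.5, "lighting on")]
print(commands_due(cues, -1.0, 0.0))
print(commands_due(cues, 0.0, 12.5))
```

Each emitted command would then be collated with the standard patterns in the same way as any other command signal.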

In the voice recognition system according to the present invention, it is preferable that the playback device outputs audio in 5.1 channels and that the 0.1 channel is assigned as the reproduction channel for the command signal.

According to this voice recognition system, the playback device outputs audio in 5.1 channels and the 0.1 channel is assigned as the reproduction channel for the command signal. The remaining five channels can therefore be devoted to output audio and the like, which suppresses any loss of the content's own dramatic effect.
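The channel assignment can be sketched as splitting each 5.1 frame into five content channels and one command channel. The channel labels (`FL`, `FR`, `C`, `SL`, `SR`, `LFE`) and the dictionary-based frame are assumptions chosen for illustration; the source states only that the 0.1 channel carries the command signal.

```python
MAIN_CHANNELS = ("FL", "FR", "C", "SL", "SR")  # assumed labels for the five main channels
COMMAND_CHANNEL = "LFE"  # assumed label for the 0.1 channel


def split_frame(frame):
    """Separate one 5.1 frame into (content audio, command sample)."""
    audio = {ch: frame[ch] for ch in MAIN_CHANNELS}
    command = frame[COMMAND_CHANNEL]
    return audio, command


frame = {"FL": 0.1, "FR": 0.2, "C": 0.3, "LFE": 0.9, "SL": 0.4, "SR": 0.5}
audio, command = split_frame(frame)
print(sorted(audio), command)
```

Keeping the command samples out of the `audio` mapping also makes it straightforward to suppress their audible output while the five content channels play normally.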

A voice recognition device according to the present invention receives a user's uttered voice and outputs a control signal for controlling a control device based on the input uttered voice. The device comprises: voice input means that receives the sound output by a playback device, which reproduces content based on a reproduction signal, and the sound of the user's utterance, and outputs a voice signal based on them; recognition/collation processing means that collates the voice signal from the voice input means with standard pattern signals stored in advance; control means that controls output of the control signal based on the collation result of the recognition/collation processing means; mode selection means capable of selecting one mode from a plurality of control modes that determine the processing method; first signal processing means that, when a specific mode is selected by the mode selection means, receives the reproduction signal from the playback device and selects and outputs, from the input reproduction signal, a command signal included in the reproduction signal in advance in order to control the control device; and second signal processing means that receives the voice signal from the voice input means and the command signal from the first signal processing means and outputs only the command signal of these signals to the recognition/collation processing means. When the specific mode is selected by the mode selection means, the recognition/collation processing means collates the command signal from the second signal processing means with the standard patterns.

According to this voice recognition device, when the specific mode is selected, the voice signal and the command signal are input, only the command signal of these is output to the recognition/collation processing means, and the recognition/collation processing means collates the command signal with the standard pattern signals. Consequently, when the command signal matches a standard pattern signal, the voice recognition system controls the control device based on the command signal. In other words, if a command signal is embedded in the data of the content to be reproduced, the control device can be controlled as the playback device reproduces that content. For example, simply playing a CD or DVD containing music data for a simulated forest-bathing experience makes it possible to play quiet music resembling a forest environment while changing the lighting device to soft, forest-like lighting. Likewise, while a CD or DVD for explaining operation is played and guidance voice is output, the control device can actually be operated, which makes the spoken explanation easier to follow. Convenience can therefore be further improved.

According to the present invention, a voice recognition system and a voice recognition device capable of further improving convenience can be provided.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a configuration diagram showing a voice recognition system according to an embodiment of the present invention. The voice recognition system 1 receives a user's uttered voice and controls the control devices 20 based on the input uttered voice. It consists of a voice recognition device 10 and the control devices 20.

The voice recognition device 10 accepts input from the user by voice and by switch operation, and outputs a control signal for controlling the control devices 20 according to the accepted input. The voice recognition device 10 offers a voice input mode, in which the control devices 20 can be controlled by voice, and a button operation input mode, in which they can be controlled by switch operation. In the voice input mode, the voice recognition device 10 receives the user's uttered voice and, when it recognizes that the input uttered voice corresponds to a predetermined standard pattern, outputs a control signal that controls the control devices 20 according to the recognized standard pattern. In the button operation input mode, the voice recognition device 10 receives a switch operation from the user and outputs a control signal that controls the control devices 20 according to that switch operation.

The control devices 20 are external devices that operate according to the content of the control signal from the voice recognition device 10. Specifically, the control devices 20 consist of five devices: a DVD player 21, a television 22, a bathroom device 23, a ventilation fan 24, and a lighting device 25. They start or stop operating, among other actions, according to the control signal from the voice recognition device 10. For example, the television 22, one of the control devices 20, is powered on or has its channel changed by the control signal from the voice recognition device 10. Among the control devices 20, the DVD player 21 reproduces content based on the reproduction signal from a DVD and outputs audio and video through the television 22. The DVD player 21 and the television 22 together therefore constitute the playback device. In this embodiment, the television 22 is configured for 2-channel audio output.

FIG. 2 is an external view showing an installation example of the voice recognition device 10 shown in FIG. 1. As shown in FIG. 2, the voice recognition device 10 is installed in, for example, a bathroom. The bathroom is provided with the DVD player 21 (not shown in FIG. 2), the television 22, the bathroom device 23 (not shown in FIG. 2), the ventilation fan 24, and the lighting device 25. In addition, a controller 11, described later, which is a component of the voice recognition device 10, is installed near the bathtub 30 in the bathroom.

Although FIGS. 1 and 2 show the DVD player 21, the television 22, the bathroom device 23, the ventilation fan 24, and the lighting device 25 as examples of the control devices 20, the control devices 20 are not limited to these and may be other devices such as floor heating equipment or a personal computer. The voice recognition device 10 also need not be installed in a bathroom and may be installed elsewhere, such as in a bedroom, a living room, near an office desk, or in a conference room.

Refer again to FIG. 1. As shown in FIG. 1, the voice recognition device 10 includes the controller 11, a mixer unit (first signal processing means) 12, an echo cancellation unit (second signal processing means) 13, a recognition/collation processing unit (recognition/collation processing means) 14, and a control unit (control means) 15. The controller 11 accepts input from the user by voice and by switch operation. FIG. 3 is a front view showing details of the controller 11 shown in FIG. 1. As shown in FIG. 3, the controller 11 includes a voice input unit (voice input means) 11a, operation buttons (mode selection means) 11b, a display unit 11c, and an LED lamp 11d. The LED lamp 11d is omitted from FIG. 1 because it has no connection to the other parts 12 to 15 and 20.

The voice input unit 11a shown in FIG. 3 consists of a microphone or the like. It receives the sound reproduced by the DVD player 21 and output from the television 22 together with the sound of the user's utterance, and outputs a voice signal Sa based on them. The operation buttons 11b accept switch operations by the user. The display unit 11c consists of an LCD or the like and displays the operating status of the various control devices 20 and similar information (for example, the bath temperature and the current time). The LED lamp 11d shows the user whether the device is currently in the voice input mode or the button operation input mode. The LED lamp 11d consists of three LEDs: for example, one lit LED indicates the voice input mode, another indicates the button operation input mode, and the remaining one indicates that both modes are in use together.

The operation buttons 11b are now described in detail. They consist of a priority button 11b1, a reheat button 11b2, an automatic bath button 11b3, a call button 11b4, a controller on/off button 11b5, a menu button 11b6, a confirm button 11b7, a return button 11b8, and a cross key 11b9.

The priority button 11b1 is used to set the hot water supply temperature or the shower temperature from the bathroom. Water and hot water are generally used not only in the bathroom but also in the kitchen and elsewhere. Consequently, even if the hot water supply temperature or shower temperature of the bathroom device 23 has been set, the actual temperature may deviate from the setting when water or hot water is used elsewhere. Pressing the priority button 11b1 gives the bathroom priority over other locations, making such deviations less likely. When the priority button 11b1 is pressed, a priority mark (not shown) is displayed on the display unit 11c.

The reheat button 11b2 is used to raise the temperature of water in the bathtub 30 that has cooled. When the reheat button 11b2 is pressed, a reheat mark (not shown) is displayed on the display unit 11c.

The automatic bath button 11b3 is used to fill the bathtub 30 with hot water at the set volume and temperature. When the automatic bath button 11b3 is pressed, an automatic mark (not shown) is displayed on the display unit 11c.

The call button 11b4 is used to talk with a remote controller installed outside the bathroom, for example a kitchen remote controller installed in the kitchen. When the call button 11b4 is pressed, a call mark (not shown) is displayed on the display unit 11c.

The controller on/off button 11b5 turns the power of the controller 11 itself on and off. When the power is turned off with the controller on/off button 11b5, the display on the display unit 11c is cleared.

The menu button 11b6 is used to set the operation of the control devices 20 manually. When the menu button 11b6 is pressed, a plurality of operation items for the control devices 20 (for example, lighting off, ventilation fan off, television power on, and television channel +1) are displayed on the display unit 11c. The user selects one of these operation items by operating the cross key 11b9.

The confirm button 11b7 is pressed to make the control devices 20 execute the operation item selected with the cross key 11b9. The return button 11b8 is used, for example, to return the screen displayed on the display unit 11c to its previous state. For example, when the display unit 11c can show only about three operation items at a time, operating the cross key 11b9 moves the display to the next screen and shows further operation items. Pressing the return button 11b8 in this state restores the previous screen and displays its operation items on the display unit 11c again.

The cross key 11b9 is used to set the hot water supply temperature, the shower temperature, the hot water volume, and so on. The cross key 11b9 is also used to select the operation items displayed on the display unit 11c.

Furthermore, in this embodiment, the voice input mode and the button operation input mode can be selected by operating the operation buttons 11b of the controller 11. Specifically, the user can switch between the two modes by operating the menu button 11b6 and selecting an input mode displayed on the display unit 11c.

Refer again to FIG. 1. The mixer unit 12 receives the reproduction signals Sb and Sc from the television 22, mixes them, and outputs the resulting mixed signal Sd to the echo cancellation unit 13.
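A minimal sketch of the mixing step follows. The source does not specify how Sb and Sc are combined, so the equal-weight average and the function name `mix` are assumptions.

```python
def mix(sb, sc):
    """Mix two equal-length channel signals into one (assumed equal-weight average)."""
    if len(sb) != len(sc):
        raise ValueError("channel signals must have the same length")
    return [(b + c) / 2 for b, c in zip(sb, sc)]


sd = mix([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(sd)
```

Averaging keeps the mixed signal within the range of the inputs, which is one reason it is a common default; other weightings would work equally well as an echo reference.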

The echo cancellation unit 13 receives the reproduction signals Sb and Sc from the television 22, that is, the mixed signal Sd from the mixer unit 12, and removes echo based on the mixed signal Sd. The sound entering the voice input unit 11a contains both the user's uttered voice and echo from devices such as the television 22. The echo cancellation unit 13 removes the signal component corresponding to the echo from the voice signal Sa, which consists of the uttered voice and the echo, based on the mixed signal Sd. The performance of the echo cancellation unit 13 is limited: in general it cannot remove the echo completely and removes it only to a certain degree.
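Echo removal driven by a reference signal is commonly implemented with an adaptive filter. The sketch below uses a normalized least-mean-squares (NLMS) filter as one such technique; the source does not name the algorithm used by the echo cancellation unit 13, so the choice of NLMS, the function name, and all parameter values are assumptions.

```python
def nlms_echo_cancel(mic, reference, taps=4, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    mic: samples of the voice signal Sa (speech plus echo).
    reference: samples of the mixed signal Sd.
    Returns the residual samples, corresponding to the echo-removed signal Se.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference sample first
    output = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        power = sum(h * h for h in history) + eps
        # NLMS update: move the weights along the reference, scaled by its power.
        weights = [w + step * error * h / power for w, h in zip(weights, history)]
        output.append(error)
    return output


# Synthetic check: the microphone hears only a delayed, scaled copy of the reference.
reference = [((37 * n) % 101) / 50.0 - 1.0 for n in range(500)]
mic = [0.6 * reference[n] + (0.3 * reference[n - 1] if n else 0.0) for n in range(500)]
residual = nlms_echo_cancel(mic, reference)
print(sum(e * e for e in residual[-100:]) < sum(m * m for m in mic[-100:]))
```

While the filter is still adapting, part of the echo remains in the output, which is consistent with the remark that echo is removed only to a certain degree.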

The recognition/collation processing unit 14 collates the voice signal Sa from the voice input unit 11a, or more precisely the voice signal Se from which the echo cancellation unit 13 has removed echo to a certain degree (hereinafter, the echo-removed signal Se), with the standard pattern signals stored in advance. When the echo-removed signal Se matches a standard pattern signal, the recognition/collation processing unit 14 outputs a recognition result signal Sf to that effect to the control unit 15. For example, when the user utters "television power on", the recognition/collation processing unit 14 judges whether the echo-removed signal Se based on that utterance matches the pre-registered standard pattern signal for "television power on". If it judges that the signal matches, it outputs to the control unit 15 a recognition result signal Sf indicating that the television 22 is to be powered on.

The control unit 15 controls the operation of the control device 20. For example, on receiving from the recognition/collation processing unit 14 a recognition result signal Sf indicating that the television 22 is to be powered on, the control unit 15 outputs to the television 22 a control signal Sg that turns its power on. The television 22 is thereby powered on.
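
The path from the echo-removed signal Se through the recognition result Sf to the control signal Sg can be sketched as follows. This is only an illustration: representing signals as feature vectors, scoring them with cosine similarity, the 0.8 threshold, and the names `recognize`, `control`, and `CONTROL_TABLE` are all assumptions. The document itself says only that a match is declared when the degree of coincidence with a standard pattern is at or above a certain value.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Degree of coincidence between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recognize(se: Sequence[float],
              patterns: Dict[str, Sequence[float]],
              threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching vocabulary entry (Sf), or None if no match."""
    best_word, best_score = None, threshold
    for word, pattern in patterns.items():
        score = cosine(se, pattern)
        if score >= best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical mapping from a recognition result to (device, action).
CONTROL_TABLE: Dict[str, Tuple[str, str]] = {
    "TV power on": ("television 22", "power on"),
}


def control(sf: Optional[str]) -> Optional[Tuple[str, str]]:
    """Turn a recognition result Sf into a control signal Sg, if any."""
    return CONTROL_TABLE.get(sf) if sf is not None else None
```

A signal that matches no registered pattern yields `None`, so no control signal is issued for it.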

In the present embodiment, the operation button 11b6 allows one control mode to be selected from a plurality of control modes that determine how the speech recognition device 10 performs its processing. Specifically, the control modes are a demo mode (specific mode), a healing mode (specific mode), and a normal mode, and the user selects one of these three by operating the operation buttons 11b: the user presses the menu button 11b6, chooses one of the modes shown on the display unit 11c, and presses the confirm button 11b7. The demo mode plays guidance audio and guidance video that explain how to operate the speech recognition device 10. The healing mode sets up the environment in the bathroom so as to produce a healing effect.

Further, in the present embodiment, the following operation is performed when the demo mode or the healing mode is selected and a predetermined DVD is played. FIG. 4 shows the operation of the speech recognition system 1 in that case. Here it is assumed that one of the reproduction signals Sb and Sc from the television 22, namely the reproduction signal Sb (for example, the R-side signal of a two-channel audio signal), consists of a command signal Sh for controlling the control device 20.

First, as shown in FIG. 4, when the demo mode or the healing mode is selected, the control unit 15 outputs a switching signal Si to the mixer unit 12 and a switching signal Sj to the echo cancellation unit 13. The mixer unit 12 and the echo cancellation unit 13 then operate as follows.

That is, instead of mixing the audio signal Sc with the command signal Sh, the mixer unit 12 cuts the audio signal Sc and selects and outputs only the command signal Sh. As a result, only the command signal Sh is supplied to the echo cancellation unit 13.

Next, the echo cancellation unit 13 receives the audio signal Sa from the voice input unit 11a and the command signal Sh from the mixer unit 12 and, of these two signals, outputs only the command signal Sh to the recognition/collation processing unit 14.
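
The mode-dependent behavior of the mixer unit 12 and the echo cancellation unit 13 can be summarized in a short sketch. It assumes that signals are equal-length sample lists, that mixing is a sample-wise sum, and that Sb carries the command signal Sh; the `Mode` enum, the function names, and the pluggable `cancel` callback are hypothetical.

```python
from enum import Enum
from typing import Callable, List, Sequence


class Mode(Enum):
    NORMAL = "normal"
    DEMO = "demo"
    HEALING = "healing"


SPECIFIC_MODES = {Mode.DEMO, Mode.HEALING}


def mixer(mode: Mode, sb: Sequence[float], sc: Sequence[float]) -> List[float]:
    """Mixer unit 12: Sh only in a specific mode, mixed Sd otherwise."""
    if mode in SPECIFIC_MODES:
        return list(sb)                      # Sc is cut; Sb carries Sh
    return [b + c for b, c in zip(sb, sc)]   # assumed sample-wise mix -> Sd


def echo_canceller(mode: Mode, sa: Sequence[float],
                   from_mixer: Sequence[float],
                   cancel: Callable[[Sequence[float], Sequence[float]],
                                    List[float]]) -> List[float]:
    """Echo cancellation unit 13: pass Sh through, or remove echo from Sa."""
    if mode in SPECIFIC_MODES:
        return list(from_mixer)              # microphone signal Sa is ignored
    return cancel(sa, from_mixer)            # echo-removed signal Se
```

Passing the cancellation routine in as a callback keeps the routing logic independent of whichever echo-removal algorithm is actually used.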

The recognition/collation processing unit 14 thus collates the command signal Sh against the standard pattern signals. The command signal Sh is prepared in advance so as to make the control device 20 perform a predetermined control. For example, a DVD for the healing mode carries a command signal Sh corresponding to an utterance such as "dim the lights". By collating the command signal Sh against the standard pattern signals, the recognition/collation processing unit 14 determines that the command signal Sh matches the standard pattern signal for "dim the lights" and outputs a recognition result signal Sf to that effect. The lighting device 25 is then dimmed, so the surroundings are adjusted as well and the environment becomes more relaxing than when the television 22 merely plays soothing video and audio.

Similarly, a DVD for the demo mode carries a command signal Sh corresponding to an utterance such as "ventilation fan off". By collating the command signal Sh against the standard pattern signals, the recognition/collation processing unit 14 determines that the command signal Sh matches the standard pattern signal for "ventilation fan off" and outputs a recognition result signal Sf to that effect. The ventilation fan 24 then stops, so the fan is actually turned off while the guidance audio and video explaining how to turn it off are playing, which gives the user a more memorable explanation of the operation.

Furthermore, in the present embodiment, the television 22 has volume adjusting means that can adjust the volume of only the sound based on the command signal Sh, out of the sound output to the user. The DVD contains the command signal Sh, so if the television 22 output it as sound unchanged, phrases such as "dim the lights" would be heard from the television 22 and the healing effect would be diminished. Providing volume adjusting means that can cut, or greatly reduce, the volume of only the sound based on the command signal Sh keeps the user from hearing that sound and finding it unpleasant.

Specifically, for two-channel audio output the volume adjusting means is configured as follows. If, for example, music is output from the L side and the command signal Sh is output as sound from the R side, the volume adjusting means outputs sound only from the L side and cuts the output from the R side. The same applies to 5.1-channel and 6.1-channel audio output: the command signal Sh is assigned to a particular channel, and the volume adjusting means cuts the sound output from that channel.
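
As a minimal sketch of this channel-based muting, assuming one output frame is represented as a mapping from channel name to sample value (the function name and the frame layout are hypothetical):

```python
from typing import Dict


def adjust_output(frame: Dict[str, float], command_channel: str,
                  specific_mode: bool) -> Dict[str, float]:
    """Silence the channel that carries the command signal Sh.

    In a specific mode (demo or healing) the command channel is cut so the
    user does not hear the recorded command; otherwise the frame is passed
    through unchanged.
    """
    if not specific_mode:
        return dict(frame)
    return {ch: (0.0 if ch == command_channel else value)
            for ch, value in frame.items()}
```

The same function covers the multi-channel case, since only the name of the channel assigned to Sh changes.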

When the above operation is performed, the recognition/collation processing unit 14 desirably has a first standard pattern signal for collation with the audio signal Sa from the voice input unit 11a (more precisely, the echo-removed signal Se) and a second standard pattern signal for collation with the command signal Sh from the mixer unit 12. Using a dedicated standard pattern signal for each, rather than one shared by both, improves speech recognition performance.

That is, if there were no dedicated standard pattern signals and collation were performed with a common standard pattern signal, that signal would have to match both the audio signal Sa based on the user's utterance and the command signal Sh. Because the audio signal Sa and the command signal Sh are not exactly the same, such a standard pattern signal would have properties intermediate between the two. The matching rate between the audio signal Sa and the standard pattern signal and the matching rate between the command signal Sh and the standard pattern signal would therefore both fall. With dedicated standard pattern signals, this drop in matching rate is avoided and speech recognition performance improves.
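
The argument can be illustrated with toy numbers. In the sketch below the feature vectors are invented purely for illustration, cosine similarity stands in for the unspecified matching measure, and the shared pattern is modelled as the average of the two dedicated ones; none of this comes from the document.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed stand-in for the degree of coincidence."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))


# Toy features for the same phrase from two sources (illustrative values).
user_voice = [1.0, 0.2, 0.0]      # echo-removed microphone signal Se
command = [0.2, 1.0, 0.0]         # command signal Sh from the disc

first_pattern = user_voice        # dedicated pattern for Se
second_pattern = command          # dedicated pattern for Sh
# A shared pattern has to sit between the two sources.
shared_pattern = [(u + c) / 2 for u, c in zip(user_voice, command)]

dedicated_scores = (cosine(user_voice, first_pattern),
                    cosine(command, second_pattern))
shared_scores = (cosine(user_voice, shared_pattern),
                 cosine(command, shared_pattern))
```

In this toy setup both entries of `shared_scores` come out lower than the corresponding entries of `dedicated_scores`, which mirrors the drop in matching rate described above.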

Next, the detailed operation of the speech recognition system 1 is described with reference to a flowchart. FIG. 5 is a flowchart showing the details of the operation of the speech recognition system 1 shown in FIG. 1. The process shown in FIG. 5 is repeated until the speech recognition device 10 is powered off.

As shown in FIG. 5, the control unit 15 first determines whether the demo mode has been selected and started (S1). If it determines that the demo mode has started (S1: YES), the process proceeds to step S3.

If it determines that the demo mode has not started (S1: NO), the control unit 15 determines whether the healing mode has been selected and started (S2). If it determines that the healing mode has started (S2: YES), the process proceeds to step S3. If it determines that the healing mode has not started (S2: NO), the process returns to step S1.

In step S3, the control unit 15 causes the display unit 11c to display the current mode (that is, the demo mode or the healing mode) (S3). The control unit 15 then sends the switching signal Si to the mixer unit 12 (S4). As a result, the mixer unit 12 stops mixing and outputting the audio signals Sb and Sc and instead selects and outputs only the command signal Sh.

Next, the control unit 15 sends the switching signal Sj to the echo cancellation unit 13 (S5). As a result, the echo cancellation unit 13 does not perform echo cancellation and passes the received command signal Sh unchanged to the recognition/collation processing unit 14.

The control unit 15 further sends a control signal Sg to the television 22 and switches it so that the sound of the command signal Sh is not output (S6). This prevents sound based on the command signal Sh from being output and so keeps the healing effect and the like from being diminished.

Next, the control unit 15 determines whether an operation cancelling the demo mode or the healing mode has been performed (S7), that is, whether an operation to switch to the normal mode has been performed. If it determines that there has been such an operation (S7: YES), the control mode changes to the normal mode and the process proceeds to step S9. If it determines that there has been no such operation (S7: NO), the control unit 15 determines whether DVD playback has ended (S8). If it determines that DVD playback has not ended (S8: NO), the process returns to step S7.

If it determines that DVD playback has ended (S8: YES), the process proceeds to step S9. In step S9, the control unit 15 causes the display unit 11c to display the current mode (that is, the normal mode) (S9). The control unit 15 then sends the switching signal Si to the mixer unit 12 (S4). As a result, the mixer unit 12 stops selecting and outputting only the command signal Sh and instead mixes and outputs the audio signals Sb and Sc.

Next, the control unit 15 sends the switching signal Sj to the echo cancellation unit 13 (S5). As a result, the echo cancellation unit 13 stops outputting only the command signal Sh and performs echo cancellation.

The control unit 15 further sends the control signal Sg to the television 22 and returns the audio output to its normal state (S6). The process shown in FIG. 5 then ends.
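
One pass through the flow described above can be sketched as a small function that returns the sequence of actions taken. The string-based mode and event names, the `run_cycle` function, and the action log are hypothetical. The sketch treats the end of the event stream like the end of DVD playback, and it leaves the steps that restore the normal mode unnumbered.

```python
from typing import Iterable, List, Optional

SPECIFIC_MODES = ("demo", "healing")


def run_cycle(selected_mode: Optional[str],
              events: Iterable[str]) -> List[str]:
    """Simulate one pass of the flowchart and return the actions performed."""
    actions: List[str] = []
    # S1 / S2: nothing happens unless the demo or healing mode has started.
    if selected_mode not in SPECIFIC_MODES:
        return actions
    actions.append(f"S3 display {selected_mode} mode")
    actions.append("S4 send Si: mixer outputs Sh only")
    actions.append("S5 send Sj: echo canceller passes Sh through")
    actions.append("S6 send Sg: TV stops command audio")
    # S7 / S8: wait for a cancel operation or the end of DVD playback.
    for event in events:
        if event in ("cancel", "dvd_end"):
            break
    actions.append("S9 display normal mode")
    actions.append("send Si: mixer mixes Sb and Sc")
    actions.append("send Sj: echo canceller removes echo")
    actions.append("send Sg: TV audio back to normal")
    return actions
```

Calling `run_cycle` repeatedly, with the currently selected mode each time, corresponds to the flow being repeated until the device is powered off.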

In this way, according to the speech recognition system 1 and the speech recognition device 10 of the present embodiment, when a specific mode is selected, the audio signal Sc and the command signal Sh are input, only the command signal Sh of the signals Sc and Sh is output to the recognition/collation processing unit 14, and the recognition/collation processing unit 14 collates the command signal Sh against the standard pattern signal. When the recognition/collation processing unit 14 finds that the command signal Sh matches the standard pattern signal, the speech recognition system 1 therefore controls the control device 20 on the basis of the command signal Sh. In other words, if the command signal Sh is embedded in the data of the content played by the playback device, the control device 20 can be controlled as the content is played. For example, simply playing a CD or DVD that contains music for a simulated forest-bathing experience can play quiet music resembling a forest environment while changing the lighting device to soft lighting that resembles a forest. Likewise, while a CD or DVD that explains the operation is played and its guidance audio is heard, the control device 20 can actually be controlled, which makes the spoken explanation easier to understand. Convenience is therefore further improved.

In addition, when the normal mode, that is, a mode other than the demo mode and the healing mode, is selected with the operation buttons 11b, the echo cancellation unit 13 removes the echo component from the audio signal Sa input by the voice input unit 11a. The echo cancellation unit 13 thus provides an echo cancellation function, so when the normal mode is selected and the user tries to control the control device 20 by voice, the recognition rate for the uttered voice is improved.

In addition, because the recognition/collation processing unit 14 has a first standard pattern signal for collation with the audio signal Sa and a second standard pattern signal for collation with the command signal Sh, it can use a dedicated standard pattern signal for each rather than one shared by both, which improves speech recognition performance.

In addition, when the demo mode or the healing mode is selected with the operation buttons 11b, output of sound based on the command signal Sh is prohibited. Because the command signal Sh is part of the reproduction signal Sb, it would otherwise be output as sound from the television 22. Prohibiting that output suppresses the discomfort the user would feel on hearing sound based on the command signal Sh.

The speech recognition system and the speech recognition device according to the present invention have been described above on the basis of an embodiment, but the present invention is not limited to it, and modifications may be made without departing from the spirit of the invention.

For example, the present embodiment gives the DVD player 21 and the television 22 as examples of playback devices, but the playback device is not limited to these. It may be a device that plays only music or audio, such as a CD player, or another device such as a personal computer.

In the present embodiment, the reproduction signal Sb contains command signals Sh corresponding to the scenes of the output video, and the control performed on the control device 20 changes as the recognition/collation processing unit 14 collates each command signal Sh from the echo cancellation unit 13 against the standard patterns. When a movie DVD or the like is played, it is therefore possible, for example, to change the brightness of the lighting for each scene of the movie, which enhances the dramatic effect of the video output.

The present embodiment describes the speech recognition system 1 for two-channel audio output, but the system is not limited to this. It is desirable that the playback device output audio in 5.1 channels with the command signal Sh assigned to the 0.1 channel. The remaining five channels can then be used for the output audio and the like, so any loss in the dramatic effect of the content itself is kept small.

In the present embodiment, the echo cancellation unit 13 is provided and, in the demo mode and the healing mode, receives the audio signal Sa from the voice input unit 11a and the command signal Sh from the mixer unit 12 and outputs only the command signal Sh. However, any component that can receive the audio signal Sa and the command signal Sh and output only the command signal Sh may be used; it need not be the echo cancellation unit 13. If the echo cancellation unit 13 is not provided, echo cannot be removed in the normal mode and the speech recognition rate may fall. In that case it is preferable to provide an utterance button on the controller 11 and to cut the audio output from the television 22 (that is, mute it) while the utterance button is pressed. This prevents sound from the television 22 from lowering the speech recognition rate.

In the present embodiment, in the demo mode and the healing mode the signal of the user's uttered voice is not sent to the recognition/collation processing unit 14, as described with reference to FIG. 4, so the control device 20 cannot be operated by utterance. When the controller 11 has an utterance button, it is therefore preferable that pressing the utterance button in the demo mode or the healing mode pauses or stops that mode and returns the system to the normal mode. This makes it possible to operate the control device 20 by utterance even during the demo mode and the healing mode.

Furthermore, in the present embodiment one control mode is selected from the plurality of control modes by operating the operation button 11b6, but the control mode may instead be selectable by voice. Alternatively, a command signal Sh that selects a control mode may be recorded on a DVD, CD, or the like so that one control mode is selected automatically when the disc is played on the DVD player 21 or a music player.

In the present embodiment, the voice input mode and the button operation input mode are switched by operating the operation buttons 11b, but the switch from the voice input mode to the button operation input mode may instead be made by utterance.

FIG. 1 is a block diagram showing a speech recognition system according to an embodiment of the present invention.
FIG. 2 is an external view showing an installation example of the speech recognition device shown in FIG. 1.
FIG. 3 is a front view showing details of the controller shown in FIG. 1.
FIG. 4 is a diagram showing the operation of the speech recognition system when the demo mode or the healing mode is selected and a predetermined DVD is played.
FIG. 5 is a flowchart showing details of the operation of the speech recognition system shown in FIG. 1.

Explanation of symbols

1 Speech recognition system
10 Speech recognition device
11 Controller
11a Voice input unit
11b Operation buttons (mode selection means)
11c Display unit
12 Mixer unit (first signal processing means)
13 Echo cancellation unit (second signal processing means)
14 Recognition/collation processing unit (recognition collation processing means)
15 Control unit (control means)
20 Control device
21 DVD player (playback device)
22 Television (playback device)
23 Bathroom device
24 Ventilation fan
25 Lighting device

Claims

A speech recognition system that receives a user's uttered voice and controls a control device based on the received uttered voice, comprising:
a playback device that plays back content based on a reproduction signal;
voice input means for receiving sound output by the playback device and the user's uttered voice and outputting an audio signal based on that sound;
recognition collation processing means for performing collation processing between the audio signal from the voice input means and a standard pattern signal stored in advance;
control means for controlling the control device based on the collation result of the recognition collation processing means;
mode selection means capable of selecting one control mode from a plurality of control modes that determine a processing method;
first signal processing means for, when a specific mode is selected by the mode selection means, selecting and outputting, from the reproduction signal of the content from the playback device, a command signal that is included in the reproduction signal in advance in order to control the control device; and
second signal processing means for, when the specific mode is selected by the mode selection means, receiving the audio signal from the voice input means and the command signal from the first signal processing means and outputting, of these signals, only the command signal to the recognition collation processing means,
wherein the recognition collation processing means performs collation processing between the command signal from the second signal processing means and the standard pattern.

The speech recognition system according to claim 1, wherein, when a control mode other than the specific mode is selected by the mode selection means, the first signal processing means outputs the reproduction signal of the content from the playback device without selecting and outputting only the command signal from it, and
when a control mode other than the specific mode is selected by the mode selection means, the second signal processing means removes an echo component from the audio signal input by the voice input means, based on the reproduction signal from the first signal processing means.

The speech recognition system according to claim 1, wherein the recognition collation processing means has a first standard pattern signal for performing collation processing with the audio signal from the voice input means and a second standard pattern signal for performing collation processing with the command signal from the first signal processing means.

The speech recognition system according to claim 1, further comprising volume adjusting means capable of adjusting the volume of only the sound based on the command signal output from the playback device,
wherein the volume adjusting means prohibits the output of sound based on the command signal when a specific mode is selected by the mode selection means.

The speech recognition system according to any one of claims 1 to 4, wherein the playback device plays back and outputs video,
the reproduction signal includes the command signal in correspondence with a scene of the output video, and
the control performed on the control device is changed as the recognition collation processing means performs collation processing between the command signal from the second signal processing means and the standard pattern.

The speech recognition system according to any one of claims 1 to 5, wherein the playback device outputs audio in 5.1 channels, and
the 0.1 channel is assigned as the reproduction channel of the command signal.

A speech recognition device that receives a user's uttered voice and outputs a control signal for controlling a control device based on the received uttered voice, comprising:
voice input means for receiving sound output by a playback device that plays back content based on a reproduction signal and the user's uttered voice, and outputting an audio signal based on that sound;
recognition collation processing means for performing collation processing between the audio signal from the voice input means and a standard pattern signal stored in advance;
control means for controlling the control device based on the collation result of the recognition collation processing means;
mode selection means capable of selecting one control mode from a plurality of control modes that determine a processing method;
first signal processing means for, when a specific mode is selected by the mode selection means, selecting and outputting, from the reproduction signal of the content from the playback device, a command signal that is included in the reproduction signal in advance in order to control the control device; and
second signal processing means for receiving the audio signal from the voice input means and the command signal from the first signal processing means and outputting, of these signals, only the command signal to the recognition collation processing means,
wherein the recognition collation processing means performs collation processing between the command signal from the second signal processing means and the standard pattern when the specific mode is selected by the mode selection means.