JP2008203800A

JP2008203800A - Audio controller

Info

Publication number: JP2008203800A
Application number: JP2007043127A
Authority: JP
Inventors: Akira Baba; 朗馬場; Kiyotaka Takehara; 清隆竹原; Kenji Okuno; 健治奥野; Kenji Nakakita; 賢二中北; Shinpei Hibiya; 新平日比谷
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2007-02-23
Filing date: 2007-02-23
Publication date: 2008-09-04
Anticipated expiration: 2027-02-23
Also published as: JP4821648B2

Abstract

PROBLEM TO BE SOLVED: To provide an audio controller capable of preventing miscontrol resulting from wrong audio recognition even in an environment in which home audio equipment is outputting sound.

SOLUTION: After an echo cancel processing unit 11 performs echo cancel processing on the output sound of the home audio equipment included in the sound picked up by a sound pickup unit 10, the audio controller 1 smooths the output signal of the echo cancel processing unit 11 by superposing a noise signal on it, and compares the smoothed signal with an audio recognition model to recognize audio included in the output signal.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a voice controller that is installed in a building such as a house and operates equipment in the house by voice.

Voice controllers that are installed in a building such as a house and operate the equipment in the house by voice are available.

A house contains a wide variety of devices, and operating each of them with switches is cumbersome. For example, a kitchen has a hot water supply controller, a floor heating controller, lighting switches, and an intercom, while a bathroom has a bubble bath controller, lighting switches, a switch for the bathroom heater/ventilator/dryer, a television, and so on.

Having many such controllers and switches also spoils the appearance of the home interior.

To solve these problems, integrated controllers that can control multiple devices have been provided, including ones that are operated through speech recognition. Such a controller is a so-called voice controller.

Voice controllers often have no utterance input switch, because such a switch would have to be operated each time the user speaks. In a voice controller without one, however, the ambient noise of the home in which it is installed enters the sound collection microphone, so the controller may recognize ambient noise as the user's speech and perform erroneous control. In particular, when an audio device such as a television or radio is present in the target space, a conceivable approach is to use a speech recognition apparatus, such as the one disclosed in Patent Document 1, that uses an echo canceller to reduce the audio device output sound (which contains human voices) picked up by the microphone.

Patent Document 1: JP 2003-241791 A

An echo canceller is a technique that reduces the wraparound of an audio device's output sound into the microphone serving as the sound collection unit, by successively training an adaptive filter with the sound signal output by the audio device as a reference signal. Speech recognition, meanwhile, extracts feature quantities from speech and identifies the utterance content from the time-series changes in those features. So as not to be affected by individual differences in speaking volume or by input level variations caused by changes in the distance between the speaker and the microphone, it is common to use the differential value of the speech power as a feature rather than the speech power itself.
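As a small illustration of why a differential value of the speech power is insensitive to input level, the Python sketch below computes a per-frame log power and its frame-to-frame difference for the same utterance at two amplitudes. The function names, the frame length, the log-domain formulation, and the sample values are assumptions made for illustration; the source does not specify how the feature is computed.

```python
import math

def frame_log_power(samples, frame_len):
    """Return the natural-log mean power of each non-overlapping frame."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_power = sum(x * x for x in frame) / frame_len
        powers.append(math.log(mean_power + 1e-12))  # small floor avoids log(0)
    return powers

def delta(values):
    """Frame-to-frame difference, a simple stand-in for the differential value."""
    return [b - a for a, b in zip(values, values[1:])]

# Hypothetical utterance, and the same utterance captured at 4x the amplitude
# (for example, the speaker standing closer to the microphone).
quiet = [0.1, -0.2, 0.4, -0.3, 0.8, -0.9, 0.7, -0.6, 0.2, -0.1, 0.05, -0.05]
loud = [4.0 * x for x in quiet]

p_quiet = frame_log_power(quiet, 4)
p_loud = frame_log_power(loud, 4)
d_quiet = delta(p_quiet)
d_loud = delta(p_loud)

print("log power offset:", p_loud[0] - p_quiet[0])  # constant gain offset
print("delta (quiet):", d_quiet)
print("delta (loud): ", d_loud)                      # essentially the same values
```

A constant gain shifts every log power by the same amount, so the shift cancels in the difference, while the raw power values differ between the two recordings.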

In a voice controller that operates household equipment by voice, on the other hand, the input gain of the sound collection unit must be set high so that the voice of a user far from the microphone can also be recognized. Moreover, in environments with strong sound reflection, such as kitchens and bathrooms, the reverberation time is long, so the filter that the echo canceller must estimate becomes long and accurate filter learning becomes difficult. As a result, a larger share of the wraparound component is left unsuppressed.

Conventional voice controllers thus have the problem that the echo canceller does not provide sufficient suppression, so the output sound of the audio device is recognized as the user's speech and erroneous control is performed.

The present invention has been made in view of the above, and its object is to provide a voice controller that can prevent erroneous control caused by false speech recognition even in an environment where an audio device in the home is outputting sound.

To achieve the above object, the invention of claim 1 is a voice controller that recognizes a voice uttered by a user and controls the operation of a device to be controlled according to a control command defined by the recognized voice, comprising: a sound collection unit that collects sound signals in the home; an echo cancellation processing unit that applies echo cancellation processing to the output sound of an audio device in the home included in the sound collected by the sound collection unit; a storage unit that stores a speech recognition model serving as the reference for speech recognition and a noise signal to be superimposed; a noise superimposing unit that superimposes the noise signal on the output signal of the echo cancellation processing unit to smooth the output signal; a speech recognition unit that compares the output signal on which the noise signal has been superimposed with the speech recognition model to recognize the speech included in the output signal; and a control execution unit that executes the control command corresponding to the speech recognized by the speech recognition unit.

According to the invention of claim 1, the noise superimposing unit superimposes a noise signal on the output signal of the echo cancellation unit to smooth it. Therefore, even in an environment where an audio device in the home is outputting sound, the speech recognition unit can recognize the user's voice, and erroneous control caused by false speech recognition can be prevented.

In the invention of claim 2, the invention of claim 1 further comprises an output sound detection unit that detects the loudness of the output sound from the audio device, and the noise superimposing unit changes the magnitude of the noise signal superimposed on the output signal according to the loudness of the output sound detected by the output sound detection unit.

According to the invention of claim 2, the proportion of superimposed noise can be made small when the power of the output sound is low and large when the output sound is loud. This prevents an excessive noise signal from being superimposed when the output sound power of the audio device is low, which would smooth out the user's voice signal as well, and thus avoids the problem of the user's voice becoming unrecognizable.

In the invention of claim 3, in the invention of claim 1 or 2, the speech recognition model is configured by incorporating in advance the noise signal to be superimposed on the output signal.

According to the invention of claim 3, the superimposing noise signal is incorporated into the speech recognition model, so the acoustic mismatch between the speech signal on which the noise superimposing unit has superimposed the noise signal and the speech recognition model is reduced. This lessens the degradation in recognition performance caused by superimposing noise on the user's voice.

In the invention of claim 4, in the invention of claim 1, the noise signal stored in the storage unit is a stationary noise signal of a device installed in the home.

According to the invention of claim 4, the superimposing noise signal is the noise produced by a device already installed in the home. The performance degradation caused by superimposing the noise signal is therefore about the same as the degradation caused by the noise of that device when it is operating, so the controller does not feel unnatural to the user.

In the invention of claim 5, in the invention of claim 4, the speech recognition model is configured by incorporating in advance the stationary noise signal to be superimposed on the output signal.

According to the invention of claim 5, the superimposing stationary noise signal is also incorporated into the speech recognition model, so the acoustic mismatch between the speech signal on which the noise superimposing unit has superimposed the stationary noise signal and the speech recognition model is reduced. This lessens the degradation in recognition performance caused by superimposing noise on the user's voice.

In the invention of claim 6, the invention of claim 4 or 5 further comprises a device power detection unit that detects whether the power of the device installed in the home is on or off, and the noise superimposing unit suppresses superposition of the noise signal on the output signal when the power of the device is on.

According to the invention of claim 6, the noise signal is not superimposed when the device is powered on, so the noise emitted by bathroom device 2 is mixed in instead of the superimposed noise signal. The S/N ratio between the user's voice and the noise therefore does not degrade more than expected, and recognition performance is not reduced. Moreover, because the noise of the bathroom device smooths the component left unsuppressed by echo cancellation, the output sound signal of the audio device is smoothed even without superimposing the noise signal, and no erroneous control occurs.

In the invention of claim 7, the invention of any one of claims 1 to 6 further comprises a device power detection unit that detects whether the power of the audio device installed in the home is on or off, and the noise superimposing unit suppresses superposition of the noise signal on the output signal when the power of the audio device is off.

According to the invention of claim 7, the noise signal is not superimposed when the audio device is powered off, so a sufficient S/N ratio is obtained even when the user speaks quietly, and high recognition performance is achieved.

Because the present invention provides a noise superimposing unit that superimposes a noise signal on the output signal of the echo cancellation unit to smooth it, the speech recognition unit can recognize the user's voice even in an environment where an audio device in the home is outputting sound, which has the effect of preventing erroneous control caused by false speech recognition.

The present invention is described below by way of an embodiment.

(Embodiment 1)
The voice controller 1 of this embodiment controls, for example, a bathroom device 2 installed in a bathroom, as shown in FIG. 1. As illustrated, it comprises: a sound collection unit 10, such as a microphone, that captures the user's speech and converts it into a sound signal (voice signal); an echo cancellation unit 11 that applies echo cancellation processing to the output sound of an audio device 3, such as a bathroom television, picked up by the sound collection unit 10; an output sound detection unit 12 that measures the signal power of the output sound signal that the audio device 3 supplies to the echo cancellation unit 11 as a reference signal, for example via line input; a storage unit 13 that stores the data of the superimposing noise signal to be superimposed on the output signal of the echo cancellation unit 11; a noise superimposing unit 14 that superimposes the superimposing noise signal on the output signal of the echo cancellation unit 11; a device power detection unit 15 that detects the power on/off state of the bathroom device 2, which is the controlled device, and of the audio device 3; a storage unit 16 that stores the data of the speech recognition model used for speech recognition processing; a speech recognition unit 17 that recognizes the uttered speech by comparing the output signal of the noise superimposing unit 14 with the speech recognition model; and a control execution unit 18 that executes the control command corresponding to the speech recognized by the speech recognition unit 17 to control the bathroom device 2, the controlled device of this embodiment. The storage units 13 and 16 may be implemented with the same storage device.

FIG. 2 shows the configuration of the echo cancellation unit 11, which consists of a subtractor 11a and an adaptive filter 11b. The sound signal picked up by the microphone constituting the sound collection unit 10, which consists of the user's speech and the output sound reproduced by the speaker 3a of the audio device 3, is input to the subtractor 11a. The subtractor 11a subtracts the estimated echo signal synthesized by the adaptive filter 11b from the sound signal, thereby cancelling the echo, and outputs the result. The adaptive filter 11b, in turn, estimates the room transfer characteristic from the speaker 3a of the audio device 3 to the sound collection unit 10 from the error signal remaining after the subtraction by the subtractor 11a, and uses this estimated filter to generate the estimated echo signal from the output sound signal of the audio device 3 (the reference signal) and output it to the subtractor 11a.
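The subtractor and adaptive filter arrangement described above can be sketched in Python as follows. The source does not name the adaptation algorithm; the normalized LMS (NLMS) update, the tap count, the step size `mu`, and the synthetic room response used here are all assumptions chosen for illustration.

```python
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Return the error signal e[n] = mic[n] - estimated_echo[n].

    NLMS is an assumed choice of adaptation rule, not taken from the source.
    """
    w = [0.0] * taps        # coefficients of the adaptive filter (11b)
    history = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_hat = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_hat    # role of the subtractor (11a)
        norm = sum(h * h for h in history) + eps
        # Update the filter from the error signal and the reference history.
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        out.append(e)
    return out

def power(signal):
    return sum(s * s for s in signal) / len(signal)

random.seed(0)
# Hypothetical reference signal (audio device line output) and room response.
ref = [random.uniform(-1.0, 1.0) for _ in range(4000)]
room = [0.6, 0.3, -0.2, 0.1]
mic = [sum(room[k] * ref[n - k] for k in range(len(room)) if n - k >= 0)
       for n in range(len(ref))]

residual = nlms_echo_cancel(mic, ref)
print("echo power at microphone:", power(mic[-500:]))
print("residual power after EC: ", power(residual[-500:]))
```

In this idealized setting the room response is short and noise-free, so the residual becomes very small. The description's point is that in a reverberant bathroom the required filter is long and the estimate is less accurate, which leaves a larger unsuppressed component.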

Next, the operation of the voice controller 1 of this embodiment is described.

First, when the user wants to control the bathroom device 2, the user speaks a predetermined voice command into the sound collection unit 10.

If the audio device 3 in the bathroom is operating at this time and its speaker 3a (see FIG. 2) is emitting reproduced sound, that is, output sound, this output sound also enters the sound collection unit 10.

FIG. 3(a) shows an example waveform of the sound signal output from the sound collection unit 10 when the user's voice (I) is followed by the output sound (II) of the audio device 3 entering the sound collection unit 10.

The echo cancellation unit 11 receives the sound signal output from the sound collection unit 10 and also receives the output sound signal of the audio device 3 via line input. Using this output sound signal, it suppresses the component of the output sound (II) of the audio device 3 contained in the sound signal from the sound collection unit 10. FIG. 3(b) shows an example waveform of the output signal of the echo cancellation unit 11 after this suppression.

The noise superimposing unit 14, which receives the output signal of the echo cancellation unit 11, superimposes the superimposing noise signal read from the storage unit 13 (see FIG. 3(c)) on the output signal of the echo cancellation unit 11, thereby smoothing it as shown in FIG. 3(d).
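One way to read "smoothing" here is that the added noise raises the floor of the signal, so the frame-to-frame power swings of the unsuppressed residual become small relative to the overall level. The Python sketch below illustrates that reading with hand-made sample values; the frame length, the signals, and the dynamic-range measure are assumptions for illustration only.

```python
import math

def frame_powers(samples, frame_len):
    """Mean power of each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def superimpose(signal, noise):
    """Sample-wise addition of the stored noise to the echo canceller output."""
    return [s + n for s, n in zip(signal, noise)]

def dynamic_range_db(powers):
    """Ratio of the strongest to the weakest frame power, in dB."""
    return 10.0 * math.log10(max(powers) / min(powers))

# Hypothetical echo-canceller residual: one strong frame, one weak frame.
residual = [0.2, -0.2, 0.2, -0.2, 0.01, -0.01, 0.01, -0.01]
# Hypothetical stationary noise with constant power in every frame.
noise = [0.1, 0.1, -0.1, -0.1] * 2

smoothed = superimpose(residual, noise)
before = dynamic_range_db(frame_powers(residual, 4))
after = dynamic_range_db(frame_powers(smoothed, 4))
print(f"dynamic range before: {before:.1f} dB, after: {after:.1f} dB")
```

With these values the frame-power range shrinks from about 26 dB to about 7 dB, so the residual's temporal structure is far less pronounced after superposition.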

The superimposing noise signal used here is, for example, the operating sound signal of a bathroom device 2 such as a bathroom dryer. The noise superimposing unit 14 controls the proportion of noise to superimpose based on the signal power reported by the output sound detection unit 12. The output sound detection unit 12 reports to the noise superimposing unit 14, at predetermined time intervals, the signal power per unit time of the reference signal that the audio device 3 outputs to the echo cancellation unit 11. Alternatively, it may notify the noise superimposing unit 14 only when the signal power changes.
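The power-dependent control of the noise proportion might look like the following Python sketch. The source only says that the proportion follows the reported reference signal power; the specific mapping (noise amplitude proportional to the reference RMS, with a cap), the parameter names `ratio` and `max_gain`, and the sample values are hypothetical.

```python
import math

def signal_power(block):
    """Mean power of one block of reference samples (role of unit 12)."""
    return sum(x * x for x in block) / len(block)

def noise_gain(ref_power, ratio=0.1, max_gain=1.0):
    """Hypothetical mapping: noise amplitude tracks the reference RMS, capped."""
    return min(max_gain, ratio * math.sqrt(ref_power))

def superimpose_scaled(ec_out, noise, gain):
    """Add the stored noise, scaled by the current gain (role of unit 14)."""
    return [s + gain * n for s, n in zip(ec_out, noise)]

# Hypothetical reference blocks: television playing quietly versus loudly.
quiet_ref = [0.05, -0.05, 0.05, -0.05]
loud_ref = [0.8, -0.8, 0.8, -0.8]

g_quiet = noise_gain(signal_power(quiet_ref))
g_loud = noise_gain(signal_power(loud_ref))
print("gain when quiet:", g_quiet)
print("gain when loud: ", g_loud)
```

A louder reference yields a larger gain, so more noise is added when the unsuppressed echo is likely to be larger, and less when the audio device is quiet.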

Note that the noise superimposing unit 14 does not superimpose noise when the device power detection unit 15 indicates that the bathroom device 2 is powered on, or when it indicates that the audio device 3 is powered off.
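The two suppression conditions combine into a simple rule, sketched below in Python. The function and parameter names are hypothetical; only the logic comes from the description.

```python
def should_superimpose(bathroom_device_on: bool, audio_device_on: bool) -> bool:
    """Noise is added only when the audio device is on and the bathroom device is off.

    - Bathroom device on: its own operating noise already does the smoothing.
    - Audio device off: there is no echo to mask, so added noise would only hurt S/N.
    """
    return audio_device_on and not bathroom_device_on

# Enumerate all four power-state combinations.
for bathroom_on in (False, True):
    for audio_on in (False, True):
        print(f"bathroom_on={bathroom_on!s:5} audio_on={audio_on!s:5} "
              f"-> superimpose={should_superimpose(bathroom_on, audio_on)}")
```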

The speech recognition unit 17 performs speech recognition processing by comparing the output signal from the noise superimposing unit 14 with the speech recognition model. When the user's voice is not being input to the sound collection unit 10, only the output sound of the audio device 3 enters the sound collection unit 10; however, because the noise signal is superimposed on the output signal of the echo cancellation unit 11, the speech recognition unit 17 does not recognize it as speech.

When the user's voice is being input, on the other hand, its power is sufficiently greater than that of the superimposed noise, so it is correctly judged to be speech and speech recognition processing is performed. Noise is superimposed on the user's voice at this point, but because the speech recognition model is also created from training data on which noise has been superimposed, there is little degradation in recognition performance from acoustic mismatch between the speech and the speech recognition model.

Based on the recognition result from the speech recognition unit 17, the control execution unit 18 outputs a control command to the bathroom device 2 and executes the prescribed control.

As described above, the voice controller 1 of this embodiment provides the noise superimposing unit 14, which superimposes a noise signal on the output signal of the echo cancellation unit 11 to smooth it. The speech recognition unit 17 therefore no longer judges the output sound of the audio device 3 that the echo cancellation unit 11 could not fully suppress to be speech and run recognition on it, which prevents erroneous control caused by mistaking the output sound for the user's voice.

In addition, because the output sound detection unit 12 detects the loudness of the output sound from the audio device 3, the proportion of superimposed noise is made small when the output sound power is low and large when the output sound is loud. This prevents an excessive noise signal from being superimposed when the output sound power of the audio device 3 is low, which would smooth out the user's voice signal as well, and thus avoids the problem of the user's voice becoming unrecognizable.

When the output sound is loud, the proportion of superimposed noise is increased; however, the user also speaks louder in response to the louder playback of the audio device, so the user's voice is not smoothed out by the noise.

Furthermore, because the superimposing noise signal is the noise produced by the bathroom device 2 already installed in the bathroom, the performance degradation caused by superimposing the noise signal is about the same as the degradation caused by the noise of the bathroom device 2 when it is operating, so the controller does not feel unnatural to the user.

When the bathroom device 2 is powered on, the noise signal is not superimposed, so the noise emitted by the bathroom device 2 is mixed in instead of the superimposed noise signal. The S/N ratio between the user's voice and the noise therefore does not degrade more than expected, and recognition performance is not reduced. Moreover, because the noise of the bathroom device 2 smooths the component left unsuppressed by echo cancellation, the output sound signal of the audio device 3 is smoothed even without superimposing the noise signal, and no malfunction occurs.

The noise signal is likewise not superimposed when the audio device 3 is powered off, so a sufficient S/N ratio is obtained even when the user speaks quietly, and high recognition performance is achieved.

If a stationary noise signal, for example the superimposing noise signal (the noise of the bathroom device 2), is incorporated into the speech recognition model stored in the storage unit 16, the acoustic mismatch between the speech signal on which the noise superimposing unit 14 has superimposed the noise signal and the speech recognition model is reduced, which lessens the degradation in recognition performance caused by superimposing noise on the user's voice. The noise may be incorporated into the model by superimposing it on the training speech data and then creating the speech recognition model from that data.
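Preparing such noise-superimposed training data could be sketched in Python as below. The source says only that noise is superimposed on the training speech before the model is created; mixing at a target signal-to-noise ratio, the 20 dB figure, the function names, and the sample values are assumptions for illustration.

```python
import math

def mean_power(samples):
    return sum(x * x for x in samples) / len(samples)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so speech-to-noise power equals snr_db, then add it.

    Returns the noisy speech and the scale factor applied to the noise.
    """
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    noisy = [s + scale * n for s, n in zip(speech, noise)]
    return noisy, scale

# Hypothetical clean training utterance and recorded bathroom-device noise.
speech = [0.5, -0.4, 0.3, -0.6, 0.2, -0.1]
device_noise = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03]

noisy, scale = mix_at_snr(speech, device_noise, snr_db=20.0)
scaled_noise = [scale * n for n in device_noise]
achieved_snr = 10.0 * math.log10(mean_power(speech) / mean_power(scaled_noise))
print("noise scale factor:", scale)
print("achieved SNR (dB): ", achieved_snr)
```

The resulting noisy utterances would then be fed to whatever training procedure builds the speech recognition model, so that the model sees the same kind of noise that the noise superimposing unit adds at run time.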

FIG. 1 is a configuration diagram of one embodiment.
FIG. 2 is a configuration diagram of the echo cancellation unit used in the embodiment.
FIG. 3 is a waveform diagram for explaining the operation of the embodiment.

Explanation of symbols

1 Voice controller
10 Sound collection unit
11 Echo cancellation unit
12 Output sound detection unit
13 Storage unit
14 Noise superimposing unit
15 Device power detection unit
16 Storage unit
17 Speech recognition unit
18 Control execution unit
2 Bathroom device
3 Audio device

Claims

1. A voice controller that recognizes a voice uttered by a user and controls the operation of a device to be controlled according to a control command defined by the recognized voice, the voice controller comprising:
a sound collection unit that collects sound signals in the home;
an echo cancellation processing unit that applies echo cancellation processing to the output sound of an audio device in the home included in the sound collected by the sound collection unit;
a storage unit that stores a speech recognition model serving as the reference for speech recognition and a noise signal to be superimposed;
a noise superimposing unit that superimposes the noise signal on the output signal of the echo cancellation processing unit to smooth the output signal;
a speech recognition unit that compares the output signal on which the noise signal has been superimposed with the speech recognition model to recognize the speech included in the output signal; and
a control execution unit that executes the control command corresponding to the speech recognized by the speech recognition unit.

2. The voice controller according to claim 1, further comprising an output sound detection unit that detects the loudness of the output sound from the audio device, wherein the noise superimposing unit changes the magnitude of the noise signal superimposed on the output signal according to the loudness of the output sound detected by the output sound detection unit.

3. The voice controller according to claim 1 or 2, wherein the speech recognition model is configured by incorporating in advance the noise signal to be superimposed on the output signal.

4. The voice controller according to claim 1, wherein the noise signal stored in the storage unit is a stationary noise signal of a device installed in the home.

5. The voice controller according to claim 4, wherein the speech recognition model is configured by incorporating in advance the stationary noise signal to be superimposed on the output signal.

6. The voice controller according to claim 4 or 5, further comprising a device power detection unit that detects whether the power of the device installed in the home is on or off, wherein the noise superimposing unit suppresses superposition of the noise signal on the output signal when the power of the device is on.

7. The voice controller according to any one of claims 1 to 6, further comprising a device power detection unit that detects whether the power of the audio device installed in the home is on or off, wherein the noise superimposing unit suppresses superposition of the noise signal on the output signal when the power of the audio device is off.