JP2006058479A

JP2006058479A - Controller with voice recognition function

Info

Publication number: JP2006058479A
Application number: JP2004238741A
Authority: JP
Inventors: Akira Baba; 朗馬場; Shinpei Hibiya; 新平日比谷; Haruka Amanuma; はるか天沼; Yoshihiko Tokunaga; 吉彦徳永; Kenji Nakakita; 賢二中北
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2004-08-18
Filing date: 2004-08-18
Publication date: 2006-03-02
Anticipated expiration: 2024-08-18
Also published as: JP4784056B2

Abstract

PROBLEM TO BE SOLVED: To provide a controller with a voice recognition function whose recognition accuracy is improved by adapting a sound model to the sounds, such as voice and noise, that are input in real use.

SOLUTION: The controller A with the voice recognition function is equipped with: a recognition part 4 which compares the feature quantities of the voice signal extracted by a feature quantity extraction part 2 with the sound model stored in a sound model storage part 3 to recognize an input sound; a control part 6 which outputs a control signal according to the recognition result to an illuminator B; and a sound model learning part 8 which relearns the sound model by using the voice signal stored in an input voice storage control part 7 and the corresponding sound model, and updates the sound model in the sound model storage part 3. When a control signal whose control content differs from the recognition result is output from a switch 5 between the output of the recognition result and the lapse of a prescribed time limit, the input voice storage control part 7 corrects the recognition result and causes the sound model learning part 8 to relearn the sound model by using the corrected recognition result and the corresponding voice signal.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a control device with a voice recognition function that recognizes an operation command spoken by a person and controls a control target device accordingly.

Conventionally, control devices with a voice recognition function have been provided that recognize an operation command spoken by a person and turn a controlled lighting load on or off (see, for example, Patent Document 1).

Control devices with a voice recognition function have also been provided that include, as means for operating the control target device, an operation means for directly operating the control target device and a voice recognition means for recognizing an operation command spoken by a person to operate the device, and that operate the control target device using both the operation input from the operation means and the recognition result of the voice recognition means.

FIG. 6 is a block diagram of such a conventional control device with a voice recognition function. This control device A with a voice recognition function includes, as its main components, a microphone 1, a feature amount extraction unit 2, an acoustic model storage unit 3, a recognition unit 4, a switch 5, a control unit 6, an input voice storage control unit 7, and an acoustic model learning unit 8.

The microphone 1 receives a sound that is either an operation command (voice) spoken by a person to make the control target device B perform a desired operation, or noise. It converts the input sound into an audio signal, which is an analog electrical signal, and outputs it.

When the feature amount extraction unit 2 detects input of an audio signal from the microphone 1, it A/D-converts the input audio signal with, for example, 16 quantization bits and a sampling frequency of 16 kHz, performs frequency conversion with an analysis frame length of 25 milliseconds and an analysis interval of 10 milliseconds, and then extracts the feature amounts of the audio signal. Mel-frequency cepstral coefficients, for example, can be used as the feature amounts of the audio signal, and the extracted feature amounts are output to the recognition unit 4. The feature amount extraction unit 2 also outputs the audio signal input from the microphone 1 to the input voice storage control unit 7.
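The framing parameters above can be made concrete with a minimal sketch. It only shows how a 16 kHz signal would be cut into 25 ms frames every 10 ms before any frequency conversion; the function name `frame_signal` and the choice to drop a trailing partial frame are assumptions for illustration, not details from the specification.

```python
def frame_signal(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split a sampled signal into overlapping analysis frames.

    Hypothetical helper: a trailing partial frame is simply dropped.
    """
    frame_len = sample_rate * frame_ms // 1000  # 400 samples at 16 kHz
    hop_len = sample_rate * hop_ms // 1000      # 160 samples at 16 kHz
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


one_second = [0] * 16000           # one second of silence at 16 kHz
frames = frame_signal(one_second)
print(len(frames), len(frames[0]))  # 98 400
```

Each frame overlaps its neighbour by 15 ms, so one second of audio yields 98 frames from which feature amounts such as mel-frequency cepstral coefficients would then be computed.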

The acoustic model storage unit 3 stores acoustic models comprising “command word” acoustic models and a “noise” acoustic model. For each of one or more operation commands for controlling the control target device B, a “command word” acoustic model models the feature amounts of that command as spoken by many speakers, using, for example, an HMM (hidden Markov model); the “noise” acoustic model models the feature amounts of noise expected in the usage environment. For example, when the control target device B is a lighting device, an “akari” acoustic model corresponding to the word “akari” (“light”), which is used to instruct the lighting fixture to turn on, is stored as a “command word” acoustic model, together with a “noise” acoustic model corresponding to incidental sounds such as a door opening or closing and to speech other than vocabulary related to “akari”.

The recognition unit 4 compares the feature amounts extracted by the feature amount extraction unit 2 with the acoustic models stored in the acoustic model storage unit 3, and outputs the sound (operation command or noise) corresponding to the model that has a high degree of similarity to the extracted feature amounts, as the recognition result, to the control unit 6 and the input voice storage control unit 7. That is, when the two acoustic models “akari” and “noise” are stored in the acoustic model storage unit 3, the recognition result “akari” is obtained if the content of the input voice relates to “akari”, and the recognition result “noise” is obtained if the input sound is the sound of a door closing.
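The comparison step can be sketched as picking the stored model that scores highest against the extracted features. The specification uses HMMs; the toy below swaps in a single mean vector per model and a negative squared distance as the similarity, purely as a stand-in. The names `similarity` and `recognize` and the example vectors are hypothetical.

```python
def similarity(features, model_mean):
    """Toy similarity: negative squared Euclidean distance (higher is closer).

    Stand-in for an HMM likelihood; not the scoring used in the specification.
    """
    return -sum((f - m) ** 2 for f, m in zip(features, model_mean))


def recognize(features, models):
    """Return the label of the stored model most similar to the features."""
    return max(models, key=lambda label: similarity(features, models[label]))


# Hypothetical two-dimensional "models" for the two labels in the text.
models = {"akari": [1.0, 2.0], "noise": [6.0, -3.0]}
print(recognize([1.2, 1.8], models))   # akari
print(recognize([5.5, -2.0], models))  # noise
```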

The switch 5 is provided for directly operating the control target device B, and outputs a control signal corresponding to its operation to the control target device B. For example, when the control target device B is a lighting fixture, the switch 5 outputs, as control signals, an ON operation signal that turns the fixture on and an OFF operation signal that turns it off.

Based on the recognition result of the recognition unit 4, the control unit 6 outputs to the control target device B a control signal for performing the operation corresponding to that result. The control target device B thus receives control signals from both the switch 5 and the control unit 6, and operates according to whichever control signal, from the switch 5 or from the control unit 6, is the most recent.

The input voice storage control unit 7 stores the audio signal input from the feature amount extraction unit 2 in association with the recognition result input from the recognition unit 4, and outputs the newly input recognition result and the one or more audio signals corresponding to it to the acoustic model learning unit 8.

The acoustic model learning unit 8 then uses the recognition result input from the input voice storage control unit 7 and the one or more audio signals corresponding to it to adapt the acoustic model corresponding to that recognition result by, for example, the MLLR (Maximum Likelihood Linear Regression) method or MAP (maximum a posteriori probability) estimation, thereby improving recognition accuracy.

Patent Document 1: JP 2002-289371 A
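The specification names MAP estimation without giving a formula. As a rough illustration of what such adaptation does, the sketch below applies the standard MAP update for a single Gaussian mean, (tau * prior + sum of frames) / (tau + number of frames), assuming every adaptation frame is assigned to that Gaussian with weight 1. The function name `map_adapt_mean` and the prior weight `tau` are assumptions; a real HMM system would weight frames by state occupancy.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP update of one Gaussian mean from adaptation frames.

    prior_mean: mean vector of the pre-trained model.
    frames: feature vectors assigned to this Gaussian (weight 1 each, assumed).
    tau: hypothetical prior weight; larger values trust the old model more.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


# Two adaptation frames pull the mean from [0, 0] toward their average [3, 3].
print(map_adapt_mean([0.0, 0.0], [[2.0, 2.0], [4.0, 4.0]], tau=2.0))  # [1.5, 1.5]
```

With few adaptation frames the result stays close to the prior mean, and it moves toward the sample average as more data from the actual user or environment accumulates.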

In the control device with a voice recognition function configured as above, voice recognition is performed using the acoustic models built into the acoustic model storage unit 3. Recognition accuracy therefore drops when the sounds used to create the acoustic models have little similarity to the sounds (such as the user's voice) input to the microphone 1 in use. In the case of a “command word” acoustic model, for example, voice quality and intonation differ from person to person, so it is difficult to build into the acoustic model storage unit 3 in advance an acoustic model matched to the characteristics (voice quality, intonation, and so on) of the person who actually uses the device, and as a result the accuracy of voice recognition by the recognition unit 4 declines. The same applies to the noise model: the nature of the input noise varies greatly with the environment in which the device is used, so it is difficult to build in advance a noise model matched to the noise input in actual use, and voice recognition accuracy may decline as a result.

Therefore, in the control device with a voice recognition function described above, the acoustic model learning unit 8 relearns the acoustic model using the audio signal of the voice or noise input in actual use and the recognition result of the recognition unit 4 for that audio signal, so that the acoustic model is updated successively and adapted to the actual usage environment. However, when an incorrect recognition result is output for an input audio signal, the acoustic model is relearned using that incorrect result, so the acoustic model becomes inaccurate and the recognition performance may consequently deteriorate.

The present invention has been made in view of the above problems, and its object is to provide a control device with a voice recognition function whose recognition accuracy is improved by adapting the acoustic model to sounds, such as voice and noise, input in actual use.

To achieve the above object, the invention of claim 1 comprises: a sound conversion unit that receives a sound that is either a voice uttered by a person to operate a control target device or noise, converts the input sound into an audio signal that is an electrical signal, and outputs it; a feature amount extraction unit that extracts feature amounts of the input sound from the audio signal; an acoustic model unit that stores acoustic models in which the feature amounts of each of a plurality of voices and noises are modeled; a recognition unit that recognizes the input sound by comparing the feature amounts extracted by the feature amount extraction unit with the acoustic models stored in the acoustic model unit; a control unit that outputs to the control target device a control signal for performing an operation according to the recognition result of the recognition unit; an operation unit that directly outputs to the control target device a control signal according to its operation; an input voice storage unit that stores the audio signal in association with the recognition result of the recognition unit; an acoustic model learning unit that relearns the acoustic model of the input sound using the recognition result stored in the input voice storage unit and the audio signal corresponding to that recognition result, and updates the acoustic model stored in the acoustic model unit; and a timer unit that times a predetermined time limit from the point at which the recognition result is input from the recognition unit to the control unit. When, during the timing operation of the timer unit, the operation unit outputs a control signal whose control content differs from the recognition result of the recognition unit, the input voice storage unit corrects the stored recognition result based on the content of the control signal output from the operation unit, and the acoustic model learning unit relearns the acoustic model using the corrected recognition result and the input voice.

When the recognition unit misrecognizes an input sound and the control unit, acting on the incorrect recognition result, makes the control target device operate wrongly, the user can be expected to operate the operation unit directly to bring the device into the desired operating state. According to the present invention, if the operation unit outputs a control signal whose control content differs from the recognition result during the timing operation of the timer unit, that is, between the point at which the recognition result is input from the recognition unit to the control unit and the lapse of the predetermined time limit, the input voice storage unit corrects the stored recognition result. The acoustic model learning unit then relearns the acoustic model using the audio signal of the misrecognized input sound and the corrected recognition result, which lowers the likelihood of misrecognition the next time the same sound is input, raises the correct-recognition rate, and improves the reliability of the device.

The invention of claim 2 is characterized in that, in the invention of claim 1, when the operation unit outputs a control signal that turns the control target device on during the period from when the recognition unit recognizes an input sound as noise, with the control target device off, until the timing operation of the timer unit ends, the input voice storage unit corrects the recognition result recognized as noise to the voice for the ON operation and stores it. It provides the same effect as the invention of claim 1.

The invention of claim 3 is characterized in that, in the invention of claim 1, when the operation unit outputs a control signal that turns the control target device off during the period from when the recognition unit recognizes an input sound as the voice for the ON operation, with the control target device off, until the timing operation of the timer unit ends, the input voice storage unit corrects the recognition result recognized as the voice for the ON operation to noise and stores it. It provides the same effect as the invention of claim 1.
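The two correction rules of claims 2 and 3 can be written as a small lookup, sketched below under the assumption that the device was off when the sound was recognized and that the switch signal arrived before the timer expired. The label strings and the function name `correct_label` are hypothetical.

```python
AKARI, NOISE = "akari", "noise"  # hypothetical label names


def correct_label(recognized, switch_signal):
    """Label to store for a sound recognized while the device was off.

    switch_signal is "on", "off", or None if the switch was not operated
    before the timer expired.
    """
    if recognized == NOISE and switch_signal == "on":
        return AKARI   # claim 2: a missed command, corrected by hand
    if recognized == AKARI and switch_signal == "off":
        return NOISE   # claim 3: a false trigger, undone by hand
    return recognized  # no contradicting operation: keep the result


print(correct_label(NOISE, "on"))   # akari
print(correct_label(AKARI, "off"))  # noise
print(correct_label(AKARI, None))   # akari
```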

The invention of claim 4 is characterized in that, in the invention of claim 1, a control state storage unit that stores the operating state of the control target device is provided, and when the recognition unit recognizes the input sound as a voice for putting the control target device into the current operating state stored in the control state storage unit, the input voice storage unit corrects the recognition result recognized as a voice for operation to noise and stores it.

In general, a user operating the control target device by voice is not expected to issue a command that puts the device into the state it is already in. According to the invention of claim 4, when an input sound is recognized as a command to put the device into its current operating state, the input voice storage unit corrects the recognition result recognized as a voice for operation to noise and stores it. The acoustic model learning unit then relearns the acoustic model using the audio signal of the misrecognized input sound and the corrected recognition result, which reduces the likelihood of misrecognition the next time the same sound is input.
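The state-consistency check of claim 4 might be sketched as follows. The representation of commands and states as the strings "on" and "off", and the function name `label_with_state_check`, are assumptions made only for this illustration.

```python
def label_with_state_check(recognized_command, current_state):
    """Relabel a command as noise when it asks for the state already in effect.

    recognized_command: "on", "off", or "noise" (hypothetical encoding).
    current_state: "on" or "off", as held by the control state storage unit.
    """
    if recognized_command != "noise" and recognized_command == current_state:
        return "noise"
    return recognized_command


print(label_with_state_check("on", "on"))   # noise: the light is already on
print(label_with_state_check("on", "off"))  # on: a plausible command
```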

The invention of claim 5 is characterized in that, in the invention of claim 1, a human presence sensor is provided that detects whether a person is present within a detection area including at least the sound collection range of the sound conversion unit, and when the recognition unit recognizes an input sound as a voice for operation while the human presence sensor detects no person, the input voice storage unit corrects the recognition result recognized as a voice for operation to noise and stores it.

When no person is within the sound collection range of the sound conversion unit, the sound input to the sound conversion unit is evidently noise. According to the invention of claim 5, when an input sound is recognized as a voice for operation while the human presence sensor detects no person, the input voice storage unit corrects the recognition result to noise and stores it. The acoustic model learning unit then relearns the acoustic model using the audio signal of the misrecognized input sound and the corrected recognition result, which reduces the likelihood of misrecognition the next time the same sound is input.

The invention of claim 6 is characterized in that, in the invention of claim 1, when the recognition unit judges a new input sound to be noise within a fixed time after it has recognized an input sound as voice, the input voice storage unit deletes the recognition result recognized as noise and the audio signal data corresponding to that recognition result.

According to the invention of claim 6, when input sounds arrive in succession within a fixed time, the input voice storage unit deletes the recognition result recognized as noise and the audio signal data corresponding to it. In a device to which voices for operation are input in succession, deleting the audio signal of an input sound misrecognized as noise, together with its recognition result, prevents the acoustic model learning unit from relearning on incorrect data and reduces the likelihood of misrecognition the next time the same sound is input.
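The deletion rule of claim 6 can be sketched as a filter over time-stamped entries. The window length of 5 seconds, the tuple layout, and the function name `drop_suspect_noise` are assumptions; the specification only speaks of "a fixed time".

```python
def drop_suspect_noise(entries, window_s=5.0):
    """Drop noise entries that follow a voice entry within window_s seconds.

    entries: time-ordered list of (timestamp_s, label, signal) tuples.
    window_s: hypothetical value for the fixed time in claim 6.
    """
    kept = []
    last_voice_time = None
    for timestamp, label, signal in entries:
        if label != "noise":
            last_voice_time = timestamp
            kept.append((timestamp, label, signal))
        elif last_voice_time is not None and timestamp - last_voice_time <= window_s:
            continue  # noise right after voice: possibly a missed command
        else:
            kept.append((timestamp, label, signal))
    return kept


log = [
    (0.0, "noise", "sig-a"),
    (10.0, "akari", "sig-b"),
    (12.0, "noise", "sig-c"),  # within 5 s of the voice at 10.0: deleted
    (30.0, "noise", "sig-d"),
]
print([signal for _, _, signal in drop_suspect_noise(log)])  # ['sig-a', 'sig-b', 'sig-d']
```

Deleting the doubtful entry, rather than relabeling it, keeps it out of both the “noise” and the command-word training data.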

As described above, according to the present invention, if the operation unit outputs a control signal whose control content differs from the recognition result of the recognition unit during the timing operation of the timer unit, that is, between the point at which the recognition result is input from the recognition unit to the control unit and the lapse of the predetermined time limit, the input voice storage unit corrects the stored recognition result. The acoustic model learning unit relearns the acoustic model using the audio signal of the misrecognized input sound and the corrected recognition result, which lowers the likelihood of misrecognition the next time the same sound is input, raises the correct-recognition rate, and improves the reliability of the device.

Embodiments of the present invention are described below with reference to the drawings.

(Embodiment 1)
FIG. 1 is a block diagram of the control device with a voice recognition function according to this embodiment. This control device A with a voice recognition function includes, as its main components, a microphone 1, a feature amount extraction unit 2, an acoustic model storage unit 3, a recognition unit 4, a switch 5, a control unit 6, an input voice storage control unit 7, an acoustic model learning unit 8, and a timer unit 9. Apart from the added timer unit 9, it is substantially the same as the control device A with a voice recognition function of FIG. 6 described in the background art, so common components are given the same reference numerals and their description is omitted.

The timer unit 9 starts its timing operation on receiving a trigger signal from the control unit 6. Specifically, when a recognition result is input from the recognition unit 4, the control unit 6 outputs to the control target device (for example, the lighting device B) a control signal for performing the operation according to the recognition result, and also outputs a trigger signal to the timer unit 9. On receiving the trigger signal, the timer unit 9 starts timing a predetermined period, and when the timing operation completes it outputs a timer operation completion signal to the input voice storage control unit 7.

Meanwhile, the input voice storage control unit 7 stores the audio signal input from the feature amount extraction unit 2 in association with the recognition result input from the recognition unit 4, and, when the timer operation completion signal is input, outputs the recognition result input this time and the one or more audio signals corresponding to it to the acoustic model learning unit 8. The operation input of the switch 5 is also supplied to the input voice storage control unit 7. If, between the point at which the recognition result is input from the recognition unit 4 (the point at which the timer operation starts) and the input of the timer operation completion signal from the timer unit 9, the switch 5 supplies a control signal whose control content differs from the recognition result, the input voice storage control unit 7 rewrites the recognition result so that it matches the control content of the control signal. When the timer operation completion signal is subsequently input, it outputs the corrected recognition result and the corresponding audio signal to the acoustic model learning unit 8, which relearns the acoustic model.
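The hold-then-release behaviour of the input voice storage control unit 7 can be sketched as a small class. Everything named here is hypothetical: the class `InputVoiceStore`, its methods, and the convention that the caller passes in the label implied by the switch operation (for example, “akari” for an ON operation). Times are passed explicitly as `now` rather than read from a clock, which keeps the sketch easy to test.

```python
class InputVoiceStore:
    """Holds one (signal, label) pair until the correction window closes."""

    def __init__(self, limit_s):
        self.limit_s = limit_s  # timer limit in seconds (assumed value)
        self.pending = None     # (signal, label, time of recognition)

    def on_recognition(self, signal, label, now):
        """Store the audio signal with its recognition result; timer starts."""
        self.pending = (signal, label, now)

    def on_switch(self, implied_label, now):
        """Rewrite the stored label if the switch contradicts it in time."""
        if self.pending is None:
            return
        signal, label, started = self.pending
        if now - started < self.limit_s and implied_label != label:
            self.pending = (signal, implied_label, started)

    def on_timer_complete(self):
        """Release the (possibly corrected) pair for relearning."""
        if self.pending is None:
            return None
        signal, label, _ = self.pending
        self.pending = None
        return signal, label


store = InputVoiceStore(limit_s=10.0)
store.on_recognition(signal="utterance-1", label="noise", now=0.0)
store.on_switch(implied_label="akari", now=3.0)  # user turns the light on by hand
print(store.on_timer_complete())  # ('utterance-1', 'akari')
```

Releasing the pair only at timer completion means the learning unit never sees a label that the user was still in a position to contradict.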

FIG. 2 shows an installation example in which the device A is applied to the control of a lighting fixture. The lighting device B to be controlled is installed on the ceiling 21 of a room 20, the switch 5 for directly operating the lighting device B is installed on a wall 23 near a door 22 leading outside, and the microphone 1 is installed near (above) the switch 5. Because the switch 5 is often operated when entering or leaving the room 20, it is installed near the door 22; for the same reason, the microphone 1 is also installed near the door 22 (and the switch 5) so that operation commands spoken by a person near the door 22 are reliably picked up.

However, because the microphone 1 is installed near the door 22, it readily picks up the sound of the door 22 opening and closing. That sound may be misrecognized as an operation command, so that the lighting device B turns on or off against the user's intention. Conversely, an operation command spoken by a person may be misrecognized as the sound of the door 22 opening or closing, so that the lighting device B fails to turn on or off as the user intended.

When the recognition unit 4 misrecognizes noise as a command word, or conversely misrecognizes an operation command as noise, and the lighting device B consequently behaves differently from what the user intended, the user can be expected to operate the switch 5 directly to make the lighting device B behave as intended. Therefore, if the time limit of the timer unit 9 is set slightly longer than the time the user needs, counted from the point at which the recognition result of the recognition unit 4 is input to the control unit 6, to operate the switch 5 directly and change the operation of the lighting device B, the following can be inferred: when the recognition result of the recognition unit 4 differs from the operation the user intended, an operation input from the switch 5 is supplied to the input voice storage control unit 7 before the timer operation completion signal is input; when the recognition result matches the operation the user intended, no operation input from the switch 5 is supplied before the timer operation completion signal is input.

As described in the background art, the input voice storage control unit 7 stores the audio signal input to the feature amount extraction unit 2 in association with the recognition result of the recognition unit 4; the recognition result and the one or more audio signals corresponding to it are output to the acoustic model learning unit 8, which relearns the acoustic model. In this embodiment, if the switch 5 supplies a control signal whose control content differs from the recognition result between the point at which the recognition result is input from the recognition unit 4 and the input of the timer operation completion signal from the timer unit 9, the input voice storage control unit 7 corrects the recognition result so that it matches the control content of the control signal. The recognition result can thus be corrected to agree with the operation command the user actually spoke. Because the acoustic model learning unit 8 relearns the acoustic model using the corrected recognition result and the corresponding audio signal, the acoustic model becomes accurate and the recognition performance of the voice recognition can be improved.

For example, suppose the user utters an operation command to turn on the lighting device B (for example, "Akari", meaning "light") while the lighting device B is off, but the recognition unit 4, comparing the feature amount input from the feature amount extraction unit 2 with the acoustic models stored in the acoustic model storage unit 3, judges the input to be similar to the "noise" acoustic model. The recognition unit 4 then outputs the recognition result "noise" to the control unit 6 and the input voice storage control unit 7. On receiving the recognition result "noise", the control unit 6 outputs a trigger signal to the timer unit 9 but outputs no control signal to the lighting device B, so the lighting device B remains off. Because the lighting device B does not turn on even though the user said "Akari", the user concludes that the on-operation command "Akari" was not recognized correctly and turns the switch 5 on directly to light the lighting device B. When the switch 5 is turned on, its on-operation signal is given to the lighting device B, which turns on, and the same on-operation signal is given to the input voice storage control unit 7. The input voice storage control unit 7 stores the voice signal input from the feature amount extraction unit 2 in association with the recognition result of the recognition unit 4 and, on receiving the timer operation completion signal from the timer unit 9, outputs the current recognition result and the feature amounts of the one or more corresponding voice signals to the acoustic model learning unit 8. If, however, an on-operation signal that differs from the recognition result of the recognition unit 4 ("noise") is given from the switch 5 before the timer operation completion signal is received, the input voice storage control unit 7 judges that a misrecognition has occurred, changes the current recognition result from "noise" to "Akari", and outputs the changed recognition result "Akari" and the corresponding voice signal to the acoustic model learning unit 8. The acoustic model learning unit 8 then relearns the "Akari" acoustic model using the voice signal that was misrecognized as "noise", so the next time the user says "Akari" the utterance is more likely to be recognized as "Akari", and the recognition accuracy improves.

As another example, suppose noise is input to the microphone 1 while the lighting device B is off, the feature amount extraction unit 2 extracts the feature amount of the noise and outputs it to the recognition unit 4, and the recognition unit 4, comparing that feature amount with the acoustic models stored in the acoustic model storage unit 3, judges it to be similar to the "Akari" acoustic model. The recognition result "Akari" is then output to the control unit 6 and the input voice storage control unit 7. On receiving the recognition result "Akari", the control unit 6 outputs a trigger signal to the timer unit 9 and a lighting control signal to the lighting device B, turning the lighting device B on. Because the lighting device B turns on even though the user did not say "Akari", the user concludes that noise was misrecognized as "Akari" and turns the switch 5 off directly to turn the lighting device B off. When the switch 5 is turned off, its off-operation signal is given to the lighting device B, which turns off, and the same off-operation signal is given to the input voice storage control unit 7. Since an off-operation signal that differs from the recognition result of the recognition unit 4 ("Akari") is given from the switch 5 before the timer operation completion signal is received from the timer unit 9, the input voice storage control unit 7 judges that a misrecognition has occurred, changes the stored recognition result from "Akari" to "noise", and outputs the changed recognition result "noise" and the feature amount of the voice signal to the acoustic model learning unit 8. The acoustic model learning unit 8 then relearns the "noise" acoustic model using the noise signal that was misrecognized as "Akari", so the next time the same noise is input to the microphone 1 it is more likely to be recognized correctly as "noise", and the recognition accuracy improves.
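The two cases above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the embodiment: the function name `corrected_result`, the label strings, the `CORRECTIONS` table and the 5-second time limit are all assumptions introduced here, since the text gives no concrete value for the time limit of the timer unit 9.

```python
NOISE = "noise"
AKARI = "akari"  # voice command that turns the light on

# (recognition result, switch operation) -> corrected recognition result.
# Only the two cases described in the text are listed (assumed table).
CORRECTIONS = {
    (NOISE, "on"): AKARI,   # a spoken command was taken for noise
    (AKARI, "off"): NOISE,  # noise was taken for a spoken command
}


def corrected_result(recognized, recognized_at,
                     switch_op=None, switch_at=None, time_limit=5.0):
    """Return the label that would be handed to the learning unit.

    recognized_at / switch_at are times in seconds; time_limit stands in
    for the time limit of the timer unit (hypothetical value).
    """
    if switch_op is None or switch_at is None:
        return recognized  # timer completed without any switch operation
    elapsed = switch_at - recognized_at
    if 0.0 <= elapsed < time_limit:
        # Switch operated while the timer was still running.
        return CORRECTIONS.get((recognized, switch_op), recognized)
    return recognized  # switch operated after the timer completed
```

A switch operation that arrives after the time limit leaves the stored result unchanged, which mirrors the role of the timer operation completion signal.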

In this way, when a misrecognition by the recognition unit 4 causes the lighting device B to behave differently from what the user intended, the user operates the switch 5 directly to correct the operating state of the lighting device B. By detecting this corrective operation of the switch 5, the correspondence between the voice signal used for relearning the acoustic model and its recognition result can be corrected. Relearning the acoustic model with the correct recognition result therefore raises the probability that the recognition unit 4 recognizes the same sound correctly the next time it is input.
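The effect of relearning with a corrected label can be illustrated with a toy model. The sketch below is an assumption made purely for illustration: it replaces the acoustic models of the embodiment with per-label mean feature vectors and nearest-mean classification, which the text does not specify. The class name `NearestMeanModel` and the feature values are hypothetical.

```python
class NearestMeanModel:
    """Toy stand-in for the stored acoustic models (not the patent's model)."""

    def __init__(self, means):
        # label -> [mean feature vector, number of samples seen so far]
        self.stats = {label: [list(vec), 1] for label, vec in means.items()}

    def recognize(self, features):
        """Return the label whose mean is closest to the features."""
        def squared_distance(label):
            mean = self.stats[label][0]
            return sum((f - m) ** 2 for f, m in zip(features, mean))
        return min(self.stats, key=squared_distance)

    def relearn(self, features, label):
        """Fold one labelled sample into the running mean of its label."""
        mean, n = self.stats[label]
        n += 1
        new_mean = [m + (f - m) / n for f, m in zip(features, mean)]
        self.stats[label] = [new_mean, n]


model = NearestMeanModel({"akari": [1.0, 1.0], "noise": [0.0, 0.0]})
utterance = [0.4, 0.4]                    # a quiet "Akari"
before = model.recognize(utterance)       # closer to the "noise" mean
model.relearn(utterance, "akari")         # label corrected via the switch
after = model.recognize(utterance)        # now closer to the "akari" mean
```

In this toy setting the same utterance moves from the "noise" side to the "akari" side after one corrected sample; a real acoustic model would need more data and a proper training procedure.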

(Embodiment 2)
Embodiment 2 of the present invention is described with reference to FIG. 3. FIG. 3 is a block diagram of the control device with a voice recognition function according to this embodiment. The control device A with a voice recognition function comprises, as its main components, a microphone 1, a feature amount extraction unit 2, an acoustic model storage unit 3, a recognition unit 4, a switch 5, a control unit 6, an input voice storage control unit 7, an acoustic model learning unit 8, a timer unit 9, and a control state storage unit 10. Apart from the added control state storage unit 10, it is substantially the same as the control device A with a voice recognition function described in Embodiment 1, so common components are given the same reference numerals and their description is omitted.

The control state storage unit 10 receives both the control signal given from the control unit 6 to the lighting device B and the control signal given from the switch 5 to the lighting device B. It determines the current control state of the lighting device B from the most recent of these control signals and stores the result. When the input voice storage control unit 7 inquires about the control state, the control state storage unit 10 outputs the currently stored control state ("lit" or "not lit" in the case of the lighting device B) to the input voice storage control unit 7.
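A minimal sketch of such a state store follows. The class and method names (`ControlStateStore`, `on_control_signal`, `query`) and the "on"/"off" signal encoding are assumptions made for illustration; the text only states that the latest control signal determines the stored state.

```python
class ControlStateStore:
    """Keeps the device state implied by the most recent control signal."""

    def __init__(self, initial="off"):
        self._last_signal = initial

    def on_control_signal(self, signal):
        # Called for signals from the control unit and from the switch alike;
        # whichever arrives last determines the stored state.
        if signal not in ("on", "off"):
            raise ValueError(f"unknown control signal: {signal!r}")
        self._last_signal = signal

    def query(self):
        # Answer an inquiry from the input voice storage control unit.
        return "lit" if self._last_signal == "on" else "not lit"


store = ControlStateStore()
store.on_control_signal("on")    # e.g. control unit reacts to "Akari"
state_after_voice = store.query()
store.on_control_signal("off")   # e.g. user operates the switch
state_after_switch = store.query()
```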

Here, the input voice storage control unit 7 stores the voice signal input to the feature amount extraction unit 2 in association with the recognition result of the recognition unit 4, and, when a new recognition result is input from the recognition unit 4, it asks the control state storage unit 10 for the current control state of the lighting device B. If the current control state is the same as the state indicated by the recognition result of the recognition unit 4, the input voice storage control unit 7 judges that the voice signal was misrecognized and corrects the stored recognition result, because a command to switch to the state the device is already in is not normally issued. For example, if the recognition result "Akari" is input to the input voice storage control unit 7 and the current control state obtained from the control state storage unit 10 is "lit", the input voice storage control unit 7 can judge that the command "Akari" would not be issued while the lighting device B is already lit. It therefore corrects the recognition result from "Akari" to "noise" and outputs the corrected recognition result and the corresponding voice signal to the acoustic model learning unit 8. The acoustic model learning unit 8 can thus relearn the "noise" acoustic model using the noise signal that was misrecognized as "Akari"; the improved "noise" acoustic model makes it more likely that the same noise is correctly recognized as "noise" the next time it is input. Consequently, even if the same noise signal is input while the lighting device B is off, it is likely to be recognized correctly as "noise", which prevents the lighting device B from being turned on by a misrecognition.
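The check described above can be written as a small function. This is a sketch under stated assumptions: the function name `check_against_state`, the label strings and the `TARGET_STATE` table are hypothetical, and only the "Akari" case comes from the text (the "shoto" entry for a lights-off command is added by analogy).

```python
NOISE = "noise"

# State each command would put the device into (assumed mapping).
TARGET_STATE = {
    "akari": "lit",      # turn-on command from the text
    "shoto": "not lit",  # hypothetical lights-off command, by analogy
}


def check_against_state(recognized, current_state):
    """Relabel a command as noise if it asks for the state already in effect."""
    target = TARGET_STATE.get(recognized)
    if target is not None and target == current_state:
        return NOISE  # nobody asks to turn on a light that is already lit
    return recognized
```

For instance, `check_against_state("akari", "lit")` yields `"noise"`, while the same result with the state `"not lit"` is left as `"akari"`.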

(Embodiment 3)
Embodiment 3 of the present invention is described with reference to FIG. 4. The control device with a voice recognition function of this embodiment adds a human sensor 11 to the control device A with a voice recognition function of Embodiment 1. The configuration other than the human sensor 11 is the same as that of the control device A with a voice recognition function described in Embodiment 1, so common components are given the same reference numerals and their description is omitted.

The human sensor 11 consists of, for example, an ultrasonic sensor that detects objects in a detection area using ultrasonic waves, or a pyroelectric infrared detection element that detects the presence or absence of a person in the detection area by sensing heat rays emitted from the human body. It detects the presence or absence of a person in a detection area set within a predetermined distance of the installation position of the microphone 1 (and including the sound collection range of the microphone 1), and outputs the detection result to the input voice storage control unit 7.

The input voice storage control unit 7, for its part, stores the voice signal input from the feature amount extraction unit 2 in association with the recognition result input from the recognition unit 4. When a recognition result is input from the recognition unit 4, it outputs that recognition result and the one or more corresponding voice signals to the acoustic model learning unit 8 and has the acoustic model learning unit 8 relearn the acoustic model. However, if the recognition unit 4 reports an operation command as the recognition result while the human sensor 11 detects no person, the input voice storage control unit 7 judges that noise was misrecognized as the user's voice (a command word), corrects the input recognition result to "noise", and outputs the corrected recognition result and the corresponding voice signal to the acoustic model learning unit 8. The acoustic model learning unit 8 can thus relearn the "noise" acoustic model using the noise signal that was misrecognized as a command word (for example, "Akari" or "Shoto", meaning "light" and "lights off"); the improved "noise" acoustic model makes it more likely that the same noise signal is correctly recognized as "noise" the next time it is input.
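The presence-based correction can be sketched as follows. The function name `check_against_presence` and the boolean `person_detected` are assumptions standing in for the output of the human sensor 11; the sketch treats every label other than "noise" as an operation command.

```python
NOISE = "noise"


def check_against_presence(recognized, person_detected):
    """Relabel a command as noise when nobody is in the detection area.

    person_detected stands in for the human sensor output (assumed to be
    a simple boolean here).
    """
    if recognized != NOISE and not person_detected:
        return NOISE  # a command cannot have been spoken by an absent user
    return recognized


# Labels that would be passed to the learning unit in three situations.
labels = [
    check_against_presence("akari", person_detected=False),
    check_against_presence("akari", person_detected=True),
    check_against_presence(NOISE, person_detected=False),
]
```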

In this embodiment the lighting device B is operated either by voice recognition or directly with the switch 5. In addition, the detection result of the human sensor 11 may be output to the lighting device B, so that the lighting load can be turned on only when the lighting device B receives a signal from the human sensor 11 indicating that a human body has been detected.

(Embodiment 4)
Embodiment 4 of the present invention is described with reference to FIG. 5. The basic configuration of the control device A with a voice recognition function is the same as in Embodiments 1 to 3, so common components are given the same reference numerals and their description is omitted.

FIGS. 5(a) and 5(b) show an installation example in which the control device A with a voice recognition function is used for voice input to a cooking recipe search device C, which is installed in, for example, a kitchen and which searches for recipes by the names of the ingredients to be used and presents them to the user. The body 30 of the search device C is mounted on a wall 41 near the sink of the kitchen 40. On the front of the body 30 are arranged the microphone 1 and a display panel 31 with touch switches that displays recipe search conditions and search results.

The search device C has, as its operation means, a touch-panel switch 5 for operating the search device C directly, and performs the desired operation according to the control signal input from the switch 5. To cover cases where the touch panel cannot be operated, for example because the user's hands are dirty during cooking, it also has a recognition unit 4 that recognizes command words spoken by a person to operate the search device C, and performs the desired operation according to the control signal that the control unit 6 outputs in response to the recognition result of the recognition unit 4. When the device is used for recipe search, the acoustic model storage unit 3 stores a "command word" acoustic model and a "noise" acoustic model. The "command word" acoustic model consists of the acoustic models of many command words, such as an "ingredient search" acoustic model for the phrase "ingredient search", which switches the screen of the display panel 31 to the search screen; ingredient-name acoustic models for the names spoken when entering ingredients, for example an "apple" acoustic model for the word "apple"; and a "search" acoustic model for the word "search", which executes the search. The "noise" acoustic model corresponds to sounds other than speech intended to operate the device, such as other voices and incidental noises.

In Embodiment 1 described above, when a recognition result is input from the recognition unit 4 to the control unit 6, the control unit 6 outputs a trigger signal to the timer unit 9 to start its timing operation. In this embodiment, when a recognition result indicating that the input sound is a voice is input from the recognition unit 4 to the control unit 6, the control unit 6 outputs a control signal corresponding to the recognition result to the control target device (the search device C) and also outputs a trigger signal to a second timer (not shown), which starts timing a fixed period. When the fixed period has elapsed, the second timer outputs a timer completion signal to the input voice storage control unit 7. If a trigger signal is input again from the control unit 6 while the second timer is running, it restarts the fixed period from the beginning; that is, it has a so-called retriggerable function.
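The retriggerable behaviour of the second timer can be modelled without real clocks by passing the current time explicitly. This is a sketch only: the class name `RetriggerableTimer`, its methods and the 10-second period are assumptions, since the text does not give a value for the fixed period.

```python
class RetriggerableTimer:
    """Timer whose period restarts from the beginning on every trigger."""

    def __init__(self, duration):
        self.duration = duration
        self._deadline = None  # None means the timer was never triggered

    def trigger(self, now):
        # A trigger during the timing operation simply moves the deadline.
        self._deadline = now + self.duration

    def is_running(self, now):
        return self._deadline is not None and now < self._deadline


timer = RetriggerableTimer(duration=10.0)  # hypothetical fixed period
timer.trigger(now=0.0)                     # first voice command recognized
running_at_5 = timer.is_running(now=5.0)
timer.trigger(now=8.0)                     # second voice command: restart
running_at_12 = timer.is_running(now=12.0) # would have expired without restart
running_at_18 = timer.is_running(now=18.0) # 8.0 + 10.0 has passed
```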

The input voice storage control unit 7, for its part, stores the voice signal input from the feature amount extraction unit 2 in association with the recognition result input from the recognition unit 4. When a recognition result is input from the recognition unit 4, it outputs to the acoustic model learning unit 8 all stored recognition results except the one just input and the one input immediately before it, together with the one or more pieces of voice data corresponding to them, and has the acoustic model learning unit 8 relearn the acoustic model. However, suppose that the recognition result input two steps earlier was a word representing a voice operation, such as "ingredient search"; that the immediately preceding recognition result, input before the timer completion signal of the second timer arrived, was "noise"; and that the current recognition result, also received before the timer completion signal of the second timer arrived, is a word representing a voice operation, such as "apple". In that case, the input voice storage control unit 7 deletes the preceding recognition result, which was "noise", and the voice signal data corresponding to it.

When the control device A with a voice recognition function is applied, as in this embodiment, to a device that takes continuous voice input, such as the cooking recipe search device C, the user speaks in succession while using the device A: for example, saying "ingredient search" to switch to the ingredient search screen, then saying an ingredient name such as "apple", and then saying "search" to execute the search. A sound input before the second timer finishes its timing operation is therefore more likely to be a voice than noise. In other words, a sound input to the microphone 1 before the second timer finishes (that is, within a fixed period after a recognition result of "voice" was obtained) is likely to be a voice, so if the recognition unit 4 outputs the recognition result "noise" during this period, the sound judged to be "noise" can be assumed to be in fact a voice. By discarding the recognition result "noise" and the voice signal data recognized as "noise", the input voice storage control unit 7 prevents the acoustic model learning unit 8 from relearning the acoustic model with an incorrect recognition result.
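The deletion rule of this embodiment can be sketched as a function over the stored history. Everything named here is an assumption for illustration: `prune_noise`, the `(time, label)` tuples and the `window` parameter, which stands in for the fixed period of the second timer. The sketch relies on the point made in the text that only a "voice" result retriggers the second timer, so both later results are measured from the time of the earlier command.

```python
NOISE = "noise"


def prune_noise(history, window):
    """Drop a 'noise' result sandwiched between two voice commands.

    history: list of (time_in_seconds, label) in arrival order.
    window:  fixed period of the second timer (hypothetical value).
    Returns a new list; the input list is not modified.
    """
    if len(history) < 3:
        return list(history)
    (t_cmd, l_cmd), (t_mid, l_mid), (t_now, l_now) = history[-3:]
    sandwiched = (
        l_cmd != NOISE and l_mid == NOISE and l_now != NOISE
        and t_mid - t_cmd < window  # "noise" arrived while the timer ran
        and t_now - t_cmd < window  # so did the current command
    )
    if sandwiched:
        # Discard the suspect "noise" entry, keep everything else.
        return history[:-2] + [history[-1]]
    return list(history)


session = [(0.0, "ingredient search"), (3.0, NOISE), (6.0, "apple")]
kept = prune_noise(session, window=10.0)
```

With a 10-second window the middle entry is discarded; with a shorter window that has already expired when "apple" arrives, the history is left untouched and the "noise" result stays available for relearning.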

FIG. 1 is a block diagram of Embodiment 1.
FIG. 2 is an explanatory diagram of a usage example in which Embodiment 1 is applied to a lighting device.
FIG. 3 is a block diagram of Embodiment 2.
FIG. 4 is a block diagram of Embodiment 3.
FIGS. 5(a) and 5(b) are explanatory diagrams of a usage example in which Embodiment 4 is applied to a cooking recipe search device.
FIG. 6 is a block diagram of a conventional example.

Explanation of symbols

A: Control device with voice recognition function
B: Lighting device
2: Feature amount extraction unit
3: Acoustic model storage unit
4: Recognition unit
5: Switch
6: Control unit
7: Input voice storage control unit
8: Acoustic model learning unit

Claims

A control device with a voice recognition function, comprising:
a sound conversion unit that receives an input sound, which is either a voice uttered by a person to operate a control target device or noise, converts the input sound into a voice signal, which is an electrical signal, and outputs the voice signal;
a feature amount extraction unit that extracts a feature amount of the input sound from the voice signal;
an acoustic model unit that stores acoustic models, each modeling the feature amount of one of a plurality of voices and noises;
a recognition unit that recognizes the input sound by comparing the feature amount extracted by the feature amount extraction unit with the acoustic models stored in the acoustic model unit;
a control unit that outputs, to the control target device, a control signal for performing an operation according to the recognition result of the recognition unit;
an operation unit that, when operated directly, outputs a control signal corresponding to the operation to the control target device;
an input voice storage unit that stores the voice signal and the recognition result of the recognition unit in association with each other;
an acoustic model learning unit that relearns the acoustic model of the input sound using the recognition result stored in the input voice storage unit and the voice signal corresponding to the recognition result, and updates the acoustic model stored in the acoustic model unit; and
a timer unit that times a predetermined time limit from the time when a recognition result is input from the recognition unit to the control unit,
wherein, when a control signal whose control content differs from the recognition result of the recognition unit is output from the operation unit during the timing operation of the timer unit, the input voice storage unit corrects the stored recognition result based on the content of the control signal output from the operation unit, and the acoustic model learning unit relearns the acoustic model using the corrected recognition result and the input voice.

The control device with a voice recognition function according to claim 1, wherein, when a control signal for turning on the control target device is output from the operation unit after the recognition unit has recognized an input sound as noise while the control target device is off and before the timing operation of the timer unit ends, the input voice storage unit corrects the recognition result recognized as noise to the voice for the on operation and stores it.

The control device with a voice recognition function according to claim 1, wherein, when a control signal for turning off the control target device is output from the operation unit after the recognition unit has recognized an input sound as the voice for the on operation while the control target device is off and before the timing operation of the timer unit ends, the input voice storage unit corrects the recognition result recognized as the voice for the on operation to noise and stores it.

The control device with a voice recognition function according to claim 1, further comprising a control state storage unit that stores the operation state of the control target device, wherein, when the recognition unit recognizes an input sound as a voice for operating the control target device into the current operation state stored in the control state storage unit, the input voice storage unit corrects the recognition result recognized as the voice for operation to noise and stores it.

The control device with a voice recognition function according to claim 1, further comprising a human sensor that detects the presence or absence of a person in a detection area including at least the sound collection range of the sound conversion unit, wherein, when the recognition unit recognizes an input sound as a voice for operation while the human sensor detects no person, the input voice storage unit corrects the recognition result recognized as the voice for operation to noise and stores it.

The control device with a voice recognition function according to claim 1, wherein, when the recognition unit judges a new input sound to be noise before a fixed period has elapsed after the recognition unit recognized an input sound as a voice, the input voice storage unit deletes the recognition result recognized as noise and the voice signal data corresponding to that recognition result.