JP3550871B2

JP3550871B2 - Voice recognition method and apparatus

Info

Publication number: JP3550871B2
Application number: JP10728496A
Authority: JP
Inventors: 章寺澤; 弘及川; 博昭竹山; 香子田中; 実福島
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 1996-04-26
Filing date: 1996-04-26
Publication date: 2004-08-04
Anticipated expiration: 2016-04-26
Also published as: JPH09292894A

Description

[0001]
[Technical Field of the Invention]
The present invention relates to a speech recognition method and apparatus, and more particularly to a technique for recognizing speech in an environment with background noise.
[0002]
[Prior art]
One field in which voice recognition technology is applied is the voice-operated switch, which operates an electrical device by voice. FIG. 6 shows a known configuration of the voice recognition device for such a switch. In this device, the voice signal output from a microphone 1 is input to a plurality of band filters 2 with different frequency bands, which correspond to discriminating means that discriminate the signal by predetermined frequency bands, and the output of each band filter 2 is compared with a predetermined threshold by a comparator 3, which corresponds to comparing means. An AND circuit 5, which corresponds to calculating means that operates on the outputs of the comparators 3, then determines whether all of the predetermined spectral components of the voice signal are at or above a predetermined magnitude. The threshold of each comparator 3 is set to a different predetermined value depending on the voice to be recognized, on the premise that a predetermined voice is recognized at a predetermined recognition level.
[0003]
[Problems to be solved by the invention]
In the voice-operated switch described above, however, the environment of use often contains sounds other than the voice to be recognized, that is, background noise, and the sound pressure level of the background noise is sometimes higher than that of the voice signal to be recognized. In such cases, there has been the problem of erroneous recognition.
[0004]
The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a technique for reliably recognizing speech without erroneous recognition even if there is background noise.
[0005]
[Means for Solving the Problems]
To achieve the above object, the voice recognition method according to claim 1 is a method for recognizing a voice to be recognized that uses: voice input means for inputting the voice signal to be recognized together with background noise; discriminating means for discriminating the voice signal by at least two predetermined frequency bands; comparing means for comparing each output level from the discriminating means with a threshold; threshold control means for changing the threshold based on background noise other than the voice to be recognized; and calculating means for operating on the output from the comparing means. The threshold control means has a spectrum calculation unit that calculates and outputs the signal spectrum of the voice signal, and a counting unit that counts the number of times the spectrum output from the spectrum calculation unit exceeds a predetermined threshold within a predetermined time. The voice signal input by the voice input means is discriminated by the discriminating means by at least two predetermined frequency bands, the output level of each frequency band is compared by the comparing means with the threshold changed by the threshold control means, and the comparison results are operated on by the calculating means, whereby the voice to be recognized is recognized. As a result, each output level discriminated by at least two frequency bands is compared against a threshold that changes based on background noise other than the voice to be recognized, and the voice is recognized. Moreover, the outputs discriminated by the predetermined frequency bands are compared by comparing means whose threshold is changed as follows: the spectrum calculation unit calculates the signal spectrum of the background noise other than the voice to be recognized, and the counting unit counts the number of times the spectrum output exceeds the predetermined threshold within the predetermined time.
[0006]
The voice recognition device according to claim 2 is a device for recognizing a voice to be recognized that includes: voice input means for inputting the voice signal to be recognized together with background noise; discriminating means for discriminating the voice signal by at least two predetermined frequency bands; comparing means for comparing each output level from the discriminating means with a threshold; threshold control means for changing the threshold based on background noise other than the voice to be recognized; and calculating means for operating on the output from the comparing means. The threshold control means has a spectrum calculation unit that calculates and outputs the signal spectrum of the voice signal, and a counting unit that counts the number of times the spectrum output from the spectrum calculation unit exceeds a predetermined threshold within a predetermined time. As a result, the voice signal input from the voice input means is discriminated by predetermined frequency bands by at least two discriminating means and output, compared by the comparing means, whose threshold is controlled and changed by the threshold control means based on background noise other than the voice to be recognized, and operated on by the calculating means for recognition. Moreover, the outputs discriminated by the predetermined frequency bands are compared by comparing means whose threshold is changed as follows: the spectrum calculation unit calculates the signal spectrum of the background noise other than the voice to be recognized, and the counting unit counts the number of times the spectrum output exceeds the predetermined threshold within the predetermined time.
[0007]
In the voice recognition device according to claim 3, the center of the frequency band of the discriminating means of claim 2 is set to a formant frequency of the predetermined voice to be recognized. As a result, the voice signal is discriminated by frequency bands whose center frequencies are the formant frequencies of the predetermined voice.
[0008]
In the voice recognition device according to claim 4, the comparing means of claim 2 or 3 sets the threshold for a discriminating means with a higher frequency band on its input side smaller than the threshold for a discriminating means with a lower frequency band on its input side. As a result, the higher-frequency components of the voice signal are compared against a lower threshold than the lower-frequency components.
[0012]
[Embodiments of the Invention]
Hereinafter, a first embodiment of the voice recognition device of the present invention will be described with reference to FIGS. 1 to 3, a first reference embodiment with reference to FIG. 4, and a second reference embodiment with reference to FIG. 5.
[0013]
[First Embodiment]
FIG. 1 is a functional block diagram showing the voice recognition device according to the first embodiment. FIG. 2 is an explanatory diagram of the thresholds of the comparators of the voice recognition device shown in FIG. 1. FIG. 3 is a configuration diagram of the noise monitor of the voice recognition device shown in FIG. 1.
[0014]
This voice recognition device recognizes a target voice in, for example, a loudspeaker call device that starts a call in response to a spoken reply. It includes a microphone 1 corresponding to voice input means, three band filters 2 corresponding to discriminating means that discriminate by predetermined frequency bands, comparators 3 corresponding to comparing means, a noise monitor 4 corresponding to threshold control means, and an AND circuit 5 corresponding to calculating means.
[0015]
The microphone 1 inputs the voice signal to be recognized together with background noise. Although not limited to this, in this embodiment it is a small condenser microphone.
[0016]
The band filters 2 discriminate the voice signal output from the microphone 1 by predetermined frequency bands. In this embodiment, band filters 2a, 2b, and 2c for three frequency bands are connected in parallel. Each band filter 2 is a band-pass filter that passes the voice signal in a predetermined band, and the center frequency of its pass band is set to a formant frequency of the voice to be recognized, "hai" (Japanese for "yes"). A formant frequency is a frequency at which the energy of the spectrum of a given voice waveform is concentrated. In this embodiment, the voice to be recognized is, for example, the reply "hai". The band filter 2a is centered on the first formant frequency f1 of 250 Hz, the band filter 2b on the second formant frequency f2 of 800 Hz, and the band filter 2c on the third formant frequency f3 of 1400 Hz. The pass bands are W1 = 200 to 300 Hz for the band filter 2a, W2 = 700 to 900 Hz for the band filter 2b, and W3 = 1200 to 1600 Hz for the band filter 2c.
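As a rough illustration of this discrimination step, the sketch below measures the level of a sampled signal inside each of the three pass bands given above. It is only a software stand-in for the band filters 2a to 2c: the sampling rate, the frame length, the direct DFT used to measure the in-band level, and the names `BANDS` and `band_levels` are assumptions made for illustration and do not appear in the patent.

```python
import math

# Pass bands W1 to W3 from the embodiment (Hz); the dict layout is illustrative.
BANDS = {"2a": (200.0, 300.0), "2b": (700.0, 900.0), "2c": (1200.0, 1600.0)}


def band_levels(samples, sample_rate):
    """Return the peak DFT magnitude found inside each pass band.

    A direct DFT over only the bins that fall inside a band is used as a
    simple stand-in for a band-pass filter followed by a level detector.
    """
    n = len(samples)
    levels = {}
    for name, (lo, hi) in BANDS.items():
        k_lo = math.ceil(lo * n / sample_rate)
        k_hi = math.floor(hi * n / sample_rate)
        peak = 0.0
        for k in range(k_lo, k_hi + 1):
            re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            # Scale so that a unit-amplitude sine centered on bin k gives about 1.
            peak = max(peak, 2.0 * math.hypot(re, im) / n)
        levels[name] = peak
    return levels


# Synthetic test signal with energy only at 250 Hz and 800 Hz (assumed values).
RATE = 8000
FRAME = [
    math.sin(2 * math.pi * 250 * i / RATE) + 0.5 * math.sin(2 * math.pi * 800 * i / RATE)
    for i in range(800)
]
LEVELS = band_levels(FRAME, RATE)
```

With this test signal, the levels for bands 2a and 2b reflect the two tones, while band 2c stays near zero because nothing was placed between 1200 and 1600 Hz.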
[0017]
The comparators 3 compare the output level of each band filter 2 with a threshold, and one comparator is connected to each of the three band filters. The three comparators 3a, 3b, and 3c receive the outputs of the band filters 2a, 2b, and 2c, respectively, and compare each output level with its own predetermined threshold Va, Vb, or Vc. As shown in FIG. 2, the thresholds are set so that a comparator fed by a band filter 2 with a higher frequency band has a smaller threshold than a comparator fed by one with a lower frequency band, that is, Va > Vb > Vc, with values chosen so that, for example, the voice "hai" ("yes") can be recognized. The thresholds Va, Vb, and Vc also change according to the output of the noise monitor 4 described later.
[0018]
The noise monitor 4 changes the thresholds based on background noise other than the voice to be recognized in the voice signal output from the microphone 1. It consists of a spectrum calculation unit 4a, a counting unit 4b that counts the number of times the spectrum output of the spectrum calculation unit 4a exceeds a predetermined threshold within a predetermined time, and a threshold control unit 4c that controls each threshold according to the count from the counting unit 4b. The spectrum calculation unit 4a calculates and outputs the signal spectrum of the voice signal; although not limited to this, in this embodiment it is a digital signal processor (DSP). In the noise monitor 4, the spectrum calculation unit 4a calculates and outputs the signal spectrum of the background noise other than the voice to be recognized output from the microphone 1, and the counting unit 4b counts the number of times this spectrum output exceeds the predetermined threshold within the predetermined time. When that count is large, the noise monitor 4 judges that background noise is present and changes the thresholds Va, Vb, and Vc.
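The counting and threshold control described here might be sketched as follows. The patent states only that the thresholds are "changed" when the count is large; the rule used below (multiplying every threshold by a fixed factor), the reduction of the spectrum output to one level per frame, and all names and numeric values are assumptions for illustration.

```python
def count_exceedances(spectrum_levels, level_threshold):
    """Counting unit 4b: count how often the spectrum output exceeds the
    threshold within one observation window (the list passed in)."""
    return sum(1 for level in spectrum_levels if level > level_threshold)


def control_thresholds(base, spectrum_levels, level_threshold, count_limit, noise_scale=1.5):
    """Threshold control unit 4c (assumed policy).

    If the count is above count_limit, background noise is judged to be
    present and every comparator threshold is multiplied by noise_scale.
    The direction and size of the change are assumptions, not from the patent.
    """
    noisy = count_exceedances(spectrum_levels, level_threshold) > count_limit
    scale = noise_scale if noisy else 1.0
    return {name: value * scale for name, value in base.items()}


BASE = {"Va": 0.6, "Vb": 0.4, "Vc": 0.2}          # illustrative values with Va > Vb > Vc
QUIET = [0.1, 0.2, 0.1, 0.9, 0.1, 0.2, 0.1, 0.1]  # one brief peak in the window
NOISY = [0.8, 0.9, 0.7, 0.9, 0.8, 0.6, 0.9, 0.7]  # sustained high level in the window

quiet_thresholds = control_thresholds(BASE, QUIET, level_threshold=0.5, count_limit=4)
noisy_thresholds = control_thresholds(BASE, NOISY, level_threshold=0.5, count_limit=4)
```

Scaling all three thresholds by the same factor keeps the ordering Va > Vb > Vc intact, which is one reason this simple policy was chosen for the sketch.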
[0019]
The AND circuit 5 operates on the outputs of the comparators 3; here it is an AND circuit that takes the logical product of its three inputs. The AND circuit 5 outputs ON only when the outputs of the comparators 3a, 3b, and 3c are all ON.
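The comparator and AND stages can be illustrated with a few lines of code. This is a minimal sketch, assuming band levels are already available as numbers; the function names and the threshold values are hypothetical and chosen only to satisfy Va > Vb > Vc.

```python
def comparator_outputs(levels, thresholds):
    """Comparators 3a to 3c: ON (True) when a band level exceeds its threshold."""
    return tuple(level > threshold for level, threshold in zip(levels, thresholds))


def and_circuit(outputs):
    """AND circuit 5: ON only when every comparator output is ON."""
    return all(outputs)


THRESHOLDS = (0.6, 0.4, 0.2)  # illustrative Va > Vb > Vc

# All three formant bands are strong enough, so the voice is recognized.
speech_like = and_circuit(comparator_outputs((0.9, 0.5, 0.3), THRESHOLDS))
# Only the low band is strong, as with low-frequency hum, so nothing is recognized.
hum_like = and_circuit(comparator_outputs((0.9, 0.1, 0.0), THRESHOLDS))
```

Because the decision is a logical product, a strong signal in a single band is never enough on its own to trigger recognition.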
[0020]
Next, an operation of recognizing a recognition target voice by the above-described voice recognition device will be described.
[0021]
When a voice is spoken toward the microphone 1, the microphone 1 outputs a voice signal based on the input voice. This voice signal is amplified by a predetermined amplifier circuit and then input to the band filters 2a, 2b, and 2c. The band filter 2a passes only the frequency components within its pass band W1 of 200 to 300 Hz, the band filter 2b only those within its pass band W2 of 700 to 900 Hz, and the band filter 2c only those within its pass band W3 of 1200 to 1600 Hz. The three outputs are input to the comparators 3a, 3b, and 3c, respectively, and compared with the thresholds Va, Vb, and Vc; a comparator outputs ON when its input is larger than its threshold. The thresholds Va, Vb, and Vc are controlled by the noise monitor 4.
[0022]
The thresholds Va, Vb, and Vc are controlled as follows. The voice signal based on the voice input from the microphone 1 is amplified by a predetermined amplifier circuit (not shown) and input to the spectrum calculation unit 4a of the noise monitor 4, which calculates the signal spectrum of the input signal. The signal spectrum is then input to the counting unit 4b; when the number of times it exceeds the predetermined threshold within the predetermined time is large, background noise is judged to be present and the threshold control unit 4c changes each threshold. The output of the AND circuit 5 turns ON only when all of the comparators 3a, 3b, and 3c input ON to the AND circuit 5, and the predetermined voice "hai" ("yes") is then recognized.
[0023]
According to the voice recognition device of the embodiment described above, the voice signal input from the microphone 1 is discriminated by predetermined frequency bands by the three band filters 2 and output, compared by the comparators 3, whose thresholds are controlled and changed by the noise monitor 4 based on background noise other than the voice to be recognized, and then operated on by the AND circuit 5 for recognition. The voice can therefore be recognized reliably, without erroneous recognition, even when background noise is present. In addition, because the voice signal is discriminated by frequency bands centered on the formant frequencies of the predetermined voice, the ability to recognize the predetermined voice is improved. Because the higher-frequency components of the voice signal are compared against lower thresholds than the lower-frequency components, the predetermined voice can be recognized reliably even when there is low-frequency background noise caused, for example, by vibration of the transformer in the power supply circuit of the voice recognition device. Finally, the outputs discriminated by the band filters 2 are compared by comparators 3 whose thresholds are changed as follows: the spectrum calculation unit 4a of the noise monitor 4 calculates the signal spectrum of the background noise other than the voice to be recognized, and the counting unit 4b counts the number of times this spectrum output exceeds the predetermined threshold within the predetermined time. The thresholds are thus controlled and changed to suit the characteristics of the background noise, which improves the ability to recognize the predetermined voice.
[0024]
[First Reference Embodiment]
FIG. 4 is a configuration diagram of the noise monitor of the voice recognition device of the first reference embodiment.
[0025]
This speech recognition apparatus differs from the speech recognition apparatus according to the first embodiment only in the configuration of the noise monitor 4, and the other parts are identical.
[0026]
The noise monitor 4 of this device also changes the thresholds Va, Vb, and Vc of the three comparators 3a, 3b, and 3c based on background noise other than the voice to be recognized in the voice signal output from the microphone 1. It has an average value calculation unit 4d, a comparison unit 4e that compares the average value outputs from the average value calculation unit 4d, and a threshold control unit 4f that controls each threshold according to the comparison result of the comparison unit 4e. The average value calculation unit 4d calculates and outputs the average input level of the voice signal over periods of different lengths; for example, it consists of a first average value calculation unit and a second average value calculation unit that use different averaging times. In the noise monitor 4, the first and second average value calculation units of the average value calculation unit 4d calculate the average input level of the background noise other than the voice to be recognized output from the microphone 1 over periods of different lengths, and the comparison unit 4e then compares the results. When the short-time average output is greater than the long-time average output, the input is judged to be voice; when the short-time average output is less than the long-time average output, the input is judged to be background noise. The thresholds Va, Vb, and Vc are changed according to this judgment. The average value calculation unit 4d is not limited to a configuration with first and second average value calculation units having different averaging times, and may instead be implemented with a DSP.
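The comparison of short-time and long-time averages can be sketched as below. The window lengths, the use of a plain moving average over per-frame levels, the function names, and the handling of the equal case (which the patent does not define) are all assumptions for illustration.

```python
def mean_level(levels, window):
    """Average of the most recent `window` input levels (levels must be non-empty)."""
    recent = levels[-window:]
    return sum(recent) / len(recent)


def classify_input(levels, short_window=4, long_window=16):
    """Comparison unit 4e: compare short-time and long-time average levels.

    Returns "voice" when the short-time average is higher and
    "background noise" when it is lower. The equal case is not defined
    in the patent and is reported as "undetermined" here.
    """
    short_avg = mean_level(levels, short_window)
    long_avg = mean_level(levels, long_window)
    if short_avg > long_avg:
        return "voice"
    if long_avg > short_avg:
        return "background noise"
    return "undetermined"


ONSET = [0.1] * 12 + [0.8] * 4   # level jumps up at the end, as with a spoken word
DECAY = [0.8] * 12 + [0.1] * 4   # level has just dropped
FLAT = [0.5] * 16                # perfectly steady level
```

For example, `classify_input(ONSET)` yields "voice" because the last four levels lift the short-time average above the long-time one, while `classify_input(DECAY)` yields "background noise".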
[0027]
According to the voice recognition device of the reference embodiment described above, the outputs discriminated by the band filters 2 by predetermined frequency bands are compared by comparing means whose thresholds are changed as follows: the average value calculation unit 4d calculates the average input level of the background noise other than the voice to be recognized over periods of different lengths, and the results are then compared with each other. The thresholds are thus controlled and changed to suit the duration of the background noise, which improves the ability to recognize the predetermined voice.
[0028]
[Second Reference Embodiment]
FIG. 5 is a functional block diagram showing the voice recognition device of the second reference embodiment.
[0029]
This voice recognition device includes a microphone 1 corresponding to voice input means, three phoneme detection circuits 6 corresponding to discriminating means for discriminating predetermined phonemes, and a calculator 7 corresponding to calculating means.
[0030]
The microphone 1 is configured by the same one as that of the first embodiment.
[0031]
The phoneme detection circuits 6 discriminate the voice signal output from the microphone 1 by predetermined phonemes; in this embodiment, three of them are connected in parallel. Each phoneme detection circuit 6 detects one phoneme of the predetermined voice and consists of a band filter whose pass band is centered on a formant frequency for detecting the predetermined phoneme in the passing voice signal, together with a predetermined threshold. In this embodiment, for example, the phoneme detection circuits 6a, 6b, and 6c detect the three phonemes "h", "a", and "i" of the voice "hai" ("yes") to be recognized.
[0032]
The calculator 7 operates on the outputs of the phoneme detection circuits 6 and is an identification circuit that identifies whether the three outputs of the phoneme detection circuits 6a, 6b, and 6c occur in a predetermined order. The calculator 7 outputs ON only when the inputs arrive first from the phoneme detection circuit 6a for "h", next from the phoneme detection circuit 6b for "a", and then from the phoneme detection circuit 6c for "i".
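The order check performed by this identification circuit could look like the sketch below. It assumes the phoneme detection circuits report a time-ordered list of labels; collapsing consecutive repeats into one detection and requiring an exact match with the expected order are assumptions, and `recognized_in_order` is a hypothetical name.

```python
def recognized_in_order(detections, expected=("h", "a", "i")):
    """Return True (ON) only if the detectors fire in the expected order.

    `detections` is the time-ordered sequence of phoneme labels reported
    by the detection circuits 6a to 6c. Consecutive repeats of a label are
    treated as a single detection (an assumption made for this sketch).
    """
    collapsed = []
    for label in detections:
        if not collapsed or collapsed[-1] != label:
            collapsed.append(label)
    return tuple(collapsed) == tuple(expected)


# A detector may stay ON across several frames, hence the repeated labels.
in_order = recognized_in_order(["h", "h", "a", "a", "a", "i"])
out_of_order = recognized_in_order(["a", "h", "i"])
```

A sequence that is missing a phoneme, such as "h" followed only by "a", is rejected in the same way as one that arrives in the wrong order.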
[0033]
According to the voice recognition device of the reference embodiment described above, the voice signal input from the microphone 1 is discriminated by predetermined phonemes by the three phoneme detection circuits 6a, 6b, and 6c and output, and the calculator 7 identifies whether the outputs occur in the predetermined order. Because the order of the outputs is recognized in accordance with the phonemes of the predetermined voice to be recognized, the ability to recognize the predetermined voice is improved.
[0034]
[Effects of the Invention]
As described above, according to the voice recognition method and device of the present invention, each output level discriminated by at least two frequency bands is compared against a threshold that changes based on background noise other than the voice to be recognized, and the voice is recognized. The voice can therefore be recognized reliably, without erroneous recognition, even when background noise is present. Moreover, the outputs discriminated by the discriminating means by predetermined frequency bands are compared by comparing means whose threshold is changed as follows: the spectrum calculation unit calculates the signal spectrum of the background noise other than the voice to be recognized, and the counting unit counts the number of times the spectrum output exceeds a predetermined threshold within a predetermined time. The threshold is thus controlled and changed to suit the characteristics of the background noise, which improves the ability to recognize the predetermined voice.
[0035]
In addition to the effects of claim 2, the voice recognition device according to claim 3 discriminates the voice signal by frequency bands centered on the formant frequencies of the predetermined voice, which improves the ability to recognize the predetermined voice.
[0036]
In addition to the effects of claim 2 or 3, the voice recognition device according to claim 4 compares the higher-frequency components of the voice signal against a lower threshold than the lower-frequency components. The predetermined voice can therefore be recognized reliably even when there is low-frequency background noise caused, for example, by vibration of the transformer in the power supply circuit of the voice recognition device.
[Brief Description of the Drawings]
FIG. 1 is a functional block diagram showing the voice recognition device according to the first embodiment of the present invention.
FIG. 2 is an explanatory diagram of the thresholds of the comparators of the voice recognition device shown in FIG. 1.
FIG. 3 is a configuration diagram of the noise monitor of the voice recognition device shown in FIG. 1.
FIG. 4 is a configuration diagram of the noise monitor of the voice recognition device of the first reference embodiment.
FIG. 5 is a functional block diagram showing the voice recognition device of the second reference embodiment.
FIG. 6 is a functional block diagram showing a conventional example.
[Explanation of Symbols]
1 Microphone (voice input means)
2 Band filter (discriminating means)
3 Comparator (comparing means)
4 Noise monitor (threshold control means)
5 AND circuit (calculating means)
6 Phoneme detection circuit (discriminating means)
7 Calculator (calculating means)

Claims

1. A speech recognition method for recognizing a speech to be recognized, the method using: voice input means for inputting the speech signal to be recognized together with background noise; discriminating means for discriminating the speech signal by at least two predetermined frequency bands; comparing means for comparing each output level from the discriminating means with a threshold; threshold control means for changing the threshold based on background noise other than the speech to be recognized; and calculating means for operating on an output from the comparing means, wherein the threshold control means has a spectrum calculation unit that calculates and outputs a signal spectrum of the speech signal, and a counting unit that counts the number of times the spectrum output from the spectrum calculation unit exceeds a predetermined threshold within a predetermined time,
the method comprising: discriminating, with the discriminating means, the speech signal input by the voice input means by at least two predetermined frequency bands; comparing, with the comparing means, the output level of each of the frequency bands with the threshold changed by the threshold control means; and operating on the comparison results with the calculating means, thereby recognizing the speech to be recognized.

2. A speech recognition apparatus for recognizing a speech to be recognized, comprising: voice input means for inputting the speech signal to be recognized together with background noise; discriminating means for discriminating the speech signal by at least two predetermined frequency bands; comparing means for comparing each output level from the discriminating means with a threshold; threshold control means for changing the threshold based on background noise other than the speech to be recognized; and calculating means for operating on an output from the comparing means, wherein the threshold control means has a spectrum calculation unit that calculates and outputs a signal spectrum of the speech signal, and a counting unit that counts the number of times the spectrum output from the spectrum calculation unit exceeds a predetermined threshold within a predetermined time.

3. The speech recognition apparatus according to claim 2, wherein the center of the frequency band of the discriminating means is a formant frequency of a predetermined speech to be recognized.

4. The speech recognition apparatus according to claim 2, wherein the comparing means sets the threshold for a discriminating means with a higher frequency band on its input side smaller than the threshold for a discriminating means with a lower frequency band on its input side.