JP2000089799A

JP2000089799A - Voice recognition system and method and recording medium stored with software for voice recognition

Info

Publication number: JP2000089799A
Application number: JP10254793A
Authority: JP
Inventors: Koichiro Fukunaga; 功一郎福永; Masami Maesaka; 正巳前坂; Mitsuaki Shibazaki; 光陽柴崎; Makoto Kisanuki; 誠木佐貫
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 1998-09-09
Filing date: 1998-09-09
Publication date: 2000-03-31

Abstract

PROBLEM TO BE SOLVED: To perform voice recognition correctly with a small processing load even when there is noise around the voice recognition system. SOLUTION: A speaking state detecting circuit 45 detects the state in which the user is not inputting voice, that is, in which no recognition operation is being performed. During this state, a filter arithmetic circuit 46 calculates filter constants based on the inputs from two microphones 41 and 42 and stores them in a storage memory 47 together with the corresponding correlation values. At the time of a recognition operation, a filter constant selecting circuit 49 selects from the storage memory 47 the filter constant whose correlation value is the highest, and a noise cancelling circuit 410 removes the noise from the voice signal using this filter constant. Because the complex operations and calculations for noise cancellation need not be carried out during the actual recognition processing, the amount of calculation that must be performed in real time is reduced. Because especially high-speed circuits are also unnecessary, the voice recognition system is easier to miniaturize and to reduce in cost.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an improvement in voice recognition technology for recognizing spoken phrases. More specifically, it enables recognition to be performed correctly with a small processing load even in an environment where ambient noise is present.

[0002]

[Prior Art] Voice recognition is known as a technique for inputting information to electronic equipment such as a car audio system. In voice recognition, data such as waveforms and parameters that characterize each phrase to be recognized are prepared in advance as a recognition dictionary. The data for each individual phrase is called voice reference data, and the part that stores a plurality of voice reference data is called the recognition dictionary. The spoken phrase is then recognized by pattern matching the user's voice, input from a microphone, against this recognition dictionary.

[0003] FIG. 9 shows a configuration example of a voice recognition system that performs such voice recognition. This system comprises a microphone 11, an input circuit 12, a recognition dictionary 13, a pattern matching circuit 14, and a recognition result output circuit 15, and is used, for example, as the part that inputs commands to a car audio system or the like.

[0004] In this example, when the user speaks a phrase to be recognized into the microphone 11, the input circuit 12 converts the waveform of the voice signal input from the microphone 11 into digital data. The pattern matching circuit 14 then compares the features of this digital data against the voice reference data for each phrase stored in the recognition dictionary 13. If a match is found, the recognition result is output via the recognition result output circuit 15 in a form such as a phrase code.

[0005] In the environments where such a voice recognition system is actually used, ambient noise is present in addition to the user's voice and is often picked up by the microphone 11 together with the intended voice. In that case the ambient noise degrades the S/N ratio of the voice signal input to the pattern matching circuit 14, and because the pattern matching circuit 14 cannot adequately extract the features of the voice data, collation with the voice reference data in the recognition dictionary 13 may not be performed correctly. As a result, a correct recognition result is not obtained and the recognition rate deteriorates.

[0006] To solve this problem, a conventional technique is also known that improves recognition accuracy by adding adaptive noise cancellation to the voice recognition system and thereby removing noise from the voice signal to be recognized. FIG. 10 is a block diagram showing a configuration example of this conventional technique. The system has two microphones, a voice microphone 21 and a noise microphone 22, and has an adaptive noise cancelling circuit 24 in addition to a voice input circuit 23. It further includes a pattern matching circuit 25, a recognition dictionary 26, a recognition result output circuit 27, and so on, similar to those described with reference to FIG. 9.

[0007] In this example, the voice microphone 21 is installed at a position where it readily picks up the user's speech, while the noise microphone 22 is installed at a position where it picks up the same noise, from the same noise source, that leaks into the voice microphone 21. The input signals from the two microphones are fed to the adaptive noise cancelling circuit 24, which performs adaptive filtering based on these two inputs and outputs a voice input with the noise cancelled (reduced) to the pattern matching circuit 25.

[0008] The LMS method or the like is generally used for the adaptive filtering. That is, the adaptive filtering is performed by, for example, an adaptive noise canceller, which receives noise-contaminated voice from the voice microphone and receives from the noise microphone a noise source signal that is mainly the origin of the noise. The desired voice is then obtained by subtracting from the voice the noise that the adaptive filter estimates from the noise source signal, using an error signal or the like.

[0009] The LMS (Least Mean Square) method is a technique for updating the filter constants of such an adaptive filter. For the gradient vector that drives the update, it uses an instantaneous estimate computed for each sample instead of an expected value obtained from the error signal and the input signal by a time-consuming averaging operation.
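As a hedged illustration of the per-sample update described above, the following Python sketch implements a plain LMS noise canceller. The function name `lms_cancel`, the tap count `num_taps`, and the step size `mu` are assumptions made for this example; the specification does not give concrete values or a filter structure.

```python
def lms_cancel(primary, reference, num_taps=4, mu=0.05):
    """Plain LMS adaptive noise canceller (illustrative sketch only).

    primary:   samples from the voice microphone (speech plus noise)
    reference: samples from the noise microphone
    Returns (error_signal, final_weights); the error signal is the
    noise-reduced output.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    errors = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        # Filter output: current estimate of the noise in the primary channel.
        y = sum(w * h for w, h in zip(weights, history))
        e = d - y
        # The instantaneous product e * h stands in for the averaged gradient.
        weights = [w + 2.0 * mu * e * h for w, h in zip(weights, history)]
        errors.append(e)
    return errors, weights
```

Because the weights are updated on every sample, this loop has to run while the user is speaking, which is the real-time load the later paragraphs set out to avoid.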

[0010] In this case, the pattern matching circuit 25 compares the features of the noise-cancelled voice data input from the adaptive noise cancelling circuit 24 against the voice reference data stored in the recognition dictionary 26. If a match is found, the recognition result is output via the recognition result output circuit 27.

[0011]

[Problems to be Solved by the Invention] However, the conventional technique using adaptive filtering as described above has the problem that the load that must be processed simultaneously is large. That is, when an electronic device or the like is operated in real time by voice recognition, the conventional technique has to execute two complex processes, adaptive filtering and pattern matching, in parallel while the voice is being uttered. This requires a circuit with a high operating speed, which is a serious obstacle in terms of cost, miniaturization, and so on.

[0012] The present invention has been proposed to solve the above problems of the conventional technique, and its object is to provide a voice recognition technique that performs recognition correctly with a small load even in an environment where ambient noise is present.

[0013]

[Means for Solving the Problems] To achieve the above object, the voice recognition system of claim 1 comprises: first and second microphones; means for calculating a filter constant for filtering processing based on the voice signals from the microphones; means for removing a noise component by performing, on the voice signal from the first microphone, filtering processing based on the voice signal from the second microphone and on a characteristic based on the calculated filter constant; means for recognizing a spoken phrase based on the voice signal from which noise has been removed; means for determining whether voice is being input from the microphone; and means for performing control so that the filter constant is calculated when the voice is not being input.

The voice recognition method of claim 3 expresses the invention of claim 1 as a method, and comprises the steps of: calculating a filter constant for filtering processing based on the voice signals from first and second microphones; removing a noise component by performing, on the voice signal from the first microphone, filtering processing based on the voice signal from the second microphone and on a characteristic based on the calculated filter constant; recognizing a spoken phrase based on the voice signal from which noise has been removed; determining whether voice is being input from the microphone; and performing control so that the filter constant is calculated when the voice is not being input.

The invention of claim 5 expresses the inventions of claims 1 and 3 as a recording medium on which computer software is recorded. In a recording medium storing voice recognition software for recognizing phrases in speech using a computer, the software causes the computer to: calculate a filter constant for filtering processing based on the voice signals from first and second microphones; remove a noise component by performing, on the voice signal from the first microphone, filtering processing based on the voice signal from the second microphone and on a characteristic based on the calculated filter constant; recognize a spoken phrase based on the voice signal from which noise has been removed; determine whether voice is being input from the microphone; and calculate the filter constant when the voice is not being input.

In the inventions of claims 1, 3, and 5, the filter constant for noise removal is calculated while no actual voice input is taking place and need not be calculated in parallel with the actual recognition processing, so voice recognition can be performed correctly with a small load.

[0014] The voice recognition system of claim 2 comprises: a first microphone for inputting spoken voice; a second microphone for inputting the same noise as the noise that leaks into the first microphone; a switch to be turned on when the voice is input; recognition means for recognizing a phrase from a given voice signal and outputting an end signal when recognition ends; means for determining whether speech is in progress based on the state of the switch and the end signal; calculation means for adaptively calculating a filter constant so that the correlation value between the voice signals of the microphones becomes high; a memory for storing a plurality of combinations of a calculated filter constant and the correlation value between the two voice signals at the time that filter constant was calculated; control means for operating the calculation means at predetermined intervals only while speech is not in progress; means for selecting the filter constant corresponding to the largest of the correlation values stored in the memory; and means for outputting a noise-cancelled voice signal by performing filtering processing based on the selected filter constant and the two voice signals from the microphones, wherein the recognition means is configured to recognize the phrase based on the noise-cancelled voice signal.

The invention of claim 4 expresses the invention of claim 2 as a method. It is a voice recognition method using a first microphone for inputting spoken voice and a second microphone for inputting the same noise as the noise that leaks into the first microphone, and comprises: a recognition step of recognizing a phrase from a given voice signal and outputting an end signal when recognition ends; a step of determining whether speech is in progress based on the state of a switch to be turned on when voice is input and on the end signal; a step of adaptively calculating a filter constant so that the correlation value between the voice signals of the microphones becomes high; a step of storing a plurality of combinations of a calculated filter constant and the correlation value between the two voice signals at the time that filter constant was calculated; a step of performing control so that the calculating step is executed at predetermined intervals only while speech is not in progress; a step of selecting the filter constant corresponding to the largest of the stored correlation values; and a step of outputting a noise-cancelled voice signal by performing filtering processing based on the selected filter constant and the two voice signals from the microphones, wherein the recognition step recognizes the phrase based on the noise-cancelled voice signal.

The invention of claim 6 expresses the inventions of claims 2 and 4 as a recording medium on which computer software is recorded. In a recording medium storing voice recognition software for recognizing phrases in speech using a computer, the software causes the computer, using a first microphone for inputting spoken voice and a second microphone for inputting the same noise as the noise that leaks into the first microphone, to: recognize a phrase from a given voice signal and output an end signal when recognition ends; determine whether speech is in progress based on the state of a switch to be turned on when voice is input and on the end signal; adaptively calculate a filter constant so that the correlation value between the voice signals of the microphones becomes high; store a plurality of combinations of a calculated filter constant and the correlation value between the two voice signals at the time that filter constant was calculated; execute the calculation of the filter constant at predetermined intervals only while speech is not in progress; select the filter constant corresponding to the largest of the stored correlation values; output a noise-cancelled voice signal by performing filtering processing based on the selected filter constant and the two voice signals from the microphones; and, in the recognition, recognize the phrase based on the noise-cancelled voice signal.

[0015] The inventions of claims 2, 4, and 6 use two microphones, a first microphone for voice and a second microphone for noise. The state in which the user is not inputting voice, that is, in which no recognition operation is being performed, is detected. During that state, the correlation between the input waveforms of the two microphones is detected, and the constant of a filter provided for the second microphone is calculated so that this correlation between the two microphones becomes high; once a fixed time has elapsed, the adjustment is stopped and the filter constant is fixed. This operation is executed at fixed time intervals, and the filter constant from each run is stored separately. When a recognition operation is actually performed, the start of the user's voice recognition operation is detected, and the filter constant setting with the highest correlation is selected from those stored so far. The difference between the voice signal from the first microphone, passed through a filter whose characteristic is determined by that filter constant, and the voice signal from the second microphone is then used as the input to the voice recognition circuit, and voice recognition processing is performed.

[0016] As a result, because the signal subjected to noise cancellation is the target of the recognition pattern matching, recognition can be performed without lowering the recognition rate even in a noisy environment. In addition, the constants of the noise cancelling filter are set during idle time, so to speak, when no recognition processing is being performed. Consequently, the complex operations and calculations for noise cancellation need not be performed simultaneously with the actual recognition processing. This reduces the amount of computation that must be performed in real time and removes the need for circuits with especially high operating speed, which makes it easier to miniaturize the voice recognition system and reduce its cost.

[0017]

[Embodiments of the Invention] Next, an embodiment of the navigation system of the present invention (hereinafter "the embodiment") will be described specifically with reference to the drawings. In the figures used in the following description, members that are the same as, or of the same type as, those in earlier figures are given the same reference numerals and their description is omitted.

[0018] This embodiment is realized using various hardware devices and a computer controlled by software. The software is created by combining instructions in accordance with the description in this specification, and for the parts in common with the conventional technique described above, the methods described for the conventional technique are also used. The software includes not only program code but also data prepared in advance for use when the program code is executed. The software achieves the operation and effects of the present invention by making use of physical processing devices such as the CPU and various chipsets built into the navigation system.

[0019] However, the specific hardware and software configurations that realize the present invention can be modified in various ways. For example, depending on the circuit configuration and the processing capability of the CPU, a given function may be realized by a physical electronic circuit such as an LSI or by software. For the parts that use software, various forms of software are also conceivable, for example compiler-based or assembler-based. A recording medium on which software realizing the present invention is recorded is, by itself, also one aspect of the present invention.

[0020] Because there are thus many conceivable ways of realizing the present invention using a computer, the invention and the embodiment are described below using virtual circuit blocks, each of which realizes an individual function included in the invention or the embodiment.

[0021] [1. Configuration] FIG. 1 is a functional block diagram showing the configuration of this embodiment. The embodiment comprises a voice microphone 41, a noise reference microphone 42, a voice input circuit 43, an utterance start switch 44, an utterance state detection circuit 45, a filter calculation circuit 46, a storage memory 47 for filter constant calculation results, an arithmetic operation control circuit 48, a filter constant selection circuit 49, a noise cancelling circuit 410, a pattern matching circuit 411, a recognition dictionary 412, and a recognition result output circuit 413.

[0022] Of these, the voice microphone 41 is the first microphone, for inputting spoken voice, and is installed at a position where mainly the user's speech is picked up. The noise reference microphone 42 is the second microphone, for inputting the same noise as the noise that leaks into the voice microphone 41. It is installed at a position where the same noise, from the same source, that leaks into the voice microphone is picked up and where the speaker's voice is input at as low a level as possible.

[0023] The utterance start switch 44 is a switch that the user turns on when inputting the voice. The pattern matching circuit 411 is recognition means that recognizes a phrase from a given voice signal and outputs an end signal when recognition ends. The utterance state detection circuit 45 is means for determining whether voice is being input from the voice microphone 41; specifically, it determines whether speech is in progress based on the state of the utterance start switch 44 and the end signal.

[0024] The filter calculation circuit 46 (also called the noise cancelling filter calculation circuit) is means for calculating a filter constant for filtering processing based on the inputs, that is, the voice signals, from the microphones 41 and 42. Specifically, it is calculation means that adaptively calculates the filter constant so that the correlation value between the inputs of the microphones 41 and 42 becomes high.
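The specification does not define how the correlation value between the two inputs is computed. As one possible reading, the sketch below assumes a zero-lag normalized cross-correlation over a frame of samples; the name `correlation_value` and the choice of measure are assumptions for illustration only.

```python
import math


def correlation_value(frame_a, frame_b):
    """Zero-lag normalized cross-correlation of two equal-length frames.

    Returns a value in [-1, 1]; 0.0 is returned when either frame is silent.
    This measure is an assumption, not taken from the specification.
    """
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same length")
    energy_a = sum(x * x for x in frame_a)
    energy_b = sum(y * y for y in frame_b)
    if energy_a == 0.0 or energy_b == 0.0:
        return 0.0
    dot = sum(x * y for x, y in zip(frame_a, frame_b))
    return dot / math.sqrt(energy_a * energy_b)
```

Under this reading, a filter constant that makes the filtered noise-reference frame resemble the noise in the voice-microphone frame would score close to 1.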

[0025] The arithmetic operation control circuit 48 is means for controlling the filter calculation circuit 46 so that it calculates the filter constant for the adaptive filtering when the voice is not being input. Specifically, it is control means that operates the calculation means at predetermined intervals only while speech is not in progress.
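The gating rule of the control circuit 48 (run the calculation at predetermined intervals, and only while speech is not in progress) can be sketched as follows. The class name `CalibrationScheduler`, the use of seconds as the time unit, and the decision to run immediately on the first idle call are assumptions for this example.

```python
class CalibrationScheduler:
    """Decides when the filter-constant calculation may run (sketch)."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self._last_run_s = None  # time of the most recent calculation

    def should_run(self, now_s, speaking):
        # Never calculate while the user is speaking.
        if speaking:
            return False
        # Otherwise run only when the predetermined interval has elapsed.
        if self._last_run_s is not None and now_s - self._last_run_s < self.interval_s:
            return False
        self._last_run_s = now_s
        return True
```

A caller would poll `should_run` with the current time and the utterance state, and start one filter-constant calculation each time it returns `True`.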

[0026] The storage memory 47 is a memory that stores a plurality of combinations of a calculated filter constant and the correlation value between the two inputs at the time that filter constant was calculated. The filter constant selection circuit 49 is means for selecting the filter constant corresponding to the largest of the correlation values stored in the storage memory 47. Specifically, it is configured to compare the correlation values for the past k runs stored in the storage memory 47, extract the largest, and send the filter constant corresponding to that largest correlation value to the noise cancelling circuit 410.
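The storage and selection behaviour of the memory 47 and the selection circuit 49 can be sketched as a bounded history of (correlation value, filter constant) pairs. The class name `FilterConstantMemory`, the default `k=8`, and the policy of discarding the oldest entry once k entries are held are assumptions; the specification only says that the past k runs are compared.

```python
from collections import deque


class FilterConstantMemory:
    """Keeps the last k (correlation, coefficients) pairs (sketch)."""

    def __init__(self, k=8):
        # A deque with maxlen drops the oldest entry automatically.
        self._entries = deque(maxlen=k)

    def store(self, coefficients, correlation):
        self._entries.append((float(correlation), list(coefficients)))

    def select_best(self):
        """Return the coefficients stored with the largest correlation value."""
        if not self._entries:
            return None
        _, best_coefficients = max(self._entries, key=lambda entry: entry[0])
        return best_coefficients
```

Selection is a single pass over at most k stored values, so the work left for the moment recognition starts is small compared with running an adaptive update on every sample.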

[0027] The noise cancelling circuit 410 is means for outputting a noise-cancelled voice signal by performing filtering processing based on the selected filter constant and the two inputs, that is, the voice signals, from the microphones 41 and 42. Specifically, it removes the noise component by performing, on the voice signal from the voice microphone 41, filtering processing based on the voice signal from the noise reference microphone 42 and on a characteristic based on the selected filter constant.

[0028] FIG. 2 is a conceptual diagram showing that noise cancellation is achieved by the action of a filter 111 and a subtraction circuit 112, both realized by the noise cancelling circuit 410. The pattern matching circuit 411 is configured to recognize the spoken phrase based on the output from which noise has thus been removed, that is, the voice signal.

[0029] For the specific form and calculation method of the filter constant, the specific noise cancellation method, and so on, known techniques such as those described for the conventional technique may be selected and used as appropriate.

【００３０】〔２．作用〕上に述べたように構成された
本実施形態では、騒音除去用のフィルタ定数の計算につ
いて、実際の音声入力が行われていない間に行われ、実
際の認識処理と同時並行して行う必要がないので、少な
い負荷で正しく音声認識を行うことが可能となる。[2. Operation] In the present embodiment configured as described above, the filter constant for noise removal is calculated while no actual voice input is taking place and need not be calculated concurrently with the actual recognition processing, so speech recognition can be performed correctly with a small load.

【００３１】〔２−１．発話状態検出回路の動作〕すな
わち、まず、使用者が現在発話中であるかどうかの判断
は、発話状態検出回路４５によって行われる。すなわ
ち、使用者は、語句を発話するときに発話開始スイッチ
４４をオンし、パターンマッチング回路４１１は発話さ
れた語句の認識が終了すると認識動作の終了信号を出力
する。このため、発話状態検出回路４５は、図３のフロ
ーチャートに示すように、オンされたときに発話開始ス
イッチ４４から送られてくる信号（ＳＷｏｎ信号）を待
ち受け（ステップ１）、ＳＷｏｎ信号を受け取ると（ス
テップ２）、発話中かどうかの状態をあらわす発話状態
信号を「発話中」にセットする（ステップ３）。[2-1. Operation of the utterance state detection circuit] First, the utterance state detection circuit 45 determines whether the user is currently speaking. The user turns on the utterance start switch 44 when uttering a phrase, and the pattern matching circuit 411 outputs a recognition end signal when recognition of the uttered phrase is completed. Accordingly, as shown in the flowchart of FIG. 3, the utterance state detection circuit 45 waits for the signal (SWon signal) sent from the utterance start switch 44 when it is turned on (step 1) and, on receiving the SWon signal (step 2), sets the utterance state signal, which indicates whether utterance is in progress, to "speaking" (step 3).

【００３２】この発話状態信号は、演算動作制御回路４
８、フィルタ定数選択回路４９、パターンマッチング回
路４１１へ出力され、発話状態検出回路４５は続けて、
パターンマッチング回路４１１からの認識動作の終了信
号を待ち受け（ステップ４）、この終了信号を受け取る
ことで認識動作の終了がわかると（ステップ５）、出力
している発話状態信号を「非発話中」にセットすること
で（ステップ６）、演算動作制御回路４８などに発話中
でなくなったことを知らせる。The utterance state signal is output to the arithmetic operation control circuit 48, the filter constant selection circuit 49, and the pattern matching circuit 411. The utterance state detection circuit 45 then waits for the recognition end signal from the pattern matching circuit 411 (step 4). On learning from this end signal that the recognition operation has ended (step 5), it sets the utterance state signal it is outputting to "not speaking" (step 6), thereby notifying the arithmetic operation control circuit 48 and the other circuits that utterance is no longer in progress.
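
The six steps of FIG. 3 amount to a two-state machine. The following is a minimal Python sketch of that behavior, not the circuit itself; the class name, method names, and the `UtteranceState` enum are hypothetical labels introduced only for illustration.

```python
from enum import Enum


class UtteranceState(Enum):
    SPEAKING = "speaking"
    NOT_SPEAKING = "not speaking"


class UtteranceStateDetector:
    """Illustrative stand-in for the utterance state detection circuit 45."""

    def __init__(self):
        # Assumption: the system starts with no utterance in progress.
        self.state = UtteranceState.NOT_SPEAKING

    def on_switch_on(self):
        # Steps 1 to 3: the SWon signal arrives, so report "speaking".
        self.state = UtteranceState.SPEAKING

    def on_recognition_end(self):
        # Steps 4 to 6: the end signal arrives, so report "not speaking".
        if self.state is UtteranceState.SPEAKING:
            self.state = UtteranceState.NOT_SPEAKING


detector = UtteranceStateDetector()
detector.on_switch_on()
state_during = detector.state
detector.on_recognition_end()
state_after = detector.state
```

In this sketch the `state` attribute plays the role of the utterance state signal read by the circuits 48, 49, and 411.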

【００３３】〔２−２．演算動作制御回路の動作〕一
方、演算動作制御回路４８は、発話状態検出回路４５か
ら入力されてくる発話状態信号に基づいて、フィルタ演
算回路４６に命令を送ることで、ｍ時間間隔でごとにｎ
時間の間、フィルタ定数の演算すなわちフィルタ演算処
理を実行させるようにフィルタ演算回路４６を制御す
る。[2-2. Operation of the arithmetic operation control circuit] Meanwhile, based on the utterance state signal input from the utterance state detection circuit 45, the arithmetic operation control circuit 48 sends commands to the filter operation circuit 46 and thereby controls it so that the calculation of the filter constant, that is, the filter operation processing, is executed for a duration n at every time interval m.

【００３４】ここで、図４は、演算動作制御回路４８の
動作を示すフローチャートであり、ステップ４４の処理
は、ｍ時間及びｎ時間という２種類の時間を図るカウン
タのそれぞれについて、発話中から非発話中に変化した
後（ステップ４２）の最初と、ｍ時間又はｎ時間が経過
した後の最初ではそのカウンタをリセットして初期化す
る処理を表し、それ以外のときは両方のカウンタをイン
クリメントする処理を表すものとする。FIG. 4 is a flowchart showing the operation of the arithmetic operation control circuit 48. Step 44 operates on the two counters that measure the two durations, time m and time n: a counter is reset and initialized the first time the step is reached after the state changes from speaking to not speaking (step 42) and the first time it is reached after time m or time n has elapsed; otherwise, both counters are incremented.

【００３５】すなわち、この手順では、演算動作制御回
路４８はまず、発話状態信号を参照して（ステップ４
１）、この発話状態信号が「発話中」から「非発話中」
に変化すると（ステップ４２）、フィルタ定数演算回路
４６に動作開始命令を送る（ステップ４３）。そして、
発話状態信号が「非発話中」から「発話中」に変化しな
いかを監視しながら（ステップ４５）、ｎ時間経過する
と（ステップ４６）フィルタ演算回路４６に動作終了命
令を送る（ステップ４７）。That is, in this procedure, the arithmetic operation control circuit 48 first refers to the utterance state signal (step 41) and, when the signal changes from "speaking" to "not speaking" (step 42), sends an operation start command to the filter operation circuit 46 (step 43). Then, while monitoring whether the utterance state signal changes from "not speaking" to "speaking" (step 45), it sends an operation end command to the filter operation circuit 46 (step 47) once time n has elapsed (step 46).

【００３６】そして、まだ発話状態信号が「発話中」か
ら「非発話中」に変化していなければ（ステップ４
８）、ｍ時間経過するまで（ステップ４９）ステップ４
４から４８までの手順を繰り返す。但し、ｍ時間経過を
待つ間のこの繰り返しでは、フィルタ演算回路４６には
動作開始命令（ステップ４３）は発行されていないの
で、フィルタ定数の演算は行われない。そして、ｍ時間
経過すると、再びフィルタ演算回路４６に動作開始命令
が送られ（ステップ４３）、フィルタ定数の演算が開始
される。If the utterance state signal has not yet changed from "speaking" to "not speaking" (step 48), steps 44 to 48 are repeated until time m has elapsed (step 49). During this repetition while waiting for time m to elapse, however, no operation start command (step 43) has been issued to the filter operation circuit 46, so the filter constant is not calculated. Once time m has elapsed, an operation start command is again sent to the filter operation circuit 46 (step 43), and calculation of the filter constant starts.

【００３７】図５は、以上のような制御の結果、非発話
状態の間、ｍ時間間隔ごとにｎ時間の間、フィルタ演算
回路４６が動作する状態を示すタイミングチャートであ
る。FIG. 5 is a timing chart showing how, as a result of the above control, the filter operation circuit 46 operates for a duration n at every time interval m while in the non-speaking state.
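
The timing of FIG. 5 can be sketched in Python as follows. This is a simplified illustration, not the flowchart of FIG. 4 itself: it assumes discrete time ticks, that the interval m is measured from one start command to the next, that n is no longer than m, and that calculation is suspended whenever the user is speaking. The function name `filter_calc_schedule` is hypothetical.

```python
def filter_calc_schedule(speaking, m, n):
    """Return, per tick, whether the filter constant calculation is running.

    speaking: iterable of booleans, True while the user is speaking.
    m: interval in ticks between successive calculation starts (assumed).
    n: duration in ticks of each calculation (assumed, 0 < n <= m).
    """
    if not 0 < n <= m:
        raise ValueError("expected 0 < n <= m")
    active = []
    elapsed = 0  # ticks since the non-speaking period began
    for is_speaking in speaking:
        if is_speaking:
            elapsed = 0           # counters restart after each utterance
            active.append(False)  # no calculation during recognition
        else:
            active.append(elapsed % m < n)
            elapsed += 1
    return active


timeline = [True, True] + [False] * 7 + [True] + [False] * 3
schedule = filter_calc_schedule(timeline, m=4, n=2)
```

With m = 4 and n = 2, each non-speaking period is divided into blocks of four ticks, and the calculation runs during the first two ticks of each block.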

【００３８】〔２−３．フィルタ演算回路の動作〕次
に、上に説明したように演算動作制御回路４８から制御
されるフィルタ演算回路４６の動作手順を図６に示す。
すなわち、フィルタ演算回路４６は、演算動作制御回路
４８が動作開始命令を発行してから（ステップ６１）動
作終了命令を発行するまでの間（ステップ６３）、すな
わち、認識動作の為の発話音声が存在していない間、音
声用マイク４１及び騒音参照用マイク４２で収録される
２つの音声信号を入力として、これら２つの音声信号の
間の相関値が高くなるように、適応的にフィルタ定数を
演算する（ステップ６１）。[2-3. Operation of the filter operation circuit] Next, FIG. 6 shows the operation procedure of the filter operation circuit 46, which is controlled by the arithmetic operation control circuit 48 as described above. From the time the arithmetic operation control circuit 48 issues the operation start command (step 61) until it issues the operation end command (step 63), that is, while no uttered voice for recognition is present, the filter operation circuit 46 takes as input the two audio signals picked up by the voice microphone 41 and the noise reference microphone 42 and adaptively calculates the filter constant so that the correlation value between the two audio signals becomes higher (step 61).

【００３９】そして、フィルタ演算動作制御回路４８か
ら発行された動作終了命令を受信すると（ステップ６
３）、フィルタ定数の演算動作（フィルタ演算）を停止
し（ステップ６４）、それまでの演算で決定したフィル
タ定数と、その定数のもととなった２つの音声信号の間
の相関値を、記憶メモリ４７に出力して記憶させる（ス
テップ６５）。When the operation end command issued by the arithmetic operation control circuit 48 is received (step 63), the filter operation circuit 46 stops calculating the filter constant (step 64) and outputs the filter constant determined by the calculation so far, together with the correlation value between the two audio signals from which that constant was derived, to the storage memory 47 for storage (step 65).
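
The description leaves the adaptation method open to any known technique. As one possible illustration only, the Python sketch below uses a normalized LMS (NLMS) update on an FIR filter, which is an assumption and not a method named in this document. It further assumes that the "filter constant" is a list of FIR coefficients and that the stored correlation value is the correlation coefficient between the voice microphone signal and the filtered reference signal. All function names and parameters are hypothetical.

```python
import math
import random


def correlation(a, b):
    """Correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def calc_filter_constant(voice, ref, taps=4, mu=0.5, eps=1e-8):
    """Adapt FIR coefficients with NLMS while no speech is present.

    Returns (coefficients, correlation value) as the pair to be stored.
    """
    coeffs = [0.0] * taps
    for i in range(len(voice)):
        # Most recent `taps` reference samples, zero-padded at the start.
        window = [ref[i - j] if i - j >= 0 else 0.0 for j in range(taps)]
        estimate = sum(c * x for c, x in zip(coeffs, window))
        err = voice[i] - estimate
        norm = sum(x * x for x in window) + eps
        coeffs = [c + mu * err * x / norm for c, x in zip(coeffs, window)]
    filtered = [
        sum(c * ref[i - j] for j, c in enumerate(coeffs) if i - j >= 0)
        for i in range(len(ref))
    ]
    return coeffs, correlation(voice, filtered)


# Synthetic noise-only recording: the voice microphone hears the reference
# noise through an assumed two-tap acoustic path.
rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(2000)]
voice = [0.8 * ref[i] + (0.3 * ref[i - 1] if i > 0 else 0.0)
         for i in range(len(ref))]
coeffs, corr = calc_filter_constant(voice, ref)
```

In this synthetic case the coefficients should approach the assumed path (0.8, 0.3, 0, 0), and the returned pair corresponds to what step 65 writes to the storage memory 47.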

【００４０】この記憶メモリ４７は、発話状態信号が発
話状態でないことを表している間、このようにフィルタ
演算回路４６から渡されるフィルタ定数と対応する相関
値との組を、図７に示すように、過去ｋ回分まで複数記
憶する。While the utterance state signal indicates that utterance is not in progress, the storage memory 47 stores the pairs of filter constant and corresponding correlation value passed from the filter operation circuit 46 in this way, holding up to the past k pairs as shown in FIG. 7.

【００４１】〔２−４．発話時の騒音除去〕このように
演算され記憶されたフィルタ定数は、音声認識の対象と
なる音声信号から騒音をキャンセル（除去）するのに使
われる。ここで、図８は、音声信号から騒音を除去する
ときのフィルタ定数選択回路４９及び騒音キャンセル回
路４１０の動作手順を示すフローチャートである。すな
わち、まず、フィルタ定数選択回路４９が、記憶メモリ
４７に記憶されている過去ｋ回分の相関値を参照して
（ステップ８１）、相関値同士を大小比較し（ステップ
８２）、そのなかで最大の相関値のものを検出する（ス
テップ８３）。[2-4. Noise removal during utterance] The filter constants calculated and stored in this way are used to cancel (remove) noise from the audio signal to be recognized. FIG. 8 is a flowchart showing the operation procedure of the filter constant selection circuit 49 and the noise cancellation circuit 410 when removing noise from the audio signal. First, the filter constant selection circuit 49 refers to the correlation values for the past k calculations stored in the storage memory 47 (step 81), compares them with one another (step 82), and finds the largest among them (step 83).

【００４２】そして、フィルタ定数選択回路４９は、そ
の最大の相関値に対応するフィルタ定数を記憶メモリ４
７から取得し（ステップ８４）、このフィルタ定数を騒
音キャンセル回路４１０に送出する（ステップ８５）。The filter constant selection circuit 49 then obtains the filter constant corresponding to that largest correlation value from the storage memory 47 (step 84) and sends it to the noise cancellation circuit 410 (step 85).
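
The storage of the past k pairs and the selection in steps 81 to 85 can be sketched in Python as a bounded buffer with a maximum search. The class and method names are hypothetical, and representing a filter constant as a list of numbers is an assumption made for illustration.

```python
from collections import deque


class FilterConstantMemory:
    """Illustrative stand-in for the storage memory 47 and selection circuit 49."""

    def __init__(self, k):
        # Only the most recent k (correlation, filter constant) pairs are kept.
        self._entries = deque(maxlen=k)

    def store(self, filter_constant, correlation):
        self._entries.append((correlation, filter_constant))

    def select_best(self):
        # Steps 81 to 84: compare the stored correlation values and return
        # the filter constant paired with the largest one.
        if not self._entries:
            raise LookupError("no filter constant has been stored yet")
        _, filter_constant = max(self._entries, key=lambda entry: entry[0])
        return filter_constant


memory = FilterConstantMemory(k=3)
memory.store([0.1], 0.95)
memory.store([0.2], 0.60)
memory.store([0.3], 0.80)
memory.store([0.4], 0.70)  # the oldest pair (correlation 0.95) is dropped
best = memory.select_best()
```

Because only the past k pairs are retained, an older pair with a higher correlation value no longer takes part in the comparison once it has been displaced.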

【００４３】このフィルタ定数を受け取った騒音キャン
セル回路４１０は、音声用マイク４１と騒音参照用マイ
ク４２から収録される２つの信号を入力として次のよう
な処理を行う。つまり、図２に示したように、音声用マ
イク４１からの音声信号と、騒音参照用マイクからの音
声信号をフィルタ定数選択回路４９から送られてきたフ
ィルタ定数の特性のフィルタを通過させた信号と、を使
用して減算処理を行い、それにより得られた信号を騒音
キャンセル後音声信号として、パターンマッチング回路
４１１へ出力する。On receiving this filter constant, the noise cancellation circuit 410 takes as input the two signals picked up by the voice microphone 41 and the noise reference microphone 42 and processes them as follows. As shown in FIG. 2, it performs subtraction using the audio signal from the voice microphone 41 and the signal obtained by passing the audio signal from the noise reference microphone 42 through a filter whose characteristic is given by the filter constant sent from the filter constant selection circuit 49, and outputs the resulting signal to the pattern matching circuit 411 as the noise-canceled audio signal.
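
The filter 111 and subtraction circuit 112 of FIG. 2 can be illustrated with the following Python sketch. It assumes that the filter is an FIR filter whose coefficients are the selected filter constant; the document does not fix the filter form, and the function name `cancel_noise` and the sample values are hypothetical.

```python
def cancel_noise(voice, ref, coeffs):
    """Subtract the filtered reference signal from the voice signal.

    voice: samples from the voice microphone (speech plus noise).
    ref: samples from the noise reference microphone.
    coeffs: FIR coefficients standing in for the selected filter constant.
    """
    cleaned = []
    for i in range(len(voice)):
        # Filter 111: estimate the noise that reached the voice microphone.
        noise_estimate = sum(
            c * ref[i - j] for j, c in enumerate(coeffs) if i - j >= 0
        )
        # Subtraction circuit 112: remove the estimate from the voice signal.
        cleaned.append(voice[i] - noise_estimate)
    return cleaned


speech = [0.0, 1.0, 0.0, -1.0, 0.0]
noise_ref = [0.5, -0.2, 0.4, 0.1, -0.3]
coeffs = [0.8, 0.3]  # assumed to match the acoustic path exactly
voice = [
    s + 0.8 * noise_ref[i] + (0.3 * noise_ref[i - 1] if i > 0 else 0.0)
    for i, s in enumerate(speech)
]
cleaned = cancel_noise(voice, noise_ref, coeffs)
```

When the coefficients match the path from the noise source to the voice microphone, as assumed here, the output recovers the speech samples up to floating-point rounding; in practice the match is only approximate and some residual noise remains.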

【００４４】そして、パターンマッチング回路４１１
は、このように入力される騒音キャンセル後音声信号の
特徴と、認識辞書４１２に記憶されている音声参照デー
タとを比較照合し、該当するものがあれば、認識結果出
力回路４１３を介して認識結果を出力する。The pattern matching circuit 411 then compares the features of the noise-canceled audio signal input in this way against the voice reference data stored in the recognition dictionary 412 and, if there is a match, outputs the recognition result via the recognition result output circuit 413.

【００４５】〔３．実施形態の効果〕以上のように、こ
の実施形態では、音声用マイク４１と騒音参照用マイク
４２という２つのマイクロフォンを使い、使用者が音声
入力を行っていない、すなわち認識動作が行われていな
い状態を検出し、その状態の間に、上記２つのマイク間
の入力波形の相関を検出し、騒音参照用マイク４２に設
けたフィルタの定数を、２つのマイクの間でこの相関が
高くなるように演算し、一定時間以上たてば設定を中止
し、フィルタ定数を決定させる。このような動作を一定
時間間隔で実行し、各動作時のフィルタ定数を別々に記
憶させておく。そして、実際に認識動作が行われる際に
は、使用者の音声認識動作の開始を検出した後、これま
で記憶されているフィルタ定数の設定値の中で、最も相
関の高かったをものを選択し、そのフィルタ定数から定
まる特性のフィルタを介した騒音用マイクからの音声信
号と、音声用マイクからの音声信号との差分を音声認識
回路の入力とし、音声認識処理を行う。[3. Effects of the embodiment] As described above, this embodiment uses two microphones, the voice microphone 41 and the noise reference microphone 42, and detects the state in which the user is not performing voice input, that is, in which no recognition operation is being performed. During that state, the correlation between the input waveforms of the two microphones is detected, and the constant of the filter provided for the noise reference microphone 42 is calculated so that this correlation between the two microphones becomes higher; after a fixed time has elapsed, the adjustment is stopped and the filter constant is fixed. This operation is executed at fixed time intervals, and the filter constant from each run is stored separately. When a recognition operation is actually performed, after the start of the user's recognition operation is detected, the filter constant with the highest correlation is selected from among those stored so far, and the difference between the audio signal from the noise reference microphone 42, passed through a filter with the characteristic determined by that filter constant, and the audio signal from the voice microphone 41 is input to the speech recognition circuit for recognition processing.

【００４６】この結果、騒音キャンセル処理を施した信
号を認識パターンマッチング処理の対象とするため、騒
音環境下でも認識率を低下させず認識動作を行わせるこ
とができる。また、認識処理を行っていない、いわば空
き時間を見計らって騒音キャンセルフィルタの定数の設
定を行う。このため、実際の認識処理中に複雑な騒音キ
ャンセルのための動作や計算などを同時に行う必要がな
い。これによって、リアルタイムで行わなければならな
い演算量を削減でき、特別に動作スピードの高速な回路
を使用する必要がないので音声認識システムの小型化や
費用低減なども容易になる。As a result, because the signal subjected to noise cancellation is the one used for recognition pattern matching, recognition can be performed without lowering the recognition rate even in a noisy environment. In addition, the constants of the noise cancellation filter are set during what is, so to speak, idle time in which no recognition processing is being performed. There is therefore no need to carry out complicated noise cancellation operations and calculations at the same time as the actual recognition processing. This reduces the amount of computation that must be performed in real time and removes the need for especially high-speed circuits, which makes it easier to reduce the size and cost of the speech recognition system.

【００４７】〔４．他の実施の形態〕なお、この発明は
上に述べた実施形態に限定されるものではなく、次に例
示するような他の実施の形態も含むものである。例え
ば、この発明の音声認識は、自動車などに搭載するいわ
ゆるカーナビゲーションシステムに命令を入力するのに
使えるだけでなく、コンピュータなど他の電子機器に命
令を入力するといった他の目的に使うこともできる。[4. Other embodiments] The present invention is not limited to the embodiment described above and also includes other embodiments such as those exemplified below. For example, the speech recognition of the present invention can be used not only for inputting commands to a so-called car navigation system mounted in an automobile or the like, but also for other purposes such as inputting commands to other electronic devices such as computers.

【００４８】また、フィルタリング処理の具体的な手
法、フィルタ定数の具体的な形式、フィルタ定数を演算
する具体的な手法などは自由であり、また、認識辞書の
具体的な形式や内容、認識辞書を使った音声認識の手法
も自由に選択することができる。また、上に述べた実施
形態で示した時間ｍや時間ｎの具体的な値や、記憶メモ
リにいくつのフィルタ定数を記憶しておくかなどの具体
的事項も自由に決めることができる。The specific method of filtering, the specific format of the filter constant, the specific method of calculating the filter constant, and the like may be chosen freely, as may the specific format and contents of the recognition dictionary and the speech recognition method that uses it. Specific matters such as the actual values of time m and time n in the embodiment described above and the number of filter constants kept in the storage memory may likewise be determined freely.

【００４９】[0049]

【発明の効果】以上のように、本発明によれば、、周囲
に騒音の存在する環境においても、少ない負荷で正しく
音声認識を行うことが可能となるので、装置の小型化や
低廉化を図ることが容易になる。As described above, according to the present invention, speech recognition can be performed correctly with a small load even in an environment with ambient noise, which makes it easier to reduce the size and cost of the apparatus.

[Brief description of the drawings]

【図１】本発明の実施形態の構成を示す機能ブロック
図。FIG. 1 is a functional block diagram showing a configuration of an embodiment of the present invention.

【図２】本発明の実施形態における騒音除去の原理を示
す概念図。FIG. 2 is a conceptual diagram illustrating the principle of noise removal according to the embodiment of the present invention.

【図３】本発明の実施形態における発話状態検出回路の
動作手順を示すフローチャート。FIG. 3 is a flowchart showing an operation procedure of an utterance state detection circuit according to the embodiment of the present invention.

【図４】本発明の実施形態における演算動作制御回路の
動作手順を示すフローチャート。FIG. 4 is a flowchart showing an operation procedure of an arithmetic operation control circuit according to the embodiment of the present invention.

【図５】本発明の実施形態におけるフィルタ演算回路の
動作タイミングを示すタイミングチャート。FIG. 5 is a timing chart showing operation timings of the filter operation circuit according to the embodiment of the present invention.

【図６】本発明の実施形態におけるフィルタ演算回路の
動作手順を示すフローチャート。FIG. 6 is a flowchart illustrating an operation procedure of the filter operation circuit according to the embodiment of the present invention.

【図７】本発明の実施形態におけるフィルタ定数と相関
値との組み合わせを示す概念図。FIG. 7 is a conceptual diagram showing a combination of a filter constant and a correlation value according to the embodiment of the present invention.

【図８】本発明の実施形態におけるフィルタ定数選択回
路及び騒音キャンセル回路の動作手順を示すフローチャ
ート。FIG. 8 is a flowchart illustrating an operation procedure of a filter constant selection circuit and a noise cancellation circuit according to the embodiment of the present invention.

【図９】従来技術の一構成例を示す機能ブロック図。FIG. 9 is a functional block diagram showing a configuration example of a conventional technique.

【図１０】従来技術の別の構成例を示す機能ブロック
図。FIG. 10 is a functional block diagram showing another configuration example of the related art.

[Explanation of symbols]

４１…音声用マイク４２…騒音参照用マイク４３…音声入力回路４４…発話開始スイッチ４５…発話状態検出回路４６…フィルタ演算回路４７…フィルタ定数の演算結果用記憶メモリ４８…演算動作制御回路４９…フィルタ定数選択回路４１０…騒音キャンセル回路４１１…パターンマッチング回路４１２…認識辞書４１３…認識結果出力回路１１１…フィルタ１１２…減算回路
41: voice microphone
42: noise reference microphone
43: voice input circuit
44: utterance start switch
45: utterance state detection circuit
46: filter operation circuit
47: storage memory for filter constant calculation results
48: arithmetic operation control circuit
49: filter constant selection circuit
410: noise cancellation circuit
411: pattern matching circuit
412: recognition dictionary
413: recognition result output circuit
111: filter
112: subtraction circuit

フロントページの続き (72)発明者柴崎光陽東京都文京区白山５丁目35番２号クラリオン株式会社内 (72)発明者木佐貫誠東京都文京区白山５丁目35番２号クラリオン株式会社内Ｆターム(参考） 5D015 CC02 DD02 EE05 9A001 BB02 EE05 HH17 JJ77 KK54 Continuation of the front page (72) Inventor: Mitsuaki Shibazaki, c/o Clarion Co., Ltd., 5-35-2 Hakusan, Bunkyo-ku, Tokyo (72) Inventor: Makoto Kisanuki, c/o Clarion Co., Ltd., 5-35-2 Hakusan, Bunkyo-ku, Tokyo F-term (reference) 5D015 CC02 DD02 EE05 9A001 BB02 EE05 HH17 JJ77 KK54

Claims

[Claims]

1. A speech recognition system comprising: first and second microphones; means for calculating a filter constant for filtering processing based on the audio signals from the microphones; means for removing a noise component by performing, on the audio signal from the first microphone, filtering processing based on the audio signal from the second microphone and on a characteristic based on the calculated filter constant; means for recognizing an uttered phrase based on the audio signal from which the noise has been removed; means for determining whether or not voice is being input from a microphone; and means for controlling the filter constant to be calculated while the voice is not being input.

2. A speech recognition system comprising: a first microphone for inputting uttered voice; a second microphone for inputting the same noise as the noise mixed into the first microphone; a switch to be turned on when voice is input; recognition means for recognizing a phrase from a given audio signal and outputting an end signal when the recognition is completed; means for determining whether or not utterance is in progress based on the state of the switch and the end signal; calculation means for adaptively calculating a filter constant so that the correlation value between the audio signals of the microphones becomes higher; a memory for storing a plurality of combinations of a calculated filter constant and the correlation value between the two audio signals at the time that filter constant was calculated; control means for operating the calculation means at predetermined intervals only while utterance is not in progress; means for selecting the filter constant corresponding to the largest of the correlation values stored in the memory; and means for outputting a noise-canceled audio signal by performing filtering processing based on the selected filter constant and the two audio signals from the microphones, wherein the recognition means is configured to recognize a phrase based on the noise-canceled audio signal.

3. A speech recognition method comprising the steps of: calculating a filter constant for filtering processing based on the audio signals from first and second microphones; removing a noise component by performing, on the audio signal from the first microphone, filtering processing based on the audio signal from the second microphone and on a characteristic based on the calculated filter constant; recognizing an uttered phrase based on the audio signal from which the noise has been removed; determining whether or not voice is being input from a microphone; and controlling the filter constant to be calculated while the voice is not being input.

4. A speech recognition method using a first microphone for inputting uttered voice and a second microphone for inputting the same noise as the noise mixed into the first microphone, the method comprising: a recognition step of recognizing a phrase from a given audio signal and outputting an end signal when the recognition is completed; a step of determining whether or not utterance is in progress based on the end signal and the state of a switch to be turned on when voice is input; a step of adaptively calculating a filter constant so that the correlation value between the audio signals of the microphones becomes higher; a step of storing a plurality of combinations of a calculated filter constant and the correlation value between the two audio signals at the time that filter constant was calculated; a step of controlling the calculation to be performed at predetermined intervals only while utterance is not in progress; a step of selecting the filter constant corresponding to the largest of the stored correlation values; and a step of outputting a noise-canceled audio signal by performing filtering processing based on the selected filter constant and the two audio signals from the microphones, wherein the recognition step recognizes a phrase based on the noise-canceled audio signal.

5. A recording medium on which is recorded speech recognition software for recognizing phrases in speech using a computer, wherein the software causes the computer to: calculate a filter constant for filtering processing based on the audio signals from first and second microphones; remove a noise component by performing, on the audio signal from the first microphone, filtering processing based on the audio signal from the second microphone and on a characteristic based on the calculated filter constant; recognize an uttered phrase based on the audio signal from which the noise has been removed; determine whether or not voice is being input from a microphone; and calculate the filter constant while the voice is not being input.

6. A recording medium on which is recorded speech recognition software for recognizing phrases in speech using a computer, wherein the software causes the computer, using a first microphone for inputting uttered voice and a second microphone for inputting the same noise as the noise mixed into the first microphone, to: recognize a phrase from a given audio signal and output an end signal when the recognition is completed; determine whether or not utterance is in progress based on the end signal and the state of a switch to be turned on when voice is input; adaptively calculate a filter constant so that the correlation value between the audio signals of the microphones becomes higher; store a plurality of combinations of a calculated filter constant and the correlation value between the two audio signals at the time that filter constant was calculated; perform the calculation of the filter constant at predetermined intervals only while utterance is not in progress; select the filter constant corresponding to the largest of the stored correlation values; and output a noise-canceled audio signal by performing filtering processing based on the selected filter constant and the two audio signals from the microphones, wherein, in the recognition, the phrase is recognized based on the noise-canceled audio signal.