JPH01200294A

JPH01200294A - Sound recognizing device

Info

Publication number: JPH01200294A
Application number: JP63024643A
Authority: JP
Inventors: Yasuyuki Yamamoto; 靖之山本
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1988-02-04
Filing date: 1988-02-04
Publication date: 1989-08-11

Abstract

PURPOSE: To prevent erroneous recognition caused by environmental noise. The environmental noise is collected with a main microphone and an auxiliary microphone, a noise correction coefficient is calculated from the frequency-divided noise, and this coefficient is then used to obtain the speech data for recognition processing. CONSTITUTION: The environmental noise is collected by a main microphone 1 and an auxiliary microphone 11 and divided into frequency bands by BPFs 3 and 13. A correction coefficient for noise reduction is calculated by a correction coefficient calculating means 24 and stored in a memory 25. Next, when a speech signal is collected by the main microphone 1, the speech signal, which includes the environmental noise, is frequency-divided and supplied to a speech data calculating means 26. The environmental noise present at that time is collected by the auxiliary microphone 11, frequency-divided, and also supplied to the speech data calculating means 26. The correction coefficient stored just before the word was uttered is read from the memory 25, and the speech data calculating means 26 calculates the speech data used for recognition processing. Erroneous recognition caused by environmental noise in the speech recognizing part 27 is thereby prevented.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a speech recognition device that uses, in addition to a main microphone, an auxiliary microphone that mainly collects environmental noise.

[Summary of the invention]

The present invention relates to a speech recognition device that performs speech recognition by comparing acoustic parameters, obtained by frequency-dividing and normalizing a speech signal collected by a main microphone, with preset standard patterns. A noise-removal correction coefficient is calculated by comparing frequency-divided data of the environmental noise collected by the main microphone with frequency-divided data of the environmental noise collected by the auxiliary microphone. Noise-free speech data are then calculated from this correction coefficient, the speech signal (including environmental noise) collected by the main microphone, and at least the environmental noise collected by the auxiliary microphone, and speech recognition is performed using these speech data. Misrecognition caused by environmental noise is thereby prevented.

[Conventional technology]

Various speech recognition devices have been proposed. In one known method, a directional microphone is placed near the speaker's mouth and an omnidirectional microphone is placed away from the mouth but near the speaker. The outputs of both microphones are amplified, their difference is taken by a differential amplifier, and the difference output is fed to a speech identification device, which identifies a spoken word by comparing it with pre-stored features of specific words (Japanese Unexamined Patent Publication No. 51-62604).

In another method, the time average of the frequency spectrum of the input noise before speech input is calculated in advance and subtracted from the frequency spectrum of the speech input signal, thereby extracting the frequency spectrum of the input speech itself (Japanese Unexamined Patent Publication No. 55-33126).

[Problem to be Solved by the Invention]

However, in a conventional device such as that described in Japanese Unexamined Patent Publication No. 51-62604, it is difficult to detect the phase shift between the noise entering the directional microphone and the noise entering the omnidirectional microphone. The noise therefore cannot be cancelled stably, and it causes misrecognition.

Furthermore, a conventional device such as that described in Japanese Unexamined Patent Publication No. 55-33126 is effective against stationary noise, but it cannot cope with noise whose character changes, such as intermittent noise or readily fluctuating noise like human conversation or music, and such noise causes misrecognition.

The present invention has been made in view of these points, and its object is to provide a speech recognition device that can prevent misrecognition caused by environmental noise.

[Means to solve the problem]

The present invention provides a speech recognition device that performs speech recognition by frequency-dividing (3) and normalizing (27a) a speech signal collected by a main microphone (1) and comparing (27b) the resulting acoustic parameters with preset standard patterns (27c). The device comprises: an auxiliary microphone (11) that collects at least environmental noise; correction coefficient calculation means (24, 25) that calculates a noise-removal correction coefficient by comparing frequency-divided data of the environmental noise collected by the main microphone (1) with frequency-divided data of the environmental noise collected by the auxiliary microphone (11); and speech data calculation means (26) that calculates noise-free speech data from the speech signal containing environmental noise collected by the main microphone (1), at least the environmental noise collected by the auxiliary microphone (11), and the correction coefficient. Speech recognition (27) is performed using these speech data.

[Operation]

First, while no word is being uttered, the main microphone (1) and the auxiliary microphone (11) collect the environmental noise, which is divided into frequency bands by the bandpass filter banks (3, 13). From these data the correction coefficient calculation means (24, 25) calculates a noise-removal correction coefficient An for each frequency band and stores it. The correction coefficient An is continually updated so that it always holds the optimum value. Next, when a word is uttered and the speech signal is collected by the main microphone (1), this speech signal, which contains environmental noise, is frequency-divided as described above and supplied to the speech data calculation means (26). At the same time, at least the environmental noise present at that moment is collected by the auxiliary microphone (11), frequency-divided, and supplied to the speech data calculation means (26). The correction coefficient from immediately before the word was uttered is read from the memory (25), and the speech data calculation means (26) calculates the speech data for recognition processing. These speech data are compared with the standard patterns to perform speech recognition. Because the environmental noise is effectively cancelled on the frequency axis, misrecognition caused by environmental noise is reliably prevented, without the phase-shift problems of conventional cancellation on the time axis.

[Example]

An embodiment of the present invention is described in detail below with reference to the accompanying drawing.

The figure shows the circuit configuration of this embodiment. In the figure, (1) is a main microphone, mainly for collecting the speech signal; (2) is an amplifier that amplifies the speech signal from the main microphone (1); and (3) is a bandpass filter bank consisting of, for example, 16 channels of bandpass filters (3₁) to (3₁₆). Its overall frequency band is, for example, 200 Hz to 6 kHz, and this band is allocated to the individual bandpass filters so that they are equally spaced. (4) is a multiplexer that switches between the outputs of the bandpass filters (3₁) to (3₁₆) in a time-division manner, (5) is a low-pass filter, (6) is a sample-and-hold circuit, and (7) is an A/D converter.

Likewise, (11) is an auxiliary microphone, mainly for collecting environmental noise; (12) is an amplifier that amplifies the environmental noise from the auxiliary microphone (11); and (13) is a bandpass filter bank that, like the bandpass filter bank (3), consists of 16 channels of bandpass filters (13₁) to (13₁₆) covering the same frequency band, allocated to the individual filters at equal intervals. (14) is a multiplexer, (15) is a low-pass filter, (16) is a sample-and-hold circuit, and (17) is an A/D converter.

(20) is a switch circuit with ganged switches (20a) and (20b): the switch (20a) is supplied with digital data from the A/D converter (7), and the switch (20b) with digital data from the A/D converter (17). (21) is a switch circuit with ganged switches (21a) and (21b): the switch (21a) is supplied with digital data from the A/D converter (7), and the switch (21b) with digital data from the A/D converter (17).

The switch circuits (20) and (21) are controlled by the output signal of a level detection circuit (22), which has a predetermined threshold level Th. When the level of the signal from the main microphone (1) exceeds Th, the level detection circuit (22) outputs an ON signal. This ON signal closes the switches (21a) and (21b) of the switch circuit (21), turning them ON; it is also inverted by an inverter (23) into an OFF signal, which opens the switches (20a) and (20b) of the switch circuit (20), turning them OFF. Conversely, when the level of the speech signal from the main microphone (1) is below Th, the level detection circuit (22) outputs an OFF signal, which opens the switches (21a) and (21b), turning them OFF; the inverter (23) inverts this OFF signal into an ON signal, which closes the switches (20a) and (20b), turning them ON.

In short, when the level of the signal from the main microphone (1) exceeds the threshold level Th, the switches (21a) and (21b) are ON and the switches (20a) and (20b) are OFF; when it is below Th, the switches (21a) and (21b) are OFF and the switches (20a) and (20b) are ON.

The threshold level Th of the level detection circuit (22) is set lower than the level of a speech signal at normal speaking volume but higher than the environmental noise.

Each item of digital data that passes through the switches (20a) and (20b) of the switch circuit (20) is supplied to a correction coefficient calculation circuit (24). (While these switches are ON, no word is being uttered and only environmental noise is present, so the digital data at this time consist solely of noise components.) The correction coefficient calculation circuit (24) calculates the noise-removal correction coefficient An according to the following equation.

[Equation (1) is not legible in the source text.]

In equation (1), n is the sampling count of the A/D converters (7) and (17); an is the data from the main microphone (1) (here, a noise component); bn is the data from the auxiliary microphone (11) (here, a noise component); N and M are constants chosen appropriately according to the nature of the environmental noise and the sampling frequency; and An-1 is the correction coefficient from one sampling earlier.

When there is no speech signal from the main microphone (1), that is, while the switches (20a) and (20b) of the switch circuit (20) are closed and ON, the correction coefficient calculation circuit (24) calculates and updates the correction coefficient An sequentially for each frequency band. The coefficient An calculated by the circuit (24) is stored in the memory (25) for each frequency band.

The correction coefficient An is set so that, under normal conditions (that is, while no word is being uttered), the recognition processing data Cn described below are as close to zero as possible.

Each item of digital data that passes through the switches (21a) and (21b) of the switch circuit (21) is supplied to a difference detection circuit (26), which serves as the speech data calculation means. (While these switches are ON, a word is being uttered and a speech signal is present at least at the main microphone (1), so the digital data from the A/D converter (7) consist of the speech signal plus noise components, while the digital data from the A/D converter (17) consist almost entirely of noise components.) The correction coefficients An stored in the memory (25) for each frequency band are also read out and input to the difference detection circuit (26). The difference detection circuit (26) then calculates the recognition processing data Cn from the supplied digital data and the correction coefficient An according to the following equation.

$$C_n = a_n \cdot A_n - b_n \qquad (2)$$

In equation (2), an is here data consisting of the speech signal and noise components, and bn is data consisting almost entirely of noise components. Because the correction coefficient An is set, as described above, so that the recognition processing data Cn are as close to zero as possible while no word is being uttered, the noise component contained in an and the noise component contained in bn effectively cancel, and only the speech signal contained in an is extracted as the recognition processing data Cn.

The recognition processing data Cn from the difference detection circuit (26) are supplied to the sound source information normalizer (27a) of the speech recognition unit (27), where they are normalized and extracted as acoustic parameters. These acoustic parameters are supplied to a pattern matching circuit (27b). Before recognition, the analysis results of each word to be recognized, spoken by the same speaker, are registered in advance as standard patterns in a standard pattern memory (27c). During recognition, the standard pattern of each candidate word is read from the memory (27c) and compared in the pattern matching circuit (27b) with the acoustic parameters corresponding to the input speech pattern. The candidate word that is closest, that is, at the smallest distance, is selected and output to the output terminal (28) as the recognition result for the input speech.

Next, the operation of the circuit in the figure is described. Under normal conditions, when no word is being uttered, the main microphone (1) and the auxiliary microphone (11) collect only environmental noise, so the level detection circuit (22) outputs an OFF signal. This opens the switches (21a) and (21b) of the switch circuit (21), turning them OFF, while the ON signal obtained by inverting the OFF signal in the inverter (23) closes the switches (20a) and (20b) of the switch circuit (20), turning them ON.

The noise collected by the microphones (1) and (11) is then separated into frequency bands by the bandpass filter banks (3) and (13), taken out in a time-division manner by the multiplexers (4) and (14), and supplied to the A/D converters (7) and (17). The digital data (noise components) output by the A/D converters (7) and (17) are supplied to the correction coefficient calculation circuit (24), which sequentially calculates the correction coefficient An for each frequency band according to equation (1) and stores it in the memory (25). This continues until a word is uttered, that is, at least until the main microphone (1) begins to collect a speech signal. New coefficients An are successively corrected, band by band, so that they keep the optimum value that brings the recognition processing data Cn as close to zero as possible, and are stored in the memory (25).

When a word is uttered and at least the main microphone (1) collects a speech signal, its level exceeds the threshold level Th of the level detection circuit (22), so the level detection circuit (22) outputs an ON signal. This closes the switches (21a) and (21b) of the switch circuit (21), turning them ON, while the OFF signal obtained by inverting the ON signal in the inverter (23) opens the switches (20a) and (20b) of the switch circuit (20), turning them OFF. The memory (25) therefore holds, for each frequency band (16 in this case), the correction coefficient An best suited to cancelling the noise components present immediately before the word was uttered.

Meanwhile, the speech signal containing noise components collected by the main microphone (1), and the mainly environmental noise collected by the auxiliary microphone (11), undergo the same signal processing as described above and are supplied to the difference detection circuit (26). The difference detection circuit (26) reads the correction coefficient An for the corresponding frequency band from the memory (25) and calculates the recognition processing data Cn according to equation (2). Because An is set so that Cn is as close to zero as possible under normal conditions, when no word is uttered, the noise component contained in the speech signal from the main microphone (1) is effectively cancelled by the noise component from the auxiliary microphone (11), and only the speech signal is extracted as the true recognition processing data Cn.

The recognition processing data Cn are supplied to the sound source information normalizer (27a) of the speech recognition unit (27), normalized, and extracted as acoustic parameters. The pattern matching circuit (27b) compares these acoustic parameters with the standard patterns in the memory (27c), and the result is output to the output terminal (28) as the recognition result for the input speech.

Thus, in this embodiment, when the data from the auxiliary microphone (11) are subtracted from the data from the main microphone (1), a separate noise-removal correction coefficient is prepared for each frequency band and continually corrected so that it always holds the optimum value, so the environmental noise can be cancelled reliably. In the conventional approach, cancelling environmental noise was difficult because the phase shift on the time axis is hard to predict. In this embodiment, by contrast, the signals are frequency-divided so that the data contain no phase information, and the environmental noise is effectively cancelled on the frequency axis. The cancellation is therefore unaffected by phase shifts and reliable.

In addition, because the environmental noise is cancelled for each frequency band on the frequency axis, the cancellation is not affected by differences in characteristics between the main microphone (1) and the auxiliary microphone (11), or by location-dependent differences in frequency distribution and phase. Because the noise-removal correction coefficients are corrected until immediately before the speech signal is input, the cancellation is also largely insensitive to the magnitude, position, and sound quality of the noise source. Furthermore, the noise generated by the amplifiers (2) and (12) can also be cancelled.

[Effect of the Invention]

As described above, according to the present invention, environmental noise is collected with a main microphone and an auxiliary microphone and frequency-divided to calculate a noise-removal correction coefficient, and the speech data for recognition processing are obtained from this correction coefficient, the speech signal containing environmental noise from the main microphone, and at least the environmental noise from the auxiliary microphone. The environmental noise is therefore effectively cancelled on the frequency axis without any influence from phase shifts, and misrecognition caused by environmental noise is reliably prevented.

[Brief explanation of the drawing]

The figure is a circuit configuration diagram showing an embodiment of the present invention. (1) is a main microphone, (3) and (13) are bandpass filter banks, (7) and (17) are A/D converters, (11) is an auxiliary microphone, (20) and (21) are switch circuits, (22) is a level detection circuit, (24) is a correction coefficient calculation circuit, (25) is a memory, (26) is a difference detection circuit, and (27) is a speech recognition unit.

Claims

[Scope of Claims] A speech recognition device that performs speech recognition by comparing acoustic parameters, obtained by frequency-dividing and normalizing a speech signal collected by a main microphone, with preset standard patterns, the device comprising: an auxiliary microphone that mainly collects environmental noise; correction coefficient calculation means that calculates a noise-removal correction coefficient by comparing data obtained by frequency-dividing the environmental noise collected by the main microphone with data obtained by frequency-dividing the environmental noise collected by the auxiliary microphone; and speech data calculation means that calculates noise-free speech data from the speech signal containing environmental noise collected by the main microphone, at least the environmental noise collected by the auxiliary microphone, and the correction coefficient; wherein speech recognition is performed using the speech data.