JPH075895A

JPH075895A - Device for recognition and method for recognizing voice in noisy environment

Info

Publication number: JPH075895A
Application number: JP6102164A
Authority: JP
Inventors: Hirofumi Yajima; 弘文矢島
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 1993-04-20
Filing date: 1994-04-15
Publication date: 1995-01-10
Anticipated expiration: 2019-05-17
Also published as: JP3526911B2

Abstract

PURPOSE: To provide a voice recognition device that accurately eliminates rapidly changing noise components and improves the voice recognition rate. CONSTITUTION: The device comprises a microphone 11 that generates a main signal ma in which a voice signal sa and a noise component oa · g originating from an audio device 16 are mixed; an amplifier 18 that generates a reference signal ra based on the noise component; a voice section discrimination means that discriminates whether the main signal is in a voice section, in which it contains the voice signal, or in a non-voice section, in which it does not; a compensation coefficient updating means that generates and updates a compensation coefficient based on the main signal ma in the non-voice section; a CPU 15 having a computing means that, in the voice section, subtracts from the main signal ma the value obtained by multiplying the reference signal ra by the compensation coefficient; and a voice recognition part 21 that collates the computation result obtained from the computing means with comparison voice signals B registered in a registration dictionary 22 and performs voice recognition.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech recognition apparatus, and more particularly to a speech recognition apparatus that recognizes speech in a noisy environment and to a speech recognition method for a noisy environment.

[0002]

[Prior Art] To prevent the recognition rate from dropping in such a noisy environment, conventional speech recognition apparatuses have used the LMS method, the spectral subtraction method (hereinafter the "S.S method"), and the like. The LMS method uses an adaptive filter to remove the noise component from the main signal, which is the microphone input signal in which the uttered speech and the noise component are mixed, using a known noise signal as a reference signal. The S.S method removes the noise component contained in the uttered speech by treating it as stationary noise.

[0003] FIG. 21 is a block diagram of a conventional speech recognition apparatus to which the S.S method is applied. In this figure, reference numeral 1 denotes a microphone that receives the voice of a speaker (not shown) together with sound produced by an audio signal, that is, music noise from another source such as an audio device, and outputs the result as an electrical main signal ma. Reference numeral 2 denotes an amplifier that amplifies the main signal ma. Reference numeral 3 denotes a filter bank that divides the amplified main signal ma by frequency into a plurality of channel signals and selectively outputs one of them.

[0004] The filter bank 3 comprises an all-pass filter 3a that passes the entire band of the main signal ma as a channel signal m0; band-pass filter groups 3b and 3c that divide the main signal ma into predetermined bands and output a plurality (n) of channel signals m1, m2, ..., mn; a multiplexer 3d that selects one channel signal m(CH) (CH = 0, 1, 2, ..., n) from the channel signals m0, m1, m2, ..., mn; and an A/D converter 3e that converts the selected channel signal m(CH) into a digital signal and outputs it as the channel signal M(CH).

[0005] Reference numeral 4 denotes a CPU that performs speech recognition on the channel signal M(CH), the main voice data output from the filter bank 3. Although not shown in the figure, it comprises an arithmetic unit, a ROM for storing programs, a RAM for storing data, and the like. Reference numeral 5 denotes a registration dictionary that stores comparison voice data registered in advance and supplies that data to the CPU 4 during speech analysis.

[0006] Next, the operation of the above conventional speech recognition apparatus will be described. The main signal ma, which contains the audio noise component, passes through the filter bank 3, is converted into a digital signal channel by channel, and is taken into the CPU 4 as the main voice data to be recognized. The audio noise component is then removed as a known noise component, and the result is pattern-matched against the comparison voice data registered in advance in the registration dictionary to perform speech recognition.

[0007]

[Problems to Be Solved by the Invention] In the above conventional speech recognition apparatus, however, the LMS method (that is, the adaptive filter method) cannot follow very rapid changes in non-stationary noise such as an audio noise component, and even when the noise does not change rapidly, the filter takes a long time to converge. In addition, because the method requires a processor capable of high-speed arithmetic, such as a DSP, it raises the cost of the speech recognition apparatus.

[0008] The S.S method likewise cannot accurately remove noise that changes rapidly, such as an audio noise component, and therefore cannot raise the recognition rate.

[0009] The speech recognition apparatus according to the present invention solves these conventional problems. Its object is to provide excellent speech recognition that accurately removes rapidly changing noise components and improves the speech recognition rate, without requiring an expensive high-speed processor such as a DSP.

[0010] A further object of the speech recognition method according to the present invention is to enhance the effect of the adaptive S.S method by using fuzzy inference.

[0011]

[Means for Solving the Problems] To achieve the above object, the speech recognition apparatus according to the present invention removes the noise component from a main signal, in which a noise component is mixed with the voice signal from a speaker, and collates the result with comparison voice signals registered in advance to recognize the voice signal. The apparatus comprises: means for generating a reference signal based on the noise component; voice section discriminating means for discriminating whether the main signal is in a voice section, in which it contains the voice signal, or in a non-voice section, in which it does not; correction coefficient updating means for generating and updating a correction coefficient based on the main signal in the non-voice section; arithmetic means for subtracting from the main signal, in the voice section, the value obtained by multiplying the reference signal by the correction coefficient; and recognition means for performing speech recognition by collating the result obtained from the arithmetic means with the comparison voice signals.

[0012] To achieve the above object, the speech recognition method according to the present invention is a method for a noisy environment that removes the noise component from an input signal, in which a noise component is mixed with the voice signal component from a speaker, and recognizes the speaker's voice. The method detects a voice section from the input signal by fuzzy inference; determines whether the noise component is mixed into that voice section; updates, according to the determination result, a correction coefficient used to predict the voice signal component; adjusts the updated correction coefficient; performs subtraction based on the adjusted correction coefficient; and performs speech recognition using the subtraction result as the voice signal component.

[0013] A further speech recognition method for a noisy environment removes the noise components from an input signal, in which an acoustic noise component and a running noise component are mixed with the voice signal component from a speaker, and recognizes the speaker's voice. The method detects a voice section from the input signal; determines by fuzzy inference whether the running noise component is mixed into that voice section; updates, according to the determination result, a correction coefficient used to predict the voice signal component; adjusts the updated correction coefficient; performs subtraction based on the adjusted correction coefficient; and performs speech recognition using the subtraction result as the voice signal component.

[0014] Still another speech recognition method removes the noise component from an input signal, in which a noise component is mixed with the voice signal component from a speaker, and recognizes the speaker's voice. The method detects a voice section from the input signal; determines whether the noise component is mixed into that voice section; updates, according to the determination result, a correction coefficient used to predict the voice signal component; adjusts the updated correction coefficient by fuzzy inference; performs subtraction based on the adjusted correction coefficient; and performs speech recognition using the subtraction result as the voice signal component.

[0015]

[Operation] Accordingly, when the speech recognition apparatus according to the present invention subtracts from the main signal, in the voice section, the value obtained by multiplying the reference signal by the correction coefficient, it does so while keeping the correction coefficient updated. It can therefore accurately remove rapidly changing noise components and improve the speech recognition rate.

[0016] In the speech recognition method according to the present invention, the voice trigger level, the running noise determination level, and the adjustment amount of the correction coefficient are each determined by fuzzy inference, which further enhances the effect of the adaptive S.S method.

[0017]

[Embodiments] Embodiments of the first to seventh inventions will be described below in detail with reference to the drawings.

[0018] 1. An embodiment of the first invention will now be described.

[0019] FIG. 1 is a block diagram of a speech recognition apparatus according to the first embodiment of the present invention. In FIG. 1, reference numeral 11 denotes a microphone that receives the voice of a speaker (not shown) together with sound produced by an audio signal (described later) from another source such as an audio device, and outputs the result as an electrical main signal ma. Reference numeral 12 denotes an amplifier that amplifies the main signal ma. Reference numeral 13 denotes a filter bank that divides the amplified main signal ma by frequency into a plurality of channel signals and selectively outputs one of them.

[0020] The filter bank 13 comprises an all-pass filter 13a that passes the entire band of the main signal ma as a channel signal m0; band-pass filter groups 13b and 13c that divide the main signal ma into predetermined bands and output a plurality (n) of channel signals m1, m2, ..., mn; a multiplexer 13d that selects one channel signal m(CH) (CH = 0, 1, 2, ..., n) from the channel signals m0, m1, m2, ..., mn; and an A/D converter 13e that converts the selected channel signal m(CH) into a digital signal and outputs it as the channel signal M(CH).

[0021] Reference numeral 14 denotes a latch circuit that holds the channel signal M(CH), the main voice data output from the filter bank 13. Reference numeral 15 denotes a CPU serving as the control means. It comprises an arithmetic unit 15a serving as the arithmetic means, a correction coefficient storage unit 15b, and, although not shown in the figure, a ROM for storing programs, a RAM for storing data, and the like.

[0022] Reference numeral 16 denotes an audio device that outputs an audio signal oa. Reference numeral 17 denotes a loudspeaker that converts the electrical audio signal oa into sound. With respect to the speaker's voice described above, however, the sound emitted from the loudspeaker 17 is music noise that must be removed.

[0023] Reference numeral 18 denotes an amplifier that amplifies the audio signal oa from the audio device 16 and outputs it as a reference signal ra. Reference numeral 19 denotes a filter bank that divides the reference signal ra by frequency into a plurality of channel signals and selectively outputs one of them.

[0024] The filter bank 19 comprises an all-pass filter 19a that passes the entire band of the reference signal ra as a channel signal r0; band-pass filter groups 19b and 19c that divide the reference signal ra into predetermined bands and output a plurality (n) of channel signals r1, r2, ..., rn; a multiplexer 19d that selects one channel signal r(CH) (CH = 0, 1, 2, ..., n) from the channel signals r0, r1, r2, ..., rn; and an A/D converter 19e that converts the selected channel signal r(CH) into a digital signal and outputs it as the channel signal R(CH).

[0025] Reference numeral 20 denotes a latch circuit that holds the channel signal R(CH) output from the filter bank 19. Reference numeral 21 denotes a speech recognition unit serving as the recognition means. It recognizes the speech analysis data of the channel signal S(CH) computed and output by the CPU 15, and it supplies a channel selection signal CS, used to select a channel signal, to the CPU 15 and to the multiplexers 13d and 19d. Reference numeral 22 denotes a registration dictionary that stores comparison voice data registered in advance and supplies that data to the speech recognition unit 21 during speech analysis.

[0026] Reference numeral 23 denotes a voice trigger circuit serving as the voice section discriminating means. It detects the start of the voice section of the speaker's voice signal contained in the main signal ma, generates a start signal (trigger signal) TR, and supplies it to the CPU 15.

[0027] Next, the correction coefficient stored in the correction coefficient storage unit 15b will be described.

[0028] The main signal ma input from the microphone 11 is expressed by Equation 1 below.

[0029]

[Equation 1] ma = sa + oa · g

Here, sa is the voice signal that the microphone 11 outputs as an electrical signal on receiving the speaker's voice, and it includes the conversion characteristic of the microphone 11. oa is the audio signal output from the audio device 16. g is the transmission characteristic that the audio signal oa undergoes from its conversion into sound by the loudspeaker 17, according to the loudspeaker's conversion characteristic, until that sound propagates to and reaches the microphone 11.

[0030] Because the audio signal oa can be obtained directly from the audio device 16, the voice signal sa can be found from Equation 2 below if the transmission characteristic g is known.

[0031]

[Equation 2] sa = ma - oa · g

However, obtaining this transmission characteristic g requires high-precision measurement, and obtaining an accurate value is very difficult.

[0032] In this embodiment, therefore, the main signal ma and the audio signal oa are frequency-analyzed, and the voice signal sa is obtained from the digitized data.

[0033] The following relationship (Equation 3) holds between the digital signals M(CH) and R(CH) output from the A/D converters 13e and 19e in FIG. 1.

[0034]

[Equation 3] M(CH) = S(CH) + R(CH) · G(CH)

Because of the error introduced by digitizing the analog signals, the left and right sides of Equation 3 are not necessarily exactly equal. In Equation 3, S(CH) is the digitized voice signal sa, and G(CH) is a correction coefficient that is multiplied by R(CH) in order to predict the voice signal component S(CH) contained in the main signal M(CH).

[0035] From Equation 3, the voice signal S(CH) is expressed by Equation 4 below.

[0036]

[Equation 4] S(CH) = M(CH) - R(CH) · G(CH)

Equation 4 makes it possible to predict the voice signal S(CH) contained in the main signal M(CH).
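The per-channel subtraction described here, in which the reference signal is multiplied by the correction coefficient and the product is subtracted from the main signal, can be sketched as follows. This is only an illustrative sketch: the function name `subtract_noise`, the list-per-channel layout, and the numeric values are assumptions, not part of the patent.

```python
def subtract_noise(main, reference, coeff):
    """Estimate per-channel voice levels as S(CH) = M(CH) - R(CH) * G(CH).

    main, reference and coeff are lists indexed by channel CH
    (a hypothetical layout chosen for this sketch).
    """
    if not (len(main) == len(reference) == len(coeff)):
        raise ValueError("all channel lists must have the same length")
    return [m - r * g for m, r, g in zip(main, reference, coeff)]


# Hypothetical levels for three channels (illustrative numbers only).
M = [10.0, 8.0, 6.0]
R = [4.0, 2.0, 8.0]
G = [0.5, 2.0, 0.25]
S = subtract_noise(M, R, G)
```

Each channel is handled independently, which is why a single multiply and subtract per channel is enough and no adaptive filter is involved.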

[0037] Provided that the number of channels n, which sets the frequency resolution, is large, the correction coefficient G(CH) can be estimated from the ratio of M(CH) to R(CH) while no voice is being uttered. That is, setting S(CH) = 0 gives M(CH) = R(CH) · G(CH), so the correction coefficient can be written as G(CH) = M(CH) / R(CH). The correction coefficient G(CH) calculated in a non-voice section is stored in the correction coefficient storage unit 15b.
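As a rough illustration of the ratio G(CH) = M(CH) / R(CH) taken while no voice is present, the sketch below recovers a known coefficient from synthetic data. The name `estimate_coeff` and the `eps` guard for a silent reference channel are assumptions added for this example; the source does not say how a zero reference is handled.

```python
def estimate_coeff(main, reference, eps=1e-9):
    """Per-channel G(CH) = M(CH) / R(CH), measured while no voice is present.

    eps is a hypothetical guard (not in the source): channels whose
    reference level is effectively zero get a coefficient of 0.0.
    """
    return [m / r if abs(r) > eps else 0.0 for m, r in zip(main, reference)]


# With no voice, the main signal is pure audio noise: M = R * G_true.
G_true = [0.5, 2.0, 0.25]
R = [4.0, 2.0, 8.0]
M = [r * g for r, g in zip(R, G_true)]
G = estimate_coeff(M, R)
```

If a voice were present while the ratio is taken, S(CH) would leak into the estimate, which is why the coefficient is only computed in non-voice sections.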

[0038] FIG. 2 is a flowchart showing the operation of the CPU 15 of the speech recognition apparatus shown in FIG. 1. This operation is described below. Here, the cumulative values of M(CH) and R(CH) over several seconds of a non-voice section (one second in this example) are denoted ΣM(CH) and ΣR(CH), and their ratio, given by Equation 5 below, is used as the correction coefficient.

[0039]

[Equation 5] G(CH) = ΣM(CH) / ΣR(CH)

In FIG. 2, the CPU 15 takes in the channel selection signal CS from the speech recognition unit 21 and outputs it to the latch circuits 14 and 20 (step S11), thereby generating the latch timing and the timing for reading data from the latch circuits 14 and 20. It then reads the data M(CH) and R(CH) from the latch circuits 14 and 20 and takes in the trigger signal TR from the voice trigger circuit 23 (step S12). The moment this trigger signal TR, that is, the start signal, is received is taken as the start of the voice; a timer is set from that moment, and the following 1.6 seconds (the maximum voice section length accepted by the speech recognition unit 21) are treated as the voice section.

[0040] Each time the data M(CH) and R(CH) are read, the CPU 15 determines whether they lie in a voice section (step S13). When a non-voice section has continued for one second or more, it updates the stock data M(CH) and R(CH) covering the most recent one second, computes the cumulative values ΣM(CH) and ΣR(CH), generates the latest correction coefficient using Equation (3), and updates the value held in the correction coefficient storage unit 15b (step S14).

[0041] In a voice section, on the other hand, the CPU 15 reads the (latest) correction coefficient G(CH) from the correction coefficient storage unit 15b, multiplies it by the data R(CH) just read to obtain the audio noise component R(CH) · G(CH), and subtracts that component from the main signal data M(CH) in accordance with Equation 4 (step S15). The result of this subtraction is output as the data of the voice signal S(CH) (step S16).
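Steps S13 to S16 for a single channel can be sketched as a small state machine: in a non-voice section the most recent samples are stocked and the coefficient is refreshed from the ratio of their sums once the non-voice run has lasted a full window, and in a voice section the stored coefficient is applied. The class name `CoefficientTracker`, the frame-based `window` (the number of samples in one second is not given in the source), and the guard against a zero sum are all assumptions of this sketch.

```python
from collections import deque


class CoefficientTracker:
    """Rolling-ratio correction coefficient for one channel (a sketch).

    window is the number of frames making up the one-second stock;
    the frame rate is an assumption, not stated in the source.
    """

    def __init__(self, window, initial_coeff=0.0):
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window
        self.stock = deque(maxlen=window)  # (M, R) pairs from non-voice frames
        self.coeff = initial_coeff
        self.quiet_frames = 0

    def step(self, m, r, in_voice):
        """Process one frame; return S = M - R*G in a voice section, else None."""
        if in_voice:
            self.quiet_frames = 0
            return m - r * self.coeff          # steps S15 and S16
        self.quiet_frames += 1
        self.stock.append((m, r))
        if self.quiet_frames >= self.window:   # non-voice has lasted a full window
            sum_m = sum(p[0] for p in self.stock)
            sum_r = sum(p[1] for p in self.stock)
            if sum_r != 0:                     # hypothetical guard, not in the source
                self.coeff = sum_m / sum_r     # step S14
        return None


# Four non-voice frames where the noise reaching the microphone is 0.5 * R.
tracker = CoefficientTracker(window=4)
for r in (2.0, 4.0, 6.0, 8.0):
    tracker.step(0.5 * r, r, in_voice=False)
voice_estimate = tracker.step(7.0, 4.0, in_voice=True)
```

Because the stock only ever holds non-voice frames and the update waits for an unbroken run of `window` such frames, the sums never mix in samples from before the last voice section.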

[0042] Thus, according to the embodiment of the first invention, the latest correction coefficient is always obtained from the main signal and the reference signal in the non-voice section. The apparatus can therefore cope with non-stationary noise such as rapidly changing audio noise, and when the audio noise does not change very rapidly, the filter convergence time can be shortened. Nor does it require an expensive processor capable of high-speed arithmetic, such as a DSP.

[0043] Furthermore, because the reference signal never contains the voice signal, the estimation error can be kept small, and a high level of speech recognition is possible even in an environment with audio noise.

[0044] 2. An embodiment of the second invention will now be described.

[0045] This invention is characterized in that, when the correction coefficient is generated and updated as in the embodiment of the first invention, the generation and update use delayed data, that is, data from the past.

[0046] FIG. 3 is a block diagram of a speech recognition apparatus according to the first embodiment of the present invention. In FIG. 3, components identical to those of the embodiment of the first invention shown in FIG. 1 are given the same reference numerals, and their description is omitted. As FIG. 3 shows, this embodiment has no voice trigger circuit. Instead, the CPU 15 contains, alongside the arithmetic means that performs data computation and delay processing, a voice detection unit 15c, which is the voice section discriminating means that detects the voice section.

[0047] The voice detection unit 15c first determines a provisional start of the voice signal and then defines the voice section by treating the point a fixed time before that provisional start as the definitive start. To do so, it performs the data delay processing described below.

[0048] FIG. 4 shows how the speech recognition apparatus of FIG. 3 detects the start of the voice and how the data are delayed. In FIGS. 4(a) and 4(b), DP denotes the current-value data of the voice waveform after the audio noise component has been removed. Although it is actually a digital signal, it is drawn here as an analog signal for convenience of explanation. ts is the provisionally detected position of the start of the voice, defined as the point at which the signal reaches or exceeds a predetermined threshold level described later.

[0049] DD is the voice waveform data obtained by delaying the current-value data DP by a fixed time, that is, the past-value data. In this embodiment, the delay time Td between the current-value data DP and the past-value data DD is one second. The past-value data DD are therefore obtained by storing the current-value data DP in a RAM (not shown) and reading them out one second later. The delay time Td is set to one second because the maximum error between the true start of the voice signal and the provisional start can be estimated to be one second.
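The store-then-read-later behaviour that produces the past-value data DD from the current-value data DP can be sketched with a fixed-length buffer. The class name `DelayLine`, the delay expressed in frames rather than seconds, and the `fill` value returned before the buffer has filled are assumptions made for this sketch.

```python
from collections import deque


class DelayLine:
    """Return each sample delayed by `delay` frames (past-value data DD).

    Until `delay` samples have been pushed, `fill` is returned;
    the fill value is an assumption, not taken from the source.
    """

    def __init__(self, delay, fill=0.0):
        if delay < 1:
            raise ValueError("delay must be at least 1 frame")
        self.buf = deque([fill] * delay, maxlen=delay)

    def push(self, sample):
        delayed = self.buf[0]      # oldest stored sample
        self.buf.append(sample)    # maxlen drops the oldest automatically
        return delayed


# A three-frame delay applied to the current-value samples 1..5.
line = DelayLine(delay=3)
delayed = [line.push(x) for x in [1, 2, 3, 4, 5]]
```

With a real frame rate, `delay` would be the number of frames in Td = 1 second, so the delayed stream always lags the live stream by the maximum expected detection error.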

[0050] T0 is the true voice section of the voice signal, taken here to be 1.6 seconds. T1 is the voice section as determined by the CPU 15, taken here to be 2.6 seconds. FIG. 4(a) therefore shows the case in which the provisional start ts of the voice signal lags the true start by one second (the maximum error), and FIG. 4(b) the case in which the provisional start ts lags the true start by only a short time. In either case, the voice section determined by the CPU 15 contains the true voice section.
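The claim that the 2.6-second section always contains the 1.6-second true section can be checked with simple arithmetic, assuming T1 = T0 + Td and a detection lag between 0 and Td. The sketch below works in integer milliseconds to avoid rounding issues; the function names and the sample times are hypothetical.

```python
def cpu_section(ts_ms, td_ms=1000, t0_ms=1600):
    """Section assumed by the CPU: starts Td before the provisional
    start ts and lasts T1 = T0 + Td (all times in milliseconds)."""
    start = ts_ms - td_ms
    return start, start + (t0_ms + td_ms)


def covers(true_start_ms, lag_ms, td_ms=1000, t0_ms=1600):
    """True if the CPU section contains the true voice section
    [true_start, true_start + T0] for the given detection lag."""
    start, end = cpu_section(true_start_ms + lag_ms, td_ms, t0_ms)
    return start <= true_start_ms and true_start_ms + t0_ms <= end


# Detection lags from 0 ms (FIG. 4(b)-like) up to 1000 ms (FIG. 4(a)).
all_covered = all(covers(5000, lag) for lag in range(0, 1001, 100))
```

A lag larger than Td would push the assumed start past the true start, which is why Td is chosen equal to the maximum expected detection error.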

[0051] FIG. 5 is a flowchart showing the operation of the CPU 15 in the speech recognition apparatus according to the first embodiment of the second invention. The operation of the first embodiment of the present invention is described below with reference to FIG. 5.

[0052] First, the CPU 15 monitors the channel selection signal CS supplied from the speech recognition unit 21, outputs a latch timing signal to the latch circuits 14 and 20 (step S21), and reads the data M(CH) and R(CH) from the latch circuits 14 and 20 (step S22).

【００５３】次に、取り込んだデータにより音声区間の
検出を行う（ステップＳ２３）。この音声区間の始端ｔ
ｓの検出は以下のようにして行う。フィルタバンク１３
のオールパスフィルタ１３ａから得られた更新前の補正
係数（これをＧ０（０）とする）を利用し、下記に示す
（数６）により、音声信号のレベルが所定のスレッショ
ルドレベルＴＨより大となる点（時間）すなわち始端ｔ
ｓを検出する。Next, the voice section is detected from the fetched data (step S23). The start end ts of this voice section is detected as follows. Using the pre-update correction coefficient (denoted G0(0)) obtained from the all-pass filter 13a of the filter bank 13, (Equation 6) below is used to detect the point in time at which the level of the voice signal exceeds a predetermined threshold level TH, that is, the start end ts.

【００５４】[0054]

【数６】この（数６）で、Ｍ（０）及びＲ（０）は、オールパス
フィルタ１３ａから得られるメイン信号及びリファレン
ス信号である。上記したように、音声区間Ｔ1はこの始
端ｔｓから２．６秒間とする。[Equation 6] In this (Equation 6), M (0) and R (0) are the main signal and the reference signal obtained from the all-pass filter 13a. As described above, the voice section T1 is set to 2.6 seconds from the start end ts.
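
The image of (Equation 6) is not reproduced in this text. From the surrounding definitions (the main signal M(0), the reference signal R(0), the pre-update correction coefficient G0(0) and the threshold level TH), it presumably has the following form; this is a reconstruction under that assumption, not the original figure.

```latex
M(0) - R(0)\cdot G_0(0) > TH
```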

【００５５】さらに、始端ｔｓから過去１秒間のストッ
クデータを更新する。すなわち、過去１秒前の遅延デー
タＭＤ（ＣＨ），ＲＤ（ＣＨ）をＲＡＭから取り出す
（ステップＳ２４）。この遅延データを得ることによ
り、音声区間Ｔ1 内に１．６秒間の真の音声区間Ｔ0 を
包含することができる。また、音声データの遅延データ
を利用するのは、遅延データの音声始端よりも数ミリ早
く補正係数を更新しておくことにより、音声始端以前の
推定誤差によるオーディオ信号の残留成分を少なくし、
音声始端のトリガが早くかかり過ぎるのを防ぐという２
次的効果もある。Further, the stock data for the past 1 second from the start end ts is updated. That is, the delay data MD(CH) and RD(CH) from 1 second earlier are read out of the RAM (step S24). Obtaining this delay data allows the 1.6-second true voice section T0 to be contained within the voice section T1. Using delayed voice data also has a secondary effect: because the correction coefficient has already been updated a few milliseconds before the voice start end in the delay data, the residual audio component caused by estimation error before the voice start end is reduced, which prevents the voice-start trigger from firing too early.

【００５６】次に、音声区間か否かを判別し（ステップ
Ｓ２５）、音声区間でない場合には、遅延データＭＤ
（ＣＨ），ＲＤ（ＣＨ）を補正係数計算用データとし
て、補正係数計算用ストックデータを更新する（ステッ
プＳ２６）。更新した遅延データＭＤ（ＣＨ），ＲＤ
（ＣＨ）を１秒間累計したΣＭＤ（ＣＨ），ΣＲＤ（Ｃ
Ｈ）は、音声成分を含んでいない遅延データの過去１秒
間の累計値である。この累計値を下記の（数７）に代入
して補正係数Ｇ（ＣＨ）を更新する。Next, it is determined whether the current data is in a voice section (step S25). If it is not, the stock data for correction coefficient calculation is updated using the delay data MD(CH) and RD(CH) as correction coefficient calculation data (step S26). ΣMD(CH) and ΣRD(CH), obtained by accumulating the updated delay data MD(CH) and RD(CH) over 1 second, are cumulative values over the past 1 second of delay data containing no voice component. The correction coefficient G(CH) is updated by substituting these cumulative values into (Equation 7) below.
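
The image of (Equation 7) is not reproduced in this text. Since the coefficient is obtained from the 1-second cumulative values of the delay data in a non-voice section, it presumably takes the form of their ratio; the following is a reconstruction under that assumption.

```latex
G(CH) = \frac{\sum MD(CH)}{\sum RD(CH)}
```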

【００５７】[0057]

【数７】一方、ステップＳ２５において、音声区間である場合に
は、更新された補正係数Ｇ（ＣＨ）と遅延データＭＤ
（ＣＨ），ＲＤ（ＣＨ）を用いて、次の（数８）により
オーディオ騒音成分の減算を行い（ステップＳ２７）、
減算データすなわち遅延音声信号ＳＤ（ＣＨ）を得る。[Equation 7] On the other hand, if step S25 finds a voice section, the audio noise component is subtracted by (Equation 8) below using the updated correction coefficient G(CH) and the delay data MD(CH) and RD(CH) (step S27), giving the subtracted data, that is, the delayed voice signal SD(CH).

【００５８】[0058]

【数８】この減算データである遅延音声信号ＳＤ（ＣＨ）を音声
認識部２１に出力する（ステップＳ２８）。[Equation 8] The delayed voice signal SD (CH) which is the subtracted data is output to the voice recognition unit 21 (step S28).
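
Steps S26 and S27 can be sketched per channel as follows. The equation images are missing from this text, so the forms used here (coefficient = ΣMD / ΣRD, output = MD − RD·G) are assumptions inferred from the prose; the function names and the zero-energy guard are likewise hypothetical.

```python
def update_coefficient(md_history, rd_history):
    """Assumed form of Equation 7: G = sum(MD) / sum(RD) over non-voice frames.

    md_history and rd_history hold the last 1 second of delay data for one
    channel. Returns None when the reference carries no energy, so that the
    caller can keep the previous coefficient (an added safeguard).
    """
    total_rd = sum(rd_history)
    if total_rd == 0:
        return None
    return sum(md_history) / total_rd


def subtract_noise(md, rd, g):
    """Assumed form of Equation 8: SD = MD - RD * G for one channel."""
    return md - rd * g
```

For example, with cumulative main data 6.0 and cumulative reference data 3.0 the coefficient is 2.0, and a voice-section frame with MD = 10.0 and RD = 3.0 then yields SD = 4.0.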

【００５９】このように、メイン信号及びリファレンス
信号の現在値データと、更新前の補正係数を利用した音
声区間の検出、並びに、メイン信号及びリファレンス信
号の過去値データを利用することにより、以下に示す効
果を得ることができる。As described above, detecting the voice section using the present value data of the main signal and the reference signal together with the pre-update correction coefficient, and using the past value data of the main signal and the reference signal, yields the following effects.

【００６０】１）オーディオ騒音が大きい場合でも、更
新前補正係数を利用した減算データによって、予めある
程度のオーディオ騒音成分を除去しているので、音声信
号の始端の検出誤差を小さくすることができる。1) Even if the audio noise is large, the audio noise component is removed in advance to some extent by the subtraction data using the pre-update correction coefficient, so that the detection error at the start end of the audio signal can be reduced.

【００６１】２）音声区間を自動的に検出するので、ユ
ーザが発声のたびにキー入力等の操作を行う負担を解消
することができる。2) Since the voice section is automatically detected, it is possible to eliminate the burden of the user performing an operation such as key input each time the user speaks.

【００６２】３）音声信号のレベルが小さいために推定
誤りによるオーディオ騒音成分の残留成分が存在する場
合でも、音声信号を検出するスレッショルドレベルを大
きく設定することにより、オーディオ騒音による音声区
間の誤検出を少なくすることができ、補正係数の適正値
を求めることができる。したがって、音声認識部のスレ
ッショルドレベルに依存することがない。3) Even when a residual audio noise component remains because of an estimation error caused by a low voice signal level, setting a high threshold level for detecting the voice signal reduces erroneous detection of voice sections due to audio noise, so that an appropriate value of the correction coefficient can be obtained. The result therefore does not depend on the threshold level of the voice recognition unit.

【００６３】４）極めて単純な方法であるため、リアル
タイム処理が可能となる。4) Real-time processing is possible because of the extremely simple method.

【００６４】本発明の第２の実施例の音声認識装置のブ
ロック図は、図３に示す第２の発明の実施例の構成と同
じ構成であり、その説明は省略する。The block diagram of the voice recognition device of the second embodiment of the present invention has the same configuration as that of the embodiment of the second invention shown in FIG. 3, and its description is omitted.

【００６５】この実施例の特徴は、適応型Ｓ．Ｓ法にお
ける補正係数の更新を発声ごとに毎回行うのではなく、
一定時間ごとに行うことにある。音声認識部にある程度
の定常雑音除去機能を有する場合、毎回ごとの補正係数
の更新を行うと、特にオーディオ成分の変動が大きい
と、減算量が毎回変動してしまう。その結果、推定誤差
によるオーディオ騒音の残留成分が毎回変動するため、
音声認識部の定常雑音除去機能が有効に働かない。図６
（ａ）は発声のたびに毎回補正係数を更新した場合のオ
ーディオ騒音の残留成分を示す図である。かかる場合に
は、音声認識部で音声区間の誤検出が起こり易いという
現象が生じる。すなわち、この第２の実施例は上記誤検
出を回避するためになされたものである。The feature of this embodiment is that the correction coefficient in the adaptive S.S. method is updated at fixed time intervals rather than at every utterance. When the voice recognition unit has a certain degree of stationary noise removal capability, updating the correction coefficient at every utterance makes the subtraction amount vary each time, particularly when the audio component fluctuates strongly. As a result, the residual audio noise component caused by the estimation error also varies each time, and the stationary noise removal function of the voice recognition unit does not work effectively. FIG. 6A shows the residual audio noise component when the correction coefficient is updated at every utterance. In such a case, the voice recognition unit tends to detect voice sections erroneously. The second embodiment is intended to avoid this erroneous detection.

【００６６】図７はこの第２の発明の第２の実施例の音
声認識装置のＣＰＵ１５の動作を示すフローチャートで
ある。このフローチャート及び図３に基づいて、この第
３の実施例の動作を説明する。FIG. 7 is a flow chart showing the operation of the CPU 15 of the voice recognition device according to the second embodiment of the second invention. The operation of the third embodiment will be described based on this flowchart and FIG.

【００６７】まず、音声認識部２１からのチャンネル選
択信号ＣＳをモニタし、ラッチ回路１４及び２０へラッ
チタイミング信号を供給し（ステップＳ３１）、並び
に、ラッチ回路１４及び２０からＣＰＵ１５への取り込
みタイミングをつくり、データＭ（ＣＨ），Ｒ（ＣＨ）
を取り込む（ステップＳ３２）。First, the channel selection signal CS from the voice recognition unit 21 is monitored, a latch timing signal is supplied to the latch circuits 14 and 20 (step S31), the timing for fetching from the latch circuits 14 and 20 into the CPU 15 is generated, and the data M(CH) and R(CH) are fetched (step S32).

【００６８】この取り込んだデータより音声区間の検出
を行う（ステップＳ３３）。この検出は、オールパスフ
ィルタ１３ａから得られた更新前の補正係数Ｇ０（０）
を利用し、（数６）を満たすデータにより、音声の始端
とする。The voice section is detected from the fetched data (step S33). For this detection, the pre-update correction coefficient G0(0) obtained from the all-pass filter 13a is used, and the data that satisfies (Equation 6) is taken as the start end of the voice.

【００６９】[0069]

【数６】音声区間はこの音声始端から２．６秒間（音声
認識装置の最大音声区間長）とする。[Equation 6] The voice section is set to 2.6 seconds from this voice start end (the maximum voice section length of the voice recognition device).

【００７０】次に、データのストックと遅延データの取
り出しを行う（ステップＳ３４）。現在より過去数秒間
（この場合１秒間とする）のストックデータを更新し、
これにより遅延データＭＤ（ＣＨ），ＲＤ（ＣＨ）を得
る。その後、音声区間か否かを判別し（ステップＳ３
５）、音声区間でない場合には、この遅延データをスト
ックして補正係数の候補を作成する（ステップＳ３
６）。具体的には、音声区間でないときに、得られた遅
延データＭＤ（ＣＨ），ＲＤ（ＣＨ）を補正係数計算用
データとして、補正係数計算用ストックデータを更新す
る。そして、音声成分が含まれていない過去１秒間分の
累計値ΣＭＤ（ＣＨ），ΣＲＤ（ＣＨ）を用いて下記の
（数９）により、補正係数候補Ｇｃ（ＣＨ）を求める。Next, data is stocked and the delay data is taken out (step S34). The stock data for the past several seconds (1 second in this case) is updated, which yields the delay data MD(CH) and RD(CH). It is then determined whether the current data is in a voice section (step S35); if it is not, this delay data is stocked and a correction coefficient candidate is created (step S36). Specifically, when not in a voice section, the stock data for correction coefficient calculation is updated using the obtained delay data MD(CH) and RD(CH) as correction coefficient calculation data. The correction coefficient candidate Gc(CH) is then obtained from (Equation 9) below, using the cumulative values ΣMD(CH) and ΣRD(CH) over the past 1 second containing no voice component.

【００７１】[0071]

【数９】補正係数候補Ｇｃ（ＣＨ）を求めた後、一定時間ごとに
補正係数の更新を行う（ステップＳ３７）。すなわち、
カウンタを設定して、音声区間でないときにこのカウン
タをインクリメントし、一定時間（この場合、０．５
秒）ごとに補正係数候補Ｇｃ（ＣＨ）を補正係数Ｇ（Ｃ
Ｈ）として更新する。[Equation 9] After the correction coefficient candidate Gc(CH) is obtained, the correction coefficient is updated at fixed time intervals (step S37). That is, a counter is set and incremented while not in a voice section, and at every fixed interval (0.5 seconds in this case) the correction coefficient candidate Gc(CH) is adopted as the correction coefficient G(CH).
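
The counter-based update of step S37 can be sketched as below. The class name `PeriodicUpdater`, the 10 ms frame period, and the choice to reset the counter after each update are assumptions made for illustration; the candidate Gc(CH) is taken as given (presumably the ratio ΣMD(CH) / ΣRD(CH), since the image of Equation 9 is not reproduced here).

```python
FRAME_MS = 10                       # assumed frame period
UPDATE_FRAMES = 500 // FRAME_MS     # 0.5 s between updates, from the text


class PeriodicUpdater:
    """Adopt the candidate Gc as G once per interval of non-voice frames."""

    def __init__(self, g_initial, update_frames=UPDATE_FRAMES):
        self.g = g_initial
        self._every = update_frames
        self._count = 0

    def step(self, in_voice, g_candidate):
        """Process one frame and return the coefficient to use for it."""
        if not in_voice:
            self._count += 1                 # counter runs only outside voice
            if self._count >= self._every:
                self.g = g_candidate         # Gc(CH) becomes G(CH)
                self._count = 0
        return self.g
```

Holding G fixed between updates is what keeps the subtraction amount, and hence the residual noise, steady enough for a stationary noise remover downstream.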

【００７２】一方、音声区間でない場合には、Ｓ３６，
Ｓ３７は省く。次に、音声区間の有無に関わらず、オー
ディオ騒音成分の減算処理を行う（ステップＳ３８）。
この減算処理は、更新された補正係数Ｇ（ＣＨ）と遅延
データＭＤ（ＣＨ），ＲＤ（ＣＨ）とを用いて、（数
８）により音声信号ＳＤ（ＣＨ）を抽出し、その減算デ
ータを出力する（ステップＳ３９）。On the other hand, if it is not a voice section, steps S36 and S37 are skipped. Next, the audio noise component is subtracted regardless of whether a voice section is present (step S38). In this subtraction, the voice signal SD(CH) is extracted by (Equation 8) using the updated correction coefficient G(CH) and the delay data MD(CH) and RD(CH), and the subtracted data is output (step S39).

【００７３】[0073]

【数８】図６（ｂ）は一定時間（０．５秒）ごとに補正
係数を更新した場合のオーディオ騒音の残留成分を示す
図である。この図で明らかなように、残留成分の変動が
少なくなるので、音声認識部の定常雑音除去機能によ
り、残留成分を除去することができる。[Equation 8] FIG. 6B shows the residual audio noise component when the correction coefficient is updated at fixed intervals (0.5 seconds). As the figure makes clear, the fluctuation of the residual component is reduced, so the residual component can be removed by the stationary noise removal function of the voice recognition unit.

【００７４】次に、本発明の第３の実施例について説明
する。Next, a third embodiment of the present invention will be described.

【００７５】この実施例の音声認識装置のブロック図
も、図３に示す第２の発明の実施例の構成と同じ構成で
あるので、その説明は省略し、図８に示す動作フローチ
ャートに基づいてその動作について説明する。図８はこ
の第２の発明の第３の実施例の音声認識装置のＣＰＵ１
５の動作を表すフローチャートである。The block diagram of the voice recognition device of this embodiment also has the same configuration as that of the embodiment of the second invention shown in FIG. 3, so its description is omitted, and the operation is described with reference to the operation flowchart shown in FIG. 8. FIG. 8 is a flowchart showing the operation of the CPU 15 of the voice recognition device in the third embodiment of the second invention.

【００７６】音声認識部２１から供給されるチャンネル
選択信号ＣＳを監視して、ラッチ回路１４及び２０へラ
ッチタイミングの信号を出力し（ステップＳ４１）、ラ
ッチ回路１４及び２０よりデータＭ（ＣＨ）及びＲ（Ｃ
Ｈ）を取り込む（ステップＳ４２）。The channel selection signal CS supplied from the voice recognition unit 21 is monitored, a latch timing signal is output to the latch circuits 14 and 20 (step S41), and the data M(CH) and R(CH) are fetched from the latch circuits 14 and 20 (step S42).

【００７７】次に、取り込んだデータにより音声区間の
検出を行う（ステップＳ４３）。この音声区間の始端ｔ
ｓの検出は以下のようにして行う。フィルタバンク１３
のオールパスフィルタ１３ａから得られた更新前の補正
係数（これをＧ０（０）とする）を利用し、下記に示す
（数６）により、音声信号のレベルが所定のスレッショ
ルドレベルＴＨより大となる点（時間）すなわち始端ｔ
ｓを検出する。Next, the voice section is detected from the fetched data (step S43). The start end ts of this voice section is detected as follows. Using the pre-update correction coefficient (denoted G0(0)) obtained from the all-pass filter 13a of the filter bank 13, (Equation 6) below is used to detect the point in time at which the level of the voice signal exceeds a predetermined threshold level TH, that is, the start end ts.

【００７８】[0078]

【数６】この（数６）で、Ｍ（０）及びＲ（０）は、オ
ールパスフィルタ１３ａから得られるメイン信号及びリ
ファレンス信号である。上記したように、音声区間Ｔ1
はこの始端ｔｓから２．６秒間とする。[Equation 6] In this (Equation 6), M(0) and R(0) are the main signal and the reference signal obtained from the all-pass filter 13a. As described above, the voice section T1 is set to 2.6 seconds from this start end ts.

【００７９】さらに、始端ｔｓから過去１秒間のストッ
クデータを更新する。すなわち、過去１秒前の遅延デー
タＭＤ（ＣＨ），ＲＤ（ＣＨ）をＲＡＭから取り出す
（ステップＳ４４）。この遅延データをストックすると
ともに、補正係数候補を作成する（ステップＳ４５）。
すなわち、遅延データＭＤ（ＣＨ），ＲＤ（ＣＨ）を補
正係数計算用データとして、補正係数計算用ストックデ
ータを更新する。そして、遅延データの過去１秒分の累
計値ΣＭＤ（ＣＨ），ΣＲＤ（ＣＨ）を計算し、補正係
数の候補Ｇｃ（ＣＨ）を（数９）により求める。Further, the stock data for the past 1 second from the starting end ts is updated. That is, the delay data MD (CH) and RD (CH) one second before is taken out from the RAM (step S44). This delay data is stocked and correction coefficient candidates are created (step S45).
That is, the correction coefficient calculation stock data is updated using the delay data MD (CH) and RD (CH) as the correction coefficient calculation data. Then, the cumulative values ΣMD (CH) and ΣRD (CH) of the delay data for the past one second are calculated, and the correction coefficient candidate Gc (CH) is obtained by (Equation 9).

【００８０】[0080]

【数９】次に、音声始端を検出したか否かを判別し（ス
テップＳ４６）、音声始端を検出した場合には、Ｇｃ
（ＣＨ）＝Ｇ（ＣＨ）として補正係数を更新する（ステ
ップＳ４７）。[Equation 9] Next, it is determined whether the voice start end has been detected (step S46). If it has, the correction coefficient is updated with Gc(CH) = G(CH), that is, the candidate becomes the new correction coefficient (step S47).

【００８１】図９はこの実施例における音声認識装置の
音声始端を検出する様子を示す図である。この音声始端
の検出により補正係数の更新を行うものである。すなわ
ち、かかる方法による補正係数の更新は、図９における
音声始端であるａ点の２秒前（ｃ点）から１秒前（ｂ
点）のデータの累計値の比が補正係数となる。したがっ
て、出力データは遅延データであるため、ｂ点から補正
係数が更新されることになる。FIG. 9 shows how the voice recognition device of this embodiment detects the voice start end. The correction coefficient is updated when this voice start end is detected. With this method, the correction coefficient is the ratio of the cumulative values of the data from 2 seconds before (point c) to 1 second before (point b) point a, the voice start end in FIG. 9. Since the output data is delay data, the correction coefficient is therefore updated from point b.
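
The onset-triggered update can be sketched for one channel as follows. The window from point c to point b is taken from the text; that the ratio is cumulative main data over cumulative reference data, the 10 ms default frame period, and the function name are assumptions.

```python
def coefficient_at_onset(m_frames, r_frames, onset, frame_ms=10):
    """Correction coefficient from the window c..b preceding a voice onset.

    m_frames, r_frames: per-frame main and reference data for one channel.
    onset: frame index of point a, the detected voice start end.
    Returns None if there is not enough history or no reference energy.
    """
    per_second = 1000 // frame_ms
    c = onset - 2 * per_second      # point c: 2 s before point a
    b = onset - 1 * per_second      # point b: 1 s before point a
    if c < 0:
        return None                 # less than 2 s of history available
    total_r = sum(r_frames[c:b])
    if total_r == 0:
        return None
    return sum(m_frames[c:b]) / total_r
```

Because the window ends a full second before the onset, frames close to the utterance, where speech may already have begun, never enter the ratio.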

【００８２】よって、毎回の発声ごとに補正係数を更新
するときも、オーディオ騒音の残留成分の変動が少なく
なり、図６（ｂ）に示すような波形が得られるので、音
声認識部２１がある程度の定常雑音除去機能を有する場
合には、残留成分は定常雑音として除去される可能性が
高くなる。Therefore, even when the correction coefficient is updated at every utterance, the fluctuation of the residual audio noise component is reduced and a waveform like that in FIG. 6B is obtained. If the voice recognition unit 21 has a certain degree of stationary noise removal capability, the residual component is more likely to be removed as stationary noise.

【００８３】この方式の場合、更新前の補正係数は前回
の発声の際に決定されるので、例えば、「ボイスコント
ロール」というような特定の単語の発声で音声認識装置
が音声起動し、その後コントロールワードを認識させる
ようなシステムの場合に有効となる。In the case of this method, since the correction coefficient before updating is determined at the time of the previous utterance, the voice recognition device is voice activated by the utterance of a specific word such as "voice control", and then the control is performed. This is effective in the case of a system that recognizes words.

【００８４】ステップＳ４６において音声始端でない場
合には、オーディオ騒音成分の減算を行い（ステップＳ
４８）、更新された補正係数Ｇ（ＣＨ）と遅延データＭ
Ｄ（ＣＨ），ＲＤ（ＣＨ）を用いて、（数８）により遅
延音声信号のデータＳＤ（ＣＨ）を抽出してデータの出
力を行う（ステップＳ４９）。If the voice start end is not detected in step S46, the audio noise component is subtracted (step S48): the data SD(CH) of the delayed voice signal is extracted by (Equation 8) using the updated correction coefficient G(CH) and the delay data MD(CH) and RD(CH), and the data is output (step S49).

【００８５】[0085]

【数８】この第３の実施例によれば、音声始端検出位置
で補正係数を更新することにより、毎回の発声ごとに補
正係数を更新する場合でも、音声始端の誤検出が起こり
にくくなる。[Equation 8] According to the third embodiment, updating the correction coefficient at the position where the voice start end is detected makes erroneous detection of the voice start end less likely, even when the correction coefficient is updated at every utterance.

【００８６】また、音声の発声間隔は通常２秒以上ある
ので、図９に示すｃ点からｂ点までは音声データが含ま
れないことか予想され、音声成分の有無を判定する必要
がなくなるという利点もある。In addition, since the interval between utterances is normally 2 seconds or more, the span from point c to point b in FIG. 9 can be expected to contain no voice data, which has the further advantage that the presence or absence of a voice component need not be determined.

【００８７】３．第３の発明の実施例について説明す
る。3. An embodiment of the third invention will be described.

【００８８】この発明の特徴は、更新した補正係数に対
して、さらに所定の調整量を乗ずることにある。A feature of the present invention is that the updated correction coefficient is further multiplied by a predetermined adjustment amount.

【００８９】以下に述べるこの発明の第１及び第２の実
施例の音声認識装置のブロック図は、図３に示す第２の
発明の実施例の構成と同じ構成であり、その説明は省略
する。The block diagram of the voice recognition device in the first and second embodiments of this invention described below has the same configuration as that of the embodiment of the second invention shown in FIG. 3, and its description is omitted.

【００９０】図１０はこの第３の発明の実施例における
音声認識装置のＣＰＵ１５の動作を表すフローチャート
である。このフローチャート及び図３に基づいて、この
第３の発明の第１の実施例の動作を説明する。FIG. 10 is a flow chart showing the operation of the CPU 15 of the voice recognition device in the embodiment of the third invention. The operation of the first embodiment of the third invention will be described with reference to this flowchart and FIG.

【００９１】まず、音声認識部２１からのチャンネル選
択信号ＣＳをモニタし、ラッチ回路１４及び２０へラッ
チタイミング信号を供給し（ステップＳ５１）、並び
に、ラッチ回路１４及び２０からＣＰＵ１５への取り込
みタイミングをつくり、データＭ（ＣＨ），Ｒ（ＣＨ）
を取り込む（ステップＳ５２）。First, the channel selection signal CS from the voice recognition unit 21 is monitored, a latch timing signal is supplied to the latch circuits 14 and 20 (step S51), the timing for fetching from the latch circuits 14 and 20 into the CPU 15 is generated, and the data M(CH) and R(CH) are fetched (step S52).

【００９２】この取り込んだデータより音声区間の検出
を行う（ステップＳ５３）。この検出は、オールパスフ
ィルタ１３ａから得られた更新前の補正係数Ｇ０（０）
を利用し、（数６）を満たすデータにより、音声の始端
とする。The voice section is detected from the fetched data (step S53). For this detection, the pre-update correction coefficient G0(0) obtained from the all-pass filter 13a is used, and the data that satisfies (Equation 6) is taken as the start end of the voice.

【００９３】[0093]

【００９４】次に、データのストックと遅延データの取
り出しを行う（ステップＳ５４）。現在より過去数秒間
（この場合１秒間とする）のストックデータを更新し、
これにより遅延データＭＤ（ＣＨ），ＲＤ（ＣＨ）を得
る。そして音声区間か否かを判別して（ステップＳ５
５）、音声区間でない場合には、遅延音声成分を含まな
い遅延データをストックして補正係数の候補を作成する
（ステップＳ５６）。具体的には、音声区間でないとき
に、得られた遅延データＭＤ（ＣＨ），ＲＤ（ＣＨ）を
補正係数計算用データとして、補正係数計算用ストック
データを更新する。そして、音声成分が含まれていない
過去１秒間分の累計値ΣＭＤ（ＣＨ），ΣＲＤ（ＣＨ）
を用いて下記の（数９）により、補正係数候補Ｇｃ（Ｃ
Ｈ）を求める。Next, data is stocked and the delay data is taken out (step S54). The stock data for the past several seconds (1 second in this case) is updated, which yields the delay data MD(CH) and RD(CH). It is then determined whether the current data is in a voice section (step S55); if it is not, the delay data containing no delayed voice component is stocked and a correction coefficient candidate is created (step S56). Specifically, when not in a voice section, the stock data for correction coefficient calculation is updated using the obtained delay data MD(CH) and RD(CH) as correction coefficient calculation data. The correction coefficient candidate Gc(CH) is then obtained from (Equation 9) below, using the cumulative values ΣMD(CH) and ΣRD(CH) over the past 1 second containing no voice component.

【００９５】[0095]

【数９】補正係数候補Ｇｃ（ＣＨ）を求めた後、０．５
秒ごとに補正係数の更新を行う（ステップＳ５７）。す
なわち、カウンタを設定して、音声区間でないときにこ
のカウンタをインクリメントし、０．５秒ごとに補正係
数候補Ｇｃ（ＣＨ）を補正係数Ｇ（ＣＨ）として更新す
る。さらに更新した補正係数の調整を行う（ステップＳ
５８）。この調整は、遅延音声成分を含まない遅延デー
タＭＤ（ＣＨ）の累計値ΣＭＤ（ＣＨ）を利用して、調
整量αを調整量決定ルールより求め、オールパスフィル
タ（ＣＨ０）１３ａの補正係数をＧ′（０）（＝Ｇ
（０）・α）とする。[Equation 9] After the correction coefficient candidate Gc(CH) is obtained, the correction coefficient is updated every 0.5 seconds (step S57). That is, a counter is set and incremented while not in a voice section, and every 0.5 seconds the correction coefficient candidate Gc(CH) is adopted as the correction coefficient G(CH). The updated correction coefficient is then adjusted (step S58). In this adjustment, the adjustment amount α is obtained from the adjustment amount determination rule using the cumulative value ΣMD(CH) of the delay data MD(CH) containing no delayed voice component, and the correction coefficient of the all-pass filter (CH0) 13a is set to G′(0) (= G(0)·α).

【００９６】図１１は第３及び後述する第４の発明の音
声認識装置における補正係数の調整量決定ルールを表す
図である。図１１において、横軸は遅延データＭＤ（Ｃ
Ｈ）の累計値ΣＭＤ（ＣＨ）の数であり、縦軸は調整量
αである。累計値の数が２００まではαは１であり、２
００から４００まではαは１．３となり、累計値に応じ
てαの値が増加する。このように、図１１に示す調整量
決定ルールにおけるαの値は常に１以上であり、次の処
理である調整量の修正のため、予め減算量が多めになる
ように設定されている。FIG. 11 shows the rule for determining the adjustment amount of the correction coefficient in the voice recognition devices of the third invention and of the fourth invention described later. In FIG. 11, the horizontal axis is the cumulative value ΣMD(CH) of the delay data MD(CH), and the vertical axis is the adjustment amount α. α is 1 for cumulative values up to 200 and 1.3 from 200 to 400, and α increases with the cumulative value. Thus α in the adjustment amount determination rule of FIG. 11 is always 1 or more; the subtraction amount is deliberately set on the large side in advance, in view of the correction of the adjustment amount performed in the next process.
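
The adjustment amount determination rule can be sketched as a step lookup. Only the first two steps (α = 1 up to 200, α = 1.3 from 200 to 400) come from the text; the remaining rows of the table are hypothetical placeholders, since FIG. 11 is not reproduced here, and the function names are likewise illustrative.

```python
import bisect

# (upper bound of the cumulative value ΣMD, α).
# Rows 1 and 2 are from the text; rows 3 and 4 are HYPOTHETICAL placeholders.
ALPHA_STEPS = [(200, 1.0), (400, 1.3), (600, 1.6), (float("inf"), 1.9)]


def adjustment_amount(sum_md, steps=ALPHA_STEPS):
    """Look up α for a cumulative value ΣMD; α never falls below 1."""
    bounds = [upper for upper, _ in steps]
    index = bisect.bisect_left(bounds, sum_md)
    return steps[index][1]


def adjusted_coefficient(g0, sum_md):
    """G'(0) = G(0) * α, applied to the all-pass channel CH0."""
    return g0 * adjustment_amount(sum_md)
```

A louder audio source gives a larger ΣMD and therefore a larger α, so more is subtracted exactly when the estimation error is expected to be larger.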

【００９７】補正係数の修正処理（ステップＳ５９）に
おいては、ＣＨ０のオーディオ成分の減算時に、減算結
果が負となった場合に、減算量が多すぎることを示す減
算量過多フラグを立て、音声区間以外の過去一定時間
（この場合３秒間とする）の累計値を計算する。例え
ば、１フレーム１０ｍｓとした場合、累計値が３００で
あれば完全に減算のし過ぎであるということが解る。こ
のような場合には、調整量αをディクリメントして修正
することにより減算のし過ぎを回避できる。In the correction process for the correction coefficient (step S59), when the result of subtracting the audio component in CH0 is negative, an over-subtraction flag indicating that the subtraction amount is too large is set, and the cumulative count over a fixed past period outside voice sections (3 seconds in this case) is calculated. For example, with a frame of 10 ms, a cumulative value of 300 means that subtraction was excessive in every frame. In such a case, over-subtraction can be avoided by decrementing the adjustment amount α.

【００９８】この場合のルールは、累計値＞２８５ならば調整量のディクリメント累計値＜２５０ならば調整量のインクリメントとし、累計データが３秒間データであるため、この判断
も３秒ごとに行う。The rule in this case is: if the cumulative value > 285, the adjustment amount is decremented; if the cumulative value < 250, the adjustment amount is incremented. Because the cumulative data covers 3 seconds, this decision is also made every 3 seconds.
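
The 3-second revision rule can be sketched as below. The thresholds 285 and 250 are from the text; the step size `ALPHA_STEP` is a hypothetical value, because the text does not say by how much α is incremented or decremented, and the function name is illustrative.

```python
ALPHA_STEP = 0.1    # HYPOTHETICAL step size; not given in the text


def revise_alpha(alpha, over_subtraction_flags):
    """Revise α from 3 s of per-frame over-subtraction flags (CH0).

    over_subtraction_flags: one boolean per non-voice frame in the last
    3 seconds (300 frames at an assumed 10 ms per frame); True means the
    subtraction result for that frame was negative.
    """
    count = sum(1 for flag in over_subtraction_flags if flag)
    if count > 285:
        return alpha - ALPHA_STEP    # subtracting too much: decrement
    if count < 250:
        return alpha + ALPHA_STEP    # room to subtract more: increment
    return alpha                     # between 250 and 285: leave α as is
```

The band between 250 and 285 acts as a dead zone, so α is not toggled back and forth on every 3-second decision.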

【００９９】ステップＳ５５において音声区間でない場
合には、オーディオ騒音成分の減算処理を行う（ステッ
プＳ６０）。この減算処理は、更新された補正係数Ｇ
（ＣＨ）と遅延データＭＤ（ＣＨ），ＲＤ（ＣＨ）とを
用いて、（数８）により音声信号ＳＤ（ＣＨ）を抽出
し、その減算データを出力する（ステップＳ６１）。If step S55 finds that it is not a voice section, the audio noise component is subtracted (step S60). In this subtraction, the voice signal SD(CH) is extracted by (Equation 8) using the updated correction coefficient G(CH) and the delay data MD(CH) and RD(CH), and the subtracted data is output (step S61).

【０１００】この実施例による効果は、オーディオ騒音
レベルの変動に応じて補正係数を更新できることであ
る。The effect of this embodiment is that the correction coefficient can be updated according to the fluctuation of the audio noise level.

【０１０１】通常、オーディオ騒音レベルが大きくなる
と推定誤差が生じるため、オーディオ騒音の残留データ
が多く残ってしまい、音声認識部における音声区間の検
出誤りが多くなる。そこで、本実施例の発明を適用する
ことにより、オーディオ騒音レベルが大きいときは上記
調整量決定ルールに従い、オールパスフィルタにおいて
多めに減算することにより、音声区間の検出誤りを少な
くすることができる。Usually, when the audio noise level becomes large, an estimation error occurs, so that a large amount of residual data of audio noise remains and the detection error of the voice section in the voice recognition unit increases. Therefore, by applying the invention of this embodiment, when the audio noise level is high, a large amount of subtraction is performed in the all-pass filter in accordance with the adjustment amount determination rule, so that the detection error of the voice section can be reduced.

【０１０２】一方、オーディオ騒音レベルが小さいとき
には、オーディオ騒音の残留レベルは少ないので、多く
減算し過ぎると音声区間が狭まるために類似度が低下す
る。かかる場合には、調整量決定ルールに従い、オール
パスフィルタにおいて少なめになるように減算すること
により、類似度を高くすることができる。On the other hand, when the audio noise level is low, the residual audio noise level is also low, and subtracting too much narrows the voice section and lowers the similarity. In such a case, the similarity can be raised by subtracting a smaller amount in the all-pass filter in accordance with the adjustment amount determination rule.

【０１０３】次にこの発明の第２の実施例について説明
する。Next, a second embodiment of the present invention will be described.

【０１０４】第２の実施例の特徴は、第１の実施例と同
様に、更新した補正係数に対して、さらに所定の調整量
を乗ずることにある。もっともこの実施例の場合には補
正係数の修正は行わない点が第１の実施例と異なる。The feature of the second embodiment resides in that the updated correction coefficient is further multiplied by a predetermined adjustment amount, as in the first embodiment. However, this embodiment is different from the first embodiment in that the correction coefficient is not modified.

【０１０５】図１２はこの第３の発明の第２の実施例に
おける音声認識装置のＣＰＵ１５の動作を示すフローチ
ャートである。このフローチャート及び図３に基づい
て、この第２の実施例の動作を説明する。FIG. 12 is a flow chart showing the operation of the CPU 15 of the voice recognition device in the second embodiment of the third invention. The operation of the second embodiment will be described based on this flowchart and FIG.

【０１０６】まず、音声認識部２１からのチャンネル選
択信号ＣＳをモニタし、ラッチ回路１４及び２０へラッ
チタイミング信号を供給し（ステップＳ７１）、並び
に、ラッチ回路１４及び２０からＣＰＵ１５への取り込
みタイミングをつくり、データＭ（ＣＨ），Ｒ（ＣＨ）
を取り込む（ステップＳ７２）。First, the channel selection signal CS from the voice recognition unit 21 is monitored, a latch timing signal is supplied to the latch circuits 14 and 20 (step S71), the timing for fetching from the latch circuits 14 and 20 into the CPU 15 is generated, and the data M(CH) and R(CH) are fetched (step S72).

【０１０７】この取り込んだデータより音声区間の検出
を行う（ステップＳ７３）。この検出は、オールパスフ
ィルタ１３ａから得られた更新前の補正係数Ｇ０（０）
を利用し、（数６）を満たすデータにより、音声の始端
とする。The voice section is detected from the fetched data (step S73). For this detection, the pre-update correction coefficient G0(0) obtained from the all-pass filter 13a is used, and the data that satisfies (Equation 6) is taken as the start end of the voice.

【０１０８】[0108]

【０１０９】次に、データのストックと遅延データの取
り出しを行う（ステップＳ７４）。現在より過去数秒間
（この場合１秒間とする）のストックデータを更新し、
これにより遅延データＭＤ（ＣＨ），ＲＤ（ＣＨ）を得
る。そして音声区間か否かを判別して（ステップＳ７
５）、音声区間でない場合には、遅延音声成分を含まな
い遅延データをストックして補正係数の候補を作成する
（ステップＳ７６）。具体的には、音声区間でないとき
に、得られた遅延データＭＤ（ＣＨ），ＲＤ（ＣＨ）を
補正係数計算用データとして、補正係数計算用ストック
データを更新する。そして、音声成分が含まれていない
過去１秒間分の累計値ΣＭＤ（ＣＨ），ΣＲＤ（ＣＨ）
を用いて下記の（数９）により、補正係数候補Ｇｃ（Ｃ
Ｈ）を求める。Next, data is stocked and the delay data is taken out (step S74). The stock data for the past several seconds (1 second in this case) is updated, which yields the delay data MD(CH) and RD(CH). It is then determined whether the current data is in a voice section (step S75); if it is not, the delay data containing no delayed voice component is stocked and a correction coefficient candidate is created (step S76). Specifically, when not in a voice section, the stock data for correction coefficient calculation is updated using the obtained delay data MD(CH) and RD(CH) as correction coefficient calculation data. The correction coefficient candidate Gc(CH) is then obtained from (Equation 9) below, using the cumulative values ΣMD(CH) and ΣRD(CH) over the past 1 second containing no voice component.

【０１１０】[0110]

【数９】補正係数候補Ｇｃ（ＣＨ）を求めた後、０．５
秒ごとに補正係数の更新を行う（ステップＳ７７）。す
なわち、カウンタを設定して、音声区間でないときにこ
のカウンタをインクリメントし、０．５秒ごとに補正係
数候補Ｇｃ（ＣＨ）を補正係数Ｇ（ＣＨ）として更新す
る。さらに更新した補正係数の調整を行う（ステップＳ
７８）。この調整は、遅延音声成分を含まない遅延デー
タＭＤ（ＣＨ）の累計値ΣＭＤ（ＣＨ）を利用して、調
整量αを調整量決定ルールより求め、オールパスフィル
タ（ＣＨ０）１３ａの補正係数をＧ′（０）（＝Ｇ
（０）・α）とする。調整量決定ルールは第４の実施例
と同じく図９に示す通りである。[Equation 9] After the correction coefficient candidate Gc(CH) is obtained, the correction coefficient is updated every 0.5 seconds (step S77). That is, a counter is set and incremented while not in a voice section, and every 0.5 seconds the correction coefficient candidate Gc(CH) is adopted as the correction coefficient G(CH). The updated correction coefficient is then adjusted (step S78). In this adjustment, the adjustment amount α is obtained from the adjustment amount determination rule using the cumulative value ΣMD(CH) of the delay data MD(CH) containing no delayed voice component, and the correction coefficient of the all-pass filter (CH0) 13a is set to G′(0) (= G(0)·α). As in the fourth embodiment, the adjustment amount determination rule is as shown in FIG. 9.

【０１１１】ステップＳ７５において音声区間でない場
合には、オーディオ騒音成分の減算処理を行う（ステッ
プＳ７９）。この減算処理は、更新された補正係数Ｇ
（ＣＨ）と遅延データＭＤ（ＣＨ），ＲＤ（ＣＨ）とを
用いて、（数８）により音声信号ＳＤ（ＣＨ）を抽出
し、その減算データを出力する（ステップＳ８０）。If step S75 finds that it is not a voice section, the audio noise component is subtracted (step S79). In this subtraction, the voice signal SD(CH) is extracted by (Equation 8) using the updated correction coefficient G(CH) and the delay data MD(CH) and RD(CH), and the subtracted data is output (step S80).

【０１１２】[0112]

【数８】この実施例による効果は、オーディオ騒音レベ
ルの変動に応じて補正係数を更新できることである。[Equation 8] The effect of this embodiment is that the correction coefficient can be updated according to fluctuations in the audio noise level.

【０１１３】通常、オーディオ騒音レベルが大きくなる
と推定誤差が生じるため、オーディオ騒音の残留データ
が多く残ってしまい、音声認識部における音声区間の検
出誤りが多くなる。そこで、本実施例の発明を適用する
ことにより、オーディオ騒音レベルが大きいときは上記
調整量決定ルールに従い、オールパスフィルタにおいて
多めに減算することにより、音声区間の検出誤りを少な
くすることができる。Usually, when the audio noise level becomes high, an estimation error occurs, so that a large amount of residual data of audio noise remains, resulting in a large number of detection errors in the voice section in the voice recognition unit. Therefore, by applying the invention of this embodiment, when the audio noise level is high, a large amount of subtraction is performed in the all-pass filter in accordance with the adjustment amount determination rule, so that the detection error of the voice section can be reduced.

【０１１４】一方、オーディオ騒音レベルが小さいとき
には、オーディオ騒音の残留レベルは少ないので、多く
減算し過ぎると音声区間が狭まるために類似度が低下す
る。かかる場合には、調整量決定ルールに従い、オール
パスフィルタにおいて少なめになるように減算すること
により、類似度を高くすることができる。On the other hand, when the audio noise level is low, the residual audio noise level is also low, and subtracting too much narrows the voice section and lowers the similarity. In such a case, the similarity can be raised by subtracting a smaller amount in the all-pass filter in accordance with the adjustment amount determination rule.

【０１１５】４．第４の発明の実施例について説明す
る。4. An embodiment of the fourth invention will be described.

【０１１６】この発明の特徴は、特に車両等の移動体内
に設置された音声認識装置において、オーディオ騒音環
境下における音声認識装置の認識率の低下を防ぐため
に、既知オーディオ信号を基準信号として適応的にオー
ディオ騒音成分を除去する方式での、車両の走行騒音の
重畳時における対策を行うことにある。The feature of this invention is that, particularly for a voice recognition device installed in a moving body such as a vehicle, it provides a countermeasure for the case where vehicle running noise is superimposed, in a scheme that adaptively removes the audio noise component using a known audio signal as a reference signal in order to prevent the recognition rate from falling in an audio noise environment.

【０１１７】この実施例の音声認識装置のブロック図
も、図３に示す第２の発明の実施例の構成と同じ構成で
あり、その説明は省略する。The block diagram of the voice recognition apparatus of this embodiment also has the same configuration as that of the embodiment of the second invention shown in FIG. 3, and the description thereof will be omitted.

【０１１８】図３において、マイク１１から入力される
メイン信号ｍａは、下記の（数１０）で表される。In FIG. 3, the main signal ma input from the microphone 11 is represented by the following (Equation 10).

【０１１９】[0119]

【数１０】ここで、ｓａは発生者からの音声をうけて、マイク１１
で電気信号として出力される音声信号であり、マイク１
１の変換特性が加わったものである。また、ｏａはオー
ディオ装置１６から送出されるオーディオ信号である。
さらに、ｇはオーディオ信号ｏａがスピーカ１７の変換
特性により音に変換され、その発生された音が伝播して
マイク１１に到達するまでに受ける伝送特性である。ま
た、ｎａは車両の走行騒音成分である。[Equation 10] Here, sa is the voice signal output as an electric signal by the microphone 11 on receiving the voice of the talker, and includes the conversion characteristic of the microphone 11. oa is the audio signal sent out from the audio device 16. g is the transfer characteristic that applies as the audio signal oa is converted into sound by the conversion characteristic of the speaker 17 and the resulting sound propagates to the microphone 11. na is the running noise component of the vehicle.

【０１２０】入力信号（ｍａ，ｏａ）を周波数解析しデ
ジタル化したデータ、Ｍ（ＣＨ），Ｒ（ＣＨ）を用いる
と、（数１０）は次の（数１１）で表すことができる。Using data (M (CH), R (CH)) obtained by frequency-analyzing and digitizing the input signal (ma, oa), (Equation 10) can be expressed by the following (Equation 11).

【０１２１】[0121]

【数１１】もっともこの（数１１）はアナログ信号をデジタル化し
てるために生ずる誤差により、左項と右項とは必ずしも
完全に等しくはならない。この式において、Ｓ（ＣＨ）
は音声信号ｓａをデジタル化したデータであり、Ｇ（Ｃ
Ｈ）はＲ（ＣＨ）に乗じてメイン信号Ｍ（ＣＨ）に含ま
れる音声信号成分Ｓ（ＣＨ）を予測するための補正係数
である。また、Ｎ（ＣＨ）は走行騒音成分ｎａをデジタ
ル化したデータである。[Equation 11] Because of the error introduced by digitizing the analog signals, the left and right sides of (Equation 11) are not necessarily exactly equal. In this equation, S(CH) is the digitized voice signal sa, and G(CH) is the correction coefficient by which R(CH) is multiplied in order to predict the voice signal component S(CH) contained in the main signal M(CH). N(CH) is the digitized running noise component na.

【０１２２】この（数１１）により、走行音声成分Ｎ
（ＣＨ）を含む音声信号Ｓ（ＣＨ）は次に示す（数１
２）で表される。From (Equation 11), the voice signal S(CH) together with the running noise component N(CH) is expressed by the following (Equation 12).

【０１２３】[0123]

【数１２】この（数１２）により、メイン信号Ｍ（ＣＨ）に含まれ
る音声信号Ｓ（ＣＨ）及び走行騒音成分Ｎ（ＣＨ）の合
成成分が予測できる。[Equation 12] From this (Equation 12), the composite component of the audio signal S (CH) and the running noise component N (CH) included in the main signal M (CH) can be predicted.
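The equation images for (Equation 10) to (Equation 12) are not reproduced in this text. Based on the symbol definitions in paragraphs [0118] to [0123], they presumably take the following form. This is a reconstruction from the surrounding definitions, not a copy of the original formulas:

```latex
\begin{align}
  ma &= sa + oa \cdot g + na
      && \text{(Equation 10, reconstructed)} \\
  M(CH) &= S(CH) + R(CH) \cdot G(CH) + N(CH)
      && \text{(Equation 11, reconstructed)} \\
  S(CH) + N(CH) &= M(CH) - R(CH) \cdot G(CH)
      && \text{(Equation 12, reconstructed)}
\end{align}
```

Setting S(CH) = 0 and N(CH) = 0 in the reconstructed (Equation 11) gives M(CH) = R(CH) · G(CH), which matches the relation stated in paragraph [0124].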

【０１２４】この補正係数Ｇ（ＣＨ）は、周波数分解能
であるチャンネル数ｎが大であるならば、音声が発生さ
れていないとき、かつ、走行騒音がゼロのときのＭ（Ｃ
Ｈ）とＲ（ＣＨ）との比により推測可能である。すなわ
ち、Ｓ（ＣＨ）＝０、Ｎ（ＣＨ）＝０とすると、Ｍ（Ｃ
Ｈ）＝Ｒ（ＣＨ）・Ｇ（ＣＨ）となり、補正係数Ｇ（Ｃ
Ｈ）は、Ｍ（ＣＨ）／Ｒ（ＣＨ）と表すことができるか
らである。If the number of channels n, which is the frequency resolution, is large, this correction coefficient G(CH) can be estimated from the ratio of M(CH) to R(CH) when no voice is being uttered and the running noise is zero. That is, if S(CH) = 0 and N(CH) = 0, then M(CH) = R(CH) · G(CH), so the correction coefficient G(CH) can be expressed as M(CH) / R(CH).
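The ratio estimate described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the list-of-floats channel representation, and the `eps` guard against a silent reference channel are all assumptions.

```python
def estimate_correction(m, r, eps=1e-12):
    """Per-channel G(CH) = M(CH) / R(CH); only meaningful when S(CH) = N(CH) = 0."""
    # A silent reference channel gives no information, so fall back to 0.0 (assumption).
    return [mi / ri if abs(ri) > eps else 0.0 for mi, ri in zip(m, r)]


def remove_audio(m, r, g):
    """Per-channel M(CH) - R(CH) * G(CH), i.e. the estimate of S(CH) + N(CH)."""
    return [mi - ri * gi for mi, ri, gi in zip(m, r, g)]


# Frame without voice or running noise: the microphone hears only the loudspeaker.
g = estimate_correction([2.0, 3.0, 0.5], [4.0, 6.0, 2.0])
# Later frame with the same reference levels plus extra energy in channel 0.
residual = remove_audio([3.0, 3.0, 0.5], [4.0, 6.0, 2.0], g)
```

In this toy example only channel 0 keeps a non-zero residual, which is the component that would be passed on to the recognition unit.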

【０１２５】ここで、変動騒音及び基準信号に含まれな
い定常騒音の環境下において、Ｇ（ＣＨ）をいかに精度
良く推定できるかが重要となる。Here, it is important how accurately G (CH) can be estimated under the environment of fluctuating noise and steady noise not included in the reference signal.

【０１２６】この場合、音声認識部２１が定常騒音を除
去する機能を有するとすると、音声成分に定常騒音であ
る走行騒音を含んでいても良いので、音声認識部２１に
供給するデータは、Ｓ（ＣＨ）＋Ｎ（ＣＨ）で良い。な
お、音声認識部が定常騒音を除去する方式は、単一マイ
クによるＳ．Ｓ法とする。In this case, if the voice recognition unit 21 has a function of removing stationary noise, the voice component may contain running noise, which is stationary noise, so the data supplied to the voice recognition unit 21 may be S(CH) + N(CH). The voice recognition unit is assumed to remove stationary noise by the single-microphone S.S. method.

【０１２７】図１３は第４の発明の実施例における音声
認識装置のＣＰＵ１５の動作を表すフローチャートであ
る。このフローチャート及び図３に基づいて、この第４
の発明の実施例の動作を説明する。FIG. 13 is a flowchart showing the operation of the CPU 15 of the voice recognition device in the embodiment of the fourth invention. The operation of this embodiment is described with reference to this flowchart and FIG. 3.

【０１２８】まず、音声認識部２１からのチャンネル選
択信号ＣＳをモニタし、ラッチ回路１４及び２０へラッ
チタイミング信号を供給し（ステップＳ８１）、並び
に、ラッチ回路１４及び２０からＣＰＵ１５への取り込
みタイミングをつくり、データＭ（ＣＨ），Ｒ（ＣＨ）
を取り込む（ステップＳ８２）。First, the channel selection signal CS from the voice recognition unit 21 is monitored and a latch timing signal is supplied to the latch circuits 14 and 20 (step S81). The timing for fetching from the latch circuits 14 and 20 into the CPU 15 is also generated, and the data M(CH) and R(CH) are fetched (step S82).

【０１２９】この取り込んだデータより音声区間の検出
を行う（ステップＳ８３）。この検出は、オールパスフ
ィルタ１３ａから得られた更新前の補正係数Ｇ０（０）
を利用し、（数６）を満たすデータにより、音声の始端
とする。なおこの場合のトリガレベルは固定値である。A voice section is detected from the fetched data (step S83). This detection uses the pre-update correction coefficient G0(0) obtained for the all-pass filter 13a, and the data that satisfies (Equation 6) is taken as the start of the voice. The trigger level in this case is a fixed value.

【０１３０】[0130]

【０１３１】次に、データのストックと遅延データの取
り出しを行う（ステップＳ８４）。現在より過去数秒間
（この場合１秒間とする）のストックデータを更新し、
これにより遅延データＭＤ（ＣＨ），ＲＤ（ＣＨ）を得
る。そして音声区間か否かを判別して（ステップＳ８
５）、音声区間でない場合には、遅延音声成分を含まな
い遅延データをストックして補正係数の候補を作成する
（ステップＳ８６）。具体的には、音声区間でないとき
に、得られた遅延データＭＤ（ＣＨ），ＲＤ（ＣＨ）を
補正係数計算用データとして、補正係数計算用ストック
データを更新する。そして、音声成分が含まれていない
過去１秒間分の累計値ΣＭＤ（ＣＨ），ΣＲＤ（ＣＨ）
を用いて下記の（数９）により、補正係数候補Ｇｃ（Ｃ
Ｈ）を求める。Next, the data is stocked and the delayed data is taken out (step S84). The stock data covering the last few seconds (1 second in this case) is updated, which yields the delayed data MD(CH) and RD(CH). It is then determined whether the current frame is in a voice section (step S85). If it is not, the delayed data containing no delayed voice component is stocked and a correction coefficient candidate is created (step S86). Specifically, when not in a voice section, the obtained delayed data MD(CH) and RD(CH) are used as correction coefficient calculation data to update the correction coefficient calculation stock data. The correction coefficient candidate Gc(CH) is then obtained by the following (Equation 9) from the cumulative values ΣMD(CH) and ΣRD(CH) over the past 1 second that contain no voice component.
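The stocking step and the candidate calculation can be sketched as below. (Equation 9) is not reproduced in this text, so the sketch assumes it is the per-channel ratio ΣMD(CH) / ΣRD(CH), by analogy with G(CH) = M(CH) / R(CH) in paragraph [0124]. The class name, the 100-frame window (1 second at an assumed 10 ms frame period), and the `eps` guard are hypothetical.

```python
from collections import deque


class CoefficientCandidate:
    """Keeps about 1 s of non-voice delayed frames and derives Gc(CH) from their sums."""

    def __init__(self, frames_per_window=100):
        # deque(maxlen=...) silently drops the oldest frame when a new one arrives.
        self.md = deque(maxlen=frames_per_window)
        self.rd = deque(maxlen=frames_per_window)

    def push(self, md_frame, rd_frame, is_voice):
        # Only frames outside a voice section enter the calculation stock.
        if not is_voice:
            self.md.append(list(md_frame))
            self.rd.append(list(rd_frame))

    def candidate(self, eps=1e-12):
        if not self.md:
            return None
        channels = range(len(self.md[0]))
        sum_md = [sum(f[ch] for f in self.md) for ch in channels]
        sum_rd = [sum(f[ch] for f in self.rd) for ch in channels]
        # Assumed form of (Equation 9): Gc(CH) = sum(MD) / sum(RD).
        return [m / r if r > eps else 0.0 for m, r in zip(sum_md, sum_rd)]
```

Summing before dividing, rather than averaging per-frame ratios, keeps a single quiet reference frame from dominating the estimate.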

【０１３２】[0132]

【数９】次に、各種フラグの設定を行う（ステップＳ８
７）。この場合のフラグとしては、走行騒音環境下であ
るかどうかのフラグ（Ｎ−ＦＬＡＧ）、及び、音楽騒音
環境下であるかどうかのフラグ（Ｍ−ＦＬＡＧ）を設定
する。[Equation 9] Next, various flags are set (step S87). The flags set here are a flag indicating whether the environment contains running noise (N-FLAG) and a flag indicating whether it contains music noise (M-FLAG).

【０１３３】Ｎ−ＦＬＡＧの設定は、ステップＳ８６で
得られたＣＨ１の遅延音声成分を含まないデータの累計
値ΣＭＤ（１），ΣＲＤ（１）と、走行音声成分を含ま
ない３０秒間の累計値ΣＭＤ（１），ΣＲＤ（１）のさ
らに累計値ΣΣＭＤ（１），ΣΣＲＤ（１）（これにつ
いては後述する）を用い、次の（数１３）を満たす場合
にＮ−ＦＬＡＧをたてる。N-FLAG is set when the following (Equation 13) is satisfied. The test uses the cumulative values ΣMD(1) and ΣRD(1) of the CH1 data containing no delayed voice component, obtained in step S86, and the further cumulative values ΣΣMD(1) and ΣΣRD(1) (described later) of the 30 seconds of cumulative values ΣMD(1) and ΣRD(1) that contain no running noise component.

【０１３４】[0134]

【数１３】Ｍ−ＦＬＡＧの設定は、ステップＳ８６で得られたＣＨ
０リファレンス信号の遅延音声成分を含まないデータの
累計値ΣＲＤ（０）を用い、次の（数１４）を満たす場
合にＭ−ＦＬＡＧをたてる。[Equation 13] M-FLAG is set when the following (Equation 14) is satisfied, using the cumulative value ΣRD(0) of the CH0 reference signal data containing no delayed voice component, obtained in step S86.

【０１３５】[0135]

【数１４】以下、Ｎ−ＦＬＡＧがたっている場合をＮＦ＝１、たっ
ていない場合をＮＦ＝０と表し、Ｍ−ＦＬＡＧがたって
いる場合をＭＦ＝１、たっていない場合をＭＦ＝０と表
す。[Equation 14] Hereinafter, NF = 1 denotes that N-FLAG is set and NF = 0 that it is not set; likewise, MF = 1 denotes that M-FLAG is set and MF = 0 that it is not set.

【０１３６】その後、これら２つのフラグを判定して、
補正係数の更新（ステップＳ８８）、補正係数の調整
（ステップＳ８９）、補正係数の修正（ステップＳ９
０）を行う。補正係数の更新については、ＮＦ＝０，Ｍ
Ｆ＝１の場合には、カウンタを設定し、音声区間でない
ときにカウンタをインクリメントし、一定時間（この場
合、０．５秒）おきにＧ（ＣＨ）＝Ｇｃ（ＣＨ）として
補正係数を更新する。These two flags are then evaluated, and the correction coefficient is updated (step S88), adjusted (step S89), and revised (step S90). For the update, when NF = 0 and MF = 1, a counter is set up and incremented whenever the frame is not in a voice section, and the correction coefficient is updated as G(CH) = Gc(CH) at fixed intervals (every 0.5 seconds in this case).

【０１３７】さらに、３０秒平均補正係数の計算を行
う。すなわち、ステップＳ８６で得られた遅延音声成分
を含まないデータの累計値ΣＭＤ（ＣＨ），ΣＲＤ（Ｃ
Ｈ）を一定時間（この場合、０．５秒間）ごとにストッ
クし、同時に過去３０秒間のさらなる累計値ΣΣＭＤ
（ＣＨ），ΣΣＲＤ（ＣＨ）を求める。ここで求めたＣ
Ｈ０の累計値が（数１３）に用いられるデータとなる。Further, the 30-second average correction coefficient is calculated. That is, the cumulative values ΣMD(CH) and ΣRD(CH) of the data containing no delayed voice component, obtained in step S86, are stocked at fixed intervals (every 0.5 seconds in this case), and at the same time the further cumulative values ΣΣMD(CH) and ΣΣRD(CH) over the past 30 seconds are obtained. The CH0 cumulative values obtained here are the data used in (Equation 13).
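The 30-second accumulation can be sketched as a second, coarser window on top of the 1-second sums. The class name and the choice of 60 snapshots (30 seconds at one snapshot per 0.5 seconds) are assumptions derived from the intervals stated above, not details given in the source.

```python
from collections import deque


class LongTermSums:
    """Stocks the 1-second sums every 0.5 s and totals them over the last 30 s."""

    def __init__(self, snapshots=60):  # 60 snapshots x 0.5 s = 30 s
        self.md = deque(maxlen=snapshots)
        self.rd = deque(maxlen=snapshots)

    def stock(self, sum_md, sum_rd):
        # Called once per 0.5 s with the current per-channel sums of MD and RD.
        self.md.append(list(sum_md))
        self.rd.append(list(sum_rd))

    def totals(self):
        """Return the per-channel double sums of MD and RD over the window."""
        if not self.md:
            return [], []
        channels = range(len(self.md[0]))
        total_md = [sum(s[ch] for s in self.md) for ch in channels]
        total_rd = [sum(s[ch] for s in self.rd) for ch in channels]
        return total_md, total_rd
```

Because each snapshot is itself a 1-second sum taken every 0.5 seconds, consecutive snapshots overlap; the sketch keeps that behavior rather than de-duplicating frames.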

【０１３８】ＮＦ＝１，ＭＦ＝１の場合には、次の（数
１５）により補正係数を決定する。When NF = 1 and MF = 1, the correction coefficient is determined by the following (Equation 15).

【０１３９】[0139]

【数１５】補正係数の調整についても２つのフラグの値によってそ
れぞれ異なる調整を行い、ＮＦ＝０，ＭＦ＝１の場合に
は、メイン信号の遅延音声成分を含まないデータの累計
値ΣＭＤ（０）を利用して、調整量αを図９の調整量決
定ルールより求め、オールパスフィルタ（ＣＨ０）１３
ａの補正係数をＧ′（０）（＝Ｇ（０）・α）とする。
ここで調整量決定ルールは、次の調整量の修正のため予
め減算量が多めになるように設定されている。[Equation 15] The adjustment of the correction coefficient also differs according to the values of the two flags. When NF = 0 and MF = 1, the adjustment amount α is obtained from the adjustment amount determination rule of FIG. 9 using the cumulative value ΣMD(0) of the main signal data containing no delayed voice component, and the correction coefficient of the all-pass filter (CH0) 13a is set to G'(0) (= G(0) · α). The adjustment amount determination rule is set in advance to give a somewhat larger subtraction amount, in anticipation of the revision of the adjustment amount described next.

【０１４０】一方、ＮＦ＝１，ＭＦ＝１の場合には、メ
イン信号の遅延音声成分を含まないデータの累計値ΣＭ
Ｄ（０）には走行騒音成分が含まれるため、次の（数１
６）によりΣＭＤ（０）の推定値ΣＭＤ inf（０）を求
める。On the other hand, when NF = 1 and MF = 1, the cumulative value ΣMD(0) of the main signal data containing no delayed voice component includes a running noise component, so the estimated value ΣMD inf(0) of ΣMD(0) is obtained by the following (Equation 16).

【０１４１】[0141]

【数１６】その後、ＮＦ＝０，ＭＦ＝１の場合と同様に、調整量α
を図９の調整量決定ルールより求め、オールパスフィル
タ（ＣＨ０）１３ａの補正係数をＧ′（０）（＝Ｇ
（０）・α）とする。[Equation 16] Then, as in the case of NF = 0 and MF = 1, the adjustment amount α is obtained from the adjustment amount determination rule of FIG. 9, and the correction coefficient of the all-pass filter (CH0) 13a is set to G'(0) (= G(0) · α).

【０１４２】次に、補正係数の修正については、Ｍ−Ｆ
ＬＡＧのみに注目し、ＭＦ＝１の場合に調整量の修正を
行う。ＣＨ０のオーディオ成分の減算時に、減算結果が
負となった場合に、減算量が多すぎることを示す減算量
過多フラグを立て、音声区間以外の過去一定時間（この
場合３秒間とする）の累計値を計算する。例えば、１フ
レーム１０ｍｓとした場合、累計値が３００であれば完
全に減算のし過ぎであるということが解る。このような
場合には、調整量αをディクリメントして修正すること
により減算のし過ぎを回避できる。Next, the revision of the correction coefficient looks only at M-FLAG, and the adjustment amount is revised when MF = 1. When the result of subtracting the CH0 audio component is negative, an over-subtraction flag is raised to indicate that too much was subtracted, and these flags are totaled over a fixed past period (3 seconds in this case) outside voice sections. For example, with 10 ms frames, a total of 300 means that every frame was over-subtracted. In such a case, over-subtraction can be avoided by decrementing the adjustment amount α.

【０１４３】この場合のルールは、累計値＞２８５ならば調整量のディクリメント累計値＜２５０ならば調整量のインクリメントとし、累計データが３秒間データであるため、この判断
も３秒ごとに行う。The rule in this case is: if the cumulative value > 285, decrement the adjustment amount; if the cumulative value < 250, increment the adjustment amount. Because the cumulative data covers 3 seconds, this decision is also made every 3 seconds.
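The revision rule can be sketched as follows. The thresholds 285 and 250 and the sign test on the CH0 subtraction result come from the text; the function names, the `step` size, and the use of G(0) · α as the subtraction gain for counting are assumptions.

```python
def count_oversubtraction(md0, rd0, g0, alpha):
    """Count non-voice CH0 frames whose subtraction result MD - RD * G * alpha is negative."""
    return sum(1 for m, r in zip(md0, rd0) if m - r * g0 * alpha < 0)


def revise_adjustment(alpha, over_count, step=0.05, lower=250, upper=285):
    """Apply the 3-second rule: too many negative results means too much subtraction."""
    if over_count > upper:
        return alpha - step  # decrement the adjustment amount
    if over_count < lower:
        return alpha + step  # increment the adjustment amount
    return alpha             # 250..285: leave the adjustment amount unchanged


# Three example frames; only the second one goes negative after subtraction.
n_over = count_oversubtraction([1.0, 1.0, 1.0], [1.0, 3.0, 0.5], g0=1.0, alpha=1.0)
```

With 10 ms frames the 3-second window holds 300 frames, so the dead band between 250 and 285 keeps α steady while most, but not all, frames go slightly negative.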

【０１４４】ステップＳ８５において音声区間でない場
合には、フラグの設定をして（ステップＳ９１）、オー
ディオ騒音成分の減算処理を行う（ステップＳ９２）。
この減算処理は、更新された補正係数Ｇ（ＣＨ）と遅延
データＭＤ（ＣＨ），ＲＤ（ＣＨ）とを用いて、（数
８）により音声信号ＳＤ（ＣＨ）を抽出し、その減算デ
ータを出力する（ステップＳ９３）。If it is not a voice section in step S85, the flags are set (step S91) and the audio noise component is subtracted (step S92). In this subtraction, the voice signal SD(CH) is extracted by (Equation 8) using the updated correction coefficient G(CH) and the delayed data MD(CH) and RD(CH), and the subtracted data is output (step S93).

【０１４５】この第４の発明の実施例によれば、既知雑
音環境下の適応型Ｓ．Ｓ法において、音声成分に走行騒
音が重畳した場合であっても、適正な補正係数でオーデ
ィオ騒音成分を除去し、走行騒音除去に対しては音声認
識部の機能をそのまま利用することができる。According to this embodiment of the fourth invention, in the adaptive S.S. method under a known-noise environment, the audio noise component can be removed with an appropriate correction coefficient even when running noise is superimposed on the voice component, and the existing function of the voice recognition unit can be used as it is to remove the running noise.

【０１４６】第２の発明の第３の実施例における音声認
識装置の音声始端を検出する様子を示す図である。This is a diagram showing how the voice start point is detected in the voice recognition device of the third embodiment of the second invention.

【０１４７】５．第５ないし第７の発明の実施例につい
て説明する。5. Embodiments of the fifth to seventh inventions will be described.

【０１４８】第５ないし第７の発明は、それぞれ、上記
第１ないし第４の発明における音声トリガレベルの決定
方法、走行騒音判定レベルの決定方法、及び、補正係数
の調整量の決定方法をファジィ推論により行うものであ
る。In the fifth to seventh inventions, the method of determining the voice trigger level, the method of determining the running noise determination level, and the method of determining the adjustment amount of the correction coefficient in the first to fourth inventions are each carried out by fuzzy inference.

【０１４９】なお、これらの実施例のシステム構成は図
３のブロック図と同一であるのでその説明は省略する。
図１４は第５ないし第７の発明における音声トリガレベ
ルの決定方法、走行騒音判定レベルの決定方法、及び、
補正係数の調整量の決定方法をファジィ推論により行っ
た場合の、ＣＰＵ１５の動作を表すフローチャートであ
る。The system configuration of these embodiments is the same as the block diagram of FIG. 3, so its description is omitted. FIG. 14 is a flowchart showing the operation of the CPU 15 when the voice trigger level, the running noise determination level, and the adjustment amount of the correction coefficient in the fifth to seventh inventions are determined by fuzzy inference.

【０１５０】図３及び図１４において、音声認識部２１
のＣＨ信号をモニタし、それからラッチ回路１４及び２
０のラッチタイミングとＣＰＵ１５の取り込みタイミン
グをつくり、データ（Ｍ（ＣＨ），Ｒ（ＣＨ））を取り
込む（ステップＳ１０１）。次に、ファジィ推論によ
り、音声トリガレベルを決定して音声区間を検出する
（ステップＳ１０２）。この場合、更新前の補正係数を
Ｇ０（ＣＨ）としてこれをを利用し、In FIG. 3 and FIG. 14, the CH signal of the voice recognition unit 21 is monitored, the latch timing of the latch circuits 14 and 20 and the fetch timing of the CPU 15 are generated from it, and the data (M(CH), R(CH)) are fetched (step S101). Next, the voice trigger level is determined by fuzzy inference and the voice section is detected (step S102). In this case, the pre-update correction coefficient is denoted G0(CH) and is used as follows:

【０１５１】[0151]

【数６】を音声の始端とする。音声区間は音声始端から
２．６秒間（これは音声認識装置の最大音声区間長）と
する。この場合、上記第２ないし第４の発明と同様、音
声始端からすなわち現在から過去数秒間（この場合１秒
間）のストックデータをＲＡＭから読み出して、遅延デ
ータＭＤ（ＣＨ）、ＲＤ（ＣＨ）を得る。そして音声区
間かどうかを判別し（ステップＳ１０３）、音声区間で
ない場合には、遅延データＭＤ（ＣＨ）、ＲＤ（ＣＨ）
を補正係数の計算用データとして補正係数計算用ストッ
クデータを更新する。そして音声成分未含有の遅延デー
タの過去１秒間の累計値であるΣＭＤ（ＣＨ）とΣＲＤ
（ＣＨ）とを計算し、補正係数の候補Ｇｃ（ＣＨ）を、The point satisfying [Equation 6] is taken as the start of the voice. The voice section lasts 2.6 seconds from the voice start (the maximum voice section length of the voice recognition device). As in the second to fourth inventions, the stock data for the few seconds (1 second in this case) preceding the voice start, that is, preceding the present, is read from the RAM to obtain the delayed data MD(CH) and RD(CH). It is then determined whether the frame is in a voice section (step S103). If it is not, the delayed data MD(CH) and RD(CH) are used as correction coefficient calculation data to update the correction coefficient calculation stock data. The cumulative values ΣMD(CH) and ΣRD(CH) of the delayed data containing no voice component over the past 1 second are then calculated, and the correction coefficient candidate Gc(CH) is obtained as follows.

【０１５２】[0152]

【数９】より作成する。The candidate Gc(CH) is created from [Equation 9].

【０１５３】そして音響騒音すなわちオーディオ騒音成
分があるかどうかを判別する（ステップＳ１０４）。そ
の判別結果に応じて音響騒音環境下であるかどうかのフ
ラグ（これを「ＭＦＬＡＧ」と称する）を設定する。
この設定は、ＣＨ０リファレンス信号の遅延音声成分未
含有データの累計値ΣＲＤ（１）を用い、It is then determined whether acoustic noise, that is, an audio noise component, is present (step S104). According to the result, a flag indicating whether the environment contains acoustic noise (referred to as "M-FLAG") is set. This setting uses the cumulative value ΣRD(1) of the CH0 reference signal data containing no delayed voice component:

【０１５４】[0154]

【数１４】の条件を満たしたときにＭＦＬＡＧをたて
る。M-FLAG is set when the condition of [Equation 14] is satisfied.

【０１５５】オーディオ騒音成分があるときは走行騒音
があるかどうかを、ファジィ推論により判別する（ステ
ップＳ１０５）。その判別結果に応じて走行騒音環境下
であるかどうかのフラグ（これを「ＮＦＬＡＧ」と称
する）を設定する。この設定において、走行騒音判定レ
ベルはファジィ推論により判別するが（ステップＳ１０
５）、このとき、前回求めた音響騒音レベルΣＭＤ
（０）を用いて決定する。そして、ＣＨ１の遅延音声成
分未含有データの累計値ΣＭＤ（１）、ΣＲＤ（１）
と、走行騒音成分の含まれない３０秒間の累計値（これ
については後述する）を用い、When an audio noise component is present, fuzzy inference is used to determine whether running noise is present (step S105). According to the result, a flag indicating whether the environment contains running noise (referred to as "N-FLAG") is set. The running noise determination level used here is determined by fuzzy inference (step S105) from the previously obtained acoustic noise level ΣMD(0). The flag is then evaluated using the cumulative values ΣMD(1) and ΣRD(1) of the CH1 data containing no delayed voice component and the 30-second cumulative values containing no running noise component (described later):

【０１５６】[0156]

【数１３】の条件を満たしたときにＮＦＬＡＧをたて
る。N-FLAG is set when the condition of [Equation 13] is satisfied.

【０１５７】ＭＦＬＡＧがたっていて、ＮＦＬＡＧ
がたっていない場合には、適応的に補正係数を更新する
（ステップＳ１０６）。具体的には、カウンタを設定
し、音声区間でないときにこのカウンタを進ませ、かつ
数ミリ秒（この場合、０．５秒）おきにＧ（ＣＨ）＝Ｇ
ｃ（ＣＨ）として補正係数を更新する。さらにステップ
Ｓ１０３で得られた遅延音声成分未含有データの累積値
（ΣＭＤ（ＣＨ），ΣＲＤ（ＣＨ））を数秒間毎（ここ
では０．５秒）にストックし、同時に過去３０秒間の更
なる累積値（ΣΣＭＤ（ＣＨ），ΣΣＲＤ（ＣＨ））を
求める。そしてファジィ推定により調整量の決定及び修
正を行い（ステップＳ１０７）、さらに決定あるいは修
正した調整量により減算処理を行い（ステップＳ１０
８）、その減算結果を出力する（ステップＳ１０９）。When M-FLAG is set and N-FLAG is not set, the correction coefficient is updated adaptively (step S106). Specifically, a counter is set up and advanced whenever the frame is not in a voice section, and the correction coefficient is updated as G(CH) = Gc(CH) at fixed intervals (every 0.5 seconds in this case). In addition, the cumulative values (ΣMD(CH), ΣRD(CH)) of the data containing no delayed voice component, obtained in step S103, are stocked at fixed intervals (every 0.5 seconds here), and at the same time the further cumulative values (ΣΣMD(CH), ΣΣRD(CH)) over the past 30 seconds are obtained. The adjustment amount is then determined and revised by fuzzy inference (step S107), the subtraction is performed with the determined or revised adjustment amount (step S108), and the subtraction result is output (step S109).

【０１５８】一方、ＭＦＬＡＧ及びＮＦＬＡＧがと
もにたっている場合には、ステップＳ１０６で求めた遅
延音声成分未含有データの過去３０秒間の累計値である
ΣΣＭＤ（ＣＨ）、ΣΣＲＤ（ＣＨ）を用いて、次式、On the other hand, when both M-FLAG and N-FLAG are set, the cumulative values ΣΣMD(CH) and ΣΣRD(CH) over the past 30 seconds of the data containing no delayed voice component, obtained in step S106, are used in the following equation:

【数１５】により補正係数を更新する（ステップＳ１１
０）。その更新した補正係数によりパラメータを推定し
（ステップＳ１１１）、ファジィ推論により調整量の決
定と修正を行い（ステップＳ１０７）、その決定あるい
は修正した調整量により減算処理を行い（ステップＳ１
０８）、その減算結果を出力する（ステップＳ１０
９）。なお、、ステップＳ１０３において音声区間であ
る場合、及びステップＳ１０４においてオーディオ騒音
成分がない場合には、調整量の決定及び修正を行うこと
なくステップＳ１０８に移行して減算処理を行う。The correction coefficient is updated by [Equation 15] (step S110). Parameters are estimated from the updated correction coefficient (step S111), the adjustment amount is determined and revised by fuzzy inference (step S107), the subtraction is performed with the determined or revised adjustment amount (step S108), and the subtraction result is output (step S109). If the frame is in a voice section in step S103, or if there is no audio noise component in step S104, the process goes to step S108 and performs the subtraction without determining or revising the adjustment amount.

【０１５９】第５の発明による音声トリガレベル決定方
法を説明する。図１５にステップＳ１０２における音声
トリガレベルを決定するためのファジィ推論方法を示
す。すなわち、第４の発明の実施例においては、固定値
の音声トリガレベルにより音声区間を検出したが、本発
明においては、図１５のファジィルールに基づいて音声
トリガレベルを決定する。A method for determining a voice trigger level according to the fifth aspect of the invention will be described. FIG. 15 shows a fuzzy inference method for determining the voice trigger level in step S102. That is, in the embodiment of the fourth invention, the voice section is detected by the fixed value voice trigger level, but in the present invention, the voice trigger level is determined based on the fuzzy rule of FIG.

【０１６０】適応型Ｓ．Ｓ．方式は、前処理的に減算を
行った結果に基づいて音声始端を決定する方式であるた
め、前処理的に減算を行った結果を参照して、本方式の
音声トリガレベル決定のファジィルールを作成した。こ
のファジィルールは、ＭＡＸ−ＭＩＮ重心法による方法
である。この重心法とは、各ルール（この場合、ルール
１ないし６）ごとに推論結果を求め、各ルールにおける
推論結果を総合して、その重心としてルール全体の推論
結果を得る方法である。図１６は、図１５のファジィル
ールに対応した減算結果を示すものであり、ルール１〜
６が各々図１６（ａ）〜（ｆ）の場合を想定している。
すなわち、残留レベルに応じて音声トリガレベルを調整
している。The adaptive S.S. method determines the voice start point from the result of a preliminary subtraction, so the fuzzy rules for determining the voice trigger level were created with reference to that preliminary subtraction result. The fuzzy inference uses the MAX-MIN centroid method, in which an inference result is obtained for each rule (rules 1 to 6 here), the per-rule results are combined, and the centroid of the combination is taken as the inference result of the whole rule set. FIG. 16 shows the subtraction results corresponding to the fuzzy rules of FIG. 15; rules 1 to 6 assume the cases of FIGS. 16(a) to 16(f), respectively. In other words, the voice trigger level is adjusted according to the residual level.
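The MAX-MIN centroid method named above can be sketched generically as follows. The six rules of FIG. 15 and their membership functions are not reproduced in this text, so the two rules, the `low`/`high`/`tri` membership shapes, the normalized 0..1 scales, and the 101-point output grid are all hypothetical placeholders chosen only to show the mechanics.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def low(x):
    return min(1.0, max(0.0, 1.0 - x))


def high(x):
    return min(1.0, max(0.0, x))


def max_min_centroid(rules, inputs, y_grid):
    """rules: list of (antecedent membership functions, consequent membership function)."""
    num = den = 0.0
    for y in y_grid:
        mu = 0.0
        for antecedents, consequent in rules:
            # MIN: a rule fires as strongly as its weakest antecedent.
            strength = min(f(x) for f, x in zip(antecedents, inputs))
            # Clip the consequent at that strength, then MAX-combine across rules.
            mu = max(mu, min(strength, consequent(y)))
        num += y * mu
        den += mu
    # The centroid of the combined output set is the crisp result.
    return num / den if den > 0 else None


# Hypothetical rules: inputs are (acoustic noise level, running noise level).
RULES = [((low, low), low), ((high, high), high)]
Y_GRID = [i / 100 for i in range(101)]
quiet = max_min_centroid(RULES, (0.1, 0.1), Y_GRID)
loud = max_min_centroid(RULES, (0.9, 0.9), Y_GRID)
```

With these placeholder rules the inferred trigger level rises smoothly as both noise levels rise, which illustrates the interpolation between rules that the text attributes to fuzzy inference.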

【０１６１】本発明のファジィ推論による音声トリガレ
ベル決定方法により、次にような効果が得られる。The following effects can be obtained by the method for determining a voice trigger level by fuzzy inference according to the present invention.

【０１６２】１．音響騒音レベルと走行騒音レベルに応
じて音声トリガレベルを決定することが可能となり、残
留成分が大きい時にはその残留成分でトリガが掛かる事
の無いようにトリガレベルを大きくし、残留成分が少な
い時には比較的発声レベルが低い発声者に対して発声音
に音声トリガが掛かり難くなることの無いようトリガレ
ベルを小さくする事により、適応型Ｓ．Ｓ．方式の効果
を向上させることが可能となった。1. The voice trigger level can be determined according to the acoustic noise level and the running noise level. When the residual component is large, the trigger level is raised so that the residual does not fire the trigger; when the residual component is small, the trigger level is lowered so that the utterance of a talker with a relatively low speaking level still fires the voice trigger. This improves the effect of the adaptive S.S. method.

【０１６３】２．ファジィ推論を利用することにより、
通常の制御では難しかった複数のパラメータ（ここでは
音響騒音レベルと走行騒音レベルの２パラメータ）によ
る制御ルールの作成、調整が容易となった。2. Using fuzzy inference makes it easy to create and tune control rules based on multiple parameters (here two, the acoustic noise level and the running noise level), which was difficult with ordinary control.

【０１６４】３．該発明のファジィルールは６つのルー
ルで示されるが、ファジィ推論による補間効果により、
中間的な値に対しても適切な制御が可能となり、きめ細
かい制御が可能となった。3. The fuzzy rule of the invention is shown by six rules, but due to the interpolation effect by fuzzy inference,
Appropriate control is possible even for intermediate values, and fine control is possible.

【０１６５】４．ロンバート効果を考えると、騒音レベ
ルが低い状態（結果として本方式による騒音の残留レベ
ルが小さい場合、例えば図１６（ｄ））では同一話者で
も発声レベルは低くなる。又、その逆の場合にはロンバ
ート効果により発声レベルは高くなる。よって、ロンバ
ート効果による音声レベルの変動にも本方式はマッチし
ている。4. Considering the Lombard effect, in the state where the noise level is low (as a result, when the residual level of noise according to this method is small, for example, FIG. 16D), the utterance level is low even for the same speaker. In the opposite case, the Lombard effect raises the vocalization level. Therefore, this method matches the fluctuation of the voice level due to the Lombard effect.

【０１６６】次に、第６の発明である走行騒音判定レベ
ル決定方法について説明する。図１７に図１４のステッ
プＳ１０５における走行騒音の有無を判定するための、
走行騒音判定フラグのしきい値決定ルールを示す。ルー
ル全体の推論結果の計算はＭＡＸ−ＭＩＮ重心法によ
る。Next, the method of determining the running noise determination level according to the sixth invention is described. FIG. 17 shows the threshold determination rule for the running noise determination flag, used to judge the presence or absence of running noise in step S105 of FIG. 14. The inference result of the whole rule set is calculated by the MAX-MIN centroid method.

【０１６７】図１３におけるフラグ設定においては、走
行騒音を判定してフラグを立てるしきい値が固定であっ
た。この固定値は、音響騒音レベルが有る程度小さい時
に、走行騒音が音響騒音よりも支配的になる走行騒音レ
ベルの手前の走行騒音レベルを採用していた。しかし実
際には、音響騒音レベルがかなり大きいと、走行騒音が
音響騒音よりも支配的となる走行騒音レベルは上方にシ
フトする。In the flag setting in FIG. 13, the threshold value for determining the running noise and setting the flag is fixed. As this fixed value, when the acoustic noise level is small to some extent, the traveling noise level before the traveling noise level at which the traveling noise becomes dominant over the acoustic noise is adopted. However, in reality, when the acoustic noise level is considerably high, the traveling noise level in which the traveling noise becomes dominant over the acoustic noise shifts upward.

【０１６８】そのため、必要以上に走行騒音レベルが低
い位置で補正係数の更新及び学習がなされなくなり、適
応的な処理の効果が低減してしまった。そこで、ファジ
ィ推論により、音響レベルに応じて走行騒音判定フラグ
のしきい値を決定する。Therefore, the correction coefficient is not updated and learned at a position where the traveling noise level is lower than necessary, and the effect of adaptive processing is reduced. Therefore, the threshold value of the running noise determination flag is determined according to the sound level by fuzzy inference.

【０１６９】このルールは音響騒音と走行騒音とがどち
らが支配的となるかを考慮している。つまり、音響騒音
レベルが「かなり大きい」時には全般的に音響騒音が支
配的となり易く、走行騒音が支配的となる走行騒音レベ
ルはかなり上方に位置するため、走行騒音判定フラグの
しきい値も「かなり大きい」とする。This rule takes into account which of acoustic noise and running noise is dominant. When the acoustic noise level is "quite large", acoustic noise tends to dominate overall, and the running noise level at which running noise becomes dominant lies considerably higher, so the threshold of the running noise determination flag is also set to "quite large".

【０１７０】逆に、音響騒音レベルが「有る程度小さ
い」時には、走行騒音が支配的となり易く、その走行騒
音レベルは低いレベルに位置されるため、走行騒音判定
フラグのしきい値は「小さい」とする。Conversely, when the acoustic noise level is "somewhat small", running noise tends to dominate and does so at a low running noise level, so the threshold of the running noise determination flag is set to "small".

【０１７１】この第６の発明によれば、ファジィ推論に
より、音響騒音レベルに応じて走行騒音判定フラグのし
きい値を決定することが可能となり、適応型Ｓ．Ｓ．方
式の効果を向上させることができる。According to the sixth invention, fuzzy inference makes it possible to determine the threshold of the running noise determination flag according to the acoustic noise level, which improves the effect of the adaptive S.S. method.

【０１７２】また、この発明の調整量決定ルールは音響
騒音レベルに応じて２つのルールで示されるが、ファジ
ィ推論による補間効果により、中間的な値に対しても適
切な制御が可能となり、きめ細かい制御が可能となっ
た。Further, although the adjustment amount determination rule of the present invention is shown by two rules according to the acoustic noise level, the interpolation effect by the fuzzy reasoning makes it possible to appropriately control even an intermediate value, and to perform fine adjustment. Control became possible.

【０１７３】次に、第７の発明であるファジィ推論によ
る調整量決定方法について説明する。図１８は図１４の
ステップＳ１０８における調整量を決定するための調整
量決定ルールを示す。また、図１９は走行騒音レベルに
より減算量を変化させた場合の減算結果を示す図であ
る。図１９（ａ）は走行騒音レベルが「かなり大きい」
時の通常の減算量の場合の減算結果を示し、図１９
（ｂ）は走行騒音レベルが「ある程度大きい」時に減算
量を少な目にした場合の減算結果を示し、図１９（ｃ）
は走行騒音レベルが「ある程度大きい」時に減算量を多
目にした場合の減算結果を示す。また、図２０は走行騒
音レベル及び音響騒音レベルと調整量との関係の概略を
示す図である。この図２０でａは走行騒音レベルが「小
さい」時の図１８におけるルール１及びルール２による
もので、ｂは走行騒音レベルが「ある程度大き」い時の
ルール１及びルール３によるもので、ｃは走行騒音レベ
ルが「かなり大きい」時のルール１及びルール４による
ものである。Next, the method of determining the adjustment amount by fuzzy inference according to the seventh invention is described. FIG. 18 shows the adjustment amount determination rule for determining the adjustment amount in step S108 of FIG. 14. FIG. 19 shows the subtraction results when the subtraction amount is varied according to the running noise level: FIG. 19(a) shows the result with the normal subtraction amount when the running noise level is "quite large", FIG. 19(b) the result with a smaller subtraction amount when the running noise level is "somewhat large", and FIG. 19(c) the result with a larger subtraction amount when the running noise level is "somewhat large". FIG. 20 outlines the relationship between the running noise level, the acoustic noise level, and the adjustment amount. In FIG. 20, a corresponds to rules 1 and 2 of FIG. 18 when the running noise level is "small", b to rules 1 and 3 when the running noise level is "somewhat large", and c to rules 1 and 4 when the running noise level is "quite large".

【０１７４】図１８における調整量決定ルールの方式は
第３及び第４の発明における調整量決定ルールをさらに
改良したものである。このファジィルールは、ＭＡＸ−
ＭＩＮ重心法による方法である。The adjustment amount determination rule of FIG. 18 is a further refinement of the adjustment amount determination rule of the third and fourth inventions. This fuzzy rule uses the MAX-MIN centroid method.

【０１７５】ルール１は走行騒音レベルにかかわらず、
音響騒音レベルが「かなり小さい」時には、調整量を
「小さい」に設定する。Rule 1: regardless of the running noise level, when the acoustic noise level is "quite small", the adjustment amount is set to "small".

【０１７６】ルール２は走行騒音レベルが「小さい」時
で、且つ、音響騒音レベルが「大き目」の時には、調整
量を「大きい」に設定する。Rule 2: when the running noise level is "small" and the acoustic noise level is "rather large", the adjustment amount is set to "large".

【０１７７】ルール３は走行レベルが「ある程度大き
い」時で、且つ、音響騒音レベルが「大き目」の時に
は、調整量を「かなり大きい」に設定する。Rule 3: when the running noise level is "somewhat large" and the acoustic noise level is "rather large", the adjustment amount is set to "quite large".

【０１７８】ルール４は走行騒音レベルが「かなり大き
い」時で、且つ、音響騒音レベルが「大き目」の時に
は、調整量を「大き目」に設定する。Rule 4: when the running noise level is "quite large" and the acoustic noise level is "rather large", the adjustment amount is set to "rather large".
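Rules 1 to 4 can be written down as a small table. The sketch below is a crisp lookup on linguistic labels only; it does not perform the fuzzy inference of FIG. 18, whose membership functions are not given in this text. The English label strings are this sketch's own renderings of the Japanese terms, and the `ANY` wildcard and function name are hypothetical.

```python
ANY = None  # wildcard: the rule applies at every running noise level

# (running noise level, acoustic noise level) -> adjustment amount
RULES = [
    (ANY,              "quite small",  "small"),         # Rule 1
    ("small",          "rather large", "large"),         # Rule 2
    ("somewhat large", "rather large", "quite large"),   # Rule 3
    ("quite large",    "rather large", "rather large"),  # Rule 4
]


def adjustment_label(running, acoustic):
    """Return the adjustment label of the first matching rule, or None if no rule matches."""
    for rule_running, rule_acoustic, adjustment in RULES:
        if (rule_running is ANY or rule_running == running) and rule_acoustic == acoustic:
            return adjustment
    return None
```

The table makes the non-monotonic shape visible: for a "rather large" acoustic noise level the adjustment peaks at a "somewhat large" running noise level and drops back when running noise becomes "quite large".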

【０１７９】このファジィルールは以下のような効果を
期待して作成されたものである。This fuzzy rule is created with the expectation of the following effects.

【０１８０】１．走行騒音が「ある程度大きい」が、音
響騒音に対して余り支配的でない程度の環境下では、走
行騒音レベルが「小さい」時より調整量を大き目に設定
することで騒音成分を多目に減算した方が効果がある。
これは、騒音成分の推定誤差が大きい走行騒音重畳時に
は音響騒音成分が残り易い為、少な目に減算し走行騒音
成分＋音響騒音の残留成分を残すよりも（図１９（ｂ）
参照）、多目に減算し走行騒音成分も減算してしまい走
行騒音の残留成分を残した方が、図２０−ｃに示すよう
に、認識部での音声トリガが掛かり難くなるためであ
る。1. In an environment where the running noise is "somewhat large" but not very dominant over the acoustic noise, it is more effective to set the adjustment amount larger than when the running noise level is "small" and so subtract more of the noise component. When running noise is superimposed, the estimation error of the noise component is large and the acoustic noise component tends to remain. Subtracting less leaves the running noise component plus the residual acoustic noise (see FIG. 19(b)); subtracting more also removes part of the running noise and leaves only a running noise residual, which makes a false voice trigger in the recognition unit less likely, as shown in FIG. 20-c.

【０１８１】２．走行騒音が「かなり大きい」状態で、
音響騒音よりも支配的である環境下では、音響騒音が走
行騒音に埋もれる状態となるために（図１９（ａ）参
照）、減算量は少な目でよい。2. When the running noise is "quite large" and dominant over the acoustic noise, the acoustic noise is buried in the running noise (see FIG. 19(a)), so a smaller subtraction amount suffices.

【０１８２】結局、走行騒音レベルに応じた音響騒音レ
ベルと調整量との関係は図２０に示すごとく設定され、
上記の効果が期待できる。In the end, the relationship between the acoustic noise level and the adjustment amount for each running noise level is set as shown in FIG. 20, and the above effects can be expected.

【０１８３】[0183]

【発明の効果】上記各実施例で明らかなように、第１の
発明ないし第７の発明により、以下のような効果を得る
ことができる。As is apparent from the above embodiments, the following effects can be obtained by the first to seventh inventions.

[0184] 1. Effects of the first invention
According to this invention, the latest correction coefficient is always obtained from the main signal and the reference signal in the non-voice section, so the device can cope with non-stationary noise such as rapidly changing audio noise. When the audio noise does not change very rapidly, the convergence time of the filter can be shortened. In addition, no expensive processor capable of high-speed arithmetic, such as a DSP, is required.

[0185] Furthermore, since the reference signal never contains the voice signal, the estimation error can be reduced, and accurate voice recognition is possible even in an audio noise environment.
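The coefficient update and subtraction described for the first invention can be illustrated with a minimal sketch. It assumes that each frame is a list of per-band levels from a filter bank and that the correction coefficient of each band is the ratio of the accumulated main signal to the accumulated reference signal over the non-voice frames (the ratio form is taken from claim 2). The function names and the clamping of negative results to zero are assumptions made for illustration only and are not stated in the specification.

```python
def update_coefficients(main_frames, ref_frames, eps=1e-12):
    """Per-band correction coefficient from non-voice frames.

    Assumed form: accumulated main level / accumulated reference level.
    """
    bands = len(main_frames[0])
    main_sum = [sum(frame[b] for frame in main_frames) for b in range(bands)]
    ref_sum = [sum(frame[b] for frame in ref_frames) for b in range(bands)]
    # eps guards against division by zero when the reference is silent.
    return [m / max(r, eps) for m, r in zip(main_sum, ref_sum)]


def subtract_noise(main_frame, ref_frame, coeffs):
    """Subtract (coefficient x reference) from the main signal in each band.

    Negative results are clamped to zero (an assumption of this sketch).
    """
    return [max(m - k * r, 0.0) for m, r, k in zip(main_frame, ref_frame, coeffs)]


# Non-voice section: the microphone picks up only the audio noise,
# here exactly twice the reference level in both bands.
coeffs = update_coefficients(
    main_frames=[[2.0, 4.0], [6.0, 8.0]],
    ref_frames=[[1.0, 2.0], [3.0, 4.0]],
)

# Voice section: main = speech + 2 x reference.
estimate = subtract_noise([5.0, 4.0], [1.0, 1.5], coeffs)
```

Because the coefficient is recomputed from recent non-voice frames with a few additions and one division per band, the sketch needs no iterative adaptive filter, which is consistent with the remark that a DSP is not required.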

[0186] 2. Effects of the second invention
The second invention provides the following effects.

[0187] 1) Even when the audio noise is loud, a certain amount of the audio noise component is removed in advance by the subtraction data computed with the pre-update correction coefficient, so the detection error of the start edge of the voice signal can be reduced.

[0188] 2) Since the voice section is detected automatically, the user is relieved of the burden of performing an operation such as a key input at every utterance.

[0189] 3) Even when a residual audio noise component remains because of an estimation error caused by a low voice signal level, setting a high threshold level for detecting the voice signal reduces false detection of voice sections caused by audio noise, and an appropriate value of the correction coefficient can be obtained. The result therefore does not depend on the threshold level of the voice recognition unit.

[0190] 4) Because the method is extremely simple, real-time processing is possible.
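The start-edge detection summarized in items 1) to 4) can be sketched as follows. This is a simplified illustration, assuming one scalar level per frame; the function name `find_voice_start` and the numeric values are hypothetical. It shows only the provisional start edge obtained by thresholding the subtraction result computed with the pre-update correction coefficient, not the delayed-signal processing used to fix the final start edge.

```python
def find_voice_start(main_levels, ref_levels, coeff, threshold):
    """Return the index of the first frame whose subtraction result
    (main - coeff x reference) exceeds the threshold, or None."""
    for i, (m, r) in enumerate(zip(main_levels, ref_levels)):
        if m - coeff * r > threshold:
            return i
    return None


# Frames 0-1 contain audio noise only; speech starts at frame 2.
main = [2.0, 2.2, 6.0, 7.0]
ref = [1.0, 1.0, 1.0, 1.0]

# With prior subtraction and a sufficiently high threshold,
# the start edge is found at the true speech onset.
start = find_voice_start(main, ref, coeff=2.0, threshold=1.0)

# A threshold that is too low fires on the residual noise in frame 1.
start_low_threshold = find_voice_start(main, ref, coeff=2.0, threshold=0.1)

# Without subtraction (coeff = 0) the noise itself fires the detector.
start_no_subtraction = find_voice_start(main, ref, coeff=0.0, threshold=1.0)
```

The loop performs one multiplication, one subtraction and one comparison per frame, which reflects why the method is suited to real-time processing.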

[0191] 3. Effects of the third invention
The effect of this invention is that the correction coefficient can be updated in accordance with fluctuations in the audio noise level.

[0192] Normally, a higher audio noise level produces an estimation error, so more residual audio noise data remains and the voice recognition unit makes more errors in detecting the voice section. By applying the invention of this embodiment, when the audio noise level is high, a larger amount is subtracted in the all-pass filter in accordance with the adjustment amount determination rule described above, which reduces voice section detection errors.

[0193] On the other hand, when the audio noise level is low, the residual level of the audio noise is small, and subtracting too much narrows the voice section and lowers the similarity. In that case, a smaller amount is subtracted in the all-pass filter in accordance with the adjustment amount determination rule, which raises the similarity.
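The tendency described for the third invention (subtract more when the audio noise level is high, less when it is low) can be illustrated with a minimal sketch. The actual adjustment amount determination rule is defined in FIG. 11 and is not reproduced here; the linear interpolation, the breakpoints `low` and `high`, the limits `min_adj` and `max_adj`, and the multiplicative application to the correction coefficient are all hypothetical choices made for illustration.

```python
def adjustment_amount(noise_level, low=40.0, high=80.0,
                      min_adj=0.8, max_adj=1.5):
    """Hypothetical rule: the adjustment grows linearly with the audio
    noise level between two breakpoints and is constant outside them."""
    if noise_level <= low:
        return min_adj
    if noise_level >= high:
        return max_adj
    t = (noise_level - low) / (high - low)
    return min_adj + t * (max_adj - min_adj)


def adjusted_coefficient(coeff, noise_level):
    """Apply the adjustment to the correction coefficient.

    Multiplicative scaling is an assumption of this sketch."""
    return coeff * adjustment_amount(noise_level)


quiet = adjusted_coefficient(2.0, noise_level=30.0)   # lighter subtraction
loud = adjusted_coefficient(2.0, noise_level=90.0)    # heavier subtraction
```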

[0194] 4. Effects of the fourth invention
According to this invention, in the adaptive spectral subtraction method under a known-noise environment, even when running noise is superimposed on the voice component, the audio noise component is removed with an appropriate correction coefficient, and the function of the voice recognition unit can be used as it is to remove the running noise.

[0195] 5. Effects of the fifth invention
The fifth invention provides the following effects.

[0196] 1) The fuzzy inference of this invention makes it possible to determine the voice trigger level according to the acoustic noise level and the running noise level. When the residual component is large, the trigger level is raised so that the residual component does not fire the trigger. When the residual component is small, the trigger level is lowered so that the voice trigger does not become hard to fire on the utterances of a speaker whose utterance level is relatively low. This improved the effect of the adaptive S.S. (spectral subtraction) method.

[0197] 2) Using fuzzy inference made it easy to create and tune control rules based on multiple parameters (here, the two parameters of acoustic noise level and running noise level), which was difficult with ordinary control.

[0198] 3) The fuzzy rules of this invention consist of six rules, but the interpolation effect of fuzzy inference allows appropriate control for intermediate values as well, so fine-grained control became possible.

[0199] 4) In view of the Lombard effect, when the noise level is low (and, as a result, the residual noise level left by this method is small, for example FIG. 2-d), the utterance level of even the same speaker becomes lower. Conversely, when the noise level is high, the Lombard effect raises the utterance level. This method is therefore also well matched to the variation in voice level caused by the Lombard effect.
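The interpolation effect of a six-rule fuzzy inference over two inputs can be illustrated with a minimal sketch. The actual membership functions and rules are defined in FIG. 15 and are not reproduced here. The sketch assumes inputs normalized to the range 0 to 1, two labels for the acoustic noise level and three for the running noise level (giving six rules), the minimum operator for rule strength, and a weighted average of singleton outputs. All labels, breakpoints and output values are hypothetical.

```python
def ramp_up(x, lo, hi):
    """Membership that rises linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def triangle(x, lo, mid, hi):
    """Triangular membership peaking at mid."""
    if x <= lo or x >= hi:
        return 0.0
    if x <= mid:
        return (x - lo) / (mid - lo)
    return (hi - x) / (hi - mid)


def trigger_level(acoustic, running):
    """Voice trigger level from two noise levels (both normalized 0..1)."""
    a_large = ramp_up(acoustic, 0.0, 1.0)
    a_small = 1.0 - a_large
    r_small = 1.0 - ramp_up(running, 0.0, 0.5)
    r_mid = triangle(running, 0.0, 0.5, 1.0)
    r_large = ramp_up(running, 0.5, 1.0)

    # Six hypothetical rules: (rule strength, trigger level output).
    # Larger expected residual -> higher trigger level.
    rules = [
        (min(a_small, r_small), 0.1),
        (min(a_small, r_mid), 0.3),
        (min(a_small, r_large), 0.5),
        (min(a_large, r_small), 0.5),
        (min(a_large, r_mid), 0.7),
        (min(a_large, r_large), 0.9),
    ]
    total = sum(weight for weight, _ in rules)
    return sum(weight * out for weight, out in rules) / total


low = trigger_level(0.0, 0.0)     # quiet cabin: low trigger level
mid = trigger_level(0.5, 0.5)     # intermediate input: interpolated output
high = trigger_level(1.0, 1.0)    # loud cabin: high trigger level
```

Only six rules are written, yet an input that lies between the labelled cases yields an intermediate trigger level, which is the interpolation effect noted in item 3).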

[0200] 6. Effects of the sixth invention
The sixth invention provides the following effects.

[0201] 1) The fuzzy inference of this invention makes it possible to determine the threshold of the running noise determination flag according to the acoustic noise level, which improved the effect of the adaptive S.S. method compared with the conventional method.

[0202] 2) The adjustment amount determination rule of this invention consists of two rules according to the acoustic noise level, but the interpolation effect of fuzzy inference allows appropriate control for intermediate values as well, so fine-grained control became possible.

[0203] 7. Effects of the seventh invention
The seventh invention provides the following effects.

[0204] 1) Fuzzy inference makes it possible to determine the adjustment amount according to the acoustic noise level and the running noise level. When there is a moderate amount of running noise, the adjustment amount is made larger than usual so that more is subtracted; when the running noise is loud, the adjustment amount is made smaller than usual so that less is subtracted. This improved the effect of the adaptive S.S. method compared with the conventional method.

[0205] 2) Using fuzzy inference made it easy to create and tune control rules based on multiple parameters (here, the two parameters of acoustic noise level and running noise level), which was difficult with ordinary control.

[0206] 3) The adjustment amount determination rule of this invention consists of three rules when the acoustic noise level is "fairly large", but the interpolation effect of fuzzy inference allows appropriate control for intermediate values as well, so fine-grained control became possible.
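The three-rule behaviour described for the seventh invention (usual adjustment when the running noise is small, larger when it is moderate, smaller when it is loud) can be illustrated with a minimal sketch. The actual rules are defined in FIG. 18 and are not reproduced here. The sketch assumes a running noise level normalized to the range 0 to 1, triangular membership functions that sum to one, and singleton outputs combined by weighted average. The labels, breakpoints and the output values 1.0, 1.3 and 0.7 are hypothetical.

```python
def memberships(running):
    """Degrees of 'small', 'somewhat loud' and 'quite loud' (sum to 1)."""
    x = min(max(running, 0.0), 1.0)
    small = max(0.0, 1.0 - 2.0 * x)
    medium = 1.0 - abs(2.0 * x - 1.0)
    large = max(0.0, 2.0 * x - 1.0)
    return small, medium, large


# Hypothetical rule outputs: usual, larger than usual, smaller than usual.
ADJUSTMENTS = (1.0, 1.3, 0.7)


def adjustment(running):
    """Adjustment amount interpolated between the three rules."""
    weights = memberships(running)
    weighted = sum(w * a for w, a in zip(weights, ADJUSTMENTS))
    return weighted / sum(weights)


usual = adjustment(0.0)      # running noise small
heavier = adjustment(0.5)    # running noise somewhat loud: subtract more
lighter = adjustment(1.0)    # running noise quite loud: subtract less
between = adjustment(0.25)   # intermediate input, interpolated output
```

Unlike a single threshold switch, the output changes smoothly as the running noise level moves between the three labelled cases.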

[Brief description of drawings]

FIG. 1 is a block diagram of the voice recognition device in the embodiment of the first invention.

FIG. 2 is a flowchart showing the operation of the CPU 15 of the voice recognition device shown in FIG. 1.

FIG. 3 is a block diagram of the voice recognition device in the first embodiment of the second invention.

FIG. 4 is a diagram showing how the voice recognition device shown in FIG. 3 detects the voice start edge.

FIG. 5 is a flowchart showing the operation of the CPU 15 of the voice recognition device in the first embodiment of the second invention.

FIG. 6(a) is a diagram showing the residual audio component when the correction coefficient is updated at every utterance. FIG. 6(b) is a diagram showing the residual audio component when the correction coefficient is updated at fixed time intervals.

FIG. 7 is a flowchart showing the operation of the voice recognition device in the second embodiment of the second invention.

FIG. 8 is a flowchart showing the operation of the CPU 15 of the voice recognition device in the third embodiment of the second invention.

[Figure 9]

FIG. 10 is a flowchart showing the operation of the CPU 15 of the voice recognition device in the first embodiment of the third invention.

FIG. 11 is a diagram showing the adjustment amount determination rule for the correction coefficient in the voice recognition devices of the third and fourth inventions.

FIG. 12 is a flowchart showing the operation of the CPU 15 of the voice recognition device in the second embodiment of the third invention.

FIG. 13 is a flowchart showing the operation of the CPU 15 of the voice recognition device in the embodiment of the fourth invention.

FIG. 14 is a flowchart showing the operation of the CPU 15 when the voice trigger level, the running noise determination level, and the adjustment amount of the correction coefficient in the fifth to seventh inventions are determined by fuzzy inference.

FIG. 15 is a diagram showing the fuzzy inference method for determining the voice trigger level in step S102 of FIG. 14.

FIG. 16 is a diagram showing subtraction results corresponding to the fuzzy rules of FIG. 15.

FIG. 17 is a diagram showing the threshold determination rule of the running noise determination flag, which is used to judge the presence or absence of running noise in step S105 of FIG. 14.

FIG. 18 is a diagram showing the adjustment amount determination rule for determining the adjustment amount in step S108 of FIG. 14.

FIG. 19 is a diagram showing subtraction results when the subtraction amount is changed according to the running noise level.

FIG. 20 is a diagram outlining the relationship between the running noise level, the acoustic noise level, and the adjustment amount.

FIG. 21 is a block diagram of a conventional voice recognition device.

[Explanation of symbols]

11 microphone
13 filter bank
15 CPU
16 audio device
18 amplifier
19 filter bank
21 voice recognition unit (recognition means)
22 registration dictionary

Claims

[Claims]

1. A voice recognition device that removes a noise component from a main signal, in which a voice signal from a speaker is mixed with the noise component, and recognizes the voice signal by comparing it with a comparison voice signal registered in advance, the device comprising: means for generating a reference signal based on the noise component; voice section discriminating means for discriminating between a voice section, in which the voice signal is included in the main signal, and a non-voice section, in which it is not included; correction coefficient updating means for generating and updating a correction coefficient based on the main signal in the non-voice section; calculation means for subtracting, from the main signal, a value obtained by multiplying the reference signal by the correction coefficient in the voice section; and recognition means for performing voice recognition by collating the calculation result obtained from the calculation means with the comparison voice signal.

2. The voice recognition device according to claim 1, comprising: voice start edge detecting means for setting a provisional voice start edge of the voice signal when a subtraction result, obtained by subtracting from the main signal a value obtained by multiplying the reference signal by the pre-update correction coefficient, is larger than a predetermined value; means for generating a delayed main signal and a delayed reference signal from the main signal and the reference signal of a fixed time earlier; voice section discriminating means for determining a definite voice start edge based on the delayed main signal and discriminating the voice section; correction coefficient updating means for generating and updating the correction coefficient from the ratio of the cumulative values of the delayed main signal and the delayed reference signal over a fixed past time; and calculation means for subtracting, from the delayed main signal, a value obtained by multiplying the delayed reference signal by the correction coefficient in the voice section.

3. The voice recognition device according to claim 2, comprising: means for generating a correction coefficient candidate from the ratio of the cumulative values of the delayed main signal and the delayed reference signal over a fixed past time; correction coefficient updating means for updating the correction coefficient candidate at fixed time intervals; correction coefficient adjusting means for adjusting the correction coefficient based on a predetermined adjustment amount determination rule, using a delayed main signal that does not include the voice signal; and correction coefficient correcting means for correcting the adjusted correction coefficient.

4. The voice recognition device according to claim 2, comprising: means for generating a correction coefficient candidate from the ratio of the cumulative values of the delayed main signal and the delayed reference signal over a fixed past time; noise component discriminating means for discriminating whether the noise component includes music noise and whether the noise component includes vehicle running noise; correction coefficient updating means which, when the noise component includes only the music noise, updates the correction coefficient with the value of the correction coefficient candidate at fixed time intervals and, when the noise component includes both the music noise and the running noise, uses as the correction coefficient the ratio of the cumulative values of the delayed main signal and the delayed reference signal over a fixed past time; correction coefficient adjusting means which, when the noise component includes only the music noise, adjusts the correction coefficient based on a predetermined adjustment amount determination rule and, when the noise component includes both the music noise and the running noise, adjusts the correction coefficient based on the predetermined adjustment amount determination rule using the estimated main signal obtained from the calculation means; and correction coefficient correcting means for correcting the adjusted correction coefficient when the noise component includes the music noise.

5. A voice recognition method in a noise environment for recognizing the voice of a speaker by removing a noise component from an input signal in which the noise component is mixed with a voice signal component from the speaker, the method comprising: detecting a voice section from the input signal by fuzzy inference; determining whether the noise component is mixed in the voice section; updating, according to the determination result, a correction coefficient for predicting the voice signal component; adjusting the updated correction coefficient; performing subtraction processing based on the adjusted correction coefficient; and performing voice recognition using the subtraction result as the voice signal component.

6. A voice recognition method in a noise environment for recognizing the voice of a speaker by removing noise components from an input signal in which an acoustic noise component and a running noise component are mixed with a voice signal component from the speaker, the method comprising: detecting a voice section from the input signal; determining by fuzzy inference whether the running noise component is mixed in the voice section; updating, according to the determination result, a correction coefficient for predicting the voice signal component; adjusting the updated correction coefficient; performing subtraction processing based on the adjusted correction coefficient; and performing voice recognition using the subtraction result as the voice signal component.

7. A voice recognition method in a noise environment for recognizing the voice of a speaker by removing a noise component from an input signal in which the noise component is mixed with a voice signal component from the speaker, the method comprising: detecting a voice section from the input signal; determining whether the noise component is mixed in the voice section; updating, according to the determination result, a correction coefficient for predicting the voice signal component; adjusting the updated correction coefficient by fuzzy inference; performing subtraction processing based on the adjusted correction coefficient; and performing voice recognition using the subtraction result as the voice signal component.