JPS63262695A

JPS63262695A - Voice recognition system

Info

Publication number: JPS63262695A
Application number: JP62098876A
Authority: JP
Inventors: 松澤　英明
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1987-04-21
Filing date: 1987-04-21
Publication date: 1988-10-28

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a speech recognition system, and in particular to a speech recognition system that improves the recognition probability by removing the influence of noise from an input signal in which noise has been added to the speech to be recognized.

[Prior Art]

Conventionally, this kind of speech recognition system is composed of the three basic means shown in FIG. 5: feature parameter analysis and extraction means 1, pattern matching identification means 4, and speech standard pattern storage means 5. The feature parameter analysis and extraction means 1 analyzes the noisy input speech S1 supplied to it, and extracts and outputs the feature parameter time series Dv+N needed for speech identification. This time series contains the feature parameters Dv of the speech and the noise component N extracted from the noise input. The pattern matching identification means 4 compares this feature parameter time series with the speech standard pattern time series Dpi read from the speech standard pattern storage means 5, selects as the speech recognition result the index i that designates the most similar speech standard pattern time series, and judges the corresponding standard speech to be the input speech.
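The prior-art matching step can be illustrated with a minimal Python sketch. The function names, the squared Euclidean distance, and the toy two-channel templates are assumptions made for illustration; the publication does not specify a distance measure. The example also shows how an additive noise component in the input can flip the selected index i.

```python
def frame_distance(a, b):
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sequence_distance(seq_a, seq_b):
    """Sum of frame distances for two equal-length sequences (no time warping)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same number of frames")
    return sum(frame_distance(a, b) for a, b in zip(seq_a, seq_b))


def recognize_fixed_length(features, templates):
    """Return the 0-based index i of the stored template closest to the input."""
    return min(
        range(len(templates)),
        key=lambda i: sequence_distance(features, templates[i]),
    )


# Hypothetical noise-free standard patterns (two frames, two channels each).
templates = [
    [[1.0, 0.0], [1.0, 0.0]],  # standard pattern 0
    [[0.0, 1.0], [0.0, 1.0]],  # standard pattern 1
]

clean_input = [[0.9, 0.1], [1.0, 0.0]]   # Dv only
noisy_input = [[0.9, 1.6], [1.0, 1.5]]   # Dv + N, noise added to channel 2

clean_result = recognize_fixed_length(clean_input, templates)
noisy_result = recognize_fixed_length(noisy_input, templates)
```

With these toy values the clean input selects pattern 0, while the same utterance with noise added selects pattern 1, which is the failure mode the following section describes.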

[Problem that the invention seeks to solve]

In the conventional speech recognition system described above, the feature parameter time series Dv+N input to the pattern matching identification means 4 contains the noise component N, whereas the speech standard pattern time series Dpi contains no noise component. The presence or absence of this noise component makes the values of the two time series differ, which raises the likelihood of an erroneous decision as to which speech standard pattern time series is the most similar. In addition, speech segmentation is performed by determining whether the amount of components that characterize speech in the feature parameter time series is above or below a certain threshold. When the feature parameter time series contains noise components other than the components of the speech to be recognized, this segmentation decision is likely to be inaccurate, so the comparison between the feature parameter time series and the speech standard pattern time series goes wrong and recognition errors become more likely.
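The threshold-based segmentation problem can be sketched as follows. This is only an illustration under assumed values: the per-frame "energy" figures, the threshold of 1.0, and the function name `segment` are hypothetical and do not come from the publication.

```python
def segment(energies, threshold):
    """Return (first, last) frame indices whose energy exceeds the threshold.

    Returns None when no frame exceeds the threshold.
    """
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]


# Hypothetical per-frame amounts of speech-characterizing components.
speech = [0.0, 0.0, 0.2, 3.0, 4.0, 0.3, 0.0, 0.0]
noise = [0.9, 1.2, 1.1, 0.8, 1.0, 1.3, 0.9, 1.1]
noisy = [s + n for s, n in zip(speech, noise)]

clean_segment = segment(speech, threshold=1.0)
noisy_segment = segment(noisy, threshold=1.0)
```

In this toy case the clean signal is segmented as frames 3 to 4, while the noisy signal is segmented as frames 1 to 7, so the time series handed to the matching stage covers the wrong interval.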

One conventional solution to this problem is to input noise alone to the feature parameter analysis and extraction means 1 before the speech to be recognized is input, average the resulting feature parameter time series, and store it temporarily as the noise component. When noisy speech is then input, the stored noise component is removed from its feature parameter time series, which reduces the influence of noise in the feature parameter time series input to the pattern matching identification means 4.
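A minimal sketch of this conventional approach is given below. The function names and sample values are hypothetical, and clamping negative results to zero is an added assumption that the publication does not mention.

```python
def average_noise(noise_frames):
    """Average a noise-only feature parameter time series channel by channel."""
    count = len(noise_frames)
    channels = len(noise_frames[0])
    return [sum(frame[j] for frame in noise_frames) / count for j in range(channels)]


def subtract_stored_noise(frames, noise_mean):
    """Remove the stored average noise from every frame.

    Flooring at 0.0 is an assumption of this sketch, not part of the source.
    """
    return [
        [max(x - m, 0.0) for x, m in zip(frame, noise_mean)]
        for frame in frames
    ]


# Noise-only frames captured before the utterance (hypothetical values).
noise_only = [[1.0, 2.0], [3.0, 2.0]]
stored_noise = average_noise(noise_only)

# Noisy speech frames captured afterwards.
noisy_speech = [[5.0, 2.5], [1.0, 4.0]]
cleaned = subtract_stored_noise(noisy_speech, stored_noise)
```

Because `stored_noise` is fixed once the utterance starts, any change in the noise during the utterance is not tracked, which is the limitation discussed next.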

Another method is to include the noise components contained in the input signal to be recognized when generating the speech standard pattern time series that are stored in advance. Neither method can follow temporal changes in the noise. In particular, neither can remove the influence of noise whose feature parameters change within the short time during which speech is being input, so a drop in the recognition rate is unavoidable.

The object of the present invention is to eliminate the drawbacks described above and to provide a speech recognition system of the pattern matching identification type that markedly improves the recognition probability. The conventional analysis and extraction means serves as a first analysis and extraction means, which receives input speech containing the speech to be recognized and noise, and analyzes and extracts its feature parameter time series. A second analysis and extraction means is newly provided, which receives only a signal that is correlated with the noise or contains a component of the noise. The noise component contained in the feature parameter time series output by the first analysis and extraction means is calculated from the noise feature parameters output by the second analysis and extraction means and is removed before pattern matching identification is performed.

[Means for Solving the Problems]

The system of the present invention applies to a speech recognition device comprising feature parameter analysis and extraction means for analyzing input speech and extracting an acoustic feature parameter time series, speech standard pattern storage means for storing in advance the feature parameter time series of standard speech as speech standard pattern time series, and pattern matching identification means for comparing the feature parameter time series extracted by the feature parameter analysis and extraction means with the speech standard pattern time series read from the speech standard pattern storage means and outputting an identification result for the input speech. The system comprises: a first feature parameter analysis and extraction means that receives as input an acoustic signal containing the speech to be recognized and, as noise, all other sounds; a second feature parameter analysis and extraction means that receives as input an acoustic signal that is correlated with the noise input to the first feature parameter analysis and extraction means or contains a component of that noise; and noise component removing means that removes, from the feature parameter time series output by the first feature parameter analysis and extraction means, the noise component calculated from the noise feature parameter time series output by the second feature parameter analysis and extraction means, and supplies the result to the pattern matching identification means.

[Embodiments]

Next, the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing a first embodiment of the present invention. It comprises feature parameter analysis and extraction means 1, noise feature parameter analysis and extraction means 2, noise component removing means 3, pattern matching identification means 4, and speech standard pattern storage means 5. The input signal S1 is input to the feature parameter analysis and extraction means 1. The noise signal S2 is input to the noise feature parameter analysis and extraction means 2; S2 is either the signal of the source of the noise that enters the feature parameter analysis and extraction means 1, or a signal that is correlated with that noise and contains components characterizing it.

The feature parameter analysis and extraction means 1 outputs the feature parameter time series Dv+N, and the noise feature parameter analysis and extraction means 2 outputs the noise feature parameter time series DN.

The noise component removing means 3 time-synchronizes its two input time series, calculates from the noise feature parameter time series DN the noise component N contained in the feature parameter time series Dv+N, and removes that noise component N from Dv+N, thereby outputting the feature parameter time series Dv from which the noise component has been removed. The pattern matching identification means 4 matches the input feature parameter time series Dv against the standard pattern time series Dpi (i = 1, 2, ..., N) read from the speech standard pattern storage means 5, determines the standard pattern time series Dpi most similar to Dv, and outputs the index i (i = 1, 2, ..., N) designating that Dpi as the speech recognition result.

FIG. 2 is a block diagram showing a second embodiment of the present invention.

The second embodiment of the present invention shown in FIG. 2 comprises BPF (Band Pass Filter) (1) 6 and BPF (2) 7 as the feature parameter analysis and extraction means, a noise component remover 8, a DP (Dynamic Programming) processor 9 as the pattern matching identification means, and a speech standard pattern memory 10. FIG. 2 also shows a noise transmission system 101 having an impulse response $h(t)$, and a virtual adder 102 representing the spatial addition of speech and noise that occurs in this noise transmission system.

The second embodiment shown in FIG. 2 uses groups of band-pass filters as the means for analyzing and extracting feature parameters. BPF (1) 6 and BPF (2) 7 are each configured as a filter bank of a predetermined number of band-pass filters, and the spectral envelope for each frequency band output by these band-pass filters is supplied to the noise component remover 8 as the feature parameters. In this case, therefore, the feature parameters used for speech identification are the powers in the frequency bands denoted by the center frequencies $f_j$ ($j = 1, 2, \ldots, l$) of the band-pass filters. In speech recognition, feature parameters are usually analyzed and extracted with a filter bank of about 8 to 20 channels (CH). The two filter banks in FIG. 2 consist of band-pass filters with the same set of center frequencies.
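As a rough illustration of per-band power as a feature parameter, the sketch below groups the power of discrete Fourier transform bins into bands. This is a software stand-in chosen for brevity, not the band-pass filter hardware of the embodiment; the function name `band_powers`, the band edges, and the 8-sample test frame are all hypothetical.

```python
import cmath
import math


def band_powers(frame, bands):
    """Sum the DFT power of `frame` over each half-open bin range (low, high)."""
    n = len(frame)
    # Naive DFT of the non-negative frequency bins; adequate for short frames.
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]
    return [sum(abs(spectrum[k]) ** 2 for k in range(lo, hi)) for lo, hi in bands]


# One analysis frame: a cosine that falls exactly on DFT bin 1.
frame = [math.cos(2 * math.pi * t / 8) for t in range(8)]

# Two hypothetical channels: bin 1 only, and bins 2 to 3.
bands = [(1, 2), (2, 4)]
powers = band_powers(frame, bands)
```

Here nearly all of the power lands in the first channel. Applying the same band layout to both the speech-plus-noise input and the noise input mirrors the requirement that the two filter banks share the same center frequencies.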

The background of the signal input $s(t)$ and the noise input $n(t)$, which are input to BPF (1) 6 and BPF (2) 7 respectively, is explained next.

FIG. 3 is an explanatory diagram showing how the speech input and the noise input arise in the second embodiment of FIG. 2.

The speech $v(t)$ uttered by speaker 11 is converted into an electrical signal by microphone 12 and supplied to BPF (1) 6 as the speech input $s(t)$. Meanwhile, the noise $n$ generated by noise source 13 also reaches microphone 12 through the noise transmission system 101, which has impulse response $h(t)$. The speech input $s(t)$ output by microphone 12 can therefore be expressed as the sum of the speech $v(t)$ and the incoming noise $n(t) * h(t)$:

$$s(t) = v(t) + n(t) * h(t)$$

Here the symbol $*$ denotes convolution. The noise $n$ is also converted into an electrical signal by microphone 14 and supplied to BPF (2) 7 as the noise input $n(t)$.
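The signal model for microphone 12 can be sketched in discrete time as follows. The sample values, the two-tap impulse response, and the function names are hypothetical; the sketch only shows the speech plus the noise convolved with the transmission path.

```python
def convolve(x, h):
    """Full discrete convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def microphone_12_output(v, n, h):
    """s(t) = v(t) + (n * h)(t), truncated to the length of v."""
    nh = convolve(n, h)
    return [v[t] + (nh[t] if t < len(nh) else 0.0) for t in range(len(v))]


v = [1.0, 0.0, 0.0, 0.0]   # speech from speaker 11
n = [2.0, 0.0, 1.0, 0.0]   # noise from source 13, as picked up by microphone 14
h = [0.5, 0.25]            # impulse response of noise transmission system 101

s = microphone_12_output(v, n, h)
```

Microphone 14 observes `n` directly, whereas microphone 12 observes it only after it has passed through `h`, which is why the removal step needs the frequency characteristic of the transmission system.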

Here, a noise source means any sound other than the speech $v$ to be recognized; for example, the voice of a person other than speaker 11 is also noise.

FIG. 4 is an explanatory diagram showing another situation in which the speech input and the noise input arise in the second embodiment of FIG. 2.

FIG. 4 shows the case in which the noise transmission system 101 is something like an aircraft pilot's oxygen mask or a soundproof wall. The speech input $s(t)$ output by microphone 12 can be expressed as $v(t) + n(t) * h(t)$, the same as in FIG. 3, and microphone 14 likewise picks up the uniform ambient noise $n$ and outputs the noise input $n(t)$.

Returning to the embodiment of FIG. 2, let $v(t)$ be the speech to be recognized. The noise input $n(t)$ passes through the noise transmission system 101 and is added to the speech signal $v(t)$ as $h(t) * n(t)$, producing the speech input $s(t)$. The adder 102 represents the spatial addition in the noise transmission system 101.

BPF (1) 6 consists of a group of $l$ band-pass filter channels, and the outputs of the channels are the feature parameters $S(f_1), S(f_2), \ldots, S(f_l)$. These feature parameters are analyzed and extracted at a fixed period of $T$ seconds; for speech this is usually repeated every analysis period of about 10 to 20 ms. The set of $l$ feature parameters $S(f_1), S(f_2), \ldots, S(f_l)$ therefore forms a time series with one set every $T$ seconds. In exactly the same way, the noise input $n(t)$ supplied to BPF (2) 7 is analyzed, and the noise feature parameters $N(f_1), N(f_2), \ldots, N(f_l)$ are extracted every $T$ seconds and output as a time series.

Taking the Fourier transform of the time-domain function $s(t) = v(t) + h(t) * n(t)$ and expressing it in the frequency domain gives

$$S(f) = V(f) + H(f) \cdot N(f)$$

The noise component remover 8 therefore takes $S(f_j)$ and $N(f_j)$ as inputs and performs the operation

$$V(f_j) = S(f_j) - H(f_j) \cdot N(f_j)$$

for each of $j = 1, 2, \ldots, l$, that is, $l$ times, and so can output the feature parameters $V(f_1), V(f_2), \ldots, V(f_l)$ of the speech $V(f)$ free of the noise component. The $H(f_j)$ used in this operation is the frequency characteristic of the noise transmission system 101. In a silent interval, that is, when $v(t) = 0$, $S(f_j) = H(f_j) \cdot N(f_j)$ ($j = 1, 2, \ldots, l$), so it can be obtained in advance as

$$H(f_j) = S(f_j) / N(f_j) \quad (j = 1, 2, \ldots, l)$$

Because this value does not change abruptly the way the noise input $n(t)$ does, it can be stored temporarily and used in the noise removal operation when the speech $v(t)$ is input.
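The two operations of the noise component remover 8 can be sketched per frame as shown below. The function names and numeric values are hypothetical. The sketch assumes that every noise channel value is non-zero during the silent interval; a real implementation would need to handle channels where $N(f_j)$ is zero or very small, which the publication does not discuss.

```python
def estimate_transfer(s_silent, n_silent):
    """H(f_j) = S(f_j) / N(f_j), measured in a silent interval where v(t) = 0.

    Raises ZeroDivisionError if any noise channel is exactly zero.
    """
    return [s / n for s, n in zip(s_silent, n_silent)]


def remove_noise(s_frame, n_frame, h):
    """V(f_j) = S(f_j) - H(f_j) * N(f_j) for each channel j."""
    return [s - hj * n for s, n, hj in zip(s_frame, n_frame, h)]


# Silent interval: BPF (1) 6 sees only the noise after the transmission path.
n_silent = [2.0, 4.0]      # output of BPF (2) 7
s_silent = [1.0, 1.0]      # output of BPF (1) 6
h = estimate_transfer(s_silent, n_silent)

# During speech the noise has changed, but h is reused unchanged.
n_frame = [4.0, 8.0]
s_frame = [5.0, 3.0]       # true speech [3.0, 1.0] plus h * n_frame
v_frame = remove_noise(s_frame, n_frame, h)
```

Because `n_frame` is measured in the same frame as `s_frame`, the subtraction follows changes in the noise level from frame to frame, and only the slowly varying `h` is held in memory.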

The time series of the noise-free feature parameters $V(f_1), V(f_2), \ldots, V(f_l)$ is supplied to the DP processor 9, which serves as the pattern matching identification means, and is compared and matched against the speech standard patterns $V_i(f_1), V_i(f_2), \ldots, V_i(f_l)$ read from the speech standard pattern memory 10. This pattern matching usually uses dynamic programming, which obtains the similarity between two feature parameter time series while performing a nonlinear alignment that allows for temporal stretching and compression of the input.

A memory such as RAM or ROM is generally used as the speech standard pattern memory 10. The feature parameter time series of $M$ standard utterances, $V_i(f_1), V_i(f_2), \ldots, V_i(f_l)$ ($i = 1, 2, \ldots, M$), are registered in it in advance. The dynamic programming step determines which of the $M$ standard pattern time series is most similar to the incoming time series $V(f_1), V(f_2), \ldots, V(f_l)$, and outputs as the recognition result the standard pattern $V_i(f_1), V_i(f_2), \ldots, V_i(f_l)$ whose index $i$ gives the greatest similarity.
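The dynamic programming match can be sketched with a basic dynamic time warping recurrence. The publication does not state which local distance or path constraints the DP processor 9 uses, so the squared Euclidean frame distance, the three-way step pattern, and the names below are assumptions. Indices here are 0-based, whereas the text numbers the patterns from 1 to $M$.

```python
def dtw_distance(a, b):
    """Accumulated distance of the best nonlinear time alignment of a and b."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            # Local distance between frame i of a and frame j of b.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]


def recognize_dtw(v, patterns):
    """Return the 0-based index of the standard pattern with the smallest distance."""
    return min(range(len(patterns)), key=lambda i: dtw_distance(v, patterns[i]))


# Hypothetical single-channel standard patterns.
patterns = [
    [[0.0], [1.0], [0.0]],
    [[1.0], [0.0], [1.0]],
]

# The same shape as pattern 0, spoken more slowly.
stretched = [[0.0], [0.0], [1.0], [1.0], [0.0]]
best = recognize_dtw(stretched, patterns)
```

The stretched input aligns with pattern 0 at zero cost even though the two sequences differ in length, which is the temporal stretching and compression that the nonlinear alignment is meant to absorb.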

The above describes the embodiment of FIG. 2. Many other variations are conceivable, and, as described below, all of them can be implemented easily without departing from the spirit of the present invention.

For example, the embodiments of FIG. 1 and FIG. 2 each have two feature parameter analysis and extraction means, or two BPFs serving as feature parameter analyzers and extractors. One is supplied with input speech containing the speech to be recognized and noise, and the other with an input that is correlated with that noise or contains a component of it. It is clear, however, that the feature parameter analysis and extraction means or the BPF can just as easily be realized as a single shared unit whose input is time-division multiplexed.

In addition, the processing for removing the noise component requires relatively little computation and can be performed in a stage preceding the pattern matching identification processing or during it. Physically, therefore, these two processes can easily be implemented in a single shared configuration.

[Effect of the Invention]

As described above, the present invention uses the second feature parameter analysis and extraction means to extract the feature parameters of the noise that is added to the speech to be recognized and lowers the recognition rate, and removes the noise component from the feature parameter time series used in matching identification. The similarity between the speech feature parameter time series and the speech standard pattern time series stored in advance rises by the amount of noise removed, which improves the recognition rate for the input speech. Moreover, detection of the start and end of speech is an important factor governing the recognition rate in the comparison with the speech standard pattern time series, that is, in matching identification. If the noise component is removed at the start and end, where the power level of the speech signal is low, these points become easier to detect, and the recognition rate improves greatly.

Furthermore, conventional speech recognition systems also needed noise removal at the acoustic waveform level of the input signal, particularly when used in high-noise environments. Removal at the waveform level requires product-sum operations over about 200 points at a minimum of 8 kHz (about 1,600,000 operations per second). By contrast, removing the noise component in the feature parameter domain as in the present invention, taking the use of 16-channel BPFs as an example, needs only about 1,600 product-sum operations even with a 10 ms period, so the computation required for noise removal is greatly reduced.
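The two operation counts can be checked with simple arithmetic. The sketch assumes that "8 kHz" is the sampling rate, that each sample needs about 200 multiply-accumulate operations, and that feature-domain removal costs one multiply-accumulate per channel per frame (the $H(f_j) \cdot N(f_j)$ term); these readings are interpretations rather than statements from the publication, and the constant names are hypothetical.

```python
# Waveform-level removal: about 200 product-sum points per sample at 8 kHz.
SAMPLE_RATE_HZ = 8000
POINTS_PER_SAMPLE = 200
waveform_ops_per_second = SAMPLE_RATE_HZ * POINTS_PER_SAMPLE

# Feature-domain removal: one product-sum per channel per analysis frame.
CHANNELS = 16
FRAME_PERIOD_MS = 10
frames_per_second = 1000 // FRAME_PERIOD_MS
feature_ops_per_second = CHANNELS * frames_per_second

reduction_factor = waveform_ops_per_second // feature_ops_per_second
```

Under these assumptions the feature-domain approach needs roughly one thousandth of the product-sum operations of waveform-level removal.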

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing a first embodiment of the present invention; FIG. 2 is a block diagram showing a second embodiment of the present invention; FIG. 3 is an explanatory diagram showing how the speech input and the noise input arise in FIGS. 1 and 2; FIG. 4 is an explanatory diagram showing another situation in which the speech input and the noise input arise.

1: feature parameter analysis and extraction means; 2: noise feature parameter analysis and extraction means; 3: noise component removing means; 4: pattern matching identification means; 5: speech standard pattern storage means; 6: BPF (1); 7: BPF (2); 8: noise component remover; 9: DP processor; 10: speech standard pattern memory; 11: speaker; 12: microphone; 13: noise source; 14: microphone.

Claims

[Claims]

(1) A speech recognition system for a speech recognition device comprising feature parameter analysis and extraction means for analyzing input speech and extracting an acoustic feature parameter time series, speech standard pattern storage means for storing in advance the feature parameter time series of standard speech as speech standard pattern time series, and pattern matching identification means for comparing the feature parameter time series extracted by the feature parameter analysis and extraction means with the speech standard pattern time series read from the speech standard pattern storage means and outputting an identification result for the input speech, the system comprising:
a first feature parameter analysis and extraction means that receives as input an acoustic signal containing the speech to be recognized and, as noise, all other sounds;
a second feature parameter analysis and extraction means that receives as input an acoustic signal that is correlated with the noise input to the first feature parameter analysis and extraction means or contains a component of that noise; and
noise component removing means that removes, from the feature parameter time series output by the first feature parameter analysis and extraction means, the noise component calculated from the noise feature parameter time series output by the second feature parameter analysis and extraction means, and supplies the result to the pattern matching identification means.

(2) The speech recognition system according to claim 1, wherein the first feature parameter analysis and extraction means and the second feature parameter analysis and extraction means are realized by a single shared feature parameter analysis and extraction means, and the acoustic signal to be input to the first feature parameter analysis and extraction means and the acoustic signal to be input to the second feature parameter analysis and extraction means are input to it in a time-division manner.