JPS59123896A

JPS59123896A - Voice recognition system

Info

Publication number: JPS59123896A
Application number: JP57229278A
Authority: JP
Inventors: 佐藤　泰雄; 教幸藤本; 杉田　忠靖
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-12-29
Filing date: 1982-12-29
Publication date: 1984-07-17
Also published as: JPH0146079B2

Abstract

(57) [Abstract] This publication consists of application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

(A) Technical field of the invention

The present invention relates to a speech recognition method, and in particular to a speech recognition method that performs frequency analysis of the input speech with a bank of bandpass filters and recognizes speech units such as monosyllables or words. It reduces the number of parameters in the feature parameter time series to be matched, and also the amount of analysis hardware, without lowering the speech recognition rate.

(B) Background of the technology and problems

A known speech recognition method proceeds as follows. To analyze the speech over a wide frequency band, it uses a multi-channel bank of bandpass filters and converts the output of each filter into band-specific spectral power by rectification and integration or the like. These values are logarithmically transformed to obtain the band-specific logarithmic spectral power. To normalize the spectrum, the band-specific logarithmic spectral powers are then shifted so that their mean over all channels is zero. All of the normalized band-specific logarithmic spectral powers are used as the feature parameter time series for matching, which is compared with standard feature parameter time series registered in a dictionary in advance, for example by dynamic programming (DP) matching, to recognize monosyllables, words, and the like.
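The conventional normalization described above can be sketched for a single analysis frame as follows. This is a minimal illustration, not the patent's circuit: the function name `normalize_conventional`, the base-10 logarithm, and the sample values are assumptions made for the example.

```python
import math

def normalize_conventional(band_powers):
    """Log-transform the band powers, then shift them so their mean is zero.

    band_powers: positive spectral power values, one per filter channel.
    The base-10 logarithm is an assumption; the source does not give the base.
    """
    log_powers = [math.log10(p) for p in band_powers]
    mean = sum(log_powers) / len(log_powers)
    return [lp - mean for lp in log_powers]

# Hypothetical four-channel frame, spoken softly and then 50 times louder.
soft = normalize_conventional([1.0, 10.0, 100.0, 1000.0])
loud = normalize_conventional([50.0, 500.0, 5000.0, 50000.0])

# A uniform gain adds the same constant to every log power, so it cancels
# when the mean is subtracted and both frames give the same features.
print(all(abs(a - b) < 1e-9 for a, b in zip(soft, loud)))
```

Every channel contributes one feature here, which is why the parameter count grows with the number of filters.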

In this method, raising the speech recognition rate requires more bandpass filters, that is, more channels. Increasing the number of channels, however, not only requires more hardware for the frequency analysis; because each feature parameter vector gains elements, it also requires more memory for matching and more storage for the standard feature parameter time series held in the dictionary. The computation time for matching grows as well.

Conversely, reducing the number of channels reduces the required memory and so on, but degrades the speech recognition rate.

Before completing the present invention, the inventors carried out many experiments and studies and found the following property of speech recognition. Speech frequency analysis gives better results when it is performed over a wide band that includes the high-frequency region. In the high-frequency region in particular, however, what matters is the relative amount of speech energy in the power spectrum at each sampling instant; whether the peak of that power spectrum lies at, for example, 5 kHz or 7 kHz is not very important for speech recognition. This is presumably because the human ear has difficulty distinguishing small frequency differences in the high-frequency region.

The inventors therefore normalized the parameters obtained with a bank of bandpass filters that included the high-frequency region, then removed the parameters of several high-frequency channels, and tested whether the speech recognition rate changed. The recognition rate turned out to be no lower than when the parameters of all channels, including the high-frequency ones, were used as the feature parameter time series. By contrast, when the high-frequency region was left out of the normalization from the start, the speech recognition rate did drop.

(C) Object and constitution of the invention

In view of the above, the present invention improves on the conventional method. Its object is to reduce the amount of feature parameters to be matched without lowering the speech recognition rate, so that memory and related resources can be reduced, and also to reduce the amount of hardware needed for spectrum analysis. Put another way, the object is to raise the recognition rate further for the same amount of feature parameters as before. To this end, the speech recognition method of the present invention, which recognizes speech by matching feature parameter time series obtained by frequency analysis of the speech, is characterized in that spectrum analysis is performed over a wide speech frequency band, a normalized spectrum is obtained by weighting the high-frequency portion of the analysis result, and the spectrum with that high-frequency portion removed is then used as the feature parameter time series for matching. An embodiment is described below with reference to the drawing.

(D) Embodiment of the invention

The figure shows the configuration of one embodiment of the present invention.

In the figure, 1 is a speech input unit; 2 is a parameter extraction unit; 3 is a spectrum analysis unit; 4-1 to 4-n are bandpass filters; 5-1 to 5-n are rectifiers; 6-1 to 6-n are analog-to-digital converters; 7 is a spectrum normalization unit; 8-1 to 8-n are logarithmic conversion units; 9 is a constant storage unit; 10 is a multiplier; 11 is a mean calculation unit; 12-1 to 12-(n-1) are subtractors; 13 is a speech recognition unit; and 14 is a dictionary.

An analog speech signal consisting of a monosyllable or a word, entered through the speech input unit 1, is fed to the parameter extraction unit 2. The parameter extraction unit 2 analyzes the frequency content of the analog speech signal and extracts the feature parameter time series of the input speech to be recognized. For this purpose it comprises a spectrum analysis unit 3, which performs spectrum analysis over a wide speech frequency band, and a spectrum normalization unit 7, which normalizes the output of the spectrum analysis unit 3 with a weight applied to the high-frequency portion and outputs the normalized spectrum, excluding the high-frequency portion, as the feature parameters $P_1, P_2, \ldots, P_{n-1}$ for matching.

The spectrum analysis unit 3 has a plurality (n) of bandpass filters 4-1 to 4-n, one per band.

In the figure, the passband frequency rises from the topmost bandpass filter 4-1 downward. The bandpass filters 4-1 to 4-n are arranged so that, for example, the 3 dB attenuation points of adjacent filters coincide, and together they cover a wide band, for example from 180 Hz to 7.8 kHz. The bandpass filters 4-1 to 4-(n-1) are given bandwidths of roughly 170 Hz to 620 Hz, for example, whereas the bandpass filter 4-n for the highest frequency band is given a wide passband, for example about 3 kHz.

The speech signal from the speech input unit 1 is filtered into bands by the bandpass filters 4-1 to 4-n, and each band is fed to the corresponding rectifier 5-1 to 5-n. Each rectifier rectifies and smooths its input signal with a rectification-integration time constant of, for example, 10 ms. The outputs of the rectifiers 5-1 to 5-n are fed to the analog-to-digital converters 6-1 to 6-n, which yield the band-specific spectral power as digital values. The converted values are output to the spectrum normalization unit 7.
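The rectify-and-smooth stage is analog in the embodiment. A rough digital counterpart, given only to make the signal flow concrete, is full-wave rectification followed by a one-pole smoother. The function name, the 8 kHz sample rate, and the 10 ms time constant used below are illustrative assumptions, not values taken from the source.

```python
import math

def rectify_and_smooth(samples, sample_rate_hz, time_constant_s):
    """Full-wave rectify a band-filtered signal and smooth it.

    A one-pole low-pass stands in for the analog rectifier/integrator;
    this is an illustrative digital substitute, not the patent's circuit.
    """
    # Smoothing coefficient of a first-order filter with the given time constant.
    alpha = 1.0 - math.exp(-1.0 / (sample_rate_hz * time_constant_s))
    level = 0.0
    envelope = []
    for x in samples:
        level += alpha * (abs(x) - level)  # move toward the rectified sample
        envelope.append(level)
    return envelope

# One second of a hypothetical 500 Hz tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 500 * k / rate) for k in range(rate)]
env = rectify_and_smooth(tone, rate, 0.010)
print(round(env[-1], 2))  # settles near the mean of |sin|, about 2/pi
```

In the embodiment each of the n channels has its own such stage, and the smoothed level is what the analog-to-digital converter samples.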

The band-specific spectral powers entering the spectrum normalization unit 7 are logarithmically transformed by the logarithmic conversion units 8-1 to 8-n, so that the output values are proportional to the loudness perceived by humans; this gives the band-specific logarithmic spectral powers. The following transformation is then applied to them so that the same feature parameters result whether the input speech is loud or soft.

First, the multiplier 10 multiplies the output of the logarithmic conversion unit 8-n, that is, the logarithmic spectral power of the highest frequency band, by a weighting constant stored in advance in the constant storage unit 9. The reason, as noted above, is that the bandpass filter 4-n has a wider bandwidth than the other bandpass filters 4-1 to 4-(n-1), so that this single channel carries the weight of several channels. If the one channel of the highest frequency band corresponds to the bandwidth of three lower-band channels, "3" is stored in the constant storage unit 9 as the weighting constant, and the multiplier 10 triples the output of the logarithmic conversion unit 8-n.

The mean calculation unit 11 computes the mean of the band-specific logarithmic spectral powers with this weighting taken into account. If, for example, the outputs of the logarithmic conversion units 8-1 to 8-n are $P'_1, P'_2, \ldots, P'_{n-1}, P'_n$ and the weighting constant is $W$, the mean $\bar{P}$ is

$$\bar{P} = \frac{P'_1 + P'_2 + \cdots + P'_{n-1} + W \cdot P'_n}{n - 1 + W}$$

The subtractors 12-1 to 12-(n-1) correspond to the logarithmic conversion units 8-1 to 8-(n-1). No subtractor is provided for the logarithmic conversion unit 8-n: the band-specific logarithmic spectral power $P'_n$ is used only to compute the mean and is discarded afterwards. The subtractors 12-1 to 12-(n-1) subtract the output $\bar{P}$ of the mean calculation unit 11 from the band-specific logarithmic spectral powers $P'_1, P'_2, \ldots, P'_{n-1}$. The output $P_i$ of each subtractor is therefore

$$P_i = P'_i - \bar{P} \qquad (i = 1, 2, \ldots, n-1)$$

These outputs $P_i$ of the subtractors 12-1 to 12-(n-1) are passed to the speech recognition unit 13 as the feature parameters for matching.
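The weighted normalization performed by units 8 to 12 can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name `normalize_weighted`, the base-10 logarithm, and the sample frame are hypothetical, and the last list element is taken to be the wide highest-frequency channel.

```python
import math

def normalize_weighted(band_powers, weight):
    """Weighted-mean normalization that drops the top channel afterwards.

    band_powers: positive spectral powers for channels 1..n, where the last
    entry is the wide highest-frequency channel.
    weight: weighting constant W applied to that last channel.
    Returns n-1 features; the top channel only influences the mean.
    """
    log_powers = [math.log10(p) for p in band_powers]  # log base is an assumption
    *lower, top = log_powers
    mean = (sum(lower) + weight * top) / (len(lower) + weight)
    return [lp - mean for lp in lower]

# Hypothetical frame with n = 4 channels and W = 3.
# Log powers are 0, 1, 2, 3, so the mean is (0 + 1 + 2 + 3*3) / (3 + 3) = 2.
features = normalize_weighted([1.0, 10.0, 100.0, 1000.0], weight=3)
print(len(features))
print([round(f, 6) for f in features])
```

The high-frequency energy still shifts every remaining feature through the mean, which is the difference from simply analyzing the lower channels alone.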

The speech recognition unit 13 recognizes the input speech by matching the feature parameter time series, each frame of which is a set of (n-1) feature parameters, against the standard feature parameter time series registered in advance in the dictionary 14, for example by DP matching. Put simply, the time axis is normalized; at each pair of corresponding instants, the distance between the m input feature parameters $P_i$ and the standard feature parameters $\hat{P}_i$, formed from the differences $P_i - \hat{P}_i$, is summed from $i = 1$ to $i = m$; and these sums are accumulated over the whole time series. The monosyllable or word whose standard feature parameters give the smallest total is the recognition result.
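The matching step can be illustrated with a generic dynamic time warping routine. This is a textbook DP formulation offered as a sketch, not the specific DP matching variant used in the embodiment; the squared-difference frame distance, the function names, and the toy dictionary are all assumptions.

```python
def frame_distance(frame_a, frame_b):
    """Sum of squared differences between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(frame_a, frame_b))

def dp_match(sequence, template):
    """Accumulated distance along the best time alignment of two sequences."""
    inf = float("inf")
    rows, cols = len(sequence), len(template)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = frame_distance(sequence[i - 1], template[j - 1])
            # Allow a frame to repeat on either side, or both to advance.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]

def recognize(sequence, dictionary):
    """Return the label whose template has the smallest accumulated distance."""
    return min(dictionary, key=lambda label: dp_match(sequence, dictionary[label]))

# Toy dictionary with two-element feature vectors (hypothetical values).
dictionary = {
    "a": [[0.0, 0.0], [1.0, 1.0]],
    "b": [[5.0, 5.0], [6.0, 6.0]],
}
spoken = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]  # "a" with a stretched first frame
print(recognize(spoken, dictionary))  # a
```

The inner `frame_distance` loop runs over every feature element, so its cost and the dictionary's storage both scale with the number of features per frame.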

To test the effect of the present invention, the inventors carried out the following experiment. First, 19 bandpass filters, channels 1 to 19, were prepared to cover the band from 180 Hz to 7.8 kHz. The filters for channels 17, 18, and 19 in particular had center frequencies of 5145 Hz, 5910 Hz, and 7020 Hz; lower limit frequencies of 4800 Hz, 5514 Hz, and 6334 Hz; upper limit frequencies of 5514 Hz, 6334 Hz, and 7800 Hz; and bandwidths of 714 Hz, 820 Hz, and 1466 Hz, respectively. Speech recognition was then performed by the conventional method, using as feature parameters the 19 normalized spectral powers obtained by normalizing the band-specific logarithmic spectral powers of all channels.
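The stated bandwidths follow directly from the band limits, which a short check confirms. The figures are the ones quoted for channels 17 to 19; the variable names are illustrative.

```python
# (lower limit, upper limit) in Hz for channels 17-19, as quoted in the text.
channels = {17: (4800, 5514), 18: (5514, 6334), 19: (6334, 7800)}

# Bandwidth is the upper limit minus the lower limit.
bandwidths = {ch: hi - lo for ch, (lo, hi) in channels.items()}
print(bandwidths)  # {17: 714, 18: 820, 19: 1466}

# Adjacent filters share a band edge: each upper limit is the next lower limit.
edges_meet = all(channels[ch][1] == channels[ch + 1][0] for ch in (17, 18))
print(edges_meet)  # True
```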

Next, the same bandpass filters as above were used for channels 1 to 16, and the filters for channels 17 to 19 were replaced by a single bandpass filter with a lower limit frequency of 4800 Hz and an upper limit frequency of 7.8 kHz. Spectrum analysis was performed with these 17 channels as described in the embodiment above. With the weighting constant set to "3", the output of channel 17 was weighted and the mean was computed; the mean was then subtracted from the 16 band-specific logarithmic spectral powers other than the output of channel 17, and the results were used as the feature parameters. Speech recognition was performed by matching these 16 feature parameters against standard feature parameters that had been newly regenerated as sets of 16 feature parameters. The speech recognition rate was comparable to that obtained with the 19 feature parameters above.
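The arithmetic behind this configuration can be checked in a few lines. The band limits and channel counts are those given in the experiment; the variable names and the per-frame saving figure are illustrative calculations, not results reported in the source.

```python
# Channels 17-19 of the 19-channel bank, as (lower, upper) limits in Hz.
replaced = [(4800, 5514), (5514, 6334), (6334, 7800)]
merged = (4800, 7800)  # the single wide filter used in their place

merged_bandwidth = merged[1] - merged[0]
replaced_bandwidth = sum(hi - lo for lo, hi in replaced)
print(merged_bandwidth, replaced_bandwidth)  # 3000 3000

# One wide channel stands in for three, matching the weighting constant of 3.
weight = len(replaced)

# Filters drop from 19 to 17; features per frame drop from 19 to 16.
features_before, features_after = 19, 16
feature_saving = 1 - features_after / features_before
print(f"{feature_saving:.1%}")  # 15.8%
```

The same proportion applies to the dictionary storage per frame, since each stored template frame shrinks from 19 to 16 values.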

Earlier experiments had already shown that if only channels 1 to 16 are analyzed from the start, the speech recognition rate drops, because no information from the high-frequency region is taken into account.

The experiment was repeated with different frequency bands, and the same effect was obtained.

(E) Effects of the invention

As described above, the present invention makes it possible, by simple means, to reduce the amount of feature parameters to be matched and stored without lowering the speech recognition rate. Memory, arithmetic circuitry, and the like can be saved, and consolidating the high-frequency portion into a single band reduces the amount of hardware needed for spectrum analysis. In addition, widening the analyzed frequency band makes it possible to improve the speech recognition rate.

[Brief explanation of drawings]

The figure shows the configuration of one embodiment of the present invention. In the figure, 1 is a speech input unit; 2 is a parameter extraction unit; 3 is a spectrum analysis unit; 4-1 to 4-n are bandpass filters; 5-1 to 5-n are rectifiers; 6-1 to 6-n are analog-to-digital converters; 7 is a spectrum normalization unit; 8-1 to 8-n are logarithmic conversion units; 9 is a constant storage unit; 10 is a multiplier; 11 is a mean calculation unit; 12-1 to 12-(n-1) are subtractors; 13 is a speech recognition unit; and 14 is a dictionary.

Patent applicant: Fujitsu Limited

Claims

[Claims]

(1) A speech recognition method that recognizes speech by matching feature parameter time series obtained by frequency analysis of the speech, characterized in that spectrum analysis is performed over a wide speech frequency band, a normalized spectrum is obtained by weighting the high-frequency band portion of the analysis result, and the spectrum from which the high-frequency band portion has been removed is then used as the feature parameter time series for matching.

(2) The speech recognition method according to claim (1), characterized in that the weighted normalization logarithmically transforms the analyzed spectrum, weights the high-frequency band portion, obtains the mean value, and corrects the spectrum on the basis of that mean so that the sum of the spectrum becomes a predetermined value such as zero.

(3) The speech recognition method according to claim (1), characterized in that the spectrum analysis uses a multi-channel bank of bandpass filters and converts the output of each filter into band-specific spectral power, and the normalization logarithmically transforms these values to obtain the band-specific logarithmic spectral power, weights the one channel of the highest frequency band, obtains the weighted mean over all channels, and subtracts that mean from the band-specific logarithmic spectral powers other than the one channel of the highest frequency band.