JPS61206000A - Voice recognition equipment - Google Patents

Voice recognition equipment

Info

Publication number
JPS61206000A
Authority
JP
Japan
Prior art keywords
pattern
speech
spectral
recognition device
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP4653085A
Other languages
Japanese (ja)
Inventor
畑岡 信夫
天野 明雄
矢島 俊一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd filed Critical Hitachi Ltd
Priority to JP4653085A priority Critical patent/JPS61206000A/en
Publication of JPS61206000A publication Critical patent/JPS61206000A/en
Pending legal-status Critical Current

Abstract

(57) [Abstract] This publication contains application data filed before the introduction of electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Field of Application of the Invention]

The present invention relates to a speech recognition device, and in particular to one that extracts a spectral pattern that is stable against fluctuations in utterance level, while mitigating the influence of the fundamental frequency of the speech and of analysis fluctuations.

[Background of the Invention]

The conventional spectral-pattern normalization method consisted of the two-stage processing (level correction, then reference-level matching) shown in Fig. 1 (see, e.g., "Simplified Speaker-Independent Speech Recognition", Speech Study Group material no. 884-152, Acoustical Society of Japan). The conventional method has the advantage of performing a normalization that preserves the absolute level of the original spectral pattern to some extent, but it gives no consideration to the effective use of the arithmetic word length of the analysis, which suffers when absolute levels differ because of utterance level. In a speech recognition device driven by discrete (isolated) utterances, where the utterance level can be kept roughly uniform (strictly, a device in which input speech is uttered in the same manner as the registered speech), this caused the conventional method little trouble. In a continuous-utterance recognition device, however, where the registered speech and the input speech are uttered differently, the absolute level of a registered word (generally a fragment of the input speech) can vary greatly depending on where it appears in the continuously uttered speech (e.g., word-medially or word-finally), and when the absolute level is small the arithmetic word length of the analysis could not be used effectively. Furthermore, in conventional band-pass filter (BPF) analysis, harmonics of the fundamental frequency of the speech and analysis fluctuations caused by insufficient arithmetic precision of the analysis filters could produce a pattern differing from the true spectral shape (e.g., spurious local peaks appearing in the low-frequency band).

[Objects of the Invention]

An object of the present invention is to provide a pattern normalization scheme for a speech recognition device in which the arithmetic word length of the analysis is used effectively even when the absolute level of the speech differs, and which is not affected by the fundamental frequency of the speech or by analysis fluctuations.

[Summary of the Invention]

To achieve the above object, the first feature of the present invention is a level normalization that divides the spectral value of each band by a value based on the sum of the values over the bands of the short-time spectral pattern; the second feature is a smoothing of the spectral pattern.

[Embodiments of the Invention]

The principle of the present invention is explained in detail below.

Fig. 1 shows the flow of the conventional pattern normalization. Taking as input the spectral pattern f_i (i = 1 to N channels) obtained from, e.g., band-pass filter (BPF) analysis, the first step applies a level correction such as a logarithmic transform; this is a processing step motivated by the characteristics of human hearing. Besides the transform shown in the flow, a simple logarithmic transform applied to fixed-point values, as described in the aforementioned "Simplified Speaker-Independent Speech Recognition", is also conceivable. The second step then performs reference-level matching to normalize the differences in absolute level caused by differences in utterance level.
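As a sketch (not the patent's own implementation), the conventional two-stage flow might look like the following; the constant `f_c` and the use of NumPy are illustrative assumptions:

```python
import numpy as np

def conventional_normalize(f, f_c=1.0):
    """Conventional two-stage pattern normalization (Fig. 1 flow)."""
    # Step 1: level correction -- auditory-motivated log transform
    g = np.log10(1.0 + f / f_c)
    # Step 2: reference-level matching -- subtract the channel mean
    return g - g.mean()

# Because the log transform acts on the raw (un-normalized) levels,
# the output still depends on the absolute utterance level:
f = np.array([0.2, 1.5, 0.8, 0.1])
print(np.allclose(conventional_normalize(f), conventional_normalize(10 * f)))  # False
```

This residual level-dependence is exactly the weakness the invention's extra normalization step addresses.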

Fig. 2 shows an example flow of the pattern normalization according to the present invention. Each step is explained in detail below.

Input: spectral pattern f_i, i = 1 to N channels.

(1) First step (level normalization):
f_i' = f_i / f_sum, where f_sum = Σ f_i (the sum over all N channels).

Level normalization is performed so that the arithmetic word length of the analysis is used effectively, unaffected by differences in absolute level. As a result, the logarithmic transform of the second step can be executed accurately even for speech with a small absolute level.

(2)第2ステツプ(レベルの補正) fl’ == nogxa (1,0+ f1/ f、
)、但しf、;定数 この処理は従来方式の第1ステツプに等しい。
(2) Second step (level correction) fl' == nogxa (1,0+f1/f,
), where f, ; constant This process is equivalent to the first step of the conventional method.

(3) Third step (reference-level matching):
f_i''' = f_i'' − f_avg'', where f_avg'' = (Σ f_i'') / N.

This is identical to the second step of the conventional method.

(4)第4ステツプ(平滑化処理) f、’=f1”’−1+2ft”+f1.1”’)/4
但しi=l〜n (N 一般に、帯域通過フィルタ分析の中心周波数配置は人間
の聴覚特性と母音の第1ホルマント周波数の分布(日本
語5母音の場合(男性)、Ialが600〜900 H
z、Ialが160〜300Hz、julが200〜5
00H2、lelが300〜600 Hz、101が4
00〜650Hzとなっている)を考慮して、対数スケ
ールあるいはメルスケール配置とするのが通例である。
(4) Fourth step (smoothing process) f,'=f1"'-1+2ft"+f1.1"')/4
However, i = l~n (N) In general, the center frequency arrangement for bandpass filter analysis is based on the human auditory characteristics and the distribution of the first formant frequency of vowels (for Japanese 5 vowels (male), Ial is 600~900 H
z, Ial is 160-300Hz, jul is 200-5
00H2, lel is 300-600 Hz, 101 is 4
00 to 650 Hz), it is customary to use a logarithmic scale or mel scale arrangement.
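As an illustrative check of such a placement (the exact channel counts depend on endpoint conventions that the text does not specify), log-scale center frequencies over 150-4000 Hz with 16 channels can be computed as follows:

```python
import math

def log_scale_centers(f_lo=150.0, f_hi=4000.0, n=16):
    """Place n band-pass filter center frequencies on a logarithmic scale."""
    ratio = f_hi / f_lo
    return [f_lo * ratio ** (k / (n - 1)) for k in range(n)]

centers = log_scale_centers()
# With this particular convention, 6 of the 16 centers fall below 500 Hz,
# illustrating the low-frequency crowding described in the text.
print(sum(1 for f in centers if f < 500))  # 6
```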

With such a placement, the center frequencies tend to be crowded toward the low-frequency side (e.g., with 16 channels covering the band 150-4000 Hz and center frequencies on a logarithmic scale, 6 channels fall below 500 Hz and 10 channels below 1 kHz). As a result, harmonics of the fundamental frequency of the speech (the periodicity of the voiced part of the waveform, corresponding to the vocal-fold vibration rate, 100-150 Hz for male speakers), which are unrelated to phonemic features (e.g., /a/, /i/), readily show up in the low-order channels and can corrupt the spectral pattern (e.g., two or three small peaks appearing in the low orders). The smoothing is performed to counter this phenomenon, and is applied in particular to the low-order channels up to about 500 Hz. A three-channel moving average, as in the example of the present invention, is one conceivable implementation.
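The four steps can be sketched end to end as follows (a hedged illustration: the constant `f_c`, the number of smoothed low-order channels `n_low`, and the use of NumPy are assumptions, not specified by the patent):

```python
import numpy as np

def normalize_pattern(f, f_c=1.0, n_low=6):
    """Four-step pattern normalization (Fig. 2 flow)."""
    # Step 1: level normalization -- divide by the sum over all channels
    f1 = f / f.sum()
    # Step 2: level correction -- log transform, now on a level-independent input
    f2 = np.log10(1.0 + f1 / f_c)
    # Step 3: reference-level matching -- subtract the channel mean
    f3 = f2 - f2.mean()
    # Step 4: smoothing -- 3-channel weighted moving average on low-order channels
    f4 = f3.copy()
    for i in range(1, n_low):
        f4[i] = (f3[i - 1] + 2 * f3[i] + f3[i + 1]) / 4
    return f4

# Step 1 makes the whole chain invariant to the absolute utterance level:
f = np.array([0.3, 2.0, 1.1, 0.4, 0.2, 0.1, 0.5, 0.9])
print(np.allclose(normalize_pattern(f), normalize_pattern(10 * f)))  # True
```

This level invariance is the point of contrast with the conventional two-step flow, whose output changes when the input is scaled.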

Fig. 3 compares the spectral patterns produced by the conventional normalization method and by scheme 1 of the present invention (first through third steps). The speech inputs were a word-medial /i/ and a word-final /i/, of greatly differing absolute level, taken from the utterance /aikurushii/. Compared with the conventional method, scheme 1 yields a sharp peak (first formant) that is not flattened, and the dynamic-range ratio of the pattern of the low-level word-final /i/, 2.4 under the conventional method, increased 1.4-fold, giving a good pattern normalization.

Fig. 4 shows the effect of scheme 2 of the present invention (first through fourth steps, i.e., with the smoothing added): the smoothing removes the small local peaks that had appeared below the first formant. The topmost pattern is the LPC spectrum, which represents the true spectral shape.

Next, specific embodiments of the present invention are described in detail.

Fig. 5 is a block diagram showing the configuration of one embodiment of a speech recognition device using the present invention. The input speech 1 passes through a low-pass filter (LPF) and analog-to-digital converter (ADC) 2, where it is sampled from analog to digital values while aliasing noise is removed. The speech analysis unit 3 then analyzes the input speech to obtain the spectral pattern needed for recognition.

The spectral pattern can be obtained by, for example, band-pass filter (BPF) analysis or FFT analysis. The pattern normalization unit 4 of the present invention then normalizes and corrects the pattern for utterance-level fluctuations and for the band-pass filter analysis results. Next, the distance calculator 6 computes the difference (or similarity) between the normalized input spectral pattern and the spectral pattern of the standard speech stored in the standard speech memory 5. A possible distance is the sum over channels of the absolute differences of the spectral values.
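A minimal sketch of such a per-frame distance (the city-block distance over channels) and of template selection by smallest distance; the function names and the two-channel toy templates are illustrative, not from the patent:

```python
import numpy as np

def frame_distance(a, b):
    """Sum over channels of absolute spectral differences (city-block distance)."""
    return float(np.abs(a - b).sum())

def nearest_template(pattern, templates):
    """Pick the stored standard pattern with the smallest distance to the input."""
    return min(templates, key=lambda name: frame_distance(pattern, templates[name]))

templates = {"a": np.array([0.9, 0.1]), "i": np.array([0.1, 0.9])}
print(nearest_template(np.array([0.8, 0.2]), templates))  # a
```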

Next, the matching unit 7 computes a total distance (matching) value that also accounts for differences in temporal structure between the input speech and the standard speech, and the decision unit 8 determines, from the relative magnitudes of the total distance values, which standard speech the input most resembles, outputting the recognition result 9. The distance calculator 6 can be built from simple adders and subtractors alone; the matching unit 7 can be built from, e.g., a circuit implementing a continuous NL (nonlinear) matching method (a known example: the continuous DP method, an improvement on JP-A-55-2205); and the decision unit 8 can be realized as a simple magnitude comparator. The division and logarithmic transform in the pattern normalization unit 4 of the present invention can be executed by table look-up, and the remaining processing can be built from adders and subtractors alone.
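The table look-up mentioned for the logarithmic transform can be sketched as follows (the table size and the [0, 1) input range are assumptions chosen for illustration, in the spirit of fixed-point hardware):

```python
import numpy as np

TABLE_SIZE = 1024
# Precomputed log10(1 + x) for x in [0, 1), as fixed-point hardware might store it
LOG_TABLE = np.log10(1.0 + np.arange(TABLE_SIZE) / TABLE_SIZE)

def log_lookup(x):
    """Approximate log10(1 + x) for x in [0, 1) by indexing the precomputed table."""
    return float(LOG_TABLE[int(x * TABLE_SIZE)])

print(abs(log_lookup(0.5) - np.log10(1.5)) < 1e-9)  # True
```

Replacing the transcendental function with an indexed read is what lets the normalization unit run with only adders, subtractors, and memory.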

[Effects of the Invention]

Fig. 6 shows consonant recognition results for continuous speech under the conventional method and the method of the present invention, expressed as the recognition rate when multiple candidates up to rank n are allowed. Compared with the conventional method, scheme 2 of the present invention, which includes the smoothing, improved the recognition rate by about 3% (within rank 1). The improvement is especially marked when candidates within ranks 4 and 5 are considered.

As described above, the present invention makes effective use of the arithmetic word length of the analysis regardless of absolute-level fluctuations, and mitigates the influence of the fundamental frequency and of analysis fluctuations in band-pass filter analysis, with the effect of improving recognition performance.

[Brief Description of the Drawings]

Fig. 1 shows the flow of the conventional pattern normalization; Fig. 2 shows one embodiment of the pattern normalization flow of the present invention; Fig. 3 compares the spectral patterns of the conventional method and the method of the present invention; Fig. 4 shows the effect of the smoothing scheme of the present invention in particular; Fig. 5 is a block diagram of one embodiment of a speech recognition device incorporating the present invention; and Fig. 6 presents recognition experiment results demonstrating the effect of the present invention. 4: pattern normalization unit.

Claims (1)

[Claims]

1. A speech recognition device comprising a speech analysis unit that extracts at least a short-time spectral pattern of speech, characterized by comprising: first means for level normalization of the spectral pattern obtained by the speech analysis unit; second means for level-correction transformation of the pattern obtained by the first means; and third means for reference-level matching of the pattern obtained by the second means.

2. The speech recognition device of claim 1, further characterized by comprising means for smoothing the spectral pattern obtained by the speech analysis unit using spectral information of adjacent bands.

3. The speech recognition device of claim 1, characterized in that the first means divides the spectral information of each band by a value based on the sum of the spectral values over the bands of the short-time spectral pattern, the second means performs a logarithmic transform, and the third means subtracts from the spectral value of each band a value based on the mean of the spectral values over the bands of the spectral pattern.
JP4653085A 1985-03-11 1985-03-11 Voice recognition equipment Pending JPS61206000A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP4653085A JPS61206000A (en) 1985-03-11 1985-03-11 Voice recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP4653085A JPS61206000A (en) 1985-03-11 1985-03-11 Voice recognition equipment

Publications (1)

Publication Number Publication Date
JPS61206000A true JPS61206000A (en) 1986-09-12

Family

ID=12749836

Family Applications (1)

Application Number Title Priority Date Filing Date
JP4653085A Pending JPS61206000A (en) 1985-03-11 1985-03-11 Voice recognition equipment

Country Status (1)

Country Link
JP (1) JPS61206000A (en)

Similar Documents

Publication Publication Date Title
Reynolds Experimental evaluation of features for robust speaker identification
US5054085A (en) Preprocessing system for speech recognition
US7756700B2 (en) Perceptual harmonic cepstral coefficients as the front-end for speech recognition
Hunt et al. Speaker dependent and independent speech recognition experiments with an auditory model
JPH0743598B2 (en) Speech recognition method
JP2745535B2 (en) Voice recognition device
Vergin et al. Compensated mel frequency cepstrum coefficients
Bahaghighat et al. Textdependent Speaker Recognition by combination of LBG VQ and DTW for persian language
JPS6366600A (en) Method and apparatus for obtaining normalized signal for subsequent processing by preprocessing of speaker,s voice
Alam et al. Robust feature extractors for continuous speech recognition
JP3354252B2 (en) Voice recognition device
JPS61206000A (en) Voice recognition equipment
JPH0449952B2 (en)
Chougule et al. Channel robust MFCCs for continuous speech speaker recognition
Zhen et al. On the use of bandpass liftering in speaker recognition
Kyriakides et al. Isolated word endpoint detection using time-frequency variance kernels
JPH04230800A (en) Voice signal processor
Saha et al. Modified mel-frequency cepstral coefficient
Sahu et al. Significance of Filterbank Structure for Capturing Dysarthric Information through Cepstral Coefficients
Sivakumaran et al. Sub-band based speaker verification using dynamic recombination weights.
JPH1097288A (en) Background noise removing device and speech recognition system
Islam et al. Mel-Wiener filter for Mel-LPC based speech recognition
Ormanci et al. Subjective assessment of frequency bands for perception of speaker identity
JP2658426B2 (en) Voice recognition method
JPH0146079B2 (en)