JPS5857197A - Voice recognition equipment - Google Patents

Voice recognition equipment

Info

Publication number
JPS5857197A
JPS5857197A JP15663381A
Authority
JP
Japan
Prior art keywords
lpc
distance
difference
distance scale
bandwidth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP15663381A
Other languages
Japanese (ja)
Inventor
Kiyohiro Shikano (清宏 鹿野)
Masahide Sugiyama (雅英 杉山)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP15663381A priority Critical patent/JPS5857197A/en
Publication of JPS5857197A publication Critical patent/JPS5857197A/en
Pending legal-status Critical Current

Abstract

(57) [Summary] This bulletin contains application data filed before electronic filing, so abstract data is not recorded.

Description

[Detailed Description of the Invention]
This invention relates to an apparatus that recognizes input speech by computing distance measure values between feature parameters obtained from the input speech through LPC analysis and standard feature parameters of phonemes, and thereby recognizing the phonemes contained in the input speech.

In conventional speech recognition devices based on this kind of LPC analysis, the difference in bandwidth between LPC spectral envelopes was not taken into account when computing the distance measure value between the LPC spectral envelope of the input speech and that of the standard pattern. As a result, the bandwidth of the LPC spectral envelope of one and the same phoneme changes with coarticulation within a speaker, that is, with coupling to the phonemes preceding and following it, and it also differs between speakers, for example depending on whether or not the voice is strained. For example, as shown in Fig. 1, the formant frequencies f1, f2, f3, f4 of the same phoneme coincide, yet because of coarticulation within a speaker or differences such as voice strain between speakers, the spectral envelope may take the forms shown by the solid and dotted lines: the sharpness of each formant peak differs, that is, the bandwidths ΔB1, ΔB2, … of the corresponding formants differ. When the bandwidths of the LPC spectral envelopes differ in this way, the same phoneme could be recognized as a different phoneme or could fail to be recognized at all.

To prevent such misrecognition, this invention pays attention to the LPC spectral envelopes of the phonemes of the input speech and the standard speech. In the feature extraction and distance measure calculation unit 2 of Fig. 2, feature parameters are extracted from the input speech 1 by LPC analysis, and the distance measure value 4 between the spectral envelope of those feature parameters and that of the standard phoneme feature parameters stored in advance in the storage unit 3 is obtained. In this invention, bandwidth normalization is performed when this distance measure value 4 is computed. The phoneme decision unit 5 compares the distance measure values 4 obtained for the respective standard phoneme patterns and outputs the phoneme name 6 that gives the smallest distance measure value, or the phoneme names 6 whose distance measure values fall below a certain threshold. More than one phoneme name may be output.
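
As an illustration of the decision step in unit 5, here is a minimal Python sketch, assuming the distance measure values for all standard patterns have already been computed; the function and variable names are illustrative and not taken from the patent.

```python
import numpy as np

def decide_phonemes(distances, names, threshold=None):
    """Return the phoneme name with the smallest distance measure value or,
    if a threshold is given, every phoneme name whose distance falls below it.
    `distances` and `names` are parallel sequences, one entry per standard pattern."""
    distances = np.asarray(distances, dtype=float)
    if threshold is None:
        return [names[int(np.argmin(distances))]]                    # single best candidate
    return [n for n, d in zip(names, distances) if d <= threshold]   # possibly several names
```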

The concept of the bandwidth normalization performed when the distance measure value is computed is shown in Fig. 3. The bandwidth normalization is carried out with the known bandwidth expansion method. For the input speech 1, the LPC analysis unit 11 obtains distance measure parameters, for example the linear prediction coefficients α_i, and the variable correction unit 12 corrects them to (1 + ε)^i α_i. Meanwhile, the corresponding distance measure parameters 13 of the standard phoneme pattern, that is, the linear prediction coefficients α_i', are corrected in the variable correction unit 14 in the direction opposite to the correction in unit 12, giving (1 + ε)^(-i) α_i'. The correction amount ε is varied, and the distance calculation unit 15 determines it so that the difference between the LPC spectral envelopes, for example the LPC cepstral distance CEP, becomes minimum. The correction amount ε at this point is given by equation (1), where c_i and c_i' are the LPC cepstrum coefficients of the input speech and of the standard pattern, respectively.

The distance measure may be computed not only as the LPC cepstral distance but also on the basis of the WLR (Weighted Likelihood Ratio), an LPC peak-weighting measure; in that case, ε may be determined so that the WLR value becomes minimum. Using the value of the correction amount ε determined in this way, the distance measure value is then computed with the LPC cepstral distance or with the WLR.

Next, this distance measure computation, that is, the operation of the feature extraction and distance measure calculation unit 2 of Fig. 2, is explained with reference to Fig. 4. The input speech 1 is analyzed in the autocorrelation analysis unit 7, and the autocorrelation coefficients (r_n, n = 1, 2, ..., p) are obtained, where p is the order of the LPC analysis. Next, the LPC analysis unit 8 performs p-th order LPC analysis to obtain the linear prediction coefficients (α_n, n = 1, 2, ..., p), and then the LPC cepstrum coefficients (c_n, n = 1, 2, ..., q) and, in some cases, the LPC correlation coefficients (r_n, n = 1, 2, ..., q) are computed from the linear prediction coefficients (α_n) by equations (2) and (3) below.

c_n = -α_n - (1/n) Σ_{m=1}^{n-1} (n - m) α_m c_{n-m}        (2)

where n = 1, 2, ..., q; when the index n of c_n is larger than the order p of the LPC analysis, it is assumed that α_{p+1} = α_{p+2} = ... = α_q = 0.

r_n = -Σ_{m=1}^{p} α_m r_{n-m}        (3)

where n = p+1, p+2, ..., q; the LPC correlation coefficients of order p and below coincide with the correlation coefficients output by the autocorrelation analysis.
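
Recursions (2) and (3) translate directly into code. The sketch below follows the sign convention A(z) = 1 + Σ α_k z^(-k) implied by equation (2); the function names are illustrative.

```python
import numpy as np

def lpc_to_cepstrum(alpha, q):
    """Equation (2): LPC cepstrum coefficients c_1..c_q from alpha_1..alpha_p,
    with alpha_n taken as 0 for n > p."""
    p = len(alpha)
    a = np.concatenate([np.asarray(alpha, dtype=float), np.zeros(max(0, q - p))])
    c = np.zeros(q)
    for n in range(1, q + 1):
        acc = sum((n - m) * a[m - 1] * c[n - m - 1] for m in range(1, n))
        c[n - 1] = -a[n - 1] - acc / n
    return c

def extend_correlation(alpha, r, q):
    """Equation (3): extend the autocorrelation coefficients [r_1, ..., r_p] to the
    LPC correlation coefficients r_1..r_q via r_n = -sum_m alpha_m * r_(n-m)."""
    p = len(alpha)
    r = list(r[:p])                      # r[k-1] holds r_k; orders <= p are unchanged
    for n in range(p + 1, q + 1):
        r.append(-sum(alpha[m - 1] * r[n - m - 1] for m in range(1, p + 1)))
    return r
```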

The standard phoneme patterns are likewise stored in advance in the storage unit 3 as feature parameters obtained by the above procedure, namely the LPC cepstrum coefficients (c_n') and, where appropriate, the LPC correlation coefficients (r_n') as well. The distance measure value calculation unit 9 then computes the distance measure value between the LPC spectral envelopes of the feature parameters of the input speech and the feature parameters of the standard phoneme pattern.

First, as described above, the correction amount ε for the bandwidth is obtained with the bandwidth expansion method by equation (1). It has been confirmed experimentally that it is effective to limit the range that this correction amount may take, as in equation (4):

r_- ≤ ε ≤ r_+        (4)

When this limit is imposed, the value of ε is set to r_- if ε is smaller than r_-, and to r_+ if ε is larger than r_+. The LPC cepstrum coefficients ĉ_n and ĉ_n' of the input speech and the standard pattern when this correction is applied are computed as follows:

ĉ_n = (1 + ε)^n c_n        (5)

ĉ_n' = (1 + ε)^(-n) c_n'   or   ĉ_n' = (1 - ε)^n c_n'        (6)

When the distance measure is then computed as the LPC cepstral distance using these corrected coefficients, it is obtained by equation (7). According to this computation, as stated above, the correction amount ε adjusts the coefficients c_n and c_n' in mutually opposite directions so that the distance between the LPC spectral envelopes becomes minimum; the two envelopes are brought closer together, that is, the bandwidths are brought closer to each other, and the result is a normalized distance measure value.
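
Putting the limit (4) and the corrections (5) and (6) together, a normalized cepstral distance can be sketched as below. Equation (7) is not reproduced above, so a plain sum of squared differences of the corrected cepstrum coefficients is used here as an assumed stand-in for it.

```python
import numpy as np

def normalized_cep_distance(c_in, c_std, eps, r_minus, r_plus):
    """Clip eps to [r_minus, r_plus] (equation (4)), apply the opposite corrections
    (5) and (6), and return a cepstral distance between the corrected coefficients
    (assumed form of equation (7))."""
    eps = min(max(eps, r_minus), r_plus)                               # equation (4)
    n = np.arange(1, len(c_in) + 1)
    c_in_hat = (1.0 + eps) ** n * np.asarray(c_in, dtype=float)        # equation (5)
    c_std_hat = (1.0 + eps) ** (-n) * np.asarray(c_std, dtype=float)   # equation (6)
    diff = c_in_hat - c_std_hat
    return float(np.dot(diff, diff))
```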

The distance measure value may also be computed, as mentioned above, on the basis of the WLR, the LPC peak-weighting measure. When the distance measure value for this case is obtained with the above correction applied, it becomes

WLR = Σ_n (ĉ_n - ĉ_n')(r̂_n - r̂_n')        (8)

where the values of r̂_n and r̂_n' are obtained from the prediction coefficients ((1 + ε)^i α_i) (i = 1, ..., p) and ((1 + ε)^(-i) α_i') (i = 1, ..., p), respectively, by way of the PARCOR coefficients. Here α_i and α_i' are the linear prediction coefficients of the input speech and of the standard pattern.
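
A sketch of this measure, assuming equation (8) has the form shown above, namely the sum over n of the cepstrum difference times the correlation difference evaluated on the bandwidth-corrected coefficients:

```python
import numpy as np

def wlr_distance(c_in_hat, c_std_hat, r_in_hat, r_std_hat):
    """Assumed form of equation (8): WLR distance between the bandwidth-corrected
    cepstrum coefficients and the bandwidth-corrected correlation coefficients."""
    dc = np.asarray(c_in_hat, dtype=float) - np.asarray(c_std_hat, dtype=float)
    dr = np.asarray(r_in_hat, dtype=float) - np.asarray(r_std_hat, dtype=float)
    return float(np.dot(dc, dr))
```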

As described above, according to this invention the distance measure value between the LPC spectral envelopes of the input speech and the standard phoneme pattern can be computed under the condition that the bandwidths of the spectral envelopes are normalized, so phonemes can be identified with high accuracy.

With this configuration, the bandwidth variation caused by coarticulation and the bandwidth variation between speakers are normalized, and the performance of phoneme recognition can be improved. Moreover, when the limit of equation (4) is imposed, it prevents the recognition accuracy from being degraded, on the contrary, by an excessively large correction.

This invention can be applied not only to phoneme recognition but also, for example, to a word speech recognition device to improve its recognition performance. Fig. 5 shows an example of a word speech recognition device, in which the time series of the feature parameters of each word is stored in the storage unit 3.

For the input speech 1, the LPC spectral distance measure values 4 with respect to the standard pattern time series in the storage unit 3 are computed in the calculation unit 2, and time-axis-normalized matching on these distance measure values 4 is performed in the matching unit 17. The resulting word-level distance measure values 18 are input to the word decision unit 19, which outputs the word name 20 having the smallest distance measure value 18, or the word names whose distance measure values fall below a certain threshold.
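
The patent does not spell out the matching algorithm of unit 17; a common realization of time-axis-normalized matching is dynamic time warping (DTW) over a matrix of frame-to-frame distance measure values, sketched here under that assumption with illustrative names.

```python
import numpy as np

def dtw_word_distance(frame_dist):
    """Time-axis-normalized matching: accumulate frame_dist[i, j] (distance between
    input frame i and standard-pattern frame j) along the best warping path and
    normalize by a path-length proxy."""
    d = np.asarray(frame_dist, dtype=float)
    n, m = d.shape
    g = np.full((n + 1, m + 1), np.inf)
    g[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i, j] = d[i - 1, j - 1] + min(g[i - 1, j], g[i, j - 1], g[i - 1, j - 1])
    return g[n, m] / (n + m)
```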

[Brief Explanation of the Drawings]

Fig. 1 is a diagram showing an example in which the bandwidth of the spectral envelope differs for the same phoneme; Fig. 2 is a conceptual diagram showing one embodiment of a phoneme identification device to which this invention is applied; Fig. 3 is a conceptual diagram showing the idea of the bandwidth normalization measure; Fig. 4 is a conceptual diagram showing an example of the feature extraction and distance measure calculation unit 2 of Fig. 2; and Fig. 5 is a conceptual diagram showing an example of a word speech recognition device to which this invention is applied.

1: input speech; 2: feature extraction and distance measure calculation unit; 3: storage of the feature parameters of the standard phoneme patterns; 4: distance measure value; 5: phoneme decision unit; 6: phoneme name; 7: autocorrelation analysis unit; 8: LPC analysis unit; 9: distance measure calculation unit; 17: time-axis normalized matching unit; 18: word-level distance measure value; 19: word decision unit; 20: word name.

Patent applicant: Nippon Telegraph and Telephone Public Corporation

Procedural Amendment (voluntary), December 2, 1981, to the Commissioner of the Patent Office. Indication of the case: Japanese Patent Application No. 56-156633. Title of the invention: Voice recognition equipment. Relationship to the case of the person making the amendment: patent applicant, Nippon Telegraph and Telephone Public Corporation; agent: Sagami Building, 4-2-21 Shinjuku, Shinjuku-ku, Tokyo. Content of the amendment: on page 11, line 17, "することによる" is corrected to "することにより".

Claims (1)

[Claims]
(1) In a speech recognition device that recognizes input speech by obtaining the difference (distance measure value) between feature parameters obtained from the input speech by LPC analysis and the feature parameters of standard phoneme patterns stored in advance, a speech recognition device characterized in that the means for computing said difference (distance measure value) is given the capability of normalizing the difference in bandwidth between the LPC spectral envelopes, paying attention to the difference between the LPC spectral envelopes.
JP15663381A 1981-09-30 1981-09-30 Voice recognition equipment Pending JPS5857197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP15663381A JPS5857197A (en) 1981-09-30 1981-09-30 Voice recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP15663381A JPS5857197A (en) 1981-09-30 1981-09-30 Voice recognition equipment

Publications (1)

Publication Number Publication Date
JPS5857197A true JPS5857197A (en) 1983-04-05

Family

ID=15631938

Family Applications (1)

Application Number Title Priority Date Filing Date
JP15663381A Pending JPS5857197A (en) 1981-09-30 1981-09-30 Voice recognition equipment

Country Status (1)

Country Link
JP (1) JPS5857197A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6272106B1 (en) 1994-05-06 2001-08-07 NTT Mobile Communications Network, Inc. Method and device for detecting double-talk, and echo canceler

Similar Documents

Publication Publication Date Title
Baudoin et al. On the transformation of the speech spectrum for voice conversion
Wang et al. An objective measure for predicting subjective quality of speech coders
US7337107B2 (en) Perceptual harmonic cepstral coefficients as the front-end for speech recognition
US7257535B2 (en) Parametric speech codec for representing synthetic speech in the presence of background noise
US7792672B2 (en) Method and system for the quick conversion of a voice signal
Hunt Spectral signal processing for ASR
US20070208566A1 (en) Voice Signal Conversation Method And System
Hunt et al. Speaker dependent and independent speech recognition experiments with an auditory model
US4937871A (en) Speech recognition device
JPS634200B2 (en)
EP2372707B1 (en) Adaptive spectral transformation for acoustic speech signals
Hermansky An efficient speaker-independent automatic speech recognition by simulation of some properties of human auditory perception
JP3240908B2 (en) Voice conversion method
US20050240397A1 (en) Method of determining variable-length frame for speech signal preprocessing and speech signal preprocessing method and device using the same
JPS5857197A (en) Voice recognition equipment
KR100511248B1 (en) An Amplitude Warping Approach to Intra-Speaker Normalization for Speech Recognition
EP0782127A2 (en) A time-varying feature space preprocessing procedure for telephone based speech recognition
US5581650A (en) Learning dynamic programming
Gallardo et al. Spectral Sub-band Analysis of Speaker Verification Employing Narrowband and Wideband Speech.
Tanaka A dynamic processing approach to phoneme recognition (part I)--Feature extraction
Ishizuka et al. Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition
JPS5999500A (en) Voice recognition method
Hur et al. Formant weighted cepstral feature for LSP-based speech recognition
Pols Analysis and synthesis of speech using a broad-band spectral representation
Steeneken Quality evaluation of speech processing systems