JPS5857197A - Voice recognition equipment - Google Patents

Voice recognition equipment

Info

Publication number
JPS5857197A
JPS5857197A JP15663381A
Authority
JP
Japan
Prior art keywords
lpc
distance
difference
distance scale
bandwidth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP15663381A
Other languages
Japanese (ja)
Inventor
Kiyohiro Shikano (清宏 鹿野)
Masahide Sugiyama (雅英 杉山)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP15663381A priority Critical patent/JPS5857197A/en
Publication of JPS5857197A publication Critical patent/JPS5857197A/en
Pending legal-status Critical Current

Abstract

(57) [Summary] This bulletin contains application data filed before electronic filing, so abstract data is not recorded.

Description

[Detailed Description of the Invention]
This invention relates to an apparatus that recognizes input speech by computing distance measure values between feature parameters obtained from the input speech through LPC analysis and standard feature parameters of phonemes, and thereby recognizing the phonemes contained in the input speech.

In conventional speech recognition devices based on this kind of LPC analysis, the difference in bandwidth between LPC spectral envelopes was not taken into account when computing the distance measure value between the LPC spectral envelope of the input speech and that of the standard pattern. As a result, the bandwidth of the LPC spectral envelope of one and the same phoneme changes with coarticulation within a speaker, that is, with coupling to the phonemes preceding and following it, and it also differs between speakers, for example depending on whether or not the voice is strained. For example, as shown in Fig. 1, the formant frequencies f1, f2, f3, f4 of the same phoneme coincide, yet because of coarticulation within a speaker or differences such as voice strain between speakers, the spectral envelope may take the forms shown by the solid and dotted lines: the sharpness of each formant peak differs, that is, the bandwidths ΔB1, ΔB2, … of the corresponding formants differ. When the bandwidths of the LPC spectral envelopes differ in this way, the same phoneme could be recognized as a different phoneme or could fail to be recognized at all.

To prevent such misrecognition, this invention pays attention to the LPC spectral envelopes of the phonemes of the input speech and the standard speech. In the feature extraction and distance measure calculation unit 2 of Fig. 2, feature parameters are extracted from the input speech 1 by LPC analysis, and the distance measure value 4 between the spectral envelope of those feature parameters and that of the standard phoneme feature parameters stored in advance in the storage unit 3 is obtained. In this invention, bandwidth normalization is performed when this distance measure value 4 is computed. The phoneme decision unit 5 compares the distance measure values 4 obtained for the respective standard phoneme patterns and outputs the phoneme name 6 that gives the smallest distance measure value, or the phoneme names 6 whose distance measure values fall below a certain threshold. More than one phoneme name may be output.
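
As an illustration of the decision step in unit 5, here is a minimal Python sketch, assuming the distance measure values for all standard patterns have already been computed; the function and variable names are illustrative and not taken from the patent.

```python
import numpy as np

def decide_phonemes(distances, names, threshold=None):
    """Return the phoneme name with the smallest distance measure value or,
    if a threshold is given, every phoneme name whose distance falls below it.
    `distances` and `names` are parallel sequences, one entry per standard pattern."""
    distances = np.asarray(distances, dtype=float)
    if threshold is None:
        return [names[int(np.argmin(distances))]]                    # single best candidate
    return [n for n, d in zip(names, distances) if d <= threshold]   # possibly several names
```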

The concept of the bandwidth normalization performed when the distance measure value is computed is shown in Fig. 3. The bandwidth normalization is carried out with the known bandwidth expansion method. For the input speech 1, the LPC analysis unit 11 obtains distance measure parameters, for example the linear prediction coefficients α_i, and the variable correction unit 12 corrects them to (1 + ε)^i α_i. Meanwhile, the corresponding distance measure parameters 13 of the standard phoneme pattern, that is, the linear prediction coefficients α_i', are corrected in the variable correction unit 14 in the direction opposite to the correction in unit 12, giving (1 + ε)^(-i) α_i'. The correction amount ε is varied, and the distance calculation unit 15 determines it so that the difference between the LPC spectral envelopes, for example the LPC cepstral distance CEP, becomes minimum. The correction amount ε at this point is given by equation (1), where c_i and c_i' are the LPC cepstrum coefficients of the input speech and of the standard pattern, respectively.

The distance measure may be computed not only as the LPC cepstral distance but also on the basis of the WLR (Weighted Likelihood Ratio), an LPC peak-weighting measure; in that case, ε may be determined so that the WLR value becomes minimum. Using the value of the correction amount ε determined in this way, the distance measure value is then computed with the LPC cepstral distance or with the WLR.

Next, this distance measure computation, that is, the operation of the feature extraction and distance measure calculation unit 2 of Fig. 2, is explained with reference to Fig. 4. The input speech 1 is analyzed in the autocorrelation analysis unit 7, and the autocorrelation coefficients (r_n, n = 1, 2, ..., p) are obtained, where p is the order of the LPC analysis. Next, the LPC analysis unit 8 performs p-th order LPC analysis to obtain the linear prediction coefficients (α_n, n = 1, 2, ..., p), and then the LPC cepstrum coefficients (c_n, n = 1, 2, ..., q) and, in some cases, the LPC correlation coefficients (r_n, n = 1, 2, ..., q) are computed from the linear prediction coefficients (α_n) by equations (2) and (3) below.

c_n = -α_n - (1/n) Σ_{m=1}^{n-1} (n - m) α_m c_{n-m}        (2)

where n = 1, 2, ..., q; when the index n of c_n is larger than the order p of the LPC analysis, it is assumed that α_{p+1} = α_{p+2} = ... = α_q = 0.

r_n = -Σ_{m=1}^{p} α_m r_{n-m}        (3)

where n = p+1, p+2, ..., q; the LPC correlation coefficients of order p and below coincide with the correlation coefficients output by the autocorrelation analysis.
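
Recursions (2) and (3) translate directly into code. The sketch below follows the sign convention A(z) = 1 + Σ α_k z^(-k) implied by equation (2); the function names are illustrative.

```python
import numpy as np

def lpc_to_cepstrum(alpha, q):
    """Equation (2): LPC cepstrum coefficients c_1..c_q from alpha_1..alpha_p,
    with alpha_n taken as 0 for n > p."""
    p = len(alpha)
    a = np.concatenate([np.asarray(alpha, dtype=float), np.zeros(max(0, q - p))])
    c = np.zeros(q)
    for n in range(1, q + 1):
        acc = sum((n - m) * a[m - 1] * c[n - m - 1] for m in range(1, n))
        c[n - 1] = -a[n - 1] - acc / n
    return c

def extend_correlation(alpha, r, q):
    """Equation (3): extend the autocorrelation coefficients [r_1, ..., r_p] to the
    LPC correlation coefficients r_1..r_q via r_n = -sum_m alpha_m * r_(n-m)."""
    p = len(alpha)
    r = list(r[:p])                      # r[k-1] holds r_k; orders <= p are unchanged
    for n in range(p + 1, q + 1):
        r.append(-sum(alpha[m - 1] * r[n - m - 1] for m in range(1, p + 1)))
    return r
```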

The standard phoneme patterns are likewise stored in advance in the storage unit 3 as feature parameters obtained by the above procedure, namely the LPC cepstrum coefficients (c_n') and, where appropriate, the LPC correlation coefficients (r_n') as well. The distance measure value calculation unit 9 then computes the distance measure value between the LPC spectral envelopes of the feature parameters of the input speech and the feature parameters of the standard phoneme pattern.

First, as described above, the correction amount ε for the bandwidth is obtained with the bandwidth expansion method by equation (1). It has been confirmed experimentally that it is effective to limit the range that this correction amount may take, as in equation (4):

r_- ≤ ε ≤ r_+        (4)

When this limit is imposed, the value of ε is set to r_- if ε is smaller than r_-, and to r_+ if ε is larger than r_+. The LPC cepstrum coefficients ĉ_n and ĉ_n' of the input speech and the standard pattern when this correction is applied are computed as follows:

ĉ_n = (1 + ε)^n c_n        (5)

ĉ_n' = (1 + ε)^(-n) c_n'   or   ĉ_n' = (1 - ε)^n c_n'        (6)

When the distance measure is then computed as the LPC cepstral distance using these corrected coefficients, it is obtained by equation (7). According to this computation, as stated above, the correction amount ε adjusts the coefficients c_n and c_n' in mutually opposite directions so that the distance between the LPC spectral envelopes becomes minimum; the two envelopes are brought closer together, that is, the bandwidths are brought closer to each other, and the result is a normalized distance measure value.
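
Putting the limit (4) and the corrections (5) and (6) together, a normalized cepstral distance can be sketched as below. Equation (7) is not reproduced above, so a plain sum of squared differences of the corrected cepstrum coefficients is used here as an assumed stand-in for it.

```python
import numpy as np

def normalized_cep_distance(c_in, c_std, eps, r_minus, r_plus):
    """Clip eps to [r_minus, r_plus] (equation (4)), apply the opposite corrections
    (5) and (6), and return a cepstral distance between the corrected coefficients
    (assumed form of equation (7))."""
    eps = min(max(eps, r_minus), r_plus)                               # equation (4)
    n = np.arange(1, len(c_in) + 1)
    c_in_hat = (1.0 + eps) ** n * np.asarray(c_in, dtype=float)        # equation (5)
    c_std_hat = (1.0 + eps) ** (-n) * np.asarray(c_std, dtype=float)   # equation (6)
    diff = c_in_hat - c_std_hat
    return float(np.dot(diff, diff))
```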

The distance measure value may also be computed, as mentioned above, on the basis of the WLR, the LPC peak-weighting measure. When the distance measure value for this case is obtained with the above correction applied, it becomes

WLR = Σ_n (ĉ_n - ĉ_n')(r̂_n - r̂_n')        (8)

where the values of r̂_n and r̂_n' are obtained from the prediction coefficients ((1 + ε)^i α_i) (i = 1, ..., p) and ((1 + ε)^(-i) α_i') (i = 1, ..., p), respectively, by way of the PARCOR coefficients. Here α_i and α_i' are the linear prediction coefficients of the input speech and of the standard pattern.
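
A sketch of this measure, assuming equation (8) has the form shown above, namely the sum over n of the cepstrum difference times the correlation difference evaluated on the bandwidth-corrected coefficients:

```python
import numpy as np

def wlr_distance(c_in_hat, c_std_hat, r_in_hat, r_std_hat):
    """Assumed form of equation (8): WLR distance between the bandwidth-corrected
    cepstrum coefficients and the bandwidth-corrected correlation coefficients."""
    dc = np.asarray(c_in_hat, dtype=float) - np.asarray(c_std_hat, dtype=float)
    dr = np.asarray(r_in_hat, dtype=float) - np.asarray(r_std_hat, dtype=float)
    return float(np.dot(dc, dr))
```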

As described above, according to this invention the distance measure value between the LPC spectral envelopes of the input speech and the standard phoneme pattern can be computed under the condition that the bandwidths of the spectral envelopes are normalized, so phonemes can be identified with high accuracy.

With this configuration, the bandwidth variation caused by coarticulation and the bandwidth variation between speakers are normalized, and the performance of phoneme recognition can be improved. Moreover, when the limit of equation (4) is imposed, it prevents the recognition accuracy from being degraded, on the contrary, by an excessively large correction.

This invention can be applied not only to phoneme recognition but also, for example, to a word speech recognition device to improve its recognition performance. Fig. 5 shows an example of a word speech recognition device, in which the time series of the feature parameters of each word is stored in the storage unit 3.

For the input speech 1, the LPC spectral distance measure values 4 with respect to the standard pattern time series in the storage unit 3 are computed in the calculation unit 2, and time-axis-normalized matching on these distance measure values 4 is performed in the matching unit 17. The resulting word-level distance measure values 18 are input to the word decision unit 19, which outputs the word name 20 having the smallest distance measure value 18, or the word names whose distance measure values fall below a certain threshold.
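
The patent does not spell out the matching algorithm of unit 17; a common realization of time-axis-normalized matching is dynamic time warping (DTW) over a matrix of frame-to-frame distance measure values, sketched here under that assumption with illustrative names.

```python
import numpy as np

def dtw_word_distance(frame_dist):
    """Time-axis-normalized matching: accumulate frame_dist[i, j] (distance between
    input frame i and standard-pattern frame j) along the best warping path and
    normalize by a path-length proxy."""
    d = np.asarray(frame_dist, dtype=float)
    n, m = d.shape
    g = np.full((n + 1, m + 1), np.inf)
    g[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i, j] = d[i - 1, j - 1] + min(g[i - 1, j], g[i, j - 1], g[i - 1, j - 1])
    return g[n, m] / (n + m)
```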

[Brief Explanation of the Drawings]

Fig. 1 is a diagram showing an example in which the bandwidth of the spectral envelope differs for the same phoneme; Fig. 2 is a conceptual diagram showing one embodiment of a phoneme identification device to which this invention is applied; Fig. 3 is a conceptual diagram showing the idea of the bandwidth normalization measure; Fig. 4 is a conceptual diagram showing an example of the feature extraction and distance measure calculation unit 2 of Fig. 2; and Fig. 5 is a conceptual diagram showing an example of a word speech recognition device to which this invention is applied.

1: input speech; 2: feature extraction and distance measure calculation unit; 3: storage of the feature parameters of the standard phoneme patterns; 4: distance measure value; 5: phoneme decision unit; 6: phoneme name; 7: autocorrelation analysis unit; 8: LPC analysis unit; 9: distance measure calculation unit; 17: time-axis normalized matching unit; 18: word-level distance measure value; 19: word decision unit; 20: word name.

Patent applicant: Nippon Telegraph and Telephone Public Corporation

Procedural Amendment (voluntary), December 2, 1981, to the Commissioner of the Patent Office. Indication of the case: Japanese Patent Application No. 56-156633. Title of the invention: Voice recognition equipment. Relationship to the case of the person making the amendment: patent applicant, Nippon Telegraph and Telephone Public Corporation; agent: Sagami Building, 4-2-21 Shinjuku, Shinjuku-ku, Tokyo. Content of the amendment: on page 11, line 17, "することによる" is corrected to "することにより".

Claims (1)

[Claims]
(1) In a speech recognition device that recognizes input speech by obtaining the difference (distance measure value) between feature parameters obtained from the input speech by LPC analysis and the feature parameters of standard phoneme patterns stored in advance, a speech recognition device characterized in that the means for computing said difference (distance measure value) is given the capability of normalizing the difference in bandwidth between the LPC spectral envelopes, paying attention to the difference between the LPC spectral envelopes.
JP15663381A 1981-09-30 1981-09-30 Voice recognition equipment Pending JPS5857197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP15663381A JPS5857197A (en) 1981-09-30 1981-09-30 Voice recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP15663381A JPS5857197A (en) 1981-09-30 1981-09-30 Voice recognition equipment

Publications (1)

Publication Number Publication Date
JPS5857197A true JPS5857197A (en) 1983-04-05

Family

ID=15631938

Family Applications (1)

Application Number Title Priority Date Filing Date
JP15663381A Pending JPS5857197A (en) 1981-09-30 1981-09-30 Voice recognition equipment

Country Status (1)

Country Link
JP (1) JPS5857197A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6272106B1 (en) 1994-05-06 2001-08-07 NTT Mobile Communications Network, Inc. Method and device for detecting double-talk, and echo canceler

Similar Documents

Publication Publication Date Title
Baudoin et al. On the transformation of the speech spectrum for voice conversion
Wang et al. An objective measure for predicting subjective quality of speech coders
US7337107B2 (en) Perceptual harmonic cepstral coefficients as the front-end for speech recognition
US7257535B2 (en) Parametric speech codec for representing synthetic speech in the presence of background noise
US7792672B2 (en) Method and system for the quick conversion of a voice signal
Hunt Spectral signal processing for ASR
US20070208566A1 (en) Voice Signal Conversation Method And System
Hunt et al. Speaker dependent and independent speech recognition experiments with an auditory model
US4937871A (en) Speech recognition device
JPS634200B2 (en)
EP2372707B1 (en) Adaptive spectral transformation for acoustic speech signals
Hermansky An efficient speaker-independent automatic speech recognition by simulation of some properties of human auditory perception
JP3240908B2 (en) Voice conversion method
US20050240397A1 (en) Method of determining variable-length frame for speech signal preprocessing and speech signal preprocessing method and device using the same
JPS5857197A (en) Voice recognition equipment
KR100511248B1 (en) An Amplitude Warping Approach to Intra-Speaker Normalization for Speech Recognition
EP0782127A2 (en) A time-varying feature space preprocessing procedure for telephone based speech recognition
US5581650A (en) Learning dynamic programming
Gallardo et al. Spectral Sub-band Analysis of Speaker Verification Employing Narrowband and Wideband Speech.
Tanaka A dynamic processing approach to phoneme recognition (part I)--Feature extraction
Ishizuka et al. Speech feature extraction method representing periodicity and aperiodicity in sub bands for robust speech recognition
JPS5999500A (en) Voice recognition method
Hur et al. Formant weighted cepstral feature for LSP-based speech recognition
Pols Analysis and synthesis of speech using a broad-band spectral representation
Steeneken Quality evaluation of speech processing systems