JPS58168100A

JPS58168100A - Voice recognition equipment

Info

Publication number: JPS58168100A
Application number: JP5174382A
Authority: JP
Inventors: 文雄前原; 楠原　久代; 英一坪香
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1982-03-29
Filing date: 1982-03-29
Publication date: 1983-10-04

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The present invention relates to a speech recognition device for monosyllables.

In recent years, various devices for recognizing speech have been devised.

Such a speech recognition device compares an n-dimensional input feature vector sequence $(a_1, a_2, \ldots, a_i)$, obtained by analyzing the input speech signal, against P standard pattern vector sequences $(b^1_1, b^1_2, \ldots, b^1_k), \ldots, (b^P_1, b^P_2, \ldots, b^P_k)$ registered in advance in a dictionary within the device, and completes recognition by selecting the pattern that is closest in distance or has the greatest similarity. Analysis methods for obtaining the input feature vectors and the standard pattern vectors include frequency analysis with a filter bank composed of a combination of multiple filters or with a Fourier transform, methods using linear prediction coefficients, and methods using cepstral coefficients.
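The conventional matching step described above can be sketched in Python as a nearest-template search. This is a minimal illustration under stated assumptions, not the method of any specific device: the city-block frame distance, the one-to-one frame pairing without time warping, and the names `frame_distance`, `sequence_distance`, and `recognize_whole` are all hypothetical choices.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def frame_distance(a: Vector, b: Vector) -> float:
    """City-block distance between two n-dimensional feature vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sequence_distance(inp: Sequence[Vector], ref: Sequence[Vector]) -> float:
    """Sum of frame distances, pairing frame t of the input with frame t of the pattern.

    Assumption: no time warping; frames beyond the shorter sequence are ignored.
    """
    return sum(frame_distance(u, v) for u, v in zip(inp, ref))


def recognize_whole(inp: Sequence[Vector], dictionary: Dict[str, List[List[float]]]) -> str:
    """Return the label of the standard pattern closest to the whole input sequence."""
    return min(dictionary, key=lambda label: sequence_distance(inp, dictionary[label]))


# Hypothetical two-pattern dictionary with 2-dimensional feature vectors.
dictionary = {"x": [[1.0, 0.0], [0.0, 1.0]], "y": [[0.0, 1.0], [1.0, 0.0]]}
print(recognize_whole([[0.9, 0.1], [0.1, 0.8]], dictionary))  # x
```

Because every frame of the sequence contributes to the total, frames that look alike across different syllables add nearly the same amount to every candidate's distance, which is the weakness the following paragraphs describe.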

When speech is recognized by the method described above, syllables containing voiced consonants such as /b/, /d/, /g/, /m/, /n/, and /r/, for example "ba", "da", "ga", "ma", "na", and "ra", are prone to misrecognition. At the onset of these syllables there is a portion in which only the vocal-cord vibration waveform known as the buzz bar (voice bar) appears as a spectral feature, and this portion is similar across the voiced consonants. Conventional speech recognition devices therefore tend to misrecognize voiced consonants when comparing their patterns.

Voiceless plosives such as /p/, /t/, and /k/ cause a different problem. The spectral pattern that characterizes /p/, /t/, or /k/ lasts only a relatively short time compared with other phonemes. When /p/, /t/, or /k/ is segmented with the same time length as other phonemes, the region of the following vowel therefore makes up a large share of the segment, and the following vowel is detected as if it were the voiceless plosive consonant. In conventional speech recognition devices this is one cause of the low recognition rate for voiceless plosives.
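The proportion argument can be made concrete with a little arithmetic. The sketch below assumes a segment of fixed length W frames that starts at the consonant onset, of which only the first c frames carry the plosive's own spectral features; the frame counts in the example are hypothetical, not measurements from the source.

```python
def following_vowel_share(window_frames: int, consonant_frames: int) -> float:
    """Fraction of a fixed-length segment that lies after the consonant's own features.

    Assumption: the segment starts at the consonant onset and has window_frames frames,
    of which only the first consonant_frames frames are characteristic of the plosive.
    """
    if window_frames <= 0:
        raise ValueError("window_frames must be positive")
    consonant = min(max(consonant_frames, 0), window_frames)
    return (window_frames - consonant) / window_frames


# Hypothetical numbers: a 20-frame segment in which the burst lasts 3 frames.
print(following_vowel_share(20, 3))  # 0.85
```

With these assumed numbers, 85% of the segment that is supposed to represent the consonant actually belongs to the transition and the following vowel.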

A more specific explanation is given below with reference to Figs. 1 and 2.

Fig. 1(a) is a frequency analysis of the utterance of the monosyllable "ba"; the vertical axis shows amplitude, the horizontal axis frequency, and the oblique axis time. In Fig. 1(a), the waveform at time T1 is the part called the buzz bar. It arises because the vocal cords begin vibrating before articulation, that is, before the mouth and vocal tract take on the shape specific to the particular phoneme, and it generally has spectral peaks only in the low-frequency range. Following the buzz bar, at time T2, a spectrum produced by the vocal-tract and mouth shape specific to each phoneme appears for a relatively short time; it is followed by the transition to the following vowel at time T3 and by the steady vowel portion at time T4. Figs. 1(b) and 1(c) are frequency analyses of the monosyllables "ga" and "ra", respectively, and time T1 marks the buzz-bar portion of each. As Figs. 1(a), (b), and (c) show, the buzz bar is common to voiced consonants, and the spectra it produces are very similar. It is therefore very difficult to recognize speech from these spectra, and this is one cause of recognition errors. The effect is especially large in devices that recognize monosyllables uttered in isolation.

Next, voiceless plosives are described with reference to Fig. 2. Fig. 2(a) is a frequency analysis of the utterance of the voiceless plosive "pa". The waveform at time T5 is the silent portion preceding the utterance, the waveform at time T6 is the characteristic portion that best exhibits the features of the voiceless plosive "pa", and T7 is the transition to the following vowel.

Time T8 is the steady vowel portion. Fig. 2(b) is a frequency analysis of the utterance of another voiceless plosive syllable; as in Fig. 2(a), time T5 is the silent portion, time T6 the characteristic portion, time T7 the transition, and time T8 the steady vowel portion.

As Figs. 2(a) and (b) show, the spectra in the characteristic portion at time T6 are very similar in both cases. Moreover, the characteristic portion that carries the consonant's features is brief, so during recognition the following vowel is often detected instead, which is one cause of recognition errors.

In view of these drawbacks, the present invention provides a speech recognition device that uses pointers to cut out the vector sequence of a specific region, so that recognition is not affected by the buzz bar of voiced consonants or by the following-vowel portion of voiceless plosives.

An embodiment of the present invention is described below with reference to the drawings.

Fig. 3 is a block diagram of a speech recognition device according to an embodiment of the present invention. In the figure, 1 is a parameter analyzer, composed of a filter bank, that receives the input speech via input signal terminal 1a and successively converts it into an n-dimensional input parameter vector sequence $(a_1, a_2, \ldots, a_i)$. 2 is a standard pattern memory that stores in advance, via the parameter analyzer 1 and according to instructions from keyboard 3, the standard pattern vector sequences that serve as standard patterns.
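The parameter analyzer 1 is a hardware filter bank. As a rough software stand-in (a swapped-in technique, not the patent's circuit), one can split the signal into frames and sum DFT power in n equal-width bands, giving one n-dimensional vector per frame. The frame length, hop size, band count, and function names below are assumptions.

```python
import cmath
import math
from typing import List, Sequence


def frames(signal: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split the signal into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def band_energies(frame: Sequence[float], n_bands: int) -> List[float]:
    """n-dimensional feature vector: DFT power summed over n equal-width frequency bands."""
    size = len(frame)
    half = size // 2  # keep bins 0 .. size/2 - 1 (positive frequencies below Nyquist)
    power = [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / size) for t in range(size))) ** 2
        for k in range(half)
    ]
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def analyze(signal: Sequence[float], frame_len: int = 64, hop: int = 32,
            n_bands: int = 8) -> List[List[float]]:
    """Convert a sampled signal into an input parameter vector sequence (a_1, ..., a_i)."""
    return [band_energies(f, n_bands) for f in frames(signal, frame_len, hop)]


# A sinusoid completing 2 cycles per 64-sample frame puts its energy in the lowest band.
tone = [math.sin(2 * math.pi * 2 * t / 64) for t in range(128)]
features = analyze(tone)
print(len(features), [v.index(max(v)) for v in features])  # 3 [0, 0, 0]
```

A direct O(N²) DFT keeps the sketch dependency-free; a real analyzer would use an FFT or actual band-pass filters.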

4 is a pointer memory. When a standard pattern is stored in standard pattern memory 2, the standard pattern is shown on display 5, and the start and end points of the specific region that characterizes the speech sound are stored in pointer memory 4 according to instructions entered on keyboard 3. 6 is a pattern comparator that receives the input parameter vector sequence from parameter analyzer 1 and a standard pattern vector sequence from standard pattern memory 2, cuts out both vector sequences according to the pointers in pointer memory 4, and compares them. 7 is a minimum-distance decision unit that, among the standard pattern vector sequences compared by pattern comparator 6, uniquely determines the one with the minimum distance to the input parameter vector sequence and outputs the decision result via terminal 7a.
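One way to picture standard pattern memory 2 and pointer memory 4 in software is a store that keeps, for each registered syllable, its vector sequence together with the pair of pointers $(S_\beta, \varepsilon_\beta)$ entered at registration time. The class below is a hypothetical sketch; the 1-based inclusive pointer convention follows the notation $(a_{S_\beta}, \ldots, a_{\varepsilon_\beta})$ used in the text, and the validation rule is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PatternStore:
    """Hypothetical software model of standard pattern memory 2 plus pointer memory 4."""

    patterns: Dict[str, List[List[float]]] = field(default_factory=dict)  # memory 2
    pointers: Dict[str, Tuple[int, int]] = field(default_factory=dict)  # memory 4

    def register(self, label: str, vectors: List[List[float]], start: int, end: int) -> None:
        """Store a standard pattern and its characteristic region [start, end] (1-based, inclusive)."""
        if not 1 <= start <= end <= len(vectors):
            raise ValueError(f"pointers ({start}, {end}) outside 1..{len(vectors)}")
        self.patterns[label] = vectors
        self.pointers[label] = (start, end)

    def region(self, label: str) -> List[List[float]]:
        """Return the pattern's vectors between its pointers, i.e. (b_S, ..., b_e)."""
        start, end = self.pointers[label]
        return self.patterns[label][start - 1:end]


store = PatternStore()
# Hypothetical 6-frame pattern whose characteristic region is frames 3..4.
store.register("pa", [[float(t)] for t in range(1, 7)], 3, 4)
print(store.region("pa"))  # [[3.0], [4.0]]
```

Keeping the pointers next to each pattern means the comparator can look up a different region for every candidate, which is what lets "ba" and "pa" be compared over different spans.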

The speech recognition operation of the device configured as above is described below.

First, the parameter analyzer 1 receives the input speech via input signal terminal 1a, converts it into an n-dimensional input parameter vector sequence $(a_1, a_2, \ldots, a_i)$ as shown in Fig. 4(a), and outputs it to pattern comparator 6.

Meanwhile, as shown in Fig. 4(b), the standard pattern memory 2 outputs to pattern comparator 6 the standard pattern vector sequence $(b^\beta_1, b^\beta_2, \ldots, b^\beta_m)$ of pattern β, one of the standard patterns stored in advance.

The pattern comparator 6 receives the input parameter vector sequence $(a_1, a_2, \ldots, a_i)$ from parameter analyzer 1 and the standard pattern vector sequence $(b^\beta_1, b^\beta_2, \ldots, b^\beta_m)$ from standard pattern memory 2 and, as shown in Figs. 4(a) and (b), also receives from pointer memory 4 the pointers $(S_\beta, \varepsilon_\beta)$ corresponding to that standard pattern vector sequence. It applies the pointers $(S_\beta, \varepsilon_\beta)$ to both the input parameter vector sequence and the standard pattern vector sequence and compares the two vector sequences over the regions $L_1$ and $L_2$ bounded by pointers $S_\beta$ and $\varepsilon_\beta$.

That is, let $(a_{S_\beta}, a_{S_\beta+1}, \ldots, a_{\varepsilon_\beta})$ be the vector sequence cut out of the input parameter vector sequence $(a_1, a_2, \ldots, a_i)$ over the region $L_1$ given by pointers $S_\beta$ and $\varepsilon_\beta$, and let $(b^\beta_{S_\beta}, b^\beta_{S_\beta+1}, \ldots, b^\beta_{\varepsilon_\beta})$ be the vector sequence cut out of the standard pattern vector sequence $(b^\beta_1, b^\beta_2, \ldots, b^\beta_m)$ over the region $L_2$ given by the same pointers. The pattern comparator 6 then computes the distance $D$ between the two vector sequences by the following equation:
[Equation not reproduced in the source text]
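The equation defining $D$ is not available here, so the sketch below assumes $D$ is the sum, over the frames of the region, of the city-block distance between paired input and standard vectors. It shows how the same pointers $(S_\beta, \varepsilon_\beta)$ cut out both sequences before comparison; the metric and the names `cut` and `region_distance` are assumptions, not the patent's formula.

```python
from typing import List, Sequence

Vector = Sequence[float]


def cut(seq: Sequence[Vector], start: int, end: int) -> List[Vector]:
    """Return (x_start, ..., x_end) using 1-based inclusive pointers, as in (a_S, ..., a_e)."""
    return list(seq[start - 1:end])


def region_distance(inp: Sequence[Vector], ref: Sequence[Vector], start: int, end: int) -> float:
    """Assumed distance D: city-block distance summed over the frames of regions L1 and L2.

    If the input is shorter than the end pointer, only frames present in both are compared.
    """
    return sum(
        sum(abs(x - y) for x, y in zip(u, v))
        for u, v in zip(cut(inp, start, end), cut(ref, start, end))
    )


# Frames 1 and 4 differ strongly; frames 2..3 match exactly.
inp = [[9.0, 9.0], [1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
ref = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(region_distance(inp, ref, 2, 3))  # 0.0
print(region_distance(inp, ref, 1, 4))  # 36.0
```

Restricting the sum to the pointer region is what removes the contribution of frames outside it, whether those are a shared buzz bar or a long following vowel.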

In this way, the input is compared in turn with all of the standard patterns stored in standard pattern memory 2, and the distance $D$ between the two vector sequences is obtained for each. The minimum-distance decision unit 7 then selects the single standard pattern vector sequence with the smallest distance $D$ and outputs its standard pattern information via terminal 7a.
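The full loop of pattern comparator 6 and minimum-distance decision unit 7 can be sketched as follows. Each standard pattern is compared over its own registered region, and the label with the smallest $D$ wins. The city-block form of $D$, the toy data, and the names are hypothetical; the example only mimics two syllables that share their first two frames and differ in frame 3.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def region_distance(inp, ref, start: int, end: int) -> float:
    """Assumed D: city-block distance summed over frames start..end (1-based, inclusive)."""
    pairs = zip(inp[start - 1:end], ref[start - 1:end])
    return sum(sum(abs(x - y) for x, y in zip(u, v)) for u, v in pairs)


def recognize(
    inp: Sequence[Vector],
    patterns: Dict[str, List[List[float]]],
    pointers: Dict[str, Tuple[int, int]],
) -> Tuple[str, float]:
    """Compare the input with every standard pattern over that pattern's own region
    and return the label with the smallest distance D (minimum-distance decision)."""
    best_label, best_d = "", math.inf
    for label, ref in patterns.items():
        start, end = pointers[label]
        d = region_distance(inp, ref, start, end)
        if d < best_d:
            best_label, best_d = label, d
    return best_label, best_d


# Hypothetical patterns: identical buzz-bar-like frames 1..2, distinguishing frame 3.
patterns = {"ba": [[5, 0], [5, 0], [0, 4], [1, 1]], "ga": [[5, 0], [5, 0], [4, 0], [1, 1]]}
pointers = {"ba": (3, 3), "ga": (3, 3)}
print(recognize([[5, 0], [5, 0], [3, 1], [1, 1]], patterns, pointers))  # ('ga', 2)
```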

As described above, according to this embodiment, when a standard pattern is stored in pattern memory 2, the region in which that pattern's characteristic features occur is registered in pointer memory 4 using keyboard 3 and display 5. When the distance between the input parameter vector sequence from parameter analyzer 1 and a standard pattern vector sequence from pattern memory 2 is computed, it is computed only over the region designated by the pointers registered in pointer memory 4, so that highly accurate speech recognition can be performed.

In this embodiment the parameter analyzer 1 is composed of a filter bank, but it may instead be another analysis means, such as a Fourier transformer, an analyzer using linear prediction coefficients, or an analyzer using cepstral coefficients.
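As an illustration of the cepstral alternative, the sketch below computes the low-order real-cepstrum coefficients of one frame as the inverse DFT of the log magnitude spectrum. It is a textbook formulation chosen for illustration, not the analyzer the patent specifies; the `eps` floor that avoids log(0) and the function names are assumptions.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Direct O(N^2) discrete Fourier transform (unnormalized in both directions)."""
    size = len(x)
    sign = 1 if inverse else -1
    return [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / size) for t in range(size))
        for k in range(size)
    ]


def real_cepstrum(frame: Sequence[float], n_coeffs: int, eps: float = 1e-10) -> List[float]:
    """First n_coeffs real-cepstrum coefficients: IDFT(log|DFT(frame)|) / N."""
    log_mag = [math.log(abs(X) + eps) for X in dft(frame)]
    return [c.real / len(frame) for c in dft(log_mag, inverse=True)[:n_coeffs]]


# A unit impulse has a flat magnitude spectrum (|X_k| = 1), so its cepstrum is ~0 everywhere.
impulse = [1.0] + [0.0] * 15
print(real_cepstrum(impulse, 4))
```

Applying `real_cepstrum` to each frame would yield an alternative input parameter vector sequence with the same shape as the filter-bank output, so the pointer-based comparison is unchanged.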

Also, in this embodiment both the input parameter vector sequence and the standard pattern vector sequence are cut out at the characteristic region by pointers $S_\beta$ and $\varepsilon_\beta$, but the pointers may instead be used to cut out only one of them, either the standard pattern vector sequence or the input parameter vector sequence, at the specific region.

Furthermore, although pattern comparator 6 and minimum-distance decision unit 7 are configured to find the minimum distance between the input parameter vector sequence and the standard pattern vector sequences, they may of course be configured to find the maximum similarity instead.
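If maximum similarity is used instead of minimum distance, the decision becomes an argmax. The sketch below assumes mean cosine similarity over the pointer region as the similarity measure; the source does not name one, and the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def region_similarity(inp: Sequence[Vector], ref: Sequence[Vector], start: int, end: int) -> float:
    """Mean cosine similarity over frames start..end (1-based, inclusive) of both sequences."""
    pairs = list(zip(inp[start - 1:end], ref[start - 1:end]))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs) if pairs else 0.0


def recognize_max_similarity(
    inp: Sequence[Vector],
    patterns: Dict[str, List[List[float]]],
    pointers: Dict[str, Tuple[int, int]],
) -> str:
    """Return the label whose region is most similar to the input (maximum-similarity decision)."""
    return max(patterns, key=lambda label: region_similarity(inp, patterns[label], *pointers[label]))


patterns = {"ba": [[1.0, 0.0], [0.0, 1.0]], "ga": [[1.0, 0.0], [1.0, 0.0]]}
pointers = {"ba": (2, 2), "ga": (2, 2)}
print(recognize_max_similarity([[1.0, 0.0], [0.0, 2.0]], patterns, pointers))  # ba
```

Cosine similarity ignores overall loudness, which is one reason it is a common choice for this kind of variant, but any similarity measure could be substituted.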

As described above, the present invention provides point storage means that stores in advance, as points, the start and end points of the region that characterizes each speech sound. The region of at least one of the input feature vector sequence output by the parameter analysis means and the standard pattern vector sequence output by the standard pattern storage means is determined by the points in the point storage means, and the distance between the two vector sequences is then computed. This prevents the influence of the buzz bar of voiced consonants and of the following vowel of voiceless plosives, improves the recognition rate, and is of great practical benefit.

[Brief Description of the Drawings] Fig. 1 is a characteristic diagram of voiced consonants, Fig. 2 is a characteristic diagram of voiceless plosives, Fig. 3 is a block diagram of a speech recognition device according to an embodiment of the present invention, and Fig. 4 is a characteristic diagram of speech cut out using pointers. 1: parameter analyzer; 2: standard pattern memory; 4: pointer memory; 6: pattern comparator; 7: minimum-distance decision unit. Agent: Toshio Nakao, patent attorney, and one other.

Claims

[Claims] A speech recognition device comprising: parameter analysis means for performing parameter analysis of input speech and outputting an n-dimensional input feature vector sequence $(a_1, a_2, \ldots, a_i)$ (where i is an integer); standard pattern storage means for storing P (where P is an integer) parameter-analyzed standard pattern vector sequences $(b^1_1, b^1_2, \ldots, b^1_k), \ldots, (b^P_1, b^P_2, \ldots, b^P_k)$ (where k is an integer); point storage means for storing in advance the start and end points of the region that characterizes each speech sound; comparison means for determining, by the points of the point storage means, the region of at least one of the input feature vector sequence $(a_1, a_2, \ldots, a_i)$ and a standard pattern vector sequence $(b^n_1, b^n_2, \ldots, b^n_m)$ (where m and n are integers and m ≤ k) and obtaining the distance between the two vector sequences; and decision means for determining the standard pattern vector sequence $(b^r_1, b^r_2, \ldots, b^r_q)$ (where r and q are integers) with the minimum distance.