JPS6310840B2


Info

Publication number: JPS6310840B2
Application number: JP57123801A
Authority: JP
Inventors: Yoshiteru Mifune; Satoru Kabasawa; Hidekazu Tsuboka
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1982-07-15
Filing date: 1982-07-15
Publication date: 1988-03-09
Also published as: JPS5915298A

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a phoneme identification method for use in a speech recognition device.

Conventional Examples and Their Problems

A phoneme identification method in a conventional speech recognition device first classifies the time-series pattern of feature vectors of the input speech into rough phoneme intervals, separating, for example, non-stationary consonant intervals from relatively stationary vowel intervals. For each frame of a consonant interval, it then calculates the distance to the standard patterns of all consonants to determine the consonant, and it performs the same calculation for vowel intervals to determine the vowel.

Some rough classification methods used in such conventional devices simply divide the input into consonant intervals and vowel intervals using, for example, correlation values of the sequence, and leave the final phoneme decision to comparison with the standard patterns. In this case, however, the rough classification itself is inaccurate, the set of standard patterns is large, and the amount of calculation for pattern matching is enormous.

To remedy these drawbacks, other methods also use information on the bias toward low or high frequencies in the rough classification, further dividing consonant intervals into voiced and voiceless consonant intervals. More precise variants also divide voiceless consonant intervals into plosives and fricatives. This shrinks the set of standard patterns considered for the final decision, reduces the number of comparisons with standard patterns, and improves processing speed.

Even in this case, however, although the number of comparisons with the standard patterns is reduced, the amount of calculation required for the rough classification increases. Evaluated for the device as a whole, the amount of calculation is not necessarily reduced, and the calculation time required for phoneme identification remains a problem.

Purpose of the Invention

The present invention eliminates the above drawbacks of the conventional methods. Its object is to simplify the distance calculation between the standard patterns and the feature vectors of the input speech, reduce the processing time required for phoneme identification, enable real-time processing in a speech recognition device, and simplify the device configuration.

Structure of the Invention

To achieve the above object, the phoneme identification method of the present invention comprises: feature sequence conversion means for converting input speech into a time-series pattern {X_t1, X_t2, ..., X_tN} of feature vectors {X_ti}; standard pattern storage means for storing a standard pattern of the feature vector of each phoneme; and distance determination means for calculating the distance between the feature vector of the input and that of a standard pattern.

The distance determination means operates as follows. At a predetermined frame {X_tk} of the input time-series pattern, it compares the frame with all standard patterns to determine the phoneme. For a subsequent frame {X_tτ} (where τ > k), if the distance to the determined standard pattern is equal to or less than a predetermined threshold, it treats the frame as the same phoneme and advances to the next feature-vector frame to be compared; if the distance is equal to or greater than the threshold, it compares the frame with all other standard patterns to determine a new phoneme. For the frames {X_tm} after that (where m > τ), it repeats the same threshold-based determination to perform phoneme identification.
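The decision rule described above can be restated compactly. The notation here is an assumption introduced for illustration and does not appear in the source: x_t is the feature vector of frame t, s_p is the standard pattern of phoneme p, d is a distance measure (the source does not say which one), θ_p is the threshold associated with phoneme p, and \hat{p}_t is the phoneme assigned to frame t. Equality is assigned to the "same phoneme" branch, which is one possible reading of the source.

```latex
% Initial frame t_k: exhaustive comparison with every standard pattern
\hat{p}_{t_k} = \arg\min_{p} \, d\left(x_{t_k}, s_p\right)

% Every later frame t > t_k: one distance first, exhaustive search only on failure
\hat{p}_{t} =
\begin{cases}
  \hat{p}_{t-1}, & \text{if } d\left(x_{t}, s_{\hat{p}_{t-1}}\right) \le \theta_{\hat{p}_{t-1}} \\[4pt]
  \arg\min_{p} \, d\left(x_{t}, s_p\right), & \text{otherwise}
\end{cases}
```

In the second branch the distance to the current pattern has already been computed, so only the remaining patterns need new distance calculations.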

Description of Embodiments

An embodiment of the present invention is described below with reference to the drawings.

The processing concept of the present invention is explained with reference to FIG. 1.

The phoneme identification means comprises: feature sequence conversion means 1, which converts the input speech time series into a feature vector sequence {X_ti} at fixed time intervals; rough classification means 2, which classifies the feature vector sequence into rough phoneme classes (voiceless consonant, voiced consonant, vowel, and so on); distance determination means 3 and 3', which, for voiced consonants and vowels, determine the phoneme by calculating the distance to the voiced-consonant and vowel standard patterns 4 and 4'; and phoneme sequence output means 5, which outputs the phoneme time series.

The operation of FIG. 1 is as follows. The input speech is converted into a sequence of feature vectors by the feature sequence conversion means 1. The rough classification means 2 roughly classifies each feature vector as a voiceless consonant, a voiced consonant, or a vowel, based on, for example, its bias toward low or high frequencies. For voiced consonants and vowels, the distance determination means 3 and 3' respectively calculate the distances to the voiced-consonant and vowel standard patterns 4 and 4' and assign the closest phoneme. Once each phoneme has been determined, the phoneme sequence output means 5 outputs the phonemes as a time series. Voiceless consonants are not compared with standard patterns because they are non-stationary.
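The routing in FIG. 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the class labels `"voiced"`, `"vowel"`, and `"<unvoiced>"`, and the use of Euclidean distance are all assumptions. The rough classifier is passed in as a callable because the source does not give its rule, and the tag emitted for voiceless consonants is likewise a placeholder.

```python
import math


def nearest(vec, patterns):
    """Return the label of the standard pattern closest to vec.

    Euclidean distance is an assumption; the source does not name a metric.
    """
    return min(patterns, key=lambda label: math.dist(vec, patterns[label]))


def identify(frames, rough_class, voiced_patterns, vowel_patterns):
    """Route each frame by its rough class, as in FIG. 1.

    frames          : list of feature vectors (tuples of floats)
    rough_class     : hypothetical callable returning "unvoiced", "voiced" or "vowel"
    voiced_patterns : dict label -> standard pattern (block 4)
    vowel_patterns  : dict label -> standard pattern (block 4')
    """
    output = []
    for vec in frames:
        kind = rough_class(vec)
        if kind == "voiced":
            output.append(nearest(vec, voiced_patterns))   # distance means 3
        elif kind == "vowel":
            output.append(nearest(vec, vowel_patterns))    # distance means 3'
        else:
            # Voiceless consonants are not matched against standard patterns.
            output.append("<unvoiced>")
    return output
```

This sketch matches every voiced frame exhaustively; it shows only the routing, not the threshold-based shortcut that the distance determination means applies.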

Next, the detailed configuration and processing of the distance determination means 3 are explained with reference to FIGS. 2 and 3.

The distance determination means 3 comprises: a feature vector storage unit 6, which stores the input feature vector; a distance calculation unit 7, which calculates the distance between the feature vector and one standard pattern vector of the standard patterns 4; and a comparison control unit 8. The comparison control unit 8 compares the first designated feature vector with all standard pattern vectors. Once a phoneme has been determined, it compares the next input feature vector only with the standard pattern vector of that phoneme. If the distance is equal to or less than a fixed value, the frame is treated as the same phoneme; if it is equal to or greater than the fixed value, the feature vector is compared with all other standard pattern vectors and the phoneme is updated.

FIG. 3 shows the detailed operation of the comparison control unit 8.

In FIG. 3 the feature vector is two-dimensional, (X_ti1, X_ti2). The centroids of the standard pattern vectors of the phonemes |A|, |i|, and |u| are shown as 9, 10, and 11. The input feature vectors, starting at {X_tk}, are shown as the samples 12 at successive time points (t). The fixed thresholds corresponding to the respective standard pattern vectors are denoted r₁, r₂, and r₃.

The comparison control unit 8 first calculates the distances between the feature vector {X_tk} at a given time and all the standard pattern vectors 9, 10, and 11. In FIG. 3, the vowel |i| is selected. For the next feature vector {X_tk+1}, only the distance to the standard pattern vector 10 of |i| is calculated and compared with the threshold r₂; here the distance is equal to or less than r₂, so the frame is judged to be the vowel |i|. The same applies to the feature vector {X_tk+2}. For the feature vector {X_tk+3}, the distance is equal to or greater than r₂, so the distances to the other standard pattern vectors 9 and 11 are calculated, and the frame is judged to be |A|. For the feature vector {X_tk+4}, the distance to the standard pattern vector 9 of |A| is equal to or greater than r₁, so the vector is also compared with the other standard pattern vectors 10 and 11; in this case too it is judged to be |A|. The feature vectors {X_tk+5}, {X_tk+6}, and {X_tk+7} are all within r₁ of |A| and are therefore all judged to be |A|.

In the case of FIG. 3, phoneme determination without the present invention requires 8 × 3 = 24 distance calculations, whereas the present invention requires 5 + 3 × 3 = 14, a little more than half as many. Because vowel intervals are highly stationary, an even larger reduction in the number of calculations can be expected there.
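The walk-through of FIG. 3 can be reproduced with a short sketch that counts distance calculations. The coordinates, threshold values, and function name below are invented for illustration, since FIG. 3 itself is not reproduced here; they are chosen only so that the eight frames follow the same sequence of decisions as in the text. Euclidean distance and treating equality as "same phoneme" are further assumptions.

```python
import math

# Hypothetical centroids for |A|, |i|, |u| (items 9, 10, 11) and thresholds r1, r2, r3.
PATTERNS = {"A": (0.0, 0.0), "i": (4.0, 0.0), "u": (2.0, 4.0)}
THRESHOLDS = {"A": 1.0, "i": 1.0, "u": 1.0}

# Hypothetical frames X_tk ... X_tk+7 drifting from |i| toward |A|.
FRAMES = [
    (3.8, 0.2), (3.5, 0.3), (3.2, 0.4),  # near |i|
    (1.5, 0.5), (1.1, 0.3),              # outside the current threshold
    (0.6, 0.2), (0.3, 0.1), (0.1, 0.0),  # near |A|
]


def track_phonemes(frames, patterns, thresholds):
    """Return (labels, number of distance calculations)."""
    labels, calcs, current = [], 0, None
    for vec in frames:
        dists = {}
        if current is not None:
            # Cheap path: one distance to the currently held phoneme.
            dists[current] = math.dist(vec, patterns[current])
            calcs += 1
            if dists[current] <= thresholds[current]:
                labels.append(current)
                continue
        # First frame, or threshold exceeded: compare with the remaining patterns.
        for name, centroid in patterns.items():
            if name not in dists:
                dists[name] = math.dist(vec, centroid)
                calcs += 1
        current = min(dists, key=dists.get)
        labels.append(current)
    return labels, calcs


labels, calcs = track_phonemes(FRAMES, PATTERNS, THRESHOLDS)
print(labels)                               # ['i', 'i', 'i', 'A', 'A', 'A', 'A', 'A']
print(calcs, len(FRAMES) * len(PATTERNS))   # 14 24
```

With this data, three frames need all three distances and five frames need only one, giving the same 5 + 3 × 3 = 14 count as in the text, against 24 for exhaustive matching.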

Effects of the Invention

As described above, the present invention provides the following effects.

In the present invention, pattern matching between the feature vectors of the input speech and the standard patterns proceeds as follows. At a predetermined frame, the feature vector is compared with all standard patterns to determine the phoneme. For each subsequent feature-vector frame, only the distance to the determined phoneme is calculated: if it is equal to or less than a fixed threshold, the frame is treated as the same phoneme; if it is equal to or greater than the threshold, the frame is compared with all other standard patterns and the phoneme is updated. Phoneme identification is performed by applying this operation repeatedly. As a result, the processing time required for pattern matching in phoneme determination can be shortened, and the processing configuration of the speech recognition device can be simplified.
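The saving can be expressed as a simple count, as a sketch under assumed notation that is not in the source: N is the number of frames in the interval, P is the number of standard patterns, and E is the number of frames after the first whose distance to the currently held pattern exceeds its threshold. The count ignores the cost of the threshold comparison itself and of the rough classification.

```latex
% Exhaustive matching: every frame against every pattern
C_{\text{conv}} = N P

% Threshold-gated matching: P for the first frame, 1 for each frame that stays
% within the threshold, and P for each frame that exceeds it
C_{\text{prop}} = P + (N - 1 - E) + E P

% Saving
C_{\text{conv}} - C_{\text{prop}} = (N - 1 - E)(P - 1)

% FIG. 3: N = 8, P = 3, E = 2
C_{\text{conv}} = 24, \qquad C_{\text{prop}} = 3 + 5 + 6 = 14
```

The saving grows with the number of frames that stay within the threshold, which is consistent with the larger reduction expected for stationary vowel intervals.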

[Brief explanation of the drawing]

FIG. 1 is a block diagram of the phoneme identification means of the present invention, which calculates a phoneme sequence from input speech. FIG. 2 is a block diagram of the distance determination means within the phoneme identification means, which calculates the distance between a feature vector and a standard pattern. FIG. 3 is an explanatory diagram of the operating principle of the distance determination means of the present invention.

1: feature sequence conversion means; 3: distance determination means; 4: standard patterns.

Claims

[Claims]

1. A phoneme identification method comprising: feature sequence conversion means for converting input speech into a time-series pattern {X_t1, X_t2, ..., X_tN} of feature vectors {X_ti}; standard pattern storage means for storing a standard pattern of the feature vector of each phoneme; and distance determination means for calculating the distance between the feature vectors of the input and of a standard pattern, wherein the distance determination means is configured such that, at a predetermined frame {X_tk} of the input time-series pattern, the phoneme is determined by comparison with all standard patterns; for a subsequent frame {X_tτ} (where τ > k), if the distance to the determined standard pattern is equal to or less than a predetermined threshold, the frame is treated as the same phoneme and the frame of the feature vector to be compared is updated, and if the distance is equal to or greater than the threshold, a new phoneme is determined by comparison with all other standard patterns; and for subsequent frames {X_tm} (where m > τ), phoneme identification is performed by repeating the same threshold-based determination.