JPH0199096A - Pattern generator

Info

Publication number: JPH0199096A
Application number: JP62257585A
Authority: JP
Inventors: Hidekazu Tsuboka; 英一坪香
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1987-10-13
Filing date: 1987-10-13
Publication date: 1989-04-17

Abstract

PURPOSE: To improve recognition accuracy by reflecting dynamic features when creating a reference pattern for speech recognition or the like. CONSTITUTION: A feature extraction part 1 converts an input speech signal to be registered as a reference pattern into a feature vector sequence and stores it in an input buffer memory 2. A mean value calculation part 5 successively reads out from the memory 2 each partial section set by a partial section setting part 3, computes its mean value, and outputs it to a least squares approximation line calculating part 6. From the outputs of the calculation part 5 and the memory 2, the calculating part 6 finds the least squares approximation line of the feature vectors and the residual sum of squares (partial distance), which a partial residual sum of squares normalizing part 15 then normalizes. Based on the output of the normalizing part 15, a minimum accumulated residual sum of squares calculating part 7 divides the sequence so that the total of the normalized residual sums of squares is minimized, finds that total, and stores the mean vector and direction vector of each least squares approximation line in a mean/direction vector storing part 13. Since the slope of the line in each partial section represents a dynamic feature and the mean vector represents a static feature, recognition accuracy can be improved.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to an apparatus for creating standard patterns for speech recognition and the like.

Prior Art

The following describes the case of recognizing spoken words. Differences between vectors or between patterns are described with terms such as similarity, distance, and error, and various measures exist for each. Since the choice among them is not essential to the present invention, the term "distance" is used here to represent all of them. That is, a short or small distance corresponds to a high or large similarity, and a long or large distance corresponds to a low or small similarity.

A method known as DP matching is often used to recognize patterns consisting of a sequence of feature vectors, as in speech recognition. For each word to be recognized, a pattern consisting of a sequence of feature vectors that represents the spoken word is registered in advance as a standard pattern. At recognition time, the input pattern to be recognized, which likewise consists of a sequence of feature vectors, is matched against each of the standard patterns, the standard pattern nearest in distance is found, and the word corresponding to that standard pattern is taken as the recognition result for the input pattern. Because patterns of different durations must be compared, the time axis has to be expanded and contracted nonlinearly. DP matching does this efficiently by means of dynamic programming, and it is currently one of the methods that gives the best results.
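As a rough illustration of the DP matching described above, the following Python sketch computes a cumulative distance under nonlinear time warping and picks the nearest standard pattern. The names `dp_matching_distance` and `recognize`, the Euclidean frame distance, and the unconstrained symmetric step pattern are all assumptions made for the example; the text does not specify them.

```python
import math


def dp_matching_distance(a, b):
    """Cumulative distance between two feature-vector sequences,
    allowing nonlinear expansion and contraction of the time axis."""
    def frame_dist(x, y):
        # Assumed local measure: Euclidean distance between two frames.
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

    n, m = len(a), len(b)
    inf = float("inf")
    # g[i][j]: best cumulative distance aligning a[:i] with b[:j].
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(a[i - 1], b[j - 1])
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n][m]


def recognize(input_pattern, standard_patterns):
    """Return the word whose standard pattern is nearest to the input."""
    return min(
        standard_patterns,
        key=lambda w: dp_matching_distance(input_pattern, standard_patterns[w]),
    )
```

Because the warp only minimizes frame-to-frame distance, a stretched copy of a pattern matches it with zero distance, which is the property the following paragraph identifies as hiding dynamic features.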

With this method, however, the time axis is warped so that the two patterns being compared become as close as possible in distance. As a result, features concerning the temporal change of the feature vectors, such as their slope with respect to the time axis (hereinafter called dynamic features), tend not to be reflected properly. For phonemes that are characterized by the way their spectrum changes, this method alone therefore gives insufficient recognition accuracy.

There is also a method in which the word dictionary is held as sequences of symbols representing phonemes or syllables (hereinafter called speech segments), and a standard pattern is prepared in advance for each speech segment. The input pattern to be recognized is converted, on the basis of these standard patterns, into a speech segment sequence, that is, a sequence of symbols each representing a speech segment. Symbol-level matching is then performed against each word in the word dictionary, and the nearest word is taken as the recognition result. Because speech segments cannot be recognized perfectly, the speech segment sequence obtained from the input pattern contains some errors such as insertions, deletions, and substitutions. In the symbol-level matching, therefore, the distance between speech segment sequences is obtained by DP matching on the basis of inter-segment distances that have been calculated and prepared in advance. In this case too, reflecting the dynamic features when recognizing speech segments in the input pattern and when obtaining the inter-segment distances is important for improving recognition accuracy.
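The symbol-level matching can be pictured as a weighted edit distance. The sketch below is one possible reading, not the method of any particular recognizer: the function names, the table format, and the fixed insertion and deletion costs are hypothetical, and only the use of precomputed inter-segment distances inside a DP comes from the text.

```python
def make_segment_distance(table, default=1.0):
    """Build a symmetric inter-segment distance from a precomputed table.
    `table` maps (segment, segment) pairs to distances (assumed format)."""
    def dist(p, q):
        if p == q:
            return 0.0
        return table.get((p, q), table.get((q, p), default))
    return dist


def segment_sequence_distance(seq, ref, seg_dist, ins_cost=1.0, del_cost=1.0):
    """DP distance between a recognized segment sequence `seq` and a
    dictionary entry `ref`, tolerating insertions, deletions, substitutions."""
    n, m = len(seq), len(ref)
    g = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        g[i][0] = g[i - 1][0] + ins_cost      # extra symbol in the input
    for j in range(1, m + 1):
        g[0][j] = g[0][j - 1] + del_cost      # symbol missing from the input
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = min(
                g[i - 1][j] + ins_cost,
                g[i][j - 1] + del_cost,
                g[i - 1][j - 1] + seg_dist(seq[i - 1], ref[j - 1]),
            )
    return g[n][m]
```

The word whose symbol sequence gives the smallest value would be taken as the recognition result.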

Problems to Be Solved by the Invention

In view of the drawbacks of the prior art described above, the object of the present invention is to provide an appropriate way of representing standard patterns that takes temporal dynamic features into account, and a pattern creation apparatus that creates standard patterns based on that representation.

Means for Solving the Problems

The invention comprises: subinterval setting means for setting candidate intervals for the i-th subinterval of a pattern consisting of a sequence of feature vectors; optimal parameter calculation means for calculating the parameters of a vector-valued time function so that the normalized partial distance (normalized partial similarity), normalized in relation to the length of the interval, between the feature vector sequence of the set candidate interval and the vector sequence on the curve given by that time function is minimized (maximized); minimum cumulative distance (maximum cumulative similarity) calculation means for obtaining the minimum (maximum) of the total of the normalized partial distances (normalized partial similarities) by optimally determining the number of subintervals and the division positions; and standard pattern storage means for storing the parameters of the time function calculated for each optimally determined subinterval i in association with the interval number i.

Operation

The subinterval setting means sets candidate intervals for the i-th subinterval of a pattern consisting of a sequence of feature vectors. The optimal parameter calculation means calculates the parameters of a vector-valued time function so that the normalized partial distance (normalized partial similarity), normalized in relation to the length of the interval, between the feature vector sequence of the set candidate interval and the vector sequence on the curve given by the time function is minimized (maximized). The minimum cumulative distance (maximum cumulative similarity) calculation means optimally determines the number of subintervals and the division positions so as to obtain the minimum (maximum) of the total of the normalized partial distances (normalized partial similarities). The standard pattern storage means stores the parameters of the time function calculated for each optimally determined subinterval i in association with the interval number i, and this parameter sequence is taken as the new standard pattern for the pattern.

Embodiment

An n-th order polynomial (n = 1, 2, ...), a spline function, or the like can be used as the time function. Here, for simplicity and because it is sufficiently practical, an embodiment using a linear function is described. As the quantity expressing the difference between the curve and the actual feature vectors corresponding to it, the sum of the squared Euclidean distances between each feature vector and the corresponding vector on the curve is used. In this case the curve is the so-called least squares approximation line, and the quantity corresponding to the distance is what is called the residual sum of squares.

FIG. 1 shows an embodiment of the present invention, for the case where the standard pattern is created from a speech signal.

1 is a feature extraction unit, which converts the input speech signal to be registered as a standard pattern, by a well-known method such as a filter bank, Fourier transform, or LPC analysis, into a sequence of feature vectors {x(t)} = {x(1), x(2), ..., x(T)}, each of several to ten-odd dimensions, at intervals of several milliseconds to ten-odd milliseconds (each interval is called a frame). x(t) is the feature vector at time t.

2 is an input buffer memory, which temporarily stores the sequence of feature vectors obtained by the feature extraction unit 1.

4 is a frame counter, which is reset when there is no speech input and counts up by one every frame. Its content therefore indicates the frame currently being processed.

3 is a subinterval setting unit, which sets subintervals for the input pattern. When the content of the frame counter 4 is t, the subinterval setting unit 3 successively sets the frames r = t - s to t - e as candidates for the start of the i-th subinterval. Here s and e are constants given in advance to limit the range allowed for a subinterval.

5 is a mean value calculation unit, which reads from the input buffer memory 2 the feature vectors contained in the interval of τ = t - r frames set as the subinterval, and computes their mean. That is, taking the subinterval from frame t - τ + 1 to frame t as the i-th subinterval and denoting its mean by m(t, i), it computes

$$m(t,i) = \frac{1}{\tau}\sum_{k=1}^{\tau} x(t-\tau+k) \tag{1}$$

6 is a least squares approximation line calculation unit (partial distance calculation unit), which obtains the least squares approximation line of the feature vectors in the subinterval and the residual sum of squares (partial distance) of the feature vectors in the subinterval with respect to that line. The least squares approximation line and the partial residual sum of squares (partial distance) for the feature vectors in subinterval i are obtained as follows.

Taking u(t, i) as its direction vector, the least squares approximation line Q(k, i) (k = 1 to τ) to be found for subinterval i can be written as

$$Q(k,i) = m(t,i) + \left(k - \frac{\tau+1}{2}\right) u(t,i) \tag{2}$$

The partial residual sum of squares v(t-τ+1 : t) between x(t-τ+k) and Q(k, i) over k = 1 to τ is then expressed as

$$v(t-\tau+1:t) = \sum_{k=1}^{\tau} \{x(t-\tau+k) - Q(k,i)\}' \{x(t-\tau+k) - Q(k,i)\} \tag{3}$$

The least squares approximation line is therefore obtained by choosing u(t, i) in equation (2) so that the partial distance (partial residual sum of squares) v(t-τ+1 : t) is minimized.

That is, it is obtained by setting the partial derivative of v(t-τ+1 : t) with respect to u(t, i) equal to zero and solving the resulting equation for u(t, i). From

$$\frac{\partial v(t-\tau+1:t)}{\partial u(t,i)} = -2\sum_{k=1}^{\tau}\left(k-\frac{\tau+1}{2}\right)\{x(t-\tau+k) - Q(k,i)\} = 0 \tag{4}$$

it follows that

$$u(t,i) = \frac{\sum_{k=1}^{\tau}\left(k-\frac{\tau+1}{2}\right)\{x(t-\tau+k) - m(t,i)\}}{\sum_{k=1}^{\tau}\left(k-\frac{\tau+1}{2}\right)^{2}}$$

Here m(t, i), u(t, i), Q(k, i), x(t-τ+1), and so on are column vectors, and ' denotes transposition. Differentiation by a vector means differentiating with respect to each of its elements separately.

As described above, the mean value calculation unit 5 and the least squares approximation line calculation unit 6 together form the optimal parameter calculation means. For the interval set by the subinterval setting unit 3, they calculate the optimal parameters, namely the mean and the direction vector that parameterize the least squares approximation line best approximating the feature vector sequence of that interval, together with the partial residual sum of squares with respect to that line.
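The work of units 5 and 6 for one candidate subinterval can be sketched in Python as follows. This is an illustration under assumptions: the function name `fit_segment` is hypothetical, the line is taken to pass through the mean at the centre of the interval, and the direction vector of a single-frame interval (where the slope is undefined) is set to zero by choice.

```python
def fit_segment(frames):
    """Least squares line through the frames of one subinterval.

    frames: list of tau feature vectors (lists of floats, equal length).
    Returns (mean, direction, rss): the mean vector, the direction vector
    of the fitted line, and the partial residual sum of squares.
    """
    tau = len(frames)
    dim = len(frames[0])
    mean = [sum(x[d] for x in frames) / tau for d in range(dim)]
    centre = (tau + 1) / 2.0                   # centre of k = 1..tau
    denom = sum((k - centre) ** 2 for k in range(1, tau + 1))
    if denom == 0.0:
        direction = [0.0] * dim                # tau == 1: assumed zero slope
    else:
        direction = [
            sum((k - centre) * (frames[k - 1][d] - mean[d])
                for k in range(1, tau + 1)) / denom
            for d in range(dim)
        ]
    rss = 0.0
    for k in range(1, tau + 1):
        for d in range(dim):
            q = mean[d] + (k - centre) * direction[d]   # point on the line
            rss += (frames[k - 1][d] - q) ** 2
    return mean, direction, rss
```

For frames that lie exactly on a line the residual is zero, and the direction vector is the per-frame change of each feature component.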

15 is a partial residual sum of squares normalization unit, which normalizes the partial residual sum of squares by a function of the number of frames in the subinterval. Without normalization, subintervals with many frames would influence the division result more strongly than subintervals with few frames, and the purpose of the normalization is to counteract this. As the normalization coefficient, for example τ or τ^2 can be used, where τ is the number of frames in the subinterval. Normalizing in this way makes it possible to compare divisions of the same interval into different numbers of subintervals and to decide which is better. When, as in this embodiment, the sum of squared inter-vector distances is used as the partial distance, it is desirable to normalize by a quantity proportional to τ^2 in order to remove the influence of the number of frames τ in the subinterval. When the absolute-value distance (city-block distance) is used as the inter-vector distance, normalizing by a quantity proportional to τ is considered desirable.
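The normalization step itself is small. A minimal sketch, assuming the two coefficient choices mentioned above (the function name and the `measure` labels are made up for the example):

```python
def normalized_partial_distance(partial_distance, tau, measure="squared_euclidean"):
    """Normalize a partial distance by a function of the frame count tau.

    tau**2 is used for sums of squared distances and tau for
    city-block (absolute-value) distances.
    """
    if tau <= 0:
        raise ValueError("tau must be a positive number of frames")
    if measure == "squared_euclidean":
        return partial_distance / tau ** 2
    if measure == "city_block":
        return partial_distance / tau
    raise ValueError("unknown measure: %r" % (measure,))
```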

7 is a cumulative residual sum of squares calculation unit (cumulative distance calculation unit).

First, let τ_i be the number of frames in the i-th subinterval and τ_i^2 the normalization coefficient. Consider dividing frames 1 to T into I subintervals so that the total of the normalized residual sums of squares (normalized partial distances) over the first to I-th subintervals,

$$\frac{v(1:f(1))}{\tau_1^{2}} + \frac{v(f(1)+1:f(2))}{\tau_2^{2}} + \cdots + \frac{v(f(I-1)+1:f(I))}{\tau_I^{2}}$$

is minimized (hereinafter this is called optimal division into I), and obtaining that total V(T, I) (hereinafter called the minimum cumulative distance).

Here f(i) (i = 1 to I) is the last frame of the i-th subinterval. This calculation can be carried out efficiently by dynamic programming. Since τ_i = t - r, V(T, I) is obtained by evaluating the recurrence

$$V(t,i) = \min_{t-s \le r \le t-e}\left[ V(r,i-1) + \frac{v(r+1:t)}{(t-r)^{2}} \right] \tag{5}$$

successively for t = 1 to T and i = 1 to I, with the initial values V(0, 0) = 0 and V(t, 0) = ∞ (t ≠ 0).

Recurrence (5) says that the minimum cumulative distance V(t, i) when frames 1 to t are divided into i is obtained as the minimum over r of the sum of the minimum cumulative distance V(r, i-1) when frames 1 to r (t - s ≤ r ≤ t - e) are divided into i - 1 and the normalized partial distance of the i-th interval, v(r+1 : t)/τ_i^2 = v(r+1 : t)/(t - r)^2. This rests on the so-called principle of optimality, which states that a sub-process of an optimal process is itself an optimal process: if r_opt denotes the r that satisfies equation (6), then when frames 1 to t are optimally divided into i, the division points within frames 1 to r_opt coincide with the division points obtained when frames 1 to r_opt are optimally divided into i - 1.

The present invention is characterized in that, in the calculation of equation (6), the number of divisions I is also optimized. To do this, recurrence (5) is changed to

$$V(t) = \min_{t-s \le r \le t-e}\left[ V(r) + \frac{v(r+1:t)}{(t-r)^{2}} \right]$$

Solving this recurrence with the initial values V(0) = 0 and V(t) = ∞ (t ≠ 0), the finally obtained V(T) is the required minimum cumulative residual sum of squares, optimized also with respect to the number of divisions I.

In short, the minimum cumulative residual sum of squares calculation unit 7 computes this V(t) for t = 1 to T and also obtains the B(t) and N(t) corresponding to V(t). Here B(t) is the last frame of the second-to-last subinterval when {x(1), ..., x(t)} is optimally divided with respect to both the number of divisions and the division points; the initial value of B(t) is B(0) = 0.

N(t) is the number of divisions when {x(1), ..., x(t)} is optimally divided; the initial value of N(t) is N(0) = 0.
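A sketch of the recurrence with the number of divisions left free, recording the back pointer B(t), the division count N(t), and the line parameters of the last subinterval for each t. The names `fit_line` and `segment_optimally` are hypothetical, `max_len` and `min_len` stand for s and e, and the update N(t) = N(B(t)) + 1 is inferred from the definitions of B(t) and N(t) rather than quoted from the text.

```python
def fit_line(frames):
    """Return (mean, direction, rss) of the least squares line."""
    tau, dim = len(frames), len(frames[0])
    c = (tau + 1) / 2.0
    denom = sum((k - c) ** 2 for k in range(1, tau + 1))
    mean, direction, rss = [], [], 0.0
    for d in range(dim):
        col = [x[d] for x in frames]
        m = sum(col) / tau
        u = (sum((k - c) * (col[k - 1] - m) for k in range(1, tau + 1)) / denom
             if denom else 0.0)                 # single frame: assumed zero slope
        rss += sum((col[k - 1] - m - (k - c) * u) ** 2 for k in range(1, tau + 1))
        mean.append(m)
        direction.append(u)
    return mean, direction, rss


def segment_optimally(x, min_len, max_len):
    """Evaluate V(t), B(t), N(t), m(t), u(t) for t = 0..T."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    T = len(x)
    inf = float("inf")
    V = [inf] * (T + 1)
    B = [0] * (T + 1)          # back pointer: last frame of previous segment
    N = [0] * (T + 1)          # number of divisions for frames 1..t
    M = [None] * (T + 1)       # mean vector of the last segment
    U = [None] * (T + 1)       # direction vector of the last segment
    V[0] = 0.0
    for t in range(1, T + 1):
        for r in range(max(0, t - max_len), t - min_len + 1):
            if V[r] == inf:
                continue
            mean, direction, rss = fit_line(x[r:t])
            cand = V[r] + rss / (t - r) ** 2
            if cand < V[t]:
                V[t], B[t], N[t] = cand, r, N[r] + 1
                M[t], U[t] = mean, direction
    return V, B, N, M, U
```

The five returned lists correspond loosely to the contents of storage units 8, 10, 9, and 13 described next.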

8 is a minimum cumulative residual sum of squares storage unit, which stores the result of the minimum cumulative residual sum of squares calculation unit 7, that is, the minimum cumulative residual sum of squares V(t) when frames 1 to t are optimally divided. V(t) is used in the subsequent evaluation of the recurrence in the minimum cumulative residual sum of squares calculation unit 7.

10 is a back pointer storage unit, which stores B(t) (hereinafter called the back pointer of frame t) calculated by the minimum cumulative residual sum of squares calculation unit 7, for t = 1 to T. After the above processing has been carried out for t = 1 to T, B(t) is used to find the final optimal division points by backtracking, as described later.

9 is a division interval number storage unit, which stores N(t) calculated by the minimum cumulative residual sum of squares calculation unit 7, for t = 1 to T. After the above processing has been carried out for t = 1 to T, N(t) is used to find the interval number of each subinterval when the final optimal division points are found by backtracking, as described later.

13 is a mean/direction vector storage unit, which stores, for t = 1 to T, the mean vector m(t) and direction vector u(t) of the least squares approximation line of the final interval corresponding to V(t) obtained by the minimum cumulative residual sum of squares calculation unit 7. Here m(t) and u(t) are the m(t, i) and u(t, i) corresponding to V(t).

14 is a speech interval detection unit, which detects the speech interval from the output of the feature extraction unit 1 by a well-known method, for example by sensing the level of the input speech. The frame counter 4 is controlled by the output of the speech interval detection unit 14 and counts the frames in which speech is present sequentially from frame 1.

11 is a back pointer read control unit, which supplies a frame number t to the back pointer storage unit 10. Immediately after the end of the speech interval it outputs the value T of the frame counter 4 at that time to the back pointer storage unit 10, and thereafter it feeds the output of the back pointer storage unit 10 back to the back pointer storage unit 10 until the back pointer value becomes 0. For a given frame number t, the back pointer storage unit 10 outputs B(t), the last frame of the preceding subinterval. Hence, when the speech interval detection unit 14 detects the end of the speech interval, the back pointer read control unit 11 first outputs the value T of the frame counter 4 at that time to the back pointer storage unit 10, and then feeds each back pointer B(t) read from the back pointer storage unit 10 back to it as the new frame number until B(t) becomes 0. In this way the back pointer read control unit 11 outputs the last frame of each subinterval of the optimal division of the T frames, in reverse order starting from T. The frame number t output by the back pointer read control unit 11 is also supplied to the mean/direction vector storage unit 13 and the division interval number storage unit 9, in order to read out the already computed mean vector m(t) and direction vector u(t) of the last subinterval of the optimal division of frames 1 to t, and the number of divisions N(t).
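The backtracking performed by units 10, 11, 9, and 13 amounts to following the back pointers from T until 0 is reached. In the sketch below the function name and the dictionary layout of each entry are hypothetical; the lists B, N, M, U are assumed to be indexed by frame number 0..T as described above. The result is reversed at the end for convenience, whereas unit 11 emits the subintervals from last to first.

```python
def read_standard_pattern(B, N, M, U, T):
    """Collect the per-subinterval parameters by following back pointers.

    B[t]: last frame of the subinterval preceding the one ending at t
          (must satisfy B[t] < t so that the walk terminates).
    N[t]: number of divisions of frames 1..t, used as the interval number.
    M[t], U[t]: mean and direction vectors of the subinterval ending at t.
    """
    pattern = []
    t = T
    while t > 0:
        pattern.append({
            "interval": N[t],
            "last_frame": t,
            "mean": M[t],
            "direction": U[t],
        })
        t = B[t]               # feed the back pointer back as the new frame
    pattern.reverse()          # first subinterval first
    return pattern
```

Each entry corresponds to what the standard pattern storage unit receives for one subinterval.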

12 is a standard pattern storage unit, which stores the output t of the back pointer read control unit 11 supplied as described above, together with the corresponding outputs of the mean/direction vector storage unit 13 and the division interval number storage unit 9.

In this way, the least squares approximation line in each subinterval is obtained for a speech pattern consisting of a sequence of feature vectors when that pattern is optimally divided, in the sense described above, with respect to both the number of divisions and the division positions.

FIG. 2(a) illustrates the concept of the division method of the present invention, assuming a pattern represented by a sequence of one-dimensional feature vectors.

The vertical axis is the feature value of the feature vector, the horizontal axis is the frame, and each dot marks the position of the feature vector at that time. This example shows the case of division into three, and line segments 100 to 102 are the least squares approximation lines of the respective intervals. FIG. 2(b) likewise shows the case where the direction vector of the least squares approximation line is always 0. This corresponds to dividing so as to minimize the sum of squared errors of the vectors in each interval with respect to that interval's mean vector, so that each interval is represented by a single representative vector, namely the mean of the feature vectors it contains.
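The special case of FIG. 2(b) can be written as a one-function sketch: with the direction vector fixed at zero, the partial distance of an interval reduces to the sum of squared deviations from its mean. The function name is illustrative.

```python
def constant_fit_rss(frames):
    """Mean vector of an interval and the sum of squared deviations from it,
    i.e. the partial distance when the direction vector is always zero."""
    tau = len(frames)
    dim = len(frames[0])
    mean = [sum(x[d] for x in frames) / tau for d in range(dim)]
    rss = sum((x[d] - mean[d]) ** 2 for x in frames for d in range(dim))
    return mean, rss
```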

As is clear from the above description, in the present invention the input pattern is divided with the optimal number of divisions at the optimal division points, the least squares approximation line is obtained for each division interval, and the vector representing its mean and the vector representing the slope (direction) of the least squares approximation line passing through that mean are held as the standard pattern.

Effects of the Invention

According to the present invention, the slope of the line in each subinterval expresses the dynamic features of that subinterval, and the mean vector expresses its static features. By holding these as the standard pattern, the dynamic features are reflected, and the drawback of the prior art described above is removed.

In addition, the only parameters that need to be stored as the standard pattern are, for each subinterval, the vector representing its mean and the vector representing the slope (direction) of the least squares approximation line passing through it. This also removes the drawback of the large storage capacity required when the sequence of feature vectors output by the feature extraction unit is itself held as the standard pattern.

Furthermore, when unspecified speakers are targeted, the invention can be realized by treating each point on the least squares approximation line as the mean of the feature vector at the corresponding time and supplying a distribution form (specifically, the type of distribution, such as a normal distribution, and its variance). This is a feature not found in the prior art.

Although this embodiment has been described for the case where the approximating curve is a straight line, as stated at the beginning of the description of the embodiment, approximation by various curves is possible in the same way, which allows the dynamic features of the recognition unit to be expressed more precisely. Needless to say, the patterns are not limited to speech patterns.

Various other distances or similarities, such as the sum of the absolute differences of the components (city-block distance), can also be used as the measure of difference between vectors. The embodiment has described the case where the speech interval is detected automatically. It is of course also possible to extract manually, by inspecting the output of the speech analysis, the intervals corresponding to speech segments such as syllables or phonemes, to divide each of them into finer intervals by the method described in the present invention, to express the pattern of each interval by the parameters of the approximating curve for that interval, and to use their concatenation as the standard pattern of the speech segment. This can be applied to the speech segment recognition method described in the latter half of the prior art section.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an embodiment of the present invention, and FIG. 2 is a conceptual diagram explaining the present invention in detail.

1: feature extraction unit
2: input buffer memory
3: subinterval setting unit
4: frame counter
5: mean value calculation unit
6: least squares approximation line calculation unit
7: minimum cumulative residual sum of squares calculation unit
8: minimum cumulative residual sum of squares storage unit
9: division interval number storage unit
10: back pointer storage unit
11: back pointer read control unit
12: standard pattern storage unit
13: mean/direction vector storage unit
14: speech interval detection unit
15: partial residual sum of squares normalization unit

Claims

A pattern creation apparatus comprising: subinterval setting means for setting candidate intervals for the i-th subinterval of a pattern consisting of a sequence of feature vectors; optimal parameter calculation means for calculating the parameters of a vector-valued time function so that the normalized partial distance (normalized partial similarity), normalized in relation to the length of the interval, between the feature vector sequence of the set candidate interval for the i-th interval of the pattern and the vector sequence on the curve given by the time function is minimized (maximized); minimum cumulative distance (maximum cumulative similarity) calculation means for obtaining the minimum (maximum) of the total of the normalized partial distances (normalized partial similarities) by optimally determining the number of subintervals and the division positions; and standard pattern storage means for storing the parameters of the time function calculated for each optimally determined subinterval i in association with the interval number i.