JPS59198A

JPS59198A - Pattern comparator

Info

Publication number: JPS59198A
Application number: JP57110529A
Authority: JP
Inventors: 中川　聖一; 英一坪香
Original assignee: Individual
Current assignee: Individual
Priority date: 1982-06-25
Filing date: 1982-06-25
Publication date: 1984-01-05
Also published as: JPH0247758B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The present invention relates to a pattern comparison device that matches an input pattern against a plurality of registered patterns. In particular, it relates to a pattern comparison device applicable to the recognition of continuously uttered words.

Speech is the most natural means by which humans produce information. For speech to show its true value as an input means for human-machine systems, it is desirable to be able to recognize continuous, ordinary conversational speech without restricting the speaker.

FIG. 1 is a block diagram of a speech recognition device that uses the word as its recognition unit. The components are as follows:

(1) is the input terminal for the speech signal.
(2) is an acoustic analysis unit. It converts the input speech signal into a sequence of sets of numerical values (feature vectors) by frequency analysis, LPC analysis, PARCOR analysis, correlation analysis, or the like.
(3) is a standard pattern storage unit, in which the words to be recognized are registered as sequences of such feature vectors.
(4) is a pattern matching unit. It compares the feature-vector sequence of the input speech, as produced by the acoustic analysis unit (2), with each standard pattern and calculates the distance or similarity between them.
(5) is a decision unit. It recognizes the input word from the results of the pattern matching unit (4) and determines the recognition result.
(6) is an output terminal that outputs this recognition result.

In a speech recognition device of this configuration, an excellent pattern matching method is matching with nonlinear time-axis warping by dynamic programming (DP matching).

DP matching plays a central role in continuous word recognition by the device of the present invention. The DP matching algorithm is briefly explained next.

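The two speech patterns A and B to be compared are represented as sequences of feature vectors $a_i$ and $b_j$, respectively:

$$A = a_1 a_2 \cdots a_i \cdots a_I, \qquad B = b_1 b_2 \cdots b_j \cdots b_J \tag{1}$$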

Let d(i, j) denote the distance between vectors $a_i$ and $b_j$. The procedure is then:

1. Compute the weighted average of d(i, j) for each of the various possible correspondences between the vectors of the two sequences.
2. Take the correspondence that minimizes this weighted average as the optimal correspondence between the sequences.
3. Take the minimum weighted average as the distance D(A, B) between them.

DP matching carries out this procedure efficiently by dynamic programming. The distance d(i, j) is usually the Euclidean distance or the city-block distance between $a_i$ and $b_j$.
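The following Python sketch computes the two local distances named above for plain lists of numbers. Representing feature vectors as Python lists, and the helper names, are assumptions for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """City-block (L1) distance: sum of absolute component differences."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return sum(abs(x - y) for x, y in zip(a, b))

def local_distances(A, B, dist=city_block):
    """Matrix d[i][j] = dist(a_i, b_j), with 0-based indices."""
    return [[dist(a, b) for b in B] for a in A]
```

For example, `local_distances([[0], [1]], [[1], [3]])` gives `[[1, 3], [0, 2]]`.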

FIG. 2 illustrates this in two dimensions. The time correspondence between patterns A and B (the time-warping function j(i)) is represented by a sequence F of lattice points c(k) = (i(k), j(k)) on the i-j plane:

$$F = c(1)\,c(2)\cdots c(k)\cdots c(K), \qquad c(1) = (1,1),\quad c(K) = (I, J) \tag{2}$$

D(A, B) is then defined as

$$D(A,B) = \min_F \frac{\sum_{k=1}^{K} d(c(k))\,w(k)}{\sum_{k=1}^{K} w(k)} \tag{3}$$

Here w(k) is a nonnegative weight. Its values are determined by the scheme used to approximate the time-warping function j(i) by a sequence of points.

Suppose the denominator of equation (3) is a constant $M = \sum_k w(k)$ that does not depend on F. Then D(A, B) can be computed efficiently by dynamic programming. Define

$$g(c(k)) = \min_{c(1),\dots,c(k-1)} \sum_{l=1}^{k} d(c(l))\,w(l).$$

Then

$$g(c(k)) = \min_{c(k-1)} \bigl[\, g(c(k-1)) + d(c(k))\,w(k) \,\bigr]. \tag{4}$$

Recurrence (4) is solved starting from g(c(1)) = g(1, 1) = d(1, 1). Once g(c(K)) = g(I, J) has been obtained, D(A, B) follows as

$$D(A,B) = \frac{1}{M}\, g(I,J). \tag{5}$$
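Below is a minimal sketch of recurrences (4) and (5). The arrow patterns of FIG. 3 are not reproduced in this text, so the sketch assumes one common asymmetric local constraint. Each input frame i is reached from frame i - 1 at reference frame j, j - 1, or j - 2, always with weight w(k) = 1, so M = I for every path. The function name `dp_match` and its `dist` parameter are hypothetical.

```python
import math

def dp_match(A, B, dist):
    """Normalized DP-matching distance D(A, B) of input A against reference B.

    Assumed asymmetric local constraint (illustrative, not read from FIG. 3):
        g(i, j) = d(i, j) + min(g(i-1, j), g(i-1, j-1), g(i-1, j-2))
    Every path uses exactly one lattice point per input frame with w(k) = 1,
    so M = sum of w(k) = I for every path F, and D(A, B) = g(I, J) / M.
    """
    I, J = len(A), len(B)
    g = [[math.inf] * J for _ in range(I)]
    g[0][0] = dist(A[0], B[0])                 # g(1, 1) = d(1, 1)
    for i in range(1, I):
        for j in range(J):
            # predecessors allowed by the constraint: j, j-1, j-2 at frame i-1
            best = min(g[i - 1][p] for p in (j, j - 1, j - 2) if p >= 0)
            g[i][j] = dist(A[i], B[j]) + best  # stays inf if unreachable
    return g[I - 1][J - 1] / I                 # equation (5) with M = I
```

The denominator is the same for every admissible path. That is why the minimization can act on the numerator alone, which is what makes the recurrence valid.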

There are two ways to make the denominator of equation (3) constant:

- choose w(k) so that M = I + J (symmetric form);
- choose w(k) so that M = I or M = J (asymmetric form).

FIGS. 3(a) to 3(f) show examples of constraints used when selecting the point sequence F. A path reaching point (i, j) may only take the routes indicated by the arrows in the figure. The number on each segment is the weight w(k) applied when that segment is chosen as part of the path. (a) and (b) are examples of the symmetric form, for which M = I + J. (c) to (f) are examples of the asymmetric form, for which M = I.
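The requirement that $\sum_k w(k)$ not depend on F can be checked by brute force on a small grid. The sketch below enumerates every admissible path for two hypothetical step sets that stand in for FIG. 3; they are assumptions, not read from the figure.

- **Symmetric:** horizontal and vertical steps weigh 1 and diagonal steps weigh 2. The starting point counts with weight 2, which is one way of making the total exactly I + J.
- **Asymmetric:** every step advances i by one, with weight 1.

```python
from functools import lru_cache

# Hypothetical step sets: (di, dj) -> weight w(k) of that step.
SYMMETRIC = {(1, 0): 1, (0, 1): 1, (1, 1): 2}   # start counted with weight 2
ASYMMETRIC = {(1, 0): 1, (1, 1): 1, (1, 2): 1}  # start counted with weight 1

def path_weight_totals(I, J, steps, start_weight):
    """Set of sum-of-w(k) values over every admissible path (1,1) -> (I,J)."""
    @lru_cache(maxsize=None)
    def totals(i, j):
        if (i, j) == (1, 1):
            return frozenset({start_weight})
        out = set()
        for (di, dj), w in steps.items():
            pi, pj = i - di, j - dj
            if pi >= 1 and pj >= 1:
                out |= {t + w for t in totals(pi, pj)}
        return frozenset(out)
    return set(totals(I, J))
```

`path_weight_totals(5, 4, SYMMETRIC, 2)` yields the single total {9} = {I + J}, and the asymmetric set yields {5} = {I}. By contrast, `path_weight_totals(3, 3, {(1, 0): 1, (0, 1): 1, (1, 1): 1}, 1)` returns {3, 4, 5}. With every step weighted 1, the total varies with the path, so the denominator of (3) could not be taken out of the minimization.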

Spoken words are recognized with this matching method as follows. Let the word classes to be recognized be n (n = 1, ..., N), and let the standard pattern of class n be $B^n$. Compute the distance $D_n = D(A, B^n)$ between the input A and each standard pattern $B^n$ by the method above. The recognition result for A is the class $\hat n = \arg\min_n D_n$.
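Isolated-word recognition by the minimum-distance rule can be sketched as follows. The sketch uses scalar toy features and an assumed asymmetric constraint: predecessors j, j - 1, j - 2, normalized by M = I. `dp_distance`, `recognize`, and the reference dictionary are illustrative names, not part of the original device.

```python
import math

def dp_distance(A, B):
    """Asymmetric DP distance for 1-D toy features (assumed constraint:
    predecessors (i-1, j), (i-1, j-1), (i-1, j-2); normalized by M = I)."""
    I, J = len(A), len(B)
    g = [[math.inf] * J for _ in range(I)]
    g[0][0] = abs(A[0] - B[0])
    for i in range(1, I):
        for j in range(J):
            best = min(g[i - 1][p] for p in (j, j - 1, j - 2) if p >= 0)
            g[i][j] = abs(A[i] - B[j]) + best
    return g[I - 1][J - 1] / I

def recognize(A, references):
    """Return the class n with the smallest D_n = D(A, B^n)."""
    return min(references, key=lambda n: dp_distance(A, references[n]))
```

For example, `recognize([1, 1, 1, 5, 5], {"a": [1, 1, 5], "b": [5, 5, 1]})` returns `"a"`.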

Suppose the asymmetric DP matching above is set up so that M = I. Then M depends only on the input pattern length and is the same for every standard pattern in equation (5), so the distance can be defined simply as

$$D(A,B) = g(I,J). \tag{6}$$

Hereafter, the distance between patterns is taken to be that of equation (6). Under the constraint of FIG. 3(c), it suffices to compute recurrence (7) with the initial condition g(1, 1) = d(1, 1).

Continuous word recognition is described next. It can be formulated as follows. Let A denote the speech pattern obtained when X words q(1), q(2), ..., q(X) are uttered in succession:

$$A = a_1 a_2 \cdots a_i \cdots a_I \tag{8}$$

Let the standard pattern of word q(x) be

$$B^{q(x)} = b_1^{q(x)}\, b_2^{q(x)} \cdots b_{J^{q(x)}}^{q(x)}. \tag{9}$$

The standard pattern $\bar B$ is obtained by concatenating the X word patterns $B^{q(1)}, B^{q(2)}, \dots, B^{q(X)}$:

$$\bar B = B^{q(1)} \oplus B^{q(2)} \oplus \cdots \oplus B^{q(X)} = b_1^{q(1)} \cdots b_{J^{q(1)}}^{q(1)}\; b_1^{q(2)} \cdots b_{J^{q(2)}}^{q(2)} \cdots b_1^{q(X)} \cdots b_{J^{q(X)}}^{q(X)} \tag{10}$$

Here $\oplus$ denotes concatenation of patterns.

Continuous word recognition therefore becomes the following problem: perform DP matching between $\bar B$ and the input pattern A, and choose X and q(x) (x = 1, 2, ..., X) so that the resulting $D(A, \bar B)$ is minimized. That is, one computes

$$T = \min_{X,\; q(1),\dots,q(X)} D\bigl(A,\; B^{q(1)} \oplus B^{q(2)} \oplus \cdots \oplus B^{q(X)}\bigr) \tag{11}$$

and finds the conditions under which T is minimal.

Evaluating equation (11) directly requires an enormous amount of computation. Let K be the maximum number of words uttered in succession in the input, and N the number of word standard patterns. DP matching must then be carried out on the order of $N^K$ times. In practice, the problem is therefore reduced to solving a recurrence, as follows.
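The count below makes this growth concrete. It assumes that a brute-force search of equation (11) would DP-match A once against every word string of length 1 to K over N classes. The function name and the class and length figures in the usage line are hypothetical.

```python
def candidate_sequences(N, K):
    """Count word strings q(1)...q(X) with 1 <= X <= K over N word classes,
    i.e. the concatenated patterns a brute-force search of (11) would match."""
    return sum(N ** x for x in range(1, K + 1))
```

With 100 classes and strings of up to 5 words, `candidate_sequences(100, 5)` is 10101010100, over $10^{10}$ DP matchings for a single utterance.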

In the input pattern A, the partial pattern A(l, m) is the section from i = l + 1 to i = m:

$$A(l,m) = a_{l+1}\, a_{l+2} \cdots a_m \tag{12}$$

If the distance between patterns is defined by equation (6), the following holds:

$$D(A,\; B_1 \oplus B_2) = \min_m \bigl[\, D(A(0,m), B_1) + D(A(m,I), B_2) \,\bigr] \tag{13}$$

Using this, equation (11) can be solved as follows.

The meanings of the symbols used hereafter are summarized in Table 1.

Table 1

1) When the number of input words X is known: solve the recurrence

$$D_x(i) = \min_{n,\,m} \bigl[\, D_{x-1}(m) + D^n(m+1 : i) \,\bigr] \tag{14}$$

Here $D^n(s : t)$ is the distance between the standard pattern of word n and input frames s to t. Set $N_x(i) = \hat n$ and $B_x(i) = \hat m$, where $\hat n$ and $\hat m$ are the n and m that attain the minimum in (14). The recognition result then follows from the flowchart of FIG. 4. It determines the word names and segmentation boundaries in turn, starting from the last word of the X-word string and working back to the first.

2) When the number of input words X is unknown: the recognition result is obtained by the flowchart of FIG. 5 from the solution of the recurrence

$$D(i) = \min_{n,\,m} \bigl[\, D(m) + D^n(m+1 : i) \,\bigr], \qquad N(i) = \hat n, \quad B(i) = \hat m$$

where $\hat n$ and $\hat m$ are the n and m that attain the minimum.
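The two recurrences can be sketched as a second-stage DP over precomputed segment distances. The sketch makes two assumptions:

- a table `seg` maps (n, s, t) to $D^n(s : t)$, the unnormalized distance of word n against input frames s to t;
- D(0) = 0 and $D_0(0) = 0$, meaning no words have been consumed before frame 1. This initial condition is not stated in the text.

All function names are hypothetical. `backtrack` follows the back pointers from the final frame, in the spirit of FIGS. 4 and 5, and reverses the result into utterance order.

```python
import math

def unknown_count_dp(seg, I):
    """Unknown word count: D(i) = min over n, m of [D(m) + D^n(m+1:i)].

    seg maps (n, s, t) -> D^n(s:t) for input frames s..t (1-based, inclusive).
    Returns lists D, N, B indexed by end frame i = 0..I, where N[i] is the best
    last word ending at i and B[i] = m is its back pointer.
    """
    D = [math.inf] * (I + 1)
    N = [None] * (I + 1)
    B = [None] * (I + 1)
    D[0] = 0.0                                  # assumed: empty prefix costs 0
    for i in range(1, I + 1):
        for (n, s, t), d in seg.items():
            if t == i and D[s - 1] + d < D[i]:
                D[i], N[i], B[i] = D[s - 1] + d, n, s - 1
    return D, N, B


def known_count_dp(seg, I, X):
    """Known word count: D_x(i) = min over n, m of [D_{x-1}(m) + D^n(m+1:i)]."""
    D = [[math.inf] * (I + 1) for _ in range(X + 1)]
    N = [[None] * (I + 1) for _ in range(X + 1)]
    B = [[None] * (I + 1) for _ in range(X + 1)]
    D[0][0] = 0.0
    for x in range(1, X + 1):
        for (n, s, t), d in seg.items():
            if t <= I and D[x - 1][s - 1] + d < D[x][t]:
                D[x][t], N[x][t], B[x][t] = D[x - 1][s - 1] + d, n, s - 1
    return D, N, B


def backtrack(N, B, I):
    """Follow back pointers I, B(I), B(B(I)), ..., 0 and return the words and
    their end frames in utterance order (the unknown-count arrays)."""
    if N[I] is None:
        raise ValueError("no segmentation ends at frame I")
    words, ends, i = [], [], I
    while i > 0:
        words.append(N[i])
        ends.append(i)
        i = B[i]
    return words[::-1], ends[::-1]
```

Consider the toy table `{(1, 1, 3): 1.0, (2, 4, 6): 2.0, (3, 1, 6): 5.0, (2, 1, 2): 0.5, (1, 3, 6): 4.0}` with I = 6. The unknown-count DP chooses word 1 over frames 1 to 3 followed by word 2 over frames 4 to 6, with total distance 3.0.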

The two-stage DP method has been proposed to implement this idea. Its outline is given next.

The two-stage DP method first obtains $D^n(s : t)$ by DP for every combination of s and t, and then obtains D(i) by DP. Its characteristic feature is that DP is applied in two stages.

Both a forward and a backward algorithm have been proposed for the two-stage DP method. The backward algorithm is described here.

(1) Assume that D(i - 1), N(i - 1), and B(i - 1) have already been obtained for the input pattern up to frame i - 1.

(2) For each word n (n = 1, 2, ..., N), perform DP matching between its standard pattern and the input pattern in the reverse time direction, starting from frame i. The path constraints of FIGS. 3(c), 3(d), 3(e), and 3(f) therefore become those of FIGS. 7(a), 7(b), 7(c), and 7(d), respectively.

The matching could also be restricted to an adjustment window of width R. Here, however, it is performed within the range of slopes 1/2 to 2 (the slope-limited region, shaded in FIG. 6).

This matching is performed with a free end point. As a result, $D^n(s : i)$ is obtained for $i - 2J^n + 1 \le s \le i - \tfrac{1}{2}J^n$.

(3) Obtain D(i), N(i), and B(i) from the recurrence for an unknown number of words given above.

(4) Set i = i + 1 and return to step (2).
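Step (2) of the backward algorithm can be sketched for a single word n and end frame i as follows. The backward local constraint moves from input frame s + 1 back to s, at reference frame j, j + 1, or j + 2. It mirrors an asymmetric forward constraint assumed for illustration; FIG. 7 is not reproduced here, so this is an assumption. The optional per-frame weights `W` correspond to the weighting introduced later in the text. The function name and signature are hypothetical.

```python
import math

def segment_distances(A, Bn, i, dist, W=None):
    """Stage 1 of the backward two-stage DP for one word n and end frame i.

    Matches reference Bn (J frames) against input A backward from the lattice
    point (i, J) (1-based) with a free start point, and returns {s: D^n(s:i)}
    for every start frame s from which reference frame 1 is reachable.

    Assumed backward local constraint:
        h(s, j) = W[j] * d(s, j) + min(h(s+1, j), h(s+1, j+1), h(s+1, j+2))
    with h(i, J) = W[J] * d(i, J). Each input frame contributes one term.
    """
    J = len(Bn)
    W = W if W is not None else [1.0] * J
    h = {(i, J - 1): W[J - 1] * dist(A[i - 1], Bn[J - 1])}  # j kept 0-based
    result = {i: h[(i, 0)]} if J == 1 else {}
    lowest = max(i - 2 * J + 1, 1)          # slope-2 edge of the window
    for s in range(i - 1, lowest - 1, -1):
        for j in range(J):
            prev = min(h.get((s + 1, p), math.inf) for p in (j, j + 1, j + 2) if p < J)
            if prev < math.inf:
                h[(s, j)] = W[j] * dist(A[s - 1], Bn[j]) + prev
        if (s, 0) in h:
            result[s] = h[(s, 0)]
    return result
```

To obtain a frame-synchronous version of the two-stage DP:

1. At each new frame i, call the function for every word n.
2. Feed the returned $D^n(s : i)$ into D(i) = min[D(s - 1) + D^n(s : i)].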

Consider applying this idea to the recognition of continuously uttered monosyllables. A monosyllable has the form consonant plus vowel, and the consonant part is considerably shorter than the vowel part. Yet monosyllables that share the same vowel must be distinguished by subtle differences in their consonant parts.

Suppose each input monosyllable is matched as a whole against each standard monosyllable pattern, as in the pattern matching described above. The vowel part then dominates the matching result, and subtle differences between consonant parts become hard to distinguish.

The present invention remedies this drawback. It provides a pattern comparison device that performs matching with emphasis on the consonant part.

To exploit prior knowledge actively and achieve more accurate matching, weights must be introduced for each frame of the standard patterns or of the input pattern. Introducing suitable weights for each frame of the input pattern leaves all the algorithms described so far valid. Introducing weights into the standard patterns is different. The cumulative matching distance then also depends on the standard pattern length and other factors, and the recurrences for the known and unknown numbers of words no longer hold.

The reason is as follows. The symmetric DP matching already described is an example in which weights are, in effect, introduced for the standard pattern. There, the cumulative matching distance also varies with the standard pattern length. To judge which standard pattern fits best, the cumulative matching distance therefore had to be divided (normalized) by the sum of the input pattern length and the standard pattern length, as described above.

Now suppose that, among the standard patterns, $B_1$ (of length $J_1$) best matches the partial pattern A(0, m) of the input A, and let $B_2$ (of length $J_2$) be any other standard pattern. Then

$$\frac{\tilde D(A(0,m), B_1)}{m + J_1} \le \frac{\tilde D(A(0,m), B_2)}{m + J_2},$$

where $\tilde D(P, Q)$ denotes the cumulative matching distance between patterns P and Q before normalization.

Consider the search, when the input reaches frame i, for the back pointer and the last word (monosyllable) using the recurrences above. The normalization is, of course, by the sum of the input pattern length and the standard pattern length. Let the last word be X, its length x, and the back pointer m.

Take the cumulative matching distance between the concatenated standard pattern $B_1 \oplus X$ and the partial input A(0, i), normalized by the sum of the input and standard pattern lengths:

$$\alpha = \frac{\tilde D(A(0,m), B_1) + \tilde D(A(m,i), X)}{i + J_1 + x}$$

The corresponding value for $B_2 \oplus X$ is

$$\beta = \frac{\tilde D(A(0,m), B_2) + \tilde D(A(m,i), X)}{i + J_2 + x}$$

For m and X to be found by the recurrences, α must of course be no larger than β. If β < α held, the value D(m) obtained at the m-th frame could not be used as D(m) in the recurrence for an unknown number of words.

In general, however, α ≤ β does not hold. For example, let $\tilde D(A(0,m), B_1) = 10$, $\tilde D(A(0,m), B_2) = 20$, m = 20, $J_1 = 10$, and $J_2 = 20$. The left-hand side of the first inequality is then 10/(20 + 10) = 1/3 and the right-hand side is 20/(20 + 20) = 1/2, so these values satisfy it. Now let i = 40, x = 10, and $\tilde D(A(m,i), X) = 60$. Then

α = (10 + 60)/(40 + 10 + 10) = 7/6 and β = (20 + 60)/(40 + 20 + 10) = 8/7,

so α > β and the condition α ≤ β is no longer satisfied.

In the asymmetric DP method described above, the normalization depends only on the input pattern length:

$$\alpha = \frac{\tilde D(A(0,m), B_1) + \tilde D(A(m,i), X)}{i}, \qquad \beta = \frac{\tilde D(A(0,m), B_2) + \tilde D(A(m,i), X)}{i}$$

Here $\tilde D(A(0,m), B_1) \le \tilde D(A(0,m), B_2)$ clearly implies α ≤ β, so the recurrences can be used without contradiction.
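The counterexample can be checked with exact rational arithmetic. The sketch uses Python's standard `fractions.Fraction` and the numbers from the text; only the variable names are added.

```python
from fractions import Fraction

# Numbers from the worked example in the text (unnormalized distances).
m, J1, J2, i, x = 20, 10, 20, 40, 10
D_B1, D_B2, D_X = 10, 20, 60   # D~(A(0,m),B1), D~(A(0,m),B2), D~(A(m,i),X)

# B1 is the better prefix under (input length + reference length): 1/3 <= 1/2
prefix_ok = Fraction(D_B1, m + J1) <= Fraction(D_B2, m + J2)

# Same normalization applied to the full hypotheses B1+X and B2+X.
alpha = Fraction(D_B1 + D_X, i + J1 + x)   # 70/60 = 7/6
beta = Fraction(D_B2 + D_X, i + J2 + x)    # 80/70 = 8/7

# Input-length-only (asymmetric) normalization preserves the prefix ordering.
alpha_asym = Fraction(D_B1 + D_X, i)
beta_asym = Fraction(D_B2 + D_X, i)

print(prefix_ok, alpha, beta, alpha > beta, alpha_asym <= beta_asym)
# prints: True 7/6 8/7 True True
```

The ordering of the prefixes flips once a common suffix is added under length-dependent normalization. It is preserved when the divisor is the input length alone.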

In monosyllable recognition, it is desirable to increase the weight of the consonant part in order to emphasize it. Doing so naively, however, causes the problems described above.

The present invention eliminates this drawback. It is characterized by a way of assigning weights that enables matching with emphasis on the consonant part.

The above problem is solved by making the sum of the weights over the frames of a standard pattern the same for every standard pattern. Let $W^n(j)$ be the weight of the j-th frame of the n-th standard pattern, and determine $W^n(j)$ so that

$$\sum_{j=1}^{J^n} W^n(j) = \text{const.} \quad \text{for all } n.$$

The cumulative matching distance then depends only on the input pattern length and the number of monosyllables. When the number of monosyllables is specified, it depends only on the input pattern length, so two-stage DP matching can be used.
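The patent specifies only that $\sum_j W^n(j)$ be the same constant for every standard pattern. The sketch below assumes one simple way of satisfying this condition:

1. Treat the first few frames of each pattern as the consonant part.
2. Give those frames a fixed emphasis factor.
3. Rescale all weights to a common total.

The consonant-frame counts, the emphasis factor, the total, and the function name are hypothetical values, not taken from FIGS. 8 to 10.

```python
def normalized_weights(J, consonant_frames, emphasis, total):
    """Hypothetical frame weights W^n(1..J) for one standard pattern.

    The first `consonant_frames` frames are treated as the consonant part and
    weighted `emphasis` times more than the vowel frames; the weights are then
    rescaled so that sum_j W^n(j) == total, the same constant for every n.
    """
    raw = [emphasis if j < consonant_frames else 1.0 for j in range(J)]
    scale = total / sum(raw)
    return [w * scale for w in raw]

# Two hypothetical monosyllable patterns of different lengths
w_short = normalized_weights(J=12, consonant_frames=3, emphasis=3.0, total=20.0)
w_long = normalized_weights(J=25, consonant_frames=4, emphasis=3.0, total=20.0)
```

Both weight vectors sum to 20, even though the patterns differ in length and in the size of their consonant parts.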

FIGS. 8 and 9 each show an embodiment of a weighting method for the corresponding matching path.

Accordingly, for a matching path constraint such as that of FIG. 7(d), the weights can be assigned as shown in FIG. 10.

FIG. 11 shows an embodiment of the present invention. Its units are as follows:

- **Input terminal:** receives the speech signal.
- **Feature extraction unit:** composed of a filter bank or the like; converts the input speech signal into a sequence of feature vectors.
- **Monosyllable standard pattern storage unit:** holds the standard patterns of the monosyllables to be recognized, each registered in advance as feature vectors. The weights $W^n(j)$ described above are also registered here, for each monosyllable and each frame.
- **Inter-vector distance calculation unit:** for each frame i, computes and stores the distances $d^n(i, j)$ (n = 1, 2, ..., N; $j = 1, 2, \dots, J^n$) between the vectors $a_{i'}$ of the input pattern in the shaded region of FIG. 6 and the vectors $b_j^n$ of standard pattern n.

The city-block distance, for example, can be used as $d^n(i, j)$. With $a_i = (a_{i1}, a_{i2}, \dots, a_{iP})$ and $b_j^n = (b_{j1}^n, b_{j2}^n, \dots, b_{jP}^n)$, it is defined as

$$d^n(i,j) = \sum_{k=1}^{P} \bigl|\, a_{ik} - b_{jk}^n \,\bigr|.$$

The partial cumulative distance calculation unit takes two inputs: the outputs $d^n(i', j)$ (n = 1, 2, ..., N; $j = 1, 2, \dots, J^n$) of the inter-vector distance calculation unit, and the weighting coefficients $W^n(j)$ stored in the monosyllable standard pattern storage unit. From these it computes and stores $D^n(i' : i)$, the cumulative matching distance between standard pattern n and the partial input pattern from frame i' to frame i, for $i' = i - 2J^n + 1, \dots, i - \tfrac{1}{2}J^n$.

$D^n(i' : i)$ is computed by a backward recurrence starting from the initial value $D^n(i, J^n) = d^n(i, J^n)$, with the path constraint shown in FIG. 10. The resulting value $D^n(i', 1)$ is temporarily stored as $D^n(i' : i)$ in the cumulative distance calculation unit that follows.

In this recurrence, $D^n(i', j)$ is computed for $j = J^n, J^n - 1, \dots, 1$. For each j, i' ranges over the input frames that can correspond to the j-th frame of standard pattern n:

$$i - 2J^n + 2j \le i' \le i - \tfrac{1}{2}J^n + \tfrac{1}{2}j$$

The cumulative distance calculation unit assumes that frame i is the final frame. It computes the cumulative distance $D_x(i)$ from i = 1 when the final monosyllable is n, together with the back pointer $B_x(i)$ of monosyllable n, and stores them. That is, for x = 1, 2, ..., X it obtains $N_x(i) = \hat n$ and $B_x(i) = \hat i'$, where $\hat n$ and $\hat i'$ are the n and i' that attain the minimum of the cumulative-distance recurrence. Here X is the number of input monosyllables.

The values obtained in this way are stored as follows:

- the cumulative distance $D_x(i)$ in the cumulative distance storage unit;
- the back pointer $B_x(i)$ in the back pointer storage unit;
- the last monosyllable $N_x(i)$ in the last-monosyllable storage unit.

Values of $D_{x-1}(i')$ needed by the recurrence are read from the cumulative distance storage unit, where they were stored when previously obtained.

The speech interval detection unit determines the speech interval from the magnitude of the input signal and similar cues. When it detects that speech input has started, the frame counter begins counting frame by frame. The processing described above, up to the determination of the last monosyllable, is performed for the i-th frame, and it is the count of this frame counter that sets i. The same processing is therefore repeated each time the input advances by one frame. The frame counter starts counting when a speech interval is detected and is reset when the speech interval ends. As a result, the last-monosyllable storage unit and the back pointer storage unit hold N(i) and B(i) for i = 1, 2, ..., I.

The segmentation unit issues commands to the back pointer storage unit to read out particular back pointers. When the segmentation unit sends the value i to the back pointer storage unit, the back pointer B(i) is read out. When the segmentation unit receives the value B(i), it sends that same value back to the back pointer storage unit.

When the speech interval detection unit detects the end of speech input, the final count I of the frame counter is supplied to the segmentation unit, which first sends the value I to the back pointer storage unit. Thereafter, by the operation just described, the back pointer storage unit outputs in turn B(I), B(B(I)), B(B(B(I))), ..., 0. These are the final frames of the second-to-last monosyllable, the third-to-last, the fourth-to-last, and so on.

N(i) is the monosyllable ending at frame i. Giving these values directly to the last-monosyllable storage unit therefore yields the recognition result in reverse order, starting from the last monosyllable. To obtain the result in normal order, apply the reversal either to the output of the back pointer storage unit or to the output of the last-monosyllable storage unit.

FIG. 12 is a flowchart for the case where the functions of the above embodiment are realized in software.

Steps 100 to 103 perform initialization.

Steps 106 to 108 compute, for input frame i and monosyllable n, the inter-vector distances between the feature vectors of the standard pattern and those of the input pattern in the shaded region of FIG. 6. They correspond to the processing in the inter-vector distance calculation unit.

Steps 109 to 114 compute the partial cumulative distances $D^n(i' : i)$. They correspond to the processing in the partial cumulative distance calculation unit.

Steps 115 and 116 compute and store the cumulative distance $D_x(i)$, the last monosyllable $N_x(i)$, and the back pointer $B_x(i)$. They correspond to the processing in four units: the cumulative distance calculation unit, the cumulative distance storage unit, the last-monosyllable storage unit, and the back pointer storage unit.

Steps 117 to 120 obtain the final recognition result from the values $N_x(i)$ and $B_x(i)$ found as above for i = 1, 2, ..., I. They correspond to the processing performed among the back pointer storage unit, the segmentation unit, and the last-monosyllable storage unit.

As described above, the device of the present invention introduces weights into continuous monosyllable recognition based on the two-level recurrences. This makes matching with emphasis on the consonant part possible and greatly improves the recognition rate.

This embodiment has been described using monosyllable recognition as an example, but general word speech may also be handled. In particular, when the vocabulary contains words that are easily confused with one another, the effect can be improved by giving large weights to their distinguishing parts. The present invention is also applicable to the recognition of patterns other than speech.

[Brief Description of the Drawings]

- FIG. 1 is a block diagram of a conventional speech recognition device.
- FIG. 2 shows the correspondence between the feature vectors of patterns A and B.
- FIGS. 3(a) to 3(f) show examples of constraints for selecting lattice points on the i-j plane.
- FIGS. 4 and 5 are flowcharts of the procedure for determining segmentation and recognized words in continuous word recognition, when the number of input words is known and unknown, respectively.
- FIG. 6 is an explanatory diagram of the backward algorithm of the two-stage DP method.
- FIGS. 7(a) to 7(d) show examples of constraints for selecting lattice points on the i-j plane.
- FIGS. 8 to 10 show embodiments of weighting for matching paths.
- FIG. 11 is a block diagram of an embodiment of the present invention.
- FIG. 12 is a flowchart for realizing the functions of the embodiment device in software.

Main components: feature extraction unit; monosyllable standard pattern storage unit; inter-vector distance calculation unit; partial cumulative distance calculation unit; cumulative distance calculation unit; last-monosyllable storage unit; cumulative distance storage unit; back pointer storage unit; speech interval detection unit; frame counter; segmentation unit.

Agent: 森本 義弘 (Yoshihiro Morimoto)

Claims

[Claims]

1. A pattern comparison device comprising:
- feature extraction means for converting an input pattern into a sequence of feature vectors $a_1 a_2 \cdots a_I$;
- standard pattern storage means for storing standard patterns $B^n$, each consisting of a sequence of feature vectors $b_1^n b_2^n \cdots b_{J^n}^n$, together with weighting coefficients $W^n(1), \dots, W^n(J^n)$; and
- means for obtaining the distance between the input pattern and a standard pattern $B^n$ by minimization using dynamic programming, as a function of the feature vectors $a_1 a_2 \cdots a_I$ constituting the input pattern, the feature vectors $b_1^n b_2^n \cdots b_{J^n}^n$ constituting the standard pattern $B^n$, and the weighting coefficients $W^n(j)$.