JPS634299A

JPS634299A - Pattern matching

Info

Publication number: JPS634299A
Application number: JP61148580A
Authority: JP
Inventors: 隆夫渡辺
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-06-24
Filing date: 1986-06-24
Publication date: 1988-01-09
Also published as: JPH0577078B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

(Field of Industrial Application) The present invention relates to pattern matching technology, and in particular to an improvement of pattern matching in speech recognition that uses syllables as recognition units.

(Prior Art) Speech recognition based on basic phonetic units such as syllables is well suited to large-vocabulary recognition, because it does not require a reference pattern for every word to be recognized. Recognition then consists of two processes. Syllable recognition converts the input speech into a sequence of syllable candidates. Word recognition matches this syllable sequence against each word in a word dictionary written in syllables. Word recognition can be implemented as pattern matching between symbol sequences whose symbols are syllables. A representative method for such matching is the symbol-string matching method described in Joho Shori (Information Processing), Vol. 17, No. 7, pp. 650-658 (July 1976).

In this method, the distances between symbols are fixed in advance. When two symbol-string patterns are aligned in time, each pair of aligned symbols contributes its inter-symbol distance. Dynamic programming is then used to find the alignment that minimizes the distance accumulated over the whole pattern.

(Problems to Be Solved by the Invention) The distance between symbols can be read in two ways. It measures how close two syllables are. It can also be read as the likelihood that a syllable, instead of being recognized correctly, is observed as a different syllable. When the syllable candidate sequence contains omissions or insertions, one syllable is aligned with several syllables, as shown in Fig. 1. Because of the inherent properties of phonemes, syllable omissions and insertions tend to occur in particular syllable sequences.

Inter-syllable distances should therefore take into account phonological transformations that depend on context, that is, on the syllables before and after a given syllable.

Matching based on the inter-symbol distances described above assigns each inter-syllable distance independently of context. As a result, such context-dependent transformations are not taken into account.

Consider, for example, the transformation /kiyo/ ↔ /kyo/. This transformation is typical, yet the "distance" between the syllables /ki/ and /kyo/ is not small, and neither is the "distance" between /yo/ and /kyo/. The matching described above therefore cannot produce a small distance when the syllable string /kiyo/ is aligned with the syllable /kyo/.

The present invention describes phonological transformation rules as distances held in table form. This provides a means of matching with high accuracy even when syllables are omitted or inserted in a syllable candidate sequence or network. The object of the invention is thereby to realize a speech recognition device with high recognition accuracy.

(Constitution of the Invention) The present invention matches two patterns, each expressed as a sequence or network of syllables. It is characterized in that it uses a distance d(X_1; X_2, X_3) between a syllable X_1 and a syllable string X_2X_3, given in advance for each syllable combination (X_1, X_2, X_3).

The present invention is further characterized as follows. When two patterns, each expressed as a sequence or network of syllables, are matched, it uses a distance d between a syllable X_1 and a syllable string X_2X_3. This distance d is calculated from two kinds of inter-syllable distance, d^F(X_1, X_2) and d^B(X_1, X_3), given in advance for each syllable combination.

The present invention is also characterized as follows. When two patterns, each expressed as a sequence or network of syllables, are matched, it uses a distance d between a syllable X_1 and a syllable string X_2X_3. This distance d is calculated from three kinds of inter-syllable distance, d^F(X_1, X_2), d^B(X_1, X_3) and d^J(X_2, X_3), given in advance for each syllable combination.

(Operation) The operation of the present invention is explained below for the case where both patterns are expressed as syllable sequences.

Let the two patterns A and B be

$$A = (a(1), \ldots, a(i), \ldots, a(I)), \qquad B = (b(1), \ldots, b(j), \ldots, b(J))$$

where a(i) and b(j) are syllable symbols (specifically, syllable numbers). An inter-syllable distance d_0(X_1, X_2) is introduced in a manner similar to the method of the reference cited above. Matching of patterns A and B can then be carried out by dynamic programming as follows.

Initial conditions:

$$g(0,0) = 0, \qquad g(0,j) = \infty \quad (j = 1, \ldots, J), \qquad g(i,0) = \infty \quad (i = 1, \ldots, I)$$

Recurrence:

$$g(i,j) = d_0\bigl(a(i), b(j)\bigr) + \min\bigl\{\, g(i-1,j),\ g(i-1,j-1),\ g(i,j-1) \,\bigr\}, \qquad i = 1, \ldots, I,\ j = 1, \ldots, J \qquad (1)$$

The matching paths of recurrence (1) are shown in Fig. 2(A).
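Recurrence (1) is the classical DP alignment of two symbol strings, and the Python sketch below evaluates it directly. It is a sketch, not the patent's implementation. It represents syllables as romanized strings rather than the syllable numbers the text mentions, and `toy_d0` is a hypothetical stand-in for a real inter-syllable distance table.

```python
import math
from typing import Callable, Sequence


def dp_match(a: Sequence[str], b: Sequence[str],
             d0: Callable[[str, str], float]) -> float:
    """Return g(I, J) from recurrence (1) for symbol strings a and b."""
    I, J = len(a), len(b)
    # g[i][j] holds the cumulative distance; every boundary cell except g[0][0] is infinite.
    g = [[math.inf] * (J + 1) for _ in range(I + 1)]
    g[0][0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            # Best of the three predecessor cells (i-1, j), (i-1, j-1), (i, j-1).
            best = min(g[i - 1][j], g[i - 1][j - 1], g[i][j - 1])
            # a(i), b(j) are 1-based in the text, hence the -1 offsets.
            g[i][j] = d0(a[i - 1], b[j - 1]) + best
    return g[I][J]


def toy_d0(x: str, y: str) -> float:
    """Hypothetical context-free inter-syllable distance: 0 for a match, 1 otherwise."""
    return 0.0 if x == y else 1.0
```

With `toy_d0`, `dp_match(["ki", "yo"], ["kyo"], toy_d0)` returns 2.0 because both /ki/ and /yo/ are charged against /kyo/. This is the context-blindness illustrated by the /kiyo/ example.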

Next, define a distance d(X_1; X_2, X_3) between a syllable X_1 and a syllable string X_2X_3. This distance expresses the likelihood that syllable X_1 is observed replaced by the syllable string X_2X_3. The converse likelihood, that the syllable string X_2X_3 is observed replaced by syllable X_1, could be defined separately, but the following explanation treats the two as the same. For the newly defined distance d, a DP recurrence with matching paths as shown in Fig. 2(B) can be used.
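This copy of the publication does not preserve the exact form of recurrence (2) or the paths of Fig. 2(B), so the sketch below rests on an assumed path set. It allows three kinds of step:

- a 1:1 step charged with d_0;
- a 1:2 step in which a(i) corresponds to b(j-1)b(j);
- a 2:1 step in which a(i-1)a(i) corresponds to b(j).

The same d is reused in both directions, as the text permits. `toy_d0`, `toy_d` and their values are hypothetical.

```python
import math
from typing import Callable, Sequence

D0 = Callable[[str, str], float]
D3 = Callable[[str, str, str], float]  # d(X1; X2, X3)


def dp_match_context(a: Sequence[str], b: Sequence[str], d0: D0, d: D3) -> float:
    """Assumed form of a DP recurrence using d(X1; X2, X3); returns g(I, J)."""
    I, J = len(a), len(b)
    g = [[math.inf] * (J + 1) for _ in range(I + 1)]
    g[0][0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            # 1:1 correspondence a(i) <-> b(j), charged with d0.
            best = g[i - 1][j - 1] + d0(a[i - 1], b[j - 1])
            # 1:2 correspondence a(i) <-> b(j-1) b(j).
            if j >= 2:
                best = min(best, g[i - 1][j - 2] + d(a[i - 1], b[j - 2], b[j - 1]))
            # 2:1 correspondence a(i-1) a(i) <-> b(j), using the same d (assumed symmetric).
            if i >= 2:
                best = min(best, g[i - 2][j - 1] + d(b[j - 1], a[i - 2], a[i - 1]))
            g[i][j] = best
    return g[I][J]


def toy_d0(x: str, y: str) -> float:
    """Hypothetical context-free distance: 0 for a match, 1 otherwise."""
    return 0.0 if x == y else 1.0


# Hypothetical table: /kyo/ is often observed as /ki yo/, so that entry is cheap.
TOY_D = {("kyo", "ki", "yo"): 0.2}


def toy_d(x1: str, x2: str, x3: str) -> float:
    """Hypothetical d(X1; X2, X3) with a default penalty for unlisted combinations."""
    return TOY_D.get((x1, x2, x3), 1.5)
```

Here `dp_match_context(["ki", "yo"], ["kyo"], toy_d0, toy_d)` yields 0.2 through the single 2:1 step. Because only these three step types are allowed, strings whose lengths differ by more than a factor of two receive an infinite distance. A practical variant would need additional paths.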

With such a distance d, the matching distance between syllable sequences accounts for how easily syllables are transformed depending on their context.

This completes the description of the first aspect of the present invention. Because the distance d is a three-dimensional array, it requires a large amount of data. Taking the number of Japanese syllables as 100, it needs a storage capacity of 100^3 = 10^6 words. The second and third aspects of the present invention are intended to reduce this storage requirement.

In these aspects, the distance between syllable X_1 and syllable string X_2X_3 is described by one of the following approximations:

$$d(X_1; X_2, X_3) \approx d^F(X_1, X_2) + d^B(X_1, X_3) \qquad (3)$$

$$d(X_1; X_2, X_3) \approx d^F(X_1, X_2) + d^B(X_1, X_3) + d^J(X_2, X_3) \qquad (4)$$

In equation (3), d is the sum of two distances:

- d^F(X_1, X_2), between the first half of syllable X_1 and syllable X_2;
- d^B(X_1, X_3), between the second half of syllable X_1 and syllable X_3.

The storage requirement is then 100^2 × 2 = 2 × 10^4 words, a large reduction.

Equation (4) adds a third term, d^J(X_2, X_3), to equation (3). This term expresses, as a distance, the likelihood that the syllable string X_2X_3 is observed as a single syllable.
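Approximations (3) and (4) replace the three-dimensional table with two-dimensional ones. The sketch below assumes the tables are dense lists indexed by syllable number, and it also reproduces the storage arithmetic. The 3 × 10^4 figure for equation (4) is not stated in the text; it follows from the same count.

```python
from typing import Dict, List

Table = List[List[float]]  # 2-D table indexed by syllable numbers


def d_eq3(dF: Table, dB: Table, x1: int, x2: int, x3: int) -> float:
    """Approximation (3): first half of X1 vs X2, plus second half of X1 vs X3."""
    return dF[x1][x2] + dB[x1][x3]


def d_eq4(dF: Table, dB: Table, dJ: Table, x1: int, x2: int, x3: int) -> float:
    """Approximation (4): (3) plus dJ, the cost of X2 X3 being observed as one syllable."""
    return d_eq3(dF, dB, x1, x2, x3) + dJ[x2][x3]


def storage_words(n: int) -> Dict[str, int]:
    """Table sizes in words for an inventory of n syllables (d0 not counted)."""
    return {
        "full d (first aspect)": n ** 3,
        "eq. (3): dF, dB": 2 * n ** 2,
        "eq. (4): dF, dB, dJ": 3 * n ** 2,
    }
```

For n = 100, `storage_words` gives 10^6, 2 × 10^4 and 3 × 10^4 words respectively.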

The distances d_0 and d in equation (2), and d^F, d^B and d^J in equations (3) and (4), may be given in advance by any method. They can be derived from phonological transformation rules or estimated from real speech.

For example, the following methods can be considered.

(1) Deriving d(X_1; X_2, X_3) from phonological transformation rules.

Prepare a set of transformation rules such as [C]ya ↔ [C]iya, where [C] is any consonant. If the pair X_1 ↔ X_2X_3 is contained in this set, set d = 1.0; otherwise, set d = α (α > 1.0).
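Method (1) can be sketched as a lookup against instantiated rule templates. Several choices here are hypothetical and made only for illustration:

- the consonant list;
- the extension of the template from [C]ya to [C]yu and [C]yo;
- the romanization;
- the value α = 2.0.

```python
from typing import Callable, Iterable, Set, Tuple

Rule = Tuple[str, str, str]  # (X1, X2, X3): X1 may be observed as X2 X3


def build_rule_set(consonants: Iterable[str],
                   glides: Iterable[str] = ("ya", "yu", "yo")) -> Set[Rule]:
    """Instantiate templates of the form [C]ya <-> [C]i ya for every consonant C."""
    glides = tuple(glides)  # materialize so it can be iterated once per consonant
    rules: Set[Rule] = set()
    for c in consonants:
        for gl in glides:
            # e.g. C = "k", gl = "yo": "kyo" <-> "ki" + "yo"
            rules.add((c + gl, c + "i", gl))
    return rules


def rule_distance(rules: Set[Rule], alpha: float = 2.0) -> Callable[[str, str, str], float]:
    """d = 1.0 when (X1; X2, X3) matches a rule, otherwise the penalty alpha (assumed > 1.0)."""
    def d(x1: str, x2: str, x3: str) -> float:
        return 1.0 if (x1, x2, x3) in rules else alpha
    return d


# Hypothetical consonant subset.
RULES = build_rule_set(["k", "s", "t", "n", "h", "m", "r", "g", "b", "p"])
d = rule_distance(RULES)
```

For example, `d("kyo", "ki", "yo")` gives 1.0, while `d("kyo", "ka", "yo")` falls back to α.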

(2) Estimating distances from real speech.

Match speech patterns of real speech against each other, for example with the DP-based time-normalized matching method, and derive the distances from the resulting matching scores. Doing this for every syllable combination is inefficient. It is practical to do it only for combinations that, from a phonological standpoint, have some possibility of transformation.
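Method (2) can be sketched with a symmetric DP time-normalized matching between feature-vector sequences. The following elements are assumptions, since the text does not specify them:

- the frame representation;
- the Euclidean frame distance;
- the normalization by total length;
- the `recordings` dictionary;
- using the normalized score directly as the distance (the text does not say how the score is converted).

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

Frame = Sequence[float]


def dtw_score(u: Sequence[Frame], v: Sequence[Frame]) -> float:
    """Symmetric DP time-normalized matching, normalized by len(u) + len(v)."""
    n, m = len(u), len(v)
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = math.dist(u[i - 1], v[j - 1])  # Euclidean frame distance
            g[i][j] = min(g[i - 1][j] + c,          # step from (i-1, j)
                          g[i - 1][j - 1] + 2 * c,  # diagonal step, weighted 2
                          g[i][j - 1] + c)          # step from (i, j-1)
    return g[n][m] / (n + m)


def estimate_d(candidates: Iterable[Tuple[str, str, str]],
               recordings: Dict[str, Sequence[Frame]]) -> Dict[Tuple[str, str, str], float]:
    """Score only the phonologically plausible (X1, X2, X3) candidates."""
    table: Dict[Tuple[str, str, str], float] = {}
    for x1, x2, x3 in candidates:
        # Hypothetical keys: one recording of X1 and one of the string X2 X3.
        table[(x1, x2, x3)] = dtw_score(recordings[x1], recordings[x2 + x3])
    return table
```

Restricting `candidates` to rule-plausible triples, for instance the output of a rule set like that of method (1), keeps the number of matchings small.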

(Embodiment) Fig. 3 is a block diagram of an embodiment of the present invention.

- 1 and 2 are pattern buffers. They store the patterns A = (a(1), ..., a(I)) and B = (b(1), ..., b(J)), respectively.
- 3 is a recurrence calculation unit. It sequentially reads the symbols a(i) and b(j) from buffers 1 and 2 and evaluates the recurrence.
- 4 is a cumulative-quantity buffer. It holds the cumulative quantity g(i, j) produced by unit 3.
- 5 is a distance table memory. It receives symbols such as a(i) and b(j) as address inputs and outputs the distance values d_0 and d of equation (2).

The distance table memory stores the following contents, corresponding to the three aspects of the present invention:

(i) a three-dimensional array table d(X_1, X_2, X_3) and a two-dimensional array table d_0(X_1, X_2);
(ii) three two-dimensional array tables d^F(X_1, X_2), d^B(X_1, X_3) and d_0(X_1, X_2);
(iii) four two-dimensional array tables d^F(X_1, X_2), d^B(X_1, X_3), d^J(X_2, X_3) and d_0(X_1, X_2).

In case (i), recurrence calculation unit 3 evaluates equation (2). In cases (ii) and (iii), it evaluates equations (2') and (2''), obtained by substituting equations (3) and (4), respectively, into equation (2).

Finally, the cumulative quantity g(I, J) obtained at i = I, j = J is output as the matching distance.

The principle of the present invention has been explained above for patterns expressed as syllable sequences. It applies without modification to phonetic units other than syllables, such as:

- the phoneme;
- the phone, a unit finer than the phoneme;
- the diphone, a unit spanning from one phone to the next;
- the demi-syllable, a unit spanning from one syllable to the next.

The principle of the present invention also applies directly when a pattern is expressed not as a symbol sequence but as a network of symbols.

For example, let the two patterns A and B be

$$A = \bigl[\,(a(1), \ldots, a(I));\ (t(i', i)),\ i, i' = 1, \ldots, I\,\bigr], \qquad B = \bigl[\,(b(1), \ldots, b(J));\ (s(j', j)),\ j, j' = 1, \ldots, J\,\bigr]$$

Here I and J are the numbers of nodes in the respective networks. t(i', i) indicates whether a path exists from node i' to node i, and s(j', j) indicates whether a path exists from node j' to node j:

$$t(i', i) = \begin{cases} 1 & \text{if the path } i' \to i \text{ exists} \\ 0 & \text{otherwise} \end{cases} \qquad s(j', j) = \begin{cases} 1 & \text{if the path } j' \to j \text{ exists} \\ 0 & \text{otherwise} \end{cases}$$

In this case, the following recurrence can be used in place of recurrence (2).
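The network recurrence itself does not survive in this text, so the sketch below generalizes recurrence (1) as an assumption. Each predecessor cell (i-1, ·) or (·, j-1) is replaced by every node i' with t(i', i) = 1 or every node j' with s(j', j) = 1. The sketch also assumes three things: nodes are numbered topologically, index 0 is a virtual start node, and nodes I and J are the final nodes.

```python
import math
from typing import Callable, Sequence


def network_match(a: Sequence[str], pred_a: Sequence[Sequence[int]],
                  b: Sequence[str], pred_b: Sequence[Sequence[int]],
                  d0: Callable[[str, str], float]) -> float:
    """
    a[1..I], b[1..J]: node labels (index 0 is a placeholder for the virtual start node).
    pred_a[i]: all i' with t(i', i) = 1; pred_b[j]: all j' with s(j', j) = 1.
    """
    I, J = len(a) - 1, len(b) - 1
    g = [[math.inf] * (J + 1) for _ in range(I + 1)]
    g[0][0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            cands = [g[ip][j] for ip in pred_a[i]]                        # generalizes (i-1, j)
            cands += [g[ip][jp] for ip in pred_a[i] for jp in pred_b[j]]  # generalizes (i-1, j-1)
            cands += [g[i][jp] for jp in pred_b[j]]                       # generalizes (i, j-1)
            g[i][j] = d0(a[i], b[j]) + min(cands, default=math.inf)
    return g[I][J]


def toy_d0(x: str, y: str) -> float:
    """Hypothetical context-free distance: 0 for a match, 1 otherwise."""
    return 0.0 if x == y else 1.0
```

For two chain-shaped networks this reduces to recurrence (1). Consider a lattice A that offers the alternatives /kyo/ and /kya/ before /to/ (nodes ["", "kyo", "kya", "to"] with predecessors [[], [0], [0], [1, 2]]). Matching it against the chain B = /kyo to/ returns 0.0 via the /kyo/ branch.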

(Effects of the Invention) As described above, the present invention makes it possible to match at the level of syllable symbols while accounting for syllable transformations that depend on the surrounding syllables. It thus provides a means of high-accuracy matching, without word-level reference patterns, for realizing large-vocabulary speech recognition.

[Brief Description of the Drawings]

Fig. 1 shows an example of symbol-string matching. Figs. 2(A) and 2(B) illustrate the principle of the recurrences used in matching. Fig. 3 is a block diagram of an embodiment of the present invention. In the figures, 1 and 2 are pattern buffers, 3 is a cumulative-quantity buffer, 4 is a recurrence calculation unit, and 5 is a distance table memory.

Claims

[Claims]

(1) A pattern matching method for matching two patterns each expressed as a sequence or network of syllables, characterized in that it uses a distance d(X_1; X_2, X_3) between a syllable X_1 and a syllable string X_2X_3, given in advance for each syllable combination (X_1, X_2, X_3).

(2) A pattern matching method for matching two patterns each expressed as a sequence or network of syllables, characterized in that it uses a distance d between a syllable X_1 and a syllable string X_2X_3. The distance d is calculated from two kinds of inter-syllable distance, d^F(X_1, X_2) and d^B(X_1, X_3), given in advance for each syllable combination.

(3) A pattern matching method for matching two patterns each expressed as a sequence or network of syllables, characterized in that it uses a distance d between a syllable X_1 and a syllable string X_2X_3. The distance d is calculated from three kinds of inter-syllable distance, d^F(X_1, X_2), d^B(X_1, X_3) and d^J(X_2, X_3), given in advance for each syllable combination.