JPS6173199A

JPS6173199A - Voice preselection system for large-vocabulary word

Info

Publication number: JPS6173199A
Application number: JP59195621A
Authority: JP
Inventors: 沢井　秀文
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-09-18
Filing date: 1984-09-18
Publication date: 1986-04-15

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field

The present invention relates to a word preselection method for large-vocabulary spoken-word recognition.

Prior Art

A preselection method for large-vocabulary spoken words, in which the word set is clustered in advance into several to several tens of clusters based on the distances between word dictionary entries and the input is then matched against the central word of each cluster, is described in, for example, Arata Matsui, Shozo Sakano, and Kenji Kido, "A study of word preselection methods for large-vocabulary spoken-word recognition," Proceedings of the Acoustical Society of Japan, 1-1-2, pp. 3-[illegible]. With that method, however, the amount of computation falls by only 15 to 23% compared with direct recognition, and the recognition rate for 212 words is 86 to 95%, which is not sufficient.

Object

The present invention was made in view of the circumstances described above. Similar words are broadly classified into groups (called clusters) based on the inter-word distances obtained by DP matching, and the central word of each cluster is registered. DP matching is performed between an unknown input word and each of these central words to determine the cluster to which the input word belongs, and DP matching is then performed again only on the words within that cluster, thereby speeding up the recognition process.

Configuration

To achieve the above object, the present invention is characterized in that a speech recognition device that performs large-vocabulary spoken-word recognition has means for performing DP matching between dictionary words to calculate the distance between each pair of words and storing a table of the results in memory, and word clustering means for grouping similar dictionary words based on this inter-word distance table. A central word is registered for each cluster. When an unknown word is input, it is matched by DP matching against the previously registered central words of the clusters, the cluster in which the word to be recognized exists is preliminarily selected, and DP matching is then performed again only on the words within that cluster. The invention is described below with reference to embodiments.

FIG. 1 is a system configuration diagram for dictionary word registration. In the figure, 1 is the input terminal for the N dictionary word utterances to be registered, n_i = 1, 2, ..., N; 2 is a spectrum analysis unit; 3 is a DP matching unit that matches the dictionary words n_i against one another; 4 is an inter-word distance table creation unit; 5 is a word clustering unit; and 6 is a cluster central word registration unit. First, the N word utterances to be registered are input from input unit 1, frequency analysis is performed in spectrum analysis unit 2, and DP matching unit 3 performs DP matching among the N registration words. In practice, the distance between a word and itself is 0, and by the symmetry of the pairs only the unique combinations need to be matched, that is, (N^2 - N)/2 DP matchings. In this way, inter-word distance table creation unit 4 creates the inter-word distance table D(n_i, n_j) between registered words n_i and n_j (n_i, n_j = 1, 2, ..., N).
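The registration step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `dp_distance` and `build_distance_table`, the city-block local cost, the three-way step pattern, and the length normalisation are all assumptions, since the text does not specify the DP matching details.

```python
from itertools import combinations


def dp_distance(a, b):
    """DTW-style dynamic-programming distance between two feature sequences.

    Each sequence is a list of frames and each frame is a list of floats.
    Local cost and step pattern are assumptions, not taken from the patent.
    """
    inf = float("inf")
    rows, cols = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = sum(abs(x - y) for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    # Normalise by total length so long and short words are comparable.
    return cost[rows][cols] / (rows + cols)


def build_distance_table(words):
    """Return (table, number_of_dp_matchings) for a list of feature sequences."""
    n = len(words)
    table = [[0.0] * n for _ in range(n)]   # diagonal stays 0
    count = 0
    for i, j in combinations(range(n), 2):  # each unordered pair exactly once
        d = dp_distance(words[i], words[j])
        table[i][j] = table[j][i] = d       # fill both halves by symmetry
        count += 1
    return table, count
```

For N registered words the loop runs (N^2 - N)/2 times, which is the count given in the text; for N = 212, for example, that is 22,366 DP matchings at registration time.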

FIG. 2 shows an example of an inter-word distance table created as described above. The table is a symmetric matrix whose diagonal elements are 0. That is,

\[
D(n_i, n_j) =
\begin{cases}
0, & n_i = n_j \\
D(n_j, n_i) \neq 0, & n_i \neq n_j
\end{cases}
\qquad n_i, n_j = 1, 2, \ldots, N.
\]

Next, based on inter-word distance table 4 created as described above, clustering unit 5 performs word clustering, which groups similar words together. At this time, a central word for each cluster is registered in word registration unit 6.

FIG. 3 is a conceptual diagram of word clustering. In the figure, 7 is the set of large-vocabulary word utterances, each 7a is one of the word clusters, 7b is the central word of cluster 7a, and 7c is one of the dictionary words belonging to cluster 7a. The details of the clustering algorithm are described later.

FIG. 4 is a system configuration diagram for recognition of unknown input speech X. In the figure, 8 is the unknown speech input unit; 2 is the spectrum analysis unit; 9 is a DP matching unit that matches against the central word of each cluster registered in registration unit 6 of FIG. 1; 10 is a cluster selection unit; 11 is a DP matching unit that matches against the dictionary words in the selected cluster; 12 is a word identification unit; and 13 is a recognition result output unit. The unknown input speech X is frequency-analyzed in spectrum analysis unit 2, and DP matching unit 9 performs DP matching against the cluster central words registered in registration unit 6 of FIG. 1 (for example, 7b in FIG. 3). Cluster selection unit 10 then selects the cluster to which the central word with the shortest distance belongs (in FIG. 3, 7a is the selected cluster). Next, DP matching unit 11 performs DP matching between the unknown input speech X and the words belonging to the cluster chosen by cluster selection unit 10, word selection unit 12 determines the word with the minimum distance, and recognition result output unit 13 outputs it as the recognition result.
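The two-stage flow of FIG. 4 can be sketched as below. The function name `recognize`, the dictionary layout, and the toy templates are hypothetical; in the toy usage an absolute difference between numbers stands in for DP matching so that the example stays short.

```python
def recognize(x, centers, clusters, distance):
    """Two-stage recognition: pick a cluster first, then a word inside it.

    centers:  dict cluster_id -> template of that cluster's central word
    clusters: dict cluster_id -> dict word_label -> template
    distance: callable(x, template) -> float (DP matching in the patent)
    """
    # Stage 1 (units 9 and 10): the nearest central word selects the cluster.
    best_cluster = min(centers, key=lambda c: distance(x, centers[c]))
    # Stage 2 (units 11 and 12): match only against words in that cluster.
    members = clusters[best_cluster]
    best_word = min(members, key=lambda w: distance(x, members[w]))
    return best_cluster, best_word


# Toy usage: templates are plain numbers, distance is |x - template|.
toy_centers = {"A": 1.0, "B": 10.0}
toy_clusters = {
    "A": {"one": 0.5, "two": 1.0, "three": 1.8},
    "B": {"nine": 9.0, "ten": 10.0},
}
result = recognize(9.2, toy_centers, toy_clusters, lambda a, b: abs(a - b))
```

In this toy case the input is compared with 2 central words and then with the 2 words of cluster B, rather than with all 5 dictionary words.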

FIG. 5 is a general flowchart showing an example of a word clustering algorithm. In the figure, 14 is an initialization block, 15 is a word clustering unit, 16 is a unit that calculates the average inter-word distance, 17 is a unit that judges whether the clustering has become stationary, 18 is a central word registration unit, 19 is a unit that recalculates the central word within each cluster, and 20 is a unit that updates the inter-word distances and counts the number of iterations m.
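The text lists only the blocks of FIG. 5, so the following is one possible reading rather than the patent's algorithm: a k-medoids-style loop that assigns words to the nearest central word, tracks the average distance to the centers, recomputes each cluster's central word, and stops when the average no longer changes. All function names, the stopping tolerance `tol`, and the choice of the member with the smallest summed distance as the new center are assumptions.

```python
def assign(table, centers):
    """Block 15: put every word in the cluster of its nearest central word."""
    clusters = {c: [] for c in centers}
    for w in range(len(table)):
        clusters[min(centers, key=lambda c: table[w][c])].append(w)
    return clusters


def mean_distance(table, clusters):
    """Block 16: average distance from each word to its cluster center."""
    total = sum(table[w][c] for c, members in clusters.items() for w in members)
    return total / len(table)


def cluster_words(table, centers, max_iter=100, tol=1e-9):
    """Iterate until the average distance is stationary (block 17)."""
    centers = list(centers)                      # block 14: initial centers
    clusters = assign(table, centers)
    mean = mean_distance(table, clusters)
    m = 0                                        # block 20: iteration counter
    while m < max_iter:
        m += 1
        # Block 19: new center = member with the smallest summed distance.
        centers = [
            min(ms, key=lambda cand: sum(table[cand][w] for w in ms))
            for ms in clusters.values() if ms
        ]
        clusters = assign(table, centers)
        new_mean = mean_distance(table, clusters)
        if abs(mean - new_mean) <= tol:          # stationary: stop
            break
        mean = new_mean
    return centers, clusters, m                  # block 18: register centers
```

The function works directly on the precomputed inter-word distance table, so no further DP matching is needed while clustering.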

FIG. 6 is a flowchart showing another example of a word clustering algorithm. In initialization block 21, an arbitrary word n_1 in the large vocabulary is selected; in 22, the clustering-level counter is incremented; in 23, the word n_2 farthest from n_1 is selected; in 24, for every remaining word, the distances to n_i (i = 1, 2, ..., k) are obtained using table 4 shown in FIG. 2; and in 25, the word whose minimum distance obtained in 24 is the largest is taken as n_{k+1}. In 26, if the distance between n_{k+1} and n_i (i = 1, 2, ..., k) is smaller than a certain threshold, (1/2) D(n_i, n_e), the cluster central words n_k (k = 1, 2, ..., K) are registered in 27; otherwise, in 28 the clustering level k is raised by one and the process returns to 24.
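The procedure of FIG. 6 resembles a maximin (farthest-point) selection of centers, and a minimal sketch under that reading follows. The function names are hypothetical, and the fixed `threshold` argument is an assumption that stands in for the patent's threshold, whose exact definition is not clear from the text.

```python
def select_centers(table, threshold, first=0):
    """Maximin selection of cluster central words from a distance table."""
    n = len(table)
    if n < 2:
        return [first]
    centers = [first]                                  # block 21: arbitrary n_1
    # Block 23: n_2 is the word farthest from n_1.
    centers.append(max(range(n), key=lambda w: table[first][w]))
    while True:
        rest = [w for w in range(n) if w not in centers]
        if not rest:
            break
        # Block 24: distance from each remaining word to its nearest center.
        nearest = {w: min(table[w][c] for c in centers) for w in rest}
        # Block 25: the candidate n_{k+1} has the largest such minimum.
        candidate = max(rest, key=lambda w: nearest[w])
        # Block 26: stop when the candidate is already close to a center.
        if nearest[candidate] < threshold:
            break
        centers.append(candidate)                      # block 28: k <- k + 1
    return centers                                     # block 27: register


def assign_to_centers(table, centers):
    """Map every word index to the index of its nearest central word."""
    return {w: min(centers, key=lambda c: table[w][c]) for w in range(len(table))}
```

A smaller threshold yields more, tighter clusters; a larger one yields fewer, broader clusters.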

In the above embodiment, the clusters were not allowed to overlap when the words were clustered, but the clusters may overlap as shown in FIG. 7(a). Also, by arranging the clusters in a tree structure as shown in FIG. 7(b), the amount of DP matching computation needed to determine the candidate cluster can be reduced. In addition, a threshold can be set based on the inter-word distance to the center of the first-candidate cluster, and clusters within that threshold can be selected in addition to the first candidate, improving the preselection rate. Apart from setting a threshold, the clusters from the first candidate through the k-th candidate (k << K) can be selected. Finally, among the clusters within a given threshold, selecting only those up to the k-th candidate can also improve preselection performance.
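The three selection variants described above (by threshold, by rank, and both combined) can be sketched in one small function. The name `candidate_clusters` is hypothetical, and treating the threshold as an additive `margin` over the first candidate's distance is an assumption; the text only says the threshold is based on that distance.

```python
def candidate_clusters(center_distances, margin=None, top_k=None):
    """Return cluster ids to search, best first.

    center_distances: dict cluster_id -> distance from the input to that
                      cluster's central word
    margin: keep clusters within (best distance + margin), if given
    top_k:  keep at most the k best clusters, if given
    """
    ranked = sorted(center_distances, key=center_distances.get)
    if margin is not None:
        limit = center_distances[ranked[0]] + margin
        ranked = [c for c in ranked if center_distances[c] <= limit]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked


distances = {"A": 2.0, "B": 2.5, "C": 3.0, "D": 9.0}
by_threshold = candidate_clusters(distances, margin=1.0)
by_rank = candidate_clusters(distances, top_k=2)
combined = candidate_clusters(distances, margin=1.0, top_k=2)
```

Searching more than one cluster raises the chance that the correct word survives preselection, at the cost of more DP matchings in the second stage.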

Effect

As is clear from the above description, according to the present invention, the large-vocabulary words are broadly classified into word clusters based on the inter-word distances obtained by DP matching, and the word at the center of each cluster is registered. When unknown input speech is input, a candidate cluster is determined by DP matching against the cluster central words, after which DP matching is performed only against the words in that cluster. Large-vocabulary spoken words can therefore be recognized quickly and accurately.
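As a rough illustration of where the speed-up comes from, the number of DP matchings per input can be estimated under two assumptions that the text does not guarantee: the K clusters are of about equal size N/K, and they do not overlap.

```latex
C_{\text{direct}} = N, \qquad
C_{\text{preselect}} \approx K + \frac{N}{K}, \qquad
\min_{K} C_{\text{preselect}} \approx 2\sqrt{N} \quad \text{at } K = \sqrt{N}.
```

For example, with N = 212 words and K = 15 clusters this gives about 15 + 14 = 29 matchings per input instead of 212. Unequal or overlapping clusters, or searching several candidate clusters, would raise this figure.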

[Drawings: FIG. 2, FIG. 3, FIG. 5, FIG. 6, FIG. 7(b); images not reproduced]

Procedural Amendment

November 22, 1984 (Showa 59)
To: Manabu Shiga, Commissioner of the Patent Office

2. Title of the invention: Speech preselection method for large-vocabulary words
3. Person making the amendment
   Relationship to the case: Patent applicant
   Address: 1-3-6 Nakamagome, Ota-ku, Tokyo
   Name: (674) Ricoh Co., Ltd.
   Representative: Hiroshi Hamada
4. Agent
   Address: No. 807, Chatelet Inn Yokohama, 1-2-7 Furo-cho, Naka-ku, Yokohama 231
6. Subject of amendment
7. Contents of amendment

(1) "Arata Matsui, Shozo Sakano," on page 2, line 9 of the specification is corrected to "Arata Murai, Shozo Makino,".

(2) "in word selection unit 12" on page 6, lines 10 to 11 is corrected to "in word identification unit 12".

(3) "table 4" on page 7, line 8 is corrected to "table".

(4) "threshold 1/2 D(n_i, n_e)" on page 7, line 11 is corrected to [illegible].

(5) FIG. 5 and FIG. 6 are corrected as shown in the attached sheet.

Claims

A speech preselection method for large-vocabulary words, characterized in that a speech recognition device that performs large-vocabulary spoken-word recognition has means for performing DP matching between dictionary words to calculate the distance between each pair of words and storing a table of the results in memory, and word clustering means for grouping similar dictionary words based on this inter-word distance table; a central word is registered in each cluster; when an unknown word is input, it is matched by DP matching against the previously registered central words of the clusters so as to preliminarily select the cluster in which the word to be recognized exists; and DP matching is then performed anew only on the words within that cluster.