JP2004109906A

JP2004109906A - Text clustering method and speech recognizing method

Info

Publication number: JP2004109906A
Application number: JP2002275887A
Authority: JP
Inventors: Tomohiro Tani; 谷　智洋; Yoichi Yamashita; 山下　洋一; Tomoko Matsui; 松井　知子; Satoru Nakamura; 中村　哲
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2002-09-20
Filing date: 2002-09-20
Publication date: 2004-04-08

Abstract

PROBLEM TO BE SOLVED: To provide a text clustering method that clusters a text corpus according to word co-occurrence information.

SOLUTION: In a text clustering method that clusters a text corpus using the distance between texts as a measure, the distance between texts is calculated from the word co-occurrence information of the words contained in each text. More specifically, clustering is carried out by the k-means method using the relative distance between texts as a measure. The center of gravity of each cluster required by the k-means method is the text whose total distance to the other texts in the cluster is smallest.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
この発明は、テキストクラスタリング方法および音声認識方法に関する。
【０００２】
【従来の技術】
音声対話システムにおいて、対話の様々なシーンでユーザが自主的に発する質問や要求を受理できる仕組み（ユーザ指導型システム）は、ユーザフレンドリーなシステムを構築するにあたって不可欠な要素であるものの、シーン毎にユーザの発話内容が制限されるシステム主導の場合とは異なり、ユーザの発話内容のバリエーションが増大するため、音声認識性能が低下するという問題を抱えている。
【０００３】
しかし、対話のシーンやドメインが推定できれば、それに対応した認識系を用いることにより、認識性能の向上が見込まれる。このような試みとして、様々な発話内容に対応した言語モデルを別々の認識系に割り当て、それらの出力する複数の認識結果から尤度をもとに認識結果を選択する手法が報告されている（文献１参照）。
【０００４】
文献１：田熊，岩野，古井，”並列処理型計算機を用いた音声対話システムの検討”，人口知能学会研究会資料，ＳＩＧ−ＳＬＵＤ−Ａ２０１−０４，　ｐｐ．２１−２６，　２００２．
【０００５】
また、ユーザの内部状態も考慮して対話の状態を定義し、その同定を通じて最適な言語モデルを動的に割り当てる手法も提案されている（文献２参照）。
【０００６】
文献２：Ｙａｎｎｉｋ　Ｅｓｔｅｖｅ，　Ｆｒｅｄｅｒｉｃ　Ｂｅｃｈｅｔ，　Ａｌｅｘｉｓ　Ｎａｓｒ，　ａｎｄ　Ｒｅｎａｔｏ　Ｄｅ　Ｍｏｒｉ，　”Ｓｔｏｃｈａｓｔｉｃ　Ｆｉｎｉｔｅ　Ｓｔａｔｅ　Ａｕｔｏｍａｔａ　Ｌａｎｇｕａｇｅ　Ｍｏｄｅｌ　ｔｒｉｇｇｅｒｅｄ　ｂｙ　Ｄｉａｌｏｇｕｅ　Ｓｔａｔｅ，”Ｅｕｒｏｓｐｅｅｃｈ　２００１．
【０００７】
しかし、対話のシーンやドメインに対応した言語モデルを作成するには、学習用テキストに対話のシーンやドメインのタグを振る作業が必要となる。その作業を人手で行った場合、コストが膨大となる。また、そもそも対話のシーンやドメインを正確に規定するのは困難である。
【０００８】
そこで本発明者らは、対話のシーンやドメインを一発声中の各単語の共起情報から（概略的に）判別できるのではないかと考え、単語共起情報に基づいて、テキストコーパスを自動的にクラスタリングする手法について検討を行った。クラスタ毎に作成した言語モデルを割り当てた認識系を並列動作させ、複数の認識結果を得て、その内の尤度が最大になるものを最終結果として選択する。
【０００９】
従来、テキストコーパスの自動クラスタリング手法に関しては、エントロピーに基づくクラスタリング手法（以下、従来法という）が報告されている（文献３参照）。
【００１０】
文献３：清水，大野，樋口，”文のクラスタリングに基づく統計的言語モデル”，音響学会講演論文集，１−６−１４，　ｐｐ．３１−３２，　１９９８−３．
【００１１】
従来法（エントロピーに基づくクラスタリング手法）では、次のような手順によってクラスタリングが行われる。
【００１２】
１．全学習文を１クラスタとする。
【００１３】
２．全クラスタについて以下の処理を行う。
まず、クラスタに属するテキストからモデルを作成する。次に、このモデルを用いてクラスタ内の各テキストの単語当たりのエントロピーを計算する。次に、クラスタをコピーし、１方のクラスタからはエントロピーの低い文を、他方のクラスタからはエントロピーの高い文をそれぞれわずかづつ取り除く。このようにして、１つのクラスタから、含まれるテキストがわずかに異なる２つのクラスタを作成する。
【００１４】
３．上記クラスタを初期値として、準最適なクラスタを求める。具体的には、各クラスタ毎にクラスタに属するテキストからモデルを作成し、全ての学習文に対して、各クラスタ別モデルに対するエントロピーを求める。次に、各テキストを最もエントロピーが小さくなるクラスタに割り当てる。この作業を全学習文に対する平均エントロピーの値が収束する（すなわち、前回との差分が一定のしきい値以下になる）まで繰り返す。
【００１５】
４．クラスタ数が予め定められた値に達するまで、２，３を繰り返す。
【００１６】
【発明が解決しようとする課題】
この発明は、単語共起情報に基づいてテキストコーパスをクラスタリングするテキストクラスタリング方法を提供することを目的とする。
【００１７】
また、この発明は、単語共起情報に基づくクラスタリング結果を利用した音声認識方法を提供することを目的とする。
【００１８】
【課題を解決するための手段】
請求項１に記載の発明は、テキスト間の距離を尺度としてテキストコーパスをクラスタリングするテキストクラスタリング方法であって、テキスト間の距離を各テキストに含まれる各単語の単語共起情報に基づいて算出することを特徴とする。
【００１９】
請求項２に記載の発明は、テキスト中の各単語の単語共起情報を用いて、テキストコーパスをクラスタリングする第１ステップ、第１ステップのクラスタリング結果に基づいて各クラスタに対応した言語モデルを作成する第２ステップ、各クラスタ毎に作成した言語モデルを割り当てた認識系を並列動作させることにより、複数の認識結果を得る第３ステップ、および得られた認識結果のうちの尤度が最大となるものを最終認識結果として選択する第４ステップを備えていることを特徴とする。
【００２０】
請求項３に記載の発明は、テキスト中の各単語の単語共起情報を用いて、テキストコーパスをクラスタリングする第１ステップ、エントロピーに基づいて、テキストコーパスをクラスタリングする第２ステップ、第１ステップのクラスタリング結果に基づいて各クラスタに対応した言語モデルを作成する第３ステップ、第２ステップのクラスタリング結果に基づいて各クラスタに対応した言語モデルを作成する第４ステップ、第３ステップおよび第４ステップでそれぞれ作成された各言語モデルを割り当てた認識系を並列動作させることにより、複数の認識結果を得る第５ステップ、および得られた認識結果のうちの尤度が最大となるものを最終認識結果として選択する第６ステップを備えていることを特徴とする。
【００２１】
請求項４に記載の発明は、請求項２または請求項３に記載の音声認識装置において、第１ステップでは、テキスト間の相対距離を尺度としてテキストコーパスのクラスタリングが行われ、テキスト間の距離が各テキストに含まれる各単語の単語共起情報に基づいて算出されることを特徴とする。
【００２２】
【発明の実施の形態】
以下、図面を参照して、この発明の実施の形態について説明する。
【００２３】
〔１〕音声認識方法の説明
【００２４】
〔１．１〕第１の音声認識方法
図１は、第１の音声認識方法の手順を示している。
【００２５】
まず、テキスト中の各単語の単語共起情報を用いて、テキストコーパスをクラスタリングする（ステップ１）。この処理（単語共起情報に基づくクラスタリング処理）の詳細については後述する。次に、ステップ１のクラスタリング結果に基づいて各クラスタに対応した言語モデルを作成する（ステップ２）。たとえば、クラスタリングしたテキストセットを、標準的な学習セットに重み付けして足し合わせ、言語モデルを適応学習して各クラスタに対応した言語モデルを作成する。
【００２６】
この後、各クラスタ毎に作成した言語モデルを割り当てた認識系を並列動作させることにより、複数の認識結果を得る（ステップ３）。そして、得られた認識結果のうちの尤度が最大となるものを最終認識結果として選択する（ステップ４）。
【００２７】
〔１．２〕第２の音声認識方法
図２は、第２の音声認識方法の手順を示している。
【００２８】
まず、テキスト中の各単語の単語共起情報を用いて、テキストコーパスをクラスタリングする（ステップ１１）。この処理（単語共起情報に基づくクラスタリング処理）の詳細については後述する。エントロピーに基づいて、テキストコーパスをクラスタリングする（ステップ１２）。この処理（エントロピーに基づくクラスタリング処理）の詳細については、既に従来法として説明したので、その説明を省略する。
【００２９】
次に、ステップ１１のクラスタリング結果に基づいて各クラスタに対応した言語モデルを作成する（ステップ１３）。また、ステップ１２のクラスタリング結果に基づいて各クラスタに対応した言語モデルを作成する（ステップ１４）。
【００３０】
この後、ステップ１３およびステップ１４でそれぞれ作成された各言語モデルを割り当てた認識系を並列動作させることにより、複数の認識結果を得る（ステップ１５）。そして、得られた認識結果のうちの尤度が最大となるものを最終認識結果として選択する（ステップ１６）。
【００３１】
〔２〕単語共起情報に基づくクラスタリング処理の説明
単語共起情報に基づくクラスタリング処理では、２つのテキスト間の距離を各テキストに含まれる単語どうしの共起のしやすさで測る。以下、その手法について説明する。
【００３２】
単語Ｗ_ｐの単語共起ベクトルＣ_ｐを次式（１）のように定める。
【００３３】
【数１】

【００３４】
ただし、ａ_ｐ ^ｋは、全コーパス中の同一テキスト内において単語Ｗ_ｐが単語Ｗ_ｋと共起した回数である。言い換えれば、ａ_ｐ ^ｋは、全コーパスのうち、単語Ｗ_ｐと単語Ｗ_ｋとを含むテキストの数である。ただし、Ｋは対象とする単語の総数である。また、ｋ＝０，１，…Ｋ−１である。
【００３５】
さらに、単語Ｗ_ｐの単語共起ベクトルＣ_ｐと単語Ｗ_ｑの単語共起ベクトルＣ_ｑを使い、単語Ｗ_ｐとＷ_ｑの単語間距離ｄ（Ｗ_ｐ，Ｗ_ｑ）を次式（２）のように単語共起ベクトル間のユークリッド距離として定めた。
【００３６】
【数２】

【００３７】
テキストＬとテキストＭのテキスト間距離Ｓ_Ｌ−Ｍは、上記単語間距離を用い、次式（３）で与える。
【００３８】
【数３】

【００３９】
ただし、ｗ_{［Ｌ，ｊ］}はテキストＬの第ｉ番目の単語、同様にｗ_{［Ｍ，ｊ］}はテキストＭの第ｊ番目の単語を表す。ｎ_Ｌとｎ_Ｍはそれぞれ、テキストＬとテキストＭに含まれる単語数である。
【００４０】
テキストコーパスのクラスタリングは、テキスト間の相対距離Ｓ_Ｌ−Ｍを尺度としたｋ−ｍｅａｎｓ　法（文献４参照）により行う。ただし、ｋ−ｍｅａｎｓ　法で必要な各クラスタの重心は、各テキストについてクラスタ内の自分以外のテキストとの距離の総和を求め、その総和距離が最小となるテキストとする。
【００４１】
文献４：Ｊ．　ＭａｃＱｕｅｅｎ．　Ｓｏｍｅ　ｍｅｔｈｏｄ　ｆｏｒ　ｃｌａｓｓｉｆｉｃａｔｉｏｎ　ａｎｄ　ａｎａｌｙｓｉｓ　ｏｆ　ｍｕｌｔｉｖａｒｉａｔｅ　ｏｂｓｅｒｖａｔｉｏｎｓ．　ｖｏｌｕｍｅ　１　ｏｆ　Ｐｒｏｃｅｅｄｉｎｇｓ　ｏｆ　ｔｈｅ　Ｆｉｆｔｈ　Ｂｅｒｋｅｌｅｙ　Ｓｙｍｐｏｓｉｕｍ　ｏｎ　Ｍａｔｈｅｍａｔｉｃａｌ　ｓｔａｔｉｓｔｉｃｓ　ａｎｄ　ｐｒｏｂａｂｉｌｉｔｙ，　ｐａｇｅｓ　２８１−２９７，　Ｂｅｒｋｅｌｅｙ，　１９６７．　Ｕｎｉｖｅｒｓｉｔｙ　ｏｆ　Ｃａｌｉｆｏｒｎｉａ　Ｐｒｅｓｓ．
【００４２】
図３は、単語共起情報に基づくクラスタリング処理手順を示している。
【００４３】
ここでは、データ集合（テキストコーパス）をｘ個の集合にクラスタリングする場合の処理について説明する。
【００４４】
まず、前処理を行う（ステップ２１）。前処理では、次のようにして、テキストコーパスに含まれているテキストの全ての組み合わせについて、テキスト間距離を算出する。
【００４５】
つまり、まず、上記式（１）に基づいて、テキストコーパスに含まれている各単語毎に、単語共起ベクトルを求める。次に、上記式（２）に基づいて、テキストコーパスに含まれている単語の全ての組み合わせについて、単語間距離を求める。そして、上記式（３）に基づいて、テキストコーパスに含まれているテキストの全ての組み合わせについて、テキスト間距離を算出する。
【００４６】
前処理が行われると、ｘ個のクラスタの重心となるテキストを適当に決定する（ステップ２２）。残りの各テキストを最近隣距離の重心を持ったクラスタに割り当てる（ステップ２３）。
【００４７】
上記ステップ２３で得られた各クラスタの重心を求める（ステップ２４）。具体的には、各クラスタ毎に、そのクラスタに含まれる各テキストについて自分以外のテキストとの距離の総和を求め、その総和距離が最も小さいテキストを新たなクラスタの重心とする。
【００４８】
上記ステップ２４で得られた各クラスタの重心が、それぞれ前回の重心と同じであるか否かを判別する（ステップ２５）。上記ステップ２４で得られたｘ個のクラスタの重心が、それぞれ前回の重心と同じであれば、クラスタリング処理を終了する。そうでなければ、上記ステップ２３に戻る。
【００４９】
〔３〕対話音声認識実験
【００５０】
〔３．１〕実験条件
名詞と動詞３，７０８単語と日時・数詞（数字）の２シンボル、計３，７１０次元の単語共起ベクトルを作成し、これらの単語が共起する８３，２１１を対象にクラスタリングを行った。それらのテキスト群より複数の言語モデルを作成して認識実験を行った。
【００５１】
音声認識エンジンは、本研究で開発したＡＴＲＳＰＲＥＣを用いた。図４に示すように、複数の認識器（ベースラインとなる言語モデルとクラスタに対応した言語モデルとからなる）から得られた認識結果のうち、スコア（音響尤度と言語尤度の和）が一番大きいものを認識結果として選択する。ベースラインとなる言語モデルは、本研究所で収録した旅行対話タスクデータベース（１５８，４８８文）を用いて学習した多重クラス複合Ｎ−ｇｒａｍを用いた。
【００５２】
音響モデルには、窓長２０ｍｓｅｃ、フレームシフト１０ｍｓｅｃで抽出した２５次元の特徴ベクトル（１２次ＭＦＣＣ，１２　次ΔＭＦＣＣ，　Δｐｏｗｅｒ　）によりモデル化した男女別、５　混合ガウス分布の状態共有化ＨＭＭ（１４００状態）　を用いた。単語辞書サイズは２１，７５０単語である。評価データには学習セットと同一タスクの男女４２名、４７０発話データを用いた。
【００５３】
〔３．２〕結果
【００５４】
単語共起情報に基づくクラスタリング手法により学習コーパスを２，４，８，１６，３２個のクラスタに分割して、それら分割セット毎に言語モデルを作成し、認識実験を行った結果を表１に示す。比較のため、従来法による各言語モデルのエントロピーの総和が最小になるようにクラスタリングを行った場合（エントロピー基準）の結果についても示す。
【００５５】
【表１】

【００５６】
表１より、エントロピーに基づいてクラスタリングを行った場合、クラスタ数が増えるほど、認識率が改善される。単語共起情報に基づくクラスタリング手法においては、クラスタ数８において、最大の改善量が得られ、同一クラスタ数のエントロピー基準を用いた場合を上回った。
【００５７】
また、単語共起情報に基づくクラスタリング手法、エントロピーに基づくクラスタリング手法の両者を組み合わせて用いた場合の結果を表２に示す。
【００５８】
【表２】

【００５９】
両者の組み合わせでは、それぞれの言語モデルを全て使って、最尤基準により認識結果を選択した。二種類の言語モデルを合わせて用いることによりそれぞれ単独で用いた場合より認識率が改善された。これは、エントロピーに基づくクラスタリング手法は、各クラスタのＮグラムのエントロピーを基準に分割が行われるため、テキスト中で離れた位置にある単語間の関係を表現できないが、単語共起情報に基づくクラスタリング手法のように単語共起情報を利用すると、離れた単語どうしの関係も表現できることから、それら双方を組み合わせた場合、相補的な効果により性能が向上するからであると考えられる。
【００６０】
【発明の効果】
この発明によれば、単語共起情報に基づいてテキストコーパスをクラスタリングするテキストクラスタリング方法が実現する。
【００６１】
また、この発明によれば、単語共起情報に基づくクラスタリング結果を利用した音声認識方法が実現する。
【図面の簡単な説明】
【図１】第１の音声認識方法の手順を示すフローチャートである。
【図２】第２の音声認識方法の手順を示すフローチャートである。
【図３】単語共起情報に基づくクラスタリング処理手順を示すフローチャートである。
【図４】音声認識システムの構成例を示す模式図である。
[0001]
[Technical field of the invention]
The present invention relates to a text clustering method and a speech recognition method.
[0002]
[Prior art]
In a spoken dialogue system, a mechanism that can accept the questions and requests a user spontaneously utters in the various scenes of a dialogue (a user-initiative system) is indispensable for building a user-friendly system. However, unlike a system-initiative design, in which the content of the user's utterances is restricted in each scene, the variation in what the user may say increases, and speech recognition performance degrades as a result.
[0003]
However, if the scene or domain of the dialogue can be estimated, recognition performance can be expected to improve by using a recognition system matched to that scene or domain. As one such attempt, a method has been reported in which language models corresponding to various utterance contents are assigned to separate recognition systems, and a recognition result is selected, based on likelihood, from the multiple recognition results they output (see Reference 1).
[0004]
Reference 1: Takuma, Iwano, Furui, "Study of Spoken Dialogue System Using Parallel Processing Computer", Japanese Society for Artificial Intelligence SIG technical report, SIG-SLUD-A201-04, pp. 21-26, 2002.
[0005]
A method has also been proposed in which the state of a dialog is defined in consideration of the internal state of the user, and an optimal language model is dynamically assigned through identification thereof (see Reference 2).
[0006]
Reference 2: Yannik Esteve, Frederic Bechet, Alexis Nasr, and Renato De Mori, "Stochastic Finite State Automata Language Model triggered by Dialogue State," Eurospeech 2001.
[0007]
However, creating a language model for each dialogue scene or domain requires tagging the training text with dialogue scenes or domains. Doing this work by hand is enormously costly. Moreover, it is difficult to define dialogue scenes and domains precisely in the first place.
[0008]
The present inventors therefore considered that the scene or domain of a dialogue might be (roughly) determined from the co-occurrence information of the words in a single utterance, and examined a method of automatically clustering a text corpus based on word co-occurrence information. Recognition systems, each assigned a language model created for one cluster, are operated in parallel to obtain multiple recognition results, and the result with the maximum likelihood is selected as the final result.
[0009]
Conventionally, as for an automatic clustering method for a text corpus, a clustering method based on entropy (hereinafter, referred to as a conventional method) has been reported (see Reference 3).
[0010]
Reference 3: Shimizu, Ohno, Higuchi, "Statistical language model based on sentence clustering", Proc. of the Acoustical Society of Japan, 1-6-14, pp. 31-32, 1998-3.
[0011]
In the conventional method (a clustering method based on entropy), clustering is performed by the following procedure.
[0012]
1. All training sentences are placed in a single cluster.
[0013]
2. The following processing is performed for all clusters.
First, a model is created from the texts belonging to the cluster. Next, the per-word entropy of each text in the cluster is calculated using this model. The cluster is then copied, and a small number of low-entropy sentences are removed from one copy while a small number of high-entropy sentences are removed from the other. In this way, two clusters containing slightly different sets of texts are created from one cluster.
[0014]
3. With the above clusters as initial values, sub-optimal clusters are obtained. Specifically, a model is created for each cluster from the texts belonging to it, and the entropy under each cluster-specific model is computed for every training sentence. Each text is then assigned to the cluster that gives it the smallest entropy. This operation is repeated until the average entropy over all training sentences converges (that is, until the difference from the previous iteration falls to or below a fixed threshold).
[0015]
4. Steps 2 and 3 are repeated until the number of clusters reaches a predetermined value.
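The reassignment loop of step 3 can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which language model the conventional method trains per cluster, so the sketch substitutes an add-one smoothed unigram model, and the names `train_unigram`, `per_word_entropy`, `reassign`, `threshold` and `max_iter` are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary (assumed model type)."""
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def per_word_entropy(sentence, model):
    """Average negative log2 probability per word of one sentence."""
    return -sum(math.log2(model[w]) for w in sentence) / len(sentence)


def reassign(sentences, clusters, threshold=1e-4, max_iter=50):
    """Step 3: retrain per-cluster models and reassign every sentence to the
    lowest-entropy cluster until the average entropy stops changing."""
    vocab = {w for s in sentences for w in s}
    prev_avg = float("inf")
    avg = prev_avg
    for _ in range(max_iter):
        models = [train_unigram(c, vocab) for c in clusters]
        new_clusters = [[] for _ in clusters]
        total = 0.0
        for s in sentences:
            entropies = [per_word_entropy(s, m) for m in models]
            best = min(range(len(entropies)), key=entropies.__getitem__)
            new_clusters[best].append(s)
            total += entropies[best]
        avg = total / len(sentences)
        clusters = new_clusters
        if abs(prev_avg - avg) <= threshold:
            break
        prev_avg = avg
    return clusters, avg
```

The two initial clusters passed to `reassign` would be the slightly different copies produced in step 2; the splitting itself is not shown here.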
[0016]
[Problems to be solved by the invention]
An object of the present invention is to provide a text clustering method for clustering a text corpus based on word co-occurrence information.
[0017]
Another object of the present invention is to provide a speech recognition method using a clustering result based on word co-occurrence information.
[0018]
[Means for Solving the Problems]
The invention according to claim 1 is a text clustering method for clustering a text corpus using the distance between texts as a measure, characterized in that the distance between texts is calculated based on word co-occurrence information of each word included in each text.
[0019]
The invention according to claim 2 is characterized by comprising: a first step of clustering a text corpus using word co-occurrence information of each word in the texts; a second step of creating a language model corresponding to each cluster based on the clustering result of the first step; a third step of obtaining a plurality of recognition results by operating in parallel the recognition systems to which the language models created for the respective clusters are assigned; and a fourth step of selecting, as the final recognition result, the recognition result with the maximum likelihood among those obtained.
[0020]
The invention according to claim 3 is characterized by comprising: a first step of clustering a text corpus using word co-occurrence information of each word in the texts; a second step of clustering the text corpus based on entropy; a third step of creating a language model corresponding to each cluster based on the clustering result of the first step; a fourth step of creating a language model corresponding to each cluster based on the clustering result of the second step; a fifth step of obtaining a plurality of recognition results by operating in parallel the recognition systems to which the language models created in the third and fourth steps are assigned; and a sixth step of selecting, as the final recognition result, the recognition result with the maximum likelihood among those obtained.
[0021]
The invention according to claim 4 is characterized in that, in the speech recognition apparatus according to claim 2 or claim 3, the text corpus is clustered in the first step using the relative distance between texts as a measure, and the distance between texts is calculated based on word co-occurrence information of each word included in each text.
[0022]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0023]
[1] Description of speech recognition method
[0024]
[1.1] First speech recognition method
FIG. 1 shows the procedure of the first speech recognition method.
[0025]
First, the text corpus is clustered using the word co-occurrence information of each word in the texts (step 1). Details of this processing (clustering based on word co-occurrence information) are described later. Next, a language model corresponding to each cluster is created based on the clustering result of step 1 (step 2). For example, the text set of a cluster is weighted and added to a standard training set, and the language model is adapted on the combined data to create the language model corresponding to that cluster.
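The weighted addition mentioned for step 2 can be illustrated with a deliberately simplified sketch. The source gives neither the weight nor the model type, so the unigram counts, the `weight` value and the function names below are assumptions chosen only to show the idea of emphasizing cluster text over the standard set.

```python
from collections import Counter


def weighted_counts(base_sentences, cluster_sentences, weight=3.0):
    """Word counts of the standard set plus `weight` times the cluster's counts."""
    counts = Counter()
    for sentence in base_sentences:
        counts.update(sentence)          # each base occurrence counts once
    for sentence in cluster_sentences:
        for word in sentence:
            counts[word] += weight       # cluster occurrences are emphasized
    return counts


def unigram_probs(counts):
    """Relative frequencies from (possibly fractional) weighted counts."""
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}
```

A real system would apply the same weighting to N-gram counts and add smoothing; the sketch only shows how a larger weight shifts probability mass toward the vocabulary of one cluster.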
[0026]
Thereafter, a plurality of recognition results are obtained by operating the recognition systems to which the language models created for each cluster are assigned in parallel (step 3). Then, among the obtained recognition results, the one with the highest likelihood is selected as the final recognition result (step 4).
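Steps 3 and 4 amount to running every recognizer on the same utterance and keeping the highest-scoring hypothesis. A minimal sketch follows, assuming each recognizer is a callable that returns a `(text, score)` pair; that interface and the name `recognize_with_all_models` are hypothetical and do not come from the source.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_with_all_models(utterance, recognizers):
    """Run all recognizers on one utterance and return the best (text, score).

    Each recognizer is assumed to be a callable taking the utterance and
    returning a (text, score) pair, where a larger score means more likely.
    """
    with ThreadPoolExecutor() as pool:
        # Step 3: one hypothesis per cluster-specific language model.
        results = list(pool.map(lambda rec: rec(utterance), recognizers))
    # Step 4: keep the hypothesis with the maximum likelihood score.
    return max(results, key=lambda result: result[1])
```

The same selection applies unchanged when the pool of recognizers mixes language models from different clustering methods, since only the scores are compared.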
[0027]
[1.2] Second speech recognition method
FIG. 2 shows the procedure of the second speech recognition method.
[0028]
First, the text corpus is clustered using the word co-occurrence information of each word in the text (step 11). Details of this processing (clustering processing based on word co-occurrence information) will be described later. The text corpus is clustered based on the entropy (step 12). Details of this processing (clustering processing based on entropy) have already been described as a conventional method, and a description thereof will be omitted.
[0029]
Next, a language model corresponding to each cluster is created based on the clustering result of step 11 (step 13). Further, a language model corresponding to each cluster is created based on the clustering result of step 12 (step 14).
[0030]
Thereafter, a plurality of recognition results are obtained by operating the recognition systems to which the respective language models created in steps 13 and 14 are assigned in parallel (step 15). Then, the one having the maximum likelihood among the obtained recognition results is selected as the final recognition result (step 16).
[0031]
[2] Description of clustering process based on word co-occurrence information
In the clustering process based on word co-occurrence information, the distance between two texts is measured by how readily the words contained in each text co-occur with one another. The method is described below.
[0032]
The word co-occurrence vector C_p of a word W_p is defined by the following equation (1).
[0033]
(Equation 1)
$$C_p = \left(a_p^0,\ a_p^1,\ \ldots,\ a_p^{K-1}\right)$$

[0034]
Here, a_p^k is the number of times the word W_p co-occurs with the word W_k within the same text, counted over the entire corpus. In other words, a_p^k is the number of texts in the entire corpus that contain both the word W_p and the word W_k. K is the total number of target words, and k = 0, 1, ..., K-1.
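The counts a_p^k can be gathered in one pass over the corpus. The sketch below assumes each text is already tokenized into a list of words; the function name and the toy data are hypothetical. It also assumes that the diagonal entry a_p^p is the number of texts containing W_p, which follows from the "texts containing both words" reading but is not stated explicitly in the source.

```python
def cooccurrence_vectors(texts, vocab):
    """Return {word: [a_p^0, ..., a_p^(K-1)]} for every word in `vocab`.

    a_p^k is the number of texts that contain both vocab word p and vocab
    word k. The diagonal a_p^p (texts containing the word itself) is kept,
    which is an assumption of this sketch.
    """
    index = {word: k for k, word in enumerate(vocab)}
    vectors = {word: [0] * len(vocab) for word in vocab}
    for text in texts:
        # Each text contributes at most once per word pair.
        present = {word for word in text if word in index}
        for p in present:
            for q in present:
                vectors[p][index[q]] += 1
    return vectors


texts = [["hotel", "room", "reserve"], ["hotel", "room"], ["train", "ticket"]]
vocab = ["hotel", "room", "reserve", "train", "ticket"]
vectors = cooccurrence_vectors(texts, vocab)
```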
[0035]
Furthermore, using the word co-occurrence vector C_p of the word W_p and the word co-occurrence vector C_q of the word W_q, the inter-word distance d(W_p, W_q) between W_p and W_q is defined as the Euclidean distance between the two word co-occurrence vectors, as in the following equation (2).
[0036]
(Equation 2)
$$d(W_p, W_q) = \sqrt{\sum_{k=0}^{K-1} \left(a_p^k - a_q^k\right)^2}$$
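The Euclidean distance between two word co-occurrence vectors can be computed directly from its definition. The function name and the example vectors below are hypothetical.

```python
import math


def word_distance(c_p, c_q):
    """Euclidean distance between two co-occurrence vectors of equal length K."""
    if len(c_p) != len(c_q):
        raise ValueError("co-occurrence vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_p, c_q)))


# Two toy 5-dimensional vectors: words that appear in disjoint sets of texts
# end up far apart, and a word is at distance 0 from itself.
d_far = word_distance([2, 2, 1, 0, 0], [0, 0, 0, 1, 1])
d_same = word_distance([2, 2, 1, 0, 0], [2, 2, 1, 0, 0])
```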

[0037]
The inter-text distance S_{L-M} between a text L and a text M is given by the following equation (3) using the above inter-word distance.
[0038]
(Equation 3)

[0039]
Here, w_{[L, i]} represents the i-th word of the text L, and w_{[M, j]} similarly represents the j-th word of the text M. n_L and n_M are the numbers of words contained in the text L and the text M, respectively.
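Equation (3) itself is not reproduced in this text, so its exact form cannot be confirmed here. The sketch below shows one form that is consistent with the symbols defined above: it assumes that S_{L-M} is the mean of the inter-word distance over all n_L x n_M word pairs. That averaging, the function name and the toy distance table are assumptions, not the patent's formula.

```python
def text_distance(text_l, text_m, word_dist):
    """Assumed form of S_{L-M}: mean inter-word distance over all word pairs.

    `word_dist(w1, w2)` is any function returning the inter-word distance,
    for example the Euclidean distance between co-occurrence vectors.
    """
    n_l, n_m = len(text_l), len(text_m)
    if n_l == 0 or n_m == 0:
        raise ValueError("both texts must contain at least one word")
    total = sum(word_dist(w_l, w_m) for w_l in text_l for w_m in text_m)
    return total / (n_l * n_m)


# Toy symmetric distance table, hypothetical values for illustration only.
table = {("hotel", "room"): 1.0, ("hotel", "train"): 4.0, ("room", "train"): 3.0}


def toy_dist(w1, w2):
    if w1 == w2:
        return 0.0
    return table.get((w1, w2), table.get((w2, w1)))


s = text_distance(["hotel", "room"], ["train"], toy_dist)
```

Under this assumed form the distance of a text to itself is generally not zero, which is one reason the clustering procedure treats the cluster center as an actual text rather than a computed mean.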
[0040]
The text corpus is clustered by the k-means method (see Reference 4), using the relative distance S_{L-M} between texts as the measure. The center of gravity of each cluster, which the k-means method requires, is defined as follows: for each text in the cluster, the sum of its distances to the other texts in the cluster is computed, and the text with the smallest total distance is taken as the center of gravity.
[0041]
Reference 4: J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297, University of California Press, Berkeley, 1967.
[0042]
FIG. 3 shows a clustering processing procedure based on word co-occurrence information.
[0043]
Here, a process for clustering a data set (text corpus) into x sets will be described.
[0044]
First, preprocessing is performed (step 21). In the preprocessing, the inter-text distance is calculated for all combinations of texts included in the text corpus as follows.
[0045]
That is, a word co-occurrence vector is first determined for each word in the text corpus based on equation (1). Next, the inter-word distance is obtained for every pair of words in the text corpus based on equation (2). Finally, the inter-text distance is calculated for every pair of texts in the text corpus based on equation (3).
[0046]
After the preprocessing, the texts that serve as the centers of gravity of the x clusters are chosen in a suitable manner (step 22). Each of the remaining texts is assigned to the cluster whose center of gravity is nearest to it (step 23).
[0047]
The center of gravity of each cluster obtained in step 23 is then determined (step 24). Specifically, for each cluster, the sum of the distances from each text in the cluster to the other texts in the cluster is calculated, and the text with the smallest total distance becomes the new center of gravity of that cluster.
[0048]
It is determined whether or not the center of gravity of each cluster obtained in step 24 is the same as the previous center of gravity (step 25). If the centroids of the x clusters obtained in step 24 are the same as the previous centroids, the clustering process ends. Otherwise, the process returns to step 23.
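Steps 22 to 25 can be sketched as a loop over a precomputed distance matrix. The sketch assumes `dist[i][j]` already holds the inter-text distance from step 21 and that the initial centers are distinct text indices supplied by the caller; the function name, `max_iter`, and the tie-breaking (the first minimum wins) are choices made for this illustration.

```python
def cluster_texts(dist, initial_centers, max_iter=100):
    """Cluster texts 0..n-1 given an n x n matrix of inter-text distances.

    Returns (clusters, centers): lists of text indices per cluster, and the
    index of the text acting as each cluster's center of gravity.
    """
    n = len(dist)
    centers = list(initial_centers)            # step 22: initial centers
    clusters = []
    for _ in range(max_iter):
        # Step 23: a center stays in its own cluster; every remaining text
        # goes to the cluster whose center is nearest.
        clusters = [[] for _ in centers]
        for t in range(n):
            if t in centers:
                clusters[centers.index(t)].append(t)
            else:
                nearest = min(range(len(centers)),
                              key=lambda c: dist[t][centers[c]])
                clusters[nearest].append(t)
        # Step 24: the new center is the member with the smallest total
        # distance to the other members of its cluster.
        new_centers = [
            min(members,
                key=lambda t: sum(dist[t][u] for u in members if u != t))
            for members in clusters
        ]
        # Step 25: stop when no center has changed.
        if new_centers == centers:
            break
        centers = new_centers
    return clusters, centers


# Four texts whose pairwise distances mimic points at 0, 1, 10 and 11.
points = [0, 1, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
clusters, centers = cluster_texts(dist, initial_centers=[0, 1])
```

Because the center is always one of the texts, only the pairwise distances are needed and no averaging of texts is required, which is why the distances can be computed once in preprocessing.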
[0049]
[3] Dialogue speech recognition experiment
[0050]
[3.1] Experimental conditions
Word co-occurrence vectors with a total of 3,710 dimensions were created from 3,708 nouns and verbs plus two symbols for date/time and numerals (digits), and clustering was performed on the 83,211 texts in which these words co-occur. Multiple language models were created from the resulting text groups, and recognition experiments were performed.
[0051]
The speech recognition engine was ATRSPREC, developed in this research. As shown in FIG. 4, among the recognition results obtained from multiple recognizers (one using the baseline language model and the others using the language models corresponding to the clusters), the one with the highest score (the sum of the acoustic likelihood and the language likelihood) is selected as the recognition result. The baseline language model was a multi-class composite N-gram trained on the travel dialogue task database (158,488 sentences) recorded at our laboratory.
[0052]
For the acoustic model, gender-dependent state-tied HMMs (1,400 states) with 5-component Gaussian mixtures were used, modeled on 25-dimensional feature vectors (12 MFCCs, 12 ΔMFCCs, and Δpower) extracted with a window length of 20 msec and a frame shift of 10 msec. The word dictionary contains 21,750 words. The evaluation data consisted of 470 utterances by 42 male and female speakers on the same task as the training set.
[0053]
[3.2] Results
[0054]
The training corpus was divided into 2, 4, 8, 16, and 32 clusters by the clustering method based on word co-occurrence information, a language model was created for each of the resulting subsets, and recognition experiments were performed. The results are shown in Table 1. For comparison, Table 1 also shows the results when clustering is performed by the conventional method so that the sum of the entropies of the language models is minimized (entropy criterion).
[0055]
[Table 1]

[0056]
Table 1 shows that when clustering is based on entropy, the recognition rate improves as the number of clusters increases. With the clustering method based on word co-occurrence information, the largest improvement was obtained with 8 clusters, exceeding the result of the entropy criterion with the same number of clusters.
[0057]
Table 2 shows the results when both the clustering method based on word co-occurrence information and the clustering method based on entropy are used in combination.
[0058]
[Table 2]

[0059]
In the combination, all the language models from both methods were used, and the recognition result was selected by the maximum-likelihood criterion. Using the two types of language models together improved the recognition rate over using either type alone. A likely explanation is as follows. The entropy-based clustering method splits clusters according to the N-gram entropy of each cluster, so it cannot express relationships between words that are far apart in a text. Word co-occurrence information, as used in the co-occurrence-based clustering method, can express relationships between distant words. Combining the two therefore appears to improve performance through a complementary effect.
[0060]
[Effects of the invention]
According to the present invention, a text clustering method for clustering a text corpus based on word co-occurrence information is realized.
[0061]
Further, according to the present invention, a speech recognition method using a clustering result based on word co-occurrence information is realized.
[Brief description of the drawings]
FIG. 1 is a flowchart showing a procedure of a first speech recognition method.
FIG. 2 is a flowchart showing a procedure of a second speech recognition method.
FIG. 3 is a flowchart illustrating a clustering processing procedure based on word co-occurrence information.
FIG. 4 is a schematic diagram illustrating a configuration example of a speech recognition system.

Claims

1. A text clustering method for clustering a text corpus using the distance between texts as a measure, wherein the distance between texts is calculated based on word co-occurrence information of each word included in each text.

2. A speech recognition method comprising:
a first step of clustering a text corpus using word co-occurrence information of each word in the texts;
a second step of creating a language model corresponding to each cluster based on the clustering result of the first step;
a third step of obtaining a plurality of recognition results by operating in parallel the recognition systems to which the language models created for the respective clusters are assigned; and
a fourth step of selecting, as the final recognition result, the recognition result with the maximum likelihood among those obtained.

3. A speech recognition method comprising:
a first step of clustering a text corpus using word co-occurrence information of each word in the texts;
a second step of clustering the text corpus based on entropy;
a third step of creating a language model corresponding to each cluster based on the clustering result of the first step;
a fourth step of creating a language model corresponding to each cluster based on the clustering result of the second step;
a fifth step of obtaining a plurality of recognition results by operating in parallel the recognition systems to which the language models created in the third step and the fourth step are assigned; and
a sixth step of selecting, as the final recognition result, the recognition result with the maximum likelihood among those obtained.

4. The speech recognition method according to claim 2 or claim 3, wherein in the first step, the text corpus is clustered using the relative distance between texts as a measure, and the distance between texts is calculated based on word co-occurrence information of each word included in each text.