JP2000356997A

JP2000356997A - Statistical language model generator and voice recognition device

Info

Publication number: JP2000356997A
Application number: JP11168188A
Authority: JP
Inventors: Hiroshi Yamamoto; 博史山本; Yoshinori Kosaka; 芳典匂坂
Original assignee: ATR Interpreting Telecommunications Research Laboratories
Current assignee: ATR Interpreting Telecommunications Research Laboratories
Priority date: 1999-06-15
Filing date: 1999-06-15
Publication date: 2000-12-26

Abstract

PROBLEM TO BE SOLVED: To provide a statistical language model generator that generates a statistical language model with superior prediction precision, reliability and robustness by executing prescribed normalizing and smoothing processes to generate a forward N-gram statistical language model. SOLUTION: When a switch SW is switched to the 'a' side, a statistical language model memory 21 is connected to a word hypothesis narrowing section 6, and the section 6 executes a word hypothesis narrowing process while referring to the forward N-gram statistical language model in the memory 21 generated by a language model generating section 20. When the switch SW is switched to the 'b' side, a statistical language model memory 31 is connected to the section 6, and the section 6 executes the word hypothesis narrowing process while referring to the fused N-gram statistical language model in the memory 31 generated by a language model generating section 30. Voice recognition is thereby performed at an improved recognition rate.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a statistical language model generating apparatus that generates a statistical language model from training text data, and to a speech recognition apparatus that uses the statistical language model to recognize the speech signal of an input uttered speech sentence.

[0002]

2. Description of the Related Art In recent years, methods that use a language model to improve the performance of continuous speech recognition apparatuses have been studied. The aim is to raise the recognition rate and cut computation time by using the language model to predict the next word and thereby reduce the search space. The N-gram is a language model that has been widely used recently. It is trained on large-scale text data and statistically gives the transition probability from the immediately preceding N-1 words to the next word. The generation probability $P(w_1^L)$ of a word string $w_1^L = w_1, w_2, \ldots, w_L$ consisting of L words is expressed by the following equation.

[0003]

(Equation 1)
$$P(w_1^L) = \prod_{t=1}^{L} P(w_t \mid w_{t+1-N}^{t-1})$$

[0004] Here, $w_t$ denotes the t-th word of the word string $w_1^L$, and $w_i^j$ denotes the word string from the i-th word to the j-th word. In Equation 1, the probability $P(w_t \mid w_{t+1-N}^{t-1})$ is the probability that the word $w_t$ is uttered after the word string $w_{t+1-N}^{t-1}$, which consists of N-1 words, has been uttered. Likewise, in what follows, the probability $P(A \mid B)$ means the probability that the word A is uttered after the word or word string B has been uttered. The symbol $\prod$ in Equation 1 denotes the product of the probabilities $P(w_t \mid w_{t+1-N}^{t-1})$ from t = 1 to L, and the same applies below.
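The product over conditional probabilities described above can be sketched in Python as follows. This is only an illustration: the function names, the toy 2-gram table, and the choice to shorten the history at the start of a sentence are assumptions made for the example and do not come from the specification.

```python
from math import prod


def sentence_probability(words, cond_prob, n):
    """Return P(w_1^L) as the product over t of P(w_t | previous N-1 words)."""
    factors = []
    for t, word in enumerate(words):
        # History w_{t+1-N}^{t-1}; it is shorter near the start of the sentence
        # (an assumption of this sketch).
        history = tuple(words[max(0, t - (n - 1)):t])
        factors.append(cond_prob(word, history))
    return prod(factors)


# Hypothetical toy 2-gram table: (word, history) -> probability.
TOY_BIGRAM = {
    ("i", ()): 0.5,
    ("like", ("i",)): 0.4,
    ("tea", ("like",)): 0.25,
}


def toy_cond_prob(word, history):
    """Look up a probability in the toy table; unseen events get 0.0."""
    return TOY_BIGRAM.get((word, history), 0.0)


p = sentence_probability(["i", "like", "tea"], toy_cond_prob, n=2)
```

Unseen events receive probability zero in this toy table, which is the sparse-data weakness that the smoothing discussed later in the description is meant to address.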

[0005] In large-vocabulary continuous speech recognition, N-gram statistical language models such as the word N-gram described above are widely used, but such a model needs to satisfy the following four requirements sufficiently.
(1) Accuracy of next-word prediction
(2) Reliability for sparse data (data with little training material)
(3) Compact model size
(4) Robustness against a shift in task (or domain; that is, the job, such as the situation in which speech recognition is to be performed)

[0006]

[Problems to be Solved by the Invention] The word N-gram described above performs excellently in "accuracy of next-word prediction", but remains unsatisfactory in "reliability for sparse data" and "robustness against task shift". The N-gram based on part-of-speech class information (hereinafter, the part-of-speech class N-gram), on the other hand, is considerably inferior to the word N-gram in "accuracy of next-word prediction" but is superior on the other three points. A statistical language model obtained by classifying words through automatic clustering as the class N-gram (hereinafter, the automatic class 2-gram) has also been proposed (see, for example, Prior Art Document 1: Hirokazu Masataki et al., "Task adaptation of N-gram language models by maximum a posteriori probability estimation", IEICE Transactions, Vol. J81-D-II, pp. 2519-2525, November 1998). It performs well in "accuracy of next-word prediction", "reliability for sparse data" and "compact model size", but because the classification itself depends on the task, it is inferior in "robustness against task shift". Table 1 summarizes how well these models satisfy the four requirements.

[0007]

[Table 1] Comparison of the basic performance of various language models
---------------------------------------------------------------------------------------
Model                        | Prediction accuracy | Reliability | Model size | Robustness
---------------------------------------------------------------------------------------
Word 2-gram                  | ◎                   | ×           | ×          | ×
Part-of-speech class 2-gram  | ×                   | ◎           | ◎          | ◎
Automatic class 2-gram       | ◎                   | ◎           | ◎          | ×
---------------------------------------------------------------------------------------

[0008] A first object of the present invention is to solve the above problems and to provide a statistical language model generating apparatus capable of generating a statistical language model that, although its model size cannot be reduced, has excellent performance in prediction accuracy, reliability and robustness, and a speech recognition apparatus using the same.

[0009] A second object of the present invention is to solve the above problems and to provide a statistical language model generating apparatus capable of generating a statistical language model that has excellent performance in prediction accuracy, reliability, model size and robustness, and a speech recognition apparatus using the same.

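[0010]

[Means for Solving the Problems] A statistical language model generating apparatus according to claim 1 of the present invention comprises: first generating means for generating a statistical language model of a forward word N-gram by calculating, based on training text data transcribed from uttered speech sentences of predetermined speakers, a first frequency probability that a word to be processed occurs after a word string consisting of a plurality of words; second generating means for generating a statistical language model of a forward part-of-speech class-word N-gram by calculating, based on the training text data and part-of-speech class information data containing part-of-speech class information, a second frequency probability that the word to be processed occurs after the part-of-speech class of a first word and a word string consisting of a plurality of words connected after the first word; first calculating means for calculating, by the maximum a posteriori probability estimation method, a forward transition probability interpolated between the first frequency probability and the second frequency probability, using the statistical language model of the forward part-of-speech class-word N-gram generated by the second generating means as prior knowledge and the statistical language model of the forward word N-gram generated by the first generating means as posterior knowledge; and first processing means for generating a statistical language model of a forward N-gram by executing predetermined normalization processing and smoothing processing on the forward transition probability calculated by the first calculating means.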
A first part-of-speech class of the first word and the first part-of-speech class based on the learning text data and the part-of-speech class information data including the part-of-speech class information; A statistical language model of a forward-looking part-of-speech class-word N-gram is generated by calculating a second frequency probability that a word to be processed occurs after a word string including a plurality of words connected after the word. A second generation unit, and a forward-looking word generated by the first generation unit using a forward-looking part of speech class-word N-gram statistical language model generated by the second generation unit as prior knowledge; Forward transition formed by interpolating between the first frequency probability and the second frequency probability by the maximum posterior probability estimation method using the N-gram statistical language model as the posterior knowledge A first calculating means for calculating the rate, and a forward N-gram statistic by executing a predetermined normalizing process and a smoothing process on the forward transition probability calculated by the first calculating device. And first processing means for generating a dynamic language model.

[0011] A statistical language model generating apparatus according to claim 2 is the statistical language model generating apparatus according to claim 1, further comprising: third generating means for generating a statistical language model of a backward word N-gram by calculating, based on the training text data, a third frequency probability that a word string consisting of a plurality of words connected in front of the word to be processed occurs; fourth generating means for generating a statistical language model of a backward part-of-speech class-word N-gram by calculating, based on the training text data and the part-of-speech class information data, a fourth frequency probability that a word string consisting of a plurality of words connected in front of the part-of-speech class of the word to be processed occurs; second calculating means for calculating, by the maximum a posteriori probability estimation method, a backward transition probability interpolated between the third frequency probability and the fourth frequency probability, using the statistical language model of the backward part-of-speech class-word N-gram generated by the fourth generating means as prior knowledge and the statistical language model of the backward word N-gram generated by the third generating means as posterior knowledge; and second processing means for generating a statistical language model of a backward word N-gram by executing predetermined normalization processing and smoothing processing based on the backward transition probability calculated by the second calculating means.

[0012] A statistical language model generating apparatus according to claim 3 of the present invention comprises: first clustering means for, based on the transition probabilities of a forward N-gram statistical language model containing forward transition probabilities obtained by interpolating between a first frequency probability that a word to be processed occurs after a word string consisting of a plurality of words and a second frequency probability that the word to be processed occurs after the part-of-speech class of a first word and a word string consisting of a plurality of words connected after the first word, assigning the transition probabilities of the forward N-gram statistical language model as feature quantities to each word string connected before the word to be processed, performing clustering so that the variation of the feature quantities of each class does not become small, and generating class classification information after clustering; second clustering means for, based on the transition probabilities of a backward N-gram statistical language model containing backward transition probabilities obtained by interpolating between a third frequency probability that a word string consisting of a plurality of words connected in front of the word to be processed occurs and a fourth frequency probability that a word string consisting of a plurality of words connected in front of the part-of-speech class of the word to be processed occurs, assigning the transition probabilities of the backward N-gram statistical language model as feature quantities to each word to be processed, performing clustering so that the variation of the feature quantities of each class does not become small, and generating class classification information after clustering; and fifth generating means for generating a statistical language model of a fused N-gram by calculating, based on training text data transcribed from uttered speech sentences of predetermined speakers and taking the class classification information generated by the first clustering means and the second clustering means as the processing target, the frequency probability from the class of the word string preceding the word to be processed to the class of the word to be processed.

[0013] A statistical language model generating apparatus according to claim 4 is the statistical language model generating apparatus according to claim 3, wherein the interpolated forward transition probability is the transition probability of the forward N-gram statistical language model generated by the first processing means, the interpolated backward transition probability is the transition probability of the backward N-gram statistical language model generated by the second processing means, and the training text data used by the fifth generating means is the training text data used by the first to fourth generating means.

[0014] Further, a speech recognition apparatus according to claim 5 of the present invention comprises speech recognition means for recognizing speech, based on the speech signal of an input uttered speech sentence, using a predetermined statistical language model, wherein the speech recognition means recognizes speech using the forward N-gram statistical language model generated by the first processing means according to claim 1, using the backward N-gram statistical language model generated by the second processing means according to claim 2, or using the fused N-gram statistical language model generated by the fifth generating means according to claim 3 or 4.

[0015]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention will be described below with reference to the drawings.

[0016] FIG. 1 is a block diagram of a continuous speech recognition apparatus according to one embodiment of the present invention. The continuous speech recognition apparatus of this embodiment comprises a language model generating unit 20 (first embodiment), which generates a forward N-gram statistical language model and a backward N-gram statistical language model by executing the first language model generation process of FIG. 4, and a language model generating unit 30 (second embodiment), which generates a fused N-gram statistical language model combining those two statistical language models by executing the language model generation process of FIG. 7 based on the two generated models. The embodiment in which the switch SW is set to contact a and the word hypothesis narrowing unit 6 performs word hypothesis narrowing using the forward N-gram in the statistical language model memory 21 is called the first embodiment. The embodiment in which the switch SW is set to contact b and the word hypothesis narrowing unit 6 performs word hypothesis narrowing using the fused N-gram in the statistical language model memory 31 is called the second embodiment.

[0017] This embodiment discloses a method of generating a new N-gram model that satisfies the four requirements simultaneously by letting the word N-gram, the part-of-speech class N-gram and the automatic class N-gram compensate for one another's weaknesses. In the first embodiment, part-of-speech class information is first added to the word N-gram by the known maximum a posteriori probability estimation method (hereinafter, the MAP estimation method), producing a statistical language model that retains "accuracy of next-word prediction" while gaining "reliability for sparse data" and "robustness against task shift". Next, the inter-word transition probabilities obtained from this statistical language model are regarded as feature quantities of the words, and classes are formed by automatic clustering based on them. This adds "compact model size" and yields a fused N-gram statistical language model that satisfies all four requirements listed above simultaneously.

[0018] First, the continuous interpolation of part-of-speech class information and word information is described. Consider using part-of-speech class information to compensate for the weaknesses of the word N-gram, namely "reliability for sparse data" and "robustness against task shift". The reliability of a word N-gram depends on the occurrence frequency of the immediately preceding string of N-1 words used for estimation. If word strings with few occurrences are supplemented with part-of-speech class information, reliability can be expected to improve without loss of accuracy, and the robustness against task shift of the part-of-speech class N-gram can be added at the same time. The following methods are conceivable for realizing this:
(a) switching between the word N-gram and the part-of-speech class N-gram by a threshold on the number of occurrences of the word string;
(b) using a linear combination of the two; and
(c) using them in back-off smoothing.
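For comparison with the continuous interpolation adopted later, methods (a) and (b) above can be sketched in Python as follows. The function names, the threshold and the interpolation weight are hypothetical values chosen for the example; the specification does not give them.

```python
def switch_by_threshold(p_word, p_class, history_count, threshold):
    """Method (a): use the word N-gram estimate only when the preceding word
    string was observed at least `threshold` times; otherwise fall back to
    the part-of-speech class N-gram estimate."""
    return p_word if history_count >= threshold else p_class


def linear_combination(p_word, p_class, weight):
    """Method (b): fixed-weight linear combination of the two estimates."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_word + (1.0 - weight) * p_class


# Hypothetical numbers: a rarely seen history falls back to the class estimate.
rare = switch_by_threshold(p_word=0.2, p_class=0.05, history_count=1, threshold=5)
mixed = linear_combination(p_word=0.2, p_class=0.1, weight=0.5)
```

Method (a) changes abruptly at the threshold and method (b) ignores how often the history was seen, which is the motivation for interpolating continuously according to the occurrence frequency.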

[0019] In the inventors' view, as shown in FIG. 3, greater accuracy and reliability can be expected from continuously interpolating, between the part-of-speech class N-gram and the word N-gram, the probability estimate based on the immediately preceding word string and the probability estimate based on the immediately preceding part-of-speech class string, according to the occurrence frequency of the immediately preceding word string. For this purpose, consider an N-gram from a part-of-speech class and a word string to a word (hereinafter, the part-of-speech class-word N-gram). When the word string $(w_1, w_2, \ldots, w_{n-1}, w_n)$ is processed, the part-of-speech class-word N-gram considers the transition probability from the part-of-speech class and word string $(c_1, w_2, \ldots, w_{n-1})$ to the word $w_n$, and its value is expressed by the following expression. Here, $c_1$ is the part-of-speech class of the word $w_1$.

[0020]

(Equation 2)
$$p(w_n \mid c_1, w_2, \ldots, w_{n-1})$$
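As an illustration of the quantity in Equation 2, the following Python sketch estimates the part-of-speech class-word N-gram by relative frequency, replacing only the first word of each N-word window with its part-of-speech class. The function name, the toy corpus and the word-to-class table are assumptions made for the example.

```python
from collections import Counter


def class_word_ngram(sentences, word_class, n):
    """Relative-frequency estimate of p(w_n | c_1, w_2, ..., w_{n-1}).

    Keys of the returned dict are (c_1, w_2, ..., w_{n-1}, w_n).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    context_counts = Counter()
    full_counts = Counter()
    for words in sentences:
        for end in range(n - 1, len(words)):
            first = words[end - n + 1]
            # Context: class of the first word, then the remaining words as-is.
            context = (word_class[first],) + tuple(words[end - n + 2:end])
            context_counts[context] += 1
            full_counts[context + (words[end],)] += 1
    return {key: count / context_counts[key[:-1]]
            for key, count in full_counts.items()}


# Hypothetical toy data.
CORPUS = [["a", "cat", "runs"], ["the", "dog", "runs"], ["the", "cat", "sleeps"]]
CLASSES = {"a": "DET", "the": "DET", "cat": "N", "dog": "N",
           "runs": "V", "sleeps": "V"}

probs = class_word_ngram(CORPUS, CLASSES, n=3)
```

Because "a" and "the" share the class DET in this toy table, the contexts "a cat" and "the cat" pool their counts, which is how class information supports histories that are rare as word strings.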

[0021] This part-of-speech class-word N-gram is taken as the prior knowledge and the word N-gram as the posterior knowledge, and the known MAP estimation method is used. To use the MAP estimation method, it is necessary to know in advance what distribution the value of the word N-gram, which is the posterior event, follows. This distribution, that is, the prior distribution, is assumed to follow the beta distribution given by the following expression.

[0022]

(Equation 3)
$$\frac{1}{B(\alpha, \beta)}\, p^{\alpha-1} (1-p)^{\beta-1}$$

[0023] The beta distribution is used because varying the parameters α and β allows it to represent a variety of distribution shapes, including the uniform distribution. Letting $C(w_1, w_2, \ldots, w_i)$ be the number of times the word string $(w_1, w_2, \ldots, w_i)$ is observed, the forward transition probability $p_{\mathrm{MAP}}(w_n \mid w_1, w_2, \ldots, w_{n-1})$ from the word string $(w_1, w_2, \ldots, w_{n-1})$ to the word $w_n$ after MAP estimation is given by the following equation.

[0024]

(Equation 4)
$$p_{\mathrm{MAP}}(w_n \mid w_1, w_2, \ldots, w_{n-1}) = \frac{C(w_1, w_2, \ldots, w_{n-1}, w_n) + \alpha - 1}{C(w_1, w_2, \ldots, w_{n-1}) + \alpha + \beta - 2}$$
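The MAP estimate can be sketched in Python as follows, reading it in the same form as Equation 17: the count of the full word string plus α-1 in the numerator, and the count of the conditioning string plus α+β-2 in the denominator. The function name and the numerical values are assumptions made for the example.

```python
def map_estimate(count_full, count_history, alpha, beta):
    """MAP estimate of a transition probability under a Beta(alpha, beta) prior.

    count_full    -- observations of the history followed by the target word
    count_history -- observations of the history alone
    """
    denominator = count_history + alpha + beta - 2.0
    if denominator <= 0.0:
        raise ValueError("count_history + alpha + beta - 2 must be positive")
    return (count_full + alpha - 1.0) / denominator


# Hypothetical prior with mode (alpha - 1) / (alpha + beta - 2) = 0.25.
ALPHA, BETA = 3.0, 7.0

no_data = map_estimate(0, 0, ALPHA, BETA)            # equals the prior mode
some_data = map_estimate(50, 100, ALPHA, BETA)       # between prior and 0.5
much_data = map_estimate(5000, 10000, ALPHA, BETA)   # close to 5000 / 10000
```

With no observations the estimate equals the prior value, and as the counts grow it approaches the relative frequency, which is the continuous interpolation that the description aims for.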

[0025] For the beta distribution, the mean μ and the variance σ² are known to be given by the following equations.

[0026]

(Equation 5)
$$\mu = \frac{\alpha}{\alpha + \beta}$$

(Equation 6)
$$\sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$

[0027] However, it is difficult to obtain these values by maximum likelihood estimation. First, when the observation counts $C(w_1, w_2, \ldots, w_{n-1})$ and $C(w_1, w_2, \ldots, w_n)$ are both 0, that is, when no posterior knowledge is given, the probability estimate is obtained as Equation 8 by substituting Equation 7 into Equation 4.

[0028]

(Equation 7)
$$C(w_1, w_2, \ldots, w_{n-1}) = C(w_1, w_2, \ldots, w_n) = 0$$

(Equation 8)
$$p_{\mathrm{MAP}}(w_n \mid w_1, w_2, \ldots, w_{n-1}) = \frac{\alpha - 1}{\alpha + \beta - 2}$$

[0029] Since it is reasonable to use the prior knowledge when no posterior knowledge is given, the value of Equation 8 is taken to be equal to the value given by the prior knowledge, which yields the following equation.

[0030]

(Equation 9)
$$\frac{\alpha - 1}{\alpha + \beta - 2} = p(w_n \mid c_1, w_2, \ldots, w_{n-1})$$

[0031] Here, the probability $p(w_n \mid c_1, w_2, \ldots, w_{n-1})$ is obtained as the weighted average $\mu_h$ of the word N-gram probabilities $p(w_n \mid w_i, w_2, \ldots, w_{n-1})$ (where $w_i \in c_1$). To make the parameters α and β easy to solve for, the substitutions of Equations 10 and 11, which follow from the relationship between the mean μ of the prior distribution in Equation 5 and the weighted average $\mu_h$ in Equations 8 and 9, are formally applied to Equation 6, and Equation 12 is assumed.

[0032]

(Equation 10)
$$\alpha \rightarrow \alpha - 1$$

(Equation 11)
$$\alpha + \beta \rightarrow \alpha + \beta - 2$$

(Equation 12)
$$\frac{(\alpha-1)(\beta-1)}{(\alpha+\beta-2)^2(\alpha+\beta-1)} = \sigma_h^2$$

[0033] Here, $\sigma_h^2$ is the weighted variance of the word N-gram. From Equations 8, 9 and 12, the parameters α and β can be obtained according to the following equations.

[0034]

(Equation 13)
$$\alpha - 1 = \frac{\mu_h^2 (1-\mu_h)}{\sigma_h^2} - \mu_h$$

(Equation 14)
$$\alpha + \beta - 2 = \frac{\mu_h (1-\mu_h)}{\sigma_h^2} - 1$$
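The parameters can be checked numerically. Writing a = α-1 and s = α+β-2, Equation 9 (with the prior value set to $\mu_h$) gives a = $\mu_h$ s, and Equation 12 then reduces to $\mu_h(1-\mu_h)/(s+1) = \sigma_h^2$, so that s = $\mu_h(1-\mu_h)/\sigma_h^2 - 1$. The Python sketch below follows that derivation; the function name and the sample values are assumptions made for the example.

```python
def beta_parameters(mu_h, var_h):
    """Solve Equations 9 and 12 for (alpha, beta), given the weighted mean
    mu_h and the weighted variance var_h of the word N-gram probabilities."""
    if not 0.0 < mu_h < 1.0:
        raise ValueError("mu_h must lie strictly between 0 and 1")
    if var_h <= 0.0:
        raise ValueError("var_h must be positive")
    total = mu_h * (1.0 - mu_h) / var_h - 1.0   # alpha + beta - 2
    if total <= 0.0:
        raise ValueError("var_h is too large for a valid beta prior")
    alpha_minus_1 = mu_h * total                # alpha - 1
    beta_minus_1 = total - alpha_minus_1        # beta - 1
    return alpha_minus_1 + 1.0, beta_minus_1 + 1.0


# Hypothetical statistics of the word N-gram probabilities within one class.
alpha, beta = beta_parameters(mu_h=0.25, var_h=0.0125)

# Check both defining conditions.
a, b = alpha - 1.0, beta - 1.0
prior_value = a / (a + b)                            # left side of Equation 9
variance = a * b / ((a + b) ** 2 * (a + b + 1.0))    # left side of Equation 12
```

A small weighted variance yields large α and β, so the prior dominates until many observations accumulate; a large variance yields a weak prior that the observed counts override quickly.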

[0035] The weighted average $\mu_h$ and the weighted variance $\sigma_h^2$ are defined and calculated by the following equations.

[0036]

(Equation 15)

(Equation 16)

[0037] Based on the parameters α and β calculated by the method described above, Equation 4 is used to obtain the forward probability after MAP estimation for each word string.

[0038] Next, the backward transition probability is described. The backward transition probability $p_{\mathrm{MAP}}(w_{n-1}, w_{n-2}, \ldots, w_2, w_1 \mid w_n)$ from the word $w_n$ to the word string $(w_{n-1}, w_{n-2}, \ldots, w_2, w_1)$ after MAP estimation is given by the following equation.

[0039]

(Equation 17)
$$p_{\mathrm{MAP}}(w_{n-1}, w_{n-2}, \ldots, w_2, w_1 \mid w_n) = \frac{C(w_n, w_{n-1}, \ldots, w_2, w_1) + \alpha - 1}{C(w_n) + \alpha + \beta - 2}$$

[0040] Here, the probability estimate when no posterior knowledge is given is obtained as Equation 19 by substituting Equation 18 into Equation 17.

[0041]

(Equation 18)
$$C(w_n, w_{n-1}, \ldots, w_2, w_1) = C(w_n) = 0$$

(Equation 19)
$$p_{\mathrm{MAP}}(w_{n-1}, \ldots, w_2, w_1 \mid w_n) = \frac{\alpha - 1}{\alpha + \beta - 2}$$

[0042] Since it is reasonable to use the prior knowledge when no posterior knowledge is given, the value of Equation 19 is taken to be equal to the value given by the prior knowledge, which yields the following equation.

[0043]

(Equation 20)
$$\frac{\alpha - 1}{\alpha + \beta - 2} = p(w_{n-1}, \ldots, w_2, w_1 \mid c_n)$$

[0044] The parameters α and β can be calculated using Equations 13 and 14 above, where the weighted average $\mu_h$ and the weighted variance $\sigma_h^2$ are defined and calculated by the following equations.

[0045]

(Equation 21)

(Equation 22)

[0046] Based on the parameters α and β calculated by the method described above, Equation 17 is used to obtain the backward probability after MAP estimation for each word string.

[0047] The N-gram values after MAP estimation obtained in this way are closer to the word N-gram values for word strings whose word N-gram has been observed many times, and closer to the part-of-speech class-word 2-gram values for word strings observed only a few times. The resulting model is therefore considered to combine "next-word prediction accuracy", "reliability for sparse data" and "robustness against task shift".

[0048] Next, the normalization processing and the smoothing processing are described. Because the probability after MAP estimation is calculated independently for each word string, there is no guarantee that the transition probabilities for each preceding word string sum to 1. A correction is therefore needed so that the sum becomes 1. In addition, even the N-gram after MAP estimation cannot assign a probability value to a word string for which not even prior knowledge is given, so some probability value must be allotted by smoothing. Preferably, one of the following two smoothing methods is used.
(1) Smoothing by MAP estimation
(2) Back-off smoothing

【００４９】First, the smoothing processing by MAP estimation will be described. For the probability p(w_n | c_1, w_2, …, w_{n-1}), which is the prior knowledge in the MAP estimation, a further MAP estimation is performed whose prior knowledge is the probability one order lower (that is, with a history one item shorter), p(w_n | w_2, …, w_{n-1}). This makes it possible to assign a probability value even when not even prior knowledge is available (see, for example, Prior Art Document 2: Go Kawabata et al., "Back-off smoothing of N-gram language models based on the binomial posterior distribution", IEICE Technical Report, SP93-95, pp. 1-6, December 1995). MAP estimation is then repeated successively: the probability p(w_n | w_2, …, w_{n-1}) is estimated with the probability p(w_n | c_2, …, w_{n-1}) as prior knowledge, the probability p(w_n | c_2, …, w_{n-1}) with the probability p(w_n | w_3, …, w_{n-1}), and so on. The final MAP estimation uses the word 1-gram (or 0-gram) as prior knowledge, so a probability value can be assigned to every word string. In this case, the value of the weighted mean μ_h in Equation 9 above is the MAP-estimated probability p_MAP(w_n | c_1, w_2, …, w_{n-1}). The probability values are then normalized for each preceding word string according to the following equation, so that the transition probabilities to the next word sum to 1. Here, V denotes the vocabulary size.

【００５０】[0050]

【数２３】 (Equation 23)
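
Equation 23 is not reproduced in this text, so the following is only a minimal sketch of the normalization described above. It assumes the simplest form, in which each MAP-estimated value is divided by the sum over all next words for the same preceding word string. The function name `normalize_row` and the sample values are hypothetical.

```python
def normalize_row(probs):
    """Rescale the values for one preceding word string so they sum to 1.

    `probs` maps each next word to its (unnormalized) MAP-estimated value.
    Assumption: normalization is a plain division by the row sum.
    """
    total = sum(probs.values())
    if total <= 0.0:
        raise ValueError("cannot normalize a row with no probability mass")
    return {word: p / total for word, p in probs.items()}


# Hypothetical unnormalized MAP estimates for a single history.
row = {"a": 0.30, "b": 0.15, "c": 0.05}
normalized = normalize_row(row)
```

After this step the values for the history sum to 1 while their ratios are unchanged.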

【００５１】Next, the back-off smoothing processing will be described. When the known back-off smoothing method (see, for example, Prior Art Document 3: S. M. Katz, "Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer", IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 400-401, 1987) is used for smoothing, the sum of the probability values allocated to unseen word strings is made equal to the sum of the probability values that the known back-off smoothing method allocates to unseen word strings in the model used as prior knowledge. That is, the back-off coefficient b(w_1, w_2, …, w_{n-1}) after MAP estimation is set equal to the back-off coefficient b(c_1, w_2, …, w_{n-1}) of the prior knowledge. The discount coefficient is then determined so that the transition probabilities to the succeeding words sum to 1. Finally, the transition probability after MAP estimation with back-off smoothing is given by the following equations.

【００５２】(1) When C(w_1, w_2, …, w_n) > 0, that is, when posterior knowledge is given:

【００５３】[0053]

【数２４】 (Equation 24)
$$p_{\mathrm{MAP}}(w_n \mid w_1, w_2, \ldots, w_{n-1}) = d(w_2, \ldots, w_{n-1}, w_n) \times \frac{C(w_1, w_2, \ldots, w_{n-1}, w_n) + \alpha - 1}{C(w_1, w_2, \ldots, w_{n-1}) + \alpha + \beta - 2}$$

【００５４】(2) When C(c_1, w_2, …, w_n) > Cut(n), that is, when no posterior knowledge is given:

【００５５】[0055]

【数２５】 (Equation 25)
$$p_{\mathrm{MAP}}(w_n \mid w_1, w_2, \ldots, w_{n-1}) = d(w_2, \ldots, w_{n-1}, w_n) \times \frac{\alpha - 1}{C(w_1, w_2, \ldots, w_{n-1}) + \alpha + \beta - 2}$$

【００５６】(3) When C(c_1, w_2, …, w_n) ≤ Cut(n), that is, when even the prior knowledge is unreliable:

【００５７】[0057]

【数２６】 (Equation 26)
$$p_{\mathrm{MAP}}(w_n \mid w_1, w_2, \ldots, w_{n-1}) = b(w_1, w_2, \ldots, w_{n-1}) \times p_{\mathrm{MAP}}(w_n \mid w_2, \ldots, w_{n-1})$$

【００５８】Here, Cut(n) is the cutoff coefficient for N-gram order n, d(w_2, …, w_{n-1}, w_n) is the discount coefficient, and b(w_1, w_2, …, w_{n-1}) is the back-off coefficient.
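
The three cases of Equations 24 to 26 can be sketched as a single function. This is an illustrative sketch only: the function name, the argument names and the sample values are hypothetical, and the discount coefficient, back-off coefficient and lower-order probability are assumed to have been computed elsewhere.

```python
def map_backoff_prob(word_count, class_count, history_count,
                     alpha, beta, discount, backoff,
                     lower_order_prob, cutoff):
    """Return p_MAP(w_n | w_1..w_{n-1}) for the three cases in the text.

    word_count    : C(w_1, ..., w_n), count of the word N-gram
    class_count   : C(c_1, w_2, ..., w_n), count of the class-word N-gram
    history_count : C(w_1, ..., w_{n-1}), count of the preceding word string
    """
    denom = history_count + alpha + beta - 2.0
    if word_count > 0:
        # Case (1), Equation 24: posterior knowledge is available.
        return discount * (word_count + alpha - 1.0) / denom
    if class_count > cutoff:
        # Case (2), Equation 25: only the prior knowledge is available.
        return discount * (alpha - 1.0) / denom
    # Case (3), Equation 26: back off to the history that is one word shorter.
    return backoff * lower_order_prob


# Hypothetical values, chosen only to exercise each branch.
common = dict(history_count=10, alpha=2.0, beta=4.0, discount=0.9,
              backoff=0.5, lower_order_prob=0.2, cutoff=1)
p_seen = map_backoff_prob(word_count=3, class_count=7, **common)
p_prior_only = map_backoff_prob(word_count=0, class_count=5, **common)
p_backed_off = map_backoff_prob(word_count=0, class_count=1, **common)
```

The order of the checks mirrors the text: the word N-gram count is tested first, and the cutoff on the class-word N-gram count is consulted only when the word N-gram was never observed.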

【００５９】The smoothing processing can be performed with similar equations for the backward probability as well. In that case, the value of the weighted mean μ in Equation 20 above is the MAP-estimated value p_MAP(w_{n-1}, …, w_2, w_1 | c_n). In the backward MAP estimation, the statistical language model of the backward part-of-speech class-word N-gram is used as prior knowledge, and the statistical language model of the backward word N-gram is used as posterior knowledge.

【００６０】Next, the method of combining part-of-speech class information with automatic class classification in the second embodiment will be described. The two statistical language models obtained by the language model generation unit 20 of the first embodiment, the forward N-gram and the backward N-gram, are automatically clustered (automatic class classification) to reduce the model size. Whether an automatic class N-gram is robust against task shift is considered to depend on whether the feature assigned to each word during automatic class classification is itself robust against task shift. Ordinary automatic class classification uses word N-gram values as word features, but these features lack robustness against task shift, so the resulting automatic class N-gram is also expected to lack it. The MAP-estimated N-gram statistical language model obtained by the language model generation unit 20 of the first embodiment is robust against task shift, so using its values as word features is expected to yield automatic classes that are robust against task shift. A class N-gram built on these classes can then be expected to add a "compact model size" to the properties of the statistical language model obtained by the language model generation unit 20 of the first embodiment.

【００６１】Next, the class classification of preceding word strings will be described. For a given word string (w_1, w_2, …, w_{n-1}), the transition probability from this word string to another word w_n is given by the following expression.

【数２７】 (Equation 27)
$$p_{\mathrm{MAP}}(w_n \mid w_1, w_2, \ldots, w_{n-1})$$

【００６２】Taking all V words in the vocabulary set as transition destinations, the sequence of these transition probabilities can be regarded as a vector of length V. This vector can be regarded as representing the connection attribute of the word string (w_1, w_2, …, w_{n-1}) with respect to the next word, so the word strings are classified on the basis of this vector. The following procedure is used for the automatic class classification.

【００６３】
<Step SS1> Each word string is assigned to its own class.
<Step SS2> A feature V(X) is assigned to the class X of each word string. Here, the feature V(X) is a vector whose elements are the transition probabilities from class X to the next word.
<Step SS3> The pair of classes with the smallest merge cost (U_new − U_old) is selected and merged into one class (that is, the pair whose merging increases the variance of the vectors the least). Here, U_new is the scatter, or variance, of the vectors within the class after merging, and U_old is the scatter, or variance, of the vectors within the class before merging. They are calculated by the following equations.

【００６４】[0064]

【数２８】 [Equation 28]

【数２９】 (Equation 29)

【００６５】Here, C_new denotes the class after merging, C_old denotes the class before merging, and D(Vc, Vw) denotes the squared Euclidean distance between the vectors Vc and Vw.
<Step SS4> The clustering processing is performed by repeating steps SS2 and SS3 until the number of classes reaches a predetermined number.
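
Steps SS1 to SS4 can be sketched as a greedy bottom-up clustering. Equations 28 and 29 are not reproduced in this text, so the sketch assumes that the scatter U of a class is the sum of squared Euclidean distances D between the class centroid and each member vector. All function names and the toy feature vectors are hypothetical.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance D between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter(vectors):
    """Assumed scatter U: squared distances from the centroid, summed."""
    n = len(vectors)
    centroid = [sum(col) / n for col in zip(*vectors)]
    return sum(sq_dist(centroid, v) for v in vectors)


def cluster(features, target_classes):
    """Greedily merge classes until `target_classes` remain (SS1 to SS4)."""
    classes = [[key] for key in features]          # SS1: one class per item
    while len(classes) > target_classes:           # SS4: repeat until done
        best = None
        for i, j in combinations(range(len(classes)), 2):
            merged = classes[i] + classes[j]
            u_new = scatter([features[k] for k in merged])
            u_old = (scatter([features[k] for k in classes[i]])
                     + scatter([features[k] for k in classes[j]]))
            cost = u_new - u_old                   # SS3: merge cost
            if best is None or cost < best[0]:
                best = (cost, i, j)
        _, i, j = best
        classes[i] = classes[i] + classes[j]       # merge the cheapest pair
        del classes[j]                             # safe because i < j
    return classes


# Toy transition-probability vectors over a 3-word vocabulary (made up).
features = {
    "h1": [0.50, 0.50, 0.00],
    "h2": [0.45, 0.55, 0.00],
    "h3": [0.00, 0.10, 0.90],
}
result = cluster(features, target_classes=2)
```

With these toy vectors the two histories with similar next-word distributions end up in one class. The exhaustive pair search is quadratic in the number of classes per merge, which is acceptable for a sketch but would need pruning for a real vocabulary.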

【００６６】Next, the class classification of succeeding words will be described. MAP estimation makes it possible to assign a reliable value to the transition probability from the word string (w_1, w_2, …, w_{n-1}) to another word w_n even when the word string is sparse (that is, when there is insufficient training data for it). However, the reliability problem that arises when the succeeding word w_n is sparse remains unsolved. Therefore, the probability p(w_1, w_2, …, w_{n-1} | w_n) that the word string (w_1, w_2, …, w_{n-1}) appears before a given word w_n is interpolated by MAP estimation, using as prior knowledge the probability p(w_1, w_2, …, w_{n-1} | c_n) that the word string appears before the part-of-speech class to which w_n belongs. The MAP estimation procedure itself is the one described above. As before, the vector whose elements are the resulting MAP-estimated probabilities p_MAP(w_1, w_2, …, w_{n-1} | w_n) represents a reliable connection attribute of the word w_n with respect to the preceding word string (w_1, w_2, …, w_{n-1}). The succeeding words are also classified on the basis of this vector. The classes of preceding word strings and the classes of succeeding words obtained in this way (both defined relative to the word being processed) take part-of-speech information into account as well as word information. The class N-gram generated from these classes is called a fusion N-gram, which fuses the word N-gram and the part-of-speech class N-gram.

【００６７】A supplementary explanation of classes based on parts of speech follows. Part-of-speech class information is often used as the class index in class N-grams. The present inventors therefore first compared the performance of a class bigram classified by part-of-speech class information with that of a word bigram. The training set contained 459,383 words in total and 7,221 distinct words, and the classes were the 158 classes based on the part-of-speech class information (stored in the part-of-speech class information data memory 14 in this embodiment). On a test set of 6,826 words in total, the class bigram had a perplexity of 31.53, whereas the word bigram had a perplexity of 18.51, which is a considerable difference. The likely reason is that the part-of-speech information used for classification represents the properties of a word as a whole, and so cannot be said to purely capture word connectivity, which is what matters in an N-gram.
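
The text does not state how the perplexity figures were computed. As a reference point, the sketch below uses the standard definition, the exponential of the average negative log probability per word, and assumes that this is the measure behind the reported values. The function name and the sample probabilities are hypothetical.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test set given the model probability of each word.

    Standard definition: exp(-(1/N) * sum(ln p_i)). Lower is better.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# A model that assigns probability 1/4 to every word has perplexity 4.
uniform_pp = perplexity([0.25, 0.25, 0.25, 0.25])
```

Under this definition, a perplexity of 18.51 means the model is on average about as uncertain as a uniform choice among 18.51 words.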

【００６８】In view of this situation, the present inventors devised the fusion N-gram, a multi-class N-gram described below. Consider first how class classification by part-of-speech information differs from class classification that looks only at word connectivity. As an example, take the following three verb conjugation endings, which belong to one of the part-of-speech categories used above.
(a) "ki: ending, godan ka-row, continuative form"
(b) "shi: ending, godan sa-row, continuative form"
(c) "ku: ending, godan ka-row, terminal form"
Because these have different parts of speech (here they fall into three part-of-speech classes), classification by part-of-speech information puts each in a separate class. In that case, three kinds of connection must be considered both for the words that come before and for the words that come after. However, if only the words that come before are considered, "ki" and "ku" can be treated as the same, and if only the words that come after are considered, "ki" and "shi" can be treated as the same. Therefore, if the property of which words come before and the property of which words come after are considered separately, and a separate class is assigned for each, only two kinds of connection need to be considered in each direction. This gives a more efficient classification than ordinary class classification.

【００６９】In this embodiment, the class concerning the former, the backward connectivity to the preceding word (from connectivity), is called the to class (backward class), and the class concerning the latter, the forward connectivity to the succeeding word (to connectivity), is called the from class (forward class). Each word is regarded as having two class attributes (part-of-speech attributes), a to class and a from class. The three words above can then be expressed by two to classes:
(a) "ki, ku: ending, godan ka-row"
(b) "shi: ending, godan sa-row"
and by two from classes:
(a) "ki, shi: ending, godan, continuative form"
(b) "ku: ending, godan, terminal form"
This idea extends directly to N ≥ 3, in which case each word has N class attributes. These class attributes are called multiple classes, and an N-gram that uses them is called a multi-class N-gram. The occurrence probability of the multi-class bigram, for N = 2, is given by the following equation.

【００７０】[0070]

【数３０】 (Equation 30)
$$P(W_n \mid W_{n-1}) \approx P(C_t(W_n) \mid C_f(W_{n-1})) \times P(W_n \mid C_t(W_n))$$

【００７１】Here, Ct denotes the to class to which the word being processed belongs, and Cf denotes the from class. The number of parameters is then (number of to classes) × (number of from classes) + (number of words), whereas for a class N-gram it is (number of classes)² + (number of words).
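
The multi-class bigram of Equation 30 and the two parameter counts can be sketched as follows. The class groupings follow the three verb endings discussed above; the function names, the class labels and the probability values are hypothetical and serve only as an illustration.

```python
# to classes group words by what precedes them, from classes by what follows.
TO_CLASS = {"ki": "ka-row", "ku": "ka-row", "shi": "sa-row"}
FROM_CLASS = {"ki": "continuative", "shi": "continuative", "ku": "terminal"}


def multiclass_bigram(prev_word, word, class_trans, word_given_class):
    """P(W_n | W_{n-1}) ~ P(Ct(W_n) | Cf(W_{n-1})) * P(W_n | Ct(W_n))."""
    ct = TO_CLASS[word]          # to class of the current word
    cf = FROM_CLASS[prev_word]   # from class of the preceding word
    return class_trans[(cf, ct)] * word_given_class[(word, ct)]


def parameter_counts(n_classes, n_to, n_from, n_words):
    """Parameter counts given in the text for the two bigram models."""
    return {
        "class_bigram": n_classes ** 2 + n_words,
        "multiclass_bigram": n_to * n_from + n_words,
    }


# Made-up probabilities for one class transition and one class membership.
class_trans = {("continuative", "sa-row"): 0.2}
word_given_class = {("shi", "sa-row"): 1.0}
p = multiclass_bigram("ki", "shi", class_trans, word_given_class)

# Three part-of-speech classes versus two to classes and two from classes.
counts = parameter_counts(n_classes=3, n_to=2, n_from=2, n_words=3)
```

For the three endings, the class bigram needs 3² + 3 = 12 parameters, while the multi-class bigram needs 2 × 2 + 3 = 7.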

【００７２】The statistical language model generation processing executed by the language model generation units 20 and 30 will now be described with reference to the flowcharts of FIGS. 4 to 10.

【００７３】FIG. 4 is a flowchart showing the first language model generation processing executed by the language model generation unit 20 of FIG. 1. In FIG. 4, the forward language model generation processing (FIG. 5) is first executed in step S1, the backward language model generation processing (FIG. 6) is then executed in step S2, and the first language model generation processing ends.

【００７４】FIG. 5 is a flowchart showing the forward language model generation processing, which is a subroutine of FIG. 4. In FIG. 5, first, in step S11, a statistical language model of the forward word N-gram is generated from the learning text data stored in the learning text data memory 13, which is a corpus of transcribed utterances. In step S12, a statistical language model of the forward part-of-speech class-word N-gram is generated from the learning text data in the learning text data memory 13 and the part-of-speech class information data in the part-of-speech class information data memory 14. Next, in step S13, the parameters α and β for the MAP estimation method are calculated for each pair of a part-of-speech class and a word string using Equations 13 to 16. In step S14, the forward transition probability after MAP estimation is calculated for each word string using Equation 4 and the calculated parameters α and β. Finally, in step S15, normalization and smoothing are applied to the calculated forward transition probabilities to generate the forward language model, which is stored in the statistical language model memory 21, and processing returns to the main routine.

【００７５】FIG. 6 is a flowchart showing the backward language model generation processing, which is a subroutine of FIG. 4. In FIG. 6, first, in step S21, a backward word N-gram is generated from the learning text data in the learning text data memory 13. In step S22, a backward part-of-speech class-word N-gram is generated from the learning text data in the learning text data memory 13 and the part-of-speech class information data in the part-of-speech class information data memory 14. Next, in step S23, the parameters α and β for the MAP estimation method are calculated for each pair of a part-of-speech class and a word string using Equations 13, 14, 21 and 22. In step S24, the backward transition probability after MAP estimation is calculated for each word string using Equation 17 and the calculated parameters α and β. Finally, in step S25, normalization and smoothing are applied to the calculated backward transition probabilities to generate the backward language model, which is stored in the statistical language model memory 22, and processing returns to the main routine.

【００７６】FIG. 7 is a flowchart showing the second language model generation processing executed by the language model generation unit 30 of FIG. 1. In FIG. 7, the clustering processing of the forward language model (FIG. 8) is executed in step S31, the clustering processing of the backward language model (FIG. 9) is executed in step S32, and the fusion language model generation processing (FIG. 10) is executed in step S33, after which the second language model generation processing ends.

【００７７】FIG. 8 is a flowchart showing the clustering processing of the forward language model (step S31), which is a subroutine of FIG. 7. In FIG. 8, first, in step S41, on the basis of the transition probabilities of the forward language model in the statistical language model memory 21, those transition probabilities are assigned as the feature of each word string that precedes the word being processed, with each word string initially forming its own class. Next, in step S42, each pair of classes is tentatively merged, and the pair with the smallest merge cost is selected and merged into one class. In step S43, it is determined whether the number of classes has reached a predetermined threshold (for example, 200 or 500). If NO, processing returns to step S42. If YES, processing proceeds to step S44, where the class classification information obtained by the clustering for the word strings preceding the word being processed is stored in a temporary memory, and processing returns to the main routine.

【００７８】FIG. 9 is a flowchart showing the clustering processing of the backward language model (step S32), which is a subroutine of FIG. 7. In FIG. 9, in step S51, on the basis of the transition probabilities of the backward language model in the statistical language model memory 22, those transition probabilities are assigned as the feature of each word being processed, with each word string initially forming its own class. Next, in step S52, each pair of classes is tentatively merged, and the pair with the smallest merge cost is selected and merged into one class. In step S53, it is determined whether the number of classes has reached a predetermined threshold. If NO, processing returns to step S52. If YES, processing proceeds to step S54, where the class classification information obtained by the clustering for the words being processed is stored in the temporary memory, and processing returns to the main routine.

【００７９】FIG. 10 is a flowchart showing the fusion language model generation processing, which is a subroutine of FIG. 7. In FIG. 10, first, in step S61, the class classification information of the word strings preceding the word being processed and the class classification information of the words being processed, both obtained by clustering and stored in the temporary memory, are taken as the input. Next, in step S62, on the basis of the learning text data in the learning text data memory 13 and using the classes of the two sets of class classification information, the frequency probability from the class of the word string preceding the word being processed to the class of the word being processed is calculated. This generates the statistical language model of the fusion N-gram, which fuses the word N-gram and the part-of-speech class N-gram. The model is stored in the statistical language model memory 31, and processing returns to the main routine.

【００８０】Next, the configuration and operation of the continuous speech recognition apparatus shown in FIG. 1 will be described. In FIG. 1, the phoneme HMM in the phoneme hidden Markov model (hereinafter, a hidden Markov model is referred to as an HMM) memory 11 connected to the word matching unit 4 is represented as a set of states, each of which has the following information:
(a) state number,
(b) acceptable context class,
(c) lists of preceding states and succeeding states,
(d) parameters of the output probability density distribution, and
(e) self-transition probability and transition probabilities to succeeding states.
The phoneme HMM used in this embodiment is generated by converting a predetermined speaker-mixture HMM, because it is necessary to identify which speaker each distribution derives from. The output probability density function is a Gaussian mixture distribution with 34-dimensional diagonal covariance matrices. The word dictionary in the word dictionary memory 12 connected to the word matching unit 4 stores, for each word, a symbol string that gives the reading of the word expressed in the symbols of the phoneme HMM in the phoneme HMM memory 11.

【００８１】In FIG. 1, a speaker's uttered speech is input to the microphone 1, converted into a speech signal, and input to the feature extraction unit 2. The feature extraction unit 2 A/D-converts the input speech signal and then performs, for example, LPC analysis to extract 34-dimensional feature parameters comprising the log power, 16th-order cepstrum coefficients, the Δ log power, and 16th-order Δ cepstrum coefficients. The time series of extracted feature parameters is input to the word matching unit 4 via the buffer memory 3.
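
The 34 dimensions break down as 1 + 16 + 1 + 16. The sketch below assembles one such feature vector from already-computed log power and cepstrum values. The text does not say how the Δ terms are obtained, so a plain difference between consecutive frames is assumed here; the function name and the sample inputs are hypothetical.

```python
def assemble_features(log_power, cepstrum, prev_log_power, prev_cepstrum):
    """Build one 34-dimensional feature vector for the current frame.

    Layout: [log power, 16 cepstrum, delta log power, 16 delta cepstrum].
    Assumption: each delta is the difference from the previous frame.
    """
    if len(cepstrum) != 16 or len(prev_cepstrum) != 16:
        raise ValueError("expected 16 cepstrum coefficients per frame")
    delta_power = log_power - prev_log_power
    delta_cepstrum = [c - p for c, p in zip(cepstrum, prev_cepstrum)]
    return [log_power] + list(cepstrum) + [delta_power] + delta_cepstrum


# Made-up values for two consecutive frames.
prev_cep = [0.0] * 16
cur_cep = [0.1 * k for k in range(16)]
frame = assemble_features(-2.0, cur_cep, -2.5, prev_cep)
```

The resulting vector has the same dimensionality as the diagonal covariance matrices of the phoneme HMM output distributions described above.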

【００８２】Using the one-pass Viterbi decoding method, the word matching unit 4 detects word hypotheses from the feature parameter data input via the buffer memory 3, using the phoneme HMM in the phoneme HMM memory 11 and the word dictionary in the word dictionary memory 12, and calculates and outputs their likelihoods. Here, for each state of each HMM at each time, the word matching unit 4 calculates the likelihood within the word and the likelihood from the start of the utterance. A separate likelihood is held for each difference in word identification number, word start time, and preceding word. To reduce the amount of computation, grid hypotheses with low likelihood among the total likelihoods calculated from the phoneme HMM and the word dictionary are pruned. The word matching unit 4 outputs the resulting word hypotheses and likelihood information, together with time information from the utterance start time (specifically, for example, a frame number), to the word hypothesis narrowing unit 6 via the buffer memory 5.

【００８３】When the switch SW is set to the a side, the statistical language model memory 21 is connected to the word hypothesis narrowing unit 6, which then performs the word hypothesis narrowing processing with reference to the forward N-gram statistical language model in the statistical language model memory 21 generated by the language model generation unit 20. When the switch SW is set to the b side, the statistical language model memory 31 is connected to the word hypothesis narrowing unit 6, which then performs the word hypothesis narrowing processing with reference to the fusion N-gram statistical language model in the statistical language model memory 31 generated by the language model generation unit 30.

【００８４】On the basis of the word hypotheses output from the word matching unit 4 via the buffer memory 5, and with reference to the statistical language model in the statistical language model memory 21 or 31, the word hypothesis narrowing unit 6 narrows down the word hypotheses as follows. Among word hypotheses for the same word that have the same end time but different start times, each head phoneme environment of the word is represented by the single word hypothesis with the highest total likelihood calculated from the utterance start time to the end time of the word. The unit then outputs, as the recognition result, the word string of the hypothesis with the maximum total likelihood among the word strings of all the word hypotheses remaining after narrowing. The task-adapted statistical language model comprises one statistical language model per task, and the word hypothesis narrowing unit 6 selectively refers to the statistical language model corresponding to the task to be recognized. In this embodiment, the head phoneme environment of the word to be processed preferably means a sequence of three phonemes consisting of the final phoneme of the word hypothesis preceding the word and the first two phonemes of the word hypothesis of the word.

[0085] For example, as shown in FIG. 2, suppose that the i-th word W_i, consisting of the phoneme sequence a_1, a_2, ..., a_n, follows the (i-1)-th word W_i-1, and that six hypotheses Wa, Wb, Wc, Wd, We, and Wf exist as word hypotheses for the word W_i-1. Assume that the final phoneme of the first three word hypotheses Wa, Wb, and Wc is /x/ and that the final phoneme of the last three word hypotheses Wd, We, and Wf is /y/. Among the hypotheses that share the same end time t_e and the same head phoneme environment (in FIG. 2, the top three word hypotheses, whose head phoneme environment is "x/a_1/a_2"), all hypotheses other than the one with the highest total likelihood (for example, the topmost hypothesis in FIG. 2) are deleted. The fourth hypothesis from the top has a different head phoneme environment, that is, the final phoneme of its preceding word hypothesis is y rather than x, so it is not deleted. In other words, only one hypothesis is kept for each final phoneme of the preceding word hypothesis. In the example of FIG. 2, one hypothesis is kept for the final phoneme /x/ and one hypothesis is kept for the final phoneme /y/.
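The narrowing rule described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the `WordHypothesis` record, its field names, and the `narrow_hypotheses` function are hypothetical names introduced here, and the sketch assumes that the total likelihood of each hypothesis has already been computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHypothesis:
    # All field names are illustrative assumptions, not the patent's.
    word: str                  # identity of the current word W_i
    start_time: int            # start frame of W_i
    end_time: int              # end frame t_e of W_i
    prev_final_phoneme: str    # final phoneme of the preceding word hypothesis
    first_two_phonemes: tuple  # (a_1, a_2) of W_i
    total_likelihood: float    # likelihood from utterance start to end_time


def narrow_hypotheses(hypotheses):
    """Keep one hypothesis per (word, end time, head phoneme environment).

    The head phoneme environment is the three-phoneme sequence formed by the
    final phoneme of the preceding word and the first two phonemes of W_i.
    """
    best = {}
    for h in hypotheses:
        key = (h.word, h.end_time, h.prev_final_phoneme, h.first_two_phonemes)
        # Keep only the hypothesis with the highest total likelihood per key.
        if key not in best or h.total_likelihood > best[key].total_likelihood:
            best[key] = h
    return list(best.values())
```

Applied to six hypotheses that mirror the FIG. 2 example (three preceded by /x/ and three preceded by /y/, all ending at the same time), the sketch keeps two hypotheses, one for each final phoneme of the preceding word.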

[0086] In the above embodiment, the head phoneme environment of the word is defined as a sequence of three phonemes consisting of the final phoneme of the word hypothesis preceding the word and the first two phonemes of the word hypothesis of the word. However, the present invention is not limited to this. The head phoneme environment may instead be a phoneme sequence made up of a phoneme string of the preceding word hypothesis, which includes its final phoneme and at least one phoneme of the preceding word hypothesis contiguous with that final phoneme, and a phoneme string that includes the first phoneme of the word hypothesis of the word.

[0087] In the above embodiment, the feature extraction unit 2, the word collating unit 4, the word hypothesis narrowing unit 6, and the language model generation units 20 and 30 are implemented by a computer such as a digital electronic computer. The buffer memories 3 and 5, the phoneme HMM memory 11, the word dictionary memory 12, the learning text data memory 13, the part-of-speech class information data memory 14, and the statistical language model memories 21, 22, and 31 are implemented by a storage device such as a hard disk memory.

[0088] In the above embodiment, speech recognition is performed using the word collating unit 4 and the word hypothesis narrowing unit 6. However, the present invention is not limited to this and may be configured, for example, with a phoneme matching unit that refers to the phoneme HMM 11 and a speech recognition unit that performs speech recognition of words with reference to the statistical language model using, for example, the One Pass DP algorithm.

[0089] In the above embodiment, the forward N-gram or the fusion N-gram is used as the statistical language model in the word hypothesis narrowing unit 6. However, the backward N-gram in the statistical language model memory 22 may be used instead. Alternatively, an N-gram statistical language model whose transition probabilities are the averages of the corresponding transition probabilities of the forward N-gram and the backward N-gram may be generated and used in the word hypothesis narrowing unit 6.

【００９０】[0090]

EXAMPLES <Experiment of the first embodiment and its results> The present inventors conducted an evaluation experiment to confirm the effectiveness of continuous interpolation of part-of-speech information and word information by MAP estimation. The N-gram order used in the experiment was 2 (2-gram), and back-off with a cutoff coefficient of 0 was used for smoothing. The models compared were the word 2-gram, which is the posterior knowledge, the part-of-speech class-word 2-gram, which is the prior knowledge, and their linearly combined model. The training set, which serves as the learning text data and part-of-speech class information data, consists of Japanese dialogue sentences with about 260,000 words in total and about 4,000 distinct words, and its part-of-speech information is classified into 88 categories. Two evaluation sets were used: evaluation set A, consisting of 16 conversations (2,178 words) from the same task as the training set, and evaluation set B, consisting of 24 conversations (3,655 words) from a task different from that of the training set. Table 2 shows the perplexity of each model on evaluation sets A and B, as obtained in the inventors' experiment.

【００９１】[0091]

[Table 2] Performance (perplexity) of each model on the evaluation sets
----------------------------------------------------------------------------
Model                                        Test set A      Test set B
                                             (same task)     (different task)
----------------------------------------------------------------------------
Word 2-gram                                  14.39           53.06
Part-of-speech class-word 2-gram             22.00           58.74
Linearly combined 2-gram                     14.00           45.69
MAP-estimated 2-gram (first embodiment)      13.33           43.76
----------------------------------------------------------------------------

[0092] As is clear from Table 2, the MAP-estimated model has the lowest perplexity on both evaluation sets, and the improvement is especially marked on evaluation set B. This shows that the model in which part-of-speech information and word information are continuously interpolated using MAP estimation gains "robustness against task mismatch" while retaining the next-word prediction accuracy of the word 2-gram, and that this effect is greater than that of the linearly combined model.
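For readers unfamiliar with the metric, the following minimal Python sketch shows the standard definition of perplexity for a 2-gram model: the exponential of the negative mean log probability per word. It is an illustration only. The function name `perplexity`, the `bigram_prob` callable, and the `<s>` sentence-start symbol are assumptions introduced here; how sentence boundaries and unknown words were handled in the inventors' evaluation is not stated in this section.

```python
import math


def perplexity(sentences, bigram_prob, bos="<s>"):
    """Perplexity of a 2-gram model over a list of tokenized sentences.

    bigram_prob(prev, word) must return P(word | prev) > 0; a smoothed
    model (for example, one using back-off) is assumed.
    """
    log_sum = 0.0
    n_words = 0
    for sentence in sentences:
        prev = bos  # hypothetical sentence-start symbol
        for word in sentence:
            log_sum += math.log(bigram_prob(prev, word))
            n_words += 1
            prev = word
    return math.exp(-log_sum / n_words)
```

As a sanity check, a model that assigns probability 1/4 to every word has a perplexity of 4, regardless of the text it is evaluated on.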

[0093] <Experiment of the second embodiment and its results> An evaluation experiment was conducted on the fusion N-gram model of the part-of-speech N-gram and the word N-gram (second embodiment), which combines the MAP estimation obtained by the language model generation unit 20 of the first embodiment with automatic class classification. The evaluation conditions were the same as those for the first embodiment, and the models compared were the word 2-gram and a class 2-gram obtained by automatic class classification alone. Models with 500 classes and with 200 classes were used for each. FIG. 11 shows the relationship between the number of classes and the perplexity on evaluation sets A and B for each model. On both evaluation sets, perplexity decreases as the number of classes decreases, down to a certain number of classes, which shows the effect of class classification using features that contain part-of-speech information. The decrease in perplexity is especially marked on evaluation set B, which is a different task, indicating that the part-of-speech information works effectively to provide robustness against task mismatch. Table 3 shows the perplexity values for 200 and 500 classes.

【００９４】[0094]

[Table 3] Perplexity evaluation of the fusion 2-gram
----------------------------------------------------------------------------
Model                               Number of    Test set A     Test set B
                                    classes      (same task)    (different task)
----------------------------------------------------------------------------
Word 2-gram                         -            14.39          53.06
Automatically clustered 2-gram      500          15.12          59.10
                                    200          16.62          58.01
Fusion 2-gram (second embodiment)   500          13.95          45.32
                                    200          15.44          46.51
----------------------------------------------------------------------------

[0095] As is clear from Table 3, the fusion 2-gram shows lower perplexity than both the class 2-gram obtained by automatic class classification and the word 2-gram. In particular, on evaluation set B its perplexity is 15% lower than that of the word 2-gram, which demonstrates robustness against task mismatch. Moreover, the logical parameter size in this case is less than 2% of that of the word 2-gram, and the number of entries is also only 1/2. This shows that the four requirements described in the prior art section are satisfied simultaneously.

[0096] Next, the fusion 2-gram of the second embodiment was evaluated in continuous word recognition. The experimental conditions were the same as those described above, and the evaluation results are shown below.

【００９７】[0097]

[Table 4] Evaluation of the word misrecognition rate of the fusion 2-gram (second embodiment)
----------------------------------------------------------------------------
Model                               Test set A      Test set B
                                    (same task)     (different task)
----------------------------------------------------------------------------
Word 2-gram                         14.34%          32.76%
Fusion 2-gram (200 classes)         12.37%          23.07%
Fusion 2-gram (500 classes)         12.02%          23.51%
----------------------------------------------------------------------------

[0098] As is clear from Table 4, the word misrecognition rate of the fusion 2-gram, like its perplexity, is lower than that of the word 2-gram on both evaluation sets A and B. Moreover, whereas the reduction in perplexity on the same task was 3%, the reduction in misrecognition rate is as large as 16%, and on the different task it reaches a remarkable 28%. This shows that the fusion 2-gram is highly effective in continuous word recognition as well, on both the same task and a different task.
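The percentages quoted above can be checked against the values in Tables 3 and 4. The short Python sketch below computes the relative reduction with respect to the word 2-gram; the helper name `relative_reduction` is introduced here for illustration. The quoted figures appear to correspond to the 500-class fusion 2-gram rows, which is an inference from the arithmetic rather than something the text states.

```python
def relative_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Perplexity, word 2-gram vs. fusion 2-gram with 500 classes (Table 3).
ppl_same_task = relative_reduction(14.39, 13.95)
ppl_diff_task = relative_reduction(53.06, 45.32)

# Word misrecognition rate, word 2-gram vs. fusion 2-gram with 500 classes (Table 4).
err_same_task = relative_reduction(14.34, 12.02)
err_diff_task = relative_reduction(32.76, 23.51)

print(round(ppl_same_task), round(ppl_diff_task))  # prints: 3 15
print(round(err_same_task), round(err_diff_task))  # prints: 16 28
```

Rounded to whole percentages, these values agree with the 3%, 15%, 16%, and 28% reductions reported in the text.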

[0099] As described above, the present embodiment proposes a method that uses maximum a posteriori probability estimation with the part-of-speech class N-gram as prior knowledge, as a technique for giving the word N-gram reliability with respect to sparse data and robustness against task mismatch. With this method, a reliable value based on the value of the part-of-speech class N-gram can be assigned even to word pairs that occur only rarely. At the same time, the model inherits the property of the part-of-speech class N-gram of being robust against task mismatch.
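The qualitative behavior described in this paragraph, in which rare word pairs fall back toward the class-based prior while frequent pairs follow the observed counts, can be illustrated with a generic sketch. The code below is not the estimation formula of this patent. It substitutes a common Dirichlet-prior style estimate, and the function name `map_bigram_prob` and the parameter `prior_weight` are hypothetical names introduced here.

```python
def map_bigram_prob(count_pair, count_context, prior_prob, prior_weight=1.0):
    """Generic prior-smoothed estimate of P(w2 | w1).

    count_pair    : number of times the pair (w1, w2) was observed
    count_context : number of times w1 was observed as a context
    prior_prob    : class-based estimate of P(w2 | w1), used as the prior
    prior_weight  : assumed strength of the prior, in pseudo-counts
    """
    return (count_pair + prior_weight * prior_prob) / (count_context + prior_weight)
```

With no observations the estimate equals the prior, and as the counts grow it approaches the relative frequency `count_pair / count_context`, which is the continuous interpolation between prior and posterior knowledge that the text attributes to MAP estimation.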

[0100] Furthermore, by treating the transition probabilities between words in this model as features of the words and performing automatic class classification based on them, the parameter size can be reduced without degrading performance. This model has 2% of the logical parameters and 50% of the entries of the word N-gram, yet its perplexity is 3% lower on the same task and 15% lower on a different task, and in continuous word recognition its word misrecognition rate is lower by 16% and 28%, respectively. The experiments thus confirmed that reliability with respect to sparse data and robustness against task mismatch can be added, and the model size can also be reduced, while retaining the next-word prediction accuracy of the word N-gram.
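The idea of treating each word's transition probabilities as a feature vector and grouping words with similar vectors can be sketched as follows. This is an illustration under stated assumptions, not the clustering procedure of the patent: it substitutes a simple agglomerative scheme that repeatedly merges the two clusters whose centroids are closest, and the names `cluster_words`, `centroid`, and `squared_distance` are introduced here.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_words(features, num_classes):
    """Group words into `num_classes` classes by merging closest centroids.

    features: dict mapping each word to its vector of transition probabilities.
    Returns a list of sets of words.
    """
    clusters = [[w] for w in sorted(features)]
    while len(clusters) > num_classes:
        best = None
        for i in range(len(clusters)):
            ci = centroid([features[w] for w in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([features[w] for w in clusters[j]])
                d = squared_distance(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return [set(c) for c in clusters]
```

Words whose transition probability vectors are nearly identical end up in the same class, so one shared set of class-level parameters can replace their separate word-level parameters, which is where the reduction in model size comes from.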

[0101] Accordingly, as described in detail above, the forward N-gram statistical language model generated by the language model generation unit 20 of the first embodiment cannot be reduced in model size, but it has excellent performance in prediction accuracy, reliability, and robustness. In addition, performing speech recognition with this statistical language model yields an improved speech recognition rate compared with the conventional example.

[0102] The fusion N-gram statistical language model generated by the language model generation unit 30 of the second embodiment has excellent performance in prediction accuracy, reliability, model size, and robustness. In addition, performing speech recognition with this statistical language model yields an improved speech recognition rate compared with the conventional example.

【０１０３】[0103]

[Effects of the Invention] As described in detail above, the statistical language model generation device according to claim 1 of the present invention comprises: first generation means for generating a forward word N-gram statistical language model by calculating, based on learning text data in which the uttered speech sentences of predetermined speakers are transcribed, a first frequency probability that a word to be processed occurs after a word string consisting of a plurality of words; second generation means for generating a forward part-of-speech class-word N-gram statistical language model by calculating, based on the learning text data and part-of-speech class information data containing part-of-speech class information, a second frequency probability that the word to be processed occurs after the part-of-speech class of a first word and a word string consisting of a plurality of words connected after the first word; first calculation means for calculating a forward transition probability that interpolates between the first frequency probability and the second frequency probability by the maximum a posteriori probability estimation method, using the forward part-of-speech class-word N-gram statistical language model generated by the second generation means as prior knowledge and the forward word N-gram statistical language model generated by the first generation means as posterior knowledge; and first processing means for generating a forward N-gram statistical language model by executing predetermined normalization processing and smoothing processing on the forward transition probability calculated by the first calculation means. Accordingly, the generated forward N-gram statistical language model cannot be reduced in model size, but it has excellent performance in prediction accuracy, reliability, and robustness. In addition, performing speech recognition with this statistical language model yields an improved speech recognition rate compared with the conventional example.

[0104] The statistical language model generation device according to claim 2 is the statistical language model generation device according to claim 1, further comprising: third generation means for generating a backward word N-gram statistical language model by calculating, based on the learning text data, a third frequency probability that a word string consisting of a plurality of words connected before the word to be processed occurs; fourth generation means for generating a backward part-of-speech class-word N-gram statistical language model by calculating, based on the learning text data and the part-of-speech class information data, a fourth frequency probability that a word string consisting of a plurality of words connected before the part-of-speech class of the word to be processed occurs; second calculation means for calculating a backward transition probability that interpolates between the third frequency probability and the fourth frequency probability by the maximum a posteriori probability estimation method, using the backward part-of-speech class-word N-gram statistical language model generated by the fourth generation means as prior knowledge and the backward word N-gram statistical language model generated by the third generation means as posterior knowledge; and second processing means for generating a backward word N-gram statistical language model by executing predetermined normalization processing and smoothing processing based on the backward transition probability calculated by the second calculation means. Accordingly, the generated backward N-gram statistical language model cannot be reduced in model size, but it has excellent performance in prediction accuracy, reliability, and robustness. In addition, performing speech recognition with this statistical language model yields an improved speech recognition rate compared with the conventional example.

[0105] The statistical language model generation device according to claim 3 of the present invention comprises first clustering means, second clustering means, and fifth generation means. The first clustering means operates on the transition probabilities of a forward N-gram statistical language model that includes forward transition probabilities obtained by interpolating between a first frequency probability, that a word to be processed occurs after a word string consisting of a plurality of words, and a second frequency probability, that the word to be processed occurs after the part-of-speech class of a first word and a word string consisting of a plurality of words connected after the first word. It assigns the transition probabilities of the forward N-gram statistical language model as features to each word string connected before the word to be processed, performs clustering so that the variation in the features of each class is not reduced, and generates class classification information after clustering. The second clustering means operates on the transition probabilities of a backward N-gram statistical language model that includes backward transition probabilities obtained by interpolating between a third frequency probability, that a word string consisting of a plurality of words connected before the word to be processed occurs, and a fourth frequency probability, that a word string consisting of a plurality of words connected before the part-of-speech class of the word to be processed occurs. It assigns the transition probabilities of the backward N-gram statistical language model as features to each word to be processed, performs clustering so that the variation in the features of each class is not reduced, and generates class classification information after clustering. The fifth generation means generates a fusion N-gram statistical language model by calculating, based on learning text data in which the uttered speech sentences of predetermined speakers are transcribed and with the class classification information generated by the first and second clustering means as the processing target, the frequency probability of a transition from the class of the word string preceding the word to be processed to the class of the word to be processed. Preferably, the interpolated forward transition probabilities are the transition probabilities of the forward N-gram statistical language model generated by the first processing means, the interpolated backward transition probabilities are the transition probabilities of the backward N-gram statistical language model generated by the second processing means, and the learning text data used in the fifth generation means is the learning text data used in the first to fourth generation means. Accordingly, the generated fusion N-gram statistical language model has excellent performance in model size, prediction accuracy, reliability, and robustness. In addition, performing speech recognition with this statistical language model yields an improved speech recognition rate compared with the conventional example.

[0106] Furthermore, the speech recognition device according to claim 5 of the present invention is a speech recognition device provided with speech recognition means for performing speech recognition on the speech signal of an input uttered speech sentence using a predetermined statistical language model. The speech recognition means performs speech recognition using the forward N-gram statistical language model generated by the first processing means according to claim 1, using the backward N-gram statistical language model generated by the second processing means according to claim 2, or using the fusion N-gram statistical language model generated by the fifth generation means according to claim 3 or 4. Accordingly, performing speech recognition with the above statistical language model yields an improved speech recognition rate compared with the conventional example.

[Brief description of the drawings]

FIG. 1 is a block diagram of a continuous speech recognition device according to an embodiment of the present invention.

FIG. 2 is a timing chart showing the processing of the word hypothesis narrowing unit 6 in the continuous speech recognition device of FIG. 1.

FIG. 3 is a graph showing the interpolation of probability estimates performed by the language model generation unit 20 of FIG. 1.

FIG. 4 is a flowchart showing the first language model generation process executed by the language model generation unit 20 of FIG. 1.

FIG. 5 is a flowchart showing the forward language model generation process, which is a subroutine of FIG. 4.

FIG. 6 is a flowchart showing the backward language model generation process, which is a subroutine of FIG. 4.

FIG. 7 is a flowchart showing the second language model generation process executed by the language model generation unit 30 of FIG. 1.

FIG. 8 is a flowchart showing the clustering process for the forward language model, which is a subroutine of FIG. 7.

FIG. 9 is a flowchart showing the clustering process for the backward language model, which is a subroutine of FIG. 7.

FIG. 10 is a flowchart showing the fusion language model generation process, which is a subroutine of FIG. 7.

FIG. 11 is a graph of experimental results for the fusion 2-gram language model generated by the language model generation unit 30 of FIG. 1 (second embodiment) and for the word 2-gram (conventional example), showing the relationship between the number of classes and perplexity.

[Explanation of symbols]

1 ... microphone, 2 ... feature extraction unit, 3, 5 ... buffer memory, 4 ... word collating unit, 6 ... word hypothesis narrowing unit, 11 ... phoneme HMM memory, 12 ... word dictionary memory, 13 ... learning text data memory, 14 ... part-of-speech class information data memory, 20, 30 ... language model generation unit, 21, 22, 31 ... statistical language model memory, SW ... switch.

Continued from the front page: (72) Inventor: Yoshinori Sagisaka (匂坂芳典), c/o ATR Interpreting Telecommunications Research Laboratories, 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto. F-terms (reference): 5B091 AA15 CB12 CC01 CC16 EA01; 5D015 HH23

Claims

[Claims]

1. A statistical language model generation device comprising: first generation means for generating a forward word N-gram statistical language model by calculating, based on learning text data in which the uttered speech sentences of predetermined speakers are transcribed, a first frequency probability that a word to be processed occurs after a word string consisting of a plurality of words; second generation means for generating a forward part-of-speech class-word N-gram statistical language model by calculating, based on the learning text data and part-of-speech class information data containing part-of-speech class information, a second frequency probability that the word to be processed occurs after the part-of-speech class of a first word and a word string consisting of a plurality of words connected after the first word; first calculation means for calculating a forward transition probability that interpolates between the first frequency probability and the second frequency probability by the maximum a posteriori probability estimation method, using the forward part-of-speech class-word N-gram statistical language model generated by the second generation means as prior knowledge and the forward word N-gram statistical language model generated by the first generation means as posterior knowledge; and first processing means for generating a forward N-gram statistical language model by executing predetermined normalization processing and smoothing processing on the forward transition probability calculated by the first calculation means.

2. The statistical language model generation device according to claim 1, further comprising: third generation means for generating a statistical language model of a backward word N-gram by calculating, based on the learning text data, a third frequency probability that a word string composed of a plurality of words connected before the word to be processed occurs; fourth generation means for generating a statistical language model of a backward part-of-speech class-word N-gram by calculating, based on the learning text data and the part-of-speech class information data, a fourth frequency probability that a word string composed of a plurality of words connected before the part-of-speech class of the word to be processed occurs; second calculating means for calculating, by the maximum a posteriori probability estimation method, a backward transition probability obtained by interpolating between the third frequency probability and the fourth frequency probability, using the statistical language model of the backward part-of-speech class-word N-gram generated by the fourth generation means as prior knowledge and the statistical language model of the backward word N-gram generated by the third generation means as posterior knowledge; and second processing means for generating a statistical language model of a backward N-gram by executing predetermined normalization processing and smoothing processing based on the backward transition probability calculated by the second calculating means.

3. A statistical language model generation device comprising: first clustering means for assigning, as a feature amount to each word string connected before a word to be processed, the transition probability of a forward N-gram statistical language model, based on the transition probabilities of the forward N-gram statistical language model including a forward transition probability obtained by interpolating between a first frequency probability that the word to be processed occurs after a word string composed of a plurality of words and a second frequency probability that the word to be processed occurs after the part-of-speech class of a first word and a word string composed of a plurality of words connected after the first word, performing clustering so that the variation in the feature amount of each class is not reduced, and generating class classification information after clustering; second clustering means for assigning, as a feature amount to each word to be processed, the transition probability of a backward N-gram statistical language model, based on the transition probabilities of the backward N-gram statistical language model including a backward transition probability obtained by interpolating between a third frequency probability that a word string composed of a plurality of words connected before the word to be processed occurs and a fourth frequency probability that a word string composed of a plurality of words connected before the part-of-speech class of the word to be processed occurs, performing clustering so that the variation in the feature amount of each class is not reduced, and generating class classification information after clustering; and fifth generation means for generating a statistical language model of a fusion N-gram by calculating, based on learning text data in which uttered speech sentences of a predetermined speaker are transcribed and using the class classification information generated by the first clustering means and the second clustering means as the classes to be processed, a frequency probability of a transition from the class of the word string before the word to be processed to the class of the word to be processed.

4. The statistical language model generation device according to claim 3, wherein the forward transition probability obtained by interpolation is the transition probability of the forward N-gram statistical language model generated by the first processing means; the backward transition probability obtained by interpolation is the transition probability of the backward N-gram statistical language model generated by the second processing means; and the learning text data used in the fifth generation means is the learning text data used in the first to fourth generation means.

5. A speech recognition apparatus comprising speech recognition means for recognizing speech, based on a speech signal of an input uttered speech sentence, using a predetermined statistical language model, wherein the speech recognition means performs speech recognition using the forward N-gram statistical language model generated by the first processing means according to claim 2, the backward N-gram statistical language model generated by the second processing means according to claim 2, or the fusion N-gram statistical language model generated by the fifth generation means according to claim 3 or 4.
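
Illustrative note (not part of the claims): the interpolation recited in claims 1 and 2 can be sketched for the simplest case, a forward bigram. The sketch below assumes one common form of maximum a posteriori estimation with a Dirichlet-style prior, in which the class-word probability acts as prior knowledge and the word bigram counts act as the observed data. The function name `map_interpolated_bigram`, the prior weight `tau`, and the `pos_of` mapping are hypothetical names introduced here. The claims do not state the exact formula, which may differ, and the normalization and smoothing of the first processing means are not reproduced.

```python
from collections import Counter


def map_interpolated_bigram(sentences, pos_of, tau=1.0):
    """Return (prob, vocab), where prob(prev, w) estimates p(w | prev).

    sentences: list of token lists (the learning text data).
    pos_of:    dict mapping each word to its part-of-speech class (assumed given).
    tau:       hypothetical prior weight; a larger tau trusts the class model more.
    """
    word_bigram = Counter()   # c(prev, w): word bigram counts
    word_ctx = Counter()      # c(prev): how often prev is followed by any word
    class_bigram = Counter()  # c(class(prev), w): class-word counts
    class_ctx = Counter()     # c(class(prev))
    vocab = set()

    for sent in sentences:
        vocab.update(sent)
        for prev, w in zip(sent, sent[1:]):
            word_bigram[(prev, w)] += 1
            word_ctx[prev] += 1
            class_bigram[(pos_of[prev], w)] += 1
            class_ctx[pos_of[prev]] += 1

    def prob(prev, w):
        c = pos_of[prev]
        # Prior knowledge: relative frequency of w after the class of prev.
        prior = class_bigram[(c, w)] / class_ctx[c] if class_ctx[c] else 0.0
        # Assumed MAP form: word counts pulled toward the class-based prior.
        return (word_bigram[(prev, w)] + tau * prior) / (word_ctx[prev] + tau)

    return prob, vocab
```

With this form, a word pair never seen in the learning text still receives a non-zero probability whenever the class of the preceding word has been followed by that word, and a frequently observed context is dominated by its own word counts. The backward case of claim 2 would follow the same pattern with the roles of preceding and following words exchanged.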