JP3004254B2

JP3004254B2 - Statistical sequence model generation device, statistical language model generation device, and speech recognition device

Info

Publication number: JP3004254B2
Application number: JP10165030A
Authority: JP
Inventors: サビン・デリン; 芳典匂坂; 秀治中嶋
Original assignee: 株式会社エイ・ティ・アール音声翻訳通信研究所
Priority date: 1998-06-12
Filing date: 1998-06-12
Publication date: 2000-01-31
Anticipated expiration: 2018-06-12
Also published as: US6314399B1; JPH11352994A; EP0964389A3; EP0964389A2

Abstract

An apparatus is disclosed for generating a statistical class sequence model, called a class bi-multigram model, from input strings of discrete-valued units. Bigram dependencies are assumed between adjacent variable-length sequences of at most N units, and class labels are assigned to the sequences. The apparatus counts the number of times each sequence of units occurs and the number of times each pair of sequences co-occurs in the input training strings, and computes an initial bigram probability distribution over all pairs of sequences as the number of times the two sequences co-occur divided by the number of times the first sequence occurs in the input training string. The sequences are then classified into a pre-specified number of classes. Next, an estimate of the bigram probability distribution of the sequences is calculated by using an EM algorithm to maximize the likelihood of the input training string computed with the input probability distributions. These processes are performed iteratively to generate the statistical class sequence model.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a statistical sequence model generation device that generates a statistical sequence model from training sequence data, to a statistical language model generation device that generates a statistical language model from training text data, and to a speech recognition device that uses the statistical language model to recognize the speech signal of an input uttered sentence.

[0002]

[Prior Art] In recent years, methods that use a language model to improve the performance of continuous speech recognition devices have been studied. The aim is to raise the recognition rate and cut computation time by using a language model, which is a sequence model, to predict the next word and narrow the search space. Here, a sequence is, concretely, a word when the units are characters and a phrase when the units are words. A language model that has been widely used recently is the N-gram (where N is a natural number of 2 or more). It is trained on large-scale text data and statistically gives the transition probability from the preceding N-1 words to the next word. The generation probability $P(w_1^L)$ of a string of $L$ words $w_1^L = w_1, w_2, \ldots, w_L$ is expressed by the following equation.

[0003]

(Equation 1) $P(w_1^L) = \prod_{t=1}^{L} P(w_t \mid w_{t+1-N}^{t-1})$

[0004]

Here, $w_t$ denotes the $t$-th word of the word string $w_1^L$, and $w_i^j$ denotes the word string from the $i$-th word to the $j$-th word. In Equation 1, the probability $P(w_t \mid w_{t+1-N}^{t-1})$ is the probability that the word $w_t$ is uttered after the word string $w_{t+1-N}^{t-1}$, consisting of $N-1$ words, has been uttered. Likewise, in what follows, the probability $P(A \mid B)$ means the probability that the word $A$ is uttered after the word or word string $B$ has been uttered. The symbol $\Pi$ in Equation 1 denotes the product of the probabilities $P(w_t \mid w_{t+1-N}^{t-1})$ for $t = 1$ to $L$.
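As an illustration of Equation 1, the following minimal Python sketch estimates N-gram probabilities by relative frequency and multiplies them over a sentence. The function names `train_ngram` and `sentence_probability`, the `<s>` padding symbol, and the toy corpus are assumptions made for this example and do not come from the patent.

```python
from collections import Counter
from math import prod

def train_ngram(sentences, n):
    """Count n-grams and their (n-1)-word histories; '<s>' pads the sentence start."""
    ngrams, histories = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] * (n - 1) + list(words)
        for t in range(n - 1, len(padded)):
            history = tuple(padded[t - n + 1:t])
            ngrams[history + (padded[t],)] += 1
            histories[history] += 1
    return ngrams, histories

def sentence_probability(words, ngrams, histories, n):
    """Product over t of P(w_t | previous N-1 words), as in Equation 1."""
    padded = ["<s>"] * (n - 1) + list(words)
    probs = []
    for t in range(n - 1, len(padded)):
        history = tuple(padded[t - n + 1:t])
        if histories[history] == 0:      # unseen history: no estimate available
            return 0.0
        probs.append(ngrams[history + (padded[t],)] / histories[history])
    return prod(probs)

# Toy corpus (hypothetical): two three-word sentences, bigram model (N = 2).
corpus = [["a", "b", "c"], ["a", "b", "d"]]
ngrams, histories = train_ngram(corpus, n=2)
p = sentence_probability(["a", "b", "c"], ngrams, histories, n=2)
```

No smoothing is applied here, so any unseen N-gram drives the whole product to zero; a practical model would need to handle that case.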

[0005]

In recent years, many techniques have been proposed for improving continuous speech recognition with N-gram statistical language models, and several of these models exploit word dependencies that span variable-length word sequences. What these models have in common is that they relax the fixed-length dependency assumption of the conventional N-gram model, and between them they cover a variety of broader assumptions.

[0006]

To derive phrases in a purely statistical way (that is, without grammatical rules such as those of Stochastic Context Free Grammars), some criterion is needed. The following criteria, for example, have been proposed:

(a) the leave-one-out likelihood disclosed in Prior Art Document 1, K. Ries et al., "Class phrase models for language modeling", Proceedings of ICSLP 96, 1996; and

(b) the entropy disclosed in Prior Art Document 2, H. Masataki et al., "Variable-order n-gram generation by word-class splitting and consecutive word grouping", Proceedings of ICASSP 96, 1996.

[0007]

[Problems to Be Solved by the Invention] In these methods, using a likelihood criterion within a statistical framework makes it possible to optimize with the EM (Expectation-Maximization) algorithm, but the result tends to be overtrained. In addition, the optimization relies on heuristics, for example in Prior Art Document 3, S. Matsunaga et al., "Variable-length language modeling integrating global constraints", Proceedings of EUROSPEECH 97, 1997, so neither convergence nor optimality of the statistical language model is theoretically guaranteed.

[0008]

The problems that arise with the likelihood criterion proposed in, for example, Prior Art Document 1 are as follows.

<Problem 1> The frequency probabilities of word sequences are obtained by a greedy algorithm, so monotonic convergence toward an optimum is not guaranteed.

<Problem 2> The method is deterministic. That is, if the sequence [bcd] is in the inventory of sequences, then whenever "bcd" occurs in the input string it is never split into sub-sequences such as [bc]+[d], [b]+[cd], or [b]+[c]+[d]. In other words, there is no freedom in how the input is parsed into sequences.

<Problem 3> The definition of sequence classes rests on a prior classification of words. Words are classified first, and each sequence of word-class labels is then used to define a sequence class. Consequently, sequences of different lengths cannot be placed in the same class. For example, "thank you for" and "thank you very much for" cannot belong to the same class.

[0009]

To address this, the present inventors, in Prior Art Document 4, S. Deligne et al., "Introducing statistical dependencies and structural constraints in variable-length sequence models", in Grammatical Inference: Learning Syntax from Sentences, Lecture Notes in Artificial Intelligence 1147, pp. 156-167, Springer, 1996, considered a statistical language model using multigrams, which are variable-length sequences, and showed only that its parameters could in principle be computed with equation (16) of that document. However, equation (16) is not in a form that can actually be evaluated on a digital computer, so the model could not be put to practical use. Here, a multigram is a variable-length sequence for which no dependency on other sequences is specified.

[0010]

An object of the present invention is to solve the above problems and to provide a statistical sequence model generation device, a statistical language model generation device, and a speech recognition device which, compared with the conventional examples, guarantee monotonic convergence toward an optimum, allow freedom in the parsing result, can handle variable-length sequences within the same class, and can generate a statistical model with practical, high-speed processing on a digital computer.

[0011]

[Means for Solving the Problems] A statistical sequence model generation device according to the present invention generates, from input data containing sequences that are unit strings of one or more units, a statistical sequence model of bi-multigrams, a bi-multigram being a bigram between a multigram that is a variable-length string of $N_1$ units and a multigram that is a variable-length string of $N_2$ units, where $N_1$ and $N_2$ are natural numbers. The device comprises:

initialization means for counting, from the input data and subject to predetermined maximum values of $N_1$ and $N_2$, the frequency probabilities of the bigrams for all combinations of unit strings;

classification means for classifying, based on the bigram frequency probabilities counted by the initialization means, into a predetermined number of classes by merging the pair of classes whose merging causes the smallest loss of mutual information and updating the frequency probability of each class, and for computing and outputting the unit strings contained in each resulting class, the class-conditional frequency probabilities of the unit strings, and the bigram frequency probabilities between the classes;

re-estimation means for re-estimating, with the EM algorithm so as to obtain maximum likelihood estimates, from the unit strings contained in each class, the class-conditional frequency probabilities of the unit strings, and the bigram frequency probabilities between classes output by the classification means, wherein the forward-backward algorithm is used to re-estimate the bigram frequency probabilities between sequences with an equation that expresses them, for each unit string being processed, in terms of the forward likelihood of the unit strings that can precede it in time, the frequency probability of the unit string conditioned on the immediately preceding unit string, and the backward likelihood of the unit strings that can follow it in time, the re-estimation means thereby generating and outputting the bi-multigram statistical sequence model as the re-estimation result; and

control means for causing the processing of the classification means and the processing of the re-estimation means to be repeated until a predetermined termination condition is satisfied.

[0012]

In the above statistical sequence model generation device, the initialization means further removes, from the counted bigram frequency probabilities, the data of bigram combinations whose frequency probability is at or below a predetermined value.
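The counting and pruning performed by the initialization means can be sketched as follows. This is a simplified illustration, not the patented implementation: it assumes a single training string, one shared maximum length for both sequences of a pair, and pruning on raw counts. The function name `initial_bigram_probabilities` and the parameter `min_count` are hypothetical. Each initial value is the co-occurrence count of a pair divided by the count of its first sequence.

```python
from collections import Counter

def initial_bigram_probabilities(units, max_len, min_count=0):
    """Count every adjacent pair of sub-sequences (each 1..max_len units long)
    and return count(first, second) / count(first) for the pairs that are kept."""
    seq_counts, pair_counts = Counter(), Counter()
    T = len(units)
    for i in range(T):
        for l1 in range(1, max_len + 1):
            if i + l1 > T:
                break
            first = tuple(units[i:i + l1])
            seq_counts[first] += 1
            for l2 in range(1, max_len + 1):
                if i + l1 + l2 > T:
                    break
                second = tuple(units[i + l1:i + l1 + l2])
                pair_counts[first, second] += 1
    # Pruning step (assumed to act on counts): drop pairs seen min_count times or fewer.
    return {pair: c / seq_counts[pair[0]]
            for pair, c in pair_counts.items() if c > min_count}

probs = initial_bigram_probabilities(list("abab"), max_len=2)
pruned = initial_bigram_probabilities(list("abab"), max_len=2, min_count=1)
```

Because overlapping sequences of different lengths are all counted, these initial values for a given first sequence need not sum to one; in this sketch they serve only as a starting point for the later re-estimation.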

[0013]

Further, in the above statistical sequence model generation device, the classification means classifies into the plurality of classes using the Brown algorithm, based on the bigram frequency probabilities counted by the initialization means.
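The merging criterion, minimum loss of mutual information, can be illustrated with a deliberately naive Python sketch. It assumes the input is a dictionary of pair counts keyed by (first, second) sequence, recomputes the average mutual information from scratch for every candidate merge, and omits the class-conditional probabilities that the classification means also outputs. The names `mutual_information` and `brown_cluster` and the toy counts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(pair_counts, cls):
    """Average mutual information between the classes of adjacent sequences."""
    joint, left, right = Counter(), Counter(), Counter()
    total = sum(pair_counts.values())
    for (x, y), c in pair_counts.items():
        joint[cls[x], cls[y]] += c
        left[cls[x]] += c
        right[cls[y]] += c
    return sum(c / total * log(c * total / (left[a] * right[b]))
               for (a, b), c in joint.items())

def brown_cluster(pair_counts, num_classes):
    """Greedily merge the class pair whose merge loses the least mutual information."""
    items = sorted({s for pair in pair_counts for s in pair})
    cls = {s: i for i, s in enumerate(items)}      # start with one class per sequence
    while len(set(cls.values())) > num_classes:
        best = None
        for a, b in combinations(sorted(set(cls.values())), 2):
            trial = {s: (a if c == b else c) for s, c in cls.items()}
            mi = mutual_information(pair_counts, trial)
            if best is None or mi > best[0]:        # highest remaining MI = smallest loss
                best = (mi, trial)
        cls = best[1]
    return cls

# Toy counts (hypothetical): two phrases of different length share the same contexts.
counts = {("thank you for", "coming"): 3, ("thank you very much for", "coming"): 3,
          ("coming", "and"): 6,
          ("and", "thank you for"): 3, ("and", "thank you very much for"): 3}
classes = brown_cluster(counts, num_classes=3)
```

In this toy example the two phrases of different length end up in the same class, which is the behaviour that Problem 3 of the prior art rules out. Recomputing the mutual information for every candidate is costly; the sketch trades efficiency for clarity.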

[0014]

In the above statistical sequence model generation device, the equation computes, for each unit string being processed in the input data, the bigram frequency probability between sequences of unit strings when a second unit string, namely the unit string being processed, follows a first unit string in the input data. This bigram frequency probability is obtained by dividing the sum of the likelihoods over all segmentations that contain the first and second unit strings by the sum of the likelihoods over all segmentations that contain the first unit string. The equation has a denominator representing the average number of times each unit string occurs in the input data, and a numerator representing the average number of times, for each unit string, that the second unit string follows the first unit string in the input data. The numerator is the sum, over the unit strings being processed, of the product of the forward likelihood, the frequency probability of the unit string conditioned on the immediately preceding unit string, and the backward likelihood. The denominator is the sum, over the unit strings being processed, of the product of the forward likelihood, the frequency probabilities of all unit strings conditioned on the immediately preceding unit string, and the backward likelihood.
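The ratio described above can be sketched in Python for a single training string. This is an illustrative reading of the text under stated assumptions, not the patent's Equations 22 to 24: `alpha[t, l]` is taken to be the forward likelihood of the first `t` units when the last sequence has length `l`, `beta[t, l]` the backward likelihood of the remaining units given that last sequence, and `p(current, previous)` the current bigram probability supplied by the caller. The function name `reestimate_bigrams` is hypothetical, and the start symbol `#` is used only to score the first sequence.

```python
def reestimate_bigrams(units, max_len, p):
    """One forward-backward pass returning re-estimated p(second | first)."""
    T = len(units)

    def lengths(t):
        return range(1, min(max_len, t) + 1)

    # Forward likelihoods.
    alpha = {}
    for t in range(1, T + 1):
        for l in lengths(t):
            cur = units[t - l:t]
            if t == l:
                alpha[t, l] = p(cur, "#")
            else:
                alpha[t, l] = sum(alpha[t - l, k] * p(cur, units[t - l - k:t - l])
                                  for k in lengths(t - l))
    # Backward likelihoods.
    beta = {}
    for t in range(T, 0, -1):
        for l in lengths(t):
            if t == T:
                beta[t, l] = 1.0
            else:
                beta[t, l] = sum(p(units[t:t + k], units[t - l:t]) * beta[t + k, k]
                                 for k in lengths(T - t))
    # Numerator: likelihood mass of segmentations containing first followed by second.
    # Denominator: the same mass summed over every possible second sequence.
    num, den = {}, {}
    for t in range(1, T):
        for l in lengths(t):
            first = units[t - l:t]
            for k in lengths(T - t):
                second = units[t:t + k]
                w = alpha[t, l] * p(second, first) * beta[t + k, k]
                num[first, second] = num.get((first, second), 0.0) + w
                den[first] = den.get(first, 0.0) + w
    return {pair: w / den[pair[0]] for pair, w in num.items() if den[pair[0]] > 0}

# Toy run (hypothetical): every bigram starts at 0.5, sequences of at most 2 units.
new_p = reestimate_bigrams("abc", max_len=2, p=lambda cur, prev: 0.5)
```

For the string "abc" the three segmentations [a][b][c], [a][bc], and [ab][c] carry weights 0.125, 0.25, and 0.25, so the sequence [a] is followed by [bc] with re-estimated probability 2/3 and by [b] with probability 1/3.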

[0015]

Further, in the above statistical sequence model generation device, the termination condition is that the number of iterations of the processing of the classification means and the processing of the re-estimation means has reached a predetermined number.

[0016]

A statistical language model generation device according to the present invention is the above statistical sequence model generation device in which the unit is a character of a natural language, the sequence is a word, the classification means classifies character strings into strings of words, and the statistical sequence model is a statistical language model.

[0017]

Further, a statistical language model generation device according to the present invention is the above statistical sequence model generation device in which the unit is a word of a natural language, the sequence is a phrase, the classification means classifies word strings into strings of phrases, and the statistical sequence model is a statistical language model.

[0018]

Still further, a speech recognition device according to the present invention comprises speech recognition means for recognizing speech, using a predetermined statistical language model, from the speech signal of an input uttered sentence, wherein the speech recognition means recognizes the speech by referring to the statistical language model generated by the above statistical language model generation device.

[0019]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. The following embodiments describe two examples: one in which the unit is a character and a character string, which is a sequence of characters, is classified into word strings, and one in which the unit is a word and a word string, which is a sequence of words, is classified into phrases. The present invention is not limited to these. The unit may be DNA, with a DNA string, which is a sequence of DNA, classified into predetermined DNA sequences. The unit may also be a base, with a base string, which is a sequence of bases, classified into predetermined codons.

[0020]

FIG. 1 is a block diagram of a continuous speech recognition device according to an embodiment of the present invention. The continuous speech recognition device of this embodiment includes a statistical language model generation unit 20 that, using a working RAM 30, generates a variable-length bi-multigram language model from text data, that is, character strings, stored in a training text data memory 21. As shown in FIG. 3, the processing of the statistical language model generation unit 20 consists broadly of a classification process using the Brown algorithm (step S3) and a re-estimation process using bi-multigrams (step S4).

[0021]

That is, the statistical language model generation device of this embodiment generates, from input data containing sequences of character strings each consisting of one or more characters, a bi-multigram statistical language model, a bi-multigram being a bigram between a variable-length string of $N_1$ characters and a variable-length string of $N_2$ characters, where $N_1$ and $N_2$ are natural numbers. As shown in FIG. 3, its processing includes:

(a) an initialization process (step S2) that counts, from the input data and subject to predetermined maximum values of $N_1$ and $N_2$, the frequency probabilities of the bigrams for all combinations of character strings;

(b) a classification process (step S3) that, based on the bigram frequency probabilities counted by the initialization process, classifies into a predetermined number of classes by merging the pair of classes whose merging causes the smallest loss of mutual information and updating the frequency probability of each class, and that computes and outputs the character strings contained in each resulting class, the class-conditional frequency probabilities of the character strings, and the bigram frequency probabilities between the classes;

(c) a re-estimation process (step S4) that re-estimates, with the EM algorithm so as to obtain maximum likelihood estimates, from the character strings contained in each class, the class-conditional frequency probabilities of the character strings, and the bigram frequency probabilities between classes obtained by the classification process. Here the forward-backward algorithm is used to re-estimate the bigram frequency probabilities between sequences with an equation (Equations 22 to 24) that expresses them, for each character string being processed, in terms of the forward likelihood of the character strings that can precede it in time, the frequency probability of the character string conditioned on the immediately preceding character string, and the backward likelihood of the character strings that can follow it in time. The bi-multigram statistical sequence model is thereby generated and output as the re-estimation result; and

(d) a process (step S5) that causes the classification process and the re-estimation process to be repeated until a predetermined termination condition is satisfied.

[0022]

This embodiment focuses on phrase-based methods, as opposed to methods based on word N-grams. Here, sentences are structured into phrases, and frequency probabilities are assigned to phrases instead of words. Whether a model is N-gram based or phrase based, it is either a deterministic model or a stochastic model. In a phrase-based framework, non-determinism is introduced through ambiguity in how a sentence is parsed into phrases. In practice this means that even if "abc" is registered as a phrase, the probability that the string is parsed as, for example, [ab][c] is not zero. In a deterministic approach, by contrast, every co-occurrence of a, b, and c is systematically interpreted as an occurrence of the phrase [abc].

[0023]

In this embodiment, the statistical language model processing is carried out with bi-multigrams. The bi-multigram language model is a phrase-based stochastic model whose parameters are estimated according to a likelihood criterion.

[0024]

First, the theoretical formulation of multigrams is described. In the multigram framework, a sentence of $T$ words

(Equation 2) $W = w_{(1)} w_{(2)} \ldots w_{(T)}$

is assumed to be a concatenation of phrases, each of at most $n$ words. Let $S$ denote a segmentation into $T_S$ phrases, and let $s_{(t)}$ denote the phrase with time index $t$ (the serial number counted from the start) in segmentation $S$. The result of segmenting $W$ by $S$ can then be written as follows.

(Equation 3) $(W, S) = s_{(1)} \ldots s_{(T_S)}$

[0025]

The dictionary of phrases obtained by segmentation is formed by combining 1, 2, ..., up to $n$ words from the vocabulary, and is written as follows.

(Equation 4) $D_s = \{ s_j \}_j$

The likelihood of the sentence is computed as the sum of the likelihoods of the individual segmentations, as in the following equation.

[0026]

(Equation 5) $L(W) = \sum_{S \in \{S\}} L(W, S)$

[0027]

In the decision-oriented version of the model, the sentence $W$ is parsed according to its most likely segmentation, which gives the following approximation.

[0028]

(Equation 6) $L^{*}(W) = \max_{S \in \{S\}} L(W, S)$
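The search for the most likely segmentation can be sketched with a small Viterbi-style dynamic program. The sketch assumes a bigram dependency between phrases, a non-empty input string, and a caller-supplied function `bigram(phrase, previous)`; the names `best_segmentation` and `toy`, and the toy probabilities, are hypothetical.

```python
def best_segmentation(units, max_len, bigram):
    """Return (score, phrases) for the most likely segmentation of a non-empty string."""
    T = len(units)
    # best[t, l] = (score, length of the previous phrase) for the first t units,
    # given that the last phrase covers the final l of them.
    best = {}
    for t in range(1, T + 1):
        for l in range(1, min(max_len, t) + 1):
            phrase = units[t - l:t]
            if t == l:                       # first phrase, preceded by the empty sequence '#'
                best[t, l] = (bigram(phrase, "#"), 0)
                continue
            candidates = [(best[t - l, k][0] * bigram(phrase, units[t - l - k:t - l]), k)
                          for k in range(1, min(max_len, t - l) + 1)]
            best[t, l] = max(candidates)
    # Pick the best final phrase length, then follow the back-pointers.
    score, l = max((best[T, l][0], l) for l in range(1, min(max_len, T) + 1))
    phrases, t = [], T
    while t > 0:
        phrases.append(units[t - l:t])
        t, l = t - l, best[t, l][1]
    return score, phrases[::-1]

def toy(phrase, previous):
    """Hypothetical probabilities: 'ab' and 'cd' are likely phrases, all others are not."""
    return {"ab": 0.6, "cd": 0.6}.get(phrase, 0.1)

score, phrases = best_segmentation("abcd", max_len=3, bigram=toy)
```

With these toy values the parse [ab][cd] wins with score 0.36, ahead of every segmentation that uses a less likely phrase.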

[0029]

Assuming an n-gram dependency between phrases, the likelihood of a particular segmentation $S$ is computed as in the following equation.

[0030]

(Equation 7) $L(W, S) = \prod_{t=1}^{T_S} p(s_{(t)} \mid s_{(t-n+1)} \ldots s_{(t-1)})$

[0031]

From here on, $n$ denotes the order of dependency between phrases and is used like the $n$ of conventional n-gram notation, while $n_{max}$ denotes the maximum phrase length. The following equation gives an example of a likelihood computation: the likelihood of "abcd" under a bi-multigram model ($n_{max} = 3$, $n = 2$). The symbol # denotes the empty sequence.

[0032]

(Equation 8)
$$\begin{aligned}
\text{likelihood} ={}& p([a] \mid \#)\, p([b] \mid [a])\, p([c] \mid [b])\, p([d] \mid [c]) \\
&+ p([a] \mid \#)\, p([b] \mid [a])\, p([cd] \mid [b]) \\
&+ p([a] \mid \#)\, p([bc] \mid [a])\, p([d] \mid [bc]) \\
&+ p([a] \mid \#)\, p([bcd] \mid [a]) \\
&+ p([ab] \mid \#)\, p([c] \mid [ab])\, p([d] \mid [c]) \\
&+ p([ab] \mid \#)\, p([cd] \mid [ab]) \\
&+ p([abc] \mid \#)\, p([d] \mid [abc])
\end{aligned}$$

[0033]

As Equation 8 shows, the likelihood is the sum of the frequency probabilities over all the ways of segmenting the sequence "abcd".
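The seven terms for "abcd" can be reproduced by enumerating the segmentations directly. The Python sketch below is only an illustration: the function names `segmentations` and `likelihood` are hypothetical, and the constant 0.5 used for every bigram is an arbitrary placeholder rather than a proper probability distribution.

```python
def segmentations(units, max_len):
    """Yield every way to cut `units` into consecutive phrases of 1..max_len units."""
    if not units:
        yield []
        return
    for l in range(1, min(max_len, len(units)) + 1):
        for rest in segmentations(units[l:], max_len):
            yield [units[:l]] + rest

def likelihood(units, max_len, bigram):
    """Sum over all segmentations of the product of bigram(phrase, previous phrase)."""
    total = 0.0
    for seg in segmentations(units, max_len):
        prob, prev = 1.0, "#"            # '#' stands for the empty sequence
        for phrase in seg:
            prob *= bigram(phrase, prev)
            prev = phrase
        total += prob
    return total

segs = list(segmentations("abcd", max_len=3))
value = likelihood("abcd", max_len=3, bigram=lambda phrase, prev: 0.5)
```

With a maximum phrase length of 3 the string "abcd" has seven segmentations, one per term of the sum, and [abcd] itself is excluded because it is too long. Exhaustive enumeration grows quickly with the length of the input, which is why a dynamic-programming recursion is preferable in practice.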

[0034]

Next, estimation of the language model parameters is described. An n-gram model of multigrams is completely defined by a set of parameters $\Theta$, which consists of the n-gram conditional probabilities for every combination of $n$ phrases from the dictionary $D_s$:

(Equation 9) $\Theta = \{ p(s_{in} \mid s_{i1} \ldots s_{in-1}) \mid s_{i1} \ldots s_{in} \in D_s \}$

An estimate of the parameter set $\Theta$ can be obtained as a maximum likelihood estimate from incomplete data, where the unknown data is the underlying segmentation $S$. Iterative maximum likelihood estimates of $\Theta$ can therefore be computed with the well-known EM (Expectation-Maximization) algorithm. Let $Q(k, k+1)$ be the auxiliary function of the following equation, computed with the likelihoods at iterations $k$ and $k+1$.

[0035]

(Equation 10)

[0036]

As shown for the well-known EM algorithm, if

(Equation 11) $Q(k, k+1) \geq Q(k, k)$

then

(Equation 12) $L^{(k+1)}(W) \geq L^{(k)}(W)$

Therefore the re-estimation formula at iteration $k+1$,

(Equation 13) $p^{(k+1)}(s_{in} \mid s_{i1} \ldots s_{in-1})$

can be derived directly, as in the following equation, by maximizing the auxiliary function $Q(k, k+1)$ with respect to the model parameters $\Theta^{(k+1)}$ under the constraint

(Equation 14)

In this specification, a subscript of a subscript and a subscript of a superscript cannot be typeset, so the lower-level subscript notation is omitted.

[0037]

(Equation 15)

[0038]

Here, $c(s_{i1} \ldots s_{in}, S)$ denotes the number of occurrences of the phrase combination $s_{i1} \ldots s_{in}$ in segmentation $S$. The re-estimation formula of Equation 15 is evaluated with the forward-backward algorithm (hereinafter also called the FB method), as described in detail later for bi-multigrams ($n = 2$). In the decision-oriented approach, the re-estimation formula simplifies to the following equation.

【００３９】[0039]

【数１６】 $p^{(k+1)}(s_{in} \mid s_{i1} \ldots s_{in-1}) = \dfrac{c(s_{i1} \ldots s_{in-1} s_{in}, S^{*(k)})}{c(s_{i1} \ldots s_{in-1}, S^{*(k)})}$

【００４０】Here, S^*(k) is the parse of the sentence that maximizes L^(k)(S | W), and it is obtained with the Viterbi algorithm. Each iteration improves the language model in the sense that it increases the likelihood L^(k)(W), and the procedure eventually converges to a critical point (possibly a local maximum). The set of model parameters Θ is initialized with the relative frequencies of all phrase combinations observed in the training corpus, that is, in the learning text data.
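The decision-oriented re-estimation of Equation 16 can be illustrated for the bigram case (n = 2) with a minimal Python sketch. It assumes that the best parse S* is already available as a list of phrases. The function name `reestimate_from_parse` and the sample parse are hypothetical, and counting a phrase as a history only where it has a successor is a simplification of this sketch rather than something stated in the specification.

```python
from collections import Counter


def reestimate_from_parse(segments):
    """Relative-frequency estimate of p(cur | prev) from one parse (n = 2)."""
    # Count each phrase where it acts as a history (it has a successor).
    history_counts = Counter(segments[:-1])
    # Count each pair of adjacent phrases in the parse.
    pair_counts = Counter(zip(segments[:-1], segments[1:]))
    return {
        (prev, cur): count / history_counts[prev]
        for (prev, cur), count in pair_counts.items()
    }


# Hypothetical best parse S* of a digit-string corpus (illustration only).
best_parse = ["one", "six", "one", "eight", "six", "three", "two"]
probs = reestimate_from_parse(best_parse)
print(probs[("six", "one")])  # 0.5: "six" is a history twice, once followed by "one"
```

With this counting convention the estimates for each history sum to one, which keeps the result a valid conditional distribution.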

【００４１】Next, the clustering (classification processing) of variable-length phrases is described. According to prior art document 1, models based on class-phrases have attracted attention in recent years, but they usually assume conventional word clustering. Typically, each word is first assigned the label C_k of the class to which it belongs, and variable-length phrases [C_k1, C_k2 … C_kn] of word-class labels are then derived. Each variable-length phrase defines the label, written "<[C_k1, C_k2 … C_kn]>", of the class to which the phrase belongs. With this approach, however, the same phrase-class label can be assigned only to phrases of the same length. For example, the phrases "thank you for" and "thank you very much for" cannot be assigned the same class label. As a solution to this limitation, the present embodiment proposes clustering phrases directly instead of words. To this end, a bigram dependency between two phrases (n_max = 2) is assumed, and the bi-multigram training procedure described above is modified so that each iteration consists of the following two steps.

【００４２】(I) Step SS1: Class assignment (corresponds to step S3 in FIG. 3)

【数１７】 $\{p^{(k)}(s_j \mid s_i)\} \rightarrow \{p^{(k)}(C_{k(sj)} \mid C_{k(si)}),\ p^{(k)}(s_j \mid C_{k(sj)})\}$

(II) Step SS2: Multigram re-estimation (corresponds to step S4 in FIG. 3)

【数１８】 $\{p^{(k)}(C_{k(sj)} \mid C_{k(si)}),\ p^{(k)}(s_j \mid C_{k(sj)})\} \rightarrow \{p^{(k+1)}(s_j \mid s_i)\}$

【００４３】Step SS1 takes the phrase bigram frequency probabilities as input and outputs the class bigram frequency probabilities. The class assignment is performed by maximizing the mutual information between adjacent phrases, for example according to prior art document 5, P. F. Brown et al., "Class-based n-gram models of natural language", Computational Linguistics, Vol. 18, No. 4, pp. 467-479, 1992. Here, the candidates for clustering are phrases rather than words. As described above, {p^(0)(s_j | s_i)} is initialized with the relative frequencies of phrase co-occurrences in the learning text data. Step SS2 re-estimates the phrase frequency probabilities with the multigram re-estimation formula (Equation 15) or its approximation (Equation 16). The only difference is that the likelihood of a parse is computed by the following equation.

【００４４】[0044]

【数１９】 [Equation 19]

【００４５】This is equivalent to re-estimating the frequency probability p^(k+1)(s_j | s_i) from the product p^(k)(C_k(sj) | C_k(si)) × p^(k)(s_j | C_k(sj)), in the same way as was done above from the frequency probability p^(k)(s_j | s_i).
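The substitution described in paragraph [0045] can be illustrated with a short Python sketch that scores a parse with the product p(C_k(sj) | C_k(si)) × p(s_j | C_k(sj)). The class labels, the probability values, and the names `class_bigram_prob` and `parse_log_likelihood` are assumptions made for illustration; the exact form of Equation 19 is not reproduced here.

```python
import math


def class_bigram_prob(s_j, s_i, phrase_class, p_class_bigram, p_phrase_given_class):
    """Class-based stand-in for p(s_j | s_i): p(C(s_j) | C(s_i)) * p(s_j | C(s_j))."""
    c_i, c_j = phrase_class[s_i], phrase_class[s_j]
    return p_class_bigram[(c_j, c_i)] * p_phrase_given_class[(s_j, c_j)]


def parse_log_likelihood(segments, **model):
    """Sum of log class-based bigram probabilities over adjacent phrases of a parse."""
    return sum(
        math.log(class_bigram_prob(cur, prev, **model))
        for prev, cur in zip(segments[:-1], segments[1:])
    )


# Hypothetical model: two phrases of different length share the class "THANKS".
model = {
    "phrase_class": {
        "thank you for": "THANKS",
        "thank you very much for": "THANKS",
        "your help": "OBJ",
    },
    "p_class_bigram": {("OBJ", "THANKS"): 0.8},
    "p_phrase_given_class": {("your help", "OBJ"): 0.5},
}
p_short = class_bigram_prob("your help", "thank you for", **model)
p_long = class_bigram_prob("your help", "thank you very much for", **model)
log_l = parse_log_likelihood(["thank you for", "your help"], **model)
```

Because both histories map to the same class, `p_short` and `p_long` are identical, which is the sharing across phrase lengths that direct phrase clustering is meant to allow.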

【００４６】In summary, step SS1 ensures that the class assignment based on the mutual information criterion is optimal with respect to the current phrase distribution, and step SS2 ensures that the phrase frequency probabilities optimize the likelihood computed according to Equation 19 with the current class frequency probabilities. The training data is thus structured iteratively at both the paradigmatic and the syntagmatic level (both are linguistic terms) in a fully integrated way. That is, the paradigmatic relations between phrases expressed by the class assignment influence the re-estimation of the phrase frequency probabilities, and the phrase frequency probabilities determine the subsequent class assignment.

【００４７】In the present embodiment, as described above, the forward-backward algorithm (FB method) is used to estimate the bi-multigram parameters. This is described in detail below.

【００４８】Using the forward-backward algorithm, Equation 15 can be computed with a complexity of O(n_max² T), where n_max is the maximum sequence length and T is the number of words in the corpus (learning text data). The complexity O(n_max² T) corresponds to the order of the computational cost: the cost of computing Equation 15 is proportional to the square of the maximum sequence length n_max and to the number of words in the corpus. In the present embodiment, the numerator and the denominator of Equation 15 are basically computed by summing over the time index (t) of the words rather than over the set of segmentations {S}. This computation relies on the following definitions of the forward variable α(t, l_i) and the backward variable β(t, l_j).

【００４９】[0049]

【数２０】 $\alpha(t, l_i) = L(W_{(1)}^{(t-li)} \mid [W_{(t-li+1)}^{(t)}])$

【数２１】 $\beta(t, l_j) = L(W_{(t+1)}^{(T)} \mid [W_{(t-lj+1)}^{(t)}])$

【００５０】The forward variable α(t, l_i) is the likelihood of the first t words, where the last l_i words are constrained to form one sequence. The backward variable β(t, l_j) is the conditional likelihood of the last (T − t) words, given that they follow the sequence [w_(t-lj+1) … w_(t)]. Here, for example, W_(1)^(t-li) denotes the word string consisting of the words from time index (1) to (t − l_i). Assuming that the likelihood of a parse is computed by Equation 7, Equation 15 can be rewritten as follows.

【００５１】[0051]

【数２２】 $p^{(k+1)}(s_j \mid s_i) = p_c / p_d$, where

【数２３】 (Equation 23)

【数２４】 (Equation 24)

【００５２】Here, l_i and l_j are the lengths of the sequences s_i and s_j, respectively. The Kronecker function δ_k(t) equals 1 if the word sequence starting at time index t is s_k, and 0 otherwise. The variables α and β can be computed with the following recursion (or induction) formulas, where start and end symbols are assumed at time indices t = 0 and t = T + 1, respectively.

【００５３】For 1 ≤ t ≤ T + 1:

【数２５】 (Equation 25) where

【数２６】 α(0, 1) = 1, α(0, 2) = … = α(0, n_max) = 0.

【００５４】For 0 ≤ t ≤ T:

【数２７】 (Equation 27) where

【数２８】 β(T + 1, 1) = 1, β(T + 1, 2) = … = β(T + 1, n_max) = 0.

【００５５】When the likelihood of a parse is computed under the class assumption, that is, according to Equation 19, the term p^(k)(s_j | s_i) in the re-estimation formula (Equations 22 to 24) is replaced by its class equivalent, p^(k)(C_k(sj) | C_k(si)) p^(k)(s_j | C_k(sj)). In the recursion for α, the term p([W_(t-li+1)^(t)] | [W_(t-li-l+1)^(t-li)]) is replaced by the corresponding class bigram probability multiplied by the class-conditional probability of the sequence [W_(t-li+1)^(t)]. The same change is made to the recursion for the variable β.

【００５６】Next, the re-estimation processing using the forward-backward algorithm in the present embodiment is described in detail with reference to an example. The forward-backward re-estimation rearranges the terms of Equation 15 so that the sums in the numerator and in the denominator of Equation 22 are taken over the time index t of the units in the training data instead of over the set of possible parses {S}. This approach relies on the definitions of the forward variable α and the backward variable β.
(a) Paragraph <<A1>> below assumes that there are no classes.
(b) Paragraph <<A1.1>> below defines the variables α and β and gives examples.
(c) Paragraph <<A1.2>> below illustrates the forward-backward re-estimation of a frequency probability using the variables α and β.
(d) Paragraph <<A1.3>> below illustrates how the variables α and β are computed by recursion (or induction).
(e) Paragraph <<A2>> below shows how paragraphs <<A1.2>> and <<A1.3>> are modified when classes are present.
(f) All of the examples below are based on the data shown in the following table.

【００５７】[0057]

【表１】[Table 1]
Input training data: o  n  e  s  i  x  o  n  e  e  i  g  h  t  s  i  x  t  h  r  e  e  t  w  o
Unit time index:     1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
(Note) Each character of the training data corresponds to one time index.

【００５８】<<A1.1>> Definition of the forward variable α and the backward variable β
The variable α(t, l) is the likelihood of the data up to time index (t), ending with a sequence of length l. For example, α(9, 3) is the likelihood of the sequence "o n e s i x o_n_e". The variable β(t, l) is the conditional likelihood of the data starting at time index (t + 1), given that a sequence of length l is known to end at time index (t). For example, β(9, 3) is the likelihood of the sequence "e i g h t s i x t h r e e t w o" given that the preceding sequence is "o_n_e". Examples of computing the variables α and β by recursion or induction are given in paragraph <<A1.3>> below.

【００５９】<<A1.2>> Re-estimation of probabilities based on the variables α and β
As an example, the re-estimation formula for the frequency probability p(o_n_e | s_i_x) using the variables α and β is shown for the training data above. The general re-estimation formula (Equation 15) for the frequency probability p(o_n_e | s_i_x) has the following meaning.
(a) The numerator is the average number of times the sequence "o_n_e" follows the sequence "s_i_x" in the training data.
(b) The denominator is the average number of times the sequence "s_i_x" occurs in the training data.
(c) The averages are taken over all possible parses of the training data sequence.

【００６０】The numerator (Equation 23) and the denominator (Equation 24) of the forward-backward re-estimation formula (Equations 22 to 24) are equal to the numerator and the denominator of Equation 15, respectively, but they are computed by summing over time indices rather than over the set of parses. In the numerator of the re-estimation formula (Equation 15), the likelihood of each possible parse is added once for every time the two sequences "s_i_x" and "o_n_e" occur consecutively in it. In the forward-backward re-estimation formula (Equations 22 to 24), on the other hand, the likelihoods of all parses in which "s_i_x" and "o_n_e" occur consecutively and "o_n_e" starts at time index (t + 1) are first grouped and summed. The computation is then completed by summing over the time index t.

【００６１】In the example above, the two sequences "s_i_x" and "o_n_e" occur consecutively with "o_n_e" starting at time index (7) only. The sum of the likelihoods of all parses in which "s_i_x" and "o_n_e" occur consecutively and "o_n_e" starts at time index (7) is the likelihood of the sequence "o n e s_i_x o_n_e e i g h t s i x t h r e e t w o", which is equal to the following.

【数２９】 (Equation 29)

【００６２】Here, the second factor p(o_n_e | s_i_x) is the frequency probability at iteration (k). By the definition of the forward variable α, α(6, 3) is the likelihood of the sequence "o n e s_i_x", and by the definition of the backward variable β, β(9, 3) is the likelihood of the sequence "e i g h t s i x t h r e e t w o" given the sequence "o_n_e".

【００６３】In the denominator of Equation 15, the likelihood of each possible parse is added as many times as the sequence "s_i_x" occurs in that parse. In the equivalent forward-backward formulation, the likelihoods of all parses in which the sequence "s_i_x" occurs and ends at time index (t) are first grouped and summed, and the computation is then completed by summing over the time index t.

【００６４】In the example above, the sequence "s_i_x" occurs ending at time index (6) and at time index (17). The sum of the likelihoods of all parses in which "s_i_x" occurs ending at time index (6) is the likelihood of the sequence "o n e s_i_x o_n_e e i g h t s i x t h r e e t w o", which is equal to the following.

【００６５】[0065]

【数３０】 [Equation 30]

【００６６】Here, by the definition of the forward variable α, α(6, 3) is the likelihood of the sequence "o n e s_i_x", and by the definition of the backward variable β, β(9, 3) is the likelihood of the sequence "e i g h t s i x t h r e e t w o" given the sequence "o_n_e".

【００６７】Next, the sum of the likelihoods of all parses in which the sequence "s_i_x" ends at time index (17) is the likelihood of the sequence "o n e s i x o n e e i g h t s_i_x t_h_r_e_e t w o", which is equal to the following.

【００６８】[0068]

【数３１】 (Equation 31)

【００６９】Here, by the definition of the forward variable α, α(17, 3) is the likelihood of the sequence "o n e s i x o n e e i g h t s_i_x", and by the definition of the backward variable β, β(22, 5) is the likelihood of the sequence "t w o" given the sequence "t_h_r_e_e".

【００７０】Therefore, for the training data "o n e s i x o n e e i g h t s i x t h r e e t w o", the forward-backward re-estimation formula for the frequency probability p(o_n_e | s_i_x) at iteration (k + 1) is as follows.

【００７１】[0071]

【数３２】 (Equation 32) where

【数３３】 [Equation 33]

【数３４】 (Equation 34)

【００７２】As described above, a feature of the embodiment of the present invention is that Equation 22, including Equations 23 and 24, is formulated with the forward-backward algorithm. These equations have the following meaning. They compute, for each unit sequence to be processed in the input data, the frequency probability of the bigram between sequences in which that unit sequence, as the second unit sequence, follows a first unit sequence in the input data. The bigram frequency probability between sequences is obtained by dividing the sum of the likelihoods of all segmentations that contain the first and second unit sequences by the sum of the likelihoods of all segmentations that contain the first unit sequence. The formula has a denominator that represents the average number of times each unit sequence occurs in the input data, and a numerator that represents, for each unit sequence, the average number of times the second unit sequence follows the first unit sequence in the input data. The numerator is the sum, over each unit sequence to be processed, of the products of the forward likelihood, the frequency probability of that unit sequence conditioned on the immediately preceding unit sequence, and the backward likelihood. The denominator is the sum, over each unit sequence to be processed, of the products of the forward likelihood, the frequency probabilities of all unit sequences conditioned on the immediately preceding unit sequence, and the backward likelihood.

【００７３】<<A1.3>> Example of computing the forward variable α and the backward variable β
As an example, the variables α(9, 3) and β(9, 3) are computed below for the data "o n e s i x o n e e i g h t s i x t h r e e t w o". Here, α(9, 3) is the likelihood of the sequence "o n e s i x o_n_e", that is, of the data up to time index 9 ending with a sequence of length 3. The variable β(9, 3) is the conditional likelihood of the sequence "e i g h t s i x t h r e e t w o" given the sequence "o_n_e", that is, of the data after time index 9 when the preceding sequence "o_n_e" is known.

【００７４】The likelihood up to the sequence "o_n_e" (forward variable) α(9, 3) is computed as follows, for the case where the maximum sequence length is set to 5.

【数３５】α(9, 3) = the sum of the following terms:
(a) for n_e_s_i_x: α(6, 5) × p(o_n_e | n_e_s_i_x)
(b) for e_s_i_x: α(6, 4) × p(o_n_e | e_s_i_x)
(c) for s_i_x: α(6, 3) × p(o_n_e | s_i_x)
(d) for i_x: α(6, 2) × p(o_n_e | i_x)
(e) for x: α(6, 1) × p(o_n_e | x)

【００７５】The backward likelihood (backward variable) β(9, 3), conditioned on the sequence "o_n_e", is computed as follows.

【数３６】β(9, 3) = the sum of the following terms:
(a) for e_i_g_h_t: p(e_i_g_h_t | o_n_e) × β(9 + 5, 5)
(b) for e_i_g_h: p(e_i_g_h | o_n_e) × β(9 + 4, 4)
(c) for e_i_g: p(e_i_g | o_n_e) × β(9 + 3, 3)
(d) for e_i: p(e_i | o_n_e) × β(9 + 2, 2)
(e) for e: p(e | o_n_e) × β(9 + 1, 1)
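The recursions illustrated by Equations 35 and 36, together with the forward-backward re-estimation of p(o_n_e | s_i_x), can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names, the `"<s>"` and `"</s>"` boundary symbols, and the uniform placeholder probability are hypothetical, and the denominator is written as α(t, l_i) × β(t, l_i), which by the β recursion equals the sum over all following sequences.

```python
def forward_backward(units, prob, n_max):
    """Return (alpha, beta) keyed by (t, l), using 1-based time indices.

    prob(s_j, s_i) stands in for p(s_j | s_i); s_i and s_j are tuples of units.
    A start symbol sits at index 0 and an end symbol at index T + 1, and each
    may only form a sequence of length 1 (an assumption of this sketch).
    """
    T = len(units)
    w = ["<s>"] + list(units) + ["</s>"]

    def seq(t, l):
        # Sequence of length l ending at time index t.
        return tuple(w[t - l + 1 : t + 1])

    def lengths(t):
        # Lengths of the sequences that are allowed to end at time index t.
        if t < 0 or t > T + 1:
            return []
        if t == 0 or t == T + 1:
            return [1]
        return list(range(1, min(n_max, t) + 1))

    alpha = {(0, 1): 1.0}
    for t in range(1, T + 2):
        for l in lengths(t):
            # Sum over the length lp of the sequence that ends at t - l.
            alpha[(t, l)] = sum(
                alpha[(t - l, lp)] * prob(seq(t, l), seq(t - l, lp))
                for lp in lengths(t - l)
            )

    beta = {(T + 1, 1): 1.0}
    for t in range(T, -1, -1):
        for l in lengths(t):
            # Sum over the length ln of the sequence that starts at t + 1.
            beta[(t, l)] = sum(
                prob(seq(t + ln, ln), seq(t, l)) * beta[(t + ln, ln)]
                for ln in range(1, n_max + 1)
                if ln in lengths(t + ln)
            )
    return alpha, beta


def reestimate(units, prob, n_max, s_i, s_j):
    """Forward-backward re-estimate of p(s_j | s_i), summing over time indices."""
    alpha, beta = forward_backward(units, prob, n_max)
    T, l_i, l_j = len(units), len(s_i), len(s_j)
    w = [None] + list(units)  # pad so that w[t] is the unit at time index t
    num = den = 0.0
    for t in range(l_i, T + 1):
        if tuple(w[t - l_i + 1 : t + 1]) != s_i:
            continue  # s_i does not end at time index t
        den += alpha[(t, l_i)] * beta[(t, l_i)]
        if tuple(w[t + 1 : t + l_j + 1]) == s_j:
            num += alpha[(t, l_i)] * prob(s_j, s_i) * beta[(t + l_j, l_j)]
    return num / den


data = "onesixoneeightsixthreetwo"
uniform = lambda s_j, s_i: 0.1  # placeholder probability, not a trained model
alpha, beta = forward_backward(data, uniform, n_max=5)
p_new = reestimate(data, uniform, 5, tuple("six"), tuple("one"))
```

A useful sanity check on such an implementation is that the total likelihood read from the forward pass, `alpha[(T + 1, 1)]`, agrees with the one read from the backward pass, `beta[(0, 1)]`.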

【００７６】<<A2>> Case with classes
When the sequences belong to classes, the variables α and β are computed by replacing the bigram probability terms in the example above as follows.
(a) p(o_n_e | n_e_s_i_x) is replaced by p(class of o_n_e | class of n_e_s_i_x) × p(o_n_e | class of o_n_e).
(b) p(o_n_e | e_s_i_x) is replaced by p(class of o_n_e | class of e_s_i_x) × p(o_n_e | class of o_n_e).
(c) p(o_n_e | s_i_x) is replaced by p(class of o_n_e | class of s_i_x) × p(o_n_e | class of o_n_e).
(d) p(o_n_e | i_x) is replaced by p(class of o_n_e | class of i_x) × p(o_n_e | class of o_n_e).
(e) p(o_n_e | x) is replaced by p(class of o_n_e | class of x) × p(o_n_e | class of o_n_e).
(f) p(e_i_g_h_t | o_n_e) is replaced by p(class of e_i_g_h_t | class of o_n_e) × p(e_i_g_h_t | class of e_i_g_h_t).
(g) p(e_i_g_h | o_n_e) is replaced by p(class of e_i_g_h | class of o_n_e) × p(e_i_g_h | class of e_i_g_h).
(h) p(e_i_g | o_n_e) is replaced by p(class of e_i_g | class of o_n_e) × p(e_i_g | class of e_i_g).
(i) p(e_i | o_n_e) is replaced by p(class of e_i | class of o_n_e) × p(e_i | class of e_i).
(j) p(e | o_n_e) is replaced by p(class of e | class of o_n_e) × p(e | class of e).

【００７７】<Statistical language model generation processing>
FIG. 3 is a flowchart showing the statistical language model generation processing executed by the statistical language model generation unit 20 of FIG. 1. As shown in FIG. 1, the statistical language model generation unit 20 includes a working RAM 30 divided into the following memories 31 to 36.
(a) Parameter memory 31: stores the various setting parameters used in the generation processing.
(b) Sequence frequency probability memory 32: stores the computed frequency probability of each sequence.
(c) Class definition memory 33: stores the character strings belonging to each estimated class.
(d) Class-conditional frequency probability memory 34: stores the estimated frequency probability of each character string belonging to each class, that is, the class-conditional frequency probability of the character strings.
(e) Class bigram frequency probability memory 35: stores the frequency probabilities of the class bigrams.
(f) Segmented sequence memory 36: stores the segmented sequences (character strings) obtained after the re-estimation processing.

【００７８】In FIG. 3, first, in step S1, text data is read from the learning text data memory 21. The input learning text data is a sequence of discrete units, where a unit is, for example, a character, and a sequence is a character string that may be a word or a sentence. The following input parameters are set in advance and stored in the parameter memory 31:
(a) the maximum sequence length (expressed as a number of units);
(b) the number of classes after the re-estimation processing;
(c) the discard threshold for sequence counts (that is, the minimum number of occurrences at or below which a sequence is discarded); and
(d) the termination condition.
The termination condition is, for example, a threshold on the iteration count k.

【００７９】Next, in step S2, initialization processing is executed. The relative frequencies of the sequences consisting of multiple units are counted in the input learning text data, and the frequency probability of each sequence is initialized from them. Sequences whose count is at or below the discard threshold set above are discarded. The iteration parameter k is then reset to 0.
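The initialization of step S2 can be sketched in Python as follows, using the rule stated in the abstract: the initial bigram probability of a pair of sequences is the number of times the two sequences co-occur divided by the number of times the first sequence occurs. The function name `initialize` and its parameter names are hypothetical, and counting every substring of up to `n_max` units at every position is an assumption of this sketch.

```python
from collections import Counter


def initialize(units, n_max, min_count):
    """Initial p(s2 | s1) from raw counts of sequences of at most n_max units."""
    T = len(units)
    # Count every sequence of 1..n_max units at every start position.
    seq_counts = Counter(
        tuple(units[t : t + l])
        for t in range(T)
        for l in range(1, n_max + 1)
        if t + l <= T
    )
    # Discard sequences whose count is at or below the threshold.
    kept = {s for s, c in seq_counts.items() if c > min_count}
    # Count co-occurrences of adjacent kept sequences.
    pair_counts = Counter()
    for t in range(T):
        for l1 in range(1, n_max + 1):
            s1 = tuple(units[t : t + l1])
            if t + l1 > T or s1 not in kept:
                continue
            for l2 in range(1, n_max + 1):
                s2 = tuple(units[t + l1 : t + l1 + l2])
                if t + l1 + l2 <= T and s2 in kept:
                    pair_counts[(s1, s2)] += 1
    return {(s1, s2): c / seq_counts[s1] for (s1, s2), c in pair_counts.items()}


p0 = initialize("onesixoneeightsixthreetwo", n_max=5, min_count=1)
print(p0[(tuple("six"), tuple("one"))])  # 0.5: "six" occurs twice, once before "one"
```

The returned dictionary plays the role of {p^(0)(s_j | s_i)} at iteration k = 0; sequences that were discarded simply have no entry.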

【００８０】Next, in step S3, classification processing using the Brown algorithm is executed. Based on the frequency probability of each sequence at iteration k, this processing computes the class definitions, the class-conditional frequency probabilities of the sequences, and the class bigram frequency probabilities at iteration k so that the loss of mutual information between classes is minimized, and outputs them to the memories 32 to 35, where they are stored. The classification criterion in this processing is the mutual information between adjacent sequences, and the algorithm mentioned above is used. This criterion and algorithm were proposed by Brown for adjacent words, and the present embodiment uses the Brown algorithm. However, the present invention is not limited to this, and other classification algorithms based on the frequency probabilities of the units can be used.
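The classification criterion of step S3, the mutual information between the classes of adjacent sequences, can be illustrated with the following Python sketch. It only evaluates the criterion for a given class assignment; the greedy merging of the Brown algorithm is not shown. The function name `average_mutual_information` and the toy class labels are hypothetical.

```python
import math
from collections import Counter


def average_mutual_information(class_pairs):
    """Average mutual information (in bits) between adjacent classes.

    class_pairs is a list of (left_class, right_class) tuples, one per pair
    of adjacent sequences in a parse of the training data.
    """
    pair_counts = Counter(class_pairs)
    total = sum(pair_counts.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in pair_counts.items():
        left[c1] += n   # marginal count of c1 as the left class
        right[c2] += n  # marginal count of c2 as the right class
    # Sum of p(c1, c2) * log2( p(c1, c2) / (p(c1) * p(c2)) ).
    return sum(
        (n / total) * math.log2(n * total / (left[c1] * right[c2]))
        for (c1, c2), n in pair_counts.items()
    )


# Toy assignments: strictly alternating classes versus independent classes.
alternating = [("A", "B"), ("B", "A"), ("A", "B"), ("B", "A")]
independent = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")]
mi_alt = average_mutual_information(alternating)  # 1.0 bit
mi_ind = average_mutual_information(independent)  # 0.0 bits
```

A class merge that lowers this quantity the least would be preferred under the "minimum loss of mutual information" criterion described above.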

[0081] Next, in step S4, a re-estimation process using the bi-multigram is executed with Equations (22) to (24), which are obtained with reference to the forward-backward algorithm. Based on the class definitions, the class-conditional frequency probabilities of the sequences, and the class bigram frequency probabilities at iteration parameter k computed in the immediately preceding step S3, this process re-estimates the frequency probability of each sequence at iteration parameter (k+1) so as to obtain the maximum likelihood estimate of the bigram frequency probabilities between sequences, and outputs the result to the memory 32 for storage. The criterion in this process is the maximum likelihood estimate, that is, the maximum of the likelihoods of the parses computed with Equations (22) to (24) under the assumed class and bigram dependencies between sequences. The EM algorithm is used for the re-estimation.

[0082] Next, in step S5, it is determined whether a predetermined termination condition is satisfied. If NO, the iteration parameter k is incremented by 1 in step S6, and steps S3 and S4 are repeated. If YES in step S5, the data of the generated statistical language model is output to the statistical language model memory 22 and stored. The data of the generated statistical language model is data on the frequency probability of each sequence, specifically:
(a) the data of each sequence, with its maximum likelihood estimate, obtained when the input data is segmented into a plurality of sequences;
(b) the class definitions, that is, the sequences in each class; and
(c) the class frequency probabilities, that is, the bigram probability of each class and the class-conditional probability of each sequence.
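The control flow of steps S2 to S6 can be summarized in a short skeleton. All names here are hypothetical, the four callables stand in for processing the patent describes elsewhere, and the `max_iter` safeguard is an assumption added for the example.

```python
def train_class_bimultigram(init, classify, reestimate, converged, max_iter=10):
    """Alternate classification (S3) and re-estimation (S4) until S5 says stop."""
    seq_probs = init()                                # step S2: initialization, k = 0
    k = 0
    while True:
        classes = classify(seq_probs)                 # step S3: classification
        seq_probs = reestimate(classes, seq_probs)    # step S4: EM re-estimation
        if converged(k, seq_probs) or k + 1 >= max_iter:  # step S5: termination test
            return seq_probs, classes
        k += 1                                        # step S6: next iteration
```

The skeleton only fixes the order of the steps; what `classify` and `reestimate` compute is left to the caller.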

[0083] FIG. 4 is a flowchart showing the classification process using the Brown algorithm, which is a subroutine of FIG. 3. Brown et al. proposed an algorithm for the automatic classification of words (see, for example, Prior Art Document 5), and the present embodiment uses it for the automatic classification of sequences. Brown et al. showed that the partition into classes that maximizes the likelihood of a text is also the partition that maximizes the mutual information between adjacent words. They proposed a greedy algorithm that takes the bigram distribution of words as input and outputs a partition into word classes and the class distributions. The present inventors apply this algorithm by taking as input the bi-multigram frequency probability distribution (that is, the bigram frequency probability distribution of sequences). The output is the partition of the sequences into classes and the associated frequency probability distribution of each sequence.

[0084] The clustering of words by mutual information used in this classification process is now described in detail (see, for example, Prior Art Document 6: Kenji Kita et al., "Speech Language Processing", Morikita Publishing, pp. 110-113, November 15, 1996). The method described here classifies words based on their adjacent words by maximizing the mutual information between classes. The theoretical basis of mutual-information clustering is that, in a bigram class model, the maximum likelihood partition of words into classes is the class assignment that maximizes the average mutual information of adjacent classes. An N-gram class model is a language model that approximates the word N-gram by the combination of an N-gram of word classes and the per-class word distribution, as in the following equation. (If word classes are replaced by parts of speech, this equation is the same as the HMM equation used in morphological analysis. This word classification method can therefore also be regarded as a method of automatically finding an optimal part-of-speech system.)

[Equation 37]
$$P(w_i \mid w_1^{i-1}) \approx P(w_i \mid c_i)\, P(c_i \mid c_{i-n+1}^{i-1})$$

[0085] Assume that V words are partitioned into C classes by a function π that maps each word $w_i$ to a class $c_i$. Given a training text $t_1^T$, π should be chosen to maximize $P(t_2^T \mid t_1) = P(t_2 \mid t_1)\, P(t_3 \mid t_2) \cdots P(t_T \mid t_{T-1})$. Details are omitted, but the per-word log-likelihood $L(\pi)$, the word entropy $H(w)$, and the average mutual information $I(c_1; c_2)$ of adjacent classes approximately satisfy the following relation.

[0086]

[Equation 38]

[0087] Since $H(w)$ does not depend on the partition π, maximizing $L(\pi)$ amounts to maximizing $I(c_1; c_2)$. No algorithm is currently known that finds the partition maximizing the average mutual information. However, even the following greedy algorithm, used in the present embodiment, yields quite interesting clusters. A method that generates clusters with inclusion relations in this way is called hierarchical clustering. In contrast, a method that generates non-overlapping clusters, such as the k-means algorithm, is called non-hierarchical clustering.
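For reference, the relation among these three quantities in the class bigram analysis of Brown et al. is usually written as shown below. This is a sketch of the standard derivation and not a transcription of the patent's own Equation 38; the normalization by T − 1 is an assumption.

```latex
\begin{aligned}
L(\pi) &= \frac{1}{T-1}\,\log P(t_2^T \mid t_1) \\
       &\approx \sum_{c_1, c_2} P(c_1, c_2)\,\log\frac{P(c_2 \mid c_1)}{P(c_2)}
          \;+\; \sum_{w} P(w)\,\log P(w) \\
       &= I(c_1; c_2) \;-\; H(w)
\end{aligned}
```

The second line follows from writing $\log\bigl(P(c_2 \mid c_1)\,P(w_2 \mid c_2)\bigr)$ as $\log\frac{P(c_2 \mid c_1)}{P(c_2)} + \log P(w_2)$, which separates the part that depends on the partition from the part that does not.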

[0088] Repeating the following merge V−1 times places all words in a single class; the order in which the classes are merged thus yields a binary tree whose leaves are the words.
1. Assign each word to its own class.
2. Among all possible pairs of classes, select the pair whose merge minimizes the loss of average mutual information, and merge the two into one class.
3. Repeating step 2 V−C times yields C classes.

[0089] In general, the hierarchical structure representing the process by which clusters are formed is called a dendrogram, and in natural language processing it can be used in place of a thesaurus. Implemented naively, this suboptimal algorithm requires on the order of V⁵ computation for a vocabulary of size V. However, by exploiting the facts that (1) only the change in information caused by merging two clusters needs to be computed, and (2) merging two clusters changes only part of the overall mutual information, the cost can be reduced to O(V³), that is, a cost proportional to the cube of V.

[0090] In FIG. 4, which shows the classification (or clustering) process, an initialization process is first executed in step S11, in which each sequence is assigned to its own class; that is, each sequence $s_i$ is assigned to its own class $C_i$. The initial class bigram frequency probability distribution is therefore equal to the bigram frequency probability distribution of the sequences, and

[Equation 39]
$$p(s_i \mid C_i) = 1.$$

[0091] Next, in step S12, for each pair of classes $(C_k, C_l)$, the loss of mutual information incurred by merging class $C_k$ and class $C_l$ is computed. In step S13, the pair of classes with the smallest loss of mutual information is merged. In step S14, the class frequency probability distributions stored in the memories 34 and 35 are updated according to the merge. In step S15, it is determined whether the required number of classes set in the initialization process of step S2 has been obtained. If NO, the process returns to step S12 and the above processing is repeated. If YES in step S15, the process returns to the main routine.
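Steps S11 to S15 can be sketched as a naive greedy merge. This illustration assumes the input is a joint bigram distribution over items, given as a dict keyed by (left, right) pairs that sums to 1. It recomputes the full mutual information for every candidate pair, so it does not use the O(V³) shortcuts mentioned earlier, and it omits the update of the class-conditional probabilities. All names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def average_mutual_information(joint):
    """I(c1; c2) for a joint distribution over (left class, right class) pairs."""
    left, right = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        left[a] += p
        right[b] += p
    return sum(p * math.log(p / (left[a] * right[b]))
               for (a, b), p in joint.items() if p > 0)


def merge(joint, keep, drop):
    """Joint distribution after relabeling class `drop` as class `keep`."""
    merged = defaultdict(float)
    for (a, b), p in joint.items():
        a = keep if a == drop else a
        b = keep if b == drop else b
        merged[(a, b)] += p
    return dict(merged)


def greedy_cluster(joint, n_classes):
    # S11: every item starts in its own class.
    members = {}
    for a, b in joint:
        members.setdefault(a, [a])
        members.setdefault(b, [b])
    # S15: stop once the requested number of classes is reached.
    while len(members) > n_classes:
        base = average_mutual_information(joint)
        best = None
        # S12: loss of mutual information for every candidate pair.
        for k, l in combinations(sorted(members), 2):
            candidate = merge(joint, k, l)
            loss = base - average_mutual_information(candidate)
            if best is None or loss < best[0]:
                best = (loss, k, l, candidate)
        # S13 and S14: merge the cheapest pair and keep the updated distribution.
        _, k, l, joint = best
        members[k].extend(members.pop(l))
    return members


# Toy data: odd-number words always precede even-number words and vice versa.
odd, even = ["one", "three"], ["two", "four"]
joint = {(a, b): 1 / 8 for a in odd for b in even}
joint.update({(b, a): 1 / 8 for a in odd for b in even})
classes = greedy_cluster(joint, 2)
```

On this toy distribution, merging two words of the same parity loses no mutual information, so the two resulting classes separate the odd words from the even words.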

[0092] <Speech Recognition Apparatus> Next, the configuration and operation of the continuous speech recognition apparatus shown in FIG. 1 are described. In FIG. 1, the phoneme HMMs in the phoneme hidden Markov model (hereinafter, a hidden Markov model is referred to as an HMM) memory 11 connected to the word matching unit 4 are each represented as a set of states, and each state has the following information:
(a) a state number;
(b) an acceptable context class;
(c) lists of preceding and succeeding states;
(d) the parameters of the output probability density distribution; and
(e) the self-transition probability and the transition probabilities to succeeding states.
Because it must be possible to identify which speaker each distribution originates from, the phoneme HMMs used in the present embodiment are generated by converting a predetermined speaker-mixture HMM. The output probability density function is a Gaussian mixture distribution with 34-dimensional diagonal covariance matrices. The word dictionary in the word dictionary memory 12 connected to the word matching unit 4 stores, for each word, a symbol string giving its reading in terms of the symbols of the phoneme HMMs in the phoneme HMM memory 11.

[0093] In FIG. 1, a speaker's uttered speech is input to the microphone 1, converted into a speech signal, and input to the feature extraction unit 2. The feature extraction unit 2 A/D-converts the input speech signal and then performs, for example, LPC analysis to extract 34-dimensional feature parameters comprising the log power, 16th-order cepstrum coefficients, the Δ log power, and 16th-order Δ cepstrum coefficients. The time series of the extracted feature parameters is input to the word matching unit 4 via the buffer memory 3.
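The 34 dimensions are 1 + 16 + 1 + 16. The sketch below only illustrates how such a vector is assembled for one frame. Computing the Δ terms as a simple difference from the previous frame is an assumption, since the text does not say how they are obtained, and the function name is hypothetical.

```python
def frame_features(log_power, cepstrum, prev_log_power, prev_cepstrum):
    """Stack static and delta parameters for one frame: 1 + 16 + 1 + 16 = 34."""
    delta_power = log_power - prev_log_power  # assumed first difference
    delta_cepstrum = [c - p for c, p in zip(cepstrum, prev_cepstrum)]
    return [log_power, *cepstrum, delta_power, *delta_cepstrum]


vector = frame_features(1.0, [0.0] * 16, 0.5, [0.0] * 16)
```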

[0094] Using the one-pass Viterbi decoding method, the word matching unit 4 detects word hypotheses and computes and outputs their likelihoods, based on the feature parameter data input via the buffer memory 3 and using the phoneme HMMs 11 and the word dictionary 12. For each HMM state at each time, the word matching unit 4 computes the likelihood within the word and the likelihood from the start of the utterance. Likelihoods are kept separately for each combination of word identification number, word start time, and preceding word. To reduce the amount of computation, grid hypotheses with low likelihood among the total likelihoods computed from the phoneme HMMs 11 and the word dictionary 12 are pruned. The word matching unit 4 outputs the resulting word hypotheses and likelihood information, together with time information measured from the utterance start time (specifically, for example, frame numbers), to the word hypothesis narrowing unit 6 via the buffer memory 5.

[0095] Based on the word hypotheses output from the word matching unit 4 via the buffer memory 5, and referring to the statistical language model in the statistical language model memory 22, the word hypothesis narrowing unit 6 narrows down the word hypotheses as follows: among word hypotheses for the same word that have the same end time but different start times, it keeps, for each first-phoneme environment of the word, the single word hypothesis with the highest total likelihood computed from the utterance start time to the end time of the word. It then outputs, as the recognition result, the word string of the hypothesis with the highest total likelihood among the word strings of all the remaining word hypotheses. In the present embodiment, the first-phoneme environment of the word to be processed is preferably the sequence of three phonemes consisting of the final phoneme of the word hypothesis preceding the word and the first two phonemes of the word hypothesis of the word.

[0096] For example, as shown in FIG. 2, suppose the i-th word $W_i$, consisting of the phoneme string a1, a2, ..., an, follows the (i−1)-th word $W_{i-1}$, and six hypotheses Wa, Wb, Wc, Wd, We, and Wf exist as word hypotheses for $W_{i-1}$. Suppose the final phoneme of the first three word hypotheses Wa, Wb, and Wc is /x/ and that of the last three word hypotheses Wd, We, and Wf is /y/. Among the hypotheses with the same end time te and the same first-phoneme environment (in FIG. 2, the top three word hypotheses, whose first-phoneme environment is "x/a1/a2"), all but the one with the highest total likelihood (for example, the topmost hypothesis in FIG. 2) are deleted. The fourth hypothesis from the top is not deleted because its first-phoneme environment differs, that is, the final phoneme of its preceding word hypothesis is y rather than x. In other words, only one hypothesis is kept for each final phoneme of the preceding word hypothesis. In the example of FIG. 2, one hypothesis is kept for the final phoneme /x/ and one for the final phoneme /y/.
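The narrowing rule in this example can be sketched as a group-by on (word, end time, first-phoneme environment) that keeps the highest-likelihood member of each group. The record layout, field names, and log-likelihood values below are hypothetical and are not the apparatus's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordHypothesis:
    word: str
    phonemes: tuple          # phoneme string a1, a2, ..., an of this word
    start: int               # start frame
    end: int                 # end frame
    prev_final_phoneme: str  # final phoneme of the preceding word hypothesis
    total_likelihood: float  # total (log) likelihood from the utterance start


def narrow_hypotheses(hypotheses):
    """Keep one hypothesis per (word, end time, first-phoneme environment)."""
    best = {}
    for h in hypotheses:
        # Environment: preceding final phoneme plus the first two phonemes of the word.
        environment = (h.prev_final_phoneme,) + tuple(h.phonemes[:2])
        key = (h.word, h.end, environment)
        if key not in best or h.total_likelihood > best[key].total_likelihood:
            best[key] = h
    return list(best.values())


phonemes = ("a1", "a2", "a3")
hypotheses = [
    WordHypothesis("Wi", phonemes, 10, 50, "x", -120.0),
    WordHypothesis("Wi", phonemes, 12, 50, "x", -125.0),
    WordHypothesis("Wi", phonemes, 14, 50, "x", -130.0),
    WordHypothesis("Wi", phonemes, 11, 50, "y", -128.0),
    WordHypothesis("Wi", phonemes, 13, 50, "y", -122.0),
    WordHypothesis("Wi", phonemes, 15, 50, "y", -135.0),
]
kept = narrow_hypotheses(hypotheses)
```

As in the FIG. 2 example, six hypotheses reduce to two: one for the preceding final phoneme /x/ and one for /y/.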

[0097] In the above embodiment, the first-phoneme environment of the word is defined as the sequence of three phonemes consisting of the final phoneme of the word hypothesis preceding the word and the first two phonemes of the word hypothesis of the word. However, the present invention is not limited to this; the environment may be a phoneme sequence consisting of a phoneme string of the preceding word hypothesis, which includes its final phoneme and at least one phoneme of the preceding word hypothesis contiguous with the final phoneme, and a phoneme string that includes the first phoneme of the word hypothesis of the word.

[0098] In the above embodiment, the feature extraction unit 2, the word matching unit 4, the word hypothesis narrowing unit 6, and the statistical language model generation unit 20 are implemented by a computer such as a digital computer, and the buffer memories 3 and 5, the phoneme HMM memory 11, the word dictionary memory 12, the training text data memory 21, and the statistical language model memory 22 are implemented by a storage device such as a hard disk memory.

[0099] In the above embodiment, speech recognition is performed using the word matching unit 4 and the word hypothesis narrowing unit 6. However, the present invention is not limited to this; the apparatus may instead comprise, for example, a phoneme matching unit that refers to the phoneme HMMs 11 and a speech recognition unit that recognizes words by referring to the statistical language model using, for example, the One Pass DP algorithm.

[0100]

[Examples] <First Example of Statistical Language Model Generation Processing> In this example, the input training data is a string of 1000 characters such as the following, and the units (characters) are segmented into words.
"o n e s i x o n e e i g h t f i v e z e r o ..."
Here, a word denoting an odd number is always followed by a word denoting an even number, and a word denoting an even number is always followed by a word denoting an odd number. The input parameters in this example are as follows:
(a) maximum length of a sequence = 5;
(b) number of classes = 2; and
(c) threshold for discarding sequences = 100.

[0101] In the initialization process (k = 0), the relative counts of all character combinations observed more than 100 times in the training data are used as initial values. The frequency probability distribution of the sequences at iteration parameter k = 0 is therefore as shown in the following table, where nb(·) denotes the count of a sequence.

[0102]

[Table 2]
――――――――――――――――――――――――――――――――――
p(n | o) = nb(on) / nb(o) = 0.08
p(n e | o) = nb(one) / nb(o) = 0.06
...
p(n e s i x | o) = nb(onesix) / nb(o) = 0.005
p(e | o n) = nb(one) / nb(on) = 0.9
p(e s | o n) = nb(ones) / nb(on) = 0.005
...
p(e s i x o | o n) = nb(onesixo) / nb(on) = 0.001
...
p(s i x | o n e) = nb(onesix) / nb(one) = 0.05
...
――――――――――――――――――――――――――――――――――
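Each entry of the table is a ratio of two counts: the count of the concatenated pair divided by the count of the first sequence. A minimal sketch on a short toy string follows. The helper names are hypothetical, and the resulting values are for the toy string only, not the table's values.

```python
def nb(text, seq):
    """Number of (possibly overlapping) occurrences of seq in text."""
    return sum(text.startswith(seq, i) for i in range(len(text)))


def initial_bigram(text, first, second):
    """p(second | first) = nb(first + second) / nb(first)."""
    return nb(text, first + second) / nb(text, first)


text = "onesixoneeightfivezero"
p_n_given_o = initial_bigram(text, "o", "n")        # nb(on) / nb(o)
p_six_given_one = initial_bigram(text, "one", "six")  # nb(onesix) / nb(one)
```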

[0103] In the classification process of step S3, the input data is the frequency probability distribution of the sequences at iteration parameter k = 0, and the output data of the classification process is as follows.
(a) Class definitions at iteration parameter k = 1

[Equation 40]
class1 = {e s i x o; e; e t w o; n e s i x; ......; f o u r; f o u r f; ...; g h t s; g h t o n e; e i g h t}

[Equation 41]
class2 = {o n e; e s i x o; x; f i v; f i v e; t s e v; s e v e n; ......; x n i; x n i n e; n i n e; ...}
class3 = ......

(b) Class-conditional frequency probability distribution at iteration parameter k = 1

[Equation 42]
p(e s i x o | class 1), p(e | class 1), ...
p(o n e | class 2), p(e s i x o | class 2), ...

(c) Class bigram frequency probability distribution at iteration parameter k = 1

[Equation 43]
p(class 1 | class 2) = 0.3
p(class 2 | class 1) = 0.1
p(class 3 | class 1) = 0.4
...

[0104] In the re-estimation process of step S4, the class definitions and the class frequency probability distributions at iteration parameter k = 1 are the input data, and the following frequency probability distribution of the sequences at iteration parameter k = 1 is output.

[Equation 44]
p(n | o) = 0.9
p(n e | o) = 0.8
p(n e s | o) = 0.05
...
p(n e s i x | o) = 0

[Equation 45]
p(e | o n) = 0.02
p(e s | o n) = 0.001
...
p(e s i x o | o n) = 0
...
p(s i x | o n e) = 0.5
...

[0105] Processing continues in the same manner, and the output of the first example is as follows.
(a) Segmented input string (ML segmentation)
"o_n_e s_i_x o_n_e e_i_g_h_t f_i_v_e z_e_r_o ..."
(b) Class definitions

[Equation 46]
class1 = {o n e; t h r e e; f i v e; s e v e n; n i n e}
class2 = {z e r o; t w o; f o u r; s i x; e i g h t}

(c) Class-conditional frequency probability distribution

[Equation 47]
p(o n e | class 1) = 0.2
p(t h r e e | class 1) = 0.2
p(f i v e | class 1) = 0.2
...
p(z e r o | class 2) = 0.2
p(t w o | class 2) = 0.2

(d) Class bigram frequency probability distribution

[Equation 48]
p(class 1 | class 2) = 1
p(class 2 | class 1) = 1
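Under the class bigram form of Equation 37, these outputs give the probability of one sequence following another as the class bigram probability times the class-conditional probability. The sketch below plugs in the values listed above. The dictionary layout and function name are hypothetical, and treating an unlisted class bigram as 0 is an assumption.

```python
def class_bigram_prob(prev_seq, seq, seq_class, p_seq_given_class, p_class_bigram):
    """p(seq | prev_seq) = p(class(seq) | class(prev_seq)) * p(seq | class(seq))."""
    c_prev, c = seq_class[prev_seq], seq_class[seq]
    return p_class_bigram.get((c, c_prev), 0.0) * p_seq_given_class[seq]


seq_class = {"one": 1, "three": 1, "zero": 2, "two": 2}
p_seq_given_class = {"one": 0.2, "three": 0.2, "zero": 0.2, "two": 0.2}
# Keys are (class, previous class), so (2, 1) holds p(class 2 | class 1).
p_class_bigram = {(1, 2): 1.0, (2, 1): 1.0}

p_zero_after_one = class_bigram_prob("one", "zero", seq_class,
                                     p_seq_given_class, p_class_bigram)
```

With these values, an even-number word after an odd-number word gets probability 1 × 0.2, while an odd-number word after another odd-number word gets 0.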

[0106] <Second Example of Statistical Language Model Generation Processing> In this example, the input training data is the following sentences of natural-language text data, that is, word strings, and the units (words) are segmented into phrases. Here, <s> is a start symbol and </s> is an end symbol.
"<s> good afternoon new washington hotel may i help you ... </s>"
The input parameters are as follows:
(a) maximum length of a sequence = several words (for example, 1 to 5 words; 4 in the following example);
(b) number of classes = 1000; and
(c) threshold for the initialization process = 30.

[0107] In the initialization process (k = 0), the relative counts of all word combinations observed more than 30 times in the training data are used as initial values. The frequency probability distribution of the sequences at iteration parameter k = 0 is therefore as shown in the following table.

[0108]

[Table 3]

[0109] The output of the second example is as follows.
(a) Segmented input string (ML segmentation)
"good_afternoon new_washington_hotel may_i_help_you"
(b) Class definitions

[Equation 49]
class1 = {good afternoon; good morning; hello; may i help you ...}
...
class2 = {new washington hotel; sheraton hotel; plaza; ...}
...
class1000 = {give me some; tell me}

(c) Class-conditional frequency probability distribution

[Equation 50]
p(good afternoon | class 1) = 0.003
p(good morning | class 1) = 0.002
p(hello | class 1) = 0.002
...

(d) Class bigram frequency probability distribution

[Equation 51]
p(class 2 | class 1) = 0.04
p(class 3 | class 1) = 0.005
...

[0110] <Experiments and Results> The present inventors conducted the following experiments to evaluate the performance of the apparatus of the embodiment. The experimental protocol and database are described first. The aim of learning bigram dependencies between variable-length phrases is to overcome the limitations of the conventional word bigram model while keeping the number of model parameters smaller than that of a word trigram model. A suitable criterion for evaluating the bi-multigram model is therefore to measure its predictive capability and its number of parameters and to compare them with those of conventional bigram and trigram models. Predictive capability is usually evaluated by the perplexity measure given by the following equation.

[0111]

[Equation 52]
$$PP = \exp\left\{-\frac{1}{T}\,\log L(W)\right\}$$

[0112] Here, T is the number of words in the sentence W. The lower the perplexity PP, the more accurate the model's predictions. For a statistical model there are in fact two perplexity values, PP and PP*, computed by taking L(W) in Equation 52 to be, respectively, each of the following.

[0113]

[Equation 53] and

[Equation 54]
$$L(W) = L(W, S^*)$$

[0114] The difference PP* − PP between the two perplexities is always positive or zero. It measures the degree of ambiguity of the parse S of the sentence W or, equivalently, the loss in predictive accuracy incurred when the likelihood of a sentence is approximated by the likelihood of its best parse, as is done in a speech recognizer.
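The sign of PP* − PP can be checked on a toy case. The sketch below makes two assumptions: that the likelihood used for PP sums L(W, S) over all segmentations S, and that L(W, S) is a plain product of independent sequence probabilities rather than the bi-multigram likelihood of the embodiment. Function names and probability values are hypothetical.

```python
import math


def segmentations(units, max_len):
    """Yield every way of cutting `units` into pieces of at most max_len units."""
    if not units:
        yield []
        return
    for n in range(1, min(max_len, len(units)) + 1):
        for rest in segmentations(units[n:], max_len):
            yield [units[:n]] + rest


def likelihoods(units, seq_prob, max_len):
    """Return (sum over all parses, best single parse) of the parse likelihoods."""
    per_parse = [math.prod(seq_prob.get(s, 0.0) for s in seg)
                 for seg in segmentations(units, max_len)]
    return sum(per_parse), max(per_parse)


def perplexity(likelihood, num_units):
    """PP = exp(-(1/T) log L(W)), following Equation 52."""
    return math.exp(-math.log(likelihood) / num_units)


seq_prob = {"a": 0.5, "b": 0.3, "ab": 0.2}   # toy values
total, best = likelihoods("ab", seq_prob, max_len=2)
pp, pp_star = perplexity(total, 2), perplexity(best, 2)
```

Because the best parse is one term of the sum, its likelihood can never exceed the total, so `pp_star` is never smaller than `pp`.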

[0115] In the following, the loss (PP* − PP) is first evaluated for a given estimation procedure, and the influence of the estimation procedure itself, using either the forward-backward algorithm (Equation 15) or the deterministic method (Equation 16), is then examined. Finally, these results are compared with those obtained with conventional n-gram models. For this purpose, the well-known CMU toolkit by Clarkson et al. (1997) is used. The experiments use the "travel arrangement" data owned by the present applicant, shown in the following table.

[0116]

[Table 4] "Travel arrangement" data owned by the present applicant
―――――――――――――――――――――――――――――――――
                      Training    Test
―――――――――――――――――――――――――――――――――
Number of sentences   13650       2430
Number of tokens      167000      29000 (1% OOV)
Vocabulary size       3525        +280 OOV
―――――――――――――――――――――――――――――――――
(Note) OOV stands for Out Of Vocabulary and refers to words not in the vocabulary.

[0117] This database consists of spontaneous dialogues between hotel clerks and customers about travel and accommodation information. Filled pauses and false starts are mapped to a single marker "*uh*". In these experiments, the maximum phrase length was varied from n = 1 to 4 words (for n = 1, the bi-multigram corresponds to the conventional bigram). All bi-multigram frequency probabilities were estimated with 6 training iterations, and the phrase dictionary was pruned by discarding all sequences occurring 20 times or fewer at initialization and 10 times or fewer at each iteration. On this data, using different pruning thresholds does not significantly affect the results as long as the initialization threshold is in the range 10 to 30 and the iteration threshold is about half of that.

[0118] However, all one-word phrases are retained regardless of their estimated number of occurrences (if the phrases s_i and s_j are one-word phrases and the re-estimated value of the count c(s_i, s_j) is zero, c(s_i, s_j) is reset to 1), so that all word bigrams appear in the final dictionary. Furthermore, all n-gram and phrase bigram probabilities are smoothed with the well-known back-off smoothing method of Katz (1987), using the well-known Witten-Bell discounting method of Witten et al. (1991). Witten-Bell discounting was chosen because it yields the best perplexity scores with conventional n-grams on this test data.
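The counting and pruning steps described in the two paragraphs above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `count_sequences` and `prune`, the toy sentences and the threshold value are assumptions made for the example, and the counts here are raw occurrence counts rather than re-estimated expected counts.

```python
from collections import Counter

def count_sequences(sentences, max_len):
    """Count every contiguous word sequence of 1..max_len words."""
    counts = Counter()
    for words in sentences:
        for start in range(len(words)):
            for length in range(1, max_len + 1):
                if start + length > len(words):
                    break
                counts[tuple(words[start:start + length])] += 1
    return counts

def prune(counts, threshold):
    """Discard multi-word sequences seen `threshold` times or fewer.

    One-word sequences are always kept, mirroring the rule that all
    one-word phrases are retained regardless of their count.
    """
    return {seq: c for seq, c in counts.items()
            if len(seq) == 1 or c > threshold}

# Toy data (hypothetical); the text uses a threshold of 20 at initialization.
sentences = [["yes", "that", "will"], ["yes", "that", "would"]]
kept = prune(count_sequences(sentences, max_len=2), threshold=1)
print(sorted(kept))
```

With a real corpus the same two calls would be run once at initialization and the pruning repeated, with a lower threshold, after each re-estimation pass.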

[0119] Next, the experiments without clustering are described. First, regarding the degree of non-determinism, the following table shows the perplexity values PP* and PP obtained, after training with the forward-backward algorithm, on the test set of the applicant's "travel arrangement" data of Table 4. The difference (PP* - PP) usually stays within about 1 perplexity point, which means that relying on the single best segmentation into phrases should not greatly damage the accuracy of the prediction.

[0120]

[Table 5] Degree of non-determinism
------------------------------------------
n        1       2       3       4
------------------------------------------
PP       56.0    43.9    44.2    45.0
PP*      56.0    45.1    45.4    46.3
------------------------------------------
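The ordering PP* >= PP seen in every column of Table 5 follows directly from how the two quantities are computed. As a hedged sketch, assuming the usual definition of perplexity, with T the number of words in the test text W, L(W, S) the likelihood of W under one segmentation S, L(W) the likelihood summed over all segmentations and L*(W) the likelihood of the single most likely segmentation (these symbols are chosen for this note and may differ from the notation used elsewhere in the specification):

```latex
PP = \exp\!\left(-\frac{1}{T}\,\log L(W)\right), \qquad
PP^{*} = \exp\!\left(-\frac{1}{T}\,\log L^{*}(W)\right),
\qquad
L^{*}(W) = \max_{S} L(W, S) \;\le\; \sum_{S} L(W, S) = L(W)
\;\Longrightarrow\; PP^{*} \ge PP .
```

For n = 1 there is only one segmentation, so the two values coincide (56.0 in both rows).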

[0121] Next, regarding the influence of the re-estimation procedure, the following tables show the perplexity values PP* and the model sizes obtained with either the forward-backward algorithm or the Viterbi estimation algorithm.

[0122]

[Table 6] Influence of the estimation method: test perplexity values PP*
------------------------------------------------
n                   1       2       3       4
------------------------------------------------
FB method           56.0    45.1    45.4    46.3
Viterbi method      56.0    45.7    45.9    46.2
------------------------------------------------

[0123]

[Table 7] Influence of the estimation method: model size
----------------------------------------------------
n                   1        2        3        4
----------------------------------------------------
FB method           32505    44382    43672    43186
Viterbi method      32505    65141    67258    67295
----------------------------------------------------

[0124] As is clear from Tables 6 and 7, as far as perplexity values are concerned, the estimation method has very little influence, with training by the forward-backward algorithm appearing slightly advantageous. On the other hand, the model size, measured as the number of distinct bi-multigrams at the end of training, is about 30% smaller with forward-backward training: roughly 40,000 versus 60,000 entries for the same test perplexity value.
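The "about 30%" figure can be checked against the model sizes of Table 7. The short sketch below only redoes that arithmetic; the function name `relative_reduction` is an assumption for the example, and the numbers are copied from the table.

```python
# Model sizes from Table 7 (number of distinct bi-multigrams), keyed by n.
fb_size = {2: 44382, 3: 43672, 4: 43186}
viterbi_size = {2: 65141, 3: 67258, 4: 67295}

def relative_reduction(smaller, larger):
    """Fraction by which `smaller` is below `larger`."""
    return 1.0 - smaller / larger

for n in sorted(fb_size):
    r = relative_reduction(fb_size[n], viterbi_size[n])
    print(f"n={n}: forward-backward model is {r:.1%} smaller")
```

The reductions come out between roughly 32% and 36%, in line with the rounded statement in the text.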

[0125] Overall, the bi-multigram results suggest that the pruning heuristic used to discard phrases cannot completely avoid overtraining. Indeed, the perplexity values for n = 3 and n = 4 (which potentially imply dependencies spanning 6 to 8 words) are higher than for n = 2 (where dependencies are limited to 4 words). Other strategies, possibly ones that penalize long phrases more than short ones, may be more successful.

[0126] Further, for comparison with n-grams, the following tables show the perplexity values (PP) obtained with forward-backward training and the model sizes, for both n-grams and bi-multigrams.

[0127]

[Table 8] Comparison with n-grams: test perplexity values PP
--------------------------------------------------
Value of n        1        2       3       4
--------------------------------------------------
n-gram            314.2    56.0    40.4    39.8
Bi-multigram      56.0     43.9    44.2    45.0
--------------------------------------------------

[0128]

[Table 9] Comparison with n-grams: model size
-----------------------------------------------------
Value of n        1        2        3        4
-----------------------------------------------------
n-gram            3526     32505    75511    112148
Bi-multigram      32505    44382    43672    43186
-----------------------------------------------------

[0129] As is clear from Tables 8 and 9, the lowest bi-multigram perplexity score (43.9) is still higher than the trigram value, but it is much closer to the trigram value (40.4) than to the bigram value (56.0). Moreover, the trigram score depends on the discounting method: with linear discounting, the trigram perplexity on this test was 48.1.

[0130] The 5-gram perplexity value (not shown in the tables above) is 40.8, slightly higher than the 4-gram score. This is consistent with the fact that the bi-multigram perplexity does not decrease for n > 2 (i.e., when dependencies span more than 4 words). Finally, the number of entries in the bi-multigram model is much smaller than in the trigram model (about 45,000 versus 75,000), which illustrates the trade-off between model accuracy and model size achieved by the multigram approach.

[0131] Next, the experiments with clustering and their results are described. In these experiments, clustering the phrases did not improve the perplexity scores. The increase in perplexity remains very small (less than 1 point) when only a small fraction of the phrases (10 to 20%) is clustered; beyond this, perplexity deteriorates considerably. This effect is also frequently reported within the n-gram framework when the class estimation is not integrated into the word estimation. However, phrase clustering makes it possible to handle naturally some of the disfluencies that characterize spontaneous speech, such as the insertion of hesitation words. To illustrate this point, the following table lists phrases that were merged during the training of a model allowing phrases of up to n = 4 words. Phrases containing the hesitation marker "*uh*" are shown in the upper part of the table. Phrases that differ mainly because of a speaker hesitation are often merged together.

[0132]

[Table 10] Examples of merged phrases in a model handling sequences of up to 4 words
----------------------------------------------------------------------
{yes that will; *uh* that would}
{yes that will be; *uh* yes that's}
{*uh* by the; and by the}
{yes *uh* i; i see i}
{okay i understand; *uh* yes please}
{could you recommend; *uh* is there}
{*uh* could you tell; and could you tell}
{so that will; yes that will; yes that would; *uh* that would}
{if possible i'd like; we would like; *uh* i want}
{that sounds good; *uh* i understand}
{*uh* i really; *uh* i don't}
{*uh* i'm staying; and i'm staying}
{all right we; *uh* yes i}
----------------------------------------------------------------------
{good morning this; good afternoon this}
{yes i do; yes thank you}
{we'll be looking forward; we look forward}
{dollars a night; and forty yen}
{for your help; for your information}
{hold the line; want for a moment}
{yes that will be; and could you tell}
{please go ahead; you like to know}
{want time would you; and you would}
{yes there is; but there is}
{join phillips in room; ms. suzuki in}
{name is suzuki; name is ms. suzuki}
{i'm calling from; a; also i'd like}
{much does it cost; can reach you}
{thousand yen room; dollars per person}
{yes i do; yes thank you; i see sir}
{you tell me where; you tell me what}
{a reservation for the; the reservation for}
{your name and the; you give me the}
{amy harris in; is amy harris in}
{name is mary phillips; name is kazuo suzuki}
{hold on a moment; wait a moment}
{give me some; also tell me}
----------------------------------------------------------------------

[0133] The table above also illustrates another motivation for phrase retrieval and clustering besides word prediction, namely addressing problems related to topic identification, dialogue modeling and language understanding, as discussed by Kawahara et al. (1997). Indeed, although the clustered phrases in these experiments were derived fully blindly, i.e. without any semantic or pragmatic information, the phrases within a class show a strong semantic correlation. However, to make this approach efficiently usable for speech understanding, constraints need to be placed on the phrase clustering process using some higher-level information, such as speech act tags.

[0134] As described above, an algorithm for deriving variable-length phrases, assuming n-gram dependencies between the phrases, has been proposed and evaluated for a language modeling task. Experiments on a task-oriented corpus show that structuring sentences into phrases makes it possible to reduce the bigram perplexity value significantly, while keeping the number of entries in the language model much lower than in a trigram model. However, these results could be further improved with a more efficient pruning method, which would allow longer dependencies to be learned without overtraining. In addition, a paradigmatic aspect is easily integrated into this framework, making it possible to assign a common label to phrases of different lengths. Because the semantic relations between phrases are captured by the merging, the approach may also be used in the fields of dialogue modeling and language understanding. In that case, semantic or pragmatic information could be used to constrain the process of deriving the phrase classes.

[0135] <Modifications> In the embodiment described above, the units are English characters, the sequences are words, the classification process classifies a character string into a string of words, and the statistical sequence model is a statistical language model. The present invention is not limited to this: the units may be characters of another natural language such as Japanese. Alternatively, the units may be natural-language words, the sequences may be phrases, the classification process may classify a word string into a string of phrases, and the statistical sequence model may be a statistical language model.

[0136] <Effects of the Embodiment> As described above, the embodiment according to the present invention has the following particular effects.

(A) The frequency distribution of word sequences can be computed using the EM algorithm, so that the ML (maximum likelihood) criterion is optimized. That is, with the algorithm of this embodiment, the clustering process always converges monotonically, and an optimal analysis result can be obtained.

(B) The parsing used for sequence classification is left free. Specifically, because a non-deterministic approach based on the forward-backward algorithm described above is used, a solution with a degree of freedom is obtained. This non-deterministic approach can be used because the variables α and β can be determined. Consequently, when the sequence [bcd] occurs in the input, it can be split into smaller sequences such as [bc] + [d], [b] + [cd] or [b] + [c] + [d] so as to improve the likelihood of the input data. In other words, even when a given sequence appears in the input, its parse is not decided in advance; everything depends on the likelihood of the input data. The clustering is therefore performed not deterministically but according to the frequency probabilities of the input data.

(C) Variable-length sequences can be classified automatically. The classification of sequences does not depend on a classification of words. Because the sequences are classified directly and automatically, sequences of different lengths can be assigned to a common class with high accuracy.
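The free parsing described in effect (B) can be made concrete by enumerating the candidate segmentations of a short input. The sketch below is only an illustration: the function name `segmentations` and the use of a Python string as the unit sequence are assumptions for the example, and a practical system would not enumerate segmentations explicitly but would sum over them with the forward and backward variables.

```python
def segmentations(units, max_len):
    """Return every split of `units` into consecutive pieces
    of at most `max_len` units each."""
    if not units:
        return [[]]
    result = []
    for length in range(1, min(max_len, len(units)) + 1):
        head = units[:length]
        for rest in segmentations(units[length:], max_len):
            result.append([head] + rest)
    return result

# All parses of the sequence [bcd] with pieces of up to 3 units.
for seg in segmentations("bcd", max_len=3):
    print(" + ".join(f"[{piece}]" for piece in seg))
```

Besides the three splits named in the text, the enumeration also contains the unsplit sequence [bcd] itself; which parse dominates is decided only by the likelihood each one receives under the current probabilities.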

[0137] Therefore, the embodiment according to the present invention can provide a statistical sequence model generation device, a statistical language model generation device and a speech recognition device that, compared with the conventional examples, guarantee monotonic convergence toward an optimal state, leave the parsing free, can handle variable-length sequences within the same class, and can be run at practical high speed on a digital computer.

[0138]

[Effects of the Invention] As described in detail above, the statistical sequence model generation device according to the present invention generates, based on input data including sequences each of which is a unit string composed of one or more units, a bi-multigram statistical sequence model, a bi-multigram being a bigram between a multigram that is a unit string of a variable number N₁ of units (N₁ being a natural number) and a multigram that is a unit string of a variable number N₂ of units (N₂ being a natural number). The device comprises: initialization means for counting, based on the input data and under predetermined constraints on the maximum values of N₁ and N₂, the bigram frequency probabilities of all combinations of unit strings; classifying means for classifying the unit strings into a predetermined number of classes, based on the bigram frequency probabilities counted by the initialization means, by merging pairs of classes so that the loss of mutual information caused by the merge is minimized and updating the frequency probability of each class, and for calculating and outputting the unit strings contained in the classified classes, the class-conditional frequency probabilities of the unit strings, and the bigram frequency probabilities between the classified classes; re-estimation means for re-estimating, using the EM algorithm so as to obtain maximum likelihood estimates, based on the unit strings contained in the classified classes, the class-conditional frequency probabilities of the unit strings and the bigram frequency probabilities between the classified classes output from the classifying means, wherein the forward-backward algorithm is used to re-estimate the bigram frequency probabilities between sequences by means of a formula that expresses the bigram frequency probability between sequences in terms of, for each unit string to be processed, the forward likelihood of the unit strings that can precede it in time, the frequency probability of the unit string conditioned on the immediately preceding unit string, and the backward likelihood of the unit strings that can follow it in time, thereby generating and outputting the bi-multigram statistical sequence model as the re-estimation result; and control means for controlling the processing of the classifying means and the processing of the re-estimation means so that they are executed repeatedly until a predetermined end condition is satisfied. Therefore, the present invention can provide a statistical sequence model generation device that, compared with the conventional examples, guarantees monotonic convergence toward an optimal state, leaves the parsing free, can handle variable-length sequences within the same class, and can generate the statistical sequence model at practical high speed on a digital computer.
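The re-estimation step above can be illustrated on a toy input. The sketch below is a brute-force stand-in, not the forward-backward procedure itself and not the patented implementation: it enumerates every segmentation explicitly, which is exponential in the input length, whereas the forward and backward likelihoods accumulate the same sums efficiently. Classes are ignored (each sequence is its own class), and the names `START`, `segmentations`, `adjacent_pairs`, `likelihood`, `total_likelihood`, `uniform_init` and `reestimate`, as well as the toy string, are assumptions made for the example.

```python
from collections import defaultdict

START = "<s>"  # hypothetical start-of-string symbol

def segmentations(units, max_len):
    """Every split of `units` into consecutive pieces of at most `max_len` units."""
    if not units:
        return [[]]
    return [[units[:k]] + rest
            for k in range(1, min(max_len, len(units)) + 1)
            for rest in segmentations(units[k:], max_len)]

def adjacent_pairs(seg):
    """(previous sequence, sequence) pairs along one segmentation."""
    return list(zip([START] + seg[:-1], seg))

def likelihood(seg, prob):
    """Product of the sequence-bigram probabilities along one segmentation."""
    result = 1.0
    for pair in adjacent_pairs(seg):
        result *= prob.get(pair, 0.0)
    return result

def total_likelihood(units, prob, max_len):
    """Likelihood of the input summed over all segmentations."""
    return sum(likelihood(s, prob) for s in segmentations(units, max_len))

def uniform_init(units, max_len):
    """Uniform p(second | first) over the followers seen in any segmentation."""
    followers = defaultdict(set)
    for seg in segmentations(units, max_len):
        for first, second in adjacent_pairs(seg):
            followers[first].add(second)
    return {(f, s): 1.0 / len(ss) for f, ss in followers.items() for s in ss}

def reestimate(units, prob, max_len):
    """One EM pass: likelihood-weighted pair counts, normalized per first sequence."""
    pair_weight = defaultdict(float)   # numerator: first followed by second
    first_weight = defaultdict(float)  # denominator: first followed by anything
    for seg in segmentations(units, max_len):
        weight = likelihood(seg, prob)
        for first, second in adjacent_pairs(seg):
            pair_weight[(first, second)] += weight
            first_weight[first] += weight
    return {pair: w / first_weight[pair[0]]
            for pair, w in pair_weight.items() if w > 0.0}

units = "abab"
prob = uniform_init(units, max_len=2)
for step in range(3):
    print(step, total_likelihood(units, prob, max_len=2))
    prob = reestimate(units, prob, max_len=2)
```

Printing the total likelihood before each pass makes the monotonic behaviour of the EM update visible on this toy input; the classification step that the device alternates with re-estimation is deliberately left out of the sketch.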

[0139] Further, according to the statistical language model generation device of the present invention, in the statistical sequence model generation device described above, the units are characters of a natural language, the sequences are words, the classifying means classifies a character string into a string of words, and the statistical sequence model is a statistical language model. Therefore, the present invention can provide a statistical language model generation device that, compared with the conventional examples, guarantees monotonic convergence toward an optimal state, leaves the parsing free, can handle variable-length sequences within the same class, and can generate the statistical language model at practical high speed on a digital computer.

[0140] Further, according to the statistical language model generation device of the present invention, in the statistical sequence model generation device described above, the units are words of a natural language, the sequences are phrases, the classifying means classifies a word string into a string of phrases, and the statistical sequence model is a statistical language model. Therefore, the present invention can provide a statistical language model generation device that, compared with the conventional examples, guarantees monotonic convergence toward an optimal state, leaves the parsing free, can handle variable-length sequences within the same class, and can generate the statistical language model at practical high speed on a digital computer.

[0141] Furthermore, according to the speech recognition device of the present invention, in a speech recognition device comprising speech recognition means for recognizing speech, based on the speech signal of an input uttered speech sentence, using a predetermined statistical language model, the speech recognition means performs speech recognition with reference to the statistical language model generated by the statistical language model generation device described above. Therefore, according to the present invention, compared with the conventional examples, monotonic convergence toward an optimal state can be guaranteed, the parsing is left free, variable-length sequences can be handled within the same class, and the statistical language model can be generated at practical high speed on a digital computer. In addition, by performing speech recognition with the statistical language model thus generated, speech can be recognized at a higher recognition rate than in the conventional examples.

[Brief description of the drawings]

FIG. 1 is a block diagram of a continuous speech recognition device according to an embodiment of the present invention.

FIG. 2 is a timing chart showing the processing of the word hypothesis narrowing section 6 in the continuous speech recognition device of FIG. 1.

FIG. 3 is a flowchart showing the statistical language model generation process executed by the statistical language model generation section 20 of FIG. 1.

FIG. 4 is a flowchart showing the classification process using the Brown algorithm, which is a subroutine of FIG. 3.

[Explanation of symbols]

1: microphone, 2: feature extraction section, 3, 5: buffer memories, 4: word matching section, 6: word hypothesis narrowing section, 11: phoneme HMM memory, 12: word dictionary memory, 20: statistical language model generation section, 21: training text data memory, 22: statistical language model memory, 30: working RAM, 31: parameter memory, 32: sequence frequency probability memory, 33: class definition memory, 34: class-conditional frequency probability memory, 35: class bigram frequency probability memory, 36: segmented sequence memory.

Continuation of the front page

(72) Inventor: Hideharu Nakajima (中嶋秀治), c/o ATR Interpreting Telecommunications Research Laboratories (株式会社エイ・ティ・アール音声翻訳通信研究所), 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto

(56) References:
Deligne S., "Language Modeling by Variable Length Sequences: Theoretical Formulation and Evaluation of Multigrams", ICASSP 1995, Vol. 1, pp. 169-172
Deligne S., "Inference of Variable-Length Acoustic Units for Continuous Speech Recognition", ICASSP 1997, Vol. 3, pp. 1731-1734
Frederic B. et al., "Variable-Length Sequence Modeling: Multigrams", IEEE Signal Processing Letters, Vol. 2, No. 6, pp. 111-113, June 1995

(58) Fields searched (Int. Cl.⁷, DB name): G10L 3/00 - 9/20; C12N 15/00; JICST file (JOIS)

Claims

(57) [Claims]

1. A statistical sequence model generation device that generates, based on input data including sequences each of which is a unit string composed of one or more units, a bi-multigram statistical sequence model, a bi-multigram being a bigram between a multigram that is a unit string of a variable number N₁ of units (N₁ being a natural number) and a multigram that is a unit string of a variable number N₂ of units (N₂ being a natural number), the device comprising: initialization means for counting, based on the input data and under predetermined constraints on the maximum values of N₁ and N₂, the bigram frequency probabilities of all combinations of unit strings; classifying means for classifying the unit strings into a predetermined number of classes, based on the bigram frequency probabilities counted by the initialization means, by merging pairs of classes so that the loss of mutual information caused by the merge is minimized and updating the frequency probability of each class, and for calculating and outputting the unit strings contained in the classified classes, the class-conditional frequency probabilities of the unit strings, and the bigram frequency probabilities between the classified classes; re-estimation means for re-estimating, using the EM algorithm so as to obtain maximum likelihood estimates, based on the unit strings contained in the classified classes, the class-conditional frequency probabilities of the unit strings and the bigram frequency probabilities between the classified classes output from the classifying means, wherein the forward-backward algorithm is used to re-estimate the bigram frequency probabilities between sequences by means of a formula that expresses the bigram frequency probability between sequences in terms of, for each unit string to be processed, the forward likelihood of the unit strings that can precede it in time, the frequency probability of the unit string conditioned on the immediately preceding unit string, and the backward likelihood of the unit strings that can follow it in time, thereby generating and outputting the bi-multigram statistical sequence model as the re-estimation result; and control means for controlling the processing of the classifying means and the processing of the re-estimation means so that they are executed repeatedly until a predetermined end condition is satisfied.

2. The statistical sequence model generation device according to claim 1, wherein the initialization means further removes, from the counted bigram frequency probabilities, the data of bigram combinations whose frequency probability is equal to or less than a predetermined value.

3. The statistical sequence model generation device according to claim 1, wherein the classifying means classifies the unit strings into the plurality of classes using the Brown algorithm, based on the bigram frequency probabilities counted by the initialization means.

4. The statistical sequence model generation device according to claim 1, wherein the formula calculates, for each unit string to be processed, the bigram frequency probability between sequences for the case where a second unit string follows a first unit string in the input data, and the bigram frequency probability between the sequences is obtained by dividing the sum of the likelihoods of all segmentations that contain the first and second unit strings by the sum of the likelihoods of all segmentations that contain the first unit string.

5. The statistical sequence model generation device according to claim 4, wherein the formula has a denominator indicating the average number of occurrences of each unit string in the input data and a numerator indicating, for each unit string, the average number of times the second unit string follows the first unit string in the input data; the numerator is the sum, over each unit string to be processed, of the products of the forward likelihood, the frequency probability of the unit string conditioned on the unit string immediately preceding it, and the backward likelihood; and the denominator is the sum, over each unit string to be processed, of the products of the forward likelihood, the frequency probabilities of all unit strings conditioned on the unit string immediately preceding them, and the backward likelihood.
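The ratio described in claims 4 and 5 can be illustrated with a minimal forward-backward sketch. It makes several simplifying assumptions: it works on sequence bigrams only and omits the class layer, it processes a single input string, it uses one maximum length `max_len`, and it introduces a start marker `START`. The function `reestimate` and all other names are hypothetical.

```python
from collections import defaultdict

START = ("<s>",)  # hypothetical start-of-string marker, not part of the claims


def reestimate(units, prob, max_len):
    """One forward-backward re-estimation pass over a single string of units.

    prob maps (previous_sequence, sequence) -> p(sequence | previous_sequence);
    pairs missing from prob are treated as probability 0.
    Returns (likelihood of the string, re-estimated bigram probabilities).
    """
    T = len(units)

    def p(prev, cur):
        return prob.get((prev, cur), 0.0)

    def seq(end, length):
        # Sequence of the given length ending at `end`; length 0 is START.
        return START if length == 0 else tuple(units[end - length:end])

    def lengths_ending_at(t):
        return [0] if t == 0 else range(1, min(max_len, t) + 1)

    # alpha[t][l]: summed likelihood of all segmentations of units[:t]
    # whose last sequence has length l.
    alpha = [dict() for _ in range(T + 1)]
    alpha[0][0] = 1.0
    for t in range(1, T + 1):
        for l in lengths_ending_at(t):
            s = t - l
            alpha[t][l] = sum(alpha[s][k] * p(seq(s, k), seq(t, l))
                              for k in lengths_ending_at(s))

    # beta[t][l]: summed likelihood of all segmentations of units[t:],
    # given that the sequence ending at t has length l.
    beta = [dict() for _ in range(T + 1)]
    for l in lengths_ending_at(T):
        beta[T][l] = 1.0
    for t in range(T - 1, -1, -1):
        for l in lengths_ending_at(t):
            beta[t][l] = sum(p(seq(t, l), seq(t + k, k)) * beta[t + k][k]
                             for k in range(1, min(max_len, T - t) + 1))

    # Numerator and denominator: sums of forward * bigram * backward products.
    pair_mass = defaultdict(float)
    first_mass = defaultdict(float)
    for t in range(T):
        for l in lengths_ending_at(t):
            prev = seq(t, l)
            for k in range(1, min(max_len, T - t) + 1):
                cur = seq(t + k, k)
                mass = alpha[t][l] * p(prev, cur) * beta[t + k][k]
                if mass > 0.0:
                    pair_mass[(prev, cur)] += mass
                    first_mass[prev] += mass

    new_prob = {pair: m / first_mass[pair[0]] for pair, m in pair_mass.items()}
    return beta[0][0], new_prob
```

For the two-unit string `("a", "b")` with `max_len=2`, there are only two segmentations, `[a][b]` and `[ab]`, so the sums can be checked by hand against the likelihood of each segmentation.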

6. The statistical sequence model generation device according to any one of claims 1 to 5, wherein the end condition is that the number of repetitions of the processing of the classification means and the processing of the re-estimation means has reached a predetermined number.

7. A statistical language model generation device, being the statistical sequence model generation device according to claim 1, wherein the unit is a character of a natural language, the sequence is a word, the classification means classifies character strings into a plurality of word strings, and the statistical sequence model is a statistical language model.

8. A statistical language model generation device, being the statistical sequence model generation device according to claim 1, wherein the unit is a word of a natural language, the sequence is a phrase, the classification means classifies word strings into a plurality of phrase strings, and the statistical sequence model is a statistical language model.

9. A speech recognition device comprising speech recognition means for recognizing speech, based on the speech signal of an input uttered speech sentence, by using a predetermined statistical language model, wherein the speech recognition means performs speech recognition with reference to the statistical language model generated by the statistical language model generation device according to claim 7 or 8.