JP5276610B2

JP5276610B2 - Language model generation apparatus, program thereof, and speech recognition system

Info

Publication number: JP5276610B2
Application number: JP2010023969A
Authority: JP
Inventors: 真一本間; 亨今井
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2010-02-05
Filing date: 2010-02-05
Publication date: 2013-08-28
Anticipated expiration: 2030-02-05
Also published as: JP2011164175A

Abstract

PROBLEM TO BE SOLVED: To provide a language model generation device that generates a language model enabling accurate speech recognition.

SOLUTION: The language model generation device 1 includes: a learning text storage unit 11 that stores a learning text; a synonym word / chain word selection unit 21; a language model generation unit 22 that generates a language model indicating the occurrence probability of at least one of the words and chain words included in the learning text; a language model conversion unit 23 that calculates a probability value from the occurrence probabilities, in the language model, of synonyms having the same meaning, and updates the occurrence probability of those synonyms with that probability value; a pronunciation dictionary storage unit 17 that stores a pronunciation dictionary; and a pronunciation dictionary conversion unit 24 that converts the pronunciation dictionary.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a technique for generating a language model as a probabilistic language model, and to a technique for performing speech recognition using that language model.

For example, speech recognition using a language model is indispensable for voice operation of machines (such as car navigation devices), automatic voice guidance systems, and real-time caption generation for broadcast programs. Because the language model strongly affects recognition accuracy, several conventional techniques relating to language models have been proposed.

The language model generally used in speech recognition today is the word N-gram model (see, for example, Non-Patent Document 1). An extension of it is the class N-gram model, which uses the concept of word classes. The class N-gram model is an N-gram model that treats words as classes grouped by part of speech or concept, and it is effective when the learning text is too small for a word N-gram model to be trained adequately.

As a technique relating to language models other than N-gram models, the invention described in Patent Document 1 has been proposed. That invention uses a list of official-name words and a list of their paraphrases to probabilistically estimate (model) paraphrase patterns (rules). It then adds the paraphrase patterns, in addition to the official names, to a dictionary (language model). It further calculates how close a short input utterance (for example, a proper noun such as a facility name or place name) is to the waveforms of the words registered in the dictionary, and outputs the recognition result. In other words, the invention of Patent Document 1 performs isolated word recognition, which recognizes input speech containing a single word.

Patent Document 1: Japanese Patent Laid-Open No. 2005-31255

Non-Patent Document 1: Probabilistic Language Model (確率的言語モデル), University of Tokyo Press, pp. 60-62 and pp. 72-75

However, the N-gram model has the following problems.
Colloquial Japanese, found mainly in spoken language, has many notations and readings for the same expression. For example, “という” is often realized as “っていう” or “っちゅう”. In written language, by contrast, all of these are unified as “という”.
Likewise, when the learning text is a transcription of spoken language, “という” may be replaced with, for example, “っていう” or “っちゅう”.
As a result, the statistics for “という”, for example, are dispersed in the language model, the amount of learning text becomes effectively insufficient, and a reliable probability value may not be obtained.

The learning text may also contain notational variation; for example, “取り引きする” can also be written “取引する”. Because of this variation, the probability value of “取り引きする” may be calculated as a small value.

In summary, when words or chain words exist that have the same meaning but differ in notation or reading, as with spoken versus written forms and notational variation, the N-gram statistics for those words or chain words are dispersed. The N-gram model is therefore left with relatively too little learning text, and the probability values in the generated language model become unreliable.
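The dispersion effect summarized above can be illustrated with a toy count. The sketch below is only an illustration: the token list is invented for the example, and plain relative frequency stands in for an N-gram probability estimate.

```python
from collections import Counter

# Hypothetical transcript tokens: three spellings of one expression plus filler.
tokens = ["という", "っていう", "っちゅう", "という", "です", "です", "です", "です"]

counts = Counter(tokens)
total = sum(counts.values())

# Relative frequency of each surface form when the forms are counted separately.
variants = ("という", "っていう", "っちゅう")
separate = {w: counts[w] / total for w in variants}

# Relative frequency when the three forms are counted as one expression.
pooled = sum(counts[w] for w in variants) / total

print(separate)
print(pooled)
```

Each surface form on its own receives a smaller estimate than the pooled expression, which is the effect the text describes as dispersed statistics.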

Moreover, because the invention of Patent Document 1 performs isolated word recognition, its dictionary is not generated with context taken into account. It therefore cannot handle large-vocabulary continuous speech recognition (recognition of input speech containing multiple words), in which context is very important.

Accordingly, an object of the present invention is to solve the above problems and to provide a language model generation device, and a program therefor, that generate a language model enabling speech recognition with few recognition errors.
A further object of the present invention is to solve the above problems and to provide a speech recognition system that enables speech recognition with few recognition errors.

To solve the above problems, the language model generation device according to the first invention of the present application generates a language model using a learning text that includes synonyms, that is, words or chain words that have the same meaning but differ in notation or reading. The device comprises a language model generation unit, chain word extraction means, edit distance calculation means, minimum edit distance selection means, synonym word / chain word list generation means, and a language model conversion unit.

With this configuration, the language model generation unit of the device learns from the learning text and thereby generates a language model (for example, an N-gram model) indicating the occurrence probability of at least one of the words and chain words included in the learning text. In other words, the language model generation unit generates a probabilistic language model that takes context into account.

As described above, when synonyms exist in the learning text, their statistics are dispersed in a language model generated as a probabilistic language model, and their occurrence probabilities become low. The language model generation device therefore uses the chain word extraction means to extract, as chain words, word pairs that occur in the learning text at or above a preset frequency, in the order that most reduces the per-word entropy of the learning text. The edit distance calculation means calculates the edit distances of the extracted chain words by DP matching. The minimum edit distance selection means selects, as synonym candidates, the chain words whose calculated edit distance is smallest. The synonym word / chain word list generation means receives a selection instruction containing chain words selected in advance from the synonym candidates and, based on that instruction, generates a synonym list in which synonyms having the same meaning are associated with one another. The language model conversion unit then refers to the synonym list, calculates a probability value from the occurrence probabilities, in the language model, of the synonyms having the same meaning, and updates the occurrence probability of those synonyms in the language model with that probability value. In other words, the language model conversion unit corrects synonym occurrence probabilities that were calculated too low because synonyms are present in the learning text.
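A minimal sketch of the correction step, under a loudly stated assumption: this passage says only that a probability value is calculated "based on" the occurrence probabilities of the synonyms, so the summation used here, the function name `merge_synonym_probabilities`, and the toy probabilities are illustrative choices, not the method defined by the patent.

```python
def merge_synonym_probabilities(unigram, synonym_list):
    """Return a copy of `unigram` in which each synonym group shares one value.

    ASSUMPTION: the shared value is the sum of the group's probabilities.
    The text only says the value is "based on" those probabilities.
    """
    updated = dict(unigram)
    for basic, variants in synonym_list.items():
        group = [basic, *variants]
        value = sum(unigram.get(w, 0.0) for w in group)
        for w in group:
            if w in unigram:  # only update entries the model already has
                updated[w] = value
    return updated


# Hypothetical unigram probabilities and synonym list.
unigram = {"という": 0.02, "っていう": 0.01, "っちゅう": 0.005, "です": 0.10}
synonym_list = {"という": ["っていう", "っちゅう"]}

updated = merge_synonym_probabilities(unigram, synonym_list)
print(updated)
```

With these toy values every member of the group ends up with the same pooled value, and words outside the group are untouched. The sketch does not renormalize the model; whether and how that is done is not stated in this passage.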

This allows the user to select the synonyms for a chain word by referring to the synonym candidates in the synonym list.

The language model generation device according to the second invention of the present application is characterized in that the language model conversion unit comprises language model deletion means for deleting, from the language model updated by the language model conversion unit, the synonymous forms of each synonym, that is, the forms other than the basic form, which is the form with the highest occurrence probability.
With this configuration, the language model generation device can reduce the data size of the language model.

The language model generation device according to the third invention of the present application further comprises: a pronunciation dictionary storage unit that stores a pronunciation dictionary in which at least the notation of each synonym is associated in advance with its pronunciation; and a pronunciation dictionary conversion unit that refers to the synonym list and converts the pronunciation dictionary into a converted pronunciation dictionary containing at least the notation of the basic form of each synonym, the notation of the synonymous form corresponding to that basic form, and the pronunciation of that synonymous form.
With this configuration, the language model generation device converts the pronunciation dictionary into a converted pronunciation dictionary in which the basic-form notation is associated with the synonymous-form notation and pronunciation.
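One way to picture the converted pronunciation dictionary is as a list of triples (basic-form notation, surface notation, pronunciation). The sketch below assumes that layout; the function name, the tuple layout, and the katakana pronunciations are hypothetical and are not taken from the patent.

```python
def convert_pronunciation_dictionary(pron_dict, synonym_list):
    """Attach the basic-form notation to every (notation, pronunciation) entry.

    ASSUMPTION: entries that belong to no synonym group are kept and treated
    as their own basic form.
    """
    # Map each synonymous form to its basic form; basic forms map to themselves.
    to_basic = {}
    for basic, variants in synonym_list.items():
        to_basic[basic] = basic
        for variant in variants:
            to_basic[variant] = basic

    converted = []
    for notation, pronunciation in pron_dict:
        converted.append((to_basic.get(notation, notation), notation, pronunciation))
    return converted


# Hypothetical pronunciation dictionary and synonym list.
pron_dict = [("という", "トイウ"), ("っていう", "ッテイウ"), ("朝", "アサ")]
synonym_list = {"という": ["っていう"]}

converted = convert_pronunciation_dictionary(pron_dict, synonym_list)
for row in converted:
    print(row)
```

Keeping the surface notation and pronunciation next to the basic form means a lookup by basic form can still recover how the variant was written and pronounced.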

To solve the above problems, the language model generation program according to the fourth invention of the present application causes a computer to function as the language model generation device according to the first invention.

To solve the above problems, the speech recognition system according to the fifth invention of the present application comprises the language model generation device according to the third invention and a speech recognition device that performs speech recognition using the language model generated by that language model generation device. The speech recognition device comprises an acoustic model storage unit that stores an acoustic model generated in advance by learning from speech data, a speech analysis unit, and a search unit.

With this configuration, the speech analysis unit of the speech recognition device analyzes the input speech and calculates its feature vectors. The search unit calculates an acoustic score by matching the feature vectors calculated by the speech analysis unit against the acoustic model. It also refers to the language model and calculates a language score by multiplying the occurrence probability of each word candidate (a candidate for the speech recognition result) by a first constant and adding a second constant. Referring to the converted pronunciation dictionary, it then outputs, as the speech recognition result, the sequence of word candidates for which the language score and the acoustic score are maximal. In other words, by referring to the converted pronunciation dictionary, the speech recognition device can output the notation and pronunciation of the synonymous form corresponding to a basic form.
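The language score described above is a linear function of the occurrence probability. The sketch below implements that reading literally for single word candidates. How the language and acoustic scores are combined is not spelled out in this passage, so the simple sum in `best_candidate`, the candidate words, and all numeric values are assumptions made for illustration.

```python
def language_score(probability, c1, c2):
    # Literal reading of the text: probability times a first constant,
    # plus a second constant.
    return c1 * probability + c2


def best_candidate(candidates, c1, c2):
    # ASSUMPTION: the combined score is acoustic score + language score.
    return max(
        candidates,
        key=lambda c: c["acoustic"] + language_score(c["prob"], c1, c2),
    )


# Hypothetical candidates with toy acoustic scores and occurrence probabilities.
candidates = [
    {"word": "という", "acoustic": 0.30, "prob": 0.035},
    {"word": "問う", "acoustic": 0.32, "prob": 0.001},
]

best = best_candidate(candidates, c1=10.0, c2=0.0)
print(best["word"])
```

In this toy setting the candidate with the slightly lower acoustic score wins because its language score is larger, which is the situation the corrected synonym probabilities are meant to help with.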

The present invention provides the following excellent effects.
According to the first and fourth inventions, a probabilistic language model that takes context into account is generated, so large-vocabulary continuous speech recognition can be handled. In addition, according to the first invention, synonym occurrence probabilities that were dispersed and calculated too low because of synonyms in the learning text are corrected, so a language model enabling speech recognition with few recognition errors can be generated even from a small learning text.

According to the first and fourth inventions, the user can select synonyms for chain words by referring to the synonym candidates in the synonym list, which greatly reduces the effort of selecting synonyms compared with the case where no synonym list is presented.

According to the second invention, the data size of the language model can be reduced, which saves memory in the speech recognition device that uses the language model.
According to the third invention, the converted pronunciation dictionary associates the basic-form notation with the synonymous-form notation and pronunciation, so the notation and pronunciation of the synonymous form corresponding to a basic form can be output easily by referring to the converted pronunciation dictionary.

According to the fifth invention, a probabilistic language model that takes context into account is generated, so large-vocabulary continuous speech recognition can be handled. In addition, synonym occurrence probabilities that were dispersed and calculated too low because of synonyms in the learning text are corrected, so speech recognition with few recognition errors is possible even when the learning text is small. Furthermore, by referring to the converted pronunciation dictionary, the notation and pronunciation of the synonymous form corresponding to a basic form can be output easily, which improves the convenience of the speech recognition system.

FIG. 1 is a block diagram showing the configuration of a speech recognition system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the synonym word / chain word selection unit of FIG. 1.
FIG. 3 is a diagram showing an example of the chain word list generated by the chain word extraction means of FIG. 2.
FIG. 4 is a diagram showing an example of the minimum-edit-distance chain word relation list generated by the minimum edit distance selection means of FIG. 2.
FIG. 5 is an example of the synonym word / chain word list stored in the synonym word / chain word list storage unit of FIG. 2.
FIG. 6 is a block diagram showing the configuration of the language model conversion unit of FIG. 1.
FIG. 7 is a diagram showing an example of the language model stored in the language model storage unit of FIG. 1, where (a) is the unigram case and (b) is the bigram case.
FIG. 8 is a diagram showing an example of the utterance dictionary stored in the utterance dictionary storage means of FIG. 1.
FIG. 9 is a diagram showing an example of the converted utterance dictionary obtained by converting the utterance dictionary of FIG. 8.
FIG. 10 is a flowchart showing the operation of the synonym word / chain word selection unit of FIG. 2.
FIG. 11 is a flowchart showing the operation of the language model conversion unit of FIG. 6.

Embodiments of the present invention are described in detail below with reference to the drawings as appropriate. In each embodiment, means having the same function are given the same reference numerals, and duplicate description is omitted.

[Outline of the speech recognition system]
An outline of the speech recognition system according to an embodiment of the present invention is described with reference to FIG. 1.
As shown in FIG. 1, the speech recognition system 100 includes a language model generation device 1 and a speech recognition device 3.
The language model generation device 1 generates a language model for speech recognition using a learning text that includes synonyms, that is, words or chain words that have the same meaning but differ in notation or reading.
The speech recognition device 3 recognizes input speech using the language model generated by the language model generation device 1.

A chain word (word chain) is a frequently occurring expression made up of multiple words; the words in one word chain are joined and treated as a single word. For example, the chain word “と＿いう” is formed by joining the word “と” and the word “いう”. Here, “＿” indicates a join between words.

[Configuration of the language model generation device]
The configuration of the language model generation device 1 is described in detail below.
As shown in FIG. 1, the language model generation device 1 includes a learning text storage unit 11, a chain word list storage unit 12, an edit-distance-annotated chain word relation list storage unit 13, a minimum-edit-distance chain word relation list storage unit 14, a synonym word / chain word list storage unit 15, a language model storage unit 16, a pronunciation dictionary storage unit 17, a converted pronunciation dictionary storage unit 18, a synonym word / chain word selection unit (synonym selection unit) 21, a language model generation unit 22, a language model conversion unit 23, and a pronunciation dictionary conversion unit 24.

[Storage units]
The learning text storage unit 11 is storage means, such as a memory or hard disk, that stores the learning text (learning data) needed to generate a language model. The learning text is stored in the learning text storage unit 11 in advance by, for example, a user of the speech recognition system.

The chain word list storage unit 12 is storage means, such as a memory or hard disk, that stores the chain word list.
The edit-distance-annotated chain word relation list storage unit 13 is storage means, such as a memory or hard disk, that stores the edit-distance-annotated chain word relation list.
The minimum-edit-distance chain word relation list storage unit 14 is storage means, such as a memory or hard disk, that stores the minimum-edit-distance chain word relation list.
The synonym word / chain word list storage unit 15 is storage means, such as a memory or hard disk, that stores the synonym word / chain word list (synonym list).
The chain word list, the edit-distance-annotated chain word relation list, the minimum-edit-distance chain word relation list, and the synonym word / chain word list are described in detail together with the synonym word / chain word selection unit 21.

The language model storage unit 16 is storage means, such as a memory or hard disk, that stores the language model. The language model indicates the occurrence probability of at least one of the words and chain words included in the learning text, and is generated by the language model generation unit 22 described later.

The pronunciation dictionary storage unit 17 is storage means, such as a memory or hard disk, that stores the pronunciation dictionary in advance. The pronunciation dictionary associates the notation of each word or chain word with its pronunciation.
The converted pronunciation dictionary storage unit 18 is storage means, such as a memory or hard disk, that stores the converted pronunciation dictionary. The converted pronunciation dictionary is the pronunciation dictionary after conversion by the pronunciation dictionary conversion unit 24 described later, and is referred to when the speech recognition device 3 performs speech recognition.
The pronunciation dictionary and the converted pronunciation dictionary are described in detail together with the pronunciation dictionary conversion unit 24.

[Synonym word / chain word selection unit]
The synonym word / chain word selection unit 21 is described in detail below with reference to FIG. 2.
The synonym word / chain word selection unit 21 refers to the learning text and generates a synonym word / chain word list based on a selection instruction input by the user. As shown in FIG. 2, it includes chain word extraction means 211, edit distance calculation means 212, minimum edit distance selection means (chain word candidate selection means) 213, and synonym word / chain word list generation means 214.

The chain word extraction means 211 performs bigram counting, calculating the occurrence probability of each pair of two consecutive words in order from the beginning of the learning text. For example, if the learning text is “きょう＿は＿温かい＿朝＿だった”, the word pairs are “きょう＿は”, “は＿温かい”, and “朝＿だった”. In this case, because each of these word pairs occurs once, the chain word extraction means 211 calculates an occurrence probability of “1” for each of them. The chain word extraction means 211 then extracts, as chain words, the word pairs that occur in the learning text at or above a preset frequency (threshold), in the order that most reduces the per-word entropy of the learning text. It then generates a chain word list containing the extracted chain words and stores it in the chain word list storage unit 12.
Details of the chain word extraction method are given in, for example, the document “Language model using chain words and classes for dialogue speech, Proceedings of the Acoustical Society of Japan, pp. 71-72, March 2006”.
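The counting part of chain word extraction can be sketched as follows. This is a simplified illustration, not the extraction method of the cited document: it counts adjacent word pairs and keeps those at or above a threshold, and ordering by raw count is a stand-in for the entropy-reduction ordering described in the text. The function name and the toy sentences are hypothetical.

```python
from collections import Counter


def extract_chain_word_candidates(sentences, min_count):
    """Return adjacent word pairs seen at least `min_count` times, joined by "＿".

    ASSUMPTION: candidates are ordered by descending count; the text instead
    orders them by how much they reduce the per-word entropy.
    """
    pair_counts = Counter()
    for words in sentences:
        # zip(words, words[1:]) yields every pair of consecutive words.
        pair_counts.update(zip(words, words[1:]))
    return [
        "＿".join(pair)
        for pair, count in pair_counts.most_common()
        if count >= min_count
    ]


# Hypothetical pre-segmented learning text.
sentences = [
    ["そう", "と", "いう", "こと", "です"],
    ["何", "と", "いう", "こと"],
    ["と", "いう", "話"],
]

chain_words = extract_chain_word_candidates(sentences, min_count=2)
print(chain_words)
```

Here “と＿いう” occurs three times and “いう＿こと” twice, so both pass a threshold of 2, while pairs seen only once are dropped.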

An example of the chain word list is described with reference to FIG. 3.
The chain word list of FIG. 3 stores the chain words that the chain word extraction means 211 extracted from the learning text (for example, “あり＿ます”, “い＿ない”, and “い＿ました”).

Returning to FIG. 2, the description of the synonym word / chain word selection unit 21 continues.
The edit distance calculation means 212 refers to the chain word list and calculates, by DP matching, the edit distances of the chain words it contains. Specifically, for each chain word, the edit distance calculation means 212 performs DP matching against the other N-1 chain words (N being the number of chain words in the list), in units of the words that make up the chain words, and calculates the edit distances. It then generates an edit-distance-annotated chain word relation list containing each chain word and its edit distances, and stores the list in the edit-distance-annotated chain word relation list storage unit 13.
Details of DP matching are given in, for example, the document “Pattern Recognition and Learning Algorithms, Bun-ichi Sogo Shuppan, pp. 91-108”.
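Word-level DP matching of two chain words can be sketched as a standard Levenshtein distance over word sequences. The unit costs for insertion, deletion, and substitution are an assumption, since this passage does not state the cost settings, and the helper names are hypothetical.

```python
def split_chain_word(chain_word):
    # Chain words join their constituent words with the full-width underscore.
    return chain_word.split("＿")


def word_edit_distance(a, b):
    """Levenshtein distance between two word sequences via dynamic programming.

    ASSUMPTION: insertion, deletion, and substitution each cost 1.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,         # delete word_a
                curr[j - 1] + 1,     # insert word_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


base = split_chain_word("なければ＿いけ＿ない")
print(word_edit_distance(base, split_chain_word("なきゃ＿いけ＿ない")))
print(word_edit_distance(base, split_chain_word("あり＿ます")))
```

Because the comparison is made on whole words, two chain words that differ in a single constituent word are one edit apart regardless of how many characters that word contains.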

The minimum edit distance selection means 213 refers to the edit-distance-annotated chain word relation list and selects (lists), as synonym candidates, the chain words in that list whose edit distance is smallest. It then generates a minimum-edit-distance chain word relation list containing the selected synonym candidates (chain words) and stores it in the minimum-edit-distance chain word relation list storage unit 14.
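The selection step can be sketched as picking, for each chain word, the other chain words that share the smallest edit distance. The nested-dictionary layout used for the edit-distance-annotated list, the function name, and the distance values are assumptions made for the example.

```python
def select_minimum_distance_candidates(distances):
    """Map each chain word to the chain words at its minimum edit distance.

    `distances` is assumed to look like {chain_word: {other_chain_word: distance}}.
    All chain words tied at the minimum are kept as synonym candidates.
    """
    candidates = {}
    for chain_word, others in distances.items():
        if not others:
            continue  # nothing to compare against
        smallest = min(others.values())
        candidates[chain_word] = [w for w, d in others.items() if d == smallest]
    return candidates


# Hypothetical edit-distance-annotated relation list.
distances = {
    "なければ＿いけ＿ない": {
        "なきゃ＿いけ＿ない": 1,
        "なければ＿なら＿ない": 1,
        "あり＿ます": 3,
    },
}

candidates = select_minimum_distance_candidates(distances)
print(candidates)
```

Keeping every chain word tied at the minimum, rather than a single winner, leaves the final choice to the user who reviews the candidates.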

Here, an example of the edit distance minimum chain word relationship list will be described with reference to FIG. 4 (see FIG. 2 as appropriate).
As shown in FIG. 4, the minimum edit distance selection means 213 gives the edit distance minimum chain word relationship list a format in which, for example, lines beginning with the identifier "basic type" (基本型) alternate with lines beginning with the identifier "synonym type" (同意型). A line beginning with "basic type" contains exactly one chain word, and a line beginning with "synonym type" contains one or more chain words. Furthermore, each chain word on a "synonym type" line has the minimum edit distance to the chain word on the "basic type" line immediately above it. For example, the list in FIG. 4 shows that the two chain words "なきゃ＿いけ＿ない" and "なければ＿なら＿ない" on a "synonym type" line have the minimum edit distance to the chain word "なければ＿いけ＿ない" on the "basic type" line immediately above.
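A reader for this alternating-line format might look like the sketch below. The identifiers 基本型 and 同意型 come from the description; the whitespace field separator and the function name `parse_relation_list` are assumptions, since the exact layout of FIG. 4 is not reproduced here.

```python
def parse_relation_list(text):
    """Parse alternating 基本型 / 同意型 lines into {basic type: [synonym candidates]}."""
    relations = {}
    basic = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        tag, words = fields[0], fields[1:]
        if tag == "基本型":
            if len(words) != 1:
                raise ValueError("a 基本型 line must hold exactly one chain word")
            basic = words[0]
            relations[basic] = []
        elif tag == "同意型":
            if basic is None or not words:
                raise ValueError("a 同意型 line needs a preceding 基本型 line and at least one chain word")
            relations[basic].extend(words)
        else:
            raise ValueError(f"unknown identifier: {tag}")
    return relations


sample = (
    "基本型 なければ＿いけ＿ない\n"
    "同意型 なきゃ＿いけ＿ない なければ＿なら＿ない\n"
)
parsed = parse_relation_list(sample)
```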

Here, the chain words on a line beginning with "synonym type" in the edit distance minimum chain word relationship list may include some that cannot be regarded as synonyms of the chain word on the corresponding "basic type" line. The user should therefore check the list and select, from the chain words on each "synonym type" line, only those that are acceptable as synonyms. The user then inputs the chain words selected on the basis of the list to the synonym word / chain word list generation means 214 as a selection instruction.

This selection instruction indicates at least one of the words and the chain words selected as synonyms. That is, in addition to chain words, the selection instruction may include words that the user has selected in advance. When only one of chain words and words is used as synonyms, the selection instruction may include only that one.

Returning to FIG. 2, the description of the synonym word / chain word selection unit 21 continues.
The synonym word / chain word list generation means 214 receives the selection instruction from the user and generates a synonym word / chain word list based on it. That is, it stores the synonyms included in the selection instruction to generate the synonym word / chain word list, and then stores the generated list in the synonym word / chain word list storage unit 15.

Here, an example of the synonym word / chain word list will be described with reference to FIG. 5.
The synonym word / chain word list in FIG. 5 contains the chain words selected as synonyms from the edit distance minimum chain word relationship list in FIG. 4. Like that list, it associates synonyms (words with the same meaning and chain words with the same meaning) by inserting the identifiers "basic type" and "synonym type" at the beginning of lines. That is, the list indicates that the chain word on a line beginning with "basic type" and the chain words on the "synonym type" line immediately below it are synonyms. For example, the list in FIG. 5 shows that the chain word "なければ＿いけ＿ない" on a "basic type" line and the chain word "なきゃ＿いけ＿ない" on the "synonym type" line immediately below it are synonyms. On the other hand, the chain word "なければ＿なら＿ない" shown in FIG. 4 is not included in the synonym word / chain word list, because the user judged that it is not a synonym of "なければ＿いけ＿ない".
Although omitted in FIG. 5, the synonym word / chain word list may of course also associate words having the same meaning as synonyms, in the same way as chain words.

In summary, the synonym word / chain word selection unit 21 selects word pairs with a high appearance frequency in the learning text and, among these word pairs, takes the one that most reduces the entropy of the learning text as a chain word (synonym candidate). The unit repeats this process as necessary to obtain M chain words (synonym candidates) and generates the synonym word / chain word list. The user can thereby select synonyms for chain words by referring to, for example, the synonym candidates in the synonym word / chain word list, which greatly reduces the effort of selecting synonyms. If the user could not refer to the synonym word / chain word list, the user would have to perform the difficult task of selecting synonyms for chain words directly from the learning text, which would be an extremely heavy burden. Here, the value of M is set in advance so that the accuracy of speech recognition becomes high.
For words, no list corresponding to the edit distance minimum chain word relationship list is generated, because selecting synonyms for words takes the user less effort than selecting them for chain words.

[Language model generation unit]
Returning to FIG. 1, the description of the language model generation apparatus 1 continues.
The language model generation unit 22 generates a language model by learning (machine learning) the learning text with a probabilistic language model, and stores it in the language model storage unit 16. Here, the language model generation unit 22 uses a word N-gram model as the probabilistic language model. For a word string w_1^n = w_1, ..., w_n contained in the learning text, the word N-gram model is a probabilistic language model that predicts the appearance probability of the word w_n from the immediately preceding N-1 words, and can be expressed by the following equation (1).
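Equation (1) is not reproduced in this text. As a hedged reconstruction based only on the description above (the probability of w_n given its full history is approximated by conditioning on the immediately preceding N-1 words), the standard word N-gram approximation has the following form; the exact notation of the original equation may differ.

```latex
P\bigl(w_n \mid w_1^{n-1}\bigr) \approx P\bigl(w_n \mid w_{n-N+1}^{n-1}\bigr)
```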

The word N-gram model is called a unigram when N = 1, a bigram when N = 2, and a trigram when N = 3. The immediately preceding N-1 words (w_1^{n-1}) are called the history.

Here, an example of the language model will be described with reference to FIG. 7 (see FIG. 1 as appropriate).
In the language model of FIG. 7, the left column is the N-gram probability value (appearance probability), the center column is the parameter name (word or chain word), and the right column is the back-off coefficient.

The N-gram probability value indicates the appearance probability of the word or chain word given in the parameter name; in FIG. 7 the value is written as a logarithm.
The parameter name indicates a word or a chain word. In the parameter names, <s> is the sentence-start symbol and </s> is the sentence-end symbol. That is, this language model treats the sentence-start and sentence-end symbols as words.
The back-off coefficient is a coefficient used when an N-gram probability value whose appearance probability in the learning text is zero is estimated from a lower-order word N-gram model (for example, a unigram).
The details of the back-off coefficient are described later together with the back-off coefficient processing means 234.
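One entry of such a three-column language model could be read as in the sketch below. The tab-separated layout, the optional back-off column, and the names `NgramParam` and `parse_entry` are assumptions for illustration; FIG. 7 is not reproduced here, and the base of the logarithm is not stated in this description.

```python
from typing import NamedTuple, Optional, Tuple


class NgramParam(NamedTuple):
    log_prob: float            # N-gram probability value (logarithmic)
    words: Tuple[str, ...]     # parameter name: words or chain words
    backoff: Optional[float]   # back-off coefficient, if present


def parse_entry(line: str) -> NgramParam:
    """Parse one 'probability<TAB>parameter name<TAB>back-off' line."""
    fields = line.rstrip("\n").split("\t")
    log_prob = float(fields[0])
    words = tuple(fields[1].split())
    # The back-off column is treated as optional in this sketch.
    backoff = float(fields[2]) if len(fields) > 2 and fields[2] else None
    return NgramParam(log_prob, words, backoff)
```

Keeping the three columns together in one record mirrors the later use of "N-gram parameter" to mean the probability value, parameter name and back-off coefficient as a unit.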

[Language model conversion unit]
Returning to FIG. 6, the language model conversion unit 23 is described in detail below.
The language model conversion unit 23 refers to the synonym word / chain word list and converts (corrects) the language model. As shown in FIG. 6, the language model conversion unit 23 includes parameter extraction means 231, probability value calculation means 232, history processing means 233, back-off coefficient processing means 234, and language model update means 235.

The parameter extraction means 231 refers to the synonym word / chain word list and extracts from the language model the N-gram parameters of the synonyms included in the list. It then outputs the extracted N-gram parameters to the probability value calculation means 232. In the following description, an N-gram parameter refers to the N-gram probability value, parameter name, and back-off coefficient of the language model.

The probability value calculation means 232 receives the N-gram parameters from the parameter extraction means 231. It also refers to the synonym word / chain word list and obtains the synonyms having the same meaning that are associated in the list. For those synonyms, it calculates a probability value based on the appearance probabilities of the input N-gram parameters. Here, the probability value calculation means 232 can calculate the probability value by applying arithmetic operations, such as addition, to the appearance probabilities of the input N-gram parameters. It can also calculate the probability value by applying statistical operations, such as taking the average or the maximum. Preferably, the probability value calculation means 232 calculates, as the probability value, one of the sum (Method 1), the average (Method 2), or the maximum (Method 3) of the appearance probabilities of the input N-gram parameters. Six specific examples of calculating the probability value are described below in order.

<First example: Method 1 with a trigram>
First, as the first to third examples, specific cases are described in which Methods 1 to 3 are applied to a trigram in which a chain word w_k appears after the chain word sequence w_i, w_j.
Suppose that clustering the chain words by meaning yields N chain word classes {C_1, ..., C_N}, and that a class C_n (where 1 ≦ n ≦ N) contains K_n + 1 chain words with the same meaning. In this case, among the K_n + 1 chain words, the one with the highest appearance probability is taken as the basic type and the others as synonym types (words likewise have a basic type and synonym types). That is, the K_n + 1 chain words are represented by the following equation (2).

In this first example, chain words expressing the same meaning are interpreted as appearing distributed between the basic type and the synonym types. The probability value calculation means 232 therefore uses the following equation (3) to calculate the sum of the appearance probabilities of the chain words as the probability value.
In equation (3), S_n(κ) denotes the κ-th chain word in class C_n.

<Second example: Method 2 with a trigram>
In this second example, the basic type and the synonym types are interpreted as appearing with equal probability. The probability value calculation means 232 therefore uses the following equation (4) to calculate the average of the appearance probabilities of the chain words as the probability value.

<Third example: Method 3 with a trigram>
In this third example, the probability value calculation means 232 simplifies Methods 1 and 2 and uses the following equation (5) to calculate the maximum of the appearance probabilities of the chain words as the probability value. That is, the probability value calculation means 232 replaces the appearance probability of each synonym type with the appearance probability of the basic type.
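Equations (3) to (5) are not reproduced in this text. One plausible reading of the three methods, written with the symbols defined above, is sketched below. It assumes that the K_n + 1 members of class C_n are indexed κ = 0, ..., K_n and that \hat{P} denotes the updated probability value assigned to every member of the class; both the indexing and the \hat{P} notation are assumptions, not the original formulas.

```latex
\begin{align*}
\text{Method 1 (sum):}\quad
  & \hat{P}\bigl(S_n(\kappa) \mid w_i, w_j\bigr)
    = \sum_{\kappa'=0}^{K_n} P\bigl(S_n(\kappa') \mid w_i, w_j\bigr) \\
\text{Method 2 (average):}\quad
  & \hat{P}\bigl(S_n(\kappa) \mid w_i, w_j\bigr)
    = \frac{1}{K_n + 1} \sum_{\kappa'=0}^{K_n} P\bigl(S_n(\kappa') \mid w_i, w_j\bigr) \\
\text{Method 3 (maximum):}\quad
  & \hat{P}\bigl(S_n(\kappa) \mid w_i, w_j\bigr)
    = \max_{0 \le \kappa' \le K_n} P\bigl(S_n(\kappa') \mid w_i, w_j\bigr)
\end{align*}
```

Because the basic type is defined as the member with the highest appearance probability, the maximum in Method 3 is the basic type's probability, which matches the statement that the synonym types take over the basic type's value.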

<Fourth example: Method 1 with a unigram>
Next, as the fourth to sixth examples, specific cases are described in which Methods 1 to 3 are applied to a unigram. In the fourth to sixth examples, the synonyms and their appearance probabilities are assumed to be as follows. The synonym "んです＿けれど" is the basic type, and the other synonyms "んです＿けれども", "んです＿けど", and "んです＿が" are synonym types.

<<Synonyms and their appearance probabilities in the fourth to sixth examples>>
Synonym              Appearance probability
んです＿けれど        0.4
んです＿けれども      0.3
んです＿けど          0.2
んです＿が            0.1

In this fourth example, as in the first example, the probability value calculation means 232 takes the sum of the appearance probabilities of the synonyms as the probability value. That is, it calculates 0.4 + 0.3 + 0.2 + 0.1 = 1.0. The probability value of each synonym is therefore as follows.

<<Probability values calculated in the fourth example>>
Synonym              Probability value
んです＿けれど        1.0
んです＿けれども      1.0
んです＿けど          1.0
んです＿が            1.0

<Fifth example: Method 2 with a unigram>
In this fifth example, as in the second example, the probability value calculation means 232 takes the average of the appearance probabilities of the synonyms as the probability value. That is, it calculates (0.4 + 0.3 + 0.2 + 0.1) / 4 = 0.25. The probability value of each synonym is therefore as follows.

<<Probability values calculated in the fifth example>>
Synonym              Probability value
んです＿けれど        0.25
んです＿けれども      0.25
んです＿けど          0.25
んです＿が            0.25

<Sixth example: Method 3 with a unigram>
In this sixth example, as in the third example, the probability value calculation means 232 takes the maximum appearance probability among the synonyms, 0.4, as the probability value. The probability value of each synonym is therefore as follows.

<<Probability values calculated in the sixth example>>
Synonym              Probability value
んです＿けれど        0.4
んです＿けれども      0.4
んです＿けど          0.4
んです＿が            0.4
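The three methods, applied to the unigram figures of the fourth to sixth examples, can be sketched as follows. The function name `merge_probabilities` and the string labels for the methods are hypothetical; only the input probabilities and the sum, average and maximum operations come from the description.

```python
import math


def merge_probabilities(probs, method):
    """Give every synonym in one class the same merged probability value."""
    values = list(probs.values())
    if method == "sum":        # Method 1
        merged = math.fsum(values)
    elif method == "mean":     # Method 2
        merged = math.fsum(values) / len(values)
    elif method == "max":      # Method 3
        merged = max(values)
    else:
        raise ValueError(f"unknown method: {method}")
    return {word: merged for word in probs}


unigram = {
    "んです＿けれど": 0.4,    # basic type
    "んです＿けれども": 0.3,
    "んです＿けど": 0.2,
    "んです＿が": 0.1,
}
by_sum = merge_probabilities(unigram, "sum")
by_mean = merge_probabilities(unigram, "mean")
by_max = merge_probabilities(unigram, "max")
```

Up to floating-point rounding, the three results correspond to the values 1.0, 0.25 and 0.4 in the tables above.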

The probability value calculation means 232 then updates the appearance probabilities included in the N-gram parameters input from the parameter extraction means 231 with the calculated probability values, and outputs the updated N-gram parameters to the history processing means 233.

The probability value calculation means 232 may calculate the probability value by any of these methods; for example, which method to use may be set in advance. The probability value calculation means 232 can also calculate probability values for words in the same way as for chain words.

The history processing means 233 receives the N-gram parameters from the probability value calculation means 232 and performs history processing when a word that has a synonym type is present in the history. To simplify the description of history processing, assume that the number of synonym-type patterns is K = 1, that is, that one synonym type exists for the basic type. Also assume that the language model is a bigram.

The probability that the word w_n appears after the word w_{n-1} in the learning text can be expressed by the following equation (6).
In equation (6), C(·) denotes the appearance probability in the learning text.

Similarly, the probability that the word w_n appears after w'_{n-1}, a synonym of the word w_{n-1}, can be expressed by the following equation (7).

From these, the appearance probability obtained by integrating the basic type and the synonym type in the history can be expressed by the following equation (8).
In equation (8), N denotes the sum of the unigram appearance probabilities over all words in the learning text.

The history processing means 233 then uses the following equation (9) to update the appearance probabilities of the N-gram parameters input from the probability value calculation means 232, and outputs the updated N-gram parameters to the back-off coefficient processing means 234.

That is, according to equations (8) and (9), when the appearance probability of either the basic type or the synonym type is zero in the learning text, an N-gram parameter is newly generated for the one whose appearance probability is zero. The appearance probability of this new N-gram parameter is the appearance probability of the other one, which is not zero.

When the order of the word N-gram model becomes large, equation (8) becomes complicated, so in practice it is preferable to approximate it. Possible approximations include, for example, Method A or Method B below.
The history processing means 233 can perform history processing on chain words in the same way as on words.

Method A: Newly generate only the N-gram parameters whose appearance probability is zero, and omit the calculations for the other N-gram parameters.
Method B: For an N-gram parameter whose history is the synonym type, substitute the value of the N-gram parameter whose history is the basic type.
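The two approximations can be sketched for a bigram as follows. This is one reading of Methods A and B, not the exact procedure of equations (8) and (9): the table layout (a dictionary keyed by (history, word)) and the function names `fill_missing` and `substitute_basic` are assumptions, and no renormalization of probabilities is attempted.

```python
def fill_missing(bigrams, basic, synonym):
    """Method A-style: create only the entries missing for one history,
    copying the value from the other history; existing entries are untouched."""
    out = dict(bigrams)
    for (hist, word), p in bigrams.items():
        if hist == basic and (synonym, word) not in bigrams:
            out[(synonym, word)] = p
        elif hist == synonym and (basic, word) not in bigrams:
            out[(basic, word)] = p
    return out


def substitute_basic(bigrams, basic, synonym):
    """Method B-style: entries whose history is the synonym type take the
    value of the corresponding entry whose history is the basic type."""
    out = dict(bigrams)
    for (hist, word), p in bigrams.items():
        if hist == basic:
            out[(synonym, word)] = p
    return out


# Hypothetical bigram table: "B" is a basic-type history, "S" its synonym type.
table = {("B", "x"): 0.5, ("B", "y"): 0.2, ("S", "x"): 0.1, ("S", "z"): 0.3}
method_a = fill_missing(table, "B", "S")
method_b = substitute_basic(table, "B", "S")
```

In this sketch Method A leaves the existing ("S", "x") value alone and only adds ("S", "y") and ("B", "z"), whereas Method B overwrites ("S", "x") with the basic-type value.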

The back-off coefficient processing means 234 receives the N-gram parameters from the history processing means 233 and performs back-off coefficient processing, which updates the back-off coefficients. To simplify the description of back-off coefficient processing, assume, as in history processing, that the number of synonym-type patterns is K = 1 (a synonym w' exists for the basic type w) and that the language model is a bigram.

Back-off smoothing is a technique that estimates the appearance probability P(w_n | w_{n-1}) from the appearance probability P(w_n) when the appearance probability in the learning text is C(w_{n-1} w_n) = 0. Katz's method, one form of back-off smoothing, uses the following equations (10) and (11). In this case, the appearance probabilities of low-frequency words in the learning text (including words whose appearance probability is zero) are preferably corrected in advance using Good-Turing estimation (see, for example, "Probabilistic Language Model, University of Tokyo Press, pp. 67-68").
In equations (10) and (11), α is the back-off coefficient.
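Equations (10) and (11) are not reproduced in this text. The sketch below follows the textbook form of Katz back-off for a bigram, which is assumed (not confirmed) to correspond to them: a seen bigram keeps its discounted probability, and an unseen bigram receives α(w_{n-1}) · P(w_n), where α distributes the probability mass left over after discounting. The function names and the example values are hypothetical, and the discounted probabilities are taken as given rather than computed by Good-Turing estimation.

```python
def backoff_alpha(history, discounted_bigram, unigram):
    """Back-off coefficient for one history word (textbook Katz form)."""
    seen = [w for (h, w) in discounted_bigram if h == history]
    # Mass not used by the seen bigrams after discounting.
    left_over = 1.0 - sum(discounted_bigram[(history, w)] for w in seen)
    # Unigram mass of the words that were NOT seen after this history.
    unseen_mass = 1.0 - sum(unigram[w] for w in seen)
    return left_over / unseen_mass


def katz_bigram(word, history, discounted_bigram, unigram):
    """P(word | history): discounted value if seen, otherwise backed off."""
    if (history, word) in discounted_bigram:
        return discounted_bigram[(history, word)]
    return backoff_alpha(history, discounted_bigram, unigram) * unigram[word]


# Hypothetical model: only the bigram "a b" was observed.
unigram = {"a": 0.5, "b": 0.3, "c": 0.2}
discounted = {("a", "b"): 0.6}
total = sum(katz_bigram(w, "a", discounted, unigram) for w in unigram)
```

With α defined this way the conditional probabilities for a given history sum to one, which is why the coefficient has to be recomputed whenever the probabilities of a history are changed.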

Here, when the basic type w_{n-1} and the synonym type w'_{n-1} are integrated, the back-off coefficient α can be expressed by the following equation (12) (which can be expanded further by substituting equation (8) into it).

The back-off coefficient processing means 234 then uses the following equation (13) to update the back-off coefficients of the N-gram parameters input from the history processing means 233, and outputs the updated N-gram parameters to the language model update means 235.

When the order of the word N-gram model becomes large, equation (12) becomes complicated, so in practice it is preferable to approximate it. Possible approximations include, as in history processing, omitting the calculation (Method A) or substituting the N-gram parameter of the basic type (Method B).
The back-off coefficient processing means 234 can perform back-off coefficient processing on chain words in the same way as on words.

The language model update means 235 receives the N-gram parameters from the back-off coefficient processing means 234 and uses them to update the language model stored in the language model storage unit 16. That is, it updates the appearance probabilities in the stored language model with the appearance probabilities included in the N-gram parameters, and updates the back-off coefficients in the stored language model with the back-off coefficients included in the N-gram parameters.

As shown in FIG. 6, the language model update means 235 includes language model deletion means 236. After the language model update means 235 has updated the language model, the language model deletion means 236 deletes the N-gram parameters of the synonym types from the language model. Because this reduces the data size of the language model, the speech recognition apparatus 3 that refers to the language model can save memory during speech recognition.

[Pronunciation dictionary conversion unit]
The pronunciation dictionary conversion unit 24 is described in detail below with reference to FIGS. 8 and 9 (see FIG. 1 as appropriate).
The pronunciation dictionary conversion unit 24 refers to the synonym word / chain word list and converts the format of the pronunciation dictionary. As shown in FIG. 8, in the pronunciation dictionary the left column is the notation of a chain word or word, and the right column is its pronunciation. In this pronunciation dictionary, pronunciations are written in Roman letters, and ":" indicates that the immediately preceding vowel is lengthened. For example, two pronunciations, "toiu sp" and "toyou: sp", are registered in this pronunciation dictionary for the chain word "と＿いう".

Here, the synonym word / chain word list makes it possible to determine which words and chain words are synonyms of one another, and which of those synonyms are the basic type and which are synonym types. For example, suppose that the basic type "と＿いう" and its synonym type "って＿いう" are set in the synonym word / chain word list. Because two pronunciations of the basic type "と＿いう" are registered in the pronunciation dictionary of FIG. 8, the pronunciation dictionary conversion unit 24 registers, in the converted pronunciation dictionary of FIG. 9, the basic-type notation "と＿いう" with each of the two basic-type pronunciations "toiu sp" and "toyou: sp". That is, as shown in FIG. 9, the pronunciation dictionary conversion unit 24 registers the basic-type notation "と＿いう" in the left and center columns and the first pronunciation "toiu sp" in the right column. It likewise registers the basic-type notation "と＿いう" in the left and center columns and the second pronunciation "toyou: sp" in the right column (see symbol α in FIGS. 8 and 9).

Because three pronunciations of the synonym type "って＿いう" are registered in the pronunciation dictionary of FIG. 8, the pronunciation dictionary conversion unit 24 registers, in the converted pronunciation dictionary of FIG. 9, the basic-type notation "と＿いう", the synonym-type notation "って＿いう", and each of the synonym-type pronunciations "Qteiu sp", "Qteyu: sp", and "Qtu: sp". That is, as shown in FIG. 9, the pronunciation dictionary conversion unit 24 registers the basic-type notation "と＿いう" in the left column, the synonym-type notation "って＿いう" in the center column, and the first synonym-type pronunciation "Qteiu sp" in the right column. It registers the same left and center columns with the second synonym-type pronunciation "Qteyu: sp" in the right column, and again with the third synonym-type pronunciation "Qtu: sp" in the right column (see symbol β in FIGS. 8 and 9).

That is, the pronunciation dictionary conversion unit 24 converts the pronunciation dictionary of FIG. 8 into a converted pronunciation dictionary that holds the basic-type notation, the synonym-type notation, and the synonym-type pronunciation. Accordingly, in the converted pronunciation dictionary of FIG. 9, when the notations in the left and center columns differ, the center column is a synonym type of the left column.
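The conversion from the two-column dictionary to the three-column dictionary can be sketched as follows. The data structures (a dictionary of pronunciation lists and a dictionary from basic type to synonym types) and the function name `convert_pronunciation_dictionary` are assumptions; the entries are the ones quoted in the description, and it is assumed that a notation with no basic type of its own is its own basic type.

```python
def convert_pronunciation_dictionary(pron_dict, synonym_list):
    """Return rows of (basic-type notation, notation, pronunciation)."""
    # Map each synonym type back to its basic type.
    to_basic = {
        syn: basic
        for basic, syns in synonym_list.items()
        for syn in syns
    }
    rows = []
    for notation, pronunciations in pron_dict.items():
        # A notation without a registered basic type maps to itself.
        basic = to_basic.get(notation, notation)
        for pron in pronunciations:
            rows.append((basic, notation, pron))
    return rows


pron_dict = {
    "と＿いう": ["toiu sp", "toyou: sp"],
    "って＿いう": ["Qteiu sp", "Qteyu: sp", "Qtu: sp"],
}
synonym_list = {"と＿いう": ["って＿いう"]}
rows = convert_pronunciation_dictionary(pron_dict, synonym_list)
```

Each row keeps the original notation in the center column, so the left and center columns differ exactly for the synonym-type entries.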

[Operation of the language model generation device]
<Synonym word / chain word selection unit>
The operation of the synonym word / chain word selection unit 21 of FIG. 2 is described below with reference to FIG. 10 (see FIG. 2 as appropriate).
First, the language model generation device 1 extracts chain words from the learning text using the chain word extraction means 211 (step S1). The language model generation device 1 then calculates the edit distances of the extracted chain words by DP matching, using the edit distance calculation means 212 (step S2).

Next, the language model generation device 1 uses the minimum edit distance selection means 213 to select the chain words with the minimum edit distance as synonym candidates, and generates a minimum-edit-distance chain word relation list (step S3). Finally, the language model generation device 1 uses the synonym word / chain word list generation means 214 to generate the synonym word / chain word list based on an input selection instruction (step S4).
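Steps S2 and S3 can be illustrated with a standard dynamic-programming (Levenshtein) edit distance followed by a nearest-neighbor selection. This is a sketch under assumptions: the distance is computed over characters of a romanized chain word, and the function names `edit_distance` and `nearest_chain_words` are hypothetical. The unit of comparison actually used by the device is not stated in this section.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_chain_words(chain_words):
    """For each chain word, return (minimum distance, closest other chain words)."""
    result = {}
    for word in chain_words:
        scored = [(edit_distance(word, other), other)
                  for other in chain_words if other != word]
        if not scored:
            continue
        best = min(d for d, _ in scored)
        result[word] = (best, sorted(o for d, o in scored if d == best))
    return result


# Romanized stand-ins for chain words (illustrative data only).
candidates = nearest_chain_words(["to_iu", "tte_iu", "de_aru"])
print(candidates["to_iu"])
```

The resulting candidate list is only a proposal; per step S4, the final synonym list is built from an externally supplied selection instruction.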

<Language model conversion unit>
The operation of the language model conversion unit 23 of FIG. 6 is described below with reference to FIG. 11 (see FIG. 6 as appropriate).
First, the language model generation device 1 generates a language model using the language model generation unit 22 (step S11). The language model generation device 1 then extracts the N-gram parameters of the synonyms from the language model using the parameter extraction means 231 (step S12).

Next, the language model generation device 1 calculates a probability value based on the appearance probabilities, using the probability value calculation means 232 (step S13). The language model generation device 1 then performs history processing using the history processing means 233 (step S14).

The language model generation device 1 then performs back-off coefficient processing using the back-off coefficient processing means 234 (step S15). Finally, using the language model update means 235, the language model generation device 1 updates the language model with the N-gram parameters for which the probability value and the back-off coefficient have been calculated (step S16).

As described above, the language model generation device 1 according to the embodiment of the present invention generates a context-aware word N-gram model with the language model generation unit 22, and can therefore support large-vocabulary continuous speech recognition. In addition, the language model conversion unit 23 corrects the appearance probabilities of synonyms, which are calculated as low values because the occurrences are spread across the synonyms present in the learning text. The language model generation device 1 can therefore generate a language model that enables speech recognition with few recognition errors even when the learning text is small.
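To make the idea of correcting "dispersed" probabilities concrete, the following sketch pools the probabilities of the members of each synonym group and writes the pooled value back to every member. Pooling by summation is an assumption made purely for illustration: this section does not state the formula used by the probability value calculation means 232, and the sketch ignores histories and back-off coefficients. The function name `pool_synonym_probabilities` and the example values are hypothetical.

```python
def pool_synonym_probabilities(probabilities, synonym_groups):
    """Return a copy of probabilities where each synonym group shares one pooled value.

    Assumption: the pooled value is the sum of the group's probabilities.
    """
    updated = dict(probabilities)
    for group in synonym_groups:
        present = [w for w in group if w in probabilities]
        pooled = sum(probabilities[w] for w in present)
        for w in present:
            updated[w] = pooled
    return updated


# Illustrative unigram probabilities (made-up numbers, romanized words).
probabilities = {"to_iu": 0.02, "tte_iu": 0.01, "desu": 0.05}
updated = pool_synonym_probabilities(probabilities, [["to_iu", "tte_iu"]])
print(updated)
```

After such pooling the values no longer sum to one across the vocabulary, which is one reason a real implementation would need further processing such as removing the variant entries or adjusting back-off coefficients.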

To eliminate the problem of notation variation (e.g., "取引する" and "取り引きする"), it is preferable that, after updating the language model, the language model generation device 1 keep only the N-gram parameters that include a predetermined recommended notation and delete the other N-gram parameters.

The language model generation device 1 has been described using the identifier "基本型" (basic type) for the basic type and the identifier "同意型" (synonym type) for the synonym type, but the identifiers are not limited to these. For example, the language model generation device 1 may use "ref" as the identifier for the basic type and "hyp" as the identifier for the synonym type.

In the embodiment, the language model generation device according to the present invention has been described as an independent device. However, the present invention can also be realized by a program that causes a general-purpose computer to function as each of the means described above. This program may be distributed via a communication line, or may be written to a recording medium such as a CD-ROM or a flash memory and distributed.

[Configuration of the speech recognition device]
Returning to FIG. 1, the configuration of the speech recognition device 3 is described.
As shown in FIG. 1, the speech recognition device 3 includes an acoustic model storage unit 31, a speech analysis unit 33, and a search unit 35.

The acoustic model storage unit 31 is a storage means, such as a memory or a hard disk, that stores an acoustic model in advance. This acoustic model is a probabilistic model generated in advance by training (machine learning) on a large amount of speech data.

The speech analysis unit 33 receives the input speech (speech signal), analyzes it to calculate a feature vector of the input speech, and outputs the feature vector to the search unit 35. Specifically, the speech analysis unit 33 cuts out the input speech with a Hamming window and performs linear prediction (LPC) analysis or mel-cepstrum analysis to obtain the feature vector (MFCC features) of the input speech.
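The windowing step can be sketched as follows. The frame length and hop size are arbitrary illustrative values, not parameters given in the text, and the function names are hypothetical. The sketch covers only the Hamming-window segmentation; the subsequent linear prediction or mel-cepstrum analysis is omitted.

```python
import math


def hamming_window(n):
    """Standard Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames and apply a Hamming window to each."""
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


# Ten constant samples, frames of 4 samples with a hop of 2 (illustrative sizes).
frames = frame_signal([1.0] * 10, frame_len=4, hop=2)
print(len(frames), frames[0])
```

Each windowed frame would then be passed to the spectral analysis that produces one feature vector per frame.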

The search unit 35 receives the feature vector of the input speech from the speech analysis unit 33 and, from this feature vector, outputs the speech recognition result using the language model, the acoustic model, and the converted pronunciation dictionary. Specifically, the search unit 35 matches the feature vector of the input speech against the acoustic model to obtain a probability value (likelihood), and calculates the logarithm (log) of this value as the acoustic score. During speech recognition, the search unit 35 also obtains from the language model the appearance probability (N-gram probability) of each word candidate for the recognition result. Here, the search unit 35 preferably performs the correct-word search using the basic-type parameters (appearance probability and back-off coefficient). The search unit 35 then takes the logarithm of the appearance probability, multiplies it by a first constant called the language weight, and adds a second constant called the insertion penalty to obtain the language score. Finally, referring to the converted pronunciation dictionary of FIG. 9, the search unit 35 outputs the sequence of word candidates that maximizes the language score and the acoustic score as the speech recognition result (the "recognition result" in FIG. 1).
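The scoring described above can be written out in a few lines. The sketch assumes that the quantity being maximized is the sum of the acoustic score and the per-word language scores, which is the usual convention but is not spelled out in the text. The function names, the hypothesis tuple layout, and the numeric values are hypothetical.

```python
import math


def acoustic_score(likelihood):
    """Logarithm of the acoustic likelihood."""
    return math.log(likelihood)


def language_score(ngram_probability, language_weight, insertion_penalty):
    """language weight * log P + insertion penalty, for one word candidate."""
    return language_weight * math.log(ngram_probability) + insertion_penalty


def total_score(likelihood, ngram_probabilities, language_weight, insertion_penalty):
    """Assumed combination: acoustic score plus the sum of per-word language scores."""
    lm = sum(language_score(p, language_weight, insertion_penalty)
             for p in ngram_probabilities)
    return acoustic_score(likelihood) + lm


def best_hypothesis(hypotheses, language_weight, insertion_penalty):
    """hypotheses: list of (words, acoustic likelihood, per-word N-gram probabilities)."""
    best = max(hypotheses,
               key=lambda h: total_score(h[1], h[2], language_weight, insertion_penalty))
    return best[0]


# Two made-up hypotheses: the second fits the audio slightly better,
# but the first is far more probable under the language model.
hypotheses = [
    (["a", "b"], 1e-4, [0.1, 0.2]),
    (["a", "c"], 2e-4, [0.01, 0.02]),
]
print(best_hypothesis(hypotheses, language_weight=10.0, insertion_penalty=0.0))
```

The language weight controls how strongly the N-gram probabilities influence the choice relative to the acoustic match, and the insertion penalty is added once per word, so it affects the preferred number of words in the result.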

As described above, in the speech recognition device 3 according to the embodiment of the present invention, the search unit 35 can refer to the pronunciation sequence corresponding to the word candidate (basic type) selected during the correct-word search. The device can therefore use the converted pronunciation dictionary of FIG. 9 to output the synonym-type notation listed in the center column. That is, the speech recognition device 3 can output the synonym-type notation and pronunciation corresponding to the basic type, which improves the convenience of the speech recognition system 100.

The effects of the present invention are described below by way of examples.
Here, language models were generated by the language model generation device 1 of FIG. 1 using Methods 1 to 3. Using each language model, the speech recognition device 3 of FIG. 1 performed speech recognition with a news information program as the input speech (large-vocabulary continuous speech recognition). For comparison, the same news information program was also recognized using language models generated by conventional methods, and the word error rates were obtained. Table 1 below shows the word error rates.

As shown in Table 1, the language models of Methods 1 to 3 have lower word error rates (fewer recognition errors) than the language models of Comparative Examples 1 and 2. That is, the language model generation device 1 can generate a language model that enables speech recognition with fewer recognition errors than the conventional techniques.
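For reference, word error rate is conventionally computed as the word-level edit distance between the reference transcript and the recognition result, divided by the number of reference words. The sketch below follows that common definition; the text does not describe the exact scoring procedure used in the experiment, and the function name `word_error_rate` and the sample word lists are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, 1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


# One substitution and one deletion against four reference words.
wer = word_error_rate(["a", "b", "c", "d"], ["a", "x", "c"])
print(wer)
```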

In addition, the data size was compared between a language model from which the language model deletion means 236 had deleted the synonym-type N-gram parameters so that the converted pronunciation dictionary of FIG. 9 could be used (Example 4) and language models generated by conventional methods.
The method of Comparative Example 3 adds the missing N-gram parameter when a basic-type N-gram parameter exists in the language model but the corresponding synonym-type N-gram parameter does not, or vice versa.

As shown in Table 2, the language model of Example 4 has a smaller data size than the language models of Comparative Examples 1 and 3. That is, the language model generation device 1 can reduce the data size of the language model compared with the conventional techniques.

DESCRIPTION OF SYMBOLS
1 Language model generation device
11 Learning text storage unit
12 Chain word list storage unit
13 Edit-distance-annotated chain word relation list storage unit
14 Minimum-edit-distance chain word relation list storage unit
15 Synonym word / chain word list storage unit
16 Language model storage unit
17 Pronunciation dictionary storage unit
18 Converted pronunciation dictionary storage unit
21 Synonym word / chain word selection unit (synonym selection unit)
211 Chain word extraction means
212 Edit distance calculation means
213 Minimum edit distance selection means (chain word candidate selection means)
214 Synonym word / chain word list generation means
22 Language model generation unit
23 Language model conversion unit
231 Parameter extraction means
232 Probability value calculation means
233 History processing means
234 Back-off coefficient processing means
235 Language model update means
236 Language model deletion means
24 Pronunciation dictionary conversion unit
3 Speech recognition device
31 Acoustic model storage unit
33 Speech analysis unit
35 Search unit
100 Speech recognition system

Claims

A language model generation device that generates a language model using a learning text including synonyms, each synonym being a word or chain word that has the same meaning as another but a different notation or reading, the language model generation device comprising:
a language model generation unit that generates a language model indicating the appearance probability of at least one of the words or chain words included in the learning text, by learning the learning text using a probabilistic language model;
chain word extraction means for extracting, as the chain words, word pairs that appear in the learning text more frequently than a preset frequency, in the order that most reduces the per-word entropy of the learning text;
edit distance calculation means for calculating, by DP matching, the edit distances of the chain words extracted by the chain word extraction means;
minimum edit distance selection means for selecting, as synonym candidates, the chain words that minimize the edit distance calculated by the edit distance calculation means;
synonym word / chain word list generation means for receiving a selection instruction including chain words preselected from the synonym candidates and, based on the selection instruction, generating a synonym list in which the synonyms having the same meaning are associated with one another in advance; and
a language model conversion unit that, with reference to the synonym list, calculates a probability value based on the appearance probabilities in the language model of the synonyms having the same meaning, and updates the appearance probabilities of those synonyms included in the language model with the probability value.

The language model generation device according to claim 1, wherein the language model conversion unit includes
language model deletion means for deleting, from the language model after the update by the language model conversion unit, the synonym types of the synonyms, namely those other than the basic type of the synonym having the maximum appearance probability.

The language model generation device according to claim 1 or claim 2, further comprising:
a pronunciation dictionary storage unit that stores a pronunciation dictionary associating in advance at least the notations of the synonyms with the pronunciations of the synonyms; and
a pronunciation dictionary conversion unit that, with reference to the synonym list, converts the pronunciation dictionary into a converted pronunciation dictionary including at least the basic-type notation of each synonym, the synonym-type notation of the synonym corresponding to that basic type, and the synonym-type pronunciation of the synonym.

A language model generation program for causing a computer to function as the language model generation device according to claim 1.

A speech recognition system comprising: the language model generation device according to claim 3; and a speech recognition device that performs speech recognition using the language model generated by the language model generation device,
wherein the speech recognition device includes:
an acoustic model storage unit that stores an acoustic model, which is a probabilistic model generated in advance by learning speech data;
a speech analysis unit that analyzes input speech and calculates a feature vector of the input speech; and
a search unit that calculates an acoustic score by matching the feature vector calculated by the speech analysis unit against the acoustic model, calculates a language score by referring to the language model, multiplying the appearance probability of each word candidate for the speech recognition result by a first constant, and adding a second constant to the product, and, with reference to the converted pronunciation dictionary, outputs the sequence of word candidates that maximizes the language score and the acoustic score as the speech recognition result.