JP4537970B2

JP4537970B2 - Language model creation device, language model creation method, program thereof, and recording medium thereof

Info

Publication number: JP4537970B2
Application number: JP2006075364A
Authority: JP
Inventors: 浩和政瀧; 哲小橋川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2006-03-17
Filing date: 2006-03-17
Publication date: 2010-09-08
Anticipated expiration: 2026-03-17
Also published as: JP2007249050A

Description

The present invention relates to a technique for creating a language model suited to a particular purpose.

In speech recognition and statistical machine translation, for example, a language model is used as a linguistic constraint to improve recognition performance. When the intended use (task) of speech recognition or the like is limited, it is generally held that recognition accuracy can be raised by using a language model built specifically for that use.

The N-gram, a statistical language model in wide use in recent years, must be trained on a large amount of data to yield a high-performance model. When the use is limited, collecting a large amount of text data for that use is generally difficult. To address this problem, language model adaptation methods have been proposed in which a language model trained on a large amount of text data, including out-of-task text, is adapted using text from the target task.

Non-Patent Document 1 discloses a method of adapting a language model in which a plurality of text databases are prepared, optimal weights for the language models built from each database are found using text for a specific use, and the language models are summed with those weights.
Non-Patent Document 2 discloses a method of adapting language models built from each of a plurality of text databases to a specific use by maximum a posteriori probability estimation rather than by weights.
Non-Patent Document 1: P. R. Clarkson and A. J. Robinson, "Language model adaptation using mixtures and an exponentially decaying cache", Proc. ICASSP'97, vol. 2, pp. 799-802, April 1997.
Non-Patent Document 2: Hirokazu Masataki, Yoshinori Sagisaka, Kazuya Hisaki, Tatsuya Kawahara, "Task Adaptation of N-gram Language Models by Maximum A Posteriori Probability Estimation", IEICE Transactions, Vol. J81-D-II, No. 11, pp. 2519-2525, November 1998.

The prior art has the problem that text data matching the target use is always required, even if only in a small amount. In addition, optimal weighting and maximum a posteriori probability estimation focus on obtaining the best performance as a language model on text related to the given use, so accuracy does not necessarily improve in application fields other than that use.

In view of the above problems, an object of the present invention is to create a language model suited to a target use (an adaptive language model) without requiring text data that matches the target use.

To solve the above problems, the present invention creates an adaptive language model as follows. A cluster language model corresponding to each text data cluster is created from a plurality of text data clusters. Language models (hereinafter, "partial selection synthesis cluster language models") are each synthesized from a combination of the cluster language models that remain after some cluster language models are excluded from the full set. Each partial selection synthesis cluster language model is then evaluated using the evaluation data and the evaluation acoustic model, and an evaluation result is output for each. Next, the cluster language models that were excluded when synthesizing the partial selection synthesis cluster language models that gave low evaluation results are selected. If exactly one cluster language model is selected, it is output as the created language model. If more than one is selected, a single language model is synthesized from them and output.

In the above selection, a low evaluation result may be defined as an evaluation result at or below a threshold (or strictly below the threshold), and the cluster language models excluded in synthesizing the language model or partial selection synthesis cluster language model that gave such a result may be selected.

The adaptive language model may also be created as follows. The input text data is divided into a plurality of text data clusters according to a classification criterion.
In this way, when existing general-purpose large-scale text data is used as the text data, for example, text data clusters can be created with a classification criterion appropriate to the use.

A language model creation program that causes a computer to function as the language model creation device of the present invention allows the computer to operate as that device. A computer-readable program recording medium on which this language model creation program is recorded makes it possible to have other computers function as the language model creation device and to distribute the program.

According to the present invention, synthesized cluster language models are synthesized from a plurality of combinations of the cluster language models corresponding to a plurality of text data clusters, and the language models that give high evaluation results on the evaluation data are picked out from among the cluster language models and/or the synthesized cluster language models. A language model suited to the target use (an adaptive language model) can therefore be created without requiring any text data that matches the specific use.

Furthermore, because cluster language models and the like are selected by a criterion based on recognition accuracy rather than on performance as a language model, an adaptive language model that improves recognition accuracy more directly can be obtained.

<< First Embodiment >>
A first embodiment of the present invention will be described with reference to the drawings.
<Language Model Creation Device of First Embodiment>
As illustrated in FIG. 1, the language model creation device (1) includes an input unit (11) to which a keyboard or the like can be connected, an output unit (12) to which a liquid crystal display or the like can be connected, a CPU (Central Processing Unit) (14) [which may include a cache memory or the like], a RAM (Random Access Memory) (15) serving as memory, a ROM (Read Only Memory) (16), an external storage device (17) such as a hard disk, and a bus (18) that connects the input unit (11), output unit (12), CPU (14), RAM (15), ROM (16), and external storage device (17) so that data can be exchanged among them. If necessary, the language model creation device (1) may also be provided with a device (drive) that can read and write a storage medium such as a CD-ROM. A general-purpose computer is one example of a physical entity with such hardware resources.

The external storage device (17) of the language model creation device (1) stores the programs for creating a language model and the data required for processing by those programs. Data obtained through processing by these programs is stored in the RAM (15) or the like as appropriate.

The external storage device (17) also stores text data (111). The text data (111) is assumed to have been divided in advance into a plurality (N) of pieces of data, each of which is called a "text data cluster". That is, the text data (111) consists of text data cluster [1] (111-1), text data cluster [2] (111-2), ..., text data cluster [N] (111-N). The text data clusters are not limited to divisions of a single body of text data. For example, a plurality of bodies of text data may be prepared and each treated as a separate text data cluster, or a text data cluster may be a merge of several bodies of text data. If a plurality of text data clusters is prepared in this way, text data clusters with identical content may exist, and the present invention permits this case as well. In other words, it is desirable that the text data clusters differ from one another, but the present invention can still be carried out when text data clusters with identical content exist [the same applies to each embodiment].

The external storage device (17) further stores, as data, evaluation data (118) for evaluating language models and an evaluation acoustic model (119). [The evaluation data is assumed to consist of evaluation speech data and its transcription text. This assumes that the language model is used for speech recognition and is not intended to exclude preparing evaluation data suited to other uses such as machine translation. The evaluation speech data should preferably be acoustically close to the speech that is expected to be recognized.]

The external storage device (17) also stores a program for creating a language model from a text data cluster (hereinafter, a language model created from a text data cluster is called a "cluster language model"), a program for synthesizing a language model from a plurality of cluster language models (hereinafter, a "synthesized cluster language model"), a program for evaluating language models, and a program for selecting a language model from the evaluation results.

In the language model creation device (1), the programs stored in the external storage device (17) and the data required for their processing are read into the RAM (15) as needed and are interpreted, executed, and processed by the CPU (14). As a result, the CPU (14) realizes the predetermined functions (cluster language model creation unit, model synthesis unit, model evaluation unit, selection unit), and the creation of a language model is thereby realized.

<Outline of First Embodiment>
In the first embodiment, a cluster language model is created for each text data cluster, and synthesized cluster language models are synthesized from a plurality of combinations of those cluster language models. Each cluster language model and each synthesized cluster language model is then evaluated using the evaluation data (118) and the evaluation acoustic model (119), and the language model that achieves the highest recognition performance, as one example of a predetermined evaluation result, is identified as the adaptive language model. The predetermined evaluation result is not limited to the highest recognition performance. A favorable evaluation indicating an improved recognition rate, such as one determined relative to a threshold, may also serve as the predetermined evaluation result.

<Language Model Creation Processing of First Embodiment>
Next, the flow of the language model creation processing in the language model creation device (1) will be described in order with reference to FIGS. 2 to 4.

First, the cluster language model creation unit (113) reads text data cluster [1] (111-1), text data cluster [2] (111-2), ..., text data cluster [N] (111-N) from the RAM (15) and uses them to create a set (114) of cluster language models, one for each text data cluster (step S1).
That is, the cluster language model creation unit (113) creates cluster language model [1] (114-1) from text data cluster [1] (111-1), cluster language model [2] (114-2) from text data cluster [2] (111-2), and so on up to cluster language model [N] (114-N) from text data cluster [N] (111-N). The N cluster language models [1] (114-1), [2] (114-2), ..., [N] (114-N) are stored in the RAM (15) or the like as appropriate.
The method of creating a language model (cluster language model) from text data (a text data cluster) follows known techniques, so its description is omitted. Any language model that can be learned automatically from text data, such as an N-gram, can be used.
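As a minimal sketch of step S1, the following assumes that each text data cluster is a list of whitespace-tokenized sentences and that an add-one-smoothed bigram model stands in for the N-gram training that the text leaves to known techniques. The names `train_bigram_lm` and `bigram_prob`, the smoothing choice, and the sample sentences are illustrative assumptions, not part of the source.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams for one text data cluster (illustrative only)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return {"unigrams": unigrams, "bigrams": bigrams}

def bigram_prob(lm, prev, word):
    """Add-one smoothed P(word | prev); the smoothing choice is an assumption."""
    vocab_size = len(lm["unigrams"])
    return (lm["bigrams"][(prev, word)] + 1) / (lm["unigrams"][prev] + vocab_size)

# One cluster language model per text data cluster, mirroring 111-i -> 114-i.
clusters = {
    1: ["the train departs at noon", "the train is late"],
    2: ["the patient has a fever"],
}
cluster_lms = {i: train_bigram_lm(s) for i, s in clusters.items()}
```

Keeping one model object per cluster index makes it straightforward to refer to the models by the same bracketed indices the text uses when combinations are formed later.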

Next, the model synthesis unit (115) synthesizes synthesized cluster language models (116) from a plurality of combinations of the cluster language models (114-1), (114-2), ..., (114-N) read from the RAM (15) (step S2). In the first embodiment, "a plurality of combinations" means the exhaustive set of all combinations of two or more cluster language models.

More specifically, the model synthesis unit (115) creates $\sum_{j=2}^{N} {}_{N}C_{j}$ synthesized cluster language models from the N cluster language models (114-1), (114-2), ..., (114-N). Here ${}_{N}C_{j}$ denotes the number of combinations of j items chosen from N items without repetition. The symbol M is defined as $M = \sum_{j=2}^{N} {}_{N}C_{j} = {}_{N}C_{2} + {}_{N}C_{3} + \cdots + {}_{N}C_{N}$.
That is, the model synthesis unit (115) synthesizes, for example, synthesized cluster language model [1+2] (116-2) from cluster language model [1] (114-1) and cluster language model [2] (114-2); synthesized cluster language model [1+2+3] (116-3) from cluster language model [1] (114-1), cluster language model [2] (114-2), and cluster language model [3] (114-3); synthesized cluster language model [5+9] from cluster language model [5] and cluster language model [9] (114-9); and so on, up to synthesized cluster language model [N+(N-1)+...+1] (116-M) from all the cluster language models. The method of synthesizing one language model (a synthesized cluster language model) from a plurality of language models (cluster language models) follows known techniques, so its description is omitted [see, for example, the reference below].
(Reference) Japanese Patent Application Laid-Open No. 2004-53745
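The exhaustive enumeration of step S2 can be sketched as follows, under the assumption that cluster language models are identified by the integers 1 to N. The function names are hypothetical. The count matches the closed form $M = 2^{N} - N - 1$, which follows from subtracting the empty set and the N singletons from the $2^{N}$ subsets.

```python
from itertools import combinations
from math import comb

def enumerate_combinations(n):
    """Yield every subset of cluster indices 1..n with at least two members."""
    indices = range(1, n + 1)
    for j in range(2, n + 1):
        yield from combinations(indices, j)

def count_combinations(n):
    """M = sum_{j=2}^{N} C(N, j), which equals 2**N - N - 1."""
    return sum(comb(n, j) for j in range(2, n + 1))

# For N = 4 clusters there are 6 pairs, 4 triples, and 1 quadruple.
combos = list(enumerate_combinations(4))
```

Because M grows exponentially in N, this enumeration is practical only for a small number of clusters.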

The M synthesized cluster language models (116-1), (116-2), ..., (116-M) are stored in the RAM (15) or the like as appropriate. When the processing of step S2 is complete, the RAM (15) holds a total of (N+M) language models: the N cluster language models (114-1), (114-2), ..., (114-N) and the M synthesized cluster language models (116-1), (116-2), ..., (116-M).

In the first embodiment, "a plurality of combinations" was taken to mean the exhaustive set of all possible combinations. However, when the number of text data clusters is large, the number of exhaustive combinations becomes very large, so synthesized cluster language models may instead be synthesized from arbitrarily chosen combinations. That is, fewer than M synthesized cluster language models may be synthesized from arbitrary combinations.
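One way to pick fewer than M combinations, sketched here as an assumption since the text only says the combinations may be arbitrary, is to draw random subsets of size two or more without enumerating the full set. The function name `sample_combinations` and the use of a seeded pseudo-random generator are illustrative choices.

```python
import random

def sample_combinations(n, k, seed=0):
    """Draw up to k distinct subsets (size >= 2) of cluster indices 1..n."""
    rng = random.Random(seed)
    # There are only 2**n - n - 1 distinct subsets of size two or more.
    k = min(k, 2**n - n - 1)
    chosen = set()
    while len(chosen) < k:
        size = rng.randint(2, n)
        combo = tuple(sorted(rng.sample(range(1, n + 1), size)))
        chosen.add(combo)
    return sorted(chosen)

# 20 combinations out of the 1013 possible for N = 10.
subset = sample_combinations(10, 20)
```

Drawing the subset size uniformly favors no particular size class; a different sampling scheme could be substituted if small or large combinations are preferred.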

Next, the model evaluation unit (117) reads the (N+M) language models (114-1), (114-2), ..., (114-N), (116-1), (116-2), ..., (116-M), the evaluation data (118), and the evaluation acoustic model (119) from the RAM (15). Using each language model together with the evaluation acoustic model (119), it obtains the recognition rate (150) on the evaluation data (118) and takes this as the evaluation result of that language model (step S3).
Because every language model is evaluated with the same evaluation data (118) and the same evaluation acoustic model (119), differences in the recognition rate reflect differences among the language models (114-1), (114-2), ..., (114-N), (116-1), (116-2), ..., (116-M). The recognition rate of the evaluation speech data is calculated by a known method.

The specific functional configuration of the model evaluation unit (117) will be described with reference to FIG. 4.
The model evaluation unit (117) consists of a feature amount calculation unit (1171), a solution search unit (1172), and a text matching unit (1173).
The feature amount calculation unit (1171) calculates the feature amounts of the speech data (1181) that forms part of the evaluation data (118). The feature amounts are, for example, multidimensional MFCCs (mel-frequency cepstral coefficients).
Next, the solution search unit (1172) uses the feature amounts of the speech data, the evaluation acoustic model (119), and each of the language models (114-1), (114-2), ..., (114-N), (116-1), (116-2), ..., (116-M) to output recognition result text for the speech data. This processing yields one recognition result text for each language model.
The text matching unit (1173) then uses the transcription text (1182) that forms part of the evaluation data (118) and each recognition result text to calculate the recognition rate (150) of the recognition result text against the transcription text (1182). This recognition rate (150) is the evaluation result of each language model.
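The comparison performed by the text matching unit can be illustrated with word accuracy based on edit distance. This is an assumption: the text only states that the recognition rate is obtained by a known method, and word accuracy is one common choice. The function names and the whitespace tokenization are hypothetical.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def recognition_rate(reference, hypothesis):
    """Word accuracy: 1 - (edit distance / number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)
```

With this definition, a hypothesis containing many insertions can score below zero, which is a known property of word accuracy rather than a bug in the sketch.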

Next, based on the evaluation results of the language models (114-1), (114-2), ..., (114-N), (116-1), (116-2), ..., (116-M), the selection unit (120) selects the language model that gave the highest evaluation result and outputs it as the adaptive language model (123) (step S4). This adaptive language model (123) is the language model adapted to the intended use.
Although the first embodiment selects the language model that gave the highest evaluation result, a threshold may instead be set in advance and the language models that gave evaluation results at or above the threshold (or strictly above it) may be selected. In this case more than one language model may be selected, and these become candidates for the language model adapted to the intended use. In other words, setting the threshold strictly makes it possible to output a single language model, while setting it loosely makes it possible to output several language models as candidates for the adaptive language model when the output need not be limited to one. The threshold is assumed to be stored in advance in the external storage device (17).
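The two selection rules of step S4 can be sketched as follows, assuming the evaluation results are held in a dictionary from model label to recognition rate. The function names are hypothetical, and the scores are made-up placeholder values for illustration only, not results from the source.

```python
def select_best(scores):
    """Return the label of the model with the highest recognition rate."""
    return max(scores, key=scores.get)

def select_above_threshold(scores, threshold, inclusive=True):
    """Return all models whose recognition rate meets the threshold."""
    if inclusive:
        return [m for m, s in scores.items() if s >= threshold]
    return [m for m, s in scores.items() if s > threshold]

# Placeholder scores for illustration only.
scores = {"[1]": 0.81, "[2]": 0.64, "[1+2]": 0.78, "[5+9]": 0.83}
best = select_best(scores)
candidates = select_above_threshold(scores, 0.80)
```

The `inclusive` flag corresponds to the choice in the text between "at or above the threshold" and "strictly above the threshold".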

The influence of differing amounts of text data can be reduced by equalizing the amount of data in each text data cluster. In addition, a language model adapted to a given use can be created by changing the evaluation data according to the intended use.

<< Second Embodiment >>
A second embodiment of the present invention will be described with reference to the drawings.
<Outline of Second Embodiment>
In the second embodiment, a cluster language model is created for each text data cluster, and synthesized cluster language models are synthesized from a plurality of combinations of those cluster language models. Each cluster language model and each synthesized cluster language model is then evaluated using the evaluation data (118) and the evaluation acoustic model (119). Further, a single language model is synthesized from the language models that gave a predetermined evaluation result [hereinafter, "integration processing"], and the language model output by this integration processing is taken as the adaptive language model. For the predetermined evaluation result, see the description of the first embodiment.

<Language Model Creation Device of Second Embodiment>
The language model creation device of the second embodiment has the same hardware configuration as that of the first embodiment, so only the parts that differ from the first embodiment are described.
In the second embodiment, the external storage device (17) stores, in addition to the programs of the first embodiment, a program for performing the integration processing that synthesizes a single language model from the language models that gave the predetermined evaluation result.

In the language model creation device (1) of the second embodiment, the programs stored in the external storage device (17) and the data required for their processing are read into the RAM (15) as needed and are interpreted, executed, and processed by the CPU (14). As a result, the CPU (14) realizes the predetermined functions (cluster language model creation unit, model synthesis unit, model evaluation unit, selection unit, model integration unit), and the creation of a language model is thereby realized.

<Language Model Creation Processing of Second Embodiment>
Next, the flow of the language model creation processing in the second embodiment will be described in order with reference to FIGS. 5 and 6. Only the parts that differ from the flow of the first embodiment are described here.

In the language model creation processing of the second embodiment, the following processing is performed after step S3 of the first embodiment.
Based on the evaluation results of the (N+M) language models (114-1), (114-2), ..., (114-N), (116-1), (116-2), ..., (116-M), the selection unit (120a) selects the language models that gave evaluation results at or above a threshold (or strictly above it) (step S4a). This threshold is stored in advance in the external storage device (17) and is read by the selection unit (120a) in the processing of step S4a.
For example, if the language models that gave evaluation results at or above the threshold in step S3 are cluster language model [1] and synthesized cluster language model [5+9], the selection unit (120a) selects cluster language model [1] and synthesized cluster language model [5+9].
The case in which only one language model is selected corresponds to the first embodiment, so the second embodiment assumes that a plurality of language models are selected.

Next, the model integration unit (121) performs integration processing on the selected language models and outputs the adaptive language model (123) (step S5).
The integration processing of the model integration unit (121) is the same as the model synthesis processing of the model synthesis unit (115) [see the reference cited above]. However, whereas the model synthesis processing of the model synthesis unit (115) creates M synthesized cluster language models, the integration processing of the model integration unit (121) creates a single language model.
Taking as an example the case in which cluster language model [1] and synthesized cluster language model [5+9] are selected, the model integration unit (121) performs model synthesis on cluster language model [1] and synthesized cluster language model [5+9] and outputs one language model. This language model is the adaptive language model (123).
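The text defers the synthesis method to the cited reference and does not describe it, so the following uses equal-weight linear interpolation of word probabilities purely as a stand-in to show the shape of the integration step. The function name `interpolate`, the unigram-table representation, and the probability values are all assumptions for illustration.

```python
def interpolate(models, weights=None):
    """Linearly interpolate word-probability tables into one model."""
    if not models:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    # Union of the vocabularies; a word missing from a model contributes 0.
    vocab = set().union(*models)
    return {
        w: sum(wt * m.get(w, 0.0) for wt, m in zip(weights, models))
        for w in vocab
    }

# Placeholder unigram tables standing in for models [1] and [5+9].
lm_1 = {"train": 0.5, "late": 0.5}
lm_5_9 = {"train": 0.25, "fever": 0.75}
adaptive = interpolate([lm_1, lm_5_9])
```

Because the weights sum to one and each input table sums to one, the interpolated table is again a probability distribution over the combined vocabulary.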

<< Third Embodiment >>
A third embodiment of the present invention will be described with reference to the drawings.
<Outline of Third Embodiment>
In the third embodiment, a cluster language model is created for each text data cluster, and language models are synthesized from the combinations that remain after some of the cluster language models are excluded from the full set (these are partial selection synthesis cluster language models, hereinafter abbreviated simply as "synthesized cluster language models"). Each synthesized cluster language model is evaluated using the evaluation data (118) and the evaluation acoustic model (119). Then, for each synthesized cluster language model that gave a predetermined evaluation result, the cluster language model that was excluded from its synthesis is selected; one language model is synthesized from the selected cluster language models [integration process], and the language model output by this integration process is used as the adaptive language model. Here, the predetermined evaluation result is an evaluation result that is equal to or less than a threshold, or smaller than the threshold.

<Language Model Creation Device of Third Embodiment>
The language model creation device of the third embodiment has the same hardware configuration as that of the second embodiment, so only the parts that differ from the second embodiment are described.
In the third embodiment, the program for realizing the selection unit (120) of the second embodiment is replaced by a program for selecting the cluster language models that were excluded from the synthesis of the synthesized cluster language models that gave the predetermined evaluation result.
Likewise, the program for realizing the model integration unit (121) of the second embodiment is replaced by a program for creating an adaptive language model from the selected cluster language models.

In the language model creation device (1) of the third embodiment, each program stored in the external storage device (17) and the data necessary for processing each program are read into the RAM (15) as needed, and are interpreted, executed, and processed by the CPU (14). As a result, the CPU (14) realizes the predetermined functions (cluster language model creation unit, model synthesis unit, model evaluation unit, selection unit, model integration unit), and creation of the language model is thereby realized.

<Language Model Creation Processing of Third Embodiment>
Next, the flow of the language model creation process in the third embodiment will be described step by step with reference to FIG. 7 and FIG. 8. Only the parts that differ from the flow of the language model creation process in the second embodiment are described.

In the language model creation process of the third embodiment, the following process is performed after step S1 of the second embodiment.
That is, from the cluster language models (114-1) (114-2) ... (114-N) read from the RAM (15), the model synthesis unit (115b) synthesizes synthesized cluster language models (116b), each from the combination of cluster language models that remains after a different single cluster language model is excluded from the full set (step S2b).
In other words, the model synthesis unit (115b) synthesizes the synthesized cluster language model [without 1] (116b-1) from the N-1 cluster language models (114-2) (114-3) ... (114-N) that remain after excluding the cluster language model [1] (114-1) from all the cluster language models (114-1) (114-2) ... (114-N); synthesizes the synthesized cluster language model [without 2] (116b-2) from the N-1 cluster language models (114-1) (114-3) ... (114-N) that remain after excluding the cluster language model [2] (114-2); and so on, up to the synthesized cluster language model [without N] (116b-N), which it synthesizes from the N-1 cluster language models (114-1) (114-2) ... (114-(N-1)) that remain after excluding the cluster language model [N] (114-N). Here, [without j] denotes the model synthesized with the cluster language model [j] excluded.
The N synthesized cluster language models (116b-1) ... (116b-N) are stored in the RAM (15) or the like as appropriate.
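The leave-one-out combinations of step S2b can be enumerated as in the following sketch. Only the bookkeeping of which clusters feed each synthesized model is shown; the helper name `leave_one_out` and the use of integer cluster ids are assumptions made for illustration.

```python
def leave_one_out(cluster_ids):
    """Map each excluded cluster id j to the list of the remaining cluster ids."""
    return {j: [k for k in cluster_ids if k != j] for j in cluster_ids}


# Example with N = 4 cluster language models.
combos = leave_one_out([1, 2, 3, 4])
for excluded, remaining in combos.items():
    print(f"[without {excluded}] is synthesized from clusters {remaining}")
```

For N clusters this yields exactly N combinations, each containing N-1 cluster language models, which matches the count stated in the text.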

Step S2b is followed by step S3b. Specifically, the model evaluation unit (117b) reads the N synthesized cluster language models (116b-1) ... (116b-N), the evaluation data (118), and the evaluation acoustic model (119) from the RAM (15), obtains the recognition rate for the evaluation data (118) using each synthesized cluster language model together with the evaluation acoustic model (119), and outputs this recognition rate as the evaluation result of that synthesized cluster language model (step S3b). The functional configuration of the model evaluation unit (117b) is the same as that of the model evaluation unit (117) of the first embodiment.
Because every evaluation uses the same evaluation data (118) and the same evaluation acoustic model (119), any difference in the recognition rates obtained as evaluation results is attributable to the differences among the synthesized cluster language models (116b-1) ... (116b-N).
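The text does not define how the recognition rate is computed. One common choice, assumed here, is word accuracy derived from the word-level edit distance between the reference transcript and the recognizer output. In the sketch below, `recognize` is a hypothetical stand-in for a decoder that combines a synthesized cluster language model with the evaluation acoustic model; it is replaced by a lookup table so that the example runs on its own.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def recognition_rate(recognize, eval_data):
    """Word accuracy in percent over (audio, reference_words) pairs."""
    errors = total = 0
    for audio, ref in eval_data:
        errors += edit_distance(ref, recognize(audio))
        total += len(ref)
    return 100.0 * (total - errors) / total


# Hypothetical evaluation data and recognizer outputs.
eval_data = [("utt1", ["a", "b", "c"]), ("utt2", ["d", "e"])]
fake_outputs = {"utt1": ["a", "x", "c"], "utt2": ["d", "e"]}

rate = recognition_rate(fake_outputs.get, eval_data)
print(f"recognition rate: {rate:.1f}%")
```

Running the same function once per synthesized cluster language model, with the evaluation data held fixed, gives scores that differ only because of the language model, as the paragraph above notes.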

Subsequently, the selection unit (120b) selects the cluster language models that were excluded from the synthesis of those synthesized cluster language models (116b-1) ... (116b-N) whose evaluation results are the predetermined evaluation result (step S4b).
Here, the "predetermined evaluation result" means that the evaluation result of the synthesized cluster language model [without j] (116b-j) [j = 1, 2, ..., N] is equal to or less than a predetermined threshold (or smaller than the threshold). In other words, the selection unit selects the cluster language model that was excluded from the synthesis of each synthesized cluster language model whose evaluation result does not reach the threshold.
For example, suppose that the threshold is a recognition rate of 70% and that, in the process of step S3b, the evaluation result of the synthesized cluster language model [without 5] is 60%, that of the synthesized cluster language model [without 9] is 63%, and those of all the other synthesized cluster language models are greater than 70%. The evaluation results of the synthesized cluster language models [without 5] and [without 9] are then below the 70% threshold. The selection unit (120b) therefore selects the cluster language model [5] and the cluster language model [9], which were excluded from the synthesis of the synthesized cluster language models [without 5] and [without 9], respectively.
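The selection rule of step S4b can be sketched with the numbers from the example above (threshold 70%, 60% when cluster 5 is left out, 63% when cluster 9 is left out). The function name `select_excluded_clusters` and the scores of the other models are hypothetical.

```python
def select_excluded_clusters(loo_scores, threshold, inclusive=False):
    """Return the ids j whose leave-one-out model scored below the threshold.

    loo_scores maps the excluded cluster id j to the recognition rate (%)
    of the synthesized model built without cluster j.
    """
    if inclusive:
        return sorted(j for j, s in loo_scores.items() if s <= threshold)
    return sorted(j for j, s in loo_scores.items() if s < threshold)


# 60% and 63% come from the example; the other two values are made up
# to stand in for models that scored above 70%.
loo_scores = {1: 71.0, 5: 60.0, 9: 63.0, 12: 72.5}

selected = select_excluded_clusters(loo_scores, threshold=70.0)
print(selected)
```

A low score for the model built without cluster j is read as evidence that cluster j matters, so clusters 5 and 9 are the ones selected.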

Step S4b is followed by step S5b. The model integration unit (121b) performs an integration process on the selected cluster language models and outputs the adaptive language model (123) (step S5b). If only one cluster language model is selected, the model integration unit (121b) is unnecessary: the cluster language model selected by the selection unit (120b) can simply be output as the adaptive language model, giving the same configuration as described for step S4 of the first embodiment.
The integration process of the model integration unit (121b) is the same as the model synthesis process of the model synthesis unit (115) [see the above-mentioned reference]. However, whereas the model synthesis process of the model synthesis unit (115) creates M synthesized cluster language models, the integration process of the model integration unit (121b) creates a single language model.
For example, when the cluster language model [5] and the cluster language model [9] are selected, the model integration unit (121b) performs model synthesis on the cluster language model [5] and the cluster language model [9] and outputs one language model. This language model is the adaptive language model (123).

The configuration of the third embodiment rests on the following observation: if the recognition accuracy of the synthesized cluster language model that omits cluster language model k is low, then excluding cluster language model k degrades recognition accuracy, which conversely means that cluster language model k contributes greatly to improving recognition accuracy. The third embodiment does not guarantee that the language model with the highest evaluation score among all combinations is selected. However, because the number of synthesized cluster language models to be evaluated equals the number of text data clusters, it greatly reduces the number of evaluations and is an effective way to create an adaptive language model with a realistic amount of computation.
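To give a sense of the savings, the sketch below compares the number of evaluations needed by the leave-one-out scheme with an exhaustive search. It assumes that "all combinations" means every non-empty subset of the N cluster language models, which is an interpretation and is not spelled out in the text; the function name is hypothetical.

```python
def evaluation_counts(n_clusters):
    """Return (exhaustive, leave_one_out) evaluation counts for N clusters."""
    exhaustive = 2 ** n_clusters - 1   # every non-empty subset of N models
    leave_one_out = n_clusters         # one synthesized model per excluded cluster
    return exhaustive, leave_one_out


for n in (5, 10, 50):
    exhaustive, loo = evaluation_counts(n)
    print(f"N={n}: exhaustive={exhaustive}, leave-one-out={loo}")
```

Under this assumption the exhaustive count grows exponentially with N while the leave-one-out count grows linearly, which is why the latter stays practical even for dozens of clusters.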

<< Fourth Embodiment >>
A fourth embodiment of the present invention will be described with reference to the drawings.
<Outline of Fourth Embodiment>
The fourth embodiment includes a data classification process that classifies an enormous amount of text data into a plurality of text data clusters according to an appropriate classification criterion. This data classification process can be applied in combination with any of the first, second, and third embodiments. The fourth embodiment is described here as combined with the second embodiment.

<Language Model Creation Device of Fourth Embodiment>
The language model creation device of the fourth embodiment has the same hardware configuration as that of the second embodiment, so only the parts that differ from the second embodiment are described.
In the fourth embodiment, the external storage device (17) stores a program for classifying text data in addition to the programs of the second embodiment.
In addition, whereas the second embodiment assumes that N text data clusters are stored in the external storage device (17), the fourth embodiment assumes that the text data (111a) before division is stored in the external storage device (17).

In the language model creation device (1) of the fourth embodiment, each program stored in the external storage device (17) and the data necessary for processing each program are read into the RAM (15) as needed, and are interpreted, executed, and processed by the CPU (14). As a result, the CPU (14) realizes the predetermined functions (data classification unit, cluster language model creation unit, model synthesis unit, model evaluation unit, selection unit, model integration unit), and creation of the language model is thereby realized.

<Language Model Creation Processing of Fourth Embodiment>
Next, the flow of the language model creation process in the fourth embodiment will be described step by step with reference to FIG. 9 and FIG. 10. Only the parts that differ from the flow of the language model creation process in the second embodiment are described.

The data classification unit (110) reads large-scale existing text data (111a) obtained from newspaper articles, the Web, and the like, divides the text data (111a) into N text data clusters, namely text data cluster [1] (111-1), text data cluster [2] (111-2), ..., text data cluster [N] (111-N), according to a preset classification criterion (for example, topic), and outputs them (step S1p). A general clustering method such as the K-means method can be used for the classification.
The N text data clusters are stored in the RAM (15) or the like as appropriate. In FIG. 9, the text data (111b) denotes the set of text data clusters after division.

Following step S1p, the processing from step S1 of the second embodiment onward is performed.

Various embodiments have been described. In all of them, a cluster language model or a synthesized cluster language model is selected by evaluation using the evaluation data, and an adaptive language model is created from the selection; therefore, by using evaluation data suited to the intended application, an optimal language model specialized for that application can be created.
In addition, the selection of cluster language models or synthesized cluster language models need not rely solely on a threshold on the difference in recognition performance; other factors, such as a limit on the amount of text data, may be added. Furthermore, since the magnitude of the difference in recognition performance can be regarded as the degree of contribution to improving recognition performance, the adaptive language model may, for example, be synthesized by weighting the selected synthesized cluster language models according to the difference in recognition performance, in which case an improvement in the performance of the adaptive language model can be expected.
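The weighting idea mentioned above is not specified further in the text. One possible reading, sketched here under stated assumptions, is to make each selected model's weight proportional to how far its evaluation result falls from a baseline such as the threshold. The function name `difference_weights`, the choice of baseline, and the normalization to a sum of 1 are all hypothetical.

```python
def difference_weights(scores, baseline):
    """Weights proportional to the gap (baseline - score), normalized to sum to 1.

    Assumption: a larger gap means a larger contribution, so a larger weight.
    """
    gaps = {j: baseline - s for j, s in scores.items() if baseline - s > 0}
    if not gaps:
        raise ValueError("no model differs from the baseline in the expected direction")
    total = sum(gaps.values())
    return {j: gap / total for j, gap in gaps.items()}


# Recognition rates (%) of the models built without cluster 5 and without
# cluster 9, compared against a 70% baseline (values reused for illustration).
weights = difference_weights({5: 60.0, 9: 63.0}, baseline=70.0)
print(weights)
```

The resulting weights could then be supplied to whatever synthesis method is used to build the adaptive language model.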

The language model creation device and method of the present invention are not limited to the embodiments described above and can be modified as appropriate without departing from the spirit of the present invention. Moreover, the processes described for the language model creation device and method need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as required.

When the processing functions of the language model creation device are realized by a computer, the processing contents of the functions that the language model creation device should have are described by a program. By executing this program on the computer, the processing functions of the language model creation device are realized on the computer.

The program describing these processing contents can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable) / RW (ReWritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing a process, the computer reads the program stored in its own recording medium and executes the process according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time a program is transferred to it from the server computer. Alternatively, the above processing may be executed by a so-called ASP (Application Service Provider) type service, in which no program is transferred from the server computer to the computer and the processing functions are realized only through execution instructions and retrieval of results. Note that the program in this embodiment includes information that is provided for processing by an electronic computer and that is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the processing of the computer).

In this embodiment, the language model creation device is configured by executing a predetermined program on a computer; however, at least part of these processing contents may be realized in hardware.

The effect of the present invention was evaluated, following the third embodiment, in continuous speech recognition of simulated dialogue data.
As the existing text database, large-scale text data of about 100 million words collected from the Web and other sources was used. The text data was classified into 50 database clusters by the K-means method, using as the input vector the occurrence probabilities in the text data of each of the 5000 words registered in the dictionary, and using the Euclidean distance between input vectors as the distance measure. A cluster language model (trigram) was built for each database cluster, and the synthesized cluster language models (trigram) were built from them. The acoustic model was a gender-dependent hidden Markov model, and about one hour of speech data was used for evaluation. In the selection unit, the threshold was set to 58.7, and three cluster language models were selected from the synthesized cluster language models whose recognition rate was below it. These three cluster language models were synthesized by the model integration unit to obtain the adaptive language model (trigram).
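The clustering setup described above (word occurrence probabilities as input vectors, Euclidean distance, K-means) can be sketched in plain Python as follows. This is a toy illustration, not the experimental code: it assumes one vector per text, uses a four-word vocabulary instead of 5000 words and two clusters instead of 50, and all function names are hypothetical.

```python
import random
from collections import Counter


def word_prob_vector(tokens, vocab):
    """Occurrence probability of each vocabulary word within one text."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain K-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vocab = ["train", "ticket", "goal", "match"]
texts = [
    "train ticket train".split(),
    "ticket train ticket".split(),
    "goal match goal".split(),
    "match goal match".split(),
]
vectors = [word_prob_vector(t, vocab) for t in texts]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy data the two travel-related texts end up in one cluster and the two sports-related texts in the other; which cluster receives index 0 depends on the random initialization.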

As a baseline language model not adapted to the application, a language model (trigram) was built from all of the above text data clusters; its recognition accuracy on the evaluation speech was 59%. In contrast, the adaptive language model (trigram), adapted to the application by classifying and selecting text data, achieved a recognition accuracy of 65% on the same evaluation speech. That is, adapting the language model according to the present invention improved recognition accuracy by 6 percentage points.
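The reported improvement can be expressed in several ways, as the short calculation below shows. The accuracies of 59% and 65% come from the text; treating the error rate as 100 minus the accuracy is an assumption added here for illustration.

```python
baseline_acc = 59.0   # recognition accuracy (%) of the unadapted trigram model
adapted_acc = 65.0    # recognition accuracy (%) of the adaptive trigram model

# Absolute gain in percentage points.
absolute_gain = adapted_acc - baseline_acc

# Gain relative to the baseline accuracy, in percent.
relative_gain = 100.0 * absolute_gain / baseline_acc

# Assumption: error rate = 100 - accuracy (not defined this way in the text).
error_reduction = 100.0 * absolute_gain / (100.0 - baseline_acc)

print(f"absolute gain:   {absolute_gain:.1f} points")
print(f"relative gain:   {relative_gain:.2f} %")
print(f"error reduction: {error_reduction:.2f} %")
```

The absolute gain is 6 points, which corresponds to a relative accuracy gain of roughly 10% and, under the stated assumption, a relative error reduction of roughly 15%.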

The present invention is useful for creating language models used in speech recognition, for example, character input based on speech recognition or the speech recognition of a dialogue system.

FIG. 1 is a diagram showing an example of the hardware configuration of the language model creation device.
FIG. 2 is a block diagram showing an example of the functional configuration of the language model creation device according to the first embodiment.
FIG. 3 is a diagram showing the processing flow of the language model creation process according to the first embodiment.
FIG. 4 is a block diagram showing an example of the functional configuration of the model evaluation unit.
FIG. 5 is a block diagram showing an example of the functional configuration of the language model creation device according to the second embodiment.
FIG. 6 is a diagram showing the processing flow of the language model creation process according to the second embodiment.
FIG. 7 is a block diagram showing an example of the functional configuration of the language model creation device according to the third embodiment.
FIG. 8 is a diagram showing the processing flow of the language model creation process according to the third embodiment.
FIG. 9 is a block diagram showing an example of the functional configuration of the language model creation device according to the fourth embodiment.
FIG. 10 is a diagram showing the processing flow of the language model creation process according to the fourth embodiment.

Explanation of symbols

1 Language model creation device
110 Data classification unit
111, 111a, 111b Text data
111-1 ... 111-N Text data cluster
113 Cluster language model creation unit
114-1 ... 114-N Cluster language model
115, 115b Model synthesis unit
116 Synthesized cluster language model
116b Partial selection synthesis cluster language model
117, 117b Model evaluation unit
118 Evaluation data
119 Evaluation acoustic model
120, 120a, 120b Selection unit
121, 121b Model integration unit
123 Adaptive language model

Claims

A language model creation device comprising:
storage means for storing a plurality of text data clusters, evaluation data that is data used for evaluating language models, and an evaluation acoustic model that is an acoustic model used for evaluating language models;
cluster language model creation means for creating, from each text data cluster, a language model corresponding to that text data cluster (hereinafter referred to as a “cluster language model”);
model synthesis means for synthesizing language models (hereinafter each referred to as a “partial selection synthesis cluster language model”), each from a combination of the cluster language models that remain after some cluster language models are excluded from all the cluster language models;
model evaluation means for evaluating each partial selection synthesis cluster language model using the evaluation data and the evaluation acoustic model, and outputting an evaluation result for each partial selection synthesis cluster language model;
selection means for selecting the cluster language model excluded from the synthesis of a partial selection synthesis cluster language model that gave a low evaluation result among the evaluation results, and, when one cluster language model is selected, outputting that cluster language model as the created language model; and
model integration means for synthesizing and outputting one language model from a plurality of the selected cluster language models.

The language model creation device according to claim 1, wherein
the storage means also stores a threshold value, and
the low evaluation result that the selection means uses as its selection criterion is an evaluation result that is equal to or less than the threshold value, or smaller than the threshold value.

The language model creation device according to claim 1 or claim 2, further comprising
data classification means for dividing input text data into a plurality of text data clusters according to a classification criterion and outputting them,
wherein each text data cluster stored in the storage means is one output by the data classification means.

A language model creation method in which storage means of a language model creation device stores a plurality of text data clusters, evaluation data that is data used for evaluating language models, and an evaluation acoustic model that is an acoustic model used for evaluating language models, the method comprising:
a cluster language model creation step in which cluster language model creation means of the language model creation device creates, from each text data cluster, a language model corresponding to that text data cluster (hereinafter referred to as a “cluster language model”);
a model synthesis step in which model synthesis means of the language model creation device synthesizes language models (hereinafter each referred to as a “partial selection synthesis cluster language model”), each from a combination of the cluster language models that remain after some cluster language models are excluded from all the cluster language models;
a model evaluation step in which model evaluation means of the language model creation device evaluates each partial selection synthesis cluster language model using the evaluation data and the evaluation acoustic model, and outputs an evaluation result for each partial selection synthesis cluster language model;
a selection step in which selection means of the language model creation device selects the cluster language model excluded from the synthesis of a partial selection synthesis cluster language model that gave a low evaluation result among the evaluation results, and, when one cluster language model is selected, outputs that cluster language model as the created language model; and
a model integration step in which model integration means of the language model creation device synthesizes and outputs one language model from a plurality of the selected cluster language models.

A language model creation program for causing a computer to function as the language model creation device according to any one of claims 1 to 3.

A computer-readable recording medium on which the language model creation program according to claim 5 is recorded.