JP2014085462A

JP2014085462A - Method for language model evaluation, device for the same and program

Info

Publication number: JP2014085462A
Application number: JP2012233530A
Authority: JP
Inventors: Kazuhiro Arai; 和博荒井; Akira Masumura; 亮増村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-10-23
Filing date: 2012-10-23
Publication date: 2014-05-12
Anticipated expiration: 2032-10-23
Also published as: JP5986883B2

Abstract

PROBLEM TO BE SOLVED: To provide a language model evaluation method and an apparatus for the same that make it possible to grasp quantitatively the effect of additional learning of a language model. SOLUTION: A language model evaluation apparatus 100 comprises: a language model unit 10 that stores word chains of two or more words obtained from transcription text, together with their occurrence probabilities; a comparison unit 20 that compares the word chains stored in the language model unit 10 with the word chains of an additional transcription text input from outside, and outputs storage information indicating whether each word chain is already stored in the language model unit 10; and an evaluation unit 30 that calculates, on the basis of the storage information, an unobserved word string ratio, which is the ratio of the word chains not stored in the language model unit 10 to the word chains of the additional transcription text.

Description

The present invention relates to a language model evaluation method, an apparatus for the same, and a program that can be used when training a statistical language model (hereinafter, "language model") such as those required for continuous speech recognition.

Methods for additionally training a language model on collected transcription texts, with the aim of improving the accuracy of continuous speech recognition, are disclosed in, for example, Patent Documents 1 and 2. The additional learning method disclosed in Patent Document 1 is briefly described with reference to FIG. 10.

FIG. 10 shows the configuration of the recognition-task symbol chain probability generation unit 140, that is, the functional unit that creates the language model. The weight determination unit 210 receives the text data of each recognition task in the recognition task database 150 and the text data of each general text database 160-n, and determines a weight for each general text database 160-n from the similarity between the recognition-task text data and the text data of that general text database 160-n.

The symbol chain probability generation unit 220 receives the text data of the weighted recognition task text database 150 and of the weighted general text databases 160-1 to 160-N output by the weight determination unit 210, generates a language model, and stores it in the symbol chain probability database 120. Conventional additional learning of a language model thus determined weights from the similarity of the text data and built the language model successively and mechanically.

Patent Document 1: Japanese Patent No. 3628245
Patent Document 2: JP-A-10-319989

With the conventional additional learning methods, it was not possible to show quantitatively, from the content and amount of the added transcription text, how much additional learning would improve, for example, the speech recognition accuracy when the language model is used for speech recognition, and the effect expected from additional learning could not be estimated in advance. Consequently, when additional learning of the language model failed to improve the accuracy of continuous speech recognition, the cause could not be identified quantitatively. Moreover, because there was no way to judge in advance whether adding transcription text would improve the recognition accuracy, it was unclear how much transcription text should be added.

The present invention has been made in view of these problems. Its object is to provide a language model evaluation method, an apparatus for the same, and a program that can show, in additional learning of a language model, how much the added transcription text changes the language model and whether it can be expected to improve the accuracy of continuous speech recognition.

The language model evaluation method of the present invention comprises a comparison process and an evaluation process. The comparison process compares word chains of two or more words, obtained from transcription text and stored in a language model unit, with the word chains of an additional transcription text input from outside, and outputs storage information indicating whether each word chain is already stored in the language model unit. The evaluation process calculates, on the basis of the storage information, an unobserved word string ratio, which is the ratio of the word chains not stored in the language model unit to the word chains of the additional transcription text.

According to the language model evaluation method of the present invention, it is possible to calculate the unobserved word string ratio, an index of how far the additional transcription text contains word chains that have not been observed so far. The effect of additional learning, which conventionally could be known only from measures such as the accuracy obtained by actually running speech recognition, can therefore be known quantitatively without running speech recognition. In other words, the progress of language model learning can be grasped by observing how the unobserved word string ratio changes.

FIG. 1 is a diagram showing an example functional configuration of the language model evaluation apparatus 100 of the present invention.
FIG. 2 is a diagram showing the operation flow of the language model evaluation apparatus 100.
FIG. 3 is a diagram showing an example functional configuration of the comparison unit 20.
FIG. 4 is a diagram showing the operation flow of the comparison unit 20.
FIG. 5 is a diagram explaining the operation of the morphological analysis means 21 and the word chain generation means 22 with an example of the information they input and output.
FIG. 6 is a diagram explaining the operation of the word chain determination means 23 with an example of the information it inputs and outputs.
FIG. 7 is a diagram showing an example functional configuration of the evaluation unit 30.
FIG. 8 is a diagram showing the operation flow of the evaluation unit 30.
FIG. 9 is a diagram explaining the operation of the comparison tabulation means 31 and the unobserved word string ratio calculation means 32 with an example of the information they input and output.
FIG. 10 is a diagram showing the functional configuration of the conventional recognition-task symbol chain probability generation unit 140.

Embodiments of the present invention are described below with reference to the drawings. Identical elements in multiple drawings are given the same reference numerals, and their description is not repeated.

FIG. 1 shows an example functional configuration of the language model evaluation apparatus 100 of the present invention, and FIG. 2 shows its operation flow. The language model evaluation apparatus 100 comprises a language model unit 10, a comparison unit 20, and an evaluation unit 30. The apparatus is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute that program.

The language model unit 10 stores word chains of two or more words obtained from transcription text, together with their occurrence probabilities. It holds data in which the characteristics of a language are modeled statistically and, for example, assigns linguistic plausibility to recognition result candidates during continuous speech recognition. Statistical language models for continuous speech recognition commonly use n-grams. An n-gram language model assigns occurrence probabilities to word pairs or word triplets, and these probabilities are calculated from the word chains that appear in the transcription text. In the following, the language model unit 10 is assumed to store, for example, word triplets and their occurrence probabilities.
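As an illustration only, the kind of data held by the language model unit 10 can be sketched in Python as a table of word triplets and probabilities. The function name build_trigram_model, the toy sentences, and the use of unsmoothed relative frequencies are assumptions made for this sketch; the description does not specify how the occurrence probabilities are estimated.

```python
from collections import Counter


def build_trigram_model(sentences):
    """Return {(w1, w2, w3): P(w3 | w1, w2)} using plain relative frequencies.

    Assumption: no smoothing or back-off; a practical model would add both.
    """
    trigram_counts = Counter()
    context_counts = Counter()
    for words in sentences:
        for i in range(len(words) - 2):
            trigram = tuple(words[i:i + 3])
            trigram_counts[trigram] += 1
            context_counts[trigram[:2]] += 1
    return {t: c / context_counts[t[:2]] for t, c in trigram_counts.items()}


# Hypothetical, already tokenized transcription text.
model = build_trigram_model([
    ["the", "truck", "was", "quiet"],
    ["the", "truck", "was", "slow"],
])
print(model[("truck", "was", "quiet")])  # 0.5
```

For the comparison described below, only the keys of such a table (the stored word chains) are needed; the probabilities play no part in the evaluation.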

Transcription text is data produced by transcribing speech data, for example by listening to it. When the recorded speech is a reading, only one speaker's speech is transcribed. When it is a conversation, the speech of two or more speakers is transcribed and the timing relationship between utterances must also be recorded; features that occur frequently in conversation, such as restarts and hesitations, also appear. Transcription text used for continuous speech recognition generally records these features as well, and the transcription text here is assumed to do so.

The comparison unit 20 compares the word chains stored in the language model unit 10 with the word chains of the additional transcription text input from outside, and outputs storage information indicating whether each word chain is already stored in the language model unit 10 (FIG. 2, step S20). The evaluation unit 30 calculates, on the basis of the storage information, the unobserved word string ratio, which is the ratio of the word chains not stored in the language model unit 10 to the additional transcription text (step S30). Here, the additional transcription text is transcription text prepared for additional learning. It differs from the transcription text only in having been prepared separately for that purpose and is otherwise the same kind of text.

The language model evaluation apparatus 100 can calculate the unobserved word string ratio, an index of how far the additional transcription text contains previously unobserved word chains. This ratio makes it possible to measure numerically how many previously unobserved word chains are newly covered by the transcription text added during additional learning. For example, if the number of unobserved word chains in the additional transcription text decreases, learning of the language model has progressed to some extent. In other words, obtaining the unobserved word string ratio of the additional transcription text amounts to evaluating the degree of learning of the language model unit 10.

The additional transcription text is compared with the language model unit 10 by checking whether each word chain appearing in the additional transcription text exists in the language model unit 10. A word chain that is observed in the additional transcription text but not in the language model unit 10 is newly added as part of the language model in the course of additional learning.

Therefore, when many new word chains are observed in the additional transcription text, the language model before additional learning did not cover that text, and additional learning can be expected to advance the language model. Conversely, when few new word chains are observed, the language model unit 10 already covers the additional transcription text, and additional learning with that text cannot be expected to advance the language model.

Suppose, for example, that the additional transcription text used to train a language model for continuous speech recognition covers the target utterance content without bias, and that its word chains are not concentrated on particular features. If such text is added to the language model in fixed-size batches, then in the early stages each batch supplies many word chains that the language model has not yet observed, so learning of the language model advances and the recognition rate of continuous speech recognition improves.

However, under the same assumption that the word chains are not concentrated on particular features, as additional transcription text is added to the language model batch after batch, the proportion of word chains that appear in the additional transcription text but are unobserved in the language model decreases.

By using the language model evaluation apparatus 100 of the present invention to measure the unobserved word string ratio as fixed-size batches of additional transcription text are added to the language model, it is possible to judge whether the language model still has room for improvement or already covers a wide variety of linguistic expressions.
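The batch-by-batch procedure described above can be sketched as follows. This is a minimal illustration, not the apparatus itself: the function name evaluate_then_add, the representation of the language model as a plain set of word chains, and the toy batches are all assumptions introduced here.

```python
def evaluate_then_add(stored_chains, batch_chains):
    """Measure the unobserved ratio of one batch, then add the batch.

    stored_chains: set of word chains already in the language model (mutated).
    batch_chains:  list of word chains from one additional transcription text.
    """
    if not batch_chains:
        raise ValueError("batch_chains must not be empty")
    unstored = [chain for chain in batch_chains if chain not in stored_chains]
    ratio = len(unstored) / len(batch_chains)
    stored_chains.update(unstored)  # stands in for the additional learning step
    return ratio


# Hypothetical batches of word triplets.
stored = set()
batches = [
    [("a", "b", "c"), ("b", "c", "d")],
    [("a", "b", "c"), ("c", "d", "e")],
    [("a", "b", "c"), ("b", "c", "d")],
]
ratios = [evaluate_then_add(stored, batch) for batch in batches]
print(ratios)  # [1.0, 0.5, 0.0]
```

The ratio is measured before the batch is merged, so each value reflects how well the model as it stood covered the new text.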

The operation of the language model evaluation apparatus 100 is described in more detail below using more specific configuration examples of each unit. FIG. 3 shows a more specific functional configuration example of the comparison unit 20, and FIG. 4 shows its operation flow. FIG. 5 shows an example of the information input to and output from the morphological analysis means 21 and the word chain generation means 22.

The comparison unit 20 comprises morphological analysis means 21, word chain generation means 22, and word chain determination means 23. The morphological analysis means 21 receives the additional transcription text and divides it into morphemes (step S21). The morphological analysis used by the morphological analysis means 21 can be realized with well-known existing techniques.

In the example shown in FIG. 5, a passage from a novel, Ryunosuke Akutagawa's "Torokko" (トロッコ), is quoted as the additional transcription text. The text 「竹薮のある所へ来ると、…途中省略…、急にはっきりと感じられた。」 (with the middle of the passage omitted) is divided by the morphological analysis means 21 into morphemes delimited by "／". The morphemes output by the morphological analysis means 21 are input to the word chain generation means 22.

The word chain generation means 22 receives the morphemes and generates word chains of two or more words, for example word triplets (step S22). The word chains are generated by shifting one morpheme at a time. For example, every sentence is converted into word chains of the form 「竹薮／の／ある／」, 「の／ある／所／」, and so on. In the example shown in FIG. 5, each sentence is enclosed in a box for clarity. The word chains generated by the word chain generation means 22 are input to the word chain determination means 23.
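The one-morpheme shift used in step S22 can be sketched in Python as a sliding window. The function name generate_word_chains and the parameter n are assumptions for illustration; the input is assumed to be the morphemes of a single sentence, already produced by some morphological analyzer.

```python
def generate_word_chains(morphemes, n=3):
    """Slide a window of n morphemes over one sentence, one morpheme at a time."""
    return [tuple(morphemes[i:i + n]) for i in range(len(morphemes) - n + 1)]


# First four morphemes of the FIG. 5 example, as given in the description.
sentence = ["竹薮", "の", "ある", "所"]
chains = generate_word_chains(sentence)
print(chains)  # [('竹薮', 'の', 'ある'), ('の', 'ある', '所')]
```

A sentence of m morphemes yields m - n + 1 chains, and a sentence shorter than n yields none.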

FIG. 6 shows an example of the information input to and output from the word chain determination means 23. The word chain determination means 23 refers to the language model unit 10, determines whether each input word chain is already stored in the language model unit 10, and outputs storage information for the word chains (step S23). In the example shown in FIG. 6, the mark "×" indicates that three word chains, 「来る／と／トロッコ／」, 「と／トロッコ／は／」, and 「トロッコ／は／静か／」, were not present in the language model unit 10. The storage information is, for example, the number of word chains in the additional transcription text that do not exist in the language model unit 10. In FIG. 6 the morphemes of each word chain are also shown as part of the storage information, but the morpheme information may be omitted. The storage information output by the word chain determination means 23 is input to the evaluation unit 30.
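Step S23 amounts to a membership test against the stored word chains. The sketch below assumes the language model unit can be queried as a Python set; the function name determine_stored and the contents of stored_chains are hypothetical, while the three unstored chains are the ones named in the FIG. 6 example.

```python
def determine_stored(word_chains, stored_chains):
    """Pair each word chain with True if it is already stored, else False."""
    return [(chain, chain in stored_chains) for chain in word_chains]


# Hypothetical contents of the language model unit (not taken from the source).
stored_chains = {("所", "へ", "来る"), ("へ", "来る", "と")}

word_chains = [
    ("へ", "来る", "と"),
    ("来る", "と", "トロッコ"),
    ("と", "トロッコ", "は"),
    ("トロッコ", "は", "静か"),
]
storage_info = determine_stored(word_chains, stored_chains)
unstored = [chain for chain, is_stored in storage_info if not is_stored]
print(len(unstored))  # 3
```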

FIG. 7 shows a more specific functional configuration example of the evaluation unit 30, and FIG. 8 shows its operation flow. The evaluation unit 30 comprises comparison tabulation means 31 and unobserved word string ratio calculation means 32. FIG. 9 shows an example of the information input to and output from the comparison tabulation means 31 and the unobserved word string ratio calculation means 32.

The comparison tabulation means 31 receives the storage information and tabulates unstored word chain count information, which represents the number of word chains not stored in the language model unit 10 (step S31). In the example shown in FIG. 9, nine word chains, from 「来る／と／トロッコ／」 to 「雑木林／に／なった／」, are shown as the unstored word chain count information. The "1" written beside each morpheme sequence indicates that the word chain occurred once. The unstored word chain count information tabulated by the comparison tabulation means 31 is input to the unobserved word string ratio calculation means 32.

The unobserved word string ratio calculation means 32 receives the unstored word chain count information and calculates the unobserved word string ratio, which is the ratio of that count to the additional transcription text (step S32). The unobserved word string ratio is, for example, the number of word chains in the unstored word chain count information divided by the total number of word chains in the additional transcription text. In the example shown in FIG. 9, the unobserved word string ratio is 9 ÷ 28 ≈ 0.32. The total number of word chains in the additional transcription text may be held in advance inside the unobserved word string ratio calculation means 32 as a constant 32a. Alternatively, as indicated by the broken signal line, the count may be obtained from the word chain generation means 22 of the comparison unit 20.
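The calculation in step S32 is a single division. The sketch below reproduces the FIG. 9 figures (9 unstored word chains out of 28); the function name unobserved_word_string_ratio is an assumption for illustration.

```python
def unobserved_word_string_ratio(unstored_count, total_count):
    """Ratio of unstored word chains to all word chains in the additional text."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return unstored_count / total_count


ratio = unobserved_word_string_ratio(9, 28)
print(round(ratio, 2))  # 0.32
```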

In practice, the unobserved word string ratio is calculated over much larger counts. For example, suppose the first round of additional learning evaluates the language model with an additional transcription text of 1000 word chains and the unstored word chain count is 50; the unobserved word string ratio is then 5%.

Suppose the second round of additional learning uses an additional transcription text of 2000 word chains, none of which come from the first 1000, and the unstored word chain count is 80; the unobserved word string ratio is then 4%. If the third round uses an additional transcription text of 500 word chains and the unstored word chain count is 10, the unobserved word string ratio is 2%.
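The three rounds above can be checked with a few lines of Python. The variable names are chosen here for illustration; the counts are those given in the description.

```python
# (total word chains, unstored word chains) for rounds 1 to 3.
rounds = [(1000, 50), (2000, 80), (500, 10)]

percentages = [100 * unstored / total for total, unstored in rounds]
print(percentages)  # [5.0, 4.0, 2.0]

# The ratio falls from round to round, which the description reads as progress.
is_decreasing = all(a > b for a, b in zip(percentages, percentages[1:]))
print(is_decreasing)  # True
```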

Using the language model evaluation apparatus 100 in this way makes it possible to grasp the progress of language model learning quantitatively. The progress of learning can be evaluated more accurately if the number of word chains in each additional transcription text is fixed at a predetermined constant value.

Although the example above uses word triplets as word chains, the idea of the present invention applies to N-word chains for any N of two or more. It may also be applied word by word, although a single word is not a word chain. In that case the ratio is the proportion of new words in the additional transcription text. If there are many new words, further additional learning of the language model is needed; if there are few, no learning effect can be expected from that additional transcription text. Thus, even word by word, it is possible to judge whether the additional transcription text is effective for additional learning of the language model.
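The word-by-word variant reduces to the proportion of words missing from the known vocabulary. A minimal sketch, in which the function name new_word_ratio and the toy vocabulary are assumptions:

```python
def new_word_ratio(words, vocabulary):
    """Proportion of words in the additional text that are not in the vocabulary."""
    if not words:
        raise ValueError("words must not be empty")
    new_words = [word for word in words if word not in vocabulary]
    return len(new_words) / len(words)


# Hypothetical vocabulary and additional text.
vocabulary = {"the", "truck", "was"}
additional_words = ["the", "truck", "was", "quiet"]
print(new_word_ratio(additional_words, vocabulary))  # 0.25
```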

The unobserved word string ratio may also be the ratio of the utterance time length obtained from the number of storage information items to the utterance time length of the additional transcription text. As its name suggests, the additional transcription text was originally speech. Evaluating by utterance time length can therefore be expected to be more intuitive than evaluating by the ratio of morpheme counts. Morpheme information can easily be converted to utterance time length information through a one-to-one correspondence.
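The description does not say how the utterance time of the unstored word chains is obtained, so the following is only one possible reading. It assumes that each morpheme carries a known duration in seconds and that the unobserved time is the total duration of the morphemes covered by at least one unstored word chain. The function name unobserved_time_ratio and all data are hypothetical.

```python
def unobserved_time_ratio(morphemes, durations, stored_chains, n=3):
    """Share of utterance time spent on morphemes inside an unstored word chain.

    Assumption: durations[i] is the utterance time of morphemes[i] in seconds.
    """
    total = sum(durations)
    if total <= 0:
        raise ValueError("total duration must be positive")
    covered = [False] * len(morphemes)
    for i in range(len(morphemes) - n + 1):
        if tuple(morphemes[i:i + n]) not in stored_chains:
            for j in range(i, i + n):
                covered[j] = True  # this morpheme belongs to an unstored chain
    unobserved = sum(d for d, c in zip(durations, covered) if c)
    return unobserved / total


# Hypothetical data: only the last chain ("c", "d", "e") is unstored.
morphemes = ["a", "b", "c", "d", "e"]
durations = [0.5, 0.5, 1.0, 1.0, 1.0]
stored_chains = {("a", "b", "c"), ("b", "c", "d")}
print(unobserved_time_ratio(morphemes, durations, stored_chains))  # 0.75
```

Marking morphemes rather than summing per chain avoids counting the same stretch of speech more than once when unstored chains overlap.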

As described above, by comparing the additional transcription text used for additional learning with the language model that has already been built, the language model evaluation method of the present invention can determine whether the additional transcription text contributes to, for example, improving the recognition rate of continuous speech recognition. At the same time, it can evaluate the progress of language model learning.

When the unobserved word string ratio is large, it can be judged that more additional transcription text is needed so that the language model covers a wider variety of linguistic expressions. When the ratio has fallen, or when it no longer falls below a certain value, the additional transcription text contains many word strings that have already been observed, and it can be judged that learning of the language model has progressed to some extent.
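The judgment described above could be automated along the following lines. The description gives no numeric criteria, so the function name learning_has_progressed and the thresholds low and plateau_delta are hypothetical values chosen only to make the sketch concrete.

```python
def learning_has_progressed(ratios, low=0.05, plateau_delta=0.005):
    """Judge from successive unobserved ratios whether learning has progressed.

    Returns True when the latest ratio is small, or when it has stopped
    changing between the last two rounds; otherwise more text is needed.
    """
    if not ratios:
        raise ValueError("ratios must not be empty")
    latest = ratios[-1]
    if latest <= low:
        return True
    if len(ratios) >= 2 and abs(ratios[-2] - latest) <= plateau_delta:
        return True
    return False


print(learning_has_progressed([0.30, 0.20]))        # False: still large and falling
print(learning_has_progressed([0.05, 0.04, 0.02]))  # True: ratio is small
print(learning_has_progressed([0.12, 0.119]))       # True: ratio has levelled off
```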

In real applications of continuous speech recognition, it is generally impossible to reduce the ratio of unobserved word chains to zero. It is therefore currently difficult to judge how far language model learning should be carried, but the language model evaluation method of the present invention provides a new and effective index for use when creating a language model.

When the processing means of the above apparatus are realized by a computer, the processing content of the functions that each apparatus should have is described by a program. Executing this program on the computer realizes the processing means of each apparatus on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, a hard disk device, a flexible disk, magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), or the like as the optical disc; an MO (Magneto Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electrically Erasable and Programmable Read-Only Memory) or the like as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing content may be realized in hardware.

Claims

A language model evaluation method comprising:
a comparison process of comparing word chains of two or more words, obtained from transcription text and stored in a language model unit, with word chains of an additional transcription text input from outside, and outputting storage information indicating whether each word chain is already stored in the language model unit; and
an evaluation process of calculating, on the basis of the storage information, an unobserved word string ratio, which is the ratio of the word chains not stored in the language model unit to the word chains of the additional transcription text.

The language model evaluation method according to claim 1, wherein the comparison process comprises:
a morphological analysis step of dividing the transcription text into morphemes;
a word chain generation step of receiving the morphemes and generating word chains of two or more words; and
a word chain determination step of referring to the language model unit, determining whether each word chain is already stored in the language model unit, and outputting storage information for the word chains.

The language model evaluation method according to claim 1 or 2, wherein the evaluation process comprises:
a comparison tabulation step of receiving the storage information and tabulating unstored word chain count information representing the number of word chains not stored in the language model unit; and
an unobserved word string ratio calculation step of receiving the unstored word chain count information and calculating an unobserved word string ratio, which is the ratio of the unstored word chain count information to the word chains of the additional transcription text.

The language model evaluation method according to any one of claims 1 to 3, wherein the unobserved word string ratio is a value obtained by dividing the number of items of storage information by the number of word chains of the additional transcription text.

The language model evaluation method according to any one of claims 1 to 3, wherein the unobserved word string ratio is a value obtained by dividing the utterance time length obtained from the storage information by the utterance time length of the additional transcription text.

A language model evaluation apparatus comprising:
a language model unit that stores word chains of two or more words obtained from transcription text, together with their occurrence probabilities;
a comparison unit that compares the word chains stored in the language model unit with word chains of an additional transcription text input from outside, and outputs storage information indicating whether each word chain is already stored in the language model unit; and
an evaluation unit that calculates, on the basis of the storage information, an unobserved word string ratio, which is the ratio of the word chains not stored in the language model unit to the word chains of the additional transcription text.

A program for causing a computer to function as the language model evaluation apparatus according to claim 6.