JP2010118001A

JP2010118001A - Language model update device, method, and program

Info

Publication number: JP2010118001A
Application number: JP2008292584A
Authority: JP
Inventors: Koji Okabe; 浩司岡部; Ryosuke Isotani; 亮輔磯谷; Takeshi Hanazawa; 健花沢
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2008-11-14
Filing date: 2008-11-14
Publication date: 2010-05-27

Abstract

PROBLEM TO BE SOLVED: To enable accurate discriminative learning of a language model even when the text data includes text for which no corresponding actual data exists.

SOLUTION: A language model update device includes a language model update unit that updates a language model using pseudo data, which is data synthesized from text data.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a language model update device, method, and program, and more particularly to a language model update device, method, and program based on discriminative learning.

Language models are commonly used in, for example, speech recognition and handwritten character recognition. A language model is created based on the frequencies of words or word chains appearing in a corpus.

In recent years, techniques for updating a language model by discriminative learning have attracted attention as a way to improve the accuracy of speech recognition. Non-Patent Document 1 describes a language model update device, in particular one for a language model used in speech recognition.

Non-Patent Document 1: Hong-Kwang Jeff Kuo et al., “Discriminative Training of Language Models for Speech Recognition,” ICASSP 2002

The following analysis was made by the present inventors.

FIG. 5 is a block diagram schematically showing the configuration of the language model update device described in Non-Patent Document 1. Referring to FIG. 5, the language model update device 140 includes a speech recognition unit 143 and a language model update unit 145.

The language model update device 140 operates as follows. The speech recognition unit 143 performs speech recognition on the actual speech data 131, which is recorded speech data, using the acoustic model 134 and the pre-update language model 133 that is to be updated. The language model update unit 145 performs discriminative learning on the pre-update language model 133 so that the score difference between positive and negative examples becomes larger, using the speech recognition results of the speech recognition unit 143, their scores (that is, their plausibility), and the text data 132 corresponding to the utterance content of the actual speech data 131. Here, a positive example is a correct recognition result, and a negative example is an erroneous recognition result that ranks high among the recognition results. In this way, the language model update device 140 generates an updated language model 135 with higher recognition accuracy than the pre-update language model 133.
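As an illustration only, the following Python sketch shows one simple way to widen the score gap between a positive and a negative example. The perceptron-style update over bigram features is an assumption chosen for brevity; the text above does not specify the update rule, and the names `ngram_features`, `perceptron_update`, and `lm_score` are hypothetical.

```python
from collections import Counter


def ngram_features(words, n=2):
    """Count the n-grams in a word sequence padded with sentence markers."""
    padded = ["<s>"] * (n - 1) + list(words) + ["</s>"]
    return Counter(tuple(padded[i:i + n]) for i in range(len(padded) - n + 1))


def perceptron_update(weights, positive, negative, lr=1.0):
    """Raise weights of n-grams in the positive example and lower those in
    the negative example (assumed update rule, not taken from the source)."""
    updated = dict(weights)
    for gram, count in ngram_features(positive).items():
        updated[gram] = updated.get(gram, 0.0) + lr * count
    for gram, count in ngram_features(negative).items():
        updated[gram] = updated.get(gram, 0.0) - lr * count
    return updated


def lm_score(weights, words):
    """Score a word sequence as the weighted sum of its n-gram counts."""
    return sum(weights.get(g, 0.0) * c for g, c in ngram_features(words).items())


positive = ["recognize", "speech"]            # correct recognition result
negative = ["wreck", "a", "nice", "beach"]    # high-ranking erroneous result
weights = perceptron_update({}, positive, negative)
gap = lm_score(weights, positive) - lm_score(weights, negative)
```

After one update the positive example scores higher than the negative one, which is the direction of change the discriminative learning described above aims for.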

In discriminative learning of a language model, the more training data there is, the more accurate the learning becomes. When performing discriminative learning of a language model, the language model update device 140 requires not only text data 132 such as transcripts but also the corresponding actual data (for example, actual speech data 131 or handwritten character data). Depending on the task, however, only the text data 132 may exist, with no corresponding actual data. The language model update device 140 cannot use text data 132 that lacks corresponding actual data for discriminative learning, and therefore cannot learn with high accuracy.

The problem, therefore, is to enable highly accurate discriminative learning of a language model even when the text data includes text for which no corresponding actual data exists. An object of the present invention is to provide a language model update device, method, and program that solve this problem.

A language model update device according to a first aspect of the present invention includes a language model update unit that updates a language model using pseudo data, which is data synthesized from text data.

A language model update method according to a second aspect of the present invention includes a language model update step of updating a language model using pseudo data, which is data synthesized from text data.

A language model update program according to a third aspect of the present invention causes a computer to execute a language model update process of updating a language model using pseudo data, which is data synthesized from text data.

With the language model update device, method, and program according to the present invention, a highly accurate language model can be generated even when no actual data corresponding to the text data exists.

(First embodiment)
A language model update device according to a first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the language model update device 20 of the present embodiment.

Referring to FIG. 1, the language model update device 20 includes a language model update unit 25 that updates a language model using pseudo data 22, which is data synthesized from text data 12.

The language model update device 20 preferably includes a pseudo data synthesis unit 21 that synthesizes the pseudo data 22. More preferably, the pseudo data synthesis unit 21 synthesizes the pseudo data 22 only when there is no actual data 11, that is, data actually collected as a recognition target, corresponding to the text data 12.

Preferably, the language model update device 20 further includes a recognition unit 23 that recognizes the actual data 11 and the pseudo data 22, and the language model update unit 25 updates the pre-update language model 13 based on the recognition results of the recognition unit 23. More preferably, the recognition results of the recognition unit 23 include positive examples, negative examples, and their scores.

The language model update device 20 preferably includes a score correction unit 24 that estimates the score that would be obtained if the pseudo data 22 were actual data 11 and corrects the score for the pseudo data 22 based on the estimated score. More preferably, the score correction unit 24 makes this estimate based on a score difference model obtained by learning the score differences between the actual data 11 and its competing candidates.

The recognition target of the language model update device 20 may be, for example, speech or handwritten characters.

(Second embodiment)
A language model update device according to a second embodiment of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the language model update device 20 of the present embodiment. Referring to FIG. 1, the language model update device 20 includes a pseudo data synthesis unit 21, pseudo data 22, a recognition unit 23, a score correction unit 24, and a language model update unit 25.

The actual data 11 is data that was actually collected. The text data 12 includes transcripts. The pre-update language model 13 is the language model that is used for recognition by the recognition unit 23 and is the target of the update. The model 14 is used in recognition by the recognition unit 23.

The pseudo data synthesis unit 21 receives the text data 12 for which no corresponding actual data 11 exists, synthesizes data mechanically (for example, using speech synthesis technology), and outputs the result as pseudo data 22. The recognition unit 23 recognizes the data to be recognized (that is, the actual data 11 and the pseudo data 22) using the pre-update language model 13 and the model 14. The score correction unit 24 estimates the score that would be obtained if the pseudo data 22 were actual data 11, and uses the estimate to correct the score of the recognition result for the pseudo data 22 appropriately. The language model update unit 25 receives the pre-update language model 13, the text data 12 (the utterance content), the recognition results, and the corrected scores, performs discriminative learning of the language model, and outputs the updated language model 15.

Next, the operation of the language model update device 20 of the present embodiment will be described with reference to the flowchart of FIG. 2.

First, the pseudo data synthesis unit 21 determines whether actual data 11 corresponding to the text data 12 exists (step S11). If no corresponding actual data 11 exists (No in step S11), the pseudo data synthesis unit 21 synthesizes pseudo data 22 (step S12) and records it in a storage unit.

Next, the recognition unit 23 recognizes the actual data 11 and the pseudo data 22 using the pre-update language model 13 and the model 14 (step S13).

Next, the recognition unit 23 determines whether each recognition result is for actual data 11 or for pseudo data 22 (step S14). If it is a recognition result for pseudo data 22 (Yes in step S14), the score correction unit 24 estimates the score difference from the case where the data is regarded as actual data 11 and corrects the score for the pseudo data 22 (step S15).

Next, the language model update unit 25 performs discriminative learning on the pre-update language model 13 to update the language model (step S16). Finally, the language model update unit 25 outputs the result as the updated language model 15 (step S17).
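The control flow of steps S11 to S17 can be sketched in Python as follows. This is a minimal sketch, not the disclosed implementation: the function `update_language_model` and its four callable parameters are hypothetical stand-ins for the units 21, 23, 24, and 25, and the sketch tracks which data is pseudo data with a flag rather than having the recognition unit decide afterwards.

```python
def update_language_model(texts, actual_data, synthesize, recognize, correct_score, train):
    """Run steps S11 to S17 over all texts.

    texts: dict mapping a text id to its transcript.
    actual_data: dict mapping a text id to collected data, for texts that have it.
    The four callables are hypothetical stand-ins for units 21, 23, 24, and 25.
    """
    results = []
    for text_id, text in texts.items():
        if text_id in actual_data:                     # S11: does actual data exist?
            data, is_pseudo = actual_data[text_id], False
        else:
            data, is_pseudo = synthesize(text), True   # S12: synthesize pseudo data
        hypotheses = recognize(data)                   # S13: list of (hypothesis, score)
        if is_pseudo:                                  # S14: result for pseudo data?
            hypotheses = [(hyp, correct_score(score))  # S15: correct the scores
                          for hyp, score in hypotheses]
        results.append((text, hypotheses))
    return train(results)                              # S16 and S17: update and output


# Toy stand-ins so the control flow can be exercised.
texts = {"a": "hello world", "b": "good morning"}
actual_data = {"a": "recorded-a"}
output = update_language_model(
    texts,
    actual_data,
    synthesize=lambda text: "synthetic:" + text,
    recognize=lambda data: [(data, 10.0)],
    correct_score=lambda score: score - 2.0,
    train=lambda results: results,
)
```

In this toy run only text "b" lacks actual data, so only its score passes through the correction step.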

In the above description, the pseudo data synthesis unit 21 temporarily records the pseudo data 22 in a storage unit. Alternatively, the pseudo data synthesis unit 21 may input the pseudo data 22 directly to the recognition unit 23.

Because the language model update device 20 of the present embodiment synthesizes pseudo data 22 with the pseudo data synthesis unit 21, it can perform discriminative learning of the language model while also using text data 12 for which no corresponding actual data 11 exists. The language model update device 20 of the present embodiment therefore enables highly accurate discriminative learning.

When discriminative learning of a language model is performed using the pseudo data 22 synthesized by the pseudo data synthesis unit 21, the recognition scores of the mechanically synthesized pseudo data 22 may differ greatly from those of the actually collected actual data 11.

In that case, the language model update device 20 may be unable to learn the language model properly. In the language model update device 20 of the present embodiment, however, the score correction unit 24 estimates the score that would be obtained if the pseudo data 22 were actual data 11, and the score of the pseudo data 22 is corrected accordingly. Therefore, with the language model update device 20 of the present embodiment, a learning effect equivalent to that obtained with actual data 11 can be achieved even when mechanically synthesized pseudo data 22 is used.

The operation of the language model update device according to the present invention will now be described based on a specific example. The example concerns updating a language model used for speech recognition. The language model update device according to the present invention has the same effect for a language model used for handwritten character recognition.

Consider a language model for recognizing speech obtained by reading newspaper articles aloud. In general, such a language model is created by counting N-gram frequencies in newspaper article text data. When discriminative learning of the language model is performed, prepared read-aloud speech of newspaper articles is recognized, and a discriminative language model is generated by discriminatively learning the positive and negative examples.
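The N-gram counting mentioned above can be illustrated with a small Python sketch. It assumes bigrams (N = 2), pre-tokenized sentences, and unsmoothed maximum-likelihood estimates; the function name `train_bigram_model` and the toy corpus are hypothetical.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Return maximum-likelihood bigram probabilities P(w2 | w1)."""
    context_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        for w1, w2 in zip(padded, padded[1:]):
            context_counts[w1] += 1         # how often w1 occurs as a context
            bigram_counts[(w1, w2)] += 1    # how often w2 follows w1
    return {pair: count / context_counts[pair[0]]
            for pair, count in bigram_counts.items()}


corpus = [["the", "market", "rose"], ["the", "market", "fell"]]
model = train_bigram_model(corpus)
```

A practical language model would add smoothing for unseen N-grams; that is omitted here to keep the counting step visible.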

Although newspaper article text data covering many years is available, creating or obtaining read-aloud speech for all of it is difficult in terms of both time and cost. Consequently, discriminative learning can use only the portion of the newspaper article text data for which corresponding read-aloud speech exists.

FIG. 3 is a block diagram showing the configuration of the language model update device 40 in this example. For newspaper article text data 32 that has no corresponding read-aloud speech (actual speech data 31), the pseudo speech data synthesis unit 41 assigns reading information by morphological analysis or the like, then performs speech synthesis or feature synthesis by HMM (Hidden Markov Model) synthesis, and records the synthesized speech data or feature data in a storage unit (not shown) as pseudo speech data 42. The pseudo speech data synthesis unit 41 may use the acoustic model 34 for the HMM synthesis, or may use a different acoustic model.

Next, the speech recognition unit 43 performs speech recognition on the actual speech data 31, which is actual read-aloud speech of newspaper articles, and on the pseudo speech data 42, using the pre-update language model 33 and the acoustic model 34, which corresponds to the model 14 in FIG. 1. The speech recognition unit 43 outputs the N-best list or word lattice of the recognition results together with the acoustic score of each phoneme.

Next, the score correction unit 44 corrects the acoustic scores of the recognition results for the pseudo speech data 42. This is necessary because, depending on the synthesis method and acoustic model used, the acoustic scores of the pseudo speech data 42 may differ greatly from those of the actual speech data 31.

The score correction unit 44 averages the score difference between each phoneme and its competing candidates on a per-frame basis, models it, and holds the model. The score correction unit 44 corrects the score of the correct phoneme in the recognition result of the pseudo speech data 42 as shown in Expression (1).

In Expression (1), s_c is the score before correction and s_c' is the score after correction. p_c is the correct phoneme, p_k is an incorrect competing phoneme, and K is the number of competing phonemes appearing in the recognition result. D is the score difference between the correct phoneme and a competing phoneme obtained from the score difference model, and d is the score difference between the correct phoneme and a competing phoneme in the recognition result. α is a parameter representing the degree of correction.

The score correction unit 44 corrects the acoustic score by averaging the difference between the score difference given by the score difference model and the actual score difference, multiplying the average by the parameter α, and adding the result to the pre-correction score s_c.
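Expression (1) itself is not reproduced in this text. Based on the symbol definitions and the description of the correction given above, it can be reconstructed as follows. This is a reconstruction under that reading, and the function notation D(p_c, p_k) and d(p_c, p_k) is an assumption.

```latex
s_c' = s_c + \frac{\alpha}{K} \sum_{k=1}^{K} \bigl( D(p_c, p_k) - d(p_c, p_k) \bigr)
```

In this form the second term is negative whenever the observed differences d exceed the modeled differences D on average, so the corrected score is lower than s_c.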

The score correction unit 44 models the score difference between the correct phoneme and a competing phoneme as a Gaussian distribution, using the positive examples, negative examples, and their scores obtained by recognizing the actual speech data 31 in advance. The mean of the Gaussian distribution can be used as the score difference D above. The value of α is set to a suitable value in advance.
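As a rough illustration, a Gaussian score difference model of this kind could be fitted as in the Python sketch below. The input layout (one tuple per observed difference), the grouping by phoneme pair, and the name `fit_score_difference_model` are assumptions; the text does not say how the observations are organized.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_score_difference_model(observations):
    """Fit one Gaussian per (correct phoneme, competing phoneme) pair.

    observations: iterable of (correct, competing, score_difference) tuples
    taken from recognition results on actual speech data.
    Returns {(correct, competing): (mean, standard deviation)}.
    """
    grouped = defaultdict(list)
    for correct, competing, diff in observations:
        grouped[(correct, competing)].append(diff)
    return {pair: (mean(diffs), pstdev(diffs)) for pair, diffs in grouped.items()}


observations = [("p", "t", 1.0), ("p", "t", 3.0), ("p", "k", 4.0)]
model = fit_score_difference_model(observations)
D_p_t = model[("p", "t")][0]   # mean used as the modeled score difference D
```

Only the mean is needed for the score difference D described above; the standard deviation is kept because the model is stated to be a Gaussian distribution.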

FIG. 4 shows an example of score correction. In FIG. 4, p_c corresponds to /p/ and p_k corresponds to /t/. In the pseudo speech data 42 created by speech synthesis, the likelihood of the correct phoneme /p/ is high, and the score difference d between /p/ and /t/ is larger than the score difference D between /p/ and /t/ in the score difference model. If the other competing candidates show the same tendency, the second term on the right-hand side of Expression (1) is negative. In that case, the acoustic score s_c of the correct phoneme /p/ is corrected to a smaller value.
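The correction can be worked through numerically with a short Python sketch. It assumes the correction has the form s_c' = s_c + α · mean(D − d) over the competing phonemes, which is a reading of the prose description rather than a formula reproduced from the source; the function name `correct_score` and all numeric values are hypothetical.

```python
def correct_score(s_c, model_diffs, observed_diffs, alpha):
    """Return s_c + alpha * mean(D_k - d_k) over the K competing phonemes.

    model_diffs: modeled score differences D, one per competing phoneme.
    observed_diffs: score differences d observed in the recognition result.
    """
    if not model_diffs or len(model_diffs) != len(observed_diffs):
        raise ValueError("need one modeled and one observed difference per competitor")
    k = len(model_diffs)
    average_gap = sum(D - d for D, d in zip(model_diffs, observed_diffs)) / k
    return s_c + alpha * average_gap


# Synthetic speech: the correct phoneme wins by a wider margin (d) than the
# model learned from actual speech predicts (D), for both competitors.
corrected = correct_score(s_c=-10.0, model_diffs=[2.0, 3.0],
                          observed_diffs=[5.0, 4.0], alpha=0.5)
```

Here the average of D − d is −2.0, so with α = 0.5 the score drops from −10.0 to −11.0, matching the direction of correction described for FIG. 4.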

The language model update unit 45 performs discriminative learning on the pre-update language model 33 using the speech recognition results with corrected acoustic scores and the text data 32, and outputs the result as the updated language model 35.

Although the above description is based on an example, the present invention is not limited to that example.

The present invention can be applied, for example, to updating language models for speech recognition and handwritten character recognition.

FIG. 1 is a block diagram showing the configuration of the language model update device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the language model update device according to the embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the language model update device in an example of the present invention.
FIG. 4 is a diagram for explaining score correction.
FIG. 5 is a block diagram showing the configuration of a conventional language model update device.

Explanation of symbols

11 Actual data
12, 32, 132 Text data
13, 33, 133 Pre-update language model
14 Model
15, 35, 135 Updated language model
20, 40, 140 Language model update device
21 Pseudo data synthesis unit
22 Pseudo data
23 Recognition unit
24, 44 Score correction unit
25, 45, 145 Language model update unit
31, 131 Actual speech data
34, 134 Acoustic model
41 Pseudo speech data synthesis unit
42 Pseudo speech data
43, 143 Speech recognition unit

Claims

1. A language model update device comprising a language model update unit that updates a language model using pseudo data, which is data synthesized from text data.

2. The language model update device according to claim 1, further comprising a pseudo data synthesis unit that synthesizes the pseudo data.

3. The language model update device according to claim 2, wherein the pseudo data synthesis unit synthesizes the pseudo data only when there is no actual data, that is, data actually collected as a recognition target, corresponding to the text data.

4. The language model update device according to claim 1, further comprising a recognition unit that recognizes actual data and the pseudo data, wherein the language model update unit updates the language model based on a recognition result of the recognition unit.

5. The language model update device according to claim 4, wherein the recognition result of the recognition unit includes positive examples, negative examples, and their scores.

6. The language model update device according to claim 5, further comprising a score correction unit that estimates the score that would be obtained if the pseudo data were actual data and corrects the score for the pseudo data based on the estimated score.

7. The language model update device according to claim 6, wherein the score correction unit estimates the score that would be obtained if the pseudo data were actual data based on a score difference model obtained by learning score differences between the actual data and its competing candidates.

8. The language model update device according to claim 1, wherein the recognition target is speech.

9. The language model update device according to claim 1, wherein the recognition target is handwritten characters.

10. A language model update method comprising a language model update step of updating a language model using pseudo data, which is data synthesized from text data.

11. The language model update method according to claim 10, further comprising a pseudo data synthesis step of synthesizing the pseudo data.

12. The language model update method according to claim 11, wherein, in the pseudo data synthesis step, the pseudo data is synthesized only when there is no actual data, that is, data actually collected as a recognition target, corresponding to the text data.

13. The language model update method according to any one of claims 10 to 12, further comprising a recognition step of recognizing actual data and the pseudo data, wherein, in the language model update step, the language model is updated based on a recognition result of the recognition step.

14. The language model update method according to claim 13, wherein the recognition result of the recognition step includes positive examples, negative examples, and their scores.

15. The language model update method according to claim 14, further comprising a score correction step of estimating the score that would be obtained if the pseudo data were actual data and correcting the score for the pseudo data based on the estimated score.

16. The language model update method according to claim 15, wherein, in the score correction step, the score that would be obtained if the pseudo data were actual data is estimated based on a score difference model obtained by learning score differences between the actual data and its competing candidates.

17. A language model update program that causes a computer to execute a language model update process of updating a language model using pseudo data, which is data synthesized from text data.

18. The language model update program according to claim 17, which causes the computer to execute a pseudo data synthesis process of synthesizing the pseudo data.

19. The language model update program according to claim 18, wherein, in the pseudo data synthesis process, the computer is caused to synthesize the pseudo data only when there is no actual data, that is, data actually collected as a recognition target, corresponding to the text data.

20. The language model update program according to any one of claims 17 to 19, which causes the computer to further execute a recognition process of recognizing actual data and the pseudo data, wherein, in the language model update process, the language model is updated based on a recognition result of the recognition process.

21. The language model update program according to claim 20, wherein the recognition result of the recognition process includes positive examples, negative examples, and their scores.

22. The language model update program, which causes the computer to execute a score correction process of estimating the score that would be obtained if the pseudo data were actual data and correcting the score for the pseudo data based on the estimated score.

23. The language model update program according to claim 22, wherein, in the score correction process, the score that would be obtained if the pseudo data were actual data is estimated based on a score difference model obtained by learning score differences between actual data and its competing candidates.