JP5860439B2

JP5860439B2 - Language model creation device and method, program and recording medium

Info

Publication number: JP5860439B2
Application number: JP2013160193A
Authority: JP
Inventors: 亮増村; 浩和政瀧; 隆伸大庭
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-08-01
Filing date: 2013-08-01
Publication date: 2016-02-16
Anticipated expiration: 2033-08-01
Also published as: JP2015031775A

Description

The present invention relates to a language model creation device and method, a program, and a recording medium for constructing a language model that achieves high speech recognition performance by efficiently using language resources other than speech transcriptions of the target task.

Speech recognition requires a language model for linguistic prediction. The N-gram language model, whose form is highly compatible with speech recognition decoding, is generally used. An N-gram language model can be learned easily given a learning text, and various learning methods have been proposed. For example, a learning method for N-gram language models is described in Non-Patent Document 1 and is well known.

It is known that an effective way to achieve highly accurate speech recognition is to use a language model specialized for the task of the speech to be recognized (the target task). Specializing means assigning high occurrence probabilities to the linguistic phenomena frequently used in that task. That is, a language model specialized for news speech is effective for recognizing news speech, and one specialized for call center speech is effective for recognizing call center speech. A language model specialized for a specific task is generally learned from text obtained by manually transcribing speech of that task (transcribed text).

However, collecting a large amount of manual transcription increases costs such as time and labor. Therefore, a technique is used that constructs a language model by making good use of other language resources related to the task, without using transcribed text of the target task. This technique is called language model task adaptation and is described in Non-Patent Documents 2 and 3.

Language model task adaptation constructs a separate N-gram language model from each of a plurality of external language resources and builds the final language model by linearly interpolating their probabilities. Despite its simplicity, this technique is still widely used.

Non-Patent Document 1: Kenji Kita, "Language and Computation 4: Probabilistic Language Models", University of Tokyo Press, pp. 57-62.
Non-Patent Document 2: Kiyohiro Shikano et al., "IT Text Speech Recognition System", Ohmsha.
Non-Patent Document 3: R. Iyer and M. Ostendorf, "Modeling long distance dependence in language: topic mixtures vs dynamic cache models", IEEE Transactions on Speech and Audio Processing, vol. 7, no. 1, pp. 30-36, 1996.

However, for spoken-language tasks such as contact centers, language model task adaptation rarely yields text of the same quality as transcribed text. The reason is that language resources such as the Web contain almost no text in a spoken style. For example, when the target task is call speech at a telecommunications contact center, telecommunications-related Web text used as an external language resource contains similar nouns, but its style is that of written language. Because of this difference in style, transcription-like text that matches the spoken-language task cannot be obtained.

The present invention has been made in view of this problem. Its object is to provide a language model creation device and method, a program, and a recording medium that flexibly perform language model task adaptation from external language resources that do not completely match the task, and thereby construct a high-performance language model for the task.

The language model creation device of the present invention comprises an LWLM learning unit, an LWLM task adaptation unit, a pseudo learning text creation unit, and an N-gram language model creation unit. The LWLM learning unit receives a plurality of learning texts as input, generates a plurality of hidden sequences, which are hidden word sequences corresponding to the word strings of the respective learning texts, and learns from the hidden sequences, for each of the learning texts, a latent word language model consisting of two probability distributions: a sentence pattern model, which is the latent word-latent word probability, and a vocabulary model, which is the latent word-observed word probability. The LWLM task adaptation unit receives the plurality of latent word language models as input, separates each into its sentence pattern model and vocabulary model, and generates, as a task-adapted latent word language model, the pair obtained by weighted mixing of the sentence pattern models and weighted mixing of the vocabulary models. The pseudo learning text creation unit receives the task-adapted latent word language model as input, generates a latent word sequence, and generates pseudo learning text from that latent word sequence. The N-gram language model creation unit receives the pseudo learning text as input and counts the frequencies of all N-word tuples in it to create an LWLM-like N-gram language model.

According to the language model creation device of the present invention, a latent word language model is used to separate each language resource into two components, a sentence pattern aspect and a lexical aspect; language model task adaptation is performed by appropriately mixing each aspect across the language resources; and an N-gram language model is constructed for the task. Therefore, a high-performance language model adapted to the task can be constructed from external language resources that carry useful information only in their sentence patterns and external language resources that carry useful information only in their vocabulary.

FIG. 1 is a diagram showing an example of the functional configuration of the language model creation device 100 of the present invention.
FIG. 2 is a diagram showing the operation flow of the language model creation device 100.
FIG. 3 is a diagram for explaining a latent word sequence and an observed word sequence.
FIG. 4 is a diagram showing an example of the functional configuration of the LWLM learning unit 110.
FIG. 5 is a diagram showing the operation flow of the LWLM learning unit 110.
FIG. 6 is a diagram showing an example of the functional configuration of the LWLM task adaptation unit 130.
FIG. 7 is a diagram showing the operation flow of the LWLM task adaptation unit 130.
FIG. 8 is a diagram showing an example of the functional configuration of the pseudo learning text creation unit 140.
FIG. 9 is a diagram showing the operation flow of the pseudo learning text creation unit 140.
FIG. 10 is a diagram showing an example of the functional configuration of the language model creation device 200 of the present invention.
FIG. 11 is a diagram showing the operation flow of the language model creation device 200.

Embodiments of the present invention are described below with reference to the drawings. The same reference numerals denote the same components across the drawings, and their description is not repeated.

FIG. 1 shows an example of the functional configuration of the language model creation device 100 of the present invention, and FIG. 2 shows its operation flow. The language model creation device 100 comprises an LWLM learning unit 110, an LWLM task adaptation unit 130, a pseudo learning text creation unit 140, an N-gram language model creation unit 160, and a control unit 180. LWLM stands for Latent Words Language Model. The language model creation device 100 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute the program. The same applies to the other embodiments described below.

The LWLM learning unit 110 receives a plurality of learning texts as input, generates a plurality of hidden sequences, which are hidden word sequences corresponding to the word strings of the respective learning texts, and learns from the hidden sequences, for each of the learning texts, a latent word language model consisting of two probability distributions: a sentence pattern model, which is the latent word-latent word probability, and a vocabulary model, which is the latent word-observed word probability (step S110). The learning of one LWLM is repeated as many times as there are hidden sequences (step S1801). The control unit 180 controls this repetition. The learned latent word language models, equal in number to the hidden sequences, are recorded together as a set of models and constitute the latent word language models 120. A latent word corresponds to a class in a class N-gram language model and is the representative word of a group of words that have similar meanings or syntactic roles in a given context.

FIG. 3 illustrates the relationship between latent words and observed words. Observed words are the words that appear in the text, for example the word string (observed word string) making up the learning text sentence "The weather is nice today." shown in the figure. Each latent word is expressed as the representative of the words similar to the corresponding observed word. The observed word "today" corresponds to the latent word "tomorrow", which stands as the representative of several similar words such as "tomorrow", "yesterday", and "today".

The learning text is a language resource that has been segmented into words. The latent word language model consists of two probability distributions: the latent word-latent word probability distribution P(h_k | h_{k-2}, h_{k-1}) and the latent word-observed word probability distribution P(w_k | h_k). The latent word h_k is the latent variable of the latent word language model, and the observed word w_k is a word that actually appears in the text.

As an example of a hidden sequence, suppose the observed word sequence is "apple mandarin pine is" (a word-by-word gloss of the Japanese example sentence) and the vocabulary consists of only those four words. One possible hidden sequence is then "apple apple apple is". A predetermined number (two or more) of such hidden sequences are generated.

The LWLM task adaptation unit 130 receives the plurality of latent word language models learned by the LWLM learning unit 110 as input, separates each into its sentence pattern model and vocabulary model, and generates, as a task-adapted latent word language model, the pair obtained by weighted mixing of the sentence pattern models and weighted mixing of the vocabulary models (step S130). The LWLM task adaptation unit 130 thus constructs a single latent word language model from the N latent word language models learned from N language resources. Specifically, the sentence pattern models are mixed only with other sentence pattern models, and the vocabulary models only with other vocabulary models.

The pseudo learning text creation unit 140 receives the task-adapted latent word language model generated by the LWLM task adaptation unit 130 as input, generates a latent word sequence, and generates pseudo learning text from that latent word sequence (step S140). Step S140 is repeated until M observed words have been generated (No in step S1802), whereby the pseudo learning text 150 is created. M may be supplied from outside or set in advance as a constant in the pseudo learning text creation unit 140.

The N-gram language model creation unit 160 receives the pseudo learning text 150 as input and counts the frequencies of all N-word tuples in it to create an LWLM-like N-gram language model (step S160). For speech recognition, N = 3 is commonly used. For example, an LWLM-like language model with a 3-gram structure gives the probability distribution P_lwlm(w_k | w_{k-2}, w_{k-1}). Methods for learning N-gram language models are well known (Non-Patent Document 1). The created N-gram language model is generally recorded as a set of models and constitutes the LWLM-like N-gram language model 170.
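As a rough illustration of step S160, the sketch below counts every N-word tuple in a tokenized pseudo learning text and converts the counts into unsmoothed maximum likelihood probabilities. The function names `count_ngrams` and `ngram_mle` and the toy word list are assumptions made for this example only; a practical system would normally add smoothing, as covered in Non-Patent Document 1.

```python
from collections import Counter
from fractions import Fraction


def count_ngrams(words, n=3):
    """Count every run of n consecutive words in the tokenized text."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def ngram_mle(words, n=3):
    """Unsmoothed estimate P(w_k | history) = c(history, w_k) / c(history, *)."""
    ngrams = count_ngrams(words, n)
    history_totals = Counter()
    for gram, c in ngrams.items():
        history_totals[gram[:-1]] += c
    return {gram: Fraction(c, history_totals[gram[:-1]])
            for gram, c in ngrams.items()}


# Toy pseudo learning text (hypothetical data, not taken from the patent).
pseudo_text = ["apple", "is", "pine", "apple", "is", "mandarin"]
model = ngram_mle(pseudo_text, n=3)
print(model[("apple", "is", "pine")])  # 1/2: the history "apple is" occurs twice
```

Exact fractions are used here only so the toy result is easy to check by hand; floating-point values would serve equally well in practice.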

As described above, the language model creation device 100 can create an LWLM-like N-gram language model adapted to the task from external language resources that carry useful information only in their sentence patterns and external language resources that carry useful information only in their vocabulary. The operation of the language model creation device 100 is described in more detail below, with more specific functional configuration examples of each unit.

[LWLM learning unit]
Learning a latent word language model is the problem of estimating the latent word assigned to each word of the input learning text. That is, given a learning text (a sequence of observed words) w_1, w_2, ..., w_L with a total of L words, the problem is to estimate the latent words h_1, h_2, ..., h_L of the observed words w_1, w_2, ..., w_L. Once this assignment has been estimated, P(h_k | h_{k-2}, h_{k-1}) can be constructed by learning an N-gram language model on the latent word sequence h_1, h_2, ..., h_L, and P(w_k | h_k) can be constructed by learning a unigram language model on the pairs h_1 → w_1, h_2 → w_2, ..., h_L → w_L.

FIG. 4 shows a more specific example of the functional configuration of the LWLM learning unit 110, and FIG. 5 shows its operation flow. The LWLM learning unit 110 comprises hidden sequence estimation means 1101 and probability generation means 1102.

The hidden sequence estimation means 1101 receives a learning text as input and generates, as hidden sequences, hidden word sequences each corresponding to the word string of the learning text (step S1101), repeating until B (B ≥ 2) sequences have been generated (No in step S1801a). B is a predetermined number and may be supplied from outside. Letting W be the set of words contained in the learning text, each latent word is assigned one of the words in W.

As in the example above, suppose the learning text is the four-word sentence "apple mandarin pine is". Then W consists of the four words "apple", "mandarin", "pine", and "is". In this case, sequences such as "apple apple apple is", "mandarin apple mandarin is", and "pine pine pine pine" are candidate hidden sequences. There are 4^4 possible hidden sequences here (|W| = 4). The assignment of latent words making up a hidden sequence is estimated by the well-known method of Gibbs sampling. The 4^4 hidden sequences in this example can be said to represent sentence patterns.
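To make the size of the candidate space concrete, the following sketch enumerates every assignment of a word from W to each of the four positions. The variable names are assumptions for illustration. Exhaustive enumeration is shown only because the toy example is tiny; the description itself estimates the assignment by Gibbs sampling rather than by enumeration.

```python
from itertools import product

# Vocabulary W of the four-word toy example (English glosses of the source words).
W = ["apple", "mandarin", "pine", "is"]
sentence_length = 4

# Each candidate hidden sequence assigns one word of W to every position.
candidates = list(product(W, repeat=sentence_length))

print(len(candidates))  # 256, i.e. 4 ** 4
print(("apple", "apple", "apple", "is") in candidates)  # True
```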

The probability generation means 1102 generates the latent word-latent word probability P(h_k | h_{k-2}, h_{k-1}) and the latent word-observed word probability P(w_k | h_k) from the hidden sequence (step S1102a).

Suppose, for example, that "apple apple apple is" is chosen as the hidden sequence. Once the hidden sequence is fixed, the latent word-latent word probability P(apple | apple, apple) can be obtained, for example under maximum likelihood estimation, by equation (1).
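Equation (1) is not reproduced in this text. Judging from the occurrence counts used in the worked example for this hidden sequence, it presumably has the standard maximum likelihood form below. This is a reconstruction under that assumption, and the count notation c(·) is introduced here for illustration.

```latex
% Assumed form of the maximum likelihood estimate; c(x) is the number of
% times the word run x occurs in the hidden sequence.
P(h_k \mid h_{k-2}, h_{k-1}) = \frac{c(h_{k-2}, h_{k-1}, h_k)}{c(h_{k-2}, h_{k-1})}
```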

With "apple apple apple is" as the hidden sequence, "apple apple" occurs twice in the hidden sequence and "apple apple apple" occurs once, so the latent word-latent word probability is P(apple | apple, apple) = 1/2. With the fruit name "apple" as the representative word, P(apple | apple, apple) expresses how likely a fruit name is to follow fruit names. This latent word-latent word probability P represents the sentence pattern model. Besides the maximum likelihood value, a probability estimated by any other method, such as back-off smoothing, can be used. Back-off smoothing is a known technique.
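The counting in this example can be checked with a short sketch. The helper `count_run` is a hypothetical name, and the code simply divides the two occurrence counts stated in the text; it is a minimal illustration of maximum likelihood estimation, with no smoothing.

```python
from fractions import Fraction


def count_run(sequence, pattern):
    """Number of positions at which `pattern` occurs as a contiguous run."""
    n = len(pattern)
    return sum(1 for i in range(len(sequence) - n + 1)
               if sequence[i:i + n] == pattern)


hidden = ["apple", "apple", "apple", "is"]

bigram_count = count_run(hidden, ["apple", "apple"])            # 2
trigram_count = count_run(hidden, ["apple", "apple", "apple"])  # 1

# P(apple | apple, apple) as the ratio of the two counts.
p = Fraction(trigram_count, bigram_count)
print(p)  # 1/2
```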

Next, the probability generation means 1102 calculates the latent word-observed word probability P(w_k | h_k) (step S1102b). With the learning text "apple mandarin pine is", the latent word-observed word probability P(mandarin | apple) can be obtained, for example under maximum likelihood estimation, by equation (3).
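Equation (3) is not reproduced in this text. Given that the worked example divides the number of times a latent word is paired with a particular observed word by the number of times the latent word occurs, it presumably has the form below. This is a reconstruction under that assumption, with the count notation c(·) introduced here for illustration.

```latex
% Assumed form: c(h, w) counts positions where latent word h is aligned with
% observed word w; c(h) counts all positions where the latent word is h.
P(w_k \mid h_k) = \frac{c(h_k, w_k)}{c(h_k)}
```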

In the hidden sequence "apple apple apple is", the latent word "apple" occurs three times, and the observed word aligned with it in the learning text is "mandarin" once, so P(mandarin | apple) = 1/3. P(mandarin | apple) expresses the probability that "mandarin" appears as the observed word when the latent word is "apple". This latent word-observed word probability P represents the vocabulary model. As noted above, a value estimated by a method other than maximum likelihood estimation (for example, back-off smoothing) can also be used.
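The same example can be reproduced by aligning the hidden sequence with the learning text position by position. The function name `emission_mle` is an assumption for this sketch, which implements plain maximum likelihood counting and would raise `ZeroDivisionError` for a latent word that never occurs.

```python
from fractions import Fraction


def emission_mle(latent, observed, h, w):
    """P(w | h): share of positions with latent word h whose observed word is w."""
    pairs = list(zip(latent, observed))
    h_count = sum(1 for lh, _ in pairs if lh == h)
    hw_count = sum(1 for lh, ow in pairs if lh == h and ow == w)
    return Fraction(hw_count, h_count)


latent = ["apple", "apple", "apple", "is"]       # hidden sequence
observed = ["apple", "mandarin", "pine", "is"]   # learning text

print(emission_mle(latent, observed, "apple", "mandarin"))  # 1/3
print(emission_mle(latent, observed, "is", "is"))           # 1
```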

In this way, the probability generation means 1102 learns, for each hidden sequence, one pair consisting of the latent word-latent word probability P(h_k | h_{k-2}, h_{k-1}) and the latent word-observed word probability P(w_k | h_k). This pair is expressed as a latent word language model. The calculation of the two probabilities is repeated until B pairs have been computed (step S1801b). B is set to a larger value when a more detailed latent word language model is desired. A value of about 10 gives a reasonably detailed model.

The LWLM learning unit 110 applies the above processing to each of the input learning texts and learns latent word language models consisting of the pairs of latent word-latent word probability P_n(h_k | h_{k-2}, h_{k-1}) and latent word-observed word probability P_n(w_k | h_k) (n = 1, 2, ..., N). That is, N latent word language models are generated from N learning texts.

[LWLM task adaptation unit]
FIG. 6 shows a more specific example of the functional configuration of the LWLM task adaptation unit 130, and FIG. 7 shows its operation flow. The LWLM task adaptation unit 130 comprises model division means 1301, sentence pattern model adaptation means 1302, and vocabulary model adaptation means 1303.

The model division means 1301 divides each of the latent word language models learned by the LWLM learning unit 110 into a sentence pattern model and a vocabulary model (step S1301). The first latent word language model is divided into the sentence pattern model P_1(h_k | h_{k-2}, h_{k-1}) and the vocabulary model P_1(w_k | h_k). This model division step is repeated once for each of the N latent word language models (step S1802a).

The sentence pattern model adaptation means 1302 generates a task-adapted sentence pattern model 1304 by weighted mixing of the N sentence pattern models produced by the model division means 1301 (step S1302). The task-adapted sentence pattern model is expressed by equation (4).

Here, ω_n is the weight for the n-th sentence pattern model. The weights are determined under the constraint shown in the following equation.
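Equation (4) and the constraint on the weights are not reproduced in this text. Since the task-adapted sentence pattern model is described as a weighted mixture of the N sentence pattern models with weights ω_n, they would be expected to be a linear interpolation whose weights sum to one. The block below is a reconstruction under that assumption, not the published formulas; non-negativity of the weights is likewise assumed.

```latex
% Assumed form of the task-adapted sentence pattern model (weighted mixture).
P_{\mathrm{all}}(h_k \mid h_{k-2}, h_{k-1})
  = \sum_{n=1}^{N} \omega_n \, P_n(h_k \mid h_{k-2}, h_{k-1})

% Assumed constraint on the mixture weights.
\sum_{n=1}^{N} \omega_n = 1, \qquad \omega_n \ge 0
```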

The weight ω_n may be assigned in advance to each of the learning texts input to the LWLM learning unit 110, according to how well suited that learning text is to extracting a sentence pattern model. For example, with three learning texts, the weights are fixed in advance as ω_1 = 0.5, ω_2 = 0.4, and ω_3 = 0.1. Here ω_1 = 0.5 means that the first learning text is the one whose sentence patterns are most likely to be usable. For example, if the target domain is a telecommunications contact center and the external language resource is learning text transcribed from recorded conversations between operators and users, its sentence patterns can be considered reliable. Alternatively, the language model creation device 100 may be configured to ask its user what value the weight ω_n should take when the LWLM task adaptation unit 130 processes the latent word language models.
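A minimal sketch of weighted mixing, assuming each model is stored as a dictionary from a context to a word distribution and that the weights sum to one. The function name `mix_models`, the data layout, and the toy probabilities are all assumptions for illustration; only the weights 0.5, 0.4, and 0.1 come from the example above. The same routine applies to sentence pattern models (context = two preceding latent words) and to vocabulary models (context = latent word).

```python
def mix_models(models, weights):
    """Linearly interpolate conditional distributions that share a set of contexts."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights are assumed to sum to 1")
    mixed = {}
    for model, weight in zip(models, weights):
        for context, dist in model.items():
            target = mixed.setdefault(context, {})
            for word, prob in dist.items():
                target[word] = target.get(word, 0.0) + weight * prob
    return mixed


# Three toy sentence pattern models for one context (hypothetical values).
ctx = ("apple", "apple")
model_1 = {ctx: {"apple": 1.0}}
model_2 = {ctx: {"apple": 0.5, "is": 0.5}}
model_3 = {ctx: {"is": 1.0}}

adapted = mix_models([model_1, model_2, model_3], [0.5, 0.4, 0.1])
print(adapted[ctx])  # roughly {'apple': 0.7, 'is': 0.3}
```

If a context is missing from one of the models, the mixed distribution for that context no longer sums to one, so a real implementation would need a back-off or renormalization policy that this sketch leaves out.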

The vocabulary model adaptation means 1303 generates a task-adapted vocabulary model 1305 by weighted mixing of the N vocabulary models produced by the model division means 1301 (step S1303). The sentence pattern model adaptation step (step S1302) and the vocabulary model adaptation step (step S1303) are repeated once for each of the N latent word language models (step S1802b). The task-adapted vocabulary model is expressed by equation (6).

Here, μ_n is the weight for the n-th vocabulary model. The weights are determined under the constraint shown in the following equation.
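Equation (6) and the constraint on the weights are not reproduced in this text. Since the task-adapted vocabulary model is described as a weighted mixture of the N vocabulary models with weights μ_n, they would be expected to take the following form. This is a reconstruction under that assumption, not the published formulas; non-negativity of the weights is likewise assumed.

```latex
% Assumed form of the task-adapted vocabulary model (weighted mixture).
P_{\mathrm{all}}(w_k \mid h_k) = \sum_{n=1}^{N} \mu_n \, P_n(w_k \mid h_k)

% Assumed constraint on the mixture weights.
\sum_{n=1}^{N} \mu_n = 1, \qquad \mu_n \ge 0
```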

The weight μ_n for the vocabulary model is a value determined in advance according to how well suited the learning text is to extracting a vocabulary model. For example, if the target domain is a telecommunications contact center and the external language resource consists of telecommunications-related news articles on the Web, the vocabulary it contains can be considered reliable, so its weight is set to a large value. The weight μ_n is determined in the same way as the weight ω_n.

[Pseudo learning text creation unit]
FIG. 8 shows a more specific example of the functional configuration of the pseudo learning text creation unit 140, and FIG. 9 shows its operation flow. The pseudo learning text creation unit 140 generates, from the task-adapted LWLM, a pseudo learning text w_1, w_2, ..., w_M consisting of M words.

The pseudo learning text creation unit 140 comprises latent word generation means 1401 and observed word generation means 1402. The latent word generation means 1401 receives the task-adapted sentence pattern model P_all(h_k | h_{k-2}, h_{k-1}) and the latent word history as input and randomly generates a latent word h_k (step S1401).

Because no latent word history exists at first, the latent word generation means 1401 randomly generates the initial latent word h_1 based on the task-adapted sentence pattern model P_all(h_1 | -, -) (step S1401a). The latent word h_1 is generated by the well-known SampleOne algorithm, which randomly samples one word from a discrete probability distribution.

The latent word generation means 1401 generates a uniform random number (rand) (step S1401b) and determines the first latent word h_1 according to the referenced latent word-latent word probability P_all(h_1 | -, -) (step S1401c). The latent word h_1 is determined by the relationship between rand and the probabilities of the candidate latent words. For example, assume P_1(apple | -, -) = 0.3, P_1(mandarin | -, -) = 0.3, P_1(pine | -, -) = 0.3, and P_1(is | -, -) = 0.1. If rand = 0.1, the first latent word is determined as h_1 = "apple"; if rand = 0.7, h_1 = "pine"; and if rand = 0.95, h_1 = "is". This process of determining the latent word h_1 is the SampleOne algorithm.
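The numeric example is consistent with cumulative-interval sampling, in which the words are laid out in a fixed order and the chosen word is the first one whose cumulative probability exceeds rand. The sketch below assumes that reading; the function name `sample_one` and the list-of-pairs layout are illustrative choices, and the text does not spell out the internals of SampleOne.

```python
def sample_one(dist, rand):
    """Return the first word whose cumulative probability exceeds rand.

    dist is an ordered list of (word, probability) pairs; rand lies in [0, 1).
    """
    cumulative = 0.0
    for word, prob in dist:
        cumulative += prob
        if rand < cumulative:
            return word
    return dist[-1][0]  # guard against floating-point round-off


# Distribution from the example: intervals [0, 0.3), [0.3, 0.6), [0.6, 0.9), [0.9, 1).
initial = [("apple", 0.3), ("mandarin", 0.3), ("pine", 0.3), ("is", 0.1)]

print(sample_one(initial, 0.1))   # apple
print(sample_one(initial, 0.7))   # pine
print(sample_one(initial, 0.95))  # is
```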

The observed word generation means 1402 receives the latent word h_1 as input, refers to the task-adapted vocabulary model P_all(w_k | h_k), and determines an observed word from the relationship between the latent word-observed word probabilities and a uniform random number. It outputs the determined observed word externally as pseudo learning text and outputs the latent word history corresponding to that observed word to the latent word generation means 1401 (step S1402).

When the latent word h_1 is input, the observed word generation means 1402 generates a uniform random number (rand) (step S1402b) and determines the observed word w_1 from the relationship between rand and the latent word-observed word probability P_all(w_1 | h_1) (step S1402c). The determined observed word w_1 is output to the outside as pseudo learning text (step S1402d), and the latent word history h_1 corresponding to the output observed word w_1 is output to the latent word generation means 1401 (step S1402e).

For example, if the latent word-observed word probability P_all(w_1 | h_1) assigns the observed words the probabilities "apple" = 0.3, "mandarin orange" = 0.3, "pineapple" = 0.3, and "desu" = 0.1, then the observed word w_1 is determined as "apple" when rand = 0.1, "mandarin orange" when rand = 0.4, "pineapple" when rand = 0.7, and "desu" when rand = 0.95.

Once the observed word w_1 has been output, the pseudo learning text creation unit 140 repeats the processing of steps S1401a through S1402e described above until M observed words, ending with w_M, have been output (No in step S1802). The larger the value of M, the better the pseudo learning text reflects the properties of the latent word language model. M may be supplied from the outside or set in advance in the observed word generation means 1402.
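The generation loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the dictionary-based models, the toy probabilities, and the names `sample_one` and `generate_pseudo_text` are hypothetical, and a two-word latent history (h_{k-2}, h_{k-1}) is assumed from the notation P_all(h_1 | -, -).

```python
import random

def sample_one(dist, rng):
    """Draw one word from a {word: probability} distribution."""
    rand = rng.random()
    cumulative = 0.0
    for word, prob in dist.items():
        cumulative += prob
        if rand < cumulative:
            return word
    return list(dist)[-1]  # guard against floating-point round-off

def generate_pseudo_text(sentence_model, vocab_model, m, seed=0):
    """Alternate latent-word and observed-word sampling until m words exist."""
    rng = random.Random(seed)
    history = ("-", "-")            # no latent word history at the start
    observed = []
    for _ in range(m):
        latent = sample_one(sentence_model[history], rng)  # P(h_k | h_{k-2}, h_{k-1})
        observed.append(sample_one(vocab_model[latent], rng))  # P(w_k | h_k)
        history = (history[1], latent)  # feed the latent history back
    return observed

# Toy models (hypothetical values, for illustration only).
latent_words = ["apple", "desu"]
histories = ([("-", "-")]
             + [("-", a) for a in latent_words]
             + [(a, b) for a in latent_words for b in latent_words])
sentence_model = {
    h: ({"apple": 0.2, "desu": 0.8} if h[1] == "apple"
        else {"apple": 0.9, "desu": 0.1})
    for h in histories
}
vocab_model = {
    "apple": {"apple": 0.7, "mandarin orange": 0.3},
    "desu": {"desu": 1.0},
}

print(" ".join(generate_pseudo_text(sentence_model, vocab_model, m=10)))
```

Only the observed words are emitted as pseudo learning text; the latent words serve solely as the history for the next sampling step.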

FIG. 10 shows an example functional configuration of the language model creation device 200 of the present invention, and FIG. 11 shows its operation flow. The language model creation device 200 creates a mixed N-gram language model in which conventional N-gram language models and the language model created by the language model creation device 100 described above are mixed at a predetermined ratio.

The language model creation device 200 comprises an LWLM-like language model creation unit 210, N language model creation units 220_1 to 220_N, an LWLM-like N-gram language model 230, N N-gram language models 240_1 to 240_N, a mixed N-gram language model creation unit 250, and a control unit 260. The LWLM-like language model creation unit 210 is the same as the N-gram language model creation unit 160 of the language model creation device 100 described above. It is called the LWLM-like language model creation unit 210 for convenience of explanation, but it is the same as the N-gram language model 240_n (n: 1 to N).

The LWLM-like language model creation unit 210 takes as input the pseudo learning text 150 generated by the pseudo learning text creation unit 140 of the language model creation device 100, counts the frequency of every N-word tuple in the pseudo learning text, and creates an N-gram language model as the LWLM-like language model P_lwlm(w_k | w_{k-2}, w_{k-1}) (step S210).

Learning texts 1 to N are input to the N language model creation units 220_1 to 220_N, respectively. Each language model creation unit 220_n (n: 1 to N) counts the frequency of every N-word tuple in its learning text, and together they create the N-gram language models 240_1 to 240_N (step S220). The N-gram language models 240_1 to 240_N are P_1(w_k | w_{k-2}, w_{k-1}) to P_N(w_k | w_{k-2}, w_{k-1}).
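Counting N-word tuples to obtain conditional probabilities can be sketched as below for N = 3. This is a simplified illustration, not the patent's procedure: it uses unsmoothed maximum-likelihood estimates, whereas a practical N-gram model would normally apply smoothing or back-off, and the function name `train_trigram` and the "-" padding symbol are assumptions made here.

```python
from collections import Counter

def train_trigram(words):
    """Estimate P(w_k | w_{k-2}, w_{k-1}) from trigram counts (no smoothing)."""
    padded = ["-", "-"] + list(words)   # pad so the first word has a history
    trigrams = list(zip(padded, padded[1:], padded[2:]))
    trigram_counts = Counter(trigrams)
    context_counts = Counter((a, b) for a, b, _ in trigrams)
    return {
        tri: count / context_counts[tri[:2]]
        for tri, count in trigram_counts.items()
    }

# Toy learning text (hypothetical, for illustration only).
text = ["apple", "desu", "apple", "desu", "apple", "mandarin orange"]
model = train_trigram(text)
for trigram, prob in sorted(model.items()):
    print(trigram, prob)
```

The same routine would apply both to the pseudo learning text (yielding P_lwlm) and to each of the learning texts 1 to N (yielding P_1 to P_N).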

The mixed N-gram language model creation unit 250 takes as input the LWLM-like language model P_lwlm(w_k | w_{k-2}, w_{k-1}) and the N-gram language models P_1(w_k | w_{k-2}, w_{k-1}) to P_N(w_k | w_{k-2}, w_{k-1}), and creates a single mixed N-gram language model in which all of the language models are mixed (step S250). The mixing follows the equation below. The mixing of language models (step S250) is repeated until all N-gram language models have been processed (No in step S260).

Here, λ_n and λ_lwlm are the weights for the respective language models and satisfy the following constraint.

The values of λ_n and λ_lwlm are determined in advance. For example, λ_lwlm is set large to increase the weight of the LWLM-like language model P_lwlm(w_k | w_{k-2}, w_{k-1}) and set small to decrease it. A mixed N-gram language model obtained in this way combines the complementary characteristics of the latent word language model and the conventional N-gram language model.
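The equation images for the mixture and its constraint are not reproduced in this text. Assuming the mixture is a linear interpolation, which is how the evaluation section describes proposed method 2, the two expressions would take the following form. The symbol P_mix is a label introduced here for illustration, and non-negativity of the weights is also an assumption.

```latex
\begin{align}
P_{\mathrm{mix}}(w_k \mid w_{k-2}, w_{k-1})
  &= \lambda_{lwlm}\, P_{lwlm}(w_k \mid w_{k-2}, w_{k-1})
   + \sum_{n=1}^{N} \lambda_n\, P_n(w_k \mid w_{k-2}, w_{k-1}) \\
\lambda_{lwlm} + \sum_{n=1}^{N} \lambda_n &= 1,
  \qquad \lambda_{lwlm} \ge 0,\ \lambda_n \ge 0
\end{align}
```

Under this form the mixture is itself a valid probability distribution over w_k, because each component distribution sums to 1 and the weights sum to 1.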

[Evaluation experiment]
To evaluate the performance of the language model created by the language model creation device of the present invention, an evaluation experiment comparing speech recognition error rates was conducted. Simulated contact center calls (contact center: A) were used as the target domain. One hour of the simulated calls was transcribed and used as learning data, and another two hours were used as test data. In addition, four external language resources were used: Twitter data, blog data, the Corpus of Spontaneous Japanese (CSJ), and contact center speech transcriptions from another domain (contact center: B). The weights ω_n for the sentence pattern models and the weights μ_n for the vocabulary models used to mix these language resources were set as shown in the following equation. The acoustic model was a triphone HMM with 2,000 states and 16 mixtures, and VoiceRex was used as the speech recognition decoder. JTAG was used for morphological analysis of each language resource.

The evaluation compared three cases: using a language model built by linear interpolation from the speech transcription and the external language resources (conventional method), using the language model created by the language model creation device 100 described above (proposed method 1), and using the language model created by the language model creation device 200 described above (proposed method 2). The results are shown in Table 1.

As shown in Table 1, the language model built only from generated text (proposed method 1) achieved a character accuracy (75.89%) comparable to that of the conventional method (76.12%), even though its learning text was generated automatically from a probabilistic model. Furthermore, the language model built by combining the generated text with the other texts through linear interpolation (proposed method 2) achieved higher performance than the conventional method.

In other words, the language model of the present invention provides an effective improvement even when the amount of speech transcription data is as small as one hour. The language model creation method of the present invention is therefore useful for building a high-performance language model from only a small amount of speech transcription. The method uses latent word language models to separate each language resource into two components, a sentence pattern aspect and a lexical aspect, and performs task adaptation of the language model by appropriately mixing each aspect across the language resources, so it can provide a high-performance language model adapted to the task.
In the embodiments above, the model is determined by comparing a uniform random number with probability values. This is only one example, and other methods that use uniform random numbers may be employed.

When the processing means of the above devices are realized by a computer, the processing content of the functions each device should have is described by a program. Executing this program on the computer realizes the processing means of each device on the computer.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be stored in a storage device of a server computer and distributed by transferring it from the server computer to other computers over a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing may be realized in hardware.

Claims

A language model creation device comprising:
an LWLM learning unit that takes a plurality of learning texts as input, generates a plurality of hidden sequences, each being a hidden word sequence corresponding to the word string of the respective learning text, and learns from the hidden sequences, for each of the plurality of learning texts, a latent word language model composed of two probability distributions, namely a sentence pattern model that is a latent word-latent word probability and a vocabulary model that is a latent word-observed word probability;
an LWLM task adaptation unit that takes the plurality of latent word language models as input, separates each latent word language model into a sentence pattern model and a vocabulary model, and generates, as a task-adaptive latent word language model, the combination of the plurality of sentence pattern models mixed with weights and the plurality of vocabulary models mixed with weights;
a pseudo learning text creation unit that takes the task-adaptive latent word language model as input, generates a latent word sequence, and generates pseudo learning text from the latent word sequence; and
an N-gram language model creation unit that takes the pseudo learning text as input and counts the frequency of every N-word tuple in the pseudo learning text to create an LWLM-like N-gram language model.

The language model creation device according to claim 1,
wherein the LWLM task adaptation unit comprises:
model dividing means for dividing each latent word language model into a sentence pattern model and a vocabulary model;
sentence pattern model adaptation means for generating a task-adaptive sentence pattern model in which the plurality of sentence pattern models divided by the model dividing means are mixed with weights; and
vocabulary model adaptation means for generating a task-adaptive vocabulary model in which the plurality of vocabulary models divided by the model dividing means are mixed with weights.

A language model creation device comprising:
an LWLM-like language model creation unit that takes as input the pseudo learning text created by the language model creation device according to claim 1 and counts the frequency of every N-word tuple in the pseudo learning text to create an N-gram language model as an LWLM-like language model;
a language model creation unit that takes a plurality of learning texts as input and counts the frequency of every N-word tuple in each learning text to create a plurality of N-gram language models; and
a mixed N-gram language model creation unit that takes the LWLM-like language model and the plurality of N-gram language models as input and creates a single mixed N-gram language model in which the respective language models are mixed.

A language model creation method comprising:
an LWLM learning step of taking a plurality of learning texts as input, generating a plurality of hidden sequences, each being a hidden word sequence corresponding to the word string of the respective learning text, and learning from the hidden sequences, for each of the plurality of learning texts, a latent word language model composed of two probability distributions, namely a sentence pattern model that is a latent word-latent word probability and a vocabulary model that is a latent word-observed word probability;
an LWLM task adaptation step of taking the plurality of latent word language models as input, separating each latent word language model into a sentence pattern model and a vocabulary model, and generating, as a task-adaptive latent word language model, the combination of the plurality of sentence pattern models mixed with weights and the plurality of vocabulary models mixed with weights;
a pseudo learning text creation step of taking the task-adaptive latent word language model as input, generating a latent word sequence, and generating pseudo learning text from the latent word sequence; and
an N-gram language model creation step of taking the pseudo learning text as input and counting the frequency of every N-word tuple in the pseudo learning text to create an LWLM-like N-gram language model.

The language model creation method according to claim 4,
wherein the LWLM task adaptation step comprises:
a model dividing step of dividing each latent word language model into a sentence pattern model and a vocabulary model;
a sentence pattern model adaptation step of generating a task-adaptive sentence pattern model in which the plurality of sentence pattern models divided in the model dividing step are mixed with weights; and
a vocabulary model adaptation step of generating a task-adaptive vocabulary model in which the plurality of vocabulary models divided in the model dividing step are mixed with weights.

A language model creation method comprising:
an LWLM-like language model creation step of taking as input the pseudo learning text created by the language model creation method according to claim 4 and counting the frequency of every N-word tuple in the pseudo learning text to create an N-gram language model as an LWLM-like language model;
a language model creation step of taking a plurality of learning texts as input and counting the frequency of every N-word tuple in each learning text to create a plurality of N-gram language models; and
a mixed N-gram language model creation step of taking the LWLM-like language model and the plurality of N-gram language models as input and creating a single mixed N-gram language model in which the respective language models are mixed.

A program for causing a computer to function as the language model creation device according to any one of claims 1 to 3.

A computer-readable recording medium on which the program according to claim 7 is recorded.