JP4897737B2

JP4897737B2 - Word addition device, word addition method, and program thereof

Info

Publication number: JP4897737B2
Application number: JP2008124295A
Authority: JP
Inventors: 明夫神; 浩和政瀧
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-05-12
Filing date: 2008-05-12
Publication date: 2012-03-14
Anticipated expiration: 2028-05-12
Also published as: JP2009271465A

Description

The present invention relates to a word adding device, a word adding method, and a program therefor which, when a missing word is added to the language dictionary of a language model used for speech recognition processing or the like, obtain an appropriate intra-class word appearance probability and add the word.

Non-Patent Document 1 describes the principle of general speech recognition using a statistical language model (hereinafter simply “language model”); this is referred to below as Prior Art 1. There are cases in which one wishes to add an unregistered word, that is, a word not yet registered in the language dictionary of the language model. FIG. 1 is a conceptual diagram of a language dictionary. As FIG. 1 shows, words are registered in the language dictionary by word class. In the example of FIG. 1, the classes are a personal name class (class 1), a place name class (class 2), and a food name class (class 3), and food names such as “ramen” and “sashimi” are registered in class 3.

Because words such as “ramen” and “sashimi” are already registered in class 3, when the speech to be recognized contains an utterance such as “I want to eat ramen” or “Sashimi is delicious”, the words “ramen” and “sashimi” can appear at the appropriate places in the recognition result. However, if the word “tofu” is unregistered, as in the example of FIG. 1, then “tofu” cannot appear at the appropriate place even when the user says “I want to eat tofu”. In such cases the utterance is often misrecognized, for example as “I want to eat Kofu” (a similar-sounding place name) or as “want to eat” with the word dropped altogether. When a word to be recognized is unregistered, it is therefore important to register it in the language dictionary. Other examples of unregistered words are special terms and technical terms.

In the language dictionary, words are registered by word class as described above, and each word is annotated with a number identifying the word, its reading, its part of speech, its intra-class word appearance probability (hereinafter simply “word appearance probability”), and so on. The word appearance probability is the probability that a word registered in a given word class appears within that class; the probabilities within a class should preferably sum to 1.

For the classes, the probabilities with which classes appear in succession are learned and computed in advance as an N-gram model. For example, the class bigram probability of the class C_w to which a word W belongs, given a class C_j, is obtained in advance by training as P(C_w | C_j). When only bigrams are considered, the probability P(W_i | W_{i-1}) that two words W_{i-1} and W_i appear in sequence can in general be written as
P(W_i | W_{i-1}) = P(W_i | C_i) P(C_i | C_{i-1})    (1)
where W_i ∈ C_i. The word appearance probability P(W_i | C_i) of a word is therefore a very important factor.
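As an illustration of equation (1), the following minimal Python sketch multiplies an intra-class word appearance probability by a class bigram probability. The dictionary layout, the function name, and all numeric values are assumptions made for this example and do not come from the specification.

```python
def word_bigram_probability(word, prev_word, word_class, p_word_in_class, p_class_bigram):
    """Equation (1): P(W_i | W_{i-1}) = P(W_i | C_i) * P(C_i | C_{i-1})."""
    c_i = word_class[word]          # class C_i of the current word
    c_prev = word_class[prev_word]  # class C_{i-1} of the preceding word
    return p_word_in_class[(word, c_i)] * p_class_bigram[(c_i, c_prev)]


# Toy values (assumed): "ramen" belongs to the food class, "Tanaka" to the person class.
word_class = {"ramen": "food", "Tanaka": "person"}
p_word_in_class = {("ramen", "food"): 0.2}      # P(ramen | food)
p_class_bigram = {("food", "person"): 0.05}     # P(food | person)

p = word_bigram_probability("ramen", "Tanaka", word_class, p_word_in_class, p_class_bigram)
```

Because the class bigram term is fixed by prior training, the only factor that changes when a word is added is its intra-class word appearance probability.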

Patent Document 1 describes a technique that can automatically supplement a deficient word dictionary with a small amount of processing and thereby reduce the number of unregistered words (hereinafter referred to as Prior Art 2).
Non-Patent Document 1: Seiichi Nakagawa, “Speech Recognition by Stochastic Models”, Corona Publishing, pp. 109-120
Patent Document 1: Japanese Unexamined Patent Application Publication No. 2005-250071

FIG. 2 is a schematic diagram of adding an unregistered word. As FIG. 2 shows, in Prior Arts 1 and 2 a person decides the class of the word to be added, and the word is added to that class with a default value as its word appearance probability. If m is the number of words contained in the word class, as in FIG. 2, the default was either 1/m or a constant that had proved empirically to work well. With such defaults, however, the correct word often becomes harder to recognize, or spurious insertion errors (an unintended word appearing as an incorrect result) occur frequently, so the word recognition accuracy could not be improved reliably.

The word adding device of the present invention includes an error value calculation unit, a determination unit, an appearance probability output unit, and a language dictionary output unit. The error value calculation unit obtains a first error value of the additional word using an evaluation speech database and a language dictionary obtained by adding the additional word, associated with a first appearance probability, to the pre-addition language dictionary. The determination unit uses the first error value to determine whether spurious insertion errors of the additional word tend not to occur in speech recognition processing (hereinafter, “the first relationship is satisfied”). When the first relationship is determined to be satisfied, the appearance probability output unit outputs, as the word appearance probability of the additional word, a large appearance probability that is larger than the first medium appearance probability and the second medium appearance probability. The language dictionary output unit outputs a language dictionary obtained by adding the additional word, associated with that word appearance probability, to the pre-addition language dictionary.

In the following description, a “correct answer of the additional word” means that, in speech recognition processing using a language dictionary to which the additional word A has been added, the additional word A appears at an intended place. A “spurious insertion error of the additional word” means that, in the same processing, the additional word A appears at an unintended place. The “correct answer rate” of the additional word A indicates how readily correct answers of A occur, and the “error rate” indicates how readily spurious insertion errors of A occur. With the above configuration, using an evaluation speech database prepared in advance reveals the tendency of the error rate of the additional word on that database. A word appearance probability can therefore be determined that lowers the error rate for additional words prone to spurious insertion errors while raising the correct answer rate of the additional word.

The best mode for carrying out the invention is described below. Components having the same function and steps performing the same processing are given the same reference numerals, and duplicate description is omitted.

First, the object of the present invention is briefly described. In the following, the language dictionary before the additional word A is added is called the “pre-addition language dictionary”. In a language model for continuous speech recognition using HMMs, when an unregistered word is additionally registered in the pre-addition language dictionary, the word appearance probability of the additional word A within its word class is adjusted to an ideal value. Here, an ideal value means one at which the correct answer value of the additional word A is high and its error value is low. In other words, in speech recognition processing the correct word appears with high probability where it should appear, and unwanted words appear as rarely as possible where they should not.

FIG. 3 shows an example of the functional configuration of the best mode of the word adding device 100 for carrying out the invention, and FIG. 4 shows its processing flow. The word adding device 100 shown in FIG. 3 includes an input unit 2, a speech recognition unit 4, an error value calculation unit 6, an appearance probability output unit 14, a determination unit 15, a language dictionary output unit 16, a storage unit 10, and a control unit 12. Let the word that the user wishes to add be the additional word A. As equation (1) shows, the correct answer rate of the additional word A increases as its word appearance probability increases. The word adding device 100 of the first embodiment therefore aims to determine a word appearance probability that is as large as possible while keeping the error rate of the additional word A as small as possible, and to add the additional word A to the pre-addition language dictionary with that probability.

As advance preparation, the pre-addition language dictionary to which the additional word A is to be added and the other information needed for speech recognition processing (language model information, acoustic model information, and so on) are prepared. The pre-addition language dictionary and the language model information are stored in the language model storage unit 18, and the acoustic model information is stored in the acoustic model storage unit 20.

First, the additional word A and a first appearance probability P1 (for example, 0.1), an arbitrary large value that serves as the reference for determining the word appearance probability, are input from the input unit 2. The additional word A is then added to the pre-addition language dictionary in association with the first appearance probability P1, and P1 is stored in the storage unit 10. The user decides the class to which the additional word A belongs. The language dictionary to which the additional word A and the first appearance probability P1 have been added is called the “first post-addition language dictionary”. The error value calculation unit 6 then obtains the first error value E1(A) of the additional word A using the first post-addition language dictionary and the evaluation speech database (step S2).

The evaluation speech database is described first. It is stored in the evaluation speech database storage unit 22 and associates a large amount of speech data α, sufficient to evaluate recognition results correctly, with transcription data β, in which the correct text of the speech data α has been transcribed in advance. Any speech may be used for the speech data α and transcription data β, but the more the better; at least several hours of utterances is preferable. For the word adding device 100 of the first embodiment, the speech data α need not contain utterances of the additional word A. For the word adding device 200 described later, however, it preferably contains as many utterances of the additional word A as possible (the reason is given in the fourth embodiment).

Next, a specific method of calculating the first error value is described. As shown in FIG. 3, the speech recognition unit 4 performs speech recognition processing on the speech data α stored in the evaluation speech database storage unit 22, using a language model that includes the first post-addition language dictionary (stored in the language model storage unit 18) and an acoustic model (stored in the acoustic model storage unit 20), and generates speech text data γ. Any known technique may be used for the speech recognition processing. The error value calculation unit 6 then compares the transcription data β in the evaluation speech database storage unit 22 with the speech text data γ and counts the number of spurious insertion errors of the additional word A, that is, the number of places where A appears unintentionally. The error value calculation unit 6 computes “(number of spurious insertion errors / number of words in the evaluation speech database) * 100”, where “*” denotes multiplication, and outputs the result as the first error value E1(A) of the additional word A. Alternatively, it may output the raw number of spurious insertion errors as E1(A). The first error value E1(A) is stored in the storage unit 10. The second error value E2(A), the first correct answer value C1(A), the second correct answer value C2(A), the first threshold Th1, the second threshold Th2, the third threshold Th3, and so on, described below, are also stored in the storage unit 10.
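The error value computation described above can be sketched in Python as follows. The specification does not say how the transcription β and the recognized text γ are aligned, so this sketch assumes a word-level alignment with `difflib.SequenceMatcher` and treats every occurrence of the additional word in γ that falls outside a matching block as a spurious insertion error. It also assumes that the number of words in the evaluation speech database means the number of words in the transcriptions. The function names and the romanized sample data are hypothetical.

```python
from difflib import SequenceMatcher


def count_spurious(word, reference, hypothesis):
    """Occurrences of `word` in the hypothesis that do not align with the reference."""
    matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
    aligned = sum(
        hypothesis[block.b + k] == word
        for block in matcher.get_matching_blocks()
        for k in range(block.size)
    )
    return hypothesis.count(word) - aligned


def error_value(word, pairs):
    """(spurious insertion errors / words in the evaluation database) * 100."""
    total_words = sum(len(reference) for reference, _ in pairs)
    spurious = sum(count_spurious(word, ref, hyp) for ref, hyp in pairs)
    return spurious / total_words * 100


# Each pair is (transcription beta, recognized text gamma), already split into words.
pairs = [
    (["kofu", "ni", "iku"], ["tofu", "ni", "iku"]),            # "tofu" inserted wrongly
    (["ramen", "wo", "taberu"], ["ramen", "wo", "taberu"]),    # recognized correctly
]
e1 = error_value("tofu", pairs)  # one spurious insertion among six reference words
```

An alignment-based count is only one possible reading of “appears at an unintended place”; a production evaluation would typically reuse the scoring tool of the recognizer in use.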

Next, the determination unit 15 uses the first error value E1(A) to determine whether, in speech recognition processing with the word appearance probability of the additional word A set to the first appearance probability P1, spurious insertion errors of A tend not to occur (are unlikely to occur). In the following, satisfying the condition “spurious insertion errors of the additional word A are unlikely to occur” is called “satisfying the first relationship”. Whether the first relationship is satisfied is determined using the first threshold Th1 in addition to the first error value E1(A). The first threshold Th1 is predetermined and stored in the storage unit 10 in advance. The first relationship is satisfied if, for example,
E1(A) < Th1    (2)
holds. When the first error value E1(A) is “(number of spurious insertion errors / number of words in the evaluation speech database) * 100”, the first threshold Th1 is a constant of at least 0 and less than 100, and may be set, for example, to Th1 = 1 (%). That E1(A) satisfies equation (2) means that spurious insertion errors of the additional word A tend not to occur in speech recognition processing when its word appearance probability is the first appearance probability P1.

The test for the first relationship is not limited to equation (2); any test may be used as long as it can determine whether spurious insertion errors of the additional word A tend not to occur in speech recognition processing when the word appearance probability of A is the first appearance probability P1. In other words, by running speech recognition experimentally in advance through the processing of the speech recognition unit 4, the error value calculation unit 6, and the determination unit 15, the tendency of the error rate of the additional word A at the first appearance probability P1 can be learned. The same applies to the second to fourth embodiments, and in the fourth embodiment the tendency of the correct answer rate can also be learned.

When the determination unit 15 determines that the first relationship is satisfied, the appearance probability output unit 14 outputs the large appearance probability L (step S6). The large appearance probability L need only be larger than the first medium appearance probability M1, the second medium appearance probability M2, and the small appearance probability S described below, but it is preferably equal to or greater than the first appearance probability P1. This is because, as described above, spurious insertion errors of the additional word A have been determined to be unlikely (the first relationship is satisfied), so raising the word appearance probability can raise the correct answer rate of A. The large appearance probability L is input to the language dictionary output unit 16, which outputs a language dictionary obtained by adding the additional word A, associated with L, to the pre-addition language dictionary (step S30). Although not shown in FIG. 4, if the determination unit 15 determines in step S4 that E1(A) ≧ Th1, the processing may end at that point or may proceed to the processing described in the second embodiment.

In this way, speech recognition processing is performed in advance for the additional word A with a sufficiently large, arbitrarily chosen first appearance probability P1 as its word appearance probability, and the tendency of the error rate of A is learned by determining whether the first error value E1(A) is below the first threshold Th1. If spurious insertion errors of the additional word A are determined to be unlikely, the word appearance probability of A can be set to the large appearance probability L. As noted above, increasing the word appearance probability also raises the correct answer rate of A. The word adding device 100 of the first embodiment can therefore determine a word appearance probability for the additional word A that gives a low error rate and a high correct answer rate.

The word adding device of the second embodiment performs the following processing when the determination unit 15 determines that E1(A) ≧ Th1 in step S4 described in the first embodiment. A determination of E1(A) ≧ Th1 in step S4 means that spurious insertion errors of the additional word A tend to occur when its word appearance probability is P1. A second appearance probability P2 (for example, 0.01) that is less than the first appearance probability P1 is therefore set as the reference for the word appearance probability of A. Specifically, the second appearance probability P2 is input from the input unit 2, passed to the language model storage unit 18, and stored in the storage unit 10.

The error value calculation unit 6 obtains the second error value E2(A) using the evaluation speech database and a language dictionary obtained by adding the additional word A, associated with the second appearance probability P2, to the pre-addition language dictionary (hereinafter, the “second post-addition language dictionary”) (step S10). The second error value E2(A) is obtained in the same way as the first error value E1(A), using the speech recognition unit 4 and the evaluation speech database, so the details are omitted. The second error value E2(A) is stored in the storage unit 10.

Next, the determination unit 15 uses the first error value E1(A) and the second error value E2(A) to determine whether lowering the appearance probability of the additional word A (that is, changing the word appearance probability from P1 to P2) makes spurious insertion errors of A unlikely in speech recognition processing. Satisfying the condition “lowering the appearance probability of the additional word A makes spurious insertion errors of A unlikely in speech recognition processing” is called “satisfying the second relationship”. The determination uses the second threshold Th2 in addition to E1(A) and E2(A). For example, the second relationship is satisfied if
(E2(A) / E1(A)) * 100 < Th2    (3)
holds. In this case Th2 is a constant of at least 0 and less than 100, for example 10 (%).

As described above, the first error value E1(A) and the second error value E2(A) in equation (3) are obtained for the first post-addition language dictionary, which carries the first appearance probability P1, and the second post-addition language dictionary, which carries the second appearance probability P2, respectively. Since P1 > P2, satisfying equation (3) means that lowering the word appearance probability of the additional word A makes spurious insertion errors of A unlikely. The determination unit 15 only needs to be able to determine whether the second relationship is satisfied, so another test may be used, such as whether E2(A) − E1(A) < Th2 holds.
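The second-relationship test can be sketched in Python as below, covering both the ratio form of equation (3) and the difference form E2(A) − E1(A) < Th2. The function name and the handling of E1(A) = 0, where the ratio is undefined, are assumptions of this sketch. The threshold for the difference form would need its own scale, so the value used in the example is purely illustrative.

```python
def satisfies_second_relation(e1, e2, th2=10.0, use_difference=False):
    """Return True if lowering the probability from P1 to P2 suppresses spurious insertions."""
    if use_difference:
        # Alternative test mentioned in the text: E2(A) - E1(A) < Th2.
        return e2 - e1 < th2
    if e1 == 0:
        # Assumption: with no errors at P1 the ratio is undefined, so report "not satisfied".
        return False
    # Equation (3): (E2(A) / E1(A)) * 100 < Th2.
    return e2 / e1 * 100 < th2


# Error values in percent, e.g. E1(A) = 5.0 at P1 and E2(A) = 0.2 at P2.
strong_drop = satisfies_second_relation(5.0, 0.2)   # ratio is about 4 %, below Th2 = 10
weak_drop = satisfies_second_relation(5.0, 3.0)     # ratio is 60 %, not below Th2
```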

When it is determined that lowering the word appearance probability of the additional word A makes spurious insertion errors of A unlikely in speech recognition processing (that is, the second relationship is satisfied), the word appearance probability of A needs to be smaller than the large appearance probability L. The appearance probability output unit 14 therefore outputs, as the word appearance probability of A, the first medium appearance probability M1 (for example, 0.01), a value less than the large appearance probability L (step S16). The first medium appearance probability M1 is preferably equal to or greater than the second appearance probability P2.

Although not shown in FIG. 4, if the determination unit 15 determines in step S12 that E2(A) / E1(A) ≧ Th2 (the second relationship is not satisfied), the processing may end at that point or may proceed to the processing described in the third and fourth embodiments (step S28 or step S20).

In this way, when it is determined that lowering the word appearance probability of the additional word A makes spurious insertion errors of A unlikely, the word adding device of the second embodiment sets the first medium appearance probability M1, which is less than the large appearance probability L, as the word appearance probability, and can thereby lower the error rate of A.

The word adding device 100 of the third embodiment performs the following processing when the determination unit 15 determines in step S12, described in the second embodiment, that E2(A) / E1(A) ≧ Th2 (the second relationship is not satisfied). E2(A) / E1(A) ≧ Th2 means that spurious insertion errors of the additional word A still tend to occur even when the word appearance probability is reduced from the first appearance probability P1 to the second appearance probability P2. In this case the word appearance probability of A must be made smaller still than the first medium appearance probability M1. The appearance probability output unit 14 therefore outputs, as the word appearance probability of A, the small appearance probability S, a value less than the first medium appearance probability M1 (step S28).

With the word adding device 100 of the third embodiment, when spurious insertion errors of the additional word A still tend to occur even after its word appearance probability has been reduced, the word appearance probability of A is made smaller still, so that spurious insertion errors become less likely.
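The decision flow of the first to third embodiments (steps S4, S6, S10, S12, S16, and S28) can be summarized in the following Python sketch. The callback `compute_e2`, which stands for rerunning recognition with the second appearance probability P2, is a hypothetical interface. The default values follow the examples in the text where they exist (Th1 = 1, Th2 = 10, L = 0.1, M1 = 0.01); the value 0.001 for the small appearance probability S and the guard against E1(A) = 0 are assumptions of this sketch.

```python
from typing import Callable


def choose_appearance_probability(
    e1: float,
    compute_e2: Callable[[], float],
    th1: float = 1.0,
    th2: float = 10.0,
    large: float = 0.1,
    medium: float = 0.01,
    small: float = 0.001,
) -> float:
    """Pick the word appearance probability of the additional word A."""
    # Step S4: first relationship, equation (2).
    if e1 < th1:
        return large                      # step S6: few spurious insertions even at P1
    # Step S10: only now is the costlier second recognition pass needed.
    e2 = compute_e2()
    # Step S12: second relationship, equation (3); e1 > 0 guards the division.
    if e1 > 0 and e2 / e1 * 100 < th2:
        return medium                     # step S16: lowering the probability helps
    return small                          # step S28: errors persist, go lower still


p_few_errors = choose_appearance_probability(0.5, lambda: 0.0)
p_errors_drop = choose_appearance_probability(5.0, lambda: 0.2)
p_errors_persist = choose_appearance_probability(5.0, lambda: 3.0)
```

Passing the second error value as a callback keeps the second recognition run from being executed when the first relationship already holds.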

The word adding device 200 of the fourth embodiment performs further processing between step S12 and step S28 described in the third embodiment when the determination unit 15 determines in step S12 that E2(A) / E1(A) ≧ Th2. In FIG. 4 this processing is steps S20, S22, and S24, drawn with broken lines. The word adding device 200 uses the concept of the correct answer value of the additional word A to fine-tune the word appearance probability further than the first to third embodiments do. When lowering the word appearance probability of A (changing it from the first appearance probability P1 to the second appearance probability P2) tends to make correct answers of A less likely, the word appearance probability of A is not lowered too far and is instead set to a somewhat larger value. This is described in detail below.

The word adding device 200 of Embodiment 4 differs from the word adding device 100 described in Embodiments 1 to 3 in that it further includes a correct value calculation unit 8 (drawn with a broken line in FIG. 3).

The correct value calculation unit 8 obtains a first correct value C1(A) of the additional word A using the evaluation speech database and the language dictionary obtained by adding the additional word A, associated with the first appearance probability, to the pre-addition language dictionary (the first post-addition language dictionary). The procedure is the same as for the first error value E1(A) and the second error value E2(A), but is described here for completeness. The speech recognition unit 4 performs speech recognition on the speech data α stored in the evaluation speech database storage unit 22, using the language model in the language model storage unit 18 (which includes the first post-addition language dictionary) and the acoustic model in the acoustic model storage unit 20, and generates speech text data γ. The correct value calculation unit 8 then compares the transcript β in the evaluation speech database storage unit 22 with the speech text data γ and counts the number of correct recognitions of the additional word A. The first correct value may be the counted number of correct recognitions itself, (number of correct recognitions / total number of words in the evaluation speech database) * 100, or (number of correct recognitions / number of occurrences of A in the evaluation speech database that should be recognized) * 100, among others. The same applies to the second correct value C2(A): the correct value calculation unit 8 obtains it in the same way, using the evaluation speech database and the second post-addition language dictionary (the language dictionary obtained by adding the additional word A, associated with the second appearance probability, to the pre-addition language dictionary) (step S20). Note that the evaluation speech database (speech data α and transcript data β) should contain as many occurrences of the additional word A as possible, because otherwise the correct value calculation unit 8 cannot count correct recognitions. The word adding device 100 described in Embodiments 1 to 3 does not use correct values, so its evaluation speech database need not contain the additional word A.
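As a rough illustration of the three ways of expressing the correct value described above, the following Python sketch counts correct recognitions of an additional word from pairs of transcript β and recognized text γ. The function names, the `mode` strings, and the per-utterance counting rule (the smaller of the word's occurrence counts in the transcript and in the recognition result) are assumptions made for this example; the description does not specify how the comparison between β and γ is aligned.

```python
def count_correct(reference_words, recognized_words, word):
    """Count correct recognitions of `word` in one utterance.

    Simplifying assumption: an occurrence is counted as correct when the word
    appears in both the transcript and the recognition result, up to the
    smaller of the two occurrence counts (no word-level alignment is done).
    """
    return min(reference_words.count(word), recognized_words.count(word))


def correct_value(pairs, word, mode="count"):
    """Compute a correct value for `word` over an evaluation set.

    pairs: list of (reference_words, recognized_words) tuples, one per utterance.
    mode:  "count"           -> raw number of correct recognitions
           "per_total_words" -> (correct / total words in transcripts) * 100
           "per_expected"    -> (correct / occurrences of `word` in transcripts) * 100
    """
    correct = sum(count_correct(ref, hyp, word) for ref, hyp in pairs)
    if mode == "count":
        return correct
    if mode == "per_total_words":
        total = sum(len(ref) for ref, _ in pairs)
        return 100.0 * correct / total if total else 0.0
    if mode == "per_expected":
        expected = sum(ref.count(word) for ref, _ in pairs)
        return 100.0 * correct / expected if expected else 0.0
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical evaluation data: the second utterance misrecognizes "tofu" as "kofu".
pairs = [
    (["tofu", "wo", "tabetai"], ["tofu", "wo", "tabetai"]),
    (["tofu", "wa", "oishii"], ["kofu", "wa", "oishii"]),
]
c_count = correct_value(pairs, "tofu", "count")
c_expected = correct_value(pairs, "tofu", "per_expected")
```

The `per_expected` form returns 0 when the word never occurs in the transcripts, which mirrors the remark that the evaluation speech database needs to contain the additional word for a correct value to be meaningful.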

The determination unit 15 then uses the first correct value C1(A) and the second correct value C2(A) to determine whether lowering the word appearance probability of the additional word A tends to prevent A from being recognized correctly in speech recognition processing. Meeting this condition is referred to below as "satisfying the third relationship".

In addition to the first correct value C1(A) and the second correct value C2(A), the determination by the determination unit 15 uses a third threshold Th3. Whether the third relationship is satisfied can be determined, for example, by checking whether the following formula holds:
(C2(A) / C1(A)) * 100 < Th3    (4)
Here, Th3 is a constant of at least 0 and less than 100, for example 10 (%).

As described above, the first correct value C1(A) and the second correct value C2(A) in formula (4) are obtained with the first post-addition language dictionary, to which the first appearance probability P1 is assigned, and with the second post-addition language dictionary, to which the second appearance probability P2 is assigned, respectively. Since P1 > P2, satisfying formula (4) means that lowering the word appearance probability of the additional word A tends to prevent A from being recognized correctly. Alternatively, the determination unit 15 may determine, for example, whether C2(A) − C1(A) < Th3 is satisfied.
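The check of formula (4) can be sketched in a few lines of Python. The function name and the handling of C1(A) = 0 are assumptions made for this example; the description does not say what to do when no correct recognitions are obtained at P1, so the sketch simply treats the third relationship as not satisfied in that case.

```python
def satisfies_third_relationship(c1, c2, th3=10.0):
    """Return True when (C2(A) / C1(A)) * 100 < Th3, i.e. formula (4) holds.

    c1:  first correct value C1(A), measured with appearance probability P1
    c2:  second correct value C2(A), measured with the lower probability P2
    th3: third threshold in percent (0 <= th3 < 100); 10 is the example value
         given in the text
    """
    if c1 <= 0:
        # Assumption: the ratio is undefined, so report "not satisfied".
        return False
    return (c2 / c1) * 100.0 < th3


# Correct recognitions drop from 20 to 1 when the probability is lowered:
# (1 / 20) * 100 = 5 < 10, so lowering the probability hurts recognition of A.
drops_sharply = satisfies_third_relationship(20, 1)
# Correct recognitions drop only from 20 to 18: 90 >= 10, not satisfied.
barely_changes = satisfies_third_relationship(20, 18)
```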

When it is determined that lowering the word appearance probability of the additional word A tends to prevent A from being recognized correctly (the third relationship is satisfied), lowering the word appearance probability of A is not appropriate. The appearance probability output unit 14 therefore does not lower the word appearance probability of the additional word to the small appearance probability S, and instead outputs a medium appearance probability M2 that is less than the large appearance probability L (step S24). M2 is preferably greater than the small appearance probability S, and M2 may be equal to M1 above.

If it is instead determined that the additional word A still tends to be recognized correctly when its word appearance probability is lowered (the third relationship is not satisfied), the processing may end at that point (not shown in FIG. 4). Alternatively, because step S12 has already determined that the second relationship is not satisfied (that is, insertion errors of the additional word remain likely in speech recognition processing even when its word appearance probability is lowered), the processing may proceed to step S28 and reduce the word appearance probability further (that is, the appearance probability output unit 14 outputs the small appearance probability S).
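The decision order of Embodiment 4 (steps S4, S12, S22, then S24 or S28) can be summarized with the following Python sketch. It is an illustration under stated assumptions rather than the claimed implementation: the first relationship is taken to be E1(A) < Th1 and the second to be E2(A)/E1(A) < Th2 as a plain ratio, the function returns labels ("L", "M1", "M2", "S") instead of concrete probabilities, and when the third relationship is not satisfied it takes the branch that proceeds to step S28.

```python
def select_label_embodiment4(e1, e2, c1, c2, th1, th2, th3):
    """Choose which appearance probability to output for additional word A.

    e1, e2: first and second error values E1(A), E2(A)
    c1, c2: first and second correct values C1(A), C2(A)
    th1, th2, th3: thresholds Th1, Th2 (ratio), Th3 (percent)
    Returns one of "L", "M1", "M2", "S".
    """
    # Step S4: first relationship (insertion errors tend not to occur).
    if e1 < th1:
        return "L"
    # Step S12: second relationship (lowering the probability suppresses errors).
    # The e1 > 0 guard avoids division by zero; this guard is an assumption.
    if e1 > 0 and e2 / e1 < th2:
        return "M1"
    # Step S22: third relationship, formula (4) (lowering the probability
    # also suppresses correct recognitions, so do not go down to S).
    if c1 > 0 and (c2 / c1) * 100.0 < th3:
        return "M2"  # step S24
    # Step S28: errors persist and correct recognitions are unaffected.
    return "S"


# Hypothetical thresholds and measurements.
th = dict(th1=5, th2=0.5, th3=10)
few_errors = select_label_embodiment4(2, 2, 20, 20, **th)
errors_suppressed = select_label_embodiment4(10, 3, 20, 1, **th)
correct_lost = select_label_embodiment4(10, 8, 20, 1, **th)
errors_persist = select_label_embodiment4(10, 8, 20, 18, **th)
```

Returning labels keeps the sketch independent of the concrete values of L, M1, M2, and S, which the description constrains only by their ordering (S < M1 < L, M2 < L, and preferably S < M2).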

With the word adding device 200 of Embodiment 4, using correct values makes it possible to set a word appearance probability that raises the correct answer rate for an additional word A whose correct answer rate is low and lowers the error rate for an additional word A whose error rate is high, so the word appearance probability can be tuned more finely than in Embodiments 1 to 3.

[Modification]
Next, a modification of the word adding device 200 is described. FIG. 5 shows the processing flow of the word adding device 200' according to the modification. Compared with the processing flow of the word adding device 200 (see FIG. 4), in the processing flow of the word adding device 200', when step S4 determines, for example, that E1(A) ≥ Th1, the processing of steps S20, S22, and S24 (described in Embodiment 4) is performed, and step S10 is performed after step S22. In other words, whereas the word adding device 200 determines whether E2(A)/E1(A) < Th2 once step S4 has determined that E1(A) ≥ Th1, the word adding device 200' instead determines whether the third relationship of formula (4), (C2(A)/C1(A)) * 100 < Th3, is satisfied. When it is determined that (C2(A)/C1(A)) * 100 ≥ Th3, the processing may end at that point or may proceed to the next step S10.

The configuration of the word adding device 200' also provides the same effects as the word adding device 200.
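The reordered flow of the word adding device 200' can be sketched in Python as follows. The sketch assumes the same forms of the three relationships as in the description (E1(A) < Th1, E2(A)/E1(A) < Th2, and formula (4)) and, when the third relationship is not satisfied, takes the option of proceeding to step S10 rather than ending. The function name and the returned labels are hypothetical.

```python
def select_label_modification(e1, e2, c1, c2, th1, th2, th3):
    """Decision order of the modification: first, then third, then second relationship.

    Returns one of "L", "M1", "M2", "S".
    """
    # Step S4: first relationship.
    if e1 < th1:
        return "L"
    # Steps S20/S22: third relationship is checked before the error ratio.
    if c1 > 0 and (c2 / c1) * 100.0 < th3:
        return "M2"  # step S24
    # Steps S10/S12: second relationship (e1 > 0 guard is an assumption).
    if e1 > 0 and e2 / e1 < th2:
        return "M1"
    # Step S28: neither the second nor the third relationship holds.
    return "S"


# Hypothetical thresholds and measurements.
th = dict(th1=5, th2=0.5, th3=10)
# Correct recognitions collapse when the probability is lowered, so M2 is
# chosen even though the error ratio alone would have selected M1.
prefers_m2 = select_label_modification(10, 3, 20, 1, **th)
falls_to_m1 = select_label_modification(10, 3, 20, 18, **th)
falls_to_s = select_label_modification(10, 8, 20, 18, **th)
```

Checking the third relationship first means that a word whose correct recognitions depend strongly on its probability is protected before the error ratio is even considered, which is the only behavioral difference this sketch shows relative to checking the error ratio first.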

<Hardware configuration>
The present invention is not limited to the embodiments described above. The various processes described above need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed. It goes without saying that other changes may be made as appropriate without departing from the spirit of the present invention.

When the above configuration is implemented by a computer, the processing of the functions that the word adding devices 100, 200, and 200' should have is described by a program. Executing this program on a computer implements those processing functions on the computer.

The program describing this processing can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEPROM (Electrically Erasable Programmable Read-Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the program it has read. As another mode of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program successively each time the program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are implemented solely through execution instructions and retrieval of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In this embodiment the device is configured by executing a predetermined program on a computer, but at least part of this processing may instead be implemented in hardware.

The word adding device described in this embodiment includes a CPU (Central Processing Unit), an input unit, an output unit, an auxiliary storage device, a RAM (Random Access Memory), a ROM (Read Only Memory), and a bus (none of which are shown).

The CPU executes various arithmetic operations according to the programs that have been loaded. The auxiliary storage device is, for example, a hard disk, an MO (Magneto-Optical disc), or a semiconductor memory, and the RAM is, for example, an SRAM (Static Random Access Memory) or a DRAM (Dynamic Random Access Memory). The bus connects the CPU, the input unit, the output unit, the auxiliary storage device, the RAM, and the ROM so that they can communicate with one another.

<Cooperation between hardware and software>
The word adding device of this embodiment is constructed by loading a predetermined program into the hardware described above and having the CPU execute it. The functional configuration of each device constructed in this way is described below.

The input unit 2 and the language dictionary output unit 16 are communication devices, such as a LAN card or a modem, driven under the control of the CPU into which a predetermined program has been loaded. The speech recognition unit 4, the error value calculation unit 6, the correct value calculation unit 8, and the appearance probability output unit 14 are arithmetic units constructed by loading a predetermined program into the CPU and executing it. The evaluation speech database storage unit 22, the language model storage unit 18, and the acoustic model storage unit 20 function as the auxiliary storage device described above.

FIG. 1 is a schematic diagram of a language dictionary.
FIG. 2 is a schematic diagram showing the addition of an unregistered word.
FIG. 3 is a block diagram showing an example of the functional configuration of the word adding device of this embodiment.
FIG. 4 is a diagram showing the processing flow of the word adding device of this embodiment.
FIG. 5 is a diagram showing the processing flow of a modification of the word adding device of this embodiment.

Claims

A word adding device comprising:
an error value calculation unit that obtains a first error value of an additional word using an evaluation speech database and a language dictionary in which the additional word and a first appearance probability of the additional word are added, in association with each other, to a pre-addition language dictionary;
a determination unit that uses the first error value to determine whether errors of the additional word tend not to occur in speech recognition processing (hereinafter referred to as "satisfying the first relationship");
an appearance probability output unit that, when it is determined that the first relationship is satisfied, outputs a large appearance probability, which is greater than a first medium appearance probability, as the word appearance probability of the additional word; and
a language dictionary output unit that outputs a language dictionary in which the additional word and the word appearance probability are added, in association with each other, to the pre-addition language dictionary,
wherein, when it is determined that the first relationship is not satisfied, the error value calculation unit obtains a second error value using the evaluation speech database and a language dictionary in which the additional word and a second appearance probability, which is less than the first appearance probability, are added, in association with each other, to the pre-addition language dictionary,
the determination unit uses the first error value and the second error value to determine whether errors of the additional word tend not to occur in speech recognition processing when the word appearance probability of the additional word is lowered (hereinafter referred to as "satisfying the second relationship"), and
when it is determined that the first relationship is not satisfied and the second relationship is satisfied, the appearance probability output unit outputs the first medium appearance probability, which is less than the large appearance probability, as the word appearance probability of the additional word.

The word adding device according to claim 1,
wherein, when it is determined that neither the first relationship nor the second relationship is satisfied, the appearance probability output unit outputs a small appearance probability, which is less than the first medium appearance probability, as the word appearance probability of the additional word.

The word adding device according to claim 1,
further comprising a correct value calculation unit that obtains a first correct value of the additional word using the evaluation speech database and the language dictionary in which the additional word and the first appearance probability are added, in association with each other, to the pre-addition language dictionary,
and that obtains a second correct value of the additional word using the evaluation speech database and the language dictionary in which the additional word and the second appearance probability are added, in association with each other, to the pre-addition language dictionary,
wherein the determination unit uses the first correct value and the second correct value to determine whether correct recognitions of the additional word tend not to occur in speech recognition processing when the word appearance probability of the additional word is lowered (hereinafter referred to as "satisfying the third relationship"), and
when it is determined that neither the first relationship nor the second relationship is satisfied and the third relationship is satisfied, the appearance probability output unit outputs a second medium appearance probability, which is less than the large appearance probability, as the word appearance probability of the additional word.

The word adding device according to claim 3,
wherein, when it is determined that none of the first relationship, the second relationship, and the third relationship is satisfied, the appearance probability output unit outputs a small appearance probability, which is less than the second medium appearance probability, as the word appearance probability of the additional word.

A word adding device comprising:
an error value calculation unit that obtains a first error value of an additional word using an evaluation speech database and a language dictionary in which the additional word and a first appearance probability of the additional word are added, in association with each other, to a pre-addition language dictionary;
a determination unit that uses the first error value to determine whether errors of the additional word tend not to occur in speech recognition processing (hereinafter referred to as "satisfying the first relationship");
an appearance probability output unit that, when it is determined that the first relationship is satisfied, outputs a large appearance probability, which is greater than a second medium appearance probability, as the word appearance probability of the additional word;
a language dictionary output unit that outputs a language dictionary in which the additional word and the word appearance probability are added, in association with each other, to the pre-addition language dictionary; and
a correct value calculation unit that obtains a first correct value of the additional word using the evaluation speech database and the language dictionary in which the additional word and the first appearance probability are added, in association with each other, to the pre-addition language dictionary,
and that obtains a second correct value of the additional word using the evaluation speech database and a language dictionary in which the additional word and a second appearance probability, which is less than the first appearance probability, are added, in association with each other, to the pre-addition language dictionary,
wherein the determination unit uses the first correct value and the second correct value to determine whether correct recognitions of the additional word tend not to occur in speech recognition processing when the word appearance probability of the additional word is lowered (hereinafter referred to as "satisfying the third relationship"), and
when it is determined that the first relationship is not satisfied and the third relationship is satisfied, the appearance probability output unit outputs the second medium appearance probability, which is less than the large appearance probability, as the word appearance probability of the additional word.

The word adding device according to claim 5,
wherein, when it is determined that the third relationship is not satisfied, the error value calculation unit obtains a second error value using the evaluation speech database and the language dictionary in which the additional word and the second appearance probability are added, in association with each other, to the pre-addition language dictionary,
the determination unit uses the first error value and the second error value to determine whether errors of the additional word tend not to occur in speech recognition processing when the word appearance probability of the additional word is lowered (hereinafter referred to as "satisfying the second relationship"), and
when it is determined that neither the first relationship nor the third relationship is satisfied and the second relationship is satisfied, the appearance probability output unit outputs a first medium appearance probability, which is less than the large appearance probability, as the word appearance probability of the additional word.

The word adding device according to claim 6,
wherein, when it is determined that none of the first relationship, the second relationship, and the third relationship is satisfied, the appearance probability output unit outputs a small appearance probability, which is less than the second medium appearance probability, as the word appearance probability of the additional word.

A word adding method comprising:
a step of obtaining a first error value of an additional word using an evaluation speech database and a language dictionary in which the additional word and a first appearance probability of the additional word are added, in association with each other, to a pre-addition language dictionary;
a step of using the first error value to determine whether errors of the additional word tend not to occur in speech recognition processing (hereinafter referred to as "satisfying the first relationship");
a step of outputting, when it is determined that the first relationship is satisfied, a large appearance probability, which is greater than a first medium appearance probability, as the word appearance probability of the additional word;
a step of obtaining, when it is determined that the first relationship is not satisfied, a second error value using the evaluation speech database and a language dictionary in which the additional word and a second appearance probability, which is less than the first appearance probability, are added, in association with each other, to the pre-addition language dictionary;
a step of using the first error value and the second error value to determine whether errors of the additional word tend not to occur in speech recognition processing when the word appearance probability of the additional word is lowered (hereinafter referred to as "satisfying the second relationship");
a step of outputting, when it is determined that the first relationship is not satisfied and the second relationship is satisfied, the first medium appearance probability, which is less than the large appearance probability, as the word appearance probability of the additional word; and
a step of outputting a language dictionary in which the additional word and the word appearance probability are added, in association with each other, to the pre-addition language dictionary.

A word adding method comprising:
a step of obtaining a first error value of an additional word using an evaluation speech database and a language dictionary in which the additional word and a first appearance probability of the additional word are added, in association with each other, to a pre-addition language dictionary;
a step of using the first error value to determine whether errors of the additional word tend not to occur in speech recognition processing (hereinafter referred to as "satisfying the first relationship");
a step of outputting, when it is determined that the first relationship is satisfied, a large appearance probability, which is greater than a second medium appearance probability, as the word appearance probability of the additional word;
a step of obtaining a first correct value of the additional word using the evaluation speech database and the language dictionary in which the additional word and the first appearance probability are added, in association with each other, to the pre-addition language dictionary;
a step of obtaining a second correct value of the additional word using the evaluation speech database and a language dictionary in which the additional word and a second appearance probability, which is less than the first appearance probability, are added, in association with each other, to the pre-addition language dictionary;
a step of using the first correct value and the second correct value to determine whether correct recognitions of the additional word tend not to occur in speech recognition processing when the word appearance probability of the additional word is lowered (hereinafter referred to as "satisfying the third relationship");
a step of outputting, when it is determined that the first relationship is not satisfied and the third relationship is satisfied, the second medium appearance probability, which is less than the large appearance probability, as the word appearance probability of the additional word; and
a step of outputting a language dictionary in which the additional word and the word appearance probability are added, in association with each other, to the pre-addition language dictionary.

A program for causing a computer to function as the word adding device according to any one of claims 1 to 7.