JP2007226091A - Speech recognizer and speech recognizing program

Info

Publication number: JP2007226091A
Application number: JP2006049729A
Authority: JP
Inventors: Shinichi Honma (本間真一); Toru Imai (今井亨)
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2006-02-27
Filing date: 2006-02-27
Publication date: 2007-09-06
Anticipated expiration: 2026-02-27
Also published as: JP4764203B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition apparatus with improved recognition accuracy for predetermined keywords.
SOLUTION: The speech recognition apparatus comprises storage means 11, which stores a language model 111 in which predetermined identification information is attached to words corresponding to specified keywords, and word string generation means 13, which outputs a word string as the recognition result by searching the language model 111 for the path that gives the maximum probability value for the word string of the input speech. The word string generation means 13 includes bonus granting means 132a, which increases the connection probability value of the words to which the identification information is attached in the language model 111.
COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a speech recognition apparatus and a speech recognition program that recognize input speech using a language model.

In general, speech recognition is performed by searching a language model, which models the occurrence frequencies and connection probabilities of words, for words that are recognition candidates. A well-known example is the language model based on N-grams (the N-gram language model).
For an input word string w_1 w_2 … w_n, the N-gram language model computes the occurrence probability P(w_1 w_2 … w_n) from conditional probabilities, as in equation (1) below. The model is a set of entries, each consisting of a word string and its occurrence probability.
$$P(w_1 w_2 \cdots w_n) = \prod_{i=1}^{n} P(w_i \mid w_{i-N+1} \cdots w_{i-1}) \qquad (1)$$
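The chain-rule product above can be illustrated with a minimal trigram scorer. This is a sketch under stated assumptions: the probability table, the `<s>` padding convention for the first N−1 positions, and the function name `sentence_log_prob` are all hypothetical, and no smoothing or back-off is applied, so an unseen N-gram simply yields a log probability of minus infinity. Summing logarithms instead of multiplying probabilities is the usual way to avoid numerical underflow on long word strings.

```python
import math

# Hypothetical toy trigram table: (w_{i-2}, w_{i-1}, w_i) -> P(w_i | w_{i-2} w_{i-1}).
# "<s>" pads the history of the first words (an assumed convention).
TRIGRAMS = {
    ("<s>", "<s>", "today"): 0.2,
    ("<s>", "today", "kosugi"): 0.01,
    ("today", "kosugi", "speaks"): 0.3,
}

def sentence_log_prob(words, table, n=3):
    """Return log P(w_1 ... w_n) as a sum of log P(w_i | preceding N-1 words)."""
    padded = ["<s>"] * (n - 1) + list(words)
    total = 0.0
    for i in range(n - 1, len(padded)):
        history = tuple(padded[i - n + 1:i])      # the preceding N-1 words
        prob = table.get(history + (padded[i],), 0.0)
        if prob == 0.0:
            return float("-inf")                  # unseen N-gram: no smoothing here
        total += math.log(prob)
    return total

# P = 0.2 * 0.01 * 0.3 = 0.0006
print(math.exp(sentence_log_prob(["today", "kosugi", "speaks"], TRIGRAMS)))
```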

That is, in an N-gram language model, the probability of generating the i-th word w_i depends on the preceding (N−1)-word string w_{i−N+1} … w_{i−2} w_{i−1}. For example, with a 3-gram (N = 3, a trigram), the probability that word w_3 appears after the word string w_1 w_2 is written P(w_3 | w_1 w_2).
Such an N-gram language model cannot recognize unregistered (out-of-vocabulary) words that were not learned in training.
Conventionally, therefore, methods have been proposed that expand the vocabulary of the recognition dictionary to reduce the number of unregistered words (Non-Patent Documents 1 and 2, among others).
Non-Patent Document 1: "Open Vocabulary ASR for Audiovisual Document Indexation," ICASSP 2005, Vol. I, pp. 1013-1016.
Non-Patent Document 2: "Unsupervised Vocabulary Expansion for Automatic Transcription of Broadcast News," ICASSP 2005, Vol. I, pp. 1021-1024.

In general, when the speech of a broadcast program, lecture, or the like is recognized, there are important words, that is, keywords, that are known in advance to need recognition: for example, terms related to the theme of a lecture. There has long been a demand that such keywords be recognized with high accuracy.
However, the conventional techniques merely register previously unregistered words in the recognition dictionary so that they can be recognized. Once registered, such a word is recognized in the same way as any other word; nothing is done to raise the recognition accuracy of the words that serve as keywords.

Moreover, keywords are often proper nouns or technical terms, i.e., unusual words, so their occurrence probabilities are difficult to estimate from past occurrence-frequency information. This has been one factor preventing higher speech recognition accuracy.
The present invention has been made in view of these problems, and its object is to provide a speech recognition apparatus and a speech recognition program with improved recognition accuracy for predetermined keywords.

The present invention was devised to achieve the above object. First, the speech recognition apparatus according to claim 1 recognizes input speech using a language model and comprises language model storage means and word string generation means, the word string generation means having probability value increasing means.

In this configuration, the speech recognition apparatus stores, in the language model storage means, a language model in which predetermined identification information is attached to the words corresponding to specific keywords. This lets the apparatus identify keywords when it uses the language model.
The word string generation means then outputs the recognition result as a word string by searching the language model for the path that maximizes the probability value for the word string of the input speech. Because the word string generation means can identify keywords in the language model by their identification information, the probability value increasing means raises the connection probability value of each word carrying the identification information by adding a predetermined value to it. This increases the probability that a word string containing a keyword is output, and thus raises the keyword recognition rate.
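The bonus step can be sketched as a lookup that adds a fixed value whenever the predicted word carries the keyword mark. Assumptions, none of them from the source: the entries are bigram log10 probabilities (the text says only that a predetermined value is added to the connection probability value, not whether in the log or linear domain), the mark is a leading `!` as in FIG. 2, and the names `BIGRAMS`, `BONUS`, `connection_score`, and `surface` are hypothetical.

```python
# Hypothetical bigram entries: (previous word, word) -> log10 connection probability.
# A leading "!" marks a keyword, mirroring the identification information in FIG. 2.
BIGRAMS = {
    ("<s>", "today"): -0.7,
    ("today", "!kosugi"): -3.0,
    ("today", "kosuge"): -2.5,
}
KEYWORD_MARK = "!"
BONUS = 1.0  # assumed predetermined bonus, in log10 units

def connection_score(prev, word, bigrams, bonus=BONUS):
    """Look up log10 P(word | prev) and add the bonus if the word carries the keyword mark."""
    score = bigrams[(prev, word)]
    if word.startswith(KEYWORD_MARK):
        score += bonus
    return score

def surface(word):
    """Strip the identification mark so the output shows the plain word."""
    return word.lstrip(KEYWORD_MARK)

candidates = ["!kosugi", "kosuge"]
without = max(candidates, key=lambda w: connection_score("today", w, BIGRAMS, bonus=0.0))
best = max(candidates, key=lambda w: connection_score("today", w, BIGRAMS))
print(surface(without), surface(best))  # kosuge kosugi
```

Without the bonus the competitor wins (-2.5 against -3.0); with it the keyword scores -2.0 and is selected.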

The speech recognition apparatus according to claim 2 is the speech recognition apparatus according to claim 1, further comprising keyword extraction means and language model update means.

In this configuration, the keyword extraction means extracts words that serve as keywords from an electronic document. Keywords can be extracted from the document by, for example, the TF-IDF method, which computes a weight (importance score) for each word from the frequency with which the word appears in the document and a measure of how many of all the documents the word appears in.
The language model update means then updates the language model by attaching identification information to the keywords extracted by the keyword extraction means.
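A TF-IDF keyword ranking can be sketched as follows. The weighting used here, raw term count multiplied by log(N / df), is one common variant chosen as an assumption; this section does not fix a formula. The documents are assumed to be already segmented into words, and `tfidf_keywords` and `top_k` are hypothetical names. Words that occur in every document get an IDF of zero and are dropped.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Rank words in each document by TF-IDF and return up to top_k per document.

    docs: list of documents, each a list of words (already segmented).
    tf = raw count of the word in the document; idf = log(N / df), where df is the
    number of documents containing the word (one common variant, assumed here).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # count each word once per document
    results = []
    for doc in docs:
        tf = Counter(doc)
        scores = {w: c * math.log(n_docs / df[w]) for w, c in tf.items()}
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([w for w, s in ranked[:top_k] if s > 0])
    return results

docs = [
    ["the", "lecture", "by", "kosugi", "kosugi"],
    ["the", "weather", "today"],
    ["the", "lecture", "today"],
]
print(tfidf_keywords(docs, top_k=1))  # [['kosugi'], ['weather'], ['lecture']]
```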

Further, the speech recognition apparatus according to claim 3 recognizes input speech using a language model and comprises language model storage means and word string generation means, the word string generation means having probability value increasing means and probability value decreasing means.

In this configuration, the language model storage means stores a language model in which predetermined first identification information is attached to the words corresponding to specific keywords and predetermined second identification information is attached to words similar to those keywords. This lets the apparatus distinguish both the keywords and the words similar to them when it uses the language model.

The word string generation means then outputs the recognition result as a word string by searching the language model for the path that maximizes the probability value for the word string of the input speech. Because it can identify keywords in the language model by the first identification information, the probability value increasing means raises the connection probability value of each word carrying the first identification information by adding a predetermined value. Because it can likewise identify words similar to a keyword by the second identification information, the probability value decreasing means lowers the connection probability value of each word carrying the second identification information by subtracting a predetermined value. The probability of outputting a word string containing a keyword is thus raised directly, while the probability of outputting a word string containing a keyword-like word is suppressed, which raises the keyword-containing word string's probability relatively as well. Together, these raise the keyword recognition rate.
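The relative effect of combining a bonus with a penalty shows up when a bonus alone is too small to overturn a close competitor. In this sketch a positive adjustment plays the role of the first identification information and a negative one the second. The adjustment values, the base scores (standing in for combined language-model and acoustic log10 scores), and the names `ADJUST` and `adjusted_score` are invented for illustration.

```python
# Hypothetical adjustments: positive = bonus for a keyword (first identification
# information); negative = penalty for a keyword-like word (second identification
# information). Units: log10.
ADJUST = {"kosugi": +0.3, "kosuge": -0.4}

def adjusted_score(base_log10, word, adjust):
    """Apply the bonus (+) or penalty (-) attached to a word, if any."""
    return base_log10 + adjust.get(word, 0.0)

# Competing hypotheses for the same stretch of speech: word -> base log10 score.
hyps = {"kosugi": -5.0, "kosuge": -4.5}

plain = max(hyps, key=hyps.get)
bonus_only = max(hyps, key=lambda w: adjusted_score(hyps[w], w, {"kosugi": 0.3}))
bonus_and_penalty = max(hyps, key=lambda w: adjusted_score(hyps[w], w, ADJUST))
print(plain, bonus_only, bonus_and_penalty)  # kosuge kosuge kosugi
```

The bonus alone lifts the keyword to -4.7, still below -4.5. Once the similar word is also penalized to -4.9, the keyword wins.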

The speech recognition apparatus according to claim 4 is the speech recognition apparatus according to claim 3, further comprising pronunciation dictionary storage means, keyword extraction means, phoneme string search means, similar word extraction means, and language model update means.

In this configuration, the pronunciation dictionary storage means stores, in advance, a pronunciation dictionary recording the pronunciation of each word. The keyword extraction means extracts words that serve as keywords from an electronic document.
The phoneme string search means then looks up, in the pronunciation dictionary, the keyword phoneme string, i.e., the phoneme string representing the keyword's pronunciation.
Next, the similar word extraction means extracts words similar to the keyword on the basis of the keyword phoneme string found by the phoneme string search means and the registered-word phoneme strings, which represent the pronunciations of the words registered in the pronunciation dictionary.
Finally, the language model update means updates the language model by attaching the first identification information to the keywords extracted by the keyword extraction means and the second identification information to the words similar to them.
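The lookup-and-compare steps can be sketched with a phoneme-level Levenshtein distance. The section does not say how similarity between phoneme strings is measured, so the edit distance, the threshold `max_distance=2`, the toy dictionary, and the function names are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

# Hypothetical pronunciation dictionary: word -> phoneme sequence.
PRON_DICT = {
    "kosugi": ["k", "o", "s", "u", "g", "i"],
    "kosuge": ["k", "o", "s", "u", "g", "e"],
    "sugi":   ["s", "u", "g", "i"],
    "tanaka": ["t", "a", "n", "a", "k", "a"],
}

def similar_words(keyword, pron_dict, max_distance=2):
    """Return registered words within max_distance phoneme edits of the keyword."""
    key_phones = pron_dict[keyword]            # keyword phoneme string lookup
    return sorted(w for w, phones in pron_dict.items()
                  if w != keyword and edit_distance(key_phones, phones) <= max_distance)

print(similar_words("kosugi", PRON_DICT))  # ['kosuge', 'sugi']
```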

Furthermore, the speech recognition apparatus according to claim 5 is the speech recognition apparatus according to claim 4, wherein the probability value decreasing means varies the amount by which the connection probability value is decreased according to the degree of similarity between the keyword phoneme string and the registered-word phoneme string.

In this configuration, the probability value decreasing means decreases the connection probability value by a larger amount the more similar the keyword phoneme string and the registered-word phoneme string are, which lowers the probability that a word similar to the keyword is recognized.
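A graded penalty only needs a mapping from similarity to penalty that increases monotonically. In this sketch, similarity is taken as one minus the phoneme edit distance divided by the longer string's length, and the penalty scales linearly with it. Both the normalization and the linear scaling, as well as `max_penalty` and `graded_penalty`, are assumptions; the source states only that the decrease grows with similarity.

```python
def graded_penalty(distance, len_keyword, len_word, max_penalty=1.0):
    """Map a phoneme edit distance to a penalty that grows with similarity.

    similarity = 1 - distance / max(len_keyword, len_word), clipped to [0, 1];
    penalty = max_penalty * similarity, so a near-identical word receives almost
    the full penalty and a dissimilar word receives none.
    """
    longest = max(len_keyword, len_word, 1)
    similarity = min(1.0, max(0.0, 1.0 - distance / longest))
    return max_penalty * similarity

# "kosuge" (1 edit from "kosugi") is penalized more than "sugi" (2 edits).
print(graded_penalty(1, 6, 6), graded_penalty(2, 6, 4))
```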

The speech recognition apparatus according to claim 6 recognizes input speech using a language model and comprises: keyword storage means that stores specific keywords and their parts of speech; language model storage means that stores a language model trained with the words corresponding to the keywords replaced by their parts of speech; and word string generation means that outputs the recognition result as a word string by searching the language model for the path that maximizes the probability value for the word string of the input speech. The word string generation means has probability value increasing means that increases a keyword's connection probability value in the language model by computing it from the connection probability value of the part of speech stored for that keyword in the keyword storage means.

In this configuration, the language model stored in the language model storage means is a part-of-speech class language model with respect to the keywords. Consequently, when a keyword's own connection probability value is small, the word string generation means computes it from the connection probability value of the corresponding part of speech, which yields a larger connection probability value.
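A common way to compute a word's probability from a class model is P(w | h) ≈ P(class | h) × P(w | class). The sketch below uses that decomposition as an assumption, since the section does not give the formula. It takes P(w | class) as uniform over the registered keywords sharing the part of speech, keeps the larger of the word-level and class-based estimates, and relies on hypothetical tables and names (`KEYWORDS`, `CLASS_BIGRAMS`, `keyword_prob`).

```python
# Hypothetical keyword storage: keyword -> part of speech.
KEYWORDS = {"kosugi": "PROPER_NOUN", "yamada": "PROPER_NOUN"}

# Hypothetical bigram model trained with keywords replaced by their POS tag:
# (previous token, token) -> P(token | previous token).
CLASS_BIGRAMS = {("speaker", "PROPER_NOUN"): 0.2, ("speaker", "is"): 0.3}

def keyword_prob(prev, word, word_prob):
    """Connection probability of `word` after `prev`.

    For a registered keyword, compute P(POS | prev) * P(word | POS), with
    P(word | POS) uniform over keywords of that POS (an assumption), and keep
    whichever of this and the word-level estimate `word_prob` is larger.
    """
    pos = KEYWORDS.get(word)
    if pos is None:
        return word_prob                      # ordinary word: unchanged
    members = sum(1 for p in KEYWORDS.values() if p == pos)
    class_prob = CLASS_BIGRAMS.get((prev, pos), 0.0) / members
    return max(word_prob, class_prob)

print(keyword_prob("speaker", "kosugi", 1e-5))  # 0.1 instead of 1e-05
```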

Furthermore, the speech recognition program according to claim 7 causes a computer to function as feature extraction means and word string generation means in order to recognize input speech using a language model in which predetermined first identification information is attached to the words corresponding to specific keywords and predetermined second identification information is attached to words similar to those keywords. In the language model, the word string generation means increases the connection probability value of the words carrying the first identification information and decreases the connection probability value of the words carrying the second identification information.

In this configuration, the feature extraction means analyzes the input speech and extracts feature quantities. The word string generation means then uses these feature quantities to search the language model for the path that maximizes the probability value for the word string of the input speech, and outputs the resulting word string as the recognition result. Because keywords can be identified in the language model by the first identification information, the probability value increasing means raises the connection probability value of each word carrying the first identification information by adding a predetermined value. Because words similar to a keyword can be identified by the second identification information, the probability value decreasing means lowers the connection probability value of each word carrying the second identification information by subtracting a predetermined value. The probability of outputting a word string containing a keyword is thus raised directly, while that of word strings containing keyword-like words is suppressed, which raises the keyword recognition rate.

The present invention provides the following advantageous effects.
According to the invention of claim 1, speech recognition can be performed with higher accuracy for keywords registered in advance. For example, when a broadcast program or lecture uses its own particular keywords, recognition accuracy for those words can be raised even if they are normally recognized poorly, making it possible to tailor speech recognition to each program or lecture.

According to the invention of claim 2, keywords can be extracted from electronic documents, so they can be registered without manual intervention.
Moreover, because the invention merely attaches identification information to the keywords, the apparatus can easily be returned to its original recognition behavior by deleting that information. For example, to raise the recognition accuracy of different keywords for the script of a different broadcast program, one need only delete the previous identification information and register the keywords from the new script.

According to the invention of claim 3 or claim 7, the connection probability values of words similar to the keywords can be reduced, so the keywords become relatively more likely to be recognized. Speech recognition can therefore be performed with higher accuracy for keywords registered in advance.
According to the invention of claim 4, words similar to the keywords can be extracted from electronic documents, so the information identifying those similar words can be registered without manual intervention.
Again, because the invention merely attaches identification information, the apparatus can easily be returned to its original recognition behavior by deleting that information.

According to the invention of claim 5, the more similar a word is to a keyword, the more its connection probability value is reduced, so keyword recognition accuracy can be raised relatively.
According to the invention of claim 6, a keyword's connection probability value is computed with a part-of-speech class language model, which can give a higher value than computing it from word statistics alone, thereby improving keyword recognition accuracy.

Embodiments of the present invention are described below with reference to the drawings.
[First embodiment]
(Configuration of the speech recognition apparatus)
First, the configuration of the speech recognition apparatus according to the first embodiment is described with reference to FIG. 1, a block diagram showing that configuration. The speech recognition apparatus 1 shown in FIG. 1 recognizes an input speech signal with improved recognition accuracy for specific keywords. Here, the speech recognition apparatus 1 comprises a recognition unit 10 and an update unit 20.

The recognition unit 10 recognizes a speech signal on the basis of a language model, an acoustic model, and a pronunciation dictionary. Here, the recognition unit 10 comprises storage means 11, feature extraction means 12, and word string generation means 13.

The storage means (language model storage means, pronunciation dictionary storage means) 11 stores the language model 111, the acoustic model 112, and the pronunciation dictionary 113, and is an ordinary storage device such as a hard disk.
The language model 111 models the occurrence frequencies, connection probabilities, and so on of output sequences (words, morphemes, phonemes, etc.) learned from a large amount of speech data. A standard N-gram language model, for example, can be used.

The language model 111 is generated by the update unit 20, described later, with information identifying specific keywords attached to it. The contents of the language model 111 are now described concretely with reference to FIG. 2, a data structure diagram showing the contents of a language model according to the present invention.

As shown in FIG. 2, the language model 111 is data describing, from "sentence start" to "sentence end", the probability that each word constituting a sentence connects to the following word and the probability that a word appears at the start of a sentence. The language model 111 is based on an ordinary N-gram language model, extended so that specific keywords can be identified.
Specifically, in the language model 111, a keyword ("コスギ" (Kosugi) in FIG. 2) is given identification information specifying it as a keyword (the character "!" in FIG. 2), and an area BA is provided that stores, as a bonus value, a probability value to be added to the probability value with which that keyword is connected.
When a word's connection probability value is calculated, adding this bonus value raises the accuracy with which the keyword is recognized.

To raise keyword recognition accuracy further, the language model 111 also provides an area PA that stores, as a penalty value, a probability value to be subtracted from the connection probability value of each word whose pronunciation is similar to a keyword. In the example of FIG. 2, bonus and penalty values are stored in the same area and are distinguished by the signs "+" and "−".
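The extended entry described above can be modelled as a record holding the probability, the N-gram, a keyword flag, and a signed adjustment field standing in for areas BA/PA. The text line format parsed here, `log10prob word ... [+x|-x]`, is hypothetical. It is not the ARPA back-off format, whose optional trailing field is a back-off weight, and it is not necessarily the layout of FIG. 2.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """One hypothetical language-model entry extended in the spirit of FIG. 2."""
    log10_prob: float
    words: tuple        # the N-gram, with identification marks stripped
    is_keyword: bool    # first identification information ("!" on the predicted word)
    adjustment: float   # areas BA/PA: +bonus for a keyword, -penalty for a similar word

def parse_entry(line, mark="!"):
    """Parse 'log10prob word ... [+x|-x]'; a trailing signed field is the bonus/penalty."""
    fields = line.split()
    log10_prob = float(fields[0])
    rest = fields[1:]
    adjustment = 0.0
    if rest and rest[-1][0] in "+-":
        adjustment = float(rest.pop())
    is_keyword = bool(rest) and rest[-1].startswith(mark)
    words = tuple(w.lstrip(mark) for w in rest)
    return Entry(log10_prob, words, is_keyword, adjustment)

def effective_score(entry):
    """Connection score with the entry's bonus or penalty applied."""
    return entry.log10_prob + entry.adjustment

e = parse_entry("-3.0 today !kosugi +0.5")
print(e.words, e.is_keyword, effective_score(e))  # ('today', 'kosugi') True -2.5
```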

Here, a character such as "!" serves as the identification information identifying a keyword (first identification information), but the sign of the bonus value, such as "+", may instead be regarded as the identification information. Likewise, the sign of the penalty value, such as "−", serves here as second identification information indicating a word similar to a keyword.
Generation of the language model shown in FIG. 2 is explained in the description of the update unit 20 below.
Returning to FIG. 1, the description of the configuration of the speech recognition apparatus 1 continues.

The acoustic model 112 uses hidden Markov models to model the feature quantities of each phoneme, learned in advance from a large amount of speech data. A single acoustic model may be used, or multiple models may be used, one per acoustic type (for example, one per speaker).
The pronunciation dictionary 113 gives, for each word, the sequence of consonants and vowels representing its pronunciation. The pronunciations of many words are registered in the pronunciation dictionary 113 in advance.

The feature extraction means 12 analyzes externally input speech (a speech signal) and extracts its feature quantities.
Specifically, the feature extraction means 12 applies a window function (such as a Hamming window) to the waveform of the input speech to extract framed waveforms, and performs frequency analysis on them to extract various feature quantities. For example, cepstral coefficients, obtained by taking the inverse Fourier transform of the logarithm of a frame's power spectrum, can serve as feature quantities. Besides cepstral coefficients, common speech features such as mel-frequency cepstral coefficients (MFCC), LPC (linear predictive coding) coefficients, and log power can be used.
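The cepstrum computation described above (window, power spectrum, logarithm, inverse transform) can be sketched for a single frame in pure Python. This is a minimal sketch: it uses a naive O(n²) DFT so that it needs no third-party packages, omits framing, frame shift, and pre-emphasis, and adds a small `eps` to avoid log(0). A real front end would use an FFT library such as `numpy.fft`. All function names are illustrative.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for one short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse DFT, scaled by 1/n."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def log_power_spectrum(frame, eps=1e-12):
    """Hamming-window the frame and return log |X_k|^2 for each frequency bin."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    return [math.log(abs(c) ** 2 + eps) for c in dft(windowed)]

def real_cepstrum(frame):
    """Cepstral coefficients: inverse DFT of the log power spectrum (real part)."""
    return [c.real for c in idft(log_power_spectrum(frame))]

frame = [math.sin(2 * math.pi * 5 * t / 64) + 0.1 for t in range(64)]
ceps = real_cepstrum(frame)
print(len(ceps), ceps[:3])
```

The zeroth coefficient equals the mean of the log power spectrum, which is a quick sanity check on the inverse transform.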

On the basis of the language model 111, the acoustic model 112, and the pronunciation dictionary 113 stored in the storage means 11, the word string generation means 13 generates, from the feature quantities extracted by the feature extraction means 12, the word string that forms the speech recognition result. Here, the word string generation means 13 comprises acoustic similarity calculation means 131 and search means 132.

The acoustic similarity calculation means 131 calculates the similarity (probability value) between the feature quantities, which are extracted by the feature extraction means 12 and input in time sequence, and the phonemes modeled by the acoustic model 112 stored in the storage means 11.

The search means 132 searches the language model 111 for candidate output sequences that can be connected, and outputs the output sequence with the maximum probability value as the recognition result (recognized word string) for the input speech. Here, the search means 132 comprises bonus granting means 132a and penalty granting means 132b.

When the probability value of an output sequence is calculated, the bonus granting means (probability value increasing means) 132a adds a predetermined probability value, as a bonus value, to the probability of each word registered as a keyword in the language model 111. Specifically, it recognizes keyword words in the language model 111 described with FIG. 2 by their identification information (the character "!" in FIG. 2) and adds the bonus value attached to each keyword to the original probability value. This raises the keyword's connection probability and thus the accuracy with which the keyword is recognized.

When the connection probability of an output sequence is calculated, the penalty granting means (probability value decreasing means) 132b subtracts a predetermined probability value, as a penalty value, from the probability of each word registered in the language model 111 as similar to a keyword. This lowers the connection probability of words similar to the keyword and so raises, relatively, the accuracy with which the keyword is recognized.

Note that the search means 132 accumulates the connection probability from the language model 111 for each word. It also accumulates the similarity (probability value) based on the acoustic model 112, computed by the acoustic similarity calculation means 131.

The update unit 20 updates the language model 111 used by the recognition unit 10 so that the recognition accuracy for specific keywords increases. Here, the update unit 20 includes:

- a keyword extraction means 21,
- a storage means 22,
- a phoneme string search means 23,
- a similar word extraction means 24, and
- a language model update means 25.

The keyword extraction means 21 extracts keywords from an electronic document that contains them, such as the script or rundown sheet of a broadcast program. Here, the keyword extraction means 21 weights the words in the document and identifies keyword words based on those weights (importance scores).

The keyword extraction method performed by the keyword extraction means 21 will now be described with reference to FIG. 3 (and to FIG. 1 as appropriate). FIG. 3 is an explanatory diagram illustrating the keyword extraction method.
As shown in FIG. 3, N documents d containing keywords are input, and keywords are extracted from them. The symbol "△" in document d marks word boundaries. The input document d may already be segmented into words. Alternatively, the keyword extraction means 21 may segment it into words by performing morphological analysis.

In general, important words that serve as keywords appear many times within a document d. However, a word that appears many times in all N documents cannot be said to characterize document d. The keyword extraction means 21 therefore extracts keywords by computing word weights (importance scores) with the TF-IDF method.
In the TF-IDF method, the importance score is the product of two factors:

- TF (term frequency): how often a word appears in a given document.
- IDF (inverse document frequency): a measure based on how many of all the documents contain the word. It is larger when the word appears in fewer documents.

Let tf(t, d) be the frequency of word t in document d, and idf(t) the measure of how many documents word t appears in. The keyword extraction means 21 computes the importance score w_t^d of word t by the following equation (2):

$$w_t^d = tf(t, d) \cdot idf(t) \qquad (2)$$

The measure idf(t) in equation (2) is given by equation (3) in terms of N, the total number of documents, and df(t), the number of documents in which word t appears.

In this way, the keyword extraction means 21 calculates each word's importance score within its document. It extracts as keywords the words whose importance scores exceed a predetermined threshold, and stores the extracted keywords and their importance scores in the storage means 22. The importance score is also used as the bonus value described above.
In the example shown in FIG. 3, the keywords "コスギ" (Kosugi), "空手" (karate), and "武道" (budō, martial arts) are extracted from document d. Their importance scores (bonus values) are 1.000, 0.959, and 0.532, respectively.
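The TF-IDF scoring and threshold step can be sketched as below. This is a minimal sketch under these assumptions:

- The documents are already segmented into words.
- Equation (3) is taken to have the common form idf(t) = log(N / df(t)). The exact form used in the patent is not reproduced in this text.
- The function name `tfidf_keywords` is hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, target, threshold):
    """Return {word: score} for words of docs[target] whose TF-IDF score exceeds threshold.

    docs: list of documents, each already segmented into a list of words.
    score = tf(t, d) * idf(t) as in equation (2); idf(t) = log(N / df(t)) is an
    assumed standard form of equation (3).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))            # count each word at most once per document
    tf = Counter(docs[target])         # raw frequency of each word in the target document
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return {t: s for t, s in scores.items() if s > threshold}
```

The surviving words and their scores correspond to the keywords and bonus values stored in the keyword dictionary 221. A word that occurs in every document gets idf = log(1) = 0, so it never passes a positive threshold.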
Returning to FIG. 1, the description of the configuration of the speech recognition apparatus 1 will be continued.

The storage means (keyword storage means) 22 is a general-purpose storage device such as a hard disk. It stores the keywords extracted by the keyword extraction means 21, together with their importance scores, as a keyword dictionary 221.

For each keyword in the keyword dictionary 221 stored in the storage means 22, the phoneme string search means 23 looks up the phoneme string representing its pronunciation (the keyword phoneme string) in the pronunciation dictionary 113. A keyword phoneme string retrieved from the pronunciation dictionary 113 is a character string combining consonants and vowels. The phoneme string search means 23 outputs each keyword and its keyword phoneme string to the similar word extraction means 24.

The similar word extraction means 24 extracts words similar (including identical) to a keyword, called similar words. It compares two kinds of phoneme string:

- the keyword phoneme string found by the phoneme string search means 23, and
- the registered word phoneme strings, which represent the pronunciations of the words registered in the pronunciation dictionary 113.

The similar word extraction means 24 includes a similarity measurement means 241.

The similarity measurement means 241 measures how closely a keyword phoneme string and a registered word phoneme string resemble each other.
Using this similarity, the similar word extraction means 24 extracts as similar words those words whose registered word phoneme strings are similar to the keyword phoneme string.

The method used by the similar word extraction means 24 to extract similar words will now be described with reference to FIG. 4 (and to FIG. 1 as appropriate). FIG. 4 is an explanatory diagram illustrating the similar word extraction method.
As shown in FIG. 4, assume that "武道" (budō, martial arts) and "コスギ" (Kosugi) have already been extracted as keywords.
The similar word extraction means 24 uses the similarity measurement means 241 to measure the similarity between each keyword phoneme string and each registered word phoneme string. The similarity between two phoneme strings can be obtained, for example, as the distance between them computed by DP (dynamic programming) matching.

In the example of FIG. 4:

- The keyword "武道" (budō, martial arts) and the registered word "ぶどう" (budō, grape) share the phoneme string "budo:". The distance between their phoneme strings is 0, so their pronunciations are identical.
- The keyword "コスギ" (Kosugi) and the registered word "小菅" (Kosuge) have the phoneme strings "kosugi" and "kosuge". These differ in only one phoneme ("i" versus "e"), so the distance is 1 and their pronunciations are similar.

The similar word extraction means 24 therefore sets a penalty of larger magnitude the shorter the distance between the phoneme strings. In this example, it sets a penalty value of -1.0 for the registered word "ぶどう" (grape) and -0.5 for the registered word "小菅" (Kosuge).
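The DP matching and penalty assignment can be sketched as follows. This is a minimal sketch with these assumptions:

- DP matching is realized as the plain edit (Levenshtein) distance over phoneme lists.
- The lexicon is a hypothetical dictionary from word to phoneme list.
- The penalty rule -1 / (1 + distance) is invented for illustration. It was chosen only because it reproduces the two values of FIG. 4; the text does not give the patent's actual rule.

```python
def phoneme_distance(a, b):
    """Edit (Levenshtein) distance between two phoneme sequences, computed by DP."""
    prev = list(range(len(b) + 1))            # distances from the empty prefix of a
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            cur.append(min(prev[j] + 1,          # delete pa
                           cur[j - 1] + 1,       # insert pb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def similar_words(keyword, lexicon, max_dist=1):
    """Return {word: penalty} for lexicon words whose pronunciation is near the keyword's.

    lexicon: hypothetical pronunciation dictionary {word: list of phonemes}.
    The rule -1 / (1 + distance) is illustrative, not taken from the patent.
    """
    key_phones = lexicon[keyword]
    result = {}
    for word, phones in lexicon.items():
        if word == keyword:
            continue                          # never penalize the keyword itself
        d = phoneme_distance(key_phones, phones)
        if d <= max_dist:                     # "similar (including identical)"
            result[word] = -1.0 / (1 + d)
    return result
```

Suppose 武道 and ぶどう both map to `["b", "u", "d", "o:"]`, and コスギ and 小菅 map to `k o s u g i` and `k o s u g e`. Then the sketch assigns -1.0 to ぶどう and -0.5 to 小菅, as in FIG. 4.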
Returning to FIG. 1, the description of the configuration of the speech recognition apparatus 1 will be continued.

The language model update means 25 updates the language model 111 using two inputs:

- the keywords and importance scores (bonus values) registered in the keyword dictionary 221 stored in the storage means 22, and
- the similar words and penalty values extracted by the similar word extraction means 24.

As described with reference to FIG. 2, the language model update means 25 attaches identification information (the character "!" in FIG. 2) to each word in the language model 111 that represents a keyword. It registers in area BA the bonus value to be added to that word's connection probability value. It also searches the language model 111 for the similar words and registers in area PA the penalty value to be subtracted from their connection probability values.

If previously registered keywords exist in the language model 111, the language model update means 25 cancels their registration and deletes the associated bonus and penalty values. A keyword's registration is cancelled simply by deleting its identification information (the character "!" in FIG. 2). As a result, the language model 111 always assigns bonus values only to the keywords whose recognition accuracy is currently to be improved.
Configured in this way, the speech recognition apparatus 1 can extract keywords from an electronic document and perform speech recognition with improved accuracy for those keywords.
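The update step, including the cancellation of earlier registrations, might look like the sketch below. This is a minimal sketch under these assumptions:

- The language model is a hypothetical dictionary of per-word records.
- The `keyword` flag stands in for the "!" marker, and `bonus` and `penalty` stand in for areas BA and PA.
- The penalty is stored as a non-negative amount to subtract.

```python
def update_language_model(lm, keyword_bonus, similar_penalty):
    """Re-register keywords and similar words in a hypothetical dict-based language model.

    lm: {word: {"log_prob": float, "keyword": bool, "bonus": float, "penalty": float}}
    keyword_bonus: {keyword: bonus value, e.g. its importance score}
    similar_penalty: {similar word: non-negative penalty to subtract}
    """
    # Cancel earlier registrations: clear the "!" flag and the BA/PA values.
    for entry in lm.values():
        entry["keyword"] = False
        entry["bonus"] = 0.0
        entry["penalty"] = 0.0
    for word, bonus in keyword_bonus.items():
        if word in lm:                      # only words present in the model
            lm[word]["keyword"] = True      # plays the role of the "!" marker
            lm[word]["bonus"] = bonus       # area BA
    for word, penalty in similar_penalty.items():
        if word in lm:
            lm[word]["penalty"] = penalty   # area PA
    return lm
```

Clearing every entry first means only the current document's keywords carry a bonus. This matches the behaviour described above when the target keywords change.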

The configuration of the speech recognition apparatus 1 according to the first embodiment has been described above, but the present invention is not limited to this configuration. For example, if the keywords are determined in advance, the keyword extraction means 21 may be omitted. The keywords and bonus values may then be registered directly in the keyword dictionary 221.
Here, the language model 111 sets a bonus value for keywords and a penalty value for similar words. A configuration that only sets bonus values for keywords may be used instead. That is, the speech recognition apparatus 1 may omit the penalty granting means 132b, the phoneme string search means 23, and the similar word extraction means 24. In this case, the language model update means 25 updates the language model 111 based only on the keywords and bonus values registered in the keyword dictionary 221 stored in the storage means 22.
The speech recognition apparatus 1 can be operated by a speech recognition program that causes a general-purpose computer to function as each of the means described above.

(Operation of the speech recognition apparatus)
Next, the operation of the speech recognition apparatus will be described with reference to FIGS. 5 and 6 (see FIG. 1 for the configuration). FIG. 5 is a flowchart showing the language model update operation of the speech recognition apparatus according to the first embodiment of the present invention. FIG. 6 is a flowchart showing its speech recognition operation.

[Language model update operation]
As shown in FIG. 5, the speech recognition apparatus 1 first uses the keyword extraction means 21 to extract keywords from an electronic document containing them. It does so by computing word weights with the TF-IDF method; these importance scores are also used as bonus values (step S1).
The keyword extraction means 21 then stores the keywords extracted in step S1 and their bonus values in the storage means 22 as the keyword dictionary 221 (step S2).
Next, the phoneme string search means 23 looks up, in the pronunciation dictionary 113, the phoneme string (keyword phoneme string) representing the pronunciation of each keyword in the keyword dictionary 221 (step S3).

The similar word extraction means 24 then extracts similar words, that is, words similar (including identical) to the keywords, and determines a penalty value for each (step S4). It compares the keyword phoneme strings found in step S3 with the registered word phoneme strings, which represent the pronunciations of the words in the pronunciation dictionary 113.
At this time, the similarity measurement means 241 measures the similarity between each keyword phoneme string and each registered word phoneme string, as the distance between them obtained by DP matching. The similar word extraction means 24 also assigns a larger penalty value to similar words whose phoneme strings are closer to the keyword's.

The language model update means 25 then updates the language model 111. It uses the keywords and bonus values registered in the keyword dictionary 221 stored in step S2, and the similar words and penalty values extracted in step S4.
That is, the language model update means 25 attaches identification information (the character "!" in FIG. 2) to each word in the language model 111 that represents a keyword. It registers the bonus value to be added to that word's connection probability value (step S5).
The language model update means 25 also searches the language model 111 for the similar words and registers the penalty value to be subtracted from their connection probability values (step S6).

Through the above operation, the speech recognition apparatus 1 can extract keywords from an electronic document and generate a new language model in which those keywords are more readily recognized.
When the keywords are to be updated from a different document, the previously registered identification information, bonus values, and penalty values are deleted before step S5. The keywords whose recognition accuracy should be improved may differ from one broadcast program script or rundown sheet to another. This step makes it easy to change the target keywords in that case.

[Speech recognition operation]
Next, as shown in FIG. 6, the speech recognition apparatus 1 uses the search means 132 to search the language model 111 for candidate output sequences of connected words.
At this time, the search means 132 determines whether a candidate word is registered as a keyword (step S11). If it is a keyword (Yes in step S11), the bonus granting means 132a adds the connection probability value plus the bonus value to the probability value of the output sequence (step S12). The process then proceeds to step S16.

The search means 132 also determines whether the candidate word is a similar word, that is, a word similar to a keyword (step S13). If it is a similar word (Yes in step S13), the penalty granting means 132b adds the connection probability value minus the penalty value to the probability value of the output sequence (step S14). The process then proceeds to step S16.
If the candidate word is neither a keyword nor a similar word, the connection probability value set for that word is added to the probability value of the output sequence (step S15).

The feature extraction means 12 also analyzes the input speech (speech signal) to extract speech feature values. The acoustic similarity calculation means 131 then adds the similarity between these features and the pronunciation (phonemes) assigned to the word to the output sequence's probability value (step S16).

The search means 132 then determines whether further connected words follow (step S17). If so (Yes in step S17), the process returns to step S11 and continues accumulating the probability value of the output sequence.
Once the probability values of all candidate output sequences have been calculated, the search means 132 outputs the output sequence with the maximum probability value as the recognized word string (step S18).
Through the above operation, the speech recognition apparatus 1 can perform speech recognition with improved keyword recognition accuracy.
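The branch structure of steps S11 to S18 can be sketched as below. This is a deliberately simplified, hypothetical sketch:

- It scores a small explicit list of candidate word sequences exhaustively, rather than running a real beam or Viterbi decoder.
- It ignores N-gram context, using one log score per word.
- The tuple layout `(log_prob, kind, value)` and the dictionaries are illustrative stand-ins for the language model 111 and the acoustic scores.

```python
def sequence_score(words, lm, acoustic):
    """Accumulate language and acoustic scores for one candidate word sequence.

    lm: {word: (log_prob, kind, value)}, kind in {"keyword", "similar", "plain"};
        value is the bonus (keyword) or the non-negative penalty (similar word).
    acoustic: {word: acoustic log-likelihood}. Both dicts are hypothetical.
    """
    total = 0.0
    for w in words:                     # loop over steps S11 to S17
        log_prob, kind, value = lm[w]
        if kind == "keyword":           # S11 Yes -> S12: add connection score + bonus
            total += log_prob + value
        elif kind == "similar":         # S13 Yes -> S14: add connection score - penalty
            total += log_prob - value
        else:                           # S15: plain connection score
            total += log_prob
        total += acoustic[w]            # S16: acoustic similarity
    return total


def recognize(candidates, lm, acoustic):
    """Step S18: return the candidate sequence with the highest total score."""
    return max(candidates, key=lambda ws: sequence_score(ws, lm, acoustic))
```

In the accompanying test, the plain scores favour "grape" (total -8.0 versus -9.0). With a bonus of 1.0 on the keyword "budo" and a penalty of 1.0 on "grape", the keyword sequence wins with -8.0 against -9.0.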

[Second Embodiment]
(Configuration of the speech recognition apparatus)
Next, the configuration of the speech recognition apparatus according to the second embodiment will be described with reference to FIG. 7. FIG. 7 is a block diagram showing the configuration of the speech recognition apparatus according to the second embodiment of the present invention. The speech recognition apparatus 1B shown in FIG. 7 recognizes an input speech signal with improved recognition accuracy for specific keywords. It consists of a recognition unit 10B and an update unit 20B.

The speech recognition apparatus 1B differs from the speech recognition apparatus 1 of FIG. 1 in two respects:

- the contents of the language model 111B and the keyword dictionary 221B, and
- the functions of the bonus granting means 132Ba, the keyword extraction means 21B, and the language model update means 25B.

The other components are the same as in the speech recognition apparatus 1 of FIG. 1, so they are given the same reference numerals and their description is omitted.
In particular, the speech recognition apparatus 1B uses a different method from the speech recognition apparatus 1 to raise the connection probability of keywords (that is, to grant them a bonus).

The keyword extraction means 21B extracts keywords from an electronic document containing them. It works in the same way as the keyword extraction means 21 of FIG. 1, except that it also identifies each keyword's part of speech by performing morphological analysis.
The keyword extraction means 21B then registers the extracted keywords and their parts of speech in the keyword dictionary 221B of the storage means 22B.

The language model update means 25B updates the language model 111B based on the keywords and parts of speech registered in the keyword dictionary 221B stored in the storage means 22B. It replaces every word in the training text that corresponds to a keyword with a unique character string denoting that keyword's part of speech. It then builds the language model as a part-of-speech class model.

The method by which the language model update means 25B generates the language model 111B will now be described with reference to FIG. 8 (and to FIG. 7 as appropriate). FIG. 8 is an explanatory diagram illustrating the procedure for generating a language model according to the present invention.
First, as shown in FIG. 8(a), assume there is training text for generating the language model. This training text typically consists of several million sentences and is stored in a storage means (not shown).

As shown in FIG. 8(b), assume that tens of thousands of words (the registered vocabulary) have been registered in the language model 111B as a result of training on the text of FIG. 8(a).
Now suppose the out-of-vocabulary words shown in FIG. 8(c) are keywords. Here these are "中教審" (Chūkyōshin, the Central Council for Education) and "こども" (kodomo, children). As shown in FIG. 8(d), the language model update means 25B replaces each word in the training text that corresponds to an out-of-vocabulary word (keyword). The replacement is that keyword's part of speech enclosed in a unique mark, here "$".
FIG. 8(d) shows an example in which "中教審" is replaced with "$固有名詞$" (proper noun) and "こども" with "$一般名詞$" (common noun).

The language model update means 25B then counts the N-grams, as shown in FIG. 8(e). FIG. 8(e) shows the case N = 2.
The language model update means 25B then calculates the connection probability value of each word and retrains the language model 111B, as shown in FIG. 8(f). As a result, a part-of-speech class language model is generated for the keywords.
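The procedure of FIG. 8(d) to FIG. 8(f) can be sketched as a small bigram trainer. This is a minimal sketch under these assumptions:

- Sentences are already segmented into words.
- Sentence-boundary tokens `<s>` and `</s>` are added.
- Probabilities are unsmoothed maximum-likelihood estimates. A production model would use smoothing and backoff, which the text does not detail.

The function name is hypothetical.

```python
from collections import Counter


def class_bigram_model(sentences, keyword_pos):
    """Replace keywords by "$POS$" tokens and estimate bigram probabilities by MLE.

    sentences: list of word lists (the training text of FIG. 8(a)).
    keyword_pos: {keyword: part of speech}, e.g. {"中教審": "固有名詞"}.
    Returns {(w1, w2): P(w2 | w1)} without smoothing (a simplification).
    """
    bigrams, history = Counter(), Counter()
    for sent in sentences:
        # FIG. 8(d): substitute each keyword with its part-of-speech class token
        tokens = ["<s>"] + [f"${keyword_pos[w]}$" if w in keyword_pos else w
                            for w in sent] + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):   # FIG. 8(e): count 2-grams
            bigrams[(w1, w2)] += 1
            history[w1] += 1
    # FIG. 8(f): turn counts into connection probabilities
    return {(w1, w2): c / history[w1] for (w1, w2), c in bigrams.items()}
```

Because every keyword of a class shares one token, the model learns how the class as a whole connects to its neighbours. It does not need training examples of each individual keyword.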
Returning to FIG. 7, the description of the configuration of the speech recognition apparatus 1B will be continued.

The bonus granting means 132Ba calculates the probability value of an output sequence in the language model 111B using each keyword's part of speech, as registered in the keyword dictionary 221B.
When calculating the probability value (language score) of an output sequence, the bonus granting means 132Ba uses a part-of-speech-based language score for each keyword. The word string with the larger language score is output as the recognition result (recognized word string) for the input speech.
Specifically, let w be a keyword, h the (N−1)-gram history immediately preceding w, and C the part-of-speech class of w. The bonus granting means 132Ba estimates the posterior probability P(w|h) that w appears by the following equation (4):

$$P(w \mid h) = P(C \mid h) \cdot P(w \mid C) \qquad (4)$$

That is, the bonus granting means 132Ba calculates the posterior probability P(w|h) that keyword w appears after the word string h (the (N−1)-gram history). It multiplies the probability that part-of-speech class C appears after h by the probability that keyword w appears within class C.
The probability that keyword w appears within class C is the reciprocal of the number of keywords belonging to class C (for example, the proper-noun class).
Suppose the vocabulary of the speech recognition apparatus 1B is designed in advance to exclude, for example, proper nouns, which are generally regarded as special words. The proper-noun class then contains few members, so the reciprocal, and hence the probability value assigned to proper-noun words, becomes large. Keywords are generally limited to a few parts of speech, such as proper nouns. This therefore effectively grants a bonus to the language score of output sequences containing keywords.
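Equation (4), with P(w|C) taken as the reciprocal of the number of keywords in the class, can be written out as below. This is a minimal sketch: P(C|h) is passed in as a number assumed to come from the class N-gram model, and the function name is hypothetical.

```python
def keyword_prob(word, p_class_given_history, keyword_pos):
    """Equation (4): P(w | h) = P(C | h) * P(w | C).

    p_class_given_history: P(C | h), assumed to come from the class N-gram model.
    keyword_pos: {keyword: part of speech}. P(w | C) is 1 / (number of keywords
    in the same class), as the description states.
    """
    pos = keyword_pos[word]
    n_in_class = sum(1 for p in keyword_pos.values() if p == pos)
    return p_class_given_history * (1.0 / n_in_class)
```

Take two proper-noun keywords, with P(C|h) = 0.2 for the proper-noun class. Each keyword then receives 0.2 × 1/2 = 0.1. The fewer keywords a class holds, the larger the share each one receives.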

The configuration of the speech recognition apparatus 1B according to the second embodiment has been described above, but the present invention is not limited to this configuration.
For example, if the keywords are determined in advance, the keyword extraction means 21B may be omitted. The keywords and their parts of speech may then be registered directly in the keyword dictionary 221B. Also, as with the speech recognition apparatus 1 of the first embodiment, the penalty granting means 132b, the phoneme string search means 23, and the similar word extraction means 24 may be omitted.
The speech recognition apparatus 1B can be operated by a speech recognition program that causes a general-purpose computer to function as each of the means described above.

(Operation of the speech recognition apparatus)
Next, the operation of the speech recognition apparatus will be described with reference to FIGS. 9 and 10 (see FIG. 1 for the configuration). FIG. 9 is a flowchart showing the language model update operation of the speech recognition apparatus according to the second embodiment of the present invention. FIG. 10 is a flowchart showing its speech recognition operation.

[Language model update operation]
As shown in FIG. 9, the speech recognition apparatus 1B first uses the keyword extraction means 21B to extract keywords and their parts of speech from an electronic document containing them (step S21).
The keyword extraction means 21B then stores the keywords extracted in step S21 and their parts of speech in the storage means 22 as the keyword dictionary 221B (step S22).
Next, the phoneme string search means 23 looks up, in the pronunciation dictionary 113, the phoneme string (keyword phoneme string) representing the pronunciation of each keyword in the keyword dictionary 221B (step S23).

The similar word extraction means 24 then extracts similar words, that is, words similar (including identical) to the keywords, and sets a penalty value for each (step S24). It compares the keyword phoneme strings found in step S23 with the registered word phoneme strings, which represent the pronunciations of the words in the pronunciation dictionary 113.

The speech recognition apparatus 1B then uses the language model update unit 25B to update the language model 111B based on the keywords and parts of speech registered in the keyword dictionary 221B stored in step S22 and on the similar words and penalty values extracted in step S24. Specifically, the language model update unit 25B replaces each word in the training text that corresponds to a keyword with a unique character string representing that keyword's part of speech, and then trains on the text to generate a part-of-speech class language model (step S25).
The language model update unit 25B also searches the language model 111B for the similar words and registers, for each one, a penalty value to be subtracted from its connection probability value (step S26).
Through the above operation, the speech recognition apparatus 1B extracts keywords from a digitized document and generates a language model in which those keywords are modeled by their parts of speech.

[Speech recognition operation]
Next, as shown in FIG. 10, the speech recognition apparatus 1B uses the search unit 132B to search the language model 111B for candidate output sequences of connected words.
During this search, the search unit 132B determines whether the candidate word is registered as a keyword (step S31). If it is (Yes in step S31), the bonus granting unit 132Ba adds the probability value of the part of speech corresponding to the keyword to the probability value of the output sequence (step S32), and the process proceeds to step S36.

Otherwise, the search unit 132B determines whether the candidate word is a similar word, that is, a word similar to a keyword (step S33). If it is (Yes in step S33), the connection probability value minus the penalty value is added to the probability value of the output sequence (step S34), and the process proceeds to step S36.
On the other hand, if the candidate word is neither a keyword nor a similar word, the connection probability value set for the word is added to the probability value of the output sequence (step S35).
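The branching in steps S31 to S35 can be sketched as a per-word language score. The sketch makes several assumptions. Probabilities are in the log domain, so "adding" probability values means summing log probabilities. The model is the class bigram from step S25, using a hypothetical `<POS>` token format. A hypothetical fixed floor `floor_logp` stands in for proper back-off on unseen bigrams.

```python
from typing import Dict, Tuple


def lm_score(
    prev: str,
    word: str,
    bigram_logp: Dict[Tuple[str, str], float],
    keyword_pos: Dict[str, str],
    penalties: Dict[str, float],
    floor_logp: float = -20.0,  # hypothetical stand-in for back-off
) -> float:
    """Log-domain language score for appending `word` after `prev`."""
    # In a class model, a keyword in the history is also seen as its class token.
    prev_tok = f"<{keyword_pos[prev]}>" if prev in keyword_pos else prev
    if word in keyword_pos:                                 # step S31: keyword?
        cls = f"<{keyword_pos[word]}>"
        return bigram_logp.get((prev_tok, cls), floor_logp)  # step S32: class score
    base = bigram_logp.get((prev_tok, word), floor_logp)
    if word in penalties:                                   # step S33: similar word?
        return base - penalties[word]                       # step S34: subtract penalty
    return base                                             # step S35: unchanged
```

With the made-up values used in the test, the keyword "Tanaka" receives the class score -0.5, while the similar word "Tanako" receives -1.0 - 1.5 = -2.5.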

In addition, the speech recognition apparatus 1B uses the feature extraction unit 12 to analyze the input speech (speech signal) and extract its features. It then uses the acoustic similarity calculation unit 131 to add the similarity between those features and the pronunciation (phonemes) assigned to each word to the output sequence (step S36).

The search unit 132B then determines whether further words are connected (step S37). If so (Yes in step S37), the process returns to step S31 and continues accumulating the probability value of the output sequence.
Once the probability values of all candidate output sequences have been calculated, the search unit 132B outputs the sequence with the maximum probability value as the recognized word string (step S38).
With the above operation, the speech recognition apparatus 1B can perform speech recognition with improved keyword recognition accuracy.
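Steps S36 to S38 combine the language and acoustic scores and keep the best hypothesis. A real decoder expands hypotheses incrementally, for example with a beam or Viterbi search, and aligns words to time frames. The toy below makes two simplifying assumptions: it enumerates a given list of candidate word sequences, and it takes a precomputed per-word acoustic log score. The language scorer is passed in as a callable, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def recognize(
    candidates: Sequence[List[str]],
    lm_score: Callable[[str, str], float],
    acoustic_score: Callable[[str], float],
) -> Tuple[List[str], float]:
    """Return the candidate word sequence with the highest total log score."""
    best_seq: List[str] = []
    best_total = float("-inf")
    for seq in candidates:
        total = 0.0
        prev = "<s>"
        for word in seq:                    # continue while words remain (step S37)
            total += lm_score(prev, word)   # steps S31-S35: language score
            total += acoustic_score(word)   # step S36: acoustic similarity
            prev = word
        if total > best_total:              # step S38: keep the maximum
            best_seq, best_total = list(seq), total
    return best_seq, best_total
```

In the test values (all made up), the similar word "Tanako" has the better acoustic score (-3.5 vs -4.0). However, the keyword's class score and the similar word's penalty give "minister Tanaka" a total of -8.5 against -10.0, so the keyword wins.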

FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus according to the first embodiment of the present invention.
FIG. 2 is a data structure diagram showing the contents of the language model according to the present invention.
FIG. 3 is an explanatory diagram of the keyword extraction method.
FIG. 4 is an explanatory diagram of the similar word extraction method.
FIG. 5 is a flowchart showing the language model update operation of the speech recognition apparatus according to the first embodiment of the present invention.
FIG. 6 is a flowchart showing the speech recognition operation of the speech recognition apparatus according to the first embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of the speech recognition apparatus according to the second embodiment of the present invention.
FIG. 8 is an explanatory diagram of the procedure for generating the language model according to the present invention.
FIG. 9 is a flowchart showing the language model update operation of the speech recognition apparatus according to the second embodiment of the present invention.
FIG. 10 is a flowchart showing the speech recognition operation of the speech recognition apparatus according to the second embodiment of the present invention.

Explanation of symbols

1 Speech recognition apparatus
11 Storage means (language model storage means, pronunciation dictionary storage means)
111 Language model
112 Acoustic model
113 Pronunciation dictionary
12 Feature extraction means
13 Word string generation means
131 Acoustic similarity calculation means
132 Search means
132a Bonus granting means (probability value increasing means)
132b Penalty granting means (probability value reducing means)
21 Keyword extraction means
22 Storage means (keyword storage means)
221 Keyword dictionary
23 Phoneme string search means
24 Similar word extraction means
241 Similarity measurement means
25 Language model update means

Claims

1. A speech recognition apparatus that recognizes input speech using a language model, comprising:
language model storage means for storing a language model in which predetermined identification information is given to words corresponding to a specific keyword; and
word string generation means for outputting a word string as a recognition result by searching the language model for the path that maximizes the probability value for the word string of the input speech,
wherein the word string generation means includes probability value increasing means for increasing the probability value by adding a predetermined value to the connection probability value of a word to which the identification information is given in the language model.

2. The speech recognition apparatus according to claim 1, further comprising:
keyword extraction means for extracting the word serving as the keyword from a digitized document; and
language model updating means for updating the language model by giving the identification information to the keyword extracted by the keyword extraction means.

3. A speech recognition apparatus that recognizes input speech using a language model, comprising:
language model storage means for storing a language model in which predetermined first identification information is given to words corresponding to a specific keyword and predetermined second identification information is given to words similar to the specific keyword; and
word string generation means for outputting a word string as a recognition result by searching the language model for the path that maximizes the probability value for the word string of the input speech,
wherein the word string generation means comprises:
probability value increasing means for increasing the probability value by adding a predetermined value to the connection probability value of a word to which the first identification information is given in the language model; and
probability value reducing means for reducing the probability value by subtracting a predetermined value from the connection probability value of a word to which the second identification information is given in the language model.

4. The speech recognition apparatus according to claim 3, further comprising:
pronunciation dictionary storage means for storing a pronunciation dictionary in which the pronunciations of words are registered;
keyword extraction means for extracting the word serving as the keyword from a digitized document;
phoneme string search means for searching the pronunciation dictionary for a keyword phoneme string, which is a phoneme string indicating the pronunciation of the keyword;
similar word extraction means for extracting words similar to the keyword based on the keyword phoneme string found by the phoneme string search means and registered word phoneme strings indicating the pronunciations of the words registered in the pronunciation dictionary; and
language model updating means for updating the language model by giving the first identification information to the keyword extracted by the keyword extraction means and giving the second identification information to the words similar to the keyword.

5. The speech recognition apparatus according to claim 4, wherein the probability value reducing means changes the amount by which the connection probability value is reduced based on the degree of similarity between the keyword phoneme string and the registered word phoneme string.

6. A speech recognition apparatus that recognizes input speech using a language model, comprising:
keyword storage means for storing a specific keyword and its part of speech;
language model storage means for storing a language model trained on text in which words corresponding to the keyword are replaced with their part of speech; and
word string generation means for outputting a word string as a recognition result by searching the language model for the path that maximizes the probability value for the word string of the input speech,
wherein the word string generation means comprises probability value increasing means for increasing the connection probability value of the keyword in the language model by calculating it from the connection probability value of the part of speech that corresponds to the keyword and is stored in the keyword storage means.

7. A speech recognition program for recognizing input speech using a language model in which predetermined first identification information is given to words corresponding to a specific keyword and predetermined second identification information is given to words similar to the specific keyword, the program causing a computer to function as:
feature extraction means for analyzing the input speech and extracting feature quantities; and
word string generation means for outputting a word string as a recognition result by searching the language model, based on the feature quantities extracted by the feature extraction means, for the path that maximizes the probability value for the word string of the input speech,
wherein the word string generation means increases the probability value by adding a predetermined value to the connection probability value of a word to which the first identification information is given in the language model, and reduces the probability value by subtracting a predetermined value from the connection probability value of a word to which the second identification information is given.