JP6437092B2

JP6437092B2 - Speech recognition apparatus, speech recognition method, and speech recognition program

Info

Publication number: JP6437092B2
Application number: JP2017505903A
Authority: JP
Inventors: 知宏成田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2015-03-16
Filing date: 2015-03-16
Publication date: 2018-12-12
Anticipated expiration: 2035-03-16
Also published as: WO2016147292A1; JPWO2016147292A1

Description

The present invention relates to a keyword-extraction speech recognition technique that uses a statistical language model.

In keyword-extraction speech recognition, it is not necessary to recognize every word of the user's utterance correctly; only the important words, called keywords, must be recognized correctly. For example, suppose that the recognition targets are dish names and their accompanying expressions and that "sushi" is a keyword. If the utterance "sushi o tabetai" ("I want to eat sushi") is recognized as "sushi tabeta" ("ate sushi"), the recognition result contains errors, but the keyword "sushi" is extracted correctly, so the result counts as correct. On the other hand, if the utterance "tenisu shitai ne" ("I'd like to play tennis") is recognized as "te ni sushi tai ne" (literally "in hand, sushi, sea bream"), the result is an error because it extracts "sushi", which the user did not utter. In other words, a keyword must be extracted correctly from an utterance that contains it and must not be extracted erroneously from an utterance that does not.
In the following, an utterance outside the recognition target, such as "tenisu shitai ne" above, is referred to as an out-of-task utterance.

In speech recognition using a statistical language model, which expresses how readily words connect to one another as a numerical value called the language likelihood, the language likelihood is learned from a training corpus, so that word combinations appearing frequently in the corpus receive a high language likelihood. In addition, by using a class language model, in which multiple vocabulary items are represented as a single group called a class, the language likelihood can be learned efficiently from a sparse training corpus.
For example, if the training corpus contains the sentence "sushi ga tabetai" ("I want to eat sushi"), then by treating "sushi, ramen, curry rice" as one class, keyword class <A>, and writing the sentence as "<A> ga tabetai" ("I want to eat <A>"), the language likelihood can be learned for the word sequences "I want to eat sushi", "I want to eat ramen", and "I want to eat curry rice".
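The coverage gained from a class tag can be illustrated with a small sketch. It only enumerates the sentences that one class-tagged template stands for; an actual class language model shares statistics through the class rather than expanding sentences. The function name `expand_template` and the dictionary `KEYWORD_CLASSES` are hypothetical names introduced for illustration.

```python
from itertools import product

# Hypothetical keyword class; the members follow the example in the text.
KEYWORD_CLASSES = {"<A>": ["sushi", "ramen", "curry rice"]}


def expand_template(template, classes):
    """Return every sentence obtained by replacing class tags with members."""
    tags = [tag for tag in classes if tag in template]
    sentences = []
    # One combination of members per output sentence.
    for members in product(*(classes[tag] for tag in tags)):
        sentence = template
        for tag, member in zip(tags, members):
            sentence = sentence.replace(tag, member)
        sentences.append(sentence)
    return sentences


for line in expand_template("I want to eat <A>", KEYWORD_CLASSES):
    print(line)
```

A single tagged corpus sentence therefore stands in for one sentence per class member, which is why a sparse corpus suffices.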

In speech recognition using a statistical language model, a language likelihood is also assigned to unknown words based on N-gram probability, that is, words whose N-gram does not exist in the training corpus, by a technique called backoff, which estimates the higher-order N-gram probability from lower-order N-gram probabilities. Backoff makes it possible to accept expressions containing word chains that are not in the training corpus. However, in a task that extracts keywords from the recognition result, backoff has the side effect of increasing false keyword detections for utterances not covered by the training corpus.
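The effect of backoff can be sketched as follows. This is a deliberately simplified scheme, assumed for illustration only: an unseen bigram falls back to the unigram probability scaled by a fixed weight `alpha`. It is not a properly normalized Katz backoff, and the probabilities in the usage lines are made up.

```python
def backoff_bigram_prob(prev, word, bigram, unigram, alpha=0.4):
    """Bigram probability with a simplified fixed-weight backoff.

    bigram maps (prev, word) -> probability seen in the training corpus.
    unigram maps word -> probability. alpha is an assumed constant weight.
    """
    if (prev, word) in bigram:
        return bigram[(prev, word)]
    # Unseen word chain: estimate from the lower-order (unigram) probability.
    return alpha * unigram.get(word, 0.0)


# Made-up toy probabilities.
bigram = {("sushi", "ga"): 0.5}
unigram = {"sushi": 0.1, "ga": 0.2}

print(backoff_bigram_prob("sushi", "ga", bigram, unigram))  # seen in corpus
print(backoff_bigram_prob("ni", "sushi", bigram, unigram))  # unseen, backed off
```

The second call shows the issue described above: a chain never seen in training, such as "ni sushi", still receives a nonzero probability, so a hypothesis containing the keyword can survive.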

For example, consider a case where a language model is trained with "sushi, ramen, curry rice" as the keyword class to be extracted. If "tenisu" (tennis) is an unknown word not contained in the training corpus, then for the out-of-task utterance "tenisu shitai ne" ("I'd like to play tennis"), a recognition result such as "te ni sushi tai ne", which matches acoustically but consists of unknown words based on N-gram probability, receives a high score, and the keyword "sushi" is extracted erroneously. When such a false keyword detection occurs, a keyword the user did not intend, or a function linked to it, is presented, which annoys the user. In particular, when a keyword is short, part of an utterance is more often acoustically similar to the keyword, so false keyword detections like the one in this example occur frequently.

One countermeasure to this problem is to lower the 1-gram probability of the keywords, which suppresses false keyword detection.

As another countermeasure, Patent Document 1 discloses a speech recognition apparatus that gives a high word insertion penalty to short words, which are likely to be inserted erroneously through local matching. To suppress consecutive insertion of short words, the apparatus detects the length of each inserted word and determines the word insertion penalty so that the shorter the detected word, the larger the penalty.

Japanese Unexamined Patent Application Publication No. 2011-164192 (JP 2011-164192 A)

However, the above technique of lowering the 1-gram probability of keywords degrades recognition performance for utterances consisting only of a short keyword. Furthermore, if backoff is not applied, or if the N-gram probability that backoff computes for unknown words is lowered, expressions not contained in the training corpus become difficult to recognize, and keyword extraction performance falls.
The technique of Patent Document 1 likewise degrades recognition performance for utterances containing a short keyword.

The present invention has been made to solve the above problems, and its object is to suppress the degradation of recognition performance for utterances containing a short keyword without reducing keyword extraction performance.

A speech recognition apparatus according to the present invention includes: a speech recognition unit that performs speech recognition on input speech using a language model trained on a set recognition target and an acoustic model that models features of speech, and that calculates a recognition score for each obtained recognition result from the language likelihood and the acoustic likelihood calculated for that result; an unknown word number calculation unit that calculates, based on the N-gram probabilities of the language model, the number of unknown words contained in the recognition result obtained by the speech recognition unit; a keyword storage unit that stores keywords relating to the set recognition target; a keyword length calculation unit that, when a keyword stored in the keyword storage unit is contained in the recognition result obtained by the speech recognition unit, calculates a keyword length indicating the length of that keyword; and a score recalculation unit that recalculates the recognition score calculated by the speech recognition unit so that the score decreases as the number of unknown words calculated by the unknown word number calculation unit increases and decreases as the keyword length calculated by the keyword length calculation unit decreases, and that outputs the recognition result obtained by the speech recognition unit on the basis of the recalculated recognition score.

According to the present invention, the degradation of recognition performance for utterances containing a short keyword can be suppressed without reducing keyword extraction performance. In addition, false recognition of keywords, short ones in particular, can be suppressed for out-of-task utterances.

FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 1.
FIG. 2 is a diagram showing the hardware configuration of the speech recognition apparatus according to Embodiment 1.
FIG. 3 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 1.
FIG. 4 is an example of the recognition results of the speech recognition unit of the speech recognition apparatus according to Embodiment 1.
FIG. 5 is a flowchart showing the operation of the unknown word number calculation unit of the speech recognition apparatus according to Embodiment 1.
FIG. 6 is a flowchart showing the operation of the keyword length calculation unit of the speech recognition apparatus according to Embodiment 1.
FIG. 7 is a flowchart showing the operation of the score recalculation unit of the speech recognition apparatus according to Embodiment 1.
FIG. 8 is an example of the recognition results after the recognition score update in the speech recognition apparatus according to Embodiment 1.
FIG. 9 is a block diagram showing the configuration of the speech recognition apparatus according to Embodiment 2.
FIG. 10 is a flowchart showing the operation of the speech recognition apparatus according to Embodiment 2.
FIG. 11 is a flowchart showing the operation of the keyword selection frequency calculation unit of the speech recognition apparatus according to Embodiment 2.
FIG. 12 is a flowchart showing the operation of the score recalculation unit of the speech recognition apparatus according to Embodiment 2.
FIG. 13 is an example of the recognition results after the recognition score update in the speech recognition apparatus according to Embodiment 2.

Hereinafter, in order to describe the present invention in more detail, embodiments for carrying out the invention will be described with reference to the accompanying drawings.
Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a speech recognition apparatus 10 according to Embodiment 1.
The speech recognition apparatus 10 performs speech recognition processing on an input signal and obtains a recognition result. It consists of a speech recognition unit 1, an acoustic model storage unit 2, a language model storage unit 3, an unknown word number calculation unit 4, a keyword length calculation unit 5, a keyword storage unit 6, and a score recalculation unit 7.

In Embodiment 1, the speech recognition unit 1, the acoustic model storage unit 2, the language model storage unit 3, the unknown word number calculation unit 4, the keyword length calculation unit 5, the keyword storage unit 6, and the score recalculation unit 7 are each described as being implemented by a dedicated circuit. The information processing method performed by these circuits is also a feature of the present invention. The units can also be realized by another configuration, for example a combination of a control circuit built from a general-purpose CPU or the like and a computer program.

FIG. 2 is a diagram showing the hardware configuration of the speech recognition apparatus 10 according to Embodiment 1.
The speech recognition unit 1, the unknown word number calculation unit 4, the keyword length calculation unit 5, and the score recalculation unit 7 of the speech recognition apparatus 10 are realized by a processor 20 executing a program stored in a memory 30. The acoustic model storage unit 2, the language model storage unit 3, and the keyword storage unit 6 constitute the memory 30. A plurality of processors 20 and a plurality of memories 30 may also cooperate to execute these functions.

Next, each component of the speech recognition apparatus 10 is described. The description below takes as an example the case where the recognition targets of the speech recognition apparatus 10 are dish names and their accompanying expressions.
The speech recognition unit 1 performs speech recognition on the input speech using the acoustic model stored in the acoustic model storage unit 2 and the language model stored in the language model storage unit 3, and obtains recognition results. It then ranks the obtained recognition results in descending order of recognition score, extracts the notations of the top N recognition results, and calculates their recognition scores. Here, the recognition score is the sum of the acoustic likelihood and the language likelihood.
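The ranking step can be sketched as follows, assuming each hypothesis already carries an acoustic likelihood and a language likelihood. The class `Hypothesis`, the function `top_n`, and all numeric values are hypothetical; only the rule "score = acoustic likelihood + language likelihood, keep the top N" comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    acoustic_likelihood: float
    language_likelihood: float

    @property
    def score(self):
        # Recognition score is the sum of the two likelihoods.
        return self.acoustic_likelihood + self.language_likelihood


def top_n(hypotheses, n):
    """Rank hypotheses by descending recognition score and keep the top n."""
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)[:n]


# Illustrative values only; the split between the two likelihoods is made up.
hyps = [
    Hypothesis("te ni suru tai ne", 78.0, 27.0),
    Hypothesis("te ni sushi tai ne", 80.0, 30.0),
    Hypothesis("tenshi tai ne", 60.0, 20.0),
]
for rank, h in enumerate(top_n(hyps, 2), start=1):
    print(rank, h.text, h.score)
```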

The acoustic model storage unit 2 stores an acoustic model that models the features of speech. The acoustic model is, for example, an HMM (Hidden Markov Model). The language model storage unit 3 stores a statistical language model (N-gram language model) trained on a corpus of dish names and their accompanying expressions. The dish names are learned using a language model in which multiple vocabulary items are represented by a single group called a class (hereinafter, a keyword class). The accompanying expressions of dish names are, for example, "<A> ga tabetai" ("I want to eat <A>") or "oishii <A> ga tabetai" ("I want to eat delicious <A>").

In the expressions "I want to eat <A>" and "I want to eat delicious <A>", the keyword class is written as <A> in the training corpus, and each keyword A in keyword class <A> is a dish name such as "sushi", "ramen", or "curry rice". Storing the expressions in this form removes the need to expand the accompanying expressions "I want to eat ..." and "I want to eat delicious ..." for every keyword A, so the language model can be trained efficiently.
Although "I want to eat <A>" and "I want to eat delicious <A>" are shown here as accompanying expressions of dish names, the training corpus used should also cover the other expressions that users are expected to utter.

The unknown word number calculation unit 4 refers to the language model stored in the language model storage unit 3 and calculates, from the words contained in the notations of the N recognition results extracted by the speech recognition unit 1, the number of unknown words based on N-gram probability. The keyword length calculation unit 5 refers to the keyword notations and keyword readings stored in the keyword storage unit 6 and calculates the keyword length of each keyword contained in the notations of the N recognition results output by the speech recognition unit 1. The keyword storage unit 6 stores keyword notations and keyword readings in a form matching the format of the recognition results output by the speech recognition unit 1. The keywords stored in the keyword storage unit 6 correspond to the recognition targets of the speech recognition apparatus 10; for example, when the recognition targets are dish names and their accompanying expressions, the notation and reading are stored for "sushi", "ramen", and "curry rice".

The score recalculation unit 7 recalculates the recognition score of each recognition result using the recognition score extracted by the speech recognition unit 1, the number of unknown words calculated by the unknown word number calculation unit 4, and the keyword length calculated by the keyword length calculation unit 5. The score recalculation unit 7 then outputs the recognition result that has the highest recalculated recognition score.

Next, the operation of the speech recognition apparatus 10 is described with reference to flowcharts and a specific example. FIG. 3 is a flowchart showing the operation of the speech recognition apparatus 10 according to Embodiment 1. FIG. 4 is an example of the recognition results of the speech recognition unit 1 of the speech recognition apparatus 10 according to Embodiment 1.
When speech is input to the speech recognition apparatus 10 (step ST1), the speech recognition unit 1 refers to the acoustic model stored in the acoustic model storage unit 2 and the language model stored in the language model storage unit 3, performs speech recognition on the speech input in step ST1, and obtains recognition results (step ST2). The speech recognition unit 1 then ranks the recognition results obtained in step ST2 in descending order of recognition score and extracts the notations and recognition scores of the top N recognition results (step ST3).

When the out-of-task utterance "tenisu shitai ne" ("I'd like to play tennis") is input as speech in step ST1, the speech recognition unit 1 performs steps ST2 and ST3 and extracts the recognition results shown in FIG. 4. The first-ranked recognition result has the notation "te ni sushi tai ne" and a recognition score of 110. The second-ranked recognition result has the notation "te ni suru tai ne" and a recognition score of 105. In the recognition results of FIG. 4, the notation "te ni sushi tai ne", whose kana reading matches that of "tenisu shitai ne", has the highest recognition score and is extracted as the first-ranked recognition result.

The unknown word number calculation unit 4 refers to the language model stored in the language model storage unit 3 and calculates, for each of the N recognition results n extracted in step ST3, a value C_n indicating the number of unknown words based on N-gram probability contained in that result (hereinafter, the unknown word count C_n) (step ST4). The unknown word count C_n calculated in step ST4 is output to the score recalculation unit 7. The detailed processing of step ST4 is described later.

The keyword length calculation unit 5 refers to the keyword notations and keyword readings stored in the keyword storage unit 6 and calculates, for each of the N recognition results n extracted in step ST3, a value L_n indicating the length of the keyword contained in that result (hereinafter, the keyword length L_n) (step ST5). The keyword length L_n calculated in step ST5 is output to the score recalculation unit 7. The detailed processing of step ST5 is described later.

The score recalculation unit 7 updates the recognition scores of the N recognition results extracted in step ST3, using the unknown word count C_n calculated in step ST4 and the keyword length L_n calculated in step ST5 (step ST6). The score recalculation unit 7 then ranks the recognition results in descending order of the recognition scores updated in step ST6, outputs the recognition result with the highest recognition score (step ST7), and ends the processing. The detailed processing of steps ST6 and ST7 is described later.

Next, the processing of the unknown word number calculation unit 4 in step ST4 is described in more detail.
FIG. 5 is a flowchart showing the operation of the unknown word number calculation unit 4 of the speech recognition apparatus 10 according to Embodiment 1. In the following, the N recognition results are written as recognition result n (n = 1, 2, 3, ..., N), and the words contained in recognition result n are written as word m (m = 1, 2, 3, ..., M_n). Here n is the index of the recognition result and m is the index of the word. The example below counts the number of unknown words based on 3-gram probability.

The unknown word number calculation unit 4 initializes the index n of the recognition results extracted in step ST3 to 1 (step ST11), then initializes the index m of the words contained in recognition result n to 1 and the unknown word count C_n of recognition result n to 0 (step ST12). Next, the unknown word number calculation unit 4 refers to the language model stored in the language model storage unit 3 and determines whether the 3-gram probability P(w_{n,m} | w_{n,m-2} w_{n,m-1}) is less than or equal to a set threshold P_th (step ST13). Here, w_{n,m} denotes the m-th word of the n-th recognition result. With the threshold P_th, a word whose 3-gram probability P is less than or equal to P_th can be regarded as an unknown word.

If the 3-gram probability P is less than or equal to the threshold P_th (step ST13; YES), the unknown word number calculation unit 4 adds 1 to the unknown word count C_n (step ST14) and adds 1 to the index m of the words contained in recognition result n (step ST15). If the 3-gram probability P is greater than the threshold P_th (step ST13; NO), the processing proceeds directly to step ST15. After step ST15, the unknown word number calculation unit 4 determines whether the word index m is less than or equal to the number M_n of words contained in recognition result n (step ST16).

If the word index m is less than or equal to the number M_n of words contained in recognition result n (step ST16; YES), the processing returns to step ST13. If the word index m is greater than M_n (step ST16; NO), the unknown word number calculation unit 4 adds 1 to the recognition result index n (step ST17). The unknown word number calculation unit 4 then determines whether the recognition result index n is less than or equal to the number N of recognition results (step ST18). If n is less than or equal to N (step ST18; YES), the processing returns to step ST12. If n is greater than N (step ST18; NO), the unknown word number calculation unit 4 outputs the unknown word counts C_n obtained by the above processing to the score recalculation unit 7 (step ST19), and the processing proceeds to step ST5 of the flowchart of FIG. 3.

Although the above description counts the number of unknown words based on 3-gram probability, the number of unknown words based on 2-gram probability may be counted instead, or the sum of the counts based on 3-gram probability and on 2-gram probability may be used.
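The counting procedure of steps ST12 to ST16 for a single recognition result can be sketched as below. The sketch assumes a callable `trigram_prob(w2, w1, w)` that returns the (backed-off) 3-gram probability; that callable, the sentence-start padding symbol `<s>`, the toy probability table, and the threshold value are all assumptions introduced for illustration.

```python
def count_unknown_words(words, trigram_prob, p_th):
    """Count words whose 3-gram probability is at or below the threshold p_th."""
    count = 0  # C_n, initialized to 0 (step ST12)
    for m, word in enumerate(words):
        # History of two preceding words; padding with "<s>" is an assumption.
        w2 = words[m - 2] if m >= 2 else "<s>"
        w1 = words[m - 1] if m >= 1 else "<s>"
        if trigram_prob(w2, w1, word) <= p_th:  # step ST13
            count += 1                          # step ST14
    return count


# Toy model: two 3-grams were seen in training, everything else gets a
# small backed-off probability. All numbers are made up.
SEEN = {("<s>", "<s>", "te"): 0.2, ("<s>", "te", "ni"): 0.3}


def toy_trigram_prob(w2, w1, w):
    return SEEN.get((w2, w1, w), 1e-6)


words = ["te", "ni", "sushi", "tai", "ne"]
print(count_unknown_words(words, toy_trigram_prob, p_th=1e-4))
```

Running the same function over each of the N recognition results yields the counts C_1 to C_N that are passed to the score recalculation unit.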

Next, the processing of the keyword length calculation unit 5 in step ST5 is described in more detail.
FIG. 6 is a flowchart showing the operation of the keyword length calculation unit 5 of the speech recognition apparatus 10 according to Embodiment 1.
The keyword length calculation unit 5 initializes the index n of the recognition results extracted in step ST3 to 1 (step ST21), then initializes the index m of the words contained in recognition result n to 1 and the value L_n indicating the length of the keyword contained in recognition result n (hereinafter, the keyword length L_n) to 0 (step ST22). Next, the keyword length calculation unit 5 refers to the keyword notations and keyword readings stored in the keyword storage unit 6 and determines whether the m-th word w_{n,m} of the n-th recognition result is a keyword stored in the keyword storage unit 6 (step ST23).

If w_{n,m} is a keyword (step ST23; YES), the keyword length calculation unit 5 updates the keyword length L_n according to Equations (1) and (2) below (step ST24).
L'_n = UpdateLength(L_n, length(w_{n,m}))   (1)
L_n = L'_n   (2)
In Equations (1) and (2), L'_n is the updated keyword length, length(w) is a function that returns the length of keyword w, and UpdateLength(L_n, A) is a function that updates the keyword length L_n. In Embodiment 1, length(w) is a function that calculates the mora length of keyword w, and UpdateLength(L_n, A) is a function that takes the minimum of the keyword length L_n and A.
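The two functions can be sketched as follows. `mora_length` counts morae from a kana reading using the common rule that small kana such as ゃ, ゅ, ょ merge with the preceding kana while the long-vowel mark, っ, and ん each count as one mora. `update_length` takes the minimum, with one loudly labeled assumption: because the text initializes L_n to 0, a literal minimum would never leave 0, so this sketch treats 0 as "no keyword found yet". That interpretation, and both function names, are not confirmed by the text.

```python
# Small kana that combine with the preceding kana into a single mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def mora_length(reading):
    """Number of morae in a kana reading."""
    return sum(1 for ch in reading if ch not in SMALL_KANA)


def update_length(current, candidate):
    """Minimum of the two lengths; 0 is ASSUMED to mean 'no keyword yet'."""
    if current == 0:
        return candidate
    return min(current, candidate)


print(mora_length("スシ"))          # sushi
print(mora_length("ラーメン"))      # ramen
print(mora_length("カレーライス"))  # curry rice
print(update_length(0, mora_length("スシ")))
```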

The keyword length calculation unit 5 then adds 1 to the index m of the words included in recognition result n (step ST25). If, on the other hand, the word W_{n,m} is not a keyword (step ST23; NO), the process proceeds directly to step ST25. After step ST25, the keyword length calculation unit 5 determines whether the index m is less than or equal to the number M_n of words included in recognition result n (step ST26). If m is less than or equal to M_n (step ST26; YES), the process returns to step ST23.

If, on the other hand, the index m is greater than the number M_n of words included in recognition result n (step ST26; NO), the keyword length calculation unit 5 adds 1 to the index n of the recognition results (step ST27). The keyword length calculation unit 5 then determines whether the index n is less than or equal to the number N of recognition results (step ST28). If n is less than or equal to N (step ST28; YES), the process returns to step ST22. If n is greater than N (step ST28; NO), the keyword length calculation unit 5 outputs the keyword lengths L_n obtained by the above processing to the score recalculation unit 7 (step ST29), and the process proceeds to step ST6 of the flowchart of FIG. 4.
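The loop of steps ST21 to ST29 can be sketched as two nested iterations. This is a minimal Python sketch, not the apparatus itself: the function name `keyword_lengths`, the representation of a recognition result as a list of words, and the precomputed keyword-to-mora-length table are all assumptions made for illustration. As in the text, L_n starts at 0, and the sketch treats 0 as "no keyword yet" so that the minimum-value update can take effect.

```python
from typing import Dict, List


def keyword_lengths(results: List[List[str]],
                    keyword_mora: Dict[str, int]) -> List[int]:
    """Return L_n for every recognition result (0 when it has no keyword)."""
    lengths = []
    for words in results:                 # n = 1..N   (ST21, ST27, ST28)
        l_n = 0                           # initialize L_n (ST22)
        for word in words:                # m = 1..M_n (ST25, ST26)
            if word in keyword_mora:      # is W_{n,m} a stored keyword? (ST23)
                a = keyword_mora[word]
                # Minimum-value update (ST24); 0 means "no keyword yet".
                l_n = a if l_n == 0 else min(l_n, a)
        lengths.append(l_n)
    return lengths                        # passed on for rescoring (ST29)


# Hypothetical word segmentation of the two recognition results in FIG. 8.
results = [["手", "に", "する", "鯛", "ね"], ["手", "に", "寿司", "鯛", "ね"]]
print(keyword_lengths(results, {"寿司": 2}))
```

When a result contains several keywords, the shortest one determines L_n, so the penalty applied later is driven by the keyword most likely to have been inserted by mistake.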

Next, the processing operation of the score recalculation unit 7 shown in steps ST6 and ST7 will be described in more detail.
FIG. 7 is a flowchart showing the operation of the score recalculation unit 7 of the speech recognition apparatus 10 according to the first embodiment.
The score recalculation unit 7 initializes the index n of the recognition results extracted in step ST3 to “1” (step ST31). Using the number of unknown words C_n calculated in step ST4 and the keyword length L_n calculated in step ST5, the score recalculation unit 7 updates the recognition score Score_n of recognition result n extracted in step ST3 based on equations (3) and (4) below (step ST32).
Score'_n = Rescore1(Score_n, C_n, L_n) (3)
Score_n = Score'_n (4)
In equations (3) and (4), Score'_n denotes the updated recognition score, and Rescore1(Score_n, C_n, L_n) is the function that updates the score.

In the first embodiment, the function shown in equation (5) below is used as the function for updating the score.

The second and third terms of equation (5) above correspond to penalties applied to the recognition score of the conventional technique. α, β, and TH_L are parameters determined in advance by experiment. In the first embodiment, for example, α = 1, β = 10, and TH_L = 0.
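Equation (5) itself is not reproduced in this text. The Python sketch below is a reconstruction, not the published formula: it assumes the form described in the dependent claim (subtract the number of unknown words multiplied by a parameter, and a parameter divided by the keyword length, when the keyword length exceeds a threshold) and is checked only against the worked example of FIG. 8. The function name `rescore1` and its keyword arguments are hypothetical.

```python
def rescore1(score: float, c_n: int, l_n: int,
             alpha: float = 1.0, beta: float = 10.0, th_l: int = 0) -> float:
    """Assumed form of Rescore1(Score_n, C_n, L_n).

    Score'_n = Score_n - alpha * C_n - beta / L_n   if L_n > TH_L
    Score'_n = Score_n                              otherwise
    """
    if l_n > th_l:
        # Penalty grows with the unknown-word count and shrinks as the
        # keyword gets longer.
        return score - alpha * c_n - beta / l_n
    # No keyword (or keyword length at or below the threshold): unchanged.
    return score


print(rescore1(110, c_n=2, l_n=2))  # result containing the keyword 「寿司」
print(rescore1(105, c_n=2, l_n=0))  # result without any keyword
```

Under these assumptions a result with no keyword is left untouched even when it contains unknown words, which is what allows it to overtake a result whose short keyword was probably inserted by mistake.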

Any function other than equation (5) may be used to update the score as long as it is a function of Score_n, C_n, and L_n. For example, it may be configured as equation (5a) below.
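Equation (5a) is likewise not reproduced in this text. One possible reading, based only on the dependent claim that multiplies the unknown-word term by the keyword-length term before subtracting, is sketched below; the exact form in the published specification may differ, and the reuse of α, β, and TH_L from equation (5) is an assumption.

```latex
\mathrm{Score}'_n =
\begin{cases}
\mathrm{Score}_n - (\alpha\, C_n)\cdot\dfrac{\beta}{L_n}, & L_n > TH_L \\[6pt]
\mathrm{Score}_n, & \text{otherwise}
\end{cases}
```

In this reading the penalty vanishes when there are no unknown words, whereas the additive form still penalizes a short keyword on its own.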

The score recalculation unit 7 adds 1 to the index n of the recognition results (step ST33) and determines whether the index n is less than or equal to the number N of recognition results (step ST34). If n is less than or equal to N (step ST34; YES), the process returns to step ST32. If n is greater than N (step ST34; NO), the score recalculation unit 7 refers to the recognition scores Score_n obtained by the above processing, sorts the recognition results n in descending order of recognition score Score_n (step ST35), outputs the recognition result n with the highest recognition score (step ST36), and ends the process.
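Steps ST35 and ST36 amount to a descending sort followed by taking the first element. The Python sketch below illustrates this under assumptions: the names `rank_results` and `best_result` and the (result, score) pair representation are hypothetical, at least one recognition result is assumed, and ties keep their original order because Python's sort is stable (the text does not say how ties are handled).

```python
from typing import List, Tuple


def rank_results(scored: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Sort (recognition result, updated score) pairs by descending score (ST35)."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def best_result(scored: List[Tuple[str, float]]) -> str:
    """Return the recognition result with the highest updated score (ST36)."""
    return rank_results(scored)[0][0]


# Updated scores taken from the example of FIG. 8.
scored = [("手に寿司鯛ね", 103.0), ("手にする鯛ね", 105.0)]
print(best_result(scored))
```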

FIG. 8 shows an example of recognition results after the recognition scores have been updated in the speech recognition apparatus 10 according to the first embodiment. FIG. 8 shows the recognition rank, the recognition result, the recognition score before the update, the number of unknown words C_n, the keyword length L_n, and the recognition score after the update.
The number of unknown words C_n is “2” for both of the recognition results 「手にする鯛ね」 (te ni suru tai ne) and 「手に寿司鯛ね」 (te ni sushi tai ne). The keyword length L_n, on the other hand, is “0” for 「手にする鯛ね」 and “2” for 「手に寿司鯛ね」. Substituting these values into equation (5), the updated recognition score of 「手に寿司鯛ね」 falls to 110 − (1 × 2) − (10 / 2) = 103, whereas the updated recognition score of 「手にする鯛ね」 remains “105” because L_n = 0. As a result, the recognition ranks based on the updated scores are reversed: 「手にする鯛ね」 becomes first and 「手に寿司鯛ね」 becomes second.

As described above, the first embodiment includes the speech recognition unit 1, which performs speech recognition of the input speech and extracts the recognition results with the highest recognition scores; the unknown word number calculation unit 4, which refers to the stored language model and calculates the number of unknown words based on the notation of each recognition result; the keyword length calculation unit 5, which refers to the stored keywords and calculates, based on the notation of each recognition result, the keyword length of the keyword included in that recognition result; and the score recalculation unit 7, which updates the recognition scores of the recognition results extracted by the speech recognition unit 1 using the calculated number of unknown words and keyword length. This configuration makes it possible to lower the recognition score of a recognition result that has many unknown words and contains a keyword with a short keyword length. As a result, the frequent appearance of keywords, in particular keywords with a short keyword length, in response to out-of-task utterances can be suppressed.

Embodiment 2.
The second embodiment describes a configuration that feeds back the number of times a recognition result presented to the user by the speech recognition apparatus 10 has been selected, or the number of times a function corresponding to a recognition result output by the speech recognition apparatus 10 has been selected, and thereby further suppresses the appearance of keywords that the user did not intend in response to out-of-task utterances.

FIG. 9 is a block diagram showing the configuration of the speech recognition apparatus 10a according to the second embodiment.
The speech recognition apparatus 10a of the second embodiment adds a keyword selection frequency calculation unit 8 and a keyword selection frequency storage unit 9 to the speech recognition apparatus 10 described in the first embodiment. In the following, parts that are the same as or correspond to components of the speech recognition apparatus 10 according to the first embodiment are given the same reference signs as in the first embodiment, and their description is omitted or simplified.

The keyword selection frequency calculation unit 8 refers to the keyword selection frequencies stored in the keyword selection frequency storage unit 9 and calculates a keyword selection frequency indicating the number of times a keyword included in the recognition results extracted by the speech recognition unit 1 has been selected by the user. The keyword selection frequency storage unit 9 stores the number of times the keyword included in a recognition result has been selected, based on whether the recognition result output by the score recalculation unit 7a, or a function corresponding to that recognition result, was selected. The storage method can be configured as appropriate. For example, the recognition result output by the score recalculation unit 7a is input to the keyword selection frequency calculation unit 8, whereby the keyword selection frequency calculation unit 8 acquires and stores the keyword presented to the user. The keyword selection frequency calculation unit 8 further acquires information input from an external device that accepts input operations from the user, identifies the keyword selected by the user, and increments the selection count of the identified keyword.
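The text leaves the storage method open. One simple possibility is a pair of counters per keyword, as in the following Python sketch. The class `KeywordSelectionStore` and its method names are hypothetical and serve only to illustrate the bookkeeping described above: one count for how often a keyword was presented to the user, and one for how often the user then selected it.

```python
from collections import Counter


class KeywordSelectionStore:
    """Hypothetical stand-in for the keyword selection frequency storage unit 9."""

    def __init__(self) -> None:
        self.presented = Counter()  # times a keyword appeared in an output result
        self.selected = Counter()   # times the user selected that result or function

    def record_presented(self, keyword: str) -> None:
        """Call when a recognition result containing the keyword is output."""
        self.presented[keyword] += 1

    def record_selected(self, keyword: str) -> None:
        """Call when the external input device reports a user selection."""
        self.selected[keyword] += 1


store = KeywordSelectionStore()
for _ in range(3):
    store.record_presented("寿司")
store.record_selected("寿司")
print(store.presented["寿司"], store.selected["寿司"])
```

A `Counter` returns 0 for keywords that were never recorded, so a keyword that has been presented but never selected needs no special handling.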

Next, the operation of the speech recognition apparatus 10a will be described.
FIG. 10 is a flowchart showing the operation of the speech recognition apparatus 10a according to the second embodiment of the present invention. In the following, steps identical to those of the speech recognition apparatus 10 according to the first embodiment are given the same reference signs as in FIG. 4, and their description is omitted or simplified.
When the keyword length calculation unit 5 has calculated the keyword length L_n in step ST5, the keyword selection frequency calculation unit 8 refers to the keyword selection frequencies stored in the keyword selection frequency storage unit 9 and calculates the keyword selection frequency F_n of the keywords included in the N recognition results extracted in step ST3 (step ST41). The selection frequency F_n calculated in step ST41 is output to the score recalculation unit 7a. The detailed processing of step ST41 is described later.

The score recalculation unit 7a updates the recognition scores of the N recognition results extracted in step ST3, using the number of unknown words C_n calculated in step ST4, the keyword length L_n calculated in step ST5, and the selection frequency F_n calculated in step ST41 (step ST42). The score recalculation unit 7a ranks the recognition results in descending order of the recognition scores updated in step ST42, outputs the recognition result with the highest recognition score (step ST43), and ends the process. The detailed processing of steps ST42 and ST43 is described later.

Next, the operation of the keyword selection frequency calculation unit 8 shown in step ST41 will be described in more detail.
FIG. 11 is a flowchart showing the operation of the keyword selection frequency calculation unit 8 of the speech recognition apparatus 10a according to the second embodiment.
The keyword selection frequency calculation unit 8 initializes the index n of the recognition results extracted in step ST3 to “1” (step ST51), initializes the index m of the words included in recognition result n to “1”, and initializes the keyword selection frequency F_n for recognition result n to “−1” (step ST52).

Next, the keyword selection frequency calculation unit 8 refers to the keyword notations and keyword readings stored in the keyword storage unit 6 and determines whether the m-th word W_{n,m} of the n-th recognition result is a keyword stored in the keyword storage unit 6 (step ST53). When the word W_{n,m} is a keyword (step ST53; YES), the keyword selection frequency calculation unit 8 updates the keyword selection frequency F_n of the keyword included in recognition result n based on equations (6) and (7) below (step ST54).
F'_n = UpdateFreq(F_n, Freq(w_{n,m})) (6)
F_n = F'_n (7)
In equations (6) and (7), F'_n is the updated keyword selection frequency, Freq(w) is a function that returns the selection frequency of keyword w, and UpdateFreq(F_n, A) is a function that updates the keyword selection frequency F_n.

In the second embodiment, the function shown in equation (8) below is used as Freq(w).

In equation (8), S(w) is the number of times keyword w has been selected, and R(w) is the number of times keyword w has been recognized. In the second embodiment, UpdateFreq(F_n, A) is a function that takes the maximum of the keyword selection frequency F_n and A.

The keyword selection frequency calculation unit 8 then adds 1 to the index m of the words included in recognition result n (step ST55). If, on the other hand, the word W_{n,m} is not a keyword (step ST53; NO), the process proceeds directly to step ST55. After step ST55, the keyword selection frequency calculation unit 8 determines whether the index m is less than or equal to the number M_n of words included in recognition result n (step ST56). If m is less than or equal to M_n (step ST56; YES), the process returns to step ST53.

If, on the other hand, the index m is greater than the number M_n of words included in recognition result n (step ST56; NO), the keyword selection frequency calculation unit 8 adds 1 to the index n of the recognition results (step ST57). The keyword selection frequency calculation unit 8 then determines whether the index n is less than or equal to the number N of recognition results (step ST58). If n is less than or equal to N (step ST58; YES), the process returns to step ST52. If n is greater than N (step ST58; NO), the keyword selection frequency calculation unit 8 outputs the keyword selection frequencies F_n obtained by the above processing to the score recalculation unit 7a (step ST59), and the process proceeds to step ST42 of the flowchart of FIG. 10.
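Steps ST51 to ST59 can be sketched as follows. Equation (8) is not reproduced in this text, so the ratio S(w) / R(w) used for Freq(w) below is an assumption, as is returning 0 when a keyword has never been recognized. The function names, the word-list representation of a recognition result, and the dictionary arguments are likewise hypothetical.

```python
from typing import Dict, List, Set


def freq(keyword: str, selected: Dict[str, int],
         recognized: Dict[str, int]) -> float:
    """Assumed Freq(w): selections S(w) divided by recognitions R(w)."""
    r = recognized.get(keyword, 0)
    if r == 0:
        return 0.0  # assumption: a never-recognized keyword has frequency 0
    return selected.get(keyword, 0) / r


def selection_frequencies(results: List[List[str]], keywords: Set[str],
                          selected: Dict[str, int],
                          recognized: Dict[str, int]) -> List[float]:
    """Return F_n for every recognition result (-1 when it has no keyword)."""
    out = []
    for words in results:                     # n = 1..N   (ST51, ST57, ST58)
        f_n = -1.0                            # initialize F_n (ST52)
        for word in words:                    # m = 1..M_n (ST55, ST56)
            if word in keywords:              # stored keyword? (ST53)
                # UpdateFreq: maximum of F_n and Freq(w) (ST54)
                f_n = max(f_n, freq(word, selected, recognized))
        out.append(f_n)
    return out                                # passed on for rescoring (ST59)


# Hypothetical segmentation and counts: 「寿司」 recognized 4 times, never selected.
results = [["手", "に", "する", "鯛", "ね"], ["手", "に", "寿司", "鯛", "ね"]]
print(selection_frequencies(results, {"寿司"}, {}, {"寿司": 4}))
```

Initializing F_n to −1 lets the maximum-value update distinguish "no keyword in this result" from "a keyword that was never selected", which has frequency 0.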

Next, the processing operation of the score recalculation unit 7a shown in steps ST42 and ST43 will be described in more detail.
FIG. 12 is a flowchart showing the operation of the score recalculation unit 7a of the speech recognition apparatus 10a according to the second embodiment. In the following, steps identical to those of the score recalculation unit 7 of the speech recognition apparatus 10 according to the first embodiment are given the same reference signs as in FIG. 7, and their description is omitted or simplified.

After initializing the index n of the recognition results to “1” in step ST31, the score recalculation unit 7a updates the recognition score Score_n of recognition result n extracted in step ST3 based on equations (9) and (10) below, using the number of unknown words C_n calculated in step ST4, the keyword length L_n calculated in step ST5, and the keyword selection frequency F_n calculated in step ST41 (step ST61).
Score'_n = Rescore2(Score_n, C_n, L_n, F_n) (9)
Score_n = Score'_n (10)
In equations (9) and (10), Score'_n denotes the updated recognition score, and Rescore2(Score_n, C_n, L_n, F_n) is the function that updates the score.

In the second embodiment, the function shown in equation (11) below is used as the function for updating the score.

The second to fourth terms of equation (11) above correspond to penalties applied to the recognition score of the conventional technique. α, β, γ, TH_L, and TH_F are parameters determined in advance by experiment. In the second embodiment, for example, α = 1, β = 10, γ = 5, TH_L = 0, and TH_F = 0.5. Any function other than equation (11) may be used to update the score as long as it is a function of Score_n, C_n, L_n, and F_n.
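Equation (11) is not reproduced in this text, and the exact form of its fourth term cannot be recovered from the surrounding description. The Python sketch below is therefore only one form consistent with the stated parameters and with the worked example of FIG. 13: it keeps the two penalties of the first embodiment and, as an assumption, subtracts a flat γ when the selection frequency F_n is below TH_F. The function name `rescore2` and its keyword arguments are hypothetical.

```python
def rescore2(score: float, c_n: int, l_n: int, f_n: float,
             alpha: float = 1.0, beta: float = 10.0, gamma: float = 5.0,
             th_l: int = 0, th_f: float = 0.5) -> float:
    """Assumed form of Rescore2(Score_n, C_n, L_n, F_n)."""
    if l_n <= th_l:
        # No keyword (or keyword length at or below the threshold): unchanged.
        return score
    # Second and third terms: unknown-word and keyword-length penalties.
    updated = score - alpha * c_n - beta / l_n
    # Fourth term (assumption): flat penalty for rarely selected keywords.
    if f_n < th_f:
        updated -= gamma
    return updated


print(rescore2(110, c_n=2, l_n=2, f_n=0.0))   # keyword never selected by the user
print(rescore2(105, c_n=2, l_n=0, f_n=-1.0))  # result without any keyword
```

A graded penalty that scales with how far F_n falls below TH_F would also reproduce the example, so the step function here should not be read as the published formula.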

The score recalculation unit 7a then performs the processing of steps ST33 to ST36, outputs the recognition result, and ends the process.

FIG. 13 shows an example of recognition results after the recognition scores have been updated in the speech recognition apparatus 10a according to the second embodiment. FIG. 13 shows the recognition rank, the recognition result, the recognition score before the update, the number of unknown words C_n, the keyword length L_n, the keyword selection frequency F_n, and the recognition score after the update.
The number of unknown words C_n is “2” for both of the recognition results 「手にする鯛ね」 (te ni suru tai ne) and 「手に寿司鯛ね」 (te ni sushi tai ne). The keyword length L_n, on the other hand, is “0” for 「手にする鯛ね」 and “2” for 「手に寿司鯛ね」. Furthermore, the keyword selection frequency F_n is “−1” for 「手にする鯛ね」 (meaning that no keyword is included) and “0” for 「手に寿司鯛ね」.

Substituting these values into equation (11), the updated recognition score of 「手に寿司鯛ね」 falls to 110 − (1 × 2) − (10 / 2) − 5 = 98, whereas the updated recognition score of 「手にする鯛ね」 remains “105” because L_n = 0. As a result, the recognition ranks based on the updated scores are reversed: 「手にする鯛ね」 becomes first and 「手に寿司鯛ね」 becomes second. Moreover, compared with the recognition result example of FIG. 8 described in the first embodiment, the recognition score of 「手に寿司鯛ね」 in FIG. 13 is lowered further, because a penalty based on the keyword selection frequency F_n calculated by the keyword selection frequency calculation unit 8 is added.

As described above, the second embodiment includes the keyword selection frequency calculation unit 8, which calculates a keyword selection frequency indicating the number of times a keyword included in the recognition results extracted by the speech recognition unit 1 has been selected by the user. Therefore, even if a keyword the user did not intend temporarily appears in response to an out-of-task utterance, the recognition score can be recalculated so that a penalty is applied when recognition results containing that keyword are rarely selected by the user. As a result, the frequent appearance of keywords the user did not intend in response to out-of-task utterances can be suppressed.

In the first and second embodiments described above, the case where the recognition targets are dish names and expressions accompanying those dish names has been described as an example; however, the recognition targets are not limited to these.

In the first embodiment described above, processing is performed in the order of calculating the number of unknown words and then the keyword length, and in the second embodiment described above, in the order of calculating the number of unknown words, the keyword length, and then the keyword selection frequency; however, the order of calculation is not limited to these examples.

In addition to the above, within the scope of the invention, the embodiments may be freely combined, any component of any embodiment may be modified, and any component of any embodiment may be omitted.

The speech recognition apparatus according to the present invention is suitable for application to, for example, navigation systems equipped with a speech recognition function and devices in which a speech recognition program is installed as software, in order to improve recognition performance for utterances that include keywords with a short keyword length.

DESCRIPTION OF SYMBOLS: 1 speech recognition unit, 2 acoustic model storage unit, 3 language model storage unit, 4 unknown word number calculation unit, 5 keyword length calculation unit, 6 keyword storage unit, 7, 7a score recalculation unit, 8 keyword selection frequency calculation unit, 9 keyword selection frequency storage unit, 10, 10a speech recognition apparatus, 20 memory, 30 processor.

Claims

A speech recognition apparatus comprising:
a speech recognition unit that performs speech recognition of input speech using a language model trained for a set recognition target and an acoustic model that models features of speech, and that calculates a recognition score of the obtained recognition result from a language likelihood and an acoustic likelihood calculated based on the recognition result;
an unknown word number calculation unit that calculates, based on the N-gram probability of the language model, the number of unknown words included in the recognition result acquired by the speech recognition unit;
a keyword storage unit that stores keywords related to the set recognition target;
a keyword length calculation unit that, when a keyword stored in the keyword storage unit is included in the recognition result acquired by the speech recognition unit, calculates a keyword length indicating the length of the keyword; and
a score recalculation unit that recalculates the recognition score calculated by the speech recognition unit so that the recognition score decreases as the number of unknown words calculated by the unknown word number calculation unit increases and decreases as the keyword length calculated by the keyword length calculation unit decreases, and that outputs the recognition result acquired by the speech recognition unit based on the recalculated recognition score.

The speech recognition apparatus according to claim 1, wherein, when the keyword length calculated by the keyword length calculation unit is larger than a set threshold, the score recalculation unit subtracts, from the recognition score calculated by the speech recognition unit, a value obtained by multiplying the number of unknown words calculated by the unknown word number calculation unit by a parameter and a value obtained by dividing a parameter by the keyword length.

The speech recognition apparatus according to claim 1, wherein, when the keyword length calculated by the keyword length calculation unit is larger than a set threshold, the score recalculation unit subtracts, from the recognition score calculated by the speech recognition unit, a value obtained by multiplying together a value obtained by multiplying the number of unknown words calculated by the unknown word number calculation unit by a parameter and a value obtained by dividing a parameter by the keyword length.

The speech recognition apparatus according to claim 1, further comprising:
a keyword selection frequency storage unit that stores the number of times a user has selected a recognition result output by the score recalculation unit or a function corresponding to the recognition result; and
a keyword selection frequency calculation unit that, when the recognition result acquired by the speech recognition unit includes a keyword stored in the keyword storage unit, calculates a selection frequency indicating the number of times the keyword has been selected, by referring to the keyword selection frequency stored in the keyword selection frequency storage unit,
wherein the score recalculation unit recalculates the recognition score calculated by the speech recognition unit based on the number of unknown words, the keyword length, and the selection frequency calculated by the keyword selection frequency calculation unit.

The speech recognition apparatus according to claim 4, wherein the score recalculation unit reduces the recognition score as the number of unknown words calculated by the unknown word number calculation unit increases, and reduces the recognition score as the keyword length calculated by the keyword length calculation unit or the selection frequency calculated by the keyword selection frequency calculation unit decreases.

A speech recognition method, wherein:
a speech recognition unit performs speech recognition of input speech using a language model trained for a set recognition target and an acoustic model that models features of speech, and calculates a recognition score of the recognition result from a language likelihood and an acoustic likelihood calculated based on the obtained recognition result;
an unknown word number calculation unit calculates the number of unknown words included in the recognition result based on the N-gram probability of the language model;
a keyword length calculation unit calculates, when a keyword related to the set recognition target and stored in advance is included in the recognition result, a keyword length indicating the length of the keyword; and
a score recalculation unit recalculates the recognition score so that the recognition score is reduced as the calculated number of unknown words increases and as the calculated keyword length decreases, and outputs the recognition result based on the recalculated recognition score.
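
The steps of the method above can be sketched in Python as follows. Several details are assumptions not stated in the claim: a word is counted as unknown when its N-gram log probability falls below a hypothetical floor `logprob_floor`, keyword length is measured in characters, and the penalty is the additive form `alpha * unknown_words + beta / keyword_length`. Any penalty that grows with the number of unknown words and with shorter keywords would fit the wording equally well.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: list           # recognized word sequence
    ngram_logprobs: list  # one N-gram log probability per word (hypothetical input)
    score: float          # recognition score from language and acoustic likelihoods


def count_unknown_words(hyp, logprob_floor=-5.0):
    # Assumption: a word whose N-gram log probability is below the floor is "unknown".
    return sum(1 for lp in hyp.ngram_logprobs if lp < logprob_floor)


def keyword_length(hyp, keywords):
    # Assumption: length in characters of the longest keyword found; 0 if none is found.
    return max((len(w) for w in hyp.words if w in keywords), default=0)


def rescore(hyp, keywords, alpha=1.0, beta=1.0):
    penalty = alpha * count_unknown_words(hyp)
    length = keyword_length(hyp, keywords)
    if length > 0:
        penalty += beta / length  # shorter keyword, larger penalty
    return hyp.score - penalty


def best_hypothesis(hyps, keywords):
    # Output the recognition result with the highest recalculated score.
    return max(hyps, key=lambda h: rescore(h, keywords))
```

With the keyword set `{"寿司"}`, a hypothesis such as 「手 に 寿司 鯛 ね」 containing two low-probability words is penalized more heavily than 「寿司 を 食べ たい」, so the latter can win even if its original recognition score was slightly lower.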

A speech recognition program for causing a computer to function as:
a speech recognition unit that performs speech recognition of input speech using a language model trained for a set recognition target and an acoustic model that models features of speech, and calculates a recognition score of the recognition result from a language likelihood and an acoustic likelihood calculated based on the obtained recognition result;
an unknown word number calculation unit that calculates the number of unknown words included in the recognition result acquired by the speech recognition unit based on the N-gram probability of the language model;
a keyword storage unit that stores keywords related to the set recognition target;
a keyword length calculation unit that, when a keyword stored in the keyword storage unit is included in the recognition result acquired by the speech recognition unit, calculates a keyword length indicating the length of the keyword; and
a score recalculation unit that recalculates the recognition score calculated by the speech recognition unit so that the recognition score is reduced as the number of unknown words calculated by the unknown word number calculation unit increases and as the keyword length calculated by the keyword length calculation unit decreases, and outputs the recognition result based on the recalculated recognition score.