JP4829910B2 - Speech recognition error analysis apparatus, method, program, and recording medium therefor


Info

Publication number
JP4829910B2
Authority
JP
Japan
Prior art keywords
word
error
correct
word set
section
Prior art date
Legal status
Expired - Fee Related
Application number
JP2008038468A
Other languages
Japanese (ja)
Other versions
JP2009198646A (en)
Inventor
太一 浅見
喜昭 野田
Current Assignee
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP2008038468A priority Critical patent/JP4829910B2/en
Publication of JP2009198646A publication Critical patent/JP2009198646A/en
Application granted granted Critical
Publication of JP4829910B2 publication Critical patent/JP4829910B2/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Description

The present invention relates to speech recognition technology, and in particular to a speech recognition error analysis apparatus, method, and program for analyzing the causes of speech recognition errors attributable to a language model, and to a recording medium therefor.

When improving the acoustic model and the language model that make up a speech recognition engine, it is efficient to start with the parts that are prone to recognition errors.

For the acoustic model, which judges which phoneme an input speech segment is closest to, the parts prone to recognition errors can be identified by building a confusion matrix. A confusion matrix tabulates, for every phoneme, which other phonemes it is easily confused with. By building a confusion matrix to identify the easily confused phonemes and then improving the model starting with those phonemes, the acoustic model can be improved efficiently.
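
As an informal illustration (not part of the patent text), a phoneme confusion matrix can be accumulated from aligned reference/hypothesis phoneme pairs; the phoneme-level alignment itself is assumed to be given:

```python
from collections import Counter, defaultdict

def confusion_matrix(aligned_pairs):
    """Tally how often each reference phoneme is recognized as each
    hypothesis phoneme. aligned_pairs: iterable of
    (reference_phoneme, hypothesis_phoneme) tuples obtained from a
    phoneme-level alignment (assumed to be available already)."""
    matrix = defaultdict(Counter)
    for ref, hyp in aligned_pairs:
        matrix[ref][hyp] += 1
    return matrix

cm = confusion_matrix([("s", "s"), ("s", "sh"), ("s", "sh"), ("t", "t")])
print(cm["s"].most_common())  # [('sh', 2), ('s', 1)]
```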

For the language model, on the other hand, evaluation by perplexity is commonly used as a performance analysis method (see, for example, Non-Patent Document 1). In speech recognition, the word chain probabilities computed by the language model are used to narrow down the recognition word candidates. Perplexity is a value indicating the average number of branches from each word in the recognition vocabulary to the next word; the larger the value, the harder it is for the language model to narrow down the recognition word candidates.
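
As an informal illustration (not part of the patent text), the perplexity of a bigram language model over a test word sequence can be computed as follows; bigram_prob is a hypothetical stand-in for the model's word chain probabilities:

```python
import math

def bigram_perplexity(words, bigram_prob):
    """Perplexity of a word sequence under a bigram model.
    bigram_prob(prev, word) must return P(word | prev) > 0."""
    log_prob = sum(math.log(bigram_prob(prev, word))
                   for prev, word in zip(words, words[1:]))
    n = len(words) - 1  # number of predicted words
    return math.exp(-log_prob / n)

# A uniform model over a 1000-word vocabulary gives a perplexity of about 1000.
print(bigram_perplexity(["a", "b", "c"], lambda prev, word: 1 / 1000))
```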

However, the perplexity value does not indicate which specific word sequences are hard to narrow down, so it cannot identify the parts of the language model that are prone to recognition errors.
[Non-Patent Document 1] Lawrence Rabiner and Biing-Hwang Juang (translated by Sadaoki Furui), "Fundamentals of Speech Recognition (Vol. 2)", NTT Advanced Technology Corporation, 1995, pp. 263-265.

As described above, the language model performance analysis method described in Non-Patent Document 1 has the problem that it cannot identify the parts of the language model that are prone to recognition errors.

An object of the present invention is to provide a speech recognition error analysis apparatus, method, and program that identify the parts of a language model prone to recognition errors, and a recording medium therefor.

According to one aspect of the present invention, speech recognition processing is performed on a speech signal using a language model, and the word string that is the speech recognition result (hereinafter, the recognized word string) is assigned to the signal. From the recognized word string, a word string consisting of one word or a run of consecutive words that do not match the corresponding correct word string (hereinafter, the recognition-error word string) is extracted, together with the recognition-error section, which consists of the recognition-error word string and the single word immediately before and after it. A start-part error two-word set is extracted, consisting of the first word of the recognition-error section and the first word of the recognition-error word string. A start-part correct two-word set is extracted, consisting of the first word of the recognition-error section and the first word of the correct word string corresponding to the recognition-error word string. Using the language model, the word chain probability of the start-part error two-word set and the word chain probability of the start-part correct two-word set are computed. The two probabilities are compared, and any start-part correct two-word set whose word chain probability is lower than that of the corresponding start-part error two-word set (hereinafter, a low-start-part correct two-word set) is extracted.

A low-start-part correct two-word set, whose word chain probability is lower than that of the corresponding start-part error two-word set, is a word sequence that can cause a recognition error to occur. Therefore, by extracting the low-start-part correct two-word sets, the parts of the language model that are prone to recognition errors can be identified.

Hereinafter, example embodiments of the present invention will be described with reference to the drawings.

[First Embodiment]
Recognition errors tend to occur in runs spanning several words, and their causes can be divided into two kinds: (1) the cause of the start of a recognition error, and (2) the cause of the spread of a recognition error. The first embodiment identifies, among the causes of recognition errors, the parts that can cause a recognition error to start.

An example of the first embodiment of the present invention will be described with reference to FIGS. 1 and 4. FIG. 1 is a functional block diagram of an example of the speech recognition error analysis apparatus. FIG. 4 is a flowchart illustrating the processing flow of the speech recognition error analysis method.

The speech recognition error analysis apparatus 1 of the first embodiment comprises, for example, the components shown by solid lines in FIG. 1: a speech recognition unit 11, a recognition-error section extraction unit 12, a start-part two-word set extraction unit 21, a start-part word chain probability calculation unit 22, and a low-start-part correct two-word set extraction unit 23.

<Step S1>
The speech recognition unit 11 performs speech recognition processing on a speech signal using an acoustic model, a language model, and a recognition dictionary, and assigns to the speech signal the word string that is the result of the recognition processing. The assigned word string is called the recognized word string. Each word in the recognized word string is given a start time and an end time. The recognized word string is sent to the recognition-error section extraction unit 12.
For an overview of speech recognition processing, see, for example, Reference 1.

[Reference 1] Hirokazu Masataki and five others, "'VoiceRex': Spontaneous Speech Recognition Technology for Recognizing Natural Conversations with Customers", NTT Technical Journal, November 2006, No. 18, vol. 11, pp. 15-18.
For example, the speech recognition unit 11 performs speech recognition processing on a speech signal that contains at least the sentence 「インターネットが繋がらない」 ("the Internet is not connecting") and, as shown by the solid lines in FIG. 5, assigns to that portion of the speech signal a recognized word string containing the word sequence (インターネット)(勝つ)(な)(が)(荒)(ない).

<Step S2>
The recognition-error section extraction unit 12 compares the recognized word string with the correct word string corresponding to it, and extracts the recognition-error word string and the recognition-error section, which consists of the recognition-error word string and the single word immediately before and after it.

A recognition-error word string is a word string within the recognized word string consisting of one word or a run of consecutive words that do not match the corresponding correct word string. The extracted recognition-error word string and recognition-error section are sent to the start-part two-word set extraction unit 21.

In the example shown in FIG. 5, the recognized word string and the corresponding correct word string disagree over the four consecutive words (勝つ)(な)(が)(荒). Those four words therefore form the recognition-error word string. Adding the word before it, (インターネット), and the word after it, (ない), gives the recognition-error section (インターネット)(勝つ)(な)(が)(荒)(ない).

In general, the speech recognition processing of the speech recognition unit 11 gives rise to multiple recognition-error sections, all of which are extracted by the recognition-error section extraction unit 12. The processing below is performed for each recognition-error section.
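
As a rough sketch (not part of the patent text), the extraction of step S2 can be written as an alignment of the recognized and correct word strings followed by one word of padding on each side of every non-matching run. The patent leaves the alignment method open, and FIG. 5 suggests a time-based alignment, so Python's difflib is used here only as a stand-in and may segment an error run differently:

```python
import difflib

def extract_error_sections(recognized, correct):
    """Locate recognition-error word strings and recognition-error sections.

    recognized, correct: lists of words. Returns (error_word_string,
    error_section) pairs, where each error section is the error word string
    padded with one word of context on each side when such a word exists.
    """
    matcher = difflib.SequenceMatcher(a=recognized, b=correct, autojunk=False)
    sections = []
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        if tag == "equal":
            continue
        error_words = recognized[i1:i2]
        if not error_words:  # deletion errors: nothing was recognized here
            continue
        start = max(i1 - 1, 0)
        end = min(i2 + 1, len(recognized))
        sections.append((error_words, recognized[start:end]))
    return sections
```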

<Step S3>
As illustrated in FIG. 2, the start-part two-word set extraction unit 21 includes a start-part error two-word set extraction unit 211 and a start-part correct two-word set extraction unit 212.

The start-part error two-word set extraction unit 211 extracts the start-part error two-word set from the recognition-error word string and the recognition-error section. The extracted start-part error two-word set is sent to the start-part word chain probability calculation unit 22.
The start-part error two-word set is the pair of two words consisting of the first word of the recognition-error section and the first word of the recognition-error word string.

In the example shown in FIG. 5, the two words (インターネット)(勝つ), consisting of the first word of the recognition-error section, (インターネット), and the first word of the recognition-error word string, (勝つ), form the start-part error two-word set.

<Step S4>
The start-part correct two-word set extraction unit 212 of the start-part two-word set extraction unit 21 extracts the start-part correct two-word set from the recognition-error section and the correct word string corresponding to the recognition-error word string. The extracted start-part correct two-word set is sent to the start-part word chain probability calculation unit 22.
The start-part correct two-word set is the pair of two words consisting of the first word of the recognition-error section and the first word of the correct word string corresponding to the recognition-error word string.

In the example shown in FIG. 5, the two words (インターネット)(が), consisting of the first word of the recognition-error section, (インターネット), and the first word of the correct word string corresponding to the recognition-error word string, (が), form the start-part correct two-word set.
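
A minimal sketch of steps S3 and S4 (not part of the patent text); the argument layout and names are assumptions made here for illustration:

```python
def start_part_two_word_sets(error_section, error_words, correct_words):
    """Start-part two-word sets for one recognition-error section.

    error_section: the error word string padded with one context word on each side
    error_words:   the recognition-error word string itself
    correct_words: the correct word string corresponding to the error word string
    Returns (start_part_error_pair, start_part_correct_pair).
    """
    context_word = error_section[0]                  # first word of the error section
    error_pair = (context_word, error_words[0])      # step S3
    correct_pair = (context_word, correct_words[0])  # step S4
    return error_pair, correct_pair

# FIG. 5 example
print(start_part_two_word_sets(
    ["インターネット", "勝つ", "な", "が", "荒", "ない"],
    ["勝つ", "な", "が", "荒"],
    ["が", "繋が", "ら"],
))  # (('インターネット', '勝つ'), ('インターネット', 'が'))
```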

<Step S5>
The start-part word chain probability calculation unit 22 uses the same language model as the speech recognition unit 11 to compute the word chain probability of the start-part error two-word set and the word chain probability of the start-part correct two-word set. Each computed word chain probability is sent to the low-start-part correct two-word set extraction unit 23 together with the start-part error two-word set or start-part correct two-word set from which it was computed.

The word chain probability is the probability, computed using the language model, of chaining from the first word of a two-word set to the second word of that set (see, for example, Reference 2).

[Reference 2] Lawrence Rabiner and Biing-Hwang Juang (translated by Sadaoki Furui), "Fundamentals of Speech Recognition (Vol. 2)", NTT Advanced Technology Corporation, 1995, pp. 262-263.
<Step S6>
The low-start-part correct two-word set extraction unit 23 compares the word chain probability of the start-part error two-word set with the word chain probability of the start-part correct two-word set, and extracts any start-part correct two-word set whose word chain probability is lower than that of the start-part error two-word set. Such a start-part correct two-word set is called a low-start-part correct two-word set.
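
As a rough sketch (not part of the patent text), steps S5 and S6 amount to looking up two bigram probabilities per error section and keeping the correct pair when it scores lower than the error pair; the dictionary-based toy model below is a hypothetical stand-in for the recognizer's language model:

```python
def low_start_correct_pairs(pair_list, bigram_prob):
    """Steps S5 and S6 for a list of (error_pair, correct_pair) tuples.

    bigram_prob(prev, word) must return the word chain probability
    P(word | prev) under the same language model used for recognition.
    Returns the correct pairs whose probability is lower than that of the
    corresponding error pair, i.e. the low-start-part correct two-word sets.
    """
    low_pairs = []
    for error_pair, correct_pair in pair_list:
        p_error = bigram_prob(*error_pair)      # step S5
        p_correct = bigram_prob(*correct_pair)  # step S5
        if p_correct < p_error:                 # step S6
            low_pairs.append(correct_pair)
    return low_pairs

# Hypothetical probabilities for the FIG. 5 example
toy_model = {("インターネット", "勝つ"): 1e-4, ("インターネット", "が"): 1e-6}
pairs = [(("インターネット", "勝つ"), ("インターネット", "が"))]
print(low_start_correct_pairs(pairs, lambda a, b: toy_model.get((a, b), 1e-9)))
# [('インターネット', 'が')]
```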

Because its word chain probability is lower than that of the corresponding start-part error two-word set, a low-start-part correct two-word set can cause a recognition error to start. Therefore, extracting the low-start-part correct two-word sets as described above identifies the parts of the language model that are prone to recognition errors; more specifically, among those parts, it identifies the ones that can cause a recognition error to start.

The reason a low-start-part correct two-word set that causes recognition errors to start is assigned a lower word chain probability than the corresponding start-part error two-word set is thought to be that the correct two-word sequence does not appear, or appears only rarely, in the text used as language model training data, so an appropriate probability could not be learned. The language model can therefore be improved by using text in which the low-start-part correct two-word sequences appear frequently as language model training data.

[Second Embodiment]
An example of the second embodiment will now be described. As mentioned above, recognition errors tend to occur in runs spanning several words, and their causes can be divided into two kinds: (1) the cause of the start of a recognition error, and (2) the cause of the spread of a recognition error. The second embodiment identifies both kinds of causes.

In the description of the second embodiment below, only the parts that differ from the first embodiment are described; duplicate description of the parts that are the same as in the first embodiment is omitted.

In addition to the components of the speech recognition error analysis apparatus 1 of the first embodiment, the speech recognition error analysis apparatus of the second embodiment comprises, for example, the components shown by broken lines in FIG. 1: an intra-section two-word set extraction unit 31, an intra-section word chain probability calculation unit 32, and a high-intra-section error two-word set extraction unit 33. In the speech recognition error analysis method of the second embodiment, the processing of steps S7 to S10, shown by broken lines in FIG. 4, is performed in addition to the processing of the first embodiment.

<Step S2>
The recognition-error section extraction unit 12 sends the extracted recognition-error word string to the intra-section two-word set extraction unit 31. The recognition-error section itself need not be sent to the intra-section two-word set extraction unit 31.

<Step S7>
As illustrated in FIG. 3, the intra-section two-word set extraction unit 31 includes an intra-section error two-word set extraction unit 311 and a correct-return two-word set extraction unit 312.

The intra-section error two-word set extraction unit 311 extracts all intra-section error two-word sets from the recognition-error word string. The extracted intra-section error two-word sets are sent to the correct-return two-word set extraction unit 312 and to the intra-section word chain probability calculation unit 32.
An intra-section error two-word set is a pair of two consecutive words within the recognition-error word string.

In the example shown in FIG. 5, (勝つ)(な), (な)(が), and (が)(荒) are the intra-section error two-word sets.

<Step S8>
The correct-return two-word set extraction unit 312 of the intra-section two-word set extraction unit 31 extracts, for each intra-section error two-word set, a correct-return two-word set from that intra-section error two-word set and the correct word string. The extracted correct-return two-word sets are sent to the intra-section word chain probability calculation unit 32.

A correct-return two-word set is a word sequence consisting of the first word of an intra-section error two-word set and the word in the correct word string whose start is temporally later than the start of that first word and temporally closest to the end of that first word.

In the example shown in FIG. 5, the correct-return two-word set corresponding to the intra-section error two-word set (勝つ)(な) is (勝つ)(繋が). That is, the words in the correct word string whose starts are temporally later than the start of (勝つ), the first word of the intra-section error two-word set (勝つ)(な), are (繋が) and (ら). Of these, (繋が) has the start closest in time to the end of (勝つ), because the time from the end of (勝つ) to the start of (繋が) is shorter than the time from the end of (勝つ) to the start of (ら). The correct-return two-word set corresponding to the intra-section error two-word set (勝つ)(な) is therefore (勝つ)(繋が). Similarly, the correct-return two-word set corresponding to the intra-section error two-word set (な)(が) is (な)(ら), and the one corresponding to (が)(荒) is (が)(ら).
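
As a rough sketch (not part of the patent text), steps S7 and S8 can be written over words that carry start and end times; the (word, start, end) tuple layout and the times in the example are assumptions made here for illustration:

```python
def intra_section_pairs(error_words):
    """Step S7: all pairs of consecutive words in the recognition-error word
    string. error_words: list of (word, start_time, end_time) tuples."""
    return list(zip(error_words, error_words[1:]))

def correct_return_pair(first_word, correct_words):
    """Step S8: pair the first word of an intra-section error two-word set
    with the correct word whose start time is later than first_word's start
    time and closest to first_word's end time."""
    _, start, end = first_word
    candidates = [w for w in correct_words if w[1] > start]
    partner = min(candidates, key=lambda w: abs(w[1] - end))
    return (first_word[0], partner[0])

# FIG. 5 example; the times (in seconds) are invented for illustration
error_words = [("勝つ", 0.5, 0.8), ("な", 0.8, 0.9), ("が", 0.9, 1.0), ("荒", 1.0, 1.3)]
correct_words = [("インターネット", 0.0, 0.5), ("が", 0.5, 0.7), ("繋が", 0.7, 1.0),
                 ("ら", 1.0, 1.3), ("ない", 1.3, 1.5)]
for w1, w2 in intra_section_pairs(error_words):
    print(w1[0], w2[0], "->", correct_return_pair(w1, correct_words))
# 勝つ な -> ('勝つ', '繋が')
# な が -> ('な', 'ら')
# が 荒 -> ('が', 'ら')
```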

<Step S9>
The intra-section word chain probability calculation unit 32 uses the same language model as the speech recognition unit 11 to compute the word chain probability of each intra-section error two-word set and the word chain probability of each correct-return two-word set. Each computed word chain probability is sent to the high-intra-section error two-word set extraction unit 33 together with the intra-section error two-word set or correct-return two-word set from which it was computed.

<Step S10>
The high-intra-section error two-word set extraction unit 33 compares the word chain probability of each intra-section error two-word set with the word chain probability of the corresponding correct-return two-word set, and extracts the intra-section error two-word sets whose word chain probability is higher than that of the correct-return two-word set. Such an intra-section error two-word set is called a high-intra-section error two-word set.

Because its word chain probability is higher than that of the corresponding correct-return two-word set, a high-intra-section error two-word set can cause a recognition error to spread. Therefore, extracting the high-intra-section error two-word sets as described above identifies the parts of the language model that are prone to recognition errors; more specifically, among those parts, it identifies the ones that can cause a recognition error to spread.
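
A minimal sketch of steps S9 and S10 (not part of the patent text), symmetrical to the sketch of steps S5 and S6 above; bigram_prob is again a hypothetical stand-in for the recognizer's language model:

```python
def high_intra_section_pairs(pair_list, bigram_prob):
    """Steps S9 and S10: keep the intra-section error two-word sets whose
    word chain probability exceeds that of their correct-return two-word set.

    pair_list: list of (error_pair, correct_return_pair) tuples, e.g.
               [(("勝つ", "な"), ("勝つ", "繋が")), ...]
    bigram_prob(prev, word) returns P(word | prev) under the language model.
    """
    return [error_pair
            for error_pair, correct_pair in pair_list
            if bigram_prob(*error_pair) > bigram_prob(*correct_pair)]
```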

The reason a high-intra-section error two-word set that causes recognition errors to spread is assigned such a high word chain probability is thought to be that the text used as language model training data contains a disproportionately large number of occurrences of that two-word sequence. The language model can therefore be improved by selecting the text used for language model training so that it is not biased toward these high-intra-section error two-word sequences.

[Modification]
The start-part appearance frequency counting unit 24, shown by a dash-dot line in FIG. 1, may compute the appearance frequency of each low-start-part correct two-word set (step S11, FIG. 4). For example, it counts the occurrences of each low-start-part correct two-word set extracted by the low-start-part correct two-word set extraction unit 23 and assigns that count to the set as its appearance frequency. Alternatively, it may assign each low-start-part correct two-word set a ratio as its appearance frequency, for example: appearance frequency of a low-start-part correct two-word set = (count of that low-start-part correct two-word set) / (sum of the counts of all low-start-part correct two-word sets).
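
As an informal sketch (not part of the patent text), the frequency counting of steps S11 and S12 can be done with a counter over the extracted two-word sets:

```python
from collections import Counter

def appearance_frequencies(two_word_sets, as_ratio=False):
    """Count how often each extracted two-word set occurs (steps S11/S12).

    two_word_sets: list of (word1, word2) tuples collected over all
    recognition-error sections. If as_ratio is True, relative frequencies
    are returned instead of raw counts."""
    counts = Counter(two_word_sets)
    if not as_ratio:
        return counts
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}
```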

Providing the start-part appearance frequency counting unit 24 in this way makes it possible to extract the low-start-part correct two-word sets with high appearance frequencies, and thus to narrow down the low-start-part correct two-word sets that should be improved.

Similarly, the intra-section appearance frequency counting unit 34, shown by a dash-dot line in FIG. 1, may compute the appearance frequency of each high-intra-section error two-word set (step S12, FIG. 4). This makes it possible to extract the high-intra-section error two-word sets with high appearance frequencies, and thus to narrow down the high-intra-section error two-word sets that should be improved.

When the above configuration is implemented by a computer, the processing content of the functions of each unit of the speech recognition error analysis apparatus is described by a program. By executing this program on a computer, the functions of the above units are realized on the computer.

That is, by having a CPU sequentially read and execute the programs, the functions of the speech recognition unit 11, the recognition-error section extraction unit 12, the start-part two-word set extraction unit 21, the start-part word chain probability calculation unit 22, the low-start-part correct two-word set extraction unit 23, the start-part appearance frequency counting unit 24, the intra-section two-word set extraction unit 31, the intra-section word chain probability calculation unit 32, the high-intra-section error two-word set extraction unit 33, and the intra-section appearance frequency counting unit 34 are realized. In this case, the CPU functioning as each unit of the speech recognition error analysis apparatus processes data read from a recording medium such as a memory or a hard disk, and stores the processed data on the recording medium.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), or CD-R (Recordable)/RW (ReWritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

As an execution form different from the embodiment described above, the computer may read the program directly from the portable recording medium and execute processing according to the program, or may successively execute processing according to the received program each time the program is transferred to it from the server computer. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized only through execution instructions and acquisition of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has properties that determine the computer's processing).

In this embodiment, the apparatus is configured by executing a predetermined program on a computer, but at least part of the processing may instead be implemented in hardware.

The various processes described above may be executed not only in time sequence according to the description but also in parallel or individually, depending on the processing capacity of the apparatus executing them or as needed. For example, in FIG. 4, the processing of step S3 and the processing of step S4 may be performed in parallel. Similarly, the processing of step S7 and the processing of step S8 may be performed in parallel. The processing of steps S3 to S6 and the processing of steps S7 to S10 may also be performed in parallel.
Needless to say, other modifications are possible without departing from the spirit of the present invention.

FIG. 1 is a functional block diagram of an example of the speech recognition error analysis apparatus.
FIG. 2 is a functional block diagram of an example of the start-part two-word set extraction unit.
FIG. 3 is a functional block diagram of an example of the intra-section two-word set extraction unit.
FIG. 4 is a flowchart illustrating the processing flow of the speech recognition error analysis method.
FIG. 5 is a diagram showing examples of a recognition-error word string, a recognition-error section, a start-part error two-word set, a start-part correct two-word set, intra-section error two-word sets, and correct-return two-word sets.

Explanation of symbols

1 Speech recognition error analysis apparatus
11 Speech recognition unit
12 Recognition-error section extraction unit
21 Start-part two-word set extraction unit
22 Start-part word chain probability calculation unit
23 Low-start-part correct two-word set extraction unit
24 Start-part appearance frequency counting unit
31 Intra-section two-word set extraction unit
32 Intra-section word chain probability calculation unit
33 High-intra-section error two-word set extraction unit
34 Intra-section appearance frequency counting unit
211 Start-part error two-word set extraction unit
212 Start-part correct two-word set extraction unit
311 Intra-section error two-word set extraction unit
312 Correct-return two-word set extraction unit

Claims (10)

1. A speech recognition error analysis apparatus comprising:
a speech recognition unit that performs speech recognition processing on a speech signal using a language model and assigns the word string that is the speech recognition result (hereinafter, the recognized word string);
a recognition-error section extraction unit that extracts from the recognized word string a word string consisting of one word or a run of consecutive words that do not match the correct word string corresponding to the recognized word string (hereinafter, the recognition-error word string), and a recognition-error section consisting of the recognition-error word string and the single word before and after it;
a start-part error two-word set extraction unit that extracts a start-part error two-word set consisting of the first word of the recognition-error section and the first word of the recognition-error word string;
a start-part correct two-word set extraction unit that extracts a start-part correct two-word set consisting of the first word of the recognition-error section and the first word of the correct word string corresponding to the recognition-error word string;
a start-part word chain probability calculation unit that uses the language model to compute the word chain probability of the start-part error two-word set and the word chain probability of the start-part correct two-word set; and
a low-start-part correct two-word set extraction unit that compares the word chain probability of the start-part error two-word set with the word chain probability of the start-part correct two-word set and extracts a start-part correct two-word set whose word chain probability is lower than that of the start-part error two-word set (hereinafter, a low-start-part correct two-word set).
2. The speech recognition error analysis apparatus according to claim 1, further comprising:
a start-part appearance frequency counting unit that obtains the appearance frequency of the low-start-part correct two-word sets.
3. The speech recognition error analysis apparatus according to claim 1 or 2, further comprising:
an intra-section error two-word set extraction unit that extracts from the recognition-error word string all sets of two consecutive words within the recognition-error word string (hereinafter, intra-section error two-word sets);
a correct-return two-word set extraction unit that extracts, for each intra-section error two-word set, a correct-return two-word set consisting of the first word of the intra-section error two-word set and the word in the correct word string whose start is temporally after the start of that first word and temporally closest to the end of that first word;
an intra-section word chain probability calculation unit that uses the language model to compute the word chain probability of each intra-section error two-word set and the word chain probability of each correct-return two-word set; and
a high-intra-section error two-word set extraction unit that compares the word chain probability of each intra-section error two-word set with the word chain probability of the correct-return two-word set corresponding to that intra-section error two-word set and extracts the intra-section error two-word sets whose word chain probability is higher than that of the correct-return two-word set (hereinafter, high-intra-section error two-word sets).
4. The speech recognition error analysis apparatus according to claim 3, further comprising:
an intra-section appearance frequency counting unit that obtains the appearance frequency of the high-intra-section error two-word sets.
5. A speech recognition error analysis method comprising:
a speech recognition step in which a speech recognition unit performs speech recognition processing on a speech signal using a language model and assigns the word string that is the speech recognition result (hereinafter, the recognized word string);
a recognition-error section extraction step in which a recognition-error section extraction unit extracts from the recognized word string a word string consisting of one word or a run of consecutive words that do not match the correct word string corresponding to the recognized word string (hereinafter, the recognition-error word string), and a recognition-error section consisting of the recognition-error word string and the single word before and after it;
a start-part error two-word set extraction step in which a start-part error two-word set extraction unit extracts a start-part error two-word set consisting of the first word of the recognition-error section and the first word of the recognition-error word string;
a start-part correct two-word set extraction step in which a start-part correct two-word set extraction unit extracts a start-part correct two-word set consisting of the first word of the recognition-error section and the first word of the correct word string corresponding to the recognition-error word string;
a start-part word chain probability calculation step in which a start-part word chain probability calculation unit uses the language model to compute the word chain probability of the start-part error two-word set and the word chain probability of the start-part correct two-word set; and
a low-start-part correct two-word set extraction step in which a low-start-part correct two-word set extraction unit compares the word chain probability of the start-part error two-word set with the word chain probability of the start-part correct two-word set and extracts a start-part correct two-word set whose word chain probability is lower than that of the start-part error two-word set (hereinafter, a low-start-part correct two-word set).
6. The speech recognition error analysis method according to claim 5, further comprising:
a start-part appearance frequency counting step in which a start-part appearance frequency counting unit obtains the appearance frequency of the low-start-part correct two-word sets.
7. The speech recognition error analysis method according to claim 5 or 6, further comprising:
an intra-section error two-word set extraction step in which an intra-section error two-word set extraction unit extracts from the recognition-error word string all sets of two consecutive words within the recognition-error word string (hereinafter, intra-section error two-word sets);
a correct-return two-word set extraction step in which a correct-return two-word set extraction unit extracts, for each intra-section error two-word set, a correct-return two-word set consisting of the first word of the intra-section error two-word set and the word in the correct word string whose start is temporally after the start of that first word and temporally closest to the end of that first word;
an intra-section word chain probability calculation step in which an intra-section word chain probability calculation unit uses the language model to compute the word chain probability of each intra-section error two-word set and the word chain probability of each correct-return two-word set; and
a high-intra-section error two-word set extraction step in which a high-intra-section error two-word set extraction unit compares the word chain probability of each intra-section error two-word set with the word chain probability of the correct-return two-word set corresponding to that intra-section error two-word set and extracts the intra-section error two-word sets whose word chain probability is higher than that of the correct-return two-word set (hereinafter, high-intra-section error two-word sets).
8. The speech recognition error analysis method according to claim 7, further comprising:
an intra-section appearance frequency counting step in which an intra-section appearance frequency counting unit obtains the appearance frequency of the high-intra-section error two-word sets.
9. A speech recognition error analysis program for causing a computer to function as each unit of the speech recognition error analysis apparatus according to any one of claims 1 to 4.
10. A computer-readable recording medium on which the speech recognition error analysis program according to claim 9 is recorded.
JP2008038468A 2008-02-20 2008-02-20 Speech recognition error analysis apparatus, method, program, and recording medium therefor Expired - Fee Related JP4829910B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2008038468A JP4829910B2 (en) 2008-02-20 2008-02-20 Speech recognition error analysis apparatus, method, program, and recording medium therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2008038468A JP4829910B2 (en) 2008-02-20 2008-02-20 Speech recognition error analysis apparatus, method, program, and recording medium therefor

Publications (2)

Publication Number Publication Date
JP2009198646A (en) 2009-09-03
JP4829910B2 (en) 2011-12-07

Family

ID=41142221

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2008038468A Expired - Fee Related JP4829910B2 (en) 2008-02-20 2008-02-20 Speech recognition error analysis apparatus, method, program, and recording medium therefor

Country Status (1)

Country Link
JP (1) JP4829910B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8560318B2 (en) * 2010-05-14 2013-10-15 Sony Computer Entertainment Inc. Methods and system for evaluating potential confusion within grammar structure for set of statements to be used in speech recognition during computing event
JP6026224B2 (en) * 2012-10-29 2016-11-16 Kddi株式会社 Pattern recognition method and apparatus, pattern recognition program and recording medium therefor

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3440840B2 (en) * 1998-09-18 2003-08-25 松下電器産業株式会社 Voice recognition method and apparatus
JP4103639B2 (en) * 2003-03-14 2008-06-18 セイコーエプソン株式会社 Acoustic model creation method, acoustic model creation device, and speech recognition device

Also Published As

Publication number Publication date
JP2009198646A (en) 2009-09-03

Legal Events

Date          Code   Title / Description
2010-01-14    A621   Written request for application examination
2011-05-24    A977   Report on retrieval
2011-06-07    A131   Notification of reasons for refusal
2011-07-11    A521   Request for written amendment filed
2011-07-29    RD03   Notification of appointment of power of attorney
              TRDD   Decision of grant or rejection written
2011-09-06    A01    Written decision to grant a patent or to grant a registration (utility model)
2011-09-16    A61    First payment of annual fees (during grant procedure)
              FPAY   Renewal fee payment (payment until 2014-09-22; year of fee payment: 3)
              R150   Certificate of patent or registration of utility model (ref document number: 4829910; country of ref document: JP)
              S531   Written request for registration of change of domicile
              R350   Written notification of registration of transfer
              LAPS   Cancellation because of no payment of annual fees