JP4756499B2

JP4756499B2 - Voice recognition result inspection apparatus and computer program

Info

Publication number: JP4756499B2
Application number: JP2005238236A
Authority: JP
Inventors: 徹清水; ウェイキット・ロー; 哲中村
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2005-08-19
Filing date: 2005-08-19
Publication date: 2011-08-24
Anticipated expiration: 2025-08-19
Also published as: JP2007052307A

Abstract

PROBLEM TO BE SOLVED: To provide an inspection device for speech recognition results that takes into account whether the results are suitable for translation. SOLUTION: The inspection device 60 is used together with a collection 56 of phrases extracted from a bilingual corpus 34, and checks whether each word in the word string of a speech recognition result 50 output by a speech recognition device 48 should be accepted for processing by a phrase-based translation device 68. A reliability level is attached in advance to each word to be checked. The inspection device 60 includes a suitability evaluation section 70, which gives each word a degree of suitability for the translation device 68 as a function of those partial word strings of the speech recognition result that include the word and have a matching phrase in the phrase collection 56, and an accept/reject decision section 74, which decides whether to accept each word by comparing the reliability level of the word with a threshold 82 determined according to the word's degree of suitability. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an apparatus for inspecting whether each word constituting the word string of a speech recognition result should be accepted as a target of subsequent processing, and in particular to an apparatus that performs this inspection based on the reliability of the result of speech recognition carried out for speech translation, while taking into account suitability for the subsequent translation processing.

Speech translation is the process of receiving speech in one language, translating its content into another language, and outputting the result. Technology for automating speech translation and improving its performance is one of the goals of spoken language processing. Speech translation is generally realized by two technologies: speech recognition, which identifies the words or sentences uttered in the speech, and machine translation, which converts the identified words or sentences into words or sentences of another language.

To improve speech translation performance, speech recognition and machine translation must be closely coupled. Various techniques have been proposed for this purpose. For example, Non-Patent Document 1 discloses a technique that uses a single integrated statistical model for both speech recognition and machine translation. Non-Patent Document 2 discloses a technique that generates a plurality of candidates as the speech recognition result, performs machine translation on each candidate, and selects the best candidate from the resulting translations.

To realize high-performance speech translation, speech recognition performance and machine translation performance must each be improved as well. As one technique for improving machine translation performance, lengthening the unit of translation processing has been proposed for statistical machine translation (SMT). For example, Non-Patent Document 3 discloses a phrase-based statistical translation technique whose unit of processing is a "phrase" consisting of a single word or a plurality of consecutive words. In this technique, phrases that are automatically extracted from a bilingual corpus, in the course of learning a statistical model for statistical translation (hereinafter, a "translation model") from that corpus, are used as the units of translation. Using phrases as the unit of translation yields more natural translations, in which words connect naturally to one another, than translating word by word.

Recent progress in research and improvements in computer performance have made speech recognition with considerable accuracy possible. However, many factors hinder speech recognition, such as noise, speaker variation, and ungrammatical utterances, and it remains difficult to achieve sufficient recognition performance. It is therefore important to accurately detect and correct speech recognition errors before the speech recognition result is translated. For this purpose, techniques have been developed for evaluating whether a word string obtained as a speech recognition result (or each word constituting it) is reliable. For example, in Non-Patent Document 4, a generalized word posterior probability (GWPP) is calculated from the word graph output as the speech recognition result and the word posterior probability assigned to each word, and the GWPP is used to evaluate each word of the speech recognition result.

Non-Patent Document 1: Y. Gao, "Coupling vs. unifying: Modeling technique for speech-speech translation", Proc. of Eurospeech 2003, pp.365-368 (2004)
Non-Patent Document 2: R. Zhang et al., "Improved Spoken Language Translation Using N-Best Speech Recognition Hypothesis", Proc. of ICSLP 2004, pp.1629-1632 (2004)
Non-Patent Document 3: E. Sumita et al., "EBMT, SMT, Hybrid and More: ATR Spoken Language Translation System", Proc. of IWSLT 2004, pp.13-20 (2004)
Non-Patent Document 4: Frank K. Soong et al., "Optimal Acoustic and Language Model Weights for Minimizing Word Verification Errors", Proc. of ICSLP 2004, pp.441-444 (2004)

The technique described in Non-Patent Document 1 uses a single integrated statistical model for speech recognition and machine translation. However, this integrated model is not necessarily optimal for both, so the performance of either speech recognition or machine translation may be sacrificed. Indeed, machine translation can use long word strings such as phrases as its processing unit, as in the technique of Non-Patent Document 3, whereas using such long processing units in speech recognition may actually degrade recognition performance. The technique described in Non-Patent Document 2 requires machine translation to be performed on a plurality of candidates, and the amount of processing increases accordingly.

Therefore, to couple speech recognition and machine translation closely, the speech recognition result must be verified from the viewpoint of whether it is suitable for machine translation. However, Non-Patent Documents 1 to 3 contain no specific description of a technique for performing such an evaluation.

The technique described in Non-Patent Document 4 evaluates whether each word of the speech recognition result is reliable as a recognition result. However, it does not evaluate whether the speech recognition result is suitable for machine translation. Consequently, even a speech recognition result rated highly by this technique may be entirely unsuitable for machine translation.

This problem is not limited to the combination of speech recognition and machine translation; it can also arise between speech recognition and any natural language processing that uses its result.

Therefore, an object of the present invention is to provide a speech recognition result inspection apparatus that can inspect a speech recognition result based on its reliability while introducing the viewpoint of whether the result is suitable for subsequent natural language processing.

Another object of the present invention is to provide a speech recognition result inspection apparatus for closely coupling speech recognition and machine translation while keeping the performance of each high.

A speech recognition result inspection apparatus according to a first aspect of the present invention receives the word string of a speech recognition result generated from given input speech by speech recognition processing, and inspects whether each word constituting the word string should be accepted as a target of predetermined phrase-based natural language processing that follows the speech recognition processing. The inspection apparatus is used together with a set of phrases extracted by a predetermined extraction method from a corpus for the natural language processing. A reliability is assigned in advance, by the speech recognition processing, to each word constituting the word string of the speech recognition result. The inspection apparatus includes: fitness assigning means for assigning, to each word constituting the word string of the speech recognition result, a fitness for the natural language processing as a function of the set of those word strings that contain the word, form partial word strings of the speech recognition result, and have a matching phrase in the set of phrases; and determining means for determining, for each word constituting the word string of the speech recognition result, whether to accept the word by comparing the reliability assigned to the word with a threshold determined according to the fitness assigned to the word by the fitness assigning means.

Each word constituting the word string of the speech recognition result is first assigned a fitness by the fitness assigning means. The fitness is given as a function of the set of those partial word strings of the speech recognition result that contain the word and have a matching phrase in the set of phrases extracted from the corpus. If the word string of the speech recognition result contains a partial word string that has a matching phrase in the set of phrases, that partial word string can be regarded as compatible with the corpus; that is, it is considered suitable for the subsequent natural language processing performed using the corpus. The determining means decides whether each word should be accepted by comparing its reliability with a threshold determined according to its fitness. The viewpoint of whether a word is suitable for corpus-based natural language processing can thus be introduced into the reliability-based inspection of each word, and the coupling between speech recognition and natural language processing can be strengthened without degrading the performance of speech recognition or machine translation.

Preferably, the fitness assigning means includes: matching means for detecting word strings that have a matching phrase in the set of phrases, by matching the set of phrases against the word strings that form partial word strings of the speech recognition result; and means for assigning, to each word constituting the word string of the speech recognition result, a fitness according to a predetermined criterion based on the set of those word strings detected by the matching means that contain the word.

The matching means matches the partial word strings of the speech recognition result against the set of phrases, thereby detecting the word strings that have a matching phrase in the set. The assigning means then assigns a fitness to each word constituting the word string of the speech recognition result, based on the set of detected word strings that contain the word. In this way, both the detection of word strings having matching phrases and the assignment of a fitness to each word can be performed efficiently.
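The matching step can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `find_matching_spans`, the representation of phrases as tuples of words, the `max_len` bound, and the sample data are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

def find_matching_spans(
    words: List[str],
    phrases: Set[Tuple[str, ...]],
    max_len: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for every contiguous
    sub-sequence of `words` that is identical to a phrase in `phrases`.

    `max_len` bounds the sub-sequence length (assumed to be the length
    of the longest phrase in the set).
    """
    spans = []
    n = len(words)
    for start in range(n):
        # Try every sub-sequence beginning at `start`, up to max_len words.
        for end in range(start + 1, min(n, start + max_len) + 1):
            if tuple(words[start:end]) in phrases:
                spans.append((start, end))
    return spans

# Hypothetical phrase set and recognition result.
phrases = {("i", "would", "like"), ("would", "like", "to"), ("a", "room"), ("like",)}
words = ["i", "would", "like", "a", "room"]
spans = find_matching_spans(words, phrases, max_len=3)
print(spans)
```

Storing the phrases in a hash set keeps each lookup cheap, so the cost is governed by the number of candidate sub-sequences, which is at most the sentence length times `max_len`.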

Preferably, the fitness assigning means includes means for assigning, to each word constituting the word string of the speech recognition result, a fitness for the natural language processing as a function of the lengths of the word strings included in the set of word strings for that word.

For each word constituting the word string of the speech recognition result, a fitness is assigned as a function of the lengths of the partial word strings that contain the word and have a matching phrase in the set of phrases. This fitness therefore makes it possible to evaluate whether the recognition result is suitable for natural language processing whose performance varies with the length of the portion for which a matching phrase exists.

The assigning means may assign to each word constituting the word string of the speech recognition result, as its fitness, the number of words in the longest word string among the word strings included in the set of word strings for that word.

This fitness can express suitability for natural language processing whose performance improves as the matched phrase becomes longer.
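As a rough illustration of this fitness, the following Python sketch derives a per-word value from a list of matched spans. The span format, the function name `max_match_lengths`, and the choice of 0 for a word that lies in no matched span are assumptions of the sketch and are not stated in the text above.

```python
from typing import List, Tuple

def max_match_lengths(n_words: int, spans: List[Tuple[int, int]]) -> List[int]:
    """For each word position, return the word count of the longest
    matched span (start inclusive, end exclusive) that covers it.

    A position covered by no span gets 0 (an assumption of this sketch).
    """
    lmax = [0] * n_words
    for start, end in spans:
        length = end - start
        for i in range(start, end):
            lmax[i] = max(lmax[i], length)
    return lmax

# Hypothetical matched spans in a six-word recognition result.
spans = [(0, 3), (2, 3), (3, 5)]
lmax = max_match_lengths(6, spans)
print(lmax)
```

In this example the third word lies in both a three-word span and a one-word span and receives the larger value, while the final word lies in no span.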

The natural language processing may include phrase-based statistical translation between the language of the input speech and a predetermined target language. The set of phrases includes a set of phrases extracted, in the course of learning a translation model for the statistical translation, from a bilingual corpus of the language of the input speech and the predetermined target language. The fitness assigning means includes means for assigning, to each word constituting the word string of the speech recognition result, a fitness for the natural language processing as a function of the set of word strings for that word.

The fitness assigned based on the detected phrases represents compatibility with the bilingual corpus and with the translation model learned from that corpus. The viewpoint of whether each word of the speech recognition result is suitable for phrase-based statistical translation can therefore be introduced into the inspection of the speech recognition result, and the coupling between speech recognition and statistical translation can be strengthened.

The determining means may include: means for holding fitness values and thresholds in association with each other; means for setting, for each word constituting the word string of the speech recognition result, a threshold for the word based on the fitness assigned to the word, in accordance with the fitness values and thresholds held by the holding means; and comparing means for determining, for each word constituting the word string of the speech recognition result, whether to accept the word by comparing the threshold set by the setting means with the reliability assigned to the word.

By comparing, for each word, the reliability of the word with a threshold set according to the fitness of the word, and deciding on that basis whether to accept the word, each word of the speech recognition result can be inspected with its compatibility with the corpus taken into account.

The thresholds held by the holding means may be chosen so that the threshold becomes lower as the fitness becomes higher.

With thresholds chosen in this way, even a word with low reliability is accepted if a partial word string containing it is suitable for the subsequent natural language processing. If a partial word string is suitable for the subsequent phrase-based natural language processing, the probability that the natural language processing fails on that portion is low. As a result, the performance of the overall sequence of processing, consisting of speech recognition and the phrase-based natural language processing performed on its result, improves.
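The decision described above can be sketched in Python as follows. The threshold values, the table layout keyed by fitness, the clamping of large fitness values to the last table entry, and the use of `>=` for the comparison are all assumptions of this sketch; the text only states that the threshold depends on the fitness and may decrease as the fitness increases.

```python
from typing import Dict

# Hypothetical threshold table: higher fitness, lower threshold.
THRESHOLDS: Dict[int, float] = {0: 0.9, 1: 0.7, 2: 0.5, 3: 0.3}

def threshold_for(fitness: int, table: Dict[int, float]) -> float:
    """Look up the threshold for a fitness value, reusing the entry for
    the largest key when the fitness exceeds it (an assumption)."""
    return table[min(fitness, max(table))]

def accept(reliability: float, fitness: int,
           table: Dict[int, float] = THRESHOLDS) -> bool:
    """Accept a word when its reliability reaches the threshold
    selected by its fitness."""
    return reliability >= threshold_for(fitness, table)

# The same reliability is rejected at low fitness and accepted at high fitness.
print(accept(0.4, 1), accept(0.4, 3))
```

With this table, a word with reliability 0.4 is rejected when it belongs only to one-word matches but accepted when it belongs to a three-word match, which is the behavior the paragraph above describes.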

A computer program according to a second aspect of the present invention, when executed by a computer, causes the computer to operate as any of the speech recognition result inspection apparatuses described above. The same effects as those of the inspection apparatuses described above can therefore be obtained.

An embodiment of the present invention is described below with reference to the drawings. In the drawings used in the following description, identical components are given identical reference numerals. Their names and functions are also identical. Their description is therefore not repeated.

[Overview]
In the present embodiment, in speech translation that combines speech recognition with phrase-based statistical translation, which is one kind of natural language processing, each word of the speech recognition result is inspected to decide whether it should be accepted as a target of the subsequent translation processing. In this inspection, each word passes or fails as a processing target according to a comparison between a value representing the reliability of the word (hereinafter, "reliability") and a threshold. The present embodiment introduces into this comparison-based inspection an evaluation of how suitable the speech recognition result is for translation processing. That is, whether the speech recognition result is suitable for translation is evaluated, and the threshold is varied for each word according to the result of that evaluation. Hereinafter, the index of suitability is called "fitness".

In the present embodiment, the phrases generated while learning the translation model used in phrase-based statistical translation are used to obtain the fitness. When a translation model is learned from a bilingual corpus consisting of a large number of sentence pairs in a source language and a target language, a large number of phrases are extracted as a by-product. Together they form a set of phrases. Each phrase in this set is part of a sentence in the bilingual corpus and can be regarded as a phrase that appears frequently in the domain defined by the bilingual corpus. Therefore, if a speech recognition result contains a portion that matches such a phrase, that portion can be considered compatible with the domain. Moreover, the longer the portion, the higher its compatibility is considered to be. In the present embodiment, therefore, the length of the portion that matches a phrase extracted from the bilingual corpus is used as the measure for evaluating the suitability of each word in the speech recognition result, and length is measured in number of words. That is, for each word of the speech recognition result, consider the set of partial word strings of the speech recognition result that contain the word and have a matching phrase in the above set of phrases; the fitness of the word is defined as the maximum length of the partial word strings in this set. In the following description, the length of a phrase is called its "phrase length". Since a partial word string can also be regarded as a phrase, its length is likewise called its "phrase length". A partial word string of the speech recognition result that matches any phrase in the phrase set is called a "matching portion". Among all matching portions that contain a given word, the phrase length of the longest one is called the "maximum matching length" of that word and is denoted by "Lmax".

[Configuration]
(Overall configuration of the speech translation system)
FIG. 1 is a block diagram showing the configuration of a speech translation system 30 according to the present embodiment. Referring to FIG. 1, the speech translation system 30 includes a translation model learning device 54 for learning a translation model from a predetermined bilingual corpus 34, and a translation model unit 52 for holding the translation model learned by the translation model learning device 54. The translation model learning device 54 has a function of extracting phrases from the bilingual corpus 34 in the course of learning the translation model. The speech translation system 30 further includes a phrase database 56 for storing the phrases extracted by the translation model learning device 54.

The speech translation system 30 further includes: a selector 44 for selecting the speech 46 to be processed, from the speech 32 for translation and learning speech 38A, ..., 38P prepared in advance for learning, in response to an operation input 42 that selects either speech translation or the threshold learning process described above; a speech recognition device 48 for performing speech recognition on the speech 46 selected by the selector 44 and outputting a speech recognition result 50 consisting of the recognized word string and the reliability of each word in it; an inspection device 60 for inspecting each word of the speech recognition result 50 based on the reliability of the word and a threshold determined according to the maximum matching length Lmax of the word with respect to the phrases in the phrase database 56, and for generating and outputting an inspection result 76 consisting of the speech recognition result 50 together with the maximum matching length Lmax and the pass/fail information of each word; and a switch 78 that has first and second outputs, is interlocked with the selector 44, and outputs the inspection result 76 to the first or second output in accordance with the operation input 42.

The speech translation system 30 further includes: a preprocessing device 64, connected to the first output of the switch 78, for applying, when given an inspection result 76 from the switch 78, predetermined preprocessing to the speech recognition result 50 contained in that inspection result 76 according to the pass/fail status of each word; and a translation device 68 for translating the preprocessed speech recognition result 66 output by the preprocessing device 64 into a translation result 36 by phrase-based statistical translation using the translation model unit 52. In the present embodiment, the preprocessing device 64 removes the rejected words from the word string of the speech recognition result 50 based on the inspection result 76, and generates and outputs the preprocessed speech recognition result 66.
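The preprocessing of this embodiment, which drops rejected words before translation, can be sketched in Python as below. The record type `InspectedWord`, its field names, and the sample values are hypothetical; they merely stand in for the per-word contents of the inspection result 76.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InspectedWord:
    """One word of an inspection result (field names are illustrative)."""
    word: str
    reliability: float
    lmax: int
    accepted: bool

def preprocess(inspection_result: List[InspectedWord]) -> List[str]:
    """Keep only the accepted words, preserving their original order."""
    return [w.word for w in inspection_result if w.accepted]

# Hypothetical inspection result in which one word was rejected.
result = [
    InspectedWord("i", 0.95, 3, True),
    InspectedWord("would", 0.40, 3, True),
    InspectedWord("lake", 0.35, 0, False),
    InspectedWord("a", 0.80, 2, True),
    InspectedWord("room", 0.85, 2, True),
]
cleaned = preprocess(result)
print(cleaned)
```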

The speech translation system 30 further includes: a threshold learning unit 84, connected to the second output of the switch 78, for learning a threshold for each maximum matching length Lmax, when given from the switch 78 the inspection results 76 for the learning speech 38A, ..., 38P, from those inspection results 76 and from correct inspection results prepared in advance for the learning speech 38A, ..., 38P (hereinafter, "correct inspection results") 40A, ..., 40P; and a threshold table unit 82 for holding a threshold table consisting of the thresholds for each maximum matching length Lmax obtained through learning by the threshold learning unit 84. The switch 78 is interlocked with the selector 44. That is, when the selector 44 selects the speech 32 for translation in response to the operation input 42, the switch 78 outputs the inspection result 76 to the preprocessing device 64; when the selector 44 selects the learning speech 38A, ..., 38P, it outputs the inspection result 76 to the threshold learning unit 84.
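This passage does not say how the threshold learning unit 84 derives a threshold for each Lmax. Purely as an illustration, the Python sketch below assumes one plausible criterion: for each Lmax value, choose the threshold that minimizes the number of words whose accept/reject decision disagrees with the correct inspection result. The sample format, the function name `learn_thresholds`, and the criterion itself are assumptions, not the method of the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def learn_thresholds(
    samples: Iterable[Tuple[int, float, bool]]
) -> Dict[int, float]:
    """Learn one threshold per Lmax value.

    Each sample is (lmax, reliability, should_accept), where
    should_accept comes from the correct inspection result.
    Assumed criterion: minimize disagreements between
    `reliability >= threshold` and should_accept.
    """
    groups: Dict[int, List[Tuple[float, bool]]] = defaultdict(list)
    for lmax, reliability, should_accept in samples:
        groups[lmax].append((reliability, should_accept))

    table: Dict[int, float] = {}
    for lmax, items in groups.items():
        # Candidate thresholds: each observed reliability, plus "reject all".
        candidates = sorted({r for r, _ in items}) + [float("inf")]

        def errors(t: float) -> int:
            return sum((r >= t) != ok for r, ok in items)

        # Ties are resolved in favor of the lowest candidate.
        table[lmax] = min(candidates, key=errors)
    return table

# Hypothetical training samples.
samples = [
    (1, 0.9, True), (1, 0.6, False), (1, 0.8, True),
    (3, 0.4, True), (3, 0.2, False), (3, 0.5, True),
]
table = learn_thresholds(samples)
print(table)
```

On this toy data the learned threshold for Lmax = 3 is lower than the one for Lmax = 1, but that is a property of the samples rather than something the sketch enforces.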

検査装置６０は、フレーズデータベース５６のフレーズに対する音声認識結果５０の各単語の最大一致長Ｌmaxを求め、音声認識結果５０及び各単語の最大一致長Ｌmaxからなる適合度の評価結果７２を生成し出力するための適合度評価部７０と、適合度の評価結果７２に含まれる音声認識結果５０の各単語の合否を、その単語の信頼度及び最大一致長Ｌmaxとしきい値テーブル部８２内のしきい値テーブルとを用いて決定し、適合度の評価結果７２と各単語の合否とからなる検査結果７６を生成し出力するための合否決定部７４とを含む。 The inspection device 60 includes a fitness evaluation unit 70 that obtains, for each word of the speech recognition result 50, the maximum match length Lmax against the phrases in the phrase database 56, and generates and outputs a fitness evaluation result 72 consisting of the speech recognition result 50 and the maximum match length Lmax of each word, and a pass/fail determination unit 74 that decides whether each word of the speech recognition result 50 contained in the fitness evaluation result 72 passes or fails, using the word's reliability, its maximum match length Lmax, and the threshold table in the threshold table unit 82, and generates and outputs an inspection result 76 consisting of the fitness evaluation result 72 and the pass/fail status of each word.

（音声認識装置４８による音声認識結果５０）
図２に、音声認識装置４８が行なう音声認識処理の概要と、その結果音声認識装置４８により出力される音声認識結果５０の構成とを模式的に示す。図２を参照して、音声認識装置４８は、まず与えられた音声４６の音声認識処理１００を行ない、音声４６を出力しうる発話内容を単語の組合せで表現した単語グラフ１０２を生成する。単語グラフ１０２は、単語に対応するパスからなる経路網で構成されたグラフであり、その各パスには音声認識装置４８にて付与されたスコアが格納されている。この経路網を通って始点から終点まで進んだ場合の経路が、音声４６を出力する単語列に対応する。またこの経路網をもとに各単語の事後確率とＧＷＰＰとが算出される。そこで音声認識装置４８は、単語グラフ１０２の経路網から、各単語のスコアに基づいて、経路を選択する処理１０４を行なう。この処理１０４により選択された経路が本実施の形態に係る音声認識結果５０となる。 (Speech recognition result 50 by the speech recognition device 48)
FIG. 2 schematically shows an outline of the speech recognition processing performed by the speech recognition device 48 and the structure of the speech recognition result 50 that it outputs. Referring to FIG. 2, the speech recognition device 48 first performs speech recognition processing 100 on the given speech 46 and generates a word graph 102 that expresses, as combinations of words, the utterance contents that could have produced the speech 46. The word graph 102 is a network of paths, each corresponding to a word, and each path stores a score assigned by the speech recognition device 48. A route through this network from the start point to the end point corresponds to a word string that could have produced the speech 46. The posterior probability and the GWPP of each word are calculated from this network. The speech recognition device 48 then performs processing 104, which selects a route from the network of the word graph 102 based on the score of each word. The route selected by processing 104 is the speech recognition result 50 according to the present embodiment.
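
The route selection of processing 104 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the word graph is a directed acyclic graph whose nodes are integers numbered in topological order, that each edge carries one word and an additive score (for example a log-likelihood), and that the selected route is the one with the highest total score. The function name `best_path`, the edge format, and the sample words and scores are hypothetical; the specification does not state how the scores are combined.

```python
from collections import defaultdict


def best_path(edges, start, end):
    """Return (total_score, words) of the highest-scoring route from start to end.

    edges: list of (src, dst, word, score) with src < dst (assumed topological order).
    """
    outgoing = defaultdict(list)
    nodes = set()
    for src, dst, word, score in edges:
        outgoing[src].append((dst, word, score))
        nodes.update((src, dst))

    # best[node] holds the best (score, word list) found so far for reaching node.
    best = {start: (0.0, [])}
    for node in sorted(nodes):
        if node not in best:
            continue  # node is not reachable from start
        total, words = best[node]
        for dst, word, score in outgoing[node]:
            candidate = total + score
            if dst not in best or candidate > best[dst][0]:
                best[dst] = (candidate, words + [word])
    return best[end]


# Hypothetical word graph with competing word hypotheses.
edges = [
    (0, 1, "i", -1.0), (0, 1, "eye", -2.5),
    (1, 2, "want", -0.5), (1, 3, "wanted", -3.0),
    (2, 3, "tea", -1.0), (2, 3, "tee", -1.8),
]
score, words = best_path(edges, start=0, end=3)
print(words)  # ['i', 'want', 'tea']
```

In this sketch the per-word reliability (GWPP) is not computed; it would be attached to each selected word as a separate sequence, as in the sequence 108.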

音声認識結果５０は、単語グラフ１０２から選択された経路の単語Ｗ₁〜Ｗ_Nからなる単語列１０６と、単語Ｗ₁〜Ｗ_Nの各々について単語グラフ１０２から算出される前述のＧＷＰＰ（ＧＷＰＰ1〜ＧＷＰＰN）からなる系列１０８とを含む。本実施の形態に係る音声翻訳システム３０では、各単語について算出されるＧＷＰＰを、音声認識結果５０における当該単語の信頼度として使用する。 The speech recognition result 50 includes a word string 106 consisting of the words W_1 to W_N on the route selected from the word graph 102, and a sequence 108 consisting of the aforementioned GWPP values (GWPP_1 to GWPP_N) calculated from the word graph 102 for each of the words W_1 to W_N. In the speech translation system 30 according to the present embodiment, the GWPP calculated for each word is used as the reliability of that word in the speech recognition result 50.

（フレーズデータベース５６）
図３に、フレーズデータベース５６（図１参照）のデータ構成を模式的に示す。図３を参照して、フレーズデータベース５６には、ソース言語（図３に示す例では日本語）の１又は複数の単語からなるフレーズ１２０Ａ，…，１２０Ｑと、それらのフレーズに対応するターゲット言語のフレーズ１２２Ａ，…，１２２Ｑとを含む。フレーズ１２０Ａ，…，１２０Ｑ及び１２２Ａ，…，１２２Ｑはいずれも、ソース言語又はターゲット言語の文法とは無関係にセグメンテーションされた単語列である。そのため、これらのフレーズは、文節等の文法的な単位との間に関連性を持たない。またそのフレーズ長は、フレーズによって異なる。これらのフレーズはいずれも、バイリンガルコーパス３４内の文を所定の統計的手法でセグメンテーションすることにより得られるものである。そのためこれらのフレーズはいずれも、バイリンガルコーパス３４が規定するドメインに頻出するフレーズといえる。 (Phrase database 56)
FIG. 3 schematically shows the data structure of the phrase database 56 (see FIG. 1). Referring to FIG. 3, the phrase database 56 includes phrases 120A, ..., 120Q, each consisting of one or more words of the source language (Japanese in the example shown in FIG. 3), and the corresponding target-language phrases 122A, ..., 122Q. The phrases 120A, ..., 120Q and 122A, ..., 122Q are all word strings segmented without regard to the grammar of the source or target language. These phrases therefore bear no relation to grammatical units such as bunsetsu (syntactic phrases). The phrase length varies from phrase to phrase. Each of these phrases is obtained by segmenting sentences in the bilingual corpus 34 using a predetermined statistical method. All of them can therefore be regarded as phrases that occur frequently in the domain defined by the bilingual corpus 34.

（適合度の評価結果７２及び検査結果７６）
検査装置６０による適合度の評価と信頼度を用いた検査とは、単語単位で行なわれる。図４（Ａ）及び図４（Ｂ）にそれぞれ、適合度の評価結果７２及び検査結果７６のデータ構成を示す。図４（Ａ）を参照して、適合度の評価結果７２は、図２に示す音声認識結果５０と、音声認識結果５０における単語Ｗ₁〜Ｗ_Nの最大一致長Ｌmaxからなる系列１４０とを含む。なお、本実施の形態においては、当該単語を含む一致部分からなる集合の要素数が０である場合、その単語の最大一致長Ｌmaxを０とする。図４（Ｂ）を参照して、検査結果７６は、適合度の評価結果７２と、単語Ｗ₁〜Ｗ_Nの合否を表す検査結果列１４４とを含む。なお図４（Ｂ）では、検査結果列１４４において、単語Ｗ₁〜Ｗ_Nの合否を「ＯＫ（１）」と「ＮＧ（０）」とによって表している。 (Fitness evaluation result 72 and inspection result 76)
The fitness evaluation and the reliability-based inspection by the inspection device 60 are performed word by word. FIGS. 4(A) and 4(B) show the data structures of the fitness evaluation result 72 and the inspection result 76, respectively. Referring to FIG. 4(A), the fitness evaluation result 72 includes the speech recognition result 50 shown in FIG. 2 and a sequence 140 consisting of the maximum match lengths Lmax of the words W_1 to W_N in the speech recognition result 50. In the present embodiment, when the set of matching parts containing a word is empty, the maximum match length Lmax of that word is set to 0. Referring to FIG. 4(B), the inspection result 76 includes the fitness evaluation result 72 and an inspection result sequence 144 representing the pass/fail status of the words W_1 to W_N. In FIG. 4(B), the pass/fail status of the words W_1 to W_N in the inspection result sequence 144 is represented by "OK (1)" and "NG (0)".

（しきい値テーブル部８２の構成）
図５に、しきい値テーブル部８２に保持されるしきい値テーブルの構成を示す。図５を参照して、しきい値テーブル１６０は、それぞれＬmax＝０、Ｌmax＝１、Ｌmax＝２、Ｌmax＝３、及びＬmax＞３に対応する５種類のしきい値Ｔ0、Ｔ1、Ｔ2、Ｔ3、及びＴ4を含む。しきい値Ｔ0、Ｔ1、Ｔ2、Ｔ3、及びＴ4はそれぞれ、図１に示すしきい値学習部８４による学習によって調整される。 (Configuration of threshold value table unit 82)
FIG. 5 shows the structure of the threshold table held in the threshold table unit 82. Referring to FIG. 5, the threshold table 160 includes five thresholds T0, T1, T2, T3, and T4, corresponding to Lmax = 0, Lmax = 1, Lmax = 2, Lmax = 3, and Lmax > 3, respectively. Each of the thresholds T0, T1, T2, T3, and T4 is adjusted through learning by the threshold learning unit 84 shown in FIG. 1.

（適合度評価部７０の構成）
図６に、適合度評価部７０の機能的構成をブロック図で示す。図６を参照して、適合度評価部７０は、入力された音声認識結果５０を記憶するための認識結果記憶部１８０と、認識結果記憶部１８０に格納された音声認識結果５０中の部分的な単語列（以下、「部分文」と呼ぶ。）とフレーズデータベース５６内のフレーズ１２０Ａ，…，１２０Ｑ（図３参照）とを照合して、フレーズデータベース５６内に一致するフレーズを持つ部分文を検出するための照合部１８２と、音声認識結果５０の各単語について、照合部１８２による検出結果をもとに最大一致長Ｌmaxを算出するためのＬmax算出部１８４と、Ｌmax算出部１８４により算出された各単語の最大一致長Ｌmaxを、音声認識結果５０の当該単語についての適合度として付与し、認識結果記憶部１８０内に適合度の評価結果７２を形成するための適合度付与部１８５と、適合度の評価結果７２が形成されると認識結果記憶部１８０からこれを読出して出力するための出力部１８６とを含む。 (Configuration of conformity evaluation unit 70)
FIG. 6 is a block diagram showing the functional configuration of the fitness evaluation unit 70. Referring to FIG. 6, the fitness evaluation unit 70 includes: a recognition result storage unit 180 for storing the input speech recognition result 50; a collation unit 182 for collating partial word strings (hereinafter referred to as "partial sentences") of the speech recognition result 50 stored in the recognition result storage unit 180 against the phrases 120A, ..., 120Q (see FIG. 3) in the phrase database 56 and detecting the partial sentences that have a matching phrase in the phrase database 56; an Lmax calculation unit 184 for calculating, for each word of the speech recognition result 50, the maximum match length Lmax based on the detection results of the collation unit 182; a fitness assigning unit 185 for assigning the maximum match length Lmax calculated by the Lmax calculation unit 184 to the corresponding word of the speech recognition result 50 as its fitness, thereby forming the fitness evaluation result 72 in the recognition result storage unit 180; and an output unit 186 for reading the fitness evaluation result 72 from the recognition result storage unit 180 and outputting it once it has been formed.

照合部１８２は、音声認識結果５０の単語列１０６から全ての部分文とその部分文の単語列１０６内での位置を表す位置標識とを生成するための部分文生成部１９０と、生成された部分文及びその位置標識を記憶するための部分文記憶部１９２と、部分文記憶部１９２内の部分文に一致するフレーズをフレーズデータベース５６内で探索し、一致するフレーズがあればその部分文の位置標識を一致部分の位置標識として出力するための探索部１９４と、探索部１９４から出力される位置標識を記憶するための一致部分記憶部１９６とを含む。一致部分記憶部１９６には、探索部１９４から出力された全ての位置標識が格納される。すなわち、一致部分記憶部１９６には、全ての一致部分の位置標識が格納される。 The collation unit 182 includes: a partial sentence generation unit 190 for generating, from the word string 106 of the speech recognition result 50, all partial sentences together with position indicators representing the position of each partial sentence within the word string 106; a partial sentence storage unit 192 for storing the generated partial sentences and their position indicators; a search unit 194 for searching the phrase database 56 for a phrase that matches a partial sentence in the partial sentence storage unit 192 and, if a matching phrase exists, outputting the position indicator of that partial sentence as the position indicator of a matching part; and a matching part storage unit 196 for storing the position indicators output from the search unit 194. The matching part storage unit 196 stores all position indicators output from the search unit 194; that is, it stores the position indicators of all matching parts.

Ｌmax算出部１８４は、一致部分記憶部１９６内の位置標識をもとに、認識結果記憶部１８０内の単語列１０６の各単語について、当該単語を含む一致部分の集合を求める。Ｌmax算出部１８４はさらに、当該集合に含まれる一致部分の各々のフレーズ長を求め、そのフレーズ長から、当該単語についての最大一致長Ｌmaxを求めて、適合度付与部１８５に対し出力する。ここでは、ある単語について一致部分の集合が求められれば、当該単語に対する最大一致長が求められるという関数関係が存在する。 Based on the position indicator in the matching part storage unit 196, the Lmax calculation unit 184 determines a set of matching parts including the word for each word in the word string 106 in the recognition result storage unit 180. The Lmax calculating unit 184 further obtains the phrase length of each matching portion included in the set, obtains the maximum matching length Lmax for the word from the phrase length, and outputs the maximum matching length Lmax to the matching degree assigning unit 185. Here, there is a functional relationship in which if a set of matching parts is obtained for a certain word, the maximum matching length for the word is obtained.

（合否決定部７４の構成）
図７に、合否決定部７４（図１参照）の機能的構成をブロック図で示す。図７を参照して、合否決定部７４は、適合度評価部７０から出力された適合度の評価結果７２を記憶するための記憶部２６０と、記憶部２６０内の適合度の評価結果７２から処理対象の単語を順次選択し、当該単語のＧＷＰＰ及び最大一致長Ｌmaxを順次出力するための単語選択部２７０と、単語選択部２７０から出力された最大一致長Ｌmaxに応じたしきい値をしきい値テーブル１６０（図５参照）から選択して出力するためのしきい値設定部２７２と、ＧＷＰＰとしきい値設定部２７２の出力するしきい値とを比較し、その結果を順次出力するための比較部２７４とを含む。合否決定部７４はさらに、比較部２７４により出力された単語Ｗ₁〜Ｗ_Nに関する比較の結果を、単語Ｗ₁〜Ｗ_Nについての検査結果として記憶部２６０内の適合度の評価結果７２に付与することにより、検査結果７６を生成するための検査結果付与部２６４と、検査結果付与部２６４が検査結果７６を生成すると、当該検査結果７６を記憶部２６０から読出して出力するための出力部２６８とを含む。 (Configuration of the pass / fail determination unit 74)
FIG. 7 is a block diagram showing the functional configuration of the pass/fail determination unit 74 (see FIG. 1). Referring to FIG. 7, the pass/fail determination unit 74 includes: a storage unit 260 for storing the fitness evaluation result 72 output from the fitness evaluation unit 70; a word selection unit 270 for sequentially selecting the word to be processed from the fitness evaluation result 72 in the storage unit 260 and sequentially outputting the GWPP and the maximum match length Lmax of that word; a threshold setting unit 272 for selecting from the threshold table 160 (see FIG. 5) the threshold corresponding to the maximum match length Lmax output from the word selection unit 270 and outputting it; and a comparison unit 274 for comparing the GWPP with the threshold output by the threshold setting unit 272 and sequentially outputting the result. The pass/fail determination unit 74 further includes: an inspection result assigning unit 264 for generating the inspection result 76 by attaching the comparison results for the words W_1 to W_N output by the comparison unit 274 to the fitness evaluation result 72 in the storage unit 260 as the inspection results for those words; and an output unit 268 for reading the inspection result 76 from the storage unit 260 and outputting it once the inspection result assigning unit 264 has generated it.

（しきい値学習部８４の構成）
図８に、図１に示すしきい値学習部８４の機能的構成をブロック図で示す。図８を参照して、しきい値学習部８４は、スイッチ７８（図１参照）からしきい値学習部８４に与えられる検査結果７６に、対応する正解検査結果４０Ａ，…，４０Ｐを付与して正解付の学習用検査結果２８２を生成するための正解付与部２８０と、生成された正解付の学習用検査結果２８２を記憶するための学習用検査結果記憶部２８４と、学習用検査結果記憶部２８４に格納された正解付の学習用検査結果２８２から、各単語の検査結果の正誤を正解検査結果に基づいて判定してその結果２８８を出力するための正誤判定部２８６と、正誤判定の結果２８８を記憶するための正誤記憶部２９０とを含む。 (Configuration of threshold learning unit 84)
FIG. 8 is a block diagram showing a functional configuration of the threshold learning unit 84 shown in FIG. Referring to FIG. 8, threshold learning unit 84 assigns corresponding correct test results 40A,..., 40P to test result 76 given from switch 78 (see FIG. 1) to threshold learning unit 84. A correct answer assigning unit 280 for generating a learning test result 282 with correct answer, a learning test result storage unit 284 for storing the generated learning test result 282 with correct answer, and a learning test result storage A correct / incorrect determination unit 286 for determining the correctness / incorrectness of the test result of each word based on the correct answer test result from the learning test result 282 with correct answer stored in the unit 284, and for the correct / incorrect determination And a correct / incorrect storage 290 for storing the result 288.

しきい値学習部８４はさらに、正誤判定部２８６による正誤の判定が完了すると、正誤記憶部２９０に格納された正誤判定の結果２８８をもとに、合否決定部７４（図１参照）による検査の性能を評価し、その結果に応じて、正誤記憶部２９０内の情報を用いてしきい値テーブル１６０（図５参照）内のしきい値Ｔ0〜Ｔ4の値を調整するためのしきい値調整部２９４と、しきい値が調整されると、調整後のしきい値をもとに学習用検査結果記憶部２８４内の学習用検査結果２８２について検査を再実施するための再検査部２９６とを含む。 Further, when the correctness determination by the correctness determination unit 286 is completed, the threshold value learning unit 84 performs an inspection by the pass / fail determination unit 74 (see FIG. 1) based on the correctness determination result 288 stored in the correctness / incorrectness storage unit 290. Threshold values for adjusting the values of threshold values T0 to T4 in threshold value table 160 (see FIG. 5) using information in correctness / error storage unit 290 according to the result When the threshold value is adjusted, the adjustment unit 294 and the re-inspection unit 296 for re-inspecting the learning test result 282 in the learning test result storage unit 284 based on the adjusted threshold value. Including.

学習用検査結果記憶部２８４に格納される学習用検査結果２８２は、学習用音声３８Ａ，…，３８Ｐの各々に関する検査結果７６と、学習用音声３８Ａ，…，３８Ｐに対応して予め用意された正解検査結果４０Ａ，…，４０Ｐのうち、当該検査結果に対応するものとの組からなる。正解検査結果は、各単語の各検査結果に対応する正解を含む。正誤判定部２８６は、学習用検査結果記憶部２８４内の情報に変化が生じると、それに応答して正誤判定を開始する機能を持つ。この判定により生成される正誤判定の結果２８８は、正解検査結果の単語列に対応するＧＷＰＰの系列１０８（図２参照）及び最大一致長Ｌmaxの系列１４０（図４参照）と、検査結果列１４４により表される各単語の検査結果の正誤を表す正誤標識からなる正誤標識列とを含む。正誤標識はそれぞれ、対応の単語についての検査結果が正解と一致したか否か、一致しなかった場合正解は何であったかを表す。すなわち、検査結果と正解とが一致したことを表す値と、合格（ＯＫ）にすべき単語を不合格（ＮＧ）にしていることを表す値と、不合格（ＮＧ）にすべき単語を合格（ＯＫ）にしていることを表す値とである。しきい値調整部２９４は、最大一致長Ｌmax別に、合否決定部７４（図１参照）による検査の性能を、次に示す信頼度の判定誤り率（Confidence Error Rate：ＣＥＲ）によって評価する機能を持つ。 The learning test result 282 stored in the learning test result storage unit 284 consists of a pair: the inspection result 76 for one of the learning speech items 38A, ..., 38P, and the one among the correct test results 40A, ..., 40P, prepared in advance for the learning speech 38A, ..., 38P, that corresponds to that inspection result. A correct test result contains the correct answer for the inspection result of each word. The correctness determination unit 286 has a function of starting the correctness determination in response to any change in the information in the learning test result storage unit 284. The correctness determination result 288 generated by this determination includes the GWPP sequence 108 (see FIG. 2) and the maximum match length Lmax sequence 140 (see FIG. 4) corresponding to the word string of the correct test result, and a correctness indicator sequence consisting of indicators that show whether the inspection result of each word in the inspection result sequence 144 is right or wrong. Each correctness indicator shows whether the inspection result for the corresponding word matched the correct answer and, if it did not, what the correct answer was. That is, it takes one of three values: a value indicating that the inspection result matched the correct answer, a value indicating that a word that should have passed (OK) was rejected (NG), and a value indicating that a word that should have been rejected (NG) was passed (OK). The threshold adjustment unit 294 has a function of evaluating, for each maximum match length Lmax, the performance of the inspection by the pass/fail determination unit 74 (see FIG. 1) in terms of the confidence error rate (CER) of the reliability judgment.
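
Assuming that the definition of CER commonly used for confidence measures is intended here, the rate for one value of Lmax can be written as follows. The symbols are illustrative and are not taken from the specification: N_FA is the number of words that were passed (OK) although they should have been rejected, N_FR is the number of words that were rejected (NG) although they should have passed, and N is the total number of inspected words having that Lmax.

```latex
\mathrm{CER} = \frac{N_{\mathrm{FA}} + N_{\mathrm{FR}}}{N}
```

Under this assumption, the two error values of the correctness indicator correspond to N_FR and N_FA respectively.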

しきい値調整部２９４はさらに、ＣＥＲがそれぞれ最小になるように、しきい値Ｔ0〜Ｔ4の各々を調整する機能を持つ。

The threshold adjustment unit 294 further has a function of adjusting each of the thresholds T0 to T4 so that the CER is minimized.

［動作］
本実施の形態の音声翻訳システム３０は、以下のように動作する。 [Operation]
The speech translation system 30 according to the present embodiment operates as follows.

（翻訳モデルの学習及びフレーズデータの抽出）
図１を参照して、音声翻訳システム３０は、翻訳又はしきい値の学習を行なう前に、予め翻訳モデルの学習を行なう。すなわち、翻訳モデル学習装置５４は、バイリンガルコーパス３４に格納されている対訳文から翻訳モデルの学習を行ない、得られた翻訳モデルを翻訳モデル部５２に格納する。翻訳モデル学習装置５４はこの学習過程において、バイリンガルコーパス３４内に含まれる文から、ソース言語のフレーズ１２０Ａ，…，１２０Ｑ及びターゲット言語のフレーズ１２２Ａ，…，１２２Ｑ（図３参照）を生成する。生成されたフレーズの各々は、フレーズデータベース５６に格納される。 (Learning translation models and extracting phrase data)
Referring to FIG. 1, speech translation system 30 learns a translation model in advance before translation or threshold learning. That is, the translation model learning device 54 learns the translation model from the parallel translation stored in the bilingual corpus 34 and stores the obtained translation model in the translation model unit 52. In this learning process, the translation model learning device 54 generates source language phrases 120A, ..., 120Q and target language phrases 122A, ..., 122Q (see FIG. 3) from sentences included in the bilingual corpus 34. Each of the generated phrases is stored in the phrase database 56.

（しきい値の学習）
以下、しきい値学習部８４がしきい値を学習する動作について説明する。図１を参照して、しきい値の学習が選択され、その選択に対応する操作入力４２が音声翻訳システム３０に与えられると、セレクタ４４が、処理対象の音声４６として学習用音声３８Ａ，…，３８Ｐを選択する。選択された音声４６は、音声認識装置４８により音声認識結果５０に変換される。さらに検査装置６０により音声認識結果５０に対する検査が実行され、検査結果７６（図４（Ｂ）参照）が検査装置６０より出力される。しきい値を学習する場合、スイッチ７８は、検査結果７６をしきい値学習部８４に出力する。なお、音声認識装置４８及び検査装置６０の動作の詳細については、後述する。 (Learning threshold)
The operation by which the threshold learning unit 84 learns the thresholds is described below. Referring to FIG. 1, when threshold learning is selected and the corresponding operation input 42 is given to the speech translation system 30, the selector 44 selects the learning speech 38A, ..., 38P as the speech 46 to be processed. The selected speech 46 is converted into a speech recognition result 50 by the speech recognition device 48. The inspection device 60 then inspects the speech recognition result 50 and outputs an inspection result 76 (see FIG. 4(B)). During threshold learning, the switch 78 outputs the inspection result 76 to the threshold learning unit 84. The operations of the speech recognition device 48 and the inspection device 60 are described in detail later.

図８を参照して、しきい値学習部８４の正解付与部２８０に検査結果７６が与えられると、正解付与部２８０は、与えられた検査結果７６に、正解検査結果４０Ａ，…，４０Ｐから、この検査結果７６に対応するものを付与する。その結果、正解付の学習用検査結果２８２が生成される。生成された正解付の学習用検査結果２８２は、学習用検査結果記憶部２８４に格納される。 Referring to FIG. 8, when the inspection result 76 is given to the correct answer assigning unit 280 of the threshold learning unit 84, the correct answer assigning unit 280 attaches to it the one among the correct test results 40A, ..., 40P that corresponds to this inspection result 76. As a result, a learning test result 282 with correct answers is generated. The generated learning test result 282 is stored in the learning test result storage unit 284.

学習用検査結果記憶部２８４に正解付の学習用検査結果２８２が格納されると、正誤判定部２８６は、検査結果列１４４と対応の正解検査結果との比較により、各単語についての検査結果の正誤を判定し、正誤標識列を生成する。そして正誤判定部２８６は、正誤標識列と、学習用検査結果２８２内のＧＷＰＰの系列１０８及び最大一致長Ｌmaxの系列１４０とから、正誤判定の結果２８８を形成し、正誤記憶部２９０に格納する。 When the learning test result 282 with the correct answer is stored in the learning test result storage unit 284, the correctness / incorrectness determination unit 286 compares the test result string 144 with the corresponding correct test result to determine the test result for each word. Correct / incorrect is determined and a correct / incorrect indicator string is generated. The correctness determination unit 286 forms a correctness / incorrectness determination result 288 from the correctness / incorrectness indicator string, the GWPP sequence 108 and the maximum match length Lmax sequence 140 in the learning test result 282, and stores the result in the correctness / incorrectness storage unit 290. .

正誤判定部２８６が、学習用検査結果記憶部２８４内にある正解付の学習用検査結果２８２に対する以上の処理を終了すると、しきい値調整部２９４に対して終了信号を与える。しきい値調整部２９４は、正誤記憶部２９０内にある正誤判定の結果２８８をもとに、適合度ごとにＣＥＲを算出し、適合度ごとのＣＥＲがそれぞれ最小になるように、しきい値テーブル１６０内のしきい値Ｔ0〜Ｔ4を調整する。 When the correctness / incorrectness determination unit 286 finishes the above processing for the learning test result 282 with correct answer in the learning test result storage unit 284, it gives an end signal to the threshold adjustment unit 294. The threshold adjustment unit 294 calculates the CER for each goodness of fit based on the correctness / incorrectness determination result 288 in the correctness / incorrectness storage unit 290, and sets the threshold so that the CER for each goodness of fit is minimized. The threshold values T0 to T4 in the table 160 are adjusted.

しきい値テーブル１６０内のしきい値が変更されると、再検査部２９６は、変更後のしきい値をもとに、学習用検査結果記憶部２８４に格納された学習用の検査結果に関する再検査を実施して、学習用検査結果記憶部２８４内に格納された検査結果を変更する。正誤判定部２８６は、学習用検査結果記憶部２８４内の情報に変化が生じると、再度正誤判定を行なう。以上のような動作を繰返し、ＣＥＲが最小化していれば、しきい値の調整を終了する。このような一連の動作により、しきい値テーブル１６０内のしきい値Ｔ0〜Ｔ4は、対応する適合度の単語について検査を行なった場合の検査誤りの最も少ないしきい値となる。 When the threshold value in the threshold value table 160 is changed, the re-inspection unit 296 relates to the learning inspection result stored in the learning inspection result storage unit 284 based on the changed threshold value. A re-inspection is performed, and the inspection result stored in the learning inspection result storage unit 284 is changed. When the information in the learning test result storage unit 284 changes, the correctness / incorrectness determination unit 286 performs correctness / incorrectness determination again. If the above operation is repeated and the CER is minimized, the adjustment of the threshold value is finished. Through such a series of operations, the threshold values T0 to T4 in the threshold value table 160 become the threshold values with the least number of inspection errors when the corresponding words of the matching degree are inspected.
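
The effect of the threshold learning can be sketched as follows. This is not the iterative adjust, re-inspect, and re-judge loop described above; it swaps in a plain exhaustive search that, for each Lmax bucket, tries every observed GWPP value as a candidate threshold and keeps the one with the lowest CER. The CER is assumed here to be the fraction of words whose pass/fail decision disagrees with the correct test result. The function names, the sample data, and the use of `None` for buckets without data are all hypothetical.

```python
def cer(gwpp, correct, threshold):
    """Fraction of words whose decision (GWPP >= threshold) disagrees with the label."""
    errors = sum((p >= threshold) != bool(c) for p, c in zip(gwpp, correct))
    return errors / len(gwpp)


def tune_thresholds(samples, num_buckets=5):
    """samples: iterable of (gwpp, lmax, correct) with correct = 1 (OK) or 0 (NG).

    Returns one threshold per bucket (Lmax = 0, 1, 2, 3, and > 3 by default).
    """
    buckets = [[] for _ in range(num_buckets)]
    for p, lmax, c in samples:
        buckets[min(lmax, num_buckets - 1)].append((p, c))

    thresholds = []
    for bucket in buckets:
        if not bucket:
            thresholds.append(None)  # no learning data for this Lmax
            continue
        ps = [p for p, _ in bucket]
        cs = [c for _, c in bucket]
        # Candidates: each observed GWPP value, plus "reject everything".
        candidates = sorted(set(ps)) + [float("inf")]
        thresholds.append(min(candidates, key=lambda t: cer(ps, cs, t)))
    return thresholds


# Hypothetical learning data: (GWPP, Lmax, correct label).
samples = [
    (0.9, 0, 1), (0.8, 0, 0), (0.3, 0, 0),
    (0.2, 4, 1), (0.1, 4, 0), (0.6, 9, 1),
]
print(tune_thresholds(samples))  # [0.9, None, None, None, 0.2]
```

With this sample data, words with no matching phrase (Lmax = 0) end up with a stricter threshold than words covered by long phrases, which is the behaviour the per-Lmax table is meant to allow.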

（音声認識）
再び図１を参照して、翻訳用音声３２に対する音声翻訳に対応する操作入力４２が音声翻訳システム３０に与えられると、音声翻訳システム３０は、音声翻訳を行なうための一連の動作を開始する。この場合、セレクタ４４は、処理対象の音声４６として翻訳用音声３２を選択する。 (voice recognition)
Referring to FIG. 1 again, when operation input 42 corresponding to speech translation for translation speech 32 is given to speech translation system 30, speech translation system 30 starts a series of operations for performing speech translation. In this case, the selector 44 selects the translation voice 32 as the voice 46 to be processed.

セレクタ４４から音声認識装置４８に音声４６が与えられると、音声認識装置４８は図２を用いて前述した一連の処理を実行する。すなわち、図２を参照して、音声認識装置４８はまず、与えられた音声４６の音声認識処理１００を行なって、単語グラフ１０２を生成する。音声認識装置４８はさらに、単語グラフ１０２の経路網を構成する各単語について、事後確率を算出し、さらにＧＷＰＰを算出する。続いて音声認識装置４８は、経路選択１０４を行なって単語列１０６を生成すると共に、当該単語列１０６に対応するＧＷＰＰの系列１０８を生成する。音声認識装置４８は、生成された単語列１０６及び対応するＧＷＰＰの系列１０８の組を音声認識結果５０として出力する。 When the voice 46 is given from the selector 44 to the voice recognition device 48, the voice recognition device 48 executes the series of processes described above with reference to FIG. That is, referring to FIG. 2, the speech recognition device 48 first performs speech recognition processing 100 of the given speech 46 to generate a word graph 102. The speech recognition device 48 further calculates a posterior probability for each word constituting the route network of the word graph 102, and further calculates a GWPP. Subsequently, the voice recognition device 48 performs route selection 104 to generate a word string 106 and also generates a GWPP sequence 108 corresponding to the word string 106. The speech recognition device 48 outputs a set of the generated word string 106 and the corresponding GWPP sequence 108 as a speech recognition result 50.

（適合度の評価）
以下に、検査装置６０の適合度評価部７０（図１参照）が、音声認識結果５０から適合度の評価結果７２（図４（Ａ）参照）を生成する動作について説明する。図６を参照して、適合度評価部７０に音声認識結果５０が入力されると、当該音声認識結果５０は、認識結果記憶部１８０に格納される。部分文生成部１９０は、認識結果記憶部１８０内の音声認識結果５０の単語列１０６（図２参照）から、部分的な単語列とその位置標識とからなる部分文を生成し、部分文記憶部１９２に格納する。部分文記憶部１９２に全ての部分文が格納されると、探索部１９４は、部分文の各々について次の処理により、各部分文に一致するフレーズの探索を行なう。すなわち、探索部１９４はまず、処理対象の部分文を選択し、当該部分文と同じフレーズをフレーズデータベース５６で探索する。同じフレーズが存在すれば、この部分的な単語列は一致部分の一つとなる。この場合、探索部１９４は、処理対象の部分文の位置標識を一致部分記憶部１９６に格納する。同じフレーズがなければ、別の部分文を選択する。探索部１９４による探索が完了すると、一致部分記憶部１９６には、一致部分の各々の単語列１０６内での位置が、格納された位置標識により特定できるようになる。 (Evaluation of conformity)
Hereinafter, an operation in which the fitness evaluation unit 70 (see FIG. 1) of the inspection apparatus 60 generates the fitness evaluation result 72 (see FIG. 4A) from the speech recognition result 50 will be described. Referring to FIG. 6, when the speech recognition result 50 is input to the fitness evaluation unit 70, the speech recognition result 50 is stored in the recognition result storage unit 180. The partial sentence generation unit 190 generates a partial sentence including a partial word string and its position indicator from the word string 106 (see FIG. 2) of the speech recognition result 50 in the recognition result storage unit 180, and stores the partial sentence. Stored in the unit 192. When all the partial sentences are stored in the partial sentence storage unit 192, the search unit 194 searches for a phrase that matches each partial sentence by the following process for each partial sentence. That is, the search unit 194 first selects a partial sentence to be processed, and searches the phrase database 56 for the same phrase as the partial sentence. If the same phrase exists, this partial word string becomes one of the matching parts. In this case, the search unit 194 stores the position indicator of the partial sentence to be processed in the matching part storage unit 196. If there is no same phrase, select another partial sentence. When the search by the search unit 194 is completed, the position of each matching part in the word string 106 can be specified in the matching part storage unit 196 by the stored position indicator.
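
The generation of partial sentences and the search for matching phrases can be sketched as below, assuming the source-language side of the phrase database is held as a set of word tuples and a position indicator is a pair (start, end) of word indices with the end exclusive. The function name `find_matching_spans`, the optional `max_len` limit, and the sample phrases are hypothetical and serve only as an illustration.

```python
def find_matching_spans(words, phrase_set, max_len=None):
    """Return the (start, end) spans (end exclusive) whose words form a known phrase."""
    n = len(words)
    spans = []
    for start in range(n):
        # Optionally cap the partial-sentence length to bound the search.
        limit = n if max_len is None else min(n, start + max_len)
        for end in range(start + 1, limit + 1):
            if tuple(words[start:end]) in phrase_set:
                spans.append((start, end))
    return spans


# Hypothetical source-language phrases and recognition result.
phrases = {
    ("i",),
    ("would", "like"),
    ("i", "would", "like"),
    ("a", "cup", "of", "tea"),
}
words = ["i", "would", "like", "a", "cup", "of", "coffee"]
spans = find_matching_spans(words, phrases)
print(spans)  # [(0, 1), (0, 3), (1, 3)]
```

Enumerating every contiguous span is quadratic in the sentence length, which is usually acceptable for short utterances; the specification does not say how the search is implemented.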

全ての一致部分の位置標識が一致部分記憶部１９６に格納されると、Ｌmax算出部１８４は、認識結果記憶部１８０内の音声認識結果５０における単語列１０６中の単語Ｗ₁〜Ｗ_Nについて次の処理を行ない、各単語の適合度を決定する。すなわちまず、処理対象の単語Ｗ_nを選び、単語Ｗ_nの位置を特定する。単語Ｗ_nの位置が含まれる全ての位置標識を一致部分記憶部１９６の中で選ぶ。この結果、実質的に単語Ｗ_nに関する一致部分の集合が作成される。選ばれた位置標識をもとに、単語Ｗ_nを含む全ての一致部分についてフレーズ長を求める。すなわち、単語Ｗ_nに関する一致部分の集合の要素の全てについてフレーズ長を求める。求められたフレーズ長のうち最大のものを探すことにより最大一致長Ｌmaxを求めて、適合度付与部１８５に出力する。 Once the position indicators of all matching parts have been stored in the matching part storage unit 196, the Lmax calculation unit 184 performs the following processing on the words W_1 to W_N in the word string 106 of the speech recognition result 50 in the recognition result storage unit 180 to determine the fitness of each word. First, it selects a word W_n to be processed and identifies its position. It then selects from the matching part storage unit 196 every position indicator that covers the position of the word W_n. This in effect creates the set of matching parts for the word W_n. Based on the selected position indicators, it obtains the phrase length of every matching part that contains the word W_n, that is, of every element of the set of matching parts for the word W_n. It takes the largest of these phrase lengths as the maximum match length Lmax and outputs it to the fitness assigning unit 185.
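
The calculation of Lmax from the stored position indicators can be sketched as follows, assuming each position indicator is a pair (start, end) of word indices with the end exclusive. The function name `max_match_lengths` and the sample spans are hypothetical. A word covered by no matching part keeps Lmax = 0, as stated for the present embodiment.

```python
def max_match_lengths(num_words, spans):
    """For each word position, return the length of the longest matching span covering it."""
    lmax = [0] * num_words
    for start, end in spans:
        length = end - start  # phrase length in words
        for i in range(start, end):
            lmax[i] = max(lmax[i], length)
    return lmax


# Hypothetical position indicators for a seven-word recognition result.
spans = [(0, 1), (0, 3), (1, 3)]
lmax = max_match_lengths(7, spans)
print(lmax)  # [3, 3, 3, 0, 0, 0, 0]
```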

以上の処理により決定された各単語の適合度は、図６に示す適合度付与部１８５に与えられる。適合度付与部１８５は、認識結果記憶部１８０内の単語Ｗ₁〜Ｗ_Nにその単語の最大一致長Ｌmaxを適合度として付与する。これにより認識結果記憶部１８０内に、図４（Ａ）に示す適合度の評価結果７２が形成される。適合度付与部１８５は、単語Ｗ₁〜Ｗ_Nについてこの処理が完了すると、終了信号を出力部１８６に与える。出力部１８６は、これに応答して、認識結果記憶部１８０内の情報を読出して出力する。その結果、適合度の評価結果７２が出力されることになる。 The fitness of each word determined by the above processing is given to the fitness assigning unit 185 shown in FIG. 6. The fitness assigning unit 185 attaches to each of the words W_1 to W_N in the recognition result storage unit 180 that word's maximum match length Lmax as its fitness. The fitness evaluation result 72 shown in FIG. 4(A) is thereby formed in the recognition result storage unit 180. When this processing has been completed for the words W_1 to W_N, the fitness assigning unit 185 sends an end signal to the output unit 186. In response, the output unit 186 reads the information in the recognition result storage unit 180 and outputs it. As a result, the fitness evaluation result 72 is output.

（合否の決定）
以下に、合否決定部７４（図１参照）が、適合度の評価結果７２と信頼度とをもとに各単語を検査する動作について説明する。図７を参照して、合否決定部７４に適合度の評価結果７２が入力されると、当該適合度の評価結果７２は、記憶部２６０に記憶される。単語選択部２７０は、適合度の評価結果７２における単語Ｗ₁〜Ｗ_Nの中から処理対象の単語Ｗ_nを順次選択し、当該単語Ｗ_nの適合度とＧＷＰＰとをそれぞれ、しきい値設定部２７２と比較部２７４とに与える。 (Decision of pass / fail)
The operation by which the pass/fail determination unit 74 (see FIG. 1) inspects each word based on the fitness evaluation result 72 and the reliability is described below. Referring to FIG. 7, when the fitness evaluation result 72 is input to the pass/fail determination unit 74, it is stored in the storage unit 260. The word selection unit 270 sequentially selects the word W_n to be processed from among the words W_1 to W_N in the fitness evaluation result 72, and gives the fitness of the word W_n to the threshold setting unit 272 and its GWPP to the comparison unit 274.

単語Ｗ_nの適合度が与えられると、しきい値設定部２７２は、しきい値テーブル１６０（図５参照）から、その適合度に対応するしきい値を読出して、比較部２７４に与える。例えば、図４（Ｂ）に示すｎ番目の単語Ｗ_nの最大一致長Ｌmaxは１である。図５に示すしきい値テーブル１６０において、Ｌmax＝１の単語に関するしきい値はＴ1である。したがって、しきい値設定部２７２は、しきい値Ｔ1を比較部２７４に与える。比較部２７４は、単語Ｗ_nのＧＷＰＰとしきい値とを比較する。単語Ｗ_nのＧＷＰＰがしきい値Ｔ1以上であれば、合格（ＯＫ）を表す第１の値を出力する。さもなければ不合格（ＮＧ）を表す第２の値を出力する。 Given the fitness of the word W_n, the threshold setting unit 272 reads the threshold corresponding to that fitness from the threshold table 160 (see FIG. 5) and gives it to the comparison unit 274. For example, the maximum match length Lmax of the n-th word W_n shown in FIG. 4(B) is 1. In the threshold table 160 shown in FIG. 5, the threshold for words with Lmax = 1 is T1. The threshold setting unit 272 therefore gives the threshold T1 to the comparison unit 274. The comparison unit 274 compares the GWPP of the word W_n with the threshold. If the GWPP of the word W_n is greater than or equal to the threshold T1, it outputs a first value representing pass (OK); otherwise, it outputs a second value representing fail (NG).
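
The threshold selection and comparison for each word can be sketched as follows. The thresholds are held in a list [T0, T1, T2, T3, T4], where the last entry serves every Lmax greater than 3, and a word passes when its GWPP is greater than or equal to the selected threshold. The function name `decide` and all numeric values are placeholders chosen for illustration; the specification gives no concrete threshold values.

```python
def decide(gwpp, lmax, thresholds):
    """Return 1 (OK) or 0 (NG) for each word, given its GWPP and its Lmax."""
    results = []
    for p, length in zip(gwpp, lmax):
        # Lmax = 0..3 index T0..T3 directly; any larger Lmax uses the last threshold.
        t = thresholds[min(length, len(thresholds) - 1)]
        results.append(1 if p >= t else 0)
    return results


thresholds = [0.9, 0.7, 0.5, 0.4, 0.3]  # placeholder values for T0..T4
gwpp = [0.95, 0.60, 0.45, 0.85]
lmax = [3, 1, 5, 0]
decisions = decide(gwpp, lmax, thresholds)
print(decisions)  # [1, 0, 1, 0]
```

In this example the third word passes with a GWPP of only 0.45 because it lies inside a long matching phrase, while the fourth word fails despite a GWPP of 0.85 because no phrase in the database covers it.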

以上のようにして比較部２７４から出力される値が、順次図７に示す検査結果付与部２６４に与えられると、検査結果付与部２６４は、その値をもとに検査結果列１４４（図４（Ｂ）参照）を形成して、記憶部２６０内の適合度の評価結果７２に付与する。これにより記憶部２６０内に、図４（Ｂ）に示す検査結果７６が形成される。検査結果付与部２６４は、この処理が完了すると、終了信号２６６を出力部２６８に与える。出力部２６８は、これに応答して記憶部２６０内の情報を読出して出力する。その結果、検査結果７６が図１に示すスイッチ７８に出力される。スイッチ７８は、検査結果７６の入力を受けると、その時点で選択されている処理に対応する出力に対して、当該検査結果７６を出力する。この場合スイッチ７８は、検査結果７６を前処理装置６４に対して出力する。 When the values output from the comparison unit 274 in this way are given one after another to the inspection result assigning unit 264 shown in FIG. 7, the inspection result assigning unit 264 forms the inspection result sequence 144 (see FIG. 4(B)) from those values and attaches it to the fitness evaluation result 72 in the storage unit 260. The inspection result 76 shown in FIG. 4(B) is thereby formed in the storage unit 260. When this processing is complete, the inspection result assigning unit 264 sends an end signal 266 to the output unit 268. In response, the output unit 268 reads the information in the storage unit 260 and outputs it. As a result, the inspection result 76 is output to the switch 78 shown in FIG. 1. On receiving the inspection result 76, the switch 78 passes it to the output corresponding to the processing selected at that time. In this case, the switch 78 outputs the inspection result 76 to the preprocessing device 64.

（前処理及び翻訳） (Pre-processing and translation)
図１に示す前処理装置６４は、スイッチ７８から検査結果７６の入力を受けると、検査結果７６をもとに、前処理を行なう。本実施の形態では、検査結果が不合格（ＮＧ）の単語を、音声認識結果の単語列Ｗ₁〜Ｗ_Nから除去して、翻訳用の音声認識結果を生成する。翻訳装置６８は、翻訳モデル部５２に記憶されているフレーズベースの翻訳モデルを用いた統計翻訳により、翻訳結果３６を生成する。
When the preprocessing device 64 shown in FIG. 1 receives the inspection result 76 from the switch 78, it performs preprocessing based on the inspection result 76. In the present embodiment, words whose inspection result is fail (NG) are removed from the word string W_1 to W_N of the speech recognition result to generate a speech recognition result for translation. The translation device 68 generates the translation result 36 by statistical translation using the phrase-based translation model stored in the translation model unit 52.
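The removal of rejected words described above can be sketched in a few lines. The function name and the use of the strings "OK" and "NG" as result labels are assumptions for illustration; the patent only specifies first and second values representing pass and fail.

```python
from typing import List, Sequence


def remove_rejected(words: Sequence[str], results: Sequence[str]) -> List[str]:
    """Drop every word whose inspection result is "NG", keeping word order."""
    if len(words) != len(results):
        raise ValueError("each word needs exactly one inspection result")
    return [word for word, result in zip(words, results) if result == "OK"]
```

The returned list plays the role of the speech recognition result for translation that is handed to the translation device 68.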

［実験］ [Experiment]
本実施の形態に係る音声翻訳システム３０における単語の検査性能を評価するために、しきい値の学習実験と、当該学習実験により得られたフレーズテーブルを用いての音声認識結果の検査実験とを行なった。
To evaluate the word inspection performance of the speech translation system 30 according to the present embodiment, a threshold learning experiment and an inspection experiment on speech recognition results using the phrase table obtained by the learning experiment were conducted.

学習実験及び検査実験では、出願人により作成された旅行会話基本表現コーパス（Basic Travel Expression Corpus：ＢＴＥＣ）を使用して、学習実験用のデータセット及び検査実験用のデータセットを作成した。以下、これらをそれぞれ「学習セット」及び「テストセット」と呼ぶ。学習セットは、ＢＴＥＣの１０１６種類の発声からなり、当該データセット全体での単語数は延べ７２１５単語である。テストセットは、ＢＴＥＣの３０６０種類の発声からなり、当該データセット全体での単語数は延べ２１００５単語である。 For the learning experiment and the inspection experiment, a data set for each was created from the Basic Travel Expression Corpus (BTEC) created by the applicant. These are hereinafter referred to as the "learning set" and the "test set", respectively. The learning set consists of 1016 distinct BTEC utterances, with a total of 7215 words across the data set. The test set consists of 3060 distinct BTEC utterances, with a total of 21005 words across the data set.

本実験では、音声認識装置４８として、出願人において開発されたものを使用した。本実験では、フレーズデータベース５６もまた、出願人により予め作成されたものを用いた。このフレーズデータベース５６に格納されたフレーズは異なり数約８０万フレーズであった。フレーズ長Ｌ別の異なり数の内訳を、次のテーブル１に示す。 In this experiment, a speech recognition device developed by the applicant was used as the speech recognition device 48. The phrase database 56 was likewise one prepared in advance by the applicant. The phrase database 56 contained about 800,000 distinct phrases. Table 1 below shows the breakdown of the number of distinct phrases by phrase length L.

なお、このフレーズデータベース５６において、１フレーズあたりの平均単語数は３．４８単語であった。

In this phrase database 56, the average number of words per phrase was 3.48 words.

学習実験ではまず、学習セットに含まれる各発声に対する音声認識を行ない、各音声認識結果について単語ごとにＬmaxを算出し、さらに検査結果の正解を付与した。そして、正解に対する検査結果のＣＥＲが最小となるよう、Ｌmaxごとにしきい値の学習を行なった。この学習実験の結果得られたしきい値テーブルを次のテーブル２に示す。 In the learning experiment, speech recognition was first performed on each utterance in the learning set, Lmax was calculated for each word of each speech recognition result, and the correct inspection result was assigned to each word. The threshold was then learned for each Lmax so that the CER of the inspection results against the correct answers was minimized. The threshold table obtained from this learning experiment is shown in Table 2 below.
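The per-Lmax threshold learning described above can be illustrated with the sketch below. It swaps in a plain exhaustive search over candidate thresholds rather than reproducing the adjust-and-re-inspect procedure of the threshold learning unit 84. The sample format `(Lmax, GWPP, should_accept)` and all names are assumptions, and the error count per group stands in for the CER, since minimizing errors within a group of fixed size also minimizes its error rate.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[int, float, bool]  # (Lmax, GWPP, should_accept) -- assumed format


def count_errors(group: List[Tuple[float, bool]], threshold: float) -> int:
    """Count words whose accept/reject decision disagrees with the correct label."""
    return sum((gwpp >= threshold) != ok for gwpp, ok in group)


def learn_thresholds(samples: Iterable[Sample]) -> Dict[int, float]:
    """For each Lmax, pick the candidate threshold with the fewest errors."""
    by_lmax: Dict[int, List[Tuple[float, bool]]] = defaultdict(list)
    for lmax, gwpp, ok in samples:
        by_lmax[lmax].append((gwpp, ok))

    table: Dict[int, float] = {}
    for lmax, group in by_lmax.items():
        # Candidates: every observed GWPP, plus infinity (reject everything).
        candidates = sorted({gwpp for gwpp, _ in group}) + [float("inf")]
        # min() keeps the first (lowest) candidate when several tie.
        table[lmax] = min(candidates, key=lambda t: count_errors(group, t))
    return table
```

Restricting the candidates to observed GWPP values is sufficient here because the error count only changes when the threshold crosses one of them.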

テーブル２を参照して、Ｌmaxが大きな値になるにしたがい、しきい値が低くなった。特にＬmax＝０の単語に対するしきい値は、Ｌmax≧１のいずれの単語に対するしきい値よりもはるかに高い値になった。これにより、本実施の形態の検査装置による検査では、検査結果に次のような傾向が生じることが分かる。すなわち、一致するフレーズが長くなるにしたがいＧＷＰＰが低い単語まで合格とする傾向、及び一致するフレーズがない単語については非常に高いＧＷＰＰのもののみを許容する傾向である。これらの傾向は、Ｌmaxが大きな単語ほど翻訳に適しているという当初の予測に合致する。

Referring to Table 2, the threshold decreased as Lmax increased. In particular, the threshold for words with Lmax = 0 was far higher than the threshold for any word with Lmax ≧ 1. This shows that inspection by the inspection apparatus of the present embodiment has the following tendencies: the longer the matching phrase, the lower the GWPP at which a word is still passed, and a word with no matching phrase is accepted only when its GWPP is very high. These tendencies agree with the initial expectation that words with larger Lmax are better suited to translation.

検査実験では、テストセットに含まれる各発話に対する音声認識を行ない、各音声認識結果について、予め単語ごとに検査結果の正解とを付与した。この検査実験では、比較のため、次の２種類の方法で検査を行なった。すなわち一方は、本実施の形態の検査方法であり、テーブル２に示すしきい値テーブルを使用した検査である。他方は、Ｌmaxに関係なく全単語共通のしきい値を用いた検査である。以下、これらの検査方法をそれぞれ、単に「検査１」、「検査２」と呼ぶ。なお検査２では、全単語共通のしきい値を０．６２に設定した。この値は、Ｌmaxによらずにしきい値固定とし、ＣＥＲが最小となるように調整されたしきい値である。２種類の検査方法による検査結果についてそれぞれ、Ｌmax別にＣＥＲを求めた。また、テストセット全体のＣＥＲも求めた。テストセットに占める単語の割合と、２種類の検査方法による検査結果の各々についてのＣＥＲとを、次のテーブル３においてＬmax別に示す。 In the inspection experiment, speech recognition was performed on each utterance in the test set, and the correct inspection result was assigned in advance to each word of each speech recognition result. For comparison, inspection was performed by the following two methods. One is the inspection method of the present embodiment, which uses the threshold table shown in Table 2. The other uses a single threshold common to all words regardless of Lmax. These methods are hereinafter referred to simply as "inspection 1" and "inspection 2", respectively. In inspection 2, the common threshold was set to 0.62; this is the value that minimizes the CER when the threshold is held fixed regardless of Lmax. For the inspection results of each method, the CER was computed for each Lmax, and the CER over the entire test set was also computed. Table 3 below shows, for each Lmax, the proportion of words in the test set and the CER of each of the two inspection methods.

テーブル３を参照して、Ｌmax≧１の場合、検査１及び検査２の両方において、Ｌmaxが大きな値になるにしたがいＣＥＲが低くなった。また、検査１及び検査２の両方において、Ｌmax＝０の単語についてのＣＥＲは、Ｌmax＝１又は２の単語についてのＣＥＲより低くなった。

Referring to Table 3, for Lmax ≧ 1, the CER decreased as Lmax increased in both inspection 1 and inspection 2. Also, in both inspection 1 and inspection 2, the CER for words with Lmax = 0 was lower than the CER for words with Lmax = 1 or 2.

検査１及び検査２の検査結果についてのＣＥＲを比較すると、Ｌmax＞３、Ｌmax＝１、及びＬmax＝０の単語について、検査１の検査結果に対するＣＥＲは、検査２の結果に対するＣＥＲより比べて低下した。テストセット全体についてのＣＥＲを比較すると、検査１及び検査２に対するＣＥＲはそれぞれ、４．３（％）及び４．５（％）であった。したがって、ＧＷＰＰを評価尺度とする検査において、各単語の合否の判定基準として最適なしきい値は、Ｌmaxに応じて変化することが明らかとなった。さらに、学習により最適化されたしきい値を合否の判定基準として使用することにより、Ｌmaxが大きな単語及びＬmaxが小さな単語の両方、並びに検査対象の単語全体において、検査性能に改善が見られることが明らかとなった。このようにして性能の改善された検査方法で音声認識結果を検査し前処理を行なうことにより、音声認識と統計翻訳との連携が強化されるため、音声翻訳の処理の全体的な性能の向上が期待できる。 Comparing the CERs of inspection 1 and inspection 2, for words with Lmax > 3, Lmax = 1, and Lmax = 0, the CER of inspection 1 was lower than that of inspection 2. Over the entire test set, the CERs of inspection 1 and inspection 2 were 4.3% and 4.5%, respectively. This shows that, in inspection using GWPP as the evaluation measure, the optimal pass/fail threshold for each word changes with Lmax. It also shows that using thresholds optimized by learning as the pass/fail criterion improves inspection performance for words with large Lmax, for words with small Lmax, and for the inspected words as a whole. Inspecting speech recognition results with this improved method and then preprocessing them strengthens the coupling between speech recognition and statistical translation, so an improvement in the overall performance of speech translation processing can be expected.
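The comparison between the two inspection methods amounts to computing an error rate per Lmax and over all words for each thresholding rule. The sketch below shows one way to do that, assuming the CER is the fraction of words whose accept/reject decision disagrees with the correct label. The sample format, the function names, and the table values in the usage note are hypothetical; only the common threshold 0.62 comes from the text.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

Sample = Tuple[int, float, bool]  # (Lmax, GWPP, should_accept) -- assumed format


def cer_by_lmax(
    samples: Iterable[Sample], threshold_for: Callable[[int], float]
) -> Tuple[Dict[int, float], float]:
    """Return ({Lmax: error rate}, overall error rate) for one thresholding rule."""
    errors: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for lmax, gwpp, ok in samples:
        accepted = gwpp >= threshold_for(lmax)
        totals[lmax] += 1
        errors[lmax] += int(accepted != ok)
    if not totals:
        return {}, 0.0
    per_lmax = {lmax: errors[lmax] / totals[lmax] for lmax in totals}
    overall = sum(errors.values()) / sum(totals.values())
    return per_lmax, overall
```

For example, `cer_by_lmax(samples, lambda lmax: 0.62)` evaluates the fixed-threshold rule of inspection 2, while `cer_by_lmax(samples, lambda lmax: table[lmax])` evaluates a per-Lmax table in the manner of inspection 1.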

［コンピュータによる実現］ [Realization by computer]
本実施の形態の音声翻訳システム３０は、コンピュータハードウェアと、そのコンピュータハードウェアにより実行されるプログラムと、コンピュータハードウェアに格納されるデータとにより実現される。図９はこのコンピュータシステム３３０の外観を示し、図１０はコンピュータシステム３３０の内部構成を示す。
The speech translation system 30 according to the present embodiment is realized by computer hardware, a program executed by the computer hardware, and data stored in the computer hardware. FIG. 9 shows the external appearance of the computer system 330, and FIG. 10 shows the internal configuration of the computer system 330.

図９を参照して、このコンピュータシステム３３０は、コンピュータ３４０と、モニタ３４２と、キーボード３４６と、マウス３４８と、マイクロフォン３７０と、スピーカ３７２とを含む。コンピュータ３４０は、ＣＤ−ＲＯＭ（コンパクトディスク読出専用メモリ）ドライブ３５０及びＦＤ（フレキシブルディスク）ドライブ３５２を有する。 With reference to FIG. 9, the computer system 330 includes a computer 340, a monitor 342, a keyboard 346, a mouse 348, a microphone 370, and a speaker 372. The computer 340 includes a CD-ROM (Compact Disc Read Only Memory) drive 350 and an FD (Flexible Disc) drive 352.

図１０を参照して、コンピュータ３４０は、ＣＤ−ＲＯＭドライブ３５０及びＦＤドライブ３５２に加えて、ハードディスク３５４と、ＣＰＵ（中央処理装置）３５６と、ＦＤドライブ３５２、ＣＤ−ＲＯＭドライブ３５０、ハードディスク３５４、及びＣＰＵ３５６に接続されたバス３６６と、バス３６６に接続され、ブートアッププログラム等を記憶する読出専用メモリ（ＲＯＭ）３５８と、バス３６６に接続され、プログラム命令、システムプログラム、及び作業データ等を記憶するランダムアクセスメモリ（ＲＡＭ）３６０とを含む。コンピュータ３４０はさらに、バス３６６、マイクロフォン３７０、及びスピーカ３７２とに接続されたサウンドボード３６８を含む。ここでは示さないが、コンピュータ３４０はさらにローカルエリアネットワーク（ＬＡＮ）への接続を提供するネットワークアダプタボードを含んでもよい。 Referring to FIG. 10, in addition to the CD-ROM drive 350 and the FD drive 352, the computer 340 includes a hard disk 354, a CPU (Central Processing Unit) 356, a bus 366 connected to the FD drive 352, the CD-ROM drive 350, the hard disk 354, and the CPU 356, a read-only memory (ROM) 358 connected to the bus 366 and storing a boot-up program and the like, and a random access memory (RAM) 360 connected to the bus 366 and storing program instructions, system programs, work data, and the like. The computer 340 further includes a sound board 368 connected to the bus 366, the microphone 370, and the speaker 372. Although not shown here, the computer 340 may further include a network adapter board that provides a connection to a local area network (LAN).

コンピュータシステム３３０に音声翻訳システム３０としての動作を行なわせるためのプログラムは、ＣＤ−ＲＯＭドライブ３５０又はＦＤドライブ３５２に挿入されるＣＤ−ＲＯＭ３６２又はＦＤ３６４に記憶され、さらにハードディスク３５４に転送される。又は、プログラムは図示しないネットワークを通じてコンピュータ３４０に送信されハードディスク３５４に記憶されてもよい。プログラムは実行の際にＲＡＭ３６０にロードされる。プログラムは、ＣＤ−ＲＯＭ３６２から、ＦＤ３６４から、又はネットワークを介して、直接にＲＡＭ３６０にロードしてもよい。なお、バイリンガルコーパス３４（図１参照）は例えばハードディスク３５４に記憶され、翻訳モデル部５２の学習時にその必要部分が適宜ＲＡＭ３６０に読込まれる。学習により得られる翻訳モデル及びその過程で抽出されるフレーズもまた、例えばハードディスク３５４に記憶され、必要に応じて必要部分が適宜ＲＡＭ３６０に読込まれる。 A program for causing the computer system 330 to operate as the speech translation system 30 is stored in the CD-ROM 362 or FD 364 inserted into the CD-ROM drive 350 or FD drive 352 and further transferred to the hard disk 354. Alternatively, the program may be transmitted to the computer 340 through a network (not shown) and stored in the hard disk 354. The program is loaded into the RAM 360 when executed. The program may be loaded into the RAM 360 directly from the CD-ROM 362, from the FD 364, or via a network. The bilingual corpus 34 (see FIG. 1) is stored in, for example, the hard disk 354, and necessary parts thereof are appropriately read into the RAM 360 when the translation model unit 52 learns. The translation model obtained by learning and the phrases extracted in the process are also stored in the hard disk 354, for example, and necessary portions are appropriately read into the RAM 360 as necessary.

上記プログラムは、コンピュータ３４０を本実施の形態の音声翻訳システム３０として動作を行なわせる複数の命令を含む。この動作を行なわせるのに必要な基本的機能のいくつかはコンピュータ３４０上で動作するオペレーティングシステム（ＯＳ）若しくはサードパーティのプログラム、又はコンピュータ３４０にインストールされる各種ツールキットのモジュールにより提供される。したがって、このプログラムは本実施の形態のシステム及び方法を実現するのに必要な機能全てを必ずしも含まなくてよい。このプログラムは、命令のうち、所望の結果が得られるように制御されたやり方で適切な機能又は「ツール」を呼出すことにより、上記した音声翻訳システム３０としての動作を実行する命令のみを含んでいればよい。コンピュータシステム３３０の動作は周知であるので、ここでは繰返さない。 The program includes a plurality of instructions that cause the computer 340 to operate as the speech translation system 30 of the present embodiment. Some of the basic functions needed for this operation are provided by the operating system (OS) running on the computer 340, by third-party programs, or by modules of various toolkits installed on the computer 340. The program therefore does not necessarily have to include all the functions needed to realize the system and method of the present embodiment. It need only include those instructions that carry out the operation of the speech translation system 30 described above by calling the appropriate functions or "tools" in a controlled manner so as to obtain the desired result. The operation of the computer system 330 is well known and is not repeated here.

上記実施の形態では、各単語についての信頼度としてＧＷＰＰを用いた。しかし、検査装置による検査に用いることのできる信頼度は、ＧＷＰＰには限定されない。音声認識結果の単語列中の各単語に、所定の評価尺度で信頼度が付与されていれば、当該付与されている信頼度を用いて各単語を検査することができる。ただし、その場合、当該付与されている信頼度を用いて検査を行なうために、予めしきい値の学習をＬmax別に行なっておくことが必要である。又は何らかの学習によりＬmax別に決定されたしきい値をしきい値テーブルとして設定しておくことが必要となる。 In the above embodiment, GWPP is used as the reliability of each word. However, the reliability that can be used for inspection by the inspection apparatus is not limited to GWPP. If each word in the word string of the speech recognition result has been given a reliability on some predetermined evaluation scale, each word can be inspected using that reliability. In that case, however, the thresholds must be learned in advance for each Lmax so that inspection can be performed with that reliability. Alternatively, thresholds determined for each Lmax by some form of learning must be set in advance as the threshold table.

また、フレーズデータベースには、ソース言語及びターゲット言語のフレーズの対が格納されていた。しかし、本発明はこのような実施の形態には限定されない。検査装置６０は、ソース言語のフレーズを用いて適合性の評価を行なう。そのため、予めソース言語のフレーズのみを格納したデータベースを用意しておき、当該データベースを用いて適合性の評価を行なうようにしてもよい。 The phrase database stores pairs of phrases in the source language and the target language. However, the present invention is not limited to such an embodiment. The inspection device 60 evaluates the suitability using the source language phrase. Therefore, a database in which only the source language phrases are stored in advance may be prepared, and the suitability may be evaluated using the database.

本実施の形態では、フレーズとの照合の対象として、音声認識結果から全ての組合せで部分文を作成した。しかし、本発明はこのような実施の形態には限定されない。例えば、フレーズデータベースにおいて、各フレーズの出現頻度等の確率情報が得られるならば、当該確率情報に基づき、音声認識結果にフレーズ単位のセグメンテーションを行なってもよい。このようにすることにより、フレーズのオーバーラップを回避して、効率的にＬmaxを求めることができる。また、本実施の形態では、適合度としてＬmaxを用いたが、さらに、上記確率情報を加味して、適合度を求めるようにしてもよい。 In the present embodiment, partial sentences are created with all combinations from the speech recognition result as the target of collation with the phrase. However, the present invention is not limited to such an embodiment. For example, if probability information such as the appearance frequency of each phrase can be obtained in the phrase database, segmentation in units of phrases may be performed on the speech recognition result based on the probability information. By doing so, it is possible to efficiently obtain Lmax while avoiding phrase overlap. In this embodiment, Lmax is used as the fitness level. However, the fitness level may be obtained in consideration of the probability information.

なお、上記した実施の形態では、音声認識結果の各単語について、一致部分の集合を求め、集合に含まれる一致部分のフレーズ長の最大値（最大一致長Ｌ_max）に基づいて適合度を定めている。しかし、音声認識結果の各単語と、後続するフレーズベースの自然言語処理との適合度を定める方法はこれには限定されない。あるフレーズの集合が与えられた場合、音声認識結果のある単語を含む部分単語列であって、当該フレーズの集合に含まれるものからなる集合を求めれば、当該集合の関数として適合度を定めることができる。 In the embodiment described above, a set of matching parts is obtained for each word of the speech recognition result, and the fitness is determined based on the maximum phrase length of the matching parts in the set (the maximum match length Lmax). However, the method of determining the fitness between each word of the speech recognition result and the subsequent phrase-based natural language processing is not limited to this. Given a set of phrases, if one obtains the set of partial word strings of the speech recognition result that include a given word and are contained in the set of phrases, the fitness can be defined as a function of that set.

上記実施の形態では、この関数は「集合を構成する一致部分のうち、フレーズ長が最大のもののフレーズ長」であった。それ以外にも、例えば、部分単語列の集合の要素数、平均のフレーズ長、各要素のフレーズ長の和、等に基づいて適合度の算出を行なうことができる。 In the above embodiment, this function was "the phrase length of the longest matching part in the set". Alternatively, the fitness can be calculated based on, for example, the number of elements in the set of partial word strings, the average phrase length, or the sum of the phrase lengths of the elements.
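The idea of defining the fitness as a function of the set of matching partial word strings can be sketched as follows. The example enumerates all contiguous sub-strings of the recognition result, keeps those found in a phrase set, and then aggregates their lengths per word by maximum (Lmax), average, sum, or element count. The function names, the `mode` parameter, and the convention that a word with no matching phrase gets fitness 0 for every mode are assumptions made for the illustration.

```python
from typing import List, Sequence, Set, Tuple


def matching_spans(words: Sequence[str], phrases: Set[Tuple[str, ...]]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of contiguous sub-strings found in phrases."""
    n = len(words)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)
        if tuple(words[i:j]) in phrases
    ]


def fitness_per_word(
    words: Sequence[str], phrases: Set[Tuple[str, ...]], mode: str = "max"
) -> List[float]:
    """Aggregate, for each word, the lengths of the matching spans that contain it."""
    spans = matching_spans(words, phrases)
    result: List[float] = []
    for k in range(len(words)):
        lengths = [end - start for start, end in spans if start <= k < end]
        if not lengths:
            result.append(0.0)  # assumed convention: no matching phrase -> 0
        elif mode == "max":
            result.append(float(max(lengths)))
        elif mode == "mean":
            result.append(sum(lengths) / len(lengths))
        elif mode == "sum":
            result.append(float(sum(lengths)))
        elif mode == "count":
            result.append(float(len(lengths)))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return result
```

Enumerating every sub-string is quadratic in the utterance length, which is acceptable for short travel-domain utterances but is only one possible matching strategy.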

上記実施の形態では、前処理装置６４は、検査結果をもとに不合格の単語を音声認識結果の単語列から排除することにより、翻訳用の単語列を生成した。しかし、本実施の形態の検査装置６０による検査結果を用いた前処理は、このようなものには限定されない。例えば、検査結果が不合格の単語が一つでも存在すれば、音声認識結果そのものを棄却するようにしてもよい。この場合、音声認識結果の棄却に応答して所定のエラー信号を発行するようにすると便利である。また例えば、音声認識装置がＮ−ベストの音声認識結果を出力するようにし、前処理装置がＮ−ベストの音声認識結果の各々に対する検査結果に基づき、翻訳に最適な音声認識結果を選択するようにしてもよい。逆に、翻訳に使用される翻訳モデル及びその学習に用いられるバイリンガルコーパスのドメインが、認識すべき音声の言語及びドメインに適合しているかを評価することも可能である。さらに、その評価をもとに、適切なバイリンガルコーパス及び翻訳モデルを選ぶことにより、認識すべき音声に対する音声翻訳の性能の向上が期待できる。 In the above embodiment, the preprocessing device 64 generates the word string for translation by removing rejected words from the word string of the speech recognition result based on the inspection result. However, preprocessing that uses the inspection result of the inspection apparatus 60 of the present embodiment is not limited to this. For example, the speech recognition result itself may be rejected if even one word fails inspection. In this case, it is convenient to issue a predetermined error signal in response to the rejection. As another example, the speech recognition device may output N-best speech recognition results, and the preprocessing device may select the speech recognition result best suited to translation based on the inspection results for each of the N-best results. Conversely, it is also possible to evaluate whether the translation model used for translation, and the domain of the bilingual corpus used to train it, match the language and domain of the speech to be recognized. Selecting an appropriate bilingual corpus and translation model based on that evaluation can then be expected to improve speech translation performance for the speech to be recognized.
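Two of the alternative preprocessing strategies mentioned above, rejecting the whole recognition result and choosing among N-best hypotheses, can be sketched as follows. The text does not state how the "best suited" hypothesis is chosen, so the selection rule used here (fewest rejected words, ties broken by N-best rank) is purely an assumption, as are the exception class and function names.

```python
from typing import List, Sequence, Tuple


class RecognitionRejected(Exception):
    """Hypothetical error signal raised when a whole recognition result is rejected."""


def reject_if_any_ng(words: Sequence[str], results: Sequence[str]) -> List[str]:
    """Return the words unchanged, or raise if any word failed inspection."""
    if "NG" in results:
        raise RecognitionRejected("at least one word failed inspection")
    return list(words)


def select_from_nbest(hypotheses: Sequence[Tuple[List[str], List[str]]]) -> List[str]:
    """Pick the hypothesis with the fewest "NG" words.

    Each hypothesis is (words, inspection_results), listed in N-best order.
    min() returns the first minimum, so ties go to the higher-ranked hypothesis.
    """
    best_words, _ = min(hypotheses, key=lambda h: h[1].count("NG"))
    return best_words
```

A caller could catch `RecognitionRejected` and ask the speaker to repeat the utterance, which is one plausible use of the error signal the text describes.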

今回開示された実施の形態は単に例示であって、本発明が上記した実施の形態のみに制限されるわけではない。本発明の範囲は、発明の詳細な説明の記載を参酌した上で、特許請求の範囲の各請求項によって示され、そこに記載された文言と均等の意味及び範囲内でのすべての変更を含む。 The embodiment disclosed herein is merely an example, and the present invention is not limited to the embodiment described above. The scope of the present invention is indicated by each claim in the claims, read in light of the detailed description of the invention, and includes all modifications within the meaning and scope equivalent to the wording of the claims.

音声翻訳システム３０の全体構成を示すブロック図である。 FIG. 1 is a block diagram showing the overall configuration of the speech translation system 30.
音声認識装置４８による処理の概要と音声認識結果５０のデータ構成とを示す模式図である。 FIG. 2 is a schematic diagram showing an outline of the processing by the speech recognition device 48 and the data structure of the speech recognition result 50.
フレーズデータベース５６の構成を示す概略図である。 FIG. 3 is a schematic diagram showing the configuration of the phrase database 56.
適合度の評価結果付の音声認識結果７２及び検査結果７６のデータ構成を示す図である。 FIG. 4 is a diagram showing the data structures of the speech recognition result 72 with fitness evaluation results and the inspection result 76.
しきい値テーブル１６０のデータ構成を示す図である。 FIG. 5 is a diagram showing the data structure of the threshold table 160.
適合度評価部７０の機能的構成を示すブロック図である。 FIG. 6 is a block diagram showing the functional configuration of the fitness evaluation unit 70.
合否決定部７４の機能的構成を示すブロック図である。 FIG. 7 is a block diagram showing the functional configuration of the pass/fail determination unit 74.
しきい値学習部８４の機能的構成を示すブロック図である。 FIG. 8 is a block diagram showing the functional configuration of the threshold learning unit 84.
本発明の一実施の形態に係る音声翻訳システム３０を実現するコンピュータシステムの外観図である。 FIG. 9 is an external view of a computer system that implements the speech translation system 30 according to an embodiment of the present invention.
図９に示すコンピュータのブロック図である。 FIG. 10 is a block diagram of the computer shown in FIG. 9.

Explanation of symbols

３０音声翻訳システム
３４バイリンガルコーパス
４４セレクタ
４８音声認識装置
５２翻訳モデル部
５４翻訳モデル学習装置
５６フレーズデータベース
６０検査装置
６４前処理装置
６８翻訳装置
７０適合度評価部
７４合否決定部
７８スイッチ
８２しきい値テーブル部
８４しきい値学習部
１６０しきい値テーブル
１８０認識結果記憶部
１８２照合部
１８４Ｌmax算出部
１８５適合度付与部
１８６，２６８出力部
１９０部分文生成部
１９２部分文記憶部
１９４探索部
１９６一致部分記憶部
２６０記憶部
２６４検査結果付与部
２７０単語選択部
２７２しきい値設定部
２７４比較部
２８０正解付与部
２８４学習用検査結果記憶部
２８６正誤判定部
２９０正誤記憶部
２９４しきい値調整部
２９６再検査部

DESCRIPTION OF SYMBOLS

30 Speech translation system
34 Bilingual corpus
44 Selector
48 Speech recognition device
52 Translation model unit
54 Translation model learning device
56 Phrase database
60 Inspection apparatus
64 Preprocessing device
68 Translation device
70 Fitness evaluation unit
74 Pass/fail determination unit
78 Switch
82 Threshold table unit
84 Threshold learning unit
160 Threshold table
180 Recognition result storage unit
182 Collation unit
184 Lmax calculation unit
185 Fitness assigning unit
186, 268 Output unit
190 Partial sentence generation unit
192 Partial sentence storage unit
194 Search unit
196 Matching part storage unit
260 Storage unit
264 Inspection result assigning unit
270 Word selection unit
272 Threshold setting unit
274 Comparison unit
280 Correct answer assigning unit
284 Learning inspection result storage unit
286 Correct/incorrect determination unit
290 Correct/incorrect storage unit
294 Threshold adjustment unit
296 Re-inspection unit

Claims

A speech recognition result inspection apparatus that receives a word string of a speech recognition result generated from predetermined input speech by speech recognition processing, and inspects whether each word constituting the word string of the speech recognition result should be accepted as a target of predetermined phrase-based statistical natural language processing that follows the speech recognition processing, wherein
the speech recognition result inspection apparatus is used together with a set of phrases extracted by a predetermined extraction method from a corpus for learning the statistical model used in the natural language processing,
each word constituting the word string of the speech recognition result is given a reliability in advance by the speech recognition processing,
the speech recognition result inspection apparatus comprises:
fitness giving means for giving, to each word constituting the word string of the speech recognition result, a fitness for the natural language processing as a function of a set of word strings that include the word, form partial word strings of the speech recognition result, and have a matching phrase in the set of phrases; and
determination means for determining, for each word constituting the word string of the speech recognition result, whether or not to accept the word by comparing a threshold value determined according to the fitness given to the word by the fitness giving means with the reliability given to the word,
and the fitness giving means includes:
collating means for detecting word strings having a matching phrase in the set of phrases by collating the set of phrases with the word strings forming partial word strings of the speech recognition result; and
means for giving the fitness to each word constituting the word string of the speech recognition result, based on the set of those word strings detected by the collating means that include the word, the fitness being based on the maximum value, the average value, or the sum of the phrase lengths of the word strings included in the set, or on the number of elements included in the set.

The speech recognition result inspection apparatus according to claim 1, wherein the natural language processing includes phrase-based statistical translation processing between the language of the input speech and a predetermined target language, and
the set of phrases includes a set of phrases extracted in the process of learning a translation model for the statistical translation processing from a bilingual corpus of the language of the input speech and the predetermined target language.

The speech recognition result inspection apparatus according to claim 1, wherein the determination means includes:
means for holding the fitness and the threshold value in association with each other;
means for setting, for each word constituting the word string of the speech recognition result, a threshold value for the word based on the fitness given to the word and on the fitness and threshold value held by the holding means; and
comparison means for determining, for each word constituting the word string of the speech recognition result, whether or not to accept the word by comparing the threshold value set by the setting means with the reliability given to the word.

A computer program that, when executed by a computer, causes the computer to operate as the speech recognition result inspection apparatus according to any one of claims 1 to 3.