JP3870722B2 - Translation device, recording medium

Info

Publication number: JP3870722B2
Application number: JP2001167615A
Authority: JP
Inventors: 敦子小泉; 博行梶; 康成大淵; 義典北原
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2001-06-04
Filing date: 2001-06-04
Publication date: 2007-01-24
Anticipated expiration: 2019-01-07
Also published as: JP2002014957A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声翻訳装置、及び記録媒体に関し、特に、海外旅行用携帯翻訳機などに好適な発話誘導型の音声翻訳装置、及びこれを実現するための記録媒体に関する。
【０００２】
【従来の技術】
従来の翻訳装置には、大きく分けて二種類の翻訳装置がある。１種類目の翻訳装置は任意の入力文を翻訳する装置であり、２種類目の翻訳装置は、あらかじめ用意された対訳例文を場面やキーワードによって検索して翻訳する装置である。
【０００３】
第１の任意の入力文を翻訳する翻訳装置として、テキストを対象としたものでは、すでに多くの機械翻訳システムが実用化されている。翻訳方式としては、辞書と文法規則に基づいて翻訳を行うルールベース翻訳が広く採用されている。しかし、文法規則でうまく扱えない文章表現(言語現象)が新たに現れた場合に規則の記述が難しいため、用例を利用した用例ベース翻訳の研究もされている。
【０００４】
一方、音声を対象とした翻訳装置の場合は、まず音声認識が正しくなされないと正しい翻訳結果が得られないという問題がある。この問題を解決するため、例えば特開平08-328585では、入力された音声と用例データベース中のすべての用例との言語的類似度を計算することにより、複数の音声認識結果候補の中から正解を選ぶ方法が示されている。
【０００５】
第２のあらかじめ用意された対訳例文を検索する方法の翻訳装置として、例えば、特開昭56-88564に、カテゴリーを指定すると例文が表示され、例文の中から文を選択すると選択した例文の翻訳文が音声出力される携帯型の翻訳装置が記載されている。
【０００６】
【発明が解決しようとする課題】
海外旅行中は、依頼や質問などの比較的決まりきった内容の、短い会話のやりとりを他人とする場面が多い。しかし、旅行という非日常的な場面であるために、話者の発話意図に対して、具体的に話す内容、表現、あるいは単語が母国語でもすぐに思いつかない場合が少なくない。このため、第１の任意の入力文を翻訳する翻訳装置では、入力する文章をすぐに思いつかないため、旅行中の外国語会話を支援するという目的を充分に達成できない場合がある。例えば、飛行機から降りて機内預かりの手荷物を受け取る場所を尋ねたり、手荷物引き換え証を係員に提示したりする際に、「アメリカン航空１２１便の手荷物受取所はどこですか。」「私の荷物が出てきません。これが手荷物引換証です。」といった日本文が完全な形ですぐに頭に浮かぶとは限らない。母国語である日本語の通じる相手と直接会話をする場合は、相手の反応を見ながら「乗るときに預けた荷物は...えーと、どこで...便名？アメリカンの１２１便です。」「荷物が出てこないんですけど...手荷物引換証？ああ、乗るときにもらった控えですか？えーと、これです。」などというやりとりで通じるかもしれないが、機械翻訳で外国語に訳す場合は、「アメリカン航空１２１便の手荷物受取所はどこですか。」「私の荷物が出てきません。これが手荷物引換証です。」のような完全な文が入力されないとうまく訳すことができない。また、母国語で話す場合と違って、「便名はなんですか」「手荷物引換証を見せてください」などと相手に聞き返されても理解できないので、はじめから十分な情報を含む文を相手に伝える必要がある。
【０００７】
一方、第２のあらかじめ用意された対訳例文を検索する翻訳装置は、このような任意文入力の問題点を回避することができる。また、処理が単純であるため、携帯型の翻訳機として実現しやすい。しかし、場面別に分類された例文をメニュー選択によって検索する従来の方法では、例文数が多くなると、何度も選択ボタンを押して場面の絞り込みを繰り返すか、１つの場面に対して数多くある例文をスクロール表示させる必要があるため、使いたい例文を見つけるまでに時間がかかるという問題がある。また、１つの例文について単語の置き換え候補をスクロール等の手段で探す従来の方法では単語の置き換え候補数を多くした場合に操作性が良くないという問題がある。例えば、前述の手荷物受取所の場所を尋ねる例では、「日本航空７５３便の手荷物受取所はどこですか。」という例文を見つけた後、スクロールしたりボタンを押したりして便名を「アメリカン航空１２１便」に置き換えるのは面倒である。
【０００８】
本発明の目的は、ユーザの発話意図に適した例文テンプレートを表示してユーザの発話を誘導することにより、例文検索型の翻訳装置の利点と任意文入力型の翻訳装置の利点を兼ね備えた翻訳装置を実現することにある。
【０００９】
【課題を解決するための手段】
上記課題を解決するため、翻訳装置において、複数の例文テンプレートを記憶する例文ファイルと、入力された音声を認識する音声認識部と、上記音声に類似した例文テンプレート中の語句を上記音声に基づいて置換して翻訳する手段と、翻訳文を音声出力する音声出力部または、翻訳文を表示するディスプレイを有するものとする。また、コンピュータ読みとり可能な記録媒体であって、第１の音声を音声認識手段によって認識するステップと、上記第１の音声と関連づけられた例文テンプレートを検索してディスプレイに表示するステップと、第２の音声を上記音声認識手段によって認識するステップと、上記検索された例文テンプレート中の語句を上記第２の音声に基づいて置換して翻訳するステップと、翻訳結果をディスプレイに表示、又は、音声出力するステップとが記録されたものとする。
【００１０】
例えば、翻訳装置では、あらかじめ用意された例文を検索する翻訳装置であって、例文ファイルに記憶する例文を、語句の置き換えが可能な部分(以下スロットという)を含む例文テンプレートとして記述しておく。例文テンプレートにはスロットに関する情報として、そのスロットに現われうる語句の分類コード(例えば場所、日付等の情報)と代表例を記述する。各例文テンプレートにはキーワード(例えば、予約、飛行機、チケット等の情報)を付与する。キーワードには具体的な単語だけでなく、分類コードも記述できるものとする。
【００１１】
本発明の翻訳装置の単語辞書には、音声入力されたキーワードで例文テンプレートを検索するための情報として、単語の発音情報と分類コードを記述する。また、テンプレート翻訳を行なうための情報として、見出しの文法情報、訳語、および訳語の文法情報を記述する。
例文テンプレートを検索するための手段としては、音声で入力されたキーワードを認識する単語認識手段と、認識した単語およびその分類コードがキーワード情報に記述されている例文テンプレートを検索する手段と、複数の例文テンプレートを画面に表示する手段とを設ける。
【００１２】
テンプレート翻訳を行なうための手段としては、音声で入力された文に対して、複数の例文テンプレートの中から類似度が高い例文テンプレートを選択する文音声認識手段と、単語辞書の音韻情報を参照してスロットに挿入された語句を認識する単語音声認識手段と、単語辞書および翻訳規則を参照してスロットに挿入された語句の訳語および語形を決定する手段と、例文テンプレートの訳文情報とスロットに挿入された語句の訳を組み合わせて訳文を生成する手段とを設ける。
【００１３】
ユーザが単語を音声入力すると、単語辞書の音韻情報を参照することによって入力された単語および分類コードを認識する。そしてこれらがキーワード情報に記述されている例文テンプレートを検索し、画面に表示する。例文テンプレートには例文に含まれるスロットに関する情報として、そのスロットに現われうる語句の代表例が記述されている。従って、例文テンプレートを画面に表示する際には、スロットに代表例が挿入された状態で表示する。
【００１４】
画面に１つ以上の例文テンプレートが表示されている状態で、例文テンプレートのスロットにユーザの所望の語句を挿入した文を音声入力する。すると、画面に表示されている例文テンプレートのうち、類似度が高い例文テンプレートに基づいて音声入力文の翻訳を実行し、訳文を表示する。例えば、ユーザが「電話」という単語を音声で入力すると、「電話」という単語がキーワード情報に記述されている例文テンプレートを検索する。画面には「[電話]はどこですか」「[市内電話]のかけ方を教えてください」「[日本]に[コレクトコール]をかけたい」などの例文テンプレートが表示される。下線部はスロットを表す。ユーザは画面をスクロールしながら使えそうな例文を探す。使えそうな例文、例えば「[日本]に[コレクトコール]をかけたい」を見つけたら、「カナダに国際電話をかけたい」という文を音声で入力する。すると、翻訳機は、「[日本]に[コレクトコール]をかけたい」という例文テンプレートを使ったことを認識するとともに、「日本」「コレクトコール」が「カナダ」「国際電話」に置換されたことを認識し、テンプレート翻訳を実行する。
【００１５】
【発明の実施の形態】
本発明の実施例を図１から図１４までを用いて説明する。本実施例は、日本語を母国語とする利用者が英語圏を旅行するための携帯型の音声翻訳装置である。本音声翻訳装置は、ユーザがキーワードを音声入力すると例文テンプレートを検索して表示する。ユーザが例文テンプレートの置き換え可能な単語を別の単語に置き換えてしゃべると、どの例文テンプレートを使って何という単語に置き換えたかを認識し、入力文に対する訳文を表示・発声する。
【００１６】
次に、本実施例の翻訳機の外観を説明する。図１は本発明の一実施例である音声翻訳装置の外観および初期画面を示す図である。音声翻訳装置１０１は、表示画面１０２、電源スイッチ１０３、リセットボタン１０４、訳文を音声出力するためのスピーカー１０５、例文検索のキーワードを音声入力するためのマイク１０６、例文検索・翻訳・訳文発声等を指示するための実行ボタン１０７、前の画面に戻るための「戻る」ボタン１０８、画面上を前後左右に移動するためのスクロールボタン１０９〜１１２を備えている。
【００１７】
次に、本実施例の翻訳機のハードウエア構成を説明する。図２は、本音声翻訳装置のハードウエア構成を示す。本音声翻訳装置は、装置の制御と音声認識、翻訳、音声合成処理を行うためのCPU２０１、音声入力装置２０２、音声出力装置２０３、単語辞書２０４、例文ファイル２０５、メモリ２０６によって構成されている。メモリ２０６には、制御プログラム２６１、単語音声認識プログラム２６２、文音声認識プログラム２６３、テンプレート翻訳プログラム２６４、音声出力プログラム２６５、画面表示プログラム２６６が記憶されている。
【００１８】
次に、例文ファイルの内容を説明する。
【００１９】
図６に例文ファイルの内容を示す。例文ファイルには、各例文テンプレートについて、例文番号６０１、基本例文番号６０２、詳細例文番号６０３、キーワード６０４、原文テンプレート６０５、原文テンプレートの発音を記述した発音情報６０６、訳文テンプレート６０７、訳文テンプレートの発音を記述した訳文発音情報６０８、スロット情報６０９が記述されている。スロット情報６０９には、スロット数６０９１が記述され、各スロットについて、スロットに入る語句の代表例６０９２、代表例の訳６０９３、代表例の訳の発音情報６０９４、スロットにどのような語句が入るかを記述した意味情報６０９５、訳文におけるスロット部分の文法特徴（冠詞、単複等）を記述した文法情報６０９６が記述されている。
【００２０】
例文には「予約したい」のような基本例文と、「一番早く乗れる便を探してください。」のような詳細例文がある。基本例文に対しては、それに対応する詳細例文の例文番号が詳細例文番号６０３に記述されている。詳細例文に対しては、それに対応する基本例文の例文番号が基本例文番号６０２に記述されている。
【００２１】
キーワードで例文を検索すると、基本例文が上の方に表示される。ユーザが「予約したい」を選択して詳細例文の表示を指示すると、「一番早く乗れる便を探してください。」などの詳細例文が表示される。原文テンプレート６０５および訳文テンプレート６０７において、置き換え可能な語句はスロットになっている。
【００２２】
画面に表示する際には、原文テンプレートのスロットに代表例６０９２の単語を埋め込んで表示する。ユーザが単語の置き換えをせずに訳文の表示や発声を指示した場合は、訳文に代表例の訳６０９３を埋め込んで表示し、訳文発音情報６０８に代表例の訳の発音情報６０９４を埋め込んで発声する。
【００２３】
スロットの意味情報６０９５の記述は、「往復切符；片道切符；切符」のように単語を列挙してもよいし、「＜場所＞」のように分類コードを使って記述してもよい。どのような単語にも置き換え可能な場合は条件を記述しない。
【００２４】
キーワード６０４に「＜場所＞」のような分類コードが記述されている例文は、その分類コードを持つすべての単語から検索される。例えば、ユーザが「郵便局」、「バス停」、または「銀行」など、「＜場所＞」という分類コードを持つ単語をキーワードとして入力すると、例文1356「一番近い＜S1＞はどこですか」が検索・表示される。
【００２５】
次に、単語辞書の内容について説明する。図７に単語辞書の内容を示す。単語辞書には各単語項目について、単語番号７０１、見出し７０２、読み７０３、発音情報７０４、文法情報７０５、分類コード７０６、訳語７０７、訳語の発音情報７０８、訳語の文法情報７０９、例文番号７１０が記述されている。「＜場所＞」などの分類コードを見出しとするレコードの例文番号７１０は、その分類コードをキーワードとする例文の例文番号である。
【００２６】
次に、本実施例の音声翻訳装置の動作について説明する。図３は、本音声翻訳装置の動作の概要を示すフローチャートである。電源ボタン１０３を押すと（３０１）、メモリ２０６のプログラムが起動され、表示画面１０２に初期画面が表示される（３０２）。初期画面が表示されている状態でマイク１０６を通して音声が入力されると、キーワード音声認識を行う（３０３）。
【００２７】
次にキーワード音声認識について図４を参照しながら説明する。まず、単語が音声入力されると（４０２）、単語辞書２０４を参照し、隠れマルコフモデル（ＨＭＭ）などの音声認識アルゴリズムにより、単語音声認識を行い（４０３）、単語候補を画面に表示する（４０４）。
【００２８】
図８を用いて単語候補表示画面の例を説明する。単語候補表示画面では、一行目に第１候補のかな表記、二行目以降に第１候補から第１０候補までの単語候補を表示する。初めは第１候補の単語が反転表示されているが、前後左右スクロールキーを使うことにより、反転表示される単語が前後左右に移動する。これにより、ユーザは意図した単語を選択することができる。図８で画面に表示されていない第４候補以下は、下スクロールキーで画面をスクロールさせることによって見ることができる。単語候補は、同音異義語がある場合は同音異義語を１行に表示し、同音異義語がない場合は対訳を表示する。単語候補の先頭に付いている「＊」は例文検索のキーワードであることを示す。
ユーザが意図した単語を選択して実行ボタンを押すと（４０５）、その単語を例文検索のキーワードとして確定し（４０８）、キーワード認識処理を終了する（４０９）。ユーザが意図する単語が単語候補にない場合は、かな表記の一行目を選んで実行ボタンを押すことにより、かなレベルで入力単語を修正し（４０６）、再度単語辞書検索を行うことができる（４０７）。また、「もどる」ボタンを押すことにより、キーワード音声入力受付の状態に戻ることもできる。
【００２９】
なお、単語候補表示画面は、キーワード音声認識結果をユーザが確認するためのものだが、単語候補と共に訳語が表示されるので、対訳辞書として使うこともできる。また、訳語を選択して実行ボタンを押すと発音を発声するようにすることもできる。同音異義語の場合は単語候補表示画面では訳語が表示されないが、同音異義語の中から単語を選択して実行ボタンを押すと、次の画面で訳語と例文が表示されるようにすることもできる。
【００３０】
次に、例文検索について説明する。入力単語が確定したら、単語辞書２０４において、入力単語の例文番号７１０に記載されている番号の例文を例文ファイル２０５より取り出し（３０４）、画面に表示する（３０５）。単語辞書において分類コード７０６が記述されている単語については、その分類コードをキーワードとする例文も検索する。例えば、入力単語「銀行」に対して、「銀行」をキーワードとする例文に加えて、＜場所＞＜機関＞という分類コードをキーワードとする例文も検索する。＜場所＞という分類コードをキーワードとする例文の番号は、「＜場所＞」を見出しとする単語辞書レコードの例文番号７１０に記載されている。
【００３１】
図９に「予約」をキーワードとして例文検索した場合の例文表示画面の例を示す。図９において、９０１は検索された例文である。例文表示画面には、１行目に単語の辞書見出しと訳語が表示され、２行目以降に検索された例文の先頭の３文が表示される。４行目以降の例文は、スクロールキーで画面をスクロールさせることによって見ることができる。文頭に★がついている例文は基本例文であることを示す。例文表示画面では、原文テンプレートのスロットに代表例６０９２を埋め込んだものを表示し、アンダーラインで置き換え可能な語句であることを示す。図１０において、「ダラス」「マイアミ」「１４日」は置き換え可能な語句である。
【００３２】
次に、例文絞り込みについて説明する。検索例文のうち、画面に入りきらない部分は上下左右のスクロールキーを使って見ることができるが、例文数が多い場合は、さらにキーワードを追加して例文を絞り込むことができるようにする。これによって、例文が探しやすくなる。ユーザが例文絞り込みをしたい場合は、スクロールボタンで画面の１行目に表示されているキーワードを選択して（反転表示させて）実行ボタンを押す。次に、追加キーワードを音声入力すると、前述のキーワード音声認識を実行し（３０７）、例文の絞り込みを行い（３０８）、絞り込み例文を表示する（３０９）。図９は「予約」をキーワードとする例文検索結果であり、図１０は「フライト」で例文を絞り込んだ結果である。絞り込み結果を見てユーザが「もどる」ボタンを押すと絞り込み前の例文を表示する（３１１）。
【００３３】
なお、置き換え可能な語句のない例文を選択する場合は、スクロールキーによるカーソル移動で例文を選択してもよい（３１２〜３１４）。例文を選択して実行キーを押すと、翻訳が実行される（３２２）。
【００３４】
次に、詳細例文表示について説明する。スクロールキーで例文を選択する際に、例文そのものではなく基本例文の文頭についている★を選択して実行キーを押すと、関連する詳細例文が表示される（３１４）。例えば、図９の例文表示画面において、３行目の「予約したい」を選択して実行キーを押すと翻訳結果が表示され、「予約したい」の左側についている★を選択して実行キーを押すと、図１２に示すような「予約したい」の詳細例文が表示される。
【００３５】
次に、文音声認識について説明する。例文表示画面を見ながらユーザが文を発声すると（３１５）、文音声認識を実行する（３１６）。文音声認識は、例えば「確率モデルによる音声認識」（電子情報通信学会, 1988）pp46-50に記載されているHMMによるオートマトン制御連続単語音声認識アルゴリズムによって行う。
【００３６】
図５に、文音声認識処理のフローを示す。まず、画面に表示されている例文テンプレートに対して、図１４のような有限状態オートマトンを作成する。図１４は、「一番近い＜s1＞はどこですか」という例文テンプレートに対して作られる有限状態オートマトンである。図１４において、１４０１と１４０２は「イチバンチカイ」「ワドコデスカ」という発音情報から作成されるＨＭＭ（隠れマルコフモデル）である。原文テンプレートの部分のモデルは、例文ファイルに記載されている発音情報６０６に基づいて作成する（５０２）。
【００３７】
スロット部分のモデルは、スロットに入りうる単語の発音情報に基づいて作成する。これにはまず、例文ファイルのスロット情報６０９の意味情報６０９５を参照することにより（５０４）、単語辞書登録語のうち、どの単語がスロットに入り得るかを認識する。スロットに入りうる単語は、スロットの意味情報６０９５において単語が列挙されている場合は列挙されている単語（５０５）、分類コードが指定されている場合は指定された分類コードを持つ単語（５０６）、何も指定されていない場合は単語辞書登録語（５０７）、とする。それぞれの単語のモデルは、単語辞書に記載されている単語の発音情報７０４に基づいて作成する（５０８）。
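The three cases above (steps 505 to 507) can be sketched as a small lookup. This is a minimal illustration only, assuming the word dictionary is reduced to a mapping from headword to its set of classification codes; the function name `slot_candidates`, the sample words, and the ASCII `<place>` notation are hypothetical and do not come from the patent.

```python
def slot_candidates(semantic_info: str, dictionary: dict[str, set[str]]) -> list[str]:
    """Return the dictionary words allowed in a slot, given its semantic info (6095)."""
    info = semantic_info.strip()
    if not info:
        # Nothing specified: every registered word is a candidate (step 507).
        return sorted(dictionary)
    if info.startswith("<") and info.endswith(">"):
        # A classification code: every word carrying that code (step 506).
        return sorted(w for w, codes in dictionary.items() if info in codes)
    # An explicit list such as "round-trip ticket; one-way ticket; ticket" (step 505).
    listed = [w.strip() for w in info.replace("；", ";").split(";")]
    return [w for w in listed if w in dictionary]


# Hypothetical dictionary: headword -> classification codes.
words = {
    "bank": {"<place>"},
    "bus stop": {"<place>"},
    "one-way ticket": set(),
    "ticket": set(),
}
print(slot_candidates("<place>", words))
print(slot_candidates("one-way ticket; ticket", words))
```

In the device itself each candidate word would then be turned into an acoustic model from its pronunciation information 704 (step 508); the sketch stops at candidate selection.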
【００３８】
画面に表示されている例文テンプレートについて有限状態オートマトンを作成したら（５０９）、 HMMによるオートマトン制御連続単語音声認識アルゴリズムによって入力文の音声認識を行い、各スロットについて、上位１０語をスロット候補語として記憶する（５１０）。
【００３９】
なお、例文テンプレートを見ながら発話する際に、スロットの前後やスロットを含む文節の直後にポーズが入る場合があるので、各例文テンプレートに対して、文全体に対する有限状態オートマトンだけでなく、ポーズが入りそうな場所で区切ったフレーズの有限状態オートマトンも作成すると、より高い認識精度を得ることができる。例えば、上記の例では、図１４に示した「イチバンチカイバステイワドコデスカ」などの他に、「イチバンチカイ」「バステイ」「ワドコデスカ」「バステイワ」「ドコデスカ」などの有限状態オートマトンも作成し、そのフレーズを含む例文テンプレートに対応付けておく。さらに、ポーズを積極的に取り入れて、「イチバンチカイ」でポーズが入った場合に、「一番近い＜s1＞はどこですか」の「一番近い」の部分と一致することを表示し、ユーザが確認後、続きを言うように誘導することも考えられる。
【００４０】
本実施例では単語辞書登録語の範囲でスロット挿入語を認識しているが、すべての単語をあらかじめ辞書に登録しておくことはできない。特に固有名詞や土地の名物などの名詞は、旅行会話で重要であるにもかかわらず、単語辞書でカバーしきれない。しかしながら、ユーザがそのような語を使いたい場合は、原語でその単語を知っているケースが多く、単語だけなら何とかそれに近い発音ができる場合が多い。そこで、特定のボタンを押しながらスロットの部分の語句を音声入力すると、当該スロット部分についてはユーザが入力したそのままの音を訳文に埋め込んで出力することが考えられる。前述のように、フレーズに対応する有限状態オートマトンを作成してそのフレーズを含む例文テンプレートに対応付けておく方法を採用すれば、特定ボタンを押してしゃべった部分の前後のフレーズから例文テンプレートを特定することができる。これにより、例えば、「［このアトラクション］は何分待ちですか」という例文テンプレートを使って「Back to the futureは何分待ちですか」と発話することが可能となる。また、「私の名前は＜s1＞です」における＜s1＞のような特定のスロットに関しては、ユーザが入力したそのままの音を訳文のスロット部分に埋め込むようにすることもできる。
【００４１】
本実施例では、音声入力文と比較する対象として、画面に表示されている例文テンプレートに限定しているが、キーワードで検索された例文テンプレート全体としてもよい。また、ユーザがあらかじめ選んだ例文テンプレートや最近使った例文テンプレートを「ユーザ例文ファイル」に記憶しておき、これを比較対象としてもよい。
【００４２】
フレーズに対応する有限状態オートマトンを作成して、そのフレーズを含む例文テンプレートに対応付けておく方法を採用した場合には、例文テンプレート表示の際に、音声入力すべき範囲を波線等でマークし、文全体をしゃべらなくてよいことを知らせるようにすることも考えられる。この方法によれば使い勝手がよくなる。例えば、「一番早く乗れる便を探してください」という文であれば、「一番早く乗れる便を探して」あるいは「一番早く乗れる便」など、短く言うようにして、音声入力の手間が省ける。
【００４３】
次に、文認識結果表示・訂正について説明する。文音声認識結果は、認識した例文テンプレートにカーソル移動し、スロットの単語を代表例から入力単語に置き換えることによって示す（３１７）。例えば、図１０の例文表示画面が表示されている状態でユーザが「シカゴからシアトルまでの１７日の片道切符が欲しい」と音声入力し、正しく文音声認識がされると、図１１の画面が表示される。文認識結果を見て、ユーザは「もどる」ボタンを押して文入力をやり直すか（３１８）、文認識結果を必要に応じて訂正する（３１９）。例文テンプレートの選択が正しくない場合は、スクロールキーで別の文を選択すると、その例文テンプレートを使ってスロット部の単語認識をやり直す。スロット部の単語認識が正しくない場合は、スクロールキーでスロットを選択して実行ボタンを押すと文認識時に記憶しておいたスロット候補語が表示される。スロット候補語の中に正しいものがあれば、スクロールキーで選択して実行ボタンを押すことによって選択する。
【００４４】
次に、翻訳について説明する。音声入力文と一致する文を確定したら実行ボタンを押すことによって翻訳を指示する（３２０〜３２１）。基本例文を音声入力で選択し、文頭の★を選択して実行ボタンを押すことで詳細例文を見ることも可能である（３２１）。テンプレート翻訳は、例文テンプレートの訳文にスロット部の語句の訳を埋め込むことによって行う（３２２）。例文テンプレートには訳文におけるスロットの文法情報６０９６が記述されている。これと単語辞書２０４の訳語の文法情報７０９を参照することにより、スロット部分の訳語の冠詞や単複などを決定する。例えば、「不定冠詞、単数形」と指定されたスロットに「語頭母音」という文法情報が記述された単語“apple”を埋め込む場合は不定冠詞“an”を付け、「無冠詞、複数形」と指定されたスロットに“candy”という単語を埋め込む場合は複数形“candies”を埋め込む。
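The article and number decision described above can be illustrated with a short sketch. It assumes that the slot grammatical information 6096 and the translation's grammatical information 709 are available as sets of string tags; the tag names (`indefinite_article`, `plural`, `initial_vowel`), the function names, and the naive pluralization rule are all hypothetical simplifications, not the patent's translation rules.

```python
VOWELS = "aeiou"


def pluralize(noun: str) -> str:
    """Naive English plural, enough for regular nouns only."""
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"          # candy -> candies
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                # bus -> buses
    return noun + "s"


def realize_slot(word: str, word_grammar: set[str], slot_grammar: set[str]) -> str:
    """Inflect a translated slot word according to the slot's grammatical info."""
    form = pluralize(word) if "plural" in slot_grammar else word
    if "indefinite_article" in slot_grammar:
        # The dictionary marks words that begin with a vowel sound.
        article = "an" if "initial_vowel" in word_grammar else "a"
        return f"{article} {form}"
    return form


print(realize_slot("apple", {"initial_vowel"}, {"indefinite_article", "singular"}))
print(realize_slot("candy", set(), {"no_article", "plural"}))
```

The two calls reproduce the examples in the text: "an apple" for an indefinite-article singular slot, and "candies" for a no-article plural slot.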
【００４５】
翻訳結果表示ステップ（３２３）では、図１３に示す翻訳結果表示画面を表示する。ユーザの指示（３２４）が「もどる」ボタンであれば翻訳実行前に表示した例文を表示し（３２５）、訳文を選択して実行ボタンであれば訳文を発声し（３２６）、原文を選択して実行ボタンであれば次の発話用の例文を表示する（３２７）。次の発話用の例文として表示する例文は、先に翻訳した文が基本例文である場合はその詳細例文であり、先に翻訳した文が詳細例文である場合は翻訳実行前に表示した例文である。これは、同じキーワードで検索された詳細例文のセットは関連する例文であり、これらの例文の中から複数の文を続けて使う可能性があると考えられるからである。代案としては、キーワードの中でも「ホテル」「フロント」「買い物」など、場面に関するものを「場面情報」としてキーワードとは別の情報として各例文テンプレートに付与しておき、場面情報が共通の例文だけを次の発話用の例文として表示することが考えられる。さらに、新たにキーワード検索を行った場合にも場面情報が直前に使った例文と一致するものを優先的に上の方に表示したり、リセットするまで同じ場面が続くことにして場面情報で例文を絞って表示することが考えられる。
【００４６】
本実施例では、対訳例文テンプレートを利用してテンプレート翻訳を行っているが、例文テンプレートの真の目的は、ユーザの発話を誘導することにある。したがって、翻訳処理については、辞書と文法規則による機械翻訳で行うことも考えられる。その場合は、例文テンプレートは必ずしも対訳である必要はなく、入力言語の例文テンプレートだけでよい。
【００４７】
以上、携帯型の音声翻訳装置としての実施例を述べたが、本発明は、上述のような翻訳装置を実現するための図３，図４，及び図５に記載されたフローが記録された記録媒体であってもかまわない。このような記録媒体としては、音声入力された語句を音声認識手段によって認識するステップと、この語句と関連づけられた例文テンプレートを例文ファイル中から検索するステップと、検索された例文ファイルをディスプレイに表示するステップと、表示された例文テンプレートに基づいて、ユーザが発話した文章を認識するステップと、認識した文章と検索された例文テンプレートを比較して、もっとも似ているものを認識するステップと、似ている例文テンプレート中の語句のうち置換可能な語句をユーザが発話した文章に基づいて置換して翻訳するステップと、翻訳結果をディスプレイに表示又は、音声出力するステップとが記録されたコンピュータ読みとり可能な記録媒体が考えられる。
【００４８】
また、他の形の翻訳装置では別のユーザインタフェースが考えうる。例えば、病院・役所・ホテルなどの窓口に備える音声翻訳装置では、画面をタッチパネルとし、スロット部分に指を触れて置き換え単語のみを発声するといったことが考えられる。また、本実施例では、表示画面が小さいことと片手で持って使うことを考慮して、キーワード音声入力で例文テンプレート検索を行うようにしたが、場面や発話意図（質問、依頼、説明等）で例文テンプレートを分類し、メニューの表示・選択で絞っていくことも可能である。
【００４９】
本実施例では、日本語から英語への翻訳を行う音声翻訳装置について説明したが、他の言語対にも応用可能である。また、本実施例では、対訳例文テンプレートにおいて、原文テンプレートに対して認識用の情報、訳文テンプレートに対して生成用の情報を記述するようにしているが、両方の言語のテンプレートに対して認識用の情報と生成用の情報を記述するようにすれば、双方向に翻訳を実行するようにすることも可能である。双方向の翻訳を行う場合は、本実施例で基本例文と詳細例文を対応付けたのと同様の方法で、質問文の例文テンプレートに回答の例文テンプレートを対応付けることが有効である。
【００５０】
【発明の効果】
本発明によれば、音声入力されたキーワードからユーザの発話意図に適した例文テンプレートをすばやく検索し、ユーザの発話を誘導することができる。ユーザは例文テンプレートを見ながらしゃべるので、言いたいことを完全な文の形で楽に言うことができるし、例文テンプレートのスロットには思った通りの任意の語句を入れてしゃべることができるので、単語置換の煩わしさがない。これにより、例文検索型の翻訳装置の利点と任意文入力型の翻訳装置の利点を兼ね備えた翻訳装置を実現することができる。
【図面の簡単な説明】
【図１】本発明の音声翻訳装置の実施例の外観および初期画面を示す図である。
【図２】本発明の音声翻訳装置の実施例の構成を示すブロック図である。
【図３】本発明の音声翻訳装置の実施例の動作を示すフローチャートである。
【図４】単語音声認識の手順を示すフローチャートである。
【図５】入力文音声認識の手順を示すフローチャートである。
【図６】例文ファイルの内容を示す図である。
【図７】単語辞書の内容を示す図である。
【図８】単語候補表示ステップにおける画面表示例を示す図である。
【図９】例文表示ステップにおける画面表示例を示す図である。
【図１０】絞り込み例文表示ステップにおける画面表示例を示す図である。
【図１１】文認識結果表示ステップにおける画面表示例を示す図である。
【図１２】詳細例文表示ステップにおける画面表示例を示す図である。
【図１３】翻訳結果表示ステップにおける画面表示例を示す図である。
【図１４】例文テンプレートに対する有限状態オートマトンの例を示す図である。
【符号の説明】
２０１：CPU
２０２：音声入力装置
２０３：音声出力装置
２０４：単語辞書
２０５：例文ファイル
２０６：メモリ
２６１：制御プログラム
２６２：単語音声認識プログラム
２６３：文音声認識プログラム
２６４：テンプレート翻訳プログラム
２６５：音声出力プログラム
２６６：画面表示プログラム。

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a speech translation apparatus and a recording medium, and more particularly to an utterance guidance type speech translation apparatus suitable for a portable translation machine for overseas travel and a recording medium for realizing the same.
[0002]
[Prior art]
Conventional translation devices are roughly classified into two types. The first type of translation device is a device that translates an arbitrary input sentence, and the second type of translation device is a device that searches and translates bilingual example sentences prepared in advance according to scenes and keywords.
[0003]
As translation devices of the first type, which translate arbitrary input sentences, many machine translation systems for text have already been put into practical use. As the translation method, rule-based translation, which translates on the basis of a dictionary and grammatical rules, is widely adopted. However, because rules are difficult to write when new sentence expressions (linguistic phenomena) appear that grammatical rules cannot handle well, example-based translation using examples is also being studied.
[0004]
In the case of a translation device for speech, on the other hand, there is the problem that a correct translation result cannot be obtained unless speech recognition is first performed correctly. To address this problem, Japanese Patent Laid-Open No. 08-328585, for example, discloses a method of selecting the correct answer from among a plurality of speech recognition result candidates by calculating the linguistic similarity between the input speech and every example in an example database.
[0005]
As a translation device of the second type, which searches bilingual example sentences prepared in advance, Japanese Patent Laid-Open No. 56-88564, for example, describes a portable translation device in which example sentences are displayed when a category is designated and, when a sentence is selected from among them, the translation of the selected example sentence is output as speech.
[0006]
[Problems to be solved by the invention]
While traveling abroad, there are many occasions on which a traveler exchanges short conversations with others about relatively routine matters such as requests and questions. However, because travel is an out-of-the-ordinary situation, it is not uncommon that the specific content, expressions, or words needed to convey the speaker's intention do not come to mind immediately, even in the speaker's native language. For this reason, a translation device of the first type, which translates arbitrary input sentences, may fail to adequately achieve the purpose of supporting foreign-language conversation during travel, because the user cannot immediately think of a sentence to input. For example, when asking where to pick up checked baggage after getting off an airplane, or when presenting a baggage claim tag to an attendant, Japanese sentences such as "Where is the baggage claim for American Airlines flight 121?" or "My baggage has not come out. This is my baggage claim tag." do not necessarily come to mind immediately in complete form. When talking directly with someone who understands Japanese, the speaker's native language, an exchange such as "The baggage I checked when boarding... um, where... Flight number? American flight 121." or "My baggage hasn't come out... Baggage claim tag? Oh, the stub I got when boarding? Um, here it is." may get the point across while the speaker watches the other party's reactions. When translating into a foreign language by machine translation, however, the translation will not succeed unless a complete sentence such as "Where is the baggage claim for American Airlines flight 121?" or "My baggage has not come out. This is my baggage claim tag." is input. Moreover, unlike when speaking in one's native language, the user cannot understand follow-up questions from the other party such as "What is your flight number?" or "Please show me your baggage claim tag," so a sentence containing sufficient information must be conveyed to the other party from the start.
[0007]
On the other hand, a translation device of the second type, which searches bilingual example sentences prepared in advance, can avoid these problems of arbitrary sentence input. Moreover, since the processing is simple, it is easy to realize as a portable translator. However, in the conventional method of searching example sentences classified by scene through menu selection, when the number of example sentences becomes large, the user must either press selection buttons many times to narrow down the scene or scroll through the many example sentences for a single scene, so it takes time to find the desired example sentence. Further, the conventional method of searching for word replacement candidates for an example sentence by scrolling or similar means has poor operability when the number of replacement candidates is large. For example, in the above example of asking the location of the baggage claim, after finding the example sentence “Where is the baggage claim for Japan Airlines flight 753?”, it is troublesome to scroll or press buttons to replace the flight name with “American Airlines flight 121”.
[0008]
An object of the present invention is to realize a translation device that combines the advantages of an example sentence search type translation device with those of an arbitrary sentence input type translation device, by displaying example sentence templates suited to the user's utterance intention and thereby guiding the user's utterance.
[0009]
[Means for Solving the Problems]
To solve the above problems, a translation device comprises an example sentence file that stores a plurality of example sentence templates, a speech recognition unit that recognizes input speech, means for replacing a phrase in an example sentence template similar to the speech on the basis of the speech and translating the result, and a speech output unit that outputs the translated sentence as speech or a display that displays the translated sentence. Also provided is a computer-readable recording medium on which are recorded: a step of recognizing a first speech by speech recognition means; a step of searching for an example sentence template associated with the first speech and displaying it on a display; a step of recognizing a second speech by the speech recognition means; a step of replacing a phrase in the retrieved example sentence template on the basis of the second speech and translating the result; and a step of displaying the translation result on the display or outputting it as speech.
[0010]
For example, the translation device is one that searches example sentences prepared in advance, and each example sentence stored in the example sentence file is described as an example sentence template containing a portion whose phrase can be replaced (hereinafter referred to as a slot). As information about each slot, the example sentence template describes the classification code (for example, place or date) of the phrases that can appear in the slot and a representative example. Each example sentence template is assigned keywords (for example, reservation, airplane, ticket). A keyword may be not only a specific word but also a classification code.
[0011]
In the word dictionary of the translation apparatus of the present invention, the pronunciation information and classification code of each word are described as information for searching for example sentence templates with a keyword input by speech. In addition, as information for performing template translation, the grammatical information of the headword, the translation, and the grammatical information of the translation are described.
As means for searching for example sentence templates, there are provided word recognition means for recognizing a keyword input by speech, means for searching for example sentence templates in whose keyword information the recognized word and its classification code are described, and means for displaying a plurality of example sentence templates on the screen.
[0012]
As means for performing template translation, there are provided: sentence speech recognition means for selecting, for a sentence input by speech, an example sentence template with high similarity from among a plurality of example sentence templates; word speech recognition means for recognizing the phrase inserted into a slot by referring to the phonological information in the word dictionary; means for determining the translation and word form of the phrase inserted into the slot by referring to the word dictionary and translation rules; and means for generating a translated sentence by combining the translation information of the example sentence template with the translation of the phrase inserted into the slot.
[0013]
When the user inputs a word by speech, the input word and its classification code are recognized by referring to the phonological information in the word dictionary. Example sentence templates in whose keyword information these are described are then retrieved and displayed on the screen. As information about the slots contained in an example sentence, the example sentence template describes representative examples of the phrases that can appear in each slot. Accordingly, when an example sentence template is displayed on the screen, it is displayed with the representative examples inserted in its slots.
[0014]
With one or more example sentence templates displayed on the screen, the user inputs by speech a sentence in which the user's desired phrase is inserted into a slot of an example sentence template. The device then translates the speech input sentence on the basis of an example sentence template with high similarity among those displayed on the screen, and displays the translated sentence. For example, when the user inputs the word “telephone” by speech, example sentence templates in whose keyword information the word “telephone” is described are retrieved. Example sentence templates such as “Where is [a telephone]?”, “Please tell me how to make [a local call]”, and “I want to make [a collect call] to [Japan]” are displayed on the screen. The underlined portions (shown here in brackets) represent slots. The user scrolls the screen to look for a usable example sentence. On finding one, for example “I want to make [a collect call] to [Japan]”, the user inputs by speech the sentence “I want to make an international call to Canada”. The translator then recognizes that the example sentence template “I want to make [a collect call] to [Japan]” was used and that “Japan” and “collect call” were replaced with “Canada” and “international call”, and performs template translation.
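The idea of identifying both the template and the replaced slot phrases from a single utterance can be sketched at the text level. The device described here does this acoustically, on speech; the sketch below substitutes plain string matching with regular expressions purely for illustration. The function names and the bracket notation for slots are assumptions, not part of the patent.

```python
import re
from typing import Optional


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn 'I want to make [a collect call] to [Japan]' into a regex
    with one capture group per bracketed slot."""
    fixed_parts = re.split(r"\[[^\]]*\]", template)
    pattern = "(.+?)".join(re.escape(part) for part in fixed_parts)
    return re.compile(f"^{pattern}$")


def match_utterance(utterance: str, templates: list[str]) -> Optional[tuple[str, list[str]]]:
    """Return the first template the utterance fits, with the slot fillers."""
    for template in templates:
        m = template_to_regex(template).match(utterance)
        if m:
            return template, list(m.groups())
    return None


displayed = [
    "Where is [a telephone]?",
    "I want to make [a collect call] to [Japan]",
]
print(match_utterance("I want to make an international call to Canada", displayed))
```

A real implementation would score every displayed template by similarity rather than take the first exact fit, since recognized speech rarely matches the fixed parts character for character.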
[0015]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention will be described with reference to FIGS. 1 to 14. This embodiment is a portable speech translation apparatus for a user whose native language is Japanese to use while traveling in an English-speaking country. When the user inputs a keyword by speech, the speech translation apparatus searches for and displays example sentence templates. When the user speaks a sentence in which a replaceable word of an example sentence template is replaced with another word, the apparatus recognizes which example sentence template was used and which word was substituted, and displays and utters a translation of the input sentence.
[0016]
Next, the appearance of the translator of this embodiment will be described. FIG. 1 is a diagram showing the appearance and initial screen of a speech translation apparatus according to an embodiment of the present invention. The speech translation apparatus 101 includes a display screen 102, a power switch 103, a reset button 104, a speaker 105 for outputting a translated sentence as speech, a microphone 106 for inputting a keyword for example sentence search by speech, an execution button 107 for instructing example sentence search, translation, utterance of the translated sentence, and the like, a “return” button 108 for returning to the previous screen, and scroll buttons 109 to 112 for moving up, down, left, and right on the screen.
[0017]
Next, the hardware configuration of the translator of the present embodiment will be described. FIG. 2 shows the hardware configuration of the speech translation apparatus. The speech translation apparatus comprises a CPU 201 for controlling the apparatus and performing speech recognition, translation, and speech synthesis processing, a speech input device 202, a speech output device 203, a word dictionary 204, an example sentence file 205, and a memory 206. The memory 206 stores a control program 261, a word speech recognition program 262, a sentence speech recognition program 263, a template translation program 264, a speech output program 265, and a screen display program 266.
[0018]
Next, the contents of the example sentence file will be described.
[0019]
FIG. 6 shows the contents of the example sentence file. For each example sentence template, the example sentence file describes an example sentence number 601, a basic example sentence number 602, a detailed example sentence number 603, keywords 604, an original sentence template 605, pronunciation information 606 describing the pronunciation of the original sentence template, a translated sentence template 607, translated-sentence pronunciation information 608 describing the pronunciation of the translated sentence template, and slot information 609. The slot information 609 describes the number of slots 6091 and, for each slot, a representative example 6092 of the phrase that enters the slot, a translation 6093 of the representative example, pronunciation information 6094 of the translation of the representative example, semantic information 6095 describing what kind of phrase can enter the slot, and grammatical information 6096 describing the grammatical features (article, singular or plural, etc.) of the slot portion in the translated sentence.
[0020]
Example sentences include basic example sentences such as “I want to make a reservation” and detailed example sentences such as “Please find the earliest flight I can board.” For a basic example sentence, the example sentence numbers of the corresponding detailed example sentences are described in the detailed example sentence number 603. For a detailed example sentence, the example sentence number of the corresponding basic example sentence is described in the basic example sentence number 602.
[0021]
When example sentences are searched by keyword, basic example sentences are displayed toward the top. When the user selects “I want to make a reservation” and instructs display of detailed example sentences, detailed example sentences such as “Please find the earliest flight I can board.” are displayed. In the original sentence template 605 and the translated sentence template 607, replaceable phrases are represented as slots.
[0022]
For display on the screen, the words of the representative examples 6092 are embedded in the slots of the original sentence template. When the user instructs display or utterance of the translated sentence without replacing any word, the translation 6093 of the representative example is embedded in the translated sentence for display, and the pronunciation information 6094 of the translation of the representative example is embedded in the translated-sentence pronunciation information 608 for utterance.
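As a rough illustration of the record layout and of filling slots with representative examples, the following sketch models a subset of the fields (the numbers in the comments refer to FIG. 6). The class and function names, the `<S1>` placeholder convention applied to both templates, and the English rendering of example sentence 1356 are assumptions made for the example; pronunciation fields are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    representative: str               # representative example (6092)
    representative_translation: str   # translation of the representative example (6093)
    semantic_info: str = ""           # allowed words or classification code (6095)
    grammar_info: str = ""            # article, number, etc. in the translation (6096)


@dataclass
class ExampleTemplate:
    number: int                       # example sentence number (601)
    keywords: list[str]               # keywords (604)
    source_template: str              # original sentence template (605)
    target_template: str              # translated sentence template (607)
    slots: list[Slot] = field(default_factory=list)  # slot information (609)


def render(template_text: str, fillers: list[str]) -> str:
    """Replace <S1>, <S2>, ... with the given fillers, in order."""
    out = template_text
    for i, filler in enumerate(fillers, start=1):
        out = out.replace(f"<S{i}>", filler)
    return out


t = ExampleTemplate(
    number=1356,
    keywords=["<place>"],
    source_template="一番近い<S1>はどこですか",
    target_template="Where is the nearest <S1>?",   # assumed translation
    slots=[Slot("バス停", "bus stop", semantic_info="<place>")],
)
# Screen display: source template with representative examples embedded.
print(render(t.source_template, [s.representative for s in t.slots]))
# Translation shown when the user replaces nothing.
print(render(t.target_template, [s.representative_translation for s in t.slots]))
```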
[0023]
The semantic information 6095 of a slot may be described by listing words, as in “round-trip ticket; one-way ticket; ticket”, or by using a classification code, as in “<place>”. When the slot can be replaced with any word, no condition is described.
[0024]
An example sentence whose keyword 604 contains a classification code such as “<place>” is retrieved by any word having that classification code. For example, when the user inputs as a keyword a word having the classification code “<place>”, such as “post office”, “bus stop”, or “bank”, example sentence 1356, “Where is the nearest <S1>?”, is retrieved and displayed.
[0025]
Next, the contents of the word dictionary will be described. FIG. 7 shows the contents of the word dictionary. For each word entry, the word dictionary describes a word number 701, a headword 702, a reading 703, pronunciation information 704, grammatical information 705, a classification code 706, a translation 707, pronunciation information 708 of the translation, grammatical information 709 of the translation, and example sentence numbers 710. In a record whose headword is a classification code such as “<place>”, the example sentence numbers 710 are the numbers of the example sentences that have that classification code as a keyword.
[0026]
Next, the operation of the speech translation apparatus according to this embodiment will be described. FIG. 3 is a flowchart showing an outline of the operation of the speech translation apparatus. When the power button 103 is pressed (301), the program in the memory 206 is started and an initial screen is displayed on the display screen 102 (302). When speech is input through the microphone 106 while the initial screen is displayed, keyword speech recognition is performed (303).
[0027]
Next, keyword speech recognition will be described with reference to FIG. 4. First, when a word is input by speech (402), the word dictionary 204 is referred to, word speech recognition is performed by a speech recognition algorithm such as a hidden Markov model (HMM) (403), and word candidates are displayed on the screen (404).
[0028]
An example of the word candidate display screen will be described with reference to FIG. 8. On the word candidate display screen, the kana notation of the first candidate is displayed on the first line, and the word candidates from the first to the tenth are displayed on the second and subsequent lines. Initially the first candidate word is highlighted, and the highlight moves up, down, left, or right when the corresponding scroll key is used. This allows the user to select the intended word. The fourth and subsequent candidates, which are not displayed on the screen in FIG. 8, can be viewed by scrolling the screen with the down scroll key. For a word candidate that has homonyms, the homonyms are displayed on one line; for a word candidate without homonyms, its translation is displayed. A “*” at the beginning of a word candidate indicates that the word is a keyword for example sentence search.
When the user selects the intended word and presses the execution button (405), the word is confirmed as the keyword for example sentence search (408), and the keyword recognition process ends (409). If the word intended by the user is not among the word candidates, the user can select the first line, which contains the kana notation, and press the execution button to correct the input word at the kana level (406), after which the word dictionary is searched again (407). The user can also press the “return” button to go back to the state of accepting keyword speech input.
[0029]
The word candidate display screen is used by the user to confirm the keyword speech recognition result. However, since translations are displayed together with the word candidates, the screen can also be used as a bilingual dictionary. The translation can also be pronounced by selecting it and pressing the execution button. For homonyms, the translation is not displayed on the word candidate display screen, but if the user selects one of the homonyms and presses the execution button, its translation and example sentences can be displayed on the next screen.
[0030]
Next, example sentence search will be described. When the input word is determined, the example sentences whose numbers are listed under the example sentence number 710 of the input word in the word dictionary 204 are extracted from the example sentence file 205 (304) and displayed on the screen (305). For a word that has a classification code 706 in the word dictionary, example sentences that use the classification code as a keyword are also retrieved. For example, for the input word “bank”, example sentences that use the classification codes <location> and <organization> as keywords are retrieved in addition to example sentences that use “bank” as a keyword. The numbers of the example sentences that have the classification code <location> as a keyword are listed under the example sentence number 710 of the word dictionary record whose heading is “<location>”.
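The lookup described above can be sketched as follows. The dictionary layout, the function name `search_examples`, and the example numbers are hypothetical; the description only specifies that a word's own example sentence numbers are combined with those of its classification-code records.

```python
from typing import Dict, List

# Hypothetical dictionary: heading -> classification codes (706) and example numbers (710).
WORD_DICT: Dict[str, dict] = {
    "bank": {"codes": ["<location>", "<organization>"], "examples": [3]},
    "<location>": {"codes": [], "examples": [1, 2]},
    "<organization>": {"codes": [], "examples": [2, 5]},
}


def search_examples(keyword: str, word_dict: Dict[str, dict]) -> List[int]:
    """Return example numbers for the keyword, then for its classification codes."""
    entry = word_dict.get(keyword)
    if entry is None:
        return []
    found: List[int] = list(entry["examples"])
    for code in entry["codes"]:
        code_entry = word_dict.get(code, {"examples": []})
        for number in code_entry["examples"]:
            if number not in found:  # keep first-seen order, drop duplicates
                found.append(number)
    return found


print(search_examples("bank", WORD_DICT))  # [3, 1, 2, 5]
```

Listing the word's own examples before the classification-code examples is a design choice of this sketch; the description does not state a display order.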
[0031]
FIG. 9 shows an example of the example sentence display screen when an example sentence search is performed using “reservation” as the keyword. In FIG. 9, reference numeral 901 denotes a retrieved example sentence. On the example sentence display screen, the dictionary heading and translation of the keyword are displayed on the first line, and the first three retrieved example sentences are displayed on the second and subsequent lines. The fourth and later example sentences can be viewed by scrolling the screen with the scroll keys. A ★ at the beginning of an example sentence indicates a basic example sentence. On the example sentence display screen, the source-language representative example 607 is displayed embedded in the slot of the source sentence template, and is underlined to indicate that the phrase can be replaced. In FIG. 10, “Dallas”, “Miami”, and “14th” are replaceable phrases.
[0032]
Next, example sentence narrowing will be described. Retrieved example sentences that do not fit on the screen can be viewed using the up, down, left, and right scroll keys, but when there are many example sentences, a keyword can be added to narrow them down, which makes the desired example sentence easier to find. To narrow down the example sentences, the user selects (highlights) the keyword displayed on the first line of the screen with the scroll keys and presses the execution button. When an additional keyword is then input by voice, the keyword speech recognition described above is executed (307), the example sentences are narrowed down (308), and the narrowed-down example sentences are displayed (309). FIG. 9 shows the result of an example sentence search using “reservation” as the keyword, and FIG. 10 shows the result of narrowing down those example sentences with “flight”. When the user views the narrowed-down result and presses the “return” button, the example sentences from before the narrowing are displayed again (311).
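A minimal sketch of narrowing with a “return” operation is given below, assuming the search result is a list of example sentence numbers and the additional keyword maps to a set of such numbers. The class name `ExampleList` and the numbers are invented for illustration.

```python
from typing import Iterable, List, Set


class ExampleList:
    """Holds the displayed example numbers plus earlier lists for the "return" button."""

    def __init__(self, examples: Iterable[int]) -> None:
        self._history: List[List[int]] = [list(examples)]

    @property
    def current(self) -> List[int]:
        return self._history[-1]

    def narrow(self, keyword_hits: Set[int]) -> List[int]:
        """Keep only examples also indexed under the additional keyword (308)."""
        self._history.append([n for n in self.current if n in keyword_hits])
        return self.current

    def back(self) -> List[int]:
        """Restore the list shown before the last narrowing (311)."""
        if len(self._history) > 1:
            self._history.pop()
        return self.current


# Invented numbers: results for "reservation", then narrowed by "flight".
reservation = ExampleList([101, 102, 103, 104])
flight_hits = {102, 104, 250}
```

Calling `reservation.narrow(flight_hits)` keeps 102 and 104, and `reservation.back()` restores the original four entries. Keeping a history stack is one simple way to support repeated narrowing followed by repeated “return” presses.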
[0033]
When selecting an example sentence having no replaceable phrase, the example sentence may be selected by moving the cursor with a scroll key (312 to 314). When an example sentence is selected and the execution key is pressed, translation is executed (322).
[0034]
Next, detailed example sentence display will be described. When selecting an example sentence with the scroll keys, the user can select the ★ at the head of a basic example sentence instead of the example sentence itself and press the execution key to display the related detailed example sentences (314). For example, on the example sentence display screen of FIG. 9, selecting “I want to reserve” on the third line and pressing the execution key displays the translation result, whereas selecting the “★” to the left of “I want to reserve” and pressing the execution key displays the detailed example sentences for “I want to reserve”, as shown in FIG. 12.
[0035]
Next, sentence speech recognition will be described. When the user utters a sentence while watching the example sentence display screen (315), sentence speech recognition is executed (316). Sentence speech recognition is performed, for example, by the automaton-controlled continuous word speech recognition algorithm using HMMs described in “Speech recognition by a probability model” (IEICE, 1988), pp. 46-50.
[0036]
FIG. 5 shows the flow of the sentence speech recognition process. First, a finite state automaton such as the one shown in FIG. 14 is created for the example sentence template displayed on the screen. FIG. 14 shows the finite state automaton created for the example sentence template “Where is the nearest <s1>”. In FIG. 14, reference numerals 1401 and 1402 denote HMMs (hidden Markov models) created from the pronunciation information “ichibanchikai” and “wadokodesuka”, respectively. The model of the source sentence template is created on the basis of the pronunciation information 606 described in the example sentence file (502).
[0037]
The model of the slot portion is created on the basis of the pronunciation information of the words that can enter the slot. For this purpose, the semantic information 6095 in the slot information 609 of the example sentence file is first consulted (504) to determine which of the words registered in the word dictionary can enter the slot. If words are listed in the slot's semantic information 6095, the words that can enter the slot are the listed words (505); if a classification code is specified, they are the words having that classification code (506); and if nothing is specified, they are all words registered in the word dictionary (507). The model of each word is created on the basis of the pronunciation information 704 of the word described in the word dictionary (508).
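The three-way choice in steps 505 to 507 can be sketched as below. The representation of the semantic information 6095 as a dict with optional `"words"` and `"code"` keys is an assumption of this sketch, as are the dictionary contents.

```python
from typing import Dict, List, Optional

# Hypothetical word dictionary: heading -> classification codes (706).
WORDS: Dict[str, List[str]] = {
    "bus stop": ["<location>"],
    "bank": ["<location>", "<organization>"],
    "apple": ["<food>"],
}


def words_for_slot(semantic_info: Optional[dict],
                   words: Dict[str, List[str]]) -> List[str]:
    """Select the vocabulary allowed in a slot from its semantic information."""
    info = semantic_info or {}
    if info.get("words"):   # (505) words are listed explicitly
        return list(info["words"])
    if info.get("code"):    # (506) a classification code is specified
        return [w for w, codes in words.items() if info["code"] in codes]
    return list(words)      # (507) nothing specified: every registered word


print(words_for_slot({"code": "<location>"}, WORDS))  # ['bus stop', 'bank']
```

Restricting the slot vocabulary in this way keeps the number of word models attached to each slot small, which is presumably what makes recognition against the displayed templates tractable on a portable device.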
[0038]
Once a finite state automaton has been created for the example sentence template displayed on the screen (509), the input sentence is recognized by the automaton-controlled continuous word speech recognition algorithm using HMMs, and the top ten words for each slot are stored as slot candidate words (510).
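The structure of such an automaton can be illustrated with a deliberately simplified sketch. It replaces the HMM acoustic models by exact matching on romanized pronunciation strings, so it shows only how fixed template segments and slot vocabularies are chained, not how actual speech recognition works. The function names, the segment representation, and the sample pronunciations are all assumptions.

```python
from typing import Dict, List, Optional

Arcs = List[Dict[str, Optional[str]]]


def build_automaton(segments: List[str],
                    slot_vocab: Dict[str, Dict[str, str]]) -> Arcs:
    """One arc set per state: pronunciation -> slot word (None for fixed text)."""
    arcs: Arcs = []
    for seg in segments:
        if seg in slot_vocab:
            # A slot: one arc per word that can enter the slot.
            arcs.append({pron: word for word, pron in slot_vocab[seg].items()})
        else:
            # Fixed template text: a single arc.
            arcs.append({seg: None})
    return arcs


def recognize(arcs: Arcs, utterance: str, state: int = 0) -> Optional[List[str]]:
    """Return the slot words if the utterance is accepted, otherwise None."""
    if state == len(arcs):
        return [] if utterance == "" else None
    for pron, word in arcs[state].items():
        if utterance.startswith(pron):
            rest = recognize(arcs, utterance[len(pron):], state + 1)
            if rest is not None:
                return ([word] if word is not None else []) + rest
    return None


# Template "Where is the nearest <s1>" with an invented two-word slot vocabulary.
segments = ["ichibanchikai", "<s1>", "wadokodesuka"]
vocab = {"<s1>": {"bus stop": "basutei", "bank": "ginkou"}}
automaton = build_automaton(segments, vocab)
print(recognize(automaton, "ichibanchikaibasuteiwadokodesuka"))  # ['bus stop']
```

A real recognizer would score every path against the audio and keep the ten best words per slot (510) instead of returning the first exact match.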
[0039]
When a user speaks while looking at an example sentence template, pauses may occur before and after a slot or immediately after the phrase containing the slot. Higher recognition accuracy can therefore be obtained by creating, for each example sentence template, not only a finite state automaton for the whole sentence but also finite state automata for the phrases delimited by the places where pauses are likely to occur. In the above example, in addition to “ichibanchikai basutei wadokodesuka” shown in FIG. 14, finite state automata for phrases such as “ichibanchikai”, “basutei”, “wadokodesuka”, “basuteiwa”, and “dokodesuka” are also created and associated with the example sentence templates that include those phrases. Pauses can also be exploited actively: when the user pauses after “ichibanchikai”, the apparatus can indicate that the input matches the “nearest” part of “Where is the nearest <s1>” and, after this confirmation, guide the user to say the rest.
[0040]
In this embodiment, the words inserted into slots are recognized within the range of the words registered in the word dictionary, but not all words can be registered in the dictionary in advance. In particular, nouns such as proper nouns and names of local specialties are important in travel conversation, yet cannot be fully covered by a word dictionary. When a user wants to use such a word, however, the user often knows the word in the original language and can pronounce it approximately if only that word has to be said. Therefore, when the word for a slot is input by voice while a specific button is held down, the sound input by the user may be embedded as it is in the slot portion of the translation and output. If the above method of creating finite state automata for phrases and associating them with the example sentence templates containing those phrases is used, the example sentence template can be identified from the phrases before and after the part spoken while the specific button was pressed. As a result, the example sentence template “How many minutes is the wait for this attraction” can be used, for example, to say “How many minutes is the wait for Back to the Future?”. Alternatively, for specific slots such as <s1> in “My name is <s1>”, the sound input by the user may always be embedded as it is in the slot portion of the translation.
[0041]
In this embodiment, the target to be compared with the voice input sentence is limited to the example sentence template displayed on the screen, but the whole example sentence template searched by the keyword may be used. Also, example sentence templates selected in advance by the user or recently used example sentence templates may be stored in a “user example sentence file” and used as a comparison target.
[0042]
If the method of creating finite state automata for phrases and associating them with the example sentence templates containing those phrases is adopted, the range that needs to be spoken may be marked with a wavy line when the example sentence template is displayed, so that the user knows that the whole sentence need not be spoken. This improves usability. For example, for the sentence “Look for the earliest flight”, it is enough to say “Look for the earliest flight” or “the earliest flight”.
[0043]
Next, display and correction of the sentence recognition result will be described. The sentence speech recognition result is shown by moving the cursor to the recognized example sentence template and replacing the representative example in each slot with the input word (317). For example, when the example sentence display screen of FIG. 10 is displayed, the user says “I want a one-way ticket from Chicago to Seattle on the 17th”, and the sentence is recognized correctly, the screen of FIG. 11 is displayed. After viewing the sentence recognition result, the user presses the “return” button to redo the sentence input (318) or corrects the sentence recognition result as necessary (319). If the wrong example sentence template was selected, the user selects another sentence with the scroll keys, and word recognition for the slot portions is performed again using that example sentence template. If the word recognized for a slot is incorrect, the user selects the slot with the scroll keys and presses the execution button to display the slot candidate words stored at the time of sentence recognition. If the correct word is among the slot candidate words, the user selects it with the scroll keys and presses the execution button to confirm it.
[0044]
Next, translation will be described. When the sentence matching the voice input is confirmed, translation is instructed by pressing the execution button (320 to 321). If the sentence selected by voice input is a basic example sentence, its detailed example sentences can also be viewed by selecting the ★ at the beginning of the sentence and pressing the execution button (321). Template translation is performed by embedding the translation of the word or phrase in each slot into the translation of the example sentence template (322). The example sentence template describes grammatical information 6096 for the slot in the translated sentence. By referring to this and to the grammatical information 709 of the translation in the word dictionary 204, the article, the singular or plural form, and so on of the translation in the slot portion are determined. For example, when the word “apple”, whose grammatical information is “initial vowel”, is embedded in a slot designated “indefinite article, singular”, the indefinite article “an” is added; when the word “candy” is embedded in a slot designated “plural”, the plural form “candies” is embedded.
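The slot-filling step of template translation might look like the sketch below. The grammar labels are taken from the examples in the description, but the function names, the data layout, and the rule-based `pluralize` helper are assumptions; an actual dictionary could just as well store plural forms directly.

```python
from typing import Dict, Tuple


def pluralize(word: str) -> str:
    """Very small English pluralizer (an assumption; irregular nouns are not handled)."""
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    return word + "s"


def render_slot(translation: str, word_grammar: str, slot_grammar: str) -> str:
    """Combine slot grammar (6096) with the word's translation grammar (709)."""
    if slot_grammar == "indefinite article, singular":
        article = "an" if word_grammar == "initial vowel" else "a"
        return f"{article} {translation}"
    if slot_grammar == "plural":
        return pluralize(translation)
    return translation


def translate_template(target_template: str,
                       slot_grammar: Dict[str, str],
                       fillers: Dict[str, Tuple[str, str]]) -> str:
    """Embed each slot's translated word into the target-language template (322)."""
    result = target_template
    for slot, (translation, word_grammar) in fillers.items():
        rendered = render_slot(translation, word_grammar, slot_grammar.get(slot, ""))
        result = result.replace(slot, rendered)
    return result


print(translate_template("I would like <s1>.",
                         {"<s1>": "indefinite article, singular"},
                         {"<s1>": ("apple", "initial vowel")}))
# I would like an apple.
```

Keeping the grammatical requirement on the template side and the word property on the dictionary side means a new word needs only its own grammar label to work in every existing template.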
[0045]
In the translation result display step (323), the translation result display screen shown in FIG. 13 is displayed. If the user's instruction (324) is the “return” button, the example sentences displayed before the translation was executed are displayed again (325). If the user selects the translation and presses the execution button, the translation is spoken (326); if the user selects the original sentence and presses the execution button, example sentences for the next utterance are displayed (327). The example sentences displayed for the next utterance are the detailed example sentences when the sentence just translated is a basic example sentence, and the example sentences displayed before the translation was executed when the sentence just translated is a detailed example sentence. This is because the detailed example sentences retrieved with the same keyword form a set of related example sentences, and several of them are likely to be used in succession. Alternatively, keywords related to the scene, such as “hotel”, “front desk”, and “shopping”, may be assigned to each example sentence template as “scene information” separate from the search keywords, and only example sentences sharing the same scene information may be displayed as example sentences for the next utterance. Furthermore, even when a new keyword search is performed, example sentences whose scene information matches that of the example sentence used immediately before may be displayed preferentially at the top, or the same scene may be assumed to continue until it is reset, with the example sentences narrowed down by that scene information.
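The two scene-based alternatives, preferential display and narrowing, can be sketched as follows. The record layout with a `"scenes"` list, the function names, and the sample sentences are assumptions made for illustration.

```python
from typing import Dict, List

# Invented example sentence records with hypothetical scene information.
EXAMPLES: List[Dict] = [
    {"text": "Where is the bank?", "scenes": ["city"]},
    {"text": "I have a reservation.", "scenes": ["hotel"]},
    {"text": "How much is this?", "scenes": ["shopping"]},
    {"text": "Please call a taxi.", "scenes": ["hotel", "city"]},
]


def prioritize_by_scene(examples: List[Dict], scene: str) -> List[Dict]:
    """Show examples matching the current scene first, keeping the original order otherwise."""
    # sorted() is stable, and False sorts before True, so matches move to the top.
    return sorted(examples, key=lambda ex: scene not in ex["scenes"])


def narrow_by_scene(examples: List[Dict], scene: str) -> List[Dict]:
    """Keep only examples whose scene information includes the current scene."""
    return [ex for ex in examples if scene in ex["scenes"]]


print([ex["text"] for ex in prioritize_by_scene(EXAMPLES, "hotel")])
# ['I have a reservation.', 'Please call a taxi.', 'Where is the bank?', 'How much is this?']
```

Prioritizing keeps unrelated sentences reachable by scrolling, while narrowing hides them until the scene is reset; which of the two suits a small screen better is left open by the description.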
[0046]
In this embodiment, the template translation is performed using the bilingual example sentence template, but the true purpose of the example sentence template is to induce the user's utterance. Therefore, the translation process may be performed by machine translation based on a dictionary and grammatical rules. In that case, the example sentence template does not necessarily need to be translated, and only the example sentence template of the input language is sufficient.
[0047]
Although the embodiment has been described above as a portable speech translation apparatus, the present invention may also take the form of a recording medium on which the flows described in FIGS. 3, 4, and 5 for realizing such a translation apparatus are recorded. One such recording medium is a computer-readable recording medium on which the following steps are recorded: a step of recognizing a phrase input by voice using speech recognition means; a step of searching the example sentence file for example sentence templates associated with the phrase; a step of displaying the retrieved example sentence templates on a display; a step of recognizing a sentence spoken by the user on the basis of the displayed example sentence templates; a step of comparing the recognized sentence with the retrieved example sentence templates and identifying the most similar one; a step of replacing the replaceable words and phrases in the example sentence template on the basis of the sentence spoken by the user and translating the result; and a step of displaying the translation result on the display or outputting it as speech.
[0048]
A translation apparatus of a different form may have a different user interface. For example, in a speech translation apparatus installed at the counter of a hospital, a government office, a hotel, or the like, the screen may be a touch panel, and the user may touch a slot with a finger and utter only the replacement word. In this embodiment, in view of the small display screen and one-handed use, example sentence templates are searched for by keyword voice input. However, the example sentence templates may instead be classified by scene and utterance intention (question, request, explanation, etc.) and narrowed down by menu display and selection.
[0049]
In the present embodiment, a speech translation apparatus that translates from Japanese to English has been described, but the present invention can also be applied to other language pairs. In this embodiment, the bilingual example sentence template describes information for recognition for the source sentence template and information for generation for the translated sentence template. Translation in both directions can also be performed by describing both the information for recognition and the information for generation for each of the two templates. When bidirectional translation is performed, it is effective to associate the example sentence templates of answers with the example sentence template of a question, in the same way that basic example sentences and detailed example sentences are associated with each other in this embodiment.
[0050]
[Effect of the invention]
According to the present invention, an example sentence template suited to the user's utterance intention can be retrieved quickly from a keyword input by voice, and the user's utterance can be guided. Because users can speak while looking at example sentence templates, they can easily say what they want to say as a complete sentence, and because they can speak the desired word or phrase into the slot of the example sentence template, no troublesome replacement operation is needed. It is thus possible to realize a translation apparatus that combines the advantages of an example sentence search type translation apparatus with those of an arbitrary sentence input type translation apparatus.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating an appearance and an initial screen of an embodiment of a speech translation apparatus according to the present invention.
FIG. 2 is a block diagram showing a configuration of an embodiment of a speech translation apparatus according to the present invention.
FIG. 3 is a flowchart showing the operation of the embodiment of the speech translation apparatus of the present invention.
FIG. 4 is a flowchart showing a procedure of word speech recognition.
FIG. 5 is a flowchart showing a procedure of input sentence speech recognition.
FIG. 6 is a diagram showing the contents of an example sentence file.
FIG. 7 is a diagram showing the contents of a word dictionary.
FIG. 8 is a diagram showing a screen display example in a word candidate display step.
FIG. 9 is a diagram showing a screen display example in an example sentence display step.
FIG. 10 is a diagram showing a screen display example in a narrowed example sentence display step.
FIG. 11 is a diagram showing a screen display example in a sentence recognition result display step.
FIG. 12 is a diagram showing a screen display example in a detailed example sentence display step.
FIG. 13 is a diagram showing a screen display example in a translation result display step.
FIG. 14 is a diagram illustrating an example of a finite state automaton for an example sentence template.
[Explanation of symbols]
201: CPU
202: Voice input device
203: Audio output device
204: Word dictionary
205: Example sentence file
206: Memory
261: Control program
262: Word speech recognition program
263: Sentence speech recognition program
264: Template translation program
265: Audio output program
266: Screen display program.

Claims

A translation method in a translation apparatus having a CPU, a voice input unit, a display, and storage means for storing an example sentence file that stores example sentence templates in association with their pronunciation information and a word dictionary that stores words in association with their pronunciation information, the method comprising:
A first step of displaying on the display an example sentence template stored in the example sentence file, the example sentence template including a replaceable part and having its pronunciation information stored in advance in a memory;
A second step of receiving voice input from the voice input unit;
A third step in which the CPU compares the voice input with a speech model generated using the pronunciation information of the displayed example sentence template stored in the storage means and the pronunciation information of words that can enter the replaceable part of the example sentence template, and takes, as the recognition result of the voice input, the example sentence template including the voice input portion corresponding to the replaceable part;
A fourth step of outputting the recognition result of the voice input portion; and
A fifth step of translating the recognition result and outputting the translation.

The translation method according to claim 1,
In the first step, a plurality of the example sentence templates are displayed,
The translation method characterized in that, in the third step, the voice input is compared with the pronunciation information of the plurality of displayed example sentence templates to identify the example sentence template corresponding to the voice input.

The translation method according to claim 1,
In the first step, displaying information indicating the characteristics of the words to be inserted into the replaceable part of the example sentence template,
The translation method characterized in that, in the third step, the voice input portion is recognized using the pronunciation information of a word in the word dictionary that corresponds to the displayed characteristics.

A translation device comprising:
A memory storing an example sentence file that includes an example sentence template including a replaceable part and its pronunciation information, and a word dictionary that stores a plurality of words and their voice information;
A display that displays the example sentence template stored in the memory and information about the replaceable part;
An input unit for receiving voice input; and
A CPU,
Wherein the CPU compares the voice input with a speech model generated using the pronunciation information of the displayed example sentence template stored in the memory and the pronunciation information of words that can enter the replaceable part of the example sentence template, takes, as the recognition result of the voice input, the example sentence template including the voice input portion corresponding to the replaceable part, and translates and outputs the recognition result.

The translation device according to claim 4,
The display displays a plurality of the example sentence templates,
The translation device characterized in that the CPU compares the voice input with the pronunciation information of the plurality of displayed example sentence templates and identifies the example sentence template corresponding to the voice input.

The translation device according to claim 4,
The display displays information indicating the characteristics of words inserted in the replaceable part of the example sentence template,
The translation device characterized in that the CPU recognizes the voice input portion using the pronunciation information of a word in the word dictionary that corresponds to the displayed characteristics.

A program storage medium storing a program that causes a translation device, which has a processing unit, a voice input unit, a display, and storage means for storing an example sentence file that stores example sentence templates in association with their pronunciation information and a word dictionary that stores words in association with their pronunciation information, to execute:
A first step of displaying on the display an example sentence template stored in the example sentence file, the example sentence template including a replaceable part and having its pronunciation information stored in advance in a memory;
A second step of receiving voice input from the voice input unit;
A third step in which the processing unit compares the voice input with a speech model generated using the pronunciation information of the displayed example sentence template stored in the storage means and the pronunciation information of words that can enter the replaceable part of the example sentence template, and takes, as the recognition result of the voice input, the example sentence template including the voice input portion corresponding to the replaceable part; and
A fourth step of translating the recognition result and outputting the translation result.

The program storage medium according to claim 7,
In the first step, a plurality of example sentence templates are displayed,
The program storage medium characterized in that, in the third step, the voice input is compared with pronunciation information of the plurality of displayed example sentence templates to identify an example sentence template corresponding to the voice input.

The program storage medium according to claim 7,
In the first step, information indicating the characteristics of the words to be inserted into the replaceable part of the example sentence template is displayed;
The program storage medium characterized in that in the third step, the voice input portion is recognized using pronunciation information of a word corresponding to a feature of the displayed phrase in the word dictionary.