JP3993319B2

JP3993319B2 - Interpreting device and recording medium storing program for exerting functions of interpreting device

Info

Publication number: JP3993319B2
Application number: JP25436198A
Authority: JP
Inventors: 睦川越
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1998-09-08
Filing date: 1998-09-08
Publication date: 2007-10-17
Anticipated expiration: 2018-09-08
Also published as: JP2000090087A

Description

【０００１】
【発明の属する技術分野】
本発明は、第１言語と第２言語との間での会話を通訳する通訳装置に関する。
【０００２】
【従来の技術】
従来の携帯型の通訳装置では、ユーザが使用場面に応じて分類されたメニューの選択やキーワードの入力をして第１言語の単語又は一文（質問文）を選択すると、その選択された単語又は一文に対応する第２言語の単語又は一文（質問文）が表示される方式が主に採用されている。
【０００３】
ユーザが会話相手から返事を得るために、例えば特開平７−１０５２２０号公報の技術では、上記の第２言語の質問文の表示を会話相手に提示する際に、第２言語の応答文を数種類表示するようにして、それらの応答文うちの一文を会話相手に選択させる方式が示されている。
【０００４】
また、特開平９−３１９７５０号公報の技術では、会話相手の応答を想定し、応答を階層的に分類して選択を容易にしたり、また数字や記号を直接入力させて相手からの応答を得る方式が示されている。
【０００５】
【発明が解決しようとする課題】
ところが、上記した２つの技術では、ユーザが会話相手から応答文を得るために、会話相手に通訳装置の使用方法を教えて、装置の操作を任せなければならず、通訳装置の操作に不慣れな相手とは、スムーズな会話が困難である。
【０００６】
本発明は、上記課題に鑑み、会話相手が通訳装置の取扱いに習熟していることを必要としない操作性の優れた通訳装置を提供することを目的とする。
【０００７】
【課題を解決するための手段】
上記課題を解決するため本発明は、第１言語の質問文とそれを翻訳した第２言語の質問文とを対にして記憶している質問文データベースと、第２言語の質問文に応答する標準的な複数の第２言語の応答文と、それらを翻訳した第１言語の応答文とを記憶している応答文データベースと、入力操作手段と、音声入力手段と、前記入力操作手段でユーザから入力された第１言語の質問文を前記質問文データベースを検索し、それと対になっている第２言語の質問文を抽出する質問文抽出手段と、抽出された第２言語の質問文の出力を受けてユーザの会話相手が第２言語の応答文を音声で入力した場合に、入力された音声を第２言語の文字応答文として認識する音声認識手段と、前記音声認識手段で認識された文字応答文に類似する第２言語の応答文と対応する第１言語の応答文とを前記応答文データベースから抽出する応答文抽出手段と、前記応答文抽出手段で抽出された第２言語の応答文と第１言語の応答文と前記音声認識手段で認識された文字応答文とを表示する表示手段とを備えることとしている。
【０００８】
【発明の実施の形態】
以下、本発明に係る通信装置の実施の形態について、図面を用いて説明する。
【０００９】
（実施の形態１）
図１は、本発明に係る通訳装置の実施の形態１の構成図である。
【００１０】
この通訳装置は、質問文翻訳データベース１０１と、応答文データベース１０２と、入力操作部１０３と、質問文選択部１０４と、表示部１０５と、音声出力部１０６と、応答文抽出部１０７と、音声認識辞書１０８と、音声入力部１０９と、音声認識部１１０とを備えている。
【００１１】
質問文翻訳データベース１０１は、ＲＯＭ等からなり、第１言語の質問文とそれを第２言語に翻訳した第２言語の質問文とを一対にして識別番号を付して記憶している。図２は、この質問文翻訳データベース１０１の内容の一例を示している。この質問文翻訳データベース１０１では、第１言語を日本語とし、第２言語を英語としている。なお、本実施の形態では、この通訳装置を利用するユーザが日本語を解し、会話相手が英語を解する状況で用いられるものである。
【００１２】
質問文翻訳データベース１０１の識別番号「０００２」には、日本語の質問文「空港へはどのくらい時間がかかりますか」と英語に翻訳された「How long does it take to the airport?」とが対にして記憶されている。更に、この質問文に応答する標準的な応答文の数「５」とその応答文の識別番号「Ｔ１００１」、「Ｔ１００２」、・・・「Ｔ１００５」とが記憶されている。この応答文の識別番号は、応答文データベース１０２に応答文を識別する識別番号として記憶されているものである。
【００１３】
応答文データベース１０２は、ＲＯＭ等からなり、質問文翻訳データベース１０１の英語の質問文に答える標準的な応答文の一覧が記憶されている。図３は、応答文データベース１０２に記憶されている内容の一例を示す図である。各応答文３０１には、その応答文の日本語訳３０２と、その応答文の識別番号３０３とが対応して記憶されている。
【００１４】
応答文データベース１０２には、識別番号「Ｔ０９９９」で識別される応答文「３０dollars」とその日本語への翻訳文「３０ドルです。」とが記憶されている。
【００１５】
入力操作部１０３は、キーボード等からなり、ユーザの質問文の入力を受け付ける。この際、ユーザは、質問文の全文を入力してもよいけれども、例えば、空港への所要時間を尋ねたい場合、「空港」と「時間」とを入力する。入力操作部１０３は、入力された質問文又は入力された単語「空港」と「時間」とを質問文選択部１０４に通知する。
【００１６】
また、入力操作部１０３は、ユーザからの質問指示の入力を受けると、質問文選択部１０４にその旨を通知する。
【００１７】
質問文選択部１０４は、入力操作部１０３から質問文の通知を受けると、質問文翻訳データベース１０１を調べ、一致する質問文とその翻訳された質問文と識別番号と応答文の識別番号とを抽出する。
【００１８】
また、質問文として単語の通知を受けると、その単語をキーワードとして、キーワードを含む質問文を選択し、対になっている翻訳された質問文と識別番号とその応答文の識別番号とを抽出する。例えば、「空港」と「時間」とを通知されたとき、この２つのキーワードを含む識別番号「０００２」の質問文「空港へはどのくらい時間がかかりますか」を選択する。質問文選択部１０４は、抽出した日本語の質問文と英語の質問文とを表示部１０５の所定の領域に表示させ、併せて、応答文の識別番号「Ｔ１００１、Ｔ１００２、Ｔ１００３、Ｔ１００４、Ｔ１００５」を応答文抽出部１０７に通知する。
【００１９】
質問文選択部１０４は、入力操作部１０３から質問指示の通知を受けると、音声出力部１０６に質問文の識別番号を通知する。
【００２０】
表示部１０５は、液晶ディスプレイ等からなり、質問文選択部１０４と応答文抽出部１０７の制御により、質問文や応答文を表示する。
【００２１】
図４は、表示部１０５の表示内容の一例を示す図である。この図は、質問文選択部１０４で質問文が選択され、音声出力部１０６から翻訳された質問文が音声出力されている状態での表示例である。日本語の質問文４０１とその英語の質問文４０２とが質問文選択部１０４によって表示され、その英語の質問文４０２の答えとなる標準的な応答文とそれを日本語に翻訳した翻訳応答文との一覧である応答例４０３とが応答文抽出部１０７によって表示される。このような表示がされることにより、ユーザは、自分が質問しようとする内容と、会話相手が答えてくれるであろう応答文の内容を知ることができる。
【００２２】
図５は、表示部１０５の表示内容の他の一例を示す図である。この図は、ユーザの会話相手から質問文に対する音声の応答文を受けた後の表示内容を示している。質問文４０１、４０２は、図４と同様に表示されている。応答例５０１は、音声認識部１１０で認識された会話相手からの音声応答の認識結果に類似する応答文の一覧が示されている。この応答文中の認識結果と一致する単語は、反転表示されている。表示部１０５の下段に、音声応答の認識結果５０２が示されている。この表示部１０５の内容をユーザが見れば、空港までの所要時間が「タクシーで２０分」であることを推定できる。即ち、認識結果５０２に一致する単語を反転表示された部分が応答文に示されているので、それと認識結果５０２とを比べることにより、数字が異なることから類推される。
【００２３】
音声出力部１０６は、質問文の識別番号とその翻訳された質問文の音声パターンとを記憶しており、質問文選択部１０４から識別番号を通知されると、その識別番号の音声パターンを取り出し、音声信号に変換し、スピーカを介して、質問文を音声出力する。
【００２４】
応答文抽出部１０７は、質問文選択部１０４から応答文の識別番号の通知を受けると、応答文データベース１０２に記憶されている識別番号の一致する応答文とその翻訳応答文とを抽出する。抽出した応答文とその翻訳応答文との一覧を表示部１０５の所定の領域に図４に示したように表示させる。
【００２５】
また、応答文抽出部１０７は、音声認識部１１０から認識結果である文字応答文の通知を受けると、抽出した応答文のうち、文字応答文と類似する応答文を選択する。この類似する応答文とその翻訳応答文と文字応答文とを表示部１０５の所定の表示領域に表示させる。
【００２６】
この類似する応答文の選択に際して、通知された文字応答文を構成する単語と、抽出した応答文を構成する単語とを比較し、一致する単語が応答文に存在するとき、その応答文を類似する応答文として選択する。また、この応答文中の一致する単語の表示属性を図５に示すように白抜きの反転表示となるよう表示属性を変更する。
【００２７】
なお、本実施の形態では、類似する応答文は、応答文に文字応答文を構成する単語を含む場合としたけれども、類似する応答文の選択を以下のようにしてもよい。応答文抽出部１０７は、応答文に含まれる単語の総数に対する文字応答文を構成する単語に一致する単語数の割合を類似度として計算し、所定値以上の類似度を有する応答文を類似する応答文として選択する。
【００２８】
音声認識辞書１０８は、単語標準パターンと単語とを対にして記憶している。図６は、音声認識辞書１０８の内容の一例を示す図である。
【００２９】
単語標準パターン６０１は、標準的に発音された単語「twenty」６０２を発音時間分に対応した５０フレームの特徴量で表している。各フレームは、単語「twenty」を０．５秒で発音し、時間間隔１０msごとずらした所定時間分の音声パターンに対応している。この音声パターンをフーリエ変換し、周波数帯域を１６分割し、各帯域の強度を１６の数値列で表したものを１フレームの特徴量としている。なお、音声パターン６０４は、この各フレームへの分割を説明している。単語「minutes」等についても、単語標準パターンは、所定数のフレームに１６の数値列で表されているけれども、図では省略されている。
【００３０】
音声入力部１０９は、マイクロフォン等からなり、会話相手の音声応答文の入力を受け付け、音声認識部１１０にその音声信号を通知する。
【００３１】
音声認識部１１０は、音声入力部１０９から通知された音声信号を分析し、単語単位の各セグメントに分割し、各セグメントの音声パターンと音声認識辞書１０８の単語標準パターンとを照合する。音声パターンと最も類似度の高い単語標準パターンに対にして記憶されている単語を認識結果とし、認識結果の単語を並べて文字応答文を得る。得られた文字応答文を応答文抽出部１０７に通知する。
【００３２】
本実施の形態では、音声入力部１０９から入力された音声応答文「２０minutes by taxi.」が音声認識部１１０で認識されたけれども、入力される音声応答文によっては、最も類似度の高い標準単語パターンに対になっている単語を選択することが必ずしも正しい認識結果である文字応答文になるとは限らない。
【００３３】
本実施の形態の音声認識部１１０は、更に以下の構成を有する。
【００３４】
音声入力部１０９から図７（ａ）に示す音声応答文「You can go on foot.」が会話相手から入力された場合について説明する。
【００３５】
音声認識部１１０では、図７（ｂ）に示すように音声分析の処理をして、音声パターン７０１を各単語に対応するセグメント７０２〜７０６に分割する。各セグメントの音声パターン７０２〜７０６と音声認識辞書１０８に記憶されている単語標準パターンとを照合し、例えば、ＤＰ（動的計画）マッチング法を用いて類似度を計算する。これによって、セグメント７０２の音声パターンから類似度の高い単語標準パターンの単語から順に、「show」、「you」、「how」、・・・の各単語候補が選択される。同様にセグメント７０３の音声パターンから「can」、「and」、・・・の各単語候補が選択される。
【００３６】
今、応答文抽出部１０７で抽出された応答文を構成する単語に「you」、「can」、「go」、「on」、「foot」等が含まれているとする。音声認識部１１０は、各セグメントごとに各単語候補が応答文抽出部１０７で抽出されている応答文中の単語中に存在するか否かを調べ、存在するときは、その類似度を例えば、「２倍」として、各単語候補を類似度順に並べ替える。この結果、音声入力部１０９から入力された音声応答文の音声パターン７０１から図８に示すような修正された類似度により、各単語候補の順位が並べ替えられる。なお、図７（ｂ）、図８において、類似度の具体的数値の記載は省略している。音声認識部１１０は、最も類似度の高い候補文字をセグメント順に並べて、「You can go on foot.」を認識結果である文字応答文として応答文抽出部１０７に通知する。
【００３７】
このように、質問文に対する応答文が標準的な応答文として応答文抽出部１０７に用意されているので、その応答文を用いて、不特定の会話相手の音声認識の精度を向上させている。
【００３８】
次に、本実施の形態の動作を図９、図１０のフローチャートを用いて説明する。
【００３９】
先ず、ユーザは、入力操作部１０３から日本語の質問文を入力する（Ｓ９０２）。
【００４０】
質問文選択部１０４は、入力操作部１０３で受け付けられた質問文に一致する質問文を質問文翻訳データベース１０１を検索して見つける（Ｓ９０４）。見つけた質問文とその質問文を翻訳した英語の翻訳質問文とを表示部１０５に表示させる（Ｓ９０６）。
【００４１】
次に、応答文抽出部１０７は、質問文選択部１０４から通知された応答文の識別番号をもとに応答文データベース１０２を検索して、応答文とその翻訳文とを抽出する（Ｓ９０８）。抽出した応答文とその翻訳文との一覧を表示部１０５に表示させる（Ｓ９１０）。
【００４２】
質問文選択部１０４は、入力操作部１０３において、ユーザからの質問指示があるか否かを判断する（Ｓ９１２）。なければＳ９０２に戻り、あれば音声出力部１０６に質問指示を通知する。音声出力部１０６は、英語の翻訳質問文を音声出力する（Ｓ９１４）。
【００４３】
次に、音声入力部１０９は、音声出力部１０６から出力された英語の翻訳質問文に答えた会話相手の英語の応答文の音声入力を受け付ける（Ｓ９１６）。
【００４４】
音声認識部１１０は、音声入力部１０９から通知された音声信号を音声分析し、音声認識辞書１０８と照合し、音声を文字応答文として認識する（Ｓ９１８）。
【００４５】
応答文抽出部１０７は、応答文の一覧から文字応答文に類似する応答文を選択する（Ｓ９２０）。類似する応答文（英語）とその翻訳文（日本語）と認識結果である文字応答文（英語）とを表示部１０５に表示させ、この際、類似する応答文の文字応答文に一致する部分の表示属性を変更する（Ｓ９２２）。このようにすることによって、ユーザは、英語を理解することができなくても、質問に対する答えを日本語で理解することができる。
【００４６】
次に、Ｓ９１８の動作の詳細を図１０のフローチャートを用いて詳細に説明する。
【００４７】
先ず、音声認識部１１０は、音声入力部１０９から入力された音声信号を音声分析して、単語単位のセグメントの音声パターンに分割する（Ｓ１００２）。分割した音声パターンを１つ取り出し、音声認識辞書１０８の標準単語パターンとパターンマッチングして類似度を計算する（Ｓ１００４）。この類似度が一定値以上の値を有する標準単語パターンの単語を単語候補とする（Ｓ１００６）。すべてのセグメントの音声パターンについてこの照合をする（Ｓ１００８）。
【００４８】
続いて、単語候補が応答文抽出部で抽出されている応答文の一覧にある応答文を構成する単語と一致するか否か判断する（Ｓ１０１０）。一致しなければ、Ｓ１０１４に移り、一致するときには、その単語候補の類似度を２倍に修正する（Ｓ１０１２）。
【００４９】
次に未処理の単語候補があるか否かを判断し（Ｓ１０１４）、あればＳ１０１０に戻り、なければ各セグメントの類似度の最も高い単語候補を並べて認識結果である文字応答文を得る（Ｓ１０１６）。
【００５０】
このように、予め質問文に対する標準的な応答文が用意されているので、音声応答を音声認識する際に、この応答文に含まれる単語が認識結果である文字応答文を構成する単語となる確率が高いことを利用できる。この結果、不特定の会話相手の音声応答であっても、音声認識の精度が向上する。
【００５１】
（実施の形態２）
図１１は、本発明に係る通訳装置の実施の形態２の構成図である。
【００５２】
この通訳装置は、質問文翻訳データベース１０１と、応答文データベース１１０１と、入力操作部１０３と、質問文選択部１０４と、表示部１０５と、音声出力部１１０２と、応答文抽出部１１０３と、音声認識辞書１０８と、音声入力部１０９と、音声認識部１１０とを備えている。
【００５３】
なお、上記実施の形態１の構成と同一の部分には同一の符号を付してその説明を省略し、本実施の形態固有の構成部分についてのみ説明する。
【００５４】
応答文データベース１１０１は、上記実施の形態１で説明した第２言語（英語）の標準的な応答文とその第１言語（日本語）の翻訳応答文に加えて、その応答文の単語と翻訳応答文の単語との対応を同一の識別番号を付して関連付けて記憶している。更に、応答文を構成する単語又は単語列が質問文に対する応答として必須の要件か否か、また必須の要件である場合には、置換可能性があるか否か、音声認識が正しくされなかったときの再質問規則が記憶されている。
【００５５】
ここで、置換可能性とは、標準的な応答文を構成する単語が他の単語に置換されることが予想されることをいい、置換条件となる単語の属性を含んでいる。
【００５６】
また、再質問規則は、音声認識部１１０において、その応答文を構成する単語を正しく認識することができなかったときに、再質問をするときの規則を定めたものである。
【００５７】
図１２は、この応答文データベース１１０１の内容の一例を示す図である。識別番号「Ｔ１００１」で識別される応答文「１５minutes.」とその翻訳応答文「１５分です。」とを構成する単語の「１５」と「１５」とには識別番号「１」が、「minutes」と「分」とには識別番号「２」がそれぞれ付記されており、対応する単語であることを示している。同様に識別番号「Ｔ１００５」で識別される応答文とその翻訳応答文とにも対応する単語に同一の識別番号が付されている。
【００５８】
一覧表１２０１、１２０２は、応答文を構成する単語について「必須の要件」欄１２０３と、「置換可能性」欄１２０４と、「再質問規則」欄１２０５とを有している。一覧表１２０１は、識別番号「Ｔ１００１」で識別される応答文を構成する単語「１５」、「minutes」が「必須の要件」であるか否か等を記載したものである。両単語とも「必須」であるので「YES」１２０６が「必須の要件」欄１２０３に記載されている。
【００５９】
また、両単語とも他の単語に置換されることが予想されるので、「置換可能性」欄１２０４にはYES１２０７が記載され、単語「１５」が置換される単語は「数字」であり、単語「minutes」が置換される単語は「時間単位」例えば、「second」、「hour」であることが記載されている。
【００６０】
また、単語「１５」又は置換条件に合致した単語が正しく音声認識されないときの対応する規則が再質問規則欄１２０５に「KEYBORD」１２０８と記載されている。この規則「KEYBORD」１２０８は、会話相手からキーボードの操作により単語「１５」を置換する数字の入力を受け付けることを意味している。
【００６１】
単語「minutes」又は置換条件に合致した単語が正しく音声認識されないときの対応規則が再質問規則欄１２０５に「MENU」１２０９と記載されている。この規則「MENU」１２０９は、表示部１０５に「minutes」、「second」、「hour」を表示して、この中から一つを会話相手に選択させることを意味している。
【００６２】
同様に一覧表１２０２には、識別番号「Ｔ１００５」で識別される応答文を構成する単語が「必須の要件」であるか否か等が記載されている。
【００６３】
単語列「by taxi」は、必須の要件であり、置換条件とされる単語の属性は「交通機関」であり、再質問規則として「STR "By what?"」１２１０が記載されている。ここで、再質問規則「STR "By what?"」１２１０は、会話相手に音声による質問文"By what?"を出力することを意味している。
【００６４】
音声出力部１１０２は、再質問規則に対応した音声パターンを記憶している。
【００６５】
再質問規則「KEYBOARD」１２０８に対応して「Input numerals with keyboard.」の音声パターンを記憶しており、「MENU」１２０９に対応して「Indicate appropriate menu item.」の音声パターンを記憶しており、「STR "By what?"」１２１１に対応して「By what?」の音声パターンを記憶している。
【００６６】
音声出力部１１０２は、応答文抽出部１１０３から再質問規則の通知を受けると、対応する音声パターンを音声として出力する。また、応答文抽出部１１０３から再質問の指示を受けると、英語の翻訳質問文を再度出力する。
【００６７】
応答文抽出部１１０３は、上記実施の形態１で説明した構成に加えて、音声認識部１１０から文字応答文の通知を受けて類似する応答文を選択した後、以下の処理をする。
【００６８】
応答文抽出部１１０３は、文字応答文と類似する応答文とが完全に一致するか否かを両文を構成する単語を比較して判断する。完全に一致すれば、質問文の答えに対する応答が成立したものとする。
【００６９】
完全に一致しないとき、異なる部分（類似する応答文の単語）が必須の要件であるか否かを応答文データベース１１０１の一覧表１２０１等をみて判断する。必須の要件でなければ、質問文の答えとして成立するとみなす。
【００７０】
異なる部分が必須の要件であるとき、その単語が置換可能性を有するか否かを同様に一覧表１２０１等をみて判断する。置換可能性がなければ、質問に対する答えとして成立しない。この際、質問文が会話相手に伝わらなかったものとして、音声出力部１１０２に再質問の指示をする。
【００７１】
置換可能性があるときには、文字応答文の異なる部分に対応する単語が置換条件である単語の属性に一致するか否かを判断する。一致すると判断したとき、文字応答文は質問文の答えとして成立する。
【００７２】
一致しないと判断したとき、質問文の答えとして必須の要件が欠落していることになるので、再質問規則を一覧表１２０１等から取得する。
【００７３】
応答文抽出部１１０３は、再質問規則が「KEYBOARD」１２０８であるときには、音声出力部１１０２に再質問規則「KEYBOARD」を通知するとともに、表示部１０５に、「キーボードで入力依頼」を表示させる。これにより、ユーザは、会話相手に入力操作部１０３を用いて入力操作を受けることを了解する。
【００７４】
また、再質問規則が「MENU」１２０９であるときには、音声出力部１１０２に再質問規則「MENU」を通知するとともに、表示部１０５に「second」、「minutes」、「hour」を表示させる。なお、この場合、置換条件の単語の属性が「時間単位」であるのでメニュー項目が「second」等となったけれども、単語の属性によってメニュー項目の内容は変更される。
【００７５】
再質問規則が「STR "By what?"」１２１０であるときには、音声出力部１１０２に再質問規則「STR "By what?"」を通知する。
【００７６】
応答文抽出部１１０３は、音声出力部１１０２に「STR "By what?"」の通知をした後、音声認識部１１０から文字応答文（単語列）の通知を受けると、類似する応答文の単語列又は置換条件に合致した、「by taxi」、「by bus」「by train」又は「on foot」等であるときには、文字応答文を修正する。
【００７７】
また、再質問規則「KEYBOARD」、「MENU」を音声出力部１１０２に通知した後に、入力操作部１０３から会話相手からの数字の入力やメニュー項目の指示を受けた旨の通知により、対応する文字応答文の単語を数字やメニュー項目の内容に変更して表示部１０５に表示させる。
【００７８】
応答文抽出部１０３は、上記実施の形態１では、文字応答文と類似する応答文とを構成する単語で一致するものを類似する応答文の表示属性を変更して表示するようにしたけれども、本実施の形態では、これに換えて、応答文データベース１１０１の類似する応答文を構成する単語とその翻訳応答文を構成する単語との対応関係を識別番号から調べ、文字応答文に一致する翻訳応答文の部分の表示属性を変更する。
【００７９】
なお、音声認識部１１０において、全ての候補文字の類似度が所定のしきい値以下であるときには、認識不能としてその部分を応答文抽出部１１０３に通知する。この場合に、応答文抽出部１１０３は、応答文を構成する単語を正しく認識することができなかったときと同様に取り扱う。
【００８０】
今、上記実施の形態１と同様の英語の質問文「How long does it take to the airport?」が音声出力部１０６から出力された場合に、音声認識部１１０において、文字応答文「xxx minutes by taxi.」が認識されたとき（「xxx」は音声認識不能を示す）、応答文抽出部１１０３は、「xxx」が必須の要件であり、置換条件が数字であり、再質問規則が「KEYBOARD」１２０８であると判断する。音声出力部１１０２から「Input numerals with keyboard.」が音声出力される。会話相手に本装置を渡し、入力操作部１０３から数字「２０」の入力を受ける。これによって、音声認識が不能であった「xxx」が数字「２０」に置換される。
【００８１】
図１３は、表示部１０５に表示された内容の一例を示している。応答例１３０１では、翻訳応答文の文字応答文１３０２に一致する対応部分が反転表示されている。なお、表示属性の変更を反転表示としているけれども、他の属性、例えば表示色の変更等であってもよい。これによって、ユーザは、より一層会話相手からの答えを容易に理解することができる。次に、本実施の形態の動作を図１４に示すフローチャートを用いて説明する。上記実施の形態１の図９に示したＳ９２０までの動作は同様であるので、本実施の形態固有の動作のみ説明する。
【００８２】
Ｓ９２０において、応答文抽出部１１０３は、応答文の一覧から文字応答文に類似する応答文を選択する。
【００８３】
次に、応答文抽出部１１０３は、文字応答文と類似する応答文とが、完全に一致するか否かを判断し（Ｓ１４０２）、一致すると判断したときはＳ１４１６に移り、一致しないと判断したときは、異なる部分（単語又は単語列）が必須の要件であるか否かを応答文データベース１１０１の一覧表１２０１等をみて判断する（Ｓ１４０４）。必須の要件でなければＳ１４１６に移り、必須の要件であるときは、同様に一覧表１２０１等をみて置換可能性が有るか否かを判断する（Ｓ１４０６）。
【００８４】
置換可能でないと判断したときは、質問文が会話相手に伝わらなかったとして、音声出力部１１０２に再質問の指示をし、上記実施の形態１のＳ９１４に戻る。
【００８５】
置換可能であると判断したときは、文字応答文の単語が置換条件に合致するか否かを判断し（Ｓ１４０８）、合致すればＳ１４１６に移る。合致しないときは、再質問規則に応じて、所定の音声を出力するよう音声出力部１１０２に通知する。
【００８６】
音声出力部１１０２は、再質問規則に応じた音声を出力し、会話相手に伝える（Ｓ１４１０）。音声入力部１０９または、入力操作部１０３から会話相手からの再応答を受け付ける（Ｓ１４１２）。
【００８７】
音声認識部１１０は、音声入力部から入力された音声を認識し、単語又は単語列を応答文抽出部１１０３に通知し、入力操作部１０３は入力操作された単語を応答文抽出部１１０３に通知する（Ｓ１４１４）。
【００８８】
応答文抽出部１１０３は、表示部１０５に類似する応答文とその翻訳応答文と認識結果である文字応答文とを表示させる。その際、文字応答文に一致する翻訳応答文の部分の表示属性を変更する（Ｓ１４１６）。
【００８９】
このようにすることで、会話相手からの音声応答を一度は認識できなかった場合でも、その内容に応じた再質問を用意しておき、会話相手から適切な入力を受けて、音声応答文の認識精度を向上することができる。
【００９０】
なお、上記実施の形態では、第１言語として日本語を、第２言語として英語を例に説明したけれども、本発明に係る通訳装置では、第１言語と第２言語とが逆であってもよいし、また他の言語が第１または第２言語となっていてもよい。この場合には、質問文翻訳データベース１０１、音声出力部１０６、１１０２、応答文データベース１０２、１１０１及び音声認識辞書１０８がそれぞれの言語に対応する内容とされる。
【００９１】
また、上記実施の形態では、図１及び図１１に示した構成図で示した各部がそれぞれの機能を発揮したけれども、各部の機能を発揮されるプログラムをコンピュータ読み取り可能なフロッピーディスクやＣＤ−ＲＯＭ等の記録媒体に記録する。この通訳装置特有の機能を有しない携帯端末装置等にこの記録媒体を装着し、本装置と同様の機能を有する通訳装置とすることができる。
【００９２】
【発明の効果】
以上説明したように、本発明は、第１言語の質問文とそれを翻訳した第２言語の質問文とを対にして記憶している質問文データベースと、第２言語の質問文に応答する標準的な複数の第２言語の応答文と、それらを翻訳した第１言語の応答文とを記憶している応答文データベースと、入力操作手段と、音声入力手段と、前記入力操作手段でユーザから入力された第１言語の質問文を前記質問文データベースを検索し、それと対になっている第２言語の質問文を抽出する質問文抽出手段と、抽出された第２言語の質問文の出力を受けてユーザの会話相手が第２言語の応答文を音声で入力した場合に、入力された音声を第２言語の文字応答文として認識する音声認識手段と、前記音声認識手段で認識された文字応答文に類似する第２言語の応答文と対応する第１言語の応答文とを前記応答文データベースから抽出する応答文抽出手段と、前記応答文抽出手段で抽出された第２言語の応答文と第１言語の応答文と前記音声認識手段で認識された文字応答文とを表示する表示手段とを備えることとしている。このような構成によって、ユーザの会話相手は、通常の会話をするように音声応答するだけで、ユーザには、質問文に答えた応答文が、ユーザの理解できる第１言語で表示されるのでスムーズな会話をすることができる。
【００９３】
また、前記音声認識手段は、単語標準パターンとその単語標準パターンに対応する単語とを記憶している音声認識辞書と、前記音声入力手段から入力された音声を分析して単語単位のセグメントのパターンに分割する分割部と、前記分割部で分割されたセグメントのパターンと前記音声認識辞書に記憶されている単語標準パターンとの類似度を計算し、類似度が所定値以上の単語標準パターンに対応する単語を候補単語として抽出する候補単語抽出部と、前記候補単語抽出部で抽出された各セグメント類似度の最大の候補単語を並べて文字応答文を生成する文字応答文生成部とを有することとしている。このような構成によって、会話相手の応答文を文字応答文として認識することができる。
【００９４】
また、前記候補単語抽出部は、抽出した候補単語が前記応答文抽出手段で抽出された応答文を構成する単語と一致するとき、その類似度を所定の倍率で大きくすることとしている。このような構成によって、不特定の会話相手の音声であっても、応答文として予測される単語の認識する確率を高めることによって、音声応答文の認識精度を向上することができる。また、前記応答文データベースには、応答文を構成する単語が応答文として必須の要素であるかと、必須の要素であるときに、その単語が他の単語に置換されることが予想されるかと、予想されるとき、置換が予想される単語の意味属性とが記憶されており、前記音声認識手段で認識された文字応答文を構成する単語と前記類似する応答文を構成する単語とを照合し、必須の要素を欠き、かつ、置換が予想される単語の意味属性と異なる単語を応答文に対応する単語とするとき文字応答文が質問の答えとして成立しないと判断する文字応答文判断手段を備えることとしている。このような構成によって、音声認識手段で認識された文字応答文が質問文の答えとして成立するか否かが判断される。
【００９５】
また、前記応答文データベースには、応答文を構成する単語が必須の要素であるときに、その単語を認識できなかったときの再質問規則が記憶されており、前記文字応答文判断手段が、文字応答文が質問の答えとして成立しないと判断したとき、前記再質問規則に応じた再質問を出力する出力手段を備えることとしている。このような構成によって、質問文の答えと成立しない文字応答文を会話相手への再質問をすることによって、正しく認識された文字応答文とすることができる。
【００９６】
また、前記表示手段に表示される文字応答文を構成する単語と前記応答文抽出手段で抽出された第２言語の応答文を構成する単語とを比較し、一致する単語の前記第２言語の応答文の表示属性を変更する制御手段を備えることとしている。このような構成によって、会話相手からの応答文の理解が容易となる。
【００９７】
また、前記応答文データベースに記憶されている第１言語の応答文とその第２言語の応答文とを構成する対応する単語に同一の識別子を付し、前記表示手段に表示される文字応答文を構成する単語と前記応答文抽出手段で抽出された第２言語の応答文を構成する単語とを比較し、一致する単語と同一の識別子が付された前記第１言語の応答文の表示属性を変更する表示制御手段を備えることとしている。このような構成によって、会話相手からの第２言語の応答文をユーザの理解できる第１言語の内容との違いを明確にして知ることができる。
【００９８】
更に、コンピュータに読取可能な記録媒体であって、第１言語の質問文とそれを翻訳した第２言語の質問文とを対にして記憶している質問文データベースと、第２言語の質問文に応答する標準的な複数の第２言語の応答文と、それらを翻訳した第１言語の応答文とを記憶している応答文データベースとを予め記録し、コンピュータを、入力操作手段と、音声入力手段と、前記入力操作手段でユーザから入力された第１言語の質問文を前記質問文データベースを検索し、それと対になっている第２言語の質問文を抽出する質問文抽出手段と、抽出された第２言語の質問文の出力を受けてユーザの会話相手が第２言語の応答文を音声で入力した場合に、入力された音声を第２言語の文字応答文として認識する音声認識手段と、前記音声認識手段で認識された文字応答文に類似する第２言語の応答文と対応する第１言語の応答文とを前記応答文データベースから抽出する応答文抽出手段と、前記応答文抽出手段で抽出された第２言語の応答文と第１言語の応答文と前記音声認識手段で認識された文字応答文とを表示させる表示制御手段として機能させるためのプログラムを記録することとしている。これによって、通訳機能を有しない携帯端末装置を効率的な通訳装置として使用することができる。
【図面の簡単な説明】
【図１】本発明に係る通訳装置の実施の形態１の構成図である。
【図２】上記実施の形態の質問文翻訳データベースの内容の一例を示す図である。
【図３】上記実施の形態の応答文データベースの内容の一例を示す図である。
【図４】上記実施の形態の表示部に表示されている内容の一例を示す図である。
【図５】上記実施の形態の表示部に表示されている内容の他の一例を示す図である。
【図６】上記実施の形態の音声認識辞書の内容の一例を示す図である。
【図７】（ａ）は、上記実施の形態の音声入力部に入力された音声応答の文字応答文の一例を示す。
（ｂ）は、上記実施の形態の音声認識部において音声信号から単語単位の各セグメントの音声パターンに分割して単語候補を認識する様子の説明図である。
【図８】上記実施の形態の音声認識部において、単語候補の類似度を修正して、文字応答文を認識する様子の説明図である。
【図９】上記実施の形態の動作を説明するフローチャートである。
【図１０】上記図９のＳ９１８の詳細な動作を説明するフローチャートである。
【図１１】本発明に係る通訳装置の実施の形態２の構成図である。
【図１２】上記実施の形態の応答文データベースの内容の一例を示す図である。
【図１３】上記実施の形態の表示部に表示されている内容の一例を示す図である。
【図１４】上記実施の形態の動作を説明するフローチャートである。
【符号の説明】
１０１質問文翻訳データベース
１０２、１１０１応答文データベース
１０３入力操作部
１０４質問文選択部
１０５表示部
１０６、１１０２音声出力部
１０７、１１０３応答文抽出部
１０８音声認識辞書
１０９音声入力部
１１０音声認識部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an interpreting apparatus that interprets a conversation between a first language and a second language.
[0002]
[Prior art]
Conventional portable interpreting devices mainly adopt a scheme in which, when the user selects a word or a sentence (question sentence) in a first language by choosing from menus classified by usage scene or by entering a keyword, the corresponding word or sentence (question sentence) in a second language is displayed.
[0003]
To let the user obtain a reply from the conversation partner, the technique disclosed in Japanese Patent Laid-Open No. 7-105220, for example, displays several response sentences in the second language when the second-language question sentence is presented to the conversation partner, and has the conversation partner select one of them.
[0004]
The technique disclosed in Japanese Patent Application Laid-Open No. 9-319750 anticipates the conversation partner's responses and classifies them hierarchically to make selection easier, or obtains the partner's response by having numbers and symbols entered directly.
[0005]
[Problems to be solved by the invention]
However, with both of the above techniques, the user must teach the conversation partner how to use the interpreting device and hand over its operation in order to obtain a response sentence. Smooth conversation is therefore difficult with a partner who is unfamiliar with operating the device.
[0006]
In view of the above problem, an object of the present invention is to provide an interpreting device with excellent operability that does not require the conversation partner to be proficient in handling it.
[0007]
[Means for Solving the Problems]
To solve the above problem, the present invention comprises: a question sentence database storing pairs of a first-language question sentence and the second-language question sentence obtained by translating it; a response sentence database storing a plurality of standard second-language response sentences that answer the second-language question sentences, together with the first-language response sentences obtained by translating them; input operation means; voice input means; question sentence extraction means for searching the question sentence database for the first-language question sentence input by the user through the input operation means and extracting the second-language question sentence paired with it; speech recognition means for recognizing, when the user's conversation partner responds to the output of the extracted second-language question sentence by speaking a second-language response sentence, the input voice as a second-language character response sentence; response sentence extraction means for extracting from the response sentence database a second-language response sentence similar to the character response sentence recognized by the speech recognition means, together with the corresponding first-language response sentence; and display means for displaying the second-language response sentence and the first-language response sentence extracted by the response sentence extraction means and the character response sentence recognized by the speech recognition means.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of an interpreting apparatus according to the present invention will be described with reference to the drawings.
[0009]
(Embodiment 1)
FIG. 1 is a configuration diagram of Embodiment 1 of an interpreting apparatus according to the present invention.
[0010]
This interpreting apparatus includes a question sentence translation database 101, a response sentence database 102, an input operation unit 103, a question sentence selection unit 104, a display unit 105, a voice output unit 106, a response sentence extraction unit 107, a speech recognition dictionary 108, a voice input unit 109, and a speech recognition unit 110.
[0011]
The question sentence translation database 101 is composed of a ROM or the like and stores each first-language question sentence paired with its translation into the second language, with an identification number attached to each pair. FIG. 2 shows an example of the contents of the question sentence translation database 101. In this database, the first language is Japanese and the second language is English. In this embodiment, the interpreting apparatus is assumed to be used where the user understands Japanese and the conversation partner understands English.
[0012]
Under the identification number “0002”, the question sentence translation database 101 stores the Japanese question sentence 「空港へはどのくらい時間がかかりますか」 (“How long does it take to get to the airport?”) paired with its English translation “How long does it take to the airport?”. It also stores the number of standard response sentences that answer this question, “5”, and the identification numbers of those response sentences, “T1001”, “T1002”, ... “T1005”. These are the identification numbers under which the response sentences are stored in the response sentence database 102.
[0013]
The response sentence database 102 is composed of a ROM or the like and stores a list of standard response sentences that answer the English question sentences in the question sentence translation database 101. FIG. 3 is a diagram illustrating an example of the contents stored in the response sentence database 102. Each response sentence 301 is stored together with its Japanese translation 302 and its identification number 303.
[0014]
For example, the response sentence database 102 stores the response sentence “30 dollars”, identified by the identification number “T0999”, together with its Japanese translation 「３０ドルです。」 (“It is 30 dollars.”).
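The relationship between the two databases can be pictured as keyed records linked by response identification numbers. The following is a minimal sketch, assuming in-memory dictionaries rather than ROM tables; the class names `QuestionEntry` and `ResponseEntry` and the helper `responses_for` are hypothetical, and only entries whose text is given in the description (question “0002”, responses “T0999” and “T1001”) are loaded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionEntry:
    japanese: str        # first-language question sentence
    english: str         # second-language translation
    response_ids: tuple  # identification numbers of the standard responses


@dataclass(frozen=True)
class ResponseEntry:
    english: str   # standard response sentence (second language)
    japanese: str  # its translation (first language)


# Question sentence translation database (101), keyed by identification number.
QUESTION_DB = {
    "0002": QuestionEntry(
        japanese="空港へはどのくらい時間がかかりますか",
        english="How long does it take to the airport?",
        response_ids=("T1001", "T1002", "T1003", "T1004", "T1005"),
    ),
}

# Response sentence database (102). T1002 to T1005 are not spelled out in the
# text, so this sketch leaves them unloaded instead of inventing them.
RESPONSE_DB = {
    "T0999": ResponseEntry("30 dollars", "３０ドルです。"),
    "T1001": ResponseEntry("15 minutes.", "１５分です。"),
}


def responses_for(question_id):
    """Return (id, entry) pairs for a question's standard responses,
    skipping identification numbers that this sketch has not loaded."""
    entry = QUESTION_DB[question_id]
    return [(rid, RESPONSE_DB[rid])
            for rid in entry.response_ids if rid in RESPONSE_DB]
```

Calling `responses_for("0002")` follows the stored identification numbers from the question entry into the response database, which is the lookup the response sentence extraction unit 107 is described as performing.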
[0015]
The input operation unit 103 is composed of a keyboard or the like and accepts the user's input of a question sentence. The user may enter the full question sentence, but may instead enter keywords; for example, to ask how long it takes to reach the airport, the user enters “airport” (空港) and “time” (時間). The input operation unit 103 notifies the question sentence selection unit 104 of the entered question sentence or of the entered words “airport” and “time”.
[0016]
When the input operation unit 103 receives a question instruction from the user, it notifies the question sentence selection unit 104 to that effect.
[0017]
When the question sentence selection unit 104 is notified of a question sentence by the input operation unit 103, it searches the question sentence translation database 101 and extracts the matching question sentence, its translated question sentence, its identification number, and the identification numbers of its response sentences.
[0018]
When words are notified in place of a full question sentence, the unit treats them as keywords, selects a question sentence that contains them, and extracts the paired translated question sentence, its identification number, and the identification numbers of its response sentences. For example, when “airport” and “time” are notified, it selects the question sentence with identification number “0002”, “How long does it take to get to the airport?”, which contains both keywords. The question sentence selection unit 104 displays the extracted Japanese and English question sentences in a predetermined area of the display unit 105 and also notifies the response sentence extraction unit 107 of the response sentence identification numbers “T1001, T1002, T1003, T1004, T1005”.
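The keyword-based selection described above can be sketched as a substring filter over the Japanese question sentences. This is only an illustration under stated assumptions: the description does not say how keywords are matched, so plain substring containment is assumed, and the second entry “9001” is a hypothetical record added so that the filter has something to reject.

```python
# Japanese question sentence and English translation, keyed by identification number.
QUESTIONS = {
    "0002": ("空港へはどのくらい時間がかかりますか",
             "How long does it take to the airport?"),
    # Hypothetical entry, not from the patent text.
    "9001": ("空港はどこですか", "Where is the airport?"),
}


def select_questions(keywords):
    """Return the identification numbers of the question sentences whose
    Japanese text contains every keyword (substring match is an assumption)."""
    return [qid for qid, (japanese, _english) in QUESTIONS.items()
            if all(keyword in japanese for keyword in keywords)]
```

With the keywords 「空港」 and 「時間」 only “0002” remains, while 「空港」 alone matches both entries. A full question sentence passed as a single keyword also selects its own entry, so the same filter covers the full-text case in this sketch.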
[0019]
Upon receiving a question instruction notification from the input operation unit 103, the question sentence selection unit 104 notifies the voice output unit 106 of the identification number of the question sentence.
[0020]
The display unit 105 includes a liquid crystal display or the like, and displays a question sentence and a response sentence under the control of the question sentence selection unit 104 and the response sentence extraction unit 107.
[0021]
FIG. 4 is a diagram illustrating an example of the display contents of the display unit 105. It shows the display in a state where a question sentence has been selected by the question sentence selection unit 104 and the translated question sentence is being output as voice from the voice output unit 106. The question sentence selection unit 104 displays the Japanese question sentence 401 and its English question sentence 402, and the response sentence extraction unit 107 displays the response examples 403, a list of the standard response sentences that answer the English question sentence 402 together with their Japanese translations. From this display, the user can see both what he or she is about to ask and what the conversation partner is likely to answer.
[0022]
FIG. 5 is a diagram illustrating another example of the display contents of the display unit 105. It shows the display after a spoken response to the question sentence has been received from the user's conversation partner. The question sentences 401 and 402 are displayed as in FIG. 4. The response examples 501 list the response sentences that are similar to the recognition result of the partner's spoken response as recognized by the speech recognition unit 110. Words in these response sentences that match the recognition result are shown in reverse video. The recognition result 502 of the spoken response is shown in the lower part of the display unit 105. By looking at this display, the user can infer that the required time to the airport is “20 minutes by taxi”. That is, because the words in the response sentence that match the recognition result 502 are shown in reverse video, comparing the response sentence with the recognition result 502 shows that only the number differs, and the answer can be inferred from that.
[0023]
The voice output unit 106 stores the identification number of each question sentence together with the voice pattern of its translated question sentence. When it is notified of an identification number by the question sentence selection unit 104, it retrieves the voice pattern for that identification number, converts it into a voice signal, and outputs the question sentence as voice through a speaker.
[0024]
When the response sentence extraction unit 107 receives the notification of the identification number of the response sentence from the question sentence selection unit 104, the response sentence extraction unit 107 extracts the response sentence having the same identification number stored in the response sentence database 102 and its translation response sentence. A list of the extracted response sentences and their translation response sentences is displayed in a predetermined area of the display unit 105 as shown in FIG.
[0025]
When the response sentence extraction unit 107 receives a notification of a character response sentence as a recognition result from the voice recognition unit 110, the response sentence extraction unit 107 selects a response sentence similar to the character response sentence from the extracted response sentences. The similar response sentence, the translation response sentence, and the character response sentence are displayed in a predetermined display area of the display unit 105.
[0026]
When selecting similar response sentences, the unit compares the words of the notified character response sentence with the words of each extracted response sentence, and selects a response sentence as similar if it contains a matching word. It also changes the display attribute of the matching words in that response sentence to white-on-black reverse video, as shown in FIG. 5.
[0027]
In the present embodiment, a response sentence is treated as similar if it contains a word of the character response sentence, but similar response sentences may instead be selected as follows. The response sentence extraction unit 107 calculates, as the similarity, the ratio of the number of words that match words of the character response sentence to the total number of words in the response sentence, and selects the response sentences whose similarity is at or above a predetermined value as similar response sentences.
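The ratio-based similarity just described can be written out directly. The following is a minimal sketch; the tokenization (lowercasing and keeping only letters and digits), the threshold of 0.5, and the candidate sentence “15 minutes by taxi.” are assumptions made for illustration, since the description only speaks of a “predetermined value”.

```python
import re


def words(sentence):
    """Split a sentence into lowercase word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(response, recognized):
    """Ratio of the response sentence's words that also occur in the
    recognized character response sentence, to its total word count."""
    response_words = words(response)
    recognized_words = set(words(recognized))
    if not response_words:
        return 0.0
    matches = sum(word in recognized_words for word in response_words)
    return matches / len(response_words)


def similar_responses(candidates, recognized, threshold=0.5):
    """Keep the candidates whose similarity is at or above the threshold."""
    return [c for c in candidates if similarity(c, recognized) >= threshold]
```

For the recognized sentence “20 minutes by taxi.”, the candidate “15 minutes by taxi.” shares three of its four words, so its similarity is 3/4, while “15 minutes.” shares one of two words and scores 1/2.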
[0028]
The speech recognition dictionary 108 stores word standard patterns and words in pairs. FIG. 6 is a diagram illustrating an example of the contents of the speech recognition dictionary 108.
[0029]
The word standard pattern 601 represents the word “twenty” 602, pronounced in a standard manner, as feature values for 50 frames covering its pronunciation time. The word “twenty” is pronounced in 0.5 seconds, and each frame corresponds to a voice pattern of a predetermined duration, with successive frames shifted by 10 ms. The voice pattern of a frame is Fourier transformed, the frequency range is divided into 16 bands, and the intensities of the bands, expressed as a sequence of 16 numbers, form the feature value of one frame. The voice pattern 604 illustrates this division into frames. The word standard patterns for “minutes” and other words are likewise represented as sequences of 16 numbers for a predetermined number of frames, but are omitted from the figure.
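The framing arithmetic above (0.5 seconds at a 10 ms shift gives 50 frames, each reduced to 16 band intensities) can be sketched with a plain discrete Fourier transform. The sampling rate of 8 kHz, the 20 ms frame length, the zero-padding of the final frame, and the use of summed magnitudes as “intensity” are all assumptions; the description fixes only the 10 ms shift and the 16 bands.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed sampling rate in Hz
HOP = 80            # 10 ms shift between frames at 8 kHz
FRAME_LEN = 160     # assumed frame duration of 20 ms
BANDS = 16          # number of frequency bands per frame


def frame_features(frame):
    """Return 16 band intensities for one frame using a naive DFT."""
    n = len(frame)
    half = n // 2  # bins up to the Nyquist frequency
    magnitudes = []
    for k in range(half):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        magnitudes.append(abs(acc))
    per_band = half // BANDS
    return [sum(magnitudes[b * per_band:(b + 1) * per_band])
            for b in range(BANDS)]


def word_pattern(samples):
    """Cut samples into frames every HOP samples (zero-padding the tail)
    and return one 16-number feature vector per frame."""
    features = []
    for start in range(0, len(samples), HOP):
        frame = list(samples[start:start + FRAME_LEN])
        frame += [0.0] * (FRAME_LEN - len(frame))
        features.append(frame_features(frame))
    return features
```

Under these assumptions, 4000 samples (0.5 seconds) produce 50 feature vectors of 16 numbers each, matching the shape of the word standard pattern 601.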
[0030]
The voice input unit 109 is composed of a microphone or the like, receives input of a voice response sentence of the conversation partner, and notifies the voice recognition unit 110 of the voice signal.
[0031]
The speech recognition unit 110 analyzes the voice signal notified from the voice input unit 109, divides it into word-level segments, and collates the voice pattern of each segment with the word standard patterns in the speech recognition dictionary 108. The word stored as a pair with the word standard pattern most similar to the voice pattern is taken as the recognition result for that segment, and the recognized words are arranged in order to obtain a character response sentence. The obtained character response sentence is notified to the response sentence extraction unit 107.
[0032]
In the present embodiment, the voice response sentence “20 minutes by taxi.” input from the voice input unit 109 was recognized by the speech recognition unit 110. Depending on the voice response sentence that is input, however, selecting the word paired with the most similar word standard pattern does not necessarily yield a character response sentence that is the correct recognition result.
[0033]
The speech recognition unit 110 of the present embodiment further has the following configuration.
[0034]
The case where the conversation partner inputs the voice response sentence “You can go on foot.” shown in FIG. 7A through the voice input unit 109 will be described.
[0035]
The speech recognition unit 110 performs speech analysis as shown in FIG. 7B and divides the voice pattern 701 into segments 702 to 706, one per word. The voice patterns of the segments 702 to 706 are collated with the word standard patterns stored in the speech recognition dictionary 108, and the similarity is calculated using, for example, the DP (dynamic programming) matching method. As a result, for the voice pattern of segment 702, the word candidates “show”, “you”, “how”, ... are selected in descending order of the similarity of their word standard patterns. Similarly, the word candidates “can”, “and”, ... are selected for the voice pattern of segment 703.
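DP matching between a segment's voice pattern and a word standard pattern can be sketched as classic dynamic time warping over per-frame feature vectors. This is an illustration, not the patent's stated formula: the Euclidean frame distance, the symmetric step pattern, and the conversion `1 / (1 + distance)` from distance to similarity are assumptions, and the function names are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dp_distance(pattern, template):
    """Accumulated distance of the best time alignment (dynamic programming)."""
    n, m = len(pattern), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(pattern[i - 1], template[j - 1])
            # Extend the cheapest of: stretch pattern, stretch template, advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def similarity(pattern, template):
    """Map the DP distance to a similarity in (0, 1]; 1.0 means identical."""
    return 1.0 / (1.0 + dp_distance(pattern, template))


def rank_candidates(pattern, dictionary):
    """Return (similarity, word) pairs sorted from most to least similar."""
    scored = [(similarity(pattern, template), word)
              for word, template in dictionary.items()]
    return sorted(scored, reverse=True)
```

Because the alignment may repeat frames, a segment spoken more slowly than the stored pattern can still reach a distance of zero, which is the property that makes DP matching suitable for speakers with different speaking rates.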
[0036]
Now assume that “you”, “can”, “go”, “on”, “foot”, and so on are included in the words constituting the response sentences extracted by the response sentence extraction unit 107. For each segment, the speech recognition unit 110 checks whether each word candidate exists among the words of the extracted response sentences. If it does, the similarity of that word candidate is doubled, and the word candidates are re-sorted in order of similarity. As a result, the word candidates obtained from the voice pattern 701 of the voice response sentence input from the voice input unit 109 are re-ranked according to the corrected similarity, as shown in FIG. 8. In FIG. 7B and FIG. 8, specific numerical values of similarity are omitted. The voice recognition unit 110 arranges the word candidates having the highest similarity in segment order, and notifies the response sentence extraction unit 107 of “You can go on foot.” as the character response sentence that is the recognition result.
[0037]
Thus, since standard response sentences for the question sentence have already been extracted by the response sentence extraction unit 107, using them improves the accuracy of voice recognition for an unspecified conversation partner.
[0038]
Next, the operation of the present embodiment will be described using the flowcharts of FIGS.
[0039]
First, the user inputs a Japanese question sentence from the input operation unit 103 (S902).
[0040]
The question sentence selection unit 104 searches the question sentence translation database 101 to find a question sentence that matches the question sentence received by the input operation unit 103 (S904). The found question sentence and the English translation question sentence obtained by translating the question sentence are displayed on the display unit 105 (S906).
[0041]
Next, the response sentence extraction unit 107 searches the response sentence database 102 based on the identification number of the response sentence notified from the question sentence selection unit 104, and extracts the response sentences and their translated sentences (S908). A list of the extracted response sentences and their translated sentences is displayed on the display unit 105 (S910).
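The lookups in S904 and S908 can be pictured as two keyed tables. The sketch below is only an assumed layout: the question identifier `Q0001`, the Japanese strings, and the pairing of “20 minutes by taxi.” with `T1005` are hypothetical stand-ins, and the names `select_question` and `extract_responses` do not come from the description.

```python
from typing import NamedTuple

class Response(NamedTuple):
    response_id: str
    second_language: str  # standard response in the partner's language
    first_language: str   # translation shown to the user

# question id -> (first-language question, second-language question)
QUESTION_DB = {
    "Q0001": ("空港までどのくらいかかりますか。",
              "How long does it take to the airport?"),
}

# question id -> standard responses prepared for that question
RESPONSE_DB = {
    "Q0001": [
        Response("T1001", "15 minutes.", "15分です。"),
        Response("T1005", "20 minutes by taxi.", "タクシーで20分です。"),
    ],
}

def select_question(user_input):
    """S904: find the stored question that matches the user's input."""
    for question_id, (first, second) in QUESTION_DB.items():
        if first == user_input:
            return question_id, second
    return None

def extract_responses(question_id):
    """S908: fetch the standard responses tied to the question."""
    return RESPONSE_DB.get(question_id, [])

found = select_question("空港までどのくらいかかりますか。")
responses = extract_responses(found[0])
```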
[0042]
The question sentence selection unit 104 determines whether or not there is a question instruction from the user in the input operation unit 103 (S912). If not, the process returns to S902, and if so, the voice output unit 106 is notified of a question instruction. The voice output unit 106 outputs the English translation question sentence as a voice (S914).
[0043]
Next, the voice input unit 109 receives a voice input of an English response sentence of the conversation partner who answered the English translation question sentence output from the voice output unit 106 (S916).
[0044]
The voice recognition unit 110 analyzes the voice signal notified from the voice input unit 109, compares it with the voice recognition dictionary 108, and recognizes the voice as a character response sentence (S918).
[0045]
The response sentence extraction unit 107 selects a response sentence similar to the character response sentence from the list of response sentences (S920). The similar response sentence (English), its translated sentence (Japanese), and the character response sentence (English) obtained as the recognition result are displayed on the display unit 105. At this time, the display attribute of the portion of the similar response sentence that matches the character response sentence is changed (S922). In this way, the user can understand the answer to the question in Japanese even without understanding English.
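A minimal sketch of S920 and S922 follows. The description does not say how similarity between sentences is measured, so word overlap (Jaccard) is used here purely as an assumption, and square brackets stand in for a changed display attribute. The candidate sentences are illustrative.

```python
def tokenize(sentence):
    """Lowercase words with surrounding punctuation removed."""
    stripped = (token.strip(".,?!").lower() for token in sentence.split())
    return [word for word in stripped if word]

def most_similar(recognized, candidates):
    """S920: pick the candidate sharing the largest fraction of words."""
    recognized_words = set(tokenize(recognized))

    def score(candidate):
        words = set(tokenize(candidate))
        union = recognized_words | words
        return len(recognized_words & words) / len(union) if union else 0.0

    return max(candidates, key=score)

def mark_matches(recognized, similar):
    """S922: bracket the words of `similar` that also occur in `recognized`."""
    recognized_words = set(tokenize(recognized))
    return " ".join(
        f"[{token}]" if token.strip(".,?!").lower() in recognized_words else token
        for token in similar.split()
    )

candidates = ["15 minutes.", "15 minutes by taxi.", "You can go on foot."]
similar = most_similar("20 minutes by taxi.", candidates)
marked = mark_matches("20 minutes by taxi.", similar)
```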
[0046]
Next, the operation of S918 will be described in detail with reference to the flowchart of FIG. 10.
[0047]
First, the voice recognition unit 110 performs voice analysis on the voice signal input from the voice input unit 109 and divides the voice signal into segment word patterns (S1002). One divided speech pattern is taken out and pattern matching with the standard word pattern in the speech recognition dictionary 108 is performed to calculate the similarity (S1004). A word having a standard word pattern having a similarity equal to or greater than a certain value is set as a word candidate (S1006). This collation is performed for the voice patterns of all segments (S1008).
[0048]
Subsequently, it is determined whether or not the word candidate matches the word constituting the response sentence in the list of response sentences extracted by the response sentence extraction unit (S1010). If they do not match, the process moves to S1014. If they match, the similarity of the word candidate is corrected to double (S1012).
[0049]
Next, it is determined whether or not there is an unprocessed word candidate (S1014). If there is, the process returns to S1010. If not, the word candidates having the highest similarity in each segment are arranged to obtain a character response sentence as the recognition result (S1016).
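Steps S1010 to S1016 can be sketched as below. The similarity values are invented for the example (FIG. 7B and FIG. 8 omit the actual numbers), the competing candidates “an” and “food” are hypothetical, and the function name `recognize` is an assumption; only the doubling of similarity for expected words follows the description.

```python
def recognize(segment_candidates, response_sentences, boost=2.0):
    """Pick the best word per segment after boosting expected words.

    segment_candidates: one {word: similarity} dict per segment, already
        limited to candidates above the threshold (S1006).
    response_sentences: standard response sentences for the question.
    boost: multiplier for candidates found in those sentences (S1012).
    """
    expected = {
        token.strip(".,?!").lower()
        for sentence in response_sentences
        for token in sentence.split()
    }
    words = []
    for candidates in segment_candidates:
        adjusted = {
            word: similarity * boost if word.lower() in expected else similarity
            for word, similarity in candidates.items()
        }
        words.append(max(adjusted, key=adjusted.get))  # S1016
    return " ".join(words)

# Hypothetical similarity values for the five segments of FIG. 7B.
segments = [
    {"show": 0.60, "you": 0.55, "how": 0.40},
    {"can": 0.70, "and": 0.50},
    {"go": 0.80},
    {"on": 0.45, "an": 0.50},
    {"foot": 0.65, "food": 0.70},
]
responses = ["You can go on foot.", "20 minutes by taxi."]
result = recognize(segments, responses)
unboosted = recognize(segments, [])
```

Without the list of expected responses the top-ranked words give “show can go an food”; with the boost applied the same candidates yield “you can go on foot”.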
[0050]
As described above, since standard response sentences for a question sentence are prepared in advance, recognition of a voice response can exploit the high probability that the words included in those response sentences are the words constituting the character response sentence obtained as the recognition result. As a result, the accuracy of voice recognition is improved even when the voice response comes from an unspecified conversation partner.
[0051]
(Embodiment 2)
FIG. 11 is a configuration diagram of Embodiment 2 of the interpreting apparatus according to the present invention.
[0052]
This interpreting device includes a question sentence translation database 101, a response sentence database 1101, an input operation unit 103, a question sentence selection unit 104, a display unit 105, a voice output unit 1102, a response sentence extraction unit 1103, a voice recognition dictionary 108, a voice input unit 109, and a voice recognition unit 110.
[0053]
The same parts as those in the first embodiment are denoted by the same reference numerals, and the description thereof is omitted. Only the constituent parts unique to this embodiment will be described.
[0054]
In addition to the standard response sentences in the second language (English) and the translated response sentences in the first language (Japanese) described in the first embodiment, the response sentence database 1101 stores the correspondence between the words of each response sentence and the words of its translated response sentence by attaching the same identification number to corresponding words. Furthermore, it stores whether each word or word string constituting a response sentence is an essential requirement for a response to the question sentence, whether an essential word may be replaced, and a re-question rule to be used when speech recognition fails.
[0055]
Here, the possibility of replacement means that a word constituting a standard response sentence is expected to be replaced with another word, and includes an attribute of a word as a replacement condition.
[0056]
The re-question rule defines a rule for re-questioning when the speech recognition unit 110 cannot correctly recognize the words constituting the response sentence.
[0057]
FIG. 12 is a diagram showing an example of the contents of the response sentence database 1101. In the response sentence “15 minutes.” identified by the identification number “T1001” and in its translated response sentence, the identification number “1” is attached to the word “15” and to the corresponding word of the translation, and the identification number “2” is attached to the word “minutes” and to the corresponding word of the translation, indicating that they are corresponding words. Similarly, the same identification numbers are attached to corresponding words in the response sentence identified by the identification number “T1005” and its translated response sentence.
[0058]
The lists 1201 and 1202 each have an “essential requirement” column 1203, a “replaceability” column 1204, and a “re-question rule” column 1205 for the words constituting the response sentence. The list 1201 describes whether or not the words “15” and “minutes” constituting the response sentence identified by the identification number “T1001” are essential requirements. Since both words are essential, “YES” 1206 is described in the “essential requirement” column 1203.
[0059]
Since both words are expected to be replaced with other words, “YES” 1207 is described in the “replaceability” column 1204. The attribute of the words that may replace “15” is described as “number”, and the attribute of the words that may replace “minutes” is described as “time unit”, for example “second” or “hour”.
[0060]
The rule to be applied when the word “15”, or a word that meets its replacement condition, is not correctly recognized by speech is described as “KEYBOARD” 1208 in the re-question rule column 1205. This rule “KEYBOARD” 1208 means that a number replacing the word “15” is accepted from the conversation partner through keyboard operation.
[0061]
The correspondence rule when the word “minutes” or the word that matches the replacement condition is not correctly recognized by voice is described as “MENU” 1209 in the re-question rule column 1205. This rule “MENU” 1209 means that “minutes”, “second”, and “hour” are displayed on the display unit 105 and one of them is selected by the conversation partner.
[0062]
Similarly, the list 1202 describes whether or not the word constituting the response sentence identified by the identification number “T1005” is “essential requirement”.
[0063]
The word string “by taxi” is an essential requirement, the attribute of the words used as its replacement condition is “transportation”, and “STR "By what?"” 1210 is described as its re-question rule. Here, the re-question rule “STR "By what?"” 1210 means that the question sentence “By what?” is output to the conversation partner.
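The per-word information in lists 1201 and 1202 could be held in a record like the one sketched here. The field names, the `WordEntry` type, and the word identification number used for “by taxi” are assumptions for illustration; the column values are those given for FIG. 12 in the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class WordEntry:
    word_id: int                    # shared with the corresponding translated word
    text: str                       # word or word string in the response sentence
    essential: bool                 # "essential requirement" column 1203
    replace_attr: Optional[str]     # "replaceability" column 1204; None = not replaceable
    requestion_rule: Optional[str]  # "re-question rule" column 1205

# List 1201: response sentence T1001, "15 minutes."
T1001 = [
    WordEntry(1, "15", True, "number", "KEYBOARD"),
    WordEntry(2, "minutes", True, "time unit", "MENU"),
]

# Part of list 1202: response sentence T1005 (the word_id here is hypothetical).
T1005_BY_TAXI = WordEntry(3, "by taxi", True, "transportation", 'STR "By what?"')

essential_words = [entry.text for entry in T1001 if entry.essential]
```

Keeping the replacement attribute and the re-question rule on the same record means the later decision steps need only one lookup per differing word.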
[0064]
The voice output unit 1102 stores a voice pattern corresponding to the re-question rule.
[0065]
The voice pattern “Input numerals with keyboard.” is stored corresponding to the re-question rule “KEYBOARD” 1208, and the voice pattern “Indicate appropriate menu item.” is stored corresponding to “MENU” 1209. The voice pattern “By what?” is stored corresponding to “STR "By what?"” 1210.
[0066]
Upon receiving the re-question rule notification from the response sentence extraction unit 1103, the voice output unit 1102 outputs a corresponding voice pattern as voice. When receiving a re-question instruction from the response sentence extraction unit 1103, the English translation question sentence is output again.
[0067]
In addition to the configuration described in the first embodiment, the response sentence extraction unit 1103 performs the following processing after selecting a similar response sentence in response to the notification of the character response sentence from the speech recognition unit 110.
[0068]
The response sentence extraction unit 1103 determines whether or not the similar response sentence completely matches the character response sentence by comparing the words constituting both sentences. If they completely match, the character response sentence is regarded as a valid answer to the question sentence.
[0069]
If they do not completely match, whether or not the differing part (the word in the similar response sentence) is an essential requirement is determined by referring to the list 1201 or the like in the response sentence database 1101. If it is not an essential requirement, the character response sentence is still regarded as a valid answer to the question.
[0070]
When a different part is an indispensable requirement, whether or not the word has a possibility of replacement is similarly determined by looking at the list 1201 or the like. If there is no possibility of replacement, the answer to the question does not hold. At this time, a re-question is instructed to the voice output unit 1102 assuming that the question sentence has not been transmitted to the conversation partner.
[0071]
When there is a possibility of replacement, it is determined whether or not words corresponding to different parts of the character response sentence match the attribute of the word that is the replacement condition. When it is determined that they match, the character response sentence is established as an answer to the question sentence.
[0072]
When it is determined that they do not match, an essential requirement of the answer to the question sentence is regarded as missing, and processing proceeds according to the re-question rule as follows.
[0073]
When the re-question rule is “KEYBOARD” 1208, the response sentence extraction unit 1103 notifies the voice output unit 1102 of the re-question rule “KEYBOARD” and causes the display unit 105 to display “input request with keyboard”. The user thereby understands that the conversation partner is to perform an input operation using the input operation unit 103.
[0074]
When the re-question rule is “MENU” 1209, the voice output unit 1102 is notified of the re-question rule “MENU”, and “second”, “minutes”, and “hour” are displayed on the display unit 105. Here the menu items are “second” and so on because the word attribute of the replacement condition is “time unit”; the menu items change according to the word attribute.
[0075]
When the re-question rule is “STR "By what?"” 1210, the voice output unit 1102 is notified of the re-question rule “STR "By what?"”.
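The three re-question rules can be read as a small dispatch table. In the sketch below the prompts and menu items are the ones quoted in the text, while the function name `requestion`, the returned tuple shape, and the parsing of the `STR` rule are assumptions.

```python
VOICE_PROMPTS = {
    "KEYBOARD": "Input numerals with keyboard.",
    "MENU": "Indicate appropriate menu item.",
}

# Menu items per word attribute; only "time unit" is given in the text.
MENU_ITEMS = {"time unit": ["second", "minutes", "hour"]}

def requestion(rule, attribute=None):
    """Return (voice_prompt, display_content) for a re-question rule."""
    if rule == "KEYBOARD":
        return VOICE_PROMPTS[rule], "input request with keyboard"
    if rule == "MENU":
        return VOICE_PROMPTS[rule], MENU_ITEMS.get(attribute, [])
    if rule.startswith("STR "):
        # The text after STR is the question to speak; nothing extra is shown.
        return rule[4:].strip('"'), None
    raise ValueError(f"unknown re-question rule: {rule!r}")

keyboard = requestion("KEYBOARD")
menu = requestion("MENU", "time unit")
spoken = requestion('STR "By what?"')
```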
[0076]
After notifying the voice output unit 1102 of “STR "By what?"”, the response sentence extraction unit 1103 receives a notification of a character response sentence (word string) from the voice recognition unit 110. If that word string is the word string of the similar response sentence or one that meets the replacement condition, such as “by taxi”, “by bus”, “by train”, or “on foot”, the character response sentence is corrected accordingly.
[0077]
In addition, after notifying the voice output unit 1102 of the re-question rule “KEYBOARD” or “MENU”, the response sentence extraction unit 1103 receives from the input operation unit 103 a notification that a number input or a menu item selection has been received from the conversation partner. It then replaces the word of the response sentence with the number or the menu item content and displays the result on the display unit 105.
[0078]
In the first embodiment, the response sentence extraction unit 107 changes the display attribute of the words of the similar response sentence that match the character response sentence. In this embodiment, instead, the correspondence between the words constituting the similar response sentence and the words constituting its translated response sentence is looked up in the response sentence database 1101 using the identification numbers, and the display attribute of the portion of the translated response sentence that corresponds to the words matching the character response sentence is changed.
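The identification-number lookup described here might work as in the sketch below. The Japanese tokens, the use of `None` for words without a counterpart, and the bracket notation standing in for a display attribute are all assumptions for the example.

```python
def highlight_translation(recognized_words, second_lang, first_lang):
    """Bracket translated words whose counterpart was recognized.

    second_lang, first_lang: lists of (word_id, text) pairs, where equal
    word_id values mark corresponding words in the two sentences.
    """
    recognized = {word.lower() for word in recognized_words}
    matched_ids = {
        word_id for word_id, text in second_lang if text.lower() in recognized
    }
    return "".join(
        f"[{text}]" if word_id in matched_ids else text
        for word_id, text in first_lang
    )

# Response sentence T1001 and a hypothetical tokenized translation.
second_lang = [(1, "15"), (2, "minutes")]
first_lang = [(1, "15"), (2, "分"), (None, "です。")]

shown = highlight_translation(["20", "minutes"], second_lang, first_lang)
```

Because the match is made through identification numbers, the highlighted part appears in the sentence the user can read, even though the comparison itself was done on second-language words.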
[0079]
Note that when the similarity of every word candidate for a segment is equal to or lower than a predetermined threshold in the voice recognition unit 110, the response sentence extraction unit 1103 is notified that the part is unrecognizable. In this case, the response sentence extraction unit 1103 handles it in the same way as when a word constituting the response sentence cannot be correctly recognized.
[0080]
Now suppose that the English question sentence “How long does it take to the airport?”, as in the first embodiment, is output from the voice output unit 1102, and that the voice recognition unit 110 recognizes the character response sentence “xxx minutes by taxi.” (“xxx” indicates a part that could not be recognized). The response sentence extraction unit 1103 then determines that “xxx” is an essential requirement, that its replacement condition is a number, and that its re-question rule is “KEYBOARD” 1208. The voice output unit 1102 outputs “Input numerals with keyboard.”. The device is handed over to the conversation partner, and the number “20” is input from the input operation unit 103. As a result, “xxx”, which could not be recognized by voice, is replaced with the number “20”.
[0081]
FIG. 13 shows an example of the content displayed on the display unit 105. In the response example 1301, the part of the translated response sentence that corresponds to the character response sentence 1302 is highlighted. Although highlighting is used here as the change of display attribute, other attributes such as a change of display color may be used. As a result, the user can more easily understand the answer from the conversation partner. Next, the operation of the present embodiment will be described with reference to the flowchart shown in FIG. Since the operation up to S920 shown in FIG. 9 of the first embodiment is the same, only the operation unique to the present embodiment will be described.
[0082]
In S920, the response sentence extraction unit 1103 selects a response sentence similar to the character response sentence from the list of response sentences.
[0083]
Next, the response sentence extraction unit 1103 determines whether or not the similar response sentence completely matches the character response sentence (S1402). If they match, the process proceeds to S1416. If they do not match, whether or not the differing portion (word or word string) is an essential requirement is determined by referring to the list 1201 or the like of the response sentence database 1101 (S1404). If it is not an essential requirement, the process proceeds to S1416. If it is an essential requirement, whether or not replacement is possible is similarly determined by referring to the list 1201 or the like (S1406).
[0084]
If it is determined that the word cannot be replaced, the question sentence is regarded as not having been conveyed to the conversation partner, the voice output unit 1102 is instructed to ask the question again, and the process returns to S914 of the first embodiment.
[0085]
When it is determined that the replacement is possible, it is determined whether or not the word of the character response sentence meets the replacement condition (S1408), and if it matches, the process proceeds to S1416. If they do not match, the voice output unit 1102 is notified to output a predetermined voice according to the re-question rule.
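The branch structure of S1402 to S1408 can be sketched as follows, assuming the recognized words have already been aligned position by position with the words of the similar response sentence. The attribute checks in `ATTRIBUTE_CHECKS`, the tuple layout of the entries, and the returned labels are hypothetical; the description only states that a word attribute serves as the replacement condition.

```python
# Hypothetical membership tests for the word attributes named in the text.
ATTRIBUTE_CHECKS = {
    "number": lambda word: word.isdigit(),
    "time unit": lambda word: word in {"second", "minutes", "hour"},
}

def judge(recognized, entries):
    """Decide how to proceed for a recognized, word-aligned response.

    recognized: recognized words, aligned by position with `entries`.
    entries: (text, essential, replace_attr, requestion_rule) tuples
        for the similar response sentence.
    Returns ("display", None), ("repeat_question", None),
    or ("requestion", rule).
    """
    for word, (text, essential, attr, rule) in zip(recognized, entries):
        if word == text:                       # S1402: no difference here
            continue
        if not essential:                      # S1404: difference is acceptable
            continue
        if attr is None:                       # S1406: replacement not possible
            return ("repeat_question", None)   # back to S914
        if not ATTRIBUTE_CHECKS[attr](word):   # S1408: condition not met
            return ("requestion", rule)        # on to S1410
    return ("display", None)                   # on to S1416

T1001 = [
    ("15", True, "number", "KEYBOARD"),
    ("minutes", True, "time unit", "MENU"),
]

accepted = judge(["20", "minutes"], T1001)
needs_number = judge(["xxx", "minutes"], T1001)
needs_unit = judge(["15", "xxx"], T1001)
```

A recognized “20 minutes” is accepted because “20” satisfies the “number” condition, whereas an unrecognized first word leads to the “KEYBOARD” re-question.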
[0086]
The voice output unit 1102 outputs a voice corresponding to the re-question rule and transmits it to the conversation partner (S1410). A re-response from the conversation partner is accepted from the voice input unit 109 or the input operation unit 103 (S1412).
[0087]
The voice recognition unit 110 recognizes the voice input from the voice input unit 109 and notifies the response sentence extraction unit 1103 of the recognized word or word string, or the input operation unit 103 notifies the response sentence extraction unit 1103 of the word entered by the input operation (S1414).
[0088]
The response sentence extraction unit 1103 displays on the display unit 105 the similar response sentence, its translated response sentence, and the character response sentence obtained as the recognition result. At that time, the display attribute of the portion of the translated response sentence that matches the character response sentence is changed (S1416).
[0089]
In this way, even if the voice response from the conversation partner cannot be recognized the first time, a re-question suited to the content is issued and an appropriate input is received from the conversation partner, so that recognition accuracy can be improved.
[0090]
In the above embodiments, Japanese is used as the first language and English as the second language. However, in the interpreting device according to the present invention, the first language and the second language may be reversed, or other languages may be used as the first or second language. In this case, the question sentence translation database 101, the voice output units 106 and 1102, the response sentence databases 102 and 1101, and the voice recognition dictionary 108 have contents corresponding to the respective languages.
[0091]
Further, in the above embodiments, each unit shown in the configuration diagrams of FIGS. 1 and 11 performs its function, but a program that performs the function of each unit may instead be recorded on a computer-readable recording medium such as a floppy disk or CD-ROM. By loading this recording medium into a portable terminal device or the like that does not have the functions unique to this interpreting device, an interpreting device having the same functions as this device can be obtained.
[0092]
[Effects of the invention]
As described above, the present invention includes: a question sentence database storing first language question sentences paired with the second language question sentences into which they are translated; a response sentence database storing a plurality of standard second language response sentences that respond to the second language question sentences, together with the first language response sentences into which they are translated; input operation means; voice input means; question sentence extraction means for searching the question sentence database for a first language question sentence input by the user with the input operation means and extracting the second language question sentence paired with it; voice recognition means for recognizing, when the user's conversation partner inputs a second language response sentence by voice in response to the output of the extracted second language question sentence, the input voice as a second language character response sentence; response sentence extraction means for extracting from the response sentence database a second language response sentence similar to the character response sentence recognized by the voice recognition means and the corresponding first language response sentence; and display means for displaying the second language response sentence and the first language response sentence extracted by the response sentence extraction means and the character response sentence recognized by the voice recognition means. With such a configuration, the user's conversation partner simply responds by voice as in a normal conversation, and the response to the question sentence is displayed in the first language that the user can understand, so that a smooth conversation is possible.
[0093]
In addition, the speech recognition means includes: a speech recognition dictionary storing word standard patterns and the words corresponding to them; a dividing unit that analyzes the voice input from the voice input means and divides it into segment patterns in units of words; a candidate word extraction unit that calculates the similarity between each segment pattern divided by the dividing unit and the word standard patterns stored in the speech recognition dictionary, and extracts as candidate words the words corresponding to word standard patterns whose similarity is equal to or greater than a predetermined value; and a character response sentence generation unit that generates a character response sentence by arranging, for each segment, the candidate word having the highest similarity among those extracted by the candidate word extraction unit. With such a configuration, the voice response of the conversation partner can be recognized as a character response sentence.
[0094]
The candidate word extraction unit increases the similarity by a predetermined magnification when an extracted candidate word matches a word constituting a response sentence extracted by the response sentence extraction means. With such a configuration, the probability of recognizing words predicted to appear in the response is raised, so that the recognition accuracy of a voice response sentence is improved even for the voice of an unspecified conversation partner. In addition, the response sentence database stores whether each word constituting a response sentence is an essential element of the response sentence, whether an essential word is expected to be replaced with another word, and, when replacement is expected, the semantic attribute of the words expected to replace it. Character response sentence determination means is also provided that collates the words constituting the character response sentence recognized by the speech recognition means with the words constituting the similar response sentence, and determines that the character response sentence is not established as an answer to the question when an essential element is missing and the corresponding word does not have the semantic attribute of the words expected to replace it. With such a configuration, it is determined whether or not the character response sentence recognized by the voice recognition means is established as an answer to the question sentence.
[0095]
Further, the response sentence database stores, for each word that is an essential element of a response sentence, a re-question rule to be applied when the word cannot be recognized, and output means is provided for outputting a re-question according to the re-question rule when the character response sentence determination means determines that the character response sentence is not established as an answer to the question. With such a configuration, a character response sentence that was not accepted as an answer to the question sentence can be turned into a correctly recognized character response sentence by asking the conversation partner again.
[0096]
In addition, control means is provided that compares the words constituting the character response sentence displayed on the display means with the words constituting the second language response sentence extracted by the response sentence extraction means, and changes the display attribute of the matching words in the second language response sentence. With such a configuration, it becomes easy to understand the response sentence from the conversation partner.
[0097]
In addition, the same identifier is attached to corresponding words of the first language response sentence and the second language response sentence stored in the response sentence database, and display control means is provided that compares the words constituting the character response sentence displayed on the display means with the words constituting the second language response sentence extracted by the response sentence extraction means, and changes the display attribute of the words in the first language response sentence that carry the same identifier as the matching words. With such a configuration, the user can clearly see the differences in the conversation partner's second language response sentence through first language content that the user can understand.
[0098]
Furthermore, a computer-readable recording medium records a program for a computer that has a question sentence database storing first language question sentences paired with the second language question sentences into which they are translated, a response sentence database storing in advance a plurality of standard second language response sentences that respond to the second language question sentences together with the first language response sentences into which they are translated, input operation means, and voice input means. The program causes the computer to function as: question sentence extraction means for searching the question sentence database for a first language question sentence input by the user with the input operation means and extracting the second language question sentence paired with it; voice recognition means for recognizing, when the user's conversation partner inputs a second language response sentence by voice in response to the output of the extracted second language question sentence, the input voice as a second language character response sentence; response sentence extraction means for extracting from the response sentence database a second language response sentence similar to the character response sentence recognized by the voice recognition means and the corresponding first language response sentence; and display control means for displaying the second language response sentence and the first language response sentence extracted by the response sentence extraction means and the character response sentence recognized by the voice recognition means. Accordingly, a portable terminal device that does not have an interpreting function can be used as an efficient interpreting device.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of Embodiment 1 of an interpreting apparatus according to the present invention.
FIG. 2 is a diagram showing an example of contents of a question sentence translation database according to the embodiment.
FIG. 3 is a diagram showing an example of contents of a response sentence database according to the embodiment.
FIG. 4 is a diagram showing an example of contents displayed on the display unit of the embodiment.
FIG. 5 is a diagram showing another example of contents displayed on the display unit of the embodiment.
FIG. 6 is a diagram showing an example of the contents of the speech recognition dictionary of the embodiment.
FIG. 7A is a diagram showing an example of a character response sentence of a voice response input to the voice input unit of the embodiment.
FIG. 7B is an explanatory diagram showing how the speech recognition unit of the embodiment divides the speech signal into word-unit segment speech patterns and recognizes word candidates.
FIG. 8 is an explanatory diagram showing how a character response sentence is recognized by correcting the similarity of word candidates in the speech recognition unit of the embodiment.
FIG. 9 is a flowchart for explaining the operation of the embodiment.
FIG. 10 is a flowchart for explaining the detailed operation of S918 in FIG. 9;
FIG. 11 is a configuration diagram of Embodiment 2 of an interpreting apparatus according to the present invention.
FIG. 12 is a diagram illustrating an example of contents of a response sentence database according to the embodiment.
FIG. 13 is a diagram showing an example of content displayed on the display unit of the embodiment.
FIG. 14 is a flowchart illustrating the operation of the embodiment.
[Explanation of symbols]
101 Question sentence translation database
102, 1101 Response sentence database
103 Input operation unit
104 Question sentence selection unit
105 Display unit
106, 1102 Voice output unit
107, 1103 Response sentence extraction unit
108 Speech recognition dictionary
109 Voice input unit
110 Voice recognition unit

Claims

An interpreting apparatus comprising:
a question sentence database storing, as pairs, question sentences in a first language and the question sentences in a second language into which they are translated;
a response sentence database storing a plurality of standard second-language response sentences that respond to the second-language question sentences, and the first-language response sentences obtained by translating them;
input operation means;
voice input means;
question sentence extraction means for searching the question sentence database for a first-language question sentence input by a user through the input operation means, and extracting the second-language question sentence paired with it;
voice recognition means for recognizing input speech as a second-language character response sentence when the user's conversation partner, in response to output of the extracted second-language question sentence, inputs a second-language response sentence by voice;
response sentence extraction means for extracting, from the response sentence database, a second-language response sentence similar to the character response sentence recognized by the voice recognition means, together with the corresponding first-language response sentence; and
display means for displaying the second-language response sentence and the first-language response sentence extracted by the response sentence extraction means, and the character response sentence recognized by the voice recognition means.
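
As an illustration of the response sentence extraction recited above, the following is a minimal sketch only. The claim does not define how similarity is measured; word-set overlap (Jaccard) is assumed here, and the database entries and the names word_overlap, RESPONSE_DB and extract_response are hypothetical.

```python
def word_overlap(a, b):
    """Jaccard overlap between the word sets of two sentences (assumed measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


# Hypothetical response sentence database:
# (second-language response sentence, first-language translation) pairs.
RESPONSE_DB = [
    ("it takes ten minutes", "10分かかります"),
    ("i do not know", "わかりません"),
]


def extract_response(recognized, db=RESPONSE_DB):
    """Return the stored pair whose second-language sentence is most similar
    to the recognized character response sentence."""
    return max(db, key=lambda pair: word_overlap(recognized, pair[0]))
```

A recognized sentence such as "it takes twenty minutes" shares three of five distinct words with the first entry and none with the second, so the first pair would be extracted and both of its sentences shown alongside the recognized text.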

The interpreting apparatus according to claim 1, wherein the voice recognition means comprises:
a speech recognition dictionary storing word standard patterns and the words corresponding to those word standard patterns;
a dividing unit that analyzes the voice input from the voice input means and divides it into segment patterns;
a candidate word extraction unit that calculates the similarity between each segment pattern produced by the dividing unit and the word standard patterns stored in the speech recognition dictionary, and extracts, as candidate words, the words corresponding to word standard patterns whose similarity is equal to or higher than a predetermined value; and
a character response sentence generation unit that generates the character response sentence by arranging, for each segment, the candidate word having the highest similarity among those extracted by the candidate word extraction unit.
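
The candidate word extraction and sentence generation recited above can be sketched as follows. This is an illustrative sketch under stated assumptions: the per-segment similarities are taken as already computed (the acoustic matching against word standard patterns is not modelled), and THRESHOLD, extract_candidates and generate_sentence are hypothetical names and values.

```python
THRESHOLD = 0.6  # hypothetical "predetermined value"


def extract_candidates(segment_scores, threshold=THRESHOLD):
    """segment_scores maps each dictionary word to its similarity with one
    segment pattern. Keep the words at or above the threshold."""
    return {w: s for w, s in segment_scores.items() if s >= threshold}


def generate_sentence(per_segment_scores, threshold=THRESHOLD):
    """Arrange, segment by segment, the candidate word with the highest similarity."""
    words = []
    for scores in per_segment_scores:
        candidates = extract_candidates(scores, threshold)
        if candidates:  # a segment with no candidate contributes no word
            words.append(max(candidates, key=candidates.get))
    return " ".join(words)
```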

The interpreting apparatus as described above, wherein the candidate word extraction unit increases the similarity of an extracted candidate word by a predetermined factor when that candidate word matches a word constituting the response sentence extracted by the response sentence extraction means.
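
The similarity adjustment in this claim can be illustrated with a short sketch. The factor value and the names BOOST and boost_candidates are assumptions chosen for illustration; the claim specifies only that the similarity is multiplied by a predetermined factor when a candidate word also appears in the extracted response sentence.

```python
BOOST = 1.2  # hypothetical "predetermined magnification"


def boost_candidates(candidates, response_sentence, boost=BOOST):
    """Multiply the similarity of each candidate word that also occurs in the
    extracted response sentence; leave the other candidates unchanged."""
    expected = set(response_sentence.lower().split())
    return {
        word: (score * boost if word.lower() in expected else score)
        for word, score in candidates.items()
    }
```

With candidates "ten" (0.7) and "tan" (0.75) and the extracted response sentence "it takes ten minutes", only "ten" is raised (to about 0.84), so it overtakes the acoustically closer but unexpected "tan".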

The interpreting apparatus according to claim 1, wherein the response sentence database stores, for each word constituting a response sentence, whether the word is an essential element of the response sentence and, when it is an essential element, whether the word is expected to be replaced with another word, together with the semantic attribute of any word that is expected to be replaced,
the apparatus further comprising a character response sentence determination unit that compares the words constituting the character response sentence recognized by the voice recognition means with the words constituting the similar response sentence, and determines that the character response sentence is not established as an answer to the question when an essential element is missing, or when the word corresponding to a word expected to be replaced differs from it in semantic attribute.
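
One possible reading of this determination is sketched below. The record layout (Slot), the attribute lexicon (ATTRIBUTES) and the matching strategy are all assumptions made for illustration; the claim states only which properties are stored and when the response is judged not to be established.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    word: str
    essential: bool
    replaceable: bool = False        # expected to be replaced with another word
    attribute: Optional[str] = None  # semantic attribute of the replaceable word


# Hypothetical lexicon giving the semantic attribute of recognized words.
ATTRIBUTES = {"ten": "number", "twenty": "number", "red": "color"}


def is_established(recognized, template):
    """Return False when an essential element is missing or when no word with
    the expected semantic attribute fills a replaceable essential slot."""
    words = [w.lower() for w in recognized.split()]
    fixed = {s.word for s in template if not s.replaceable}
    extras = [w for w in words if w not in fixed]  # words that may fill slots
    for slot in template:
        if not slot.essential:
            continue
        if slot.replaceable:
            if not any(ATTRIBUTES.get(w) == slot.attribute for w in extras):
                return False
        elif slot.word not in words:
            return False
    return True


# Hypothetical stored response sentence "it takes ten minutes".
TEMPLATE = [
    Slot("it", essential=False),
    Slot("takes", essential=False),
    Slot("ten", essential=True, replaceable=True, attribute="number"),
    Slot("minutes", essential=True),
]
```

Under these assumptions "it takes twenty minutes" is accepted, while "it takes red minutes" (wrong semantic attribute) and "twenty" alone (essential word "minutes" missing) are not.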

The interpreting apparatus according to claim 4, wherein the response sentence database stores, for a word constituting a response sentence that is an essential element, a re-question rule to be applied when that word cannot be recognized,
the apparatus further comprising output means for outputting a re-question in accordance with the re-question rule when the character response sentence determination unit determines that the character response sentence is not established as an answer to the question.

The interpreting apparatus according to claim 1, further comprising display control means for comparing the words constituting the character response sentence displayed on the display means with the words constituting the second-language response sentence extracted by the response sentence extraction means, and changing the display attribute of the matching words in the second-language response sentence.

The interpreting apparatus according to claim 1, wherein the same identifier is attached to corresponding words constituting the first-language response sentence and the second-language response sentence stored in the response sentence database,
the apparatus further comprising display control means for comparing the words constituting the character response sentence displayed on the display means with the words constituting the second-language response sentence extracted by the response sentence extraction means, and changing the display attribute of the words in the first-language response sentence to which the same identifier as the matching words is attached.
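
The display control described in the two preceding claims can be sketched as follows. The entry layout, the identifier values and the use of square brackets to stand in for a changed display attribute are assumptions for illustration only.

```python
# Hypothetical database entry: each word carries an identifier shared with
# its counterpart in the other language.
ENTRY = {
    "second": [("it", 1), ("takes", 2), ("ten", 3), ("minutes", 4)],
    "first": [("10", 3), ("分", 4), ("かかります", 2)],
}


def highlight(recognized, entry, mark="[{}]"):
    """Mark the second-language words that were actually recognized and the
    first-language words that carry the same identifiers."""
    spoken = set(recognized.lower().split())
    matched_ids = {ident for word, ident in entry["second"] if word in spoken}
    second = " ".join(
        mark.format(w) if i in matched_ids else w for w, i in entry["second"]
    )
    first = "".join(
        mark.format(w) if i in matched_ids else w for w, i in entry["first"]
    )
    return second, first
```

For the recognized text "takes twenty minutes", the words "takes" and "minutes" match, so they and their first-language counterparts are marked while "ten" and "10" are left plain, which signals to the user that the number was not confirmed.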

A computer-readable recording medium on which are recorded in advance:
a question sentence database storing, as pairs, question sentences in a first language and the question sentences in a second language into which they are translated; and
a response sentence database storing a plurality of standard second-language response sentences that respond to the second-language question sentences, and the first-language response sentences obtained by translating them,
the recording medium further having recorded thereon a program for causing a computer to function as:
input operation means;
voice input means;
question sentence extraction means for searching the question sentence database for a first-language question sentence input by a user through the input operation means, and extracting the second-language question sentence paired with it;
voice recognition means for recognizing input speech as a second-language character response sentence when the user's conversation partner, in response to output of the extracted second-language question sentence, inputs a second-language response sentence by voice;
response sentence extraction means for extracting, from the response sentence database, a second-language response sentence similar to the character response sentence recognized by the voice recognition means, together with the corresponding first-language response sentence; and
display control means for displaying the second-language response sentence and the first-language response sentence extracted by the response sentence extraction means, and the character response sentence recognized by the voice recognition means.