JP5653392B2

JP5653392B2 - Speech translation apparatus, method and program

Info

Publication number: JP5653392B2
Application number: JP2012146880A
Authority: JP
Inventors: 住田　一男; 鈴木　博和; 降幡　建太郎; 釜谷　聡史; 知野　哲朗; 永江　尚義; 有賀　康顕; 益子　貴史
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2012-06-29
Filing date: 2012-06-29
Publication date: 2015-01-14
Anticipated expiration: 2032-06-29
Also published as: US20140006007A1; CN103514153A; JP2014010623A; US20150199341A1; US9002698B2

Description

Embodiments described herein relate generally to a speech translation apparatus, method, and program.

With globalization in recent years, expectations are rising for speech translation apparatuses that support communication between users whose native languages differ. Services that provide speech translation functions are in fact already in operation. However, it is difficult to perform speech recognition or machine translation without error. There is therefore a technique in which, when a user of the target language (the language into which the spoken source language is translated) cannot understand the translated text, that user specifies the unintelligible part, which prompts the speaker using the source language to correct it.

Japanese Patent No. 4042360

However, correcting an unintelligible part requires the user on the source language side to edit the character string, and the user on the target language side must check the translation sentence by sentence and enter the result of each check. This makes it difficult to achieve a highly responsive conversation.

The present disclosure has been made to solve the above problem, and its object is to provide a speech translation apparatus, method, and program capable of providing smooth and highly responsive speech translation.

The speech translation apparatus according to this embodiment includes an acquisition unit, a speech recognition unit, a translation unit, a search unit, a selection unit, and an example presentation unit. The acquisition unit acquires an utterance in a first language as a speech signal. The speech recognition unit sequentially performs speech recognition on the speech signal to obtain first language character strings, which are the character strings of the speech recognition results. The translation unit translates each first language character string into a second language different from the first language to obtain a second language character string, which is the character string of the translation result. For each first language character string, the search unit searches for a similar example, which is an example in the first language similar to that first language character string, and, if a similar example exists, obtains the similar example and a parallel translation example, which is the result of translating the similar example into the second language. In accordance with a user instruction, the selection unit selects, as a selected character string, at least one of a first language character string for which a similar example exists and a second language character string for which a parallel translation example exists. The example presentation unit presents one or more similar examples and parallel translation examples relating to the selected character string.

FIG. 1 is a block diagram showing the speech translation apparatus according to the present embodiment.
FIG. 2 is a diagram showing an example of the source language examples and target language examples stored in the example storage unit.
FIG. 3 is a flowchart showing the operation of the speech translation apparatus according to the present embodiment.
FIG. 4 is a flowchart showing details of the example search processing.
FIG. 5 is a flowchart showing details of the presentation processing for similar examples and parallel translation examples.
FIG. 6 is a diagram showing an implementation example of the speech translation apparatus according to the present embodiment.
FIG. 7 is a diagram showing an example of the screen display of the touch panel display.
FIG. 8 is a diagram showing the first process in the operation of the speech translation apparatus according to the present embodiment.
FIG. 9 is a diagram showing the second process in the operation of the speech translation apparatus according to the present embodiment.
FIG. 10 is a diagram showing the third process in the operation of the speech translation apparatus according to the present embodiment.
FIG. 11 is a diagram showing the fourth process in the operation of the speech translation apparatus according to the present embodiment.
FIG. 12 is a diagram showing the fifth process in the operation of the speech translation apparatus according to the present embodiment.
FIG. 13 is a diagram showing the sixth process in the operation of the speech translation apparatus according to the present embodiment.
FIG. 14 is a diagram showing the seventh process in the operation of the speech translation apparatus according to the present embodiment.
FIG. 15 is a diagram showing the first process in the operation when the user on the source language side selects an example.
FIG. 16 is a diagram showing the second process in the operation when the user on the source language side selects an example.
FIG. 17 is a diagram showing a display example when no suitable example exists.
FIG. 18 is a diagram showing an example of the table stored in the example storage unit according to the second embodiment.
FIG. 19 is a diagram showing a specific example of the operation of the speech translation apparatus according to the second embodiment.
FIG. 20 is a diagram showing a speech recognition system including the speech translation apparatus according to the third embodiment.

In recent years, speech translation application software that runs on, for example, smartphones (high-performance mobile terminals) has been commercialized, and services that provide speech translation functions are also in operation. With these applications and services, the user speaks in short units such as one or a few sentences, and the speech is converted into the corresponding character string by speech recognition. The character string is then translated into another language by machine translation, and the translation result is read aloud by speech synthesis. The user of the source language is required to speak in short units, and the user of the target language is required to check the translation result or listen to the synthesized speech for each unit.
For this reason, conversations using such conventional application software involve frequent waiting, and it is currently difficult to hold a responsive conversation. It is desirable for the content of an utterance to reach the other party without constraints such as requiring the user to speak one sentence at a time, but no such function has been provided.

Hereinafter, the speech translation apparatus, method, and program according to the present embodiment will be described in detail with reference to the drawings. In the following embodiments, parts given the same reference numerals perform the same operations, and duplicate descriptions are omitted as appropriate. In this embodiment, the source language, that is, the language of the utterance, is Japanese, and the target language, that is, the language into which the source language is translated, is English; translation between Japanese and English is described as an example. The languages handled by the translation processing are not limited to these two, however, and any language can be targeted.
(First embodiment)
A speech translation apparatus according to the first embodiment will be described with reference to the block diagram of FIG. 1.
A speech translation apparatus 100 according to the first embodiment includes a speech acquisition unit 101, a speech recognition unit 102, a machine translation unit 103, a display unit 104, an example storage unit 105, an example search unit 106, a pointing instruction detection unit 107, a character string selection unit 108, and an example presentation unit 109.

The speech acquisition unit 101 acquires speech uttered by the user in the source language (also referred to as the first language) as a speech signal.
The speech recognition unit 102 receives the speech signal from the speech acquisition unit 101, performs speech recognition processing on it, and obtains a source language character string, which is the character string in the source language resulting from speech recognition. While the speech signal is being input from the speech acquisition unit 101, the speech recognition unit 102 performs speech recognition sequentially for each processing unit of the speech recognition processing, and passes each source language character string to the subsequent stage as soon as it is obtained. The processing unit for speech recognition is determined by pauses or linguistic breaks in the speech, by the point at which a speech recognition candidate is fixed, or by a fixed time interval. The user may also be notified by an event that a speech recognition result is available. Since general processing may be used for the speech recognition itself, its description is omitted here.
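As one illustration of how a processing unit might be delimited by pauses, the following is a minimal sketch. It assumes the recognizer already supplies words with start and end times; the function name `split_on_pauses` and the 0.5 second threshold are hypothetical choices for illustration and are not specified in this description.

```python
def split_on_pauses(words, min_pause=0.5):
    """Group timestamped words into processing units at long pauses.

    words is a list of (text, start_sec, end_sec) tuples in time order.
    A new unit starts when the silence before a word is at least min_pause.
    """
    units, current, last_end = [], [], None
    for text, start, end in words:
        if current and start - last_end >= min_pause:
            units.append(current)   # the pause closes the current unit
            current = []
        current.append(text)
        last_end = end
    if current:
        units.append(current)       # flush the final unit
    return units


units = split_on_pauses([("a", 0.0, 0.3), ("b", 0.4, 0.7), ("c", 1.5, 1.8)])
```

Each unit produced this way would be handed to the subsequent stage as soon as it closes, which is what allows translation to proceed while the speaker is still talking.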

The machine translation unit 103 receives the source language character string from the speech recognition unit 102, machine-translates it into a character string in the target language (also referred to as the second language), and obtains a target language character string, which is the character string of the translation result. Since general processing may be used for the machine translation itself, its description is omitted here.
The display unit 104 is, for example, a display. It receives the source language character string from the speech recognition unit 102 and the target language character string from the machine translation unit 103, and displays both. It also receives similar examples and parallel translation examples from the example presentation unit 109, described later, and displays them. A similar example is an example in the source language that is similar to the source language character string. A parallel translation example is the result of translating a similar example into the target language.

The example storage unit 105 stores examples in the source language (hereinafter also referred to as source language examples) and examples in the target language (hereinafter also referred to as target language examples) in association with each other. The source language examples and target language examples stored in the example storage unit 105 will be described later with reference to FIG. 2.
The example search unit 106 receives the source language character string from the speech recognition unit 102 and searches the source language examples accumulated in the example storage unit 105 for similar examples that resemble the source language character string.
The pointing instruction detection unit 107 acquires position information corresponding to the position indicated by the user on the display unit 104.

The character string selection unit 108 receives the position information from the pointing instruction detection unit 107 and, from among the character strings displayed on the display unit 104, selects the source language character string or target language character string corresponding to the position information as the selected character string.
The example presentation unit 109 receives the selected character string from the character string selection unit 108, receives the similar examples and parallel translation examples relating to the selected character string from the example search unit 106, and causes the display unit 104 to display the similar examples and parallel translation examples. The example presentation unit 109 also highlights the selected character string and the selected similar example and parallel translation example.
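The mapping from position information to a selected character string can be pictured as a hit test against the strings currently on screen. The following is a minimal sketch under that assumption; the class `DisplayedString`, the function `select_string`, and the pixel coordinates are all hypothetical and are not taken from this description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DisplayedString:
    text: str
    language: str   # "source" or "target"
    left: int       # bounding box of the string on screen, in pixels
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def select_string(items: List[DisplayedString], x: int, y: int) -> Optional[DisplayedString]:
    """Return the displayed string under position (x, y), or None."""
    for item in items:
        if item.contains(x, y):
            return item
    return None


screen = [
    DisplayedString("あまり歩きたくない", "source", 0, 40, 300, 30),
    DisplayedString("Amari doesn’t want to walk.", "target", 320, 40, 300, 30),
]
selected = select_string(screen, 400, 50)
```

Here a touch at (400, 50) falls inside the second bounding box, so the target language string becomes the selected character string; a touch between the two boxes selects nothing.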

Next, an example of the source language examples and target language examples stored in the example storage unit 105 will be described with reference to FIG. 2.
As shown in FIG. 2, a source language example 201, which is in the source language, and a target language example 202, which is the corresponding example in the target language, are stored in association with each other. Specifically, for example, the source language string "あまり歩けない" and its translation "I can’t walk so long distance." are stored as a source language example 201 and a target language example 202, respectively.
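The association between the two columns of FIG. 2 can be pictured as a list of pairs. The following is a minimal sketch assuming an in-memory store; the names `ExamplePair` and `ExampleStore` are hypothetical and do not appear in this description, which leaves the storage format open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamplePair:
    """One stored row: a source language example and its translation."""
    source: str   # source language example (201)
    target: str   # target language example (202)


class ExampleStore:
    """Hypothetical in-memory stand-in for the example storage unit 105."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def sources(self):
        # All source language examples, in stored order.
        return [p.source for p in self._pairs]

    def translation_of(self, source):
        # Return the associated target language example, or None if absent.
        for p in self._pairs:
            if p.source == source:
                return p.target
        return None


store = ExampleStore([
    ExamplePair("あまり歩けない", "I can’t walk so long distance."),
])
```

Keeping each pair as a single record means that a hit on the source side immediately yields its parallel translation example without a second search.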

Next, the operation of the speech translation apparatus 100 according to the present embodiment will be described with reference to the flowchart of FIG. 3. Although not shown in the flowchart, the speech recognition unit 102 and the machine translation unit 103 operate in parallel, so their processing is started before the processing of FIG. 3.
In step S301, the speech recognition unit 102 obtains a source language character string as the result of speech recognition processing.
In step S302, the display unit 104 displays the source language character string.
In step S303, the machine translation unit 103 obtains a target language character string as the result of machine translation processing.

In step S304, the display unit 104 displays the target language character string. Alternatively, the display unit 104 may skip displaying the source language character string in step S302 and instead display the source language character string and the target language character string together once the target language character string has been obtained.

In step S305, the example search unit 106 performs example search processing. The example search processing will be described later with reference to the flowchart of FIG. 4.
In step S306, the pointing instruction detection unit 107 detects whether there is an instruction from the user, that is, pointing at a target language character string whose meaning is unclear. For example, if the display unit 104 is a touch panel display, an instruction from the user is detected when the user touches the symbol indicating that a similar example and a parallel translation example exist. If an instruction from the user is detected, the process proceeds to step S307. If no instruction is detected, the process returns to step S301 and the same processing is repeated.

In step S307, the speech recognition unit 102 suspends the speech recognition processing.
In step S308, the example presentation unit 109 performs example presentation processing. The details of the example presentation processing will be described later with reference to the flowchart of FIG. 5.
In step S309, the speech recognition unit 102 resumes the speech recognition processing, and the same processing is repeated from step S301. The operation of the speech translation apparatus then ends when no further utterance is input or when the user instructs that the speech recognition processing be terminated.
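The control flow of steps S301 to S309 can be summarized in a short loop. The following is only a sketch: it assumes recognition has already produced a sequence of source language strings, and every callable passed in (`translate`, `search`, `pointed`, `present`, `show`) is a hypothetical stand-in for the corresponding unit rather than an interface defined in this description. Suspending and resuming recognition is indicated by comments only.

```python
def run_session(units, translate, search, pointed, present, show):
    """Sketch of steps S301 to S309 over already recognized strings."""
    for source in units:                    # S301: next source language string
        show("source", source)              # S302
        target = translate(source)          # S303
        show("target", target)              # S304
        examples = search(source)           # S305: may be an empty list
        if examples and pointed(target):    # S306: user pointed at this string
            # S307: recognition would be suspended at this point.
            present(source, target, examples)   # S308
            # S309: recognition resumes with the next unit.


# Usage with trivial stand-ins for each unit.
shown, presented = [], []
run_session(
    units=["a", "b"],
    translate=str.upper,
    search=lambda s: ["b2"] if s == "b" else [],
    pointed=lambda t: t == "B",
    present=lambda s, t, ex: presented.append((s, t, ex)),
    show=lambda side, text: shown.append((side, text)),
)
```

In this simplified form the pointing check is made once per string, whereas the apparatus described here lets the user point at any displayed string for which the symbol is shown.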

Next, the details of the operation in step S305 will be described with reference to the flowchart of FIG. 4.
In step S401, the example search unit 106 receives a source language character string.
In step S402, the example search unit 106 searches the example storage unit 105 to determine whether a similar example exists for the received source language character string. To find similar examples, the edit distance between the source language character string and a source language example may be calculated, for instance, and the source language example judged to be a similar example if the resulting degree of coincidence is equal to or greater than a threshold. Alternatively, the example may be judged to be a similar example if the degree of coincidence in the number of words, obtained by morphological analysis, is equal to or greater than a threshold. If a similar example exists, the process proceeds to step S403. If no similar example exists, the processing of steps S305 and S306 ends.
In step S403, the example presentation unit 109 causes the display unit 104 to display, in association with the source language character string for which a similar example exists, a symbol indicating that the similar example exists, and to display, for the target language character string corresponding to that source language character string, a symbol indicating that a parallel translation example exists.
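The edit distance variant of the search in step S402 can be sketched as follows. This is a minimal illustration, not the method fixed by this description: the normalization of the distance into a degree of coincidence between 0 and 1, the default threshold of 0.5, and the function names are all assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map the edit distance to a degree of coincidence in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def find_similar_examples(source_string, examples, threshold=0.5):
    """Return (similar example, parallel translation, score) triples.

    examples is a list of (source example, target example) pairs.
    Only examples scoring at or above the threshold are kept, best first.
    """
    hits = [(src, tgt, similarity(source_string, src)) for src, tgt in examples]
    hits = [h for h in hits if h[2] >= threshold]
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

An empty result corresponds to the branch in which no similar example exists and no symbol is displayed. A word-based variant would replace `similarity` with an overlap measure computed on the output of a morphological analyzer.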

Next, the presentation processing for similar examples and parallel translation examples in step S308 will be described with reference to the flowchart of FIG. 5. In the following, unless otherwise noted, similar examples and parallel translation examples are collectively referred to as examples.
In step S501, the example presentation unit 109 displays examples together with a notification. The notification is a confirmation message indicating that the user has given an instruction to check the meaning. A single example may be displayed, or a plurality of examples may be presented as a list. The list may, for instance, show the top five examples in descending order of similarity to the speech recognition result, show all examples, or be built by any other method, such as referring to the history of previously presented examples.

In step S502, the pointing instruction detection unit 107 detects whether an example in the list has been pointed at, that is, whether an example has been selected. If an example has been selected, the process proceeds to step S503. If no example has been selected, the process proceeds to step S504.
In step S503, the example presentation unit 109 highlights the selected example. Specifically, when a parallel translation example is pointed at, for instance, the character color of the selected parallel translation example may be inverted or the example may be shown with a highlight. When a parallel translation example is highlighted, the corresponding similar example is highlighted as well, and vice versa.
In step S504, the example presentation unit 109 presents a confirmation message (also simply referred to as a notification). The confirmation message asks the user to decide whether the selected example is appropriate.

In step S505, the pointing instruction detection unit 107 detects whether there is an instruction regarding deletion. A deletion instruction is detected when, for example, the delete button is selected. If there is a deletion instruction, the process proceeds to step S506. If there is no deletion instruction, the process returns to step S502 and the same processing is repeated.
In step S506, the example presentation unit 109 regards the presented examples as containing no appropriate example and causes the display unit 104 to display a confirmation message indicating that the content was not conveyed to the other party.
In step S507, the pointing instruction detection unit 107 detects whether the user has pointed at the confirmation message. If pointing is detected, the process proceeds to step S508. If not, the process waits until the user points.

In step S508, the pointing instruction detection unit 107 detects whether the pointing by the user indicates affirmation. If the pointing does not indicate affirmation, the process proceeds to step S509. If it indicates affirmation, the process proceeds to step S510.
In step S509, the example presentation unit 109 hides the confirmation message, cancels the highlighting of the selected example to restore the normal display, and returns to step S502 to perform the same processing.
In step S510, the example presentation unit 109 adds the selected example to the corresponding location in the display area and displays it.
In step S511, the example presentation unit 109 deletes the source language character string and target language character string being processed.
In step S512, the example presentation unit 109 hides the list of examples displayed in step S501. This completes the example presentation processing.
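The net effect of steps S510 and S511 on the displayed conversation can be pictured as swapping one row for another. The following is a minimal sketch that assumes the approved example takes the place of the processed pair in the same row; the function name `confirm_example` and the list-of-tuples representation are hypothetical.

```python
def confirm_example(rows, index, similar_example, parallel_example):
    """Return the displayed rows after the user approves an example.

    rows is a list of (source string, target string) tuples.
    """
    updated = list(rows)
    # S510 and S511: the approved pair takes the place of the processed pair.
    updated[index] = (similar_example, parallel_example)
    return updated


rows = [
    ("見て回りたい", "I would like to look around."),
    ("あまり歩きたくない", "Amari doesn’t want to walk."),
]
rows_after = confirm_example(rows, 1, "あまり歩けない", "I can’t walk so long distance.")
```

Returning a new list rather than editing in place keeps the earlier state available, which may be convenient if the display has to be redrawn from either version.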

Next, an implementation example of the speech translation apparatus will be described with reference to FIG. 6.
FIG. 6 shows an example in which the speech translation apparatus 100 according to the present embodiment is implemented on so-called tablet-type hardware. The speech translation apparatus 600 shown in FIG. 6 includes a housing 601, a touch panel display 602, and a microphone 603.
The touch panel display 602 and the microphone 603 are mounted in the housing 601.
The touch panel display 602 has a pointing function (pointing instruction detection unit) that, if the display is of the capacitive type, can detect that a location has been pointed at when it is touched with a finger, and a display function (display unit) that can display characters, images, and the like.
A general microphone may be used as the microphone 603, and its description is omitted here.

Next, an example of the screen display of the touch panel display 602 will be described with reference to FIG. 7.
As an example layout, as shown in FIG. 7, a display area 701 in which source language character strings are displayed occupies the left half of the screen, and a display area 702 in which target language character strings are displayed occupies the right half. An utterance start button 703, a language switching button 704, a delete button 705, and an end button 706 are displayed at the right edge of the screen.

The utterance start button 703 is an area that the user points at to instruct the start of an utterance. The language switching button 704 is an area that the user points at to switch between the source language and the target language. The delete button 705 is an area pointed at to delete an example or the like. The end button 706 is an area pointed at to end the speech recognition processing.
The layout is not limited to that shown in FIG. 7; any arrangement and configuration may be used, such as having a group of buttons pop up as needed. Nor is the apparatus limited to a touch panel display; screen display and input may be separate, as with a combination of a screen and a keyboard.

Next, a specific example of the operation of the speech translation apparatus according to this embodiment will be described with reference to FIGS. 8 to 14. Here, an operation example using the speech translation apparatus 600 shown in FIG. 6 is described.
FIG. 8 shows a display example when the user on the target language side speaks. The example of FIG. 8 is a case where an utterance in the target language is machine-translated into the source language; the same processing as described above is performed with the source language (Japanese) and the target language (English) interchanged. Specifically, when the user utters speech 801 "Have you already gone around here?", the speech recognition result 802-E "Have you already gone around here?" is displayed in the display area 702, and the machine translation result 802-J for the speech recognition result 802-E, "この辺りはもう周られましたか？", is displayed in the display area 701.

FIG. 9 shows a display example when the user on the source language side speaks. Specifically, the speech acquisition unit 101 acquires the utterance 901 "見て回りたいんだけど、あまり歩きたくないんで、バスツアーとかがいいなあ" (roughly, "I'd like to look around, but I don't want to walk much, so something like a bus tour would be nice"), and the source language character strings obtained by sequential speech recognition, 902-J "見て回りたい" ("I want to look around"), 903-J "あまり歩きたくない" ("I don't want to walk much"), and 904-J "バスツアーとかがいい" ("a bus tour or the like would be nice"), are displayed in the display area 701. In addition, the target language character strings that are the machine translation results for these speech recognition results, 902-E "I would like to look around.", 903-E "Amari doesn’t want to walk.", and 904-E "A bus tour is good.", are displayed in the display area 702. A symbol 905 indicates that a similar example and a parallel translation example exist. Here, it is assumed that the target language character string 903-E is a translation that does not make sense because of a machine translation error.

FIG. 10 shows a case where the user on the target language side points to the unintelligible target language character string 903-E. The string may be selected, for example, by touching the symbol 905 or by moving the cursor 1001 onto the symbol 905. At that time, a confirmation message 1002-E and a corresponding confirmation message 1002-J are displayed. In the example of FIG. 10, the confirmation message 1002-J 「何とおっしゃりたいのでしょうか？」 ("What would you like to say?") is displayed in the display area 701, and the confirmation message 1002-E "Can you see what the partner wants to say?" is displayed in the display area 702.

In FIG. 11, as a result of the user selecting the target language character string, similar examples for the source language character string and the corresponding parallel translation examples are displayed in the display areas 701 and 702, respectively. Specifically, with reference to the example storage unit 105, the similar examples 1101-J 「あまり歩けない」 ("I can't walk much"), 1102-J 「私はあまり歩きたくない」 ("I don't want to walk much"), and 1103-J 「明日は歩きたい」 ("I want to walk tomorrow") are displayed together with the corresponding parallel translation examples 1101-E "I can't walk so long distance.", 1102-E "I don't want to walk.", and 1103-E "Tomorrow, I'd like to walk.".

FIG. 12 shows a case where the user on the target language side selects a parallel translation example; for example, the selected parallel translation example 1201-E and the corresponding similar example 1201-J are both highlighted. Here, "I can't walk so long distance." is selected and highlighted as the parallel translation example 1201-E, and the corresponding similar example 1201-J 「あまり歩けない」 is highlighted as well. When a parallel translation example is selected, the confirmation message 1202 「おっしゃりたいことはこの内容でよろしいですか？」 ("Is this what you would like to say?") is displayed in the display area 701 on the source language side. When a plurality of similar examples and parallel translation examples are displayed, they may be scrolled with the scroll bar 1104.

In FIG. 13, the user on the source language side indicates by pointing whether to accept the content of the highlighted similar example. Specifically, in FIG. 13, the user touches "Yes" or "No" in the confirmation message 1202, or designates one of them with the cursor 1001. The pointing instruction detection unit 107 thereby detects which of "Yes" and "No" the user has selected.

In FIG. 14, when the user on the source language side selects "Yes", the list of similar examples and parallel translation examples is dismissed, the selected similar example and the corresponding parallel translation example are added to the display areas 701 and 702, respectively, and the original source language character string and the original target language character string, which constitute the translation error, are deleted. For example, as the source language character string 1401-J, 「あまり歩きたくない」 is struck through and the similar example 「あまり歩けない」 is displayed above it. Likewise, as the target language character string 1401-E, "Amari doesn't want to walk." is struck through and the parallel translation example "I can't walk so long distance." is displayed above it. In this way, even when the user on the target language side cannot understand the translation result, selecting an example on the target language side causes the corresponding example to be displayed to the user on the source language side. Because the user on the source language side only has to decide whether the selected similar example is appropriate, the conversation can easily proceed as the user intends without depending on that user's ability to rephrase sentences.

The above example shows a case where the user on the target language side selects a parallel translation example, but the user on the source language side may instead select a similar example. A specific example in which the user on the source language side selects a similar example will be described with reference to FIGS. 15 and 16.
As shown in FIG. 15, the user on the source language side selects a similar example. Here, when the similar example 1501-J 「私はあまり歩きたくない」 ("I don't want to walk much") is selected, it is highlighted. When the similar example 1501-J is selected, the parallel translation example 1501-E "I don't want to walk." in the display area 702 on the target language side is also highlighted. In addition, the confirmation message 1502 "Can you see what the partner wants to say?" is displayed in the display area on the target language side.

In FIG. 16, the user on the target language side indicates, with the cursor 1001 or the like, whether to accept the content of the highlighted parallel translation example. In this way, when the source language character strings of the user's utterance include a sentence for which similar examples exist, the user on the source language side can select a similar example and rephrase the sentence personally.

Next, FIG. 17 shows a case where no appropriate example exists among the similar examples and the parallel translation examples.

When the user on the target language side or the user on the source language side judges that no appropriate example exists and does not select an example, no example is inserted for the source language character string and the target language character string being processed. Furthermore, the source language character string and the target language character string being processed are deleted, and a confirmation message 1701 is displayed. The confirmation message 1701 may display, for example, 「申し訳ありませんが、伝わらなかったようです。」 ("Sorry, but it seems that was not conveyed.").
In this case, the content of the target language character string being processed did not reach the user on the target language side, but the user on the source language side at least learns that the machine-translated utterance was not conveyed, and can therefore respond, for example, by rephrasing with a different utterance.
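The outcomes described for FIGS. 14 and 17 can be sketched as a small state update on one displayed sentence pair. This is only an illustrative sketch: the `DisplayedPair` record, its field names, and the `resolve` function are hypothetical and do not appear in the specification. The text does not say what happens when an example is chosen but the other user answers "No", so the sketch assumes the pair is simply left unchanged in that case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Message 1701 as quoted in the description.
NOT_CONVEYED = "申し訳ありませんが、伝わらなかったようです。"


@dataclass
class DisplayedPair:
    """Hypothetical record for one recognized sentence and its translation."""
    source_text: str                      # speech recognition result
    target_text: str                      # machine translation result
    struck_out: bool = False              # original pair cancelled on screen
    shown_source: Optional[str] = None    # similar example shown instead
    shown_target: Optional[str] = None    # parallel translation example shown instead
    notice: Optional[str] = None          # e.g. message 1701


def resolve(pair: DisplayedPair,
            chosen: Optional[Tuple[str, str]],
            accepted: bool) -> DisplayedPair:
    """Apply the result of the example-selection dialogue to one pair."""
    if chosen is None:
        # FIG. 17: no suitable example, so the pair is removed and a notice shown.
        pair.struck_out = True
        pair.notice = NOT_CONVEYED
    elif accepted:
        # FIG. 14: the other user answered "Yes", so the example replaces the pair.
        pair.struck_out = True
        pair.shown_source, pair.shown_target = chosen
    # Assumption: an example chosen but answered "No" leaves the pair as it was.
    return pair
```

Calling `resolve` with the pair 「あまり歩きたくない」 / "Amari doesn't want to walk." and the chosen example 「あまり歩けない」 / "I can't walk so long distance." reproduces the replacement shown in FIG. 14.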

According to the first embodiment described above, each source language character string is searched for similar examples, and when a similar example exists and the user makes a selection, the similar examples and parallel translation examples are presented. Both users can thereby cooperate in selecting an example for a part of the source language character string (the speech recognition result) or the target language character string (the machine translation result) that cannot be understood, which resolves the unclear part and allows a smooth conversation across different languages. In addition, because speech recognition is stopped and examples are presented only when a parallel translation example is selected, the conversation can proceed without impairing its responsiveness.
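As a rough illustration of the similar-example search summarized above, the following sketch ranks stored examples by a character-level similarity ratio. The similarity measure is an assumption made purely for illustration: this passage does not state how similarity is computed, and `find_similar` and its `threshold` parameter are hypothetical names. Python's standard `difflib.SequenceMatcher` is used as a stand-in.

```python
from difflib import SequenceMatcher
from typing import List


def find_similar(recognized: str, examples: List[str],
                 threshold: float = 0.4) -> List[str]:
    """Return stored examples whose similarity to the recognized string
    meets the threshold, most similar first (assumed measure, see lead-in)."""
    scored = [(SequenceMatcher(None, recognized, ex).ratio(), ex)
              for ex in examples]
    hits = [(score, ex) for score, ex in scored if score >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in hits]


stored = ["あまり歩けない", "私はあまり歩きたくない", "明日は歩きたい", "バスツアー"]
candidates = find_similar("あまり歩きたくない", stored)
```

With the FIG. 11 strings as stored examples, the unrelated entry is filtered out and the remaining candidates are ordered by similarity; a real system would presumably use a measure suited to the language and the size of the example store.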

(Second Embodiment)
The second embodiment differs from the first embodiment in that the example storage unit 105 stores annotations in association with source language examples or target language examples. When a source language is translated into a target language, the meaning of the source language expression may be ambiguous. For example, with the Japanese expression 「結構です」, it is ambiguous whether the Japanese-speaking user intends a refusal, 「不要です」 ("not needed"), or an acceptance, 「大丈夫です」 ("that's fine"). Similarly, with the English expression "You're welcome.", it is ambiguous whether the English-speaking user intends a greeting of welcome, 「いらっしゃい」 ("Welcome to you"), or a reply to thanks, 「どういたしまして」 ("Don't mention it").
Therefore, in the second embodiment, annotations are associated with source language character strings or target language character strings so that examples correctly reflecting the intentions of the user who speaks the source language and the user who speaks the target language can be presented.

The speech translation apparatus according to the second embodiment is the same as the speech translation apparatus 100 according to the first embodiment, except for the examples stored in the example storage unit 105 and the operation of the example search unit 106.
The example storage unit 105 stores source language examples in association with annotations and target language examples in association with annotations.
When a similar example exists for a source language character string, the example search unit 106 further checks whether an annotation exists for the similar example.

Next, an example of the table stored in the example storage unit 105 according to the second embodiment will be described with reference to FIG. 18.
As shown in FIG. 18, a source language example 1801 is stored in association with an annotation 1802, and a target language example 1803 is stored in association with an annotation 1804. Specifically, the source language example 1805-J 「結構です」 is associated with the annotation 1805-1 「大丈夫です」 ("that's fine"), and the source language example 1806-J 「結構です」 is associated with the annotation 1806-1 「不要です」 ("not needed"). In this way, a source language example having a plurality of meanings is given an annotation for each meaning.
For a source language example that has an annotation, the target language example stored as its translation is a translation of the annotation rather than of the source language example itself. That is, "That's good." is stored as the target language example 1805-E corresponding to the source language example 1805-J 「結構です」 and the annotation 1805-1 「大丈夫です」, and "No thank you." is stored as the target language example 1806-E corresponding to the source language example 1806-J 「結構です」 and the annotation 1806-1 「不要です」.

When a target language example has annotations, the target language example 1807-E "You're welcome." is associated with the annotation 1807-1 "Welcome to you.", and the target language example 1808-E "You're welcome." is associated with the annotation 1808-1 "Don't mention it.". As with annotated source language examples, the source language example stored for such an annotated target language example is a source language expression corresponding to the annotation. For example, the source language example 1807-J 「いらっしゃいませ」 ("Welcome"), which is the source language translation of the annotation 1807-1 "Welcome to you.", is stored in association with the target language example 1807-E "You're welcome." and the annotation 1807-1 "Welcome to you.".

Similarly, the source language example 1808-J 「とんでもありません」 ("Not at all"), which is the source language translation of the annotation 1808-1 "Don't mention it.", is stored in association with the target language example 1808-E "You're welcome." and the annotation 1808-1 "Don't mention it.". Thus, even for identical source language examples, when annotations exist, the translation corresponding to each annotation is stored as the associated target language example. Conversely, even for identical target language examples, when annotations exist, the translation corresponding to each annotation is stored as the associated source language example.
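The annotated table of FIG. 18 can be pictured as a list of records, one per meaning. The following is a minimal sketch under stated assumptions: the `ExampleEntry` type, its field names, and the helper functions are hypothetical, and only the four rows quoted in the description are included.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExampleEntry:
    """Hypothetical row of the example table: one row per intended meaning."""
    source: str                   # source language example
    source_note: Optional[str]    # annotation on the source side, if any
    target: str                   # target language example
    target_note: Optional[str]    # annotation on the target side, if any


TABLE: List[ExampleEntry] = [
    ExampleEntry("結構です", "大丈夫です", "That's good.", None),
    ExampleEntry("結構です", "不要です", "No thank you.", None),
    ExampleEntry("いらっしゃいませ", None, "You're welcome.", "Welcome to you."),
    ExampleEntry("とんでもありません", None, "You're welcome.", "Don't mention it."),
]


def lookup_by_source(text: str) -> List[ExampleEntry]:
    """All rows whose source language example equals the given text."""
    return [entry for entry in TABLE if entry.source == text]


def lookup_by_target(text: str) -> List[ExampleEntry]:
    """All rows whose target language example equals the given text."""
    return [entry for entry in TABLE if entry.target == text]


def label(example: str, note: Optional[str]) -> str:
    """Display form: the example followed by its annotation in parentheses."""
    return f"{example} ({note})" if note else example
```

Looking up 「結構です」 returns two rows whose target language examples differ according to the annotation, which is the disambiguation the table is meant to provide.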

Next, a specific example of the operation of the speech translation apparatus according to the second embodiment will be described with reference to FIG. 19.
FIG. 19 is similar to the example shown in FIG. 11, except that annotations are displayed together with the similar examples when the list of examples is displayed. Specifically, 「結構です（大丈夫です）」 and 「結構です（不要です）」 are listed as similar examples. The symbol 1901 used when a similar example has an annotation is desirably distinguishable from the symbol used when it has none. For example, the symbol may be drawn as an outline when there is no annotation and filled in when there is one. This lets the user recognize that the sentence is ambiguous and that annotations exist.

In the example of FIG. 19, two similar examples are presented, 1902-J 「結構です［大丈夫です］」 and 1903-J 「結構です［不要です］」, whereas three parallel translation examples are presented, 1902-E1 "That's fine.", 1902-E2 "All right.", and 1903-E "No thank you.". This is because, when the similar examples corresponding to the parallel translation examples are listed, a similar example and annotation that would appear more than once need to be displayed only once.
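The two-versus-three display in FIG. 19 amounts to removing duplicate (similar example, annotation) pairs while keeping every parallel translation example. A minimal sketch, assuming each stored row is a hypothetical `(source, annotation, target)` tuple:

```python
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[str], str]  # (similar example, annotation, parallel example)


def display_lists(rows: List[Row]) -> Tuple[List[Tuple[str, Optional[str]]], List[str]]:
    """Return the source-side list (duplicates collapsed, order preserved)
    and the target-side list (every parallel translation example kept)."""
    seen = set()
    similar = []
    for source, note, _target in rows:
        key = (source, note)
        if key not in seen:
            seen.add(key)
            similar.append(key)
    parallel = [target for _source, _note, target in rows]
    return similar, parallel


rows = [
    ("結構です", "大丈夫です", "That's fine."),
    ("結構です", "大丈夫です", "All right."),
    ("結構です", "不要です", "No thank you."),
]
similar, parallel = display_lists(rows)
```

With the FIG. 19 rows, `similar` holds two entries and `parallel` holds three, matching the counts described above.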

According to the second embodiment described above, when annotations are associated with examples, the examples are displayed together with their annotations, so that both the user on the target language side and the user on the source language side can refer to the annotations and select the example with the appropriate meaning from among ambiguous examples.

(Third Embodiment)
The first and second embodiments described above assume a configuration within a single device, but the processing may be distributed over a plurality of devices. The third embodiment assumes that the processing is divided between a server and a client.
In general, when speech translation is performed on a client device with limited computing and storage resources, such as a mobile phone or a tablet PC, constraints arise on the amount of data and on the freedom of the search space. The client's processing load can therefore be reduced by running the heavy processing of speech recognition, machine translation, and example search on a server, whose computing and storage resources are easy to expand.

A speech recognition system including the speech translation apparatus according to the third embodiment will now be described with reference to the block diagram of FIG. 20.
The speech recognition system shown in FIG. 20 includes a server 2000 and a client 2500.

The server 2000 includes a speech recognition unit 2001, a machine translation unit 2002, an example search unit 2003, an example storage unit 2004, a server communication unit 2005, and a server control unit 2006.
The speech recognition unit 2001, the machine translation unit 2002, the example search unit 2003, and the example storage unit 2004 operate in the same way as the speech recognition unit 102, the machine translation unit 103, the example search unit 106, and the example storage unit 105 according to the first embodiment, so their description is omitted here.
The server communication unit 2005 exchanges data with a client communication unit 2506 described later.
The server control unit 2006 controls the operation of the entire server.

The client 2500 includes a speech acquisition unit 2501, a display unit 2502, a pointing instruction detection unit 2503, a character string selection unit 2504, an example presentation unit 2505, a client communication unit 2506, and a client control unit 2507.
The speech acquisition unit 2501, the display unit 2502, the pointing instruction detection unit 2503, the character string selection unit 2504, and the example presentation unit 2505 perform the same processing as the speech acquisition unit 101, the display unit 104, the pointing instruction detection unit 107, the character string selection unit 108, and the example presentation unit 109 according to the first embodiment, so their description is omitted here.
The client communication unit 2506 exchanges data with the server communication unit 2005.
The client control unit 2507 performs overall control of the client 2500.

Next, an example of speech translation processing by the server 2000 and the client 2500 will be described.
In the client 2500, the speech acquisition unit 2501 acquires speech from the user, and the client communication unit 2506 transmits the speech signal to the server 2000.
In the server 2000, the server communication unit 2005 receives the speech signal from the client 2500, and the speech recognition unit 2001 performs speech recognition on the speech signal. The machine translation unit 2002 then performs machine translation on the speech recognition result. The server communication unit 2005 transmits the speech recognition result and the machine translation result to the client 2500. In addition, the example search unit 2003 searches for similar examples resembling the speech recognition result, and when similar examples exist, the similar examples and the corresponding parallel translation examples are transmitted to the client 2500.

In the client 2500, the client communication unit 2506 receives the speech recognition result and the machine translation result together with the corresponding similar examples and parallel translation examples, and the display unit 2502 displays the speech recognition result and the machine translation result. When the pointing instruction detection unit 2503 detects an instruction from the user, the example presentation unit 2505 presents the parallel translation examples and similar examples related to the selected character string.

When similar examples exist for the speech recognition result, the client 2500 may be set to receive only a given number of extracted similar examples and their corresponding parallel translation examples rather than all of them. In this case, the client 2500 transmits a request to the server 2000 in order to receive the other similar examples or corresponding parallel translation examples that it has not yet received. The example search unit 2003 of the server 2000 extracts the similar examples and corresponding parallel translation examples that have not yet been extracted, and the server communication unit 2005 transmits them. In the client 2500, the client communication unit 2506 may receive these similar examples and parallel translation examples, and the new similar examples and parallel translation examples may be displayed.

Alternatively, the server 2000 may transmit to the client 2500 only a flag indicating that similar examples exist. When the user performs a pointing operation, the client 2500 transmits to the server 2000 a request for the similar examples and parallel translation examples related to the selected character string, and the server 2000 transmits the similar examples and parallel translation examples to the client 2500 in response. Because the example search is then performed only when needed, the speech translation processing on the client can run faster.
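The two variations above (receiving a limited number of examples and requesting the rest, or receiving only a flag and fetching examples on pointing) can be sketched together as an in-process exchange. Everything here is illustrative: the `Server` and `Client` classes, the message dictionary keys, and the `offset`/`limit` parameters are hypothetical, the network layer is replaced by direct method calls, and examples are keyed by the exact recognized string for simplicity.

```python
from typing import Dict, List, Tuple

ExamplePair = Tuple[str, str]  # (similar example, parallel translation example)


class Server:
    """Stand-in for the server 2000; recognition and translation are omitted."""

    def __init__(self, examples: Dict[str, List[ExamplePair]]):
        self._examples = examples

    def result_message(self, recognized: str, translated: str) -> dict:
        # Only a flag is sent with the result, not the examples themselves.
        return {"source": recognized, "target": translated,
                "has_examples": bool(self._examples.get(recognized))}

    def fetch_examples(self, recognized: str, offset: int = 0,
                       limit: int = 2) -> List[ExamplePair]:
        # Return at most `limit` examples the client has not yet received.
        return self._examples.get(recognized, [])[offset:offset + limit]


class Client:
    """Stand-in for the client 2500; display is reduced to a returned list."""

    def __init__(self, server: Server):
        self.server = server
        self.cache: Dict[str, List[ExamplePair]] = {}

    def on_pointing(self, message: dict) -> List[ExamplePair]:
        if not message["has_examples"]:
            return []
        key = message["source"]
        received = self.cache.setdefault(key, [])
        # Ask only for examples beyond those already held.
        received.extend(self.server.fetch_examples(key, offset=len(received)))
        return received


server = Server({"あまり歩きたくない": [
    ("あまり歩けない", "I can't walk so long distance."),
    ("私はあまり歩きたくない", "I don't want to walk."),
    ("明日は歩きたい", "Tomorrow, I'd like to walk."),
]})
client = Client(server)
message = server.result_message("あまり歩きたくない", "Amari doesn't want to walk.")
```

In this sketch the first pointing operation fetches two example pairs and a second one fetches the remaining pair, so no example search result crosses the connection until the user asks for it.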

According to the third embodiment described above, the client's processing load can be reduced by running the heavy processing of speech recognition, machine translation, and example search on a server whose computing and storage resources are easy to expand.

The instructions given in the processing procedures of the above embodiments can be executed on the basis of a software program. A general-purpose computer system that stores this program in advance and reads it can obtain the same effects as the speech translation apparatus described above. The instructions described in the above embodiments are recorded, as a program executable by a computer, on a magnetic disk (flexible disk, hard disk, etc.), an optical disc (CD-ROM, CD-R, CD-RW, DVD-ROM, DVD±R, DVD±RW, Blu-ray (registered trademark) Disc, etc.), a semiconductor memory, or a similar recording medium. The storage format may take any form as long as the recording medium is readable by a computer or an embedded system. When the computer reads the program from the recording medium and causes the CPU to execute the instructions described in the program, the same operation as the speech translation apparatus of the above embodiments can be realized. The computer may, of course, acquire or read the program over a network.
In addition, an OS (operating system) running on the computer, database management software, or MW (middleware) such as network software may execute part of each process for realizing the present embodiment, based on the instructions of the program installed from the recording medium onto the computer or embedded system.
Furthermore, the recording medium in the present embodiment is not limited to a medium independent of the computer or embedded system, and also includes a recording medium that stores, or temporarily stores, a program downloaded after transmission over a LAN, the Internet, or the like.
The number of recording media is not limited to one; the case where the processing of the present embodiment is executed from a plurality of media is also covered by the recording medium of the present embodiment, and the media may have any configuration.

The computer or embedded system in the present embodiment executes each process of the present embodiment based on the program stored in the recording medium, and may have any configuration, such as a single apparatus (a personal computer, a microcomputer, or the like) or a system in which a plurality of apparatuses are connected over a network.
The computer in the present embodiment is not limited to a personal computer; it also includes an arithmetic processing unit, a microcomputer, or the like contained in information processing equipment, and is a generic term for equipment and apparatuses capable of realizing the functions of the present embodiment by means of a program.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

100, 600: speech translation apparatus; 101, 2501: speech acquisition unit; 102, 2001: speech recognition unit; 103, 2002: machine translation unit; 104, 2502: display unit; 105, 2004: example storage unit; 106, 2003: example search unit; 107, 2503: pointing instruction detection unit; 108, 2504: character string selection unit; 109, 2505: example presentation unit; 201, 1801, 1805, 1806, 1807, 1808: source language example; 202, 1803, 1805, 1806, 1807, 1808: target language example; 601: housing; 602: touch panel display; 603: microphone; 701, 702: display area; 703: utterance start button; 704: button; 705: delete button; 706: end button; 801, 901: uttered speech; 802-E: speech recognition result; 802-J: machine translation result; 902-J, 903-J, 904-J, 1401-J: source language character string; 902-E, 903-E, 904-E, 1401-E: target language character string; 905, 1901: symbol; 1001: cursor; 1002, 1202, 1502, 1701: confirmation message; 1101-J, 1102-J, 1103-J, 1201-J, 1501-J, 1902-J, 1903-J: similar example; 1101-E, 1102-E, 1103-E, 1201-E, 1501-E, 1902-E1, 1902-E2, 1903-E: parallel translation example; 1104: scroll bar; 1802, 1804, 1805: annotation; 2000: server; 2005: server communication unit; 2006: server control unit; 2500: client; 2506: client communication unit; 2507: client control unit.

Claims

1. A speech translation apparatus, comprising:
an acquisition unit that acquires speech uttered in a first language as a speech signal;
a speech recognition unit that sequentially performs speech recognition on the speech signal and obtains a first language character string, which is a character string of a speech recognition result;
a translation unit that translates the first language character string into a second language different from the first language and obtains a second language character string, which is a character string of a translation result;
a search unit that searches, for each first language character string, for a similar example, which is an example in the first language similar to the first language character string, and that, when the similar example exists, obtains the similar example and a parallel translation example, which is a result of translating the similar example into the second language;
a selection unit that selects, as a selected character string, at least one of a first language character string for which the similar example exists and a second language character string for which the parallel translation example exists; and
an example presentation unit that presents one or more similar examples and parallel translation examples related to the selected character string.
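
The search step recited above can be illustrated with a minimal sketch. The claims do not specify how similarity is measured, so the character-level ratio from Python's `difflib`, the 0.6 threshold, the `Example` record, and the sample store are all assumptions made purely for illustration; English stands in for the first language.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass(frozen=True)
class Example:
    source: str  # example sentence in the first language
    target: str  # its translation into the second language


def find_similar_examples(text: str, store: List[Example],
                          threshold: float = 0.6) -> List[Example]:
    """Return stored examples whose source side resembles `text`, best first."""
    scored = []
    for ex in store:
        # Assumed similarity measure: difflib's character-level ratio in [0, 1].
        score = SequenceMatcher(None, text.lower(), ex.source.lower()).ratio()
        if score >= threshold:
            scored.append((score, ex))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in scored]


# Hypothetical example store (not taken from the patent).
STORE = [
    Example("Where is the station?", "駅はどこですか？"),
    Example("Where is the restroom?", "トイレはどこですか？"),
    Example("How much is this?", "これはいくらですか？"),
]

recognized = "where is the station"  # a speech recognition result
candidates = find_similar_examples(recognized, STORE)
for ex in candidates:
    print(ex.source, "|", ex.target)
```

Each returned record carries both the similar example and its parallel translation example, so a presentation step could show either side depending on which character string was selected.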

2. The speech translation apparatus according to claim 1, further comprising a display unit that displays the first language character string and the similar example, and the second language character string and the parallel translation example,
wherein, when a similar example exists for the first language character string, the example presentation unit causes the display unit to display a first symbol, indicating that the example exists, in association with the first language character string and the corresponding second language character string.

3. The speech translation apparatus according to claim 1, wherein the example presentation unit presents a list of a plurality of similar examples and a plurality of parallel translation examples when the selected character string is selected.

4. The speech translation apparatus according to any one of claims 1 to 3, wherein, when either the similar example or the parallel translation example is selected, the example presentation unit highlights both the similar example and the parallel translation example, and presents a first notification prompting a determination as to whether the highlighted similar example or the highlighted parallel translation example is appropriate.

5. The speech translation apparatus according to claim 1, further comprising a storage unit that stores the similar example and the parallel translation example corresponding to the similar example in association with each other.

6. The speech translation apparatus according to claim 5, wherein the storage unit stores the similar example, the parallel translation example, and an annotation explaining the intention of at least one of the similar example and the parallel translation example, and
the display unit displays both the similar example and the annotation when the annotation relates to the similar example, and displays both the parallel translation example and the annotation when the annotation relates to the parallel translation example.

7. The speech translation apparatus according to claim 5, wherein the storage unit stores the similar example, the parallel translation example, and an annotation explaining the intention of at least one of the similar example and the parallel translation example, and
when a similar example exists for the first language character string and the annotation is associated with that similar example, the example presentation unit causes the display unit to display a second symbol, indicating that the annotation exists, in association with the first language character string and the corresponding second language character string.
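
The symbol and annotation behavior recited in the preceding dependent claims can be sketched as follows. This is only an illustration under stated assumptions: the `StoredExample` record, the `*` and `+` marks standing in for the first and second symbols, the `annotates` field, and the sample data are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

EXAMPLE_MARK = "*"     # stands in for the first symbol (an example exists)
ANNOTATION_MARK = "+"  # stands in for the second symbol (an annotation exists)


@dataclass(frozen=True)
class StoredExample:
    similar: str                      # example in the first language
    parallel: str                     # its translation into the second language
    annotation: Optional[str] = None  # note explaining the intention
    annotates: Optional[str] = None   # "similar" or "parallel"


def decorate(text: str, example: Optional[StoredExample]) -> str:
    """Prefix a displayed character string with the marks that apply to it."""
    if example is None:
        return text  # no similar example: no marks
    marks = EXAMPLE_MARK
    if example.annotation is not None:
        marks += ANNOTATION_MARK
    return f"{marks} {text}"


def render(example: StoredExample) -> Tuple[str, str]:
    """Return (first-language line, second-language line) with the
    annotation attached to the side it relates to."""
    similar, parallel = example.similar, example.parallel
    if example.annotation and example.annotates == "similar":
        similar = f"{similar} ({example.annotation})"
    elif example.annotation and example.annotates == "parallel":
        parallel = f"{parallel} ({example.annotation})"
    return similar, parallel


# Hypothetical stored record.
record = StoredExample("Where is the restroom?", "トイレはどこですか？",
                       annotation="polite form", annotates="parallel")
print(decorate("Where is the restroom", record))
print(render(record))
```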

8. The speech translation apparatus according to any one of claims ... to 7, wherein, when the second language character string is selected, the example presentation unit causes the display unit to display a second notification, in the first language, prompting confirmation.

9. A speech translation apparatus, comprising:
an acquisition unit that acquires speech uttered in a first language as a speech signal;
a display unit that displays a first language character string, which is a character string of a speech recognition result obtained by sequentially performing speech recognition on the speech signal, and a second language character string, which is a character string of a translation result obtained by translating the first language character string into a second language different from the first language;
a detection unit that detects a position on the display unit indicated by a user;
a selection unit that selects, based on the position, at least one of the first language character string and the second language character string as a selected character string; and
an example presentation unit that presents, for the selected character string, one or more similar examples, which are examples in the first language similar to the first language character string, and one or more parallel translation examples, which are results of translating the similar examples into the second language,
wherein the display unit further displays the presented similar examples and parallel translation examples.
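
The position-based selection recited above amounts to a hit test against the displayed character strings. The sketch below assumes each string is drawn in an axis-aligned rectangle with pixel coordinates; the `DisplayedString` record, the `select_at` function, and the layout values are hypothetical and shown only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DisplayedString:
    text: str
    language: str  # "first" or "second"
    x: int         # left edge of the drawn string, in pixels
    y: int         # top edge
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        """True if the point lies inside this string's rectangle."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def select_at(position: Tuple[int, int],
              displayed: List[DisplayedString]) -> Optional[DisplayedString]:
    """Return the displayed string under the indicated position, if any."""
    px, py = position
    for item in displayed:
        if item.contains(px, py):
            return item
    return None


# Hypothetical layout: one display area per language.
SCREEN = [
    DisplayedString("Where is the station", "first", x=10, y=10, width=300, height=24),
    DisplayedString("駅はどこですか", "second", x=10, y=210, width=300, height=24),
]

touched = select_at((40, 220), SCREEN)  # a touch inside the second-language row
```

The selected record could then be handed to the example presentation step, with its `language` field indicating which side the user pointed at.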

10. A speech translation method, comprising:
acquiring speech uttered in a first language as a speech signal;
sequentially performing speech recognition on the speech signal to obtain a first language character string, which is a character string of a speech recognition result;
translating the first language character string into a second language different from the first language to obtain a second language character string, which is a character string of a translation result;
searching, for each first language character string, for a similar example, which is an example in the first language similar to the first language character string, and, when the similar example exists, obtaining the similar example and a parallel translation example, which is a result of translating the similar example into the second language;
selecting, according to a user instruction, at least one of a first language character string for which the similar example exists and a second language character string for which the parallel translation example exists as a selected character string; and
presenting one or more similar examples and parallel translation examples related to the selected character string.

11. A speech translation program for causing a computer to function as:
acquisition means for acquiring speech uttered in a first language as a speech signal;
speech recognition means for sequentially performing speech recognition on the speech signal and obtaining a first language character string, which is a character string of a speech recognition result;
translation means for translating the first language character string into a second language different from the first language and obtaining a second language character string, which is a character string of a translation result;
search means for searching, for each first language character string, for a similar example, which is an example in the first language similar to the first language character string, and for obtaining, when the similar example exists, the similar example and a parallel translation example, which is a result of translating the similar example into the second language;
selection means for selecting, according to a user instruction, at least one of a first language character string for which the similar example exists and a second language character string for which the parallel translation example exists as a selected character string; and
example presentation means for presenting one or more similar examples and parallel translation examples related to the selected character string.