JP6334589B2 - Fixed phrase creation device and program, and conversation support device and program

Info

Publication number: JP6334589B2
Application number: JP2016068614A
Authority: JP
Inventors: 千春宇賀神
Original assignee: RECRUIT LIFESTYLE CO., LTD.
Current assignee: RECRUIT LIFESTYLE CO., LTD.
Priority date: 2016-03-30
Filing date: 2016-03-30
Publication date: 2018-05-30
Anticipated expiration: 2036-03-30
Also published as: JP2017182452A

Description

The present invention relates to a fixed phrase creation device and program, and to a conversation support device and program.

When traveling abroad, for example, conversation with local people is a source of anxiety. To support spoken communication between people who speak different languages, a paper medium called a "pointing phrase book" (指さし会話帳, registered trademark) and application software based on it (hereinafter simply "app") are known (see, for example, Non-Patent Document 1).

This app provides a fixed phrase collection in which, for each situation a speaker may encounter, several commonly used basic conversation phrases are written side by side in a first language (for example, Japanese) and a second language (for example, English). From the fixed phrases displayed on the screen, the speaker finds, in the first language, the conversation phrase that fits the situation encountered or the question to be asked, and conveys the intended meaning to the other party either by reading the accompanying second-language text aloud or by tapping the text to play back native-speaker audio in the second language.

Travel pointing phrase book app "YUBISASHI", available for 22 or more countries (free) [retrieved March 28, 2016], Internet <URL: http://www.yubisashi.com/あなたと一緒に旅するアプリ>

However, with the conventional app described above, it can take time to find the desired conversation phrase among the many fixed phrases, and this hinders smooth conversation. Moreover, the desired phrase may not be included in the fixed phrase collection at all. For example, when a store employee serves a foreign customer, there may be conversation phrases specific to that store that are spoken frequently (for example, an explanation of how the store operates, or recommended items), but a ready-made fixed phrase collection rarely contains such store-specific phrases.

The present invention has been made in view of these circumstances. Its object is to provide a fixed phrase creation device, a conversation support device, and a conversation support program that support communication through smooth conversation between speakers by creating and registering original fixed phrases that a speaker uses frequently.

To solve the above problem, a fixed phrase creation device according to one aspect of the present invention comprises: an input unit for inputting speech in a speaker's first language; a translation unit that translates the content of the input speech into content in a second language different from the first language; a back-translation unit that, based on an instruction from the speaker, translates the second-language translation result back into content in the first language; a bilingual display unit that displays the content of the input speech and the second-language translation result; and a registration unit that, based on an instruction from the speaker, registers the pair of the content of the input speech and the second-language translation result as a fixed phrase. More specifically, the display unit may be configured to display a button for inputting the speaker's instruction. Note that a "phrase" includes sentences, clauses, phrases, words, and numbers, and may be accompanied by images or symbols.
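The relationship between the translation, back-translation, and registration units can be illustrated with a minimal Python sketch. It is not the patented implementation: the class and method names (`FixedPhrase`, `FixedPhraseCreator`, `submit`, `back_translate`, `register`) are hypothetical, speech recognition is assumed to have already produced text, and the translators are toy dictionary lookups supplied only to exercise the flow.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class FixedPhrase:
    """One registered pair: input content and its translation."""
    source_text: str  # content of the input speech (first language)
    target_text: str  # translation result (second language)


class FixedPhraseCreator:
    """Hypothetical stand-in for the translation, back-translation, and registration units."""

    def __init__(self, translate: Callable[[str], str],
                 back_translate: Callable[[str], str]) -> None:
        self._translate = translate
        self._back_translate = back_translate
        self._pending: Optional[FixedPhrase] = None
        self.registered: List[FixedPhrase] = []

    def submit(self, recognized_text: str) -> FixedPhrase:
        """Translate the recognized input and hold the pair for review."""
        self._pending = FixedPhrase(recognized_text, self._translate(recognized_text))
        return self._pending

    def back_translate(self) -> str:
        """On the speaker's instruction, translate the result back to the first language."""
        if self._pending is None:
            raise RuntimeError("nothing has been translated yet")
        return self._back_translate(self._pending.target_text)

    def register(self) -> FixedPhrase:
        """On the speaker's instruction, register the pending pair as a fixed phrase."""
        if self._pending is None:
            raise RuntimeError("nothing to register")
        phrase, self._pending = self._pending, None
        self.registered.append(phrase)
        return phrase


# Toy dictionary-backed translators (assumptions, not a real translation engine).
JA_TO_EN = {"満席です": "We are full."}
EN_TO_JA = {english: japanese for japanese, english in JA_TO_EN.items()}

creator = FixedPhraseCreator(JA_TO_EN.__getitem__, EN_TO_JA.__getitem__)
pair = creator.submit("満席です")
if creator.back_translate() == pair.source_text:  # speaker judges the result acceptable
    creator.register()
```

In the sketch the acceptance check is an equality test; in the device described here, the judgment is made by the speaker after reading the back-translation.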

It is also preferable that the device further comprise a template display unit that displays a template of a fixed phrase, and that the translation unit, based on an instruction from the speaker, use the template to translate the content of the input speech into content in the second language different from the first language.

Furthermore, a conversation support device according to one aspect of the present invention comprises the fixed phrase creation device according to the present invention, a storage unit that holds a fixed phrase collection containing a plurality of fixed phrases, a phrase display unit that displays the fixed phrases, and an output unit that outputs, as speech, the second-language translation result of a fixed phrase.
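As a rough illustration of how the storage unit, phrase display unit, and output unit fit together, the following Python sketch keeps phrase pairs in memory, renders them as numbered lines, and passes the second-language text of a chosen phrase to a synthesizer. The names `PhraseBook`, `display_lines`, `speak`, and `fake_synthesize` are hypothetical, and the synthesizer is a placeholder that returns UTF-8 bytes rather than audio.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PhraseBook:
    """Hypothetical storage unit holding (first-language, second-language) pairs."""
    phrases: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, source_text: str, target_text: str) -> None:
        self.phrases.append((source_text, target_text))

    def display_lines(self) -> List[str]:
        """Phrase display: one numbered line per phrase, both languages side by side."""
        return [f"{i + 1}. {src} / {dst}" for i, (src, dst) in enumerate(self.phrases)]

    def speak(self, index: int, synthesize: Callable[[str], bytes]) -> bytes:
        """Output unit: synthesize the second-language text of the chosen phrase."""
        _, target_text = self.phrases[index]
        return synthesize(target_text)


def fake_synthesize(text: str) -> bytes:
    """Placeholder for a speech synthesizer; returns UTF-8 bytes instead of audio."""
    return text.encode("utf-8")


book = PhraseBook()
book.add("少々お待ちください", "Please wait a moment.")
audio = book.speak(0, fake_synthesize)
```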

A fixed phrase creation program according to one aspect of the present invention causes a computer (not limited to a single computer or a single type, and possibly a plurality of computers or of types; the same applies hereinafter) to function as: an input unit for inputting speech in a speaker's first language; a translation unit that translates the content of the input speech into content in a second language different from the first language; a back-translation unit that, based on an instruction from the speaker, translates the second-language translation result back into content in the first language; a bilingual display unit that displays the content of the input speech and the second-language translation result; and a registration unit that, based on an instruction from the speaker, registers the pair of the content of the input speech and the second-language translation result as a fixed phrase.

A conversation support program according to one aspect of the present invention causes a computer (not limited to a single computer or a single type, and possibly a plurality of computers or of types; the same applies hereinafter) to function as: an input unit for inputting speech in a speaker's first language; a translation unit that translates the content of the input speech into content in a second language different from the first language; a back-translation unit that, based on an instruction from the speaker, translates the second-language translation result back into content in the first language; a bilingual display unit that displays the content of the input speech and the second-language translation result; a registration unit that, based on an instruction from the speaker, registers the pair of the content of the input speech and the second-language translation result as a fixed phrase; a storage unit that holds a fixed phrase collection containing a plurality of fixed phrases; a phrase display unit that displays the fixed phrases; and an output unit that outputs, as speech, the second-language translation result of a fixed phrase.

According to the present invention, the content of a conversation phrase spoken in the first language is translated into the second language by speech translation processing and, as needed, translated back into the first language and displayed, so that the speaker can judge whether the translation into the second language is appropriate. If the translation is appropriate, the pair consisting of the content of the first-language input speech and the second-language translation result can be created as one fixed phrase and registered in the fixed phrase collection in advance of the actual conversation. The speaker can therefore find and use the desired conversation phrase in the fixed phrase collection accurately and easily during a conversation, which supports communication through smooth conversation between speakers.

FIG. 1 is a system block diagram schematically showing a preferred embodiment of the network configuration and related elements of a conversation support device that includes the fixed phrase creation device according to the present invention.
FIG. 2 is a flowchart showing an example of (part of) the processing flow in a preferred embodiment of the conversation support device that includes the fixed phrase creation device according to the present invention.
FIGS. 3(A) to 3(D) are plan views showing an example of display screen transitions on the information terminal.
FIGS. 4(A) and 4(D) are plan views showing an example of display screen transitions on the information terminal.

Embodiments of the present invention are described in detail below. The following embodiments are examples for explaining the present invention and are not intended to limit the present invention to those embodiments. The present invention can be modified in various ways without departing from its gist. Those skilled in the art can also adopt embodiments in which the elements described below are replaced with equivalents, and such embodiments are likewise included in the scope of the present invention. Positional relationships such as up, down, left, and right are, unless otherwise noted, based on the drawings. The dimensional ratios in the drawings are not limited to those illustrated.

(Device configuration)
FIG. 1 is a system block diagram schematically showing a preferred embodiment of the network configuration and related elements of a conversation support device that includes the fixed phrase creation device according to the present invention. In this example, the conversation support device 100 includes a server 20 that is electronically connected via a network N to an information terminal 10 used by a speaker (although the configuration is not limited to this).

The information terminal 10 employs, for example, a user interface such as a touch panel and a highly visible display. The information terminal 10 here is a portable tablet-type terminal device, including mobile phones typified by smartphones, that has a function for communicating with the network N. The information terminal 10 further includes a processor 11, a storage resource 12, an audio input/output device 13, a communication interface 14, an input device 15, a display device 16, and a camera 17. When an installed conversation support app (at least part of the conversation support program according to an embodiment of the present invention) runs on it, the information terminal 10 functions as part or all of the conversation support device that includes the fixed phrase creation device according to an embodiment of the present invention.

The processor 11 consists of an arithmetic logic unit and various registers (program counter, data register, instruction register, general-purpose registers, and so on). The processor 11 interprets and executes the conversation support app, which is one of the programs P10 stored in the storage resource 12, and performs various kinds of processing. The conversation support app as a program P10 can be distributed, for example, from the server 20 over the network N, and may be installed and updated manually or automatically.

The network N is, for example, a communication network made up of a mixture of wired networks (such as a local area network (LAN), a wide area network (WAN), or a value-added network (VAN)) and wireless networks (such as a mobile communication network, a satellite communication network, Bluetooth (registered trademark), WiFi (Wireless Fidelity), or HSDPA (High Speed Downlink Packet Access)).

The storage resource 12 is a logical device provided by the storage area of a physical device (for example, a computer-readable recording medium such as semiconductor memory), and stores the operating system (OS) program, driver programs, various data, and other items used in the processing of the information terminal 10. Examples of the driver programs include an input/output device driver program for controlling the audio input/output device 13, an input device driver program for controlling the input device 15, and a display device driver program for controlling the display device 16. The audio input/output device 13 is, for example, a general-purpose microphone and a sound player capable of playing back sound data.

The communication interface 14 provides, for example, a connection interface to the server 20, and consists of a wireless communication interface and/or a wired communication interface. The input device 15 provides an interface that accepts input operations made by tapping, for example, icons, buttons, a virtual keyboard, or text displayed on the display device 16; examples include a touch panel as well as various input devices externally attached to the information terminal 10.

The display device 16 presents various information to the speaker as an image display interface; examples include an organic EL display, a liquid crystal display, and a CRT display. The camera 17 captures still images and video of various subjects.

The server 20 is, for example, a host computer with high processing power that provides server functions by running a predetermined server program. It consists of, for example, one or more host computers functioning as the speech recognition server, translation server, and speech synthesis server needed for speech translation processing (a single server is shown in the drawing, but the configuration is not limited to this). Each server 20 includes a processor 21, a communication interface 22, and a storage resource 23.

The processor 21 consists of an arithmetic logic unit that handles arithmetic operations, logical operations, bit operations, and so on, and various registers (program counter, data register, instruction register, general-purpose registers, and so on). It interprets and executes the program P20 stored in the storage resource 23 and outputs predetermined processing results. The communication interface 22 is a hardware module for connecting to the information terminal 10 via the network N, for example a modulator-demodulator such as an ISDN modem, ADSL modem, cable modem, optical modem, or soft modem.

The storage resource 23 is, for example, a logical device provided by the storage area of a physical device (such as a computer-readable recording medium, for example a disk drive or semiconductor memory), and stores one or more programs P20, various modules L20, various databases D20, and various models M20. The storage resource 23 also stores a plurality of fixed phrases prepared in advance, history data of input speech, data for various settings, and so on.

The program P20 is, for example, the server program mentioned above, which is the main program of the server 20. The modules L20 are software modules (modularized subprograms) that are called and executed as needed during operation of the program P10 in order to carry out the series of information processing relating to requests and information sent from the information terminal 10. Examples of the modules L20 include a speech recognition module, a translation module, and a speech synthesis module.

Examples of the databases D20 include: databases of the various corpora needed for speech translation processing (for example, speech corpora, character (vocabulary) corpora, dictionaries, bilingual dictionaries, and bilingual corpora for multiple languages including at least two languages, the first and the second); a speech database; a management database for managing information about speakers (such as users of the conversation support app); and a text and speech database serving as the fixed phrase collection containing a plurality of fixed phrases. Examples of the models M20 include the acoustic models and language models used for speech recognition. The storage resource 23 thus functions as the "storage unit".

(First embodiment)
An example of the processing operations and behavior of the conversation support device 100 configured as above is described below. FIG. 2 is a flowchart showing an example of (part of) the processing flow in the conversation support device 100. FIGS. 3(A) to 3(D) are plan views showing an example of display screen transitions on the information terminal. Here we assume a scene in which a store clerk (a store employee) who speaks Japanese (the first language) creates, as fixed phrases, the conversation phrases to be used when serving a foreign customer who speaks English (the second language); the languages and situation are not limited to these.

First, when the clerk launches the conversation support app (step SU1), the processor 21 of the server 20 and the processor 11 of the information terminal 10 display, on the display device 16 of the information terminal 10, a language selection screen for selecting the customer's language (FIG. 3(A); step SJ1). This screen shows Japanese text T1 prompting the clerk to ask the customer which language the customer speaks, English text T2 asking the customer that question, and language buttons 31 for several representative languages that are anticipated (here English, Chinese (for example, two variants differing in script), and Korean). Below these, a cancel button B1 for closing the language selection screen and exiting the conversation support app is also displayed. This screen is designed for the situation in which the clerk and the customer actually converse by speech translation.

Here, because the fixed phrases are created before any actual conversation, the clerk selects English as the customer's language. Once the customer's language is selected, the processor 21 of the server 20 and the processor 11 of the information terminal 10 display on the display device 16, as a home screen, a voice input standby screen for Japanese and English (FIG. 3(B); step SJ2). This screen shows Japanese text T3 asking which language, the clerk's or the customer's, will be spoken, together with an input button 32a for Japanese voice input and an input button 32b for English voice input.

The voice input standby screen also shows: a "speak to customer" (お声がけ) button 33 for choosing to display a list of preset fixed questions; a language selection button 34 for manually selecting the other party's language; a history button 35 for choosing to display the history of voice input made so far; a suggest button 36 for running a suggest function, which displays a fixed phrase collection containing a plurality of fixed phrases prepared in advance and lets the conversation proceed by selecting the desired fixed phrase from it; and a settings button 37 for configuring the conversation support app.

Next, when the clerk taps the Japanese input button 32a on the voice input standby screen of FIG. 3(B) to select Japanese voice input, a voice input screen that accepts the clerk's utterance in Japanese appears (FIG. 3(C)). While this screen is displayed, voice input from the audio input/output device 13 is enabled. The screen shows text T2 prompting the clerk to speak, a microphone graphic 38 indicating that voice input is active, and an input switching button B2 for switching to text input. The cancel button B1 is also displayed on this screen; tapping it either exits the conversation support app or returns to the voice input standby screen (FIG. 3(B)) so that voice input can be redone.

In this state, when the clerk utters a frequently used conversation phrase (for example, a phrase such as "We are very sorry. We are full, so please wait here for a moment.") (step SU2), a multiple-circle graphic 39 that schematically and dynamically represents the loudness of the voice is displayed together with the text T2, giving the speaker visual feedback on the voice input level. When the utterance ends and the microphone graphic 38 is tapped, the processor 11 stops accepting utterance content. The processor 11 of the information terminal 10 generates a speech signal from the voice input and sends it to the server 20 through the communication interface 14 and the network N. Thus the information terminal 10 itself, or the processor 11 and the audio input/output device 13, functions as the "input unit".
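The text does not say how the loudness shown by the multiple-circle graphic 39 is computed. One common approach, given here only as an assumption, is to take the root-mean-square amplitude of each audio frame and map it to a number of rings. The function names `rms_level` and `ring_count`, the normalized sample range, and the default of five rings are all hypothetical.

```python
import math
from typing import Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio frame (samples assumed in -1.0..1.0)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def ring_count(samples: Sequence[float], max_rings: int = 5) -> int:
    """Map the frame's loudness to a number of rings to draw (0..max_rings)."""
    level = min(rms_level(samples), 1.0)  # clamp in case of out-of-range samples
    return round(level * max_rings)
```

Calling `ring_count` once per captured frame and redrawing the graphic would give the kind of dynamic level feedback described above.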

Next, the processor 21 of the server 20 receives the speech signal through the communication interface 22 and performs speech recognition. To do so, the processor 21 loads the necessary modules L20, databases D20, and models M20 (speech recognition module, Japanese speech corpus, acoustic model, language model, and so on) from the storage resource 23 and converts the "sound" of the input speech into a "reading" (characters). The processor 21, or the server 20 as a whole, thus functions as the "speech recognition server". The processor 21 also stores the recognized content in the storage resource 23 (in an appropriate database as needed) as voice input history data.

The processor 21 then moves on to multilingual translation processing, which translates the recognized "reading" (characters) of the speech into other languages. Because English has been selected as the other party's language here, the processor 21 loads the necessary modules L20 and databases D20 (translation module, Japanese character corpus, Japanese dictionary, English dictionary, Japanese/English bilingual dictionary, Japanese/English bilingual corpus, and so on) from the storage resource 23, rearranges the "reading" (character string) of the input speech obtained as the recognition result into Japanese phrases, clauses, sentences, and so on, extracts the English corresponding to that conversion result, and rearranges it according to English grammar into natural English phrases, clauses, sentences, and so on. The processor 21 thus also functions as the "translation unit" that translates the content of the input speech into content in the second language (English) different from the first language (Japanese), and the server 20 as a whole also functions as the "translation server". If the input speech is not recognized correctly, the speech can be input again (not shown). The processor 21 can also store these Japanese and English phrases, clauses, sentences, and so on in the storage resource 23.

During this translation processing, the processor 21 sends the recognition result of the input speech (the content of the input speech) to the information terminal 10, and the processor 11 displays it as Japanese text T5 on the translation-in-progress screen shown in FIG. 3(D). The text T5 may be the recognition result as is, or may be the entry corresponding to the actual input speech retrieved from a Japanese conversation corpus stored in advance in the storage resource 23. The translation-in-progress screen also shows Japanese text T6 indicating that translation is in progress and a ring graphic 40 in which part of the arc rotates to indicate that processing is under way (up to here, step SJ3).

When the multilingual translation processing is complete, the processor 21 moves on to speech synthesis (step SJ4). The processor 21 loads the necessary modules L20, databases D20, and models M20 (speech synthesis module, English speech corpus, acoustic model, language model, and so on) from the storage resource 23 and converts the English phrases, clauses, sentences, and so on of the translation result into natural-sounding speech. The processor 21 thus also functions as a "speech synthesis unit", and the server 20 as a whole also functions as the "speech synthesis server".
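The server-side sequence described so far (speech recognition, storage of the recognized content as history, translation, then speech synthesis) can be summarized as a chain of three stages. The Python sketch below is only an outline under stated assumptions: `SpeechTranslationServer`, `SpeechTranslationResult`, and `handle` are hypothetical names, and each stage is a stub callable standing in for the modules L20, databases D20, and models M20.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SpeechTranslationResult:
    recognized_text: str      # recognition result in the first language
    translated_text: str      # translation result in the second language
    synthesized_audio: bytes  # synthesized speech for the translation


class SpeechTranslationServer:
    """Hypothetical server chaining recognition, translation, and synthesis."""

    def __init__(self, recognize: Callable[[bytes], str],
                 translate: Callable[[str], str],
                 synthesize: Callable[[str], bytes]) -> None:
        self._recognize = recognize
        self._translate = translate
        self._synthesize = synthesize
        self.history: List[str] = []  # recognized content kept as input history

    def handle(self, speech_signal: bytes) -> SpeechTranslationResult:
        text = self._recognize(speech_signal)
        self.history.append(text)
        translated = self._translate(text)
        audio = self._synthesize(translated)
        return SpeechTranslationResult(text, translated, audio)


# Stub stages (assumptions): the "signal" is UTF-8 text, translation is a lookup,
# and synthesis returns UTF-8 bytes instead of audio.
server = SpeechTranslationServer(
    recognize=lambda signal: signal.decode("utf-8"),
    translate={"満席です": "We are full."}.__getitem__,
    synthesize=lambda text: text.encode("utf-8"),
)
result = server.handle("満席です".encode("utf-8"))
```

Passing the stages in as callables mirrors the statement that one or several host computers may play the recognition, translation, and synthesis roles.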

Next, the processor 21 generates a text signal for text display based on the English translation result (or the corresponding entry in an English conversation corpus) and transmits it to the information terminal 10. On receiving the text signal, the processor 11 displays the Japanese text T5 that was shown on the translation-in-progress screen of FIG. 3(D), together with text T6 of its English translation result (parallel translation), on the translation result display screen shown in FIG. 4(A). This screen also displays Japanese text T7 explaining that the content indicated by text T5 will be conveyed to the customer. Thus, the processors 11 and 21 and the display device 16 function as a "bilingual display unit".

The translation result display screen also shows various buttons that the speaker can operate. In the screen area between texts T5 and T6, a check button B3 for returning to the home screen of FIG. 3(B) and a back-translation button B4 for back-translating the content of the English translation result text T6 into Japanese are displayed. The screen further shows a mistranslation notification button B5 for reporting an error in the translation result, a voice output button B6 for playing back the content of the parallel-translation text T6, and a re-input button B7 for returning to the voice input screen of FIG. 3(C) and speaking again (the above constitutes step SJ5).

(Back-translation)
Here, if the speaker is unsure whether the English translation result indicated by text T6 is correct, the speaker can have it back-translated into Japanese to judge its validity. Specifically, when the store clerk taps the back-translation button B4 (step SU3; speaker's instruction), the processor 21, on receiving the selection signal from the processor 11 of the information terminal 10, back-translates the content of the English translation result text T6 into Japanese using the multilingual translation processing described above. The resulting Japanese back-translation text T8 is displayed, together with text T5 of the input speech content, on the back-translation result display screen shown in FIG. 4(B) (step SJ6). Thus, the processor 21 also functions as a "back-translation unit" that back-translates the translation result in the second language (English) into content in the first language (Japanese).
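The round trip described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by the processor 21: the `translate` function, the two lookup tables, and the example sentences are hypothetical stand-ins for the multilingual translation processing.

```python
from dataclasses import dataclass

# Hypothetical lookup tables standing in for the translation engine.
JA_TO_EN = {"少々お待ちください。": "Please wait a moment."}
EN_TO_JA = {"Please wait a moment.": "しばらくお待ちください。"}


def translate(text: str, table: dict) -> str:
    """Toy translation: look the sentence up, or fail if it is unknown."""
    if text not in table:
        raise KeyError(f"no translation available for: {text!r}")
    return table[text]


@dataclass
class RoundTrip:
    source: str       # recognized input speech (text T5)
    translation: str  # second-language translation result (text T6)
    back: str         # back-translation into the first language (text T8)


def round_trip(source: str) -> RoundTrip:
    """Translate into English, then back-translate that result into Japanese."""
    translation = translate(source, JA_TO_EN)
    back = translate(translation, EN_TO_JA)
    return RoundTrip(source, translation, back)


result = round_trip("少々お待ちください。")
# T5 and T8 are shown side by side; the speaker, not the code,
# decides whether the two sentences mean the same thing.
print(result.source, "/", result.back)
```

The sketch deliberately makes no automatic equivalence judgment, since the description leaves that decision to the speaker.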

The back-translation result display screen shows a back button B8 for redisplaying the translation result display screen of FIG. 4(A), together with a registration button B9 for registering, as a fixed phrase, the pair consisting of the Japanese input speech content indicated by text T5 and the English translation result indicated by text T6. That is, the clerk sees that the Japanese input speech content indicated by text T5 and the Japanese back-translation result indicated by text T8 are synonymous despite a slight difference in wording (「少々」, "a little", versus 「しばらく」, "for a while"), and therefore judges that the English translation result indicated by text T6 is appropriate (the translation is highly accurate or suitable). By tapping the registration button B9, the clerk can then register the Japanese and English content as a fixed phrase.

(Fixed phrase registration)
When the clerk taps the registration button B9 (step SU4; speaker's instruction), the processor 21, on receiving the selection signal from the processor 11 of the information terminal 10, accesses the fixed phrase collection, which is one of the databases D20, and registers (stores) in it the pair consisting of the Japanese input speech content indicated by text T5 and the English translation result (conversation phrase) indicated by text T6. Thus, the processor 21 also functions as a "registration unit". The processor 21 also transmits to the information terminal 10 a display signal for a list of fixed phrases that includes the newly registered one, and the processor 11, on receiving it, displays the fixed phrase collection on, for example, the fixed phrase collection list screen shown in FIG. 4(C) (step SJ7). Thus, the processors 11 and 21 also function as a "phrase display unit".
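As a rough illustration of the registration step, the sketch below stores a first-language and second-language pair and returns the updated list for display. The class names `FixedPhrase` and `PhraseBook`, the duplicate check, and the example sentences are assumptions made for this sketch and do not appear in the description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FixedPhrase:
    source: str       # first-language content (text T5)
    translation: str  # second-language translation result (text T6)


@dataclass
class PhraseBook:
    phrases: list = field(default_factory=list)

    def register(self, source: str, translation: str) -> list:
        """Store the pair and return the updated list for display."""
        phrase = FixedPhrase(source, translation)
        if phrase not in self.phrases:  # skipping duplicates is an assumption
            self.phrases.append(phrase)
        return list(self.phrases)


book = PhraseBook()
listing = book.register("少々お待ちください。", "Please wait a moment.")
```

Returning the full list from `register` mirrors the description, in which the list sent back for display includes the newly registered phrase.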

This fixed phrase collection list screen is the same as the screen displayed when the suggest button 36 shown in FIG. 3(B) is tapped (although it may of course be different). The structure of the fixed phrase collection is not particularly limited. On the list screen, for example, the texts T40 to T46 of a plurality of fixed phrases are displayed grouped under a suitable plurality of heading tabs R1 to R4, etc. (for example, "Recently added phrases", "Initial customer contact", "Greetings", "Orders", "Recommendations"). Each fixed phrase shows its Japanese and English content (for example, texts T5 and T6) side by side. The fixed phrases stored under each heading tab R1 to R4 can be browsed by, for example, tapping the desired heading tab or swiping the screen of the display device 16 to switch (flip) between tabs. Although not illustrated, each fixed phrase can also be selected by an operation such as a long press and then deleted or moved between the heading tabs R1 to R4 as appropriate.
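The grouping of fixed phrases under heading tabs, together with the move and delete operations mentioned above, could be modelled along the following lines. The class `TabbedPhraseBook`, its method names, and the example phrase are hypothetical; the description states that the structure of the collection is not particularly limited.

```python
from collections import defaultdict


class TabbedPhraseBook:
    """Fixed phrases grouped under heading tabs (hypothetical structure)."""

    def __init__(self) -> None:
        # tab name -> list of (first-language, second-language) pairs
        self.tabs = defaultdict(list)

    def add(self, tab: str, source: str, translation: str) -> None:
        self.tabs[tab].append((source, translation))

    def move(self, phrase: tuple, src: str, dst: str) -> None:
        """Move a phrase from one heading tab to another."""
        self.tabs[src].remove(phrase)  # raises ValueError if absent
        self.tabs[dst].append(phrase)

    def delete(self, phrase: tuple, tab: str) -> None:
        self.tabs[tab].remove(phrase)


tabbed = TabbedPhraseBook()
pair = ("少々お待ちください。", "Please wait a moment.")
tabbed.add("Recently added phrases", *pair)
tabbed.move(pair, "Recently added phrases", "Greetings")
```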

Tapping a desired phrase, for example, also plays back the previously synthesized English speech. In this case, the processor 21 generates a voice signal for audio output based on the synthesized speech and transmits it to the information terminal 10. On receiving the voice signal, the processor 11 uses the voice input/output device 13 (output unit) to output (read aloud) the speech for the content of text T5. After these operations, the clerk can return to the home screen of FIG. 3(B) by tapping the check button B3, or can exit the app as appropriate (step SU5).

The processing performed when one of the other buttons displayed on the translation result display screen of FIG. 4(A) is tapped is outlined below.

(Mistranslation notification)
A clerk who has checked the Japanese back-translation result and judged that the English translation result is insufficiently accurate or is a mistranslation can report this to the server 20 by tapping the mistranslation notification button B5. In this case, the processor 21 stores the fact that the English translation result is incorrect, in association with the input speech content previously stored in the storage resource 23.
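One simple way to associate a reported mistranslation with the stored input content is a mapping from the input text to the set of translations flagged as wrong. The names `reported`, `report_mistranslation`, and `is_reported`, as well as the example sentence pair, are illustrative assumptions; the description does not specify the storage format.

```python
from collections import defaultdict

# Maps recognized input content to the translations reported as wrong.
reported = defaultdict(set)


def report_mistranslation(source: str, translation: str) -> None:
    """Record that `translation` was reported as wrong for `source`."""
    reported[source].add(translation)


def is_reported(source: str, translation: str) -> bool:
    """Check whether this pair has been reported before."""
    return translation in reported.get(source, set())


# Hypothetical example pair, used only for illustration.
report_mistranslation("お会計はこちらです。", "The accounting is here.")
```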

(Voice output)
A clerk who has checked the Japanese back-translation result and judged that the English translation result is accurate or appropriate can tap the voice output button B6 to play back the content of the English translation result text T6, through the same processing as the voice output described above.

(Re-input)
With or without checking the back-translation result, the clerk can also tap the re-input button B7 to return to the voice input screen of FIG. 3(C) and speak again.

(Second embodiment)
The processing in the second embodiment is the same as in the first embodiment, except that after the voice input screen shown in FIG. 3(B) is displayed (step SJ2) and before speaking (step SU1), the speaker taps the suggest button 36 shown in FIG. 3(B) to bring up the fixed phrase collection and selects a desired template from among the fixed phrase templates it contains.

Specifically, the fixed phrase collection provides predetermined templates (for example, for customer service situations, template phrases such as 「申し訳ございません。〜ください。」 ("We are sorry. Please ~."), 「本日のおすすめメニューは〜になります。」 ("Today's recommended menu is ~."), and 「〜の営業時間は〜です。」 ("The business hours of ~ are ~.")), and these are displayed on, for example, the template phrase collection list screen shown in FIG. 4(D). The structure of the template phrase collection is likewise not particularly limited. On the list screen, for example, the texts H40 to H46 of a plurality of template phrases are displayed grouped under a suitable plurality of heading tabs H1 to H4, etc. (for example, "Recently used phrases", "Initial customer contact", "Greetings", "Orders", "Recommendations"). Thus, the processors 11 and 21 and the display device 16 also function as a "template display unit".

When the clerk selects a desired template phrase by tapping or the like before speaking (step SU2), the display returns to the home screen of FIG. 3(B) (it may instead return after the check button B3 is tapped). When the clerk then speaks content that corresponds to the selected template phrase, the processor 21 performs translation processing using the various corpora that match that template phrase (steps SJ3 to SJ5).
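To make the idea of template-guided translation concrete, the sketch below matches an utterance against a selected template, extracts the words that fill the omitted slot, and inserts their translation into the corresponding English template. This is only one possible reading: the description says that corpora matching the template are used, whereas the regular-expression matching, the English template, and the `GLOSSARY` table here are assumptions introduced for illustration.

```python
import re

SLOT = "〜"  # marks the omitted words in a Japanese template phrase

# Hypothetical template pair and glossary, for illustration only.
TEMPLATE_JA = "本日のおすすめメニューは〜になります。"
TEMPLATE_EN = "Today's recommended menu is ~."
GLOSSARY = {"天ぷら定食": "the tempura set meal"}


def template_to_regex(template: str) -> re.Pattern:
    """Build a regex whose groups capture the words filling each slot."""
    parts = [re.escape(part) for part in template.split(SLOT)]
    return re.compile("(.+?)".join(parts))


def translate_with_template(utterance: str) -> str:
    """Translate an utterance that follows the selected template."""
    match = template_to_regex(TEMPLATE_JA).fullmatch(utterance)
    if match is None:
        raise ValueError("utterance does not follow the selected template")
    result = TEMPLATE_EN
    for filler in match.groups():
        # Fill the English slots in order with the translated fillers.
        result = result.replace("~", GLOSSARY[filler], 1)
    return result


english = translate_with_template("本日のおすすめメニューは天ぷら定食になります。")
```

Because the template fixes the sentence frame, only the slot filler has to be translated, which is one way selecting a template beforehand could narrow the translation problem.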

According to the conversation support device 100 including the fixed phrase creation device configured as described above, and according to the fixed phrase creation program and the conversation support program, the content of a conversation phrase input by voice in Japanese (first language) is translated into English (second language) content by speech translation processing and, where necessary, back-translated into Japanese content and displayed. This allows the clerk (speaker) to judge whether the English translation result is correct.

As a result, when the English translation result is appropriate, the pair consisting of the Japanese input speech content and the English translation result (for example, texts T5 and T6) can be created as a single fixed phrase and registered in the fixed phrase collection in advance of an actual conversation. The speaker can therefore find and use the desired conversation phrase in the fixed phrase collection accurately and easily during a conversation, which supports smooth spoken communication between speakers. In addition, selecting a template phrase before the input speech is translated has the advantage of improving translation accuracy, so that fixed phrases can be created more easily and more efficiently.

As stated above, each of the embodiments described is an example for explaining the present invention and is not intended to limit the invention to that embodiment. The present invention can be modified in various ways without departing from its gist. For example, a person skilled in the art can replace the resources (hardware resources or software resources) described in the embodiments with equivalents, and such replacements are also included in the scope of the present invention.

Also, when the spoken input is short, such as a single word, back-translation may not be necessary, so the registration button B9 may also be displayed on the translation result display screen shown in FIG. 4(A). Further, the back-translation processing of step SJ6 may be performed in advance, for example in step SJ3 of the flow shown in FIG. 2, so that when the back-translation button B4 is tapped, step SJ6 only displays the back-translation result. In other words, the back-translation may be carried out beforehand and its result displayed only when the speaker so instructs. Furthermore, the speech synthesis processing (step SJ4) need not be performed between steps SJ3 and SJ5 of the flow shown in FIG. 2; instead, it may be performed when the voice output button B6 is tapped in step SU3.
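The two timing variations above (back-translating ahead of time but displaying only on request, and synthesizing speech only when the voice output button is tapped) can be sketched as follows. The class `TranslationResult` and the callables passed to it are hypothetical; `functools.cached_property` is used here simply as one convenient way to defer and cache the synthesis.

```python
from functools import cached_property


class TranslationResult:
    def __init__(self, source, translation, back_translate, synthesize):
        self.source = source            # text T5
        self.translation = translation  # text T6
        # Variation 1: back-translate eagerly during the translation step.
        self._back = back_translate(translation)
        self._synthesize = synthesize

    def on_back_translate_tapped(self) -> str:
        # Only display the stored result when the button is tapped.
        return self._back

    @cached_property
    def audio(self) -> bytes:
        # Variation 2: synthesize on the first voice-output tap;
        # later accesses reuse the cached result.
        return self._synthesize(self.translation)


synth_calls = []


def fake_synthesize(text: str) -> bytes:
    """Stand-in for a speech synthesizer; records each call."""
    synth_calls.append(text)
    return text.encode("utf-8")


res = TranslationResult("a", "b", lambda t: t.upper(), fake_synthesize)
```

Deferring synthesis avoids work for phrases that are never played back, at the cost of a delay on the first tap.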

Although an example has been described in which the server 20 executes the speech recognition, translation, speech synthesis, and other processing, these processes may instead be executed on the information terminal 10. In that case, the module L20 used for the processing may be stored in the storage resource 12 of the information terminal 10. The database D20, which is a speech database, and/or the model M20, such as an acoustic model, may likewise be stored in the storage resource 12 of the information terminal 10. Thus, the fixed phrase creation device need not include the network N and the server 20. A gateway server or the like that converts the communication protocol between the information terminal 10 and the network N may of course be interposed between them. The information terminal 10 is not limited to a portable device and may be, for example, a desktop personal computer, a notebook personal computer, a tablet personal computer, or a laptop personal computer.

Because the present invention can support smooth spoken communication between speakers, it can be widely used in activities such as the design, manufacture, provision, and sale of programs, devices, systems, and methods in, for example, the field of services relating to conversation between people who do not understand each other's language.

10 Information terminal
11 Processor
12 Storage resource
13 Voice input/output device
14 Communication interface
15 Input device
16 Display device
17 Camera
20 Server
21 Processor
22 Communication interface
23 Storage resource
31 Language button
32a, 32b Input buttons
33 Call-out button
34 Language selection button
35 History button
36 Suggest button
37 Settings button
38 Microphone design
39 Multiple-circle design
40 Ring-shaped design
100 Conversation support device
B1 Cancel button
B2 Input switching button
B3 Check button
B4 Back-translation button
B5 Mistranslation notification button
B6 Voice output button
B7 Re-input button
B8 Back button
B9 Registration button
D20 Database
H1 to H4 Heading tabs
H40 to H46 Template phrase texts
L20 Module
M20 Model
N Network
P10, P20 Programs
R1 to R4 Heading tabs
T1 to T8 Texts
T40 to T46 Fixed phrase texts

Claims

A fixed phrase creation device comprising:
an input unit for inputting speech of a speaker in a first language;
a translation unit that translates the content of the input speech into content in a second language different from the first language;
a back-translation unit that back-translates the translation result in the second language into content in the first language based on a back-translation instruction from the speaker;
a bilingual display unit that displays the content of the input speech and the translation result in the second language on the same screen so that the speaker can view both, and that displays the content of the input speech and the back-translation result of the translation result in the second language on the same screen so that the speaker can view both;
a registration unit that registers a pair of the content of the input speech and the translation result in the second language as a fixed phrase, based on a registration instruction from the speaker who has judged, from the content of the input speech and the back-translation result, that the translation result is appropriate; and
a template display unit that displays templates of the fixed phrase,
wherein, when the speaker selects, from the templates of the fixed phrase, a specific template phrase in which some words are omitted, and speech whose content fills in the omitted words of the specific template phrase is input by the speaker, the translation unit translates the content of the speech from the first language into the second language using a corpus that matches the selected specific template phrase.

The fixed phrase creation device according to claim 1, wherein the display unit displays a button for inputting the back-translation instruction from the speaker and a button for inputting the registration instruction from the speaker.

The fixed phrase creation device according to claim 1 or 2, wherein the display unit displays a template phrase list screen containing a plurality of template phrases, and the texts of the plurality of template phrases are displayed on the template phrase list screen grouped under each of a suitable plurality of headings.

A conversation support device comprising:
the fixed phrase creation device according to any one of claims 1 to 3;
a storage unit that holds a fixed phrase collection containing a plurality of fixed phrases;
a phrase display unit that displays the fixed phrases; and
an output unit that outputs, as speech, the translation result in the second language of a fixed phrase.

A fixed phrase creation program that causes a computer to function as:
an input unit for inputting speech of a speaker in a first language;
a translation unit that translates the content of the input speech into content in a second language different from the first language;
a back-translation unit that back-translates the translation result in the second language into content in the first language based on a back-translation instruction from the speaker;
a bilingual display unit that displays the content of the input speech and the translation result in the second language on the same screen so that the speaker can view both, and that displays the content of the input speech and the back-translation result of the translation result in the second language on the same screen so that the speaker can view both;
a registration unit that registers a pair of the content of the input speech and the translation result in the second language as a fixed phrase, based on a registration instruction from the speaker who has judged, from the content of the input speech and the back-translation result, that the translation result is appropriate; and
a template display unit that displays templates of the fixed phrase,
wherein, when the speaker selects, from the templates of the fixed phrase, a specific template phrase in which some words are omitted, and speech whose content fills in the omitted words of the specific template phrase is input by the speaker, the translation unit translates the content of the speech from the first language into the second language using a corpus that matches the selected specific template phrase.

A conversation support program that causes a computer to function as:
an input unit for inputting speech of a speaker in a first language;
a translation unit that translates the content of the input speech into content in a second language different from the first language;
a back-translation unit that back-translates the translation result in the second language into content in the first language based on a back-translation instruction from the speaker;
a bilingual display unit that displays the content of the input speech and the translation result in the second language on the same screen so that the speaker can view both, and that displays the content of the input speech and the back-translation result of the translation result in the second language on the same screen so that the speaker can view both;
a registration unit that registers a pair of the content of the input speech and the translation result in the second language as a fixed phrase, based on a registration instruction from the speaker who has judged, from the content of the input speech and the back-translation result, that the translation result is appropriate;
a template display unit that displays templates of the fixed phrase;
a storage unit that holds a fixed phrase collection containing a plurality of fixed phrases;
a phrase display unit that displays the fixed phrases; and
an output unit that outputs, as speech, the translation result in the second language of a fixed phrase,
wherein, when the speaker selects, from the templates of the fixed phrase, a specific template phrase in which some words are omitted, and speech whose content fills in the omitted words of the specific template phrase is input by the speaker, the translation unit translates the content of the speech from the first language into the second language using a corpus that matches the selected specific template phrase.