JP6141483B1

JP6141483B1 - Speech translation device, speech translation method, and speech translation program

Info

Publication number: JP6141483B1
Application number: JP2016066157A
Authority: JP
Inventors: 千春宇賀神
Original assignee: RECRUIT LIFESTYLE CO., LTD.
Current assignee: RECRUIT LIFESTYLE CO., LTD.
Priority date: 2016-03-29
Filing date: 2016-03-29
Publication date: 2017-06-07
Anticipated expiration: 2036-03-29
Also published as: JP2017182310A

Abstract

[Problem] To acquire order information simply and effectively in customer service by analyzing the content of a dialogue while maintaining smooth communication. [Solution] A speech translation device according to one aspect of the present invention includes: an input unit for inputting the speech of a user or the like; a translation unit that translates the content of the input speech into content in a different language; an output unit that outputs the translated content of the input speech as speech or the like; a display unit that displays fixed phrases so that the user or the like can select them; a determination unit that determines whether an order placed by the interlocutor in response to an order inquiry from the user has been confirmed; and a storage unit that stores the confirmed content of the order when the order is determined to have been confirmed. The determination unit determines that the order has been completed when, for example, the content of the input speech or the content of the selected fixed phrase includes a product name, an order quantity, and a closing. [Selected drawing] FIG. 2

Description

The present invention relates to a speech translation device, a speech translation method, and a speech translation program.

Conventionally, techniques are known in which chains, independent retail stores, and the like use a cash register having a point-of-sale (POS) information management function (hereinafter, "POS register") to acquire order information and accounting information for each store and product (for example, Patent Document 1 and the patent documents cited therein). More recently, order entry systems (hereinafter, "OES") have also been introduced, starting with restaurants, in which an employee operates a mobile terminal device and enters the food and drink menu items and quantities ordered by a customer, thereby acquiring order information (for example, Patent Document 2). The order information and accounting information acquired in this way can be used for sales analysis, inventory management, and the like for each store or product.

Meanwhile, speech translation applications for conversations between, for example, a store clerk and a foreign customer are known (for example, Non-Patent Document 1), as are translation applications that display a list of purpose-specific conversation patterns when the user selects the situation (restaurant, shopping, etc.) in which the user wants to converse (for example, Non-Patent Document 2).

Patent Document 1: JP 2013-137657 A
Patent Document 2: JP 2011-204227 A

Non-Patent Document 1: U-STAR Consortium homepage [retrieved February 23, 2016], Internet <URL: http://www.ustar-consortium.com/app_ja/app.html>
Non-Patent Document 2: "Conversation simulation is possible! A translation app supporting Japanese, Chinese, Korean, and English: '[No. 1 travel app] TS Conversation Translator [CJK]'" [retrieved February 23, 2016], Internet <URL: http://andronavi.com/2012/08/208376>

By combining the conventional techniques above, one could, for example, use a translation application to take a foreign customer's order, enter the order into an OES or POS register, and use the order information thus acquired for sales analysis and inventory management for each store or product. However, this approach requires both a terminal device that runs the translation application and a terminal device for the OES or POS register, and operating both to acquire order information is cumbersome. An application having OES or POS register functions could also be run on the same terminal device as the translation application, but even then multiple applications must be operated, and the work remains cumbersome.

Sales analysis for each store or product also makes possible sales practices such as extracting the products and menu items that are popular at a given time and recommending them to customers as recommended products or recommended menu items. For foreign customers, however, the analysis results for such recommended products or menu items must first be obtained and then presented to the customer using the translation application described above. This work is also cumbersome, which makes smooth communication with foreign customers difficult and may lower the quality of customer service.

The present invention has been made in view of these circumstances. Its object is to provide a speech translation device, a speech translation method, and a speech translation program that, when a user serves an interlocutor (a foreign customer), acquire order information simply and effectively by analyzing the content of the dialogue while maintaining smooth communication, and that can thereby contribute to optimizing customer service and to improving the sales and profits of the user's store.

To solve the above problem, a speech translation device according to one aspect of the present invention includes: an input unit for inputting the speech of a user and/or an interlocutor; a translation unit that translates the content of the input speech into content in a different language; an output unit that outputs the translated content of the input speech as speech and/or text; a display unit that displays fixed phrases so that the user and/or the interlocutor can select them; a determination unit that determines whether an order placed by the interlocutor in response to an order inquiry from the user has been confirmed; and a storage unit that stores the confirmed content of the order when the order is determined to have been confirmed. The determination unit determines that the order has been confirmed when (1) the content of the input speech or the content of the selected fixed phrase includes a product name, an order quantity, and a closing, or (2) the content of the input speech or the content of the selected fixed phrase includes a product name and an operation indicating the end of the conversation has been performed.

Here, a "fixed phrase" includes sentences, clauses, phrases, words, and numerals, and may be accompanied by images or symbols. A "closing" includes a phrase that signifies the end of the conversation (for example, a phrase that states or implies that an order placed in response to an order inquiry has been received and confirmed). An "operation indicating the end of the conversation" includes an expression of intent to end the conversation (for example, an operation or action that terminates the speech translation application according to the present invention running on the speech translation device, or that terminates a predetermined process in that application).
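The two confirmation conditions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Utterance` type, its fields, and `order_confirmed` are hypothetical names, and the extraction of a product name, quantity, and closing from speech or a selected fixed phrase is assumed to have happened already.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    """Hypothetical result of analyzing input speech or a selected fixed phrase."""
    product_name: Optional[str] = None   # e.g. "ramen"; None if no product was found
    quantity: Optional[int] = None       # order quantity; None if not stated
    has_closing: bool = False            # True if a closing phrase was detected


def order_confirmed(utterance: Utterance, end_operation: bool) -> bool:
    """Return True when either confirmation condition is met.

    Condition (1): product name, order quantity, and closing are all present.
    Condition (2): product name is present and an operation indicating the
    end of the conversation was performed.
    """
    has_product = utterance.product_name is not None
    condition_1 = has_product and utterance.quantity is not None and utterance.has_closing
    condition_2 = has_product and end_operation
    return condition_1 or condition_2
```

Note that condition (2) does not require a quantity, so a caller that stores the confirmed order would need its own rule for a missing quantity; the text does not specify one.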

It is also preferable for the display unit to display a list of product names and to rank product names with a larger total order quantity stored in the storage unit higher in the list.

It is likewise preferable for the storage unit to store an accounting quantity for each product name, and for the display unit to display a list of the product names stored in the storage unit and to rank product names with a larger total accounting quantity higher in the list.

It is likewise preferable for the storage unit to store a profit margin for each product name, and for the display unit to display a list of product names and to rank product names with a higher profit margin higher in the list.
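The list orderings described above amount to sorting product names by a stored figure. The following Python sketch shows one way this could look; the function names, the record layout, and the alphabetical tie-break are assumptions made for illustration, not details given in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_by_total_quantity(records: Iterable[Tuple[str, int]]) -> List[str]:
    """Rank product names by total stored quantity, largest first.

    `records` is a hypothetical sequence of (product_name, quantity) pairs,
    usable for either order quantities or accounting quantities.
    Ties are broken alphabetically (an assumption).
    """
    totals: Dict[str, int] = defaultdict(int)
    for name, quantity in records:
        totals[name] += quantity
    return sorted(totals, key=lambda name: (-totals[name], name))


def rank_by_profit_margin(margins: Dict[str, float]) -> List[str]:
    """Rank product names by stored profit margin, highest first."""
    return sorted(margins, key=lambda name: (-margins[name], name))
```

For example, `rank_by_total_quantity([("ramen", 2), ("gyoza", 1), ("ramen", 1)])` places "ramen" ahead of "gyoza" because its total is 3.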

The fixed phrases may be set in advance, automatically or manually, for each type of business to which the user belongs or for each store of the user.

Further, the display unit may display a translation of each fixed phrase in a different language, or the output unit may output a translation of each fixed phrase in a different language as speech.

A speech translation method according to one aspect of the present invention uses a speech translation device that includes an input unit, a translation unit, an output unit, a display unit, a determination unit, and a storage unit. The method includes the steps of: the input unit inputting the speech of a user and/or an interlocutor; the translation unit translating the content of the input speech into content in a different language; the output unit outputting the translated content of the input speech as speech and/or text; the display unit displaying fixed phrases so that the user and/or the interlocutor can select them; the determination unit determining whether an order placed by the interlocutor in response to an order inquiry from the user has been confirmed; and the storage unit storing the confirmed content of the order when the order is determined to have been confirmed. In the determining step, the order is determined to have been confirmed when (1) the content of the input speech or the content of the selected fixed phrase includes a product name, an order quantity, and a closing, or (2) the content of the input speech or the content of the selected fixed phrase includes a product name and an operation indicating the end of the conversation has been performed.

A speech translation program according to one aspect of the present invention causes a computer (not limited to a single computer or a single type, and may be multiple computers or multiple types; the same applies hereinafter) to function as: an input unit for inputting the speech of a user and/or an interlocutor; a translation unit that translates the content of the input speech into content in a different language; an output unit that outputs the translated content of the input speech as speech and/or text; a display unit that displays fixed phrases so that the user and/or the interlocutor can select them; a determination unit that determines whether an order placed by the interlocutor in response to an order inquiry from the user has been confirmed; and a storage unit that stores the confirmed content of the order when the order is determined to have been confirmed. The determination unit determines that the order has been confirmed when (1) the content of the input speech or the content of the selected fixed phrase includes a product name, an order quantity, and a closing, or (2) the content of the input speech or the content of the selected fixed phrase includes a product name and an operation indicating the end of the conversation has been performed.

According to the present invention, in a conversation in which a user serves an interlocutor (a foreign customer), it is possible to determine whether an order placed by the interlocutor in response to an order inquiry from the user has been confirmed, and to store the confirmed content. Specifically, the order is determined to have been confirmed when (1) the content of the uttered input speech or the content of the selected fixed phrase includes a product name, an order quantity, and a closing, or (2) the content of the input speech or the content of the selected fixed phrase includes a product name and an operation indicating the end of the conversation has been performed. In other words, according to the present invention, when a user serves an interlocutor (a foreign customer), order information can be acquired simply and effectively by analyzing the content of the dialogue while maintaining smooth communication through conversation. This in turn can contribute to optimizing customer service and to improving the sales and profits of the user's store.

FIG. 1 is a system block diagram schematically showing a preferred embodiment of a network configuration and the like for a speech translation device according to the present invention.
FIG. 2 is a flowchart showing an example of (part of) the processing flow in the first to third embodiments of the speech translation device according to the present invention.
FIG. 3 is a flowchart showing an example of (part of) the processing flow in the fourth to sixth embodiments of the speech translation device according to the present invention.
FIGS. 4(A) to 4(C) are plan views showing an example of display screen transitions on the information terminal.
FIGS. 5(A) to 5(D) are plan views showing an example of display screen transitions on the information terminal.

Embodiments of the present invention are described in detail below. The following embodiments are examples for explaining the present invention and are not intended to limit the present invention to those embodiments. The present invention can be modified in various ways without departing from its gist. Those skilled in the art can also adopt embodiments in which the elements described below are replaced with equivalents, and such embodiments are also included in the scope of the present invention. Unless otherwise noted, positional relationships such as up, down, left, and right are based on those shown in the drawings. The dimensional ratios in the drawings are not limited to the ratios shown.

(Device configuration)
FIG. 1 is a system block diagram schematically showing a preferred embodiment of a network configuration and the like for a speech translation device according to the present invention. In this example, the speech translation device 100 includes a server 20 that is electronically connected via a network N to an information terminal 10 (user device) used by the user (but is not limited to this configuration).

The information terminal 10 employs, for example, a user interface such as a touch panel and a highly visible display. The information terminal 10 here is a portable tablet-type terminal device, a category that includes mobile phones typified by smartphones, having a function for communicating with the network N. The information terminal 10 includes a processor 11, a storage resource 12, a voice input/output device 13, a communication interface 14, an input device 15, a display device 16, and a camera 17. When the installed speech translation application software (at least part of a speech translation program according to an embodiment of the present invention) runs, the information terminal 10 functions as part or all of a speech translation device according to an embodiment of the present invention.

The processor 11 consists of an arithmetic logic unit and various registers (program counter, data register, instruction register, general-purpose registers, etc.). The processor 11 interprets and executes the speech translation application software, which is the program P10 stored in the storage resource 12, and performs various kinds of processing. The speech translation application software as the program P10 can be distributed, for example, from the server 20 over the network N, and may be installed and updated manually or automatically.

The network N is a communication network composed of a mixture of, for example, wired networks (a local area network (LAN), a wide area network (WAN), a value-added network (VAN), etc.) and wireless networks (a mobile communication network, a satellite communication network, Bluetooth (registered trademark), WiFi (Wireless Fidelity), HSDPA (High Speed Downlink Packet Access), etc.).

The storage resource 12 is a logical device provided by the storage area of a physical device (for example, a computer-readable recording medium such as semiconductor memory) and stores the operating system program, driver programs, various data, and the like used in the processing of the information terminal 10. Examples of the driver programs include an input/output device driver program for controlling the voice input/output device 13, an input device driver program for controlling the input device 15, and a display device driver program for controlling the display device 16. The voice input/output device 13 is, for example, a general microphone and a sound player capable of playing back sound data.

The communication interface 14 provides, for example, a connection interface to the server 20 and consists of a wireless communication interface and/or a wired communication interface. The input device 15 provides an interface that accepts input operations made, for example, by tapping icons, buttons, a virtual keyboard, text, and the like displayed on the display device 16; examples include a touch panel as well as various input devices externally attached to the information terminal 10.

The display device 16 serves as an image display interface that presents various information to the user and the interlocutor (the other party in the conversation); examples include an organic EL display, a liquid crystal display, and a CRT display. The camera 17 captures still images and video of various subjects.

The server 20 consists of, for example, a host computer with high processing power and provides server functions when a predetermined server program runs on that host computer. It consists of, for example, one or more host computers that function as a speech recognition server, a translation server, and a speech synthesis server (a single server is shown in the drawing, but the configuration is not limited to this). Each server 20 includes a processor 21, a communication interface 22, and a storage resource 23.

The processor 21 consists of an arithmetic logic unit that handles arithmetic operations, logical operations, bit operations, and the like, and various registers (program counter, data register, instruction register, general-purpose registers, etc.). It interprets and executes the program P20 stored in the storage resource 23 and outputs predetermined processing results. The communication interface 22 is a hardware module for connecting to the information terminal 10 via the network N and is, for example, a modulation/demodulation device such as an ISDN modem, an ADSL modem, a cable modem, an optical modem, or a soft modem.

The storage resource 23 is, for example, a logical device provided by the storage area of a physical device (a computer-readable recording medium such as a disk drive or semiconductor memory) and stores one or more programs P20, various modules L20, various databases D20, and various models M20. The storage resource 23 also stores a plurality of fixed question sentences prepared in advance for the user to address the interlocutor, history data of input speech, data for various settings, the confirmed order content for products (menu items) described later, and the like.

The program P20 is, for example, the server program described above, which is the main program of the server 20. The various modules L20 are software modules (modularized subprograms) that are called and executed as needed during the operation of the program P10 in order to perform a series of information processing on the requests and information sent from the information terminal 10. Examples of the modules L20 include a speech recognition module, a translation module, and a speech synthesis module.
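The three modules named here suggest a recognition, translation, and synthesis chain. The Python sketch below shows how such modules might be composed; the class name, the callable signatures, and the fixed order of the stages are assumptions for illustration, since this passage lists the modules without describing how they are invoked.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class SpeechTranslationPipeline:
    """Hypothetical composition of the three server-side modules."""
    recognize: Callable[[bytes], str]    # speech recognition: audio -> source-language text
    translate: Callable[[str], str]      # translation: source text -> target-language text
    synthesize: Callable[[str], bytes]   # speech synthesis: target text -> audio

    def run(self, audio: bytes) -> Tuple[str, str, bytes]:
        """Run the stages in sequence and return every intermediate result."""
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        target_audio = self.synthesize(target_text)
        return source_text, target_text, target_audio
```

Passing the stages in as callables keeps the sketch independent of any particular recognition or translation engine, which also makes it easy to exercise with stand-in functions.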

Examples of the various databases D20 include the corpora required for speech translation processing (for Japanese-English speech translation, for example, a Japanese speech corpus, an English speech corpus, a Japanese character (vocabulary) corpus, an English character (vocabulary) corpus, a Japanese dictionary, an English dictionary, a Japanese-English bilingual dictionary, a Japanese-English parallel corpus, etc.), a speech database, a management database for managing information about users, and an order history database described later. Examples of the various models M20 include the acoustic models and language models used for speech recognition.

(First embodiment)
An example (first embodiment) of the operations and actions of speech translation processing in the speech translation device 100 configured as described above is explained further below. FIG. 2 is a flowchart showing an example of (part of) the processing flow in the speech translation device 100 of the first to third embodiments. FIGS. 4(A) to 4(C) are plan views showing an example of display screen transitions on the information terminal. The conversation assumed here is one in which the user of the information terminal 10 is a Japanese-speaking clerk at a restaurant or similar establishment and the interlocutor (the other party in the conversation) is an English-speaking foreign customer (the languages and situation are not limited to these).

First, when the user (clerk) launches the application (step SU1), the interlocutor language selection screen shown in FIG. 4(A) is displayed on the display device 16 of the information terminal 10 (step SJ1). This language selection screen displays Japanese text T1 prompting the user to ask the interlocutor which language he or she speaks, English text T2 asking the interlocutor which language he or she speaks, and language buttons 41 for several typical languages that are anticipated (here, English, Chinese (for example, two variants by script), and Korean). Below these, a cancel button B1 for closing the language selection screen and exiting the application is also displayed.

As shown in FIG. 4(A), the processor 11 and the display device 16 display the Japanese text T1 and the English text T2 in separate regions of the screen of the display device 16 of the information terminal 10 and in mutually opposite orientations (different orientations; upside down relative to each other in the drawing). As a result, when the user and the interlocutor converse face to face, the user can easily read the Japanese text T1 while the interlocutor can easily read the English text T2. Because the Japanese text T1 and the English text T2 are displayed in separate regions, the two are clearly distinguished and even easier to read.

The interlocutor's language can be selected either by the user showing the English text T2 on the language selection screen to the interlocutor and having the interlocutor tap, for example, the English button, or by the user selecting it directly. Once the interlocutor's language has been selected, the processor 21 of the server 20 and the processor 11 of the information terminal 10 display a Japanese and English voice input standby screen on the display device 16 as the home screen (FIG. 4(B); step SJ2). This voice input standby screen displays Japanese text T3 asking whether the user's language or the interlocutor's language will be spoken, an input button 42a for Japanese voice input, and an input button 42b for English voice input.

The voice input standby screen also displays: a greeting button 43 for selecting a list display of a plurality of preset fixed question sentences; a language selection button 44 for returning to the language selection screen of FIG. 4(A) to switch the interlocutor's language (redo the language selection); a history button 45 for selecting a history display of the voice input made so far; a suggest button 46 for running a suggest function that lets the conversation proceed by selecting a desired fixed phrase from a group of fixed phrases (recommended phrases) prepared in advance; and a settings button 47 for configuring the application software.

An outline of the main procedure of the ordinary speech translation processing (step SJ3 in FIG. 2) during the conversation between the user and the conversation partner, and/or during preparation for that conversation, is described here. First, when the user taps the Japanese input button 42a on the voice input standby screen shown in FIG. 4(B) to select Japanese voice input, a voice input screen that accepts the user's utterance in Japanese is displayed (FIG. 4(C)). When this voice input screen is displayed, voice input from the voice input/output device 13 is enabled. The voice input screen displays text T4 prompting the user to speak, a microphone icon 48 indicating that voice input is active, and an input switching button 50 for switching to text input. A cancel button B1 is also displayed on this voice input screen; tapping it either ends the conversation or returns to the voice input standby screen (FIG. 4(B)) so that the voice input can be redone.

In this state, when the user utters what is to be conveyed to the conversation partner (step SU2), a concentric-circle graphic 49 that schematically and dynamically represents the loudness of the voice is displayed together with the text T4, so that the voice input level is fed back visually to the user, who is the speaker. When the utterance is finished and the user taps the microphone icon 48, the processor 11 stops accepting the user's utterance. The processor 11 of the information terminal 10 generates a voice signal based on the voice input and transmits it to the server 20 through the communication interface 14 and the network N. In this way, the information terminal 10 itself, or the processor 11 and the voice input/output device 13, functions as the "input unit".

Next, the processor 21 of the server 20 receives the voice signal through the communication interface 22 and performs speech recognition processing. At this time, the processor 21 calls the necessary modules L20, databases D20, and models M20 (a speech recognition module, a Japanese speech corpus, an acoustic model, a language model, and the like) from the storage resource 23 and converts the "sound" of the input speech into its "reading" (characters). In this way, the processor 21, or the server 20 as a whole, functions as a "speech recognition server". The processor 21 also stores the recognized content in the storage resource 23 (in an appropriate database as necessary) as voice input history data.

The processor 21 then transmits the recognition result of the input speech to the information terminal 10, and the processor 11 displays it on the screen as Japanese text (not shown). The recognition result may be displayed as it is, or an entry corresponding to the content of the actual input speech may be retrieved from a Japanese conversation corpus stored in advance in the storage resource 23 and displayed instead.

Subsequently, the processor 21 proceeds to multilingual translation processing, which translates the recognized "reading" (characters) of the speech into another language. At this time, the processor 21 calls the necessary modules L20 and databases D20 (a translation module, a Japanese character corpus, a Japanese dictionary, an English dictionary, a Japanese-English bilingual dictionary, a Japanese-English parallel corpus, and the like) from the storage resource 23. It rearranges the recognized "reading" (character string) of the input speech as appropriate to convert it into Japanese phrases, clauses, sentences, and the like, extracts the English corresponding to the conversion result, and rearranges that English according to English grammar to produce natural English phrases, clauses, sentences, and the like. In this way, the processor 21 also functions as the "translation unit", and the server 20 as a whole also functions as a "translation server". If the input speech was not recognized correctly, the speech can be input again (not shown). The processor 21 can also store these Japanese and English phrases, clauses, sentences, and the like in the storage resource 23.

The processor 21 then proceeds to speech synthesis processing. At this time, the processor 21 calls the necessary modules L20, databases D20, and models M20 (a speech synthesis module, an English speech corpus, an acoustic model, a language model, and the like) from the storage resource 23 and converts the English phrases, clauses, sentences, and the like obtained as the translation result into natural speech. In this way, the processor 21 also functions as a "speech synthesis unit", and the server 20 as a whole also functions as a "speech synthesis server".

The processor 21 then generates a voice signal for voice output based on the synthesized speech and transmits it to the information terminal 10 through the communication interface 22 and the network N. The processor 11 of the information terminal 10 receives the voice signal through the communication interface 14 and performs voice output processing using the voice input/output device 13 (step SJ3 up to this point). In this way, the processor 11 and the voice input/output device 13 function as the "output unit". Prior to the voice output, the user's speech recognition result and its translation may first be displayed on the information terminal 10, with the voice output performed after the user has confirmed them (not shown).
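
The recognition, translation, and synthesis stages of step SJ3 described above can be pictured as a simple chain. The following is only a minimal sketch under stated assumptions: the names `run_pipeline`, `TranslationResult`, and the three `fake_*` back ends are hypothetical stand-ins introduced for illustration, and the toy back ends replace the real speech recognition, translation, and synthesis modules, which the description does not specify at code level.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationResult:
    recognized_text: str   # output of speech recognition (source language)
    translated_text: str   # output of translation (target language)
    audio: bytes           # synthesized speech to be sent back to the terminal


def run_pipeline(audio_in: bytes,
                 recognize: Callable[[bytes], str],
                 translate: Callable[[str], str],
                 synthesize: Callable[[str], bytes]) -> TranslationResult:
    """Chain the three stages in the order given in the description."""
    text = recognize(audio_in)
    translated = translate(text)
    return TranslationResult(text, translated, synthesize(translated))


# Hypothetical stand-in back ends: the "audio" is just UTF-8 text here.
TOY_DICTIONARY = {
    "ご注文をお聞きしてもよろしいでしょうか？": "May I have your order, please?",
}


def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    return TOY_DICTIONARY.get(text, text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = run_pipeline("ご注文をお聞きしてもよろしいでしょうか？".encode("utf-8"),
                      fake_recognize, fake_translate, fake_synthesize)
print(result.translated_text)
```

Passing the stages in as callables keeps the sketch independent of any particular engine; in the described system all three stages run on the server 20, and only the resulting voice signal is returned to the information terminal 10.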

Next, as a more concrete processing flow in this embodiment, an example of the processing in a conversation in which the user takes the conversation partner's order is described. First, the user speaks an order inquiry (for example, the phrase 「ご注文をお聞きしてもよろしいでしょうか？」, "May I have your order, please?") toward the voice input screen (FIG. 4(C)) and thereby inputs the speech to the information terminal 10 (step SU2).

On receiving the voice signal, the processor 21 of the server 20 performs speech recognition, multilingual translation (for example, producing "May I have your order, please?" as the translation of 「ご注文をお聞きしてもよろしいでしょうか？」), and the processing up to the output of the result as speech and/or text (step SJ3). The processor 21 then determines whether or not the conversation partner's order, given in response to the user's order inquiry, has been confirmed (completed) (step SJ4). Specifically, the order is determined to be confirmed (1) when the content of the conversation (input speech) so far includes a product name (menu name), an order quantity, and a closing, or (2) when the content of the conversation so far includes a product name and an operation indicating the end of the conversation has been performed.

The first embodiment corresponds to case (1) above. More concretely, it proceeds, for example, as follows. First, the processor 21 performs morphological analysis on the content of the recognized input speech to obtain morphemes and, as necessary, retrieves the phrases, clauses, sentences, and the like of the input speech that were stored in the storage resource 23 during the multilingual translation processing. It then determines whether or not those morphemes, phrases, clauses, sentences, and the like match the preset product names (menu names), order quantities, and closings. That is, when the conversation so far contains items matching each of a product name, an order quantity, and a closing, the order is determined to be confirmed (Yes in step SJ4). In case (2), more concretely, the order is determined to be confirmed when the conversation so far contains an item matching a product name and an operation indicating the end of the conversation, such as a tap on the cancel button B1, has been performed (Yes in step SJ4). When neither (1) nor (2) applies, the order is determined not to be confirmed (No in step SJ4).
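
The two confirmation conditions of step SJ4 can be sketched as follows. This is a rough illustration only: simple substring and word matching on English text stands in for the morphological analysis named in the description, and the vocabulary tables (`MENU_NAMES`, `QUANTITY_WORDS`, `CLOSING_PHRASES`) and the `Conversation` class are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical preset vocabularies (the description only says they are preset).
MENU_NAMES = ("draft beer", "oolong tea")
QUANTITY_WORDS = {"one": 1, "two": 2, "three": 3}
CLOSING_PHRASES = ("thank you", "i will bring it right away")


@dataclass
class Conversation:
    utterances: List[str] = field(default_factory=list)
    end_operation: bool = False  # e.g. cancel button B1 was tapped


def find_item(text: str) -> Optional[str]:
    return next((name for name in MENU_NAMES if name in text), None)


def find_quantity(text: str) -> Optional[int]:
    # Accept either digits or a small set of number words.
    for word in re.findall(r"[a-z]+|\d+", text):
        if word.isdigit():
            return int(word)
        if word in QUANTITY_WORDS:
            return QUANTITY_WORDS[word]
    return None


def has_closing(text: str) -> bool:
    return any(phrase in text for phrase in CLOSING_PHRASES)


def order_confirmed(conv: Conversation) -> bool:
    text = " ".join(conv.utterances).lower()
    has_item = find_item(text) is not None
    # (1) product name + order quantity + closing
    cond1 = has_item and find_quantity(text) is not None and has_closing(text)
    # (2) product name + operation indicating the end of the conversation
    cond2 = has_item and conv.end_operation
    return cond1 or cond2
```

Evaluating the whole conversation so far on each turn, rather than only the latest utterance, mirrors the wording "the content of the conversation so far" in the description.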

At this point the order inquiry ("May I have your order, please?") has only just been uttered, and the content of the conversation includes none of a product name, an order quantity, or a closing (neither (1) nor (2) applies). The processor 21 therefore determines that the order has not been confirmed (No in step SJ4), and the processing moves on to the conversation partner's utterance (step SU2).

Next, in answer to the inquiry, the conversation partner places an order by uttering a menu name and a quantity (for example, "Two draft beer, please.", corresponding to 「生ビールを２つお願いします。」) (step SU2). The processor 21 performs speech recognition, multilingual translation, and the processing up to the output of the result as speech and/or text for that utterance (step SJ3), and then performs the order confirmation determination (step SJ4).

At this point the order content ("Two draft beer, please.") has been uttered, and the conversation so far includes a product name ("draft beer") and an order quantity ("two"), but it does not yet include a closing, and no operation indicating the end of the conversation, such as a tap on the cancel button B1, has been performed. The processor 21 therefore determines that the order has not been confirmed (No in step SJ4), and the processing returns to an utterance by the user (step SU2).

Next, in response to the conversation partner's order, the user utters a closing phrase (for example, 「ありがとうございます。」, "Thank you.", or 「只今お持ちします。」, "I will bring it right away.") (step SU2). The processor 21 performs speech recognition, multilingual translation, and the processing up to the output of the result as speech and/or text for that utterance (step SJ3), and then performs the order confirmation determination (step SJ4).

At this stage a closing ("Thank you.", "I will bring it right away.", or the like) has been uttered, so the conversation so far includes all of a product name ("draft beer"), an order quantity ("two"), and a closing. The processor 21 therefore determines that the order has been confirmed (Yes in step SJ4). The processor 21 then stores the menu name "draft beer" and its quantity "two" in an appropriate database in the storage resource 23 as a history of the order content obtained in the conversation (order history) (step SJ5). In this case, the processor 21 may hold the order history data in, for example, an order history database that is one of the databases D20. The user can then close the application as appropriate (step SU3).
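
Step SJ5, storing the confirmed order in an order history database, might look like the sketch below. The description does not say which database technology is used; SQLite (via Python's standard `sqlite3` module), the table name `order_history`, and its columns are assumptions chosen purely for illustration.

```python
import sqlite3
from typing import Optional


def open_order_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical order history database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS order_history (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            menu_name    TEXT NOT NULL,
            quantity     INTEGER,              -- may be NULL if not stated
            confirmed_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def store_order(conn: sqlite3.Connection,
                menu_name: str,
                quantity: Optional[int]) -> int:
    """Insert one confirmed order and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO order_history (menu_name, quantity) VALUES (?, ?)",
            (menu_name, quantity),
        )
    return cur.lastrowid


conn = open_order_history()
store_order(conn, "生ビール", 2)
print(conn.execute("SELECT menu_name, quantity FROM order_history").fetchall())
```

The quantity column is left nullable so that an order confirmed without a stated quantity can still be recorded.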

(Second Embodiment)
Next, another example (second embodiment) of the operation and behavior of the speech translation processing in the speech translation device 100 is described. In the second embodiment, the same processing as in the first embodiment is executed, except that, after the conversation partner has placed an order (menu name and quantity) in response to the user's order inquiry, the user does not utter a closing phrase but instead presses the cancel button B1 on the voice input screen shown in FIG. 4(C) to end the conversation. The second embodiment corresponds to case (2) above.

That is, for example, after the conversation partner has uttered the order content ("Two draft beer, please."), the user, without inputting a closing phrase, ends the order inquiry by conveying an appropriate acknowledgment to the partner orally or by an equivalent gesture (for example, the English for 「かしこまりました。」, "Certainly.", or 「ありがとうございます。」, "Thank you."; this acknowledgment may be omitted), and then taps the cancel button B1 to end the conversation conducted through the speech translation application (an operation indicating the end of the conversation). Because the operation of ending the conversation by tapping the cancel button B1 has been performed, the processor 21 skips step SJ3, which is the processing relating to input speech, and performs the order confirmation determination (step SJ4).

At this stage, the conversation so far includes a product name ("draft beer") and an operation indicating the end of the conversation, namely the tap on the cancel button B1, has been performed, so the processor 21 determines that the order has been confirmed (Yes in step SJ4). The processor 21 then stores the menu name "draft beer" and its quantity "two" in the storage resource 23 as a history of the order content obtained in the conversation (order history) (step SJ5). In this case as well, the processor 21 may hold the order history data in, for example, an order history database that is one of the databases D20. The user can then close the application as appropriate (step SU3).

(Third Embodiment)
Next, another example (third embodiment) of the operation and behavior of the speech translation processing in the speech translation device 100 is described. In the third embodiment, the same processing as in the second embodiment is executed, except that the conversation partner's order in response to the user's order inquiry consists of a menu name only; as before, the user does not utter a closing phrase but presses the cancel button B1 on the voice input screen shown in FIG. 4(C) to end the conversation. The third embodiment also corresponds to case (2) above.

That is, for example, after a conversation partner who has come to the store alone has uttered the order content ("draft beer"), the user, without inputting a closing phrase, ends the order inquiry by conveying an appropriate acknowledgment to the partner orally or by an equivalent gesture (for example, the English for 「かしこまりました。」, "Certainly.", or 「ありがとうございます。」, "Thank you."; this acknowledgment may be omitted), and then taps the cancel button B1 to end the conversation conducted through the speech translation application (an operation indicating the end of the conversation). Because the operation of ending the conversation by tapping the cancel button B1 has been performed, the processor 21 skips step SJ3, which is the processing relating to input speech, and performs the order confirmation determination (step SJ4).

At this stage, the conversation so far includes a product name ("draft beer") and an operation indicating the end of the conversation, namely the tap on the cancel button B1, has been performed, so the processor 21 determines that the order has been confirmed (Yes in step SJ4). The processor 21 then stores the menu name "draft beer" in the storage resource 23 as a history of the order content obtained in the conversation (order history) (step SJ5). At this time, the order quantity need not be stored; alternatively, a default order quantity of "one" may be set in advance and stored as the order quantity for the conversation. The user can then close the application as appropriate (step SU3).
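
The two alternatives for a missing quantity (store nothing, or fall back to a preset default of one) can be expressed as a small helper. `OrderRecord`, `build_order_record`, and the `use_default` flag are hypothetical names introduced for this sketch.

```python
from typing import NamedTuple, Optional

DEFAULT_QUANTITY = 1  # the preset default value of "one" mentioned above


class OrderRecord(NamedTuple):
    menu_name: str
    quantity: Optional[int]  # None means "quantity not stored"


def build_order_record(menu_name: str,
                       quantity: Optional[int],
                       use_default: bool = True) -> OrderRecord:
    """Fill in the default quantity only when none was obtained."""
    if quantity is None and use_default:
        quantity = DEFAULT_QUANTITY
    return OrderRecord(menu_name, quantity)


print(build_order_record("draft beer", None))
```

Keeping `None` distinct from the default lets later analysis tell an explicitly stated "one" apart from an assumed one, if that distinction matters.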

(Fourth Embodiment)
Next, another example (fourth embodiment) of the operation and behavior of the speech translation processing in the speech translation device 100 is described. FIG. 3 is a flowchart showing an example of (part of) the processing flow in the speech translation device 100 of the fourth to sixth embodiments. FIGS. 5(A) to 5(D) are plan views showing an example of the transition of display screens on the information terminal. In the fourth embodiment, the same processing as in the first embodiment is executed, except that steps SU4 and SJ6 are executed after step SJ2, step SU5 is executed in place of step SU2, and step SJ7 is executed in place of step SJ3. Step SJ7, which replaces step SJ3, is executed after step SJ4.

That is, when the user taps the suggest button 46 on the voice input standby screen shown in FIG. 4(B) (step SU4), the processor 11 of the information terminal 10 transmits to the server 20 a command signal for displaying a group of fixed phrases frequently used when serving customers in stores of the business type to which the user belongs. On receiving the command signal, the processor 21 of the server 20 accesses, for example, a phrase database included in the databases D20 stored in the storage resource 23, retrieves the relevant group of fixed phrases, creates display image data for a list of them, and transmits the data to the processor 11 of the information terminal 10. Based on the display image data, the processor 11 displays, for example, the initial phrase group screen shown in FIG. 5(A) on the display device 16 (step SJ8).

On the initial phrase group screen of FIG. 5(A), a plurality of Japanese phrase texts and the English phrase texts giving their English translations are displayed as a phrase list P1, with the two texts written side by side for each fixed phrase. As shown in FIG. 5(A), the phrase list P1 includes, for example, fixed phrases that are often uttered when a customer has just been seated. On this initial phrase group screen, Japanese text T4, indicating that the language selected as the conversation partner's language on the language selection screen of FIG. 4(A) (that is, the target language) is English, is displayed above the phrase list P1, and a close button B2 is displayed below it (the same applies to the screens described below). Tapping the close button B2 either ends the conversation or closes the phrase group screen and returns to the voice input standby screen of FIG. 4(B).

Next, when the user taps and selects, from the phrase list P1, the text T5 of a phrase inquiring about, for example, a drink order (for example, 「お飲み物はいかがなさいますか？」, "What would you like to drink?") (step SU5), the processor 11 of the information terminal 10 transmits the selection command signal to the processor 21 of the server 20. On receiving it, the processor 21 returns voice output data for the English translation of the text T5 to the processor 11, and the processor 11 outputs the speech from the voice input/output device 13. The processor 21 further determines whether or not the conversation partner's order, given in response to the user's order inquiry, has been confirmed (completed) (step SJ4).

Specifically, the order is determined to be confirmed (1) when the content of the fixed phrases selected so far includes a product name (menu name), an order quantity, and a closing, or (2) when the content of the fixed phrases selected so far includes a product name (menu name) and an operation indicating the end of the conversation has been performed. The fourth embodiment corresponds to case (1) above.
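
When the conversation proceeds by selecting fixed phrases rather than by free speech, the same determination can operate on what has been selected so far. The sketch below assumes, hypothetically, that each selection carries a kind label ("inquiry", "menu", "quantity", or "closing"); the description does not state how selections are represented, and the class and method names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Selection:
    kind: str        # "inquiry", "menu", "quantity", or "closing" (assumed labels)
    value: str = ""


@dataclass
class PhraseSession:
    selections: List[Selection] = field(default_factory=list)
    closed: bool = False  # True once the close button B2 has been tapped

    def select(self, kind: str, value: str = "") -> None:
        self.selections.append(Selection(kind, value))

    def first(self, kind: str) -> Optional[str]:
        """Return the value of the first selection of the given kind, if any."""
        for selection in self.selections:
            if selection.kind == kind:
                return selection.value
        return None

    def order_confirmed(self) -> bool:
        has_menu = self.first("menu") is not None
        has_quantity = self.first("quantity") is not None
        has_closing = self.first("closing") is not None
        # (1) menu + quantity + closing, or (2) menu + end-of-conversation operation
        return has_menu and ((has_quantity and has_closing) or self.closed)


session = PhraseSession()
session.select("inquiry", "What would you like to drink?")
session.select("menu", "draft beer")
session.select("quantity", "2")
print(session.order_confirmed())  # no closing selected yet
session.select("closing", "Certainly.")
print(session.order_confirmed())
```

Because the selections are already categorized, no text analysis is needed here, which is one practical difference from the free-speech case.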

At this point the order inquiry ("What would you like to drink?") has only just been selected, and the content of the fixed phrases selected so far includes none of a product name (menu name), an order quantity, or a closing. The processor 21 therefore determines that the order has not been confirmed (No in step SJ4) and proceeds to the display of a phrase group screen from which the conversation partner can make a selection (step SJ7).

Next, the processor 21 accesses the phrase database again, retrieves a plurality of other fixed phrases associated with the fixed phrase of the text T5, creates display image data for a list of those fixed phrases, and transmits the data to the processor 11 of the information terminal 10. Based on the display image data, the processor 11 displays, for example, the phrase group screen shown in FIG. 5(B) on the display device 16 (step SJ7).
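
The association between a selected fixed phrase and the follow-up phrases shown next can be modeled as a lookup table. The in-memory dictionary `PHRASE_DB`, its keys, the `next` field, and the sample phrases other than the drink inquiry and "draft beer" are hypothetical; the description only states that such associations are held in a phrase database.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase database: each entry has bilingual text and follow-up ids.
PHRASE_DB: Dict[str, dict] = {
    "ask_drink": {
        "ja": "お飲み物はいかがなさいますか？",
        "en": "What would you like to drink?",
        "next": ["draft_beer", "oolong_tea", "request_menu"],
    },
    "draft_beer": {"ja": "生ビール", "en": "Draft beer", "next": []},
    "oolong_tea": {"ja": "ウーロン茶", "en": "Oolong tea", "next": []},
    "request_menu": {
        "ja": "メニューを見せてください。",
        "en": "May I see the menu?",
        "next": [],
    },
}


def follow_up_list(phrase_id: str) -> List[Tuple[str, str, str]]:
    """Return (id, Japanese text, English text) for each associated phrase."""
    return [
        (pid, PHRASE_DB[pid]["ja"], PHRASE_DB[pid]["en"])
        for pid in PHRASE_DB[phrase_id]["next"]
    ]


for pid, ja, en in follow_up_list("ask_drink"):
    print(f"{ja} / {en}")
```

Returning both language texts per entry matches the side-by-side presentation of Japanese and English in the phrase lists.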

On the phrase group screen of FIG. 5(B), a phrase list P2 containing a plurality of phrase texts is displayed in the same form as the phrase list P1 shown in FIG. 5(A). As shown in FIG. 5(B), the phrase list P2 includes a plurality of product names (drink menu names) and also includes text T6 with which the conversation partner asks the user for a menu. The user can take the order by presenting this phrase group screen to the partner and having the partner tap the menu name of the desired drink. Alternatively, if the menu name of the drink the partner wants is not in the phrase list P2, the partner can tap the text T6 to ask the user, who is the store clerk, to show the menu.

When the conversation partner taps and selects, from the phrase list P2, the text of the phrase representing the desired menu name (for example, "draft beer") (step SU5), the processor 11 of the information terminal 10 transmits the selection command signal to the processor 21 of the server 20. On receiving it, the processor 21 returns Japanese voice output data for that text to the processor 11, and the processor 11 outputs the speech from the voice input/output device 13. The processor 21 further determines whether or not the conversation partner's order, given in response to the user's order inquiry, has been confirmed (completed) (step SJ4).

At this point a menu name ("draft beer") has been selected, but the content of the fixed phrases selected so far includes neither an order quantity nor a closing, and no operation of ending the conversation by tapping the close button B2 has been performed. The processor 21 therefore determines that the order has not been confirmed (No in step SJ4).

Next, the processor 21 accesses the phrase database again and, because a menu name has been selected, retrieves or creates display image data for a quantity input screen on which the conversation partner is asked to enter the quantity, and transmits the data to the processor 11 of the information terminal 10. Based on the display image data, the processor 11 displays, for example, the order quantity input screen shown in FIG. 5(C) on the display device 16 (step SJ7). The quantity input screen of FIG. 5(C) displays numeric keys 51 for entering a number. When the user presents this quantity input screen to the partner and the partner taps the screen to enter the order quantity (here, for example, two) (step SU5), the number is displayed in a column 52. The processor 21 then determines whether or not the conversation partner's order, given in response to the user's order inquiry, has been confirmed (completed) (step SJ4).

At this point the order quantity ("two") for the menu name ("draft beer") has been entered, but the content of the fixed phrases selected so far does not include a closing, and no operation of ending the conversation by tapping the close button B2 has been performed. The processor 21 therefore determines that the order has not been confirmed (No in step SJ4).

Next, the processor 21 accesses the phrase database again, retrieves a plurality of other fixed phrases associated with the entry of a quantity, creates display image data for a list of those fixed phrases, and transmits the data to the processor 11 of the information terminal 10. Based on the display image data, the processor 11 displays, for example, the phrase group screen shown in FIG. 5(D) on the display device 16 (step SJ7).

On the phrase group screen of FIG. 5(D) as well, a phrase list P3 containing a plurality of phrase texts is displayed in the same form as the phrase list P1 shown in FIG. 5(A). As shown in FIG. 5(D), the phrase list P3 includes closing phrases that are often uttered on receiving an order or request from a customer. When the user taps and selects the text of the desired phrase from the phrase list P3 (step SU5), the English translation of that phrase is output as speech in the same manner as in the processing described so far. The processor 21 further determines whether or not the conversation partner's order, given in response to the user's order inquiry, has been confirmed (completed) (step SJ4).

At this stage, a closing phrase (such as "Certainly." or "I will bring it right away.") has been selected, so the fixed phrases selected so far contain all of the product name ("draft beer"), the order quantity ("2"), and a closing. The processor 21 therefore determines that the order has been confirmed (Yes in step SJ4). The processor 21 then stores the menu name "draft beer" and its quantity "2" in the storage resource 23 as the history of the order contents acquired in that conversation (order history) (step SJ5). In this case, the processor 21 may hold the order history data in, for example, an order history database as one of the databases D20. The user can then exit the application as appropriate (step SU3).

(Fifth embodiment)
Next, another example of the operation and behavior of speech translation processing in the speech translation device 100 (fifth embodiment) will be described. In the fifth embodiment, the same processing as in the fourth embodiment is executed, except that after the interlocutor places an order (menu name and quantity) in response to the user's order inquiry, the user, without uttering a closing phrase, presses the close button B2 on the phrase group display screen shown in FIG. 5(C) or FIG. 5(D) to end the conversation. The fifth embodiment corresponds to case (2) above.

That is, for example, after the interlocutor utters the order ("Two draft beers, please."), the user, without entering a closing phrase, ends the order inquiry by conveying an appropriate acknowledgment to the interlocutor orally or by an equivalent gesture (for example, the English for "Certainly." or "Thank you."; this step may be omitted), and taps the close button B2 to end the conversation in the speech translation application (an operation indicating the end of the conversation). The processor 21 then performs the order confirmation determination (step SJ7).

At this stage, the fixed phrases selected so far contain the product name ("draft beer"), and an operation indicating the end of the conversation (a tap on the close button B2) has been performed, so the processor 21 determines that the order has been confirmed (Yes in step SJ7). The processor 21 then stores the menu name "draft beer" and its quantity "2" in the storage resource 23 as the history of the order contents acquired in that conversation (order history) (step SJ5). In this case as well, the processor 21 may hold the order history data in, for example, an order history database as one of the databases D20. The user can then exit the application as appropriate (step SU3).

(Sixth embodiment)
Next, yet another example of the operation and behavior of speech translation processing in the speech translation device 100 (sixth embodiment) will be described. In the sixth embodiment, the same processing as in the fifth embodiment is executed, except that after the interlocutor places an order (menu name only) in response to the user's order inquiry, the user, without uttering a closing phrase, presses the close button B2 on the phrase group display screen shown in FIG. 5(B) or FIG. 5(C) to end the conversation. The sixth embodiment also corresponds to case (2) above.

That is, for example, after an interlocutor who has come to the store alone utters the order ("draft beer"), the user does not ask the interlocutor to enter a quantity, ends the order inquiry by conveying an appropriate acknowledgment to the interlocutor orally or by an equivalent gesture (for example, the English for "Certainly." or "Thank you."; this step may be omitted), and taps the close button B2 to end the conversation in the speech translation application (an operation indicating the end of the conversation). The processor 21 then performs the order confirmation determination (step SJ7).

At this stage, the fixed phrases selected so far contain the product name ("draft beer"), and an operation indicating the end of the conversation (a tap on the close button B2) has been performed, so the processor 21 determines that the order has been confirmed (Yes in step SJ4). The processor 21 then stores the menu name "draft beer" in the storage resource 23 as the history of the order contents acquired in that conversation (order history) (step SJ5). At this time, the order quantity need not be stored; alternatively, a default order quantity of "1" may be set in advance and stored as the order quantity for that conversation. The user can then exit the application as appropriate (step SU3).
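The two storage options for an order without a quantity can be illustrated with a short sketch. It is not taken from the specification: the function name `build_order_record`, the dictionary layout, and the `use_default` flag are hypothetical, and the default of 1 simply mirrors the "1" mentioned in the text.

```python
from typing import Optional

DEFAULT_QUANTITY = 1  # assumed default, mirroring the "1" mentioned in the text


def build_order_record(menu_name: str, quantity: Optional[int],
                       use_default: bool = True) -> dict:
    """Build the order-history entry stored once the order is confirmed.

    If no quantity was entered, either fall back to the default quantity
    or leave the quantity out of the record entirely.
    """
    if quantity is None and use_default:
        quantity = DEFAULT_QUANTITY
    record = {"menu_name": menu_name}
    if quantity is not None:
        record["quantity"] = quantity
    return record
```

With `use_default=False` the record holds only the menu name, which corresponds to the option of not storing the order quantity.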

According to the speech translation device 100 configured as described above, and the speech translation method and speech translation program using it, the order is determined to have been confirmed (1) when the content of the conversation during the user's service to the interlocutor (foreign customer), or the content of the selected fixed phrases, contains items matching all of the product name (menu name; for example, "draft beer"), the order quantity (for example, "2"), and the closing. The order is also determined to have been confirmed (2) when that conversation content, or the content of the selected fixed phrases, contains a product name and there is an operation indicating the end of the conversation (for example, a tap on the cancel button B1 or the close button B2). Thus, according to the present invention, when the user serves an interlocutor (foreign customer), order information can be acquired simply and effectively by analyzing whether the dialogue between the two contains the predetermined elements, while smooth communication is maintained through conversation. As a result, the invention can contribute to optimizing customer service and to improving the sales and profits of the user's store.
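The two confirmation conditions can be sketched as a small predicate. This is an illustrative sketch only, not the implementation described in the specification: the `Conversation` class and its field names are hypothetical, and it assumes the product name, quantity, closing, and end-of-conversation operation have already been extracted from the input speech or the selected fixed phrases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    """Hypothetical summary of what has been detected in one conversation."""
    product_name: Optional[str] = None  # e.g. "draft beer"
    quantity: Optional[int] = None      # e.g. 2
    has_closing: bool = False           # a closing phrase was spoken or selected
    end_operation: bool = False         # e.g. the close button B2 was tapped


def is_order_confirmed(conv: Conversation) -> bool:
    """Return True when condition (1) or condition (2) holds."""
    has_product = conv.product_name is not None
    # (1) product name, order quantity, and closing are all present
    cond1 = has_product and conv.quantity is not None and conv.has_closing
    # (2) at least a product name, plus an operation indicating
    #     the end of the conversation
    cond2 = has_product and conv.end_operation
    return cond1 or cond2
```

Under these assumptions, the fourth embodiment satisfies condition (1), while the fifth and sixth embodiments satisfy condition (2).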

Furthermore, as shown in FIGS. 5(A) to 5(D), fixed phrases associated with the previously selected fixed phrase are displayed one after another as separate screens on the display device 16. This makes it easy to select the desired phrase simply and accurately as the conversation progresses, so the conversation between the user and the interlocutor can proceed naturally and smoothly. In addition, if the phrases set in each of the phrase lists P1 to P3 are preset for each industry to which the user belongs, conversations specialized for customer service in that industry can be carried out more smoothly and appropriately. Such fixed phrases can also be preset for each of the user's stores, in which case more fine-grained customer service that reflects each store's characteristics and circumstances becomes possible.

Here, the fixed phrases may be set automatically or manually. As an example of automatic setting, the user's industry is first registered as one item of user information when the translation application is put into use, and the processor 21 of the server 20 selects the most heavily used fixed phrases from a corpus or history of fixed phrases that appear frequently in conversations in that industry and sets them as a phrase list. Alternatively, fixed phrases uttered by a plurality of users in the same industry may be stored in an appropriate database together with their utterance frequencies, and the processor 21 of the server 20 may select the most heavily used of those fixed phrases and set them as a phrase list. As an example of manual setting, the user selects the desired fixed phrases and customizes them as a phrase list.
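The frequency-based automatic setting can be sketched as follows. This is a minimal illustration under assumptions not found in the specification: the utterance log is modeled as a hypothetical list of (industry, phrase) pairs, and the list size is an arbitrary parameter.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_phrase_list(utterances: Iterable[Tuple[str, str]],
                      industry: str, size: int = 5) -> List[str]:
    """Pick the `size` most frequently uttered phrases for one industry.

    `utterances` is a hypothetical log of (industry, phrase) pairs
    collected from multiple users.
    """
    counts = Counter(phrase for ind, phrase in utterances if ind == industry)
    return [phrase for phrase, _ in counts.most_common(size)]
```

A manually customized list would simply bypass this selection and use the phrases chosen by the user.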

In doing so, the fixed phrases included in, for example, the phrase lists P1 to P3 may be kept as initially set (a fixed phrase list), or they may be changed as needed. In the latter case in particular, the number of times each fixed phrase has been selected may be stored in the storage resource 23, and the processor 21 of the server 20 may display phrases with higher selection counts at a higher rank on the display screens of the phrase lists P1 to P3 (for example, nearer the top of the screen, or highlighted or enlarged). This makes it easier to display and select fixed phrases that suit the user's industry and the actual circumstances of the store, and it further speeds up communication between the user and the interlocutor.
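Reordering a phrase list by selection count can be sketched as below. The function name and the dictionary of counts are hypothetical; the specification only states that selection counts are stored and that more frequently selected phrases are displayed at a higher rank.

```python
from typing import Dict, List


def order_by_selection_count(phrases: List[str],
                             selection_counts: Dict[str, int]) -> List[str]:
    """Return the phrases with more frequently selected ones first.

    sorted() is stable, so phrases with equal counts keep their
    initial relative order; phrases never selected count as 0.
    """
    return sorted(phrases,
                  key=lambda phrase: selection_counts.get(phrase, 0),
                  reverse=True)
```

Keeping the initial order for ties means that a list with no recorded selections is displayed exactly as initially set.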

In addition, for each ordered product (menu item), the number of times it was selected in a given past period (the order quantity through the speech translation application, the accounting quantity at the user's store, or the order quantity or accounting quantity at a plurality of stores including the user's store), or the profit rate of each product (menu item), may be stored in the storage resource 23, and the processor 21 of the server 20 may display products with higher selection counts or higher profit rates at a higher rank on, for example, the display screen of the phrase list P2 shown in FIG. 5(B). This makes it possible to actively recommend popular products and products with a high spend per customer to the interlocutor (foreign customer), and as a result to further improve the sales and profits of the user's store.
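Ranking menu items by either criterion can be sketched as follows. The `MenuItem` class, its field names, and the `by` parameter are hypothetical names introduced for illustration; the specification does not prescribe how the two criteria are stored or chosen between.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MenuItem:
    name: str
    times_selected: int  # selections over a past period (hypothetical field)
    profit_rate: float   # e.g. 0.4 for 40% (hypothetical field)


def rank_menu(items: List[MenuItem], by: str = "times_selected") -> List[str]:
    """Return menu names, highest-ranked first, by selection count or profit rate."""
    if by not in ("times_selected", "profit_rate"):
        raise ValueError(f"unsupported ranking key: {by}")
    ranked = sorted(items, key=lambda item: getattr(item, by), reverse=True)
    return [item.name for item in ranked]
```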

Furthermore, on the display screens of the phrase lists P1 to P3, the Japanese phrase text of each fixed phrase is shown together with an English phrase text giving its English translation, and the translation of each fixed phrase into a different language (for example, English) is output as speech. The user and the interlocutor can therefore confirm what the other party has said more reliably, in addition to, or even without, looking at the screen.

As noted above, each of the above embodiments is an example for explaining the present invention and is not intended to limit the present invention to that embodiment. The present invention can be modified in various ways without departing from its gist. For example, a person skilled in the art can replace the resources (hardware resources or software resources) described in the embodiments with equivalents, and such replacements are also included in the scope of the present invention.

In FIGS. 5(A), 5(B), and 5(D), the Japanese phrase text and the English phrase text may be displayed in opposite orientations (different orientations; upside down relative to each other in the figures), like the texts T1 and T2 in FIG. 4(A). Alternatively, only one of the Japanese phrase text and the English phrase text may be displayed instead of both. The quantity input shown in FIG. 5(C) may also be configured so that it can be performed by voice input. Moreover, regardless of the user's industry, the display of phrase lists set for other industries may be made selectable. The phrase list of product names (menu names) shown in FIG. 5(B) can also display menu items that use ingredients currently in large stock, or menu items specifically recommended by the user or the user's store.

Although an example has been described in which speech recognition, translation, speech synthesis, and other processing are executed by the server 20, these processes may instead be executed on the information terminal 10. In that case, the module L20 used for those processes may be stored in the storage resource 12 of the information terminal 10 or in the storage resource 23 of the server 20. Likewise, the database D20, which is a speech database, and/or the model M20, such as an acoustic model, may be stored in the storage resource 12 of the information terminal 10 or in the storage resource 23 of the server 20. Thus, the speech translation device need not include the network N and the server 20.

A gateway server or the like that converts the communication protocol between the information terminal 10 and the network N may of course be interposed between them. The information terminal 10 is not limited to a portable device and may be, for example, a desktop personal computer, a notebook personal computer, a tablet personal computer, or a laptop personal computer.

According to the present invention, when a user serves an interlocutor (foreign customer), order information can be acquired simply and effectively by analyzing the content of the dialogue. The invention can therefore be widely used in activities such as the design, manufacture, provision, and sale of programs, devices, systems, and methods in the field of services relating to conversation between people who do not understand each other's languages.

10 Information terminal
11 Processor
12 Storage resource
13 Voice input/output device
14 Communication interface
15 Input device
16 Display device
17 Camera
20 Server
21 Processor
22 Communication interface
23 Storage resource
41 Language button
42a Japanese input button
42b English input button
43 Voice button
44 Language selection button
45 History button
46 Suggest button
47 Setting button
48 Microphone design
49 Multiple circular design
50 Input switching button
51 Number keys
52 Column
100 Speech translation device
B1 Cancel button
B2 Close button
D20 Database
L20 Module
M20 Model
N Network
P1 to P3 Phrase lists
P10 Program
P20 Program
T1 to T6 Texts

Claims

A speech translation device comprising:
an input unit for inputting the speech of a store clerk who makes an order inquiry and of a foreign customer who places an order;
a translation unit that translates the content of the input speech into content in a different language;
an output unit that outputs the translated content of the input speech as speech and/or text;
a display unit that displays a list of fixed phrases that include only product names and fixed phrases that do not include product names, so that the store clerk and/or the foreign customer can select from them;
a determination unit that determines whether an order by the foreign customer in response to the order inquiry from the store clerk has been confirmed; and
a storage unit that stores the confirmed content of the order when the order is determined to have been confirmed,
wherein the determination unit determines that the order has been confirmed (1) when the content of the input speech or the content of the selected fixed phrases includes a product name, an order quantity, and a closing by the store clerk, or (2) when the content of the input speech or the content of the selected fixed phrases includes at least a product name and there is an operation by the store clerk, other than the closing, indicating the end of the conversation.

The speech translation device according to claim 1, wherein, when a product name is selected by the foreign customer, the display unit displays a quantity input screen that does not include the product name, for the foreign customer to enter the order quantity.

The speech translation device according to claim 1 or 2, wherein the display unit displays a list of the product names and displays product names having a larger total of the order quantities stored in the storage unit at a higher rank in the list.

The speech translation device according to any one of claims 1 to 3, wherein the storage unit stores an accounting quantity for each product name and/or a profit rate for each product name, and the display unit displays a list of the product names and displays product names having a larger total of the accounting quantities stored in the storage unit and/or product names having a higher profit rate at a higher rank in the list.

The speech translation device according to claim 1, wherein the fixed phrases are preset, automatically or manually, for each industry to which the store clerk belongs or for each store of the store clerk.

The speech translation device according to claim 1, wherein the display unit displays a translation of each fixed phrase in a different language, or the output unit outputs a translation of each fixed phrase in a different language as speech.

A speech translation method using a speech translation device including an input unit, a translation unit, an output unit, a display unit, a determination unit, and a storage unit, the method comprising:
a step in which the input unit inputs the speech of a store clerk who makes an order inquiry and/or of a foreign customer who places an order;
a step in which the translation unit translates the content of the input speech into content in a different language;
a step in which the output unit outputs the translated content of the input speech as speech and/or text;
a step in which the display unit displays a list of fixed phrases that include only product names and fixed phrases that do not include product names, so that the store clerk and/or the foreign customer can select from them;
a step in which the determination unit determines whether an order by the foreign customer in response to the order inquiry from the store clerk has been confirmed; and
a step in which the storage unit stores the confirmed content of the order when the order is determined to have been confirmed,
wherein, in the determining step, the order is determined to have been confirmed (1) when the content of the input speech or the content of the selected fixed phrases includes a product name, an order quantity, and a closing by the store clerk, or (2) when the content of the input speech or the content of the selected fixed phrases includes at least a product name and there is an operation by the store clerk, other than the closing, indicating the end of the conversation.

A speech translation program that causes a computer to function as:
an input unit for inputting the speech of a store clerk who makes an order inquiry and/or of a foreign customer who places an order;
a translation unit that translates the content of the input speech into content in a different language;
an output unit that outputs the translated content of the input speech as speech and/or text;
a display unit that displays a list of fixed phrases that include only product names and fixed phrases that do not include product names, so that the store clerk and/or the foreign customer can select from them;
a determination unit that determines whether an order by the foreign customer in response to the order inquiry from the store clerk has been confirmed; and
a storage unit that stores the confirmed content of the order when the order is determined to have been confirmed,
wherein the determination unit determines that the order has been confirmed (1) when the content of the input speech or the content of the selected fixed phrases includes a product name, an order quantity, and a closing by the store clerk, or (2) when the content of the input speech or the content of the selected fixed phrases includes at least a product name and there is an operation by the store clerk, other than the closing, indicating the end of the conversation.