JP6174746B1

JP6174746B1 - Speech translation device, speech translation method, and speech translation program

Info

Publication number: JP6174746B1
Application number: JP2016066152A
Authority: JP
Inventors: 千春宇賀神
Original assignee: RECRUIT LIFESTYLE CO., LTD.
Current assignee: RECRUIT LIFESTYLE CO., LTD.
Priority date: 2016-03-29
Filing date: 2016-03-29
Publication date: 2017-08-02
Anticipated expiration: 2036-03-29
Also published as: JP2017181662A

Abstract

[Problem] To let customer-service conversations with foreign customers proceed naturally and smoothly, and thereby optimize customer service. [Means for Solving] A speech translation device according to one aspect of the present invention includes:
- an input unit for inputting the speech of a user or the like;
- a translation unit that translates the content of the input speech into content in a different language;
- an output unit that outputs the translated content of the input speech as speech or the like;
- a storage unit that hierarchically stores an upper phrase group and a plurality of lower phrase groups; and
- a display unit.

The upper phrase group includes at least one upper phrase. Each lower phrase group includes at least one lower phrase associated with an upper phrase. The display unit shows the upper phrase group. When a specific phrase is selected from among the upper phrases, it shows the lower phrase group containing the lower phrases associated with that phrase, and it repeats this process hierarchically and sequentially. [Selected Drawing] FIG. 3

Description

The present invention relates to a speech translation device, a speech translation method, and a speech translation program.

Some people do not understand each other's language, for example a store clerk and a foreign customer. Speech translation techniques have been proposed to let them converse (for example, Patent Document 1). These techniques convert the speaker's utterance into text and machine-translate that text into the other party's language. The result is then either displayed on a screen or played back as speech using speech synthesis. Speech translation applications that embody such techniques and run on information terminals such as smartphones have also been put into practical use (for example, Non-Patent Document 1). A further known translation application lets the user select the situation in which they want to converse and then lists conversation patterns for that purpose (for example, Non-Patent Document 2).

Patent Document 1: Japanese Patent Laid-Open No. H9-34895

Non-Patent Document 1: U-STAR Consortium homepage [retrieved February 23, 2016], Internet <URL: http://www.ustar-consortium.com/app_ja/app.html>
Non-Patent Document 2: "Conversation simulation is possible! A Japanese/Chinese/Korean/English translation app, '[Travel App No. 1] TS Conversation Translator [CJK]'" [retrieved February 23, 2016], Internet <URL: http://andronavi.com/2012/08/208376>

Consider, for example, the translation application described in Non-Patent Document 2. It presents each purpose-specific conversation pattern by listing, on a single screen, several question sentences together with several answer sentences for each question. The speaker picks the desired sentence from these examples. Because the speaker simply selects and utters one example sentence, the conversation becomes a series of one-off exchanges. Moreover, with all the example sentences listed on one screen, the speaker has to scroll and search each time to find the desired one. As a result, in customer service, for example, a connected series of exchanges cannot be carried out naturally and smoothly, and appropriate service is hard to provide.

The present invention has been made in view of these circumstances. Its object is to provide a speech translation device, a speech translation method, and a speech translation program that let a conversation between a user and a conversation partner (a foreign customer) during customer service proceed naturally and smoothly, thereby contributing to the optimization of customer service.

To solve the above problems, a speech translation device according to one aspect of the present invention includes:
- an input unit for inputting the speech of a user and/or a conversation partner;
- a translation unit that translates the content of the input speech into content in a different language; and
- an output unit that outputs the translated content (the translation) of the input speech as speech and/or text.

The speech translation device further includes a storage unit and a display unit. The storage unit hierarchically stores an upper phrase group and a plurality of lower phrase groups. The upper phrase group includes at least one upper phrase, and each lower phrase group includes at least one lower phrase associated with an upper phrase. The display unit shows the upper phrase group. When a specific phrase is selected from among the upper phrases, it shows the lower phrase group associated with that phrase, and it repeats this process hierarchically and sequentially.

A "phrase" here includes sentences, clauses, phrases, words, and numbers, and may be accompanied by images or symbols. In other words, the device prepares what amounts to a tree diagram of the phrase groups. It then provides flow processing that sequentially (repeatedly) displays the groups level by level and accepts a phrase selection at each level.
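As an illustration only, the tree of phrase groups and the select-then-display flow described above might be modeled as follows. This is a minimal Python sketch under stated assumptions, not the patent's implementation:
- The `Phrase` class, the `walk_flow` function, and the sample restaurant phrases are hypothetical.
- `print` stands in for the display unit.
- A list of indices stands in for the user's successive selections.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phrase:
    """Hypothetical tree node: a phrase plus the lower phrase group linked to it."""
    text: str
    children: List["Phrase"] = field(default_factory=list)


def walk_flow(top_group: List[Phrase], selections: List[int]) -> List[str]:
    """Show a phrase group, apply one selection, then move down to the selected
    phrase's lower group; repeat until the selections run out or a leaf is
    reached. Returns the selected phrase texts in order."""
    chosen: List[str] = []
    group = top_group
    for index in selections:
        if not group:
            break  # the previously selected phrase has no lower phrase group
        print("Display:", [p.text for p in group])  # stand-in for the display unit
        phrase = group[index]
        chosen.append(phrase.text)
        group = phrase.children
    return chosen


# Hypothetical phrase tree for a restaurant (illustrative content only).
TREE = [
    Phrase("How many people?", [Phrase("Please follow me."),
                                Phrase("Please wait a moment.")]),
    Phrase("Are you ready to order?", [
        Phrase("Please point to the item on the menu.", [Phrase("Anything else?")]),
    ]),
]

print(walk_flow(TREE, [1, 0, 0]))
```

Each selection only ever shows the children of the previously chosen phrase. This is what keeps each screen short, in contrast to listing every question and answer on a single page.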

More specifically, the display unit may be configured to display the upper phrase group and the lower phrase group on separate screens.

The upper phrases may also include a specific question item. When that question item is selected, the display unit may display a screen for inputting an answer to it.

The upper and lower phrases may also be set in advance, automatically or manually, for each type of business to which the user belongs or for each of the user's stores.
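One way to picture presets "for each type of business or for each store" is a two-level lookup. The sketch below is hypothetical throughout: the dictionaries, store IDs, business-type keys, and `preset_upper_phrases` are illustrative. The rule that a store-specific set overrides the business-type default is an assumption; the text only says either kind of preset may be used.

```python
from typing import Dict, List

# Hypothetical preset phrase sets; keys and phrases are illustrative only.
PHRASES_BY_BUSINESS_TYPE: Dict[str, List[str]] = {
    "restaurant": ["How many people?", "Are you ready to order?"],
    "apparel": ["Are you looking for anything in particular?",
                "Would you like to try it on?"],
}
PHRASES_BY_STORE: Dict[str, List[str]] = {
    "store-001": ["Do you have a reservation?", "How many people?"],
}


def preset_upper_phrases(store_id: str, business_type: str) -> List[str]:
    """Return the preset upper phrases for a store.

    Assumption: a store-specific set, if configured, takes precedence over the
    default set for the store's business type; unknown types get an empty list.
    """
    if store_id in PHRASES_BY_STORE:
        return list(PHRASES_BY_STORE[store_id])
    return list(PHRASES_BY_BUSINESS_TYPE.get(business_type, []))


print(preset_upper_phrases("store-001", "restaurant"))
print(preset_upper_phrases("store-999", "restaurant"))
```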

Further, the storage unit may store how many times each upper phrase and each lower phrase has been selected. The display unit may then show more frequently selected upper phrases at higher positions on the upper phrase group screen, and more frequently selected lower phrases at higher positions on the lower phrase group screen.
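The frequency-based ordering described above amounts to a stable descending sort on a stored counter. In this minimal Python sketch, `record_selection`, `rank_by_selection_count`, and the sample phrases are hypothetical. Breaking ties by keeping the preset order is an assumption the text does not specify.

```python
from collections import Counter
from typing import List


def record_selection(counts: Counter, phrase: str) -> None:
    """Increment the stored selection count for a phrase."""
    counts[phrase] += 1


def rank_by_selection_count(group: List[str], counts: Counter) -> List[str]:
    """Order a phrase group so that more frequently selected phrases come first.

    Phrases never selected count as 0 (Counter returns 0 for missing keys), and
    because sorted() is stable even with reverse=True, ties keep their preset order.
    """
    return sorted(group, key=lambda phrase: counts[phrase], reverse=True)


selection_counts: Counter = Counter()
for chosen in ["Are you ready to order?", "Would you like some water?",
               "Are you ready to order?"]:
    record_selection(selection_counts, chosen)

lower_group = ["How many people?", "Would you like some water?",
               "Are you ready to order?"]
print(rank_by_selection_count(lower_group, selection_counts))
```

The same function serves both the upper and the lower phrase group screens, since each is just a list of phrases ranked against the same counter.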

Furthermore, the specific phrase may be an order inquiry, and the lower phrase group containing its associated lower phrases may be a list of orderable items. In that case, the storage unit may store either the number of times each item has been selected or the profit margin of each item. The display unit may then show items selected more often, or items with a higher profit margin, at higher positions on the lower phrase group screen.
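For the order-inquiry case, the two ranking criteria in the text can be sketched as a switchable sort key. The `OrderItem` class, the `by` parameter values, and the sample menu figures are hypothetical and serve only to show how either stored value could drive the display order.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import List


@dataclass
class OrderItem:
    """Hypothetical menu entry shown as a lower phrase under an order inquiry."""
    name: str
    times_selected: int
    profit_margin: float  # fraction of the price, e.g. 0.45 for 45%


def rank_order_items(items: List[OrderItem], by: str = "count") -> List[OrderItem]:
    """Place items selected more often (by="count") or items with a higher
    profit margin (by="margin") higher in the lower phrase group."""
    if by == "count":
        key = attrgetter("times_selected")
    elif by == "margin":
        key = attrgetter("profit_margin")
    else:
        raise ValueError(f"unsupported ranking criterion: {by!r}")
    return sorted(items, key=key, reverse=True)


menu = [OrderItem("Ramen", 12, 0.30), OrderItem("Gyoza", 20, 0.45),
        OrderItem("Draft beer", 8, 0.60)]
print([item.name for item in rank_order_items(menu, by="count")])
print([item.name for item in rank_order_items(menu, by="margin")])
```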

Furthermore, the display unit may display a translation of each upper and lower phrase into a different language, or the output unit may output that translation as speech. The translation may be stored in the storage unit in advance. Alternatively, the translation unit may translate the phrase each time an upper or lower phrase is selected.
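The two alternatives for obtaining a phrase's translation can be combined in one lookup: use the pre-stored translation if there is one, otherwise translate on the spot. This is a hedged sketch, not the patent's implementation:
- `phrase_translation`, the `Translator` signature, and the fake translator are hypothetical stand-ins for the storage unit and translation unit.
- The Japanese samples are いらっしゃいませ ("Welcome") and 何名様ですか ("How many people?").
- On-demand results are deliberately not cached, to mirror the "each time" wording.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface for the translation unit: (text, source, target) -> text.
Translator = Callable[[str, str, str], str]


def phrase_translation(phrase: str, source: str, target: str,
                       stored: Dict[Tuple[str, str], str],
                       translate: Translator) -> str:
    """Return the translation stored in advance for (phrase, target) if present;
    otherwise have the translation unit translate the phrase now, i.e. on
    every selection (no caching)."""
    pre_stored = stored.get((phrase, target))
    if pre_stored is not None:
        return pre_stored
    return translate(phrase, source, target)


# Usage with a fake translator that only tags the text and logs each call.
calls: List[str] = []


def fake_translate(text: str, source: str, target: str) -> str:
    calls.append(text)
    return f"[{source}->{target}] {text}"


stored = {("いらっしゃいませ", "en"): "Welcome."}
print(phrase_translation("いらっしゃいませ", "ja", "en", stored, fake_translate))
print(phrase_translation("何名様ですか", "ja", "en", stored, fake_translate))
```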

A speech translation method according to one aspect of the present invention uses a speech translation device that includes an input unit, a translation unit, an output unit, a storage unit, and a display unit. The method includes the following steps:
- inputting the speech of a user and/or a conversation partner;
- the translation unit translating the content of the input speech into content in a different language;
- the output unit outputting the translated content of the input speech as speech and/or text;
- the storage unit hierarchically storing an upper phrase group and a plurality of lower phrase groups; the upper phrase group includes at least one upper phrase, and each lower phrase group includes at least one lower phrase associated with an upper phrase; and
- the display unit showing the upper phrase group and, when a specific phrase is selected from among the upper phrases, showing the lower phrase group containing the lower phrases associated with that phrase, repeating this process hierarchically and sequentially.

A speech translation program according to one aspect of the present invention causes a computer to function as the following units. Here, "computer" is not limited to a single computer or a single type of computer; several computers or types may be used, and the same applies below.
- An input unit for inputting the speech of a user and/or a conversation partner.
- A translation unit that translates the content of the input speech into content in a different language.
- An output unit that outputs the translated content of the input speech as speech and/or text.
- A storage unit that hierarchically stores an upper phrase group and a plurality of lower phrase groups. The upper phrase group includes at least one upper phrase, and each lower phrase group includes at least one lower phrase associated with an upper phrase.
- A display unit that shows the upper phrase group and, when a specific phrase is selected from among the upper phrases, shows the lower phrase group containing the lower phrases associated with that phrase, repeating this process hierarchically and sequentially.

According to the present invention, in a conversation between a user and a conversation partner, selecting a specific phrase from the upper phrase group displays the lower phrase group stored in association with that phrase. This series of processes is executed hierarchically and sequentially (repeatedly). As a result, in an anticipated situation the conversation can continue without interruption, because nobody has to work out the wording of each question or answer every time they speak. The conversation between the user and the conversation partner can therefore proceed naturally and smoothly, which makes it possible to optimize customer service.

FIG. 1 is a system block diagram schematically showing a preferred embodiment of a network configuration and related elements of the speech translation device according to the present invention.
FIG. 2 is a flowchart showing an example of part of the processing flow in a preferred embodiment of the speech translation device according to the present invention.
FIG. 3 is a flowchart showing an example of part of the processing flow in a preferred embodiment of the speech translation device according to the present invention.
FIGS. 4(A) to 4(D) are plan views showing an example of display screen transitions on an information terminal.
FIGS. 5(A) to 5(D) are plan views showing an example of display screen transitions on an information terminal.
FIG. 6 is a schematic diagram showing an example of the data structure of a phrase database.

Embodiments of the present invention are described in detail below.
- The following embodiments are examples for explaining the present invention and are not intended to limit it to those embodiments alone.
- The present invention can be modified in various ways without departing from its gist.
- Those skilled in the art can adopt embodiments in which the elements described below are replaced with equivalents, and such embodiments are also within the scope of the present invention.
- Unless otherwise noted, positional relationships such as up, down, left, and right are based on the illustrated display.
- The dimensional ratios in the drawings are not limited to the illustrated ratios.

(Device configuration)
FIG. 1 is a system block diagram schematically showing a preferred embodiment of a network configuration and related elements of the speech translation device according to the present invention. In this example, the speech translation device 100 includes a server 20. The server 20 is electronically connected via a network N to an information terminal 10 (user device) used by the user, although the configuration is not limited to this.

The information terminal 10 uses, for example, a user interface such as a touch panel and a highly visible display. Here, the information terminal 10 is a portable tablet-type terminal device that can communicate with the network N, including a mobile phone such as a smartphone. The information terminal 10 includes:
- a processor 11;
- a storage resource 12;
- a voice input/output device 13;
- a communication interface 14;
- an input device 15;
- a display device 16; and
- a camera 17.

When the installed speech translation application software runs, the information terminal 10 functions as part or all of the speech translation device according to an embodiment of the present invention. This software is at least part of the speech translation program according to that embodiment.

The processor 11 consists of an arithmetic logic unit and various registers (program counter, data registers, instruction register, general-purpose registers, and so on). It interprets and executes the speech translation application software, which is the program P10 stored in the storage resource 12, and performs various processes. The program P10 can be distributed, for example, from the server 20 over the network N, and may be installed and updated manually or automatically.

The network N is, for example, a communication network that combines wired and wireless networks:
- wired networks: a local area network (LAN), a wide area network (WAN), a value-added network (VAN), etc.;
- wireless networks: a mobile communication network, a satellite communication network, Bluetooth (registered trademark), WiFi (Wireless Fidelity), HSDPA (High Speed Downlink Packet Access), etc.

The storage resource 12 is a logical device provided by the storage area of a physical device, for example a computer-readable recording medium such as semiconductor memory. It stores the operating system program, driver programs, various data, and so on used in the processing of the information terminal 10. The driver programs include, for example:
- an input/output device driver program for controlling the voice input/output device 13;
- an input device driver program for controlling the input device 15; and
- a display device driver program for controlling the display device 16.

The voice input/output device 13 is, for example, an ordinary microphone together with a sound player capable of reproducing sound data.

The communication interface 14 provides, for example, a connection interface to the server 20, and consists of a wireless communication interface and/or a wired communication interface. The input device 15 provides an interface that accepts input operations such as taps on icons, buttons, a virtual keyboard, text, and the like displayed on the display device 16. Examples include a touch panel as well as various input devices externally attached to the information terminal 10.

The display device 16 is an image display interface that presents various information to the user and the conversation partner (the other party to the conversation). Examples include an organic EL display, a liquid crystal display, and a CRT display. The camera 17 captures still images and moving images of various subjects.

The server 20 is, for example, a host computer with high arithmetic processing capability, which provides server functions by running a predetermined server program. It may consist of one or more host computers functioning, for example, as a speech recognition server, a translation server, and a speech synthesis server. A single server is shown in the figure, but the configuration is not limited to this. Each server 20 includes a processor 21, a communication interface 22, and a storage resource 23.

The processor 21 consists of an arithmetic logic unit that performs arithmetic, logical, and bit operations, together with various registers (program counter, data registers, instruction register, general-purpose registers, etc.). It interprets and executes the program P20 stored in the storage resource 23 and outputs predetermined computation results. The communication interface 22 is a hardware module for connecting to the information terminal 10 via the network N. It is, for example, a modulation/demodulation device such as an ISDN modem, ADSL modem, cable modem, optical modem, or software modem.

The storage resource 23 is a logical device provided by the storage area of a physical device, for example a computer-readable recording medium such as a disk drive or semiconductor memory. It stores one or more programs P20, various modules L20, various databases D20, and various models M20. It also stores:
- standard question sentences prepared in advance for the user to address the conversation partner;
- history data of input speech;
- data for various settings; and
- phrase data, described later.

The program P20 is the server program described above, which is the main program of the server 20. The modules L20 are software modules (modularized subprograms) that handle a series of processing steps for requests and information sent from the information terminal 10. They are called and executed as needed while the program P10 is running. Examples of the modules L20 include a speech recognition module, a translation module, and a speech synthesis module.

The databases D20 include:
- the corpora needed for speech translation processing; for Japanese-English speech translation, these are, for example, a Japanese speech corpus, an English speech corpus, a Japanese text (vocabulary) corpus, an English text (vocabulary) corpus, a Japanese dictionary, an English dictionary, a Japanese-English bilingual dictionary, and a Japanese-English parallel corpus;
- a speech database;
- a management database for managing information about users; and
- a phrase database with a hierarchical structure, described later.

The models M20 include acoustic models and language models used for speech recognition.

(Processing)
An example of the speech translation processing performed by the speech translation device 100 configured as above is described below. FIGS. 2 and 3 are flowcharts showing an example of part of the processing flow in the speech translation device 100. FIGS. 4(A) to 4(D) and FIGS. 5(A) to 5(D) are plan views showing an example of part of the screen transitions on the information terminal 10. The example assumes a conversation between a Japanese-speaking clerk at a restaurant or similar establishment, who uses the information terminal 10, and an English-speaking foreign customer as the conversation partner. The languages and situation are not limited to this.

First, the user (clerk) launches the application (step SU1). The display device 16 of the information terminal 10 then shows the conversation partner's language selection screen in FIG. 4(A) (step SJ1). This screen contains:
- Japanese text T1, prompting the user to ask the conversation partner which language they speak;
- English text T2, asking the conversation partner which language they speak;
- language buttons 41 for several representative languages likely to be needed (here, English, Chinese (for example, two kinds distinguished by typeface), and Korean); and
- below these, a cancel button B1 for closing the language selection screen and exiting the application.

As shown in FIG. 4(A), the processor 11 and the display device 16 display the Japanese text T1 and the English text T2 in separate areas of the screen. The two texts are also oriented opposite to each other (upside down relative to each other in the figure). When the user and the conversation partner talk face to face, the user can therefore easily read T1 while the partner can easily read T2. In addition, because T1 and T2 occupy separate areas, they are clearly distinguished and even easier to read.

The user can show the English text T2 on the language selection screen to the conversation partner and have the partner tap a language button, for example English. Alternatively, the user can select the partner's language directly. Once the language is selected, the processor 21 of the server 20 and the processor 11 of the information terminal 10 display a home screen on the display device 16: a standby screen for Japanese and English voice input (FIG. 4(B); step SJ2). This standby screen contains:
- Japanese text T3, asking which language will be spoken, the user's or the conversation partner's;
- an input button 42a for Japanese voice input; and
- an input button 42b for English voice input.
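The two input buttons on the standby screen effectively decide the translation direction. This tiny sketch assumes ISO-style language codes such as "ja" and "en". The function name and the string identifiers "42a"/"42b" are hypothetical labels borrowed from the reference numerals, not identifiers from the patent.

```python
from typing import Tuple


def translation_direction(button: str, user_lang: str,
                          partner_lang: str) -> Tuple[str, str]:
    """Return (source language, target language) for the pressed input button.

    Hypothetical identifiers: "42a" = input in the user's language,
    "42b" = input in the conversation partner's language.
    """
    if button == "42a":
        return user_lang, partner_lang
    if button == "42b":
        return partner_lang, user_lang
    raise ValueError(f"unknown input button: {button!r}")


partner_lang = "en"  # assumed to be set when a language button 41 is tapped
print(translation_direction("42a", "ja", partner_lang))
print(translation_direction("42b", "ja", partner_lang))
```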

The standby screen also contains:
- a call-out button 43, for displaying a list of preset standard question sentences;
- a language selection button 44, for returning to the language selection screen of FIG. 4(A) to switch the conversation partner's language (redo the language selection);
- a history button 45, for displaying the history of voice input so far;
- a suggest button 46, for running a suggest function that lets the user advance the conversation by choosing phrases from recommended phrase groups prepared in advance; and
- a settings button 47, for configuring the application software.

[Conversation by ordinary speech translation]
With reference to FIG. 2, this section describes an example of ordinary speech translation processing, in a conversation between the user and the conversation partner and/or while preparing for one. First, on the standby screen shown in FIG. 4(B), the user taps the Japanese input button 42a to select Japanese voice input, which enables voice input. The user then speaks what they want to convey to the conversation partner (step SU2), and the speech is input through the voice input/output device 13 (step SJ3). The processor 11 of the information terminal 10 generates a speech signal from that input and sends it to the server 20 through the communication interface 14 and the network N. In this way, the information terminal 10 itself, or the processor 11 and the voice input/output device 13, functions as the "input unit".

The processor 21 of the server 20 receives the speech signal through the communication interface 22 and performs speech recognition processing (step SJ4). The processor 21 loads the required modules L20, databases D20, and models M20 from the storage resource 23; these include the speech recognition module, Japanese speech corpus, acoustic model, and language model. It then converts the "sound" of the input speech into its "reading" (characters). In this way, the processor 21, or the server 20 as a whole, functions as a "speech recognition server".

Next, the processor 21 moves on to multilingual translation processing, which translates the recognized "reading" (characters) into another language (step SJ5). Here the processor 21 loads the required module L20 and database D20 (a translation module, a Japanese text corpus, a Japanese dictionary, an English dictionary, a Japanese-English bilingual dictionary, a Japanese-English parallel corpus, and so on) from the storage resource 23. It arranges the recognized "reading" (character string) into Japanese phrases, clauses, sentences, and the like, extracts the English corresponding to that result, and reorders the English according to English grammar to produce natural English phrases, clauses, sentences, and the like. In this way, the processor 21 also functions as a "translation unit", and the server 20 as a whole also functions as a "translation server". If the input speech is not recognized correctly, the speech can be input again (screen display not shown).

The processor 21 also stores the content of the recognized input speech in the storage resource 23. When the multilingual translation processing and the storage of the input speech content are complete, the processor 21 moves on to speech synthesis processing (step SJ6). Here the processor 21 loads the required module L20, database D20, and model M20 (a speech synthesis module, an English speech corpus, an acoustic model, a language model, and so on) from the storage resource 23 and converts the English phrases, clauses, sentences, and the like obtained as the translation result into natural-sounding speech. In this way, the processor 21 also functions as a "speech synthesis unit", and the server 20 as a whole also functions as a "speech synthesis server".

Next, the processor 21 generates an audio signal for speech output from the synthesized speech and transmits it to the information terminal 10 through the communication interface 22 and the network N. The processor 11 of the information terminal 10 receives the audio signal through the communication interface 14 and outputs the speech using the audio input/output device 13 (step SJ7). In this way, the processor 11 and the audio input/output device 13 function as the "output unit".
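
The recognition, translation, storage, and synthesis sequence of steps SJ4 to SJ6 can be sketched as a small server-side pipeline. This is a minimal Python sketch, not the described implementation: the three stage functions are hypothetical placeholders injected as callables (a real deployment would call actual speech recognition, machine translation, and speech synthesis engines), and treating an empty recognition result as the trigger for re-input is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stage signatures: the description names the stages, not their APIs.
Recognizer = Callable[[bytes], str]    # SJ4: audio signal -> recognized "reading"
Translator = Callable[[str], str]      # SJ5: Japanese text -> English text
Synthesizer = Callable[[str], bytes]   # SJ6: English text -> synthesized audio


@dataclass
class TranslationServer:
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer
    history: List[str] = field(default_factory=list)  # stored recognized inputs

    def handle(self, audio_in: bytes) -> Optional[bytes]:
        """Run one utterance through SJ4 to SJ6 and return audio for output (SJ7)."""
        text = self.recognize(audio_in)
        if not text:
            return None  # recognition failed: the terminal prompts for re-input
        translated = self.translate(text)
        self.history.append(text)  # store the recognized input content
        return self.synthesize(translated)


# Toy stand-ins for real engines, for illustration only.
server = TranslationServer(
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: {"いらっしゃいませ": "Welcome"}.get(text, text),
    synthesize=lambda text: text.encode("utf-8"),
)
out = server.handle("いらっしゃいませ".encode("utf-8"))
```

Injecting the stages as callables keeps the orchestration independent of where each engine runs, which matches the later remark that these steps may also be executed on the terminal.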

[Conversation using the suggest function]
Next, with reference to FIG. 3, an example of processing when the suggest function is used during a conversation between the user and the interlocutor and/or during conversation preparation will be described. For example, when an interlocutor (a foreign customer) enters the user's store (for example, a store offering food and beverage service), the user taps the suggest button 46 on the standby screen shown in FIG. 4(B) (step SU3). The processor 11 of the information terminal 10 then transmits to the server 20 a command signal for displaying the phrase groups frequently used in customer service at stores in the user's industry. On receiving the command signal, the processor 21 of the server 20 accesses the phrase database D60 included in the database D20 stored in the storage resource 23.

FIG. 6 is a schematic diagram showing an example of the data structure of the phrase database D60. As shown there, the phrase database D60 contains a plurality of hierarchical phrase groups (data) 61, 62, and 63 (only three levels are shown for convenience of illustration, but the number of levels is not limited to three; the same applies below). The phrase groups 61, 62, and 63 contain Japanese phrases X11 to X55, Y11 to Y55, and Z11 to Z55, respectively, together with their translations into other languages. Each phrase in a higher-level phrase group is associated with a plurality of phrases and the lower-level phrase group that contains them. The phrase groups 61, 62, 63, and so on thus form what amounts to a tree of phrases, which provides a continuous flow of phrases. When a phrase is a specific question item, a question item flag is attached to it, and screen data for entering an answer to that question is stored in association with the phrase, as part of a phrase group, in the phrase database D60 or another suitable database. In this way, the storage resource 23 functions as the "storage unit".

More specifically, the phrase group 61 and the phrase group 62 stand in an "upper phrase group" / "lower phrase group" relationship: for example, phrase X11 (an upper phrase) in the phrase group 61 is associated with phrases Y11 to Y55 (lower phrases) and with the phrase group 62 containing them. Likewise, the phrase group 62 and the phrase group 63 stand in an "upper phrase group" / "lower phrase group" relationship: for example, phrase Y11 (an upper phrase) in the phrase group 62 is associated with phrases Z11 to Z55 (lower phrases) and with the phrase group 63 containing them. In addition, phrase X11, which is a specific question item, carries the question item flag F, and screen data for entering an answer to that question is stored as part of the phrase group 62 in association with phrase X11.
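
One way to picture the phrase database D60 of FIG. 6 is as a tree of phrase records. The sketch below assumes a simple in-memory model: the class and field names, the English wording, and the `answer_screen` identifier are hypothetical (the description stores the answer-screen data as part of the lower phrase group rather than on the phrase itself). `walk` follows a sequence of selections down the hierarchy, as in the X11 → Y11 → Z11 flow.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Phrase:
    pid: str                              # e.g. "X11"
    ja: str                               # Japanese source text
    en: str                               # stored English translation
    question: bool = False                # question item flag F
    answer_screen: Optional[str] = None   # hypothetical answer-screen id
    children: List["Phrase"] = field(default_factory=list)  # lower phrase group


# Three-level example mirroring X11 -> Y11 -> Z11 (wording illustrative).
z11 = Phrase("Z11", "お飲み物はいかがなさいますか？", "What would you like to drink?")
y11 = Phrase("Y11", "お席へご案内します。", "I'll show you to your seat.", children=[z11])
x11 = Phrase("X11", "何名様ですか？", "How many people are in your party?",
             question=True, answer_screen="party_size_keypad", children=[y11])
group_61 = [x11]  # upper phrase group (other phrases omitted)


def walk(roots: List[Phrase], path: List[str]) -> List[Phrase]:
    """Follow a sequence of selected phrase ids down the hierarchy."""
    current, visited = roots, []
    for pid in path:
        chosen = next((p for p in current if p.pid == pid), None)
        if chosen is None:
            raise KeyError(f"{pid} is not in the displayed phrase group")
        visited.append(chosen)
        current = chosen.children  # the next screen shows this lower group
    return visited


trail = walk(group_61, ["X11", "Y11", "Z11"])
```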

Having accessed the phrase database D60, the processor 21 of the server 20 first retrieves phrases X11 to X55 in the top-level phrase group 61, creates display image data listing them, and transmits it to the processor 11 of the information terminal 10. Based on that display image data, the processor 11 displays, for example, the phrase group screen shown in FIG. 4(C) on the display device 16 (the processing up to this point constitutes step SJ8).

On the phrase group screen of FIG. 4(C), a plurality of Japanese phrase texts (upper phrases) are displayed as a phrase list P1, each shown alongside the English phrase text giving its English translation. As shown in FIG. 4(C), the phrase list P1 contains phrases that restaurant staff often say when first addressing a customer who has entered the store. The user can select a desired phrase by tapping its text. Also on this phrase group screen, above the phrase list P1 is a Japanese text T4 indicating that the language selected as the interlocutor's language on the language selection screen of FIG. 4(A) (that is, the target language) is English, and below it is a close button B2 for closing the phrase group screen and returning to the standby screen of FIG. 4(B) (the same applies below).

When the user taps and selects from the phrase list P1 the text T5 of, for example, a phrase asking the size of the party ("How many people?": a specific phrase that is also a specific question item; it corresponds to, for example, phrase X11 in the phrase group 61) (step SU4), the processor 11 of the information terminal 10 transmits a selection command signal to the processor 21 of the server 20. On receiving it, the processor 21 returns to the processor 11 speech output data for the English translation of the text T5, and the processor 11 outputs that speech from the audio input/output device 13.

The processor 21 also accesses the phrase database D60 again and determines whether phrase X11, which corresponds to the selected text T5, is a specific question item (step SJ9). Because phrase X11 carries the question item flag F ("Yes" in step SJ9), the processor 21 retrieves the answer input screen data stored as part of the phrase group 62 associated with phrase X11 and returns it to the processor 11. On receiving it, the processor 11 displays on the display device 16 a screen on which the interlocutor can enter the number of people in the party, for example the answer input screen shown in FIG. 4(D) (the processing up to this point constitutes step SJ10). The answer input screen of FIG. 4(D) shows numeric keys 48 for entering the number of people. When the user presents this screen to the interlocutor and the interlocutor taps it to enter the party size (here, for example, two people) (step SU5), the number is displayed in the column 49.

When the answer (the number of people) to phrase X11 (text T5), a specific question item, has been entered in this way, the processor 11 of the information terminal 10 transmits an input completion signal to the processor 21 of the server 20. On receiving it, the processor 21 accesses the phrase database D60 again, retrieves phrases Y11 to Y55 in the lower-level phrase group 62 associated with phrase X11, creates display image data listing them, and transmits it to the processor 11 of the information terminal 10. Based on that display image data, the processor 11 displays, for example, the phrase group screen shown in FIG. 5(A) on the display device 16 (the processing up to this point constitutes step SJ11). In this way, the processors 11 and 21 and the display device 16 function as the "display unit".

On the phrase group screen of FIG. 5(A), a phrase list P2 containing a plurality of phrase texts (lower phrases) is displayed in the same format as the phrase list P1 shown in FIG. 4(C). As shown in FIG. 5(A), the phrase list P2 contains phrases that restaurant staff often say after confirming how many customers are in the party. As with the phrase list P1, the user can select a desired phrase by tapping its text in the phrase list P2. From here on, the conversation between the user and the interlocutor proceeds by executing the following series of processes hierarchically and sequentially (that is, repeatedly): displaying the phrase list of an upper phrase group, selecting a specific phrase from it, and displaying the lower phrase group containing the lower phrases associated with that phrase. Accordingly, the phrase list P2, displayed as the lower phrase group of the phrase list P1 (an upper phrase group), in turn serves as the upper phrase group for the next lower phrase group to be displayed.

Next, when the user taps and selects from the phrase list P2 the text T6 of, for example, a phrase for showing the customer to an empty seat ("I'll show you to your seat.": a specific phrase; it corresponds to, for example, phrase Y11 in the phrase group 62) (step SU4), the processor 11 of the information terminal 10 transmits a selection command signal to the processor 21 of the server 20. On receiving it, the processor 21 returns to the processor 11 speech output data for the English translation of the text T6, and the processor 11 outputs that speech from the audio input/output device 13.

The processor 21 also accesses the phrase database D60 again and determines whether phrase Y11, which corresponds to the selected text T6, is a specific question item (step SJ9). Because phrase Y11 does not carry the question item flag F ("No" in step SJ9), the processor 21 retrieves phrases Z11 to Z55 in the next lower-level phrase group 63 associated with phrase Y11, creates display image data listing them, and transmits it to the processor 11 of the information terminal 10. Based on that display image data, the processor 11 displays, for example, the phrase group screen shown in FIG. 5(B) on the display device 16 (the processing up to this point constitutes step SJ11).
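
The branch at step SJ9 can be summarized as a function from the selected phrase to the actions the terminal performs. In this hedged sketch the action tuples and field names are illustrative assumptions, and the server round trips are collapsed into local calls: a flagged phrase yields the answer input screen, and its lower group appears only once the answer is in, while an unflagged phrase goes straight to its lower group.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Action = Tuple[str, object]


@dataclass
class Phrase:
    pid: str
    en: str                               # stored English translation
    question: bool = False                # question item flag F
    answer_screen: Optional[str] = None   # hypothetical answer-screen id
    children: List["Phrase"] = field(default_factory=list)


def on_select(phrase: Phrase) -> List[Action]:
    """UI actions after the user taps `phrase` (SU4, then SJ9 to SJ11)."""
    actions: List[Action] = [("speak", phrase.en)]  # voice the translation
    if phrase.question:                              # SJ9 "Yes"
        actions.append(("answer_screen", phrase.answer_screen))  # SJ10
    else:                                            # SJ9 "No"
        actions.append(("show_group", [c.pid for c in phrase.children]))  # SJ11
    return actions


def on_answer_done(phrase: Phrase) -> Action:
    """After the answer is entered (SU5), show the lower phrase group (SJ11)."""
    return ("show_group", [c.pid for c in phrase.children])


z11 = Phrase("Z11", "What would you like to drink?")
y11 = Phrase("Y11", "I'll show you to your seat.", children=[z11])
x11 = Phrase("X11", "How many people are in your party?", question=True,
             answer_screen="party_size_keypad", children=[y11])

first = on_select(x11)
after_answer = on_answer_done(x11)
second = on_select(y11)
```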

On the phrase group screen of FIG. 5(B), a phrase list P3 containing a plurality of phrase texts (lower phrases) is displayed in the same format as the phrase list P1 shown in FIG. 4(C). As shown in FIG. 5(B), the phrase list P3 contains phrases often said once the customer has been seated.

Next, when the user taps and selects from the phrase list P3 the text T7 of, for example, a phrase asking for a drink order ("What would you like to drink?": a specific phrase; it corresponds to, for example, phrase Z11 in the phrase group 63) (step SU4), the processor 11 of the information terminal 10 transmits a selection command signal to the processor 21 of the server 20. On receiving it, the processor 21 returns to the processor 11 speech output data for the English translation of the text T7, and the processor 11 outputs that speech from the audio input/output device 13.

Further, the processor 21 accesses the phrase database D60 again and determines whether phrase Z11, which corresponds to the selected text T7, is a specific question item (step SJ9). Because phrase Z11 also does not carry the question item flag F ("No" in step SJ9), the processor 21 retrieves the phrases in the next lower-level phrase group associated with phrase Z11, creates display image data listing them, and transmits it to the processor 11 of the information terminal 10. Based on that display image data, the processor 11 displays, for example, the phrase group screen shown in FIG. 5(C) on the display device 16 (the processing up to this point constitutes step SJ11).

On the phrase group screen of FIG. 5(C), a phrase list P4 containing a plurality of phrase texts (lower phrases) is displayed in the same format as the phrase list P1 shown in FIG. 4(C). As shown in FIG. 5(C), the phrase list P4 contains the menu names of several drinks, as well as a text T8 with which the interlocutor asks the user for a menu. The user can take the order by presenting this phrase group screen to the interlocutor and having the interlocutor tap the menu name of the desired drink. Alternatively, if the desired drink is not in the phrase list P4, the interlocutor can tap the text T8 to ask the user, a member of staff, to see the menu (step SU4).

When the interlocutor taps and selects from the phrase list P4 in this way the text of a phrase naming the desired menu item, or of the phrase asking to see the menu (either of which is a specific phrase) (step SU4), the processor 11 of the information terminal 10 transmits a selection command signal to the processor 21 of the server 20. On receiving it, the processor 21 returns to the processor 11 Japanese speech output data for that text, and the processor 11 outputs that speech from the audio input/output device 13.
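
The choice of output language in these steps depends on who made the selection: staff selections are voiced in English for the customer, and customer selections (such as in the phrase list P4) are voiced in Japanese for the staff. The sketch below makes that rule explicit for a two-language setup; the `selected_by` labels and the wording of text T8 are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BilingualPhrase:
    ja: str  # Japanese text (the user's language)
    en: str  # English text (the interlocutor's language)


def text_to_voice(phrase: BilingualPhrase, selected_by: str) -> str:
    """Return the text to synthesize: always the listener's language."""
    if selected_by == "user":          # staff tapped it -> customer hears English
        return phrase.en
    if selected_by == "interlocutor":  # customer tapped it -> staff hears Japanese
        return phrase.ja
    raise ValueError(f"unknown speaker: {selected_by!r}")


# Illustrative wording for text T8; the actual text is not given in the description.
menu_request = BilingualPhrase("メニューを見せてください。", "Could I see the menu?")
```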

The processor 21 also accesses the phrase database D60 again and determines whether the phrase corresponding to the selected text is a specific question item (step SJ9). Because the phrases in the phrase list P4 do not carry the question item flag F ("No" in step SJ9), the processor 21 retrieves the phrases in the next lower-level phrase group associated with the selected phrase, creates display image data listing them, and transmits it to the processor 11 of the information terminal 10. Based on that display image data, the processor 11 displays, for example, the phrase group screen shown in FIG. 5(D) on the display device 16 (the processing up to this point constitutes step SJ11).

On the phrase group screen of FIG. 5(D), a phrase list P5 containing a plurality of phrase texts (lower phrases) is displayed in the same format as the phrase list P1 shown in FIG. 4(C). As shown in FIG. 5(D), the phrase list P5 contains phrases often said when an order or request is received from a customer. When the user taps and selects the text of a desired phrase from the phrase list P5 (step SU4), the English translation of that phrase is output as speech in the same way as in the preceding processing, and the user can then end the application as appropriate (step SU6). Note that the phrase list P5 shown in FIG. 5(D) is valid as the lower phrase group for any of the phrases in the phrase list P4 shown in FIG. 5(C). In this way, the same phrase group, containing the same lower phrases, may be associated with several different upper phrases.
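
Because the phrase list P5 serves as the lower group of every phrase in P4, the phrase hierarchy is, strictly speaking, a directed acyclic graph rather than a tree. A minimal sketch, assuming phrase groups are plain Python lists (all wording illustrative), shows the sharing by reference:

```python
from typing import Dict, List

# Lower phrase group P5, stored once (wording illustrative).
p5: List[str] = ["Certainly.", "Please wait a moment."]

# Each phrase in P4 points at the same P5 object, so the phrase "tree"
# is really a directed acyclic graph with a shared child group.
p4_children: Dict[str, List[str]] = {
    "Beer": p5,
    "Orange juice": p5,
    "Could I see the menu?": p5,
}

# Editing the shared group once changes what every P4 phrase leads to.
p5.append("I'll bring it right away.")
```

Storing the shared group once means it is maintained in one place; a copy per upper phrase would have to be kept in sync by hand.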

With the speech translation device 100 configured as described above, and the speech translation method and speech translation program using it, when a specific phrase is selected from the upper phrases in an upper phrase group during a conversation between the user and the interlocutor, the lower phrase group containing the lower phrases stored in association with that phrase is displayed on the screen, and this series of processes is executed hierarchically and sequentially (repeatedly). For example, by selecting the desired phrases while following the hierarchical screen transitions FIG. 4(C) → FIG. 4(D) → FIG. 5(A) → FIG. 5(B) → FIG. 5(C) → FIG. 5(D), a conversation in a situation such as serving customers in a restaurant can be moved forward. Thus, according to the present invention, in a given anticipated situation the conversation can continue without stalling, and the speaker does not have to work out the content of each question or answer every time they speak. This makes the conversation between the user and the interlocutor natural and smooth, and thereby makes it possible to optimize customer service.

In addition, as shown in FIGS. 4(C) and 4(D) and FIGS. 5(A) to 5(D), the upper phrase group and the lower phrase group are displayed one after another on the display device 16 as separate screens. This makes it easy to select, simply and accurately, the phrase appropriate to the current stage of the conversation, so that the conversation between the user and the interlocutor becomes even more natural and smooth.

Furthermore, the upper phrases include specific question items (for example, phrase X11 of the text T5 shown in FIG. 4(C)), and when one of them is selected, a screen for entering the answer to that question, such as the one shown in FIG. 4(D), is displayed. Compared with displaying phrases alone in sequence, this widens the range of conversational options and thus allows conversations in a greater variety of situations to be handled flexibly.

Moreover, as described above, the phrases (that is, the upper and lower phrases) set in each of the phrase groups 61, 62, 63, etc. (phrase lists P1 to P5) are preset for each industry to which the user belongs. Conversations specialized for customer service in that industry can therefore be carried out more smoothly and appropriately, further optimizing customer service. The phrases can also be preset for each of the user's stores, in which case even more finely tailored customer service reflecting the characteristics and circumstances of each store becomes possible.

These upper and lower phrases may be set automatically or manually. As an example of automatic setting, the user's industry is registered as part of the user information when the translation application is used, and the processor 21 of the server 20 selects especially frequently used phrases from a corpus or history of phrases that appear frequently in conversations in that industry and organizes them hierarchically into phrase groups. Alternatively, phrases uttered by multiple users in the same industry may be stored in a suitable database together with their utterance frequencies, and the processor 21 of the server 20 may select the most frequently used of them and organize them hierarchically into phrase groups. As an example of manual setting, the user selects the desired phrases and customizes them into hierarchical phrase groups.
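
The second automatic option, picking the most frequently uttered phrases from history pooled across users in the same industry, reduces to frequency counting. The sketch assumes the history is a list of `(industry, phrase)` pairs and that "especially frequently used" means the top `k` by count; how the chosen phrases are then arranged into levels is not specified in the description and is left out here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_phrases(log: Iterable[Tuple[str, str]], industry: str, k: int) -> List[str]:
    """Return the k phrases most often uttered by users registered in `industry`."""
    counts = Counter(phrase for ind, phrase in log if ind == industry)
    return [phrase for phrase, _ in counts.most_common(k)]


# Illustrative pooled history from several users.
log = [
    ("restaurant", "何名様ですか？"),
    ("restaurant", "何名様ですか？"),
    ("restaurant", "お席へご案内します。"),
    ("hotel", "チェックインですか？"),
]
result = top_phrases(log, "restaurant", 2)
```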

The phrases in each of the phrase groups 61, 62, 63, etc. (phrase lists P1 to P5) may be kept as initially set (a fixed phrase list), or may be changed as needed. In the latter case, for example, the number of times each phrase (upper or lower) has been selected is stored in the storage resource 23, and the processor 21 of the server 20 displays phrases that have been selected more often at a higher rank on the display screen of each phrase group 61, 62, 63, etc. (phrase lists P1 to P5), for example nearer the top of the screen, or highlighted or enlarged. This makes phrases suited to the user's industry and the actual circumstances of the store easier to display and easier to select, and can further speed up communication between the user and the interlocutor.
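
Ranking a phrase group by how often each phrase has been selected can be sketched with a counter and a stable sort. The class name and phrase IDs below are hypothetical, and the sketch covers only the ordering, not highlighting or enlargement.

```python
from collections import Counter
from typing import List


class PhraseGroupRanker:
    """Keeps per-phrase selection counts and orders a phrase group by them."""

    def __init__(self) -> None:
        self.counts = Counter()  # phrase id -> number of selections

    def record(self, phrase_id: str) -> None:
        self.counts[phrase_id] += 1

    def ordered(self, group: List[str]) -> List[str]:
        # sorted() is stable: phrases with equal counts keep their configured order.
        return sorted(group, key=lambda pid: -self.counts[pid])


ranker = PhraseGroupRanker()
for pid in ["Y21", "Y21", "Y31"]:
    ranker.record(pid)
ranked = ranker.ordered(["Y11", "Y21", "Y31"])
```

Because Python's `sorted` is stable, phrases with equal counts (including never-selected ones) keep their initially configured order, so a group with no recorded selections is displayed exactly like the fixed list.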

Also, as shown in FIGS. 5(B) and 5(C), when the specific phrase is an inquiry about an order (for drinks) (phrase Z11 of the text T7) and the lower phrase group containing the lower phrases associated with it is a list of orderable items (menu names), the following processing may be performed. The number of times each item was selected during a predetermined past period (the number of orders), or the profit margin of each item, is stored in the storage resource 23, and the processor 21 of the server 20 displays items with more selections, or with higher profit margins, at a higher rank on the phrase group display screen shown, for example, in FIG. 5(C). This makes it possible to actively recommend popular menu items, or items with a high spend per customer, to the interlocutor (a foreign customer), and thus to improve the sales and profits of the user's store.
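
The two ranking criteria for the order-item list, order count within a predetermined past period or profit margin, can be sketched as a single function with a selectable criterion. The menu items, prices, costs, and the 30-day window are made-up illustrative values, and defining the profit margin as `(price - cost) / price` is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class MenuItem:
    name: str
    price: int  # illustrative price
    cost: int   # illustrative cost

    @property
    def profit_rate(self) -> float:
        return (self.price - self.cost) / self.price


def rank_menu(items: List[MenuItem], orders: List[Tuple[str, datetime]],
              now: datetime, window: timedelta, by: str = "orders") -> List[str]:
    """Order menu names for the drink list, highest score first."""
    if by == "orders":
        # count only orders placed within the predetermined past period
        recent = [name for name, placed_at in orders if now - placed_at <= window]
        score = {item.name: recent.count(item.name) for item in items}
    elif by == "profit":
        score = {item.name: item.profit_rate for item in items}
    else:
        raise ValueError(f"unknown ranking criterion: {by!r}")
    return [item.name for item in sorted(items, key=lambda item: -score[item.name])]


now = datetime(2016, 3, 29, 20, 0)
items = [MenuItem("Beer", 500, 200), MenuItem("Cola", 300, 60), MenuItem("Sake", 600, 330)]
orders = [("Sake", now - timedelta(days=1)), ("Sake", now - timedelta(days=2)),
          ("Beer", now - timedelta(days=3)), ("Cola", now - timedelta(days=40)),
          ("Cola", now - timedelta(days=41)), ("Cola", now - timedelta(days=42))]
by_orders = rank_menu(items, orders, now, timedelta(days=30))
by_profit = rank_menu(items, orders, now, timedelta(days=30), by="profit")
```

With these sample values, Cola was ordered most often overall but falls to the bottom of the order-count ranking because its orders lie outside the 30-day window, while it tops the profit-margin ranking.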

Furthermore, as shown in FIGS. 4(C) and 4(D) and FIGS. 5(A) to 5(D), on each phrase group screen the Japanese phrase text of each phrase is shown in the phrase lists P1 to P5 alongside the English phrase text giving its English translation, and the translation of each phrase into the other language is also output as speech. The user and the interlocutor can therefore confirm what the other party is saying more reliably, both while looking at the screen and without looking at it.

As noted above, each of the embodiments described above is an example for explaining the present invention and is not intended to limit the present invention to those embodiments. The present invention can be modified in various ways without departing from its gist. For example, a person skilled in the art can replace the resources (hardware resources or software resources) described in the embodiments with equivalents, and such replacements are also included in the scope of the present invention.

Also, in FIGS. 4(C) and 4(D) and FIGS. 5(A) to 5(D), the Japanese phrase text and the English phrase text may be displayed facing opposite directions (different directions; upside down relative to each other in the illustration), like the texts T1 and T2 in FIG. 4(A). Alternatively, instead of showing the Japanese and English phrase texts together, only one of them may be displayed. The answer input shown in FIG. 4(D) may also be configured to accept speech input. In addition, phrase groups set for other industries may be made selectable regardless of the user's own industry. The phrase group display screen (menu names) shown in FIG. 5(C) may also show menu items that use ingredients currently in large stock, or menu items specially recommended by the user or the user's store.

Although an example has been described in which processes such as speech recognition, translation, and speech synthesis are executed by the server 20, these processes may instead be executed on the information terminal 10. In this case, the module L20 used for these processes may be stored either in the storage resource 12 of the information terminal 10 or in the storage resource 23 of the server 20. Likewise, the database D20, which is a speech database, and/or the model M20, such as an acoustic model, may be stored either in the storage resource 12 of the information terminal 10 or in the storage resource 23 of the server 20. In this way, the speech translation device need not include the network N or the server 20.
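One way to keep the choice between server-side and on-device processing open is to put recognition, translation, and synthesis behind a common interface. The Python sketch below assumes a hypothetical `SpeechBackend` interface with `recognize`, `translate`, and `synthesize` methods; `OnDeviceBackend` is a toy that replaces real recognition and synthesis with UTF-8 decoding and encoding so the control flow runs without a speech engine. A server-side backend would implement the same three methods by calling the server 20 over the network N (not shown).

```python
from typing import Protocol, Tuple


class SpeechBackend(Protocol):
    """Hypothetical interface for the processing performed by module L20."""

    def recognize(self, audio: bytes, lang: str) -> str: ...
    def translate(self, text: str, src: str, dst: str) -> str: ...
    def synthesize(self, text: str, lang: str) -> bytes: ...


class OnDeviceBackend:
    """Toy stand-in for processing on the information terminal 10.

    Recognition and synthesis are faked with UTF-8 decoding/encoding and
    translation with a lookup table; no real speech engine is involved.
    """

    def __init__(self, phrase_table):
        # Hypothetical table: (src, dst, text) -> translated text.
        self.phrase_table = phrase_table

    def recognize(self, audio: bytes, lang: str) -> str:
        return audio.decode("utf-8")

    def translate(self, text: str, src: str, dst: str) -> str:
        # Unknown phrases are passed through unchanged in this toy.
        return self.phrase_table.get((src, dst, text), text)

    def synthesize(self, text: str, lang: str) -> bytes:
        return text.encode("utf-8")


def translate_utterance(backend: SpeechBackend, audio: bytes,
                        src: str, dst: str) -> Tuple[str, bytes]:
    """Recognize, translate, and synthesize with whichever backend is supplied."""
    text = backend.recognize(audio, src)
    translated = backend.translate(text, src, dst)
    return translated, backend.synthesize(translated, dst)


backend = OnDeviceBackend({("ja", "en", "いらっしゃいませ"): "Welcome"})
text, audio = translate_utterance(backend, "いらっしゃいませ".encode("utf-8"), "ja", "en")
print(text)  # Welcome
```

Because the caller depends only on the interface, moving the module L20, the database D20, or the model M20 between the terminal and the server would not change `translate_utterance`.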

Of course, a gateway server or the like that converts the communication protocol between the information terminal 10 and the network N may be interposed between them. The information terminal 10 is not limited to a portable device and may be, for example, a desktop, notebook, tablet, or laptop personal computer.

According to the present invention, the conversation between a user and a conversation partner (a foreign customer) during customer service can proceed naturally and smoothly, which helps optimize customer service. The invention can therefore be widely used in activities such as the design, manufacture, provision, and sale of programs, devices, systems, and methods in the field of providing services related to conversations between people who do not understand each other's language.

10 Information terminal
11 Processor
12 Storage resource
13 Voice input/output device
14 Communication interface
15 Input device
16 Display device
17 Camera
20 Server
21 Processor
22 Communication interface
23 Storage resource
41 Language button
42a Japanese input button
42b English input button
43 Greeting (call-out) button
44 Language selection button
45 History button
46 Suggest button
47 Settings button
48 Number keys
49 Column
61, 62, 63 Phrase groups
100 Speech translation device
B1 Cancel button
B2 Close button
D20 Database
D60 Phrase database
F Question flag
L20 Module
M20 Model
N Network
P1 to P5 Phrase lists
P10 Program
P20 Program
T1 to T8 Texts
X11 to X55 Phrases
Y11 to Y55 Phrases
Z11 to Z55 Phrases

Claims

A speech translation device comprising: an input unit for inputting speech of a user and/or a conversation partner; a translation unit for translating the content of the input speech into content in a different language; and an output unit for outputting the translated content of the input speech as speech and/or text, the speech translation device further comprising:
a storage unit that hierarchically stores an upper phrase group including at least one upper phrase, and a plurality of lower phrase groups each including at least one lower phrase associated with a respective upper phrase; and
a display unit that hierarchically and sequentially executes a process of displaying the upper phrase group and, when a specific phrase is selected from among the upper phrases, displaying the lower phrase group including the lower phrases associated with the specific phrase,
wherein
the upper phrase group includes a specific question item as an upper phrase,
when the specific question item is selected, the display unit displays a screen for inputting an arbitrary answer to the specific question item, in a display form different from the display of the lower phrases,
the upper phrases and the lower phrases are preset, automatically or manually, for each type of business to which the user belongs or for each store of the user,
the storage unit stores the number of times each upper phrase and each lower phrase has been selected,
the display unit displays upper phrases that have been selected more often at higher positions on the display screen of the upper phrase group, and displays lower phrases that have been selected more often at higher positions on the display screen of the lower phrase group, and
when the specific phrase is an order inquiry and the lower phrase group including the lower phrases associated with the specific phrase is a list of a plurality of order items,
the storage unit stores the number of times each order item has been selected, or the profit margin of each order item, and
the display unit displays order items that have been selected more often, or order items with higher profit margins, at higher positions on the display screen of the lower phrase group.
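The selection-count and profit-margin ordering recited above can be pictured as a phrase tree whose nodes carry a counter. This Python sketch is illustrative only: the `Phrase` class, the `select` and `ranked` helpers, and the sample phrases and margins are hypothetical, and the claim does not say how ties or missing margins are handled (here tied phrases keep their stored order and a missing margin counts as 0.0).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Phrase:
    text_ja: str
    text_en: str
    selections: int = 0                     # selection count held by the storage unit
    profit_margin: Optional[float] = None   # only used for order items
    children: List["Phrase"] = field(default_factory=list)  # associated lower phrases


def select(phrase: Phrase) -> List[Phrase]:
    """Record one selection and return the lower phrase group to show next."""
    phrase.selections += 1
    return phrase.children


def ranked(phrases: List[Phrase], by: str = "selections") -> List[Phrase]:
    """Order a phrase group for display, highest value first (stable for ties)."""
    if by == "selections":
        return sorted(phrases, key=lambda p: p.selections, reverse=True)
    if by == "profit_margin":
        # Items without a known margin are treated as 0.0 (an assumption).
        return sorted(phrases, key=lambda p: p.profit_margin or 0.0, reverse=True)
    raise ValueError(f"unknown ranking criterion: {by}")


drinks = Phrase("お飲み物は何になさいますか？", "What would you like to drink?", children=[
    Phrase("ビール", "Beer", selections=5, profit_margin=0.6),
    Phrase("お茶", "Tea", selections=9, profit_margin=0.8),
    Phrase("コーラ", "Cola", selections=2, profit_margin=0.9),
])
menu = select(drinks)
print([p.text_en for p in ranked(menu)])                      # ['Tea', 'Beer', 'Cola']
print([p.text_en for p in ranked(menu, by="profit_margin")])  # ['Cola', 'Tea', 'Beer']
```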

The speech translation device according to claim 1,
wherein the display unit displays the upper phrase group and the lower phrase group on separate screens.

The speech translation device according to claim 1 or 2, wherein
the storage unit stores the upper phrase that is the specific question item with a question flag attached, and stores screen data for inputting the arbitrary answer to the specific question item in association with that upper phrase, and
the display unit determines whether the specific question item has been selected based on the presence or absence of the question flag and, when it determines that the specific question item has been selected, displays the screen for inputting the arbitrary answer based on the screen data.
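The question-flag check recited above amounts to a branch on a stored flag. A minimal Python sketch, assuming a hypothetical `UpperPhrase` record, a hypothetical `on_upper_phrase_selected` handler, and screen data represented as a plain dictionary (the patent does not specify a data format):

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class UpperPhrase:
    text: str
    question_flag: bool = False           # corresponds to question flag F
    answer_screen: Optional[dict] = None  # hypothetical screen data for the answer input
    lower_phrases: List[str] = field(default_factory=list)


def on_upper_phrase_selected(phrase: UpperPhrase) -> Tuple[str, Any]:
    """Decide what the display unit shows next from the question flag alone."""
    if phrase.question_flag:
        # Specific question item: show a free-form answer screen, not a phrase list.
        return ("answer_input", phrase.answer_screen)
    return ("phrase_list", phrase.lower_phrases)


party_size = UpperPhrase("How many people are in your party?", question_flag=True,
                         answer_screen={"input": "number_keys"})
greeting = UpperPhrase("Greetings", lower_phrases=["Welcome.", "Thank you very much."])
print(on_upper_phrase_selected(party_size))  # ('answer_input', {'input': 'number_keys'})
print(on_upper_phrase_selected(greeting))    # ('phrase_list', ['Welcome.', 'Thank you very much.'])
```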

The speech translation device according to any one of claims 1 to 3, wherein
the display unit displays a translation of each upper phrase and each lower phrase into a different language, or the output unit outputs a translation of each upper phrase and each lower phrase into a different language as speech.

A speech translation method using a speech translation device that includes an input unit, a translation unit, an output unit, a storage unit, and a display unit, the method comprising:
a step in which the input unit inputs speech of a user and/or a conversation partner;
a step in which the translation unit translates the content of the input speech into content in a different language;
a step in which the output unit outputs the translated content of the input speech as speech and/or text;
a step in which the storage unit hierarchically stores an upper phrase group including at least one upper phrase, and a plurality of lower phrase groups each including at least one lower phrase associated with a respective upper phrase; and
a step in which the display unit hierarchically and sequentially executes a process of displaying the upper phrase group and, when a specific phrase is selected from among the upper phrases, displaying the lower phrase group including the lower phrases associated with the specific phrase,
wherein
the upper phrase group includes a specific question item as an upper phrase,
when the specific question item is selected, the display unit displays a screen for inputting an arbitrary answer to the specific question item, in a display form different from the display of the lower phrases,
the upper phrases and the lower phrases are preset, automatically or manually, for each type of business to which the user belongs or for each store of the user,
the storage unit stores the number of times each upper phrase and each lower phrase has been selected,
the display unit displays upper phrases that have been selected more often at higher positions on the display screen of the upper phrase group, and displays lower phrases that have been selected more often at higher positions on the display screen of the lower phrase group, and
when the specific phrase is an order inquiry and the lower phrase group including the lower phrases associated with the specific phrase is a list of a plurality of order items,
the storage unit stores the number of times each order item has been selected, or the profit margin of each order item, and
the display unit displays order items that have been selected more often, or order items with higher profit margins, at higher positions on the display screen of the lower phrase group.

A speech translation program causing a computer to function as:
an input unit for inputting speech of a user and/or a conversation partner;
a translation unit that translates the content of the input speech into content in a different language;
an output unit that outputs the translated content of the input speech as speech and/or text;
a storage unit that hierarchically stores an upper phrase group including at least one upper phrase, and a plurality of lower phrase groups each including at least one lower phrase associated with a respective upper phrase; and
a display unit that hierarchically and sequentially executes a process of displaying the upper phrase group and, when a specific phrase is selected from among the upper phrases, displaying the lower phrase group including the lower phrases associated with the specific phrase,
wherein
the upper phrase group includes a specific question item as an upper phrase,
when the specific question item is selected, the display unit displays a screen for inputting an arbitrary answer to the specific question item, in a display form different from the display of the lower phrases,
the upper phrases and the lower phrases are preset, automatically or manually, for each type of business to which the user belongs or for each store of the user,
the storage unit stores the number of times each upper phrase and each lower phrase has been selected,
the display unit displays upper phrases that have been selected more often at higher positions on the display screen of the upper phrase group, and displays lower phrases that have been selected more often at higher positions on the display screen of the lower phrase group, and
when the specific phrase is an order inquiry and the lower phrase group including the lower phrases associated with the specific phrase is a list of a plurality of order items,
the storage unit stores the number of times each order item has been selected, or the profit margin of each order item, and
the display unit displays order items that have been selected more often, or order items with higher profit margins, at higher positions on the display screen of the lower phrase group.