JP6697270B2

JP6697270B2 - Communication support system, communication support method, and program

Info

Publication number: JP6697270B2
Application number: JP2016006633A
Authority: JP
Inventors: 浩章奥本; 本山　雅; 雅本山; 慶子蛭川; 佳成澤田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2016-01-15
Filing date: 2016-01-15
Publication date: 2020-05-20
Anticipated expiration: 2036-01-15
Also published as: JP2017126042A

Description

The present invention relates to a communication support system that supports communication between users who use different languages, to a communication support method, and to a program that causes a computer to function as the communication support system.

With the globalization of the economy and culture, techniques have been developed for translating what a user utters in one language into a different language.

For example, Patent Document 1 describes a control device that selects the most probable speech recognition result using speech recognition scores received from each of two or more speech recognition devices, and then selects the most probable translation result using translation scores, received from each of two or more translation devices, for the selected speech recognition result. The control device also selects a speech synthesis result using speech synthesis scores received from each of two or more speech synthesis devices, and transmits the selected speech synthesis result to a second terminal device that outputs it as speech.

Patent Document 2 describes an automatic language identification device that analyzes an input natural-language speech signal and searches the analyzed speech in parallel using natural-language acoustic models for a plurality of languages and the natural-language pronunciation dictionaries of the corresponding natural-language language models. The device identifies the language of the input speech by comparing the likelihoods of the search results.

Patent Document 1: JP 2011-90100 A (published May 6, 2011)
Patent Document 2: JP 2004-347732 A (published December 9, 2004)

However, with the conventional techniques described above, if the device recognizes the user's utterance as being in a language other than the one the user is actually using, the utterance cannot be translated correctly. As a result, a user of one language and a user of a different language cannot communicate smoothly.

The present invention has been made in view of the above problem, and its object is to provide a technique that facilitates communication between users who use different languages.

In order to solve the above problem, a communication support system according to one aspect of the present invention includes a display unit having a first area for a first user and a second area for a second user, a voice input unit, and a control unit. The control unit acquires, via the voice input unit, first voice information indicating the voice of the first user; performs a recognition process of recognizing the first voice content indicated by the first voice information as each of a plurality of languages; performs a selection process of selecting the first recognition content to be displayed from the first recognition content, that is, the recognition content obtained for each of the plurality of languages; and displays the selected first recognition content in the first area of the display unit.

Further, in order to solve the above problem, a communication support method according to one aspect of the present invention includes: an acquisition step of acquiring first voice information indicating the voice of a first user; a recognition step of recognizing the first voice content indicated by the first voice information as each of a plurality of languages; a selection step of selecting the first recognition content to be displayed from the first recognition content, that is, the recognition content obtained for each of the plurality of languages; and a display step of displaying the selected first recognition content in a first area for the first user.

Further, in order to solve the above problem, a program according to one aspect of the present invention causes a computer to function as a communication support system that includes a display unit having a first area for a first user and a second area for a second user, a voice input unit, and a control unit. The program causes the control unit to execute: an acquisition process of acquiring, via the voice input unit, first voice information indicating the voice of the first user; a recognition process of recognizing the first voice content indicated by the first voice information as each of a plurality of languages; a selection process of selecting the first recognition content to be displayed from the first recognition content, that is, the recognition content obtained for each of the plurality of languages; and a display process of displaying the selected first recognition content in the first area of the display unit.

According to one aspect of the present invention, communication between users who use different languages can be facilitated.

Block diagram showing the main configuration of the communication support system according to Embodiment 1 of the present invention.
Diagram schematically showing the communication support system according to Embodiment 1 of the present invention.
Sequence diagram showing an example of the processing flow when the service provider speaks, in the communication support system according to Embodiment 1 of the present invention.
Diagram showing an example of an image displayed on the display unit in Embodiment 1 of the present invention.
Sequence diagram showing another example of the processing flow when the service provider speaks, in the communication support system according to Embodiment 1 of the present invention.
Sequence diagram showing the processing flow when the service user speaks, in the communication support system according to Embodiment 1 of the present invention.
Diagram showing another example of an image displayed on the display unit in Embodiment 1 of the present invention.
Diagram showing yet another example of an image displayed on the display unit in Embodiment 1 of the present invention.
Example of a database stored in the terminal storage unit in Embodiment 2 of the present invention.
Flowchart showing the processing flow of the client terminal in Embodiment 2 of the present invention.
Diagram showing an example of an image displayed on the display unit in Embodiment 2 of the present invention.
Diagram showing an example of an image displayed on the display unit in Embodiment 3 of the present invention.
Diagram showing an example of an image displayed on the display unit in Embodiment 4 of the present invention.
Diagram showing an example of an image displayed on the display unit in Embodiment 5 of the present invention.
Example of a database stored in the terminal storage unit in Embodiment 6 of the present invention.
Flowchart showing the processing flow of the client terminal in Embodiment 6 of the present invention.
Diagram showing an example of an image displayed on the display unit in Embodiment 6 of the present invention.
Flowchart showing the processing flow of the client terminal in Embodiment 7 of the present invention.
Diagram showing an example of an image displayed on the display unit in Embodiment 7 of the present invention.
Diagram showing an example of an image displayed on the display unit in Embodiment 8 of the present invention.
Block diagram illustrating the hardware configuration of a computer usable as the recognition server, the translation server, the support server, and the client terminal.

Hereinafter, embodiments of the present invention will be described in detail. For convenience of description, members having the same functions as members described in an earlier embodiment are given the same reference numerals, and their description is omitted.

[Embodiment 1]
(Communication support system 1)
FIG. 2 is a diagram schematically showing the communication support system 1 according to Embodiment 1 of the present invention. As shown in FIG. 2, the communication support system 1 includes a recognition server 10, a translation server 20, a support server 30, and a client terminal 40. This specification describes the case where the client terminal 40 and the support server 30 are installed locally and the recognition server 10 and the translation server 20 are installed on a network 2 (a configuration in which the support server 30 can communicate with the recognition server 10 and the translation server 20 via the network 2). However, the locations where the support server 30, the recognition server 10, and the translation server 20 are installed are not particularly limited.

In the communication support system 1, in response to voice uttered by the users communicating through the system (a first user and a second user), at least one of the following is displayed on the display unit 46 of the client terminal 40: (1) recognition content obtained by recognizing the content of the voice as a predetermined language, and (2) translation content obtained by translating the recognition content into a predetermined language. The communication support system 1 is thus a system that helps a first user and a second user who use different languages to communicate.

This specification takes as an example the case where the first user is a service user (for example, a customer of a convenience store) and the second user is a service provider (for example, a clerk at that convenience store), but the invention is not limited to this. For example, the first user may be the service provider and the second user may be the service user. Another example of the second user is a device installed in a convenience store that a customer operates to display product information, order products, and so on.

The configuration of each device in the communication support system 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the main configuration of the communication support system 1 according to Embodiment 1 of the present invention.

(Recognition server 10)
As shown in FIG. 1, the recognition server 10 includes a recognition server communication unit 12 and a recognition server control unit 14.

The recognition server communication unit 12 is a communication interface for communicating with external devices.

The recognition server control unit 14 is an arithmetic device that performs overall control of the components of the recognition server 10. More specifically, the recognition server control unit 14 acquires, via the recognition server communication unit 12, either (1) service provider voice information indicating voice uttered by the service provider or (2) service user voice information indicating voice uttered by the service user.

The recognition server control unit 14 then performs a recognition process of recognizing the voice content indicated by the acquired voice information as any of a plurality of preset languages. In the following, "recognizing the A voice content indicated by A voice information" may be written simply as "recognizing A voice information". The recognition server control unit 14 may be configured so that, when a recognition language is designated, it recognizes the voice content as the designated language among the preset languages, or so that it changes the recognition language according to the acquired voice information.

For example, when instructed to recognize the acquired voice information as English, the recognition server control unit 14 performs a recognition process of recognizing the voice content indicated by the acquired voice information as English. Likewise, if it is set to recognize service provider voice information as Japanese, then on acquiring service provider voice information it performs a recognition process of recognizing the voice content indicated by that information as Japanese.

In this specification, when the recognition server control unit 14 acquires service provider voice information, it performs a recognition process of recognizing the voice content indicated by that information as the preset language used by the service provider. When it acquires service user voice information, it performs a recognition process of recognizing the voice content as a designated language among the plurality of preset languages.

The recognition server control unit 14 then outputs, via the recognition server communication unit 12, a recognition result that includes (a) recognition content indicating what was recognized (provider recognition content, obtained by recognizing the voice content indicated by service provider voice information, or user recognition content, obtained by recognizing the voice content indicated by service user voice information) and (b) a recognition accuracy indicating the certainty of the recognition process. Here, "recognizing voice content A indicated by voice information as language B" includes interpreting A on the assumption that it is in language B.
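The recognition behavior described above can be sketched roughly as follows. This is only an illustration: the names `RecognitionResult` and `languages_to_try`, the language codes, the 0-to-1 accuracy scale, and the fallback of trying every preset language when none is designated are assumptions made for the sketch and do not come from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical language codes; the specification only names the languages.
PROVIDER_LANGUAGE = "ja"
PRESET_LANGUAGES = ["en", "zh", "ko"]


@dataclass(frozen=True)
class RecognitionResult:
    """Recognition content plus the accuracy (certainty) of the recognition."""
    language: str    # language the voice content was interpreted as
    content: str     # recognized text
    accuracy: float  # assumed here to lie in [0.0, 1.0]


def languages_to_try(speaker: str,
                     designated: Optional[List[str]] = None) -> List[str]:
    """Return the languages the voice content should be recognized as.

    Provider speech is recognized as the preset provider language.
    User speech is recognized as the designated language(s) among the presets;
    trying all presets when nothing is designated is an assumption.
    """
    if speaker == "provider":
        return [PROVIDER_LANGUAGE]
    if speaker != "user":
        raise ValueError(f"unknown speaker: {speaker!r}")
    if designated is None:
        return list(PRESET_LANGUAGES)
    return [lang for lang in designated if lang in PRESET_LANGUAGES]
```

A caller would run one recognition pass per returned language and wrap each outcome in a `RecognitionResult`.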

(Translation server 20)
As shown in FIG. 1, the translation server 20 includes a translation server communication unit 22 and a translation server control unit 24.

The translation server communication unit 22 is a communication interface for communicating with external devices.

The translation server control unit 24 is an arithmetic device that performs overall control of the components of the translation server 20. More specifically, the translation server control unit 24 acquires provider recognition content or user recognition content via the translation server communication unit 22.

The translation server control unit 24 then performs a translation process of translating the acquired recognition content into any of a plurality of preset languages. The translation server control unit 24 may be configured so that, when a translation language is designated, it translates into the designated language among the preset languages, or so that it changes the translation language according to the acquired recognition content.

For example, when instructed to translate the acquired recognition content into English, the translation server control unit 24 performs a translation process of translating it into English. Likewise, if it is set to translate user recognition content into Japanese, then on acquiring user recognition content it performs a translation process of translating that content into Japanese.

In this specification, when the translation server control unit 24 acquires provider recognition content, it performs a translation process of translating the content into a designated language among the plurality of preset languages. When it acquires user recognition content, it translates the content into the preset language used by the service provider.

The translation server control unit 24 then outputs translation content indicating the translated result via the translation server communication unit 22.
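The choice of translation targets described above can be sketched as follows. The function names, the language codes, and the injected `translate` callable are hypothetical; the actual translation engine is not described in this section, so it is represented by a caller-supplied stand-in.

```python
from typing import Callable, Dict, List

# Hypothetical language codes; the specification only names the languages.
PROVIDER_LANGUAGE = "ja"
PRESET_TARGETS = ["en", "zh", "ko"]


def target_languages(content_kind: str, designated: List[str]) -> List[str]:
    """Pick target languages for a piece of recognition content.

    Provider recognition content goes to the designated languages among the
    presets; user recognition content goes to the provider's language.
    """
    if content_kind == "user":
        return [PROVIDER_LANGUAGE]
    if content_kind == "provider":
        return [lang for lang in designated if lang in PRESET_TARGETS]
    raise ValueError(f"unknown content kind: {content_kind!r}")


def translate_all(text: str,
                  targets: List[str],
                  translate: Callable[[str, str], str]) -> Dict[str, str]:
    """Translate `text` into every target language with the given engine."""
    return {lang: translate(text, lang) for lang in targets}
```

Passing the engine in as a callable keeps the sketch independent of any particular translation service.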

(Support server 30)
As shown in FIG. 1, the support server 30 includes a support server communication unit 32 and a support server control unit 34.

The support server communication unit 32 is a communication interface for communicating with external devices.

The support server control unit 34 is an arithmetic device that performs overall control of the components of the support server 30. Details of the support server control unit 34 are described below.

(Support server control unit 34)
As shown in FIG. 1, the support server control unit 34 also functions as an information management unit 342, a selection unit 344, a display mode determination unit 346, and a display information output unit 348. The detailed processing of each unit is described later with reference to other drawings.

The information management unit 342 manages information acquired via the support server communication unit 32.

The selection unit 344 performs a selection process of selecting, from the acquired recognition content, the recognition content to be displayed.

The display mode determination unit 346 determines the display mode in which the client terminal 40 displays recognition content or translation content.

The display information output unit 348 outputs, via the support server communication unit 32, display information, which is the information used to make the client terminal 40 display recognition content or translation content.
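As a rough illustration of the selection process performed by the selection unit 344, the sketch below picks one candidate from several recognition results. The selection criterion is not stated in this section; choosing the candidate with the highest recognition accuracy is an assumption made purely for illustration, as are the names `Candidate` and `select_display_target`.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One recognition result for a single utterance (hypothetical shape)."""
    language: str
    content: str
    accuracy: float


def select_display_target(candidates: List[Candidate]) -> Candidate:
    """Select the recognition content to display.

    ASSUMPTION: the candidate with the highest recognition accuracy wins.
    On a tie, the earliest candidate in the list is kept.
    """
    if not candidates:
        raise ValueError("no recognition candidates")
    return max(candidates, key=lambda c: c.accuracy)
```

For example, given English, Chinese, and Korean candidates for the same utterance, the function returns whichever carries the largest `accuracy` value.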

(Client terminal 40)
As shown in FIG. 1, the client terminal 40 includes a client terminal communication unit 42, a client terminal control unit 44, a display unit 46, a voice input unit 48, an operation unit (operator) 50, and a terminal storage unit 52.

The client terminal communication unit 42 is a communication interface for communicating with external devices.

The display unit 46 is a display device that displays the image indicated by an acquired image signal. The display unit 46 has a first area for the service user and a second area for the service provider.

The voice input unit 48 picks up sound around the client terminal 40 and outputs voice information indicating the acquired sound.

The operation unit 50 is a device that accepts user operations and outputs an operation signal indicating the accepted operation.

The terminal storage unit 52 is a storage device that stores databases and other data.

The client terminal control unit 44 is an arithmetic device that performs overall control of the components of the client terminal 40. Details of the client terminal control unit 44 are described below.

(Client terminal control unit 44)
As shown in FIG. 1, the client terminal control unit 44 also functions as a voice information acquisition unit 442, a voice recognition unit 444, an operation signal acquisition unit 446, a display information acquisition unit 448, and a display control unit 450. The detailed processing of each unit is described later with reference to other drawings.

The voice information acquisition unit 442 acquires voice information via the voice input unit 48.

The voice recognition unit 444 executes the same processing as the recognition server control unit 14 described above.

The operation signal acquisition unit 446 acquires operation signals via the operation unit 50.

The display information acquisition unit 448 acquires display information via the client terminal communication unit 42.

The display control unit 450 outputs an image signal indicating the image to be displayed on the display unit 46.

(When the service provider speaks: case 1)
The processing performed in the communication support system 1 when the service provider speaks will be described with reference to FIG. 3. FIG. 3 is a sequence diagram showing an example of that processing flow in the communication support system 1 according to Embodiment 1 of the present invention. Unless otherwise noted, the description below assumes that the language used by the service provider is Japanese and that the provider recognition content is translated into English, Chinese, and Korean. The description based on FIG. 3 covers the case where the recognition process is executed by the recognition server 10.

(Step S2)
The voice information acquisition unit 442 of the client terminal 40 acquires, via the voice input unit 48, provider voice information indicating voice uttered by the service provider. Specifically, the voice information acquisition unit 442 acquires provider voice information representing the service provider's utterance "何かお探しですか？" ("Are you looking for something?").

One example of how the client terminal 40 determines whether acquired voice information represents voice uttered by the provider is a configuration that includes a service-user-side voice input unit 48a mounted on the service user side and a service-provider-side voice input unit 48b mounted on the service provider side, and that treats voice information acquired via the service-provider-side voice input unit 48b as provider voice information.
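The two-microphone arrangement above amounts to a lookup from input unit to speaker. A minimal sketch follows, in which the identifiers `"48a"` and `"48b"` simply reuse the reference numerals as hypothetical device keys and the names `Speaker` and `classify_voice_info` are invented for illustration.

```python
from enum import Enum


class Speaker(Enum):
    SERVICE_USER = "service_user"
    SERVICE_PROVIDER = "service_provider"


# Hypothetical device keys borrowed from the reference numerals:
# 48a is mounted on the service user side, 48b on the service provider side.
INPUT_UNIT_TO_SPEAKER = {
    "48a": Speaker.SERVICE_USER,
    "48b": Speaker.SERVICE_PROVIDER,
}


def classify_voice_info(input_unit_id: str) -> Speaker:
    """Attribute voice information to a speaker by the unit that captured it."""
    try:
        return INPUT_UNIT_TO_SPEAKER[input_unit_id]
    except KeyError:
        raise ValueError(f"unknown voice input unit: {input_unit_id!r}") from None
```

With this mapping, voice information captured by unit 48b is treated as provider voice information without any analysis of the audio itself.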

(Step S4)
The voice information acquisition unit 442 outputs the provider voice information to the support server 30 via the client terminal communication unit 42.

(Step S6)
The information management unit 342 of the support server 30 acquires the provider voice information via the support server communication unit 32.

(Step S8)
On acquiring the provider voice information, the information management unit 342 outputs it to the recognition server 10 via the support server communication unit 32 in order to obtain a recognition result for the provider voice content indicated by that information.

(Step S10)
The recognition server control unit 14 of the recognition server 10 acquires the provider voice information via the recognition server communication unit 12.

(Step S12)
The recognition server control unit 14 recognizes the voice content indicated by the acquired provider voice information as the language used by the service provider. Specifically, the recognition server control unit 14 recognizes, as Japanese, the provider voice content indicated by the provider voice information representing "何かお探しですか？" ("Are you looking for something?").

(Step S14)
The recognition server control unit 14 outputs to the support server 30, via the recognition server communication unit 12, a recognition result that includes the provider recognition content indicating what was recognized and a recognition accuracy indicating the certainty of the recognition process in step S12. Specifically, because the recognition server control unit 14 recognized "何かお探しですか？" as Japanese, the recognition accuracy is high and the provider recognition content is likewise "何かお探しですか？".

(Step S16)
The information management unit 342 of the support server 30 acquires the recognition result via the support server communication unit 32 and outputs it to the display mode determination unit 346.

(Step S18)
In addition, to have the provider recognition content included in the acquired recognition result translated, the information management unit 342 outputs the provider recognition content to the translation server 20 together with an instruction to translate it into each of the preset languages: English, Chinese, and Korean.

（ステップＳ２０）
翻訳サーバ２０の翻訳サーバ制御部２４は、翻訳サーバ通信部２２を介して、提供者認識内容を取得する。 (Step S20)
The translation server control unit 24 of the translation server 20 acquires the provider recognition content via the translation server communication unit 22.

（ステップＳ２２）
翻訳サーバ制御部２４は、取得した提供者認識内容を、支援サーバ３０によって指定された英語、中国語、および韓国語に翻訳する。具体的には、翻訳サーバ制御部２４は、「何かお探しですか？」を英語、中国語、および韓国語に翻訳する。 (Step S22)
The translation server control unit 24 translates the acquired provider recognition content into English, Chinese, and Korean, as designated by the support server 30. Specifically, the translation server control unit 24 translates "Are you looking for something?" into English, Chinese, and Korean.

（ステップＳ２４）
翻訳サーバ制御部２４は、翻訳処理において翻訳した提供者翻訳内容を、翻訳サーバ通信部２２を介して支援サーバ３０に出力する。 (Step S24)
The translation server control unit 24 outputs the provider translation content translated in the translation process to the support server 30 via the translation server communication unit 22.

（ステップＳ２６）
支援サーバ３０の情報管理部３４２は、支援サーバ通信部３２を介して提供者翻訳内容を取得する。情報管理部３４２は、取得した提供者翻訳内容を、表示態様決定部３４６に出力する。 (Step S26)
The information management unit 342 of the support server 30 acquires the provider translation content via the support server communication unit 32. The information management unit 342 outputs the acquired provider translation content to the display mode determination unit 346.

（ステップＳ２８）
表示態様決定部３４６は、提供者翻訳内容を取得すると、クライアント端末４０の表示部４６に表示する表示態様を決定する。具体的には、表示態様決定部３４６は、ステップＳ１６において取得した認識結果に含まれる提供者認識内容をサービス提供者向けの第２の領域に表示させ、提供者翻訳内容をサービス利用者向けの第１の領域に表示させるように、表示態様を決定する。そして、表示態様決定部３４６は、決定した表示態様、提供者認識内容、および提供者翻訳内容を含む表示情報を、表示情報出力部３４８に出力する。 (Step S28)
Upon acquiring the provider translation content, the display mode determination unit 346 determines the display mode to be used on the display unit 46 of the client terminal 40. Specifically, the display mode determination unit 346 determines the display mode so that the provider recognition content included in the recognition result acquired in step S16 is displayed in the second area for the service provider, and the provider translation content is displayed in the first area for the service user. Then, the display mode determination unit 346 outputs the display information including the determined display mode, the provider recognition content, and the provider translation content to the display information output unit 348.
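
The display-mode decision of step S28 can be sketched as follows. This is only an illustration under stated assumptions: the `DisplayInfo` record, its field names, and the function name are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DisplayInfo:
    # Hypothetical stand-in for the "display information" of step S28.
    first_area: List[str] = field(default_factory=list)   # for the service user
    second_area: List[str] = field(default_factory=list)  # for the service provider


def decide_provider_display(recognized: str, translations: Dict[str, str]) -> DisplayInfo:
    """Put the provider's recognized text in the second area and
    every translation of it in the first area."""
    info = DisplayInfo()
    info.second_area.append(recognized)
    for language, text in translations.items():
        info.first_area.append(f"[{language}] {text}")
    return info
```

For a service-user utterance (step S80) the same idea applies with the two areas swapped.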

（ステップＳ３０）
表示情報出力部３４８は、取得した表示情報を、支援サーバ通信部３２を介してクライアント端末４０に出力する。 (Step S30)
The display information output unit 348 outputs the acquired display information to the client terminal 40 via the support server communication unit 32.

（ステップＳ３２）
クライアント端末４０の表示情報取得部４４８は、クライアント端末通信部４２を介して表示情報を取得する。表示情報取得部４４８は、取得した表示情報を表示制御部４５０に出力する。 (Step S32)
The display information acquisition unit 448 of the client terminal 40 acquires the display information via the client terminal communication unit 42. The display information acquisition unit 448 outputs the acquired display information to the display control unit 450.

（ステップＳ３４）
表示制御部４５０は、表示情報を取得すると、表示情報に含まれる情報を参照し、表示部４６に画像を表示させる。具体的には、表示制御部４５０は、表示情報に含まれる表示態様を参照し、表示情報に含まれる提供者認識内容をサービス提供者向けの第２の領域に表示させ、表示情報に含まれる提供者翻訳内容をサービス利用者向けの第１の領域に表示させる。このとき、表示部４６に表示される画像の例を、図４に示す。図４は、本発明の実施形態１において表示部４６に表示される画像の一例を示す図であり、（ａ）は、第１の領域４６ａに表示される画像であり、（ｂ）は、第２の領域４６ｂに表示される画像である。 (Step S34)
Upon acquiring the display information, the display control unit 450 refers to the information included in the display information and causes the display unit 46 to display an image. Specifically, the display control unit 450 refers to the display mode included in the display information, displays the provider recognition content included in the display information in the second area for the service provider, and displays the provider translation content included in the display information in the first area for the service user. An example of the image displayed on the display unit 46 at this time is shown in FIG. 4. FIG. 4 is a diagram showing an example of an image displayed on the display unit 46 in the first embodiment of the present invention; (a) is the image displayed in the first area 46a, and (b) is the image displayed in the second area 46b.

図４の（ａ）に示すように、第１の領域４６ａに表示される画像には、提供者認識内容を、（１）英語に翻訳した翻訳内容を含むテキスト６００、（２）中国語に翻訳した翻訳内容を含むテキスト６０２、および（３）韓国語に翻訳した翻訳内容を含むテキスト６０４が含まれている。また、第１の領域４６ａに表示される画像には、コミュニケーション支援システムにおいて英語、中国語、および韓国語以外の他の言語にも翻訳した場合、当該他の言語の翻訳内容を表示させるための操作を受け付けるボタン６０６が含まれていてもよい。 As shown in (a) of FIG. 4, the image displayed in the first area 46a includes (1) a text 600 containing the provider recognition content translated into English, (2) a text 602 containing the provider recognition content translated into Chinese, and (3) a text 604 containing the provider recognition content translated into Korean. In addition, when the communication support system also translates into languages other than English, Chinese, and Korean, the image displayed in the first area 46a may include a button 606 that accepts an operation for displaying the translated content in those other languages.

また、図４の（ｂ）に示すように、第２の領域４６ｂに表示される画像には、提供者認識内容を含むテキスト７００が含まれている。 Further, as shown in FIG. 4B, the image displayed in the second area 46b includes the text 700 including the contents of the provider recognition.

（サービス提供者から発話があった場合２）
続いて、コミュニケーション支援システム１において、サービス提供者から発話があった場合の別の処理について、図５を用いて説明する。図５は、本発明の実施形態１におけるコミュニケーション支援システム１において、サービス提供者から発話があった場合の処理の流れの他の例を示すシーケンス図である。図５を用いた説明では、認識処理をクライアント端末４０において実行する場合について説明する。 (When the service provider speaks: case 2)
Next, another process performed in the communication support system 1 when the service provider speaks will be described with reference to FIG. 5. FIG. 5 is a sequence diagram showing another example of the flow of processing in the communication support system 1 according to the first embodiment of the present invention when the service provider speaks. The description using FIG. 5 covers the case where the recognition process is executed in the client terminal 40.

（ステップＳ２）
クライアント端末４０の音声情報取得部４４２は、音声入力部４８を介してサービス提供者が発した音声を示す提供者音声情報を取得する。音声情報取得部４４２は、取得した提供者音声情報を、音声認識部４４４に出力する。 (Step S2)
The voice information acquisition unit 442 of the client terminal 40 acquires the provider voice information indicating the voice uttered by the service provider via the voice input unit 48. The voice information acquisition unit 442 outputs the acquired provider voice information to the voice recognition unit 444.

（ステップＳ４０）
音声認識部４４４は、取得した提供者音声情報が示す音声内容を、サービス提供者が使用する言語として認識する。 (Step S40)
The voice recognition unit 444 recognizes the voice content indicated by the acquired provider voice information as the language used by the service provider.

（ステップＳ４２）
音声認識部４４４は、認識した内容を示す提供者認識内容と、ステップＳ４０における認識処理の確からしさを示す認識確度とを含む認識結果を、クライアント端末通信部４２を介して支援サーバ３０に出力する。 (Step S42)
The voice recognition unit 444 outputs a recognition result, which includes the provider recognition content indicating the recognized content and the recognition accuracy indicating the certainty of the recognition processing in step S40, to the support server 30 via the client terminal communication unit 42.

（ステップＳ４４）
支援サーバ３０の情報管理部３４２は、支援サーバ通信部３２を介して認識結果を取得する。情報管理部３４２は、取得した認識結果を表示態様決定部３４６に出力する。 (Step S44)
The information management unit 342 of the support server 30 acquires the recognition result via the support server communication unit 32. The information management unit 342 outputs the acquired recognition result to the display mode determination unit 346.

（ステップＳ１８〜ステップＳ３４）
上述した処理と同じ処理であるため、説明を省略する。 (Step S18 to Step S34)
Since the process is the same as the process described above, the description thereof will be omitted.

このように、音声情報に含まれる音声内容を認識する認識処理は、クライアント端末４０において実行されてもよいし、認識サーバ１０において実行されてもよい。認識処理がクライアント端末４０において実行される場合、クライアント端末４０、支援サーバ３０、および認識サーバ１０との間における通信量を減少させることができるという効果がある。一方、認識処理が認識サーバ１０において実行される場合、クライアント端末４０の負荷を減少させることができるという効果がある。また、クライアント端末４０においてまず認識処理を実行し、認識確度が所定の値より低い場合、認識サーバ１０に認識処理を実行させる構成であってもよい。そのため、特に記載がない限り、認識処理を行う装置については、限定されない。 As described above, the recognition process of recognizing the voice content included in the voice information may be executed by the client terminal 40 or the recognition server 10. When the recognition process is executed in the client terminal 40, there is an effect that the communication amount between the client terminal 40, the support server 30, and the recognition server 10 can be reduced. On the other hand, when the recognition processing is executed by the recognition server 10, there is an effect that the load on the client terminal 40 can be reduced. Alternatively, the recognition process may be executed in the client terminal 40 first, and when the recognition accuracy is lower than a predetermined value, the recognition server 10 may execute the recognition process. Therefore, unless otherwise specified, the device that performs the recognition process is not limited.
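
The hybrid configuration mentioned above, in which the client terminal recognizes first and the recognition server is used only when the recognition accuracy falls below a predetermined value, can be sketched as follows. The function names, the `(text, accuracy)` return shape, and the default value 0.6 are assumptions made for illustration; the specification does not define them.

```python
from typing import Callable, Tuple

# Assumed recognizer shape: audio bytes -> (recognized text, accuracy in [0, 1]).
Recognizer = Callable[[bytes], Tuple[str, float]]


def recognize_with_fallback(
    audio: bytes,
    on_client: Recognizer,
    on_server: Recognizer,
    min_accuracy: float = 0.6,  # hypothetical "predetermined value"
) -> Tuple[str, float]:
    """Try recognition on the client; delegate to the server only when
    the client's recognition accuracy is below min_accuracy."""
    text, accuracy = on_client(audio)
    if accuracy < min_accuracy:
        return on_server(audio)
    return text, accuracy
```

With this arrangement the audio is sent over the network only for utterances the client could not recognize confidently, which trades some client load for reduced traffic.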

（サービス利用者から発話があった場合）
コミュニケーション支援システム１において、サービス利用者から発話があった場合の処理（コミュニケーション支援方法）について、図６を用いて説明する。図６は、本発明の実施形態１におけるコミュニケーション支援システム１において、サービス利用者から発話があった場合の処理の流れを示すシーケンス図である。以下の説明では、特に記載がない限り、利用者認識内容は、英語、中国語、および韓国語として認識され、利用者認識内容を翻訳する言語は、日本語である場合を例に挙げ、説明する。 (When the service user speaks)
The processing (communication support method) performed in the communication support system 1 when the service user speaks will be described with reference to FIG. 6. FIG. 6 is a sequence diagram showing a flow of processing in the communication support system 1 according to the first embodiment of the present invention when the service user speaks. In the following description, unless otherwise specified, the case where the user's speech is recognized as each of English, Chinese, and Korean and the user recognition content is translated into Japanese is used as an example.

（ステップＳ５２：取得ステップ）
クライアント端末４０の音声情報取得部４４２は、音声入力部４８を介してサービス利用者が発した音声を示す利用者音声情報を取得する取得処理を行う。具体的には、音声情報取得部４４２は、音声入力部４８を介して、上述したサービス提供者の「何かお探しですか？」に対してサービス利用者が発した (Step S52: acquisition step)
The voice information acquisition unit 442 of the client terminal 40 performs an acquisition process of acquiring, via the voice input unit 48, user voice information indicating the voice uttered by the service user. Specifically, in response to the service provider's "Are you looking for something?" described above, the service user makes an utterance,

を示す利用者音声情報を取得する。 and the voice information acquisition unit 442 acquires, via the voice input unit 48, the user voice information indicating that utterance.

（ステップＳ５４）
音声情報取得部４４２は、クライアント端末通信部４２を介して支援サーバ３０に利用者音声情報を出力する。 (Step S54)
The voice information acquisition unit 442 outputs the user voice information to the support server 30 via the client terminal communication unit 42.

（ステップＳ５６）
支援サーバ３０の情報管理部３４２は、支援サーバ通信部３２を介して利用者音声情報を取得する。 (Step S56)
The information management unit 342 of the support server 30 acquires the user voice information via the support server communication unit 32.

（ステップＳ５８）
情報管理部３４２は、利用者音声情報を取得すると、当該利用者音声情報が示す利用者音声内容を認識した認識結果を取得するため、支援サーバ通信部３２を介して、当該利用者音声情報を認識サーバ１０に出力する。 (Step S58)
Upon acquiring the user voice information, the information management unit 342 outputs the user voice information to the recognition server 10 via the support server communication unit 32 in order to obtain a recognition result for the user voice content indicated by that user voice information.

（ステップＳ６０）
認識サーバ１０の認識サーバ制御部１４は、認識サーバ通信部１２を介して、利用者音声情報を取得する。 (Step S60)
The recognition server control unit 14 of the recognition server 10 acquires the user voice information via the recognition server communication unit 12.

（ステップＳ６２：認識ステップ）
認識サーバ制御部１４は、取得した利用者音声情報が示す音声内容を、英語、中国語、および韓国語として認識する。具体的には、認識サーバ制御部１４は、「ウォシャンヤオコーヒー」という発音を、英語、中国語、および韓国語として認識する。 (Step S62: Recognition step)
The recognition server control unit 14 recognizes the voice content indicated by the acquired user voice information as each of English, Chinese, and Korean. Specifically, the recognition server control unit 14 recognizes the pronunciation "Woshan Yao Coffee" as each of English, Chinese, and Korean.

（ステップＳ６４）
認識サーバ制御部１４は、認識した内容を示す利用者認識内容と、ステップＳ６２における認識処理の確からしさを示す認識確度とを含む認識結果を、認識サーバ通信部１２を介して支援サーバ３０に出力する。具体的には、認識サーバ制御部１４は、「ウォシャンヤオコーヒー」を中国語として認識した認識結果は、認識確度は高く、利用者認識内容も (Step S64)
The recognition server control unit 14 outputs a recognition result, which includes the user recognition content indicating the recognized content and the recognition accuracy indicating the certainty of the recognition processing in step S62, to the support server 30 via the recognition server communication unit 12. Specifically, when "Woshan Yao Coffee" is recognized as Chinese, the recognition accuracy is high and the user recognition content is the corresponding Chinese text.

になる。一方、認識サーバ制御部１４は、「ウォシャンヤオコーヒー」を英語として認識した結果は、認識確度が低く、利用者認識内容も「What are y'all coffee」になる。 On the other hand, when "Woshan Yao Coffee" is recognized as English, the recognition accuracy is low and the user recognition content is "What are y'all coffee".

（ステップＳ６６）
支援サーバ３０の情報管理部３４２は、支援サーバ通信部３２を介して認識結果を取得する。情報管理部３４２は、取得した認識結果を選択部３４４に出力する。 (Step S66)
The information management unit 342 of the support server 30 acquires the recognition result via the support server communication unit 32. The information management unit 342 outputs the acquired recognition result to the selection unit 344.

（ステップＳ６８：選択ステップ）
選択部３４４は、認識結果を参照し、表示対象の利用者認識内容を選択する。具体的には、選択部３４４は、認識結果に含まれている認識確度を参照し、認識確度が所定の閾値より高い認識確度で認識された利用者認識内容を選択する。より具体的には、「ウォシャンヤオコーヒー」を英語および中国語として認識した認識確度が所定の閾値より高く、韓国語として認識した認識確度が所定の閾値以下の場合、選択部３４４は、英語および中国語としてそれぞれ認識された利用者認識内容を選択する。そして、選択部３４４は、選択した利用者認識内容を、情報管理部３４２に出力する。また、選択部３４４は、選択した利用者認識内容を含む認識結果を、表示態様決定部３４６に出力する。 (Step S68: selection step)
The selection unit 344 refers to the recognition result and selects the user recognition content to be displayed. Specifically, the selection unit 344 refers to the recognition accuracy included in the recognition result and selects the user recognition content that was recognized with a recognition accuracy higher than a predetermined threshold. More specifically, when the recognition accuracy for "Woshan Yao Coffee" recognized as English and as Chinese is higher than the predetermined threshold and the recognition accuracy for it recognized as Korean is equal to or lower than the predetermined threshold, the selection unit 344 selects the user recognition content recognized as English and the user recognition content recognized as Chinese. Then, the selection unit 344 outputs the selected user recognition content to the information management unit 342. The selection unit 344 also outputs the recognition result including the selected user recognition content to the display mode determination unit 346.
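
The selection in step S68 amounts to a threshold filter over the per-language recognition results. The sketch below is illustrative only: the dictionary layout, the accuracy values, and the threshold 0.5 are assumptions, and the angle-bracket strings are placeholders for text that is not reproduced here.

```python
from typing import Dict, Tuple


def select_display_targets(
    results: Dict[str, Tuple[str, float]], threshold: float
) -> Dict[str, Tuple[str, float]]:
    """Keep only the languages whose recognition accuracy is strictly
    higher than the threshold; accuracy equal to the threshold is dropped."""
    return {
        language: (text, accuracy)
        for language, (text, accuracy) in results.items()
        if accuracy > threshold
    }


# Hypothetical accuracies for the "Woshan Yao Coffee" example.
results = {
    "en": ("What are y'all coffee", 0.55),
    "zh": ("<Chinese text>", 0.90),
    "ko": ("<Korean text>", 0.20),
}
selected = select_display_targets(results, threshold=0.5)
```

With these assumed values, `selected` holds the English and Chinese entries and omits the Korean one, matching the example in the text.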

（ステップＳ７０）
情報管理部３４２は、取得した利用者認識内容を翻訳するため、当該利用者認識内容を翻訳サーバ２０に出力する。 (Step S70)
The information management unit 342 outputs the acquired user recognition content to the translation server 20 in order to translate the acquired user recognition content.

（ステップＳ７２）
翻訳サーバ２０の翻訳サーバ制御部２４は、翻訳サーバ通信部２２を介して、利用者認識内容を取得する。 (Step S72)
The translation server control unit 24 of the translation server 20 acquires the user recognition content via the translation server communication unit 22.

（ステップＳ７４）
翻訳サーバ制御部２４は、取得した利用者認識内容を、予め設定された日本語に翻訳する。具体的には、翻訳サーバ制御部２４は、中国語の (Step S74)
The translation server control unit 24 translates the acquired user recognition content into Japanese, which is set in advance. Specifically, the translation server control unit 24 translates the Chinese user recognition content

および英語の「What are y'all coffee」を、日本語に翻訳する。 and the English "What are y'all coffee" into Japanese.

（ステップＳ７６）
翻訳サーバ制御部２４は、翻訳処理において翻訳した利用者翻訳内容を、翻訳サーバ通信部２２を介して支援サーバ３０に出力する。 (Step S76)
The translation server control unit 24 outputs the user translation content translated in the translation process to the support server 30 via the translation server communication unit 22.

（ステップＳ７８）
支援サーバ３０の情報管理部３４２は、支援サーバ通信部３２を介して利用者翻訳内容を取得する。情報管理部３４２は、取得した利用者翻訳内容を、表示態様決定部３４６に出力する。 (Step S78)
The information management unit 342 of the support server 30 acquires the user translation content via the support server communication unit 32. The information management unit 342 outputs the acquired user translation content to the display mode determination unit 346.

（ステップＳ８０）
表示態様決定部３４６は、利用者翻訳内容を取得すると、クライアント端末４０の表示部４６に表示する表示態様を決定する。具体的には、表示態様決定部３４６は、ステップＳ６８において取得した認識結果に含まれる利用者認識内容をサービス利用者向けの第１の領域に表示させ、利用者翻訳内容をサービス提供者向けの第２の領域に表示させるように、表示態様を決定する。そして、表示態様決定部３４６は、決定した表示態様、利用者認識内容、および利用者翻訳内容を含む表示情報を、表示情報出力部３４８に出力する。 (Step S80)
Upon acquiring the user translation content, the display mode determination unit 346 determines the display mode to be used on the display unit 46 of the client terminal 40. Specifically, the display mode determination unit 346 determines the display mode so that the user recognition content included in the recognition result acquired in step S68 is displayed in the first area for the service user, and the user translation content is displayed in the second area for the service provider. Then, the display mode determination unit 346 outputs the display information including the determined display mode, the user recognition content, and the user translation content to the display information output unit 348.

（ステップＳ８２）
表示情報出力部３４８は、取得した表示情報を、支援サーバ通信部３２を介してクライアント端末４０に出力する。 (Step S82)
The display information output unit 348 outputs the acquired display information to the client terminal 40 via the support server communication unit 32.

（ステップＳ８４）
クライアント端末４０の表示情報取得部４４８は、クライアント端末通信部４２を介して表示情報を取得する。表示情報取得部４４８は、取得した表示情報を表示制御部４５０に出力する。 (Step S84)
The display information acquisition unit 448 of the client terminal 40 acquires the display information via the client terminal communication unit 42. The display information acquisition unit 448 outputs the acquired display information to the display control unit 450.

（ステップＳ８６：表示ステップ）
表示制御部４５０は、表示情報を取得すると、表示情報に含まれる情報を参照し、表示部４６に画像を表示させる。具体的には、表示制御部４５０は、表示情報に含まれる表示態様を参照し、表示情報に含まれる利用者認識内容をサービス利用者向けの第１の領域に表示させ、表示情報に含まれる利用者翻訳内容をサービス提供者向けの第２の領域に表示させる。このとき、表示部４６に表示される画像の例を、図７に示す。図７は、本発明の実施形態１において表示部４６に表示される画像の他の例を示す図であり、（ａ）は、第１の領域４６ａに表示される画像であり、（ｂ）は、第２の領域４６ｂに表示される画像である。 (Step S86: display step)
Upon acquiring the display information, the display control unit 450 refers to the information included in the display information and causes the display unit 46 to display an image. Specifically, the display control unit 450 refers to the display mode included in the display information, displays the user recognition content included in the display information in the first area for the service user, and displays the user translation content included in the display information in the second area for the service provider. An example of the image displayed on the display unit 46 at this time is shown in FIG. 7. FIG. 7 is a diagram showing another example of an image displayed on the display unit 46 in the first embodiment of the present invention; (a) is the image displayed in the first area 46a, and (b) is the image displayed in the second area 46b.

図７の（ａ）に示すように、第１の領域４６ａに表示される画像には、上述したテキスト６００、テキスト６０２、およびテキスト６０４に加えて、（１）中国語として認識された利用者認識内容を含むテキスト６１０、および（２）英語として認識された利用者認識内容を含むテキスト６１２が含まれている。 As shown in (a) of FIG. 7, in addition to the text 600, the text 602, and the text 604 described above, the image displayed in the first area 46a includes (1) a text 610 containing the user recognition content recognized as Chinese and (2) a text 612 containing the user recognition content recognized as English.

また、図７の（ｂ）に示すように、第２の領域４６ｂに表示される画像には、上述したテキスト７００に加えて、（１）中国語として認識された利用者認識内容を日本語に翻訳した利用者翻訳内容を含むテキスト７１０、および（２）英語として認識された利用者認識内容を日本語に翻訳した利用者翻訳内容を含むテキスト７１２が含まれている。 Further, as shown in (b) of FIG. 7, in addition to the text 700 described above, the image displayed in the second area 46b includes (1) a text 710 containing the user translation content obtained by translating into Japanese the user recognition content recognized as Chinese and (2) a text 712 containing the user translation content obtained by translating into Japanese the user recognition content recognized as English.

なお、コミュニケーション支援システム１では、図７の（ａ）に示すように、認識確度が高い順に上から利用者認識内容（第１の認識内容）を表示してもよい。具体的には、ステップＳ８０において、表示態様決定部３４６は、ステップＳ６８において取得した認識結果に含まれる認識確度を参照して、利用者認識内容の表示態様を決定する。例えば、上述したように、認識確度が高い順に上から利用者認識内容が表示されるように表示態様を決定する構成や、認識確度が高い利用者認識内容は濃い色で表示され、認識確度が低い利用者認識内容は薄い色で表示されるように表示態様を決定する構成が挙げられる。また、図７の（ｂ）に示すように、利用者認識内容の表示態様に合わせて、認識確度が高い利用者認識内容を翻訳した利用者翻訳内容を上に表示する構成であってもよい。 In the communication support system 1, as shown in (a) of FIG. 7, the user recognition content (first recognition content) may be displayed from the top in descending order of recognition accuracy. Specifically, in step S80, the display mode determination unit 346 determines the display mode of the user recognition content with reference to the recognition accuracy included in the recognition result acquired in step S68. Examples include a configuration that determines the display mode so that, as described above, the user recognition content is displayed from the top in descending order of recognition accuracy, and a configuration that determines the display mode so that user recognition content with high recognition accuracy is displayed in a dark color and user recognition content with low recognition accuracy is displayed in a light color. Further, as shown in (b) of FIG. 7, the user translation content obtained by translating the user recognition content with higher recognition accuracy may be displayed higher, in line with the display mode of the user recognition content.
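
The two display-mode options described above (ordering by recognition accuracy, and darker text for higher accuracy) can be sketched as follows. The tuple layout, the colour codes, and the 0.7 cutoff are assumptions chosen for illustration only.

```python
from typing import List, Tuple

Entry = Tuple[str, str, float]  # (language, recognized text, recognition accuracy)


def order_by_accuracy(entries: List[Entry]) -> List[Entry]:
    """Return the entries with the highest recognition accuracy first,
    i.e. in top-to-bottom display order."""
    return sorted(entries, key=lambda entry: entry[2], reverse=True)


def shade_for(accuracy: float, cutoff: float = 0.7) -> str:
    """Pick a dark colour for high accuracy and a light one otherwise.
    The colour codes and the cutoff are hypothetical."""
    return "#000000" if accuracy >= cutoff else "#999999"
```

Applying the same ordering to the translated texts keeps the two areas aligned, so the provider sees the translation of the most likely recognition at the top.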

（さらにサービス提供者から発話があった場合）
続いて、サービス提供者が図７の（ｂ）に示す画像を見て、サービス利用者は「私はコーヒーが欲しいです」と発話したと判断し、サービス利用者に対して「ホットでよろしいですか？」と発話した場合について、図３を用いて説明する。 (When the service provider speaks again)
Next, with reference to FIG. 3, a case will be described in which the service provider looks at the image shown in (b) of FIG. 7, determines that the service user said "I want coffee", and says "Would you like it hot?" to the service user.

（ステップＳ２〜ステップＳ１６）
上述した処理と同じ処理であるため、説明を省略する。 (Steps S2 to S16)
Since the process is the same as the process described above, the description thereof will be omitted.

（ステップＳ１８）
情報管理部３４２は、取得した認識結果に含まれる提供者認識内容を翻訳するため、当該提供者認識内容を翻訳サーバ２０に出力する。ここで、上述したステップＳ６８において、選択部３４４は英語および中国語としてそれぞれ認識された利用者認識内容を選択したので、情報管理部３４２は、提供者認識内容を英語および中国語にそれぞれ翻訳する指示と共に、提供者認識内容を翻訳サーバ２０に出力する。 (Step S18)
The information management unit 342 outputs the provider recognition content included in the acquired recognition result to the translation server 20 in order to translate the provider recognition content. Here, in step S68 described above, since the selection unit 344 has selected the user recognition contents recognized as English and Chinese, the information management unit 342 translates the provider recognition contents into English and Chinese, respectively. The provider recognition content is output to the translation server 20 together with the instruction.
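
The narrowing of translation targets described here can be sketched as follows: before any service-user utterance has been selected, all preset languages are used; afterwards, only the languages selected in the most recent step S68 are used. The language codes and the function name are assumptions for illustration.

```python
from typing import Iterable, List

# Preset target languages from the example: English, Chinese, Korean.
PRESET_LANGUAGES = ["en", "zh", "ko"]


def translation_targets(previously_selected: Iterable[str]) -> List[str]:
    """Return the languages the provider's utterance should be translated into.
    Falls back to every preset language when nothing has been selected yet."""
    chosen = set(previously_selected)
    if not chosen:
        return list(PRESET_LANGUAGES)
    # Preserve the preset order so the display stays stable between turns.
    return [language for language in PRESET_LANGUAGES if language in chosen]
```

For the running example, `translation_targets({"zh", "en"})` yields English and Chinese, so no Korean translation is requested for "Would you like it hot?".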

（ステップＳ２０）
翻訳サーバ２０の翻訳サーバ制御部２４は、翻訳サーバ通信部２２を介して、提供者認識内容を取得すると共に、提供者認識内容を英語および中国語にそれぞれ翻訳する指示を受け付ける。 (Step S20)
The translation server control unit 24 of the translation server 20 acquires the provider recognition content via the translation server communication unit 22 and receives an instruction to translate the provider recognition content into English and Chinese respectively.

（ステップＳ２２）
翻訳サーバ制御部２４は、取得した提供者認識内容を、受け付けた指示に従い、英語および中国語に翻訳する。具体的には、翻訳サーバ制御部２４は、「ホットでよろしいですか？」を英語および中国語に翻訳する。 (Step S22)
The translation server control unit 24 translates the acquired provider recognition content into English and Chinese according to the received instruction. Specifically, the translation server control unit 24 translates "Would you like it hot?" into English and Chinese.

（ステップＳ２４〜ステップＳ３４）
上述した処理と同じ処理であるため、説明を省略する。ここで、ステップＳ３４において表示部４６に表示される画像の例を、図８の（ａ）および（ｂ）に示す。図８は、本発明の実施形態１において表示部４６に表示される画像のさらに他の例を示す図であり，（ａ）は、第１の領域４６ａに表示される画像であり、（ｂ）は、第２の領域４６ｂに表示される画像である。 (Steps S24 to S34)
Since the process is the same as the process described above, the description thereof will be omitted. Here, an example of the image displayed on the display unit 46 in step S34 is shown in (a) and (b) of FIG. 8. FIG. 8 is a diagram showing still another example of the image displayed on the display unit 46 in the first embodiment of the present invention; (a) is the image displayed in the first area 46a, and (b) is the image displayed in the second area 46b.

図８の（ａ）に示すように、第１の領域４６ａに表示される画像には、上述したテキスト６００、テキスト６０２、テキスト６０４、テキスト６１０、およびテキスト６１２に加えて、提供者認識内容を、（１）中国語に翻訳した翻訳内容を含むテキスト６２０、および（２）英語に翻訳した翻訳内容を含むテキスト６２２が含まれている。 As shown in (a) of FIG. 8, in addition to the text 600, the text 602, the text 604, the text 610, and the text 612 described above, the image displayed in the first area 46a includes (1) a text 620 containing the provider recognition content translated into Chinese and (2) a text 622 containing the provider recognition content translated into English.

また、図８の（ｂ）に示すように、第２の領域４６ｂに表示される画像には、上述したテキスト７００、テキスト７１０、テキスト７１２に加えて、提供者認識内容を含むテキスト７２０が含まれている。 Further, as shown in (b) of FIG. 8, in addition to the text 700, the text 710, and the text 712 described above, the image displayed in the second area 46b includes a text 720 containing the provider recognition content.

（さらにサービス利用者から発話があった場合）
続いて、サービス利用者が図８の（ａ）に示す画像を見て、「好的」と発話した場合について、図６を用いて説明する。 (When the service user speaks again)
Next, a case where the service user looks at the image shown in (a) of FIG. 8 and says "好的" will be described with reference to FIG. 6.

（ステップＳ５２〜ステップＳ５６）
上述した処理と同じ処理であるため、説明を省略する。 (Step S52 to Step S56)
Since the process is the same as the process described above, the description thereof will be omitted.

（ステップＳ５８）
情報管理部３４２は、利用者音声情報を取得すると、当該利用者音声情報が示す利用者音声内容を認識した認識結果を取得するため、支援サーバ通信部３２を介して、当該利用者音声情報を認識サーバ１０に出力する。ここで、上述したステップＳ６８において、選択部３４４は英語および中国語としてそれぞれ認識された利用者認識内容を選択したので、情報管理部３４２は、利用者音声情報を英語および中国語として認識する指示と共に、利用者音声情報を翻訳サーバ２０に出力する。 (Step S58)
Upon acquiring the user voice information, the information management unit 342 outputs the user voice information to the recognition server 10 via the support server communication unit 32 in order to obtain a recognition result for the user voice content indicated by that user voice information. Here, because the selection unit 344 selected the user recognition content recognized as English and as Chinese in step S68 described above, the information management unit 342 outputs the user voice information to the translation server 20 together with an instruction to recognize the user voice information as English and Chinese.

（ステップＳ６０）
認識サーバ１０の認識サーバ制御部１４は、認識サーバ通信部１２を介して、利用者音声情報を取得すると共に、利用者音声情報を英語および中国語として認識する指示を受け付ける。 (Step S60)
The recognition server control unit 14 of the recognition server 10 acquires the user voice information via the recognition server communication unit 12 and receives an instruction to recognize the user voice information as English and Chinese.

（ステップＳ６２）
認識サーバ制御部１４は、取得した利用者音声情報が示す音声内容を、英語および中国語として認識する。具体的には、認識サーバ制御部１４は、「ハオダ」を、英語および中国語として認識する。 (Step S62)
The recognition server control unit 14 recognizes the voice content indicated by the acquired user voice information as English and Chinese. Specifically, the recognition server control unit 14 recognizes “Haoda” as English and Chinese.

（ステップＳ６４）
認識サーバ制御部１４は、認識した内容を示す利用者認識内容と、ステップＳ６２における認識処理の確からしさを示す認識確度とを含む認識結果を、認識サーバ通信部１２を介して支援サーバ３０に出力する。具体的には、認識サーバ制御部１４は、「ハオダ」を中国語として認識した認識結果は、認識確度は高く、利用者認識内容も「好的」になる。一方、認識サーバ制御部１４は、「ハオダ」を英語として認識した認識結果は、認識確度が低くなる。 (Step S64)
The recognition server control unit 14 outputs a recognition result, which includes the user recognition content indicating the recognized content and the recognition accuracy indicating the certainty of the recognition processing in step S62, to the support server 30 via the recognition server communication unit 12. Specifically, when "Haoda" is recognized as Chinese, the recognition accuracy is high and the user recognition content is "好的". On the other hand, when "Haoda" is recognized as English, the recognition accuracy is low.

（ステップＳ６８）
選択部３４４は、認識結果を参照し、表示対象の利用者認識内容を選択する。具体的には、「ハオダ」を中国語として認識した認識確度が所定の閾値より高く、英語として認識した認識確度が所定の閾値以下の場合、選択部３４４は、中国語としてそれぞれ認識された利用者認識内容を選択する。そして、選択部３４４は、選択した利用者認識内容を、情報管理部３４２に出力する。また、選択部３４４は、選択した利用者認識内容を含む認識結果を、表示態様決定部３４６に出力する。 (Step S68)
The selection unit 344 refers to the recognition result and selects the user recognition content to be displayed. Specifically, when the recognition accuracy for "Haoda" recognized as Chinese is higher than the predetermined threshold and the recognition accuracy for it recognized as English is equal to or lower than the predetermined threshold, the selection unit 344 selects the user recognition content recognized as Chinese. Then, the selection unit 344 outputs the selected user recognition content to the information management unit 342. The selection unit 344 also outputs the recognition result including the selected user recognition content to the display mode determination unit 346.

（ステップＳ７０〜ステップＳ８６）
上述した処理と同じ処理であるため、説明を省略する。ここで、ステップＳ８６において表示部４６に表示される画像の例を、図８の（ｃ）および（ｄ）に示す。図８の（ｃ）は、第１の領域４６ａに表示される画像であり、（ｄ）は、第２の領域４６ｂに表示される画像である。 (Step S70 to Step S86)
Since the process is the same as the process described above, the description thereof will be omitted. Here, examples of the image displayed on the display unit 46 in step S86 are shown in (c) and (d) of FIG. 8. In FIG. 8, (c) is the image displayed in the first area 46a, and (d) is the image displayed in the second area 46b.

図８の（ｃ）に示すように第１の領域４６ａに表示される画像には、上述したテキスト６１０、テキスト６１２、テキスト６２０、およびテキスト６２２に加えて、中国語として認識された利用者認識内容を含むテキスト６３０が含まれている。 As shown in (c) of FIG. 8, in addition to the text 610, the text 612, the text 620, and the text 622 described above, the image displayed in the first area 46a includes a text 630 containing the user recognition content recognized as Chinese.

また、図８の（ｄ）に示すように、第２の領域４６ｂに表示される画像には、上述したテキスト７００、テキスト７１０、テキスト７１２、およびテキスト７２０に加えて、中国語として認識された利用者認識内容を日本語に翻訳した利用者翻訳内容を含むテキスト７３０が含まれている。 Further, as shown in FIG. 8D, the image displayed in the second area 46b is recognized as Chinese in addition to the text 700, the text 710, the text 712, and the text 720 described above. The text 730 including the user translated content in which the user recognized content is translated into Japanese is included.

このように、本実施形態に係るコミュニケーション支援システム１では、サービス利用者（第１のユーザ）向けの第１の領域４６ａおよびサービス提供者（第２のユーザ）向けの第２の領域４６ｂを有する表示部４６と、音声入力部４８と、制御部（認識サーバ制御部１４、支援サーバ制御部３４、およびクライアント端末制御部４４）を備え、音声入力部４８を介してサービス利用者の音声を示す利用者音声情報（第１の音声情報）を取得し、利用者音声情報が示す利用者音声内容（第１の音声内容）を複数の言語（英語、中国語、および韓国語）の各々として認識する認識処理を行い、複数の言語の各々として認識された認識内容を示す利用者認識内容（第１の認識内容）から、表示対象の利用者認識内容を選択する選択処理を行い、表示対象の利用者認識内容を、表示部４６の第１の領域４６ａに表示する。この構成により、コミュニケーション支援システム１では、サービス利用者の発話を誤った言語として認識した場合であっても、当該誤った言語以外の言語として認識した認識内容も、ユーザに提示する。そのため、コミュニケーション支援システム１では、異なる言語を使用するユーザ同士のコミュニケーションを円滑にすることができる。 As described above, the communication support system 1 according to the present embodiment has the first area 46a for the service user (first user) and the second area 46b for the service provider (second user). The display unit 46, the voice input unit 48, and the control unit (the recognition server control unit 14, the support server control unit 34, and the client terminal control unit 44) are provided, and the voice of the service user is shown via the voice input unit 48. Acquires user voice information (first voice information) and recognizes the user voice content (first voice content) indicated by the user voice information as each of a plurality of languages (English, Chinese, and Korean) The recognition process is performed to perform the selection process of selecting the user recognition content to be displayed from the user recognition content (first recognition content) indicating the recognition content recognized as each of the plurality of languages, and The user recognition content is displayed in the first area 46a of the display unit 46. With this configuration, in the communication support system 1, even when the utterance of the service user is recognized as a wrong language, the recognition content recognized as a language other than the wrong language is also presented to the user. Therefore, the communication support system 1 can facilitate communication between users who use different languages.

また、本実施形態に係るコミュニケーション支援システム１では、第１のユーザはサービス利用者であり、第２のユーザはサービス提供者であってもよい。この構成により、コミュニケーション支援システム１では、使用する言語が特定できないサービス利用者（例えば、お店の客）に対しても、サービス提供者（例えば、お店の店員）は円滑にコミュニケーションを図ることができる。 Further, in the communication support system 1 according to the present embodiment, the first user may be a service user and the second user may be a service provider. With this configuration, the service provider (for example, a store clerk) can communicate smoothly even with a service user (for example, a store customer) whose language cannot be identified in advance.

また、本実施形態に係るコミュニケーション支援システム１では、第１のユーザはサービス提供者であり、第２のユーザはサービス利用者であってもよい。この構成により、コミュニケーション支援システム１では、異なる言語を使用する複数のサービス提供者（例えば、お店の店員）のそれぞれが、サービス利用者（例えば、お店の客）と円滑にコミュニケーションを図ることができる。 Further, in the communication support system 1 according to the present embodiment, the first user may be a service provider and the second user may be a service user. With this configuration, each of a plurality of service providers (for example, store clerks) who use different languages can communicate smoothly with a service user (for example, a store customer).

また、本実施形態に係るコミュニケーション支援システム１では、選択処理において、複数の言語の各々で認識する認識処理の確からしさを示す認識確度を参照して、表示対象の利用者認識内容を選択する。この構成により、コミュニケーション支援システム１では、認識確度が閾値より高い認識確度によって認識された認識内容を、ユーザに提示する。そのため、ユーザが使用する言語である可能性がないと考えられる言語として認識された認識内容は表示しないので、異なる言語を使用するユーザ同士のコミュニケーションを円滑にすることができる。 Further, in the communication support system 1 according to the present embodiment, the selection process selects the user recognition content to be displayed by referring to the recognition accuracy, which indicates how reliable the recognition process is for each of the plurality of languages. With this configuration, the communication support system 1 presents to the user the recognition content recognized with a recognition accuracy higher than a threshold. Because recognition content recognized as a language that the user is unlikely to be using is not displayed, communication between users who use different languages can be facilitated.

また、本実施形態に係るコミュニケーション支援システム１では、認識確度を参照して、表示対象の利用者認識内容を表示部４６の第１の領域４６ａに表示する表示態様を決定する。この構成により、コミュニケーション支援システム１では、例えば、認識確度が高い認識内容を目立つように表示させたり、認識確度が高い順に並べて認識内容を表示させたりすることができる。そのため、コミュニケーション支援システム１では、何れの言語による認識確度が高いのかということをユーザに知らせることができる。 Further, in the communication support system 1 according to the present embodiment, the display mode for displaying the user recognition content to be displayed in the first area 46a of the display unit 46 is determined with reference to the recognition accuracy. With this configuration, in the communication support system 1, for example, the recognition contents having high recognition accuracy can be displayed conspicuously, or the recognition contents can be displayed in the order of high recognition accuracy. Therefore, the communication support system 1 can inform the user of which language has the higher recognition accuracy.
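
The selection and ordering described in the two paragraphs above can be illustrated with a minimal sketch. It is not the implementation of the selection unit 344 or the display mode determination unit 346; the `Recognition` type, the function name, and the threshold value of 20 are assumptions chosen for illustration, while the accuracies 30, 50, and 10 are borrowed from the example used later in this description.

```python
from typing import NamedTuple


class Recognition(NamedTuple):
    language: str    # language the utterance was recognized as
    text: str        # recognized content (placeholders here)
    accuracy: float  # recognition accuracy reported by the recognizer


def select_for_display(results, threshold):
    """Keep results whose accuracy exceeds the threshold, most accurate first."""
    kept = [r for r in results if r.accuracy > threshold]
    return sorted(kept, key=lambda r: r.accuracy, reverse=True)


results = [
    Recognition("en", "<English hypothesis>", 30),
    Recognition("zh", "<Chinese hypothesis>", 50),
    Recognition("ko", "<Korean hypothesis>", 10),
]
# Hypothetical threshold of 20: Korean is dropped, Chinese is listed first.
shown = select_for_display(results, threshold=20)
```

Sorting in descending order of accuracy corresponds to the option of listing recognition contents in order of recognition accuracy; emphasizing the top entry would be an alternative display mode.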

また、本実施形態に係るコミュニケーション支援システム１では、表示対象の利用者認識内容を翻訳した利用者翻訳内容（第１の翻訳内容）を取得し、利用者翻訳内容を表示部４６の第２の領域４６ｂに表示する。この構成により、コミュニケーション支援システム１では、サービス利用者の発話内容をサービス提供者が使用する言語に翻訳し、提示することができる。そのため、コミュニケーション支援システム１では、異なる言語を使用するユーザ同士のコミュニケーションを円滑にすることができる。 In addition, the communication support system 1 according to the present embodiment acquires the user translation content (first translation content) obtained by translating the user recognition content to be displayed, and displays the user translation content in the second area 46b of the display unit 46. With this configuration, the communication support system 1 can translate the utterance content of the service user into the language used by the service provider and present the translated content. The communication support system 1 can therefore facilitate communication between users who use different languages.

また、本実施形態に係るコミュニケーション支援システム１では、音声入力部４８を介してサービス提供者の音声を示す提供者音声情報（第２の音声情報）を取得し、提供者音声情報が示す提供者音声内容（第２の音声内容）を認識する認識処理を行い、認識処理によって認識された提供者認識内容（第２の認識内容）を複数の言語に翻訳した提供者翻訳内容（第２の翻訳内容）を取得し、提供者翻訳内容を表示部４６の第１の領域４６ａに表示する。この構成により、コミュニケーション支援システム１では、サービス提供者の発話内容をサービス利用者が使用する言語を含む複数の言語に翻訳し、提示することができる。そのため、コミュニケーション支援システム１では、異なる言語を使用するユーザ同士のコミュニケーションを円滑にすることができる。 Further, in the communication support system 1 according to the present embodiment, the provider voice information (second voice information) indicating the voice of the service provider is acquired via the voice input unit 48, and the provider indicated by the provider voice information. Recognition processing for recognizing voice content (second voice content) is performed, and provider translation content (second translation) obtained by translating the provider recognition content (second recognition content) recognized by the recognition processing into a plurality of languages. Content) is acquired and the provider translation content is displayed in the first area 46a of the display unit 46. With this configuration, the communication support system 1 can translate the utterance content of the service provider into a plurality of languages including the language used by the service user and present the translated content. Therefore, the communication support system 1 can facilitate communication between users who use different languages.

なお、本実施形態では、認識確度が所定の閾値より高い認識確度で認識された利用者認識内容を選択する構成としたが、認識確度の積算値と閾値とを比較する構成であってもよい。 Note that, in the present embodiment, the user recognition content recognized with a recognition accuracy higher than a predetermined threshold is selected, but the configuration may instead compare an integrated (cumulative) value of the recognition accuracy with the threshold.

例えば、サービス利用者が最初に発した音声を示す利用者音声情報を、英語、中国語、および韓国語として認識した場合の認識確度がそれぞれ「３０」、「５０」、および「１０」であった場合、上述したステップＳ６８において、選択部３４４は、認識確度に関わらず、全ての利用者認識内容を表示対象として選択する。なお、以下の説明において、言語Ａとして認識した場合の認識確度を、言語Ａ認識確度と称する。 For example, when the user voice information indicating the first utterance of the service user is recognized as English, Chinese, and Korean with recognition accuracies of "30", "50", and "10", respectively, the selection unit 344 selects all the user recognition contents as display targets in step S68 described above, regardless of the recognition accuracy. In the following description, the recognition accuracy obtained when an utterance is recognized as a language A is referred to as the language A recognition accuracy.

続いて、次にサービス利用者が発した音声を示す利用者音声情報を認識する認識処理における、英語認識確度、中国語認識確度、および韓国語認識確度がそれぞれ「１５」、「６０」、および「５」であった場合、上述したステップＳ６８において、選択部３４４はまず、前回の認識処理における英語認識確度に、今回の認識処理における英語認識確度を加算した積算英語認識確度を算出する。すなわち、選択部３４４は、前回の認識処理における英語認識確度「３０」に、今回の認識処理における英語認識確度「１５」を加算し、積算英語認識確度「４５」を算出する。 Next, suppose that the recognition process for the user voice information indicating the service user's next utterance yields an English recognition accuracy, a Chinese recognition accuracy, and a Korean recognition accuracy of "15", "60", and "5", respectively. In step S68 described above, the selection unit 344 first calculates the integrated English recognition accuracy by adding the English recognition accuracy of the current recognition process to that of the previous recognition process. That is, the selection unit 344 adds the current English recognition accuracy "15" to the previous English recognition accuracy "30" to obtain the integrated English recognition accuracy "45".

続いて、選択部３４４は、同様の処理を各言語において行い、積算中国語認識確度「１１０」および積算韓国語認識確度「１５」を算出する。そして、選択部３４４は、算出した積算値が所定の閾値より高い認識確度で認識された利用者認識内容を選択する。例えば、この場合の閾値が「４０」であった場合、積算英語認識確度「４５」および積算中国語認識確度「１１０」が閾値より高いため、選択部３４４は、英語および中国語として認識された利用者認識内容を選択する。 Subsequently, the selection unit 344 performs the same processing for each language and calculates the integrated Chinese recognition accuracy "110" and the integrated Korean recognition accuracy "15". The selection unit 344 then selects the user recognition content whose integrated value is higher than a predetermined threshold. For example, if the threshold in this case is "40", the integrated English recognition accuracy "45" and the integrated Chinese recognition accuracy "110" are higher than the threshold, so the selection unit 344 selects the user recognition contents recognized as English and as Chinese.

さらに、サービス利用者が発話した場合、サービス利用者が発した音声を示す利用者音声情報を、英語、中国語として認識する。そして、英語認識確度および中国語認識確度がそれぞれ「５」、「５０」であった場合、選択部３４４は、それぞれの認識確度を、既に算出している積算英語認識確度および積算中国語認識確度に加算する。具体的には、選択部３４４は、積算英語認識確度「５０」および積算中国語認識確度「１６０」を算出する。例えば、この場合の閾値が「８０」であった場合、積算中国語認識確度「１６０」が閾値より高いため、選択部３４４は、中国語として認識された利用者認識内容を選択する。 When the service user speaks again, the user voice information indicating that utterance is recognized as English and as Chinese. If the English recognition accuracy and the Chinese recognition accuracy are "5" and "50", respectively, the selection unit 344 adds each of them to the integrated English recognition accuracy and the integrated Chinese recognition accuracy already calculated. Specifically, the selection unit 344 calculates the integrated English recognition accuracy "50" and the integrated Chinese recognition accuracy "160". For example, if the threshold in this case is "80", only the integrated Chinese recognition accuracy "160" is higher than the threshold, so the selection unit 344 selects the user recognition content recognized as Chinese.

このように、コミュニケーション支援システム１では、認識確度の積算値と閾値とを比較し、選択する利用者認識内容を選択する構成であってもよい。この構成の場合、より好適にサービス利用者が使用する言語を選択することができるので、異なる言語を使用するユーザ同士のコミュニケーションをより円滑にすることができる。 As described above, the communication support system 1 may be configured to compare the integrated value of the recognition accuracy with the threshold and select the user recognition content accordingly. With this configuration, the language used by the service user can be identified more reliably, so that communication between users who use different languages can be made smoother.
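
The arithmetic of this example can be traced with the following sketch. It assumes, as in the example, that the first utterance selects every language and that the threshold may differ per utterance (40, then 80); the class and method names are hypothetical and do not appear in the original description.

```python
from collections import defaultdict


class CumulativeSelector:
    """Accumulates per-language recognition accuracy across utterances."""

    def __init__(self):
        self.totals = defaultdict(int)  # integrated accuracy per language
        self.utterances = 0

    def update(self, accuracies, threshold=None):
        """Add this utterance's accuracies and return the selected languages."""
        for language, accuracy in accuracies.items():
            self.totals[language] += accuracy
        self.utterances += 1
        if self.utterances == 1:
            # First utterance: select everything regardless of accuracy.
            return sorted(accuracies)
        return sorted(
            language for language in accuracies
            if self.totals[language] > threshold
        )


selector = CumulativeSelector()
first = selector.update({"en": 30, "zh": 50, "ko": 10})
second = selector.update({"en": 15, "zh": 60, "ko": 5}, threshold=40)
third = selector.update({"en": 5, "zh": 50}, threshold=80)
```

After the third call the totals are 50 for English and 160 for Chinese, which matches the integrated values given in the text.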

なお、積算値として、認識確度を加算する方法を例に挙げて説明したが、加算に替えて積算を行う構成であっても、平均または加重平均を算出する構成であっても、同様の効果を得ることができる。また、サービス利用者の最初の発話に対して、認識確度に関わらず全ての利用者認識内容を表示対象として選択する構成を例に挙げて説明したが、所定の回数までの発話に対して、認識確度に関わらず全ての利用者認識内容を表示対象として選択する構成であってもよい。例えば、所定の回数を３回とした場合、サービス利用者による発話は、３回までは認識確度（積算値）に関わらず全ての利用者認識内容を表示対象として選択する。そして、サービス利用者による４回目以降の発話に対して、積算値と閾値とを比較し、閾値より高い認識確度で認識された利用者認識内容を選択する構成としてもよい。 Although the integrated value has been described using addition of the recognition accuracies as an example, the same effect can be obtained with a configuration that multiplies the recognition accuracies instead of adding them, or with one that calculates an average or a weighted average. In addition, although the description used a configuration in which all user recognition contents are selected as display targets for the service user's first utterance regardless of the recognition accuracy, all user recognition contents may instead be selected as display targets for utterances up to a predetermined count. For example, when the predetermined count is three, all user recognition contents are selected as display targets for the first three utterances by the service user, regardless of the recognition accuracy (integrated value). For the fourth and subsequent utterances, the integrated value may be compared with the threshold, and the user recognition content whose integrated value is higher than the threshold may be selected.
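
A sketch of these variations follows, assuming the per-utterance accuracies are kept as a list of dictionaries. The function name, the `warmup` and `combine` parameters, the fourth utterance, and the thresholds used in the calls are illustrative assumptions rather than values from the description.

```python
from statistics import mean


def select_languages(history, threshold, warmup=3, combine=sum):
    """Select languages for the latest utterance.

    history: list of {language: accuracy} dicts, oldest utterance first.
    warmup:  number of initial utterances for which everything is selected.
    combine: how accuracies are aggregated (sum, mean, ...).
    """
    latest = history[-1]
    if len(history) <= warmup:
        return sorted(latest)
    selected = []
    for language in latest:
        scores = [h[language] for h in history if language in h]
        if combine(scores) > threshold:
            selected.append(language)
    return sorted(selected)


history = [
    {"en": 30, "zh": 50, "ko": 10},
    {"en": 15, "zh": 60, "ko": 5},
    {"en": 5, "zh": 50, "ko": 5},
    {"en": 10, "zh": 40, "ko": 5},  # hypothetical fourth utterance
]
during_warmup = select_languages(history[:3], threshold=80)
by_sum = select_languages(history, threshold=80)
by_mean = select_languages(history, threshold=12, combine=mean)
```

Using an average rather than a sum keeps the threshold on the same scale as a single recognition accuracy, which may make it easier to reuse one threshold as the conversation grows longer.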

また、クライアント端末４０による認識処理の認識確度と、認識サーバ１０による認識処理の認識確度とを比較するため、クライアント端末４０による認識処理の認識確度と、認識サーバ１０による認識処理の認識確度とを正規化してもよい。また、言語ごとの認識確度に対しても、正規化してもよい。この構成により、認識処理をクライアント端末４０において行った場合であっても、認識サーバ１０において行った場合であっても、また、何れの言語の認識処理であっても、同じ閾値を用いることができる。 Further, so that the recognition accuracy of the recognition process performed by the client terminal 40 can be compared with that of the recognition process performed by the recognition server 10, the two recognition accuracies may be normalized. The recognition accuracy for each language may also be normalized. With this configuration, the same threshold can be used whether the recognition process is performed by the client terminal 40 or by the recognition server 10, and regardless of the language being recognized.
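
The description does not specify how the normalization is performed. One possible approach, sketched below purely as an assumption, is min-max scaling of each recognizer's raw score onto a common 0 to 100 range; the score ranges shown for the client terminal and the recognition server are invented for illustration.

```python
def normalize(score, lowest, highest):
    """Map a raw score from [lowest, highest] onto a common 0-100 scale."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    clipped = min(max(score, lowest), highest)
    return 100.0 * (clipped - lowest) / (highest - lowest)


# Hypothetical raw score ranges per (recognizer, language) pair.
SCORE_RANGES = {
    ("client", "en"): (0.0, 1.0),
    ("server", "en"): (0, 1000),
}

client_score = normalize(0.5, *SCORE_RANGES[("client", "en")])
server_score = normalize(500, *SCORE_RANGES[("server", "en")])
```

Once both scores are on the same scale, a single threshold can be applied to either recognizer's output.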

〔実施形態２〕 [Embodiment 2]
本発明の他の実施形態について、図９〜図１１に基づいて説明する。 Another embodiment of the present invention will be described with reference to FIGS. 9 to 11.

上述したように、音声情報に含まれる音声内容を認識する認識処理は、クライアント端末４０において実行されてもよいし、認識サーバ１０において実行されてもよい。そこで本実施形態では、クライアント端末４０に、よく使用されると考えられる発話内容のデータベースを格納することにより、より効果的に異なる言語を使用するユーザ同士のコミュニケーションを円滑にする方法について説明する。なお、本実施形態では、サービス利用者は店の客、サービス提供者は店の店員であり、クライアント端末４０は当該店に設置されている場合を例に挙げて説明する。 As described above, the recognition process of recognizing the voice content included in the voice information may be executed by the client terminal 40 or the recognition server 10. Therefore, in the present embodiment, a method for facilitating communication between users who use different languages more effectively by storing a database of utterance contents considered to be frequently used in the client terminal 40 will be described. In the present embodiment, the case where the service user is a customer of the store, the service provider is a store clerk, and the client terminal 40 is installed in the store will be described as an example.

（端末記憶部５２に格納されるデータベース） (Database stored in terminal storage unit 52)
図９は、本発明の実施形態２における端末記憶部５２に格納されるデータベース（認識内容候補一覧）の例である。図９に示すように、端末記憶部５２には、クライアント端末４０が設置されている場所において頻繁に使用されると考えられる発話内容を英語、中国語、および日本語にそれぞれ翻訳した発話内容（以下、「認識コーパス」と称する）が関連付けて格納されている。 FIG. 9 is an example of the database (recognition content candidate list) stored in the terminal storage unit 52 according to Embodiment 2 of the present invention. As shown in FIG. 9, the terminal storage unit 52 stores, in association with one another, utterance contents (hereinafter referred to as "recognition corpora") obtained by translating into English, Chinese, and Japanese the utterances considered to be used frequently at the place where the client terminal 40 is installed.

（サービス利用者から発話があった場合） (When the service user speaks)
本実施形態において、サービス利用者から発話があった場合について、図１０を用いて説明する。図１０は、本発明の実施形態２におけるクライアント端末４０の処理の流れを示すフローチャートである。本実施形態では、上述したステップＳ６２における認識処理を、クライアント端末４０において実行する。まず、図６を用いて説明したように、音声情報取得部４４２は、利用者音声情報を取得し、取得した利用者音声情報を音声認識部４４４に出力する。 The case where the service user speaks in the present embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the flow of processing of the client terminal 40 according to Embodiment 2 of the present invention. In the present embodiment, the recognition process of step S62 described above is executed in the client terminal 40. First, as described with reference to FIG. 6, the voice information acquisition unit 442 acquires the user voice information and outputs it to the voice recognition unit 444.

（ステップＳ９０） (Step S90)
上述したステップＳ６２と同様、音声認識部４４４は、取得した利用者音声情報が示す音声内容を、複数の言語として認識する。例えば、サービス利用者が発した「ウォシャンヤオコーヒー」という発音を、英語、中国語、および韓国語として認識する。 As in step S62 described above, the voice recognition unit 444 recognizes the voice content indicated by the acquired user voice information as each of a plurality of languages. For example, the pronunciation "Woshan Yao Coffee" uttered by the service user is recognized as English, as Chinese, and as Korean.

（ステップＳ９２） (Step S92)
続いて、音声認識部４４４は、各言語として認識した認識内容のうち、端末記憶部５２に格納された認識コーパスに一致する認識内容があるか否かを判定する。 Subsequently, the voice recognition unit 444 determines whether any of the recognition contents recognized as each language matches a recognition corpus stored in the terminal storage unit 52.

例えば、音声認識部４４４は、「ウォシャンヤオコーヒー」を英語として認識した認識内容「What are y'all coffee」と一致する認識コーパスがあるか否かを判定する。端末記憶部５２には、「What are y'all coffee」と一致する認識コーパスはないので、音声認識結果は、認識内容「What are y'all coffee」と一致する認識コーパスはないと判定する。 For example, the voice recognition unit 444 determines whether there is a recognition corpus that matches the recognition content "What are y'all coffee", obtained by recognizing "Woshan Yao Coffee" as English. Since the terminal storage unit 52 holds no recognition corpus that matches "What are y'all coffee", the voice recognition unit 444 determines that no recognition corpus matches this recognition content.

続いて、音声認識部４４４は、「ウォシャンヤオコーヒー」を中国語として認識した認識内容 Subsequently, the voice recognition unit 444 recognizes “Woshang Yao Coffee” as Chinese.

と一致する認識コーパスがあるか否かを判定する。端末記憶部５２には、項目「Ｎｏ．」が「６」に関連付けられた中国語の認識コーパス「我想要［Ｄｒｉｎｋ］」があり、項目「Ｎｏ．」が「２０１」に関連付けられた中国語の認識コーパス and determines whether there is a recognition corpus that matches it. The terminal storage unit 52 holds the Chinese recognition corpus "我想要 [Drink]" associated with "6" in the item "No.", and, associated with "201" in the item "No.", the Chinese recognition corpus

がある。そのため、音声認識部４４４は、認識内容 There is. Therefore, the voice recognition unit 444 determines the recognition content.

と一致する認識コーパスがあると判定する。 It is determined that there is a recognition corpus that matches with.

音声認識部４４４は、同様に、他の言語として認識した認識内容についても、端末記憶部５２に格納された認識コーパスに一致する認識内容があるか否かを判定する。 Similarly, the voice recognition unit 444 also determines whether or not the recognition content recognized as another language matches the recognition corpus stored in the terminal storage unit 52.

（ステップＳ９４） (Step S94)
ステップＳ９２において、「各言語として認識した認識内容のうち、端末記憶部５２に格納された認識コーパスに一致する認識内容はない」と判定された場合（ステップＳ９２：ＮＯ）、音声認識部４４４は、認識した内容を示す利用者認識内容と、ステップＳ９０における認識処理の確からしさを示す認識確度とを含む認識結果を、クライアント端末通信部４２を介して支援サーバ３０に出力する。この場合、支援サーバ３０は、図６におけるステップＳ６８以降の処理を実行する。また、この場合、例えば、支援サーバ３０が取得した認識確度が、所定の認識確度より低い場合、クライアント端末４０から利用者音声情報の出力を要求し、ステップＳ５６以降の処理を実行してもよい。 When it is determined in step S92 that none of the recognition contents recognized as each language matches a recognition corpus stored in the terminal storage unit 52 (step S92: NO), the voice recognition unit 444 outputs a recognition result, which includes the user recognition content indicating the recognized content and the recognition accuracy indicating the reliability of the recognition process in step S90, to the support server 30 via the client terminal communication unit 42. In this case, the support server 30 executes the processing from step S68 onward in FIG. 6. Also in this case, for example, when the recognition accuracy acquired by the support server 30 is lower than a predetermined recognition accuracy, the support server 30 may request the client terminal 40 to output the user voice information and execute the processing from step S56 onward.

（ステップＳ９６） (Step S96)
一方、ステップＳ９２において、「各言語として認識した認識内容のうち、端末記憶部５２に格納された認識コーパスに一致する認識内容がある」と判定された場合（ステップＳ９２：ＹＥＳ）、音声認識部４４４は、一致した認識コーパスに対応する日本語のコーパス（換言すると、図９に示すデータベースにおいて、一致した認識コーパスに関連付けられた日本語の認識コーパス）を選択する。 On the other hand, when it is determined in step S92 that one of the recognition contents recognized as each language matches a recognition corpus stored in the terminal storage unit 52 (step S92: YES), the voice recognition unit 444 selects the Japanese corpus corresponding to the matched recognition corpus (in other words, the Japanese recognition corpus associated with the matched recognition corpus in the database shown in FIG. 9).

例えば、音声認識部４４４は、認識内容 For example, the voice recognition unit 444,

と一致する認識コーパスがあると判定したので、 Since it is determined that there is a recognition corpus that matches

に関連付けられた日本語の認識コーパス「私はコーヒーが欲しいです」（より具体的には、項目「Ｎｏ．」が「６」に関連付けられた「私は［Ｄｒｉｎｋ］が欲しいです」および項目「Ｎｏ．」が「２０１」に関連付けられた「コーヒー」）を選択する。 selects the Japanese recognition corpus "私はコーヒーが欲しいです" ("I want coffee") associated with it (more specifically, "私は［Ｄｒｉｎｋ］が欲しいです" ("I want [Drink]") associated with "6" in the item "No." and "コーヒー" ("coffee") associated with "201" in the item "No.").

（ステップＳ９８） (Step S98)
音声認識部４４４は、一致した認識コーパスを利用者認識内容として、また、選択した日本語の認識コーパスを利用者翻訳内容として、表示情報取得部４４８に出力し、表示情報取得部４４８は表示処理を実行する。 The voice recognition unit 444 outputs the matched recognition corpus as the user recognition content and the selected Japanese recognition corpus as the user translation content to the display information acquisition unit 448, and the display information acquisition unit 448 executes the display process.
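
Steps S92 to S98 can be sketched as a lookup against a small table of templates and slot fillers. The table layout, the function name, and the English entries below are assumptions made for illustration, since FIG. 9 is not reproduced here; only the Japanese entries for No. 6 and No. 201 are taken from the description, with the slot written as `[Drink]`.

```python
SLOT = "[Drink]"

# Hypothetical excerpt of the recognition content candidate list.
TEMPLATES = {6: {"en": "I want [Drink]", "ja": "私は[Drink]が欲しいです"}}
DRINKS = {201: {"en": "coffee", "ja": "コーヒー"}}


def lookup(text, language, target="ja"):
    """Return the target-language corpus matching `text`, or None (S92: NO)."""
    for template in TEMPLATES.values():
        pattern = template.get(language)
        if pattern is None or SLOT not in pattern:
            continue
        prefix, suffix = pattern.split(SLOT, 1)
        if len(text) < len(prefix) + len(suffix):
            continue
        if not (text.startswith(prefix) and text.endswith(suffix)):
            continue
        # Whatever sits in the slot position must be a known drink name.
        filler = text[len(prefix):len(text) - len(suffix)]
        for drink in DRINKS.values():
            if drink.get(language) == filler:
                return template[target].replace(SLOT, drink[target])
    return None


matched = lookup("I want coffee", "en")
unmatched = lookup("What are y'all coffee", "en")
```

When `lookup` returns `None` for every language, the flow would fall back to step S94 and hand the recognition result to the support server 30.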

なお、表示処理は、音声認識部４４４が利用者認識内容および利用者翻訳内容を、クライアント端末通信部４２を介して支援サーバ３０に出力し、表示情報取得部４４８が、クライアント端末通信部４２を介して支援サーバ３０から表示情報を取得し、取得した表示情報を参照する構成であってもよい。この場合、支援サーバ３０は、上述したステップＳ８０の処理を実行する。 Note that the display process may instead be configured such that the voice recognition unit 444 outputs the user recognition content and the user translation content to the support server 30 via the client terminal communication unit 42, and the display information acquisition unit 448 acquires display information from the support server 30 via the client terminal communication unit 42 and refers to the acquired display information. In this case, the support server 30 executes the process of step S80 described above.

ステップＳ９８の処理が実行された場合に表示部４６に表示される画像の例を、図１１に示す。図１１は、本発明の実施形態２において表示部４６に表示される画像の一例を示す図であり、（ａ）は、第１の領域４６ａに表示される画像であり、（ｂ）は、第２の領域４６ｂに表示される画像である。 FIG. 11 shows an example of an image displayed on the display unit 46 when the process of step S98 is executed. FIG. 11 is a diagram showing an example of an image displayed on the display unit 46 in Embodiment 2 of the present invention, (a) is an image displayed on the first area 46a, and (b) is It is an image displayed in the second area 46b.

図１１の（ａ）に示すように、第１の領域４６ａに表示される画像には、中国語として認識された利用者認識内容を含むテキスト６４０が含まれている。そして、図１１の（ｂ）に示すように、第２の領域４６ｂに表示される画像には、中国語として認識された利用者認識内容を日本語に翻訳した利用者翻訳内容を含むテキスト７４０が含まれている。 As shown in (a) of FIG. 11, the image displayed in the first area 46a includes a text 640 containing the user recognition content recognized as Chinese. As shown in (b) of FIG. 11, the image displayed in the second area 46b includes a text 740 containing the user translation content obtained by translating into Japanese the user recognition content recognized as Chinese.

このように、本実施形態に係るコミュニケーション支援システム１では、予め定められた認識内容候補一覧（データベース）を参照して、表示対象の利用者認識内容を選択する。この構成により、コミュニケーション支援システム１では、認識サーバ１０による処理を省略することができる。そのため、コミュニケーション支援システム１では、装置間の通信量を減少させることができる。また、認識内容候補一覧に、選択した利用者認識内容を翻訳した利用者翻訳内容も含まれる場合、翻訳サーバ２０による処理も省略することができる。 As described above, in the communication support system 1 according to the present embodiment, the user recognition content to be displayed is selected by referring to the predetermined recognition content candidate list (database). With this configuration, in the communication support system 1, the processing by the recognition server 10 can be omitted. Therefore, the communication support system 1 can reduce the amount of communication between devices. Further, when the recognition content candidate list also includes the user translation content obtained by translating the selected user recognition content, the processing by the translation server 20 can be omitted.

なお、上述したステップＳ９０において、音声認識部４４４は、上述した処理に加えて、認識コーパスに一致する程度を示すスコアを算出してもよい。この場合、上述した実施形態と同様、スコアが閾値より高いか否かを選択部３４４において判定する。そして、選択部３４４は、閾値より高いスコアによって認識された認識内容を選択する。閾値より高いスコアによって認識された認識内容が複数ある場合は、ステップＳ９６において、音声認識部４４４は、当該複数の認識内容にそれぞれ対応する日本語のコーパスを選択する。この場合、表示態様決定部３４６は、スコアが高い順に上から利用者認識内容および利用者翻訳内容が表示される構成が好ましい。この構成により、コミュニケーション支援システム１では、さらに効果的にサービス利用者が使用する言語の誤認識を防ぐことができる。 In addition, in step S90 described above, the voice recognition unit 444 may calculate a score indicating the degree of matching with the recognition corpus, in addition to the processing described above. In this case, similarly to the above-described embodiment, the selection unit 344 determines whether the score is higher than the threshold value. Then, the selection unit 344 selects the recognition content recognized by the score higher than the threshold. When there are a plurality of recognition contents recognized by the score higher than the threshold value, in step S96, the voice recognition unit 444 selects a Japanese corpus corresponding to each of the plurality of recognition contents. In this case, the display mode determination unit 346 preferably has a configuration in which the user recognition content and the user translation content are displayed in order from the highest score. With this configuration, the communication support system 1 can more effectively prevent erroneous recognition of the language used by the service user.
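
The description does not define how the matching score is computed. The sketch below substitutes a generic string similarity, the ratio of `difflib.SequenceMatcher` from the Python standard library, purely as an assumption; the candidate sentences and the threshold of 0.6 are also illustrative.

```python
from difflib import SequenceMatcher


def rank_matches(recognized, corpus, threshold):
    """Return corpus entries scoring above the threshold, best match first."""
    scored = [
        (SequenceMatcher(None, recognized, entry).ratio(), entry)
        for entry in corpus
    ]
    kept = [(score, entry) for score, entry in scored if score > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in kept]


# Hypothetical corpus entries for one language.
corpus = ["I want tea", "Where is the restroom", "I want coffee"]
ranked = rank_matches("I want coffee", corpus, threshold=0.6)
```

Returning the entries in descending order of score corresponds to displaying the user recognition content and the user translation content from the top in order of score.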

〔実施形態３〕 [Embodiment 3]
本発明の他の実施形態について、図１２に基づいて説明する。 Another embodiment of the present invention will be described with reference to FIG. 12.

上述した実施形態では、認識確度が所定の閾値より高い認識確度で認識された利用者認識内容がない場合、利用者認識内容は表示されなくなってしまう。そのため、本実施形態では、認識確度が所定の閾値（以下、「採用閾値」と称する）より低い閾値（以下、「候補閾値」と称する）を設定する構成について、上述した図６のシーケンス図を用いて説明する。 In the embodiment described above, when no user recognition content has been recognized with a recognition accuracy higher than the predetermined threshold, no user recognition content is displayed. The present embodiment therefore describes, using the sequence diagram of FIG. 6 described above, a configuration that sets a threshold (hereinafter referred to as the "candidate threshold") lower than the predetermined threshold (hereinafter referred to as the "adoption threshold").

（ステップＳ５２〜ステップＳ６６） (Step S52 to Step S66)
上述した処理と同じ処理であるため、説明を省略する。 These steps are the same as the processing described above, so their description is omitted.

（ステップＳ６８） (Step S68)
上述した処理と同様、選択部３４４は、認識結果を参照し、採用閾値より高いか否かを判定することにより、表示対象の利用者認識内容を選択する。ここで、例えば、英語、中国語、および韓国語として認識した場合の認識確度がそれぞれ「３０」、「５０」、および「１０」であり、採用閾値が「６０」であった場合、選択部３４４が表示対象の利用者認識内容として選択する利用者認識内容は存在しないことになる。この場合、選択部３４４は、英語、中国語、および韓国語として認識した場合の認識確度が、候補閾値より高いか否かを判定する。例えば、候補閾値が「５」であった場合、選択部３４４は、候補閾値より高い利用者認識内容を選択する。そして、選択部３４４は、選択した利用者認識内容を、情報管理部３４２に出力する。また、選択部３４４は、選択した利用者認識内容を含む認識結果を、表示態様決定部３４６に出力する。 As in the processing described above, the selection unit 344 refers to the recognition result and selects the user recognition content to be displayed by determining whether each recognition accuracy is higher than the adoption threshold. Here, for example, when the recognition accuracies for English, Chinese, and Korean are "30", "50", and "10", respectively, and the adoption threshold is "60", there is no user recognition content that the selection unit 344 selects as a display target. In this case, the selection unit 344 determines whether the recognition accuracies for English, Chinese, and Korean are higher than the candidate threshold. For example, when the candidate threshold is "5", the selection unit 344 selects the user recognition contents whose recognition accuracy is higher than the candidate threshold (in this example, all three). The selection unit 344 then outputs the selected user recognition content to the information management unit 342, and outputs the recognition result including the selected user recognition content to the display mode determination unit 346.

（ステップＳ７０〜ステップＳ７８） (Step S70 to Step S78)
上述した処理と同じ処理であるため、説明を省略する。 These steps are the same as the processing described above, so their description is omitted.

（ステップＳ８０） (Step S80)
表示態様決定部３４６は、利用者翻訳内容を取得すると、クライアント端末４０の表示部４６に表示する表示態様を決定する。具体的には、表示態様決定部３４６は、ステップＳ６８において取得した利用者認識内容の認識確度が採用閾値以下かつ候補閾値より高いので、認識確度が低い旨を示す表示態様に決定する。認識確度が低い旨を示す表示態様の例として、（１）文字を薄くして表示する、（２）「もしかして」「Did you mean」といった、認識確度が低かったことを暗示するテキストを付加する、などが挙げられる。そして、表示態様決定部３４６は、決定した表示態様、利用者認識内容、および利用者翻訳内容を含む表示情報を、表示情報出力部３４８に出力する。 Upon acquiring the user translation content, the display mode determination unit 346 determines the display mode to be used on the display unit 46 of the client terminal 40. Specifically, because the recognition accuracy of the user recognition content acquired in step S68 is equal to or lower than the adoption threshold and higher than the candidate threshold, the display mode determination unit 346 chooses a display mode indicating that the recognition accuracy is low. Examples of such a display mode include (1) displaying the characters in a lighter shade and (2) adding text that implies the recognition accuracy was low, such as "もしかして" or "Did you mean". The display mode determination unit 346 then outputs display information including the determined display mode, the user recognition content, and the user translation content to the display information output unit 348.
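
The two-threshold selection of step S68 and the resulting display mode of step S80 can be sketched together as follows. The function name and the shape of the returned dictionary are assumptions; the accuracies and thresholds are those of the example, and listing candidates in descending order of accuracy is likewise an assumption.

```python
def choose_display(accuracies, adoption_threshold, candidate_threshold):
    """Pick languages to show and flag whether they are low-confidence."""
    adopted = [
        language for language, accuracy in accuracies.items()
        if accuracy > adoption_threshold
    ]
    if adopted:
        return {"languages": adopted, "low_confidence": False, "hint": ""}
    # Nothing cleared the adoption threshold: fall back to candidates.
    candidates = [
        language for language, accuracy in accuracies.items()
        if accuracy > candidate_threshold
    ]
    candidates.sort(key=accuracies.get, reverse=True)
    return {
        "languages": candidates,
        "low_confidence": bool(candidates),
        "hint": "Did you mean" if candidates else "",
    }


info = choose_display({"en": 30, "zh": 50, "ko": 10},
                      adoption_threshold=60, candidate_threshold=5)
```

An empty `languages` list means that no recognition accuracy exceeded even the candidate threshold, a case that needs separate handling.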

（ステップＳ８２〜ステップＳ８６） (Step S82 to Step S86)
上述した処理と同じ処理であるため、説明を省略する。 These steps are the same as the processing described above, so their description is omitted.

ステップＳ８６において、表示部４６に表示される画像の例を、図１２の（ａ）および（ｂ）に示す。図１２は、本発明の実施形態３において表示部４６に表示される画像の一例を示す図であり、（ａ）は、第１の領域４６ａに表示される画像であり、（ｂ）は、第２の領域４６ｂに表示される画像である。 Examples of the images displayed on the display unit 46 in step S86 are shown in (a) and (b) of FIG. 12. FIG. 12 is a diagram showing an example of the images displayed on the display unit 46 in Embodiment 3 of the present invention, where (a) is the image displayed in the first area 46a and (b) is the image displayed in the second area 46b.

図１２の（ａ）に示すように、第１の領域４６ａに表示される画像には、（１）中国語として認識された利用者認識内容を含むテキスト６５２、（２）英語として認識された利用者認識内容を含むテキスト６５４、および（３）韓国語として認識された利用者認識内容を含むテキスト６５６が含まれている。また、第１の領域４６ａには、認識確度が低かったことを暗示するテキスト６５０が含まれている。 As shown in (a) of FIG. 12, the image displayed in the first area 46a includes (1) a text 652 containing the user recognition content recognized as Chinese, (2) a text 654 containing the user recognition content recognized as English, and (3) a text 656 containing the user recognition content recognized as Korean. The first area 46a also includes a text 650 implying that the recognition accuracy was low.

また、図１２の（ｂ）に示すように、第２の領域４６ｂに表示される画像には、（１）中国語として認識された利用者認識内容を日本語に翻訳した利用者翻訳内容を含むテキスト７５２、（２）英語として認識された利用者認識内容を日本語に翻訳した利用者翻訳内容を含むテキスト７５４、および（３）韓国語として認識された利用者認識内容を日本語に翻訳した利用者翻訳内容を含むテキスト７５６が含まれている。また、第２の領域４６ｂにも、認識確度が低かったことを暗示するテキスト７５０が含まれている。 Further, as shown in (b) of FIG. 12, the image displayed in the second area 46b includes (1) a text 752 containing the user translation content obtained by translating into Japanese the user recognition content recognized as Chinese, (2) a text 754 containing the user translation content obtained by translating into Japanese the user recognition content recognized as English, and (3) a text 756 containing the user translation content obtained by translating into Japanese the user recognition content recognized as Korean. The second area 46b also includes a text 750 implying that the recognition accuracy was low.

（候補閾値より高い認識確度がない場合） (When no recognition accuracy is higher than the candidate threshold)
さらに、上述したステップＳ６８において、英語、中国語、および韓国語として認識した場合の認識確度が、候補閾値以下の場合について、上述した図６のシーケンス図を用いて説明する。 Next, the case where the recognition accuracies for English, Chinese, and Korean in step S68 described above are all equal to or lower than the candidate threshold will be described using the sequence diagram of FIG. 6 described above.

（ステップＳ６８） (Step S68)
選択部３４４は、認識結果を参照し、採用閾値または候補閾値より高いか否かを判定することにより、表示対象の利用者認識内容を選択する。ここで、例えば、英語、中国語、および韓国語として認識した場合の認識確度がそれぞれ「１０」、「２０」、および「５」であり、候補閾値が「３０」であった場合、選択部３４４が表示対象の利用者認識内容として選択する利用者認識内容は存在しないことになる。この場合、選択部３４４は、候補閾値より高い認識確度によって認識された認識内容が存在しない旨を示す情報を、表示態様決定部３４６に出力する。 The selection unit 344 refers to the recognition result and selects the user recognition content to be displayed by determining whether each recognition accuracy is higher than the adoption threshold or the candidate threshold. Here, for example, when the recognition accuracies for English, Chinese, and Korean are "10", "20", and "5", respectively, and the candidate threshold is "30", there is no user recognition content that the selection unit 344 selects as a display target. In this case, the selection unit 344 outputs, to the display mode determination unit 346, information indicating that no recognition content was recognized with a recognition accuracy higher than the candidate threshold.

(Steps S70 to S78)
The processes of steps S70 to S78 are not executed, and the process proceeds to step S80.

(Step S80)
The display mode determination unit 346 refers to the information acquired in step S68 and determines the display mode to be used on the display unit 46 of the client terminal 40. Specifically, because the information acquired in step S68 indicates that no recognition content was recognized with a recognition accuracy higher than the candidate threshold, the display mode determination unit 346 decides on a display mode indicating that the utterance content could not be recognized. Examples of such a display mode include (1) displaying text indicating that the utterance could not be recognized and (2) displaying an image for selecting a language. The display mode determination unit 346 then outputs display information including the determined display mode to the display information output unit 348.
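The handling of steps S68 and S80 in this case can be illustrated with a minimal sketch. The function names and the display-mode labels `"not_recognized"` and `"show_candidates"` are hypothetical names introduced only for illustration, and the adoption threshold is left out because only the candidate threshold matters here.

```python
from typing import Dict, List


def select_display_candidates(accuracies: Dict[str, int],
                              candidate_threshold: int) -> List[str]:
    """Step S68 (sketch): keep languages whose recognition accuracy is
    strictly higher than the candidate threshold."""
    return [lang for lang, acc in accuracies.items() if acc > candidate_threshold]


def decide_display_mode(selected: List[str]) -> str:
    """Step S80 (sketch): with no candidate, choose a mode that tells the
    users the utterance could not be recognized. Labels are hypothetical."""
    if not selected:
        return "not_recognized"
    return "show_candidates"


# Values taken from the example in the text.
accuracies = {"English": 10, "Chinese": 20, "Korean": 5}
selected = select_display_candidates(accuracies, candidate_threshold=30)
mode = decide_display_mode(selected)
print(selected, mode)  # prints: [] not_recognized
```

With the values from the text, no language passes the threshold, so steps S70 to S78 have nothing to translate and the display mode becomes the "could not recognize" one.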

Examples of images displayed on the display unit 46 in step S86 are shown in (c) and (d) of FIG. 12. (c) of FIG. 12 is an image displayed in the first area 46a, and (d) is an image displayed in the second area 46b.

As shown in (c) of FIG. 12, the image displayed in the first area 46a includes text 664 indicating that the utterance could not be recognized. The first area 46a also includes an operator 662 that accepts, from the service user, an operation for selecting a language, and text 660 indicating that a language can be selected by operating the operator.

Further, as shown in (d) of FIG. 12, the image displayed in the second area 46b includes text 760 indicating that the utterance could not be recognized. The second area 46b also includes an operator 762 that accepts, from the service provider, an operation for selecting a language, and text 764 indicating that a language can be selected by operating the operator.

The operator may be a displayed GUI (Graphical User Interface) element, as shown in (c) and (d) of FIG. 12, and is not limited to the shape of a button; it may, for example, have the shape of a switch. The operator may also be the operation unit 50 (a physical button, switch, or the like) provided for the service provider. Further, in the images shown in (a) and (b) of FIG. 12, a language may be selectable by touching the text displayed in that language. For example, in the image shown in (a) of FIG. 12, the client terminal 40 may determine that the language used by the service user is Chinese when it acquires an operation signal indicating that the text 652 has been selected.

In the communication support system 1, when a language is selected, the selected language is determined to be the language in which the service user's utterances are recognized. Accordingly, for example, when the communication support system 1 acquires an operation signal indicating that the text 652 has been selected in the image shown in (a) of FIG. 12, an operation signal indicating that the text 752 has been selected in the image shown in (b) of FIG. 12, or an operation signal indicating that the Chinese national flag has been selected in the image shown in (c) or (d) of FIG. 12, the communication support system 1 determines that the language used by the service user is Chinese.

As described above, the communication support system 1 according to the present embodiment determines, in accordance with an input received via an operator (the operation unit 50, the operator 662, or the operator 762), which of the plurality of languages is to be used to recognize the user voice content indicated by the user voice information. Therefore, even when the recognition accuracy of the recognition process is low because, for example, the service user speaks quietly or noise is mixed in, the communication support system 1 can prompt the service user or the service provider to select a language and thereby facilitate communication between users who use different languages.

[Embodiment 4]
Another embodiment of the present invention will be described with reference to FIG. 13.

In the embodiments described above, the display order of the provider translation content obtained by translating the utterance content first uttered by the service provider, as shown in (a) of FIG. 4, is not particularly limited. In the present embodiment, a configuration in which the provider translation content is displayed in accordance with a predetermined condition will be described using the sequence diagram of FIG. 3. In the present embodiment, the display mode determination unit 346 can refer to count information that indicates, for each language used in the recognition process described above, the number of times that language has been used in the recognition process.

(Steps S2 to S26)
These processes are the same as those described above, so their description is omitted.

(Step S28)
On acquiring the provider translation content, the display mode determination unit 346 determines the display mode to be used on the display unit 46 of the client terminal 40. Here, the display mode determination unit 346 refers to the count information for each language and determines a display mode that corresponds to it. For example, when the count information acquired by the display mode determination unit 346 indicates that English has been used 200 times, Chinese 100 times, and Korean 50 times in the recognition process, the display mode is determined so that the provider translation content translated into English, the most frequently used language, is displayed at the top of the first area 46a; the provider translation content translated into Chinese, the next most frequently used language, is displayed below it; and the provider translation content translated into Korean is displayed below that. The display mode determination unit 346 then outputs display information including the determined display mode, the provider recognition content, and the provider translation content to the display information output unit 348.
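The ordering rule of step S28 amounts to sorting the translations by usage count in descending order. The following is a minimal sketch under that reading; the function name `order_by_usage` and the placeholder translation strings are hypothetical, while the counts are the ones given in the text.

```python
from typing import Dict, List, Tuple


def order_by_usage(translations: Dict[str, str],
                   usage_counts: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (language, translated text) pairs, most frequently used
    language first. Languages without a count are treated as never used."""
    ordered = sorted(translations,
                     key=lambda lang: usage_counts.get(lang, 0),
                     reverse=True)
    return [(lang, translations[lang]) for lang in ordered]


# Counts from the example in the text; the texts are placeholders.
usage_counts = {"Korean": 50, "English": 200, "Chinese": 100}
translations = {
    "Korean": "<provider translation in Korean>",
    "Chinese": "<provider translation in Chinese>",
    "English": "<provider translation in English>",
}
layout = order_by_usage(translations, usage_counts)
for position, (lang, text) in enumerate(layout, start=1):
    print(position, lang, text)
```

The first entry of `layout` corresponds to the top of the first area 46a, the second to the position below it, and so on.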

(Steps S30 to S34)
These processes are the same as those described above, so their description is omitted.

Examples of images displayed on the display unit 46 in step S34 are shown in (a) and (b) of FIG. 13. FIG. 13 is a diagram showing an example of images displayed on the display unit 46 in Embodiment 4 of the present invention; (a) is an image displayed in the first area 46a, and (b) is an image displayed in the second area 46b.

As shown in (a) of FIG. 13, in the image displayed in the first area 46a, text 670 containing the provider translation content translated into English is displayed at the top, text 672 containing the provider translation content translated into Chinese is displayed below it, and text 674 containing the provider translation content translated into Korean is displayed below that.

Further, as shown in (b) of FIG. 13, the image displayed in the second area 46b includes text 770 containing the provider recognition content.

(When using a captured image of the service user)
As another example of the present embodiment, a configuration that uses a captured image of the service user will be described. In the communication support system 1 of this example, the display mode determination unit 346 can refer to a determination result obtained by determining, with reference to a captured image of the service user, the language used by the service user. The process of determining the language used by the service user from the captured image may be executed by any of the units constituting the communication support system 1, or a device separate from the communication support system 1 may make the determination, with the communication support system 1 acquiring the determination result.

In step S28 described above, on acquiring the provider translation content, the display mode determination unit 346 determines the display mode to be used on the display unit 46 of the client terminal 40. Here, the display mode determination unit 346 refers to the determination result obtained by determining the language used by the service user from the captured image of the service user, and determines a display mode that corresponds to that result. For example, when the acquired determination result indicates that the language used by the service user is English, the display mode determination unit 346 determines the display mode so that the provider translation content translated into English is displayed at the top of the first area 46a. The display mode determination unit 346 then outputs display information including the determined display mode, the provider recognition content, and the provider translation content to the display information output unit 348. In this example as well, the images shown in FIG. 13 are displayed on the display unit 46.

As described above, the communication support system 1 according to the present embodiment displays, in accordance with a predetermined condition, the translated content obtained by translating the first utterance of the service provider. With this configuration, the communication support system 1 can display the content translated into the language that the service user is likely to use in, for example, a conspicuous manner in the first area 46a. The communication support system 1 can also make it easy for the service user to understand that the system translates utterance content.

[Embodiment 5]
Another embodiment of the present invention will be described with reference to FIG. 14.

In the embodiments described above, the communication support system 1 did not perform, in subsequent processing, recognition in the language of any user recognition content that was not selected in the selection process of step S68. (In Embodiment 1 described above, the recognition content recognized as Korean was not selected in the selection process, so the recognition process that recognizes the service user's utterance content as Korean was not performed after that selection process.) In the present embodiment, a configuration will be described in which the recognition process continues to be executed even for the language of user recognition content that was not selected in the selection process of step S68.

First, in the communication support system 1, the selection unit 344 has selected, in the selection process, the user recognition content obtained by recognizing as Chinese the voice content indicated by the user voice information. Consequently, user recognition content recognized as a language other than Chinese is not displayed in the first area 46a of the display unit 46. Meanwhile, as described above, the recognition process recognizes the voice content indicated by the user voice information as English, Chinese, and Korean.

Examples of images displayed on the display unit 46 at this time are shown in (a) and (b) of FIG. 14. FIG. 14 is a diagram showing an example of images displayed on the display unit 46 in Embodiment 5 of the present invention; (a) is an image displayed in the first area 46a, and (b) is an image displayed in the second area 46b.

As shown in (a) of FIG. 14, the image displayed in the first area 46a includes text 680 containing the user recognition content recognized as Chinese. Further, as shown in (b) of FIG. 14, the image displayed in the second area 46b includes text 780 containing the user translation content obtained by translating into Japanese the user recognition content recognized as Chinese.

The subsequent processing will be described with reference to FIG. 6 described above.

(Step S52)
As in the processing described above, the voice information acquisition unit 442 of the client terminal 40 performs an acquisition process that acquires, via the voice input unit 48, user voice information representing the voice uttered by the service user. Here, for example, when a service user who uses English, a language other than the determined language (Chinese), says “I want tea”, the voice information acquisition unit 442 acquires user voice information representing “I want tea”.

(Steps S54 to S60)
These processes are the same as those described above, so their description is omitted.

(Step S62)
As described above, the recognition server control unit 14 recognizes the voice content indicated by the acquired user voice information as English, Chinese, and Korean. Specifically, the recognition server control unit 14 recognizes the pronunciation “ai wonto tī” (アイウォントティー, the sound of “I want tea”) as English, as Chinese, and as Korean.

(Steps S64 to S66)
These processes are the same as those described above, so their description is omitted.

(Step S68)
Because the selection unit 344 has so far selected the user recognition content recognized as Chinese as the display target, it first selects the user recognition content recognized as Chinese as the display target. Here, when the recognition accuracy for another language is higher than the recognition accuracy for Chinese, the selection unit 344 also selects the user recognition content recognized as that other language. For example, suppose that recognizing “ai wonto tī” as Chinese yields the user recognition content “Iwan七” with a recognition accuracy of “20”, that recognizing it as English yields the user recognition content “I want tea” with a recognition accuracy of “50”, and that recognizing it as Korean yields a recognition accuracy of “10”. In this case, the selection unit 344 selects, as the display target, the user recognition content recognized as English in addition to the user recognition content recognized as Chinese.
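The selection rule of step S68 in this embodiment can be sketched as follows. The function name `select_with_current` is hypothetical; the accuracies are the ones from the example in the text.

```python
from typing import Dict, List


def select_with_current(accuracies: Dict[str, int],
                        current_language: str) -> List[str]:
    """Always keep the language selected so far, and add every other
    language whose recognition accuracy is strictly higher than it."""
    selected = [current_language]
    current_accuracy = accuracies[current_language]
    for lang, acc in accuracies.items():
        if lang != current_language and acc > current_accuracy:
            selected.append(lang)
    return selected


# Accuracies for the utterance "I want tea" from the example in the text.
accuracies = {"Chinese": 20, "English": 50, "Korean": 10}
selected = select_with_current(accuracies, current_language="Chinese")
print(selected)  # prints: ['Chinese', 'English']
```

Korean is not added because its accuracy (10) does not exceed that of the language selected so far (Chinese, 20).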

(Steps S70 to S78)
These processes are the same as those described above, so their description is omitted.

(Step S80)
On acquiring the user translation content, the display mode determination unit 346 determines the display mode to be used on the display unit 46 of the client terminal 40. Here, the display mode determination unit 346 decides on a display mode indicating that the recognition accuracy for the language of user recognition content not selected as a display target so far (English) is higher than the recognition accuracy for the language of the user recognition content selected as the display target so far (Chinese). Examples of such a display mode include (1) displaying the characters in a lighter shade (that is, in a display mode different from the others) and (2) adding text such as “Perhaps” or “Did you mean”, which implies that the content has not been selected as a display target so far but has a high recognition accuracy. The display mode determination unit 346 then outputs display information including the determined display mode, the user recognition content, and the user translation content to the display information output unit 348.

Examples of images displayed on the display unit 46 in step S86 are shown in (c) and (d) of FIG. 14. (c) of FIG. 14 is an image displayed in the first area 46a, and (d) is an image displayed in the second area 46b.

As shown in (c) of FIG. 14, the image displayed in the first area 46a includes, in addition to the text 680 described above, (1) text 682 containing the user recognition content recognized as Chinese, and (2) text 684 implying that the recognition accuracy was higher than that for Chinese, together with text 686 containing the user recognition content recognized as English. The image may further include a button 688 that accepts an operation for removing the display of the text 684 and the text 686.

Further, as shown in (d) of FIG. 14, the image displayed in the second area 46b includes, in addition to the text 780 described above, (1) user translation content 782 obtained by translating into Japanese the user recognition content recognized as Chinese, and (2) text 784 implying that the recognition accuracy was higher than that for Chinese, together with user translation content 786 obtained by translating into Japanese the user recognition content recognized as English. The image may further include a button 788 that accepts an operation for removing the display of the text 784 and the text 786.

Here, as described in Embodiment 3 above, when the operation signal acquisition unit 446 acquires, via an operator, an operation signal indicating an operation for changing the determined language, the communication support system 1 may change the language used by the service user to a language different from the language of the user recognition content that has been selected as the display target so far. For example, when the service user touches the text 686 in the image shown in (c) of FIG. 14, the communication support system 1 may change the selection process so that the user recognition content recognized as English is selected.

Further, when the state in which the recognition accuracy for the language of user recognition content not selected as a display target so far is higher than the recognition accuracy for the language of the user recognition content selected as the display target so far has continued for a predetermined number of consecutive times (for example, three times), the communication support system 1 may change the language used by the service user to the language of the user recognition content that has not been selected as a display target so far.
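The automatic switch after a run of consecutive utterances can be sketched with a small counter. The class name `LanguageSwitcher`, its method `observe`, and the requirement that the same other language must win every time in the run are assumptions made for illustration; the text only states that the state must continue for a predetermined number of consecutive times.

```python
from typing import Dict, Optional


class LanguageSwitcher:
    """Tracks how many consecutive utterances another language has beaten
    the currently selected one, and switches once the run is long enough."""

    def __init__(self, current_language: str, required_streak: int = 3) -> None:
        self.current_language = current_language
        self.required_streak = required_streak
        self._streak_language: Optional[str] = None
        self._streak = 0

    def observe(self, accuracies: Dict[str, int]) -> str:
        """Feed the per-language accuracies of one utterance and return the
        language in use after this utterance."""
        others = {lang: acc for lang, acc in accuracies.items()
                  if lang != self.current_language}
        best_other = max(others, key=others.get) if others else None
        if best_other is not None and others[best_other] > accuracies[self.current_language]:
            if best_other == self._streak_language:
                self._streak += 1
            else:
                # Assumption: a different winning language restarts the run.
                self._streak_language = best_other
                self._streak = 1
            if self._streak >= self.required_streak:
                self.current_language = best_other
                self._streak_language = None
                self._streak = 0
        else:
            # The selected language was at least as good; the run is broken.
            self._streak_language = None
            self._streak = 0
        return self.current_language


switcher = LanguageSwitcher("Chinese", required_streak=3)
sample = {"Chinese": 20, "English": 50, "Korean": 10}
history = [switcher.observe(sample) for _ in range(3)]
print(history)  # prints: ['Chinese', 'Chinese', 'English']
```

Resetting the counter whenever the selected language wins again keeps a single misrecognized utterance from triggering a switch.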

As described above, the communication support system 1 according to the present embodiment continues, in the recognition process, to recognize the user voice information as a given language even after that language is no longer selected as the user recognition content to be displayed. Therefore, the communication support system 1 can facilitate communication even when the language used by the service user changes, for example when (1) a service user who was speaking Chinese switches to English because there were many recognition errors, or (2) a service user who uses English speaks during a conversation with a service user who uses Chinese.

[Embodiment 6]
Another embodiment of the present invention will be described with reference to FIGS. 15 to 17.

In the embodiments described above, the recognition process was performed without referring to voice content uttered before the voice content being recognized. In the present embodiment, a configuration that refers to voice content uttered before the voice content being recognized will be described. In the present embodiment, as in Embodiment 2 described above, the recognition process of step S62 described above is executed on the client terminal 40.

(Utterance sentence database and response database stored in the terminal storage unit 52)
FIG. 15 shows examples of databases (response content candidate lists) stored in the terminal storage unit 52 in Embodiment 6 of the present invention; (a) is an example of the utterance sentence database, and (b) is an example of the response database.

(Utterance sentence database)
The terminal storage unit 52 stores an utterance sentence database containing utterance content that is considered likely to be used frequently at the place where the client terminal 40 is installed. As shown in (a) of FIG. 15, the utterance sentence database associates the item “utterance sentence ID”, the item “utterance sentence type”, and the item “utterance sentence corpus” with one another.

The item “utterance sentence ID” stores an utterance sentence ID, which is an identification number for identifying the associated utterance sentence.

The item “utterance sentence type” stores an utterance sentence type, which is an identification number for identifying the category to which the utterance sentence belongs.

The item “utterance sentence corpus” stores utterance sentences that are considered likely to be used frequently. In the utterance sentence database shown in (a) of FIG. 15, the item “utterance sentence corpus” stores Japanese utterance sentences, but the utterance sentences obtained by translating each Japanese utterance sentence into English, Chinese, and Korean are also stored in association with it.

(Response database)
The terminal storage unit 52 also stores a response database that associates an utterance sentence used at the place where the client terminal 40 is installed with the utterance sentence that serves as a response to it. As shown in (b) of FIG. 15, the response database associates the item “response sentence ID”, the item “utterance sentence type”, the item “condition”, and the item “response utterance sentence type” with one another.

The item “response sentence ID” stores an identification number for identifying the associated response utterance sentence type.

The item “utterance sentence type” stores, like the item “utterance sentence type” in the utterance sentence database, an identification number for identifying the category to which the utterance sentence belongs.

The item “condition” stores a condition for selecting the associated response utterance sentence type.

The item “response utterance sentence type” stores an identification number for identifying the category to which the utterance sentence corpus uttered as a response belongs.
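The two databases can be pictured as two small record types joined on the utterance sentence type. The sketch below is illustrative only: the class and field names are hypothetical, and every row value is a placeholder except utterance sentence ID 1, which the description associates with utterance sentence type 1 and the corpus entry “何かお探しですか” (“Are you looking for something”).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UtteranceRow:
    utterance_id: int        # item "utterance sentence ID"
    utterance_type: int      # item "utterance sentence type" (category)
    corpus: Dict[str, str]   # item "utterance sentence corpus", keyed by language


@dataclass
class ResponseRow:
    response_id: int              # item "response sentence ID"
    utterance_type: int           # category of the utterance being answered
    condition: str                # item "condition" (format is an assumption)
    response_utterance_type: int  # category of the expected response


def response_types_for(rows: List[ResponseRow], utterance_type: int) -> List[int]:
    """Return the response categories registered for one utterance category."""
    return [row.response_utterance_type
            for row in rows if row.utterance_type == utterance_type]


utterance_db = [
    UtteranceRow(1, 1, {"ja": "何かお探しですか"}),
    UtteranceRow(2, 2, {"ja": "<placeholder sentence>"}),  # hypothetical row
]
response_db = [
    ResponseRow(1, 1, "<placeholder condition>", 2),  # hypothetical row
    ResponseRow(2, 2, "<placeholder condition>", 3),  # hypothetical row
]
print(response_types_for(response_db, utterance_type=1))  # prints: [2]
```

Keeping the translations of one sentence in a single `corpus` mapping mirrors the statement that the English, Chinese, and Korean versions are stored in association with the Japanese sentence.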

(Processing flow of the client terminal 40)
The processing flow of the client terminal 40 in the present embodiment will be described with reference to FIG. 16. FIG. 16 is a flowchart showing the processing flow of the client terminal 40 in Embodiment 6 of the present invention.

First, in accordance with the sequence diagram of FIG. 3 or FIG. 5 described above, the client terminal 40 acquires from the service provider the provider voice information representing “Are you looking for something?”, translates the provider recognition content obtained by recognizing that provider voice information into English, Chinese, and Korean, and displays the provider recognition content and the provider translation content. The image displayed on the display unit 46 at this time is the image of FIG. 4 described above.

Next, as described with reference to FIG. 6, the voice information acquisition unit 442 acquires the user voice information and outputs the acquired user voice information to the voice recognition unit 444.

(Step S100)
Next, the voice recognition unit 444 determines whether, among the pieces of user recognition content recognized as each language, there are a plurality with at least a predetermined recognition accuracy. For example, the voice recognition unit 444 determines whether each of (1) the recognition accuracy when “wo shan yao kōhī” (ウォシャンヤオコーヒー) is recognized as English, (2) the recognition accuracy when it is recognized as Chinese, and (3) the recognition accuracy when it is recognized as Korean is equal to or higher than the predetermined recognition accuracy.

(Step S102)
When it is determined in step S100 that there are a plurality of user recognition contents with the predetermined recognition accuracy or higher (step S100: YES), the voice recognition unit 444 determines whether an utterance sentence corpus that matches the provider recognition content of the immediately preceding utterance (or an earlier utterance; the same applies hereinafter) is stored in the utterance sentence database.

For example, the voice recognition unit 444 determines whether an utterance sentence corpus that matches the provider recognition content "Are you looking for something?" of the immediately preceding utterance is stored in the utterance sentence database. In the utterance sentence database shown in (a) of FIG. 15, the utterance sentence corpus associated with the utterance sentence ID "1" is "Are you looking for something", so the voice recognition unit 444 determines that an utterance sentence corpus matching the provider recognition content "Are you looking for something?" of the immediately preceding utterance is stored in the utterance sentence database.

(Step S104)
When it is determined in step S102 that there is an utterance sentence corpus that matches the provider recognition content of the immediately preceding utterance (step S102: YES), the voice recognition unit 444 selects the utterance sentence type associated with that utterance sentence corpus. For example, the voice recognition unit 444 selects the utterance sentence type "1" associated with the utterance sentence corpus that matches the provider recognition content "Are you looking for something?" of the immediately preceding utterance.

(Step S106)
Subsequently, the voice recognition unit 444 selects, from the response utterance sentence types that the response database associates with the utterance sentence type selected in step S104, the response utterance sentence types whose conditions are satisfied. For example, in the response database shown in (b) of FIG. 15, the voice recognition unit 444 refers to the conditions associated with the response utterance sentence types that are associated with the utterance sentence type "1" selected in step S104. Since these conditions are all "- (no condition)", the voice recognition unit 444 selects the response utterance sentence types "2", "3", and "4".

(Step S108)
Then, the voice recognition unit 444 determines whether, among the utterance sentence corpora that the utterance sentence database associates with the response utterance sentence types selected in step S106, there is an utterance sentence corpus that matches a user recognition content.

For example, in the utterance sentence database shown in (a) of FIG. 15, the voice recognition unit 444 determines whether, among the utterance sentence corpora associated with the utterance sentence types "2", "3", and "4" selected as response utterance sentence types in step S106, there is an utterance sentence corpus that matches the user recognition content "What are y'all coffee" obtained by recognizing "Woshan Yao coffee" as English. Since the utterance sentence database contains no utterance sentence corpus that matches the user recognition content "What are y'all coffee", the voice recognition unit 444 next determines whether there is an utterance sentence corpus that matches the user recognition content obtained by recognizing "Woshan Yao coffee" as Chinese.

Although not shown in the utterance sentence database of (a) of FIG. 15, the user recognition content recognized as Chinese matches the utterance sentence corpus associated with the utterance sentence ID "6", so the voice recognition unit 444 determines that there is an utterance sentence corpus that matches the user recognition content. The same processing is executed for the Korean user recognition content, and the voice recognition unit 444 determines that there is no utterance sentence corpus that matches the Korean user recognition content.

(Step S110)
When it is determined in step S108 that there is an utterance sentence corpus that matches a user recognition content (step S108: YES), the voice recognition unit 444 selects the user recognition content that matched the utterance sentence corpus in step S108 as the user recognition content to be displayed.

(Step S112)
The voice recognition unit 444 outputs to the display information acquisition unit 448 the selected user recognition content together with, as the user translation content, the Japanese utterance sentence corpus that the utterance sentence database associates with the utterance sentence corpus matching that user recognition content, and the display information acquisition unit 448 executes display processing. As in the processing described for step S98 of the second embodiment, the display processing may be configured to acquire the display information from the support server 30.

(Step S114)
When it is determined in step S100 described above that there are not a plurality of user recognition contents with the predetermined recognition accuracy or higher (step S100: NO), when it is determined in step S102 that there is no utterance sentence corpus that matches the provider recognition content of the immediately preceding utterance (step S102: NO), or when it is determined in step S108 that there is no utterance sentence corpus that matches a user recognition content (step S108: NO), the voice recognition unit 444 outputs a recognition result including the user recognition contents and the recognition accuracies to the support server 30 via the client terminal communication unit 42. On acquiring the recognition result, the support server 30 executes the processing from step S68 onward described above.
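The branch structure of steps S100 to S114 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the publication: the table layouts, the field names, the exact-match comparison that ignores a trailing question mark, and the placeholder strings standing in for the Chinese and Japanese corpus entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Utterance:
    utterance_id: int
    utterance_type: int
    language: str
    text: str
    japanese: str  # Japanese counterpart used as the user translation content


# Hypothetical stand-in for the utterance sentence database of FIG. 15(a).
UTTERANCES: List[Utterance] = [
    Utterance(1, 1, "ja", "Are you looking for something", "Are you looking for something"),
    Utterance(6, 3, "zh", "<zh> I want coffee", "<ja> I want coffee"),
]

# Hypothetical stand-in for the response database of FIG. 15(b):
# utterance sentence type -> response utterance sentence types (no conditions).
RESPONSES: Dict[int, List[int]] = {1: [2, 3, 4]}


def normalize(text: str) -> str:
    """Assumed matching rule: ignore surrounding spaces and a trailing question mark."""
    return text.strip().rstrip("?？").strip()


def select_by_context(
    previous_provider_text: str,
    candidates: Dict[str, Tuple[str, float]],
    threshold: float,
) -> Optional[Tuple[str, str, str]]:
    """Return (language, recognized text, Japanese text), or None for the S114 fallback."""
    # S100: keep candidates at or above the threshold; two or more are required.
    strong = {lang: text for lang, (text, score) in candidates.items() if score >= threshold}
    if len(strong) < 2:
        return None
    # S102 / S104: find the previous provider utterance and take its type.
    previous = next(
        (u for u in UTTERANCES if normalize(u.text) == normalize(previous_provider_text)),
        None,
    )
    if previous is None:
        return None
    # S106: response types allowed after that utterance type.
    allowed = set(RESPONSES.get(previous.utterance_type, []))
    # S108 / S110: first candidate that matches a corpus entry of an allowed type.
    for lang, text in strong.items():
        for u in UTTERANCES:
            if u.utterance_type in allowed and u.language == lang and normalize(u.text) == normalize(text):
                return lang, text, u.japanese  # S112: content handed to display
    return None  # S114: defer to the support server
```

A `None` result corresponds to step S114, where the full recognition result is sent to the support server 30 instead of being resolved on the client terminal 40.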

An example of the image displayed on the display unit 46 in this embodiment is shown in FIG. 17. FIG. 17 is a diagram showing examples of images displayed on the display unit 46 in the sixth embodiment of the present invention, in which (a) is an image displayed in the first area 46a, (b) is an image displayed in the second area 46b, (c) is an image displayed in the first area 46a, and (d) is an image displayed in the second area 46b.

For example, when the processing of step S114 is executed, such as when it is determined in step S100 that there are not a plurality of user recognition contents with the predetermined recognition accuracy or higher (step S100: NO), the image displayed in the first area 46a includes, as shown in (a) of FIG. 17, (1) text 690 containing the provider recognition content translated into English, (2) text 692 containing the provider recognition content translated into Chinese, and (3) text 694 containing the provider recognition content translated into Korean. In addition, the image includes (1) text 696 containing the user recognition content recognized as Chinese, (2) text 697 containing the user recognition content recognized as English, and (3) text 698 containing the user recognition content recognized as Korean.

In contrast, as shown in (c) of FIG. 17, the image displayed in the first area 46a in the display processing of step S112 includes the text 696 in addition to the text 690, the text 692, and the text 694 described above, and does not include the text 697 or the text 698.

As shown in (b) of FIG. 17, the image displayed in the second area 46b includes, in addition to text 790 containing the provider recognition content, (1) text 796 containing the user translation content obtained by translating the user recognition content recognized as Chinese into Japanese, (2) text 797 containing the user translation content obtained by translating the user recognition content recognized as English into Japanese, and (3) text 798 containing the user translation content obtained by translating the user recognition content recognized as Korean into Japanese.

In contrast, as shown in (d) of FIG. 17, the image displayed in the second area 46b in the display processing of step S112 includes the text 796 in addition to the text 790 described above, and does not include the text 797 or the text 798.

As described above, in the process of selecting a user recognition content obtained by recognizing the service user's utterance, the communication support system 1 according to the present embodiment refers to the utterance made before the service user spoke and selects the user recognition content to be displayed. The communication support system 1 can therefore select a user recognition content that fits the flow of the conversation and can perform recognition in the language used by the service user, which makes communication between users of different languages smoother.

[Embodiment 7]
Another embodiment of the present invention will be described with reference to FIGS. 18 and 19.

The embodiments described above took as an example communication in which a service provider and a service user hold a conversation. This embodiment describes a case where the service provider is the client terminal 40.

(Processing flow of the client terminal 40)
The processing flow of the client terminal 40 in this embodiment will be described with reference to FIG. 18. FIG. 18 is a flowchart showing the processing flow of the client terminal 40 according to the seventh embodiment of the present invention.

(Step S90)
The voice recognition unit 444 recognizes the voice content indicated by the acquired user voice information as each of a plurality of languages. For example, it recognizes the pronunciation "Woshan Yao coffee" uttered by the service user as English, as Chinese, and as Korean.

(Step S120)
The voice recognition unit 444 refers to the recognition accuracy, which indicates the certainty of the recognition processing in step S90, and selects the user recognition content to be displayed. For example, when the recognition accuracy of "Woshan Yao coffee" recognized as Chinese is higher than a predetermined threshold, and the recognition accuracies when it is recognized as English and as Korean are equal to or lower than the predetermined threshold, the selection unit 344 selects the user recognition content recognized as Chinese.

(Step S122)
Subsequently, the voice recognition unit 444 determines whether the utterance sentence database stored in the terminal storage unit 52 contains an utterance sentence corpus that matches the user recognition content selected in step S120. For example, in the utterance sentence database shown in (a) of FIG. 15, the user recognition content selected in step S120 matches the utterance sentence corpus "我想要[Drink]" associated with the utterance sentence ID "6", so the voice recognition unit 444 determines that there is a matching utterance sentence corpus.
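The source does not specify how a recognized sentence is compared with a corpus entry that contains a slot such as "[Drink]". One plausible approach, sketched below purely as an assumption, converts the corpus entry into a regular expression in which each bracketed slot becomes a named group. The function names and the sample utterances are hypothetical illustrations and are not text recovered from the publication.

```python
import re
from typing import Dict, Optional


def template_to_regex(template: str) -> re.Pattern:
    """Turn a corpus entry such as 'I want [Drink]' into a regex with one named group per slot.

    Assumes each slot name is a valid Python identifier and appears only once.
    """
    parts = re.split(r"\[(\w+)\]", template)  # literal, slot, literal, slot, ...
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)       # literal text must match exactly
        else:
            pattern += f"(?P<{part}>.+)"     # slot captures one or more characters
    return re.compile(pattern)


def match_template(template: str, recognized_text: str) -> Optional[Dict[str, str]]:
    """Return the captured slot values when the whole sentence matches, else None."""
    match = template_to_regex(template).fullmatch(recognized_text)
    return match.groupdict() if match else None
```

Under this sketch, a sentence matches only when the literal part of the corpus entry is present and the slot is non-empty; the captured slot value could then feed the condition check in step S126.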

(Step S124)
When it is determined in step S122 that there is a matching utterance sentence corpus (step S122: YES), the voice recognition unit 444 selects the utterance sentence type associated with the matching utterance sentence corpus. For example, since the user recognition content matches the utterance sentence corpus "我想要[Drink]" associated with the utterance sentence ID "6", the voice recognition unit 444 selects the utterance sentence type "3" associated with the utterance sentence ID "6".

(Step S126)
The voice recognition unit 444 refers to the response database and selects, from the response utterance sentence types associated with the utterance sentence type selected in step S124, the response utterance sentence type whose condition is satisfied. For example, suppose the conditions associated with the utterance sentence type "3" selected in step S124 are (1) [Drink] is in stock, (2) [Drink] is out of stock, and (3) other. When condition (1), [Drink] is in stock, is satisfied, the voice recognition unit 444 selects the response utterance sentence type "201" associated with that condition.
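The condition-based selection in step S126 can be sketched as an ordered list of (condition, response type) rules evaluated in turn. Only the utterance sentence type "3", the three conditions, and the response type "201" come from the text; the response types 202 and 203, the context fields `drink`, `menu`, and `stock`, and the function names are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
Rule = Tuple[Callable[[Context], bool], int]


def in_stock(ctx: Context) -> bool:
    """Condition (1): the requested drink is in stock."""
    return ctx["drink"] in ctx["stock"]


def out_of_stock(ctx: Context) -> bool:
    """Condition (2): the drink is on the menu but not in stock."""
    return ctx["drink"] in ctx["menu"] and ctx["drink"] not in ctx["stock"]


def otherwise(ctx: Context) -> bool:
    """Condition (3): anything else."""
    return True


# Hypothetical response database: utterance sentence type -> ordered rules.
# 201 is taken from the text; 202 and 203 are placeholders.
RESPONSE_RULES: Dict[int, List[Rule]] = {
    3: [(in_stock, 201), (out_of_stock, 202), (otherwise, 203)],
}


def select_response_type(utterance_type: int, ctx: Context) -> Optional[int]:
    """Step S126 (sketch): return the first response type whose condition holds."""
    for condition, response_type in RESPONSE_RULES.get(utterance_type, []):
        if condition(ctx):
            return response_type
    return None
```

Keeping the rules in order lets the catch-all "other" condition act as a default without special handling.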

(Step S128)
The voice recognition unit 444 refers to the utterance sentence database and selects, from the utterance sentence corpora associated with the response utterance sentence type selected in step S126, the utterance sentence corpus to be displayed as the provider translation content. For example, the voice recognition unit 444 selects "[Drink]在以下" from the utterance sentence corpora associated with the utterance sentence type "201".

(Step S130)
The voice recognition unit 444 outputs the user recognition content selected in step S120 and the provider translation content selected in step S128 to the display information acquisition unit 448, and the display information acquisition unit 448 executes display processing. As in the processing described for step S98 of the second embodiment, the display processing may be configured to acquire the display information from the support server 30.

(Step S132)
On the other hand, when it is determined in step S122 that there is no matching utterance sentence corpus (step S122: NO), the voice recognition unit 444 outputs provider translation content indicating that the utterance could not be recognized to the display information acquisition unit 448, and the display information acquisition unit 448 executes display processing. As in the processing described for step S98 of the second embodiment, the display processing may be configured to acquire the display information from the support server 30.

FIG. 19 shows an example of the image displayed on the display unit 46 in step S130. FIG. 19 is a diagram showing examples of images displayed on the display unit 46 in the seventh embodiment of the present invention, in which (a) is one example of the displayed image and (b) is another example.

As shown in (a) of FIG. 19, the display unit 46 displays, in the first area 46a on the upper side, (1) text 800 containing the user recognition content recognized as Chinese, (2) text 802 containing the user recognition content recognized as English, and (3) text 804 containing the user recognition content recognized as Korean. The display unit 46 further displays, in the second area 46b on the lower side, text 810 containing the provider utterance content selected in step S128 described above.

As another example, suppose that the voice content uttered by the service user is recognized as English, Chinese, and Korean in step S90, and that in step S122 there is an utterance sentence corpus that matches the user recognition content recognized as Chinese but none that matches the user recognition contents recognized as English and as Korean. In this case, as shown in (b) of FIG. 19, the configuration may be such that, in addition to the text 800, the text 802, the text 804, and the text 810 described above, the second area 46b includes text 820, text 822, text 824, and text 830, which contain the same contents as the text 800, the text 802, the text 804, and the text 810, respectively, together with English text 832 indicating that the utterance could not be recognized as English and Korean text 834 indicating that the utterance could not be recognized as Korean.

As described above, the communication support system 1 according to the present embodiment refers to the predetermined utterance sentence database and response database (a list of response content candidates) corresponding to the user recognition content to be displayed, determines the response content for that user recognition content, displays the user recognition content to be displayed in the first area 46a of the display unit 46, and displays the response content in the second area 46b of the display unit 46. With this configuration, the client terminal 40 can act as the service provider and respond to the utterance of the service user.

In addition, when the predetermined utterance sentence database and response database contain no response content for the user recognition content, the communication support system 1 according to the present embodiment displays a message indicating that the user voice content could not be recognized. The communication support system 1 can therefore notify the user that the user's utterance could not be recognized.

[Embodiment 8]
Another embodiment of the present invention will be described with reference to FIG. 20.

This embodiment describes the image that is displayed when the communication support system 1 recognizes that the utterance of the service user contains a product name such as "coffee". In this embodiment, a database containing information about products is stored in the terminal storage unit 52, and the voice recognition unit 444 can refer to this database.

When the voice recognition unit 444 determines in step S122 described above that the user recognition content selected in step S120 contains a product name, it acquires information about that product from the terminal storage unit 52.

For example, when the voice recognition unit 444 determines in step S122 that the user recognition content selected in step S120 matches the utterance sentence corpus "我想要[Drink]" associated with the utterance sentence ID "6" and identifies the product name that corresponds to "[Drink]", it acquires information about that product from the terminal storage unit 52. Then, in step S130, the voice recognition unit 444 outputs the acquired product information to the display information acquisition unit 448 in addition to the selected user recognition content and the selected provider translation content. FIG. 20 shows an example of the image displayed on the display unit 46 in this case. FIG. 20 is a diagram showing an example of an image displayed on the display unit 46 in the eighth embodiment of the present invention.

As shown in FIG. 20, the display unit 46 displays (1) text 840 containing the user recognition content recognized as Chinese, (2) text 842 containing the user recognition content recognized as English, and (3) text 844 containing the user recognition content recognized as Korean. The image displayed by the display unit 46 further includes information 846 about the product that was determined in step S122 to be contained in the user recognition content.

As described above, when the user recognition content contains a product name, the communication support system 1 according to the present embodiment displays information about that product on the display unit 46. The communication support system 1 can therefore present the service user with information about the product that the service user named.
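The product lookup of this embodiment can be sketched as a search of the recognized text for known product names, followed by attaching the stored record to the display payload. This is a minimal sketch under stated assumptions: the catalog layout, the field names, the substring-based detection, the sample prices, and the payload keys are all hypothetical, since the source only states that a product database is held in the terminal storage unit 52.

```python
from typing import Any, Dict, Optional

# Hypothetical product database; names and fields are illustrative only.
CATALOG: Dict[str, Dict[str, str]] = {
    "coffee": {"price": "300 yen", "note": "hot"},
    "iced coffee": {"price": "350 yen", "note": "cold"},
}


def find_product(recognized_text: str, catalog: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the longest catalog name contained in the recognized text, if any."""
    lowered = recognized_text.lower()
    for name in sorted(catalog, key=len, reverse=True):  # prefer 'iced coffee' over 'coffee'
        if name.lower() in lowered:
            return name
    return None


def build_display(
    recognized_text: str,
    response_text: str,
    catalog: Dict[str, Dict[str, str]],
) -> Dict[str, Any]:
    """Assemble the display payload, adding product information when a product is named."""
    display: Dict[str, Any] = {
        "first_area": recognized_text,   # user recognition content
        "second_area": response_text,    # response content
    }
    name = find_product(recognized_text, catalog)
    if name is not None:
        display["product_info"] = catalog[name]
    return display
```

Checking longer names first avoids reporting "coffee" when the utterance actually names "iced coffee".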

Although this embodiment has described the case where the service provider is the client terminal 40, the communication support system 1 may display information about the product on the display unit 46 (in at least one of the first area 46a and the second area 46b) even when the service provider is, for example, a store clerk, as described in Embodiments 1 to 6.

[Embodiment 9]
The control blocks of the recognition server 10, the translation server 20, the support server 30, and the client terminal 40 (in particular, the recognition server control unit 14, the translation server control unit 24, the support server control unit 34, and the client terminal control unit 44) may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).

In the latter case, the recognition server 10, the translation server 20, the support server 30, and the client terminal 40 each include a CPU that executes the instructions of a program, which is software that realizes each function; a ROM (Read Only Memory) or a storage device (these are referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or the CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The object of the present invention is achieved when the computer (or the CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may also be supplied to the computer via any transmission medium (a communication network, a broadcast wave, or the like) capable of transmitting the program. The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[Embodiment 10]
Each of the above embodiments has described an example that uses a plurality of devices (the recognition server 10, the translation server 20, the support server 30, and the client terminal 40), but the functions of these devices may be provided by a single device or by a larger number of devices. For example, the databases stored in the terminal storage unit 52 in the above embodiments (including the utterance sentence database and the response database) may be stored in the support server 30 or in the recognition server 10.

Similarly, each of the above embodiments has described an example that uses a plurality of servers (the recognition server 10, the translation server 20, and the support server 30), but the functions of these servers may be provided by a single server or by a larger number of servers. When a plurality of servers are used, the servers may be managed by the same business operator or by different business operators.

[Embodiment 11]
Each block of the recognition server 10, the translation server 20, the support server 30, and the client terminal 40 may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit). In the latter case, each of the recognition server 10, the translation server 20, the support server 30, and the client terminal 40 can be configured using a computer (electronic computer) as shown in FIG. 21.

FIG. 21 is a block diagram illustrating the hardware configuration of a computer 910 that can be used as the recognition server 10, the translation server 20, the support server 30, and the client terminal 40. The computer 910 includes an arithmetic unit 912, a main storage device 913, an auxiliary storage device 914, an input/output interface 915, and a communication interface 916, which are connected to one another via a bus 911. The arithmetic unit 912, the main storage device 913, and the auxiliary storage device 914 may be, for example, a CPU, a RAM (random access memory), and a hard disk drive, respectively. An input device 920 with which the user inputs various information to the computer 910 and an output device 930 with which the computer 910 outputs various information to the user are connected to the input/output interface 915. The input device 920 and the output device 930 may be built into the computer 910 or may be connected to it externally. For example, the input device 920 may be a keyboard, a mouse, a touch sensor, or the like, and the output device 930 may be a display, a printer, a speaker, or the like. A device that has the functions of both the input device 920 and the output device 930, such as a touch panel in which a touch sensor and a display are integrated, may also be used. The communication interface 916 is an interface with which the computer 910 communicates with external devices.

The auxiliary storage device 914 stores various programs for operating the computer 910 as the recognition server 10, the translation server 20, the support server 30, and the client terminal 40. The arithmetic unit 912 loads the programs stored in the auxiliary storage device 914 into the main storage device 913 and executes the instructions included in the programs, thereby causing the computer 910 to function as each unit of the recognition server 10, the translation server 20, the support server 30, and the client terminal 40. The recording medium that the auxiliary storage device 914 includes for recording information such as the programs may be any computer-readable "non-transitory tangible medium", for example a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit.

The programs may also be acquired from outside the computer 910, in which case they may be acquired via any transmission medium (a communication network, a broadcast wave, or the like). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the programs are embodied by electronic transmission.

The present invention is not limited to the embodiments described above, and various modifications can be made within the scope of the claims. Embodiments obtained by appropriately combining the technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

1 Communication support system
2 Network
10 Recognition server
12 Recognition server communication unit
14 Recognition server control unit (control unit)
20 Translation server
22 Translation server communication unit
24 Translation server control unit (control unit)
30 Support server
32 Support server communication unit
34 Support server control unit (control unit)
342 Information management unit
344 Selection unit
346 Display mode determination unit
348 Display information output unit
40 Client terminal
42 Client terminal communication unit
44 Client terminal control unit (control unit)
442 Voice information acquisition unit
444 Voice recognition unit
446 Operation signal acquisition unit
448 Display information acquisition unit
450 Display control unit
46 Display unit
46a First area
46b Second area
48 Voice input unit
48a Service user side voice input unit
48b Service provider side voice input unit
50 Operation unit
52 Terminal storage unit

Claims

A communication support system comprising: a display unit having a first area for a first user and a second area for a second user; a voice input unit; and a control unit,
wherein the control unit:
acquires first voice information indicating the voice of the first user via the voice input unit;
performs a recognition process of recognizing the first voice content indicated by the first voice information as each of a plurality of languages;
performs a selection process of referring, from the first recognition content indicating the content recognized as each of the plurality of languages, to the recognition accuracy indicating the accuracy of the recognition process for each of the plurality of languages, and selecting, from among the contents recognized as each of the plurality of languages, the recognition content in the language recognized with a recognition accuracy higher than a predetermined threshold as the first recognition content to be displayed; and
displays, as the first recognition content to be displayed, the recognition content in the language recognized with a recognition accuracy higher than the predetermined threshold in the first area of the display unit.
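The selection process recited above can be illustrated with a minimal sketch. The `Recognition` class, the `select_for_display` function, the sample results, and the threshold value of 0.6 are all hypothetical names and values chosen for illustration; the claim does not specify a data structure, a score range, or a particular threshold.

```python
from dataclasses import dataclass


@dataclass
class Recognition:
    language: str    # language the voice content was recognized as
    text: str        # recognition content in that language
    accuracy: float  # recognition accuracy, assumed here to lie in [0, 1]


def select_for_display(results, threshold):
    """Return the results whose accuracy is strictly above the threshold."""
    return [r for r in results if r.accuracy > threshold]


# Hypothetical results of recognizing one utterance as three languages.
results = [
    Recognition("en", "where is the station", 0.91),
    Recognition("zh", "<text recognized as Chinese>", 0.42),
    Recognition("ko", "<text recognized as Korean>", 0.35),
]

# Only the selected recognition content would go to the first area.
for r in select_for_display(results, threshold=0.6):
    print(f"first area [{r.language}]: {r.text}")
```

With these sample values only the English result exceeds the threshold, so it alone would be shown in the first area; lowering the threshold lets more than one language through.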

The communication support system according to claim 1, wherein, if the content selected as the first recognition content by the selection process after the voice input is started differs from the content currently displayed in the first area, the control unit updates the content displayed in the first area of the display unit so that the content selected as the first recognition content by the selection process is displayed in the first area of the display unit.
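The update behavior described in this claim can be sketched as a comparison between the newly selected content and the content currently shown. The `FirstArea` class and its `redraws` counter are hypothetical stand-ins for the first area of the display unit, introduced only for this illustration.

```python
class FirstArea:
    """Stand-in for the first area of the display unit (hypothetical)."""

    def __init__(self):
        self.content = None  # nothing displayed before voice input starts
        self.redraws = 0     # number of times the displayed content changed

    def show(self, selected):
        """Redraw only if the selected content differs from what is displayed."""
        if selected != self.content:
            self.content = selected
            self.redraws += 1
            return True
        return False


area = FirstArea()
# Hypothetical stream of selections produced while the user keeps speaking.
for selected in ["where", "where", "where is the station"]:
    area.show(selected)
```

In this sketch the repeated selection leaves the display untouched, and the area is redrawn only when the selection process yields different content.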

The communication support system according to claim 1 or 2, wherein the control unit determines, with reference to the recognition accuracy, a display mode in which the first recognition content to be displayed is displayed in the first area of the display unit.
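One way to read "determining a display mode with reference to the recognition accuracy" is as a mapping from accuracy to a rendering style. The sketch below assumes three hypothetical mode labels and two hypothetical cut-off values; the claim itself does not state which display modes exist or where the boundaries lie.

```python
def display_mode(accuracy, high=0.8, low=0.6):
    """Map a recognition accuracy to a hypothetical display mode label."""
    if accuracy >= high:
        return "emphasized"  # e.g. larger or darker text (assumption)
    if accuracy >= low:
        return "normal"
    return "dimmed"          # e.g. smaller or lighter text (assumption)


# Hypothetical accuracies for content recognized as different languages.
for language, accuracy in [("en", 0.91), ("zh", 0.65), ("ko", 0.35)]:
    print(language, display_mode(accuracy))
```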

The communication support system according to any one of claims 1 to 3, wherein the control unit:
acquires first translation content obtained by translating the first recognition content to be displayed; and
displays the first translation content in the second area of the display unit.

The communication support system according to any one of claims 1 to 4, wherein the control unit:
acquires second voice information indicating the voice of the second user via the voice input unit;
performs a recognition process of recognizing the second voice content indicated by the second voice information;
acquires second translation content obtained by translating the second recognition content recognized by the recognition process into the plurality of languages; and
displays the second translation content in the first area of the display unit.

The communication support system according to any one of claims 1 to 5, further comprising an operation unit,
wherein the control unit determines, according to input received via the operation unit, as which of the plurality of languages the first voice content indicated by the first voice information is to be recognized.

The communication support system according to any one of claims 1 to 3, wherein the control unit:
determines response content for the first recognition content by referring to a predetermined response content candidate list corresponding to the first recognition content to be displayed; and
displays the first recognition content to be displayed in the first area of the display unit and displays the response content in the second area of the display unit.

The communication support system according to claim 7, wherein, when the predetermined response content candidate list contains no response content for the first recognition content, the control unit displays a message indicating that the first voice content cannot be recognized.
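The lookup against a response content candidate list, together with the fallback message for the case where no response exists, can be sketched as a dictionary lookup. The list entries, the key normalization, and the wording of the fallback message are assumptions made for this example and are not taken from the claims.

```python
# Hypothetical response content candidate list: recognized phrase -> response.
RESPONSE_CANDIDATES = {
    "where is the restroom": "The restroom is on the second floor.",
    "what time do you close": "We close at 8 p.m.",
}

# Hypothetical wording of the message shown when no response is found.
NOT_RECOGNIZED = "The speech could not be recognized."


def response_for(recognized):
    """Look up a response; fall back to the not-recognized message."""
    key = recognized.strip().lower().rstrip("?.!")
    return RESPONSE_CANDIDATES.get(key, NOT_RECOGNIZED)


print(response_for("Where is the restroom?"))
print(response_for("Tell me a joke"))
```

The first call finds an entry and returns its response for the second area, while the second call has no matching entry and yields the fallback message.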

The communication support system according to any one of claims 1 to 8, wherein the first user is a service user and the second user is a service provider.

The communication support system according to any one of claims 1 to 8, wherein the first user is a service provider and the second user is a service user.

A communication support method comprising:
an acquisition step of acquiring first voice information indicating the voice of a first user;
a recognition step of recognizing the first voice content indicated by the first voice information as each of a plurality of languages;
a selection step of referring, from the first recognition content indicating the content recognized as each of the plurality of languages, to the recognition accuracy indicating the accuracy of the recognition process for each of the plurality of languages, and selecting, from among the contents recognized as each of the plurality of languages, the recognition content in the language recognized with a recognition accuracy higher than a predetermined threshold as the first recognition content to be displayed; and
a display step of displaying, as the first recognition content to be displayed, each recognition content in a language recognized with a recognition accuracy higher than the predetermined threshold in a first area for the first user.

A program for causing a computer to function as a communication support system including a display unit having a first area for a first user and a second area for a second user, a voice input unit, and a control unit, the program causing the control unit to execute:
an acquisition process of acquiring first voice information indicating the voice of the first user via the voice input unit;
a recognition process of recognizing the first voice content indicated by the first voice information as each of a plurality of languages;
a selection process of referring, from the first recognition content indicating the content recognized as each of the plurality of languages, to the recognition accuracy indicating the accuracy of the recognition process for each of the plurality of languages, and selecting, from among the contents recognized as each of the plurality of languages, the recognition content in the language recognized with a recognition accuracy higher than a predetermined threshold as the first recognition content to be displayed; and
a display process of displaying, as the first recognition content to be displayed, the recognition content in the language recognized with a recognition accuracy higher than the predetermined threshold in the first area of the display unit.