JP6832503B2

JP6832503B2 - Information presentation method, information presentation program and information presentation system

Info

Publication number: JP6832503B2
Application number: JP2017076693A
Authority: JP
Inventors: 三浦　康史; 康史三浦; 昌克星見
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2016-09-07
Filing date: 2017-04-07
Publication date: 2021-02-24
Anticipated expiration: 2037-04-07
Also published as: JP2018045675A

Description

The present disclosure relates to an information presentation method, an information presentation program, and an information presentation system that translate speech in a dialogue between a plurality of speakers and present auxiliary information for assisting the dialogue.

Conventionally, dialogue support devices that support a dialogue between a plurality of speakers have been known to present information that supplements the speakers' knowledge according to the content of the dialogue (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2013-73355

However, the technique of Patent Document 1 requires further improvement.

The information presentation method according to one aspect of the present disclosure is an information presentation method in an information presentation system, the method including: generating dialogue text by performing speech recognition on speech in a dialogue between a plurality of speakers; generating dialogue translation text by translating the dialogue text; generating dialogue translation voice by performing speech synthesis on the dialogue translation text; determining, based on the dialogue text, whether auxiliary information for assisting the dialogue exists; and, when the auxiliary information exists, presenting the existence of the auxiliary information to at least one of the plurality of speakers in accordance with the usage status of the information presentation system by at least one of the plurality of speakers.

According to the above aspect, further improvement can be achieved.

Block diagram showing the configuration of the translation terminal of the information presentation system according to the present embodiment.
Block diagram showing the configuration of the translation server of the information presentation system according to the present embodiment.
Diagram showing an example of the information stored in the auxiliary explanatory text storage unit.
Diagram showing an example of the information stored in the auxiliary information storage unit.
Flowchart for explaining the operation of the information presentation system according to the present embodiment.
Flowchart for explaining the customer utterance processing executed by the translation terminal and the translation server.
First flowchart for explaining the clerk utterance processing executed by the translation terminal and the translation server.
Second flowchart for explaining the clerk utterance processing executed by the translation terminal and the translation server.
Diagram showing an example of the screen displayed on the translation terminal when the existence of auxiliary information is presented.

(Circumstances leading to one aspect of the present disclosure)
First, the point of focus of one aspect of the present disclosure will be described.

Patent Document 1 discloses a technique for displaying supplementary information based on a speech act tag, which indicates a speech act, and a speaker tag, which indicates a speaker. Examples of speech act tags include "greeting", "response", and "question" (see FIG. 8 of Patent Document 1). As for when to display the supplementary information, Patent Document 1 describes deciding whether information needs to be presented, or when it should be presented, from the combination of speaker and speech act (see FIG. 10 of Patent Document 1).

As described above, Patent Document 1 can control whether and when information is presented based on the speech act and the speaker. However, these decisions are made by the dialogue support device, not by the customer or by the staff member serving the customer.

When the dialogue support device decides whether and when to present information in this way, it cannot accommodate a staff member who wants to skip the supplementary information because the customer is in a hurry. Moreover, a conventional dialogue support device presents the supplementary information even when this is unnecessary because the staff member operating the device already has sufficient knowledge.

Patent Document 1 also describes notifying the user that supplementary information is being presented by playing a sound, blinking the display, or vibrating the device. However, these methods may interfere with customer service, which consists mainly of conversation, and they cannot signal the existence of supplementary information naturally within the flow of the dialogue.

To solve the above problems, the information presentation method according to one aspect of the present disclosure is an information presentation method in an information presentation system, the method including: generating dialogue text by performing speech recognition on speech in a dialogue between a plurality of speakers; generating dialogue translation text by translating the dialogue text; generating dialogue translation voice by performing speech synthesis on the dialogue translation text; determining, based on the dialogue text, whether auxiliary information for assisting the dialogue exists; and, when the auxiliary information exists, presenting the existence of the auxiliary information to at least one of the plurality of speakers in accordance with the usage status of the information presentation system by at least one of the plurality of speakers.

According to this configuration, dialogue text is generated by performing speech recognition on speech in a dialogue between a plurality of speakers. Dialogue translation text is generated by translating the dialogue text, and dialogue translation voice is generated by performing speech synthesis on the dialogue translation text. Whether auxiliary information for assisting the dialogue exists is determined based on the dialogue text, and, when it exists, its existence is presented to at least one of the speakers in accordance with the usage status of the information presentation system by at least one of the speakers.

Therefore, because the existence of the auxiliary information is presented before the auxiliary information itself, the auxiliary information is not always presented. It can be presented only when a speaker needs it, which enables a smooth dialogue.
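The processing flow described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the disclosed implementation: the recognizer, translator, and synthesizer are passed in as plain functions because the disclosure names no engines, the keyword lookup in the hypothetical `auxiliary_db` is an assumed way of deciding whether auxiliary information exists, and `notice_allowed` is a hypothetical hook standing in for the usage-status check. The function returns only a notice that auxiliary information exists, not the information itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class TurnResult:
    dialogue_text: str               # speech recognition result
    translated_text: str             # dialogue translation text
    translated_voice: bytes          # dialogue translation voice
    auxiliary_notice: Optional[str]  # notice that auxiliary information exists


def process_turn(
    voice: bytes,
    recognize: Callable[[bytes], str],   # hypothetical ASR engine
    translate: Callable[[str], str],     # hypothetical MT engine
    synthesize: Callable[[str], bytes],  # hypothetical TTS engine
    auxiliary_db: Dict[str, str],        # keyword -> auxiliary information (assumed)
    notice_allowed: Callable[[], bool],  # usage-status check (hypothetical hook)
) -> TurnResult:
    dialogue_text = recognize(voice)
    translated_text = translate(dialogue_text)
    translated_voice = synthesize(translated_text)
    # Auxiliary information "exists" if any registered keyword occurs in the text.
    topics = [kw for kw in auxiliary_db if kw in dialogue_text]
    notice = None
    if topics and notice_allowed():
        # Announce only that information exists; the content is not shown yet.
        notice = "Auxiliary information available: " + ", ".join(topics)
    return TurnResult(dialogue_text, translated_text, translated_voice, notice)


# Usage with stub engines:
result = process_turn(
    b"<pcm>",
    recognize=lambda v: "Can I get a tax refund?",
    translate=lambda t: "[ja] " + t,
    synthesize=lambda t: t.encode("utf-8"),
    auxiliary_db={"tax refund": "explanatory text (placeholder)"},
    notice_allowed=lambda: True,
)
print(result.auxiliary_notice)  # Auxiliary information available: tax refund
```

Passing the engines as callables keeps the sketch testable with stubs and makes clear which parts the disclosure leaves open.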

In the above information presentation method, an instruction to present the auxiliary information may also be received from at least one of the plurality of speakers, and the auxiliary information may be presented in accordance with the received instruction.

According to this configuration, an instruction to present the auxiliary information is received from at least one of the plurality of speakers, and the auxiliary information is presented in accordance with the received instruction.

Therefore, the auxiliary information can be presented to the speaker who needs it.
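The two-stage behavior (announce existence first, present content only on instruction) can be modeled as a small state object. The class and method names below are hypothetical and the sketch is illustrative only; the disclosure does not specify how the instruction is received (for example, a tap on the terminal screen is one plausible input).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingAuxiliary:
    """Auxiliary information whose existence has been announced but whose
    content is held back until a speaker asks for it (hypothetical model)."""
    topic: str
    content: str
    presented: bool = False

    def notice(self) -> str:
        # Shown first: only the fact that information exists.
        return f"More information on '{self.topic}' is available"

    def handle_instruction(self, present_requested: bool) -> Optional[str]:
        # Present the content only when a speaker explicitly asks for it.
        if not present_requested:
            return None
        self.presented = True
        return self.content


pending = PendingAuxiliary("tax refund", "explanatory text (placeholder)")
print(pending.notice())
print(pending.handle_instruction(False))  # None: the speaker did not ask
print(pending.handle_instruction(True))   # explanatory text (placeholder)
```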

In the above information presentation method, the dialogue translation voice may also be generated by speech waveform synthesis of the dialogue translation text, explanatory text describing the auxiliary information may be identified from the dialogue text, and auxiliary information voice may be generated by speech analysis-synthesis of the explanatory text.

According to this configuration, the dialogue translation voice is generated by speech waveform synthesis of the dialogue translation text. Explanatory text describing the auxiliary information is identified from the dialogue text, and auxiliary information voice is generated by speech analysis-synthesis of the explanatory text.

Therefore, because auxiliary information voice is generated by speech analysis-synthesis of the explanatory text describing the auxiliary information, the auxiliary information can be presented by voice.

In the above information presentation method, the fundamental frequency of the auxiliary information voice may also differ from the fundamental frequency of the dialogue translation voice.

According to this configuration, because the fundamental frequency of the auxiliary information voice differs from that of the dialogue translation voice, the speakers can distinguish the voice translating an utterance from the voice presenting auxiliary information from the information presentation system.
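One simple way to make the two voices differ is to offset the pitch by a fixed musical interval, f_aux = f_trans × 2^(k/12) for a shift of k equal-tempered semitones. The disclosure only states that the fundamental frequencies may differ; the semitone rule, the function name, and the default of 4 semitones below are illustrative assumptions.

```python
def auxiliary_f0(translation_f0_hz: float, semitones: float = 4.0) -> float:
    """F0 for the auxiliary information voice, offset from the F0 of the
    dialogue translation voice by `semitones` (equal-tempered semitones)."""
    if translation_f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    if semitones == 0:
        raise ValueError("a zero shift would leave the two voices at the same F0")
    return translation_f0_hz * 2.0 ** (semitones / 12.0)


print(auxiliary_f0(200.0, 12))   # 400.0, one octave above
print(auxiliary_f0(200.0, -12))  # 100.0, one octave below
```

A relative (ratio-based) offset keeps the two voices distinguishable regardless of whether the translation voice is high- or low-pitched.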

In the above information presentation method, the following may also be performed: acquiring a first voice of a first speaker who asks a question in a first language, and acquiring a second voice of a second speaker who answers the question in a second language different from the first language; generating first dialogue text by speech recognition of the first voice; generating first dialogue translation text by translating the first dialogue text into the second language; generating first dialogue translation voice by speech synthesis of the first dialogue translation text; generating second dialogue text by speech recognition of the second voice; generating second dialogue translation text by translating the second dialogue text into the first language; generating second dialogue translation voice by speech synthesis of the second dialogue translation text; determining whether the auxiliary information exists based on at least one of the first dialogue text and the second dialogue text; and, when the auxiliary information exists, presenting its existence to at least one of the first speaker and the second speaker in accordance with the usage status of the information presentation system by the second speaker.

According to this configuration, the first voice of the first speaker asking a question in the first language is acquired, and the second voice of the second speaker answering the question in the second language is acquired. First dialogue text is generated by speech recognition of the first voice, translated into the second language to produce first dialogue translation text, and synthesized into first dialogue translation voice. Likewise, second dialogue text is generated by speech recognition of the second voice, translated into the first language to produce second dialogue translation text, and synthesized into second dialogue translation voice. Whether auxiliary information exists is determined based on at least one of the first and second dialogue texts, and, when it exists, its existence is presented to at least one of the first and second speakers in accordance with the second speaker's usage status of the information presentation system.

Therefore, when the first speaker asks a question in the first language and the second speaker answers in the second language, auxiliary information can be presented only when the second speaker needs it, which enables a smooth dialogue.
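The bidirectional exchange can be sketched as below. This is an illustrative sketch only: `translate` is a hypothetical callable taking (text, source language, target language), and deciding that auxiliary information exists by matching keywords in either dialogue text is an assumption, chosen to mirror "based on at least one of the first dialogue text and the second dialogue text".

```python
from typing import Callable, Dict, Iterable, List

# (text, source_language, target_language) -> translated text
Translate = Callable[[str, str, str], str]


def exchange(question: str, answer: str, first_lang: str, second_lang: str,
             translate: Translate, auxiliary_keywords: Iterable[str]) -> Dict[str, object]:
    """One question/answer exchange between two speakers (illustrative sketch)."""
    # First speaker asks in first_lang -> translated into second_lang.
    first_translation = translate(question, first_lang, second_lang)
    # Second speaker answers in second_lang -> translated into first_lang.
    second_translation = translate(answer, second_lang, first_lang)
    # Auxiliary information may be triggered by either dialogue text.
    topics: List[str] = sorted(
        kw for kw in set(auxiliary_keywords) if kw in question or kw in answer
    )
    return {"first_translation": first_translation,
            "second_translation": second_translation,
            "auxiliary_topics": topics}


# Stub translator that just tags the direction of translation.
fake = lambda text, src, dst: f"<{src}->{dst}> {text}"
print(exchange("Is this waterproof?", "はい、防水です。", "en", "ja", fake, {"waterproof", "防水"}))
```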

In the above information presentation method, the usage status may also include the number of translations into the first language. When the auxiliary information exists, whether this number exceeds a predetermined number is determined, and, when it does, the existence of the auxiliary information need not be presented to the second speaker.

According to this configuration, the usage status includes the number of translations into the first language. When auxiliary information exists, it is determined whether this number exceeds the predetermined number, and if so, the existence of the auxiliary information is not presented to the second speaker.

Therefore, when the number of translations into the first language exceeds the predetermined number, the auxiliary information can be presumed to be already known to the second speaker. Presenting its existence is then unnecessary, and the processing for presenting unnecessary information can be omitted.
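A minimal sketch of the translation-count condition, assuming counts are kept per target language for the terminal in use. The class name and the default threshold of 10 are hypothetical; the disclosure says only "a predetermined number". The notice is allowed while the count does not exceed the threshold, matching "more than the predetermined number" as the suppression condition.

```python
from collections import Counter


class TranslationCounter:
    """Counts translations per target language (illustrative sketch)."""

    def __init__(self, threshold: int = 10) -> None:
        self.threshold = threshold  # "predetermined number": hypothetical default
        self.counts: Counter = Counter()

    def record(self, target_lang: str) -> None:
        self.counts[target_lang] += 1

    def notice_allowed(self, first_lang: str) -> bool:
        # More translations into first_lang than the threshold suggests the
        # second speaker already knows the auxiliary information.
        return self.counts[first_lang] <= self.threshold
```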

In the above information presentation method, the usage status may also include the usage time from when the second speaker started using the information presentation system until the present. When the auxiliary information exists, whether the usage time exceeds a predetermined time is determined, and, when it does, the existence of the auxiliary information need not be presented to the second speaker.

According to this configuration, the usage status includes the usage time from when the second speaker started using the information presentation system until the present. When auxiliary information exists, it is determined whether the usage time exceeds the predetermined time, and if so, the existence of the auxiliary information is not presented to the second speaker.

Therefore, when the second speaker has used the information presentation system for longer than the predetermined time, the auxiliary information can be presumed to be already known to the second speaker. Presenting its existence is then unnecessary, and the processing for presenting unnecessary information can be omitted.
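The usage-time condition can be sketched with an injectable clock so it can be tested deterministically. The names and the choice of wall-clock time (`time.time`, so a start timestamp could be persisted across restarts) are assumptions; the disclosure does not say how usage time is measured or stored.

```python
import time
from typing import Callable, Optional


class UsageTimer:
    """Tracks how long the second speaker has been using the system."""

    def __init__(self, limit_s: float, clock: Callable[[], float] = time.time) -> None:
        self.limit_s = limit_s          # "predetermined time" (hypothetical value)
        self.clock = clock              # injectable for testing
        self.started_at: Optional[float] = None

    def start(self) -> None:
        # Record the first use only; later calls keep the original start time.
        if self.started_at is None:
            self.started_at = self.clock()

    def usage_time(self) -> float:
        return 0.0 if self.started_at is None else self.clock() - self.started_at

    def notice_allowed(self) -> bool:
        # Longer use than the limit: assume the information is already known.
        return self.usage_time() <= self.limit_s
```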

In the above information presentation method, the auxiliary information may also include information explaining an explanation target, and the usage status may include the number of times auxiliary information corresponding to the same explanation target has been presented. When the auxiliary information exists, whether this presentation count exceeds a predetermined number is determined, and, when it does, the existence of the auxiliary information need not be presented to the second speaker.

According to this configuration, the auxiliary information includes information explaining an explanation target, and the usage status includes the number of times auxiliary information corresponding to the same explanation target has been presented. When auxiliary information exists, it is determined whether this count exceeds the predetermined number, and if so, the existence of the auxiliary information is not presented to the second speaker.

Therefore, when auxiliary information for the same explanation target has been presented more than the predetermined number of times, it can be presumed to be already known to the second speaker. Presenting its existence is then unnecessary, and the processing for presenting unnecessary information can be omitted.
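Unlike the translation count, this condition is tracked separately for each explanation target. A minimal sketch, assuming each target is identified by a string key; the class name, the key scheme, and the default threshold of 3 are hypothetical.

```python
from typing import Dict


class PresentationLog:
    """Counts how often auxiliary information was presented per explanation target."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # "predetermined number": hypothetical default
        self.counts: Dict[str, int] = {}

    def record_presentation(self, target_id: str) -> None:
        self.counts[target_id] = self.counts.get(target_id, 0) + 1

    def notice_allowed(self, target_id: str) -> bool:
        # Each explanation target is counted separately.
        return self.counts.get(target_id, 0) <= self.threshold
```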

An information presentation program according to another aspect of the present disclosure causes a computer to function as: a dialogue text generation unit that generates dialogue text by speech recognition of speech in a dialogue between a plurality of speakers; a dialogue translation text generation unit that generates dialogue translation text by translating the dialogue text; a dialogue translation voice generation unit that generates dialogue translation voice by speech synthesis of the dialogue translation text; an auxiliary information determination unit that determines, based on the dialogue text, whether auxiliary information for assisting the dialogue exists; and a transmission unit that, when the auxiliary information is determined to exist, transmits the auxiliary information and the dialogue translation voice in accordance with the usage status, by at least one of the plurality of speakers, of an information presentation system that includes the computer as a component, so that the existence of the auxiliary information is presented to at least one of the plurality of speakers.

According to this configuration, dialogue text is generated by speech recognition of speech in a dialogue between a plurality of speakers. Dialogue translation text is generated by translating the dialogue text, and dialogue translation voice is generated by speech synthesis of the dialogue translation text. Whether auxiliary information for assisting the dialogue exists is determined based on the dialogue text. When it exists, the auxiliary information and the dialogue translation voice are transmitted, in accordance with the usage status, by at least one of the speakers, of the information presentation system that includes the computer as a component, so that the existence of the auxiliary information is presented to at least one of the speakers.

An information presentation system according to another aspect of the present disclosure includes a terminal and a server communicably connected to the terminal. The terminal includes a voice acquisition unit that acquires speech in a dialogue between a plurality of speakers, and a transmission unit that transmits the acquired speech to the server. The server includes: a reception unit that receives the speech; a dialogue text generation unit that generates dialogue text by speech recognition of the speech in the dialogue; a dialogue translation text generation unit that generates dialogue translation text by translating the dialogue text; a dialogue translation voice generation unit that generates dialogue translation voice by speech synthesis of the dialogue translation text; an auxiliary information determination unit that determines, based on the dialogue text, whether auxiliary information for assisting the dialogue exists; and a transmission unit that, when the auxiliary information is determined to exist, transmits the auxiliary information and the dialogue translation voice to the terminal in accordance with the usage status of the information presentation system by at least one of the plurality of speakers. The terminal further includes a reception unit that receives the auxiliary information and the dialogue translation voice, a voice output unit that outputs the dialogue translation voice, and a presentation unit that presents the existence of the auxiliary information to at least one of the plurality of speakers.

According to this configuration, the terminal acquires speech in a dialogue between a plurality of speakers and transmits it to the server. The server receives the speech, generates dialogue text by speech recognition, generates dialogue translation text by translating the dialogue text, and generates dialogue translation voice by speech synthesis of the dialogue translation text. The server then determines, based on the dialogue text, whether auxiliary information for assisting the dialogue exists, and, when it does, transmits the auxiliary information and the dialogue translation voice to the terminal in accordance with the usage status of the information presentation system by at least one of the speakers. The terminal receives the auxiliary information and the dialogue translation voice, outputs the dialogue translation voice, and presents the existence of the auxiliary information to at least one of the speakers.
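The terminal/server exchange can be illustrated with JSON messages carrying base64-encoded audio. Everything about the wire format here (the JSON encoding, the message classes, the field names, and the callables passed to `handle_request`) is a hypothetical choice for illustration; the disclosure does not specify a protocol.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class UtteranceRequest:        # terminal -> server
    speaker: str               # e.g. "customer" or "clerk"
    pcm_b64: str               # acquired voice, base64-encoded PCM


@dataclass
class TranslationResponse:     # server -> terminal
    translated_text: str
    voice_b64: str             # dialogue translation voice
    auxiliary: Optional[str]   # auxiliary information, or None if not sent


def encode_request(speaker: str, pcm: bytes) -> str:
    return json.dumps(asdict(UtteranceRequest(speaker, base64.b64encode(pcm).decode("ascii"))))


def handle_request(payload: str,
                   recognize: Callable[[bytes], str],
                   translate: Callable[[str], str],
                   synthesize: Callable[[str], bytes],
                   find_auxiliary: Callable[[str], Optional[str]],
                   send_auxiliary: Callable[[], bool]) -> str:
    """Server side: recognize, translate, synthesize, and decide what to send."""
    req = UtteranceRequest(**json.loads(payload))
    text = recognize(base64.b64decode(req.pcm_b64))
    translated = translate(text)
    voice = synthesize(translated)
    aux = find_auxiliary(text)
    # Auxiliary information is attached only if it exists and usage status allows it.
    resp = TranslationResponse(translated, base64.b64encode(voice).decode("ascii"),
                               aux if aux is not None and send_auxiliary() else None)
    return json.dumps(asdict(resp))


def decode_response(payload: str) -> TranslationResponse:
    return TranslationResponse(**json.loads(payload))
```

On the terminal side, the decoded `auxiliary` field would drive the presentation unit, while `voice_b64` is decoded and played through the voice output unit.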

(Embodiment)
Embodiments of the present disclosure are described below with reference to the drawings. The same reference numerals denote the same components throughout the drawings. The following embodiments are examples that embody the present disclosure and do not limit its technical scope.

FIG. 1 is a block diagram showing the configuration of the translation terminal of the information presentation system according to the present embodiment.

The information presentation system translates speech in a dialogue between a plurality of speakers and presents auxiliary information for assisting the dialogue. It includes a translation terminal 100 and a translation server 200, which are communicably connected to each other via a network.

The translation terminal 100 may be, for example, a tablet computer, a smartphone, a mobile phone, or a notebook PC (personal computer). Alternatively, the translation terminal 100 may be a dedicated terminal for translating speech in a dialogue between a plurality of speakers.

As shown in FIG. 1, the translation terminal 100 of the information presentation system according to the present embodiment includes a voice input unit 101, a format conversion unit 102, a response control unit 103, a communication unit 104, a voice output unit 105, a response holding unit 106, an auxiliary information presentation setting unit 107, an auxiliary information presence determination unit 108, an auxiliary information presentation determination unit 109, an auxiliary information presentation unit 110, a video output unit 111, and a user instruction input unit 112.

The voice input unit 101 picks up the voice of a customer who speaks a language other than Japanese, such as a foreign visitor to Japan, or of a clerk who speaks Japanese, and generates an analog voice signal. In this way, the voice input unit 101 acquires the voice signals uttered by the speakers in the dialogue.

The format conversion unit 102 converts the analog voice signal generated by the voice input unit 101 into digital data by, for example, pulse code modulation (PCM), generating a PCM voice signal.
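As an illustration of the PCM conversion step, the sketch below quantizes already-sampled audio to linear PCM. The 16-bit signed little-endian format and the assumption that samples arrive as floats in [-1.0, 1.0] are illustrative choices; the disclosure says only that PCM may be used, and the actual analog-to-digital sampling is performed in hardware.

```python
import struct
from typing import Sequence


def to_pcm16(samples: Sequence[float]) -> bytes:
    """Quantize samples in [-1.0, 1.0] to 16-bit signed little-endian linear PCM."""
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))                    # clip out-of-range values
        out += struct.pack("<h", int(round(s * 32767)))  # '<h' = little-endian int16
    return bytes(out)


print(to_pcm16([0.0, 1.0, -1.0]).hex())  # 0000ff7f0180
```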

The response control unit 103 determines the speaker of the uttered voice input from the voice input unit 101. The response control unit 103 transmits to the translation server 200 the PCM voice signal generated by the format conversion unit 102 and the translation result of the customer's utterance held in the response holding unit 106, and receives from the translation server 200 the translation result and content information related to the auxiliary information. It outputs the translation result received from the translation server 200 as voice through the voice output unit 105. The response control unit 103 also acquires auxiliary information and controls the presentation of the acquired auxiliary information in accordance with the clerk's instructions input from the user instruction input unit 112.

The communication unit 104 exchanges data with the translation server 200 over a communication line such as the Internet. It transmits the voice of the dialogue between the speakers to the translation server 200, and receives from the translation server 200 the translation result of the customer's utterance, the translation result of the clerk's utterance, and the auxiliary information corresponding to the clerk's and the customer's utterances.

The voice output unit 105 outputs as voice the translation result of the customer's utterance and the translation result of the clerk's utterance received from the translation server 200, as well as the auxiliary information corresponding to the clerk's and the customer's utterances.

The response holding unit 106 is, for example, a non-volatile memory, and holds the translation result of the customer's utterance received from the translation server 200.

The auxiliary information presentation setting unit 107 sets whether or not auxiliary information is to be presented, and stores presentation necessity information indicating that setting. The clerk can configure this in advance through a user interface such as an auxiliary information presentation setting screen. When the auxiliary information presentation setting unit 107 holds presentation necessity information indicating that auxiliary information is not to be presented, the clerk can serve customers using the translation function without being interrupted by auxiliary information.

The auxiliary information presence determination unit 108 determines whether auxiliary information corresponding to the clerk's and the customer's utterances exists.

The auxiliary information presentation determination unit 109 determines whether to present the auxiliary information on the basis of the setting held by the auxiliary information presentation setting unit 107, the determination result of the auxiliary information presence determination unit 108, and the user instruction entered through the user instruction input unit 112.

The auxiliary information presentation unit 110 presents the auxiliary information corresponding to the clerk's and the customer's utterances via the voice output unit 105 and the video output unit 111.

The video output unit 111 displays the auxiliary information (content information) corresponding to the clerk's and the customer's utterances.

The user instruction input unit 112 provides a user interface through which a user instructs the presentation of auxiliary information.

FIG. 2 is a block diagram showing the configuration of the translation server of the information presentation system according to the present embodiment.

As shown in FIG. 2, the translation server 200 of the information presentation system according to the present embodiment includes a communication unit 201, a control unit 202, a voice recognition unit 203, a translation unit 204, an intention understanding unit 205, an auxiliary explanatory text storage unit 206, an auxiliary information storage unit 207, a voice waveform synthesis unit 208, and a voice analysis-synthesis unit 209.

The communication unit 201 exchanges data with the translation terminal 100 over a communication line such as the Internet.

The control unit 202 controls the voice recognition unit 203, the translation unit 204, the intention understanding unit 205, the voice waveform synthesis unit 208, and the voice analysis-synthesis unit 209 so as to translate the customer's or the clerk's voice received from the translation terminal 100 and synthesize the translation result into speech. The control unit 202 also interprets the intention behind the customer's and the clerk's utterances and determines the auxiliary explanation and auxiliary information that match that intention.

The voice recognition unit 203 recognizes the customer's or the clerk's voice and generates dialogue text; that is, it generates dialogue text by performing speech recognition on the voice of the dialogue.

The translation unit 204 generates dialogue translation text by translating the dialogue text generated by the voice recognition unit 203.

The intention understanding unit 205 determines whether auxiliary information exists on the basis of the dialogue text. Specifically, it extracts the entity to be explained from the customer's or the clerk's dialogue text, classifies the intention of the utterance from the dialogue text, and determines whether auxiliary information corresponding to that entity and intention exists. The intention indicates, for example, whether the utterance concerns how to get to the entity or concerns information about the entity. The intention understanding unit 205 also identifies, from the dialogue text, the explanatory text that explains the auxiliary information.

The auxiliary explanatory text storage unit 206 is, for example, a non-volatile memory, and stores the explanatory texts that the translation terminal 100 outputs as voice when presenting auxiliary information.

FIG. 3 is a diagram showing an example of the information stored in the auxiliary explanatory text storage unit 206. The explanatory text ID 301 is the identifier of an explanatory text stored in the auxiliary explanatory text storage unit 206. The auxiliary information explanatory text 302 is the text that explains the auxiliary information. The auxiliary explanatory text storage unit 206 stores each explanatory text ID 301 in association with its auxiliary information explanatory text 302.

The auxiliary information storage unit 207 is, for example, a non-volatile memory, and stores, in association with one another, an entity extracted from the customer's or the clerk's dialogue text, the utterance intention of the dialogue text, and the explanatory text corresponding to that entity and utterance intention.

FIG. 4 is a diagram showing an example of the information stored in the auxiliary information storage unit 207. Entity 401 is the object to be explained that appears in the customer's or the clerk's dialogue text. The content ID 402 is the identifier of the entity 401. The intention ID 403 is the identifier of the utterance intention of the customer or the clerk; for example, intention ID "0001" corresponds to how to get to the entity, and intention ID "0002" corresponds to information about the entity. The explanatory text ID 404 is the identifier of the explanatory text used to explain the entity. The content information is an address indicating where the image information used to explain the entity is stored.
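To make the relationship between FIG. 3 and FIG. 4 concrete, the sketch below models both tables as in-memory structures. The record type, field names, IDs such as "E001", the sample entity and the addresses are hypothetical placeholders; only the column meanings and the intention IDs "0001"/"0002" come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# FIG. 3 (hypothetical contents): explanatory text ID 301 -> explanatory text 302
EXPLANATORY_TEXTS = {
    "E001": "This map shows the route to the destination.",
    "E002": "Here is some information about this place.",
}

@dataclass(frozen=True)
class AuxiliaryRecord:
    """One row of FIG. 4 (field names are hypothetical)."""
    entity: str                    # entity 401
    content_id: str                # content ID 402
    intent_id: str                 # intention ID 403: "0001" = how to get there, "0002" = information
    description_id: Optional[str]  # explanatory text ID 404
    content_info: Optional[str]    # address of the image used to explain the entity

AUXILIARY_TABLE = [
    AuxiliaryRecord("station", "C01", "0001", "E001", "https://example.com/station_map.png"),
    AuxiliaryRecord("station", "C01", "0002", "E002", None),
]

# Index the rows by (entity, intention ID), the pair used for lookup
INDEX = {(r.entity, r.intent_id): r for r in AUXILIARY_TABLE}

print(EXPLANATORY_TEXTS[INDEX[("station", "0001")].description_id])
```

The same entity can appear in several rows because the table is keyed by the entity together with the intention, not by the entity alone.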

The voice waveform synthesis unit 208 performs waveform-synthesis speech synthesis, a technique that synthesizes speech by concatenating fragments of pre-recorded sound. Waveform-synthesis speech generally has higher quality than analysis-synthesis speech and sounds closer to a human voice. The voice waveform synthesis unit 208 generates the dialogue translation voice by waveform-synthesizing the dialogue translation text obtained by translating the dialogue text.
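The concatenation principle can be illustrated with a toy unit inventory: each unit name maps to a short pre-recorded sample sequence, and an utterance is built by joining the units in order, here with a short linear crossfade at each joint to soften discontinuities. The unit names, inventory contents, crossfade length and the function name `concatenate_units` are assumptions; a real waveform synthesizer selects units from a large recorded corpus and does far more smoothing.

```python
from typing import Dict, List, Sequence

def concatenate_units(units: Sequence[str],
                      inventory: Dict[str, List[float]],
                      crossfade: int = 2) -> List[float]:
    """Join recorded unit waveforms in order, crossfading `crossfade` samples at each joint."""
    out: List[float] = []
    for name in units:
        segment = inventory[name]  # raises KeyError if the unit was never recorded
        n = min(crossfade, len(out), len(segment))
        for i in range(n):
            w = (i + 1) / (n + 1)  # linearly increasing weight for the new segment
            j = len(out) - n + i
            out[j] = (1 - w) * out[j] + w * segment[i]
        out.extend(segment[n:])  # append the rest of the new unit
    return out

inventory = {"a": [1.0, 1.0, 1.0], "b": [0.0, 0.0, 0.0]}
print(concatenate_units(["a", "b"], inventory, crossfade=1))  # [1.0, 1.0, 0.5, 0.0, 0.0]
```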

The voice analysis-synthesis unit 209 performs analysis-synthesis speech synthesis, a technique that synthesizes speech by adjusting parameters such as the fundamental frequency or the timbre. Speech produced by analysis-synthesis sounds robotic, so it is not mistaken for a human voice. The voice analysis-synthesis unit 209 generates the auxiliary information voice by analysis-synthesis of the explanatory text that explains the auxiliary information. The fundamental frequency of the auxiliary information voice differs from that of the dialogue translation voice.

When it is determined that auxiliary information exists, the communication unit 201 transmits the auxiliary information and the dialogue translation voice to the translation terminal 100 so that the existence of the auxiliary information can be presented to at least one of the speakers. On receiving the auxiliary information from the translation server 200, the response control unit 103 of the translation terminal 100 presents the existence of the auxiliary information to at least one of the speakers. The user instruction input unit 112 then accepts an instruction to present the auxiliary information from at least one of the speakers, and the auxiliary information presentation unit 110 presents the auxiliary information in accordance with that instruction.

In summary, the communication unit 201 acquires a first voice of a first speaker (the customer) who asks a question in a first language, and a second voice of a second speaker (the clerk) who answers the question in a second language different from the first language. The voice recognition unit 203 generates first dialogue text by recognizing the first voice, the translation unit 204 translates the first dialogue text into the second language to generate first dialogue translation text, and the voice waveform synthesis unit 208 synthesizes the first dialogue translation text into first dialogue translation voice. Likewise, the voice recognition unit 203 generates second dialogue text by recognizing the second voice, the translation unit 204 translates the second dialogue text into the first language to generate second dialogue translation text, and the voice waveform synthesis unit 208 synthesizes the second dialogue translation text into second dialogue translation voice. The intention understanding unit 205 determines whether auxiliary information exists on the basis of at least one of the first and second dialogue texts. When auxiliary information exists, the auxiliary information presentation determination unit 109 presents its existence to at least one of the first speaker (customer) and the second speaker (clerk).
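The two translation directions summarized above share one pipeline with the language pair swapped. The sketch below wires recognition, translation and synthesis together as injected callables, so the data flow can be followed without committing to any engine. All function and parameter names and the language codes are hypothetical, and the stub engines exist only to show the flow.

```python
from typing import Callable, NamedTuple

class TurnResult(NamedTuple):
    dialogue_text: str        # recognized text in the speaker's language
    translation_text: str     # dialogue translation text in the listener's language
    translation_voice: bytes  # synthesized dialogue translation voice

def translate_turn(voice: bytes, src_lang: str, dst_lang: str,
                   recognize: Callable[[bytes, str], str],
                   translate: Callable[[str, str, str], str],
                   synthesize: Callable[[str, str], bytes]) -> TurnResult:
    text = recognize(voice, src_lang)                 # role of voice recognition unit 203
    translated = translate(text, src_lang, dst_lang)  # role of translation unit 204
    audio = synthesize(translated, dst_lang)          # role of voice waveform synthesis unit 208
    return TurnResult(text, translated, audio)

# Stub engines, for illustration only
def fake_recognize(voice: bytes, lang: str) -> str:
    return voice.decode("utf-8")

def fake_translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"

def fake_synthesize(text: str, lang: str) -> bytes:
    return text.encode("utf-8")

# First speaker (customer) -> second language; the clerk's turn swaps the pair
customer = translate_turn(b"Where is the station?", "en", "ja",
                          fake_recognize, fake_translate, fake_synthesize)
clerk = translate_turn(b"Turn left", "ja", "en",
                       fake_recognize, fake_translate, fake_synthesize)
```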

FIG. 5 is a flowchart explaining the operation of the information presentation system according to the present embodiment. The present embodiment describes an example in which a customer and a clerk converse, and the customer's language differs from the clerk's. The information presentation system translates the customer's utterance and outputs the translation as voice, and likewise translates the clerk's utterance and outputs the translation as voice. The operation shown in FIG. 5 starts when the customer speaks.

First, the voice input unit 101 of the translation terminal 100 acquires the spoken voice and generates a voice signal (step S1). The customer speaks into the translation terminal 100. The customer's language and the clerk's language are either preset or selected by the customer or the clerk before speaking. The format conversion unit 102 of the translation terminal 100 then converts the voice signal generated by the voice input unit 101 into digital data by, for example, pulse code modulation (PCM), generating a PCM voice signal.

Next, the response control unit 103 of the translation terminal 100 determines whether the acquired voice is the customer's (step S2). For example, the translation terminal 100 may have a customer voice input start button for accepting voice input from the customer and a clerk voice input start button for accepting voice input from the clerk. The user instruction input unit 112 determines which of the two buttons has been pressed. The response control unit 103 judges voice acquired after the customer voice input start button is pressed to be the customer's voice, and voice acquired after the clerk voice input start button is pressed to be the clerk's voice.

Alternatively, the translation terminal 100 may have a customer voice input device that accepts voice input from the customer and a separate clerk voice input device that accepts voice input from the clerk. The response control unit 103 may also decide whether the voice was spoken by the customer or by the clerk by identifying the language of the input voice. Any method may be used to determine whether the acquired voice is the customer's.

If the voice is determined to be the customer's (YES in step S2), the customer utterance process described below is performed (step S3). If the voice is determined not to be the customer's, that is, to be the clerk's (NO in step S2), the clerk utterance process described below is performed (step S4).
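Steps S2 to S4 amount to a small dispatch on which voice input start button was pressed last. The sketch assumes the two-button terminal of the example above; the button identifiers, handler names and the `dispatch` function are hypothetical, and the language-identification alternative would simply replace `speaker_for_button`.

```python
from typing import Callable, Dict

CUSTOMER, CLERK = "customer", "clerk"

# Hypothetical button identifiers -> speaker
BUTTON_TO_SPEAKER: Dict[str, str] = {
    "customer_start": CUSTOMER,
    "clerk_start": CLERK,
}

def speaker_for_button(last_button: str) -> str:
    """Step S2: voice acquired after a button press belongs to that button's speaker."""
    return BUTTON_TO_SPEAKER[last_button]

def dispatch(last_button: str, pcm: bytes,
             on_customer: Callable[[bytes], str],
             on_clerk: Callable[[bytes], str]) -> str:
    if speaker_for_button(last_button) == CUSTOMER:
        return on_customer(pcm)  # step S3: customer utterance process
    return on_clerk(pcm)         # step S4: clerk utterance process
```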

FIG. 6 is a flowchart explaining the customer utterance process executed by the translation terminal 100 and the translation server 200. The customer utterance process of FIG. 6 starts when the response control unit 103 of the translation terminal 100 determines that the voice is the customer's.

First, the communication unit 104 of the translation terminal 100 transmits customer utterance information to the translation server 200 (step S11). The customer utterance information includes the PCM voice signal of the customer's utterance, an identifier indicating that the utterance is the customer's, the language type of the customer's utterance, and the language type of the clerk's utterance.
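The customer utterance information sent in step S11 is a small record of four items. The sketch below serializes it as JSON with the PCM bytes Base64-encoded. The wire format, class name, key names and language codes are assumptions, since the source does not say how the terminal and the server encode their messages.

```python
import base64
import json
from dataclasses import dataclass

@dataclass
class CustomerUtteranceInfo:
    pcm: bytes          # PCM voice signal of the customer's utterance
    speaker: str        # identifier showing this is the customer's utterance
    customer_lang: str  # language type of the customer's utterance
    clerk_lang: str     # language type of the clerk's utterance

    def to_json(self) -> str:
        return json.dumps({
            "pcm": base64.b64encode(self.pcm).decode("ascii"),  # bytes are not JSON-native
            "speaker": self.speaker,
            "customer_lang": self.customer_lang,
            "clerk_lang": self.clerk_lang,
        })

    @classmethod
    def from_json(cls, payload: str) -> "CustomerUtteranceInfo":
        d = json.loads(payload)
        return cls(base64.b64decode(d["pcm"]), d["speaker"],
                   d["customer_lang"], d["clerk_lang"])

msg = CustomerUtteranceInfo(b"\x00\x01\xff\x7f", "customer", "en", "ja")
assert CustomerUtteranceInfo.from_json(msg.to_json()) == msg  # terminal (S11) -> server (S12)
```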

Next, the communication unit 201 of the translation server 200 receives the customer utterance information transmitted by the translation terminal 100 (step S12).

Next, the voice recognition unit 203 of the translation server 200 performs speech recognition on the received PCM voice signal of the customer's utterance, using the language type of the customer's utterance, and generates the dialogue text of the customer's utterance (step S13).

Next, the translation unit 204 of the translation server 200 uses the generated dialogue text of the customer's utterance, the language type of the customer's utterance, and the language type of the clerk's utterance to translate the dialogue text from the customer's language into the clerk's language, generating the dialogue translation text of the customer's utterance (step S14).

Next, the voice waveform synthesis unit 208 of the translation server 200 synthesizes the generated dialogue translation text of the customer's utterance into speech, using the language type of the clerk's utterance, and generates the translated synthetic voice of the customer's utterance (step S15).

Next, the communication unit 201 of the translation server 200 transmits customer utterance translation information to the translation terminal 100 (step S16). This information includes the dialogue translation text of the customer's utterance, the translated synthetic voice of the customer's utterance, an identifier indicating that the utterance is the customer's, and the language type of the clerk's utterance.

Next, the communication unit 104 of the translation terminal 100 receives the customer utterance translation information transmitted by the translation server 200 (step S17).

Next, the response control unit 103 of the translation terminal 100 stores the received dialogue translation text of the customer's utterance and the language type of the clerk's utterance in the response holding unit 106 (step S18).

Next, the voice output unit 105 of the translation terminal 100 outputs the received translated synthetic voice of the customer's utterance (step S19). While this voice is being output, the video output unit 111 may display a character on the display screen and control the display so that the character appears to be speaking.

FIG. 7 and FIG. 8 are, respectively, the first and second flowcharts explaining the clerk utterance process executed by the translation terminal 100 and the translation server 200. The clerk utterance process of FIG. 7 starts when the response control unit 103 of the translation terminal 100 determines that the voice is the clerk's.

First, the response control unit 103 of the translation terminal 100 acquires the dialogue translation text of the customer's utterance held in the response holding unit 106 (step S21).

Next, the communication unit 104 of the translation terminal 100 transmits clerk utterance information to the translation server 200 (step S22). The clerk utterance information includes the PCM voice signal of the clerk's utterance, an identifier indicating that the utterance is the clerk's, the language type of the clerk's utterance, the language type of the customer's utterance, and the dialogue translation text of the customer's utterance acquired from the response holding unit 106.
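Steps S18, S21 and S22 together mean that the terminal keeps the most recent customer translation and attaches it to the next clerk request, so the server can interpret the clerk's answer in the context of the customer's question. A minimal sketch, assuming an in-memory holder and a plain dictionary as the message; the class name, key names and the empty-string fallback when no customer turn has been held yet are all assumptions.

```python
from typing import Optional

class ResponseHolder:
    """Stands in for the response holding unit 106 (in-memory, hypothetical)."""
    def __init__(self) -> None:
        self.customer_translation: Optional[str] = None
        self.clerk_lang: Optional[str] = None

    def store(self, translation_text: str, clerk_lang: str) -> None:  # step S18
        self.customer_translation = translation_text
        self.clerk_lang = clerk_lang

def build_clerk_request(holder: ResponseHolder, pcm: bytes,
                        clerk_lang: str, customer_lang: str) -> dict:
    held = holder.customer_translation  # step S21
    return {                            # step S22 payload
        "pcm": pcm,
        "speaker": "clerk",
        "clerk_lang": clerk_lang,
        "customer_lang": customer_lang,
        "customer_translation": held if held is not None else "",
    }
```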

Next, the communication unit 201 of the translation server 200 receives the clerk utterance information transmitted by the translation terminal 100 (step S23).

Next, the voice recognition unit 203 of the translation server 200 performs speech recognition on the received PCM voice signal of the clerk's utterance, using the language type of the clerk's utterance, and generates the dialogue text of the clerk's utterance (step S24).

Next, the translation unit 204 of the translation server 200 uses the generated dialogue text of the clerk's utterance, the language type of the clerk's utterance, and the language type of the customer's utterance to translate the dialogue text from the clerk's language into the customer's language, generating the dialogue translation text of the clerk's utterance (step S25).

Next, the voice waveform synthesis unit 208 of the translation server 200 synthesizes the generated dialogue translation text of the clerk's utterance into speech, using the language type of the customer's utterance, and generates the translated synthetic voice of the clerk's utterance (step S26).

Next, the intention understanding unit 205 of the translation server 200 interprets the intentions of the customer's and the clerk's utterances using the dialogue translation text of the customer's utterance and the dialogue text of the clerk's utterance (step S27). That is, the intention understanding unit 205 extracts an entity from these two texts and obtains from them an intention ID that classifies the utterance as either asking how to get to the entity or asking for information about the entity. Entities may be extracted by a statistical method such as machine learning or by grammatical rules. Likewise, the classification may use a statistical method such as machine learning, or may estimate the intention from expressions in the dialogue translation text and the dialogue text according to predetermined rules.
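Of the options listed for step S27, the rule-based one is the easiest to illustrate. The sketch below uses a hypothetical entity dictionary for extraction and keyword rules for classifying the utterance into the intention IDs of FIG. 4 ("0001" = how to get to the entity, "0002" = information about the entity). The entities and keywords are invented, and for readability they are English, although in this flow both input texts would be in the clerk's language. A statistical model could replace either function.

```python
from typing import Optional, Tuple

KNOWN_ENTITIES = ["station", "museum", "tower"]  # hypothetical entity dictionary

# Hypothetical keyword rules, checked in order
INTENT_RULES = [
    ("0001", ("how do i get", "where is", "way to", "turn left", "turn right")),
    ("0002", ("what is", "tell me about", "opening hours", "history")),
]

def extract_entity(text: str) -> Optional[str]:
    lowered = text.lower()
    for entity in KNOWN_ENTITIES:
        if entity in lowered:
            return entity
    return None

def classify_intent(text: str) -> Optional[str]:
    lowered = text.lower()
    for intent_id, keywords in INTENT_RULES:
        if any(k in lowered for k in keywords):
            return intent_id
    return None

def understand(customer_translation: str, clerk_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Step S27: consider both utterances together; return (entity, intention ID)."""
    combined = customer_translation + " " + clerk_text
    return extract_entity(combined), classify_intent(combined)
```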

Next, the intention understanding unit 205 of the translation server 200 determines whether auxiliary information exists, using the acquired entity and intention ID (step S28). It does so by searching the auxiliary information storage unit 207 with the entity and the intention ID: if auxiliary information corresponding to them is present in the auxiliary information storage unit 207, it determines that auxiliary information exists; otherwise, it determines that no auxiliary information exists.
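Step S28 is essentially a keyed lookup. A minimal sketch, assuming that the auxiliary information storage unit 207 can be indexed by the pair (entity, intention ID); the dictionary contents and the function name are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical index over storage unit 207: (entity, intention ID) -> explanatory text ID
AUX_INDEX: Dict[Tuple[str, str], str] = {
    ("station", "0001"): "E001",
    ("station", "0002"): "E002",
}

def auxiliary_info_exists(entity: Optional[str], intent_id: Optional[str]) -> bool:
    """Step S28: YES only if a record matches both the entity and the intention ID."""
    if entity is None or intent_id is None:
        return False  # nothing was extracted or classified in step S27
    return (entity, intent_id) in AUX_INDEX
```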

If it is determined that no auxiliary information exists (NO in step S28), the process proceeds to step S32.

If, on the other hand, auxiliary information is determined to exist (YES in step S28), the control unit 202 of the translation server 200 acquires the explanatory text and the content information of the auxiliary information (step S29). The control unit 202 obtains the explanatory text ID corresponding to the acquired entity and intention ID from the auxiliary information storage unit 207, and then obtains the explanatory text corresponding to that ID from the auxiliary explanatory text storage unit 206. The control unit 202 also obtains the content information corresponding to the acquired entity and intention ID from the auxiliary information storage unit 207.

The explanatory text and the content information do not necessarily both exist: if there is no content information, the control unit 202 may acquire only the explanatory text, and if there is no explanatory text, it may acquire only the content information.
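Step S29 chains two lookups (the FIG. 4 table, then the FIG. 3 table), and either the explanatory text or the content information may be missing. The sketch below returns `None` for whichever part is absent. The table contents, IDs, addresses, field names and the function name `fetch_auxiliary` are hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Tuple

class AuxRow(NamedTuple):
    description_id: Optional[str]  # explanatory text ID 404
    content_info: Optional[str]    # image storage address

# Hypothetical contents of storage units 207 and 206
AUX_TABLE: Dict[Tuple[str, str], AuxRow] = {
    ("station", "0001"): AuxRow("E001", "https://example.com/map.png"),
    ("museum", "0002"): AuxRow(None, "https://example.com/museum.png"),
    ("tower", "0002"): AuxRow("E002", None),
}
EXPLANATIONS: Dict[str, str] = {
    "E001": "This map shows the way to the station.",
    "E002": "Here is some information about the tower.",
}

def fetch_auxiliary(entity: str, intent_id: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (explanatory text, content information); either may be None."""
    row = AUX_TABLE[(entity, intent_id)]  # step S28 has already confirmed the key exists
    text = EXPLANATIONS.get(row.description_id) if row.description_id else None
    return text, row.content_info
```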

Next, the translation unit 204 of the translation server 200 uses the explanatory text of the auxiliary information, the language type of that explanatory text, and the language type of the customer's utterance to translate the explanatory text into the customer's language, generating the explanatory translation text of the auxiliary information (step S30).

Next, the voice analysis-synthesis unit 209 of the translation server 200 synthesizes the generated explanatory translation text of the auxiliary information into speech, using the language type of the customer's utterance, and generates the translated synthetic voice of the auxiliary information (step S31).

Next, the communication unit 201 of the translation server 200 transmits clerk utterance translation information to the translation terminal 100 (step S32). This information includes the dialogue translation text of the clerk's utterance, the translated synthetic voice of the clerk's utterance, an identifier indicating that the utterance is the clerk's, the explanatory translation text, the translated synthetic voice of the auxiliary information, the language type of the customer's utterance, and the content information. If it was determined in step S28 that no auxiliary information exists, the explanatory translation text, the translated synthetic voice of the auxiliary information, and the content information are not transmitted; in that case the clerk utterance translation information contains only the dialogue translation text of the clerk's utterance, the translated synthetic voice of the clerk's utterance, the identifier indicating that the utterance is the clerk's, and the language type of the customer's utterance.
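The two forms of clerk utterance translation information in step S32 differ only in whether the auxiliary fields are attached. The sketch below omits any auxiliary part that is absent, which covers both the "no auxiliary information" case and the partial case from step S29. The function name and key names are assumptions.

```python
from typing import Any, Dict, Optional

def build_clerk_translation_info(translation_text: str, translation_voice: bytes,
                                 customer_lang: str,
                                 aux_text: Optional[str] = None,
                                 aux_voice: Optional[bytes] = None,
                                 content_info: Optional[str] = None) -> Dict[str, Any]:
    info: Dict[str, Any] = {
        "translation_text": translation_text,    # dialogue translation text of the clerk's utterance
        "translation_voice": translation_voice,  # translated synthetic voice of the clerk's utterance
        "speaker": "clerk",
        "customer_lang": customer_lang,
    }
    # Attach only the auxiliary parts that exist (none when step S28 was NO)
    optional = {"aux_text": aux_text, "aux_voice": aux_voice, "content_info": content_info}
    info.update({k: v for k, v in optional.items() if v is not None})
    return info
```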

Next, the communication unit 104 of the translation terminal 100 receives the clerk utterance translation information transmitted by the translation server 200 (step S33).

Next, the voice output unit 105 of the translation terminal 100 outputs the translated synthetic voice of the clerk's utterance contained in the clerk utterance translation information received by the communication unit 104 (step S34). While this voice is being output, the video output unit 111 may display a character on the display screen and control the display so that the character appears to be speaking.

Next, the auxiliary information presence/absence determination unit 108 of the translation terminal 100 determines whether the clerk utterance translation information received by the communication unit 104 includes auxiliary information, that is, the explanatory translation text, the translated synthetic voice of the auxiliary information, and the content information (step S35). If it is determined that no auxiliary information is included (NO in step S35), the process ends.

If, on the other hand, it is determined that auxiliary information is included (YES in step S35), the auxiliary information presentation determination unit 109 of the translation terminal 100 determines, from the setting information of the auxiliary information presentation setting unit 107, whether to present the auxiliary information (step S36). Whether auxiliary information is to be presented is preset in this setting information. If it is determined that the auxiliary information is not to be presented (NO in step S36), the process ends.

If it is determined that the auxiliary information is to be presented (YES in step S36), the auxiliary information presentation determination unit 109 of the translation terminal 100 indicates to at least one of the customer and the clerk that auxiliary information exists (step S37). Because they are told that auxiliary information exists, the clerk or the customer can decide whether to have it presented. If the clerk does not need the auxiliary information, the clerk can simply continue the dialogue with the customer without having it presented; if the clerk does need it, the clerk can pause the dialogue with the customer and have the auxiliary information presented.
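The terminal-side decision in steps S35 to S37 can be sketched as a small pure function. This is an illustrative Python sketch, not the terminal's actual implementation: the names `Outcome` and `decide_indication` are hypothetical, and treating auxiliary information as present when any one auxiliary field arrived is an assumption (the server described above sends all three fields or none).

```python
from enum import Enum, auto
from typing import Optional


class Outcome(Enum):
    END_NO_AUXILIARY = auto()    # NO in step S35
    END_DISABLED = auto()        # NO in step S36
    INDICATE_EXISTENCE = auto()  # proceed to step S37


def decide_indication(explanation_text: Optional[str],
                      auxiliary_voice: Optional[bytes],
                      content_info: Optional[str],
                      presentation_enabled: bool) -> Outcome:
    # Step S35 (assumption): auxiliary information is present if any auxiliary field arrived.
    has_auxiliary = any(x is not None for x in (explanation_text, auxiliary_voice, content_info))
    if not has_auxiliary:
        return Outcome.END_NO_AUXILIARY
    # Step S36: preset flag held by the auxiliary information presentation setting unit 107.
    if not presentation_enabled:
        return Outcome.END_DISABLED
    # Step S37: indicate to the customer and/or clerk that auxiliary information exists.
    return Outcome.INDICATE_EXISTENCE
```

Note that step S37 only signals existence; the auxiliary information itself is presented later, and only on request.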

FIG. 9 shows an example of the screen displayed on the translation terminal when the existence of auxiliary information is indicated. For example, the auxiliary information presentation determination unit 109 indicates that auxiliary information exists by displaying, on the screen 11 of the translation terminal 100, a mark 12 that signals its existence.

Alternatively, the auxiliary information presentation determination unit 109 may indicate that auxiliary information exists by outputting a notification sound.

Next, the auxiliary information presentation determination unit 109 of the translation terminal 100 determines whether the user instruction input unit 112 has received a user instruction to present the auxiliary information (step S38). If it is determined that there is no user instruction (NO in step S38), the process ends. The user instruction input unit 112 may accept the user instruction when the user touches the mark 12, displayed on the screen 11 of the translation terminal 100, that indicates the existence of auxiliary information. If the mark 12 is not touched within a predetermined time after it is displayed, the auxiliary information presentation determination unit 109 may erase the mark 12.

The auxiliary information presentation determination unit 109 may also indicate that auxiliary information exists by turning on or blinking an LED provided on the translation terminal 100. In this case, the user instruction input unit 112 may accept the user instruction when a button provided on the translation terminal 100 is pressed. If the button is not pressed within a predetermined time after the LED is turned on or starts blinking, the auxiliary information presentation determination unit 109 may turn the LED off.
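The timeout behavior described for the mark 12 and the LED can be sketched as a small state object. This Python sketch assumes a hypothetical `ExistenceIndicator` class; the timeout value and the injectable clock (used so the behavior can be tested without waiting) are illustrative choices, not details from the source.

```python
import time
from typing import Callable, Optional


class ExistenceIndicator:
    """Hypothetical model of mark 12 (or the LED) shown in step S37.

    The indicator is cleared if no touch or button press arrives within
    `timeout_s` seconds of being shown.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.shown_at: Optional[float] = None  # None means mark erased / LED off

    def show(self) -> None:
        self.shown_at = self.clock()

    def is_visible(self) -> bool:
        if self.shown_at is None:
            return False
        if self.clock() - self.shown_at >= self.timeout_s:
            self.shown_at = None  # predetermined time elapsed: erase mark / turn LED off
            return False
        return True

    def on_user_instruction(self) -> bool:
        """Step S38: YES only if the touch or button press arrives while the indicator is shown."""
        if self.is_visible():
            self.shown_at = None  # indicator consumed by the request
            return True
        return False
```

For example, with a 5-second timeout, a touch 3 seconds after `show()` returns `True`, while a touch 6 seconds after `show()` returns `False` because the mark has already been cleared.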

If it is determined that there is a user instruction (YES in step S38), the auxiliary information presentation unit 110 of the translation terminal 100 outputs the translated synthetic voice of the auxiliary information via the voice output unit 105 (step S39).

Next, the auxiliary information presentation unit 110 of the translation terminal 100 displays the content information of the auxiliary information via the video output unit 111 (step S40). The auxiliary information presentation unit 110 may instead output only the translated synthetic voice of the auxiliary information without displaying the content information, or display only the content information without outputting the translated synthetic voice.
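The three presentation variants (voice and content, voice only, content only) can be captured as a mode switch. In this hypothetical Python sketch, `PresentationMode` and `present_auxiliary` are illustrative names, and the returned action strings are placeholders for calls to the voice output unit 105 and video output unit 111, not a real device API.

```python
from enum import Enum
from typing import List, Optional


class PresentationMode(Enum):
    VOICE_AND_CONTENT = "both"  # steps S39 and S40
    VOICE_ONLY = "voice"        # skip step S40
    CONTENT_ONLY = "content"    # skip step S39


def present_auxiliary(mode: PresentationMode,
                      auxiliary_voice: Optional[bytes],
                      content_info: Optional[str]) -> List[str]:
    """Return, in order, the output actions the terminal would take."""
    actions: List[str] = []
    if mode in (PresentationMode.VOICE_AND_CONTENT, PresentationMode.VOICE_ONLY) and auxiliary_voice:
        actions.append("play_auxiliary_voice")   # step S39 via voice output unit 105
    if mode in (PresentationMode.VOICE_AND_CONTENT, PresentationMode.CONTENT_ONLY) and content_info:
        actions.append("display_content_info")   # step S40 via video output unit 111
    return actions
```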

In the information presentation system of this embodiment, the intention understanding process in step S27 and the determination in step S28 of whether auxiliary information exists are performed in the language of the clerk's utterances, but they may instead be performed in the language of the customer's utterances. In that case, steps S27 and S28 use the dialogue text of the customer's utterance instead of its dialogue translation text, and use the dialogue translation text of the clerk's utterance instead of its dialogue text.
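The choice of working language for steps S27 and S28 amounts to picking which version of each utterance to feed them. This hypothetical Python helper (`texts_for_intent_understanding` is an illustrative name) returns a (customer, clerk) text pair that is always in a single language.

```python
from typing import Tuple


def texts_for_intent_understanding(pivot: str,
                                   customer_text: str, customer_translation: str,
                                   clerk_text: str, clerk_translation: str) -> Tuple[str, str]:
    """Pick the (customer, clerk) texts used by steps S27 and S28.

    pivot == "clerk":    both texts in the clerk's language (the embodiment as described).
    pivot == "customer": both texts in the customer's language (the variant).
    """
    if pivot == "clerk":
        # The customer's utterance is used in its translated form; the clerk's as spoken.
        return customer_translation, clerk_text
    if pivot == "customer":
        # The customer's utterance is used as spoken; the clerk's in its translated form.
        return customer_text, clerk_translation
    raise ValueError(f"unknown pivot: {pivot!r}")
```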

The information presentation system of this embodiment has been described on the assumption that the customer and the clerk speak different languages, but it can also be used when they speak the same language. In that case, steps S14, S15, and S19 in FIG. 6, steps S25 and S26 in FIG. 7, and step S34 in FIG. 8 may be omitted.
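A same-language bypass can be sketched as follows. This is a hedged Python illustration: it assumes that the omitted steps are the translation, synthesis, and playback stages, and `process_utterance` together with its injected `translate` and `synthesize` callables are hypothetical stand-ins, not the system's actual interfaces.

```python
from typing import Callable, Optional, Tuple


def process_utterance(text: str, src_lang: str, dst_lang: str,
                      translate: Callable[[str, str, str], str],
                      synthesize: Callable[[str, str], bytes]) -> Tuple[str, Optional[bytes]]:
    """Return (text for downstream processing, voice to play or None)."""
    if src_lang == dst_lang:
        # Same-language dialogue: skip translation, synthesis, and playback (assumption).
        return text, None
    translated = translate(text, src_lang, dst_lang)
    return translated, synthesize(translated, dst_lang)
```

Injecting the translation and synthesis functions keeps the bypass logic testable without a real translation engine.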

The translation server 200 may also include an utterance count storage unit that stores identification information identifying the clerk in association with the number of the clerk's utterances. When it is determined that auxiliary information exists, the control unit 202 of the translation server 200 may refer to the stored utterance count and determine whether the number of the clerk's utterances exceeds a predetermined number. If it does, the control unit 202 may determine that presenting the auxiliary information is unnecessary and refrain from indicating that auxiliary information exists.
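The utterance count storage unit can be sketched as a per-clerk counter with a threshold check. In this Python sketch, the class name `UtteranceCountStore` and the threshold value are assumptions for illustration; the source specifies neither.

```python
from collections import Counter


class UtteranceCountStore:
    """Hypothetical utterance count storage unit keyed by clerk ID."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # the "predetermined number" (value is an assumption)
        self.counts: Counter = Counter()

    def record_utterance(self, clerk_id: str) -> None:
        self.counts[clerk_id] += 1

    def should_indicate(self, clerk_id: str) -> bool:
        """Indicate auxiliary information only while the count is at or below the threshold."""
        return self.counts[clerk_id] <= self.threshold
```

A `Counter` returns 0 for clerks it has not seen, so a new clerk is always shown the indicator.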

A concrete dialogue between a customer and a clerk is described here. Suppose a customer says to a Japanese-speaking clerk in English, "I would like to send a package to the United States." The information presentation system translates this utterance into Japanese and outputs it as voice. The clerk then replies in Japanese, "We have an international courier service," and the system translates the reply into English and outputs it as voice. At this point, the system extracts the phrase "international courier service" from the clerk's dialogue text as an entity (explanation target) and acquires auxiliary information about the international courier service. The system then indicates to the clerk that auxiliary information exists. On receiving an instruction from the clerk to present it, the system outputs the auxiliary information as voice: "Packages sent by international courier can be up to 160 cm in size and up to 25 kg in weight."
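The entity extraction in this example can be approximated with a dictionary lookup. This Python sketch is a simplification under stated assumptions: it matches against English text for readability, whereas the example extracts the entity from the clerk's Japanese dialogue text; `AUXILIARY_INFO` is a hypothetical stand-in for the auxiliary information storage unit 207; and plain substring matching replaces whatever method the intention understanding unit 205 actually uses.

```python
from typing import Dict, Optional, Tuple

# Hypothetical contents of auxiliary information storage unit 207, keyed by entity.
AUXILIARY_INFO: Dict[str, str] = {
    "international courier service": (
        "Packages sent by international courier can be up to 160 cm in size "
        "and up to 25 kg in weight."
    ),
}


def find_auxiliary_info(dialogue_text: str,
                        store: Dict[str, str] = AUXILIARY_INFO) -> Optional[Tuple[str, str]]:
    """Return (entity, auxiliary information) for the longest known entity in the text."""
    lowered = dialogue_text.lower()
    matches = [entity for entity in store if entity in lowered]
    if not matches:
        return None  # corresponds to "no auxiliary information exists"
    entity = max(matches, key=len)  # prefer the most specific matching phrase
    return entity, store[entity]
```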

The translation server 200 may further include a translation count storage unit that stores identification information identifying the clerk in association with the number of times the clerk's utterances have been translated into the language of the customer's utterances. When it is determined that auxiliary information exists, the control unit 202 of the translation server 200 may refer to the translation count storage unit and determine whether the number of translations into the customer's language exceeds a predetermined number. If it does, the control unit 202 determines that presenting the auxiliary information is unnecessary and need not indicate to the clerk that auxiliary information exists. The rationale is that a clerk who has used the information presentation system more than a predetermined number of times is likely to know the auxiliary information already.
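A translation count store differs from a plain utterance counter in that it is tied to the customer's language. In this hypothetical Python sketch, keying the count by (clerk ID, customer language) is an illustrative assumption, as are the class name and threshold.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class TranslationCountStore:
    """Hypothetical translation count storage unit.

    Counts, per clerk, how many clerk utterances were translated into each
    customer language; the per-language key is an illustrative assumption.
    """

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counts: DefaultDict[Tuple[str, str], int] = defaultdict(int)

    def record_translation(self, clerk_id: str, customer_language: str) -> None:
        self.counts[(clerk_id, customer_language)] += 1

    def should_indicate(self, clerk_id: str, customer_language: str) -> bool:
        # More translations than the threshold suggests the clerk already knows the information.
        return self.counts[(clerk_id, customer_language)] <= self.threshold
```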

The translation server 200 may further include a usage time storage unit that stores identification information identifying the clerk in association with the time elapsed since the clerk started using the information presentation system. When it is determined that auxiliary information exists, the control unit 202 of the translation server 200 may refer to the usage time storage unit and determine whether this usage time exceeds a predetermined time. If it does, the control unit 202 determines that presenting the auxiliary information is unnecessary and need not indicate to the clerk that auxiliary information exists. The rationale is that a clerk who has used the information presentation system for longer than the predetermined time is likely to know the auxiliary information already.
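The usage time check can be sketched by storing each clerk's first-use timestamp and comparing the elapsed time against a threshold. The class name `UsageTimeStore`, the 30-day example threshold, and the treatment of clerks with no recorded use are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


class UsageTimeStore:
    """Hypothetical usage time storage unit: first-use timestamp per clerk."""

    def __init__(self, threshold: timedelta):
        self.threshold = threshold
        self.started_at: Dict[str, datetime] = {}

    def record_use(self, clerk_id: str, now: datetime) -> None:
        # Only the first use is kept; usage time is measured from it.
        self.started_at.setdefault(clerk_id, now)

    def should_indicate(self, clerk_id: str, now: datetime) -> bool:
        start: Optional[datetime] = self.started_at.get(clerk_id)
        if start is None:
            return True  # a clerk with no recorded use is treated as new (assumption)
        return now - start <= self.threshold
```

Passing `now` explicitly, rather than reading the system clock inside the class, keeps the check deterministic and easy to test.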

The translation server 200 may further include a presentation count storage unit that stores identification information identifying the clerk in association with the number of times auxiliary information corresponding to the same explanation target (entity) has been presented. When it is determined that auxiliary information exists, the control unit 202 of the translation server 200 may determine whether the number of times auxiliary information for the same explanation target has been presented exceeds a predetermined number. If it does, the control unit 202 determines that presenting the auxiliary information is unnecessary and need not indicate to the clerk that auxiliary information exists. For example, once the auxiliary information for the explanation target "international courier service" above has been presented several times, the clerk can explain the international courier service without it.
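Counting presentations per explanation target can be sketched with a counter keyed by (clerk ID, entity). The class name `PresentationCountStore` and the threshold are hypothetical, and incrementing the count only when the auxiliary information is actually presented (steps S39/S40) is an assumption.

```python
from collections import Counter


class PresentationCountStore:
    """Hypothetical presentation count storage unit keyed by (clerk ID, entity)."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counts: Counter = Counter()

    def record_presentation(self, clerk_id: str, entity: str) -> None:
        # Assumed to be called when auxiliary information is actually presented (S39/S40).
        self.counts[(clerk_id, entity)] += 1

    def should_indicate(self, clerk_id: str, entity: str) -> bool:
        return self.counts[(clerk_id, entity)] <= self.threshold
```

Because the key includes the entity, a clerk who no longer needs help with one explanation target still receives the indicator for other targets.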

The information presentation method, information presentation program, and information presentation system according to the present disclosure do not present auxiliary information unconditionally; they present it only when a speaker needs it, which enables a smooth dialogue. They are therefore useful as an information presentation method, information presentation program, and information presentation system that translate voice in a dialogue among a plurality of speakers and present auxiliary information for assisting the dialogue.

100 Translation terminal
101 Voice input unit
102 Format conversion unit
103 Response control unit
104 Communication unit
105 Voice output unit
106 Response holding unit
107 Auxiliary information presentation setting unit
108 Auxiliary information presence/absence determination unit
109 Auxiliary information presentation determination unit
110 Auxiliary information presentation unit
111 Video output unit
112 User instruction input unit
200 Translation server
201 Communication unit
202 Control unit
203 Voice recognition unit
204 Translation unit
205 Intention understanding unit
206 Auxiliary explanatory text storage unit
207 Auxiliary information storage unit
208 Voice waveform synthesis unit
209 Voice analysis-synthesis unit

Claims

An information presentation method in an information presentation system, the method comprising:
generating dialogue text by performing voice recognition on voice of a dialogue among a plurality of speakers;
generating dialogue translation text by translating the dialogue text;
generating dialogue translation voice by performing voice synthesis on the dialogue translation text;
determining, based on the dialogue text, whether auxiliary information for assisting the dialogue exists;
when the auxiliary information exists, determining whether a usage record value of the information presentation system for at least one of the plurality of speakers is larger than a predetermined value; and
when the usage record value is determined to be larger than the predetermined value, not presenting the existence of the auxiliary information to at least one of the plurality of speakers, and when the usage record value is determined to be equal to or less than the predetermined value, presenting the existence of the auxiliary information to at least one of the plurality of speakers.

The information presentation method according to claim 1, further comprising:
receiving, from at least one of the plurality of speakers, an instruction to present the auxiliary information; and
presenting the auxiliary information in accordance with the received presentation instruction.

The information presentation method according to claim 1 or 2, wherein:
the dialogue translation voice is generated by speech waveform synthesis of the dialogue translation text;
explanatory text explaining the auxiliary information is specified from the dialogue text; and
auxiliary information voice is generated by speech analysis-synthesis of the explanatory text.

The information presentation method according to claim 3, wherein
the fundamental frequency of the auxiliary information voice differs from the fundamental frequency of the dialogue translation voice.

The information presentation method according to any one of claims 1 to 4, comprising:
acquiring first voice of a first speaker who asks a question in a first language, and acquiring second voice of a second speaker who answers the question in a second language different from the first language;
generating first dialogue text by performing voice recognition on the first voice;
generating first dialogue translation text by translating the first dialogue text into the second language;
generating first dialogue translation voice by performing voice synthesis on the first dialogue translation text;
generating second dialogue text by performing voice recognition on the second voice;
generating second dialogue translation text by translating the second dialogue text into the first language;
generating second dialogue translation voice by performing voice synthesis on the second dialogue translation text;
determining whether the auxiliary information exists based on at least one of the first dialogue text and the second dialogue text;
when the auxiliary information exists, determining whether the usage record value of the information presentation system for the second speaker is larger than the predetermined value; and
when the usage record value is determined to be larger than the predetermined value, not presenting the existence of the auxiliary information to at least one of the first speaker and the second speaker, and when the usage record value is determined to be equal to or less than the predetermined value, presenting the existence of the auxiliary information to at least one of the first speaker and the second speaker.

The information presentation method according to claim 5, wherein:
the usage record value includes the number of translations into the first language;
when the auxiliary information exists, it is determined whether the number of translations into the first language is greater than a predetermined number; and
when the number of translations into the first language is determined to be greater than the predetermined number, the existence of the auxiliary information is not presented to the second speaker.

The information presentation method according to claim 5, wherein:
the usage record value includes the usage time from when the second speaker started using the information presentation system to the present;
when the auxiliary information exists, it is determined whether the usage time is longer than a predetermined time; and
when the usage time is determined to be longer than the predetermined time, the existence of the auxiliary information is not presented to the second speaker.

The information presentation method according to claim 5, wherein:
the auxiliary information includes information explaining an explanation target;
the usage record value includes the number of times the auxiliary information corresponding to the same explanation target has been presented;
when the auxiliary information exists, it is determined whether the number of presentations is greater than a predetermined number; and
when the number of presentations is determined to be greater than the predetermined number, the existence of the auxiliary information is not presented to the second speaker.

An information presentation program causing a computer to function as:
a dialogue text generation unit that generates dialogue text by performing voice recognition on voice of a dialogue among a plurality of speakers;
a dialogue translation text generation unit that generates dialogue translation text by translating the dialogue text;
a dialogue translation voice generation unit that generates dialogue translation voice by performing voice synthesis on the dialogue translation text;
an auxiliary information determination unit that determines, based on the dialogue text, whether auxiliary information for assisting the dialogue exists;
a usage record value determination unit that, when the auxiliary information is determined to exist, determines whether a usage record value, for at least one of the plurality of speakers, of an information presentation system including the computer as a component is larger than a predetermined value; and
a transmission unit that, when the usage record value is determined to be larger than the predetermined value, transmits the dialogue translation voice without transmitting the auxiliary information so that the existence of the auxiliary information is not presented to at least one of the plurality of speakers, and, when the usage record value is determined to be equal to or less than the predetermined value, transmits the auxiliary information and the dialogue translation voice so that the existence of the auxiliary information is presented to at least one of the plurality of speakers.

An information presentation system comprising:
a terminal; and
a server communicatively connected to the terminal, wherein
the terminal includes:
a voice acquisition unit that acquires voice of a dialogue among a plurality of speakers; and
a transmission unit that transmits the acquired voice to the server,
the server includes:
a reception unit that receives the voice;
a dialogue text generation unit that generates dialogue text by performing voice recognition on the voice of the dialogue;
a dialogue translation text generation unit that generates dialogue translation text by translating the dialogue text;
a dialogue translation voice generation unit that generates dialogue translation voice by performing voice synthesis on the dialogue translation text;
an auxiliary information determination unit that determines, based on the dialogue text, whether auxiliary information for assisting the dialogue exists;
a usage record value determination unit that, when the auxiliary information is determined to exist, determines whether a usage record value of the information presentation system for at least one of the plurality of speakers is larger than a predetermined value; and
a transmission unit that, when the usage record value is determined to be larger than the predetermined value, transmits the dialogue translation voice to the terminal without transmitting the auxiliary information, and, when the usage record value is determined to be equal to or less than the predetermined value, transmits the auxiliary information and the dialogue translation voice to the terminal, and
the terminal further includes:
a reception unit that receives the auxiliary information and the dialogue translation voice;
a voice output unit that outputs the dialogue translation voice; and
a presentation unit that presents, to at least one of the plurality of speakers, that the auxiliary information exists.