JP2015127758A

JP2015127758A - Response control device and control program

Info

Publication number: JP2015127758A
Application number: JP2013273284A
Authority: JP
Inventors: Masanori Ogino (荻野 正徳); Akira Motomura (本村 暁)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2013-12-27
Filing date: 2013-12-27
Publication date: 2015-07-09
Also published as: WO2015098306A1

Abstract

PROBLEM TO BE SOLVED: To output, in a short time, a response to a user's utterance that includes important information. SOLUTION: A terminal (1) includes a response selection unit (142). The unit selects, as the response phrase, the candidate phrase containing the most important information from among a plurality of candidate phrases. These candidates are acquired from first and second response generation units (13, 22), which generate candidate phrases for the speech in parallel.

Description

The present invention relates to a response control device and the like that respond to a user's voice.

Conventionally, robots and speech processing systems that automatically carry out conversation and similar processing have become widespread. For example, Patent Document 1 discloses such a technique. A user's request is forwarded to a specific server. When information not held in the local storage system is requested, the server searches the information space on the Internet and sends the search result back to the robot.

Patent Document 1: JP 2003-111981 A (published April 15, 2003)
Patent Document 2: JP 2006-106761 A (published April 20, 2006)
Patent Document 3: JP 2009-265219 A (published November 12, 2009)

However, the conventional techniques described above share a problem: the time a user waits between speaking and receiving a response is likely to be long. In the robot and terminal described above, the Internet search and the server-side processing run only after the search of the local storage area and the terminal-side processing have finished. The time from when the robot or terminal acquires the user's utterance until it outputs a response is therefore likely to be long.

To shorten this waiting time, the terminal and the server could each execute speech processing in parallel. In that case, however, a question remains: should the terminal's result or the server's result be output to the user? The conventional techniques described above neither disclose nor suggest any means of answering this question.

The present invention has been made in view of the above problem. Its object is to realize a response control device and the like that make an appropriate response to a user's utterance. The response is based on a plurality of candidates obtained by executing a plurality of speech processes on the utterance in parallel.

To solve the above problem, a response control device according to one aspect of the present invention controls a response to voice. The device comprises:
- candidate phrase acquisition means for acquiring a plurality of candidate phrases, each generated from the voice by one of a plurality of response generation means; and
- selection means for selecting, as a response phrase, the candidate phrase whose information has the highest importance among the candidate phrases acquired by the candidate phrase acquisition means.

A control method according to one aspect of the present invention controls a response control device that controls a response to voice. The method includes:
- a candidate phrase acquisition step of acquiring a plurality of candidate phrases, each generated from the voice by one of a plurality of response generation means; and
- a selection step of selecting, as a response phrase, the candidate phrase whose information has the highest importance among the candidate phrases acquired in the candidate phrase acquisition step.
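The two claimed steps can be sketched in Python:
- acquire one candidate from each response generator;
- keep the candidate whose information importance is highest.

This is a minimal illustration under stated assumptions, not the patented implementation. The function names are hypothetical. Passing importance in as a callable is also an assumption, because this passage does not yet say how importance is measured.

```python
from typing import Callable, Iterable, List


def acquire_candidates(generators: Iterable[Callable[[str], str]], voice_text: str) -> List[str]:
    """Candidate phrase acquisition step: one candidate phrase per response generator."""
    return [generate(voice_text) for generate in generators]


def select_response(candidates: List[str], importance: Callable[[str], float]) -> str:
    """Selection step: return the candidate whose information importance is highest.

    max() keeps the first maximal element, so ties go to the earlier generator
    (a tie-breaking rule assumed here; the source does not specify one).
    """
    if not candidates:
        raise ValueError("no candidate phrases to select from")
    return max(candidates, key=importance)


generators = [
    lambda q: "It's sunny.",
    lambda q: "It's sunny. The maximum temperature is XX degrees.",
]
candidates = acquire_candidates(generators, "What's the weather today?")
# Placeholder importance measure (character count), used only to exercise the code.
print(select_response(candidates, importance=len))
# It's sunny. The maximum temperature is XX degrees.
```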

According to one aspect of the present invention, an appropriate response to the voice can be made. The response is based on the candidate phrases generated for the voice by each of the plurality of response generation means.

FIG. 1 is a block diagram showing the main configuration of a voice response system that includes a response control device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an overview of the voice response system of FIG. 1.
FIG. 3 is a diagram showing an example of the reference phrase table stored in the response control device and the voice processing server of FIG. 1.
FIG. 4 is a diagram showing an example of the additional phrase table stored in the response control device and the voice processing server of FIG. 1.
FIG. 5 is a sequence diagram showing an overview of the speech processing performed by the response control device and the voice processing server of FIG. 1.
FIG. 6 is a sequence diagram showing the flow of the response generation processing performed by the response control device and the voice processing server of FIG. 1.
FIG. 7 is a sequence diagram showing the flow of the response selection processing performed by the response control device of FIG. 1.
FIG. 8 is a diagram showing an example of the additional phrase table stored in a response control device and a voice processing server according to another embodiment of the present invention.
FIG. 9 is a sequence diagram showing the flow of the response generation processing performed by a response control device and a voice processing server according to another embodiment of the present invention.
FIG. 10 is a block diagram showing the main configuration of a voice response system that includes a response control device according to yet another embodiment of the present invention.
FIG. 11 is a diagram showing an example of the additional phrase table stored in the response control device and the voice processing server of FIG. 10.
FIG. 12 is a sequence diagram showing the flow of the response generation processing performed by the response control device and the voice processing server of FIG. 10.
FIG. 13 is a sequence diagram showing the flow of the response selection processing performed by the response control device of FIG. 10.

Embodiment 1
An embodiment of the present invention is described below with reference to FIGS. 1 to 7. In this example, the response control device according to one aspect of the invention is realized as a mobile terminal 1 (hereinafter "terminal 1").

First, an overview of a voice response system 100 that includes the terminal 1 is given with reference to FIG. 2, which shows that overview. As illustrated, the voice response system 100 of this embodiment includes the terminal 1 and a voice processing server 2 (hereinafter "server 2"), which can communicate with each other.

The terminal 1 itself performs response generation processing, that is, processing that generates response candidate phrases (hereinafter "candidate phrases") for the user's voice. It also has the server 2 execute response generation processing in parallel with its own. Conventional speech processing runs the server-side processing only after the terminal-side processing has finished. Compared with that approach, the terminal 1 shortens the time the user waits between speaking and receiving a response.

Hereinafter, a candidate phrase generated by the terminal 1 is called "candidate phrase (A)", and one generated by the server 2 is called "candidate phrase (B)". The terminal 1 acquires both candidate phrases. It then selects the one with the higher information importance (response level) as the selected response phrase to be output (hereinafter "selected phrase") and outputs it.

For example, suppose the user says to the terminal 1, "What's the weather today?" The terminal 1 executes response generation processing for this utterance itself and also asks the server 2 to do so. The terminal 1 obtains first external information from the external information providing server 98, and the server 2 obtains second external information from the external information providing server 99. Each uses this information in its own response generation processing. Because the terminal 1 and the server 2 may differ in information search capability, vocabulary, and so on, their results may differ.

For example, the terminal 1 may acquire two items from the external information providing server 98 as first external information: the weather information "today's weather is sunny" and the maximum temperature information "the maximum temperature is XX degrees". From these it generates candidate phrase (A): "It's sunny. The maximum temperature is XX degrees."

The server 2 may acquire three items from the external information providing server 99 as second external information:
- the weather information "today's weather is sunny";
- the weather cause information "sunny because a high-pressure system covers the area";
- the maximum temperature information "the maximum temperature is XX degrees".
From these it generates candidate phrase (B): "It's sunny. That's because it's covered by high pressure. The maximum temperature is XX degrees." The server 2 then notifies the terminal 1 of candidate phrase (B).

The terminal 1 compares candidate phrases (A) and (B) and selects the one with the higher information importance as the selected phrase. In FIG. 2, candidate phrase (B) contains the weather cause information in addition to the weather and maximum temperature information found in candidate phrase (A). The terminal 1 therefore selects (B) and speaks it: "It's sunny. That's because it's covered by high pressure. The maximum temperature is XX degrees."
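The parallel flow of FIG. 2 can be sketched in Python. The terminal-side and server-side generators run concurrently rather than one after the other, and the more informative candidate is then kept.

This is a sketch under assumptions, not the patented implementation:
- `terminal_generate` and `server_generate` are hypothetical stubs returning the phrases from the example.
- The `sleep` only imitates network and server latency.
- Importance is approximated by the number of sentences, which reproduces the FIG. 2 outcome but is not a rule stated in the source.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple

Candidate = Tuple[str, ...]  # a candidate phrase as an ordered tuple of sentences


def terminal_generate(call_phrase: str) -> Candidate:
    # Hypothetical stand-in for the first response generation unit 13 (terminal 1).
    return ("It's sunny.", "The maximum temperature is XX degrees.")


def server_generate(call_phrase: str) -> Candidate:
    # Hypothetical stand-in for the second response generation unit 22 (server 2);
    # the sleep only imitates network and server latency.
    time.sleep(0.05)
    return ("It's sunny.",
            "That's because it's covered by high pressure.",
            "The maximum temperature is XX degrees.")


def respond(call_phrase: str) -> str:
    # Submit both generators at once so the user waits for the slower one,
    # not for the sum of both.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(terminal_generate, call_phrase),
                   pool.submit(server_generate, call_phrase)]
        candidates = [f.result() for f in futures]
    # Assumed importance measure: number of sentences (pieces of information).
    selected = max(candidates, key=len)
    return " ".join(selected)


print(respond("What's the weather today?"))
# It's sunny. That's because it's covered by high pressure. The maximum temperature is XX degrees.
```

Waiting for both futures before selecting matches the description above, in which the terminal compares (A) and (B) only after both have arrived.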

The overview of the terminal 1 given above can be summarized as follows. The terminal 1 is a response control device that controls a response to voice. It includes:
- a candidate acquisition unit 141 (candidate phrase acquisition means), which acquires the candidate phrases generated from the voice by the first response generation unit 13 and the second response generation unit 22 (a plurality of response generation means); and
- a response selection unit 142 (selection means), which selects, as the selected phrase (response phrase), the candidate phrase with the highest response level (information importance) among the candidates acquired by the candidate acquisition unit 141.

The terminal 1 can therefore make an appropriate response based on the candidate phrases that the response generation means generate for the voice. Concretely, it picks the candidate with the highest information importance from those generated in parallel by the first response generation unit 13 and the second response generation unit 22, and outputs it. Running the response generation processes in parallel shortens the time from acquiring an utterance to outputting a response; the terminal 1 then chooses one response to output from their results.

In this embodiment, each candidate phrase consists of one or more reference phrases and zero or more additional phrases. The response selection unit 142 judges a candidate phrase that contains an additional phrase to have a higher response level than one that does not. The terminal 1 therefore chooses the phrase to output from the candidates generated in parallel by the first response generation unit 13 and the second response generation unit 22 according to whether they contain additional phrases. As a result, the terminal 1 can output not only the reference phrase, which directly answers the user's utterance, but also additional phrases, which supplement that answer.
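The structure of a candidate phrase and the response-level rule can be sketched with a small data class. A candidate holds one or more reference phrases and zero or more additional phrases.

The source states only that a candidate with additional phrases outranks one without. A pure presence/absence test would rank candidates (A) and (B) of FIG. 2 equally, since both contain additional phrases. This sketch therefore counts additional phrases as its response level. That counting is an assumption, chosen because it satisfies the stated rule and also reproduces the FIG. 2 choice. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CandidatePhrase:
    reference: List[str]                                  # one or more reference phrases
    additional: List[str] = field(default_factory=list)   # zero or more additional phrases

    def __post_init__(self) -> None:
        if not self.reference:
            raise ValueError("a candidate phrase needs at least one reference phrase")

    def response_level(self) -> int:
        # Assumed measure: the number of additional phrases. Any candidate with
        # additional phrases therefore outranks a reference-only candidate.
        return len(self.additional)

    def text(self) -> str:
        return " ".join(self.reference + self.additional)


def select_phrase(candidates: List[CandidatePhrase]) -> CandidatePhrase:
    # Response selection: highest response level wins; ties keep the first candidate.
    return max(candidates, key=CandidatePhrase.response_level)


a = CandidatePhrase(["It's sunny."], ["The maximum temperature is XX degrees."])
b = CandidatePhrase(["It's sunny."], ["That's because it's covered by high pressure.",
                                      "The maximum temperature is XX degrees."])
print(select_phrase([a, b]).text())
# It's sunny. That's because it's covered by high pressure. The maximum temperature is XX degrees.
```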

The terminal 1 includes the first response generation unit 13 (response generation means) and executes response generation processing itself. It can therefore use information that the server 2 cannot obtain, such as the current position of the user carrying the terminal 1.

Note, however, that the terminal 1 need not include the first response generation unit 13 in order to run several response generation processes in parallel and select one response from their results, thereby shortening the time from acquiring the call voice to responding. The terminal 1 has an external device with response generation means other than the first response generation unit 13 (for example, the server 2 with the second response generation unit 22) execute response processing in parallel with its own. It then acquires the candidate phrase generated by that external device. Details are described later.

(Terminology)
"Speech processing" executed by the voice response system 100 comprises speech recognition processing, response generation processing, and speech synthesis processing.
- "Speech recognition processing" converts the user's call voice data, acquired by the microphone 17, into a call phrase, that is, the corresponding character data. It may be the same as any known speech recognition processing that converts voice data into character data.
- "Response generation processing" generates a candidate phrase, that is, character data responding to the call phrase.
- "Speech synthesis processing" generates voice data corresponding to a candidate phrase given as character data. It may be the same as any known speech synthesis processing that converts character data into voice data. The voice data it generates is output from the speaker 192.
- In addition to the three processes above, the terminal 1 executes "response selection processing", described in detail later. It selects, as the selected phrase, the candidate phrase whose information has the highest importance among the candidates produced by several response generation processes executed in parallel.

A "call phrase" is the character data that the speech recognition unit 12 produces by running speech recognition processing on a call voice acquired by the microphone 17. A response that the first response generation unit 13 or the second response generation unit 22 generates for the call phrase is called a "candidate phrase". A candidate phrase contains a "reference phrase", which is a direct answer to the call phrase. It may also carry one or more "additional phrases", which contain supplementary answers or information related to the call phrase. A given call phrase may have more than one reference phrase, more than one additional phrase, or both. A candidate phrase is therefore either a reference phrase alone or a reference phrase combined with one or more additional phrases.

For example, take the call phrase "What's the weather today?". Suppose the following can be selected:
- the reference phrase "It's sunny.";
- the additional phrase (A-1) "That's because it's covered by high pressure.";
- the additional phrase (A-2) "The maximum temperature will be XX degrees.".

Four candidate phrases are then possible:
1. The reference phrase alone: "It's sunny."
2. The reference phrase plus (A-1): "It's sunny. That's because it's covered by high pressure."
3. The reference phrase plus (A-2): "It's sunny. The maximum temperature will be XX degrees."
4. The reference phrase plus (A-1) and (A-2): "It's sunny. That's because it's covered by high pressure. The maximum temperature is XX degrees."

The position of an additional phrase relative to the reference phrase is not restricted. An additional phrase may come after the reference phrase or before it, and a candidate phrase may even place the reference phrase between two or more additional phrases. The order of two or more additional phrases is not restricted either. For example, both of the following are acceptable:
- "It's sunny. That's because it's covered by high pressure. The maximum temperature is XX degrees."
- "It's sunny. The maximum temperature is XX degrees. That's because it's covered by high pressure."
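The enumeration in the weather example can be reproduced in Python. A candidate is the reference phrase plus any subset of the additional phrases, so n additional phrases in a fixed order yield 2^n candidates, which gives four for (A-1) and (A-2).

This is a sketch with a hypothetical function name. It appends additional phrases after the reference in their given order only. The alternative placements and orderings that the text also permits are deliberately left out.

```python
from itertools import combinations
from typing import List


def candidate_phrases(reference: str, additional: List[str]) -> List[str]:
    """All 'reference only' and 'reference + one or more additional' combinations.

    Additional phrases are appended after the reference in their given order;
    other placements and orders allowed by the description are omitted here.
    """
    results = []
    for k in range(len(additional) + 1):            # k = number of additional phrases used
        for subset in combinations(additional, k):  # subsets keep the input order
            results.append(" ".join((reference,) + subset))
    return results


for phrase in candidate_phrases("It's sunny.",
                                ["That's because it's covered by high pressure.",
                                 "The maximum temperature will be XX degrees."]):
    print(phrase)
# It's sunny.
# It's sunny. That's because it's covered by high pressure.
# It's sunny. The maximum temperature will be XX degrees.
# It's sunny. That's because it's covered by high pressure. The maximum temperature will be XX degrees.
```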

(Main configuration of the terminal) FIG. 1 is a block diagram showing the main configuration of the terminal 1 and the server 2. As illustrated, the terminal 1 includes a first control unit 10, a microphone 17, a first storage unit 18, and an output unit 19.
The microphone 17 converts sound, such as voice, into an electrical signal and passes it to the speech recognition unit 12.
The output unit 19 includes a display unit 191 and a speaker 192. The display unit 191 displays, as an image, the selected phrase it receives as character data from the selection result output unit 143. The speaker 192 outputs, as sound, the voice data it receives from the speech synthesis unit 15.
The first storage unit 18 stores the various data used by the terminal 1:
(1) a control program executed by the first control unit 10 of the terminal 1;
(2) an OS program;
(3) application programs for executing various functions;
(4) the various data read when these application programs run.
Items (1) to (4) are stored in a nonvolatile storage device such as a ROM (read-only memory), flash memory, EPROM (erasable programmable ROM), EEPROM (registered trademark) (electrically erasable programmable ROM), or HDD (hard disk drive). The first storage unit 18 also stores a first reference phrase table 181 and a first additional phrase table 182.
The first control unit 10 centrally controls the functions of the terminal 1, including speech recognition processing, response generation processing, response selection processing, and speech synthesis processing. It includes a first communication unit 11, a speech recognition unit 12, a first response generation unit 13, a response control unit 14, a speech synthesis unit 15, and a first external information acquisition unit 16.
The first communication unit 11 communicates with the server 2 and other devices. More specifically, it performs three roles:
(1) It receives from the speech recognition unit 12 the call phrase, which is the result of speech recognition on the call voice acquired by the microphone 17. Along with it comes a request to generate candidate phrases for that call phrase (a response generation request). The unit transmits both to the server 2.
(2) It receives from the server 2 candidate phrase (B), the result of the response generation processing of the second response generation unit 22, and passes it to the candidate acquisition unit 141.
(3) The first response generation unit 13 may need first external information, that is, information not held by the terminal 1, to execute response generation processing. In that case the unit obtains this information from the external information providing server 98 or a similar source and passes it to the first external information acquisition unit 16.
The speech recognition unit 12 executes speech recognition processing. It first converts the call voice data received from the microphone 17 into a call phrase, which is character data. It then passes the call phrase and a response generation request to the first communication unit 11 and the first response generation unit 13. The speech recognition unit 12 may use any known speech recognition technique that converts voice data into character data. Because such processing can be performed with conventional techniques, its details are omitted.
The first response generation unit 13 executes response generation processing: it generates candidate phrase (A) for the call phrase received as character data from the speech recognition unit 12. It may use the first external information received from the first external information acquisition unit 16 to do so. Details are described later.
The response control unit 14 includes a candidate acquisition unit 141, a response selection unit 142, and a selection result output unit 143.
- The candidate acquisition unit 141 acquires candidate phrase (A) from the first response generation unit 13. It also acquires candidate phrase (B), generated by the second response generation unit 22, from the first communication unit 11. It passes both candidates to the response selection unit 142.
- The response selection unit 142 executes response selection processing. From candidate phrases (A) and (B), it selects the one whose information has the higher importance (response level) as the selected phrase to be output. Details are described later. It passes the selected phrase to the selection result output unit 143.
- The selection result output unit 143 passes the selected phrase to the speech synthesis unit 15 and the display unit 191.
The speech synthesis unit 15 executes speech synthesis processing. It converts the selected phrase, received as character data from the selection result output unit 143, into voice data and has the speaker 192 output it. It may use any known speech synthesis technique that converts character data into voice data. Because such processing can be performed with conventional techniques, its details are omitted.
The first external information acquisition unit 16 acquires first external information, such as information not held by the terminal 1, from the external information providing server 98. It passes this information to the first response generation unit 13, and it may do so in response to a request from that unit.
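The division of the response control unit 14 into three parts can be sketched as a small Python class:
- candidate acquisition (141) collects (A) and (B) as they arrive;
- response selection (142) picks one;
- selection result output (143) fans the result out to both the speech synthesis unit and the display.

This is a hypothetical sketch; the method names are assumptions. The `importance` callable is a stand-in for the response-level rule, and the sinks stand in for the speech synthesis unit 15 and the display unit 191.

```python
from typing import Callable, Dict, List


class ResponseControlUnit:
    """Hypothetical sketch of response control unit 14 and its sub-units 141 to 143."""

    def __init__(self, importance: Callable[[str], int],
                 sinks: List[Callable[[str], None]]) -> None:
        self.importance = importance        # stand-in for the response-level rule
        self.sinks = sinks                  # e.g. speech synthesis unit 15, display unit 191
        self.candidates: Dict[str, str] = {}

    def acquire(self, source: str, phrase: str) -> None:
        # Candidate acquisition unit 141: (A) from unit 13, (B) via communication unit 11.
        self.candidates[source] = phrase

    def select(self) -> str:
        # Response selection unit 142: the candidate with the highest importance wins.
        return max(self.candidates.values(), key=self.importance)

    def output(self) -> str:
        # Selection result output unit 143: pass the selected phrase to every sink.
        selected = self.select()
        for sink in self.sinks:
            sink(selected)
        return selected


spoken: List[str] = []
shown: List[str] = []
# Placeholder importance: sentence count approximated by the number of periods.
unit = ResponseControlUnit(importance=lambda p: p.count("."),
                           sinks=[spoken.append, shown.append])
unit.acquire("A", "It's sunny. The maximum temperature is XX degrees.")
unit.acquire("B", "It's sunny. That's because it's covered by high pressure. "
                  "The maximum temperature is XX degrees.")
print(unit.output())
# It's sunny. That's because it's covered by high pressure. The maximum temperature is XX degrees.
```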

(Main part configuration of server) The server 2 includes a second control unit 20 and a second storage unit 24. The second storage unit 24 stores a second reference phrase table 241 and a second additional phrase table 242, details of which will be described later. The second control unit 20 includes a second communication unit 21, a second response generation unit 22, and a second external information acquisition unit 23.
The second communication unit 21 (1) receives from the terminal 1 the call phrase, which is the result of the voice recognition process by the voice recognition unit 12, together with a request for the response generation process, and notifies the second response generation unit 22 of the call phrase and the request; (2) acquires from the second response generation unit 22 the candidate phrase (B), which is the result of the response generation process, and transmits the candidate phrase (B) to the terminal 1; and (3) when the second response generation unit 22 needs second external information, that is, information other than the information held by the server 2, in order to execute the response generation process, acquires that second external information from the external information providing server 99 or the like and notifies the second external information acquisition unit 23 of it.
The second response generation unit 22 executes the response generation process. That is, it generates the candidate phrase (B) for the call phrase notified from the second communication unit 21. The second response generation unit 22 may generate the candidate phrase (B) using the second external information notified from the second external information acquisition unit 23. Details will be described later.
The second external information acquisition unit 23 acquires, from the external information providing server 99, second external information, which is information other than the information held by the server 2, and notifies the second response generation unit 22 of it. The second external information acquisition unit 23 may acquire the second external information in response to a request from the second response generation unit 22.

(Information Stored in Storage Unit) FIG. 3 is a diagram showing an example of the first reference phrase table 181 and the second reference phrase table 241 stored in the terminal 1 and the server 2, respectively. FIG. 4 is a diagram showing an example of the first additional phrase table 182 and the second additional phrase table 242 stored in the terminal 1 and the server 2, respectively.
Hereinafter, when there is no need to distinguish between the first reference phrase table 181 and the second reference phrase table 241, both are collectively referred to as the “reference phrase table”. Similarly, the first additional phrase table 182 and the second additional phrase table 242 are collectively referred to as the “additional phrase table”.
In the reference phrase table of FIG. 3, call phrases are associated with reference phrases. A “call ID” identifying each call phrase is associated with that call phrase, and a “reference ID” identifying each reference phrase is associated with that reference phrase. In the additional phrase table of FIG. 4, reference IDs are associated with additional phrases, and an “addition ID” identifying each additional phrase is associated with that additional phrase. Each additional phrase also has an “additional condition”, which is the condition for adding it to the reference phrase. When an additional phrase satisfies its additional condition, the first response generation unit 13 and the second response generation unit 22 add that additional phrase to the reference phrase. When the first response generation unit 13 and the second response generation unit 22 perform the response generation process, the contents of the reference phrases in the reference phrase table and of the additional phrases in the additional phrase table are determined in advance. However, the contents of the additional phrases need not be determined in advance. For example, in the additional phrase table of FIG. 4, the additional phrase with addition ID = “3” has the additional condition “weather cause information successfully acquired”, and its content is “according to the acquired weather cause information”. This means that when information about the cause of the weather (sunny, cloudy, rain, etc.) can be acquired from the external information providing server 98/99 (for example, a weather information server), that information is used as the additional phrase. The weather can have many different causes, but there is no need to store all of them in the additional phrase table in advance.
Instead, the weather cause information acquired from the weather information server, for example, may be used as the additional phrase when the candidate phrase is generated. The same applies to the additional phrases with addition IDs “4” and “5”: when information such as the maximum temperature or the probability of precipitation is acquired from a weather information server or the like, the “XX” placeholder in the additional phrase is replaced with the acquired value. The terminal 1 and the server 2 each store a reference phrase table and an additional phrase table, and the contents of the tables stored in the terminal 1 and in the server 2 may be identical or different. Even if the contents are identical, the first response generation unit 13 and the second response generation unit 22 do not always generate the same candidate phrase for a given call phrase. That is, even if the first reference phrase table 181 and the second reference phrase table 241 have the same contents, and the first additional phrase table 182 and the second additional phrase table 242 have the same contents, the following situation can occur: the server 2 acquires information about the maximum temperature and can generate the additional phrase with addition ID = “4”, whereas the terminal 1 cannot acquire that information and therefore cannot generate the additional phrase with addition ID = “4”. Furthermore, the contents of the reference phrase table and the additional phrase table stored in the terminal 1 and in the server 2 may differ, for example, as follows.
For example, the second reference phrase table 241 and the second additional phrase table 242 of the server 2 may specify, as additional conditions and the like, conditions whose evaluation requires acquiring and analyzing various information on the Internet. On the other hand, the first reference phrase table 181 and the first additional phrase table 182 of the terminal 1 may specify, as additional conditions and the like, conditions that only the terminal 1 can evaluate, such as today's date and the current position of the user carrying the terminal 1 (acquired by, for example, the GPS (Global Positioning System) of the terminal 1).
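The two tables can be pictured as plain record lists. The following is a minimal Python sketch, assuming that an additional condition is a predicate over a dictionary of gathered facts and that the “XX” slot is filled from one named fact. The class names, field names, context keys (`emotion`, `weather_cause`, `max_temp`), and English phrase texts are hypothetical; they mirror only the examples discussed above, not the full contents of FIG. 3 and FIG. 4.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional

# Hypothetical context: facts gathered for one utterance (weather, emotion, date, ...).
Context = Mapping[str, object]


@dataclass(frozen=True)
class ReferencePhrase:
    call_id: str
    call_phrase: str
    reference_id: str
    text: str


@dataclass(frozen=True)
class AdditionalPhrase:
    addition_id: str
    reference_id: str                     # the "related reference ID"
    condition: Callable[[Context], bool]  # the "additional condition"
    template: Optional[str]               # None: the acquired information itself is the phrase
    slot_key: Optional[str] = None        # context key whose value fills the "XX" slot

    def render(self, ctx: Context) -> str:
        if self.template is None:
            return str(ctx[self.slot_key])
        if self.slot_key is None:
            return self.template
        return self.template.replace("XX", str(ctx[self.slot_key]))


REFERENCE_TABLE: List[ReferencePhrase] = [
    ReferencePhrase("1", "Good morning.", "1-1", "Good morning."),
    ReferencePhrase("2", "What's the weather today?", "2-1", "It's sunny."),
    ReferencePhrase("2", "What's the weather today?", "2-2", "It's cloudy."),
]

ADDITIONAL_TABLE: List[AdditionalPhrase] = [
    AdditionalPhrase("1", "1-1", lambda c: c.get("emotion") == "happy",
                     "I hope something good happens today too."),
    AdditionalPhrase("3", "2-1", lambda c: "weather_cause" in c, None, "weather_cause"),
    AdditionalPhrase("4", "2-1", lambda c: "max_temp" in c,
                     "The high will be XX degrees.", "max_temp"),
]


def references_for(call_id: str) -> List[ReferencePhrase]:
    return [r for r in REFERENCE_TABLE if r.call_id == call_id]


def additions_for(reference_id: str) -> List[AdditionalPhrase]:
    return [a for a in ADDITIONAL_TABLE if a.reference_id == reference_id]
```

For example, `additions_for("2-1")` yields addition IDs 3 and 4, and with the context `{"max_temp": 25}` only addition 4's condition holds, rendering "The high will be 25 degrees.". Giving the terminal and the server separate instances of the same structure lets their rows be identical or different, as the text allows.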

(Outline of Voice Processing) FIG. 5 is a sequence diagram outlining the processing performed by the terminal 1 and the server 2. The basic flow of the voice processing executed by the terminal 1 is as follows. When the microphone 17 of the terminal 1 acquires the user's call voice (S101), the microphone 17 converts the call voice into voice data and notifies the voice recognition unit 12 of the voice data. The voice recognition unit 12 performs the voice recognition process on the voice data (S102) to obtain a call phrase, and notifies the first response generation unit 13 and the first communication unit 11 of the obtained call phrase together with a request for the response generation process (S103).
When notified of the call phrase and the request for the response generation process by the voice recognition unit 12, the first response generation unit 13 performs the response generation process (S104) and notifies the candidate acquisition unit 141 of the generated candidate phrase (A). Meanwhile, the first communication unit 11 transmits the call phrase and the request for the response generation process notified from the voice recognition unit 12 to the server 2. The second communication unit 21 of the server 2 notifies the second response generation unit 22 of the call phrase and the request received from the terminal 1. When so notified, the second response generation unit 22 performs the response generation process (S104') and notifies the second communication unit 21 of the generated candidate phrase (B), which the second communication unit 21 transmits to the terminal 1. The first communication unit 11 of the terminal 1 notifies the candidate acquisition unit 141 of the candidate phrase (B) received from the server 2. The candidate acquisition unit 141 thus acquires the results of the response generation processes of both the terminal 1 and the server 2, that is, the candidate phrase (A) from the first response generation unit 13 and the candidate phrase (B) from the first communication unit 11 (S105), and notifies the response selection unit 142 of the candidate phrases (A) and (B). The response selection unit 142 executes the response selection process, that is, selects either the candidate phrase (A) or (B) as the selected phrase (S106), and notifies the selection result output unit 143 of the selected phrase.
The selection result output unit 143 notifies the display unit 191 and the voice synthesis unit 15 of the selected phrase notified from the response selection unit 142. The voice synthesis unit 15 performs the voice synthesis process on the selected phrase and outputs the response to the user as voice (S107). Details of the response generation process and the response selection process are described next.
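The parallel part of this sequence (S104 and S104' running at the same time, S105 collecting both results, S106 choosing one) can be sketched with Python's `asyncio`. This is an illustration under assumptions, not the device's implementation: `local_generate` and `remote_generate` are hypothetical stand-ins whose sleeps merely simulate latency, a candidate is modeled as a `(phrase, response level)` tuple, and, like the flow described above, the sketch waits for both candidates without any timeout handling.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

Candidate = Tuple[str, int]  # (candidate phrase, response level)
Generator = Callable[[str], Awaitable[Candidate]]


async def respond(call_phrase: str, local: Generator, remote: Generator,
                  select: Callable[[Candidate, Candidate], Candidate]) -> Candidate:
    # S104 / S104': start terminal-side and server-side generation concurrently.
    # S105: gather() returns once both candidate phrases (A) and (B) are available.
    cand_a, cand_b = await asyncio.gather(local(call_phrase), remote(call_phrase))
    # S106: the selection step decides which candidate is output.
    return select(cand_a, cand_b)


# Hypothetical stand-ins for the first and second response generation units.
async def local_generate(call_phrase: str) -> Candidate:
    await asyncio.sleep(0.01)  # fast, but without external weather data
    return ("It's sunny.", 0)


async def remote_generate(call_phrase: str) -> Candidate:
    await asyncio.sleep(0.05)  # simulated network round trip to the server
    return ("It's sunny. The high will be 25 degrees.", 1)


def select_higher(a: Candidate, b: Candidate) -> Candidate:
    # Higher response level wins; ties go to candidate (A) in this sketch.
    return a if a[1] >= b[1] else b
```

Calling `asyncio.run(respond("What's the weather today?", local_generate, remote_generate, select_higher))` returns the server's richer candidate, and the total wait is roughly the slower of the two generators rather than their sum.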

(Response Generation Processing) FIG. 6 is a diagram showing the flow of the response generation process executed by the first response generation unit 13 and the second response generation unit 22. Hereinafter, when there is no need to distinguish between the first response generation unit 13 and the second response generation unit 22, both are collectively referred to as the “response generation unit”. As shown in FIG. 6, when notified of a call phrase, the response generation unit first refers to the reference phrase table and selects a reference phrase corresponding to the call phrase (S201). When a plurality of reference phrases correspond to the call phrase, the response generation unit selects the reference phrase that matches the applicable condition. For example, in the reference phrase table illustrated in FIG. 3, the only reference phrase corresponding to the call phrase “Good morning.” with call ID = “1” is “Good morning.” with reference ID = “1-1”. Therefore, for the call phrase with call ID = “1”, the response generation unit refers to the reference phrase table and selects the reference phrase with reference ID = “1-1”. On the other hand, three reference phrases, with reference IDs “2-1”, “2-2”, and “2-3”, are associated in the reference phrase table with the call phrase “What's the weather today?” with call ID = “2”.
That is, for the call phrase with call ID = “2”, the response generation unit can choose among these three reference phrases by referring to the reference phrase table, and it selects the one that matches the condition. Specifically, it acquires weather information from an external information providing server 98/99 such as a weather information server; if “today's weather” is “sunny”, it selects “It's sunny.” with reference ID = “2-1”, and if “today's weather” is “cloudy”, it selects “It's cloudy.” with reference ID = “2-2”. If the weather information cannot be acquired, it selects “I don't know.” with reference ID = “2-3”.
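Condition-based selection of a reference phrase (S201) for the weather question can be sketched as a lookup keyed by the acquired weather. This is a minimal Python sketch under stated assumptions: `fetch_weather` is a hypothetical callable standing in for the query to the information providing server 98/99, a network failure (`OSError`) is treated as "could not acquire", and a weather value with no matching row (for example "rain") is assumed to fall back to "I don't know." as well.

```python
from typing import Callable, Optional

# Hypothetical encoding of the call-ID "2" rows: acquired weather -> reference phrase.
WEATHER_REFERENCES = {
    "sunny": "It's sunny.",    # reference ID 2-1
    "cloudy": "It's cloudy.",  # reference ID 2-2
}
FALLBACK_PHRASE = "I don't know."  # chosen when no usable weather information exists


def select_reference(fetch_weather: Callable[[], Optional[str]]) -> str:
    """S201 for the weather question: pick the reference phrase whose condition matches."""
    try:
        weather = fetch_weather()  # stand-in for a query to server 98/99
    except OSError:
        weather = None             # a network failure counts as "could not acquire"
    # None (not acquired) and unlisted values both fall back (assumption for unlisted ones).
    return WEATHER_REFERENCES.get(weather, FALLBACK_PHRASE)
```

Passing the fetch as a callable keeps the selection logic testable without a real weather server.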
Next, the response generation unit refers to the additional phrase table illustrated in FIG. 4 and selects an addition ID associated with (related to) the reference ID selected in S201 (S202). For example, in the reference phrase table, the reference phrase with reference ID = “1-1” is associated with the call phrase with call ID = “1”, and in the additional phrase table, the additional phrases whose related reference ID is “1-1” have addition IDs “1” and “2”. The response generation unit therefore first checks whether the additional condition of the additional phrase with addition ID = “1” is satisfied (S203). If it confirms that the condition is satisfied (Yes in S203), it adds the additional phrase with addition ID = “1” to the reference phrase with reference ID = “1-1” (S204). For example, when at least one of the terminal 1 and the server 2 can obtain information about the user's emotion (happy, sad, etc.) from the user's call voice acquired by the microphone 17, the response generation unit checks whether the additional condition of addition ID = “1”, “the user is happy”, is satisfied. If the user's call voice confirms that the user is happy, the response generation unit adds the additional phrase with addition ID = “1”, “I hope something good happens today too.”, to the reference phrase with reference ID = “1-1”. Since estimating a user's emotion from the user's voice is possible with conventional techniques, its description is omitted. If it cannot confirm that the additional condition of addition ID = “1” is satisfied (No in S203), the response generation unit proceeds to S205 without adding that additional phrase to the reference phrase with reference ID = “1-1”.
Next, the response generation unit refers to the additional phrase table and checks whether any other addition ID corresponds to (is related to) the reference ID (S205). That is, it checks whether there remains an addition ID whose “related reference ID” is the reference ID selected in S201 and whose additional condition has not yet been checked. If such an addition ID remains (No in S205), the process returns to S202, where an unchecked addition ID is selected, and the processing from S203 onward is repeated. For example, in the additional phrase table illustrated in FIG. 4, the addition IDs whose related reference ID is “2-1” are “3”, “4”, and “5”. If reference ID = “2-1” was selected in S201, the additional condition of addition ID = “3” has already been checked, and those of addition IDs “4” and “5” have not, the response generation unit next checks whether the additional condition of addition ID = “4” is satisfied. If no unchecked addition ID remains (Yes in S205), the process proceeds to S206. As another example, when the reference phrase with reference ID = “3-3” is selected for the call phrase with call ID = “3”, the response generation unit selects the additional phrases to add as follows. To check whether the additional condition of addition ID = “8” (related reference ID “3-3”), namely “today is Doyō no Ushi no Hi (the midsummer Day of the Ox)”, is satisfied, the response generation unit first acquires today's date. If today is not that day, the response generation unit does not execute S204.
In S205, since no other addition ID has the related reference ID “3-3”, no additional phrase is selected; that is, the result is “no additional phrase”. Because of the loop at S205, the response generation unit may execute S204 several times; in each pass it does not overwrite the additional phrase already added to the reference phrase but appends another one. For example, suppose the reference phrase with reference ID = “2-1” is selected, the additional condition of addition ID = “3” is confirmed, and then the additional condition of addition ID = “4” is also confirmed. Having generated the candidate phrase “It's sunny. That's because a high-pressure system is overhead.” upon confirming the condition of addition ID = “3”, the response generation unit appends the additional phrase “The high will be XX degrees.” to generate the candidate phrase “It's sunny. That's because a high-pressure system is overhead. The high will be XX degrees.” In the above description, the response generation unit repeats the determination in S205 until no addition ID with an unchecked additional condition remains; however, it may instead proceed to S206 without the S205 determination once the number of additional phrases whose conditions have been checked reaches a predetermined number. Likewise, it may proceed to S206 without the S205 determination once the number of additional phrases added to the reference phrase reaches a predetermined number.
After fixing the candidate phrase generated by adding the additional phrases of S204 to the reference phrase selected in S201 as the candidate phrase to be notified to the candidate acquisition unit 141 or the second communication unit 21, the response generation unit assigns a response level to it (S206). That is, the response level is “1” if the candidate phrase includes an additional phrase and “0” if it does not. For example, if the reference phrase is “It's sunny.” and the related additional phrase that satisfies its additional condition is “The high will be XX degrees.”, the response generation unit generates the candidate phrase “It's sunny. The high will be XX degrees.” and sets its response level to “1”. On the other hand, if the reference phrase is “It's sunny.” and no related additional phrase satisfies its additional condition, the response generation unit generates the candidate phrase “It's sunny.” and sets its response level to “0”.
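The loop from S202 to S206 can be sketched as follows. This is a minimal Python illustration, assuming that additions are visited in ascending addition ID order, that each addition is a tuple of (addition ID, related reference ID, condition, phrase builder), and that English phrases are joined with spaces; the optional `max_checked` and `max_added` parameters are hypothetical names for the two "predetermined number" early exits described above.

```python
from typing import Callable, List, Mapping, Optional, Tuple

Context = Mapping[str, object]
# (addition ID, related reference ID, additional condition, phrase builder)
Addition = Tuple[int, str, Callable[[Context], bool], Callable[[Context], str]]


def generate_candidate(reference_id: str, reference_phrase: str,
                       additions: List[Addition], ctx: Context,
                       max_checked: Optional[int] = None,
                       max_added: Optional[int] = None) -> Tuple[str, int]:
    """S202-S206: append every satisfied additional phrase, then assign a response level."""
    parts = [reference_phrase]
    checked = added = 0
    for _add_id, related_ref, condition, build in sorted(additions, key=lambda a: a[0]):
        if related_ref != reference_id:
            continue  # S202 only considers additions related to the chosen reference ID
        if max_checked is not None and checked >= max_checked:
            break     # optional cap on the number of conditions checked
        if max_added is not None and added >= max_added:
            break     # optional cap on the number of phrases added
        checked += 1
        if condition(ctx):            # S203
            parts.append(build(ctx))  # S204: append, never overwrite
            added += 1
    level = 1 if added else 0         # S206
    return " ".join(parts), level


# Illustrative rows (English texts and context keys are hypothetical).
ADDITIONS: List[Addition] = [
    (1, "1-1", lambda c: c.get("emotion") == "happy",
     lambda c: "I hope something good happens today too."),
    (3, "2-1", lambda c: "weather_cause" in c, lambda c: str(c["weather_cause"])),
    (4, "2-1", lambda c: "max_temp" in c,
     lambda c: f"The high will be {c['max_temp']} degrees."),
]
```

With the context `{"weather_cause": "That's because a high-pressure system is overhead.", "max_temp": 25}`, reference ID "2-1" yields the three-sentence candidate from the example above with response level 1, while an empty context yields `("It's sunny.", 0)`.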

In the above description, the response generation unit determines the additional phrases after determining the reference phrase, but the procedure of the response generation process is not limited to this. For example, the processing may be executed in the following order. First, the response generation unit refers to the reference phrase table and acquires one of the reference IDs corresponding to the call phrase. It then refers to the additional phrase table and checks the additional conditions of the addition IDs corresponding to (related to) the acquired reference ID. If it determines that an additional condition is satisfied, it fixes the reference phrase of the acquired reference ID as the reference phrase to be selected. If it cannot confirm that an additional condition is satisfied, it refers to the reference phrase table, acquires another reference ID corresponding to the call phrase, and likewise checks whether the additional conditions of the addition IDs corresponding to that reference ID are satisfied. For example, for the call phrase “What should we have for dinner tonight?” with call ID = “3”, the response generation unit first acquires reference ID = “3-1” from the reference phrase table, and then acquires addition ID = “6”, associated with reference ID = “3-1”, from the additional phrase table.
If it confirms that the additional condition of addition ID = “6”, “temperature < 10 degrees”, is satisfied, it fixes the reference phrase with reference ID = “3-1” as the reference phrase to be selected and, at the same time, the additional phrase with addition ID = “6” as the additional phrase to be selected. On the other hand, if it confirms that the additional condition of addition ID = “6” is not satisfied, it determines whether some combination of reference phrase and additional phrase other than reference ID = “3-1” with addition ID = “6” can be selected. That is, it refers to the reference phrase table, acquires reference ID = “3-2”, which follows “3-1” among the reference IDs for call ID = “3”, and, as before, refers to the additional phrase table to check whether the additional conditions of the addition IDs corresponding to reference ID = “3-2” are satisfied. Reference ID = “3-2” is acquired not only when the additional condition of addition ID = “6” is confirmed to be unsatisfied but also when it cannot be confirmed to be satisfied.
Also, in the above description, when the reference phrase table and the additional phrase table contain a plurality of selectable reference phrases and additional phrases, the response generation unit decides whether to select them in ascending order of reference ID and addition ID. For example, when the additional phrases that can be added to the reference phrase with reference ID = “1-1” have addition IDs “1” and “2”, the response generation unit first checks the additional condition of addition ID = “1” and then that of addition ID = “2”. However, ascending order is not essential: the decisions may be made in descending order or in any other order. Furthermore, although in the above description the response generation unit assigns the response level to each candidate phrase, another unit may do so instead. For example, the response selection unit 142 may analyze the candidate phrases (A) and (B) notified from the candidate acquisition unit 141, assign a response level to each of them (for example, in S211), and then select the candidate phrase with the higher assigned response level as the selected phrase.

(Response selection process) FIG. 7 is a diagram explaining the response selection process performed by the response selection unit 142. The response selection unit 142 selects, from the candidate phrase (A) generated by the terminal 1 and the candidate phrase (B) generated by the server 2, the candidate phrase with the higher response level as the selected phrase to be output to the user (S211). In other words, the response selection unit 142 selects, from the candidate phrases (A) and (B), the one containing an additional phrase as the selected phrase. Because the phrase to be output is chosen from the candidate phrases (A) and (B) according to whether an additional phrase is present, the terminal can output not only a direct response to the user's call but also an additional response. If the response levels of the candidate phrases (A) and (B) are equal, either candidate phrase may be selected. A rule such as “if (A) and (B) have the same response level, select (A)” may be fixed in advance, or conversely “select (B)”. Alternatively, the rule may be “if (A) and (B) have the same response level, select the candidate phrase acquired first”, or conversely “select the candidate phrase acquired last”.
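A possible shape for S211 together with the tie-breaking rules listed above is sketched below. The `Candidate` fields (including the acquisition time), the `select_phrase` name and the rule strings are assumptions made for illustration; in this embodiment the response level would simply reflect whether an additional phrase is present.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    source: str         # "A" = terminal 1, "B" = server 2
    text: str
    level: int          # response level
    acquired_at: float  # acquisition time (hypothetical field)


def select_phrase(candidates: List[Candidate], tie_break: str = "prefer_A") -> Candidate:
    """S211: pick the highest response level; resolve ties by a fixed rule."""
    best = max(c.level for c in candidates)
    tied = [c for c in candidates if c.level == best]
    if len(tied) == 1:
        return tied[0]
    if tie_break in ("prefer_A", "prefer_B"):
        wanted = tie_break[-1]  # "A" or "B"
        return next((c for c in tied if c.source == wanted), tied[0])
    if tie_break == "earliest":
        return min(tied, key=lambda c: c.acquired_at)
    if tie_break == "latest":
        return max(tied, key=lambda c: c.acquired_at)
    raise ValueError(f"unknown tie-break rule: {tie_break}")
```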

The processing flow of the terminal 1, a response control device that controls responses to speech, can be summarized as follows. It includes S105 (candidate phrase acquisition step), in which a plurality of candidate phrases generated from the speech by the first response generation unit 13 and the second response generation unit 22 (a plurality of response generation means) are acquired, and S106 or S211 (selection step), in which the response selection unit 142 selects, from the candidate phrases acquired in S105, the candidate phrase with the highest response level (importance of information) as the selected phrase (response phrase). The voice response system 100, which on acquiring speech outputs a response to it as voice or as a character image, uses the voice processing of both the terminal 1 and the server 2 to maximize the user's expectation for the output response phrase. Specifically, because the terminal 1 has the terminal 1 and the server 2 execute response generation in parallel, the waiting time from acquiring the call speech to responding is shorter than with conventional voice processing, in which processing on the server follows processing on the terminal. Moreover, the terminal 1 can select, from the plurality of candidate phrases, the one whose information has the highest importance as the selected phrase and output it.
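The parallel flow summarized above (S105 followed by S106/S211) might look like this in Python. `local_generator` and `server_generator` are hypothetical stand-ins for the first response generation unit 13 and the second response generation unit 22, and the sleep merely simulates network latency. For simplicity the sketch waits for every candidate and does not model timeouts.

```python
import concurrent.futures as cf
import time
from typing import Callable, List, Tuple

Candidate = Tuple[str, int]  # (candidate phrase, response level)


def local_generator(utterance: str) -> Candidate:
    """Hypothetical stand-in for the first response generation unit 13 (on the terminal)."""
    return ("It's sunny.", 0)


def server_generator(utterance: str) -> Candidate:
    """Hypothetical stand-in for the second response generation unit 22 (on the server)."""
    time.sleep(0.05)  # simulated network round trip
    return ("It's sunny. The maximum temperature will be XX degrees.", 1)


def respond(utterance: str, generators: List[Callable[[str], Candidate]]) -> str:
    # S105: run every generator in parallel and collect its candidate phrase.
    with cf.ThreadPoolExecutor(max_workers=len(generators)) as pool:
        futures = [pool.submit(gen, utterance) for gen in generators]
        candidates = [f.result() for f in futures]
    # S106/S211: keep the candidate with the highest response level
    # (on a tie, max() returns the first one in generator order).
    phrase, _level = max(candidates, key=lambda c: c[1])
    return phrase
```

Because both generators start at once, the total wait is roughly that of the slower generator rather than the sum of both, which is the latency benefit the passage describes.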

[Embodiment 2]
Another embodiment of the present invention is described below with reference to FIGS. 1, 8, and 9. Members having the same functions as those described in the preceding embodiments are given the same reference signs, and their description is omitted. An outline of the mobile terminal 1A according to this embodiment (hereinafter abbreviated as terminal 1A) is as follows. The terminal 1A stores, in the first storage unit 18, a first additional phrase table 182A in which an additional point is associated with each additional phrase. The response selection unit 142A (selection means) of the terminal 1A uses the total of the additional points set for the additional phrases contained in a candidate phrase as the response level (importance of information) of that candidate phrase. Whereas the terminal 1 assigns a response level to a candidate phrase according to whether it contains an additional phrase, the terminal 1A assigns a response level according to the additional points set for the additional phrases it contains. In other respects, the basic configuration of the terminal 1A is the same as that of the terminal 1. By selecting the phrase to be output from the plurality of candidate phrases according to the total additional points of the additional phrases in each candidate phrase, the terminal 1A can output a candidate phrase whose information is highly important. The server 2A stores, in the second storage unit 24, a second additional phrase table 242A in which an additional point is associated with each additional phrase. In other respects, the basic configuration of the server 2A is the same as that of the server 2. FIG. 1 is a block diagram showing the main configuration of the terminal 1 and the server 2; it also shows the main configuration of the terminal 1A, which has the same configuration as the terminal 1, and of the server 2A, which has the same configuration as the server 2.

Further details are described below. Where the first response generation unit 13A and the second response generation unit 22A need not be distinguished, they are collectively called the “response generation unit”. Likewise, the first additional phrase table 182A and the second additional phrase table 242A are collectively called the “additional phrase table”. An example in which the response generation unit assigns the response level is described below, but the response level may instead be assigned to a candidate phrase by the response selection unit 142A summing the additional points of the additional phrases it contains. It suffices that the response selection unit 142A takes the total of the additional points set for the additional phrases in each candidate phrase as that phrase's response level and selects the candidate phrase with the highest response level as the selected phrase; where the response level is assigned does not matter. Because the terminal 1A chooses the phrase to output according to the additional points of the additional phrases in each candidate phrase, that is, according to the importance of the information they carry, it can output the candidate phrase with the most important information from among the plurality of candidate phrases.

FIG. 8 shows an example of the first additional phrase table 182A stored in the terminal 1A and the second additional phrase table 242A stored in the server 2A. In these tables, an additional point is set for each additional phrase. The “additional point” is a value set for each additional ID and indicates the importance of the information carried by that additional phrase. In this embodiment, the response level of a candidate phrase is the total of the additional points set for the additional phrases it contains. Accordingly, a candidate phrase consisting only of a reference phrase, with no additional phrase, has a response level of “0”. Each time the response generation unit adds an additional phrase to the reference phrase, it adds that phrase's additional point to the response level of the candidate phrase containing the reference phrase. The additional point may be the same for all additional phrases or may differ from phrase to phrase. If all additional phrases have the same additional point, the response generation unit or the response selection unit 142A sets the response level of a candidate phrase according to the number of additional phrases it contains. If the additional point differs by additional ID, the response generation unit or the response selection unit 142A determines the response level by weighting each additional phrase in the candidate phrase by its additional point. For example, in the illustrated table, the additional phrase with additional ID = “8” has an additional point of “2”, larger than the additional point “1” of the additional phrase with additional ID = “7”. The additional condition for additional ID = “7” is “temperature > 30 degrees”, while that for additional ID = “8” is “today = the midsummer Day of the Ox (Doyō no Ushi no Hi)”. The condition for additional ID = “8” concerns one particular day, “today”, and is therefore more restrictive than the temperature condition of additional ID = “7”; accordingly, additional ID = “8” has a higher additional point than additional ID = “7”. In this way, additional points may be set according to how difficult the additional condition is to satisfy.
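The scoring rule of this embodiment can be sketched as a sum over a point table. The dictionary below holds only the points the text states (additional IDs 4, 5 and 7 carry 1 point, ID 8 carries 2); `ADDITIONAL_POINTS` and `response_level` are hypothetical names. With equal points for every phrase, the level reduces to the count of additional phrases.

```python
from typing import Dict, List

# Additional points stated in the text for FIG. 8: IDs 4, 5, 7 -> 1 point, ID 8 -> 2 points.
ADDITIONAL_POINTS: Dict[int, int] = {4: 1, 5: 1, 7: 1, 8: 2}


def response_level(additional_ids: List[int], points: Dict[int, int] = ADDITIONAL_POINTS) -> int:
    """Response level = total additional points of the appended additional phrases."""
    level = 0  # a candidate made of the reference phrase alone scores 0
    for add_id in additional_ids:
        level += points[add_id]  # S301: add the point of each phrase as it is appended
    return level
```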

FIG. 9 is a sequence diagram showing the flow of the response generation processing of the terminal 1A and the server 2A. It differs from the response generation processing of FIG. 6 in that the process of S301 is added between S204 and S205: in S301, the response generation unit adds the additional point of the additional phrase (additional ID) added to the reference phrase in S204 to the response level. The processing of FIG. 9 also includes S306 in place of S206 of FIG. 6: in S306, the response generation unit fixes the candidate phrase generated by adding the additional phrase in S204 to the reference phrase selected in S201 as the candidate phrase to be notified to the candidate acquisition unit 141A or the second communication unit 21. The response generation unit also fixes the total of the additional points of the additional phrases in that candidate phrase as its response level, and then notifies the candidate acquisition unit 141A or the second communication unit 21 of the fixed candidate phrase and its response level. The candidate acquisition unit 141A acquires the candidate phrase (A) generated by the first response generation unit 13A and its response level from the first response generation unit 13A, and the candidate phrase (B) generated by the second response generation unit 22A and its response level via the first communication unit 11, and notifies the response selection unit 142A of them. For each of the candidate phrases (A) and (B), the response selection unit 142A takes the total of the additional points set for the additional phrases it contains as its response level (importance) and selects the candidate phrase with the higher response level as the selected phrase. In the additional phrase table of FIG. 8, the additional phrase “The maximum temperature will be XX degrees.” with additional ID = “4” has an additional point of “1”, and the additional phrase “The probability of precipitation is XX%.” with additional ID = “5” also has an additional point of “1”. Therefore, when the first response generation unit 13A generates, for the call phrase with call ID = “2”, the candidate phrase (A) “It's sunny. The maximum temperature will be XX degrees.”, the response level of (A) is “1”. When the second response generation unit 22A generates the candidate phrase (B) “It's sunny. The maximum temperature will be XX degrees. The probability of precipitation is XX%.”, the response level of (B) is “2”. Since the response level 2 of the candidate phrase (B) is higher than the response level 1 of the candidate phrase (A), the candidate phrase (B) is selected as the phrase to be output.
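Writing $p_k$ for the additional point of additional ID $k$ (notation introduced here), the example above works out as:

```latex
\begin{aligned}
\mathrm{level}(A) &= p_4 = 1,\\
\mathrm{level}(B) &= p_4 + p_5 = 1 + 1 = 2,\\
\mathrm{level}(B) &> \mathrm{level}(A) \;\Rightarrow\; \text{select } (B).
\end{aligned}
```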

[Embodiment 3]
Another embodiment of the present invention is described below with reference to FIGS. 10 to 13. Members having the same functions as those described in the preceding embodiments are given the same reference signs, and their description is omitted. FIG. 10 is a block diagram showing the main configuration of a voice response system 300 including a mobile terminal 3 (hereinafter abbreviated as terminal 3), which is a response control device according to this embodiment. An outline of the terminal 3 is as follows. The terminal 3 stores, in the first storage unit 18, a first additional phrase table 183 in which a category is set for each additional phrase. The terminal 3 also includes a phrase adding unit 341 (phrase adding means). Consider a candidate phrase that was not selected by the response selection unit 142A (selection means) but contains a reference phrase with the same content as the reference phrase in the selected phrase (response phrase). If that candidate phrase contains an additional phrase whose category differs from the categories set for the additional phrases in the selected phrase, the phrase adding unit 341 adds that additional phrase to the selected phrase. Where the first additional phrase table 183 and the second additional phrase table 243 need not be distinguished, they are collectively called the “additional phrase table”. Because the terminal 3 can add to the selected phrase an additional phrase taken from a candidate phrase that the response selection unit 142A did not select, it can output a phrase that no single response generation process could produce, for example a phrase that neither the first response generation unit 13A nor the second response generation unit 22A could generate alone. An example in which the response generation unit assigns the response level is described below, but the response level may instead be assigned to a candidate phrase by the response selection unit 142A summing the additional points of the additional phrases it contains.

FIG. 11 shows an example of the additional phrase table stored in the terminal 3 and the voice processing server 2. As illustrated, a category is associated with each additional phrase in the table. The category indicates what kind of additional information the additional phrase concerns. For example, in the additional phrase table of FIG. 11, the category of the additional phrase “I hope something good happens today, too.” with additional ID = “1” is “emotion”, meaning that this additional phrase concerns the additional information “emotion”.

Next, the flow of the processing executed by the terminal 3 is described with reference to FIGS. 12 and 13. FIG. 12 is a sequence diagram showing the flow of the response generation processing of the terminal 3 and the voice processing server 2. In addition to generating a candidate phrase and assigning it a response level, the response generation units of the terminal 3 and the voice processing server 2 determine the categories of the candidate phrase. The response generation processing of FIG. 12 includes S406 in place of S306 of the response generation processing of FIG. 9. In S406, the response generation unit generates a candidate phrase from the reference phrase determined in S201 and the additional phrase determined in S204, and fixes it as the candidate phrase to be notified to the candidate acquisition unit 141A or the second communication unit 21. The response generation unit fixes the total of the additional points of the additional phrases in the candidate phrase as its response level, and fixes the categories of those additional phrases as the categories of the candidate phrase. For example, when “It's sunny.” is selected as the reference phrase in S201 and “The maximum temperature will be XX degrees.” is selected as the additional phrase in S204, the response generation unit generates the candidate phrase “It's sunny. The maximum temperature will be XX degrees.” In S406, it then fixes the category of this candidate phrase as “maximum temperature”, the category of the additional phrase “The maximum temperature will be XX degrees.” Likewise, when it generates the candidate phrase “It's sunny. The maximum temperature will be XX degrees. The probability of precipitation is XX%.”, it fixes the categories of that candidate phrase as “maximum temperature” and “precipitation probability”. Thus, when the first response generation unit 13A generates the candidate phrase (A) “It's sunny. Because we're under a high-pressure system. The probability of precipitation is XX%.”, the response level of (A) is “2” and its categories are “weather reason, precipitation probability”. When the second response generation unit 22A generates the candidate phrase (B) “It's sunny. The maximum temperature will be XX degrees.”, the response level of (B) is “1” and its category is “maximum temperature”. The response generation unit notifies the candidate acquisition unit 141A or the second communication unit 21 of the fixed candidate phrase together with its response level and categories.
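A minimal Python sketch of what S406 fixes for each candidate: its text, its response level and its categories. The category mapping follows FIG. 11 as described, but keying it by phrase text, the English phrase wording, the `finalize_candidate` helper, and the assumption that every additional phrase carries 1 point (consistent with the levels 2 and 1 in the example) are illustrative choices, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List

# Category per additional phrase, after FIG. 11 (keyed by phrase text for readability).
CATEGORY: Dict[str, str] = {
    "I hope something good happens today, too.": "emotion",
    "Because we're under a high-pressure system.": "weather reason",
    "The probability of precipitation is XX%.": "precipitation probability",
    "The maximum temperature will be XX degrees.": "maximum temperature",
}
# Assumption: every additional phrase here carries 1 additional point.
POINTS: Dict[str, int] = {phrase: 1 for phrase in CATEGORY}


@dataclass
class Candidate:
    text: str
    level: int
    categories: List[str]


def finalize_candidate(reference: str, additions: List[str]) -> Candidate:
    """S406: fix the candidate phrase, its response level and its categories."""
    text = " ".join([reference, *additions])
    level = sum(POINTS[a] for a in additions)      # total additional points
    categories = [CATEGORY[a] for a in additions]  # categories of the appended phrases
    return Candidate(text, level, categories)
```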

図１３は端末３の応答選択処理の流れを示すシーケンス図である。応答選択部１４２Ａは、応答レベル、つまり付加ポイントの合計値の高い候補フレーズを、選択フレーズとして選択する（Ｓ４１１）。例えば、応答レベル＝「２」である「晴れだよ。高気圧に覆われているからね。降水確率は○○％だよ。」との候補フレーズ（Ａ）と、応答レベル＝「１」である「晴れだよ。最高気温は○○度になるよ。」との候補フレーズ（Ｂ）とを取得すると、応答選択部１４２Ａは候補フレーズ（Ａ）を選択フレーズとして選択する。応答選択部１４２Ａは、候補フレーズ（Ａ）と候補フレーズ（Ｂ）とを、どちらの候補フレーズを選択フレーズとして選択したかの情報と一緒に、フレーズ追加部３４１に通知する。
フレーズ追加部３４１は、選択フレーズとして選択されなかった候補フレーズであって、選択フレーズとして選択された候補フレーズ（Ａ）に含まれる基準フレーズ（Ａ−０）と同内容の基準フレーズを含む候補フレーズがあるか確認する（Ｓ４１２）。具体的には、フレーズ追加部３４１は、先ず、選択フレーズとして選択された候補フレーズ（Ａ）に含まれる基準フレーズ（Ａ−０）を抽出する。次に、選択フレーズとして選択されなかった候補フレーズ（Ｂ）の基準フレーズ（Ｂ−０）を抽出する。そして、フレーズ追加部３４１は、基準フレーズ（Ａ−０）と基準フレーズ（Ｂ−０）とが一致するか（同内容か）を判定する。なお、基準フレーズ（Ａ−０）と基準フレーズ（Ｂ−０）とが一致するかの判定は、「一言一句同じか」という判定ではなく、該２つの基準フレーズが特定の語を含むか否かという判定でもよい。つまり、例えば、基準フレーズ（Ａ−０）と基準フレーズ（Ｂ−０）とが同じ「晴れ」という語を含む場合には、基準フレーズ（Ａ−０）と基準フレーズ（Ｂ−０）とは一致していると判定してもよい。例えば、「晴れだよ。高気圧に覆われているからね。降水確率は○○％だよ。」との候補フレーズ（Ａ）の基準フレーズ（Ａ−０）＝「晴れだよ。」と、「晴れだよ。最高気温は○○度になるよ。」との候補フレーズ（Ｂ）の基準フレーズ（Ｂ−０）＝「晴れだよ。」とが同内容であるかを判定する。基準フレーズ（Ａ−０）と同内容の基準フレーズを含む、候補フレーズ（Ａ）以外の候補フレーズがある場合（Ｓ４１２でＹｅｓ）、フレーズ追加部３４１は、選択フレーズとして選択されなかった候補フレーズであって、基準フレーズ（Ａ−０）と同内容の基準フレーズを含む候補フレーズ（Ｂ）を取得する（Ｓ４１３）。例えば、基準フレーズ（Ａ−０）＝「晴れだよ。」と、基準フレーズ（Ｂ−０）＝「晴れだよ。」とが同内容であると判定すると、フレーズ追加部３４１は、「晴れだよ。最高気温は○○度になるよ。」との候補フレーズ（Ｂ）を取得する。基準フレーズ（Ａ−０）と同内容の基準フレーズを含む、候補フレーズ（Ａ）以外の候補フレーズがない場合（Ｓ４１２でＮｏ）、フレーズ追加部３４１は新たな付加フレーズを候補フレーズ（Ａ）には追加せず、処理を終了する。例えば、選択フレーズとして選択しなかった候補フレーズに含まれる基準フレーズ（Ｂ−０）が「雨だよ。」であり、選択フレーズとして選択した候補フレーズに含まれる基準フレーズ（Ａ−０）である「晴れだよ。」である場合、フレーズ追加部３４１は、基準フレーズ（Ａ−０）と基準フレーズ（Ｂ−０）とは一致しないと判定し、処理を終了する。
フレーズ追加部３４１は、候補フレーズ（Ｂ）が、候補フレーズ（Ａ）のカテゴリとは一致しないカテゴリを含むかを判定する（Ｓ４１４）。例えば、「晴れだよ。高気圧に覆われているからね。降水確率は○○％だよ。」との候補フレーズ（Ａ）と、「晴れだよ。最高気温は○○度になるよ。」との候補フレーズ（Ｂ）とについて、フレーズ追加部３４１は、候補フレーズ（Ａ）のカテゴリ＝「天気理由、降水確率」と、候補フレーズ（Ｂ）のカテゴリ＝「最高気温」とは一致しないと判定する。候補フレーズ（Ｂ）が候補フレーズ（Ａ）のカテゴリとは一致しないカテゴリを含む場合（Ｓ４１４でＹｅｓ）、フレーズ追加部３４１は、候補フレーズ（Ｂ）に含まれる付加フレーズであって、候補フレーズ（Ａ）のカテゴリとは一致しないカテゴリに対応する付加フレーズ（Ｂ−１）を取得する（Ｓ４１５）。例えば、カテゴリ＝「天気理由、降水確率」である「晴れだよ。高気圧に覆われているからね。降水確率は○○％だよ。」との候補フレーズ（Ａ）と、カテゴリ＝「最高気温」である「晴れだよ。最高気温は○○度になるよ。」との候補フレーズ（Ｂ）とを比較し、フレーズ追加部３４１は、先ず、候補フレーズ（Ｂ）は、候補フレーズ（Ａ）と異なり、「最高気温」とのカテゴリを含むことを確認する。次に、フレーズ追加部３４１は、候補フレーズ（Ｂ）から、カテゴリ＝「最高気温」に対応する付加フレーズ＝「最高気温は○○度になるよ。」を抽出し取得する。フレーズ追加部３４１は、付加フレーズ（Ｂ−１）を候補フレーズ（Ａ）に付加する（Ｓ４１６）。例えば、フレーズ追加部３４１は、付加フレーズ＝「最高気温は○○度になるよ。」を、「晴れだよ。高気圧に覆われているからね。降水確率は○○％だよ。」との候補フレーズ（Ａ）に付加し、「晴れだよ。高気圧に覆われているからね。降水確率は○○％だよ。最高気温は○○になるよ。」とのフレーズを生成する。そして、フレーズ追加部３４１は、「晴れだよ。高気圧に覆われているからね。降水確率は○○％だよ。最高気温は○○になるよ。」とのフレーズを、選択結果出力部１４３Ａに通知する。候補フレーズ（Ｂ）が候補フレーズ（Ａ）のカテゴリとは一致しないカテゴリを含まない場合（Ｓ４１４でＮｏ）、フレーズ追加部３４１は新たな付加フレーズを候補フレーズ（Ａ）には追加せず、処理を終了する。 FIG. 13 is a sequence diagram showing the flow of response selection processing of the terminal 3. 142 A of response selection parts select a candidate phrase with a high response level, ie, the total value of an additional point, as a selection phrase (S411). For example, when the response level is “2”, “Sunny, because it is covered with high pressure. The probability of precipitation is XX%” and the response level is “1”. Upon obtaining a candidate phrase (B) that says “It's sunny. The maximum temperature will be XX degrees.”, The response selection unit 142A selects the candidate phrase (A) as the selected phrase. 142 A of response selection parts notify a candidate phrase (A) and a candidate phrase (B) to the phrase addition part 341 with the information of which candidate phrase was selected as a selection phrase.
The phrase adding unit 341 is a candidate phrase that has not been selected as the selected phrase, and includes a reference phrase having the same content as the reference phrase (A-0) included in the candidate phrase (A) selected as the selected phrase. It is confirmed whether there is any (S412). Specifically, the phrase adding unit 341 first extracts the reference phrase (A-0) included in the candidate phrase (A) selected as the selected phrase. Next, the reference phrase (B-0) of the candidate phrase (B) not selected as the selected phrase is extracted. And the phrase addition part 341 determines whether a reference | standard phrase (A-0) and a reference | standard phrase (B-0) correspond (it is the same content). Note that whether the reference phrase (A-0) and the reference phrase (B-0) match is not a determination that “the phrase is the same”, but whether the two reference phrases include a specific word. It may be determined whether or not. That is, for example, when the reference phrase (A-0) and the reference phrase (B-0) include the same word “sunny”, the reference phrase (A-0) and the reference phrase (B-0) are It may be determined that they match. For example, the standard phrase (A-0) of the candidate phrase (A) = “It's sunny. It ’s sunny. Because it ’s covered with high pressure. The probability of precipitation is XX%.” It is determined whether the reference phrase (B-0) = “It's sunny.” Of the candidate phrase (B) that says “It's sunny. The maximum temperature is XX degrees.” Is the same content. When there is a candidate phrase other than the candidate phrase (A) that includes the same phrase as the reference phrase (A-0) (Yes in S412), the phrase adding unit 341 is a candidate phrase that has not been selected as the selected phrase. Then, the candidate phrase (B) including the reference phrase having the same content as the reference phrase (A-0) is acquired (S413). 
For example, if it is determined that the reference phrase (A-0) = “sunny” and the reference phrase (B-0) = “sunny” are the same content, the phrase adding unit 341 determines that “sunny” Get the candidate phrase (B) "The maximum temperature is XX degrees." When there is no candidate phrase other than the candidate phrase (A) including the reference phrase having the same content as the reference phrase (A-0) (No in S412), the phrase adding unit 341 selects a new additional phrase as the candidate phrase (A). Is not added, and the process is terminated. For example, the reference phrase (B-0) included in the candidate phrase that is not selected as the selected phrase is “rainy,” and is the reference phrase (A-0) included in the candidate phrase selected as the selected phrase. If it is “sunny,” the phrase adding unit 341 determines that the reference phrase (A-0) and the reference phrase (B-0) do not match, and ends the process.
The phrase adding unit 341 then determines whether the candidate phrase (B) includes a category that does not match the categories of the candidate phrase (A) (S414). For example, for the candidate phrase (A) “It's sunny. This is because it is covered by high pressure. The probability of precipitation is XX%.” and the candidate phrase (B) “It's sunny. The maximum temperature is XX degrees.”, the phrase adding unit 341 determines that the category of the candidate phrase (B), “maximum temperature”, does not match the categories of the candidate phrase (A), “weather reason” and “probability of precipitation”. When the candidate phrase (B) includes a category that does not match the categories of the candidate phrase (A) (Yes in S414), the phrase adding unit 341 acquires the additional phrase (B-1), that is, the additional phrase included in the candidate phrase (B) that corresponds to the non-matching category (S415). In this example, the phrase adding unit 341 first confirms that the candidate phrase (B), unlike the candidate phrase (A), includes the category “maximum temperature”, and then extracts from the candidate phrase (B) the additional phrase for that category, “The maximum temperature is XX degrees.” The phrase adding unit 341 then adds the additional phrase (B-1) to the candidate phrase (A) (S416). Here it appends “The maximum temperature is XX degrees.” to “It's sunny. This is because it is covered by high pressure. The probability of precipitation is XX%.”, generating the phrase “It's sunny. This is because it is covered by high pressure. The probability of precipitation is XX%. The maximum temperature is XX degrees.” The phrase adding unit 341 then notifies the selection result output unit 143A of this phrase as the selected phrase. When the candidate phrase (B) includes no category that does not match the categories of the candidate phrase (A) (No in S414), the phrase adding unit 341 does not add a new additional phrase to the candidate phrase (A) and ends the process.
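Steps S414 to S416 amount to a set difference over categories: every additional phrase of the candidate phrase (B) whose category is absent from the candidate phrase (A) is appended to (A). The sketch below assumes each candidate phrase is stored as a reference phrase plus a mapping from category to additional phrase; `Candidate`, `text()` and `add_missing_categories` are hypothetical names, and joining phrases with a single space is an illustrative choice.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    reference: str
    additions: dict[str, str] = field(default_factory=dict)  # category -> additional phrase

    def text(self) -> str:
        # Reference phrase first, then the additional phrases in insertion order.
        return " ".join([self.reference, *self.additions.values()])


def add_missing_categories(selected: Candidate, other: Candidate) -> Candidate:
    """Append to `selected` each additional phrase of `other` whose category `selected` lacks.
    Returns a new Candidate and leaves `selected` unchanged; if nothing is missing, the result
    has the same content as `selected` (the "No in S414" branch)."""
    merged = dict(selected.additions)
    for category, phrase in other.additions.items():
        if category not in merged:      # category absent from candidate phrase (A)
            merged[category] = phrase   # additional phrase (B-1)
    return Candidate(selected.reference, merged)


a = Candidate("It's sunny.", {
    "weather reason": "This is because it is covered by high pressure.",
    "probability of precipitation": "The probability of precipitation is XX%.",
})
b = Candidate("It's sunny.", {"maximum temperature": "The maximum temperature is XX degrees."})
print(add_missing_categories(a, b).text())
```

The usage lines reproduce the weather example from the text: the merged phrase ends with “The maximum temperature is XX degrees.” appended after the two phrases already in (A).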

Based on the category of each candidate phrase, the terminal 3 adds a new additional phrase to the candidate phrase chosen as the selected phrase and outputs the result. For example, it adds the additional phrase (B-1) obtained from the second response generation unit 22A to the candidate phrase (A) generated by the first response generation unit 13A. The terminal 3 can therefore output a phrase that neither the first response generation unit 13A nor the second response generation unit 22A could generate on its own. The terminal 3 can also output one candidate phrase and afterwards append another candidate phrase to it. For example, when the response from the server 2 is delayed beyond a certain threshold by network latency or the like, and the later response contains an additional phrase of a different category, that additional phrase can be appended to the earlier response even though the earlier response has already been output. The basic processing procedure is the same as the response selection processing shown in FIG. 13. First, the terminal 3 stores the candidate phrase (A) it is about to output in the first storage unit 18 and then outputs the candidate phrase (A). Thereafter, when the terminal 3 acquires a candidate phrase (B) that has not yet been output and that includes a reference phrase with the same content as the reference phrase of the already-output selected phrase, it compares the categories of the candidate phrase (B) with those of the candidate phrase (A). If the candidate phrase (B) has a category different from those of the candidate phrase (A), the terminal 3 acquires and outputs the additional phrase included in the candidate phrase (B) that corresponds to that different category. In other words, the terminal 3 can output a phrase in which a new additional phrase is added to a candidate phrase it has already output.
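The late-arrival behavior of the terminal 3 can be sketched as a small stateful object that remembers the phrase it has already output and, when a matching candidate arrives later, outputs only the additional phrases of new categories. This is a hypothetical illustration: `IncrementalResponder`, its methods, and the in-memory `output` list (standing in for speech output) do not come from the description, and the reference-phrase check here is plain string equality.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    reference: str
    additions: dict[str, str] = field(default_factory=dict)  # category -> additional phrase


class IncrementalResponder:
    """Hypothetical sketch: speak candidate phrase (A) first, then extend it from a late candidate (B)."""

    def __init__(self) -> None:
        self.spoken: Candidate | None = None  # stands in for the copy kept in the first storage unit 18
        self.output: list[str] = []           # stands in for what has been output to the user

    def say_first(self, cand: Candidate) -> None:
        # Store candidate phrase (A) before outputting it, so later candidates can be compared to it.
        self.spoken = Candidate(cand.reference, dict(cand.additions))
        self.output.append(" ".join([cand.reference, *cand.additions.values()]))

    def on_late_candidate(self, cand: Candidate) -> None:
        # Only a candidate whose reference phrase matches the spoken one can extend it.
        if self.spoken is None or cand.reference != self.spoken.reference:
            return
        new = {c: p for c, p in cand.additions.items() if c not in self.spoken.additions}
        if new:
            self.spoken.additions.update(new)
            self.output.append(" ".join(new.values()))  # output only the newly added phrases


r = IncrementalResponder()
r.say_first(Candidate("It's sunny.", {"probability of precipitation": "The probability of precipitation is XX%."}))
r.on_late_candidate(Candidate("It's sunny.", {"maximum temperature": "The maximum temperature is XX degrees."}))
print(r.output)
```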

[Embodiment 4]
The control blocks (first control units 10 and 30) of the terminals 1, 1A, and 3 may be implemented by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or by software using a CPU (Central Processing Unit). In the latter case, the terminals 1, 1A, and 3 each include a CPU that executes the instructions of a program, that is, software implementing each function; a ROM (Read Only Memory) or storage device (referred to as a “recording medium”) on which the program and various data are recorded so as to be readable by the computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a “non-transitory tangible medium” such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may also be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or a broadcast wave). The present invention can also be implemented in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[Modification]
By executing a plurality of voice processes, in particular response generation processes, in parallel, the terminals 1, 1A, and 3 can shorten the time from when the user speaks until a response is output, compared with executing (or requesting) one voice process and only then executing another. The terminals 1, 1A, and 3 also select and output, from the plurality of candidate phrases generated in parallel, the candidate phrase whose information has the highest importance (response level). In other words, it suffices for the terminals 1, 1A, and 3 that the candidate acquisition units 141 and 141A can acquire a plurality of candidate phrases and that the response selection units 142 and 142A can select, as the selected phrase, the candidate phrase whose information has the highest importance; the other components are not essential. Although the examples above describe the terminals 1, 1A, and 3 as including the first response generation units 13 and 13A and the servers 2 and 2A as including the second response generation units 22 and 22A, this configuration is not essential. For example, the terminal 1 may omit the first response generation unit 13 and the server 2 may include both the first response generation unit 13 and the second response generation unit 22; conversely, the server 2 may omit the second response generation unit 22 and the terminal 1 may include both units. Nor is it essential that the candidate acquisition units 141 and 141A acquire exactly two candidate phrases; they may acquire three or more. Likewise, the response selection units 142 and 142A may select, as the selected phrase, the candidate phrase with the most important information from among three or more candidate phrases.
It is also not essential that the voice recognition unit 12 of the terminal 1 performs the voice recognition process. The server 2 may include a second voice recognition unit, the terminal 1 may transmit the voice data from the microphone 17 to the server 2, and the terminal 1 and the server 2 may each perform voice recognition in parallel. Alternatively, the server 2 instead of the terminal 1 may include the voice recognition unit 12, which then performs voice recognition on the voice data from the microphone 17 and issues the requests for response generation. Similarly, it is not essential that the speech synthesis unit 15 of the terminal 1 performs speech synthesis: the server 2 may include the speech synthesis unit 15 and generate the voice data to be output from the speaker 192 based on the selected phrase acquired from the selection result output units 143 and 143A. In general, a server has higher processing capacity than a terminal, can draw on a richer vocabulary, achieves higher speech recognition accuracy, and can handle more kinds of responses. A server typically holds larger acoustic model and language model dictionaries than a terminal, recognizes speech more capably, supports many more dialogue response scenarios, and holds a large amount of phoneme data, enabling it to output clear speech.
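The parallel-generation-then-select idea of this modification can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. The generator functions, the tuple representation of a candidate phrase, and the `importance` placeholder (count of additional phrases) are assumptions for illustration only. The sketch waits for every generator to finish; a real device might instead apply a timeout, as in the delayed-response case described for the terminal 3.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical representation: (reference phrase, {category: additional phrase}).
Candidate = tuple[str, dict[str, str]]


def importance(cand: Candidate) -> int:
    """Placeholder importance: the number of additional phrases (an assumption, not the patent's rule)."""
    return len(cand[1])


def respond(utterance: str, generators: list[Callable[[str], Candidate]]) -> Candidate:
    """Run all response generators on the same utterance in parallel, then keep the most important candidate."""
    with ThreadPoolExecutor(max_workers=len(generators)) as pool:
        # pool.map returns results in the same order as `generators`.
        candidates = list(pool.map(lambda gen: gen(utterance), generators))
    # max() returns the first maximal item, so ties favor the earlier generator.
    return max(candidates, key=importance)


# Hypothetical stand-ins for a terminal-side and a server-side response generation unit.
def local_generator(utterance: str) -> Candidate:
    return ("It's sunny.", {})


def server_generator(utterance: str) -> Candidate:
    return ("It's sunny.", {"maximum temperature": "The maximum temperature is XX degrees."})


print(respond("What's the weather tomorrow?", [local_generator, server_generator]))
```

Because both generators start at once, the total wait is roughly that of the slower generator rather than the sum of both, which is the latency benefit the modification describes.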

The present invention is not limited to the embodiments described above; various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

The present invention can be widely used in response control devices that control responses to voice.

1, 1A, 3: mobile terminal (response control device); 13, 13A: first response generation unit (response generation means); 22, 22A: second response generation unit (response generation means); 141, 141A: candidate acquisition unit (candidate phrase acquisition means); 142, 142A: response selection unit (selection means); 341: phrase adding unit (phrase adding means)

Claims

A response control device for controlling a response to voice, comprising:
candidate phrase acquiring means for acquiring a plurality of candidate phrases generated based on the voice by each of a plurality of response generating means; and
selecting means for selecting, as a response phrase, the candidate phrase whose included information has the highest importance from among the plurality of candidate phrases acquired by the candidate phrase acquiring means.

The response control device according to claim 1, wherein
each of the plurality of candidate phrases includes one or more reference phrases and zero or more additional phrases, and
the selecting means determines that a candidate phrase including an additional phrase has higher importance than a candidate phrase not including an additional phrase.

The response control device according to claim 2, wherein
additional points are set for the additional phrases, and
the selecting means uses, as the importance of a candidate phrase, the total of the additional points set for the additional phrases included in that candidate phrase.
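The scoring rule of claim 3 (importance equals the total additional points of the included additional phrases) can be illustrated as below. The point values and class names are hypothetical; the description does not specify them. As long as every point value is positive, a candidate with any additional phrase scores above one with none, which is consistent with the ordering in claim 2.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AdditionalPhrase:
    text: str
    points: int  # additional points set for this phrase (illustrative values only)


@dataclass
class Candidate:
    reference: str
    additions: list[AdditionalPhrase] = field(default_factory=list)

    def importance(self) -> int:
        # Total of the additional points of the included additional phrases (0 if there are none).
        return sum(a.points for a in self.additions)


def select_response(candidates: list[Candidate]) -> Candidate:
    # The highest total wins; max() returns the earliest candidate on a tie.
    return max(candidates, key=Candidate.importance)


a = Candidate("It's sunny.", [
    AdditionalPhrase("This is because it is covered by high pressure.", 1),
    AdditionalPhrase("The probability of precipitation is XX%.", 2),
])
b = Candidate("It's sunny.", [AdditionalPhrase("The maximum temperature is XX degrees.", 2)])
c = Candidate("It's sunny.")
print(select_response([c, b, a]).importance())  # 3
```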

The response control device according to claim 2 or 3, wherein
a category is set for each additional phrase, and
the response control device further comprises phrase adding means for adding an additional phrase to the response phrase when a candidate phrase that was not selected by the selecting means includes a reference phrase having the same content as a reference phrase included in the response phrase selected by the selecting means, and that candidate phrase includes the additional phrase set with a category different from the category set for the additional phrase included in the response phrase.

A control program for causing a computer to function as the response control device according to claim 1, wherein the control program causes the computer to function as each of the means.