JP6124844B2

JP6124844B2 - SERVER, METHOD USING DATABASE, PROGRAM, SYSTEM, TERMINAL, TERMINAL PROGRAM, AND VOICE DATA OUTPUT DEVICE

Info

Publication number: JP6124844B2
Application number: JP2014129415A
Authority: JP
Inventors: 木付　英士
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2014-06-24
Filing date: 2014-06-24
Publication date: 2017-05-10
Anticipated expiration: 2034-06-24
Also published as: JP2016009072A

Description

The present disclosure relates to a database, a method using the database, a program, a system, a terminal, a terminal program, and a voice data output device.

Dialogue systems that can converse with a user have been proposed. Such systems commonly provide the voices of a plurality of characters for use in dialogue and switch between the character voices to suit the user's preference (Patent Document 1).

Patent Document 1: JP 2006-337432 A

On the other hand, when the voices of a plurality of characters are made available, a dictionary needed for dialogue must be prepared for each character.

Consequently, the dictionary of every character must be updated each time a dialogue pattern for a new function is added, which becomes cumbersome when there are many characters.

The present disclosure has been made to solve the above problem. Its object is to provide a database whose dictionaries can be updated in a simple manner, as well as a method using the database, a program, a system, a terminal, a terminal program, and a voice data output device.

A database according to an embodiment of the present disclosure is an appendable database used for response processing to requests from a user. The database includes: a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters, each storing, for its character, response data used for response processing to a first request; and an additional dictionary, associated with the plurality of characters, storing response data that is used for response processing to a second request in place of the basic dictionary corresponding to the selected character.

Preferably, the additional dictionary is provided in correspondence with an abstract character obtained by abstracting the plurality of characters.

Preferably, the response data is text data used to produce voice output as the response processing to a request from the user.

Preferably, the response data of the additional dictionary is text data generalized by retaining the features common to the response data of the basic dictionaries and removing the features unique to each.
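The database structure described above can be pictured with a minimal Python sketch. The class name `ResponseDatabase`, the character names, the request keys, and the sample texts are all hypothetical and chosen only for illustration; the disclosure does not prescribe any particular storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseDatabase:
    # One basic dictionary per selectable character: request key -> response text.
    basic: dict[str, dict[str, str]] = field(default_factory=dict)
    # One additional dictionary shared by the associated characters.
    # Its text is generalized (no character-specific phrasing).
    additional: dict[str, str] = field(default_factory=dict)

    def add_function(self, request: str, generalized_text: str) -> None:
        """Append response data for a new function once, for all characters."""
        self.additional[request] = generalized_text


# Hypothetical sample data: two characters with character-specific greetings.
db = ResponseDatabase(
    basic={
        "character_a": {"greeting": "Hey there!"},
        "character_b": {"greeting": "Good day to you."},
    }
)

# Adding a new function touches only the additional dictionary,
# not the basic dictionary of each character.
db.add_function("weather", "Today's weather is sunny.")
```

In this sketch, supporting a new dialogue pattern is a single append to `additional`, which is the simplification the disclosure aims at when the number of characters is large.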

A method according to an embodiment of the present disclosure uses an appendable database that includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing response data for its character, and an additional dictionary, associated with the plurality of characters, storing response data used in place of the basic dictionary corresponding to the selected character. The method includes the steps of receiving a request from a user, receiving a selection of a character, and executing response processing in accordance with the received request. The step of executing response processing includes the steps of: extracting response data using the basic dictionary provided for the selected character when the received request is a first request; extracting response data using the additional dictionary associated with the selected character when the received request is a second request; performing speech synthesis based on the extracted response data and the selected character; and outputting the synthesized voice data.
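The steps of the method can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: a request is treated as a "first request" when the selected character's basic dictionary contains it and as a "second request" when only the additional dictionary does, and `synthesize` is a hypothetical stand-in that merely tags the text with the character instead of producing real audio.

```python
# Hypothetical sample data; keys and texts are illustrative only.
BASIC_DICTIONARIES = {
    "character_a": {"greeting": "Hey there!"},
    "character_b": {"greeting": "Good day to you."},
}
ADDITIONAL_DICTIONARY = {"weather": "Today's weather is sunny."}


def extract_response(request: str, character: str) -> str:
    """Pick response text from the basic or the additional dictionary."""
    basic = BASIC_DICTIONARIES[character]
    if request in basic:
        # First request: use the selected character's basic dictionary.
        return basic[request]
    if request in ADDITIONAL_DICTIONARY:
        # Second request: use the shared additional dictionary instead.
        return ADDITIONAL_DICTIONARY[request]
    raise KeyError(f"no response data for request {request!r}")


def synthesize(text: str, character: str) -> bytes:
    """Stand-in for speech synthesis in the selected character's voice."""
    return f"<voice:{character}>{text}".encode("utf-8")


def respond(request: str, character: str) -> bytes:
    """Run response processing and return the voice data to output."""
    return synthesize(extract_response(request, character), character)
```

Note that the additional dictionary supplies only the text; the selected character still determines the voice at the synthesis step, so a shared entry is spoken differently for each character.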

A program according to an embodiment of the present disclosure is executed by a computer that uses an appendable database including a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing response data for its character, and an additional dictionary, associated with the plurality of characters, storing response data used in place of the basic dictionary corresponding to the selected character. The program causes the computer to execute processing including the steps of receiving a request from a user, receiving a selection of a character, and executing response processing in accordance with the received request. The step of executing response processing includes the steps of: extracting response data using the basic dictionary provided for the selected character when the received request is a first request; extracting response data using the additional dictionary associated with the selected character when the received request is a second request; performing speech synthesis based on the extracted response data and the selected character; and outputting the synthesized voice data.

A system according to an embodiment of the present disclosure uses an appendable database and includes receiving means for receiving a request from a user, selection receiving means for receiving a selection of a character, and response execution means for executing response processing in accordance with the request received by the receiving means. The database includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing response data for its character, and an additional dictionary, associated with the plurality of characters, storing response data used in place of the basic dictionary corresponding to the selected character. The response execution means includes: first extraction means for extracting response data using the basic dictionary provided for the selected character when the received request is a first request; second extraction means for extracting response data using the additional dictionary associated with the selected character when the received request is a second request; speech synthesis means for performing speech synthesis based on the extracted response data and the selected character; and output means for outputting the synthesized voice data.

A method according to another embodiment of the present disclosure uses a first database and a second database. The first database is appendable and includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing response data for its character. The second database includes an additional dictionary, associated with the plurality of characters, storing response data used in place of the basic dictionary corresponding to the selected character. The method includes the steps of receiving a request from a user, receiving a selection of a character, and executing response processing in accordance with the received request. The step of executing response processing includes the steps of: extracting response data using the basic dictionary of the first database provided for the selected character when the received request is a first request; extracting response data using the additional dictionary of the second database associated with the selected character when the received request is a second request; performing speech synthesis based on the extracted response data and the selected character; and outputting the synthesized voice data.

A terminal according to an embodiment of the present disclosure uses an appendable database provided in an external device. The database includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and an additional dictionary, associated with the plurality of characters, storing response data used for response processing to a second request in place of the basic dictionary corresponding to the selected character. The terminal includes: receiving means for receiving the first or second request from a user; selection receiving means for receiving a selection of a character; and output means for outputting voice data synthesized based on the response data extracted using the database in response to the first or second request received by the receiving means and on the character selected by the selection receiving means.

A terminal according to another embodiment of the present disclosure uses a first database and a second database. The first database is provided in the terminal body, is appendable, and includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request. The second database is provided in an external device and includes an additional dictionary, associated with the plurality of characters, storing response data used for response processing to a second request in place of the basic dictionary corresponding to the selected character. The terminal includes receiving means for receiving the first or second request from a user, selection receiving means for receiving a selection of a character, and response execution means for executing response processing in accordance with the request received by the receiving means. The response execution means includes: extraction means for extracting response data using the basic dictionary of the first database provided for the selected character when the receiving means receives the first request; acquisition means for acquiring response data extracted using the additional dictionary of the second database associated with the selected character when the receiving means receives the second request; speech synthesis means for performing speech synthesis based on the extracted or acquired response data and the character selected by the selection receiving means; and output means for outputting the synthesized voice data.
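The split between local extraction and remote acquisition in this embodiment can be sketched as below. The class `Terminal` and the `fetch_remote` callable are hypothetical; `fetch_remote` stands in for whatever communication the terminal uses to obtain response data that the external device has extracted from the additional dictionary, and a request is assumed to be a "first request" when the local basic dictionary contains it.

```python
from typing import Callable


class Terminal:
    """Terminal holding the basic dictionaries locally (first database)."""

    def __init__(
        self,
        local_basic: dict[str, dict[str, str]],
        fetch_remote: Callable[[str], str],
    ) -> None:
        self.local_basic = local_basic
        # Stand-in for a call to the external device (second database).
        self.fetch_remote = fetch_remote

    def response_text(self, request: str, character: str) -> str:
        basic = self.local_basic.get(character, {})
        if request in basic:
            # First request: extract from the local basic dictionary.
            return basic[request]
        # Second request: acquire data extracted on the external device.
        return self.fetch_remote(request)


# Hypothetical usage with an in-memory fake of the external device.
remote_additional = {"weather": "Today's weather is sunny."}
terminal = Terminal(
    local_basic={"character_a": {"greeting": "Hey there!"}},
    fetch_remote=lambda request: remote_additional[request],
)
```

One consequence of this arrangement is that new functions can be added on the external device without touching the data stored in the terminal body.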

A terminal according to another embodiment of the present disclosure uses a first database and a second database. The first database is provided in an external device and includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request. The second database is provided in the terminal body and includes an additional dictionary, associated with the plurality of characters, storing response data used for response processing to a second request in place of the basic dictionary corresponding to the selected character. The terminal includes receiving means for receiving the first or second request from a user, selection receiving means for receiving a selection of a character, and response execution means for executing response processing in accordance with the request received by the receiving means. The response execution means includes: acquisition means for acquiring response data using the basic dictionary of the first database provided for the selected character when the receiving means receives the first request; extraction means for extracting response data using the additional dictionary of the second database associated with the selected character when the receiving means receives the second request; speech synthesis means for performing speech synthesis based on the extracted or acquired response data and the character selected by the selection receiving means; and output means for outputting the synthesized voice data.

A terminal according to another embodiment of the present disclosure uses an appendable database provided in the terminal body. The database includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and an additional dictionary, associated with the plurality of characters, storing response data used for response processing to a second request in place of the basic dictionary corresponding to the selected character. The terminal includes receiving means for receiving the first or second request from a user, selection receiving means for receiving a selection of a character, and response execution means for executing response processing in accordance with the request received by the receiving means. The response execution means includes: first extraction means for extracting response data using the basic dictionary of the first database provided for the selected character when the receiving means receives the first request; second extraction means for extracting response data using the additional dictionary of the second database associated with the selected character when the receiving means receives the second request; speech synthesis means for performing speech synthesis based on the extracted response data and the character selected by the selection receiving means; and output means for outputting the synthesized voice data.

A terminal program according to an embodiment of the present disclosure is executed by a computer of a terminal that uses an appendable database provided in an external device. The database includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and an additional dictionary, associated with the plurality of characters, storing response data used for response processing to a second request in place of the basic dictionary corresponding to the selected character. The terminal program causes the computer to execute processing including the steps of: receiving the first or second request from a user; receiving a selection of a character; and outputting voice data synthesized based on the response data extracted using the database in response to the received first or second request and on the character selected by the selection receiving means.

A terminal program according to another embodiment of the present disclosure is executed by a computer of a terminal that uses a first database and a second database. The first database is provided in the terminal body, is appendable, and includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request. The second database is provided in an external device and includes an additional dictionary, associated with the plurality of characters, storing response data used for response processing to a second request in place of the basic dictionary corresponding to the selected character. The terminal program causes the computer to execute processing including the steps of receiving the first or second request from a user, receiving a selection of a character, and executing response processing in accordance with the received request. The step of executing response processing includes the steps of: extracting response data using the basic dictionary of the first database provided for the selected character when the first request is received; acquiring response data extracted using the additional dictionary of the second database associated with the selected character when the second request is received; performing speech synthesis based on the extracted or acquired response data and the selected character; and outputting the synthesized voice data.

A terminal program according to another embodiment of the present disclosure is executed by a computer of a terminal that uses a first database and a second database. The first database is provided in an external device and includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request. The second database is provided in the terminal body and includes an additional dictionary, associated with the plurality of characters, storing response data used for response processing to a second request in place of the basic dictionary corresponding to the selected character. The terminal program causes the computer to execute processing including the steps of receiving the first or second request from a user, receiving a selection of a character, and executing response processing in accordance with the received request. The step of executing response processing includes the steps of: acquiring response data using the basic dictionary of the first database provided for the selected character when the first request is received; extracting response data using the additional dictionary of the second database associated with the selected character when the second request is received; performing speech synthesis based on the extracted or acquired response data and the character selected by the selection receiving means; and outputting the synthesized voice data.

A terminal program according to another embodiment of the present disclosure is executed by a computer of a terminal that uses an appendable database provided in the terminal body. The database includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and an additional dictionary, associated with the plurality of characters, storing response data used for response processing to a second request in place of the basic dictionary corresponding to the selected character. The terminal program causes the computer to execute processing including the steps of receiving the first or second request from a user, receiving a selection of a character, and executing response processing in accordance with the received request. The step of executing response processing includes the steps of: extracting response data using the basic dictionary of the first database provided for the selected character when the first request is received; extracting response data using the additional dictionary of the second database associated with the selected character when the second request is received; performing speech synthesis based on the extracted response data and the selected character; and outputting the synthesized voice data.

A voice data output device according to an embodiment of the present disclosure uses an appendable database and includes receiving means for receiving a request from a user, selection receiving means for receiving a selection of a character, and response execution means for executing response processing in accordance with the request received by the receiving means. The database includes a plurality of basic dictionaries, provided in correspondence with a plurality of selectable characters and each storing response data for its character. The voice data output device further includes an additional dictionary receiving unit that receives an additional dictionary storing response data used in place of the basic dictionary corresponding to the selected character, and a registration unit that, when the additional dictionary receiving unit receives the additional dictionary, registers in the database the additional dictionary together with a correspondence table representing the correspondence between the plurality of characters and the additional dictionary. The response execution means includes: first extraction means for extracting response data using the basic dictionary provided for the selected character when the received request is a first request; second extraction means for extracting response data, by referring to the correspondence table, using the additional dictionary corresponding to the selected character when the received request is a second request; speech synthesis means for performing speech synthesis based on the extracted response data and the selected character; and output means for outputting the synthesized voice data.
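The registration of an additional dictionary together with a correspondence table might look like the following sketch. The class `VoiceDataOutputDevice`, its method names, and the dictionary identifier `"weather_pack"` are hypothetical; the disclosure does not specify how the table is represented, so a simple mapping from character to dictionary identifiers is assumed here.

```python
from typing import Optional


class VoiceDataOutputDevice:
    def __init__(self, basic: dict[str, dict[str, str]]) -> None:
        self.basic = basic
        # Additional dictionaries by identifier: id -> {request: text}.
        self.additional: dict[str, dict[str, str]] = {}
        # Correspondence table: character -> identifiers of additional dictionaries.
        self.correspondence: dict[str, list[str]] = {}

    def register_additional(
        self, dict_id: str, entries: dict[str, str], characters: list[str]
    ) -> None:
        """Register an additional dictionary and its character correspondence."""
        self.additional[dict_id] = entries
        for character in characters:
            self.correspondence.setdefault(character, []).append(dict_id)

    def extract(self, request: str, character: str) -> Optional[str]:
        basic = self.basic.get(character, {})
        if request in basic:
            # First request: basic dictionary of the selected character.
            return basic[request]
        # Second request: look up additional dictionaries via the table.
        for dict_id in self.correspondence.get(character, []):
            entries = self.additional[dict_id]
            if request in entries:
                return entries[request]
        return None


device = VoiceDataOutputDevice(
    basic={"character_a": {"greeting": "Hey there!"}, "character_b": {}}
)
device.register_additional(
    "weather_pack", {"weather": "Today's weather is sunny."}, ["character_a"]
)
```

Because lookup goes through the table, a character that has not been associated with an additional dictionary (here `character_b`) does not see its entries.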

The above and other objects, features, aspects, and advantages of this disclosure will become apparent from the following detailed description, read in conjunction with the accompanying drawings.

There is no need to provide a dictionary for each character, and the dictionaries can be updated in a simple manner.

A diagram illustrating the audio output system 1 according to Embodiment 1.
A block diagram outlining the hardware configuration of the cleaning robot 100 according to Embodiment 1.
A block diagram outlining the hardware configuration of the server 300 according to Embodiment 1.
A sequence diagram illustrating the flow of response processing in the audio output system 1 according to Embodiment 1.
A block diagram illustrating the functions of the server 300 according to Embodiment 1.
A diagram illustrating the configuration of the database 531 according to Embodiment 1.
A diagram illustrating a specific example of the database 531 according to Embodiment 1.
A diagram illustrating the processing flow of the server 300 according to Embodiment 1.
A block diagram illustrating the functions of the audio output system according to Embodiment 4.
A sequence diagram illustrating the flow of response processing in the audio output system 1A according to Embodiment 4.
A diagram illustrating the configuration of the server according to Embodiment 7.

Embodiments are described below with reference to the drawings. Where the description of the embodiments refers to numbers, quantities, and the like, the scope of the present invention is not necessarily limited to those numbers or quantities unless otherwise stated. In the description of the embodiments, identical and equivalent parts are given the same reference numerals, and redundant description may not be repeated. Unless otherwise restricted, it is intended from the outset that the configurations shown in the embodiments may be used in appropriate combination.

<Embodiment 1>
(Configuration of audio output system 1)
FIG. 1 is a diagram illustrating an audio output system 1 according to Embodiment 1.

Referring to FIG. 1, the audio output system 1 according to Embodiment 1 includes a cleaning robot 100, a network 5, an external device 50, and a server 300.

The cleaning robot 100 is provided so as to be able to communicate with the server 300 via the network 5. Although this example describes communication with the server 300 via the network 5, the cleaning robot 100 may instead communicate with the server 300 directly.

In the audio output system 1, the cleaning robot 100, as an example of the voice data output device, outputs voice to a person (user). When voice uttered by the person (user) in reply is input to the cleaning robot 100, the server 300 performs speech recognition on it, and the cleaning robot 100 outputs voice representing the content of the response to the input voice (hereinafter also referred to as a "voice response"). By repeating this process, the audio output system 1 according to Embodiment 1 realizes a pseudo-conversation between the user and the cleaning robot 100.
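The repeated exchange that produces the pseudo-conversation can be illustrated with a toy loop. Everything here is an assumption made for illustration: utterances are represented as already-recognized text rather than audio, `server_respond` is a hypothetical stand-in for speech recognition and response selection on the server 300, and the reply table is invented sample data.

```python
# Hypothetical reply table standing in for the server's dictionaries.
REPLIES = {
    "good morning": "Good morning!",
    "clean the room": "Starting to clean.",
}


def server_respond(utterance: str) -> str:
    """Stand-in for recognition and response selection on the server."""
    key = utterance.strip().lower()
    return REPLIES.get(key, "Sorry, I did not catch that.")


def converse(utterances: list[str]) -> list[tuple[str, str]]:
    """Repeat the input/response cycle once per user utterance."""
    transcript = []
    for utterance in utterances:
        # The robot forwards the input and outputs the returned voice response.
        transcript.append((utterance, server_respond(utterance)))
    return transcript
```

Each iteration corresponds to one round of voice input and voice response between the user and the cleaning robot 100.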

In the first embodiment, the cleaning robot 100, which recognizes voice and outputs a voice response to the user, is described as an example of the audio data output device, but the present invention is not limited to this. For example, the device may also be realized by a doll having an interactive function, a household appliance other than the cleaning robot 100 (for example, a television or a microwave oven), a mobile phone, a smartphone, a tablet terminal, a personal computer or other information processing terminal, an electronic piano or other electronic musical instrument, an automobile, or other equipment.

In the embodiment, a configuration in which the server 300 is realized by a single server is described as an example. However, the present invention is not limited to this, and a configuration in which at least some of the units (functions) of the server 300 are realized by another server may be adopted.

本例において、サーバ３００は、外部装置５０と連携して後述する所定の機能を実行することが可能である。例えばサーバ３００は、外部装置５０にアクセスして天気予報に関する情報を取得し、取得した情報に基づく応答処理を実行することが可能である。 In this example, the server 300 can execute a predetermined function to be described later in cooperation with the external device 50. For example, the server 300 can access the external device 50 to acquire information related to the weather forecast, and execute a response process based on the acquired information.

(Hardware of audio output system 1)
FIG. 2 is a block diagram illustrating an outline of the hardware configuration of the cleaning robot 100 based on the first embodiment.

図２に示されるように掃除ロボット１００は、ＣＰＵ（Central Processing Unit）６１０と、一時記憶部６２０と、記憶部６３０と、通信部６４０と、入力部６５０と、出力部６６０とを備える。 As shown in FIG. 2, the cleaning robot 100 includes a CPU (Central Processing Unit) 610, a temporary storage unit 620, a storage unit 630, a communication unit 640, an input unit 650, and an output unit 660.

ＣＰＵ６１０は、制御部として、命令を実行し、掃除ロボット１００の動作を制御する。 As a control unit, CPU 610 executes a command and controls the operation of cleaning robot 100.

一時記憶部６２０は、ＣＰＵ６１０によって生成されたデータ、記憶部６３０から読みだされたデータなどを一時的に保持する。一時記憶部６２０は、たとえばＲＡＭ（Random Access Memory）その他の揮発性のデータ記憶媒体によって実現される。 The temporary storage unit 620 temporarily holds data generated by the CPU 610, data read from the storage unit 630, and the like. Temporary storage unit 620 is realized by, for example, a RAM (Random Access Memory) or other volatile data storage medium.

記憶部６３０は、ＣＰＵ６１０によって生成されたデータ、予め格納されたデータおよびプログラムなどを保持する。記憶部６３０は、たとえばハードディスク装置、フラッシュメモリその他の不揮発性のデータ記録媒体によって実現される。 The storage unit 630 holds data generated by the CPU 610, prestored data, programs, and the like. Storage unit 630 is implemented by, for example, a hard disk device, a flash memory, or other non-volatile data recording medium.

通信部６４０は、ネットワーク５と接続され、サーバ３００と通信する。なお、携帯電話、スマートフォンその他の情報通信端末と通信することも可能である。通信部６４０は、たとえば、無線通信、有線通信のいずれによっても実現される。通信部６４０による通信の態様は特に限られず、パケット通信、赤外線通信、Bluetooth（登録商標）、ＮＦＣ（Near Field Communication）等によって実現される。 The communication unit 640 is connected to the network 5 and communicates with the server 300. It is also possible to communicate with a mobile phone, a smartphone or other information communication terminal. The communication unit 640 is realized by, for example, wireless communication or wired communication. The mode of communication by the communication unit 640 is not particularly limited, and is realized by packet communication, infrared communication, Bluetooth (registered trademark), NFC (Near Field Communication), or the like.

The input unit 650 is realized by, for example, a microphone. Specifically, it accepts sound input from the outside. In this description, the sound data representing the sound accepted by the microphone is mainly data of sound within the frequency band of human speech (also referred to as voice data), but it may also include sound data in frequency bands outside that of voice data. The microphone outputs voice data representing the input sound to the CPU 610. One method of detecting voice data from sound data is to extract the frequency band of human speech (for example, 100 Hz or more and 1 kHz or less) from the sound data. In this case, the input unit 650 may include, for example, a band-pass filter, or a filter combining a high-pass filter and a low-pass filter, in order to extract the frequency band of human speech from the sound data.
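The band extraction described above can be sketched as a high-pass filter followed by a low-pass filter. This is a minimal illustration only: the first-order filter design, the function names, and the 16 kHz sample rate are assumptions and are not taken from the disclosure, which only states the 100 Hz to 1 kHz example band.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter (attenuates content below cutoff_hz)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def low_pass(samples, cutoff_hz, sample_rate):
    """First-order low-pass filter (attenuates content above cutoff_hz)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, prev_out = [], 0.0
    for x in samples:
        prev_out = prev_out + alpha * (x - prev_out)
        out.append(prev_out)
    return out


def extract_voice_band(samples, sample_rate, low_hz=100.0, high_hz=1000.0):
    """Combine both filters to keep roughly the 100 Hz to 1 kHz band."""
    return low_pass(high_pass(samples, low_hz, sample_rate), high_hz, sample_rate)


def rms(samples):
    """Root-mean-square level, used here to compare signal strength."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate=16000, seconds=0.5):
    """Generate a unit-amplitude sine tone for experimentation."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


in_band = rms(extract_voice_band(tone(300.0), 16000))
below_band = rms(extract_voice_band(tone(10.0), 16000))
above_band = rms(extract_voice_band(tone(6000.0), 16000))
```

A 300 Hz tone passes through largely intact, while the 10 Hz and 6 kHz tones are strongly attenuated. A real device would likely use a steeper filter than this first-order pair.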

なお、入力部６５０は、たとえば、キーボード、マウスその他のポインティングデバイス、信号入力端子、赤外線受光部等を含み得る。 Note that the input unit 650 may include, for example, a keyboard, a mouse or other pointing device, a signal input terminal, an infrared light receiving unit, and the like.

出力部６６０は、たとえば、スピーカで実現される。具体的には、外部に対して出力される応答内容を表す音声信号を再生する。なお、出力部６６０は、たとえば、液晶モニタ、有機ＥＬ（Electro Luminescence）モニタ、ＬＥＤ外部出力インターフェイスを含みうる。 The output unit 660 is realized by a speaker, for example. Specifically, an audio signal representing the response content output to the outside is reproduced. Output unit 660 can include, for example, a liquid crystal monitor, an organic EL (Electro Luminescence) monitor, and an LED external output interface.

The driving unit 670 consists of the wheels on which the cleaning robot 100 travels and a motor that drives the wheels. When the device is something other than the cleaning robot 100, the driving unit 670 may include, for example, a communication circuit, a vibrator, wheels, a compressor, an image processor, and the like.

The cleaning unit 680 includes a brush, a suction pump, and the like.
Note that the above configuration is not necessarily essential, and functions such as the cleaning unit 680 may be added or removed depending on the device.

図３は、実施形態１に基づくサーバ３００のハードウェア構成の概要を表わすブロック図である。 FIG. 3 is a block diagram showing an outline of the hardware configuration of the server 300 based on the first embodiment.

図３に示されるようにサーバ３００は、ＣＰＵ５１０と、一時記憶部５２０と、記憶部５３０と、通信部５４０と、入力部５５０と、出力部５６０とを備える。 As illustrated in FIG. 3, the server 300 includes a CPU 510, a temporary storage unit 520, a storage unit 530, a communication unit 540, an input unit 550, and an output unit 560.

As a control unit, the CPU 510 executes instructions and controls the operation of the server 300.
The temporary storage unit 520 temporarily holds data generated by the CPU 510, data read from the storage unit 530, data given to the server 300, and the like. The temporary storage unit 520 is realized by, for example, a RAM (Random Access Memory) or other volatile data storage medium.

記憶部５３０は、ＣＰＵ５１０によって生成されたデータ、サーバ３００に対して与えられたデータ、サーバ３００に所定の動作を実行させるために予め格納されたデータおよびプログラムなどを保持する。記憶部５３０は、たとえばハードディスク装置、フラッシュメモリその他の不揮発性のデータ記録媒体によって実現される。 The storage unit 530 holds data generated by the CPU 510, data given to the server 300, data stored in advance for causing the server 300 to perform a predetermined operation, a program, and the like. Storage unit 530 is realized by, for example, a hard disk device, a flash memory, or other nonvolatile data recording medium.

通信部５４０は、ネットワーク５と接続され掃除ロボット１００と通信する。通信部５４０は、たとえば、無線通信、有線通信のいずれによっても実現される。通信部５４０による通信の態様は特に限られず、パケット通信、赤外線通信、Bluetooth（登録商標）、ＮＦＣ（Near Field Communication）等によって実現される。 Communication unit 540 is connected to network 5 and communicates with cleaning robot 100. The communication unit 540 is realized by, for example, wireless communication or wired communication. The mode of communication by the communication unit 540 is not particularly limited, and is realized by packet communication, infrared communication, Bluetooth (registered trademark), NFC (Near Field Communication), or the like.

入力部５５０は、サーバ３００に対する命令または文字その他の情報の入力を受け付ける。入力部５５０は、たとえば、キーボード、マウスその他のポインティングデバイス、信号入力端子、赤外線受光部等を含み得る。 The input unit 550 accepts input of commands or characters and other information to the server 300. The input unit 550 can include, for example, a keyboard, a mouse or other pointing device, a signal input terminal, an infrared light receiving unit, and the like.

出力部５６０は、サーバ３００において生成されたデータ、ＣＰＵ５１０によって検索された結果などを出力する。出力部５６０は、たとえば、液晶モニタ、有機ＥＬ（Electro Luminescence）モニタ、ＬＥＤ外部出力インターフェイスなどによって実現される。 The output unit 560 outputs data generated in the server 300, a result searched by the CPU 510, and the like. The output unit 560 is realized by, for example, a liquid crystal monitor, an organic EL (Electro Luminescence) monitor, an LED external output interface, or the like.

(Response processing overview)
FIG. 4 is a sequence diagram illustrating the flow of response processing in the audio output system 1 based on the first embodiment.

図４に示されるように、ユーザは、掃除ロボット１００に対して発話（ユーザ発話とも称する）する（シーケンスｓｑ０）。 As shown in FIG. 4, the user utters (also referred to as user utterance) to cleaning robot 100 (sequence sq0).

掃除ロボット１００は、ユーザ発話に対して音声の入力を受け付ける（シーケンスｓｑ１）。具体的には、掃除ロボット１００は、マイクを介して外部からの音の入力を受け付ける。 Cleaning robot 100 accepts an input of voice in response to the user utterance (sequence sq1). Specifically, the cleaning robot 100 receives an external sound input via a microphone.

次に、掃除ロボット１００は、音声データをサーバ３００に出力する（シーケンスｓｑ２）。具体的には、受け付けた音声データを通信部６４０を介してサーバ３００に出力する。 Next, cleaning robot 100 outputs audio data to server 300 (sequence sq2). Specifically, the received audio data is output to the server 300 via the communication unit 640.

次に、サーバ３００は、掃除ロボット１００から送信された音声データを受信して音声認識を実行する（シーケンスｓｑ３）。具体的には、通信部５４０を介して音声データを受信する。そして、受信した音声データの音声内容を認識する。そして、サーバ３００は、認識した音声内容に基づいて応答処理を実行する（シーケンスｓｑ４）。 Next, server 300 receives the voice data transmitted from cleaning robot 100 and executes voice recognition (sequence sq3). Specifically, the audio data is received via the communication unit 540. Then, the voice content of the received voice data is recognized. Server 300 executes response processing based on the recognized audio content (sequence sq4).

次に、サーバ３００は、応答処理の結果として音声合成により生成した音声データを掃除ロボット１００に送信する（シーケンスｓｑ５）。具体的には、通信部５４０を介して音声データを掃除ロボット１００に送信する。 Next, server 300 transmits voice data generated by voice synthesis as a result of the response process to cleaning robot 100 (sequence sq5). Specifically, the voice data is transmitted to the cleaning robot 100 via the communication unit 540.

次に、掃除ロボット１００は、サーバ３００から受信した音声データの出力処理を実行する（シーケンスｓｑ６）。具体的には、通信部６４０を介してサーバ３００からの音声データを受信する。 Next, cleaning robot 100 executes an output process of audio data received from server 300 (sequence sq6). Specifically, the audio data from the server 300 is received via the communication unit 640.

The cleaning robot 100 reproduces voice based on the voice data (sequence sq7).
Specifically, the audio signal is reproduced through the speaker.

当該処理により、ユーザが発話した内容に従って応答処理し、応答内容を示す音声をユーザに出力することが可能となる。 By this process, it is possible to perform a response process according to the content uttered by the user and to output a voice indicating the response content to the user.
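The sequence sq1 to sq7 above can be sketched as a single in-process round trip. This is an illustration under stated assumptions: the class and method names are hypothetical, the network is replaced by a direct method call, and speech recognition and synthesis are replaced by UTF-8 decoding and encoding so that the example stays self-contained.

```python
class Server:
    """Stand-in for server 300 (hypothetical interface)."""

    def __init__(self, responses):
        self.responses = responses  # recognition phrase -> response text

    def recognize(self, voice_data):
        # sq3: placeholder for real speech recognition.
        return voice_data.decode("utf-8")

    def synthesize(self, text):
        # Placeholder for real speech synthesis.
        return text.encode("utf-8")

    def handle(self, voice_data):
        phrase = self.recognize(voice_data)       # sq3: voice recognition
        response = self.responses.get(phrase)     # sq4: response processing
        if response is None:
            return None                           # nothing to answer
        return self.synthesize(response)          # sq5: voice data sent back


class CleaningRobot:
    """Stand-in for cleaning robot 100 (hypothetical interface)."""

    def __init__(self, server):
        self.server = server

    def on_user_utterance(self, voice_data):
        # sq1: voice input accepted; sq2: voice data output to the server.
        reply = self.server.handle(voice_data)
        if reply is None:
            return None
        # sq6: output processing of received data; sq7: playback.
        return self.play(reply)

    def play(self, voice_data):
        # Placeholder for speaker output: return what would be spoken.
        return voice_data.decode("utf-8")


robot = CleaningRobot(Server({"おはよう": "おはようございます"}))
spoken = robot.on_user_utterance("おはよう".encode("utf-8"))
```

In an actual system the call to `handle` would travel over network 5 through communication units 640 and 540.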

なお、本例においては、一例として音声認識処理および音声合成処理についてサーバ３００側で実行する方式について説明するが、特にこれに限られず、音声認識処理および音声合成処理を掃除ロボット１００側で実行するようにしても良いし、一方を掃除ロボット１００側、他方をサーバ３００側で実行するようにしても良い。以下の構成についても同様である。 In this example, a method of executing the voice recognition process and the voice synthesis process on the server 300 side will be described as an example. However, the present invention is not limited to this, and the voice recognition process and the voice synthesis process are executed on the cleaning robot 100 side. Alternatively, one may be executed on the cleaning robot 100 side and the other on the server 300 side. The same applies to the following configurations.

Next, a specific configuration of the server 300 will be described.
(Functional block diagram of server 300)
FIG. 5 is a block diagram illustrating the functions of the server 300 based on the first embodiment.

As shown in FIG. 5, the server 300 includes a character selection input receiving unit 400, a character setting unit 402, a response execution unit 404, a voice input reception unit 414, a voice recognition unit 416, an alarm execution unit 418, and a database (DB) 531.

キャラクタ選択入力受付部４００は、ユーザからのキャラクタの選択入力の指示を受け付ける。本例においては、ユーザ発話に対する応答処理による音声再生として、予め設けられた複数のキャラクタの音声を選択して再生することが可能に設けられている。 The character selection input receiving unit 400 receives an instruction to select and input a character from the user. In this example, as voice reproduction by response processing to a user utterance, it is possible to select and reproduce voices of a plurality of characters provided in advance.

Here, a "character" means the personality that a voice has, that is, a distinctive personality arising from a characteristic or individual tone, role, voice quality, or a combination of these. An example is a voice with a distinctive manner of speaking that speaks in the Kansai dialect. A character is not limited to a person and may be an animal or the like, or an anthropomorphized version of one. For example, it may be a dog's bark, an anthropomorphized dog that speaks human language, or an existing animation character.

For example, an instruction to select one of the plurality of characters is issued by operating a predetermined key provided on the input unit 650 of the cleaning robot 100, and the selection instruction is transmitted to the server 300 via the communication unit 640. The server 300 receives the selection instruction via the communication unit 540, and the character selection input receiving unit 400 accepts it. In this example, a case where four selectable characters A to D are provided will be described.

キャラクタ設定部４０２は、選択指示されたキャラクタに設定する。当該キャラクタの設定により音声合成における音素パターンが切り替えられる。音素パターンには、声の大きさ、発声速度、音量、高域強調、抑揚等の発声音声に関するデータも含まれる。当該キャラクタの設定に従ってキャラクタに応じた音声データを生成する。 The character setting unit 402 sets the selected character. The phoneme pattern in the speech synthesis is switched according to the setting of the character. The phoneme pattern also includes data related to the uttered voice such as loudness, utterance speed, volume, high frequency emphasis, and inflection. Audio data corresponding to the character is generated according to the setting of the character.

音声入力受信部４１４は、通信部５４０を介して掃除ロボット１００から入力された音声データを受信する。そして、応答実行部４０４に受信した音声データを出力する。 The voice input receiving unit 414 receives voice data input from the cleaning robot 100 via the communication unit 540. Then, the received voice data is output to the response execution unit 404.

The response execution unit 404 executes response processing according to the received voice data.
Specifically, the response execution unit 404 includes a voice recognition unit 416, a first extraction unit 406, a second extraction unit 408, a speech synthesis unit 410, and a data output unit 412.

音声認識部４１６は、音声入力受信部４１４によって受信した音声データの示す音声の内容（音声内容）を認識内容（認識フレーズ）として認識する。 The voice recognition unit 416 recognizes the voice content (sound content) indicated by the voice data received by the voice input reception unit 414 as the recognized content (recognition phrase).

Based on the acquired recognition phrase, the first extraction unit 406 refers to the basic dictionary of the corresponding character in the standard conversation function dictionary group 532 stored in the database 531, and selects (determines) the response content (response information) corresponding to the voice content indicated by the voice data.

Based on the acquired recognition phrase, the second extraction unit 408 refers to a dictionary in a dictionary group other than the standard conversation function dictionary group 532 stored in the database 531, and selects (determines) the response content (response information) corresponding to the voice content indicated by the voice data.

If neither the first extraction unit 406 nor the second extraction unit 408, referring to the database 531 stored in the storage unit 530, can obtain a recognition phrase for the voice data using the dictionaries, voice recognition is judged to have failed and the response processing ends.
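The two extraction steps and the failure case can be sketched as a two-stage lookup. The function names are hypothetical, trying the standard conversation dictionaries before the other groups is an assumption (the disclosure does not state the order here), and the weather phrase and response are made-up sample entries.

```python
def first_extraction(standard_group, character, phrase):
    """Look up the phrase in the selected character's basic dictionary."""
    return standard_group.get(character, {}).get(phrase)


def second_extraction(other_groups, dictionary_key, phrase):
    """Look up the phrase in the dictionaries of the other function groups."""
    for group in other_groups.values():
        response = group.get(dictionary_key, {}).get(phrase)
        if response is not None:
            return response
    return None


def select_response(standard_group, other_groups, character, dictionary_key, phrase):
    """Return the response text, or None when neither lookup succeeds.

    None corresponds to the failure case, in which response processing ends.
    """
    response = first_extraction(standard_group, character, phrase)
    if response is None:
        response = second_extraction(other_groups, dictionary_key, phrase)
    return response


# Sample data: the greeting is character A's entry from the basic dictionary;
# the weather entry is hypothetical.
standard = {"A": {"おはよう": "おはようございます、いい朝ですね"}}
others = {"weather_forecast": {"X": {"今日の天気は？": "今日は晴れです"}}}

greeting = select_response(standard, others, "A", "X", "おはよう")
weather = select_response(standard, others, "A", "X", "今日の天気は？")
failed = select_response(standard, others, "A", "X", "こんばんは")
```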

音声合成部４１０は、選択された応答内容と、設定されているキャラクタの音素パターンとに基づいて音声合成して音声データを生成する。 The speech synthesizer 410 synthesizes speech based on the selected response content and the set phoneme pattern of the character to generate speech data.

データ出力部４１２は、生成した音声データを通信部５４０を介して掃除ロボット１００に送信する。掃除ロボット１００は、通信部６４０を介して当該音声データを受信して、出力部６６０から音声を再生して出力する。 The data output unit 412 transmits the generated voice data to the cleaning robot 100 via the communication unit 540. The cleaning robot 100 receives the audio data via the communication unit 640 and reproduces and outputs the audio from the output unit 660.

目覚まし実行部４１８は、目覚まし機能が有効に設定されている場合に、予め設定した時刻に通知処理を実行するように指示する。具体的には、目覚まし実行部４１８は、時刻管理機能を有しており、設定された時刻となった場合に通知処理の指示を応答実行部４０４に出力する。 The alarm execution unit 418 instructs to execute the notification process at a preset time when the alarm function is set to be valid. Specifically, the alarm execution unit 418 has a time management function, and outputs a notification processing instruction to the response execution unit 404 when the set time is reached.

機能辞書群受付部４４０は、データベース５３１に追記する新たな機能を追加する際の応答処理を実行するための辞書群を受け付ける。ここで、受け付ける機能辞書群は、後述する抽象キャラクタに対応する辞書である。 The function dictionary group reception unit 440 receives a dictionary group for executing a response process when a new function to be added to the database 531 is added. Here, the function dictionary group to be accepted is a dictionary corresponding to an abstract character described later.

追加基本辞書受付部４５０は、データベース５３１に追記する基本辞書を受け付ける。ここで、受け付ける基本辞書は、後述する新規のキャラクタに対応して追加する辞書である。 The additional basic dictionary receiving unit 450 receives a basic dictionary to be added to the database 531. Here, the basic dictionary to be accepted is a dictionary to be added corresponding to a new character to be described later.

登録部４４２は、機能辞書群受付部４４０で受け付けた機能辞書群をデータベース５３１に追記する。登録部４４２は、追加基本辞書受付部４５０で受け付けた基本辞書をデータベース５３１に追記する。具体的には、登録部４４２は、新規のキャラクタに対応する基本辞書を追加する際には、データベースに登録されている抽象キャラクタと当該新規のキャラクタとを関連付けて登録する。 The registration unit 442 adds the function dictionary group received by the function dictionary group reception unit 440 to the database 531. The registration unit 442 adds the basic dictionary received by the additional basic dictionary reception unit 450 to the database 531. Specifically, when adding a basic dictionary corresponding to a new character, the registration unit 442 registers the abstract character registered in the database and the new character in association with each other.

(Database configuration)
FIG. 6 is a diagram illustrating the configuration of the database 531 based on the first embodiment.

図６に示されるように、データベース５３１は、記憶部５３０に格納されるとともに、追記可能に構成されている。 As shown in FIG. 6, the database 531 is stored in the storage unit 530 and is configured to be additionally writable.

The database 531 stores, for each function, a dictionary for executing response processing.
In this example, the database 531 includes a standard conversation function dictionary group 532 for executing standard conversation response processing. The standard conversation function dictionary group 532 also includes a dictionary for executing clock response processing (clock function).

The database 531 further includes a weather forecast function dictionary group 534 for executing weather forecast response processing (weather forecast function) and an alarm function dictionary group 538 for executing alarm response processing (alarm function).

天気予報機能辞書群５３４は、抽象キャラクタＸ追加辞書５３４Ａと、抽象キャラクタＹ追加辞書５３４Ｂとを含む。 The weather forecast function dictionary group 534 includes an abstract character X additional dictionary 534A and an abstract character Y additional dictionary 534B.

目覚まし機能辞書群５３８は、抽象キャラクタＸ追加辞書５３８Ａと、抽象キャラクタＹ追加辞書５３８Ｂとを含む。 The alarm function dictionary group 538 includes an abstract character X additional dictionary 538A and an abstract character Y additional dictionary 538B.

また、データベース５３１には、登録部４４２により生成された複数のキャラクタと抽象キャラクタとの関連付けを示す対応テーブル５３９が設けられる。対応テーブル５３９には、キャラクタＡ，Ｂと抽象キャラクタＸとが対応付けられて登録されている。キャラクタＣ，Ｄと抽象キャラクタＹとが対応付けられて登録されている。 Further, the database 531 is provided with a correspondence table 539 indicating associations between a plurality of characters generated by the registration unit 442 and abstract characters. In the correspondence table 539, the characters A and B and the abstract character X are registered in association with each other. Characters C and D and abstract character Y are registered in association with each other.
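The layout of database 531 can be pictured as nested mappings. This is a sketch only: the key names and the helper `dictionary_for` are hypothetical, and the string values are just the reference numerals of the dictionaries standing in for their contents.

```python
DATABASE = {
    # Standard conversation function dictionary group 532: one basic
    # dictionary per character.
    "standard_conversation_532": {"A": "532A", "B": "532B", "C": "532C", "D": "532D"},
    # Added function groups: one additional dictionary per abstract character.
    "weather_forecast_534": {"X": "534A", "Y": "534B"},
    "alarm_538": {"X": "538A", "Y": "538B"},
    # Correspondence table 539: character -> abstract character.
    "correspondence_539": {"A": "X", "B": "X", "C": "Y", "D": "Y"},
}


def dictionary_for(database, group, character):
    """Return the dictionary a character uses within the given group.

    If the group has no dictionary for the character itself, the
    character's abstract character is looked up in the correspondence table.
    """
    entries = database[group]
    if character in entries:
        return entries[character]
    abstract = database["correspondence_539"][character]
    return entries[abstract]


weather_for_b = dictionary_for(DATABASE, "weather_forecast_534", "B")
alarm_for_d = dictionary_for(DATABASE, "alarm_538", "D")
standard_for_c = dictionary_for(DATABASE, "standard_conversation_532", "C")
```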

When a new character is added, the registration unit 442 generates a new correspondence table indicating the character associations, that is, a correspondence table in which the abstract character and the new character are associated.

本例においては、一例として天気予報機能を実行するための天気予報機能辞書群５３４と、目覚まし機能を実行するための目覚まし機能辞書群５３８とは、データベース５３１に新たに追記したものである。具体的には、天気予報機能辞書群５３４および目覚まし機能辞書群５３８は、機能辞書群受付部４４０で受け付けられて、登録部４４２によりデータベース５３１に格納される。 In this example, a weather forecast function dictionary group 534 for executing a weather forecast function and an alarm function dictionary group 538 for executing an alarm function are newly added to the database 531 as an example. Specifically, the weather forecast function dictionary group 534 and the alarm function dictionary group 538 are received by the function dictionary group reception unit 440 and stored in the database 531 by the registration unit 442.

サーバ３００に当該新たな辞書群を追加することにより、種々の音声出力パターンを追加し機能を拡張することが可能となる。 By adding the new dictionary group to the server 300, it is possible to add various voice output patterns and expand the functions.

ここで、標準会話機能辞書群５３２は、キャラクタＡ基本辞書５３２Ａと、キャラクタＢ基本辞書５３２Ｂと、キャラクタＣ基本辞書５３２Ｃと、キャラクタＤ基本辞書５３２Ｄとを含む。 Here, the standard conversation function dictionary group 532 includes a character A basic dictionary 532A, a character B basic dictionary 532B, a character C basic dictionary 532C, and a character D basic dictionary 532D.

In other words, basic dictionaries for four characters are provided in advance for standard conversation.
Therefore, in general, when a function is extended (added), a new additional dictionary would need to be provided for each character to match the four basic dictionaries provided in advance.

一方で、全てのキャラクタに合わせた追加辞書を作成することは、機能を追加する側の負荷を強いることになり、キャラクタ数が多い場合には処理が煩雑になる。 On the other hand, creating an additional dictionary tailored to all characters imposes a load on the side of adding a function, and the process becomes complicated when the number of characters is large.

Therefore, in this example, rather than creating and adding an additional dictionary tailored to every character, an abstract character is defined from a plurality of characters by retaining the features that similar characters share and removing the features unique to each, and an additional dictionary (text data) is created for that abstract character.

For example, an abstract character X is defined from character A and character B by retaining their common, general polite tone, such as the sentence endings 「です」 (desu) and 「ます」 (masu), and removing the tone characteristic of each.

また、キャラクタＣおよびキャラクタＤから、共通の一般的な友達口調を残して、固有の特徴的な口調を排した抽象キャラクタＹを定義する。 Further, an abstract character Y is defined from the character C and the character D, leaving a common general friend tone and excluding a unique characteristic tone.

Then, an additional dictionary for the polite-tone abstract character X and an additional dictionary for the friendly-tone abstract character Y are created.

Then, character A and character B, which converse in a polite tone using 「です」 (desu), 「ます」 (masu), and the like, are associated with the polite-tone abstract character X.

Similarly, character C and character D, which use a friendly tone such as 「だよん」 (dayon), are associated with the friendly-tone abstract character Y.

本例においては、機能を拡張（追加）して、当該機能を利用する場合には、抽象キャラクタＸあるいは抽象キャラクタＹの追加辞書を利用して、設定されているキャラクタの音素パターンを用いて音声合成により音声データを生成する。 In this example, when the function is expanded (added) and the function is used, an additional dictionary of the abstract character X or abstract character Y is used to generate a sound using the set phoneme pattern of the character. Audio data is generated by synthesis.

たとえば、ユーザがキャラクタＡあるいはＢを設定している場合に、抽象キャラクタＸの追加辞書を利用してキャラクタＡあるいはＢの音素パターンと音声合成する。追加辞書は丁寧な口調として定義されているため丁寧な口調で会話するキャラクタＡあるいはＢで音声合成により作成された音声データは違和感を生じさせることなく自然な会話を維持することが可能である。 For example, when the user has set character A or B, voice synthesis is performed with the phoneme pattern of character A or B using the additional dictionary of abstract character X. Since the additional dictionary is defined as a polite tone, the voice data created by voice synthesis with the character A or B talking in a polite tone can maintain a natural conversation without causing a sense of incongruity.

Similarly, when the user has set character C or D, speech is synthesized using the additional dictionary of abstract character Y and the phoneme pattern of character C or D. Because the additional dictionary is defined in a friendly tone, voice data synthesized for character C or D, which converse in a friendly tone, does not sound out of place, and a natural conversation can be maintained.
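The pairing of an abstract character's text with the selected character's phoneme pattern can be sketched as follows. The names `SynthesisRequest` and `build_request`, the phoneme pattern identifiers, and the weather phrases are all hypothetical; only the A/B to X and C/D to Y associations come from the description.

```python
from dataclasses import dataclass

# Correspondence between characters and abstract characters.
CORRESPONDENCE = {"A": "X", "B": "X", "C": "Y", "D": "Y"}

# Hypothetical identifiers for each character's phoneme pattern.
PHONEME_PATTERNS = {name: f"phoneme-pattern-{name}" for name in "ABCD"}

# Hypothetical weather forecast additional dictionaries:
# X uses a polite tone, Y uses a friendly tone.
WEATHER_ADDITIONAL = {
    "X": {"今日の天気は？": "今日は晴れです"},
    "Y": {"今日の天気は？": "今日は晴れだよ"},
}


@dataclass(frozen=True)
class SynthesisRequest:
    text: str             # taken from the abstract character's dictionary
    phoneme_pattern: str  # taken from the character the user selected


def build_request(character, phrase):
    """Combine the shared response text with the character's own voice."""
    abstract = CORRESPONDENCE[character]
    text = WEATHER_ADDITIONAL[abstract][phrase]
    return SynthesisRequest(text=text, phoneme_pattern=PHONEME_PATTERNS[character])


request_a = build_request("A", "今日の天気は？")
request_b = build_request("B", "今日の天気は？")
request_c = build_request("C", "今日の天気は？")
```

Characters A and B receive the same text but different phoneme patterns, which is what lets one additional dictionary serve several voices.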

なお、本例においては、２つの抽象キャラクタ、丁寧口調の抽象キャラクタと、友達口調の抽象キャラクタを設ける場合について説明したが、これに限られず、たとえば、男性口調の抽象キャラクタや、女性口調の抽象キャラクタを定義して、追加辞書を作成するようにしても良い。なお、本例においては一例として口調により抽象キャラクタを定義する場合について説明したが、特に口調に限られず、別に定義することも当然に可能である。 In this example, two abstract characters, a polite tone abstract character, and a friend tone abstract character have been described. However, the present invention is not limited to this. For example, a male tone abstract character and a woman tone abstract character are provided. An additional dictionary may be created by defining a character. In this example, the case where an abstract character is defined by tone is described as an example. However, the present invention is not limited to tone, and can be defined separately.

当該方式により、例えば、新たに機能を追加する場合、全てのキャラクタにそれぞれ合わせた追加辞書を作成して追記するのではなく、抽象的なキャラクタを定義して複数のキャラクタと関連付けて当該抽象キャラクタの辞書を利用することにより、機能を追加して辞書を作成する側の負荷を軽減することが可能である。また、追加する辞書数を減らすことができ管理しやすく、また、辞書を記憶させる容量を小さくすることも可能である。 For example, when a new function is added by this method, instead of creating and adding an additional dictionary for each character, an abstract character is defined and associated with a plurality of characters. By using this dictionary, it is possible to reduce the load on the side of creating a dictionary by adding functions. Further, the number of dictionaries to be added can be reduced and management is easy, and the capacity for storing dictionaries can be reduced.
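The reduction in the number of dictionaries can be checked with simple arithmetic, using the counts from this example: four characters (A to D), two abstract characters (X and Y), and two added functions (weather forecast and alarm). The function name is hypothetical.

```python
def dictionaries_to_add(num_dictionary_owners, num_new_functions):
    """One additional dictionary per owner (character or abstract character) per function."""
    return num_dictionary_owners * num_new_functions


# One additional dictionary for every character and every added function.
per_character = dictionaries_to_add(num_dictionary_owners=4, num_new_functions=2)

# One additional dictionary for every abstract character and every added function.
per_abstract_character = dictionaries_to_add(num_dictionary_owners=2, num_new_functions=2)
```

With these counts the per-character approach needs 8 additional dictionaries, while the abstract-character approach needs 4, matching dictionaries 534A, 534B, 538A, and 538B.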

また、新規のキャラクタ（たとえばキャラクタＥ）を作成する場合には、基本辞書のみを追加する。データベース５３１には、登録部４４２により新規のキャラクタＥと抽象キャラクタとが関連付けられた対応テーブル５３９が生成される。例えば、新規のキャラクタＥが丁寧な口調で会話するキャラクタであれば抽象キャラクタＸと関連付けられて登録される。これにより、キャラクタを作成する場合に全ての機能に対応する辞書を追加する必要はなく、例えば標準会話の応答処理を実行するための辞書を作成すれば、他の機能については抽象キャラクタの辞書を用いることにより自然な会話を維持しつつ応答処理することが可能である。したがって、新規のキャラクタに対応する基本辞書を作成する側の負荷を軽減することが可能である。 Further, when creating a new character (for example, character E), only the basic dictionary is added. In the database 531, a correspondence table 539 in which a new character E and an abstract character are associated is generated by the registration unit 442. For example, if the new character E is a character that speaks in a polite tone, it is registered in association with the abstract character X. Thus, when creating a character, it is not necessary to add a dictionary corresponding to all functions. For example, if a dictionary for executing response processing of a standard conversation is created, an abstract character dictionary is used for other functions. By using it, it is possible to perform response processing while maintaining a natural conversation. Therefore, it is possible to reduce the load on the side of creating a basic dictionary corresponding to a new character.
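Adding a new character E can be sketched as registering one basic dictionary plus one association. The function names, the dictionary keys, and all dictionary contents shown are hypothetical; the point is that the function dictionary groups are left untouched.

```python
def register_character(database, name, basic_dictionary, abstract_character):
    """Add a basic dictionary and associate the character with an abstract character."""
    known_abstract = set(database["correspondence"].values())
    if abstract_character not in known_abstract:
        raise ValueError(f"unknown abstract character: {abstract_character}")
    database["standard_conversation"][name] = basic_dictionary
    database["correspondence"][name] = abstract_character


def additional_dictionary(database, group, character):
    """Return the additional dictionary a character uses for an added function."""
    return database[group][database["correspondence"][character]]


database = {
    "standard_conversation": {"A": {}, "B": {}, "C": {}, "D": {}},
    "weather_forecast": {"X": {"tone": "polite"}, "Y": {"tone": "friendly"}},
    "correspondence": {"A": "X", "B": "X", "C": "Y", "D": "Y"},
}

# Character E speaks politely, so it is associated with abstract character X.
register_character(database, "E", {"おはよう": "おはようございます"}, "X")
weather_for_e = additional_dictionary(database, "weather_forecast", "E")
```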

FIG. 7 is a diagram illustrating a specific example of the database 531 based on the first embodiment.
As an example, the database 531 is stored in the storage unit 530 included in the server 300 based on the embodiment.

図７（Ａ）においては、標準会話機能辞書群５３２の各キャラクタ毎に設けられている基本辞書から認識フレーズ「おはよう」、「今何時？」に対する応答処理を整理したテーブルが示されている。データベース５３１は、応答処理の内容としてテキストデータが格納される。 FIG. 7A shows a table in which the response processes for the recognition phrases “Good morning” and “What time is it?” Are arranged from the basic dictionary provided for each character in the standard conversation function dictionary group 532. The database 531 stores text data as the contents of response processing.

When the recognition phrase 「おはよう」 ("good morning") is recognized, the answer phrase differs for each character. As an example, for character A, 「おはようございます、いい朝ですね」 ("Good morning, it is a lovely morning, isn't it") is determined as the response content. For character B, 「おはようございます」 (a polite "Good morning") is determined as the response content. For character C, 「おはよー」 (a casual, drawn-out "Morning") is determined as the response content. For character D, 「おっはー」 (a playful variant of the same greeting) is determined as the response content.

なお、認識フレーズ「おはよう」を認識した場合のコマンドは「なし」に設定されている。したがって、通常の標準会話となる。 The command when the recognition phrase “good morning” is recognized is set to “none”. Therefore, it becomes a normal standard conversation.

As another example, when the recognition phrase “今何時？” is recognized, the answer phrase again differs for each character. For character A, “...” followed by “ですよ” is determined as the response content. Here, the information acquired by the command is inserted into the “...” portion. Specifically, the command for the recognition phrase “今何時？” is set to “acquire time” (時刻取得), so the current time is acquired. For example, when “８時” (8 o'clock) is acquired, “８時ですよ” (“It's 8 o'clock”) is determined as the response content.

For character B, “...” followed by “でございます” is determined as the response content. The “...” portion is again filled with the information acquired by the “acquire time” command set for this recognition phrase. For example, when “８時” is acquired, “８時でございます” (a formal “It is 8 o'clock”) is determined as the response content.

For character C, “...” followed by “だよん” is determined as the response content. The “...” portion is again filled with the information acquired by the “acquire time” command. For example, when “８時” is acquired, “８時だよん” (a casual “It's 8 o'clock”) is determined as the response content.

For character D, “...” followed by “だワン” is determined as the response content. The “...” portion is again filled with the information acquired by the “acquire time” command. For example, when “８時” is acquired, “８時だワン” (“It's 8 o'clock, woof”) is determined as the response content.
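One way to picture the table of FIG. 7(A) is as a per-character mapping from recognition phrase to a command and an answer template. The sketch below is illustrative only: the data layout, the `{}` placeholder standing in for the “...” portion, and the fixed value returned by `acquire_time` are assumptions, not the format used by the database 531.

```python
from typing import Callable, Optional


def acquire_time() -> str:
    # Hypothetical stand-in for the "acquire time" command; a real system
    # would read the clock instead of returning a fixed value.
    return "８時"


COMMANDS: dict[str, Callable[[], str]] = {"acquire_time": acquire_time}

# character -> recognition phrase -> (command name or None, answer template)
BASIC_DICTIONARIES: dict[str, dict[str, tuple[Optional[str], str]]] = {
    "A": {"おはよう": (None, "おはようございます、いい朝ですね"),
          "今何時？": ("acquire_time", "{}ですよ")},
    "B": {"おはよう": (None, "おはようございます"),
          "今何時？": ("acquire_time", "{}でございます")},
    "C": {"おはよう": (None, "おはよー"),
          "今何時？": ("acquire_time", "{}だよん")},
    "D": {"おはよう": (None, "おっはー"),
          "今何時？": ("acquire_time", "{}だワン")},
}


def answer(character: str, phrase: str) -> Optional[str]:
    """Return the answer phrase, or None if the phrase is not in the dictionary."""
    entry = BASIC_DICTIONARIES[character].get(phrase)
    if entry is None:
        return None
    command, template = entry
    if command is None:  # command "none": ordinary standard conversation
        return template
    return template.format(COMMANDS[command]())  # fill the "..." portion
```

Keeping the command separate from the template means the same command result can be phrased differently for each character.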

FIG. 7(B) shows a table summarizing the response processing for recognition phrases, taken from the abstract character additional dictionaries provided in the alarm function dictionary group 538 and the weather forecast function dictionary group 534.

The case where the recognition phrase “今日の天気は？” (“What's the weather today?”) is recognized will be described.
When the recognition phrase “今日の天気は？” is recognized, the answer phrase differs for each abstract character. As an example, for the abstract character X, “...” followed by “の予報です” is determined as the response content. Here, the information acquired by the command is inserted into the “...” portion. Specifically, the command for the recognition phrase “今日の天気は？” is set to “acquire weather information” (天気情報取得), so weather information is acquired. For example, when “晴れ” (sunny) is acquired, “晴れの予報です” (“The forecast is sunny”) is determined as the response content.

For the abstract character Y, “...” followed by “そうだよ” is determined as the response content. Here, the information acquired by the command is inserted into the “...” portion. Specifically, the command for the recognition phrase “今日の天気は？” is set to “acquire weather information”, so weather information is acquired. For example, when “晴れ” is acquired, “晴れそうだよ” (“Looks like it'll be sunny”) is determined as the response content.

As described above, rather than creating a dictionary for every character, dictionaries are created and used for abstract characters, each of which is associated with a plurality of characters. This reduces the load on the side that creates the dictionaries.

In this example, each abstract character is defined to cover two characters, so the number of dictionaries is halved.

Next, the case of the recognition phrase “（目覚まし）” (alarm) will be described. This function is not response processing for an utterance from the user. It is response processing that automatically executes the alarm function (command) at a preset time.

Specifically, response processing for the recognition phrase “（目覚まし）” is executed on the assumption that an alarm utterance from the user has been recognized. That is, when an instruction is issued by the alarm execution unit 418, the recognition phrase “（目覚まし）” is judged to have been recognized.

When the recognition phrase “（目覚まし）” is recognized, the answer phrase differs for each abstract character. As an example, for the abstract character X, “朝です、起きてください” (“It's morning, please wake up”) is determined as the response content. For the abstract character Y, “朝だよ、起きて” (“It's morning, get up”) is determined as the response content.
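The table of FIG. 7(B) can be sketched in the same spirit: additional dictionaries are keyed by abstract character, and a correspondence table maps each concrete character to one of them. In this sketch only the A to X association is taken from the description; the associations for B, C, and D, the data layout, and the fixed forecast returned by `acquire_weather` are assumptions made for illustration.

```python
from typing import Callable, Optional

Entry = tuple[Optional[Callable[[], str]], str]  # (command or None, template)


def acquire_weather() -> str:
    # Hypothetical stand-in for the "acquire weather information" command.
    return "晴れ"


# Correspondence table. Only A -> X is stated in the description;
# the remaining rows are assumptions for this sketch.
CORRESPONDENCE = {"A": "X", "B": "X", "C": "Y", "D": "Y"}

# function -> abstract character -> recognition phrase -> entry
ADDITIONAL: dict[str, dict[str, dict[str, Entry]]] = {
    "weather": {
        "X": {"今日の天気は？": (acquire_weather, "{}の予報です")},
        "Y": {"今日の天気は？": (acquire_weather, "{}そうだよ")},
    },
    "alarm": {
        "X": {"（目覚まし）": (None, "朝です、起きてください")},
        "Y": {"（目覚まし）": (None, "朝だよ、起きて")},
    },
}


def answer(character: str, phrase: str) -> Optional[str]:
    """Look the phrase up in the dictionaries of the associated abstract character."""
    abstract = CORRESPONDENCE[character]
    for per_abstract in ADDITIONAL.values():
        entry = per_abstract[abstract].get(phrase)
        if entry is not None:
            command, template = entry
            return template.format(command()) if command else template
    return None
```

With four characters and two abstract characters, each function needs two dictionaries instead of four, which matches the halving noted above.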

Therefore, for a vendor who adds a new function to the server, for example, when abstract characters are defined it is sufficient to create dictionaries for those abstract characters. There is no need to create a dictionary for every individual character, and the burden of creating dictionaries is reduced.

The burden of creating dictionaries is also reduced when an ordinary character, rather than an abstract character, is added.

For example, when an ordinary character is added, there is no need to write answer phrases for every function into its basic dictionary. Specifically, for functions that have already been added, the additional dictionaries of the abstract character associated with the character can be used, so it is enough to create, for example, a dictionary for standard-conversation response processing. This reduces the burden of creating dictionaries when an ordinary character is added as well.

Likewise, when a new function is added (installed), there is no need to create and append an additional dictionary tailored to every character. If, for example, only the dictionaries corresponding to the abstract characters X and Y are created, a natural conversation with no sense of incongruity can be maintained. This reduces the load on the side that adds the function and creates its dictionaries.

(Flow diagram)
FIG. 8 is a diagram illustrating the flow of the server 300 based on the first embodiment.

As shown in FIG. 8, it is determined whether voice data has been received (step S1). Specifically, the voice input receiving unit 414 receives the voice data.

If it is determined in step S1 that voice data has been received (YES in step S1), voice recognition is then executed (step S2). Specifically, the voice recognition unit 416 acquires a recognition phrase from the voice data received from the voice input receiving unit 414.

Next, the character is confirmed (step S3). Specifically, the first extraction unit 406 confirms the character set by the character setting unit 402.

Next, a basic dictionary is set (step S4). The first extraction unit 406 sets the basic dictionary corresponding to the set character. For example, when character A is set, the character A basic dictionary 532A is set.

Next, it is determined whether the basic dictionary contains the recognition phrase (step S5). The first extraction unit 406 determines whether the recognition phrase is included in the set basic dictionary. For example, it determines whether the recognition phrase is included in the character A basic dictionary 532A.

If it is determined in step S5 that the set basic dictionary contains the recognition phrase (YES in step S5), an answer phrase is determined (step S6). The first extraction unit 406 determines the answer phrase from the set basic dictionary.

Next, speech synthesis is performed (step S7). Specifically, the speech synthesis unit 410 synthesizes speech based on the determined answer phrase and the phoneme pattern of the set character, and generates voice data.

Next, the voice data is output (step S7#). Specifically, the data output unit 412 outputs the voice data generated by the speech synthesis unit 410 to the cleaning robot 100 via the communication unit 540.

Then, the process ends (END).
As a result, the cleaning robot 100 receives the voice data generated by the server 300 and plays it back.

On the other hand, if it is determined in step S5 that the basic dictionary does not contain the recognition phrase (NO in step S5), an abstract character is set (step S8). Specifically, the second extraction unit 408 sets the abstract character associated with the character set by the character setting unit 402. For example, when character A is set, the abstract character X is set.

Next, an additional dictionary is set (step S9). Specifically, the second extraction unit 408 sets the additional dictionary corresponding to the set abstract character. For example, when the abstract character X is set, the abstract character X additional dictionary is set.

Next, it is determined whether the additional dictionary contains the recognition phrase (step S10). Specifically, the second extraction unit 408 determines whether the abstract character X additional dictionary 534A contains the recognition phrase.

If it is determined that the additional dictionary contains the recognition phrase (YES in step S10), it is determined whether there is a command (step S11). When the second extraction unit 408 determines that the abstract character X additional dictionary 534A contains the recognition phrase, it determines whether a command corresponds to that recognition phrase. For example, for the recognition phrase “今日の天気は？”, the command “acquire weather information” is associated with the phrase, so it is determined that there is a command.

If it is determined in step S11 that there is a command (YES in step S11), the command is executed (step S12). When the second extraction unit 408 determines that the command “acquire weather information” corresponds to the recognition phrase “今日の天気は？”, it acquires weather information. Specifically, it accesses the external device 50 in accordance with the command and acquires information on the weather forecast.

Then, an answer phrase is determined (step S6). The second extraction unit 408 determines the answer phrase based on the weather forecast information acquired by executing the command. For example, “晴れの予報です” is determined.
Next, speech synthesis is performed (step S7). Specifically, the speech synthesis unit 410 synthesizes speech based on the determined answer phrase and the phoneme pattern of the set character, and generates voice data. For example, when character A is set, voice data is generated based on the phoneme pattern of character A and the answer phrase “晴れの予報です”.

Then, the process ends (END).
On the other hand, if it is determined in step S11 that there is no command (NO in step S11), the answer phrase is determined without executing a command (step S6). The second extraction unit 408 determines the answer phrase from the abstract character additional dictionary.

Next, speech synthesis is performed (step S7). Specifically, the speech synthesis unit 410 synthesizes speech based on the determined answer phrase and the phoneme pattern of the set character, and generates voice data. For example, when character A is set, voice data is generated based on the phoneme pattern of character A and the answer phrase.

Then, the process ends (END).
On the other hand, if it is determined in step S10 that the additional dictionary does not contain the recognition phrase (NO in step S10), it is determined whether there is another additional dictionary (step S13). When the second extraction unit 408 determines that the set abstract character X additional dictionary 534A does not contain the recognition phrase, it determines whether there is another additional dictionary.

If it is determined in step S13 that there is another additional dictionary (YES in step S13), the process returns to step S9. When the second extraction unit 408 determines that there is another additional dictionary, it sets that additional dictionary, and the above processing is repeated.

On the other hand, if it is determined in step S13 that there is no other additional dictionary (NO in step S13), the response processing for the recognition phrase ends (END).

Note that the alarm execution unit 418 virtually outputs voice data for “（目覚まし）” to the response execution unit 404 at the preset time. The voice recognition unit 416 then recognizes it, and the answer phrase is determined by the processing described above.
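The dictionary lookup portion of this flow (steps S3 to S13, leaving out voice recognition, speech synthesis, and output) can be sketched as below. This is a simplified illustration under stated assumptions: the table contents are reduced to character A, `acquire_weather` is a hypothetical stand-in for the access to the external device 50, and for brevity the command check is applied to basic dictionary entries as well, whereas the described flow performs step S11 only on the additional dictionary path.

```python
from typing import Callable, Optional

Entry = tuple[Optional[Callable[[], str]], str]  # (command or None, template)


def acquire_weather() -> str:
    # Hypothetical stand-in for querying the external device for the forecast.
    return "晴れ"


BASIC: dict[str, dict[str, Entry]] = {
    "A": {"おはよう": (None, "おはようございます、いい朝ですね")},
}
CORRESPONDENCE = {"A": "X"}
# One element per added function, each keyed by abstract character.
ADDITIONAL: list[dict[str, dict[str, Entry]]] = [
    {"X": {"（目覚まし）": (None, "朝です、起きてください")}},
    {"X": {"今日の天気は？": (acquire_weather, "{}の予報です")}},
]


def decide_answer(character: str, phrase: str) -> Optional[str]:
    entry = BASIC[character].get(phrase)              # S4, S5
    if entry is None:
        abstract = CORRESPONDENCE[character]          # S8
        for group in ADDITIONAL:                      # S9, repeated via S13
            entry = group.get(abstract, {}).get(phrase)  # S10
            if entry is not None:
                break
        else:
            return None                               # S13: no other dictionary
    command, template = entry
    if command is not None:                           # S11
        return template.format(command())             # S12, then S6
    return template                                   # S6
```

The alarm case goes through the same function: the alarm execution unit would supply “（目覚まし）” as if it had been recognized from speech.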

<Embodiment 2>
The above embodiment described a scheme in which the first extraction unit 406 first sets the basic dictionary, and the second extraction unit 408 sets an additional dictionary and checks it for the recognition phrase only when the basic dictionary is found not to contain the phrase. The scheme is not limited to this. The order may be reversed: the second extraction unit 408 may first set the additional dictionary corresponding to the abstract character and check it for the recognition phrase, and when the phrase is not found, the first extraction unit 406 may set the basic dictionary and check whether it contains the phrase.

When added functions are used frequently, processing with the second extraction unit 408 before the first extraction unit 406 allows the added functions to be handled more quickly.

<Embodiment 3>
The above embodiments described the case where the first extraction unit 406 and the second extraction unit 408 perform their processing one after the other. The scheme is not limited to this, and the processing of the first extraction unit 406 and the second extraction unit 408 may be executed in parallel.

A plurality of selectable modes may also be provided, with one of them being executed. Specifically, the processing may be switched according to the mode. For example, a basic mode that uses only the basic dictionary, an extended mode that uses only the additional dictionaries, and a specific mode that uses only a particular one of the additional dictionaries may be provided. In accordance with the designated mode, the response processing is then switched between processing by the first extraction unit 406 using the basic dictionary and processing by the second extraction unit 408 using an additional dictionary.
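The mode switching can be illustrated with a small sketch. The `Mode` enumeration, the function names, and the dictionary names "weather" and "alarm" are assumptions introduced here, and the answers are stored as fixed strings (commands already resolved) to keep the example short.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    BASIC = "basic"        # basic dictionary only
    EXTENDED = "extended"  # additional dictionaries only
    SPECIFIC = "specific"  # one named additional dictionary only


# Simplified dictionaries for a single character; answers are pre-resolved strings.
BASIC_DICTIONARY = {"おはよう": "おはようございます、いい朝ですね"}
ADDITIONAL_DICTIONARIES = {
    "weather": {"今日の天気は？": "晴れの予報です"},
    "alarm": {"（目覚まし）": "朝です、起きてください"},
}


def dictionaries_for(mode: Mode, specific: Optional[str] = None) -> list[dict[str, str]]:
    """Select which dictionaries the response processing may consult."""
    if mode is Mode.BASIC:
        return [BASIC_DICTIONARY]
    if mode is Mode.EXTENDED:
        return list(ADDITIONAL_DICTIONARIES.values())
    if specific is None:
        raise ValueError("specific mode needs the name of an additional dictionary")
    return [ADDITIONAL_DICTIONARIES[specific]]


def answer(phrase: str, mode: Mode, specific: Optional[str] = None) -> Optional[str]:
    for dictionary in dictionaries_for(mode, specific):
        if phrase in dictionary:
            return dictionary[phrase]
    return None
```

In this sketch a phrase outside the dictionaries allowed by the mode simply yields no answer.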

<Embodiment 4>
(Functional block diagram of the audio output system based on Embodiment 4)
FIG. 9 is a block diagram illustrating the functions of the audio output system based on the fourth embodiment.

As shown in FIG. 9, this system differs in that the cleaning robot 100 is replaced with a cleaning robot 110 and the server 300 is replaced with a server 310.

Specifically, the cleaning robot 110 includes a CPU 611 and a database 631.
The CPU 611 includes a character selection input reception unit 200, a voice input reception unit 214, a character setting unit 202, a response execution unit 204, and an alarm execution unit 218.

The response execution unit 204 includes an extraction unit 206, an acquisition unit 208, a voice recognition unit 216, a speech synthesis unit 210, and a data output unit 212.

The server 310 includes a CPU 511 and a database 531A.
The CPU 511 includes a data extraction unit 420 and a data output unit 430.

Although omitted here, the function dictionary group reception unit 440, the additional basic dictionary reception unit 450, and the registration unit 442 described in the first embodiment may additionally be provided.

In the configuration of the fourth embodiment, the cleaning robot 110 has the main functions that the server 300 described in the first embodiment provides.

The database 631 in this example includes the standard conversation function dictionary group 532 described with reference to FIG. 6. The database 531A in this example includes the alarm function dictionary group 538 and the weather forecast function dictionary group 534 described with reference to FIG. 6.

Specifically, the database 631 is used for the standard conversation function, and the database 531A is used for the added functions.

The character selection input reception unit 200, character setting unit 202, voice input reception unit 214, alarm execution unit 218, extraction unit 206, voice recognition unit 216, speech synthesis unit 210, and data output unit 212 are basically the same as the character selection input reception unit 400, character setting unit 402, voice input receiving unit 414, alarm execution unit 418, first extraction unit 406, voice recognition unit 416, speech synthesis unit 410, and data output unit 412, respectively, and their detailed description will not be repeated.

Based on the recognition phrase recognized by the voice recognition unit 216, the acquisition unit 208 refers to the dictionaries of the additional dictionary group stored in the database 531A on the server 310 and acquires the response content (response information) corresponding to the voice content indicated by the voice data.

(Overview of response processing)
FIG. 10 is a sequence diagram illustrating the flow of response processing in the audio output system 1A based on the fourth embodiment.

As shown in FIG. 10, the user speaks to the cleaning robot 110 (this is also referred to as a user utterance) (sequence sq10).

The cleaning robot 110 accepts the voice input of the user utterance (sequence sq11). Specifically, the input unit 650 accepts sound input from outside via a microphone. The voice input reception unit 214 accepts the voice input from the input unit 650 and outputs it to the response execution unit 204.

Next, the response execution unit 204 of the cleaning robot 110 executes voice recognition (sequence sq12). Specifically, the voice recognition unit 216 recognizes the voice content from the voice input reception unit 214.

Next, in this example, the extraction unit 206 of the cleaning robot 110 determines that the standard conversation function dictionary group in the database 631 contains no dictionary corresponding to the recognition phrase (sequence sq13).

Next, the acquisition unit 208 of the cleaning robot 110 transmits data to the server 310 (sequence sq14). The data includes the recognized voice content together with information on the character that is set.

The server 310 performs response processing based on the data transmitted from the cleaning robot 110 (sequence sq15). Specifically, the data extraction unit 420 refers to the database 531A for the transmitted data and extracts the response content (response information) corresponding to the voice content.

The data extraction unit 420 performs the same processing as the second extraction unit 408 described in the first embodiment.

Next, the server 310 transmits the extracted response content data to the cleaning robot 110 (sequence sq16). Specifically, the data output unit 430 transmits the response content data to the cleaning robot 110 via the communication unit 540.

Next, the cleaning robot 110 executes output processing for the data received from the server 310 (sequence sq17). Specifically, the acquisition unit 208 acquires the response content data transmitted from the server and outputs it to the speech synthesis unit 210. The speech synthesis unit 210 synthesizes speech based on the answer phrase included in the received response content and on the set character, and generates voice data.

The cleaning robot 110 plays back sound based on the voice data (sequence sq18). Specifically, it reproduces the audio signal through a speaker. Through this processing, a response is produced in accordance with what the user said, and sound indicating the response content is output to the user.

This example described the case where the extraction unit 206 of the cleaning robot 110 determines that the standard conversation function dictionary group in the database 631 contains no dictionary corresponding to the recognition phrase.

On the other hand, when it is determined that the standard conversation function dictionary group in the database 631 does contain a dictionary corresponding to the recognition phrase, the extraction unit 206 refers to the database 631 and extracts the answer phrase without accessing the server 310. Speech is then synthesized based on the answer phrase and the set character to generate voice data, and sound is played back based on the voice data.

In this configuration, storing the basic dictionaries in the database 631 of the cleaning robot 110 and the additional dictionaries in the database 531A of the server 310 speeds up dialogue that uses the basic dictionaries on the cleaning robot 110.

In addition, because the additional dictionaries are held on the server 310 side, additional dictionaries can be added easily without regard to the storage capacity limits of the cleaning robot 110.
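The division of work between the cleaning robot 110 and the server 310 (sequences sq13 to sq16) can be sketched as a local lookup with a remote fallback. In this illustration the network exchange is replaced by an ordinary function call, and the function names, the table contents, and the returned "local"/"server" tag are assumptions made for the sketch.

```python
from typing import Callable, Optional

# Basic dictionaries held on the robot (stand-in for database 631).
LOCAL_BASIC = {"A": {"おはよう": "おはようございます、いい朝ですね"}}


def server_respond(character: str, phrase: str) -> Optional[str]:
    """Stand-in for the server side: look up additional dictionaries (database 531A)."""
    correspondence = {"A": "X"}
    additional = {"X": {"（目覚まし）": "朝です、起きてください"}}
    return additional.get(correspondence.get(character, ""), {}).get(phrase)


def robot_respond(
    character: str,
    phrase: str,
    ask_server: Callable[[str, str], Optional[str]] = server_respond,
) -> tuple[Optional[str], str]:
    """Return (answer phrase, where it came from)."""
    local = LOCAL_BASIC.get(character, {}).get(phrase)
    if local is not None:
        return local, "local"  # answered without accessing the server
    # sq14: send the recognized phrase together with the set character
    return ask_server(character, phrase), "server"
```

Passing `ask_server` as a parameter keeps the sketch testable and makes it visible that the server is contacted only when the local lookup fails.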

<Embodiment 5>
The fourth embodiment described a configuration in which the standard conversation function dictionary group is provided on the cleaning robot 110 side and the additional dictionary group is provided on the server 310 side. The configuration may also be reversed, with the additional dictionary group provided on the cleaning robot 110 side and the standard conversation function dictionary group provided on the server 310 side.

Only the placement of the databases differs, and the processing is the same.
This configuration speeds up dialogue for the added functions that use the additional dictionaries on the cleaning robot 110.

<Embodiment 6>
The above configurations described processing in which the cleaning robot and the server cooperate. The processing may also be performed by the cleaning robot alone, without cooperating with a server.

Specifically, this can be realized by including the configuration of the server 300 described with reference to FIG. 5 in the cleaning robot 100. Providing all the dictionaries in the cleaning robot 100 in this way speeds up dialogue for both the standard conversation and the added functions.

<Embodiment 7>
FIG. 11 is a diagram illustrating a configuration of servers based on the seventh embodiment.

Referring to FIG. 11, this example shows a case where a plurality of servers are provided.

In this example, a server 300A and a server 300B are provided as an example.

The configuration of the first embodiment described the case where voice recognition and the processing that determines the answer phrase (response mode) for the recognition result are executed on the same server. These processes may instead be executed on separate servers.

Specifically, the server 300A may perform voice recognition on the voice data, and the server 300B may output answer phrase data to the cleaning robot 100.

例えば、掃除ロボット１００から音声データをサーバ３００Ａに送信する（１）。サーバ３００Ａが音声データの音声認識を実行する（２）。そして、サーバ３００Ａが掃除ロボット１００に対して認識フレーズを送信する（３）。 For example, audio data is transmitted from the cleaning robot 100 to the server 300A (1). The server 300A performs voice recognition of the voice data (2). Then, the server 300A transmits the recognition phrase to the cleaning robot 100 (3).

The cleaning robot 100 receives the recognition phrase from the server 300A and transmits it to the other server 300B (4).

The server 300B receives the recognition phrase from the cleaning robot 100 and determines the answer phrase corresponding to it (5). The server 300B then transmits the answer phrase to the cleaning robot 100 (6).
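
The six-step exchange above can be sketched in Python as follows. This is a minimal illustration, not the implementation described in the disclosure: the class names `RecognitionServer`, `AnswerServer`, and `CleaningRobot`, the in-memory lookup tables standing in for a real speech recognizer and answer dictionary, and the fallback phrase are all assumptions, and network transport is replaced by direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RecognitionServer:
    """Stands in for server 300A: turns voice data into a recognition phrase."""
    # Hypothetical lookup table used in place of a real speech recognizer.
    known_audio: Dict[bytes, str] = field(default_factory=dict)

    def recognize(self, voice_data: bytes) -> str:
        # Step (2): recognize the speech; step (3): return the phrase to the robot.
        return self.known_audio.get(voice_data, "")


@dataclass
class AnswerServer:
    """Stands in for server 300B: maps a recognition phrase to an answer phrase."""
    answers: Dict[str, str] = field(default_factory=dict)
    fallback: str = "Sorry, I did not understand."  # assumed default answer

    def answer(self, recognition_phrase: str) -> str:
        # Step (5): determine the answer; step (6): return it to the robot.
        return self.answers.get(recognition_phrase, self.fallback)


@dataclass
class CleaningRobot:
    """Stands in for cleaning robot 100, which relays between the two servers."""
    recognizer: RecognitionServer
    responder: AnswerServer

    def respond(self, voice_data: bytes) -> str:
        phrase = self.recognizer.recognize(voice_data)  # steps (1) to (3)
        return self.responder.answer(phrase)            # steps (4) to (6)
```

Keeping the two servers behind separate objects mirrors the point of this embodiment: recognition and answer selection can be replaced or scaled independently, with the robot acting only as the relay.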

In this example, the server 300A transmits to the cleaning robot 100 the recognition phrase obtained by speech recognition of the voice data. The transmitted information is not limited to a recognition phrase, however, and may be any information that indicates the result of speech recognition. For example, it may be access information (such as a URL (Uniform Resource Locator)) needed to access an answer phrase stored in the server 300B. In that case, the cleaning robot 100 may receive the access information (URL) from the server 300A and acquire the answer phrase from the server 300B by accessing it. Nor is the information limited to access information: when the answer phrases stored in the server 300B are saved as files, the information indicating the result of speech recognition from the server 300A may be information that designates a file name. In that case, the cleaning robot 100 can receive the file name from the server 300A and request information from the server 300B by specifying the file name, thereby acquiring the file related to the answer phrase from the server 300B.
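
The alternatives described in this paragraph (recognition phrase, URL, or file name) can be sketched as a small dispatch on the kind of information received from the server 300A. The `RecognitionResult` type, its `kind` labels, and the `AnswerStore` class standing in for the server 300B are hypothetical names chosen for illustration; the disclosure does not specify a data format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RecognitionResult:
    """What server 300A hands back to the robot (field names are assumptions)."""
    kind: str   # one of "phrase", "url", or "file"
    value: str


class AnswerStore:
    """Stands in for server 300B, which holds the answer phrases."""

    def __init__(self, by_phrase: Dict[str, str], by_url: Dict[str, str],
                 by_file: Dict[str, str]) -> None:
        self._tables = {"phrase": by_phrase, "url": by_url, "file": by_file}

    def fetch(self, result: RecognitionResult) -> str:
        # Pick the lookup table that matches the kind of information received.
        try:
            table = self._tables[result.kind]
        except KeyError:
            raise ValueError(f"unsupported result kind: {result.kind}") from None
        return table[result.value]
```

Whichever form the result takes, the robot ends up with the same answer phrase; only the key used to reach it on the server 300B differs.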

Similarly, text information obtained by converting the recognition phrase into text may be transmitted as the information indicating the result of speech recognition from the server 300A. The cleaning robot 100 may extract the recognition phrase from the text information and access the server 300B to acquire the answer phrase. Alternatively, the cleaning robot 100 may transmit the text information to the server 300B, and the server 300B may analyze the text information containing the recognition phrase, determine the answer phrase based on the analysis result, and transmit it to the cleaning robot 100.

Although this example describes speech recognition performed by the server 300, the cleaning robot 100 may instead perform speech recognition, determine the answer phrase for the result within the cleaning robot 100, and acquire the answer phrase from the server 300B. This can be realized by providing the storage unit 630 with a URL correspondence table in which each recognition phrase is associated with access information (a URL) for accessing the corresponding answer phrase on the server 300B.
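
A URL correspondence table of this kind can be sketched as a plain mapping from recognition phrase to URL. The table contents, the `server-b.example` host, and the `answer_for` function are illustrative assumptions; the `fetch` callable is injected so that the sketch does not commit to any particular HTTP client.

```python
from typing import Callable, Dict, Optional

# Hypothetical URL correspondence table held in the robot's storage unit:
# recognition phrase -> URL of the matching answer phrase on server 300B.
URL_TABLE: Dict[str, str] = {
    "good morning": "http://server-b.example/answers/greeting",
    "what is the weather": "http://server-b.example/answers/weather",
}


def answer_for(phrase: str,
               fetch: Callable[[str], str],
               table: Dict[str, str] = URL_TABLE) -> Optional[str]:
    """Resolve a locally recognized phrase to an answer phrase via its URL."""
    url = table.get(phrase)
    if url is None:
        return None  # no entry: the robot has no answer source for this phrase
    return fetch(url)  # retrieve the answer phrase from server 300B
```

With this arrangement the robot needs server 300B only for the answer content itself; which answer to request is decided locally from the table.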

An answer phrase for the result of speech recognition can also be acquired using information stored in the cleaning robot 100.

For example, when a cache memory capable of temporarily storing information contains the answer phrase for a previously used recognition phrase, the cleaning robot 100 can use that cached information to acquire the answer phrase and utter it (response processing) without accessing the server. Using the information stored in the cache memory in this way allows the cleaning robot 100 to respond sooner.
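
The cache behavior described here can be sketched as follows. The `CachedResponder` class and its `server_calls` counter are hypothetical and exist only to make the effect visible; eviction and expiry, which a real cache memory would need, are left out.

```python
from typing import Callable, Dict


class CachedResponder:
    """Answers from a local cache when possible, otherwise asks the server."""

    def __init__(self, fetch_from_server: Callable[[str], str]) -> None:
        self._fetch = fetch_from_server
        self._cache: Dict[str, str] = {}  # recognition phrase -> answer phrase
        self.server_calls = 0             # counts round trips, for illustration

    def answer(self, phrase: str) -> str:
        if phrase in self._cache:
            # Cache hit: respond without accessing the server.
            return self._cache[phrase]
        # Cache miss: ask the server once and remember the answer.
        self.server_calls += 1
        result = self._fetch(phrase)
        self._cache[phrase] = result
        return result
```

A repeated phrase costs one server round trip in total, which is the source of the earlier response described above.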

The server configuration of Embodiment 7 is applicable to any of Embodiments 1 to 6.

<Embodiment 8>
The control blocks of the cleaning robot, the server, and the like may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).

In the latter case, the cleaning robot and the server each include a CPU that executes the instructions of a program, which is software realizing each function; a ROM (Read Only Memory) or storage device (these are referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or the CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The object of the present invention is achieved when the computer (or the CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may also be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or a broadcast wave). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

1, 1A voice output system; 5 network; 50 external device; 100, 110 cleaning robot; 200, 400 character selection input reception unit; 202, 402 character setting unit; 204, 404 response execution unit; 206 extraction unit; 208 acquisition unit; 210, 410 speech synthesis unit; 212, 412, 430 data output unit; 214 voice input reception unit; 216, 416 speech recognition unit; 218, 418 alarm execution unit; 300, 300A, 300B, 310 server; 406 first extraction unit; 408 second extraction unit; 414 voice input receiving unit; 420 data extraction unit; 440 function dictionary group reception unit; 450 additional basic dictionary reception unit; 531, 531A, 631 database; 520, 620 temporary storage unit; 530, 630 storage unit; 532 standard conversation function dictionary group; 532A, 532B, 532C, 532D basic dictionary; 534 weather forecast function dictionary group; 534A, 534B, 538A, 538B additional dictionary; 538 alarm function dictionary group; 539 correspondence table; 540, 640 communication unit; 550, 650 input unit; 560, 660 output unit; 670 drive unit; 680 cleaning unit.

Claims

A server that is used for response processing to a request from a user and is configured to be additionally writable, the server comprising:
a plurality of basic dictionaries provided corresponding to a plurality of selectable characters, each storing, for its character, response data used for response processing to a first request; and
an additional dictionary that is associated with the selected character and stores response data used for response processing to a second request instead of the basic dictionary corresponding to the selected character.

The server according to claim 1, wherein the additional dictionary is provided corresponding to an abstract character obtained by abstracting the plurality of characters.

The server according to claim 1, wherein the response data is text data used for outputting a voice as a response process to the request from the user.

The server according to claim 3, wherein the response data of the additional dictionary is text data from which the distinguishing characteristics of the response data of the basic dictionary have been removed.

A method of using a database that is configured to be appendable and that includes a plurality of basic dictionaries, provided corresponding to each of a plurality of selectable characters and storing response data for each character, and an additional dictionary that is associated with the plurality of characters and stores the response data used instead of the basic dictionary corresponding to the selected character, the method comprising:
receiving a request from a user;
receiving a character selection; and
executing a response process in accordance with the received request,
wherein the step of executing the response process includes:
extracting the response data using the basic dictionary provided corresponding to the selected character when the received request is a first request;
extracting the response data using the additional dictionary associated with the selected character when the received request is a second request;
synthesizing speech based on the extracted response data and the selected character; and
outputting the synthesized voice data.

A program to be executed by a computer that uses a database that is configured to be appendable and that includes a plurality of basic dictionaries, provided corresponding to each of a plurality of selectable characters and storing response data for each character, and an additional dictionary that is associated with the plurality of characters and stores the response data used instead of the basic dictionary corresponding to the selected character,
the program causing the computer to execute processing comprising:
receiving a request from a user;
receiving a character selection; and
executing a response process in accordance with the received request,
wherein the step of executing the response process includes:
extracting the response data using the basic dictionary provided corresponding to the selected character when the received request is a first request;
extracting the response data using the additional dictionary associated with the selected character when the received request is a second request;
synthesizing speech based on the extracted response data and the selected character; and
outputting the synthesized voice data.

A system that uses a database configured to be appendable, the system comprising:
receiving means for receiving a request from a user;
selection accepting means for accepting selection of a character; and
response executing means for executing response processing in accordance with the request received by the receiving means,
wherein the database includes:
a plurality of basic dictionaries provided corresponding to each of a plurality of selectable characters and storing response data for each character; and
an additional dictionary that is associated with the plurality of characters and stores the response data used instead of the basic dictionary corresponding to the selected character, and
wherein the response executing means includes:
first extraction means for extracting the response data using the basic dictionary provided corresponding to the selected character when the received request is a first request;
second extraction means for extracting the response data using the additional dictionary associated with the selected character when the received request is a second request;
speech synthesis means for performing speech synthesis based on the extracted response data and the selected character; and
output means for outputting the synthesized voice data.

A method of using a first database that is configured to be additionally writable and includes a plurality of basic dictionaries provided corresponding to a plurality of selectable characters and storing response data for each character, and a second database that includes an additional dictionary that is associated with the plurality of characters and stores the response data used instead of the basic dictionary corresponding to the selected character, the method comprising:
receiving a request from a user;
receiving a character selection; and
executing a response process in accordance with the received request,
wherein the step of executing the response process includes:
extracting the response data using the basic dictionary of the first database provided corresponding to the selected character when the received request is a first request;
extracting the response data using the additional dictionary of the second database associated with the selected character when the received request is a second request;
synthesizing speech based on the extracted response data and the selected character; and
outputting the synthesized voice data.

A terminal that uses a database that is provided in an external device, is configured to be additionally writable, and includes a plurality of basic dictionaries, provided corresponding to a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and an additional dictionary that is associated with the plurality of characters and stores the response data used for response processing to a second request instead of the basic dictionary corresponding to the selected character, the terminal comprising:
accepting means for accepting the first or second request from a user;
selection accepting means for accepting selection of a character; and
output means for outputting voice data synthesized based on the response data extracted using the database in response to the first or second request accepted by the accepting means and on the character selected by the selection accepting means.

A terminal that uses a first database that is provided in the main body, is configured to be additionally writable, and includes a plurality of basic dictionaries, provided corresponding to a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and a second database that includes an additional dictionary that is associated with the plurality of characters and stores the response data used for response processing to a second request instead of the basic dictionary corresponding to the selected character, the terminal comprising:
accepting means for accepting the first or second request from a user;
selection accepting means for accepting selection of a character; and
response executing means for executing response processing in accordance with the request accepted by the accepting means,
wherein the response executing means includes:
extracting means for extracting response data using the basic dictionary of the first database provided corresponding to the selected character when the accepting means accepts the first request;
obtaining means for obtaining response data extracted using the additional dictionary of the second database associated with the selected character when the accepting means accepts the second request;
speech synthesis means for performing speech synthesis based on the extracted or obtained response data and the character selected by the selection accepting means; and
output means for outputting the synthesized voice data.

A terminal that uses a first database that is provided in an external device and includes a plurality of basic dictionaries, provided corresponding to a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and a second database that is provided in the main body and includes an additional dictionary that is associated with the plurality of characters and stores the response data used for response processing to a second request instead of the basic dictionary corresponding to the selected character, the terminal comprising:
accepting means for accepting the first or second request from a user;
selection accepting means for accepting selection of a character; and
response executing means for executing response processing in accordance with the request accepted by the accepting means,
wherein the response executing means includes:
obtaining means for obtaining response data using the basic dictionary of the first database provided corresponding to the selected character when the accepting means accepts the first request;
extracting means for extracting response data using the additional dictionary of the second database associated with the selected character when the accepting means accepts the second request;
speech synthesis means for performing speech synthesis based on the extracted or obtained response data and the character selected by the selection accepting means; and
output means for outputting the synthesized voice data.

A terminal that uses a database that is provided in the main body, is configured to be additionally writable, and includes a plurality of basic dictionaries, provided corresponding to a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and an additional dictionary that is associated with the plurality of characters and stores the response data used for response processing to a second request instead of the basic dictionary corresponding to the selected character, the terminal comprising:
accepting means for accepting the first or second request from a user;
selection accepting means for accepting selection of a character; and
response executing means for executing response processing in accordance with the request accepted by the accepting means,
wherein the response executing means includes:
first extracting means for extracting the response data using the basic dictionary of the database provided corresponding to the selected character when the accepting means accepts the first request;
second extracting means for extracting the response data using the additional dictionary of the database associated with the selected character when the accepting means accepts the second request;
speech synthesis means for synthesizing speech based on the extracted response data and the character selected by the selection accepting means; and
output means for outputting the synthesized voice data.

A terminal program to be executed by a computer of a terminal that uses a database that is provided in an external device, is configured to be additionally writable, and includes a plurality of basic dictionaries, provided corresponding to a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and an additional dictionary that is associated with the plurality of characters and stores the response data used for response processing to a second request instead of the basic dictionary corresponding to the selected character,
the terminal program causing the computer to execute processing comprising:
receiving the first or second request from a user;
receiving a character selection; and
outputting voice data synthesized based on the response data extracted using the database in response to the received first or second request and on the selected character.

A terminal program to be executed by a computer of a terminal that uses a first database that is provided in the main body, is configured to be additionally writable, and includes a plurality of basic dictionaries, provided corresponding to a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and a second database that includes an additional dictionary that is associated with the plurality of characters and stores the response data used for response processing to a second request instead of the basic dictionary corresponding to the selected character,
the terminal program causing the computer to execute processing comprising:
receiving the first or second request from a user;
receiving a character selection; and
executing a response process in accordance with the received request,
wherein the step of executing the response process includes:
extracting response data using the basic dictionary of the first database provided corresponding to the selected character when the first request is received;
obtaining response data extracted using the additional dictionary of the second database associated with the selected character when the second request is received;
synthesizing speech based on the extracted or obtained response data and the selected character; and
outputting the synthesized voice data.

A terminal program to be executed by a computer of a terminal that uses a first database that is provided in an external device and includes a plurality of basic dictionaries, provided corresponding to a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and a second database that is provided in the main body and includes an additional dictionary that is associated with the plurality of characters and stores the response data used for response processing to a second request instead of the basic dictionary corresponding to the selected character,
the terminal program causing the computer to execute processing comprising:
receiving the first or second request from a user;
receiving a character selection; and
executing a response process in accordance with the received request,
wherein the step of executing the response process includes:
obtaining response data using the basic dictionary of the first database provided corresponding to the selected character when the first request is received;
extracting response data using the additional dictionary of the second database associated with the selected character when the second request is received;
synthesizing speech based on the extracted or obtained response data and the selected character; and
outputting the synthesized voice data.

A terminal program to be executed by a computer of a terminal that uses a database that is provided in the main body, is configured to be additionally writable, and includes a plurality of basic dictionaries, provided corresponding to a plurality of selectable characters and each storing, for its character, response data used for response processing to a first request, and an additional dictionary that is associated with the plurality of characters and stores the response data used for response processing to a second request instead of the basic dictionary corresponding to the selected character,
the terminal program causing the computer to execute processing comprising:
receiving the first or second request from a user;
receiving a character selection; and
executing a response process in accordance with the received request,
wherein the step of executing the response process includes:
extracting the response data using the basic dictionary of the database provided corresponding to the selected character when the first request is received;
extracting the response data using the additional dictionary of the database associated with the selected character when the second request is received;
synthesizing speech based on the extracted response data and the selected character; and
outputting the synthesized voice data.

An audio data output device that uses a database configured to be appendable, the device comprising:
receiving means for receiving a request from a user;
selection accepting means for accepting selection of a character;
response executing means for executing response processing in accordance with the request received by the receiving means,
the database including a plurality of basic dictionaries that are respectively provided corresponding to a plurality of selectable characters and in which response data is stored for each character;
an additional dictionary receiving unit that receives an additional dictionary storing the response data used instead of the basic dictionary corresponding to the selected character; and
a registration unit that, upon the additional dictionary receiving unit receiving the additional dictionary, registers in the database, together with the additional dictionary, a correspondence table representing the correspondence between the plurality of characters and the additional dictionary,
wherein the response executing means includes:
first extraction means for extracting response data using the basic dictionary provided corresponding to the selected character when the received request is a first request;
second extraction means for extracting response data, with reference to the correspondence table, using the additional dictionary corresponding to the selected character when the received request is a second request;
speech synthesis means for performing speech synthesis based on the extracted response data and the selected character; and
output means for outputting the synthesized voice data.