JP2006011316A - Virtual conversational system

Info

Publication number: JP2006011316A
Application number: JP2004192105A
Authority: JP
Inventors: Kokichi Tanihira; 耕吉谷平
Original assignee: Individual
Current assignee: Individual
Priority date: 2004-06-29
Filing date: 2004-06-29
Publication date: 2006-01-12

Abstract

PROBLEM TO BE SOLVED: To respond in accordance with the characteristics of the conversation uttered by a user, changes in the external environment, and the like.

SOLUTION: In the virtual conversation system, response data is determined by computing parameter values from input values such as the characteristics of the speaker's voice and the external environment. Specifically, the system converts the characteristics of the user, the external environment, and the like into parameters and determines the response data that suits the situation these parameters describe.

Description

The present invention relates to a virtual conversation system with which a user can enjoy a virtual conversation.

With the recent development of software technology, various virtual conversation systems have been developed. One example is a system that uses voice recognition technology to interpret the user's voice, advances a scenario according to the recognition result, and lets the user enjoy a dialogue with a virtual person (see Non-Patent Document 1).

Non-Patent Document 1: EZ Virtual Talk (Internet URL http://www.au.kddi.com/ezweb/contents/communication/index.html#EZ_VIRTUALTALK)

However, although a conventional virtual conversation system can select response data in a scenario by using the recognition result of what the user said, it cannot flexibly select the response data in accordance with changes in the user's characteristics, the external environment, and the like during the conversation.

The present invention has been made in view of the above circumstances, and its object is to provide a virtual conversation system that can respond in accordance with changes in the user's characteristics, the external environment, and the like during a conversation.

To achieve the above object, a first aspect of the present invention is a virtual conversation system comprising: means for calculating a first parameter value based on data that is input by a user and determines the characteristics of the speaker in the virtual conversation system, the speaker being the user's conversation partner; means for calculating a second parameter value based on data that is acquired via a network and affects the characteristics of the speaker; means for outputting response data; means for receiving the user's voice; means for calculating, based on the received voice, a third parameter value that affects the characteristics of the speaker; means for storing the first, second, and third parameter values; means for calculating a first value by performing predetermined arithmetic processing on the stored first, second, and third parameter values; means for recognizing the received voice; a response database in which first response data, a plurality of voice data, a plurality of second values, and a plurality of second response data are associated with one another; means for searching the response database for, and thereby determining, the second response data for which the output response data corresponds to the first response data, the recognized voice corresponds to the voice data, and the calculated first value corresponds to the second value; and means for outputting the determined second response data.

A second aspect of the present invention is a virtual conversation system comprising: means for calculating a first parameter value based on data that is input by a user and determines the characteristics of the speaker in the virtual conversation system, the speaker being the user's conversation partner; means for calculating a second parameter value based on data that is acquired via a network and affects the characteristics of the speaker; means for outputting response data; means for receiving the user's voice; means for recognizing the user's face; means for calculating, based on the received voice, a third parameter value that affects the characteristics of the speaker; means for calculating, based on the features of the recognized face, a fourth parameter value that affects the characteristics of the speaker; means for storing the first, second, third, and fourth parameter values; means for calculating a first value by performing predetermined arithmetic processing on the stored first, second, third, and fourth parameter values; means for recognizing the received voice; a response database in which first response data, a plurality of voice data, a plurality of second values, and a plurality of second response data are associated with one another; means for searching the response database for, and thereby determining, the second response data for which the output response data corresponds to the first response data, the recognized voice corresponds to the voice data, and the calculated first value corresponds to the second value; and means for outputting the determined second response data.

According to the present invention, the system can respond in accordance with changes in the user's characteristics, the external environment, and the like during a conversation.

A virtual conversation system according to an embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a diagram showing a computer that implements the virtual conversation system according to the embodiment of the present invention.

As shown in the figure, the virtual conversation system 1 comprises a communication unit 11, a CPU 12, a memory 13, a voice output unit 14, a voice input unit 15, an input unit 16, an image input unit 17, and an HDD (hard disk drive) 18, all connected to a bus 10.

The communication unit 11 is connected to the Internet and handles communication with it.

The CPU 12 controls the entire virtual conversation system and realizes the virtual conversation by executing programs such as the conversation processing program 21 described later.

The memory 13 is used, for example, as a work area when the conversation processing program 21 and other programs are executed.

The voice output unit 14 outputs voice from the virtual conversation system and is, for example, a speaker.

The voice input unit 15 is used to input the user's voice and is, for example, a microphone.

The input unit 16 is used to input, among other things, the parameter values that configure the virtual conversation partner, and is, for example, an input device such as a keyboard.

The image input unit 17 is used to input an image of the user and is, for example, an image input device such as a camera.

The HDD 18 stores the conversation processing program 21, a parameter value storage unit 22, a voice recognition program 23, an image recognition program 24, and a response data DB 25.

The conversation processing program 21 implements the operation of the virtual conversation system according to the embodiment of the present invention.

The parameter value storage unit 22 stores the basic parameter value input by the user, the network parameter value calculated from network data acquired externally via the Internet, the voice parameter value calculated from the user's voice data, and the image parameter value calculated from the user's image data.

The voice recognition program 23 performs voice recognition on the voice input by the user through the voice input unit 15.

The image recognition program 24 performs recognition on the image of the user input through the image input unit 17.

The response data DB 25 stores a plurality of scenarios. As shown in FIG. 14, each scenario has a plurality of response data, and each response data is associated with the previous response data, a parameter value, and recognition data indicating the recognized speech.

With such a response data DB, the response data is identified by the previous response data, the voice recognition data, and the total parameter value.

FIG. 2 is a block diagram showing the functions of the virtual conversation system according to the embodiment of the present invention.

As shown in the figure, the functional blocks of the virtual conversation system comprise a basic parameter value receiving unit 31, a network parameter value calculation unit 32, a voice recognition unit 33, a voice parameter value calculation unit 34, an image recognition unit 35, an image parameter value calculation unit 36, a response data determination unit 37, and a response data output unit 38.

The basic parameter value receiving unit 31 receives the basic settings input by the user, calculates the basic parameter value, and outputs it to the response data determination unit 37. The basic settings are, for example, the age, sex, blood type, and occupation of the speaker in the virtual conversation system.

The network parameter value calculation unit 32 calculates the network parameter value from network data acquired from the Internet. The calculation is described later.

The voice recognition unit 33 recognizes the voice data input by the user.

The voice parameter value calculation unit 34 calculates the voice parameter value from the voice data recognized by the voice recognition unit 33. The calculation is described later.

The image recognition unit 35 recognizes the image of the user's face.

The image parameter value calculation unit 36 calculates the image parameter value of the face image recognized by the image recognition unit 35. The calculation is described later.

The response data determination unit 37 searches the response data DB 25 and determines the response data based on the basic parameter value, the network parameter value, the voice parameter value, the image parameter value, and the recognized voice data. The determination method is described later.

The response data output unit 38 outputs the response data determined by the response data determination unit 37.

FIG. 3 is a block diagram showing the functions of the response data determination unit.

As shown in the figure, the response data determination unit 37 comprises a total parameter value calculation unit 41 and a response data DB search unit 42.

The total parameter value calculation unit 41 calculates the total parameter value by performing a predetermined calculation on the input basic parameter value, network parameter value, voice parameter value, and image parameter value.

The total parameter value is calculated by a formula such as the one shown in FIG. 5.

Here, A1 to A4 are weighting coefficients. They are set according to the type of scenario, which makes it possible to give the speaker characteristics suited to that type of scenario.
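The formula of FIG. 5 is not reproduced in the text, so the following is only a minimal sketch that assumes the total parameter value is a weighted sum of the four parameter values. The pairing of A1 to A4 with the basic, network, voice, and image values follows the order in which the text lists them and is an assumption, as are the names `ScenarioWeights` and `total_parameter_value`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioWeights:
    """Hypothetical holder for the per-scenario weighting coefficients A1 to A4."""
    a1: float  # assumed to weight the basic parameter value
    a2: float  # assumed to weight the network parameter value
    a3: float  # assumed to weight the voice parameter value
    a4: float  # assumed to weight the image parameter value


def total_parameter_value(basic: float, network: float, voice: float,
                          image: float, w: ScenarioWeights) -> float:
    # Assumed form only: the actual formula is the one shown in FIG. 5.
    return w.a1 * basic + w.a2 * network + w.a3 * voice + w.a4 * image
```

Under this assumption, a scenario that should react mostly to the user's tone of voice would simply use a larger `a3` than the other coefficients.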

Next, each parameter value is described.

The basic parameter value represents the age, sex, blood type, and occupation of the speaker in the virtual conversation system. For example, the basic parameter value is set as shown in FIG. 6.

Here, the weighting coefficients B1 to B4 are set according to the type of scenario, which makes it possible to give the speaker characteristics suited to that type of scenario.

[Age] is set as follows.

0 to 10 years old: [age] = 1
11 to 30 years old: [age] = 2
31 to 70 years old: [age] = 3
71 years old and over: [age] = 4

A table (not shown) that associates input ages with [age] parameter values is prepared, and the [age] parameter value is obtained by searching this table with the input age.
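The age table lookup can be sketched as follows. The band boundaries and values are the ones listed above; the function name and the use of a sorted boundary list with `bisect` are illustrative choices, not something the source prescribes.

```python
import bisect

# Inclusive upper bounds of the first three age bands, as listed in the text.
AGE_UPPER_BOUNDS = [10, 30, 70]
# [age] parameter values for the four bands (the last band is 71 and over).
AGE_VALUES = [1, 2, 3, 4]


def age_parameter(age: int) -> int:
    """Return the [age] parameter value for an input age in whole years."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_left returns the index of the first bound that is >= age,
    # which is exactly the index of the band the age falls into.
    return AGE_VALUES[bisect.bisect_left(AGE_UPPER_BOUNDS, age)]
```

The same pattern would apply to any other banded setting item whose table is defined in the same way.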

[Sex] is determined as follows.

Male: [sex] = 1
Female: [sex] = 1.5

A table (not shown) that associates the input sex with [sex] parameter values is prepared, and the [sex] parameter value is determined by searching this table with the input sex.

For [blood type] and [occupation] as well, a parameter value is determined for each type. The setting items of the basic parameter value (age, sex, and so on) and their parameter values can be set freely. For example, items relating to hobbies, personality, or preferences may be added to the setting items above.

The network parameter value is calculated from network data acquired over a network, and acts as an external factor that affects the personality of the speaker.

The network data is acquired automatically by the virtual conversation system. It may be acquired from a predetermined web site, through a search engine, or from a source specified by the user; the acquisition method is not restricted.

FIG. 7 shows the formula for calculating the network parameter value. Here, C1 to C3 are weighting coefficients. They are set according to the type of scenario, which makes it possible to give the speaker characteristics suited to that type of scenario.

The network data for the items [weather], [temperature], and [humidity] shown in FIG. 7 is acquired, for example, from a weather forecast site. The network parameter value for [weather] is set as follows.

Sunny: [weather] = 1
Cloudy: [weather] = 2
Rain: [weather] = 3

A table (not shown) that associates weather data with [weather] parameter values is prepared, and the [weather] parameter value is determined by searching this table with the acquired weather data.

[Temperature] is set as follows.

5 °C or below: [temperature] = 1
6 °C to 20 °C: [temperature] = 2
21 °C to 30 °C: [temperature] = 3
31 °C or above: [temperature] = 4

A table (not shown) that associates temperature data with [temperature] parameter values is prepared, and the [temperature] parameter value is determined by searching this table with the acquired temperature data.

For [humidity] as well, a parameter value is defined according to the humidity level.

The setting items of the network parameter value (weather, temperature, and so on) and their parameter values can be set freely. For example, items relating to hobbies (such as baseball game results) may be added to the setting items above.
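As a rough illustration of how the weather items could be combined, the sketch below uses the [weather] and [temperature] tables given above and assumes that the formula of FIG. 7 (not reproduced in the text) is a weighted sum with coefficients C1 to C3. The [humidity] table is not given, so its parameter value is passed in directly; the English weather keys, the treatment of fractional temperatures (assigned to the next band up), and all function names are assumptions.

```python
import bisect

# [weather] table from the text, keyed by hypothetical English labels.
WEATHER_VALUES = {"sunny": 1, "cloudy": 2, "rain": 3}

# Inclusive upper bounds (deg C) of the first three temperature bands.
TEMPERATURE_UPPER_BOUNDS = [5, 20, 30]
TEMPERATURE_VALUES = [1, 2, 3, 4]


def temperature_parameter(celsius: float) -> int:
    """Return the [temperature] parameter value for a temperature in deg C."""
    index = bisect.bisect_left(TEMPERATURE_UPPER_BOUNDS, celsius)
    return TEMPERATURE_VALUES[index]


def network_parameter(weather: str, celsius: float, humidity_value: float,
                      c1: float = 1.0, c2: float = 1.0, c3: float = 1.0) -> float:
    # Assumed form only: the actual formula is the one shown in FIG. 7.
    return (c1 * WEATHER_VALUES[weather]
            + c2 * temperature_parameter(celsius)
            + c3 * humidity_value)
```

For instance, `network_parameter("rain", 25, 2)` combines [weather] = 3, [temperature] = 3, and a hypothetical [humidity] value of 2 with unit weights.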

The voice parameter value represents the voice of the speaker in the virtual conversation system. It is determined from the strength, speed, and sentence ending of the voice. That is, voice recognition processing identifies the strength, speed, and sentence ending of the input voice, and each is judged against a reference as strong, weak, very strong, very weak, or about the same as the reference (standard). A table such as the one shown in FIG. 8 is then used to determine a parameter value for each of the strength, speed, and sentence ending.

The input voice parameter value is calculated by the formula shown in FIG. 9. Here, D1 to D3 are weighting coefficients. They are set according to the type of scenario, which makes it possible to give the speaker characteristics suited to that type of scenario.

FIG. 10 shows a table for determining the final voice parameter value. In the figure, the predicted parameter value is the previous final voice parameter value, and the input voice parameter value is the value determined by the formula shown in FIG. 9. The final voice parameter value is calculated according to the following rules.

1. When the difference between the predicted parameter value and the input voice parameter value is 2 or less in magnitude, the input voice parameter value becomes the final voice parameter value.

2. When the difference between the predicted parameter value and the input voice parameter value is +2 or more, the predicted parameter value plus 2 becomes the final voice parameter value.

3. When the difference between the predicted parameter value and the input voice parameter value is -2 or less, the predicted parameter value minus 2 becomes the final voice parameter value.

Determining the voice parameter value in this way prevents the final voice parameter value from changing abruptly and also takes the time series into account.
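The three rules amount to limiting how far the value may move per update. The sketch below assumes the difference is taken as input value minus predicted value, which the text does not state explicitly; the function and argument names are illustrative.

```python
def final_voice_parameter(predicted: float, input_value: float,
                          max_step: float = 2.0) -> float:
    """Limit the change from the previous final value to at most max_step."""
    diff = input_value - predicted  # assumed sign convention
    if diff >= max_step:
        return predicted + max_step   # rule 2: rise capped at +2
    if diff <= -max_step:
        return predicted - max_step   # rule 3: fall capped at -2
    return input_value                # rule 1: small change passes through
```

When the difference is exactly 2, rules 1 and 2 overlap but give the same result, since the predicted value plus 2 equals the input value in that case.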

The image parameter value indicates the state of the user, which affects the speaker in the virtual conversation system. Here, the user's state is converted into a parameter value by focusing on the face, eyes, mouth, whole body, and similar parts of the recognized image of the user. The method of judging the user's state may be defined freely.

In the present embodiment, image recognition processing determines which of "awake", "possibly awake", "possibly asleep", and "asleep" applies, by focusing on the face, eyes, mouth, whole body, and similar parts of the recognized image of the user.

The input image parameter value corresponding to the determined state is obtained by referring to the table shown in FIG. 11. The final image parameter value is then obtained according to the rules below.

In the present embodiment, the image recognition parameter value is assumed to be acquired every 10 seconds, but the interval is not limited to this.

1. When the input image parameter value is "0" or "1", the sum of the parameter values over one minute, including the current input image parameter value, is calculated, and the final image parameter value is determined by searching the table shown in FIG. 12 with this sum.

2. When the input image parameter value is "2" and the value 10 seconds earlier was "0" or "1", the final image parameter value is "2".

3. When the input image parameter value is "2" and the value 10 seconds earlier was "2" or "3", the final image parameter value is "3".

4. When the input image parameter value is "2", the final image parameter value is "3".
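A possible reading of these rules is sketched below, under several assumptions that the text does not settle. The table of FIG. 12 is not reproduced, so the lookup from the one-minute sum to the final value is supplied by the caller. Rule 4 as written repeats the condition "2", which rules 2 and 3 already cover, so the sketch assumes it was meant to cover an input value of "3". "The value 10 seconds earlier" is taken to be the previous input value, and a missing previous sample is treated like "2" or "3". All names are hypothetical.

```python
from typing import Callable, Sequence


def final_image_parameter(samples: Sequence[int],
                          fig12_lookup: Callable[[int], int]) -> int:
    """samples: input image parameter values taken every 10 s, oldest first.

    The last element is the current input image parameter value.
    fig12_lookup maps a one-minute sum to a final value (stand-in for FIG. 12).
    """
    if not samples:
        raise ValueError("at least one sample is required")
    current = samples[-1]
    if current in (0, 1):
        # Rule 1: six samples at 10 s intervals cover one minute.
        return fig12_lookup(sum(samples[-6:]))
    if current == 2:
        previous = samples[-2] if len(samples) >= 2 else None
        # Rule 2 if the previous sample was 0 or 1, otherwise rule 3.
        return 2 if previous in (0, 1) else 3
    # Assumed reading of rule 4: an input value of 3 stays 3.
    return 3
```

A caller would pass the real FIG. 12 table as `fig12_lookup`, for example as a small function over sum ranges.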

FIG. 13 shows an example of the final image parameter value. In the figure, the image parameter value at the 60-second mark is the newly acquired value, and the figure shows an example in which this value is rewritten to the final image parameter value according to the rules above. The formulas and tables used to calculate each of the parameter values described above are assumed to be stored in the HDD 18 (not shown).

In the present embodiment, an example has been described in which a parameter value indicating sleepiness is calculated from predetermined parts (mouth, eyes, whole body) of the acquired image of the user. The invention is not limited to this: any arrangement that acquires information specific to the user (for example, complexion) from the acquired image and converts it into a parameter value falls within the scope of the present invention.

The response data DB search unit 42 determines the response data by searching the response data DB 25 based on the recognized voice data and the total parameter value calculated by the total parameter value calculation unit 41.

FIG. 4 shows an example of one of the scenarios stored in the response data DB.

As shown in the figure, scenario I has a plurality of response data A-1 to A-10, which are associated with one another. The arrows in the figure indicate the flow of time. For example, after response data A-1 is output, the user's voice is received and response data A-2 is then output. After response data A-2 is output, the user's voice is received and, depending on the recognized voice and the total parameter value, either response data A-2 or response data A-3 is output.

FIG. 14 shows the data stored in the response data DB. As shown in the figure, a plurality of voice recognition data are associated with one response data. A plurality of parameter values are associated with each voice recognition data, and response data is associated with each of these parameter values.

In other words, the response data corresponding to the voice input by the user can be retrieved by using the previously output response data, the voice recognition data (the recognized voice data), and the parameter value.
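The three-level association described for FIG. 14 can be sketched as a nested mapping. The entries below are invented for illustration, and matching the total parameter value against half-open ranges is an assumption: the text only says that the calculated value "corresponds to" a stored value.

```python
from typing import Optional

# Hypothetical contents: previous response ID -> recognized speech ->
# list of (lower bound, upper bound, next response ID).
RESPONSE_DB = {
    "A-2": {
        "yes": [(0.0, 5.0, "A-2"), (5.0, 10.0, "A-3")],
    },
}


def next_response(previous_id: str, recognized: str,
                  total_value: float) -> Optional[str]:
    """Return the next response ID, or None if the DB has no matching entry."""
    candidates = RESPONSE_DB.get(previous_id, {}).get(recognized, [])
    for low, high, response_id in candidates:
        if low <= total_value < high:
            return response_id
    return None
```

With these sample entries, the same recognized word "yes" after A-2 leads back to A-2 for a low total parameter value and on to A-3 for a higher one, which mirrors the branching described for scenario I.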

Next, the operation of the virtual conversation system according to the embodiment of the present invention is described with reference to the flowchart of FIG. 15.

When the virtual conversation system is started, it first accepts the basic settings, input by the user, that determine the basic characteristics of the speaker who will be the user's conversation partner (S1). Here, the basic parameter value covers the age, sex, blood type, occupation, and the like of the speaker in the virtual conversation system.

Next, the basic parameter value is calculated from the input basic settings (S2). As described above, this is done by searching a table (not shown) that associates the input basic settings with basic parameter values.

Next, it is determined whether it is time to acquire data on external factors other than the basic parameter value (network data and image data) (S3). The acquisition timing can be set freely by the user, for example 9 o'clock in the morning for network data and every 10 seconds for the user's image.
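The timing checks could look like the following sketch, which uses the two example schedules mentioned above (network data once a day at 9:00, image data every 10 seconds). How the system actually tracks acquisition times is not described, so the function names and the "last acquisition time" arguments are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def network_data_due(last_fetch: Optional[datetime], now: datetime,
                     hour: int = 9) -> bool:
    """True once today's scheduled hour has passed and no fetch followed it."""
    scheduled = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    return now >= scheduled and (last_fetch is None or last_fetch < scheduled)


def image_data_due(last_capture: Optional[datetime], now: datetime,
                   interval: timedelta = timedelta(seconds=10)) -> bool:
    """True if no image has been captured within the last interval."""
    return last_capture is None or now - last_capture >= interval


def acquisition_due(last_fetch: Optional[datetime],
                    last_capture: Optional[datetime], now: datetime) -> bool:
    # Corresponds to the overall check of S3: either kind of data is due.
    return network_data_due(last_fetch, now) or image_data_due(last_capture, now)
```

Keeping the two checks separate matches the flowchart, where the combined decision is followed by individual decisions for network data and image data.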

If it is determined in S3 that it is time to acquire data, it is next determined whether it is time to acquire network data (S4). If so, the network data is acquired. As described above, the acquisition method is not particularly limited. In the present embodiment, network data for the items [weather], [temperature], and [humidity] is acquired from a weather forecast site specified by the user (S5).

Next, the network parameter value is calculated from the acquired network data (S6). As described above, this is done by searching a table (not shown) that associates network data for the items [weather], [temperature], and [humidity] with network parameter values.

After the network parameter value has been calculated in S6, or if it is determined in S4 that it is not time to acquire network data, it is determined whether it is time to acquire image data (S7).

If it is determined in S7 that it is time to acquire image data, image data showing the user is acquired (S8), and image recognition processing is performed on the acquired image (S9). The image parameter value is then calculated from the recognized image of the user as described above (S10), and processing returns to S3.

On the other hand, if it is determined in S3 that it is not time to acquire data, the response data is output as voice (S21). Next, the user's voice responding to the response data output in S21 is acquired (S22), recognition processing is performed on the acquired voice (S23), and a voice parameter value is calculated (S24).

Next, a total parameter value is calculated based on the calculated basic parameter value, network parameter value, image parameter value, and voice parameter value (S25).
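As a rough illustration of S25, the four parameter values can be combined into one number. The formula actually used is given in a drawing that is not reproduced in this text, so the weighted sum below, the function name `total_parameter_value`, and the default weights are assumptions chosen only to make the data flow concrete.

```python
def total_parameter_value(basic, network, image, voice,
                          weights=(1.0, 1.0, 1.0, 1.0)):
    """Combine the four parameter values into a single total value.

    ASSUMPTION: a weighted sum stands in for the formula of the embodiment,
    which is not reproduced in this text.
    """
    values = (basic, network, image, voice)
    return sum(w * v for w, v in zip(weights, values))
```

With the default weights this reduces to a plain sum. Passing different weights shows how one source, for example the voice parameter value, could be made to dominate the result.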

Thereafter, the response data to be output next is determined by searching the scenario database based on the previously output response data, the recognized voice data, and the total parameter value calculated in S25 (S26).
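The search in S26 uses three keys: the previous response, the recognized speech, and the total parameter value. A minimal sketch follows, assuming the database maps each (previous response, recognized speech) pair to a list of value ranges. The entries in `SCENARIO_DB`, the range representation, and the name `next_response` are hypothetical and are not taken from the embodiment.

```python
# Hypothetical scenario database. Each key is (previous response, recognized
# speech); each entry lists (lower bound, upper bound, next response) ranges
# for the total parameter value. All contents are illustrative assumptions.
SCENARIO_DB = {
    ("greeting", "hello"): [
        (-100.0, 0.0, "reply_cool"),
        (0.0, 100.0, "reply_warm"),
    ],
}


def next_response(previous_response, recognized_speech, total_value,
                  db=SCENARIO_DB):
    """Return the next response, or None when no entry matches."""
    candidates = db.get((previous_response, recognized_speech), [])
    for lower, upper, response in candidates:
        if lower <= total_value < upper:
            return response
    return None
```

For example, `next_response("greeting", "hello", 5.0)` selects `"reply_warm"`, while the same utterance with a total value of `-3.0` selects `"reply_cool"`. An utterance with no matching entry yields `None`.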

Thereafter, it is determined whether the conversation has ended (S27). The conversation is determined to have ended when the speech recognition data (recognized voice data) corresponding to the output response data does not exist in the response data database. If it is determined in S27 that the conversation has ended, the processing in the conversation system ends. Otherwise, the process returns to S3.
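The overall control flow from S3 to S27 can be summarized as a loop. The sketch below is one reading of the flowchart, not the program of the embodiment: the function name `run_conversation` and all of its callable parameters are hypothetical, and the end condition is modeled as the decision step returning `None` when no matching entry exists.

```python
def run_conversation(first_response, is_acquisition_time, acquire_external,
                     speak, listen, decide_next):
    """Run the S3-S27 loop and return the list of (response, heard) turns.

    All parameters are caller-supplied callables (assumptions of this sketch):
      is_acquisition_time() -> bool     S3
      acquire_external()                S4-S10 (network and image data)
      speak(response)                   S21
      listen() -> recognized speech     S22-S24
      decide_next(response, heard)      S25-S26, None if no entry exists
    """
    response = first_response
    turns = []
    while True:
        if is_acquisition_time():        # S3: time to acquire data?
            acquire_external()           # S4-S10
            continue                     # return to S3
        speak(response)                  # S21
        heard = listen()                 # S22-S24
        turns.append((response, heard))
        response = decide_next(response, heard)  # S25-S26
        if response is None:             # S27: no matching entry, so end
            return turns
```

Passing the steps in as callables keeps the loop independent of any particular speech recognizer, camera, or database, which is convenient for exercising the control flow in isolation.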

Therefore, according to the virtual conversation system of the embodiment of the present invention, the parameter values of the speaker in the virtual conversation system change according to external factors, and as a result, a scenario that reflects those external factors can be developed.

Note that the present invention is not limited to the above embodiment as it stands, and in the implementation stage the constituent elements can be modified and embodied without departing from the gist of the invention. In addition, various inventions can be formed by appropriately combining the plurality of constituent elements disclosed in the above embodiment. For example, some constituent elements may be deleted from all the constituent elements shown in the embodiment. Furthermore, constituent elements from different embodiments may be combined as appropriate.

The method described in the embodiment can also be stored, as a program (software means) executable by a computer, on a recording medium such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD, MO, etc.), or a semiconductor memory (ROM, RAM, flash memory, etc.), or can be transmitted and distributed via a communication medium. The program stored on the medium side also includes a setting program that configures, within the computer, the software means to be executed by the computer (including not only the execution program but also tables and data structures). A computer that implements this apparatus reads the program recorded on the recording medium, constructs the software means with the setting program where necessary, and executes the processing described above with its operation controlled by this software means. The recording medium referred to in this specification is not limited to media for distribution, and includes storage media such as a magnetic disk or a semiconductor memory provided inside the computer or in a device connected via a network.

A diagram showing a computer that implements the virtual conversation system according to the embodiment of the present invention.
A block diagram showing the functions of the virtual conversation system according to the embodiment of the present invention.
A block diagram showing the functions of the response data determination unit.
A diagram showing an example of one of the plurality of scenarios stored in the response data DB.
A diagram showing the formula for calculating the total parameter value.
A diagram showing the formula for calculating the basic parameter value.
A diagram showing the formula for calculating the network parameter value.
A table for obtaining the voice parameter value.
A diagram showing the formula for calculating the input voice parameter value.
A diagram showing the table for determining the final voice parameter value.
A diagram showing the table for determining the input image parameter value.
A diagram showing the table for determining the final image parameter value.
A diagram for explaining how the final image parameter value is obtained.
A diagram showing the data stored in the response data DB.
A flowchart for explaining the operation of the virtual conversation system according to the embodiment of the present invention.

Explanation of symbols

1 ... virtual conversation system, 10 ... bus, 11 ... communication unit, 12 ... CPU, 13 ... memory, 14 ... voice output unit, 15 ... voice input unit, 16 ... input unit, 17 ... image input unit, 18 ... HDD (hard disk drive), 21 ... conversation processing program, 22 ... parameter value storage unit, 23 ... voice recognition program, 24 ... image recognition program, 25 ... response data DB.

Claims

Means for calculating a first parameter value based on data input from the user for determining the characteristics of the speaker in the virtual conversation system that is the user's speaking partner;
Means for calculating a second parameter value based on data obtained via the network and affecting the characteristics of the speaker;
Means for outputting response data;
Means for receiving the user's voice;
Means for calculating a third parameter value that affects speaker characteristics based on the received user voice;
Means for storing the first parameter value, the second parameter value, and the third parameter value;
Means for calculating a first value by performing a predetermined calculation process on the stored first parameter value, second parameter value and third parameter value;
Means for recognizing the received user voice;
A response database in which the first response data, the plurality of audio data, the plurality of second values, and the plurality of second response data are associated;
Means for retrieving, from the response database, the second response data for which the output response data corresponds to the first response data, the recognized user's voice corresponds to the audio data, and the calculated first value corresponds to the second value, and for determining the retrieved second response data;
A virtual conversation system, comprising: means for outputting the determined second response data.

Means for calculating a first parameter value based on data input from the user for determining the characteristics of the speaker in the virtual conversation system that is the user's speaking partner;
Means for calculating a second parameter value based on data obtained via the network and affecting the characteristics of the speaker;
Means for outputting response data;
Means for receiving the user's voice;
Means for recognizing the user's face;
Means for calculating a third parameter value that affects speaker characteristics based on the received user voice;
Means for calculating a fourth parameter value that affects speaker characteristics based on the recognized facial features of the user;
Means for storing the first parameter value, the second parameter value, the third parameter value and the fourth parameter value;
Means for calculating a first value by performing predetermined arithmetic processing on the stored first parameter value, second parameter value, third parameter value, and fourth parameter value;
Means for recognizing the received user voice;
A response database in which the first response data, the plurality of audio data, the plurality of second values, and the plurality of second response data are associated;
Means for retrieving, from the response database, the second response data for which the output response data corresponds to the first response data, the recognized user's voice corresponds to the audio data, and the calculated first value corresponds to the second value, and for determining the retrieved second response data;
A virtual conversation system, comprising: means for outputting the determined second response data.

The virtual conversation system according to claim 2, wherein the first parameter value is a parameter value indicating a personal attribute of the speaker, and the second parameter value is a parameter value based on an external factor different from the personal attribute.

The virtual conversation system according to claim 1, wherein the third parameter value is determined based on the strength, speed, and ending voice level of the received user's voice.

The virtual conversation system according to claim 1 or claim 2, wherein the third parameter value is determined based on the strength, speed, and ending voice level of the received user's voice and on a previously calculated third parameter value.

In a virtual conversation program executed in a virtual conversation system including a response database in which first response data, a plurality of audio data, a plurality of second values, and a plurality of second response data are associated with each other,
Means for causing a computer to calculate a first parameter value based on data input from a user for determining characteristics of a speaker in a virtual conversation system as a user's partner;
Means for causing the computer to calculate a second parameter value based on data obtained via the network and affecting the characteristics of the speaker;
Means for causing the computer to output response data;
Means for causing the computer to receive the user's voice;
Means for causing a computer to calculate a third parameter value that affects the characteristics of the speaker based on the received user voice;
Means for causing a computer to calculate a first value by performing predetermined arithmetic processing on the calculated first parameter value, second parameter value, and third parameter value;
Means for causing a computer to recognize the received user voice;
Means for causing the computer to retrieve, from the response database, the second response data for which the output response data corresponds to the first response data, the recognized user's voice corresponds to the audio data, and the calculated first value corresponds to the second value, and to determine the retrieved second response data;
A virtual conversation program comprising: means for causing the computer to output the determined second response data.

In a virtual conversation program executed in a virtual conversation system including a response database in which first response data, a plurality of audio data, a plurality of second values, and a plurality of second response data are associated with each other,
Means for causing a computer to calculate a first parameter value based on data input from a user for determining characteristics of a speaker in a virtual conversation system as a user's partner;
Means for causing the computer to calculate a second parameter value based on data obtained via the network and affecting the characteristics of the speaker;
Means for causing the computer to output response data;
Means for causing the computer to receive the user's voice;
Means for causing the computer to recognize the user's face;
Means for causing a computer to calculate a third parameter value that affects the characteristics of the speaker based on the received user voice;
Means for causing the computer to calculate a fourth parameter value that affects the characteristics of the speaker based on the recognized facial features of the user;
Means for causing a computer to calculate a first value by performing a predetermined calculation process on the first parameter value, the second parameter value, the third parameter value, and the fourth parameter value;
Means for causing a computer to recognize the received user voice;
Means for causing the computer to retrieve, from the response database, the second response data for which the output response data corresponds to the first response data, the recognized user's voice corresponds to the audio data, and the calculated first value corresponds to the second value, and to determine the retrieved second response data;
A virtual conversation program comprising: means for causing the computer to output the determined second response data.