JP5545489B2 - Learning system, simulation apparatus, and learning method

Info

Publication number: JP5545489B2
Application number: JP2010232355A
Authority: JP
Inventors: 輝久翠; 清敬大竹; 孔明杉浦; 智織堀; 秀紀柏岡; 哲中村
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2010-07-12
Filing date: 2010-10-15
Publication date: 2014-07-09
Anticipated expiration: 2030-10-15
Also published as: JP2012038287A

Description

The present invention relates to a learning system and related devices that learn the information used to determine the sentences output by a dialogue device, that is, a device that converses with a user and supports the user's decision making.

A consultation-type spoken dialogue system can be regarded as a kind of decision support system. Decision support has been studied extensively in operations research, and a representative technique is the analytic hierarchy process (AHP) (Non-Patent Document 1). AHP divides the elements of a problem into three layers, the final goal, the evaluation criteria, and the alternatives, and reaches an optimal decision by estimating the user's local weight (importance) for each evaluation criterion.

One conceivable approach is to apply AHP directly to a dialogue device.

There have also been dialogue devices that retrieve and present information corresponding to a key entered by the user.

Non-Patent Document 1: Saaty, T., The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation, McGraw-Hill (1980)

However, conventional spoken dialogue systems require a great deal of labor to construct the information that a dialogue system needs in order to output sentences in a dialogue with a user.

Moreover, applying AHP as-is does not yield a dialogue device that can support the user's decision making. To make the optimal decision for the user, AHP first determines the weights for the evaluation criteria, $P_{user} = (p_1, p_2, \ldots, p_M)$, and the local weights of each alternative from the viewpoint of each evaluation criterion, $V_{user} = (v_{11}, v_{12}, \ldots, v_{1M}, \ldots, v_{nm})$. The optimal candidate is then the alternative $k$ that maximizes the priority $\sum_{m=1}^{M} p_m v_{km}$. In standard AHP, these weights are determined by pairwise comparisons of the evaluation criteria and of the alternatives.
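The AHP selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the patent: the function name `ahp_best_alternative` and the numeric weights are hypothetical, and the weights are assumed to have been obtained already (for example by pairwise comparison).

```python
from typing import Sequence


def ahp_best_alternative(p: Sequence[float], v: Sequence[Sequence[float]]) -> int:
    """Return the index k that maximizes sum over m of p[m] * v[k][m]."""
    priorities = [sum(pm * vkm for pm, vkm in zip(p, row)) for row in v]
    return max(range(len(priorities)), key=priorities.__getitem__)


# Hypothetical weights: three evaluation criteria, two alternatives.
p_user = [0.5, 0.3, 0.2]   # weight of each evaluation criterion
v_user = [
    [0.2, 0.9, 0.5],       # local weights of alternative 0
    [0.8, 0.1, 0.5],       # local weights of alternative 1
]
best = ahp_best_alternative(p_user, v_user)
```

With these illustrative numbers the priorities are 0.47 and 0.53, so alternative 1 is selected.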

However, the candidates the device can present and the relevant domain knowledge are often things the user learns only through the dialogue; they are rarely all known when the dialogue begins. Dialogue devices also often handle a large number of candidates (alternatives) and evaluation criteria. Under these conditions, pairwise comparison would require a very large number of exchanges and is therefore impractical.
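To give a sense of scale, the following sketch counts the comparisons needed, assuming full pairwise comparison matrices: one matrix over the M evaluation criteria, plus one matrix over the N alternatives for each criterion. The function name and the example sizes (10 criteria, 20 spots) are hypothetical.

```python
def pairwise_comparisons(num_criteria: int, num_alternatives: int) -> int:
    """Count comparisons for one criteria matrix plus one alternatives matrix per criterion."""
    criteria_pairs = num_criteria * (num_criteria - 1) // 2
    alternative_pairs = num_alternatives * (num_alternatives - 1) // 2
    return criteria_pairs + num_criteria * alternative_pairs


# Hypothetical sizes: 10 evaluation criteria and 20 candidate spots.
total = pairwise_comparisons(10, 20)  # 45 + 10 * 190 = 1945
```

Even at this modest size, nearly two thousand question-and-answer exchanges would be needed before any recommendation could be made.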

Likewise, a dialogue device that merely presents information corresponding to a key entered by the user cannot support the user's decision making.

Accordingly, an object of the present invention is to automatically learn the information used to determine the sentences output by a dialogue device.

A further object of the present invention is to provide a dialogue device that can appropriately support the user's decision making by conducting the dialogue while dynamically updating information about the user's knowledge and preferences as the dialogue progresses.

The learning system of the first invention comprises a dialogue device and a simulation device that together simulate a dialogue about spots, and it learns the weight vectors used when determining the sentence that the dialogue device outputs.

The dialogue device comprises: a knowledge base storing two or more pieces of spot information, each having a spot, one or more determinants that are factors for deciding on a spot, and evaluation values indicating the evaluation of the spot on each of the one or more determinants; an information recommendation method storage unit storing two or more information recommendation methods, each having a method identifier that identifies the information recommendation method, evaluation information of the information recommendation method, and a weight vector indicating the weight of each element of the evaluation information; a user state information storage unit storing user state information, which indicates the state of the user and has a preference vector indicating the user's preference for each of the one or more determinants and a knowledge vector indicating the user's knowledge of each of the one or more determinants; a user input information receiving unit that receives, from the simulation device, user input information having a user sentence type identifier that identifies a user sentence type (a pattern of sentence entered by a user), or having the user sentence type identifier together with one or more pieces of information from among one or more determinants and one or more spots; a score calculation unit that calculates two or more scores, one for each of the two or more information recommendation methods, using the evaluation information and weight vector of each information recommendation method stored in the information recommendation method storage unit and the user state information; a dialogue sentence information composition unit that uses the two or more scores calculated by the score calculation unit to compose dialogue sentence information having a method identifier that identifies one information recommendation method, or having the method identifier together with one or more pieces of information from among one or more determinants and one or more spots; a dialogue sentence output unit that sends the dialogue sentence information composed by the dialogue sentence information composition unit to the simulation device; and a user state information update unit that acquires at least one spot or determinant from one or more of the user input information received by the user input information receiving unit and the dialogue sentence information output by the dialogue sentence output unit, and uses the acquired spots or determinants to update the user state information in the user state information storage unit. The score calculation unit calculates the two or more scores for the two or more information recommendation methods using the evaluation information and weight vector of each information recommendation method stored in the information recommendation method storage unit and the user state information as updated by the user state information update unit.

The simulation device comprises: a dialogue information storage unit capable of storing dialogue probability information (information on the probabilities relating each information recommendation method to each user sentence type), determinant probability information (information on the probability that a determinant is selected), and spot probability information (information on the probability that a spot is selected); a user preference vector storage unit capable of storing a user preference vector, which is a vector indicating the user's preferences; a dialogue sentence information receiving unit that receives dialogue sentence information from the dialogue device; a user sentence type determination unit that determines a user sentence type using the method identifier in the dialogue sentence information and the dialogue probability information, and acquires the corresponding user sentence type identifier; a determinant acquisition unit that acquires one or more determinants or one or more spots using one or more of the determinant probability information and the spot probability information, or using that information together with one or more of the determinants and spots contained in the dialogue sentence information; a user input information sending unit that sends to the dialogue device user input information having the user sentence type identifier, or having the user sentence type identifier together with one or more pieces of information from among one or more determinants and one or more spots; a reward calculation unit that acquires the user preference vector and the one or more evaluation values indicating the evaluation, on each of the one or more determinants, of the spot included in the user input information, calculates the degree of match between the user preference vector and the one or more evaluation values, and uses the degree of match to calculate the reward for the selection of the user sentence type identified by the user sentence type identifier; and a learning unit that uses the reward to update the weight vector, held in the information recommendation method storage unit of the dialogue device, that corresponds to the method identifier.

With this configuration, the weight vectors that a dialogue device needs in order to output sentences in a dialogue with a user can be constructed automatically.

In the learning system of the second invention, relative to the first invention, the reward calculation unit comprises: random selection match value calculation means that uses the spot probability information to calculate the expected value of the degree of match between the user preference vector and the one or more evaluation values when a spot is chosen at random; selected spot match degree calculation means that calculates the degree of match between the user preference vector and the one or more evaluation values indicating the evaluation, on each of the one or more determinants, of the spot included in the user input information; and reward calculation means that calculates the reward for the selection of the spot included in the user input information, using the expected value calculated by the random selection match value calculation means and the degree of match calculated by the selected spot match degree calculation means.
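The reward calculation of the second invention can be sketched as follows. This passage does not state how the degree of match is computed or how the two quantities are combined, so the sketch makes two labeled assumptions: the degree of match is taken to be a dot product of the user preference vector and the spot's evaluation values, and the reward is taken to be the selected spot's match minus the expected match under random selection. All names and numbers are hypothetical.

```python
from typing import Dict, Sequence


def match_degree(preference: Sequence[float], evaluation: Sequence[float]) -> float:
    """Assumed measure: dot product of user preferences and evaluation values."""
    return sum(p * e for p, e in zip(preference, evaluation))


def reward(preference: Sequence[float],
           spot_evaluations: Dict[str, Sequence[float]],
           spot_probabilities: Dict[str, float],
           selected_spot: str) -> float:
    """Assumed combination: match of the selected spot minus the random baseline."""
    expected = sum(spot_probabilities[name] * match_degree(preference, ev)
                   for name, ev in spot_evaluations.items())
    return match_degree(preference, spot_evaluations[selected_spot]) - expected


# Hypothetical data: three determinants, two spots chosen with equal probability.
preference = [0.7, 0.3, 0.0]
evaluations = {"Spot A": [1, 0, 1], "Spot B": [0, 1, 0]}
probabilities = {"Spot A": 0.5, "Spot B": 0.5}
r_a = reward(preference, evaluations, probabilities, "Spot A")
r_b = reward(preference, evaluations, probabilities, "Spot B")
```

Under these assumptions, a spot that fits the user better than a random pick earns a positive reward and a worse one earns a negative reward, which gives the learning unit a baseline-corrected signal.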

The dialogue device of the third invention is a dialogue device that converses with a user about spots, and comprises: a knowledge base storing two or more pieces of spot information, each having a spot, one or more determinants that are factors for deciding on a spot, and evaluation values indicating the evaluation of the spot on each of the one or more determinants; an information recommendation method storage unit storing two or more information recommendation methods, each having sentence pattern information, which is a sentence output by the dialogue device or information indicating the pattern of such a sentence, and evaluation information of the sentence pattern information, which is used when selecting sentence pattern information; a user state information storage unit storing user state information, which indicates the state of the user and has a preference vector indicating the user's preference for each of the one or more determinants and a knowledge vector indicating the user's knowledge of each of the one or more determinants; a receiving unit that receives a sentence entered by the user; a score calculation unit that applies the user state information to the evaluation information of each of the two or more information recommendation methods stored in the information recommendation method storage unit and calculates two or more scores, one for each information recommendation method; a sentence composition unit that uses the two or more scores calculated by the score calculation unit to acquire the sentence pattern information of one information recommendation method and composes a sentence from that sentence pattern information; a sentence output unit that outputs the sentence composed by the sentence composition unit; and a user state information update unit that acquires at least one spot or determinant from one or more of the sentence received by the receiving unit and the sentence output by the sentence output unit, and uses the acquired spots or determinants to update the user state information in the user state information storage unit. The score calculation unit applies the user state information as updated by the user state information update unit to the evaluation information of each of the two or more information recommendation methods stored in the information recommendation method storage unit, and calculates the two or more scores for the two or more information recommendation methods.

With this configuration, the device can appropriately support the user's decision making by conducting the dialogue while dynamically updating information about the user's knowledge and preferences as the dialogue progresses.

In the dialogue device of the fourth invention, relative to the third invention, the sentence composition unit comprises: sentence pattern information acquisition means that acquires the sentence pattern information of the information recommendation method corresponding to the largest of the two or more scores calculated by the score calculation unit; variable value acquisition means that acquires the one or more variables contained in the sentence pattern information acquired by the sentence pattern information acquisition means, and obtains the spot or determinant corresponding to each variable from one or more of the sentence most recently output by the sentence output unit and the sentence most recently received by the receiving unit; and sentence composition means that composes a sentence by inserting the terms acquired by the variable value acquisition means at the positions of the variables in the sentence pattern information acquired by the sentence pattern information acquisition means.
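The sentence composition of the fourth invention can be sketched as a template fill. The placeholder syntax (`{spot}`, `{determinant}`), the example patterns, and the `context` dictionary standing in for terms taken from the most recent sentences are all assumptions made for illustration; the patent text here does not specify a concrete format.

```python
from typing import Dict


def compose_sentence(patterns: Dict[int, str],
                     scores: Dict[int, float],
                     context: Dict[str, str]) -> str:
    """Pick the pattern of the highest-scoring method and fill in its variables."""
    best_method = max(scores, key=scores.get)
    # str.format ignores unused keyword arguments, so extra context terms are harmless.
    return patterns[best_method].format(**context)


# Hypothetical sentence patterns keyed by method identifier.
patterns = {
    1: "Let me tell you more about {spot}.",
    3: "What would you like to know?",
}
scores = {1: 0.8, 3: 0.3}
# Terms assumed to have been extracted from the most recent sentences.
context = {"spot": "Spot A", "determinant": "nice scenery"}
sentence = compose_sentence(patterns, scores, context)
```

Here method 1 has the largest score, so its pattern is selected and the variable is replaced with the spot from the dialogue context.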

In the dialogue device of the fifth invention, relative to the fourth invention, the variable value acquisition means acquires the one or more variables contained in the sentence pattern information acquired by the sentence pattern information acquisition means, obtains one or more candidate spots or determinants corresponding to each variable from one or more of the sentence most recently output by the sentence output unit and the sentence most recently received by the receiving unit, and selects the spot or determinant corresponding to the variable from those candidates using the evaluation values in the knowledge base that correspond to the candidates.

In the dialogue device of the sixth invention, relative to any of the third to fifth inventions, the user state information update unit comprises: user-presented term acquisition means that acquires at least one determinant from the sentence received by the receiving unit; device-presented term acquisition means that acquires at least one determinant from one or more of the sentences output by the sentence output unit; preference vector update means that updates the preference vector so as to raise the values of its elements for the determinants acquired by the user-presented term acquisition means; and knowledge vector update means that updates the knowledge vector so as to raise the values of its elements for the determinants acquired by the device-presented term acquisition means.
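The update rule of the sixth invention can be sketched as below. The text says only that the relevant elements are raised; the fixed additive step of 0.2 and the cap at 1.0 are assumptions for illustration, as are the function and variable names.

```python
from typing import Dict, Iterable, Tuple


def update_user_state(preference: Dict[str, float],
                      knowledge: Dict[str, float],
                      user_terms: Iterable[str],
                      device_terms: Iterable[str],
                      step: float = 0.2) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Raise preference for determinants the user mentioned and knowledge for those the device presented."""
    new_preference = dict(preference)
    new_knowledge = dict(knowledge)
    for determinant in user_terms:
        new_preference[determinant] = min(1.0, new_preference.get(determinant, 0.0) + step)
    for determinant in device_terms:
        new_knowledge[determinant] = min(1.0, new_knowledge.get(determinant, 0.0) + step)
    return new_preference, new_knowledge


# Hypothetical turn: the user asked about crowding, and the device described the scenery.
pref, know = update_user_state(
    {"nice scenery": 0.0, "not crowded": 0.9},
    {"nice scenery": 0.0, "not crowded": 0.0},
    user_terms=["not crowded"],
    device_terms=["nice scenery"],
)
```

The asymmetry is the point: what the user says reveals preference, while what the device says increases what the user can be assumed to know.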

In the dialogue device of the seventh invention, relative to any of the third to sixth inventions, the receiving unit comprises voice receiving means that receives speech entered by the user and speech recognition means that recognizes the speech and converts it into a character string, and the sentence output unit outputs the sentence composed by the sentence composition unit as speech.

With this configuration, the device can appropriately support the user's decision making by conducting a spoken dialogue while dynamically updating information about the user's knowledge and preferences as the spoken dialogue progresses.

According to the learning system of the present invention, the weight vectors that a dialogue device needs in order to output sentences in a dialogue with a user can be constructed automatically.

Conceptual diagram of the learning system in Embodiment 1
Block diagram of the dialogue device constituting the learning system
Block diagram of the simulation device constituting the learning system
Flowchart explaining the operation of the dialogue device
Flowchart explaining the score calculation process
Flowchart explaining the composition process
Flowchart explaining the user state information update process
Flowchart explaining the user state information update process
Flowchart explaining the operation of the simulation device
Flowchart explaining the reward calculation process
Diagram showing an example of the spot information management table
Diagram showing an example of the information recommendation method management table
Diagram showing the dialogue probability information management table
Diagram showing the relationship between the number of simulated dialogues and the reward after each number of turns
Diagram showing the reward comparison with the baseline method
Diagram showing the reward comparison with the baseline method
Block diagram of the dialogue device in Embodiment 2
Flowchart explaining the operation of the dialogue device
Flowchart explaining the score calculation process
Flowchart explaining the sentence composition process
Flowchart explaining the recommendation sentence acquisition process
Flowchart explaining the user state information update process
Conceptual diagram of the dialogue device
Diagram explaining the flow of the dialogue
External view of the computer system
Block diagram of the computer system

Embodiments of the learning system and related devices are described below with reference to the drawings. Components given the same reference numerals in the embodiments operate in the same way, so repeated descriptions may be omitted.
(Embodiment 1)

This embodiment describes a learning system that automatically generates dialogues between a simulation device and a dialogue device, determines rewards, and updates weight vectors. A weight vector is a set of weights, one for each element of the user state information.

The dialogue device here is a consultation-type dialogue device, in which a user selects a candidate while receiving information and recommendations from the device, adapted to work with the simulation device.

In this embodiment, the dialogue device implements a model of consultation-type dialogue in which a candidate suited to the user is selected from among multiple candidates (here mainly called spots).

FIG. 1 is a conceptual diagram of the learning system in this embodiment. The learning system comprises a dialogue device 1 and a simulation device 2. While the dialogue device 1 and the simulation device 2 exchange dialogue data, that is, data representing the dialogue (the dialogue sentence information and user input information described later), the simulation device 2 updates the weight vectors of the dialogue device 1. This updating is referred to as learning where appropriate. Although FIG. 1 shows the dialogue device 1 and the simulation device 2 as two devices, they may be a single device.

FIG. 2 is a block diagram showing the internal structure of the dialogue device 1 of the learning system in this embodiment.

The dialogue device 1 comprises a knowledge base 11, an information recommendation method storage unit 12, a user state information storage unit 13, a user input information receiving unit 14, a score calculation unit 15, a dialogue sentence information composition unit 16, a dialogue sentence output unit 17, and a user state information update unit 18.

The dialogue sentence information composition unit 16 comprises method identifier acquisition means 161, variable value acquisition means 162, and dialogue sentence information composition means 163.

The user state information update unit 18 comprises user-presented term acquisition means 181, device-presented term acquisition means 182, preference vector update means 183, and knowledge vector update means 184.

FIG. 3 is a block diagram showing the internal structure of the simulation device 2 of the learning system in this embodiment.

The simulation device 2 comprises a knowledge base 11, a dialogue information storage unit 21, a user preference vector storage unit 22, a dialogue sentence information receiving unit 23, a user sentence type determination unit 24, a determinant acquisition unit 25, a user input information sending unit 26, a reward calculation unit 27, and a learning unit 28.

The reward calculation unit 27 comprises random selection match value calculation means 271, selected spot match degree calculation means 272, and reward calculation means 273.

Needless to say, the dialogue device 1 and the simulation device 2 may share the knowledge base 11.

The knowledge base 11 stores two or more pieces of spot information. Spot information is information having a spot, one or more determinants, and evaluation values. A spot is synonymous with information that identifies a spot, for example a spot name. A spot is typically a place a user visits, such as a tourist site, a restaurant, or a shop. However, anything that can be the subject of a dialogue qualifies; for example, a spot may be a company, a group of people, or a concept, and the term is to be interpreted broadly. A determinant is a factor for deciding on a spot. It can also be regarded as an attribute of a spot, or as a viewpoint from which spots are evaluated. When the spots are tourist sites, the determinants are, for example, "famous for gardens", "not crowded", "World Heritage", and "nice scenery". An evaluation value is information indicating the evaluation of a spot on each of its one or more determinants. It may be binary (two levels), such as 1 (yes) or 0 (no), or multi-level, such as an integer from 1 to 5. The evaluation value indicates whether, or to what degree, the determinant applies to the spot. Spot information may also have explanatory text corresponding to a spot and a determinant. When it does, there is usually one explanatory text for each pair of a spot and a determinant.
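One way to picture the spot information described above is as a small record type. The following is a minimal sketch, not the patent's data format: the class name `SpotInfo`, its field names, and the spots "Spot A" and "Spot B" are hypothetical, while the determinant names and the binary evaluation values follow the examples in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpotInfo:
    spot: str                      # spot name, used as the identifier
    evaluations: Dict[str, int]    # determinant -> evaluation value (1 = applies, 0 = does not)
    descriptions: Dict[str, str] = field(default_factory=dict)  # optional text per determinant


knowledge_base: List[SpotInfo] = [
    SpotInfo("Spot A", {"famous for gardens": 1, "not crowded": 0,
                        "World Heritage": 1, "nice scenery": 1}),
    SpotInfo("Spot B", {"famous for gardens": 0, "not crowded": 1,
                        "World Heritage": 0, "nice scenery": 1}),
]


def spots_matching(kb: List[SpotInfo], determinant: str) -> List[str]:
    """Return the names of spots to which the given determinant applies."""
    return [info.spot for info in kb if info.evaluations.get(determinant, 0) > 0]


quiet_spots = spots_matching(knowledge_base, "not crowded")
```

A multi-level scale such as 1 to 5 would fit the same structure; only the threshold in `spots_matching` would need to change.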

The information recommendation method storage unit 12 stores two or more information recommendation methods. Here, an information recommendation method has a method identifier, sentence pattern information, evaluation information, and a weight vector. The method identifier is information that identifies the information recommendation method. The sentence pattern information is a sentence output by the dialogue device 1, or information indicating the pattern of such a sentence. The evaluation information is information for evaluating the sentence pattern information and is used when selecting sentence pattern information; it can also be regarded as information for evaluating the information recommendation method. The evaluation information is, for example, a vector with 29 elements, as described later. The weight vector is a vector indicating the weight of each element of the evaluation information. Sentence pattern information is not essential to an information recommendation method, which may consist of only a method identifier, evaluation information, and a weight vector.
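The record structure described above, and one plausible way to turn it into a score, can be sketched as follows. This passage does not give the scoring formula, so the weighted sum used here is an assumption, as is the idea that the evaluation information has already been computed against the current user state as a numeric vector. The vectors have three elements instead of 29 to keep the example short, and all names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecommendationMethod:
    method_id: int
    evaluation: List[float]                  # evaluation information (29 elements in the source; 3 here)
    weights: List[float]                     # weight of each element of the evaluation information
    sentence_pattern: Optional[str] = None   # optional, as the text notes


def score(method: RecommendationMethod) -> float:
    """Assumed scoring rule: weighted sum of the evaluation elements."""
    return sum(w * e for w, e in zip(method.weights, method.evaluation))


methods = [
    RecommendationMethod(1, [1.0, 0.0, 0.5], [0.2, 0.4, 0.6], "Let me tell you more about {spot}."),
    RecommendationMethod(3, [0.0, 1.0, 0.0], [0.2, 0.1, 0.9], "What would you like to know?"),
]
best = max(methods, key=score)
```

Under this assumption, learning amounts to adjusting each method's `weights` so that the method chosen in a given user state tends to earn a higher reward.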

The information recommendation methods include, for example, the following six methods: (1) information recommendation about the spot currently under discussion (method 1), (2) information recommendation about the determinant currently under discussion (method 2), (3) open prompt (method 3), (4) presentation of determinants 1 (method 4), (5) presentation of determinants 2 (method 5), and (6) recommendation of a spot in which the user is estimated to be interested (method 6). Method 1 is a current-topic spot information recommendation method having the method identifier "1", sentence pattern information that recommends a detailed explanation of the spot explained immediately before, evaluation information for the sentence pattern information, and a weight vector. Method 2 is a current-topic determinant information recommendation method having the method identifier "2", sentence pattern information that recommends another sightseeing spot related to the determinant explained immediately before, evaluation information for the sentence pattern information, and a weight vector. Method 3 is an open prompt information recommendation method having the method identifier "3", sentence pattern information that presents an open prompt without recommending any particular information, evaluation information for the sentence pattern information, and a weight vector. Method 4 is a method of presenting determinants that the dialogue apparatus 1 can explain. When method 4 is selected, the determinants are chosen from those about which the user's knowledge, as estimated by the dialogue apparatus 1, is low. Method 5, like method 4, is a method of presenting determinants that the dialogue apparatus 1 can explain. When method 5 is selected, the determinants are chosen from those about which the user's knowledge, as estimated by the dialogue apparatus 1, is high. Method 6 is a method of selecting, based on the user's interest as estimated by the dialogue apparatus 1, the spot k in which the user is considered most likely to be interested, and presenting it to the user.

The user state information storage unit 13 stores user state information. User state information is information indicating the state of the user, and has a preference vector, which is information indicating the user's preference for each of one or more determinants, and a knowledge vector, which indicates the user's knowledge of each of one or more determinants. The user state information may also have an attribute vector, which is information indicating one or more attribute values of the user. The user's attribute values are, for example, gender (male or female), age group (teens, twenties, thirties, the baby boomer generation, the second baby boomer generation, and so on), occupation, place of origin, and supported political party.

The user state information preferably includes local weight information, which is information on one or more local weights indicating the importance of a spot from the viewpoint of a determinant for the user. The local weight v_nm for spot n from the viewpoint of determinant m takes the value "1", for example, when the dialogue apparatus 1 has informed the user of the evaluation of the spot using information recommendation method 1, information recommendation method 2, or information recommendation method 6. This assumes that the user makes judgments based only on the information presented by the dialogue apparatus 1. The user state information in the user state information storage unit 13 is changed dynamically as the dialogue progresses.

The user state information preferably has dialogue amount information, which is information indicating the amount of dialogue between the dialogue apparatus 1 and the simulation apparatus 2 up to the present. The user state information also preferably includes one or both of the following: information on the determinant corresponding to the dialogue sentence information sent by the dialogue apparatus 1 immediately before, and information on the determinant corresponding to the user input information received from the simulation apparatus 2 immediately before.

The user input information receiving unit 14 receives user input information from the simulation apparatus 2. User input information is information having a user sentence type identifier, or information having a user sentence type identifier together with one or more determinants, one or more spots, or both. The user sentence type identifier is information that identifies the pattern of a sentence input by the user.

The score calculation unit 15 calculates two or more scores, one for each of the two or more information recommendation methods, using the user state information together with the evaluation information and weight vector of each of the two or more information recommendation methods stored in the information recommendation method storage unit 12. Once the user state information update unit 18 has updated the user state information, the score calculation unit 15 calculates the scores in the same way using the updated user state information.

The score calculation unit 15 normally calculates the scores before the dialogue sentence output unit 17 outputs a dialogue sentence (not necessarily immediately before). The score calculation unit 15 preferably calculates the scores each time the user input information receiving unit 14 receives user input information. The score calculation unit 15 calculates a score by, for example, the expression "score = f(user state information, weight vector)", where f is, for example, "score = user state information × weight vector". In other words, to determine the sentence pattern information of the sentence that the dialogue apparatus 1 should output next, the score calculation unit 15 calculates a score for each information recommendation method using the evaluation information managed in association with the sentence pattern information and the dynamically changing user state information.
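The example f above, "score = user state information × weight vector", can be read as an inner product. The following is a minimal sketch under that assumption; the function names `score` and `score_all`, the feature values, and the weights are hypothetical.

```python
from typing import Dict, Sequence


def score(user_state: Sequence[float], weight_vector: Sequence[float]) -> float:
    """Inner product of the user state features and one method's weight vector."""
    if len(user_state) != len(weight_vector):
        raise ValueError("user state and weight vector must have the same length")
    return sum(s * w for s, w in zip(user_state, weight_vector))


def score_all(user_state: Sequence[float],
              weights_by_method: Dict[int, Sequence[float]]) -> Dict[int, float]:
    """Compute one score per information recommendation method (keyed by method identifier)."""
    return {method_id: score(user_state, w) for method_id, w in weights_by_method.items()}


# Hypothetical three-element user state and two methods.
example_state = [1.0, 0.0, 0.5]
example_weights = {1: [2.0, 1.0, 4.0], 2: [0.0, 3.0, 2.0]}
example_scores = score_all(example_state, example_weights)
```

In the embodiment the feature vector is said to have, for example, 29 elements; the three-element vectors here are only for readability.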

The dialogue sentence information configuration unit 16 configures dialogue sentence information using the two or more scores calculated by the score calculation unit 15. Normally, the dialogue sentence information configuration unit 16 configures dialogue sentence information having the method identifier that identifies the information recommendation method corresponding to the largest score calculated by the score calculation unit 15. The dialogue sentence information has a method identifier, and preferably also has one or more determinants, one or more spots, or both. A method identifier is information that identifies one information recommendation method.

The method identifier acquisition means 161 acquires, from the information recommendation method storage unit 12, the method identifier that identifies the one information recommendation method corresponding to the largest of the two or more scores calculated by the score calculation unit 15.

The variable value acquisition means 162 acquires one or more spots, one or more determinants, or both, from the dialogue sentence information output immediately before by the dialogue sentence output unit 17, from the user input information received immediately before by the user input information receiving unit 14, or from both.

The dialogue sentence information configuration means 163 configures dialogue sentence information from the method identifier acquired by the method identifier acquisition means 161 and the one or more spots, one or more determinants, or both, acquired by the variable value acquisition means 162.
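The three means 161 to 163 can be pictured as one small function: take the method identifier with the largest score, carry over the spots and determinants from the immediately preceding exchange, and bundle them. The dictionary layout and the name `build_dialogue_sentence_info` are assumptions for this sketch, not part of the specification.

```python
from typing import Dict, List


def build_dialogue_sentence_info(scores: Dict[int, float],
                                 last_exchange: Dict[str, List[str]]) -> dict:
    """Bundle the best-scoring method identifier with carried-over variable values."""
    # Means 161: method identifier corresponding to the largest score.
    method_id = max(scores, key=scores.get)
    # Means 162: spots / determinants taken from the immediately preceding
    # dialogue sentence information or user input information.
    spots = list(last_exchange.get("spots", []))
    determinants = list(last_exchange.get("determinants", []))
    # Means 163: the resulting dialogue sentence information.
    return {"method_id": method_id, "spots": spots, "determinants": determinants}


example_info = build_dialogue_sentence_info(
    {1: 0.2, 2: 1.5, 3: -0.3},
    {"spots": ["Spot A"]},
)
```

This sketch always takes the maximum score; stochastic selection is a separate design choice.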

Through the score calculation unit 15 and the dialogue sentence information configuration unit 16, the system action a_sys (ca_sys) is selected based on the softmax policy given by Equation 1. Here, the system action is the method identifier contained in the dialogue sentence information.

In Equation 1, S is the user state information and k is a method identifier. θ is a set of parameters, θ = (θ_11, θ_12, ..., θ_1I, ..., θ_JI), consisting of J (the number of methods) × I (the number of features) parameters. The parameter θ_ji is the weight for the i-th feature of action j and determines how readily method j is selected. This θ is what the learning system learns. Reinforcement learning is preferably used for the learning.
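Equation 1 itself is not reproduced in this text, so the following is only a sketch of a softmax policy that is consistent with the description of θ above: each method j receives a linear score Σ_i θ_ji · s_i, and the selection probabilities are the normalized exponentials of those scores. The linear form, the function names, and the use of Python's `random.choices` for sampling are assumptions.

```python
import math
import random
from typing import List, Sequence


def softmax_policy(state: Sequence[float], theta: Sequence[Sequence[float]]) -> List[float]:
    """Return the selection probability of each method; theta[j][i] plays the role of θ_ji."""
    logits = [sum(s * t for s, t in zip(state, row)) for row in theta]
    peak = max(logits)                      # subtract the maximum for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(state: Sequence[float], theta: Sequence[Sequence[float]], rng=random) -> int:
    """Sample a method identifier (1-based, as in the text) from the softmax policy."""
    probs = softmax_policy(state, theta)
    return rng.choices(range(1, len(theta) + 1), weights=probs, k=1)[0]
```

Because selection is sampled rather than always taking the maximum, methods with lower scores are still tried occasionally, which gives the learning process data about them.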

The dialogue sentence output unit 17 sends the dialogue sentence information configured by the dialogue sentence information configuration unit 16 to the simulation apparatus 2.

The user state information update unit 18 acquires at least one spot or at least one determinant from the user input information received by the user input information receiving unit 14, from the dialogue sentence output by the dialogue sentence output unit 17, or from both, and updates the user state information in the user state information storage unit 13 using the acquired spots or determinants. The user state information update unit 18 normally updates the user state information so that the values of the elements of the user state information (elements of the preference vector, the knowledge vector, the attribute vector, and so on) relating to the acquired spot or determinant increase. Any expression or algorithm may be used to increase these values. The user state information update unit 18 may increase the value of an element by adding a constant, by multiplying by a constant, or by some other increasing function.
The user state information update unit 18 normally performs the update each time the user input information receiving unit 14 receives user input information. However, the user state information update unit 18 may instead perform the update each time the dialogue sentence output unit 17 sends a dialogue sentence.

The user-presented term acquisition means 181 acquires at least one determinant from the user input information received by the user input information receiving unit 14. The user-presented term acquisition means 181 may acquire only determinants mentioned affirmatively, or may detect affirmation and negation and acquire determinants for each category (affirmative or negative). The user-presented term acquisition means 181 may also acquire the determinant of interest.

The apparatus-presented term acquisition means 182 acquires at least one determinant from the dialogue sentence output by the dialogue sentence output unit 17. The dialogue sentence output by the dialogue sentence output unit 17 is synonymous with the dialogue sentence configured by the dialogue sentence information configuration unit 16. The apparatus-presented term acquisition means 182 may acquire only determinants mentioned affirmatively, or may detect affirmation and negation and acquire determinants for each category (affirmative or negative). The apparatus-presented term acquisition means 182 may also acquire the determinant of interest.

The preference vector update means 183 updates the user state information so as to raise the values of the preference vector elements for the one or more determinants acquired by the user-presented term acquisition means 181. The preference vector update means 183 also updates the user state information so as to lower the values of the preference vector elements for those determinants that were acquired by the apparatus-presented term acquisition means 182 but were not acquired by the user-presented term acquisition means 181. That is, it lowers the values for determinants that the dialogue apparatus 1 output but the simulation apparatus 2 did not select.
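The raise-and-lower rule of the preference vector update means 183 can be sketched as follows. The text leaves the update expression open, so the additive step of 0.1, the dictionary representation, and the name `update_preference` are assumptions.

```python
from typing import Dict, Iterable


def update_preference(preference: Dict[str, float],
                      user_terms: Iterable[str],
                      apparatus_terms: Iterable[str],
                      step: float = 0.1) -> Dict[str, float]:
    """Raise determinants the user mentioned; lower ones the apparatus offered but the user ignored."""
    user_set = set(user_terms)
    updated = dict(preference)  # leave the caller's vector untouched
    for determinant in user_set:
        updated[determinant] = updated.get(determinant, 0.0) + step
    for determinant in apparatus_terms:
        if determinant not in user_set:
            updated[determinant] = updated.get(determinant, 0.0) - step
    return updated


example_preference = update_preference(
    {"garden": 0.5, "not crowded": 0.5, "scenery": 0.5},
    user_terms=["garden"],
    apparatus_terms=["garden", "not crowded"],
)
```

A multiplicative update, which the text also allows for increases, would replace the addition and subtraction with scaling factors above and below 1.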

The knowledge vector update means 184 updates the user state information so as to raise the values of the knowledge vector elements for the one or more determinants acquired by the apparatus-presented term acquisition means 182.
The user state information update unit 18 may also include attribute vector update means 185, which changes the values of the attribute vector elements for the one or more determinants acquired by the user-presented term acquisition means 181 and thereby updates the user state information.

The dialogue information storage unit 21 of the simulation apparatus 2 can store dialogue probability information, determinant probability information, and spot probability information. Dialogue probability information is information on the probability associated with each combination of an information recommendation method and a user sentence type. The dialogue probability information has, for example, entries of the form (method identifier, user sentence type identifier, user sentence type, probability), one for each combination of a method identifier and a user sentence type identifier. In this case, the probability is the probability that, when a sentence corresponding to the method identified by the method identifier is sent from the dialogue apparatus 1, the simulation apparatus 2 generates (sends to the dialogue apparatus 1) a sentence of the type identified by the user sentence type identifier. Determinant probability information is information on the probability that a determinant is selected, and is, for example, a set of (determinant, probability) pairs. The determinant probability information is managed in groups, each containing two or more related pieces of determinant probability information. Spot probability information is information on the probability that a spot is selected. When all spots are selected with the same probability, spot probability information is unnecessary. The user sentence type can be regarded as the type of the user's answer, or as the type of sentence that the user inputs. Here, the user's answer or the sentence input by the user corresponds to the information that the simulation apparatus 2 sends to the dialogue apparatus 1.

The user preference vector storage unit 22 can store a user preference vector, which is a vector indicating the user's preferences. The user preference vector is a set of values indicating the preference for each of two or more determinants. The value for each determinant may be either "0" or "1", or may be multi-level (for example, an integer from 1 to 5).

The dialogue sentence information receiving unit 23 receives dialogue sentence information from the dialogue apparatus 1. Receiving here normally means that the information is handed over. The dialogue sentence information has a method identifier, and preferably also has one or more determinants, one or more spots, or both.

The user sentence type determination unit 24 determines the user sentence type using the method identifier contained in the dialogue sentence information and the dialogue probability information, and acquires the user sentence type identifier. Specifically, the user sentence type determination unit 24 acquires from the dialogue probability information the probabilities and user sentence type identifiers paired with the method identifier contained in the dialogue sentence information, and acquires a user sentence type identifier according to those probabilities.
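Acquiring a user sentence type identifier "according to those probabilities" can be sketched as weighted sampling over the entries that share the received method identifier. The table contents, the sentence type names, and the function name `decide_user_sentence_type` are hypothetical.

```python
import random
from typing import Dict, Tuple

# Hypothetical dialogue probability information:
# (method identifier, user sentence type identifier) -> probability.
DIALOGUE_PROBABILITY: Dict[Tuple[int, str], float] = {
    (1, "accept"): 0.6,
    (1, "ask_other"): 0.4,
    (3, "accept"): 0.1,
    (3, "ask_other"): 0.9,
}


def decide_user_sentence_type(method_id: int,
                              table: Dict[Tuple[int, str], float],
                              rng=random) -> str:
    """Sample a user sentence type identifier conditioned on the system's method identifier."""
    candidates = [(t, p) for (m, t), p in table.items() if m == method_id]
    if not candidates:
        raise KeyError(f"no dialogue probability information for method {method_id}")
    types, probs = zip(*candidates)
    return rng.choices(types, weights=probs, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes simulated dialogues reproducible, which can help when comparing learning runs.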

The determinant acquisition unit 25 acquires one or more determinants or one or more spots. The determinant acquisition unit 25 does so using the determinant probability information, the spot probability information, or both. The determinant acquisition unit 25 may additionally use the one or more determinants or one or more spots contained in the dialogue sentence information. For example, when the dialogue sentence information contains three spots, the determinant acquisition unit 25 acquires one spot according to the probabilities that the spot probability information holds for those three spots. When the dialogue sentence information contains no spot, the determinant acquisition unit 25 preferably acquires one spot according to the probabilities in the spot probability information.

The user sentence type determination unit 24 and the determinant acquisition unit 25 acquire the user's speech act ca^t_user and semantic content sc^t_user in response to the system action a^t_sys. They acquire the user's speech act and semantic content using Equation 2. Here, the user's speech act is the user sentence type identifier, and the semantic content is one or more determinants, one or more spots, or both.

That is, according to Equation 2, the user's speech act ca^t_user is acquired based on the conditional probability Pr(ca^t_user | ca^t_sys). This conditional probability is the dialogue probability information described above. The semantic content sc^t_user of the user utterance is determined based on the user's preferences within the scope of the user's knowledge. sc is acquired from among the determinants that the user knows (k_m = 1), based on whether or not the user is interested in them (based on corpus statistics).

The user input information sending unit 26 sends user input information to the dialogue apparatus 1. The user input information has a user sentence type identifier, and may also have one or more determinants, one or more spots, or both. The user sentence type identifier here is the one acquired by the user sentence type determination unit 24, and the one or more determinants or one or more spots are the information acquired by the determinant acquisition unit 25.

The reward calculation unit 27 calculates the reward for the case in which the spot acquired by the determinant acquisition unit 25 is selected. The reward calculation unit 27 acquires from the knowledge base 11 the one or more evaluation values indicating the evaluation of each of the one or more determinants of the spot acquired by the determinant acquisition unit 25. The reward calculation unit 27 then reads the user preference vector from the user preference vector storage unit 22. The reward calculation unit 27 calculates the degree of match between the user preference vector it has read and the one or more evaluation values of the acquired spot (referred to, where appropriate, as the "spot evaluation vector"), and uses the degree of match to calculate the reward for selection of the user sentence type identified by the user sentence type identifier. The degree of match is, for example, the number of elements that agree between the user preference vector and the spot evaluation vector. For example, when the user preference vector is (1, 0, 1, 1, 1) and the acquired spot evaluation vector is (1, 1, 0, 1, 1), the degree of match is "3". The reward calculation unit 27 preferably calculates the reward by comparing the case in which a spot is decided at random with the case in which the spot acquired by the determinant acquisition unit 25 is selected, using the random selection match value calculation means 271, the selected spot match degree calculation means 272, and the reward calculation means 273 described later.
The reward calculation unit 27 calculates the reward using, for example, the reward function of Equation 3.
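The degree-of-match example in the text, counting the elements that agree between the two vectors, can be checked with a few lines. The function name `match_degree` is an assumption; the vectors are the ones given in the text.

```python
from typing import Sequence


def match_degree(user_preference: Sequence[int], spot_evaluation: Sequence[int]) -> int:
    """Count the positions at which the preference vector and the spot evaluation vector agree."""
    if len(user_preference) != len(spot_evaluation):
        raise ValueError("vectors must have the same length")
    return sum(1 for p, e in zip(user_preference, spot_evaluation) if p == e)


# Vectors from the example in the text: they agree at the 1st, 4th and 5th elements.
example_match = match_degree((1, 0, 1, 1, 1), (1, 1, 0, 1, 1))  # 3
```

Counting agreements is only the example measure named in the text; other measures of agreement between the two vectors would fit the same description.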

The reward function is a function that calculates a reward based on the rate of agreement between the attributes of the spot selected by the user (the one or more evaluation values) and the user's preferences (the user preference vector). The user is assumed to select, under the knowledge K_user and the local weights V_user in the current dialogue state, the spot k with the highest priority (Σ_m k_k · p_k · v_km). The reward R is given based on how much better a choice the spot k decided on by the user is than a spot decided at random. In Equation 3, M is the number of evaluation values, which is also the number of elements of the user preference vector, and is, for example, 29. N is the number of spots. p_m is the user preference vector (its m-th element), and e_k,m is each element of the evaluation information of spot k.

The random selection match value calculation means 271 uses the spot probability information to calculate the expected value of the degree of match between the one or more evaluation values and the user preference vector when a spot is decided at random. The spot probability information may be the same for all spots, in which case "deciding a spot at random" means "deciding among the spots with equal probability".

The selected spot match degree calculation means 272 calculates the degree of match between the user preference vector and the one or more evaluation values indicating the evaluation of each of the one or more determinants of the spot contained in the user input information. The selected spot match degree calculation means 272 may instead calculate the degree of match between the user preference vector and the one or more evaluation values of the spot decided on by the determinant acquisition unit 25. The spot contained in the user input information, or the spot decided on by the determinant acquisition unit 25, is referred to here as the spot of interest.

The reward calculation means 273 calculates the reward for the selection of the spot contained in the user input information, using the expected value of the degree of match calculated by the random selection match value calculation means 271 and the degree of match calculated by the selected spot match degree calculation means 272.
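Equation 3 is not reproduced in this text, so the following sketch only illustrates the stated idea of comparing the selected spot with a random choice. It assumes the reward is the selected spot's degree of match minus the expected degree of match under random selection, and it reuses the agreement-count measure from the earlier example; the subtraction, the function names, and the sample data are all assumptions.

```python
from typing import Dict, Optional, Sequence


def match_degree(preference: Sequence[int], evaluation: Sequence[int]) -> int:
    """Number of positions at which the two vectors agree."""
    return sum(1 for p, e in zip(preference, evaluation) if p == e)


def expected_random_match(preference: Sequence[int],
                          evaluations: Dict[str, Sequence[int]],
                          spot_probability: Optional[Dict[str, float]] = None) -> float:
    """Expected degree of match when a spot is drawn at random (means 271)."""
    if spot_probability is None:
        # No spot probability information: every spot is equally likely.
        spot_probability = {spot: 1.0 / len(evaluations) for spot in evaluations}
    return sum(spot_probability[spot] * match_degree(preference, ev)
               for spot, ev in evaluations.items())


def reward(preference: Sequence[int],
           evaluations: Dict[str, Sequence[int]],
           selected_spot: str,
           spot_probability: Optional[Dict[str, float]] = None) -> float:
    """Selected spot's match (means 272) minus the random baseline (means 273, assumed form)."""
    selected = match_degree(preference, evaluations[selected_spot])
    return selected - expected_random_match(preference, evaluations, spot_probability)


example_evaluations = {
    "Spot A": (1, 1, 0, 1, 1),
    "Spot B": (1, 0, 1, 1, 1),
    "Spot C": (0, 1, 0, 0, 0),
}
example_preference_vector = (1, 0, 1, 1, 1)
```

Under this assumed form, a spot that matches the user better than average yields a positive reward and a worse-than-average spot yields a negative one, which is the sign convention the learning unit relies on.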

The learning unit 28 uses the reward to update the weight vector that corresponds to the method identifier of the dialogue apparatus 1, that is, the weight vector held in the information recommendation method storage unit 12 of the dialogue apparatus 1. For example, when the reward is positive, the learning unit 28 updates this weight vector so that the information recommendation method contained in the dialogue sentence information becomes more likely to be selected. "Updating" here covers two cases: the learning unit 28 may directly rewrite the weight vector in the information recommendation method storage unit 12, or it may instruct the dialogue apparatus 1 to perform the update, in which case the dialogue apparatus 1 rewrites the weight vector on receiving the instruction. The method and degree of the update are not restricted. Normally, the larger the reward, the more strongly the learning unit 28 updates the weight vector, according to the magnitude of the reward, so that the information recommendation method contained in the dialogue sentence information becomes more likely to be selected. Conversely, when the reward is negative, the learning unit 28 updates the weight vector so that the information recommendation method contained in the dialogue sentence information becomes less likely to be selected. For example, the learning unit 28 updates the weight vector with the Natural Actor-Critic (NAC) algorithm described later, which is one of the natural policy gradient methods. NAC is described in Hirotaka Hachiya and Masashi Sugiyama, "強くなるロボティック・ゲームプレイヤーの作り方" (How to Make a Strong Robotic Game Player), Mainichi Communications (2008), and is a publicly known technique, so a detailed description is omitted. NAC is a method for optimizing a policy. A policy gradient method does not directly estimate the value function of the state S or the action value function Q(S, A); instead, it directly updates the policy π by the natural gradient method so as to increase the reward of the dialogue episodes obtained under the pre-update policy.
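As a rough illustration of the update direction described above, the following Python sketch applies a plain (vanilla) policy-gradient step to a softmax policy over information recommendation methods. It is not the NAC algorithm itself, which additionally uses the natural gradient. The softmax policy, the learning rate `lr`, and all function and variable names are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Convert raw scores into selection probabilities."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_weights(weights, state, chosen, reward, lr=0.1):
    """Return new weight vectors after one reward-weighted gradient step.

    weights: dict mapping a method identifier to its weight vector (hypothetical layout)
    state:   user state vector observed when the method was chosen
    chosen:  identifier of the method contained in the dialogue sentence information
    reward:  scalar reward; positive makes `chosen` more likely, negative less likely
    """
    ids = list(weights)
    scores = [sum(w * s for w, s in zip(weights[k], state)) for k in ids]
    probs = softmax(scores)
    updated = {}
    for k, p in zip(ids, probs):
        indicator = 1.0 if k == chosen else 0.0
        # Gradient of log softmax probability of `chosen` with respect to w_k.
        updated[k] = [w + lr * reward * (indicator - p) * s
                      for w, s in zip(weights[k], state)]
    return updated
```

With a positive reward the weight vector of the chosen method moves toward the observed state and the others move away from it; a negative reward reverses the direction, and a larger absolute reward produces a larger step.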

The knowledge base 11, the information recommendation method storage unit 12, the user state information storage unit 13, the dialogue information storage unit 21, and the user preference vector storage unit 22 are preferably non-volatile recording media, but can also be realized by volatile recording media. The process by which spot information and the like come to be stored in the knowledge base 11 and the like is not restricted. For example, the information may be stored via a recording medium, it may be received over a communication line or the like and then stored, or it may be entered through an input device and then stored.

The user input information receiving unit 14 and the dialogue sentence information receiving unit 23 are realized by, for example, wireless or wired communication means. They may also be realized by an MPU, memory, and the like. "Receiving" here may mean reception over a communication path or acceptance of information through a function or the like.

The score calculation unit 15, the dialogue sentence information configuration unit 16, the user state information update unit 18, the user sentence type determination unit 24, the determinant acquisition unit 25, the reward calculation unit 27, and the learning unit 28 can normally be realized by an MPU, memory, and the like. Their processing procedures are normally implemented in software recorded on a recording medium such as a ROM, but they may instead be implemented in hardware (dedicated circuits).

The user input information sending unit 26 and the dialogue sentence output unit 17 are realized by, for example, wireless or wired communication means. They may also be realized by an MPU, memory, and the like. "Sending" here may mean transmission or handing the information over to another process.

Next, the operation of the learning system is described. First, the operation of the dialogue apparatus 1 is described with reference to the flowchart of FIG. 4.

(Step S401) The dialogue sentence output unit 17 sends the initial dialogue sentence information, which it holds in advance, to the simulation apparatus 2.

(Step S402) The user input information receiving unit 14 determines whether user input information has been received from the simulation apparatus 2. If it has, the process proceeds to step S403; if not, the process returns to step S402.

(Step S403) The user input information receiving unit 14, or a means not shown, determines whether the user input information received by the user input information receiving unit 14 satisfies the end condition. If the end condition is satisfied, the process ends; if not, the process proceeds to step S404. The end condition is satisfied when, for example, the user input information includes a user sentence type identifier corresponding to a predetermined sentence pattern, such as "I will go to <spot>." or "I have decided on <spot>."

(Step S404) The score calculation unit 15 calculates a score for each of the two or more information recommendation methods. The score calculation process is described in detail with reference to the flowchart of FIG. 5.

(Step S405) The dialogue sentence information configuration unit 16 composes the dialogue sentence information to be sent. This composition process is described in detail with reference to the flowchart of FIG. 6.

(Step S406) The dialogue sentence output unit 17 sends the dialogue sentence information composed by the dialogue sentence information configuration unit 16 to the simulation apparatus 2.

(Step S407) The user state information update unit 18 performs the user state information update process, and the process returns to step S402. The update process is described in detail with reference to the flowchart of FIG. 8.

In the flowchart of FIG. 4, the process ends on power-off or on an interrupt that terminates processing.

Next, the score calculation process of step S404 is described in detail with reference to the flowchart of FIG. 5.

(Step S501) The score calculation unit 15 reads the user state information from the user state information storage unit 13.

(Step S502) The score calculation unit 15 sets the counter i to 1.

(Step S503) The score calculation unit 15 determines whether an i-th information recommendation method exists in the information recommendation method storage unit 12. If it exists, the process proceeds to step S504; if not, the process returns to the calling process.

(Step S504) The score calculation unit 15 reads the weight vector of the i-th information recommendation method.

(Step S505) The score calculation unit 15 calculates the score of the i-th information recommendation method from the user state information read in step S501 and the evaluation information and weight vector read in step S504, and temporarily stores the score in association with the i-th information recommendation method. For example, the score calculation unit 15 calculates the score as "user state information × weight vector". The user state information and the evaluation information are also vectors.

(Step S506) The score calculation unit 15 increments the counter i by 1, and the process returns to step S503.

The score calculation method used by the score calculation unit 15 in the flowchart of FIG. 5 is not restricted.
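The loop of steps S501 to S506 can be sketched in Python as follows, under the assumption that "user state information × weight vector" denotes an inner product of two equal-length vectors. The dictionary layout and the names `compute_scores`, `user_state`, and `methods` are hypothetical.

```python
def compute_scores(user_state, methods):
    """Score every information recommendation method against the user state.

    user_state: list of floats (the user state vector, step S501)
    methods:    dict mapping a method identifier to its weight vector (step S504)
    Returns a dict mapping each method identifier to its score (step S505).
    """
    scores = {}
    for method_id, weight_vector in methods.items():
        # Inner product of the user state vector and the method's weight vector.
        scores[method_id] = sum(u * w for u, w in zip(user_state, weight_vector))
    return scores
```

Because the method is unrestricted in the description, any other scoring function with the same inputs could replace the inner product here.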

Next, the dialogue sentence information composition process of step S405 is described in detail with reference to the flowchart of FIG. 6.

(Step S601) The dialogue sentence information configuration unit 16 acquires the spot contained in the immediately preceding user input information and/or in the immediately preceding dialogue sentence information of the dialogue apparatus 1. A spot cannot always be acquired.

(Step S602) The dialogue sentence information configuration unit 16 acquires the determinant contained in the immediately preceding user input information and/or in the immediately preceding dialogue sentence information of the dialogue apparatus 1. A determinant cannot always be acquired.

(Step S603) The dialogue sentence information configuration unit 16 determines whether a spot was acquired in step S601. If so, the process proceeds to step S604; if not, the process proceeds to step S605.

(Step S604) The dialogue sentence information configuration unit 16 assigns the spot acquired in step S601 to the variable "spot of interest". The value of this variable is the spot currently in focus in the dialogue, and is normally a single spot.

(Step S605) The dialogue sentence information configuration unit 16 determines whether a determinant was acquired in step S602. If so, the process proceeds to step S606; if not, the process proceeds to step S607.

(Step S606) The dialogue sentence information configuration unit 16 assigns the determinant acquired in step S602 to the variable "determinant of interest". The value of this variable is the determinant currently in focus in the dialogue, and may consist of two or more determinants.

(Step S607) The dialogue sentence information configuration unit 16 searches the knowledge base 11 using the values of the variables "spot of interest" and "determinant of interest", and reads from the knowledge base 11 the explanatory sentence information corresponding to that spot and determinant. This explanatory sentence information serves as the answer sentence information for the user's input sentence. Normally, the dialogue sentence information configuration unit 16 reads the explanatory sentence corresponding to the values of these two variables. The explanatory sentence information may be the explanatory sentence itself or information that identifies it.

(Step S608) The dialogue sentence information configuration unit 16 performs the recommendation sentence information acquisition process, which is described with reference to the flowchart of FIG. 7. The recommendation sentence information has a method identifier, and preferably also has one or more spots or one or more determinants.

(Step S609) The dialogue sentence information configuration means 163 composes the dialogue sentence information from the explanatory sentence information acquired in step S607 and the recommendation sentence information acquired in step S608.

In the flowchart of FIG. 6, answer sentence information and recommendation sentence information are acquired. Other combinations are also conceivable, however, such as acquiring only recommendation sentence information, or acquiring information on other sentences in addition to the answer sentence information and the recommendation sentence information.

Next, the recommendation sentence information acquisition process of step S608 (FIG. 6) is described with reference to the flowchart of FIG. 7.

(Step S701) The method identifier acquisition means 161 of the dialogue sentence information configuration unit 16 acquires, from the information recommendation method storage unit 12, the method identifier of the one information recommendation method that corresponds to the largest of the two or more scores calculated by the score calculation unit 15.

(Step S702) The variable value acquisition means 162 sets the counter i to 1.

(Step S703) The variable value acquisition means 162 determines whether an i-th variable exists in the sentence pattern information corresponding to the method identifier acquired in step S701. If it exists, the process proceeds to step S704; if not, the process returns to the calling process.

(Step S704) The variable value acquisition means 162 acquires the i-th variable in the sentence pattern information corresponding to the method identifier acquired in step S701. The variable also holds information on where its value is to be obtained from.

(Step S705) The variable value acquisition means 162 acquires one or more terms to be assigned to the i-th variable. A term is normally a spot or a determinant (or a word or the like that identifies a determinant).

(Step S706) The variable value acquisition means 162 increments the counter i by 1, and the process returns to step S703.
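A minimal Python sketch of steps S701 to S706 is given below. It assumes that sentence pattern information is a template string whose variables are written as `<name>`, and that the terms for the variables are supplied in a dictionary; this template syntax and the names `build_recommendation`, `patterns`, and `context` are illustrative assumptions, not taken from the description.

```python
import re


def build_recommendation(scores, patterns, context):
    """Pick the highest-scoring method and fill in its sentence pattern.

    scores:   dict mapping a method identifier to its score
    patterns: dict mapping a method identifier to a template such as "<spot> ..."
    context:  dict mapping a variable name to the term to substitute
    """
    # Step S701: the method identifier with the largest score.
    method_id = max(scores, key=scores.get)
    pattern = patterns[method_id]

    def fill(match):
        # Steps S704 and S705: look up the term for each variable in turn.
        # A variable with no available term is left unchanged.
        return context.get(match.group(1), match.group(0))

    text = re.sub(r"<(\w+)>", fill, pattern)
    return {"method_id": method_id, "text": text}
```

The returned dictionary carries the method identifier together with the text, mirroring the statement that recommendation sentence information has a method identifier.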

Next, the user state information update process of step S407 is described in detail with reference to the flowchart of FIG. 8.

(Step S801) The user-presented term acquisition means 181 of the user state information update unit 18 acquires one or more determinants from the latest (immediately preceding) user input information received by the user input information receiving unit 14. If no determinant can be acquired from that input, the user-presented term acquisition means 181 acquires the determinant of interest instead. It also acquires one or more spots from the same user input information. It then temporarily stores the acquired determinants and/or spots in a buffer.

(Step S802) The preference vector update means 183 reads the preference vector contained in the user state information in the user state information storage unit 13, and updates it so that the values of the elements corresponding to the determinants acquired in step S801 increase. At this point, the attribute vector update means 185 may also update the attribute vector contained in the user state information so that the value of the corresponding element increases, becomes equal to, or moves closer to the value associated with the determinant acquired in step S801. For example, suppose that one element of the attribute vector is gender, encoded as "1" for male and "0" for female, and that the value associated with the determinant acquired in step S801 is "0" (female). The attribute vector update means 185 then changes the value of the attribute "gender" so that it moves closer to "0". How the attribute value is changed is not restricted. For example, the attribute vector update means 185 may take the average of the element values (1 or 0) associated with the acquired determinants as the current attribute value.

(Step S803) The device-presented term acquisition means 182 acquires one or more determinants from the latest (immediately preceding) dialogue sentence information sent by the dialogue sentence output unit 17. It also acquires one or more spots from the same dialogue sentence information. It then temporarily stores the acquired determinants and/or spots in a buffer.

(Step S804) The knowledge vector update means 184 reads the knowledge vector contained in the user state information in the user state information storage unit 13, and updates it so that the values of the elements corresponding to the determinants acquired in step S803 increase.

(Step S805) The user state information update unit 18 reads the number of dialogue turns contained in the user state information in the user state information storage unit 13. The number of dialogue turns is the number of times the dialogue exchange has been repeated. The user state information update unit 18 then writes the read value plus 1 back to the user state information in the user state information storage unit 13 as the new number of turns.

(Step S806) The user state information update unit 18 updates the immediately preceding user speech act information. This is information about the user input information most recently received by the user input information receiving unit 14, and corresponds to the type of information the user requested (spots only, determinant names only, both, and so on).

(Step S807) The user state information update unit 18 updates the immediately preceding system speech act information. This is information about the dialogue sentence information most recently sent by the dialogue sentence output unit 17, namely the information (method identifier) that identifies the selected information recommendation method.

(Step S808) The user state information update unit 18 updates the system presentation history information, which is the number of spots and the number of determinants that the dialogue apparatus 1 (the system) has sent to the simulation apparatus 2. The user state information update unit 18 removes duplicates from the determinants and from the spots written to the buffer in step S803, counts the determinants and the spots remaining in the buffer, and takes these two counts as the system presentation history information.

The information updated in steps S805 to S808 of the flowchart of FIG. 8 is an example of the information that makes up the user state information; other information may also be included.
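The update of FIG. 8 can be sketched in Python as follows. The `UserState` container, the fixed increment `step`, and the choice to accumulate presented spots and determinants across turns in sets (so that duplicates are removed automatically) are all assumptions made for illustration; the description leaves the size of the increment and the buffer lifetime open. The attribute vector update of step S802 is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class UserState:
    preference: dict = field(default_factory=dict)   # determinant -> preference value
    knowledge: dict = field(default_factory=dict)    # determinant -> knowledge value
    turns: int = 0                                    # number of dialogue turns
    last_user_act: str = ""                           # type of information the user requested
    last_system_act: str = ""                         # method identifier last used by the system
    presented_spots: set = field(default_factory=set)
    presented_factors: set = field(default_factory=set)


def update_user_state(state, user_factors, user_act,
                      sys_factors, sys_spots, method_id, step=1.0):
    """Apply one turn of updates (steps S802 and S804 to S808)."""
    for f in user_factors:                            # S802: raise preference values
        state.preference[f] = state.preference.get(f, 0.0) + step
    for f in sys_factors:                             # S804: raise knowledge values
        state.knowledge[f] = state.knowledge.get(f, 0.0) + step
    state.turns += 1                                  # S805
    state.last_user_act = user_act                    # S806
    state.last_system_act = method_id                 # S807
    state.presented_spots.update(sys_spots)           # S808: sets drop duplicates
    state.presented_factors.update(sys_factors)
    return state


def presentation_history(state):
    """Return (number of distinct spots, number of distinct determinants) presented."""
    return len(state.presented_spots), len(state.presented_factors)
```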

Next, the operation of the simulation apparatus 2 is described with reference to the flowchart of FIG. 9.

(Step S901) The dialogue sentence information receiving unit 23 determines whether dialogue sentence information has been received. If it has, the process proceeds to step S902; if not, the process returns to step S901.

(Step S902) The user sentence type determination unit 24 acquires the method identifier contained in the dialogue sentence information received in step S901.

(Step S903) The user sentence type determination unit 24 determines the user sentence type using the method identifier acquired in step S902 and the dialogue probability information in the dialogue information storage unit 21, and acquires the corresponding user sentence type identifier.

(Step S904) The determinant acquisition unit 25 acquires one or more determinants and/or one or more spots contained in the dialogue sentence information received in step S901. In some cases neither a determinant nor a spot can be acquired.

(Step S905) The determinant acquisition unit 25 uses the determinants and/or spots acquired in step S904 to acquire one or more determinants or one or more spots. In some cases no determinant or spot is determined. If no spot can be acquired from what was obtained in step S904, the determinant acquisition unit 25 may acquire a single spot according to the probabilities in the spot probability information.

(Step S906) The user input information sending unit 26 composes the user input information from the user sentence type identifier acquired in step S903 and the zero or more determinants and/or zero or more spots acquired in step S905. Composing here means arranging the user sentence type identifier and the zero or more determinants and/or zero or more spots into the data structure to be transmitted.

(Step S907) The user input information sending unit 26 sends the user input information composed in step S906 to the dialogue apparatus 1.

(Step S908) The reward calculation unit 27 calculates the reward for the case in which the spot acquired by the determinant acquisition unit 25 is selected. The reward calculation process is described with reference to the flowchart of FIG. 10.

(Step S909) The learning unit 28 uses the reward calculated in step S908 to update the weight vector that corresponds to the method identifier of the dialogue apparatus 1, that is, the weight vector in the information recommendation method storage unit 12 of the dialogue apparatus 1, and the process returns to step S901.

In the flowchart of FIG. 9, the process ends on power-off or on an interrupt that terminates processing.
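Steps S903 and S906 can be illustrated with the following Python sketch. It assumes that the dialogue probability information is a table giving, for each method identifier, a probability for each user sentence type, and that the user sentence type is drawn at random from that distribution. The table layout, the dictionary form of the user input information, and all names are hypothetical.

```python
import random


def choose_user_sentence_type(method_id, dialogue_probs, rng=random):
    """Step S903: draw a user sentence type given the system's method identifier.

    dialogue_probs: dict mapping a method identifier to a dict
                    {user sentence type identifier: probability}
    rng:            object providing choices(), e.g. a random.Random instance
    """
    dist = dialogue_probs[method_id]
    types = list(dist)
    return rng.choices(types, weights=[dist[t] for t in types], k=1)[0]


def build_user_input(sentence_type, factors=(), spots=()):
    """Step S906: arrange the pieces into one structure to be sent."""
    return {
        "sentence_type": sentence_type,
        "factors": list(factors),   # zero or more determinants
        "spots": list(spots),       # zero or more spots
    }
```

Passing a seeded `random.Random` instance as `rng` makes a simulated dialogue reproducible, which can be convenient when comparing learning runs.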

Next, the reward calculation process of step S908 is described with reference to the flowchart of FIG. 10.

(Step S1001) The reward calculation unit 27 reads the user preference vector from the user preference vector storage unit 22.

(Step S1002) The random-selection match value calculation means 271 sets the counter i to 1.

(Step S1003) The random-selection match value calculation means 271 determines whether an i-th spot exists in the knowledge base 11. If it exists, the process proceeds to step S1004; if not, the process proceeds to step S1007.

(Step S1004) The random-selection match value calculation means 271 reads the one or more evaluation values of the i-th spot from the knowledge base 11.

(Step S1005) The random-selection match value calculation means 271 calculates the degree of match between the user preference vector read in step S1001 and the one or more evaluation values of the i-th spot read in step S1004, and temporarily stores the degree of match in a buffer (not shown). For example, the degree of match is the number of elements of the user preference vector that agree with the corresponding evaluation values of the i-th spot.

(Step S1006) The random-selection match value calculation means 271 increments the counter i by 1, and the process returns to step S1003.

(Step S1007) The random-selection match value calculation means 271 calculates the expected degree of match over all spots from the degrees of match stored in step S1005. The expected value may be the mean of the degrees of match of all spots, or it may be the sum over all spots of "degree of match of the spot × probability of the spot given by the spot probability information".

(Step S1008) The selected-spot match degree calculation means 272 reads from the knowledge base 11 the one or more evaluation values that rate the spot of interest on each of its one or more determinants.

(Step S1009) The selected-spot match degree calculation means 272 calculates the degree of match between the user preference vector read in step S1001 and the one or more evaluation values read in step S1008. For example, the degree of match is the number of elements of the user preference vector that agree with the corresponding evaluation values of the spot of interest.

(Step S1010) The reward calculation means 273 uses the expected degree of match calculated by the random-selection match value calculation means 271 and the degree of match calculated by the selected-spot match degree calculation means 272 to calculate the reward for the selection of the spot included in the user input information. For example, the reward calculation means 273 calculates the reward as "reward = degree of match calculated in step S1009 − expected value calculated in step S1007". This formula is assumed to be held in advance by the reward calculation means 273.
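The reward calculation of FIG. 10 can be sketched in Python as follows, using the example definitions given in the steps: the degree of match is the count of agreeing positions, and the expected value is either the plain mean or the probability-weighted sum. The dictionary layout of the spot table and the function names are assumptions made for illustration.

```python
def match_degree(preference, evaluations):
    """Steps S1005 and S1009: count positions where preference and evaluation agree."""
    return sum(1 for p, e in zip(preference, evaluations) if p == e)


def expected_match(preference, spots, spot_probs=None):
    """Step S1007: expected degree of match over all spots.

    spots:      dict mapping a spot name to its list of evaluation values
    spot_probs: optional dict mapping a spot name to its selection probability;
                when omitted, the plain mean over all spots is used
    """
    degrees = {name: match_degree(preference, ev) for name, ev in spots.items()}
    if spot_probs is None:
        return sum(degrees.values()) / len(degrees)
    return sum(degrees[name] * spot_probs[name] for name in degrees)


def reward(preference, spots, selected, spot_probs=None):
    """Step S1010: match of the selected spot minus the random-selection baseline."""
    return match_degree(preference, spots[selected]) - expected_match(
        preference, spots, spot_probs)
```

Subtracting the random-selection baseline means that a spot no better than a random pick earns a reward of zero, a better-than-random spot earns a positive reward, and a worse one earns a negative reward.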

An evaluation experiment of the learning system in the present embodiment is described below. In the evaluation experiment, the dialogue apparatus 1 is an apparatus that provides sightseeing guidance for Kyoto: while conducting a dialogue with the simulation apparatus 2, which simulates a user, it supports the simulation apparatus 2 in deciding on a place to visit.

FIG. 11 is an example of the spot information management table held by the knowledge base 11. The spot information management table stores two or more pieces of spot information, which is information about spots. The spots here are sightseeing spots in Kyoto. In this specification, "spot" may refer either to information that identifies a spot (which may also be called a spot identifier) or to the concept of a spot. Each piece of spot information has a "spot", "determinant information", "evaluation values", and a "description". The "determinant information" has a "determinant identifier", which identifies a determinant, and a "determinant", which is a phrase that precisely expresses the determinant (a phrase that may include positive or negative terms). In this specific example, there are ten determinants (in no particular order): "famous for its garden", "not crowded", "World Heritage site", "good scenery", "easy access", "famous for autumn leaves", "famous for cherry blossoms", "famous for its history", "good for strolling", and "holds events". An "evaluation value" is information indicating the evaluation of a spot with respect to a determinant; here it takes the value "1" if the spot has the characteristic indicated by the determinant information and "0" if it does not. The "evaluation value" need not be binary ("1" or "0") and may instead be a multi-level rating (for example, "1" to "5").

FIG. 12 is an example of the information recommendation method management table in the information recommendation method storage unit 12. The information recommendation method management table stores six information recommendation methods. Each information recommendation method has an "ID", a "method identifier", "sentence pattern information", "evaluation information", and a "weight vector". The "ID" is a numerical value that identifies the information recommendation method. The "method identifier" is a character string that identifies the information recommendation method and conveys its meaning. The "sentence pattern information" is information from which a recommendation sentence is constructed, or is itself a recommendation sentence (ID = 3 only).
In a recommendation sentence, a tag (information beginning with "<" and ending with ">") is a variable. Information enclosed in "{" and "}" within the sentence pattern information indicates how to obtain the value of the immediately preceding variable (hereinafter, an acquisition operation description). <spot of interest> is replaced with the current spot of interest. <determinant of interest> is replaced with the current determinant of interest. <one or more unmentioned determinants> is replaced with determinants that have not yet appeared in the dialogue for the spot of interest and for which the spot of interest has an evaluation value of "1". <one or more unmentioned spots> is replaced with spots that have not yet appeared in the dialogue for the determinant of interest and whose evaluation value for the determinant of interest is "1". <most preferred spot> is the spot that scores highest when the rank (preference value) of each spot is calculated from the preference vector in the user state information.
The acquisition operation description "{select up to 3 determinants where lowest value in the knowledge vector first}" indicates an operation that acquires up to three determinants in ascending order of their values in the knowledge vector. Because this acquisition operation description is attached to <one or more unmentioned determinants>, it indicates an operation that acquires up to three unmentioned determinants in ascending order of their values in the knowledge vector. The acquisition operation description "{select up to 3 determinants where highest value in the preference vector first}" indicates an operation that acquires up to three determinants in descending order of their values in the preference vector. Because this description is also attached to <one or more unmentioned determinants>, it indicates an operation that acquires up to three unmentioned determinants in descending order of their values in the preference vector. The acquisition operation description "{select most preferred spot where spot with the maximum preference value}" indicates acquiring, from the current user state information, the spot that the system estimates to have the maximum preference value from the user's point of view. For example, this description indicates an operation that substitutes the preference vector and, for each spot, the evaluation value of each determinant into the expression for calculating the preference value, calculates the preference values, and acquires the spot with the maximum preference value.
The "evaluation information" is information held for each information recommendation method. The score of each information recommendation method is calculated using the current user state information and the evaluation information of that method, and a recommendation sentence is constructed using the sentence pattern information of the method with the highest score. For example, the current user state information (a vector) is multiplied by the evaluation information (a vector) of the information recommendation method to calculate the score of that method.
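The acquisition operation descriptions and the score-based choice of a recommendation method can be illustrated with a short Python sketch. All names and numbers below are hypothetical, and the score is taken to be an inner product of the two vectors, which is one possible reading of "multiplied" in the text above.

```python
from typing import Dict, List, Set


def pick_unmentioned(spot_eval: List[int], mentioned: Set[int],
                     vector: List[float], lowest_first: bool,
                     limit: int = 3) -> List[int]:
    # Candidates are determinants that the spot of interest has (value 1)
    # and that have not yet appeared in the dialogue.
    candidates = [i for i, v in enumerate(spot_eval)
                  if v == 1 and i not in mentioned]
    # lowest_first=True mimics "lowest value in the knowledge vector first";
    # lowest_first=False mimics "highest value in the preference vector first".
    candidates.sort(key=lambda i: vector[i], reverse=not lowest_first)
    return candidates[:limit]


def method_score(state: List[float], evaluation: List[float]) -> float:
    # Assumed scoring rule: inner product of user state and evaluation info.
    return sum(s * e for s, e in zip(state, evaluation))


def best_method(state: List[float], methods: Dict[int, List[float]]) -> int:
    # The method with the highest score supplies the sentence pattern.
    return max(methods, key=lambda mid: method_score(state, methods[mid]))


# Hypothetical data: five determinants, determinant 0 already mentioned.
spot_eval = [1, 1, 0, 1, 1]
knowledge = [0.9, 0.5, 0.1, 0.2, 0.7]
preference = [0.1, 0.8, 0.9, 0.3, 0.6]
by_knowledge = pick_unmentioned(spot_eval, {0}, knowledge, lowest_first=True)
by_preference = pick_unmentioned(spot_eval, {0}, preference, lowest_first=False)

state = [1.0, 0.0, 2.0]
methods = {1: [0.5, 1.0, 0.0], 2: [0.0, 0.0, 1.0], 3: [1.0, 1.0, 0.0]}
chosen_method = best_method(state, methods)
```

Ordering by low knowledge favors determinants the user probably does not know about, while ordering by high preference favors determinants the user is believed to care about.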

The user state information storage unit 13 is assumed to store the user state information described below. For example, the user state information has three elements: a knowledge vector K_user, a preference vector P_user, and a local weight matrix V_user. Here, for simplicity, the elements of the user's preference vector P_user = (p_1, p_2, ..., p_M) are binary parameters taking the value "1" or "0". That is, p_m takes "1" when the user is interested (or potentially interested) in a determinant m and gives it weight when deciding on a spot. In addition, to represent a state in which the user has latent preferences (of which the user is not aware), the user's knowledge vector K_user = (k_1, k_2, ..., k_M) is introduced. The element k_m takes "1" when the user knows that the system (dialogue apparatus 1) can handle determinant m, or when the system has recommended determinant m. With these vectors, the state in which determinant m is a factor of latent interest to the user, but the user is not aware of it, can be expressed as (k_m = 0, p_m = 1). The local weight v_nm of spot n from the viewpoint of determinant m is assumed to take "1" when the system has informed the user of the spot's evaluation using any of the recommendation methods with ID = 1, 2, or 6 among the six recommendation methods above, on the assumption that the user judges only from information presented by the system. The user state information may also have an attribute vector A_user.
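The three-part user state can be pictured with a small Python sketch. The class and method names are hypothetical and the data are toy values; only the update rules (k_m becomes 1 on recommendation, v_nm becomes 1 when a method with ID 1, 2, or 6 conveys the evaluation) are taken from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserState:
    knowledge: List[int]           # K_user: k_m = 1 if the user knows determinant m
    preference: List[int]          # P_user: p_m = 1 if the user values determinant m
    local_weight: List[List[int]]  # V_user: v_nm for spot n and determinant m

    def latent_interests(self) -> List[int]:
        # Determinants the user values but is not aware of: (k_m = 0, p_m = 1).
        return [m for m, (k, p) in enumerate(zip(self.knowledge, self.preference))
                if k == 0 and p == 1]

    def recommend_determinant(self, m: int) -> None:
        # The system recommended determinant m, so the user now knows it.
        self.knowledge[m] = 1

    def inform_evaluation(self, method_id: int, n: int, m: int) -> None:
        # Only methods with ID 1, 2 or 6 convey a spot's evaluation.
        if method_id in (1, 2, 6):
            self.local_weight[n][m] = 1


# Toy example: three determinants, two spots.
user = UserState(knowledge=[1, 0, 0], preference=[1, 1, 0],
                 local_weight=[[0, 0, 0], [0, 0, 0]])
latent_before = user.latent_interests()
user.recommend_determinant(1)
latent_after = user.latent_interests()
user.inform_evaluation(3, 0, 1)   # method 3 conveys nothing
weight_after_method3 = user.local_weight[0][1]
user.inform_evaluation(2, 0, 1)   # method 2 conveys the evaluation
weight_after_method2 = user.local_weight[0][1]
```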

The user state information here corresponds to the feature vector of the dialogue state. More specifically, the user state information here has the following six kinds of information.
The first is the number of turns. Here, the number of turns is expressed with five parameters by using sawtooth functions.
The second is the immediately preceding user speech act information. This is, for example, a vector (x_i, x_{i+1}, x_{i+2}, x_{i+3}, x_{i+4}) with elements (1 if a_user^{t-1} = x_i, otherwise 0), where a_user^{t-1} is the immediately preceding user utterance. When that utterance contains only spots (which may also be called spot names) or determinants recommended by the system (dialogue apparatus 1), x_i = 1 and the other elements are "0". When it contains a spot not recommended by the system, x_{i+1} = 1 and the other elements are "0". When it contains only determinants not recommended by the system, x_{i+2} = 1 and the other elements are "0". When it contains both spots and determinants not recommended by the system, x_{i+3} = 1 and the other elements are "0". When it contains none of these, x_{i+4} = 1 and the other elements are "0".
The third is the immediately preceding system speech act information. This is, for example, a vector (y_i, y_{i+1}, y_{i+2}, y_{i+3}, y_{i+4}, y_{i+5}, y_{i+6}) with elements (1 if a_sys^{t-1} = y_i, otherwise 0), where a_sys^{t-1} is the immediately preceding system utterance. When that utterance used information recommendation method 1, y_{i+1} = 1 and the other elements are "0". When it used information recommendation method 2, y_{i+2} = 1 and the other elements are "0". In general, when information recommendation method n was used, y_{i+n} = 1 and the other elements are "0".
The fourth is the knowledge vector for the user's determinants. The user's knowledge of the determinants may instead be calculated as Σ_{n=1}^{N} Pr(k_n = 1). Here, k_n is the element value of the knowledge vector for the n-th determinant, and Pr(k = 1) is the posterior probability (confidence) with which the system estimates that "k is 1". The element values of the preference vector are assumed to be "1" or "0".
The fifth is the number of spot and determinant combinations presented by the system, that is, system presentation history information, for example Σ_{n=1}^{N} Σ_{m=1}^{M} v_nm. Here, v_nm is the evaluation value, held by the dialogue apparatus 1, of spot m with respect to determinant n.
The sixth is the preference vector. The preference vector is information indicating the user's preference for each determinant. The preference vector may be replaced with the expected value of the probability that the user gives weight to a determinant. This expected value is given by Pr(k_n = 1) × Pr(p_n = 1), with one parameter per determinant for a total of 10 parameters. Here, p_n is an element value of the preference vector, and the element values of the preference vector are assumed to be "1" or "0".

The user state information stored in the user state information storage unit 13 is a vector having the element values described above, namely a vector (s1, s2, ..., s29) with 29 element values. The initial value of the user state information is assumed to be (s0001, s0002, ..., s0029).
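One way to arrive at 29 elements is 5 (turn parameters) + 5 (preceding user speech act) + 7 (preceding system speech act) + 1 (knowledge total) + 1 (presentation count) + 10 (expected preferences). The Python sketch below assembles a vector under that reading. The breakdown, the function names, and all values are assumptions for illustration, and the sawtooth encoding of the turn count is not reproduced: the five turn parameters are simply passed in.

```python
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    # Vector of length `size` with 1.0 at `index` and 0.0 elsewhere.
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_state(turn_features: List[float], user_act: int, system_act: int,
                pr_k: List[float], pr_p: List[float],
                presented: List[List[int]]) -> List[float]:
    # pr_k[n] = Pr(k_n = 1) and pr_p[n] = Pr(p_n = 1) for each determinant.
    knowledge_total = sum(pr_k)
    # Total number of spot and determinant combinations presented so far.
    presented_total = float(sum(sum(row) for row in presented))
    # Expected weight the user gives to each determinant.
    expected_pref = [k * p for k, p in zip(pr_k, pr_p)]
    return (list(turn_features)
            + one_hot(user_act, 5)      # x_i .. x_{i+4}
            + one_hot(system_act, 7)    # y_i .. y_{i+6}
            + [knowledge_total, presented_total]
            + expected_pref)


# Hypothetical inputs: ten determinants, three spots.
turn_features = [1.0, 0.0, 0.0, 0.0, 0.0]  # placeholder for the five turn parameters
pr_k = [0.5] * 10
pr_p = [0.2] * 10
presented = [[0] * 10 for _ in range(3)]   # assumed layout: rows are spots
presented[0][1] = 1
presented[2][4] = 1
state = build_state(turn_features, 2, 4, pr_k, pr_p, presented)
```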

The dialogue sentence output unit 17 is assumed to store initial dialogue sentence information (method identifier "3"), which is the dialogue sentence information output when the dialogue apparatus 1 starts up. The sentence corresponding to method identifier "3" is, for example, "This is the Kyoto sightseeing system. I will recommend sightseeing spots to suit your taste."

Furthermore, the user input information reception unit 14 holds the user sentence type identifiers that constitute the end condition of the dialogue. The user sentence type identifiers here correspond to the sentence patterns "I will go to <spot>." and "I have decided on <spot>."

The dialogue probability information in the dialogue information storage unit 21 is, for example, the information shown in the dialogue probability information management table of FIG. 13. The dialogue probability information management table has user sentence types and the probabilities for method 1 to method 3. That is, FIG. 13 shows the probability that the user sentence type identified by each user sentence type identifier is selected when the dialogue sentence information contains the information recommendation method identifier "method 1", "method 2", or "method 3". For recommendation methods 4 to 6 in FIG. 12, the user's action selection (selection of a user sentence type identifier) is assumed to use the probabilities for the information recommendation method identifier "method 1".
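The lookup described above, including the fallback of methods 4 to 6 to the probabilities of method 1, can be sketched in Python as follows. The table contents and the user sentence type names are invented for the example and are not the values of FIG. 13.

```python
import random
from typing import Dict

# Hypothetical dialogue probability table:
# method ID -> {user sentence type: selection probability}.
DIALOGUE_PROB: Dict[int, Dict[str, float]] = {
    1: {"ask_spot": 0.5, "ask_determinant": 0.3, "decide": 0.2},
    2: {"ask_spot": 0.2, "ask_determinant": 0.6, "decide": 0.2},
    3: {"ask_spot": 0.4, "ask_determinant": 0.4, "decide": 0.2},
}


def type_distribution(method_id: int,
                      table: Dict[int, Dict[str, float]]) -> Dict[str, float]:
    # Methods 4 to 6 reuse the probabilities registered for method 1.
    if method_id in (4, 5, 6):
        method_id = 1
    return table[method_id]


def sample_user_type(method_id: int, table: Dict[int, Dict[str, float]],
                     rng: random.Random) -> str:
    # Draw one user sentence type according to the table's probabilities.
    dist = type_distribution(method_id, table)
    types = list(dist)
    return rng.choices(types, weights=[dist[t] for t in types], k=1)[0]


rng = random.Random(0)
sampled_type = sample_user_type(5, DIALOGUE_PROB, rng)
```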

The evaluation experiment was carried out in this state. In the evaluation experiment, a simulated speaker (P_user, K_user, V_user) is sampled for each simulated dialogue. The simulated speaker is assumed to have four preferences (that is, four elements of the preference vector P_user are "1" and the remaining elements are "0"). The preferences were selected using the distribution of user preferences (see FIG. 12) obtained from a questionnaire administered after a subject experiment. The user's knowledge K_user was likewise set on the basis of the rate at which users mentioned each determinant before the system recommended it in a preliminary experiment. The user's local weights V_user were all initialized to "0", on the assumption that the user has no prior knowledge. For the parameters on the dialogue apparatus 1 side (system side), the user preference P_sys and knowledge K_sys estimated by the system were likewise initialized on the basis of the results of the preliminary experiment. The following assumptions were made for the simulation. The system makes no speech recognition errors or understanding errors on user utterances, and decides the content of its recommendation according to the policy π at that point in time. The user continues the dialogue for 20 turns, and the simulation apparatus 2 generates the responses (user input information). The dialogue apparatus 1 is given rewards according to the reward function. Dialogues were simulated under these conditions, and the policy (parameter θ) was updated by NAC every 2,000 dialogues.
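Sampling one simulated speaker per dialogue might look like the Python sketch below. The preference weights and knowledge rates are invented placeholders, not the questionnaire or preliminary-experiment figures, and drawing the four preferences by weighted sampling without replacement is an assumption about how the distribution is used.

```python
import random
from typing import List, Tuple


def sample_user(pref_weights: List[float], know_prob: List[float],
                n_spots: int, rng: random.Random,
                n_prefs: int = 4) -> Tuple[List[int], List[int], List[List[int]]]:
    m = len(pref_weights)
    remaining = list(range(m))
    chosen = []
    # Draw n_prefs distinct determinants, weighted by the preference distribution.
    for _ in range(n_prefs):
        pick = rng.choices(remaining,
                           weights=[pref_weights[i] for i in remaining], k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    p_user = [1 if i in chosen else 0 for i in range(m)]
    # Each k_m is set to 1 with the rate at which users mentioned determinant m.
    k_user = [1 if rng.random() < know_prob[i] else 0 for i in range(m)]
    # No prior knowledge of any spot: all local weights start at 0.
    v_user = [[0] * m for _ in range(n_spots)]
    return p_user, k_user, v_user


# Hypothetical distributions over ten determinants.
pref_weights = [0.20, 0.05, 0.10, 0.15, 0.10, 0.10, 0.10, 0.10, 0.05, 0.05]
know_prob = [0.6, 0.1, 0.3, 0.5, 0.4, 0.5, 0.5, 0.3, 0.1, 0.1]
p_user, k_user, v_user = sample_user(pref_weights, know_prob, n_spots=5,
                                     rng=random.Random(42))
```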

The experimental results are described below. First, the improvement in reward through policy iteration was examined. Because the method in this experiment includes random elements, all experimental results are averages over five trials. FIG. 14 shows the relationship between the number of simulated dialogues performed (2,000 dialogues constitute one batch) and the reward after 2, 5, 10, 15, and 20 turns. FIG. 14 also shows, as (Oracle), the case in which the user makes the decision with complete knowledge of the domain. The system policy converged at 30,000 dialogues.

The dialogue strategy was also analyzed by comparing and examining the values of the learned parameter θ (weight vector). For methods 4 and 5, the weights on the parameters indicating that few turns have elapsed since the start of the dialogue were large, whereas for methods 2 and 6 the weights on the parameters indicating that many turns have elapsed were large. This result indicates that the learned dialogue strategy first gives the user knowledge about the determinants, estimates the user's preferences, and then presents concrete candidates.

Next, the learned dialogue strategy was compared with the following two baseline methods.
(1) No recommendation (B1)
In baseline method (B1), the system only presents the information that was requested and makes no recommendations. This is equivalent to always selecting method 3.
(2) Random recommendation (B2)
In baseline method (B2), the system selects a recommendation method at random from the six selectable methods. This is equivalent to the strategy at the initial value of the parameter θ (all 0).
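Why θ = 0 corresponds to random selection can be seen if the policy is assumed to be a softmax over linear scores θ_a · s, a common form for policy-gradient methods. That functional form is an assumption of this sketch and is not stated in the text above; the function names and the state values are likewise hypothetical.

```python
import math
from typing import List


def softmax_policy(theta: List[List[float]], state: List[float]) -> List[float]:
    # One weight vector per recommendation method; score_a = theta_a . state.
    scores = [sum(w * s for w, s in zip(row, state)) for row in theta]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def baseline_b1() -> int:
    # B1: never recommend, i.e. always select method 3.
    return 3


# B2: with every weight at its initial value 0, all six scores are equal,
# so each method is selected with probability 1/6.
state = [1.0, 0.0, 2.0]
theta0 = [[0.0] * len(state) for _ in range(6)]
probs = softmax_policy(theta0, state)
```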

FIG. 15 shows the results of the comparison with these baseline methods. The dialogue strategy optimized by NAC obtained significantly larger rewards than the baseline methods (n = 500, p < 0.01).

Next, the trade-off between the suitability of the spot decided on and the length of the dialogue is considered. For the user, it is annoying to receive recommendations when the next question to ask is already clear, or to be recommended content that is already known again and again. The optimality of the dialogue strategy was therefore examined with a penalty on recommendation acts taken into account. A dialogue strategy was learned and evaluated with an evaluation function that imposes a penalty of 0.05 on recommendation methods other than method 3. The results are shown in FIG. 16. Compared with selecting a recommendation method at random, the dialogue strategy learned by the proposed method loses less reward even when the dialogue is prolonged. This is thought to be because the learned dialogue strategy avoids unnecessary recommendations. The reward obtained by the proposed method was significantly larger than that of the baseline method (p < 0.01).
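A penalized evaluation function of this kind can be sketched as follows. The text gives only the penalty size (0.05) and the exempt method (method 3); charging the penalty once per turn that uses another method is an assumption of this example, as are the function name and the numbers.

```python
from typing import List


def penalized_reward(base_reward: float, methods_used: List[int],
                     penalty: float = 0.05, free_method: int = 3) -> float:
    # Assumed rule: every turn that uses a method other than `free_method`
    # costs `penalty`.
    n_recommendations = sum(1 for m in methods_used if m != free_method)
    return base_reward - penalty * n_recommendations


# Hypothetical dialogue: five system turns, three of them recommendations.
example = penalized_reward(2.0, [3, 1, 2, 3, 6])
no_recommendation = penalized_reward(2.0, [3, 3])
```

Under such a rule a longer dialogue is only worthwhile if the extra recommendations raise the match degree of the final spot by more than their accumulated penalty.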

As described above, according to the present embodiment, the weight vectors that a dialogue apparatus conducting a dialogue with a user needs in order to output sentences can be constructed automatically.

In addition, according to the present embodiment, a model of the dialogue state that takes both the user's knowledge and the user's preferences into account was proposed and optimized by reinforcement learning, and it was confirmed that the user can make better decisions than with the baseline methods.

In the present embodiment, any algorithm may be used for learning the weight vectors.

Furthermore, the processing in the present embodiment may be realized by software. This software may be distributed by software download or the like, or may be recorded on a recording medium such as a CD-ROM and distributed. The same applies to the other embodiments in this specification. The software that realizes the dialogue apparatus 1 in the present embodiment is the following program. A storage medium stores two or more pieces of spot information, each having a spot, one or more determinants that are factors for deciding on the spot, and evaluation values indicating the evaluation of the spot for each of the one or more determinants; stores two or more information recommendation methods, each having a method identifier that identifies the information recommendation method, evaluation information of the information recommendation method, and a weight vector indicating the weight of each element constituting the evaluation information; and stores user state information, which is information indicating the state of the user and has a preference vector indicating the user's preference for each of the one or more determinants and a knowledge vector indicating the user's knowledge of each of the one or more determinants. The program causes a computer to function as: a user input information reception unit that receives, from a simulation apparatus, user input information having a user sentence type identifier that identifies a user sentence type, which is a pattern of a sentence input by the user, or having the user sentence type identifier and one or more pieces of information among one or more determinants and one or more spots; a score calculation unit that calculates two or more scores for the two or more information recommendation methods, using the evaluation information and weight vector of each of the two or more information recommendation methods stored in the storage medium and the user state information; a dialogue sentence information construction unit that, using the two or more scores calculated by the score calculation unit, constructs dialogue sentence information having a method identifier that identifies one information recommendation method, or having the method identifier and one or more pieces of information among one or more determinants and one or more spots; a dialogue sentence output unit that sends the dialogue sentence information constructed by the dialogue sentence information construction unit to the simulation apparatus; and a user state information update unit that acquires at least one or more spots or one or more determinants from one or more of the user input information received by the user input information reception unit and the dialogue sentence information output by the dialogue sentence output unit, and updates the user state information in the storage medium using the acquired one or more spots or one or more determinants; wherein the score calculation unit calculates the two or more scores for the two or more information recommendation methods using the evaluation information and weight vector of each of the two or more information recommendation methods stored in the storage medium and the user state information updated by the user state information update unit.

The software that realizes the simulation apparatus 2 in the present embodiment is the following program. A storage medium stores dialogue probability information, which is information on the probabilities relating each information recommendation method to each user sentence type, determinant probability information, which is information on the probability that a determinant is selected, and spot probability information, which is information on the probability that a spot is selected; and stores a user preference vector, which is a vector indicating the user's preferences. The program causes a computer to function as: a dialogue sentence information reception unit that receives dialogue sentence information from a dialogue apparatus; a user sentence type determination unit that determines a user sentence type using the method identifier in the dialogue sentence information and the dialogue probability information, and acquires a user sentence type identifier; a determinant-etc. acquisition unit that acquires one or more determinants or one or more spots, using one or more of the determinant probability information and the spot probability information, or using one or more of the determinant probability information and the spot probability information together with one or more pieces of information among the one or more determinants and one or more spots in the dialogue sentence information; a user input information sending unit that sends to the dialogue apparatus user input information having the user sentence type identifier, or having the user sentence type identifier and one or more pieces of information among one or more determinants and one or more spots; a reward calculation unit that acquires the user preference vector and one or more evaluation values indicating the evaluation, for each of the one or more determinants, of the spot included in the user input information, calculates the match degree between the user preference vector and the one or more evaluation values, and, using the match degree, calculates the reward for the selection of the user sentence type identified by the user sentence type identifier; and a learning unit that, using the reward, updates the weight vector that corresponds to the method identifier of the dialogue apparatus and is stored in the storage medium of the dialogue apparatus.

In the above program, it is preferable that the program causes the computer to function such that the reward calculation unit comprises: random selection match value calculation means that, using the spot probability information, calculates the expected value of the match degree between the user preference vector and the one or more evaluation values of a spot decided at random; selected spot match degree calculation means that calculates the match degree between the user preference vector and one or more evaluation values indicating the evaluation, for each of the one or more determinants, of the spot included in the user input information; and reward calculation means that calculates the reward for the selection of the spot included in the user input information, using the expected value of the match degree calculated by the random selection match value calculation means and the match degree calculated by the selected spot match degree calculation means.
(Embodiment 2)

This embodiment describes a dialogue apparatus that uses the weight vectors learned in Embodiment 1. The dialogue apparatus is a consultation-type dialogue apparatus with which the user selects a candidate while receiving information and recommendations from the apparatus. In this embodiment, interaction with the user is by voice, but voice input and output are not essential.

Selecting a candidate that matches one's preferences requires considering many factors (synonymous with the determinants described later). A user of the dialogue apparatus does not necessarily know all of these factors, so the dialogue apparatus needs to recommend information to the user and thereby close the gap between the knowledge it holds and the knowledge the user has.

This embodiment describes a dialogue apparatus that implements a model of consultation-type dialogue in which a candidate suited to the user is selected from among a plurality of candidates (here, mainly called spots).

FIG. 17 is a block diagram showing the internal structure of the dialogue apparatus 3 in this embodiment. The dialogue apparatus 3 includes a knowledge base 11, an information recommendation method storage unit 12, a user state information storage unit 13, a reception unit 34, a score calculation unit 15, a sentence composition unit 36, a sentence output unit 37, and a user state information update unit 18.

The reception unit 34 includes voice reception means 341 and voice recognition means 342.

The sentence composition unit 36 includes sentence pattern information acquisition means 361, variable value acquisition means 362, and sentence composition means 363.
Here, the user state information update unit 18 includes user-presented term acquisition means 181, apparatus-presented term acquisition means 182, preference vector update means 183, knowledge vector update means 184, and attribute vector update means 185.

The reception unit 34 receives a sentence input by the user. Here, receiving usually means receiving voice. However, the concept also covers receiving information entered through an input device such as a keyboard, mouse, or touch panel, receiving information transmitted over a wired or wireless communication line, and receiving information read from a recording medium such as an optical disc, magnetic disk, or semiconductor memory. In other words, the input may be voice input, character string input, or the like, and the input means is not limited.

The sentence may be input by any means, such as a microphone, keyboard, mouse, or menu screen. The reception unit 34 can be realized by a device driver for input means such as a microphone or keyboard, by control software for a menu screen, or the like.

The voice reception means 341 receives, from a microphone, the voice input by the user.

The voice recognition means 342 recognizes the voice received by the voice reception means 341 and converts it into a character string. Any voice recognition method may be used. Since voice recognition is a known technique, a detailed description is omitted. The voice recognition means 342 can usually be realized by an MPU, memory, and the like. Its processing procedure is usually realized by software recorded on a recording medium such as a ROM, but it may also be realized by hardware (a dedicated circuit).

The score calculation unit 15 applies the user state information in the user state information storage unit 13 to the evaluation information of each of the two or more information recommendation methods stored in the information recommendation method storage unit 12, and calculates two or more scores, one for each of the two or more information recommendation methods. The score calculation unit 15 likewise applies the user state information updated by the user state information update unit 18 to that evaluation information and calculates the two or more scores.

The score calculation unit 15 usually calculates the scores before the sentence output unit 37 outputs a sentence (not necessarily immediately before). It is preferable that the score calculation unit 15 calculate the scores each time the reception unit 34 receives a sentence. Here, applying means, for example, calculating a score by the expression "score = f(user state information, weight vector)", where f is, for example, "score = user state information × weight vector". In other words, to determine the sentence pattern information of the sentence that the dialogue apparatus 1 should output next, the score calculation unit 15 calculates a score for each information recommendation method using the evaluation information managed in association with the sentence pattern information and the dynamically changing user state information.

The sentence composition unit 36 uses the two or more scores calculated by the score calculation unit 15 to acquire the sentence pattern information of one information recommendation method, and composes a sentence from that sentence pattern information. Usually, the sentence composition unit 36 acquires the sentence pattern information of the information recommendation method with the highest score calculated by the score calculation unit 15. When the sentence pattern information is itself a sentence, composing the sentence is a NOP (no processing is performed). When the sentence pattern information is a pattern of a sentence containing variables, composing the sentence means acquiring the variable values from the immediately preceding output sentence, the immediately preceding received sentence, the focus determinant, the focus spot, or the like, and substituting the acquired values into the sentence pattern information. The focus determinant is the determinant on the basis of which one or more spots are output. The focus spot is the spot on the basis of which one or more determinants are output. The sentence composition unit 36 also acquires the focus spot and the focus determinant from the immediately preceding user input sentence and/or the immediately preceding output sentence of the dialogue apparatus 1. Details of this processing will be described later.

The sentence composition unit 36 can usually be realized by an MPU, memory, and the like. Its processing procedure is usually realized by software recorded on a recording medium such as a ROM, but it may also be realized by hardware (a dedicated circuit).

The sentence pattern information acquisition means 361 acquires, from the information recommendation method storage unit 12, the sentence pattern information of the one information recommendation method corresponding to the highest of the two or more scores calculated by the score calculation unit 15.

The variable value acquisition means 362 acquires the one or more variables included in the sentence pattern information acquired by the sentence pattern information acquisition means 361, and acquires the spot or determinant corresponding to each variable from one or more of the sentence that the sentence output unit 37 output immediately before and the sentence that the reception unit 34 received immediately before. Alternatively, the variable value acquisition means 362 acquires, from one or more of those sentences, one or more candidate spots or one or more candidate determinants corresponding to a variable, and selects the spot or determinant for the variable from among the candidates using the evaluation values in the knowledge base 11 that correspond to the candidates. The variable value acquisition means 362 may also acquire the spot or determinant corresponding to a variable from the focus spot, which is the spot currently in focus, or from the focus determinant, which is the determinant currently in focus.

The sentence composition means 363 composes a sentence by inserting the terms (usually spots or determinants) acquired by the variable value acquisition means 362 at the positions of the variables in the sentence pattern information acquired by the sentence pattern information acquisition means 361. The sentence composition means 363 may also transform the sentence so that the output is natural and fluent. Since such processing is a known technique, a detailed description is omitted.

The sentence output unit 37 outputs the sentence composed by the sentence composition unit 36. Here, output usually means voice output; that is, it is preferable that the sentence output unit 37 output the composed sentence as voice. However, the concept of output also covers display on a display, projection by a projector, printing by a printer, transmission to an external apparatus (such as a voice output apparatus or a display apparatus), storage on a recording medium, and delivery of the processing result to another processing apparatus or another program.

The sentence output unit 37 may or may not be considered to include an output device such as a display or a speaker. The sentence output unit 37 can be realized by driver software for an output device, or by driver software for an output device together with the output device.

The user state information update unit 18 acquires at least one or more spots or one or more determinants from one or more of the sentence received by the reception unit 34 and the sentence output by the sentence output unit 37, and updates the user state information in the user state information storage unit 13 using them. The user state information update unit 18 may perform morphological analysis on one or more of those sentences, acquire terms of specific parts of speech (nouns, adjectives, adjectival nouns, and so on), search the knowledge base 11 using each term as a key, and acquire the terms stored in the knowledge base 11 as spots or determinants. The user state information update unit 18 may also acquire kanji strings from one or more of those sentences, search the knowledge base 11 using each kanji string as a key, and acquire the kanji strings stored in the knowledge base 11 as spots or determinants. The user state information update unit 18 then usually updates the user state information so that the values of the elements of the user state information (elements of the preference vector, the knowledge vector, and so on) for the acquired spots or determinants increase. The user state information update unit 18 usually performs the update each time the reception unit 34 receives a sentence, but it may instead perform the update each time the sentence output unit 37 outputs a sentence.

The user-presented term acquisition means 181 acquires at least one or more determinants from the sentence received by the reception unit 34. It may acquire only positive determinants, or it may detect whether each mention is positive or negative and acquire determinants for each category (positive/negative). For example, it acquires independent words from the received sentence and, when an independent word is a determinant in the knowledge base 11, acquires that word as a determinant. It may also acquire the focus determinant.

The apparatus-presented term acquisition means 182 acquires at least one or more determinants from one or more of the sentences output by the sentence output unit 37. For example, it acquires independent words from the output sentence and, when an independent word is a determinant in the knowledge base 11, acquires that word as a determinant. The sentence output by the sentence output unit 37 is the same as the sentence composed by the sentence composition unit 36. The apparatus-presented term acquisition means 182 may acquire only positive determinants, or it may detect whether each mention is positive or negative and acquire determinants for each category (positive/negative). It may also acquire the focus determinant.

The preference vector update means 183 updates the user state information so as to raise the values of the preference vector elements for the one or more determinants acquired by the user-presented term acquisition means 181. The preference vector update means 183 also updates the user state information so as to lower the values of the preference vector elements for those determinants that the apparatus-presented term acquisition means 182 acquired but the user-presented term acquisition means 181 did not. That is, it lowers the values for determinants that the dialogue apparatus 1 output but the user did not select.

The knowledge vector update means 184 updates the user state information so as to raise the values of the knowledge vector elements for the one or more determinants acquired by the apparatus-presented term acquisition means 182.
The attribute vector update means 185 changes the values of the attribute vector elements for the one or more determinants acquired by the user-presented term acquisition means 181, and updates the user state information. That is, it updates the element of the attribute vector in the user state information that corresponds to an acquired determinant so that the element's value increases, becomes the value corresponding to the determinant, or approaches the value corresponding to the determinant.
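
The update rules for the preference vector and the knowledge vector can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the determinant list, the fixed step size of 0.1, and the substring matching that stands in for morphological analysis are all hypothetical and do not come from this description, and the attribute vector is omitted.

```python
from dataclasses import dataclass, field

# Hypothetical determinants; the actual knowledge base 11 is not listed here.
DETERMINANTS = ["cherry blossoms", "garden", "history", "quiet"]


@dataclass
class UserState:
    # One element per determinant, all starting at 0.0 (an assumption).
    preference: dict = field(default_factory=lambda: {d: 0.0 for d in DETERMINANTS})
    knowledge: dict = field(default_factory=lambda: {d: 0.0 for d in DETERMINANTS})


def extract_determinants(sentence: str) -> set:
    """Return the known determinants mentioned in a sentence.

    Substring matching is a stand-in for morphological analysis.
    """
    lowered = sentence.lower()
    return {d for d in DETERMINANTS if d in lowered}


def update_user_state(state: UserState, user_sentence: str,
                      system_sentence: str, step: float = 0.1) -> UserState:
    user_terms = extract_determinants(user_sentence)      # role of means 181
    system_terms = extract_determinants(system_sentence)  # role of means 182
    for d in user_terms:
        state.preference[d] += step   # raised: the user mentioned it
    for d in system_terms - user_terms:
        state.preference[d] -= step   # lowered: presented but not picked up
    for d in system_terms:
        state.knowledge[d] += step    # raised: the apparatus presented it
    return state
```

For example, if the apparatus says "This temple is famous for its garden and history" and the user says "I like a quiet garden", the preference for "history" is lowered while its knowledge element is raised.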

Next, the operation of the dialogue apparatus 1 will be described with reference to the flowchart of FIG. 18.

(Step S1801) The sentence output unit 37 outputs an initial sentence that it holds in advance. The initial sentence is the sentence output to the user when the dialogue apparatus 1 starts operating (for example, at startup). An example of the initial sentence is "This is the Kyoto tourist information system. I will recommend sightseeing spots to your liking."

(Step S1802) The reception unit 34 determines whether a sentence has been received from the user. If a sentence has been received, the process proceeds to step S1803; otherwise, the process returns to step S1802.

(Step S1803) The voice recognition means 342 of the reception unit 34 performs voice recognition on the sentence received in step S1802 and acquires the sentence as a character string (a character code string).

(Step S1804) The reception unit 34, or means not shown, determines whether the sentence received by the reception unit 34 satisfies an end condition. If the end condition is satisfied, the process ends; otherwise, the process proceeds to step S1805. The end condition is, for example, that the user's input includes a sentence matching a predetermined sentence pattern, such as "I will go to <spot>." or "I have decided on <spot>."

(Step S1805) The score calculation unit 15 calculates two or more scores, one for each of the two or more information recommendation methods. Details of the score calculation processing will be described with reference to the flowchart of FIG. 19.

(Step S1806) The sentence composition unit 36 composes one or more sentences to be output. Details of the sentence composition processing will be described with reference to the flowchart of FIG. 20.

(Step S1807) The sentence output unit 37 outputs the one or more sentences composed by the sentence composition unit 36.

(Step S1808) The user state information update unit 18 performs user state information update processing, and the process returns to step S1802. Details of the user state information update processing will be described with reference to the flowchart of FIG. 22.

In the flowchart of FIG. 18, the sentence output unit 37 may output a predetermined sentence, or a sentence composed from a predetermined sentence pattern, before the processing ends.

さらに、図４のフローチャートにおいて、電源オフや処理終了の割り込みにより処理は終了する。 Further, in the flowchart of FIG. 4, the process is terminated by powering off or a process termination interrupt.
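
The control flow of steps S1801 to S1808 can be sketched roughly as follows. This is a hedged illustration, not the apparatus itself: the function names, the callback parameters, the English end patterns, and the assumption that input arrives as already recognized text are all introduced here for the example.

```python
import re

INITIAL_SENTENCE = ("This is the Kyoto tourist information system. "
                    "I will recommend sightseeing spots to your liking.")

# Hypothetical English end patterns modeled on the examples in step S1804.
END_PATTERNS = [
    re.compile(r"i will go to (?P<spot>.+?)\.?$", re.IGNORECASE),
    re.compile(r"i have decided on (?P<spot>.+?)\.?$", re.IGNORECASE),
]


def is_end(sentence: str) -> bool:
    """S1804: does the sentence match a predetermined end pattern?"""
    text = sentence.strip()
    return any(p.search(text) for p in END_PATTERNS)


def run_dialogue(user_inputs, calc_scores, compose, update_state):
    """Run the loop over an iterable of recognized user sentences.

    calc_scores, compose and update_state are caller-supplied callbacks
    standing in for units 15, 36 and 18 respectively (an assumption).
    """
    outputs = [INITIAL_SENTENCE]            # S1801
    for sentence in user_inputs:            # S1802 / S1803
        if is_end(sentence):                # S1804
            break
        scores = calc_scores()              # S1805
        reply = compose(scores, sentence)   # S1806
        outputs.append(reply)               # S1807
        update_state(sentence, reply)       # S1808
    return outputs
```

Passing the three units in as callbacks keeps the loop independent of how scoring, sentence composition, and state updating are actually realized.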

Next, details of the score calculation processing in step S1805 will be described with reference to the flowchart of FIG. 19.

(Step S1901) The score calculation unit 15 reads the user state information from the user state information storage unit 13.

(Step S1902) The score calculation unit 15 assigns 1 to a counter i.

(Step S1903) The score calculation unit 15 determines whether an i-th information recommendation method exists in the information recommendation method storage unit 12. If it exists, the process proceeds to step S1904; otherwise, the process returns to the calling process.

(Step S1904) The score calculation unit 15 reads the evaluation information of the i-th information recommendation method.

(Step S1905) The score calculation unit 15 calculates the score of the i-th information recommendation method using the user state information read in step S1901 and the evaluation information read in step S1904, and temporarily stores the score in association with the i-th information recommendation method. For example, the score calculation unit 15 calculates the score by multiplying the user state information read in step S1901 by the evaluation information read in step S1904.

(Step S1906) The score calculation unit 15 increments the counter i by 1, and the process returns to step S1903.

In the flowchart of FIG. 19, the score calculation unit 15 may use any score calculation method.
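
As one possible reading of "score = user state information × weight vector", the following sketch treats both as plain numeric vectors and takes their inner product for each information recommendation method. The vector layout, the method identifiers, and the choice of the inner product are assumptions made for illustration; the description leaves the calculation method open.

```python
def dot(a, b):
    """Inner product of two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def calc_scores(user_state, methods):
    """Compute one score per information recommendation method.

    user_state: list of floats (hypothetical flattened user state information).
    methods: dict mapping a method identifier to its weight vector.
    """
    scores = {}
    for method_id, weights in methods.items():         # S1903, S1904, S1906
        scores[method_id] = dot(user_state, weights)   # S1905
    return scores


def best_method(scores):
    """Identifier of the method with the highest score."""
    return max(scores, key=scores.get)
```

The method returned by `best_method` is the one whose sentence pattern information would then be used to compose the next output sentence.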

Next, details of the sentence composition processing in step S1806 will be described with reference to the flowchart of FIG. 20.

(Step S2001) The sentence composition unit 36 performs natural language processing on the immediately preceding user input sentence and/or the immediately preceding output sentence of the dialogue apparatus 1, and acquires a spot. For example, the sentence composition unit 36 performs morphological analysis on those sentences, acquires independent words, searches the knowledge base 11 using each independent word as a key, and acquires a spot that exists in the knowledge base 11. Note that a spot may not be obtainable here.

(Step S2002) The sentence composition unit 36 performs natural language processing on the immediately preceding user input sentence and/or the immediately preceding output sentence of the dialogue apparatus 1, and acquires a determinant. For example, the sentence composition unit 36 performs morphological analysis on those sentences, acquires independent words, searches the knowledge base 11 using each independent word as a key, and acquires a determinant that exists in the knowledge base 11. Note that a determinant may not be obtainable here.

(Step S2003) The sentence composition unit 36 determines whether a spot was acquired in step S2001. If so, the process proceeds to step S2004; otherwise, the process proceeds to step S2005.

(Step S2004) The sentence composition unit 36 assigns the spot acquired in step S2001 to the variable "focus spot". The value of the variable "focus spot" is the spot currently in focus in the dialogue, and is usually a single spot.

(Step S2005) The sentence composition unit 36 determines whether a determinant was acquired in step S2002. If so, the process proceeds to step S2006; otherwise, the process proceeds to step S2007.

(Step S2006) The sentence composition unit 36 assigns the determinant acquired in step S2002 to the variable "focus determinant". The value of the variable "focus determinant" is the determinant currently in focus in the dialogue, and may consist of two or more determinants.

(Step S2007) The sentence composition unit 36 searches the knowledge base 11 using the value of the variable "focus spot" and the value of the variable "focus determinant", and reads from the knowledge base 11 the explanatory sentence corresponding to the focus spot and the focus determinant. This explanatory sentence is the answer to the user's input sentence. Usually, the sentence composition unit 36 reads from the knowledge base 11 the explanatory sentence corresponding to the value of the variable "focus spot" and to the variable "focus determinant".

(Step S2008) The sentence composition unit 36 performs recommendation sentence acquisition processing and returns to the calling process. The recommendation sentence acquisition processing will be described with reference to the flowchart of FIG. 21.

In the flowchart of FIG. 20, an answer sentence and a recommendation sentence are acquired. However, various other ways of acquiring sentences are conceivable, such as acquiring only a recommendation sentence, or acquiring other sentences in addition to the answer sentence and the recommendation sentence.
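
Steps S2001 to S2007 can be sketched as follows: the focus spot and focus determinants are refreshed only when new ones are found, and the answer sentence is then looked up by the (spot, determinant) pair. The toy knowledge base contents, the `focus` dictionary, and the substring matching used in place of morphological analysis are hypothetical choices for this example only.

```python
# Hypothetical toy knowledge base: (spot, determinant) -> explanatory sentence.
KB = {
    ("Ryoan-ji", "garden"): "Ryoan-ji is known for its rock garden.",
    ("Kiyomizu-dera", "history"): "Kiyomizu-dera has a long history.",
}
SPOTS = {spot for spot, _ in KB}
DETERMINANTS = {det for _, det in KB}


def find_terms(sentences, vocabulary):
    """Known terms mentioned in the sentences, in sorted order.

    Substring matching stands in for morphological analysis.
    """
    text = " ".join(sentences).lower()
    return [t for t in sorted(vocabulary) if t.lower() in text]


def compose_answer(focus, user_sentence, system_sentence):
    """Update the focus in place and return matching explanatory sentences."""
    sentences = [user_sentence, system_sentence]
    spots = find_terms(sentences, SPOTS)               # S2001
    determinants = find_terms(sentences, DETERMINANTS) # S2002
    if spots:                                          # S2003
        focus["spot"] = spots[0]                       # S2004: a single spot
    if determinants:                                   # S2005
        focus["determinants"] = determinants           # S2006: may be several
    spot = focus.get("spot")                           # S2007
    return [KB[(spot, d)] for d in focus.get("determinants", [])
            if (spot, d) in KB]
```

Because the focus is kept between turns, a follow-up such as "Tell me more." that names neither a spot nor a determinant still resolves against the previous focus. Taking the first spot in sorted order when several are mentioned is an arbitrary simplification.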

Next, the recommendation sentence acquisition processing in step S2008 will be described with reference to the flowchart of FIG. 21.

(Step S2101) The sentence pattern information acquisition means 361 of the sentence composition unit 36 acquires, from the information recommendation method storage unit 12, the sentence pattern information of the one information recommendation method corresponding to the highest of the two or more scores calculated by the score calculation unit 15.

(Step S2102) The variable value acquisition means 362 assigns 1 to a counter i.

(Step S2103) The variable value acquisition means 362 determines whether an i-th variable exists in the sentence pattern information acquired in step S2101. If it exists, the process proceeds to step S2104; otherwise, the process proceeds to step S2108.

(Step S2104) The variable value acquisition means 362 acquires the i-th variable in the sentence pattern information acquired in step S2101. The variable also holds information on where its value is to be obtained from.

（ステップＳ２１０５）変数値取得手段３６２は、ｉ番目の変数に代入される１以上の用語を取得する。この用語とは、通常、スポットまたは決定要因（決定要因を特定する単語等でも良い）である。 (Step S2105) The variable value acquisition unit 362 acquires one or more terms to be substituted into the i-th variable. This term is usually a spot or a determinant (may be a word specifying the determinant).

(Step S2106) The sentence composing means 363 substitutes the one or more terms acquired in step S2105 into the position of the i-th variable in the sentence pattern information.

（ステップＳ２１０７）変数値取得手段３６２は、カウンタｉを１、インクリメントし、ステップＳ２１０３に戻る。 (Step S2107) The variable value acquisition unit 362 increments the counter i by 1, and returns to step S2103.

（ステップＳ２１０８）文構成手段３６３は、取得した文を、自然な文に変更し、上位処理にリターンする。なお、自然な文に変更する必要がない場合は、ステップＳ２１０８では何も処理されない。また、文を自然な文に変更する技術は公知技術であるので、詳細な説明を省略する。文を自然な文に変更する技術は、例えば、統計ベースの手法を用いる。 (Step S2108) The sentence composing means 363 changes the acquired sentence to a natural sentence, and returns to the upper process. If it is not necessary to change to a natural sentence, nothing is processed in step S2108. Moreover, since the technique for changing a sentence to a natural sentence is a known technique, detailed description thereof is omitted. As a technique for changing a sentence to a natural sentence, for example, a statistical-based method is used.
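The loop of steps S2102 to S2107 can be sketched as a simple template-filling routine. This is a minimal sketch, not the patented implementation: the function name `build_recommendation`, the `<name>` variable syntax, the comma-joining of multiple terms, and the `resolve_terms` callback are all assumptions made for illustration, and the naturalization of step S2108 is omitted.

```python
import re

# Assumed variable syntax: a name enclosed in angle brackets, e.g. <focused spot>.
VARIABLE = re.compile(r"<([^<>]+)>")


def build_recommendation(pattern, resolve_terms):
    """Fill every variable of a sentence pattern, in order of appearance.

    pattern       -- sentence pattern information (S2101)
    resolve_terms -- hypothetical callback returning the list of terms
                     for a variable name (S2105)
    """
    sentence = pattern
    # Iterating over the variables plays the role of the counter i (S2102, S2103, S2107).
    for name in VARIABLE.findall(pattern):          # S2104: take the i-th variable
        terms = resolve_terms(name)                 # S2105: get one or more terms
        # S2106: substitute the terms at the position of that variable.
        sentence = sentence.replace("<" + name + ">", ", ".join(terms), 1)
    return sentence                                 # S2108 (naturalization) not modeled


values = {
    "focused spot": ["Ninna-ji"],
    "unpresented determinants": ["good scenery", "famous garden"],
}
result = build_recommendation(
    "<focused spot>: <unpresented determinants>. Shall I explain any of them?",
    lambda name: values[name],
)
print(result)
```

Passing the term lookup in as a callback keeps the sketch independent of how each variable records where its value comes from, which the description leaves open.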

Next, the user state information update process of step S1808 is described in detail with reference to the flowchart of FIG. 22.

(Step S2201) The user-presented term acquisition unit 181 of the user state information update unit 18 acquires one or more determinants from the latest (immediately preceding) sentence received by the reception unit 34. If no determinant can be acquired from that sentence, the user-presented term acquisition unit 181 acquires the focused determinant instead. The user-presented term acquisition unit 181 also acquires one or more spots from the latest (immediately preceding) sentence received by the reception unit 34. The user-presented term acquisition unit 181 then temporarily stores the acquired determinants and/or the acquired spots in a buffer.

(Step S2202) The preference vector update unit 183 reads the preference vector included in the user state information in the user state information storage unit 13. The preference vector update unit 183 then updates the preference vector included in the user state information so that the value of the element corresponding to the determinant acquired in step S2201 increases. In addition, the attribute vector update unit 185 updates the attribute vector included in the user state information so that the value of the element corresponding to the determinant acquired in step S2201 increases, becomes the value corresponding to that determinant, or approaches the value corresponding to that determinant.

(Step S2203) The device presentation term acquisition unit 182 acquires one or more determinants from the latest (immediately preceding) sentence output by the sentence output unit 37. The device presentation term acquisition unit 182 also acquires one or more spots from that sentence. The device presentation term acquisition unit 182 then temporarily stores the acquired determinants and/or the acquired spots in a buffer.

（ステップＳ２２０４）知識ベクトル更新手段１８４は、ユーザ状態情報格納部１３のユーザ状態情報が有する知識ベクトルを読み出す。知識ベクトル更新手段１８４は、ステップＳ２２０３で取得した決定要因に対応する要素の値が大きくなるように、ユーザ状態情報に含まれる知識ベクトルを更新する。 (Step S2204) The knowledge vector update unit 184 reads the knowledge vector included in the user state information in the user state information storage unit 13. The knowledge vector update unit 184 updates the knowledge vector included in the user state information so that the value of the element corresponding to the determination factor acquired in step S2203 is increased.

(Step S2205) The user state information update unit 18 reads the number of dialogue turns included in the user state information in the user state information storage unit 13. The number of dialogue turns is the number of times the dialogue has been repeated. The user state information update unit 18 then updates the user state information in the user state information storage unit 13, using the read number of turns plus 1 as the new number of turns.

（ステップＳ２２０６）ユーザ状態情報更新部１８は、直前ユーザ発話行為情報を更新する。直前ユーザ発話行為情報は、直前に受付部３４が受け付けた文に関する情報であり、ユーザが要求した情報の種類（スポットのみ、決定要因名のみ、またはその両方等）に対応する情報である。 (Step S2206) The user state information update unit 18 updates the immediately preceding user utterance action information. The immediately preceding user utterance action information is information related to the sentence received by the receiving unit 34 immediately before, and is information corresponding to the type of information requested by the user (spot only, determinant name only, or both).

（ステップＳ２２０７）ユーザ状態情報更新部１８は、直前システム発話行為情報を更新する。直前システム発話行為情報は、直前に文出力部３７が出力した文に関する情報であり、選択した情報推薦手法を特定する情報である。 (Step S2207) The user state information update unit 18 updates the immediately preceding system utterance action information. The immediately preceding system utterance action information is information related to the sentence output by the sentence output unit 37 immediately before, and is information for specifying the selected information recommendation method.

(Step S2208) The user state information update unit 18 updates the system presentation history information and returns to the higher-level process. The system presentation history information is the number of spots and the number of determinants that the dialogue device 1 (the system) has output. The user state information update unit 18 removes duplicates from the determinants and from the spots written to the buffer in step S2203, and obtains the number of determinants and the number of spots in the buffer. The user state information update unit 18 then takes these two counts as the system presentation history information.

なお、図２２のフローチャートにおいて、ステップＳ２２０５からＳ２２０８において更新した情報は、ユーザ状態情報を構成する情報の例であり、その他の情報がユーザ状態情報を構成しても良い。 In the flowchart of FIG. 22, the information updated in steps S2205 to S2208 is an example of information constituting the user status information, and other information may constitute the user status information.
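The update flow of steps S2201 to S2208 can be sketched as follows. This is a hedged sketch under several assumptions that the description does not fix: the class `UserState`, the function `update_user_state`, the fixed increment `step`, and the choice to accumulate presented spots and determinants across turns in sets are all hypothetical. The utterance-act fields of steps S2206 and S2207 are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class UserState:
    # Hypothetical container; determinant name -> element value.
    preference: dict = field(default_factory=dict)
    knowledge: dict = field(default_factory=dict)
    turns: int = 0
    presented_spots: set = field(default_factory=set)
    presented_determinants: set = field(default_factory=set)


def update_user_state(state, user_determinants, system_spots,
                      system_determinants, step=1.0):
    """Apply one round of the update; `step` is an assumed fixed increment."""
    # S2202: raise the preference element of each determinant the user mentioned.
    for d in user_determinants:
        state.preference[d] = state.preference.get(d, 0.0) + step
    # S2204: raise the knowledge element of each determinant the system presented.
    for d in system_determinants:
        state.knowledge[d] = state.knowledge.get(d, 0.0) + step
    # S2205: one more dialogue turn.
    state.turns += 1
    # S2208: sets remove duplicates, so the sizes are the history counts.
    state.presented_spots.update(system_spots)
    state.presented_determinants.update(system_determinants)
    return len(state.presented_spots), len(state.presented_determinants)


state = UserState()
counts = update_user_state(
    state,
    user_determinants=["cherry blossoms"],
    system_spots=["Ninna-ji"],
    system_determinants=["scenery", "garden", "autumn leaves",
                         "World Heritage", "events"],
)
print(counts, state.turns)
```

With the first exchange of the Ninna-ji example as input, the counts come out as one spot and five determinants and the turn count as 1.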

Hereinafter, a specific operation of the dialogue device 3 in the present embodiment will be described. FIG. 23 is a conceptual diagram of the dialogue device 1. In this specific example, the dialogue device 3 is a system that provides sightseeing guidance for Kyoto, supporting the user's decision on where to visit while carrying on a dialogue with the user.

An example of the spot information management table held by the knowledge base 11 is shown in FIG. 11, described above. An example of the information recommendation method management table in the information recommendation method storage unit 12 is shown in FIG. 12, described above.

It is also assumed that the user state information storage unit 13 stores the user state information described below. For example, the user state information has three elements: a knowledge vector K_user, a preference vector P_user, and a local weight matrix V_user. Here, for simplicity, each element of the user's preference vector P_user = (p_1, p_2, ..., p_M) is a binary parameter that takes the value 1 or 0. That is, p_m takes 1 when the user is interested (or potentially interested) in a determinant m and gives it weight when deciding on a spot. To express a state in which the user has a latent preference (one the user is not aware of), the user's knowledge vector K_user = (k_1, k_2, ..., k_M) is introduced. The element k_m takes 1 when the user knows that the system (the dialogue device 3) can handle the determinant m, or when the system has recommended the determinant m. With these vectors, for example, the state in which the determinant m is one the user is potentially interested in but is not aware of can be expressed as (k_m = 0, p_m = 1). Further, on the assumption that the user judges only from the information presented by the system, the local weight v_nm of the spot n from the viewpoint of the determinant m takes 1 when the system has informed the user of the evaluation of the spot using any of the recommendation methods with ID = 1, 2, or 6 among the six recommendation methods described above.
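The state (k_m = 0, p_m = 1) can be illustrated with a short sketch that lists the determinants a user is interested in without knowing that the system can handle them. The function name `latent_determinants` and the use of plain Python lists for K_user and P_user are assumptions for illustration only.

```python
def latent_determinants(knowledge, preference):
    """Return the indices m for which k_m = 0 and p_m = 1.

    knowledge  -- K_user as a list of 0/1 values
    preference -- P_user as a list of 0/1 values of the same length
    """
    if len(knowledge) != len(preference):
        raise ValueError("K_user and P_user must have the same length M")
    return [m for m, (k, p) in enumerate(zip(knowledge, preference))
            if k == 0 and p == 1]


# Determinant 0: known and preferred; 1: preferred but unknown; 2: neither.
print(latent_determinants([1, 0, 0], [1, 1, 0]))
```

Note that the indices in this sketch start at 0, whereas the description numbers determinants from 1.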

The user state information here corresponds to a feature vector of the dialogue state. More specifically, the user state information here has the following six types of information.

The first is the number of turns. Here, the number of turns is expressed with five parameters by using a sawtooth function.

The second is the immediately preceding user utterance action information. This is, for example, the vector (x_i, x_{i+1}, x_{i+2}, x_{i+3}, x_{i+4}), whose elements are defined as (1 if a_user^{t-1} = x_i, otherwise 0), where a_user^{t-1} is the immediately preceding user utterance. If the immediately preceding user utterance contains only a spot (or spot name) or determinant recommended by the system (the dialogue device 3), x_i = 1 and the other elements of the vector are 0. If it contains a spot not recommended by the system, x_{i+1} = 1 and the other elements are 0. If it contains only a determinant not recommended by the system, x_{i+2} = 1 and the other elements are 0. If it contains both a spot and a determinant not recommended by the system, x_{i+3} = 1 and the other elements are 0. If it contains none of these, x_{i+4} = 1 and the other elements are 0.

The third is the immediately preceding system utterance action information. This is, for example, the vector (y_i, y_{i+1}, y_{i+2}, y_{i+3}, y_{i+4}, y_{i+5}, y_{i+6}), whose elements are defined as (1 if a_sys^{t-1} = y_i, otherwise 0), where a_sys^{t-1} is the immediately preceding system utterance. If the immediately preceding system utterance used information recommendation method 1, y_{i+1} = 1 and the other elements are 0. If it used information recommendation method 2, y_{i+2} = 1 and the other elements are 0. In general, if it used information recommendation method n, y_{i+n} = 1 and the other elements are 0.

The fourth is the knowledge vector for the user's determinants. The user's knowledge of the determinants may be calculated as Σ_{m=1}^{M} Pr(k_m = 1), where k_m is the element value of the knowledge vector for the m-th determinant and Pr(k_m = 1) is the posterior probability (confidence) with which the system estimates that k_m is 1. Here, the element values of the knowledge vector are 1 or 0.

The fifth is the number of spots and determinants presented by the system, that is, the system presentation history information, for example Σ_{n=1}^{N} Σ_{m=1}^{M} v_nm, where v_nm is the evaluation value, held by the dialogue device 3, of the spot n with respect to the determinant m.

The sixth is the preference vector, which is information indicating the user's preference for each determinant. The preference vector may be replaced with the expected value of the probability that the user gives weight to each determinant. This expected value is given by Pr(k_m = 1) × Pr(p_m = 1), with one parameter per determinant and 10 parameters in total, where p_m is an element value of the preference vector. Here, the element values of the preference vector are 1 or 0.
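The six kinds of information can be assembled into a single feature vector as sketched below. The sizes add up to 5 + 5 + 7 + 1 + 1 + 10 = 29, which would be consistent with the 29 values (s0001, ..., s0029) read by the score calculation unit 15, although the description does not state this mapping explicitly. The function names, the argument layout, and the ordering of the blocks are assumptions. The five turn parameters are taken as an input because the exact sawtooth encoding is not given.

```python
def one_hot(index, size):
    """Vector of `size` zeros with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def build_state_vector(turn_features, user_act, system_method, pr_k, pr_p, v):
    """Assemble the dialogue-state feature vector (assumed block order).

    turn_features -- 5 values encoding the turn count (encoding not modeled here)
    user_act      -- 0..4, offset of the element x_{i+offset} set to 1
    system_method -- 0..6, offset n of the element y_{i+n} set to 1
    pr_k, pr_p    -- per-determinant probabilities Pr(k_m = 1), Pr(p_m = 1)
    v             -- N x M matrix of local weights v_nm (0 or 1)
    """
    if len(turn_features) != 5:
        raise ValueError("expected five turn parameters")
    x = one_hot(user_act, 5)                          # second item
    y = one_hot(system_method, 7)                     # third item
    knowledge = sum(pr_k)                             # fourth item: sum of Pr(k_m = 1)
    presented = float(sum(sum(row) for row in v))     # fifth item: sum of v_nm
    preference = [k * p for k, p in zip(pr_k, pr_p)]  # sixth item, one per determinant
    return list(turn_features) + x + y + [knowledge, presented] + preference


M = 10
state = build_state_vector(
    turn_features=[1.0, 0.0, 0.0, 0.0, 0.0],
    user_act=0,
    system_method=1,
    pr_k=[1.0] + [0.0] * (M - 1),
    pr_p=[1.0] * M,
    v=[[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
)
print(len(state))
```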

It is also assumed that the sentence output unit 37 stores the initial sentence “This is the Kyoto sightseeing system. I will recommend sightseeing spots that suit your preferences.”, which is the sentence output when the dialogue device 3 is started.

Further, it is assumed that the reception unit 34 holds the sentence patterns “I will go to <spot>.” and “I have decided on <spot>.”, which are the conditions for ending the dialogue. Here, <spot> is a variable into which a spot can be substituted.
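The end-condition check can be sketched with regular expressions. This sketch uses English renderings of the two sentence patterns, which is an assumption, since the device itself matches Japanese sentences. The names `END_PATTERNS` and `match_end_condition` are hypothetical, and no check is made that the captured text is a spot known to the knowledge base 11.

```python
import re

# English renderings of the two end-condition patterns (assumed wording).
END_PATTERNS = [
    re.compile(r"^I will go to (?P<spot>.+)\.$"),
    re.compile(r"^I have decided on (?P<spot>.+)\.$"),
]


def match_end_condition(sentence):
    """Return the text bound to <spot> if the sentence ends the dialogue, else None."""
    text = sentence.strip()
    for pattern in END_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("spot")
    return None


print(match_end_condition("Tell me about cherry blossoms at Ninna-ji."))
print(match_end_condition("I have decided on Ninna-ji."))
```

A sentence that matches neither pattern returns `None`, in which case the dialogue continues.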

In this state, it is assumed that the user has started the dialogue device 3. The sentence output unit 37 then reads the initial sentence and outputs it by voice: “This is the Kyoto sightseeing system. I will recommend sightseeing spots that suit your preferences.”

次に、ユーザは、「仁和寺の桜について教えて。」と音声入力した、とする。すると、受付部３４の音声受付手段３４１は、音声「仁和寺の桜について教えて。」を受け付ける。次に、音声認識手段３４２は、この音声を認識し、文「仁和寺の桜について教えて。」を取得する。 Next, it is assumed that the user inputs a voice message “Tell me about cherry blossoms at Ninna-ji Temple”. Then, the voice reception means 341 of the reception unit 34 receives the voice “Tell me about cherry blossoms at Ninna-ji.” Next, the voice recognition means 342 recognizes this voice and acquires the sentence “Tell me about cherry blossoms at Ninna-ji.”

Next, the reception unit 34 determines that the input sentence “Tell me about cherry blossoms at Ninna-ji.” matches neither of the end-condition sentence patterns “I will go to <spot>.” and “I have decided on <spot>.”

Next, the score calculation unit 15 calculates six scores, one for each information recommendation method. That is, first, the score calculation unit 15 reads the user state information (s0001, s0002, ..., s0029) from the user state information storage unit 13.

Next, for each information recommendation method, the score calculation unit 15 multiplies together the evaluation information (vector) and the weight vector that each of the first to sixth information recommendation methods in the information recommendation method storage unit 12 has, and the user state information (s0001, s0002, ..., s0029). The score calculation unit 15 thereby calculates the scores of the six information recommendation methods.
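One way to read this multiplication is as a weighted inner product per method, followed by choosing the method with the largest score. That reading is an assumption: the passage says only that the evaluation information, the weight vector, and the user state information are multiplied for each method. The names `score` and `select_method`, the dictionary layout, and the three-element toy vectors are hypothetical and are not values from FIG. 12.

```python
def score(evaluation, weights, state):
    """Assumed form: sum over i of evaluation_i * weight_i * state_i."""
    if not (len(evaluation) == len(weights) == len(state)):
        raise ValueError("vectors must have the same length")
    return sum(e * w * s for e, w, s in zip(evaluation, weights, state))


def select_method(methods, state):
    """Return (ID of the method with the largest score, all scores)."""
    scores = {
        method_id: score(m["evaluation"], m["weights"], state)
        for method_id, m in methods.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy three-element vectors; the real vectors would have 29 elements.
methods = {
    1: {"evaluation": [1, 0, 1], "weights": [0.5, 0.5, 2.0]},
    2: {"evaluation": [0, 1, 0], "weights": [1.0, 1.0, 1.0]},
}
best, scores = select_method(methods, [1.0, 1.0, 1.0])
print(best, scores)
```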

Next, the sentence composing unit 36 applies natural language processing to the immediately preceding user input sentence and acquires the spot “Ninna-ji” and the determinant “cherry blossoms”. For example, the sentence composing unit 36 performs morphological analysis on the user input sentence and acquires the independent words “Ninna-ji”, “cherry blossoms”, and “tell me”. The sentence composing unit 36 then searches the knowledge base 11 using the three independent words as keys, detects that “Ninna-ji” is a spot and “cherry blossoms” is a determinant, and acquires the spot “Ninna-ji” and the determinant “cherry blossoms”.

Next, the sentence composing unit 36 substitutes the acquired spot “Ninna-ji” into the variable “focused spot”, and substitutes the acquired determinant “cherry blossoms” into the variable “focused determinant”.

Next, the sentence composing unit 36 searches the knowledge base 11 using the value “Ninna-ji” of the variable “focused spot” and the value “cherry blossoms” of the variable “focused determinant”, and reads from the knowledge base 11 the explanatory sentence corresponding to the focused spot and the focused determinant: “Omuro cherry trees are unusual in that they are low in height and bear fragrant, single-petaled white blossoms from the base of the trunk. They bloom late and mark the end of spring in Kyoto.” This explanatory sentence becomes the answer sentence.

Next, the sentence composing unit 36 performs the recommendation sentence acquisition process. That is, first, the sentence pattern information acquisition unit 361 of the sentence composing unit 36 acquires, from the information recommendation method management table of FIG. 12, the sentence pattern information “<focused spot> is <one or more not-yet-presented determinants>. Shall I explain any of them?” of the one information recommendation method (here, the method with ID = 1) that corresponds to the largest of the six scores calculated by the score calculation unit 15.

Next, the variable value acquisition unit 362 acquires the first variable <focused spot> from the sentence pattern information. The variable value acquisition unit 362 then acquires the one or more terms to be substituted into the variable <focused spot> (the value “Ninna-ji” of the variable “focused spot”). Next, the sentence composing means 363 substitutes the acquired term “Ninna-ji” into the position of the first variable in the sentence pattern information and obtains “Ninna-ji is <one or more not-yet-presented determinants>. Shall I explain any of them?”

Next, the variable value acquisition unit 362 acquires the second variable <one or more not-yet-presented determinants> from the sentence pattern information, and then acquires the terms for that variable. That is, the variable value acquisition unit 362 acquires the determinants for which the evaluation value of “Ninna-ji”, the value of the variable “focused spot”, is 1, excluding the already presented determinant “cherry blossoms”: “good scenery”, “famous garden”, “famous autumn leaves”, “World Heritage site”, and “has events”. Next, the sentence composing means 363 substitutes the acquired terms into the position of the second variable in the sentence pattern information and obtains “Ninna-ji is good scenery, famous garden, famous autumn leaves, World Heritage site, has events. Shall I explain any of them?” The sentence composing means 363 then converts this into the natural sentence “Ninna-ji has good scenery, is famous for its garden and autumn leaves, is a World Heritage site, and holds events. Shall I explain any of them?” and thereby obtains the recommendation sentence.
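Selecting the not-yet-presented determinants amounts to a filter over the spot's evaluation values. The sketch below assumes a table that maps each spot to a dictionary of determinant evaluation values; the function name `unpresented_determinants`, the table layout, and the determinant “quiet” with value 0 are hypothetical and serve only to show that determinants with evaluation value 0 are skipped.

```python
def unpresented_determinants(spot_table, spot, presented):
    """Determinants whose evaluation value for `spot` is 1 and that
    have not been presented yet, in table order."""
    return [
        determinant
        for determinant, value in spot_table[spot].items()
        if value == 1 and determinant not in presented
    ]


spot_table = {
    "Ninna-ji": {
        "cherry blossoms": 1,
        "good scenery": 1,
        "famous garden": 1,
        "famous autumn leaves": 1,
        "World Heritage site": 1,
        "has events": 1,
        "quiet": 0,  # hypothetical entry, not taken from FIG. 11
    }
}
terms = unpresented_determinants(spot_table, "Ninna-ji", {"cherry blossoms"})
print(terms)
```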

In other words, the above processing amounts to executing “Method1((Spot_Ninna-ji), (Det_scenery, Det_garden, Det_autumn-leaves, Det_world-heritage, Det_event))”. Method1() means that information recommendation method 1 is applied. “Spot_Ninna-ji” denotes the spot “Ninna-ji”, “Det_scenery” the determinant “scenery”, “Det_garden” the determinant “garden”, “Det_autumn-leaves” the determinant “autumn leaves”, “Det_world-heritage” the determinant “World Heritage”, and “Det_event” the determinant “event”.

Next, the sentence output unit 37 outputs by voice the answer sentence “Omuro cherry trees are unusual in that they are low in height and bear fragrant, single-petaled white blossoms from the base of the trunk. They bloom late and mark the end of spring in Kyoto.” The sentence output unit 37 then outputs by voice the recommendation sentence “Ninna-ji has good scenery, is famous for its garden and autumn leaves, is a World Heritage site, and holds events. Shall I explain any of them?”

Next, the user state information update unit 18 performs the user state information update process as follows. That is, first, the user-presented term acquisition unit 181 of the user state information update unit 18 acquires the determinant “cherry blossoms” from the latest sentence received by the reception unit 34, “Tell me about cherry blossoms at Ninna-ji.”

The preference vector update unit 183 then reads the preference vector included in the user state information in the user state information storage unit 13, and updates it so that the value of the element corresponding to the acquired determinant “cherry blossoms” increases. How much the value of that element is increased does not matter. The increase may be a fixed value or a fixed ratio, or it may change dynamically.

Next, the device presentation term acquisition unit 182 acquires the determinants “scenery”, “garden”, “autumn leaves”, “World Heritage”, and “events” from the latest recommendation sentence output by the sentence output unit 37, “Ninna-ji has good scenery, is famous for its garden and autumn leaves, is a World Heritage site, and holds events. Shall I explain any of them?”

次に、知識ベクトル更新手段１８４は、ユーザ状態情報格納部１３のユーザ状態情報が有する知識ベクトルを読み出す。知識ベクトル更新手段１８４は、取得した決定要因「景色」「庭園」「紅葉」「世界遺産」「イベント」に対応する要素の値が大きくなるように、ユーザ状態情報に含まれる知識ベクトルを更新する。決定要因「景色」等に対応する要素の値を、どの程度大きくするかについては問わない。この大きくする値は、固定の値でも、固定の割合でも、動的に変化しても良い。 Next, the knowledge vector update unit 184 reads the knowledge vector included in the user state information in the user state information storage unit 13. The knowledge vector update means 184 updates the knowledge vector included in the user state information so that the value of the element corresponding to the acquired determinants “scenery”, “garden”, “autumn leaves”, “world heritage”, “event” becomes large. . It does not matter how much the value of the element corresponding to the determinant “scenery” or the like is increased. The value to be increased may be a fixed value, a fixed ratio, or dynamically change.

次に、ユーザ状態情報更新部１８は、ユーザ状態情報格納部１３のユーザ状態情報が有する対話のターン数「０」を読み出す。そして、ユーザ状態情報更新部１８は、読み出したターン数「０」に１を加えた値「１」を、新しいターン数として、ユーザ状態情報格納部１３のユーザ状態情報を更新する。 Next, the user state information update unit 18 reads the number of conversation turns “0” included in the user state information in the user state information storage unit 13. Then, the user state information update unit 18 updates the user state information in the user state information storage unit 13 with the value “1” obtained by adding 1 to the read turn number “0” as the new turn number.

次に、ユーザ状態情報更新部１８は、直前ユーザ発話行為情報を「０」から「１」に更新する。 Next, the user state information update unit 18 updates the immediately preceding user utterance action information from “0” to “1”.

次に、ユーザ状態情報更新部１８は、直前システム発話行為情報を「０」から「１」に更新する。 Next, the user status information update unit 18 updates the immediately preceding system utterance action information from “0” to “1”.

Next, the user state information update unit 18 updates the number of spots in the system presentation history information from 0 to 1 (because “Ninna-ji” was output). The user state information update unit 18 also updates the number of determinants in the system presentation history information from 0 to 5 (because “scenery”, “garden”, “autumn leaves”, “World Heritage”, and “events” were output).

以上により、ユーザ状態情報が最新の値（s0101,s0102,・・・,s0129）に更新された。 As described above, the user status information is updated to the latest value (s0101, s0102,..., S0129).

次に、ユーザと対話装置３との対話が何度か行われ、５回目の発話「ここの景色はどうですか？」が、ユーザにより音声入力された、とする。なお、現在の着目スポットは「仁和寺」である。 Next, it is assumed that the user has interacted with the dialog device 3 several times, and the fifth utterance “How is the scenery here?” Is voice input by the user. The current spot of interest is “Ninna-ji”.

次に、受付部３４の音声受付手段３４１は、ユーザから音声による文「ここの景色はどうですか？」を受け付ける。 Next, the voice reception means 341 of the reception unit 34 receives a voice sentence “How is the scenery here?” From the user.

次に、受付部３４の音声認識手段３４２は、受け付けた文「ここの景色はどうですか？」を音声認識し、文字列の文を取得する。 Next, the voice recognition unit 342 of the reception unit 34 recognizes the received sentence “How is the scenery here?” And acquires a sentence of a character string.

Next, the reception unit 34 determines that the input sentence “How is the scenery here?” matches neither of the end-condition sentence patterns “I will go to <spot>.” and “I have decided on <spot>.”

次に、スコア算出部１５は、６つの各情報推薦手法に対する６つのスコアを、上述と同様に算出する。 Next, the score calculation unit 15 calculates six scores for each of the six information recommendation methods in the same manner as described above.

つまり、まず、スコア算出部１５は、ユーザ状態情報格納部１３から、現在のユーザ状態情報（s0501,s0502,・・・,s0529）を読み出す。 That is, first, the score calculation unit 15 reads the current user state information (s0501, s0502,..., S0529) from the user state information storage unit 13.

Next, for each information recommendation method, the score calculation unit 15 multiplies together the evaluation information (vector) and the weight vector that each of the first to sixth information recommendation methods in the information recommendation method storage unit 12 has, and the user state information (s0501, s0502, ..., s0529). The score calculation unit 15 thereby calculates the scores of the six information recommendation methods.

次に、文構成部３６は、直前のユーザ入力文を自然言語処理し、決定要因「景色」を取得する。なお、ここでは、文構成部３６は、ユーザ入力文からスポットを取得できなかった。 Next, the sentence composing unit 36 performs natural language processing on the immediately preceding user input sentence, and acquires the determination factor “scenery”. In addition, the sentence structure part 36 was not able to acquire a spot from a user input sentence here.

次に、文構成部３６は、変数「着目決定要因」に、取得した決定要因「景色」を代入する。 Next, the sentence composition unit 36 substitutes the acquired determination factor “scenery” for the variable “focused determination factor”.

Next, the sentence composing unit 36 searches the knowledge base 11 using the value “Ninna-ji” of the variable “focused spot” and the value “scenery” of the variable “focused determination factor”, and reads from the knowledge base 11 the explanatory sentence corresponding to the focused spot and the focused determination factor: “From the top of the Sanmon gate, you can look out over the whole city of Kyoto.” This explanatory sentence becomes the answer sentence.

Next, the sentence composing unit 36 performs the recommendation sentence acquisition process. That is, first, the sentence pattern information acquisition means 361 of the sentence composing unit 36 acquires, from the information recommendation method management table of FIG. 12, the sentence pattern information of the one information recommendation method corresponding to the largest of the six scores calculated by the score calculation unit 15 (here, the method with ID = 2): “For places with <focused determination factor>, I can introduce <one or more not-yet-presented spots>, among others.”

Next, the variable value acquisition means 362 acquires the first variable, <focused determination factor>, from the sentence pattern information. The variable value acquisition means 362 then acquires the one or more terms to be substituted for the variable <focused determination factor> (the value “scenery” of the variable “focused determination factor”). Next, the sentence composing means 363 substitutes the acquired term “good scenery” at the position of the first variable in the sentence pattern information and obtains “For places with good scenery, I can introduce <one or more not-yet-presented spots>, among others.” Note that, when a value is substituted for a variable relating to a determination factor, the variable value acquisition means 362 acquires the attribute value “determination factor” in FIG. 11.

Next, the variable value acquisition means 362 acquires the second variable, <one or more not-yet-presented spots>, from the sentence pattern information, and then acquires the values for that variable. That is, the variable value acquisition means 362 acquires from the knowledge base 11 (FIG. 11) the spots “Kiyomizu-dera”, “Fushimi Inari Taisha”, and “Kurama-dera”, which are the spots whose evaluation value for “scenery” (the value of the variable “focused determination factor”) is “1”, excluding the already-presented spot “Ninna-ji”. Next, the sentence composing means 363 substitutes the acquired terms “Kiyomizu-dera”, “Fushimi Inari Taisha”, and “Kurama-dera” at the position of the second variable in the sentence pattern information and obtains “For places with good scenery, I can introduce Kiyomizu-dera, Fushimi Inari Taisha, and Kurama-dera, among others.” The sentence composing means 363 then attempts to convert this sentence into a natural sentence, but no change is needed, so it obtains the recommendation sentence “For places with good scenery, I can introduce Kiyomizu-dera, Fushimi Inari Taisha, and Kurama-dera, among others.” as is.
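The two substitution steps above can be sketched as a lookup followed by a template fill. The knowledge base contents below are toy values (the entry "Spot X" and all evaluation values are hypothetical stand-ins for FIG. 11), and the helper names `unpresented_spots` and `fill_pattern` are assumptions, not names from the specification.

```python
from typing import Dict, Iterable, List

# Toy stand-in for the knowledge base of FIG. 11: spot -> {factor: evaluation value}.
TOY_KNOWLEDGE_BASE: Dict[str, Dict[str, int]] = {
    "Ninna-ji": {"scenery": 1},
    "Kiyomizu-dera": {"scenery": 1},
    "Fushimi Inari Taisha": {"scenery": 1},
    "Spot X": {"scenery": 0},  # hypothetical spot that does not qualify
    "Kurama-dera": {"scenery": 1},
}


def unpresented_spots(knowledge_base: Dict[str, Dict[str, int]],
                      factor: str,
                      presented: Iterable[str]) -> List[str]:
    """Spots whose evaluation value for `factor` is 1 and that were not yet presented."""
    already = set(presented)
    return [
        spot for spot, evaluations in knowledge_base.items()
        if evaluations.get(factor) == 1 and spot not in already
    ]


def fill_pattern(pattern: str, values: Dict[str, str]) -> str:
    """Replace each <variable> in the sentence pattern with its value."""
    sentence = pattern
    for variable, value in values.items():
        sentence = sentence.replace(f"<{variable}>", value)
    return sentence


PATTERN = ("For places with <focused determination factor>, "
           "I can introduce <not-yet-presented spots>, among others.")

spots = unpresented_spots(TOY_KNOWLEDGE_BASE, "scenery", presented=["Ninna-ji"])
recommendation = fill_pattern(PATTERN, {
    "focused determination factor": "good scenery",
    "not-yet-presented spots": ", ".join(spots),
})
```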

In other words, the above processing corresponds to executing “Method2((Spot_{Kiyomizu-dera}, Spot_{Fushimi Inari Taisha}, Spot_{Kurama-dera}), (Det_{scenery}))”. Method2() means that information recommendation method 2 is applied. “Spot_{Kiyomizu-dera}” and the like denote the spot “Kiyomizu-dera” and the like.

Next, the sentence output unit 37 outputs the answer sentence “From the top of the Sanmon gate, you can look out over the whole city of Kyoto.” as speech. The sentence output unit 37 then outputs the recommendation sentence “For places with good scenery, I can introduce Kiyomizu-dera, Fushimi Inari Taisha, and Kurama-dera, among others.” as speech.

Next, the user state information update unit 18 performs the user state information update process as follows. That is, first, the user-presented term acquisition means 181 of the user state information update unit 18 acquires one or more determination factors, here “scenery”, from the latest sentence received by the reception unit 34, “How is the scenery here?”

そして、嗜好ベクトル更新手段１８３は、ユーザ状態情報格納部１３のユーザ状態情報が有する嗜好ベクトルを読み出す。そして、嗜好ベクトル更新手段１８３は、取得した決定要因「景色」に対応する要素の値が大きくなるように、ユーザ状態情報に含まれる嗜好ベクトルを更新する。決定要因「景色」に対応する要素の値を、どの程度大きくするかについては問わない。 And the preference vector update means 183 reads the preference vector which the user state information of the user state information storage part 13 has. Then, the preference vector updating unit 183 updates the preference vector included in the user state information so that the value of the element corresponding to the acquired determination factor “scenery” becomes large. It does not matter how large the value of the element corresponding to the determining factor “scenery” is.

Next, the device-presented term acquisition means 182 acquires one or more determination factors, here “scenery”, from the latest recommendation sentence output by the sentence output unit 37, “For places with good scenery, I can introduce Kiyomizu-dera, Fushimi Inari Taisha, and Kurama-dera, among others.”

次に、知識ベクトル更新手段１８４は、ユーザ状態情報格納部１３のユーザ状態情報が有する知識ベクトルを読み出す。知識ベクトル更新手段１８４は、取得した決定要因「景色」に対応する要素の値が大きくなるように、ユーザ状態情報に含まれる知識ベクトルを更新する。なお、例えば、すでに決定要因「景色」に対応する要素の値が最大値である場合は、この要素値は変化しない。 Next, the knowledge vector update unit 184 reads the knowledge vector included in the user state information in the user state information storage unit 13. The knowledge vector update unit 184 updates the knowledge vector included in the user state information so that the value of the element corresponding to the acquired determination factor “scenery” becomes large. For example, when the value of the element corresponding to the determination factor “scenery” is already the maximum value, the element value does not change.
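The preference and knowledge vector updates share the same shape: raise one element, and leave it unchanged if it is already at the maximum. A minimal sketch follows; the step size of 0.1 and the maximum of 1.0 are assumptions (the text explicitly leaves the size of the increase open), and `raise_element` is a hypothetical helper name.

```python
from typing import List, Sequence


def raise_element(vector: Sequence[float],
                  index: int,
                  step: float = 0.1,
                  max_value: float = 1.0) -> List[float]:
    """Return a copy of `vector` with element `index` increased by `step`.

    The result is capped at `max_value`, so an element that is already at
    the maximum does not change.
    """
    updated = list(vector)
    updated[index] = min(max_value, updated[index] + step)
    return updated


# Index 1 stands for the determination factor "scenery" in this toy vector.
knowledge = [0.25, 1.0]
after_scenery = raise_element(knowledge, index=1)            # already at maximum
after_other = raise_element(knowledge, index=0, step=0.5)    # raised by the step
```

Returning a new list rather than mutating the input keeps the stored user state unchanged until the caller writes the update back.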

次に、ユーザ状態情報更新部１８は、ユーザ状態情報格納部１３のユーザ状態情報が有する対話のターン数「５」を読み出す。そして、ユーザ状態情報更新部１８は、読み出したターン数「５」に１を加えた値「６」を、新しいターン数として、ユーザ状態情報格納部１３のユーザ状態情報を更新する。 Next, the user state information update unit 18 reads the number of conversation turns “5” included in the user state information in the user state information storage unit 13. Then, the user state information update unit 18 updates the user state information in the user state information storage unit 13 with the value “6” obtained by adding 1 to the read turn number “5” as the new turn number.

次に、ユーザ状態情報更新部１８は、直前ユーザ発話行為情報を「１」のままとする。また、ユーザ状態情報更新部１８は、直前システム発話行為情報を「１」のままとする。 Next, the user status information update unit 18 keeps the previous user utterance action information as “1”. Further, the user status information update unit 18 keeps the previous system utterance action information as “1”.

次に、ユーザ状態情報更新部１８は、システム提示履歴情報を構成するスポット数を「１」から「４」（「清水寺」「伏見稲荷大社」「鞍馬寺」を出力したので）に更新する。また、「景色」は、既に出現したいたので、ユーザ状態情報更新部１８は、システム提示履歴情報を構成する決定要因数を「５」のままとする。 Next, the user status information update unit 18 updates the number of spots constituting the system presentation history information from “1” to “4” (since “Kiyomizu Temple”, “Fushimi Inari Taisha” and “Kurama Temple” are output). Since “scenery” has already appeared, the user state information update unit 18 keeps the number of determining factors constituting the system presentation history information as “5”.
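The two counters in the system presentation history behave like the sizes of two sets: presenting a new spot increases the count, while presenting an item that has already appeared does not. The sketch below assumes a set-based representation; the factor names other than "scenery" are placeholders, since the text gives only the count of five.

```python
from typing import Iterable, Set, Tuple


def update_presentation_history(spots: Set[str],
                                factors: Set[str],
                                new_spots: Iterable[str],
                                new_factors: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return the presented spots and factors after one system utterance."""
    return spots | set(new_spots), factors | set(new_factors)


presented_spots = {"Ninna-ji"}
# "factor A" to "factor D" are placeholders for the four other presented factors.
presented_factors = {"scenery", "factor A", "factor B", "factor C", "factor D"}

presented_spots, presented_factors = update_presentation_history(
    presented_spots,
    presented_factors,
    new_spots=["Kiyomizu-dera", "Fushimi Inari Taisha", "Kurama-dera"],
    new_factors=["scenery"],
)
spot_count = len(presented_spots)      # 1 -> 4
factor_count = len(presented_factors)  # stays at 5
```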

以上により、ユーザ状態情報が最新の値（s0601,s0602,・・・,s0629）に更新された。 As described above, the user status information is updated to the latest value (s0601, s0602,..., S0629).

次に、ユーザは、対話装置３が出力した回答文と推薦文を聞き、「清水寺」（６回目の発話）と答えた、とする。 Next, it is assumed that the user hears the answer sentence and the recommendation sentence output by the dialogue apparatus 3 and answers “Kiyomizu Temple” (sixth utterance).

次に、受付部３４の音声受付手段３４１は、ユーザから音声による文「清水寺」を受け付ける。 Next, the voice reception unit 341 of the reception unit 34 receives a voice sentence “Kiyomizu Temple” from the user.

次に、受付部３４の音声認識手段３４２は、受け付けた文「清水寺」を音声認識し、文字列の文を取得する。 Next, the voice recognition means 342 of the reception unit 34 recognizes the received sentence “Kiyomizu-dera” and acquires a sentence of a character string.

次に、受付部３４は、入力された文「清水寺」が終了条件である文パターン「＜スポット＞に行きます。」または「＜スポット＞に決めました。」に合致しない、と判断する。 Next, the reception unit 34 determines that the input sentence “Kiyomizu-dera” does not match the sentence pattern “go to <spot>” or “determined <spot>” as the end condition.

Next, the score calculation unit 15 calculates the six scores for the six information recommendation methods in the same manner as described above. That is, first, the score calculation unit 15 reads the current user state information (s0601, s0602, ..., s0629) from the user state information storage unit 13.

Next, for each of the first to sixth information recommendation methods stored in the information recommendation method storage unit 12, the score calculation unit 15 multiplies together the evaluation information (a vector) and the weight vector of that method and the user state information (s0601, s0602, ..., s0629). The score calculation unit 15 thereby calculates the scores of the six information recommendation methods.

Next, the sentence composing unit 36 performs natural language processing on the immediately preceding user input sentence and acquires the spot “Kiyomizu-dera”. Here, the sentence composing unit 36 could not acquire a determination factor from the user input sentence.

Next, the sentence composing unit 36 assigns the acquired spot “Kiyomizu-dera” to the variable “focused spot”. The current value of the variable “focused determination factor” is “scenery”.

Next, the sentence composing unit 36 searches the knowledge base 11 using the value “Kiyomizu-dera” of the variable “focused spot” and the value “scenery” of the variable “focused determination factor”, and reads from the knowledge base 11 the explanatory sentence corresponding to the focused spot and the focused determination factor: “The stage of Kiyomizu is built out over a slope, and the view of the city from it is superb.” This explanatory sentence becomes the answer sentence.

Next, the sentence composing unit 36 performs the recommendation sentence acquisition process. That is, first, the sentence pattern information acquisition means 361 of the sentence composing unit 36 acquires, from the information recommendation method management table of FIG. 12, the sentence pattern information of the one information recommendation method corresponding to the largest of the six scores calculated by the score calculation unit 15 (here, the method with ID = 4): “I can also tell you about places that are <one or more not-yet-presented determination factors> {select up to 3 determination factors where lowest values in the knowledge vector first}, among others.”

Next, the variable value acquisition means 362 acquires the first variable, <one or more not-yet-presented determination factors>, from the sentence pattern information. The variable value acquisition means 362 then acquires the one or more terms to be substituted for that variable in accordance with the acquisition operation description {select up to 3 determination factors where lowest values in the knowledge vector first}. That is, from the determination factors other than the already-presented determination factors “cherry blossoms” and “scenery” (“crowding”, “World Heritage”, “strolling”, and so on), the variable value acquisition means 362 is assumed to acquire the three with the lowest values in the knowledge vector (here, “World Heritage”, “strolling”, and “history”). Next, the sentence composing means 363 substitutes the three acquired determination factors “World Heritage”, “strolling”, and “history” at the position of the first variable in the sentence pattern information and obtains “I can also tell you about places that are World Heritage sites, good for strolling, or famous for their history, among others.”
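The acquisition operation description {select up to 3 determination factors, lowest knowledge value first} amounts to a filter followed by a sort and a cut-off. The sketch below illustrates this under the assumption that the knowledge vector is available as a name-to-value mapping; all numeric values are invented for the example and `lowest_knowledge_factors` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def lowest_knowledge_factors(knowledge: Dict[str, float],
                             presented: Iterable[str],
                             limit: int = 3) -> List[str]:
    """Up to `limit` not-yet-presented factors, lowest knowledge value first."""
    already = set(presented)
    candidates = [(name, value) for name, value in knowledge.items()
                  if name not in already]
    # sorted() is stable, so ties keep their original order.
    candidates.sort(key=lambda item: item[1])
    return [name for name, _ in candidates[:limit]]


# Invented knowledge values; only the resulting order matters for the example.
toy_knowledge = {
    "crowding": 0.5,
    "World Heritage": 0.0,
    "strolling": 0.1,
    "history": 0.2,
    "scenery": 1.0,
    "cherry blossoms": 1.0,
}
chosen = lowest_knowledge_factors(toy_knowledge,
                                  presented=["cherry blossoms", "scenery"])
```

Preferring low knowledge values steers the recommendation toward factors the user has not yet heard about.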

In other words, the above processing corresponds to executing “Method4((NULL), (Det_{World Heritage}, Det_{strolling}, Det_{history}))”. Method4() means that information recommendation method 4 is applied. “Det_{World Heritage}” and the like denote the determination factor “World Heritage” and the like.

Next, the sentence output unit 37 outputs the answer sentence “The stage of Kiyomizu is built out over a slope, and the view of the city from it is superb.” as speech. The sentence output unit 37 then outputs the recommendation sentence “I can also tell you about places that are World Heritage sites, good for strolling, or famous for their history, among others.” as speech.

次に、ユーザ状態情報更新部１８は、上記と同様に、ユーザ状態情報更新処理を行い、ユーザ状態情報が最新の値（s0701,s0702,・・・,s0729）に更新された、とする。 Next, it is assumed that the user state information update unit 18 performs the user state information update process in the same manner as described above, and the user state information is updated to the latest value (s0701, s0702,..., S0729).

その後、ユーザと対話装置３との対話が進行し、ユーザが１６回目の発話で「では、南禅寺に行きます。」を音声入力した、とする。 After that, it is assumed that the dialogue between the user and the dialogue device 3 proceeds, and the user voice-inputs “I will go to Nanzenji” in the 16th utterance.

次に、受付部３４の音声受付手段３４１は、ユーザから音声による文「では、南禅寺に行きます。」を受け付ける。 Next, the voice reception means 341 of the reception unit 34 receives a voice sentence “I will go to Nanzenji” from the user.

次に、受付部３４の音声認識手段３４２は、受け付けた文「では、南禅寺に行きます。」を音声認識し、文字列の文を取得する。 Next, the speech recognition means 342 of the reception unit 34 recognizes the received sentence “I will go to Nanzenji” and acquires a sentence of a character string.

Next, the reception unit 34 determines that the input sentence “Then, I will go to Nanzenji.” matches the sentence pattern “I will go to <spot>.”, which is an end condition. This is because the sentence “Then, I will go to Nanzenji.” contains a sentence that matches the sentence pattern “I will go to <spot>.”
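The end-condition check can be sketched with regular expressions, where the <spot> variable becomes a named group and a match anywhere inside the input counts as a match. The patterns below use the original Japanese utterances; treating a spot name as a run of characters without a comma, period, or whitespace is a simplifying assumption, and `match_end_condition` is a hypothetical name.

```python
import re
from typing import Optional

# End-condition sentence patterns; <スポット> is captured as the group "spot".
END_PATTERNS = [
    re.compile(r"(?P<spot>[^、。\s]+)に行きます。"),    # "I will go to <spot>."
    re.compile(r"(?P<spot>[^、。\s]+)に決めました。"),  # "I have decided on <spot>."
]


def match_end_condition(sentence: str) -> Optional[str]:
    """Return the chosen spot if the sentence contains an end pattern, else None."""
    for pattern in END_PATTERNS:
        found = pattern.search(sentence)  # search, not fullmatch: containment suffices
        if found:
            return found.group("spot")
    return None


chosen_spot = match_end_condition("では、南禅寺に行きます。")  # "Then, I will go to Nanzenji."
not_finished = match_end_condition("ここの景色はどうですか？")  # "How is the scenery here?"
```

A bare spot name such as 「清水寺」 does not match either pattern, so the dialogue continues in that case, as in the sixth utterance of the walkthrough.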

The processing then ends. At the end of the processing, the sentence output unit 37 may output a predetermined sentence such as “Thank you for using the Kyoto tourist guidance system.” or a sentence composed from a predetermined sentence pattern such as “<spot> has been decided. Is there anything else you would like to know?” Here, it is assumed that the sentence output unit 37 outputs “Nanzenji has been decided. Is there anything else you would like to know?”, composed from the sentence pattern “<spot> has been decided. Is there anything else you would like to know?”

以上の対話の流れを、図２４に示す。図２４において、Ｓ［数値］は、対話装置３（システム）からの発話、Ｕ［数値］はユーザからの発話を示す。また、Ａｎｓは回答文、Ｒｅｃは推薦文を示す。 The flow of the above dialogue is shown in FIG. In FIG. 24, S [numerical value] indicates an utterance from the dialogue apparatus 3 (system), and U [numeric value] indicates an utterance from the user. Ans indicates an answer sentence, and Rec indicates a recommendation sentence.

As described above, according to the present embodiment, the user's decision making can be appropriately supported by conducting the dialogue while dynamically changing the information on the user's knowledge and preferences as the dialogue with the user progresses.

Note that, in the present embodiment, the content of the user state information is not limited, nor is the manner in which the user state information is changed as the dialogue with the user progresses. However, values relating to preference (such as the element values of the preference vector) for spots and determination factors that are contained in the user's utterance and are not mentioned negatively are increased (making them more likely to be selected by the dialogue apparatus 3). Likewise, values relating to knowledge (such as the element values of the knowledge vector) for spots and determination factors contained in the utterances output by the dialogue apparatus 3 are increased (making them more likely to be selected by the dialogue apparatus 3).

An example of how the user state information changes through the dialogue in the present embodiment is described in detail below. Suppose that the knowledge vector of the prior user state information is “K_sys = (0.22, 0.01, 0.02, 0.18, ...)” and that the prior preference vector is “P_sys = (0.37, 0.19, 0.48, 0.38, ...)”. Suppose that, on receiving the user's utterance, the dialogue apparatus 3 selects information recommendation method 1; that is, the dialogue apparatus 3 executes “a_sys = Method1{(Spot₅), (Det₁, Det₃, Det₄)}”. The dialogue apparatus 3 then obtains the recommendation sentence “For Ninna-ji (Spot₅), I can tell you about its garden (Det₁), World Heritage status (Det₃), and autumn leaves (Det₄).” Suppose that the user then inputs “Tell me about the World Heritage status (Det₃).”

The user state information update unit 18 then obtains the posterior knowledge vector “K_sys = (1.00, 0.01, 1.00, 1.00, ...)” and the posterior preference vector “P_sys = (0.26, 0.19, 0.65, 0.22, ...)”. Here, each vector consists of the element values for (garden, crowding, World Heritage, autumn leaves, scenery, access, cherry blossoms, history, strolling, events). Because the dialogue apparatus 3 (the system) output the determination factors “garden”, “World Heritage”, and “autumn leaves”, the element values for “garden”, “World Heritage”, and “autumn leaves” in the posterior knowledge vector increased. Because the user input the determination factor “World Heritage”, the element value for “World Heritage” in the preference vector increased. And because the user did not select the determination factors “garden” and “autumn leaves”, the element values for “garden” and “autumn leaves” in the preference vector decreased.

In the present embodiment, a dialogue apparatus that uses the weight vector learned in Embodiment 1 has been described. However, the weight vector used by the dialogue apparatus 3 in the present embodiment need not be the weight vector learned in Embodiment 1. That is, the dialogue apparatus 3 need not be linked with the simulation apparatus 2 described in Embodiment 1.
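The prior and posterior vectors in the worked example above can be checked numerically. In the sketch below, setting the knowledge value of every presented determination factor to 1.00 is a rule inferred from the example figures, not one stated in the text, and the preference changes are simply read off as differences; the specification does not give the formula that produces them.

```python
FACTORS = ("garden", "crowding", "World Heritage", "autumn leaves")

# First four elements of the vectors quoted in the worked example.
K_PRIOR = [0.22, 0.01, 0.02, 0.18]
P_PRIOR = [0.37, 0.19, 0.48, 0.38]
P_POSTERIOR = [0.26, 0.19, 0.65, 0.22]

# Factors output by the system in the recommendation sentence.
PRESENTED = {"garden", "World Heritage", "autumn leaves"}


def posterior_knowledge(prior, presented, factors=FACTORS, known=1.00):
    """Inferred rule: a factor presented by the system becomes fully known."""
    return [known if name in presented else value
            for name, value in zip(factors, prior)]


def preference_deltas(prior, posterior):
    """Element-wise change in preference, rounded to two decimals."""
    return [round(after - before, 2) for before, after in zip(prior, posterior)]


k_posterior = posterior_knowledge(K_PRIOR, PRESENTED)
deltas = preference_deltas(P_PRIOR, P_POSTERIOR)
```

Only “World Heritage”, the factor the user asked about, gains preference; “garden” and “autumn leaves”, which were offered but not chosen, lose it, and “crowding”, which was not mentioned, is unchanged.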

The processing in the present embodiment may be implemented in software. The software may be distributed by software download or the like, or may be recorded on a recording medium such as a CD-ROM and distributed. The same applies to the other embodiments in this specification.

The software that implements the dialogue apparatus 3 in the present embodiment is the following program. A storage medium stores: a knowledge base holding two or more pieces of spot information, each having a spot, one or more determination factors that are factors for deciding on the spot, and an evaluation value indicating the evaluation of the spot for each of the one or more determination factors; two or more information recommendation methods, each having sentence pattern information, which is information indicating a sentence output by the dialogue apparatus or a pattern of such a sentence, and evaluation information for the sentence pattern information that is used when selecting the sentence pattern information; and user state information, which is information indicating the state of the user and has a preference vector indicating the user's preference for each of the one or more determination factors and a knowledge vector indicating the user's knowledge of each of the one or more determination factors. The program causes a computer to function as: a reception unit that receives a sentence input by the user; a score calculation unit that applies the user state information to the evaluation information of each of the two or more information recommendation methods stored in the storage medium and calculates two or more scores for the two or more information recommendation methods; a sentence composing unit that uses the two or more scores calculated by the score calculation unit to acquire the sentence pattern information of one information recommendation method and composes a sentence from that sentence pattern information; a sentence output unit that outputs the sentence composed by the sentence composing unit; and a user state information update unit that acquires at least one or more spots or one or more determination factors from one or more of the sentence received by the reception unit and the sentence output by the sentence output unit, and updates the user state information in the storage medium using the one or more spots or one or more determination factors. In this program, the score calculation unit applies the user state information updated by the user state information update unit to the evaluation information of each of the two or more information recommendation methods stored in the information recommendation method storage unit and calculates two or more scores for the two or more information recommendation methods.
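The way the units of this program cooperate over one dialogue can be sketched as a loop: receive a sentence, test the end condition, score the methods on the current user state, compose and output a sentence, then update the state so that the next scoring uses the updated state. All function and parameter names below are hypothetical, and each unit is passed in as a plain callable purely for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_dialogue(user_sentences: Iterable[str],
                 state: dict,
                 is_end: Callable[[str], bool],
                 score_methods: Callable[[dict], Dict[int, float]],
                 compose: Callable[[int, str, dict], str],
                 update_state: Callable[[dict, str, str], dict],
                 ) -> Tuple[List[str], dict]:
    """Run the receive / score / compose / output / update cycle until the end condition."""
    outputs: List[str] = []
    for sentence in user_sentences:              # reception unit
        if is_end(sentence):                     # end-condition check
            break
        scores = score_methods(state)            # score calculation unit
        best_id = max(scores, key=scores.get)    # method with the largest score
        reply = compose(best_id, sentence, state)        # sentence composing unit
        outputs.append(reply)                            # sentence output unit
        state = update_state(state, sentence, reply)     # user state information update unit
    return outputs, state


# Stub units, only to show the control flow.
outputs, final_state = run_dialogue(
    ["first question", "second question", "I will go END", "ignored"],
    state={"turn": 0},
    is_end=lambda s: s.endswith("END"),
    score_methods=lambda st: {1: 0.1, 2: 0.9},
    compose=lambda method_id, s, st: f"method{method_id}: {s}",
    update_state=lambda st, s, reply: {"turn": st["turn"] + 1},
)
```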

In the above program, it is preferable that the computer be made to function such that the sentence composing unit comprises: sentence pattern information acquisition means that acquires the sentence pattern information of the one information recommendation method corresponding to the largest of the two or more scores calculated by the score calculation unit; variable value acquisition means that acquires one or more variables contained in the sentence pattern information acquired by the sentence pattern information acquisition means, and acquires the spot or determination factor corresponding to each variable from one or more of the sentence most recently output by the sentence output unit and the sentence most recently received by the reception unit; and sentence composing means that composes a sentence by inserting the terms acquired by the variable value acquisition means at the positions of the variables in the sentence pattern information acquired by the sentence pattern information acquisition means.

In the above program, it is also preferable that the computer be made to function such that the variable value acquisition means acquires one or more variables contained in the sentence pattern information acquired by the sentence pattern information acquisition means, acquires one or more candidate spots or one or more candidate determination factors corresponding to each variable from one or more of the sentence most recently output by the sentence output unit and the sentence most recently received by the reception unit, and selects, from the candidate spots or candidate determination factors, the spot or determination factor corresponding to the variable by using the evaluation values in the knowledge base that correspond to those candidates.

In the above program, it is also preferable that the computer be made to function such that the user state information update unit comprises: user-presented term acquisition means that acquires at least one or more determination factors from the sentence received by the reception unit; device-presented term acquisition means that acquires at least one or more determination factors from one or more of the sentences output by the sentence output unit; preference vector update means that updates the preference vector so as to raise the element values for the one or more determination factors acquired by the user-presented term acquisition means; and knowledge vector update means that updates the knowledge vector so as to raise the element values for the one or more determination factors acquired by the device-presented term acquisition means.

In the above program, it is also preferable that the computer be made to function such that the reception unit comprises voice reception means that receives speech input by the user and voice recognition means that recognizes the speech and converts it into a character string, and such that the sentence output unit outputs the sentence composed by the sentence composing unit as speech.

また、図２５は、本明細書で述べたプログラムを実行して、上述した実施の形態の対話装置等を実現するコンピュータの外観を示す。上述の実施の形態は、コンピュータハードウェア及びその上で実行されるコンピュータプログラムで実現され得る。図２５は、このコンピュータシステム３４０の概観図であり、図２６は、コンピュータシステム３４０のブロック図である。 FIG. 25 shows the external appearance of a computer that executes the program described in this specification to realize the interactive apparatus or the like of the above-described embodiment. The above-described embodiments can be realized by computer hardware and a computer program executed thereon. FIG. 25 is an overview diagram of the computer system 340, and FIG. 26 is a block diagram of the computer system 340.

In FIG. 25, the computer system 340 includes a computer 341 that includes an FD drive and a CD-ROM drive, a keyboard 342, a mouse 343, a monitor 344, and a microphone 345. Note that the dialogue apparatuses 1 and 3 and the simulation apparatus 2 need not have the microphone 345.

In FIG. 26, in addition to the FD drive 3411 and the CD-ROM drive 3412, the computer 341 includes an MPU 3413; a bus 3414 connected to the CD-ROM drive 3412 and the FD drive 3411; a RAM 3416 that is connected to a ROM 3415 for storing programs such as a boot-up program, and that temporarily stores the instructions of application programs and provides a temporary storage space; and a hard disk 3417 for storing application programs, system programs, and data. Although not shown here, the computer 341 may further include a network card that provides a connection to a LAN.

コンピュータシステム３４０に、上述した実施の形態の対話装置等の機能を実行させるプログラムは、ＣＤ−ＲＯＭ３５０１、またはＦＤ３５０２に記憶されて、ＣＤ−ＲＯＭドライブ３４１２またはＦＤドライブ３４１１に挿入され、さらにハードディスク３４１７に転送されても良い。これに代えて、プログラムは、図示しないネットワークを介してコンピュータ３４１に送信され、ハードディスク３４１７に記憶されても良い。プログラムは実行の際にＲＡＭ３４１６にロードされる。プログラムは、ＣＤ−ＲＯＭ３５０１、ＦＤ３５０２またはネットワークから直接、ロードされても良い。 A program for causing the computer system 340 to execute the functions of the interactive apparatus or the like of the above-described embodiment is stored in the CD-ROM 3501 or FD 3502, inserted into the CD-ROM drive 3412 or FD drive 3411, and further stored in the hard disk 3417. May be forwarded. Alternatively, the program may be transmitted to the computer 341 via a network (not shown) and stored in the hard disk 3417. The program is loaded into the RAM 3416 at the time of execution. The program may be loaded directly from the CD-ROM 3501, the FD 3502, or the network.

The program need not include an operating system (OS), third-party programs, or the like that cause the computer 341 to execute the functions of the dialogue apparatus and the other apparatuses of the above embodiments. The program need only contain the instructions that call the appropriate functions (modules) in a controlled manner so that the desired result is obtained. How the computer system 340 operates is well known, so a detailed description is omitted.

The program may be executed by a single computer or by a plurality of computers. That is, the processing may be centralized or distributed.

Needless to say, in each of the above embodiments, two or more communication means present in one apparatus may be physically realized by a single medium.

In each of the above embodiments, each process (each function) may be realized by centralized processing in a single apparatus (system) or by distributed processing across a plurality of apparatuses.

The present invention is not limited to the above embodiments. Various modifications are possible, and needless to say, they also fall within the scope of the present invention.

As described above, the learning system according to the present invention can automatically construct the weight vectors that a dialogue apparatus, which converses with a user, needs in order to output sentences, and is useful as a learning system or the like.

Description of Symbols
1, 3 Dialogue apparatus
2 Simulation apparatus
11 Knowledge base
12 Information recommendation method storage unit
13 User state information storage unit
14 User input information receiving unit
15 Score calculation unit
16 Dialogue sentence information constituting unit
17 Dialogue sentence output unit
18 User state information update unit
21 Dialogue information storage unit
22 User preference vector storage unit
23 Dialogue sentence information receiving unit
24 User sentence type determination unit
25 Determinant etc. acquisition unit
26 User input information sending unit
27 Reward calculation unit
28 Learning unit
34 Receiving unit
36 Sentence constituting unit
37 Sentence output unit
161 Method identifier acquisition means
162, 362 Variable value acquisition means
163 Dialogue sentence information constituting means
181 User-presented term acquisition means
182 Apparatus-presented term acquisition means
183 Preference vector update means
184 Knowledge vector update means
271 Random selection match value calculation means
272 Selected spot match degree calculation means
273 Reward calculation means
341 Speech receiving means
342 Speech recognition means
361 Sentence pattern information acquisition means
363 Sentence constituting means

Claims

A learning system comprising a dialogue apparatus and a simulation apparatus for simulating a dialogue about spots, the learning system learning a weight vector used in determining a sentence to be output by the dialogue apparatus, wherein
the dialogue apparatus comprises:
a knowledge base storing two or more pieces of spot information, each including a spot, one or more determinants that are factors for deciding on the spot, and an evaluation value indicating the evaluation of the spot with respect to each of the one or more determinants;
an information recommendation method storage unit storing two or more information recommendation methods, each having a method identifier that identifies the information recommendation method, evaluation information of the information recommendation method, and a weight vector indicating the weight of each element constituting the evaluation information;
a user state information storage unit storing user state information, which is information indicating the state of a user and includes a preference vector indicating the user's preference for the one or more determinants and a knowledge vector indicating the user's knowledge of the one or more determinants;
a user input information receiving unit that receives, from the simulation apparatus, user input information having either a user sentence type identifier that identifies a user sentence type, which is a pattern of a sentence input by the user, or the user sentence type identifier together with one or more pieces of information among one or more determinants and one or more spots;
a score calculation unit that calculates two or more scores, one for each of the two or more information recommendation methods, using the evaluation information and the weight vector of each of the two or more information recommendation methods stored in the information recommendation method storage unit, and the user state information;
a dialogue sentence information constituting unit that, using the two or more scores calculated by the score calculation unit, constitutes dialogue sentence information having either a method identifier that identifies one information recommendation method, or the method identifier together with one or more pieces of information among one or more determinants and one or more spots;
a dialogue sentence output unit that sends the dialogue sentence information constituted by the dialogue sentence information constituting unit to the simulation apparatus; and
a user state information update unit that acquires one or more spots or one or more determinants from one or more of the user input information received by the user input information receiving unit and the dialogue sentence information output by the dialogue sentence output unit, and updates the user state information in the user state information storage unit using the one or more spots or the one or more determinants,
the score calculation unit
calculates the two or more scores for the two or more information recommendation methods using the evaluation information and the weight vector of each of the two or more information recommendation methods stored in the information recommendation method storage unit and the user state information updated by the user state information update unit, and
the simulation apparatus comprises:
a dialogue information storage unit capable of storing dialogue probability information, which is information on probabilities for each information recommendation method and each user sentence type, determinant probability information, which is information on the probability that a determinant is selected, and spot probability information, which is information on the probability that a spot is selected;
a user preference vector storage unit capable of storing a user preference vector, which is a vector indicating the user's preferences;
a dialogue sentence information receiving unit that receives the dialogue sentence information from the dialogue apparatus;
a user sentence type determination unit that determines a user sentence type using the method identifier included in the dialogue sentence information and the dialogue probability information, and acquires a user sentence type identifier;
a determinant etc. acquisition unit that acquires one or more determinants or one or more spots using either one or more of the determinant probability information and the spot probability information, or one or more of the determinant probability information and the spot probability information together with one or more pieces of information among the one or more determinants and the one or more spots included in the dialogue sentence information;
a user input information sending unit that sends, to the dialogue apparatus, user input information having either the user sentence type identifier, or the user sentence type identifier together with one or more pieces of information among the one or more determinants and the one or more spots;
a reward calculation unit that acquires the user preference vector and one or more evaluation values indicating the evaluation, for each of the one or more determinants, of a spot included in the user input information, calculates the degree of match between the user preference vector and the one or more evaluation values, and, using the degree of match, calculates a reward for selecting the user sentence type identified by the user sentence type identifier; and
a learning unit that, using the reward, updates the weight vector in the information recommendation method storage unit of the dialogue apparatus that corresponds to the method identifier of the dialogue apparatus.
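
The score calculation and the choice of one information recommendation method recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the claim does not fix how the evaluation information, the weight vector, and the user state information are combined, so a linear score (an inner product) and a highest-score selection rule are assumed here. The function names, the method identifiers, and the feature values are all hypothetical.

```python
from typing import Dict, List


def score(weight_vector: List[float], features: List[float]) -> float:
    """Assumed linear score: inner product of a weight vector and feature values.

    `features` stands in for values derived from a method's evaluation
    information and the current user state information (hypothetical).
    """
    if len(weight_vector) != len(features):
        raise ValueError("weight vector and features must have the same length")
    return sum(w * f for w, f in zip(weight_vector, features))


def select_method(weight_vectors: Dict[str, List[float]],
                  features: Dict[str, List[float]]) -> str:
    """Return the method identifier whose assumed linear score is highest."""
    scores = {m: score(weight_vectors[m], features[m]) for m in weight_vectors}
    return max(scores, key=scores.get)


# Hypothetical method identifiers and values, for illustration only.
weight_vectors = {"recommend_spot": [0.5, 1.0], "ask_preference": [1.0, 0.25]}
features = {"recommend_spot": [1.0, 0.0], "ask_preference": [1.0, 1.0]}
chosen = select_method(weight_vectors, features)
```

In this sketch the learning unit would only need to modify the entries of `weight_vectors` for the scores, and hence the selected method, to change.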

The learning system according to claim 1, wherein the reward calculation unit comprises:
random selection match value calculation means for calculating, using the spot probability information, the expected value of the degree of match between the user preference vector and the one or more evaluation values of a spot when the spot is determined at random;
selected spot match degree calculation means for calculating the degree of match between the user preference vector and the one or more evaluation values indicating the evaluation, for each of the one or more determinants, of the spot included in the user input information; and
reward calculation means for calculating a reward for selecting the spot included in the user input information, using the expected value of the degree of match calculated by the random selection match value calculation means and the degree of match calculated by the selected spot match degree calculation means.
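
The three means of this claim can be sketched as follows. This is a minimal sketch under stated assumptions: the claim says only that the reward is calculated using the expected degree of match under random selection and the degree of match of the selected spot. Taking the degree of match as an inner product and the reward as the difference of the two quantities are assumptions made here for illustration, and all names and values are hypothetical.

```python
from typing import Dict, List


def match_degree(preference: List[float], evaluations: List[float]) -> float:
    """Assumed degree of match: inner product of preference and evaluation values."""
    return sum(p * e for p, e in zip(preference, evaluations))


def expected_random_match(preference: List[float],
                          spot_evaluations: Dict[str, List[float]],
                          spot_probs: Dict[str, float]) -> float:
    """Expected degree of match when a spot is drawn from the spot probabilities."""
    return sum(spot_probs[s] * match_degree(preference, ev)
               for s, ev in spot_evaluations.items())


def reward(preference: List[float],
           spot_evaluations: Dict[str, List[float]],
           spot_probs: Dict[str, float],
           selected_spot: str) -> float:
    """Assumed reward: match of the selected spot minus the random-selection baseline."""
    selected = match_degree(preference, spot_evaluations[selected_spot])
    baseline = expected_random_match(preference, spot_evaluations, spot_probs)
    return selected - baseline


# Hypothetical spots, evaluation values, and probabilities, for illustration only.
preference = [0.5, 0.5]
spot_evaluations = {"spot_a": [1.0, 1.0], "spot_b": [0.0, 1.0]}
spot_probs = {"spot_a": 0.5, "spot_b": 0.5}
r = reward(preference, spot_evaluations, spot_probs, "spot_a")  # 1.0 - 0.75 = 0.25
```

Under these assumptions a positive reward means the selected spot matches the user's preferences better than a randomly selected spot would on average.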

The learning system according to claim 1, wherein the user state information storage unit
stores user state information, which is information indicating the state of the user and has a preference vector indicating the user's preference for the one or more determinants, a knowledge vector indicating the user's knowledge of the one or more determinants, and an attribute vector that is information indicating the user's attributes with respect to the one or more determinants.

The simulation apparatus constituting the learning system according to any one of claims 1 to 3.

A learning method in which a recording medium stores
dialogue probability information, which is information on probabilities for each information recommendation method and each user sentence type, determinant probability information, which is information on the probability that a determinant is selected, and spot probability information, which is information on the probability that a spot is selected, and
a user preference vector, which is a vector indicating the user's preferences,
the learning method being realized by a dialogue sentence information receiving unit, a user sentence type determination unit, a determinant etc. acquisition unit, a user input information sending unit, a reward calculation unit, and a learning unit, and comprising:
a dialogue sentence information receiving step in which the dialogue sentence information receiving unit receives dialogue sentence information from a dialogue apparatus;
a user sentence type determination step in which the user sentence type determination unit determines a user sentence type using the method identifier included in the dialogue sentence information and the dialogue probability information, and acquires a user sentence type identifier;
a determinant etc. acquisition step in which the determinant etc. acquisition unit acquires one or more determinants or one or more spots using either one or more of the determinant probability information and the spot probability information, or one or more of the determinant probability information and the spot probability information together with one or more pieces of information among the one or more determinants and the one or more spots included in the dialogue sentence information;
a user input information sending step in which the user input information sending unit sends, to the dialogue apparatus, user input information having either the user sentence type identifier, or the user sentence type identifier together with one or more pieces of information among the one or more determinants and the one or more spots;
a reward calculation step in which the reward calculation unit acquires the user preference vector and one or more evaluation values indicating the evaluation, for each of the one or more determinants, of a spot included in the user input information, calculates the degree of match between the user preference vector and the one or more evaluation values, and, using the degree of match, calculates a reward for selecting the user sentence type identified by the user sentence type identifier; and
a learning step in which the learning unit, using the reward, updates the weight vector in the information recommendation method storage unit of the dialogue apparatus that corresponds to the method identifier of the dialogue apparatus.
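
The user sentence type determination step can be sketched as follows. This is a minimal sketch under stated assumptions: the claim says only that the user sentence type is determined using the method identifier and the dialogue probability information. Representing that information as a table of conditional probabilities and sampling from it are assumptions made here, and the identifiers and probabilities are hypothetical.

```python
import random
from typing import Dict


def choose_user_sentence_type(method_id: str,
                              dialogue_probs: Dict[str, Dict[str, float]],
                              rng: random.Random) -> str:
    """Sample a user sentence type identifier given the system's method identifier.

    `dialogue_probs[method_id][user_type]` is assumed to hold the probability
    of each user sentence type in response to the given recommendation method.
    """
    distribution = dialogue_probs[method_id]
    user_types = list(distribution.keys())
    weights = list(distribution.values())
    return rng.choices(user_types, weights=weights, k=1)[0]


# Hypothetical dialogue probability information, for illustration only.
dialogue_probs = {
    "recommend_spot": {"accept": 0.5, "ask_detail": 0.25, "reject": 0.25},
    "ask_preference": {"state_preference": 1.0, "reject": 0.0},
}
rng = random.Random(0)  # seeded so that a simulation run is repeatable
user_type = choose_user_sentence_type("recommend_spot", dialogue_probs, rng)
```

Passing the random number generator in as an argument keeps the simulated user reproducible across learning runs.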