JP2022154041A

JP2022154041A - Subject feeling estimation model, device and method, and behavior modification promotion model

Info

Publication number: JP2022154041A
Application number: JP2021056881A
Authority: JP
Inventors: Roberto Sebastian Legaspi; Wenzhen Xu; Shinya Wada; Tatsuya Konishi; Mori Kurokawa
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2022-10-13
Anticipated expiration: 2041-03-30
Also published as: JP7448502B2

Abstract

To provide a subject feeling estimation model capable of estimating a subject feeling of a user. SOLUTION: This model makes a computer function as: a belief model for generating or updating belief information including a probability of causing a new state as a result of a behavior under a certain subject feeling level of a user; a desire model for receiving a reward corresponding to value information of the user to generate a desirable state of the user and determining a policy being a set of behaviors that can bring about the desirable state; an intention model for generating causal relation information relating to causal relations among a state, a value, a reward and a behavior on the basis of the belief information, the value information and the reward; a behavior model for determining and outputting a behavior on the basis of the policy and the causal relation information; and a subject feeling model for determining or updating the subject feeling level of the user on the basis of a generated new state and a feature amount relating to a prescribed feature of the user, and updating the subject feeling level to be used in the belief model into a determined or updated subject feeling level. SELECTED DRAWING: Figure 1

Description

The present invention relates to persuasive technology, which promotes changes in people's behavior through "persuasion".

Persuasive technology, which encourages a user to change behavior through "persuasion" in the broad sense, that is, by shifting the user's beliefs, perceptions, and desires in a given direction, has been attracting attention. It is being actively applied in the fields of health and welfare, education, and urban transportation, as disclosed for example in Patent Document 1, Patent Document 2, and Patent Document 3, respectively.

Persuasive technology is also expected to develop further as an indispensable technology for realizing autonomous AI systems in which an AI (artificial intelligence) provides a wide variety of services to users through communication with them.

Rita Orji and Karyn Moffatt, "Persuasive technology for health and wellness: State-of-the-art and emerging trends", Health Informatics J. 24(1), pp.66-91, 2018, <https://doi.org/10.1177/1460458216650979>
Yohana Dewi Lulu Widyasari et al., "Persuasive technology for enhanced learning behavior in higher education", International Journal of Educational Technology in Higher Education, 16:15, 2019, <https://doi.org/10.1186/s41239-019-0142-5>
Evangelia Anagnostopoulou et al., "Persuasive Technologies for Sustainable Mobility: State of the Art and Emerging Trends", Sustainability 2018, 10(7), pp.2128, 2018, <https://doi.org/10.3390/su10072128>

However, with conventional persuasive technology as described above, even when an AI "persuades" a user, it remains difficult, in an ordinary environment world that is complex or changes from moment to moment, to encourage more appropriate behavior change in the user, specifically more appropriate decision-making and choice of actions.

The inventors of the present application considered that this difficulty arises because, in a considerable number of cases, the user cannot feel any connection or linkage between an action taken of the user's own will after being "persuaded" by the AI and the state of the environment world that results. In other words, they found that because conventional persuasive technology takes no account of changes in the user's "sense of agency" (the feeling that one's own actions are influencing one's surroundings), it is very difficult to carry out appropriate "persuasion" that the user finds convincing and that does not diminish this sense of agency.

Accordingly, an object of the present invention is to provide a sense-of-agency estimation model, a sense-of-agency estimation device, a sense-of-agency estimation method, and a behavior change promotion model that can estimate a user's sense of agency and, taking that sense of agency into account, encourage changes in the user's intentions or actions.

According to the present invention, there is provided a sense-of-agency estimation model that causes a computer to function so as to estimate a user's sense of agency while determining, using a reward, an action with respect to a state of an environment world including the user, the model causing the computer to function as:
a belief model that generates or updates belief information, which is information including the probability that a certain new state arises as a result of performing a certain action on a certain state under a certain sense-of-agency level of the user;
a desire model that receives value information relating to what is of value to the user, together with the corresponding reward, generates a desired state, which is a state the user desires, and determines a policy, which is a set of actions that can bring about the desired state;
an intention model that generates, based on the belief information, the value information, and the reward, causal relationship information relating to causal relationships among states, values, rewards, and actions;
an action model that determines and outputs, based on the policy and the causal relationship information, an action to be taken for an observed state; and
a sense-of-agency model that determines or updates, and outputs, the user's sense-of-agency level based on the new state produced by the output action and on a feature amount relating to a predetermined feature of the user under that new state, and that updates the sense-of-agency level used in the belief model to the determined or updated level.

In one embodiment of the sense-of-agency estimation model according to the present invention, the action model preferably includes:
an action planning unit that generates an optimal policy, that is, the policy regarded as optimal, based on the policy, the causal relationship information, and the content of communication with the user, including predetermined questions; and
an action determination unit that uses the generated optimal policy to determine and output an action to be taken for an observed state.

In another embodiment of the sense-of-agency estimation model according to the present invention, the model preferably further causes the computer to function as a value matching model that receives, from the user, information relating to the value information and information relating to the reward corresponding to it, generates or updates the value information and the corresponding reward based on that information, and outputs them to the desire model. The value matching model is preferably constructed using an algorithm based on cooperative inverse reinforcement learning (CIRL).

In yet another embodiment of the sense-of-agency estimation model according to the present invention, the model preferably further causes the computer to function as an alternative state generation/evaluation model that includes:
a state generator that receives an observed state and the corresponding output action, and generates and outputs an alternative state, as a candidate for a state that could occur, based on integrated causal relationship information obtained by integrating the causal relationship information for at least each of a plurality of users;
a discriminator that generates, from the new state produced by the output action and from the alternative state, a loss representing the difference from the desired state, has the state generator trained with that loss, and also trains itself with that loss; and
an evaluator that generates a predicted reward, which is the reward corresponding to the alternative state generated by the trained state generator, and has the action model trained on its determination of actions using that predicted reward.
The alternative state generation/evaluation model is preferably constructed using an algorithm based on generative adversarial networks (GAN).

Furthermore, in the sense-of-agency estimation model according to the present invention, the belief model is preferably constructed using an algorithm based on a partially observable Markov decision process (POMDP). The causal relationship information in the intention model is preferably information based on a Bayesian network algorithm.

According to the present invention, there is also provided a sense-of-agency estimation device that uses the sense-of-agency estimation model described above to estimate the user's sense of agency from an observed state of the environment world.

According to the present invention, there is further provided a sense-of-agency estimation method performed by a computer that estimates a user's sense of agency while determining, using a reward, an action with respect to a state of an environment world including the user, the method comprising:
a step of generating or updating belief information, which is information including the probability that a certain new state arises as a result of performing a certain action on a certain state under a certain sense-of-agency level of the user;
a step of receiving value information relating to what is of value to the user, together with the corresponding reward, generating a desired state, which is a state the user desires, and determining a policy, which is a set of actions that can bring about the desired state;
a step of generating, based on the belief information, the value information, and the reward, causal relationship information relating to causal relationships among states, values, rewards, and actions;
a step of determining and outputting, based on the policy and the causal relationship information, an action to be taken for an observed state; and
a step of determining or updating, and outputting, the user's sense-of-agency level based on the new state produced by the output action and on a feature amount relating to a predetermined feature of the user under that new state, and of updating the sense-of-agency level used in the step of generating or updating the belief information to the determined or updated level.

According to the present invention, there is furthermore provided a behavior change promotion model that causes a computer to function so as to encourage behavior change in a user while determining, using a reward, an action with respect to a state of an environment world including the user, the model causing the computer to function as:
a belief model that generates or updates belief information, which is information including the probability that a certain new state arises as a result of performing a certain action on a certain state under a certain sense-of-agency level of the user;
a desire model that receives value information relating to what is of value to the user, together with the corresponding reward, generates a desired state, which is a state the user desires, and determines a policy, which is a set of actions that can bring about the desired state;
an intention model that generates, based on the belief information, the value information, and the reward, causal relationship information relating to causal relationships among states, values, rewards, and actions;
an action model that generates an optimal policy, that is, the policy regarded as optimal, based on the policy, the causal relationship information, and the content of communication with the user, including predetermined questions, and that uses the optimal policy to determine and output an action to be taken for an observed state; and
a sense-of-agency model that determines or updates the user's sense-of-agency level based on the new state produced by the output action and on a feature amount relating to a predetermined feature of the user under that new state, and that updates the sense-of-agency level used in the belief model to the determined or updated level.

According to the sense-of-agency estimation model, sense-of-agency estimation device, sense-of-agency estimation method, and behavior change promotion model of the present invention, it is possible to estimate a user's sense of agency and, taking that sense of agency into account, to encourage changes in the user's intentions or actions.

FIG. 1 is a schematic diagram showing an embodiment of the sense-of-agency estimation model according to the present invention, together with a functional block diagram showing the functional configuration of an embodiment of the sense-of-agency estimation device according to the present invention.
FIG. 2 is a schematic diagram for explaining the theoretical foundation of an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

[Sense-of-agency estimation model, behavior change promotion model]
FIG. 1 is a schematic diagram showing an embodiment of the sense-of-agency estimation model according to the present invention, together with a functional block diagram showing the functional configuration of an embodiment of the sense-of-agency estimation device according to the present invention.

The sense-of-agency estimation model 1 of this embodiment shown in FIG. 1 is a model that, in accordance with a policy formulated while communicating with a user H via interfaces (IF) (provided in the value matching model 13 and the action model 15), can act on the environment world including the user H and change the state of the environment world toward a desired state.

Through this communication, the sense-of-agency estimation model 1 acquires information relating to predetermined features of the user (for example, physiological indicators such as heart rate and breathing rate) and quantifies the user H's "sense of agency over actions" (the feeling that one's own actions are influencing one's surroundings; hereinafter abbreviated as "sense of agency"). It then updates this "sense of agency" as appropriate within the cycle of communicating with the user H, formulating a policy in response to the state of the environment world, and acting. As a result, in this embodiment for example, it is also possible to determine and output the user H's real-time "sense of agency" and its dynamic fluctuations.

Furthermore, in this embodiment the sense-of-agency estimation model 1 also serves as a behavior change promotion model: even in a complex and constantly changing environment world, it can communicate appropriately with the user H and, while maintaining or improving (and not diminishing) the user H's "sense of agency", encourage behavior change in the user H, specifically more appropriate decision-making and choice of actions.

Specifically, the sense-of-agency estimation model (behavior change promotion model) 1 is a model that estimates the "sense of agency" of the user H while determining an "action" with respect to a state of the environment world including the user H, using a "reward" calculated from that state and that action. As shown in FIG. 1, it causes a computer to function as at least:
(A) a belief model 11 that generates or updates "belief information", which is information including the probability that a certain new state arises as a result of performing a certain action on a certain state under a certain "sense-of-agency level" of the user H;
(B) a desire model 12 that receives "value information" relating to what is of value to the user H, together with the corresponding "reward", generates, based on the "value information" and the "reward", a desired state, which is a state the user H desires, and determines a "policy", which is a set of actions that can bring about the desired state;
(C) an intention model 14 that receives the "belief information", the "value information", and the "reward", and generates "causal relationship information" relating to causal relationships among states, value information, rewards, and actions;
(D) an action model 15 that determines and outputs, based on the "policy" and the "causal relationship information", an "action" to be taken for an observed state; and
(E) a sense-of-agency model 17 that receives the "new state" produced by the output "action" and a "feature amount" relating to a predetermined feature of the user H under the "new state" (for example, a feature amount generated from physiological indicators such as heart rate and breathing rate), determines or updates, and outputs, the "sense-of-agency level" of the user H based on the "new state" and the "feature amount", and updates the "sense-of-agency level" used in the belief model 11 of (A) above to the determined or updated level.
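
The cycle formed by components (A) to (E) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the specification: states are integers, actions are -1, 0, or +1, `causal_info` is a hypothetical stand-in for the output of the intention model, `calmness` is a hypothetical feature amount in [0, 1], and the update rule in `update_soa` is invented. It is not the POMDP, Bayesian-network, or CIRL machinery the embodiment refers to; it only shows how the updated sense-of-agency level feeds back into the belief information.

```python
from collections import defaultdict


def desire_policy(state, desired):
    """(B) Toy desire model: the set of actions that move toward the desired state."""
    if state == desired:
        return [0]
    return [1] if desired > state else [-1]


def choose_action(policy, causal_info):
    """(D) Toy action model: pick the policy action with the best expected reward.

    causal_info is a hypothetical stand-in for the output of the intention model (C).
    """
    return max(policy, key=lambda action: causal_info.get(action, 0.0))


def update_soa(soa, expected_state, new_state, calmness, rate=0.3):
    """(E) Toy sense-of-agency update (an assumed rule, not from the source).

    The level moves toward `calmness` when the outcome matched the expectation,
    and toward 0 when it did not.
    """
    target = calmness if new_state == expected_state else 0.0
    return soa + rate * (target - soa)


def run_cycle(state, desired, soa, beliefs, causal_info, environment, calmness):
    """Run one state -> action -> new state cycle and return (new_state, new_soa)."""
    policy = desire_policy(state, desired)
    action = choose_action(policy, causal_info)
    new_state = environment(state, action)
    # (A) Record the transition under the level that held when the action was taken.
    beliefs[(round(soa, 1), state, action, new_state)] += 1
    new_soa = update_soa(soa, state + action, new_state, calmness)
    return new_state, new_soa
```

In this sketch, calling `run_cycle` repeatedly with the returned state and level reproduces the feedback described in (E): the level produced in one cycle is the level under which the next transition is recorded.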

In this way, the sense-of-agency estimation model 1 can estimate the "sense of agency", which has conventionally been difficult to estimate (in particular, it could not be estimated in a complex and constantly changing environment world). Furthermore, by determining actions on the environment world in consideration of the user H's appropriately updated "sense of agency (level)", it can also encourage behavior change in the user H, that is, changes in intention or action. For example, the model can make appropriate proposals, persuasion, and the like so that the user H can feel a connection or linkage between an action the user H took of his or her own will (after receiving a proposal or persuasion from the sense-of-agency estimation model 1) and the resulting state of the environment world; that is, so that the user H's "sense of agency" is improved rather than diminished.

Next, the theoretical basis of the functional configuration of the sense-of-agency estimation model (behavior change promotion model) 1 of this embodiment is described with reference to FIG. 2.

FIG. 2 is a schematic diagram for explaining the theoretical foundation of an embodiment of the present invention.

In general, a user's sense of agency (SoA) fluctuates under the strong influence of disturbing events that interrupt the smooth flow of the user's intentional actions (which are directed toward a predicted state of the environment world). Below, FIG. 2 is used to explain how the sense-of-agency level changes within the chain of state, intention, action, and new state.

First, the belief-desire-intention model (Georgeff et al., 5th International Workshop, ATAL'98 Proceedings, pp.1-10, 1998) is known as a model for deriving intentions that will arise in the future. In this model, "belief" (FIG. 2) and "desire" (FIG. 2) are accumulated information about the perceived state structure of the environment world and about its desired state structure, respectively.

The "intention" (FIG. 2) is determined from the "belief" and the "desire"; specifically, it is information that includes the actions assumed to bring about the desired state structure of the environment world. The "intention" thus specifies and controls actions, and this specification and control is updated under the influence of changes in the "belief" and the "desire".
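
As a rough illustration of how an intention can be derived from belief and desire, the following Python sketch treats belief as a mapping from (state, action) pairs to the state the agent expects to follow, and desire as a set of wanted states. These data structures and the function name `derive_intention` are assumptions made for this example; the cited belief-desire-intention model is far richer.

```python
def derive_intention(belief, desire):
    """Return the (state, action) pairs believed to lead to a desired state.

    belief: dict mapping (state, action) -> predicted next state (assumed form)
    desire: set of desired states (assumed form)
    """
    return {
        (state, action)
        for (state, action), predicted in belief.items()
        if predicted in desire
    }
```

Because the intention is recomputed from its inputs, changing either the belief or the desire changes the result, which mirrors the statement that the specification and control of actions is updated when "belief" or "desire" changes.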

Here, the sense of agency is a feeling determined by whether the "intention" to achieve the desired state structure is embodied in the "action" (FIG. 2), and it is given by the "action planning/selection" (FIG. 2) that follows from the "intention".

The inventors of the present application integrated the above findings on "intention" with established theories from cognitive science and neuroscience on the emergence and disruption of the sense of agency, and devised the foundational framework shown in FIG. 2.

In cognitive science and neuroscience, alongside theories of motor learning and control, the validity of models using a "comparator" (FIG. 2) for the cognition of action has long been discussed. According to this comparator model, a motor drive (action) is carried out together with a prediction of the motor output, generated from an efference copy of the motor command. The comparator then compares this prediction with the actually measured motor output; if the two match, the motor output is recorded as having been caused by the motor drive (action) itself. If they do not match, the "sense of agency", in the sense of being in control of the motor drive, is taken to have been interrupted or disrupted.
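
The comparator logic described above can be sketched in a few lines of Python. The numeric representation of motor output, the `tolerance` parameter, and the two label strings are assumptions introduced for illustration; the source only states that a match attributes the output to one's own action and a mismatch signals a disruption of the sense of agency.

```python
def comparator(predicted_output, observed_output, tolerance=0.0):
    """Return True when the observed output matches the prediction.

    The prediction stands for the forecast generated from the efference copy
    of the motor command; `tolerance` is an assumed matching threshold.
    """
    return abs(predicted_output - observed_output) <= tolerance


def attribute_outcome(predicted_output, observed_output, tolerance=0.0):
    """Label an outcome as self-caused or as a disruption of the sense of agency."""
    if comparator(predicted_output, observed_output, tolerance):
        return "self-caused"
    return "agency-disrupted"
```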

In contrast, the theory of retrospective inference (RI) in cognitive science and neuroscience holds that an "inference" (FIG. 2) that a sense of agency has arisen is made when the intended or predicted state matches the actual observed state. Along with "intention", it also takes other higher-order (cognitive) factors into account in this "inference", for example cues relating to the external context and the social situation. Specifically, in this theory, when the observed state occurs as predicted, actions proceed smoothly and thoughts about one's actions and attitudes stay at the edge of awareness. When the observed state differs from the prediction, on the other hand, the brain performs retrospective inference (RI) and seeks an answer to whether the action taken was the cause of the observed state.

The following describes in detail the specific configuration of the sense-of-agency estimation model (behavior change promotion model) 1, an embodiment of the present invention that implements the theoretical foundation described above on a computer.

Incidentally, the human brain ordinarily holds a mental model that represents the minds of others and uses it to infer their mental states. The cognitive-science theory of this ability, the so-called theory of mind (ToM), holds that because people have this ability, they can intuitively understand how others behave in various contexts and why they behave that way.

The sense-of-agency estimation model (behavior change promotion model) 1 described below emulates, so to speak, this theory of mind (ToM). It attributes "belief", "desire", "intention", "action planning/selection", and "inference" to another person (the user H in this embodiment), and seeks to understand why that person acts as he or she does and how that person perceives the state of the environment world resulting from the action; in other words, what that person's sense of agency is like and how it operates.

[Model configuration, sense-of-agency estimation method]
The functional configuration of an embodiment of the sense-of-agency estimation model (behavior change promotion model) 1 according to the present invention is described in more detail below. As also shown in FIG. 1, the sense-of-agency estimation model (behavior change promotion model) 1 includes, as functional components implemented by a computer (by a program installed on it):
(a) a belief model 11, a desire model 12, a value matching model 13, and an intention model 14;
(b) an action model 15 including an action planning unit 151 and an action determination unit 152;
(c) an alternative state generation/evaluation model 16 including a CBN (Causal Bayesian Network) ensemble 161, a state generator 162, a discriminator 163, and an evaluator 164; and
(d) a sense-of-agency model 17.
Each of these functional components is described specifically below.

<Belief model>
Referring again to FIG. 1, the belief model 11 is a model that generates or updates "belief information", that is, information including the probability that a new state s' results from performing an action a in a state s under a given "sense-of-agency level" soa of user H. In this embodiment, the belief model 11 is built using an algorithm based on the partially observable Markov decision process (POMDP) (MONAHAN G. E., Management Science 28(1), 1-16, 1982).

Specifically, the belief model 11 defines
(a) the state space S as the set of states s that the environment, which includes user H and the sense-of-agency estimation model 1, can take;
(b) the action space A as the set of actions a that user H and the sense-of-agency estimation model 1 can perform (output);
(c) the transition probability T(s'|s, a) as the conditional probability that a transition to state s' occurs when action a is performed in state s;
(d) the cost c = c(s, a) of performing action a in state s; and
(e) the observation probability O(o|s', a) as the probability that the sense-of-agency estimation model 1 obtains observation o from the environment when performing action a in state s causes a transition to state s'.
From these it derives, as the "belief information" obtained when the sense-of-agency estimation model 1 has performed action a in state s and received observation o, the belief B(s'), which is the probability that the environment is in state s'.

More specifically, let B(s) be the belief at the previous time step and let β = 1/Prob(o|b, a) be the normalization constant. The current belief B(s') (the result of updating B(s)) can then be calculated by
(1)  $B(s') = \beta \cdot O(o \mid s', a) \cdot \sum_{s \in S} T(s' \mid s, a)\, B(s)$
This information can be regarded as the degree of "belief or conviction" about how likely each possible state of the environment is.
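The update in equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `update_belief`, the nested-list layout of `T` and `O`, and the two-state driving example are assumptions made here and do not come from the embodiment.

```python
def update_belief(belief, T, O, action, observation):
    """Belief update of equation (1).

    belief[s]               : previous belief B(s)
    T[a][s][s_next]         : transition probability T(s'|s, a)
    O[a][s_next][o]         : observation probability O(o|s', a)
    Returns the list B(s') for every next state s'.
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # sum over s of T(s'|s, a) * B(s)
        predicted = sum(T[action][s][s_next] * belief[s] for s in range(n))
        unnormalized.append(O[action][s_next][observation] * predicted)
    prob_obs = sum(unnormalized)  # Prob(o|b, a), i.e. 1 / beta
    if prob_obs == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / prob_obs for u in unnormalized]


# Hypothetical example: state 0 = "congested ahead", state 1 = "clear ahead";
# a single action 0 = "continue on road X"; observation 0 = "slow traffic seen".
T = [[[0.9, 0.1],
      [0.2, 0.8]]]
O = [[[0.8, 0.2],
      [0.3, 0.7]]]
new_belief = update_belief([0.5, 0.5], T, O, action=0, observation=0)
```

Because the sense-of-agency level soa is itself part of the state, the same routine would apply unchanged if each state index encoded a (traffic situation, soa level) pair.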

As described in detail later, the "sense-of-agency level" soa is an element of the state space S, being a state that user H can take. The belief B(s') is therefore updated by taking the sum over {s} (the sum over s ∈ S in equation (1) above), where {s} includes the updated "sense-of-agency level" soa' received from the sense-of-agency model 17 described later.

As an example, suppose that user H is the driver of a car traveling on road X, and that the environment is the road traffic situation, including user H's car, together with the surroundings of the road network. States may then be, for example, "heading for destination W", "traffic at user H's position on road X is 'normal'", "traffic ahead on road X is 'congested'", "the weather is 'sunny'", and so on. Actions may be, for example, "continue driving on road X", "(the model) notifies the user of congestion ahead", "detour to road Y", and so on. The "sense-of-agency level" may be set, for example, on a five-level scale: "high", "a little high", "neutral", "a little low", and "low".

Unlike the usual setting, throughout the sense-of-agency estimation model 1 including the belief model 11, the model 1 and user H are not completely separate entities; the model 1 carries out its processing on the expectation that user H will cooperate. In addition, the belief model 11 does not use the reward (function) employed in an ordinary POMDP; instead, a reward r is used in the desire model described next.

<Desire model>
Referring again to FIG. 1, the desire model 12 is a model that receives "value information" concerning values v held by the user and the corresponding reward r, generates, on the basis of the "value information" and the reward r, a desired state sd (∈ S), which is the state the user wants, and determines a policy π, which is a set of actions a that may bring about the desired state sd.

In this embodiment, the desire model 12 is specifically implemented with a deep neural network (DNN) algorithm. It uses a value/reward model DM that takes as input
(a) a set τ = <v1, v2, ..., vn> of values v (for example, social values) held by user H, for example τ = <efficiency, frugality, altruism, happiness, self-love, degree of awareness of others' sense of agency>, and
(b) a reward r (∈ R, the reward space) calculated from the observation o obtained when action a is performed in state s,
and outputs the desired state sd (∈ S), the state the user wants. Using this model DM, it determines the policy π(o) = <a1, a2, ..., an>, a function of the observation o, which is the set of actions a that may bring about the desired state sd.

Both the set τ of values v in (a) and the reward r in (b) are given or indicated by user H in the course of interaction with the sense-of-agency estimation model 1. The set τ of values v is generated by the value matching model 13 described next.

<Value matching model>
In general, people quite often convey what they actually want in a given situation to others incorrectly or falsely. This becomes a serious problem when a person tries to tell an AI what he or she wants and have it act as expected.

For example, a person expects an AI robot to brew coffee for the person rather than to enjoy coffee itself (this being the correct value). Likewise, a person wants an autonomous (self-driving) car to recognize which values matter to the person during driving; for example, the person expects the car to treat as essential values obeying traffic rules, keeping away from pedestrians, and not getting caught in a traffic jam while a fretful child is on board. However, it has been very difficult for an AI to recognize such human wishes accurately; in other words, to match (align) the values held by the person with the values adopted by the AI.

For example, an AI that performs ordinary reinforcement learning (RL) learns an optimal policy by maximizing the cumulative reward, in which rewards further in the future are discounted more heavily before being summed. The reward handled in this way, however, is purely a mathematical abstraction and is not something inherent in the real environment. Moreover, it is very difficult to build a model that answers questions such as which values a person should take into account, and why a given value matters, in terms of quantities actually obtained rather than mathematical abstractions.

Attempts have also been made to solve this value matching problem using inverse reinforcement learning (IRL) (Andrew Ng and Stuart J. Russell, ICML 2000: Proceedings of the Seventeenth International Conference). However, people do not want an AI to understand all of their wishes (a personal wish such as enjoying a coffee break, for example, need not be understood), and it is very difficult for an AI to distinguish such wishes and handle them accordingly. Furthermore, IRL assumes that the observed human behavior and attitudes are optimal, and therefore could not make adjustments that exploit the varied useful information contained in them.

The value matching model 13 according to the present invention is therefore built using an algorithm based on cooperative inverse reinforcement learning (CIRL) (Dylan Hadfield-Menell et al., 30th Conference on Neural Information Processing Systems (NIPS) 2016), which enables communication with the user about values.

Specifically, as part of the CIRL processing flow, the value matching model 13 receives from the user
(a) information on the "value information", which in this embodiment is information on the set τ = <v1, v2, ..., vn> of values v held by user H (values v that user H regards as important, for example in social life), that is, information on each of the values v1, v2, ..., vn; and
(b) information on the reward r corresponding to the "value information", which in this embodiment is information on the reward r for each of the values v1, v2, ..., vn.
On the basis of the information in (a) and (b), it generates or updates the "value information" and the corresponding reward r and outputs them to the desire model 12.

Here, the value matching model 13 (as part of the sense-of-agency estimation model 1)
(A) initially has no clear knowledge of what user H personally values, that is, of the value set τ = <v1, v2, ..., vn> in this embodiment;
(B) observes or queries user H about the value set τ via an interface (IF) that can receive measurement results output from measuring means such as a camera, or that supports text or voice input and output;
(C) collects information on the value set τ (for example, the user's actions, attitudes, and answers, as well as information on the rewards and outcomes obtained in connection with them) on the basic premise that "the user normally acts on the basis of the value set τ"; and
(D) performs value set and reward generation processing on the basis of the collected information, with the aim of maximizing the estimated value set τ.
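Steps (A) to (D) can be pictured as maintaining a distribution over candidate value sets and sharpening it as user H's choices are observed. The sketch below is not the CIRL algorithm of the cited paper; it is a simplified Bayesian update in which the softmax (Boltzmann) choice model, the parameter `beta`, and all names and numbers are assumptions made for illustration.

```python
import math


def posterior_over_value_sets(prior, candidates, options, observations, beta=1.0):
    """Update beliefs about which candidate value set tau user H holds.

    prior        : {candidate name: prior probability}
    candidates   : {candidate name: {value name: weight}}
    options      : {action name: {value name: how well the action serves it}}
    observations : list of action names the user was seen to choose
    """
    def utility(weights, action):
        return sum(weights.get(v, 0.0) * x for v, x in options[action].items())

    post = dict(prior)
    for chosen in observations:
        for name, weights in candidates.items():
            # Assumed choice model: the user picks actions with softmax probability.
            z = sum(math.exp(beta * utility(weights, a)) for a in options)
            post[name] *= math.exp(beta * utility(weights, chosen)) / z
        norm = sum(post.values())
        post = {k: p / norm for k, p in post.items()}
    return post


# Hypothetical example: does user H weight efficiency or frugality?
candidates = {
    "time_first": {"efficiency": 1.0, "frugality": 0.0},
    "cost_first": {"efficiency": 0.0, "frugality": 1.0},
}
options = {
    "toll_road": {"efficiency": 1.0, "frugality": 0.0},
    "local_road": {"efficiency": 0.0, "frugality": 1.0},
}
post = posterior_over_value_sets(
    {"time_first": 0.5, "cost_first": 0.5},
    candidates, options, ["toll_road", "toll_road"],
)
```

After two observed choices of the toll road, most of the probability mass moves to the "time_first" candidate, which mirrors premise (C) that the user normally acts on the basis of the value set τ.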

To determine a better-matched value set τ, the interaction described above between the sense-of-agency estimation model 1 (value matching model 13) and user H is preferably repeated continually.

In this embodiment, the value matching model 13 can also, for example, query or make a request to user H via the above interface (IF), generate from the response a self-report soa_r concerning user H's sense of agency, and output it to the sense-of-agency model 17 described in detail later.

<Intention model>

The belief-desire-intention model already mentioned (Georgeff et al., 5th International Workshop, ATAL'98 Proceedings, pp. 1-10, 1998) is known as a model for deriving intentions that will arise in the future. There, an intention is expressed as an action that is expected to cause a desired state (outcome) in the environment. In deriving the sense-of-agency level soa of user H, however, one must resolve the question of how the sense-of-agency estimation model 1 (or user H) grasps the relationship between cause (action) and effect (state), that is, the causal relationship.

To solve this problem, the intention model 14 in FIG. 1 is a model that generates "causal relationship information" concerning the causal relationships among states, values, rewards, and actions (and, in this embodiment, the costs included in the belief information) on the basis of
(a) the belief information received from the belief model 11, and
(b) the value set τ (value information) and the corresponding reward received from the desire model 12.

In this embodiment, the intention model 14 is built using a causal Bayesian network (CBN) algorithm. The output "causal relationship information" is, in this embodiment, information on the CBN algorithm, specifically the configuration information of the constructed CBN itself (see the lower left of FIG. 1), which includes the conditional probability of the resulting state s' given state s and action a.

A CBN is a directed acyclic graph model consisting of a set of nodes {Y_i} with directed edges E specified by a set of parent functions {pa(Y_i)}, and a set of conditional probabilities {Prob(Y_i | pa(Y_i))}. Each node Y_i corresponds to one of a state, an action, a cost, a value, or a reward. A directed edge E indicates that a causal transition is possible between the nodes Y_i and Y_j it connects; specifically, Y_i causes Y_j with some probability, or in other words the likelihood of Y_j is expressed by a conditional probability distribution conditioned on Y_i.

Further, in this embodiment, a do operator, do(Y_j = y_j), is defined on the CBN. When this do operator is applied to the CBN, the intervened node loses its parents and is fixed to the given value: pa(Y_j) = ∅ and Prob(Y_j) = δ(Y_j; y_j). That is, the do operator corresponds to an "intervention (in the sense of behavior change theory)" carried out by the sense-of-agency estimation model 1.
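The effect of the do operator can be illustrated with a very small discrete network. The following is a minimal sketch under stated assumptions: the class `CBN`, its brute-force enumeration, and the three-node driving example with its probabilities are hypothetical and serve only to show how removing the parents of the intervened node changes the resulting distribution.

```python
from itertools import product


class CBN:
    """Tiny discrete causal Bayesian network (illustrative only)."""

    def __init__(self, parents, cpt, domains):
        self.parents = parents  # node -> tuple of parent node names
        self.cpt = cpt          # node -> {parent values: {value: probability}}
        self.domains = domains  # node -> list of possible values

    def do(self, node, value):
        """Return a new network in which pa(node) is empty and node is fixed."""
        parents = dict(self.parents)
        cpt = dict(self.cpt)
        parents[node] = ()
        cpt[node] = {(): {v: 1.0 if v == value else 0.0
                          for v in self.domains[node]}}
        return CBN(parents, cpt, self.domains)

    def joint(self, assignment):
        """Probability of one full assignment: product of Prob(Y_i | pa(Y_i))."""
        p = 1.0
        for node, pa in self.parents.items():
            key = tuple(assignment[q] for q in pa)
            p *= self.cpt[node][key][assignment[node]]
        return p

    def marginal(self, node, value):
        """Prob(node = value), by enumerating every full assignment."""
        names = list(self.domains)
        total = 0.0
        for values in product(*(self.domains[n] for n in names)):
            assignment = dict(zip(names, values))
            if assignment[node] == value:
                total += self.joint(assignment)
        return total


# Hypothetical example: congestion influences both the detour decision
# and whether the driver arrives on time.
tf = [True, False]
net = CBN(
    parents={"congested": (), "detour": ("congested",),
             "on_time": ("congested", "detour")},
    cpt={
        "congested": {(): {True: 0.4, False: 0.6}},
        "detour": {(True,): {True: 0.3, False: 0.7},
                   (False,): {True: 0.1, False: 0.9}},
        "on_time": {(True, True): {True: 0.8, False: 0.2},
                    (True, False): {True: 0.2, False: 0.8},
                    (False, True): {True: 0.7, False: 0.3},
                    (False, False): {True: 0.9, False: 0.1}},
    },
    domains={"congested": tf, "detour": tf, "on_time": tf},
)
p_observed = net.marginal("on_time", True)
p_intervened = net.do("detour", True).marginal("on_time", True)
```

Here `p_observed` describes what happens when the driver decides for himself or herself, while `p_intervened` describes what happens when the detour is imposed regardless of congestion, which is the distinction the do operator is meant to capture.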

The intention model 14 initially estimates the causal relationships of the environment through "interventions", but ultimately it needs to answer counterfactual questions such as "what would have happened if a different 'intervention' had been carried out at an earlier stage?" In this embodiment, the intention model 14 can therefore also operate in a "counterfactual mode".

As described above, the intention model 14 is built on the following hypothesis newly proposed by the present inventors:
・a person's intention is formed on the basis of ideas about "what causal relationships connect the events of the environment" and expectations about "what will result from the intended action".

In other words, the intention model 14 incorporates the "comparator" and "estimation" functions of the theoretical framework of the present invention shown in FIG. 2, that is, the function of judging and estimating whether the result of a performed action (the actually observed state) is the predicted or intended state. The conditional probability of the resulting state s' given state s and action a, that is, how likely state s' is to occur, is information included in the output of the intention model 14, and it directly reflects the correspondence with the intended state.

Furthermore, through the belief information received from the belief model 11, the intention model 14 reflects information on whether the state assumed by user H matches the actual state, that is, information on the sense-of-agency level soa.

<Action model>
Referring again to FIG. 1, the action model 15 is a model that determines and outputs the action a to be performed for an observed state s on the basis of
(a) the policy π = <a1, a2, ..., an> received from the desire model 12, and
(b) the causal relationship information (CBN configuration information) received from the intention model 14.
In this embodiment, it includes an action planning unit 151 and an action determination unit 152.

Of these, the action planning unit 151 generates an optimal policy π*, the policy regarded as optimal, on the basis of the policy π in (a) above, the causal relationship information (CBN configuration information) in (b) above, and
(c) the content of communication with user H, including predetermined questions, conducted via an interface (IF) that can receive measurement results output from measuring means such as a camera, or that supports text or voice input and output.

In this embodiment, the action planning unit 151 is built using the action determination processing of the known XAIP (eXplainable AI Planning agent) (Chakraborti et al., Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 1335-1343, 2019). XAIP normally defines a planning problem P, a transition function ζ_P: S × A → S × C, and a planning algorithm A: P × t → π, where t is a quantity representing a property such as optimality or soundness (and is distinct from τ in the present invention). The present inventors, by contrast, build the action planning unit 151 by adopting the mental model Π of user H in place of the planning problem P.

Here, the mental model Π is, in this embodiment, the causal relationship information (CBN configuration information) received from the intention model 14. In a situation where user H is driving a car, for example, it is a model that gathers user H's mental information such as "keep going on the current road X, pay close attention to the time required to reach destination W, avoid congestion and drive as fast as possible, and avoid stopping along the way as far as possible".

Specifically, the action planning unit 151 defines the mental model Π of user H, a transition function ζ_Π: S × A → S × C, and a planning algorithm PA: Π × DM → π*. Here, DM is the value/reward model constituting the desire model 12, and it is realized here as the policy π = <a1, a2, ..., an>, the set of actions a (in state s) that may bring about the desired state sd (∈ S). As stated above, π* is the optimal policy.

This optimal policy π* is the set of actions that moves the actual state s to the desired state sd; that is, the transition function becomes
(2)  $\zeta_{\Pi}(\pi, s) = \langle sd,\ \sum_{a_i \in \pi} (c_i + r_i) \rangle$
where c_i and r_i are, respectively, the cost and the reward to be optimized when the policy π is executed. One feature of the transition function ζ_Π in equation (2) is that it includes the reward r_i, which is not used in the known XAIP.
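Equation (2) pairs the state reached by a policy with the accumulated cost and reward of its actions. The sketch below illustrates this under stated assumptions: the `Action` data class, the deterministic `step` function standing in for the causal relationship information, and the state names and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    cost: float    # c_i
    reward: float  # r_i


def zeta(policy, state, step):
    """Apply each action of the policy in order.

    Returns (final state, sum of c_i + r_i over the policy), following
    the form of equation (2). `step(state, action)` is an assumed
    deterministic transition function.
    """
    total = 0.0
    for action in policy:
        state = step(state, action)
        total += action.cost + action.reward
    return state, total


# Hypothetical transitions for the driving example.
TRANSITIONS = {
    ("on_road_X", "notify_congestion"): "on_road_X_informed",
    ("on_road_X_informed", "detour_to_Y"): "on_road_Y",
}


def step(state, action):
    # Actions with no listed effect leave the state unchanged.
    return TRANSITIONS.get((state, action.name), state)


policy = [Action("notify_congestion", cost=1.0, reward=0.0),
          Action("detour_to_Y", cost=3.0, reward=5.0)]
desired_state = "on_road_Y"
final_state, total = zeta(policy, "on_road_X", step)
reaches_desired_state = final_state == desired_state
```

A planner in the spirit of PA could then compare candidate policies by checking `reaches_desired_state` first and using `total` as the quantity to optimize; how cost and reward are traded off against each other is not specified in the text and is left open here.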

In accordance with the planning algorithm PA, the action planning unit 151 performs policy optimization to determine the optimal policy π*, so that the state s(i+1) (= ζ_Π(π, si)) reached as the transition destination when the actions of the policy π are performed matches the desired state sd completely or approximately. A specific embodiment of this policy optimization is described below.

In this embodiment, the action planning unit 151 basically needs to maintain a model similar to user H's mental model. When the predicted behavior or mental model of user H does not match its own behavior or model, it therefore gives user H an "explanation" and works to bring the models into agreement.

For example, when user H (the driver) would "continue driving on road X", the unit may adopt the "explanation" that "road X is congested ahead, whereas road Y is not", so that "detour to road Y" becomes the policy.

One technique for such "explanation" is inference reconciliation. Specifically, when the planning algorithm PA^χ of the action planning unit 151 (χ hereafter denotes the model 1 as an AI) generates a policy π but the planning algorithm PA^H of user H does not generate the same policy π, the action planning unit 151 gives an explanation ε so that PA^H generates the same policy π, that is, so that PA^H: Π × D →^ε π.

The explanation ε can be, for example, communication directed at user H (at user H's planning algorithm PA^H) that includes specific questions about the policy π generated by PA^χ. These questions preferably take the form of an explanatory or persuasive dialogue about "why a certain action a is in the policy π", "why the policy is π rather than another policy π'", or "why the policy π is optimal (that is, why π(s) is a and not a')". Such a dialogue can be conducted via the interface (IF) mentioned in (c) above, which can receive measurement results output from measuring means such as a camera or supports text or voice input and output.

Apart from the inference reconciliation described above, the explanation ε may also be used to change user H's mental model Π^H. Specifically, user H may assess the policy π that the planning algorithm PA^χ regards as optimal from an entirely different "mentality". The explanation ε is therefore used so that user H also agrees with the determined policy π. For example, given PA^χ: Π × D → π, let H^H be a mental model of user H that satisfies PA^H: H^H × D → π; the mental model Π^H may then be transformed into H^H by the explanation ε (that is, Π^H + ε → H^H).

As yet another technique, the action planning unit 151 can give an explanation ε that highlights the differences between (the values in) value sets τ generated under different assumptions. Such an explanation ε can be written as ε ← τ Δ τ^H between user H and the sense-of-agency estimation model 1 (action planning unit 151), and as ε ← τ^H1 Δ τ^H2 between a user H1 and a user H2. The latter fits the stance of the sense-of-agency estimation model 1 of ranking the values of user H1 above those of user H2, and makes it possible to achieve a balanced estimation of sense-of-agency levels, which is the goal of the model 1 when actions are performed jointly.

Specifically, given PA^χ: Π × D → π, an explanation ε is generated and given, during the interaction between user H and the sense-of-agency estimation model 1, such that τ^H + ε → τ'^H and PA^χ: Π × R × τ'^H → π. Alternatively, while the sense-of-agency estimation model 1 ranks the values of user H1 above those of user H2 (that is, while PA^χ: Π × R × τ^H2 → π), an explanation ε satisfying τ^H1 + ε → τ'^H1 = τ'^H2 may be generated and given.
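The difference operation Δ between two value sets is not defined further in the text. One plausible reading, assumed here purely for illustration, is that the explanation ε lists the values on which the two sets disagree. The function name `value_difference`, the dictionary representation of τ, and the weights below are hypothetical.

```python
def value_difference(tau_a, tau_b, tol=1e-9):
    """Return the values on which two value sets differ.

    Each value set is a {value name: weight} dictionary. A value appears
    in the result if it is missing from one side or if the weights differ
    by more than `tol`. The result maps the value name to the pair
    (weight in tau_a, weight in tau_b), with None for a missing value.
    """
    diff = {}
    for name in sorted(set(tau_a) | set(tau_b)):
        a, b = tau_a.get(name), tau_b.get(name)
        if a is None or b is None or abs(a - b) > tol:
            diff[name] = (a, b)
    return diff


# Hypothetical value sets held by the model and by user H.
tau_model = {"efficiency": 0.8, "frugality": 0.5, "altruism": 0.3}
tau_user = {"efficiency": 0.4, "frugality": 0.5, "happiness": 0.9}
epsilon = value_difference(tau_model, tau_user)
```

In this example the explanation would focus on efficiency, altruism, and happiness, and would leave out frugality, on which both sides already agree.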

As is clear from the above, unlike the situation in the value matching model 13 described earlier, the action planning unit 151 has a planning algorithm with higher problem-solving ability than (the planning algorithm of) user H; that is, PA^χ > PA^H.

Several techniques by which the AI gives the user an "explanation" to bring the models into agreement have been described above. In every case, the techniques of this embodiment follow three fundamentals of AI-based persuasive technology. First, the sense-of-agency estimation model 1 is highly transparent to user H in how it presents its action plans and estimation results, and user H can ask questions and request explanations. This makes the model 1 trustworthy for user H and is expected to improve the relationship between the two.

Second, the sense-of-agency estimation model 1 can explain to user H that it understands user H's behavior. This can increase user H's empathy toward the model 1. Third, the sense-of-agency estimation model 1 plans and estimates actions in collaboration with user H. This can also raise user H's sense-of-agency level itself, which is what the present invention estimates.

Referring again to FIG. 1, the action determination unit 152 of the action model 15 uses the optimal policy π* generated by the action planning unit 151 to determine and output the action a to be performed for the observed state s. Specifically, it executes π(s) = a. The action determined here is an action of one of the following: the sense-of-agency estimation model 1 (when the model 1 is in control as an autonomous AI), user H (when user H is mainly in control), or both the model 1 and user H (when the two are in control jointly).

When the determined action is an action of the sense-of-agency estimation model 1, the action a output from the action determination unit 152 acts on the environmental world, including user H, via a predetermined interface (IF), for example by driving an actuator, showing content on a display, or outputting audio from a speaker. The state s of the environmental world then changes to the state s'.

The state s' observed in the world as a result of action a is fed back to the intention model 14 in order to determine whether the intended (predicted) state was caused by the action a carried out by the action model, rather than by the action of another agent.

As an example, suppose user H is the driver of a car traveling on road X, and the environmental world is the road traffic situation, including user H's car, together with the environment around the road network. A determined action such as "notify the driver to detour to road Y" or "detour to road Y" then produces a new state such as "traveling on road Y toward destination W" or "the traffic on road Y, on which the car is traveling, is not congested". As a result, user H's sense-of-agency level, which was initially "low", changes to, for example, "high".
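The flow from observed state to action to new state in the driving example can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the state and action labels, the lookup tables `POLICY` and `TRANSITIONS`, and the functions `decide_action` and `step` are hypothetical names, and the environmental world is assumed to be deterministic.

```python
from typing import Dict, Tuple

State = str
Action = str

# Hypothetical optimal policy pi*: observed state -> action.
POLICY: Dict[State, Action] = {
    "driving_on_X_congested": "notify_detour_to_Y",
    "driving_on_Y_clear": "keep_going",
}

# Hypothetical deterministic world model: (state s, action a) -> new state s'.
TRANSITIONS: Dict[Tuple[State, Action], State] = {
    ("driving_on_X_congested", "notify_detour_to_Y"): "driving_on_Y_clear",
    ("driving_on_Y_clear", "keep_going"): "driving_on_Y_clear",
}


def decide_action(policy: Dict[State, Action], state: State) -> Action:
    """Return a = pi(s) for the observed state s."""
    return policy[state]


def step(state: State) -> Tuple[Action, State]:
    """Decide an action for s and return it together with the new state s'."""
    action = decide_action(POLICY, state)
    new_state = TRANSITIONS[(state, action)]
    return action, new_state
```

Calling `step("driving_on_X_congested")` returns the detour notification and the uncongested state on road Y; in the described system, that new state s' would then be fed back to the intention model 14.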

<Alternative state generation/evaluation model>
Also in FIG. 1, the alternative state generation/evaluation model 16 addresses the problem that a new policy must be generated and evaluated in situations that have not been seen before, are not predicted, or occur only rarely. It can also be regarded as a model that automatically generates intervention content for "interventions" that encourage behavior change.

Specifically, the alternative state generation/evaluation model 16 includes a state generator (162) that generates an alternative state s^al as a possible new state, and an evaluator (164) that evaluates the generated alternative state s^al. The model is built on the known actor-critic framework (Aras Dargazany, arXiv:2004.04574 Artificial Intelligence [cs.AI], 2020; Zhewei Huang et al., arXiv:1903.04411 Computer Vision and Pattern Recognition [cs.CV], 2019). This actor-critic framework is constructed using generative adversarial network (GAN) and deep reinforcement learning (DRL) algorithms.

However, the known actor-critic framework proceeds on the premise that the AI has accurately modeled the current environmental world. The alternative state generation/evaluation model 16 instead draws on knowledge from various previously learned contexts and on (an accumulation of) findings from the mental models (CBNs) of a plurality of mutually different users. With these it learns various situations that user H could not predict or foresee but that can nevertheless occur, and uses them to solve the problem.

Also in FIG. 1, the CBN aggregate 161 of the alternative state generation/evaluation model 16 is the aggregate CBN_∪ (= ∪_i CBN_Hi) of CBN_1, CBN_2, ..., CBN_p, which are the intention models (mental models, causal relationship information) of a plurality of mutually different users H_1, H_2, ..., H_p. In other words, it is integrated causal relationship information. At least some of CBN_1, CBN_2, ..., CBN_p may instead correspond to differences in context rather than differences between people, for example as entities corresponding to mutually different contexts.

Specifically, given, for example, CBN_H1 = (DAG<Y_H1, E_H1>, Prob_H1) and CBN_H2 = (DAG<Y_H2, E_H2>, Prob_H2), the aggregate CBN_∪ can be expressed by the following equation:

$$\mathrm{CBN}_{\cup} = \left(\mathrm{DAG}_{\cup}\langle Y_{H1} \cup Y_{H2},\; E_{H1} \cup E_{H2}\rangle,\; \mathrm{Prob}_{H1} \cup \mathrm{Prob}_{H2}\right) \tag{3}$$

Here, DAG<Y, E> denotes a directed acyclic graph composed of nodes (variables) Y and edges E, and Prob is the transition probability corresponding to each edge E.
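The union in Equation (3) can be sketched in code as follows. This is a hedged illustration rather than the patented procedure: the `CBN` class, `merge_cbn`, and `is_acyclic` are hypothetical names. The text does not say what happens when two users assign different probabilities to the same edge, or when the union of two acyclic graphs contains a cycle. The sketch assumes that conflicting probabilities are averaged and that acyclicity is checked separately.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


@dataclass
class CBN:
    nodes: Set[str]          # Y: variables
    edges: Set[Edge]         # E: directed edges
    prob: Dict[Edge, float]  # Prob: transition probability per edge


def merge_cbn(a: CBN, b: CBN) -> CBN:
    """Union of nodes, edges, and edge probabilities, as in Equation (3)."""
    prob = dict(a.prob)
    for edge, p in b.prob.items():
        # Assumption (not in the source): average conflicting probabilities.
        prob[edge] = (prob[edge] + p) / 2 if edge in prob else p
    return CBN(a.nodes | b.nodes, a.edges | b.edges, prob)


def is_acyclic(cbn: CBN) -> bool:
    """Kahn's algorithm: True if every node can be topologically ordered."""
    indegree = {n: 0 for n in cbn.nodes}
    for _, dst in cbn.edges:
        indegree[dst] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for src, dst in cbn.edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return visited == len(cbn.nodes)


# Hypothetical mental models of two users.
h1 = CBN({"congestion", "detour"}, {("congestion", "detour")},
         {("congestion", "detour"): 0.8})
h2 = CBN({"detour", "arrival"}, {("detour", "arrival")},
         {("detour", "arrival"): 0.6})
merged = merge_cbn(h1, h2)
```

The acyclicity check matters because the union of two DAGs is not guaranteed to be a DAG; how the embodiment handles that case is not stated in the text.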

Also in FIG. 1, the state generator 162 of the alternative state generation/evaluation model 16
(a) receives the observed state s and the corresponding action a output (from the action determination unit 152), and,
(b) based on the aggregate CBN_∪ of intention models (integrated causal relationship information) received from the CBN aggregate 161,
generates and outputs an alternative state s^al as a possible state candidate.

Here, the alternative state s^al is a new state that user H did not predict or anticipate, or did not consider possible. It is used, for example, to determine a more suitable alternative solution action for an unknown problem that has not been seen in the past.

The state generator 162 corresponds to the generator part of a known generative adversarial network (GAN). However, rather than generating fake data, as is conventional, it generates alternative states s^al from which new strategies for solving new problems can be produced.

Also in FIG. 1, the discriminator 163 of the alternative state generation/evaluation model 16 determines whether the alternative state s^al generated by the state generator 162 is so fictitious that it could not occur in user H's context. In other words, the discriminator 163 assesses whether the newly generated alternative state s^al can be useful for solving a real problem, and drives the training of the state generator 162 so that the alternative states it generates are of that kind.

Specifically, in this embodiment the discriminator 163 is configured as a deep neural network (DNN) including a plurality of fully connected layers. From the new state s' caused by action a (from the action determination unit 152) and the alternative state s^al (generated by the state generator 162), it generates a loss ALoss representing the difference from the desired state sd. Using this loss ALoss, it (a) has the state generator 162 train and (b) trains itself (the discriminator 163).

In a typical generative adversarial network (GAN), the adversarial loss that is computed is the degree of difference between the current state s' and the generated state sg, namely

$$\max_{\psi}\left(\mathrm{Ex}_{s \sim \mu}\left[\psi(s')\right] - \mathrm{Ex}_{sg \sim ug}\left[\psi(sg)\right]\right)$$

Here, Ex denotes the expected value, and μ and ug are the sample probability distributions of the current state and of the generated state, respectively.
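For reference, the conventional adversarial loss above can be estimated from samples as in the following sketch. It is an illustration under stated assumptions only: states are reduced to single floats, the critic ψ is drawn from a small finite family of hypothetical functions instead of being a trained neural network, and `critic_gap` and `adversarial_loss` are names introduced here.

```python
from statistics import fmean
from typing import Callable, Sequence

Critic = Callable[[float], float]


def critic_gap(critic: Critic,
               current_states: Sequence[float],
               generated_states: Sequence[float]) -> float:
    """Sample estimate of Ex[psi(s')] - Ex[psi(sg)] for one fixed critic psi."""
    current_mean = fmean([critic(s) for s in current_states])
    generated_mean = fmean([critic(s) for s in generated_states])
    return current_mean - generated_mean


def adversarial_loss(critics: Sequence[Critic],
                     current_states: Sequence[float],
                     generated_states: Sequence[float]) -> float:
    """Maximum of the gap over the candidate critics (stand-in for max over psi)."""
    return max(critic_gap(c, current_states, generated_states) for c in critics)


# Hypothetical candidate critics and samples.
candidates = [lambda s: s, lambda s: -s]
loss = adversarial_loss(candidates, [1.0, 3.0], [0.0, 1.0])
```

With these samples the identity critic gives a gap of 2.0 - 0.5 = 1.5 and the negated critic gives -1.5, so the maximum is 1.5. A larger value means the critic separates current states from generated states more easily.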

In contrast, the discriminator 163 of this embodiment outputs the scalar adversarial loss ALoss itself. Unlike the conventional approach, given
(a) ALoss(s'): the loss between the current state s' and the desired state sd, and
(b) AL(s^al): the loss between the alternative state s^al and the desired state sd,
its aim is to select the optimal error between ALoss(s') and AL(s^al).
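One possible reading of this selection step is sketched below. The text does not specify how the losses are computed or what "optimal" means here, so everything in the sketch is an assumption: states are fixed-length numeric vectors, the loss is a squared Euclidean distance to the desired state sd (the embodiment uses a DNN instead), and the smaller of the two losses is selected. The names `squared_distance` and `select_loss` are hypothetical.

```python
from typing import Sequence, Tuple


def squared_distance(state: Sequence[float], desired: Sequence[float]) -> float:
    """Assumed loss: squared Euclidean distance between a state and sd."""
    if len(state) != len(desired):
        raise ValueError("state and desired state must have the same length")
    return sum((x - y) ** 2 for x, y in zip(state, desired))


def select_loss(new_state: Sequence[float],
                alt_state: Sequence[float],
                desired: Sequence[float]) -> Tuple[str, float]:
    """Return which state is closer to sd, together with its loss."""
    loss_new = squared_distance(new_state, desired)  # plays the role of ALoss(s')
    loss_alt = squared_distance(alt_state, desired)  # plays the role of AL(s^al)
    if loss_alt < loss_new:
        return "s_al", loss_alt
    return "s_new", loss_new


choice = select_loss([0.0, 0.0], [1.0, 1.0], [1.0, 2.0])
```

In this toy case the alternative state is closer to the desired state (loss 1.0 versus 5.0), so it is the one selected.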

Also in FIG. 1, the evaluator 164 of the alternative state generation/evaluation model 16 generates a predicted reward, which is the reward corresponding to the alternative state s^al generated by the state generator 162 (trained with the adversarial loss ALoss). This predicted reward no longer includes the reward computed from the action a output from the action model 15 (its action determination unit 152).

The evaluator 164 also uses this predicted reward to have the action model 15 (its action determination unit 152) train its action determination. This makes it possible to update the action determination process in the action model 15 (its action determination unit 152) so that it can also be applied to situations that have not been seen before, are not predicted, or occur only rarely.

Specifically, the predicted reward at time t can be the Q-learning value function Q(s^al_t) of reinforcement learning. This value function is updated as a discounted reward according to the following equation:

$$Q(s^{al}_{t}) = r(s^{al}_{t}, a_{t}) + \gamma\, Q(s^{al}_{t+1}) \tag{4}$$

Here, r(s^al_t, a_t) is the reward for taking action a_t in state s^al_t.

Next, in this embodiment, the action determination unit 152 of the action model 15 uses this predicted reward (Q-learning value function) Q(s^al_t) to train π*(s), from which the action a is derived, so as to maximize r(s^al_t, π*(s^al_t)) + Q(ζ(s^al_t, π*(s^al_t))). Here, the transition function ζ(s^al_t, π*(s^al_t)) yields the alternative state s^al_{t+1} at time t+1. Such an alternative state s^al_{t+1} can also be regarded as the answer presented by the sense-of-agency estimation model 1 when user H is in a predicament, for example when the limits of user H's abilities make it impossible to answer the problem presented. Presenting such an answer to user H also helps increase user H's trust in the sense-of-agency estimation model (behavior change promotion model) 1.
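The objective r(s, π*(s)) + Q(ζ(s, π*(s))) can be illustrated with a tabular stand-in in which the policy simply picks, for a given state, the action that maximizes that sum. This is a sketch under assumptions and not the training procedure of the embodiment: `greedy_action` and the small reward, transition, and value tables below are hypothetical.

```python
from typing import Callable, Dict, Sequence


def greedy_action(state: str,
                  actions: Sequence[str],
                  reward: Callable[[str, str], float],
                  transition: Callable[[str, str], str],
                  q: Dict[str, float]) -> str:
    """Pick the action a that maximizes r(s, a) + Q(zeta(s, a))."""
    return max(actions, key=lambda a: reward(state, a) + q[transition(state, a)])


# Hypothetical tables for a two-action traffic example.
REWARD = {("jam", "stay"): 0.0, ("jam", "detour"): -1.0}
NEXT_STATE = {("jam", "stay"): "jam", ("jam", "detour"): "clear"}
Q_TABLE = {"jam": 0.0, "clear": 5.0}

best = greedy_action(
    "jam",
    ["stay", "detour"],
    reward=lambda s, a: REWARD[(s, a)],
    transition=lambda s, a: NEXT_STATE[(s, a)],
    q=Q_TABLE,
)
```

Detouring costs 1.0 immediately but leads to a state valued at 5.0, for a total of 4.0, whereas staying totals 0.0, so the detour is chosen.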

<Sense-of-agency model>
Also in FIG. 1, the sense-of-agency model 17 of this embodiment estimates and outputs real-time fluctuations in the sense-of-agency (SoA) level, which depends on dynamic context, as a behavioral phenotype.

In one past psychological experiment (Tapal, A. et al., Frontiers in Psychology, 8, Article 1552, 2017, <https://doi.org/10.3389/fpsyg.2017.01552>), the sense of agency (SoA) was measured directly through self-reports, in which individuals directly rated their own sense of agency regarding a specific event.

There are also past examples of direct measurement of the sense of agency that use the perceptual difference between the subject's own measurement and an external measurement of fluctuations in the sense of agency. However, every such technique requires the person being measured to respond directly about their sense of agency, and therefore forces them to interrupt their activity intermittently. These techniques have thus been difficult to apply, especially in situations where the sense-of-agency level fluctuates greatly.

In contrast, the inventors of the present application found (as a working hypothesis that holds up well) that changes in the sense-of-agency level soa appear clearly in temporal changes in the physiological indices (for example, heart rate and respiration rate), posture, gestures, and speech prosody indices (for example, tone, accent, intonation, speech rate, speech pitch, and amount of speech) of user H (the person being measured). These measurements have conventionally been used effectively to estimate affective states (including emotions and moods).

In the field of affective computing, research is being actively pursued on using AI to recognize a subject's emotions, and to explore adaptive responses in emotion, from behavioral and attitudinal phenotype information obtained by wearable and environmental sensors. Several studies have also shown that a person's sense of agency and emotions constantly interact in daily life (for example, Matthis Synofzik et al., Front. Psychol., 4(127), 2013 <https://doi.org/10.3389/fpsyg.2013.00127>, and Antje Gentsch and Matthis Synofzik, Front. Hum. Neurosci., 8:608, 2014 <https://doi.org/10.3389/fnhum.2014.00608>).

Research findings have also been reported (Julia F Christensen et al., Exp Brain Res. 237(5), 1205-1212, 2019 <https://doi.org/10.1007/s00221-018-5461-6>) that the sense of agency can be modulated by affective factors, for example whether the expected emotional outcome of an action is positive or negative, whether the motivation for the action is high or low, and whether the action is performed in a friendly or a hostile environment.

To formalize the findings described above, the sense-of-agency model 17 defines a sense-of-agency recognition function Ω: ρ1×ρ2×ρ3×ρ4×... → S that outputs the current sense-of-agency level soa. Here, ρ1, ρ2, ... are feature parameters representing the physiological indices (for example, heart rate and respiration rate), posture, gestures, and speech prosody indices (for example, tone, accent, intonation, speech rate, speech pitch, and amount of speech) of user H (the person being measured).

With the recognition function Ω defined in this way, the sense-of-agency model 17 is specifically configured as a deep neural network (DNN) that receives
(a) the new state s' caused by the action a output from the action determination unit 152, and
(b) feature quantities relating to predetermined features ρ of user H under this new state s' (that is, as affected by the new state s'),
and, based on the new state s' and the feature quantities ρ, determines or updates and outputs user H's sense-of-agency level (soa). It also causes the sense-of-agency level soa used in the belief model 11 to be updated to the determined or updated sense-of-agency level soa'. This makes it possible, for example, to estimate or anticipate the causes and characteristics of interruptions or breaks in the sense of agency.
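The data flow of inputs (a) and (b) into the recognition function Ω, followed by the update of the belief model, can be sketched as follows. This is an illustration under stated assumptions: a single logistic unit stands in for the deep neural network, the sense-of-agency level is assumed to be a number between 0 and 1, and `BeliefModel`, `estimate_soa`, `update_belief`, and the weights are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BeliefModel:
    soa: float  # sense-of-agency level currently used by the belief model


def estimate_soa(new_state: Sequence[float],
                 features: Sequence[float],
                 weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Stand-in for Omega: map (s', rho_1, rho_2, ...) to a level in (0, 1)."""
    inputs = list(new_state) + list(features)
    if len(inputs) != len(weights):
        raise ValueError("one weight is required per input value")
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def update_belief(belief: BeliefModel,
                  new_state: Sequence[float],
                  features: Sequence[float],
                  weights: Sequence[float]) -> float:
    """Compute soa' and overwrite the level used by the belief model."""
    belief.soa = estimate_soa(new_state, features, weights)
    return belief.soa


belief = BeliefModel(soa=0.1)
# Hypothetical inputs: one state value, then heart rate and speech rate features.
new_level = update_belief(belief, [1.0], [0.5, 0.5], weights=[1.0, 1.0, 1.0])
```

In an actual embodiment the weights would be learned and the feature quantities would come from sensors; the sketch fixes only the order of operations: estimate soa' from s' and ρ, then push it into the belief model.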

The flow in which the sense-of-agency level soa' output from the sense-of-agency model 17 updates the "belief information" of the belief model 11, and the updated belief information in turn updates the "causal relationship information" (CBN configuration information) of the intention model 14, embodies the established psychological theory that perceived control (the conviction or belief that one is controlling the perceived object), which is tied to the sense of agency, influences intention.

Here, the sense-of-agency model 17 preferably also determines or updates and outputs user H's sense-of-agency level (soa) based on
(c) a self-report soa_r on user H's sense of agency, received from the value matching model 13.
This processing applies when the value matching model 13, as part of the CIRL (cooperative inverse reinforcement learning) processing flow, has queried user H about user H's sense-of-agency level (when the recognized sense of agency is in doubt) and has obtained the self-report soa_r as a result.
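How the self-report soa_r might enter the estimate is sketched below. The text states only that the level is also based on soa_r when it is available. The convex blend, the `trust` parameter, and the function name `fuse_with_self_report` are assumptions introduced for illustration.

```python
from typing import Optional


def fuse_with_self_report(estimated: float,
                          self_report: Optional[float] = None,
                          trust: float = 0.5) -> float:
    """Blend the model's estimate with soa_r when a self-report was obtained."""
    if self_report is None:
        # No query was made, so the sensor-based estimate is used as is.
        return estimated
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie between 0 and 1")
    # Assumed combination rule: weighted average of estimate and self-report.
    return (1.0 - trust) * estimated + trust * self_report


without_report = fuse_with_self_report(0.2)
with_report = fuse_with_self_report(0.2, self_report=0.8, trust=0.5)
```

Setting `trust` to 1.0 would let the self-report override the estimate entirely, which may be a reasonable choice when the model itself flagged its estimate as doubtful.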

The foregoing has described how the sense-of-agency model 17 generates and updates the sense-of-agency level soa. With this, the sense-of-agency estimation model 1 can, for example, explain to user H how reliably it captures and understands the sense-of-agency level that user H is experiencing, and can thereby further demonstrate its empathy and trustworthiness to user H. On that basis, the model can also convince user H that the presented policy is expected to raise, not lower, user H's sense-of-agency level, and can thus provide user H with wise and sensible responses.

[Sense-of-agency estimation device, sense-of-agency estimation program]
Returning to FIG. 1, the following describes the sense-of-agency estimation device 9, which incorporates the sense-of-agency estimation model 1 described above and estimates the sense of agency of user H, the user to be estimated. With the same configuration, this device can also serve as a behavior change promotion device incorporating the behavior change promotion model (sense-of-agency estimation model) 1.

The sense-of-agency estimation device 9 of this embodiment, shown at the lower left of FIG. 1, uses the incorporated sense-of-agency estimation model 1 to estimate the sense of agency of user H, the estimation target, from observed states of the environmental world. Specifically, it determines user H's sense-of-agency level.

Specifically, in FIG. 1, the user interface (IF) 91 of the sense-of-agency estimation device 9 corresponds to the interface (IF) of the value matching model 13, the interface (IF) of the action planning unit 151, and the interface (IF) of the action model 15 (action determination unit 152). It takes in measurement results relating to user H, inputs and outputs information relating to various communications with user H, and applies determined actions to the environmental world. It also serves as an input unit that collects information needed to train the sense-of-agency estimation model 1, such as the state of the environmental world, and information that serves as the conditions for sense-of-agency estimation.

The training unit 92 generates training data from the received information needed to train the sense-of-agency estimation model 1, and uses the data to train the model.

The sense-of-agency estimation unit 93 determines user H's sense-of-agency level using the trained sense-of-agency estimation model 1, based on the received information that serves as the conditions for estimation. In this embodiment, user H's real-time sense-of-agency level and its dynamic fluctuations can be determined and output even in a complex or constantly changing environmental world.

The output unit 94 transmits information on the determined sense-of-agency level to an external information processing device (if the device has a communication function) or displays it (if the device has a display function).

Here, the training unit 92 and the sense-of-agency estimation unit 93 are the main functional components that carry out an embodiment of the sense-of-agency estimation method according to the present invention. They can also be regarded as functions of a processor and memory storing an embodiment of the sense-of-agency estimation program according to the present invention. Accordingly, the sense-of-agency estimation device 9 may be a dedicated device, but it may also be, for example, a cloud server, a non-cloud server device, a personal computer (PC), a notebook or tablet computer, a smartphone, or a wearable computer on which the program is installed.
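The division of roles among the user interface 91, training unit 92, sense-of-agency estimation unit 93, and output unit 94 can be sketched as a single class. This is a structural illustration only: the class `SoAEstimationDevice`, its method names, and the trivial "model" that predicts the mean of the collected labels are assumptions, standing in for the trained sense-of-agency estimation model 1.

```python
from dataclasses import dataclass, field
from statistics import fmean
from typing import Callable, List, Optional, Sequence, Tuple

Observation = Sequence[float]
Model = Callable[[Observation], float]


@dataclass
class SoAEstimationDevice:
    samples: List[Tuple[Observation, float]] = field(default_factory=list)
    model: Optional[Model] = None

    def collect(self, observation: Observation, soa_label: float) -> None:
        """Role of user interface 91: gather information needed for training."""
        self.samples.append((observation, soa_label))

    def train(self) -> None:
        """Role of training unit 92: build training data and fit the model."""
        if not self.samples:
            raise ValueError("no training data collected")
        mean_level = fmean([label for _, label in self.samples])
        # Placeholder model: always predicts the mean collected level.
        self.model = lambda observation: mean_level

    def estimate(self, observation: Observation) -> float:
        """Role of estimation unit 93: determine the level with the trained model."""
        if self.model is None:
            raise RuntimeError("model has not been trained")
        return self.model(observation)

    def output(self, level: float) -> str:
        """Role of output unit 94: format the level for display or transmission."""
        return f"SoA level: {level:.2f}"


device = SoAEstimationDevice()
device.collect([70.0], 0.2)  # hypothetical heart-rate observation and label
device.collect([90.0], 0.8)
device.train()
message = device.output(device.estimate([80.0]))
```

Keeping training and estimation as separate methods mirrors the description, in which the same device can run on a server, a PC, a smartphone, or a wearable computer once the program is installed.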

As described in detail above, the present invention makes it possible to estimate a user's sense of agency, which has conventionally been difficult to estimate (in particular, it could not be estimated in a complex and constantly changing environmental world). Furthermore, by determining actions toward the environmental world while taking the appropriately updated sense of agency into account, it becomes possible to encourage the user's behavior change, that is, a change in intention or behavior.

For example, with an appropriate embodiment, suitable suggestions and persuasion can be made so that the user feels a connection or linkage between an action the user took of their own will (after receiving a suggestion or persuasion from the sense-of-agency estimation model according to the present invention) and the resulting state of the environmental world. In other words, suggestions and persuasion can be made in a way that raises the user's sense of agency rather than lowering it.

Because the present invention provides the effects described above, it is also expected to contribute greatly to improving and developing the mutual understanding and collaborative activities between humans and AI that are likely to arise in many settings in the future.

Furthermore, in order to provide children with high-quality education, that is, education that respects their autonomy and motivation to study, the sense-of-agency estimation model or behavior change promotion model according to the present invention can be used to carry out educational actions, including suggestions and guidance, that maintain and raise children's sense of agency. The present invention can thus contribute to Goal 4 of the UN-led Sustainable Development Goals (SDGs), "Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all."

Similarly, in order to provide adults with decent work (rewarding and humane work), the sense-of-agency estimation model or behavior change promotion model according to the present invention can be used to give advice on finding work or on the job that maintains and raises adults' sense of agency, and to encourage appropriate changes in their work behavior. The present invention can thus contribute to Goal 8 of the UN-led SDGs, "Promote inclusive and sustainable economic growth, employment and decent work for all."

Furthermore, for drivers of cars traveling in urban areas, the sense-of-agency estimation model or behavior change promotion model according to the present invention can be used to provide navigation that does not lower the drivers' sense of agency, that is, navigation the drivers find easy to accept, so that they achieve their objectives reliably and smoothly. The present invention can thus contribute to Goal 11 of the UN-led SDGs, "Make cities inclusive, safe, resilient and sustainable."

In addition, in order to offer consumers sustainable consumption and lifestyles, the sense-of-agency estimation model or behavior change promotion model according to the present invention can be used to give advice and suggestions on consumption behavior that do not lower consumers' sense of agency, that is, advice and suggestions consumers find easy to accept. The present invention can thus contribute to Goal 12 of the UN-led SDGs, "Ensure sustainable consumption and production patterns."

Those skilled in the art can readily make various changes, modifications, and omissions to the embodiments of the present invention described above within the technical idea and scope of the invention. The foregoing description is given by way of example only and is not intended to be limiting. The present invention is limited only by the claims and their equivalents.

1 Sense-of-agency estimation model (behavior change promotion model)
11 Belief model
12 Desire model
13 Value matching model
14 Intention model
15 Action model
151 Action planning unit
152 Action determination unit
16 Alternative state generation/evaluation model
161 CBN (Causal Bayesian Network) aggregate
162 State generator
163 Discriminator
164 Evaluator
17 Sense-of-agency model
9 Sense-of-agency estimation device
91 User interface (user IF)
92 Training unit
93 Sense-of-agency estimation unit
94 Output unit

Claims

1. A sense of subjectivity estimation model that causes a computer to function so as to estimate a sense of subjectivity of a user while determining, using a reward, an action with respect to a state of an environment including the user, the model being characterized by causing the computer to function as:
a belief model that generates or updates belief information, which is information including the probability that a new state will occur as a result of performing a certain action in a certain state under a certain sense of subjectivity level of the user;
a desire model that receives value information relating to value for the user and the corresponding reward, generates a desired state desired by the user, and determines a policy that is a set of actions capable of bringing about the desired state;
an intention model that generates causal relationship information relating to causal relationships among states, values, rewards, and actions based on the belief information, the value information, and the reward;
an action model that determines and outputs an action to be taken for an observed state based on the policy and the causal relationship information; and
a sense of subjectivity model that determines or updates, and outputs, the sense of subjectivity level of the user based on the new state caused by the output action and a feature amount relating to a predetermined feature of the user under the new state, and that updates the sense of subjectivity level used in the belief model to the determined or updated sense of subjectivity level.
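The feedback between the belief model and the sense of subjectivity model recited above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the specification: the count-based representation of the belief information, the names `BeliefModel`, `update_level`, and `discretize`, the update rule with its `rate` parameter, the two-level discretization, and the toy states and feature amounts.

```python
from collections import defaultdict


class BeliefModel:
    """Count-based belief information (hypothetical representation).

    Estimates P(new_state | level, state, action), where `level` is a
    discretized sense of subjectivity level.
    """

    def __init__(self):
        self._counts = defaultdict(lambda: defaultdict(int))

    def update(self, level, state, action, new_state):
        """Record one observed transition under the given level."""
        self._counts[(level, state, action)][new_state] += 1

    def probability(self, level, state, action, new_state):
        """Return the relative frequency of new_state, or 0.0 if unseen."""
        outcomes = self._counts.get((level, state, action))
        if not outcomes:
            return 0.0
        return outcomes.get(new_state, 0) / sum(outcomes.values())


def update_level(level, new_state, desired_state, feature, rate=0.5):
    """Hypothetical update rule, not the rule of the specification.

    Moves the level toward the feature amount (assumed to lie in [0, 1]);
    the feature amount is halved when the new state is not the desired one.
    """
    target = feature if new_state == desired_state else 0.5 * feature
    return (1 - rate) * level + rate * target


def discretize(level, threshold=0.5):
    """Map a continuous level to the label used as a key in the belief model."""
    return "high" if level >= threshold else "low"


# Toy episode: the same action is output three times from the same state.
belief = BeliefModel()
level = 0.2
for new_state, feature in [("walked", 1.0), ("walked", 0.8), ("rested", 0.4)]:
    # The belief information is updated under the level currently in use.
    belief.update(discretize(level), "idle", "suggest_walk", new_state)
    # The updated level is the one the belief model uses in the next round.
    level = update_level(level, new_state, "walked", feature)
```

In this sketch the level written back by `update_level` selects which conditional distribution the belief model updates next, which mirrors the last element of the claim; the desire, intention, and action models are omitted.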

2. The sense of subjectivity estimation model according to claim 1, wherein the action model comprises:
an action planning unit that generates an optimal policy based on the policy, the causal relationship information, and the content of communication with the user, including a predetermined question; and
an action determination unit that determines and outputs an action to be taken for an observed state using the generated optimal policy.

3. The sense of subjectivity estimation model according to claim 1 or 2, further causing the computer to function as a value matching model that receives, from the user, information relating to the value information and information relating to the reward corresponding to it, generates or updates the value information and the corresponding reward based on the received information, and outputs them to the desire model.

4. The sense of subjectivity estimation model according to claim 3, wherein the value matching model is constructed using an algorithm related to Cooperative Inverse Reinforcement Learning (CIRL).

5. The sense of subjectivity estimation model according to any one of claims 1 to 4, further causing the computer to function as an alternative state generation/evaluation model comprising:
a state generator that receives an observed state and the corresponding output action, and generates and outputs an alternative state as a possible state candidate based on integrated causal relationship information that integrates at least the causal relationship information for each of a plurality of users;
a discriminator that generates, from the new state caused by the action and the alternative state, a loss representing the difference from the desired state, trains the state generator with the loss, and also trains itself with the loss; and
an evaluator that generates a predicted reward, which is a reward corresponding to the alternative state generated by the trained state generator, and uses the predicted reward to train the action model to determine actions.

6. The sense of subjectivity estimation model according to claim 5, wherein the alternative state generation/evaluation model is constructed using an algorithm related to Generative Adversarial Networks (GAN).

7. The sense of subjectivity estimation model according to any one of claims 1 to 6, wherein the belief model is constructed using an algorithm related to a Partially Observable Markov Decision Process (POMDP).
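For reference, the standard belief-state update of a POMDP, b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s), can be sketched in Python as follows. This is the textbook update rather than an algorithm confirmed by the specification; the function name `belief_update`, the dictionary layout of `T` and `O`, and the toy states, action, observations, and probabilities are all assumptions made for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Standard POMDP belief update (illustrative sketch).

    belief: dict mapping state -> probability
    T: dict mapping (state, action) -> {next_state: probability}
    O: dict mapping (next_state, action) -> {observation: probability}
    """
    states = list(belief)
    unnormalized = {}
    for s2 in states:
        # Prediction step: probability of reaching s2 from the current belief.
        predicted = sum(T[(s, action)].get(s2, 0.0) * belief[s] for s in states)
        # Correction step: weight by the likelihood of the observation in s2.
        unnormalized[s2] = O[(s2, action)].get(observation, 0.0) * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state example (all numbers are made up).
T = {
    ("sedentary", "prompt"): {"sedentary": 0.6, "active": 0.4},
    ("active", "prompt"): {"sedentary": 0.1, "active": 0.9},
}
O = {
    ("sedentary", "prompt"): {"steps_high": 0.3, "steps_low": 0.7},
    ("active", "prompt"): {"steps_high": 0.8, "steps_low": 0.2},
}
prior = {"sedentary": 0.5, "active": 0.5}
posterior = belief_update(prior, "prompt", "steps_high", T, O)
```

Conditioning the transition table `T` additionally on the sense of subjectivity level, as the belief information of claim 1 requires, would amount to keeping one such table per level; that extension is left out of this sketch.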

8. The sense of subjectivity estimation model according to any one of claims 1 to 7, wherein the causal relationship information is information related to a Bayesian network algorithm.

9. A sense of subjectivity estimating apparatus that uses the sense of subjectivity estimation model according to any one of claims 1 to 8 to estimate a sense of subjectivity of the user from an observed state in the environment.

10. A sense of subjectivity estimation method performed by a computer that estimates a sense of subjectivity of a user while determining, using a reward, an action with respect to a state of an environment including the user, the method comprising:
a step of generating or updating belief information, which is information including the probability that a new state will occur as a result of performing a certain action in a certain state under a certain sense of subjectivity level of the user;
a step of receiving value information relating to value for the user and the corresponding reward, generating a desired state desired by the user, and determining a policy that is a set of actions capable of bringing about the desired state;
a step of generating causal relationship information relating to causal relationships among states, values, rewards, and actions based on the belief information, the value information, and the reward;
a step of determining and outputting an action to be taken for an observed state based on the policy and the causal relationship information; and
a step of determining or updating, and outputting, the sense of subjectivity level of the user based on the new state caused by the output action and a feature amount relating to a predetermined feature of the user under the new state, and updating the sense of subjectivity level used in the step of generating or updating the belief information to the determined or updated sense of subjectivity level.

11. A behavior change promotion model that causes a computer to function so as to promote behavior change of a user while determining, using a reward, an action with respect to a state of an environment including the user, the model being characterized by causing the computer to function as:
a belief model that generates or updates belief information, which is information including the probability that a new state will occur as a result of performing a certain action in a certain state under a certain sense of subjectivity level of the user;
a desire model that receives value information relating to value for the user and the corresponding reward, generates a desired state desired by the user, and determines a policy that is a set of actions capable of bringing about the desired state;
an intention model that generates causal relationship information relating to causal relationships among states, values, rewards, and actions based on the belief information, the value information, and the reward;
an action model that generates an optimal policy based on the policy, the causal relationship information, and the content of communication with the user, including a predetermined question, and that uses the optimal policy to determine and output an action to be taken for an observed state; and
a sense of subjectivity model that determines or updates the sense of subjectivity level of the user based on the new state caused by the output action and a feature amount relating to a predetermined feature of the user under the new state, and that updates the sense of subjectivity level used in the belief model to the determined or updated sense of subjectivity level.