JP7448502B2

JP7448502B2 - Sense of agency estimation model, device and method, and behavioral change promotion model

Info

Publication number: JP7448502B2
Application number: JP2021056881A
Authority: JP
Inventors: ロベルトセバスチャンレガスピ; 文臻徐; 真弥和田; 達也小西; 茂莉黒川
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2024-03-12
Anticipated expiration: 2041-03-30
Also published as: JP2022154041A

Description

The present invention relates to persuasive technology, which promotes behavioral change in people through "persuasion."

Persuasive technology is attracting attention as a means of promoting behavior change in users through "persuasion" in a broad sense, that is, by changing the users' beliefs, perceptions, and desires in a predetermined direction. For example, this technology is being actively applied in the fields of health and welfare, education, and urban transportation, as disclosed in Patent Document 1, Patent Document 2, and Patent Document 3, respectively.

In addition, persuasive technology is expected to develop further as an indispensable technology for realizing autonomous AI (Artificial Intelligence) systems, in which an AI provides a wide variety of services to users through communication with them.

Rita Orji and Karyn Moffatt, "Persuasive technology for health and wellness: State-of-the-art and emerging trends", Health Informatics J. 24(1), pp.66-91, 2018, <https://doi.org/10.1177/1460458216650979>
Yohana Dewi Lulu Widyasari et al., "Persuasive technology for enhanced learning behavior in higher education", International Journal of Educational Technology in Higher Education, 16:15, 2019, <https://doi.org/10.1186/s41239-019-0142-5>
Evangelia Anagnostopoulou et al., "Persuasive Technologies for Sustainable Mobility: State of the Art and Emerging Trends", Sustainability 2018, 10(7), pp.2128, 2018, <https://doi.org/10.3390/su10072128>

However, with conventional persuasive technology such as that described above, even when an AI "persuades" a user, it remains difficult, in an ordinary environmental world that is complex or constantly changing, to promote more appropriate behavior change in the user, specifically to encourage more appropriate decision-making and action selection.

The inventors of the present application considered that the reason for this difficulty is that, in not a few cases, users cannot feel any connection or linkage between the action they took of their own will after being "persuaded" by the AI and the state of the environmental world that appeared as a result. In other words, conventional persuasive technology takes no account of changes in the user's "sense of agency" (the feeling that one's own actions are affecting one's surroundings), and the inventors found that this makes it very difficult to perform appropriate "persuasion" that is convincing to the user and does not diminish this sense of agency.

Accordingly, an object of the present invention is to provide a sense of agency estimation model, a sense of agency estimation device, a sense of agency estimation method, and a behavior change promotion model that can estimate a user's sense of agency and, taking that sense of agency into account, encourage changes in the user's intentions or actions.

According to the present invention, there is provided a sense of agency estimation model that causes a computer to function so as to estimate a user's sense of agency while determining, by using rewards, actions with respect to the state of an environmental world that includes the user, the model causing the computer to function as:
a belief model that generates or updates belief information, which is information including the probability that a certain new state arises as a result of performing a certain action in a certain state under a certain sense of agency level of the user;
a desire model that receives value information concerning what is of value to the user and the reward corresponding to it, generates a desired state, which is the state the user desires, and determines a policy, which is a set of actions that can bring about the desired state;
an intention model that generates, based on the belief information, the value information, and the reward, causal relationship information concerning the causal relationships among states, values, rewards, and actions;
a behavior model that generates an optimal policy, which is the policy regarded as optimal, based on the policy, the causal relationship information, and the content of communication with the user including predetermined questions, and that uses the optimal policy to determine and output the action to be taken for an observed state; and
a sense of agency model that determines or updates, and outputs, the user's sense of agency level based on the new state caused by the output action and on feature values relating to predetermined characteristics of the user under the new state, and that causes the sense of agency level used in the belief model to be updated to the determined or updated sense of agency level.

Furthermore, as another embodiment of the sense of agency estimation model according to the present invention, it is also preferable that the model further causes the computer to function as a value matching model that receives, from the user, information relating to the value information and information relating to the reward corresponding to it, generates or updates the value information and the corresponding reward based on that information, and outputs them to the desire model. Here, it is also preferable that this value matching model is constructed using an algorithm related to Cooperative Inverse Reinforcement Learning (CIRL).

Furthermore, as yet another embodiment of the sense of agency estimation model according to the present invention, it is also preferable that the model further causes the computer to function as an alternative state generation/evaluation model having:
a state generator that receives an observed state and the output action corresponding to it, and generates and outputs an alternative state, as a candidate for a state that may occur, based on integrated causal relationship information obtained by integrating the causal relationship information for each of at least a plurality of users;
a discriminator that generates, from the new state caused by the output action and the alternative state, a loss representing the difference from the desired state, trains the state generator with the loss, and also trains itself with the loss; and
an evaluator that generates a predicted reward, which is the reward corresponding to the alternative state generated by the trained state generator, and uses the predicted reward to train the behavior model in determining the action.
It is also preferable that this alternative state generation/evaluation model is constructed using an algorithm related to Generative Adversarial Networks (GAN).

Furthermore, in the sense of agency estimation model according to the present invention, it is also preferable that the belief model is constructed using an algorithm related to a Partially Observable Markov Decision Process (POMDP). It is also preferable that the causal relationship information in the intention model is information related to a Bayesian network algorithm.

According to the present invention, there is also provided a sense of agency estimation device that uses the sense of agency estimation model described above to estimate the user's sense of agency from an observed state of the environmental world.

According to the present invention, there is further provided a sense of agency estimation method in which a computer estimates a user's sense of agency while determining, by using rewards, actions with respect to the state of an environmental world that includes the user, the method having:
a step of generating or updating belief information, which is information including the probability that a certain new state arises as a result of performing a certain action in a certain state under a certain sense of agency level of the user;
a step of receiving value information concerning what is of value to the user and the reward corresponding to it, generating a desired state, which is the state the user desires, and determining a policy, which is a set of actions that can bring about the desired state;
a step of generating, based on the belief information, the value information, and the reward, causal relationship information concerning the causal relationships among states, values, rewards, and actions;
a step of generating an optimal policy, which is the policy regarded as optimal, based on the policy, the causal relationship information, and the content of communication with the user including predetermined questions, and using the generated optimal policy to determine and output the action to be taken for an observed state; and
a step of determining or updating, and outputting, the user's sense of agency level based on the new state caused by the output action and on feature values relating to predetermined characteristics of the user under the new state, and causing the sense of agency level used in the step of generating or updating the belief information to be updated to the determined or updated sense of agency level.

According to the present invention, there is still further provided a behavior change promotion model that causes a computer to function so as to promote behavior change in a user while determining, by using rewards, actions with respect to the state of an environmental world that includes the user, the model causing the computer to function as:
a belief model that generates or updates belief information, which is information including the probability that a certain new state arises as a result of performing a certain action in a certain state under a certain sense of agency level of the user;
a desire model that receives value information concerning what is of value to the user and the reward corresponding to it, generates a desired state, which is the state the user desires, and determines a policy, which is a set of actions that can bring about the desired state;
an intention model that generates, based on the belief information, the value information, and the reward, causal relationship information concerning the causal relationships among states, values, rewards, and actions;
a behavior model that generates an optimal policy, which is the policy regarded as optimal, based on the policy, the causal relationship information, and the content of communication with the user including predetermined questions, and that uses the optimal policy to determine and output the action to be taken for an observed state; and
a sense of agency model that determines or updates the user's sense of agency level based on the new state caused by the output action and on feature values relating to predetermined characteristics of the user under the new state, and that causes the sense of agency level used in the belief model to be updated to the determined or updated sense of agency level.

According to the sense of agency estimation model, sense of agency estimation device, sense of agency estimation method, and behavior change promotion model of the present invention, a user's sense of agency can be estimated, and changes in the user's intentions or actions can be encouraged with that sense of agency taken into account.

FIG. 1 is a schematic diagram showing an embodiment of a sense of agency estimation model according to the present invention, and a functional block diagram showing the functional configuration of an embodiment of a sense of agency estimation device according to the present invention.
FIG. 2 is a schematic diagram for explaining the theoretical basis of an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

[Sense of agency estimation model, behavior change promotion model]
FIG. 1 is a schematic diagram showing an embodiment of a sense of agency estimation model according to the present invention, and a functional block diagram showing the functional configuration of an embodiment of a sense of agency estimation device according to the present invention.

The sense of agency estimation model 1 of this embodiment shown in FIG. 1 communicates with a user H via interfaces (IF) (provided in the value matching model 13 and the behavior model 15), acts on the environmental world including the user H in accordance with a policy formulated during that communication, and can thereby change the state of the environmental world toward a desired state.

Here, through the above communication, the sense of agency estimation model 1 acquires information relating to predetermined characteristics of the user (for example, physiological indicators such as heart rate and breathing rate) and quantifies the user H's "sense of agency" (the feeling that one's own actions are affecting one's surroundings). Furthermore, it updates this "sense of agency" as appropriate within the cycle of communicating with the user H, formulating a policy in response to the state of the environmental world, and acting. As a result, in this embodiment for example, it is also possible to determine and output the user H's real-time "sense of agency" and the dynamic fluctuations in it.

Furthermore, in this embodiment the sense of agency estimation model 1 also serves as a behavior change promotion model: even when the environmental world is complex and constantly changing, it communicates appropriately with the user H and, while maintaining or improving (not diminishing) the user H's "sense of agency," can promote behavior change in the user H, specifically by encouraging more appropriate decision-making and action selection.

Specifically, the sense of agency estimation model (behavior change promotion model) 1 is a model that estimates the user H's "sense of agency" while determining "actions" with respect to the state of the environmental world including the user H, using "rewards" calculated from the state and the action. As shown in FIG. 1, it causes a computer to function as at least:
(A) a belief model 11 that generates or updates "belief information," which is information including the probability that a certain new state arises as a result of performing a certain action in a certain state under a certain "sense of agency level" of the user H;
(B) a desire model 12 that receives "value information" concerning what is of value to the user H and the corresponding "reward," generates, based on the "value information" and the "reward," a desired state, which is the state the user H desires, and determines a "policy," which is a set of actions that can bring about the desired state;
(C) an intention model 14 that receives the "belief information," the "value information," and the "reward," and generates "causal relationship information" concerning the causal relationships among states, value information, rewards, and actions;
(D) a behavior model 15 that determines and outputs the "action" to be taken for an observed state, based on the "policy" and the "causal relationship information"; and
(E) a sense of agency model 17 that receives the "new state" caused by the output "action" and "feature values" relating to predetermined characteristics of the user H under the "new state" (for example, feature values generated from physiological indicators such as heart rate and breathing rate), determines or updates, and outputs, the user H's "sense of agency level" based on the "new state" and the "feature values," and causes the "sense of agency level" used in the belief model 11 of (A) above to be updated to the determined or updated "sense of agency level."
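The data flow among components (A) through (E) can be illustrated with a minimal Python sketch. The function names, the toy scoring rules, the `environment` callback, and the 0-to-1 scale for the sense of agency level are all assumptions made for illustration. The text above specifies only what each component receives and produces, not how it computes it.

```python
from typing import Callable, Dict, List, Tuple

# (state, action) -> {next_state: probability}
Transitions = Dict[Tuple[str, str], Dict[str, float]]

def belief_model(soa_level: float, transitions: Transitions) -> dict:
    # (A) Belief information: transition probabilities held under the current SoA level.
    return {"soa_level": soa_level, "transitions": transitions}

def desire_model(rewards: Dict[str, float], actions: List[str]) -> Tuple[str, List[str]]:
    # (B) Toy rule: the desired state is the highest-reward state;
    # the policy is simply the list of candidate actions.
    desired = max(rewards, key=rewards.get)
    return desired, list(actions)

def intention_model(belief_info: dict, rewards: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    # (C) Toy causal information: expected reward of each (state, action) pair.
    return {
        sa: sum(p * rewards.get(s_next, 0.0) for s_next, p in dist.items())
        for sa, dist in belief_info["transitions"].items()
    }

def behavior_model(policy: List[str], causal_info: dict, state: str) -> str:
    # (D) Choose the policy action with the highest expected reward in the observed state.
    return max(policy, key=lambda a: causal_info.get((state, a), float("-inf")))

def agency_model(soa_level: float, new_state: str, desired: str,
                 feature: float, rate: float = 0.2) -> float:
    # (E) Toy update: move toward the feature value (assumed 0..1) when the outcome
    # matches the desired state, and toward 0 otherwise.
    target = feature if new_state == desired else 0.0
    return (1.0 - rate) * soa_level + rate * target

def run_cycle(soa_level: float, state: str, transitions: Transitions,
              rewards: Dict[str, float],
              environment: Callable[[str, str], Tuple[str, float]]) -> Tuple[str, float]:
    belief_info = belief_model(soa_level, transitions)
    actions = sorted({a for (_, a) in transitions})
    desired, policy = desire_model(rewards, actions)
    causal_info = intention_model(belief_info, rewards)
    action = behavior_model(policy, causal_info, state)
    new_state, feature = environment(state, action)
    return action, agency_model(soa_level, new_state, desired, feature)

transitions = {
    ("tired", "rest"): {"rested": 0.8, "tired": 0.2},
    ("tired", "work"): {"rested": 0.1, "tired": 0.9},
}
rewards = {"rested": 1.0, "tired": 0.0}
environment = lambda s, a: ("rested", 0.9) if a == "rest" else ("tired", 0.3)

action, new_soa = run_cycle(0.5, "tired", transitions, rewards, environment)
```

In a repeated loop, the value returned as `new_soa` would be passed back in as `soa_level` for the next cycle, which corresponds to the sense of agency model 17 updating the level used by the belief model 11.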

In this way, the sense of agency estimation model 1 can estimate the "sense of agency," which has conventionally been difficult to estimate (and in particular could not be estimated in a complex and constantly changing environmental world). Furthermore, by determining actions on the environmental world while taking into account the user H's "sense of agency (level)," updated as appropriate, it can also promote behavior change in the user H, that is, changes in intention or action. For example, the sense of agency estimation model 1 makes it possible to offer appropriate proposals and persuasion in such a way that the user H feels a connection or linkage between the action the user H took of his or her own will (in response to a proposal or persuasion from the model) and the state of the environmental world that appeared as a result, that is, in a way that improves rather than diminishes the user H's "sense of agency."

Next, the theoretical basis of the functional configuration of the sense of agency estimation model (behavior change promotion model) 1 of this embodiment will be explained with reference to FIG. 2.

FIG. 2 is a schematic diagram for explaining the theoretical basis of an embodiment of the present invention.

In general, a user's sense of agency (SoA) fluctuates strongly under the influence of disturbing events that interrupt the smooth flow of the user's intentional actions (which are performed toward a predicted state of the environmental world). Below, FIG. 2 is used to explain how the sense of agency level changes within the chain of state, intention, action, and new state.

First, the belief-desire-intention model (Georgeff et al., 5th International Workshop, ATAL'98 Proceedings, pp.1-10, 1998) is known as a model for deriving intentions that will arise in the future. In this model, "belief" (FIG. 2) and "desire" (FIG. 2) are accumulated information about, respectively, the perceived state structure of the environmental world and the desired state structure of the environmental world.

"Intention" (FIG. 2) is determined from "belief" and "desire"; specifically, it is information that includes the actions assumed to bring about the desired state structure of the environmental world. In this way, an "intention" specifies and controls actions, and this specification and control is updated under the influence of changed "beliefs" and "desires."

Here, the sense of agency is a feeling determined by whether or not the "intention" that is to achieve the desired state structure is embodied in the "action" (FIG. 2), and it is conferred by the "action planning/selection" (FIG. 2) that follows the "intention."

The inventors of the present application integrated the above findings about "intention" with existing theories in cognitive science and neuroscience concerning the emergence and disruption of the sense of agency, and devised the basic framework shown in FIG. 2.

In cognitive science and neuroscience, where theories of motor learning and control exist, the validity of a model using a "comparator" (FIG. 2) for the cognition of action has long been debated. According to this "comparator" model, a motor drive (action) is carried out together with a prediction of the motor output, generated from an efference copy of the motor command. The "comparator" then compares this prediction with the actually measured motor output. If the two match, the motor output is recorded as having been caused by the motor drive (action) itself. If they do not match, an interruption or disruption is taken to have occurred in the "sense of agency," in the sense of being in control of the motor drive.
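The comparator logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scalar representation of the motor output, the `tolerance` threshold that decides what counts as a "match," and all names are hypothetical and do not come from the source.

```python
from dataclasses import dataclass

@dataclass
class ComparatorResult:
    error: float        # absolute difference between predicted and observed output
    self_caused: bool   # True when the outcome is attributed to one's own action

def compare(predicted: float, observed: float, tolerance: float = 0.1) -> ComparatorResult:
    """Compare the output predicted from the efference copy with the measured output.

    A match (error within the assumed tolerance) attributes the outcome to the
    agent's own action; a mismatch signals a disruption of the sense of agency.
    """
    error = abs(predicted - observed)
    return ComparatorResult(error=error, self_caused=error <= tolerance)

match = compare(predicted=1.0, observed=1.05)
mismatch = compare(predicted=1.0, observed=1.5)
```

Here `match.self_caused` is true and `mismatch.self_caused` is false, mirroring the two branches of the comparator model.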

In contrast, the theory of retrospective inference (RI) in cognitive science and neuroscience makes an "inference" (FIG. 2) that a sense of agency has arisen when the intended or predicted state matches the actual observed state. Along with "intention," it also takes other higher-order (cognitive) factors into account in this inference, such as cues from the external context and the social situation. Specifically, according to this theory, when the observed state occurs as predicted, the action proceeds smoothly and thoughts about the action and one's attitude remain at the margins of consciousness. When the observed state differs from the prediction, the brain performs retrospective inference (RI) and seeks an answer as to whether or not the action taken caused the observed state.

The specific configuration of the sense of agency estimation model (behavior change promotion model) 1, an embodiment of the present invention that implements the above theoretical framework on a computer, is described in detail below.

Incidentally, the human brain normally holds a mental model representing other people's minds and uses it to infer their mental states. The cognitive science theory of this ability, the so-called Theory of Mind (ToM), holds that because people possess this ability, they can intuitively understand how and why others behave as they do in various contexts.

The sense of agency estimation model (behavior change promotion model) 1 described below emulates, so to speak, this Theory of Mind (ToM). It attributes "belief," "desire," "intention," "action planning/selection," and "inference" to another person (the user H in this embodiment) and thereby comes to understand why that person (the user H) acts as he or she does and how that person perceives the state of the environmental world resulting from the action, in other words, what the other person's (the user H's) sense of agency is like and how it operates.

[Model configuration, sense of agency estimation method]
The functional configuration of an embodiment of the sense of agency estimation model (behavior change promotion model) 1 according to the present invention is described in more detail below. Again according to FIG. 1, the sense of agency estimation model (behavior change promotion model) 1 has the following functional components, implemented by a computer (by a program installed on it):
(i) a belief model 11, a desire model 12, a value matching model 13, and an intention model 14;
(ii) a behavior model 15 including an action planning unit 151 and an action determining unit 152;
(iii) an alternative state generation/evaluation model 16 including a CBN (Causal Bayesian Network) aggregate 161, a state generator 162, a discriminator 163, and an evaluator 164; and
(iv) a sense of agency model 17.
Each of these functional components is described specifically below.

<Belief model>
Again in FIG. 1, the belief model 11 is a model that generates or updates "belief information," which is information including the probability that a certain new state s' arises as a result of performing a certain action a in a certain state s under a certain "sense of agency level" soa of the user H. In this embodiment, the belief model 11 is constructed using an algorithm related to a Partially Observable Markov Decision Process (POMDP) (Monahan, G. E., Management Science 28(1), 1-16, 1982).

Specifically, the belief model 11 is defined as follows:
(a) the set of states s that the environmental world, including the user H and the sense of agency estimation model 1, can take is the state space S;
(b) the set of actions a that the user H and the sense of agency estimation model 1 can perform (output) is the action space A;
(c) the conditional probability that a transition to state s' occurs when action a is performed in state s is the transition probability T(s'|s, a);
(d) the cost of performing action a in state s is c = c(s, a); and
(e) the probability that the sense of agency estimation model 1 obtains observation o from the environmental world, when performing action a in state s causes a transition to state s', is the observation probability O(o|s', a).
With these definitions, the belief model 11 derives, as the "belief information" for the case where the sense of agency estimation model 1 has performed action a in state s and obtained observation o, the belief B(s'), which is the probability that the environmental world is in state s'.

More specifically, let B(s) be the belief at the previous time point, and let β = 1/Prob(o|b, a) be the normalization constant (b denoting the previous belief). The belief B(s') at the current time point (the result of updating B(s)) can then be calculated by the following equation:
(1)  B(s') = β · O(o|s', a) · Σ_{s∈S} T(s'|s, a) · B(s)
This information can also be regarded as the degree of "belief or conviction" about how likely each possible state of the environmental world is.
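The belief update of equation (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `update_belief`, the dictionary layout of `T` and `O`, and the two-state traffic example with its numerical values are all hypothetical and are introduced here only for illustration.

```python
def update_belief(belief, action, observation, T, O):
    """Apply B(s') = beta * O(o|s', a) * sum_s T(s'|s, a) * B(s).

    belief: dict state -> probability, the previous belief B(s)
    T:      dict (state, action) -> dict next_state -> probability
    O:      dict (next_state, action) -> dict observation -> probability
    """
    unnormalized = {}
    for s_next in belief:
        # Prediction step: sum over s of T(s'|s, a) * B(s)
        predicted = sum(
            T[(s, action)].get(s_next, 0.0) * b for s, b in belief.items()
        )
        # Correction step: weight by the observation probability O(o|s', a)
        unnormalized[s_next] = O[(s_next, action)].get(observation, 0.0) * predicted

    prob_obs = sum(unnormalized.values())  # Prob(o | b, a)
    if prob_obs == 0.0:
        raise ValueError("observation has zero probability under this belief")
    beta = 1.0 / prob_obs  # normalization constant of equation (1)
    return {s: beta * p for s, p in unnormalized.items()}


# Hypothetical two-state example; every number below is an illustrative assumption.
prior = {"normal": 0.5, "congested": 0.5}
T = {
    ("normal", "continue_on_X"): {"normal": 0.8, "congested": 0.2},
    ("congested", "continue_on_X"): {"normal": 0.3, "congested": 0.7},
}
O = {
    ("normal", "continue_on_X"): {"slow_traffic": 0.1, "fast_traffic": 0.9},
    ("congested", "continue_on_X"): {"slow_traffic": 0.8, "fast_traffic": 0.2},
}
posterior = update_belief(prior, "continue_on_X", "slow_traffic", T, O)
```

In this example, observing slow traffic shifts most of the belief mass to the "congested" state, while the normalization by β keeps the belief a probability distribution.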

Here, the "sense of agency level" soa, described in detail later, is an element of the state space S as a state that the user H can take. The belief B(s') is updated by taking the sum over {s} (Σ_{s∈S} in equation (1) above), where {s} includes the updated "sense of agency level" soa' received from the sense of agency model 17 described later.

As an example, suppose that the user H is the driver of an automobile traveling on road X, and that the environmental world is the road traffic situation, including the automobile of the user H, together with the environment around the road network. The states are then, for example, "heading for destination W," "the traffic condition on road X at the position of the user H is 'normal'," "the traffic condition further ahead on road X is 'congested'," "the weather is 'sunny'," and so on. The actions can be, for example, "continue traveling on road X," "(the model) notifies the user of congestion ahead," "detour to road Y," and so on. The "sense of agency level" may be set to five levels, for example "high," "a little high," "neutral," "a little low," and "low."

Throughout the sense of agency estimation model 1, including the belief model 11, the sense of agency estimation model 1 and the user H are, unlike the usual case, not completely separate entities, and the sense of agency estimation model 1 performs its processing on the expectation that the user H will cooperate. In addition, the sense of agency estimation model 1 does not use, in the belief model 11, the reward (function) used in an ordinary POMDP; instead, the reward r is used in the desire model described next.

<Desire model>
Referring again to FIG. 1, the desire model 12 receives "value information" relating to values v for the user and the corresponding reward r, generates, based on the "value information" and the reward r, a desired state sd (∈ S) that is the state the user desires, and determines a policy π, which is a set of actions a that may bring about the desired state sd.

In this embodiment, the desire model 12 is specifically configured with a deep neural network (DNN) algorithm. It uses a value/reward model DM that takes as input
(a) a set τ = <v1, v2, ..., vn> of values v for the user H (for example, social values), such as τ = <efficiency, frugality, altruism, happiness, self-love, degree of sensing the sense of agency of others>, and
(b) a reward r (∈ R, the reward space) calculated from the observation o obtained when action a is performed in state s,
and outputs the desired state sd (∈ S) that the user desires. Using this model, the desire model 12 determines a policy π(o) = <a1, a2, ..., an>, as a function of the observation o, which is a set of actions a that may bring about the desired state sd.

Both the set τ of values v in (a) above and the reward r in (b) above are given or indicated by the user H in the course of interaction with the sense of agency estimation model 1. Such a set τ of values v is generated by the value matching model 13 described next.

<Value matching model>
Generally speaking, people quite often convey to others, mistakenly or falsely, something other than what they actually want in a given situation. This becomes a serious problem when a person tries to convey his or her wishes to an AI and have it act as expected.

For example, a person expects an AI robot to make coffee for the person rather than for the AI robot to enjoy coffee itself (the former being the correct value). Likewise, a person wants an autonomous (self-driving) vehicle to recognize which values matter to the person while driving. For example, the person expects it to recognize as essential values obeying traffic rules, keeping away from pedestrians, and not getting caught in a traffic jam while a fussy child is on board. However, it has conventionally been very difficult for an AI to recognize such human wishes accurately, in other words, to match (align) the values held by the person with the values adopted by the AI.

For example, an AI that performs ordinary reinforcement learning (RL) learns an optimal policy by maximizing the cumulative reward, in which rewards further in the future are discounted more heavily before being summed. The reward handled in this way, however, is merely a mathematical abstraction and is not inherent in the real environmental world. Moreover, it is very difficult to build a model that answers questions such as what a person regards as a value to be considered, or why a certain value is important, in terms of quantities actually obtained rather than mathematical abstractions.

Attempts have also been made to solve this value matching problem using inverse reinforcement learning (IRL) (Andrew Ng and Stuart J Russell, ICML 2000: Proceedings of the Seventeenth International Conference). However, a person does not want the AI to understand all of his or her wishes (for example, a personal wish such as enjoying a coffee break need not be understood), and distinguishing such wishes and handling them accordingly is very difficult for the AI. Furthermore, IRL assumes that the observed behavior and attitude of the person are optimal, so it cannot make adjustments by exploiting the various useful information contained in that observed behavior and attitude.

The value matching model 13 according to the present invention is therefore constructed using an algorithm based on Cooperative Inverse Reinforcement Learning (CIRL) (Dylan Hadfield-Menell et al., 30th Conference on Neural Information Processing Systems (NIPS) 2016), which enables communication with the user about values.

Specifically, as part of the CIRL processing flow, the value matching model 13 receives from the user
(a) information relating to the "value information," which in this embodiment is information relating to the set τ = <v1, v2, ..., vn> of values v for the user H (values v that the user H recognizes as important, for example in social life), that is, information about each of the values v1, v2, ..., vn, and
(b) information relating to the reward r corresponding to the "value information," which in this embodiment is information about the reward r for each of the values v1, v2, ..., vn.
Based on the information in (a) and (b), the model generates or updates the "value information" and the corresponding reward r, and outputs them to the desire model 12.

Here, the value matching model 13 (as part of the sense of agency estimation model 1)
(A) initially has no clear knowledge of what the user H personally values, which in this embodiment is the value set τ = <v1, v2, ..., vn>;
(B) observes or queries the user H about the value set τ via an interface (IF) that can accept measurement results output from measurement means such as a camera, or that supports text input/output, voice input/output, and the like;
(C) collects information about the value set τ (for example, the user's actions, attitudes, and answers, as well as information on the rewards and outcomes obtained in connection with them) on the basic premise that "the user normally acts based on the value set τ"; and
(D) performs value set and reward generation processing based on the collected information, with the aim of maximizing the estimated value set τ.
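The premise in (C), that the user normally acts based on the value set τ, can be illustrated with a simple Bayesian update over candidate values. The sketch below is a deliberately simplified stand-in and is not the CIRL algorithm of the embodiment: the softmax (Boltzmann-rational) choice model, the function name `update_value_posterior`, the `rationality` parameter, and the toll-road example with its utilities are all assumptions introduced here for illustration.

```python
import math


def update_value_posterior(prior, utilities, chosen, rationality=1.0):
    """Update a belief over candidate values after observing one user choice.

    prior:     dict candidate_value -> prior probability
    utilities: dict candidate_value -> dict action -> utility under that value
    chosen:    the action the user was observed to take
    Assumption: the user picks actions with probability proportional to
    exp(rationality * utility), i.e. usually (not always) follows the value.
    """
    posterior = {}
    for value, p in prior.items():
        scores = {a: math.exp(rationality * u) for a, u in utilities[value].items()}
        likelihood = scores[chosen] / sum(scores.values())
        posterior[value] = p * likelihood
    z = sum(posterior.values())
    return {value: p / z for value, p in posterior.items()}


# Hypothetical example: does the driver weight efficiency or frugality more?
prior = {"efficiency": 0.5, "frugality": 0.5}
utilities = {
    "efficiency": {"toll_road": 2.0, "free_road": 0.0},
    "frugality": {"toll_road": 0.0, "free_road": 2.0},
}
value_posterior = update_value_posterior(prior, utilities, chosen="toll_road")
```

Repeating such an update over successive observations or answers corresponds, in spirit, to the repeated interaction between the model and the user through which the estimate of τ is refined.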

To determine a better-matched value set τ, it is also preferable that the interaction described above between the sense of agency estimation model 1 (value matching model 13) and the user H be repeated continually.

In this embodiment, the value matching model 13 can also, for example, make an inquiry or request to the user H via the above interface (IF), generate from the response a self-report soa_r on the sense of agency of the user H, and output it to the sense of agency model 17 described in detail later.

<Intention model>

The belief-desire-intention model already described (Georgeff et al., 5th International Workshop, ATAL'98 Proceedings, pp.1-10, 1998) is known as a model for deriving intentions that will arise in the future. Here, an intention is expressed as an action that is expected to cause a desired state (result) in the environmental world. In deriving the sense of agency level soa of the user H, however, it is necessary to solve the problem of how the sense of agency estimation model 1 (and the user H) grasps the relationship between cause (action) and result (state), that is, the causal relationship.

To solve this problem, the intention model 14, also shown in FIG. 1, generates "causal relationship information" on the causal relationships among states, values, rewards, and actions (and, in this embodiment, the costs included in the belief information), based on
(a) the belief information received from the belief model 11, and
(b) the value set τ (value information) and the corresponding reward received from the desire model 12.

In this embodiment, the intention model 14 is constructed using a Causal Bayesian Network (CBN) algorithm. The "causal relationship information" that it outputs is, in this embodiment, information relating to the CBN algorithm, specifically the configuration information of the constructed CBN itself (see the lower left of FIG. 1), which includes the conditional probability of the resulting state s' given the state s and the action a.

Here, a CBN is a directed acyclic graph model composed of a group of nodes {Y_i} with directed edges E specified by a group of parent functions {pa(Y_i)}, and a group of conditional probabilities {Prob(Y_i | pa(Y_i))}. Each node Y_i corresponds to one of a state, an action, a cost, a value, and a reward. A directed edge E indicates that a causal transition is possible between the nodes Y_i and Y_j that it connects. Specifically, Y_i causes Y_j with a certain probability; in other words, the likelihood of Y_j occurring is expressed by a conditional probability distribution conditioned on Y_i.

Furthermore, in this embodiment, a do operator, do(Y_j = y_j), is defined for the CBN. When this do operator is applied to the CBN, pa(Y_j) = Φ (the empty set) and Prob(Y_j) = δ(Y_j; y_j). That is, the do operator corresponds to an "intervention" (in the sense of behavior change theory) carried out by the sense of agency estimation model 1.
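The effect of the do operator on a CBN can be sketched with a small discrete network. This is an illustrative sketch only, assuming a hypothetical three-node network (traffic, action, arrival) with made-up probabilities; the class `CBN` and its methods are names introduced here and do not come from the embodiment or from any library.

```python
from itertools import product


class CBN:
    """Tiny discrete causal Bayesian network (illustrative only)."""

    def __init__(self, parents, cpt, domains):
        self.parents = parents  # node -> tuple of parent nodes, pa(Y_i)
        self.cpt = cpt          # node -> {parent values: {value: probability}}
        self.domains = domains  # node -> list of possible values

    def do(self, node, value):
        """Return a new network in which pa(node) is empty and
        Prob(node) is a point mass on `value` (the intervention)."""
        parents = dict(self.parents)
        cpt = dict(self.cpt)
        parents[node] = ()
        cpt[node] = {(): {v: 1.0 if v == value else 0.0 for v in self.domains[node]}}
        return CBN(parents, cpt, self.domains)

    def joint(self, assignment):
        """Probability of a full assignment: product of Prob(Y_i | pa(Y_i))."""
        p = 1.0
        for node, pa in self.parents.items():
            key = tuple(assignment[q] for q in pa)
            p *= self.cpt[node][key][assignment[node]]
        return p

    def marginal(self, node, value):
        """Marginal Prob(node = value) by brute-force enumeration."""
        names = list(self.domains)
        total = 0.0
        for values in product(*(self.domains[n] for n in names)):
            assignment = dict(zip(names, values))
            if assignment[node] == value:
                total += self.joint(assignment)
        return total


# Hypothetical network; all probabilities are illustrative assumptions.
network = CBN(
    parents={"traffic": (), "action": ("traffic",), "arrival": ("traffic", "action")},
    cpt={
        "traffic": {(): {"normal": 0.6, "congested": 0.4}},
        "action": {
            ("normal",): {"continue": 0.9, "detour": 0.1},
            ("congested",): {"continue": 0.5, "detour": 0.5},
        },
        "arrival": {
            ("normal", "continue"): {"on_time": 0.9, "late": 0.1},
            ("normal", "detour"): {"on_time": 0.7, "late": 0.3},
            ("congested", "continue"): {"on_time": 0.2, "late": 0.8},
            ("congested", "detour"): {"on_time": 0.7, "late": 0.3},
        },
    },
    domains={
        "traffic": ["normal", "congested"],
        "action": ["continue", "detour"],
        "arrival": ["on_time", "late"],
    },
)
intervened = network.do("action", "detour")
p_obs = network.marginal("arrival", "on_time")    # without intervention
p_do = intervened.marginal("arrival", "on_time")  # under do(action = detour)
```

The intervened network differs from the original only at the intervened node, which loses its parents and takes the fixed value; the original network is left unchanged so that interventional and observational quantities can be compared.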

The intention model 14 initially estimates the causal relationships of the environmental world through "interventions," but it eventually needs to answer counterfactual questions such as "what would have happened if a different 'intervention' had been carried out at a past stage?" In this embodiment, therefore, the intention model 14 can also operate in a "counterfactual mode."

As described above, the intention model 14 is constructed on the following hypothesis, newly set by the inventors of the present application:
・a person's intention is formed based on the person's view of "what causal relationships connect events in the environmental world" and on the person's expectation of "what an intended action will bring about as a result."

That is, the intention model 14 incorporates the functions of the "comparator" and "estimation" in the theoretical framework of the present invention shown in FIG. 2, namely the function of determining and estimating whether the result of a performed action (the observed actual state) is the predicted or intended state. The conditional probability of the resulting state s' given the state s and the action a, that is, how likely state s' is to occur, is information included in the output of the intention model 14, and it directly reflects the correspondence with the intended state.

Furthermore, via the belief information received from the belief model 11, the intention model 14 reflects information on whether the state assumed by the user H matches the actual state, that is, information relating to the sense of agency level soa.

<Behavior model>
Referring again to FIG. 1, the behavior model 15 determines and outputs the action a to be performed for the observed state s, based on
(a) the policy π = <a1, a2, ..., an> received from the desire model 12, and
(b) the causal relationship information (CBN configuration information) received from the intention model 14.
In this embodiment, the behavior model 15 includes an action planning unit 151 and an action determination unit 152.

Of these, the action planning unit 151 generates an optimal policy π*, which is the policy regarded as optimal, based on the policy π of (a) above, the causal relationship information (CBN configuration information) of (b) above, and
(c) the content of communication, including predetermined questions, conducted with the user H via an interface (IF) that can accept measurement results output from measurement means such as a camera, or that supports text input/output, voice input/output, and the like.

In this embodiment, the action planning unit 151 is constructed using the action decision processing of the known XAIP (eXplainable AI Planning agent) (Chakraborti et al., Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence IJCAI-19, pp.1335-1343, 2019). XAIP normally defines an action planning problem P, a transition function ζ_P: S×A→S×C, and an action planning algorithm A: P×t→π, where t is a quantity representing a property such as optimality or soundness (and is different from τ of the present invention). In contrast, the inventors of the present application construct the action planning unit 151 by adopting the mental model Π of the user H in place of the action planning problem P.

Here, the mental model Π is, in this embodiment, the causal relationship information (CBN configuration information) received from the intention model 14. For example, in a situation where the user H is driving an automobile, it is a model that summarizes the mental information of the user H, such as "keep going on the current road X, take careful account of the time required to reach destination W, avoid congestion and travel as fast as possible, and avoid stopping off along the way as far as possible."

Specifically, the action planning unit 151 defines the mental model Π of the user H, a transition function ζ_Π: S×A→S×C, and an action planning algorithm PA: Π×DM→π*. Here, DM is the value/reward model constituting the desire model 12, and is realized here as the policy π = <a1, a2, ..., an>, a set of actions a (in state s) that may bring about the desired state sd (∈ S). As stated above, π* is the optimal policy.

The optimal policy π* is a set of actions that moves the actual state s to the desired state sd. That is, the transition function is given by the following equation:
(2)  ζ_Π(π, s) = <sd, Σ_{ai∈π} (ci + ri)>
Here, ci and ri are, respectively, the cost and the reward to be optimized when executing the policy π. One feature of the transition function ζ_Π in equation (2) is that it includes the reward ri, which is not used in the known XAIP.
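Equation (2) can be read as a function that returns the state reached by a policy together with the accumulated sum of (ci + ri). The Python sketch below illustrates this reading under stated assumptions: the `Action` record, the deterministic `step` function, the state and action names, and the convention of entering costs as negative numbers are all hypothetical choices made here, since the source does not specify them.

```python
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    cost: float    # ci (assumed sign convention: entered as a negative number)
    reward: float  # ri


def zeta(policy, state, step):
    """Sketch of the transition function of equation (2).

    Applies each action of the policy in order using the (hypothetical)
    deterministic transition `step`, and returns the pair
    <reached state, sum over ai in policy of (ci + ri)>.
    For an optimal policy, the reached state is the desired state sd.
    """
    total = 0.0
    for action in policy:
        state = step(state, action.name)
        total += action.cost + action.reward
    return state, total


# Hypothetical deterministic transitions for the road example.
TRANSITIONS = {
    ("on_road_X", "notify_congestion"): "on_road_X_informed",
    ("on_road_X_informed", "detour_to_Y"): "on_road_Y",
}


def step(state, action_name):
    # Unknown (state, action) pairs leave the state unchanged.
    return TRANSITIONS.get((state, action_name), state)


policy = [
    Action("notify_congestion", cost=-0.1, reward=0.5),
    Action("detour_to_Y", cost=-1.0, reward=3.0),
]
reached_state, total = zeta(policy, "on_road_X", step)
```

With the desired state sd taken to be "on_road_Y", this policy reaches sd, which is the condition that the optimal policy π* is required to satisfy.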

In accordance with the action planning algorithm PA, and in order to determine the optimal policy π*, the action planning unit 151 performs policy optimization so that the state s(i+1) (= ζ_Π(π, si)) that arises as the transition destination when the actions of the policy π are carried out matches the desired state sd completely or approximately. A specific embodiment of this policy optimization is described below.

In this embodiment, the action planning unit 151 basically needs to maintain a model similar to the mental model of the user H. Therefore, when the predicted behavior or mental model of the user H does not match its own behavior or model, it gives the user H an "explanation" and works toward matching the models.

For example, when the user H (driver) would "continue traveling on road X," the explanation "road X is congested ahead, whereas road Y is not congested" may be used so that "detour to road Y" becomes the policy.

One technique for such "explanation" is inference reconciliation. Specifically, when the action planning algorithm PA^χ of the action planning unit 151 (hereinafter, χ denotes model 1 as the AI) generates a policy π but the action planning algorithm PA^H of the user H does not generate the same policy π, the action planning unit 151 carries out an explanation ε so that PA^H generates the same policy π, that is, so that PA^H: Π×D →^ε π.

The explanation ε can be, for example, communication content directed to (the action planning algorithm PA^H of) the user H that includes specific questions about the policy π generated by PA^χ. It is also preferable that such questions take the form of an explanatory or persuasive dialogue about "why is a certain action a in the policy π?", "why the policy π and not another policy π'?", or "why is the policy π optimal (that is, why is π(s) equal to a and not a')?" Such a dialogue can be carried out via the interface (IF) of (c) above, which can accept measurement results output from measurement means such as a camera, or which supports text input/output, voice input/output, and the like.

Apart from the inference reconciliation described above, the explanation ε may also be used to change the mental model Π^H of the user H. Specifically, the user H may assess the policy π that the action planning algorithm PA^χ regards as optimal from an entirely different "mindset." The explanation ε is therefore used so that the user H also agrees with the determined policy π. For example, given PA^χ: Π×D→π, let H^H be a mental model of the user H that satisfies PA^H: H^H×D→π; the mental model Π^H may then be transformed into H^H by the explanation ε (that is, Π^H + ε → H^H).

As yet another technique, the action planning unit 151 can carry out an explanation ε that highlights the differences between (the values in) value sets τ generated under mutually different assumptions. Such an explanation ε can be expressed as ε ← τ Δ τ^H between the user H and the sense of agency estimation model 1 (action planning unit 151), and as ε ← τ^H1 Δ τ^H2 between a user H1 and a user H2. The latter is consistent with the stance of the sense of agency estimation model 1 of ranking the values of the user H1 above those of the user H2, and it also makes it possible to achieve balanced estimation of sense of agency levels, which is the goal of the sense of agency estimation model 1 in cases of joint action.

Specifically, given PA^χ: Π×D→π, an explanation ε such that τ^H + ε → τ'^H and PA^χ: Π×R×τ'^H→π is generated and carried out during the interaction between the user H and the sense of agency estimation model 1. Alternatively, while the sense of agency estimation model 1 ranks the values of the user H1 above those of the user H2 (that is, while PA^χ: Π×R×τ^H2→π), an explanation ε satisfying τ^H1 + ε → τ'^H1 = τ'^H2 may be generated and carried out.

As is clear from the above, and unlike the situation in the value matching model 13 described earlier, the action planning unit 151 has an action planning algorithm with higher problem-solving ability than (the action planning algorithm of) the user H. That is, PA^χ > PA^H.

Several techniques by which the AI gives the user an "explanation" in order to match the models have been described above. In every case, the technique of this embodiment follows three basic principles of AI-based persuasive technology. As the first principle, the sense of agency estimation model 1 is highly transparent toward the user H in presenting its action plans and estimation results, and the user H can ask questions and request explanations. This is expected to make the sense of agency estimation model 1 trustworthy to the user H and to improve the relationship between the two.

As the second principle, the sense of agency estimation model 1 can explain to the user H that it understands the behavior of the user H. This makes it possible to increase the empathy that the user H feels toward the sense of agency estimation model 1. As the third principle, the sense of agency estimation model 1 plans and estimates actions for the user H in cooperation with the user H. This also makes it possible to raise the sense of agency level of the user H itself, which is what the present invention estimates.

Referring again to FIG. 1, the action determination unit 152 of the behavior model 15 uses the optimal policy π* generated by the action planning unit 151 to determine and output the action a to be performed for the observed state s. Specifically, it executes π(s) = a. The action determined here is an action of one of the following: the sense of agency estimation model 1 (when the sense of agency estimation model 1 is in control as an autonomous AI); the user H (when the user H is mainly in control); or both the sense of agency estimation model 1 and the user H (when the two are in control cooperatively).

When the determined action is that of the sense of agency estimation model 1, the action a output from the action determination unit 152 acts on the environmental world, including the user H, via a predetermined interface (IF) (for example, by driving an actuator, displaying on a display, or outputting audio from a speaker). As a result, the state s of the environmental world changes to the state s'.

The state s' observed in the world as a result of the action a is fed back to the intention model 14 in order to determine whether the intended (predicted) state was caused by the action a carried out by the behavior model, and not by the actions of other agents.

For example, suppose the user H is the driver of a car traveling on road X, and the environmental world is the road traffic conditions, including the user H's car, and the environment around the road network. The determined action, such as "notify the user that a detour to road Y should be taken" or "detour to road Y", then gives rise to new states such as "traveling on road Y toward destination W" or "the traffic condition on road Y being traveled is not 'congested'". As a result, the sense of agency level of the user H, which was initially "low", changes to "high", for example.

<Alternative state generation/evaluation model>
Also in FIG. 1, the alternative state generation/evaluation model 16 addresses the problem that a new policy must be generated and evaluated in situations that have not been seen before, are unpredicted, or occur only rarely. In other words, it can also be regarded as a model that automatically generates intervention content for "interventions" that encourage behavior change.

Specifically, the alternative state generation/evaluation model 16 includes a state generator (162) that generates an alternative state s^al as a possible new state, and an evaluator (164) that evaluates the generated alternative state s^al. It is built on the known generation/evaluation (actor-critic) framework (Aras Dargazany, arXiv:2004.04574 Artificial Intelligence [cs.AI], 2020; Zhewei Huang et al., arXiv:1903.04411 Computer Vision and Pattern Recognition [cs.CV], 2019). This actor-critic framework is itself constructed using generative adversarial network (GAN) and deep reinforcement learning (DRL) algorithms.

In this known actor-critic framework, however, processing proceeds on the premise that the AI has accurately modeled the current environmental world. By contrast, the alternative state generation/evaluation model 16 draws on knowledge from various contexts learned in the past and on (an aggregate of) insights from the mental models (CBNs) of multiple different users. It thereby learns a variety of situations that the user H could not predict or foresee but that can nevertheless occur, and solves the problem.

Also in FIG. 1, the CBN aggregate 161 of the alternative state generation/evaluation model 16 is the aggregate CBN_∪ (= CBN_{Hi}) of CBN1, CBN2, ..., CBNp, which are the intention models (mental models, causal relationship information) of multiple different users H1, H2, ..., Hp. In other words, it is integrated causal relationship information. At least some of CBN1, CBN2, ..., CBNp may instead be entities that correspond to differences in context rather than differences between people, for example entities corresponding to mutually different contexts.

Specifically, given for example CBN_H1 = (DAG<Y_H1, E_H1>, Prob_H1) and CBN_H2 = (DAG<Y_H2, E_H2>, Prob_H2), the aggregate CBN_∪ can be expressed by the following equation:
(3) CBN_∪ = (DAG_∪<Y_H1 ∪ Y_H2, E_H1 ∪ E_H2>, Prob_H1 ∪ Prob_H2)
Here, DAG<Y, E> denotes a directed acyclic graph composed of nodes (variables) Y and edges E, and Prob is the transition probability corresponding to each edge E.
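As an illustration of equation (3), the following minimal Python sketch merges two CBNs by taking the union of their nodes, edges, and edge probabilities. The class and function names (`CBN`, `is_acyclic`, `merge_cbn`) are hypothetical. The text does not say how a probability is chosen when both users share an edge, nor what happens if the union stops being acyclic, so the averaging rule and the cycle check below are assumptions made only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class CBN:
    """A causal Bayesian network written as (DAG<Y, E>, Prob)."""
    nodes: set = field(default_factory=set)   # Y: variables
    edges: set = field(default_factory=set)   # E: directed (parent, child) pairs
    prob: dict = field(default_factory=dict)  # Prob: edge -> transition probability


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True when the directed graph contains no cycle."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    frontier = [n for n, d in indegree.items() if d == 0]
    visited = 0
    while frontier:
        node = frontier.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    frontier.append(child)
    return visited == len(nodes)


def merge_cbn(a, b):
    """Union of two CBNs, term by term as in equation (3).

    Every edge is expected to have a probability in at least one input.
    """
    nodes = a.nodes | b.nodes
    edges = a.edges | b.edges
    if not is_acyclic(nodes, edges):  # assumption: reject unions that are no longer a DAG
        raise ValueError("union of the two graphs contains a cycle")
    prob = {}
    for edge in edges:
        # Assumption: average the probabilities of an edge that both users share.
        values = [p[edge] for p in (a.prob, b.prob) if edge in p]
        prob[edge] = sum(values) / len(values)
    return CBN(nodes, edges, prob)


cbn_h1 = CBN({"jam", "detour", "arrival"},
             {("jam", "detour"), ("detour", "arrival")},
             {("jam", "detour"): 0.8, ("detour", "arrival"): 0.9})
cbn_h2 = CBN({"jam", "detour", "fuel_cost"},
             {("jam", "detour"), ("detour", "fuel_cost")},
             {("jam", "detour"): 0.6, ("detour", "fuel_cost"): 0.5})
merged = merge_cbn(cbn_h1, cbn_h2)
```

In this toy example the merged network contains the variable "fuel_cost", which the first user's model lacks; this is the kind of knowledge the aggregate is meant to contribute.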

Also in FIG. 1, the state generator 162 of the alternative state generation/evaluation model 16
(a) receives the observed state s and the corresponding action a output (from the action determination unit 152), and
(b) based on the aggregate of intention models CBN_∪ (integrated causal relationship information) received from the CBN aggregate 161,
generates and outputs an alternative state s^al as a candidate for a state that can occur.
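The input/output contract of steps (a) and (b) can be sketched as follows. This is a simple lookup-based stand-in, not the GAN generator of the embodiment: it assumes, purely for illustration, that each model can be flattened into a table mapping (state, action) to successor states with probabilities. The function name and the table layout are hypothetical.

```python
def generate_alternative_states(state, action, union_model, user_model):
    """Return candidate alternative states s^al for the pair (state, action).

    Both models map (state, action) -> {next_state: probability}.
    Candidates are successors known to the aggregate but absent from the
    user's own model, ordered from most to least probable.
    """
    known_to_user = user_model.get((state, action), {})
    candidates = {
        nxt: p
        for nxt, p in union_model.get((state, action), {}).items()
        if nxt not in known_to_user
    }
    return sorted(candidates, key=candidates.get, reverse=True)


# Hypothetical tables for the driving example.
user_model = {("jam_on_X", "detour_Y"): {"moving_on_Y": 0.9}}
union_model = {("jam_on_X", "detour_Y"): {"moving_on_Y": 0.7,
                                          "jam_on_Y": 0.2,
                                          "road_Y_closed": 0.1}}
alternatives = generate_alternative_states(
    "jam_on_X", "detour_Y", union_model, user_model)
```

Here the candidates are the outcomes that other users' models contain but the user H's own model does not, which matches the idea of states the user did not anticipate.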

Here, the alternative state s^al is a new state that the user H did not predict or anticipate, or did not consider possible. It is used, for example, to determine a more suitable alternative solution action for an unknown problem that has not been seen before.

Incidentally, the state generator 162 corresponds to the generator part of a known generative adversarial network (GAN). However, rather than generating, for example, fake data as in conventional GANs, it generates alternative states s^al for creating new strategies to solve new problems.

Also in FIG. 1, the discriminator 163 of the alternative state generation/evaluation model 16 determines whether the alternative state s^al generated by the state generator 162 is so fictitious that it could not occur in the context in which the user H is situated. That is, the discriminator 163 assesses whether the newly generated alternative state s^al can be useful for solving a real problem, and drives the training of the state generator 162 so that the alternative state s^al becomes such a state.

Specifically, in the present embodiment, the discriminator 163 is implemented with a deep neural network (DNN) algorithm that includes multiple fully connected layers. From the new state s' produced by the action a (from the action determination unit 152) and the alternative state s^al (generated by the state generator 162), it generates a loss ALoss representing the difference from the desired state sd. Using this loss ALoss, it (a) causes the state generator 162 to be trained and (b) trains itself (the discriminator 163).

In a typical generative adversarial network (GAN), the adversarial loss is calculated as the degree of difference between the current state s' and the generated state sg, that is, max_ψ(Ex_{s~μ}[ψ(s')] - Ex_{sg~ug}[ψ(sg)]). Here, Ex is the expected value, and μ and ug are the sample probability distribution of the current state and the sample probability distribution of the generated state, respectively.
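The conventional adversarial loss quoted above can be estimated from samples as shown in the following sketch. It assumes scalar states and a small, finite set of candidate critic functions ψ, so the maximum can be taken by enumeration. In practice ψ would be a neural network trained by gradient ascent. All names are hypothetical.

```python
from statistics import fmean


def critic_gap(psi, real_states, generated_states):
    """Empirical Ex[psi(s')] - Ex[psi(sg)] for a single critic psi."""
    real_mean = fmean([psi(s) for s in real_states])
    generated_mean = fmean([psi(s) for s in generated_states])
    return real_mean - generated_mean


def adversarial_loss(critics, real_states, generated_states):
    """Maximum of the gap over a finite set of candidate critics."""
    return max(critic_gap(psi, real_states, generated_states) for psi in critics)


# Toy data: scalar states sampled from mu (current) and ug (generated).
real_states = [1.0, 2.0, 3.0]
generated_states = [0.0, 1.0, 2.0]
critics = [lambda s: s, lambda s: -s, lambda s: 0.0]
loss = adversarial_loss(critics, real_states, generated_states)
```

With these toy samples the identity critic gives the largest gap, because the generated states are shifted down by one unit relative to the current states.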

By contrast, in the present embodiment, the discriminator 163 outputs the scalar adversarial loss ALoss itself. Unlike the conventional approach, given
(a) ALoss(s'): the loss between the current state s' and the desired state sd, and
(b) AL(s^al): the loss between the alternative state s^al and the desired state sd,
its aim is to select the optimal error between ALoss(s') and AL(s^al).
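One possible reading of this selection can be sketched as follows. The sketch assumes that states are numeric vectors, that the loss to the desired state sd is the Euclidean distance, and that the "optimal" error is the smaller of the two. None of these choices is stated in the text, and the function names are hypothetical.

```python
import math


def loss_to_desired(state, desired):
    """Distance between a state vector and the desired state sd (assumed metric)."""
    return math.dist(state, desired)


def select_loss(new_state, alt_state, desired):
    """Compare ALoss(s') with AL(s^al) and return the smaller one with its label."""
    aloss_new = loss_to_desired(new_state, desired)
    al_alt = loss_to_desired(alt_state, desired)
    if al_alt < aloss_new:
        return "alternative", al_alt
    return "current", aloss_new


desired = (0.0, 0.0)      # sd
new_state = (3.0, 4.0)    # s', distance 5.0 from sd
alt_state = (1.0, 0.0)    # s^al, distance 1.0 from sd
choice, value = select_loss(new_state, alt_state, desired)
```

In this example the alternative state lies closer to the desired state, so its loss is the one selected.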

Also in FIG. 1, the evaluator 164 of the alternative state generation/evaluation model 16 generates a predicted reward, which is the reward corresponding to the alternative state s^al generated by the state generator 162 (trained with the adversarial loss ALoss). This predicted reward no longer includes the reward calculated from the action a output from (the action determination unit 152 of) the behavior model 15.

The evaluator 164 also uses this predicted reward to have (the action determination unit 152 of) the behavior model 15 train its action determination. This makes it possible to update the action determination processing in (the action determination unit 152 of) the behavior model 15 so that it can also be applied to situations that have not been seen before, are unpredicted, or occur only rarely.

Specifically, the predicted reward at time t can be the Q-learning value function Q(s^al_t) of reinforcement learning. This Q-learning value function Q(s^al_t) is updated as a discounted reward according to the following equation:
(4) Q(s^al_t) = r(s^al_t, a_t) + γQ(s^al_{t+1})
Here, r(s^al_t, a_t) is the reward for taking the action a_t in the state s^al_t.
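Equation (4) can be evaluated backwards along a finite sequence of alternative states, as in the following sketch. It assumes γ is the usual discount factor and that the value after the last step is a given terminal value (0 by default). The function name is hypothetical.

```python
def discounted_values(rewards, gamma, terminal_value=0.0):
    """Apply Q(s_t) = r_t + gamma * Q(s_{t+1}) backwards over a trajectory.

    rewards[t] plays the role of r(s^al_t, a_t).
    """
    values = [0.0] * len(rewards)
    next_value = terminal_value
    for t in reversed(range(len(rewards))):
        values[t] = rewards[t] + gamma * next_value
        next_value = values[t]
    return values


q_values = discounted_values([1.0, 0.0, 2.0], gamma=0.5)
# q_values == [1.5, 1.0, 2.0]
```

Working from the last step: Q_2 = 2.0, then Q_1 = 0.0 + 0.5 × 2.0 = 1.0, and Q_0 = 1.0 + 0.5 × 1.0 = 1.5.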

Next, in the present embodiment, the action determination unit 152 of the behavior model 15 uses this predicted reward (Q-learning value function) Q(s^al_t) to train π*(s), from which the action a is derived, so as to maximize r(s^al_t, π*(s^al_t)) + Q(ζ(s^al_t, π*(s^al_t))). Here, the transition function ζ(s^al_t, π*(s^al_t)) gives the alternative state s^al_{t+1} at time t+1. Such an alternative state s^al_{t+1} can also be regarded as the answer presented by the sense of agency estimation model 1 when the user H is in a predicament, for example when the limits of the user H's ability make it impossible to answer the problem presented. Presenting such an answer to the user H also helps to increase the user H's trust in the sense of agency estimation model (behavior change promotion model) 1.
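For a finite set of actions, the maximization described above reduces to a greedy choice, as in the following sketch. The reward, transition, and value tables are hypothetical toy data for the driving example. In the embodiment the policy would be trained, not enumerated.

```python
def greedy_action(state, actions, reward, transition, q_value):
    """Pick the action a that maximizes reward(state, a) + q_value(transition(state, a))."""
    return max(
        actions,
        key=lambda a: reward(state, a) + q_value(transition(state, a)),
    )


# Hypothetical tables: r, zeta and Q for two candidate actions.
reward_table = {("jam_on_X", "stay"): -1.0, ("jam_on_X", "detour_Y"): -0.5}
transition_table = {("jam_on_X", "stay"): "jam_on_X",
                    ("jam_on_X", "detour_Y"): "moving_on_Y"}
q_table = {"jam_on_X": 0.0, "moving_on_Y": 2.0}

best = greedy_action(
    "jam_on_X",
    ["stay", "detour_Y"],
    reward=lambda s, a: reward_table[(s, a)],
    transition=lambda s, a: transition_table[(s, a)],
    q_value=lambda s: q_table[s],
)
```

Staying scores -1.0 + 0.0 = -1.0, while the detour scores -0.5 + 2.0 = 1.5, so the detour is selected.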

<Sense of agency model>
Also in FIG. 1, the sense of agency model 17 in the present embodiment estimates and outputs real-time fluctuations in the sense of agency (SoA) level, which depends on dynamic context, as a behavioral phenotype.

In a past psychological experiment (Tapal, A. et al., Frontiers in Psychology, 8, Article 1552, 2017, <https://doi.org/10.3389/fpsyg.2017.01552>), the sense of agency (SoA) was measured directly through self-reports, in which subjects directly rated their own sense of agency regarding specific events.

There are also past examples of direct measurement of the sense of agency that use the perceptual difference between the subject's own measurement of fluctuations in the sense of agency and an external measurement. However, both techniques require a direct response about the sense of agency from the subject, which forces the subject to interrupt their activity intermittently. They have therefore been difficult to apply, particularly in situations where the sense of agency level fluctuates greatly.

In contrast, the inventors of the present application have found (as a hypothesis that works well) that changes in the sense of agency level soa appear clearly in the temporal changes of the user H's (the subject's) physiological indicators (for example, heart rate and breathing rate), posture, gestures, and speech prosody indicators (for example, tone, accent, intonation, speech rate, speech pitch, and amount of speech). These measurements have conventionally been used effectively to estimate affective states (including emotions and moods).

In the field of so-called affective computing, research is being actively conducted on using AI to recognize a subject's emotions and to explore adaptive responses to emotions from behavioral and attitudinal phenotype information obtained by wearable sensors and environmental sensors. Furthermore, several studies have demonstrated that a person's sense of agency and emotions constantly interact in daily life (for example, Matthis Synofzik et al., Front. Psychol., 4(127), 2013 <https://doi.org/10.3389/fpsyg.2013.00127>, and Antje Gentsch and Matthis Synofzik, Front. Hum. Neurosci., 8:608, 2014 <https://doi.org/10.3389/fnhum.2014.00608>).

In addition, research results have been published (Julia F Christensen et al., Exp Brain Res. 237(5), 1205-1212, 2019 <https://doi.org/10.1007/s00221-018-5461-6>) showing that the sense of agency can be modulated by affective factors, such as whether the expectation of the emotional outcome of an action is positive or negative, whether the motivation to perform the action is high or low, and whether the action is performed in a friendly or a hostile environment.

To formalize the findings described above, the sense of agency model 17 defines a sense of agency recognition function Ω: ρ1 × ρ2 × ρ3 × ρ4 × ... → S that outputs the current sense of agency level soa. Here, ρ1, ρ2, ... are feature parameters representing the user H's (the subject's) physiological indicators (for example, heart rate and breathing rate), posture, gestures, and speech prosody indicators (for example, tone, accent, intonation, speech rate, speech pitch, and amount of speech).

Having defined this sense of agency recognition function Ω, the sense of agency model 17 is specifically implemented with a deep neural network (DNN) algorithm. It receives
(a) the new state s' produced by the action a output from the action determination unit 152, and
(b) the feature values of predetermined features ρ of the user H under this new state s' (that is, as affected by the new state s'),
and, based on the new state s' and the feature values ρ, determines or updates the sense of agency level (soa) of the user H and outputs it. It further causes the sense of agency level soa used in the belief model 11 to be updated to the determined or updated sense of agency level soa'. This also makes it possible, for example, to estimate or anticipate the causes and characteristics of interruptions or breaks in the sense of agency.
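The mapping from the new state s' and the feature values ρ to a sense of agency level can be sketched as a very small feed-forward network. The sketch assumes that the state and features are already encoded as numbers and that the level is a score in (0, 1). The text gives "low" and "high" as example levels, so the numeric range is an assumption. The weights below are hand-picked and untrained, and all names are hypothetical.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one output per weight row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def estimate_soa(state_features, user_features, params):
    """Map (s', rho_1, rho_2, ...) to a sense of agency score in (0, 1)."""
    x = list(state_features) + list(user_features)
    hidden = dense(x, params["w1"], params["b1"], math.tanh)
    (soa,) = dense(hidden, params["w2"], params["b2"], sigmoid)
    return soa


# Hand-picked, untrained parameters: 3 inputs -> 2 hidden units -> 1 output.
params = {
    "w1": [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]],
    "b1": [0.0, 0.1],
    "w2": [[1.0, -1.0]],
    "b2": [0.0],
}
state_features = [1.0]      # e.g. 1.0 if traffic on road Y is flowing (assumed encoding)
user_features = [0.6, 0.3]  # e.g. normalized heart rate and speech rate (assumed encoding)
soa = estimate_soa(state_features, user_features, params)
```

The returned score would then replace the sense of agency level held by the belief model. A real implementation would learn the parameters from labeled data.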

Incidentally, the flow in which the sense of agency level soa' output from the sense of agency model 17 updates the "belief information" of the belief model 11, and the updated belief information in turn updates the "causal relationship information" (CBN configuration information) of the intention model 14, embodies the conventional psychological theory that perceived control, the belief that one is controlling the perceived object, which is related to the sense of agency, influences intention.

Here, it is also preferable that the sense of agency model 17 determines or updates the sense of agency level (soa) of the user H, and outputs it, based additionally on
(c) a self-report soa_r on the user H's sense of agency, received from the value matching model 13.
This processing applies when the value matching model 13, as part of the CIRL (cooperative inverse reinforcement learning) processing flow, has queried the user H about the user H's sense of agency level (when the recognized sense of agency is in doubt) and has obtained the self-report soa_r as a result.
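The text does not specify how the self-report soa_r is combined with the model's own estimate. The following sketch shows one simple possibility, a weighted blend of two scores in [0, 1]. Both the blending rule and the `trust` parameter are assumptions introduced only for illustration.

```python
def fuse_soa(estimated, self_report=None, trust=0.7):
    """Blend the model's estimate with a self-report soa_r when one is available.

    trust is the weight given to the self-report (assumed, not from the text).
    """
    if self_report is None:
        return estimated  # no query was made, keep the model's own estimate
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    return trust * self_report + (1.0 - trust) * estimated


soa_without_report = fuse_soa(0.4)
soa_with_report = fuse_soa(0.4, self_report=1.0, trust=0.5)
```

With equal weighting, an estimate of 0.4 and a self-report of 1.0 give a fused level of about 0.7.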

The above has described the processing for generating and updating the sense of agency level soa in the sense of agency model 17. With this, the sense of agency estimation model 1 can, for example, explain to the user H how reliably it captures and understands the user H's sense of agency level, and as a result can further demonstrate a high degree of empathy and trustworthiness to the user H. On that basis, the sense of agency estimation model 1 can also convince the user H that the presented policy is expected to raise, and not reduce, the user H's sense of agency level, and can thus provide the user H with wise and sensible responses.

[Sense of agency estimation device, sense of agency estimation program]
Returning to FIG. 1, the following describes a sense of agency estimation device 9 that is equipped with the sense of agency estimation model 1 described above and estimates the sense of agency of the user H, who is the estimation target user. With the same configuration, this device can also serve as a behavior change promotion device equipped with the behavior change promotion model (sense of agency estimation model) 1.

The sense of agency estimation device 9 of the present embodiment, shown at the lower left of FIG. 1, uses the installed sense of agency estimation model 1 to estimate the sense of agency of the user H, the estimation target, from the observed state of the environmental world. Specifically, it determines the sense of agency level of the user H.

Specifically, in FIG. 1, the user interface (IF) 91 of the sense of agency estimation device 9 corresponds to the interface (IF) of the value matching model 13, the interface (IF) of the action planning unit 151, and the interface (IF) of the behavior model 15 (action determination unit 152). It takes in measurement results for the user H, inputs and outputs information for various communications with the user H, and applies the determined actions to the environmental world. It also serves as an input unit that collects information needed to train the sense of agency estimation model 1, such as the state of the environmental world, and information that serves as the conditions for sense of agency estimation.

The training unit 92 generates training data from the received information needed to train the sense of agency estimation model 1, and uses the data to train the sense of agency estimation model 1.

The sense of agency estimation unit 93 determines the sense of agency level of the user H using the trained sense of agency estimation model 1, based on the received information serving as the conditions for sense of agency estimation. In the present embodiment, it is possible to determine and output the user H's real-time sense of agency level and the dynamic fluctuations in that level even in complex or constantly changing conditions of the environmental world.

The output unit 94 transmits information on the determined sense of agency level to an external information processing device (when it has a communication function) or displays it (when it has a display function).

Here, the training unit 92 and the sense of agency estimation unit 93 are the main functional components that carry out an embodiment of the sense of agency estimation method according to the present invention, and they can also be regarded as functions of a processor and memory storing an embodiment of the sense of agency estimation program according to the present invention. Accordingly, the sense of agency estimation device 9 may be a dedicated device for sense of agency estimation, but it may also be, for example, a cloud server, a non-cloud server device, a personal computer (PC), a notebook or tablet computer, a smartphone, or a wearable computer equipped with the sense of agency estimation program according to the present invention.

As described in detail above, the present invention makes it possible to estimate a user's sense of agency, which has conventionally been difficult to estimate (in particular, it could not be estimated in a complex and constantly changing environmental world). Moreover, by determining actions on the environmental world while taking into account the user's sense of agency, updated as appropriate, it is also possible to encourage a behavior change in the user, that is, a change in intention or action.

For example, with a suitable embodiment, appropriate suggestions and persuasion can be made so that the user feels a connection or linkage between an action the user took of their own will (in response to suggestions or persuasion from the sense of agency estimation model according to the present invention) and the resulting state of the environmental world. In other words, suggestions and persuasion can be made in a way that raises the user's sense of agency and does not reduce it.

Because the present invention provides the effects described above, it is also expected to contribute greatly to improving and advancing the mutual understanding and collaborative activities between humans and AI that will be seen in various settings in the future.

Furthermore, to provide children with high-quality education, that is, education that respects their autonomy and motivation to study, the sense of agency estimation model and behavior change promotion model according to the present invention can be used to carry out educational actions, including suggestions and guidance, that maintain and improve the children's sense of agency. The present invention can thus contribute to Goal 4 of the United Nations-led Sustainable Development Goals (SDGs): "Provide inclusive, equitable and quality education for all and promote lifelong learning opportunities."

Likewise, to provide adults with decent work (rewarding and humane work), the sense of agency estimation model and behavior change promotion model according to the present invention can be used to give job-seeking or on-the-job advice that maintains and improves the adults' sense of agency, thereby encouraging appropriate behavior change at work. The present invention can thus also contribute to Goal 8 of the United Nations-led SDGs: "Promote inclusive and sustainable economic growth, employment and decent work for all."

Furthermore, for drivers of automobiles traveling in urban areas, for example, the sense of agency estimation model and behavior change promotion model according to the present invention can be used to provide navigation that does not reduce the drivers' sense of agency, that is, navigation that the drivers find easy to accept, so that they achieve their objectives reliably and smoothly. The present invention can thus also contribute to Goal 11 of the United Nations-led SDGs: "Make cities inclusive, safe, resilient and sustainable."

Still further, to provide consumers with sustainable consumption and lifestyles, the sense of agency estimation model and behavior change promotion model according to the present invention can be used to give advice and suggestions on consumption behavior that do not reduce the consumers' sense of agency, that is, advice and suggestions that the consumers find easy to accept. The present invention can thus also contribute to Goal 12 of the United Nations-led SDGs: "Ensure sustainable consumption and production patterns."

For the various embodiments of the present invention described above, those skilled in the art can readily make various changes, modifications, and omissions within the scope of the technical idea and viewpoint of the present invention. The above description is merely an example and is not intended to be limiting in any way. The present invention is limited only as defined by the claims and their equivalents.

1 Sense of agency estimation model (behavior change promotion model)
11 Belief model
12 Desire model
13 Value matching model
14 Intention model
15 Behavior model
151 Action planning unit
152 Action determination unit
16 Alternative state generation/evaluation model
161 CBN (Causal Bayesian Network) aggregate
162 State generator
163 Discriminator
164 Evaluator
17 Sense of agency model
9 Sense of agency estimation device
91 User interface (user IF)
92 Training unit
93 Sense of agency estimation unit
94 Output unit

Claims

1. A sense of agency estimation model that causes a computer to function so as to estimate a user's sense of agency while using rewards to determine actions regarding states of an environmental world that includes the user, the model causing the computer to function as:
a belief model that generates or updates belief information, which is information including the probability that a new state arises as a result of performing a certain action in a certain state under a certain sense of agency level of the user;
a desire model that receives value information relating to value for the user and the corresponding reward, generates a desired state, which is a state the user desires, and determines a policy, which is a set of actions that can bring about the desired state;
an intention model that generates causal relationship information on the causal relationships among states, values, rewards, and actions, based on the belief information, the value information, and the reward;
a behavior model that generates an optimal policy, which is a policy regarded as optimal, based on the policy, the causal relationship information, and the content of communication with the user including predetermined questions, and that uses the optimal policy to determine and output an action to be taken for an observed state; and
a sense of agency model that determines or updates the sense of agency level of the user based on the new state caused by the output action and on feature quantities relating to predetermined characteristics of the user under the new state, outputs the determined or updated level, and updates the sense of agency level used in the belief model to the determined or updated sense of agency level.
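
As one illustrative reading of the belief model and the sense of agency model recited above, the following minimal sketch keeps a count-based estimate of the probability of a new state given a state, an action, and a sense of agency level, and then adjusts the level from a user feature score. The class name `AgencyConditionedBelief`, the function `update_agency_level`, the Laplace smoothing, the discrete levels 0 to 2, and the threshold rule are all assumptions made for illustration; the claim does not specify any of them.

```python
class AgencyConditionedBelief:
    """Count-based estimate of P(next_state | state, action, agency_level)."""

    def __init__(self, states, smoothing=1.0):
        self.states = list(states)
        self.smoothing = smoothing  # Laplace smoothing (an assumption)
        self.counts = {}            # (level, state, action) -> {next_state: count}

    def update(self, level, state, action, next_state):
        # Record one observed transition under the given agency level.
        row = self.counts.setdefault((level, state, action), {})
        row[next_state] = row.get(next_state, 0.0) + 1.0

    def prob(self, level, state, action, next_state):
        # Smoothed relative frequency of next_state for this condition.
        row = self.counts.get((level, state, action), {})
        total = sum(row.values()) + self.smoothing * len(self.states)
        return (row.get(next_state, 0.0) + self.smoothing) / total


def update_agency_level(level, feature_score, low=0.3, high=0.7, max_level=2):
    """Hypothetical rule: raise or lower a discrete level from a score in [0, 1]."""
    if feature_score >= high:
        return min(level + 1, max_level)
    if feature_score <= low:
        return max(level - 1, 0)
    return level


# Usage: one observed transition, then a level update fed back to the belief model.
belief = AgencyConditionedBelief(states=["sedentary", "active"])
level = 1
belief.update(level, "sedentary", "suggest_walk", "active")
level = update_agency_level(level, feature_score=0.9)
```

Keying the counts by the sense of agency level is what lets an updated level change which transition probabilities the rest of the pipeline sees, which mirrors the feedback from the sense of agency model to the belief model in the claim.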

2. The sense of agency estimation model according to claim 1, further causing the computer to function as a value matching model that receives, from the user, information relating to the value information and information relating to the reward corresponding to it, generates or updates the corresponding reward based on the information relating to the value information and the information relating to the reward, and outputs it to the desire model.

3. The sense of agency estimation model according to claim 2, wherein the value matching model is constructed using an algorithm related to cooperative inverse reinforcement learning (CIRL).

4. The sense of agency estimation model according to claim 1, further causing the computer to function as an alternative state generation/evaluation model comprising:
a state generator that receives the observed state and the corresponding output action, and generates and outputs an alternative state as a candidate for a state that could arise, based on integrated causal relationship information that integrates the causal relationship information of each of at least a plurality of users;
a discriminator that generates, from the new state caused by the action and from the alternative state, a loss representing the difference from the desired state, and trains the state generator using the loss; and
an evaluator that generates a predicted reward, which is a reward corresponding to the alternative state generated by the trained state generator, and uses the predicted reward to train the behavior model that determines the action.

5. The sense of agency estimation model according to claim 4, wherein the alternative state generation/evaluation model is constructed using an algorithm related to generative adversarial networks (GAN).

6. The sense of agency estimation model according to any one of claims 1 to 5, wherein the belief model is constructed using an algorithm related to a Partially Observable Markov Decision Process (POMDP).
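
For reference, the textbook POMDP belief update, in which the new belief over states is proportional to the observation likelihood times the predicted state distribution, can be sketched as follows. The function name `pomdp_belief_update`, the nested-list layout of the transition and observation tables, and the toy numbers are assumptions for illustration; the claim states only that the belief model is constructed using a POMDP-related algorithm, and this sketch does not show how the sense of agency level enters that model.

```python
def pomdp_belief_update(belief, action, observation, transition, observation_model):
    """Return b'(s2) proportional to O(o | s2, a) * sum_s T(s2 | s, a) * b(s).

    belief:                       list of probabilities over states
    transition[a][s][s2]:         P(s2 | s, a)
    observation_model[a][s2][o]:  P(o | s2, a)
    """
    n = len(belief)
    unnormalized = []
    for s2 in range(n):
        # Prediction step: probability of reaching s2 before seeing the observation.
        predicted = sum(transition[action][s][s2] * belief[s] for s in range(n))
        # Correction step: weight by how likely the observation is in s2.
        unnormalized.append(observation_model[action][s2][observation] * predicted)
    norm = sum(unnormalized)
    if norm == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [p / norm for p in unnormalized]


# Usage with two states, one action, and two possible observations (toy values).
T = [[[0.9, 0.1], [0.2, 0.8]]]
O = [[[0.8, 0.2], [0.3, 0.7]]]
new_belief = pomdp_belief_update([0.5, 0.5], action=0, observation=0,
                                 transition=T, observation_model=O)
```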

7. The sense of agency estimation model according to claim 1, wherein the causal relationship information is information related to a Bayesian network algorithm.

8. A sense of agency estimation device that estimates the sense of agency of the user from an observed state in the environmental world, using the sense of agency estimation model according to any one of claims 1 to 7.

9. A sense of agency estimation method in which a computer estimates a user's sense of agency while using rewards to determine actions regarding states of an environmental world that includes the user, the method comprising:
a step of generating or updating belief information, which is information including the probability that a new state arises as a result of performing a certain action in a certain state under a certain sense of agency level of the user;
a step of receiving value information relating to value for the user and the corresponding reward, generating a desired state, which is a state the user desires, and determining a policy, which is a set of actions that can bring about the desired state;
a step of generating causal relationship information on the causal relationships among states, values, rewards, and actions, based on the belief information, the value information, and the reward;
a step of generating an optimal policy, which is a policy regarded as optimal, based on the policy, the causal relationship information, and the content of communication with the user including predetermined questions, and of using the generated optimal policy to determine and output an action to be taken for an observed state; and
a step of determining or updating the sense of agency level of the user based on the new state caused by the output action and on feature quantities relating to predetermined characteristics of the user under the new state, outputting the determined or updated level, and updating the sense of agency level used in the step of generating or updating the belief information to the determined or updated sense of agency level.

10. A behavior change promotion model that causes a computer to function so as to promote behavior change in a user while using rewards to determine actions regarding states of an environmental world that includes the user, the model causing the computer to function as:
a belief model that generates or updates belief information, which is information including the probability that a new state arises as a result of performing a certain action in a certain state under a certain sense of agency level of the user;
a desire model that receives value information relating to value for the user and the corresponding reward, generates a desired state, which is a state the user desires, and determines a policy, which is a set of actions that can bring about the desired state;
an intention model that generates causal relationship information on the causal relationships among states, values, rewards, and actions, based on the belief information, the value information, and the reward;
a behavior model that generates an optimal policy, which is a policy regarded as optimal, based on the policy, the causal relationship information, and the content of communication with the user including predetermined questions, and that uses the optimal policy to determine and output an action to be taken for an observed state; and
a sense of agency model that determines or updates the sense of agency level of the user based on the new state caused by the output action and on feature quantities relating to predetermined characteristics of the user under the new state, and updates the sense of agency level used in the belief model to the determined or updated sense of agency level.