JP2016224512A

JP2016224512A - Decision support system and decision making support method

Info

Publication number: JP2016224512A
Application number: JP2015107193A
Authority: JP
Inventors: Koji Fukuda; Yasuyuki Kudo; Koichi Tanimoto; Minako Toba
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2015-05-27
Filing date: 2015-05-27
Publication date: 2016-12-28
Anticipated expiration: 2035-05-27
Also published as: JP6511333B2

Abstract

PROBLEM TO BE SOLVED: To model expert knowledge without strain and to predict the future.

SOLUTION: A decision support system has an action chain model, a reaction model, an intention model, an evaluation model, and an action selection unit. Using the action chain model, it derives the next actions a player may take from that player's actions. Using the reaction model, it derives the next index values from the players' actions and the current index values. Using the intention model, it calculates the selection probability of each player's actions from the derived next index values. Using the evaluation model, it calculates, for each player, an evaluation value representing how desirable the derived next index values are. The action selection unit selects and outputs each player's action using the actions derived with the action chain model, the selection probabilities calculated with the intention model, and the evaluation values calculated with the evaluation model.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a decision support system.

Computer systems have been proposed that use expert knowledge to predict future situations and the behavior of the parties involved.

For example, Patent Document 1 (JP-A-5-204991) describes a time-series data search system consisting of a computer, a time-series database, a registered-pattern database, and a terminal device. The system registers a plurality of patterns; reads time-series data from the time-series database and matches it against each registered pattern over every fixed period; analyzes the causal relationships in how the registered patterns appear; and displays the analysis results. The system of Patent Document 1 predicts future trends (actions) on a rule basis, from the results of comparison with the registered patterns.

Patent Document 1: JP-A-5-204991

However, a rule-based prediction system such as the one described in Patent Document 1 uses a chain model: it builds a model containing many rules and simulates what happens next. Such models are difficult to build. The model must account for every event and every party's actions, so organizing experts' knowledge, classifying it, and integrating it into a single model is hard. There is therefore a need for a system that models expert knowledge without strain and predicts the future.

A representative example of the invention disclosed in this application is as follows. A decision support system is configured by a computer having a processor and a memory. The memory stores a plurality of index values, each quantifying one of the situations relevant to decision making. The decision support system has: an action chain model with which the processor derives the next action from a player's action; a reaction model with which the processor derives the next index values from the players' actions and the index values; an intention model with which the processor derives, for each player and from index values, selection probabilities expressing the player's intention to take each action; an evaluation model with which the processor derives, for each player, an evaluation value from index values; and an action selection unit with which the processor selects each player's action. Using the action chain model, the system derives the next actions a player may take from that player's action. Using the reaction model, it derives the next index values from the players' actions and the index values. Using the intention model, it calculates the selection probability of each player's actions from the derived next index values. Using the evaluation model, it calculates an evaluation value representing how desirable the derived next index values are for each player. The action selection unit selects each player's action using the actions derived with the action chain model, the selection probabilities calculated with the intention model, and the evaluation values calculated with the evaluation model, and outputs the selected action.

According to a representative embodiment of the present invention, expert knowledge can be organized easily, and the situations and actions likely to arise in the future can be predicted on the basis of that knowledge. Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1 is a block diagram showing the physical configuration of the decision support system of the first embodiment.
FIG. 2 is a block diagram showing the logical configuration of the decision support system of the first embodiment.
FIG. 3 is a diagram explaining the action chain model of the first embodiment.
FIG. 4 is a diagram explaining the reaction model of the first embodiment.
FIG. 5 is a diagram explaining the intention model of the first embodiment.
FIG. 6 is a diagram explaining the evaluation model of the first embodiment.
FIG. 7 is a flowchart of processing by the decision support system of the first embodiment.
FIG. 8 is a diagram showing an example of the simulation result output screen of the first embodiment.
FIG. 9 is a flowchart of processing by the decision support system of a modification of the first embodiment.
FIG. 10 is a diagram showing an example of the win-loss table output screen of the first embodiment.
FIG. 11 is a diagram explaining the Monte Carlo tree search used to build the win-loss table of the first embodiment.
FIG. 12 is a block diagram showing the logical configuration of the decision support system of the second embodiment.
FIG. 13 is a diagram explaining the mediation model of the second embodiment.

FIG. 1 is a block diagram showing the physical configuration of the decision support system of the first embodiment.

The decision support system of this embodiment includes a plurality of computers (CALC_NODE) and a communication switch (COM_SW) that connects them.

Each computer (CALC_NODE) has a processor (CPU) that executes programs, a temporary storage device (RAM) and an auxiliary storage device (STOR) that store data and programs, and a communication device (COM_DEV) connected to the communication switch (COM_SW). The processor (CPU), temporary storage device (RAM), auxiliary storage device (STOR), and communication device (COM_DEV) are connected by a bus (BUS).

The processor (CPU) executes programs stored in the temporary storage device (RAM). The temporary storage device (RAM) includes a ROM, which is a nonvolatile storage element, and a RAM, which is a volatile storage element. The ROM stores fixed programs (for example, the BIOS). The RAM is a high-speed volatile storage element such as DRAM (Dynamic Random Access Memory) and temporarily stores the programs executed by the processor (CPU) and the data they use while running.

The auxiliary storage device (STOR) is a large-capacity nonvolatile storage device such as a magnetic storage device (HDD) or flash memory (SSD), and stores the programs executed by the processor (CPU) and the data they use while running. That is, a program is read from the auxiliary storage device (STOR), loaded into the temporary storage device (RAM), and executed by the processor (CPU).

The communication device (COM_DEV) is a network interface device that controls communication with other devices via the communication switch (COM_SW) according to a predetermined protocol.

Each computer (CALC_NODE) may have an input interface and an output interface. The input interface receives input from an operator; examples include a mouse, keyboard, touch panel, and microphone. The output interface presents program results in a form the operator can see; examples include a display device and a printer.

A terminal computer having an input interface and an output interface may also be connected to the communication switch (COM_SW).

The programs executed by the processor (CPU) are provided to each computer (CALC_NODE) via removable media (CD-ROM, flash memory, etc.) or a network, and are stored in the nonvolatile auxiliary storage device (STOR), which is a non-transitory storage medium. Each computer (CALC_NODE) should therefore have an interface for reading data from removable media.

Each computer (CALC_NODE) is a computer system that may be built on one physical computer or on a plurality of logically or physically configured computers. It may run in separate threads on the same computer, or on a virtual machine built on a plurality of physical computer resources. The functional units of each computer (CALC_NODE) may also be realized on different computers.

FIG. 2 is a block diagram showing the logical configuration of the decision support system of the first embodiment.

The decision support system of this embodiment is built from four models: an action chain model 1, a reaction model 2, an intention model 4, and an evaluation model 5. Specifically, the decision support system shown in FIG. 2 has the action chain model 1, the reaction model 2, and a plurality of action determination units 3. One action determination unit 3 is provided for each player and contains an intention model 4, an evaluation model 5, and an action selection unit 6.

As shown in FIG. 3, the action chain model 1 is a rule-based simulator that derives, from a player's current action, the actions likely to be executed next. A player is an entity that makes decisions and acts (executes actions) in the world: for example, the public, administrative bodies (ministries and agencies), the Diet, the Cabinet, foreign governments, and the media.

As shown in FIG. 4, the reaction model 2 is a rule-based simulator that derives the next index values from each player's current action and the current index values. An index value quantifies an event occurring in the world (a change in the situation): for example, economic indicators (GDP, stock prices, exchange rates) and opinion-poll results (such as the Cabinet approval rating).

One action determination unit 3 is provided for each player and derives that player's next action.

As shown in FIG. 5, the intention model 4 is a simulator that derives action intentions from the next index values. An action intention is a numerical value expressing how strongly a player intends to execute a given action in a certain situation (represented by a combination of index values). That is, each player is more likely (in expectation) to select actions with larger action-intention values. As shown in FIG. 6, the evaluation model 5 is a simulator that derives evaluation values from the next index values. An evaluation value expresses how desirable a certain situation is for the player.
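The text states only that a player is more likely to choose actions with larger action-intention values. One simple way to turn intention values into selection probabilities is proportional normalization. The sketch below assumes that rule; it is an illustrative choice, not the method specified in this document, and the function name `selection_probabilities` is hypothetical.

```python
def selection_probabilities(intentions: dict[str, float]) -> dict[str, float]:
    """Turn action-intention values into selection probabilities.

    Assumption: probability is proportional to intention, negative values are
    clipped to 0, and a uniform choice is used if no action has positive intention.
    """
    if not intentions:
        return {}
    clipped = {action: max(value, 0.0) for action, value in intentions.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return {action: 1.0 / len(clipped) for action in clipped}
    return {action: value / total for action, value in clipped.items()}


print(selection_probabilities({"action 1": 3.0, "action 2": 1.0}))
# {'action 1': 0.75, 'action 2': 0.25}
```

A softmax over intention values would be a common alternative if intentions can be negative; which rule fits depends on how the experts scale the intention values.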

The action selection unit 6 is a selector that derives the player's next action from the likelihood of each candidate next action, the action intentions, and the evaluation values. For example, the action selection unit 6 selects the player's next action by weighting the actions output by the action chain model 1 with the action intentions output by the intention model 4 and the evaluation values output by the evaluation model 5.
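A minimal sketch of the weighting described above. It assumes the three quantities are combined multiplicatively and that the highest-scoring candidate is chosen; the text says only that candidates are weighted by intention and evaluation, so the multiplicative rule, the argmax choice, the neutral default weight, and the name `select_action` are all assumptions.

```python
def select_action(
    chain_probs: dict[str, float],
    intentions: dict[str, float],
    evaluations: dict[str, float],
) -> str:
    """Pick one player's next action.

    chain_probs: likelihood of each candidate action from the action chain model.
    intentions:  action-intention value per action from the intention model.
    evaluations: evaluation value per action (desirability of the resulting situation).
    Assumption: actions missing from intentions/evaluations get a neutral weight of 1.0.
    """
    if not chain_probs:
        raise ValueError("no candidate actions")

    def score(action: str) -> float:
        return chain_probs[action] * intentions.get(action, 1.0) * evaluations.get(action, 1.0)

    return max(chain_probs, key=score)
```

With chain probabilities 0.3, 0.5, 0.2 and intentions 2.0, 0.5, 1.0, the scores are 0.6, 0.25, 0.2, so the first action wins even though the chain model alone favors the second. Sampling in proportion to the score, instead of taking the maximum, would give a stochastic variant.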

The next action derived by the action selection unit 6 is fed back into the action chain model 1 and used to simulate the following action. Likewise, the next index values derived by the reaction model 2 are fed back into the reaction model 2 and used to simulate the following index values.

FIG. 3 is a diagram explaining the action chain model 1 of the first embodiment. The action chain model 1 is represented as a Markov decision process model whose nodes are the players' actions. Each node is associated with the probability that the player selects that action, and each edge between nodes is associated with the probability of a state transition between them.

In the action chain model 1 shown in FIG. 3, player 1 selects action 1, action 2, and action 3 with probabilities 0.3, 0.5, and 0.2, respectively. When player 1 selects action 1, player 2 selects action 2 with probability 0.8. The probability that player 2 selects action 2 can therefore be expressed by Equation 1:

$$P(\text{player 2 selects action 2}) = 1 - (1 - 0.3)\times(1 - 0.8) = 0.86 \qquad \text{(Equation 1)}$$

The action chain model 1 can thus derive one or more likely next actions from a player's current action.
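The action chain model can be held as node probabilities plus conditional transition probabilities on the edges. The sketch below encodes the FIG. 3 numbers quoted above and evaluates Equation 1 exactly as the text writes it. The dictionary layout and the names `PLAYER1_CHOICE`, `TRANSITIONS`, `likely_next_actions`, and `equation_1` are hypothetical.

```python
# FIG. 3 example: probabilities that player 1 selects each action (node weights).
PLAYER1_CHOICE = {"action 1": 0.3, "action 2": 0.5, "action 3": 0.2}

# Edge weights: (player, action) -> {(next player, next action): probability}.
TRANSITIONS = {
    ("player 1", "action 1"): {("player 2", "action 2"): 0.8},
}


def likely_next_actions(player: str, action: str) -> dict[tuple[str, str], float]:
    """Candidate next (player, action) pairs reachable from the current action."""
    return TRANSITIONS.get((player, action), {})


def equation_1(p_node: float, p_edge: float) -> float:
    """Equation 1 as written in the text: 1 - (1 - p_node) * (1 - p_edge)."""
    return 1.0 - (1.0 - p_node) * (1.0 - p_edge)


p_edge = likely_next_actions("player 1", "action 1")[("player 2", "action 2")]
p = equation_1(PLAYER1_CHOICE["action 1"], p_edge)
print(round(p, 2))  # 0.86
```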

FIG. 4 is a diagram explaining the reaction model 2 of the first embodiment. The reaction model 2 expresses the correlations between indices as a causal loop diagram.

As shown in FIG. 4, for example, the reaction model 2 can be represented as a graphical model whose nodes are indices and whose edges connect the nodes. A solid arrow on an edge indicates a positive correlation, and a broken arrow indicates a negative correlation. Assigning a coefficient to each edge turns the diagram into a system dynamics model describing the behavior of each index. An edge's coefficient is defined as the ratio of the change in the target index to the change in the source index. For example, if index 3 and index 2 are positively correlated with an edge coefficient of 0.5, index 2 increases by 0.5 when index 3 increases by 1. If index 3 and index 5 are negatively correlated with an edge coefficient of 1.2, index 5 decreases by 1.2 when index 3 increases by 1.

In a discrete-time simulation, the reaction model 2 can be represented in the computer as a recurrence relation; in a continuous-time simulation, it can be represented as first-order differential equations.
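For a discrete-time simulation, one step of the recurrence can apply a set of changes and propagate them along the signed, weighted edges. The sketch assumes a single-hop update per step (further hops occur in later steps if the induced changes are fed back as the next step's changes); the edge tuple layout and the names `EDGES`, `propagate`, and `step` are hypothetical. It reproduces the two FIG. 4 examples given above.

```python
# Edge: (source, target, sign, coefficient). sign=+1 for a solid arrow
# (positive correlation), -1 for a broken arrow (negative correlation).
Edge = tuple[str, str, int, float]

EDGES: list[Edge] = [
    ("index 3", "index 2", +1, 0.5),
    ("index 3", "index 5", -1, 1.2),
]


def propagate(deltas: dict[str, float], edges: list[Edge] = EDGES) -> dict[str, float]:
    """Changes induced in target indices by the given source changes (one hop)."""
    induced: dict[str, float] = {}
    for source, target, sign, coef in edges:
        if source in deltas:
            induced[target] = induced.get(target, 0.0) + sign * coef * deltas[source]
    return induced


def step(values: dict[str, float], deltas: dict[str, float]) -> dict[str, float]:
    """x_{t+1} = x_t + applied change + induced change (discrete recurrence)."""
    induced = propagate(deltas)
    keys = set(values) | set(deltas) | set(induced)
    return {k: values.get(k, 0.0) + deltas.get(k, 0.0) + induced.get(k, 0.0) for k in keys}


print(propagate({"index 3": 1.0}))  # {'index 2': 0.5, 'index 5': -1.2}
```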

Indices include stock elements and flow elements. A stock element represents a quantity at a point in time, such as a crude-oil stockpile. A flow element represents the flow of a variable over a time interval, such as crude-oil imports (production) or consumption. Whether an index is treated as a stock or a flow should follow how that index is generally used in the world. Quantities that cannot actually be measured numerically (for example, risk impact or nationalism) may also be used as indices. By mixing stock and flow elements in the reaction model 2 in this way, experts' thinking can be modeled as-is, without imposing constraints.
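In system dynamics, a stock accumulates its flows over time. The sketch below uses the crude-oil example: the stockpile (a stock) changes each step by imports minus consumption (flows). The Euler update, the unit time step, and the numbers are illustrative assumptions, and `update_stock` is a hypothetical name.

```python
def update_stock(stock: float, inflows: list[float], outflows: list[float], dt: float = 1.0) -> float:
    """Euler step for a stock: stock(t + dt) = stock(t) + dt * (sum(inflows) - sum(outflows))."""
    return stock + dt * (sum(inflows) - sum(outflows))


# Crude-oil stockpile over three steps with constant import and consumption flows.
stockpile = 100.0
for _ in range(3):
    stockpile = update_stock(stockpile, inflows=[12.0], outflows=[10.0])
print(stockpile)  # 106.0
```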

FIG. 5 is a diagram explaining the intention model 4 of the first embodiment. The intention model 4 expresses the correlations between indices as a causal loop diagram and additionally expresses the correlations between each player's actions and the indices.

The intention model 4 can be represented by the same kind of model as the reaction model 2 described above: for example, a graphical model whose nodes are indices and whose edges connect them. A solid arrow on an edge indicates a positive correlation, and a broken arrow a negative one. Assigning a coefficient to each edge yields a system dynamics model describing the behavior of each index, with each coefficient defined as the ratio between the changes in the indices it connects.

In the intention model 4, indices may be correlated with one another, but between players' actions and indices only edges from an action to an index are defined; edges in the reverse direction are not. No edges are defined between actions either. The intention model 4 thus models how each player's actions affect the indices.

In a discrete-time simulation, the intention model 4 can be represented in the computer as a recurrence relation; in a continuous-time simulation, as first-order differential equations. As in the reaction model 2, the indices of the intention model 4 include stock elements and flow elements.

FIG. 6 is a diagram explaining the evaluation model 5 of the first embodiment. The evaluation model 5 uses a causal loop diagram to express the correlations between indices, between each player's actions and the indices, between the indices and each player's evaluation, and between the indices and each player's intention to act. An evaluation is a numerical value expressing how desirable each player finds the situation represented by a combination of indices. The evaluation may take a different value for each combination of index value ranges. An action intention is represented as a set of pairs, each consisting of an action the player can take and the probability of selecting it.

The evaluation model 5 can be represented by the same kind of model as the reaction model 2 described above: for example, a graphical model whose nodes are indices and whose edges connect them. A solid arrow on an edge indicates a positive correlation, and a broken arrow a negative one. Assigning a coefficient to each edge yields a system dynamics model describing the behavior of each index, with each coefficient defined as the ratio between the changes in the indices it connects.

In the evaluation model 5, indices may be correlated with one another, but between players' actions and indices only edges from an action to an index are defined, not the reverse, and no edges are defined between actions. Between the indices and each player's evaluation, only edges from an index to the evaluation are defined. Between the indices and each player's action intention, only edges from an index to the intention are defined, and no edges are defined between intentions. The evaluation model 5 thus models how each index affects the strength of each player's intention to act, and determines the evaluation value of each player's actions.
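The direction rules above can be checked mechanically before a model is simulated. The sketch assumes each node is tagged with a kind (`index`, `action`, `evaluation`, or `intention`) and flags every directed edge whose (source kind, target kind) pair the text does not allow. The kind labels and the names `ALLOWED` and `invalid_edges` are hypothetical.

```python
# Allowed (source kind, target kind) pairs, following the evaluation model's
# direction rules; the intention model uses only the first two.
ALLOWED: set[tuple[str, str]] = {
    ("index", "index"),        # indices may influence each other
    ("action", "index"),       # actions affect indices, never the reverse
    ("index", "evaluation"),   # indices drive a player's evaluation
    ("index", "intention"),    # indices drive a player's intention to act
}


def invalid_edges(kinds: dict[str, str], edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every directed edge whose (source kind, target kind) is not allowed."""
    return [(src, dst) for src, dst in edges if (kinds[src], kinds[dst]) not in ALLOWED]
```

Action-to-action and intention-to-intention edges are rejected because neither pair appears in `ALLOWED`, matching the rule that no edges are defined between actions or between intentions.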

In a discrete-time simulation, the evaluation model 5 can be represented in the computer as a recurrence relation; in a continuous-time simulation, as first-order differential equations. As in the reaction model 2, the indices of the evaluation model 5 include stock elements and flow elements.

The evaluation model 5 contains the intention model 4, and the intention model 4 contains the reaction model 2. The reaction model 2, the intention model 4, and the evaluation model 5 may therefore be implemented as a single model divided into logical parts.

In the reaction model 2, the intention model 4, and the evaluation model 5, the events represented by the nodes are all related to some degree, so edges could be defined between almost every pair of nodes. Defining edges between all nodes, however, makes the model complex, so the model should be built only from highly correlated edges (for example, edges whose coefficients exceed a predetermined threshold).
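A sketch of that pruning rule, assuming each edge stores its coefficient magnitude separately from its sign (solid or broken arrow) and that the threshold is the modeler's choice. The tuple layout, the example threshold, and the name `prune_edges` are hypothetical.

```python
Edge = tuple[str, str, int, float]  # (source, target, sign, coefficient magnitude)


def prune_edges(edges: list[Edge], threshold: float) -> list[Edge]:
    """Keep only edges whose coefficient exceeds the threshold."""
    return [edge for edge in edges if edge[3] > threshold]


edges = [("a", "b", +1, 0.5), ("a", "c", -1, 1.2), ("b", "c", +1, 0.05)]
print(prune_edges(edges, threshold=0.1))  # drops the weak b -> c edge
```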

FIG. 7 is a flowchart of processing by the decision support system of the first embodiment.

First, when the current state and the simulation period are entered through the input interface (S101), the loop control parameter t is initialized to the simulation start time, and the simulation end time t_end is set. The current state entered includes each player's current action and the current value of each index.

Next, the system determines whether t is less than t_end (S102). If it is, processing proceeds to steps S103 and S105. If t is greater than or equal to t_end, the simulation results for the specified period have been obtained, so processing ends and the simulation result output screen (FIG. 8) is displayed.

In step S103, the action chain model 1 is run to derive the likely next actions from the players' current actions, and the results are stored in the auxiliary storage device (STOR) (S104). In step S105, the reaction model 2 is run to derive the next index values from each player's current action and the current index values, and the results are stored in the auxiliary storage device (STOR) (S106).

Steps S103 to S106 can be executed in parallel, but the process that runs the action chain model 1 (S103) and the process that runs the reaction model 2 (S105) may also be executed sequentially.
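Because S103 and S105 both read only the current actions and index values, they can run concurrently. A sketch using Python's standard `concurrent.futures`; `run_action_chain` and `run_reaction` are hypothetical stand-ins for the two models, not their real logic.

```python
from concurrent.futures import ThreadPoolExecutor


def run_action_chain(actions: dict[str, str]) -> dict[str, list[str]]:
    # Hypothetical stand-in for S103: candidate next actions per player.
    return {player: [f"after {action}"] for player, action in actions.items()}


def run_reaction(actions: dict[str, str], indices: dict[str, float]) -> dict[str, float]:
    # Hypothetical stand-in for S105: next index values (here simply unchanged).
    return dict(indices)


def derive_next(actions: dict[str, str], indices: dict[str, float]):
    """Run S103 and S105 in parallel and return (candidate actions, next indices)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        chain_future = pool.submit(run_action_chain, actions)
        reaction_future = pool.submit(run_reaction, actions, indices)
        return chain_future.result(), reaction_future.result()
```

Threads suit models that wait on I/O; for CPU-bound models, `ProcessPoolExecutor` is the usual drop-in replacement, provided the model functions are defined at module level so they can be pickled.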

Next, the intention model 4 and the evaluation model 5 are run for every player (S107, S109). Steps S107 to S110 can be executed in parallel, but the process that runs the intention model 4 (S107) and the process that runs the evaluation model 5 (S109) may also be executed sequentially.

In step S107, the intention model 4 is run to derive each player's action intentions from the next index values, which are stored in the auxiliary storage device (STOR) (S108). In step S109, the evaluation model 5 is run to derive evaluation values from the next index values, which are stored in the auxiliary storage device (STOR) (S110).

The action selection unit 6 then determines the player's next action, taking into account the likelihood of each candidate next action, the action intentions, and the evaluation values, and stores it in the auxiliary storage device (STOR) (S111).

After the next actions of all players have been determined, the next index values output by the reaction model 2 become the current index values, and the reaction model 2 advances one time step (S112). Then 1 is added to the loop control parameter t (S113), and processing returns to step S102. The unit added to t is the simulation time interval, which the operator should set in advance (for example, one day).

Through the above processing, each player's actions over the simulation period can be derived.
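The FIG. 7 flowchart can be summarized as the loop below. It is a sketch, not the patented implementation: the four models are passed in as plain callables with hypothetical signatures, results are kept in memory instead of being written to auxiliary storage, and the S111 rule (likelihood times intention times evaluation, then argmax) is an assumption.

```python
from typing import Callable

Actions = dict[str, str]                   # player -> current action
Indices = dict[str, float]                 # index name -> value
Candidates = dict[str, dict[str, float]]   # player -> {candidate action: likelihood}


def simulate(
    actions: Actions,
    indices: Indices,
    t_start: int,
    t_end: int,
    action_chain: Callable[[Actions], Candidates],
    reaction: Callable[[Actions, Indices], Indices],
    intention: Callable[[str, Indices], dict[str, float]],
    evaluation: Callable[[str, Indices], dict[str, float]],
    dt: int = 1,
) -> list[Actions]:
    """Return the actions chosen at each step from t_start until t reaches t_end."""
    history: list[Actions] = []
    t = t_start                                          # S101
    while t < t_end:                                     # S102
        candidates = action_chain(actions)               # S103-S104
        next_indices = reaction(actions, indices)        # S105-S106
        chosen: Actions = {}
        for player, options in candidates.items():
            if not options:
                continue
            will = intention(player, next_indices)       # S107-S108
            value = evaluation(player, next_indices)     # S109-S110

            def score(action: str) -> float:
                # S111 (assumed rule): likelihood x intention x evaluation.
                return options[action] * will.get(action, 1.0) * value.get(action, 1.0)

            chosen[player] = max(options, key=score)
        history.append(chosen)
        actions, indices = chosen, next_indices          # S112
        t += dt                                          # S113
    return history
```

The returned history is the per-player, per-time table that the simulation result output screen presents.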

FIG. 8 is a diagram showing an example of the simulation result output screen 1000 of the first embodiment. The simulation result output screen 1000 shows the actions each player selects over time and is displayed on the output interface (display device). For example, as shown in FIG. 8, each player's action at each time is displayed in a table with players listed vertically and simulation times listed horizontally. The simulation result output screen 1000 lets the user follow, in time order, the actions each player takes.

Next, a modification of the first embodiment will be described. In this modification, each player's action is selected using a star chart, that is, a table recording each player's evaluation for every combination of the players' actions.

FIG. 9 is a flowchart of the processing performed by the decision support system according to the modification of the first embodiment.

First, when the current situation and the simulation period are entered through the input interface (S121), the loop control parameter t is initialized to the simulation start time and t_end is set to the simulation end time. The current situation entered includes each player's current action and the current value of each index.

Next, it is determined whether t is smaller than t_end (S122). If t is smaller than t_end, the process proceeds to steps S103 and S105. If t is equal to or greater than t_end, the simulation results for the specified period have been obtained, so the processing ends and the simulation result output screen (FIG. 8) is output.

Next, all combinations of the action options available to all players are enumerated (S123). For every enumerated combination, the action chain model 1 and the reaction model 2 are driven up to time t + α to derive each player's next action and the next index values, which are stored in the auxiliary storage device (STOR) (S124). Here, α is the look-ahead horizon: how far into the future, measured from time t, predictions are taken into account when computing the score of each action recorded in the star chart.

Thereafter, the intention model 4 and the evaluation model 5 are driven for all players to derive each player's action intention and evaluation value, which are stored in the auxiliary storage device (STOR) (S125).

Thereafter, a star chart is created from the evaluation values at time t + α (S126), and the star chart is used to determine each player's action at time t, which is stored in the auxiliary storage device (STOR) (S127). The star chart is a table that records, for each combination of actions, a player's own evaluation alongside the evaluations of the other players; as described later, it is used to select an appropriate action.
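Steps S123 to S126 amount to filling a table with one entry per joint action. The Python sketch below illustrates that enumeration with `itertools.product`. It is a minimal sketch under stated assumptions, not the patent's implementation: `build_star_chart` is a hypothetical name, and the `lookahead` callable is a placeholder for driving the action chain, reaction, intention and evaluation models from t to t + α.

```python
from itertools import product


def build_star_chart(options, lookahead):
    """Enumerate every joint action (S123) and record each player's evaluation.

    options:   {player: [selectable actions]}.
    lookahead: hypothetical callable taking {player: action} and returning
               {player: evaluation value at time t + alpha}; it stands in for
               driving the models up to t + alpha (S124-S125).
    Returns (players, chart): chart maps an action tuple ordered like
    `players` to a tuple of evaluations in the same player order (S126).
    """
    players = list(options)
    chart = {}
    for combo in product(*(options[p] for p in players)):
        values = lookahead(dict(zip(players, combo)))
        chart[combo] = tuple(values[p] for p in players)
    return players, chart


# Usage with a made-up look-ahead: a player scores 1.0 when it plays "press".
players, chart = build_star_chart(
    {"P1": ["hold", "press"], "P2": ["hold", "press"]},
    lambda prof: {p: (1.0 if a == "press" else 0.0) for p, a in prof.items()},
)
```

The chart has one entry per element of the Cartesian product of the option lists, so its size grows multiplicatively with the number of players; this is the cost that the Monte Carlo tree search described later is meant to reduce.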

After the next actions of all players have been determined, the next index values output by the reaction model 2 are set as the current index values, and the time of the reaction model 2 is advanced by one step (S128). The loop control parameter t is then incremented by 1 (S129), and the process returns to step S122.

Through the above processing, each player's actions during the simulation period can be simulated while taking the actions of the other players into account.

FIG. 10 shows an example of the star chart output screen 1100 according to the first embodiment. The star chart output screen is displayed on the output interface (display device) when an action cell is selected on the simulation result output screen 1000 (FIG. 8). Because a star chart represents the relationship between the actions of two players, the operator selects the opposing player after selecting the action cell on the simulation result output screen 1000.

Inside the computer, the star chart is preferably constructed using the Monte Carlo tree shown in FIG. 11.

First, the contents of the screen are described. The star chart output screen 1100 shown in FIG. 10 shows the relationship between two players and includes a star chart 1110 in which player 1's actions are listed vertically and player 2's actions are listed horizontally. A "Return" button 1120 is provided at the bottom of the screen; by operating it, the operator returns to the simulation result output screen 1000.

Next, the contents of the star chart are described. In the star chart shown in FIG. 10, each cell corresponds to a pair of one action of player 1 and one action of player 2, and records player 1's evaluation and player 2's evaluation as a pair. The evaluations may be expressed as symbols, as in FIG. 10, or as numerical values. The star chart therefore makes it possible to choose an action based on the combined evaluation values of several players.

Next, the method of selecting an action with the star chart is described for the case in which oneself is player 1 and the opponent is player 2. The action is determined by the MinMax method so that the largest conceivable loss is minimized.

For example, with two players, each row of the star chart (a row in which one's own action is fixed) is examined: within the row, the cell in which the opponent (player 2) obtains its best evaluation value is found, and player 1's evaluation value in that cell is taken as the row's MinMax evaluation value. The action whose row has the best MinMax evaluation value is chosen as player 1's next action. In the illustrated example, player 1's action is determined to be action 2.
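The row-wise rule above can be sketched in Python as follows. This is an illustration under assumptions, not the patent's code: the star chart is assumed to be a dictionary keyed by (player 1 action, player 2 action) whose value is a pair of numeric evaluations where larger is better, and `minmax_two_player` is a hypothetical name. The chart values in the usage lines are made up.

```python
def minmax_two_player(chart, my_actions, opp_actions):
    """Pick player 1's action from a two-player star chart.

    chart maps (my_action, opponent_action) -> (my_value, opponent_value).
    For each row (my action fixed), the opponent is assumed to reply with
    the column that is best for the opponent; my value in that cell is the
    row's MinMax evaluation value, and the row with the best value wins.
    """
    best_action, best_value = None, None
    for mine in my_actions:
        # Opponent's reply: the column with the opponent's best evaluation.
        reply = max(opp_actions, key=lambda theirs: chart[(mine, theirs)][1])
        row_value = chart[(mine, reply)][0]
        if best_value is None or row_value > best_value:
            best_action, best_value = mine, row_value
    return best_action, best_value


# Made-up chart: (player 1 value, player 2 value) for each pair of actions.
chart = {
    ("action1", "action1"): (5, 1), ("action1", "action2"): (1, 4),
    ("action2", "action1"): (2, 5), ("action2", "action2"): (3, 2),
}
acts = ["action1", "action2"]
print(minmax_two_player(chart, acts, acts))  # ('action2', 2)
```

The made-up chart shows why this differs from maximizing one's own value alone: the row containing player 1's single best cell (5) is not chosen, because player 2's best reply in that row leaves player 1 with only 1.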

When there are three or more players, then for each of one's own (player 1's) actions, the actions of the other players are fixed in turn (each player being assumed to choose the action that is best by its own evaluation), and the action that yields one's own highest evaluation value is selected. This is described concretely below.

With three players, the action is determined by the following steps.

Step 1: Set one's own (player 1's) action to action 1.

In that state, consider a two-player game between player 2 and player 3.
Step 1-1: Taking player 2 as the self player, determine player 2's action by the two-player method described above (MinMax using the star chart).
Step 1-2: Taking player 3 as the self player, determine player 3's action by the two-player method described above (MinMax using the star chart).

For the combination of actions determined above (player 1 = action 1, player 2 = the action determined in step 1-1, player 3 = the action determined in step 1-2), one's own (player 1's) evaluation value is taken as the evaluation value of player 1's action 1.

Step 2: Set one's own (player 1's) action to action 2.

The evaluation value of player 1's action 2 is then determined in the same way as above.

Step 3: Repeat the calculation of step 1 for every action available to oneself (player 1), and choose the action with the best evaluation value for player 1.

With four players, the action is determined by the following steps.

Step 4: Fix one's own (player 1's) action. This leaves a three-player game among players 2 to 4.
Step 4-1: In the three-player game, taking player 2 as the self player, determine player 2's action by the three-player method described above.
Step 4-2: In the three-player game, taking player 3 as the self player, determine player 3's action by the three-player method described above.
Step 4-3: In the three-player game, taking player 4 as the self player, determine player 4's action by the three-player method described above.

For the combination of actions determined above (player 1 = action 1, players 2 to 4 = the actions determined in steps 4-1 to 4-3), one's own (player 1's) evaluation value is taken as the evaluation value of player 1's action 1.

Step 5: Repeat the calculation of step 4 for every action available to oneself (player 1), and choose the action with the best evaluation value for player 1.

FIG. 11 illustrates the Monte Carlo tree search used to construct the star chart of the first embodiment.

The method described above with reference to FIG. 10 evaluates every combination of all players' actions, so the amount of computation is large (with N players each having m actions, there are m^N combinations). Using Monte Carlo tree search (MCTS) instead, the same processing can be approximated with far less computation. If the Monte Carlo tree computation were repeated infinitely many times, it would yield the same result as evaluating every combination of all players' actions.

The Monte Carlo tree search shown in FIG. 11 computes the evaluation value of one's own (player 1's) action 1 in a four-player game (players 1 to 4) in which each player can choose either action 1 or action 2.

First, oneself (player 1) selects action 1. Next, one of players 2 to 4 is selected at random (with equal probability); the following describes the case in which player 2 is selected. Among the child nodes, player 2 selects the action whose average evaluation value for player 2 is highest.

Next, either player 3 or player 4 is selected at random (with equal probability); the following describes the case in which player 3 is selected. Among the child nodes, player 3 selects the action whose average evaluation value for player 3 is highest. Finally, player 4 selects, among the child nodes, the action whose average evaluation value for player 4 is highest.

Through the above processing, the set of actions of all players at time X + 0 is determined.

To look further ahead, the following processing is additionally executed.

First, one of players 1 to 4 is selected at random (with equal probability), and among the child nodes the action with the highest evaluation value for the selected player is chosen. The remaining players are then likewise selected at random one at a time, and each chooses its action in the same manner. After the tree has been expanded to some depth, a random playout is performed: all players choose actions at random and time is advanced. When a predetermined number of look-ahead steps has been reached, each player's evaluation value is calculated. Finally, the tree nodes visited so far are traced back in reverse order; at each node, the evaluation value obtained from the look-ahead is added to the node's accumulated evaluation values, the average is recomputed, and the node's evaluation value is updated.

After the above processing has been repeated on the order of several hundred times, player 1's evaluation value attached to the root node is taken as the evaluation value of player 1's action 1.

The above processing is carried out for each action that oneself (player 1) can take, and the action with the best evaluation value for player 1 is selected.
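A simplified Python sketch of this search is shown below. It is not the patent's implementation and makes several labeled assumptions: the tree is expanded only over the other players' decisions at time X + 0 (player 1's action fixed at the root), later time steps are covered only by the random playout, unvisited child actions are tried before the highest-average rule is applied (an average is undefined for an unvisited node), and the `step` and `evaluate` callables are hypothetical stand-ins for the action chain, reaction and evaluation models. All function names are illustrative.

```python
import random
from collections import defaultdict


def mcts_value(players, actions, my_action, initial_state, step, evaluate,
               horizon=3, iterations=300, seed=0):
    """Estimate players[0]'s evaluation value for committing to my_action.

    players:       player ids; players[0] is "self" (player 1 in the text).
    actions:       {player: [selectable actions]}.
    initial_state: state at time X (hypothetical representation).
    step:          hypothetical (state, {player: action}) -> next state.
    evaluate:      hypothetical state -> {player: value}.
    """
    rng = random.Random(seed)
    me, others = players[0], list(players[1:])
    visits = defaultdict(int)                         # node -> visit count
    totals = defaultdict(lambda: defaultdict(float))  # node -> player -> sum
    root = ((me, my_action),)                         # self's action is fixed

    for _ in range(iterations):
        path, profile = [root], {me: my_action}
        order = others[:]
        rng.shuffle(order)        # remaining players decide in random order
        for p in order:
            children = [path[-1] + ((p, a),) for a in actions[p]]
            untried = [c for c in children if visits[c] == 0]
            if untried:
                child = rng.choice(untried)  # assumption: try unvisited first
            else:
                # Player p picks the child with its highest mean evaluation.
                child = max(children, key=lambda c: totals[c][p] / visits[c])
            path.append(child)
            profile[p] = child[-1][1]
        state = step(initial_state, profile)          # time X + 0
        for _ in range(horizon):                      # random playout
            state = step(state, {p: rng.choice(actions[p]) for p in players})
        values = evaluate(state)
        for node in path:                             # backpropagation
            visits[node] += 1
            for p, v in values.items():
                totals[node][p] += v

    return totals[root][me] / visits[root]


def choose_by_mcts(players, actions, **kwargs):
    """Run mcts_value for each of self's actions and keep the best one."""
    scores = {a: mcts_value(players, actions, a, **kwargs)
              for a in actions[players[0]]}
    return max(scores, key=scores.get)
```

A usage sketch with made-up toy dynamics: four players with actions 0 and 1, where `step` adds 10 to a player's score whenever that player plays 1 and `evaluate` returns the scores. Here `choose_by_mcts([1, 2, 3, 4], {p: [0, 1] for p in [1, 2, 3, 4]}, initial_state={p: 0 for p in [1, 2, 3, 4]}, step=..., evaluate=...)` should prefer action 1 for player 1. Adding an exploration bonus (for example UCB1) to the selection rule would be the usual refinement, but the text only describes picking the highest average.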

In this way, the star chart lets the user see why a particular action was derived.

As described above, according to the first embodiment of the present invention, the decision support system is built from the action chain model 1, the reaction model 2, the intention model 4 and the evaluation model 5, so expert knowledge can easily be organized and modeled. Future situations and actions can therefore be predicted on the basis of expert knowledge. In particular, because the intention model 4 and the evaluation model 5 are modeled separately, motivating factors and restraining factors can be kept apart, and expert knowledge can be incorporated into the models without having to be reworked.

<Second embodiment>
Next, a second embodiment of the present invention will be described. The decision support system of the second embodiment is composed of five models: the action chain model 1, the reaction model 2, the intention model 4, the evaluation model 5 and the arbitration model 7. In the description of the second embodiment, configurations and processing identical to those of the first embodiment are omitted, and only the differences are described.

FIG. 12 is a block diagram showing the logical configuration of the decision support system according to the second embodiment.

The decision support system of the second embodiment includes the action chain model 1, the reaction model 2, a plurality of action determination units 3 and the arbitration model 7. One action determination unit 3 is provided for each player and includes an intention model 4, an evaluation model 5 and an action selection unit 6.

The action chain model 1, the reaction model 2, the intention model 4, the evaluation model 5 and the action selection unit 6 are the same as in the first embodiment. However, the action determination unit 3 of the second embodiment outputs, for each player, several candidate actions together with their selection rates.

The arbitration model 7 arbitrates among the actions output by the plurality of action determination units 3 and determines each player's action. For example, some of the actions the players can take conflict with one another. Using such relationships, the arbitration model 7 selects a combination of actions that can be carried out at the same time and determines each player's action accordingly.

Specifically, the arbitration model 7 calculates the selection rates of the multiple actions output by the action selection units 6, selects for each player the action with the highest selection rate, and adopts it as that player's action.
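One way to combine the two ideas above (pick the highest selection rate per player, but never a combination of conflicting actions) is sketched below in Python. This is a hypothetical stand-in, not the patent's arbitration model, which is defined through the causal loop diagram described next: the conflict representation as pairs of (player, action), scoring a combination by the product of selection rates, and the name `arbitrate` are all assumptions. When no conflicts are listed and all rates are positive, the product is maximized by each player's individually highest rate, which matches the rule stated above.

```python
from itertools import product
from math import prod


def arbitrate(candidates, conflicts):
    """Pick one action per player from mutually compatible candidates.

    candidates: {player: {action: selection rate}}.
    conflicts:  set of frozensets {(player, action), (player, action)} that
                cannot be carried out at the same time (assumed format).
    Returns the compatible combination with the highest product of
    selection rates, or None if every combination contains a conflict.
    """
    players = list(candidates)
    best, best_score = None, -1.0
    for combo in product(*(candidates[p] for p in players)):
        chosen = list(zip(players, combo))
        clash = any(frozenset((a, b)) in conflicts
                    for i, a in enumerate(chosen) for b in chosen[i + 1:])
        if clash:
            continue
        score = prod(candidates[p][act] for p, act in chosen)
        if score > best_score:
            best, best_score = dict(chosen), score
    return best


# Made-up candidates: both players would most like to "attack",
# but the two attacks are declared mutually exclusive.
cands = {"A": {"attack": 0.6, "negotiate": 0.4},
         "B": {"attack": 0.7, "retreat": 0.3}}
clashes = {frozenset({("A", "attack"), ("B", "attack")})}
print(arbitrate(cands, clashes))  # {'A': 'negotiate', 'B': 'attack'}
```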

FIG. 13 illustrates the arbitration model 7 of the second embodiment. The arbitration model 7 represents the relationships among the indices (stock elements, flow elements and state elements) as a causal loop diagram.

For example, in the illustrated arbitration model 7, stock elements 1, 2, 3 and 4 are denoted $x_1$, $x_2$, $x_3$ and $x_4$, state elements 1 and 2 are denoted $y_1$ and $y_2$, and a coefficient ($k_1$ to $k_9$) is assigned to each edge. A state element is a number representing the current state; for example, it can be set to 1 when player 1 is currently performing action 1 and to 0 when player 1 is performing any other action.

As shown in the figure, the stock elements and state elements determine the amount of flow into and out of each stock element, so the quantity held in each stock element changes over time.

With these definitions, in a discrete-time simulation the value of each stock element at time $t+1$ can be calculated by the following recurrence:

$$
\begin{aligned}
x_1(t+1) &= k_1 y_1(t) + k_2 x_4(t) - k_3 x_3(t)\\
x_2(t+1) &= x_1(t) + k_3 x_3(t) - k_4 x_4(t)\\
x_3(t+1) &= k_5 x_4(t) - k_6 x_1(t)\\
x_4(t+1) &= k_7 x_1(t) - k_8 x_3(t) - k_9 y_2(t)
\end{aligned}
$$

In a continuous-time simulation, the value of each stock element can be calculated from the following differential equations:

$$
\begin{aligned}
\frac{d x_1(t)}{dt} &= k_1 y_1(t) + k_2 x_4(t) - k_3 x_3(t)\\
\frac{d x_2(t)}{dt} &= x_1(t) + k_3 x_3(t) - k_4 x_4(t)\\
\frac{d x_3(t)}{dt} &= k_5 x_4(t) - k_6 x_1(t)\\
\frac{d x_4(t)}{dt} &= k_7 x_1(t) - k_8 x_3(t) - k_9 y_2(t)
\end{aligned}
$$

In the arbitration model 7, the flow amounts are controlled by the stock and state elements, so that multiple stock elements are controlled in relation to one another. This determines the probability (selection rate) with which each player selects an action and thereby arbitrates among the players' actions.

As described above, according to the second embodiment of the present invention, the arbitration model 7 arbitrates among the actions of the players selected by the action selection units 6 and determines each player's action. Models 1, 2, 4 and 5 can therefore be created without considering how the actions of multiple players are to be arbitrated; that is, the models can be built independently of action arbitration.

The present invention is not limited to the embodiments described above and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the embodiments above have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the elements described. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. Furthermore, for part of the configuration of each embodiment, other configurations may be added, deleted or substituted.

Each of the configurations, functions, processing units, processing means and the like described above may be realized partly or entirely in hardware, for example by designing them as integrated circuits, or in software, by having a processor interpret and execute programs that implement the respective functions.

Information such as the programs, tables and files that implement each function can be stored in a storage device such as a memory, a hard disk or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card or a DVD.

The control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines required in an actual implementation are necessarily shown. In practice, almost all components may be considered to be interconnected.

CALC_NODE computer
CPU processor
RAM temporary storage device
STOR auxiliary storage device
COM_SW communication switch
COM_DEV communication device
1 action chain model
2 reaction model
3 action determination unit
4 intention model
5 evaluation model
6 action selection unit
7 arbitration model

Claims

A decision support system comprising a computer having a processor and a memory,
The memory stores a plurality of index values obtained by quantifying a plurality of situations necessary for decision making,
The decision support system includes:
An action chain model for the processor to derive a next action from a player action;
A reaction model for the processor to derive a next index value from a player's action and the index value;
An intention model for the processor to derive, for each player, a selection probability representing the intention of the action from the index value;
An evaluation model for the processor to derive an evaluation value from the index value for each player;
An action selection unit with which the processor selects a player's action;
Wherein the decision support system:
Uses the action chain model to derive, from a player's action, the next action that the player can take;
Uses the reaction model to derive the next index value from the player's action and the index value;
Uses the intention model to calculate the selection probability of each player's action from the derived next index value;
Uses the evaluation model to calculate, for each player, an evaluation value representing the degree to which the derived next index value is desirable; and
Wherein the action selection unit selects each player's action using the action derived with the action chain model, the selection probability calculated with the intention model and the evaluation value calculated with the evaluation model, and outputs the selected action.

The decision support system according to claim 1,
The processor has an arbitration model for arbitrating the actions of each selected player to determine the actions of each player;
The action selection unit selects a plurality of actions that each player can take, and outputs probabilities of the selected plurality of actions.
The decision support system uses the arbitration model to arbitrate the actions of the players selected by the action selection unit to determine the actions of the players.

The decision support system according to claim 1,
The action selection unit outputs screen data for displaying the actions of the selected players in time series.

The decision support system according to claim 1,
The action selection unit outputs screen data for displaying an evaluation of each player for a set of actions of the plurality of players.

A decision support method executed by a computer having a processor and a memory,
The memory stores a plurality of index values obtained by quantifying a plurality of situations necessary for decision making,
The computer includes an action chain model for deriving a next action from a player's action, a reaction model for deriving a next index value from the player's action and the index value, an intention model for deriving, for each player, a selection probability representing the intention of an action from the index value, and an evaluation model for deriving an evaluation value for each player from the index value,
The method comprises:
The processor derives the next action that the player can take from the action of the player using the action chain model, and stores it in the memory;
The processor uses the reaction model to derive the next index value from the player action and the index value, and stores the next index value in the memory,
The processor calculates the selection probability of each player's action from the derived next index value using the intention model, and stores it in the memory;
The processor uses the evaluation model to calculate an evaluation value representing the degree to which the derived next index value is desirable for each player, and stores the evaluation value in the memory;
The processor selects each player's action using the action derived with the action chain model, the selection probability calculated with the intention model and the evaluation value calculated with the evaluation model, and stores the selected action in the memory.

The decision support method according to claim 5,
The computer has an arbitration model for arbitrating the actions of each selected player to determine the actions of each player;
The processor selects a plurality of actions that each player can take, and outputs probabilities of the selected plurality of actions.
A decision support method, wherein the processor arbitrates the action of each selected player using the arbitration model, and determines the action of each player.

The decision support method according to claim 5,
The processor outputs screen data for displaying the action of each selected player in time series.

The decision support method according to claim 5,
A decision support method, wherein the processor outputs screen data for displaying an evaluation of each player for a set of actions of the plurality of players.