JPWO2016121053A1 - Computer system and graphical model management method

Info

Publication number: JPWO2016121053A1
Application number: JP2016571597A
Authority: JP
Inventors: ヨウショウ; 利昇三好; 泰隆長谷川; 伴　秀行
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2015-01-29
Filing date: 2015-01-29
Publication date: 2017-12-07
Anticipated expiration: 2035-01-29
Also published as: WO2016121053A1; JP6422512B2

Abstract

A computer system comprising a computer that has an arithmetic device and a memory. The system includes a graphical model construction unit that constructs a graphical model using learning data, and an inference unit that accepts input of evidence data and calculates an inference result, which is the distribution of probability values that an inference target node takes predetermined state values. The inference unit includes an influence degree calculation unit that calculates an influence evaluation value and a probability value differential amount calculation unit that calculates a probability value differential amount. The probability value differential amount calculation unit extracts nodes that have a dependency relationship with the inference target node, selects a processing target node from the extracted nodes, acquires the probability table of the processing target node, extracts a set of one or more nodes that have a dependency relationship with the processing target node as family nodes, calculates the joint probability distribution of the joint random variable that combines the inference target node, the processing target node, and the family nodes, and calculates the probability value differential amount using the joint probability distribution and the probability table of the processing target node.

Description

The present invention relates to a probabilistic inference method for graphical models.

Graphical models such as Bayesian networks are widely used in causal analysis and in building predictive models.

A graphical model consists of nodes, which are random variables, and edges that connect the nodes. An edge represents a dependency between random variables. Of two nodes linked by a dependency, the node that influences the other is called the parent node, and the node that is influenced is called the child node. Furthermore, the set consisting of a node and its parent nodes is called the family nodes of that node.
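The parent, child, and family relationships described above can be sketched with a small adjacency structure. This is only an illustrative sketch: the node names and the `PARENTS`, `children`, and `family` identifiers are hypothetical and do not appear in the specification.

```python
# Hypothetical directed graph: each node maps to the list of its parent nodes.
PARENTS = {
    "Smoking": [],
    "Exercise": [],
    "Hypertension": ["Smoking", "Exercise"],
    "Stroke": ["Hypertension"],
}


def children(node):
    """Return the nodes that are directly influenced by `node`."""
    return [child for child, parents in PARENTS.items() if node in parents]


def family(node):
    """Return the node together with its parent nodes (its family nodes)."""
    return [node] + PARENTS[node]


print(family("Hypertension"))
print(children("Smoking"))
```

Storing only the parent lists keeps the structure compact, since the children and the family of any node can be derived from them.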

Each node is given a conditional probability table. The conditional probability table represents, for each state value vector of a node's parent nodes, the distribution of conditional probability values of that node. Here, a conditional probability distribution means the distribution of conditional probability values over the state values that the node can take.
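One way to picture a conditional probability table is as a mapping from each parent state value vector to a distribution over the node's own states. The sketch below assumes a hypothetical node with two parents; the state names, the numbers, and the helper names are illustrative and not taken from the specification.

```python
# Hypothetical conditional probability table for a node with two parents.
# Key: state value vector of the parents. Value: distribution over the node's states.
CPT = {
    ("yes", "low"): {"yes": 0.6, "no": 0.4},
    ("yes", "high"): {"yes": 0.3, "no": 0.7},
    ("no", "low"): {"yes": 0.2, "no": 0.8},
    ("no", "high"): {"yes": 0.1, "no": 0.9},
}


def conditional_probability(cpt, parent_states, state):
    """Look up P(node = state | parents = parent_states)."""
    return cpt[tuple(parent_states)][state]


def is_valid_cpt(cpt, tol=1e-9):
    """Check that every row is a probability distribution (sums to 1)."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in cpt.values())


print(conditional_probability(CPT, ["yes", "high"], "yes"))
print(is_valid_cpt(CPT))
```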

In a graphical model, entering values for given random variables yields the probability values of the random variable to be predicted. This makes it possible, for example, to predict future trends from past data.

In this specification, the random variable to be predicted is called the inference target, calculating the distribution of probability values (probability distribution) of the inference target using conditional probability values is called inference, and the probability distribution of the inference target is called the inference result. Also, in this specification, the confidence interval of each probability value in the probability distribution of the inference target is simply called the confidence interval.

Non-Patent Document 1: Tim Van Allen, Ajit Singh, Russell Greiner, Peter Hooper, "Quantifying the uncertainty of a belief net response: Bayesian error-bars for belief net inference", Artificial Intelligence 172 (2008) 483-513

Because a graphical model is generated by learning from a finite amount of data, it carries a measure of statistical reliability. It is important to quantify the statistical uncertainty of the graphical model and, on that basis, to evaluate the reliability of predictions made with the model and to reconstruct the model.

As described in Non-Patent Document 1, the influence that a probability value in a probability table has on the inference target is given by Equation (1) below. Equation (1) corresponds to Equation (8) of Non-Patent Document 1.

Here, the probability value differential amount represents the amount of change in the probability value of the inference result with respect to a minute change in a conditional probability value. As shown in Equation (1), the influence evaluation value is given as a function of the probability value differential amount.

Non-Patent Document 1 defines the probability value differential amount as in Equation (2) below and states that, concretely, Equation (3) below should be calculated. Equation (2) corresponds to Equation (6) of Non-Patent Document 1, and Equation (3) corresponds to Equation (14) of Non-Patent Document 1.

With the conventional calculation method of Equation (3), which uses intermediate results of variable elimination, the computational cost of variable elimination, an exact inference method, grows as the graphical model becomes larger, so the calculation becomes infeasible in practice.

The present invention provides an apparatus and a method that can calculate the influence evaluation value at reduced computational cost even for a large-scale graphical model.

A representative example of the invention disclosed in this application is as follows. Namely, a computer system comprising one or more computers each having an arithmetic device that executes a program and a memory that stores the program, the computer system comprising: a learning data storage unit that manages learning data composed of records containing a plurality of items corresponding to random variables; a graphical model construction unit that uses the learning data to construct a graphical model composed of nodes corresponding to the random variables, edges indicating dependencies between the nodes, and probability tables each indicating the distribution of probability values for each state value of the random variable corresponding to a node; a model information storage unit that manages structure information of the graphical model constructed by the graphical model construction unit and the probability table of each of the plurality of nodes included in the graphical model; and an inference unit that accepts input of evidence data, which is composed of a record containing a plurality of items serving as the random variables and in which a value is stored in at least one item, and calculates an inference result, which is the distribution of probability values that an inference target node takes predetermined state values.
The inference unit includes: an influence degree calculation unit that calculates an influence evaluation value indicating the magnitude of the influence that the probability value of a node having a dependency relationship with the inference target node has on the probability value of the inference target node; and a probability value differential amount calculation unit that calculates a probability value differential amount, which is used to calculate the influence evaluation value and is the amount of change in the probability value of the inference target node with respect to a minute change in the probability value of a node having a dependency relationship with the inference target node.
The probability value differential amount calculation unit refers to the structure information of the graphical model managed by the model information storage unit and extracts nodes having a dependency relationship with the inference target node; selects a processing target node from the extracted nodes; acquires the probability table of the processing target node managed by the model information storage unit; refers to the structure information of the graphical model managed by the model information storage unit and extracts a set of one or more nodes having a dependency relationship with the processing target node as family nodes; uses the probability tables of the inference target node, the processing target node, and the family nodes to calculate the joint probability distribution of the joint random variable that combines the inference target node, the processing target node, and the family nodes; and calculates the probability value differential amount using the joint probability distribution and the probability table of the processing target node.

According to the present invention, the probability value differential amount is calculated using a joint probability distribution. Because the influence evaluation value can therefore be calculated with an approximate inference method, it can be calculated at reduced computational cost even for a large-scale graphical model. Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

A flowchart illustrating an example of processing executed by the inference unit of the computer of Embodiment 1.
A block diagram illustrating an example of the configuration of the computer system of Embodiment 1.
An explanatory diagram illustrating an example of the learning data stored in the database of Embodiment 1.
An explanatory diagram illustrating an example of the graphical model of Embodiment 1.
An explanatory diagram illustrating an example of the structure information of the graphical model of Embodiment 1.
An explanatory diagram illustrating an example of the structure information of the graphical model of Embodiment 1.
An explanatory diagram illustrating an example of the conditional probability table of Embodiment 1.
An explanatory diagram illustrating an example of the evidence data of Embodiment 1.
An explanatory diagram illustrating an example of the inference result management information of Embodiment 1.
An explanatory diagram illustrating an example of the influence degree management information of Embodiment 1.
An explanatory diagram illustrating an example of the influence degree management information of Embodiment 1.
A flowchart illustrating an example of processing executed by the graphical model construction unit of the computer of Embodiment 1.
An explanatory diagram illustrating an example of the user interface displayed by the display unit of Embodiment 1.
A flowchart illustrating the graphical model reconstruction processing executed by the computer of Embodiment 2.
A flowchart illustrating the graphical model reconstruction processing executed by the computer of Embodiment 3.
A flowchart illustrating the graphical model reconstruction processing executed by the computer of Embodiment 4.

In Embodiment 1, the computer 200 constructs, based on input learning data, the causal relationships and transition relationships among a plurality of random variables as a graphical model. The computer 200 also calculates the probability distribution of the inference target (the inference result) using an approximate inference method, based on evidence data input to the constructed graphical model. In doing so, the computer 200 calculates an influence evaluation value and uses it to also calculate the confidence interval of the conditional probability values of the inference result.

When a graphical model is used to predict future trends from past data (sample data), presenting not only the inference result but also the confidence interval of the probability values in the inference result makes the result more reliable and more persuasive.

For example, consider tossing a coin, which lands either heads or tails. The true probability of heads and of tails is 50% each. However, if the coin is tossed a finite number of times, for example 10 times, it will not necessarily land heads exactly 5 times out of 10. Here, the number of tosses is called the number of samples.

Therefore, when the number of samples is finite, the observed probability of heads deviates from 50%. If, for instance, 6 of the 10 tosses above land heads, the estimated probability of heads is 60%. The more times the coin is tossed, the smaller the fluctuation in the result, so a probability value obtained from 10 tosses is less reliable than one obtained from 100 tosses.
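The effect of the number of samples in the coin example can be made concrete with the textbook standard error of an estimated proportion, sqrt(p(1 - p) / n). This is a standard statistic used here purely for illustration; the specification does not state that its reliability evaluation value takes this form, and the function name is hypothetical.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


# Fair coin (p = 0.5): compare 10 tosses with 100 tosses.
for n in (10, 100):
    print(n, round(standard_error(0.5, n), 4))
```

Because the standard error shrinks in proportion to 1 / sqrt(n), ten times as many tosses narrows the spread of the estimate by a factor of about 3.2.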

Because a probability value obtained from a finite number of samples in this way is an estimate of the true probability value, its reliability needs to be evaluated. The reliability evaluation value used for this purpose is given as a function of the number of samples.

When a graphical model is constructed by machine learning from learning data, a conditional probability table can be given to each node. As in the coin-toss trial above, the conditional probability values in each node's conditional probability table are more reliable the larger the amount of learning data, that is, the larger the number of samples.
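As a minimal sketch of how conditional probability values and their sample counts relate, the code below estimates a table by relative frequency (maximum likelihood). The estimator, the record layout, and all names are assumptions made for illustration; the specification does not say which estimator is used.

```python
from collections import Counter, defaultdict


def estimate_cpt(records, child, parents):
    """Estimate P(child | parents) by counting, and keep the sample count per row."""
    counts = defaultdict(Counter)
    for record in records:
        key = tuple(record[p] for p in parents)
        counts[key][record[child]] += 1

    cpt, sample_counts = {}, {}
    for key, counter in counts.items():
        n = sum(counter.values())
        sample_counts[key] = n  # number of samples behind this row
        cpt[key] = {state: c / n for state, c in counter.items()}
    return cpt, sample_counts


# Hypothetical learning data: each record is a dict of item name -> value.
records = [
    {"Smoking": "yes", "Hypertension": "yes"},
    {"Smoking": "yes", "Hypertension": "yes"},
    {"Smoking": "yes", "Hypertension": "no"},
    {"Smoking": "no", "Hypertension": "no"},
]
cpt, sample_counts = estimate_cpt(records, "Hypertension", ["Smoking"])
print(cpt)
print(sample_counts)
```

Rows backed by few records, such as the single "no" smoker here, produce estimates that are correspondingly less reliable.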

Consequently, the reliability of an inference result obtained with the graphical model also depends on the number of samples: the larger the number of samples, the more accurate the inference result, that is, the narrower its confidence interval. The reliability of the inference result can therefore also be expressed as a function of the number of samples.

Through this chain, the reliability of the conditional probability values, which is limited by the finiteness of the learning data, is ultimately reflected in the reliability of the inference result and is expressed in the form of a confidence interval.

As described later, the learning data is a set of records each containing a plurality of items corresponding to random variables, with a value stored in at least one item. The graphical model is a graph that treats each random variable as a node and represents dependencies between nodes as edges. In the graphical model, each node is given a probability table.

Types of graphical models include Bayesian networks and Markov networks. This embodiment is described using a Bayesian network as an example. In a Bayesian network, dependencies between random variables are expressed as conditional probabilities, and the probability table indicating the distribution of a node's probability values is a conditional probability table. In this embodiment, among the items corresponding to random variables in the evidence data described later, some or all of the items (random variables) that have no value are treated as inference targets.

An approximate inference method is an approximation method for calculating the probability distribution of the inference target. Well-known approximate inference methods include loopy belief propagation and Gibbs sampling.
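To illustrate what an approximate inference method does, the following sketch runs Gibbs sampling on a hypothetical three-node chain B -> A -> T with binary states and compares the estimate of P(B=1 | T=1) against brute-force enumeration. The network, its numbers, and every function name are assumptions made for this example and are not taken from the specification.

```python
import random

# Hypothetical chain B -> A -> T with binary states (illustrative numbers).
P_B = [0.7, 0.3]                          # P(B=b)
P_A_GIVEN_B = [[0.9, 0.1], [0.4, 0.6]]    # P_A_GIVEN_B[b][a] = P(A=a | B=b)
P_T_GIVEN_A = [[0.8, 0.2], [0.3, 0.7]]    # P_T_GIVEN_A[a][t] = P(T=t | A=a)


def exact_posterior_b(t_obs):
    """P(B=1 | T=t_obs) by brute-force enumeration (exact inference)."""
    weights = [
        sum(P_B[b] * P_A_GIVEN_B[b][a] * P_T_GIVEN_A[a][t_obs] for a in range(2))
        for b in range(2)
    ]
    return weights[1] / sum(weights)


def sample_binary(weight0, weight1, rng):
    """Draw 0 or 1 with probability proportional to the given weights."""
    return 1 if rng.random() < weight1 / (weight0 + weight1) else 0


def gibbs_posterior_b(t_obs, n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(B=1 | T=t_obs) by Gibbs sampling over the unobserved A and B."""
    rng = random.Random(seed)
    a, b, hits = 0, 0, 0
    for step in range(burn_in + n_samples):
        # Resample A given B=b and T=t_obs: proportional to P(A|b) * P(t_obs|A).
        a = sample_binary(P_A_GIVEN_B[b][0] * P_T_GIVEN_A[0][t_obs],
                          P_A_GIVEN_B[b][1] * P_T_GIVEN_A[1][t_obs], rng)
        # Resample B given A=a: proportional to P(B) * P(a|B).
        b = sample_binary(P_B[0] * P_A_GIVEN_B[0][a],
                          P_B[1] * P_A_GIVEN_B[1][a], rng)
        if step >= burn_in:
            hits += b
    return hits / n_samples


print(exact_posterior_b(1), gibbs_posterior_b(1))
```

Enumeration visits every joint state and so grows exponentially with the number of nodes, whereas each Gibbs step only touches one node and its neighbours, which is why sampling remains usable on larger models.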

FIG. 2 is a block diagram illustrating an example of the configuration of the computer system of Embodiment 1.

The computer system consists of a computer 200 and a database 206.

The computer 200 constructs a graphical model and uses it to calculate an inference result, the confidence intervals of the probability values in the inference result, and the like. The computer 200 of this embodiment has an arithmetic device 201, a memory 202, a storage medium 203, an input device 204, and an output device 205, which are connected to one another via an internal bus or the like.

The arithmetic device 201 executes programs stored in the memory 202 and is, for example, a CPU or a GPU. In the following, when processing or a function is described with a functional unit as the subject, this means that the arithmetic device 201 is executing the program that implements that functional unit. The memory 202 stores the programs executed by the arithmetic device 201 and the information used by those programs. The memory 202 may be either volatile or non-volatile memory.

The storage medium 203 stores programs and the like that implement the various functions of the computer 200. In this embodiment, the arithmetic device 201 reads a program from the storage medium 203, loads it into the memory 202, and executes the loaded program. The programs and other items stored in the storage medium 203 of this embodiment are described later.

The programs stored in the storage medium 203 may be obtained from removable media such as a CD-ROM or flash memory, or from a distribution server connected via a network. When programs are obtained from removable media, the computer 200 has an interface for connecting to the removable media.

The input device 204 is a device for entering various information into the computer 200 and includes, for example, a keyboard, a mouse, and a touch panel. The output device 205 is a device that outputs the results of processing executed by the computer 200 and includes, for example, a display.

The database 206 stores various data managed by the computer 200. In this embodiment, the database 206 is assumed to be built on a storage system (not shown). The storage system includes a controller, an external interface, and a plurality of storage media. The storage system can configure a RAID using the plurality of storage media and can also provide a plurality of logical storage areas using RAID volumes.

The database 206 includes a learning data storage unit 241, a model information storage unit 242, an evidence data storage unit 243, an inference result storage unit 244, and an influence degree storage unit 245.

The learning data storage unit 241 stores learning data 300 used when a graphical model is constructed. Details of the learning data 300 are described with reference to FIG. 3. The model information storage unit 242 stores structure information 500 and 510, which indicate the structure of the graphical model, and a conditional probability table 600. Details of the structure information 500 and 510 are described with reference to FIGS. 5A and 5B, and details of the conditional probability table 600 are described with reference to FIG. 6.

The evidence data storage unit 243 stores evidence data 700. Here, the evidence data 700 represents health-related information such as test values and image data obtained from patients or others undergoing health checkups, as well as medical care information such as physicians' diagnostic information, treatments, and prescribed drugs. Details of the evidence data 700 are described with reference to FIG. 7. The inference result storage unit 244 stores inference result management information 800. Details of the inference result management information 800 are described with reference to FIG. 8. The influence degree storage unit 245 stores influence degree management information 900. Details of the influence degree management information 900 are described with reference to FIG. 9.

The programs stored in the storage medium 203 are now described.

The storage medium 203 stores programs that implement a graphical model construction unit 210, an inference unit 220, and a display unit 230.

The graphical model construction unit 210 constructs a graphical model and generates various information about the constructed graphical model. The graphical model construction unit 210 consists of a plurality of modules. In this embodiment, it includes a model structure learning unit 211, a sample number calculation unit 212, and a probability table calculation unit 213.

The model structure learning unit 211 constructs a graphical model (Bayesian network) using the learning data 300. The model structure learning unit 211 also stores the structure information 500 and 510 of the constructed graphical model in the database 206 via the model information storage unit 242. Here, the model structure learning unit 211 is assumed to construct the graphical model using an existing structure learning algorithm for Bayesian networks, for example the hill climbing method.

The sample number calculation unit 212 calculates, as the number of samples, the number of records in the learning data 300 that match a predetermined condition. The sample number calculation unit 212 stores the calculated number of samples in the influence degree management information 900 via the influence degree storage unit 245.
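Counting the records that match a condition can be sketched as follows. The passage does not define the predetermined condition, so this example assumes it is an assignment of values to some items; the record layout and the `count_samples` name are likewise hypothetical.

```python
def count_samples(records, condition):
    """Count records whose items equal every value given in `condition`.

    `condition` is assumed to be a dict of item name -> required value;
    a record that lacks one of the items does not match.
    """
    return sum(
        all(record.get(item) == value for item, value in condition.items())
        for record in records
    )


# Hypothetical learning data; None marks an item with no stored value.
records = [
    {"Smoking": "yes", "Hypertension": "yes"},
    {"Smoking": "yes", "Hypertension": "no"},
    {"Smoking": "no", "Hypertension": None},
    {"Smoking": "yes", "Hypertension": "yes"},
]
print(count_samples(records, {"Smoking": "yes"}))
print(count_samples(records, {"Smoking": "yes", "Hypertension": "yes"}))
```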

The probability table calculation unit 213 calculates the conditional probability table 600 of each node of the constructed graphical model. The probability table calculation unit 213 stores the conditional probability table 600 via the model information storage unit 242.

The inference unit 220 accepts input of an inference target and calculates the distribution of conditional probability values of that inference target (the inference result). The inference unit 220 of this embodiment also calculates, as an influence evaluation value, the magnitude of the influence that a given conditional probability value has on the probability values of the inference result, and calculates the confidence intervals of the probability values of the inference result based on the calculated influence evaluation value. The inference unit 220 consists of a plurality of modules. In this embodiment, it includes an inference result calculation unit 221, a probability value differential amount calculation unit 222, an influence degree calculation unit 223, and a confidence interval calculation unit 224.

The inference result calculation unit 221 calculates the probability distribution of the inference target using an exact inference method such as variable elimination or an approximate inference method such as loopy belief propagation.

The probability value differential amount calculation unit 222 calculates the probability value differential amount needed to calculate the influence evaluation value. Here, the probability value differential amount is the amount of change in the probability value of the inference result with respect to a minute change in a conditional probability value, and is expressed as in Equation (2).

In this embodiment, the probability value differential amount is given by Equation (4) below. When the joint probability distribution in Equation (4) is calculated by exact inference, the result matches the probability value differential amount calculated as described in Non-Patent Document 1. Specific methods for calculating the influence evaluation value and the probability value differential amount are described later.
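The equation images are not reproduced in this text, so the following sketch rests on an assumption: that the probability value differential amount equals the joint probability of the inference target, the child node, and its parents, divided by the conditional probability value being varied. On a hypothetical chain B -> A -> T (illustrative numbers and names), that joint-based expression can be checked against a finite difference in which the table entry is perturbed as an independent parameter.

```python
import itertools

# Hypothetical chain B -> A -> T with binary states (illustrative numbers).
P_B = [0.7, 0.3]                          # P(B=b)
P_A_GIVEN_B = [[0.9, 0.1], [0.4, 0.6]]    # P_A_GIVEN_B[b][a] = P(A=a | B=b)
P_T_GIVEN_A = [[0.8, 0.2], [0.3, 0.7]]    # P_T_GIVEN_A[a][t] = P(T=t | A=a)


def joint(t, a, b, cpt_a=P_A_GIVEN_B):
    """Joint probability P(T=t, A=a, B=b) as the product of the three tables."""
    return P_B[b] * cpt_a[b][a] * P_T_GIVEN_A[a][t]


def marginal_t(t, cpt_a=P_A_GIVEN_B):
    """P(T=t), obtained by summing the joint over A and B."""
    return sum(joint(t, a, b, cpt_a)
               for a, b in itertools.product(range(2), repeat=2))


def derivative_from_joint(t, a, b):
    """Assumed form: P(T=t, A=a, B=b) / P(A=a | B=b)."""
    return joint(t, a, b) / P_A_GIVEN_B[b][a]


def derivative_finite_difference(t, a, b, eps=1e-6):
    """Change in P(T=t) per unit change of one table entry (no renormalisation)."""
    perturbed = [row[:] for row in P_A_GIVEN_B]
    perturbed[b][a] += eps
    return (marginal_t(t, perturbed) - marginal_t(t)) / eps


print(derivative_from_joint(1, 1, 0), derivative_finite_difference(1, 1, 0))
```

The joint-based form needs only a joint distribution over a handful of variables, which any inference routine, exact or approximate, can supply.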

The derivation of Equation (4) is as follows. In the following description, the inference target is denoted T, the random variable corresponding to the child node of the conditional probability value is denoted A, and the random variables corresponding to its parent nodes are denoted B_i, where i is an integer from 1 to m. Let R(T) be the set of values T can take, R(A) the set of values A can take, and R(B_i) the set of values B_i can take. Then Equation (5) below holds.

Applying Bayes' theorem in the forms shown in Equations (6) and (7) below, Equation (5) becomes Equation (8) below.

Here, let t_l be an arbitrary value in R(T), a_k an arbitrary value in R(A), and b_{i,j} an arbitrary value in R(B_i), and compute Equation (9) below.

The right-hand side then contains no term involving the quantity in Equation (11) other than the term shown in Equation (10), so Equation (9) reduces to Equation (12).

Transforming the right-hand side of Equation (12) with Bayes' theorem gives Equation (13) below. Equation (12) can therefore be written in the form of Equation (14), which agrees with Equation (4). This completes the derivation of Equation (4).
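The equations themselves are not reproduced in this text. The following LaTeX sketch is a reconstruction of the chain of steps from the prose above and from the later statement that Equation (4) is evaluated from a conditional probability value and a joint probability value. It is not the patent's own typesetting: the shorthand \mathbf{b} = (b_{1,j}, \dots, b_{m,j}) and the assignment of equation numbers to individual steps are assumptions.

```latex
% Assumed reconstruction; \mathbf{B} = (B_1,\dots,B_m), \mathbf{b} = (b_{1,j},\dots,b_{m,j}).
\begin{align}
% cf. (5): marginalize the joint distribution over A and B_1..B_m
P(T) &= \sum_{A \in R(A)} \sum_{B_1 \in R(B_1)} \cdots \sum_{B_m \in R(B_m)}
        P(T, A, B_1, \dots, B_m) \\
% cf. (6), (7): product rule applied twice
P(T, A, \mathbf{B}) &= P(T \mid A, \mathbf{B})\, P(A, \mathbf{B}), \qquad
P(A, \mathbf{B}) = P(A \mid \mathbf{B})\, P(\mathbf{B}) \\
% cf. (8)
P(T) &= \sum_{A} \sum_{B_1} \cdots \sum_{B_m}
        P(T \mid A, \mathbf{B})\, P(A \mid \mathbf{B})\, P(\mathbf{B}) \\
% cf. (9)-(12): only one term of the sum contains P(a_k | b)
\frac{\partial P(t_l)}{\partial P(a_k \mid \mathbf{b})}
     &= P(t_l \mid a_k, \mathbf{b})\, P(\mathbf{b}) \\
% cf. (13), (14), (4): rewrite with the joint probability
     &= \frac{P(t_l, a_k, \mathbf{b})}{P(a_k \mid \mathbf{b})}
\end{align}
```

The fourth step treats P(t_l | a_k, \mathbf{b}) and P(\mathbf{b}) as constant with respect to the conditional probability value being varied, which is the reading suggested by the statement that no other term on the right-hand side contains that value.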

The impact calculation unit 223 calculates the impact evaluation value using the probability value differential amount. The confidence interval calculation unit 224 calculates a confidence interval for the probability value of the inference result based on the impact evaluation value and the number of samples.

The display unit 230 displays various information to be output to the user via the output device 205 or the like. The display unit 230 is composed of a plurality of modules. In this embodiment, the display unit 230 includes a model display unit 231 and an inference result display unit 232.

The model display unit 231 generates display data for displaying the constructed graphical model and, based on the generated display data, outputs information on the graphical model via the output device 205.

The inference result display unit 232 generates display data for displaying the calculated inference result and, based on the generated display data, outputs information on the inference result via the output device 205.

FIG. 3 is an explanatory diagram illustrating an example of the learning data 300 stored in the database 206 according to the first embodiment.

The learning data 300 includes records, each composed of identification information and a plurality of columns corresponding to random variables. A record in this embodiment includes a patient ID 301, a BMI value 302, a blood pressure value 303, a blood glucose level 304, heart disease 305, and diabetes 306.

The patient ID 301 is identification information of the patient. The BMI value 302, the blood pressure value 303, and the blood glucose level 304 are the patient's BMI value, blood pressure value, and blood glucose level. The heart disease 305 and the diabetes 306 are information indicating whether or not the patient has heart disease and diabetes, respectively. The patient ID 301 is the identification information of the record, and the BMI value 302, blood pressure value 303, blood glucose level 304, heart disease 305, and diabetes 306 are random variables.

When the patient has heart disease or diabetes, “Yes” is stored in the heart disease 305 or the diabetes 306; when the patient does not have heart disease or diabetes, “No” is stored in the heart disease 305 or the diabetes 306.

The record in the first row from the top of FIG. 3 indicates that the patient ID 301 is “K0001”, the BMI value is “32”, the blood pressure value is “90”, the blood glucose level is “5”, and the patient has neither heart disease nor diabetes.

Note that values need not always be stored in the BMI value 302, blood pressure value 303, blood glucose level 304, heart disease 305, and diabetes 306. In that case, information indicating that the data is missing is stored in the column. The information indicating missing data may be a numerical value, a character, or a Boolean value.

Next, the graphical model and the structure information 500 and 510 corresponding to the learning data 300 shown in FIG. 3 will be described.

FIG. 4 is an explanatory diagram illustrating an example of the graphical model 400 according to the first embodiment.

The graphical model 400 is composed of a plurality of nodes 410 and edges 420 connecting the nodes 410. The nodes 410 of the graphical model 400 shown in FIG. 4 correspond to the BMI value 302, blood pressure value 303, blood glucose level 304, heart disease 305, and diabetes 306 of the learning data 300.

Since this embodiment assumes a Bayesian network, each edge 420 connecting nodes 410 has a direction. The node 410 at the start point of an edge 420 is called the parent node, and the node 410 at the end point of the edge 420 is called the child node. For example, the “diabetes” node 410 is a child node of the “blood glucose level” node, and the “blood glucose level” node is a parent node of the “diabetes” node. Each node 410 is given a conditional probability table. In a Bayesian network, the probability distribution of a child node depends on the probability values of its parent nodes.

FIGS. 5A and 5B are explanatory diagrams illustrating examples of the structure information 500 and 510 of the graphical model 400 according to the first embodiment. The structure information of the graphical model 400 in this embodiment includes information on the nodes 410 and information on the edges 420.

FIG. 5A shows the structure information 500 on the nodes 410 in the graphical model 400. The structure information 500 includes one record per node 410, and each record includes a node ID 501 and an item name 502.

The node ID 501 is identification information for uniquely identifying the node 410. The item name 502 is identification information of the random variable corresponding to the node 410. The item name 502 corresponds to an item name of the learning data 300.

FIG. 5B shows the structure information 510 on the edges 420 in the graphical model 400. The structure information 510 includes one record per edge 420, and each record includes an edge ID 511, a parent node 512, and a child node 513.

The edge ID 511 is identification information for uniquely identifying the edge 420. The parent node 512 is identification information of the node 410 corresponding to the parent node. The child node 513 is identification information of the node 410 corresponding to the child node. The parent node 512 and the child node 513 store the same information as the item name 502. Alternatively, the parent node 512 and the child node 513 may store the same information as the node ID 501.
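As a rough illustration of the two structure tables, the sketch below holds the node records and edge records as plain Python lists and looks up the parents of a node. The field names, the ID strings, and the helper `parents_of` are hypothetical. The edges into “heart_disease” and “diabetes” follow the examples in the text; the edge from “BMI” is an assumption added only to make the example less trivial.

```python
# Node table (cf. structure information 500): one record per node.
nodes = [
    {"node_id": "N1", "item_name": "BMI"},
    {"node_id": "N2", "item_name": "blood_pressure"},
    {"node_id": "N3", "item_name": "blood_glucose"},
    {"node_id": "N4", "item_name": "heart_disease"},
    {"node_id": "N5", "item_name": "diabetes"},
]

# Edge table (cf. structure information 510): one record per directed edge.
# Parent and child are stored by item name, as in the embodiment.
edges = [
    {"edge_id": "E1", "parent": "BMI", "child": "blood_pressure"},  # assumed edge
    {"edge_id": "E2", "parent": "blood_pressure", "child": "heart_disease"},
    {"edge_id": "E3", "parent": "blood_glucose", "child": "heart_disease"},
    {"edge_id": "E4", "parent": "blood_glucose", "child": "diabetes"},
]


def parents_of(item_name, edge_table):
    """Return the sorted item names of all parents of the given node."""
    return sorted(e["parent"] for e in edge_table if e["child"] == item_name)


heart_parents = parents_of("heart_disease", edges)
```

Storing parent and child by item name keeps the edge table readable on its own; storing node IDs instead, which the text also allows, would only change the lookup key.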

FIG. 6 is an explanatory diagram illustrating an example of the conditional probability table 600 according to the first embodiment.

The conditional probability table 600 stores the conditional probability value that a child node takes a given state value, given the state values of its parent nodes. A conditional probability table 600 is given to each node 410 that corresponds to a child node.

The conditional probability table 600 includes a parent node 601, a child node 602, a conditional probability 603, and a number of samples 604.

The parent node 601 is the state value of the parent node. When there are a plurality of parent nodes, the parent node 601 has as many columns as there are parent nodes. The child node 602 is the state value of the child node. The conditional probability 603 is the probability that the child node takes the state value set in the child node 602, given the state values set in the parent node 601. The number of samples 604 is the number of records in the learning data 300 that match the state values of the parent node 601.

Note that the number of samples may be managed in a table separate from the conditional probability table 600.

The top record in FIG. 6 indicates that when the blood pressure value is “90” and the blood glucose level is “5”, the probability of heart disease is “9%”. It also indicates that the number of records in which the state values of the parent nodes, blood pressure value and blood glucose level, are “90” and “5” respectively is “1563”. The conditional probability 603 can be obtained from the number of records in the learning data 300 that match the state values of the parent node 601 and the number of records that match the state values of both the parent node 601 and the child node 602.

In general, the larger the value of the number of samples 604, the more reliable the conditional probability 603.

FIG. 7 is an explanatory diagram illustrating an example of the evidence data 700 according to the first embodiment.

The evidence data 700 of the first embodiment has the same structure as the records included in the learning data 300. Specifically, the evidence data 700 includes a patient ID 701, a BMI value 702, a blood pressure value 703, a blood glucose level 704, heart disease 705, and diabetes 706. For items that have not been obtained from the patient, a symbol indicating that the data is missing is stored as the value.

FIG. 8 is an explanatory diagram illustrating an example of the inference result management information 800 according to the first embodiment.

The inference result management information 800 stores the conditional probability values, calculated using the graphical model 400 and the evidence data 700, that the inference target node 410 takes each state value. The inference result management information 800 of the first embodiment includes evidence 801, an inference target 802, a conditional probability 803, and a confidence interval 804.

The evidence 801 is the state values, within the evidence data 700, of the nodes 410 that have a dependency relationship with the node 410 related to the inference target 802. The inference target 802 is the state value of the node 410 that is the inference target.

For each record in the evidence data 700, the inference result management information 800 holds as many records as the number of state values that the inference target node 410 can take. In this embodiment, the node 410 corresponding to “heart disease” takes either “Yes” or “No” as its state value, so two records are stored in the inference result management information 800 for each record of the evidence data 700. For example, when the evidence data 700 has “M” records, the inference result management information 800 has “2M” records.

The conditional probability 803 is the conditional probability value of the inference target. The set of conditional probabilities 803 of the two records of the inference result management information 800 that correspond to one record of the evidence data 700 constitutes the inference result for that record of the evidence data 700.

The confidence interval 804 is a confidence interval for evaluating the reliability of the conditional probability value. In this embodiment, the confidence interval 804 stores the 95% confidence interval.

The top record shown in FIG. 8 indicates that, for a patient whose BMI value is “21”, blood pressure value is “90”, and blood glucose level is “5”, the probability of heart disease lies between 5% and 9% with 95% confidence.

FIGS. 9A and 9B are explanatory diagrams illustrating an example of the impact management information 900 according to the first embodiment. Because the impact management information 900 has many records, it is shown in two parts, FIG. 9A and FIG. 9B.

The impact management information 900 manages the impact evaluation value, which indicates the degree of impact of the conditional probability 603 on the inference result. The impact management information 900 includes a parent node 901, a child node 902, a conditional probability 903, a number of samples 904, an inference target 905, a joint probability 906, a probability value differential amount 907, and an impact evaluation value 908.

The parent node 901, child node 902, conditional probability 903, and number of samples 904 are the same as the parent node 601, child node 602, conditional probability 603, and number of samples 604.

Note that one piece of impact management information 900 exists for each conditional probability table 600 of a node 410 that has a dependency relationship with the inference target. FIG. 9 shows the impact management information 900 corresponding to the blood glucose level node 410. In the impact management information 900, as many records as the number of state values the inference target can take are generated for each conditional probability value. Since the node 410 corresponding to “heart disease” takes two state values, “Yes” or “No”, two records are stored in the impact management information 900 for each record of the conditional probability table 600. For example, when the conditional probability table 600 has “N” records, the impact management information 900 has “2N” records.

The inference target 905 is the state value of the inference target. The joint probability 906 is the joint probability value of the related random variables for the combination of state values corresponding to the record. The probability value differential amount 907 is the probability value differential amount of the conditional probability 903. The impact evaluation value 908 is the degree of impact that the conditional probability 903 has on the probability value of the inference result.

For example, the first record in FIG. 9 indicates that the magnitude of the impact on the probability value of the inference result of the conditional probability value “9%” of having heart disease when the BMI value is “20” and the blood glucose level is “5”, that is, the impact evaluation value, is “0.2”.

The value stored in the impact evaluation value 908 is given as the value of a function of the change in the probability value of the inference result with respect to an infinitesimal change in the conditional probability 603. The function for calculating the impact evaluation value is assumed to be set in the inference unit 220 in advance. The value stored in the confidence interval 804 of the inference result management information 800 is calculated based on the number of samples and the impact evaluation value, as described later.

Next, the processing executed by the computer 200 will be described. First, the process of constructing the graphical model 400 will be described with reference to FIG. 10. FIG. 10 is a flowchart illustrating an example of the processing executed by the graphical model construction unit 210 of the computer 200 according to the first embodiment.

The computer 200 starts the processing described below when, for example, construction of the graphical model 400 is instructed.

The computer 200 accepts input of the learning data 300 (step S1001). The learning data 300 may be input using, for example, the input device 204.

At this time, the computer 200 stores the input learning data 300 in the database 206 via the learning data storage unit 241 of the database 206. When learning data 300 in various data formats is input, the learning data storage unit 241 may convert the input data into the format shown in FIG. 3 and then store the converted learning data 300.

Next, the computer 200 executes discretization of the learning data 300 (step S1002). Specifically, the model structure learning unit 211 of the graphical model construction unit 210 discretizes the state values of those items of the records of the learning data 300 whose state values are continuous. For example, the blood glucose level is discretized so that only integers are handled as its state values. In this case, the fractional part is rounded to the nearest integer, rounded down, or rounded up. The granularity of the discretization can be set arbitrarily.
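A minimal sketch of the discretization in step S1002, assuming integer granularity and non-negative measurements. The function name `discretize`, the mode names, and the use of `None` as the missing-value marker are hypothetical. Rounding half up is written with `math.floor(value + 0.5)` because Python's built-in `round` rounds ties to the nearest even integer.

```python
import math


def discretize(value, mode="round"):
    """Map a continuous, non-negative measurement to an integer state value."""
    if value is None:
        return None  # missing data stays missing (assumed marker)
    if mode == "round":
        return math.floor(value + 0.5)  # round half up
    if mode == "floor":
        return math.floor(value)  # round down
    if mode == "ceil":
        return math.ceil(value)  # round up
    raise ValueError(f"unknown mode: {mode}")


glucose_states = [discretize(v) for v in (4.5, 5.4, 5.5, None)]
```

A coarser granularity, such as bins of width 10 for blood pressure, could be obtained by dividing before rounding and multiplying afterwards; the text leaves that choice open.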

Next, the computer 200 executes a process of setting the constraints used to construct the graphical model 400 (step S1003). For example, the model structure learning unit 211 of the graphical model construction unit 210 accepts constraints input using the input device 204 or the like and stores them in the memory 202.

Possible constraints include dependency relationships between nodes 410. For example, information such as “no edge between the first node and the second node” or “an edge exists between the third node and the fourth node” is input as a constraint.

Next, the computer 200 executes a process of learning the model structure using the learning data 300 (step S1004). Specifically, the model structure learning unit 211 of the graphical model construction unit 210 constructs the graphical model 400 by generating the structure information 500 for the nodes 410 and the structure information 510 for the edges 420 based on the learning data 300 and the constraints. The Hill Climbing method, among others, is known as a structure learning algorithm for Bayesian networks. Any learning algorithm may be used in this embodiment.

Next, the computer 200 executes a data matching process (step S1005). Specifically, taking a given node 410 as the child node, the sample number calculation unit 212 of the graphical model construction unit 210 extracts, from the records included in the learning data 300, the records that match each combination of the state value of the child node and the state values of the parent nodes.

Next, the computer 200 executes a sample number calculation process (step S1006). Specifically, the sample number calculation unit 212 of the graphical model construction unit 210 calculates the number of records extracted in the data matching process as the number of samples, and temporarily stores the calculated number of samples in the memory 202 in association with the state value of the child node and the state values of the parent nodes.

Next, the computer 200 executes a process of calculating the conditional probability table 600 (step S1007). Specifically, the following processing is executed.

The probability table calculation unit 213 of the graphical model construction unit 210 selects a node 410 to be processed, identifies the parent nodes of the selected node 410 based on the structure information 500 and 510, and generates in the conditional probability table 600 as many records as there are combinations of state values of the child node and the parent nodes. The probability table calculation unit 213 then stores the number of samples calculated in step S1006 in the number of samples 604 of each generated record.

The probability table calculation unit 213 also calculates, as the conditional probability value, the proportion of records in the learning data 300 in which the state value of the child node 602 equals a given value. As in the description of FIG. 6, this proportion is taken among the records that match the state values of the parent node 601. The probability table calculation unit 213 then stores the calculated conditional probability value in the conditional probability 603 of the corresponding record of the conditional probability table 600.
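Steps S1005 to S1007 can be sketched as a counting pass over the records. This is an illustration under stated assumptions, not the embodiment's implementation: the function `build_cpt`, the field names, and the sample records are hypothetical, records with a missing value in any relevant column are skipped, and only state combinations that actually occur in the data produce a row (the embodiment generates a row for every combination).

```python
from collections import Counter


def build_cpt(records, child, parents):
    """Estimate P(child | parents) by counting matching records."""
    parent_counts = Counter()  # records matching each parent state combination
    joint_counts = Counter()   # records matching parent states and child state
    for rec in records:
        if any(rec.get(col) is None for col in (child, *parents)):
            continue  # assumed handling of missing data: skip the record
        parent_key = tuple(rec[p] for p in parents)
        parent_counts[parent_key] += 1
        joint_counts[(parent_key, rec[child])] += 1
    return [
        {
            "parent": parent_key,
            "child": child_state,
            "prob": n / parent_counts[parent_key],
            "samples": parent_counts[parent_key],
        }
        for (parent_key, child_state), n in sorted(joint_counts.items())
    ]


# Illustrative records only; not taken from FIG. 3.
records = [
    {"blood_pressure": 90, "blood_glucose": 5, "heart_disease": "No"},
    {"blood_pressure": 90, "blood_glucose": 5, "heart_disease": "No"},
    {"blood_pressure": 90, "blood_glucose": 5, "heart_disease": "Yes"},
    {"blood_pressure": 90, "blood_glucose": 5, "heart_disease": "No"},
    {"blood_pressure": 120, "blood_glucose": 7, "heart_disease": "Yes"},
    {"blood_pressure": 120, "blood_glucose": None, "heart_disease": "Yes"},
]
cpt = build_cpt(records, "heart_disease", ["blood_pressure", "blood_glucose"])
```

Here the stored sample count is the number of records matching the parent states, which follows the definition of the number of samples 604 given for FIG. 6.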

Through the above processing, the structure information 500 and 510 as shown in FIGS. 5A and 5B and the conditional probability table 600 as shown in FIG. 6 are generated. That is, the graphical model 400 as shown in FIG. 4 is constructed.

Next, the processing for calculating the inference result will be described with reference to FIG. 1. FIG. 1 is a flowchart illustrating an example of the processing executed by the inference unit 220 of the computer 200 according to the first embodiment.

When instructed to start processing, the inference unit 220 starts the processing described below. At this time, the evidence data 700 is input to the inference unit 220. Since the method of calculating an inference result from the evidence data 700 is publicly known, its description is omitted. Here, it is assumed that the inference result calculation unit 221 has already calculated the inference result using an exact inference method such as variable elimination, or an approximate inference method such as Loopy Belief Propagation.

The inference unit 220 selects the inference target (step S101). Specifically, the probability value differential amount calculation unit 222 of the inference unit 220 receives identification information of a node 410 from the user or the like via the input device 204 and sets that node 410 as the inference target.

Next, the inference unit 220 selects the conditional probability table 600 to be processed (step S102). Specifically, the following processing is executed.

The probability value differential amount calculation unit 222 of the inference unit 220 refers to the structure information 500 and 510 and, treating the inference target as a leaf node, traces the graphical model 400 along the edges 420 up to the root nodes, thereby extracting the plurality of nodes 410 that have a dependency relationship with the inference target. The probability value differential amount calculation unit 222 reads the conditional probability table 600 corresponding to each of the extracted nodes 410 from the database 206 and stores it in the memory 202.
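The upward trace described above amounts to collecting the ancestors of the inference target. The sketch below does this with an explicit stack over (parent, child) pairs. The function name `ancestors_of` and the edge list are hypothetical, and returning only the ancestors (not the target itself) is an assumption about what "nodes having a dependency relationship" covers.

```python
def ancestors_of(target, edges):
    """Collect every node reachable from `target` by following edges child -> parent."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, []).append(parent)

    found = []
    seen = {target}
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:  # visit each node once, even with shared ancestors
                seen.add(parent)
                found.append(parent)
                stack.append(parent)
    return found


# (parent, child) pairs; the BMI edge is an assumed example.
edges = [
    ("BMI", "blood_pressure"),
    ("blood_pressure", "heart_disease"),
    ("blood_glucose", "heart_disease"),
    ("blood_glucose", "diabetes"),
]
related_nodes = ancestors_of("heart_disease", edges)
```

Nodes with no incoming edges simply contribute no further parents, which is where the trace stops at the root nodes.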

The probability value differential amount calculation unit 222 selects one node 410 to be processed from among the extracted nodes 410. For example, the nodes may be selected in order starting from the parent node of the inference target, followed by that node's parent node, and so on. This embodiment does not depend on how the node 410 to be processed is selected. Hereinafter, the selected node 410 is also referred to as the selected node 410.

The probability value differential amount calculation unit 222 acquires the conditional probability table 600 corresponding to the selected node 410 from among the plurality of conditional probability tables 600 stored in the memory 202. Based on the acquired conditional probability table 600, it also generates the impact management information 900 as shown in FIGS. 9A and 9B. Specifically, for each combination of state values of the parent nodes and the child node, it generates one record per state value the inference target can take. When the inference target is “heart disease”, there are two state values, so two records are generated in the impact management information 900 for each record of the conditional probability table 600.

The probability value differential amount calculation unit 222 stores the values of the parent node 601, child node 602, conditional probability 603, and number of samples 604 of the selected conditional probability table 600 in the parent node 901, child node 902, conditional probability 903, and number of samples 904 of the generated records, and stores the state values the inference target can take in the inference target 905. At this point, no values are stored in the joint probability 906, the probability value differential amount 907, or the impact evaluation value 908. This completes the description of step S102.
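The record generation in step S102 can be pictured as a cross product of conditional probability table rows and inference target states. In this sketch the function `make_impact_records` and all field names are hypothetical. The first table row uses the values quoted for the top record of FIG. 6; the second row is an illustrative complement, not taken from the figure.

```python
def make_impact_records(cpt_rows, target_states):
    """Create one record per (table row, inference target state) pair."""
    records = []
    for row in cpt_rows:
        for state in target_states:
            records.append({
                "parent": row["parent"],
                "child": row["child"],
                "prob": row["prob"],
                "samples": row["samples"],
                "target": state,
                # Filled in by later steps (S104 to S106).
                "joint": None,
                "derivative": None,
                "impact": None,
            })
    return records


cpt_rows = [
    {"parent": (90, 5), "child": "Yes", "prob": 0.09, "samples": 1563},
    {"parent": (90, 5), "child": "No", "prob": 0.91, "samples": 1563},  # illustrative
]
impact_records = make_impact_records(cpt_rows, ["Yes", "No"])
```

With N table rows and two target states this yields 2N records, matching the count stated for the impact management information 900.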

Next, the inference unit 220 extracts the related random variables (step S103). Specifically, the probability value differential amount calculation unit 222 of the inference unit 220 refers to the structure information 500 and 510 and, taking the selected node 410 as the child node, extracts the set of parent nodes 410 of the selected node, which have a dependency relationship with the selected node 410, as family nodes 410. The inference unit 220 then extracts the inference target, the selected node 410, and the extracted family nodes 410 as the related random variables.

Next, the inference unit 220 calculates the joint probability distribution of the related random variables (step S104). Specifically, the probability value differential amount calculation unit 222 of the inference unit 220 calculates the joint probability distribution of the related random variables by an approximate inference method such as Loopy Belief Propagation, using the conditional probability tables 600 of the inference target, the selected node 410, and the extracted family nodes 410. At this time, based on the calculated joint probability distribution, the probability value differential amount calculation unit 222 stores each joint probability value in the joint probability 906 of the record whose combination of state values of the parent node, the child node, and the inference target 905 matches.

Next, the inference unit 220 calculates the probability value derivative using the selected conditional probability table 600 and the calculated joint probability distribution (step S105). Specifically, the following processing is executed.

The probability value derivative calculation unit 222 of the inference unit 220 refers to the state values of the parent node 601 and child node 602 in the selected conditional probability table 600 and selects the record that matches a given combination of state values in the calculated joint probability distribution. It then reads the values of the conditional probability 903 and the joint probability 906 from the selected record.

The probability value derivative calculation unit 222 calculates the probability value derivative by substituting the values of the conditional probability 903 and the joint probability 906 into equation (4), and stores the result in the probability value derivative 907 field of the selected record. This concludes the description of step S105.
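The per-record calculation of step S105 can be sketched as follows. Equation (4) is not reproduced in this part of the description, so the sketch assumes, following the worked example given later for the serial network, that the probability value derivative is the joint probability divided by the conditional probability. The dictionary keys and the numeric values are hypothetical and only mirror the fields 903, 906, and 907.

```python
def probability_derivative(joint_prob: float, conditional_prob: float) -> float:
    """Assumed form of equation (4): joint probability / conditional probability."""
    if conditional_prob <= 0.0:
        # A zero conditional probability leaves the ratio undefined.
        raise ValueError("conditional probability must be positive")
    return joint_prob / conditional_prob


def fill_derivatives(records: list[dict]) -> list[dict]:
    """Store the derivative in each record (the counterpart of field 907)."""
    for record in records:
        record["derivative"] = probability_derivative(
            record["joint"], record["conditional"]
        )
    return records


# Hypothetical records: 'conditional' mirrors field 903, 'joint' mirrors field 906.
example_records = fill_derivatives([
    {"parent": "b0", "child": "c0", "target": "d0", "conditional": 0.8, "joint": 0.24},
    {"parent": "b1", "child": "c0", "target": "d0", "conditional": 0.5, "joint": 0.10},
])
```

Keeping the division in one small function makes it easy to swap in a different formula if equation (4) turns out to differ from this assumption.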

Next, using the probability value derivative, the inference unit 220 calculates, as an impact evaluation value, the magnitude of the influence that the conditional probability value of the selected node 410 has on the probability value of the inference result (step S106).

In general, the impact evaluation value is given as a function of the probability value derivative, and any function can be chosen to suit the conditions. In this example, the probability value derivative itself is defined as the impact evaluation value. Alternatively, functions of a plurality of probability value derivatives may be averaged using a weighting function, and the result may be used as the impact evaluation value.

In step S106, the probability value derivative calculation unit 222 of the inference unit 220 calculates the impact evaluation value by substituting the value of the probability value derivative 907 of a given record into a preset function, and stores the result in the impact evaluation value 908 field of that record.

Next, the inference unit 220 determines whether processing has been completed for all of the read conditional probability tables 600 (step S107).

If it is determined that processing has not been completed for all of the read conditional probability tables 600, the inference unit 220 returns to step S102 and executes the same processing.

If it is determined that processing has been completed for all of the read conditional probability tables 600, the inference unit 220 calculates the confidence interval of the probability value of the inference result based on the number of samples 904 and the impact evaluation value 908 (step S108). For example, the confidence interval can be calculated using a known method such as equation (15). The square of the variance is given by equation (16).
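Equations (15) and (16) are not reproduced in this part of the description, so the following is only a generic illustration of how the number of samples and the impact evaluation values could be combined into a confidence interval. It is not the patent's formula. It assumes a first-order (delta method) error propagation in which each conditional probability is a binomial estimate with variance θ(1 − θ)/n and the impact evaluation value plays the role of the derivative. The key names and the default z value of 1.96 (a 95% normal interval) are assumptions.

```python
import math


def confidence_interval(p: float, records: list[dict], z: float = 1.96) -> tuple[float, float]:
    """Approximate interval for an inferred probability p.

    Each record carries hypothetical keys:
      'impact'      - impact evaluation value (used as the derivative),
      'conditional' - conditional probability estimate theta,
      'samples'     - number of samples n behind that estimate.
    """
    variance = sum(
        (r["impact"] ** 2) * r["conditional"] * (1.0 - r["conditional"]) / r["samples"]
        for r in records
    )
    half_width = z * math.sqrt(variance)
    # Clip to the valid probability range.
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

Under this approximation, a record with a large impact evaluation value and few samples widens the interval the most.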

The specific flow of the processing in FIG. 1 is now described using, as an example, a serial Bayesian network consisting of node D, node C (the parent of node D), node B (the parent of node C), and node A (the parent of node B). In this network, the conditional probability table 600 of node A gives the distribution of the probability P(A), that of node B gives the distribution of the conditional probability P(B|A), that of node C gives the distribution of P(C|B), and that of node D gives the distribution of P(D|C).

In step S101, the inference unit 220 selects node D as the inference target. In step S102, the inference unit 220 extracts node A, node B, and node C as the nodes 410 that have a dependency relationship with the inference target, and selects node C as the selected node 410.

In step S103, the inference unit 220 extracts node B and node C as family nodes 410 and extracts node B, node C, and node D as the related random variables. In step S104, the inference unit 220 calculates the distribution of the joint probability P(B, C, D).

In step S105, for each combination of state values of node B, node C, and node D, the inference unit 220 divides the joint probability P(B=b, C=c, D=d) by the conditional probability P(C=c|B=b) to calculate the probability value derivative. In step S106, the inference unit 220 calculates the impact evaluation value using the probability value derivative. This makes it possible to estimate how strongly the conditional probability values of node C influence the probability value of node D.

In step S107, the inference unit 220 determines that processing has not yet been completed for all conditional probability tables 600. It therefore returns to step S102 and selects node B as the selected node 410. In step S103, the inference unit 220 extracts node A and node B as family nodes 410 and extracts node A, node B, and node D as the related random variables. In step S104, the inference unit 220 calculates the distribution of the joint probability P(A, B, D).

In step S105, for each combination of state values of node A, node B, and node D, the inference unit 220 divides the joint probability P(A=a', B=b', D=d') by the conditional probability P(B=b'|A=a') to calculate the probability value derivative. In step S106, the inference unit 220 calculates the impact evaluation value using the probability value derivative. This makes it possible to estimate how strongly the conditional probability values of node B influence the probability value of node D.

In step S107, the inference unit 220 determines that processing has not yet been completed for all conditional probability tables 600. It therefore returns to step S102 and selects node A as the selected node 410. In step S103, because node A is a root node 410 and has no parent node 410, the inference unit 220 extracts node A and node D as the related random variables. In step S104, the inference unit 220 calculates the distribution of the joint probability P(A, D).

In step S105, for each combination of state values of node A and node D, the inference unit 220 divides the joint probability P(A=a'', D=d'') by the probability P(A=a'') to calculate the probability value derivative. In step S106, the inference unit 220 calculates the impact evaluation value using the probability value derivative. This makes it possible to estimate how strongly the probability values of node A influence the probability value of node D.

In step S107, the inference unit 220 determines that processing has been completed for all conditional probability tables 600. In step S108, the inference unit 220 calculates the confidence interval of the probability value of the inference result using the calculated impact evaluation values. This concludes the description of the specific flow of the processing in FIG. 1.
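The first pass of the walkthrough (selected node C, inference target D) can be reproduced numerically. The sketch below uses hypothetical binary probability tables and exact enumeration in place of approximate inference, which is feasible only because the chain is tiny. It also compares the ratio P(B=b, C=c, D=d) / P(C=c|B=b) with a finite-difference estimate of the change in P(D=d) when the single table entry P(C=c|B=b) is perturbed. The entry is treated as a free parameter, without renormalizing the table.

```python
from itertools import product

# Hypothetical binary tables for the chain A -> B -> C -> D.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # key: (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # key: (c, b)
p_d_given_c = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.9}  # key: (d, c)


def joint_bcd(b, c, d, c_table=p_c_given_b):
    """P(B=b, C=c, D=d), obtained by summing node A out of the chain."""
    p_b = sum(p_a[a] * p_b_given_a[(b, a)] for a in (0, 1))
    return p_b * c_table[(c, b)] * p_d_given_c[(d, c)]


def derivative_wrt_c_table(b, c, d):
    """Step S105 for selected node C: joint probability / conditional probability."""
    return joint_bcd(b, c, d) / p_c_given_b[(c, b)]


def marginal_d(d, c_table=p_c_given_b):
    """P(D=d), obtained by summing the joint over B and C."""
    return sum(joint_bcd(b, c, d, c_table) for b, c in product((0, 1), repeat=2))


def finite_difference(b, c, d, eps=1e-6):
    """Numerical change in P(D=d) per unit change of the entry P(C=c|B=b)."""
    perturbed = dict(p_c_given_b)
    perturbed[(c, b)] += eps
    return (marginal_d(d, perturbed) - marginal_d(d)) / eps
```

With these tables, `derivative_wrt_c_table(0, 0, 0)` is approximately 0.25, and it agrees with `finite_difference(0, 0, 0)` up to floating-point error, which illustrates why the ratio can serve as a sensitivity measure.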

An example of the information displayed by the display unit 230 is described with reference to FIG. 11. FIG. 11 is an explanatory diagram showing an example of the user interface 1100 displayed by the display unit 230 of Example 1.

The user interface 1100 includes an inference result display area 1110, a graphical model display area 1120, an impact evaluation value sort button 1130, and a sample count sort button 1140.

The inference result display area 1110 is an area for displaying various information related to the inference result. In FIG. 11, the inference result display area 1110 displays the impact management information 900 and the inference result management information 800. In this case, the display unit 230 acquires the impact management information 900 from the impact storage unit 245 and the inference result management information 800 from the inference result storage unit 244, and displays the acquired information in the display area.

The graphical model display area 1120 is an area for displaying the graphical model 400 used to calculate the inference result. The display unit 230 acquires the structure information 500 of the nodes 410 and the structure information 510 of the edges 420 from the model information storage unit 242, generates display data for the graphical model 400, and thereby displays the graphical model 400 in the display area.

The impact evaluation value sort button 1130 and the sample count sort button 1140 are operation buttons for sorting the records of the impact management information 900. When the impact evaluation value sort button 1130 is operated, the display unit 230 displays the impact management information 900 with its records sorted in ascending or descending order of the impact evaluation value 908. When the sample count sort button 1140 is operated, the display unit 230 displays the impact management information 900 with its records sorted in ascending or descending order of the number of samples 904.
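The behavior of the two sort buttons can be sketched as a single sort over the records. The record layout and the key names `impact` and `samples` are hypothetical stand-ins for the fields 908 and 904.

```python
def sort_impact_records(records: list[dict], key: str, descending: bool = False) -> list[dict]:
    """Return a new list of records ordered by 'impact' or 'samples'."""
    if key not in ("impact", "samples"):
        raise ValueError("key must be 'impact' or 'samples'")
    # sorted() leaves the stored records untouched; only the displayed order changes.
    return sorted(records, key=lambda record: record[key], reverse=descending)
```

Sorting by impact in descending order and then checking the sample counts of the top rows is one way to spot conditional probabilities that are both influential and weakly supported.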

The display format of the user interface 1100 and the information it displays are examples, and the invention is not limited to them. The display unit 230 can display various information in various formats using the information stored in the database 206.

According to Example 1, the computer 200 calculates the probability value derivative used to obtain the impact evaluation value with a formula such as equation (4). Because the joint probability distribution can be calculated with an approximate inference method, the calculation cost is lower than with the conventional equation (2). The impact evaluation value can therefore be calculated even for a large-scale graphical model. In addition, the confidence interval of the probability value of the inference result can be calculated using the impact evaluation value. Displaying the inference result together with its confidence interval improves the reliability and persuasiveness of the inference result.

In Example 2, the graphical model is reconstructed by adding reinforcement data based on the impact evaluation value. Example 2 is described below, focusing on the differences from Example 1.

The computer system of Example 2 is the same as that of Example 1. The hardware and software configurations of the computer 200 of Example 2 are also the same as those of Example 1. The information stored in the database 206 of Example 2 differs from Example 1 in that reinforcement data managed by the learning data storage unit 241 is newly stored. The other information is the same as in Example 1.

In Example 2, the following processing is executed after the inference result and the confidence interval of its probability value have been calculated. It is assumed that supplementary data information containing a plurality of reinforcement data records has already been input.

FIG. 12 is a flowchart illustrating the graphical model reconstruction process executed by the computer 200 of Example 2.

The computer 200 selects one piece of impact management information 900 (step S1201) and sorts the records of the conditional probability table 600 corresponding to the selected impact management information 900 in descending order of the impact evaluation value 908 (step S1202).

Next, the computer 200 selects the records whose impact evaluation value 908 is greater than a predetermined threshold as the conditional probability values to be reinforced (step S1203).

Next, based on the combination of state values of the nodes 410 corresponding to the selected conditional probability values, the computer 200 checks the records included in the supplementary data and extracts the usable records (step S1204). For example, the computer 200 matches the supplementary data by treating the state values of the plurality of nodes 410 as a state value vector.

Next, the computer 200 updates the learning data 300 by adding the records extracted from the reinforcement data to the learning data 300 (step S1205).

The computer 200 determines whether processing has been completed for all of the impact management information 900 (step S1206).

If it is determined that processing has not been completed for all of the impact management information 900, the computer 200 returns to step S1201 and executes the same processing.

If it is determined that processing has been completed for all of the impact management information 900, the computer 200 executes the construction process of the graphical model 400 using the updated learning data 300 (step S1207). The construction process of the graphical model 400 is the same as the process shown in FIG. 10.
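Steps S1202 to S1205 can be sketched as a filter over the supplementary data. The record layout is hypothetical: each impact record carries a `states` dictionary mapping node names to state values, which stands in for the state value vector, and each supplementary row is a dictionary of node values. Model reconstruction (step S1207) is outside the scope of the sketch.

```python
def select_reinforcement_targets(records: list[dict], threshold: float) -> list[dict]:
    """Steps S1202 and S1203: rank by impact, keep entries above the threshold."""
    ranked = sorted(records, key=lambda r: r["impact"], reverse=True)
    return [r for r in ranked if r["impact"] > threshold]


def matching_rows(targets: list[dict], supplementary_rows: list[dict]) -> list[dict]:
    """Step S1204: keep rows whose node values match a target state vector."""
    return [
        row for row in supplementary_rows
        if any(
            all(row.get(node) == state for node, state in target["states"].items())
            for target in targets
        )
    ]


def reinforce(learning_data: list[dict], records: list[dict],
              supplementary_rows: list[dict], threshold: float) -> list[dict]:
    """Step S1205: return the learning data extended with the matching rows."""
    targets = select_reinforcement_targets(records, threshold)
    return learning_data + matching_rows(targets, supplementary_rows)
```

Only rows that add samples to high-impact conditional probabilities are appended, so the learning data grows where extra evidence is expected to matter most.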

The processing from step S1201 to step S1206 is assumed to be executed by the inference unit 220. It may instead be executed by the graphical model construction unit 210 or another functional unit.

According to Example 2, reconstructing the graphical model 400 based on the impact evaluation value narrows the confidence interval of the inference result. In other words, a graphical model 400 that yields highly reliable inference results can be constructed.

In Example 3, the graphical model 400 is made more compact by using the impact evaluation value to re-evaluate the dependency of the inference result on each node 410. Example 3 is described below, focusing on the differences from Example 1.

The computer system of Example 3 is the same as that of Example 1. The hardware and software configurations of the computer 200 of Example 3 are also the same as those of Example 1, as is the information stored in the database 206.

In Example 3, the following processing is executed after the inference result and the confidence interval of its probability value have been calculated.

FIG. 13 is a flowchart illustrating the graphical model reconstruction process executed by the computer 200 of Example 3.

Using each piece of impact management information 900, the computer 200 calculates the total impact evaluation value of the corresponding node 410 (step S1301).

For example, the computer 200 calculates the weighted average of the impact evaluation values 908 of all records in the selected impact management information 900 as the total impact evaluation value of the node 410. Because an impact evaluation value of a conditional probability value exists for each state value of the inference target, the calculated total impact evaluation value of the node 410 also exists for each state value of the inference target. By calculating, for one node 410, the weighted average of the impact evaluation values 908 over all state values of the inference target, the impact of that node 410 on the inference target can be estimated.

The total impact evaluation value of the node 410 described above is one example, and other calculation methods may be used.

Next, based on the calculated total impact evaluation values of the nodes 410, the computer 200 selects important nodes 410 from among the plurality of nodes 410 that have a dependency relationship with the inference target (step S1302).

For example, the computer 200 selects, as important nodes 410, the nodes 410 whose total impact evaluation value is greater than a preset threshold. The threshold can be set based on, for example, the amount of computer resources of the computer 200 and the calculation time.

The computer 200 then executes the construction process of the graphical model 400 again (step S1303). The process of step S1303 is almost the same as that of FIG. 10, except that in step S1003 only the selected important nodes 410 are input as the information on nodes 410 having a dependency relationship. The other processing is the same as in FIG. 10.

The processing of steps S1301 and S1302 is assumed to be executed by the inference unit 220. It may instead be executed by the graphical model construction unit 210 or another functional unit.

The graphical model can also be made more compact by repeating the processing described above.

According to Example 3, the graphical model can be made more compact based on the impact evaluation value.

In Example 4, the graphical model 400 is reconstructed by adjusting the discretization granularity of the learning data using the impact evaluation value and the number of samples. Here, the discretization granularity refers to the width of the intervals into which a continuous numerical value is divided.

For example, for continuous real numbers, a discretization that handles only integer values has a coarser granularity than one that handles values to the first decimal place. As a more concrete example from medical data, a blood pressure value is continuous and is distributed over a wide range from "60" to "200". As described in Example 1, when a graphical model is constructed, the state values of a node 410 may need to be finite. In that case, one possible approach is to discretize the blood glucose level into four ranges: "80 or less", "80 to 100", "100 to 120", and "120 or more".
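The four-range discretization can be sketched with a sorted list of bin edges. The description does not say which range a boundary value such as 80 or 100 belongs to. The sketch assumes that a boundary value falls into the lower range, and the names `EDGES`, `LABELS`, and `discretize` are hypothetical.

```python
from bisect import bisect_left

EDGES = [80, 100, 120]  # boundaries between the four ranges
LABELS = ["80 or less", "80 to 100", "100 to 120", "120 or more"]


def discretize(value: float) -> str:
    """Map a continuous measurement to one of the four state labels.

    bisect_left places a value equal to an edge in the lower range
    (an assumption; the source text leaves boundary handling open).
    """
    return LABELS[bisect_left(EDGES, value)]
```

Changing the granularity then amounts to editing `EDGES` and `LABELS`, while the lookup itself stays the same.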

The discretization width affects the predictive performance of the graphical model. When values are discretized, the inference result is an approximation of the original continuous value, so a finer discretization width reduces the discretization error. On the other hand, a finer width reduces the number of learning data records that match each node state value. This lowers the reliability of the conditional probability values in the conditional probability table 600 and, as a result, degrades the reliability and accuracy of the inference result. An appropriate discretization width therefore needs to be set.

Example 4 is described below, focusing on the differences from Example 1.

The computer system of Example 4 is the same as that of Example 1. The hardware and software configurations of the computer 200 of Example 4 are also the same as those of Example 1, as is the information stored in the database 206.

In Example 4, the following processing is executed after the inference result and the confidence interval of its probability value have been calculated.

FIG. 14 is a flowchart illustrating the graphical model reconstruction process executed by the computer 200 of Example 4.

The computer 200 selects one node 410 whose state values are discretized from among the nodes 410 for which impact management information 900 exists (step S1401). For example, a flag or the like may be assigned in advance, when the graphical model 400 is constructed, to each node 410 whose state values are discretized.

Next, the computer 200 determines the discretization granularity of the selected node 410 based on the impact management information 900 (step S1402). The following four methods, for example, can be used to determine the discretization granularity.

(1) The discretization granularity is determined so that the number of samples 904 is the same for all records, or so that the differences between them are small.

(2) The discretization granularity is determined so that the product, or the weighted average, of the impact evaluation value 908 and the number of samples 904 of each record is the same across records, or so that the differences are small.

(3) The discretization granularity is determined so that the number of records whose impact evaluation value 908 is equal to or greater than a predetermined threshold is half the number of records whose impact evaluation value 908 is smaller than that threshold.

(4) The discretization granularity is determined by classifying the range of state values into a predetermined number of classes based on the impact evaluation value 908.

The methods described above for determining the discretization granularity are examples, and any method that uses the impact management information 900 may be employed.
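Method (1) can be illustrated with equal-frequency binning, which places bin edges at sample quantiles so that each state receives roughly the same number of learning records. This is only one possible reading of method (1): it balances the counts of a single node's values, whereas the number of samples 904 is recorded per combination of parent and child states. The function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def equal_frequency_edges(values: list[float], n_bins: int) -> list[float]:
    """Return n_bins - 1 edges that split the sorted values into similar-sized groups."""
    if n_bins < 2 or len(values) < n_bins:
        raise ValueError("need at least two bins and at least n_bins values")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def bin_counts(values: list[float], edges: list[float]) -> Counter:
    """Count how many values fall into each bin (a value equal to an edge goes up)."""
    return Counter(bisect_right(edges, v) for v in values)
```

With many tied values the counts can still be uneven, so the resulting sample counts should be checked after the learning data has been discretized again.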

Next, the computer 200 determines whether processing has been completed for all nodes 410 whose state values are discretized (step S1403).

If it is determined that processing has not been completed for all nodes 410 whose state values are discretized, the computer 200 returns to step S1401 and executes the same processing.

If it is determined that processing has been completed for all nodes 410 whose state values are discretized, the computer 200 executes the construction process of the graphical model 400 again (step S1404). In this case, the discretization of the learning data in step S1002 is executed based on the determined discretization granularity. The other processing is the same as in FIG. 10.

The processing from step S1401 to step S1403 is assumed to be executed by the inference unit 220. It may instead be executed by the graphical model construction unit 210 or another functional unit.

An appropriate discretization granularity can be determined by repeating the processing described above.

According to Example 4, adjusting the discretization granularity based on the impact evaluation value, the number of samples, and the like makes it possible to construct a graphical model 400 that yields highly reliable inference results.

The present invention is not limited to the examples described above and includes various modifications. For instance, the examples above describe the configurations in detail to make the invention easy to understand, and the invention is not necessarily limited to implementations that include every configuration described. Part of the configuration of each example can also be added to, deleted from, or replaced with another configuration.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be implemented in hardware, for example by designing them as integrated circuits. The present invention can also be implemented by software program code that realizes the functions of the examples. In this case, a storage medium on which the program code is recorded is provided to a computer, and a CPU of the computer reads the program code stored on the storage medium. The program code read from the storage medium then itself realizes the functions of the examples described above, and the program code and the storage medium storing it constitute the present invention. Storage media that can be used to supply such program code include, for example, a flexible disk, a CD-ROM, a DVD-ROM, a hard disk, an SSD (Solid State Drive), an optical disk, a magneto-optical disk, a CD-R, a magnetic tape, a nonvolatile memory card, and a ROM.

The program code that realizes the functions described in the embodiments can be implemented in a wide range of programming or scripting languages, such as assembler, C/C++, Perl, shell, PHP, and Java.

Furthermore, the program code of the software that realizes the functions of the embodiments may be distributed over a network and stored in storage means such as a hard disk or memory of a computer, or on a storage medium such as a CD-RW or CD-R, and the CPU of the computer may read and execute the program code stored in that storage means or on that storage medium.

In the embodiments described above, the control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines of an actual product are necessarily shown. All of the components may be connected to one another.

Claims

A computer system comprising one or more computers each having an arithmetic device that executes a program and a memory that stores the program, the computer system comprising:
a learning data storage unit that manages learning data composed of records including a plurality of items corresponding to random variables;
a graphical model construction unit that uses the learning data to construct a graphical model composed of nodes corresponding to the random variables, edges indicating dependency relationships between the nodes, and probability tables each indicating a distribution of probability values over the state values of the random variable corresponding to a node;
a model information storage unit that manages structure information of the graphical model constructed by the graphical model construction unit and the probability table of each of a plurality of nodes included in the graphical model; and
an inference unit that accepts input of evidence data, which is composed of a record including a plurality of items corresponding to the random variables and in which a value is stored in at least one item, and that calculates an inference result that is a distribution of probability values with which an inference target node takes predetermined state values,
wherein the inference unit includes:
an influence degree calculation unit that calculates an influence evaluation value indicating the magnitude of the influence that a probability value of a node having a dependency relationship with the inference target node has on the probability value of the inference target node; and
a probability value differential amount calculation unit that calculates a probability value differential amount, which is used to calculate the influence evaluation value and is the amount of change in the probability value of the inference target node with respect to a minute change in the probability value of the node having a dependency relationship with the inference target node,
and wherein the probability value differential amount calculation unit:
refers to the structure information of the graphical model managed by the model information storage unit and extracts nodes having a dependency relationship with the inference target node;
selects a processing target node from among the extracted nodes and obtains the probability table of the processing target node managed by the model information storage unit;
refers to the structure information of the graphical model managed by the model information storage unit and extracts, as family nodes, a set of one or more nodes having a dependency relationship with the processing target node;
calculates, using the probability tables of the inference target node, the processing target node, and the family nodes, a joint probability distribution of a joint random variable combining the inference target node, the processing target node, and the family nodes; and
calculates the probability value differential amount using the joint probability distribution and the probability table of the processing target node.
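
As an illustration only, the calculation recited above can be sketched in Python on a toy network. The sketch rests on several assumptions that are not taken from this publication: the four binary nodes U, X, Y, E and their table values are hypothetical; the family node of the processing target node X is taken to be its parent U; the perturbed table entry is treated as a free parameter with no renormalization of its sibling entry; and the formula used is the standard sensitivity-analysis identity dP(y|e)/dθ = (P(y, x, u | e) − P(y | e)·P(x, u | e)) / θ, which is not necessarily Equation (1) of this publication.

```python
from itertools import product

# Hypothetical toy structure: U -> X, X -> Y, X -> E (all nodes binary).
PARENTS = {"U": (), "X": ("U",), "Y": ("X",), "E": ("X",)}
ORDER = ("U", "X", "Y", "E")

def make_tables(theta):
    """Probability tables: tables[node][parent_states][state].
    theta is the entry P(X=1 | U=0), treated as a free parameter."""
    return {
        "U": {(): (0.6, 0.4)},
        "X": {(0,): (0.7, theta), (1,): (0.2, 0.8)},
        "Y": {(0,): (0.9, 0.1), (1,): (0.25, 0.75)},
        "E": {(0,): (0.7, 0.3), (1,): (0.1, 0.9)},
    }

def mass(tables, **fixed):
    """Sum the product of table entries over all assignments matching `fixed`."""
    total = 0.0
    for states in product((0, 1), repeat=len(ORDER)):
        a = dict(zip(ORDER, states))
        if any(a[k] != v for k, v in fixed.items()):
            continue
        p = 1.0
        for node in ORDER:
            p *= tables[node][tuple(a[q] for q in PARENTS[node])][a[node]]
        total += p
    return total

def query(tables):
    """Inference result P(Y=1 | E=1) for the inference target node Y."""
    return mass(tables, Y=1, E=1) / mass(tables, E=1)

def derivative(tables, theta):
    """Change of P(Y=1 | E=1) per unit change of the entry P(X=1 | U=0)."""
    pe = mass(tables, E=1)
    joint = mass(tables, Y=1, X=1, U=0, E=1) / pe  # P(y, x, u | e)
    family = mass(tables, X=1, U=0, E=1) / pe      # P(x, u | e)
    return (joint - query(tables) * family) / theta

theta = 0.3
analytic = derivative(make_tables(theta), theta)
h = 1e-6  # central finite difference as an independent check
numeric = (query(make_tables(theta + h)) - query(make_tables(theta - h))) / (2 * h)
print(analytic, numeric)
```

The brute-force enumeration in `mass` is only practical for very small networks; a real system would obtain the same joint probabilities from an exact or approximate inference engine.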

The computer system according to claim 1,
wherein the probability value differential amount calculation unit calculates the probability value differential amount using Equation (1).

The computer system according to claim 2,
wherein the inference unit includes a confidence interval calculation unit that calculates a confidence interval of a probability value in the inference result using a function having the influence evaluation value as a variable, and
the computer system further comprises an inference result storage unit that manages inference result management information in which the probability value of the inference result and the confidence interval are associated with each other.
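
The claim above does not specify the function that maps influence evaluation values to a confidence interval. Purely as an assumption for illustration, the following Python sketch uses a first-order (delta-method) approximation: each probability table entry θ estimated from n records is given the binomial variance θ(1 − θ)/n, the entries are treated as independent, and the variance of the inferred probability is approximated by the sum of squared derivatives times those variances. The function name, the input layout, and the numbers are hypothetical.

```python
import math

def confidence_interval(p, contributions, z=1.96):
    """Approximate interval for an inferred probability p.

    contributions: iterable of (derivative, theta, n_samples) tuples, one per
    probability table entry; this layout is an assumption of the sketch.
    z: normal quantile (1.96 corresponds to roughly a 95% interval).
    """
    variance = sum(d * d * theta * (1.0 - theta) / n
                   for d, theta, n in contributions)
    half_width = z * math.sqrt(variance)
    # Clip to [0, 1] because the result is a probability value.
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical values: one table entry with derivative 0.14625 and theta 0.3.
low_100, high_100 = confidence_interval(0.5875, [(0.14625, 0.3, 100)])
low_10k, high_10k = confidence_interval(0.5875, [(0.14625, 0.3, 10_000)])
print((low_100, high_100), (low_10k, high_10k))
```

Under these assumptions the interval narrows as the number of supporting records grows, which is the behaviour one would expect from a measure of statistical uncertainty.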

The computer system according to claim 2, further comprising
an influence degree storage unit that manages, for each of the probability tables of the extracted nodes, influence degree management information in which the probability values of the probability table are associated with the influence evaluation values,
wherein the learning data storage unit manages reinforcement data composed of records including a plurality of items corresponding to the random variables,
the inference unit:
selects a probability table to be processed after the influence evaluation values have been calculated for all probability values in the probability tables of the extracted nodes;
refers to the influence degree management information corresponding to the probability table to be processed and selects, based on the influence evaluation values, a probability value that requires re-learning using the reinforcement data;
collates the combination of state values of the random variables corresponding to the selected probability value against the records included in the reinforcement data;
extracts, based on the result of the collation, the records to be used for the re-learning from the reinforcement data; and
updates the learning data by adding the extracted records, and
the graphical model construction unit constructs the graphical model again using the updated learning data.
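
A minimal Python sketch of the record selection recited above follows. The data layout (dictionaries keyed by variable name), the field names, and the rule "influence evaluation value at or above a threshold" are assumptions made for illustration; the claim itself only states that the selection is based on the influence evaluation value.

```python
def select_relearning_records(influence_rows, reinforcement_records, threshold):
    """Return the reinforcement records to add to the learning data."""
    # Probability values judged to need re-learning (threshold rule is assumed).
    targets = [row["states"] for row in influence_rows
               if row["influence"] >= threshold]
    # Collate each record against the selected combinations of state values
    # and keep the records that match at least one of them.
    return [rec for rec in reinforcement_records
            if any(all(rec.get(name) == value for name, value in t.items())
                   for t in targets)]

# Hypothetical influence degree management information for one probability table.
influence_rows = [
    {"states": {"U": 0, "X": 1}, "prob": 0.3, "influence": 0.146},
    {"states": {"U": 1, "X": 1}, "prob": 0.8, "influence": 0.020},
]
# Hypothetical reinforcement data and learning data.
reinforcement_records = [
    {"U": 0, "X": 1, "Y": 1},
    {"U": 1, "X": 1, "Y": 0},
    {"U": 0, "X": 0, "Y": 0},
]
learning_data = [{"U": 1, "X": 0, "Y": 0}]
learning_data += select_relearning_records(
    influence_rows, reinforcement_records, threshold=0.1)
print(learning_data)
```

After this update, the graphical model would be constructed again from `learning_data`; that step is outside the scope of the sketch.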

The computer system according to claim 2, further comprising
an influence degree storage unit that manages, for each of the probability tables of the extracted nodes, influence degree management information in which the probability values of the probability table are associated with the influence evaluation values,
wherein the inference unit:
calculates, based on the influence degree management information, a total influence evaluation value indicating the magnitude of the influence of each extracted node on the inference target node; and
selects important nodes from among the extracted nodes based on the total influence evaluation values, and
the graphical model construction unit constructs the graphical model again taking into account only the dependency relationships of the important nodes.
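
The node selection recited above could look like the following Python sketch. Two points are assumptions, since the claim does not fix them: the total influence evaluation value of a node is taken to be the sum of the absolute influence evaluation values of its probability table entries, and a node counts as important when that total reaches a threshold. The node names and values are hypothetical.

```python
def select_important_nodes(influence_info, threshold):
    """influence_info maps a node name to the influence evaluation values of
    the entries in its probability table (hypothetical layout)."""
    # Assumed aggregation: sum of absolute influence evaluation values.
    totals = {node: sum(abs(v) for v in values)
              for node, values in influence_info.items()}
    # Assumed selection rule: keep nodes whose total reaches the threshold.
    important = sorted(node for node, total in totals.items()
                       if total >= threshold)
    return important, totals

influence_info = {
    "U": [0.01, 0.02],
    "X": [0.15, 0.05, 0.10, 0.02],
    "E": [0.001, 0.002],
}
important, totals = select_important_nodes(influence_info, threshold=0.05)
print(important, totals)
```

Other aggregations, such as the maximum entry or a sample-weighted sum, would fit the claim language equally well; the sum is chosen here only for simplicity.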

The computer system according to claim 2,
wherein the graphical model construction unit
constructs the graphical model by discretizing, at a predetermined granularity, the state values of a node whose state values are continuous,
the computer system further comprises an influence degree management unit that manages, for each of the probability tables of the extracted nodes, influence degree management information in which the probability values of the probability table, the number of samples, which is the number of records of the learning data that match the combination of state values of the random variables corresponding to each probability value of the probability table, and the influence evaluation values are associated with one another,
the inference unit:
selects a node whose state values are continuous;
reads the influence degree management information corresponding to the probability table of that node; and
determines the granularity of the state values of the node based on the number of samples and the influence evaluation values in the influence degree management information, and
the graphical model construction unit constructs the graphical model again based on the newly determined granularity of the state values of the node.

A graphical model management method that, in a computer system including one or more computers, takes into account the statistical uncertainty of a graphical model composed of nodes corresponding to random variables, edges indicating dependency relationships between the nodes, and probability tables each indicating a distribution of probability values over the state values of the random variable corresponding to a node,
wherein each of the one or more computers has an arithmetic device that executes a program and a memory that stores the program,
the computer system includes a learning data storage unit that manages learning data composed of records including a plurality of items corresponding to the random variables, and
the graphical model management method comprises:
a first step in which the arithmetic device constructs the graphical model using the learning data and stores, in the memory, structure information of the constructed graphical model and the probability table of each of a plurality of nodes included in the graphical model;
a second step in which the arithmetic device accepts input of evidence data, which is composed of a record including a plurality of items corresponding to the random variables and in which a value is stored in at least one item, and calculates an inference result that is a distribution of probability values with which an inference target node takes predetermined state values;
a third step in which the arithmetic device calculates a probability value differential amount that is the amount of change in the probability value of the inference target node with respect to a minute change in the probability value of a node having a dependency relationship with the inference target node; and
a fourth step in which the arithmetic device uses the probability value differential amount to calculate an influence evaluation value indicating the magnitude of the influence that the probability value of the node having a dependency relationship with the inference target node has on the probability value of the inference target node,
wherein the third step includes:
a fifth step of referring to the structure information of the graphical model stored in the memory and extracting nodes having a dependency relationship with the inference target node;
a sixth step of selecting a processing target node from among the extracted nodes and obtaining the probability table of the processing target node from the memory;
a seventh step of referring to the structure information of the graphical model stored in the memory and extracting, as family nodes, a set of one or more nodes having a dependency relationship with the processing target node;
an eighth step of calculating, using the probability tables of the inference target node, the processing target node, and the family nodes, a joint probability distribution of a joint random variable combining the inference target node, the processing target node, and the family nodes; and
a ninth step of calculating the probability value differential amount using the joint probability distribution and the probability table of the processing target node.

The graphical model management method according to claim 7,
wherein, in the ninth step, the probability value differential amount is calculated using Equation (2).

The graphical model management method according to claim 8,
wherein the fourth step includes:
a step of calculating, after the influence evaluation value has been calculated, a confidence interval of a probability value in the inference result using a function having the influence evaluation value as a variable; and
a step of storing, in the memory, inference result management information in which the probability value of the inference result and the confidence interval are associated with each other.