JP2010257269A

JP2010257269A - Probabilistic inference device

Info

Publication number: JP2010257269A
Application number: JP2009107253A
Authority: JP
Inventors: Yuuji Ichisugi (一杉裕志)
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2009-04-27
Filing date: 2009-04-27
Publication date: 2010-11-11
Anticipated expiration: 2029-04-27
Also published as: JP5170698B2

Abstract

PROBLEM TO BE SOLVED: To provide a probabilistic inference device that performs inference quickly and efficiently by using a Bayesian network having a mechanism for restricting the combinations of values of the nodes representing random variables.

SOLUTION: The probabilistic inference device includes an inference mechanism that performs inference using a Bayesian network, in which a plurality of nodes representing random variables and edges representing causal relationships between those random variables form a network. Constraints are added to the combinations of values in the Bayesian network to reduce the degrees of freedom of those combinations. As the technical means for constraining the values, constraint nodes are added to the Bayesian network.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a probabilistic inference device that uses Bayesian-network knowledge representation, and more specifically to a probabilistic inference device that performs inference quickly and efficiently using a Bayesian network having a mechanism that restricts the combinations of values that nodes representing random variables can take.

A Bayesian network (Non-Patent Document 1) is a data structure for storing probabilistic causal relationships among a plurality of random variables in computer memory. A Bayesian network can represent complex knowledge efficiently, and various kinds of probabilistic inference, such as posterior probability calculation and MPE calculation, can be performed on the basis of that knowledge. Bayesian networks are currently applied in a wide range of fields, including pattern recognition for speech and images, robot motion control, natural language processing, and knowledge information processing. The hidden Markov model (HMM), which is widely used in speech recognition, is also a kind of Bayesian network.

A Bayesian network is a network of a plurality of nodes, made up of nodes representing random variables and edges between nodes representing causal relationships between those random variables. In addition, each node holds what is called a conditional probability table. The conditional probability table of a node lists the conditional probability that the node takes a given value when its set of parent nodes takes a given combination of values.

FIG. 1 is a diagram illustrating an example of a Bayesian network composed of four nodes, and FIG. 2 is a diagram illustrating an example of the conditional probability tables of these nodes. With reference to FIGS. 1 and 2, a simple Bayesian network composed of node S, node R, node W, and node C, which represent four random variables, is described. The random variable S of node S represents "whether the sprinkler ran", the random variable R of node R represents "whether it rained", the random variable W of node W represents "whether the lawn is wet", and the random variable C of node C represents "whether there are clouds".

As shown in FIG. 2, the causal relationships among the four random variables are given as conditional probability tables. FIG. 2 shows an example of the conditional probability table attached to each node of the Bayesian network of FIG. 1. Here, 21 is the conditional probability table attached to node S, 22 is the table attached to node R, 23 is the table attached to node W, and 24 is the table attached to node C.

A conditional probability table is data that stores knowledge of the strength of the causal relationships between random variables. For example, in the conditional probability table 23 attached to node W, the conditional probability P(W=no | S=no, R=no) = 0.88 represents the knowledge that the probability that the lawn is not wet when the sprinkler has not run and it has not rained is 0.88. Likewise, in the conditional probability table 22 attached to node R, P(R=yes) = 0.02 represents the knowledge that the probability of rain (the prior probability) is simply 0.02.
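As a minimal sketch of how such tables might be held in memory, the following Python fragment stores a conditional probability table as a dictionary. The dictionary layout and the helper name `lookup` are assumptions made for illustration. Only the two probabilities quoted above come from the text; their complements follow from normalization, and all other entries of FIG. 2 are left out.

```python
# Conditional probability tables stored as plain dictionaries (assumed layout).
# Key: (value of the node, tuple of parent values); value: probability.
# Only the two quoted entries come from the text; the complements follow
# from normalization. Other entries of FIG. 2 are intentionally omitted.
cpt_R = {
    ("yes", ()): 0.02,        # P(R=yes), quoted in the text
    ("no", ()): 1.0 - 0.02,   # P(R=no), by normalization
}

# Parents of W are (S, R), in that order.
cpt_W = {
    ("no", ("no", "no")): 0.88,         # P(W=no | S=no, R=no), quoted
    ("yes", ("no", "no")): 1.0 - 0.88,  # P(W=yes | S=no, R=no)
}


def lookup(cpt, value, parent_values=()):
    """Return P(node=value | parents=parent_values) from a table."""
    return cpt[(value, tuple(parent_values))]


print(lookup(cpt_R, "yes"))
print(lookup(cpt_W, "no", ("no", "no")))
```

Keying each entry by the node value and the tuple of parent values keeps the lookup independent of how many parents a node has.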

Next, the concept of the MPE (most probable explanation), which is needed to explain the algorithm of the present invention, is briefly described.

The MPE is the set of variable values that best explains the given observed data in a Bayesian network. Let i be the set of pairs of a random variable representing observed data and its value, and let h be the set of pairs of a hidden variable (a random variable other than the observed data) and its value. The set of values m that constitutes the MPE is then given by the following expression.

$$m = \operatorname*{argmax}_{h} P(h, i)$$

Here, P(h, i) is the joint probability that the combination of values in set h and set i occurs, and can be expressed by the following expression.

$$P(h, i) = \prod_{x \in h \cup i} P(x \mid \mathrm{parents}(x))$$

Here, parents(x) is the set of values of the parent nodes of node X.

For example, suppose that the observed value W=yes is given in the Bayesian network of FIG. 1. In this case, the MPE to be found is the set of values {s, r, c} of the hidden variables S, R, C that has the highest joint probability with the observed value, and is expressed by the following expression.

$$\{s, r, c\} = \operatorname*{argmax}_{s, r, c} P(S=s, R=r, C=c, W=\mathrm{yes})$$

An example of a concrete MPE calculation procedure is shown below. First, the joint probability of the set of values "S=no, R=no, C=no" with the observed value W=yes is calculated as follows using the values in the conditional probability tables of FIG. 2.
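As an illustration only, the factorization behind this calculation can be written out under an assumed network structure. FIG. 1 is not reproduced in this text, so the edges S→W, R→W, and R→C used here are an assumption. They are consistent with the conditional probabilities P(W | S, R), P(R), and P(C | R) quoted in this description. Node S is further assumed to have no parent.

```latex
P(S{=}\mathrm{no}, R{=}\mathrm{no}, C{=}\mathrm{no}, W{=}\mathrm{yes})
  = P(S{=}\mathrm{no})\; P(R{=}\mathrm{no})\;
    P(C{=}\mathrm{no} \mid R{=}\mathrm{no})\;
    P(W{=}\mathrm{yes} \mid S{=}\mathrm{no}, R{=}\mathrm{no})
```

Of these factors, two follow from values quoted in the description: P(R=no) = 1 − 0.02 = 0.98 and P(W=yes | S=no, R=no) = 1 − 0.88 = 0.12. The remaining factors depend on entries of FIG. 2 that are not quoted here.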

In the same way, the joint probabilities of the other combinations of values are calculated. The joint probabilities of all 2^3 = 8 combinations are summarized below.

Among these, "S=yes, R=no, C=no" is the combination of values with the highest joint probability, so it is the MPE. Accordingly, on the basis of the knowledge stored in the form of FIGS. 1 and 2, it is inferred that if the lawn is wet, the combination "the sprinkler ran, but it did not rain and there are no clouds" is the most likely.
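The exhaustive procedure described above can be sketched in a few lines of Python. This is a minimal brute-force sketch, not the inference mechanism of the invention. The function names `joint` and `mpe`, the dictionary layout, and the two-node network A → B with its probabilities are all hypothetical and chosen only so that the example is self-contained. The values of FIG. 2 are not used.

```python
from itertools import product


def joint(assignment, parents, cpts):
    """P(assignment) as the product over nodes of P(x | parents(x))."""
    p = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[node])
        p *= cpts[node][(value, parent_values)]
    return p


def mpe(domains, parents, cpts, evidence):
    """Enumerate every combination of hidden values and keep the most probable."""
    hidden = [n for n in domains if n not in evidence]
    best, best_p = None, -1.0
    for values in product(*(domains[n] for n in hidden)):
        candidate = dict(zip(hidden, values))
        p = joint({**evidence, **candidate}, parents, cpts)
        if p > best_p:
            best, best_p = candidate, p
    return best, best_p


# Hypothetical two-node network A -> B with made-up probabilities.
domains = {"A": ["yes", "no"], "B": ["yes", "no"]}
parents = {"A": (), "B": ("A",)}
cpts = {
    "A": {("yes", ()): 0.3, ("no", ()): 0.7},
    "B": {
        ("yes", ("yes",)): 0.9, ("no", ("yes",)): 0.1,
        ("yes", ("no",)): 0.2, ("no", ("no",)): 0.8,
    },
}

best, p = mpe(domains, parents, cpts, {"B": "yes"})
print(best, p)
```

Because every combination of hidden values is enumerated, the cost of this sketch grows exponentially with the number of hidden nodes.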

Next, learning of the conditional probability tables of a Bayesian network is described. If the network structure of the Bayesian network is given and a large amount of observed data is available for the sets of values of the random variables, the values of the elements of the conditional probability tables can be determined from that data. This is called learning of the conditional probability tables.

For example, if R=no in 980 of 1000 observations, then P(R=no) is 980/1000. If, among those 980, C=no in 686, then P(C=no | R=no) is 686/980.
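The relative-frequency estimate in this example can be checked with a short Python sketch. The helper name `estimate` is an assumption made for illustration. Exact fractions are used so that the ratios from the text are preserved without rounding.

```python
from fractions import Fraction


def estimate(count_matching, count_condition):
    """Relative-frequency estimate: matching observations / conditioning observations."""
    return Fraction(count_matching, count_condition)


p_r_no = estimate(980, 1000)            # P(R=no)
p_c_no_given_r_no = estimate(686, 980)  # P(C=no | R=no)

print(p_r_no, float(p_r_no))
print(p_c_no_given_r_no, float(p_c_no_given_r_no))
```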

If there are hidden variables (variables for which no observed data is given), the values of the conditional probabilities are determined from estimates of the hidden variables, using the EM algorithm or a similar method.

As prior art, algorithms that learn a mixture model of different Bayesian networks, as in Non-Patent Document 4, have also been proposed.

Learning of conditional probability tables is usually performed by processing a large amount of data at once. However, there are also learning algorithms that update the conditional probability tables incrementally each time new observed data arrives. Such algorithms are called online learning algorithms.

Next, the online learning algorithm is described. FIG. 3 is a flowchart of the online learning algorithm. As shown in this flowchart, when hidden variables are included, the online learning algorithm for the conditional probability tables performs learning through the following processing steps (see Non-Patent Document 3 for details).
Step 1: Set the observed values in the input nodes.
Step 2: Estimate the values of the hidden variables by calculating the MPE from the observed values and the current values of the conditional probability tables.
Step 3: Update the conditional probability tables on the basis of the MPE values.
Step 4: Output the MPE (if necessary).
Step 5: Return to Step 1.
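The five steps above can be sketched as a simple loop. This is a structural sketch only. The function name `online_learning` and its parameters `infer_mpe`, `update_tables`, and `emit` are hypothetical placeholders for the MPE calculation, the table update, and the output, none of which are specified at this level in the text.

```python
def online_learning(observations, infer_mpe, update_tables, tables, emit=None):
    """Run steps 1 to 5 once per observation and return the final tables."""
    for observed in observations:                      # Step 1 (and Step 5: loop)
        mpe = infer_mpe(observed, tables)              # Step 2: estimate hidden values
        tables = update_tables(tables, observed, mpe)  # Step 3: update the tables
        if emit is not None:                           # Step 4: output if needed
            emit(mpe)
    return tables


# Toy stand-ins (assumptions) so that the loop can be exercised.
def copy_input(observed, tables):
    """Pretend MPE: the hidden value simply copies the input value."""
    return {"H": observed["I"]}


def count_pairs(tables, observed, mpe):
    """Pretend update: count how often each (hidden, input) pair occurred."""
    updated = dict(tables)
    key = (mpe["H"], observed["I"])
    updated[key] = updated.get(key, 0) + 1
    return updated


outputs = []
final = online_learning([{"I": 1}, {"I": 1}, {"I": 0}],
                        copy_input, count_pairs, {}, outputs.append)
print(final)
```

Passing the inference and update routines as arguments keeps the loop itself independent of how the MPE is computed or how the tables are stored.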

Each step of the flowchart of this online learning algorithm is described in detail below.
In step 1 (31 in FIG. 3), the values of the newly obtained observed data are set as the values of the input nodes. Observed data is, for example, image information obtained from a camera or the like in the case of an image recognition device, audio information obtained from a microphone or the like in the case of a speech recognition device, a symbol string obtained from a text input device or the like in the case of a natural language processing device, and information about the external environment and the state of the robot obtained from sensors or the like in the case of a robot motion control device.
In step 2 (32 in FIG. 3), the values of the random variables of the nodes other than the input nodes (that is, the hidden nodes) are estimated by MPE calculation, using the values of the input data and the values of the conditional probability tables at that point in time.
In step 3 (33 in FIG. 3), the values of the conditional probability tables are recalculated by adding the values of the random variables calculated in step 2 to the statistics of the data obtained in the past. For example, if the value of the conditional probability P(Y=yes | X=yes) obtained in the past is 3/10 and the values of the random variables X and Y obtained this time are X=yes and Y=yes, the conditional probability is updated to P(Y=yes | X=yes) = (3+1)/(10+1) = 4/11.
In step 4 (34 in FIG. 3), the estimated values of the random variables are output as necessary. For example, a recognition result is output in the case of an image recognition device or a speech recognition device, information representing the meaning of a sentence in the case of a natural language processing device, and information needed to control the actuators in the case of a robot motion control device.
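The count-based update in step 3 can be reproduced with a small sketch. The class name `ConditionalCount` and its fields are assumptions made for illustration. The text only states that new values are added to the statistics of past data.

```python
from fractions import Fraction


class ConditionalCount:
    """Counts behind one table entry P(Y=y | X=x)."""

    def __init__(self, hits, total):
        self.hits = hits    # past observations with X=x and Y=y
        self.total = total  # past observations with X=x

    def update(self, y_matches):
        """Add one new observation in which X=x held."""
        self.total += 1
        if y_matches:
            self.hits += 1

    @property
    def probability(self):
        return Fraction(self.hits, self.total)


entry = ConditionalCount(3, 10)  # past estimate 3/10
entry.update(True)               # new observation: X=yes, Y=yes
print(entry.probability)         # 4/11
```

Keeping the numerator and denominator as counts, rather than storing only the ratio, is what makes the incremental update possible.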

Various means can be used to update the conditional probability tables in step 3. For example, as described in Non-Patent Document 3, one method is to use a self-organizing map. In this case, a random variable corresponds to the competitive layer of a self-organizing map, and the values that the random variable can take correspond to the units of that competitive layer. The conditional probabilities correspond to the values of the elements of the units' reference vectors. This has the advantage that generalization ability improves through the effect of neighborhood learning, which is a characteristic of self-organizing maps.

FIG. 4 is a diagram illustrating the module configuration of a probabilistic inference / conditional probability learning device. An inference learning device that performs probabilistic inference and conditional probability learning using the online learning algorithm of FIG. 3 can have the module configuration shown in FIG. 4. In FIG. 4, 41 is an input unit that receives input data from outside, and 42 is a knowledge database that uses a Bayesian network. The probabilistic inference unit 43 receives values from the input unit 41 and the knowledge database 42 and performs the MPE calculation. The conditional probability table learning unit 44 receives the MPE values from the probabilistic inference unit 43 and updates the values in the knowledge database 42 accordingly. The output unit 45 outputs the MPE values received from the probabilistic inference unit 43.

The device can also be configured as an inference device obtained by removing the conditional probability table learning unit 44 from the probabilistic inference / conditional probability learning device having the module configuration shown in FIG. 4. A device with such a module configuration is a probabilistic inference device without a learning function.

FIG. 5 is a diagram illustrating the module configuration when the online learning algorithm is applied to a robot with learning ability. The online learning algorithm described with reference to FIG. 3 can be applied, for example, to a robot with learning ability.

When the online learning algorithm of FIG. 3 is applied to a robot with learning ability, the result is an inference learning device with the module configuration shown in FIG. 5. In the device configuration shown in FIG. 5, the probabilistic inference unit 53 recognizes the state of the robot's external environment on the basis of information from the sensor 51 and the knowledge database 52. The conditional probability table learning unit 54 receives the recognition result from the probabilistic inference unit 53 and updates the knowledge database 52 accordingly. The decision-making unit 55 decides on a motion on the basis of the recognition result and drives the actuator 56. At the same time, the decision-making unit 55 modifies its behavior rules using a reinforcement learning algorithm or the like.

Non-Patent Document 1: J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
Non-Patent Document 2: Y. Ichisugi (一杉裕志), "The state of elucidation of the information processing principles of the brain" (in Japanese), National Institute of Advanced Industrial Science and Technology Technical Report AIST07-J00012, Mar. 2008.
Non-Patent Document 3: Y. Ichisugi (一杉裕志), "A consideration of Bayesian network structure learning performed by cerebral cortical neural circuits" (in Japanese), Materials of the 72nd Special Interest Group on Fundamental Problems in Artificial Intelligence (SIG-FPAI), Japanese Society for Artificial Intelligence, Nov. 2008.
Non-Patent Document 4: B. Thiesson, C. Meek, D. Chickering, D. Heckerman, Learning mixture of DAG models, Technical Report MSR-TR-97-30, Redmond: Microsoft Research, 1997.

When one attempts to scale up a Bayesian network, the number of combinations of node values grows exponentially with the number of nodes. As a result, probabilistic inference (for example, calculating the posterior probability of each random variable or calculating the MPE) suffers from an increase in the number of meaningless local solutions and an enlarged search space. Learning of the conditional probability tables likewise suffers from overfitting, an increase in the number of meaningless local solutions, and an enlarged search space. Bayesian networks are therefore difficult to scale up beyond a certain size.

In addition, an ordinary Bayesian network cannot represent a mixture distribution efficiently. A mixture distribution is a probability distribution obtained by mixing a plurality of probability distributions of different shapes. To give a concrete example, the visual information entering the retina of an organism is a signal that follows a mixture distribution. A human face, the shape of a nut, and the shape of a predator, for example, each generate visual information according to a different probability distribution. The visual information actually presented to an organism's eyes should have been generated by some one object in front of it. Because the interior of each individual probability distribution is continuous, learning with a self-organizing map interpolates within it and improves generalization ability. Interpolating between widely separated distributions, however, such as the shape of a nut and the shape of a predator, can be expected to degrade generalization ability instead.

Next, an experimental example is described which shows that the prior art cannot successfully learn conditional probability tables that represent a mixture distribution. FIG. 6 is a diagram illustrating a Bayesian network composed of two hidden nodes and 49 input nodes. FIG. 7 is a diagram illustrating an example in which a mixture distribution was learned with two self-organizing maps using the prior art. The description below refers to FIGS. 6 and 7.

The Bayesian network described here is, as shown in FIG. 6, composed of two hidden nodes (H_1, H_2) and 49 input nodes (I_1, ..., I_49). Nodes H_1 and H_2 are hidden nodes representing hidden variables. Nodes I_1, ..., I_49 are input nodes to which observed data is input.

The conditional probability tables of the Bayesian network of FIG. 6 are learned using, for example, the prior-art technique based on self-organizing maps described in Non-Patent Document 3. Two-dimensional data generated from a mixture distribution that mixes two probability distributions is converted into redundant 49-dimensional data and given to this learning device as the observed values of the input nodes. Every random variable can take 10 values.

The conversion from two-dimensional data to 49-dimensional data is performed as follows. The two-dimensional space is divided by a 7×7 grid. Letting d_i (i = 1, ..., 49) be the Euclidean distance between the coordinates of each of the 49 grid points and the coordinates of the input two-dimensional data, the value of each input node is a_i = max(0.8 − 3d_i, 0) quantized into 10 levels. Here, max(x, y) is a function that returns the larger of x and y.
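The conversion can be sketched as follows. Several details are not given in the text and are assumptions here: the input is taken to lie in the unit square, the 49 grid points are placed at multiples of 1/6 along each axis, and a_i is quantized by scaling the range [0, 0.8] linearly onto the integer levels 0 to 9. The function name `encode` is also hypothetical.

```python
import math


def encode(x, y, grid=7, levels=10):
    """Map a 2-D point to 49 quantized input-node values."""
    values = []
    for row in range(grid):
        for col in range(grid):
            # Assumed layout: grid points evenly spaced over the unit square.
            gx, gy = col / (grid - 1), row / (grid - 1)
            d = math.hypot(x - gx, y - gy)  # Euclidean distance d_i
            a = max(0.8 - 3.0 * d, 0.0)     # a_i as defined in the text
            # Assumed quantizer: linear scaling of [0, 0.8] onto 0..levels-1.
            values.append(min(int(a / 0.8 * levels), levels - 1))
    return values


vector = encode(0.0, 0.0)
print(len(vector), vector[:7])
```

With this layout only grid points closer than 0.8/3 to the input receive a nonzero value, so each input activates a small neighborhood of the 49 nodes.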

In the example in which a mixture distribution was learned with two self-organizing maps using the prior art, the experimental result, shown in FIG. 7, is that the two nodes (self-organizing maps) are each forced to learn both of the two separated probability distributions at the same time. As a result, a meaningless learning result is obtained.

In FIG. 7, the L-shaped portion (region) enclosed by a frame shows, in the two-dimensional space, the probability distribution that generates the input data. The points on the broken line and the points on the solid line show the centroids of the receptive fields of the units of the self-organizing maps of the two nodes, respectively.

Prior art that can handle mixture distributions does exist. For example, Non-Patent Document 4 is prior art that represents a mixture distribution with Bayesian networks. However, the problem that Bayesian networks are difficult to scale up remains unsolved.

The present invention has been made to solve the problems described above, and its object is to provide a probabilistic inference device that performs inference quickly and efficiently using a Bayesian network having a mechanism that restricts the combinations of values that nodes representing random variables can take.

To achieve the above object, the probabilistic inference device according to the present invention is, as its basic configuration, a probabilistic inference device provided with an inference mechanism that performs inference using a Bayesian network, in which a plurality of nodes representing random variables and edges between the nodes representing causal relationships between the random variables form a network. Constraints are added to the combinations of values of this Bayesian network, which reduces the degrees of freedom of the combinations of values and solves the problem. As the technical means for constraining the values, either constraint nodes are added to the Bayesian network or, equivalently, the degree to which the constraints are satisfied is reflected in the magnitude of the joint probability when the joint probability is calculated.

Specifically, as a first feature, the probabilistic inference device of the present invention is a probabilistic inference device having an inference mechanism that performs inference using a Bayesian network having a mechanism that restricts the combinations of values that nodes representing random variables can take. In the Bayesian network, two or more of the nodes constituting the network are nodes that take any one of three or more values, consisting of two or more ordinary values and one or more values called φ values. Furthermore, there are one or more nodes, called constraint nodes, that are child nodes of the nodes that can take a φ value, and the values of the conditional probability table of each constraint node constrain the nodes that can take a φ value so that they take the φ value with high frequency. When the value of a random variable, or a probability distribution over its values, is given as input to some of the nodes of the Bayesian network, the inference mechanism uses the network of nodes constituting the Bayesian network to infer the values of the other random variables or the posterior probabilities of those values.

As a second feature, in the probabilistic inference device according to the present invention, the Bayesian network further has one or more nodes that can take a φ value and for which, letting s be the number of values other than the φ value, the prior probabilities of taking each of those s values are substantially equal. When the value of a random variable, or a probability distribution over its values, is given as input to some of the nodes of the Bayesian network, the inference mechanism uses the network of nodes constituting the Bayesian network to infer the values of the other random variables or the posterior probabilities of those values.

As a third feature, in the probabilistic inference device according to the present invention, when the conditional probability table of each node is learned using the results of inference, the conditional probability tables of one or more of the nodes that can take a φ value are learned using a self-organizing map, and in doing so, two or more of the values other than φ values that the node can take are made the targets of neighborhood learning.

According to the probabilistic inference device of the present invention having the above features, adding constraints to the combinations of values of the Bayesian network reduces the degrees of freedom of the combinations of values, so that inference can be performed quickly and efficiently. Adding constraints does reduce the expressive power of the Bayesian network. However, for image information, audio information, and other information found in nature, the signal sources often satisfy the property of sparseness, that is, they are rarely active, so this poses no problem in practice. The Bayesian network in the probabilistic inference device according to the present invention is specialized so that it can handle such naturally occurring information more efficiently. This also solves the mixture distribution problem.

Because the constraints dramatically reduce the number of combinations of node values in this way, the amount of computation needed to infer the values of the random variables is dramatically reduced, and, as shown in the experiment described later (FIG. 12), mixture distributions can be represented well.
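To give a rough sense of the scale of this reduction, the following count is an idealized illustration and not a formula from the source. It assumes n hidden nodes that each take one φ value and s other values, and it replaces the soft constraint of the invention with a hard limit under which at most k nodes take a value other than φ. The limit k is a hypothetical parameter introduced only for this illustration.

```latex
N_{\text{unconstrained}} = (s+1)^{n},
\qquad
N_{\text{at most } k \text{ non-}\varphi} = \sum_{j=0}^{k} \binom{n}{j}\, s^{j}
```

For example, with n = 10, s = 9, and k = 1, the unconstrained count is 10^10, while the constrained count is 1 + 10 × 9 = 91.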

FIG. 1 is a diagram illustrating an example of a Bayesian network composed of four nodes.
FIG. 2 is a diagram illustrating an example of the conditional probability tables of the nodes.
FIG. 3 is a flowchart of the online learning algorithm.
FIG. 4 is a diagram illustrating the module configuration of a probabilistic inference / conditional probability learning device.
FIG. 5 is a diagram illustrating the module configuration when the online learning algorithm is applied to a robot with learning ability.
FIG. 6 is a diagram illustrating a Bayesian network composed of two hidden nodes and 49 input nodes.
FIG. 7 is a diagram illustrating an example in which a mixture distribution was learned with two self-organizing maps using the prior art.
FIG. 8 is a diagram illustrating a Bayesian network having a node S that restricts the combinations of hidden variable values.
FIG. 9 is a diagram illustrating an example of the storage format of the conditional probability tables of the Bayesian network used in the probabilistic inference device of the present invention.
FIG. 10 is a diagram illustrating an example of a Bayesian network in which some hidden variables are not subject to the φ-value constraint.
FIG. 11 is a diagram illustrating an example of a Bayesian network having two constraint nodes.
FIG. 12 is a diagram illustrating an example in which a mixture distribution was learned with two self-organizing maps using the present invention.

Hereinafter, modes for carrying out the present invention will be described. First, a Bayesian network having a node S that limits the combinations of variable values will be described. FIG. 8 is a diagram illustrating a Bayesian network having a node S that limits the combinations of variable values.

Suppose that the nodes H_i (i = 1, …, n) representing n hidden variables can each take one of (s+1) values {x_φ, x_1, x_2, …, x_(s−1), x_s}. Hereinafter, x_φ is called the φ value, and values other than the φ value are called non-φ values.

Further, as in the Bayesian network shown in FIG. 8, a single node S, which expresses a constraint that raises the probability that each hidden variable takes the φ value, is added as a child node of all the hidden variables. This node S is called a constraint node.

The conditional probability table attached to the constraint node S is assumed to have the property that the conditional probability P(S = yes | H_1, …, H_n) is large when many of the parent nodes H_i of node S take the φ value. The value of this conditional probability expresses, so to speak, the degree to which a set of hidden-variable values satisfies the constraint. Given this property, when the values of the input nodes I_j (j = 1, …, m) are given and the value S = yes is given at the same time, inferring the values of the hidden variables H_i by MPE calculation yields hidden-variable values that take the φ value with high frequency.

The node S and its conditional probability table P(S | H_1, …, H_n) need not be held explicitly in memory; substantially the same effect is obtained simply by modifying the formula for the joint probability. In a Bayesian network that does not include the constraint node S, the joint probability was calculated by the following formula.

This is modified, for example, to the following formula.

This modification is substantially equivalent to explicitly adding a constraint node having the following conditional probability table P(S = yes | H_1, …, H_n).

Here, α is a normalization constant and β is a parameter that controls sparsity. A(h) is a value representing the activity of h. A(h) as defined below returns 1 if the number of elements taking non-φ values is m, and otherwise returns a value (a predetermined value) representing infinity.
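
The formulas referred to above are not reproduced in this text, so the following is only a minimal sketch of how such a modification could look. It assumes, purely for illustration, that the constraint factor has the form exp(−β·A(h)) with β > 0 and that the normalization constant α is dropped because it does not change which assignment maximizes the joint probability. The encoding of the φ value as index 0 and all function names are hypothetical.

```python
import math

PHI = 0  # hypothetical encoding: index 0 stands for the φ value


def activity_fixed(h, m):
    """A(h) as described in the text: 1 if exactly m elements of h are non-φ,
    otherwise a value representing infinity."""
    non_phi = sum(1 for v in h if v != PHI)
    return 1.0 if non_phi == m else math.inf


def constraint_factor(h, m, beta):
    """Assumed form exp(-beta * A(h)) with beta > 0 (not taken from the source).
    The normalization constant alpha is omitted."""
    return math.exp(-beta * activity_fixed(h, m))


def constrained_joint(base_joint, h, m, beta):
    """Multiply an already computed joint probability by the constraint factor,
    instead of storing node S and its table explicitly."""
    return base_joint * constraint_factor(h, m, beta)
```

Under this assumption, any assignment that violates the "exactly m non-φ values" condition receives a joint probability of 0, so an MPE search never selects it.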

When the constraint node S defined in this way is added to the Bayesian network, most of the hidden variables are constrained to take the φ value during MPE calculation. Specifically, the constraint is that, of the n hidden nodes, m take non-φ values and (n−m) take the φ value.

At this time, the number of combinations of values that the n nodes can take is s^m × C(n, m), where C(n, m) is the number of ways of choosing m items from n. Without the constraint, the number of combinations is (s+1)^n, so the constraint drastically reduces the number of combinations of values. This effect becomes more pronounced as the number of nodes n grows.
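
The two counts can be checked with a few lines of Python. This is a small illustrative calculation; the function names are hypothetical.

```python
from math import comb


def constrained_combinations(n, s, m):
    """Number of assignments with exactly m non-φ values among n nodes: C(n, m) * s**m."""
    return comb(n, m) * s ** m


def unconstrained_combinations(n, s):
    """Number of assignments when every node may take any of its (s + 1) values."""
    return (s + 1) ** n


print(constrained_combinations(2, 9, 1))    # 18
print(unconstrained_combinations(2, 9))     # 100
print(constrained_combinations(20, 9, 2))   # 15390
print(unconstrained_combinations(20, 9))    # 100000000000000000000
```

The first two lines correspond to the experiment with two hidden nodes described later; the last two show how quickly the gap widens as n grows.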

This effect solves the problem that a Bayesian network, as it stands, is difficult to scale up. FIG. 9 is a diagram illustrating an example of the storage format of the conditional probability tables of the Bayesian network used in the probabilistic inference device of the present invention.

In the Bayesian network of the probabilistic inference device of the present invention, the conditional probability tables of nodes other than the constraint node S can be held in the same format as in the prior art. For example, when X and Y are random variables, the conditional probability P(Y | X) can be recorded in memory as a table of (s+1) × (s+1) conditional probability values, as in the conditional probability table 91 shown in FIG. 9.

Note that the essential requirement for suppressing the combinatorial explosion of values is that most nodes take the φ value. Therefore, a Bayesian network in which a small number of nodes are not subject to the constraint is naturally also included in the present invention.

FIG. 10 is a diagram illustrating an example of a Bayesian network in which some hidden variables are not subject to the φ-value constraint. In this Bayesian network, the random variables H_1 and H_3 are constrained by node S to take the φ value with high frequency, whereas the random variable H_2 is not subject to such a constraint. Even in this case, the number of combinations of values that the random variables can take is still drastically reduced, and the effect of the invention is not lost.

Also, there need not be only one constraint node. FIG. 11 is a diagram illustrating an example of a Bayesian network having two constraint nodes. In the example shown in FIG. 11, nodes S_1 and S_2 are both constraint nodes. Node S_1 constrains the values of the random variables H_1 and H_2, and node S_2 constrains the values of the random variables H_3 and H_4, so that they take the φ value with high frequency.

The function A(h) defined above imposes a constraint under which the number of nodes taking the φ value is fixed at (n−m). However, a Bayesian network in which the number of nodes taking the φ value varies with each given set of observation data is also included in the present invention. This corresponds, for example, to performing probabilistic inference in the form of an optimization problem that has the number of nodes with non-φ values as a penalty term. Specifically, this includes, for example, the case where the function A(h) is defined as follows.
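
The definition itself is not reproduced in this text. As a hedged illustration of the penalty-term idea only, the sketch below takes A(h) to be the number of non-φ elements and subtracts β·A(h) from a log joint probability, which corresponds to an assumed constraint factor of the form exp(−β·A(h)). The function names and the encoding of the φ value as index 0 are hypothetical.

```python
def activity_count(h, phi=0):
    """Assumed variant of A(h): the number of elements of h that take a non-φ value."""
    return sum(1 for v in h if v != phi)


def penalized_log_score(log_joint, h, beta, phi=0):
    """Log joint probability minus a penalty proportional to the number of non-φ nodes.

    With this form, the number of active nodes is not fixed in advance; the data
    and the size of beta together decide how many nodes leave the φ value."""
    return log_joint - beta * activity_count(h, phi)
```

A larger β makes each additional non-φ node more costly, so the most probable explanation becomes sparser.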

In the examples of MPE calculation described so far, a single definite value was input to each input node. In general, however, the observation data given to a node of a Bayesian network need not be a definite value; even when a probability distribution over values is given, probabilistic inference about the values of the other random variables can be performed.

The probabilistic inference device of the present invention is effective not only for MPE calculation but also for various other kinds of probabilistic inference, such as calculating the posterior probabilities of random variables.

There are various methods for calculating the MPE, and the Bayesian network of the probabilistic inference device of the present invention is effective regardless of the algorithm used. The applicable algorithms include not only the naive method described earlier, which evaluates every combination of values, but also search methods using heuristics such as best-first search; methods using dynamic programming such as the Viterbi algorithm; local search methods including the greedy method, steepest descent, and simulated annealing; and methods using Monte Carlo techniques such as Markov chain Monte Carlo.
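
As one illustration of the naive method combined with the fixed-count constraint, the sketch below enumerates only the assignments in which exactly m of n hidden nodes take a non-φ value and returns the one with the highest score. The scoring function is supplied by the caller and stands in for the joint probability; it, the function names, and the encoding of the φ value as index 0 are all hypothetical.

```python
from itertools import combinations, product

PHI = 0  # hypothetical encoding: index 0 is the φ value, 1..s are the non-φ values


def constrained_assignments(n, s, m):
    """Yield every assignment of n hidden nodes with exactly m non-φ values."""
    for active in combinations(range(n), m):
        for values in product(range(1, s + 1), repeat=m):
            h = [PHI] * n
            for idx, val in zip(active, values):
                h[idx] = val
            yield tuple(h)


def mpe_brute_force(n, s, m, score):
    """Return the constrained assignment that maximizes the caller-supplied score."""
    return max(constrained_assignments(n, s, m), key=score)
```

For n = 2, s = 9, m = 1 the generator yields 18 candidates rather than the 100 that an unconstrained enumeration would visit.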

Likewise, when a Bayesian network is used to calculate the posterior probabilities of random variables, the invention is effective regardless of the algorithm used. The applicable algorithms include the naive method that uses the joint probabilities of all combinations of values; methods using heuristics; methods using dynamic programming, such as the belief propagation algorithm; the loopy belief propagation algorithm, an approximate method derived from it; and methods using Monte Carlo techniques such as Markov chain Monte Carlo.

A conditional probability table learning device that learns conditional probability tables from the inference results of the probabilistic inference device described above can be constructed. Specifically, one method is to use the online learning algorithm described with reference to FIG. 3, but other algorithms, such as the EM algorithm, can also be used.

Furthermore, a self-organizing map according to the method described in Non-Patent Document 3 can be used when learning the conditional probabilities. However, the φ value is excluded from neighborhood learning. That is, neighborhood learning is performed on the assumption that the unit representing the φ value is not in the neighborhood of any unit representing a non-φ value.

In other words, this is a conditional probability table learning device that, when learning conditional probability tables from the inference results of the probabilistic inference device, learns the conditional probability tables of one or more of the nodes that can take the φ value by using a self-organizing map, and in doing so subjects two or more of the non-φ values that the node can take to neighborhood learning.
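
The exclusion of the φ value from neighborhood learning can be sketched as follows. This is a generic one-dimensional, Kohonen-style update with a rectangular neighborhood, written only to illustrate the neighborhood rule; it is not the method of Non-Patent Document 3, and the learning rate, radius, function names, and encoding of the φ value as unit 0 are assumptions.

```python
PHI = 0  # hypothetical encoding: unit 0 represents the φ value, units 1..s the non-φ values


def neighbours(winner, s, radius=1):
    """Units updated together with the winner.

    The φ unit has no neighbours, and it is never a neighbour of a non-φ unit."""
    if winner == PHI:
        return [PHI]
    lo = max(1, winner - radius)
    hi = min(s, winner + radius)
    return list(range(lo, hi + 1))


def som_update(weights, winner, x, s, rate=0.1, radius=1):
    """Move the weight vectors of the winner and its neighbours towards input x."""
    for u in neighbours(winner, s, radius):
        weights[u] = [w + rate * (xi - w) for w, xi in zip(weights[u], x)]
    return weights
```

Because unit 1 is clamped at the lower end of the range, a win by unit 1 never drags the φ unit along, and a win by the φ unit updates only the φ unit.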

The Bayesian network obtained as the result of learning with such an algorithm further has one or more nodes that can take the φ value and for which, where s is the number of non-φ values of the node, the prior probabilities of taking each of the s values are substantially equal.

To explain this concretely, in a self-organizing map the effects of neighborhood learning and competitive learning make the probability of each unit in the competitive layer becoming the winner approximately equal. For example, suppose the values that a random variable X can take are
{x_φ, x_1, x_2, …, x_s}.
By subjecting the values other than the φ value x_φ to neighborhood learning, the prior probabilities P(X = x_i) (i = 1, …, s) of taking each non-φ value come to satisfy, approximately, the equation
P(X = x_1) = P(X = x_2) = … = P(X = x_s) = δ_X,
where δ_X is a value determined for each node.

If this equation holds, it has the advantage that, for two random variables X and Y, it becomes easy to determine whether the relation
P(X | Y) = P(X)
holds, because it suffices to check whether the values of the conditional probability P(X | Y) are approximately equal to δ_X. This property is useful when simplifying a Bayesian network to improve computational efficiency. In addition, when node X has no parent node, the prior probability P(X) equals δ_X for every non-φ value, so there is the further advantage that the individual prior probabilities P(X = x_1), P(X = x_2), …, P(X = x_s) need not be stored explicitly in memory.
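
The check described above can be sketched as a comparison of every non-φ entry of the table P(X | Y) against δ_X. The table layout (one row per value of Y, one column per value of X), the tolerance, the function name, and the encoding of the φ value as column 0 are assumptions made for this illustration.

```python
def looks_independent(cpt_x_given_y, delta_x, tol=0.05, phi=0):
    """Return True if every non-φ entry of P(X | Y) is within tol of delta_x.

    cpt_x_given_y[y][x] is assumed to hold P(X = x | Y = y). The φ column is
    skipped: if each row sums to 1 and all non-φ entries equal delta_x, the
    φ entry is determined automatically."""
    for row in cpt_x_given_y:
        for x_val, p in enumerate(row):
            if x_val == phi:
                continue
            if abs(p - delta_x) > tol:
                return False
    return True
```

For example, a table whose rows are all [0.1, 0.3, 0.3, 0.3] passes the check with δ_X = 0.3, while a table containing the row [0.1, 0.8, 0.05, 0.05] does not.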

In general, machine learning algorithms with many free parameters tend to suffer from overfitting or become trapped in local optima, which degrades generalization. In the Bayesian network of the probabilistic inference device of the present invention, restricting the combinations of values of the random variables also constrains the values that the elements of the conditional probability tables can take, so improved generalization through the avoidance of overfitting and local optima can be expected.

Restricting the values of the random variables may also reduce the amount of memory needed to store the tables. If restricting the values causes many elements of a conditional probability table to become 0, the memory required to store the table can be reduced by using a data structure designed for such sparse tables.
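
One possible sparse representation, shown here only as a sketch, is a dictionary that keeps the non-zero entries and treats every missing key as probability 0. The source does not specify a particular data structure; the function names and the dense table layout are hypothetical.

```python
def to_sparse(cpt):
    """Keep only the non-zero entries of a dense table, keyed by (row, column)."""
    return {
        (x, y): p
        for x, row in enumerate(cpt)
        for y, p in enumerate(row)
        if p != 0.0
    }


def sparse_lookup(sparse, x, y):
    """Return the stored probability, or 0.0 for an entry that was dropped."""
    return sparse.get((x, y), 0.0)
```

The saving depends on how many entries are exactly 0; for a table with few zeros, the dense form remains the more compact choice.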

In some cases, restricting the values of the random variables may also solve the problems of floating-point overflow and underflow and of numerical precision in posterior probability calculation. Probabilistic inference on a large-scale Bayesian network requires a very large number of multiplications, which can cause overflow or underflow or degrade numerical precision. However, if the inference algorithm used has the property that nodes with the φ value can be ignored, inference can be performed using only the small number of nodes with non-φ values, and these problems can be avoided.

As in the experiment described earlier, conditional probability tables were learned for the same input data as in that experiment, in a Bayesian network consisting of two hidden nodes and 49 input nodes as described with reference to FIG. 6. This time, however, a constraint was imposed with m = 1, that is, such that in the MPE one of the two hidden nodes always takes the φ value and the other takes a non-φ value. In other words, the experiment was performed under conditions equivalent to adding a constraint node S as a common child node of the two hidden nodes H_1 and H_2 in FIG. 6.

As in the earlier experiment, each of the two hidden nodes operates as the competitive layer of a self-organizing map during learning. However, the φ value was excluded from neighborhood learning. That is, neighborhood learning was performed on the assumption that the unit representing the φ value is not in the neighborhood of any unit representing a non-φ value.

The results of the experiment are shown in FIG. 12. FIG. 12 is a diagram illustrating an example in which a mixture distribution was learned with two self-organizing maps using the probabilistic inference device of the present invention. In the experimental results shown in FIG. 12, conditional probability tables were learned such that one of the two nodes takes a non-φ value when a point in the left L-shaped probability distribution is input, and the other node takes a non-φ value for the right L-shaped distribution. As a result, the two one-dimensional self-organizing maps have cleanly learned the two probability distributions.

In this experiment, the number of values each node can take is s + 1 = 10. Without the constraint, the number of combinations of values is (s+1)^2 = 100; with the constraint it falls sharply to s^1 × C(2, 1) = 18, and the speed of MPE calculation improved greatly. The larger the Bayesian network, the more pronounced the speed-up becomes.

As described in Non-Patent Document 2, probabilistic inference devices using Bayesian networks can be used for a variety of applications, including pattern recognition (image recognition, speech recognition, and so on), robot motion control and action planning, fuzzy information processing, and natural language processing.

The probabilistic inference device of the present invention is effective for many applications of Bayesian networks, including these. It is particularly effective for processing, such as pattern recognition, of the kinds of real-world information that the human brain handles well, specifically natural images, audio, and natural language.

21-24 Conditional probability table
41 Input unit
42 Knowledge database
43 Probabilistic inference unit
44 Conditional probability table learning unit
45 Output unit
51 Sensor
52 Knowledge database
53 Probabilistic inference unit
54 Conditional probability table learning unit
55 Decision-making unit
56 Actuator
91 Conditional probability table

Claims

A probabilistic inference device having an inference mechanism that performs inference processing using a Bayesian network having a mechanism that limits the combinations of values that nodes representing random variables can take, wherein
the Bayesian network is a Bayesian network in which two or more of the nodes constituting the network are nodes, each representing a random variable, that take one of three or more values comprising two or more normal values and one or more values called φ values; one or more nodes called constraint nodes exist as child nodes of the nodes that can take a φ value; and the values of the conditional probability table of each constraint node constrain the values of the nodes that can take a φ value so that they take the φ value with high frequency, and
when the inference mechanism receives, as input to some of the nodes of the Bayesian network, the value of the random variable represented by each such node or a probability distribution over its values, the inference mechanism infers the value, or the posterior probability of the values, of the random variable represented by another node constituting the Bayesian network.

The probabilistic inference device according to claim 1, wherein
the Bayesian network is further
a Bayesian network having one or more nodes that can take a φ value and for which,
where s is the number of values of the node other than the φ value,
the prior probabilities of taking each of the s values are substantially equal, and
when the inference mechanism receives, as input to some of the nodes of the Bayesian network, the value of the random variable represented by each such node or a probability distribution over its values, the inference mechanism infers the value, or the posterior probability of the values, of the random variable represented by another node constituting the Bayesian network.

The probabilistic inference device according to claim 1 or 2, wherein
when the conditional probability table of each node is learned using the inference results obtained by the inference processing, the conditional probability tables of one or more of the nodes that can take a φ value are learned using a self-organizing map, with two or more of the values other than the φ value that the node can take being subject to neighborhood learning.