GB2421324A - Method and apparatus for propagation in a graphical probabilistic model - Google Patents

Method and apparatus for propagation in a graphical probabilistic model

Info

Publication number
GB2421324A
Authority
GB
United Kingdom
Prior art keywords
node
data value
probability
values
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0427631A
Other versions
GB0427631D0 (en)
Inventor
David Mcgaw
Jean-Jacques Gras
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc filed Critical Motorola Inc
Priority to GB0427631A priority Critical patent/GB2421324A/en
Publication of GB0427631D0 publication Critical patent/GB0427631D0/en
Priority to PCT/US2005/042251 priority patent/WO2006065464A2/en
Publication of GB2421324A publication Critical patent/GB2421324A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks


Abstract

Information may be propagated forwards or backwards in a Bayesian Network during run time and using partial node probability tables. A method of propagation includes detecting (401) a change in a first data value associated with a first node of the Bayesian Network. In response, a partial node probability table is determined (403, 405). For example, a partial node probability table which only includes states that are associated with the first data value may be determined. A state probability distribution for a second node is then determined (407) in response to the partial node probability table. The invention may facilitate information propagation in a Bayesian Network and may in particular reduce memory requirements and granularity errors.

Description

METHOD AND APPARATUS FOR PROPAGATION IN A GRAPHICAL
PROBABILISTIC MODEL
Field of the Invention
The invention relates to a method and apparatus for propagation in a graphical probabilistic model, such as for example a Bayesian Network.
Background of the Invention
In recent years, artificial intelligence and expert systems have become increasingly prevalent. In particular, although systems such as neural networks have been known for some time, advances in computing technology have resulted in increased computational resources being readily available, thereby making such systems increasingly practical and effective.
Artificial intelligence and expert systems may be deployed in many different environments. For example, fault evaluation for complex software or medical diagnosis may be performed or assisted by such systems. An advantage of expert systems is that they may learn from previous experience. For example, if a medical expert system is provided with information about a patient's symptoms, it may generate a diagnosis in response to those symptoms. However, if a user enters information about the correct diagnosis, the expert system may automatically modify the parameters and associations in the system to reflect this information. This may improve the accuracy of subsequent diagnoses.
Artificial intelligence and expert systems are frequently implemented using Bayesian Networks. A Bayesian Network represents the interactions within a group of uncertain events. A Bayesian Network may be represented by a network of nodes (representing various variables of the system) and connections between nodes (representing the existence of a relationship between the node variables). The connections are formed from a parent node to a child node and represent the dependence of the variable of the child node on the variable of the parent node. Interactions between events are quantified by assigning probabilities to the different possible values (states) of each variable and specifying explicit relations, such as an equation or a probabilistic relationship.
In a Bayesian Network, information may be provided to a given variable or node. For example, information (often termed "evidence") may be entered by a user. In this way, the value of a variable (node) of the Bayesian Network may be set to a given value (e.g. a deterministic or probabilistic value, including a probability distribution). This information may then be disseminated through the Bayesian Network. Specifically, the variables may be updated throughout the network to reflect this additional information. The updating of the node variables is known as propagation; in particular, updating in the direction from a parent to a child is known as forward propagation and updating in the direction from a child to a parent is known as backwards propagation.
CML01802N Specification FinalVersion
Several tools have been implemented to calculate the relationships between nodes and to propagate information through a Bayesian Network. These tools are generally based on the algorithms described in "Probabilistic Reasoning in Intelligent Systems", Judea Pearl, Morgan Kaufmann Publishers, 1997, ISBN 1-55860-479-0. However, although the algorithms are suitable for many applications, these conventional algorithms have a number of associated disadvantages in many situations. For example, when a relation is deterministic, or a single discrete value, rather than a probabilistic distribution, is known, existing methods of propagation can require a large amount of memory and the predictions can be rather inaccurate. In particular, current propagation tools tend to result in reduced accuracy of predictions due to the granularity associated with dividing data values of nodes into ranges, and excessive memory requirements resulting from large node probability tables, as will be described in more detail.
In Bayesian Network tools there are generally two different types of node that can be represented: discrete and continuously discretised.
Discrete nodes, representing ordinal or categorical variables, contain only a set of single values, which can be numbers or strings. Examples would be colours: "blue", "green" and "yellow", or integers: "1", "3" and "5". Only states that are explicitly defined are valid, i.e. "purple" and "2" would not be valid inputs in these nodes.
Unfortunately, due to the complexity of the calculations, no usable continuous ranges have yet been implemented (for example, a representation of the interval 1 to 4 which allows any value to be entered). However, a continuously discretised range can be used to represent a numerical range. This is done by splitting the range into intervals, resulting in a sequence of discrete range states. For example, the intervals 1-2, 2-3 and 3-4 can be used, making it possible to enter any value between 1 and 4 by assigning the value to the correct interval.
If, in this example, we consider a node with the state intervals 1-2 and 2-3, and we attempt to enter the value 1.1 in this node, we can only enter a probability of 100% of the value being in state 1-2. Thus we go from knowing the exact value to knowing only that it is 1.5±0.5. In this way, entering a value introduces an error.
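The loss of information just described can be illustrated with a short Python sketch. This is purely illustrative and not part of the original specification; the interval representation is an assumption.

```python
def discretise(value, intervals):
    """Assign a continuous value to the discrete interval state containing it.

    `intervals` is a list of (low, high) pairs; the returned interval is the
    only information the discretised node retains about the value.
    """
    for low, high in intervals:
        if low <= value < high:
            return (low, high)
    raise ValueError(f"{value} lies outside all defined states")

# Entering the exact value 1.1 into a node with states 1-2 and 2-3:
state = discretise(1.1, [(1, 2), (2, 3)])
midpoint = (state[0] + state[1]) / 2
half_width = (state[1] - state[0]) / 2
# The node now only "knows" the value as 1.5 +/- 0.5.
```

The exact input 1.1 is thus replaced by the coarser statement "somewhere in 1-2", which is the granularity error the text describes.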
Furthermore, when combining data values from a plurality of parent nodes to determine a value of a child node, the granularity may be increased. For example, FIG. 1 illustrates an example where the values of two parent nodes (P1, P2) 101, 103 are added together to obtain a value of a child node 105.
If the values 1.25 and 2.25 are entered into the parents, they will, due to granularity, be placed in the 1-2 and 2-3 intervals (representing 1.5±0.5 and 2.5±0.5). When added together, the result 3-5 is split over two levels, 2-4 and 4-6. Thus, even though the result is 3.5, the model predicts the value is in the interval 2-6 (or 4±2); not only is there an error from the granularity of the parent nodes, but there is an additional error due to the combination of the granularity errors of the parent and child nodes. The actual result should be 100% in 2-4.
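The compounding of granularity errors in the FIG. 1 example can be reproduced with simple interval arithmetic. The sketch below is illustrative only; the state layouts follow the example above.

```python
def interval_sum(a, b):
    """Add two interval-valued quantities (worst-case interval arithmetic)."""
    return (a[0] + b[0], a[1] + b[1])

def overlapping_states(interval, states):
    """Return the child states that the result interval overlaps."""
    lo, hi = interval
    return [s for s in states if s[0] < hi and s[1] > lo]

# Exact inputs 1.25 and 2.25 are first coarsened to the states 1-2 and 2-3.
result = interval_sum((1, 2), (2, 3))              # the sum can be 3-5
child_states = [(2, 4), (4, 6)]
spread = overlapping_states(result, child_states)  # overlaps both states
# Although the exact sum 3.5 lies wholly within 2-4, the model spreads the
# probability over 2-4 and 4-6, i.e. it only knows the result as 4 +/- 2.
```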
The effect over a large model with many layers can be significant and tends to result in a blurring, and typically a broadening, of the distribution.
Another problem with current approaches is that they have high memory requirements. In Bayesian Networks, every node has an associated node probability table. If the node has any parents, the node probability table describes the probabilities of the states of the node, based on the possible combinations of states of the node's immediate parents.
As an example, FIG. 2 shows three nodes, where each node can have many states. Each node has an associated node probability table describing the probability of the node value being in each state given the states of the parent nodes. In the example, the two parent nodes (P1, P2) 201, 203 do not themselves have any parent nodes and are each described by a one-dimensional array representing the likelihood of each state. The two parent nodes 201, 203 are coupled to a child node 205 (C) which has a node probability table with a cell for every combination of parent and child states. Hence, the total number of cells (and thus required memory locations) of the child node 205 is equal to the product of the numbers of states of the parent and child nodes. In the present example, the node probability table of the child node 205 has eight cells (2³).
Unfortunately, nodes typically have many more states. For example, discretising a continuous range using interval values typically results in each parent or child node having 10 to 20 states. As the size of the node probability table is derived by multiplying the number of states of each parent by the number of states in the child, this means that a node with only two parents typically results in a node probability table having between 1,000 and 8,000 entries.
Furthermore, the node probability table size grows exponentially with the number of parents, and a node probability table can quickly grow out of control. For example, four parents having 20 states and a child having 10 states results in a node probability table having 1.6 million entries.
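The table sizes quoted above follow directly from the product rule. A minimal sketch (illustrative only):

```python
from math import prod

def npt_size(parent_state_counts, child_state_count):
    """Cells in a full node probability table: the product of the state
    counts of every parent, times the child's own state count."""
    return prod(parent_state_counts) * child_state_count

# Two binary parents and a binary child (the FIG. 2 example): 2*2*2 = 8.
small = npt_size([2, 2], 2)
# Four parents with 20 states each and a child with 10 states:
large = npt_size([20, 20, 20, 20], 10)   # 20**4 * 10 = 1,600,000 cells
```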
In some cases it may be possible to use intermediate nodes to combine factors. This may result in a drastic reduction in the size of the node probability tables but can also introduce errors due to granularity in the intermediate nodes. Furthermore, there are restrictions on the types of relations that can be modelled in this way, as it is only possible to split a model using intermediate nodes in situations where the relation for the child can be split into separate relations that only relate to subsets of the parents.
Hence, an improved system for propagation in a Bayesian Network would be advantageous, and in particular a system allowing for increased flexibility, increased accuracy, reduced memory requirements and/or improved performance would be advantageous.
Summary of the Invention
Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.
According to a first aspect of the invention there is provided a method of propagation in a graphical probabilistic model, the method comprising: detecting a change in a first data value associated with a first node of the graphical probabilistic model; determining a partial node probability table associated with a second node connected to the first node in response to the detection of the change; and determining a state probability distribution for the second node in response to the partial node probability table.
The graphical probabilistic model may be a Bayesian Network.
The partial node probability table may for example relate to only some states of the node probability table of the second node. In particular, the partial node probability table may relate only to a subset of parents of the first or second node. The detection of the change in the first data value may for example be by a detection of a user input of information associated with the first data value.
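As a rough illustration of the three claimed steps, the following Python sketch builds a partial table only for the parent state containing the new value. All data structures and names here are assumptions for illustration, not the patent's implementation; the child relation C = 2·P is hypothetical.

```python
import random
from collections import Counter

def propagate_on_change(new_value, parent_intervals, child_function,
                        child_states, n_samples=10_000):
    # Step 1 has already happened: a change to `new_value` was detected
    # at the parent (first) node.
    # Step 2: build a partial node probability table covering only the
    # parent state (interval) that contains the new data value.
    lo, hi = next((a, b) for a, b in parent_intervals if a <= new_value < b)
    counts = Counter()
    for _ in range(n_samples):
        result = child_function(random.uniform(lo, hi))
        for s_lo, s_hi in child_states:
            if s_lo <= result < s_hi:
                counts[(s_lo, s_hi)] += 1
                break
    # Step 3: the child's state probability distribution, read straight
    # from the partial table.
    return {state: counts[state] / n_samples for state in child_states}

# Hypothetical child relation C = 2 * P; entering 1.1 at the parent puts
# all probability in the child state [2,4].
dist = propagate_on_change(1.1, [(1, 2), (2, 3)], lambda x: 2 * x,
                           [(0, 2), (2, 4), (4, 6)])
```

Note that only the cells for the single matching parent state are ever computed, which is the memory saving the text describes.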
The invention may allow partial node probability tables to be determined at run time. In particular, propagation may be facilitated by determination of node probability tables specifically when a data value changes. This may allow a partial node probability table suitable for the specific conditions to be used, rather than requiring a full node probability table covering all possible scenarios. Accordingly, accuracy may be improved by a customisation of the propagation, and in particular of the node probability tables, to the current conditions, such as the current data values. Alternatively or additionally, the memory requirement may be substantially reduced, as only the parts of the node probability table required or desired for the current conditions need to be calculated.
Thus the invention may allow improved accuracy and/or reduced memory requirements for propagation in a graphical probabilistic model such as a Bayesian Network.
The invention may be particularly advantageous for propagation involving deterministic relationships and/or data values.
According to an optional feature of the invention, values are determined only for a subset of the partial node probability table, and the determining of the partial node probability table comprises determining the subset of the node probability table in response to the first data value. This may allow or facilitate determination of partial node probability tables suited to the specific first data value and may in particular reduce memory requirements, as only the subset needed for the specific first data value needs to be calculated.
According to an optional feature of the invention, the subset is determined as a subset comprising only values associated with a state corresponding to the first data value. This may reduce memory requirements and provides a practical way of determining the subset to suit the current conditions.
According to an optional feature of the invention, the second node is a child of the first node. A partial node probability table associated with a child may be determined in response to a change of data value of a parent node. The invention may facilitate and/or enable and/or improve forward propagation in a graphical probabilistic model such as a Bayesian Network.
According to an optional feature of the invention, the determining of the partial node probability table comprises determining state probability values for the second node by statistically evaluating the results of applying a function associated with the second node to stochastically selected sample values associated with the first node. This provides a practical means of determining the partial node probability table at run time.
According to an optional feature of the invention, the stochastically selected sample values are selected in response to a probability distribution of data values of the first node. This provides a practical means of determining the partial node probability table at run time and may in particular reduce the complexity and the computational and memory resources required for the determination of the partial node probability table. An accurate determination of the partial node probability table may alternatively or additionally be achieved.
According to an optional feature of the invention, the results of applying the function to the stochastically selected sample values are weighted in response to a probability distribution of data values of the first node. This provides a practical means of determining the partial node probability table at run time and may in particular reduce the complexity and the computational and memory resources required for the determination of the partial node probability table. An accurate determination of the partial node probability table may alternatively or additionally be achieved.
According to an optional feature of the invention, determining the state probability values comprises stochastically selecting sample values in response to the first data value. This provides a practical means of determining the partial node probability table at run time and may in particular reduce the complexity and the computational and memory resources required for the determination of the partial node probability table. An accurate determination of the partial node probability table suited to the current first data value may alternatively or additionally be achieved.
According to an optional feature of the invention, determining the state probability values comprises stochastically selecting sample values only in a data value interval associated with the first data value. This may effectively reduce the computational burden and the memory requirement while achieving high accuracy.
According to an optional feature of the invention, the state probability value of a state is determined in response to the proportion of the results being associated with the state. This may provide highly accurate results yet allow a low complexity determination.
According to an optional feature of the invention, the graphical probabilistic model comprises a third node being a parent of the second node and having an associated second data value; and determining the state probability values comprises: selecting a first state of the first node associated with the first data value; selecting a second state of the third node associated with the second data value; stochastically selecting first samples associated with the first state; stochastically selecting second samples associated with the second state; determining sample values for the second node in response to the first samples and second samples; and determining the state probability values in response to a distribution of the sample values for the second node.
In particular, the partial node probability table may only be determined for the specific intervals of the parents to which the current node value belongs. This may substantially reduce the complexity and the computational and memory resources required for the determination of the partial node probability table while achieving high accuracy.
According to an optional feature of the invention, the second node is a parent of the first node. A partial node probability table associated with a parent may be determined in response to a change of data value of a child node. The invention may facilitate and/or enable and/or improve backwards propagation in a graphical probabilistic model such as a Bayesian Network. The partial node probability table may be a partial node probability table associated with the parent and indicating a probability of data values of the child node as a function of data values of the parent. The partial node probability table may be a node probability table associated with only a single parent of a child node having a plurality of parents.
According to an optional feature of the invention, determining the state probability distribution comprises determining state probability values associated with the second node by applying Bayes' Theorem to data values of the partial node probability table and marginalisation of the first node. The partial node probability table may be a partial node probability table associated with the parent and indicating a probability of data values of the child node as a function of data values of the parent, and the feature may allow a state probability distribution for the parent node to be determined from the partial node probability table.
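A minimal sketch of this backward step, assuming a discrete two-state parent and a single-parent partial table; the state names and numbers are hypothetical, not taken from the patent.

```python
def backward_update(prior, likelihood, observed_child_state):
    """Bayes' theorem over a partial table: P(parent state | child evidence)
    is proportional to P(child state | parent state) * P(parent state).

    `likelihood[parent_state][child_state]` plays the role of the partial
    node probability table for a single parent; normalising the products
    marginalises out the child.
    """
    unnorm = {p: prior[p] * likelihood[p][observed_child_state] for p in prior}
    total = sum(unnorm.values())
    return {p: v / total for p, v in unnorm.items()}

# Hypothetical two-state parent observed through a noisy child node:
prior = {"low": 0.5, "high": 0.5}
likelihood = {"low":  {"small": 0.9, "big": 0.1},
              "high": {"small": 0.2, "big": 0.8}}
posterior = backward_update(prior, likelihood, "big")
# "high" becomes much more probable: 0.8*0.5 / (0.8*0.5 + 0.1*0.5) = 8/9
```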
According to an optional feature of the invention, determining the partial node probability table comprises determining state probability values for the second node by statistically evaluating the results of applying a function associated with the first node to stochastically selected sample values of the second node. This provides a practical means of determining the partial node probability table at run time.
According to an optional feature of the invention, the stochastically selected samples are selected in response to a probability distribution of data values of the second node. This provides a practical means of determining the partial node probability table at run time and may in particular reduce the complexity and the computational and memory resources required for the determination of the partial node probability table. An accurate determination of the partial node probability table may alternatively or additionally be achieved.
According to an optional feature of the invention, the results of applying the function to the stochastically selected samples are weighted in response to a probability distribution of data values of the second node. This provides a practical means of determining the partial node probability table at run time and may in particular reduce the complexity and the computational and memory resources required for the determination of the partial node probability table. An accurate determination of the partial node probability table may alternatively or additionally be achieved.
According to an optional feature of the invention, states of the second node correspond to data value intervals, and determining the partial node probability table further comprises resizing the data value intervals. This may allow improved accuracy and performance of the graphical probabilistic model. In particular, each of the data value intervals associated with individual states may be optimised for the current conditions, thereby reducing granularity errors.
According to an optional feature of the invention, resizing the data value intervals comprises distributing the data value intervals over a data value range having an associated probability density above a threshold. This may provide increased accuracy and a low complexity approach to resizing.
According to an optional feature of the invention, resizing the data value intervals comprises a scaling of the data value intervals. This may provide increased accuracy and a low complexity approach to resizing.
According to an optional feature of the invention, resizing the data value intervals comprises modifying the data value intervals to result in a predetermined probability distribution over the data value intervals. In particular, the predetermined probability distribution may correspond to the data value intervals having substantially equal probability. This may provide increased accuracy and a low complexity approach to resizing.
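One low-complexity way to realise substantially equal-probability intervals is quantile-based cutting. The sketch below assumes that samples from the node's distribution are available; it is an illustration, not the patent's method.

```python
def equal_probability_intervals(samples, n_states):
    """Re-discretise a value range so that each state carries roughly equal
    probability mass, by cutting at quantiles of the sampled distribution."""
    ordered = sorted(samples)
    cuts = [ordered[(i * len(ordered)) // n_states] for i in range(n_states)]
    cuts.append(ordered[-1])
    return list(zip(cuts[:-1], cuts[1:]))

# Samples bunched near zero get narrow states there and wide states above,
# reducing granularity error where the probability density is high.
samples = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 2.0, 9.0]
states = equal_probability_intervals(samples, 4)
```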
According to an optional feature of the invention, the first data value is a deterministic value. Alternatively, the first data value may be a stochastic value.
According to an optional feature of the invention, the first data value is a user input value. The invention may provide for improved propagation in a graphical probabilistic model such as a Bayesian Network in response to receiving a new user input.
According to a second aspect of the invention, there is provided an apparatus for propagation of information in a graphical probabilistic model, the apparatus comprising: means for detecting a change in a first data value associated with a first node of the graphical probabilistic model; means for determining a partial node probability table associated with a second node connected to the first node in response to the detection of the change; and means for determining a state probability distribution for the second node in response to the partial node probability table.
These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
Brief Description of the Drawings
Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which:
FIG. 1 is an illustration of an example of a determination of a probability distribution for a child node having two parent nodes;
FIG. 2 is an illustration of an example of a determination of a node probability table for a child node having two parent nodes;
FIG. 3 illustrates an example of a Bayesian Network in which embodiments of the invention may be applied;
FIG. 4 illustrates a flow chart of a method of forward propagation in accordance with some embodiments of the invention;
FIG. 5 illustrates an example of backwards propagation in accordance with some embodiments of the invention;
FIG. 6 illustrates an example of a probability distribution and a division of a value range into intervals for a node of a Bayesian Network;
FIG. 7 illustrates another example of a probability distribution and a division of a value range into intervals for a node of a Bayesian Network; and
FIG. 8 illustrates another example of a probability distribution and a division of a value range into intervals for a node of a Bayesian Network.
Detailed Description of Embodiments of the Invention
The following description focuses on embodiments of the invention applicable to a Bayesian Network wherein some nodes have deterministic data values and/or deterministic relations. However, it will be appreciated that the invention is not limited to this application but may be applied to other Bayesian Networks or graphical probabilistic models.
FIG. 3 illustrates an example of a Bayesian Network 300 in which embodiments of the invention may be applied. The Bayesian Network 300 may represent a model of a decision process of an expert system, such as a medical diagnosis expert system. In the following, it will be assumed that the models of the expert system are implemented as Bayesian Networks having tree-like structures and that there is only a single path between any two nodes. Hence, these models exclude diamond and loop structures. This provides for a simplification, as it can be assumed that the parents of a given node are independent of each other.
In the example of FIG. 3, a given node C 301 has two parents P1 and P2 303, 305. In the example network, node C 301 has two child nodes 307, 309; parent P1 303 itself has two parents 311, 313 and parent P2 305 has one parent 315.
In accordance with some embodiments of the invention, the full node probability tables of each node are not determined in advance; rather, partial node probability tables are determined specifically when a data value changes. In particular, a modification of a data value of node P1 303 may be detected and, in response, the data value(s) of node C 301 are updated by determination of a partial node probability table which is suitable for updating node C 301 in view of the current conditions, and in particular in view of the data values of the parent nodes P1 and P2 303, 305. As another example, a change of a data value of node C 301 may be detected and, in response, a partial node probability table may be determined which allows the data value of node P1 303 to be updated accordingly (the data value of node P2 305 may also be updated).
Hence, in the example, forwards and backwards propagation in a Bayesian Network is performed on the fly during run time and by using only partial node probability tables. In particular, the propagation may be performed by determining the subset of the node probability tables which is required for determining state values for the specific new data value. Hence, in particular for a deterministic data value, a significantly reduced node probability table may be determined.
In the following, an example of forward propagation from the parent nodes P1 and P2 303, 305 to the child node C 301 in accordance with some embodiments of the current invention will be described.
FIG. 4 illustrates a flow chart of a method of forward propagation in accordance with some embodiments of the invention.
The method starts in step 401 wherein a change in a data value of a node is detected. In particular, the method may detect that one or more state values of the parent node P1 303 have changed. The change may for example be a change of data values in response to a user input. For example, evidence may be input by a user which provides additional deterministic or probabilistic information to the Bayesian Network. As another example, a state value of node P1 303 may change in response to a change of input from the parents 311, 313 of node P1 303. Hence, the propagation from node P1 303 to node C 301 may be initialised in response to a propagation of information from the parents of node P1 303.
Step 401 is followed by step 403 wherein a subset of the node probability table for node C 301 is determined. Thus, in the example, step 403 comprises determining which elements of the node probability table of node C 301 are to be included in the partial node probability table.
In particular, the subset may be determined to include the elements relating to the states to which the current state values belong. Thus, in an example wherein the data value range associated with each of node P1 303 and node P2 305 is divided into five intervals [0,1], [1,2], [2,3], [3,4] and [4,5] (equivalent to the example of FIG. 1), and the state value of node P1 303 is 100% in interval [1,2] and the state value of node P2 305 is 100% in interval [2,3], the subset of the node probability table corresponding to the interval [1,2] of P1 and the interval [2,3] of P2 is selected.
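The state selection of step 403 can be sketched as follows; the data structures are assumptions for illustration only.

```python
def select_parent_states(parent_values, parent_intervals):
    """For each parent, pick the single interval state containing its
    current (deterministic) data value; only the node probability table
    cells indexed by these states need to enter the partial table."""
    return [next((lo, hi) for lo, hi in intervals if lo <= value < hi)
            for value, intervals in zip(parent_values, parent_intervals)]

five_states = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
# P1 is 100% in [1,2] and P2 is 100% in [2,3]:
subset_index = select_parent_states([1.5, 2.5], [five_states, five_states])
```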
Step 403 is followed by step 405 wherein the partial node 20 probability table is calculated. The partial node probability table is in the specific example determined by statistically evaluating results of applying a function associated with the second node to stochastically selected sample values associated with the first node.
Specifically, step 405 may comprise randomly selecting data values in the two data intervals associated with the parent nodes P1 and P2 303, 305 and calculating the result for node C 301 for these values. In the present example, sample values are thus stochastically selected and summed together. The result of the sum is then allocated to the appropriate state (data interval) of the child node C 301. The probability value of each state of the child node C 301 may then be determined by the proportion of results which are allocated to the state.
The partial node probability table values may specifically be determined by a sampling method such as Monte-Carlo or uniform sampling. As Bayesian Networks can deal with probabilistic or deterministic relations, exact calculation of the node probability table is difficult and therefore the node probability table is conveniently determined by taking samples from each parent and calculating the result (which may be a single value or a distribution depending on the relationship). As a single sample is not representative of a distribution, it is necessary to use many samples; the more samples taken, the closer the calculated distribution will be to the actual distribution.
As a specific example, it is assumed that the values [1,2] and [2,3] have been entered for P1 and P2 respectively and that node C 301 represents the sum of P1 and P2.
To calculate the result, many sample points are taken for each parent and the result is calculated for each sample set. A sample may be generated by randomly picking a value in the state value intervals of the parent nodes P1 and P2 as illustrated in the table below.
P1 sample | P2 sample | Result (P1+P2) | Falls into | Bin [2,4] | Bin [4,6]
----------|-----------|----------------|------------|-----------|----------
1.11      | 2.79      | 3.90           | Bin [2,4]  | 1         | 0
1.09      | 2.80      | 3.89           | Bin [2,4]  | 1         | 0
1.46      | 2.76      | 4.22           | Bin [4,6]  | 0         | 1
1.28      | 2.70      | 3.98           | Bin [2,4]  | 1         | 0
1.13      | 2.30      | 3.43           | Bin [2,4]  | 1         | 0
1.53      | 2.19      | 3.72           | Bin [2,4]  | 1         | 0
1.27      | 2.91      | 4.18           | Bin [4,6]  | 0         | 1
1.82      | 2.92      | 4.74           | Bin [4,6]  | 0         | 1
1.96      | 2.41      | 4.37           | Bin [4,6]  | 0         | 1
1.84      | 2.88      | 4.72           | Bin [4,6]  | 0         | 1
Total     |           |                |            | 5         | 5
As illustrated, the samples 1.11 and 2.79 may be picked for P1 and P2 respectively. Applying the relation for node C (addition), the result of 3.9 is achieved. Thus an observation is added to the [2,4] state for node C.
In the example, this process is repeated 10 times and 10 results are calculated. Once the results have been calculated and allocated to states, the total number of observations associated with each state is representative of the state's probability. The probability of each state is thus determined by normalising the observation counts.
In the example, half of the points fall in state [2,4] and half fall in state [4,6], resulting in a probability of 0.5 for each of these states. However, as with all random sampling techniques, the samples are picked randomly and it is possible for more results to fall into one bin than the other due to random fluctuations. This may introduce a small error into the model. Obviously, the more samples that are taken, the smaller the effect of each fluctuation will be.
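The sampling procedure of step 405 may be sketched as follows (an illustrative Python sketch; function and variable names are assumptions, not part of the described method):

```python
import random

def forward_partial_npt(parent_intervals, child_bins, func, n_samples=10000):
    """Estimate the child's state probabilities by Monte-Carlo sampling:
    draw one uniform sample per parent interval, apply the relation, and
    count which child state (bin) each result falls in."""
    counts = [0] * len(child_bins)
    for _ in range(n_samples):
        # One uniform random sample inside each parent's current interval.
        samples = [random.uniform(lo, hi) for lo, hi in parent_intervals]
        result = func(*samples)
        # Allocate the result to the child state it falls into.
        for i, (lo, hi) in enumerate(child_bins):
            if lo <= result < hi:
                counts[i] += 1
                break
    # Normalise the observation counts into state probabilities.
    return [c / n_samples for c in counts]

# P1 in [1,2] and P2 in [2,3] with C = P1 + P2, as in the example above.
random.seed(0)
probs = forward_partial_npt([(1, 2), (2, 3)], [(2, 4), (4, 6)],
                            lambda a, b: a + b)
```

With many samples the estimate converges to the true split of roughly 0.5 for each of the child states [2,4] and [4,6], with a small residual sampling fluctuation as described above.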
Hence, in the specific example, the state probability values of the partial node probability table are determined by first identifying a state of each of the parents and then stochastically selecting samples having values belonging to this state. The relationship is then applied to the selected samples and the state probability values are determined in response to the resulting distribution of the sample values.
Step 405 is followed by step 407 wherein the partial node probability table is used to determine the probability distribution for node C 301. In the example of forward propagation, the partial node probability table may directly be used as the node probability table of the child node and in particular, the probability distribution of node C 301 may be generated by inserting the values of the appropriate cells of the partial node probability table into the corresponding cells of the probability distribution of node C 301.
In the example above, the propagation was of simple state values, i.e. each of the parents P1 and P2 had a single state value. However, in many cases the data value of the parents may be a probabilistic distribution. For example, rather than each parent having a data value corresponding to a single state, the data value may correspond to a probability of each state.
Propagation may in this case be achieved by repeating the same process for each non-zero state and modifying the results or the sample selection (or both) in response to the probabilities of each state.
Thus, in some embodiments, the sample values may be selected in response to a probability distribution of the data values. In particular, samples may be selected by first selecting a state based on its probability and then uniformly selecting a sample value within this state.
Also, in some embodiments the results of applying the function to the sample values may be weighted in response to a probability distribution of data values of the first node.
For example, node P1 303 could have a distribution with 10% probability of interval [0,1]; 20% probability of interval [1,2]; 30% probability of interval [2,3]; 35% probability of interval [3,4]; and 5% probability of interval [4,5].
In some embodiments, the samples are selected from the intervals in response to this distribution. This may for example be achieved by partitioning the interval [0,1] into five intervals each corresponding to one state. The size of each interval may be set to correspond to the relative likelihood of each state. More formally, if there are n states and P_i is the probability of the i-th state, the interval boundaries may be determined as:

    I_j = 0                          for j = 0
    I_j = P_1 + P_2 + ... + P_j      otherwise
In the specific example, the intervals would thus be [0,0.1], [0.1,0.3], [0.3,0.6], [0.6,0.95] and [0.95,1]. A state may be determined by generating a uniform random value
between zero and one, determining which interval the random value falls in and selecting the corresponding state. A sample may then be generated by a random selection of a value in the interval of the state.
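This two-stage selection may be sketched as follows (an illustrative Python sketch; the probability values are taken from the example above, while the function and variable names are assumptions):

```python
import random
from itertools import accumulate

# State probabilities and data value intervals for P1 from the example.
probs = [0.10, 0.20, 0.30, 0.35, 0.05]
intervals = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]

def sample_value(probs, intervals):
    """Pick a state via the cumulative sub-intervals of [0,1] described
    above, then pick a uniform value within that state's interval."""
    boundaries = list(accumulate(probs))   # 0.1, 0.3, 0.6, 0.95, 1.0
    u = random.random()                    # uniform value in [0,1)
    state = next(i for i, b in enumerate(boundaries) if u < b)
    lo, hi = intervals[state]
    return random.uniform(lo, hi)

random.seed(0)
values = [sample_value(probs, intervals) for _ in range(20000)]
```

Over many draws, the fraction of sample values landing in each interval approaches that interval's state probability, e.g. roughly 30% in [2,3].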
A disadvantage of this approach is that states with higher probability will have more sample values, resulting in the accuracy of these states being increased at the expense of the less likely intervals. Consequently, it may not be possible to guarantee a given level of accuracy.
Hence, in some embodiments it may be advantageous to select a state randomly and weight the result in response to the probability of the state. This will provide a good spread of samples across likely and unlikely states. For example, if the sample is selected from the second state corresponding to the interval [1,2], the observation count may be weighted by the probability of 20%. If there are several parents, the probabilities of being in each parent state can be multiplied together.
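A single-parent sketch of this weighting approach is given below (in Python; the interval and probability values are taken from the example above, while the function and variable names are illustrative assumptions). Every state receives the same number of samples, and each observation is weighted by the probability of the state it was drawn from:

```python
import random

# One (interval, probability) pair per state of the parent.
states = [((0, 1), 0.10), ((1, 2), 0.20), ((2, 3), 0.30),
          ((3, 4), 0.35), ((4, 5), 0.05)]

def weighted_state_probs(states, bins, func, samples_per_state=1000):
    counts = [0.0] * len(bins)
    for (lo, hi), prob in states:              # equal samples per state
        for _ in range(samples_per_state):
            result = func(random.uniform(lo, hi))
            for i, (blo, bhi) in enumerate(bins):
                if blo <= result < bhi:
                    counts[i] += prob          # weight by state probability
                    break
    total = sum(counts)
    return [c / total for c in counts]

random.seed(0)
# With the identity relation the child distribution simply reproduces the
# parent's state probabilities, which makes the weighting easy to check.
res = weighted_state_probs(states, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)],
                           lambda x: x)
```

Unlike the cumulative-interval method, the unlikely 5% state here receives as many samples as the 35% state, so its estimate is not starved of accuracy.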
The previous example described forward propagation in a Bayesian Network. A similar method may be applied to backwards propagation. Specifically, a partial node probability table may be determined for the child node and Bayes theorem may be used to determine a probability distribution of the parent. The partial node probability table may specifically relate only to the parent to which the information is propagated. Thus, a partial node probability table indicative of P(C|P_i), where P_i is the parent to which information is propagated, may be derived despite the child C having more than this parent. The probability distribution for P_i may then be determined by application of Bayes theorem as will be described.
If a node C has parents P1, P2, ..., Pn, Bayes theorem is given as:

    P(P_i | C) = P(C | P_i) * P(P_i) / P(C),    for i = 1, ..., n
Thus, to calculate the new distribution of node P_i (or more correctly the value of P(P_i|C)), three values are required:
- The old distribution of P_i. This is readily available as it will have been calculated in a previous propagation cycle or will have the initialised value.
- The distribution of node C. As the theorem is applied to backwards propagation, this will have been calculated in a previous iteration of the current propagation cycle.
- The partial node probability table describing P(C|P_i).
The partial node probability table is indicative of the probability of a given value of the child for a given value of the parent. Hence, the partial node probability table is equivalent to the partial node probability table calculated for forwards propagation and may be calculated in a similar way.
In particular, a partial node probability table may be calculated for each parent node by a sampling method such as Monte-Carlo in the same way as previously described for
calculation of the child node's probability distribution. However, in addition to determining the state values of the child, the state values of the parent are also evaluated.
FIG. 5 illustrates an example where two parent nodes P1, P2 501, 503 are connected to a child node C 505. In the specific example, it is desirable to propagate upwards from the child node C 505, for example due to evidence in the form of a state or likelihood being entered into C by a user or by upwards propagation from children of C.
The following describes propagation to node P1. Propagation to P2 is analogous.
First, the partial node probability table representing P(C|P1) is generated. In an almost identical way to forward propagation, the node probability table is generated by a sampling method such as Monte-Carlo. The only difference is that in addition to counting the state (data interval) that the result falls in, the state (interval) of the selected sample for P1 is also considered.
The following table illustrates the sampling process. For each state in node P1, a number of samples are selected (in this example 5 samples are selected per state). With each sample for P1, a sample is selected for P2. As in the forward propagation algorithm, the result is calculated for the selected samples and the corresponding state is determined.
P1 sample | P2 sample | Result (P1+P2) | Falls in
----------|-----------|----------------|---------
0.47      | 0.46      | 0.93           | [0,1]
0.20      | 1.82      | 2.02           | [2,3]
0.77      | 0.44      | 1.21           | [1,2]
0.01      | 1.90      | 1.91           | [1,2]
0.89      | 1.73      | 2.62           | [2,3]
1.18      | 0.54      | 1.72           | [1,2]
1.62      | 0.52      | 2.14           | [2,3]
1.16      | 1.38      | 2.54           | [2,3]
1.16      | 1.92      | 3.08           | [3,4]
1.77      | 0.55      | 2.32           | [2,3]
In the forward propagation algorithm the number of times that a state corresponds to a result is calculated. However, in order to create the partial node probability table for the backwards propagation, it is also necessary to divide the samples depending on the state of P1 that was sampled.
The following table shows the partial node probability table which results from the samples of the above table. For example, 20% of the samples (1 sample) were in [0,1] for P1 and [0-1] for C. This corresponds to the sample in row 1 of the table.
          | P1 [0-1] | P1 [1-2]
----------|----------|---------
C [0-1]   | 0.2      | 0
C [1-2]   | 0.4      | 0.2
C [2-3]   | 0.4      | 0.6
C [3-4]   | 0        | 0.2
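The construction of this conditional table may be sketched as follows (an illustrative Python sketch; function and variable names are assumptions, not part of the described method):

```python
import random

def backward_partial_npt(p1_states, p2_interval, child_bins, func,
                         samples_per_state=5):
    """Sketch of building P(C|P1): for each state of P1, draw paired
    (P1, P2) samples, bin each result, and normalise per P1 state."""
    table = [[0] * len(child_bins) for _ in p1_states]
    for s, (lo, hi) in enumerate(p1_states):
        for _ in range(samples_per_state):
            result = func(random.uniform(lo, hi),
                          random.uniform(*p2_interval))
            for i, (blo, bhi) in enumerate(child_bins):
                if blo <= result < bhi:
                    table[s][i] += 1
                    break
    # Each row now holds the conditional distribution of C given one
    # P1 state, i.e. one column of the partial node probability table.
    return [[c / samples_per_state for c in row] for row in table]

random.seed(0)
npt = backward_partial_npt([(0, 1), (1, 2)], (0, 2),
                           [(0, 1), (1, 2), (2, 3), (3, 4)],
                           lambda a, b: a + b, samples_per_state=2000)
```

Each conditional distribution sums to one, and with many samples the entries approach the exact conditional probabilities (e.g. P(C in [0,1] | P1 in [0,1]) is 0.25 for this sum of uniforms).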
The next step in backwards propagation is to apply Bayes theorem. In particular, Bayes theorem is applied to every cell in the partial node probability table. The optimal application of Bayes theorem may depend on the approach used to select the samples.
For example, assuming that each sample counts equally, Bayes theorem may be applied directly. E.g. in the case of P1=[0,1] and C=[0,1], the value of the partial node probability table is multiplied by P(P1=[0,1]) and divided by P(C=[0,1]). In the specific example nodes C and P1 are both uniform and only a scaling effect results from the application of Bayes theorem.
In an example where the samples are weighted by the probability of observing the sample, half of Bayes theorem (P(C|P1)P(P1)) has already been calculated and the values of the partial node probability table are simply divided by P(C).
The final step in backwards propagation is to marginalise the resulting table over the child node C 505. As is known to the person skilled in the art, marginalisation may be considered to remove the impact of node C by taking into consideration all potential values C may have, multiplying each by the likelihood of the individual value and summing over all possible values. Hence, marginalisation may in this case be considered a weighted average of the impact of the values of node C.
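One consistent reading of these two steps may be sketched as follows (an illustrative Python sketch, assuming the previously predicted distribution of C is used as the Bayes denominator and the updated, evidence-bearing distribution of C is used for the marginalisation; all names and the example distributions are assumptions for illustration):

```python
def backward_propagate(partial_npt, p_p1, p_c_predicted, p_c_updated):
    """Apply Bayes theorem to every cell of P(C|P1), then marginalise
    over the child C to obtain a new distribution for P1.

    partial_npt[c][s] = P(C=c | P1=s); p_p1 is the old distribution of P1.
    """
    n_c, n_s = len(partial_npt), len(p_p1)
    # Bayes theorem per cell: P(P1=s | C=c) = P(C=c|P1=s) P(P1=s) / P(C=c)
    post = [[partial_npt[c][s] * p_p1[s] / p_c_predicted[c]
             if p_c_predicted[c] else 0.0
             for s in range(n_s)] for c in range(n_c)]
    # Marginalise over C: weight each row by the updated P(C=c) and sum.
    new_p1 = [sum(post[c][s] * p_c_updated[c] for c in range(n_c))
              for s in range(n_s)]
    total = sum(new_p1)
    return [v / total for v in new_p1]

# Partial table from the example (rows: C in [0-1]..[3-4]; columns: P1).
npt = [[0.2, 0.0], [0.4, 0.2], [0.4, 0.6], [0.0, 0.2]]
# A uniform prior on P1 predicts P(C) = [0.1, 0.3, 0.5, 0.1]; evidence
# then places C entirely in state [3-4].
new_p1 = backward_propagate(npt, [0.5, 0.5],
                            [0.1, 0.3, 0.5, 0.1], [0.0, 0.0, 0.0, 1.0])
```

With C observed in [3-4], the updated distribution concentrates on P1 being in [1-2], which matches the table: only the [1-2] column gives C any mass in [3-4].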
In some embodiments the method may further comprise resizing the data value intervals associated with the individual states. In a conventional Bayesian Network, the probability distribution intervals and the node probability tables are predetermined and the ranges of all the nodes are fixed. However, in order to reduce memory and computational resource requirements the number of states that can be implemented is restricted. Thus there is an inherent trade-off between accuracy and efficiency.
FIG. 6 illustrates an example of a probability distribution and a division of a value range into intervals. As the distribution of a node can vary, it is possible that several of the intervals have zero probabilities (intervals [I0,I1], [I1,I2], [I2,I3] and [I6,I7] in FIG. 6). In traditional tools, where fixed ranges are used, this results in a granularity error which is higher than necessary for the current distribution. Therefore, by resizing the intervals dynamically to match the current distribution, smaller intervals and thus lower granularity errors may be achieved.
For example, in some embodiments, the data value intervals may be distributed over the data value range for which the probability distribution is above a threshold. For example, a range in which the probability distribution is higher than, say, 0.01 may be identified and the fixed number of intervals may be distributed to cover only this range.
In some embodiments, resizing of data value intervals may comprise a scaling of the data value intervals.
Specifically, the original intervals may be scaled proportionally based only on the maximum and minimum sample values that are observed. An example of this is shown in FIG. 7 wherein the intervals of FIG. 6 are scaled to cover the smaller range corresponding to a non-zero probability distribution. In the example, the ratio between the intervals remains the same but as the range covered is smaller, the intervals are smaller, thereby reducing the granularity error.
More specifically, if the maximum and minimum observed values are a and b respectively and the smallest and largest interval boundaries are I_0 and I_n respectively, the new i-th boundary I_i' may be calculated as follows:

    I_i' = b + (I_i - I_0) * (a - b) / (I_n - I_0)
The advantage of this method is that it retains the original ratio of intervals, so if it is desired to represent a node by e.g. a logarithmic interval structure this will be retained.
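This proportional scaling may be sketched as follows (an illustrative Python sketch, equivalent to the formula above with the observed minimum and maximum passed explicitly; names are assumptions):

```python
def rescale_boundaries(boundaries, observed_min, observed_max):
    """Scale the interval boundaries I0..In onto the observed value
    range while preserving their relative spacing."""
    i0, i_n = boundaries[0], boundaries[-1]
    scale = (observed_max - observed_min) / (i_n - i0)
    return [observed_min + (x - i0) * scale for x in boundaries]

# Logarithmic-style boundaries squeezed onto an observed range [2, 6];
# the original 1:1:2:4 interval-width ratios are preserved.
new_b = rescale_boundaries([0, 1, 2, 4, 8], 2, 6)
```

Here the five boundaries map to [2, 2.5, 3, 4, 6]: the covered range shrinks, every interval shrinks with it, and the logarithmic structure survives.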
In some embodiments, the resizing of the data value intervals comprises modifying the data value intervals to result in a predetermined probability distribution of the data value intervals. In particular, the intervals may be redistributed such that each interval is equally likely. This will tend to reduce the error in many embodiments.
For example, if a node comprises n states the corresponding intervals are selected such that the probability of being in each interval is 1/n. An example of this is shown in FIG. 8 wherein the intervals of FIG. 6 are modified to cover the smaller range corresponding to a non-zero probability distribution.
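One way to realise equally likely intervals from observed sample values is to place the boundaries at empirical quantiles, as in the following illustrative Python sketch (the patent does not prescribe a particular quantile estimator; names are assumptions):

```python
def equal_probability_boundaries(samples, n_states):
    """Choose new interval boundaries so that roughly 1/n of the
    observed samples fall into each of the n states."""
    xs = sorted(samples)
    # Inner boundary k sits at the k/n quantile of the observed samples.
    inner = [xs[k * len(xs) // n_states] for k in range(1, n_states)]
    return [xs[0]] + inner + [xs[-1]]

# 100 evenly spread observations divided into 4 equally likely states.
bounds = equal_probability_boundaries(list(range(100)), 4)
```

For the evenly spread observations the boundaries land at [0, 25, 50, 75, 99], giving each state close to a quarter of the samples; for a skewed distribution the intervals would narrow where the probability mass concentrates.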
It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional processes which may be implemented in suitable functional units. It will be apparent that any suitable distribution of functionality between different functional units or processors may be used without detracting from the invention.
The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units and processors.
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term 'comprising' does not exclude the presence of other elements or steps.
Furthermore, although individually listed, a plurality of means, elements or method steps may be implemented by e.g. a single unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate. Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus references to "a", "an", "first", "second" etc do not preclude a plurality.

Claims (28)

1. A method of propagation in a graphical probabilistic model, the method comprising:
detecting a change in a first data value associated with a first node of the graphical probabilistic model;
determining a partial node probability table associated with a second node connected to the first node in response to the detection of the change; and determining a state probability distribution for the second node in response to the partial node probability table.
2. The method claimed in claim 1 wherein values are determined only for a subset of the partial node probability table and the determining of the partial node probability table comprises determining the subset of the node probability table in response to the first data value.
3. The method claimed in any previous claim wherein the subset is determined as a subset comprising only values associated with states corresponding to the first data value.
4. The method claimed in any previous claim wherein the second node is a child of the first node.
5. The method claimed in claim 4 wherein the determining of the partial node probability table comprises determining state probability values for the second node by statistically evaluating results of applying a function
associated with the second node to stochastically selected sample values associated with the first node.
6. The method claimed in claim 5 wherein the stochastically selected sample values are selected in response to a probability distribution of data values of the first node.
7. The method claimed in any of the claims 5 to 6 wherein the results of applying the function to the stochastically selected sample values are weighted in response to a probability distribution of data values of the first node.
8. The method claimed in any of the claims 5 to 7 wherein determining the state probability values comprises stochastically selecting sample values in response to the first data value.
9. The method claimed in claim 8 wherein determining the state probability values comprises stochastically selecting sample values only in a data value interval associated with the first data value.
10. The method claimed in any of the claims 5 to 9 wherein the state probability value of a state is determined in response to a proportion of results being associated with the state.
11. The method claimed in any of the claims 5 to 10 wherein the graphical probabilistic model comprises a third node being a parent of the second node and having an associated
second data value; and determining the state probability values comprises:
selecting a first state of the first node associated with the first data value;
selecting a second state of the third node associated with the second data value;
stochastically selecting first samples associated with the first state;
stochastically selecting second samples associated with the second state;
determining sample values for the second node in response to the first samples and second samples; and determining the state probability values in response to a distribution of the sample values for the second node.
12. The method claimed in any of the previous claims 1 to 3 wherein the second node is a parent of the first node.
13. The method claimed in claim 12 wherein determining the state probability distribution comprises determining state probability values associated with the second node by applying Bayes Theorem to data values of the partial node probability table and marginalisation of the first node.
14. The method claimed in claim 12 or 13 wherein determining the partial node probability table comprises determining state probability values for the second node by statistically evaluating results of applying a function associated with the first node to stochastically selected sample values of the second node.
15. The method claimed in claim 14 wherein the stochastically selected samples are selected in response to a probability distribution of data values of the second node.
16. The method claimed in any of the claims 14 or 15 wherein the results of applying the function to the stochastically selected samples are weighted in response to a probability distribution of data values of the second node.
17. The method claimed in any of the previous claims wherein states of the second node correspond to data value intervals and determining the partial node probability table further comprises resizing the data value intervals.
18. The method claimed in claim 17 wherein resizing the data value intervals comprises distributing the data value intervals over a data value range having an associated probability density above a threshold.
19. The method claimed in claim 17 or 18 wherein resizing the data value intervals comprises a scaling of the data value intervals.
20. The method claimed in claim 17 wherein resizing the data value intervals comprises modifying the data value intervals to result in a predetermined probability distribution of the data value intervals.
21. The method claimed in claim 20 wherein the predetermined probability distribution corresponds to the data value intervals having a substantially equal probability.
22. The method claimed in any of the previous claims wherein the first data value is a deterministic value.
23. The method claimed in any of the previous claims 1 to 21 wherein the first data value is a stochastic value.
24. The method claimed in any of the previous claims wherein the first data value is a user input value.
25. The method claimed in any of the previous claims wherein the graphical probabilistic model is a Bayesian Network.
26. A computer program enabling the carrying out of a method according to any of the previous claims.
27. A record carrier comprising a computer program as claimed in claim 26.
28. An apparatus for propagation of information in a graphical probabilistic model, the apparatus comprising: means for detecting a change in a first data value associated with a first node of the graphical probabilistic model;
means for determining a partial node probability table associated with a second node connected to the first node in response to the detection of the change; and
means for determining a state probability distribution for the second node in response to the partial node probability table.
GB0427631A 2004-12-17 2004-12-17 Method and apparatus for propagation in a graphical probabilistic model Withdrawn GB2421324A (en)
