GB2258311A - Monitoring a plurality of parameters - Google Patents

Monitoring a plurality of parameters

Info

Publication number: GB2258311A
Application number: GB9215907A
Authority: GB (United Kingdom)
Prior art keywords: class, condition, belonging, inputs, network
Legal status: Granted; Expired - Fee Related
Other versions: GB2258311B (en); GB9215907D0 (en)
Inventor: Nigel Andrew Dodd
Current Assignee: Individual
Original Assignee: Individual
Application filed by Individual
Events: publication of GB9215907D0; publication of GB2258311A; application granted; publication of GB2258311B


Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/02Detecting, measuring or recording pulse, heart rate, blood pressure or blood flow; Combined pulse/heart-rate/blood pressure determination; Evaluating a cardiovascular condition not otherwise provided for, e.g. using combinations of techniques provided for in this group with electrocardiography or electroauscultation; Heart catheters for measuring blood pressure
    • A61B5/0205Simultaneously evaluating both cardiovascular conditions and different types of body conditions, e.g. heart and respiratory condition
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/60ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
    • G16H40/63ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for local operation
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Physiology (AREA)
  • Veterinary Medicine (AREA)
  • Epidemiology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Cardiology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Primary Health Care (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Pulmonology (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

An equipment monitor 10 which is capable of learning how to respond to particular inputs (e.g. a neural network) is connected to a plurality of instruments 12 (e.g. medical instruments in an intensive care ward or instruments in an industrial plant). During an initial training session, a human supervisor monitors the instruments to ensure that no potential alarm condition is encountered whilst the monitor assimilates the gamut of signals representative of "safe" or "healthy" conditions. Thereafter the equipment is left to signal an alarm if the collection of signals it is monitoring strays out of the range encountered during the training session. A button may be provided for indicating to the monitor that responses which give rise to false alarms should be included in its "safe" responses. The monitor may be provided with some rules prior to its learning phase.

Description

"Apparatus and Method for Monitoring"

This invention relates to apparatus and methods for monitoring a plurality of input signals or parameters of a system and for determining the condition of the system. In particular, but not exclusively, the invention relates to such apparatus and methods for identifying novel input signals indicative of an alarm state.
The apparatus and method have very many specific applications, but one typical application is in an intensive care ward where a patient is connected to various instruments which observe various parameters of the patient, for example, his heart rate, pulse rate, respiratory rate, ABP, CVP and so on. These instruments are conventionally monitored by a nurse who determines whether the patient's condition is stable or whether action should be taken.
Attempts have been made to relieve the nurse's workload by equipping some of these instruments with primitive monitors which sound an alarm when the reading of the instrument strays outside pre-determined limits.
However, the bounds of acceptability are to some extent arbitrary and require setting by someone with prior knowledge of the patient's condition. Additionally, the individual single instrument monitor has no information regarding the other parameters of the patient's body that are being instrumented. By treating each bodily parameter as distinct and unrelated to any other, much useful information regarding the interdependence of these measurements is discarded. The human body is, after all, a unified whole whose parts must function in accord.
A need exists, therefore, for an apparatus and method functioning as a network monitor which learns the interrelationship between the samples, made by the instruments, of the bodily state. Any departure from normality of the entire system is detected and typically is signalled as an alarm.
Accordingly, in one aspect, this invention provides apparatus for monitoring a plurality of parameters of a system and for identifying or predicting the condition of the system, said apparatus comprising: a plurality of inputs each for receiving data representative of a respective parameter of the system; processing means for processing data received via said inputs to determine or predict the condition of the system and provide output data representative of said condition; training means operable to identify to the processing means monitored input signals which belong to a first class which indicates a first predetermined system condition (e.g. healthy); and data generating means operable to generate or synthesise and present to said processing means input signals which belong to, or have a high probability of belonging to, a second class which indicates a second predetermined system condition (e.g. an alarm condition), the processing means being operable during a learning phase to be taught to distinguish between input signals in said first class and said second class.
The generating means preferably simulates the statistical distribution of said second class of input signals by using a pseudo-random generator. The distribution of said random process may be non-uniform, to allow for incorporation of prior knowledge of the monitored system.
In one arrangement, the output provides data representing the probability of the monitored input signals falling in said second class. The processing means preferably is configured as or operates as an artificial neural network.
The configuration or operation of the processing means is preferably structured as a network with a capability equivalent to a three or more layered network, corresponding or analogous to an input layer, at least one "hidden" or intermediate layer and an output layer.
In another aspect, this invention provides apparatus for monitoring a plurality of system parameters and thereby deducing the condition of said system, said system including processing means capable of being taught to distinguish between a set of monitored system parameters belonging to a class indicating one system condition and a set of monitored system parameters belonging to another class indicating another system condition, means for supplying and identifying the class of a plurality of sets of parameters from one of said classes, and means for randomly or pseudo-randomly generating or synthesising sets of data and for identifying these to the processor as belonging to the other class.
In yet another aspect, this invention provides a method for monitoring a plurality of parameters of a system and for identifying or predicting the condition of the system, the method comprising the steps of: receiving a plurality of inputs each representing a particular parameter of the system, supplying said inputs to a processor for processing thereof to determine or predict the condition of the system, said processor having been trained by identifying to it those input signals belonging to a first class indicating a first predetermined system condition (e.g. healthy) and by providing said processor with synthesised or otherwise generated input signals belonging to, or having a high probability of belonging to, a second class indicating a second predetermined system condition (e.g. alarm condition) and identifying said synthesised or otherwise generated input signals as belonging to said second class.
Whilst the invention has been described above, it extends to any inventive combination of the features set out above or in the following description.
The invention may be performed in various ways and an embodiment thereof will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a schematic view of an example of a neural network monitor in accordance with the invention;
Figure 2 is a diagram of a simple three-layer neural network;
Figure 3 represents the first stage of error-backpropagation;
Figure 4 represents the second stage of error-backpropagation;
Figure 5 is a diagram illustrating the ability of the system to generalise;
Figure 6 is a diagram illustrating the insensitivity of the system to noise;
Figure 7 shows the outputs of a number of devices connected to an example of the network monitor, together with the output of the network monitor itself; and
Figure 8 shows a typical non-alarm or "healthy" distribution and a suitable default distribution for explaining the operation of a monitor using a non-uniform default distribution for training.
The neural network monitor 10 receives outputs from a number of instruments 12 and provides an output indicating a condition requiring attention. The neural network monitor under consideration does not require the explicit statement of rules, as would a conventional expert system, but may be taught by example. The invention however also extends to network monitors which use rules known prior to learning, and prior knowledge may be incorporated in different ways; for example, by starting with weights other than small random ones, or in other ways as discussed elsewhere herein.
Referring to the present case, after a period of supervision, during which it is given experience of the normal, or healthy, response to be expected from the instruments it is monitoring, the neural network monitor is left to receive input without guidance. At first the behaviour is somewhat "nervous" and the device will sound its alarm bell at any possibly "unhealthy" response from the instruments. The (human) supervisor may then indicate to the neural network monitor that the response from the instruments that caused the alarm should be included in its concept of a normal or healthy response. After repeated reassurances, the neural network monitor settles down to giving only infrequent false alarms.
A neural network consists of a number of simple processors, or neurons, linked together as in Figure 2. The neurons combine their inputs and subsequently produce an output which is passed to other neurons. The links between neurons contain weights which control the amplitude of the signal passing through. In addition, each neuron has an associated bias, which is effectively a connection to a neuron which is always in the on-, or 1-, state. It is the weights and biases that embody the information required to classify the input signals, just as in the mammalian brain it is the links between the neurons that determine its function. In the training stage, the weights and biases are iteratively improved by applying input and output pairs to the network.
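A single neuron of the kind just described can be sketched in a few lines (a minimal illustration only; the weights, bias and use of the logistic squashing function are arbitrary choices, not values from the patent):

```python
import math

def neuron_output(inputs, weights, bias):
    """Combine inputs via weighted links plus a bias, then squash with the
    logistic function so the output lies between 0 and 1."""
    net = sum(w * v for w, v in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))

# Two inputs feeding one neuron (illustrative weights only).
y = neuron_output([0.5, -0.2], weights=[0.8, 0.3], bias=0.1)
```

The bias behaves exactly like a weight to an always-on neuron: it simply shifts the net input before squashing.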
One way of viewing the operation of the neural network is as interpolation in a space defined by its parameters.
The training patterns are the representative examples of classes. After training, previously unseen patterns are classified according to an interpolation between the training examples. The number of neurons in the network determines the complexity of the space in which interpolation is done. In this way, with enough neurons, a division of feature-space by arbitrarily complex boundaries can be made, assimilating the fine distinguishing features of the input patterns. Alternatively, by providing only a few neurons, the network is forced to generalise and the outliers in its training will be effectively ignored.
The knowledge of the network is contained in the values of the weights between neurons. Initially these are set to small random values. The process by which the values of the weights are refined to represent better the mapping of the network's input to the required output is known as "error-backpropagation".
In the error-backpropagation process, the objective is to obtain some \Delta w for each weight such that, when the weight vector is changed from W to W + \Delta W, the error (i.e. the difference between the actual output y and the desired output d) is reduced. Let us first define the error, summed over all training examples, c, as

E = \frac{1}{2} \sum_c \sum_j (y_{j,c} - d_{j,c})^2    (1)
where the j subscript denotes in turn each output unit as in Figure 3.
The net input to a neuron in the j layer is obtained by multiplying all its separate inputs by their respective weights and adding:

x_j = \sum_i w_{ji} y_i    (2)
where the i subscript denotes in turn each unit contributing to output unit j. Let the neuron's output be some differentiable function, A, of this net input:

y_j = A(x_j)    (3)

The threshold function is not differentiable, but it can be approximated by a function that is differentiable, such as the "logistic" or "sigmoid" function or the hyperbolic tangent.
From equation 2 we get the derivatives

\frac{\partial x_j}{\partial w_{ji}} = y_i    (4)

and

\frac{\partial x_j}{\partial y_i} = w_{ji}    (5)

and from equation 3 we get the derivative

\frac{dy_j}{dx_j} = A'(x_j)    (6)

The process of error-backpropagation starts at the output of the network after input pattern number c has been propagated forwards from the input to the output. The actual output, in this case y_{j,c}, is compared with the desired output.
Therefore, for just one of the c, the derivative of this error with respect to the jth neuron's output is

\frac{\partial E}{\partial y_j} = y_{j,c} - d_{j,c}    (7)

This starts the error-backpropagation process, whose objective is to find the error derivative with respect to the weights:

\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial x_j} \frac{\partial x_j}{\partial w_{ji}}    (8)

where the derivatives on the right hand side can be evaluated from equations 7, 6 and 4 above.
The next stage is to evaluate \partial E / \partial y_i to enable us to continue down the network:

\frac{\partial E}{\partial y_i} = \sum_j \frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial x_j} \frac{\partial x_j}{\partial y_i}    (9)
where the derivatives on the right hand side can be evaluated from equations 7, 6 and 5 above.
Now we have completed the necessary calculations for the top layer. Let us re-label the layers as shown in Figure 4 to enable us to continue to the next layer down.
We have already obtained \partial E / \partial y_j by equation 9, where the old i layer is now the new j layer. From this we can calculate \partial E / \partial w_{ji}, as in equation 8. If we have not yet reached the bottom layer of the network, we calculate \partial E / \partial y_i as in equation 9.
The chaining process continues to evaluate \partial E / \partial w followed by \partial E / \partial y for all the layers of the network, thereby obtaining the error derivative of all the weights in the network.
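The chained evaluation above can be sketched as a minimal three-layer network in NumPy (an illustrative rendering of equations 4 to 9; the class name, learning rate, initialisation scale and helper functions are assumptions, not taken from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ThreeLayerNet:
    """Input, one hidden and one output layer, trained by the chained
    derivative evaluation described above."""

    def __init__(self, n_in, n_hidden, n_out, lr=1.0):
        # initially the weights are set to small random values
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, v):
        self.v = v
        self.h = sigmoid(self.W1 @ v + self.b1)       # hidden outputs y_i
        self.y = sigmoid(self.W2 @ self.h + self.b2)  # outputs y_j
        return self.y

    def backward(self, d):
        dE_dy = self.y - d                     # equation 7
        delta2 = dE_dy * self.y * (1 - self.y) # times A'(x_j), equation 6
        dE_dh = self.W2.T @ delta2             # equation 9: down one layer
        delta1 = dE_dh * self.h * (1 - self.h)
        # gradient-descent updates from equation 8 (dE/dw = delta * input)
        self.W2 -= self.lr * np.outer(delta2, self.h)
        self.b2 -= self.lr * delta2
        self.W1 -= self.lr * np.outer(delta1, self.v)
        self.b1 -= self.lr * delta1

def train(net, X, D, epochs):
    for _ in range(epochs):
        for v, d in zip(X, D):
            net.forward(v)
            net.backward(d)

def sse(net, X, D):
    return sum(float(np.sum((net.forward(v) - d) ** 2)) for v, d in zip(X, D))
```

Training such a net on a simple target, such as the logical AND of two inputs, drives the summed squared error down within a few thousand epochs.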
Having described the process of error-backpropagation in detail we now turn to the technique of "default classification", according to which, if no examples of one class of output are available, the complementary class must be inferred by default. For example, if a particular system is being monitored for some dangerous condition, and if that dangerous condition cannot be produced at will to train the network explicitly, then the only examples available for training will be consistent with the healthy, non-dangerous state of the system. Training a network only on one class of inputs, with no counter-examples, causes the network to classify everything as the only class it has been shown. However, by training the network on examples of the "healthy" class but also on random inputs for the "dangerous" class, any input which occurs after training which does not resemble one of the previously encountered "healthy" inputs will automatically be classified as "dangerous". The network effectively behaves as a novelty detector.
To illustrate this, a network was trained as follows.
It had five inputs and one output. The "healthy", class 1, input vectors consisted of elements a, b, c, d, e such that

a < b, \quad b < c, \quad d < c, \quad e < d    (10)

as illustrated in Table 1. 50 training examples satisfying condition 10 were generated. A network with 5 inputs, 3 hidden neurons and one output was trained to output class 1 for this data. Additionally, random data with the same first-order statistics as the data for output class 1 was synthesised, for which the network was trained to output class 0. When tested on a new set of 100 inputs, half of which were produced by explicitly following condition 10, and half of which were generated randomly, the performance was 93% correct. For the randomly generated data there is a finite probability of fulfilling condition 10, and it can be shown that the network is in fact performing to within 1% of the inherent upper limit of performance.
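The training data for this experiment might be synthesised along the following lines (a sketch only; the patent does not spell out the sampling procedure, so the rejection scheme and the [-1, 1] range used here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def healthy_vector():
    """Rejection-sample a vector (a, b, c, d, e) satisfying condition 10:
    a < b < c and e < d < c."""
    while True:
        a, b, c, d, e = rng.uniform(-1.0, 1.0, 5)
        if a < b < c and e < d < c:
            return np.array([a, b, c, d, e])

# 50 class-1 ("healthy") examples obeying the secret ordering rule.
healthy = np.array([healthy_vector() for _ in range(50)])

# Default class 0: random vectors with the same first-order statistics
# (per-element mean) as the healthy data, but no knowledge of the ordering.
noise = rng.uniform(-1.0, 1.0, (50, 5)) + healthy.mean(axis=0)

X = np.vstack([healthy, noise])
targets = np.concatenate([np.ones(50), np.zeros(50)])
```

The network is then trained to output 1 for the first 50 rows and 0 for the rest, exactly as in the text.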
A more rigorous justification for synthesising the available data with random numbers follows from the fact that training seeks to minimise the sum squared error over the training set. Consider a binary classification network with a single input v producing an output f(v). The required outputs are 0 if the input is a member of class A and 1 if the input is a member of class B. If the prior probability of any data being a member of class A is P_A and the prior probability of any data being a member of class B is P_B, and if the probability distribution functions of the
      a          b          c          d          e      output class
   0.150160   0.241971   0.496722   0.338752   0.163327      1.0
   0.752625  -0.258011   0.050505  -0.144331   0.085486      0.0
   0.390102   0.582408   0.667979   0.252589   0.037113      1.0
  -0.894841   0.933459  -0.331432  -0.835807  -0.459371      0.0
  -0.406147  -0.263851   0.074262  -0.330817  -0.446870      1.0
   0.024406   0.173021   0.477517   0.743378   0.155935      0.0
   0.769972   0.797939   0.964704   0.768290   0.724058      1.0
   0.705362   0.622344   0.909775   0.808566  -0.722170      0.0
   0.574456   0.622694   0.686187   0.684142   0.684017      1.0
   0.173560  -0.250082  -0.946428  -0.070469   0.570686      0.0

Table 1: Output class 1 vectors are obtained using condition 10. Output class 0 vectors are synthesised from random numbers having the same mean as class 1.
two classes as functions of the input v are p_A(v) and p_B(v), then the sum squared error, E, over the whole training set is given by:

E = \int \left[ P_A\, p_A(v)\, f(v)^2 + P_B\, p_B(v)\, (f(v) - 1)^2 \right] dv    (11)

Differentiating this with respect to the function f:

\frac{\partial E}{\partial f} = 2 P_A\, p_A(v)\, f(v) + 2 P_B\, p_B(v) \left[ f(v) - 1 \right]

and equating this to zero:

f(v) = \frac{P_B\, p_B(v)}{P_A\, p_A(v) + P_B\, p_B(v)}

which is exactly the probability of the correct classification being B given that the input was v. So by training for the minimisation of sum squared error, and using as targets 0 for class A and 1 for class B, the output from the network assumes a value equal to the probability of class B.
Substituting the f(v) corresponding to a trained network back into 11 we get

E_{min} = \int \frac{P_A\, p_A(v)\, P_B\, p_B(v)}{P_A\, p_A(v) + P_B\, p_B(v)}\, dv

and so the minimum attainable error is zero when there is zero overlap between the distributions, i.e. when p_A(v)\, p_B(v) = 0 for all v.
In this way it is possible to model the default class to produce an error less than the error to be expected from using uniformly distributed random variates.
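The derivation can be checked numerically. In this hypothetical one-dimensional illustration (the Gaussian densities and equal priors are assumptions for the sake of the check), the minimising output equals the posterior probability of class B, and the minimum error is the overlap integral, which vanishes when the two distributions do not overlap:

```python
import numpy as np

# 1-D grid over which the integrals are approximated by sums.
v = np.linspace(-10.0, 10.0, 10001)
dv = v[1] - v[0]

def gauss(v, mu):
    """Unit-variance Gaussian density centred at mu."""
    return np.exp(-0.5 * (v - mu) ** 2) / np.sqrt(2.0 * np.pi)

PA, PB = 0.5, 0.5                    # equal priors
pA, pB = gauss(v, -1.0), gauss(v, 1.0)

# The minimising output f(v) is the posterior probability of class B.
f = PB * pB / (PA * pA + PB * pB)

# Minimum attainable sum squared error: the overlap integral.
E_min = np.sum(PA * pA * PB * pB / (PA * pA + PB * pB)) * dv

# With class B moved far away there is no overlap and the error vanishes.
pB_far = gauss(v, 50.0)
E_zero = np.sum(PA * pA * PB * pB_far / (PA * pA + PB * pB_far + 1e-300)) * dv
```

At v = 0, midway between the two class centres, the densities are equal and f(v) = 0.5, as the posterior interpretation requires.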
As a generalisation, in a situation where a network is apprenticed to a human to learn to distinguish a healthy class of signal from anything else that might come along, it is important that the network should reach a state of learning where it can be left alone, as soon as possible.
The training stage should be as brief as possible, leaving only the occasional false alarm to recall the human operator to give the benefit of his judgement. Reaching a useful level of performance with only a few examples is only possible if those examples are representative of all members of the class.
To demonstrate that networks are able to generalise with relatively few examples, the training-set described above was used with different numbers of training examples. With only 5 unique examples of class 1 the performance is above 80% (see Figure 5). As before 50 random vectors were used to synthesise class 0. In this example the network has 5 inputs, 3 hidden neurons and 1 output. There is evidence to suggest that as the hidden layer of neurons is made smaller, so the network is obliged to generalise better. This generalisation is at the expense of being able to classify correctly the outliers of the training-set.
Turning to the noise tolerance of the network, the ability of a network with few hidden neurons to generalise suggests that a compact representation of the data is being made. If the data is corrupted by zero-mean noise, the essence of the data is retained from sample to sample while the noise is changing. Within limits this has little effect on the ability of the network to learn classifications. Figure 6 shows the effect on classification performance of unseen data with an increasing amount of noise present on both the training and test data. The score is 74% even with a signal to noise ratio of 1:1.
For scalar inputs which vary slowly with time and have no syntax, an appropriately structured artificial neural network which is layered and has total connectivity between layers performs as well as any other. However, certain types of input may have an underlying generator which undergoes well defined state transitions.
For these types of input the ideal network should contain the hardware (written in terms of the network formalism) able to exploit the regularity of the data and to extract parameters from it to be fed to the rest of the network.
Since the network learns from example, the allocation of signals to inputs of the network is arbitrary. This is certainly the case for a homogeneous network consisting, for instance, of totally interconnected layers, however particular applications may require a structured network predisposed to address the characteristic variation in certain types of input. In this case inputs to the network will favour certain types of signal and must be allocated accordingly.
The envisaged method of operation of the neural network equipment monitor of Figure 1 is very simple and consists of the following steps.
Connect it up: Connect the various instruments 12 whose output requires monitoring to the neural network equipment monitor 10. If the instruments do not provide a line-output then it is usually a simple matter to provide one. If, however, breaking into the circuitry of the instrument is not allowed, then a pick-up coil mounted on the surface of the instrument will pick up any high-frequency signal, such as a video signal, which can be demodulated and sent to the neural network equipment monitor. As discussed above, the neural network equipment monitor will adapt to virtually any type of signal and is tolerant of noise.
There is an increasing tendency to equip intensive care wards with data collection centres which serve to collect the vital function data of a ward of patients.
Instrumentation such as this would provide an ideal platform for incorporating a neural network equipment monitor.
Teach it about "healthy" signals: Once the instruments are turned on and registering signals characteristic of a "healthy" state, press the OK button 14 and keep it pressed (it can be equipped with a latch) for several minutes while the neural network equipment monitor learns the concept of a healthy signal, making sure during this time that the signals are characteristically healthy.
Let it work alone: Release the OK button. If the concept of a healthy signal has been well represented by the training examples given so far, the false alarm rate will be low. If, however, a signal is produced which does not fit the network's concept of a healthy signal, the alarm will sound requiring the human operator to press the OK button if indeed the signal is a healthy one. If false alarms are too frequent, a subsequent period of training may be required. Otherwise the neural network equipment monitor can be left to monitor the instruments unattended.
The simulations used as examples in previous sections were implemented on a Sun 3 and took 75 seconds for 500 updates, each having calculated the error derivatives over the entire 100 training patterns. Once the network has learned, the input patterns can be processed at an approximate rate of 500 patterns per second. Even an 8-bit processor running 100 times slower than the Sun 3 would therefore be able to cope adequately with real-time input at a rate of 5 patterns per second, though the learning time of two hours might be impractical. Of course, the learning can take place on a powerful machine leaving a much slower processor in charge of the monitoring once the network has learned.
If real-time learning is required, or if a much larger network is needed, there is no technology-limited upper limit to performance if the algorithm is implemented using parallel processors. A practical design using transputers allows the use of the language Occam, which addresses the parallelism in a program. Using this formalism, a forwards pass through a network that has already learned the correct weight values would be

PROC forward.pass
  SEQ
    PAR
      calculate output for each neuron in input layer
    PAR
      calculate output for each neuron in next layer
    PAR
      calculate output for each neuron in output layer

A backward pass, wherein the weight updates are calculated, would have a similar, but inverted, structure.
A complete learning cycle would consist of

PROC learn.cycle
  SEQ
    forward.pass
    backward.pass

The computing power of a T800 Transputer is very roughly equivalent to a Sun 3, and from this the number of Transputers required for a network of a given size to operate at a given speed can be calculated. For development, one of the many commercially available Transputer systems hosted by a PC would be used. However, for a conveniently packaged system suitable for use in a hospital, an expandable board containing 5 Transputers with memory, power-supply, etc., would fit into a box the size of a small briefcase.
A trial with real data was conducted. A hospital intensive care ward was approached to obtain vital function data from a number of patients over a period of 24 hours.
The data was scaled to lie between -1 and +1, and an artificial neural network trained to output 1 for the healthy data and 0 for uniformly distributed uncorrelated noise. Results are shown in Figure 7. In Figure 7 the output of the artificial neural network is given at the bottom, and is mostly high, indicating a healthy response from the patient. The low dips indicate alarm conditions and with each of these can be associated a departure from normal of one or more of the vital function traces above.
The statistical distribution of the default class may be modified in many ways. Consider, for example, the case given above where a process generates data a, b, c, d, e subscribing to the (secret) relationship a < b < c and e < d < c. This data relationship is unknown to the neural network monitor at the beginning of training. A naive random synthesis process will generate uniform distributions over the same range for a, b, c, d, e. A cursory examination of the data, however, will reveal that the distributions of a and e will be biased towards the low end and the distribution of c will be biased towards the high end. Performance will be improved if this knowledge is incorporated into the training process. Given no knowledge about the statistical distribution of the default class, a uniform uncorrelated distribution is assumed. However, if we have such knowledge it may be incorporated into the statistical distribution of the default class. This may be implemented either at the outset of operation or may be continuously adapted to incoming data. This is a good example of making use of the first order statistics of the data without necessarily having access to the more subtle second order information in terms of the interrelationships between the data.
To reduce the error below that expected from a uniform distribution we can synthesise default inputs with a distribution which takes account of prior knowledge of the distribution of the non-alarm class.
If, for instance, the non-alarm inputs are expected to fall within certain bounds, then the default inputs can be synthesised to fall only outside those limits. In this way the overlap between the two distributions (the default distribution and the non-alarm distribution) is zero, and therefore the expected error will be zero. In practice, estimation of the limits of the non-alarm class will not be 100% reliable, so the expected error will be greater than zero. Also, leaving a gap in the default distribution in this way requires the neural network to generalise well and interpolate smoothly, which in turn is only guaranteed when the architecture of the neural network is well suited to the problem. Thus in some cases it may be safer to allow the default distribution to erode slightly into the expected distribution of the non-alarm data, so that there is less reliance on the good behaviour of the neural network.
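A sketch of this bound-based synthesis, under the assumption that all inputs are scaled to [-1, 1] and the non-alarm class is expected within a simple box of per-input limits (the function name and bounds are illustrative):

```python
import random

def synthesise_outside(bounds, n_samples, rng=random):
    """Draw default inputs uniformly over [-1, 1]^d, rejecting any point
    that falls inside the estimated non-alarm box, so that the default
    and non-alarm distributions do not overlap.  `bounds` is a list of
    (low, high) pairs, one per input."""
    out = []
    while len(out) < n_samples:
        p = [rng.uniform(-1.0, 1.0) for _ in bounds]
        inside = all(lo <= v <= hi for v, (lo, hi) in zip(p, bounds))
        if not inside:
            out.append(p)
    return out

# Non-alarm (healthy) data expected within [-0.2, 0.4] x [-0.5, 0.0]:
box = [(-0.2, 0.4), (-0.5, 0.0)]
defaults = synthesise_outside(box, 100, random.Random(0))
```

Shrinking the box slightly before sampling would implement the safer variant in which the default distribution erodes a little into the expected non-alarm region.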
Figure 8 below shows a typical non-alarm, or healthy, distribution and a suitable default distribution. Of course, in most real systems these distributions will be multi-dimensional.
Another way of incorporating prior knowledge is to structure the network so that it is predisposed to perform well in a particular problem domain. As an example, consider a device that is examining an image of some kind.
Suppose that translating an object in the image does not affect the identity of the image in any way. This prior knowledge of translation invariance can be built into the network by sharing weights (weight replication), by using non-standard activation functions for the neurons, and by other architectural features.
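The weight-sharing idea can be illustrated with a one-dimensional circular convolution (a simplified stand-in for the two-dimensional image case): because the same kernel weights are replicated at every position, shifting the input simply shifts the feature map, leaving the detected pattern unchanged.

```python
def conv1d_circular(x, kernel):
    """1-D circular convolution: the same (shared) kernel weights are
    applied at every position, so the operation commutes with shifts."""
    n, k = len(x), len(kernel)
    return [sum(kernel[j] * x[(i + j) % n] for j in range(k))
            for i in range(n)]

def shift(x, s):
    """Circular shift of a sequence to the right by s positions."""
    return x[-s:] + x[:-s]

x = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]   # a small "object" in a 1-D signal
w = [1.0, -1.0, 0.5]                  # shared weights (the kernel)
# Shifting the input shifts the feature map identically:
# conv1d_circular(shift(x, 2), w) equals shift(conv1d_circular(x, w), 2)
```

A fully connected layer, by contrast, has independent weights at every position and must learn each translated view separately.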
The major advantages of the described embodiment and the artificial neural network for equipment monitoring include:
(1) Learn by example.
(2) Generalise from a representative training set.
(3) Tolerant of noise.
(4) Incorporation of prior knowledge.
(5) Adaptive to any combination of inputs.
(6) Extremely simple to operate - no special skill required.
(7) Small network implementable on standard PC AT + interface.
(8) No technology-limited upper bound to potential size of network.

Claims (12)

1. Apparatus for monitoring a plurality of parameters of a system and for identifying or predicting the condition of the system, said apparatus comprising: a plurality of inputs each for receiving data representative of a respective parameter of the system; processing means for processing data received via said inputs to determine or predict the condition of the system and provide output data representative of said condition; training means operable to identify to the processing means monitored inputs which belong to a first class which indicates a first predetermined system condition (e.g. healthy), and data generating means operable to generate or synthesise and present to said processing means input values which belong to, or have a high probability of belonging to, a second class which indicates a second predetermined system condition (e.g. alarm condition), the processing means being operable during a learning phase to be taught to distinguish between inputs in said first class and said second class.
2. Apparatus according to Claim 1, wherein said data generating means simulates the statistical distribution of said second class of inputs by using a pseudo-random generator.
3. Apparatus according to Claim 2, wherein the distribution produced by said pseudo-random generator is non-uniform.
4. Apparatus according to any preceding claim, wherein the output provides data representing the probability of the monitored input signals falling in said second class.
5. Apparatus according to any preceding claim, wherein the processing means is configured as or operates as an artificial neural network.
6. Apparatus according to Claim 5, wherein the configuration or operation of the processing means is structured as a network with a capability equivalent to that of a network of three or more layers, corresponding or analogous to an input layer, at least one "hidden" or intermediate layer and an output layer.
7. Apparatus for monitoring a plurality of system parameters and thereby deducing the condition of said system, said apparatus including processing means capable of being taught to distinguish between a set of monitored system parameters belonging to a Class indicating one system condition and a set of monitored system parameters belonging to another Class indicating another system condition, means for supplying and identifying the Class of a plurality of sets of parameters from one of said Classes, and means for randomly or pseudo-randomly generating or synthesising sets of data and for identifying these to the processing means as belonging to the other Class.
8. A method for monitoring a plurality of parameters of a system and for identifying or predicting the condition of the system, the method comprising the steps of: receiving a plurality of inputs, each representing a particular parameter of the system, and supplying said inputs to a processor for processing thereof to determine or predict the condition of the system, said processor having been trained by identifying to it those input signals belonging to a first class indicating a first predetermined system condition (e.g. healthy) and by providing said processor with synthesised or otherwise generated input signals belonging to, or having a high probability of belonging to, a second class indicating a second predetermined system condition (e.g. alarm condition) and identifying said synthesised or otherwise generated input signals as belonging to said second class.
9. A method according to Claim 8, wherein said signals belonging to or having a high probability of belonging to said second class have a uniform uncorrelated distribution.
10. A method according to Claim 9, wherein said signals belonging to or having a high probability of belonging to said second class have a given statistical distribution based on information obtained either at the outset of operation or continuously adapted to incoming data.
11. Apparatus substantially as hereinbefore described with reference to and as illustrated in any of the accompanying Figures.
12. A method substantially as hereinbefore described with reference to and as illustrated in the accompanying drawings.
GB9215907A 1991-07-27 1992-07-27 Apparatus and method for monitoring Expired - Fee Related GB2258311B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB919116255A GB9116255D0 (en) 1991-07-27 1991-07-27 Apparatus and method for monitoring

Publications (3)

Publication Number Publication Date
GB9215907D0 GB9215907D0 (en) 1992-09-09
GB2258311A true GB2258311A (en) 1993-02-03
GB2258311B GB2258311B (en) 1995-08-30

Family

ID=10699097

Family Applications (2)

Application Number Title Priority Date Filing Date
GB919116255A Pending GB9116255D0 (en) 1991-07-27 1991-07-27 Apparatus and method for monitoring
GB9215907A Expired - Fee Related GB2258311B (en) 1991-07-27 1992-07-27 Apparatus and method for monitoring

Country Status (1)

Country Link
GB (2) GB9116255D0 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0240679A1 (en) * 1986-03-27 1987-10-14 International Business Machines Corporation Improving the training of Markov models used in a speech recognition system
GB2231698A (en) * 1989-05-18 1990-11-21 Smiths Industries Plc Speech recognition
WO1991000591A1 (en) * 1989-06-30 1991-01-10 British Telecommunications Public Limited Company Pattern recognition

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0856826A2 (en) * 1997-02-04 1998-08-05 Neil James Stevenson A security system
EP0856826A3 (en) * 1997-02-04 1999-11-24 Neil James Stevenson A security system
GB2352815A (en) * 1999-05-01 2001-02-07 Keith Henderson Cameron Automatic health or care risk assessment
EP1609412A1 (en) * 2001-05-31 2005-12-28 Isis Innovation Limited Patient condition display
US7031857B2 (en) 2001-05-31 2006-04-18 Isis Innovation Limited Patient condition display
WO2006056721A1 (en) * 2004-11-26 2006-06-01 France Telecom Suppression of false alarms among alarms produced in a monitored information system
FR2878637A1 (en) * 2004-11-26 2006-06-02 France Telecom DELETING FALSE ALERTS AMONG ALERTS PRODUCED IN A MONITORED INFORMATION SYSTEM
EP2122537A2 (en) * 2007-02-08 2009-11-25 Utc Fire&Security Corporation System and method for video-processing algorithm improvement
EP2122537A4 (en) * 2007-02-08 2010-01-20 Utc Fire & Security Corp System and method for video-processing algorithm improvement

Also Published As

Publication number Publication date
GB9116255D0 (en) 1991-09-11
GB2258311B (en) 1995-08-30
GB9215907D0 (en) 1992-09-09

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20030727