GB2434504A

GB2434504A - Pattern Recognition Systems

Info

Publication number: GB2434504A
Application number: GB0600667A
Authority: GB
Inventors: Eleftheria Katsiri
Original assignee: Individual
Current assignee: Individual
Priority date: 2006-01-13
Filing date: 2006-01-13
Publication date: 2007-07-25
Anticipated expiration: 2026-01-13
Also published as: GB2434504B; GB0600667D0

Abstract

This invention generally relates to methods, apparatus and computer program code for classifying data representing patterns of activity, in particular activity in physical space as defined by spatial location data, in particular in mobile communications networks, such as digital mobile phone networks. A user activity monitoring system for a digital mobile communications network, the system comprising: an input for receiving spatial position information from a plurality of mobile communications devices coupled to said network; a module for constructing a trajectory for each of said devices, a trajectory of a device comprising a time series of positions of the device; and a classification system configured to classify said trajectories into selected classes of a predetermined set of classes using a plurality of hidden Markov models or Rete networks to provide a classification data output responsive to said trajectory classification.

Description

Pattern Recognition Systems.

FIELD OF THE INVENTION

This invention generally relates to methods, apparatus and computer program code for classifying data representing patterns of activity, in particular activity in physical space as defined by spatial location data, in particular in mobile communications networks, such as digital mobile phone networks.

BACKGROUND TO THE INVENTION

Many mobile communications networks provide a crude form of location technology by identifying a particular base station to which a mobile device is attached. However, location technology is developing rapidly and now includes triangulation techniques, as well as GPS (Global Positioning System) technology which is being built into an increasing number of mobile devices. Spatial position information accurate to of the order of 1 centimetre is also provided by ultrasonic BAT sensors (see, for example, [harter 99]) and UWB (Ultra Wide Band) [fleming95] devices, for example of the type available from Ubisense of Cambridge UK. (Details of the references are given at the end of the description.)

There is a need for improved techniques for processing this type of spatial location data so as to be able to make good use of the information available. We will describe systems and methods which address these needs, and which, in aspects, also have applications outside the processing of spatial data.

SUMMARY OF THE INVENTION

According to a first aspect of the present invention there is therefore provided a user activity monitoring system for a digital mobile communications network, the system comprising: an input for receiving spatial position information from a plurality of mobile communications devices coupled to said network; a module for constructing a trajectory for each of said devices, a trajectory of a device comprising a time series of positions of the device; and a classification system configured to classify said trajectories into selected classes of a predetermined set of classes using a plurality of hidden Markov models or Rete networks to provide a classification data output responsive to said trajectory classification.

The network may comprise any conventional mobile communications network including, but not limited to, a wireless local area network (WLAN) and a mobile phone network such as a GSM, GPRS or 3G network, for example of the type specified in the 3GPP (3rd Generation Partnership Project) and 3GPP2 specifications. Embodiments of the system may also be used across multiple networks for multiply enabled mobile communications devices.

Preferably the classification system is configured to select a class for a trajectory responsive to probability data from the plurality of hidden Markov models or Rete networks for the trajectory, typically selecting the most likely model or network.

Preferably the classification comprises Bayesian classification, as described further below.

In some preferred embodiments of the system, the collective behaviour of a plurality of devices is classified, based on a plurality of the trajectories. This may be used, for example, to identify a collective physical motion state of the plurality of devices (which, in general, will be attached to human users). Thus, for example, a group of devices/users moving on a train may be identified. More generally embodiments of the system may be used to identify one or more locations of traffic congestion and/or relatively free traffic flow. A collective behaviour classification may also be employed to reconfigure the network in order to better cope with predicted or actual load, for example to increase the coverage in a region where many users are present or where, for example based upon predicted motion, many users or an increased number of users is predicted to be present. Methods for reconfiguring a network may include reallocation of base stations and/or other techniques, such as network bandwidth control and/or cell size adjustment.

In some preferred embodiments of the system, the trajectory constructing module includes a system to link spatial position information received from a single device at a plurality of different elements of the network. This may comprise, for example, a system to retain data from two or more cells or base stations to which a device is attached at any one time. Additionally or alternatively this may be accomplished by monitoring hand-overs (hand-offs) in the mobile communications network to track a device as it moves within a region of network coverage.

In some preferred embodiments the classification system is distributed across a plurality of servers in a tree structure. Preferably a hierarchy of servers is present, so that, for example, a server at a lower layer need only pass information relating to a change in the data used for classifying further up the hierarchy to a higher level. Preferably means is provided for coordinating the classification system across these servers, for example using a distributed object structure such as a web service implementation. This provides a scaleable architecture which is useful in the context of managing the large volumes of data which may be encountered, for example, in a mobile phone network.

Preferably the system also includes a training module for training the classification system hidden Markov models or Rete networks using historical data from the user activity monitoring.

The invention also provides a method of monitoring user activity in a digital mobile communications network, the method comprising: receiving spatial position information from a plurality of mobile communications devices coupled to said network; constructing a trajectory for each of said devices, a trajectory of a device comprising a time series of positions of the device; and classifying said trajectories into selected classes of a predetermined set of classes using a plurality of hidden Markov models or Rete networks to provide a classification data output responsive to said trajectory classification.

In a further aspect the invention provides a method of user activity monitoring, the method comprising: inputting spatial position data for at least one user representing activity of said user; constructing a space-time trajectory for said user; and classifying said space-time trajectory into one of a plurality of predetermined classes using a plurality of hidden Markov models or Rete networks.

In embodiments the classification is sufficiently fine to identify particular users.

Preferably each trajectory includes a sufficient number of points for reliable classification with at least 50 percent (on average) reliability discrimination of an unknown sample belonging to one of the predetermined classes.

Another application of the above described techniques involves identifying potential security violations in a packet data communications network. Known techniques typically rely upon determining a data rate (packets/second) but embodiments of the method we describe do not need this information. Instead in some preferred embodiments, putative invariant features are sought.

According to a further aspect of the invention there is therefore provided a method of detecting a potential security violation in a packet data communications network, the method comprising: capturing data from said network relating to data packets carried by the network; representing said captured data as tuples, each said tuple comprising a set of data items relating to a captured packet, said data items being selected from the group consisting of packet identification data, packet size, packet source address, packet source port, and packet time; grouping said tuples into sets of tuples each set representing a trajectory of said grouped tuples; and classifying said tuple trajectories using a plurality of hidden Markov models or Rete networks to identify a trajectory defining a potential security violation of said network.

In embodiments the tuples may be grouped, for example, by a source identifier such as a source IP address and/or port. Trajectories may, for example be in n-dimensional packet parameter space (where n is an integer equal to or greater than 1), and may optionally include one or more physical space dimensions.
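The grouping step described above can be illustrated with a minimal Python sketch. The type name `PacketTuple`, its field names and the function `group_into_trajectories` are hypothetical and chosen only for illustration; grouping by source address and source port is one of the options mentioned, not the only one.

```python
from collections import defaultdict
from typing import NamedTuple


class PacketTuple(NamedTuple):
    """Hypothetical tuple holding the data items listed for a captured packet."""
    packet_id: int
    size: int
    source_address: str
    source_port: int
    time: float


def group_into_trajectories(tuples):
    """Group tuples by (source address, source port), ordered by packet time.

    Each resulting list is one trajectory in packet parameter space.
    """
    groups = defaultdict(list)
    for t in sorted(tuples, key=lambda t: t.time):
        groups[(t.source_address, t.source_port)].append(t)
    return dict(groups)


captured = [
    PacketTuple(2, 60, "10.0.0.1", 4000, 0.20),
    PacketTuple(1, 1500, "10.0.0.1", 4000, 0.10),
    PacketTuple(3, 40, "10.0.0.2", 5000, 0.15),
]
trajectories = group_into_trajectories(captured)
```

Each trajectory produced in this way could then be passed to a classifier in the same manner as a spatial trajectory.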

Another application of the techniques we describe involves the detection of changes in financial instruments. These may be assumed to represent stochastic networks of random variables.

Thus in a further aspect the invention provides a method of identifying a potentially valuable stock, share or other financial instrument, the method comprising: capturing data relating to stocks, shares or other financial instruments; representing said captured data as tuples, each said tuple comprising a set of parameters relating to said stocks, shares or other financial instruments; grouping said tuples into sets of tuples each set representing a trajectory of said grouped tuples; and classifying said tuple trajectories using a plurality of hidden Markov models or Rete networks to identify a potentially valuable stock, share or other financial instrument.

The data relating to the stocks, shares or other financial instruments may comprise value data and/or data derived from this, for example by differentiation in time, and/or other data such as that derived from company research.

The invention further provides computer program code to implement the above described systems and methods, in particular on a data carrier such as a disc, CD or CD-ROM, non-volatile or programmed memory or on a data carrier such as an optical or electrical signal carrier. Code (and/or data) to implement embodiments of the invention may comprise source, object or executable code in a conventional programming language (interpreted or compiled) such as C, or assembly code, or other computer program code. Such code and/or data may be distributed between a plurality of coupled components in communication with one another.

The invention also provides a computer system comprising means for implementing the steps of each of the above described methods.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of the invention will now be further described by way of example only, with reference to the accompanying figures which are as follows:

Figure 1: Input System (input_final.jpg).

Figure 2: The UserAtPosition event type (UserAtPosition.jpg).

Figure 3: User Trajectory (allphonemes.jpg).

Figure 4: Trajectory creation (Trajectory.jpg).

Figure 5: Sampling process (walking.jpg, sit-down.jpg, sitting.jpg).

Figure 6: Samples of the Sit Down movement (sit_3_square.jpg, sit10.jpg, sit15.jpg).

Figure 7: Classification System (recognition.jpg).

Figure 8: Classification System architecture 1 (classification.jpg).

Figure 9: A Rete Network that detects the closest empty location to user (cem_rete_sal3 copy.jpg).

Figure 10: The Markov Generation model (hmm.jpg).

Figure 11: Rete-based Classifier Architecture (recogni.vsd).

Figure 12: Classification Architecture 2 (classif2.vsd).

Figure 13: Data.mlf (data.mlf.vsd).

Figure 14: Data.scp (data.scp.vsd).

Figure 15: Trajectories of two users sitting down and getting up (3d) (stavros-eli-full.jpg).

Figure 16: Trajectories of two users sitting down and getting up (2d) (eli-stavros-z.jpg).

Figure 17: AESL definitions (implementation_aesl.jpg).

Figure 18: Filter combination (filter_context2.jpg).

Figure 19: Dual-layer knowledge base (levels2.jpg).

Figure 20: An example GSM network architecture (gsm.vsd).

Figure 21: Distributed classification architectures for the GSM network with server tree (gsm2.jpg).

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

According to a first embodiment of the present invention there is provided a user activity monitoring system for a digital mobile communications network.

The system comprises: an input for receiving spatial position information from a plurality of mobile communications devices coupled to said network; a module for constructing a trajectory for each of said devices, a trajectory of a device comprising a time series of positions of the device; and a classification system configured to classify said trajectories into selected classes of a predetermined set of classes using a plurality of hidden Markov models or Rete networks to provide a classification data output responsive to said trajectory classification.

It is assumed that communication devices are equipped with appropriate location technologies such as the GPS technology, in order to know their position in 3D space.

Communication systems that are enhanced with location technologies belong to the general category of sensor-driven systems. Several location technologies exist at the moment that are appropriate for communication devices. We summarise the main ones below.

Location Technologies

The Active BAT is an indoor positioning system that uses an ultrasound time-of-flight trilateration technique to provide positioning accuracy of 1 cm indoors. (Trilateration is a method of surveying analogous to triangulation, in which each triangle is determined by the measurement of all three sides.) Users and objects carry Active BAT tags. In response to a request that the controller sends via short-range radio, a BAT emits an ultrasonic pulse to a grid of ceiling-mounted receivers. At the same time as the controller sends the radio frequency request packet, it also sends a synchronised reset signal to the ceiling sensors using a wired serial network. Each ceiling sensor measures the time interval from reset to ultrasonic pulse arrival and computes its distance from the BAT.

The local controller then forwards the distance measurements to a central controller which performs the trilateration computation. Statistical pruning eliminates erroneous sensor measurements caused by a ceiling sensor hearing a reflected ultrasound pulse instead of one that travelled along the direct path from the BAT to the sensor. The system can locate BATs to within 1 cm of their true position for 95 percent of the measurements. It can also compute orientation information.
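The trilateration computation can be sketched as follows. This is a minimal illustration and not the Active BAT implementation: it assumes that all receivers lie on a flat ceiling at a known height, that at least three non-collinear receivers report a distance, and it omits the statistical pruning of reflected pulses. The function name `trilaterate` and its parameters are hypothetical.

```python
import math


def trilaterate(receivers, distances, ceiling_z):
    """Estimate the (x, y, z) position of a tag below a flat ceiling.

    receivers: list of (x, y) receiver positions, all at height ceiling_z.
    distances: measured tag-to-receiver distances, in the same order.
    """
    if len(receivers) < 3 or len(receivers) != len(distances):
        raise ValueError("need at least three receivers with one distance each")
    (x0, y0), d0 = receivers[0], distances[0]
    # Subtracting the first range equation from each of the others removes
    # the quadratic terms and leaves equations that are linear in x and y.
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (xi, yi), di in zip(receivers[1:], distances[1:]):
        a1, a2 = 2.0 * (xi - x0), 2.0 * (yi - y0)
        b = d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2
        # Accumulate the 2x2 normal equations of the least-squares problem.
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        t1 += a1 * b
        t2 += a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("receivers are collinear; position is not determined")
    x = (s22 * t1 - s12 * t2) / det
    y = (s11 * t2 - s12 * t1) / det
    # The tag is below the ceiling, so recover the height from one range.
    horizontal_sq = (x - x0) ** 2 + (y - y0) ** 2
    z = ceiling_z - math.sqrt(max(d0**2 - horizontal_sq, 0.0))
    return x, y, z
```

With noisy distances the same normal equations give a least-squares estimate of x and y; a fuller treatment would first discard outlying measurements, as the pruning step described above does.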

The Cricket System [priyantha00] is another location system that is also based on ultrasound technology. In contrast to the BAT, it uses ultrasound emitters to create the infrastructure and embeds receivers in the object being located. This approach forces the mobile objects to perform all their own triangulation computations. (Triangulation is defined as the measurement of a series or network of triangles in order to survey and map out a territory or region, by measuring the angles and one side of each triangle.)

Cricket uses the radio frequency signal not only for synchronisation of the time measurement but also to delineate the time region during which the receiver should consider the sounds it receives. Like the Active BAT, Cricket uses ultrasonic time-of-flight data and a radio frequency control signal, but this system does not require a grid of ceiling sensors. Cricket in its currently implemented form is less precise than the Active BAT; however, the fundamental limit of range estimation accuracy used in Cricket should be the same as that of the Active BAT. Its advantages include privacy and decentralised scalability, while its disadvantages include a lack of centralised management or monitoring and the computational burden that processing both the ultrasound pulses and RF data places on the mobile receivers.

The Global Positioning System (GPS) [dana98] is a satellite-based navigation system developed and operated by the US Department of Defence. GPS permits land, sea and airborne users to determine their three-dimensional position, velocity and time. GPS uses a constellation of 21 operational NAVSTAR satellites and 3 active spares. The GPS satellite signal contains information used to identify the satellite and provide position, timing, ranging data, satellite status and the updated ephemeris (orbital parameters). A minimum of 4 satellites allows the GPS client to compute latitude, longitude, altitude (with reference to mean sea level) and GPS system time, through a process of triangulation. The satellites receive periodic updates with accurate information on their exact orbits. Differential GPS (DGPS) is regular GPS with an additional correction signal added. DGPS uses a reference station at a known point (also called a 'base station') to calculate and correct bias errors. The reference station computes corrections for each satellite signal and broadcasts these corrections to the remote, or field, GPS receiver. The remote receiver then applies the corrections to each satellite used to compute its fix.

Ultra Wideband (UWB) [fleming95] is a radio technology that opens up new capabilities in radio communications. It is a wireless technology that transmits digital data at very high rates over a wide spectrum of frequencies. Within the power limits allowed under current FCC regulations, not only can UWB carry huge amounts of data over a short distance at very low power, but it also has the ability to carry signals through doors and other obstacles that reflect signals at more limited bandwidths and higher power. In addition to its uses in wireless communications products and applications, UWB can also be used for very high-resolution radars and precision (sub-centimetre) location and tracking systems.

UWB radiation has unique advantages: transceivers and antennas can be made very small (i.e., coin size), low power and low cost because the electronics can be completely integrated in CMOS without any inductive components. Ultra Wideband signals form a shadow spectrum that can coexist and does not interfere with the sine wave spectrum.

The transmitted power is spread over such a large bandwidth that the amount of power in any narrow frequency band is very small. The advantages of spread spectrum are shared, including multipath immunity, tolerance of interference from other radio sources and inherent privacy from eavesdropping (low probability of intercept). Ultra Wideband non-sinusoidal signals have very good penetrating capabilities, and they support centimetre-level location accuracy without needing extremely accurate clocks to synchronise multiple receivers.

Currently, the rapid advances in mobile communications and the activities of development forums such as Source O2 [sourceo2] include efforts towards the integration of mobile communication systems such as GSM, UMTS and iMode with state of the art location technologies, such as GPS.

The next section describes the input system, i.e. the part of the system that is responsible for communicating the spatial positions of a device that is both coupled to the system and equipped with a positioning technology such as the ones above.

System Input

The input of the system is depicted in Figure 1. It consists of two main components: the client wrapper component and the input component. The client wrapper component operates on the coupled device and the input component on the said system. The client wrapper component accesses the GPS data and publishes it through the Application Programming Interface (API) PublishSensorUpdate() in the form of a stream of UserAtPosition events to the input component at regular intervals, typically 1 UserAtPosition event per second. This is a typical value for the rate of calculated positions per device, for both the GPS and Active BAT location technology.

The structure of the UserAtPosition event is depicted in Figure 2. The name UserAtPosition is the type we have defined for this event. DeviceID and <x,y,z> are the attributes of the event. DeviceID is the unique identifier of the coupled device, e.g., in the case of a mobile phone it could be its SIM card number. The attribute tuple <x,y,z> represents the unique coordinates of the device's position in 3-D space, according to a predefined coordinate system. Several coordinate systems exist currently, such as the World Geodetic System 1984 (WGS84) and the Terrestrial Reference Frame (TRF) for the GPS technology. The said system is independent of the coordinate system used.

Geodetic transformations make it possible to transform the positions from one system to another and for this reason, we have modelled the UserAtPosition event position tuple with <x,y,z>, which is correct in all cases.
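As an illustration of the event structure, a UserAtPosition event could be modelled as a small immutable record. The Python class and field names below are assumptions made for this sketch; only the attributes DeviceID and <x,y,z> come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAtPosition:
    """One position report from a coupled device (illustrative names)."""
    device_id: str  # unique identifier of the device, e.g. a SIM card number
    x: float        # coordinates in whichever coordinate system is in use
    y: float
    z: float

    def position(self):
        """Return the position tuple <x, y, z>."""
        return (self.x, self.y, self.z)


event = UserAtPosition(device_id="device-1", x=1.0, y=2.0, z=0.5)
```

Because the record only carries a coordinate triple, it is independent of the coordinate system, in line with the remark above.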

As shown in Figure 1, the input component consists of four subcomponents: An Event Listener, an Event Adaptor, a Rete Manager and a Deductive Knowledge Base. The Event Listener, through the ReceiveSensorUpdate() API, listens for events that represent spatial positions produced by the coupled devices. It has filtering capabilities on event type and event attributes. Once an event is received it then passes it on to the Event Adaptor. The Event Adaptor performs the translation between the received event and a string and performs a remote invocation on the Rete Manager that causes a fact of type UserAtPosition to be asserted in the knowledge base, in a similar manner as it would be inserted into a database table. It reports any exceptions raised. Several such Event Adaptors may operate concurrently.

A knowledge base represents predicates that are true by storing an instance of each of these predicates. We refer to this instance as a fact. The assertion of a fact in the knowledge base is equivalent to it being stored in the knowledge base as a true statement. A fact being retracted from the knowledge base results in the removal of the fact from the knowledge base. In effect, the assert command is similar to a database ADD, whereas the retract command is equivalent to a database DELETE. When a fact is asserted in the knowledge base, this signifies that the fact's predicate has the value TRUE. When the fact is retracted from the knowledge base, this signifies that the corresponding predicate has the value FALSE. This nomenclature is taken from logic programming.
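The assert and retract semantics can be sketched with a toy in-memory fact store. This is only an illustration of the nomenclature, not the deductive knowledge base of the system; the class `KnowledgeBase` and its method names are hypothetical.

```python
class KnowledgeBase:
    """Toy fact store: a fact is present exactly when its predicate is TRUE."""

    def __init__(self):
        self._facts = set()

    def assert_fact(self, predicate, *args):
        """Store the fact, comparable to adding a row to a database table."""
        self._facts.add((predicate, args))

    def retract_fact(self, predicate, *args):
        """Remove the fact, comparable to deleting a row."""
        self._facts.discard((predicate, args))

    def holds(self, predicate, *args):
        """Return True if the fact is currently asserted."""
        return (predicate, args) in self._facts


kb = KnowledgeBase()
kb.assert_fact("UserAtPosition", "device-1", 1.0, 2.0, 0.5)
was_true = kb.holds("UserAtPosition", "device-1", 1.0, 2.0, 0.5)
kb.retract_fact("UserAtPosition", "device-1", 1.0, 2.0, 0.5)
is_true_now = kb.holds("UserAtPosition", "device-1", 1.0, 2.0, 0.5)
```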

The structure of the UserAtPosition fact and the UserAtPosition predicate is identical to the one of the UserAtPosition event and therefore is depicted in Figure 2.

The Rete Manager is a distributed object that allows the Deductive Knowledge Base to be accessed through a unique address, called an IOR (Identifying object reference), without its location being known. Although there is only one Rete Manager per Deductive Knowledge Base, the Event Adaptor can connect to more than one Rete Manager and consequently more than one Deductive Knowledge Base, e.g. in order to replicate the data, load balance the Deductive Knowledge Base component, or in case of failure, to fall back to a replicated server.

This concludes the description of the system input. In the manner described above, the knowledge base stores the input that is received from the coupled devices. The interface Publish Abstract Event is discussed in Section "Rete-based Classifier".

Constructing a Trajectory

The second part of the system is the one where a trajectory is constructed from the events that have been received by the system and correspond to the device positions. A trajectory is a time series of positions of one or more devices. An example of a trajectory that corresponds to a user that sits down momentarily on a sofa, gets up again, remains standing still for a few seconds and then walks away, is portrayed in Figure 3. The coordinate system is user defined. The points in the figure represent the device positions, as those have been communicated to the said system in the form of events, as explained in Section "System Input". The dotted line connects the points.
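A minimal sketch of how per-device trajectories might be assembled from the received events is given below. It assumes that events arrive in time order, so that arrival order stands in for the time axis; the function name `build_trajectories` and the tuple layout are illustrative only.

```python
from collections import defaultdict


def build_trajectories(events):
    """Collect positions per device, preserving arrival order.

    events: iterable of (device_id, x, y, z) tuples in the order received.
    Returns a dict mapping each device_id to its list of (x, y, z) points.
    """
    trajectories = defaultdict(list)
    for device_id, x, y, z in events:
        trajectories[device_id].append((x, y, z))
    return dict(trajectories)


events = [
    ("device-1", 0.0, 0.0, 1.2),
    ("device-2", 5.0, 5.0, 1.1),
    ("device-1", 0.1, 0.0, 0.9),
    ("device-1", 0.1, 0.0, 0.6),
]
trajectories = build_trajectories(events)
```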

The architecture of this part of the system is portrayed in Figure 4. In order for the system to classify a trajectory correctly, the trajectory is cut into samples. The part of the system that does this is called a sampler. The sampler uses a window over the stream of positions that are exported by the knowledge base and creates samples of a fixed size. The size is determined by the smallest number of points that describe a trajectory that belongs to a predefined class, under the worst circumstances. This is very important because events often get lost or are filtered by the location technology, and sometimes the rate at which the location technology generates location events varies (as is the case with the Active BAT system), so samples often contain a different number of data points. This is not a problem with our system. The Adaptor part of the trajectory creation translates the data points from their current format (the knowledge base fact format) to the format needed by the classifier, which is the HTK User format [htkbook].

The functionality of the sampler is portrayed in Figure 5. In this figure, the window acts as a buffer of three data points (events). Each new event produced by the location system is tested against the three previous ones, which are buffered. Figure 5 illustrates how the movements Walking, Sit Down, Sitting and Stand Up are delimited with the help of the window.
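The windowing idea can be sketched as follows. The description does not state which test is applied to each new event, so this sketch assumes a simple one: a point counts as "moving" if it lies further than a threshold from the centroid of the three buffered points, and a new sample starts whenever that state changes. The function name `delimit_samples`, the threshold value and the test itself are assumptions for illustration.

```python
import math
from collections import deque


def delimit_samples(points, threshold=0.2, window_size=3):
    """Cut a stream of (x, y, z) points into samples at movement changes."""
    window = deque(maxlen=window_size)  # buffer of the previous points
    samples, current, current_moving = [], [], None
    for p in points:
        if len(window) == window_size:
            # Test the new point against the buffered points.
            centroid = tuple(sum(c) / window_size for c in zip(*window))
            moving = math.dist(p, centroid) > threshold
            if current_moving is not None and moving != current_moving:
                samples.append(current)  # close the sample at the boundary
                current = []
            current_moving = moving
        current.append(p)
        window.append(p)
    if current:
        samples.append(current)
    return samples


still = [(0.0, 0.0, 0.0)] * 5
walking = [(float(i), 0.0, 0.0) for i in range(1, 6)]
samples = delimit_samples(still + walking)
```

In this example the stream is split into one sample of five stationary points followed by one sample of five moving points.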

An example of the samples that are produced by the sampler is portrayed in Figure 6.

After the samples have been processed by the Adaptor, they are sent to the classification system.

Classification System

The architecture of the classification system is portrayed in Figures 8 and 12, with respect to the components of the Input System and Trajectory creation module respectively.

The classification system automatically infers high-level knowledge from trajectories such as the ones discussed in Section "Constructing a Trajectory" when it has been appropriately trained. Such a system behaves similarly to a speech recogniser that recognises words from processing speech signals. More specifically, the classifier comprises one or more HMM-based classifiers and/or one or more Rete-based classifiers.

HMM-based classifier

A more detailed view of the classification system can be seen in Figures 7 and 11. The HMM-based classifier is built using the HTK toolkit [htk]. It comprises a set of training tools (HInit, HERest), a collection of hidden Markov models (HMMs), a recognition module (HVite), an analysis module (HResults), a dictionary file and a network file.

Each HMM represents a predefined classification class. Each model is represented by means of its parameter files. Initially the models are represented by the file hmm0. Each model is trained by means of a set of (predefined) configuration files, a set of transcription files (data.mlf) and a set of data files (data.scp). The data.mlf file contains labels that link the samples that have been generated by the Trajectory Creation modules with the correct classification class, as discussed in Section "Constructing a Trajectory". Figure 6 portrays three training samples of the class "Sit Down". After the models have been trained, their parameters have been re-estimated (see Section "Hidden Markov Models") and they are represented by a new parameter file hmm2.

The file data.mlf (Figure 13) contains an entry for each sample. This entry contains the path where the sample can be found and the label that corresponds to the classification class, e.g., Sit Down. The file train_data.scp contains the path and the file names of the training files. The file test_data.scp (Figure 14) contains the path and file names of the test samples (that will be recognised by the classifier).

The dictionary and network file are part of the system configuration files. After the classification has been performed by the recogniser, the results are generated in the results.mlf file (depicted in Figure 11 as "Transcriptions"). The analysis module (HResults) analyses the classification results file from the recogniser by comparing it to a reference file in which the same samples were correctly labelled, and a recognition score is assigned to the recognition as the percentage of correct labels over the overall number of labels.
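The recognition score defined above, the percentage of correct labels over the overall number of labels, can be illustrated with a short sketch. This is a simplified per-sample comparison written for illustration and is not the HTK analysis tool; the function name `recognition_score` is hypothetical.

```python
def recognition_score(recognised, reference):
    """Percentage of samples whose recognised label matches the reference."""
    if len(recognised) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("at least one label is required")
    correct = sum(r == t for r, t in zip(recognised, reference))
    return 100.0 * correct / len(reference)


reference = ["Walking", "Still", "Sit Down", "Sitting"]
recognised = ["Walking", "Still", "Sitting", "Sitting"]
score = recognition_score(recognised, reference)
```

Here three of the four labels agree, so the score is 75 percent.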

Table 1 contains the recognition score for the movement phonemes tested for the same user that produced the training samples as well as an additional user. The recognition score is 93% of the phonemes identified correctly. If recognition is performed for a different set of phonemes, namely, only patterns of doors opening outwards as opposed to walking straight, the recognition score is 77.78%.

| Classes | Training Samples | Test Samples | Correct | Recognition score |
|---|---|---|---|---|
| Walking-Still-Sitting-Sit Down-Stand Up (same user) | 123 | 46 | 43 | 93.18% |
| Walking-Still-Sitting-Sit Down-Stand Up (two users) | 123 | 45 | 41 | 91.11% |
| User Recognition (two users) | 31 | 8 | 8 | 100% |

Table 1: Classification Score examples.

User Recognition

By means of the classification system it is also possible to recognise the user that has performed the said trajectory. The motivation behind this goal has been the observation that a user sitting down on seats with different heights causes the production of different tracks by the monitoring system. A user recognition problem, which consists of distinguishing a user by the trajectories produced while sitting down and standing up again, was implemented using the said classification system. Figures 15 and 16 portray the trajectories of two users sitting down on a sofa and getting up again that are used as training samples to a user recogniser. The recognition score for this experiment was 100% (Table 1).

Hidden Markov Models

A Hidden Markov Model (HMM) is a stochastic model where an underlying process that is not observable can be observed through another set of stochastic processes that produce the sequence of observations.

An HMM can be seen as a finite state machine that consists of N states denoted as $X = x_1, x_2, \ldots, x_N$, with the state at time t denoted as $q_t$. An HMM is characterised by the following:

* S, the number of distinct observation symbols per state, i.e., the discrete alphabet size. The observation symbols correspond to the physical output of the system being modelled. We denote the individual symbols as $V = v_1, v_2, \ldots, v_S$.

* The state transition probability distribution $A = \{a_{ij}\}$, where $a_{ij} = P[q_{t+1} = x_j \mid q_t = x_i]$, $1 \le i, j \le N$.

* The observation symbol probability distribution in state j, $B = \{b_j(k)\}$, for a fixed time t, where $b_j(k) = P[v_k \text{ at } t \mid q_t = x_j]$, $1 \le j \le N$, $1 \le k \le S$.

* The initial state distribution $\pi = \{\pi_i\}$, i.e., the probability that each state $x_i$ is the first state, where $\pi_i = P[q_1 = x_i]$, $1 \le i \le N$.

Each time that a state j is entered at time t, an observation vector $o_t$ is generated from the probability density $b_j(o_t)$. After the HMM has moved from the initial state $x_0$ to a final state $x_{T+1}$ for this sequence, a sequence of observations has been generated: $O = o_1, o_2, \ldots, o_T$, where each observation $o_t$ is one of the symbols from V and T is the number of observations in the sequence (it is assumed that states $x_0$ and $x_{T+1}$ do not produce any observations).

Figure 10 shows an example of this process, in which a six-state model moves through the state sequence X = 1, 2, 2, 3, 4, 4, 5, 6 in order to generate the sequence $o_1$ to $o_6$ (states 1 and 6 are the initial and final states and do not generate any observations).
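The generation process described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `sample_hmm`, the transition table `TRANS`, the emission table `EMIT` and the symbols "low", "mid" and "high" are assumptions made for the example and do not come from the specification. States are numbered from 0 here, so states 0 and 5 play the role of the non-emitting initial and final states.

```python
import random


def sample_hmm(trans, emit, rng=None):
    """Generate (states, observations) from a discrete HMM.

    trans[i][j] is the probability of moving from state i to state j.
    emit[j] maps each symbol to its probability in emitting state j.
    State 0 is the entry state and the last state is the exit state;
    neither produces an observation.
    """
    rng = rng or random.Random()
    n = len(trans)
    exit_state = n - 1
    state = 0
    states, observations = [state], []
    while state != exit_state:
        # Choose the next state from the transition row of the current state.
        state = rng.choices(range(n), weights=trans[state])[0]
        states.append(state)
        if state != exit_state:
            # Emitting state: draw one symbol from its output distribution.
            symbols = list(emit[state])
            weights = [emit[state][s] for s in symbols]
            observations.append(rng.choices(symbols, weights=weights)[0])
    return states, observations


# Hypothetical six-state left-to-right model (states 0 and 5 are non-emitting).
TRANS = [
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
EMIT = {j: {"low": 0.2, "mid": 0.5, "high": 0.3} for j in range(1, 5)}

if __name__ == "__main__":
    states, observations = sample_hmm(TRANS, EMIT, rng=random.Random(0))
    print(states, observations)
```

Every generated state sequence starts in the entry state and ends in the exit state, and the number of observations equals the number of emitting states visited.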

The general recognition problem can be seen as assigning an observation sequence O to the HMM that represents the hidden underlying process that generated it. This problem entails three more specific problems. The first is that, when creating an HMM for each movement phoneme, the values of the state transition probabilities a and output probabilities b of each model are not known and need to be estimated from training data. The better the estimation, the more accurate the model. The second problem arises when trying to uncover the hidden part of the model: as the process to be modelled (the movement phoneme) is unknown, the state sequence that generated an observation is not known either. The third problem is one of evaluation: how to determine, out of a set of possible models, the model most likely to have generated the observation sequence.

Bayesian Reasoning

Assume a vocabulary that consists of words $w_i$ representing the classification classes of interest. For example, suppose we want to classify a single user's trajectory into a set of predefined classes Sit Down, Stand Up, Walking, Still, Sitting, each representing a possible movement of the user. Let each movement be represented by a sequence of three-dimensional position vectors, or observations, O defined as $O = (o_1, o_2, \ldots, o_T)$, where $o_t$ is the position observed at time t (a vector of three variables x, y, z that represent the coordinates). The phoneme recognition problem can then be regarded as that of computing the model with the maximum probability of having generated the observation sequence O:

$\arg\max_i \, P(w_i \mid O)$ (1)

where $w_i$ is the i-th phoneme in the vocabulary.

This probability is not computable directly, but using Bayes' rule gives

$P(w_i \mid O) = \dfrac{P(O \mid w_i)\, P(w_i)}{P(O)}$ (2)

Equation (1) is solved using (2) if $P(O \mid w_i)$ can be estimated. Direct estimation of the joint conditional probability $P(o_1, o_2, \ldots, o_T \mid w_i)$ from examples of location samples is not practicable, given the dimensionality of the observation sequence O. However, if a parametric model of word production such as a Markov model is used, then estimation from data is possible, since the problem of estimating the class-conditional observation densities $P(O \mid w_i)$ is replaced by the mathematically much simpler problem of estimating the Markov model parameters, which entails significantly smaller computational effort. Given an HMM, the joint probability that O is generated by the model M moving through the state sequence X of Figure 10 is calculated simply as the product of the transition probabilities and the output probabilities:

$P(O, X \mid M) = a_{12}\, b_2(o_1)\, a_{22}\, b_2(o_2)\, a_{23}\, b_3(o_3) \cdots$ (3)

Given that X is unknown, the required likelihood is computed by summing over all possible state sequences $X = x(1), x(2), x(3), \ldots, x(T)$, that is

$P(O \mid M) = \sum_X a_{x(0)x(1)} \prod_{t=1}^{T} b_{x(t)}(o_t)\, a_{x(t)x(t+1)}$

where $x(0)$ is the model's entry state and $x(T+1)$ is the model exit state.

As an alternative to the summation over all state sequences, the likelihood can be approximated by considering only the most likely state sequence, that is

$\hat{P}(O \mid M) = \max_X \left\{ a_{x(0)x(1)} \prod_{t=1}^{T} b_{x(t)}(o_t)\, a_{x(t)x(t+1)} \right\}$ (4)

This assumes that the parameters a and b are known. Although this is not generally the case, HMMs allow these parameters to be estimated using training data.
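The difference between summing over all state sequences and keeping only the most likely one can be illustrated with a short dynamic-programming sketch. It is an assumption-laden example rather than the described implementation: it uses discrete output symbols instead of position vectors, and the names `path_score`, `TRANS`, `EMIT` and `OBS` are hypothetical. As in the text, the first and last states are non-emitting entry and exit states.

```python
def path_score(trans, emit, obs, combine):
    """Score an observation sequence against an HMM.

    trans[i][j] is the transition probability from state i to state j;
    state 0 is the non-emitting entry state and the last state is the
    non-emitting exit state. emit[j] maps symbols to probabilities.
    combine=sum gives the likelihood summed over all state sequences;
    combine=max gives the best-path approximation of Equation (4).
    """
    n = len(trans)
    emitting = range(1, n - 1)
    # score[j]: combined probability of the partial paths that end in state j.
    score = {j: trans[0][j] * emit[j].get(obs[0], 0.0) for j in emitting}
    for o in obs[1:]:
        score = {
            j: combine(score[i] * trans[i][j] for i in emitting)
            * emit[j].get(o, 0.0)
            for j in emitting
        }
    # Close every path with the transition into the exit state.
    return combine(score[i] * trans[i][n - 1] for i in emitting)


# Hypothetical four-state model: entry, two emitting states, exit.
TRANS = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.0],
]
EMIT = {1: {"a": 0.8, "b": 0.2}, 2: {"a": 0.1, "b": 0.9}}
OBS = ["a", "a", "b"]

if __name__ == "__main__":
    print("sum over paths:", path_score(TRANS, EMIT, OBS, sum))
    print("best path only:", path_score(TRANS, EMIT, OBS, max))
```

The best-path score can never exceed the summed likelihood, since it is one of the terms of the sum.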

This process is called training (Figure 11).

Training.

Given a set of training examples corresponding to a particular model, the parameters of that model can be determined automatically by a statistically robust and efficient re-estimation procedure (Baum-Welch re-estimation). This procedure has the following steps. A set of prototype models is created, in which the output distribution for each state j is Gaussian with mean vector $\mu_j$ and covariance matrix $\Sigma_j$, that is, $b_j(o_t)$ satisfies:

$b_j(o_t) = \mathcal{N}(o_t; \mu_j, \Sigma_j)$

If $L_j(t)$ denotes the probability of being in state j at time t, then the maximum likelihood estimates of $\mu_j$ and $\Sigma_j$ can be calculated as shown below:

$\hat{\mu}_j = \dfrac{\sum_{t=1}^{T} L_j(t)\, o_t}{\sum_{t=1}^{T} L_j(t)}$

$\hat{\Sigma}_j = \dfrac{\sum_{t=1}^{T} L_j(t)\,(o_t - \mu_j)(o_t - \mu_j)'}{\sum_{t=1}^{T} L_j(t)}$

where prime denotes vector transpose. To apply the above equations, the probability $L_j(t)$ must be calculated. This is done efficiently using the Forward-Backward algorithm. Executing the above produces a set of models that are optimised according to the training data.
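The two re-estimation formulas amount to a weighted mean and a weighted covariance. The sketch below shows this for a single state, assuming the state occupation probabilities are already available (computing them with the Forward-Backward algorithm is not shown). The function name `reestimate_gaussian` and the sample data are hypothetical, and the sketch uses the freshly estimated mean inside the covariance, which is one common reading of the formula.

```python
def reestimate_gaussian(observations, occupancy):
    """Weighted maximum likelihood estimates for one HMM state.

    observations: list of T observation vectors (each a list of d floats)
    occupancy:    list of T state occupation probabilities for this state
    Returns (mean, cov) with mean of length d and cov as a d x d list.
    """
    d = len(observations[0])
    total = sum(occupancy)
    # Weighted mean: sum_t L(t) * o_t / sum_t L(t)
    mean = [
        sum(w * o[k] for w, o in zip(occupancy, observations)) / total
        for k in range(d)
    ]
    # Weighted covariance: sum_t L(t) * (o_t - mean)(o_t - mean)' / sum_t L(t)
    cov = [
        [
            sum(
                w * (o[r] - mean[r]) * (o[c] - mean[c])
                for w, o in zip(occupancy, observations)
            )
            / total
            for c in range(d)
        ]
        for r in range(d)
    ]
    return mean, cov


if __name__ == "__main__":
    # Hypothetical (x, y, z) positions and occupation probabilities.
    samples = [[0.0, 0.0, 1.2], [0.1, 0.0, 0.9], [0.1, 0.1, 0.5]]
    weights = [0.9, 0.6, 0.1]
    print(reestimate_gaussian(samples, weights))
```

Observations with an occupation probability of zero contribute nothing to either estimate, which is how each state ends up modelling only the part of the trajectory it is responsible for.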

Recognition

Recognition of an unknown data sample of size s is based on building a Hidden Markov Model network and finding a path of size s that has the maximum likelihood (Viterbi algorithm). That path identifies the HMM that corresponds to the correct phoneme. The model with the highest maximum likelihood is selected for each observation sequence under consideration (Token Passing Algorithm).
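Model selection of this kind can be sketched as follows. The example scores one observation sequence against each candidate model with a log-domain Viterbi score and returns the class whose score, plus the log prior from Bayes' rule, is highest. It is a simplified sketch and not the Token Passing implementation: the models are evaluated one at a time rather than as a single network, observations are quantised symbols rather than position vectors, and the names `viterbi_log_score`, `recognise`, `MODELS` and `PRIORS` as well as all probabilities are hypothetical.

```python
import math


def _log(p):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log_score(model, obs):
    """Log probability of the most likely state path through one model.

    model["trans"][i][j] is a transition probability; state 0 is the
    non-emitting entry state and the last state is the non-emitting exit.
    model["emit"][j] maps symbols to probabilities for emitting state j.
    """
    trans, emit = model["trans"], model["emit"]
    n = len(trans)
    emitting = range(1, n - 1)
    score = {
        j: _log(trans[0][j]) + _log(emit[j].get(obs[0], 0.0)) for j in emitting
    }
    for o in obs[1:]:
        score = {
            j: max(score[i] + _log(trans[i][j]) for i in emitting)
            + _log(emit[j].get(o, 0.0))
            for j in emitting
        }
    return max(score[i] + _log(trans[i][n - 1]) for i in emitting)


def recognise(models, priors, obs):
    """Return the class name maximising log P(O | w) + log P(w)."""
    return max(
        models,
        key=lambda name: viterbi_log_score(models[name], obs)
        + math.log(priors[name]),
    )


# Hypothetical left-to-right topology shared by both models.
_TRANS = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.0, 0.0],
]
MODELS = {
    "Sit Down": {
        "trans": _TRANS,
        "emit": {1: {"high": 0.8, "mid": 0.2}, 2: {"mid": 0.2, "low": 0.8}},
    },
    "Stand Up": {
        "trans": _TRANS,
        "emit": {1: {"low": 0.8, "mid": 0.2}, 2: {"mid": 0.2, "high": 0.8}},
    },
}
PRIORS = {"Sit Down": 0.5, "Stand Up": 0.5}

if __name__ == "__main__":
    print(recognise(MODELS, PRIORS, ["high", "high", "mid", "low"]))
```

Working in the log domain avoids numerical underflow on long trajectories, since products of many small probabilities become sums.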

Rete-based classifier

AESL input

The AESL input may comprise a separate input to the classification system, based on Rete Networks. Preferably this input comprises a language based on first-order logic called AESL (Abstract Event Specification Language) that allows Rete Networks to be compiled from the specification. Although a hidden Markov model based classifier generally classifies into predefined classes, for the Rete-based classifier the classifiers are preferably generated in real time by passing the AESL definition through the AESL input. It is also possible to use hidden Markov models in this way, i.e. not (only) to have predefined classes but also to provide a facility for giving a command to the system that "explains" (defines) how to create and train one (or more) of these models from scratch.

Most distributed systems research assumes that events are primitive, and various studies have, therefore, concentrated on composite events. However event-based systems, such as those using finite state machines, are insufficient for querying and subscribing transparently to distributed state [katsiri04]. This is due to the fact that the mapping between the subscription language and the implementation domain is incomplete, which makes computation by finite automata limited. This necessitates an alternative model for sensor-driven communication systems.

This section utilises the notion of an abstract event [katsiri04] as a notification of transparent changes in distributed state. This is implemented as an extension to the publish/subscribe protocol in which a higher-order service (Abstract Event Detection Service) publishes its interface; this service takes an abstract event specification as an argument and in return publishes an interface to a further service (an abstract event detector or Rete-based classifier), which notifies subscribers of transitions between the values true and false of the formula, thus providing a natural interface to applications.

The AESL input allows Rete-based classifiers to be generated at run time from a specification. This means that, although hidden Markov model based classification classifies into predefined classes, Rete-based classifiers are generated in real time by passing the AESL definition through the AESL input.

The AESL input can be linked to the HMM-based classifiers with minor alterations to the system.

Rete-based classifiers comprise one or more Rete Networks. A Rete Network that classifies rooms to the class "Closest empty location to each user" is portrayed in Figure 9. A Rete-based classifier is connected to the said system input (Figure 1) and more specifically it is connected to the Rete Manager module. It has a second input (AESL input) where the classification class is provided by the user or another classification system (lower in the hierarchy) in the form of a first-order logic definition. The AESL parser (depicted in Figure 17 as "Parser") translates the AESL definition into a set of production system rules, using the production system languages Jess, Drools or CLIPS.

The production system rules are fed via the Rete Manager to the knowledge base module where they are compiled by the knowledge base interpreter into one or more Rete Networks that perform the classification. Each Rete network generates a token for each successful classification and this is published as an event by the Rete Manager through the PublishAbstractEvent () interface. Because the abstract event represents the result of the classification it can be fed through the System Input to another classification system thus realising hierarchical classification.

An AESL definition for locating the closest empty location to each user follows, using the abbreviations UL for UserInLocation, AL for AtomicLocation, EL for EmptyLocation, CL for ClosestLocation, CEL for ClosestEmptyLocation and D for Distance. (It is assumed that the UserInLocation predicates are inferred by the system automatically from the UserAtPosition predicates, and that the Distance predicates are constructed by the system from the UserAtPosition predicates (device positions).)

$\neg \exists u \; UL(u, rid, role, rattr) \wedge AL(rid, rattr, polygon) \Rightarrow EL(rid, rattr)$

$D(v_2, u, role, rid_2, rattr_2) > D(v_1, u, role, rid_1, rattr_1) \Rightarrow CL(u, role, rid_1, rattr_1)$ (6)

$CL(u, role, rid, rattr) \wedge EL(rid, rattr) \Rightarrow CEL(u, role, rid, rattr)$

Temporal Reasoning

It is also possible to reason with the temporal properties of the events. In this case, each device is required to attach a timestamp to each position event. Rete-based classifiers work with time at two levels: they know which predicates hold "now", and they also know the timestamps and the local clock. This allows them to take decisions about temporal properties, e.g., "Locate the closest empty Meeting Room to each user which has been empty for at least 5 min."

Rete Networks

Each AESL definition is compiled into one or more Rete networks [forgy82] that are structured as a deductive knowledge base and that can perform semantic operations on instances of first-order logic (FOL) predicates defined in terms of FOL formulae. Rete networks consist of nodes and arcs. Every time a sensor creates a new instance of a concrete state predicate, corresponding tokens are created and propagated through the arcs to the nodes, eventually modifying the value of the abstract predicate appropriately.
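To make the meaning of the AESL definition of ClosestEmptyLocation concrete, the three rules can be read as set operations over fact tuples. The sketch below evaluates them directly in Python, without a Rete network and without a production system such as Jess, Drools or CLIPS. The tuple layouts, the function names and the sample facts are assumptions made for illustration. Read literally, the third rule makes CEL hold only when a user's closest location is itself empty, and the sketch follows that reading.

```python
def empty_locations(user_in_location, atomic_locations):
    """EL(rid, rattr): atomic locations for which no UL fact exists."""
    occupied = {(rid, rattr) for (_u, _role, rid, rattr) in user_in_location}
    return {
        (rid, rattr)
        for (rid, rattr, _polygon) in atomic_locations
        if (rid, rattr) not in occupied
    }


def closest_locations(distances):
    """CL(u, role, rid, rattr): per user, the location with the smallest distance.

    Ties are broken arbitrarily in this sketch.
    """
    best = {}
    for (v, u, role, rid, rattr) in distances:
        if (u, role) not in best or v < best[(u, role)][0]:
            best[(u, role)] = (v, rid, rattr)
    return {
        (u, role, rid, rattr) for (u, role), (_v, rid, rattr) in best.items()
    }


def closest_empty_locations(user_in_location, atomic_locations, distances):
    """CEL(u, role, rid, rattr): CL facts whose location is also in EL."""
    empty = empty_locations(user_in_location, atomic_locations)
    return {
        (u, role, rid, rattr)
        for (u, role, rid, rattr) in closest_locations(distances)
        if (rid, rattr) in empty
    }


# Hypothetical facts: UL(u, role, rid, rattr), AL(rid, rattr, polygon),
# D(v, u, role, rid, rattr) with v the distance value.
UL = {("ek236", "Sysadm", "R1", "Office")}
AL = {
    ("R1", "Office", "poly1"),
    ("R2", "Meeting Room", "poly2"),
    ("R3", "Meeting Room", "poly3"),
}
D = {
    (4.0, "ek236", "Sysadm", "R2", "Meeting Room"),
    (9.0, "ek236", "Sysadm", "R3", "Meeting Room"),
}

if __name__ == "__main__":
    print(closest_empty_locations(UL, AL, D))
```

A Rete network computes the same result incrementally, updating only the parts affected by each new fact instead of re-evaluating every rule over all facts as this sketch does.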

Node Types

This section outlines the types of nodes that are found in Rete Networks.

One-input nodes check whether the received tokens correspond to a particular condition, e.g., if they are of class UserAtPosition. These nodes are portrayed in red in Figure 9.

One-input nodes also check whether a particular value is assigned to an attribute, thus restricting the selection further, e.g., selecting from the UserAtPosition tokens only those that refer to user "ek236". Such nodes are portrayed in brown (although they are not depicted in Figure 9, as the AESL definition (6) does not have any attributes bound to specific values; instead, they form part of the filters, see Figure 18). This allows filtering to take place after the Rete Network has been built, which avoids duplicating computation (see [katsiri05] Chapter 12). Each one-input node forwards the tokens that satisfy the check on to its child nodes.

Two-input nodes represent conjunctions. They concatenate the tokens that are stored in their right and left memory, and they perform a test to determine whether shared variables are bound correctly. Such nodes are portrayed in green and are labelled "AND".

Store nodes act as buffers for the current and historical instances of a predicate type and forward all stored instances on to the child nodes. This allows for temporal reasoning as they store historic instances of the same predicate.

Trigger-Query (TQ) nodes are nodes that trigger a CLIPS query that selects all instances of a particular predicate from the knowledge base for each token that is received at that node. Each TQ node is portrayed as a pair of identical nodes connected with a curvy line. TQ nodes are integral in Rete Networks that implement functions such as those that calculate the maximum or minimum value of an attribute of all stored instances of a predicate. They are often used here for calculating the location with the smallest distance to one of the users. Each of the two nodes that form a TQ node is labelled "TQ (predicate)".

NOT nodes are satisfied when there is no token in their right memory. They are two-input nodes that use a special, auxiliary token "none" in their left memory. They are portrayed in yellow.

Test nodes perform a mathematical or logical operation such as equality or inequality on the values of the attributes of the tokens they receive. Test nodes that implement temporal reasoning are marked as TEMP.

Finally, &P nodes are final nodes. When a token is forwarded to the final node, an instance of the abstract predicate that is being defined is created or deleted accordingly and an "activation" or "de-activation" abstract event is triggered, respectively. Each match in the network of Figure 9 will cause the detection of the following abstract event:

ClosestEmptyLocation (uid, role, rid, rattr, activation, timestamp)

Each time an instance of the abstract predicate ClosestEmptyLocation (uid, role, rid, rattr) that was previously true is evaluated to false, the following event will be detected:

ClosestEmptyLocation (uid, role, rid, rattr, de-activation, timestamp)

Filters

A filter is implemented as an abstract event (AE) detector with linear complexity. Filters are combined whenever there is a shared condition. For example, the filter of (5) can be combined with the filter of (6) as shown in Figure 18.

$rattr = \text{Meeting Room} \wedge role = \text{Sysadm}$ (5)

$rattr = \text{Meeting Room} \wedge role = \text{Ceo}$ (6)

The Rete Algorithm

The Rete algorithm [forgy79, forgy82] is a powerful and efficient algorithm for pattern matching, as it exploits properties such as temporal redundancy in the working memory and structural similarity in the production memory in order to avoid examining the whole of these memories in each cycle. Temporal redundancy expresses the fact that not all elements in working memory change in each cycle and that any production that becomes instantiated was close to being instantiated in the previous cycle. Structural similarity expresses the fact that many productions have many conditional elements in common in their left-hand side (LHS). Based on these two properties, the Rete algorithm, instead of examining the working memory directly, monitors the changes made to working memory and maintains internal information that is equivalent to the working memory.

At the beginning of a cycle the match routine computes whether any changes need to be made to the conflict set. If there are changes to be made, it sends these changes to the interpreter where the conflict set is being maintained.

The interpreter consists of a fixed part that deals with the conflict set and a variable part that is generated by the compilation of the LHS of the productions into Rete networks.

These networks perform the actual match. Structural similarity is achieved in them by combining nodes that test for the same LHS conditional elements.

For each working memory element, a token is created for the pair of the working memory element and a tag. The tag is used in order to determine whether the working memory element is added or removed from the working memory. Tokens are processed by the nodes in the network in order to determine whether the overall pattern is matched or not.
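The token mechanism can be illustrated with a very small network: one-input tests that route facts by predicate class, a two-input (AND) node with left and right memories, and a final node that turns matches into "activation" and "de-activation" events. This is a minimal sketch under stated assumptions and not Forgy's full algorithm or the described implementation: the class names `JoinNode` and `Network`, the "+" and "-" tags, and the MeetingRoom predicate are hypothetical, and features such as NOT nodes, node sharing and the conflict set are omitted.

```python
class JoinNode:
    """Two-input (AND) node with a left and a right memory."""

    def __init__(self, left_var, right_var, child):
        self.left_var = left_var      # index of the shared variable in left facts
        self.right_var = right_var    # index of the shared variable in right facts
        self.left_memory = set()
        self.right_memory = set()
        self.child = child            # callable(tag, left_fact, right_fact)

    @staticmethod
    def _store(memory, tag, fact):
        """Apply an add ('+') or remove ('-') token; report whether memory changed."""
        if tag == "+" and fact not in memory:
            memory.add(fact)
            return True
        if tag == "-" and fact in memory:
            memory.remove(fact)
            return True
        return False

    def left(self, tag, fact):
        if self._store(self.left_memory, tag, fact):
            for other in sorted(self.right_memory):
                # Test that the shared variable is bound consistently.
                if fact[self.left_var] == other[self.right_var]:
                    self.child(tag, fact, other)

    def right(self, tag, fact):
        if self._store(self.right_memory, tag, fact):
            for other in sorted(self.left_memory):
                if other[self.left_var] == fact[self.right_var]:
                    self.child(tag, other, fact)


class Network:
    """UserInLocation(uid, rid) AND MeetingRoom(rid), joined on rid."""

    def __init__(self):
        self.events = []
        self.join = JoinNode(left_var=2, right_var=1, child=self._terminal)

    def _terminal(self, tag, user_fact, _room_fact):
        # Final node: publish an abstract event for each match or un-match.
        kind = "activation" if tag == "+" else "de-activation"
        self.events.append((kind, user_fact[1], user_fact[2]))

    def push(self, tag, fact):
        # One-input nodes: route each token by its predicate class.
        if fact[0] == "UserInLocation":
            self.join.left(tag, fact)
        elif fact[0] == "MeetingRoom":
            self.join.right(tag, fact)


if __name__ == "__main__":
    net = Network()
    net.push("+", ("MeetingRoom", "R2"))
    net.push("+", ("UserInLocation", "ek236", "R2"))
    net.push("-", ("UserInLocation", "ek236", "R2"))
    print(net.events)
```

Because each node keeps its own memory, a new token is compared only with the facts already stored at the join, which is how the network avoids re-examining the whole working memory in every cycle.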

Distributed Classification

In some preferred embodiments the classification system is distributed across a plurality of servers in a tree structure. Preferably a hierarchy of servers is present, so that, for example, a server at a lower layer need only pass information relating to a change in the data used for classifying up the hierarchy to a higher level. Preferably means are provided for coordinating the classification system across these servers, for example using a distributed object structure such as a web service implementation. This provides a scalable architecture which is useful in the context of managing the large volumes of data which may be encountered, for example, in a mobile phone network.

All components in the system are implemented as distributed objects using the CORBA or Web Services distributed object technologies. It is possible to use other technologies as well wherever needed. The system knows of a set of predefined low-level predicates that are derived from the coupled device without any logical inference (Rete-based classifier) or classification (HMM-based classifier). These predicates are processed by the Input System at the same rate as their input rate and are fed to the classifiers accordingly. The result of the classification can be fed to a higher-level classifier in the form of an event through the interface PublishAbstractEvent () in Figure 1. For HMM-based classifiers, the new event is fed to the part "Trajectory Creation" where it can be sampled and adapted to be fed to the classifier. For Rete-based classification it is fed to the Event Listener of the AESL Input System (Figure 17).

A tree-like distributed classification for communication systems such as the GSM network (Figure 20) is depicted in Figure 21. However the system is not restricted to GSM networks. A tree-like distributed classification for a communication system with a complex architecture, such as the internet architecture, including peer-to-peer communications between nodes, can be implemented using our system on top of a distributed hash table with an appropriate protocol for optimising the placement of the classifiers as close as possible to the devices.

Scalability

In addition to the hierarchical structure described above, Rete-based classification itself is structured in a relatively unusual way, namely so as to allow applications to register inference rules that generate abstract knowledge from low-level, sensor-derived knowledge. Scalability is achieved by maintaining a dual-layer knowledge representation mechanism for each Rete-based classifier that functions in a similar way to a two-level cache. The lower layer maintains knowledge about the current state of the communication system at device level by continually processing a high rate of events produced by the coupled devices, e.g., it knows the position of a device in space in terms of its coordinates (x, y, z). The higher layer maintains easily retrievable, user-defined, abstract knowledge about current and historical states of the communications environment along with temporal properties such as the time of occurrence and their duration. Such abstract knowledge has the property that it is updated much less frequently than knowledge in the lower layer, namely, only when certain threshold events happen. Knowledge is retrieved mainly by accessing the higher layer, which entails a significantly lower computational cost than accessing the lower layer, thus maintaining the overall system scalability.

Figure 19 depicts the architecture of a dual-layer knowledge base. The lower layer is depicted as "Sensor Abstract Layer (SAL)" and the higher layer as "Deductive Abstract Layer (DAL)". The lower-level predicates p1 and p2 are monitored at the same rate as their arrival rate. The higher-layer predicates P1 and P2 are abstract events specified by an AESL definition (see the section "Rete-based classifier") and only change according to the definition, at a lower rate than p1 and p2. See [katsiri03] for details.
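The dual-layer idea can be sketched as a small class in which every position event updates the lower layer, while the higher layer changes only when a device crosses into a different region. The sketch is illustrative only: the class name `DualLayerKnowledgeBase`, the use of axis-aligned rectangles for regions and the sample coordinates are assumptions, and the abstract predicate is hard-coded here rather than compiled from an AESL definition.

```python
class DualLayerKnowledgeBase:
    """Lower layer: raw positions. Higher layer: region membership."""

    def __init__(self, regions):
        # regions: name -> (xmin, xmax, ymin, ymax); rectangles are an assumption.
        self.regions = regions
        self.positions = {}        # lower layer, updated on every event
        self.in_region = {}        # higher layer, updated only on changes
        self.abstract_events = []  # published when the higher layer changes

    def _locate(self, x, y):
        for name, (xmin, xmax, ymin, ymax) in self.regions.items():
            if xmin <= x < xmax and ymin <= y < ymax:
                return name
        return None

    def on_position(self, device, x, y, z, timestamp):
        # Lower layer keeps pace with the sensor event rate.
        self.positions[device] = (x, y, z, timestamp)
        region = self._locate(x, y)
        # Higher layer changes only on a threshold event (a region change).
        if self.in_region.get(device) != region:
            self.in_region[device] = region
            self.abstract_events.append((device, region, timestamp))


if __name__ == "__main__":
    kb = DualLayerKnowledgeBase(
        {"Office": (0, 5, 0, 5), "Meeting Room": (5, 10, 0, 5)}
    )
    for t, x in enumerate([1.0, 2.0, 3.0, 6.0, 7.0]):
        kb.on_position("bat-1", x, 1.0, 1.2, t)
    print(kb.abstract_events)
```

Queries such as "which room is this user in" read only the higher layer, so their cost does not grow with the rate of raw position events.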

Collective Behaviour Modelling

Collective behaviour can be recognised and classified by means of both the HMM-based classifiers and the Rete-based classifiers. For example, the system can be used to classify samples consisting of positions derived from a plurality of devices, which characterise the collective physical motion of those devices (attached to humans). Collective behaviour recognition can be used to classify the features of a vehicle transportation model or a model for crowd management in public areas. In the former case, the model can be used to recognise the large-scale behaviour of people in cars, moving over a network of roads and railways. The system classifies samples taken from devices that are located within a single cell or adjacent cells (in a GSM network, see Figure 20) into classification features/classes such as "traffic jam", "normal traffic", "train", "bus", "destination" etc. This way of collecting samples ensures the locality of the devices that form a pattern, but also ensures that patterns that exist at the boundaries of communication cells can be captured, e.g., a train that is crossing from one cell to another.

The crowd management model can be used to describe the collective behaviour of a crowd of people moving inside a public area such as Paddington station in London. Samples taken as described above correspond to the classification classes "metro exit", "gate", "train platform", "congestion", "shop", "meeting point", "walking", "running" etc. Effectors that are triggered as a result of successful classification may be used, for example, to reconfigure the network in order to better cope with predicted or actual load, for example to increase the coverage in a region where many users are present or where, for example based upon predicted motion, many users or an increased number of users are predicted to be present. Methods for reconfiguring a network may include reallocation of base stations and/or other techniques, such as network bandwidth control and/or cell size adjustment.

Stock Management

In preferred embodiments of the system, it is possible to compose, deploy and manage algorithmic trading strategies, such as VWAP, Spread Trading and Index Arbitrage, by means of the Rete-based and HMM-based classifiers. Using the AESL input it is possible to reason transparently with distributed state, leading to the composition of strategies that are portable and can be applied to multiple streams of real stock data in a "plug-and-play" manner. The AESL input allows the composition of strategies of an expressiveness that is not possible with existing technologies that are based on finite-state machines, including negation (see Section "AESL input"). It is also possible to calculate the risk associated with derivative modelling in real time, which is infeasible with current technologies.

Packet Data Communications

Another application of the above described techniques involves identifying potential security violations in a packet data communications network. Known techniques typically rely upon determining a data rate (packets/second) but embodiments of the method we describe do not need this information. Instead, in some preferred embodiments, putative invariant features are sought. For example, in order to detect an intrusion it is often necessary to detect data packet streams of the same length. This can be implemented efficiently using the classification system and the AESL input in order to provide a language for driving the classification.
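As a simple illustration of searching for an invariant feature, the sketch below groups captured packets into per-source sequences and flags any source that sends a run of packets of identical size. It is a plain rule-style check standing in for the classifier, not the HMM-based or Rete-based implementation. The tuple layout (identification, size, source address, source port, time), the function name `same_length_runs` and the threshold `min_run` are assumptions made for the example.

```python
from collections import defaultdict


def same_length_runs(packets, min_run=3):
    """Flag sources that send at least min_run consecutive equal-sized packets.

    packets: iterable of (packet_id, size, source_address, source_port, time)
    Returns a dict mapping (source_address, source_port) to the repeated size.
    """
    # Group packet sizes into one time-ordered sequence per source.
    by_source = defaultdict(list)
    for _pid, size, addr, port, _time in sorted(packets, key=lambda p: p[4]):
        by_source[(addr, port)].append(size)

    flagged = {}
    for source, sizes in by_source.items():
        run = 1
        for prev, cur in zip(sizes, sizes[1:]):
            # Extend the run when the size repeats, otherwise start again.
            run = run + 1 if cur == prev else 1
            if run >= min_run:
                flagged[source] = cur
                break
    return flagged


if __name__ == "__main__":
    # Hypothetical captured packets.
    capture = [
        (1, 64, "10.0.0.1", 80, 0.0),
        (2, 64, "10.0.0.1", 80, 0.1),
        (3, 64, "10.0.0.1", 80, 0.2),
        (4, 100, "10.0.0.2", 22, 0.0),
        (5, 200, "10.0.0.2", 22, 0.3),
    ]
    print(same_length_runs(capture))
```

Note that the check uses packet times only to order the packets within each source, so no data rate needs to be computed.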

References

[clips] CLIPS. A Tool for Designing Expert Systems. http://www.ghg.net/clips/CLIPS.html

[dana98] P. Dana. Global Positioning System Overview.

[fleming95] R. Fleming and Kushner. Low-Power, Miniature, Distributed Position Location and Communication Devices Using Ultra-Wideband, Non-Sinusoidal Communication Technology. Technical Report, Aether Wire Location, 1995.

[forgy79] C. L. Forgy. On the Efficient Implementation of Production Systems. PhD Thesis, Carnegie-Mellon University, 1979.

[forgy82] C. L. Forgy. Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern-Match Problem.

[harter99] A. Harter, A. Hopper, P. Steggles, A. Ward and P. Webster. The Anatomy of a Context-Aware Application. In MobiCom '99: Proceedings of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, pages 59-68, Seattle, WA, Aug. 1999. ACM Press.

[htk] http://htk.eng.cam.ac.uk/

[htkbook] S. Young et al. The HTK Book. http://labrosa.ee.columbia.edu/doc/HTKBOOk21/HTKBook.html

[jess] Jess: The Rule Engine for the Java Platform. http://herzberg.ca.sandia.gov/jess

[katsiri03] E. Katsiri and A. Mycroft. Knowledge-Representation and Abstract Reasoning for Sentient Computing. In Proceedings of the First Workshop on Challenges and Novel Applications of Automated Reasoning, in conjunction with CADE-19, pages 73-82, Miami Beach, FL.

[katsiri04] E. Katsiri, J. Bacon and A. Mycroft. An Extended Publish/Subscribe Protocol for Transparent Subscriptions to Distributed Abstract State in Sensor Driven Systems using Abstract Events. In Proc. International Workshop on Distributed Event-Based Systems, May 24-25, Edinburgh, UK.

[katsiri05] Middleware Support for Context-Awareness in Distributed Sensor-Driven Systems. PhD Thesis, University of Cambridge.

[priyantha00] N. B. Priyantha, A. Chakraborty and H. Balakrishnan. The Cricket Location-Support System. In MobiCom '00: Proceedings of the 6th Annual International Conference on Mobile Computing and Networking, pages 32-43, Boston, MA, Aug. 2000.

[sourceo2] Source O2: http://www.sourceo2.com

No doubt many other effective alternatives will occur to the skilled person. It will be understood that the invention is not limited to the described embodiments and encompasses modifications apparent to those skilled in the art lying within the spirit and scope of the claims appended hereto.

Claims

CLAIMS:

1. A user activity monitoring system for a digital mobile communications network, the system comprising: an input for receiving spatial position information from a plurality of mobile communications devices coupled to said network; a module for constructing a trajectory for each of said devices, a trajectory of a device comprising a time series of positions of the device; and a classification system configured to classify said trajectories into selected classes of a predetermined set of classes using a plurality of hidden Markov models or Rete networks to provide a classification data output responsive to said trajectory classification.

2. A system as claimed in claim 1 wherein said classification system is configured to select a said class for a trajectory responsive to probability data from said plurality of hidden Markov models or Rete networks for the trajectory.

3. A system as claimed in claim 2 wherein said classification comprises Bayesian classification.

4. A system as claimed in claim 1, 2 or 3 wherein at least one said hidden Markov model or Rete network is responsive to a plurality of said trajectories from a plurality of said devices, whereby said at least one hidden Markov model or Rete network represents collective behaviour of a plurality of said devices, and wherein said classification data includes at least one said collective behaviour classification.

5. A system as claimed in any one of claims 1 to 4 wherein said trajectory constructing module comprises a system to link spatial position information received from a single said device at a plurality of different elements of said network.

6. A system as claimed in any one of claims 1 to 5 wherein said classification system is distributed across a plurality of servers in a tree structure, the system further comprising means to coordinate the classification system across said servers.

7. A system as claimed in any one of claims 1 to 6 further comprising means for re-configuring said network responsive to said classification data.

8. A system as claimed in any one of claims 1 to 7 further comprising a training module for training said classification system hidden Markov models or Rete networks responsive to historical data from said user activity monitoring.

9. A method of monitoring user activity in a digital mobile communications network, the method comprising: receiving spatial position information from a plurality of mobile communications devices coupled to said network; constructing a trajectory for each of said devices, a trajectory of a device comprising a time series of positions of the device; and classifying said trajectories into selected classes of a predetermined set of classes using a plurality of hidden Markov models or Rete networks to provide a classification data output responsive to said trajectory classification.

10. A method as claimed in claim 9 wherein at least one said hidden Markov model or Rete network is responsive to a plurality of said trajectories from a plurality of said devices, whereby said at least one hidden Markov model or Rete network represents collective behaviour of a plurality of said devices, and wherein said classification data includes at least one said collective behaviour classification.

11. A method as claimed in claim 10 wherein said collective behaviour classification comprises a classification defining a collective physical motion state of said plurality of devices.

12. A method of user activity monitoring, the method comprising: inputting spatial position data for at least one user representing activity of said user; constructing a space-time trajectory for said user; and classifying said space-time trajectory into one of a plurality of predetermined classes using a plurality of hidden Markov models or Rete networks.

13. A method of classifying user activity as claimed in claim 12, wherein said classifying further comprises identifying said user.

14. A method of classifying user activity as claimed in claims 12 and 13 further comprising updating said models using a result of said user activity monitoring.

15. A method of detecting a potential security violation in a packet data communications network, the method comprising: capturing data from said network relating to data packets carried by the network; representing said captured data as tuples, each said tuple comprising a set of data items relating to a captured packet, said data items being selected from the group consisting of packet identification data, packet size, packet source address, packet source port, and packet time; grouping said tuples into sets of tuples each set representing a trajectory of said grouped tuples; and classifying said tuple trajectories using a plurality of hidden Markov models or Rete networks to identify a trajectory defining a potential security violation of said network.

16. A method of identifying a potentially valuable stock share or other financial instrument, the method comprising: capturing data relating to stocks, shares or other financial instruments; representing said captured data as tuples, each said tuple comprising a set of parameters relating to said stocks, shares or other financial instruments; grouping said tuples into sets of tuples each set representing a trajectory of said grouped tuples; and classifying said tuple trajectories using a plurality of hidden Markov models or Rete networks to identify a potentially valuable stock, share or other financial instrument.

17. A carrier carrying computer program code to, when running, implement the method of any one of claims 9 to 16.

18. A computer system comprising means for implementing the method of any one of claims 9 to 16.