GB2430830A - Image sequence movement analysis system using object model, likelihood sampling and scoring - Google Patents


Info

Publication number
GB2430830A
GB2430830A GB0619126A
Authority
GB
United Kingdom
Prior art keywords
model
sequence
image
mask
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0619126A
Other versions
GB0619126D0 (en)
Inventor
Stephen McKenna
Gordon McAllister
Timothy Roberts
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Dundee
Original Assignee
University of Dundee
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Dundee filed Critical University of Dundee
Publication of GB0619126D0
Publication of GB2430830A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/14Picture signal circuitry for video frequency region
    • H04N5/144Movement detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30221Sports video; Sports image

Abstract

A system for movement analysis suitable for use in the analysis of human movement, particularly in sports such as golf, tennis, squash, baseball or cricket. The system has an image analysis means for receiving a sequence of images and creating an object model (e.g. head part 13a-13c) of an object to be analysed; there is also a likelihood scoring module and sampling means, which samples and scores the image data to extract movement information from images in the sequence, analysing object movement. 2D probabilistic masks 23-29 may be used, for example based on image membership of foreground or background regions. Sports performance information can be extracted from the sequence of images in a video, for example allowing communication of expert advice as to how the object (human) motion differs from an ideal, preferred motion, e.g. in golf swing analysis.

Description

System for Movement Analysis

The present invention relates to a system for movement analysis suitable for use in the analysis of human movement, particularly in sports such as golf, tennis, squash, baseball or cricket.
Mass participation sports such as golf and tennis require the player to learn and develop specific techniques and body movements to allow them to play these games properly and successfully. Most of the time, players rely upon their own ability to analyse their performance and the comments of fellow participants to improve their technique. Whilst it is possible to obtain lessons from, for example, golf professionals or tennis coaches, these lessons can be expensive and usually involve the coach providing comments upon the player's technique, without the use of video or the like. Where video is available, the cost to the player usually increases further as the motion captured on the video tape requires expert analysis.
In view of these limitations, it would be desirable to create an inexpensive system in which the position of an object such as the human body and sports equipment might be identifiable in order to allow analysis of body movement.
A technique for determining object pose from images is the subject of International patent application PCT/GB2004/001545. The technique is also described in T.J. Roberts, Efficient Human Pose Estimation from Real World Images, PhD thesis, University of Dundee, February 2005. Golf swing tracking has been investigated in the literature (N. Gehrig, V. Lepetit, and P. Fua, Visual golf club tracking for enhanced swing analysis, in British Machine Vision Conference, September 2003). R. Urtasun and P. Fua, 3D human body tracking using deterministic temporal motion models, in European Conference on Computer Vision, Prague, Czech Republic, May 2004, describes the use of a Principal Component Analysis (PCA) motion model for tracking through walking sequences; in this case the model is fitted deterministically, rather than through stochastic search.
In addition, systems are known that allow a user to obtain an analysis of their golf swing by sending a video of their swing over the internet to a golf professional. However, such a system merely provides a convenient way of communicating video data and relies solely on the expertise of the golf professional to analyse the player's technique, without providing any analysis of the golf swing.
It is an aim of the present invention to provide an improved system of movement analysis suitable for use in sports training.
In accordance with a first aspect of the invention there is provided a system for analysing the movement of an object, the system comprising:
image capture means adapted to capture a sequence of images of an object; and
image analysis means comprising object model creation means for creating an object model of an object to be analysed, a likelihood scoring module and sampling means, wherein the image analysis means extracts movement information from images in the sequence to analyse object motion.
Optionally, the image capture means and the image analysis means are remotely connected.

Optionally, the image capture means and the image analysis means are connected via the internet.

Optionally, the image analysis means is located on a central server.
In accordance with a second aspect of the invention there is provided a system for analysing the movement of an object, the system comprising:
image analysis means adapted to receive a sequence of images, the image analysis means comprising object model creation means for creating an object model of an object to be analysed, a likelihood scoring module and sampling means, wherein the image analysis means extracts movement information from images in the sequence to analyse object motion.
Preferably, the analysis of object motion comprises comparing the object motion to an idealised or preferred object motion, and identifying differences.
These differences may be communicated to the user as expert advice on the manner in which the object movement differs from an idealised or preferred movement. Advantageously, this can be achieved without expert comment.
Preferably, the object model represents the visual appearance of a part using a set of 2D probabilistic masks.
Preferably, each mask represents a distinct visual region that a point in the part model may belong to. Optionally, this region is the foreground region.
Preferably, a mask value represents the probability of a point on the mask belonging to this class.
Preferably, the object model is associated with a layered 2D transform which is used to project the set of masks into the image whilst taking into account self-occlusion of the object model.
Preferably, the transform defines a one-to-one mapping between points on the mask and pixels in the image.
Optionally, each part is modelled using two masks, representing membership of foreground and background regions.
Preferably, the sampling means generates object pose configurations that can be scored by the likelihood module.

Preferably, the sampling means comprises a learned motion model which generates a sequence of object configurations which describe a predetermined object motion trajectory.
The predetermined pattern of object motion is related to the motion constraints that apply to the typical swing or stroke used in golf, tennis or other activity that is being analysed.
Preferably, the sampling means fits the object motion trajectory to the images of the image sequence in parallel.
Preferably, a principal component analysis (PCA) model of the object motion trajectory is used.

Preferably, a mean object motion trajectory and a set of basis vectors are calculated from training data.
Preferably, new object motion trajectories can be generated by sampling basis weights.
Preferably, a sequence of object configurations is generated and then associated with a particular sequence of frames such that they can subsequently be scored.
Preferably, the object configurations are scored by the likelihood module once the object configurations for each frame have been calculated using the motion model.
Preferably, the likelihood scoring module scores a part configuration using cues which are calculated from the input video sequence in a pre-processing stage.
Preferably, individual part scores are then combined to produce a score for the full object configuration.
Preferably, for each mask in a part model, first and second densities can be learned over the cue values conditioned on the model being in a first and second state, defined as 'on' and 'off'. The ratio of the first and second densities gives a non-linear mapping between the cue values and a measure of belief that the values are explained by the model being 'on' or 'off'.

Preferably, a mask can be scored by accumulating the likelihood ratios of each cue value at all points in the mask, weighted by the mask value.
Preferably, the frame differencing cue is a measure of how likely a pixel in one frame is to be the 'same' as the corresponding pixel in all the other frames in the video sequence.
The system of the present invention tracks people's body parts and other relevant objects, such as sports equipment, in a video sequence in order to extract information pertaining to an individual's body shape, position and movement.
Sports performance information can be extracted from the sequence of images in the video. The invention is particularly suited to sports where a single person performs an in-place, stylised movement.
The present invention is suitable for analysing a golf swing and provides a tracking strategy which takes advantage of the stylised nature of the golf swing motion. The present invention provides a platform for rapidly developing and integrating these components by creating an object model, likelihood scoring and the sampling method.
The present invention will now be described by way of example only with reference to the accompanying drawings, in which:

Fig.1 is a schematic overview of an example of the present invention;

Fig.2 shows an overview of the object model used in an example of the present invention;

Fig.3 shows a probabilistic mask for a golf club;

Fig.4 shows an overview of a sampler used in an example of the present invention;

Fig.5 shows an overview of the likelihood scoring module used in one example of the present invention; and

Fig.6 shows an overview of the present invention in use.
The analysis means of the present invention uses three main components for analysing motion, as shown in Fig.1: an object model which provides part configurations 3, sampling/tracking means 5 and means for calculating likelihoods 7. These features are combined in a single software application which, in use, is loaded onto a computer. The computer can act as a central server which is attached to a number of image capturing means, each of which provides sequences of images for analysis. In one embodiment of the invention, the images are stored on the central server and each sequence of images can be labelled such that the user's performance can be monitored over time.
The analysis of the images will be described with reference to Figures 2 to 5, describing the invention with reference to the particular example of golf swing analysis. In the case of golf swing analysis, the parts of the scene that are of interest for determining performance parameters are the golf club and individual body parts of the golfer. These parts are collectively referred to as the object model.
Fig.2 shows an object model overview. First, training data in the form of images 11a to 11c are annotated and, in this case, the head region 13a to 13c is extracted. Then the set of extracted regions are combined 15 in some way to produce the head part foreground model 19. In this example an average is taken; however, other functions may be used.
The visual appearance of a part is modelled using a set of 2D probabilistic masks which are stored in a parts library 21. Each mask 23 to 29 represents a distinct visual region that a point in the part model may belong to (e.g. 'foreground region' or 'background region') and the mask value represents the degree to which a point on the mask belongs to this class. The mask values are probabilities and as such are real numbers between zero and one. The set of masks for each part are of equal size and are aligned with each other. Equivalently, the separate real-valued masks could be regarded as a single, real vector-valued mask.
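The combination step that produces a probabilistic mask from annotated training regions can be sketched as follows. This is a minimal illustration, assuming the annotated regions have already been extracted and aligned as equal-sized binary arrays; all names are hypothetical, not taken from the patent.

```python
import numpy as np

def build_part_mask(annotated_regions):
    """Combine aligned binary annotation masks into one probabilistic mask.

    Each entry of `annotated_regions` is a 2D 0/1 array marking the part's
    foreground in one training image; averaging gives, per point, the
    probability of that point belonging to the foreground class.
    """
    stack = np.stack(annotated_regions, axis=0).astype(float)
    return stack.mean(axis=0)  # values lie in [0, 1]

# Three toy 4x4 annotations of a "head" region:
regions = [np.zeros((4, 4)) for _ in range(3)]
regions[0][1:3, 1:3] = 1
regions[1][1:3, 1:3] = 1
regions[2][1:4, 1:3] = 1
mask = build_part_mask(regions)
```

Points marked in all three annotations get mask value 1, points marked in only one get 1/3, matching the averaging function described for the head part model.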
Each part model is associated with a layered 2D transform which is used to project the set of masks into the image whilst taking into account self-occlusion of the object model. The transform relates each point in the mask with a unique pixel in the image:

x_T = T(p)

where T is the transform, p is a point on the part model and x_T is the transformed point in the image. The form of T depends on the parameterisation of the pose configuration, m, for the part model. There are several possible parameterisations, e.g. affine, for this configuration. This application uses the following parameterisation:

m = (x, y, θ)

where (x, y) is the translation of the part model in the image and θ is the orientation of the part model in the image plane. In general, a scale parameter may be needed, but in this example this parameter is shared by all parts. The full object pose configuration is thus:

m = [m_1, m_2, ..., m_N]

where N is the number of parts that make up the object model.
For the golf swing analysis application, each part is modelled using two masks, representing membership of foreground and background regions. The parts may be modelled by one or more masks. Possible foreground 31 and background 33 masks for the golf club are shown in Figure 3. The greyscale value of the mask corresponds to the weight; black corresponds to a weight of 0, whilst white corresponds to a weight of 1. The foreground mask 31 in Fig.3a reflects the expectation that the foreground part of the club is long and thin, with some uncertainty due to flex of the clubshaft. The large block at the bottom of the background mask in Fig.3b means that the configuration of the clubhead with respect to the clubshaft is not explicitly modelled (and is deemed not of particular consequence).

The job of the sampler in the hypothesise-and-test paradigm used here is to generate object pose configurations that can subsequently be scored by the likelihood module. An overview of this process 35 is given in Fig.4. Object parts 37 are sampled using one of a number of available samplers 39 and constraints 43 are applied to produce a specific instantiation of the object model 45.
A stylised object movement such as a golf swing produces a smaller subset of object poses. The swing provides strong motion constraints that can be used to ease the tracking problem.
A motion model can be learned which generates a sequence of object configurations which describe a plausible swing. This sequence is the swing trajectory; a specific swing trajectory is defined by a swing configuration, whose form depends on the motion model used.
The apparatus and method of the present invention performs off-line tracking and allows a swing trajectory to be fitted to the entire sequence at once. This is possible since tracking in real time is not required.
For this application, a PCA model of the swing trajectory is used. A mean swing trajectory and a set of basis vectors are calculated from training data. New swing trajectories can be generated by sampling basis weights; the set of sampled basis weights is typically of much lower dimensionality than the full swing trajectory space. The full swing configuration, s, in this case is given by:

s = (b_1, ..., b_n, s_t, s_s, t_x, t_y, p)

where b_1, ..., b_n are the n basis weights obtained by projecting a swing trajectory onto the PCA basis space, and s_t, s_s, t_x, t_y and p are the temporal scale, spatial scale, x and y spatial translation and temporal position, respectively. Note that it is not assumed that the temporal scale (i.e. how fast the swing is) of the swing is known beforehand.
The temporal scale gives the speed of the swing and hence the number of object configurations/frames that must be scored. The length of the swing is typically less than the length of the captured sequence; the temporal position parameter gives the starting frame of the swing. This allows the object configurations to be scored over the correct frames for this particular swing configuration. The spatial scale parameter helps correct for uncertainty in the distance between the golfer and the camera.
To calculate a score for a given swing configuration, a sequence of object configurations must first be generated and then associated with a particular sequence of frames such that they can subsequently be scored.
The temporal scale from a given sampled swing configuration may not be the same as the temporal scale used in constructing the PCA space. In the cases where this occurs, the swing trajectory specified by the basis weights of the swing configuration is temporally resampled using cubic spline interpolation to match the temporal scale specified by the swing configuration. Once the correct object configurations for each frame have been calculated, the swing configuration can then be scored by the likelihood module.
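The temporal resampling step might look like the following sketch, which stretches a trajectory to a new number of frames with cubic spline interpolation (SciPy's CubicSpline is used for the spline; the function name and linear toy data are hypothetical):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def resample_trajectory(traj, n_frames):
    """Temporally resample a (T, D) trajectory to n_frames frames using
    cubic spline interpolation, as when a sampled temporal scale differs
    from the scale the PCA space was built at."""
    T = traj.shape[0]
    t_old = np.linspace(0.0, 1.0, T)
    t_new = np.linspace(0.0, 1.0, n_frames)
    return CubicSpline(t_old, traj, axis=0)(t_new)

traj = np.linspace(0, 1, 10)[:, None] * np.array([1.0, 2.0])  # (10, 2) ramp
resampled = resample_trajectory(traj, 25)
```

The endpoints of the swing are preserved and intermediate configurations are interpolated smoothly, so the resampled trajectory can be paired one-to-one with the frames implied by the sampled temporal scale.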
There are several different search strategies that can be employed to determine the 'best' swing configuration. Search could be performed strictly in the swing configuration space. Another possibility is to combine swing configuration sampling with a local search in image space at each frame; the resulting swing trajectory may not be a plausible swing, so it is back-projected into the PCA space to give the nearest plausible swing trajectory described by the PCA model.
In the golf swing analysis application, a part configuration is scored using pixelwise cues which are calculated from the input video sequence in a preprocessing stage. In general, the cues may not be pixelwise. The individual part scores are then combined to produce a score for a full object configuration. The cue values are not used directly; rather, a non-linear mapping is learned over them. An illustration of the likelihood scoring process 47 is given in Fig.5, which shows the scoring of the hypothesis 48 using image data 50. The sample and image are combined 51 and likelihood measures 55 are applied to give a likelihood 53.
A part model can be in one of two states: 'on' if the model is correctly aligned with its corresponding part, or 'off' otherwise. For each mask in a part model, two densities can be learned over the cue values conditioned on the model being 'on' and 'off'. The ratio of these two densities gives a non-linear mapping between the cue values and a measure of belief that the values are explained by the model being 'on' or 'off'.

More formally, the set of possible states is S = {on, off} and the set of mask classes is C. The elements in C depend on the application. The score is then the ratio:

p(x | c, s = on) / p(x | c, s = off)

where x is a cue value, c ∈ C and s ∈ S.
A mask can be scored by accumulating the likelihood ratios of each cue value at all points in the mask, weighted by the mask value. Two cues are used in the golf application: frame differencing and colour. The likelihood ratios for each pixel (for every cue) are assigned in a preprocessing stage. For this application, C = {foreground, background}. In general, there may be more than two masks per point.
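Accumulating mask-weighted likelihood ratios can be sketched as below. This assumes the pixelwise (log-)likelihood ratios have already been pre-computed into an image-sized array, and uses a simple translation in place of the full layered transform; all names are hypothetical.

```python
import numpy as np

def score_mask(mask, ratio_image, transform):
    """Score one mask: sum, over all mask points, the precomputed pixelwise
    likelihood ratio at the transformed point, weighted by the mask value.
    `transform` maps integer mask coordinates (u, v) to image coordinates."""
    score = 0.0
    h, w = mask.shape
    for v in range(h):
        for u in range(w):
            x, y = transform(u, v)
            score += mask[v, u] * ratio_image[y, x]
    return score

# Uniform ratio image and a tiny mask, placed by pure translation.
ratios = np.full((8, 8), 0.5)
mask = np.zeros((2, 2))
mask[0, 0] = 1.0
mask[1, 1] = 0.5
score = score_mask(mask, ratios, lambda u, v: (u + 3, v + 3))
```

Points with mask value zero contribute nothing, so only the regions the mask believes in influence the part score.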
For the golf swing analysis application, pixelwise cues may be used and likelihood ratios for different classes are learned separately (as opposed to approaches where a similarity measure between masks of different classes is used).

To learn the 'on' density for a given class, the mask for that class is aligned correctly with respect to its corresponding part in a set of training images. For each point in the mask, the cue value of the corresponding pixel is added to the density with weight proportional to the mask value at that point. The 'off' density is learned in a similar way, with the exception that the mask is initially misaligned in the training images.
7 The specific cues that are used in the golf swing 8 analysis application are discussed next.
The frame differencing cue used here is a measure of how II likely a pixel in one frame is to be the same' as the 12 corresponding pixel in all the other frames in the video 13 sequence. The more different it is from all the other 14 pixels, the more likely it is to be foreground.
16 A distance measure, d, is defined between the colour 17 values of two pixels. There are many possibilities for 18 such a distance measure, which may depend to some extent 19 on the colour space. In this case, RGB colour space is used and the distance measure for a pixel at position x : * 21 in frame number if from the corresponding pixel in frame S...
22 number n is: * S ***.
24 d(,c,f,n)=I tI(X)-I(X) I I *...S. * *25
26 where I is a frame, with the subscript denoting frame . 27 number and I I I I denotes Euclidean distance. The frame *.*.
28 difference cue is Qa = G(d(x,f,n)) nxf 32 where G is a Gaussian kernel.
For the golf swing analysis application, log-likelihood ratios are learned for the foreground and background classes. In this application, frame differencing is a strong cue for providing evidence of foreground regions. The foreground log-likelihood ratio for the frame differencing cue is:

L_ffd(x) = log(p(Q_fd(x) | c = foreground, s = on)) − log(p(Q_fd(x) | c = foreground, s = off))

The background ratio for the frame differencing cue is:

L_bfd(x) = log(p(Q_fd(x) | c = background, s = on)) − log(p(Q_fd(x) | c = background, s = off))

The density estimates, p(·), in the preceding equations are formed using histograms. In general, there are many ways of representing such estimates.
The colour cue is simply taken as the RGB pixel value at each pixel in a frame. The value of the colour cue at position x is therefore I(x).

In the golf swing analysis application, only a foreground log-likelihood ratio is used for the colour cue. The colour cue is weaker for providing evidence for the golf club than the frame differencing cue when a low-cost camera is used; this is due to the low resolution and poor colour signal from such a camera, combining to give sparse and unstable colour information for the clubshaft. However, the cue still adds useful information for the fitting process. In general, higher quality/cost cameras may be used, and may provide better colour signals. The foreground log-likelihood ratio for the colour cue is:

L_fc(x) = log(p(I(x) | c = foreground, s = on)) − log(p(I(x) | c = foreground, s = off))

Similarly to the frame differencing case, a histogram method is used to estimate the densities in the preceding equation; in general, any density estimation method may be used.
A hypothesised part configuration is used to project a part model into the image space. There are many possible methods of scoring the object model. In the golf swing analysis application, the score for part model i is formed by accumulating, over the set P of all points on the mask, the precomputed pixelwise log-likelihood ratios at the projected points, weighted by M_f(p) and M_b(p), the weights at point p in the foreground and background masks of the ith part model, respectively.
In use, there are three stages in the above example of the present invention:
1. learning,
2. preprocessing, and
3. tracking.

The learning stage is performed before the system is deployed. It requires training data (videos of golf swings in this example) to be marked up so that part models can be learned and the 'on' and 'off' states for each part model in each image can be properly identified. The cues can then be calculated and densities over the cues can be learned. The steps are:
1. Mark up training data.
2. Learn part models.
3. Learn the global swing trajectory model (e.g. PCA).
4. Learn foreground frame difference log-likelihood ratios.
5. Learn background frame difference log-likelihood ratios.
6. Learn foreground colour log-likelihood ratios.
In the preprocessing stage, the frame differencing cue is calculated:
1. Calculate the frame difference value for each pixel in all images.
2. Pre-calculate pixelwise frame-difference and colour likelihood ratios.
The following algorithm details an extension to the search method described above, in which swing configuration sampling is combined with a local search in image space for each frame:
1. Sample the swing configuration space to obtain a population of swing trajectories.
2. Use the swing trajectories as seed points for local search. A basic gradient scheme is described, but in general other local search schemes may be used.
(a) Score the initial swing trajectory.
(b) For each frame:
(i) For each part:
(A) Calculate a set of new part configurations adjacent to the current part configuration.
(B) Calculate scores for each new part configuration.
(C) Set the highest scoring new part configuration to be the current part configuration.
(D) If the current part configuration's score is higher than the old score (by some pre-specified margin) then go back to step (A).
(ii) Store the best configuration for each part.
3. Once the search has terminated, construct a new swing trajectory comprising the best results across all swing trajectories. For each frame:
(a) For all swing trajectories in the population, determine the highest scoring part configuration for the current frame.
(b) Append this part configuration to the new swing trajectory.
4. Back-project the new swing trajectory into the PCA space to obtain the closest legal swing in the model.
5. Iterate from step 2 until some convergence criterion has been reached.
Fig.6 shows an example of the present invention 60 in which a scene 61, represented by a person addressing a golf ball, is captured by a camera 63, such that the player's swing is captured and sent via the internet 65 to a central server containing analysis means in accordance with the present invention.

The analysis means has an output to a monitor which displays the analysis results. These analysis results will identify and comment upon any deficiencies in the swing of the player. The analysis is based upon learned golf swing characteristics and rules that define good and bad technique in respect of a golf swing.
The analysed data may be sent to the player for display on a personal computer at home or elsewhere. As the processing can be done centrally, the system does not need large local processing capacity, only a camera connection and access to the internet. The system may be accessed via a PDA or mobile phone.
In one example of the present invention, the system is set up on a golf course such that a player can have each drive monitored to investigate changes in his swing. The system can also be set up at golf driving ranges or elsewhere.
The invention is equally applicable to other sports or to the analysis of other objects, especially where the expected type of movement of the object conforms to a general pattern.
Improvements and modifications may be incorporated herein without deviating from the scope of the invention.

Claims (1)

CLAIMS
1. A system for analysing the movement of an object, the system comprising: image analysis means adapted to receive a sequence of images, the image analysis means comprising object model creation means for creating an object model of an object to be analysed, a likelihood scoring module and sampling means, wherein the image analysis means extracts movement information from images in the sequence to analyse object motion.

2. A system as claimed in claim 1 wherein the analysis of object motion comprises comparing the object motion to an idealised or preferred object motion, and identifying differences.

3. A system as claimed in claim 2 wherein the differences are communicated to the user as expert advice on the manner in which the object motion differs from the idealised or preferred object motion.

4. A system as claimed in any preceding claim wherein the object model represents the visual appearance of a part of the object using a set of 2D probabilistic masks.

5. A system as claimed in claim 4 wherein each mask represents a distinct visual region that a point in the part model belongs to.

6. A system as claimed in claim 5 wherein the point in the part model is the foreground region.

7. A system as claimed in claim 4 wherein a mask value represents the probability of a point on the mask belonging to this class.

8. A system as claimed in any of claims 4 to 7 wherein the object model is associated with a layered 2D transform which is used to project the set of masks into an image in the sequence whilst taking into account self-occlusion of the object model.

9. A system as claimed in claim 8 wherein the transform defines a one-to-one mapping between points on the mask and pixels in the image.
10. A system as claimed in claim 4 wherein each part is modelled using two masks, representing membership of foreground and background regions.
11. A system as claimed in any preceding claim wherein the sampling means generates object pose configurations that can be scored by the likelihood scoring module.

12. A system as claimed in any preceding claim wherein the sampling means comprises a learned motion model which generates a sequence of object configurations which describe a predetermined object motion trajectory.

13. A system as claimed in claim 12 wherein the sampling means fits the object motion trajectory to the images of the image sequence in parallel.

14. A system as claimed in claim 12 or 13 wherein a principal component analysis (PCA) model of the object motion trajectory is used.

15. A system as claimed in claim 14 wherein a mean object motion trajectory and a set of basis vectors are calculated from training data.

16. A system as claimed in claim 14 or 15 wherein new object motion trajectories can be generated by sampling basis weights.

17. A system as claimed in any of claims 12 to 16 wherein a sequence of object configurations is generated and then associated with a particular sequence of images such that they can subsequently be scored.

18. A system as claimed in claim 17 wherein the object configurations are scored by the likelihood scoring module once the object configurations for each image have been calculated using the motion model.

19. A system as claimed in claim 17 wherein the likelihood scoring module scores a part configuration using cues which are calculated from an input video sequence in a pre-processing stage.
20. A system as claimed in claim 19 wherein individual part scores are combined to produce a score for the full object configuration.

21. A system as claimed in claim 4 wherein, for each mask in a part model, first and second densities can be learned over cue values conditioned on the model being in a first and second state defined as 'on' and 'off', the ratio of the first and second densities giving a non-linear mapping between the cue values and a measure of belief that the values are explained by the model being 'on' or 'off'.

22. A system as claimed in claim 21 wherein each mask can be scored by accumulating the likelihood ratios of each cue value at all points in the mask, weighted by the mask value.

23. A system as claimed in any preceding claim further comprising image capture means adapted to capture the sequence of images.

24. A system as claimed in claim 23 wherein the image capture means and the image analysis means are remotely connected.

25. A system as claimed in claim 23 or 24 wherein the image capture means and the image analysis means are connected via the Internet.

26. A system as claimed in any of claims 23 to 25 wherein the image analysis means is located on a central server.
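The mask scoring of claims 21 and 22 can be illustrated with a minimal sketch: accumulate the likelihood ratio of each cue value under learned 'on' and 'off' densities, weighted by the mask value at each point. Gaussian class-conditional densities and the name `mask_score` are assumptions made for this example; the patent does not specify the form of the densities.

```python
import numpy as np

def mask_score(cues, mask, on_mu, on_sigma, off_mu, off_sigma):
    """Score a probabilistic mask by summing per-point log likelihood
    ratios of the cue values under the 'on' and 'off' densities,
    weighted by the mask value (claims 21-22)."""
    def log_gauss(x, mu, sigma):
        # log density of a 1D Gaussian, evaluated elementwise
        return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
    llr = log_gauss(cues, on_mu, on_sigma) - log_gauss(cues, off_mu, off_sigma)
    return float(np.sum(mask * llr))

# A cue image matching the 'on' density inside the mask scores positively;
# one matching the 'off' density scores negatively.
mask = np.array([[1.0, 0.5],
                 [0.0, 0.0]])          # probabilistic mask values
on_like = mask_score(np.full((2, 2), 1.0), mask, 1.0, 0.3, 0.0, 0.3)
off_like = mask_score(np.zeros((2, 2)), mask, 1.0, 0.3, 0.0, 0.3)
```

Per claim 20, such per-mask (and per-part) scores would then be combined into a score for the full object configuration.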
GB0619126A 2005-09-28 2006-09-28 Image sequence movement analysis system using object model, likelihood sampling and scoring Withdrawn GB2430830A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GBGB0519698.5A GB0519698D0 (en) 2005-09-28 2005-09-28 Apparatus and method for movement analysis

Publications (2)

Publication Number Publication Date
GB0619126D0 GB0619126D0 (en) 2006-11-08
GB2430830A true GB2430830A (en) 2007-04-04

Family

ID=35335548

Family Applications (2)

Application Number Title Priority Date Filing Date
GBGB0519698.5A Ceased GB0519698D0 (en) 2005-09-28 2005-09-28 Apparatus and method for movement analysis
GB0619126A Withdrawn GB2430830A (en) 2005-09-28 2006-09-28 Image sequence movement analysis system using object model, likelihood sampling and scoring

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GBGB0519698.5A Ceased GB0519698D0 (en) 2005-09-28 2005-09-28 Apparatus and method for movement analysis

Country Status (1)

Country Link
GB (2) GB0519698D0 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116259111A (en) * 2023-05-15 2023-06-13 江西工业贸易职业技术学院 VR-based sports action scoring method, VR-based sports action scoring system, electronic device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2283385A (en) * 1993-10-26 1995-05-03 Sony Uk Ltd Motion compensated video signal processing
US20020085092A1 (en) * 2000-11-14 2002-07-04 Samsung Electronics Co., Ltd. Object activity modeling method
EP1722872A1 (en) * 2004-01-26 2006-11-22 Modelgolf Llc Systems and methods of measuring and evaluating performance of a physical skill and equipment used to perform the physical skill


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008097111A1 (en) * 2007-02-09 2008-08-14 Say Systems Limited Monitoring and displaying activities
JP2010517565A (en) * 2007-02-09 2010-05-27 セイ システムズ リミテッド Activity monitoring and display
US8305220B2 (en) 2007-02-09 2012-11-06 Say Systems Ltd Monitoring and displaying activities
WO2009086683A1 (en) * 2007-12-29 2009-07-16 Intel Corporation Automatic detection, labeling and tracking of team members in a video
WO2010035182A1 (en) * 2008-09-26 2010-04-01 Koninklijke Philips Electronics N.V. Activity assistance apparatus
WO2012085923A1 (en) * 2010-12-24 2012-06-28 Hewlett-Packard Development Company, L. P. Method and system for classification of moving objects and user authoring of new object classes
WO2022142414A1 (en) * 2020-12-30 2022-07-07 深圳云天励飞技术股份有限公司 High-rise littering monitoring method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
GB0519698D0 (en) 2005-11-02
GB0619126D0 (en) 2006-11-08

Similar Documents

Publication Publication Date Title
Xu et al. Predicting head movement in panoramic video: A deep reinforcement learning approach
CN110472554B (en) Table tennis action recognition method and system based on attitude segmentation and key point features
US9744421B2 (en) Method of analysing a video of sports motion
US10529077B2 (en) System and method for detecting interaction
Perez et al. Data fusion for visual tracking with particles
US20200193671A1 (en) Techniques for rendering three-dimensional animated graphics from video
WO2021098616A1 (en) Motion posture recognition method, motion posture recognition apparatus, terminal device and medium
CN109961039B (en) Personal goal video capturing method and system
GB2430830A (en) Image sequence movement analysis system using object model, likelihood sampling and scoring
Sternig et al. Multi-camera multi-object tracking by robust hough-based homography projections
Jiang et al. Golfpose: Golf swing analyses with a monocular camera based human pose estimation
CN115100744A (en) Badminton game human body posture estimation and ball path tracking method
Lu et al. A particle filter without dynamics for robust 3d face tracking
WO2022041182A1 (en) Method and device for making music recommendation
CN116958872A (en) Intelligent auxiliary training method and system for badminton
CN110910489A (en) Monocular vision based intelligent court sports information acquisition system and method
CN112378409B (en) Robot RGB-D SLAM method based on geometric and motion constraint in dynamic environment
Bhatia et al. 3d human limb detection using space carving and multi-view eigen models
Lian No-reference video quality measurement with support vector regression
Ilyes Lakhal et al. Residual stacked rnns for action recognition
CN116524217B (en) Human body posture image matching method and device, electronic equipment and storage medium
KR102466989B1 (en) Expert matching method and system based on similarity between gold swing data
TWI775637B (en) Golf swing analysis system, golf swing analysis method and information memory medium
Calvanese Ball tracking for padel videos
Aksay et al. Robust 3d tracking in tennis videos

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)