GB2253109A - Tracking using neural networks - Google Patents

Tracking using neural networks

Info

Publication number
GB2253109A
GB2253109A (application GB9125376A)
Authority
GB
United Kingdom
Prior art keywords
image
field
moving
output
signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB9125376A
Other versions
GB9125376D0 (en)
Inventor
Niall Peter Mcloughlin
Hidenori Inoughi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Europe Ltd
Original Assignee
Hitachi Europe Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Europe Ltd filed Critical Hitachi Europe Ltd
Publication of GB9125376D0 publication Critical patent/GB9125376D0/en
Publication of GB2253109A publication Critical patent/GB2253109A/en

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/78Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using electromagnetic waves other than radio waves
    • G01S3/782Systems for determining direction or deviation from predetermined direction
    • G01S3/785Systems for determining direction or deviation from predetermined direction using adjustment of orientation of directivity characteristics of a detector or detector system to give a desired condition of signal derived from that detector or detector system
    • G01S3/786Systems for determining direction or deviation from predetermined direction using adjustment of orientation of directivity characteristics of a detector or detector system to give a desired condition of signal derived from that detector or detector system the desired condition being maintained automatically
    • G01S3/7864T.V. type tracking systems
    • G01S3/7865T.V. type tracking systems using correlation of the live video image with a stored image

Abstract

To track one or more moving objects within a field of view, successive frames of the image of the field of view are processed 104 to obtain optical flow data comprising, for each pixel of the image and for each direction of orientation, a measure of the component change of image intensity at that pixel. The optical flow data components are then applied to three-layer neural networks 107 each arranged to output one of the components of the position of a centroid of one moving boundary defined by the optical flow data, the learning rules of each network including a decay constant so that the network functions as a short-term memory and continues to output the approximate centroid component values in the absence of an input, for instance due to occultation. The centroids and image of the field of view are applied to a further neural network which is trained to recognise object shape, so that the object may be tracked through collisions and partial occultations. The output signals drive a camera to track the object.

Description

"MOTION TARGET TRACKING SYSTEM" The invention relates to a motion target tracking system and provides a method and apparatus whereby the movement of one or more objects within a field of view can be tracked.
An object of the invention, which is defined in the appended claims, is to provide a tracking method and tracking apparatus which can follow objects moving within its field of view and output their co-ordinates, or cause a camera or other device to follow a selected object, and which is of improved performance with regard to maintaining tracking of the object through temporary occlusions, or through collisions with other objects within the field.
A further object of the invention is to provide a neural network which detects centroid information, such that it can be applied to detecting a representative point among presented clusters of data samples, the x,y co-ordinates of this point rendering a parameterised classification.
The invention will now be described with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of the system of the invention;
Figures 2, 3 and 4 are diagrams explanatory of the production of a set of corrected velocity component signals;
Figure 5a shows, diagrammatically, the structure of ART1 neural networks [The Augmented ART (Adaptive Resonance Theory) Neural Network, IJCNN-91 (International Joint Conference on Neural Networks) (Seattle) Proceedings, Vol. II, pp 467-472];
Figure 5b shows, diagrammatically, the structure of one of the neural networks for computing the motion of the centroids of the boundary movements of moving objects;
Figure 6 illustrates a stage in the isolation of the objects whose moving boundaries have been identified;
Figure 7 shows a single-layer neural network acting as a novelty filter;
Figure 8 is a diagram illustrating a collision between the object being tracked and another moving object;
Figure 9 shows the hardware implementation of a system according to the invention;
Figure 10 is explanatory in more detail of the system of Figure 9;
Figures 11 to 19 represent in more detail successive stages of processing of images by the motion-oriented contrast filter 104 of Figure 1;
Figure 20 illustrates the effect of the normalising filter 105 of Figure 1;
Figure 21 illustrates the production of projected velocity component signals Sx,k, Sy,k from the normalised image data;
Figure 22 illustrates the effect on the projected velocity component signals Sx,k, Sy,k of two of the neural networks of Figure 5b;
Figure 23 illustrates the determination of a bounding area isolating the image of an object to be tracked;
Figure 24 illustrates the input of the instar-outstar competition network for real data; and
Figure 25 illustrates the output of the instar-outstar competition network for real data.
Referring first to Figure 1, the system includes a sensor 101, comprising a camera which generates a bit-map image of the field of view and supplies it to two sub-systems, 102 and 103. Sub-system 102 receives image information from the sensor, processes it, and applies it to a neural network trained to produce output signals representing the co-ordinates of points which correspond to the centroids of the moving boundary regions of the objects. Sub-system 103 receives image information from the sensor and also the centroid information from sub-system 102 and applies it to a neural network trained to recognise the shapes of the boundaries of objects being followed. When collisions occur, this sub-system is therefore able to distinguish an object being followed from colliding objects and so maintain correct tracking.
Dealing now in more detail with sub-system 102, the bit-map information from the sensor 101 comprises pixel intensities for successive image frames, Ii,j(t1), Ii,j(t2), Ii,j(t3), etc., where I represents intensity, i,j are the pixel coordinates, and t1, t2 etc. are the times of the successive frames. This information is applied to a motion-oriented contrast filter 104. The filter 104 is a four-level neural model which is arranged to measure optical flow, that is to say, the components, resolved into different orientations, of the difference in intensity between successive frames. Such filters are known and will not be described here - see for example the article by S. Grossberg & M. Rudd entitled "A Neural Architecture for Visual Motion Perception: Group and Element Apparent Motion" in Neural Networks vol. 2, pp 421-450 (1989). Reference may also be made to "The Adaptive Brain II" - ed. Stephen Grossberg (Elsevier 1987).
The output signals from the motion-oriented contrast filter are in analogue form, and may be represented by Ui,j (k,t), where i,j are the co-ordinates of the pixel, k is orientation (of which there are eight directions in the particular embodiment being described), and t is time.
These signals are then applied to a normalising filter 105 which converts them to output signals Vi,j(k,t), normalised in amplitude to a value lying between 0 and A and smoothed by space and time averaging. This filter normalises the signals according to the equation

$$\varepsilon \frac{d}{dt} V_{ij}(k,t) = -\alpha V_{ij}(k,t) + \big(A - V_{ij}(k,t)\big)\,NE_{ij} - V_{ij}(k,t)\,NI_{ij} \qquad (1)$$

where

$$NE_{ij} = U_{ij}(k,t), \qquad NI_{ij} = \sum_{(u,v)\neq(0,0)} G_{\sigma}(u,v)\, U_{(i+u)(j+v)}(k,t), \qquad G_{\sigma}(u,v) = \frac{1}{2\pi\sigma}\exp\!\left(-\frac{u^{2}+v^{2}}{2\sigma^{2}}\right),$$

and ε, α, A are positive constants, typically ε = 0.01, α = 0.1, A = 1.0. Equation (1) is based on the well-known cell membrane equation.
This averaging and normalising process increases the reliability of the signals from the motion-oriented contrast filter, reduces noise, and also suppresses the temporary effect of high velocity objects crossing the field.
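For illustration only, a minimal discrete-time sketch of this normalising step is given below: it applies one forward-Euler update of equation (1), with the surround term NI computed by convolving with a truncated Gaussian kernel whose centre tap is zeroed. The function names, kernel radius and time step are assumptions, not part of the patent.

```python
import numpy as np
from scipy.ndimage import convolve

def surround_kernel(sigma=1.0, radius=3):
    """Gaussian surround G_sigma(u, v) with the (0, 0) tap removed."""
    ax = np.arange(-radius, radius + 1)
    u, v = np.meshgrid(ax, ax, indexing="ij")
    G = np.exp(-(u**2 + v**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma)
    G[radius, radius] = 0.0          # the sum in NI excludes (u, v) = (0, 0)
    return G

def normalise_step(V, U, eps=0.01, alpha=0.1, A=1.0, sigma=1.0, dt=1.0):
    """One forward-Euler update of the normalising filter, equation (1).

    V, U: arrays of shape (K, H, W) holding the normalised and raw
    optical-flow responses for the K orientations.
    """
    G = surround_kernel(sigma)
    NE = U                                                        # excitation: the cell's own input
    NI = np.stack([convolve(u, G, mode="constant") for u in U])   # Gaussian surround inhibition
    dV = (-alpha * V + (A - V) * NE - V * NI) / eps
    return np.clip(V + dt * dV, 0.0, A)                           # output kept within [0, A]
```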
The above steps are illustrated diagrammatically in Figures 2, 3 and 4. Referring first to Figure 2, an image frame is assumed to contain three objects, one of which is assumed to be stationary while the other two are moving to the right in the view shown in the Figure with different velocities. The motion-oriented contrast filter 104 compares this image frame with the immediately preceeding image frame and produces an output representing the optical flow, i.e. the velocity of the moving boundary edges, for each of eight orientations (k=0...7). The optical flow is then normalised by the filter 105, which produces an averaging effect over space and a decaying average with respect to time, as is indicated in Figure 3, and normalises them in amplitude.
The signals Vi,j(k,t) are then processed in a feature extraction module 106 to produce a set of velocity component signals Sx,k, Sy,k by projecting the Vi,j signals for each of the eight values of k on to the X and Y axes. This process is shown in a simplified form in Figure 4, where the two moving objects of Figure 2 are assumed to be moving from left to right in the X-direction. For each of the eight orientations (k=0 to 7) the leading and trailing boundary edges of the two moving objects produce peaks in the Sx and Sy signals.
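By way of illustration, the projection can be sketched as below; the (orientation, row, column) array layout is an assumption made for the example.

```python
import numpy as np

def extract_signatures(V):
    """Project the normalised optical-flow responses onto the image axes.

    V: array of shape (K, H, W), one plane per orientation k.
    Returns (Sx, Sy): Sx[k] has length W (projection onto the X axis),
    Sy[k] has length H (projection onto the Y axis).
    """
    Sx = V.sum(axis=1)   # collapse rows    -> signal along X for each orientation
    Sy = V.sum(axis=2)   # collapse columns -> signal along Y for each orientation
    return Sx, Sy
```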
The Sx and Sy signals are now each applied to one of a set of neural networks 107, the structure of one of which is shown in Figure 5b. Each of these networks is arranged to learn to follow the components of motion of the moving boundaries and to output a signal representing the direction and velocity of the movement of their centroids.
Referring now to Figure 5b, the network comprises three layers of neurons, designated in the Figure as F1, F2, and F3, and the signal Si is applied to both F1 and F3. Layers F2 and F3 are competitive, each neuron receiving negative-weighted inputs from its neighbours in the layer and positive-weighted feedback from its own output. These weights are fixed, i.e. they are unaffected during the learning process. The output of the network, xk or yk, representing the motion of the centroid(s) of a boundary movement being followed, is developed, or encoded, in F2.
The instar stage, that is, the activity of the nodes of the layer F2 driven from F1, is controlled by equation (2), in which α and A are positive constants, typically α = 0.1, A = 1.0.

A suitable learning law is given by:

$$\frac{dw_{ji}}{dt} = e(t)\,(s_{i} - w_{ji})\,x_{j} \qquad (3)$$

where e(t) is a time-decaying function, such as e_0/[log(1 + t) + 1], e_0 > 0, or e_0/[1 + t], e_0 > 0.

The outstar stage of the network, that is, the activity of the nodes in the F3 layer, is controlled by equation (4), in which α and A are again positive constants.
A suitable learning law for this network is given by:

$$\frac{dZ_{kj}}{dt} = e(t)\,(y_{k} - Z_{kj})\,x_{j} \qquad (5)$$

The network is trained on an object being tracked. The interaction between layers F1 and F2 acts as a matched filter, sharpening the signal peaks and reducing noise, together with the interaction (lateral inhibition) between nodes in F2. F2 itself acts as a short-term memory, holding a decaying image of the peaks of the Si signals, while in combination with F3 it forms a longer-term memory. As a result, if the object being followed is temporarily occluded, these memory systems maintain a slowly-decaying memory of its motion and are able to continue with the tracking process as soon as it re-appears.
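As a concrete illustration of how such a network could be realised, the sketch below implements the learning laws (3) and (5) exactly as stated, with the decaying gain e(t) = e_0/[log(1 + t) + 1]. The activity equations (2) and (4) are not reproduced in this text, so the shunting, laterally-inhibited update used for F2 and F3 is an assumption modelled on the cell-membrane form referred to above; all names and constants are illustrative.

```python
import numpy as np

class InstarOutstarTracker:
    """Sketch of one centroid-tracking network (Figure 5b): F1 -> F2 (instar),
    F2 -> F3 (outstar), with the learning laws (3) and (5) from the text."""

    def __init__(self, n_in, n_f2, e0=0.5, alpha=0.1, A=1.0, dt=0.1):
        self.W = np.random.rand(n_f2, n_in) * 0.1   # instar weights w_ji
        self.Z = np.random.rand(n_in, n_f2) * 0.1   # outstar weights z_kj
        self.x = np.zeros(n_f2)                     # F2 activity (short-term memory)
        self.y = np.zeros(n_in)                     # F3 activity
        self.alpha, self.A, self.dt, self.e0, self.t = alpha, A, dt, e0, 0

    def step(self, s, learn=True):
        # Assumed shunting activity updates with lateral inhibition (in place of eqs. 2 and 4).
        exc = self.W @ s                            # bottom-up instar input to F2
        inh = self.x.sum() - self.x                 # lateral inhibition from the other F2 nodes
        dx = -self.alpha * self.x + (self.A - self.x) * exc - self.x * inh
        self.x = np.clip(self.x + self.dt * dx, 0.0, self.A)

        exc3 = self.Z @ self.x + s                  # F3 receives the signal plus the outstar read-out
        inh3 = self.y.sum() - self.y
        dy = -self.alpha * self.y + (self.A - self.y) * exc3 - self.y * inh3
        self.y = np.clip(self.y + self.dt * dy, 0.0, self.A)

        if learn:                                   # learning laws (3) and (5) with decaying gain e(t)
            e = self.e0 / (np.log(1.0 + self.t) + 1.0)
            self.W += self.dt * e * (s[None, :] - self.W) * self.x[:, None]
            self.Z += self.dt * e * (self.y[:, None] - self.Z) * self.x[None, :]
            self.t += 1
        return self.x, self.y
```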
The network of Figure 5b is a subset of the ART1 neural network. The detailed behaviour of this type of neural network is well understood in the literature. The only difference between Figures 5a and 5b is that Figure 5a has a reset node, XR, which allows only one node in F2 to become active.
Returning again to Figure 1, it will be recalled that the system contains a sub-system 103 which receives image information from the sensor and the centroid information xk, yk from the neural networks 107. This sub-system contains a feature extraction module and a neural network trained to recognise the shape of the boundary of the object being followed, and is able to distinguish an object being followed from colliding objects, and so maintain tracking when collisions occur.
For the purpose of shape recognition the signals Ii,j (t) from the sensor 101 are applied to a feature extraction module 108 which extracts boundary information and passes it to a novelty filter 109.
The feature extraction module 108 receives the unprocessed image information from the sensor 101 and the output from the networks 107 which is thresholded so that only significant peaks are examined. From this information the approximate centre of motion for each moving boundary in the field of view is calculated.
Various methods are available for such boundary description, including edge detection algorithms, fast Fourier transforms and the generalised Hough transform.
The co-ordinates of a 'box' bounding each moving object are then determined.
Figure 6 is illustrative of this process. In that Figure the set of objects within the field of view corresponds to that of Figure 2. The centroids 601, 602 of the two moving objects are isolated by determining a set of co-ordinates which define a "box", i.e. a separate area, 603, 604 respectively, containing and isolating each of them. The object 605 is present in the bit image generated by the sensor 101, but as it is not in motion no boundary movement centroid is generated for it by the neural network 107, and it is ignored.
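For a single tracked object, the isolation step can be sketched as follows; the choice of the strongest peak and the fixed half-width margin are illustrative assumptions (with several moving objects, each significant peak pair would be boxed in the same way).

```python
import numpy as np

def bounding_box(xk, yk, margin=8):
    """Co-ordinates (x0, y0, x1, y1) of a 'box' isolating the tracked object.

    xk, yk: thresholded centroid-component outputs of the networks 107;
    margin is an illustrative half-width for the bounding area.
    """
    cx, cy = int(np.argmax(xk)), int(np.argmax(yk))
    return (cx - margin, cy - margin, cx + margin, cy + margin)
```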
The output of the feature extraction module 108 is passed to an adaptive novelty filter 109. This adaptive novelty filter is shown diagrammatically in Figure 7, and consists of a single layer of neurons which are all laterally inter-connected by adaptive weights Wij. The activation of each node xi is given by the following equation:

$$x_{i}(t+1) = x_{i}(t) + \sum_{j \neq i} W_{ij}(t)\, x_{j}(t) \qquad (6)$$

These weights are modified during training by the following learning rule:

$$\frac{dW_{ij}}{dt} = -\alpha\, x_{i}(t)\, x_{j}(t) \qquad (7)$$

From the above equations it can be seen that if an input is presented for long enough the output of the neurons will tend to zero. During training each shape pattern to be learned is presented to the network for an iteration period of t = T1. During this period the weights "learn" each pattern and the output from each neuron converges to zero. Only one presentation of each pattern is necessary for this convergence, assuming T1 is chosen correctly. During normal operation the iteration period is decreased from T1 to T2, T2 being a fraction of T1, and the output will converge to zero if the pattern presented has previously been learnt during training. If an obscured learned pattern is input to this network the missing information is "recalled" as novelties (activations) at the output. Reference may be made to the book "Self-organisation and Associative Memory" by T. Kohonen (Springer-Verlag 1989), or to the paper by E. Ardizzone entitled "Application of the Novelty Filter to Motion Analysis" in the Proceedings of the INNC 1990, Paris, pp 46-49.
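A compact sketch of this novelty filter, following equations (6) and (7) with an explicit Euler step for the weight update, is given below; the learning rate and the iteration counts T1 and T2 are illustrative values only.

```python
import numpy as np

class NoveltyFilter:
    """Single-layer, laterally connected novelty filter (Figure 7, eqs. 6 and 7).

    The diagonal of W is held at zero, so the sum in eq. (6) runs over j != i.
    """

    def __init__(self, n, alpha=0.05):
        self.W = np.zeros((n, n))          # adaptive lateral weights W_ij
        self.alpha = alpha

    def respond(self, pattern, iterations, learn=False):
        x = np.asarray(pattern, dtype=float).copy()
        for _ in range(iterations):
            x = x + self.W @ x             # eq. (6)
            if learn:
                dW = -self.alpha * np.outer(x, x)   # eq. (7), Euler step with unit time step
                np.fill_diagonal(dW, 0.0)           # keep i != j
                self.W += dW
        return x

# Training: present each shape for a long period T1; testing uses a shorter period T2.
# filt = NoveltyFilter(n=64); filt.respond(shape, iterations=200, learn=True)
# out = filt.respond(shape, iterations=20)   # ~0 if the shape was learned, novelties otherwise
```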
Having identified the tracked object by its shape, the object can be followed in spite of "collisions", i.e. the intrusion or overlapping of other objects. The principle involved can best be explained by reference to Figure 8. In this Figure, 801 represents an object being tracked, which is moving across the field of view from left to right, and 802 is another object which passes in front of 801 and obscures its leading edge. The tracking system as hitherto described recognises moving boundary edges, and has a short-term memory whereby, if the object 801 is occluded for a brief period, its leading edge will normally be picked up again when it re-appears. However, if the objects 801, 802 are moving in roughly the same direction at speeds which are not very different there will be an ambiguity, and the system could pick up the edge of object 802 instead of that of object 801. The sub-system 103, being able to identify objects by their shape, can resolve ambiguities of this kind.
The output from the novelty filter 109 may be applied to a driver circuit 110, arranged, for example, to supply drive current to motors controlling the movement of a camera.
The multiple target tracking system described in Figure 1 has been tested on the following hardware configuration. Referring to Figure 9, a general-purpose Charge Coupled Device (CCD) camera was used as the sensor to the system (901). The output from the camera was captured by a frame grabber board (902) inside a Meiko M10 enclosure (903). The captured image frames are then passed on to a transputer network, via a transputer bus (904), which is composed of four Meiko In-Sun transputer boards (905), each of which contains sixteen Inmos T800 transputers, for a total of sixty-four T800 transputers. Referring back to Figure 1, subsystem 102 and subsystem 103 are implemented on the transputer network and will be explained in detail in the following section. Referring to Figure 9 again, the results of tracking are viewed interactively on the scene monitor (906), via a display board (907), and either on the Sun 4/370 monitor (908) or on a Macintosh terminal (909) connected via Ethernet (910), both using the X-Windows standard graphical interface. The system was tested on images of two toy trains (911) taken from a height. In this way it was possible to control the duration and extent of collision and occlusion, by using bridges and tunnels.
Referring to Figure 10, the subsystems described in Figure 1 are explained from an implementation point of view. Each box in this diagram represents a separately compiled module of code, and each is run on a separate T800 transputer. The cylinders 1009 and 1011 represent pipelines of T800 transputers which will be explained below. The arrows in Figure 10 represent the bi-directional communication paths between the modules of code. Each module of code is compiled separately before execution, using the standard OPS compiler in the case of OCCAM2 code (modules 1005, 1006, 1007, 1008 (a & b), 1009 (a & b & c), 1010, 1011, and 1012), or the standard C compiler (modules 1002, 1003, and 1004).
Finally, upon execution, the CS_HOST module (1002) configures the transputer network for modules 1005-1012 and then downloads all the executable modules to their respective transputers. CS_HOST also sets up communication between modules 1003, 1004 and 1005 as shown in Figure 10, to enable the input to the system (Frame Grabber (1005)) and the output to the scene display (Frame Display (1007)) to be controlled. CS_HOST (1002) is a control program implemented using Meiko's CS_BUILD libraries as an extension to C.
Cylinder 1001 is representative of the interface between CS_HOST (1002) and the Unix file system, which in this implementation was SunOS 4.0.3. It is via this link that the external Unix file system is accessed, for such purposes as writing results to the Sun or Macintosh screen or to a file.
The TTY module (1003) is used to initialise the Frame Grabber driver software so that the FG module (1005) is ready to sample at the correct video standard.
This system can operate on multiple video standards, including PAL and NTSC, and the size of the image to be displayed and what processing, if any, is to be done to the image before display are all controlled via this file. This control module, coupled with the GFX module (1004), is used to control the FG (1005) and FD (1007) modules, and hence the input and output to and from the system. TTY is written in standard C, with all transputer communications handled by standard Meiko CS Tools libraries.
The GFX module is used to switch the mode of the FD (1007) to allow graphics to be displayed on the scene monitor if required. This control mechanism has been set up to allow, for example, the moving tracked object to be replaced with a graphical icon when it is being correctly tracked. As the FD (1007) operates in two modes, one where the processed images from the FG (1005) may be displayed and another where only graphics may be displayed, it is necessary to flip between these modes if such a display is required.
The FG (1005) module contains the control software for the Frame Grabber. It also contains code for communication with various other modules (1003, 1004, and 1006). The FG module is responsible for capturing incoming camera signals, and passing these captured images in frame format to the BUFF module (1006). The FG module initialises the Frame Grabber board based on commands it receives from the TTY module (1003) during the start-up period. Once the FG module is in active capturing mode it splits each captured frame in half and passes each half in parallel down the two indicated channels (1013) to the BUFF module (1006). The FG module is composed of a large number of parallel processes, some of which control the actual frame grabber hardware, while others are used to ensure efficient communication. A simple protocol, based on standard networking protocols such as TCP/IP, has been implemented to ensure the successful transportation of data between FG and BUFF.
The BUFF module (1006) again contains a number of parallel processes to speed up communication over the indicated links. Please note that the terms "links" and "communication channels" are used interchangeably in this text. The BUFF module's main purpose is to buffer the output from the FG module to the processing modules Master1 (1008a) and Master2 (1008b). The reason behind this is an attempt to have as many T800 transputers as possible processing at the same time, and hence to decrease the overall response time. BUFF is also responsible for splitting the image into two equal parts and passing these parts to Master1 (1008a) and Master2 (1008b) respectively.
The FD module (1007), or Frame Display module, is responsible for all display routines. It accepts images and graphical commands over its links to Master1 (1008a) and Master2 (1008b). A strict protocol has been designed to allow the controlled flipping of the graphical mode, as described earlier, and also to allow chunks of image data from Master1 and Master2 to be displayed independently in a controlled manner. Again, separate processes are assigned for communication efficiency and run in parallel to the screen display interface processes.
The Master1 and Master2 modules (1008a & 1008b) are responsible for splitting the image into a fixed number of chunks, this number being fixed by the number of T800 transputers in the Motion Oriented Contrast Filter (MOCF) pipelines (1009a & 1009b), which is twenty in Figure 10. They are exactly the same modules, so only Master1 (1008a) will be discussed. It sends out these chunks over a link to MOCF (1009a) and receives back the processed images Vij(k,t)n for display purposes, and the extracted signatures Sx,k,n and Sy,k,n. The subscript n has been used here to represent the fact that there are n such outputs for each input image, each n corresponding to one chunk. As these signatures cover only a chunk of the image they are meaningless until they are all collected in BUFF2 (1010), and so Master1 controls the transmission of these chunk signatures to BUFF2 (1010). This link is also used to send the image intensity data Iij to BUFF2 for use with the Novelty module (1012). As this link is bi-directional it is also possible to send graphical commands over it.
This may be used to display results obtained from the BUFF2 module (1010), from the centroid detection neural network module (1011) via BUFF2, or from the Novelty module (1012) via BUFF2. Finally, Master1 is connected to the FD module (1007), and it is over this link that all display commands and data are sent.
The MOCF modules (1009a & 1009b) are identical duplicates of each other. They each contain twenty T800 transputers, each running the same software module, slave (1009c). Each slave module is connected on both sides to another slave module, except the initial slave module, which is connected to its respective Master module (1008a or 1008b), and the end slave module, which is only connected on one side. Each slave module contains a number of parallel processes, and each module contains buffers for both the input image chunks and the output processed image chunks Vij(k,t)n and signatures Sx,k,n and Sy,k,n. The slave modules compute the Vij(k,t)n signals from two successive images and store the latest image for comparison with the next one. With reference to Figure 1, they perform the tasks of the Simplified MOC Filter (104), the Spatio-Temporal Normalisation Filter (105) and the Feature Extraction (I) (106) functions. They return this information along their links to the controlling Master module (1008a or 1008b). The slave module has been designed to forward data out to further-away slaves (relative to the Master module), and to forward the results of processing back to the controlling Master module.
The BUFF2 (1010) module receives the signatures Sx,k,n and Sy,k,n from both the Master1 module (1008a) and the Master2 module (1008b). Its purposes are to collect these sub-signatures together to create the full signatures Sx,k and Sy,k, and to collect together both halves of the original image Iij forwarded from the Master modules (1008a & 1008b). The BUFF2 module also sends these signatures on to the Centroid Detection Neural Network (CDNN) module (1011), and extracts boundary descriptions from the image according to Feature Extraction (II) (108) (reference Figure 1). It receives back the output from the CDNN and thresholds this to discover the significant peaks. These significant peaks are used to create significant subregions or bounding boxes in the input image Iij, and the features calculated in these regions are passed on to the Novelty module (1012). Control software in the BUFF2 module can interpret the output from the Novelty module (1012) and act accordingly. Referencing Figure 1 again, this module (1010) performs part of the function of Feature Extraction (I) (106), the function of Feature Extraction (II) (108), and may also provide the function of the Driving module (110).
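The chunk-wise bookkeeping can be illustrated as follows, on the assumption that each chunk is a horizontal band of the image, so that the per-chunk X signatures add and the per-chunk Y signatures concatenate; the actual split used by BUFF and the Master modules is not spelled out in this text.

```python
import numpy as np

def assemble_signatures(chunk_Sx, chunk_Sy):
    """Recombine per-chunk signatures into full-image signatures.

    chunk_Sx: list of arrays, each (K, W)   -- X signatures of each horizontal band
    chunk_Sy: list of arrays, each (K, h_n) -- Y signatures of each band (h_n rows)
    Returns Sx of shape (K, W) and Sy of shape (K, H), with H the sum of the h_n.
    """
    Sx = np.sum(chunk_Sx, axis=0)              # bands share the same columns: add them
    Sy = np.concatenate(chunk_Sy, axis=1)      # bands stack vertically: concatenate rows
    return Sx, Sy
```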
The CDNN module (1011) contains sixteen T800 transputers, each of which runs an identical Instar Competitive, Outstar Competitive (ICOC) module.
Referring again to Figure 1 each of these modules provides the function of one of the networks in 107.
Additional software is also resident in each module to receive and forward input data, and to receive and forward result data back to BUFF2 in a similar manner to the slave modules (1009c).
The Novelty module (1012) accepts input from the BUFF2 module (1010) and returns its output back to BUFF2. With reference to Figure 1, this module provides the function of the Novelty Filter (109) described previously.
With reference to Figure 1, the simplified MOC filter 104 receives input as shown in Figures 11, 12 and 13. These Figures represent the first of the three stages of processing of the MOC filter, as described below. The input, described previously as Iij(t), is an array of pixel intensities 1101, 1201, & 1301. In this explanation only one object 1102 is assumed to be moving, and the background is assumed to be a uniform white. Each of the four masks shown at 1103 is convolved with the input data to produce eight directional arrays which represent the change in intensity in each direction. The algorithm 1104 for calculating the direction arrays is described in Figure 11 (and also in Figures 12 & 13). Figure 11 represents the first case, where the new object appears at time t = 1. Up until this time no moving object has been present in the previous images. Figure 12 represents the second case, where the new object position 1202 is connected to the previous object position 1203; this represents one possibility for the frame after Figure 11, in other words the frame at time t = 2. Figure 13 represents the final case examined, where the new object position 1302 is totally separate from its previous position 1303; again this represents one possibility for the frame after Figure 11, at time t = 2.
Figures 14, 15 & 16 represent the second stage of processing, for each of the respective cases explained above. In Figure 14 a summation mask 1401 is convolved with the input image and the difference in intensity between the summated area in the input image and that of the previous image (time t=0), is calculated. If the summated image intensity decreases at a point as it does in Figure 11 a measure of this decrease in intensity is entered in the lessbright array. Similarly if the summated image intensity for a point in the new image increases with respect to the previous image then a measure of this increase is entered into the morebright array. If there is no change in the summated intensity both arrays are set to zero for that element. This process may be described as calculating the average intensity change in regions surrounding each pixel.In Figure 15 the same process occurs, creating different output to the lessbright and morebright arrays as the summated input image intensity is compared to that of a previous frame (time t=1) which contained the image. Finally in Figure 16 the same process occurs except more values are set in the lessbright and morebright arrays as the two frames being compared contain larger differences in the position of the object.
Figures 17, 18, & 19 represent the final stage of processing, for each of the respective cases. In each Figure the output for motion from left to right, motion6, is shown. In each case morebright is multiplied by direction 2 and summed with the result of the multiplication of direction6 by lessbright. This produces the optical flow output as shown in 1701, 1801, & 1901. In all Figures the leading edge is clearly represented, although in Figure 18 a slight "shadow" of the trailing edge is also picked up. This noise is removed by the next two stages of processing, the spatio-temporal normalisation filter and the feature extraction(II) module.
The output from the MOC filter 104 is then fed into the spatio-temporal normalization filter (STNF) 105 as previously described.
With reference to Figure 20, we see the output from case 1 above, 2001, being fed into the STNF and the resulting normalised output 2002.
With reference to Figure 21, the signatures Sx,k=6 2101 and Sy,k=6 2102 for the above example case, created by summing the input 2103 in the X and Y directions as previously described, are shown.
With reference to Figure 22, the sharpened signals, x6 and y6, produced from the F2 layer of the two instar competitive, outstar competitive neural networks corresponding to direction 6 are shown.
With reference to Figure 23, the bounding box calculated from the xk and yk signals is shown. The boundary features calculated in this box are sent on to the Novelty filter neural network 109, which outputs zero as it recognises the shape of the object as a square.
Finally, with reference to Figures 24 and 25, the input and output to and from the instar-outstar competitive networks for a real-data example are shown.

Claims (11)

CLAIMS:
1. Method of tracking multiple objects moving in a field of view comprising the steps of a. processing successive frames of the image of the field of view to obtain optical flow data comprising, for each pixel of the image and for each direction of orientation, a measure of the component changes of image intensity at that pixel, and b. applying the optical flow data components to three-layer neural networks each arranged to output, for each moving boundary defined by the optical flow data, one of the components of the position of its centroid, the learning rules of each network including a decay constant whereby the network functions as a short-term memory and continues to output the approximate centroid component values in the absence of an input.
2. Method according to Claim 1 in which the optical flow data components are pre-processed before being applied to the neural networks to normalise their amplitude range and convert them to short-range time and spatial averages of their individual values.
3. Method according to Claim 1 or Claim 2 in which the boundary centroid components are applied to means also receiving a bit-map image of the field of view and arranged to define separate areas each bounding a moving object.
4. Method according to Claim 3 in which a further neural network is trained to recognise the shape of an object within one said boundary area, whereby the object may be tracked through collisions and partial occlusions.
5. Method according to any preceding claim in which the output component signals are applied to a driver arranged to drive a camera or other device to cause it to follow one of the tracked objects.
6. Apparatus for tracking multiple moving objects in a field of view comprising a sensor continuously receiving images of the field of view and providing pixel intensity signals for successive image frames, a motion-oriented contrast filter comparing successive image frames to derive from them analogue output signals representing components of optical flow of moving boundaries within the field, a feature extraction module resolving the component signals along mutually perpendicular axes to obtain velocity component signals for each moving boundary component within the field, and a set of neural networks receiving the velocity component signals and trained to follow an object being tracked so as to output tracking signals representing the movement of the centroids of their boundaries.
7. Apparatus for tracking multiple moving objects according to claim 6 including a normalising filter receiving the optical flow signals from the motion-oriented contrast filter and scaling and averaging them with time decay before application to the feature extraction module.
8. Apparatus for tracking multiple moving objects according to claim 6 or claim 7 in which the neural networks receiving the velocity component signals are three-layer networks comprising first and third layers to which the velocity component signals are applied and a second, middle, layer providing outputs representing the motion of the centroids of the boundary movement being followed, the middle and third layers being competitive with fixed weights, the activity to the nodes of the second layer from those of the first being given by equation (2) hereinbefore referred to, and the activity to the nodes of the third layer from those of the second being given by equation (4) hereinbefore referred to.
9. Apparatus for tracking multiple moving objects according to any of claims 6 to 8 including means for identifying individual objects including a feature extraction module receiving signals from the sensor and centroid co-ordinate information from the neural networks and generating a set of co-ordinates defining an area corresponding to each moving object, these co-ordinates then being applied to a neural network having a short-term memory and arranged to learn the shape of the object, whereby the object can be identified through occultations and collisions.
10. Method of tracking multiple objects moving in a field of view substantially as hereinbefore described with reference to and as illustrated in the accompanying drawings.
11. Apparatus for tracking multiple moving objects in a field of view substantially as hereinbefore described with reference to and as illustrated in the accompanying drawings.
GB9125376A 1990-11-28 1991-11-28 Tracking using neural networks Withdrawn GB2253109A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB909025797A GB9025797D0 (en) 1990-11-28 1990-11-28 Motion target tracking system

Publications (2)

Publication Number Publication Date
GB9125376D0 GB9125376D0 (en) 1992-01-29
GB2253109A true GB2253109A (en) 1992-08-26

Family

ID=10686076

Family Applications (2)

Application Number Title Priority Date Filing Date
GB909025797A Pending GB9025797D0 (en) 1990-11-28 1990-11-28 Motion target tracking system
GB9125376A Withdrawn GB2253109A (en) 1990-11-28 1991-11-28 Tracking using neural networks

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GB909025797A Pending GB9025797D0 (en) 1990-11-28 1990-11-28 Motion target tracking system

Country Status (2)

Country Link
JP (1) JPH04260979A (en)
GB (2) GB9025797D0 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0590846A2 (en) * 1992-09-30 1994-04-06 General Electric Company Image processor for target detection and tracking
EP0605941A1 (en) * 1992-09-30 1994-07-13 Texas Instruments Incorporated Method and apparatus for tracking an aimpoint with arbitrary subimages
EP0756712A1 (en) * 1994-01-03 1997-02-05 Ail Systems, Inc. Apparatus and method for motion detection and tracking for collision avoidance
EP0850423A1 (en) * 1995-06-20 1998-07-01 Ail Systems, Inc. Target prediction and collision warning system
WO1998037691A1 (en) * 1997-02-21 1998-08-27 Rolf Eckmiller Adaptive active camera system
US9613273B2 (en) 2015-05-19 2017-04-04 Toyota Motor Engineering & Manufacturing North America, Inc. Apparatus and method for object tracking

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008109347A (en) * 2006-10-25 2008-05-08 Konica Minolta Opto Inc Monitoring system
CN102541417B (en) * 2010-12-30 2014-02-26 株式会社理光 Multi-object tracking method and system in virtual touch screen system
KR101880998B1 (en) 2011-10-14 2018-07-24 삼성전자주식회사 Apparatus and Method for motion recognition with event base vision sensor
CN103903278A (en) * 2012-12-28 2014-07-02 重庆凯泽科技有限公司 Moving target detection and tracking system
JP2018013913A (en) * 2016-07-20 2018-01-25 沖電気工業株式会社 Behavior recognition device, behavior recognition method, learning method and program
CN109343363B (en) * 2018-10-30 2020-09-22 清华大学 Motion measurement and control system based on optical calculation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4644397A (en) * 1984-06-15 1987-02-17 Societe De Fabrication D'instruments De Mesure Method of processing a video image to track a bright spot and to protect said tracking from interference from other bright spots
US5008833A (en) * 1988-11-18 1991-04-16 California Institute Of Technology Parallel optoelectronic neural network processors
GB2239369A (en) * 1989-11-09 1991-06-26 Marconi Gec Ltd Image tracking
EP0458656A2 (en) * 1990-05-25 1991-11-27 Canon Kabushiki Kaisha Optical information processing apparatus having a "neuro network" for inducing an error signal

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4644397A (en) * 1984-06-15 1987-02-17 Societe De Fabrication D'instruments De Mesure Method of processing a video image to track a bright spot and to protect said tracking from interference from other bright spots
US5008833A (en) * 1988-11-18 1991-04-16 California Institute Of Technology Parallel optoelectronic neural network processors
GB2239369A (en) * 1989-11-09 1991-06-26 Marconi Gec Ltd Image tracking
EP0458656A2 (en) * 1990-05-25 1991-11-27 Canon Kabushiki Kaisha Optical information processing apparatus having a "neuro network" for inducing an error signal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Proc. SPIE Vol. 1192, Part 2 (1990), page 504: Grossberg, S. *
Proc. 11th Triennial World Congress of the International Federation of Automatic Control '90, pp 205-210 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0590846A2 (en) * 1992-09-30 1994-04-06 General Electric Company Image processor for target detection and tracking
EP0605941A1 (en) * 1992-09-30 1994-07-13 Texas Instruments Incorporated Method and apparatus for tracking an aimpoint with arbitrary subimages
EP0590846A3 (en) * 1992-09-30 1994-11-30 Gen Electric Image processor for target detection and tracking.
EP0756712A1 (en) * 1994-01-03 1997-02-05 Ail Systems, Inc. Apparatus and method for motion detection and tracking for collision avoidance
EP0756712A4 (en) * 1994-01-03 1998-11-04 Ail Systems Inc Apparatus and method for motion detection and tracking for collision avoidance
EP0850423A1 (en) * 1995-06-20 1998-07-01 Ail Systems, Inc. Target prediction and collision warning system
EP0850423A4 (en) * 1995-06-20 1998-11-04 Ail Systems Inc Target prediction and collision warning system
WO1998037691A1 (en) * 1997-02-21 1998-08-27 Rolf Eckmiller Adaptive active camera system
US9613273B2 (en) 2015-05-19 2017-04-04 Toyota Motor Engineering & Manufacturing North America, Inc. Apparatus and method for object tracking
US10210421B2 (en) 2015-05-19 2019-02-19 Toyota Motor Engineering & Manufacturing North America, Inc. Apparatus and method for object tracking

Also Published As

Publication number Publication date
JPH04260979A (en) 1992-09-16
GB9125376D0 (en) 1992-01-29
GB9025797D0 (en) 1991-01-09

Similar Documents

Publication Publication Date Title
US11195038B2 (en) Device and a method for extracting dynamic information on a scene using a convolutional neural network
Seki et al. A robust background subtraction method for changing background
US5912980A (en) Target acquisition and tracking
EP3500979A1 (en) Computer device for training a deep neural network
Nakamura Real-time 3-D object tracking using Kinect sensor
GB2253109A (en) Tracking using neural networks
KR100769461B1 (en) Stereo vision system
Silva et al. Monocular trail detection and tracking aided by visual SLAM for small unmanned aerial vehicles
Kyrkou C 3 Net: end-to-end deep learning for efficient real-time visual active camera control
Owechko et al. Cognitive swarms for rapid detection of objects and associations in visual imagery
Pawar et al. Morphology based moving vehicle detection
Maddalena et al. A self-organizing approach to detection of moving patterns for real-time applications
Li et al. Weak moving object detection in optical remote sensing video with motion-drive fusion network
EP0482427A2 (en) Self adaptive hierarchical target indentification and recognition neural network
Lu et al. Long range traversable region detection based on superpixels clustering for mobile robots
Perez-Cutino et al. Event-based human intrusion detection in UAS using deep learning
CN113255549A (en) Intelligent recognition method and system for pennisseum hunting behavior state
Ridwan Looming object detection with event-based cameras
Osorio et al. Movement and Color Detection of a Dynamic Object: An application to a Mobile Robot
Polat et al. Tracking body parts of multiple people: a new approach
Gurram et al. On the Metrics for Evaluating Monocular Depth Estimation
Mini et al. Visual tracking of objects using multiresolution
CN110264501A (en) A kind of adaptive particle filter video target tracking method and system based on CNN
Howard et al. Real time intelligent target detection and analysis with machine vision
Schmid An approach to model-based 3-D recognition of vehicles in real time by machine vision

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)