WO2018087545A1 - Object location technique - Google Patents

Object location technique

Info

Publication number
WO2018087545A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
coordinate system
video frame
geographic
map
Prior art date
Application number
PCT/GB2017/053366
Other languages
French (fr)
Inventor
Mahdu KIRAN
Mohamed SEDKY
Original Assignee
Staffordshire University
Priority date
Filing date
Publication date
Application filed by Staffordshire University filed Critical Staffordshire University
Publication of WO2018087545A1 publication Critical patent/WO2018087545A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C11/00: Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G01C11/02: Picture taking arrangements specially adapted for photogrammetry or photographic surveying, e.g. controlling overlapping of pictures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10024: Color image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30232: Surveillance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30236: Traffic on road, railway or crossing
    • G: PHYSICS
    • G08: SIGNALLING
    • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00: Burglar, theft or intruder alarms
    • G08B13/18: Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B13/189: Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B13/194: Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B13/196: Actuation by interference with heat, light, or radiation of shorter wavelength; Actuation by intruding sources of heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G08B13/19602: Image analysis to detect motion of the intruder, e.g. by frame subtraction
    • G08B13/19608: Tracking movement of a target, e.g. by detecting an object predefined as a target, using target direction and or velocity to predict its new position

Definitions

  • the present disclosure relates to methods, systems and computer readable media for processing video data captured by a camera, in particular to automatically locate the position of a target object.
  • a particular application of the object location techniques of the present disclosure is in video surveillance systems.
  • Video surveillance systems can use one or more cameras to capture video footage of a surveillance site.
  • Object location techniques can be used to automatically determine the location or position of a target object detected in the captured video data. The position of the object can be displayed as an indicator or icon on a dedicated map of the surveillance site.
  • the present disclosure seeks to extend the functionality of known object location techniques.
  • a method comprising: receiving video data from a camera, the video data comprising a plurality of video frames captured by the camera; tracking a target object in a sequence of video frames of the video data; calculating a position of the target object in the video frame coordinate system; and converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system.
  • the method can convert the position of a target object in captured video data into geographic coordinates in a geographic coordinate system, for example a world geographic coordinate system, to determine the position or location of the target object in the geographic coordinate system.
  • the method may be an automated, computer-implemented method.
  • the method therefore can automatically convert the position of a target object in video data to geographic coordinates in a geographic coordinate system.
  • the method may further comprise using the position of the geographic coordinates to display on a map image an indicator to show the position of the target object on the map image.
  • the map image may be displayed in a map view window.
  • the map image may be a satellite image. The position of the target object in a satellite image can therefore be easily identified by a viewer.
  • Converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system may comprise converting the position of the target object in the video frame coordinate system to a position of the target object in a map coordinate system; and converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
  • Converting the position of the target object in the video frame coordinate system to a position of the target object in the map coordinate system may comprise applying a homographic transformation to the position of the target object in the video frame coordinate system.
  • the homographic transformation may be determined by a mapping of each of the positions of at least four non-collinear points in the video frame coordinate system to a corresponding position of each of the points in the map coordinate system.
  • Converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system may comprise: selecting at least three reference points in the map coordinate system; identifying, for each of the at least three reference points, corresponding geographic coordinates in the geographic coordinate system; determining a mapping for converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system using the reference points and identified geographic coordinates of the reference points; and using the mapping to convert the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
  • the method may comprise displaying the video data in a video data viewing window.
  • the method may comprise displaying a visual identifier for the target object in the video data viewing window.
  • the visual identifier is used to clearly distinguish the target object from the background in the video frame and to allow a viewer of the video data viewing window to quickly and clearly locate the target object within the video frame.
  • the method may comprise displaying the geographic coordinates for the target object in a video data viewing window.
  • the method may comprise calculating the distance between the target object and a fixed point in the video frame.
  • the method may comprise displaying the distance in a video data viewing window.
  • the method may comprise tracking a plurality of target objects.
  • the method may comprise calculating the distance between two target objects.
  • the method may comprise displaying the distance in a video data viewing window.
  • the method may comprise calculating the velocity of a target object.
  • the distance and speed may be calculated from the geographic coordinates.
  • the method may comprise displaying the velocity in a video data viewing window.
  • the method may comprise converting the position of the moving object from the video frame coordinate system to tracking coordinates in a tracking device coordinate system for a tracking device; adjusting the tracking device to direct the tracking device towards the moving object.
  • the tracking device may be a light or a loudspeaker or a camera.
  • the tracking device may be a tracking camera and adjusting the tracking device to direct the tracking device towards the moving object may comprise adjusting the camera to monitor the moving object.
  • the method may further comprise displaying the view captured by the tracking camera.
  • the view captured by the tracking camera may be displayed in a target object viewing window.
  • the tracking coordinates can be calculated from the geographic coordinates.
  • a system comprising a memory, a processor and at least one camera, the system being configured to perform a method according to an aspect of the present disclosure.
  • a computer readable medium having stored therein computer readable instructions which, when executed by a processor, cause the processor to perform a method according to an aspect of the present disclosure.
  • Figure 1 is an illustration of a screen shot
  • Figure 2 is an illustration of a later screen shot
  • Figure 3 is an illustration of an even later screen shot
  • Figure 4 is a flowchart of a method for determining a geolocation of a target object from video data
  • Figure 5 is an illustration of another screen shot
  • Figure 6 is a schematic block diagram of a computer system
  • Figure 7 is a flowchart of a method for object segmentation
  • Figure 8 is a diagram showing how illumination and the image picked up by a camera are related.
  • Figure 9 is a flowchart of a method for determining geographic coordinates of a target object from a video frame.
  • FIGs 1-3 depict an embodiment of the present disclosure in which a target object is tracked by a surveillance system.
  • the coordinates of the target object are determined in the coordinate system of a video frame received from a fixed camera and are used to calculate the geolocation (latitude and longitude) for the target object.
  • the geolocation of the target object is used to focus a pan-tilt-zoom (PTZ) camera on the target object, the PTZ camera being used to follow the object as it moves.
  • FIG. 1 depicts a screen shot of a screen 10 which is displayed to a user.
  • FIG. 1 depicts the screen 10 having a video data viewing window 20, a target object viewing window 30, and a map view window 40.
  • Video data is received from a fixed camera and displayed in the video data viewing window 20.
  • the fixed camera captures video data of a site, which in this example is a car park.
  • a target object 22, which in this example is a car, is identified in a video frame of the video frame data. Identification of the target in this embodiment is performed by object segmentation, in which the motion of the car identifies the car as a target object 22.
  • the position of the target object 22 in a coordinate system of the video frame is determined.
  • the coordinate system corresponds to a horizontal distance and a vertical distance from a corner of the frame displayed in the video data viewing window 20.
  • the position of the target object 22 in the video frame coordinate system is mapped to latitude and longitude coordinates for the target object 22.
  • one reference point may be the corner of the building shown in FIG. 1 for which the latitude and longitude coordinates are known.
  • Another possible reference point may be the tip of a lamppost such as that shown in the image for which the latitude and longitude coordinates are known.
  • the calculated latitude and longitude coordinates of the target object 22 are used to adjust a pan-tilt-zoom (PTZ) camera so that the PTZ camera focuses on the target object 22.
  • the PTZ camera has centred the target object 22 in the target object viewing window 30 and has zoomed to an appropriate level.
  • a map of the car park is depicted, on which the position 42 of the target object 22 is shown.
  • the position 46 of the fixed camera from which the video data is received is also depicted.
  • the position 44 of the PTZ camera from which the images of the target object viewing window 30 are received is also depicted.
  • the video data comprises one or more sequences of frames.
  • the car moves in the car park as the sequence of frames progresses, and the PTZ camera tracks the car as it moves.
  • FIG. 2 depicts a screen shot when a frame later in the sequence of frames than that shown in FIG. 1 is displayed in the video data viewing window 20. That is, the frames in the sequence of frames between the frame depicted in FIG. 1 and the frame depicted in FIG. 2 are not illustrated in the figures for reasons of brevity, but on the actual screen a continuous sequence of frames is shown and the PTZ camera tracks the car as it moves.
  • the target object 22 and associated visual identifier 24 have now moved to a different location in the car park.
  • the new position of the target object 22 in the video frame coordinate system is determined and used to calculate the latitude and longitude coordinates for the target object 22.
  • the latitude and longitude coordinates of the target object 22 are used to direct the PTZ camera to focus on the new location of the target object 22 as shown in the target object viewing window 30.
  • the new location 42 of the target object 22 is also illustrated in the map view window 40.
  • a similar process is followed for all or some of the frames in the sequence between the one shown in FIG. 1 and the one shown in FIG. 2.
  • FIG. 3 depicts a screen shot when a frame later in the sequence of video frames than that shown in FIG. 2 is shown in the video data viewing window 20.
  • a new position of the target object 22 is displayed in the video data viewing window 20 along with a visual identifier for the target object 22.
  • the position of the target object 22 in the video frame coordinate system is determined and the latitude and longitude coordinates for the target object 22 are calculated.
  • the latitude and longitude coordinates for the target object 22 are used to direct the PTZ camera to focus on the newly calculated location of the target object 22 as shown in the target object viewing window 30.
  • the new location 42 of the target object 22 is illustrated in the map view window 40.
  • a continuous sequence of frames is shown between the frames shown in FIGs. 2 and 3 and a similar process is followed for all or some of the frames in the sequence between the one shown in FIG. 2 and the one shown in FIG. 3.
  • FIG. 4 is a flowchart depicting a method for determining a geolocation of a target object from video data. The method is an automated, computer-implemented method.
  • step 405 the process starts as video data is received from a fixed camera.
  • the video data is displayed in a video data viewing window at step 410.
  • a target object is identified in a video frame of the received video data.
  • a target object is typically a moving object within the field of view of the fixed camera.
  • a position of the target object in the video frame coordinate system is determined.
  • a visual identifier for the target object is displayed in the video data viewing window.
  • the displayed visual identifier is used to clearly distinguish the target object from the background in the video frame and to allow a viewer of the video data viewing window to quickly and clearly locate the target object within the video frame.
  • the visual identifier is a rectangle around the target object.
  • step 430 the latitude and longitude coordinates for the target object are calculated. In order to do this, stationary reference points within the video frame are used.
  • These reference points may, for example, include a location of a lamppost or the corner of a building etc.
  • for each of these reference points, coordinates in the video frame coordinate system are known.
  • additionally, for each of these reference points, latitude and longitude coordinates are known. In this way a mapping exists between the coordinates of each reference point in the video frame coordinate system and their respective latitude and longitude coordinates. Accordingly, by comparing the position of the target object in the video frame coordinate system with the position of each reference point in the video frame coordinate system, it is possible to calculate the latitude and longitude coordinates for the target object. Further details on how this can be done are explained below.
  • the calculated latitude and longitude coordinates for the target object are displayed in the video data viewing window. The latitude and longitude coordinates can be displayed in the vicinity of the target object or anywhere within the video data viewing frame.
  • a distance is calculated between the target object and the fixed camera from which the video data is received.
  • Latitude and longitude coordinates of the fixed video camera are known and so the ground distance between the target object and the fixed camera can be calculated by comparing the coordinates of the target object with the coordinates of the fixed camera.
  • the distance data is displayed in the video data viewing window.
  • the pan-tilt-zoom (PTZ) camera is adjusted to monitor the target object.
  • Information concerning the calculated latitude and longitude coordinates for the target object is used to direct the PTZ camera to the target object.
  • the PTZ camera is configured to zoom to an appropriate level for viewing the target object based on the calculated distance data. For example if the ground distance between the target object and the fixed camera is known and the geolocation of the PTZ camera is known then a distance from the PTZ camera to the target object can be calculated and used to adjust the zoom of the PTZ camera.
  • Frames captured by the PTZ camera are displayed in a target object viewing window.
  • an indicator is used to indicate the position of the target object on a map image in a map view window.
  • the method loops back to step 420 for analysis of the next frame. This can occur if, for example, the target object moves and so has a different location in subsequent frames.
  • step 465 if a next video frame is not to be processed then the process ends at step 465. This can occur if, for example, the target object moves out of the field of view of the fixed camera or if no video frames remain to be processed. In this way a target object is monitored as it moves within the field of view of the fixed camera.
  • FIG. 5 depicts a screen shot 10 according to another embodiment.
  • In this embodiment a target object viewing window is not present.
  • Video data is received and shown in the video data viewing window 20 and a moving object 22 is identified within a received video frame of the video data.
  • a visual identifier 24 for the target object 22 is displayed in the video data viewing window 20.
  • Latitude and longitude coordinates are calculated for the target object 22.
  • in a map view window 40 the location 42 of the moving object 22 is shown.
  • the computer apparatus comprises a communications adaptor 605, a processor 610 and a memory 615.
  • the computer apparatus also comprises an input device adaptor 620 for communicating with an input device 625.
  • the computer further comprises a display adaptor 630 for operation with a display 635.
  • the processor 610 is configured to receive data including video data, to access memory 615, and to act upon instructions received either from said memory 615 or said communications adaptor 605.
  • communications adaptor 605 is configured to receive data and to send out data.
  • Data received by the processor 610 includes video data captured by a fixed camera 640.
  • Processor 610 is configured to process the video data from the fixed camera 640.
  • Processor 610 is further configured to identify a target object in a video frame of the video data, calculate the latitude and longitude coordinates of the target object and cause this information to be displayed on the display 635.
  • the processor is further configured to adjust a PTZ camera 650 to track said target object based on the calculated latitude and longitude coordinates for the target object. Video data from the PTZ camera can then be displayed on display 635.
  • the computer apparatus may be a distributed system which is distributed across a network or through dedicated local connections.
  • the methods described herein may be implemented by a computer program.
  • the computer program may include computer executable code or instructions arranged to instruct a computer to perform the functions of one or more of the methods described above.
  • the computer program and/or the code or instructions for performing such methods may be provided to an apparatus, such as a computer, on a computer readable medium or computer program product.
  • the computer readable medium could be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium for data transmission, for example for downloading the code over the Internet.
  • the computer readable medium could take the form of a physical computer readable medium such as semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, and an optical disk, such as a CD-ROM, CD-R/W or DVD.
  • a physical computer readable medium such as semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, and an optical disk, such as a CD-ROM, CD-R/W or DVD.
  • Detection of a moving target object can be performed by any suitable method, such as optical flow, temporal differencing, or background modelling (commonly known as background subtraction), or by a hybrid approach which combines a number of these approaches.
  • a target object is detected by object segmentation methods, such as that set out below.
  • the first step of a typical object segmentation algorithm is to receive a video frame at step 710.
  • the video frame is converted to normalised RGB data at step 720.
  • a pixel of the video frame is compared with a corresponding pixel of a background model.
  • a determination is made as to whether the pixel of the video frame differs from the corresponding pixel of the background model by more than a threshold value. If a determination is made that the difference is greater than the threshold value then the pixel of the video frame is categorised as belonging to a target object at step 750.
  • Alternatively, if the difference does not exceed the threshold value, the pixel of the video frame is categorised as belonging to the background at step 760, i.e. not belonging to a potential target object. If further pixels remain to be processed (step 770) then the process loops to step 730. If no further pixels remain to be processed, then post-processing of the pixels categorised as belonging to the target object occurs at step 780. The process terminates at step 790.
  • an assessment of the reflectivity properties of an object covered by the pixel is used as set out below.
  • the reflectivity of an object is a measure of the amount of light reflected by the object, or radiance, relative to the amount of incident light shone on the object, or irradiance, and is indicative of the reflectance or intrinsic brightness of the object.
  • the reflectivity of an object can be used as a signature of the object. Hence it can be used to segment the object with respect to the remainder of an image.
  • the reflectivity of an object is composed of diffuse (Lambertian, body) and specular (surface) components.
  • the output of a camera depends on three factors: the illuminant, the spectral reflectivity of the surfaces in the scene, and the camera sensor response.
  • the response to light at a given pixel is defined by the triplet of responses given by R, G and B outputs.
  • the R, G and B outputs are related to the illuminant, the camera response and the spectral reflectivity by Equation (1):
  • E(λ) is the spectral power distribution of the illuminant, which can be estimated by the available knowledge of the environment or by reference to a background model
  • S(λ) is a spectral reflectivity function characterising the proportion of light on an object that the object reflects.
  • Q_G(λ) characterises the green camera sensor spectral response characteristics
  • Q_B(λ) characterises the blue camera sensor spectral response characteristics
  • w_d is a parameter for the diffuse reflection 820 component
  • the geometrical parameter for the specular reflection 810 component is given by w_s.
  • the spectral reflectivity function S(λ) is represented as a weighted sum of spectral basis functions by:
  • n = 3 and the spectral basis functions are Parkkinen spectral basis functions (J. Parkkinen et al., Characteristic spectra of Munsell colors, J. Opt. Soc. Am. A, 6(2):318-322, 1989).
  • the aim now is to calculate the weights of the spectral basis functions, to obtain the spectral reflectivity of the object represented by the pixels of the current image.
  • the calculated weights can then be used in the comparison of the video frame with the background model.
  • The model of Equation (1) is rewritten as:
  • the first basis function of Parkkinen is constant, and so the first term can be merged with the specular component term to give:
  • the RGB image of the video frame is represented by basis function weights as an expression of the reflectivity of the surfaces represented in the current image. That is, the R, G and B values of each pixel in the image of the video frame are used to calculate the basis function weights characterising the spectral reflectivity of any surface that pixel covers.
  • the spectral reflectivity of the surfaces of both objects can be compared by cross-correlation.
  • One way to compare the video frame with the background model is to use Equation (12).
  • the background model is formed by averaging over weights (i.e. taking the mean value of the weights) calculated for a number of input images which represent relatively static background frames.
  • the average weights, along with the weights calculated for the video frame, can be substituted into Equation (12) to define
  • BG represents a pixel of the background model and VF represents a pixel of the received video frame.
  • By comparing the calculated value of Equation (15) with threshold values C_max and C_min, each pixel can then be categorised as part of the background or as part of the foreground (steps 750 and 760). In this way, a foreground mask, FGMask, is defined.
  • a post-processing of the foreground mask can begin (step 780).
  • a dilation step is applied, which performs a dilation operation to expand or thicken the initial foreground mask.
  • an erosion operation is applied to the resultant mask, which reduces the size of an object by eliminating area around the object edges.
  • the erosion operation removes noise and eliminates foreground mask details smaller than a predetermined structure element size (e.g. 2 pixels).
  • a geographic coordinate system is a coordinate system that uses geographic coordinates which enable, in principle, every location on the Earth to be specified by a set of coordinates.
  • a geographic coordinate system can be considered as a universal geolocation coordinate system. Embodiments using latitude and longitude coordinates will be described, although any suitable geographic coordinate system can be used.
  • a location of a target object can be derived from video data by any suitable method.
  • FIG. 9 shows a flowchart of a method for deriving geographic coordinates for a target object identified in video data from a fixed camera according to an embodiment.
  • the geographic coordinates of the target object are calculated by first calculating the coordinates of the target object in a site map, such as may be viewed in a map view window, that covers the field of view of the fixed camera.
  • step 910 four points (pixels) are selected within a video frame of the received video data and their respective positions in a video frame coordinate system are identified.
  • the site map is oriented such that north is directed to the positive y-axis (ordinate) of the site map and east is directed towards the positive x-axis (abscissa) of the site map.
  • a perspective transformation matrix is calculated.
  • a homography transformation matrix is calculated using the perspective transformation matrix. The homography transformation matrix maps the positions of the selected four points within the video frame to the corresponding four points within the site map.
  • the position of an object within a map coordinate system of the site map is therefore related to the position of the object within the video frame coordinate system by the transformation:
  • CamPixelx is the position of the pixel covering the object in the x (horizontal) direction in the video frame
  • CamPixely is the position of the pixel covering the object in the y (vertical) direction
  • MapPixelx is the position of the object on the site map in the east-west (horizontal) direction
  • MapPixely is the position of the object on the site map in the north-south (vertical) direction
  • H represents the homography transformation matrix.
  • Steps 910, 920, 930, 940 and 950 lead to the calculation of a homography transformation matrix. Steps 910, 920, 930, 940 and 950 need only be performed once to provide the homography transformation matrix for a given viewpoint of the fixed camera.
  • the homography transformation matrix, H is applied to a position of a target object identified in a video frame, in the video frame coordinate system.
  • the position of the target object can be determined by the position of a pixel at the centre-of-mass of the target object in the video frame.
  • step 970 three corners of the site map are selected and their latitude and longitude coordinates identified.
  • step 980 the height of the map is calculated. That is, the distance, MapImageHeight, between two vertically aligned corners of the site map is calculated.
  • step 980 the width of the map is calculated. That is, the distance, MapImageWidth, between two horizontally aligned corners of the site map is calculated.
  • Steps 970 and 980 are used to calculate appropriate parameters for converting a position of an object in the map coordinate system to a position of the object in latitude and longitude coordinates. Steps 970 and 980 need only be performed once to provide the appropriate parameters.
  • the known latitude and longitude coordinates of the four corners of the site map and the height of the map and the width of the map are used to calculate the latitude and longitude coordinates of the target object.
  • the latitude LAT_TO and longitude LONG_TO coordinates of the target object are calculated as follows (a small numeric sketch of this conversion is given at the end of this list):
  • S_x is the difference between the latitudes (in decimal) of two linear horizontal reference corners
  • S_y is the difference between the longitudes (in decimal) of two linear vertical reference corners
  • MapPixelx is the x location (horizontal) of the target object on the map image
  • MapPixely is the y location (vertical) of the target object on the map image
  • MapOriginx is the horizontal position of the origin in the map image
  • MapOriginy is the vertical position of the origin in the map image.
  • the geographic coordinates can be latitude and longitude coordinates. Coordinates can be calculated in a Universal Transverse Mercator coordinate system.
  • Coordinates can be calculated in a Universal Polar Stereographic coordinate system.
  • a video data viewing window may not be shown on a screen.
  • a target object viewing window may not be shown on a screen.
  • a map view window may not be shown on a screen. Any combination of a target object viewing window, a video data viewing window and a map view window can be shown on one or more screens.
  • a target object can be assigned an index number or other marker. This index number or marker can then be stored and used to identify the same target object in later video frames or in separate video data streams. In this way, once a target object has been identified in a first series of video frames, it is possible to identify the same target object in a second series of video frames.
  • An assigned index number or marker for a target object can be displayed on a screen, for example, in the vicinity of the target object to which that index number or marker is assigned in video frames shown in a video data viewing window.
  • a visual identifier for a target object can take any suitable form for identifying the target object in a video data viewing window. For example, a visual identifier can comprise a box overlaying the image of the target object in the frame shown in the video data viewing window.
  • a visual identifier can comprise an arrow pointing to the target object in the video data viewing window. Any suitable colours or shapes can be used for a visual identifier. Alternatively, in some embodiments, no visual identifiers are shown in the video data viewing window.
  • the latitude and longitude coordinates of a target object can be displayed on the screen.
  • the latitude and longitude coordinates of the target object can be displayed in the vicinity of the target object shown in a video frame displayed in a video data viewing window.
  • a distance from the target object to the fixed camera can be calculated and shown on the screen.
  • a distance from the target object to the PTZ camera can be calculated and displayed on the screen.
  • a distance from the target object to a reference point or landmark can be determined and shown on the screen.
  • An estimated velocity of a target object can be shown on the screen.
  • the velocity of the target object can be calculated by comparing the coordinates of the target object across a series of video frames.
  • the video data shown in a video data viewing window may not originate from a fixed camera. Multiple cameras may be used to track a target object and to better triangulate a position of the target object. Video data from one or more of the multiple cameras can be displayed in a video data viewing window.
  • the location of a target object can be represented in any way in a map view window.
  • the location of a camera can be represented in the map view window.
  • the locations of known landmarks and/or reference points can be represented in the map view window.
  • a target object may be selected by a user.
  • a target object can be detected by motion.
  • One or more target objects may be tracked simultaneously.
  • a distance between two target objects can be calculated from the determined positions of the two target objects.
  • the ground distance between two target objects can be shown on the screen.
  • the ground distance between a first target object and a second target object may be calculated by comparing the latitude and longitude coordinates of the first target object with the latitude and longitude coordinates of the second target object.
  • the ground distance GD between a first target object and a second target object can be calculated as: where R ≈ 6371 km is the approximate radius of the Earth. Parameter s is the central angle between the position of the first target object and the position of the second target object in latitude and longitude coordinates, and is given by:
  • atan2(..., ...) is the arctangent function with two arguments, capable of interpreting the signs of the two arguments to return the appropriate quadrant of the angle, and a is given by:
  • in Equation (22), Δlong and Δlat are given by:
  • (Lat_1, Long_1) are the calculated geographic coordinates of the first target object and (Lat_2, Long_2) are the geographic coordinates of the second target object.
  • the ground distance between a target object and the fixed camera may also be calculated in this way.
  • the ground distance between a first target object and a second target object may be calculated from the determined position of the first target object and the determined position of the second target object in the video frame coordinate system.
  • Identification of a target object can be performed by any suitable means.
  • a target object can be identified by a user.
  • a target object can be identified by motion sensors.
  • object segmentation was used to identify a target object.
  • any suitable method for identifying a target object can be used.
  • a received video frame can be converted into any suitable image representation.
  • any suitable colour space can be chosen, e.g. RGB, normalised RGB, HSV, YIQ, YCbCr, YPbPr, CIELAB and RGB colour ratio.
  • normalised RGB and HSV are the most common colour spaces used for object segmentation as these colour spaces have been shown to be more tolerant of minor variations in the illuminant.
  • a background model can be created using any suitable method.
  • a background model is created by analysing a series of received images which represent relatively static background frames. These frames can be represented by, for example, spectral reflectivity weights as explained above. The weights of each of the background frames can then be averaged to obtain a mean set of background frame weights that are then used to form the background model.
  • the mean background frame can be called BG_mean.
  • any suitable method can be used. Processes are envisaged that do not rely on calculating the spectral reflectivity of objects in a scene.
  • Parkkinen spectral basis functions were used.
  • the skilled person would appreciate that other representations can be used.
  • embodiments can use other spectral basis functions, an appropriate set of eigenvectors of the statistical distribution of the spectral reflectances of Munsell colour chips, or of natural surfaces, or an appropriately selected set of Fourier basis functions.
  • the autocorrelation of the background frame is calculated, without normalisation of its magnitude, as
  • the ratio C_BG between the autocorrelation of the reflectivity of each frame and the autocorrelation of the mean background reflectivity of background frames can be calculated:
  • the threshold values C_min and C_max can then be calculated by
  • the autocorrelation of the video frame, VF_Corr, can then be calculated as
  • a pixel belongs to the foreground object or to the background.
  • a position of the target object in the video frame coordinate system can be determined by any suitable method, and any suitable coordinate system can be used.
  • the position of the target object in the video frame is determined by calculating a horizontal distance and a vertical distance from a predetermined point, such as the bottom left corner of the video frame.
  • Reference points used to map a position of a target object in a video frame coordinate system to latitude and longitude may be chosen by a user.
  • the latitude and longitude coordinates of reference points can be established by manual readings. These manual readings can then be calibrated with a
  • a calculated ground distance from a target object to a PTZ camera can be used to adjust a zoom of the PTZ camera.
  • the zoom of the PTZ camera can alternatively or additionally be adjusted based on a calculated velocity of the target object.
  • the panning and tilting of the PTZ camera can be adjusted according to any suitable criteria.
  • the latitude and longitude coordinates of the target object may be sent to another device capable of tracking or following the movement of the target object.
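The map-pixel to latitude/longitude conversion described above (steps 970-990) can be illustrated with a short numeric sketch. This is a minimal illustration only: the origin position, per-pixel degree spans and example coordinates below are assumed values, not data from the disclosure.

```python
# A minimal numeric sketch of converting a site-map pixel position to
# latitude/longitude. All numeric values are illustrative assumptions.

def map_pixel_to_geo(map_x, map_y,
                     origin_lat, origin_lon,        # geographic coordinates of the map origin pixel
                     map_width_px, map_height_px,
                     lat_span_deg, lon_span_deg,    # corner-to-corner latitude/longitude differences
                     origin_x=0, origin_y=0):
    """Linearly interpolate latitude/longitude from a position on a north-up site map."""
    lat = origin_lat + (map_y - origin_y) / map_height_px * lat_span_deg
    lon = origin_lon + (map_x - origin_x) / map_width_px * lon_span_deg
    return lat, lon

# Example: a 1000 x 800 pixel map whose top-left pixel is taken as the origin.
lat, lon = map_pixel_to_geo(420, 310,
                            origin_lat=52.8105, origin_lon=-2.0840,
                            map_width_px=1000, map_height_px=800,
                            lat_span_deg=-0.0015, lon_span_deg=0.0035)
print(f"target at {lat:.6f}, {lon:.6f}")
```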

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

A method comprising receiving video data from a camera, the video data comprising a plurality of video frames captured by the camera; tracking a target object in a sequence of video frames of the video data; calculating a position of the target object in the video frame coordinate system; and converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system.

Description

Object Location Technique
Field
The present disclosure relates to methods, systems and computer readable media for processing video data captured by a camera, in particular to automatically locate the position of a target object. A particular application of the object location techniques of the present disclosure is in video surveillance systems.
Background
Video surveillance systems can use one or more cameras to capture video footage of a surveillance site. Object location techniques can be used to automatically determine the location or position of a target object detected in the captured video data. The position of the object can be displayed as an indicator or icon on a dedicated map of the surveillance site.
The present disclosure seeks to extend the functionality of known object location techniques.
Summary
According to one aspect of the present disclosure, there is provided a method comprising: receiving video data from a camera, the video data comprising a plurality of video frames captured by the camera; tracking a target object in a sequence of video frames of the video data; calculating a position of the target object in the video frame coordinate system; and converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system. The method can convert the position of a target object in captured video data into geographic coordinates in a geographic coordinate system, for example a world geographic coordinate system, to determine the position or location of the target object in the geographic coordinate system.
The method may be an automated, computer-implemented method. The method therefore can automatically convert the position of a target object in video data to geographic coordinates in a geographic coordinate system.
The method may further comprise using the position of the geographic coordinates to display on a map image an indicator to show the position of the target object on the map image. The map image may be displayed in a map view window. The map image may be a satellite image. The position of the target object in a satellite image can therefore be easily identified by a viewer.
Converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system may comprise converting the position of the target object in the video frame coordinate system to a position of the target object in a map coordinate system; and converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
Converting the position of the target object in the video frame coordinate system to a position of the target object in the map coordinate system may comprise applying a homographic transformation to the position of the target object in the video frame coordinate system.
The homographic transformation may be determined by a mapping of each of the positions of at least four non-collinear points in the video frame coordinate system to a corresponding position of each of the points in the map coordinate system.
Converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system may comprise: selecting at least three reference points in the map coordinate system; identifying, for each of the at least three reference points, corresponding geographic coordinates in the geographic coordinate system; determining a mapping for converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system using the reference points and identified geographic coordinates of the reference points; and using the mapping to convert the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
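As an illustration of this two-stage conversion, the sketch below builds the frame-to-map homography from four point correspondences with OpenCV and then applies a simple linear mapping from map pixels to latitude/longitude. All point correspondences, corner geolocations and the map size are assumed example values, and using two opposite corners of a north-up map is a simplification of the three-reference-point mapping described above.

```python
# A minimal sketch of the two-stage conversion, assuming OpenCV and NumPy are
# available. All numeric values are illustrative assumptions.
import cv2
import numpy as np

# Stage 1: homography from four non-collinear video-frame points to their
# known positions on the site map.
frame_pts = np.float32([[120, 450], [600, 430], [640, 90], [80, 110]])
map_pts = np.float32([[200, 700], [820, 690], [830, 120], [180, 140]])
H = cv2.getPerspectiveTransform(frame_pts, map_pts)

def frame_to_map(x, y):
    """Map a video-frame pixel position to a site-map pixel position."""
    mx, my = cv2.perspectiveTransform(np.float32([[[x, y]]]), H)[0, 0]
    return float(mx), float(my)

# Stage 2: linear mapping from map pixels to latitude/longitude, here using
# the known geographic coordinates of two opposite corners of a north-up map.
TOP_LEFT = (52.8105, -2.0840)        # (lat, lon) of map pixel (0, 0)
BOTTOM_RIGHT = (52.8090, -2.0805)    # (lat, lon) of map pixel (MAP_W, MAP_H)
MAP_W, MAP_H = 1000, 800

def map_to_geo(mx, my):
    lat = TOP_LEFT[0] + (my / MAP_H) * (BOTTOM_RIGHT[0] - TOP_LEFT[0])
    lon = TOP_LEFT[1] + (mx / MAP_W) * (BOTTOM_RIGHT[1] - TOP_LEFT[1])
    return lat, lon

target_lat, target_lon = map_to_geo(*frame_to_map(350, 300))
print(target_lat, target_lon)
```

For a fixed camera viewpoint the homography only needs to be computed once and can then be reused for every tracked position.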
The method may comprise displaying the video data in a video data viewing window. The method may comprise displaying a visual identifier for the target object in the video data viewing window. The visual identifier is used to clearly distinguish the target object from the background in the video frame and to allow a viewer of the video data viewing window to quickly and clearly locate the target object within the video frame.
The method may comprise displaying the geographic coordinates for the target object in a video data viewing window.
The method may comprise calculating the distance between the target object and a fixed point in the video frame. The method may comprise displaying the distance in a video data viewing window.
The method may comprise tracking a plurality of target objects. The method may comprise calculating the distance between two target objects. The method may comprise displaying the distance in a video data viewing window.
The method may comprise calculating the velocity of a target object. To calculate the velocity, the distance and speed may be calculated from the geographic coordinates. The method may comprise displaying the velocity in a video data viewing window.
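For instance, a speed estimate can be obtained from two consecutive geolocated positions and the time between the corresponding frames. The sketch below is a minimal example using a local flat-Earth (equirectangular) approximation; the coordinates and timestamps are assumed values.

```python
# A minimal sketch: approximate ground speed from two geolocated fixes.
# A local equirectangular approximation is adequate over the short distances
# involved in a surveillance site. Example values are assumed.
import math

def speed_mps(lat1, lon1, t1, lat2, lon2, t2, earth_radius_m=6_371_000):
    """Approximate ground speed in metres per second between two (lat, lon, time) fixes."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * earth_radius_m
    dy = math.radians(lat2 - lat1) * earth_radius_m
    return math.hypot(dx, dy) / (t2 - t1)

# Two fixes 0.4 s apart (illustrative values).
print(speed_mps(52.80950, -2.08310, 0.0, 52.80953, -2.08302, 0.4))
```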
The method may comprise converting the position of the moving object from the video frame coordinate system to tracking coordinates in a tracking device coordinate system for a tracking device; adjusting the tracking device to direct the tracking device towards the moving object.
The tracking device may be a light or a loudspeaker or a camera. The tracking device may be a tracking camera and adjusting the tracking device to direct the tracking device towards the moving object may comprise adjusting the camera to monitor the moving object. The method may further comprise displaying the view captured by the tracking camera.
The view captured by the tracking camera may be displayed in a target object viewing window. The tracking coordinates can be calculated from the geographic coordinates.
According to another aspect of the present disclosure, there is provided a system comprising a memory, a processor and at least one camera, the system being configured to perform a method according to an aspect of the present disclosure. According to another aspect of the present disclosure, there is provided a computer readable medium having stored therein computer readable instructions which, when executed by a processor, cause the processor to perform a method according to an aspect of the present disclosure.
The features, functions, and advantages can be achieved independently in various embodiments of the present disclosure or may be combined in yet other
embodiments in which further details can be seen with reference to the following description and drawings.
Brief Description of the Drawings
Illustrative embodiments of the present disclosure will now be described, by way of example only, with reference to the drawings. In the drawings:
Figure 1 is an illustration of a screen shot;
Figure 2 is an illustration of a later screen shot;
Figure 3 is an illustration of an even later screen shot;
Figure 4 is a flowchart of a method for determining a geolocation of a target object from video data;
Figure 5 is an illustration of another screen shot;
Figure 6 is a schematic block diagram of a computer system;
Figure 7 is a flowchart of a method for object segmentation;
Figure 8 is a diagram showing how illumination and the image picked up by a camera are related; and
Figure 9 is a flowchart of a method for determining geographic coordinates of a target object from a video frame.
Throughout the description and the drawings, like reference numerals refer to like parts.
Detailed Description
FIGs 1-3 depict an embodiment of the present disclosure in which a target object is tracked by a surveillance system. In particular, the coordinates of the target object are determined in the coordinate system of a video frame received from a fixed camera and are used to calculate the geolocation (latitude and longitude) for the target object. The geolocation of the target object is used to focus a pan-tilt-zoom (PTZ) camera on the target object, the PTZ camera being used to follow the object as it moves. As such, a motion of the target object is tracked across a sequence of video frames.
FIG. 1 depicts a screen shot of a screen 10 which is displayed to a user. In particular, FIG. 1 depicts the screen 10 having a video data viewing window 20, a target object viewing window 30, and a map view window 40.
Video data is received from a fixed camera and displayed in the video data viewing window 20. The fixed camera captures video data of a site, which in this example is a car park. A target object 22, which in this example is a car, is identified in a video frame of the video frame data. Identification of the target in this embodiment is performed by object segmentation, in which the motion of the car identifies the car as a target object 22. A visual identifier 24, which in this example is a box around the target object 22, is displayed for the target object 22 in the video data viewing window 20.
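The object segmentation in this embodiment uses the spectral-reflectivity comparison described later; the sketch below is a simpler, generic stand-in that follows the same overall flow (compare each frame against a background model, threshold the difference, then clean up the mask), assuming OpenCV and NumPy. The function name and threshold are illustrative.

```python
# A minimal sketch of motion-based segmentation against a background model.
# This uses a plain per-pixel colour difference rather than the
# spectral-reflectivity comparison of the described embodiment.
import cv2
import numpy as np

def segment_moving_objects(frame_bgr, background_bgr, threshold=30):
    """Return a binary foreground mask and bounding boxes of moving regions."""
    diff = cv2.absdiff(frame_bgr, background_bgr)
    dist = diff.astype(np.float32).sum(axis=2)           # per-pixel colour difference
    mask = (dist > threshold).astype(np.uint8) * 255

    # Clean-up in the spirit of the dilation/erosion post-processing described later.
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(mask, kernel, iterations=1)
    mask = cv2.erode(mask, kernel, iterations=1)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
    return mask, boxes
```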
The position of the target object 22 in a coordinate system of the video frame is determined. The coordinate system corresponds to a horizontal distance and a vertical distance from a corner of the frame displayed in the video data viewing window 20.
Using a number of known reference points within the frame, the position of the target object 22 in the video frame coordinate system is mapped to latitude and longitude coordinates for the target object 22. For example, one reference point may be the corner of the building shown in FIG. 1 for which the latitude and longitude
coordinates are known. Another possible reference point may be the tip of a lamppost such as that shown in the image for which the latitude and longitude coordinates are known. By using reference points within the frame shown in the video data viewing window 20 and knowledge of the latitude and longitude coordinates of those reference points, the latitude and longitude coordinates for the target object 22 are calculated.
The calculated latitude and longitude coordinates of the target object 22 are used to adjust a pan-tilt-zoom (PTZ) camera so that the PTZ camera focuses on the target object 22. In the example shown in FIG. 1 , the PTZ camera has centred the target object 22 in the target object viewing window 30 and has zoomed to an appropriate level. Within the map view window 40, a map of the car park is depicted, on which the position 42 of the target object 22 is shown. The position 46 of the fixed camera from which the video data is received is also depicted. The position 44 of the PTZ camera from which the images of the target object viewing window 30 are received is also depicted.
The video data comprises one or more sequences of frames. In the sequence being described the car moves in the car park as the sequence of frames progresses, and the PTZ camera tracks the car as it moves. FIG. 2 depicts a screen shot when a frame later in the sequence of frames than that shown in FIG. 1 is displayed in the video data viewing window 20. That is, the frames in the sequence of frames between the frame depicted in FIG. 1 and the frame depicted in FIG. 2 are not illustrated in the figures for reasons of brevity, but on the actual screen a continuous sequence of frames is shown and the PTZ camera tracks the car as it moves. In the depicted frame of the video data the target object 22 and associated visual identifier 24 has now moved to a different location in the car park. The new position of the target object 22 in the video frame coordinate system is determined and used to calculate the latitude and longitude coordinates for the target object 22. The latitude and longitude coordinates of the target object 22 are used to direct the PTZ camera to focus on the new location of the target object 22 as shown in the target object viewing window 30. The new location 42 of the target object 22 is also illustrated in the map view window 40. A similar process is followed for all or some of the frames in the sequence between the one shown in FIG. 1 and the one shown in FIG. 2.
FIG. 3 depicts a screen shot when a frame later in the sequence of video frames than that shown in FIG. 2 is shown in the video data viewing window 20. A new position of the target object 22 is displayed in the video data viewing window 20 along with a visual identifier for the target object 22. Once again the position of the target object 22 in the video frame coordinate system is determined and the latitude and longitude coordinates for the target object 22 are calculated. The latitude and longitude coordinates for the target object 22 are used to direct the PTZ camera to focus on the newly calculated location of the target object 22 as shown in the target object viewing window 30. The new location 42 of the target object 22 is illustrated in the map view window 40. Again, as with the transition between the frames shown in FIGs. 1 and 2, on the actual screen a continuous sequence of frames is shown between the frames shown in FIGs. 2 and 3 and a similar process is followed for all or some of the frames in the sequence between the one shown in FIG. 2 and the one shown in FIG. 3.
FIG. 4 is a flowchart depicting a method for determining a geolocation of a target object from video data. The method is an automated, computer-implemented method.
Referring to FIG. 4, at step 405 the process starts as video data is received from a fixed camera. The video data is displayed in a video data viewing window at step 410. At step 415 a target object is identified in a video frame of the received video data. A target object is typically a moving object within the field of view of the fixed camera.
At step 420, a position of the target object in the video frame coordinate system is determined.
At step 425, a visual identifier for the target object is displayed in the video data viewing window. The displayed visual identifier is used to clearly distinguish the target object from the background in the video frame and to allow a viewer of the video data viewing window to quickly and clearly locate the target object within the video frame. In the depicted embodiments the visual identifier is a rectangle around the target object.
At step 430 the latitude and longitude coordinates for the target object are calculated. In order to do this, stationary reference points within the video frame are used.
These reference points may, for example, include the location of a lamppost or the corner of a building, etc. For each of these reference points, coordinates in the video frame coordinate system are known. Additionally, for each of these reference points, latitude and longitude coordinates are known. In this way a mapping exists between the coordinates of each reference point in the video frame coordinate system and their respective latitude and longitude coordinates. Accordingly, by comparing the position of the target object in the video frame coordinate system with the position of each reference point in the video frame coordinate system, it is possible to calculate the latitude and longitude coordinates for the target object. Further details on how this can be done are explained below. At step 435 the calculated latitude and longitude coordinates for the target object are displayed in the video data viewing window. The latitude and longitude coordinates can be displayed in the vicinity of the target object or anywhere within the video data viewing window.
At step 440 a distance is calculated between the target object and the fixed camera from which the video data is received. Latitude and longitude coordinates of the fixed video camera are known and so the ground distance between the target object and the fixed camera can be calculated by comparing the coordinates of the target object with the coordinates of the fixed camera.
At step 445, the distance data is displayed in the video data viewing window.
At step 450, the pan-tilt-zoom (PTZ) camera is adjusted to monitor the target object. Information concerning the calculated latitude and longitude coordinates for the target object is used to direct the PTZ camera to the target object. The PTZ camera is configured to zoom to an appropriate level for viewing the target object based on the calculated distance data. For example if the ground distance between the target object and the fixed camera is known and the geolocation of the PTZ camera is known then a distance from the PTZ camera to the target object can be calculated and used to adjust the zoom of the PTZ camera. Frames captured by the PTZ camera are displayed in a target object viewing window.
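By way of illustration only, the sketch below shows one way the calculated distance could drive the zoom setting of the PTZ camera. The function name, the assumed target width, field of view and zoom limit are illustrative assumptions and do not form part of the described embodiment.

```python
import math

def ptz_zoom_for_distance(distance_m, target_width_m=2.0,
                          fov_deg_at_1x=60.0, max_zoom=30.0):
    """Illustrative zoom factor so a target of assumed width fills ~1/3 of the frame.

    distance_m      -- ground distance from the PTZ camera to the target (metres)
    target_width_m  -- assumed physical width of the target (e.g. a car)
    fov_deg_at_1x   -- assumed horizontal field of view of the camera at 1x zoom
    max_zoom        -- assumed optical zoom limit of the PTZ camera
    """
    # Width of the scene visible at 1x zoom at the target's distance.
    scene_width_at_1x = 2.0 * distance_m * math.tan(math.radians(fov_deg_at_1x / 2.0))
    # Zoom so that the target occupies roughly one third of the frame width.
    desired = scene_width_at_1x / (3.0 * target_width_m)
    return max(1.0, min(desired, max_zoom))
```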
At step 455, an indicator is used to indicate the position of the target object on a map image in a map view window.
At step 460, a decision is made as to whether or not a next video frame should be processed.
If subsequent frames are to be processed, the method loops back to step 420 for analysis of the next frame. This can occur if, for example, the target object moves and so has a different location in subsequent frames.
Alternatively, if a next video frame is not to be processed then the process ends at step 465. This can occur if, for example, the target object moves out of the field of view of the fixed camera or if no video frames remain to be processed. In this way a target object is monitored as it moves within the field of view of the fixed camera.
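A minimal sketch of the per-frame loop of FIG. 4 is given below. The helper callables (detect_target, frame_to_latlong, ground_distance, point_ptz_at) and the display and map-view objects are assumed interfaces standing in for the steps described above; they are not defined by the disclosure.

```python
def track_target(frames, camera_latlong, detect_target, frame_to_latlong,
                 ground_distance, point_ptz_at, display, map_view):
    """Illustrative per-frame loop following steps 410-465 of FIG. 4.

    All arguments other than `frames` and `camera_latlong` are assumed
    callables/objects supplied by the caller.
    """
    for frame in frames:                                    # steps 405-410
        display.show_frame(frame)
        target = detect_target(frame)                       # step 415
        if target is None:
            break                                           # step 465: stop processing
        x, y = target["position"]                           # step 420: frame coordinates
        display.draw_box(target["bbox"])                    # step 425: visual identifier
        lat, lon = frame_to_latlong(x, y)                   # step 430: geographic coords
        display.draw_text(f"{lat:.6f}, {lon:.6f}")          # step 435
        dist = ground_distance((lat, lon), camera_latlong)  # step 440
        display.draw_text(f"{dist:.1f} m")                  # step 445
        point_ptz_at(lat, lon, dist)                        # step 450: adjust PTZ camera
        map_view.mark(lat, lon)                             # step 455: indicator on map
```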
FIG. 5 depicts a screen shot 10 according to another embodiment. In this
embodiment a target object viewing window is not present. Video data is received and shown in the video data viewing window 20 and a moving object 22 is identified within a received video frame of the video data. A visual identifier 24 for the target object 22 is displayed in the video data viewing window 20. Latitude and longitude coordinates are calculated for the target object 22.
In a map view window 40 the location 42 of the moving object 22 is shown.
The methods described herein may be implemented on a computer apparatus such as that illustrated in FIG. 6. Referring to FIG. 6, the computer apparatus comprises a communications adaptor 605, a processor 610 and a memory 615. The computer apparatus also comprises an input device adaptor 620 for communicating with an input device 625. The computer further comprises a display adaptor 630 for operation with a display 635. The processor 610 is configured to receive data including video data, to access the memory 615, and to act upon instructions received either from said memory 615 or said communications adaptor 605. The
communications adaptor 605 is configured to receive data and to send out data.
Data received by the processor 610 includes video data captured by a fixed camera 640. Processor 610 is configured to process the video data from the fixed camera 640. Processor 610 is further configured to identify a target object in a video frame of the video data, calculate the latitude and longitude coordinates of the target object and cause this information to be displayed on the display 635. The processor is further configured to adjust a PTZ camera 650 to track said target object based on the calculated latitude and longitude coordinates for the target object. Video data from the PTZ camera can then be displayed on display 635.
Other architectures to that shown in FIG. 6 may be used as will be appreciated by the skilled person. For example, the computer apparatus may be a distributed system which is distributed across a network or through dedicated local connections.
The methods described herein may be implemented by a computer program. The computer program may include computer executable code or instructions arranged to instruct a computer to perform the functions of one or more of the methods described above. The computer program and/or the code or instructions for performing such methods may be provided to an apparatus, such as a computer, on a computer readable medium or computer program product. The computer readable medium could be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium for data transmission, for example for downloading the code over the Internet. Alternatively, the computer readable medium could take the form of a physical computer readable medium such as semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, and an optical disk, such as a CD-ROM, CD-R/W or DVD.
Detection
Detection of a moving target object can be performed by any suitable method, such as optical flow, temporal differencing, or background modelling (commonly known as background subtraction), or by a hybrid approach which combines several of these. In embodiments, a target object is detected by object segmentation methods, such as that set out below.
Referring to FIG. 7, the first step of a typical object segmentation algorithm is to receive a video frame at step 710. The video frame is converted to normalised RGB data at step 720. At step 730, a pixel of the video frame is compared with a corresponding pixel of a background model. At step 740, a determination is made as to whether the pixel of the video frame differs from the corresponding pixel of the background model by more than a threshold value. If a determination is made that the difference is greater than a threshold value then the pixel of the video frame is categorised as belonging to a target object at step 750. Alternatively, if a
determination is made that the difference is less than the threshold value then the pixel of the video frame is categorised as belonging to the background at step 760, i.e. not belonging to a potential target object. If further pixels remain to be processed (step 770) then the process loops to step 730. If no further pixels remain to be processed, then post-processing of the pixels categorised as belonging to the target object occurs at step 780. The process terminates at step 790. In order to compare a pixel of a received video frame with a corresponding pixel of a background model at step 730, in some embodiments an assessment of the reflectivity properties of an object covered by the pixel is used, as set out below.
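For illustration, the following is a minimal sketch of steps 720-760 using a plain per-channel difference threshold in normalised RGB; the reflectivity-based comparison developed below can be substituted for the simple test used here. The function name and the threshold value are assumptions.

```python
import numpy as np

def segment_foreground(frame_rgb, background_rgb, threshold=0.08):
    """Steps 720-760 of FIG. 7, sketched with a simple per-pixel difference test.

    Both images are converted to normalised RGB (each channel divided by the
    channel sum), and a pixel is labelled foreground (1) when it differs from
    the background model by more than `threshold` in any channel.
    """
    def normalise(img):
        img = img.astype(np.float32)
        s = img.sum(axis=2, keepdims=True)
        return img / np.maximum(s, 1e-6)               # step 720: normalised RGB

    diff = np.abs(normalise(frame_rgb) - normalise(background_rgb))   # step 730
    mask = (diff.max(axis=2) > threshold).astype(np.uint8)            # steps 740-760
    return mask
```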
The reflectivity of an object is a measure of the amount of light reflected by the object, or radiance, relative to the amount of incident light shone on the object, or irradiance, and is indicative of the reflectance or intrinsic brightness of the object. The reflectivity of an object can be used as a signature of the object. Hence it can be used to segment the object with respect to the remainder of an image. The reflectivity of an object is composed of diffuse (Lambertian, body) and specular (surface) components.
Referring to FIG. 8, the output of a camera depends on three factors:
1. Illuminant (light source) (E(λ)) 800
2. Spectral response characteristics (Q) of camera (sensors) 840
3. Reflectivity (S(λ)) of the object 830
For an RGB camera, the response to light at a given pixel is defined by the triplet of responses given by the R, G and B outputs. The R, G and B outputs are related to the illuminant, the camera response and the spectral reflectivity by Equation (1):

$$
\begin{aligned}
R &= w_d \int E(\lambda)\,S(\lambda)\,Q_R(\lambda)\,d\lambda + w_s \int E(\lambda)\,Q_R(\lambda)\,d\lambda \\
G &= w_d \int E(\lambda)\,S(\lambda)\,Q_G(\lambda)\,d\lambda + w_s \int E(\lambda)\,Q_G(\lambda)\,d\lambda \\
B &= w_d \int E(\lambda)\,S(\lambda)\,Q_B(\lambda)\,d\lambda + w_s \int E(\lambda)\,Q_B(\lambda)\,d\lambda
\end{aligned}
\tag{1}
$$

where λ is the wavelength (the visible range is approximately from 400 nm to 700 nm), E(λ) is the spectral power distribution of the illuminant, which can be estimated from the available knowledge of the environment or by reference to a background model, and S(λ) is a spectral reflectivity function characterising the proportion of light incident on an object that the object reflects.
Function QR(λ) characterises the red camera sensor spectral response characteristics, which determine the proportion of the red colour signal the sensor absorbs on a per-wavelength basis and can, for example, be obtained from a manufacturer's data sheet. QG(λ) characterises the green camera sensor spectral response characteristics; QB(λ) characterises the blue camera sensor spectral response characteristics; wd is a parameter for the diffuse reflection 820 component; and the geometrical parameter for the specular reflection 810 component is given by ws.
In order to build a computational physical model, the spectral reflectivity function S(λ) is represented as a weighted sum of spectral basis functions by:

$$
S(\lambda) = \sum_{i=1}^{n} w_i\,\varphi_i(\lambda)
$$

where φi(λ) is a basis function and wi is a corresponding weight. In the example that follows, n = 3 and the spectral basis functions are Parkkinen spectral basis functions (J. Parkkinen et al., Characteristic spectra of Munsell colors, J. Opt. Soc. Am. A, 6(2):318-322, 1989).
The aim now is to calculate the weights of the spectral basis functions, to obtain the spectral reflectivity of the object represented by the pixels of the current image. The calculated weights can then be used in the comparison of the video frame with the background model.
The model of Equation (1) is rewritten as:
Figure imgf000014_0002
where
Figure imgf000015_0006
The first basis function of Parkkinen is constant, and so
Figure imgf000015_0007
Figure imgf000015_0005
The first term can be merged with the specular component term to give:
Figure imgf000015_0001
Next Xi Yi and Zi are calculated by:
Figure imgf000015_0002
These integrations are calculated to obtain the transformation matrix
Figure imgf000015_0003
Now the weights of the basis functions can be obtained from RGB values by:
Figure imgf000015_0004
Figure imgf000016_0001
As a special case, for diffuse-only reflection ws = 0:
Figure imgf000016_0002
where
Figure imgf000016_0003
By using the transformation of Equation (12), the RGB image of the video frame is represented by basis function weights as an expression of the reflectivity of the surfaces represented in the current image. That is, the R, G and B values of each pixel in the image of the video frame are used to calculate the basis function weights characterising the spectral reflectivity of any surface that pixel covers.
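A short sketch of this per-pixel conversion is given below. It assumes that the 3×3 transformation matrix relating basis-function weights to RGB responses (here called M) has already been computed from the illuminant, the sensor responses and the basis functions, so that applying its inverse to each RGB triplet recovers the weights; the function name and argument layout are assumptions.

```python
import numpy as np

def rgb_to_reflectivity_weights(frame_rgb, M):
    """Convert each pixel's (R, G, B) triplet into three basis-function weights.

    `M` is assumed to be a precomputed 3x3 matrix mapping basis-function
    weights to RGB responses; its inverse therefore maps RGB back to weights.
    """
    M_inv = np.linalg.inv(M)
    pixels = frame_rgb.reshape(-1, 3).astype(np.float64)
    weights = pixels @ M_inv.T          # equivalent to M_inv @ rgb for each pixel
    return weights.reshape(frame_rgb.shape[:2] + (3,))
```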
In order to compare two objects (whether they have the same surface material and colour or not), the spectral reflectivity of the surfaces of both objects can be compared by cross-correlation.
This can be achieved by finding the correlation between the Parkkinen basis functions:
Figure imgf000016_0004
If the first surface has weights given by w1, w2 and w3, and the second surface has weights given by w'1, w'2 and w'3, then the correlation, indicating the degree of similarity between the two surfaces, becomes:
Figure imgf000017_0001
where I1 and I2 represent the first and second surfaces.
One way to compare the video frame with the background model is to use Equation (12). The background model is formed by averaging over weights (i.e. taking the mean value of the weights) calculated for a number of input images which represent relatively static background frames. The average weights, along with the weights calculated for the video frame, can be substituted into Equation (12) to define
Figure imgf000017_0002
where BG represents a pixel of the background model and VF represents a pixel of the received video frame.
By comparing the calculated value of Equation (15) with threshold values Cmax and Cmin, each pixel can then be categorised as part of the background or as part of the foreground (steps 750 and 760). In this way, a foreground mask, FGMask, is defined:
$$
FGMask =
\begin{cases}
1 & \text{if } C > C_{max} \text{ or } C < C_{min} \\
0 & \text{otherwise}
\end{cases}
$$
Once all pixels have been processed, post-processing of the foreground mask can begin (step 780). After the initial foreground mask has been created from the video frame, a dilation operation is applied to expand or thicken the initial foreground mask. An erosion operation is then applied to the resultant mask, which reduces the size of an object by eliminating area around the object edges. The erosion operation removes noise and eliminates foreground mask details smaller than a predetermined structuring element size (e.g. 2 pixels).
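The following sketch shows this post-processing step using OpenCV morphological operations; the 2-pixel structuring element follows the example size mentioned above, and everything else is an assumption.

```python
import cv2

def postprocess_mask(fg_mask, struct_size=2):
    """Step 780: dilate the initial foreground mask, then erode the result.

    `fg_mask` is a single-channel uint8 mask (1 = foreground). Details smaller
    than the structuring element are removed as noise.
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (struct_size, struct_size))
    dilated = cv2.dilate(fg_mask, kernel)   # thicken the mask, close small gaps
    cleaned = cv2.erode(dilated, kernel)    # shrink back, removing small details
    return cleaned
```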
In the way described above, a target object is identified.
Other methods of identifying a target object in a video frame are possible, as would be appreciated by the person skilled in the art.

Derivation of geolocation information
A geographic coordinate system is a coordinate system that uses geographic coordinates which enable, in principle, every location on the Earth to be specified by a set of coordinates. A geographic coordinate system can be considered as a universal geolocation coordinate system. Embodiments using latitude and longitude coordinates will be described, although any suitable geographic coordinate system can be used.
A location of a target object can be derived from video data by any suitable method. FIG. 9 shows a flowchart of a method for deriving geographic coordinates for a target object identified in video data from a fixed camera according to an embodiment. In this embodiment, the geographic coordinates of the target object are calculated by first calculating the coordinates of the target object in a site map, such as may be viewed in a map view window, that covers the field of view of the fixed camera.
At step 910, four points (pixels) are selected within a video frame of the received video data and their respective positions in a video frame coordinate system are identified.
At step 920, the site map is oriented such that north is directed to the positive y-axis (ordinate) of the site map and east is directed towards the positive x-axis (abscissa) of the site map.
At step 930, four points are selected within the site map, the four points
corresponding to the respective positions of each of the objects covered by the four points selected within the video frame.
At step 940, a perspective transformation matrix is calculated. At step 950, a homography transformation matrix is calculated using the perspective transformation matrix. The homography transformation matrix maps the positions of the selected four points within the video frame to the corresponding four points within the site map.
The position of an object within a map coordinate system of the site map is therefore related to the position of the object within the video frame coordinate system by the transformation:
$$
\begin{bmatrix} MapPixel_x \\ MapPixel_y \\ 1 \end{bmatrix}
= H
\begin{bmatrix} CamPixel_x \\ CamPixel_y \\ 1 \end{bmatrix}
$$

where CamPixelx is the position of the pixel covering the object in the x (horizontal) direction in the video frame, CamPixely is the position of the pixel covering the object in the y (vertical) direction, MapPixelx is the position of the object on the site map in the east-west (horizontal) direction, MapPixely is the position of the object on the site map in the north-south (vertical) direction, and H represents the homography transformation matrix (the equality holding up to the usual homogeneous scale factor).
Steps 910, 920, 930, 940 and 950 lead to the calculation of a homography transformation matrix. Steps 910, 920, 930, 940 and 950 need only be performed once to provide the homography transformation matrix for a given viewpoint of the fixed camera.
At step 960, the homography transformation matrix, H , is applied to a position of a target object identified in a video frame, in the video frame coordinate system. The position of the target object can be determined by the position of a pixel at the centre- of-mass of the target object in the video frame. By applying the homography transformation matrix to the position of the target object within the video frame, a position of the target object within a map coordinate system of the site map is identified.
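The sketch below illustrates steps 910-960 using OpenCV, which is one way of realising the perspective/homography calculation; the four corresponding point pairs are illustrative placeholder values, not taken from the embodiment.

```python
import cv2
import numpy as np

# Steps 910-950: four corresponding points (pixels), picked once for a given
# camera viewpoint.  The values below are illustrative placeholders only.
camera_pts = np.float32([[102, 540], [873, 512], [640, 188], [215, 166]])
map_pts    = np.float32([[ 60, 410], [505, 402], [430,  95], [110,  80]])

H = cv2.getPerspectiveTransform(camera_pts, map_pts)   # homography matrix

def camera_to_map(cam_x, cam_y, H):
    """Step 960: map a target position in the video frame to site-map pixels."""
    pt = np.float32([[[cam_x, cam_y]]])
    map_x, map_y = cv2.perspectiveTransform(pt, H)[0, 0]
    return float(map_x), float(map_y)
```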
At step 970, three corners of the site map are selected and their latitude and longitude coordinates identified. At step 980, the height of the map is calculated. That is, the distance, MapImageHeight, between two vertically aligned corners of the site map is calculated. Additionally, at step 980 the width of the map is calculated. That is, the distance, MapImageWidth, between two horizontally aligned corners of the site map is calculated.
Steps 970 and 980 are used to calculate appropriate parameters for converting a position of an object in the map coordinate system to a position of the object in latitude and longitude coordinates. Steps 970 and 980 need only be performed once to provide the appropriate parameters. At step 990, the known latitude and longitude coordinates of the four corners of the site map and the height of the map and the width of the map are used to calculate the latitude and longitude coordinates of the target object. The latitude LAT_TO and longitude LONG_TO coordinates of the target object are calculated by:
Figure imgf000020_0001
where δx is the difference between the latitudes (in decimal) of two linear horizontal reference corners, δy is the difference between the longitudes (in decimal) of two linear vertical reference corners, MapPixelx is the x location (horizontal) of the target object on the map image, MapPixely is the y location (vertical) of the target object on the map image, MapOriginx is the horizontal position of the origin in the map image, and MapOriginy is the vertical position of the origin in the map image.
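Because the original expression is not reproduced in this text, the following is only a plausible linear-interpolation sketch consistent with the parameters described above (known corner coordinates and the map height and width). It assumes a north-up map image with its pixel origin at the top-left corner; the function and argument names are assumptions.

```python
def map_pixel_to_latlong(map_x, map_y,
                         corner_nw_latlong, corner_se_latlong,
                         map_width_px, map_height_px):
    """Plausible linear interpolation from site-map pixels to latitude/longitude.

    `corner_nw_latlong` is the (lat, long) of the top-left (north-west) corner
    and `corner_se_latlong` that of the bottom-right (south-east) corner.
    Pixel (0, 0) is taken to be the top-left of the map image.
    """
    nw_lat, nw_long = corner_nw_latlong
    se_lat, se_long = corner_se_latlong
    lat = nw_lat + (se_lat - nw_lat) * (map_y / map_height_px)
    lon = nw_long + (se_long - nw_long) * (map_x / map_width_px)
    return lat, lon
```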
Variations of the described embodiments are envisaged, and the features of the disclosed embodiments can be combined in any way.
The geographic coordinates can be latitude and longitude coordinates. Coordinates can be calculated in a Universal Transverse Mercator coordinate system.
Coordinates can be calculated in a Universal Polar Stereographic coordinate system.
A video data viewing window may not be shown on a screen. A target object viewing window may not be shown on a screen. A map view window may not be shown on a screen. Any combination of a target object viewing window, a video data viewing window and a map view window can be shown on one or more screens.
A target object can be assigned an index number or other marker. This index number or marker can then be stored and used to identify the same target object in later video frames or in separate video data streams. In this way, once a target object has been identified in a first series of video frames, it is possible to identify the same target object in a second series of video frames. An assigned index number or marker for a target object can be displayed on a screen, for example, in the vicinity of the target object to which that index number or marker is assigned in video frames shown in a video data viewing window. A visual identifier for a target object can take any suitable form for identifying the target object in a video data viewing window. For example, a visual identifier can comprise a box overlaying the image of the target object in the frame shown in the video data viewing window. For example, a visual identifier can comprise an arrow pointing to the target object in the video data viewing window. Any suitable colours or shapes can be used for a visual identifier. Alternatively, in some embodiments, no visual identifiers are shown in the video data viewing window.
Other information can be displayed on the screen. The latitude and longitude coordinates of a target object can be displayed on the screen. In particular, the latitude and longitude coordinates of the target object can be displayed in the vicinity of the target object shown in a video frame displayed in a video data viewing window. A distance from the target object to the fixed camera can be calculated and shown on the screen. A distance from the target object to the PTZ camera can be calculated and displayed on the screen. A distance from the target object to a reference point or landmark can be determined and shown on the screen.
An estimated velocity of a target object can be shown on the screen. The velocity of the target object can be calculated by comparing the coordinates of the target object across a series of video frames.
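A minimal sketch of such a speed estimate from two consecutive geographic fixes is given below; the ground_distance argument is assumed to be a callable such as the haversine sketch given later in this section.

```python
def estimate_speed(latlong_prev, latlong_curr, frame_interval_s, ground_distance):
    """Estimate the target's ground speed (m/s) from two consecutive fixes.

    `ground_distance` is a callable returning the distance in metres between
    two (lat, long) pairs; `frame_interval_s` is the time between the fixes.
    """
    return ground_distance(latlong_prev, latlong_curr) / frame_interval_s
```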
Other variations of the described embodiments are also contemplated. For example, the video data shown in a video data viewing window may not originate from a fixed camera. Multiple cameras may be used to track a target object and to better triangulate a position of the target object. Video data from one or more of the multiple cameras can be displayed in a video data viewing window.
The location of a target object can be represented in any way in a map view window. For example, the location of a camera can be represented in the map view window. The locations of known landmarks and/or reference points can be represented in the map view window.
A target object may be selected by a user. A target object can be detected by motion.
One or more target objects may be tracked simultaneously. A distance between two target objects can be calculated from the determined positions of the two target objects. The ground distance between two target objects can be shown on the screen. The ground distance between a first target object and a second target object may be calculated by comparing the latitude and longitude coordinates of the first target object with the latitude and longitude coordinates of the second target object.
The ground distance GD between a first target object and a second target object can be calculated as:
$$
GD = R \cdot s
$$

where R ≈ 6371 km is the approximate radius of the Earth. Parameter s is the central angle between the position of the first target object and the position of the second target object in latitude and longitude coordinates, and is given by:
$$
s = 2\,\operatorname{atan2}\!\left(\sqrt{a},\;\sqrt{1-a}\right)
$$

where atan2(·, ·) is the arctangent function with two arguments, capable of interpreting the signs of the two arguments to return the appropriate quadrant of the angle, and a is given by:

$$
a = \sin^{2}\!\left(\frac{\Delta lat}{2}\right) + \cos(Lat_1)\,\cos(Lat_2)\,\sin^{2}\!\left(\frac{\Delta long}{2}\right)
\tag{22}
$$
In Equation (22), Δlong and Δlat are given by:
$$
\Delta lat = Lat_2 - Lat_1, \qquad \Delta long = Long_2 - Long_1
$$
where (Lat1, Long1) are the calculated geographic coordinates of the first target object and (Lat2, Long2) are the geographic coordinates of the second target object. The ground distance between a target object and the fixed camera may also be calculated in this way. The ground distance between a first target object and a second target object may be calculated from the determined position of the first target object and the determined position of the second target object in the video frame coordinate system.
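For illustration, the relations above can be written directly as a short function; the function and argument names are assumptions.

```python
import math

def ground_distance(p1, p2, earth_radius_m=6371000.0):
    """Ground distance in metres between two (latitude, longitude) fixes,
    following the haversine relations set out above."""
    lat1, long1 = map(math.radians, p1)
    lat2, long2 = map(math.radians, p2)
    d_lat, d_long = lat2 - lat1, long2 - long1
    a = math.sin(d_lat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(d_long / 2) ** 2
    s = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))   # central angle
    return earth_radius_m * s
```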
Identification of a target object can be performed by any suitable means. A target object can be identified by a user. A target object can be identified by motion sensors.
In the embodiment described in relation to FIGs 7 and 8, object segmentation was used to identify a target object. However, any suitable method for identifying a target object can be used.
Variations on the described method of object segmentation are envisaged such as those set out below.
A received video frame can be converted into any suitable image representation. In this respect, any suitable colour space can be chosen, e.g. RGB, normalised RGB, HSV, YIQ, YCbCr, YPbPr, CIELAB and RGB colour ratio. However, normalised RGB and HSV are the most common colour spaces used for object segmentation as these colour spaces have been shown to be more tolerant of minor variations in the illuminant.
A background model can be created using any suitable method. In certain embodiments, a background model is created by analysing a series of received images which represent relatively static background frames. These frames can be represented by, for example, spectral reflectivity weights as explained above. The weights of each of the background frames can then be averaged to obtain a mean set of background frame weights that are then used to form the background model. The mean background frame can be called BGmean.
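A one-line sketch of this averaging, assuming each training frame has already been converted to per-pixel reflectivity weights (for example by the rgb_to_reflectivity_weights sketch above):

```python
import numpy as np

def build_background_model(weight_frames):
    """Average the per-pixel basis-function weights of static training frames.

    `weight_frames` is an iterable of HxWx3 arrays of reflectivity weights;
    the per-pixel mean, BGmean, is used as the background model.
    """
    return np.mean(np.stack(list(weight_frames), axis=0), axis=0)
```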
When comparing the video frame to the background model, any suitable method can be used. Processes are envisaged that do not rely on calculating the spectral reflectivity of objects in a scene.
In the embodiment described in relation to FIGs 7 and 8, Parkkinen spectral basis functions were used. The skilled person would appreciate that other representations can be used. For example, instead of Parkkinen basis functions, embodiments can use other spectral basis functions, an appropriate set of eigenvectors of the statistical distribution of the spectral reflectances of Munsell colour chips, or of natural surfaces, or an appropriately selected set of Fourier basis functions.
In certain embodiments, the autocorrelation of the background frame is calculated, without normalisation of its magnitude, as
Figure imgf000024_0001
For each of the background training frames BGi where i represents the background frame number, the ratio CBG between the autocorrelation of the reflectivity of each frame and the autocorrelation of the mean background reflectivity of background frames, can be calculated:
Figure imgf000024_0002
The threshold values Cmin and Cmax can then be calculated by
Figure imgf000024_0003
The autocorrelation of the video frame, VFCorr can then be calculated as
and the ratio
Figure imgf000024_0004
can then be compared with the threshold values to determine whether or not a pixel belongs to the foreground object or to the background.
Once a target object has been identified, a position of the target object in the video frame coordinate system can be determined by any suitable method, and any suitable coordinate system can be used. In certain embodiments, the position of the target object in the video frame is determined by calculating a horizontal distance and a vertical distance from a predetermined point, such as the bottom left corner of the video frame.
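As an illustration, the centre-of-mass of a foreground mask can be measured from the bottom-left corner as follows; the use of OpenCV image moments here is an implementation assumption rather than part of the described embodiment.

```python
import cv2

def target_position(fg_mask, frame_height):
    """Centre-of-mass of the foreground mask, measured from the bottom-left corner.

    Image coordinates normally run from the top-left, so the vertical distance
    from the bottom-left reference point is frame_height minus the centroid row.
    """
    m = cv2.moments(fg_mask, binaryImage=True)
    if m["m00"] == 0:
        return None                      # no target pixels in this frame
    cx = m["m10"] / m["m00"]             # horizontal distance from the left edge
    cy = m["m01"] / m["m00"]             # centroid row, top-left origin
    return cx, frame_height - cy         # convert to bottom-left origin
```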
Variations on the described method of determining a geolocation of a target object are envisaged such as those set out below.
Reference points used to map a position of a target object in a video frame coordinate system to latitude and longitude may be chosen by a user.
The latitude and longitude coordinates of reference points can be established by manual readings. These manual readings can then be calibrated with the position of the measurement apparatus in a video frame to establish a mapping between a position of a target object in a video frame coordinate system and the latitude and longitude coordinates of the target object.
A calculated ground distance from a target object to a PTZ camera can be used to adjust a zoom of the PTZ camera. The zoom of the PTZ camera can alternatively or additionally be adjusted based on a calculated velocity of the target object. Similarly, the panning and tilting of the PTZ camera can be adjusted according to any suitable criteria.
In certain embodiments, there is no PTZ camera to track a movement of a target object. The latitude and longitude coordinates of the target object may be sent to another device capable of tracking or following the movement of the target object.
The above embodiments have been described by way of example only, and the described embodiments are to be considered in all respects only as illustrative and not restrictive. It will be appreciated that variations of the described embodiments may be made without departing from the scope of the invention.

Claims:
1. A method comprising:
receiving video data from a camera, the video data comprising a plurality of video frames captured by the camera;
tracking a target object in a sequence of video frames of the video data;
calculating a position of the target object in the video frame coordinate system; and
converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system.
2. A method according to claim 1, further comprising:
using the geographic coordinates to display on a map image an indicator to show the position of the target object on the map image.
3. A method according to claim 2, wherein the map image is a satellite image.
4. A method according to any preceding claim, wherein the converting the position of the target object from the video frame coordinate system to geographic coordinates in a geographic coordinate system comprises:
converting the position of the target object in the video frame coordinate system to a position of the target object in a map coordinate system; and
converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
5. A method according to claim 4, wherein the converting the position of the target object in the video frame coordinate system to a position of the target object in the map coordinate system comprises applying a homographic transformation to the position of the target object in the video frame coordinate system.
6. A method according to claim 5, wherein the homographic transformation is determined by a mapping of each of the positions of at least four non-collinear points in the video frame coordinate system to a corresponding position of each of the points in the map coordinate system.
7. A method according to any of claims 4 to 6, wherein converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system comprises:
selecting at least three reference points in the map coordinate system;
identifying, for each of the at least three reference points, corresponding geographic coordinates in the geographic coordinate system;
determining a mapping for converting the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system using the reference points and identified geographic coordinates of the reference points; and
using the mapping to convert the position of the target object in the map coordinate system to geographic coordinates in the geographic coordinate system.
8. A method according to any preceding claim, further comprising:
calculating the distance between the target object and a fixed point in the video frame.
9. A method according to any preceding claim, comprising tracking a plurality of target objects.
10. A method according to claim 9, further comprising calculating the distance between two target objects.
11. A method according to any preceding claim, comprising calculating the velocity of a target object.
12. A method according to any of claims 8 to 11, wherein the distance and speed are calculated from the geographic coordinates.
13. A method according to any preceding claim, further comprising:
converting the position of the moving object from the video frame coordinate system to tracking coordinates in a tracking device coordinate system for a tracking device;
adjusting the tracking device to direct the tracking device towards the moving object.
14. A method according to claim 13, wherein the tracking device is a tracking camera and adjusting the tracking device to direct the tracking device towards the moving object comprises adjusting the tracking camera to monitor the moving object, the method further comprising:
displaying the view captured by the tracking camera.
15. A method according to claim 14, wherein the tracking coordinates are calculated from the geographic coordinates.
16. A system comprising a memory, a processor and at least one camera, the system being configured to perform the method of any preceding claim.
17. A computer readable medium having stored therein computer readable instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1 to 15.
PCT/GB2017/053366 2016-11-08 2017-11-08 Object location technique WO2018087545A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB201618837 2016-11-08
GB1618837.7 2016-11-08

Publications (1)

Publication Number Publication Date
WO2018087545A1 true WO2018087545A1 (en) 2018-05-17

Family

ID=60388089

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2017/053366 WO2018087545A1 (en) 2016-11-08 2017-11-08 Object location technique

Country Status (1)

Country Link
WO (1) WO2018087545A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021196294A1 (en) * 2020-04-03 2021-10-07 中国科学院深圳先进技术研究院 Cross-video person location tracking method and system, and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009096893A1 (en) * 2008-02-01 2009-08-06 Imint Image Intelligence Ab Generation of aerial images
US20110128388A1 (en) * 2009-12-01 2011-06-02 Industrial Technology Research Institute Camera calibration system and coordinate data generation system and method thereof
US20150154745A1 (en) * 2011-03-07 2015-06-04 Stéphane Lafon 3D Object Positioning in Street View
US20130128050A1 (en) * 2011-11-22 2013-05-23 Farzin Aghdasi Geographic map based control

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
J. PARKKINEN ET AL.: "Characteristic spectra of munsell colors", J. OPT. SOC. AM. A, vol. 6, no. 2, 1989, pages 318 - 322, XP002574122, DOI: doi:10.1364/JOSAA.6.000318


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17800925

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17800925

Country of ref document: EP

Kind code of ref document: A1